Printing special characters in a terminal window


Hello!

 

I'm having some issues printing special characters via the serial port. The same happens with Tera Term and PuTTY. When I try to print special characters such as ñ, ¿, or the degree symbol, the terminal window shows ?'s. I also want to print some fancy letters (below), and that doesn't work either.

 

Can you tell me what I need to change? Is it the UTF encoding? I tried changing the encoding in PuTTY with no results.

 

 ▄▄▄·  ▌ ▐·▄▄▄  ·▄▄▄▄▄▄  ▄▄▄ . ▄▄▄· ▄ •▄ .▄▄ · 
▐█ ▀█ ▪█·█▌▀▄ █·▐▄▄·▀▄ █·▀▄.▀·▐█ ▀█ █▌▄▌▪▐█ ▀. 
▄█▀▀█ ▐█▐█•▐▀▀▄ ██▪ ▐▀▀▄ ▐▀▀▪▄▄█▀▀█ ▐▀▀▄·▄▀▀▀█▄
▐█ ▪▐▌ ███ ▐█•█▌██▌.▐█•█▌▐█▄▄▌▐█ ▪▐▌▐█.█▌▐█▄▪▐█
 ▀  ▀ . ▀  .▀  ▀▀▀▀ .▀  ▀ ▀▀▀  ▀  ▀ ·▀  ▀ ▀▀▀▀ 

Thanks, 

 

Jorge.


It is likely an encoding issue. Terminal programs can generally only deal with ASCII. There are no ASCII characters for ñ or ¿. Thus, you are likely sending UTF-8 encoded text, which uses multiple bytes for non-ASCII characters. Your terminal program probably does not know how to deal with this and just takes each byte as a character. You need to look for a terminal program, then, that is capable of handling non-ASCII encodings. I do not know which terminal programs will do this.
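To make the "multiple bytes" point concrete, here is a small sketch in plain C (it runs on a PC, nothing AVR-specific). It assumes the text really is UTF-8, and it writes the bytes as explicit escapes so the result does not depend on how the editor saved the file. The names WORD_UTF8 and count_utf8_chars are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "año" written out as explicit UTF-8 bytes:
   'a' (0x61), then 0xC3 0xB1 for the n-tilde, then 'o' (0x6F). */
static const char WORD_UTF8[] = "a\xC3\xB1" "o";

/* Count characters rather than bytes: every byte that is NOT a
   continuation byte (bit pattern 10xxxxxx) starts a new character. */
static size_t count_utf8_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

Here strlen(WORD_UTF8) is 4 while count_utf8_chars(WORD_UTF8) is 3, so a loop that pushes the string out of the UART one byte at a time sends four bytes for a three-letter word. A terminal that treats each byte as its own character will show two wrong glyphs where the ñ should be.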

 

Jim

 

Until Black Lives Matter, we do not have "All Lives Matter"!

 

 


Thanks Jim,

 

So, when I send a "ñ", or assign the corresponding byte to the data register of the serial port, does the uC internally send two characters?


This problem is as old as the hills. When the IBM PC first came out, IBM put a bunch of "useful" characters into the 256-character set ROM that drives the CGA display adapter. Those characters looked like this:

 

http://www.gridsagegames.com/blog/gsg-content/uploads/2014/09/rexpaint_cp437_10x10.png

 

The first 128 of those (well the ones between 32 and 127 anyway) are pretty obvious as they are defined by ASCII (American Standard Code for Information Interchange) and deliver 0..9, A..Z and a..z among other things. But beyond 127 it was a pretty arbitrary choice and IBM chose to load it up with a whole load of "box edge" characters. This meant that users could then write software that did things like this:

 

https://doc.opensuse.org/projects/YaST/SLES9/YaST2-Package-Manager/pic/y2pkg-ncurses.png

 

Even though those are text characters (not pixel-addressable graphics), this gives the idea of "windows" and "menus" just using the box edge characters. (This style is often called "ncurses", after the library commonly used to produce it.)

 

Pretty soon the success of the IBM PC meant that Europeans in France, Italy, Germany and Spain, with all their Ç à ñ Ü and so on, wanted to be able to type/edit/display in their European languages with all kinds of "odd" symbols. So from MS-DOS 3.3 onwards (I think it was?) IBM/Microsoft introduced the concept of "Code Pages". This was basically a way to swap in a different set of characters from 128 to 255. Perhaps the most useful one of them all was the ubiquitous Code Page 850:

 

http://czyborra.com/charsets/cp850.gif

 

which replaced 128..255 (0x80..0xFF) with most of the European language symbols and just left a few of the box edge characters for producing "ncurses" style display output. This led to problems like this:

 

http://i.imgur.com/T2zCe.png

 

When things moved onto Windows there wasn't quite such an issue, as pixel-addressable graphic displays meant that pretty much any character could be drawn. But there was still the issue of numbering those characters: if you printed character 156, say, what did you expect to appear (this was a favourite of mine as it was the UK '£' symbol in the IBM code pages, or at least it should have been! In Latin1 the '£' is 163). Windows adopted a mapping for characters 0..255 based on the international standard ISO-8859-1, which is also called "Latin1" (strictly speaking, Windows code page 1252 differs from it in the 128..159 range). It has the following 256 characters:

 

https://www.netresultstracker.com/pthelp55/Images/iso88591.gif

 

But all of these character sets still have the problem that there are more than 256 characters needed to type in all international languages.

 

So next up came "Unicode". As originally designed, that used 16 bits per character so there could be 65536 different characters (you don't get a picture for that one!). It has since grown beyond 16 bits, but the idea is the same: it allows almost any character to be carried from one place to another (serial cable, internet, whatever) and the guy at the other end sees the same thing on his display device as you authored before you sent the data.

 

Problem is that 16-bit Unicode is wasteful. "Hello world" is now 22 bytes instead of 11 even though there are no "special symbols" in it, so it's a real waste.

 

So the next solution was UTF-8. This carries the information using the bytes 0..255 again. HOWEVER it is not limited to just 256 characters. It conveys the first 128 characters "as is" with 0xxxxxxx, but from 128 upward it uses "escapes". For the next bunch of characters it encodes them as 110xxxxx 10xxxxxx, where the 110 and 10 prefixes are fixed but the xxxxx xxxxxx bits allow 11-bit codes to pick the "next most popular" characters. Beyond that it uses 1110xxxx 10xxxxxx 10xxxxxx to convey 16-bit character codes, and so on. So the "less popular" your character is, the more bytes/bits it takes to identify it. This idea is so popular that Wikipedia shows how UTF-8 is taking over as the main character encoding people use these days:
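As a rough sketch of that bit packing (plain C that runs on a PC, not taken from any particular library), the function below turns one Unicode character number into its UTF-8 bytes. The name utf8_encode is made up for this illustration. It also covers the four-byte form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx that the "and so on" refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out[] (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp cannot be encoded. */
static size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 0xxxxxxx: plain ASCII */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx: 11 bits */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx: 16 bits */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                     /* surrogates are not characters */
        }
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + three 10xxxxxx bytes */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

For example, ñ is character 0xF1, which comes out as the two bytes 0xC3 0xB1, while the block character ▄ (0x2584) needs the three bytes 0xE2 0x96 0x84.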

 

https://upload.wikimedia.org/wikipedia/commons/a/a9/UnicodeGrow2b.png

So the bottom line to get back to your original question is that you need to use a terminal (or configure it) to use the same character set as you are sending out in the first place.

 

So if you transmit Code Page 437 (original IBM) characters then set the terminal to Code Page 437 (if it has the option)

So if you transmit Code Page 850 (IBM "Euro" set) characters then set the terminal to Code Page 850 (if it has the option)

So if you transmit ISO-8859-1 (Latin1) (Windows original) characters then set the terminal to ISO-8859-1 (if it has the option)

So if you transmit Unicode (16 bit) characters then set the terminal to Unicode (if it has the option)

So if you transmit UTF-8 (packing Unicode into 8 bit) characters then set the terminal to UTF-8 (if it has the option)
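To illustrate why the two ends have to agree, here is a small lookup sketch in plain C for a handful of the characters mentioned in this thread. The byte values are from the published CP437, CP850 and ISO-8859-1 tables; the names encoding_t, GLYPHS and byte_for are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ENC_CP437, ENC_CP850, ENC_LATIN1 } encoding_t;

/* One row per character: Unicode number, then the single byte that
   represents it in each of the three 8-bit character sets. */
struct glyph_row {
    uint16_t codepoint;
    uint8_t  cp437;
    uint8_t  cp850;
    uint8_t  latin1;
};

static const struct glyph_row GLYPHS[] = {
    { 0x00F1, 0xA4, 0xA4, 0xF1 },  /* n with tilde          */
    { 0x00BF, 0xA8, 0xA8, 0xBF },  /* inverted question mark */
    { 0x00B0, 0xF8, 0xF8, 0xB0 },  /* degree sign            */
    { 0x00E1, 0xA0, 0xA0, 0xE1 },  /* a with acute           */
    { 0x00A3, 0x9C, 0x9C, 0xA3 },  /* pound sign             */
};

/* Return the byte to transmit for a character in the given character
   set, or 0 if the character is not in this small table. */
static uint8_t byte_for(uint16_t codepoint, encoding_t enc)
{
    for (size_t i = 0; i < sizeof GLYPHS / sizeof GLYPHS[0]; ++i) {
        if (GLYPHS[i].codepoint == codepoint) {
            switch (enc) {
            case ENC_CP437:  return GLYPHS[i].cp437;
            case ENC_CP850:  return GLYPHS[i].cp850;
            case ENC_LATIN1: return GLYPHS[i].latin1;
            }
            assert(!"unknown encoding");  /* not reached for valid enc */
            return 0;
        }
    }
    return 0;
}
```

So the same ñ is byte 0xA4 for a Code Page 437 or 850 terminal but byte 0xF1 for a Latin1 terminal. Send 0xF1 to a Code Page 437 terminal and it shows ± instead.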

 

The key thing is that the device where the output is displayed must be using the same character set mapping as the place where you authored the text/data to be displayed in the first place. If they match you will see the right thing. If they don't match you see:

 ▄▄▄·  ▌ ▐·▄▄▄  ·▄▄▄▄▄▄  ▄▄▄ . ▄▄▄· ▄ •▄ .▄▄ · 
▐█ ▀█ ▪█·█▌▀▄ █·▐▄▄·▀▄ █·▀▄.▀·▐█ ▀█ █▌▄▌▪▐█ ▀. 
▄█▀▀█ ▐█▐█•▐▀▀▄ ██▪ ▐▀▀▄ ▐▀▀▪▄▄█▀▀█ ▐▀▀▄·▄▀▀▀█▄
▐█ ▪▐▌ ███ ▐█•█▌██▌.▐█•█▌▐█▄▄▌▐█ ▪▐▌▐█.█▌▐█▄▪▐█
 ▀  ▀ . ▀  .▀  ▀▀▀▀ .▀  ▀ ▀▀▀  ▀  ▀ ·▀  ▀ ▀▀▀▀ 

The fact that it has such a rich set of block/edge characters suggests that the terminal is currently set to Code Page 437.

 

The web, and probably even the editor you write C code with, are most likely all using UTF-8 these days. I guess the question is whether your terminal/display device can do UTF-8 too?


+1

This should be a sticky!

 

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

 


Wow, I didn't expect such an explanation, thank you very much!!

 

So, I must have the same character set both in terminal and in the firmware. 

 

I don't think that there is a character set configuration in the serial port, so I assume that what I must do is identify the character set of the "blocks" I'm trying to send and match the character set in the terminal program.

 

In PuTTY, the character set is under Window -> Translation -> Remote character set. I changed it to 437 and it didn't work. But after changing it to "Latin-1", the special characters such as "á" worked.

 

Thank you very much!!

 

Jorge.

 


jorge.ar wrote:
I don't think that there is a character set configuration in the serial port,

There isn't. Serial ports just carry (usually) 8 bits from one place to another. How that data is actually interpreted by the sender and the receiver is a whole other thing!

 

Glad you got it working with Latin1. The fact that setting the receiver to "Latin1" shows the correct characters implies that whatever you are using to author the AVR software is probably using Latin1 too. So if you type:

UDR = 'á';

in your C code and this causes 'á' to appear in a Latin1 terminal at the other end, that suggests you are editing your C in Latin1 too.

 

If you wanted to make your C code work even if it were not Latin1 I guess you could:

#define A_ACUTE 225  /* 0xE1 is 'á' in ISO-8859-1 (224 would be 'à') */
...
UDR = A_ACUTE;

and that would work whatever encoding the C source was edited in. But it will only come out looking right if the thing the AVR sends to IS set to Latin1 (ISO-8859-1).
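The same idea extends to whole strings. The sketch below is plain C that runs on a PC: uart_putc is a hypothetical stand-in that just records bytes in a buffer, where real AVR code would wait for the transmitter and write the byte to UDR. The macro and function names are made up for this illustration, and the byte values assume the receiving terminal is set to Latin1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Latin1 (ISO-8859-1) bytes spelled out as escapes, so the result does
   not depend on the encoding the source file was saved in. */
#define N_TILDE_LATIN1 "\xF1"
#define A_ACUTE_LATIN1 "\xE1"

/* Hypothetical UART stand-in: records what would be transmitted. */
static uint8_t tx_log[32];
static size_t  tx_len = 0;

static void uart_putc(uint8_t byte)
{
    assert(tx_len < sizeof tx_log);
    tx_log[tx_len++] = byte;
}

static void uart_puts(const char *s)
{
    while (*s != '\0') {
        uart_putc((uint8_t)*s++);
    }
}

/* Adjacent string literals are joined by the compiler. Keeping the
   escape in its own literal matters: "\xF1a" would be read as ONE
   escape (0xF1A, out of range) because 'a' is also a hex digit. */
static void send_greeting(void)
{
    uart_puts("Espa" N_TILDE_LATIN1 "a");
}
```

After send_greeting() the log holds six bytes, 'E' 's' 'p' 'a' 0xF1 'a', which a Latin1 terminal shows as "España". A terminal set to anything else will show a different glyph for the 0xF1 byte.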


clawson wrote:
If you wanted to make your C code work even if it were not Latin1 I guess you could:

#define A_ACUTE 225
...
UDR = A_ACUTE;

Which is exactly how HTML does it - see, for example, http://www.w3schools.com/html/ht...

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...