While making a French/English flash card program, it became apparent that there was no easy way to work with accented
characters. The simplest way is to ignore the accents, but that is not a real solution. My platform is the Arduino
Mega 2560, which has enough memory to store a non-trivial number of words and phrases.
There is no single standard format for displaying accented chars as there is for ordinary ASCII characters. The
PC Windows format uses two separate but equal ALT+keypad number codes: ALT+130 and ALT+0130 show different characters.
Most text editors like UltraEdit use the Latin Maschule-C format, while the Arduino IDE (and serial monitor) use the
Unicode UTF-8 format. The Arduino TFT screen libraries commonly use the original Adafruit 256-char font file, which
uses yet another encoding in the 'hidden' chars between 128 and 255. And then it seems that the accented characters
created by the Arduino IDE for the Mega are different from those created for the UNO/Nano. All I wanted was to have
the same characters displayed on the Arduino TFT screen as were being displayed in the Arduino IDE editor.
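To make the mismatch concrete, here is a small sketch of how a single character, é, is stored under three of the encodings mentioned above. The byte values are the standard ones for code page 437 (the ALT+130 keypad code, and as far as I can tell the layout the classic Adafruit font follows), Latin-1/Windows-1252, and UTF-8. The struct and function names are my own and purely illustrative.

```cpp
#include <stdint.h>
#include <stddef.h>

// Illustrative only: one character (e acute) as stored in three encodings.
struct EncodedChar {
  const char *encodingName;  // human-readable label
  uint8_t bytes[2];          // stored bytes (unused slots are 0)
  uint8_t length;            // how many of the bytes are used
};

const EncodedChar E_ACUTE[] = {
  { "CP437 (ALT+130, TFT font slot)", { 0x82, 0x00 }, 1 },
  { "Latin-1 / Windows-1252",         { 0xE9, 0x00 }, 1 },
  { "UTF-8 (Arduino IDE)",            { 0xC3, 0xA9 }, 2 },
};
const size_t E_ACUTE_COUNT = sizeof(E_ACUTE) / sizeof(E_ACUTE[0]);

// True when two stored forms are byte-for-byte identical.
bool sameBytes(const EncodedChar &a, const EncodedChar &b) {
  if (a.length != b.length) return false;
  for (uint8_t i = 0; i < a.length; i++) {
    if (a.bytes[i] != b.bytes[i]) return false;
  }
  return true;
}
```

No two of the three entries compare equal, which is why text that looks right in one program shows the wrong glyph in another.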
I downloaded a long list of 1544 phrases from the French Reverso.com website using a Select-All [Ctrl-A] and copy-to-buffer
[Ctrl-C]. Then I pasted this list into a new OpenOffice-Word file using Ctrl-V. MS Word is one of the
few programs that can save text in the Unicode UTF-8 format used by Arduino's IDE. When the UTF-8 file of the phrases
was loaded into the Arduino IDE, all the accented characters displayed correctly. Each phrase was then framed for flash
memory storage using:
const char frExpr0000[] PROGMEM = "Accuser réception";
const char frExpr0001[] PROGMEM = "Acheter/vendre chat en poche";
Next create an array consisting of the beginning flash address of each phrase:
const char * const PROGMEM frExprArray[] = { frExpr0000, frExpr0001 };
The main code can now use the phrases directly from flash memory with:
unsigned int expressionNumber ;
char expressionBuffer[80]; // make sure this is large enough for the largest string it must hold
strcpy_P(expressionBuffer, (const char *) pgm_read_word(&frExprArray[expressionNumber]));
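Since strcpy_P will happily run past the end of an 80-byte buffer, a bounds-checked variant may be worth having. The following is only a sketch under my own assumptions: the names loadExpression and FR_EXPR_COUNT are hypothetical, and the #else branch is a set of stand-in macros so the same code can be compiled and tried on a PC. On the board, strncpy_P and pgm_read_ptr come from avr/pgmspace.h.

```cpp
#include <stdint.h>
#include <string.h>

#ifdef ARDUINO
  #include <avr/pgmspace.h>
#else
  // Assumption: host-side stand-ins, only so the sketch can be tested off-board.
  #define PROGMEM
  #define strncpy_P strncpy
  #define pgm_read_ptr(addr) (*(const void * const *)(addr))
#endif

const char frExpr0000[] PROGMEM = "Accuser réception";
const char frExpr0001[] PROGMEM = "Acheter/vendre chat en poche";
const char * const frExprArray[] PROGMEM = { frExpr0000, frExpr0001 };
const unsigned int FR_EXPR_COUNT = sizeof(frExprArray) / sizeof(frExprArray[0]);

// Copies phrase number n from flash into dest, which holds destSize bytes.
// Returns false if n is out of range; phrases that do not fit are truncated
// and always end with a terminating zero.
bool loadExpression(unsigned int n, char *dest, size_t destSize) {
  if (n >= FR_EXPR_COUNT || destSize == 0) return false;
  const char *src = (const char *) pgm_read_ptr(&frExprArray[n]);
  strncpy_P(dest, src, destSize - 1);
  dest[destSize - 1] = '\0';
  return true;
}
```

In the main code this would be called as loadExpression(expressionNumber, expressionBuffer, sizeof(expressionBuffer)), with the return value checked before the buffer is displayed.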
The char array 'expressionBuffer[]' now holds one complete phrase, with a terminating zero at the end. Each
char in the array has to be tested to determine if it is an accented char. The UTF-8 format used by the .ino source
file uses a sentinel byte with value 0xC3 to indicate that an accented character follows. The next char has a
value between 128 and 191 (0x80 to 0xBF). The char has to be converted to a byte (uint8_t) in order to be tested,
because a plain char is signed on these boards and any value above 127 would compare as negative. For some reason,
Arduino binary code for the UNO/Nano doesn't precede the accented char with an 0xC3 sentinel byte, and the single
byte it stores is 0x40 higher than the second byte of the Mega's compiled accent-char value. (That single byte, 0xE9
for é, is the character's Latin-1 code, and decoding the UTF-8 pair C3 A9 gives the same 0xE9.) I found this out by
doing a hex-pair display of the bytes stored in flash by the different Arduino boards. A Mega 2560 will show:
0000: 61 C3 A9 62 63 ... while a UNO/Nano will have 0000: 61 E9 62 63 ... for the same character display.
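The two byte layouts can be folded into one value with a small decoder. This is a minimal sketch, not a full UTF-8 parser: the function name nextLatin1 is my own, and it only understands the two-byte forms that start with 0xC2 or 0xC3, which is all that Latin-1 accented characters need. Every other byte is passed through unchanged, so a file that stores a raw 0xC3 followed by a raw byte in the 0x80 to 0xBF range would be misread.

```cpp
#include <stdint.h>
#include <stddef.h>

// Reads one character starting at text[*pos] and returns its Latin-1 value.
// Accepts plain ASCII, single raw Latin-1 bytes (the UNO/Nano layout) and
// two-byte UTF-8 pairs starting with 0xC2 or 0xC3 (the Mega layout).
// Advances *pos past the bytes consumed; returns 0 at the terminating zero
// without advancing.
uint8_t nextLatin1(const char *text, size_t *pos) {
  uint8_t first = (uint8_t) text[*pos];
  if (first == 0) return 0;
  (*pos)++;
  if (first == 0xC2 || first == 0xC3) {
    uint8_t second = (uint8_t) text[*pos];
    if ((second & 0xC0) == 0x80) {     // valid continuation byte, 0x80..0xBF
      (*pos)++;
      // Standard two-byte UTF-8 decode; for 0xC3 this equals second + 0x40.
      return (uint8_t) (((first & 0x1F) << 6) | (second & 0x3F));
    }
  }
  return first;
}
```

Fed the Mega bytes 61 C3 A9 62 63 or the UNO/Nano bytes 61 E9 62 63, it yields the same sequence 0x61, 0xE9, 0x62, 0x63, so the code that follows only has to deal with one set of values.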
Finally the accented char has to be mapped to a bit-map found in the font table used by the TFT screen. The
Adafruit 256-char font table has about 30 of the most commonly-used accented chars that are found in about 99%
of French words. The big switch statement replaces the char's value from the source file with the font-table index of the matching bit-map.
for (uint8_t index = 0; expressionBuffer[index] != 0; index++) {
  uint8_t myOneChar = (uint8_t) expressionBuffer[index];
  if (myOneChar < 128) tft.write(myOneChar);   // TFT displays the char of ASCII# myOneChar
  else {
    if (myOneChar == 0xc3) {                   // this is the Mega 2560 version. I don't know why the
      index++;                                 // Mega version writes different numbers for accented chars.
      myOneChar = (uint8_t) expressionBuffer[index] + 0x40;   // add the 0x40 offset to get the UNO/Nano value
    }
    switch (myOneChar) {                       // if myOneChar is 128 or greater, check if it is on the list.
      case 0xe9 :                              // If it is, then display instead the alternate ASCII from the font table.
        tft.write(char(130)); break;           // e aigu lower_case
      case 0xc9 :
        tft.write(char(130)); break;           // e aigu upper_case {to lower case}
      case 0xe0 :
        tft.write(char(133)); break;           // a grave lower_case
      case 0xc0 :
        tft.write(char(133)); break;           // a grave upper_case {to lower case}
      case 0xe8 :
        tft.write(char(138)); break;           // e grave lower_case
      case 0xc8 :
        tft.write(char(144)); break;           // e grave upper_case
      case 0xf9 :
        tft.write(char(151)); break;           // u grave lower_case
      case 0xd9 :
        tft.write(char(151)); break;           // u grave upper_case {to lower case}
      case 0xe2 :
        tft.write(char(131)); break;           // a circumflex lower_case
      case 0xea :
        tft.write(char(136)); break;           // e circumflex lower_case
      case 0xca :
        tft.write(char(136)); break;           // e circumflex upper_case {to lower case}
      case 0xee :
        tft.write(char(140)); break;           // i circumflex lower_case
      case 0xf4 :
        tft.write(char(147)); break;           // o circumflex lower_case
      case 0xfb :
        tft.write(char(150)); break;           // u circumflex lower_case
      case 0xe7 :
        tft.write(char(135)); break;           // c_cedille lower_case
      case 0xc7 :
        tft.write(char(128)); break;           // c_cedille upper_case
      default :
        tft.print(F(" -accent- ")); break;     // not on the list: show a visible marker
    } // switch
  } // else
} // for (index=...
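The same mapping can also be kept as data rather than as a long switch, which makes it easier to add a missing character later. This is just one possible arrangement, not the way the code above does it: the struct AccentMap and the function fontSlotFor are hypothetical names, the pairs are copied from the switch statement above, and a return value of 0 is my own convention for "no entry".

```cpp
#include <stdint.h>
#include <stddef.h>

// One row per accented character: the value found in the phrase
// and the slot of its bit-map in the TFT font table.
struct AccentMap {
  uint8_t latin1;
  uint8_t fontSlot;
};

// Same pairs as the switch statement above.
const AccentMap ACCENT_MAP[] = {
  { 0xE9, 130 }, { 0xC9, 130 },   // e aigu
  { 0xE0, 133 }, { 0xC0, 133 },   // a grave
  { 0xE8, 138 }, { 0xC8, 144 },   // e grave
  { 0xF9, 151 }, { 0xD9, 151 },   // u grave
  { 0xE2, 131 },                  // a circumflex
  { 0xEA, 136 }, { 0xCA, 136 },   // e circumflex
  { 0xEE, 140 },                  // i circumflex
  { 0xF4, 147 },                  // o circumflex
  { 0xFB, 150 },                  // u circumflex
  { 0xE7, 135 }, { 0xC7, 128 },   // c cedille
};

// Returns the font slot for a character value: ASCII passes through
// unchanged, listed accents return their slot, anything else returns 0.
uint8_t fontSlotFor(uint8_t value) {
  if (value < 128) return value;
  for (size_t k = 0; k < sizeof(ACCENT_MAP) / sizeof(ACCENT_MAP[0]); k++) {
    if (ACCENT_MAP[k].latin1 == value) return ACCENT_MAP[k].fontSlot;
  }
  return 0;
}
```

In the display loop, the result would go to tft.write() when it is non-zero and to the " -accent- " marker otherwise. As written the table sits in RAM (32 bytes); moving it to PROGMEM would need the pgm_read_byte accessors.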
All in all, it was a lot of work for a character conversion that should be easily done in the background. I suppose that
this would have been fixed (or made easier) long ago if English had lots of accented characters. C'est la vie.