Wednesday, March 23, 2011

The significance of diacritical marks and special characters

A friend from San Francisco once told me the tale of a person who received a Worker of the Year award. The award was in Spanish, and there was one minor problem...the printed award lacked the tilde (~) over the n in the Spanish word for year (año). The award thus read "Trabajador del Ano," which means Worker of the Anus.
(Actually, my friend may have said "Trabajador de Ano", but I suspect the middle word was "del" [of the] rather than just "de" [of].)

That's the most extreme case I've heard of where the lack of a diacritical mark makes a big difference.

On a more technical, but less disturbing, note, I recently learned a little tidbit about the eight-bit ISO 8859-15 (Western European Latin alphabet No. 9) character set. It includes the Euro sign along with some French and Finnish characters that are missing from the far more widely used ISO 8859-1 (Western European Latin alphabet No. 1) character set. Despite good intentions, ISO 8859-15 apparently never overcame the popularity of ISO 8859-1.

The ISO 8859-15 bonus characters for French are œ, Œ, and Ÿ (replacing ½, ¼, and ¾, respectively). I had never heard of a French word with Ÿ in it, so I found it interesting that the character belongs to the French alphabet at all.
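To see what "replacing" means at the byte level, here is a small Python sketch (my own illustration, not something from the Wikipedia page) that decodes the same three bytes under both character sets. It assumes Python 3, whose standard codecs include "latin-1" and "iso-8859-15"; the sample word cœur is just an example I picked.

```python
# Decode the same three bytes under ISO 8859-1 and ISO 8859-15.
raw = bytes([0xBC, 0xBD, 0xBE])

latin1 = raw.decode("latin-1")      # ISO 8859-1: the vulgar fractions
latin9 = raw.decode("iso-8859-15")  # ISO 8859-15: the French letters

for byte, old, new in zip(raw, latin1, latin9):
    print(f"0x{byte:02X}: {old} -> {new}")

# Text saved as ISO 8859-15 but read back as ISO 8859-1 turns œ into ½.
word = "cœur"  # French for "heart"
mojibake = word.encode("iso-8859-15").decode("latin-1")
print(mojibake)  # c½ur

# ISO 8859-1 has no œ at all, so encoding the word fails outright.
try:
    word.encode("latin-1")
except UnicodeEncodeError as err:
    print("ISO 8859-1 cannot encode:", err.object[err.start:err.end])
```

The mojibake case is the practical headache: nothing fails, the text just quietly shows a fraction where a letter should be.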

A French Wikipedia page, which I can't find at the moment, mentioned that when the working group met to decide which characters would go into ISO 8859-1, one member said there were no contexts in which œ would be confused with oe (hmm, for words with a separate o and e, I can only think of Noël and Citroën off the top of my head, and there the e is an ë). Another member of that working group worked for a printer company and said that its printers didn't even have Ÿ available. This was apparently the mid-1980s, so those may have been daisy-wheel and dot-matrix printers.

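Oof, I see 8859-15 also kicked out ¦ (the "broken bar", which looks like a pipe), replacing it with Š, one of the characters added for Finnish. At first glance that seems like a dreadful decision for technical computer usage, but the real pipe (|) sits in the ASCII range at 0x7C, which both character sets share, so it survived unchanged.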
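For the curious, here is a rough Python 3 sketch that walks all 256 byte values and prints every position where ISO 8859-1 and ISO 8859-15 disagree, along with the official Unicode names. It relies only on the built-in "latin-1" and "iso-8859-15" codecs and the standard unicodedata module; the helper name differing_positions is my own. It should turn up eight swaps, including 0xA6, where BROKEN BAR gave way to Š.

```python
import unicodedata


def differing_positions():
    """Return (byte, latin1_char, latin9_char) for each byte that decodes differently."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        old = raw.decode("latin-1")
        new = raw.decode("iso-8859-15")
        if old != new:
            diffs.append((b, old, new))
    return diffs


diffs = differing_positions()
for b, old, new in diffs:
    print(f"0x{b:02X}: {old} {unicodedata.name(old)} -> {new} {unicodedata.name(new)}")

# The ASCII pipe lives at 0x7C, outside the changed range, and decodes the same in both.
pipe_latin1 = bytes([0x7C]).decode("latin-1")
pipe_latin9 = bytes([0x7C]).decode("iso-8859-15")
print(pipe_latin1, pipe_latin9)
```

Printing the Unicode names makes it easy to spot that the character at 0xA6 was never the pipe in the first place.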

French Wikipedia page on ISO 8859-15

