Sunday, November 11, 2012

Unicode Numeric Character References (NCRs) for Chinese

I recently learned of the following two web forms (there are likely more) that accept Chinese characters and return the decimal forms of their Unicode Numeric Character References (NCRs), which can be used in web pages having a Unicode charset:

http://www.pinyin.info/tools/converter/chars2uninumbers.html
http://weber.ucsd.edu/~dkjordan/resources/unicodemaker.html

NCRs are combinations of ASCII characters that can be copied and pasted into, e.g., text editors that won't accept native Chinese characters. The first web form above gives this example:

台北 (Táiběi; Taipei) = &#21488;&#21271;
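A decimal NCR is just the character's Unicode code point written in base 10 between "&#" and ";", so the same conversion the web forms perform can be sketched in a few lines of Python. This is only a minimal sketch, not how either site actually works; the function name to_decimal_ncrs is my own invention, and I am assuming that plain ASCII should pass through unchanged.

```python
def to_decimal_ncrs(text: str) -> str:
    """Replace every non-ASCII character with its decimal NCR (&#NNNN;).

    Hypothetical helper for illustration; ASCII characters are left as-is.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for ch in text
    )


if __name__ == "__main__":
    # 台 is U+53F0 (21488 decimal) and 北 is U+5317 (21271 decimal).
    print(to_decimal_ncrs("台北 (Taipei)"))
```

Python's built-in codec machinery gives the same result as bytes: "台北".encode("ascii", "xmlcharrefreplace") returns b'&#21488;&#21271;'.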

I have been using a small number of native Chinese characters on my web site (not this blog) for years, but had not edited that Chinese in perhaps a few years, having inadvertently uninstalled my favored Chinese software, an older version of 自然輸入法 (Zìrán Shūrùfǎ; "Natural Input", although I don't remember if it had an actual English name), and having misplaced the installation CD. I was recently surprised to discover that somewhere along the line, the web hosting company converted the native Chinese characters in my files into NCRs.

That has the benefit of making it possible for me to add Chinese directly when editing web page text files on the Unix server (using, e.g., the WebSSH app), although I need to (painfully) look up the NCR for each Chinese character.

The regrettable accompanying downside is that the Chinese in my files is now only visible as Chinese when viewed in a browser, and is no longer visible as Chinese in plain text editors.
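One possible workaround for reading such a file outside a browser is to turn the decimal NCRs back into characters before viewing it. The following is a rough sketch under my own assumptions: the name ncrs_to_text is hypothetical, only decimal references of the form &#NNNN; are handled, and references in the ASCII range (such as &#60; for "<") are deliberately left alone so that the markup itself is not altered.

```python
import re

# Matches decimal NCRs only, e.g. &#21488; (hex forms like &#x53F0; are ignored).
_DECIMAL_NCR = re.compile(r"&#(\d+);")


def ncrs_to_text(text: str) -> str:
    """Decode non-ASCII decimal NCRs back into the characters they stand for."""

    def decode(match: re.Match) -> str:
        code_point = int(match.group(1))
        # Keep ASCII-range and out-of-range references exactly as written.
        if code_point < 128 or code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return _DECIMAL_NCR.sub(decode, text)


if __name__ == "__main__":
    print(ncrs_to_text("<p>&#21488;&#21271; &amp; &#60;</p>"))
```

The standard library's html.unescape would also decode these references, but it converts named entities such as &amp; as well, which is usually not wanted when the goal is only to read the Chinese in an HTML source file.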
