Unicode

From The x3270 Wiki

Overview

Unicode is a standard for representing text from multiple languages. Each character is assigned a numeric code point between U+0000 and U+10FFFF, conventionally written as U+ followed by four to six hexadecimal digits (for example, U+00E9 for é). The x3270 family uses Unicode as the internal representation of all text data. (It also retains the EBCDIC text sent by the host, but displaying that text involves translations to and from Unicode.)
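
As a small illustration of the U+ notation, the Python sketch below formats a character's code point with at least four uppercase hexadecimal digits. The function name code_point_notation is hypothetical and is not part of x3270.

```python
def code_point_notation(ch: str) -> str:
    """Return the conventional U+ notation for a single character.

    Python str values are sequences of Unicode code points, so ord()
    yields the code point directly. The format pads to at least four
    uppercase hex digits (U+00E9); larger code points use five or six.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(ch):04X}"


# Letter A, e-acute, euro sign, grinning-face emoji (written as escapes).
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(code_point_notation(ch))   # U+0041, U+00E9, U+20AC, U+1F600
```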

Text read by the emulators is converted to Unicode internally. When text is displayed, it is converted from Unicode to the workstation's local encoding.
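
The round trip between a local encoding and Unicode can be sketched in Python as follows. This is an illustration, not x3270's implementation: it assumes, hypothetically, that the workstation's local encoding is ISO-8859-1, and substituting '?' for characters the local encoding cannot represent is a choice made only for this sketch.

```python
# Assumption for illustration only: the workstation's local encoding is ISO-8859-1.
LOCAL_ENCODING = "iso-8859-1"


def to_internal(raw: bytes, encoding: str = LOCAL_ENCODING) -> str:
    """Decode bytes in the local encoding into Unicode text."""
    return raw.decode(encoding)


def to_display(text: str, encoding: str = LOCAL_ENCODING) -> bytes:
    """Encode Unicode text for display; unrepresentable characters become '?'."""
    return text.encode(encoding, errors="replace")


raw = b"caf\xe9"                           # 'café' in ISO-8859-1
text = to_internal(raw)
print([f"U+{ord(c):04X}" for c in text])  # ['U+0063', 'U+0061', 'U+0066', 'U+00E9']
print(to_display(text))                   # b'caf\xe9'
print(to_display("\u20ac10"))             # b'?10': the euro sign is not in ISO-8859-1
```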

UTF-8

The most widely used encoding for Unicode is UTF-8. UTF-8 represents ASCII characters (U+0000 through U+007F) as single bytes, and all higher code points as sequences of two to four bytes.
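
To make the variable-length layout concrete, here is a minimal hand-written UTF-8 encoder for a single code point, following the bit patterns defined in RFC 3629 and cross-checked against Python's built-in codec. The function utf8_encode is hypothetical; it is not part of x3270.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (RFC 3629 bit layout)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:        # 1 byte:  0xxxxxxx (ASCII)
        return bytes([cp])
    if cp <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # cross-check with Python's codec
    print(f"U+{cp:04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
```

Note that U+00E9 (é) already needs two bytes (c3 a9): every code point above U+007F is encoded as a multi-byte sequence.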

Explicit use of Unicode

The Key() action accepts a Unicode code point (U+nnnn) as a parameter.

The String() action supports a \u escape sequence to specify Unicode text.
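
For scripts that drive the emulators, a helper along these lines could generate Key() and String() action text. It is a sketch under stated assumptions: the helper names key_action and string_action are hypothetical, and the String() escape is assumed to be \u followed by exactly four hexadecimal digits.

```python
def key_action(ch: str) -> str:
    """Build a Key() action naming one character by its U+nnnn code point."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("only BMP code points (U+0000..U+FFFF) are supported")
    return f"Key(U+{cp:04x})"


def string_action(text: str) -> str:
    """Build a String() action, escaping non-ASCII text with \\u escapes.

    Assumption: \\u is followed by exactly four hex digits. Control
    characters, backslash and double quote are escaped the same way so
    they cannot clash with String()'s own escape and quoting syntax.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError("only BMP code points (U+0000..U+FFFF) are supported")
        if 0x20 <= cp <= 0x7E and ch not in '\\"':
            parts.append(ch)               # printable ASCII passes through
        else:
            parts.append(f"\\u{cp:04x}")   # assumed \unnnn escape
    return 'String("' + "".join(parts) + '")'


print(key_action("\u00e9"))         # Key(U+00e9)
print(string_action("caf\u00e9"))   # String("caf\u00e9")
```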

The ReadBuffer() action includes a unicode option to produce output as Unicode code points.

The utf8 resource and -utf8 command-line option explicitly override the workstation's local encoding and force input and output text to be UTF-8.

Limitations

The x3270 family supports only the Unicode Basic Multilingual Plane (BMP), which covers code points U+0000 through U+FFFF.
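
Because of this limitation, text from other sources may need to be checked before it is passed to an emulator. The Python sketch below (helper names non_bmp_chars and bmp_only are hypothetical) lists characters above U+FFFF and replaces them with U+FFFD REPLACEMENT CHARACTER; the replacement policy is an assumption of the sketch, not documented x3270 behavior.

```python
BMP_MAX = 0xFFFF  # the Basic Multilingual Plane is plane 0: U+0000..U+FFFF


def non_bmp_chars(text: str) -> list:
    """Return the U+ notation of each character outside the BMP."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > BMP_MAX]


def bmp_only(text: str, replacement: str = "\ufffd") -> str:
    """Replace every character outside the BMP (with U+FFFD by default)."""
    return "".join(ch if ord(ch) <= BMP_MAX else replacement for ch in text)


sample = "ok \u20ac \U0001F600"
print(non_bmp_chars(sample))    # ['U+1F600']
print(ascii(bmp_only(sample)))  # 'ok \u20ac \ufffd'
```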

See also

Wikipedia article on Unicode

Wikipedia article on UTF-8