Host code page

From The x3270 Wiki

A host code page is the mapping between the host's EBCDIC code points and the Unicode code points used internally by the emulator. Differences in code pages cause the same data from the host to be displayed differently. For example, EBCDIC X'C1' is Unicode U+0041 (A) in almost all host code pages, but EBCDIC X'5B' is Unicode U+0024 ($) in code page 037 and Unicode U+00A3 (£) in code page 285.
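The difference can be illustrated with a minimal Python sketch. It assumes Python's built-in cp037 codec as the code page 037 table. Python ships no code page 285 codec, so the hypothetical CP285_OVERRIDES dictionary records only the single difference mentioned above and is not a complete code page 285 table. The function name ebcdic_to_unicode is likewise illustrative and is not part of x3270.

```python
# Partial, hypothetical stand-in for code page 285: it holds only the one
# difference from code page 037 that the text above mentions.
CP285_OVERRIDES = {0x5B: "\u00a3"}  # X'5B' -> POUND SIGN


def ebcdic_to_unicode(byte: int, code_page: str) -> str:
    """Map one EBCDIC code point to a Unicode character (sketch)."""
    if code_page == "cp285" and byte in CP285_OVERRIDES:
        return CP285_OVERRIDES[byte]
    # Everything else falls back to Python's built-in code page 037 table.
    return bytes([byte]).decode("cp037")


for page in ("cp037", "cp285"):
    a = ebcdic_to_unicode(0xC1, page)
    b = ebcdic_to_unicode(0x5B, page)
    print(f"{page}: X'C1' -> U+{ord(a):04X}, X'5B' -> U+{ord(b):04X}")
```

The same host byte X'5B' comes out as U+0024 under code page 037 and as U+00A3 under the code page 285 stand-in, while X'C1' is U+0041 in both.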

Code page values are strings. They have mnemonic names like french, which is EBCDIC code page 297, and can also be specified as cp297. One x3270 family code page does not have an official IBM code page number: bracket, which is the default code page and is a variant of EBCDIC code page 037.
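As a rough sketch of the two spellings, the following Python function resolves a code page string to an IBM code page number. The MNEMONICS table and the function name code_page_number are hypothetical; the table holds only names that appear as examples on this page, and the emulator's own lookup may behave differently (for instance with respect to letter case).

```python
import re
from typing import Optional

# Mnemonic names taken from the examples on this page; the emulator knows
# many more. None means "no official IBM code page number".
MNEMONICS = {
    "bracket": None,
    "us": 37,
    "french": 297,
    "italian": 280,
    "russian": 880,
}


def code_page_number(name: str) -> Optional[int]:
    """Return the IBM code page number for a code page string (sketch)."""
    match = re.fullmatch(r"cp(\d+)", name)
    if match:
        # Numeric form such as "cp297".
        return int(match.group(1))
    if name in MNEMONICS:
        # Mnemonic form such as "french".
        return MNEMONICS[name]
    raise ValueError(f"unknown code page: {name!r}")
```

With this sketch, code_page_number("french") and code_page_number("cp297") both return 297, and code_page_number("bracket") returns None.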

To display the code pages supported by the emulator, use the -v command-line option or execute the action Query(CodePages).

Character sets

3270 terminals also support the notion of a character set, which is related to a host code page. Character sets can be either single-byte (SBCS) or double-byte (DBCS).

A single-byte character set is used to represent a language with 256 or fewer glyphs (visually distinct symbols). (The actual maximum is less than 256 because many code points are taken up by control characters and special symbols.) Each character is represented by an 8-bit value (one byte) and takes one position in the screen buffer. European languages are generally represented by single-byte character sets.

A double-byte character set is used to represent a language with more than 256 glyphs. Each character is represented by a 16-bit value (two bytes) and takes two positions in the screen buffer. Asian languages are generally represented by double-byte character sets.
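The screen-buffer arithmetic can be sketched in Python as follows. This is an illustration only: it assumes that Unicode's East Asian Wide and Fullwidth classes are a reasonable proxy for double-byte characters, which is not how the emulator itself classifies them, and it ignores every other aspect of DBCS field handling. The function name buffer_positions is hypothetical.

```python
import unicodedata


def buffer_positions(text: str) -> int:
    """Count screen buffer positions: 1 per SBCS char, 2 per DBCS char."""
    total = 0
    for ch in text:
        # Assumption: East Asian Wide/Fullwidth characters are the DBCS ones.
        is_dbcs = unicodedata.east_asian_width(ch) in ("W", "F")
        total += 2 if is_dbcs else 1
    return total
```

Under that assumption, three Latin letters occupy three buffer positions, while two Chinese characters occupy four.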

A character set is formally defined as a GCSGID (a set of glyphs) and a CPGID (a mapping between EBCDIC code points and those glyphs). For an SBCS code page, the code page number and the CPGID of the character set have the same value. For a DBCS code page, there are actually two character sets: a DBCS character set and an SBCS character set. Neither of the CPGIDs of the two character sets is the same as the code page number.

Here are some examples:

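| Code page name | Code page number | SBCS GCSGID | SBCS CPGID | DBCS GCSGID | DBCS CPGID |
|---|---|---|---|---|---|
| us | 037 | 697 | 37 | | |
| french | 297 | 697 | 297 | | |
| italian | 280 | 697 | 280 | | |
| russian | 880 | 959 | 880 | | |
| chinese-gb18030 | 1388 | 1174 | 836 | 937 | 837 |
| traditional-chinese | 937 | 1175 | 37 | 935 | 835 |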

For the SBCS examples, the code page numbers and SBCS CPGIDs are the same. US English, French and Italian use a common set of glyphs (GCSGID 697, Latin letters). Russian uses different glyphs (GCSGID 959, Cyrillic letters). GB18030 Chinese is code page 1388. Its SBCS character set uses unique glyphs (GCSGID 1174) and it includes a DBCS character set with GCSGID 937 (simplified Chinese glyphs). Traditional Chinese is code page 937. Its SBCS character set uses the same EBCDIC mapping as US English (CPGID 37), but with different glyphs. Its DBCS character set uses its own glyphs (GCSGID 935, traditional Chinese).
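The relationships in the table can be written down as data and checked mechanically. The Python sketch below uses hypothetical class names (CharacterSet, CodePage) that are not taken from the x3270 source; the values are copied from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CharacterSet:
    gcsgid: int  # set of glyphs
    cpgid: int   # mapping from EBCDIC code points to those glyphs


@dataclass(frozen=True)
class CodePage:
    name: str
    number: int
    sbcs: CharacterSet
    dbcs: Optional[CharacterSet] = None  # None for SBCS-only code pages


CODE_PAGES = [
    CodePage("us", 37, CharacterSet(697, 37)),
    CodePage("french", 297, CharacterSet(697, 297)),
    CodePage("italian", 280, CharacterSet(697, 280)),
    CodePage("russian", 880, CharacterSet(959, 880)),
    CodePage("chinese-gb18030", 1388,
             CharacterSet(1174, 836), CharacterSet(937, 837)),
    CodePage("traditional-chinese", 937,
             CharacterSet(1175, 37), CharacterSet(935, 835)),
]

for cp in CODE_PAGES:
    if cp.dbcs is None:
        # SBCS code page: code page number equals the SBCS CPGID.
        assert cp.sbcs.cpgid == cp.number
    else:
        # DBCS code page: neither CPGID equals the code page number.
        assert cp.number not in (cp.sbcs.cpgid, cp.dbcs.cpgid)
```

Both rules hold for all six example rows, so the loop completes without raising.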

See also

IBM GA23-0059-4 3270 Data Stream Programmer's Reference

Host Code Page Reference