Code page

A code page is the traditional IBM term for a specific character encoding table: a mapping in which a sequence of bits, usually a single octet representing integer values 0 through 255, is associated with a specific character. IBM and Microsoft often allocate a code page number to a character set even if that character set is better known by another name.
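As a minimal illustration of that mapping, the Python sketch below decodes the same octet under three code pages and then looks up a code page number that is an alias for a better-known name. The codec names (`cp437`, `cp866`, `cp1252`, `cp819`) are those of Python's standard library, and the sample byte 0xE9 is an arbitrary choice; neither comes from the text above.

```python
import codecs

# One octet, three code pages: the byte value is fixed, the character is not.
SAMPLE = bytes([0xE9])  # arbitrary illustrative value from the upper half


def decode_under(code_pages, data=SAMPLE):
    """Return {code page name: decoded text} for the same raw bytes."""
    return {name: data.decode(name) for name in code_pages}


results = decode_under(["cp437", "cp866", "cp1252"])
for name, text in results.items():
    print(f"{name}: 0x{SAMPLE[0]:02X} -> {text!r} (U+{ord(text):04X})")

# A code page number can simply be another label for a named character set:
# Python resolves "cp819" and "iso-8859-1" to the same codec.
print(codecs.lookup("cp819").name == codecs.lookup("iso-8859-1").name)
```

The byte never changes; only the table used to interpret it does, which is why text must travel with a record of its code page.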


Although the term code page originated with IBM's EBCDIC-based mainframe systems, it is most commonly associated with the IBM PC code pages. Microsoft, a maker of PC operating systems, refers to these as OEM code pages and supplements them with its own "ANSI" code pages.


Most well-known code pages, excluding those for the CJK languages (Chinese, Japanese, and Korean) and Vietnamese, represent character sets that fit in 8 bits and need nothing beyond a mapping from each code to a simple bitmap: they involve no combining characters, complex scripts, or similar features.


The text mode of standard (VGA-compatible) PC graphics hardware is built around an 8-bit code page (though it is possible to use two at once with some sacrifice of color depth, and up to 8 may be stored in the display adaptor for easy switching [1]). A selection of code pages could be loaded into such hardware. However, it is now commonplace for operating system vendors to provide their own character encoding and rendering systems that run in a graphics mode and bypass this system entirely. The character encodings used by these graphical systems (particularly Windows) are sometimes called code pages as well.

Relationship to ASCII

The basis of the IBM PC code pages is ASCII, a 7-bit code representing 128 characters and control codes. In the past, 8-bit implementations of the ASCII code often either set the top bit to zero or used it as a parity bit in network data transmissions. When this bit was instead made available for representing character data, another 128 characters and control codes could be represented. IBM used this extended range to encode characters used by various languages. No formal standard existed for these ‘extended character sets’; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings.
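The two uses of the top bit can be sketched in Python. The function names below are hypothetical, and even parity is assumed purely for illustration (the text does not say which parity convention was used). The second helper checks that a code page leaves the lower 128 positions identical to ASCII, as Python's standard codecs for these pages do.

```python
def with_even_parity(code: int) -> int:
    """Use bit 7 as an even-parity bit over a 7-bit ASCII code."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit ASCII code")
    ones = bin(code).count("1")
    # Set the top bit only when the 7 data bits contain an odd number of 1s.
    return (code | 0x80) if ones % 2 else code


def shares_ascii_half(code_page: str) -> bool:
    """True if bytes 0-127 decode to the same characters as ASCII."""
    return all(bytes([b]).decode(code_page) == chr(b) for b in range(128))


print(hex(with_even_parity(ord("A"))))  # 'A' has an even number of 1 bits
print(hex(with_even_parity(ord("C"))))  # 'C' has an odd number, so bit 7 is set
print({cp: shares_ascii_half(cp) for cp in ("cp437", "cp850", "cp1252")})
```

Once bit 7 carries character data instead of parity, the values 128 through 255 become the region in which the code pages differ from one another.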


IBM PC (OEM) code pages

These code pages are most often used under MS-DOS-like operating systems; they include many box-drawing characters. Since the original IBM PC code page (number 437) was not really designed for international use, several incompatible variants emerged. Microsoft refers to these as the OEM code pages. Examples include:

  • 437: the original character set of the IBM PC, circa 1981 (also known as CP437, DOS-US or OEM-US)
  • 737: Greek
  • 850: Western European languages
  • 852: Eastern European languages that use Latin script
  • 855: Cyrillic (not much used)
  • 857: Turkish
  • 858: Western European languages, with the euro sign
  • 860: Portuguese
  • 861: Icelandic (as well as other Nordic languages)
  • 863: French (mainly in Canada)
  • 865: Nordic languages (except Icelandic, for which 861 is used)
  • 866: Cyrillic, based on the alternative character set of GOST 19768-87
  • 869: Greek
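The incompatibility between these variants can be seen with a short Python sketch. It assumes the standard-library codecs named `cp437`, `cp737`, and so on correspond to the OEM code pages listed above; the helper name and the sample byte 0xB5 are illustrative choices only.

```python
OEM_PAGES = ["cp437", "cp737", "cp850", "cp852", "cp855", "cp857", "cp858",
             "cp860", "cp861", "cp863", "cp865", "cp866", "cp869"]


def decode_everywhere(byte_value: int) -> dict:
    """Decode one byte under every OEM code page in OEM_PAGES."""
    data = bytes([byte_value])
    # errors="replace" guards against positions that a page leaves undefined.
    return {cp: data.decode(cp, errors="replace") for cp in OEM_PAGES}


# The top edge of a box, drawn with code page 437 box-drawing bytes.
box_top = bytes([0xC9, 0xCD, 0xCD, 0xBB]).decode("cp437")
print(box_top)

# The same upper-half byte is a box-drawing piece on one page and a letter
# on another, which is why the variants are mutually incompatible.
for cp, ch in decode_everywhere(0xB5).items():
    print(cp, repr(ch))
```

A file written under one OEM code page and displayed under another therefore keeps its ASCII text intact but scrambles accented letters and line art.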

Other code pages of note

  • Mac OS Roman: a one-byte character encoding traditionally used by Mac OS; in Mac OS X it has been replaced with Unicode
  • MacCyrillic: used on Apple Macintosh computers to represent Cyrillic texts
  • Macintosh Central European: used on Apple Macintosh computers for Central European and Southeastern European languages that use Latin script
  • Code page 932 (also known as CP932 or Windows-31J): Microsoft's extension of Shift_JIS to include NEC special characters (Row 13), the NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)
  • GBK: an extension of the GB2312 character set for simplified Chinese characters, used in the People's Republic of China
  • Code page 949: Microsoft's implementation that appears similar to KSC 5601
  • Code page 950: Microsoft's implementation of the de facto standard Big5
  • UTF-8: a variable-length character encoding for Unicode, created by Ken Thompson and Rob Pike
  • ASMO449+: used to write Arabic (and possibly some other languages that use Arabic script) on an ASCII terminal

In modern applications, operating systems and programming languages, the IBM code pages have been rendered obsolete by newer international standards, such as ISO 8859-1 and Unicode.
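Several of the encodings above do not fit in a single octet per character. The Python sketch below, which assumes the standard-library codec names `cp932`, `gbk`, `cp949`, `cp950` and `mac_roman` and uses arbitrary sample strings, counts the bytes each one needs.

```python
SAMPLES = {
    "cp932": "日本語",  # Japanese
    "gbk": "中文",      # simplified Chinese
    "cp949": "한글",    # Korean
    "cp950": "中文",    # traditional Chinese (Big5)
}


def bytes_per_text(text: str, encoding: str) -> int:
    """Encode text, confirm it round-trips, and return the byte count."""
    encoded = text.encode(encoding)
    if encoded.decode(encoding) != text:
        raise ValueError(f"{encoding} did not round-trip {text!r}")
    return len(encoded)


for enc, text in SAMPLES.items():
    print(enc, text, bytes_per_text(text, enc), "bytes")

# ASCII characters still occupy a single byte in these multi-byte pages.
print("A".encode("gbk"))

# One accented letter under three single- or variable-length encodings.
for enc in ("mac_roman", "cp1252", "utf-8"):
    print(enc, "é".encode(enc).hex())
```

Each ideograph or syllable takes two bytes in these pages, so a decoder cannot treat every octet as a character on its own.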


Windows (ANSI) code pages

Microsoft defined a number of code pages known as the ANSI code pages (because the first one, 1252, was based on an ANSI draft of what became ISO 8859-1). Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes used in ISO 8859-1. Some of the others are based in part on other parts of ISO 8859 but are often rearranged to make them closer to 1252.
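The difference in the 0x80-0x9F range can be inspected directly. This is a small sketch using Python's standard `cp1252` and `latin-1` codecs; the helper name is hypothetical, and positions that Python's `cp1252` codec treats as unassigned are reported as `None`.

```python
def c1_range_table() -> dict:
    """Map each byte in 0x80-0x9F to its Windows-1252 character, or None."""
    table = {}
    for b in range(0x80, 0xA0):
        try:
            table[b] = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            table[b] = None  # position left unassigned by this codec
    return table


table = c1_range_table()
for b, ch in table.items():
    latin1 = bytes([b]).decode("latin-1")  # always a C1 control code
    print(f"0x{b:02X}: cp1252={ch!r} latin-1={latin1!r}")

# From 0xA0 upward the two encodings agree byte for byte.
same_upper = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
    for b in range(0xA0, 0x100)
)
print(same_upper)
```

The euro sign and the curly quotation marks live in this range, which is why they are the characters most often damaged when the two encodings are confused.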

The ANSI code pages include:

  • 1250: Eastern European languages that use Latin script, such as Polish, Czech, Slovak, Hungarian, Slovene, Croatian, Romanian and Albanian
  • 1251: languages that use the Cyrillic alphabet, such as Russian
  • 1252: Western European languages (built on ISO 8859-1)
  • 1253: modern Greek (but not polytonic Greek)
  • 1254: Turkish
  • 1255: Hebrew
  • 1256: Arabic (and possibly some other languages that use Arabic script)
  • 1257: Estonian, Latvian and Lithuanian (Windows Baltic)
  • 1258: Vietnamese

Many Microsoft products produce characters in the 0x80-0x9F range automatically, notably with ‘smart quotes’. This means that other software has to choose between:

  • not interoperating with documents produced with Microsoft applications
  • mis-rendering the text in question
  • adding support for the Microsoft code pages, in effect making Microsoft's implementation a de facto standard.

Microsoft applications also mislabeled text in Windows-1252 as ISO-8859-1, and many Windows-based developers, unaware of the issues involved, followed their example. Whilst current Microsoft applications seem to label Windows-1252 text correctly when they can (such as when sending e-mail), they still allow these characters to be read and written (e.g., through forms) on websites declared as ISO-8859-1. The most popular competing web browsers do so too, favoring compatibility over standards compliance.
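The compatibility behaviour described above can be sketched as a decoder that quietly treats an ISO-8859-1 label as Windows-1252. The alias table and function name below are hypothetical and are not taken from any particular browser or application.

```python
# Hypothetical alias table: labels that a lenient reader upgrades to cp1252.
LENIENT_ALIASES = {"iso-8859-1": "cp1252", "latin-1": "cp1252"}


def decode_declared(data: bytes, declared: str, lenient: bool = True) -> str:
    """Decode data using its declared label, optionally applying the aliases."""
    label = declared.strip().lower()
    codec = LENIENT_ALIASES.get(label, label) if lenient else label
    return data.decode(codec)


# Smart quotes written by a Windows application, then mislabeled.
page = "\u201cquoted\u201d".encode("cp1252")

strict = decode_declared(page, "ISO-8859-1", lenient=False)
lenient = decode_declared(page, "ISO-8859-1")
print(repr(strict))   # quotes arrive as invisible C1 control codes
print(repr(lenient))  # quotes survive
```

The strict path is the standards-compliant one, yet it is the lenient path that shows the reader what the author typed.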


These code pages were sometimes viewed as part of Microsoft's "embrace, extend and extinguish" strategy towards open standards, though something as simple as an 8-bit character table could never really be kept proprietary. On the other hand, because standards bodies had decided not to assign graphical characters to the upper-half control-character positions 80–9F, which are hardly used in practice for control functions, 12.5% of the available code positions (32 of 256) were wasted.


Private code pages

Early in the history of personal computers, users whose character encoding requirements were not met created private or local code pages, using Terminate and Stay Resident (TSR) utilities or by reprogramming BIOS EPROMs. In some cases, unofficial code page numbers were invented (e.g., cp895).
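In software terms, a private code page amounts to a 256-entry table in which a few positions have been reassigned. The Python sketch below starts from the standard `cp437` codec and overrides two positions; the function names and the overrides are purely hypothetical and do not reproduce the layout of any real private code page.

```python
def make_private_page(base: str, overrides: dict) -> list:
    """Build a 256-entry decoding table from a base page plus overrides."""
    table = [bytes([b]).decode(base, errors="replace") for b in range(256)]
    for position, char in overrides.items():
        table[position] = char
    return table


def decode_private(data: bytes, table: list) -> str:
    """Decode bytes by looking each one up in the table."""
    return "".join(table[b] for b in data)


def encode_private(text: str, table: list) -> bytes:
    """Encode text by reversing the table; raises KeyError if unmapped."""
    reverse = {ch: b for b, ch in enumerate(table)}
    return bytes(reverse[ch] for ch in text)


# Hypothetical assignments only; NOT the layout of any real code page.
PRIVATE = make_private_page("cp437", {0x80: "Č", 0x81: "ř"})

raw = bytes([0x80, 0x41, 0x81])
print(decode_private(raw, PRIVATE))
print(encode_private("Čař", PRIVATE).hex())
```

Because nothing in the bytes identifies the table, text written under such a page is only readable on machines that carry the same local modification.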


When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets. Another such character set is the Iran System encoding standard, an 8-bit encoding created by the Iran System corporation for Persian language support. It was used in Iran in DOS-based programs and became obsolete after the introduction of Microsoft code page 1256. However, some Windows and DOS programs that use this encoding are still in use, and some Windows fonts with this encoding exist.


See also

  • Character encoding

The Wikipedia article included on this page is licensed under the GFDL.