Language tag

IETF language tags are defined by BCP 47, which is currently RFC 4646 and RFC 4647. These language tags are used in a number of modern standards, such as HTTP[1], HTML[2], XML[3] and PNG[4].

Each language tag is composed of one or more "subtags" separated by hyphens. With the exception of private use language tags and grandfathered language tags, the subtags occur in the following order:

  • a language subtag (potentially followed by up to three extended language subtags)
  • an optional script subtag
  • an optional region subtag
  • optional variant subtags
  • optional extension subtags
  • optional private use subtags
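The subtag order listed above can be sketched as a regular expression. This is a simplified illustration, not a complete validator: it assumes a two- or three-letter primary language subtag, does not handle grandfathered or wholly private-use tags, and does not check subtags against the registry. The names `LANGTAG` and `parse_language_tag` are hypothetical.

```python
import re

# Simplified pattern following the subtag order described above.
# Assumption: grandfathered tags and tags that are entirely private use
# are out of scope, and no registry lookup is performed.
LANGTAG = re.compile(
    r"""
    (?P<language>[a-z]{2,3})                                # language subtag
    (?P<extlang>(?:-[a-z]{3}){0,3})                         # up to three extended language subtags
    (?:-(?P<script>[a-z]{4}))?                              # optional script subtag
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?                     # optional region subtag
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)  # optional variant subtags
    (?P<extensions>(?:-[a-wy-z0-9](?:-[a-z0-9]{2,8})+)*)    # optional extension subtags
    (?:-x(?P<private>(?:-[a-z0-9]{1,8})+))?                 # optional private use subtags
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_language_tag(tag):
    """Split a tag into its components, or return None if it does not fit the pattern."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None

    def split(part):
        # Turn "-a-b" into ["a", "b"]; a missing group becomes an empty list.
        return [p for p in (part or "").split("-") if p]

    return {
        "language": m.group("language"),
        "extlang": split(m.group("extlang")),
        "script": m.group("script"),
        "region": m.group("region"),
        "variants": split(m.group("variants")),
        "extensions": split(m.group("extensions")),  # flat list: singleton, then its subtags
        "private": split(m.group("private")),
    }


print(parse_language_tag("zh-Hant-TW"))
```

Under these assumptions, a tag such as en-CA yields a language of en and a region of CA, with every other component empty.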

Language subtags are mainly derived from ISO 639-1 and ISO 639-2, script subtags from ISO 15924, and region subtags from ISO 3166-1 alpha-2 and UN M.49. Variant subtags are not derived from any standard. No extension subtags have yet been defined. The Language Subtag Registry, maintained by IANA, lists the current valid public subtags.

The most commonly seen language tags consist of just a language subtag, or a language subtag and a region subtag. For example, en represents English, and consists of a single language subtag (from ISO 639-1), while en-CA represents Canadian English, and consists of the language subtag en followed by the region subtag CA (from ISO 3166-1).

Subtags are not case sensitive, but the specification recommends using the same case as in the Language Subtag Registry, where region subtags are uppercase, script subtags are titlecase and all other subtags are lowercase. This capitalization follows the recommendations of the underlying ISO standards.
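The recommended capitalization can be applied with a short heuristic. The sketch below assumes a well-formed tag and infers each subtag's role from its length and position (a two-letter subtag after the first is treated as a region, a four-letter alphabetic subtag as a script); it does not consult the registry. The function name `normalize_case` is hypothetical.

```python
def normalize_case(tag):
    """Apply the recommended casing to a well-formed language tag.

    Heuristic, assuming the tag is well-formed:
    - the first subtag and anything after a single-character subtag
      (extension or private use) are lowercased;
    - a later four-letter alphabetic subtag is treated as a script (titlecase);
    - a later two-letter alphabetic subtag is treated as a region (uppercase);
    - everything else is lowercased.
    """
    result = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        if len(subtag) == 1:
            # Singleton such as "x": all following subtags stay lowercase.
            after_singleton = True
            result.append(subtag.lower())
        elif index == 0 or after_singleton:
            result.append(subtag.lower())
        elif len(subtag) == 4 and subtag.isalpha():
            result.append(subtag.title())
        elif len(subtag) == 2 and subtag.isalpha():
            result.append(subtag.upper())
        else:
            result.append(subtag.lower())
    return "-".join(result)


print(normalize_case("ZH-hant-tw"))
```

Because subtags are not case sensitive, this normalization changes only the presentation of a tag, never its meaning.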

History

IETF language tags were first defined in RFC 1766, published in March 1995. In January 2001 this was superseded by RFC 3066, which added the use of ISO 639-2 codes (whereas previously only ISO 639-1 codes had been allowed), permitted subtags with digits for the first time, and adopted the concept of language ranges from HTTP/1.1 to help with matching of language tags.

The next revision of the specification came in September 2006 with the publication of RFC 4646 (the main part of the specification) and RFC 4647 (which deals with matching behaviour). RFC 4646 introduced a more structured format for language tags and replaced the old register of tags with a new register of subtags that utilizes ISO 15924 and UN M.49 in addition to the previously used ISO 639 and ISO 3166. The small number of previously defined tags that did not conform to the new structure were grandfathered in order to maintain compatibility with RFC 3066.
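To illustrate the kind of matching that language ranges enable, here is a minimal sketch of prefix-based filtering in the style of the basic filtering scheme in RFC 4647: a range matches a tag if the two are equal ignoring case, or if the range is a prefix of the tag and the next character in the tag is a hyphen, with "*" matching any tag. The function names `range_matches` and `filter_tags` are hypothetical, and the other matching schemes in RFC 4647 are not covered.

```python
def range_matches(language_range, tag):
    """Return True if a basic language range matches a language tag."""
    range_lower = language_range.lower()
    tag_lower = tag.lower()
    if range_lower == "*":
        # The wildcard range matches every tag.
        return True
    # Exact match, or the range is a prefix ending at a subtag boundary.
    return tag_lower == range_lower or tag_lower.startswith(range_lower + "-")


def filter_tags(language_range, tags):
    """Keep only the tags matched by the given language range."""
    return [tag for tag in tags if range_matches(language_range, tag)]


print(filter_tags("en", ["en", "en-CA", "eng", "fr-CA"]))
```

The hyphen check is what keeps the range en from matching an unrelated tag such as eng, which merely begins with the same letters.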

An IETF Working Group is currently preparing the next version of the specification. The main purpose of this revision is to incorporate codes from ISO 639-3 into the Language Subtag Registry.[5]

Relation to other standards

Although subtags are often derived from ISO standards, they do not track these standards exactly, because doing so could allow the meaning of a language tag to change over time.

In particular, a subtag derived from a code assigned by ISO 639, ISO 15924 or ISO 3166 remains a valid (though deprecated) subtag even if the code is withdrawn from the corresponding ISO standard. If the ISO standard later assigns a new meaning to the withdrawn code, the corresponding subtag will still retain its old meaning.

This stability was introduced in RFC 4646. Before RFC 4646, changes in the meaning of ISO codes could cause changes in the meaning of language tags.

Issues with ISO 3166-1 and UN M.49

If a new ISO 3166-1 alpha-2 code would conflict with an existing region subtag (due to the code having previously had a different meaning), a UN M.49 code can be used instead. This rule was introduced in RFC 4646 and so far there has been no need to use it. UN M.49 is also the source for region subtags such as 005 for South America, as ISO 3166 does not provide codes for supranational regions.

Relation to ISO 639-3

RFC 4646, unlike its predecessors, defines the concept of an "extended language subtag", although it does not permit the registration of such subtags. The next version of the specification (currently in draft) is expected to require certain ISO 639-3 codes to be registered as extended language subtags, and to require other ISO 639-3 codes to be registered as (primary) language subtags.[6]

See also

  • Language code (a system that assigns short letter codes to languages)


References

  1. RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1, section 3.10
  2. HTML 4.01 Specification, section 8.1
  3. Extensible Markup Language (XML) 1.0 (Fourth Edition), section 2.12
  4. Portable Network Graphics (PNG) Specification (Second Edition), section
  5. Language Tag Registry Update charter
  6. Draft of RFC 4646bis

The Wikipedia article included on this page is licensed under the GFDL.