The Unicode collation algorithm (UCA) provides a standard way to put names, words or strings of text in sequence according to the needs of a particular situation.
When used with the default Unicode collation element table (DUCET), this collation method produces an ordering similar to the conventional ordering rules of most European languages. In particular, for strings in the Latin alphabet, the ordering is the same as the normal sorting order in English and similar languages, since it first compares only the base letters, stripped of any modifications or diacritical marks, and considers those differences only to break ties.
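The "base letters first, marks second" idea can be illustrated with a small sketch. This is not the UCA and does not use the DUCET: it is a simplified two-level sort key built only on Python's standard unicodedata module, and the function names strip_marks and two_level_key are made up for this illustration.

```python
import unicodedata

def strip_marks(text):
    """Return text with combining marks removed (a rough stand-in for primary weights)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def two_level_key(text):
    """Illustrative sort key: base letters first, diacritics only as a tie-breaker."""
    # Level 1: base letters only, with case folded away.
    primary = strip_marks(text).casefold()
    # Level 2: the decomposed text, so accented forms follow unaccented ones.
    secondary = unicodedata.normalize("NFD", text)
    return (primary, secondary)

words = ["rosé", "role", "résumé", "rose", "resume"]
ordered = sorted(words, key=two_level_key)
print(ordered)  # accented words stay next to their unaccented counterparts
```

A plain code point sort would instead push the accented words away from their unaccented neighbours, which is the behaviour the multi-level comparison is meant to avoid. A real implementation uses per-character weights from a collation element table rather than this shortcut.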
Note: this is a complex topic and this description may contain errors. For authoritative details, consult Unicode Technical Standard #10 (http://www.unicode.org/unicode/reports/tr10/) itself.
In addition to specifying a default sorting, UTS #10 also specifies how tailorings are used to obtain the desired sorting behaviour for a particular locale.
An important open source implementation of the UCA is included with the IBM International Components for Unicode (ICU), which also supports tailoring. You can see the effects of tailoring, and a large number of language-specific tailorings, in the on-line ICU Locale Explorer.
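To give a feel for what a tailoring changes, here is a toy sketch in which a locale-specific alphabet overrides the default position of a few letters. It assumes the Swedish convention that å, ä and ö are separate letters placed after z. It does not use ICU or the tailoring syntax of UTS #10; the names SWEDISH_ALPHABET, RANK and tailored_key are hypothetical.

```python
import unicodedata

# Assumed alphabet order for the illustration: å, ä, ö follow z.
SWEDISH_ALPHABET = unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzåäö")
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def tailored_key(word):
    """Illustrative sort key that ranks letters by the tailored alphabet."""
    normalized = unicodedata.normalize("NFC", word.casefold())
    # Characters outside the alphabet sort after it, in code point order.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in normalized]

words = ["öl", "zebra", "ål", "apa", "ärta"]
ordered = sorted(words, key=tailored_key)
print(ordered)  # words starting with å, ä, ö come after "zebra"
```

Under an untailored, base-letter-first ordering, the words beginning with å and ä would instead sort among the a words and "öl" among the o words. The tailoring moves them without changing how any other letter is handled.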
A collation is a named function which takes two arbitrary length character strings (with the exception of the i;octet collation) as input and can be used to perform one or more of three basic comparison operations: equality test, substring match, and ordering test.
A collation specification MUST state which of the three basic functions are supported (equality, substring, ordering) and how to perform each of the supported functions on any two input character strings, including empty strings (with the exception of the i;octet collation).
Collations must be deterministic, i.e., given a collation with a specific name and any two fixed input strings, the result MUST be the same for the same operation.
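As a rough sketch of that definition, a collation can be modelled as a named object offering the three operations. The class below is an assumption made for illustration: the class and method names are invented, and it simply compares raw bytes, in the spirit of an octet-based collation, rather than following any particular specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OctetCollation:
    """Illustrative collation that compares inputs byte by byte."""
    name: str = "i;octet"

    def equals(self, a: bytes, b: bytes) -> bool:
        # Equality test: identical byte sequences, including two empty inputs.
        return a == b

    def contains(self, haystack: bytes, needle: bytes) -> bool:
        # Substring match: an empty needle matches any haystack.
        return needle in haystack

    def compare(self, a: bytes, b: bytes) -> int:
        # Ordering test: -1, 0 or 1, comparing byte values left to right.
        return (a > b) - (a < b)

collation = OctetCollation()
print(collation.name, collation.compare(b"abc", b"abd"))
```

Each method depends only on its two arguments, so the same inputs always give the same result, which is the determinism requirement stated above.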
The Wikipedia article included on this page is licensed under the GFDL.