Spell checker

In computing, a spell checker is an application program that flags words in a document that may not be spelled correctly. Spell checkers may be stand-alone, capable of operating on a block of text, or part of a larger application, such as a word processor, email client, electronic dictionary, or search engine.

Eye have a spelling chequer,
It came with my Pea Sea.
It plane lee marks four my revue
Miss Steaks I can knot sea.


Eye strike the quays and type a word
And weight four it two say
Weather eye am write oar wrong
It tells me straight a weigh.


Eye ran this poem threw it,
Your shore real glad two no.
Its vary polished in it's weigh.
My chequer tolled me sew.


A chequer is a bless thing,
It freeze yew lodes of thyme.
It helps me right all stiles of righting,
And aides me when eye rime.


Each frays come posed up on my screen
Eye trussed too bee a joule.
The chequer pours o'er every word
Two cheque sum spelling rule.

An unsophisticated spell checker will find little or no fault with this poem because it checks words in isolation. A more sophisticated spell checker will make use of word n-grams to consider the context in which a word occurs.


Operation

Simple spell checkers operate on individual words by comparing each of them against the contents of a dictionary, possibly performing stemming on the word. If the word is not found, it is considered to be an error, and an attempt may be made to suggest a word that was likely to have been intended. One such suggestion algorithm is to list those words in the dictionary having a small Levenshtein distance from the original word.
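The lookup-and-suggest procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names levenshtein, check_word, and DICTIONARY, the toy word list, and the max_distance threshold of 2 are all hypothetical, and stemming is left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insert, delete, substitute)."""
    # Only the previous row of the dynamic-programming table is kept.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def check_word(word, dictionary, max_distance=2):
    """Return (is_known, suggestions) for a single word."""
    lowered = word.lower()
    if lowered in dictionary:
        return True, []
    # Rank every dictionary entry by distance, closest first.
    scored = sorted((levenshtein(lowered, entry), entry) for entry in dictionary)
    return False, [entry for dist, entry in scored if dist <= max_distance]


# Toy word list (an assumption for illustration only).
DICTIONARY = {"spell", "spelling", "checker", "check", "word", "words"}

print(check_word("Spelling", DICTIONARY))
print(check_word("speling", DICTIONARY))
```

Comparing the unknown word against every dictionary entry is slow for a realistic word list; the sketch favors clarity over speed.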


When a word that is not in the dictionary is encountered, most spell checkers provide an option to add that word to a list of known exceptions that should not be flagged.
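A user exception list of this kind can be sketched as a second set that is consulted alongside the dictionary. The class name SimpleChecker and its methods are hypothetical, chosen only for this illustration.

```python
class SimpleChecker:
    """Flags words found neither in the dictionary nor in the user's exceptions."""

    def __init__(self, dictionary):
        self.dictionary = {w.lower() for w in dictionary}
        self.exceptions = set()  # words the user has chosen to accept

    def add_exception(self, word):
        """Remember a word so that it is no longer flagged."""
        self.exceptions.add(word.lower())

    def is_flagged(self, word):
        lowered = word.lower()
        return lowered not in self.dictionary and lowered not in self.exceptions


checker = SimpleChecker({"the", "operating", "system"})
print(checker.is_flagged("Kildall"))  # unknown name, so it is flagged
checker.add_exception("Kildall")
print(checker.is_flagged("Kildall"))  # accepted after being added
```

Keeping the exceptions separate from the main word list lets the user's additions be saved or discarded without touching the dictionary itself.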


Design

A spell checker customarily consists of two parts:

  1. A set of routines for scanning text and extracting words, and
  2. An algorithm for comparing the extracted words against a known list of correctly spelled words (i.e., the dictionary).

The scanning routines sometimes include language-dependent algorithms for handling morphology. Even for a lightly inflected language like English, word extraction routines will need to handle such phenomena as contractions and possessives. It is unclear whether morphological analysis provides a significant benefit. [1]
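As a rough sketch of the scanning step for English, the following Python keeps apostrophes inside a token so that contractions and possessives survive extraction, then falls back to the base word when a possessive is not listed. The regular expression, the function names, and the possessive rule are assumptions made for illustration, not a description of any particular product.

```python
import re

# Letters, optionally followed by apostrophe-joined letter groups ("isn't", "dog's").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def extract_words(text):
    """Return the word tokens found in text, in order."""
    return WORD_RE.findall(text)


def is_known(token, dictionary):
    """Look a token up directly, then retry without a trailing possessive 's."""
    lowered = token.lower()
    if lowered in dictionary:
        return True
    if lowered.endswith("'s"):
        return lowered[:-2] in dictionary
    return False


WORDS = {"the", "dog", "bowl", "isn't", "empty"}  # toy word list
tokens = extract_words("The dog's bowl isn't empty.")
print(tokens)
print([t for t in tokens if not is_known(t, WORDS)])
```

Here the contraction is stored in the word list as a whole, while the possessive is reduced to its base form; a real scanner would also need to deal with typographic apostrophes, hyphens, and digits.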


The word list might contain just a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes.


As an adjunct to these two components, the program's user interface will allow users to approve replacements and modify the program's operation.


One exception to the above paradigm is spell checkers that rely solely on statistical information, for instance n-grams. This approach usually requires a lot of effort to obtain sufficient statistical information and may require a lot more runtime storage. These methods are not currently in general use. In some cases spell checkers use a fixed list of misspellings and suggestions for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the see also entries of encyclopedias.
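To make the statistical idea concrete, here is a minimal sketch that uses character trigrams: it counts the trigrams seen in a training word list and reports what fraction of a candidate word's trigrams were never seen. The function names, the boundary markers ^ and $, and the tiny corpus are assumptions for illustration; a usable system would need far more training text, which is the cost the paragraph above refers to.

```python
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of word, with ^ and $ marking its boundaries."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(corpus_words, n=3):
    """Count every character n-gram that occurs in the training words."""
    counts = Counter()
    for word in corpus_words:
        counts.update(char_ngrams(word, n))
    return counts


def unseen_fraction(word, counts, n=3):
    """Fraction of the word's n-grams never seen in training (0.0 means all seen)."""
    grams = char_ngrams(word, n)
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if counts[gram] == 0)
    return unseen / len(grams)


counts = train(["the", "then", "there", "these", "other", "another"])
for candidate in ("there", "thier"):
    print(candidate, unseen_fraction(candidate, counts))
```

A word would be flagged when its unseen fraction exceeds some threshold; choosing that threshold is itself an empirical question.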


History

The first spell checkers were widely available on mainframe computers in the late 1970s. The first spell checkers for personal computers appeared for CP/M computers in 1980, followed by packages for the IBM PC after it was introduced in 1981. Developers such as Maria Mariani, Soft-Art, Microlytics, Proximity, Circle Noetics, and Reference Software rushed OEM packages or end-user products into the rapidly expanding software market, primarily for the PC but also for Apple Macintosh, VAX, and Unix. On the PCs, these spell checkers were standalone programs, many of which could be run in TSR mode from within word-processing packages on PCs with sufficient memory.


However, the market for standalone packages was short-lived, as by the mid-1980s developers of popular word-processing packages like WordStar and WordPerfect had incorporated spell checkers in their packages, mostly licensed from the above companies, who quickly expanded support from just English to European and eventually even Asian languages. This required increasing sophistication in the morphology routines of the software, particularly with regard to heavily inflected languages like Hungarian and Finnish. Although the size of the word-processing market in a country like Iceland might not have justified the investment of implementing a spell checker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global marketing strategy.


Recently, spell checking has moved beyond word processors: the web browser Firefox 2.0 has spell check support for user-written content, such as text entered on many webmail sites, blogs, and social networking websites. The web browsers Konqueror and Opera, the email client KMail and the instant messaging client Pidgin also offer spell checking support, transparently using GNU Aspell as their engine.


Functionality

The first spell checkers were "verifiers" instead of "correctors." They offered no suggestions for incorrectly spelled words. This was helpful for typos but it was not so helpful for logical or phonetic errors. The challenge the developers faced was the difficulty in offering useful suggestions for misspelled words. This requires reducing words to a skeletal form and applying pattern-matching algorithms.
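The article does not say how the skeletal form is built, so the following is only one plausible sketch: keep the first letter, drop later vowels, and collapse repeated consonants, then suggest dictionary words that share the same skeleton. The skeleton rule and all names here are assumptions for illustration.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def skeleton(word):
    """Reduce a word to its first letter plus its later consonants."""
    word = word.lower()
    if not word:
        return ""
    kept = [word[0]]
    for ch in word[1:]:
        if ch in VOWELS:
            continue  # vowels after the first letter are dropped
        if ch == kept[-1]:
            continue  # repeated consonants collapse, even across dropped vowels
        kept.append(ch)
    return "".join(kept)


def build_index(dictionary):
    """Group dictionary words by skeleton so lookups need no full scan."""
    index = defaultdict(list)
    for word in sorted(dictionary):
        index[skeleton(word)].append(word)
    return index


def suggest_by_skeleton(word, index):
    """Return the dictionary words whose skeleton matches that of word."""
    return index.get(skeleton(word), [])


index = build_index({"necessary", "receive", "separate"})
print(suggest_by_skeleton("neccesary", index))
print(suggest_by_skeleton("recieve", index))
```

Because vowel swaps and doubled letters do not change the skeleton, this kind of key catches some phonetic errors that would otherwise need several single-character edits to repair.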


It might seem logical that where spell-checking dictionaries are concerned, "the bigger, the better," so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. If there are more than this, incorrectly spelled words may be skipped because they are mistaken for others. For example, a linguist might determine on the basis of corpus linguistics that the word baht is more frequently a misspelling of bath or bat than a reference to the Thai currency. Hence, it would typically be more useful if a few people who write about Thai currency were slightly inconvenienced, than if the spelling errors of the many more people who discuss baths were overlooked.
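The baht example can be reproduced with two toy word lists, one with and one without the rare word. The lists and the function name flagged are hypothetical and serve only to show how a larger dictionary can hide a likely typo.

```python
def flagged(words, dictionary):
    """Return the words that are not found in the dictionary."""
    return [w for w in words if w.lower() not in dictionary]


SMALL = {"i", "took", "a", "hot", "bath", "bat"}  # omits the rare word
LARGE = SMALL | {"baht"}                          # includes it

sentence = "I took a hot baht".split()
print(flagged(sentence, SMALL))  # the probable typo is reported
print(flagged(sentence, LARGE))  # the typo passes unnoticed
```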

A screenshot of the Abiword spell checker

The first MS-DOS spell checkers were mostly used in proofing mode from within word processing packages. After preparing a document, a user scanned the text looking for misspellings. Later, however, batch processing was offered in such packages as Oracle's short-lived CoAuthor. This allowed a user to view the results after a document was processed and correct only the words that he or she knew to be wrong. When memory and processing power became abundant, spell checking was performed in the background in an interactive way, as has been the case with Sector Software's Spellbound program, released in 1987, and with Microsoft Word since Word 95.


In recent years, spell checkers have become increasingly sophisticated; some are now capable of recognizing simple grammatical errors. However, even at their best, they rarely catch all the errors in a text (such as homonym errors) and will flag neologisms and foreign words as misspellings.


Spell-checking other languages

English is unusual in that most words used in formal writing have a single spelling that can be found in a typical dictionary, with the exception of some jargon and modified words. In many languages, however, it's typical to frequently combine words in new ways. In German, compound nouns are frequently coined from other existing nouns. Some scripts do not clearly separate one word from another, requiring word-splitting algorithms. Each of these presents unique challenges to non-English language spell checkers.
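For compounding languages, one simple approach is to accept an unknown word if it can be split entirely into known parts. The sketch below does this recursively; the function name, the minimum part length, and the three-word lexicon are assumptions for illustration, and it ignores the linking elements (such as an inserted "s") that real German compounds often contain.

```python
def split_compound(word, lexicon, min_part=3):
    """Return a list of known parts that spell out word, or None if there is none."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try every split point that leaves at least min_part letters on each side.
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon, min_part)
            if rest:
                return [head] + rest
    return None


LEXICON = {"hand", "schuh", "fach"}  # toy German word list
print(split_compound("Handschuhfach", LEXICON))
print(split_compound("Handschuhfahc", LEXICON))
```

The first call finds a full decomposition, so the compound would be accepted; the second returns None, so the word would be flagged.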


Context-sensitive spell checkers

Recently, research has focused on developing algorithms that are capable of recognizing a misspelled word, even if the word itself is in the vocabulary, based on the context of the surrounding words[citation needed]. Not only does this allow words such as those in the poem above to be caught, but it also mitigates the detrimental effect of enlarging dictionaries, allowing more words to be recognized. The most common errors caught by such a system are homophone errors, such as those in the following sentence:

Their coming too sea if its reel.
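One simple way to use context, sketched here under stated assumptions, is to group easily confused words into confusion sets and prefer the member that co-occurs most often with its neighbours in a training corpus. This is a plain word-bigram count, not a reproduction of any published algorithm; the confusion sets, the three-sentence corpus, and all function names are hypothetical.

```python
from collections import Counter

CONFUSION_SETS = [
    {"their", "there", "they're"},
    {"to", "too", "two"},
    {"sea", "see"},
]


def train_bigrams(sentences):
    """Count adjacent word pairs, with <s> and </s> marking sentence boundaries."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


def context_score(prev, candidate, nxt, counts):
    """How often the candidate was seen next to these two neighbours."""
    return counts[(prev, candidate)] + counts[(candidate, nxt)]


def correct(sentence, counts):
    """Replace a word only when a confusable alternative fits the context better."""
    words = sentence.lower().split()
    padded = ["<s>"] + words + ["</s>"]
    result = []
    for i, word in enumerate(words, start=1):
        best = word
        for confusables in CONFUSION_SETS:
            if word not in confusables:
                continue
            prev, nxt = padded[i - 1], padded[i + 1]
            top = max(sorted(confusables),
                      key=lambda c: context_score(prev, c, nxt, counts))
            if context_score(prev, top, nxt, counts) > context_score(prev, word, nxt, counts):
                best = top
        result.append(best)
    return result


counts = train_bigrams([
    "i want to see the sea",
    "we want to see it too",
    "the sea is real",
])
print(correct("i want too see the sea", counts))
```

The sketch scores each word against its original neighbours, so it works best when errors are isolated; a sentence in which nearly every word is a homophone error leaves it with little reliable context.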

The most successful algorithm to date is Andrew Golding and Dan Roth's "winnow-based spelling correction algorithm", published in 1999, which is able to recognize about 96% of context-sensitive spelling errors, in addition to ordinary non-word spelling errors [2]. Context-sensitive spell checkers are likely to appear in future text-processing products.


See also

  • Nearest neighbor algorithm
  • Record linkage
  • Grammar checker

External links

  • Computer Programs for Detecting and Correcting Spelling Errors

 
 

The Wikipedia article included on this page is licensed under the GFDL.