Information retrieval

Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand-alone databases or hypertextually networked databases such as the World Wide Web. There is a common confusion, however, between data retrieval, document retrieval, information retrieval, and text retrieval, and each of these has its own bodies of literature, theory, praxis and technologies. IR is interdisciplinary, based on computer science, mathematics, library science, information science, information architecture, cognitive psychology, linguistics, statistics and physics.


Automated IR systems are used to reduce information overload. Many universities and public libraries use IR systems to provide access to books, journals, and other documents. Web search engines are the most visible IR applications.


History

But do you know that, although I have kept the diary [on a phonograph] for months past, it never once struck me how I was going to find any particular part of it in case I wanted to look it up?

—Dr Seward, Bram Stoker's Dracula, 1897

The idea of using computers to search for relevant pieces of information was popularized in the article As We May Think by Vannevar Bush in 1945.[1] The first implementations of information retrieval systems were introduced in the 1950s and 1960s. By 1990 several different techniques had been shown to perform well on small text corpora (several thousand documents).[1]


In 1992 the US Department of Defense, along with the National Institute of Standards and Technology (NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. Its aim was to support the information retrieval community by supplying the infrastructure needed to evaluate text retrieval methodologies on a very large text collection. This catalyzed research on methods that scale to huge corpora. The introduction of web search engines has boosted the need for very large scale retrieval systems even further.


The use of digital methods for storing and retrieving information has led to the phenomenon of digital obsolescence, where a digital resource ceases to be readable because the physical media, the reader required to read the media, the hardware, or the software that runs on it is no longer available. The information is initially easier to retrieve than if it were on paper, but is then effectively lost.


Timeline

  • 1890: Hollerith tabulating machines were used to analyze the US census. (Herman Hollerith).
  • 1945: Vannevar Bush's As We May Think appeared in Atlantic Monthly
  • Late 1940s: The US military confronted problems of indexing and retrieval of wartime scientific research documents captured from the Germans.
  • 1947: Hans Peter Luhn (research engineer at IBM since 1941) began work on a mechanized, punch card based system for searching chemical compounds.
  • 1950: The term "information retrieval" may have been coined by Calvin Mooers.
  • 1950s: Growing concern in the US for a "science gap" with the USSR motivated, encouraged funding, and provided a backdrop for mechanized literature searching systems (Allen Kent et al) and the invention of citation indexing (Eugene Garfield).
  • 1955: Allen Kent joined Case Western Reserve University, and eventually became associate director of the Center for Documentation and Communications Research. That same year, Kent and colleagues published a paper in American Documentation describing the precision and recall measures, as well as detailing a proposed "framework" for evaluating an IR system, which included statistical sampling methods for determining the number of relevant documents not retrieved.
  • 1958: International Conference on Scientific Information Washington DC included consideration of IR systems as a solution to problems identified. See: Proceedings of the International Conference on Scientific Information, 1958 (National Academy of Sciences, Washington, DC, 1959)
  • 1959: Hans Peter Luhn published "Auto-encoding of documents for information retrieval."
  • 1960: Melvin Earl (Bill) Maron and J. L. Kuhns published "On relevance, probabilistic indexing, and information retrieval" in Journal of the ACM 7(3):216-244, July 1960.
  • Early 1960s: Gerard Salton began work on IR at Harvard, later moved to Cornell.
  • 1962: Cyril W. Cleverdon published early findings of the Cranfield studies, developing a model for IR system evaluation. See: Cyril W. Cleverdon, "Report on the Testing and Analysis of an Investigation into the Comparative Efficiency of Indexing Systems". Cranfield Coll. of Aeronautics, Cranfield, England, 1962.
  • 1962: Kent published Information Analysis and Retrieval
  • 1963: Weinberg report "Science, Government and Information" gave a full articulation of the idea of a "crisis of scientific information." The report was named after Dr. Alvin Weinberg.
  • 1963: Joseph Becker and Robert M. Hayes published text on information retrieval. Becker, Joseph; Hayes, Robert Mayo. Information storage and retrieval: tools, elements, theories. New York, Wiley (1963).
  • 1964: Karen Spärck Jones finished her thesis at Cambridge, Synonymy and Semantic Classification, and continued work on computational linguistics as it applies to IR
  • 1964: The National Bureau of Standards sponsored a symposium titled "Statistical Association Methods for Mechanized Documentation." It included several highly significant papers, among them what is believed to be G. Salton's first published reference to the SMART system.
  • Mid-1960s: National Library of Medicine developed MEDLARS Medical Literature Analysis and Retrieval System, the first major machine-readable database and batch retrieval system
  • Mid-1960s: Project Intrex at MIT
  • 1965: J. C. R. Licklider published Libraries of the Future
  • 1966: Don Swanson was involved in studies at University of Chicago on Requirements for Future Catalogs
  • 1968: Gerard Salton published Automatic Information Organization and Retrieval.
  • 1968: J. W. Sammon's RADC Tech report "Some Mathematics of Information Storage and Retrieval..." outlined the vector model.
  • 1969: Sammon's "A nonlinear mapping for data structure analysis" (IEEE Transactions on Computers) was the first proposal for a visualization interface to an IR system.
  • Late 1960s: F. W. Lancaster completed evaluation studies of the MEDLARS system and published the first edition of his text on information retrieval
  • Early 1970s: First online systems: NLM's AIM-TWX and MEDLINE; Lockheed's Dialog; SDC's ORBIT
  • Early 1970s: Theodor Nelson promoted the concept of hypertext and published Computer Lib/Dream Machines
  • 1971: N. Jardine and C. J. Van Rijsbergen published "The use of hierarchic clustering in information retrieval", which articulated the "cluster hypothesis." (Information Storage and Retrieval, 7(5), pp. 217-240, Dec 1971)
  • 1975: Three highly influential publications by Salton fully articulated his vector processing framework and term discrimination model:
    • A Theory of Indexing (Society for Industrial and Applied Mathematics)
    • "A theory of term importance in automatic text analysis", (JASIS v. 26)
    • "A vector space model for automatic indexing", (CACM 18:11)
  • 1978: The First ACM SIGIR conference.
  • 1979: C. J. Van Rijsbergen published Information Retrieval (Butterworths). Heavy emphasis on probabilistic models.
  • 1980: First international ACM SIGIR conference, joint with British Computer Society IR group in Cambridge
  • 1982: Belkin, Oddy, and Brooks proposed the ASK (Anomalous State of Knowledge) viewpoint for information retrieval. This was an important concept, though their automated analysis tool proved ultimately disappointing.
  • 1983: Salton (and M. McGill) published Introduction to Modern Information Retrieval (McGraw-Hill), with heavy emphasis on vector models.
  • Mid-1980s: Efforts to develop end user versions of commercial IR systems.
  • 1985-1993: Key papers on and experimental systems for visualization interfaces, including work by D. B. Crouch, Robert R. Korfhage, M. Chalmers, A. Spoerri and others.
  • 1989: First World Wide Web proposals by Tim Berners-Lee at CERN.
  • 1992: First TREC conference.
  • 1997: Publication of Korfhage's Information Storage and Retrieval[2] with emphasis on visualization and multi-reference point systems.
  • Late 1990s: Web search engine implementation of many features formerly found only in experimental IR systems


Overview

An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevancy.


An object is an entity which keeps or stores information in a database. User queries are matched to objects stored in the database. Depending on the application the data objects may be, for example, text documents, images or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates.


Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top-ranking objects are then shown to the user. The process may be iterated if the user wishes to refine the query.
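
As a rough illustration of this scoring-and-ranking loop, the following Python sketch scores a query against a toy collection of document surrogates. The term-overlap score, the document identifiers and the tiny collection are illustrative assumptions only; real systems use ranking models such as those described under "Model types" below.

    # Minimal sketch of the ranked-retrieval loop described above.
    # The term-overlap score is a toy stand-in for a real ranking model.
    def retrieve(query_terms, documents, top_k=10):
        """Score every document against the query and return the top-ranked ones."""
        def score(doc_terms):
            # Count how many query terms occur in the document surrogate.
            return sum(1 for term in query_terms if term in doc_terms)

        ranked = sorted(documents.items(), key=lambda item: score(item[1]), reverse=True)
        return [(doc_id, score(terms)) for doc_id, terms in ranked[:top_k]]

    # Example: document surrogates represented as sets of terms.
    docs = {
        "d1": {"information", "retrieval", "systems"},
        "d2": {"database", "query", "language"},
        "d3": {"web", "search", "information"},
    }
    print(retrieve({"information", "retrieval"}, docs))  # d1 ranks first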


Performance measures

Main article: Precision and Recall

Many different measures for evaluating the performance of information retrieval systems have been proposed. The measures require a collection of documents and a query. All common measures described here assume a ground truth notion of relevancy: every document is known to be either relevant or non-relevant to a particular query. In practice queries may be ill-posed and there may be different shades of relevancy.


Precision

Precision is the fraction of the documents retrieved that are relevant to the user's information need.

\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}

In binary classification, precision is analogous to positive predictive value. Precision takes all retrieved documents into account. It can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n or P@n.


Note that the meaning and usage of "precision" in the field of information retrieval differs from the definition of accuracy and precision within other branches of science and technology.
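
A small Python sketch of these two quantities, assuming the ranked result list and the set of relevant documents are already known; the document identifiers are made up for illustration.

    def precision(retrieved, relevant):
        """Fraction of retrieved documents that are relevant."""
        retrieved, relevant = set(retrieved), set(relevant)
        return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

    def precision_at_n(ranked_results, relevant, n):
        """Precision over only the top n results (P@n)."""
        return sum(1 for doc in ranked_results[:n] if doc in relevant) / n

    ranked = ["d3", "d1", "d7", "d2", "d9"]
    relevant = {"d1", "d2", "d4"}
    print(precision(ranked, relevant))          # 2/5 = 0.4
    print(precision_at_n(ranked, relevant, 3))  # 1/3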


Recall

Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.

\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}

In binary classification, recall is called sensitivity; it can be viewed as the probability that a relevant document is retrieved by the query.


It is trivial to achieve a recall of 100% by returning all documents in response to any query. Recall alone is therefore not enough; one also needs to measure the number of non-relevant documents retrieved, for example by computing the precision.
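
Continuing the same toy example, recall only changes the denominator of the precision computation; the identifiers remain illustrative.

    def recall(retrieved, relevant):
        """Fraction of relevant documents that were retrieved."""
        retrieved, relevant = set(retrieved), set(relevant)
        return len(retrieved & relevant) / len(relevant) if relevant else 0.0

    retrieved = {"d3", "d1", "d7", "d2", "d9"}
    relevant = {"d1", "d2", "d4"}
    print(recall(retrieved, relevant))  # 2/3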


Fall-Out

The proportion of non-relevant documents that are retrieved, out of all non-relevant documents available:

\text{fall-out} = \frac{|\{\text{non-relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{non-relevant documents}\}|}

In binary classification, fall-out is closely related to specificity. More precisely: fall-out = 1 − specificity. It can be viewed as the probability that a non-relevant document is retrieved by the query.


It is trivial to achieve fall-out of 0% by returning zero documents in response to any query.
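
Fall-out follows the same set-based pattern as the measures above; a short sketch, assuming the set of non-relevant documents is known.

    def fall_out(retrieved, non_relevant):
        """Fraction of non-relevant documents that were retrieved."""
        retrieved, non_relevant = set(retrieved), set(non_relevant)
        return len(retrieved & non_relevant) / len(non_relevant) if non_relevant else 0.0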


F-measure

Main article: F-score

The traditional F-measure or balanced F-score is the weighted harmonic mean of precision and recall:

F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}

This is also known as the F1 measure, because recall and precision are evenly weighted.


The general formula for non-negative real β is:

F_\beta = \frac{(1 + \beta^2) \cdot \text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}

Two other commonly used F measures are the F2 measure, which weights recall twice as much as precision, and the F0.5 measure, which weights precision twice as much as recall.


The F-measure was derived by van Rijsbergen (1979) so that F_β "measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as precision". It is based on van Rijsbergen's effectiveness measure E = 1 − 1/(α/P + (1 − α)/R). Their relationship is F_β = 1 − E, where α = 1/(β² + 1).
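
The following sketch computes the balanced F1 as well as the weighted F_β from precision and recall values; the sample numbers are illustrative.

    def f_measure(precision, recall, beta=1.0):
        """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
        if precision == 0.0 and recall == 0.0:
            return 0.0
        return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

    p, r = 0.4, 0.67
    print(f_measure(p, r))            # balanced F1
    print(f_measure(p, r, beta=2))    # F2: weights recall more heavily
    print(f_measure(p, r, beta=0.5))  # F0.5: weights precision more heavily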


Average precision

Precision and recall are based on the whole list of documents returned by the system. Average precision emphasizes returning more relevant documents earlier. It is the average of the precisions computed after truncating the list after each of the relevant documents in turn:

\operatorname{AveP} = \frac{\sum_{r=1}^{N} \big(P(r) \times \text{rel}(r)\big)}{\text{number of relevant documents}}

where r is the rank, N the number of retrieved documents, rel(r) a binary function indicating whether the document at rank r is relevant, and P(r) the precision at cut-off rank r.
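
A sketch of this computation over a ranked result list, reusing the toy identifiers from the earlier examples.

    def average_precision(ranked_results, relevant):
        """Average of the precision values taken at the rank of each relevant document."""
        hits, precisions = 0, []
        for rank, doc in enumerate(ranked_results, start=1):
            if doc in relevant:
                hits += 1
                precisions.append(hits / rank)  # P(r) at this relevant rank
        return sum(precisions) / len(relevant) if relevant else 0.0

    ranked = ["d3", "d1", "d7", "d2", "d9"]
    relevant = {"d1", "d2", "d4"}
    print(average_precision(ranked, relevant))  # (1/2 + 2/4) / 3 = 1/3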


Model types

[Figure: categorization of IR models (translated from the German entry, original source Dominik Kuropka)]

For information retrieval to be efficient, the documents are typically transformed into a suitable representation. There are several such representations. The figure above illustrates the relationship of some common models, categorized along two dimensions: the mathematical basis and the properties of the model.


First dimension: mathematical basis

  • Set-theoretic models represent documents as sets of words or phrases. Similarities are usually derived from set-theoretic operations on those sets. A common example is the standard Boolean model and its extensions.
  • Algebraic models represent documents and queries as vectors, matrices or tuples. The similarity of the query vector and document vector is computed as a scalar value. Common models are the vector space model and latent semantic analysis (see the sketch after this list).
  • Probabilistic models treat document retrieval as probabilistic inference: similarities are computed as the probability that a document is relevant to a given query, typically using results such as Bayes' theorem. Common models are the probabilistic relevance model and statistical language models.
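
As a concrete illustration of the algebraic family, the sketch below builds tf-idf vectors for a toy corpus and ranks documents by cosine similarity to the query. The corpus, the raw-count term weighting and the identifiers are illustrative assumptions rather than a canonical implementation of the vector space model.

    import math
    from collections import Counter

    # Toy corpus for a vector space model sketch: documents and the query are
    # represented as tf-idf vectors and compared by cosine similarity.
    docs = {
        "d1": "information retrieval systems retrieve information",
        "d2": "database systems store structured data",
        "d3": "web search engines retrieve web documents",
    }

    def tf_idf_vector(text, idf):
        """Raw term counts weighted by inverse document frequency."""
        tf = Counter(text.split())
        return {term: count * idf.get(term, 0.0) for term, count in tf.items()}

    def cosine(u, v):
        """Cosine of the angle between two sparse vectors stored as dicts."""
        dot = sum(weight * v[term] for term, weight in u.items() if term in v)
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    # Inverse document frequency over the toy corpus.
    n_docs = len(docs)
    df = Counter(term for text in docs.values() for term in set(text.split()))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    query_vec = tf_idf_vector("information retrieval", idf)
    for doc_id, text in docs.items():
        print(doc_id, round(cosine(query_vec, tf_idf_vector(text, idf)), 3))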

Second dimension: properties of the model

  • Models without term interdependencies treat different terms/words as independent. This is usually represented in vector space models by the orthogonality assumption on term vectors, and in probabilistic models by an independence assumption on term variables.
  • Models with immanent term interdependencies allow a representation of interdependencies between terms. However, the degree of interdependency between two terms is defined by the model itself. It is usually directly or indirectly derived (e.g. by dimensionality reduction) from the co-occurrence of those terms in the whole set of documents.
  • Models with transcendent term interdependencies allow a representation of interdependencies between terms, but do not allege how the interdependency between two terms is defined. They rely on an external source for the degree of interdependency between two terms (for example, a human or a sophisticated algorithm).


Major figures

  • Gerard Salton
  • Hans Peter Luhn
  • Karen Spärck Jones
  • C. J. van Rijsbergen

Awards in the field

  • Tony Kent Strix award
  • Gerard Salton Award

See also

  • Adversarial information retrieval
  • Information retrieval applications
  • Clustering
  • Controlled vocabulary
  • Full text search
  • Information extraction
  • Information science
  • Knowledge visualization
  • Relevance
  • Relevance feedback
  • Search engine indexing
  • tf-idf

References

  1. ^ a b Singhal, Amit (2001). "Modern Information Retrieval: A Brief Overview". Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 24 (4): 35-43. 
  2. ^ Korfhage, Robert R. (1997). Information Storage and Retrieval. Wiley, 368 pages. ISBN 978-0-471-14338-3. 

