Link rot

Link rot is the process by which links on a website gradually become irrelevant or broken over time, because the websites they point to disappear, change their content, or redirect to new locations.


The phrase also describes the effects of failing to update web pages, so that they become out of date, contain information that is old and useless, and clutter search engine results. This most frequently occurs in personal web pages and is prevalent on free web hosts such as GeoCities, where there is no financial incentive to fix link rot (most of these sites have not been updated for years).


Prevalence

The 404 "not found" response is familiar to even the occasional Web user. A number of studies have examined the prevalence of link rot on the Web, in academic literature, and in digital libraries. Fetterly et al. (2003) found that about 0.5% of web pages disappeared each week. McCown et al. (2005) found that half of the URLs cited in D-Lib Magazine articles were no longer accessible 10 years after publication, and other studies have shown link rot in academic literature to be even worse (Spinellis, 2003; Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital libraries and found that about 3% of the objects were no longer accessible after one year.
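
To get a feel for what a weekly loss rate means over several years, the sketch below assumes, purely for illustration, that pages disappear at a constant rate (simple exponential decay). The cited studies do not claim this model, and they measure different populations of URLs, so the numbers are only indicative. The function names are hypothetical.

```python
import math


def survival_fraction(weekly_loss: float, weeks: float) -> float:
    """Fraction of pages still reachable after `weeks`, assuming a constant weekly loss rate."""
    return (1.0 - weekly_loss) ** weeks


def half_life_weeks(weekly_loss: float) -> float:
    """Number of weeks until half of the pages are gone under the same assumption."""
    return math.log(0.5) / math.log(1.0 - weekly_loss)


def weekly_loss_for_half_life(weeks: float) -> float:
    """Constant weekly loss rate that would produce the given half-life."""
    return 1.0 - 0.5 ** (1.0 / weeks)


if __name__ == "__main__":
    # 0.5% per week is the figure reported for general web pages.
    print(round(half_life_weeks(0.005), 1), "weeks to lose half the pages")
    # A 10-year half-life (520 weeks) is the figure reported for D-Lib citations.
    print(round(100 * weekly_loss_for_half_life(520), 3), "% lost per week")
```

Under this assumption a 0.5% weekly loss corresponds to a half-life of roughly 138 weeks, a little over two and a half years.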


News sites contribute to the link rot problem by commonly keeping only recent news articles online where they are freely accessible at their original URLs, then removing them or moving them to a paid subscription area. This causes a heavy loss of supporting links in sites discussing newsworthy events and using news sites as references.


Discovering

Detecting link rot for a given URL is difficult using automated methods. If a URL is accessed and returns an HTTP 200 (OK) response, it may be considered accessible, but the contents of the page may have changed and may no longer be relevant. Some web servers also return a soft 404: a page delivered with a 200 (OK) response (instead of a 404) whose content indicates that the URL is no longer accessible. Bar-Yossef et al. (2004) developed a heuristic for automatically discovering soft 404s.
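
The difficulty can be illustrated with a small sketch. It is not the heuristic of Bar-Yossef et al.; it is a simplified stand-in that probes a sibling URL which almost certainly does not exist and treats an identical 200 response as a sign of a soft 404. The `Fetcher` type, the function names, and the exact-match comparison are all assumptions made for illustration, and the fetcher is injected so that the logic can be tried without network access.

```python
import uuid
from typing import Callable, Tuple
from urllib.parse import urlsplit, urlunsplit

# Hypothetical fetcher interface: takes a URL, returns (status_code, body_text).
Fetcher = Callable[[str], Tuple[int, str]]


def random_sibling(url: str) -> str:
    """Build a URL in the same directory that almost certainly does not exist."""
    parts = urlsplit(url)
    directory = parts.path.rsplit("/", 1)[0]
    bogus_path = f"{directory}/{uuid.uuid4().hex}"
    return urlunsplit((parts.scheme, parts.netloc, bogus_path, "", ""))


def classify(url: str, fetch: Fetcher) -> str:
    """Return 'dead', 'soft-404', 'ok', or 'unknown' for a URL."""
    status, body = fetch(url)
    if status in (404, 410):
        return "dead"
    if status != 200:
        return "unknown"
    # A 200 alone proves little: ask for a page that should not exist.
    probe_status, probe_body = fetch(random_sibling(url))
    if probe_status == 200 and probe_body == body:
        # The server answers a bogus URL exactly as it answers the real one.
        return "soft-404"
    return "ok"
```

Even a result of "ok" only means that the server distinguishes this URL from a nonexistent one; it says nothing about whether the content is still the content that was originally linked.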


Combating

Web archiving

To combat link rot, web archivists collect the Web, or particular portions of it, and preserve the collection in an archive, such as an archive site, for future researchers, historians, and the public. The largest web archiving organization is the Internet Archive, which strives to maintain an archive of the entire Web. National libraries, national archives, and various consortia of organizations are also involved in archiving culturally important Web content.


Individuals may also use a number of tools that allow them to archive web resources that may go missing in the future:

  • WebCite, a tool specifically for scholarly authors, journal editors, and publishers to permanently archive cited Internet references on demand and retrieve them later (Eysenbach and Trudel, 2005).
  • StayBoyStay, an on-demand archiving service that can archive any number of web pages. The new URI includes a hash and date that prove when the archive was taken and that no tampering has occurred.
  • Archive-It, a subscription service that allows institutions to build, manage, and search their own web archive.
  • hanzo:web, a personal web archiving service created by Hanzo Archives that can archive a single web resource, a cluster of web resources, or an entire website, as a one-off collection, a scheduled or repeated collection, an RSS/Atom feed collection, or an on-demand collection via Hanzo's open API.
  • Internet Archive (the Wayback Machine), which is free to use and automatically takes periodic snapshots of pages. The snapshots can be retrieved years later, without registration, simply by typing in the original URL.
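
As a small illustration of looking up a snapshot by URL, the sketch below builds a Wayback Machine address from the original URL and a date. The address pattern (`https://web.archive.org/web/<timestamp>/<url>`) reflects the Wayback Machine's addressing scheme as commonly used, where the capture closest to the timestamp is served; it is stated here as an assumption rather than taken from the article, and the function name is hypothetical.

```python
from datetime import date

# Assumed address prefix of the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, around: date) -> str:
    """Build an address asking for the snapshot of `original_url` closest to `around`."""
    stamp = around.strftime("%Y%m%d")  # timestamps are written as YYYYMMDD
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_url("http://example.com/page.html", date(2005, 6, 1)))
```

A site maintainer could use such a function to replace a dead link with a link to an archived copy dated near the time the link was first added.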


Webmasters

Webmasters have developed a number of best practices for combating link rot:

  • Avoiding unmanaged hyperlink collections
  • Avoiding links to pages deep in a website ("deep linking")
  • Using hyperlink checking software or a Content Management System (CMS) that automatically checks links
  • Using permalinks
  • Using redirection mechanisms (e.g. "301: Moved Permanently") to automatically refer browsers and crawlers to the new location of a URL
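
The redirection practice in the list above can be sketched as a lookup table of moved pages. This is a minimal illustration, not a description of any particular web server: the table contents and the function name are hypothetical. Chains of moves are collapsed so that a client would receive a single "301: Moved Permanently" response pointing at the final location.

```python
from typing import Dict, Optional

# Hypothetical table of moved pages: old path -> new path.
MOVED: Dict[str, str] = {
    "/2003/report.html": "/archive/2003/report.html",
    "/archive/2003/report.html": "/reports/2003.html",
}


def final_location(path: str, moved: Dict[str, str]) -> Optional[str]:
    """Return the path a 301 response should point to, or None if `path` has not moved."""
    seen = {path}
    target = path
    while target in moved:
        target = moved[target]
        if target in seen:
            # A loop in the table would redirect clients forever.
            raise ValueError(f"redirect loop involving {target}")
        seen.add(target)
    return None if target == path else target


if __name__ == "__main__":
    new_path = final_location("/2003/report.html", MOVED)
    if new_path is not None:
        print("301 Moved Permanently")
        print("Location:", new_path)
```

Keeping the table under version control alongside the site content makes it harder for an old URL to be forgotten when pages are reorganized.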


Authors citing URLs

A number of studies have shown how widespread link rot is in academic literature (see the references below). Authors of scholarly publications have also developed best practices for combating link rot in their work:

  • Using persistent identifiers, such as persistent uniform resource locators (PURLs) and digital object identifiers (DOIs)
  • Using a web archiving service, such as WebCite, to archive cited web pages

References

Link rot on the Web

  • Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar, and Andrew Tomkins (2004). "Sic transit gloria telae: towards an understanding of the Web’s decay". Proceedings of the 13th international conference on World Wide Web: 328-337. 
  • Tim Berners-Lee (1998). "Cool URIs Don’t Change". 
  • Gunther Eysenbach and Mathieu Trudel (2005). "Going, going, still there: using the WebCite service to permanently archive cited web pages". Journal of Medical Internet Research 7 (5). 
  • Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener (2003). "A large-scale study of the evolution of web pages". Proceedings of the 12th international conference on World Wide Web. 
  • Wallace Koehler (2004). "A longitudinal study of web pages continued: A consideration of document persistence". Information Research 9 (2). 
  • John Markwell and David W. Brooks (2002). "Broken Links: The Ephemeral Nature of Educational WWW Hyperlinks". Journal of Science Education and Technology 11 (2): 105-108. 

In academic literature

  • Robert P. Dellavalle, Eric J. Hester, Lauren F. Heilig, Amanda L. Drake, Jeff W. Kuntzman, Marla Graber, Lisa M. Schilling (2003). "Going, Going, Gone: Lost Internet References". Science 302 (5646): 787-788. 
  • Steve Lawrence, David M. Pennock, Gary William Flake, Robert Krovetz, Frans M. Coetzee, Eric Glover, Finn Arup Nielsen, Andries Kruger, C. Lee Giles (2001). "Persistence of Web References in Scientific Research". Computer 34 (2): 26-31. 
  • Frank McCown, Sheffan Chan, Michael L. Nelson, and Johan Bollen (2005). "The Availability and Persistence of Web References in D-Lib Magazine". Proceedings of the 5th International Web Archiving Workshop and Digital Preservation (IWAW'05). 
  • Carmine Sellitto (2005). "The impact of impermanent Web-located citations: A study of 123 scholarly conference publications". Journal of the American Society for Information Science and Technology 56 (7): 695-703. 
  • Diomidis Spinellis (2003). "The Decay and Failures of Web References". Communications of the ACM 46 (1): 71-77. 


In digital libraries

  • Michael L. Nelson and B. Danette Allen (2002). "Object Persistence and Availability in Digital Libraries". D-Lib Magazine 8 (1). 


External links

  • Future-Proofing Your URIs
  • Jakob Nielsen, "Fighting Linkrot", Jakob Nielsen's Alertbox, June 14, 1998.
  • Warrick - a tool for recovering lost websites from the Internet Archive and search engine caches
  • Pagefactor - small, but growing, user-contributed database of moved URLs


 
 

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.