DjVu
File extension: .djvu, .djv
MIME type: image/vnd.djvu
Type code: DJVU
Developed by: AT&T Research
Type of format: Image file formats

DjVu (pronounced déjà vu) is a computer file format designed primarily to store scanned images, especially those containing text and line drawings. It uses techniques such as separation of the image into text and background layers, progressive loading, arithmetic coding, and lossy compression for bitonal (two-color) images. This allows high-quality, readable images to be stored in very little space, so that they can be made available on the web.

DjVu has been promoted as an alternative to PDF, and it yields smaller files than PDF for most scanned documents. The DjVu developers report that color magazine pages compress to 40–70 KB, black-and-white technical papers compress to 15–40 KB, and ancient manuscripts compress to around 100 KB; all of these are significantly smaller than the typical 500 KB required for a satisfactory JPEG image. Like PDF, DjVu can contain an OCR text layer, making it easy to perform copy-and-paste and text search operations.
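The size figures quoted above can be turned into rough reduction factors. The following is only a back-of-the-envelope sketch: the numbers are the ones reported in the text, while the function name `reduction_factors` and the treatment of "around 100 KB" as a single value are assumptions made for illustration.

```python
# Sizes in kilobytes, taken from the figures reported by the DjVu developers.
JPEG_TYPICAL_KB = 500
DJVU_RANGES_KB = {
    "color magazine page": (40, 70),
    "black-and-white technical paper": (15, 40),
    "ancient manuscript": (100, 100),  # "around 100 KB", treated as one value
}


def reduction_factors(jpeg_kb, djvu_range_kb):
    """Return (smallest, largest) size-reduction factor relative to JPEG."""
    low_kb, high_kb = djvu_range_kb
    # The largest DjVu file gives the smallest reduction, and vice versa.
    return jpeg_kb / high_kb, jpeg_kb / low_kb


for kind, size_range in DJVU_RANGES_KB.items():
    smallest, largest = reduction_factors(JPEG_TYPICAL_KB, size_range)
    print(f"{kind}: {smallest:.1f}x to {largest:.1f}x smaller than JPEG")
```

These factors compare against the single "typical" JPEG size given in the text, so they indicate an order of magnitude rather than a measured result.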



The DjVu technology was originally developed by Yann LeCun, Léon Bottou, Patrick Haffner, and Paul G. Howard at AT&T Laboratories in 1996. DjVu is a free file format: the file format specification is published, as is the source code for the reference library. The ownership rights to the commercial development of the encoding software have been transferred to different companies over the years, including AT&T and LizardTech. The original authors maintain a GPL-licensed implementation named "DjVuLibre".

DjVu divides a single image into several separate images, then compresses them separately. To create a DjVu file, the initial image is first separated into three images: a background image, a foreground image, and a mask image. The background and foreground images are typically lower-resolution color images (e.g., 100 dpi); the mask image is a high-resolution bilevel image (e.g., 300 dpi) and is typically where the text is stored. The background and foreground images are then compressed using a wavelet-based compression algorithm named IW44. The mask image is compressed using a method called JB2. The JB2 encoding method identifies nearly identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape separately, and then encodes the locations where each shape appears on the page. Thus, instead of compressing a letter "e" in a given font multiple times, it compresses the letter "e" once (as a compressed bit image) and then records every place on the page it occurs.
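The shape-dictionary idea behind JB2 can be illustrated with a toy sketch. This is not the JB2 algorithm itself: it assumes the glyphs have already been segmented from the mask, it matches shapes only when they are exactly identical (JB2 also groups nearly identical shapes), and it omits the arithmetic coding stage. The names `encode_shapes` and `decode_shapes` are hypothetical.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[str, ...]          # rows of '0'/'1' characters
Placement = Tuple[int, int, int]  # (shape_id, x, y)


def encode_shapes(glyphs: List[Tuple[Bitmap, int, int]]):
    """Store each distinct bitmap once and record where it appears.

    `glyphs` is a list of (bitmap, x, y) tuples, assumed to have been
    segmented from the mask image already (segmentation is not shown).
    """
    dictionary: List[Bitmap] = []
    index: Dict[Bitmap, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        if bitmap not in index:          # first time this exact shape is seen
            index[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index[bitmap], x, y))
    return dictionary, placements


def decode_shapes(dictionary, placements, width, height):
    """Rebuild the bilevel page by stamping each shape at its positions."""
    page = [["0"] * width for _ in range(height)]
    for shape_id, x, y in placements:
        for dy, row in enumerate(dictionary[shape_id]):
            for dx, bit in enumerate(row):
                if bit == "1":
                    page[y + dy][x + dx] = "1"
    return ["".join(row) for row in page]


# Two made-up 3x3 shapes; the first one occurs twice on the page.
SHAPE_A: Bitmap = ("111", "110", "111")
SHAPE_B: Bitmap = ("111", "010", "010")
glyphs = [(SHAPE_A, 0, 0), (SHAPE_B, 4, 0), (SHAPE_A, 8, 0)]

dictionary, placements = encode_shapes(glyphs)
print(len(dictionary), "stored shapes for", len(placements), "placements")
print("\n".join(decode_shapes(dictionary, placements, width=11, height=3)))
```

The saving comes from the dictionary growing with the number of distinct shapes rather than with the number of characters on the page.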

In 2002 the DjVu file format was chosen by the Internet Archive as one of the formats (along with TIFF and PDF) in which its Million Book Project provides scanned public domain books online.

The DjVu format will be used by the One Laptop per Child project to supply existing paper books in an eBook format. The advantages of DjVu are that it is highly compressed and does not require any font support. [1]

Comparison of the DjVu and PDF file formats

The primary difference between DjVu and PDF is that DjVu is a raster format, whereas PDF is primarily a vector format. This difference has several consequences:

  • The maximum resolution of a DjVu file must be specified at creation time. On the other hand, a vector image represented by a PDF file can usually be magnified arbitrarily without loss of quality.
  • DjVu files render characters as images, without using fonts. PDF files usually render characters using fonts. Many PDF files do not embed the full representation of the necessary fonts, but simply specify their names and properties. The PDF viewer uses the exact same font if it is available. Otherwise it transforms an available font to compute an approximation with the metrics of the desired font.
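The first consequence can be made concrete with a small calculation. This is a simplified sketch: the function name `effective_dpi` is hypothetical, and the 300 dpi figure is borrowed from the mask-resolution example given earlier in the article.

```python
def effective_dpi(scan_dpi: float, zoom: float) -> float:
    """Source pixels available per displayed inch when a raster page is magnified.

    A raster image has a fixed pixel count, so magnifying it by `zoom`
    spreads the same pixels over `zoom` times the length.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    return scan_dpi / zoom


for zoom in (1, 2, 4):
    print(f"zoom {zoom}x: {effective_dpi(300, zoom):.0f} source pixels per displayed inch")
```

Vector content has no such fixed pixel budget, because it is rasterized again at whatever resolution the viewer or printer requests.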

The PDF format defines various means to store and render raster images. This capability is often used for representing scanned documents. Such PDF files suffer from the same fundamental limitations as raster formats. The size of these files depends dramatically on the underlying compression scheme. Some PDF compression schemes sometimes approach the performance of DjVu. In principle, DjVu compression could be adapted to represent raster images in PDF files. However, there is no momentum for creating such a combination because it pleases neither the proponents of DjVu nor those of PDF.

Both the DjVu and PDF formats define features that do not address the representation of the document appearance but aim at creating a document delivery platform. Both DjVu files and PDF files can be enriched with text, table of contents, hyperlinks, and metadata. The PDF format goes further by allowing sounds, interactive forms, and JavaScript programs. The DjVu format defines a protocol to transfer document pages on demand over the Internet. On the other hand, the DjVu format does not specify a way to certify the authenticity of a document or to define Digital Rights Management policies.

Which format to use (DjVu or PDF)

With PDF documents one can zoom in on vector-based content to an arbitrary depth or print it at an arbitrarily high resolution without introducing the quality loss or jaggedness inherent to raster formats. On the other hand, if a PDF is simply used as a container for non-vector images (such as scans), those images gain nothing from the format. Note also that a vector format can always be converted into a raster format, usually with irreversible data loss, but conversion in the other direction is very difficult.

PDF is most useful when the original source is an electronic document such as a Microsoft Word document or TeX file. Such documents benefit most from the vector graphics technology that underlies PDF. DjVu files can be marginally smaller but only deliver a high-resolution image, possibly enriched with the associated text.

DjVu is very good for image files, and has especially been optimized for scanned text and images. If one has a set of scanned pages from a book or article, DjVu is superior to PDF. However, PDF could be better if the scanned raster images can be transformed into high quality vector graphics, for instance by applying Optical Character Recognition to the scanned image, identifying the fonts, and carefully proofreading the resulting file. This procedure is often undesirable or time/cost prohibitive. Suitable fonts might not be available, or one may want to preserve the original document more exactly, including signatures, marginal comments, paper texture, or other markings. In such cases, DjVu is the better choice.

