Data compression

In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, by means of specific encoding schemes. For example, this article could be encoded with fewer bits if one were to accept the convention that the word "compression" be encoded as "comp." One popular instance of compression with which many computer users are familiar is the ZIP file format, which, as well as providing compression, acts as an archiver, storing many files in a single output file.


As with any communication, compressed data communication only works when both the sender and receiver of the information understand the encoding scheme. For example, this text makes sense only if the receiver understands that it is intended to be interpreted as characters representing the English language. Similarly, compressed data can only be understood if the decoding method is known by the receiver.


Compression is useful because it helps reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. On the downside, compressed data must be decompressed to be used, and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed (the option of decompressing the video in full before watching it may be inconvenient, and requires storage space for the decompressed video). The design of data compression schemes therefore involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and decompress the data.


Lossless vs. Lossy Compression

Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender's data more concisely, but nevertheless perfectly. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.
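One way to put a number on the statistical redundancy described above is the zero-order Shannon entropy of the symbols in a text. The following is a minimal sketch rather than anything taken from the article: the function name `entropy_bits_per_symbol` and the two sample strings are illustrative assumptions.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text: str) -> float:
    """Zero-order Shannon entropy of the symbols in `text`, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


skewed = "eeeeeeeez"   # one symbol dominates, as 'e' does relative to 'z' in English
uniform = "abcdefgh"   # eight symbols, all equally likely

print(entropy_bits_per_symbol(skewed))   # well below 1 bit per symbol
print(entropy_bits_per_symbol(uniform))  # 3.0 bits per symbol
```

The more skewed the symbol frequencies, the lower the entropy, and the more room a lossless coder has to shorten the representation. This measure ignores context (such as which letter follows 'q'), so it understates the redundancy of real text.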


Another kind of compression, called lossy data compression, is possible if some loss of fidelity is acceptable. For example, a person viewing a picture or television video scene might not notice if some of its finest details are removed or not represented perfectly (i.e. may not even notice compression artifacts). Similarly, two clips of audio may be perceived as the same to a listener even though one is missing details found in the other. Lossy data compression algorithms introduce relatively minor differences and represent the picture, video, or audio using fewer bits.


Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.


However, lossless data compression algorithms will always fail to compress some files; indeed, any compression algorithm will necessarily fail to compress any data containing no discernible patterns. Attempts to compress data that has been compressed already will therefore usually result in an expansion, as will attempts to compress encrypted data.
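The effect is easy to observe with a general-purpose compressor. The sketch below uses Python's standard `zlib` module (a DEFLATE implementation); the sample inputs are assumptions chosen for illustration, with `os.urandom` standing in for encrypted or otherwise patternless data.

```python
import os
import zlib

redundant = b"abcabcabc" * 1000   # 9000 bytes of highly patterned input
random_like = os.urandom(9000)    # 9000 bytes with no discernible patterns

packed = zlib.compress(redundant)
noise = zlib.compress(random_like)

print(len(redundant), len(packed))   # the patterned input shrinks dramatically
print(len(random_like), len(noise))  # the random input comes out slightly larger

# Lossless: the original bytes are recovered exactly.
print(zlib.decompress(packed) == redundant)  # True
```

The small growth on random input comes from the container's own header and checksum bytes, which cannot be offset when there is no redundancy to remove.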


In practice, lossy data compression will also come to a point where compressing again does not work, although an extremely lossy algorithm, which for example always removes the last byte of a file, will always compress a file up to the point where it is empty.


An example of lossless vs. lossy compression is the following string:

888883333333

This string can be compressed as:

8[5]3[7].

Interpreted as "5 eights, 7 threes", this shorter form allows the original string to be recreated perfectly. In a lossy system, using

83

instead, the original data is lost, in exchange for a smaller file size.
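The example above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the `value[count]` notation is simply the one used in the example, not a standard format.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of identical characters with 'char[count]'."""
    return "".join(f"{char}[{len(list(run))}]" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand every 'char[count]' token back into the original run."""
    tokens = re.findall(r"(.)\[(\d+)\]", encoded, flags=re.DOTALL)
    return "".join(char * int(count) for char, count in tokens)


def lossy_encode(text: str) -> str:
    """Keep one character per run and discard the run lengths (not reversible)."""
    return "".join(char for char, _ in groupby(text))


original = "888883333333"
packed = rle_encode(original)
print(packed)                          # 8[5]3[7]
print(rle_decode(packed) == original)  # True
print(lossy_encode(original))          # 83
```

Note that this notation only pays off for long runs: a string with no repeated characters would grow, since every single character becomes a four-character token.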


Applications

The above is a very simple example of run-length encoding, wherein large runs of consecutive identical data values are replaced by a simple code with the data value and length of the run. This is an example of lossless data compression. Lossless compression is often used to save disk space on office computers, or to make better use of the connection bandwidth in a computer network. For symbolic data such as spreadsheets, text, executable programs, etc., losslessness is essential because changing even a single bit cannot be tolerated (except in some limited cases).


For visual and audio data, some loss of quality can be tolerated without losing the essential nature of the data. By taking advantage of the limitations of the human sensory system, a great deal of space can be saved while producing an output which is nearly indistinguishable from the original. These lossy data compression methods typically offer a three-way tradeoff between compression speed, compressed data size and quality loss.


Lossy image compression is used in digital cameras, greatly increasing their storage capacities while hardly degrading picture quality at all. Similarly, DVDs use the lossy MPEG-2 codec for video compression.


In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less audible) components of the signal. Compression of human speech is often performed with even more specialized techniques, so that "speech compression" or "voice coding" is sometimes distinguished as a separate discipline from "audio compression". Different audio and speech compression standards are listed under audio codecs. Voice compression is used in Internet telephony, for example, while audio compression is used for CD ripping and is decoded by audio players.


Theory

The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) and by rate-distortion theory. These fields of study were essentially created by Claude Shannon, who published fundamental papers on the topic in the late 1940s and early 1950s. Doyle and Carlson (2000) wrote that data compression "has one of the simplest and most elegant design theories in all of engineering". Cryptography and coding theory are also closely related. The idea of data compression is deeply connected with statistical inference.


Many lossless data compression systems can be viewed in terms of a four-stage model. Lossy data compression systems typically include even more stages, including, for example, prediction, frequency transformation, and quantization.


The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ which is optimized for decompression speed and compression ratio, although compression can be slow. DEFLATE is used in PKZIP, gzip and PNG. LZW (Lempel-Ziv-Welch) is used in GIF images. Also noteworthy are the LZR (LZ-Renau) methods, which serve as the basis of the Zip method. LZ methods utilize a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX). A current LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.
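The table-based model can be illustrated with a textbook-style LZW coder, in which the table starts with all single bytes and grows as longer strings are seen. This is a minimal sketch under simplifying assumptions: the function names are hypothetical, the codes are kept as a plain list of integers rather than packed into bits, and the table is never reset or size-limited as it would be in GIF or the Unix compress utility.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Encode bytes as a list of table indices, growing the table on the fly."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit the longest known string
            table[candidate] = len(table)  # learn the new, longer string
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same table from the code stream and recover the bytes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[:1]  # code for the entry still being built
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(data)
print(len(data), len(codes))       # 24 16
print(lzw_decode(codes) == data)   # True
```

The decoder never receives the table: it reconstructs it from earlier output, which is what "generated dynamically from earlier data in the input" means in practice.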


The very best compressors use probabilistic models whose predictions are coupled to an algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen, and turned into a practical method by Witten, Neal, and Cleary, achieves superior compression to the better-known Huffman algorithm, and lends itself especially well to adaptive data compression tasks where the predictions are strongly context-dependent. Arithmetic coding is used in the bilevel image-compression standard JBIG, and the document-compression standard DjVu. The text entry system, Dasher, is an inverse-arithmetic-coder.
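The core idea of arithmetic coding, narrowing an interval in proportion to each symbol's predicted probability, can be shown with a toy coder. The sketch below is an illustration only and makes several assumptions: it uses a fixed (non-adaptive) model, exact rational arithmetic from Python's `fractions` module instead of the finite-precision integer arithmetic of practical coders, and hypothetical function names.

```python
from fractions import Fraction


def build_intervals(probabilities):
    """Map each symbol to its [low, high) slice of the unit interval."""
    intervals, low = {}, Fraction(0)
    for symbol, p in probabilities.items():
        intervals[symbol] = (low, low + p)
        low += p
    return intervals


def arithmetic_encode(message, probabilities):
    """Narrow [0, 1) once per symbol; return a number inside the final interval."""
    intervals = build_intervals(probabilities)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        sym_low, sym_high = intervals[symbol]
        low, width = low + width * sym_low, width * (sym_high - sym_low)
    return low + width / 2  # midpoint of the final interval


def arithmetic_decode(code, length, probabilities):
    """Replay the narrowing: at each step find the slice that contains `code`."""
    intervals = build_intervals(probabilities)
    decoded = []
    for _ in range(length):
        for symbol, (sym_low, sym_high) in intervals.items():
            if sym_low <= code < sym_high:
                decoded.append(symbol)
                code = (code - sym_low) / (sym_high - sym_low)  # rescale to [0, 1)
                break
    return "".join(decoded)


# Hypothetical model: probabilities must sum to 1.
model = {"a": Fraction(3, 4), "b": Fraction(1, 8), "c": Fraction(1, 8)}
message = "aaabaaca"
code = arithmetic_encode(message, model)
print(code)
print(arithmetic_decode(code, len(message), model) == message)  # True
```

Likely symbols shrink the interval only slightly, so they cost a fraction of a bit each, which is where the advantage over a Huffman code (at least one whole bit per symbol) comes from. Replacing the fixed `model` with probabilities that depend on the preceding symbols gives the context-dependent, adaptive behaviour mentioned above.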


Matt Mahoney, one of the three founders of the Hutter Prize, claims that "Compression is Equivalent to General Intelligence" [1].


Comparative

Independent comparison of different methods of data compression (Results & Softwares, in French. Airelle, 2007). Numbers in parentheses give the rank of the compression method for the file category named in the column header.

  • Text files, such as .htm or .txt, can be compressed heavily.
  • Some files are already compressed (e.g. .mp3 or .zip), so the compression rate of such files is poor. Because the archive adds its own header data, compressing such files again can even make them slightly larger.
  • To better represent overall performance, a global score (out of 20) is calculated for each of the 20 tested methods with a non-parametric formula from the sum of its ranks (1 to 20).
Comparison of different methods of data compression
Files | *.avi | *.dll | *.doc | *.exe | *.gif | *.htm | *.jpg | *.mp3 | *.mpg | *.pdf | *.txt | *.wav | *.zip | Score | TOTAL
Number of files | 16 | 26 | 138 | 24 | 246 | 79 | 44 | 29 | 8 | 36 | 8 | 1 | 19 |  | 674
Initial size | 5,261,152 | 5,254,220 | 5,254,656 | 5,254,056 | 5,246,209 | 5,261,187 | 5,246,116 | 5,250,432 | 5,257,720 | 5,257,876 | 5,253,436 | 5,256,024 | 5,262,680 |  | 68,315,764
7z | 4,524,067 (2) | 1,543,179 (3) | 147,690 (3) | 3,910,541 (3) | 4,620,354 (1) | 341,996 (4) | 4,770,061 (4) | 5,053,813 (2) | 4,879,067 (5) | 4,258,863 (3) | 1,270,884 (3) | 3,670,225 (5) | 5,226,742 (14) | 16/20 | 44,217,482
arj | 4,696,659 (9) | 2,160,530 (15) | 1,018,050 (17) | 4,130,505 (11) | 4,702,449 (12) | 898,370 (17) | 4,803,740 (11) | 5,108,093 (17) | 4,910,699 (16) | 4,606,736 (15) | 1,875,329 (16) | 4,450,535 (12) | 5,223,905 (13) | 6.1/20 | 48,585,600
bh | 4,703,291 (12) | 2,156,986 (12) | 1,010,284 (15) | 4,128,594 (9) | 4,693,021 (9) | 889,650 (15) | 4,806,914 (13) | 5,105,811 (13) | 4,904,209 (11) | 4,601,545 (13) | 1,848,972 (13) | 4,451,648 (15) | 5,201,639 (4) | 7.5/20 | 48,502,564
bz2 | 4,720,926 (18) | 2,095,832 (7) | 573,721 (5) | 4,273,885 (18) | 4,896,084 (18) | 645,243 (5) | 4,743,918 (2) | 5,069,593 (4) | 4,888,293 (7) | 4,444,829 (5) | 1,531,448 (6) | 3,771,508 (7) | 5,238,677 (16) | 11.7/20 | 46,893,957
bza | 4,639,340 (6) | 2,166,940 (17) | 987,806 (11) | 4,231,254 (17) | 4,878,327 (17) | 783,188 (8) | 4,787,973 (7) | 5,076,189 (5) | 4,873,810 (2) | 4,618,970 (17) | 1,516,326 (5) | 3,770,938 (6) | 5,227,572 (15) | 9.8/20 | 47,558,633
cab | 4,701,113 (11) | 2,148,386 (10) | 893,796 (7) | 4,127,044 (8) | 4,678,810 (5) | 842,129 (10) | 4,798,500 (8) | 5,099,787 (8) | 4,900,314 (10) | 4,584,969 (8) | 1,846,233 (12) | 4,451,857 (18) | 5,201,717 (5) | 10.8/20 | 48,274,655
gza | 4,703,371 (13) | 2,157,116 (13) | 1,001,990 (13) | 4,126,436 (7) | 4,693,136 (10) | 874,444 (12) | 4,803,739 (10) | 5,105,765 (12) | 4,904,249 (12) | 4,597,720 (11) | 1,840,188 (11) | 4,451,638 (14) | 5,201,436 (3) | 9.2/20 | 48,461,228
j | 4,678,506 (8) | 1,914,777 (5) | 703,722 (6) | 4,057,445 (5) | 4,681,437 (6) | 691,916 (6) | 4,805,059 (12) | 5,092,070 (7) | 4,898,847 (8) | 4,326,394 (4) | 1,629,228 (8) | 3,594,954 (4) | 5,215,150 (12) | 13/20 | 46,289,505
jar | 4,704,088 (14) | 2,158,273 (14) | 1,017,205 (16) | 4,129,816 (10) | 4,705,456 (13) | 893,622 (16) | 4,809,136 (16) | 5,107,254 (15) | 4,904,615 (13) | 4,603,367 (14) | 1,849,394 (14) | 4,451,718 (16) | 5,202,611 (8) | 6.2/20 | 48,536,555
lha | 4,711,090 (16) | 2,215,476 (18) | 1,020,194 (18) | 4,204,071 (15) | 4,830,501 (15) | 913,845 (18) | 4,918,792 (19) | 5,206,933 (19) | 5,066,716 (19) | 4,802,049 (19) | 1,895,771 (17) | 4,447,253 (10) | 5,263,136 (18) | 6.7/20 | 49,495,827
lzh | 4,711,090 (16) | 2,215,476 (18) | 1,066,340 (19) | 4,143,461 (14) | 4,819,157 (14) | 971,166 (19) | 4,816,349 (18) | 5,107,584 (16) | 4,924,974 (18) | 4,635,416 (18) | 1,945,961 (19) | 4,449,756 (11) | 5,212,837 (11) | 5.3/20 | 49,019,567
pkz | 4,899,083 (20) | 2,354,373 (20) | 1,173,097 (20) | 4,401,289 (20) | 5,120,590 (19) | 1,018,250 (20) | 5,162,114 (20) | 5,253,006 (20) | 5,203,747 (20) | 5,076,577 (20) | 2,084,290 (20) | 5,027,854 (20) | 5,264,213 (19) | 0.2/20 | 52,038,483
rar | 4,634,009 (5) | 1,693,150 (4) | 173,313 (4) | 3,948,241 (4) | 4,639,881 (4) | 318,269 (3) | 4,780,095 (6) | 5,081,085 (6) | 4,887,973 (6) | 4,258,775 (2) | 1,318,381 (4) | 2,657,731 (3) | 5,202,579 (7) | 15.5/20 | 43,593,482
rk | 4,589,894 (3) | 1,474,339 (2) | 132,629 (1) | 3,866,814 (1) | 4,628,017 (3) | 257,588 (1) | 4,434,701 (1) | 5,017,545 (1) | 4,787,286 (1) | 4,498,992 (6) | 1,168,720 (1) | 1,659,771 (1) | 5,183,337 (1) | 18.2/20 | 41,699,633
rs | 4,625,725 (4) | 2,137,145 (9) | 937,954 (10) | 4,221,864 (16) | 4,850,493 (16) | 768,711 (7) | 4,776,635 (5) | 5,066,886 (3) | 4,878,852 (3) | 4,612,537 (16) | 1,560,879 (7) | 3,804,335 (8) | 5,240,116 (17) | 10.7/20 | 47,482,132
sqx | 4,662,560 (7) | 2,078,866 (6) | 991,992 (12) | 4,105,933 (6) | 4,699,518 (11) | 878,469 (14) | 4,808,697 (15) | 5,102,452 (10) | 4,908,341 (14) | 4,590,245 (10) | 1,836,245 (9) | 4,415,575 (9) | 5,208,275 (10) | 9.8/20 | 48,287,168
gz | 4,707,481 (15) | 2,165,409 (16) | 907,006 (8) | 4,133,949 (12) | 4,684,949 (7) | 861,638 (11) | 4,807,701 (14) | 5,105,913 (14) | 4,909,789 (15) | 4,588,822 (9) | 1,853,650 (15) | 4,451,792 (17) | 5,202,392 (6) | 7.8/20 | 48,380,491
uha | 4,498,275 (1) | 1,474,005 (1) | 136,880 (2) | 3,879,360 (2) | 4,625,014 (2) | 284,363 (2) | 4,760,572 (3) | 5,104,837 (11) | 4,879,047 (4) | 4,237,400 (1) | 1,233,812 (2) | 2,435,124 (2) | 5,187,408 (2) | 17.3/20 | 44,736,097
yz1 | 4,814,935 (19) | 2,128,899 (8) | 924,706 (9) | 4,279,162 (19) | 4,686,669 (8) | 804,198 (9) | 4,810,966 (17) | 5,124,596 (18) | 4,922,886 (17) | 4,568,274 (7) | 1,901,300 (18) | 4,561,179 (19) | 5,207,874 (9) | 6.4/20 | 48,735,644
zip | 4,701,064 (10) | 2,155,923 (11) | 1,009,814 (14) | 4,135,619 (13) | 5,270,565 (20) | 877,679 (13) | 4,799,508 (9) | 5,101,205 (9) | 4,898,961 (9) | 4,599,883 (12) | 1,839,080 (10) | 4,450,719 (13) | 5,264,564 (20) | 7.5/20 | 49,104,584
Median compressed size | 4,701,089 | 2,152,155 | 962,880 | 4,130,160 | 4,696,327 | 851,884 | 4,803,740 | 5,103,645 | 4,902,262 | 4,593,983 | 1,839,634 | 4,448,505 | 5,210,556 |  | 48,519,559
Median compression rate | 10.6 % | 59.0 % | 81.7 % | 21.4 % | 10.5 % | 83.8 % | 8.4 % | 2.8 % | 6.8 % | 12.6 % | 65.0 % | 15.4 % | 1.0 % |  | 29.0 %
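The compression-rate row at the bottom of the table is consistent with the fraction of the initial size that compression removed, i.e. 1 minus compressed size over initial size. The sketch below checks this against two of the table's columns; the function name is a hypothetical one chosen for illustration, and the formula is inferred from the figures rather than stated by the source.

```python
def compression_rate(initial_size: int, compressed_size: int) -> float:
    """Percentage of the initial size that compression removed."""
    return 100.0 * (1 - compressed_size / initial_size)


# Initial size and compressed size from the *.avi column of the table.
print(round(compression_rate(5_261_152, 4_701_089), 1))    # 10.6

# Initial size and compressed size from the TOTAL column of the table.
print(round(compression_rate(68_315_764, 48_519_559), 1))  # 29.0
```

Under this definition a higher percentage means better compression, and a value near zero (as for *.mp3 and *.zip) means the files were barely reduced.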

Note on the table: PAQ8 (KGB Archiver is a Windows GUI for the older PAQ7) compresses much better than all of the methods above, but its results cannot be copied here because the table they appear in is under copyright.


By total compressed size, the three best methods tested are rk, rar and 7z. WinRK and WinRAR are commercial software; 7-Zip is free and open source (LGPL licence) and can also be used on Linux.


See also

Data compression topics

  • Algorithmic information theory
  • Information entropy
  • Self-extracting archive
  • Image compression
  • Speech coding
  • Video compression
  • Multimedia compression
  • Minimum description length
  • Minimum message length
  • List of archive formats
  • Comparison of file archivers
  • List of Unix programs
  • Free file format
  • HTTP compression
  • Magic compression algorithm

Compression algorithms

Lossless data compression

  • Run-length encoding (RLE)
  • Dictionary coder
  • LZ77 and LZ78
  • LZW
  • Burrows-Wheeler transform (BWT)
  • Prediction by partial matching (PPM)
  • Context mixing
  • Dynamic Markov compression (DMC)
  • Entropy encoding
  • Huffman coding
  • Adaptive Huffman coding
  • Shannon-Fano coding
  • Range encoding
  • Golomb coding
  • Universal code
  • Elias gamma code
  • Fibonacci code

Lossy data compression

  • Discrete cosine transform (DCT)
  • Fractal compression
  • Fractal transform
  • Wavelet compression
  • Vector quantization
  • Modulo-N code
  • A-law algorithm
  • Mu-law algorithm (μ-law)

Example implementations

  • DEFLATE (a combination of LZ77 and Huffman coding) – used by ZIP, gzip and PNG files
  • LZMA – used by 7-Zip and, to a lesser extent, StuffIt X
  • LZO (a very fast, speed-oriented LZ variant)
  • LZX (an LZ77 family compression algorithm)
  • Unix compress utility (the .Z file format) and GIF, both of which use LZW
  • Unix pack utility (the .z file format) used Huffman coding
  • bzip2 (a combination of the Burrows-Wheeler transform and Huffman coding)
  • PAQ (very high compression based on context mixing, but extremely slow; ranks at or near the top of several compression-ratio benchmarks)
  • JPEG (image compression using a discrete cosine transform, then quantization, then Huffman coding)
  • MPEG (audio and video compression standards family in wide use, using DCT and motion-compensated prediction for video)
    • MP3 (a part of the MPEG-1 standard for sound and music compression, using subbanding and MDCT, perceptual modeling, quantization, and Huffman coding)
    • AAC (part of the MPEG-2 and MPEG-4 audio coding specifications, using MDCT, perceptual modeling, quantization, and Huffman coding)
  • Vorbis (MDCT-based audio codec similar to AAC, designed with a focus on avoiding patent encumbrance)
  • JPEG 2000 (image compression using wavelets, then quantization, then entropy coding)
  • TTA (uses linear predictive coding for lossless audio compression)
  • FLAC (linear predictive coding for lossless audio compression)
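Several of the formats listed above can be tried directly from the Python standard library. As a minimal sketch, the following round trip uses the `zlib` module, which wraps DEFLATE output in the zlib container (ZIP, gzip and PNG each use their own framing around the same algorithm). The helper name and the sample data are assumptions chosen for illustration.

```python
import zlib


def deflate_round_trip(data: bytes, level: int = 9) -> tuple[bytes, bytes]:
    """Compress data with DEFLATE (zlib container) and decompress it again."""
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


if __name__ == "__main__":
    # Highly repetitive input, so LZ77 matching and Huffman coding both help.
    sample = b"compression " * 200
    compressed, restored = deflate_round_trip(sample)
    assert restored == sample  # lossless: the original bytes come back exactly
    print(len(sample), "bytes ->", len(compressed), "bytes")
```

The compression level (0 to 9) trades speed for output size. The exact compressed size depends on the input and on the zlib version, so no figure is quoted here.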

DEFLATE is a lossless data compression algorithm that uses a combination of the LZ77 algorithm and Huffman coding. ... The ZIP file format is the most widely-used compressed file format in the IBM PC world. ... gzip is a software application used for file compression. ... PNG (Portable Network Graphics), sometimes pronounced as ping, is a relatively new bitmap image format that is becoming popular on the World Wide Web and elsewhere. ... LZMA, short for Lempel-Ziv-Markov chain-Algorithm, is a data compression algorithm in development since 2001 and used in the 7z format of the 7-Zip archiver. ... 7-Zip is an open source file archiver designed originally for the Microsoft Windows operating system, and later made available to other systems. ... LZO is a data compression algorithm that is focused on decompression speed. ... LZX is the name of an LZ77 family compression algorithm. ... Unix (officially trademarked as UNIX®) is a computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs including Ken Thompson, Dennis Ritchie and Douglas McIlroy. ... GIF (Graphics Interchange Format) is a bitmap image format that is widely used on the World Wide Web, both for still images and for animations. ... LZW (Lempel-Ziv-Welch) is an implementation of a lossless data compression algorithm created by Abraham Lempel and Jacob Ziv. ... In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. ... PAQ is a series of open source data compression archivers that have evolved through collaborative development to top rankings on several benchmarks measuring compression ratio (although at the expense of speed and memory usage). ...
Context mixing is a type of data compression algorithm in which the next-symbol predictions of two or more statistical models are combined to yield a prediction that is often more accurate than any of the individual predictions. ... The Moving Picture Experts Group or MPEG is a working group of ISO/IEC charged with the development of video and audio encoding standards. ... MPEG-1 defines a group of Audio and Video (AV) coding and compression standards agreed upon by MPEG (Moving Picture Experts Group). ... The modified discrete cosine transform (MDCT) is a Fourier-related transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half... Advanced Audio Coding (AAC) is a standardized, lossy compression and encoding scheme for digital audio. ... MPEG-2 is a standard for the generic coding of moving pictures and associated audio information. It is widely used around the world to specify the format of the digital television signals that are broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV systems. ... MPEG-4 is a standard used primarily to compress audio and visual (AV) digital data. ...
Vorbis is an open source, lossy audio codec project headed by the Xiph. ... JPEG 2000 is a wavelet-based image compression standard. ... FLAC, an acronym for Free Lossless Audio Codec, is a popular file format for audio data compression. ...
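To make the dictionary-based approach of LZW more concrete, here is a minimal sketch in Python. It is a textbook-style illustration, not the exact variant used by the Unix compress utility or by GIF: codes are kept as a list of unbounded integers instead of being packed into variable-width bit fields, and the dictionary is never reset. The function names are assumptions made for this example.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW dictionary codes."""
    # Start with one dictionary entry per possible byte value.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate  # keep extending the current match
        else:
            codes.append(dictionary[current])   # emit the longest known match
            dictionary[candidate] = next_code   # learn the new, longer string
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        # Mirror the encoder: previous match plus first byte of this entry.
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)
```

The decoder rebuilds the same dictionary as the encoder from the code stream alone, so no dictionary has to be transmitted. Repetitive input such as `b"TOBEORNOTTOBEORTOBEORNOT"` produces fewer codes than it has bytes.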

Corpora

Standard data collections are commonly used for comparing compression algorithms.

The Calgary Corpus is a body of text and binary data that is commonly used for comparing data compression algorithms. ...
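A comparison over such a corpus can be sketched with the compressors available in the Python standard library. The helper names, the choice of three codecs and the idea of reading every file from one directory are assumptions for illustration only. The directory passed to `load_corpus` is hypothetical and must point at wherever the corpus files have been unpacked.

```python
import bz2
import lzma
import zlib
from pathlib import Path

# Standard-library compressors, each called with its default settings.
CODECS = {
    "DEFLATE (zlib)": zlib.compress,
    "bzip2": bz2.compress,
    "LZMA (xz)": lzma.compress,
}


def load_corpus(directory: str) -> dict[str, bytes]:
    """Read every regular file in a directory into memory."""
    return {
        path.name: path.read_bytes()
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }


def benchmark(files: dict[str, bytes]) -> dict[str, float]:
    """Return compressed size / original size per codec, summed over all files."""
    total = sum(len(data) for data in files.values())
    if total == 0:
        raise ValueError("corpus is empty")
    return {
        name: sum(len(compress(data)) for data in files.values()) / total
        for name, compress in CODECS.items()
    }
```

A lower ratio means better compression. Because each file is compressed separately, container overhead counts against every codec, and a fuller comparison would also record compression and decompression time.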

References

  1. Rationale for a Large Text Compression Benchmark

External links

  • Data Compression Benchmarks and Tests
  • Data Compression Tutorial
  • Compression Comparison Guide on various settings
  • Large Data Compression Benchmarks and Tests
  • Almost complete portraits of Data Compression inventors
  • Data Compression - Systematisation by T.Strutz
  • Lossless Data Compression by Greg Goebel
  • How Stuff Works - File Compression
  • Ultimate Command Line Compressors
  • The Data Compression News Blog
  • Practical Compressor Test (Compares speed and efficiency for commonly used compression programs)
  • The Monthly Data Compression Newsletter
  • Compressed File Types and File Extensions
  • Image and Video Compression Learning Tool (VcDemo)
  • deVault


 
 

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.