H.264 is a high-compression digital video codec standard written by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a partnership known as the Joint Video Team (JVT). The standard is identical to ISO/IEC MPEG-4 part 10, and is also known as AVC, for Advanced Video Coding. The final drafting work on the first version of the standard was completed in May of 2003.
H.264 is a name related to the ITU-T line of H.26x video standards, while AVC relates to its ISO/IEC MPEG roots. The standard is usually called H.264/AVC or AVC/H.264 to emphasize the common heritage. The name H.26L, also related to its ITU-T history, is far less common, but still used. Occasionally, it has also been referred to as "the JVT codec", in reference to the JVT organization that developed it. (Such partnership and multiple naming is not unprecedented: MPEG-2 video also arose from a partnership between MPEG and the ITU-T, and is known in the ITU-T community as H.262.)
The intent of the H.264/AVC project has been to create a standard capable of providing good video quality at bit rates substantially lower (e.g., half or less) than previous standards would need (e.g., relative to MPEG-2, H.263, or MPEG-4 part 2), and to do so without so great an increase in complexity as to make the design impractically expensive to implement. An additional goal was to do this in a flexible way that would allow the standard to be applied to a very wide variety of applications (e.g., for both low and high bit rates, and low and high resolution video) and to work well on a very wide variety of networks and systems (e.g., for broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems).
The JVT recently completed the development of some extensions to the original standard that are known as the Fidelity Range Extensions (FRExt). These extensions support higher-fidelity video coding by supporting increased sample accuracy (including 10-bit and 12-bit coding) and higher-resolution color information (including sampling structures known as YUV 4:2:2 and YUV 4:4:4). Several other features are also included in the Fidelity Range Extensions project (such as adaptive switching between 4x4 and 8x8 integer transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, support of additional color spaces, and a residual color transform). The design work on the Fidelity Range Extensions was completed in July of 2004, and the drafting was finished in September of 2004.
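To give a feel for what the higher sample accuracy and denser color sampling mean in raw data terms, the following minimal Python sketch counts uncompressed bits per frame. The function name and the 1920x1080 frame size are illustrative assumptions, not taken from the standard; the sample counts per pixel (1.5 for 4:2:0, 2 for 4:2:2, 3 for 4:4:4) follow from the usual definitions of those sampling structures.

```python
from fractions import Fraction

# Total samples (luma plus both chroma planes) per pixel for each structure.
SAMPLES_PER_PIXEL = {
    "4:2:0": Fraction(3, 2),  # chroma halved horizontally and vertically
    "4:2:2": Fraction(2),     # chroma halved horizontally only
    "4:4:4": Fraction(3),     # chroma at full resolution
}


def raw_bits_per_frame(width: int, height: int, sampling: str, bit_depth: int) -> int:
    """Uncompressed size of one frame in bits (hypothetical helper)."""
    return int(width * height * SAMPLES_PER_PIXEL[sampling] * bit_depth)


if __name__ == "__main__":
    for sampling, depth in [("4:2:0", 8), ("4:2:2", 10), ("4:4:4", 12)]:
        bits = raw_bits_per_frame(1920, 1080, sampling, depth)
        print(f"{sampling} at {depth} bits: {bits} bits per frame")
```

Under these assumptions a 12-bit 4:4:4 frame carries three times the raw data of an 8-bit 4:2:0 frame of the same dimensions.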
Since the completion of the original version of the standard in May of 2003, the JVT has also done one round of "corrigendum" errata corrections, and an additional round of such corrigendum work is now nearing completion and should be finished in early 2005.
H.264/AVC contains a number of new features that allow it to compress video much more effectively than older codecs and to provide more flexibility for application to a wide variety of network environments. Key features include:
- Multi-picture motion compensation using previously-encoded pictures as references in a much more flexible way than in past standards, thus allowing up to 32 reference pictures to be used in some cases (unlike in prior standards, where the limit was typically one or, in the case of "B pictures", two). This particular feature usually allows modest improvements in bitrate and quality in most scenes. But in certain types of scenes, for example scenes with rapid repetitive flashing or back-and-forth scene cuts or uncovered background areas, it allows a very significant reduction in bit rate.
- Variable block-size motion compensation (VBSMC) with block sizes as large as 16x16 and as small as 4x4, enabling very precise segmentation of moving regions.
- Quarter-pixel precision for motion compensation, enabling very precise description of the displacements of moving areas. In fact, for chroma, the motion compensation has even more precision—down to one-eighth pixel.
- Weighted prediction, allowing an encoder to specify the use of a scaling and offset when performing motion compensation, and providing a significant benefit in performance in special cases—such as fade-to-black, fade-in, and cross-fade transitions.
- An in-loop deblocking filter, which helps prevent the blocking artifacts common to other DCT-based image compression techniques.
- An exact-match integer 4x4 spatial block transform (similar to the well-known DCT design), and in the case of the new FRExt "High" profiles, the ability for the encoder to adaptively select between a 4x4 and 8x8 transform block size for the integer transform operation.
- A secondary Hadamard transform performed on DC coefficients of the primary spatial transform (for chroma DC coefficients and also luma in one special case) to obtain even more compression in smooth regions.
- Spatial prediction from the edges of neighboring blocks for "intra" coding (rather than the DC-only prediction found in MPEG-2 and the transform coefficient prediction found in H.263+ and MPEG-4 part 2).
- Context-adaptive binary arithmetic coding (CABAC), a technique that losslessly compresses syntax elements in the video stream by exploiting the probabilities of those syntax elements in a given context.
- Context-adaptive variable-length coding (CAVLC), which is a lower-complexity alternative to CABAC for the coding of quantized transform coefficient values. Although lower complexity than CABAC, CAVLC is more elaborate and more efficient than the methods typically used to code coefficients in other prior designs.
- A common, simple, and highly structured variable-length coding (VLC) technique for many of the syntax elements not coded by CABAC or CAVLC, referred to as an Exponential-Golomb (Exp-Golomb) code.
- A network abstraction layer (NAL) definition allowing the same video syntax to be used in many network environments, including features such as sequence parameter sets (SPSs) and picture parameter sets (PPSs) that provide more robustness and flexibility than provided in prior designs.
- Switching slices (called SP and SI slices), features that allow an encoder to direct a decoder to jump into an ongoing video stream for such purposes as video streaming bit rate switching and "trick mode" operation. When a decoder jumps into the middle of a video stream using the SP/SI feature, it can get an exact match to the decoded pictures at that location in the video stream despite using different pictures (or no pictures at all) as references prior to the switch.
- Flexible macroblock ordering (FMO, also known as slice groups) and arbitrary slice ordering (ASO), which are techniques for restructuring the ordering of the representation of the fundamental regions (called macroblocks) in pictures. Typically considered an error/loss robustness feature, FMO and ASO can also be used for other purposes.
- Data partitioning (DP), a feature providing the ability to separate more important and less important syntax elements into different packets of data, enabling the application of unequal error protection (UEP) and other types of improvement of error/loss robustness.
- Redundant slices (RS), an error/loss robustness feature allowing an encoder to send an extra representation of a picture region (typically at lower fidelity) that can be used if the primary representation is corrupted or lost.
- A simple automatic process for preventing the accidental emulation of start codes, which are special sequences of bits in the coded data that allow random access into the bitstream and recovery of byte alignment in systems that can lose byte synchronization.
- Supplemental enhancement information (SEI) and video usability information (VUI), which are extra information that can be inserted into the bitstream to enhance the use of the video for a wide variety of purposes.
- Auxiliary pictures, which can be used for such purposes as alpha blend compositing.
- Frame numbering, a feature that allows the creation of "sub-sequences" (enabling temporal scalability by optional inclusion of extra pictures between other pictures), and the detection and concealment of losses of entire pictures (which can occur due to network packet losses or channel errors).
- Picture order count, a feature that serves to keep the ordering of the pictures and the values of samples in the decoded pictures isolated from timing information (allowing timing information to be carried and controlled/changed separately by a system without affecting decoded picture content).
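As a concrete illustration of one item in the list above, the Exp-Golomb code can be sketched in a few lines of Python. This is a minimal sketch, not the reference implementation: the function names are hypothetical, codewords are represented as strings of '0' and '1' rather than packed bits for readability, and the signed mapping (positive values to odd code numbers, non-positive values to even ones) is the convention as commonly described for the standard's signed syntax elements.

```python
from typing import Tuple


def ue_encode(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword for a non-negative integer."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]            # binary digits of code_num + 1
    return "0" * (len(binary) - 1) + binary   # zero prefix, then the digits


def ue_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one unsigned codeword at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":           # count the zero prefix
        zeros += 1
    end = pos + 2 * zeros + 1                 # prefix + (zeros + 1) info bits
    return int(bits[pos + zeros:end], 2) - 1, end


def se_encode(value: int) -> str:
    """Signed variant: map the value to a code number, then encode it."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue_encode(code_num)


def se_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Inverse of se_encode; return (value, next position)."""
    code_num, end = ue_decode(bits, pos)
    magnitude = (code_num + 1) // 2
    return (magnitude if code_num % 2 else -magnitude), end


if __name__ == "__main__":
    for n in range(5):
        print(n, ue_encode(n))   # 1, 010, 011, 00100, 00101
```

Small values get short codewords (0 costs a single bit), and a decoder needs no table: it counts leading zeros and then reads that many bits plus one.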
These techniques, along with several others, help H.264 to perform significantly better than any prior standard can, under a wide variety of circumstances in a wide variety of application environments. H.264 can often perform radically better than MPEG-2—typically obtaining the same quality at half of the bitrate or less.
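The start code emulation prevention process mentioned in the feature list is simple enough to sketch. The Python below assumes the byte-level rule as it is commonly described for H.264: whenever two zero bytes are followed by a byte of value 0x03 or less, an extra 0x03 byte is inserted before it, so the patterns that begin a start code cannot appear inside the payload. The function names are hypothetical, and the sketch ignores any special handling at the very end of a payload.

```python
def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 after any 00 00 pair that precedes a byte <= 0x03."""
    out = bytearray()
    zeros = 0                       # consecutive zero bytes just written
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)        # break up the would-be start code prefix
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(data: bytes) -> bytes:
    """Decoder side: drop each 0x03 that directly follows a 00 00 pair."""
    out = bytearray()
    zeros = 0                       # consecutive zero bytes just read
    for b in data:
        if zeros >= 2 and b == 0x03:
            zeros = 0               # discard the inserted byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


if __name__ == "__main__":
    raw = bytes([0x00, 0x00, 0x01, 0x25])
    escaped = add_emulation_prevention(raw)
    print(escaped.hex())            # 0000030125
    assert remove_emulation_prevention(escaped) == raw
```

Because the rule is applied mechanically to every payload, an encoder never has to check whether its compressed data happens to resemble a start code, which is why the process can be described as automatic.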
Like other ISO/IEC MPEG video standards, H.264/AVC has a reference implementation that can be freely downloaded. Its main purpose is to give examples of H.264/AVC features, rather than being a useful application per se.
As with MPEG-2 and MPEG-4 part 2, the vendors of H.264/AVC products and services are expected to pay patent licensing royalties for the patents that their products use. The primary source of licenses for patents applying to this standard is a private organization known as MPEG-LA, LLC (which is not affiliated in any way with the MPEG standardization organization, but which also administers patent pools for MPEG-2 and MPEG-4 part 2 video).
The HD-DVD format planned for product deployment in late 2005 by the DVD Forum includes H.264/AVC as a mandatory player feature.
The Blu-ray Disc format planned for product deployment in late 2005 by the Blu-ray Disc Association (BDA) includes H.264/AVC as a mandatory player feature.
The Digital Video Broadcast (DVB) standards body in Europe approved the use of H.264/AVC for broadcast television in Europe in late 2004.
The prime minister of France announced the selection of H.264/AVC as a requirement for receivers of HDTV and pay TV channels for digital terrestrial broadcast television services (referred to as "TNT") in France in late 2004.
The Advanced Television Systems Committee (ATSC) standards body in the United States is in final consideration work on potential use of H.264/AVC for U.S. broadcast television.
The Digital Multimedia Broadcast (DMB) service in the Republic of Korea will use H.264/AVC.
Major broadcasters in Japan (NHK, Tokyo Broadcasting System (TBS), Nippon Television (NTV), TV Asahi, Fuji TV and TV Tokyo) have announced support of H.264/AVC for mobile-segment terrestrial broadcast services of ISDB-T.
The 3rd Generation Partnership Project (3GPP) has approved the inclusion of H.264/AVC as an optional feature in release 6 of its mobile multimedia telephony services specifications.
The Motion Imagery Standards Board (MISB) of the United States Department of Defense (DoD) has adopted H.264/AVC for low bit-rate channels (e.g., less than 1 Mbit/s) and is considering its adoption for other applications.
The Internet Engineering Task Force (IETF) is in the final stages of work on defining a payload packetization format for carrying H.264/AVC video using its Real-time Transport Protocol (RTP).
The Moving Picture Experts Group (MPEG) has fully integrated support of H.264/AVC into its system standards (e.g., MPEG-2 and MPEG-4 systems) and its ISO media file format specification.
The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) has adopted H.264/AVC in its H.32x suite of multimedia telephony systems specifications. Based on the ITU-T standards, H.264/AVC is already widely used for videoconferencing, including support in products of the two main companies in that market (Polycom and Tandberg). Essentially all new videoconferencing products now include support for H.264/AVC.
Several companies are producing custom chips capable of decoding H.264/AVC video. As of January 2005, sample quantities are available from Broadcom, Conexant, Neomagic, and STMicroelectronics. Sigma Designs predicts samples for March 2005. Such chips will allow widespread deployment of low-cost devices capable of playing H.264/AVC video at standard-definition and high-definition television resolutions.
Apple Computer is working on integrating H.264 into the next version of Mac OS X, version 10.4, code-named "Tiger". Apple has also announced that support for H.264/AVC will be built directly into the next version of QuickTime, which will ship with Tiger.
The PlayStation Portable console features hardware decoding of video files in the H.264 format.
The Nero Digital package, co-developed by Ahead Software and Ateme, includes an H.264 encoder which was judged best overall by Doom9 in its 2004 codec shoot-out.
A tweaked variant of H.264 is rumored to have been implemented in the Sorenson codec, according to an FFmpeg developer working on reverse-engineering that codec. (The reliability of this information is unknown.)
The free x264 codec is released under the terms of the GPL.
H.264 will probably be used by various video-on-demand services on the Internet to provide films and television shows directly to computers. Also, it is likely that the same kind of content will be offered via filesharing networks in this codec, whether legally or not.