Tar

GNU tar 1.16 showing three common types of tarballs (shown in red).
File extension: .tar
MIME type: application/x-tar
Uniform Type Identifier: public.tar-archive
Magic: ustar at byte 257
Type of format: file archive
Container for: anything
Contained by: compress, gzip, bzip2

In computing, tar (derived from tape archive) is both a file format (in the form of a type of archive bitstream) and the name of the program used to handle such files. The format was standardized by POSIX.1-1988 and later POSIX.1-2001. Initially developed as a raw format for tape backup and other sequential-access devices, it is now commonly used to collect many files into one larger file for distribution or archiving, while preserving file system information such as user and group permissions, dates and directory structures.


tar's linear roots can still be seen in its ability to work on any data stream and in its slow partial extraction: because an archive has no central index, tar has to read through the whole archive to extract even just the final file. A tar file (somefile.tar) that is subsequently compressed with a utility such as gzip, bzip2 or (formerly) compress produces a compressed tar file whose filename extension indicates the type of compression (e.g. somefile.tar.gz). A .tar file is commonly referred to as a tarball, whether it is compressed or not.


As is common with Unix utilities, tar is a single specialist program. It follows the Unix philosophy in that it can "do only one thing" (archive), "but do it well". tar has no built-in data compression facilities, so it is most commonly used in tandem with an external compression utility. These utilities generally compress only a single file, hence the pairing with tar, which produces a single file from many files. To ease this common usage, the BSD and GNU versions of tar support the command-line options -z (gzip), -j (bzip2) and -Z (compress), which compress or decompress the archive being worked on; even in this case the (de)compression is still performed by an external program. Compression is sometimes avoided for long-term storage because it amplifies the effect of data corruption: a single damaged byte in a compressed stream can make the rest of the archive unreadable, whereas in an uncompressed tar file the damage usually stays confined to the affected file.
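As a minimal sketch of this division of labour, Python's standard tarfile module can write a gzip-compressed tarball in memory; undoing only the compression with the separate gzip module yields a plain tar stream again. The helper name make_tar_gz and the sample file names are hypothetical, chosen only for illustration.

```python
import gzip
import io
import tarfile


def make_tar_gz(files):
    """Build a gzip-compressed tar archive in memory from a {name: bytes} dict."""
    buf = io.BytesIO()
    # Mode "w:gz" writes a tar stream and passes it through gzip,
    # the in-library analogue of running tar with -z.
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


archive = make_tar_gz({
    "project/README": b"hello\n",
    "project/main.c": b"int main(void) { return 0; }\n",
})

# Strip only the compression layer: what is left is an ordinary tar stream,
# so the USTAR magic appears at byte 257 of the first header.
raw_tar = gzip.decompress(archive)
print(raw_tar[257:262])  # b'ustar'
```

On the command line, the corresponding GNU or BSD invocation would be tar -czf project.tar.gz project.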


Format details

A tar file is the concatenation of one or more files. Each file is preceded by a header block. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes and the extra space is zero filled. The end of an archive is marked by at least two consecutive zero-filled blocks.
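These layout rules reduce to simple arithmetic. The following Python sketch, with hypothetical helper names, computes how much space a set of files needs in the simplest possible archive: a 512-byte header plus zero-padded data per file, followed by two zero-filled end blocks. It assumes no extension headers are written and ignores the extra record padding most tar programs add.

```python
BLOCK_SIZE = 512


def padded_length(n):
    """Round a data length up to the next multiple of 512 bytes."""
    return (n + BLOCK_SIZE - 1) // BLOCK_SIZE * BLOCK_SIZE


def minimum_archive_size(file_sizes):
    """One header block plus padded data per file, then two zero-filled end blocks.

    Real tar programs usually pad further to a whole record (GNU tar's default
    record is 20 blocks, 10240 bytes); that extra padding is not modelled here.
    """
    members = sum(BLOCK_SIZE + padded_length(n) for n in file_sizes)
    return members + 2 * BLOCK_SIZE


print(padded_length(513))           # 1024
print(minimum_archive_size([513]))  # 2560 (512 header + 1024 data + 1024 end blocks)
```

For example, a 513-byte file occupies one header block and two data blocks, because its last byte spills into a second data block that is otherwise zero filled.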


A limitation of early tape drives was that data could only be written to them in 512-byte blocks. As a result, data in tar files is arranged in 512-byte blocks.
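Because a tar file has no index, a reader can only reach a given file by walking the headers one at a time, which is the sequential behaviour described earlier. The sketch below assumes an uncompressed archive held in memory and decodes only the name and size fields; list_members is a hypothetical name, and Python's tarfile module is used solely to produce test input.

```python
import io
import tarfile

BLOCK_SIZE = 512


def list_members(data):
    """Walk an uncompressed tar byte string and return (name, size) pairs.

    Only the original name and size fields are decoded; USTAR prefixes,
    GNU long-name records and pax headers are deliberately ignored.
    """
    members = []
    offset = 0
    while offset + BLOCK_SIZE <= len(data):
        header = data[offset:offset + BLOCK_SIZE]
        if header == bytes(BLOCK_SIZE):  # a zero-filled block marks the end
            break
        name = header[0:100].split(b"\0", 1)[0].decode("ascii", "replace")
        size_field = header[124:136].strip(b" \0")
        size = int(size_field, 8) if size_field else 0
        members.append((name, size))
        # There is no index: the only way to reach the next header is to skip
        # this header plus the file data rounded up to whole blocks.
        blocks = (size + BLOCK_SIZE - 1) // BLOCK_SIZE
        offset += BLOCK_SIZE * (1 + blocks)
    return members


# Build a small archive in memory with the standard library to walk over.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for name, payload in [("a.txt", b"x" * 513), ("b.txt", b"")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

print(list_members(buf.getvalue()))  # [('a.txt', 513), ('b.txt', 0)]
```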


File header

The file header block contains metadata about a file. To ensure portability across different architectures with different byte orderings, the information in the header block is encoded in ASCII. Thus if all the files in an archive are text files, then the archive is essentially an ASCII file.


The fields defined by the original Unix tar format are listed in the table below. When a field is unused it is zero filled. The header is padded with zero bytes to make it up to a 512 byte block.

Offset (bytes)  Size (bytes)  Field
0               100           File name
100             8             File mode
108             8             Owner user ID
116             8             Owner group ID
124             12            File size in bytes
136             12            Last modification time
148             8             Checksum for header block
156             1             Link indicator
157             100           Name of linked file
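The table translates directly into a list of offsets and sizes. This sketch (hypothetical names; only the fields listed above are decoded, and only their octal text form is handled) unpacks a single 512-byte header. Python's tarfile module is used only to generate a realistic header to parse.

```python
import tarfile

# (offset, size, key) for each field of the original Unix tar header.
V7_FIELDS = [
    (0, 100, "name"),
    (100, 8, "mode"),
    (108, 8, "uid"),
    (116, 8, "gid"),
    (124, 12, "size"),
    (136, 12, "mtime"),
    (148, 8, "checksum"),
    (156, 1, "link_indicator"),
    (157, 100, "linkname"),
]
NUMERIC_FIELDS = {"mode", "uid", "gid", "size", "mtime", "checksum"}


def parse_v7_header(block):
    """Decode the fields of one 512-byte header into a dict."""
    if len(block) != 512:
        raise ValueError("a tar header block is exactly 512 bytes")
    fields = {}
    for offset, size, key in V7_FIELDS:
        raw = block[offset:offset + size]
        if key in NUMERIC_FIELDS:
            # Octal digits terminated (or padded) by NUL or space.
            digits = raw.strip(b" \0")
            fields[key] = int(digits, 8) if digits else 0
        else:
            # Text fields are NUL-terminated unless they fill the whole field.
            fields[key] = raw.split(b"\0", 1)[0].decode("ascii", "replace")
    return fields


info = tarfile.TarInfo("etc/passwd")
info.size = 17133
info.mode = 0o644
header = parse_v7_header(info.tobuf(format=tarfile.USTAR_FORMAT))
print(header["name"], oct(header["mode"]), header["size"])  # etc/passwd 0o644 17133
```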

The Link indicator field can have the following values:

Value             Meaning
0                 Normal file
(ASCII NUL)[1]    Normal file
1                 Hard link
2                 Symbolic link[2]
3                 Character special
4                 Block special
5                 Directory
6                 FIFO
7                 Contiguous file[3]

A directory is also indicated by having a trailing slash (/) in the name.
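A reader typically dispatches on the link indicator byte at offset 156. The sketch below is one hypothetical way to do so, folding in the trailing-slash convention for directories; values outside the table are reported as unknown rather than guessed, and the helper names are illustrative only.

```python
import tarfile

LINK_INDICATORS = {
    b"0": "normal file",
    b"\0": "normal file",  # the NUL variant listed in the table
    b"1": "hard link",
    b"2": "symbolic link",
    b"3": "character special",
    b"4": "block special",
    b"5": "directory",
    b"6": "FIFO",
    b"7": "contiguous file",
}


def entry_kind(header):
    """Classify a 512-byte header by its link indicator (offset 156)."""
    kind = LINK_INDICATORS.get(header[156:157], "unknown")
    name = header[0:100].split(b"\0", 1)[0]
    # A directory may also be marked only by a trailing slash on its name.
    if kind == "normal file" and name.endswith(b"/"):
        return "directory"
    return kind


link = tarfile.TarInfo("current")
link.type = tarfile.SYMTYPE
link.linkname = "release-1.0"
print(entry_kind(link.tobuf(format=tarfile.USTAR_FORMAT)))  # symbolic link

# Hand-built header: name "docs/", link indicator "0", everything else zero.
old_style_dir = b"docs/".ljust(156, b"\0") + b"0" + bytes(355)
print(entry_kind(old_style_dir))  # directory
```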


For historical reasons, numerical values are encoded in octal with leading zeroes, and the final character is either a NUL or a space. Thus, although 12 bytes are reserved for storing the file size, only 11 octal digits can be stored, which limits archived files to 8^11 - 1 bytes (just under 8 GiB). To overcome this limitation, some versions of tar, including the GNU implementation, support an extension in which the file size is encoded in binary. Additionally, versions of GNU tar from 1999 and earlier pad the values with space characters instead of zero characters.
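The octal encoding and the resulting size limit can be checked with a short sketch. encode_octal and decode_octal are hypothetical helpers; the decoder strips spaces and NULs so that it also accepts space-padded fields, and the binary size extension mentioned above is not handled. As a cross-check against the example dump later on this page, its size field 00000041355 decodes to 17133 bytes.

```python
def encode_octal(value, width):
    """Zero-padded octal digits followed by a NUL, filling `width` bytes."""
    digits = width - 1  # the last byte is the terminator
    if not 0 <= value < 8 ** digits:
        raise OverflowError(f"{value} does not fit in {digits} octal digits")
    return ("%0*o" % (digits, value)).encode("ascii") + b"\0"


def decode_octal(field):
    """Parse an octal field, tolerating space padding and NUL/space terminators."""
    digits = field.strip(b" \0")
    return int(digits, 8) if digits else 0


# Largest value an 11-digit octal size field can hold.
MAX_V7_FILE_SIZE = 8 ** 11 - 1

print(encode_octal(17133, 12))     # b'00000041355\x00'
print(decode_octal(b"   644 \0"))  # 420, i.e. 0o644 in space-padded form
print(MAX_V7_FILE_SIZE)            # 8589934591
```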


The checksum is calculated by taking the sum of the unsigned byte values of the header block, with the eight checksum bytes taken to be ASCII spaces (value 32). It is stored as a six-digit octal number with leading zeroes, followed by a NUL and then a space.
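The checksum rule can be sketched as follows, with hypothetical helper names. The sketch sums unsigned byte values; Python's tarfile module is used only to produce a header to verify, and the last lines show that flipping a single bit makes verification fail.

```python
import tarfile

CHKSUM_START, CHKSUM_END = 148, 156


def header_checksum(header):
    """Unsigned sum of the 512 header bytes, counting the checksum field as 8 spaces."""
    if len(header) != 512:
        raise ValueError("a tar header block is exactly 512 bytes")
    return sum(header[:CHKSUM_START]) + 8 * ord(" ") + sum(header[CHKSUM_END:])


def format_checksum(value):
    """Six octal digits with leading zeroes, then a NUL, then a space."""
    return ("%06o" % value).encode("ascii") + b"\0 "


def checksum_ok(header):
    """Compare the stored checksum with one recomputed from the header bytes."""
    stored = header[CHKSUM_START:CHKSUM_END].strip(b" \0")
    return bool(stored) and int(stored, 8) == header_checksum(header)


block = tarfile.TarInfo("etc/passwd").tobuf(format=tarfile.USTAR_FORMAT)
print(checksum_ok(block))                                         # True
print(block[148:156] == format_checksum(header_checksum(block)))  # True

damaged = bytearray(block)
damaged[0] ^= 0x01  # flip one bit in the file name
print(checksum_ok(bytes(damaged)))                                # False
```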


USTAR format

Most modern tar programs read and write archives in the new USTAR (Uniform Standard Tape Archive) format, which has an extended header definition as defined by the POSIX (IEEE P1003.1) standards group. Older tar programs will ignore the extra information, while newer programs will test for the presence of the "ustar" string to determine if the new format is in use. The USTAR format allows for longer file names and stores extra information about each file.

Offset (bytes)  Size (bytes)  Field
0               156           (as in old format)
156             1             Type flag
157             100           (as in old format)
257             6             USTAR indicator
263             2             USTAR version
265             32            Owner user name
297             32            Owner group name
329             8             Device major number
337             8             Device minor number
345             155           Filename prefix

Example

The example below shows a character dump of a header block from a tar file created using the GNU tar program. It was dumped with the od program; offsets are in octal, and "nul" and "sp" stand for NUL and space bytes. The "ustar" magic string can be seen, meaning that the tar file is in USTAR format.

    0000000   e   t   c   /   p   a   s   s   w   d nul nul nul nul nul nul
    0000020 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
    *
    0000140 nul nul nul nul   0   1   0   0   6   4   4 nul   0   0   0   0
    0000160   0   0   0 nul   0   0   0   0   0   0   0 nul   0   0   0   0
    0000200   0   0   4   1   3   5   5 nul   1   0   1   5   5   0   6   1
    0000220   1   0   5 nul   0   1   1   5   5   6 nul  sp   0 nul nul nul
    0000240 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
    *
    0000400 nul   u   s   t   a   r  sp  sp nul   r   o   o   t nul nul nul
    0000420 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
    0000440 nul nul nul nul nul nul nul nul nul   r   o   o   t nul nul nul
    0000460 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
    *
    0001000

Note that the OpenBSD 3.7 tar does not write the two space characters after "ustar"; it writes NUL characters there instead.
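Since the bytes following "ustar" differ between writers, a reader can use them to guess which dialect produced a header. The sketch below compares the 8 bytes at offset 257 (USTAR indicator plus version) with the POSIX form "ustar" NUL "00" and the GNU form "ustar" space space NUL. The labels and the synthetic OpenBSD-style header at the end are illustrative assumptions, not a complete catalogue of tar variants.

```python
import tarfile


def header_flavor(header):
    """Guess which tar dialect wrote a header from the 8 bytes at offset 257."""
    magic = header[257:265]  # 6-byte USTAR indicator + 2-byte USTAR version
    if magic == b"ustar\0" + b"00":
        return "POSIX ustar"
    if magic == b"ustar  \0":
        return "old GNU"
    if magic[:5] == b"ustar":
        return "other ustar variant"
    return "pre-POSIX (no magic)"


for fmt in (tarfile.USTAR_FORMAT, tarfile.GNU_FORMAT):
    print(header_flavor(tarfile.TarInfo("x").tobuf(format=fmt)))
# POSIX ustar
# old GNU

# Hypothetical header with NULs where GNU tar writes the two spaces.
print(header_flavor(bytes(257) + b"ustar\0\0\0" + bytes(247)))  # other ustar variant
```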


Tarbombs

Tarbomb is derogatory hacker slang for a tarball whose files extract into the current directory instead of into a directory of their own. This is a problem if the archive overwrites files of the same name already in the current directory, and it is a nuisance for the user, who then has to find and delete all the files scattered among the directory's existing contents. Often this ends up happening in the user's home directory. Such behaviour is generally considered bad etiquette on the part of the archive's creator.
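One way to spot a tarbomb before extracting is to list the archive (for example with tar -tf) and check that every entry lives under a single top-level directory. The Python sketch below applies that heuristic with the standard tarfile module; is_tarbomb and in_memory_archive are hypothetical names, and the check does not treat other hazards, such as absolute paths or .. components, specially.

```python
import io
import tarfile


def is_tarbomb(archive):
    """True if extracting `archive` would scatter entries into the current directory.

    Heuristic: every member must live under one shared top-level directory,
    and nothing but that directory may sit at the top level itself.
    """
    top_levels = set()
    for member in archive.getmembers():
        parts = [p for p in member.name.split("/") if p not in ("", ".")]
        if not parts:
            continue  # an entry for "." itself
        top_levels.add(parts[0])
        if len(parts) == 1 and not member.isdir():
            return True  # a file lands directly in the current directory
    return len(top_levels) > 1


def in_memory_archive(entries):
    """Hypothetical helper: build a tar of empty files/directories from (name, is_dir) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, is_dir in entries:
            info = tarfile.TarInfo(name)
            if is_dir:
                info.type = tarfile.DIRTYPE
            tar.addfile(info)
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


tidy = in_memory_archive([("pkg", True), ("pkg/main.c", False), ("pkg/doc/README", False)])
bomb = in_memory_archive([("main.c", False), ("README", False)])
print(is_tarbomb(tidy), is_tarbomb(bomb))  # False True
```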


Tarpit

Tarpit is a term for a method of revision control in which a tarball is used to capture the state of development of a software module at a particular point in time. A tarpit typically mirrors, loosely, the tags and branches of revision control software through the use of descriptive tarball names.


Notes

  1. ^ This is probably a workaround for buggy tar implementations (the byte 0x00 is ASCII NUL).
  2. ^ GNU tar's headers mark this field as "Reserved"[1]
  3. ^ Apparently relevant on an OS called RTU, where it denotes a normal file written in one contiguous section on disk. GNU tar's headers mark this field as 'Reserved', and such items will probably be extracted as normal files on other operating systems.



External links

  • The tar Command by The Linux Information Project (LINFO)
  • Official website of GNU tar
  • The file 'tar.h' from GNU tar
  • Detailed information on tar and USTAR file headers
  • linux tar command simplified
  • tar(1) man page via OpenBSD

The Wikipedia article included on this page is licensed under the GFDL.