- See Filing system for this term as it is used in libraries and offices
In computing, a file system is a method for storing and organizing computer files and the data they contain to make it easy to find and access them. File systems may use a storage device such as a hard disk or CD-ROM and involve maintaining the physical location of the files, or they may be virtual and exist only as an access method for virtual data or for data over a network (e.g. NFS).
More formally, a file system is a set of abstract data types that are implemented for the storage, hierarchical organization, manipulation, navigation, access, and retrieval of data.
Aspects of file systems
The most familiar file systems make use of an underlying data storage device that offers access to an array of fixed-size blocks, sometimes called sectors, generally 512 bytes each. The file system software is responsible for organizing these sectors into files and directories, and keeping track of which sectors belong to which file and which are not being used.
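As a rough illustration of this bookkeeping, the following Python sketch models a device as a list of 512-byte sectors and records which sectors belong to which file and which are free. The class ToyBlockStore and its methods are hypothetical names chosen for this example; no real file system's on-disk layout is implied.

```python
SECTOR_SIZE = 512  # bytes per sector, as in the text above


class ToyBlockStore:
    """Toy model: a device is an array of fixed-size sectors."""

    def __init__(self, sector_count):
        self.sectors = [bytes(SECTOR_SIZE)] * sector_count
        self.free = set(range(sector_count))   # sectors not in use
        self.files = {}                        # name -> (sector list, length)

    def write_file(self, name, data):
        if name in self.files:
            self.delete_file(name)             # overwrite: release old sectors
        needed = -(-len(data) // SECTOR_SIZE)  # ceiling division
        if needed > len(self.free):
            raise OSError("no space left on toy device")
        chosen = sorted(self.free)[:needed]
        for i, sector in enumerate(chosen):
            chunk = data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
            self.sectors[sector] = chunk.ljust(SECTOR_SIZE, b"\0")
            self.free.remove(sector)
        self.files[name] = (chosen, len(data))

    def read_file(self, name):
        chosen, length = self.files[name]
        return b"".join(self.sectors[s] for s in chosen)[:length]

    def delete_file(self, name):
        chosen, _ = self.files.pop(name)
        self.free.update(chosen)               # sectors become reusable


store = ToyBlockStore(sector_count=8)
store.write_file("notes.txt", b"x" * 700)   # 700 bytes need two sectors
print(store.files["notes.txt"][0])          # sectors assigned to the file
print(len(store.free))                      # sectors still unused
```

A 700-byte file occupies two whole sectors here, which shows why the file's length has to be recorded separately from its sector list.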
However, file systems need not make use of a storage device at all. A file system can be used to organize and represent access to any data, whether it be stored or dynamically generated (e.g., from a network connection).
Whether or not the file system has an underlying storage device, file systems typically have directories which associate file names with files, usually by connecting the file name to an index into a file allocation table of some sort, such as the FAT in an MS-DOS file system, or an inode in a UNIX-like filesystem. Directory structures may be flat, or allow hierarchies where directories may contain subdirectories. In some file systems, file names are structured, with special syntax for filename extensions and version numbers. In others, file names are simple strings, and per-file metadata is stored elsewhere.
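The association between names and an index can be observed from a program. The sketch below assumes a Unix-like system and a file system that supports hard links; it uses Python's standard os module to give one file two names and checks that both names lead to the same inode number. The file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    original = os.path.join(directory, "report.txt")
    second_name = os.path.join(directory, "copy-of-report.txt")

    with open(original, "w") as handle:
        handle.write("hello")

    # A hard link adds a second directory entry for the same file.
    os.link(original, second_name)

    # Both names resolve to the same inode number.
    same_file = os.stat(original).st_ino == os.stat(second_name).st_ino
    link_count = os.stat(original).st_nlink
    names = sorted(os.listdir(directory))

print(same_file, link_count, names)
```

The directory holds two entries, but the inode's link count shows that they refer to a single file.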
The hierarchical filesystem was an early research interest of Dennis Ritchie of Unix fame; previous implementations were restricted to only a few levels, notably the IBM implementations, even of their early databases like IMS. After the success of Unix, Ritchie extended the filesystem concept to every object in his later operating system developments, such as Plan 9.
Traditional filesystems offer facilities to create, move and delete both files and directories. They lack facilities to create additional links to a directory (hard links in Unix), rename parent links (".." in Unix-like OS), and create bidirectional links to files.
Traditional filesystems also offer facilities to truncate, append to, create, move, delete and in-place modify files. They do not offer facilities to prepend to or truncate from the beginning of a file, let alone arbitrary insertion into or deletion from a file. The operations provided are highly asymmetric and lack the generality to be useful in unexpected contexts. For example, interprocess pipes in Unix have to be implemented outside of the filesystem because it does not offer truncation from the beginning of files.
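The asymmetry can be seen in a short sketch using Python's standard library. Appending and truncating at the end map directly onto file system operations, whereas the prepend function below (a hypothetical helper written for this example) has to read the whole file and write it out again.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb") as f:
    f.write(b"middle")

# Appending and truncating at the end are direct operations.
with open(path, "ab") as f:
    f.write(b"-end")
os.truncate(path, 6)              # drop the appended bytes again


def prepend(path, prefix):
    """No direct operation exists: read everything and rewrite the file."""
    with open(path, "rb") as f:
        old = f.read()
    with open(path, "wb") as f:
        f.write(prefix + old)


prepend(path, b"start-")
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)
```

The cost of prepend grows with the size of the file, while the cost of an append does not.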
Secure access to basic file system operations can be based on a scheme of access control lists or capabilities. Research has shown access control lists to be difficult to secure properly, which is why research operating systems tend to use capabilities. Commercial file systems still use access control lists. See also: secure computing.
Types of file systems
File system types can be classified into disk file systems, network file systems and special purpose file systems.
Disk file systems
A disk file system is a file system designed for the storage of files on a data storage device, most commonly a disk drive, which might be directly or indirectly connected to the computer. Examples of disk file systems include FAT, NTFS, ext2, ISO 9660, ODS-5, and UDF.
Some disk file systems are also journaling file systems or versioning file systems.
Network file systems
A network file system is a file system where the files are accessed over a network, potentially simultaneously by several computers. Ideally, access to network filesystems is user transparent. Examples include NFS and CIFS.
Database file systems
A newer approach to file management is the database-based file system. Instead of hierarchical structured management, files are identified by their characteristics, such as file type, topic, author, or similar metadata. A file search can therefore be formulated in SQL or in natural language, for example as a query for "Movies that were directed by Spielberg".
Early examples of such file systems are GNOME Storage and WinFS.
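A metadata query of this kind might look roughly as follows. The sketch uses Python's built-in sqlite3 module with an in-memory database. The table layout, column names and sample rows are invented for illustration and do not describe the schema of GNOME Storage, WinFS or any other product.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, type TEXT)")
db.execute(
    "CREATE TABLE attributes (file_id INTEGER, attr_name TEXT, attr_value TEXT)"
)

# Made-up sample data: files plus free-form metadata about them.
db.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    (1, "jaws.mpg", "movie"),
    (2, "alien.mpg", "movie"),
    (3, "budget.xls", "spreadsheet"),
])
db.executemany("INSERT INTO attributes VALUES (?, ?, ?)", [
    (1, "director", "Spielberg"),
    (2, "director", "Scott"),
    (3, "author", "Accounting"),
])

# "Movies that were directed by Spielberg", expressed over metadata
# rather than over a directory path.
rows = db.execute("""
    SELECT f.name
    FROM files AS f
    JOIN attributes AS a ON a.file_id = f.id
    WHERE f.type = 'movie'
      AND a.attr_name = 'director'
      AND a.attr_value = 'Spielberg'
    ORDER BY f.name
""").fetchall()

print(rows)
```

The query never mentions where a file is stored; the location in a hierarchy plays no part in finding it.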
Special purpose file systems
A special purpose file system is basically any file system that is not a disk file system or network file system. This includes systems where the files are arranged dynamically by software, intended for such purposes as communication between computer processes or temporary file space. Special purpose file systems are most commonly used by file-centric operating systems such as Unix. Examples include the '/proc' filesystem used by some Unix variants, which grants access to information about processes and other operating system features.
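Because such a file system presents its information as ordinary files, no special interface is needed to read it. The sketch below assumes the Linux layout of /proc, where /proc/self/status describes the calling process; other Unix variants arrange /proc differently or lack it, in which case the function (a hypothetical helper) returns None.

```python
import os


def read_proc_status(pid="self"):
    """Return the fields of /proc/<pid>/status as a dict, or None if absent."""
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return None               # no procfs here, or no such process
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


status = read_proc_status()
if status is not None:
    print(status.get("Name"), status.get("Pid"))
```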
File systems and operating systems
Most operating systems provide a file system, as a file system is an integral part of any modern operating system. Early microcomputer operating systems' only real task was file management - a fact reflected in their names (see DOS and QDOS). Some early operating systems had a separate component for handling file systems which was called a disk operating system. On some microcomputers, the disk operating system was loaded separately from the rest of the operating system.
Because of this, the operating system software needs to provide an interface between the user and the file system. This interface can be textual (as provided by a command line interface, such as the Unix shell or OpenVMS DCL) or graphical (as provided by a graphical user interface, such as a file browser). If graphical, the metaphor of the folder, containing documents, other files, and nested folders, is often used (see also: directory and folder).
File systems under Unix
Unix and Unix-like operating systems assign a device name to each device, but this is not how the files on that device are accessed. Unix creates a virtual file system, which makes all the files on all the devices appear to exist under a single hierarchy. This means, in Unix, there is one root directory, and every file existing on the system is located under it somewhere. Furthermore, the Unix root directory does not have to be in any physical place. It might not be on your first hard drive - it might not even be on your computer. Unix can use a network shared resource as its root directory.
To gain access to files on another device, you must first inform the operating system where in the directory tree you would like those files to appear. This process is called mounting a file system. For example, to access the files on a CD-ROM, informally, one must tell the operating system "Take the file system from this CD-ROM and make it appear under the directory /mnt". The directory given to the operating system is called the mount point - in this case it is /mnt. The /mnt directory exists on many Unix-like systems, where it is intended specifically for use as a mount point for temporary media like floppy disks or CDs. It may be empty, or it may contain subdirectories for mounting individual devices. Generally, only the administrator (i.e. root user) may authorize the mounting of file systems.
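How a single hierarchy can span several devices may be sketched with a small lookup table. The mount table below and the resolve function are hypothetical and greatly simplified; a real kernel resolves paths one component at a time. The idea shown is only that the longest mount point containing a path decides which device serves it.

```python
import posixpath

# Hypothetical mount table: mount point -> description of the device.
MOUNTS = {
    "/": "root file system on first disk",
    "/mnt": "CD-ROM",
    "/home": "second disk",
}


def resolve(path):
    """Return (device, path relative to that device's root)."""
    path = posixpath.normpath(path)
    best = "/"
    for mount_point in MOUNTS:
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best):
            best = mount_point        # a deeper mount point wins
    return MOUNTS[best], posixpath.relpath(path, best)


print(resolve("/mnt/docs/readme.txt"))
print(resolve("/etc/fstab"))
```

A path such as /mntx/file is served by the root file system in this sketch, because only whole path components are compared against a mount point.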
Unix-like operating systems often include software and tools that assist in the mounting process and provide it with new functionality. Some of these strategies have been termed "auto-mounting" as a reflection of their purpose.
- In many situations, filesystems other than the root need to be available as soon as the operating system has booted. All Unix-like systems therefore provide a facility for mounting filesystems at boot time. System administrators define these filesystems in the configuration file fstab, which also indicates options and mount points.
- In some situations, there is no need to mount certain filesystems at boot time, although their use may be desired thereafter. There are some utilities for Unix-like systems that allow the mounting of predefined filesystems upon demand.
- Removable media have become very common with microcomputer platforms. They allow programs and data to be transferred between machines without a physical connection. Two common examples include CD-ROMs and DVDs. Utilities have therefore been developed to detect the presence and availability of a medium and then mount that medium without any user intervention.
- Progressive Unix-like systems have also introduced a concept called supermounting. For example, a floppy disk that has been supermounted can be physically removed from the system. Under normal circumstances, the disk should have been synchronised and then unmounted before its removal. Provided synchronisation has occurred, a different disk can be inserted into the drive. The system automatically notices that the disk has changed and updates the mount point contents to reflect the new medium.
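The fstab file mentioned in the first item above is a plain text table. The sketch below assumes the six-column layout used on Linux and the BSDs (device, mount point, type, options, dump flag, check order); some other Unix systems use a different file and format. The sample entries and the parse_fstab function are invented for illustration.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: list
    dump: int
    fsck_pass: int


def parse_fstab(text):
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blanks and comments
        fields = line.split()
        if len(fields) < 4:
            continue                       # too few fields for this sketch
        dump = int(fields[4]) if len(fields) > 4 else 0
        fsck_pass = int(fields[5]) if len(fields) > 5 else 0
        entries.append(FstabEntry(fields[0], fields[1], fields[2],
                                  fields[3].split(","), dump, fsck_pass))
    return entries


# Made-up example contents, not taken from any real system.
SAMPLE = """\
# device      mount point  type     options      dump pass
/dev/sda1     /            ext2     defaults     1    1
/dev/cdrom    /mnt/cdrom   iso9660  ro,noauto    0    0
"""

for entry in parse_fstab(SAMPLE):
    print(entry.mount_point, entry.fs_type, entry.options)
```

In this format the noauto option marks a file system that is defined in advance but not mounted at boot time, which corresponds to the second item in the list above.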
File systems under Mac OS X
Everything that applies to file systems under Unix applies to file systems under Mac OS X. Mac OS X uses a version of HFS+ with journaling. HFS+ is a 32-bit filesystem. It is case-preserving (HFS+ remembers and displays the case of a filename, but does not allow multiple files in the same folder whose names differ only by case). HFS+ is a metadata-rich filesystem. HFS+ uses permissions. HFS+ is self-optimizing.
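Case-preserving behaviour can be modelled in a few lines. The class below is a toy written for this example; it compares names with Python's str.casefold, which is an assumption made for simplicity and not the comparison rule that HFS+ itself defines.

```python
class CasePreservingFolder:
    """Toy folder: names keep their case but are compared without it."""

    def __init__(self):
        self._entries = {}           # folded name -> (name as given, contents)

    def create(self, name, contents=b""):
        key = name.casefold()
        if key in self._entries:
            existing = self._entries[key][0]
            raise FileExistsError(f"{name!r} clashes with {existing!r}")
        self._entries[key] = (name, contents)

    def lookup(self, name):
        return self._entries[name.casefold()]

    def listing(self):
        return sorted(stored for stored, _ in self._entries.values())


folder = CasePreservingFolder()
folder.create("ReadMe.txt", b"hello")
print(folder.listing())              # the original case is kept
print(folder.lookup("README.TXT"))   # lookup ignores case
try:
    folder.create("readme.TXT")
except FileExistsError as error:
    print("refused:", error)
```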
General overview of features running on top of HFS+
Filenames can be up to 255 characters. HFS+ stores filenames as Unicode, encoded in UTF-16. The metadata, not the filename, indicates the file type.
HFS+ has three kinds of links:
- Hard links
- Symbolic links
- Aliases
Aliases contain the file ID as well as the name and path of the original object. One can move and rename the original object without breaking aliases. If one deletes the original object and replaces it, the path and name contained in the alias will allow it to connect to the new object. Aliases make up over 90% of all links on HFS+.
Technical overview of HFS+
HFS Plus volumes are divided into sectors (called logical blocks in HFS) that are usually 512 bytes in size. These sectors are then grouped together into allocation blocks, each of which can contain one or more sectors; the number of allocation blocks depends on the total size of the volume. HFS Plus uses a larger value to address allocation blocks than HFS, 32 bits rather than 16 bits; this means it can address 4,294,967,296 allocation blocks rather than the 65,536 allocation blocks available to HFS.
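The effect of the wider address can be worked through numerically. The sketch below follows the figures in the text (16-bit and 32-bit block addresses, 512-byte sectors) and computes the smallest allocation block, in whole sectors, that still lets every part of a volume be addressed. The function names are hypothetical, and real volumes may be formatted with other block sizes.

```python
SECTOR_SIZE = 512  # bytes, the usual sector size mentioned above


def max_allocation_blocks(address_bits):
    return 2 ** address_bits


def smallest_allocation_block(volume_bytes, address_bits):
    """Smallest whole-sector block size that lets the address cover the volume."""
    blocks = max_allocation_blocks(address_bits)
    size = -(-volume_bytes // blocks)                 # ceiling division
    sectors = max(1, -(-size // SECTOR_SIZE))         # round up to whole sectors
    return sectors * SECTOR_SIZE


GIB = 2 ** 30
print(max_allocation_blocks(16), max_allocation_blocks(32))
print(smallest_allocation_block(GIB, 16))   # 16-bit addressing, as in HFS
print(smallest_allocation_block(GIB, 32))   # 32-bit addressing, as in HFS Plus
```

For a 1 GiB volume, 16-bit addressing forces allocation blocks of at least 16,384 bytes, so even a one-byte file consumes 16 KB, while 32-bit addressing allows blocks as small as a single 512-byte sector.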
Typically an HFS Plus volume is embedded inside an HFS Wrapper. The wrapper was designed for two purposes. It allowed Macintosh computers without HFS Plus support in their ROM to boot HFS Plus volumes. It also was designed to help users transition to HFS Plus, by including a minimal HFS volume with a read-only file called Where_have_all_my_files_gone?, explaining to users with versions of the Mac OS without HFS Plus, that the volume requires a system with HFS Plus support.
There are nine structures that make up a typical HFS Plus volume:
- Sectors 0 and 1 of the volume are HFS boot blocks. These are identical to the boot blocks in an HFS volume. They are part of the HFS wrapper.
- Sector 2 contains the Volume Header, equivalent to the Master Directory Block in an HFS volume. The Volume Header stores a wide variety of data about the volume itself, for example the size of allocation blocks, a timestamp that indicates when the volume was created, or the location of other volume structures such as the Catalog File or Extents Overflow File. The Volume Header is always located in the same place.
- The Allocation File keeps track of which allocation blocks are free and which are in use. It is similar to the Volume Bitmap in HFS: each allocation block is represented by one bit. A zero means the block is free and a one means the block is in use. The main difference from the HFS Volume Bitmap is that the Allocation File is stored as a regular file; it does not occupy a special reserved space near the beginning of the volume. The Allocation File can also change size and does not have to be stored contiguously within a volume.
- The Catalog File is a B-tree that contains records for all the files and directories stored in the volume. The HFS Plus Catalog File is very similar to the HFS Catalog File; the main difference is that records are larger, to allow more fields and to allow those fields to be larger (for example, to allow the longer 255-character Unicode file names in HFS Plus). A record in the HFS Catalog File is 512 bytes in size; a record in the HFS Plus Catalog File is 4 KB in Mac OS and 8 KB in Mac OS X. Fields in HFS are of fixed size; in HFS Plus the size can vary depending on the actual size of the data they store.
- The Extents Overflow File is another B-tree that records the allocation blocks that are allocated to each file as extents. Each file record in the Catalog File is capable of recording eight extents for each fork of a file; once those are used, further extents are recorded in the Extents Overflow File. Bad blocks are also recorded as extents in the Extents Overflow File. The default size of an extent record is 1 KB in Mac OS and 4 KB in Mac OS X.
- The Attributes File is a new B-tree in HFS Plus that does not have a corresponding structure in HFS. The Attributes File can store three different types of 4 KB records: Inline Data Attribute records (which are not currently used but reserved for future use), Fork Data Attribute records and Extension Attribute records. Fork Data Attribute records contain references to extents that hold large attributes. Extension Attributes are used to extend a Fork Data Attribute record when its eight extent records are already used.
- The Startup File is designed for non-Mac OS systems that don't have HFS or HFS Plus support. It is similar to the Boot Blocks of an HFS volume.
- The second to last sector contains the Alternate Volume Header equivalent to the Alternate Master Directory Block of HFS.
- The last sector in the volume is reserved for use by Apple. It is used during the computer manufacturing process.
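The division of labour between the Catalog File and the Extents Overflow File described above can be sketched as follows. An extent is modelled as a (start block, block count) pair, and the limit of eight extents per fork is taken from the text. The function names are hypothetical, and the actual record formats are not reproduced.

```python
INLINE_EXTENTS = 8   # extents held in the catalog record, per the text


def to_extents(blocks):
    """Collapse block numbers (in file order) into (start, count) runs."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, count = extents[-1]
            extents[-1] = (start, count + 1)   # extend the current run
        else:
            extents.append((block, 1))         # start a new run
    return extents


def split_extents(extents):
    """First eight extents stay in the catalog record; the rest overflow."""
    return extents[:INLINE_EXTENTS], extents[INLINE_EXTENTS:]


# A contiguous file needs a single extent.
print(to_extents([10, 11, 12, 13]))

# A badly fragmented file: ten separate single-block runs.
fragmented = to_extents(list(range(100, 120, 2)))
in_catalog, in_overflow = split_extents(fragmented)
print(len(in_catalog), len(in_overflow))
```

Only fragmented files spill into the overflow structure, so a lookup for an unfragmented file never needs to leave its catalog record.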
File systems under Plan 9
Plan 9 was originally designed to extend some of Unix's good points, and to introduce some new ideas of its own. With respect to file systems, the Unix system of treating things as files was continued, but in Plan 9, everything is treated as a file, and accessed as a file would be. Secondly, the underlying 9P protocol was used to ensure that the difference between a file existing on a remote system and a file existing on a local system was basically nil (apart from a possible difference in latency). This had the advantage that a device or devices, represented by files, on a remote computer, could be used as though it were the local computer's own device(s). This means that under Plan 9, multiple file servers provide access to devices, classing them as special file systems.
Everything on a Plan 9 system has, then, an abstraction as a file. For example, FTP connections are not handled by a dedicated program; instead the ftpfs server mounts the remote hierarchy as part of the local filesystem hierarchy, and the remote files are accessed as if they were local. As another example, the mail system uses file servers that synthesize virtual files and directories to represent your mailbox as /mail/fs/mbox.
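The idea of a file server that synthesizes its files can be illustrated with a toy model in which every path is backed by a function and not by stored bytes. This is not the 9P protocol, and apart from the /mail/fs/mbox path taken from the text, the class, paths and mailbox contents are hypothetical.

```python
class SyntheticFileServer:
    """Toy server: each path is backed by a function instead of stored bytes."""

    def __init__(self):
        self._files = {}

    def register(self, path, generator):
        self._files[path] = generator

    def listdir(self, prefix):
        prefix = prefix.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self._files
                      if p.startswith(prefix))

    def read(self, path):
        return self._files[path]()     # content is produced at read time


messages = ["Subject: hello", "Subject: lunch?"]   # made-up mailbox contents

server = SyntheticFileServer()
for number, message in enumerate(messages, start=1):
    server.register(f"/mail/fs/mbox/{number}", lambda m=message: m)
# Hypothetical extra file whose content is computed on each read.
server.register("/mail/fs/count", lambda: str(len(messages)))

print(server.listdir("/mail/fs/mbox"))
print(server.read("/mail/fs/mbox/2"))
print(server.read("/mail/fs/count"))
```

A client of such a server needs only the usual list and read operations, which is the property the Plan 9 design relies on.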
File systems under Microsoft Windows
Microsoft Windows developed from an earlier operating system (MS-DOS, which in turn was based on CP/M-80, which took many ideas from still earlier operating systems, notably several from DEC), and has added both file system and user interface ideas from several other sources since its first release (Unix, OS/2, etc.). As such, Windows makes use of modified versions of the simple FAT and HPFS file systems, FAT32 and NTFS respectively. Older versions of the FAT file system had file name length limits and restrictions on the maximum size of FAT-formatted disks or partitions. NTFS added ACL-based permission control and inherited HPFS's automatic control of fragmentation, which had been a problem for FAT-based file systems.
NTFS itself supports many features, some of which are not exposed or used by Windows. The features include hard links, multiple file streams, attribute indexing, quota tracking, compression and mount points for other file systems (called "junctions").
Windows, when using FAT32 and NTFS, makes use of drive letters in order to distinguish the physical location of files on a disk. For example, the path C:\WINNT represents a directory WINNT on the drive represented by the letter C. The C drive is most commonly used for the primary hard disk, on which Windows is installed and from which it boots. This "tradition" has become so firmly ingrained that bugs arose in older versions of Windows, which assumed that the operating system was installed on drive C. The tradition of using "C" for the drive letter can be traced to MS-DOS, where the letters A and B were reserved for up to two floppy disk drives.
Since Windows interacts with the user via a graphical user interface, its literature (help files, icon labels, ...) all refer to a "directory" as a folder which contains files. Network locations can be mapped to drive letters: for example, Z: could represent a location on a different server.
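The structure of such paths can be examined with the PureWindowsPath class from Python's standard pathlib module, which applies Windows path rules on any platform without touching a disk. The directory and file names below are made up for the example.

```python
from pathlib import PureWindowsPath

local = PureWindowsPath(r"C:\WINNT\system32")
print(local.drive)        # drive letter part
print(local.parts)        # root plus each directory name

# A drive letter mapped to a network location parses the same way.
mapped = PureWindowsPath(r"Z:\reports\q1.txt")
print(mapped.drive, mapped.name)
```

Nothing in the path itself reveals whether Z: is a local disk or a network location; that association is held by the operating system.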
Newer file systems, such as WinFS, aim to move beyond the folder metaphor and into more general categories, with greater use of metadata.
File systems under OpenVMS
This topic is discussed here: Files-11
- Filesystem Specifications and Technical Whitepapers (http://www.forensics.nl/filesystems)