B-trees are tree data structures that are most commonly found in databases and filesystem implementations. B_trees keep data sorted and allow amortized logarithmic time insertions and deletions. Conceptually speaking, B_trees grow from the bottom up as elements are inserted, whereas most binary trees generally grow down.
The idea behind B-trees is that inner nodes can have a variable number of child nodes within some pre-defined range. In consequence, B-trees do not need re-balancing as frequently as other self_balancing binary search trees. The lower and upper bounds on the number of child nodes are fixed for a particular implementation. For example, in a 2-3 B-tree (often simply 2-3 tree), each internal node may have only 2 or 3 child nodes. A node is considered to be in an illegal state if it has an invalid number of child nodes.
B-trees have substantial advantages when the time needed to access another node may be significantly greater than the time needed to access values within a node. This is the case when an arbitrary node may reside on secondary storage until read into main memory, hence the use of B-trees in databases. Arranging that the node can have a relatively large number of child nodes increases the advantage.
There is some debate as to what B stands for. The most common belief is that B may stand for balanced, as all the leaf nodes appear at the same level in the tree (this is described as a balanced tree state). B may also stand for Rudolf Bayer, the designer of this data structure.
Inner node structures
Generally speaking, the "separation values" can simply be the values of the tree.
Each inner node has separation values which divide its sub-trees. For example, if an inner node has three child nodes (or sub-trees) then it must have two separation values a1 and a2. All values less than a1 will be in the leftmost sub-tree, values between a1 and a2 will be in the middle sub-tree, and values greater than a2 will be in the rightmost sub-tree.
Steps for deletion
- If after removing the desired node, no inner node is in an illegal state then the process is finished.
- If some inner node is in an illegal state then there are two possible cases:
- Its sibling node (a child of the same parent node) can transfer one of its child nodes to the current node and return it to a legal state. If so, after updating the separation values in the parent and the two siblings the operation ends.
- Its sibling does not have an extra child because it is on the lower bound too. In that case both these nodes are merged into a single node and the action is transferred to the parent node, since it has had a child node removed.
The process continues until the parent node remains in a legal state or until the root node is reached.
Steps for insertion
- If after inserting the node into the appropriate position, no inner node is in an illegal state then the process is finished.
- If some node has more than the maximum amount of child nodes then it is split into two nodes, each with the minimum amount of child nodes. This process continues action recursively in the parent node.
The action stops when either the node is in a legal state or the root is split into two nodes and a new root is inserted.
Searching is performed very similar to a binary tree search, simply by following the separation values until the value is found or the end of the tree is reached.
Suppose L is the least number of children a node is allowed to have, while U is the most number. Then each node will always have between L and U children, inclusively, with one exception: the root node may have anywhere from 2 to U children inclusively, or in other words, it is exempt from the lower bound restriction, instead having a lower bound of its own (2). This allows the tree to hold small numbers of elements. The root having one child makes no sense, since the subtree attached to that child could simply be attached to the root. Giving the root no children is also unnecessary, since a tree with no elements is typically represented as having no root node.
Robert Tarjan proved that the amortized number of splits/merges is 2.
- Rudolf Bayer, Binary B-Trees for Virtual Memory, ACM-SIGFIDET Workshop 1971, San Diego, California, Session 5B, p. 219-235.
- Rudolf Bayer and McCreight, E. M. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173-189, 1972.
- Donald E. Knuth, "The Art of Computer Programming", second edition, volume 3, section 6.2.4, 1997.