Deep dive into B and B+ Trees and how they are useful for indexing in database

Introduction

The aim of this post is to explain B and B+ tree data structure and how they are connected with indexes in the database. They are useful for reducing the time for executing a query within a database by making data retrieval faster.

We will go through the following topics for understanding this concept.

  1. M-way search tree
  2. Data representation of M-way search tree
  3. B trees
  4. B+ trees

If you like the following article please subscribe to the blog, to receive notification of newer posts.

M-Way Search Tree

Most of us are familiar with the concepts of a binary search tree, where each node has 1 key and at most 2 children per node. Keys in the left subtree are smaller than the keys on the right subtree. If the tree node has at most 3 children it’s called a 3-way search tree, 4 children then it’s called 4-way search tree and so on. In general, if the node in the tree has m number of children then it’s called an M-way search tree. Below is a visual example of BST and 3 way search tree.

Thus keys are arranged in increasing order within the node( K1 < K2 < K3<….<Kn). In a 3-way search tree, there are 2 keys per node and max number of child nodes is 3. So in general, for the M-Way search tree, every node has M-1 key and M children.

We can also say that BST is a 2-way search tree or a type of M-Way search tree.

Data representation of M-way search tree

Below is an example of how in terms of data structure each node of the M-way search tree could be represented. For simplicity, the diagram below shows a 4-Way search tree node representation.

As you can observe the node contains 2 things.

  1. Child Pointer
  2. Key-value

Since it’s a 4-way search tree there are 3 key values represented as K1, K2 and K3 plus 4 child pointers denoted as CP1, CP2, CP3, and CP4.

To make use of this M-Way search tree in the database we also have a record pointer associated with each Key. This record pointer points to the specific block of records in the database. The example diagram is shown below.

B Trees

There is one problem with the M-Way search tree. The new node creation process does not have any specific rules defining it. Suppose it’s a 10-way search tree and we want to insert values such as 10,20,30,…,N.

We Could end up with a tree shown below. Which is still a valid 10-Way search tree.

As you can see we are creating new nodes without filling up the empty key values in the same node. We still end up with a valid tree. That’s why B-trees solve this problem by providing a guideline for creating M-Way search tree.

Rules for creating B-Trees

  1. For every node except the root should have at least M/2 keys filled before creating a new node.
  2. The root can have a min of 2 children.
  3. All Leaf nodes should be at the same level.
  4. The creation process is bottom-up.

Thus we can say that B-trees are just M-Way search trees with additional rules defining the node creation process.

Let’s go through an example explaining how the B-tree is created step-by-step.

Suppose M = 4

Keys to insert = 10,20,40,50,60,70,80,30,35,15

Usually when the node becomes full then split the node an create a new node at the top. The following example shows the process of B-tree creation.

We also can have a record pointer for every node in the tree. which points to a record in a database. Its structure is similar to an index in a database. The following figure shows the comparison of B-tree and visual examples of how the index structure looks like in a database.

B+ Trees

B+ tree is similar to the B tree with few modifications.

  1. Only the leaf node has the record pointer.
  2. Every key will be present in the leaf node.
  3. Leaf nodes are connected using a linked list.

The following figure shows the structure of B+ trees.