Project 2

Project 2 is to implement in C++ a 2-3 tree. A 2-3 tree is a form of a balanced tree which isn't used anymore since red-black trees are better, but was historically one of the first balanced trees. Note that there are several variations on 2-3 trees so if you do a Google search you might find information on a variant. All the variants are similar and operate in essentially the same way, but have differences in the details.

Due date is Mon December 7 at midnight.

2-3 Trees

A 2-3 tree stores elements in a tree but unlike a binary search tree, elements are not stored on interior nodes of the tree.

Here are three drawings of 2-3 trees (that I stole from here). They all contain the elements 2, 4, 7, 10, 12, 15, 20, 30 which are the elements on the leaves. The branch nodes show the leftMax and middleMax.

                      ------------
                      | 4  |  12 |
                      ------------
                     /     |       \
                    /      |        \
      ------------    ------------    -------------
      | 2  |  4  |    | 7  | 10  |    | 15  | 20  |
      ------------    ------------    -------------
      /    |          /    |     \    /      |     \
     2     4         7    10     12  15      20     30



                      -------------
                      | 7  |  15  |
                      ------------
                    /      |        \
                   /       |         \
      ------------   --------------   ---------------
      | 2  |  4  |   |  10  | 12  |   |  20  |  30  |
      ------------   --------------   ---------------
      /    |     \    /     |    \      |    |
     2     4      7  10     12   15    20    30


                                       ---------------
                                       |  10  |  30  |
                                       ---------------
                                      /       |
                     -----------------        \
                    /                          \
               ------------                     ---------------
               | 4  |  10 |                     |  15  |  30  |
               ------------                     ---------------
              /      |                          /      |
             /       |                         /       |
  ------------   -------------   ---------------  ---------------
  | 2  |  4  |   |  7  | 10  |   |  12  |  15  |  |  20  |  30  |
  ------------   -------------   ---------------  ---------------
  /    |         /    |           /     |          /     |
 2     4        7     10         12    15         20    30

Find

Find works recursively using leftMax and middleMax. Say we are searching for \(x\) at branch node \(v\). If \(x \leq v.leftMax\) then recursively search the left child. If \(v.leftMax < x \leq v.middleMax\) then recursively search the middle subtree. Otherwise, search the right subtree.

Add

First, we need a recursive function add_helper. The recursion will only work when there are branch nodes, so the special case of an empty tree and a tree which is just a leaf will have to be handled later. The recursive procedure takes a single input \(x\) for the new element to add.

Base Case

The base case is a branch node \(v\) whose children are leaves. There are two subcases, depending on if \(v\) has two or three child leaves. Here are drawings of the two possibilities:

    ------------          ------------    
v-> | 3  |  7  |      v-> | 2  |  5  |    
    ------------          ------------    
    /    |                /    |     \    
   3     7               2     5      8   

For example

      ------------                        ------------         ------------
  v-> | 2  |  5  |   === add 4 ==>    v-> | 2  |  4  |    v'-> | 5  |  8  |
      ------------                        ------------         ------------
      /    |     \                       /    |               /    |      
     2     5      8                     2     4              5     8      

The base case therefore returns two things:

Inductive Case

Say we are at branch node \(v\).

After recursing, we get back potentially a new maximum key from that child tree. We use this to update \(v.leftMax\) or \(v.middleMax\). For example, if we recurse to the left and the left subtree returns a new maximum, we set leftMax to that value. We then want to return the new maximum key from the entire subtree, but notice that if we recurse to the left we do not know the largest element in our subtree! The saving grace is that the maximum value in our subtree did not change when recursing to the left, since we were only changing things in the left subtree. If we recurse to the right or recurse to the middle when there is no right subtree, we can know the new maximum key. Therefore, we will either return a new maximum value for the subtree or return that the maximum did not change.

Also after recursing, we sometimes get back a new clone of the child we recursed to. This new node must be added directly after the child that we recursed to. Again, there are two cases depending on if \(v\) has two or three children. For example, say we recurse to the left to node \(u\) and get back a new clone \(u'\). \(u'\) must then be added directly after \(u\).

      -------------------                              -------------------         -------------------  
  v-> |leftMax|middleMax|        ==== add u' ==>   v-> |leftMax|middleMax|    v'-> |leftMax|middleMax|  
      -------------------                              -------------------         -------------------
      /       |          \                             /       |                   /       |
     u        w           t                           u        u'                 w        t

The cases if \(v\) has two or three children are identical to before: if \(v\) has two children the new child can be added directly. If \(v\) has three children, adding the new child would push it to four children so a new node \(v'\) must be created. I suggest you write some utility functions which can be used from both the base case and inductive case.

The recursive case returns two things:

Add Method

With the above utility function, the add method works as follows:

Remove

Remove is similar. Again, we have a recursive remove_helper which works only when there are branch nodes. The special case of an empty tree and a leaf will be handled later. The recursive procedure takes a single parameter \(x\) for the element to remove.

Base Case

The base case is a branch node \(v\) whose children are leaves. We just check the leaves and if any are equal to \(x\) we remove it and possibly shift the other children. Note this could leave the node \(v\) with only a single child, but we are going to make the parent of \(v\) fix this situation. leftMax and middleMax are updated. Finally, the base case returns two things:

Inductive case

Say we are at branch node \(v\).

The recursion returns a boolean if the child has only a single grandchild. If that boolean is true, we need to fix up that child which we will do by moving grandchildren around from our other children.

Consider the following example, where we recursed to the middle child \(u\) and the middle child told us there is only a single grandchild.

                         -------------
                  v->    | 7  |  ##  |
                         -------------
                      /       |        \
                   /          |         \
      ------------     --------------   ---------------
      | 2  |  4  |  u->|  10  | ##  |   |  20  |  30  |
      ------------     --------------   ---------------
      /    |     \    /                  /     |
     2     4      7  10                 20     30

We can actually fix this situation in two ways. First, we could move the \(7\) from the left subtree over to be the new left grandchild of \(u\), moving \(10\) to the middle of \(u\). After this, all three children of \(v\) will have two grandchildren. Alternatively, we could move the \(10\) over into the right child, so the right child has \(10\), \(20\), and \(30\). Then \(u\) has no children and can be removed, which would cause the right child of \(v\) to move over to the middle. After this, \(v\) has two children each of which have three grandchildren.

In general, one of these two possibilities will be available (we might not get lucky enough to have them both happen). So if our child tells us they have only one grandchild, we either move the element out to a neighbor that has only has two grandchildren, or steal a grandchild from a neighbor that has three grandchildren. The leftMax and middleMax are updated for all branches that change. The recursive case returns two things:

Remove Method

The remove method then works as:

Iterator

The iterator can be very similar to the iterator for treaps from day28. You just need to modify the iterator to only stop at the leaves.

Implementation

I have started the implementation for you in the file 23tree.cpp.