Project 2

Project 2 is to implement in C++ a 2-3 tree. A 2-3 tree is a form of a balanced tree which isn't used anymore since red-black trees are better, but was historically one of the first balanced trees. Note that there are several variations on 2-3 trees so if you do a Google search you might find information on a variant. All the variants are similar and operate in essentially the same way, but have differences in the details.

Due date is Mon December 7 at midnight.

2-3 Trees

A 2-3 tree stores elements in a tree but unlike a binary search tree, elements are not stored on interior nodes of the tree.

All elements appear on the leaves. The interior branch nodes are only used for organization.
The tree is always balanced. All leaves are at identical depth in the tree.
The leaves are kept in sorted order from left to right, all at the same level in the tree.
Branch nodes have either 2 or 3 children. The children are labeled the left child, the middle child, and the right child. If a branch node has 2 children, the children are the left and middle children with the right child nil.
Each branch node stores two properties leftMax and middleMax. leftMax is the largest \(x\) entry in entire left subtree, and middleMax is the largest \(x\) entry in the entire middle subtree (both subtrees always exist, since only the right child is possibly nil).

Here are three drawings of 2-3 trees (that I stole from here). They all contain the elements 2, 4, 7, 10, 12, 15, 20, 30 which are the elements on the leaves. The branch nodes show the leftMax and middleMax.

                      ------------
                      | 4  |  12 |
                      ------------
                     /     |       \
                    /      |        \
      ------------    ------------    -------------
      | 2  |  4  |    | 7  | 10  |    | 15  | 20  |
      ------------    ------------    -------------
      /    |          /    |     \    /      |     \
     2     4         7    10     12  15      20     30



                      -------------
                      | 7  |  15  |
                      ------------
                    /      |        \
                   /       |         \
      ------------   --------------   ---------------
      | 2  |  4  |   |  10  | 12  |   |  20  |  30  |
      ------------   --------------   ---------------
      /    |     \    /     |    \      |    |
     2     4      7  10     12   15    20    30


                                       ---------------
                                       |  10  |  30  |
                                       ---------------
                                      /       |
                     -----------------        \
                    /                          \
               ------------                     ---------------
               | 4  |  10 |                     |  15  |  30  |
               ------------                     ---------------
              /      |                          /      |
             /       |                         /       |
  ------------   -------------   ---------------  ---------------
  | 2  |  4  |   |  7  | 10  |   |  12  |  15  |  |  20  |  30  |
  ------------   -------------   ---------------  ---------------
  /    |         /    |           /     |          /     |
 2     4        7     10         12    15         20    30

Find

Find works recursively using leftMax and middleMax. Say we are searching for \(x\) at branch node \(v\). If \(x \leq v.leftMax\) then recursively search the left child. If \(v.leftMax < x \leq v.middleMax\) then recursively search the middle subtree. Otherwise, search the right subtree.

Add

First, we need a recursive function add_helper. The recursion will only work when there are branch nodes, so the special case of an empty tree and a tree which is just a leaf will have to be handled later. The recursive procedure takes a single input \(x\) for the new element to add.

Base Case

The base case is a branch node \(v\) whose children are leaves. There are two subcases, depending on if \(v\) has two or three child leaves. Here are drawings of the two possibilities:

    ------------          ------------    
v-> | 3  |  7  |      v-> | 2  |  5  |    
    ------------          ------------    
    /    |                /    |     \    
   3     7               2     5      8

If \(v\) has just two children, we can just add \(x\) as a new leaf child of \(v\). Remember that leaves are kept in sorted order, so some existing leaves might need to move. For example, if we are inserting \(5\) into the example above with only two children, \(7\) would move to the right child and \(5\) would become the middle child. After doing this insert, leftMax and middleMax need to be updated. The new maximum entry is then returned, i.e. the right entry is returned.
If \(v\) has three children, then adding \(x\) would make \(v\) have four children so we can then split \(v\) into \(v\) and a new branch node \(v'\) and make each have two children. Indeed, say the children of \(v\) are \(w, y, z\). Then make the left and middle child of \(v\) the two smallest elements from among \(\{w, x, y, z\}\) and the right child nil. Create a new branch node \(v'\) and set the left and middle children of \(v'\) to be the two largest elements from among \(\{w, x, y, z\}\). Set leftMax and middleMax of \(v\) and \(v'\) properly. Finally, return the new maximum key from among all four elements, and additionally return the new node \(v'\).

For example

      ------------                        ------------         ------------
  v-> | 2  |  5  |   === add 4 ==>    v-> | 2  |  4  |    v'-> | 5  |  8  |
      ------------                        ------------         ------------
      /    |     \                       /    |               /    |      
     2     5      8                     2     4              5     8

The base case therefore returns two things:

The maximum key from the subtree(s), which will allow the parents to know the maximum in the subtree.
Optionally, a new node \(v'\) which was cloned from \(v\).

Inductive Case

Say we are at branch node \(v\).

If \(x \leq v.leftMax\), recursively insert into the left subtree
If \(v.leftMax < x\) and \(v\) has no right subtree, recursively insert into the middle subtree
If \(v\) has three children and \(v.leftMax < x \leq v.middleMax\), recursively insert into the middle subtree
Otherwise, recursively insert into the right subtree.

After recursing, we get back potentially a new maximum key from that child tree. We use this to update \(v.leftMax\) or \(v.middleMax\). For example, if we recurse to the left and the left subtree returns a new maximum, we set leftMax to that value. We then want to return the new maximum key from the entire subtree, but notice that if we recurse to the left we do not know the largest element in our subtree! The saving grace is that the maximum value in our subtree did not change when recursing to the left, since we were only changing things in the left subtree. If we recurse to the right or recurse to the middle when there is no right subtree, we can know the new maximum key. Therefore, we will either return a new maximum value for the subtree or return that the maximum did not change.

Also after recursing, we sometimes get back a new clone of the child we recursed to. This new node must be added directly after the child that we recursed to. Again, there are two cases depending on if \(v\) has two or three children. For example, say we recurse to the left to node \(u\) and get back a new clone \(u'\). \(u'\) must then be added directly after \(u\).

      -------------------                              -------------------         -------------------  
  v-> |leftMax|middleMax|        ==== add u' ==>   v-> |leftMax|middleMax|    v'-> |leftMax|middleMax|  
      -------------------                              -------------------         -------------------
      /       |          \                             /       |                   /       |
     u        w           t                           u        u'                 w        t

The cases if \(v\) has two or three children are identical to before: if \(v\) has two children the new child can be added directly. If \(v\) has three children, adding the new child would push it to four children so a new node \(v'\) must be created. I suggest you write some utility functions which can be used from both the base case and inductive case.

The recursive case returns two things:

The maximum key from the subtree(s), or a signal that the maximum did not change.
Optionally, a new node \(v'\) which was cloned from \(v\).

Add Method

With the above utility function, the add method works as follows:

If the tree is empty, add \(x\) as a leaf.
If the tree is just a leaf \(y\), create a new branch node, set \(x\) and \(y\) as the left and middle children of the new branch node (remember leaves are in sorted order), and set leftMax and middleMax.
If the root of the tree is a branch node, use the recursive add_helper procedure above. If the recursion returns a new clone \(r'\) of the root \(r\), create a new branch node \(s\) to become the new root and set \(r\) and \(r'\) as the new children of \(s\). Notice this is the only point we ever grow the height of the tree, and we therefore guarantee that the leaves are always at the same depth.

Remove

Remove is similar. Again, we have a recursive remove_helper which works only when there are branch nodes. The special case of an empty tree and a leaf will be handled later. The recursive procedure takes a single parameter \(x\) for the element to remove.

Base Case

The base case is a branch node \(v\) whose children are leaves. We just check the leaves and if any are equal to \(x\) we remove it and possibly shift the other children. Note this could leave the node \(v\) with only a single child, but we are going to make the parent of \(v\) fix this situation. leftMax and middleMax are updated. Finally, the base case returns two things:

The new maximum element in this tree
A boolean telling the parent if this node has only a single child.

Inductive case

Say we are at branch node \(v\).

If \(x \leq v.leftMax\), recursively delete from the left subtree
If \(v.leftMax < x\) and \(v\) has no right subtree, recursively delete from the middle subtree
If \(v\) has three children and \(v.leftMax < x \leq v.middleMax\), recursively delete from the middle subtree
Otherwise, recursively delete from the right subtree.

The recursion returns a boolean if the child has only a single grandchild. If that boolean is true, we need to fix up that child which we will do by moving grandchildren around from our other children.

Consider the following example, where we recursed to the middle child \(u\) and the middle child told us there is only a single grandchild.

                         -------------
                  v->    | 7  |  ##  |
                         -------------
                      /       |        \
                   /          |         \
      ------------     --------------   ---------------
      | 2  |  4  |  u->|  10  | ##  |   |  20  |  30  |
      ------------     --------------   ---------------
      /    |     \    /                  /     |
     2     4      7  10                 20     30

We can actually fix this situation in two ways. First, we could move the \(7\) from the left subtree over to be the new left grandchild of \(u\), moving \(10\) to the middle of \(u\). After this, all three children of \(v\) will have two grandchildren. Alternatively, we could move the \(10\) over into the right child, so the right child has \(10\), \(20\), and \(30\). Then \(u\) has no children and can be removed, which would cause the right child of \(v\) to move over to the middle. After this, \(v\) has two children each of which have three grandchildren.

In general, one of these two possibilities will be available (we might not get lucky enough to have them both happen). So if our child tells us they have only one grandchild, we either move the element out to a neighbor that has only has two grandchildren, or steal a grandchild from a neighbor that has three grandchildren. The leftMax and middleMax are updated for all branches that change. The recursive case returns two things:

The new maximum key in the subtree(s), or that the maximum key didn't change (like add, sometimes we don't know the maximum key but do know that it doesn't change).
A boolean if the node \(v\) has been reduced to only one child.

Remove Method

The remove method then works as:

If the tree is empty, nothing is removed
If the root is a leaf, check if the element to be removed matches the leaf and if so remove it.
If the root is a branch, use the above procedure to remove. If the recursive procedure returns that the root now has only one child, just set the root to be this child. This is the only place where the height of the tree is reduced, and we maintain that all leaves are at the same depth.

Iterator

The iterator can be very similar to the iterator for treaps from day28. You just need to modify the iterator to only stop at the leaves.

Implementation

I have started the implementation for you in the file 23tree.cpp.

The code uses a tagged-union to store each node, since a node can either be a branch or a leaf. I described tagged unions on day29.
You must fill in properties that are stored at a Branch.
You must implement the find, add, remove, and iterator methods.
For debugging, I suggest you add a method which can print out the tree, similar to the printing code I had for the treap on day28.
Add some test code in main to show that your implementation works.
Update Nov 11 I found a small problem in the 23tree.cpp code I had posted before. It is fixed now so if you redownload the file you will get the latest version. The problem was that add_helper_return and remove_helper_return used Tval but should be using Tkey. The 23tree works with keys and stores maximum keys. Values are only at the leaves.