Project 2 is to implement in C++ a 2-3 tree. A 2-3 tree is a form of a balanced tree which isn't used anymore since red-black trees are better, but was historically one of the first balanced trees. Note that there are several variations on 2-3 trees so if you do a Google search you might find information on a variant. All the variants are similar and operate in essentially the same way, but have differences in the details.
Due date is Mon December 7 at midnight.
A 2-3 tree stores elements in a tree but unlike a binary search tree, elements are not stored on interior nodes of the tree.
leftMax
and middleMax
. leftMax
is the largest \(x\) entry in entire left subtree, and middleMax
is the largest \(x\) entry in the entire middle subtree (both subtrees always exist, since only the right child is possibly nil).Here are three drawings of 2-3 trees (that I stole from here). They all contain the elements 2, 4, 7, 10, 12, 15, 20, 30
which are the elements on the leaves. The branch nodes show the leftMax
and middleMax
.
------------
| 4 | 12 |
------------
/ | \
/ | \
------------ ------------ -------------
| 2 | 4 | | 7 | 10 | | 15 | 20 |
------------ ------------ -------------
/ | / | \ / | \
2 4 7 10 12 15 20 30
-------------
| 7 | 15 |
------------
/ | \
/ | \
------------ -------------- ---------------
| 2 | 4 | | 10 | 12 | | 20 | 30 |
------------ -------------- ---------------
/ | \ / | \ | |
2 4 7 10 12 15 20 30
---------------
| 10 | 30 |
---------------
/ |
----------------- \
/ \
------------ ---------------
| 4 | 10 | | 15 | 30 |
------------ ---------------
/ | / |
/ | / |
------------ ------------- --------------- ---------------
| 2 | 4 | | 7 | 10 | | 12 | 15 | | 20 | 30 |
------------ ------------- --------------- ---------------
/ | / | / | / |
2 4 7 10 12 15 20 30
Find works recursively using leftMax
and middleMax
. Say we are searching for \(x\) at branch node \(v\). If \(x \leq v.leftMax\) then recursively search the left child. If \(v.leftMax < x \leq v.middleMax\) then recursively search the middle subtree. Otherwise, search the right subtree.
First, we need a recursive function add_helper
. The recursion will only work when there are branch nodes, so the special case of an empty tree and a tree which is just a leaf will have to be handled later. The recursive procedure takes a single input \(x\) for the new element to add.
The base case is a branch node \(v\) whose children are leaves. There are two subcases, depending on if \(v\) has two or three child leaves. Here are drawings of the two possibilities:
------------ ------------
v-> | 3 | 7 | v-> | 2 | 5 |
------------ ------------
/ | / | \
3 7 2 5 8
If \(v\) has just two children, we can just add \(x\) as a new leaf child of \(v\). Remember that leaves are kept in sorted order, so some existing leaves might need to move. For example, if we are inserting \(5\) into the example above with only two children, \(7\) would move to the right child and \(5\) would become the middle child. After doing this insert, leftMax
and middleMax
need to be updated. The new maximum entry is then returned, i.e. the right entry is returned.
If \(v\) has three children, then adding \(x\) would make \(v\) have four children so we can then split \(v\) into \(v\) and a new branch node \(v'\) and make each have two children. Indeed, say the children of \(v\) are \(w, y, z\). Then make the left and middle child of \(v\) the two smallest elements from among \(\{w, x, y, z\}\) and the right child nil. Create a new branch node \(v'\) and set the left and middle children of \(v'\) to be the two largest elements from among \(\{w, x, y, z\}\). Set leftMax
and middleMax
of \(v\) and \(v'\) properly. Finally, return the new maximum key from among all four elements, and additionally return the new node \(v'\).
For example
------------ ------------ ------------
v-> | 2 | 5 | === add 4 ==> v-> | 2 | 4 | v'-> | 5 | 8 |
------------ ------------ ------------
/ | \ / | / |
2 5 8 2 4 5 8
The base case therefore returns two things:
Say we are at branch node \(v\).
After recursing, we get back potentially a new maximum key from that child tree. We use this to update \(v.leftMax\) or \(v.middleMax\). For example, if we recurse to the left and the left subtree returns a new maximum, we set leftMax
to that value. We then want to return the new maximum key from the entire subtree, but notice that if we recurse to the left we do not know the largest element in our subtree! The saving grace is that the maximum value in our subtree did not change when recursing to the left, since we were only changing things in the left subtree. If we recurse to the right or recurse to the middle when there is no right subtree, we can know the new maximum key. Therefore, we will either return a new maximum value for the subtree or return that the maximum did not change.
Also after recursing, we sometimes get back a new clone of the child we recursed to. This new node must be added directly after the child that we recursed to. Again, there are two cases depending on if \(v\) has two or three children. For example, say we recurse to the left to node \(u\) and get back a new clone \(u'\). \(u'\) must then be added directly after \(u\).
------------------- ------------------- -------------------
v-> |leftMax|middleMax| ==== add u' ==> v-> |leftMax|middleMax| v'-> |leftMax|middleMax|
------------------- ------------------- -------------------
/ | \ / | / |
u w t u u' w t
The cases if \(v\) has two or three children are identical to before: if \(v\) has two children the new child can be added directly. If \(v\) has three children, adding the new child would push it to four children so a new node \(v'\) must be created. I suggest you write some utility functions which can be used from both the base case and inductive case.
The recursive case returns two things:
With the above utility function, the add method works as follows:
leftMax
and middleMax
.add_helper
procedure above. If the recursion returns a new clone \(r'\) of the root \(r\), create a new branch node \(s\) to become the new root and set \(r\) and \(r'\) as the new children of \(s\). Notice this is the only point we ever grow the height of the tree, and we therefore guarantee that the leaves are always at the same depth.Remove is similar. Again, we have a recursive remove_helper
which works only when there are branch nodes. The special case of an empty tree and a leaf will be handled later. The recursive procedure takes a single parameter \(x\) for the element to remove.
The base case is a branch node \(v\) whose children are leaves. We just check the leaves and if any are equal to \(x\) we remove it and possibly shift the other children. Note this could leave the node \(v\) with only a single child, but we are going to make the parent of \(v\) fix this situation. leftMax
and middleMax
are updated. Finally, the base case returns two things:
Say we are at branch node \(v\).
The recursion returns a boolean if the child has only a single grandchild. If that boolean is true, we need to fix up that child which we will do by moving grandchildren around from our other children.
Consider the following example, where we recursed to the middle child \(u\) and the middle child told us there is only a single grandchild.
-------------
v-> | 7 | ## |
-------------
/ | \
/ | \
------------ -------------- ---------------
| 2 | 4 | u->| 10 | ## | | 20 | 30 |
------------ -------------- ---------------
/ | \ / / |
2 4 7 10 20 30
We can actually fix this situation in two ways. First, we could move the \(7\) from the left subtree over to be the new left grandchild of \(u\), moving \(10\) to the middle of \(u\). After this, all three children of \(v\) will have two grandchildren. Alternatively, we could move the \(10\) over into the right child, so the right child has \(10\), \(20\), and \(30\). Then \(u\) has no children and can be removed, which would cause the right child of \(v\) to move over to the middle. After this, \(v\) has two children each of which have three grandchildren.
In general, one of these two possibilities will be available (we might not get lucky enough to have them both happen). So if our child tells us they have only one grandchild, we either move the element out to a neighbor that has only has two grandchildren, or steal a grandchild from a neighbor that has three grandchildren. The leftMax
and middleMax
are updated for all branches that change. The recursive case returns two things:
The remove method then works as:
The iterator can be very similar to the iterator for treaps from day28. You just need to modify the iterator to only stop at the leaves.
I have started the implementation for you in the file 23tree.cpp.
main
to show that your implementation works.add_helper_return
and remove_helper_return
used Tval
but should be using Tkey
. The 23tree works with keys and stores maximum keys. Values are only at the leaves.