Nov 16

Insert Exploration

Insert is discussed in section 9.2.2 in the book. The insert I discuss here operates in the same way but is recursive instead of iterative. Also, the book describes what the insert is doing and here I attempt to provide some intermediate steps on how the insert procedure could be discovered allowing you to follow along as I recreate insert. Hopefully seeing the two different presentations of the same thing can help you understand red-black tree insert.

Insert proceeds just like normal binary search tree insert to find a leaf to insert the node. The first question is what color to make the new node? If we make the new node black we are guaranteed to violate the black-height property. If we make the new node red, it is possible to get lucky and not violate anything. Certainly, adding a red node doesn't change any black-height so the black-height property is still OK. Also, if we add a red leaf node under a black node, we don't violate the no-red-edge property. The only problem is if we add a red leaf under a red node. So our first stab at implementing insert will be to insert a new red node and try and fix the red-edge that might appear. It turns out that we will be able to fix the red-edge problem, but I want to mention that if you were designing this data structure from scratch you still have two approaches and you would investigate both of them: add red and fix the red problem or add black and fix the black problem. It turns out that adding red and fixing the red problem is easier, but the only way anyone learned that was trying both approaches.

The structure of our insert method is standard binary search tree insert. If g is nil, we create a new red node. If g.x = x, we just replace the element stored in g with the new element (remember, elements are usually a pair of a key and a value). If x < g.x, recursively insert to the left. If x > g.x, recursively insert to the right.

insert_helper(g, x):
    if g == nil:
        g = new node
        g.x = x
        g.color = RED

    elif g.x == x:
        g.x = x

    else:
    
        if x < g.x:
            g.left = insert_helper(g.left, x)
        else:
            g.right = insert_helper(g.right, x)

        # TODO: check for red-edge violation, and fix or move closer to root.

    return g

After recursing, the tree will satisfy the binary search tree property but might fail the red-black properties. To make our life easier, we will try and always keep the black-height property satisfied. Since we are inserting a new red node, the initial insert does not violate the black-height property but could produce a red-edge. As long as our tree manipulation to fix the red-edge does not cause the black heights to change, we can concentrate on fixing the red-edge and not have to worry about fixing a black-height problem. Also to make our life even easier, we are going to make sure our tree manipulation does not produce a second red-edge, so that we only have to concentrate on a single red-edge.

To fix the red-edge, our strategy will be to move the red-edge closer to the root (or remove it completely, but we can't rely on removing it completely if we are unlucky). By moving the red-edge closer to the root (and preserving the binary search tree and black-height properties), we can eventually guarantee to fix the red-edge. To see this, consider that I have moved the red-edge all the way to the root, so that for example the root and its left child are red. At this point I can just change the root to black. This certainly fixes the red-edge, and since the root is on every single root to leaf path, the black-height property is preserved. (Note that the root is the only node which I can change the color like this and preserve the black-height property.)

Therefore our goal is to add some code to replace the TODO in the insert pseudocode above. That code should detect a red-edge and either try and remove it or move it closer to the root while preserving the black-height and binary search tree properties. In more detail, after the recursion returns there might be a red-edge between a child of g and grandchild of g. For example, say we recurse to the left. After recursing, it could be that the red-edge could not be removed, but it was moved so that the red-edge was between g.left and a child of g.left. So if there is a red edge between g.left and a child of g.left, our task is modify colors or change the tree so that the red edge is removed altogether, or the red-edge is between g itself and a child of g.

So consider that g has a red edge between a child and a grandchild. In the diagrams below, each node is labeled with its color, either B or R. A question mark is used for an unknown color (could be either color). Since we will make the tree have only a single red-edge, g itself must be black since otherwise there would be two red-edges: one between g and a child of g and one between a child of g and a grandchild of g.

     g ->    7,B
           /      \
        3,?         11,?
      /   \        /   \
    1,?    5,?    9,?   13,?

There are four possibilities, either 3-1, 3-5, 11-9 or 11-13 could be the red edge. (Actually, 3-1 and 3-5 can be the red-edge only if I recurse to the left, and 11-9 and 11-13 can be the red-edge only if I recurse to the right.) We will consider each possibility one by one. For each possibility, we will add code into the insert procedure replacing the TODO to detect and fix that case.

g.left and g.left.left are red

If 3 and 1 are red, then 5 and 7 must be black since there is only a single red-edge. There are now two possible subcases, either 11 is red or 11 is black. Let's consider the case when 11 is red. Since there is only a single red-edge violation, the tree looks like the following.

     g ->    7,B
           /      \
        3,R        11,R
      /   \        /   \
    1,R    5,B    9,B   13,B

To fix this, we could try turning 3 black. This certainly fixes the red-edge violation, but then the black-height property fails, because each root to leaf path that passes through 3 has increased the number of black nodes by one. We can fix this by converting 7 to red to decrease the number of black nodes on such paths, but turning 7 to red also removes a black node from all paths from root to leaf passing through 11. We can fix those paths by turning 11 to black and we have now fixed the black-height property and the red-edge violation between 1 and 3. We might have produced a red-edge violation between 7 and its parent (not drawn) since we changed 7 to red, but the key observation is that we have pushed the red-edge violation closer to the root. If we can continue pushing the red-edge violation closer to the root we can guarantee that we can eventually fix it.

     g ->    7,B                                g ->    7,R           
           /      \                                   /      \
        3,R        11,R      ==push_black==>       3,B        11,B
      /   \        /   \                         /   \        /   \
    1,R    5,B    9,B  13,B                    1,R    5,B    9,B  13,B

So we have the following operation.

push_black(g):
    g.color = red
    g.left.color = black
    g.right.color = black

(The book version of push_black is in terms of colors as numbers because the remove operation needs the more general version, but for now you can think of push_black as above.) We can now call push_black from the insert pseudocode above by replacing the TODO with some code which looks like the following (see below for the final version):

if g.left.color == RED and g.right.color == RED:
    push_black(g)
    return g
# TODO: check for other red-edge violations, and fix or move closer to root.

The following lemma states that this code properly fixes the problem of a red-edge between a child and grandchild of g when g.left and g.right are red.

Lemma 1 Let \(g\) be the root of a binary search tree satisfying the black-height property and where there exists a single red-edge between a child of \(g\) and a grandchild of \(g\). Also assume that \(g.left\) and \(g.right\) are both red. Then after calling push_black(g), \(g\) is the root of a red-black tree and the black-height did not change.

Proof Since there is only a single red-edge between a child of \(g\) and a grandchild, \(g\) itself must be black or there would be two more red-edges between \(g\) and the children of \(g\). Then push_black changes \(g.left\) and \(g.right\) to black, removing the red-edge between a child of \(g\) and a grandchild of \(g\). push_black also changes \(g\) to red so that every root to leaf path has the same number of black nodes. Finally, the only node changing color to red is \(g\) but \(g\) will not produce any new red-edges in the subtree rooted at \(g\), since both of \(g\)'s children become black. (There might be a red-edge between \(g\) and its parent, but the Lemma doesn't say anything about that.)

That was only one case. So what about we still have a red violation between 1 and 3 but 11 is black. We can't call push_black anymore since it won't preserve the black-height property.

     g ->    7,B
           /      \
        3,R        11,B
      /   \        /   \
    1,R    5,B    9,?   13,?

I'd still like to turn 3 black and 7 red since that fixes the red-edge violation and keeps the same number of black nodes on paths from roots to leaves that pass through 3. Like before, changing 7 to red decreases the number of black nodes on paths through 11 which is bad. But 3 is now black and if we rotate the tree right, 3 will replace 7 so paths that go through 11 will also go through 3, adding an extra black node on paths through 11 and not adding black nodes on any other paths. This is called flip_right.

flip_right(g):
    swap colors of g and g.left
    return rotate_right(g)

In pictures,

     g ->    7,B                                       g' ->   3,B
           /      \                                          /      \
        3,R        11,B       == flip_right(g) ==>        1,R         7,R    <- g
      /   \        /   \                                             /   \
    1,R    5,B    9,?  13,?                                        5,B   11,B
                                                                         /   \
                                                                       9,?    13,?

Note that after flipping right, the black-height property is still satisfied. To see this, we can check that each path from the root to one of the four leaves (1, 5, 9, and 13) still have the same number of black nodes. The path to 1 had one black node before and still have one black node after the flip. 5 had one black node before (7) and still has one black node after (3). 9 had two before (7, 11) and still has two after (3, 11). Finally, 13 had two before (7, 11) and still has two after (3, 11).

Therefore, we can add code to the insert psuedocode such as the following

if g.left.color == RED and g.left.left.color == RED and g.right.color == BLACK:
    return flip_right(g)
# TODO: check for other red-edge violations, and fix or move closer to root.

The following lemma states that the above code properly fixes the red-edge problem in this case.

Lemma 2 Let \(g\) be the root of a binary search tree satisfying the black-height property and where there exists a single red-edge between \(g.left\) and \(g.left.left\). Also assume that \(g.right\) is black and let \(g' = flip\_right(g)\). Then \(g'\) is the root of a red-black tree and the black-height did not change.

Proof Since the edge between \(g.left\) and \(g.left.left\) is the only red-edge, it must be the case that \(g\) is black and \(g.left.right\) is black before flipping. The tree therefore looks like the diagram above and flip_right does not produce any red-edges. The only node which changes from black to red is \(g\) itself (7 in the diagram above), but all nodes adjacent to \(g\) in the resulting tree are black so no new red-edges are produced in the subtree rooted at \(g'\). Finally, the black-height property is maintained by the flip, so \(g'\) is the root of a red-black tree.

g.left and g.left.right are red

Consider 3 and 5 are red, which forces 1 and 7 to be black since there is only a single red-edge.

     g ->    7,B
           /      \
        3,R        11,?
      /   \        /   \
    1,B    5,R    9,?   13,?

We could go through several cases like before, but there is a trick we can use to reduce the number of cases we need to check. We want to go back to the case when g.left and g.left.left are red, which we can do by rotating left at 3!

     g ->       7,B                                                 g ->      7,B           
            /         \                                                    /       \        
 w ->   3,R             11,?                                           5,R           11,?    
      /      \          /   \      == rotate_left(w) ==>             /     \        /   \   
    1,B       5,R      9,?   13,?                                   3,R    6,B   9,?   13,?
            /     \                                               /   \   
           4,B    6,B                                           1,B   4,B

(Quick-quiz: why must 4 and 6 be black?)

To see that this is safe, note that the black-height property is maintained by a rotation by checking that each root to leaf path has the same number of black nodes. Also, note that rotate_left will transform any situation where w is red, w.left is black, and w.right is black into a situation where the left child and left grandchild of g are both red, which we know how to fix using the above code.

Lemma 3 Let \(w\) be the root of a binary search tree satisfying the black-height property and where there exists a single red-edge between \(w\) and \(w.right\). Also assume that \(w.left\) is black and let \(w' = rotate\_left(w)\). Then \(w'\) is the root of a binary search tree and the black-height did not change. Also, there is a single red-edge between \(w'\) and \(w'.left\).

w ->     3,R                                           w' ->    5,R
      /      \                                                /     \
    1,B        5,R              == rotate_left(w) ==>       3,R     6,B       
              /    \                                       /   \     
            4,B    6,B                                  1,B    4,B

Proof First, the children of \(w.right\) (4 and 6 in the figure) must be black because otherwise there would be more than one red-edge. We therefore have the figure above, and after rotating there is a single red-edge between \(w'\) and \(w'.left\). Also, each root to leaf path has the same number of black nodes.

A trick to reduce the cases

So the remaining two cases are \(g.right\) and \(g.right.right\) red and the other case is \(g.right\) and \(g.right.left\) are red. We could in fact fix these cases very similar to the above essentially changing right and left thinking of reflecting everything in a mirror. But we would still need to write code for these cases and then think about why the code is correct, so if we could avoid the cases completely that would be better.

Recall from before that if \(g.left\) and \(g.right\) are both red and \(g\) is black, we can push_black and fix a possible red-edge between \(g.right\) and a child of \(g.right\). So let's just try and say that this is the only possible way that there could be a violation between \(g.right\) and a child of \(g.right\) since it is easy to fix. This is called the left-leaning property:

left-leaning-property: For each node w, if w.right is red then w.left is also red. Equivalently, the book states this as for each node u, if u.left is black then u.right is black.

OK, we can just make up a property. How can we know that we can keep this property satisfied during insert? Well, we have to go back to the 2-4 tree. When converting from a 2-4 tree to a red-black tree, how did a node with one red child and one black child appear? Exactly when converting from a node in the 2-4 tree which had 3 children. As long as we convert a node with 3 children in the 2-4 tree into a node with a left red child in the red-black tree, we can always produce a left-leaning red-black tree from a 2-4 tree. This implies that it is possible to insert into a left-leaning red-black tree to maintain the left-leaning property. In addition, something even nicer occurs: each 2-4 tree produces a unique left-leaning red-black tree.

Maintaining the left-leaning property

As we saw above when considering the case when \(g.left\) and \(g.left.right\) are red (which violated the left-leaning property), we can restore the left-leaning property by rotating left. Will rotation always work? We check out various cases and find that the following fails:

w ->     3,B                                                       5,R
      /      \                                                  /      \
    1,B        5,R              == rotate_left(w) ==>         3,B      6,?       
              /    \                                         /   \     
            4,?    6,?                                     1,B    4,?

Notice the path to 6 now has one less black node so the black-height property is violated! We can fix that by in addition swapping the colors on 3 and 5 so that 5 becomes black and 3 becomes red, which fixes the black-height property.

flip_left(w):
    swap colors on w and w.right
    return rotate_left(w)

There are two possible situations we will call flip_left: \(w\) is either red or black. Note that when we are fixing the left-leaning property, the only possible red-edge is between \(w\) and \(w.right\) since we are fixing the left-leaning property before considering the cases starting from the parent of \(w\) (the node I have been calling \(g\)). Therefore, in the following diagrams 4 and 6 must be black.

w ->     3,R                                                    5,R
      /      \                                                /     \
    1,B        5,R              == flip_left(w) ==>         3,R     6,B       
              /    \               when w is red           /   \     
            4,B    6,B                                  1,B    4,B  


w ->     3,B                                                     5,B
      /      \                                                /      \
    1,B        5,R              == flip_left(w) ==>         3,R      6,B       
              /    \               when w is black         /   \     
             4,B    6,B                                  1,B   4,B

Lemma 3, new version Let \(w\) be the root of a binary search tree satisfying the black-height property and where there exists a single red-edge between \(w\) and \(w.right\). Also assume that \(w.left\) is black and assume that the left-leaning property is satisfied at all nodes except \(w\). Let \(w' = flip\_left(w)\). Then \(w'\) is the root of a binary search tree satisfying the black-height property and the left-leaning property where the black-height did not change. Also, there is a single red-edge between \(w'\) and \(w'.left\).

Proof This is the same as Lemma 3 but using flip_left instead of rotate_left. Since \(w\) and \(w.right\) both are red, swapping their colors does nothing so flipping left is the same as rotating left.

Lemma 4 Let \(w\) be the root of a red-black tree where \(w\) is black, \(w.left\) is black, and \(w.right\) is red. Assume the left-leaning property is satisfied at all nodes except \(w\). Let \(w' = flip\_left(w)\). Then \(w'\) is the root of a left-leaning red-black tree and the black-height did not change.

Proof Since there are no red-edges, the children of \(w.right\) are black. Therefore, the tree looks like the above figure and after flipping, the black-height property is preserved. No new red-edges are produced since the only new red node is 3 but all nodes adjacent to 3 are black. Finally, the left-leaning property is fixed at \(w'\). This ends the proof.

Notice that we can fix the left-leaning property concentrating only on w and not g. Therefore, we are going to change the insert function so that the left-leaning property is satisfied when we return. Note that the way that the left-leaning property is violated is when we recurse to the right and the right child becomes red while the left child is black. We can fix this by changing the insert pseudocode to be:

if x < g.x:
    g.left = insert_helper(g.left, x)
else:
    g.right = insert_helper(g.right, x)

    if g.left.color == BLACK and g.right.color == RED:
        g = flip_left(g)

# code to fix red-edges here

This is a little confusing in the code because of the re-use of the variable g. Say we are inserting something between 5 and 7 (such as 6.75) so the recursion will go left from 7, right from 3, and right from 5. So you can consider that there are three recursive instances, one where g is 7, one where g is 3, and one for when g is 5. (In fact, there are more, one for each node along the path to wherever the new element is inserted.) For the recursive instance where g equals 3, we recursed to 5 and after recursing 5 has become red. The diagram below shows the tree exactly when the instance for 3 is executing and we have just finished from recursing to the right.

             7,B
           /      \
   g->  3,R        11,?
      /   \        /   \
    1,B    5,R    9,?   13,?
           /  \
         4,B   6,B

We have the red-edge between 3 and 5 but we also have violated the left-leaning property since 5 is red and 1 is black. At this point, the code flips left at g which is 3 at this moment and replaces g with the new root after flipping. This produces the following tree:

             7,B
           /      \
   g->  5,R        11,?
      /   \        /   \
    3,R    6,B    9,?   13,?
   /   \       
 1,B   4,B

We now have a red-edge between 5 and 3, but we don't fix that at this moment and can just return. This is because after we return, the next higher recursive instance will resume execution and in that instance g will be 7. This is exactly a case we have considered above, where g.left and g.left.left are red and can be fixed by push_black or flip_right depending on the color of 11.

The left-leaning property fixes all remaining cases

By pre-emptively preserving the left-leaning property, by the time we are considering a red-edge between a child of g and a grandchild of g there is only a single case to worry about and in fact we have covered the case already.

To see this, consider that \(g.right\) and \(g.right.right\) are red. Since there is a single red-edge, \(g.right.left\) is black and so the left-leaning property is violated so this case will never occur by the time the recursion reaches g. Now consider when \(g.right\) and \(g.right.left\) are red.

     g ->    7,B
           /      \
        3,?         11,R
      /   \        /   \
    1,?    5,?    9,R   13,B

In this case, I claim that 3 must be red. Say that 3 was black. Then the left-leaning property would have been violated before I started the insert unless 11 was initially black but it was changed to red somehow. But we haven't performed any operations yet that could have changed the color of 11! This requires an analysis of the above operations to make sure they could not change the color of 11 and the full details are below. Since 11 was initially red, by the left-leaning property 3 was also initially also red. Since 3 is red, a push_black fixes the red-edge by Lemma 1.

Insert Implementation

The implementation just needs to combine all the above cases in the correct way. I give a recursive algorithm for insert, but as mentioned it fixes the tree in the same way as the loop in the book since the same cases are examined.

The main question is what to do when one of the left or right children of a node is nil. The above cases always assumed a full tree with existing children to the right and left. Because we are trying to fix a red-edge, anytime the above logic considered a red node it must be an actual node. Therefore, during the above logical anaylsis if we had a node with missing children, we would have considered that these missing children were black. Each time a real node has a child pointer which is nil, we can think of that real node as having a virtual black node as a child instead. These virtual nodes will never appear in C++ and are just used to keep the logical reasoning above valid even if children are missing. We therefore will write two utility functions to test if a node is red or black, using black for nil.

is_black(g):
    return (g == nil or g.color == BLACK)

is_red(g):
    return (g != nil and g.color == RED)

With that, we can now combine all the above logic to produce the final insert psuedocode.

push_black(g):
    g.color = RED
    g.left.color = BLACK
    g.right.color = BLACK

flip_left(w):
    swap(w.color, w.right.color)
    # Rotate left
    m = w.right
    w.right = m.left
    m.left = w
    return m

flip_right(g):
    swap(g.color, g.left.color)
    # Rotate right
    n = g.left
    g.left = n.right
    n.right = g
    return n

insert_helper(g, x):
    if g == nil:
        g = new node
        g.x = x
        g.color = RED

    elif g.x == x:
        g.x = x

    else:

        if x < g.x:
            g.left = insert_helper(g.left, x)

        else:
            g.right = insert_helper(g.right, x)

            if is_black(g.left) and is_red(g.right):
                g = flip_left(g)

        if is_red(g.left) and is_red(g.right):
            assert(g.color == black)
            push_black(g)

        elif is_red(g.left) and is_red(g.left.left):
            assert(g.color == black)
            g = flip_right(g)

    return g

insert(x):
    root = insert_helper(root, x)
    if is_red(root):
        root.color = BLACK

Lemma Let \(g\) be the root of a left-leaning red-black tree and let \(b\) be the number of black nodes along a \(g\) to leaf path (which is the same for any such path). Let \(g' = insert\_helper(g, x)\). Then \(g'\) is the root of a binary search tree consisting of the elements of \(g\) plus \(x\). In addition, the black-height property and left-leaning property both hold in the tree rooted at \(g'\). Each \(g'\) to leaf path has exactly \(b\) black nodes (so the number of black nodes along paths did not change). Finally,

If \(g\) was black, then the tree rooted at \(g'\) has no red-edges and is actually a red-black tree. But \(g'\) could be red.
If \(g\) was red, then there could be a red-edge between \(g'\) and \(g'.left\) but no other red edges. Also, \(g'\) is still red and not black.

Proof The proof is by induction on the size of the tree rooted at \(g\).

Base case when \(g\) is nil. Returning a singleton red node correctly satisfies the properties.
Base case when \(g.x = x\). In this case, replacing \(g.x\) with \(x\) is correct to update the BST. Returning \(g' = g\) is also correct since there were no color changes to the tree so \(g'\) is the root of a left-leaning red-black tree.
Inductive case when \(x < g.x\). We insert \(x\) to the left to correctly preserve the BST property. Also, all operations performed preserve the black-height property. To finish this case, we need to check the existence of a red-edge and check the left-leaning property.
- If \(g.left\) was originally red, then \(g\) was originally black. By induction, it could be the case that \(g.left\) and \(g.left.left\) are red after the recursion to the left. Also, since \(g.left\) was originally red, after recursion \(g.left\) is still red by induction so no new left-leaning violation occurs. The potential red-edge between \(g.left\) and \(g.left.left\) is fixed by pushing black or flipping right, as proved by either Lemma 1 or Lemma 2 depending on the color of \(g.right\). Both of these operations preserve the left-leaning property and fix the red-edge, so \(g'\) is the root of a left-leaning red-black tree.
- If \(g.left\) was originally black, then by induction \(g.left\) could change to red but there is no red-edge anywhere in the tree rooted at \(g.left\). The only possible red-edge is between \(g\) itself and \(g.left\) if \(g\) was originally red, but in this case we can just return without modifying the tree since we are allowed to return with a red-edge between \(g\) and \(g.left\). If \(g.left\) switches from black to red, then this will not produce a left-leaning violation at \(g\). Since we then return \(g' = g\), the tree rooted at \(g'\) satisfies the left-leaning property.
Inductive case when \(x > g.x\). We insert \(x\) to the right to correctly preserve the BST property. Also, all operations performed preserve the black-height property. To finish this case, we need to check the existence of a red-edge and the left-leaning property.
- If \(g.right\) was originally red, then \(g\) was originally black since there are no red-edge violations before recursing. Also, \(g.left\) is red because otherwise before recursion the left-leaning property would have been violated. By induction, it could be the case that \(g.right\) and \(g.right.left\) are both red after recursing, producing a red-edge. Since \(g.left\) and \(g.right\) are red and \(g\) is black, push_black removes all red-edges by Lemma 1 and produces a left-leaning red-black tree.
- If \(g.right\) was originally black, then by induction \(g.right\) could turn red but there are no red-edges in the tree rooted at \(g.right\). There are two subcases:
  - If \(g\) was originally red, there could be a red-edge between \(g\) and \(g.right\) if \(g.right\) changes to red. Since \(g\) was originally red it must be that \(g.left\) is black so Lemma 3 flipping left at \(g\) produces a tree with a single red-edge between \(g'\) and \(g'.left\).
  - If \(g\) was originally black, then after recursing \(g\) is the root of a red-black tree no matter if \(g.right\) changes color. The left-leaning property might be violated if \(g.left\) is black and \(g.right\) changes to red, but in this case Lemma 4 implies that flipping left fixes the left-leaning property.

C++

red-black.cpp contains my implementation of insert. In addition to insert, it contains a debug_print method which not only prints the tree but also checks the no-red-edge property and the black-height property.

Exercise

Exercises 9..1, 9..2, and 9..5 from this page.
To my red-black.cpp code, add the find operation. Also, add a few tests in main to show the lookup is working properly.