Nov 20

Today I first finished up insert from day34. Next, I started on red-black tree remove. I have all the notes for remove here, but we will cover them in the next few lectures. Again, I provide a counterpoint to the book by describing the steps you might go through if you were designing red-black trees yourself.

BST Remove

To remove from a red-black tree, we first find a node to remove using the binary search tree property. Say we are trying to remove element \(x\). First, we use the normal BST find procedure to locate the node \(v\) with \(v.x\) equal to \(x\). If \(v\) is a leaf or has a single child, we can directly remove \(v\). If \(v\) has two children, \(v\) cannot be directly removed. Instead, we will locate a node \(u\) such that:

  1. \(u\) is a leaf or has only one child,
  2. if the element stored at \(v\) is replaced by the element stored at \(u\) then the BST property is still satisfied.

Once we have located such a node \(u\), we can move \(u.x\) to be stored at \(v.x\) and then remove the node \(u\) from the tree. This effectively removes the element stored at \(v\) which is the element we originally wanted to remove. How to find such a node \(u\)? Well, to be able to be stored at \(v\) the element must be larger than everything in the left subtree of \(v\) and smaller than everything in the right subtree of \(v\). That means we have two candidates: we can take the largest element from the left subtree of \(v\) or the smallest element from the right subtree of \(v\). Either will work, but I decided to take the smallest element from the right subtree of \(v\). In the homework exercise extract_min you figured out the way of finding such a node: go left until left is nil. Therefore, from \(v\) to find \(u\) we will go right and then go left until left is nil. This will produce a node \(u\) with \(u.x\) the smallest element from the right subtree of \(v\), so moving \(u.x\) to be stored at \(v.x\) and removing \(u\) still produces a valid binary search tree.

bst_extract_min(u):
    # Find and remove the minimum element.  Returns the x value of the minimum element along with the
    # new root of the subtree.
    if u.left == nil:
        # We have found the minimum value.  Return u.x as the value, and u.right as the new root
        # of the subtree with u removed.
        return (u.x, u.right)
    else:
        # Recurse to the left
        (minX, newLeftChild) = extract_min(u.left)

        # Set the new left child
        u.left = newLeftChild

        # Pass along the minX value and return u as the (unchanged) root
        return (minX, u)

bst_remove(v, x):
    # removes x from the subtree rooted at v. Returns the new root of the subtree.

    if v == nil:
        # x was never found so nothing needs to be removed
        return nil

    if v.x == x:
        # We have found the element to remove

        # If v has a nil left or right child, return the other as the new root effectively removing v
        if v.left = nil:
            return v.right
        if v.right = nil:
            return v.left
            
        # v has two children so find and remove the minimum from the right
        (minX, newRightRoot) = extract_min(v.right)

        # Replace v.x with the extracted value
        v.x = minX
        v.right = newRightRoot


    elif x < v.x:
        v.left = bst_remove(v.left, x)
    else:
        v.right = bst_remove(v.right, x)

    return v

Red-Black Properties

The above procedure will remove an element preserving the binary search tree property, but what about the red-black properties? If the node to remove (\(v\) itself or \(u\)) is red, then the black-height and no-red-edge properties are still satisfied. Indeed, removing a red node will not effect the number of black nodes on a root to leaf path. Also, since there were no red-edges before removing, both the parent and child of a red node are black so removing the red node will not produce a red-edge. The left-leaning property could be violated, but a simple flip will fix this as we saw when considering insert.

What about removing a black node? This will certainly cause the black-height property to be violated, since any root to leaf path passing through the removed node will have one fewer black node. This gives rise to two strategies to implement red-black removal.

The book takes the first approach and so I will as well, but I want to highlight that if you are building this data structure from scratch you would try both approaches and pick the easier of them or the more efficient. Indeed, both approaches work and in my opinion both are roughly of equal difficulty to implement, but fixing a black violation is more efficient in practical implementations (both have the same big-O complexity). Briefly, to make sure you remove a red node you can proceed as follows. The root is always black so we can change the root to red and preserve the black-height property, but perhaps create a red-edge if a child of the root is red. We can then push this red node we have created down the tree using rotations and flips, dragging the red-edge along. As long as we keep pushing the red node down the tree, we can eventually guarantee to remove a red node.

Enough about that, we will follow the book and implement remove by finding the node to remove exactly like the bst_remove above, either removing \(v\) itself or the minimum node from \(v\)'s right subtree. If the node we remove is black, we will then add some code to fix the black-height violation. As well, we will maintain the left-leaning property and red-edge property.

Fixing the black-height

To fix the black height, consider removing 11 which is black from the following tree.

             7,?           
           /      \        
        3,?         11,B   
      /   \        /   \   
    1,?    5,?    9,?   nil

If 9 is red, we can easily remove 11 and restore the the black-height property by changing 9 to black. This will not produce any red-edges since we are changing 9 from red to black. Also, all paths from root to leaf have the same number of black nodes.

When 9 is black, we can't get away with such an easy trick and it seems that removing 11 will violate the black-height property. The black-height property is a difficult one to reason about fixing because it is a global property: we need to consider all root to leaf paths in the entire tree. This hampers our ability to take advantage of the power of recursion and induction to reason only about a few nodes in the tree. (Contrast this with insert where we only considered a red-edge which was a local violation.) Therefore, we want to convert a black-height violation from a global violation to a local one, which we can do by a clever idea of adding a new color called double-black. This color double-black will not be a valid color on any node in the tree, since our final red-black tree must have only red or black on each node. But, during remove we might temporarily have one node colored double-black. We can then think of this as a violation since no double-black nodes are allowed, and we will write code to fix this violation by removing the double-black color. The advantage this gives us is that we can consider a double-black node as counting for two when we count the total number of black nodes on a root to leaf path. This allows us to remove a black node with a black child, make its child double-black, and preserve the black-height property. For example, consider the following snippet of a tree (so there are nodes above 7 and below 1, 5, and 9, I just haven't drawn them).

             7,R                                             7,R            
           /      \                                        /      \
        3,B         11,B       === remove 11 ===>       3,B         9,BB   
      /   \        /   \                              /   \      
    1,R    5,B    9,B   nil                         1,R    5,B

We remove node 11 and change the color of 9 to double-black. I claim that if we count the double-black node as two, the black-height property is preserved. Consider any root-to-leaf path that passes through 9 (there are nodes below 9, I just didn't draw them). Such paths before the removal had two black nodes, 11 and 9. After the removal, they have only 9 but 9 counts twice since it is double-black. Paths passing through 1 and 5 also had no change in the number of black nodes.

Thus, when removing a black node with a black child we introduce a double-black violation. More importantly, the black-height property is still satisfied. We can then proceed to fix the double-black violation (i.e. remove the double-black node) by operations such as rotations, flips, and color changes reasoning only locally about the violation. As long as these operations preserve the black-height property and don't introduce any red-edges, we can successfully remove from a red-black tree.

Our strategy will be similar to insert; our goal is to either change colors so no node is colored double-black or move the double-black node closer to the root. If the double-black node moves all the way to the root, the color of the root can just be changed to black. As long as these operations to fix this double-black violation preserve the black-height property, the no red edge property, and the left-leaning property, we will successfully remove from a red-black tree.

Remove pseudocode

We will have three possible colors and each color will also be considered as an integer value with red 0, black 1, and double black 2.

enum Color {
    RED = 0,
    BLACK = 1,
    DOUBLE_BLACK = 2,
}

First, the procedure to splice a node out of the tree. If the node we are removing is red, nothing else needs to be done. If the node we are removing is black, we increase the color on the child, either from red to black or black to double-black. Increasing the color on the child of the node we remove can be done by adding one. Since we consider red as 0, black as 1, and double-black as 2, adding 1 to red produces black and adding 1 to black produces double-black. The only point we have not yet talked about is when removing a leaf. There is no child to increment the color on. What we do is create a dummy node called doubleBlackNilNode. This node will be considered double-black since nil children are considered black and we increment the color. But once we fix this dummy node by removing the double-black color from it, the dummy will be removed and the node will revert to nil.

splice(v):
    # Remove node v which cannot have two children.  Returns the
    # root of the tree with v removed.

    assert(v.left == nil or v.right == nil)

    if v.left != nil:
        if v.color == BLACK:
            v.left.color += 1
        return v.left

    elif v.right != nil:
        if v.color == BLACK:
            v.right.color += 1
        return v.right

    else:
        # v is a leaf
        if v.color == BLACK:
            return doubleBlackNilNode
        else:
            return nil

Next, we use extract_min and remove similar to the BST versions above. There are two changes. First, we call fix_doubleblack before returning which will fix the double-black violation. Second, we use a trick to not have extract_min return the removed element. Instead, we pass v into the extract_min function. When extract_min finds the smallest element, before removing that node the element stored on the minimum node is copied to v.

extract_min(u, v):
    # Remove the minimum element from the tree rooted at u.  The removed element is stored on v.
    # The return value is the root of the tree with the minimum element removed

    if u.left == nil:
        # Copy element from u to v, overwriting what used to be at v
        v.x = u.x
        # Remove the node u from the tree
        return splice(u)

    else:
        # Go left, passing v along
        u.left = extract_min(u.left, v)

        # Fix any double-black violation
        return fix_doubleblack(u)

remove_helper(v, x):
    # Remove x from the subtree rooted at v.  The return value is the new root of the subtree

    if v == nil:
        return nil
    
    if x == v.x:
        # We have found the element to remove
        if (v.left == nil or v.right == nil):
            # Remove v itself
            return splice(v)
        else:
            # extract the minimum from the right.  Pass v in so the removed element replaces the
            # element stored at v
            v.right = extract_min(v.right, v)

    elif x < v.x:
        v.left = remove_helper(v.left, x)
    else:
        v.right = remove_helper(v.right, x)

    return fix_doubleblack(v)

Fix a double-black violation

All that remains is to write the fix_doubleblack function. As mentioned before, it attempts to either remove the double-black node or push the double-black node closer to the root. Therefore, at the start of the fix_doubleblack call we have a tree rooted at \(v\) where one of \(v\)'s children is potentially double-black. Our goal is to remove the double-black color or make \(v\) itself double-black, all the while preserving the black-heights and not producing any red-edges.

Therefore, there are three possibilties: (v.left is black, v.right is doubleblack), (v.left is doubleblack, v.right is black), and (v.left is red, v.right is double-black). Note that v.left is double-black and v.right is red is impossible because otherwise we would have violated the left-leaning property before the remove.

 v->    7,?                   7,?                   7,?       
      /      \              /      \              /      \
   3,B       11,BB       3,BB      11,B        3,R       11,BB

The case when v.left is red and v.right is double-black can be converted to the other case by flipping right at v. (Note that 1 and 5 are black because there are no red-edges.)

 v->    7,?                                        3,?       
      /      \                                   /      \    
   3,R       11,BB   == flip_right(v) ==>     1,B       7,R
  /   \                                                /   \
1,B    5,B                                          5,B     11,BB   

We then have the situation where from 7 the left child is black and the right is double-black. This gives the following code for fix_doubleblack. Note how we keep relabeling v to always be the root. Also, we deal at this point with the doubleBlackNilNode that could have been produced if we splice out a black leaf. Since the code will fix the double-black node by either removing it or moving the double-black node up the tree, we don't need the dummy doubleBlackNilNode anymore so convert it to actual nil.

is_double_black(v):
    return v != nil and (v == doubleBlackNilNode or v.color == DOUBLE_BLACK)

fix_doubleblack(v):

    if is_double_black(v.right) and is_red(v.left):

        if v.right == doubleBlackNilNode:
            # We don't need to keep around the dummy nil node, so set it to actual nil
            v.right = nil

        v = flip_right(v)

        v.right = fix_right_doubleblack(v.right)

    elif is_double_black(v.left):

        if v.left == doubleBlackNilNode:
            v.left = nil

        v = fix_left_doubleblack(v)

    elif is_double_black(v.right):
        
        if v.right == doubleBlackNilNode:
            v.right = nil

        v = fix_right_doubleblack(v)


     # Finally, fix the left-leaning property
     if is_black(v.left) and is_red(v.right):
         v = flip_left(v)

     return v

All that remains is to write fix_right_doubleblack and fix_left_doubleblack to fix up the remaining two cases.

Fixing a right double-black child

So the book calls this fixupCase3. While reasoning about this case, I found a simpler way of fixing a right double-black child than what is in the book. Both will either remove the double-black color or move it to the root.

The tree looks like the following. Let c be the color of 7, either red or black.

  v ->     7,c
        /      \        
     3,B        11,BB
   /   \       /    \
 1,?    5,?  9,?     13,?

We are trying to move the double-black color up the tree, so the first thing to do is make 11 black and increase by one the color on 7. This keeps the same black-count on paths passing through 11, but increases by one the black-count on paths through 3. To fix this, we can reduce 3 to red and then all paths have the same black-count as before. This is called pull_black. Thinking of colors as integers, we get the following code.

pull_black(v):
    v.color += 1
    v.left.color -= 1
    v.right.color -= 1

Note that 7 either changes from red to black or black to double-black. The new color I denote by c+1 so is either black or double-black.

  v ->     7,c                                                7,c+1
        /      \                                           /      \        
     3,B        11,BB        ===pull_black(v)==>        3,R        11,B
   /   \       /    \                                 /   \       /    \
 1,?    5,?  9,?     13,?                           1,?    5,?  9,?     13,?

This fixes the double-black node, but there could be a red-edge violation! We turned 3 from black to red and so if 1 or 5 (or both) were red we have introduced a red-edge or even two red-edges if both 1 and 5 were red. But notice that by the left-leaning property, we cannot have 1 black and 5 red, so if there is a red-edge at all we must have 1 red. Thinking back to insert, we removed this red-edge by flipping right at v

  v ->     7,c+1                                                  3,c+1
        /      \                                               /      \        
     3,R        11,B           == flip_right(v) ==>         1,R        7,R    
   /   \       /    \                                                 /    \
 1,R    5,?  9,?     13,?                                           5,?     11,B

We could still have a red-edge between 7 and 5 if 5 happens to be red. But note that both 1 and 7 are both red, so a push_black will fix this. A push_black operation even has the good result of removing the double-black color from 3 if c happend to be black, so we want to push_black even if there isn't a red-edge to fix.

push_black(v):
    v.color -= 1
    v.left.color += 1
    v.right.color += 1

After flipping right and pushing black, we end up with

  v ->     7,c+1                                            3,c+1                                   3,c  
        /      \                                         /      \                                /      \        
     3,R        11,B        == flip_right(v) ==>      1,R        7,R        ==push_black==>    1,B        7,B
   /   \       /    \                                           /    \                                  /    \
 1,R    5,?  9,?     13,?                                     5,?     11,B                            5,?     11,B

We now have no red-edges (we did not even introduce a red-edge between 3 and its parent since the color of 3 is the same as the original root 7). This leads to the following code:

fix_right_doubleblack(v):
    pull_black(v)
    if is_red(v.left) and is_red(v.left.left):
        v = flip_right(v)
        push_black(v)
    return v

Fixing a left double-black child.

This is very similar to the above case, except we can't rely on the left-leaning property. Again, this is a little different than what is in the book. Either my method or the book's method work to remove the double-black node or move it up the tree. The tree looks as follows. Again, the first step is to pull_black to move the double-black up the tree.

  v ->     7,c                                            7,c+1
        /      \                                        /      \        
     3,BB       11,B         ==pull_black==>         3,B        11,R     
   /   \       /    \                              /   \       /    \
 1,?    5,?  9,?     13,?                        1,?    5,?  9,?     13,?

Similar to before, we could have introduced a red-edge if either 9 or 13 (or both) are red. If both are black, we are done and can just return. Let's first consider the symmetric case from the previous when 13 is red (this is the reflection from when 1 was red). We now flip_right at 7 and then push_black to obtain:

  v ->     7,c+1                                        11,c+1                               11,c
        /      \                                      /      \                             /      \     
     3,B        11,R       == flip_left(v)==>     7,R        13,R   ==push_black==>    7,B        13,B
               /    \                            /   \                                /   \           
             9,?     13,R                      3,B    9,?                           3,B    9,?        

This maintains the black-heights and does not produce a red-edge (even if 9 is red). The left-leaning property could be violated if 9 is red, so if 9 is red we need a final flip_left at 7 to restore the left-leaning property.

Now consider when 13 is black and 9 is red.

 v ->     7,c+1         
       /      \         
    3,B        11,R     
              /    \    
            9,R     13,B

We could consider various rotations and color changes to fix this directly, but notice that if we rotate right at 11 we go back to the previous case where v.right and v.right.right are both red. Note that 8 and 10 are both black since there were no red-edges before we pulled black at v.

 v ->     7,c+1                                         v ->     7,c+1         
       /      \                                               /      \         
    3,B        11,R         == rotate_right at 11 ==>      3,B        9,R     
              /    \                                                 /    \    
            9,R     13,B                                           8,B     11,R
           /   \                                                          /    \
          8,B   10,B                                                    10,B    13,B

We still maintain the black-height property. The left-leaning property is temporarily violated between the children of 9, but we are back to the situation where v.right and v.right.right are both red, so as above flipping left and pushing black fixes this.

fix_left_doubleblack(v):
    pull_black(v)

    # Transform v.right with a red left and black right child to a case where v.right and
    # v.right.right are red.
    if is_red(v.right) and is_red(v.right.left) and is_black(v.right.right):
        v.right = rotate_right(v.right)

    # Fix the case when v.right and v.right.right are red
    if is_red(v.right) and is_red(v.right.right):
        v = flip_left(v)
        push_black(v)

        # Now fix the left-leaning violation that could have occurred
        if is_black(v.left.left) and is_red(v.left.right):
            v.left = flip_left(v.left)

    return v

Exercise

Exercise 9..3 and Exercise 9..10 from this page.