Recall the add pseudocode from last time:
add_helper(newNode, v):
    # Adds newNode into the tree rooted at v
    # Returns the new root of the tree, which could be different from v
    if v == nil:
        return newNode
    if newNode.x == v.x:
        v.x = newNode.x  # recall x could be a pair of a key and a value
    elif newNode.x < v.x:
        # Insert newNode to the left. The root of the left subtree could change,
        # so the return value of the recursive call will replace v.left
        v.left = add_helper(newNode, v.left)
        # If the heap property is violated, rotate right
        if v.left.p < v.p:
            v = rotate_right(v)
    else:
        # Insert newNode to the right. The root of the right subtree could change,
        # so the return value of the recursive call will replace v.right
        v.right = add_helper(newNode, v.right)
        # If the heap property is violated, rotate left
        if v.right.p < v.p:
            v = rotate_left(v)
    return v
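The pseudocode calls rotate_right and rotate_left without defining them. As a reminder, here is a minimal C++ sketch of the rotations together with a direct translation of add_helper. The Node layout (fields x, p, left, right) is an assumption for illustration, not necessarily the layout used in treap.cpp.

```cpp
// Sketch only: the node layout is an assumption, and priorities are plain ints.
struct Node {
    int x;                 // key (could be a key/value pair in general)
    int p;                 // heap priority
    Node* left = nullptr;
    Node* right = nullptr;
};

// Rotate the edge between v and its left child t; t becomes the subtree root.
Node* rotate_right(Node* v) {
    Node* t = v->left;
    v->left = t->right;
    t->right = v;
    return t;
}

// Mirror image: the right child of v becomes the subtree root.
Node* rotate_left(Node* v) {
    Node* t = v->right;
    v->right = t->left;
    t->left = v;
    return t;
}

// Direct translation of the add_helper pseudocode above.
Node* add_helper(Node* newNode, Node* v) {
    if (v == nullptr)
        return newNode;
    if (newNode->x == v->x) {
        v->x = newNode->x;  // replace the stored value
    } else if (newNode->x < v->x) {
        v->left = add_helper(newNode, v->left);
        if (v->left->p < v->p)
            v = rotate_right(v);  // fix the heap property at v
    } else {
        v->right = add_helper(newNode, v->right);
        if (v->right->p < v->p)
            v = rotate_left(v);   // fix the heap property at v
    }
    return v;
}
```

Note that each rotation only rearranges three pointers, so the extra work on top of the BST insertion is constant per level.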
Definition We say that \(v\) is the root of a treap if the subtree rooted at \(v\) satisfies the following two properties:
- The BST property: for every node \(u\) in the subtree, every key in \(u\)'s left subtree is smaller than \(u.x\) and every key in \(u\)'s right subtree is larger than \(u.x\).
- The heap property: for every node \(u\) in the subtree, \(u.p\) is smaller than the \(p\) values of \(u\)'s children.
Lemma If \(v\) is the root of a treap and \(v'\) is the return value of \(add\_helper(newNode, v)\), then \(v'\) is the root of a treap containing all the original elements plus the element newNode.
Proof The proof is by induction on the size of the tree.
Base case: \(v\) is nil so the tree has size zero. In that case, the correct treap is just a treap with a single node, \(newNode\). A single node satisfies the BST and heap property.
Inductive case: There are three subcases:
\(newNode.x = v.x\): In this case, the value stored at \(v\) is replaced and no other changes to the treap take place, so \(v' = v\) is still the root of a treap.
\(newNode.x < v.x\): Here, we recursively call \(add\_helper(newNode, v.left)\). Let \(t\) be the return value of this recursive call. By induction, \(t\) is the root of a treap with \(newNode\) correctly inserted. By setting \(v.left = t\), we now have a tree rooted at \(v\) which satisfies the BST property. But the heap property might not be satisfied. All elements in the right subtree of \(v\) satisfy the heap property since no elements in that subtree changed. All elements in the left subtree of \(v\) (the tree rooted at \(t\)) also satisfy the heap property since \(t\) is the root of a treap. The only node which might fail the heap property is \(v\) itself; it is possible that \(v.p > t.p\). If this is the case, we rotate the tree right. This rotation preserves the BST property. Also, the only parent-child relationship which changes its relative ordering is between \(v\) and \(t\). Thus \(t\) is now the root of the tree and the heap property is satisfied at \(t\). The code then sets \(v = t\) and returns \(v\), so it returns the new root of the treap.
\(newNode.x > v.x\): This case is almost identical to the previous case. The recursion produces a treap which we set into \(v.right\). The BST property is then satisfied, but \(v\) itself could fail the heap property. If \(v\) fails the heap property, we rotate the tree left. This preserves the BST property and fixes the heap property.
The add method has running time \(O(height)\). Since the expected height of the tree is \(O(\log n)\), the expected running time of add is \(O(\log n)\). add also uses \(O(\log n)\) stack space since each recursive call has a frame on the call stack. If you look at the iterative version of add in the book, it also has an expected running time of \(O(\log n)\), but uses only \(O(1)\) space. The iterative version in the book obtains this smaller space bound by using a parent pointer stored at each node. The downside is that these parent pointers take \(O(n)\) space to store, so there is a tradeoff: less space used during add but more space used overall in the nodes. Note that the iterative version in the book could remove the parent pointers and then use an explicit stack during add, giving \(O(\log n)\) space without parent pointers. Similarly, if the recursive version above included parent pointers, it could be rewritten to be tail-recursive, in which case it would use only \(O(1)\) space.
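To get a feel for the expected \(O(\log n)\) height claim, here is a small standalone experiment, independent of treap.cpp (the Node layout and helper names are assumptions): it inserts distinct keys with uniformly random priorities and measures the resulting height, which should land near \(3 \ln n\) rather than near \(n\).

```cpp
#include <algorithm>
#include <random>

struct Node {
    int x, p;
    Node* left = nullptr;
    Node* right = nullptr;
};

Node* rotate_right(Node* v) {
    Node* t = v->left; v->left = t->right; t->right = v; return t;
}
Node* rotate_left(Node* v) {
    Node* t = v->right; v->right = t->left; t->left = v; return t;
}

// Same insertion logic as the add_helper pseudocode (distinct keys assumed).
Node* add_helper(Node* newNode, Node* v) {
    if (v == nullptr) return newNode;
    if (newNode->x < v->x) {
        v->left = add_helper(newNode, v->left);
        if (v->left->p < v->p) v = rotate_right(v);
    } else if (newNode->x > v->x) {
        v->right = add_helper(newNode, v->right);
        if (v->right->p < v->p) v = rotate_left(v);
    }
    return v;
}

// Number of nodes on the longest root-to-leaf path.
int height(Node* v) {
    if (v == nullptr) return 0;
    return 1 + std::max(height(v->left), height(v->right));
}

// Build a treap on n distinct keys with uniformly random priorities.
// (Leaks the nodes; this is just a measurement sketch.)
Node* random_treap(int n, std::mt19937& gen) {
    std::uniform_int_distribution<int> dist(0, 1 << 30);
    Node* root = nullptr;
    for (int i = 0; i < n; i++)
        root = add_helper(new Node{i, dist(gen)}, root);
    return root;
}
```

Even though the keys here arrive in sorted order (the worst case for a plain BST), the random priorities keep the tree balanced in expectation.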
Here is a recursive version of remove.
remove_helper(x, v):
    # Removes x from the subtree rooted at v
    # Returns the root of the new subtree
    if v == nil:
        # x is not in the tree, so do nothing
        return v
    if x < v.x:
        v.left = remove_helper(x, v.left)
        return v
    elif x > v.x:
        v.right = remove_helper(x, v.right)
        return v
    else:
        # We have found the element to delete!
        # If one of the left or right subtrees is missing, it can just replace v
        if v.left == nil:
            # Use the right subtree of v as the new root of this subtree
            return v.right
        elif v.right == nil:
            # Use the left subtree of v as the new root of this subtree
            return v.left
        else:
            # Both the left and right subtrees exist. In this case, we can't immediately
            # remove v; we need to push v closer to a leaf, which we can do by performing
            # a rotation. The direction of rotation is determined by which of the left or
            # right child should replace v to satisfy the heap property
            if v.left.p < v.right.p:
                # Rotate the tree right so that the left child of v becomes the root of the subtree
                t = rotate_right(v)
                # Now v (which is the element we are trying to delete) has moved into the right
                # subtree of t, so recursively delete it from there.
                t.right = remove_helper(x, t.right)
                return t
            else:
                # Rotate the tree left so that the right child of v (which has smaller p) becomes
                # the root of the subtree
                t = rotate_left(v)
                # v is now in the left subtree of t, so remove it from there.
                t.left = remove_helper(x, t.left)
                return t

remove(x):
    root = remove_helper(x, root)
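A direct C++ translation of remove_helper could look like the following sketch. As before, the Node layout and the rotation helpers are assumptions for illustration, and a real implementation would also free the removed node.

```cpp
struct Node {
    int x, p;
    Node* left = nullptr;
    Node* right = nullptr;
};

Node* rotate_right(Node* v) {
    Node* t = v->left; v->left = t->right; t->right = v; return t;
}
Node* rotate_left(Node* v) {
    Node* t = v->right; v->right = t->left; t->left = v; return t;
}

// Direct translation of the remove_helper pseudocode.
// Note: this sketch does not delete the removed node, so it leaks memory.
Node* remove_helper(int x, Node* v) {
    if (v == nullptr)
        return v;  // x is not in the tree, so do nothing
    if (x < v->x) {
        v->left = remove_helper(x, v->left);
        return v;
    } else if (x > v->x) {
        v->right = remove_helper(x, v->right);
        return v;
    }
    // We have found the element to delete.
    if (v->left == nullptr)
        return v->right;   // right subtree replaces v (may be null)
    if (v->right == nullptr)
        return v->left;    // left subtree replaces v
    // Both subtrees exist: rotate the child with smaller priority to the top,
    // then delete x from the subtree it moved into.
    if (v->left->p < v->right->p) {
        Node* t = rotate_right(v);
        t->right = remove_helper(x, t->right);
        return t;
    } else {
        Node* t = rotate_left(v);
        t->left = remove_helper(x, t->left);
        return t;
    }
}
```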
Lemma If \(v\) is the root of a treap and \(v'\) is the return value of \(remove\_helper(x, v)\), then \(v'\) is the root of a treap containing all the original elements except the element equal to \(x\) if such an element existed.
Proof The proof is by induction on the size of the tree.
Base case: \(v\) is nil. In this case, \(x\) is not in the tree so returning nil is correct.
Inductive case. There are several cases:
Case: \(x < v.x\). If \(x\) exists it is located in the left subtree of \(v\). Let \(t\) be the return value from \(remove\_helper(x, v.left)\). By induction, \(t\) is the root of a treap with \(x\) potentially removed. \(t\) is then set as the left child of \(v\). The resulting tree satisfies the BST and heap properties.
Case \(x > v.x\). If \(x\) exists it is located in the right subtree of \(v\). Similar to the previous case, replacing \(v.right\) by the return value of remove_helper correctly removes \(x\) if it exists and keeps \(v\) as the root of a treap.
Case \(x == v.x\) and \(v.left\) is nil. In this case, \(v\) must be removed. Let \(t = v.right\) (note that \(t\) could be nil). Since \(v\) is the root of a treap, \(t\) is also the root of a treap. Since there are no elements to the left of \(v\), returning \(t\) is correct, since \(t\) is the root of a treap and the subtree rooted at \(t\) does not contain \(x\).
Case \(x == v.x\) and \(v.right\) is nil. This case is similar to the previous case; \(v.left\) is the root of a treap which does not contain \(x\).
Case \(x == v.x\), neither \(v.left\) nor \(v.right\) are nil, and \(v.left.p < v.right.p\). In this case we rotate the tree right. Let \(t = v.left\), let \(s = v.right\), and note that \(t\) is the return value from \(rotate\_right(v)\).
      v                   t
     / \                 / \
    t   s       =>      A   v
   / \ / \                 / \
  A  B C  D               B   s
                             / \
                            C   D
After the rotation, the tree rooted at \(t\) satisfies the BST property since rotations preserve the BST property. But the heap property at \(t\) is currently violated, since \(t.p > v.p\) and \(v\) is the right child of \(t\). But we do know that \(t.p\) is smaller than everything in the subtree except \(v\). Indeed, \(t.p\) is smaller than everything in \(A\) and \(B\) since these were originally under \(t\) in the original treap. Also, \(t.p\) is smaller than \(s.p\) by assumption (the case we are looking at started with \(v.left.p < v.right.p\)). Finally, \(s.p\) is smaller than everything in \(C\) and \(D\) since they were originally under \(s\) in the original treap. Therefore, \(t.p\) is smaller than everything except \(v\).
The next step in the code is to recursively call \(remove\_helper(x, t.right)\). Since \(v\) is now in the right subtree of \(t\) and \(x == v.x\), by induction the return value from \(remove\_helper(x, t.right)\) is the root of a treap with the key \(x\) removed, i.e. \(v\) has been removed. Setting this return value as the new \(t.right\) then produces a treap rooted at \(t\). The heap property at \(t\) is now satisfied since \(t.p\) is smaller than all other \(p\) values once \(v\) has been removed.
remove runs in time \(O(height)\) so the expected running time is \(O(\log n)\). Similar to add, as written the space usage of remove is \(O(\log n)\). But, if we examine the remove_helper function, it is almost tail recursive. The only thing that happens after a recursive call returns is that we replace the parameter we passed in with the return value. That is, here is a snippet from the code:
v.left = remove_helper(x, v.left)
We pass in \(v.left\) to the recursive call and then replace \(v.left\) by whatever is returned. If in our C++ implementation we passed \(v.left\) by reference, the value in \(v.left\) could just be updated directly. (Recall that a parameter passed by reference can be directly modified by the callee.) If we change the parameter of remove_helper to a reference, the function becomes tail-recursive, and the stack usage drops to \(O(1)\).
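Concretely, the reference-parameter version might look like the following sketch (the Node layout and rotation helpers are again assumptions; treap.cpp's actual code may differ). Since \(v\) refers directly to the pointer slot in the parent node (or to root itself), "return the new subtree root" becomes an in-place assignment, and every recursive call ends the function, i.e. is a tail call.

```cpp
struct Node {
    int x, p;
    Node* left = nullptr;
    Node* right = nullptr;
};

Node* rotate_right(Node* v) {
    Node* t = v->left; v->left = t->right; t->right = v; return t;
}
Node* rotate_left(Node* v) {
    Node* t = v->right; v->right = t->left; t->left = v; return t;
}

// v is a reference to the pointer slot in the parent (or to root itself),
// so updating v rewires the tree directly. Every recursive call is in tail
// position, allowing the compiler to turn the recursion into a loop.
void remove_helper(int x, Node*& v) {
    if (v == nullptr)
        return;                      // x is not in the tree
    if (x < v->x) {
        remove_helper(x, v->left);   // tail call
    } else if (x > v->x) {
        remove_helper(x, v->right);  // tail call
    } else if (v->left == nullptr) {
        v = v->right;                // splice out the found node
    } else if (v->right == nullptr) {
        v = v->left;                 // splice out the found node
    } else if (v->left->p < v->right->p) {
        v = rotate_right(v);
        remove_helper(x, v->right);  // tail call
    } else {
        v = rotate_left(v);
        remove_helper(x, v->left);   // tail call
    }
}
```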
treap.cpp is my implementation. The add function is essentially directly translated into C++. The remove functions use the above trick to obtain a tail-recursive version; the C++ contains some comments explaining in more detail how the parameter to remove_helper works as a reference.
I had to compile with -O2 in order for g++ to properly optimize remove_helper with tail-call optimization. To see that it in fact worked, I compiled with g++ -O2 -g -std=c++11 treap.cpp. I then opened a.out inside gdb and placed a breakpoint on line 156 (which is the first line of the remove_helper function). Each time the breakpoint was hit, I checked the backtrace and saw that it contained only a single frame for remove_helper.
Write pseudocode for an extract_min method on treaps. The extract_min method takes no parameters. It locates the entry in the treap with the smallest \(x\) value, removes it, and returns the removed \(x\) value. What is the running time of your pseudocode? Next, translate your pseudocode to C++ and add it to my treap.cpp implementation.