To implement intersection and difference, we need another utility function, join. join takes as input two treaps \(v_1\) and \(v_2\) where every element in \(v_1\) is smaller than every element in \(v_2\). It returns a new treap with the union of the elements from \(v_1\) and \(v_2\). Note this is simpler than the union operation since we know that the elements in \(v_1\) and \(v_2\) are separated.
join(v1, v2):
  if v1 == nil:
    return v2
  if v2 == nil:
    return v1
  if v1.p < v2.p:
    # Make a clone of v1 the new root
    w = new node
    w.x = v1.x
    w.p = v1.p
    # We now need to merge v2 with the children of w. But everything
    # in v2 is larger than v1.x, so it only needs to be merged with the
    # right child of w.
    w.left = v1.left
    w.right = join(v1.right, v2)
    return w
  else:
    # Make a clone of v2 as the new root
    w = new node
    w.x = v2.x
    w.p = v2.p
    # Merge v1 with the left child, since everything in v1 is smaller than v2.x
    w.left = join(v1, v2.left)
    w.right = v2.right
    return w
The join operation runs in \(O(height)\) time, so its expected running time is \(O(\log n)\).
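For concreteness, here is one possible C++ rendering of join in the persistent style. The Node struct and the makeNode helper are assumptions made for this sketch; the corresponding definitions in persist-treap.cpp may differ.

// Hypothetical node type for this sketch; persist-treap.cpp may define it differently.
struct Node {
    int x;        // value
    int p;        // priority
    Node* left;
    Node* right;
};

// Allocate a fresh node; for persistence we never modify existing nodes.
Node* makeNode(int x, int p, Node* left, Node* right) {
    return new Node{x, p, left, right};
}

// join assumes every element under v1 is smaller than every element under v2.
Node* join(Node* v1, Node* v2) {
    if (v1 == nullptr) return v2;
    if (v2 == nullptr) return v1;
    if (v1->p < v2->p) {
        // Clone v1 as the new root; all of v2 merges into its right subtree.
        return makeNode(v1->x, v1->p, v1->left, join(v1->right, v2));
    } else {
        // Clone v2 as the new root; all of v1 merges into its left subtree.
        return makeNode(v2->x, v2->p, join(v1, v2->left), v2->right);
    }
}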
Lemma Let \(v_1\) and \(v_2\) be roots of treaps, where in addition for each node \(s\) in the treap rooted at \(v_1\) and each node \(t\) in the treap rooted at \(v_2\), we have that \(s.x < t.x\). Let \(w = join(v_1, v_2)\). Then \(w\) is the root of a treap consisting of the union of elements from \(v_1\) and \(v_2\).
Proof The proof is by induction on the sum of the size of \(v_1\) and \(v_2\).
Base case: \(v_1\) or \(v_2\) is nil. In this case, returning the other treap gives the correct union.
Inductive case when \(v_1.p < v_2.p\). In this case, \(v_1\) has the smallest priority of any value in either the treap rooted at \(v_1\) or the treap rooted at \(v_2\), since \(v_1.p < v_2.p\). Therefore, \(v_1\) will become the root of the newly returned treap. But since this operation is persistent, we create a clone of \(v_1\) called \(w\). Next, the children of the new clone \(w\) will be the result of merging the children of \(v_1\) and the entire treap rooted at \(v_2\). Since \(w.x = v_1.x\) and we know by assumption that everything in \(v_2\) is larger, we know that everything in \(v_2\) will go to the right of \(w\). Therefore, the only elements smaller than \(w.x\) are exactly the left subtree of \(v_1\), so setting \(w.left = v_1.left\) is correct and preserves the BST and heap properties. Let \(u = join(v_1.right, v_2)\). First, notice that \(v_1.right\) and \(v_2\) are roots of treaps. In addition, everything in the treap rooted at \(v_1.right\) is smaller than everything in the treap rooted at \(v_2\). Therefore, by induction \(u\) is the root of a treap consisting of all elements from both \(v_1.right\) and \(v_2\). Therefore, setting \(w.right = u\) correctly preserves the BST and heap property and produces exactly the correct treap.
Inductive case when \(v_1.p > v_2.p\). This case is very similar to the previous case. We make a clone \(w\) of \(v_2\) the root, merge \(v_1\) into the left subtree of \(v_2\), and set \(v_2.right\) unchanged as the new right child of \(w\).
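One way to exercise this lemma in test code is to check the two treap invariants directly. The following helper is not part of persist-treap.cpp; it is a hypothetical checker, reusing the Node struct from the sketch above, that a test could call on the result of join.

#include <climits>

// Returns true if the tree rooted at v is a valid treap: every value lies
// strictly between lo and hi (BST property) and no child has a smaller
// priority than its parent (heap property). Values are assumed to avoid
// the INT_MIN/INT_MAX sentinels.
bool isTreap(const Node* v, int lo = INT_MIN, int hi = INT_MAX) {
    if (v == nullptr) return true;
    if (v->x <= lo || v->x >= hi) return false;
    if (v->left != nullptr && v->left->p < v->p) return false;
    if (v->right != nullptr && v->right->p < v->p) return false;
    return isTreap(v->left, lo, v->x) && isTreap(v->right, v->x, hi);
}

After a call like w = join(v1, v2), asserting isTreap(w) checks exactly the conclusion of the lemma.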
Intersection is then implemented as follows. The idea is basically to copy the union code with one small change: instead of always adding an element, we only add it if it appears in both v1 and v2. If we don't add the element, we must join the two recursively computed children together.
intersection(v1, v2):
  if v1 == nil or v2 == nil:
    return nil
  if v1.p < v2.p:
    # v1 might become the new root, so split v2 around v1.x
    (r1, r2, a) = split(v2, v1.x)
    # Now, intersect the stuff smaller than v1.x, which is v1.left and r1
    left = intersection(v1.left, r1)
    # Now intersect the stuff larger than v1.x, which is v1.right and r2
    right = intersection(v1.right, r2)
    if a == nil:
      # v1.x is not in v2, so unlike union don't add v1. Instead, we have to
      # join left and right to form a new treap without v1.
      return join(left, right)
    else:
      # Similar to union, create a new clone of v1
      w = new node
      w.x = v1.x
      w.p = v1.p
      w.left = left
      w.right = right
      return w
  else:
    # v2 might become the new root, so split v1 around v2.x
    (r1, r2, a) = split(v1, v2.x)
    # Compute the left and right subtrees
    left = intersection(r1, v2.left)
    right = intersection(r2, v2.right)
    if a == nil:
      # v2.x is not in v1, so don't add v2. Instead, join
      return join(left, right)
    else:
      # Like union, create a new clone of v2 but use the a.x value so
      # the intersection takes elements from v1
      w = new node
      w.x = a.x
      w.p = v2.p
      w.left = left
      w.right = right
      return w
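For reference, a possible C++ version of this pseudocode, reusing the Node, makeNode, and join sketches from above, might look like the following. The split signature shown here, returning a std::tuple of the smaller treap, the larger treap, and the matching node (or nullptr), is an assumption for this sketch; the actual declaration in persist-treap.cpp may differ.

#include <tuple>

// Assumed signature for split: elements smaller than x, elements larger than x,
// and the node whose value equals x (or nullptr if x is absent).
std::tuple<Node*, Node*, Node*> split(Node* v, int x);

// Left-biased persistent intersection, mirroring the pseudocode above.
Node* intersection(Node* v1, Node* v2) {
    if (v1 == nullptr || v2 == nullptr) return nullptr;
    if (v1->p < v2->p) {
        Node *r1, *r2, *a;
        std::tie(r1, r2, a) = split(v2, v1->x);
        Node* left  = intersection(v1->left,  r1);
        Node* right = intersection(v1->right, r2);
        if (a == nullptr) return join(left, right);     // v1->x is not in v2
        return makeNode(v1->x, v1->p, left, right);     // keep a clone of v1
    } else {
        Node *r1, *r2, *a;
        std::tie(r1, r2, a) = split(v1, v2->x);
        Node* left  = intersection(r1, v2->left);
        Node* right = intersection(r2, v2->right);
        if (a == nullptr) return join(left, right);     // v2->x is not in v1
        return makeNode(a->x, v2->p, left, right);      // take the value from v1
    }
}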
Lemma Let \(v_1\) and \(v_2\) be roots of treaps and let \(w = intersection(v_1, v_2)\). Then \(w\) is the root of a treap consisting of all elements which appear in both \(v_1\) and \(v_2\). The intersection is left-biased, so it takes its elements from \(v_1\).
Proof The proof is by induction on the sum of the size of \(v_1\) and \(v_2\).
Base case: \(v_1\) or \(v_2\) is nil. In this case, nil is the correct intersection.
Inductive case when \(v_1.p < v_2.p\). In this case, the node \(v_1\) has the smallest priority among all elements, so if we keep \(v_1\) it will be the root. If we do decide to keep \(v_1\), we will need to merge the elements of the treap rooted at \(v_2\) into the left and right of \(v_1\), similar to union, so we split the treap rooted at \(v_2\) around \(v_1.x\). Let \((r_1, r_2, a) = split(v_2, v_1.x)\). By the lemma about splits, \(r_1\) is the root of a treap with all elements from \(v_2\) smaller than \(v_1.x\), \(r_2\) is the root of a treap with all elements from \(v_2\) larger than \(v_1.x\), and \(a\) is either nil if \(v_1.x\) is not in \(v_2\) or equals the node from \(v_2\) whose value is \(v_1.x\).
Therefore, \(v_1.left\) and \(r_1\) both consist of elements smaller than \(v_1.x\). Also, \(v_1.right\) and \(r_2\) both consist of elements larger than \(v_1.x\). Because of this, we don't need to intersect the entire treap rooted at \(v_1\) with the entire treap rooted at \(v_2\) since for example it is impossible for an element in the treap rooted at \(v_1.left\) to appear in the treap rooted at \(r_2\). Therefore, we just need to intersect \(v_1.left\) with \(r_1\) and intersect \(v_1.right\) with \(r_2\). Let \(left = intersect(v_1.left, r_1)\) and \(right = intersect(v_1.right, r_2)\). By induction, \(left\) consists of all elements shared between \(v_1.left\) and \(r_1\), and \(right\) consists of all elements shared between \(v_1.right\) and \(r_2\). Combining that with our previous logic, \(left\) is all elements that appear both in \(v_1\) and \(v_2\) which are smaller than \(v_1.x\), and \(right\) is all elements that appear both in \(v_1\) and \(v_2\) which are larger than \(v_1.x\).
The union of the elements in \(left\) and \(right\) is therefore exactly the set of elements that should appear in the resulting intersection, except we have to decide whether \(v_1.x\) should appear as well.
If \(a\) is nil, it means that \(v_1.x\) is not in \(v_2\), so \(v_1.x\) should not appear in the intersection. We therefore must return the union of \(left\) and \(right\), but since we know everything in \(left\) is smaller than \(v_1.x\) and everything in \(right\) is larger than \(v_1.x\), we can use join. By the lemma above about join, the result of join is the root of a treap consisting of the union of the elements from \(left\) and \(right\), exactly the treap we want to produce.
If \(a\) is not nil, it means \(v_1.x\) is in \(v_2\), so \(v_1.x\) must appear in the returned treap. We therefore need to combine \(v_1\), \(left\), and \(right\) to form a treap. This is easy because of the way we built \(left\) and \(right\); we can just make a clone \(w\) of \(v_1\) and set \(w.left = left\) and \(w.right = right\). First, \(v_1\) had the smallest priority of any element, so the heap property at \(w\) is satisfied. Next, everything in \(left\) has a smaller value than \(v_1.x = w.x\) and everything in \(right\) has a larger value than \(v_1.x = w.x\), so the BST property is satisfied at \(w\). Finally, \(left\) and \(right\) were roots of treaps, so the heap and BST properties are still satisfied at all elements in the treaps rooted at \(left\) and \(right\). Therefore, \(w\) is the root of a treap and consists of exactly the intersection of \(v_1\) and \(v_2\).
Inductive case when \(v_2.p < v_1.p\). This is similar to the previous case. We split \(v_1\) around \(v_2.x\) so that we can add the elements from \(v_1\) into the left and right of \(v_2\). We then recursively intersect \(r_1\) with \(v_2.left\) and \(r_2\) with \(v_2.right\). The result is that \(left\) is all elements that appear in both \(v_1\) and \(v_2\) that are smaller than \(v_2.x\), and \(right\) is all elements that appear in both \(v_1\) and \(v_2\) that are larger than \(v_2.x\). If \(a\) is nil, \(v_2.x\) should not appear in the output, so just join \(left\) and \(right\). If \(a\) is not nil, \(v_2.x\) does exist in \(v_1\). Therefore, create a clone \(w\) of \(v_2\) to become the new root. Since we want a left-biased intersection and always take values from \(v_1\), set \(w.x = a.x\).
Difference is very similar to intersection! The base case, when v1 or v2 is nil, is slightly different, as is the handling of a not equal to nil: difference should create a clone if the element is in v1 and missing from v2.
To remove an element from a treap, we can just use difference to subtract a singleton tree consisting just of the element we want to remove.
remove(v, x):
  # v is the root of a treap and x is an element.
  # The return value is a new treap with x removed from v if it exists.
  # v is not changed.
  w = new node
  w.x = x
  w.p = random number
  w.left = nil
  w.right = nil
  return difference(v, w)
persist-treap.cpp contains my implementation. For the most part it is a direct translation of the pseudocode. The recursive version of split uses a C++ tuple and tie to return multiple values. I also describe a space-efficient version of split.
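As an illustration of the tuple-and-tie style, a recursive split might be written roughly as follows, reusing the Node and makeNode sketches from above. This is only a sketch; the version in persist-treap.cpp may differ in its details.

#include <tuple>

// Splits the treap rooted at v around x, returning (elements < x, elements > x,
// the node containing x or nullptr). Like the other operations, it clones nodes
// rather than modifying them, so v is unchanged.
std::tuple<Node*, Node*, Node*> split(Node* v, int x) {
    if (v == nullptr)
        return std::tuple<Node*, Node*, Node*>(nullptr, nullptr, nullptr);
    if (x < v->x) {
        // Split the left subtree; v and its right subtree are all larger than x.
        Node *l, *r, *a;
        std::tie(l, r, a) = split(v->left, x);
        return std::make_tuple(l, makeNode(v->x, v->p, r, v->right), a);
    } else if (x > v->x) {
        // Split the right subtree; v and its left subtree are all smaller than x.
        Node *l, *r, *a;
        std::tie(l, r, a) = split(v->right, x);
        return std::make_tuple(makeNode(v->x, v->p, v->left, l), r, a);
    } else {
        // x == v->x: the children are exactly the smaller and larger halves.
        return std::make_tuple(v->left, v->right, v);
    }
}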
Write pseudocode for difference and then fill in the implementation of difference inside persist-treap.cpp. As mentioned above, it is very similar to intersection, so you can base it off of the C++ code for intersection. Finally, add some test code to main which calls difference and prints the result.