Oct 30

Lecture Overview

Treap iteration

This isn't mentioned in the book, but implementations of USet usually have the ability to iterate the elements. There are several possible ways of traversing a tree. For a BST, the nicest iterator is the in-order traversal because an in-order traversal on a binary search tree visits the nodes in order of their \(x\) value. An in-order traversal first visits all nodes to the left (recursively), then visits the node itself, then visits all nodes to the right (recursively). In a language that supports coroutines, this recursive traversal can be directly implemented, such as the following pythonesce code:

generate_inorder(v):
    if v.left != nil:
        yield from generate_inorder(v.left)
    yield v
    if v.right != nil:
        yield from generate_inorder(v.right)

In a language that does not have generators or coroutines such as C++ we have to work harder to create an iterator. Remember from before that the iterator is a structure which refers to a single element in the tree and has methods to move to the next and previous element. To implement our iterator, we essentially are going to implement the recursive version above but instead of recursion, we will maintain the stack ourselves. The recursive version uses the call stack to store in each call stack frame a node \(v\). Also, the nodes on the call stack are exactly a path in the tree from the root to a single node. So to create an iterator we make this stack explicit and store it in the iterator struct. We use the C++ stack implementation.

struct Iterator {
    private:
        stack<shared_ptr<Node>> path;
    public:
        //TODO
}

The value in the path property is a stack of nodes, with the root the bottom of the stack, the element referred to by the iterator as the top of the stack, and each node in the stack is the parent of the next node in the stack.

      f    
    /   \  
  b      g
 / \      \
a   d      i
   / \    /
  c   e  h

An example iterator might have path [f, b, d] to refer to the d node, or path [f, g, i] to refer to the i node.

The begin operation is a method of the treap and returns a new iterator pointing at the first element. The first element in an in-order traversal is the left-most node, found by going left until left is nil:

begin()
    i = new iterator
    v = root
    while v != nil: # Loop until we get nil
        i.path.push(v) # Add v to the path
        v = v.left # Go left
    return i

In the example tree above, begin starts at node a with path [f, b, a].

Now consider the next operation on iterators. There are several cases. First, consider that we are at b with path [f, b]. Since we are currently visiting b, it means we have already visited all nodes to the left of b and none to the right. So the next node to visit is from the right subtree of b. But remember it is an in-order traversal, so inside the right subtree of b we must visit the left-most node which happens to be c.

next()
    v = path.top()  # Get the current node
    if v.right != nil:
        # Go right
        v = v.right

        # Now go left until you can't go left, adding all nodes to the path
        while v != nil:
            path.push(v)
            v = v.left
    else:
        # TODO

Now we have to consider when the right child is nil. For example, a with path [f, b, a]. The next node in the in-order traversal from a is b which we can find by popping a from the path. But consider the more complicated situation of node e with path [f, b, d, e]. The next node in the in-order traversal is f.

In general, what we do when we find a node with no right child is move back up the tree one step by popping the path. If when stepping up the tree we pass over a right-edge then we know that we have already visited this node. For example, moving up the tree from e to d is over a right-edge, and the in-order traversal has already visited d because an element is visited before anything in its right subtree. If this happens, we have to continue stepping up the tree. In general, we step up the tree until we go over a left-edge, so e goes all the way up to f.

next()
    v = path.top()  # Get the current node
    if v.right != nil:
        # Go right
        v = v.right

        # Now go left until you can't go left, adding all nodes to the path
        while v != nil:
            path.push(v)
            v = v.left
        return

    else:

        while True:
            path.pop() # Remove v
            if path.size() == 0:
                # If the path is empty, we have visited all nodes.
                return

            else:
                # The new top of the path is the parent of v.
                parentOfV = path.top()
                if parentOfV.left = v:
                    # If moving up went over a left-edge, cancel the loop and return.
                    # parentOfV is exactly the node we want to visit next.
                    return
                else:
                    # Otherwise, set v to be parentOfV and continue the loop
                    v = parentOfV

Finally, we can then implement isAtEnd by checking if the path is empty:

isAtEnd():
    return path.size() == 0

Running Time

First, begin has running time \(O(height)\) which is in expectation \(O(\log n)\) and isAtEnd has running time \(O(1)\) (the reference for stack.size says the complexity of size is constant).

next has running time \(O(height)\), because occasionally we will pop all the way from a leaf to the root. In addition, consider the next call after we visited the root: the next node from the root is to go right and then go all the way to a leaf, which is also \(O(height)\). But these only happen occasionally so lets amortize the cost of next. We do so via the following banker's method:

In summary, begin runs in amortized time \(O(n)\) and next runs in amortized time \(O(1)\). This tells me that if I create a for loop which uses begin and next to iterate all elements (and the body of my for loop is \(O(1)\)), that for loop will run in time \(O(n)\) since I only call begin once and then each call to next is \(O(1)\). This is the best we can hope for since a for loop iterating all the elements in the tree must run in at least \(\Omega(n)\) time since there are \(n\) elements.

Implementation

Here is the implementation in C++. This can be copied directly into the Treap struct somewhere inside the public: section. Also, you must add #include <stack> to the top of the file.

struct Iterator {
    private:

        stack<shared_ptr<Node>> path;

    public:

        //Constructing an iterator goes to begin
        Iterator(Treap t) {
            shared_ptr<Node> v = t.root;
            while (v) {
                path.push(v);
                v = v->left;
            }
        }
    
        Tval key() {
            return path.top()->key;
        }

        Tval value() {
            return path.top()->val;
        }

        void next() {
            shared_ptr<Node> v = path.top();
            if (v->right) {
                //go right and then left until you reach nil
                v = v->right;
                while (v) {
                    path.push(v);
                    v = v->left;
                }
            } else {
                while (true) {
                    path.pop();
                    if (path.size() == 0) {
                        return;
                    } else {
                        shared_ptr<Node> parentOfV = path.top();
                        if (parentOfV->left == v) {
                            return;
                        } else {
                            v = parentOfV;
                        }
                    }
                }
            }
        }

        bool isAtEnd() {
            return path.size() == 0;
        }
};

Iterator begin() {
    return Iterator(*this);
}

Printing

While writing this iterator and to assist me in the debugging, I wrote the following treap printing method which you might find useful. This should only be used for debugging since it exposes the structure of the treap, in particular no user of the data structure should ever be using this method.

static void debug_print_helper(shared_ptr<Node> v, int indent) {
    //print the tree via a pre-order traversal.  We print the node, then print
    //its children at one larger indent level.
    if (v) {
        if (indent) {
            cout << setw(indent) << " ";
        }
        cout << v->key << ":" << v->p << ":" << v->val << endl;
        debug_print_helper(v->left, indent+4);
        debug_print_helper(v->right, indent+4);
    }
}

void debug_print() {
    debug_print_helper(root, 0);
}

Exercise

Copy the debug_print_helper into the private section of the treap and debug_print into the public section of your treap code. Also, you will need to add #import <iomanip> to the top of the file to get access to the setw function. Then add a call to t.debug_print() to the end of the main function and run the code. Note that the tree is the same each time you run it because seed to the random number generator is the same. Therefore, add a call to srand(time(0)); to the top of the main function. Then run the code several times, looking at the structure of the tree.

Secondly, copy the above iterator code into the treap.cpp implementation. Then add some code to the bottom of main using the iterator and a for loop to iterate all elements. For each element print out the key and value. Note how the elements are traversed in order of their key.