Project 3: Splay Tree of Trees

Due: Tuesday, April 7, before 9:00 pm


Addenda


Objectives

The objectives of this programming assignment are:


Introduction

BST with Array-based Rebalancing

In computer science, laziness is sometimes a viable strategy. Why do today what you can put off until tomorrow? Especially if there is a chance that you won't actually have to do it tomorrow, either?

In contrast to AVL trees and Red-Black Trees in which we use fiddly rotations and restructurings to maintain \(O(\log n)\) height, with array-based rebalancing, when an imbalance occurs we copy the entire imbalanced subtree to an array and rebuild it as balanced as possible. On the plus side, this is easier to implement than other restructurings, but it is certainly much slower. Therefore, we may want to rebalance less frequently, waiting until the tree is really out-of-whack. Insertion proceeds in the same manner as in an unbalanced binary search tree, until we detect an imbalance, at which point we fix the imbalance using the array-based method.

When a subtree is rebalanced, we convert the entire subtree into a sorted array. Then we convert the array back into a perfectly balanced BST. Rebuilding is easy because the array is sorted. We can find the middle element of the array in constant time and make it the root of the new subtree. Then, we recursively build the left subtree and the right subtree using, respectively, the portion of the array that has keys smaller than the root and the portion of the array that has keys larger than the root. The result is a binary search tree that is as balanced as possible. (See Project 3 Examples.) The rebalance procedure takes \(O(t)\) on a BST subtree with \(t\) elements.
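The flatten-and-rebuild steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's required implementation: `Node`, `flatten`, `build`, and `destroy` are hypothetical names, the keys are plain integers, and the real BNode class also tracks height and size. For an even-sized range this sketch picks the upper of the two middle elements; the project examples may choose differently.

```cpp
// Hypothetical node type for illustration only; the project's BNode differs.
struct Node {
  int key;
  Node* left;
  Node* right;
  explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Copy the keys of the subtree rooted at t into out[] in sorted (inorder)
// order. n is the number of keys written so far.
void flatten(const Node* t, int* out, int& n) {
  if (t == nullptr) return;
  flatten(t->left, out, n);
  out[n++] = t->key;
  flatten(t->right, out, n);
}

// Build a balanced BST from the half-open range sorted[lo, hi).
// The middle element becomes the root; the halves become the subtrees.
Node* build(const int* sorted, int lo, int hi) {
  if (lo >= hi) return nullptr;
  int mid = lo + (hi - lo) / 2;
  Node* root = new Node(sorted[mid]);
  root->left = build(sorted, lo, mid);
  root->right = build(sorted, mid + 1, hi);
  return root;
}

// Free every node in the subtree rooted at t.
void destroy(Node* t) {
  if (t == nullptr) return;
  destroy(t->left);
  destroy(t->right);
  delete t;
}
```

Each key is visited once in `flatten` and once in `build`, which is where the \(O(t)\) bound comes from. A plain array is used here because the project restricts the use of STL containers.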

For this BST implementation, the user determines the balance criteria for the tree by implementing a balance function that returns true if a node is imbalanced and returns false otherwise. The function is passed to the BST class using a function pointer. The function must take four integer arguments, in this order: leftHeight, rightHeight, leftSize, and rightSize. However, the function doesn't have to use all four arguments. For example, here is a function that implements the AVL height-balanced property:

   bool imbalfn(int leftHeight, int rightHeight, int leftSize, int rightSize) {
     return (leftHeight > rightHeight + 1) || (rightHeight > leftHeight + 1);
   }

As was mentioned previously, array-based rebalancing is expensive; using the AVL height-balanced property may not be very efficient, as it tends to rebalance frequently. Fortunately, it's easy to try other criteria by passing a different balance function to the constructor.
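As one possible alternative, here is a sketch of a size-based criterion that triggers less often than the AVL rule. The name `sizeImbal`, the factor of 2, the cutoff of 4 nodes, and the alias `BalanceFn` are all assumptions chosen for illustration; they do not come from the provided files.

```cpp
// Hypothetical size-based criterion: a node is imbalanced when one subtree
// holds more than twice as many nodes as the other. Very small subtrees
// (fewer than 4 nodes in total) are never reported as imbalanced.
bool sizeImbal(int leftHeight, int rightHeight, int leftSize, int rightSize) {
  (void)leftHeight;   // heights are ignored by this criterion
  (void)rightHeight;
  if (leftSize + rightSize < 4) return false;
  return (leftSize > 2 * rightSize) || (rightSize > 2 * leftSize);
}

// A function pointer type matching the four-integer signature described
// above. The alias name is an assumption, not taken from BST.h.
typedef bool (*BalanceFn)(int, int, int, int);
```

A function such as `sizeImbal` can be stored in a `BalanceFn` variable and called through it, which is presumably how the BST class would invoke whatever function it is given.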

One note about the rebalance procedure: it is possible to have two nodes \(x\) and \(y\) that both need rebalancing, where \(x\) is an ancestor of \(y\). In this situation, we want to do the rebalancing at \(x\), since rebalancing the subtree rooted at \(x\) will also rebalance the subtree rooted at \(y\). If we rebalanced at \(y\) first, the time spent rebalancing at \(y\) would be completely wasted, since all that work is undone when we rebalance at \(x\). (See Project 3 Examples.)

Splay Trees

Another type of “balancing” is used in a type of binary search tree called a splay tree. Basically, a splay tree is a binary search tree that throws the idea of normal balancing out the window and focuses on the problem of accessing the same item over and over again. This results in a worst-case running time of \(O(n)\) for insertion and search, but this is a tradeoff for the ability to access the most recently used item in \(O(1)\) time. The structure achieves this through one principle: whenever an item is inserted or accessed through a search, it becomes the new root of the tree.

For this project, we will only focus on the insertion and search functions of a splay tree. Before fully understanding how a splay tree searches, we need to emphasize a couple of sub-tasks a splay tree should be able to achieve. The first one is rotations. The figure below illustrates left and right rotations:

Left and Right Single Rotations
(Image source: http://seweb.ucsd.edu/~kube/cls/100/Lectures/lec5/lec5-5.html#pgfId-955859)

It is not difficult to see that the ordering of the tree is preserved after rotating. In both images, we keep the \(a < Y < b < X < c\) relationship, and we assume that the \(a\), \(b\), and \(c\) trees were constructed correctly before the rotation occurred. For now, we will refer to these rotations as Zig (right rotation) and Zag (left rotation). These operations will help us better understand the four different cases in which we rotate in order to bring the node we searched for to the root.
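The two rotations can be sketched in a few lines each. This is an illustration only: `Node`, `rotateRight`, and `rotateLeft` are hypothetical names, and the project's SNode may carry additional fields (for example, parent pointers) that would also need updating.

```cpp
// Hypothetical node type for illustration only.
struct Node {
  int key;
  Node* left;
  Node* right;
};

// Zig (right rotation) at x: x's left child y becomes the subtree root,
// x becomes y's right child, and y's old right subtree (b) moves under x.
Node* rotateRight(Node* x) {
  Node* y = x->left;
  x->left = y->right;  // subtree b: keys between Y and X
  y->right = x;
  return y;            // new root of this subtree
}

// Zag (left rotation) at y: the mirror image of rotateRight.
Node* rotateLeft(Node* y) {
  Node* x = y->right;
  y->right = x->left;  // subtree b again
  x->left = y;
  return x;            // new root of this subtree
}
```

Only two pointers change in each rotation, and subtree \(b\) always ends up between \(Y\) and \(X\), so the ordering \(a < Y < b < X < c\) is preserved. The caller is responsible for attaching the returned node where the old subtree root used to be.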

Finally, the following diagram shows how we are able to pull a node we searched for to the root. Suppose we searched for node \(C\); we move it to the root by applying the operations described above:

Splaying at node \(C\)
(Image source: http://lcm.csa.iisc.ernet.in/dsa/node93.html)
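One common way to write the splay step is recursively, combining rotations two levels at a time. The sketch below is an assumption about how this could look, not the approach required by the provided Scanner class: it uses integer keys, a hypothetical `Node` type, and helper names `rotR`, `rotL`, and `splay` that do not come from the project files.

```cpp
// Hypothetical node type for illustration only.
struct Node {
  int key;
  Node* left;
  Node* right;
};

// Right and left single rotations; each returns the new subtree root.
static Node* rotR(Node* x) { Node* y = x->left;  x->left = y->right;  y->right = x; return y; }
static Node* rotL(Node* x) { Node* y = x->right; x->right = y->left;  y->left = x;  return y; }

// Move the node holding key to the root of the tree and return the new root.
// If key is absent, the tree is left unchanged or a node on the search path
// is brought to the root instead.
Node* splay(Node* root, int key) {
  if (root == nullptr || root->key == key) return root;

  if (key < root->key) {
    if (root->left == nullptr) return root;
    if (key < root->left->key) {
      // Left-left case: bring key to root->left->left, then rotate right.
      root->left->left = splay(root->left->left, key);
      root = rotR(root);
    } else if (key > root->left->key) {
      // Left-right case: bring key to root->left->right, then rotate left.
      root->left->right = splay(root->left->right, key);
      if (root->left->right != nullptr) root->left = rotL(root->left);
    }
    // Final right rotation lifts the target to the root.
    return root->left == nullptr ? root : rotR(root);
  }

  // Mirror image of the cases above for the right side.
  if (root->right == nullptr) return root;
  if (key > root->right->key) {
    root->right->right = splay(root->right->right, key);
    root = rotL(root);
  } else if (key < root->right->key) {
    root->right->left = splay(root->right->left, key);
    if (root->right->left != nullptr) root->right = rotR(root->right);
  }
  return root->right == nullptr ? root : rotL(root);
}
```

For example, splaying key 1 in the left-leaning chain 3, 2, 1 yields the right-leaning chain 1, 2, 3, with 1 at the root. An implementation with parent pointers would instead walk upward from the accessed node, applying the same rotations.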

Tree of Trees!

Suppose that some hacker with malicious intent has taken over your collection of ascii art images. For each image, he scrambled it and gave each ascii character a number. On top of that, he left a strange file called weights.txt, which you believe has to do with your now scrambled image. You deduce that these weights are also scrambled and they correspond to each character in your original image. With this knowledge, you devise a plan to recover your images.

For example, assume the following ascii art is composed of the letters A, B, and C:

  A B
  B C     (this is what the original ascii art should look like)

Then, you are given two files, one with your scrambled ascii art and one with the weights. The files may look like this:

  2 3
  1 2  (this is your scrambled ascii art)
  2 4
  1 3 (this is what the weights file looks like)

If you assign the ascii characters to the numbers in your scrambled file we have that A = 1, B = 2, C = 3. So then, our two files would look like this:

  Scrambled file before char assignment
  2 3
  1 2   
  Scrambled file after char assignment
  B C
  A B   
  Weights.txt
  2 4
  1 3   

Now, you know that the dimensions of your original file were 2x2, so you know that weights.txt is just a file containing the numbers 1 through 4 and you can use these numbers to reverse the scrambling! For example:

  Original image
  A B
  B C
  What weights.txt should look like after fix
  1 2
  3 4

The weights just map the positions of characters in the file. Now, it’s up to you to devise a way to reverse the scrambling! (not really, the right implementation is given below).

Since you just learned about binary search trees and splay trees, you think it’s a good idea to use them for solving each image. The plan is as follows:

  1. You plan to have several binary search trees, one for each line of your image, into which you will insert the correct characters for that row.
  2. These binary search trees will sort the characters by their weights given in weights.txt.
  3. To make things a little more interesting, rather than storing the binary trees in a vector, we will store them in a splay tree: each node in the splay tree will include a pointer to a BST. The nodes of the splay tree will be ordered by the weight range of their BSTs.
  4. After they are sorted, you just have to print out each tree in the same line order, and you will have your restored image!

For instance, if you were fixing the image from the previous example, you would have two trees (since there are only two lines). You will insert the characters along with the weights into the right tree so that tree one contains the weights 1 and 2, and tree two contains the weights 3 and 4. If you print out tree one and then tree two, you will obtain the original image.
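The weight-to-row mapping in this example can be checked with a small sketch. It places characters directly into a string rather than using the tree of trees, so it is only a way to verify the arithmetic, not the assigned algorithm. The names `rowOf`, `colOf`, and `restore` are hypothetical, and the sketch assumes single-character entries, 1-based weights, and a known image width.

```cpp
#include <string>

// 0-based row that holds a 1-based weight in an image `width` characters wide.
int rowOf(int weight, int width) { return (weight - 1) / width; }

// 0-based column of a 1-based weight within its row.
int colOf(int weight, int width) { return (weight - 1) % width; }

// Put each scrambled character at the position its weight names, then
// return the image with a newline after every `width` characters.
std::string restore(const char* chars, const int* weights, int n, int width) {
  std::string flat(n, ' ');
  for (int i = 0; i < n; ++i) {
    flat[weights[i] - 1] = chars[i];
  }
  std::string out;
  for (int i = 0; i < n; ++i) {
    out += flat[i];
    if ((i + 1) % width == 0) out += '\n';
  }
  return out;
}
```

With the scrambled characters B, C, A, B and weights 2, 4, 1, 3 from the example, `restore` with a width of 2 gives the rows "AB" and "BC". Under the same assumptions, row \(r\) holds the weights \(r \cdot width + 1\) through \((r + 1) \cdot width\), which is the range a splay tree node for that row would cover.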


Assignment

Your assignment is to implement this algorithm by completing a BST that supports array-based balancing; the BST class will be used to store each line of the ascii art file. Since there might be a lot of trees created, we will be using a splay tree to organize the BSTs. This will be efficient if we end up inserting several weights that belong in the same BST: the splaying operation moves the most recently accessed node to the root, allowing \(O(1)\) access when accessing the same BST repeatedly. In short, this assignment is an implementation of a splay tree of binary search trees, where each node in the splay tree will contain the root of a binary search tree.

For this project, you are provided with the following files:

  • BST.h — defines the data and public functions for the BST class. This file also contains a nested node class, BNode, which will be used to build the BST.
  • BST.cpp — a skeleton implementation of the BST class.
  • Scanner.h — contains the data and public functions for the Scanner class, which includes the splay tree implementation and other functions to read input files and unscramble ASCII art images. This file also contains a nested node class called SNode, which will be used to build our splay tree.
  • Scanner.cpp — a skeleton implementation of the Scanner class. You will have to implement file I/O to read data from two comma delimited files and load that data into the tree of trees.
  • driver.cpp — a sample driver program for the Scanner class.
  • scrambled.txt — comma delimited file containing the scrambled image.
  • weights.txt — comma delimited file containing the weights that correspond to each character in the scrambled image.
  • bstOutput.txt — output produced by running the main() function in BST.cpp with a working BST implementation.
  • driverOutput.txt — output produced by driver.cpp with working Scanner and BST classes.
    Note: driverOutput.txt doesn't display correctly in a web browser; download it, e.g. to GL, and cat the file to see the ASCII art image.
  • These and all the provided project files are available on GL:

    /afs/umbc.edu/users/c/m/cmarron/pub/www/cs341.s20/projects/proj3files/

You may not change any of the private variables or public function declarations. Also, any provided function implementations may not be modified. You may, however, add your own private variables and functions and additional “using” statements.

You are responsible for thoroughly testing your code. You must implement and submit two test files:

As always, make sure you get rid of those memory leaks and errors!

Specifications

BST Class

Scanner Class


Additional Requirements

Requirement: Your BST class must implement array-based rebalancing and must support the use of user-supplied balance functions as described in the Introduction. Your implementation will be tested with various balance functions, some based on heights, others on size.

Requirement: When called with verbose = true, the BST inorder() function must print an inorder traversal of the BST. For each node in the tree, it must print _data, _height, and _size, separated by colons. It must also print parentheses to show the structure of the tree (see the sample output in bstOutput.txt).

Requirement: The Scanner class must implement a splay tree in which each node contains a pointer to a BST. Each SNode in the splay tree stores the range of weights that are stored in its BST; these determine the ordering of the SNodes. After every insertion, the inserted SNode must be splayed.

Requirement: The inorder() function of the Scanner class must print an inorder traversal of the splay tree. For each node, it must print the upper and lower weight bounds, separated by a colon. Parentheses must be printed to show the tree structure (see the sample output in driverOutput.txt).

Requirement: You may not use any additional STL containers, and vector may only be used to store the string vector _chars.


Implementation Notes


What to Submit

You must submit the following files to the proj3 directory.

If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command.

    cp BST.h BST.cpp Scanner.h Scanner.cpp bstTest.cpp scanTest.cpp ~/cs341proj/proj3/