Project 3, Lazy Binary Search Trees
Due: Tuesday, April 11, 8:59pm
Addenda
- [Thu 3/30/17 16:10] Test programs updated to fix extra parameter used in calls to remove().
- [Thu 3/30/17 16:10] Test programs are available now.
- [Tue 3/28/17 11:19] Clarified that insert() and remove() should take O( log n ) time without counting the time to rebalance.
Objectives
The objective of this programming assignment is to have you practice using recursion in your programs and to familiarize you with the binary search tree data structure.
Introduction
In real life, laziness is often disdained. (See Wikipedia article on sloth.) In computer science, however, laziness is sometimes a viable strategy. Why do today what you can put off until tomorrow, especially if there is a chance that you won't actually have to do it tomorrow either?
In contrast to AVL trees and Red-Black Trees, where we diligently maintain a balance condition to guarantee that the tree has O( log n ) height, in a Lazy BST, we don't worry about the balancing until things get really out of whack. Insert and delete proceed in the same manner as in an unbalanced binary search tree until we notice that at some node of the BST, the left subtree is twice as large as the right subtree (or vice versa). When this happens we rebalance the subtree of the Lazy BST rooted at this node.
When a subtree of a Lazy BST is rebalanced, we convert the entire subtree into a sorted array. Then we convert the array back into a perfectly balanced BST. Rebuilding is easy because the array is sorted. We can find the middle element of the array in constant time and make it the root of the new subtree. Then, we recursively build the left subtree and the right subtree using, respectively, the portion of the array that has keys smaller than the root and the portion of the array that has keys larger than the root. The result is a binary search tree that is as balanced as possible. (See Project 3 Examples.) The rebalance procedure takes O( t ) on a BST subtree with t elements. However, we don't have to rebalance very often — amortized analysis shows that the insert and delete procedures take O( log n ) amortized time on a Lazy BST with n elements.
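The rebuild step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the required design: the Node struct and the function names flatten, build, destroy and rebalance are hypothetical, the height and size bookkeeping is left out, and the sketch allocates fresh nodes rather than reusing the old ones. When the range has an even number of keys, either middle element works; picking the lower one is an arbitrary choice here.

```cpp
#include <cassert>

// Hypothetical node type, for illustration only.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Copy the keys of the subtree rooted at t into arr in sorted (inorder)
// order, starting at index pos. Returns the next free index.
int flatten(const Node* t, int* arr, int pos) {
    if (t == nullptr) return pos;
    pos = flatten(t->left, arr, pos);
    arr[pos++] = t->key;
    return flatten(t->right, arr, pos);
}

// Build a balanced BST from the sorted range arr[lo..hi] (inclusive).
Node* build(const int* arr, int lo, int hi) {
    if (lo > hi) return nullptr;
    int mid = lo + (hi - lo) / 2;            // middle element becomes the root
    Node* root = new Node{arr[mid], nullptr, nullptr};
    root->left = build(arr, lo, mid - 1);    // keys smaller than the root
    root->right = build(arr, mid + 1, hi);   // keys larger than the root
    return root;
}

// Free every node of the subtree rooted at t.
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

// Rebalance a subtree that is known to contain exactly n nodes.
// Each node is visited a constant number of times, so this takes O(n).
Node* rebalance(Node* t, int n) {
    int* arr = new int[n];
    int count = flatten(t, arr, 0);
    assert(count == n);                      // sanity check on the caller's n
    destroy(t);
    Node* fresh = build(arr, 0, n - 1);
    delete [] arr;
    return fresh;
}
```

A variation that stores node pointers in the array instead of keys avoids the extra allocations and deallocations.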
Since rebalancing is expensive, we add another provision: we won't rebalance a subtree that has height ≤ 3. An unbalanced subtree that has height 3 will not add very much to the height of the overall tree and hence will not contribute very much to the running time of the BST procedures. (We adopt the convention that the height of a leaf is 0, where a leaf is a node that has actual data and no children.) By ignoring small unbalanced subtrees, we can avoid excessive rebalancing.
One note about the rebalance procedure: it is possible for a Lazy BST to have two nodes x and y that both need rebalancing, where x is an ancestor of y. In this situation, we want to do the rebalancing at x since rebalancing the subtree rooted at x will also rebalance the subtree rooted at y. If we rebalanced at y first, the time spent rebalancing at y is completely wasted since all that work is undone when we rebalance at x. (See Project 3 Examples.)
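The two conditions above (one subtree twice as large as the other, and height greater than 3) can be combined into a single constant-time test. The sketch below is an assumption about how to read the rule, not the official definition: it treats "twice as large" as "at least twice as many nodes", and the Node struct and the names sizeOf and needsRebalance are hypothetical. Check the Project 3 Examples to confirm the exact comparison.

```cpp
#include <cassert>

// Hypothetical node type that caches the height and size of its subtree.
struct Node {
    int key;
    int height;   // a leaf has height 0
    int size;     // number of nodes in the subtree rooted here
    Node* left;
    Node* right;
};

// Size of a possibly empty subtree.
int sizeOf(const Node* t) { return t ? t->size : 0; }

// True if the subtree rooted at t should be rebalanced.
// Assumption: "twice as large" means at least twice as many nodes.
bool needsRebalance(const Node* t) {
    if (t == nullptr || t->height <= 3) return false;  // ignore small subtrees
    int l = sizeOf(t->left);
    int r = sizeOf(t->right);
    assert(t->size == 1 + l + r);   // cached sizes must be consistent
    return l >= 2 * r || r >= 2 * l;
}
```

Because the test only reads cached fields, calling it at every node on the way down from the root does not change the asymptotic running time of a traversal, and the first node where it returns true is the highest one that needs rebalancing.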
Assignment
Note: Running time is one of the most important considerations in the implementation of a data structure. Programs that produce the desired output but exceed the required running times are considered wrong implementations and will receive substantial deductions during grading.
Your assignment is to implement a Lazy BST. You may start with a binary search tree class from the textbook or given by your instructor, if you prefer. You may also design your own. Each option has advantages and disadvantages. A primary objective of this programming assignment is to have you use recursion. So, one component of grading will evaluate how elegantly you employ recursion to implement this data structure. (Yes, you are being graded on aesthetics!)
Since you will choose the design of the class definitions, no header files will be distributed with this project. Instead, the requirements are:
- The name of the class must be LazyBST.
- The header file must be named LazyBST.h (case sensitive).
- A client program that includes LazyBST.h should compile correctly without including any other header files.
- Your LazyBST class must have the member functions with the specified signatures indicated below.
- The implementation of your member functions and any supporting functions must be placed in a single file named LazyBST.cpp.
- No STL classes may be used in this programming project.
In order to implement LazyBST efficiently, your data structure must be able to determine the size and height of a subtree in constant time. You must have data members for the height and size of a subtree in the class representing the root of a subtree of a Lazy BST. The height and size data members must be updated whenever the height or size of that subtree changes. The update must not affect the asymptotic running time of insert, delete and search. These must still run in time proportional to the height of the tree.
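One possible way to keep the height and size data members current is to recompute them from the children each time a recursive call returns. The sketch below assumes a hypothetical Node struct and helper names (heightOf, sizeOf, update); treating an empty subtree as having height -1 is a convenience chosen so that a leaf comes out with height 0, as the project requires.

```cpp
#include <cassert>

// Hypothetical node type; a new node starts out as a leaf.
struct Node {
    int key;
    int height;
    int size;
    Node* left;
    Node* right;
    explicit Node(int k)
        : key(k), height(0), size(1), left(nullptr), right(nullptr) {}
};

// Height and size of a possibly empty subtree.
int heightOf(const Node* t) { return t ? t->height : -1; }
int sizeOf(const Node* t) { return t ? t->size : 0; }

// Recompute the cached fields of t from its children in constant time.
// Call this on each node along the search path after a child changes.
void update(Node* t) {
    assert(t != nullptr);
    int hl = heightOf(t->left);
    int hr = heightOf(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
    t->size = 1 + sizeOf(t->left) + sizeOf(t->right);
}
```

Since update() does constant work and is called once per node on the search path, insert, delete and search still run in time proportional to the height of the tree.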
To keep things simple for this project, we will just store int values in LazyBST. However, well-written code should allow you to easily change the type of data stored in the data structure.
Here are the member functions you must implement in your LazyBST class. (You will need to implement others for your own coding needs.)
-
A default constructor with the signature
LazyBST::LazyBST() ;
The default constructor must create a LazyBST object that is ready to have its member functions invoked without any further processing.
-
A copy constructor with the signature
LazyBST::LazyBST(const LazyBST& other) ;
The copy constructor must make a deep copy and create a new object that has its own allocated memory.
-
A destructor with the signature
LazyBST::~LazyBST() ;
The destructor must completely free all memory allocated for the object. (Use valgrind on GL to check for memory leaks.)
-
An overloaded assignment operator with the signature:
const LazyBST& LazyBST::operator=(const LazyBST& rhs) ;
The assignment operator must deallocate memory used by the host object and then make a deep copy of rhs.
-
An insert() function that adds an item to LazyBST that has the following signature:
void LazyBST::insert (int key) ;
The insert() function must run in time proportional to the height of the Lazy BST (not counting time for rebalancing). Your LazyBST implementation must not allow duplicates. If the insert() function is invoked with a key value that is already stored in the Lazy BST, your insert() function should do nothing, except that it may rebalance the tree if an imbalance is detected.
-
A remove() member function that finds and removes an item with the given key value. The remove() function should return a boolean value that indicates whether the key was found. Your remove() function should not abort or throw an exception when the key is not stored in the BST. The remove() member function must have the following signature:
bool LazyBST::remove(int key) ;
For full credit, your remove() method must run in time proportional to the height of the tree (not counting time for rebalancing).
-
A find() function that reports whether the given key is stored in the tree. The signature of the find() method should be:
bool LazyBST::find(int key) ;
For full credit, your find() method must run in time proportional to the height of the tree.
-
A member function rebalance() that rebalances a subtree of the Lazy BST as described above. The running time of rebalance() must be proportional to the number of nodes in the subtree being rebalanced. Note that a proper implementation requires you to keep track of the size and height of the subtree. Read the description above.
-
A member function inorder() that performs an inorder walk of the LazyBST and at each node, prints out the key followed by a : followed by the height of the node followed by another : followed by the size of the subtree rooted at that node. Furthermore, inorder() should print an open parenthesis before visiting the left subtree and a close parenthesis after visiting the right subtree. Nothing should be printed when inorder() is called on an empty tree, not even parentheses. This function will be used for grading, so make sure that it works correctly. The function must have the following signature:
void LazyBST::inorder() ;
For example, calling inorder() on the following BST should produce the string:
(((((3:0:1)7:2:4((9:0:1)11:1:2))14:3:8((15:1:2(17:0:1))20:2:3))22:4:13(((24:0:1)26:1:2)30:2:4(37:0:1)))41:5:22((((50:0:1)54:1:3(59:0:1))60:2:4)64:3:8((71:1:2(75:0:1))79:2:3)))
Fig. 1: an unbalanced binary search tree.
Here, the 41:5:22 indicates that the node with key 41 has height 5 and that there are 22 nodes in the tree. The output before 41:5:22 is produced by visiting the left subtree. Everything after 41:5:22 is produced by visiting the right subtree.
-
A function locate() that returns whether there is a node in a
position of the LazyBST and stores the key in the reference parameter.
The position is given by a constant C string, where a character
'L' indicates left and a character 'R' indicates
right. The locate() function must have the signature
bool LazyBST::locate(const char *position, int& key) ;
For example, in the BST above:
- A call to locate("LRL",key) should return true and store 26 in key.
- A call to locate("RRLR",key) should return true and store 75 in key.
- A call to locate("RLR",key) should return false and not make any changes to key since there is not a node in that position. Note: locate() must not abort and must not throw an exception in this situation.
- A call to locate("",key) should return true and store 41 in key, since the empty string indicates the root of tree.
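The inorder() and locate() requirements above both lend themselves to short recursive helpers. The sketch below is one possible shape, not the required one: the Node struct is hypothetical, the helpers are free functions rather than LazyBST members, inorder takes an output stream parameter (the required member function takes none and would presumably pass the standard output), and returning false for characters other than 'L' and 'R' is an assumption, since the project does not say what to do with them.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>   // only needed by callers that capture the output in a string

// Hypothetical node type with cached height and size.
struct Node {
    int key;
    int height;
    int size;
    Node* left;
    Node* right;
};

// Print "(" left-subtree key:height:size right-subtree ")" for each node.
// Prints nothing at all for an empty subtree.
void inorder(const Node* t, std::ostream& out) {
    if (t == nullptr) return;
    out << '(';
    inorder(t->left, out);
    out << t->key << ':' << t->height << ':' << t->size;
    inorder(t->right, out);
    out << ')';
}

// Follow the 'L'/'R' characters of position down from t. On success,
// store the key found there and return true. On failure, leave key alone.
bool locate(const Node* t, const char* position, int& key) {
    assert(position != nullptr);
    if (t == nullptr) return false;                // fell off the tree
    if (*position == '\0') {                       // end of path: found it
        key = t->key;
        return true;
    }
    if (*position == 'L') return locate(t->left, position + 1, key);
    if (*position == 'R') return locate(t->right, position + 1, key);
    return false;   // assumption: any other character is an invalid position
}
```

Applied to the subtree rooted at 30 in Fig. 1, inorder produces (((24:0:1)26:1:2)30:2:4(37:0:1)), which is the same fragment that appears inside the full string shown earlier.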
Your code must run without segmentation fault and without memory leaks. For grading purposes, memory leaks are considered as bad as segmentation faults. This is because many segmentation faults are caused by poorly written destructors. A program with an empty destructor might avoid some segmentation faults but will leak memory horribly. Thus, not implementing a destructor or not deleting unused memory will incur a penalty that is equivalent to a segmentation fault.
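A common way to meet the deep-copy and no-leak requirements is to write two small recursive helpers and build the copy constructor, destructor and assignment operator on top of them. The sketch below uses a hypothetical stand-in class named Tree rather than LazyBST, omits the height and size fields, and exposes m_root publicly only to keep the example short. The self-assignment guard is an addition not mentioned in the project description, but without it an assignment such as t = t would destroy the tree before copying it.

```cpp
#include <cassert>

// Hypothetical node type, for illustration only.
struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Return a deep copy of the subtree rooted at t.
Node* clone(const Node* t) {
    if (t == nullptr) return nullptr;
    Node* copy = new Node(t->key);
    copy->left = clone(t->left);
    copy->right = clone(t->right);
    return copy;
}

// Free every node of the subtree rooted at t (children first).
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

// Hypothetical stand-in for LazyBST showing only the memory management.
class Tree {
public:
    Tree() : m_root(nullptr) {}
    Tree(const Tree& other) : m_root(clone(other.m_root)) {}
    ~Tree() { destroy(m_root); }

    const Tree& operator=(const Tree& rhs) {
        if (this != &rhs) {              // guard against self-assignment
            destroy(m_root);             // release the host's old nodes
            m_root = clone(rhs.m_root);  // then take a deep copy of rhs
        }
        return *this;
    }

    Node* m_root;   // public only to keep the sketch short
};
```

Running a driver that copies and assigns trees under valgrind is a quick way to confirm that every node allocated by clone() is eventually released by destroy().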
Test Programs
Here are sample driver programs to test your implementation. Passing these tests does not mean you will receive 100% on your project, and it does not guarantee that you will pass the tests used in grading. You should make additional tests of your own!
Note: your output may differ from the sample output provided because you may have correctly implemented remove() and rebalance() differently.
-
Simple test of insertion.
Should see rebalance when inserting 33.
Driver program: test1.cpp and Sample output: test1.txt
-
Simple test that also removes nodes.
Should see rebalancing during remove.
Driver program: test2.cpp and Sample output: test2.txt
-
Simple test of inserting and removing.
This test includes inserting duplicates and attempts to remove keys not in the tree.
Driver program: test3.cpp and Sample output: test3.txt
-
Checking return values from remove and find.
Driver program: test4.cpp and Sample output: test4.txt
-
Tests copy constructor, destructor and assignment operator.
Should test this with valgrind.
Driver program: test5.cpp and Sample output: test5.txt
-
Simple test of locate() function.
Driver program: test6.cpp and Sample output: test6.txt
-
Big test with recursive sanityCheck() and lots of data.
Driver program: test7.cpp and Sample output: test7.txt
Implementation Notes
Here we list some recommendations and point out some traps and pitfalls.
-
Remember that we are defining the height of a leaf node to be 0. (The leaf node here is a node that contains actual data, not the null pointers at the bottom of a BST.)
-
There are many places where the height and size of a node need to be updated, including, for example, in the rebalance procedure.
-
When you insert a key that is already in the binary search tree, you are supposed to do nothing. (This is one of the standard alternatives.) This means you have to be careful about how you update the sizes of the subtrees, because when you insert a duplicate, the size does not change! (and you won't find out that it is a duplicate until you've found its 'clone').
-
When should we check if we need to rebalance? One time to consider is after we modify the Lazy BST in the insert() and remove() procedures. However, we want to do the rebalancing as high up the tree as possible. (See note above.) So, checking for rebalancing after insert() and remove() would require another traversal of the Lazy BST from the root.
Instead, it is much more convenient to check for rebalancing before we insert or remove an item (since we are traversing the BST top down). This may seem counter-intuitive since insert() and remove() will mess up our nicely balanced BST right after we cleaned it up. However, even if we check for rebalancing after these operations, the next insert() or remove() will mess up the tree anyway.
Another temptation is to insert() or remove() the item during the rebalance() procedure. (Hey, we are taking this subtree apart anyway, surely we can toss in or remove an item while we are at it.) This is possible, but not elegant. Let's concentrate on elegant uses of recursion in this project.
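The notes above (check for rebalancing on the way down, and do not change any sizes when the key turns out to be a duplicate) can be combined in one recursive insert. The following is a sketch under several assumptions, not a reference solution: the Node struct and all function names are hypothetical, "twice as large" is read as "at least twice as many nodes", the array holds node pointers so the existing nodes are relinked instead of reallocated, and the lower middle element is chosen as the root of an even-sized range.

```cpp
#include <cassert>

// Hypothetical node type; a new node starts out as a leaf.
struct Node {
    int key, height, size;
    Node *left, *right;
    explicit Node(int k)
        : key(k), height(0), size(1), left(nullptr), right(nullptr) {}
};

int heightOf(const Node* t) { return t ? t->height : -1; }
int sizeOf(const Node* t) { return t ? t->size : 0; }

// Recompute the cached height and size of t from its children.
void update(Node* t) {
    int hl = heightOf(t->left), hr = heightOf(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
    t->size = 1 + sizeOf(t->left) + sizeOf(t->right);
}

// Assumed reading of the rule: height > 3 and one side at least twice the other.
bool needsRebalance(const Node* t) {
    if (t == nullptr || t->height <= 3) return false;
    int l = sizeOf(t->left), r = sizeOf(t->right);
    return l >= 2 * r || r >= 2 * l;
}

// Store the nodes of the subtree in arr in sorted order; return next free index.
int flatten(Node* t, Node** arr, int pos) {
    if (t == nullptr) return pos;
    pos = flatten(t->left, arr, pos);
    arr[pos++] = t;
    return flatten(t->right, arr, pos);
}

// Relink arr[lo..hi] into a balanced subtree, fixing height and size bottom-up.
Node* build(Node** arr, int lo, int hi) {
    if (lo > hi) return nullptr;
    int mid = lo + (hi - lo) / 2;
    Node* root = arr[mid];
    root->left = build(arr, lo, mid - 1);
    root->right = build(arr, mid + 1, hi);
    update(root);
    return root;
}

// Rebalance the subtree rooted at t in time proportional to its size.
Node* rebalance(Node* t) {
    int n = sizeOf(t);
    Node** arr = new Node*[n];
    int count = flatten(t, arr, 0);
    assert(count == n);             // the cached size must match reality
    Node* root = build(arr, 0, n - 1);
    delete [] arr;
    return root;
}

// Insert key into the subtree t (a reference, so the parent's link is updated).
// Returns true if a node was added and false if key was a duplicate.
bool insert(Node*& t, int key) {
    if (t == nullptr) { t = new Node(key); return true; }
    if (needsRebalance(t)) t = rebalance(t);   // check before descending
    bool added;
    if (key < t->key) added = insert(t->left, key);
    else if (key > t->key) added = insert(t->right, key);
    else return false;                          // duplicate: change nothing
    if (added) update(t);                       // fix height/size on the way up
    return added;
}
```

Inserting the keys 1 through 6 in increasing order with this sketch triggers one rebalance at the root, when 6 arrives and the existing chain has height 4, after which the root holds 3. A remove() written in the same style would perform the same check before descending.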
What to Submit
You must submit the following files to the proj3 directory.
- LazyBST.h
- LazyBST.cpp
- Driver.cpp
The Driver.cpp program should include tests showing the parts of your project that work correctly.
If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command.