Project 2: RMQ List

Due: Tuesday, March 3, before 9:00 pm

Addenda

Added a section on testing your project. These are tests you should consider including in your mytest.cpp program.
You have to pass two keys to the query() function, but both RMQ methods need the indices of the keys in the linked list. The best way to handle this is to write the keys to an array when you are building the table for either the dynamic programming or block decomposition method; then use binary search to find the index of a key in the array. There is a binary search implementation in the sample driver program. This will add $O(\log n)$ to the query time, but that's not a problem.
Your implementation should work with any template types (K and V) for which the comparison operators and the insertion operator are defined. For example, I wrote a Donut class with comparison operators that work by comparing donut types and an insertion operator that just prints the type and description. I then created an RMQList<string,Donut>, inserted some (string, Donut) pairs, and performed a query with no problems.
When debugging, it is important to use small examples for which you can determine the solution by hand. Then you can compare the output and tables from your program with the correct answer.

Objectives

Implement a list container using a linked data structure (linked list).
Implement an efficient Range Minimum Query (RMQ) function for the list container.

Introduction

The Range Minimum Query problem (RMQ) is very simple to state. Let $A$ be a numeric array of size $n$. We wish to make numerous queries of the following type: given a pair of indices $(i, j)$, $0 \leq i \leq j < n$, return the minimum $$ \mbox{min}(i,j) = \min_{i\leq k \leq j} A[k]. $$

If we were only making a small number of queries, we would just compute the answers directly:

min(i, j):
    minValue = A[i]
    for k = i+1 to j
        minValue = MIN( minValue, A[k] )   // two-argument MIN function
    return minValue

Here $\mbox{MIN}(x,y)$ is the two-argument minimum function that returns the smaller of its two arguments. $\mbox{MIN}(x,y)$ runs in constant time.

The problem with this simple solution is that each query has $O(n)$ running time. In fact, making some assumptions about what we mean by a “random” query, we can show that the average interval length is approximately $n/3$ and so the average case running time is $\Theta(n)$.

We could speed up the query by precomputing the answers. That is, we could compute $\mbox{min}(i, j)$ for all valid $(i, j)$ pairs and save them in a matrix. Then a query would consist of looking up the appropriate value in the matrix and returning it, which has $\Theta(1)$ running time. However, there are $n(n+1)/2$ possible queries (distinct $(i, j)$ pairs), and computing each one directly costs $O(n)$, so the precomputation would have a running time of $$ \frac{n(n+1)}{2} \cdot O(n) = O(n^3). $$ It is not too difficult to show that this method is actually $\Theta(n^3).$

Variant 1: Dynamic Programming

Dynamic Programming is an algorithmic design technique that is taught in Algorithms. For this problem, all you need to know is that a simple mathematical observation allows us to precompute the $\mbox{min}(i,j)$ values more quickly. Suppose I have already computed $\mbox{min}(i,j)$ and want to calculate $\mbox{min}(i,j+1)$ next; I can do this with a simple $O(1)$ update: $$ \mbox{min}(i, j+1) = \mbox{MIN}\left(\mbox{min}(i, j), A[j+1]\right) $$ where, as above, $\mbox{MIN}(x, y)$ is the two-argument minimum function.

We can use this to precompute all possible queries. Start with $\mbox{min}(0, 0)$, which is just $A[0]$. Calculate $\mbox{min}(0, 1)$ by applying the update procedure. Apply the update procedure to $\mbox{min}(0, 1)$ to calculate $\mbox{min}(0, 2)$. Repeat until you've calculated $\mbox{min}(0, n-1)$. Now start with $\mbox{min}(1, 1) = A[1]$ and apply updates to compute $\mbox{min}(1,2)$, etc., up to $\mbox{min}(1, n-1)$. Repeat this procedure until you've precomputed all the values. Be sure to save the precomputed values in an appropriate data structure, such as a two-dimensional array.

Using this procedure, we compute $\mbox{min}(i, j)$ for all $\frac{n(n+1)}{2}$ $(i,j)$ pairs. The computation for any one pair is $\Theta(1)$, so the time to complete the precomputation is $\Theta(n^2)$. Compare this to precomputation with the simple version, which had $\Theta(n^3)$ running time.

Once the precomputation is done, a query is just a lookup into the matrix in which we’ve stored the precomputed values, a $\Theta(1)$ operation. Using Dynamic Programming (clever updating), we have reduced the precomputation time to $\Theta(n^2)$ and retained a $\Theta(1)$ query time.

Variant 2: Block Decomposition

The simplest way to describe block decomposition is to look at an example. The table below contains a 15-element array in the middle row; the top row contains the indices into the array. In the bottom row, we have computed the minimum of each three-element block from the data array. That is, the first element of the bottom row, 16, is the minimum of 34, 16, and 58, etc.

Index	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14
A[k]	34	16	58	-24	53	7	97	92	-12	45	9	0	-1	20	77
Block min	16			-24			-12			0			-1

How would we evaluate a query with this data structure? If the $(i, j)$ pair corresponds to one of the blocks, the query is a simple lookup. Using the example, $\mbox{min}(6,8)$ would simply return -12, the precomputed minimum for the third block.

If the query corresponds to multiple full blocks, this is only slightly harder. For $\mbox{min}(3,8)$, we would need to return the minimum of the second and third blocks, which is -24.

Unfortunately, many queries will not align with the block structure. For those, we need to do more work. Let $b_0, b_1, \ldots, b_4$ denote the five block minimums. Suppose we wish to query $\mbox{min}(2,10)$. The answer will involve two block minimums $(b_1, b_2)$ and data from two partial blocks (the first and fourth). For this example, we would compute the minimum of the elements $A[2], A[3], \ldots, A[10]$ by computing the minimum of the values $\{A[2], b_1, b_2, A[9], A[10]\}$, which for the example is the minimum of $\{58, -24, -12, 45, 9\}$, or -24.

How efficient is the block approach? Precomputing the block minimums can be done in $\Theta(n)$ time using a single pass through the data array. The running time of the query method depends on the chosen block size. It turns out that the optimal block size is approximately $\sqrt{n}$, resulting in an $O(\sqrt{n})$ query running time.
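A rough count shows where the $\sqrt{n}$ comes from, assuming a query with block size $b$ scans at most $b-1$ elements in the partial block at each end and looks at no more than $\lceil n/b \rceil$ block minimums:

```latex
T(b) \;\le\; \underbrace{2(b-1)}_{\text{partial blocks}} + \underbrace{\lceil n/b \rceil}_{\text{block minimums}}
\;\approx\; 2b + \frac{n}{b},
\qquad
\frac{d}{db}\left(2b + \frac{n}{b}\right) = 2 - \frac{n}{b^{2}} = 0
\;\Longrightarrow\; b^{*} = \sqrt{n/2}.
```

Any block size that is $\Theta(\sqrt{n})$ gives the same order; with $b \approx \sqrt{n}$ the bound is about $2\sqrt{n} + \sqrt{n} = 3\sqrt{n} = O(\sqrt{n})$.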

Summary

We’ve discussed four methods for computing Range Minimum Queries, each with different precomputation and query running times:

No precomputation. Write a function that computes each query directly. The query running time is $O(n)$, but there is no precomputation. If we have to make a lot of queries, this becomes inefficient.
Simple precomputation. Use the function from the no-precomputation method to precompute all possible queries and save them in a two-dimensional array. The time to build the array is $O(n^3)$, but the query time is $O(1)$.
Dynamic Programming (Variant 1). By thinking a little bit about how the precomputation is done, we reduce the precomputation time to $O(n^2)$ and retain the $O(1)$ query running time.
Block Decomposition (Variant 2). By breaking the problem into a number of smaller problems (blocks), we reduce the precomputation to $O(n)$, but the query running time increases to $O(\sqrt{n}).$

There are more complex methods that give different combinations of running times for the precomputation and queries. One approach, which utilizes a type of binary tree called a Cartesian Tree, gives $O(n)$ time for precomputation and $O(1)$ queries.

Assignment

Your assignment is to implement a linked list data structure with head and tail pointers along with support for efficient Range Minimum Queries (RMQ). The list will store key/value pairs where the types for both the key and the value are template parameters. The list must store the elements in ascending key order and must not allow duplicate keys. For the RMQ, you must use either Dynamic Programming or Block Decomposition.

For this project, you are provided with a skeleton .h file and a sample driver:

rmqlist.h — skeleton .h file for the templated RMQList classes.
driver.cpp — a sample driver program.
driver.txt — sample output from driver.cpp. The version of RMQList used to generate the output uses Block Decomposition for RMQ. Your output may not be identical, depending on how you set up the data structures.

These and all the provided project files are available on GL:

/afs/umbc.edu/users/c/m/cmarron/pub/www/cs341.s20/projects/proj2files/

The file rmqlist.h contains the function declarations for two classes:

Node — defines the nodes used in the linked list. The Node class is completely implemented and must not be modified.
RMQList — implements the linked list supporting Range Minimum Queries. You will be completing this class.

The private class variables for the linked list have been defined and should not be changed; however, you may add private variables to support efficient RMQ. Whichever RMQ variant you choose, you will need some array-type variables. You may only use C/C++ arrays; STL containers may not be used.

Additionally, you are responsible for thoroughly testing your program. Your test program, mytest.cpp, must be submitted along with rmqlist.h. For grading purposes, your RMQList implementation will be tested on input arrays of varying sizes, including very large arrays. Your submission will also be checked for memory leaks and memory errors.

Finally, you should strive to make your RMQ implementation as efficient as possible. Both RMQ methods require you to precompute tables that are used by the query function. Avoid recomputing the tables unnecessarily. Inserting or removing data will invalidate the precomputed tables for either method; are there cases in which it is possible to update the tables without completely recomputing them?

Specifications

For the RMQList class, you must declare the class variables necessary to support RMQ and implement the following methods:

Method	Description
`RMQList()`	Constructor. Creates an empty `RMQList` object.
`~RMQList()`	Destructor
`RMQList(const RMQList<K,V> &rhs)`	Copy Constructor
`const RMQList<K,V>& operator=(const RMQList<K,V> &rhs)`	Assignment Operator
`int size() const`	Returns the number of elements in the list. This function is already implemented.
`bool empty() const`	Returns `true` if the list is empty, `false` otherwise.
`bool insert(const K& key, const V& value)`	Inserts an element into the list. The list must be kept in increasing order by key, and duplicate keys are not allowed. The function returns `false` if there is already an entry with the given key value and returns `true` otherwise. If the specified key is larger than all keys in the list, then the function should append the new data to the end of the list, avoiding unnecessary iteration.
`bool remove(const K& key)`	Remove the element with specified key value from the list. Returns `false` if there is no element with the specified `key`; returns `true` otherwise.
`bool update(const K& key, const V& value)`	Update the value for the element with the given key value. Returns `false` if there is no element with the given key; returns `true` otherwise.
`V query(const K& k1, const K& k2)`	Returns the minimum value between `k1` and `k2` in the list data (including the values associated with `k1` and `k2`). Must be implemented using Dynamic Programming or Block Decomposition, and queries must be efficient for large lists (tens of thousands of entries). Throws an exception if the list is empty or if an invalid key is passed (see Requirements, below).
`void dumpList() const`	Dump the contents of the linked list.
`void dumpTable() const`	Dump the RMQ tables. The output will depend on which RMQ method you choose to implement. For the block decomposition approach, the function should print the number of blocks, the block size, and the block minimums. For dynamic programming, the function should print the entire array of precomputed minimums.
`void clear()`	Reset the `RMQList` object to its initial, empty state.

Requirement: Since the RMQList class is templated, the entire implementation must be in rmqlist.h.

Requirement: The following modifications may be made to rmqlist.h: addition of private class variables to support RMQ; addition of private helper function declarations; addition of function implementations. No other changes are permitted. In particular, you must not change the declaration of the required public functions.

Requirement: The class must correctly implement either the Dynamic Programming or Block Decomposition solution for RMQ. Your code will be tested for correct running times, and the TAs will visually inspect your implementation.

Requirement: If you use the Block Decomposition method for RMQ, the block size must be $\lfloor\sqrt{n}\rfloor$. This can be computed in C++ with (int) sqrt((float) n).

Requirement: The query function must throw exceptions as follows:

Throws range_error if query is called for an empty list.
Throws invalid_argument if either key value (k1 or k2) is not valid (not in the data).

Requirement: No STL containers or additional libraries may be used.

Requirement: Your code should not have any memory leaks or memory errors.

Requirement: Follow all coding standards as described in the C++ Coding Standards. In particular, class variable names must begin with “m_” or “_”.

Testing

Following is a non-exhaustive list of tests to perform on your implementation.

Basic tests of the linked list:

Create an RMQList. Perform various combinations of insertions, updates, and removals; check that the list remains correct and that elements are ordered by key value.
Check edge cases such as updating or removing the first or last entry.
Check that utility functions like size() and empty() return the correct values; check that they are both right for an empty list; check that they are correct through a sequence of insertions, updates, and removals.
Check that the clear() function removes all data, leaving an empty RMQList.
Do the usual tests for the copy constructor and assignment operator: check that they make a copy of the right-hand side, that it is a deep copy, and that they work correctly when the right-hand side is empty.
Check that insert(), update(), and remove() return the correct boolean value.
Check that your list works with different template values (not just RMQList<int, int>).

Tests of the query() function:

Create an RMQList; insert a bunch of data; perform queries and compare the results to a “brute force” computation of the minimum. See the sample driver for an example.
Create an RMQList; perform some queries; do some insertions, updates, or removals; do some more queries and check that they are correct.
Test that query() works correctly on an object created with the copy constructor or assignment operator.
Check that your RMQ implementation is not too slow. Using the dynamic programming approach, you should be able to build the tables for a 10,000-element list and perform 1,000 random queries in a few seconds on GL. Using block decomposition, you should be able to build the table for a 1,000,000-element list and perform 1,000 random queries in a few seconds on GL.

Tests of exceptions:

Check that query() throws invalid_argument if either key is invalid.
Check that query() throws range_error if it is called on an empty list.

Memory leaks and errors:

Run your test program in valgrind; check that there are no memory leaks or errors.
Note: If valgrind finds memory errors, compile your code with the -g option to enable debugging support and then re-run valgrind with the -s and --track-origins=yes options. valgrind will show you the line numbers where the errors are detected and can usually tell you which line caused the error.

What to Submit

You must submit the following files to the proj2 directory.

rmqlist.h
mytest.cpp

If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command:

cp rmqlist.h mytest.cpp ~/cs341proj/proj2/

CMSC 341 Data Structures — Projects & Support — Spring 2020