Project 2: RMQ List

Due: Tuesday, March 3, before 9:00 pm


Addenda


Objectives


Introduction

The Range Minimum Query problem (RMQ) is very simple to state. Let \(A\) be a numeric array of size \(n\). We wish to make numerous queries of the following type: given a pair of indices \((i, j)\), \(0 \leq i \leq j < n\), return the minimum $$ \mbox{min}(i,j) = \min_{i\leq k \leq j} A[k]. $$

If we were only making a small number of queries, we would just compute the answers directly:

min(i, j):
    minValue = A[i]
    for k = i+1 to j
        minValue = MIN( minValue, A[k] )   // two-argument MIN function
    return minValue

Here \(\mbox{MIN}(x,y)\) is the two-argument minimum function that returns the smaller of its two arguments. \(\mbox{MIN}(x,y)\) runs in constant time.
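The pseudocode above can be sketched in C++ as follows. This is a minimal illustration on a plain int array, not part of the provided project files; the function name naiveMin is hypothetical, and std::min plays the role of the two-argument MIN.

```cpp
#include <algorithm>  // std::min

// Direct range-minimum query: scans A[i..j] (inclusive) on every call.
// Assumes 0 <= i <= j < n, where n is the length of A.
// naiveMin is a hypothetical name used only for this sketch.
int naiveMin(const int A[], int i, int j) {
    int minValue = A[i];
    for (int k = i + 1; k <= j; k++) {
        minValue = std::min(minValue, A[k]);  // two-argument MIN
    }
    return minValue;
}
```

The loop body runs j - i times, so a single call costs time proportional to the length of the queried interval.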

The problem with this simple solution is that each query has \(O(n)\) running time. In fact, making some assumptions about what we mean by a “random” query, we can show that the average interval length is approximately \(n/3\) and so the average case running time is \(\Theta(n)\).
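One way to make the \(n/3\) figure concrete is to assume that a "random" query picks each of the \(n(n+1)/2\) valid \((i, j)\) pairs with equal probability. This is our assumption for illustration; other reasonable definitions give slightly different constants. Exactly \(n - d + 1\) pairs have interval length \(d = j - i + 1\), so the average length is $$ \frac{2}{n(n+1)} \sum_{d=1}^{n} d\,(n - d + 1) = \frac{2}{n(n+1)} \cdot \frac{n(n+1)(n+2)}{6} = \frac{n+2}{3} \approx \frac{n}{3}. $$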

We could speed up the query by precomputing the answers. That is, we could compute \(\mbox{min}(i, j)\) for all valid \((i, j)\) pairs and save them in a matrix. Then a query would consist of looking up the appropriate value in the matrix and returning it, which has \(\Theta(1)\) running time. However, there are \(n(n+1)/2\) possible queries, meaning distinct \((i, j)\) pairs, so the precomputation would have a running time of $$ \frac{n(n+1)}{2} \cdot O(n) = O(n^3). $$ It is not too difficult to show that this method is actually \(\Theta(n^3).\)

Variant 1: Dynamic Programming

Dynamic Programming is an algorithmic design technique that is taught in Algorithms. For this problem, all you need to know is that there is a simple mathematical observation that allows us to precompute the \(\mbox{min}(i,j)\) values more quickly. Suppose we have already computed \(\mbox{min}(i,j)\) and want to calculate \(\mbox{min}(i,j+1)\) next; we can do this with a simple \(O(1)\) update: $$ \mbox{min}(i, j+1) = \mbox{MIN}\left(\mbox{min}(i, j), A[j+1]\right) $$ where, as above, \(\mbox{MIN}(x, y)\) is the two-argument minimum function.

We can use this to precompute all possible queries. Start with \(\mbox{min}(0, 0)\), which is just \(A[0]\). Calculate \(\mbox{min}(0, 1)\) by applying the update procedure. Apply the update procedure to \(\mbox{min}(0, 1)\) to calculate \(\mbox{min}(0, 2)\). Repeat until you've calculated \(\mbox{min}(0, n-1)\). Now start with \(\mbox{min}(1, 1) = A[1]\) and apply updates to compute \(\mbox{min}(1,2)\), etc., up to \(\mbox{min}(1, n-1)\). Repeat this procedure for each remaining starting index until you've precomputed all the values. Be sure to save the precomputed values in an appropriate data structure, such as a two-dimensional array.

Using this procedure, we compute \(\mbox{min}(i, j)\) for all \(\frac{n(n+1)}{2}\) \((i,j)\) pairs. The computation for any one pair is \(\Theta(1)\), so the time to complete the precomputation is \(\Theta(n^2)\). Compare this to precomputation with the simple version, which had \(\Theta(n^3)\) running time.

Once the precomputation is done, a query is just a lookup into the matrix in which we’ve stored the precomputed values, a \(\Theta(1)\) operation. Using Dynamic Programming (clever updating), we have reduced the precomputation time to \(\Theta(n^2)\) and retained a \(\Theta(1)\) query time.
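The row-by-row precomputation described above might look like the following sketch. It works on a plain int array rather than the project's linked list. The names buildTable and freeTable are hypothetical, and the full n-by-n layout (with the entries for j < i unused) is only one possible storage choice, assumed here for simplicity.

```cpp
#include <algorithm>  // std::min

// Builds an n-by-n table where table[i][j] holds min(i, j) for i <= j.
// Entries with j < i are unused and left at 0.
// The caller must release the table with freeTable().
int** buildTable(const int A[], int n) {
    int** table = new int*[n];
    for (int i = 0; i < n; i++) {
        table[i] = new int[n]();   // value-initialized to 0
        table[i][i] = A[i];        // min(i, i) = A[i]
        for (int j = i + 1; j < n; j++) {
            // O(1) update: min(i, j) = MIN(min(i, j-1), A[j])
            table[i][j] = std::min(table[i][j - 1], A[j]);
        }
    }
    return table;
}

// Frees a table created by buildTable().
void freeTable(int** table, int n) {
    for (int i = 0; i < n; i++) {
        delete[] table[i];
    }
    delete[] table;
}
```

After the table is built, a query for \((i, j)\) is just a read of table[i][j]. A triangular layout would roughly halve the memory use at the cost of slightly more index arithmetic.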

Variant 2: Block Decomposition

The simplest way to describe block decomposition is to look at an example. The table below contains an array of length 15 in the middle row; the top row contains the indices into the array. In the bottom row, we have computed the minimum of each three-element block from the data array. That is, the first element of the bottom row, 16, is the minimum of 34, 16, and 58, etc.

Index        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
Data        34  16  58 -24  53   7  97  92 -12  45   9   0  -1  20  77
Block min       16         -24         -12           0          -1

How would we evaluate a query with this data structure? If the \((i, j)\) pair corresponds to one of the blocks, the query is a simple lookup. Using the example, \(\mbox{min}(6,8)\) would simply return -12, the precomputed minimum for the third block.

If the query corresponds to multiple full blocks, this is only slightly harder. For \(\mbox{min}(3,8)\), we would need to return the minimum of the second and third blocks, which is -24.

Unfortunately, many queries will not align with the block structure. For those, we need to do more work. Let \(b_0, b_1, \ldots, b_4\) denote the five block minimums. Suppose we wish to query \(\mbox{min}(2,10)\). The answer will involve two block minimums (\(b_1, b_2\)) and data from two partial blocks (the first and fourth). For this example, we would compute the minimum of the elements \(A[2], A[3], \ldots, A[10]\) by computing the minimum of the values \(\{A[2], b_1, b_2, A[9], A[10]\}\) which, for the example, would be the minimum of \(\{58, -24, -12, 45, 9\}\), or -24.
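The three query cases above (one block, several full blocks, and partial blocks at the ends) can be handled by a single loop. The following is a minimal sketch on a plain int array with a caller-supplied block size. The names buildBlockMins and blockQuery are hypothetical and are not part of rmqlist.h.

```cpp
#include <algorithm>  // std::min

// Fills blockMins[b] with the minimum of block b of A. blockMins must have
// room for ceil(n / blockSize) entries; the last block may be short.
void buildBlockMins(const int A[], int n, int blockSize, int blockMins[]) {
    for (int k = 0; k < n; k++) {
        int b = k / blockSize;
        if (k % blockSize == 0) {
            blockMins[b] = A[k];                        // first element of block b
        } else {
            blockMins[b] = std::min(blockMins[b], A[k]);
        }
    }
}

// Returns the minimum of A[i..j] (inclusive), assuming 0 <= i <= j < n.
int blockQuery(const int A[], const int blockMins[], int blockSize,
               int i, int j) {
    int result = A[i];
    int k = i;
    while (k <= j) {
        if (k % blockSize == 0 && k + blockSize - 1 <= j) {
            // k starts a block that lies entirely inside [i, j]:
            // use its precomputed minimum and skip the whole block.
            result = std::min(result, blockMins[k / blockSize]);
            k += blockSize;
        } else {
            // k belongs to a partial block: examine the element itself.
            result = std::min(result, A[k]);
            k++;
        }
    }
    return result;
}
```

With the 15-element example array and a block size of 3, blockQuery for the pair (2, 10) examines A[2], then the block minimums for the second and third blocks, then A[9] and A[10], matching the walk-through above.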

How efficient is the block approach? Precomputing the block minimums can be done in \(\Theta(n)\) time, using a single pass through the data array. The running time of the query method depends on the chosen block size. It turns out that the optimal block size is approximately \(\sqrt{n},\) resulting in an \(O(\sqrt{n})\) query running time.
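A rough sketch of why \(\sqrt{n}\) is the right order of magnitude, writing \(b\) for the block size (a symbol introduced here for illustration): a query combines at most \(\lceil n/b \rceil\) block minimums with at most \(2(b-1)\) elements from the two partial blocks, so its cost is $$ O\left(\frac{n}{b} + b\right). $$ The first term shrinks as \(b\) grows and the second term grows; taking \(b = \sqrt{n}\) makes both terms equal to \(\sqrt{n}\), so the query cost is \(O(\sqrt{n})\).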

Summary

We’ve discussed four methods for computing Range Minimum Queries, each with different precomputation and query running times:

  1. No precomputation. Write a function that computes a query directly from the data. The query running time is \(O(n)\), but there is no precomputation. If we have to make a lot of queries, this becomes inefficient.
  2. Simple precomputation. Use our function from (1) to precompute all possible queries and save them in a two-dimensional array. The time to build the array is \(O(n^3)\), but the query time is \(O(1).\)
  3. Dynamic Programming (Variant 1). By thinking a little bit about how the precomputation is done, we reduce the precomputation time to \(O(n^2)\) and retain the \(O(1)\) query running time.
  4. Block Decomposition (Variant 2). By breaking the problem into a number of smaller problems (blocks), we reduce the precomputation to \(O(n)\), but the query running time increases to \(O(\sqrt{n}).\)
There are more complex methods that give different combinations of running times for the precomputation and queries. One approach, which utilizes a type of binary tree called a Cartesian Tree, gives \(O(n)\) time for precomputation and \(O(1)\) queries.


Assignment

Your assignment is to implement a linked list data structure with head and tail pointers along with support for efficient Range Minimum Queries (RMQ). The list will store key/value pairs where the types for both the key and the value are template parameters. The list must store the elements in ascending key order and must not allow duplicate keys. For the RMQ, you must use either Dynamic Programming or Block Decomposition.

For this project, you are provided with a skeleton .h file and a sample driver:

These and all the provided project files are available on GL:

/afs/umbc.edu/users/c/m/cmarron/pub/www/cs341.s20/projects/proj2files/

The file rmqlist.h contains the function declarations for two classes:

  1. Node — defines the nodes used in the linked list. The Node class is completely implemented and must not be modified.
  2. RMQList — implements the linked list supporting Range Minimum Queries. You will be completing this class.

The private class variables for the linked list have been defined and should not be changed; however, you may add private variables to support efficient RMQ. Whichever RMQ variant you choose to use, you will need some array-type variables. You may only use C/C++ arrays; STL containers may not be used.

Additionally, you are responsible for thoroughly testing your program. Your test program, mytest.cpp, must be submitted along with rmqlist.h. For grading purposes, your RMQList implementation will be tested on input arrays of varying sizes, including very large arrays. Your submission will also be checked for memory leaks and memory errors.

Finally, you should strive to make your RMQ implementation as efficient as possible. Both RMQ methods require you to precompute tables that are used by the query function. Avoid recomputing the tables unnecessarily. Inserting or removing data will invalidate the precomputed tables for either method; are there cases in which it is possible to update the tables without completely recomputing them?


Specifications

For the RMQList class, you must declare the class variables necessary to support RMQ and implement the following methods:

RMQList()
    Constructor. Creates an empty RMQList object.

~RMQList()
    Destructor.

RMQList(const RMQList<K,V> &rhs)
    Copy Constructor.

const RMQList<K,V>& operator=(const RMQList<K,V> &rhs)
    Assignment Operator.

int size() const
    Returns the number of elements in the list. This function is already implemented.

bool empty() const
    Returns true if the list is empty, false otherwise.

bool insert(const K& key, const V& value)
    Inserts an element into the list. The list must be kept in increasing order by key, and duplicate keys are not allowed. The function returns false if there is already an entry with the given key value and returns true otherwise. If the specified key is larger than all keys in the list, then the function should append the new data to the end of the list, avoiding unnecessary iteration.

bool remove(const K& key)
    Removes the element with the specified key value from the list. Returns false if there is no element with the specified key; returns true otherwise.

bool update(const K& key, const V& value)
    Updates the value for the element with the given key value. Returns false if there is no element with the given key; returns true otherwise.

V query(const K& k1, const K& k2)
    Returns the minimum value between k1 and k2 in the list data (including the values associated with k1 and k2). Must be implemented using Dynamic Programming or Block Decomposition, and queries must be efficient for large lists (tens of thousands of entries). Throws an exception if the list is empty or if an invalid key is passed (see Requirements, below).

void dumpList() const
    Dumps the contents of the linked list.

void dumpTable() const
    Dumps the entire RMQ tables. The output depends on which RMQ method you choose to implement. For the block decomposition approach, the function should print the number of blocks, the block size, and the block minimums. For dynamic programming, the function should print the entire array of precomputed minimums.

void clear()
    Resets the RMQList object to its initial, empty state.
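The ordering rule for insert() (append directly when the key is larger than every key in the list, otherwise walk to the insertion point, and reject duplicates) can be illustrated with a small stand-alone sketch. SketchNode and SketchList are hypothetical stand-ins written for this example; they are not the Node and RMQList classes from rmqlist.h, and the sketch does no RMQ table maintenance.

```cpp
// Hypothetical stand-ins for illustration only.
template <class K, class V>
struct SketchNode {
    K key;
    V value;
    SketchNode* next;
};

template <class K, class V>
struct SketchList {
    SketchNode<K, V>* head = nullptr;
    SketchNode<K, V>* tail = nullptr;
    int count = 0;

    SketchList() = default;
    SketchList(const SketchList&) = delete;             // copying omitted in this sketch
    SketchList& operator=(const SketchList&) = delete;

    ~SketchList() {
        while (head != nullptr) {
            SketchNode<K, V>* next = head->next;
            delete head;
            head = next;
        }
    }

    bool insert(const K& key, const V& value) {
        // Fast path: empty list, or key larger than the tail key. Append in O(1).
        if (head == nullptr || tail->key < key) {
            SketchNode<K, V>* node = new SketchNode<K, V>{key, value, nullptr};
            if (head == nullptr) head = node; else tail->next = node;
            tail = node;
            count++;
            return true;
        }
        // Otherwise walk to the first node whose key is not less than the new key.
        // The loop stops at the tail at the latest, because tail->key >= key here.
        SketchNode<K, V>* prev = nullptr;
        SketchNode<K, V>* curr = head;
        while (curr->key < key) {
            prev = curr;
            curr = curr->next;
        }
        if (!(key < curr->key)) return false;   // equal keys: duplicate, reject
        SketchNode<K, V>* node = new SketchNode<K, V>{key, value, curr};
        if (prev == nullptr) head = node; else prev->next = node;
        count++;
        return true;
    }
};
```

The sketch compares keys only with operator<, so it would work for any key type that provides that operator.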

Requirement: Since the RMQList class is templated, the entire implementation must be in rmqlist.h.

Requirement: The following modifications may be made to rmqlist.h: addition of private class variables to support RMQ; addition of private helper function declarations; addition of function implementations. No other changes are permitted. In particular, you must not change the declaration of the required public functions.

Requirement: The class must correctly implement either the Dynamic Programming or Block Decomposition solution for RMQ. Your code will be tested for correct running times, and the TAs will visually inspect your implementation.

Requirement: If you use the Block Decomposition method for RMQ, the block size must be \(\lfloor\sqrt{n}\rfloor.\) This can be computed in C++ with (int) sqrt((float) n).
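As a small illustration of the block-size formula, the sketch below wraps it in a helper and also derives a block count. The function names are hypothetical, and the rounding-up rule for the number of blocks (so that a short final block is counted) is our assumption, not something stated in the requirement.

```cpp
#include <cmath>  // std::sqrt

// Block size floor(sqrt(n)), computed as in the requirement above.
int blockSizeFor(int n) {
    return (int) std::sqrt((float) n);
}

// Number of blocks needed to cover n elements, counting a short final block.
// Returns 0 for an empty list to avoid dividing by a zero block size.
int numBlocksFor(int n) {
    int b = blockSizeFor(n);
    if (b == 0) {
        return 0;
    }
    return (n + b - 1) / b;   // integer ceiling of n / b
}
```

For the 15-element example earlier on this page, this gives a block size of 3 and 5 blocks.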

Requirement: The query function must throw exceptions as follows:

Requirement: No STL containers or additional libraries may be used.

Requirement: Your code should not have any memory leaks or memory errors.

Requirement: Follow all coding standards as described in the C++ Coding Standards. In particular, class variable names must begin with “m_” or “_”.


Testing

Following is a non-exhaustive list of tests to perform on your implementation.

Basic tests of the linked list:

Tests of the query() function:

Tests of exceptions:

Memory leaks and errors:


What to Submit

You must submit the following files to the proj2 directory.

If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command:

cp rmqlist.h mytest.cpp ~/cs341proj/proj2/