Project 2, Square Lists
Due: Tuesday, October 10, 8:59:59pm
Links: [Project Submission] [Late Submissions] [Project Grading] [Grading Guidelines] [Academic Conduct]
Change Log
Modified items are in orange.
[Sunday October 8, 12:45pm] Added -include ../../00Proj2/Int341.h to the g++ command. This is an attempt to prevent people who do not follow instructions and put a modified copy of Int341.h in their submission directory from hurting themselves too badly. Their copy of Int341.h will be ignored or they might get a compilation error. This is still not 100% foolproof. It is best not to copy Int341.h to the submission directory and, in any case, do not modify Int341.h. We will delete any copies of Int341.h that you submit prior to grading. So, if your program depends on a modified Int341.h to compile, it will not compile when we grade your submission.
[Wednesday October 4, 10:45am] Changed instructions for compiling and running the project to include logging into the "fedora" machines.
[Thursday Sep 28, 08:30am] Changed inspector() function in p2comptest.cpp to be more lenient in the definition of when an inner list is "too long" or "too short". Also, added a test for the indexOf() function and provided sample output in p2comptest.txt.
[Tuesday Sep 26, 10:30am] Added sections "Testing", "How to Submit" and "A Note on Timing".
[Tuesday Sep 26, 9:30am] If the position given to SqList::add() is equal to the number of items in the SqList then the data should be added at the end of the SqList.
Objectives
The objective of this programming assignment is to have you practice implementing a new data structure and also to gain some experience using the list template container class from the C++ Standard Template Library (STL).
Introduction
For this project, you will implement a square list data structure. A square list is a linked list of linked lists where each linked list is guaranteed to have O(√n) items:
Fig. 1: A typical square list.
Figure 1 shows a typical square list of 25 integer values. The positions of the items are important. The ordering of the items in this example is: 3, 2, 5, 10, 21, 9, 32, 14, 41, 20, 23, ..., 15, 1, 11.
The idea of a square list is that we want to keep the length of the top-level and of all the inner lists bounded by O(√n). That way we can find the ith item of the list in O(√n) time. For example, if we want to find the 9th item in the list, we can progress down the top-level list and check the length of the inner lists. We know the 9th item cannot be in the first inner list, since it only has 3 items. It also cannot be in the second inner list, since the first two lists combined have only 6 items. Instead, we can find the 9th item of the square list by looking for the 3rd item of the third inner list, which turns out to be 41.
To accomplish this search in O(√n) time, we must be able to determine the length of each inner list in O(1) time. In the worst case, we have to search through O(√n) items of the top-level list before we find the inner list that holds the ith item. An additional O(√n) steps will allow us to find the desired item in that inner list, since the length of each inner list is also O(√n).
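The search described above can be sketched as follows. This is a minimal illustration, not the required SqList interface: it assumes a plain std::list<std::list<int>> (instead of Int341 payloads), a 0-based position (so the prose's "9th item" is position 8), and list::size() running in O(1) time. The helper name itemAt is hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <stdexcept>

// Hypothetical helper (not part of the assignment's interface).
// Returns a reference to the item at 0-based position `pos` in a list of lists.
// Assumes std::list::size() is O(1), so skipping an inner list costs O(1).
int& itemAt(std::list<std::list<int>>& outer, std::size_t pos) {
    for (std::list<int>& inner : outer) {
        if (pos < inner.size()) {
            // The item lives in this inner list: walk at most inner.size() steps.
            return *std::next(inner.begin(), pos);
        }
        pos -= inner.size();  // skip the whole inner list without visiting its items
    }
    throw std::out_of_range("itemAt: position is past the end of the square list");
}
```

With the first three inner lists of Figure 1, {3, 2, 5}, {10, 21, 9} and {32, 14, 41, 20}, itemAt(sq, 8) skips two inner lists in O(1) each and then walks two steps into the third, returning 41. Returning a reference rather than a value avoids copying the stored item.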
The main difficulty in maintaining a square list is that as we add items to and remove items from a square list, the length of the inner lists can change. This can happen in two ways. First, obviously, when we add items to or remove items from an inner list, the length of that inner list changes. Second, the length of an inner list relative to √n can also change when we add or remove items elsewhere in the square list, because doing so changes the value of n. For example, the 5th inner list in Figure 1 has 5 items. This happens to be exactly √25. If we removed all 10 items from the first 3 inner lists, that would leave us with only 15 items in the entire square list. After the removals, the 5th inner list is longer than √n even though its length did not change, because √n dropped from √25 = 5 to √15 ≈ 3.87, and 5 > √15.
Square List Maintenance
Our goal is to make sure that the top-level list and all the inner lists have lengths bounded by O(√n). It is too expensive to require that our square list always has √n inner lists, each with √n items. Instead, we maintain the following two conditions:
- Condition 1:
- Every inner list has ≤ 2 √n items.
- Condition 2:
- There are no adjacent short inner lists, where short is defined as having ≤ √n/2 items.
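Both conditions can be tested without floating-point arithmetic by squaring: for a length len ≥ 0, len > 2 √n is equivalent to len² > 4n, and len ≤ √n/2 is equivalent to 4·len² ≤ n. The sketch below uses that idea; the helper names isShort, isLong and satisfiesConditions are hypothetical, and the payload is int rather than Int341.

```cpp
#include <cstddef>
#include <list>

// Hypothetical helper: Condition 1 is violated when len > 2*sqrt(n), i.e. len^2 > 4n.
bool isLong(std::size_t len, std::size_t n) {
    return len * len > 4 * n;
}

// Hypothetical helper: a list is short when len <= sqrt(n)/2, i.e. 4*len^2 <= n.
bool isShort(std::size_t len, std::size_t n) {
    return 4 * len * len <= n;
}

// Hypothetical checker: true if no inner list is long (Condition 1)
// and no two adjacent inner lists are both short (Condition 2).
bool satisfiesConditions(const std::list<std::list<int>>& outer) {
    std::size_t n = 0;
    for (const auto& inner : outer) n += inner.size();

    bool prevShort = false;
    for (const auto& inner : outer) {
        if (isLong(inner.size(), n)) return false;
        bool curShort = isShort(inner.size(), n);
        if (prevShort && curShort) return false;
        prevShort = curShort;
    }
    return true;
}
```

For example, Figure 2's short list has 2 items with n = 22, and 4 · 2² = 16 ≤ 22; Figure 4's long list has 6 items with n = 8, and 6² = 36 > 32.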
Notice that neither condition says anything about the length of the top-level list. Instead, we claim that if Condition 2 holds, then the top-level list cannot have more than 4 √n items. To see this, suppose the contrary. That is, suppose that the top-level list has more than 4 √n items. (Yes, this is the beginning of a proof by contradiction.) Then, there must be more than 2 √n inner lists that are not short (otherwise, two of the short inner lists would be adjacent). Thus, the total number of items in these non-short lists must exceed 2 √n × √n/2 = n. This is a contradiction because n is the number of items (by definition) and cannot exceed itself. Therefore, the number of inner lists (and thus the length of the top-level list) must be bounded by 4 √n.
These observations allow us to maintain the O(√n) bounds on the lengths of the top-level list and the inner lists using the following procedure:
Consolidate:
- Traverse the top-level list.
- Whenever an empty inner list is encountered, remove that inner list.
- Whenever two adjacent short inner lists are encountered, merge them into a single inner list. (See Figures 2 and 3.)
- Whenever an inner list is found to have more than 2 √n items, split it into two lists of equal length. (See Figure 4.)
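A minimal sketch of Consolidate, written as a free function over std::list<std::list<int>> rather than the required consolidate member function of SqList. It uses splice for both merging and splitting, so nodes are relinked rather than copied, and it re-examines the current inner list after each merge or split so that runs of three or more short lists are handled. Treat it as one possible approach under these assumptions, not the reference solution.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Sketch (free function, not the required SqList member) of the Consolidate procedure.
// Only erase/emplace/splice are used, so no inner-list element is copied.
void consolidate(std::list<std::list<int>>& outer) {
    std::size_t n = 0;
    for (const auto& inner : outer) n += inner.size();

    // Exact integer forms of len <= sqrt(n)/2 and len > 2*sqrt(n).
    auto isShort = [n](std::size_t len) { return 4 * len * len <= n; };
    auto isLong  = [n](std::size_t len) { return len * len > 4 * n; };

    auto cur = outer.begin();
    while (cur != outer.end()) {
        if (cur->empty()) {
            cur = outer.erase(cur);  // remove empty inner lists
            continue;
        }
        if (isLong(cur->size())) {
            // Move the back half of *cur into a new inner list placed right after it.
            auto mid = std::next(cur->begin(), cur->size() / 2);
            auto half = outer.emplace(std::next(cur));  // new, empty inner list
            half->splice(half->begin(), *cur, mid, cur->end());
            continue;  // re-check the (now shorter) current list
        }
        auto nxt = std::next(cur);
        if (nxt != outer.end() && isShort(cur->size()) && isShort(nxt->size())) {
            cur->splice(cur->end(), *nxt);  // O(1): append every node of *nxt, in order
            outer.erase(nxt);
            continue;  // the merged list may still be short; compare with the new neighbor
        }
        ++cur;
    }
}
```

Merging with cur->splice(cur->end(), *nxt) is constant time. The split walks to the midpoint and moves a range between two lists, so it takes O(t) time, matching the split cost discussed in the notes on Consolidate.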
Some notes on the Consolidate procedure:
- Our strategy for this project is to run Consolidate after every operation that adds an item to or removes an item from the square list. (This is a simplification. See Addendum for a longer explanation.)
- When two short lists are merged into one, the order of the items in the square list must not change. In Figure 2, the original order of the items in the short lists was 10, 21, 32, 14. In Figure 3, the order of the items in the merged list is the same.
- We need the data structure for the inner list to support a merge operation in constant time. A singly linked list with a tail pointer would work; the STL list class provides this operation through its splice member function.
- In Figure 4, the inner list that is too long has an even number of items. If a long list has an odd number of items, then after the split, one list will have one more item than the other. This does not affect the asymptotic running time.
- When a long list is split, the order of the items must be preserved. (See Figure 4.)
- Without any splits, the total running time for the Consolidate procedure is O(√n), because the top-level list has O(√n) inner lists and each merge or removal of an empty inner list takes constant time.
- The split step can be costly because it takes O(t) time to split an inner list in half, where t is the length of the inner list. We can show using amortized analysis that splits do not happen very often. The proof is not hard but is beyond the scope of this course. The amortized analysis gives an amortized running time of O(√n) for most of the list operations (except indexOf). It shows that any mix of m list operations (not including indexOf) will take a total running time of O(m√n). Thus, the amortized time for each of the m square list operations is O(√n). Although it is tempting to think of the amortized running time as an "average" running time, this is not accurate because the amortized analysis does not depend on the sequence of operations being "nice" or "average" in any way. Even when an adversary chooses the nastiest sequence of operations, which results in the maximum number of splits, the total running time for that sequence of m operations will still be bounded by O(m√n).
Fig. 2: A square list with adjacent short inner lists. Note that 2 < √22/2 ≈ 2.345.
Fig. 3: Adjacent short lists merged.
Fig. 4: A long inner list split into two lists. Note that 6 > 2 √8 ≈ 5.657.
Assignment
Note: Running time is one of the most important considerations in the implementation of a data structure. Programs that produce the desired output but exceed the required running times are considered wrong implementations and will receive substantial deductions during grading.
Your assignment is to implement a square list data structure as described above that holds integer values (more on "integer" later). Both the top-level list and the inner lists should be implemented using the C++ STL list templated container class.
This assignment specifies the interface between the main program and your square list implementation, but you are free to design the class as you wish, subject to some requirements below. In particular, you are not provided with a header file for the square list class. Note that design is part of the grading criteria, so you do need to apply good design principles to your inner list data structure.
Requirement: Your square list class must be called SqList. The class definition must be placed in a file called SqList.h and the member functions must be implemented in a file called SqList.cpp. File names are case-sensitive on GL.
Requirement: Your square list class must store the square list in a list of lists. Your SqList class definition must have the declaration:
Requirement: You are given the class definition and implementation of the Int341 class in Int341.h and Int341.cpp. You may not change these files in any way.
Requirement: You must have a member function named consolidate with the following signature in your SqList class definition:
Requirement: You must have a public member function named inspector with the following signature in your SqList class definition:
Requirement: Your code must not have any memory leaks. When you run your code under valgrind on GL, it must report:
Requirement: Your implementation must be efficient. The running time for n operations with n data items stored in the SqList must be O(n √n). (See "How to Submit" below.)
Specifications
In addition to the requirements above, your SqList class must have the following member functions with the specified function signatures and running times:
Implementation Notes
Before we list some recommendations and point out some traps and pitfalls, let's discuss Int341 and why we have it. One of the pitfalls of using STL container classes (and object-oriented programming in general) is unintentional copying. The Int341 class has just an int for its payload data. Its purpose is to allow us to track the number of times that the constructor, copy constructor, destructor and assignment operators are called. Consider the following main program that uses 5 different methods for retrieving the last item of a list<Int341> list. The report() member function of Int341 prints out the number of times that Int341 objects were created, copied and destroyed. (See Int341.h and Int341.cpp.) Note that the last two methods, using a reference and a pointer respectively, do not increase the number of calls.
```cpp
#include <list>
#include "Int341.h"
using namespace std ;

int main() {
   list<Int341> ...
}
```
Output:
The main point here is that your consolidate() function should not copy any data items. In particular, when you split and merge inner lists, you should not increase the number of calls to the Int341 constructor, copy constructor, destructor or assignment operator. You can check this by calling Int341::report() before and after you call consolidate.
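The same effect can be seen without Int341. The sketch below uses a hypothetical Counted class (not Int341's actual interface) that counts copy-constructor and copy-assignment calls. Copying back() into a local variable costs one copy, binding a reference costs none, and splice moves every node of one list onto another without copying any element.

```cpp
#include <list>

// Hypothetical stand-in for Int341 (not its real interface): counts copies only.
struct Counted {
    static int copies;
    int value;
    explicit Counted(int v) : value(v) {}
    Counted(const Counted& other) : value(other.value) { ++copies; }
    Counted& operator=(const Counted& other) {
        value = other.value;
        ++copies;
        return *this;
    }
};
int Counted::copies = 0;

// Copy-initializes a local from back(): one copy-constructor call.
int lastByValue(const std::list<Counted>& lst) {
    Counted last = lst.back();
    return last.value;
}

// Binds a reference to back(): no copy.
int lastByReference(const std::list<Counted>& lst) {
    const Counted& last = lst.back();
    return last.value;
}

// Relinks all nodes of `from` onto the end of `to`: no element is copied.
void mergeBySplice(std::list<Counted>& to, std::list<Counted>& from) {
    to.splice(to.end(), from);
}
```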
Now we list some recommendations and point out some traps and pitfalls:
- Apply the incremental programming methodology. Implement one or two member functions and fully debug them before writing more code.
- Carefully study the documentation for the STL list container class. Pay attention to the parameters, return values and the effect that a call would have on iterators. Make sure that you understand iterators and how to use them. There are often several ways to do the same thing with these member functions. You should choose the option that is most efficient and avoids copying.
- Do look at the splice member function for the list class.
- If you are not sure what a list member function does, write a small program that uses the function to test it.
- Review what the list destructor does and when it is invoked.
- The consolidate() function is the hardest. Think through the logic carefully before you code. For example, it is possible to have 3 short inner lists in a row. (How?) Would your consolidate() handle this case correctly? What are some other "weird" cases?
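Following the advice to test list member functions with small programs, here is one such sketch. spliceKeepsIterators (a hypothetical test function) checks that an iterator into the source list stays valid after a whole-list splice and now walks the destination list; splitOffBack (a hypothetical helper) moves the back half of a list into a new list with a range splice, preserving order. Both rely only on std::list::splice as specified since C++11.

```cpp
#include <iterator>
#include <list>

// Hypothetical test: after splicing all of `from` into `to`, an iterator taken
// from `from` still refers to the same element, which now lives in `to`.
bool spliceKeepsIterators() {
    std::list<int> to = {1, 2};
    std::list<int> from = {3, 4, 5};
    auto four = std::next(from.begin());  // refers to the element 4
    to.splice(to.end(), from);            // relinks every node of `from`
    return from.empty() && to.size() == 5 && *four == 4 &&
           *std::next(four) == 5 && std::next(four, 2) == to.end();
}

// Hypothetical helper: moves [middle, end) of `src` into a new list, keeping order.
// Finding the middle and the cross-list range splice are both linear in the list length.
std::list<int> splitOffBack(std::list<int>& src) {
    auto mid = std::next(src.begin(), src.size() / 2);
    std::list<int> back;
    back.splice(back.begin(), src, mid, src.end());
    return back;
}
```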
Testing
In Project 1, you were given an extensive suite of test programs. This was to provide you with an example of how you can test your code. For this project, you will have to write your own test programs. The test programs we provide below are just to make sure that your code will compile with the grading programs. Rest assured that the grading programs will exercise your code vigorously.
Test programs:
- Program that uses every required member function in the SqList class. This includes an implementation of inspector() that checks that your inner lists are not too long and do not have adjacent short lists:
  p2comptest.cpp
  p2comptest.txt
- Programs for timing trials:
  p2timetest1.cpp
  p2timetest2.cpp
  p2timetest3.cpp
How to Submit
You only need to submit two files: SqList.cpp and SqList.h. If for some reason you want to define new classes and additional functions, put all the declarations in SqList.h and all the implementations in SqList.cpp.
Do not submit Int341.h or Int341.cpp since those should not have changed.
We need a C++11 compiler to correctly perform the timing runs for this project. Instead of logging into the GL machines, log into one of fedora1.gl.umbc.edu, fedora2.gl.umbc.edu or fedora3.gl.umbc.edu. You can use the same username and password that you use on GL. The directory structure is the same as on GL. If you have customized your shell environment, some of your customizations might be broken, but you should still be able to run the g++ compiler. That is all we need. If you really cannot log into the fedora machines, then just use GL to record your timing runs.
If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command:
Do remember to exit from the script command. This creates a file called typescript that will record any compilation errors. Yes, we know you can edit this file, but the compilation errors will just show up when we compile the programs again and you will still get lots of points deducted. This step is to compel you to make any changes needed to get your program to compile on GL without any errors.
Note: cd to the appropriate directory if you are submitting late.
Now you can delete the executable files with
Then you should just have 3 files in your submission directory. Check using the ls command. You can also double check that you are in the correct directory using the pwd command. (You should see your username instead of xxxxx.)
A Note on Timing
The main programs for timing above (p2timetest1.cpp, p2timetest2.cpp and p2timetest3.cpp) double the number of items and the number of calls to SqList member functions each time. Since we expect the total running time to be O(n √n), doubling the value of n should increase the total running time by a factor of approximately 2.83. This is because
$$2n\sqrt{2n} = 2\sqrt{2}\, n\sqrt{n}$$
and 2 √2 ≈ 2.83. On some systems (e.g., Mac OS X), the running times bear this out:
The ratios 2.814/0.978 ≈ 2.877 and 8.201/2.814 ≈ 2.914 are quite close to the predicted value of 2.83.
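However, it turns out that the Standard Template Library is not so standard. (See note on GNU website.) In particular, the running time of the size() function may be O(1), which is what we want, or O(n), which is what the library on GL provides. (The C++11 standard requires list::size() to take constant time; earlier standards only recommended it.) With a linear-time size(), calling size() on every inner list during one pass over the top-level list costs O(n) in total instead of O(√n), so n operations take O(n²) time. So, timing the same programs on GL gives quadratic running time.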
Each successive run roughly quadruples the running time: 1.122/0.292 ≈ 3.84 and 4.248/1.122 ≈ 3.786. Your implementation should assume that size() takes O(1) time.
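The doubling argument can be checked numerically. The sketch below computes the predicted ratio f(2n)/f(n) for a cost model f and times a workload with std::chrono::steady_clock. The names predictedRatio and secondsFor, and whatever workload you pass in, are hypothetical illustrations, not part of the provided timing programs.

```cpp
#include <chrono>
#include <cmath>
#include <functional>

// Hypothetical helper: predicted growth factor of cost model f when n doubles.
double predictedRatio(const std::function<double(double)>& f, double n) {
    return f(2.0 * n) / f(n);
}

// Hypothetical helper: wall-clock seconds taken by one call of `work`.
double secondsFor(const std::function<void()>& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

For f(n) = n√n the ratio is 2√2 ≈ 2.83 for every n, and for f(n) = n² it is 4, which is why a linear-time size() shows up as runs that roughly quadruple. To use secondsFor, time your own workload at n and at 2n and divide the second result by the first.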
Discussion Topics
Here are some topics to think about to help you understand square lists. You can discuss these topics with other students without contradicting the course Academic Conduct Policy.
- Suppose you start with an empty square list and keep inserting items in the front of the list. When does the first merge occur?
- What is the smallest number of items you can have in a square list that has 11 inner lists?
- Do we ever encounter long inner lists that have to be split (other than the first inner list) if we only allowed insertion and removal at the beginning of the list?
- After you split an inner list, is it possible that the same inner list has to be split again after the very next square list operation? After two operations? When could the next split occur?
- Can you ever encounter 3 short lists in a row during the Consolidate procedure? Does it matter? And should you write code whose correctness depends on the answer to these questions?