Project 3: QuadTrees for Life

Due: Thursday, November 1 (extended to Nov. 4, 8:59:59pm)

Links: [Project Submission] [Late Submissions] [Project Grading] [Grading Guidelines] [Academic Conduct]

Change Log

[Tuesday Oct 23, 11:30pm] Some of the provided source files have been fixed or slightly modified, and additional test programs, as well as sample inputs and outputs, have been added, as follows. Note: only the mods to QuadTree.h might have any possible impact on your code.

The sample test programs are described at the end of this document.

We have additionally decided to provide full implementations of the dump() member functions for both the QuadTree and QTQuad classes. In fact, you must use these implementations in your project now. Comparing against our sample output, and also our grading, depends on this exact formatting.


Objectives

The objective of this programming assignment is to give you your first intensive taste of a tree-based data structure.

Introduction

For this project, you will implement a version of what is known as a quadtree. (Please read the Wikipedia page's intro section, then the sections on region quadtrees and point-region quadtrees; skip the rest.) One common application for quadtrees is in computer graphics, where they are used to encode a bitmap at varying resolutions while taking advantage of regions of homogeneity to save space. A bitmap is an m x n rectangular matrix of pixels, representing a picture with intensity or color values at each pixel. For simplicity of explanation, let us use as an example a 512 x 512 pixel image, although you should be able to easily figure out how to handle arbitrarily-sized images. We will also assume that our sample image is black and white, with each pixel storing the value 0 for black or 1 for white. Now, imagine if the 512 x 512 image were all white, or all black: we could represent the image with a 262,144-element array, but that would be wasteful. We could instead somehow represent our image by summarizing it with some succinct, symbolic, data structure-y version of the statement "one giant 512 x 512-unit white pixel". How could we do that? And what if it isn't one giant white (or black) block--can we still take at least partial advantage of this summarizing technique? For example, be able to say: "the entire left half is black, but the upper right has some splotches of white in the following places"?

The answer is "yes", using something called a quadtree. Being a tree, it consists of nodes, each of which has up to 4 child nodes, all starting from a single root node. In the case of a uniform 512 x 512 white region, we would represent that as a single root node representing the region covering the X dimension from 0 thru 511, and the Y dimension from 0 thru 511, also, with a value of "1".

Now, what if the image wasn't a single uniform value? What if the pixels in the bottom-left quadrant--i.e., X and Y both in the range from 0 thru 255--were black, while the rest were white? We would then subdivide the original region into four equal-sized square quadrants, by taking our original root and giving it four child nodes. The root no longer technically holds a pixel value--its children do. The child representing the bottom-left quadrant has a pixel value of "0" for black, while the other three child nodes have a value of "1" for white.

What about a more extreme case: an all-white image with a single black pixel at X=0, Y=0? Then the root node would be "empty" (i.e., not directly holding a value), but would have four child nodes. The three covering the top-left, top-right and bottom-right quadrants would all be leaves, and hold the value "1", but the bottom-left would again be a non-value node, since the region it covers--{0-255, 0-255}--would have a mix of 0's and 1's. So again, this level-1 node would have 4 kids, 3 of them leaves holding values, but one child (again, the bottom-left) non-value because it covers both 0's and 1's. This pattern repeats down to level 9, where nodes/quadrants have shrunk to representing single pixels, and finally, even the bottom-left is all one color (trivially, since it contains only 1 pixel!). The idea is that you keep splitting nodes into smaller quadrants until a given quadrant contains only pixels of a uniform value.

Note that the above description implies two additional properties: First, the tree must be full: every node has exactly 0 or 4 child nodes. Second, the splitting process implies that all squares at every level are aligned to a whole multiple of the square size at that level. For example, at a level where a square is 16x16 in size, it must start at some multiple of 16 in both X and Y dimensions.

The above is called a region quadtree, since each node represents all the pixels in a square region of some size, and a value can be ascribed to any and every pixel. That summary value is stored in the node itself.

Another variation of a region quadtree is where you assume some default value for all non-represented cells and quadrants--what we will call "virtual cells". So if an internal node representing a region, say (0..15, 0..15) has 2 non-NULL child quadrants, and your default value is 0, the quadrants covered by the other 2 NULL children would be interpreted as containing all virtual 0-valued cells.

Next, for this project, we are not going to represent individual cells as discrete leaf nodes--that would be too inefficient for what we want to use this for. So, each of our leaves will represent a small fixed-size grid of N x N cells, where N is a fixed constant for our class. For example, if N==4, then our bottom-level nodes will each represent a quadrant of dimension 4, and will contain a 4x4 array of ints. Their parents will be internal nodes representing quadrants of dimension 8. For these parents, if only one of their 4x4 child quadrants has a non-zero cell, the other child pointers will be NULL, instead of pointing to an empty 4x4 grid. Another way to describe this is: we will never have an actual 4x4 grid that is empty.

An aside here on numbering quadrants: in the rest of this document, in referring to the four quadrants (and therefore, the four potential children) of a node, we will refer to them using Roman numerals, numbering the quadrants I to IV in the order bottom-left, bottom-right, top-left, top-right. This will be particularly important later on when we talk about tree traversal.

So, let us say our current root node is an internal node representing the region {x = 0..63, y = 0..63}. Furthermore, let us say it is a fresh new board with all cells initialized to 0. In that case, this would be the only actual node in our entire QuadTree, and would have all four child quadrant pointers set to NULL, indicating every cell in each of those quadrants is 0.

If we now set the point (50, 20) to the value 1, we could mathematically determine that it should go into quadrant II (i.e., bottom-right) under the root node, since quadrant II would have bounds {x = 32..63, y = 0..31}. Since quadrant II for the root node is NULL, we would create a new node and add it as child 2, then descend into that node.

Now, under that node, we would recursively calculate that based on the node's bounds and our point's coordinates, we would be in its top-right quadrant: quadrant IV. Again, since all child quadrant pointers are NULL (since our current node was itself just created), we would allocate another quadrant node, with bounds {x = 48..63, y = 16..31}.

We would continue this create-desired-child-quadrant-then-recurse process until we got down to the level where the node was covering a 4x4 region--our leaf node. At that level, we would allocate a 4x4 array of ints, and set the appropriate element in the 2-D array to "1" (or whatever value the caller requested).
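The descent just described can be sketched as follows. This is only an illustration under stated assumptions: Square, quadrantIndex(), childSquare() and descentPath() are hypothetical names invented for this sketch, and the provided BBox class may expose a different interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a square region (NOT the provided BBox class).
struct Square {
    int x, y;   // bottom-left corner
    int dim;    // side length, assumed to be a power of 2
};

// Quadrant numbering from the text:
// 0 = I (bottom-left), 1 = II (bottom-right), 2 = III (top-left), 3 = IV (top-right).
int quadrantIndex(const Square &sq, int px, int py) {
    int half = sq.dim / 2;
    int right = (px >= sq.x + half) ? 1 : 0;
    int top   = (py >= sq.y + half) ? 1 : 0;
    return top * 2 + right;
}

// Bounds of child quadrant q (0..3) of sq.
Square childSquare(const Square &sq, int q) {
    int half = sq.dim / 2;
    Square child;
    child.x = sq.x + ((q & 1) ? half : 0);
    child.y = sq.y + ((q & 2) ? half : 0);
    child.dim = half;
    return child;
}

// Records the quadrant chosen at each level while descending from root to
// the leaf-sized square (side leafDim) that contains (px, py).
std::vector<int> descentPath(const Square &root, int px, int py,
                             int leafDim, Square &leaf) {
    assert(px >= root.x && px < root.x + root.dim);
    assert(py >= root.y && py < root.y + root.dim);
    std::vector<int> path;
    Square cur = root;
    while (cur.dim > leafDim) {
        int q = quadrantIndex(cur, px, py);
        path.push_back(q);
        cur = childSquare(cur, q);
    }
    leaf = cur;
    return path;
}
```

For the point (50, 20) in a 64x64 root with 4x4 leaves, this sketch visits quadrants II, IV, I, III and ends at the leaf whose bottom-left corner is (48, 20), which agrees with the first two steps worked out above.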

So note that this is a simplified version of the typical region quadtree: we do not try to summarize any value other than 0, so if a quadrant is all 0's, we represent it as a NULL node pointer. However, if we have a quadrant--say, 16x16--that is all 1's, our tree would have a dimension 16 node that has all 4 dimension 8 child nodes, each of which in turn have all four dimension 4 child nodes, each of which would contain an actual 4x4 array of ints full of 1's. Not the most space-efficient implementation in this particular case, but our requirements are different.

So again, to summarize: quadrants at any level that are all 0's are represented as a NULL node, but any quadrant that contains even one non-0 cell will have a branch that descends all the way to a leaf grid. If there are multiple non-0 cells in that higher-level quadrant, they may or may not fall into the same leaf grid, depending on their location. Also note that in the general case, we could not have any nodes with all 4 children NULL, because then the node itself would represent an empty quadrant and would not exist as an explicit node, but rather as a corresponding NULL pointer in the parent for that quadrant. We will discuss the one exception to this later.

When creating a Quadtree instance, the user will request a square region of arbitrary origin and dimensions, but we are requiring that you map the user's region into a root quadrant that is of a power-of-2 dimension, and position the user's region in the quadrant's lower-left corner. So, if the user asks to construct a board of size 33x33, with origin (47, 132), you should create a 64x64 root quadrant, and if the user then puts a value into coordinate (47, 132), it should be at your relative coordinate (0, 0). Whether you do this by consistently working in the user's coordinate system, or by immediately mapping their request to your own (0, 0)-centric coordinate system at the external calling interfaces, is up to you. It just has to work correctly, and again in the above example, if the user were to ask to put a value into coordinate (47+63, 132+63), you could conceivably put it into your tree. Except that you would refuse to, since we want you to reject requests not within the formal bounds they requested at construction time. In other words, you could do what they request, but you choose not to.
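A minimal sketch of the two calculations involved here: rounding the requested dimension up to a power of 2, and rejecting coordinates outside the user's formal bounds. The function names are hypothetical, and the sketch uses raw ints rather than the provided Point and BBox classes.

```cpp
#include <cassert>

// Smallest power of 2 that is >= dim (dim must be positive).
int roundUpToPowerOf2(int dim) {
    assert(dim > 0);
    int p = 1;
    while (p < dim) {
        p *= 2;
    }
    return p;
}

// Hypothetical helper: true if (px, py) lies inside the user's requested
// square with bottom-left corner (ox, oy) and side length dim.
bool inUserBounds(int ox, int oy, int dim, int px, int py) {
    return px >= ox && px < ox + dim &&
           py >= oy && py < oy + dim;
}
```

With the numbers from the example, a 33x33 request rounds up to a 64x64 root quadrant, the point (47, 132) is accepted, and the point (47+63, 132+63) is rejected even though the root quadrant could hold it.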

Our Motivating Example

There's a good chance most of you have implemented some version of Conway's Game of Life (GoL (not GoT)) in your CS1 or CS2 course. For a good background description, see the Wikipedia description. The simplest way to implement this game is using simple 2-D arrays to represent the board (we use the plural "arrays" because most implementations require a pair of boards representing the "before" and "after" state of the board in a given generation). The elements in the array represent living or dead cells, holding values of 1 and 0, respectively. The algorithm will consist of repeated iterations over the grid. Each iteration consists of a nested pair of for-loops that examines every cell on the entire board, counting up its neighbors and computing the state of that cell in the next generation. Therefore, the running time grows as the square of the dimension size (O(n^2)). This is especially inefficient because, due to the rules of GoL, the board maxes out in the worst case at a relatively low density of living (i.e., non-0) cells--typically below 10%. Also, even more relevant to our application, that density is only in the active region, which is typically a tiny fraction of the entire board. However, the user typically would prefer as large a board as possible, because patterns in GoL morph and migrate, and you want the patterns to be as free as possible to move or expand without having to deal with boundary cases at the edges of the board. One hack people have devised to work around the boundary effects is to "wrap" the board around into a 2-D toroidal array. However, this is exactly that: a hack to work around the fact that we cannot have an infinite board.

Another aspect of the program that could stand optimization is the cost of board traversal. It seems inefficient to examine every cell of the board when it is obvious from the rules of GoL that the only cells we are actively interested in are the cells that are live in the current generation, and their immediate neighbors. It is superfluous to examine any other cells. We can easily devise an algorithm that turns the typical version on its head: instead of looking at a cell and asking what its neighbors contribute to it, we can look at all contributing cells, and tell them to add to running tallies for their neighbors. In fact, we've implemented such an algorithm, but need the data structure to support the necessary operations efficiently.
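To make the "inverted" idea concrete, here is a small sketch of one generation computed from the live cells only. It is not the provided Life.cpp: it uses std::set and std::map as stand-ins for the board and the tallies, and the names Cell and nextGeneration() are made up for this illustration.

```cpp
#include <map>
#include <set>
#include <utility>

typedef std::pair<int, int> Cell;   // (x, y)

// One generation of Life, driven only by the currently live cells.
std::set<Cell> nextGeneration(const std::set<Cell> &live) {
    // Each live cell adds 1 to the running tally of each of its 8 neighbors.
    std::map<Cell, int> tally;
    for (std::set<Cell>::const_iterator it = live.begin(); it != live.end(); ++it) {
        for (int dx = -1; dx <= 1; ++dx) {
            for (int dy = -1; dy <= 1; ++dy) {
                if (dx == 0 && dy == 0) continue;
                ++tally[Cell(it->first + dx, it->second + dy)];
            }
        }
    }
    // Standard rules: birth on exactly 3 neighbors, survival on 2 or 3.
    // Cells that received no tally at all have 0 neighbors and stay or become dead.
    std::set<Cell> next;
    for (std::map<Cell, int>::const_iterator it = tally.begin(); it != tally.end(); ++it) {
        bool alive = live.count(it->first) > 0;
        if (it->second == 3 || (alive && it->second == 2)) {
            next.insert(it->first);
        }
    }
    return next;
}
```

The work done is proportional to the number of live cells (times a logarithmic factor from the containers), not to the board area, which is the property the quadtree is meant to preserve.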

So, our application needs the space efficiency to represent a virtually immense board in a compact manner, where it is known that most of the cells will be empty. Also, it needs a way to iterate over only the living cells in an efficient manner. Quadtrees to the rescue!

Quadtree Operations

We are trying to create a data structure that, while implemented as a sparse tree, is meant to appear to the user as an abstract data type (ADT) akin to a simple, square 2-dimensional array. So the operations are simple grid-based coordinate requests to access, set, or modify cells in the 2D array. For the accessor, the user provides an x,y coordinate, and wants the value stored at those coordinates. Even though we don't actually store most of our 0's, we should return a correct 0 value if the user queries the corresponding coordinates.

The user is allowed to set a cell to an arbitrary value. If they change a 0 to a non-0, the value needs to be stored in an actual leaf grid array member, which might entail creating one or more quadrant nodes down the appropriate path, then allocating a grid leaf node and writing the value into it.

If the user sets a cell back to 0, you must determine if the entire grid node is now empty, and if so, then start pruning empty nodes upwards as far as possible. Note the additional slight complication that they might be asking to clear a node that is already 0, and therefore might not actually explicitly exist in our current tree. (That is not too difficult to handle, but you must consider it.)

You will also provide an increment() function, which adds a delta (or subtracts, if the delta is negative) to the current cell value. This is the most complicated of all, because you must consider two special cases:

  • You are changing a "virtual" 0 cell to a non-0 value, and might have to create nodes;
  • You are changing a non-0 value to a 0, and so might need to prune the tree.

There are two other cases (setting a non-0 to a different non-0, and adding a delta of 0 to a virtual 0 cell), but these are easy to test for and handle.
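One way to keep the cases above straight is to classify the request before touching the tree. The enum and function below are hypothetical names for this sketch only; they are not part of the provided headers.

```cpp
// Hypothetical labels for what an increment request requires of the tree.
enum IncrementAction {
    NOTHING_TO_DO,    // 0 stays 0 (e.g. a delta of 0 on a virtual 0 cell)
    CREATE_PATH,      // 0 becomes non-0: nodes may have to be created
    PRUNE_PATH,       // non-0 becomes 0: the tree may have to be pruned
    UPDATE_IN_PLACE   // non-0 stays non-0: only the stored value changes
};

// oldValue is the cell's current value (0 for a virtual cell).
IncrementAction classifyIncrement(int oldValue, int delta) {
    int newValue = oldValue + delta;
    if (oldValue == 0 && newValue == 0) return NOTHING_TO_DO;
    if (oldValue == 0) return CREATE_PATH;
    if (newValue == 0) return PRUNE_PATH;
    return UPDATE_IN_PLACE;
}
```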

    You must implement a "clear" function that zeros out all the cells, but that is easy: think of how similar (but not identical) this is to the destructor.

    Last, but definitely not least, you must implement an iterator that allows the user to traverse over all the non-0 cells of the board. This is the primary attraction of our ADT: that we can do this quickly and avoid having to process any 0 cells, which our board primarily comprises.

    Compact Quadtree Maintenance

    The quadtree you design for this project must be kept as compact as possible. By "compact", we mean if all the cells covered by a quadrant node are zero, that node should not even exist: instead, the parent should just have a NULL pointer for that quadrant. Further note that if any node has 4 NULL children, that implies it is empty, and again, should not even exist. This is easy to maintain as long as we are adding (non-0) cells: we are just fleshing out NULL pointers and turning them into a path of non-empty quadrant nodes going all the way down to a leaf grid. However, it gets complicated when we zero cells out. Assuming the cell is currently non-0, we would first convert the leaf grid's array entry to a 0. However, we need to detect when the grid becomes completely empty, and must then delete that leaf grid node entirely, and set the corresponding pointer in its parent to NULL. In turn, if that node was its parent's only remaining non-NULL child, we must delete the parent, and this might propagate up to an arbitrary degree. This series of deletions will stop when you hit a node that has other children.

    Note: the one exception to the "delete when empty" rule is the root: if the entire top-level quadrant, i.e. the whole board, is empty, you still do not delete the root; this is for practical reasons: it would make the rest of your code disproportionately complicated if the root might be NULL. So, bottom line: don't cross the streams (anyone get the movie ref?), and don't delete the root.

    Another way to describe this requirement is that your quadtree cannot have any leaf nodes other than grid nodes.
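The create-on-write and prune-on-zero behavior can be sketched with a recursive function whose return value tells the parent whether the child has just become empty. This is a sketch under stated assumptions, not the required design: Node, setCell() and getCell() are hypothetical, bounds are passed as raw ints instead of BBox objects, and indexing the grid as [row = y][column = x] is an assumption.

```cpp
#include <cstddef>

const int GRID = 4;   // leaf grid side; stand-in for QTQ_GRID_DIM

// Hypothetical node type, NOT the provided QTQuad class.
struct Node {
    Node *kids[4];        // quadrants I..IV; used by internal nodes
    int (*cells)[GRID];   // GRID x GRID array; used by leaf nodes
    Node() : cells(NULL) { for (int i = 0; i < 4; ++i) kids[i] = NULL; }
    ~Node() { for (int i = 0; i < 4; ++i) delete kids[i]; delete [] cells; }
};

// Sets cell (px, py) inside the square with bottom-left (x, y) and side dim.
// Returns true if this node holds nothing but zeros afterwards, so that the
// caller can delete it and NULL out its pointer. The top-level caller
// ignores the result, which is how the root survives an empty board.
bool setCell(Node *n, int x, int y, int dim, int px, int py, int value) {
    if (dim == GRID) {                        // leaf level
        if (n->cells == NULL) {
            if (value == 0) return true;      // nothing stored, nothing to store
            n->cells = new int[GRID][GRID](); // zero-filled grid
        }
        n->cells[py - y][px - x] = value;     // assumption: [row = y][col = x]
        for (int r = 0; r < GRID; ++r)
            for (int c = 0; c < GRID; ++c)
                if (n->cells[r][c] != 0) return false;
        return true;
    }
    int half = dim / 2;
    int right = (px >= x + half) ? 1 : 0;
    int top   = (py >= y + half) ? 1 : 0;
    int q = top * 2 + right;
    if (n->kids[q] == NULL) {
        if (value == 0) return false;         // already a virtual 0: no change
        n->kids[q] = new Node();
    }
    bool childEmpty = setCell(n->kids[q], x + right * half, y + top * half,
                              half, px, py, value);
    if (childEmpty) {                         // prune the now-empty child
        delete n->kids[q];
        n->kids[q] = NULL;
    }
    for (int i = 0; i < 4; ++i)
        if (n->kids[i] != NULL) return false;
    return true;                              // all four children are NULL
}

// Reads a cell; a missing node anywhere on the path means a virtual 0.
int getCell(const Node *n, int x, int y, int dim, int px, int py) {
    while (n != NULL && dim > GRID) {
        int half = dim / 2;
        int right = (px >= x + half) ? 1 : 0;
        int top   = (py >= y + half) ? 1 : 0;
        n = n->kids[top * 2 + right];
        x += right * half;
        y += top * half;
        dim = half;
    }
    if (n == NULL || n->cells == NULL) return 0;
    return n->cells[py - y][px - x];
}
```

Because each level only reports "I am empty" to its parent, the chain of deletions stops automatically at the first ancestor that still has another non-NULL child.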


    Assignment

    Note: Running time is one of the most important considerations in the implementation of a data structure. Programs that produce the desired output but do so in an excessively inefficient manner are considered substandard implementations and will receive deductions during grading.

    Your assignment is to implement a quadtree data structure as described above, for holding distinct values for each cell of a 2-D grid. You will do this by implementing two primary classes: QuadTree and QTQuad (short for "QuadTree Quadrant") yourself, according to the incomplete class specification in the provided .h files, and the requirements described below. You are allowed to use simple STL classes like pair and vector inside your own class implementations to handle simple lower-level needs.

    At the top, you will design a QuadTree class that has definitions for all of the member functions described later in this section. Functionally, it will appear to provide the upper half of all of the quadtree functionality, but data-wise, it is primarily a holder for a pointer to the root node, which is of type QTQuad. You are free to add other function and data members to support the operations and runtimes required by this project.

    The QuadTree object will point to the root node of a quadtree constructed of nodes which are instances of the QTQuad class. The QTQuad class serves two purposes: First, it acts structurally as an internal node in the tree. It has an array of 4 pointers, each pointing to a child quadrant, stored in the specific order described earlier. Second, it must be able to do double duty as a leaf grid node. So it also has a pointer to a 2-D N x N array, where the dimension is #define'd as QTQ_GRID_DIM (currently 4). (In order for it to work correctly as a C++ 2-D array, the dimension must be a constant.) You may also add other data members as necessary to implement the necessary functionality.

    Lastly, to allow efficient processing of non-0 nodes as required by our motivating application (GoL), you need to design an iterator nested class for the QuadTree class that provides the usual begin()/end() interface, supports moving from non-0 cell to non-0 cell using the '++' operator, and also supports access to the cell coordinates via the overloaded '*' pointer operator. The class should be QuadTree::iterator, and QuadTree should have functions QuadTree::begin() and QuadTree::end() that act as such functions should for a typical iterator implementation. The provided Life.cpp sample code provides a good example of how it would be used.

    At the lowest level, you will use two classes we are providing to let you represent 2-D X/Y points (the Point class), as well as the bounding boxes (the BBox class). You will use the Point and BBox classes to send requests to and otherwise interact with the QuadTree and QTQuad classes. (In fact, the BBox class itself uses the Point class internally for one of its data members.) Note that for space reasons, all nodes--internal as well as leaf-- are of type QTQuad, but these do not contain any Point or BBox members; instead only the QuadTree has bounds stored as a BBox. That is because the BBox is deterministically implied for all quadrants under the root, and subquadrants under each of those, and so on. So, the QuadTree will pass the bounding box in to the QTQuad functions, which in turn will generate bounds for the subquadrant and pass that in as an argument to recursive calls down the tree.

    This assignment specifies the interface between the main program and your QuadTree implementation, and also specifies a set of required functions for the QTQuad implementation, but you are free to design the internals of the class as you wish, subject to some requirements below. In particular, while you are provided with fairly complete header files for the QuadTree and QTQuad classes, it is certain that you will have to augment those class definitions in order to do your project. You are provided complete implementations for the Point and BBox classes.

    Note that design is part of the grading criteria, so you do need to apply good design principles to your data structures. However, in your design, you must adhere strictly to each and every requirement and specification listed below, or the grading process will fail and you will get significant deductions.

    Requirement: At the end of any and all QuadTree operations, the resulting quadtree should always be a valid compact quadtree (see properties above).

    Requirement: You must complete the class definition for the QuadTree class, as provided in the file QuadTree.h. You cannot modify any of the existing data members, including their names, types, and public visibility, but you can add any additional data members as needed, in the areas commented as for that purpose. You cannot modify or remove any of the existing member function declarations, but you can add additional helper functions to facilitate your implementation. The class definition must be modified in-place in the provided skeletal file QuadTree.h and the member functions must be implemented in a file called QuadTree.cpp, which is not provided and which you must create in its entirety.

    Requirement: Your QuadTree class must leave the already-defined data member declarations exactly as-is, and use them exactly as described below.

    QTQuad *m_root;
    BBox m_bounds;
    BBox m_qBounds;

    Requirement: Your QuadTree class must use the field named m_root to store the pointer to the QTQuad that is the root of the entire quadtree.

    Requirement: Your QuadTree class must use the field named m_bounds to store the actual board bounds as specified by the user. This will be used to determine the legality of the coordinates the user requests in the operations.

    Requirement: Your QuadTree class must use the field named m_qBounds to store the expanded bounds actually covered by the tree (i.e., the dimension expanded out to the next power of 2 at or above the requested size, with the bottom-left coordinate the same as the original bounds in m_bounds). This defines the bounds for the root, and is passed in to the QTQuad functions.

    Requirement: You must complete the class definition for the QTQuad class, as provided in the file QTQuad.h. You cannot modify any of the existing data members, including their names, types, and public visibility, but you can add any additional data members as needed. You cannot modify or remove any of the existing member function declarations, but you can add additional helper functions to facilitate your implementation. The class definition must be modified in-place in the provided skeletal file QTQuad.h and the member functions must be implemented in a file called QTQuad.cpp, which is not provided and which you must create in its entirety.

    Requirement: The QTQuad class will define the data stored in each node of your quadtree, both internal and leaf. You must leave the already-defined data member declarations exactly as-is:

    QTQuad *m_quads[QTQ_NUM_QUADS];
    int (*m_cells)[QTQ_GRID_DIM];

    Note, as already mentioned earlier, that the nodes do not contain an explicit bounding box describing the region they cover, since this is implied by the combination of the bounding box for the entire tree and the QTQuad node's location in the tree. Also note that the two predefined fields--m_quads[] and m_cells--are mutually exclusive: only one or the other is active depending on whether the node is serving as an internal node, or a grid leaf node. Yes, it could therefore have been defined as a union, but for reasons not stated, we chose not to.
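The declaration of m_cells is a pointer to an array of ints (that is, a pointer to a row), which is what C++ hands back when you new a 2-D array with a constant inner dimension. The sketch below shows one way such a grid might be allocated and released; GRID_DIM, GridRow, allocGrid() and freeGrid() are hypothetical names used only here.

```cpp
#include <cstddef>

const int GRID_DIM = 4;          // stand-in for QTQ_GRID_DIM

typedef int GridRow[GRID_DIM];   // one row of the grid

// Allocates a GRID_DIM x GRID_DIM grid. The trailing "()" value-initializes
// every element, so the grid starts out filled with 0.
GridRow *allocGrid() {
    return new int[GRID_DIM][GRID_DIM]();
}

// Releases a grid obtained from allocGrid(); deleting NULL is harmless.
void freeGrid(GridRow *cells) {
    delete [] cells;
}
```

A GridRow * has the same type as int (*)[GRID_DIM], so the result can be indexed as cells[i][j] like an ordinary 2-D array.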

    Requirement: You must finish the design for, and implement, the QuadTree::iterator class. You will also be defining this class almost from scratch. We have only provided the function prototypes for all the member functions we are requiring you to implement. Otherwise, you have complete freedom in designing this class, as long as the required member functions work as described.

    Requirement: You are given the class definition and implementation of the utility classes Point and BBox to assist you in your implementation. They are provided in four files:

    You may not change these files in any way, nor should you submit them. We will replace them with our own versions in any case.

    Requirement: Your code must not have any memory leaks. When you run your code under valgrind on GL, it must report:

    All heap blocks were freed -- no leaks are possible

    Requirement: Your implementation must be efficient. We recognize that quadtrees are worst-case O(n^2) (where 'n' is the dimension of the board), but our implementation is supposed to be close to O(m), where 'm' is the number of non-0 cells, independent of the board size.



    Specifications

    In addition to the requirements above, your QuadTree class must have the following member functions with the specified functionality:

    1. A default constructor that initializes a QuadTree properly, as a 16 x 16 grid with origin (0, 0). It should run in O(1) time.

    2. A 1-parameter constructor that initializes a QuadTree to the requested size, at the requested origin. It should run in O(1) time.

    3. A destructor that cleans up the entire tree, recovering all dynamically allocated space. Your implementation must not leak memory.

    4. A member function to get the value stored at 2-D coordinate position in a QuadTree. Returns the cell's value.

      int get(const Point &pt);

    5. A member function to set the value stored at 2-D coordinate position in a QuadTree. If the cell gets set to 0, and it was the last non-0 cell in its leaf grid, the grid node is deleted, with possible additional pruning upwards.

      int set(const Point &pt, int data);

    6. A member function to modify the value stored at 2-D coordinate position in a QuadTree by a requested amount. The delta is signed, so this function can actually increment or decrement, by any amount. increment() without a second argument increases the value by 1. Returns the cell's new value.

      int increment(const Point &pt, int delta = 1);

    7. A member function to clear a QuadTree by setting all the cells to 0. It does this by actually pruning the entire tree, making all the cells virtual, and the tree very compact.

      bool clearAll();

    8. The iterator support functions for performing iteration over the list of non-0 cells in the QuadTrees:

      iterator begin();
      iterator end();
      iterator::iterator();
      bool operator==(const QTQuad::iterator &other);
      bool operator!=(const QTQuad::iterator &other);
      iterator &operator++();          // Prefix: e.g. "++it"
      iterator operator++(int dummy);  // Postfix: "it++"
      Point * &operator*();

      These are all the standard iterator-related support functions, and should not need to be further described. Note that your implementation of operator*() should return a pointer to a Point object containing the coordinates of the next non-0 cell in the grid.

      Note that the iterator's behavior is further specified to be required to follow a particular scan order. At each intermediate internal node, the iterator must scan for non-empty quadrants in the I-II-III-IV order in a depth-first fashion. At the grid leaf node level, the grid must be scanned in a row-major form (meaning the non-0 cell with a lower row index must be returned first, and within a row, the cell with a lower column index must be returned first). So, if there are non-0 cells at array positions grid[1][0], grid[1][3], and grid[2][0], the points (1, 0), (1, 3) and (2, 0) must be returned, in that order, in sequential calls to ++it (assuming it is the name of your QuadTree::iterator variable).
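The leaf-level half of that scan order can be illustrated with a plain nested loop. This sketch only lists grid index pairs in the required order; it is not an iterator, it does not cover the I-II-III-IV descent through internal nodes, and scanGrid() and GRID_DIM are hypothetical names.

```cpp
#include <utility>
#include <vector>

const int GRID_DIM = 4;   // stand-in for QTQ_GRID_DIM

// Returns the (row, column) index pairs of all non-0 entries of grid,
// in row-major order. The grid is only read, never modified.
std::vector<std::pair<int, int> > scanGrid(int (*grid)[GRID_DIM]) {
    std::vector<std::pair<int, int> > found;
    for (int row = 0; row < GRID_DIM; ++row) {
        for (int col = 0; col < GRID_DIM; ++col) {
            if (grid[row][col] != 0) {
                found.push_back(std::make_pair(row, col));
            }
        }
    }
    return found;
}
```

With non-0 entries at grid[1][0], grid[1][3] and grid[2][0], the result lists (1, 0), (1, 3), (2, 0) in that order, matching the example above. An actual iterator would have to remember its position between calls to ++ rather than collect everything up front.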

    9. A debugging member function that prints out the contents of the entire quadtree, exactly in the format shown in the sample output (see "Provided Files" section below). You should implement dump() in the most reliable manner possible (i.e., avoid calls to member functions which might themselves be buggy). The format we chose will allow us to provide a script that will allow the output to be reformatted in a nice nested indented format that will make analysis easier for your debugging, and will also allow us to analyze the structure for grading, so it is critical that you get it exactly right.

      void dump() ;

    10. A grading member function that is declared in QuadTree.h, which you should not implement:

      bool inspect(QTQuad * &root, BBox &bounds, BBox &m_qBounds);

    Then, your QTQuad class must have the following member functions with the specified functionality. Note that most of the functions are analogous to functions defined in QuadTree, except that they operate on the subtree rooted at the calling QTQuad object. Also note that most of the functions are also passed in a bounding box describing their quadrant, since that is not explicitly stored in the node, but rather is implied by the node's position in the quadtree. Those functions then compute the bounding box for each of their children and pass that in as a parameter for a recursive call:

    1. A constructor that initializes a QTQuad properly. It should run in O(1) time.

    2. A destructor that cleans up the entire (sub)quadtree rooted at the node, recovering all dynamically allocated space. Your implementation must not leak memory.

    3. A member function to get the value stored at 2-D coordinate position in a QuadTree. Returns the cell's value.

      int get(const Point &pt, const BBox &bounds);

    4. A member function to set the value stored at 2-D coordinate position in a QuadTree. If the cell gets set to 0, and it was the last non-0 cell in its leaf grid, the grid node is deleted, with possible additional pruning upwards.

      int set(const Point &pt, int data, const BBox &bounds);

    5. A member function to modify the value stored at 2-D coordinate position in a QuadTree by a requested amount. The delta is signed, so this function can actually increment or decrement, by any amount. increment() without a second argument increases the value by 1. Returns the cell's new value.

      int increment(const Point &pt, int delta = 1, const BBox &bounds);

    6. A member function to clear a QuadTree by setting all the cells to 0. It does this by actually pruning the entire tree, making all the cells virtual, and the tree very compact.

      bool clearAll();

    7. A debugging member function that prints out the contents of the entire (sub)quadtree rooted at current node, exactly in the format shown in the sample output. Note that dump() can easily be designed to be recursive, so you only have to print out the node-specific info, then recursively call dump() on each non-NULL child. (Note: same warnings about the importance of this function as for QuadTree::dump() above.)

      void dump() ;

    8. A grading member function that is declared in QTQuad.h, which you should not implement:

      bool inspect(bool &isInternal, union QTQ_Contents &u);


    Provided Files

    In the directory /afs/umbc.edu/users/p/a/park/www/cs341.f18/projects/proj3files, copies of all of the necessary files have been made available: