Project 1: Sparse Adjacency Matrices

Due: Tuesday, February 19, before 9:00 pm


Addenda


Objectives

The objective of this programming assignment is to review C++ programming using the following features: object-oriented design, dynamic memory allocation, array manipulation, iterators, and exceptions.


Introduction

In computer science and discrete mathematics, a graph is not a plot of y = f(x). A graph has vertices and edges. Here's a simple example:

The circles are the vertices and the edges are the lines between the vertices. It is common to use the words "node" and "vertex" interchangeably.

In the example above, the set of vertices is {0,1,2,3,4} and the set of edges is {(3,4), (4,1), (3,0), (0,4), (0,1), (2,1), (4,2)}. It is common to use the ordered pair notation (u, v) to represent an edge, but in an undirected graph, the edges are not ordered. Thus, (2,1) and (1,2) are the same edge.

One common way to store a graph is using an adjacency matrix data structure. This data structure is just a matrix (two-dimensional array) in which the (u, v) entry is one if (u, v) is an edge and zero otherwise. For example, the graph above can be stored as:

    A =
        0 1 0 1 1
        1 0 1 0 1
        0 1 0 0 1
        1 0 0 0 1
        1 1 1 1 0

The neighbors of a vertex v are the vertices in the graph that are connected to v by an edge. The first row of the matrix A, indexed by 0, indicates that vertex 0 has edges in common with vertices 1, 3, and 4; they are the neighbors of vertex 0.

Note that each edge is represented twice in A. For example, since there is an edge between vertex 0 and vertex 3, there are ones in both the (0, 3) and the (3, 0) entries of A. This is because the graphs we will work with in this project are undirected, meaning that an edge indicates a connection between two vertices without regard to direction.
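
The double representation can be seen in a small sketch that fills a dense adjacency matrix from an edge list. The names fillAdjacency and countOnes, and the choice to store the matrix as a flat row-major array, are assumptions made for this illustration; they are not part of Graph.h.

```cpp
#include <cassert>

// Hypothetical helper (not part of Graph.h): fill an n-by-n adjacency
// matrix, stored row-major in a flat array of n*n ints, from a list of
// undirected edges given as the parallel arrays us[] and vs[].
void fillAdjacency(int* A, int n, const int* us, const int* vs, int numEdges) {
    for (int i = 0; i < n * n; i++) {
        A[i] = 0;
    }
    for (int e = 0; e < numEdges; e++) {
        int u = us[e];
        int v = vs[e];
        assert(u >= 0 && u < n && v >= 0 && v < n);
        A[u * n + v] = 1;   // entry (u, v)
        A[v * n + u] = 1;   // mirror entry (v, u): the edge is undirected
    }
}

// Hypothetical helper: count the ones in the matrix. For an undirected
// graph without self-loops this is twice the number of edges.
int countOnes(const int* A, int n) {
    int ones = 0;
    for (int i = 0; i < n * n; i++) {
        if (A[i] == 1) {
            ones++;
        }
    }
    return ones;
}
```

Calling fillAdjacency with the seven edges of the example graph and n = 5 sets both the (0, 3) and the (3, 0) entries, and countOnes reports 14 ones for the 7 edges.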

Many graphs used in applications are sparse, meaning that they have relatively few edges, so their adjacency matrices contain relatively few ones. For example, suppose we wanted to make a graph in which the vertices are all members of the UMBC community (students, faculty, staff) and there is an edge between two vertices if either person has ever sent the other an email. Most of us only have email communication with a very small subset of all the people at UMBC: our rows in the matrix would be mostly zeros. It is a waste of memory to store all those zeros.

So how can we store our adjacency matrix without wasting space? We will use compressed sparse row format. Let N denote the number of vertices in the graph and NNZ the number of non-zero entries in our adjacency matrix. We can store the adjacency matrix efficiently with three arrays:

  1. nz: an integer array of length NNZ storing all the non-zero entries in row-major order. The first elements of nz are the non-zero elements of row 0, followed by the non-zero elements of row 1, etc. The data for each row should be stored in order of increasing column index.
  2. re: an integer array of length N+1 indicating where each row's data starts in nz. That is, the data for row u starts at index re[u] of nz. The last element of re should always be equal to NNZ.
  3. ci: an integer array of length NNZ storing the column indices for the elements in nz.

The adjacency matrix A for our example graph would be encoded as follows:

    nz = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}
    re = {0, 3, 6, 8, 10, 14}
    ci = {1, 3, 4, 0, 2, 4, 1, 4, 0, 4, 0, 1, 2, 3}
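
One way to see where these three arrays come from is to scan a dense matrix row by row. The following is a minimal sketch under stated assumptions: the function name denseToCsr and the flat row-major input are invented for this illustration, and the caller is assumed to supply nz and ci with room for every non-zero entry and re with room for n+1 entries. It is not how the Graph class is required to build its arrays.

```cpp
#include <cassert>

// Hypothetical helper (not part of Graph.h): convert a dense n-by-n matrix,
// stored row-major in the flat array A, into compressed sparse row arrays.
// Returns NNZ, the number of non-zero entries written to nz and ci.
int denseToCsr(const int* A, int n, int* nz, int* re, int* ci) {
    assert(n >= 0);
    int count = 0;                      // entries written so far
    for (int u = 0; u < n; u++) {
        re[u] = count;                  // row u's data starts here in nz
        for (int v = 0; v < n; v++) {   // increasing column index
            int value = A[u * n + v];
            if (value != 0) {
                nz[count] = value;
                ci[count] = v;
                count++;
            }
        }
    }
    re[n] = count;                      // last element of re equals NNZ
    return count;
}
```

Because columns are visited in increasing order within each row, the data for each row ends up sorted by column index, and the number of non-zero entries in row u is re[u+1] - re[u].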

Comments:

  1. You may be thinking that there is something wrong with my definition of "efficient" since the sparse encoding stores 34 integer values (14 in nz, 6 in re, and 14 in ci) when the original matrix could be stored with only 25 values. It is true that this is not particularly helpful for small matrices, but it is space efficient when working with large sparse matrices.
  2. Why are we bothering to store values in nz if they will always be ones? The short answer is that they will not necessarily always be ones. Sometimes it is useful to weight the edges of a graph. Going back to the UMBC email example, I might want to record how many emails were sent between individuals, not just the fact that one or more emails were sent; large edge weights might be indicative of which connections are more important.
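
The storage trade-off in the first comment can be checked with a little arithmetic: a dense matrix holds N*N values, while the three arrays hold NNZ + (N+1) + NNZ values. The sketch below simply evaluates both counts; the function names denseCount and csrCount are assumptions made for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: number of integers stored by a dense n-by-n matrix.
long long denseCount(long long n) {
    assert(n >= 0);
    return n * n;
}

// Hypothetical helper: number of integers stored by the three arrays,
// that is, nnz values in nz, n + 1 values in re, and nnz values in ci.
long long csrCount(long long n, long long nnz) {
    assert(n >= 0 && nnz >= 0);
    return nnz + (n + 1) + nnz;
}
```

For instance, a graph with 10,000 vertices and 100,000 edges has NNZ = 200,000, so denseCount(10000) is 100,000,000 while csrCount(10000, 200000) is 410,001.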

Assignment

Your assignment is to implement a sparse adjacency matrix data structure Graph that is defined in the header file Graph.h. The Graph class provides two iterators. One iterator produces the neighbors for a given vertex. The second iterator produces each edge of the graph once.

Additionally, you must implement a test program that fully exercises your implementation of the Graph member functions. Place this program in the main() function in a file named Driver.cpp.

The purpose of an iterator is to provide programmers a uniform way to iterate through all items of a data structure using a for loop. For example, using the Graph class, we can iterate through the neighbors of vertex 4 using:

    Graph::NbIterator nit ;
    for (nit = G.nbBegin(4); nit != G.nbEnd(4) ; nit++) {
        cout << *nit << " " ;
    }
    cout << endl ;

The idea is that nit (for neighbor iterator) starts at the beginning of the data for vertex 4 in nz and is advanced to the next neighbor by the ++ operator. The for loop continues as long as we have not reached the end of the data for vertex 4. We check this by comparing against a special iterator for the end, nbEnd(4). This requires the NbIterator class to implement the ++, != and * (dereference) operators.
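
To make the three operators concrete, here is a stand-alone sketch of an iterator that walks one row of the compressed arrays. Everything in it is hypothetical: the class name RowIterator, its data members, and the free functions rowBegin and rowEnd are invented for this illustration and are not the declarations in Graph.h, which your NbIterator must follow.

```cpp
#include <cassert>

// Hypothetical stand-alone iterator over one row of a compressed sparse
// row structure. It only remembers a position in the ci array.
class RowIterator {
public:
    RowIterator(const int* ci, int pos) : m_ci(ci), m_pos(pos) {}

    // Two iterators differ when they point at different positions.
    bool operator!=(const RowIterator& rhs) const {
        return m_pos != rhs.m_pos;
    }

    // Postfix ++ (as in it++): step to the next entry, return the old value.
    RowIterator operator++(int) {
        RowIterator old = *this;
        m_pos++;
        return old;
    }

    // Dereference: the column index at the current position, which is
    // the current neighbor.
    int operator*() const { return m_ci[m_pos]; }

private:
    const int* m_ci;
    int m_pos;
};

// Hypothetical helpers: row v occupies positions re[v] .. re[v+1]-1.
RowIterator rowBegin(const int* re, const int* ci, int n, int v) {
    assert(v >= 0 && v < n);
    return RowIterator(ci, re[v]);
}

RowIterator rowEnd(const int* re, const int* ci, int n, int v) {
    assert(v >= 0 && v < n);
    return RowIterator(ci, re[v + 1]);
}
```

With the example arrays, a loop from rowBegin(re, ci, 5, 4) to rowEnd(re, ci, 5, 4) visits 0, 1, 2, 3. Each operator does a constant amount of work, so the loop takes time proportional to the number of neighbors.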

Similarly, the Graph class allows us to iterate through all edges of a graph using a for loop like:

    Graph::EgIterator eit ;
    tuple<int,int,int> edge ;
    for (eit = G.egBegin() ; eit != G.egEnd() ; eit++) {
        edge = *eit ;   // get current edge
        cout << "(" << get<0>(edge) << ", "
             << get<1>(edge) << ", "
             << get<2>(edge) << ") " ;
    }
    cout << endl ;

Note that each edge should be printed only once, even though it is represented twice in the sparse adjacency matrix data structure.
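
One possible way to report each edge once is to scan every stored entry but keep only those whose row index does not exceed the column index. The sketch below does this with plain arrays instead of an iterator; the function name listEdgesOnce and its output arrays are assumptions made for this illustration, and the EgIterator declared in Graph.h may be organized differently.

```cpp
#include <cassert>

// Hypothetical helper (not part of Graph.h): collect each undirected edge
// once from the compressed arrays nz, re, ci of an n-vertex graph.
// Assumes the matrix is symmetric, so an edge between u and v (u != v) is
// stored as both (u, v) and (v, u); keeping only entries with u <= v
// reports it a single time. The output arrays must have room for NNZ
// entries. Returns the number of edges written.
int listEdgesOnce(const int* nz, const int* re, const int* ci, int n,
                  int* outU, int* outV, int* outW) {
    assert(n >= 0);
    int count = 0;
    for (int u = 0; u < n; u++) {                  // N rows
        for (int k = re[u]; k < re[u + 1]; k++) {  // NNZ entries in total
            int v = ci[k];
            if (u <= v) {
                outU[count] = u;
                outV[count] = v;
                outW[count] = nz[k];               // edge weight
                count++;
            }
        }
    }
    return count;
}
```

On the example graph this yields 7 edges, starting with (0, 1, 1). The outer loop runs once per vertex and the inner loop touches each stored entry once, so the total work is proportional to the number of vertices plus the number of stored entries.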

Since a program may use many data structures and each data structure might provide one or more iterators, it is common to make the iterator class for a data structure an inner class. Thus, in the code fragments above, nit and eit are declared as Graph::NbIterator and Graph::EgIterator objects, not just NbIterator and EgIterator objects.

If you have not used nested class declarations before, here's an example: nested.cpp and sample output. (For convenience, the class declarations and implementation are provided in one file, contrary to course coding standards.)
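
As a further illustration of the syntax, here is a small sketch of an inner class with its member functions defined outside the class body. It is a separate example written for this note, not the contents of nested.cpp; the class names Counter and Step are hypothetical.

```cpp
#include <cassert>

class Counter {
public:
    // Inner class: outside of Counter its full name is Counter::Step.
    class Step {
    public:
        Step(int size);
        int size() const;
    private:
        int m_size;
    };

    Counter();
    void advance(const Step& s);   // inside Counter, plain "Step" suffices
    int value() const;

private:
    int m_value;
};

// Definitions outside the class body need the fully qualified names.
Counter::Step::Step(int size) : m_size(size) {
    assert(size >= 0);
}

int Counter::Step::size() const { return m_size; }

Counter::Counter() : m_value(0) {}

void Counter::advance(const Step& s) { m_value += s.size(); }

int Counter::value() const { return m_value; }
```

Client code declares the inner type with the qualified name, for example Counter::Step s(3), in the same way that the fragments above declare Graph::NbIterator and Graph::EgIterator objects.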


Specifications

Here are the specifics of the assignment, including a description for what each member function must accomplish.

Requirement: your implementation must dynamically resize the m_nz and m_ci arrays. See the descriptions of Graph (constructor) and addEdge, below.
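
As a rough sketch of what resizing a dynamically allocated array involves, the helper below allocates a larger array, copies the elements in use, and frees the old storage. The name growArray and the idea of doubling the capacity are assumptions made for this illustration; the required growth policy is whatever the descriptions of the constructor and addEdge specify.

```cpp
#include <cassert>

// Hypothetical helper (not part of Graph.h): return a new array with room
// for newCap ints that holds the first `used` elements of `old`.
// The old array is released, so the caller must use the returned pointer.
int* growArray(int* old, int used, int newCap) {
    assert(used >= 0 && newCap >= used);
    int* bigger = new int[newCap];
    for (int i = 0; i < used; i++) {
        bigger[i] = old[i];      // copy the elements currently in use
    }
    delete [] old;               // avoid leaking the old storage
    return bigger;
}
```

A caller that tracks a capacity and a count would check whether the count has reached the capacity before appending, and if so call growArray with a larger capacity and store the returned pointer. Arrays that must stay parallel, such as nz and ci, would need to be grown together.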

Requirement: other than the templated tuple class, you must not use any classes from the Standard Template Library or other sources, including vector and list. All of the data structure must be implemented by your own code.

Requirement: your code must compile with the original Graph.h header file. You are not allowed to make any changes to this file. Yes, this prevents you from having useful helper functions. This is a deliberate limitation of this project. You may have to duplicate some code.

Requirement: per our course coding standards, your code must compile with g++ on the GL servers without using any compilation flags.

Requirement: a program fragment with a for loop that uses your NbIterator must have worst case running time that is proportional to the number of neighbors of the given vertex.

Requirement: a program fragment with a for loop that uses your EgIterator must have worst case running time that is proportional to the number of vertices in the graph plus the number of edges in the graph.



These are the member functions of the Graph class (not including the member functions of the inner classes).


These are the member functions of the edge iterator class EgIterator:


These are the member functions of the neighbor iterator class NbIterator. They are analogous to the functions for EgIterator.


Test Programs

The following test programs may be used to check the compatibility of your implementation. These programs do not check the correctness of your implementation. Even if your implementation compiles and runs correctly with these programs, it does not mean your implementation is error-free. Grading will be done using programs that exercise your implementation much more thoroughly. You must do the testing yourself --- testing is part of programming. Conversely, if your implementation does not compile or does not run correctly with these test programs, then it is unlikely that it will compile or run correctly with the grading programs.

These files, as well as Graph.h, are also available on GL in the directory:

/afs/umbc.edu/users/c/m/cmarron/pub/www/cs341.s19/projects/proj1files/


Implementation Notes


What to Submit

You must submit the following files to the proj1 directory: Graph.cpp and Driver.cpp.

You do not need to submit Graph.h because it should not have changed. If you do happen to place a copy of Graph.h in your submission directory, it will be replaced by a copy of the original version.

If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command:

cp Graph.cpp Driver.cpp ~/cs341proj/proj1/