Math 627 - Introduction to Parallel Computing

Matthias K. Gobbert and Madhu Nayakkankuppam

Fall 2002 - Project Presentations


This page can be reached via my homepage at http://www.math.umbc.edu/~gobbert.

Program for Day 1

Monday, December 09, 2002, 07:00-09:00 p.m., MP 401

  1. 07:00-07:15
    A Comparison of PETSc Linear Solvers on the Poisson Equation
    Toni M. Smith
    The Portable, Extensible Toolkit for Scientific Computation (PETSc) is used in parallel computing to solve large-scale problems. Such problems may, at some point, require the numerical solution of a large linear system. We compare two of the methods available in PETSc: Conjugate Gradient and Generalized Minimum RESidual (GMRES). We test these methods on linear systems generated by the finite difference discretization of the Poisson equation, both with and without the available preconditioners, to determine which combination of method and preconditioner returns the numerical solution in the least time, for the largest system, with the greatest efficiency. We find that the conjugate gradient method with block Jacobi preconditioning is the best option available in PETSc.
    Faculty mentor: Matthias K. Gobbert
    Postscript-file of the report.

  2. 07:20-07:35
    A Matrix-Free Conjugate Gradient Solution of the Poisson Equation
    Kevin P. Allen
    The conjugate gradient method is perhaps the most powerful algorithm for iteratively solving a system of linear equations with a symmetric positive definite system matrix. Parallelization of the method saves time without compromising accuracy of the solution. By using a matrix-free matrix-vector product, the system matrix is never stored, reducing the strain on memory. The combination of these aspects allows us to solve much larger systems of equations much more quickly.
    Faculty mentor: Matthias K. Gobbert
    Postscript-file of the report.

  3. 07:40-07:55
    A Reusable C++ Library for Parallel Maximum Likelihood Phylogenetic Reconstruction
    Rajeev Ayyagari
    A great deal of software, such as PHYLIP and AxML, is available for the estimation of phylogenetic trees from DNA nucleotide sequence data, and much of it runs in parallel using MPI. However, the source code of most existing software has a monolithic, hard-wired structure that greatly increases the effort involved in understanding or modifying these programs. This project produces a library of components for maximum-likelihood estimation of phylogenetic trees that can easily be modified, replaced, or extended through object-oriented techniques such as inheritance. The library parallelizes the tree-estimation process while hiding as much of the parallelization from the casual user as possible: the interface accepts just a communicator and performs the apportioning of work automatically. Nucleotide-wise parallelization was chosen as the quantum of parallelization because it is coarse-grained enough to avoid excessive communication overhead and fine-grained enough to be applicable under most models of evolution. The method is shown to be highly amenable to parallelization.
    Faculty mentor: Madhu Nayakkankuppam
    PDF-file of the report.

  4. 08:00-08:15
    Data Distribution for Parallel Sparse Matrix-Vector Multiplication in the Conjugate Gradient Algorithm for Systems of Linear Equations
    Olena Shevchenko
    Sparse matrix-vector multiplication lies at the heart of many iterative solvers for linear systems, in particular the conjugate gradient method. In these solvers, a multiplication y = A x has to be carried out repeatedly for the same sparse matrix (in the conjugate gradient method, A is square, symmetric, and positive definite), but each time for a different input vector x. On a distributed-memory parallel computer, efficient multiplication requires a suitable distribution of the data and the associated work. In particular, the sparse matrix must be distributed over the p processors of the parallel computer such that each processor holds about the same number of non-zeros and the amount of communication is minimized. Such a data distribution, together with the matrix-vector multiplication itself, was implemented and used in the conjugate gradient method.
    Faculty mentor: Madhu Nayakkankuppam
    Postscript-file of the report.

  5. 08:20-08:35
    A Parallel Solver for Large Systems of ODEs
    Bogdan Gavrea
    The goal of this project is to solve large systems of ordinary differential equations (ODEs) using the parallel solver PVODE. The system of ODEs was derived from a parabolic partial differential equation by discretizing with respect to the spatial coordinates, a method that is also known as the method of lines. The results obtained show that for large problems, linear speedup is achieved.
    Faculty mentor: Matthias K. Gobbert
    PDF-file of the report.

  6. 08:40-08:55
    Parallelizing the 1-D Fast Fourier Transform
    Michael Muscedere
    The Fast Fourier Transform (FFT) algorithm is parallelized to increase execution speed and reduce single-processor storage requirements. The distribution of data between node processors using a block structure of decimated input points is discussed, as is the use of a hypercube communication topology to exchange the necessary information between node processors. The resulting parallel FFT algorithm transforms an AM signal correctly and achieves significant speedup despite the exchange of large amounts of information between node processors.
    Faculty mentor: Matthias K. Gobbert
    Postscript-file of the report.
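
Several of the Day 1 projects above build on the conjugate gradient method with a matrix-free matrix-vector product for the discretized Poisson equation. The following is a minimal serial sketch in Python, an illustration only and not any presenter's code; the function and variable names are our own, and in the parallel versions the vectors, the matrix-vector product, and the dot products are distributed across processors with MPI.

```python
def apply_poisson(x):
    # Matrix-free matvec for the 1-D Poisson matrix tridiag(-1, 2, -1):
    # the matrix is never stored, only its action on a vector is computed.
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(matvec, b, tol=1e-10, maxiter=1000):
    # Conjugate gradient for a symmetric positive definite matrix,
    # supplied only as a matvec callback (the matrix-free idea).
    x = [0.0] * len(b)
    r = b[:]          # residual r = b - A x  (x = 0 initially)
    p = r[:]          # search direction
    rs = dot(r, r)
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Because the action of the system matrix is computed on the fly, the matrix itself is never stored, which is what allows the matrix-free approach to handle much larger systems.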


Program for Day 2

Tuesday, December 10, 2002, 07:00-08:45 p.m., MP 401

  1. 07:00-07:15
    A Nonlinear Equations Solver in Parallel Using ScaLAPACK
    Ravi Siddani
    The Scalable Linear Algebra PACKage (ScaLAPACK) is a library that implements matrix algorithms in parallel. This project examines the ScaLAPACK routines that would be invoked within each iteration of Broyden's algorithm to solve a system of non-linear equations in parallel. Limited timing studies show that when the size of the matrix is large and the block size is small, we achieve a decrease in the time taken for certain portions of the algorithm. Future work will lead to a more concrete statement about time, as well as statements about speedup and efficiency.
    Faculty mentor: Madhu Nayakkankuppam
    Postscript-file of the report.

  2. 07:20-07:35
    A Parallelization of the Batch K-Means Clustering Algorithm
    Matthew C. Siegel
    Clustering large data sets can be time-consuming and processor-intensive. This paper describes a parallel implementation of a popular clustering algorithm, the k-means algorithm, to provide faster clustering solutions. The algorithm was tested by creating 3, 5, and 7 clusters on a 4-node Beowulf cluster of dual-Pentium III 1000 MHz machines; while optimal levels of speedup were not achieved, clear benefits from parallel computing were observed. This work is joint with Dr. Jacob Kogan (Department of Mathematics and Statistics, UMBC).
    Faculty mentor: Madhu Nayakkankuppam
    PDF-file of the report.

  3. 07:40-07:55
    Improved General Circle Detection Algorithm
    Tomasz Macura
    This paper considers the problem of detecting circles in binary images on parallel computers. The author presents improvements to the General Circle Detection Algorithm (GCDA) that was invented by A. Kavaianpour, S. Shoari, and N. Bagherzadeh. The GCDA is based on a transformation that converts circles in an image to families of straight lines. This reduces the problem of circle detection to that of line detection which may be solved by any line detection algorithm. Theoretical analysis of the performance of the Improved General Circle Detection Algorithm (IGCDA) suggests that speedup will be linear. The author's implementation of the IGCDA on a Beowulf cluster of generic personal computers achieved the predicted speedup.
    Faculty mentor: Matthias K. Gobbert
    Postscript-file of the report.

  4. 08:00-08:15
    A Parallel Implementation of Lanczos Algorithm in SVD Computations
    Srikanth V. Kallurkar
    This work presents a parallel implementation of the Lanczos algorithm for SVD computations using MPI. We investigate execution time as a function of matrix sparsity, matrix size, and the number of processes. We observe both speedups and slowdowns for the parallelized version of the Lanczos algorithm; the reasons for each are discussed in the report. This work is joint with Dr. Jacob Kogan (Department of Mathematics and Statistics, UMBC).
    Faculty mentor: Madhu Nayakkankuppam
    PDF-file of the report.

  5. 08:20-08:35
    Parallel Jacobi Iteration Method for the Poisson Problem Using a Two-Dimensional Cartesian Decomposition of the Domain
    Zhibin Sun
    To make the parallel Jacobi iteration method for the Poisson problem more efficient, we use the Cartesian topology facilities of MPI to replace the original one-dimensional decomposition of the domain with a two-dimensional decomposition. The results show that the 2-D decomposition yields a noticeable acceleration in several respects.
    Faculty mentor: Matthias K. Gobbert
    Postscript-file of the report.
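
The batch k-means algorithm discussed in the second Day 2 project above alternates an assignment step and a centroid-update step. The following is a serial sketch in Python, an illustration only and not the presenter's implementation; in a parallel version the data points are distributed across processes, and the per-cluster sums and counts are combined with a global reduction such as MPI_Allreduce before the update step.

```python
def kmeans(points, centroids, iters=20):
    # Batch k-means: alternate an assignment step and an update step.
    # Parallel version: distribute `points` across processes, compute
    # local sums/counts, then reduce them globally before updating.
    k = len(centroids)
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            # Assignment step: nearest centroid by squared distance.
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Update step: each centroid becomes the mean of its cluster.
        for j in range(k):
            if counts[j]:
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids
```

Since the assignment step is independent per point and the update step needs only the cluster-wise sums and counts, the algorithm parallelizes naturally over the data points.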


Copyright © 2001-2002 by Matthias K. Gobbert. All Rights Reserved.
This page version 1.7, December 2002.