MCS 572 PSC TCS and MPI Group/Individual Project Suggestions
Spring 2003
Professor F. B. Hanson
Project Report DUE Wednesday 02 April 2003 in class.
Students will make short presentations of group project results in class,
starting on Wednesday 02 April 2003, if any group is prepared.
CAUTION: Projects should have sufficient work to effectively
utilize the TCS with MPI, but should not be so time consuming
as to severely affect the performance of other users. Write a group
(group size 1 or 2, with good load balancing among the group
members) report that is a short professional paper (8 to 15 or so pages plus
appendices) as if for publication, i.e., with
- abstract (short description of problem and results)
- executive summary (give an itemized brief summary of your paper)
- introduction (motivate your problem for the class, citing prior
work)
- problem or method
- results and discussion (should include theoretical explanations of
interesting results and graphs; explain results whether good or bad)
- conclusions (brief, emphasizing your main results)
- acknowledgements (give thanks to others that helped you and to the
Pittsburgh Supercomputing Center for use of the TCS
if you use it; see the tcs03guide.html acknowledgement)
- references (list articles, books and other documents that you
used as sources)
- appendices: a code listing with compiler information (to identify
the compiler command and options used), output files (samples
if there are too many) and supporting performance timings.
It is better if you can make up your own project out of thesis research,
this or other class topics, or other research,
but you should discuss it with Professor Hanson
beforehand for advice on your plans. Let him know whatever project
you select so he can give additional advice, because even the following ideas
are very broad.
WARNING: If you use test or sample floating point arrays in your
project, make sure they are genuine, random floating point values, i.e.,
do not use trivial integers or numbers with patterns. Also, make sure
your MPI code represents a superproblem, else you will get slowdown
rather than speedup, as many unfortunately found in the TCS Starter
problem because they used too small a problem size. Consult the
class local user's guide for how to run a scalar job to use as a
reference measurement. Also, this must primarily be a TCS
problem, although comparisons can be made to ARGO, Platinum or
any of the departmental clusters.
TCS Project Suggestions
- Own Project.
A PSC TCS with MPI project of your own design, such as optimization
of some method connected with your thesis research area, graphical
visualization, another course, or some interesting science-engineering area.
- Statistics Project. Generate suitable sets of random
numbers (make sure they are floating point), each with a different
sample size N.
See the TCS Local Guide or the TCS man pages. Describe how you tested the
randomness of your data, e.g., test for a uniform random distribution.
For each set, compute basic statistics, like the mean, variance and
Chi-Square test, in as efficient a vector manner as possible.
Plot the timing T versus N and T versus p. Estimate or compute
and plot the Amdahl vector fraction as a function of N. Compare
speedups and efficiencies relative to N. Is the Amdahl law operative
as the problem size N becomes large? Develop your own performance
model that is appropriate for the behavior of the timing data with
number of processors p, sample size N and Chi-Square bin size Nb.
Does your performance model account for deviations in Amdahl's law?
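As a rough starting point for the timing runs, here is a minimal MPI C
sketch, not a required design: the total sample size N, the seeds, and
drand48 as the generator are assumptions. Each process draws N/p samples,
MPI_Reduce combines the partial sums into a global mean and variance, and
MPI_Wtime gives one point for the T versus N and T versus p plots.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int p, id;
        long N = 1000000, n, i;        /* total sample size N is assumed */
        double sum = 0.0, sum2 = 0.0, gsum, gsum2, x, t0, t1, mean, var;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);

        n = N / p;                     /* samples per process */
        srand48(12345L + id);          /* distinct seed on each process */
        t0 = MPI_Wtime();
        for (i = 0; i < n; i++) {
            x = drand48();             /* genuine random floating point */
            sum  += x;
            sum2 += x * x;
        }
        MPI_Reduce(&sum,  &gsum,  1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        MPI_Reduce(&sum2, &gsum2, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        t1 = MPI_Wtime();

        if (id == 0) {                 /* population mean and variance */
            mean = gsum / (double)(n * p);
            var  = gsum2 / (double)(n * p) - mean * mean;
            printf("N=%ld p=%d mean=%g var=%g T=%g sec\n",
                   n * p, p, mean, var, t1 - t0);
        }
        MPI_Finalize();
        return 0;
    }

Adding the Chi-Square binning and sweeping N and p over the values you plan
to plot would turn this skeleton into the project's measurement harness.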
- Row versus Column Oriented LU Decomposition Loops.
Determine regions of array size where there are efficiency advantages
on the TCS using column referencing as opposed to row referencing
in reordering the LU decomposition multiple loops. Is the simple
Fortran column-major storage argument valid, and if not, why not?
How strong is the dependence on loop iteration size N? What about
rectangular (non-square and very thin) matrices? Make sure your
floating point arrays are genuine. (See Dongarra, Gustavson and
Karp, SIAM Review, Vol. 26, 1984, pp. 91-122, for the CRAY-1.)
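Before tackling the full LU loops, the stride effect can be isolated with a
minimal serial C sketch like the following; the array size N and clock()
timing are assumptions. Recall that C is row-major where Fortran is
column-major, so the winning loop order is reversed between the languages.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 1000

    double a[N][N];

    int main(void) {
        double s;
        clock_t t;
        int i, j;

        for (i = 0; i < N; i++)        /* genuine random floating point, */
            for (j = 0; j < N; j++)    /* not patterned integers         */
                a[i][j] = drand48();

        t = clock();
        s = 0.0;
        for (i = 0; i < N; i++)        /* row order: unit stride in C */
            for (j = 0; j < N; j++)
                s += a[i][j];
        printf("row order:    T=%g sec (s=%g)\n",
               (double)(clock() - t) / CLOCKS_PER_SEC, s);

        t = clock();
        s = 0.0;
        for (j = 0; j < N; j++)        /* column order: stride N in C */
            for (i = 0; i < N; i++)
                s += a[i][j];
        printf("column order: T=%g sec (s=%g)\n",
               (double)(clock() - t) / CLOCKS_PER_SEC, s);
        return 0;
    }

Printing the sum s keeps the compiler from optimizing the loops away.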
- Iteration Methods. Make a comparison of the performance
of Jacobi and Gauss-Seidel methods for elliptic partial differential
equations. Gauss-Seidel is better for serial computers, but what
about parallel and vector computers? (See Ortega, "Introduction to
Parallel and Vector Solution of Linear Systems," 1988, or the newer
Golub and Ortega, "Scientific Computing: An Introduction with Parallel
Computing," 1993, and related papers.)
See the Class Sample Laplace-MPI C Code; see also the F90 version if
interested.
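For orientation, and separate from the class sample code, here is a minimal
serial C sketch of one Jacobi sweep for Laplace's equation; the grid size M,
the sample boundary value, and the max-norm change measure are assumptions.
Each new value depends only on the old array u, so the sweep has no
loop-carried dependence and vectorizes and parallelizes freely, whereas
Gauss-Seidel updates in place and makes its inner loops dependent.

    #include <stdio.h>
    #include <math.h>

    #define M 100

    double u[M+2][M+2], unew[M+2][M+2];

    double jacobi_sweep(void) {
        double diff = 0.0, d;
        int i, j;
        for (i = 1; i <= M; i++)
            for (j = 1; j <= M; j++) {
                unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                   + u[i][j-1] + u[i][j+1]);
                d = fabs(unew[i][j] - u[i][j]);
                if (d > diff) diff = d;
            }
        for (i = 1; i <= M; i++)       /* copy back for the next sweep */
            for (j = 1; j <= M; j++)
                u[i][j] = unew[i][j];
        return diff;                   /* max change, for convergence tests */
    }

    int main(void) {
        int k;
        u[0][M/2] = 1.0;               /* a sample boundary value; set the
                                          boundary as your problem requires */
        for (k = 1; jacobi_sweep() > 1.0e-6 && k < 10000; k++)
            ;
        printf("stopped after %d sweeps\n", k);
        return 0;
    }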
- Test whether higher or lower levels of optimization give
higher performance. For instance, does the command
`cc -O[n] ... [pgm].c' lead to faster executables for some values of the Option
Level `[n]' for matrix multiplication or some other large scale application?
Similarly for F90.
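A minimal C timing harness like the following sketch can be compiled
repeatedly with different settings, e.g. `cc -O0', `cc -O2', `cc -O4', and
the printed times compared; the matrix size N, clock() timing and the ikj
loop order are assumptions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 500

    double a[N][N], b[N][N], c[N][N];

    int main(void) {
        int i, j, k;
        clock_t t;

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                a[i][j] = drand48();   /* genuine random floating point */
                b[i][j] = drand48();
                c[i][j] = 0.0;
            }
        t = clock();
        for (i = 0; i < N; i++)
            for (k = 0; k < N; k++)    /* ikj order: unit stride in C */
                for (j = 0; j < N; j++)
                    c[i][j] += a[i][k] * b[k][j];
        printf("N=%d matmul T=%g sec (c[0][0]=%g)\n", N,
               (double)(clock() - t) / CLOCKS_PER_SEC, c[0][0]);
        return 0;
    }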
- Test F90 extensions
for enhanced performance on some large scale
problem. See the test problem in the Local TCS Guide.
- Compare Performance of MPI Functions/Subroutines.
For instance, compare the Collective Communication routine MPI_Bcast
with the Blocking Point to Point Communication routine MPI_Send along with
MPI_Recv, and with the Nonblocking Point to Point Communication
routine MPI_Isend along with MPI_Irecv. Use MPI_Wtime to measure performance
times. (Note shmem is the TCS native message passing library.
See `man shmem'.) Compare the MPI versions of Send and Recv with the
sequence Irecv, Send and Wait for MPI.
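As one possible skeleton for such a comparison, the following MPI C sketch
times MPI_Bcast against an equivalent loop of blocking MPI_Send/MPI_Recv
calls from the root, using MPI_Wtime; the message length LEN and repetition
count REPS are assumptions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define LEN  100000
    #define REPS 100

    double buf[LEN];

    int main(int argc, char *argv[]) {
        int p, id, r, dest;
        double t0, tbcast, tsend;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        for (r = 0; r < LEN; r++)
            buf[r] = drand48();        /* genuine random floating point */

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (r = 0; r < REPS; r++)
            MPI_Bcast(buf, LEN, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        tbcast = MPI_Wtime() - t0;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (r = 0; r < REPS; r++) {   /* hand-coded broadcast from root 0 */
            if (id == 0)
                for (dest = 1; dest < p; dest++)
                    MPI_Send(buf, LEN, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
            else
                MPI_Recv(buf, LEN, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        }
        tsend = MPI_Wtime() - t0;

        if (id == 0)
            printf("p=%d: Bcast T=%g sec, Send/Recv T=%g sec\n",
                   p, tbcast, tsend);
        MPI_Finalize();
        return 0;
    }

Replacing the blocking pair with MPI_Isend/MPI_Irecv plus MPI_Wait gives the
nonblocking leg of the comparison.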
- Computation versus Communication. Take a suitable MPI application
and try to find the optimal ratio of computation to
communication and what the optimal message, i.e., array, size would be.
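One way to get the communication side of that ratio is a ping-pong
measurement between two processes over a range of message sizes, as in this
minimal MPI C sketch; the size range and repetition count are assumptions.
The time per one-way message versus array size, compared against measured
computation rates, gives the computation-to-communication ratio.

    #include <stdio.h>
    #include <mpi.h>

    #define MAXLEN (1 << 20)
    #define REPS   50

    double buf[MAXLEN];

    int main(int argc, char *argv[]) {   /* run with at least 2 processes */
        int id, len, r;
        double t0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);

        for (len = 1; len <= MAXLEN; len *= 4) {
            MPI_Barrier(MPI_COMM_WORLD);
            t0 = MPI_Wtime();
            for (r = 0; r < REPS; r++) {
                if (id == 0) {         /* send and wait for the echo */
                    MPI_Send(buf, len, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, len, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                             &status);
                } else if (id == 1) {  /* echo the message back */
                    MPI_Recv(buf, len, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                             &status);
                    MPI_Send(buf, len, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
                }
            }
            if (id == 0)
                printf("len=%d doubles: %g sec per one-way message\n",
                       len, (MPI_Wtime() - t0) / (2.0 * REPS));
        }
        MPI_Finalize();
        return 0;
    }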
Web Source: http://www.math.uic.edu/~hanson/mcs572/tcs03project.html
Email Comments or Corrections or Questions to Professor Hanson.