MCS 572 Cray T3E Group Project Suggestions
Fall 1997
Professor F. B. HANSON
DUE Friday 05 December 1997 in class.
Two (2) copies of the report are due Friday 05 December 1997 in class
(one graded copy will be returned and the other will be used for the
progress report and next class proposal to PSC).
Students will make short presentations of group project results in class,
starting on Wednesday 03 December 1997.
CAUTION: Projects should have sufficient work to effectively utilize
the T3E, but should
not be so time-consuming as to severely affect the performance of other users.
Write a group report (groups of 1 to 2 members, with good load balancing
among the group members) that is a short paper (8 to 15 or so
pages plus appendices) as if for publication, i.e., with
- abstract (short description of problem and results)
- executive summary (give an itemized brief summary of your paper)
- introduction (motivate your problem for the class, citing prior
work)
- problem or method
- results and discussion (should include theoretical explanations of
interesting results and graphs; explain results whether good or bad)
- conclusions (brief)
- acknowledgements (give thanks to others who helped you and to the
Pittsburgh Supercomputing Center for use of the Cray C90 (if you use it) and the T3E)
- references (list articles, books and other documents that you
used as sources)
- appendices: compiler informational code listing (the *.lst file
from f90, or the *.V file if you use the cc compiler) and supporting timings.
You are welcome to make up
your own projects (see the first suggestion), but you should discuss this
with Professor Hanson
beforehand for suggestions. Also let him know whatever project
you select for additional advice,
because even the following ideas are very broad.
WARNING: If you use test or sample floating point arrays in your
project, make sure they are genuine, random floating-point numbers, i.e.,
do not use trivial integers or numbers with patterns. Consult the
class local user's guide for how to run a scalar job to use as a
reference measurement.
See the
Class Cray Local Guide.
You are expected to use MPI for parallel programming
on the T3E (See the MPI-Laplace example
Class Sample Laplace-MPI Fortran Code
or
Class Sample Laplace-MPI C Code
discussed thoroughly in class and
the
Class MPI Help pages).
Use the MPI_Wtime wall timer for measuring performance times, unless you
can find a better timer.
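As a reference point for timings, here is a minimal C sketch (assuming the
class MPI setup on the T3E; the work loop and its size are only placeholders)
showing how a section of code can be bracketed by MPI_Wtime calls on each
processor:

    /* Minimal MPI_Wtime timing sketch; the work loop is a placeholder. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int    rank;
        long   i;
        double t0, t1, s = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);    /* line the processors up first */
        t0 = MPI_Wtime();               /* wall-clock time, in seconds  */

        for (i = 0; i < 1000000; i++)   /* placeholder work to be timed */
            s += 1.0 / (double)(i + 1);

        t1 = MPI_Wtime();
        printf("PE %d: work took %g seconds (s = %g)\n", rank, t1 - t0, s);

        MPI_Finalize();
        return 0;
    }

Report the maximum of the per-processor times (or combine them with
MPI_Reduce) as the time for the parallel section.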
Also, if your project is similar to the one you did
on the Convex Exemplar SPP1200, then you must give an extensive comparison with
your SPP1200 project, so that the work is comparable to what you would do for
a new project topic.
The Project Suggestions
- Own Project.
A Cray T3E project of your own design, such as optimization
of some method connected with your thesis research area, graphical
visualization, another course, or some interesting science-engineering area.
- Statistics Project. Generate suitable sets of random
numbers (make sure they are floating point), each with a different
sample size N. The function `ranf' is a very good random number
generator (RNG), but check it out yourself. Another RNG is `rand'.
See the
Cray Local Guide
or T3E man pages. Describe how you tested the
randomness of your data, e.g., test for a uniform random distribution.
For each set, compute basic statistics, like the mean, variance and
Chi-Square test, in as efficient a vector manner as possible (i.e.,
make use of the extended Fortran90 intrinsic sum function `sum' on
the Cray); a serial starting point is sketched after this list.
Plot the timing T versus N and T versus p. Estimate or compute
and plot the Amdahl vector fraction as a function of N. Compare
speedups and efficiencies relative to N. Is Amdahl's law operative
as the problem size N becomes large? Develop your own performance
model that is appropriate for the behavior of the timing data with
number of processors p, sample size N and Chi-Square bin size Nb.
Does your performance model account for deviations in Amdahl's law?
- Row versus Column Oriented LU Decomposition Loops.
Determine regions of array size where there are efficiency advantages
on the Cray to using column referencing as opposed to row referencing
when reordering the multiple loops of LU decomposition. Is the simple
Fortran column-major storage argument valid, and if not, why not?
How strong is the dependence on loop iteration size N? What about
rectangular (non-square and very thin) matrices? Make sure your
floating-point arrays are genuine; the two loop orderings are sketched
after this list. (See Dongarra, Gustavson and
Karp, SIAM Review, Vol. 26, 1984, pp. 91-122, for the CRAY-1.)
- Validity of Hanson's "Avoid These Things". Investigate
a number of Professor Hanson's Rules of Thumb about "Avoiding
Certain Optimization Hindering Constructs". Find out the validity
on loops (if loops were involved) with sufficient work (i.e., bigger
than the toy class examples). Find regions of work size, if any,
where each rule works. For example: What is the quantitative
difference in overhead between common-block and subroutine-argument
passing? How much does inlining subroutines and functions save?
- Iteration Methods. Make a comparison of the performance
of the Jacobi and Gauss-Seidel methods for elliptic partial differential
equations. Gauss-Seidel is better for serial computers, but what
about parallel and vector computers? (See Ortega, Intro. Parallel
and Vector Solution of Linear Systems, 1988 and related papers.)
See the
Class Sample Laplace-MPI Fortran Code
or
Class Sample Laplace-MPI C Code,
which were adapted from PSC code; serial sweeps of both methods are
sketched after this list.
- Test whether higher or lower levels of optimization give
higher performance. For instance, does the command
`f90 -O[n] ... [pgm].f' lead to faster executables for some values of the
optimization level `[n]'
for matrix multiplication or some other application?
- Compare Performance of MPI Functions/Subroutines.
For instance, compare the Collective Communication routine MPI_Bcast
with the Blocking Point to Point Communication routine MPI_Send along with
MPI_Recv, and with the Nonblocking Point to Point Communication
routine MPI_Isend along with MPI_Irecv; a timing harness for the first
two variants is sketched after this list. Use MPI_Wtime to measure
performance times. (Note that shmem is the T3E's native message-passing
library; see `man shmem'.)
- C90 and T3E Performance Comparison.
Take some application and make a comparison between optimized performance
on the PVP C90 and the MPP T3E.
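Starting-Point Sketches

The C sketches below are hedged starting points for some of the suggestions
above, not prescribed solutions; they use generic C features (e.g., rand and
small fixed sizes) as placeholders for the Cray routines and problem sizes you
would actually use, and they are serial so that you can parallelize and time
them yourself.

For the Statistics Project, a serial sketch of the mean, variance and
Chi-Square uniformity test (substitute ranf or another generator on the T3E):

    /* Serial statistics sketch: N uniform samples, mean, variance, and a
       chi-square statistic for uniformity with NB equal bins.  rand() is
       only a placeholder for ranf or another generator on the T3E.      */
    #include <stdio.h>
    #include <stdlib.h>

    #define N   100000          /* sample size (vary this)   */
    #define NB  64              /* number of chi-square bins */

    int main(void)
    {
        static double x[N];
        double count[NB];
        double mean = 0.0, var = 0.0, chisq = 0.0, expected = (double)N / NB;
        int    i, k;

        for (k = 0; k < NB; k++) count[k] = 0.0;

        for (i = 0; i < N; i++)                 /* uniform samples in [0,1) */
            x[i] = (double)rand() / ((double)RAND_MAX + 1.0);

        for (i = 0; i < N; i++) mean += x[i];
        mean /= N;
        for (i = 0; i < N; i++) var += (x[i] - mean) * (x[i] - mean);
        var /= (N - 1);

        for (i = 0; i < N; i++) {               /* bin counts for the test  */
            k = (int)(NB * x[i]);
            if (k == NB) k = NB - 1;
            count[k] += 1.0;
        }
        for (k = 0; k < NB; k++)
            chisq += (count[k] - expected) * (count[k] - expected) / expected;

        printf("mean = %g  variance = %g  chi-square = %g (%d bins)\n",
               mean, var, chisq, NB);
        return 0;
    }

The three accumulation loops are the natural places to use the Fortran90
intrinsic `sum' (or an MPI reduction) and to attach MPI_Wtime timings as the
sample size N, the bin count Nb and the processor count p vary.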
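For the Row versus Column Oriented LU Decomposition Loops suggestion, a serial
sketch of the two update orderings; note that C stores arrays by rows, so the
stride-1 ordering here is the transpose of the Fortran (column-major) case:

    /* Two loop orderings for the LU update step (pivoting omitted; the
       random diagonal shift keeps the matrix safely factorable).        */
    #include <stdio.h>
    #include <stdlib.h>

    #define MAXN 300

    /* kij ordering: the inner loop runs along a row, stride-1 in C. */
    void lu_kij(double a[MAXN][MAXN], int n)
    {
        int i, j, k;
        for (k = 0; k < n - 1; k++)
            for (i = k + 1; i < n; i++) {
                a[i][k] /= a[k][k];                 /* multiplier */
                for (j = k + 1; j < n; j++)
                    a[i][j] -= a[i][k] * a[k][j];
            }
    }

    /* kji ordering: the inner loop runs down a column, which takes long
       strides in C but would be stride-1 in Fortran.                    */
    void lu_kji(double a[MAXN][MAXN], int n)
    {
        int i, j, k;
        for (k = 0; k < n - 1; k++) {
            for (i = k + 1; i < n; i++)
                a[i][k] /= a[k][k];
            for (j = k + 1; j < n; j++)
                for (i = k + 1; i < n; i++)
                    a[i][j] -= a[i][k] * a[k][j];
        }
    }

    int main(void)
    {
        static double a[MAXN][MAXN];
        int i, j, n = MAXN;

        for (i = 0; i < n; i++)            /* genuine, non-patterned entries */
            for (j = 0; j < n; j++)
                a[i][j] = (double)rand() / ((double)RAND_MAX + 1.0)
                          + (i == j ? (double)n : 0.0);

        lu_kij(a, n);   /* time this call, and lu_kji on a fresh copy */
        printf("a[%d][%d] = %g\n", n - 1, n - 1, a[n - 1][n - 1]);
        return 0;
    }

Time each routine (on a fresh copy of the matrix) over a range of sizes n, and
repeat for thin rectangular shapes.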
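For the Iteration Methods suggestion, serial sweeps of the two methods for the
Laplace equation on a square grid (the class Laplace-MPI codes show how the
grid is distributed across processors):

    /* Serial Jacobi and Gauss-Seidel sweeps for Laplace's equation on an
       (M+2) x (M+2) grid with fixed boundary values.                     */
    #include <stdio.h>

    #define M 100                     /* interior grid points per side */

    static double u[M+2][M+2], unew[M+2][M+2];

    /* Jacobi: new values come only from the old array, so all interior
       points can be updated independently (the easy case to parallelize). */
    void jacobi_sweep(void)
    {
        int i, j;
        for (i = 1; i <= M; i++)
            for (j = 1; j <= M; j++)
                unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                     + u[i][j-1] + u[i][j+1]);
        for (i = 1; i <= M; i++)
            for (j = 1; j <= M; j++)
                u[i][j] = unew[i][j];
    }

    /* Gauss-Seidel: updated values are used immediately, which converges
       faster serially but couples the points within a sweep.             */
    void gauss_seidel_sweep(void)
    {
        int i, j;
        for (i = 1; i <= M; i++)
            for (j = 1; j <= M; j++)
                u[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                  + u[i][j-1] + u[i][j+1]);
    }

    int main(void)
    {
        int i, it;

        for (i = 0; i <= M + 1; i++) u[0][i] = 1.0;   /* one hot boundary edge */

        for (it = 0; it < 100; it++)
            jacobi_sweep();           /* or gauss_seidel_sweep(); time each */

        printf("u near the center after 100 sweeps: %g\n",
               u[M/2 + 1][M/2 + 1]);
        return 0;
    }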
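For the Compare Performance of MPI Functions/Subroutines suggestion, a timing
harness for the first two variants (collective broadcast versus a loop of
blocking sends); a nonblocking MPI_Isend/MPI_Irecv version would be the third
variant:

    /* Compare MPI_Bcast with an equivalent loop of blocking MPI_Send calls,
       timing each variant with MPI_Wtime.  NWORDS is a placeholder size.  */
    #include <stdio.h>
    #include <mpi.h>

    #define NWORDS 10000

    int main(int argc, char *argv[])
    {
        static double buf[NWORDS];
        int        rank, np, i;
        double     t0, t1;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        for (i = 0; i < NWORDS; i++) buf[i] = 0.001 * i;   /* nontrivial data */

        /* Variant 1: collective broadcast from PE 0 */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        MPI_Bcast(buf, NWORDS, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        t1 = MPI_Wtime();
        if (rank == 0) printf("MPI_Bcast:     %g seconds\n", t1 - t0);

        /* Variant 2: the same broadcast with blocking point-to-point calls */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        if (rank == 0)
            for (i = 1; i < np; i++)
                MPI_Send(buf, NWORDS, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
        else
            MPI_Recv(buf, NWORDS, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        t1 = MPI_Wtime();
        if (rank == 0) printf("MPI_Send/Recv: %g seconds\n", t1 - t0);

        MPI_Finalize();
        return 0;
    }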
Web Source: http://www.math.uic.edu/~hanson/t3eproject.html
Email Comments or Questions to Professor Hanson