MCS 572 Individual Cray C90 Starter Problem Fall 1997

Professor F. B. HANSON

DUE Monday 24 Nov 1997 in class (this is individual, not group, homework)

Optimize the code

c90start.f

on the PSC Cray C90 by doing whatever is necessary to get the best performance, provided that (1) all variables have the same final storage values in the optimized code as in the original code, WITHIN REASON; (2) the code runs WITHOUT MULTITASKING; and (3) no work is taken out of the original timing loop, for example by using new data or parameter statements. A copy of this code can be found

by clicking c90start.f
or by using anonymous FTP to `www.math.uic.edu', change directory to `pub/Hanson/MCS572' and get the file `c90start.f'.

In particular, use the timer 'second' in the code with the optimizing Cray Fortran compile-link command:

cf77 -Wf"-em" -o start c90start.f &

(your compiler information listing should result in `start.l') and execute as

run start >& start.output &

in order to report

Summary Page with
1. Total user time for original code, using an average of a sample of 4 user timings.
2. Total user time for tuned code, using an average of a sample of 4 user timings.
3. Ratio of the original to tuned user cpu times.
4. Recompile and rerun the code with the higher-level scalar and inlining optimizations `-O scalar2 -inline2' instead of the default optimization option, and then compare the respective tuned and untuned versions with ratios of the "default" to the "revised" optimization times.
Documentation:
1. Output results for original and tuned codes.
2. Document with comments what tuning was performed on each tuned loop.
3. Compiler optimization reports, before and after optimization tuning.
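
The arithmetic behind Summary items 1 to 3 can be sketched as below. This is only an illustration: the program name, the variable names, and above all the eight timing values are HYPOTHETICAL placeholders, not measurements from the C90, and must be replaced by your own four-sample user timings.

```fortran
program timing_summary
  implicit none
  ! HYPOTHETICAL placeholder timings in seconds, NOT measured results.
  ! Replace each array with your own four user timings.
  real, parameter :: t_orig(4)  = (/ 10.0, 12.0, 11.0, 11.0 /)
  real, parameter :: t_tuned(4) = (/ 0.5, 0.5, 0.5, 0.5 /)
  real :: avg_orig, avg_tuned, ratio

  ! Items 1 and 2: average of a sample of 4 user timings.
  avg_orig  = sum(t_orig)  / real(size(t_orig))
  avg_tuned = sum(t_tuned) / real(size(t_tuned))

  ! Item 3: ratio of the original to the tuned user CPU time.
  ratio = avg_orig / avg_tuned

  print *, 'average original user time (s): ', avg_orig
  print *, 'average tuned user time (s):    ', avg_tuned
  print *, 'ratio original/tuned:           ', ratio
end program timing_summary
```

The same calculation, applied to the `-O scalar2 -inline2' runs, gives the "default" to "revised" ratios asked for in item 4.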

Be sure to label all of the above items for identification. Try to eliminate as many of the Cray Fortran (cf77) compiler's informational messages about non-optimized code as possible, make maximal use of Fortran 90 array extensions, and use compiler directives only where needed. However, the FINAL storage into scalar variables and arrays must be the same as in the original code. The best way to start is to temporarily put timers around all the loops in order to find the most time-consuming loop, then work down to the smaller loops. Your final times should be the difference between the times at the end and at the beginning of the code, less timer overhead, as in the original code.
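
A minimal sketch of temporary timers around one loop follows. It is not taken from c90start.f: the loop, the array names, and the size `n` are invented for illustration, and the standard `cpu_time` intrinsic is used as a stand-in for the Cray timer `second` named in the assignment, on the assumption that the two play the same role.

```fortran
program time_loops
  implicit none
  integer, parameter :: n = 100000      ! hypothetical problem size
  real :: a(n), b(n)
  real :: t0, t1, t2, overhead, loop_time
  integer :: i

  b = 1.0

  ! Estimate the cost of one timer call from two back-to-back readings.
  call cpu_time(t0)
  call cpu_time(t1)
  overhead = t1 - t0

  ! Temporary timers around one candidate loop (illustrative loop only).
  call cpu_time(t1)
  do i = 1, n
     a(i) = 2.0*b(i) + 1.0
  end do
  call cpu_time(t2)

  ! CPU time spent in the loop, less timer overhead.
  loop_time = (t2 - t1) - overhead

  print *, 'loop 1 CPU seconds: ', loop_time
  ! Print a checksum so the compiler cannot discard the loop as dead code.
  print *, 'checksum: ', sum(a)
end program time_loops
```

Repeating the timer pair around each loop in turn shows which loop dominates; the temporary timers should be removed again before the final timings are taken.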

Try to make the code fit the Cray vector model. Provided the results are correct, your performance will be rated inversely to the total time spent in the tuned part of your program.
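
As one hedged illustration of fitting a loop to the vector model, the sketch below rewrites a loop containing a branch and a scalar accumulation in array form, then checks that the final stored values agree. The loop is invented for illustration and does not come from c90start.f; the code is standard Fortran 90, and whether cf77 accepts each construct (`where`, `sum`) is an assumption to check against the compiler documentation.

```fortran
program vector_sketch
  implicit none
  integer, parameter :: n = 1000        ! hypothetical array length
  real :: x(n), y(n), y_ref(n)
  real :: s, s_ref
  integer :: i

  do i = 1, n
     x(i) = real(i)
  end do

  ! Original style: a branch and a scalar accumulation inside the loop.
  s_ref = 0.0
  do i = 1, n
     if (x(i) > 500.0) then
        y_ref(i) = 2.0*x(i)
     else
        y_ref(i) = x(i)
     end if
     s_ref = s_ref + y_ref(i)
  end do

  ! Array form: the branch becomes a masked assignment and the
  ! accumulation becomes a reduction intrinsic.
  where (x > 500.0)
     y = 2.0*x
  elsewhere
     y = x
  end where
  s = sum(y)

  ! Confirm that the final storage matches the original loop.
  print *, 'max |y - y_ref| = ', maxval(abs(y - y_ref))
  print *, 's, s_ref        = ', s, s_ref
end program vector_sketch
```

A reduction such as `sum` may add the terms in a different order than the original loop. Here every partial sum is an exactly representable integer, so the results match, but in general the rounding can differ slightly, which is one place where the "WITHIN REASON" allowance on final storage values applies.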

Notes:

Hint: the tuned time should be much smaller than the original. (Do not be misled by small improvements on the borg; a ratio of order 600 is a good improvement for the C90, and double that is very good, but very difficult.)
A Link to this problem source is on the MCS572 Class Homepage.
The Class Local Cray Guide is Ready for this C90 Assignment.

See especially the section: Annotated PSC Cray C90 Sample Session.
There will be no C version unless and until the clock problem can be fixed.

Please report any problems to Professor Hanson.
Web Source: http://www.math.uic.edu/~hanson/c90start.html