MCS572 UIC Cray User's Local Guide to
NPACI Cray T90 Vector Multiprocessor
and T3E Massively Parallel Processor

version 14.00
30 November 2000

F. B. Hanson

Mail address:

Department of Mathematics, Statistics, and Computer Science

University of Illinois at Chicago

851 S. Morgan; SEO, MC 249

Chicago, IL 60607-7045

Office address:

Room: 718 SEO

Hanson World Wide WEB Home Page:

http://www.math.uic.edu/~hanson/

UIC Fall 2000 Course:

MCS 572 Introduction to Supercomputing

MCS 572 Class World Wide WEB Home Page:

http://www.math.uic.edu/~hanson/mcs572/

Acknowledgement:

Project MCS 572 Introduction to Supercomputing was supported in part by National Science Foundation cooperative agreement ACI-9619020 through computing resources provided by the National Partnership for Advanced Computational Infrastructure at the San Diego Supercomputer Center through NPACI Account UIL203 to Principal Investigator Floyd Hanson.

Introduction.

T90 Overview.
T3E Overview.
Supercomputer Centers Overview.
Guide Notation.

Background References.

Annotated NPACI Cray T90 Sample Session.

Annotated NPACI Cray T3E Sample Session.

ftp File Transfers between NPACI/Crays/UNICOS and UIC.

ftp File Transfers at the NPACI Crays.
ftp File Transfers from UIC UNIX.
ftp File Transfers from the UIC PC Labs.
ftp File Transfers at UIC.

Execution of T90 Cray FORTRAN90 (f90) or Cray C.

Example 1: Execution using the Terminal for Input and Output.
Example 2: Execution using Input and Output with Files.
Modifications for C: Compile and Execution with C.

UNIX Command Dictionary.

Cray UNICOS Specific Unix Commands.

UNICOS Special Information Commands.
T90 UNICOS f90 Compile, Load and Execution Commands.
UNICOS C Language Commands.
UNICOS Performance Commands.
UNICOS makefile Commands.
UNICOS Mail Commands.
UNICOS Network Queueing System (NQS).

Interrupts Dictionaries Telnet and UNIX.

UNIX Interrupts Dictionary.

T90 Fortran90 and other Extensions.

T90 Fortran90 (f90) Compiler Options.
T90 Fortran90 (f90) Miscellaneous Extensions.
Fortran90 Array Construction Functions.
Fortran90 Array Reduction Functions.
Fortran90 Array Manipulation Functions.
Fortran90 Array Location Functions.
Fortran90 Array Matrix Multiplication Functions.
Fortran90 Array Functions TEST CODE.
T90 Fortran90 (f90) Library Functions.
T90 Fortran90 (f90) Compiler Scalar Optimization Directives.
T90 Fortran90 (f90) Compiler Loop Directives.
T90 Fortran90 (f90) Compiler Storage Directives.
T90 Fortran90 (f90) Compiler Diagnostic Directives.
T90 Fortran90 (f90) Multi-Tasking Options.

MPI Message Passing Programming on Crays.

PVM Message Passing Programming on Crays.

Cray T90 f90 and cc Timing Utility Functions.

T90 Fortran90 (f90) Timing Utility Functions.
cc Timing Utility Function.
Table of T90/T3E Timers.
T3E MPI Wall Timer.

Introduction

This User's Local Guide is intended to be a sufficient, hands-on introduction to the San Diego Supercomputer Center/National Partnership for Advanced Computational Infrastructure (SDSC/NPACI) Cray T90 (Cray Y-MP T916/14512) Parallel Vector Processor (PVP) and Cray T3E (Cray T3E-600 LC256) Massively Parallel Processor (MPP) for our MCS 572 Introduction to Supercomputing class. The T90 and T3E are tightly coupled at NPACI, forming a heterogeneous computing environment. Both the T90 and T3E run variations of the Unix operating system. Cray, Inc. is the current name of the original company, Cray Research, Inc.; it was re-formed when Tera Computer Company bought the Cray business from Silicon Graphics, Inc. (SGI), with the merged company taking on the famous Cray supercomputing name. Tera developed the Multi-Threaded Architecture (MTA) computer, and its first machine is the MTA supercomputer node at SDSC/NPACI. The NPACI Class Account for MCS572 Fall 2000 is `UIL203'.

T90 Overview.

The T90 is a pipelined vector multiprocessor type of supercomputer with 14 vector processors. The T90 is also called a Parallel Vector Processor (PVP) by Cray and NPACI, and an SMP (Shared Memory Processor or, more loosely, Symmetric Multiprocessor). See the NPACI official users guide: Using the NPACI Cray T90. Also see the NPACI slide tutorial: Cray T90/J90 Hardware Overview. What does the CRAY T90 look like? CRAY T90 Picture.

T90 Processing Units.

The T90 multiprocessor uses Cray Research Incorporated custom silicon CPUs with a clock speed of 440 MHz or clock period of 2.273 nanoseconds, and each processor has a peak performance of 1.7 GigaFlops. Each processor has 8 vector registers with 128 words (vector elements) of eight bytes (64 bits) each. The memory on this traditional Cray is all physical memory, i.e., Seymour Cray designed machines with ``real memory'', no virtual memory or associated memory paging. Cray T90 single precision uses the IEEE single precision floating point format, a four byte or 32 bit word, of which only 23 bits are used for the fraction. (By contrast, the older Cray PVPs like the C90 used 64 bits for the `real' data type in the Cray Floating Point (CFP) Format, closer to IEEE double precision, while the fraction occupies 3 bytes of the 4 byte IBM single precision word.) Cray T90 IEEE floating point double precision is 64 bits with a 52 bit fraction.
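The clock period and fraction widths quoted above can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration and are not Cray library routines, and the fraction widths reported are those of whatever host compiles the code, which agree with the T90 figures only if that host also uses IEEE formats.

```c
#include <assert.h>
#include <float.h>

/* Clock period in nanoseconds for a clock rate given in MHz
   (hypothetical helper; 1 MHz corresponds to a 1000 ns period). */
double clock_period_ns(double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return 1000.0 / clock_mhz;
}

/* Explicitly stored fraction bits of the host float and double types.
   FLT_MANT_DIG and DBL_MANT_DIG count the implicit leading bit of a
   binary format, so one is subtracted (assumes FLT_RADIX is 2). */
int single_fraction_bits(void) { return FLT_MANT_DIG - 1; }
int double_fraction_bits(void) { return DBL_MANT_DIG - 1; }
```

Calling `clock_period_ns(440.0)` reproduces the 2.273 ns T90 clock period, and on an IEEE host the two fraction functions return the 23 and 52 bits cited in the text.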

T90 Pipelines.

There are 2 vector fetch pipes and one vector store pipe per processor. The processor has a vector length (VL) scalar register to hold the current vector length in the vector registers and a 128-element vector mask register for conditional storage calculations. Each CPU has chainable dual vector pipes that handle odd and even numbered operands in alternate pipes for vector addition and scalar-vector multiplication, leading to an asymptotic speed up of 4 due to the dual pipes. The independent scalar, vector and floating point function units (e.g., add and multiply, with division and square root performed in hardware) are pipelined, delivering one result per clock cycle after start-up, and they can be chained and overlapped. (Note: there is no divide pipeline on the classic Crays like the C90, which perform division with both the reciprocal and multiply floating point function units, so the Cray C90 will give slightly different results than computers with a pipelined divide.) For instance, the start up or the number of stages of the vector add pipe is 6, leading to an asymptotic speedup of six.

T90 Benchmark Performance.

The T90 Cray Y-MP T916/14512, installed in 1997, ranked as the 328th top computer in the world and has Hockney Linear Model (see MCS572 class notes) parameters of Rmax=17.29GF, Rpeak=24.78GF, Nmax=unlisted and N1/2=unlisted for linear algebra, according to the Top500 Report of June 1998; however, the T90 no longer appears in the June 2000 report (see the Top 500 Class Summary). Similarly, Gunter's World's Most Powerful Computing Sites of August 1998 lists the T90 at 28 GF, but the T90 does not appear in the August 2000 report (see the Most Powerful Computing Sites Class Summary). NPACI lists a peak performance of 1.76 GF per processor, which would imply 24.6 GF with ideal parallelization using all 14 processors.

TOP500 Supercomputer Sites at Netlib (June 2000).

World's Most Powerful Computing Sites.

Caution: Theoretical peak values will depend on who is reporting them, since there are several different models used.
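As a rough check on the T90 figures quoted in this subsection, the following C sketch multiplies out the aggregate peak and forms the ratio of the benchmark rate to the peak rate. The function names are illustrative assumptions and do not come from any Cray or Top500 software.

```c
#include <assert.h>

/* Aggregate peak rate in GF, assuming ideal parallelization so that
   every CPU runs at its own peak simultaneously. */
double aggregate_peak_gf(int ncpus, double peak_per_cpu_gf)
{
    assert(ncpus > 0 && peak_per_cpu_gf > 0.0);
    return ncpus * peak_per_cpu_gf;
}

/* Fraction of the theoretical peak achieved by a measured rate. */
double fraction_of_peak(double rmax_gf, double rpeak_gf)
{
    assert(rpeak_gf > 0.0);
    return rmax_gf / rpeak_gf;
}
```

With 14 processors at 1.76 GF each, `aggregate_peak_gf` gives 24.64 GF, consistent with the 24.6 GF quoted, while `fraction_of_peak(17.29, 24.78)` gives about 0.70, i.e., the reported Rmax is roughly 70 percent of the reported Rpeak.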

T90 Memory Units.

The shared main memory (RAM, called common memory (CM) by Cray) of the T90 is 512 Megawords (MW) or 4 Gigabytes (8 bytes per word, called double words on most non-Cray systems) with a peak memory bandwidth of 450GB/sec. Each word is 64 bits, and the memory is arranged in a total of perhaps 256 banks with a bank cycle time of 7 clock periods. Each processor has 5 parallel memory ports (two fetch (2 words wide), one store (2 words wide), one for the 8KB-8Way scalar cache (1 word wide) and one instruction (1 word wide, for code)), with a theoretical maximum memory bandwidth of 21GB per sec per CPU or 294GB/sec per T90, but only 196GB per sec total is achievable. The T90 YMP uses B and T registers for temporary stores to get around the lack of a cache. The Cray YMP has gather-scatter in hardware for indirectly addressed vectors. However, the default home directory hard disk space available to a user is less than 60 MB at NPACI, while the temporary work directory `/work/[npaci-login-id]' is limited to a huge 450 GB and is subject to purges in 96 hours without backups. The T90 also has long term storage in the High Performance Storage System (HPSS), which is used for storing super size files with an FTP-like command `pftp'. For more information on the NPACI T90 try the command "target" on "t90.npaci.edu" (see the Annotated target output).
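The unit conversions in the memory figures above can be sketched in C as follows. The helper names are hypothetical, and binary units are assumed (1 MW = 2^20 words, 1 GB = 2^30 bytes), which is what makes 512 MW of 8 byte words come out to exactly 4 GB.

```c
#include <assert.h>

/* Memory size in Gigabytes from a size in Megawords, assuming binary
   units: Megawords times bytes per word is Megabytes, and 1024 MB = 1 GB. */
double megawords_to_gigabytes(double megawords, int bytes_per_word)
{
    assert(megawords >= 0.0 && bytes_per_word > 0);
    return megawords * bytes_per_word / 1024.0;
}

/* Aggregate memory bandwidth in GB/sec if every CPU sustained its
   own per-CPU theoretical rate at the same time. */
double aggregate_bandwidth_gb(int ncpus, double per_cpu_gb_per_sec)
{
    assert(ncpus > 0 && per_cpu_gb_per_sec >= 0.0);
    return ncpus * per_cpu_gb_per_sec;
}
```

Here `megawords_to_gigabytes(512.0, 8)` returns 4.0 and `aggregate_bandwidth_gb(14, 21.0)` returns 294.0, matching the theoretical figures in the text; the lower 196GB per sec achievable figure is a measured quantity and cannot be derived this way.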

T90 UNICOS Operating System.

UNICOS is the UNIX Cray Operating System, a variant of UNIX with Cray extensions. Current Cray parameters can be found with the UNICOS target command, with explanations of the output units available from the man target command. This local guide assumes the user is using the user oriented Unix C-Shell (/bin/csh); otherwise the user will have a lot of problems trying to use other shells like `sh' (Bourne Shell) or `ksh' (Korn Shell).

Shell Caution: If you have trouble with the compiler command syntax in this guide, it may be that you have your borg account shell set as the Korn shell (ksh) rather than the user standard C-shell (csh). You can change your shell for the current session by the command

/bin/csh

or you can change the shell permanently to a C-shell for which this guide is written by the command:

chsh [user_name] /bin/csh

where `[user_name]' must be replaced by your actual borg account name, but you must also log out and log back in again to activate the shell change. Using a C-shell should make your access much easier, while the Korn shell (ksh) or generic shell (sh) would be more powerful if you were doing computer systems work rather than scientific computing. At ACCC, new student accounts are set up with a Korn shell (ksh) in the ACCC template, probably because that is what the systems people use.

ACCC Page for UIC HP9000 (borg) features.

Many needed files are hidden away deep in the directory structure, so you must specify the appropriate paths in your .login or .cshrc if using the C-Shell (or .profile if using the K-Shell (ksh)) so that you do not have to do it with each use. Hence, C-Shell users will need to add path lines in their .login or .cshrc files.

T90 Login and File Transfer.

T90 Secure Login: Access to the T90, called `t90', MUST be by the Secure Shell SSH Protocol, which can readily be found on the UIC student computer server `icarus', by typing either of the commands:

ssh t90.npaci.edu -l [npaci_user_id]

ssh [npaci_user_id]@t90.npaci.edu

where for MCS572 your `[npaci_user_id]' is of the form `ux4526??' for Fall 2000, with `??' a unique pair of digits. The latter form does not seem so robust, especially on the T3E. If ssh has difficulty with the Unix ".ssh/known_hosts" file (the name will differ on other platforms), then edit the file by deleting the entry for the node that is giving the problem, since the ssh key may have expired, and try the ssh command again.

SSH works like the Unix remote login command `rlogin', but encrypts your password so that it is nearly impossible to steal. The commands `rlogin' or `telnet' do not work with the `t90', resulting in the response ``t90.npaci.edu: Connection refused''. Secure Shell ssh sets up a key on your accessing computer for t90.npaci.edu, so the first time you may be asked to `continue' to establish this key and should respond with `yes' (a simple `y' will not work). The key should work fine thereafter, unless the hostname file is corrupted or the key expires. For more ssh information see

SDSC/NPACI SSH Security Page.

Secure Shell ssh software is also available for Windows PCs, Macintosh and other platforms:

Secure Access Documentation.

{Remark: For Windows, use either TTSSH or PUTTY, the latter is simpler to use but not as high powered.}

T90 File Transfer: File transfer by ftp TO the `t90' from another computer is prohibited; instead, file transfer is accomplished by the universal TCP/IP Internet Protocol command `ftp', issued ONLY FROM the `t90':

ftp [my_uic_computer].[my_department_id].uic.edu

to your UIC computer. It may be possible to use `sftp' from your UIC computer, but that is presently too difficult.

If available, secure copy `scp' (related to the Unix remote copy `rcp' command, except `scp' protects passwords) will work for SCP to or from the T90:

{To SCP from the NPACI T90 to UIC, Enter:}

``t90% '' scp [t90-filename] [uic-id]@[node].uic.edu:[uic-filename] (CR)

{To SCP from UIC to the NPACI T90, Enter:}

``t90% '' scp [uic-filename] ux4526[??]@t90.npaci.edu:[t90-filename] (CR)

{You may also place directory name in front of the target filename, e.g., `[uic-id]@[node].uic.edu:[directory]/[uic-filename]'.}

T90 Programming Languages and Compilers.

The usual programming languages for the T90 are Cray Fortran 90 (f90), Cray Standard C (cc), and Cray C++ (CC), with the degree of vectorization being about the same for cc and f90.

T90 users have limited storage quotas (less than 30 MB) in their T90 home directories. For much larger files, there are the users' work directories `/work/[npaci-user-id]' where `[npaci-user-id]' for MCS572 is of the form `ux451401' for Prof. Hanson, as an example. Here is a typical Fortran compile command:

f90 -O3 -r3 -o [executable] [program].f &

which uses level 3 optimization `-O3' (expands to the `-h inline3, scalar3, task3, vector3' options), creates a listing file `[program].lst' from the enabled report level 3 option `-r3' passed to the compiler, and creates an executable named `[executable]' in place of the default `a.out' from the source file `[program].f' in the background (`&'). The default optimization level is `-O2', which is moderate: level 1 scalar optimization, no parallel optimization and level 2 vector optimization. However, `-O0' forces no optimization, while for levels 2 and 3 the scalar, inlining and vector optimizations must be separately specified with their own `-O' option. Optimization level `-O3' is recommended for most programs. Somewhat similarly, the usual C (or C++ with CC) procedure is

cc -O3 -h scalar2 -h report=istvf -o [executable-file] [program].c &

which creates a compiler information listing file `[program].V' (the `f' suboption of `-h report', with compiler information on inlining (`i'), scalar (`s'), tasking (`t': parallel), and vector (`v') optimization, but it only gives optimization messages, unlike f90 which gives other compiler information listed directly on the source), and an executable `[executable-file]' in place of the default `a.out' from the source file `[program].c' in the background (`&'), using the higher level optimization `-O3', while the `-h scalar2' avoids a memory abort problem caused by `scalar3' scalar optimization. Note that moderate optimization is slightly different than in the Fortran case since the `-h' option can control optimization suboptions separately, similarly for optimization levels `-O0', `-O1' and `-O3'. The default optimization level is basically level 1 scalar optimization.

T90 Editors.

The usual editors on the T90 are the Unix visual editor `vi' and the line editor `ex'. Also, X-Windows pass through emulations are available. For extensive program revision, it may be advisable to ftp your program file back to your UIC or home computer for editing, returning it to the T90 by ftp when finished.

More T90 Information.

For more information consult Using the NPACI Cray T90.

T3E Overview.

The Cray T3E MC512 is a massively parallel processor (MPP) with 272 CPU processors, called PEs or Processing Elements, connected according to a three-dimensional torus topology (the original model was the T3D, which with a processor upgrade became the T3E, the `E' only meaning that it followed the letter `D' or `enhanced'). The T3E is also called a MIMD (Multiple Instruction, Multiple Data) computer by Cray. The NPACI T3E's internet name is

t3e.npaci.edu

with the prompt nickname of `golden %'. For T3E information from NPACI, see

Using the NPACI CRAY T3Es

What does the CRAY T3E look like? CRAY T3E Picture.

T3E Processing Units.

Each CPU or processing node or processing element (PE) is a Digital Equipment Corporation DEC Alpha 21164 RISC 64-bit microprocessor, with the nodes running at a clock speed of 300 MHz (3.33 ns clock period). Each PE is a 64 bit processor that operates on words 8 bytes wide, and all 32 bit operations are performed as if 64 bit. The 272 nodes are classified into three types: 5 Operating System PEs assigned to the global operating system, 7 Command PEs reserved for interactive sessions or single PE command applications, and 260 user Application PEs for running applications, with varying size memory (RAM) measured in Mega Double Words (MW). Each processor has a peak performance of 600 MFlops.

T3E Benchmark Performance.

The Cray T3E-600 LC256-128, installed at NPACI in 1996, ranks as the 116th top computer in the world (June 2000; it was the 36th in June 1998) and has a peak aggregate speed of 153.6GF on linear algebra benchmarks, with Hockney Linear Model (see MCS572 class notes) parameters of Rmax = 117 GF and Rpeak = 268GF according to the June 2000 Top500 reports (Nmax=59,904 and N1/2=8,832 from the June 1998 reports) given at the web link above.
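For readers who want to see how the Hockney parameters are used, here is a small C sketch of the two-parameter Hockney model, in which the time for a problem of size n is (n + n_half)/r_inf, so that the rate is r_inf*n/(n + n_half). The function name is hypothetical, and plugging in the Top500 values is an illustrative assumption only: Rmax is the rate measured at Nmax rather than a true asymptotic rate, so the model is a rough guide and not a reproduction of the benchmark.

```c
#include <assert.h>

/* Hockney two-parameter rate model: r(n) = r_inf * n / (n + n_half),
   where r_inf is the asymptotic rate and n_half is the problem size
   at which half of r_inf is reached. The rate is in the units of r_inf. */
double hockney_rate(double r_inf, double n_half, double n)
{
    assert(r_inf > 0.0 && n_half >= 0.0 && n > 0.0);
    return r_inf * n / (n + n_half);
}
```

By construction, `hockney_rate(117.0, 8832.0, 8832.0)` returns half of 117 GF, i.e., 58.5 GF, and the rate approaches 117 GF as n grows well beyond N1/2.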

T3E Memory Units.

The memory is a hybrid of logically-shared and physically distributed memory. NPACI has a T3E-600 Model LC256-128 with 260 application processing elements, each having 16 MegaWords (128MB) of 64 bit double words, making a total of 32.5GB (1998), with a peak memory bandwidth of 720MB/s and a memory cycle time of 70 ns. The T3E has data, instruction and secondary memory caches.

T3E Operating System.

The operating system kernel is called CHORUS, a small set of UNIX-like multiprocessing primitives. However, since access to the T3E is by remote scheduling from the T90 using the UNICOS micro kernel operating system (version UNICOS/mk 2.0.0) and the Network Queueing System (NQS), the user should refer to subsections on those topics.

T3E Direct Access.

Users MUST access the T3E directly using the Secure Shell (ssh), such as from UIC `icarus', by either of the commands:

ssh t3e.npaci.edu -l [npaci-login-name]

ssh [npaci-login-name]@t3e.npaci.edu

SSH works like the Unix remote login command `rlogin', but encrypts your password so that it is nearly impossible to steal. The commands `rlogin' or `telnet' do not work with the NPACI Crays, resulting in a response like ``t90.npaci.edu: Connection refused''. See the section on the T90 for more information and how to get ssh on your own computer.

The user's home directory ``/usr/users/[n]/[username]'' is limited to source files and scripts totaling no more than 10MB. For compiling or linking or executing interactive jobs, the user must copy the source to the user's work directory in ``/work/[username]''. The ``/work'' directories have a ``File Wiper'' lifetime of about 80 hours, but files can be quite large, with a 171GB total for everyone. There is also the High Performance Storage System (HPSS), which requires ftp-like commands.

T3E Programming Languages.

The T3E programs are compiled directly on the T3E using MPP versions of the Fortran90 compiler:

f90 -O3 -r3 -Xm -o [executable] [source].f &

or the C compiler

cc -O3 -h scalar2 -h report=isvf -Xm -o [executable] [source].c &

and similarly for `C++' with the `CC' command.

In the above compilation commands, ``-O3'' is a high level of optimization (in the f90 case see T90 Fortran Optimization), and the optimization information reports requested through the options ``-r3'' and ``-h report=isvf'' yield the compiler informational listings ``[source].lst'' and ``[source].V'' for the f90 and cc compilers, respectively. Note that the task (`t') suboption has no effect on the T3E cc compiler, unlike on the T90 cc compiler. The option ``-Xm'' is ESSENTIAL so that a ``malleable executable'' is produced that can be executed with any number of processors with ``mpprun'', while the form ``-X[npes]'' will cause problems for ``mpprun''. The library option ``-l mpi'' is already assumed as a default by the compilers and permits use of MPI parallel programming in the code.

Execution of the executable ``[executable]'' is by the massively parallel envelope command:

mpprun -n[npes] [executable] < [data] >& [output] &

where the user is restricted to one non-production, interactive job at a time using no more than 32 PEs for not more than 60 minutes, while ``[data]'' and ``[output]'' are optional input data and output files, respectively.

The massively parallel envelope command is `mpprun', which assumes that the executable is malleable, i.e., the `-X [npes]' processor number option has not been used with the compile commands `f90' or `cc'. The compilers and mpprun do not require any additional libraries to handle MPI commands.

The T3E can be used interactively for smaller jobs at most times, or it can be used through the NQE (NQS) batch system, with the new version of parallel programming language called MPI (Message Passing Interface) using the latest and greatest MPT (Message Passing Toolkit) with new default buffer size.

There is also another non-Cray MPI execution envelope command called ``mpirun'' that has a similar format:

mpirun -np[npes] [executable] < [data] >& [output] &

but note that the number of processors option is ``-np'' rather than ``-n''. It is not clear what the advantage is over the Cray ``mpprun'' command, except that it allows an interface with the ``p4'' macros, which are related to shared memory ancestors of ``mpi''.

Simple job status can be obtained from the jobs command, which will display ``Running'' or ``Done'' for normal operation, or ``Exit'' for error aborted jobs; the generic process status ps command will list all processes, with cpu timings, that you have running on your job. The advanced process status for all used application processors can be found with the ps plus command:

T3E Message Passing Interface.

The parallel programming parts of these MPP language versions have processor communication handled by a new version (5/2000) of the Cray Research MPT (Message Passing Toolkit), with components

MPI (Message Passing Interface)

PVM (Parallel Virtual Machine)

SHMEM (Logically Shared, Distributed Memory (SHMEM) Routines)

which are forms of message passing embedded in the Fortran or C code. Sample T3E interactive MPI programs for Jacobi iterations of Laplace's equation are given for the revised Fortran 90 code in:

laplace.mpi_f.f
laplace.mpi_f.output for 200X200 mesh and 3000 maximum iteration.

and one for the revised C code is given in

laplace.mpi_c.c
laplace.mpi_c.output for 200X200 mesh and 3000 maximum iteration.
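To give a feel for what the Jacobi iteration in these sample codes computes, here is a minimal serial C sketch. It is NOT the class code `laplace.mpi_c.c': it has no MPI calls, the function name is hypothetical, and the boundary values (1 on the top edge, 0 on the other three edges) and the stopping tolerance are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Serial Jacobi iteration for Laplace's equation on an n x n mesh,
   where n counts the boundary points. Assumed boundary values: 1 on
   the top edge (row 0) and 0 on the other three edges. Returns the
   value at mesh point (n/2, n/2), or -1.0 if allocation fails. */
double jacobi_center(int n, int max_iter, double tol)
{
    double *u, *v, *tmp, avg, diff, change, center;
    int i, j, it;

    assert(n >= 3 && max_iter >= 0);
    u = calloc((size_t)n * n, sizeof *u);   /* current iterate, zeroed */
    v = calloc((size_t)n * n, sizeof *v);   /* next iterate, zeroed */
    if (u == NULL || v == NULL) { free(u); free(v); return -1.0; }

    for (j = 0; j < n; j++) { u[j] = 1.0; v[j] = 1.0; }  /* top edge */

    for (it = 0; it < max_iter; it++) {
        change = 0.0;
        for (i = 1; i < n - 1; i++) {
            for (j = 1; j < n - 1; j++) {
                /* each new value is the average of the 4 old neighbors */
                avg = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j]
                            + u[i * n + j - 1] + u[i * n + j + 1]);
                diff = avg - u[i * n + j];
                if (diff < 0.0) diff = -diff;
                if (diff > change) change = diff;
                v[i * n + j] = avg;
            }
        }
        tmp = u; u = v; v = tmp;    /* new iterate becomes current */
        if (change < tol) break;    /* largest update is small enough */
    }
    center = u[(n / 2) * n + n / 2];
    free(u); free(v);
    return center;
}
```

For an odd mesh size such as n = 11, the symmetry of the assumed boundary values means the converged center value is 0.25, which makes a convenient check. A call like `jacobi_center(200, 3000, 1.0e-6)` would correspond in size to the 200X200 mesh and 3000 maximum iterations of the sample outputs, though the numbers will differ from the class outputs because the boundary values here are assumed.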

For more MPI information, see the MCS572 MPI Class Pages.

T3E Batch Queueing System.

Remote job scheduling on the T3E is accomplished by using the NQS (Network Queueing Environment or System) job scripts. A sample PSC T3E target job script for Fortran code is given in

fpgm.job

and one for PSC T3E C code is given in

cpgm.job

These job scripts are run with the NQS QSUB submit command from the user's `${TMP}' temporary directory, such as by

qsub ${HOME}/fpgm.job

qsub ${HOME}/cpgm.job

where `${HOME}' denotes the meta-name of the user's home directory on `t90'. The job status can be checked by the NQS QSTAT status command:

qstat -a

and when done, the user can view the output if any. If for any reason you need to kill the job before the end, first note the job id number `[job_id]' at the beginning of your job line in the `qstat -a' output, then enter the command:

qdel -k [job_id]

which should stop a running job.

A user can try out the class sample NQS QSUB job scripts by copying one of the above Laplace codes to the home directory and then recopying it to the recyclable source file of the form `*pgm.*' as follows:

cp laplace.mpi_f.f fpgm.f

cp laplace.mpi_c.c cpgm.c

depending on whether you want to test Fortran or C versions; you will also have to create a simple input data file called `data', inserting the number of iterations (e.g., using vi to insert: 200) into the input data file; then in the home directory entering the queue submit command:

qsub fpgm.job

qsub cpgm.job

then check for a finished job with `qstat -a' until the message "no batch queue entries" appears, finally looking for output in a file of the form `*pgm.output'. You can always modify the sample job scripts to suit your particular job requirements.

There is also the newly installed Portland Group High Performance Fortran (command = pghpf), so use `man - pghpf' to find out how to use it (untested by FBH).

More T3E Information.

For T3E information from NPACI, see

Using the NPACI CRAY T3Es

Supercomputer Centers Overview.

The Pittsburgh and Cornell Centers have been phased out of the NSF centers. Users will have to switch as early as March 1998 to NCSA in Urbana, which has a Cray Origin 2000, or SDSC in San Diego, which has both a Cray T90 and T3E. MCS572 Fall 1998 had access to both NCSA-NCSA and NPACI-SDSC. MCS572 Fall 2000 will use the NPACI-SDSC Cray T90 and T3E.

Guide Notation.

This mini-local-guide is meant to indicate ``what works'', primarily for access from UNIX systems to NPACI-SDSC. The use of the Unix C-Shell on the T90 is assumed throughout most of this local guide.

Cray, CF77, CFT77, Auto-tasking, SEGLDR, and UNICOS are trademarks of Cray Research, Inc. UNIX is a trademark of AT&T.

Computer prompts or broadcasts will be enclosed in double quotes (``,''), background comments will be enclosed in curly braces ({,}), commands cited in the comments are highlighted by single quotes (`,') {do not type the quotes when typing the commands}, and optional or user specified arguments are enclosed in square brackets ([,]) {However, do not enter the square brackets.}. The symbol (CR) will denote an immediate carriage return or enter. {Ignore the blanks that precede it as in `[command] (CR)', making it easier to read.} The symbol (Esc) will denote an immediate pressing of the Escape-key {Use no brackets please.} The symbol (SPACE) will denote an immediate pressing of the Space-bar {Warning: Do not type any of these notational symbols in an actual computer session.}


Background References

For further information, please consult the sources (you can just click on the highlighted topics to access them if you are surfing the World Wide Web):

Professor Hanson's MCS 572 Introduction to Supercomputing Home Page provides a large variety of links to useful supercomputing information.
San Diego Supercomputer Center (SDSC/NPACI) Home Page on the World Wide Web permits the direct search of the NPACI public web information directories. Use of Mosaic, Netscape, OmniWeb or lynx (line edit version) applications permits this access. For a faster introduction to NPACI see the HotPage Information.
NPACI User HotPage.
NPACI Documentation Page.
Using the NPACI CRAY T90 provides access to a great deal of information on the T90 including specifications and a good size index.
Introduction to Vector Computing, NPACI-SDSC Slide Tutorial.
Cray T90/J90 Hardware Overview, NPACI-SDSC Slide Tutorial.
T90 Fortran Optimization, explaining `f90' optimization levels much better than `man - f90' and many other things.
CRAY (SV1-T90) Course Notes
University of Western Ontario, Information Technology Services:
CRAY T90 Product Information (Cray T90 Homepage).
Using the NPACI CRAY T3Es provides access to a great deal of information on the T3E including specifications and a good size index.
CRAY T3E Series Product Information (Cray T3E Homepage).
CRAY T3E Technology Profile
CRAY Research Manuals Search (Cray CF90, C, C++, ...)
Cray T3E User's Guide from Finland
I/O Optimization on the CRAY T3E
Getting Started With MPI: A Message Passing Interface for Parallel Programming: An Introduction to MPI at SDSC.
MCS572 NPACI Cray MPI General Information Page.
MCS572 NPACI Cray MPI Example Page.
"T3E" MPP Fortran Programming Model (paper by ftp).
The Benchmarker's Guide to Single-processor Optimization for CRAY T3E Systems, postscript paper by Ed Anderson, Jeff Brooks, and Tom Hewitt, Cray Research, June 1997.
man [command] (CR), when invoked in a UNIX-like system such as UNICOS, produces an on-line listing of the manual pages on the command [command], or similar function.
Consultation concerning problems related to using the Crays can be obtained from Professor Hanson {718 SEO, X3-2142, hanson@uic.edu}. NPACI protocol requires that Professor Hanson contact NPACI consultants for this class, if they are necessary.


Annotated NPACI Cray T90 Sample Session.

The login procedure depends on your local method of accessing the NPACI Cray T90 from UIC, but the best access is from a Unix type system, since the NPACI Cray T90 operating system is UNICOS, which is substantially Unix, and it is to the user's advantage to use Unix to Unix communication. If you do not now have a Unix account you should try to get one from your department's Unix system or from the UIC Computer Center graduate student Unix (Sun) server called `icarus'. Unix workstations are available in many science and engineering departments. If that does not work out or is not practical, then see Professor Hanson about other alternatives.

Logging into the T90:

For the NPACI Cray T90, access MUST be by the `ssh' Secure Shell command, which works from Unix, with the format:

ssh t90.npaci.edu -l [NPACI_user_name]

where `[NPACI_user_name]' for NPACI is of the form `ux4526??' and must be specified to log on to the NPACI T90. {Caution: from a UNIX operating system, it is essential to use lower case; t90.npaci.edu is the full Internet name for the NPACI Cray T90. The corresponding Internet Number of `t90' is `132.249.40.48' and is more basic, since the Internet Name is derived from the number and the number may work when computer name servers are down. Another Caution: the usual Unix remote login rlogin or telnet commands are NOT accepted by `t90'. The T90 (`t90') should respond with something like:}
``Trying 132.249.40.48... Connected to t90.npaci.edu.'' { If you make a mistake, recalling that Unix is case sensitive, you will probably have to try both login-id and password again.}
``...system Message Of The Day (MOTD)...'' {The MOTD also gives your assigned temporary directory for program compiling and execution.}
``t90% '' {The `% ' is the default UNICOS prompt, here modified to `t90% '. You are now in UNICOS on the NPACI CRAY T90.

NPACI is very serious about computer security, so the first thing you should do is to change the NPACI given password on your original account sheet given in class by entering the password change command}
passwd (CR)
``Old password:'' {Enter your original password again.}
[old-password] (CR)
``New password:'' {Enter your new 8 character password, which must contain at least two alphabetic characters and at least one numeric or special character.}
[new-password] (CR)
``Re-enter new password:'' {Retype new password to confirm original typed change.}
[new-password] (CR)
``t90% '' {Congratulations, you made it to the Cray T90, have a nice session.

You can end this session at any time you have a `t90% ' prompt by entering `logout' or pressing the `Ctrl' control key and `d' key simultaneously (i.e., `ctrl-d').

You can check what the name of your `t90' home directory (file system) is by the Unix ``print working directory'' command:}
``t90% '' pwd (CR) {Your disk directory should be something like `/usr/users/0/[t90_user_name]' where `[t90_user_name]' is your Login-id on the NPACI Cray T90. You can list the current files on your account by the Unix ``list sets'' command:}
``t90% '' ls (CR) {If this is a new account you probably will not have any regular files listed, but the following form of `ls' command has options that reveal hidden `dot' files and give the long form of file information:}
``t90% '' ls -al (CR) {You may continue with a Cray T90 session by getting help with the usual UNIX `man' manual command:}
``t90% '' man ls (CR) {The default `man' output is paged, so press the `Spacebar' or enter `d' for another page and enter `q' for quit. Try `man man' for more information.}

Processing Fortran Code: {For a sample session compiling and executing a Fortran or C program, you can get a copy of the MCS572 `t90' starter problem via the web and transfer it to `t90',

or by the `cp' copy command:}

{in f90:}

``t90% '' cp ~hanson/t90start.f start.f (CR)

{OR in C:}

``t90% '' cp ~hanson/t90startcc.c start.c (CR)

or by anonymous FTP (note that you can only FTP from the T90; you cannot FTP to it!):
``t90% '' ftp www.math.uic.edu (CR)
``Name (www.math.uic.edu:[user]): '' anonymous (CR)
``Password: '' [send_email_identity_as_password] (CR)

``ftp> '' cd pub/Hanson/MCS572 (CR)

{in f90:}

``ftp> '' get t90start.f start.f (CR)

{OR in C:}

``ftp> '' get t90startcc.c start.c (CR)
``ftp> '' bye (CR)

{`quit', or `by' as a simple abbreviation of `bye', also works.}

If available, secure copy `scp' (related to the Unix remote copy `rcp' command, except `scp' protects passwords) will work for SCP to or from the T90:

{To SCP from the NPACI T90 to UIC, Enter:}

``t90% '' scp [t90-filename] [uic-id]@[node].uic.edu:[uic-filename] (CR)

{To SCP from UIC to the NPACI T90, Enter:}

``% '' scp [uic-filename] ux4526[??]@t90.npaci.edu:[t90-filename] (CR)

{You may also place a directory name in front of the target filename, e.g., `[uic-id]@[node].uic.edu:[directory]/[uic-filename]'.}

{To compile and link/load with Fortran 90, next Enter:}
``t90% '' f90 -O3 -r3 -o start start.f & (CR) {Here, `f90' is the optimizing Cray Fortran90 compiler, `-O3' is the level 3 optimization option, `-r3' is the level 3 report option that enables loop marking, with the report going to the file `start.lst', `start.f' is the Fortran source file, `-o start' means that the executable file is named just `start' rather than the default `a.out', and the last `&' means compilation is done in the background, permitting use of the terminal while waiting for the job to finish.

OR to compile and link/load in C, enter:}
``t90% '' cc -O3 -h scalar2 -h report=isvf -o start start.c & (CR) {Here `-h scalar2' avoids memory abort problems with `scalar3' optimization, `-h report=isvf' gives the compiler optimization report.

The status of the job can be determined from the Unix `jobs' command:}

``t90% '' jobs (CR)

{or use the process status command:}

``t90% '' ps (CR)

{When you get the message:
``Done f90 -O3 -r3 -o start start.f''
instead of ``+Running ...'', OR a similar message for the C code, then the listing file can be examined by the Unix `more' paging command:}

``t90% '' more start.lst (CR)

{OR in C:}

``t90% '' more start.V (CR) {Else, the listing file can be viewed by the Unix visual editor (or other favorite editor):}
``t90% '' vi start.lst (CR)

{OR in C:}

``t90% '' vi start.V (CR) {If compilation and listing is satisfactory, then the module `start' for either f90 OR C may be executed:}
``t90% '' start >& start.output & (CR) {Again, the process job status may be checked while waiting by entering:}

``t90% '' ps (CR)

{OR}

``t90% '' jobs (CR) {The `ps' command will also give the number of processors being used, found by counting the number of times `start' appears; the default is 4 processors. When the ``Done'' message is displayed rather than ``+Running ...'', then the output can be viewed in pages:}
``t90% '' more start.output (CR) {Caution: the executable `start', as with the default executable `a.out', is a binary rather than a text file, so it is not readable. Next you can transfer your files back to your printer-connected computer by FTP (SDSC would prefer that you not print from the `t90'):}
``t90% '' ftp [home_machine].[department].uic.edu (CR)
``login: '' [user_name] (CR)
``password: '' [user_password] (CR)
``ftp> '' cd [target_directory] (CR)

``ftp> '' put start.f (CR)

{OR for C:}

``ftp> '' put start.c (CR)
``ftp> '' put start.lst (CR)

{OR for C:}

``ftp> '' put start.V (CR)
``ftp> '' put start.output (CR)
``ftp> '' by (CR) {Finally, it is good file management, even in the temporary directory, to remove your executable file with the Unix remove `rm' command, since it is cheaper to regenerate than to store and you should not be a "storage hog":}
``t90% '' rm start (CR) {At this point you can `logout' of the `t90' by entering:}
``t90% '' logout (CR)
``% '' {Return to your local UIC UNIX session.}

Finding Work Disk:
{At any time you can determine your temporary disk path name (it is given during the MOTD):}
``t90%'' echo $WORK (CR) {This command echoes the name of your current work directory (of the form `/work/[npaci_user_name]') that is usually displayed when you log in. `echo $TMP (CR)' does the same thing. The `cd $WORK (CR)' command takes you to your temporary directory and the `cd $HOME (CR)' command changes back to your home directory. The temporary directory is limited to active files that will be there for short periods, but there is no specific quota in size. Executable and object modules can take up a lot of space, so a good temporary location for them is the work disks. Be careful not to take up valuable UNICOS disk space; erase unneeded files with the command `rm [file] (CR)'. Otherwise, continuing the session:}
``t90%'' cp pgm.f $WORK (CR) {This command copies the Fortran source file `pgm.f' to the user's temporary disk, where the user will be less likely to run out of disk space.}
``t90%'' cd $WORK (CR) {This command changes from the current directory to the user's temporary directory for doing computationally intensive supercomputing.}
``t90%'' rm pgm.o run output (CR) {Removes the files named `pgm.o', `run' and `output' if they exist (answer `y' for yes if queried); if they do not exist, UNICOS may say that it cannot find the file name. This may be needed to prevent interference when updates to these files are produced by the compile and execute commands while copy-over restrictions are in effect.}

Movement and Alternate Ways to Execute Files:

{You can instead move the files to your default home directory with the move command:}
``t90%'' mv output ${HOME}/[label].output (CR) {Moves the output file to your home directory and relabels it to `[label].output' for uniqueness, because on the temporary disk they are liable to be automatically deleted after inactivity.}

{You can execute `run' in the temporary directory while in the home directory with the command:}
``t90%'' ${TMP}/run < data > output & (CR) {Executes the executable in the temporary directory, where the curly brackets around the variable `TMP' tell the UNICOS shell that the meta variable name is only `TMP' and not `TMP/run', for example. Here the home directory file `data' is redirected as input to the executable `run'.}
``t90%'' run < ${HOME}/data > output & (CR) {Executes the executable while in the temporary directory rather than the home directory, assuming that the input file `data' is still in your home directory, but that the output file `output' will be stored in your temporary directory in this example. The curly brackets in ``${HOME}'' are part of the command, and do not signify a comment there. Other topics are described in the UNIX Command Dictionary section.}
``t90%'' logout (CR) {Logs you out of your remote NPACI session. You then return to your prior session on UICVM or UNIX or other computing seat.}


Annotated NPACI Cray T3E Sample Session.

The login procedure for the NPACI Cray T3E Massively Parallel Processor is very similar to that for the NPACI Cray T90 Parallel Vector Processor, but the sample codes are usually different, corresponding to the different processing models: distributed memory versus shared memory.

Logging into the T3E:

For the NPACI Cray T3E, access is by the `telnet' TCP/IP command. Telnet works for PCLab PCs as well as in Unix, with the format:

telnet t3e.npaci.edu

{Caution: for a UNIX operating system, it is essential to use lower case; t3e.npaci.edu is the full Internet name for the NPACI Cray T3E and its Internet address (number) is `128.182.73.68'. The T3E (`t3e') should respond with:}
``Trying 128.182.73.68... Connected to t3e.npaci.edu.''
``Escape character is '^]'. .....''
``password:'' (CR) {Since your NPACI T3E user name is not likely to be your UIC user name, you should just enter a carriage return (CR) and then enter your T3E username.}
``Login incorrect''
``login:'' [NPACI_user_name] (CR) {Then enter the corresponding password:}
``password:'' [password] (CR) { If you make a mistake, recalling that Unix is case sensitive, you will probably have to try both login-id and password again.}
``...system Message Of The Day (MOTD)...''
``[NPACI_user_name]:t3e% '' {The `% ' is the default UNICOS prompt, here modified to `t3e% '. You are now in UNICOS on the NPACI CRAY T3E.

NPACI is very serious about computer security, so the first thing you should do is change the password that NPACI assigned on the account sheet handed out in class, by entering the password change command:}
passwd (CR)
``Old password:'' {Enter your original password again.}
[old-password] (CR)
``New password:'' {Enter your new 8 character password, which must contain at least two alphabetic characters and at least one numeric or special character.}
[new-password] (CR)
``Re-enter new password:'' {Retype new password to confirm original typed change.}
[new-password] (CR)
``t3e% '' {Congratulations, you made it to the Cray T3E, have a nice session.

You can end this session at any time you have a `t3e% ' prompt by entering `logout' or by pressing the `Ctrl' control key and the `d' key simultaneously (i.e., `ctrl-d').

You can check what the name of your `t3e' home directory (file system) is by the Unix ``print working directory'' command:}
``t3e% '' pwd (CR) {Your disk directory should be something like `/usr/users/[n]/[t3e]' where `[t3e]' is your login-id on the NPACI Crays. You can list the current files on your account by the Unix ``list'' command:}
``t3e% '' ls (CR) {If this is a new account you probably will not have any regular files listed, but the following form of `ls' command has options that reveal hidden `dot' files and give the long form of file information:}
``t3e% '' ls -al (CR) {You may continue with a Cray T3E session by getting help with the usual UNIX `man' manual command:}
``t3e% '' man ls (CR) {The default `man' output is paged, so press the `Spacebar' or enter `d' for another page and enter `q' for quit. Try `man man' for more information.}

Processing Fortran Code: {For a sample session compiling and executing a Fortran program, you can get a copy of the MCS572 Laplace-MPI sample problem, `laplace.mpi_f.f' (the Fortran Laplace-MPI code), via the web and transfer it from UIC to `t3e' by ftp from the T3E (see next section).

For an example of compiling and linking the MCS572 Fortran based `t3e' starter problem from your temporary directory (assuming `fpgm.f' has been transferred to your `t3e' account), enter:}
``t3e% '' mkdir /tmp/[NPACI_user_name] (CR) {This makes your temporary directory (unlike the T90, you have to create your own, and you may have to recreate your temporary directory if it is wiped out by the `File-Wiper'). Next enter:}
``t3e% '' cp fpgm.f /tmp/[NPACI_user_name] (CR) {This copies the starter source file to your temporary directory, (the meta-variable `${TMP}' on the T3E only denotes the temporary directory created for each batch run). Next enter:}
``t3e% '' cd /tmp/[NPACI_user_name] (CR) {This changes the current directory (cd) to the user's temporary directory where you should compile, load and execute your code. Do not worry about your home directory, since you can go back with the command `cd ~'. Some users prefer the push and pop directory commands, such as `pushd /tmp/[NPACI_user_name]' to get to the temporary directory and `popd' to return to `${HOME}' or to whichever directory was previously accessed, since prior directories are kept in a push/pop buffer stack. Next enter:}
``t3e% '' f90 -O3 -r3 -Xm -l mpi -o fpgm fpgm.f & (CR) {Here, `f90' is the optimizing Cray Fortran 90 compiler, `-O3' is the level 3 optimization (usual) option, `-r3' is the report option that enables loop marking, with the report going to the file `fpgm.lst', `-Xm' ensures a malleable executable that can run on any number of available T3E application processors, `-l mpi' links the MPI library, `-o fpgm' means that the executable file is named just `fpgm' rather than the default `a.out', `fpgm.f' is the Fortran 90 source file, and the last `&' means compilation is done in the background, permitting use of the terminal while waiting for the job to finish. The status of the job can be determined from the Unix `jobs' command:}
``t3e% '' jobs (CR) {The process status `ps' command can also be used. When you get the message:
``Done f90 -O3 -r3 -Xm -l mpi -o fpgm fpgm.f''
instead of ``+Running ...'', then the listing file can be examined by the Unix `more' paging command:}
``t3e% '' more fpgm.lst (CR) {Else, the listing file can be viewed by the Unix visual editor (or other favorite editor):}
``t3e% '' vi fpgm.lst (CR) {If the compilation and listing are satisfactory, then create an input file called `data' containing the maximum number of iterations, `5000' (without quotes), which is needed by the executable:}
``t3e% '' vi data (CR) {and the module `fpgm' may be executed within the `mpprun' envelope run environment on 4 T3E processors:}
``t3e% '' mpprun -n4 fpgm < data >& fpgm.output & (CR) {The mpprun command supplies a copy of the executable to each of the 4 processors (`-n4' option) needed for this code and all 4 participate in the solution, synchronized by the MPI parallel programming library subroutines. Again, the job status may be checked while waiting (or using `ps' which should indicate how long 4 processors have been running or `psp -p app' for much more information) by entering:}
``t3e% '' jobs (CR) {When the ``Done'' message is displayed rather than ``+Running ...'', then the output can be viewed in pages:}
``t3e% '' more fpgm.output (CR) {Caution: the executable `fpgm', as with the default executable `a.out', is a binary rather than a text file, so it is not readable. Next you can transfer your files back to your printer-connected computer by FTP (NPACI would prefer that you not print from `t3e', since you do not have a way to retrieve the output):}
``t3e% '' ftp [home_machine].[department].uic.edu (CR)
``login: '' [user_name] (CR)
``password: '' [user_password] (CR)
``ftp> '' cd [target_directory] (CR)
``ftp> '' put fpgm.f (CR)
``ftp> '' put fpgm.lst (CR)
``ftp> '' put fpgm.output (CR)
``ftp> '' by (CR)

If available, secure copy `scp' (related to the Unix remote copy `rcp' command, except `scp' protects passwords) will work for SCP to or from the T3E:

{To SCP from the NPACI T3E to UIC, Enter:}

``t3e% '' scp [t3e-filename] [uic-id]@[node].uic.edu:[uic-filename] (CR)

{To SCP from UIC to the NPACI T3E, Enter:}

``% '' scp [uic-filename] ux4526[??]@t3e.npaci.edu:[t3e-filename] (CR)

{You may also place a directory name in front of the target filename, e.g., `[uic-id]@[node].uic.edu:[directory]/[uic-filename]'.}

{Finally, it is good file management, even in the temporary directory, to remove your executable file with the Unix remove `rm' command, since it is cheaper to regenerate than to store and you should not be a "storage hog":}
``t3e% '' rm fpgm (CR) {At this point you can `logout' of `t3e' by entering:}
``t3e% '' logout (CR)
``% '' {Return to your local UIC UNIX session.}

Processing C Code:

For a sample session compiling and executing a C program, you can get a copy of the MCS572 Laplace-MPI sample problem, `laplace.mpi_c.c' (the C Laplace-MPI code), via the web and transfer it from UIC to `t3e' by ftp from the T3E (see next section). The commands below assume that the C source has been copied as `cpgm.c' to your temporary directory.

``t3e%'' cd /tmp/[NPACI_user_name] (CR) {Next compile the sample code:}
``t3e%'' cc -O3 -h report=isf -Xm -l mpi -o cpgm cpgm.c & (CR) {This Cray Standard C command both compiles and links the C program source `cpgm.c' and directly produces the compiler optimization report `cpgm.V' (suboption `f') with information on inlining (`i') and scalar (`s') optimization. The `-Xm' option ensures a malleable executable that can run on any number of available T3E application processors. Also produced is the executable file called `cpgm' instead of the default name `a.out'. For execution you will need to create an input file `data' with the maximum number of iterations, say `5000', in it:}
``t3e% '' vi data (CR) {When the job finishes compiling, the `cpgm' executable can be run in the `mpprun' T3E execution environment using the 4 application processors needed for this code:}

``t3e%'' mpprun -n4 cpgm < data >& cpgm.output & (CR) {The `mpprun' command finally executes the program `cpgm' on 4 processors (`-n4' option) with each processor getting its own copy of the executable, synchronized by the MPI parallel programming library functions. The output of `cpgm' is redirected into the output file `cpgm.output'. This command line has the same syntax as it would for a Fortran job. The `>& cpgm.output' redirection would not be used if output at the screen is desirable, but that is rarely practical except for small jobs. The ending `&' means that the job will run in the background, enabling the user to continue working in the session. Interactive jobs are limited to 15 minutes and 32 processors, while batch jobs (see about `nqs' in the Subsection on NQS) have much less restrictive limits. Once you have finished with the huge `cpgm' file, it is good file management to remove it with the command:}
``t3e%'' rm cpgm (CR) {Removes the file `cpgm'; answer `y' for yes if queried. When you are finished on `t3e', then enter:}
``t3e%'' logout (CR) {Logs you out of your remote NPACI session. You then return to your prior session on UICVM or UNIX or other computing seat.}


ftp File Transfers between NPACI/Cray/UNICOS and UIC

FTP, the file transfer protocol, is the fastest method of file transfer between UIC and NPACI Cray UNICOS, because it uses a fast Internet communication link.

ftp File Transfers at the NPACI Crays

At the NPACI Crays you can transfer files between the Crays and UNIX. The `ftp' command on UNIX is very much like the `ftp' command in UNICOS.

In order to transfer a file between UNICOS and UIC, enter the commands:

For Transfer to UIC Unix:

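ftp [machine].[dept].uic.edu (CR) {Opens an FTP connection from the Cray to the UIC machine.}

[UIC-name] (CR) {Your UIC login name, entered at the name prompt.}

[UIC-password] (CR) {Your UIC password, entered at the password prompt.}

cd [UIC-directory] (CR) {Changes the remote (UIC) directory.}

lcd [UNICOS-directory] (CR) {Changes the local (UNICOS) directory.}

ls (CR) {Lists the files in the remote (UIC) directory.}

!ls (CR) {Lists the files in the local (UNICOS) directory.}

get [UIC-fn.ext] [[UNICOS-fn.ext]] (CR) {Copies a file from UIC to UNICOS; the optional second name is the name given to the copy.}

put [UNICOS-fn.ext] [[UNIX-fn.ext]] (CR) {Copies a file from UNICOS to UIC; the optional second name is the name given to the copy.}

by (CR) {Ends the FTP session.}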

If available, secure copy `scp' (related to the Unix remote copy `rcp' command, except `scp' protects passwords) will work for SCP to or from the T90:

{To SCP from the NPACI T90 to UIC, Enter:}

``t90% '' scp [t90-filename] [uic-id]@[node].uic.edu:[uic-filename] (CR)

{To SCP from UIC to the NPACI T90, Enter:}

``% '' scp [uic-filename] ux4526[??]@t90.npaci.edu:[t90-filename] (CR)

{You may also place a directory name in front of the target filename, e.g., `[uic-id]@[node].uic.edu:[directory]/[uic-filename]'.}


ftp File Transfers from UIC UNIX

File transfer from a UIC UNIX session is similar to file transfer from the NPACI UNICOS sessions discussed in the last section, because both have UNIX or extended UNIX operating systems.


ftp File Transfers from the UIC PC Labs

File transfer protocol (ftp) on a PC Lab PC may not be practical for most users, due to the lack of permanent storage. The nearest Xerox PostScript printer to 2249f is SEL2263, while others are SEL2265, SEL2058 and SEO327. However, if the PC is your favorite medium, then use it as in the above Cray or Unix subsections.


Execution of Cray T90 FORTRAN90 (f90) or Cray C

Compilation:
{Starting in Cray UNICOS with a f90 compatible FORTRAN source file named `[source].f', the typical format of the UNICOS compilation command is:}
f90 -r3 [source].f (CR)
{Here, `-r3' is the level 3 report option that enables the helpful marking of loops for scalar optimization (S), vector optimization (V), parallel optimization (P), unrolling (r), unwinding (W), or short vector (Vs) optimization in the compiler information listing file, which is automatically stored in the file `[source].lst'. No additional option is needed for the automatic optimization, unless altered. The same thing can be accomplished by the preferred command}
f90 -c -r3 [source].f (CR)
{An object file is also produced and is stored in `[source].o'. }
{For Cray Standard C source file named `[source].c':}
cc -c -h report=istvf [source].c (CR) {This is the Cray Optimizing C compile command, which produces the object file `[source].o'. Here the option `-c' denotes compilation only, without linking. Also}
cc -hreport=isvf -o run [source].c (CR)
{compiles and links the C program while producing messages on inlining, scalar and vector optimizations.}
Linking/Loading Step Only:

segldr -o run [source].o (CR) {This step is the link or load step using the Cray UNICOS segment loader with a typical format. Here the executable name is taken to be the generic name `run', but you can choose whatever name you like; omitting the `-o' option produces the default UNIX name `a.out' for the executable. However, it is wise to stick to a single executable name, because the file size of a typical executable will exceed your 250 Cray Block home directory limit and a single name makes it easier to delete the executable or move it to your work disk. The `f90' command combines the compile step of `f90 -c' and the link step of `segldr', as in the command `f90 -r3 -o run pgm.f'.}
Execution Step:
{If you use UNIX redirection for input and output, then the format of the execute command is like:}
run < [data] >& [output] & (CR) {where `[data]' is the input file and `[output]' is the output file, which also receives diagnostic messages, and the final `&' runs the job in the background. Input files for Fortran `read' statements and output files for Fortran `write' statements can also be allocated to the terminal or UNICOS files using the Fortran `open' statement to reallocate units 5 and 6 for f90, as in the example programs used below.}

Example 1: Execution using the Terminal for Input and Output

As practice, you can run any source program that you have transported to UNICOS. The simple code `convert.f' is:

      program convert
code: convert from debug fortran  cogs, slightly modified.                     
change:  input & output is to & from terminal, input at prompt.
caution: compile, load, and execute in UNICOS using the three commands:
command:   f90 convert.f
command:   segldr -o convert convert.o
command:   convert
      real a(999)
      write(*,*) 'input any integer less than 1000:'
      read(*,*) i
      a(i) = float(i)
      write(*,6000) a(i)
6000  format(' floating point representation: ',e13.5)
      write(*,*) 'What happens when you exceed array bound of 999?' 
      stop  
      end

{This simple-minded `convert.f' program uses the terminal as undeclared input and output units, corresponding respectively to the `read(*,*)' and `write(*,[n])' statements, without specifying an f90 `open' statement. Be sure to do this in your work directory, which you can change to by using `cd $TMP'. The source `convert.f' is compiled and executed with the 2 commands:}
f90 -o run convert.f (CR)
run (CR) {Note that the program is too trivial to need any optimization options. Here `f90' is a utility combining both the compile and load commands (no convert.o file is stored). Upon execution with `run', you are asked to supply an integer like 6:}
``input any integer less than 1000:''
6 (CR) {UNICOS responds with the output:}

 floating point representation:   0.60000E+01
 What happens when you exceed array bound of 999?
 STOP executed at line 15 in Fortran routine 'CONVERT'
 CP: 0.002s,  Wallclock: 2.311s
 HWM mem: 163926, HWM stack: 2048, Stack overflows: 0

{To rerun the same code without recompiling, merely enter `run' again:}
run (CR) {Your response should be to enter another number as above. Do not spend too much time with `convert.f', because it can only read and write.}

Example 2: Execution using Input and Output with Files

The second example uses data files for both input unit 5 and output unit 6, as well as the UNIX Fortran seconds timer `second()':

code:  craytest fortran  from cogs disk =ncsa ctss users guide eg#1
      program tempt 
calculation:  c(i)=exp(-0.5)*pi/i
change:  input from file 'tempt.data' and output to file 'tempt.output'
change:  particular input is the vector length "nx"
change:  second cpu(user) time 'second' added.
caution:  In Cray T90 UNICOS compile, load and execute with:
command:  f90 -r3 -o tempt tempt.f
command:  tempt 
      parameter (ndim=5120)
      real a(ndim),b(ndim),c(ndim)
caution:  cft "real" implies 48bits = 6bytes for the fraction,
continued:  unlike ibm "real" which implies 24bits = 3bytes.
continued:  Otherwise all variables not starting with (i-n) are
continued:  implicitly real, unless otherwise declared.  The fraction
continued:  for "double precision" is 96bits=12bytes, hence no "real*8"
      open(6,file='tempt.output')
      open(5,file='tempt.data',status='old')
      read(5,*) nx
      t1=second()
      t2=second()
      call init(ndim,nx,a,b)
      call calc(ndim,nx,a,b,c)
      t3=second() 
      clock=t2-t1
      time=(t3-t2-clock)
      write(6,66) nx,nx,c(nx),clock,time
66    format(1x,' nx=',i5,'; c(',i5,')=',f10.7
     & /'  clock=',f12.7,' seconds; code=',f12.7,' seconds')
      stop
      end
           subroutine init(ndim,nx,a,b)
comment:  removed cdir$ novector
      real a(ndim),b(ndim)
      pi=acos(-1.0)
      do 10 ix=1,nx
         a(ix)=pi
         b(ix)=float(ix)
10    continue
      return
      end 
           subroutine calc(ndim,nx,a,b,c)
comment:  removed cdir$ novector
      real a(ndim),b(ndim),c(ndim)  
      do 20 ix=1,nx
         c(ix)=exp(-0.5)*a(ix)/b(ix)
20    continue
      return
      end

{The input file `tempt.data' can be created with the `ex' line editor, started by `ex tempt.data (CR)'. CAUTION: you should not use `ex' or `vi' for extensive editing on the T90, which you can do at your local host, UICVM or UNIX; however, this short editing session will only take a short amount of time. A selected list of `ex' commands is given below. The `ex' prompt is `:'. To start with, you add lines after line 0 with the `ex' subcommand `0a', as in:}
``:'' 0a (CR)
500 (CR)
. (CR) {Finally save and exit `ex' with the combined `wq' = `w | q' subcommand:}
``:'' wq (CR) {If successful (you can check by `cat tempt.data (CR)'), you should now have files called `tempt.data' from `ex' and `tempt.f' from `ftp' or other source, so that you can now compile, load, and execute `tempt.f' by entering the three command lines:}
``t90%'' f90 -r3 tempt.f (CR)
``t90%'' segldr -o run tempt.o (CR)
``t90%'' run (CR) {If successfully executed, UNICOS should respond something like:}

 STOP executed at line 31 in Fortran routine 'TEMPT'
 CP: 0.002s,  Wallclock: 0.114s,  0.1% of 16-CPU Machine
 HWM mem: 182085, HWM stack: 19416, Stack overflows: 0

{Your output will be in `tempt.output' and you can list it by the command:}
``t90%'' cat tempt.output (CR) {Your output should look something like:}

  nx=  500; c(  500)= 0.0038109
  clock=   0.0000041 seconds; code=   0.0000102 seconds

{If you wish to re-run the program with a different number, then enter `ex tempt.data (CR)' again or `!ex (CR)', enter within EX the subcommand `1c (CR)', type `1000 (CR)' to change `500' to `1000', enter `.' to end the change subcommand, enter `wq (CR)' after the ``:'' prompt, and then enter `run (CR)' and `cat tempt.output (CR)' again. When you are done with `tempt.f', remove all the files from UNICOS that you do not need, using:}
``t90%'' rm run tempt.o tempt.lst (CR) {for example, but especially the big `run'.}

Modifications for C: Compile and Execution with C

For information on C language programs use the UNICOS command:

man cc (CR)

For more details on commands and typical formats refer to the subsections of Section UNIX Command Dictionary. When executing larger codes with `run', the user may have to work on the user's temporary disk. For even more details see the NPACI UNICOS STARTUP PACKAGE and the appropriate Cray manuals. Also use the UNICOS `man' command.

UNIX Command Dictionary.


Cray UNICOS Specific Unix Commands.

UNICOS Special Information Commands

help [SCCS command name] (CR)

docview [options] (CR)

primer, docview, UNICOS60update, c.std, fortran, parallel_5.0, segldr, perf, scc30, and cdbx

introductory, languages and utilities

usage (CR)

quota (CR)

UNICOS T90 Fortran90 (f90) Compile, Load and Execution Commands

f90 -r3 -[other options] [source].f [other source files] (CR)

Marking                       Meaning 
   S          scalar loop optimization (major marker)
   V          vector optimization (major marker)
   P          Parallel optimization (major marker)
   Vs         short vector optimization
   W          unwound     (major marker) {short inner-most loops with trip 
              counts of not more than 5 are collapsed or transformed to single
              statements so that the next inner-most loop can be vectorized
              provided there are no dependencies}
   b          bottom loading     {pre-fetching is used for the next
              iteration of scalar loops, only and `-o nobl' kills it} 
   c          conditionally vectorized, {subject to run-time
              determination of recurrence vector length}
   k          kernel scheduling
   i          unconditionally vectorized with IVDEP
   r          loop unrolling     {a set of loop iterations is
              collapsed into one iteration that has been enabled by the `-e'
              enabling option with its `m' loop marking sub-option}
   D          delete loop

segldr

f90 -eS [source].f (CR)

f90 -g [source].f (CR)

cdbx

man cdbx

segldr -o [executable-file] -l [library-list] [source].o (CR)

f90

Numerical Recipes in Fortran or C

f90 [-options] -o [executable] [source].f (CR)

f90 -o [executable]

f90

segldr

f90 -limsl [source].F (CR)

IMSL Software at NPACI

To find out what other special software is at NPACI click on: NPACI Installed Software

[executable-file] < [input-file] > [output-file] & (CR)


UNICOS C Language Commands

cc -o run [file].c (CR)

cc -c [file].c (CR)

cc -hnoopt -o run [file].c (CR)

cc -o run -h report=isvf [file].c (CR)

See `man cc' or `docview' for more information.
#define fortran : Form of C header statement to permit the call to a fortran subroutine from a C program. For example:

#include <stdio.h>
#include <fortran.h>
#define fortran
main()
{
         fortran void SUB();
         float x = 3.14, y;
         SUB(&x, &y);
         printf("SUB answer: y = %f for x = %f\n", y, x);
}

#pragma _CRI [directive]

segldr -o [executable-file] -l [library list] [source].o (CR)

f90

Numerical Recipes in Fortran or C

[executable-file] < [input-file] > [output-file] & (CR)


UNICOS Performance Commands

Cray Prof Profiling Facility:

Cray Error Explaining Command:

explain [error-message-code] (CR)

Cray Job Accounting (ja) Command:

ja (CR)
[executable] (CR)
ja -csf (CR)

parallel processing on the YMP series like the T90 is very expensive

Cray Perftrace (perf) or Performance Trace Facilities:

f90 -ef [source].f (CR)
segldr -l perf [source].o (CR)
a.out > [source].perf (CR)
segldr -l perf perf[n] [source].o (CR)
a.out >> [source].perf (CR)

Cray Hardware Performance Monitor (hpm):

hpm -g[n] -d [executable] > [source].hpm[n] (CR)

Cray JumpTrace (jt) and JumpView (jumpview):

Fortran Example:

f90 -ef [pgm].f
jt ./a.out
jumpview

C Example:

cc -ltrace -Gp [cpgm].c
jt ./a.out
jumpview -Luch >[cpgm].listing

JumpView Main Menu:

----------------------------------------------------- MAIN MENU
1  Master Summary          |  7  List by Average Time/Call
2  Routines: List by Time  |  8  Operating Environment
3  List by Megaflops       |  9  Long Report by Routine Name
4  List by In-Line Factor  | 10  Detail Report by Symbol
5  List by Name            | 11  Detail Report by Block
6  List by Calls           | 12  Options
----------------------------------------
  H  HELP
  Q  QUIT
          Enter Number/Letter of Action Desired
---------------------------------------------------------------

Cray Autotasking Expert Performance System (atexpert):

atexpert [options] (CR)


UNICOS makefile Commands

make [-options] [step-name] (CR)

# Use ``make -f make.unicos_2 mrun >& pgm.l &; run < data > out''.
SOURCES = pgm.f
OBJECTS = pgm.o
FLAGS = -em
mrun : $(OBJECTS)
	segldr -o run $(OBJECTS)

.f.o :
	f90 $(FLAGS) $*.f

CAUTION: The commands, like `segldr' or `f90', must be preceded by a `Tab-key' tab as a delimiter, but the tab will not be visible in the UNIX listing.

fmgen -m [make-name] -c f90 -f [-flag] -o [executable] [source].f (CR)

f90

run

Caution: the makefile uses the source name only when that coincides with the name used in the Fortran `program' statement, and only one type of `f90' flag can be used.


UNICOS Mail Commands

mailx (CR)

caution: `mailx' is close to the usual Unix mail, whereas the UNICOS `mail' command is NOT

t [N](CR)

s [N] mbox (CR)

s [N] [file](CR)

e [N] (CR)

on `EX'

v [N] (CR)

d [N] (CR)

m [user] (CR)

~m [N] (CR)

\d (CR)

q (CR)

mailx [user] (CR)

~r[filename] (CR)

~c[userid] (CR)

~e (CR)

~v (CR)

on EX

on VI

wq (CR)

:wq (CR)

\d (CR)

mailx [name]@[machine].[dept].uic.edu < [filename] (CR)


UNICOS Network Queueing System (NQS)

qsub [options] (CR)

man qsub (CR)

#QSUB -lM [memory-amount]

#QSUB -lT [CPU-time-amount]

#QSUB -l mpp_p=[t3e_procs],mpp_t=[t3e_time]

#QSUB -q mpp

grep [username] /etc/group (CR)

For more information about batch processing with NQS, click on:

Using the CRAY T90

Using the CRAY T3E

qstat [options] (CR)

man qstat (CR)

/mpp/bin/mppstat (CR)

/usr/local/adm/access/bin/qstatmpp (CR)


T90 Fortran90 (f90) and other Extensions

For optimization, it is recommended that your f90 program aid the f90 vector model, i.e., structure the code so that the compiler can automatically recognize it as vectorizable. Usually only the innermost loop is vectorizable. Avoid GOTOs and IFs within loops. Avoid CALLs within loops. Avoid READs and WRITEs within loops. Use vectorizable functions. Avoid data dependencies. Use compiler directives, such as `!DIR$ VECTOR' and `!DIR$ NOVECTOR'. Minimize vector strides. Tune code to the Fortran column-wise storage order in the physically linear memory. Don't even think about using tabs, except in makefiles.
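The advice to avoid data dependencies can be made concrete with two small loops. The following is only a sketch in C (the guide's second language); the function names `add_arrays' and `running_sum' are hypothetical and are not Cray library routines. The first loop has independent iterations and is the kind a vectorizing compiler can handle; the second carries a value from one iteration to the next (a recurrence), which normally forces in-order execution.

```c
#include <stddef.h>

/* Independent iterations: each a[i] depends only on b[i] and c[i],
   so many values of i can be processed at once. */
void add_arrays(size_t n, double *a, const double *b, const double *c)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence (a recurrence): a[i] needs the a[i-1] that
   was computed in the previous iteration, so the order matters. */
void running_sum(size_t n, double *a, const double *b)
{
    if (n == 0)
        return;
    a[0] = b[0];
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```

The same distinction applies to Fortran DO loops; whether a particular compiler release vectorizes a given loop should be checked in its loopmark listing rather than assumed from this sketch.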

T90 Fortran90 (f90) Compiler Options

See also Section ``Execution of Cray T90 Fortran90 (f90)'' and Subsection ``T90 UNICOS f90 Compile, Load and Execution Commands''. Also see the appropriate sections, `docview' and `man cc' for items on Cray Standard C.

T90 Fortran90 (f90) Miscellaneous Extensions

``FORTRAN90 Array Notation''

real [variables-list]

POINTER (P,A)

``Execution Time Allocation''

open ([unit],file='[fn]',status='unknown')

save [variable or array name list separated by commas]

recursive [function or subroutine]([subprogram arguments])

[statement] ! [embedded comment]

intrinsic [f90-function1][,[f90-function2]]


Fortran90 Array Construction Functions

PACK([array],[mask-array][,[vector]])

UNPACK([vector],[mask-array],[field-array])

SPREAD([array],[dim],[ncopies])

RESHAPE([array],[shape][,[pad]][,[order]])

Fortran90 Array Reduction Functions

The reduction functions reduce the input array to a scalar output or, when [dim] is given, to an array of rank one less than the input.

SUM([array][,[dim][,[mask]]])

PRODUCT([array][,[dim][,[mask]]])

MAXVAL([array][,[dim][,[mask]]])

MINVAL([array][,[dim][,[mask]]])

COUNT([mask][,[dim]])

ANY([mask][,[dim]])

ALL([mask][,[dim]])
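What the optional [dim] argument does can be spelled out with explicit loops. The sketch below is in C and only mimics the Fortran semantics; the names `sum_dim1' and `sum_dim2' are hypothetical. It assumes the array is stored column by column, as Fortran stores it, so that for the 2 by 3 array b used in the test code of this guide, summing over dimension 1 gives one value per column and summing over dimension 2 gives one value per row.

```c
#include <stddef.h>

/* b is an nrow-by-ncol array stored column by column, as in Fortran:
   element (i,j), counted from zero, lives at b[i + j*nrow]. */

/* Analogue of sum(b,1): collapse dimension 1, one sum per column. */
void sum_dim1(size_t nrow, size_t ncol, const int *b, int *colsum)
{
    for (size_t j = 0; j < ncol; j++) {
        colsum[j] = 0;
        for (size_t i = 0; i < nrow; i++)
            colsum[j] += b[i + j * nrow];
    }
}

/* Analogue of sum(b,2): collapse dimension 2, one sum per row. */
void sum_dim2(size_t nrow, size_t ncol, const int *b, int *rowsum)
{
    for (size_t i = 0; i < nrow; i++) {
        rowsum[i] = 0;
        for (size_t j = 0; j < ncol; j++)
            rowsum[i] += b[i + j * nrow];
    }
}
```

With b holding 1,2,3,4,5,6 in storage order, `sum_dim1' yields 3, 7, 11 and `sum_dim2' yields 9, 12, matching the s3 and s2 values reported in the test output.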

Fortran90 Array Manipulation Functions

The manipulation functions rearrange the elements of the target matrix.

TRANSPOSE([array])

EOSHIFT([array],[shift][,[boundary][,[dim]]])

CSHIFT([array],[shift][,[dim]])
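The difference between the two shift functions is easiest to see on a single row. The following C sketch reproduces the index arithmetic for a one-dimensional array only; `cshift1d' and `eoshift1d' are hypothetical names chosen for illustration, and the sketch does not cover the [dim] argument or array-valued shifts.

```c
#include <stddef.h>

/* Circular shift: out[j] = in[(j + shift) mod n], with a modulus that
   is always non-negative so that negative shifts wrap around too. */
void cshift1d(size_t n, const int *in, int *out, long shift)
{
    long len = (long)n;
    for (long j = 0; j < len; j++) {
        long src = ((j + shift) % len + len) % len;
        out[j] = in[src];
    }
}

/* End-off shift: elements pushed past either end are dropped and the
   vacated positions are filled with the boundary value. */
void eoshift1d(size_t n, const int *in, int *out, long shift, int boundary)
{
    long len = (long)n;
    for (long j = 0; j < len; j++) {
        long src = j + shift;
        out[j] = (src >= 0 && src < len) ? in[src] : boundary;
    }
}
```

For the row 1 3 5, a circular shift by 1 gives 3 5 1 and by -1 gives 5 1 3, while an end-off shift by -1 with boundary 7 gives 7 1 3 and by 2 with boundary 0 gives 5 0 0.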

Fortran90 Array Location Functions

The location functions find the location of elements of the target matrix.

MAXLOC([array][,[mask]])

MINLOC([array][,[mask]])

Fortran90 Array Matrix Multiply Functions

The matrix multiply functions compute the matrix products of the target matrices.

MATMUL([array1],[array2])

DOT_PRODUCT([vector1],[vector2])

Fortran90 Array Functions TEST CODE

T90 Fortran90 (f90) Differences:

The following f90 code contains examples of use of many of the Fortran90 array intrinsic functions mentioned above. There are some rules:

An intrinsic statement is needed for all f90 intrinsics within f90 codes.
Constructors of the form b=(/1,2,3/) work with the f90 compiler.
Fortran90 array intrinsics used within f90 will take no auxiliary markers or keywords like "dim=" or "mask=".
Array sections cannot be used in print statements: NOT print*,b(1:3)

How do you sum an entire array only subject to a mask, but with no dimension restrictions?

    If  b =  1  3  5            logical mask=b.gt.3
             2  4  6

    then   s3=sum(b,1,mask)  or  s2=sum(b,2,mask) work when real s3(3),s2(2)

    but    isum=sum(b,mask)  or  isum=sum(b,,mask) or isum=sum(b,:,mask)
           do NOT work.
    That is, how do I enter a scalar dim for the whole array?
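Whatever syntax the compiler finally accepts, the value that a whole-array masked sum should return is easy to state. The C sketch below simply walks the flattened array and adds the elements whose mask entry is set; `masked_sum' is a hypothetical name, not a Cray or Fortran routine.

```c
#include <stddef.h>

/* Sum every element of a flattened array whose mask entry is nonzero,
   with no reduction along any particular dimension. */
long masked_sum(size_t count, const int *values, const int *mask)
{
    long total = 0;
    for (size_t k = 0; k < count; k++)
        if (mask[k])
            total += values[k];
    return total;
}
```

For the array b above with mask b.gt.3, the selected elements are 4, 5 and 6, so the result is 15. In standard Fortran 90 the same value is written `sum(b,mask=mask)' or `sum(pack(b,mask))'; whether the compiler release described here accepts either form is not confirmed in this guide.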

Here is a sample T90 Fortran 90 code `pgm.f' = `t90f90test.f' with many examples, heavily commented and followed by the actual output run on t90.npaci.edu using the commands

   f90 -O3 -r3 -o run pgm.f&
   run>&pgm.out&
%%%%%%%%%%% pgm.f=t90f90test.f %%%%%%%%%
      program f90test
code98:  compare ranf() and random_number pseudo random number generators
code97:  update by removing old comments to cmfortran
code96:  retest=f90test.f redone on borg = convex spp1200/xa-16
      integer, parameter :: m = 6
      integer, parameter :: n = 4
      integer :: i,j
      integer, dimension(2) :: s2, ctr1, ctr2, ctr3, b2
      integer, dimension(3) :: s3 ,at ,ar1 ,ar2 ,br1 ,br2
      integer, dimension(4) :: as(4)
      integer, dimension(2,2) :: c ,bi
      integer, dimension(2,3) :: b, a
      integer, dimension(3,2) :: ct
      integer, dimension(3,4) :: cs
      integer, dimension(4,3) :: cst
      logical, dimension(2,3) :: test
      logical, dimension(64,64) :: inmask
      real, parameter :: tol = 0.5e-5
      integer, parameter :: niter = 5000
      real :: diffav
      real, dimension(8,8) :: us
      real, dimension(64,64) :: u , du
      real :: ranf, xran
      real, dimension(m,n) :: uniranf, uniran
      real, dimension(n,m) :: truniranf, truniran
      intrinsic  sum,maxval,minval,product
     & ,dot_product,matmul,transpose
     & ,cshift,eoshift,spread
      data b/1,2,3,4,5,6/     !replace constructors initialization
      data as/2,3,4,5/
      data at/2,3,4/
c --------------------Array Constructors:
       b(1,1:3) = (/1, 3, 5/)  ! initialize first row, along dimension 2.
       b(2,1:3) = (/2, 4, 6/)  ! initialize second row, along dimension 2.
      print*,'Note: constructors like "(/1,2/)" allowed in fc9.5'
      br1 = b(1,:)
      br2 = b(2,:)
      print60,br1,br2
60    format(' b(2,3)'/(3i3))
c --------------------Sum Function sum:
      isum = sum(b) ! => isum = 21; i.e., Front-End scalar.
      print61,' isum=sum(b)=',isum
61    format(1x,a36,i4)
      isum = sum(b(:,1:3:2)) ! => isum = 14; sole ':' means all values '1:2'.
      print61,' isum = sum("b(:,1:3:2)")=',isum
      bi=b(:,1:3:2)
      isum=sum(bi)
      print61,' isum = sum("b(:,1:3:2)")=',isum
      print*,'CAUTION: "dim=", etc., markers= NOT allowed in intrinsics'
      s2 = sum(b,2) ! redeclared with the correct array section shape.
      print62,' s2 = sum(b,2)=',s2  ! => s2 = (/9,12/), row sums
62    format(1x,a32,2i3)
      s3 = sum(b,1)  ! => s3 = (/3,7,11/); column sums.
      print63,' s3 = sum(b,1)=',s3
63    format(1x,a32,3i3)
      print*,'CAUTION:  "mask=" marker= STILL not allowed either.'
      s3 = sum(b,1,b.gt.3) ! => s3 = (/0,4,11/); i.e., conditional col sum
      print63,' s3 = sum(b,1,"b.gt.3") =',s3  
      test=b.gt.3
      s3 = sum(b,1,test) ! => s3 = (/0,4,11/); i.e., conditional col sum
      print63,' s3 = sum(b,1,"b.gt.3") =',s3  
      s2 = sum(b,2,test) ! => s2 = (/5,10/); i.e., conditional row sum
      print62,' s2 = sum(b,2,b.gt.3) =',s2  
cf8er:isum = sum(b,0,test) ! => isum = 15; i.e., add only elements
cf8er:print61,' isum = sum(b,0,b.gt.3) =',isum ! that are greater than three.
      print*,' CAUTION:  If "sum(array[dim[,mask]])", CANT use zero (0)'
     &      ,' for [dim] for whole array when there is a mask.'
c --------------------Maximum Value Function maxval:
      imax = maxval(b) ! => imax = 6; array maximum value.
      print61,' imax = maxval(b)=',imax
      s3 = maxval(b,1) ! => s3 = (/2,4,6/); column maximums.
      print63,' s3 = maxval(b,1)=',s3
      s2 = maxval(b,2) ! => s2 = (/5,6/); row maximums.
      print62,' s2 = maxval(b,2)=',s2
c --------------------Minimum Value Function minval:
      imin = minval(b) ! => imin = 1; array minimum value.
      print61,' imin = minval(b)=',imin
c --------------------Product Function product:
      s2 = product(b,2) ! => s2 = (/15,48/); products of row elements.
      print62,' s2 = product(b,2)=',s2
c --------------------Dot Product Function dot_product:
      idot = dot_product(br1,br2) ! => idot = 44; dot product of row
      print61,' idot = dot_product(b(1,:),b(2,:))=',idot ! vectors of b.
      print*,' CAUTION:  Array syntax not allowed in actual arguments.'
c --------------------Matrix Multiplication Function matmul:
      ! assuming array b of the previous section.
      ![Ans] = matmul([Array_1],[Array_2]) ! computes matrix multiplication
                                           ! of two rank two matrices.
      c = matmul(b(:,1:2),b(:,2:3)) ! => c(1,:)=(/15,23/);c(2,:)=(/22,34/).
      c=transpose(c)
      print623,'c=matmul(b(:,1:2),b(:,2:3))=',c
623   format(1x,a36/(2i3))
      ![Ans] = transpose([Array]) ! transforms an array to its transpose.
      ct = transpose(b) ! => ct(1,:)=(/1,2/);ct(2,:)=(/3,4/);ct(3,:)=(/5,6/).
      ctr1 = ct(1,:)
      ctr2 = ct(2,:)
      ctr3 = ct(3,:)
      print623,'ct = transpose(b)=',ctr1,ctr2,ctr3
c --------------------Circular Shift Function cshift:
        ! assume b is again initialized as
        !        b =  1 3 5
        !             2 4 6
      a = cshift(b,1,2)  ! => a = 3 5 1
                         !        4 6 2
cshift  EG1:
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = cshift(a,1,2)=',ar1,ar2
633   format(1x,a36/(3i3))
    ! i.e., b(i,(j+shift) "mod" n) -> a(i,j) for j=1:2, etc.;
    ! nonstandard modulus fn: 0 "mod" n = n; 1 "mod" n = 1; ...;  n "mod" n = n
    ! i.e., the result is computed from shifting subscript in specified
        ! dimension of the source array by the specified shift.
      a = cshift(b,-1,2)  ! => a = 5 1 3
                          !        6 2 4
cshift  EG2:
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = cshift(b,-1,2)=',ar1,ar2
        ! i.e., b(i,(j+shift) "mod" n) -> a(i,j) for j=2:3, etc.
cshift  EG3:
      s2(1) = 1
      s2(2) = 2
      a = cshift(b,s2,2)  ! a = 3 5 1
                          !     6 2 4
        ! i.e., an array-valued shift, or shift per row.
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = cshift(b,(/1,2/),2)=',ar1,ar2
cshift Laplace Example:
        ! Jacobi Iteration for a 5-star discretization of 
        !        2D Laplace's equation:
      u = 0
      u(1,:)=2
      u(64,:)=2
      u(:,1)=2
      u(:,64)=1
      inmask = .FALSE.
      inmask(2:63,2:63) = .TRUE.
      diffav = 1
      iter=0
      do while (diffav.gt.tol.and.iter.lt.niter)
         iter=iter+1
         du = 0
         where(inmask)
            du = 0.25*(cshift(u,1,1)+cshift(u,-1,1)+cshift(u,1,2)
     &          +cshift(u,-1,2)) - u
            u = u + du
         end where
         du = du*du
         diffav = sqrt(sum(du)/(62*62))
      end do 
        ! which is the main program fragment of laplace.fcm.
      print*,'CAUTION: array sections not allowed in print'
      us = u(1:64:9,1:64:9)
      us=transpose(us)
      print66,'u = laplace-shift(u)= ; iter=',iter,'; av-diff ='
     &       ,diffav,us
66    format(1x,a36,i5,a11,e10.3/(8f8.4))
c --------------------End Off Shift Function eoshift:
      a = eoshift(b,-1,0,1) ! a = 0 0 0 note default boundary value is 0.
                            !     1 3 5
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = eoshift(b,-1,0,1)=',ar1,ar2
      s2=(/-1,0/)
      b2=(/7,8/)
      a = eoshift(b,s2,b2,2) ! => a = 7 1 3
                             !        2 4 6
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = eoshift(b,(/-1,0/),(/7,8/),2)=',ar1,ar2
      a = eoshift(b,2,0,2) ! => a = 5 0 0
                           ! =>     6 0 0
      ar1 = a(1,:)
      ar2 = a(2,:)
      print623,'a = eoshift(b,2,2)=',ar1,ar2
c --------------------Spread Function spread:
      cs = spread(as,1,3)
         ! contents of cs:
         !        2 3 4 5
         !        2 3 4 5
         !        2 3 4 5
      cst = transpose(cs)
      print64,'as =',as
64    format(1x,a32,4i3)
      print643,'cs = spread(as,1,3)=',cst
643   format(1x,a36/(4i3))
c --------------------
      cs = spread(at,2,4)
         ! contents of cs:
         !        2 2 2 2
         !        3 3 3 3
         !        4 4 4 4
      cst = transpose(cs)
      print63,'at =',at
      print643,'cs = spread(at,2,4)=',cst
c ---------------------------------------------------------------------------
! i.e., b=spread(a,d,c)  =>
! a(n_1,n_2,...,n_(d-1),n_d,...,n_r) -> b(n_1,n_2,...,n_(d-1),c,n_d,...,n_r)
! where r is the rank of source array a and n_i is the size of dimension i;
! noting that a new dimension of size c is added before dimension d.
c ---------------------------------------------------------------------------
! Initialize scalar xran with a pseudo random number
      call random_number(harvest=xran)
      call random_number(uniran)
! xran and uniran contain uniformly distributed random numbers
      truniran = transpose(uniran)
      write(6,65) xran, truniran
65    format(' f90 uniform random_number(): xran =',f14.10/ 
     &   ' and f90 subroutine random_number() uniform random array:'
     &         /(4f14.10))
! standard UNICOS random number generator ranf:
      do i = 1, m
         do j = 1, n
            uniranf(i,j) = ranf()
           enddo
      enddo
      truniranf = transpose(uniranf)
      write(6,651) truniranf
651   format(' UNICOS function ranf() uniform random array:'/(4f14.10))
      stop
      end
%%%%%%%%%%% end pgm.f=t90f90test.f %%%%%%%%%

Click here to get `t90f90test.f', T90 Fortran 90 code

Here is the output t90f90test.output:

%%%%%%%%%%% begin pgm.output = t90f90test.output %%%%%%%%%
 Note: constructors like "(/1,2/)" allowed in fc9.5
 b(2,3)
  1  3  5
  2  4  6
                         isum=sum(b)=  21
            isum = sum("b(:,1:3:2)")=  14
            isum = sum("b(:,1:3:2)")=  14
 CAUTION: "dim=", etc., markers= NOT allowed in intrinsics
                   s2 = sum(b,2)=  9 12
                   s3 = sum(b,1)=  3  7 11
 CAUTION:  "mask=" marker= STILL not allowed either.
         s3 = sum(b,1,"b.gt.3") =  0  4 11
         s3 = sum(b,1,"b.gt.3") =  0  4 11
           s2 = sum(b,2,b.gt.3) =  5 10
  CAUTION:  If "sum(array[dim[,mask]])", CANT use zero (0) for [dim] for whole array when there is a mask.
                    imax = maxval(b)=   6
                s3 = maxval(b,1)=  2  4  6
                s2 = maxval(b,2)=  5  6
                    imin = minval(b)=   1
               s2 = product(b,2)= 15 48
   idot = dot_product(b(1,:),b(2,:))=  44
  CAUTION:  Array syntax not allowed in actual arguments.
         c=matmul(b(:,1:2),b(:,2:3))=
 15 23
 22 34
                   ct = transpose(b)=
  1  2
  3  4
  5  6
                   a = cshift(a,1,2)=
  3  5  1
  4  6  2
                  a = cshift(b,-1,2)=
  5  1  3
  6  2  4
             a = cshift(b,(/1,2/),2)=
  3  5  1
  6  2  4
 CAUTION: array sections not allowed in print
        u = laplace-shift(u)= ; iter= 4730; av-diff = 0.499E-05
  2.0000  2.0000  2.0000  2.0000  2.0000  2.0000  2.0000  1.0000
  2.0000  1.9762  1.9479  1.9090  1.8491  1.7440  1.5208  1.0000
  2.0000  1.9573  1.9068  1.8387  1.7387  1.5836  1.3402  1.0000
  2.0000  1.9469  1.8844  1.8014  1.6836  1.5141  1.2817  1.0000
  2.0000  1.9469  1.8844  1.8014  1.6836  1.5141  1.2817  1.0000
  2.0000  1.9573  1.9068  1.8387  1.7387  1.5836  1.3402  1.0000
  2.0000  1.9762  1.9479  1.9090  1.8491  1.7440  1.5208  1.0000
  2.0000  2.0000  2.0000  2.0000  2.0000  2.0000  2.0000  1.0000
               a = eoshift(b,-1,0,1)=
  0  0  0
  1  3  5
   a = eoshift(b,(/-1,0/),(/7,8/),2)=
  7  1  3
  2  4  6
                  a = eoshift(b,2,2)=
  5  0
  0  6
  0  0
                             as =  2  3  4  5
                 cs = spread(as,1,3)=
  2  3  4  5
  2  3  4  5
  2  3  4  5
                             at =  2  3  4
                 cs = spread(at,2,4)=
  2  2  2  2
  3  3  3  3
  4  4  4  4
 f90 uniform random_number(): xran =  0.5801136486
 and f90 subroutine random_number() uniform random array:
  0.9505127350  0.3056509439  0.0986253383  0.6938844384
  0.7863714253  0.6891007107  0.2765484551  0.9344770142
  0.2976202640  0.3826622387  0.6204460278  0.2120929553
  0.4536999003  0.1329027055  0.0835029668  0.1306527482
  0.0062619416  0.8318579032  0.9903771206  0.8625969805
  0.2757364264  0.5829797958  0.9793469434  0.8189092940
 UNICOS function ranf() uniform random array:
  0.5407187129  0.0187994091  0.3141160167  0.7651821004
  0.9415271082  0.2893071356  0.5849975196  0.9030257778
  0.8866798463  0.4966670053  0.3964840582  0.8718218141
  0.9311052262  0.5954839343  0.2096123584  0.8881281192
  0.4641396487  0.6280308383  0.4467249313  0.4578495774
  0.2349011311  0.7635970977  0.5911920675  0.4438340178
 STOP   executed at line 222 in Fortran routine 'F90TEST'
 CPU: 1.827s,  Wallclock: 0.533s,  24.5% of 14-CPU Machine
 Memory HWM: 308988, Stack HWM: 37805, Stack segment expansions: 0
%%%%%%%%%%% end pgm.output = t90f90test.output %%%%%%%%%

Click here to get `t90f90test.output', corresponding T90 Fortran 90 output
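The Laplace example in the test code relies on `cshift' and `where' to update all interior points at once. For readers who want to see what that update computes, here is a sketch of the same Jacobi iteration with explicit loops in C. The function names `jacobi_sweep' and `jacobi_solve' are hypothetical, the grid is stored row by row, and the convergence test compares the mean-square change with the square of the tolerance, which is equivalent to the root-mean-square test in the Fortran code but avoids a square root.

```c
#include <stddef.h>

/* One Jacobi sweep on an n-by-n grid stored row by row in u.
   Interior points of unew get the average of their four neighbours
   in u; boundary points are copied unchanged.  Returns the mean of
   the squared changes over the interior points. */
double jacobi_sweep(size_t n, const double *u, double *unew)
{
    double sumsq = 0.0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            size_t k = i * n + j;
            if (i == 0 || j == 0 || i == n - 1 || j == n - 1) {
                unew[k] = u[k];
            } else {
                double du = 0.25 * (u[k - n] + u[k + n]
                                  + u[k - 1] + u[k + 1]) - u[k];
                unew[k] = u[k] + du;
                sumsq += du * du;
            }
        }
    }
    size_t interior = (n > 2) ? (n - 2) * (n - 2) : 1;
    return sumsq / (double)interior;
}

/* Repeat sweeps until the root-mean-square change is at most tol or
   niter sweeps have been done.  work is scratch space as large as u;
   the result is left in u.  Returns the number of sweeps performed. */
int jacobi_solve(size_t n, double *u, double *work, double tol, int niter)
{
    int iter = 0;
    double diffms = tol * tol + 1.0;   /* force at least one sweep */
    while (diffms > tol * tol && iter < niter) {
        diffms = jacobi_sweep(n, u, work);
        for (size_t k = 0; k < n * n; k++)
            u[k] = work[k];
        iter++;
    }
    return iter;
}
```

The two-array form makes explicit what the array syntax does implicitly: every new value is computed from the old grid before any point is overwritten.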

Cray T3E f90 Differences:

Here is a sample code with many examples, heavily commented and followed by the actual output run on t3e.npaci.edu using the commands

   f90 -O3 -r3 -Xm -o fpgm fpgm.f &
   mpprun -n1 fpgm  >& fpgm.output &

%%%%%%%%%%% pgm.f=cf97test.f %%%%%%%%%

f90 Library Functions

[variable] = ssum (n,a(m),k)

m X n

segldr

[variable] = sdot (n,a,1,b,1)

segldr

call mxm (a,m,b,kmax,c,n)

segldr

call mxv (a,m,b,n,c)

segldr

call random_number([HARVEST=][variable])

real s, r(100,100)
call random_number(harvest=s)
call random_number(r)

[random-variable] = ranf()

real s, r(100,100)
s = ranf()
do i = 1,100
   do j = 1,100
      r(i,j) = ranf()
   enddo
enddo

#include 
double _ranf(void);

call wheni[reln] ([nfind],[iarray],[inc],[itarget],[index],[nval]) (CR)

T90 Fortran90 (f90) Compiler Vector Toggling Directives

These statements are placed in the Fortran source just before the loop or other entity they are to affect, and they stay in effect until the opposite directive is given: for every toggling compiler directive that turns some action on, there is another directive with a `NO' prefix that turns that action off. The leading character of the directive must be in column 1 and a blank must be in column 6. For more information, see F90 Vol. 1: Fortran Reference Manual, Sect. 1.6 Compiler Directives.

!DIR$ VECTOR

!DIR$ NOVECTOR

!DIR$ VSEARCH

!DIR$ NOVSEARCH

!DIR$ INLINE

!DIR$ NOINLINE

T90 Fortran90 (f90) Compiler Scalar Optimization Directives

These directives affect scalar optimization at the point at which the directive appears and apply only to the local program unit, such as the loop it appears in.

!DIR$ BL

!DIR$ NOBL

!DIR$ NOSIDEEFFECTS [subprogram-name]

!DIR$ SUPPRESS [variable-list]

T90 Fortran90 (f90) Compiler Loop Directives

These compiler directives hold only for the loop immediately following the directive.

!DIR$ IVDEP

!DIR$ NEXTSCALAR

!DIR$ SHORTLOOP

!DIR$ RECURRENCE

!DIR$ NORECURRENCE

T90 Fortran90 (f90) Compiler Storage Directives

These compiler directives alter the way memory is handled.

!DIR$ VFUNCTION [external-function-list]

f90 Vol. 1

!DIR$ AUXILIARY [array-list]

T90 Fortran90 (f90) Compiler Diagnostic Directives

These are used the same way as the vector directives.

!DIR$ BOUNDS [array-names]

!DIR$ NOBOUNDS

!DIR$ FLOW

!DIR$ NOFLOW

For more information on compiler directives and other f90 statements, refer to the `Cray Fortran (CFT) REFERENCE MANUAL'. Additional information on SCILIB functions can be found in the Cray Library Reference Manual, a copy of which is found in the UIC Supercomputing Support Office along with many other manuals.


T90 Fortran (f90) Multitasking Options

The Cray supercomputers now have parallelization or tasking features in addition to vectorization features. However, the cost of running multitasked Cray Fortran is extremely large, because the user is charged for time on all processors utilized. In contrast, the user is not charged for each vector element with vectorization. HENCE THE USER SHOULD ONLY USE THE MULTITASKING FEATURES WHEN ABSOLUTELY NECESSARY. Macrotasking refers to large grain or subroutine level parallelization. Microtasking refers to parallel loop optimization through compiler directives. Autotasking refers to automatic microtasking by the Fortran Preprocessor `fpp', i.e., through automatic code generation for multitasking. Compiler f90, preprocessor fpp and mid-processor fmp are currently version 4.0 at NPACI. More information is found on the NPACI Cray T90, in the directory `/usr/local/doc' files or subdirectories such as `cf77_50.rn' release notes or `unicos.7.0' sections.

A typical job accounting execution sequence might be
ja (CR)
${TMP}/[fn] < [data] >& [output] & (CR)
with the job accounting information appearing in a file of the form `.jacct[jobid]'. Including the pass-through option `-Wd"-l [fn].ml"' will also produce an fpp summary listing in `[fn].ml' (but no executable) with the markers `P' for autotasked, `V' for vectorized, `N' for not chosen or not optimized, and `D' for data dependent.
f90 -O full -M [fn].f > [fn].m & (CR) : The `-M' option results in the intermediate Fortran file `[fn].m' with microtasking directives automatically inserted into the `[fn].f' source using the dependence analysis of the `fpp' preprocessor; no object or executable file is produced; the user can insert additional compiler directives into `[fn].m' and compile it with the Cray Fortran multitasking processor `fmp', the translator of the directives, by `fmp [fn].m > [fn].j (CR)'; the intermediate expanded file `[fn].j' is further assembled, linked and loaded by `sld -o [exec] [fn].j (CR)'.


MPI Message Passing Programming on Crays.

MPI or Message Passing Interface is a library of subroutines in Fortran (procedures in C) that facilitates the message-passing form of parallel programming in a distributed computer or network environment. At NPACI, MPI is especially useful for writing parallel programs for the Cray T3E massively parallel processors. Eventually, MPI will replace PVM, but currently there is more information about PVM than about MPI. MPI is more abstract and complicated than PVM, since a lot of the features of MPI are hidden behind its functions and its own compile and execution commands. For relevant information on MPI, consult the following pages, especially the example page:


PVM Message Passing Programming on Crays.

PVM or Parallel Virtual Machine is a library of subroutines in Fortran (functions in C) that facilitates the message-passing form of parallel programming in a distributed computer or network environment. At NPACI, PVM is especially useful for writing parallel programs for the Cray T3E massively parallel processors. PVM is used as a simple layer of commands within a Fortran or C code, needing in most instances a PVM include statement, but the code is compiled and executed with the usual Fortran and C commands. For relevant information on PVM, consult the following pages, especially the example page:


Cray T90 f90 and cc Timing Utility Functions.

T90 Fortran90 (f90) Timing Utility Functions

[time-variable] = second() : The standard UNIX Fortran seconds timer utility, whose output value is user cpu time in seconds, as opposed to system ``cpu time'' and wall clock time (the sum of user and system times); it also exists in the format `call second([time-variable])'; for timing large loops, `second' overhead should be negligible; for most sizes of loops, the timing part of the code, with `!' marking comments on the statement line, might look like:

         real tv(100),cputim(100)
         character*24 tchar(100)                                       
         kt = 1                                                        
         tv(kt) = second()      ! first 2 calls get the overhead       
         kt = kt + 1                                                   
         tv(kt) = second()      ! initial time                         
code-continues
          ...  more code ...
code-continues
         kt = kt + 1
         tchar(kt) = 'loop [999]'
         tv(kt) = second()      
           do [999] i = 1, [1000]
code-continues
     ... rest of do .... 
code-continues
999      continue
         kt = kt + 1
         tv(kt) = second()      !tv(kt) - tv(kt-1) = do-cputime
code-continues
   ... more do loops and more timing step pairs .... 
code-continues
         kt = kt +1
         tv(kt) = second()            !final time
         overhd = tv(2) - tv(1)      !timer second overhead
           do [99999] ks =3, kt - 2      !cpu-time for each timed loop
         cputim(ks) = tv(ks+1) - tv(ks) - overhd
         write(6,[99998]) ks, cputim(ks), tchar(ks)
Comment:  writes hinder vector optimization, so save writes until last
99999    continue
99998    format(1x,i3,' time =',f12.7,' for ',a)
         cputot = tv(kt) - tv(2) - (kt-2)*overhd
Caution: due to overhead variability, total can be off for small job
         write(6,*) 'total cpu-time =',cputot

[flag] = gettimeofday(&tp,&tzp);

' and that the following structures be declared:

struct timeval tp ;
/* timeval is a structure with pointer name tp and having      */
/* unsigned long  tp.tv_sec giving  seconds since Jan. 1, 1970 */
/* long  tp.tv_usec giving microseconds                        */
struct timezone tzp;         /* needed only for time zone data */

See `man gettimeofday' for more information and Cray T90 C Starter example: `t90startcc.c'.

[time-variable] = tsecnd() : f90 task timer utility giving the cpu time for a task during multitasking.

cc Timing Utility Function

gettimeofday

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#define NTime 20

main()
{
/* Time variables */
   struct timeval tp ;
/* timeval is a structure with pointer name tp and having */
/* unsigned long  tp.tv_sec giving  seconds since Jan. 1, 1970 */
/* long  tp.tv_usec giving microseconds */
   struct timezone tzp; /* needed only for time zone data */
   int gtod, kt;
   long int tsecs[NTime], tmicrosecs[NTime];
   long int ttot[NTime], ttotmoh[NTime];
   float ts1, tt1, tu1, tu2, ts2, tt2, tu3, ts3, tt3;
   double ttotf;

/* begin main code */
   if (gettimeofday(&tp,&tzp) == -1) { perror("gettimeofday failed"); exit(1);}
   kt = 1;
/* gettimeofday = Microsecond Wall Timer C function;                         */
/* WallTime = UserTime + SystemTime, Undecomposed;                           */
/* gettimeofday returns gtod = 0 if successful;                              */
/* tv_sec in secs since 1/1/70;                                              */
/* tv_usec in added microseconds;                                            */
/* tzp gives the timezone;                                                   */
   gtod=gettimeofday(&tp,&tzp);
   tsecs[0] = tp.tv_sec;
   tmicrosecs[0] = tp.tv_usec;
   ++kt;
   gtod=gettimeofday(&tp,&tzp);
   tsecs[1] = tp.tv_sec;
   tmicrosecs[1] = tp.tv_usec;
/*   ...... MUCH DELETED CODE ........ */

/*   ...... MUCH DELETED CODE ........ */
/* Clock: Elapsed Total Time: */
   ++kt;
   gtod=gettimeofday(&tp,&tzp);
   tsecs[kt] = tp.tv_sec;
   tmicrosecs[kt] = tp.tv_usec;
/* Total Elapsed Time Including Clock Overhead*/
   ttot[kt] = (tsecs[kt]-tsecs[1])*1000000+(tmicrosecs[kt]-tmicrosecs[1]);
/* Total Elapsed Time Minus Clock Overhead */
   ttotmoh[kt] = ttot[kt]
      - ((tsecs[1]-tsecs[0])*1000000+(tmicrosecs[1]-tmicrosecs[0]));
   printf("\nIntermediate Raw Timing Output:");
   printf("\ntmicrosecs[(0,1,kt)]=(%12ld,%12ld,%12ld), in microseconds",
      tmicrosecs[0],tmicrosecs[1],tmicrosecs[kt]);
   printf("\ntsecs[(0,1,kt)]=(%12ld,%12ld,%12ld), in seconds",
      tsecs[0],tsecs[1],tsecs[kt]);
   printf("\n(ttot[kt],ttotmoh[kt])=(%12ld,%12ld), in microseconds",
      ttot[kt],ttotmoh[kt]);
   if (ttot[kt] < 0){printf("\n  Error:Negative Times:Bad Clock:Rerun Job\n");}
   ttotf = ttotmoh[kt]/1.e6;
   printf("\n T90 Starter C Problem Output");
   printf("\n  Timing Output:");
   printf("\n   final total time=%12.4e, in seconds\n",ttotf);
/* Change:  Extra output statements: */
}
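The arithmetic in the listing above, combining the seconds and microseconds fields and then removing the cost of one timer call, can be isolated in two small helpers. This is a sketch only; `elapsed_usec' and `elapsed_minus_overhead' are hypothetical names, and the sketch assumes the usual `struct timeval' from <sys/time.h>.

```c
#include <sys/time.h>

/* Microseconds elapsed from *start to *end.  Seconds and microseconds
   are combined, so a tv_usec that wraps past a whole-second boundary
   does not produce a spurious negative difference. */
long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000000L
         + (long)(end->tv_usec - start->tv_usec);
}

/* Elapsed time of an interval with the cost of one timer call removed.
   t0 and t1 are two back-to-back readings used to estimate that cost;
   t1 also marks the start of the timed interval and tend its end. */
long elapsed_minus_overhead(const struct timeval *t0,
                            const struct timeval *t1,
                            const struct timeval *tend)
{
    return elapsed_usec(t1, tend) - elapsed_usec(t0, t1);
}
```

A caller would fill three `struct timeval' variables with `gettimeofday' (two consecutive calls before the timed code and one after) and pass their addresses to `elapsed_minus_overhead'. As the guide cautions for the Fortran timer, the overhead estimate varies from call to call, so the corrected total can be off for very short jobs.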

Table of T90/T3E Timers

T90 (perhaps MPP) Timer Summary ... MCS572 F95/FBH
Timer         TimeMeasured    Units         Comments
-----         ------------    -----         --------
clock         System&User     Microseconds
cpused        User            ClockTicks    RTC ticks
gettimeofday  WallTime        Microseconds  plus many other things from TOD; C fn
ja            ElapsedUserSys  Seconds       plus more; on T90; only mppexec for T3E
rtclock       User            ClockTicks    current RTC ticks
RTC           RealTimeClock   ClockTicks    float version
IRTC          RealTimeClock   ClockTicks    int version
second        User            Seconds       Coarse, not useful for small timings
secondr       ElapsedWall     Seconds       Coarse, not useful for small timings
sysclock      RealTimeClock   ClockTicks    plus #wraps (overflows)
timef         ElapsedWall     Milliseconds  Fn. gives elapsed time since 1st call
times         Process&Child   ClockTicks    needs include <sys/times.h>
timex         ElapsedUserSys  Seconds       depends on opts in timex [opts] [cmd]
tsecnd        ElapsedTask     Seconds       for current multithreaded task

Notes: There are several other timers, but they are not appropriate for scientific computing. For actual use, consult the timer man page. Ideally, a timer should give user time in intervals as small as microseconds. Hence, an ideal timer for the T3E would have to be designed from an rtc clock. Job accounting ja is done on the T90, but gives mppexec time (must be T3E time). Using the C routine `gettimeofday' would be a rough approximation, as was suggested on the now extinct Thinking Machines Corp. CM-5.

T3E MPI Wall Timer

The Cray T3E at NPACI has a wall timer MPI_Wtime in seconds that works with MPI parallel programming codes for both f90 and cc codes. See the following information on MPI_Wtime and related functions:

MPI_Wtime Man Page, Measurements in Seconds.
MPI_Wtick Man Page, Resolution or Finest Time Interval Measured.
Sample Fortran f90 Code Illustrating MPI_Wtime Usage, for Laplace-Jacobi Application.
Sample C Code Illustrating MPI_Wtime Usage, for Laplace-Jacobi Application (under revision).

The best way to learn these commands is to use them in an actual computer session.

Good luck.


Please report to Professor Hanson any problems or inaccuracies:

hanson@uic.edu

Web Source: http://www.math.uic.edu/~hanson/crayguide.html
