24 March 2003
University of Illinois at Chicago
851 S. Morgan; SEO, MC 249
Chicago, IL 60607-7045
This User's Local Guide is intended to be a sufficient, hands-on introduction to the Pittsburgh Supercomputing Center TCS (Terascale Computing System) parallel cluster for our MCS 572 Introduction to Supercomputing class. The TCS runs Tru64 UNIX, Compaq's variant of the UNIX operating system.
The PSC Class Account for MCS572 Fall 2000 is `sc70jpp' for the PSC Grant SEEE030003P.
The PSC TCS MC512 is a large-scale parallel cluster with 64 Compaq (HP) AlphaServer compute nodes, each with four 667 MHz processors, for a total of 256 processors. The PSC TCS's internet address is
with the prompt of `%'. For TCS information from PSC, see
Remark: There are a lot of inaccuracies in this outdated page, some of which are corrected in this local guide.
The TCS is a prototype for a larger, final terascale system called Lemieux, with 750 compute nodes and a total of 3000 processors. Lemieux's web page should be consulted for updated system information:
The TCS and Lemieux AlphaServer System Reference card is found at
What does the PSC TCS look like? PSC TCS Picture
A simple view of the TCS architecture is given for the larger system:
Each compute node is an HP AlphaServer SC ES40/EV67 node configured as a 4-processor Symmetric MultiProcessor or Shared Memory Processor (SMP) with 4 GigaBytes (GB) of memory (RAM). The TCS cluster nodes are connected with a proprietary Quadrics Interconnection (IC) network. For more information on the compute nodes, see the nice Compaq slide show of N. Srivastava:
The PSC TCS, installed at PSC in April 2001, ranks as the 246th top computer in the world (Top 500 Computer Reports, November 2002, Source: http://www.top500.org). It has a measured maximum speed of Rmax = 264 GigaFlops (GF) on the LINPACK linear algebra benchmark, reached at the maximum order run Nmax = 106,000, and a theoretical peak speed of Rpeak = 342 GF. In terms of the Hockney Linear Model (see MCS572 class notes), Rpeak serves as the asymptotic peak speed (also called Rinfinity), and the half-performance order is N1/2 = 20,000. These values are given at the web link above, or see the class summary
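The Hockney linear model predicts the speed on a problem of order N as r(N) = Rinfinity * N / (N + N1/2), so half the asymptotic speed is reached at N = N1/2. The C sketch below evaluates the model with the parameters quoted above. It also checks the quoted Rpeak, under the assumption that each 667 MHz Alpha EV67 completes two floating-point operations (one add and one multiply) per clock cycle. The function names are illustrative and are not taken from the class notes.

```c
#include <stdio.h>

/* Hockney linear model: speed (GF) on a problem of order n, given the
   asymptotic speed r_inf (GF) and the half-performance order n_half. */
double hockney_rate(double r_inf, double n_half, double n)
{
    return r_inf * n / (n + n_half);
}

/* Theoretical peak in GF: processors x clock (GHz) x flops per cycle.
   Two flops per cycle for the EV67 is an assumption of this sketch. */
double peak_gflops(int processors, double clock_ghz, int flops_per_cycle)
{
    return processors * clock_ghz * flops_per_cycle;
}

/* Print model speeds for the TCS parameters quoted in the text. */
void print_tcs_model(void)
{
    const double r_inf = 342.0, n_half = 20000.0;
    const double orders[] = { 20000.0, 50000.0, 106000.0 };
    for (int i = 0; i < 3; i++)
        printf("N = %6.0f  r(N) = %6.1f GF\n",
               orders[i], hockney_rate(r_inf, n_half, orders[i]));
    printf("Rpeak = %.1f GF\n", peak_gflops(256, 0.667, 2));
}
```

With these parameters the model gives about 288 GF at N = 106,000, which is above the measured Rmax of 264 GF. The two parameters should therefore be read as a rough fit, not as an exact description of the machine.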
The random access memory (RAM) is 4 GB of shared memory on each 4-processor node, but across the cluster of 64 nodes it is distributed memory, 256 GB in total. The 256-processor system as a whole therefore has a hybrid memory system. Each processor (CPU) has an 8 MB L2 cache memory (level 2 local memory).
The operating system is Compaq Tru64 UNIX V5.1A (Rev. 1885). Program execution on the TCS is done by remote batch scheduling, so the user works with a combination of the Portable Batch System (PBS) and the UNIX Network Queueing System (NQS), and should refer to the subsections on those topics. See
where the Shell Path "[shellpath]" can be found with system "which" in the format:
where "[shell]" is the standard system shell "sh", the Bourne-again shell "bash", the Korn shell "ksh", or another shell. However, all of the NQS QSUB job scripts given here assume the C-shell. The C-shell uses the resource configuration file ".cshrc" in the user's home directory, which can be used to define commands and make aliases (format: "alias [aliasname] [aliasdefinition]"; quotation marks are needed when the definition contains special command characters). A sample of a ".cshrc" file for use on the TCS is
Users MUST access the PSC TCS directly using the Secure Shell (ssh), such as from UIC `icarus' or from department systems,
SSH works like the Unix remote login command `rlogin', but encrypts your password so that it is nearly impossible to steal. The commands `rlogin' or `telnet' do not work with the `tcs', resulting in the response "tcs.PSC.edu: Connection refused". See "man ssh" for help from the UNIX manual pages.
SSH is a UNIX command found on many UNIX systems, but you can get a free MS Windows version that comes in two main flavors:
Users MUST transfer files between the PSC TCS and UIC using Secure Shell (ssh) commands, such as secure copy scp or secure FTP sftp. For example, from UIC
SCP Secure Copy:
This form of the command works well for a single file, which can also have a directory path, but the user password has to be given each time. For multiple files a wild card version can be used, e.g., for all C files from PSC, omitting the target file name:
See "man scp" for help from the UNIX manual pages.
SFTP Secure File Transfer Protocol:
Also, you can use the secure File Transfer Protocol (FTP) program sftp. It works like the usual FTP, except that you cannot abbreviate the FTP subcommands (e.g., type the full subcommand "put"), and it secures your session better. For example, from UIC,
Remark: If your username is the same at both UIC node and PSC node,
then the "[username]@" is optional.
See "man sftp" for help from the UNIX manual pages.
HOME Directory:
Each PSC User has a home directory to keep files and subdirectories with
the full path specified by "/usr/users/[n]/[username]" where
[n] = 0:9. The home directory can be more simply referenced
by the UNIX symbol ~ or the UNIX environment variable representation ${HOME}, as in cd $HOME to change directory
back to home or ls ${HOME}/mcs572 to list the contents of a home
subdirectory "mcs572". The curly brackets are
optional in both examples, since "/" cannot be part of a variable name; they are required only when
"HOME" is followed by characters that could extend the name, such as letters, digits or underscores. Home directory quotas are
100MB (Mb?), but may not be enforced.
SCRATCH Directory:
Each user has a scratch or work directory "/usr/scratch/[nx]/[username]"
where [nx] = 0:9 or [nx] = 0x:9x, and these directories
are linked to the disks /scratch1/ or /scratch2/. The
user's scratch directory can simply be referenced by the meta representation
${SCRATCH}, where the curly brackets are optional if ${SCRATCH} is used as a sole argument. It is strongly recommended
that the scratch directory be used for scheduling batch jobs
(essentially the only jobs allowed) on the TCS cluster with the qsub
queueing submit command, together with all necessary input files. Caution:
On the tcs.html webpage the no longer existing $STAGE is incorrectly listed for the work directory.
LOCAL Directory:
Each TCS cluster computing node has global node memory accessible to all
four of its processors and that memory is accessible to the user only
when the user's code is executing, technically beginning with the
qsub script required shell identification, e.g.,
"#!/bin/csh" escape to the C-shell. Hence, it should not
be necessary to change the current directory to ${LOCAL}.
However, the parallel run command prun needs the seemingly redundant
"./" prefix on the executable, as in "./[executable]". On the tcs.html webpage the no longer existing $TMPDIR is incorrectly listed for the compute node directory.
Remark: The commands "qsub" and "prun" are described further below.
FAR File ARchiver System:
The FAR system runs on golem.psc.edu and is accessible from
TCS and Lemieux for large file storage for long periods of time. You
will need information on the Andrew File System (AFS) and FAR special
instructions to use it and more
information on FAR is at
However, for the class, you will likely not be needing FAR.
The TCS programs are compiled directly on the TCS, given here
with some typical options, using the
Fortran90 Compiler:
C Compiler:
C++ HP Compiler:
In the above compilation commands, the options are
PRUN Parallel Run Command:
NQS Job Scripts:
and one for a PSC TCS 4 processor script for Fortran 90 code is given in
The new user should study these sample job scripts and others listed on
the class homepage:
Executable Job Scripts:
Else
chmod 755 fpgm4.job
NQS qsub Submit Command:
These job scripts are run with the NQS QSUB submit
command from the user's `${SCRATCH}' scratch directory, for example,
NQS qstat Status Command:
The job status can be checked by the NQS QSTAT status command:
NQS qdel Delete Command:
If for any
reason you need to kill the job before the end, first note the job
id number `[job_id]' at the beginning of your job line in the "qstat
-u [tcs-username]" output, then enter the command:
Job Script Examples:
A user can try out the class sample NQS QSUB job scripts by downloading
and copying
one of the following sample codes
to your home directory and then recopying it, say "[ExampleCode].c"
or "[ExampleCode].f"
to the recyclable source file of the form `*pgm.*' as follows:
The user will also have to create a simple input data file called
"cdata", or use the Pi Code example data file, because the qsub scripts
are written to take a data file as standard input
(e.g., use the editor "vi" to put the set of integration points,
terminated by zero, into cdata).
Then, in the home directory, enter the queue submit command for
4 processors on a single node:
then check for a finished job with "qstat -u [psc-username]" until
your queue record is no longer displayed, and finally look for the standard output and standard error files,
for example "ls -l *pgm4.output *pgm4.error". You can always modify the
sample job scripts to suit your particular job requirements, your own
file naming preferences or if you prefer to open and close files in the
code by hand.
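The sample scripts redirect "cdata" into standard input, so the program only has to read counts until it reaches the terminating zero. The C sketch below shows one way to do this. It assumes one integer per line; the actual class Pi code is not reproduced here, and `read_counts' is a hypothetical name.

```c
#include <stdio.h>

/* Read positive integers (e.g., numbers of integration points) from `in'
   until a zero or negative value, end of file, or bad input.  At most
   `max' values are stored in `counts'; the number stored is returned. */
int read_counts(FILE *in, long counts[], int max)
{
    int stored = 0;
    long n;
    while (stored < max && fscanf(in, "%ld", &n) == 1 && n > 0)
        counts[stored++] = n;
    return stored;
}
```

In a program run by a job script, the call would be `read_counts(stdin, counts, 100)', with "< cdata" supplying the input.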
MCS 572 Class MPI webpages:
PSC MPI Basics:
The Cray native SHMEM communication library is also available, but it
is optimized only for communication between nodes (like ELAN), not within a node:
OpenMP is supported in Tru64 UNIX for C and Fortran, but not C++:
For TCS information from PSC, see
This local guide is meant to indicate ``what works'' primarily for
access from UNIX systems to PSC TCS. The use of the
Unix C-Shell on the TCS is assumed throughout most of this local guide.
UNIX is a registered trademark of The Open Group.
Computer prompts or broadcasts will be enclosed in double quotes
(``_''),
background comments will be enclosed in curly braces
({_}),
commands cited in the comments are highlighted by single quotes or
double quotes depending on emphasis
(`_') or ("_")
{do not type the quotes when typing the commands}, and
optional or user specified arguments are enclosed in square brackets
([_])
{However, do not enter the square brackets.}. The symbol
(CR)
will denote an immediate carriage return or enter.
{Ignore the blanks
that precede it as in `[command] (CR)', making it easier to read.}
The symbol
(Esc)
will denote an immediate pressing of the Escape-key
{Use no brackets please.}
The symbol
(SPACE)
will denote an immediate pressing of the Space-bar
{Warning: Do not type any of these notational symbols in an
actual computer session.}
For further information, please consult the sources (you can just click on
the highlighted topics to access them if you are surfing the World Wide Web):
UNDER RECONSTRUCTION
To find out what other special software is at NPACI click on:
NPACI Installed Software
See `man cc' or `docview' for more information.
For more information
about batch processing with NQS, click on:
For optimization, it is recommended that your
f90 program aid the f90 vector
model, i.e., structure the code so that the compiler can automatically
recognize loops as vectorizable. Usually only the innermost loop is
vectorizable. Avoid GOTOs and IFs in loops. Avoid CALLs within loops.
Avoid READs and
WRITEs in loops. Use vectorizable functions. Avoid data dependencies.
Use compiler directives, such as `!DIR$ VECTOR' and `!DIR$ NOVECTOR'.
Minimize vector strides. Tune code to Fortran's column-major storage of arrays
in physically linear memory.
Don't even think about using tabs, except in makefiles.
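The following C sketch shows why loop order matters under Fortran's column-major storage. It stores an m-by-n matrix column by column, as Fortran does, so element (i,j) (0-based) sits at offset i + j*m. Putting i in the innermost loop gives stride-1 access; swapping the loops would give stride m. The helper names are illustrative.

```c
#include <stddef.h>

/* Offset of element (i, j), 0-based, in an m-by-n matrix stored
   column by column (Fortran order). */
size_t colmajor_index(size_t i, size_t j, size_t m)
{
    return i + j * m;
}

/* Scale every element of a column-major m-by-n matrix by s.
   The inner loop runs down one column, so consecutive iterations touch
   consecutive memory locations (stride 1), the vector-friendly order. */
void scale_colmajor(double *a, size_t m, size_t n, double s)
{
    for (size_t j = 0; j < n; j++)          /* outer loop over columns */
        for (size_t i = 0; i < m; i++)      /* inner loop: stride 1 */
            a[colmajor_index(i, j, m)] *= s;
}
```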
See also
Section ``Execution of Cray T90 Fortran90 (f90)''
and
Subsection ``T90 UNICOS f90 Compile, Load and
Execution Commands''. Also see the appropriate
sections, `docview' and `man cc' for items on Cray Standard C.
The reduction functions reduce the input to a scalar output.
The manipulation functions rearrange the elements of the target matrix.
The location functions find the location of elements of the target matrix.
The matrix multiply functions compute the matrix products of the target
matrices.
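For readers who know C better than Fortran, the sketch below writes out with plain loops what three of these intrinsic categories compute: a reduction like SUM, a location function like MAXLOC, and a matrix multiply like MATMUL on column-major matrices. Fortran's MAXLOC returns a 1-based index, while this version returns a 0-based one. These loops illustrate the semantics only; they are not the Cray library implementations.

```c
#include <stddef.h>

/* Reduction: like Fortran SUM(x), reduce n elements to one scalar. */
double sum_array(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Location: like Fortran MAXLOC(x), but returning the 0-based index
   of the first maximal element; n must be at least 1. */
size_t maxloc_array(const double *x, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (x[i] > x[best])
            best = i;
    return best;
}

/* Matrix multiply: like C = MATMUL(A, B) with A m-by-k, B k-by-n and
   C m-by-n, all stored in column-major (Fortran) order. */
void matmul_colmajor(const double *a, const double *b, double *c,
                     size_t m, size_t k, size_t n)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++) {
            double s = 0.0;
            for (size_t p = 0; p < k; p++)
                s += a[i + p * m] * b[p + j * k];
            c[i + j * m] = s;
        }
}
```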
The following f90 code contains examples of use of many of the Fortran90 array
intrinsic functions mentioned above.
There are some rules:
Here is a sample code with many examples, heavily commented and followed
by the actual output run on t3e.npaci.edu using the commands
These statements are placed in the Fortran source just before
the loop or other entity they are to affect, and they stay in effect
until the opposite directive is given. For every toggling
compiler directive that turns some action on, there is another
directive with a `NO' prefix that turns that action off.
The leading `C' must be in column 1 and a blank must be in column 6.
For more information, see F90 Vol. 1: Fortran Reference Manual,
Sect. 1.6 Compiler Directives.
These directives affect scalar optimization at the point at which
the directive appears and affect only the local program unit, such as
the loop it appears in.
These compiler directives hold only for the loop immediately
following the directive.
These compiler directives alter the way memory is handled.
These are used the same way as the vector directives.
For more information on compiler directives
and other f90 statements,
refer to the `Cray Fortran (CFT) REFERENCE MANUAL'. Additional information
on SCILIB functions can be found in the Cray Library Reference Manual,
a copy of which is found in the UIC Supercomputing Support Office
along with many other manuals.
The Cray supercomputers now have parallelization or tasking
features in addition
to vectorization features. However, the cost of running multitasked Cray Fortran
is extremely large, because the user is charged for time on all
processors utilized. In contrast, the user is not charged for each
vector element with vectorization. HENCE THE USER SHOULD ONLY
USE THE MULTITASKING FEATURES WHEN ABSOLUTELY NECESSARY.
Macrotasking refers to large grain or subroutine level
parallelization. Microtasking refers to parallel loop
optimization through compiler directives. Autotasking
refers to automatic microtasking by the Fortran Preprocessor
`fpp', i.e., through automatic code generation for multitasking.
Compiler f90, preprocessor fpp and mid-processor
fmp are currently version 4.0 at NPACI.
More information is found on the NPACI Cray T90, in the
directory `/usr/local/doc' files or subdirectories such as
`cf77_50.rn' release notes or `unicos.7.0' sections.
A typical job accounting execution
sequence might be
[time-variable] = second()
: The standard UNIX Fortran
seconds timer utility, whose output value is user CPU time
in seconds, as opposed to system CPU time
and wall-clock (elapsed) time. It also exists
in the format `call second([time-variable])'. For timing large
loops, the `second' overhead should be negligible;
for most sizes of loops, the timing part of the code, with `!' marking
comments on the statement line, might look like:
[time-variable] = tsecnd()
: f90 task timer utility
giving the cpu time for a task during multitasking.
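second() and tsecnd() are Cray Fortran utilities. As a portable C analog, the sketch below measures CPU time with the standard clock() function and wall-clock time with the C11 timespec_get() function, around a simple stand-in loop. For MPI codes on the TCS, the MPI_Wtime wall timer mentioned in this guide would play the wall-clock role. The helper names are illustrative.

```c
#include <time.h>

/* Processor time used by this process so far, in seconds, as reported
   by the standard clock() function. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Wall-clock time in seconds from the C11 calendar clock. */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* Time a simple loop of n iterations.  Returns the CPU seconds used and
   stores the elapsed wall seconds in *wall.  The volatile accumulator
   keeps the compiler from deleting the loop. */
double time_loop(long n, double *wall)
{
    volatile double acc = 0.0;
    double c0 = cpu_seconds(), w0 = wall_seconds();
    for (long i = 0; i < n; i++)
        acc += 1.0e-6 * (double)i;
    *wall = wall_seconds() - w0;
    return cpu_seconds() - c0;
}
```

As with `second', timer overhead matters only for short loops; for a large n the two timer calls are negligible.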
The Cray T3E at NPACI has a wall timer MPI_Wtime in seconds that works with
MPI parallel programming codes for both f90 and cc codes. See the following
information on MPI_Wtime and related functions:
The best way to learn these commands is to use and test them in an actual
computer session on the TCS Cluster.
Please report to Professor Hanson any problems or inaccuracies:
TCS File Systems.
TCS Programming Languages.
f90 -O -lmpi -lelan -arch ev67 -lm -o [executable] [source].f
or the
cc -O -lmpi -lelan -arch ev67 -lm -o [executable] [source].c
or the
cxx -O -lmpi -lelan -arch ev67 -lm -o [executable] [source].c
See "man cxx" for help from the UNIX manual pages or
http://h30097.www3.hp.com/cplus/cxx_ref.htm
prun -N [Number_Nodes] -n [Number_Processors] [executable] < [data]
where "< [data]" means the data file is directed into standard UNIX input.
An executable cannot run in parallel without "prun". Usually, the
number of nodes "[Number_Nodes]" and the number of processors "[Number_Processors]" are specified by the local environment
variables ${RMS_NODES} and ${RMS_PROCS}, respectively. Both must
be set initially by a PBS statement in the QSUB script or in the options of the
"qsub" command, which automatically initializes these variables.
See "man prun" for help from the UNIX manual pages.
TCS Batch Queueing Systems: PBS and NQS with MPI.
Remote job scheduling on the TCS is accomplished by
using the UNIX Network Queueing System (NQS) job scripts,
but the script directives use the so-called Portable Batch System (PBS)
Directives
used on the HP Alphaservers, in place of the usual NQS Directives.
A sample PSC TCS 4 processor target job script for C code is given in
Before any job script can be used as an argument of the qsub command,
the job script must be made executable for all,
e.g., using the UNIX change mode command:
chmod 755 cpgm4.job
where, in the second form, the file should already be readable (r).
or
chmod a+x cpgm4.job
for C
or
chmod a+x fpgm4.job
for Fortran 90
qsub cpgm4.job
where `${SCRATCH}' denotes the meta-name of the user's scratch directory on
the TCS cluster.
See "man qsub" for help from the UNIX manual pages.
for C
or
qsub fpgm4.job
for Fortran 90
qstat -u [tcs-username]
and when done, the user can view the output, if any. In the status column
headed "S", "Q" means that the job is queued waiting
to run, "R" means running, and "E" means exiting.
See "man qstat" for help from the UNIX manual pages.
qdel [job_id]
which should stop a running job, unless the system is busy.
See "man qdel" for help from the UNIX manual pages.
cp [ExampleCode].c cpgm.c
for C or F90, respectively.
or
cp [ExampleCode].f fpgm.f
qsub cpgm4.job
or
qsub fpgm4.job
TCS Message Passing Interface (MPI) Sources.
More TCS Information.
Guide Notation.
Background References
MPI Message Passing Programming on TCS.
MPI or Message Passing Interface is a library of subroutines in
Fortran (functions in C) that support the message-passing form of parallel
programming in a distributed computer or network environment. At NPACI, MPI
is especially useful for writing parallel programs for the Cray T3E (T3E)
massively parallel processors. Eventually, MPI will replace PVM, but
currently there is more information about PVM than about MPI. MPI is more
abstract and complicated than PVM, since a lot of the features of MPI are
hidden behind its functions and its own compile and execution commands.
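A typical MPI program splits its loop iterations among the processes and then forms a global sum. A classic example computes pi by applying the midpoint rule to the integral of 4/(1+x^2) over [0,1]. The serial C sketch below shows only that split. Process `rank' of `size' handles intervals rank, rank+size, and so on, and adding the partial sums over all ranks gives what MPI_Reduce with MPI_SUM would return. It is an assumption, not confirmed here, that the class Pi code follows this pattern. No MPI calls are made, so the sketch runs anywhere.

```c
/* Midpoint-rule partial sum for pi = integral of 4/(1+x^2) over [0,1],
   using n intervals.  Process `rank' of `size' takes intervals
   rank, rank+size, rank+2*size, ... (cyclic distribution). */
double pi_partial(long n, int rank, int size)
{
    double h = 1.0 / (double)n, sum = 0.0;
    for (long i = rank; i < n; i += size) {
        double x = h * ((double)i + 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Serial stand-in for a sum reduction: add the partial results of
   all `size' simulated processes. */
double pi_reduce(long n, int size)
{
    double total = 0.0;
    for (int rank = 0; rank < size; rank++)
        total += pi_partial(n, rank, size);
    return total;
}
```

In a real MPI code, each process would obtain `rank' from MPI_Comm_rank and `size' from MPI_Comm_size, compute pi_partial, and then call MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD) to collect the total on process 0.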
For relevant information on MPI, consult the following pages, especially
the example page:
UNIX Command Dictionary.
UNICOS T90 Fortran90 (f90) Compile, Load and Execution Commands
f90 -r3 -[other options] [source].f
[other source files] (CR)
: Compiles source file `[source].f' and `[other source files]'
with the Cray level 3 report compiler option `-r3'
both with the default full optimization (`noaggress bl noinline recurrence
norecursion scalar vector ....... nozeroinc'),
producing an object file `[source].o' and compiler annotated listing
file `[source].l' with vectorization information:
Marking Meaning
S scalar loop optimization (major marker)
V vector optimization (major marker)
P Parallel optimization (major marker)
Vs short vector optimization
W unwound (major marker) {short inner-most loops with trip
counts of not more than 5 are collapsed or transformed to single
statements so that the next inner-most loop can be vectorized
provided there are no dependencies}
b bottom loading {pre-fetching is used for the next
iteration of scalar loops only, and `-o nobl' turns it off}
c conditionally vectorized, {subject to run-time
determination of recurrence vector length}
k kernel scheduling
i unconditionally vectorized with IVDEP
r loop unrolling {a set of loop iterations is
collapsed into one iteration that has been enabled by the `-e'
enabling option with its `m' loop marking sub-option}
D delete loop
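The `r' marking refers to loop unrolling performed by the compiler. Written out by hand in C, unrolling a dot product by a factor of four looks like the sketch below, which includes a cleanup loop for the leftover iterations. The factor of four is an arbitrary choice for illustration.

```c
#include <stddef.h>

/* Dot product unrolled by four: each pass of the main loop performs four
   iterations of the original loop, cutting loop-control overhead, and
   the four separate partial sums expose independent operations. */
double dot_unrolled4(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* cleanup: the last n mod 4 iterations */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}
```

The separate partial sums change the order of the additions, so the floating-point result can differ slightly from that of the plain loop.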
Use `-emx' in place of `-em' if you want a cross reference listing also.
Use the `-b [binfile]' option to name the object file with a name
other than the default `[source].o' name.
Use `-o aggress' to turn on a more aggressive form of optimization,
but be careful of the results. Use `-o inline' or
`-I [inline-source]' to get inlining of subprograms to avoid their
overhead. Use the compiler directives `NORECURRENCE' or `IVDEP' and
`RECURRENCE' to turn off and on the optimization of loop recurrences.
Use `-o recursion' to enable subprograms to be recursive. Use `-o zeroinc' if
zero increments of DO-loop indices or constant increment variables (CIV)
are used, because the default assumes there are none.
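The C sketch below contrasts a loop with a recurrence and a loop without one. In the prefix sum, each iteration needs the result of the previous iteration, so the loop cannot be vectorized as written. The update loop has independent iterations. The C99 `restrict' qualifier promises the compiler that the arrays do not overlap, which is roughly analogous to IVDEP. This is an analogy only; it says nothing about how the Cray compilers treat these particular loops.

```c
#include <stddef.h>

/* Recurrence: a[i] depends on a[i-1] from the previous iteration,
   so the iterations must run in order. */
void prefix_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}

/* No recurrence: every iteration is independent.  `restrict' promises
   that x and y do not overlap, much as IVDEP tells the compiler to
   ignore possible vector dependences. */
void axpy(size_t n, double alpha, const double *restrict x,
          double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```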
Use `segldr' command to load the execution module,
which then can be used to execute the program.
See below and the last section for more on the options.
It is much better to use makefiles for such commands.
f90 -eS [source].f (CR)
: Creates a Cray Assembly Language (CAL) file or calfile named
`[source].s' for the Fortran program `[source].f' that can be used
with the Cray Assembler or to determine how the Cray compiler has
carried out the optimization, particularly how it has used the
vector registers. The option `[name].s' can be used to name
the calfile with something other than the default name. No
object or binary file `[source].o' is produced, and a nasty message
will be given instead.
f90 -g [source].f (CR)
: Compiles the f90 source and generates a symbol table for the debugger,
like `cdbx' (use `man cdbx'). See also `-G debug_lvl',
where `-G 0' is the same as `-g'.
segldr -o [executable-file] -l [library-list] [source].o (CR)
: This segment loader links and loads the object module
`[source].o' from the `f90'
step into the execution module named `[executable-file]' by
the `-o' option.
Without the `-o' option, the executable is the standard `a.out' file.
The library option may not be needed because many libraries are
searched by default: Pascal (libp.a), I/O (libio.a), utility (libu.a),
Fortran (libf.a), C (libc.a), Math (libm.a), and Science (libsci.a).
Numerical Recipes in Fortran or C of Press et al. are not
directly available in UNICOS.
f90 [-options] -o [executable] [source].f (CR)
: The `f90 -o [executable]' command combines both `f90'
compile and
`segldr' load functions in one command; e.g.,
f90 -limsl [source].F (CR)
: This Fortran90 parallel form is for using the IMSL mathematical and
statistical library; if more than one processor is used, then
`setenv NCPUS [nn]' must be executed first, where
`[nn]' is the number of CPUs requested. For more information, click on:
IMSL Software at NPACI
[executable-file] < [input-file] > [output-file] & (CR)
: Executes the executable module taking input from the file
`[input-file]' and redirecting output to `[output-file]' as a background
process.
UNICOS C Language Commands
cc -o run [file].c (CR)
: Compiles source [file].c, using the standard C compiler `scc2.0' and
producing an executable named run. In place of `cc', use `scc3.0'
or `scc' for the latest version of standard C or `pcc' for portable C.
cc -c [file].c (CR)
: Compiles source [file].c, using the standard C compiler `scc2.0' and
producing an object file named [file].o.
cc -hnoopt -o run [file].c (CR)
: Compiles source [file].c, using the standard C compiler `scc3.0' and
producing an executable file named run without scalar optimization or
vector optimization, while `-hopt' enables scalar and vector optimization.
Some other optimization related options are `-hinline' for inlining while
`-hnone' is the default no inlining, `-hnovector' for no vector (vector
is the default), and `-h listing' for a pseudo-assembler (CAL) listing.
Some standard C options are `-htask3' for automatic parallelization
(autotasking in "crayese") and `-hvector3' for more powerful vector
restructuring.
Other `-h' suboptions are `ivdep' for ignoring vector dependence, and
`report': `-hreport=isv'
generates messages about inlining (i), scalar optimization (s) and vector
optimization (v), while `-hreport=isvf' also writes the messages to `[file].v'.
A commonly used form will be
cc -o run -h report=isvf [file].c (CR)
#define fortran
: Form of C header statement to permit the call to a fortran subroutine
from a C program. For example:
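#include <stdio.h>
#include <fortran.h>   /* Cray UNICOS header */
#define fortran        /* defines the qualifier as empty */
int main(void)
{
fortran void SUB(float *x, float *y);   /* assumed Fortran SUBROUTINE SUB(X, Y) */
float x = 3.14f, y;
SUB(&x, &y);                            /* Fortran arguments are passed by address */
printf("SUB answer: y = %f for x = %f\n", y, x);
return 0;
}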
#pragma _CRI [directive]
: Form of C compiler directive placed within the C code, where some
example directives are `ivdep' for ignoring vector dependence,
`novector' for turning off the default vectorization, `vector' for
turning it back on, `inline' for procedure inline optimization,
`shortloop', `noreduction', `getcpus [p]',
`relcpus', `parallel ........', and `end parallel'. See `vector directives'
for instance in `docview' for more information and examples.
segldr -o [executable-file] -l [library list] [source].o (CR)
: This segment loader links and loads the object module
`[source].o' from the `f90'
pure compile step into the execution module named `[executable-file]' by
the `-o' option.
Without the `-o' option, the executable is the standard `a.out' file.
The library option may not be needed because many libraries are
searched by default: Pascal (libp.a), I/O (libio.a), utility (libu.a),
Fortran (libf.a), C (libc.a), Math (libm.a), and Science (libsci.a).
Numerical Recipes in Fortran or C of Press et al. are not
directly available in UNICOS.
[executable-file] < [input-file] > [output-file] & (CR)
: Executes the executable module taking input from the file
`[input-file]' and redirecting output to `[output-file]' as a background
process.
UNICOS Performance Commands
Cray Prof Profiling Facility:
Cray Error Explaining Command:
explain [error-message-code] (CR)
: Elaborates on the command error message `[error-message-code]' for
many commands; use `man explain' for a complete list.
Cray Job Accounting (ja) Command:
ja (CR)
[executable] (CR)
ja -csf (CR)
: This command sequence enables Job Accounting storing the information
in a file of the form `.jacct[jobid]', with options `c' giving
a command report, `f' giving a command flow report, `s' giving a
multitasking breakdown summary report. Note that one NPACI service unit
is approximately one CPU hour on the T90 and one processing-element (PE) hour on the T3E,
assuming average memory (about 16MW) usage. Caution: In general, parallel
processing on the YMP series like the T90 is very expensive.
Cray Perftrace (perf) or Performance Trace Facilities:
f90 -ef [source].f (CR)
segldr -l perf [source].o (CR)
a.out > [source].perf (CR)
segldr -l perf[n] [source].o (CR)
a.out >> [source].perf (CR)
: Compiles the Fortran program `[source].f' for use with the Cray
Perftrace or Performance Trace facilities.
(Flowtrace results are similarly found in the
output of the executable file executed after loader statement.)
The library suboption here is `perf' for referencing the libperf.a
library, which has several levels,
where `[n]' is the level: empty, `1', `2' or `3'.
Cray Hardware Performance Monitor (hpm):
hpm -g[n] -d [executable] > [source].hpm[n] (CR)
: Simulates the Hardware Performance Monitor with
`[executable]' at level `[n]' = `0' (scalar activity), `1' (hold issue
conditions), `2' (memory use), or `3' (instruction and vector
operations). The option `-d' means that a dedicated machine is
simulated.
Cray JumpTrace (jt) and JumpView (jumpview):
JumpTrace and JumpView help gather performance statistics in the form of
a report. Some use examples are:
Fortran Example:
f90 -ef [pgm].f
jt ./a.out
jumpview
C Example:
cc -ltrace -Gp [cpgm].c
jt ./a.out
jumpview -Luch >[cpgm].listing
JumpView Main Menu:
----------------------------------------------------- MAIN MENU
1 Master Summary | 7 List by Average Time/Call
2 Routines: List by Time | 8 Operating Environment
3 List by Megaflops | 9 Long Report by Routine Name
4 List by In-Line Factor | 10 Detail Report by Symbol
5 List by Name | 11 Detail Report by Block
6 List by Calls | 12 Options
----------------------------------------
H HELP
Q QUIT
Enter Number/Letter of Action Desired
---------------------------------------------------------------
Cray Autotasking Expert Performance System (atexpert):
atexpert [options] (CR)
: Autotasking expert performance system, needing X-windows display
for full power. See also `atchop' and `atscope'.
UNICOS makefile Commands
make [-options] [step-name] (CR)
: Makes the target `[step-name]' according to the template in the `makefile'.
E.g., the file `makefile.unicos_2':
# Use ``make -f makefile.unicos_2 mrun >& pgm.l'', then ``run < data > out''.
SOURCES = pgm.f
OBJECTS = pgm.o
FLAGS = -em
mrun : $(OBJECTS)
	segldr -o run $(OBJECTS)
.f.o :
	f90 $(FLAGS) $*.f
{CAUTION: The commands, like `segldr' or `f90', must be preceded
by a `Tab-key' tab as a delimiter, but the tab will not be visible
in the UNIX listing.}
fmgen -m [make-name] -c f90 -f
[-flag] -o [executable] [source].f (CR)
: Automatically generates a makefile for
compiling under the `f90'
compiler and loading up the executable file named `[executable]'.
Invoke with `make -f [make-name] [executable] (CR)' and then execute
`[executable]'. Also produces steps for profiling, flow-traces,
performance traces, and clean-up, in the heavily documented makefile.
For example, `fmgen -c f90 -f -r3 -o run pgm.f (CR)' produces a
makefile named `makefile', executable named `run', an information
listing named `[name in program statement].l' with loops marked
by optimization type, etc.; the making is done with `make run (CR)'.
Caution: the makefile uses the source name only when that
coincides with the name used in the Fortran `program' statement
and only one type of `f90' flag can be used. These flaws can
be corrected by editing the resulting makefile `[make-name]'.
UNICOS Mail Commands
mailx (CR)
: Shows the user's mail; caution: `mailx' is close to the usual Unix mail,
whereas the UNICOS `mail' command is NOT;
use the subcommand `t [N](CR)' to list message number `[N]'
, `s [N] mbox (CR)' to append message `[N]' to your mailbox `mbox' file
or `s [N] [file](CR)' to append `[N]' to another file;
`e [N] (CR)' to edit number [N] or look at a long file with `ex'
{see Section on `EX' below};
`v [N] (CR)' to edit number [N] or look at a long file with `vi';
`d [N] (CR)' deletes {your own mail!} `[N]';
`m [user] (CR)' permits you to send mail to another account `[user]';
a `~m [N] (CR)' inside the message after entering a subject,
permits you to forward message `[N]' to `[user]',
`\d (CR)' to end the new message {see the send form
below}; `x' quits `mailx' without deleting {use this when you
run into problems}; and `q (CR)' to quit.
mailx [user] (CR)
: Sends mail to user `[user]';
the text is entered immediately in the
current blank space; carriage return to enter each line;
enter a file with a `~r[filename] (CR)';
route a copy to user `[userid]' by `~c[userid] (CR)';
enter the `ex' line editor with `~e (CR)'
or `vi' visual editor with `~v (CR)'
(see Sections on EX and on VI)
to make changes on entered lines,
exiting `ex' with `wq (CR)' or `vi' with `:wq (CR)';
exit `mailx' by entering `\d (CR)'. {A bug in the
current version of Telnet does not
allow you to send a copy using the `cc:' entry.}
mailx [name]@[machine].[dept].uic.edu < [filename] (CR)
: Sends the UNICOS file `[filename]' to user `[name]' on
some UNIX or other machine.
UNICOS Network Queueing System (NQS)
qsub [options] (CR)
: Submit a batch job to the queue; see `man qsub (CR)' for more
information. The option `-lM [16Mw]', for example, permits running jobs with
up to 16 megawords of memory. The option `[myjob].script'
provides the script instructions for running a background job. Note that
NPACI users must specify the memory in a script line
#QSUB -lM [memory-amount]
instead of as an option of `qsub', using `Mw' to denote megawords
in `[memory-amount]'; also required is
#QSUB -lT [CPU-time-amount]
specifying the CPU (user plus system) time limit in seconds.
In addition, T3E users must also specify
#QSUB -l mpp_p=[t3e_procs],mpp_t=[t3e_time]
giving the number of T3E processors and time on the T3E; and also
#QSUB -q mpp
giving the T3E queue name `mpp' (Caution: you must be in the `mpp' group to
use this queue, but you can check it by the command
grep [username] /etc/group (CR)
on the T90, whereas the default queue is `batch'.
qstat [options] (CR)
: Display status of queued batch jobs;
see `man qstat (CR)' for more information.
/mpp/bin/mppstat (CR)
: Not an NQS command, but displays the current T3E configuration and the
number of available processors (PEs).
/usr/local/adm/access/bin/qstatmpp (CR)
: Not an NQS command, but displays the currently queued T3E jobs.
T90 Fortran90 (f90) and other Extensions
T90 Fortran90 (f90) Compiler Options
T90 Fortran90 (f90) Miscellaneous Extensions
``FORTRAN90 Array Notation''
{f90 supports Fortran90 array notation, permitting whole-array and
array-section statements like `AS = S', `C = A + B' and `A(1:50) = B(1:100:2)' for
appropriately dimensioned arrays AS, A, B and C, and scalar S (e.g.,
`AS = S' means AS(i,j) = S for all i and j within the subscript bounds); in general
`A([start]:[end]:[step])' references the single-subscript array section
for i = [start] to [end] in steps of [step]. Other examples are
`a(i,:)' for the i-th row of array `a', `a(:,j)' for the j-th column,
`a(1::2)' for the odd-numbered vector elements, `a(n:1:-1)' for the `n' vector
elements of `a' in reverse order, and
`z(1:n) = -log(z(1:n))' or `z(1:n) = ranf()'.}
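The number of elements in a section follows from its triplet. `A([start]:[end]:[step])' has max(0, (end - start + step)/step) elements, using truncating integer division. For example, `u(1:64:9,1:64:9)' in the test code below is 8 by 8. The C sketch below applies that rule and is offered only as an illustration. The function names are hypothetical, and `step' is assumed to be nonzero.

```c
#include <stdio.h>

/* Number of elements selected by the Fortran90 section a(start:end:step),
   assuming step != 0; an empty section has extent 0. */
long section_extent(long start, long end, long step)
{
    long n = (end - start + step) / step;   /* truncating division, as in Fortran */
    return n > 0 ? n : 0;
}

/* Print the 1-based subscripts visited by a(start:end:step). */
void print_section(long start, long end, long step)
{
    long n = section_extent(start, end, step);
    for (long k = 0; k < n; k++)
        printf("%ld ", start + k * step);
    printf("\n");
}
```

For example, `print_section(1, 64, 9)' prints the subscripts 1, 10, 19, ..., 64 used for the sampled Laplace solution in the test code.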
real [variables-list]
{The f90 `real' declaration declares variables and array
elements as 32-bit (4-byte) words with only 23 bits allotted
to the fraction, i.e., IEEE single precision.
This is quite different from the older non-IEEE Cray systems,
where `real' meant an 8-byte (64-bit) real.
Thus, for `real' arguments in f90 code, use the built-in functions
`abs', `sqrt', `exp', `amax1' and so forth. The IEEE f90
`double precision'
declaration is 64-bit with a 52-bit stored fraction (53 bits of precision
counting the implicit leading bit), and hence is
entirely different from the old non-IEEE Cray `double precision'.}
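The fraction widths quoted above can be checked with the C header <float.h>. Its FLT_MANT_DIG and DBL_MANT_DIG count the significand bits, including the implicit leading bit. The minimal C sketch below assumes a compiler whose `float' and `double' are IEEE single and double precision. That is true of most current systems, but it is an assumption here. The function names are hypothetical.

```c
#include <float.h>
#include <stdio.h>

/* Stored fraction bits = significand precision minus the implicit leading 1
   (assuming IEEE 754 float and double). */
int float_fraction_bits(void)  { return FLT_MANT_DIG - 1; }   /* 23 for IEEE single */
int double_fraction_bits(void) { return DBL_MANT_DIG - 1; }   /* 52 for IEEE double */

void print_precision(void)
{
    printf("float:  %d-bit fraction, epsilon %g\n", float_fraction_bits(), FLT_EPSILON);
    printf("double: %d-bit fraction, epsilon %g\n", double_fraction_bits(), DBL_EPSILON);
}
```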
POINTER (P,A)
{The Cray `POINTER' statement (a Cray extension, distinct from the
Fortran90 `pointer' attribute) declares that the (usually integer-sized)
variable P holds (points to, for C fans) the initial (base) address of the
declared pointee array A.}
``Execution Time Allocation''
{f90 allocates storage for temporary arrays within subprograms
at execution time rather than at compile time; this means that
f90 code may be less sensitive to array-bounds overruns.}
open ([unit],file='[fn]',status='unknown')
{Format of the f90 OPEN statement assigning unit number [unit]
to filename [fn]; place it in the program after the declarations.
[unit] = 5 defaults to UNIX `stdin', as does
[unit] = * for read statements,
i.e., it reads from the terminal
unless redirected by an `open' or a `<';
[unit] = 6 defaults to UNIX `stdout', as does
[unit] = * for write statements,
i.e., it writes to the terminal
unless redirected by an `open' or a `>';
[unit] = 0 defaults to UNIX `stderr', i.e., it writes diagnostics
to the terminal unless redirected by an `open' or a `>&'.
Note that file names are placed in quotes in the OPEN
statement; see also `man' for the UNICOS `assign' and `env' commands.
}
save [variable or array name list separated by commas]
{The save statement is essential in f90 subprograms to keep the values of
local variables for later calls to the subprogram; the `-ev' option
of f90 provides another solution to this problem.
Omitting it can lead to
logic errors, especially for users accustomed to F66 Fortran, in
which local variables kept their values after the RETURN statement was
executed; in f90 these values can be lost.}
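In C, a `static' local variable behaves like a SAVEd f90 local: it keeps its value between calls. An ordinary (automatic) local behaves like an unsaved one, because a fresh variable is created on every call. The C analogy below is only an illustration, and the function names are hypothetical.

```c
/* Fortran90:  integer, save :: ncalls = 0
   C analog:   a static local keeps its value between calls. */
int count_calls(void)
{
    static int ncalls = 0;   /* like a SAVEd local: persists across calls */
    ncalls = ncalls + 1;
    return ncalls;
}

int automatic_count(void)
{
    int ncalls = 0;          /* automatic local: a fresh variable on every call,
                                so the previous call's value is not kept */
    ncalls = ncalls + 1;
    return ncalls;
}
```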
recursive [function or subroutine]([subprogram arguments])
{The `recursive' prefix is required on subprograms that are called
recursively; in addition, the recursive suboption is needed in the
compiler command.}
[statement] ! [embedded comment]
{The in-line (trailing) `!' comment is now legal in Cray Fortran.}
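intrinsic [f90-function1][,[f90-function2]]
{The `intrinsic' statement declares by name
Fortran90 intrinsics, such as ANY, DOT_PRODUCT, MAXVAL, RESHAPE, ALL, EOSHIFT,
MINLOC, SPREAD, COUNT, FLOAT, MINVAL, SUM, CSHIFT, MATMUL, PACK, TRANSPOSE,
MAXLOC, PRODUCT, UNPACK; in standard Fortran90 it is required mainly when an
intrinsic name is passed as an actual argument, and the test code below
declares the intrinsics it uses.}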
Fortran90 Array Construction Functions
PACK([array],[mask-array][,[vector]])
{Transforms (packs) the elements of `[array]' selected by the true values of
the `[array]'-conformable logical mask `[mask-array]' into a one-subscript
result, in array element order, returned as the value of the function; if the
optional vector `[vector]' is present, the result has the size of `[vector]'
and its trailing positions are filled from `[vector]'.
}
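UNPACK([vector],[mask-array],[field-array])
{Transforms (unpacks) the vector `[vector]' into an array with the shape of the
logical mask `[mask-array]': positions where the mask is true receive successive
elements of `[vector]', and the remaining positions take the corresponding
values of the mask-conformable array (or scalar) `[field-array]'.
}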
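The PACK and UNPACK rules can be sketched in C on arrays flattened in Fortran (column-major) element order. A mask of 0/1 integers stands in for the logical array. This is only an illustration under those assumptions. The function names are hypothetical, and the optional `[vector]' argument of PACK is not modeled.

```c
#include <stddef.h>

/* Sketch of PACK(array, mask): copies the elements whose mask is true,
   in array element order, into out[]; returns how many were packed. */
size_t pack(const int *array, const int *mask, size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (mask[i])
            out[k++] = array[i];
    return k;
}

/* Sketch of UNPACK(vector, mask, field): positions where mask is true take
   successive elements of vector; the rest take the matching field value. */
void unpack(const int *vector, const int *mask, const int *field,
            size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        out[i] = mask[i] ? vector[k++] : field[i];
}
```

With b = 1 3 5 / 2 4 6 stored as {1,2,3,4,5,6} and the mask b > 3, `pack' yields 4, 5, 6.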
SPREAD([array],[dim],[ncopies])
{Transforms (spreads) the source array `[array]' into the output value of the
function with `[ncopies]' copies along the new dimension `[dim]'; for a vector
source, each copy becomes a row of the result if `[dim]'=1 and a column if
`[dim]'=2 (see `spread(as,1,3)' and `spread(at,2,4)' in the test code below).
}
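RESHAPE([array],[shape][,[pad]][,[order]])
{Transforms (reshapes) the source array `[array]' into the output value of the
function with the shape given by the integer vector `[shape]', filling it in array
element order (or in the permuted dimension order `[order]', if present) and
using elements of `[pad]', if present, when `[array]' has too few elements.
}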
Fortran90 Array Reduction Functions
SUM([array][,[dim][,[mask]]])
{The `SUM' function computes the sum of the elements of the array `[array]'
along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if
`[dim]'=2) according to the true values in the conditional mask `[mask]',
if present. This function makes the Cray sum function the same as the
Connection Machine version.
}
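The C sketch below shows SUM with `[dim]' and `[mask]' for a two-subscript array. It assumes Fortran column-major storage, with element (i,j) at b[i + j*nrow] (0-based in C). With b = 1 3 5 / 2 4 6 and the mask b > 3, it reproduces the conditional sums (/0,4,11/) and (/5,10/) of the test code below. The name `masked_sum' is hypothetical.

```c
/* Sketch of SUM(b, dim, mask) for an nrow x ncol integer array stored in
   Fortran column-major order; dim = 1 gives one sum per column,
   dim = 2 one sum per row.  Only elements with a nonzero mask are added. */
void masked_sum(const int *b, const int *mask, int nrow, int ncol,
                int dim, int *result)
{
    int nres = (dim == 1) ? ncol : nrow;
    for (int r = 0; r < nres; r++)
        result[r] = 0;
    for (int j = 0; j < ncol; j++)
        for (int i = 0; i < nrow; i++)
            if (mask[i + j * nrow])
                result[dim == 1 ? j : i] += b[i + j * nrow];
}
```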
PRODUCT([array][,[dim][,[mask]]])
{The `PRODUCT' function computes the product of the elements of the array
`[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if
`[dim]'=2) according to the true values in the conditional mask `[mask]',
if present.
}
MAXVAL([array][,[dim][,[mask]]])
{The `MAXVAL' function computes the maximum value of the elements of the array
`[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if
`[dim]'=2) according to the true values in the conditional mask `[mask]',
if present.
}
MINVAL([array][,[dim][,[mask]]])
{The `MINVAL' function computes the minimum value of the elements of the array
`[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if
`[dim]'=2) according to the true values in the conditional mask `[mask]',
if present.
}
COUNT([mask][,[dim]])
{The `COUNT' function computes the number of the true elements of the logical
array `[mask]' along the dimension `[dim]' (by columns if `[dim]'=1 or by
rows if `[dim]'=2), if present.
}
ANY([mask][,[dim]])
{The `ANY' function determines whether any elements of the logical
array `[mask]' are true, along the dimension `[dim]' (by columns if `[dim]'=1 or by
rows if `[dim]'=2) if present, and returns a logical true or false answer
(an array of answers when `[dim]' is present).
}
ALL([mask][,[dim]])
{The `ALL' function determines whether all elements of the logical
array `[mask]' are true, along the dimension `[dim]' (by columns if `[dim]'=1 or by
rows if `[dim]'=2) if present, and returns a logical true or false answer
(an array of answers when `[dim]' is present).
}
Fortran90 Array Manipulation Functions
TRANSPOSE([array])
{The `TRANSPOSE' function transposes the 2-subscript array `[array]' with
the result array of reversed dimensions.
}
EOSHIFT([array],[shift][,[boundary][,[dim]]])
{The `EOSHIFT' function does an end-off shift on the array `[array]' along
the dimension `[dim]' using the boundary value(s) `[boundary]' to fill in,
if necessary. Caution: Connection Machine arguments have a different order.
}
CSHIFT([array],[shift][,[dim]])
{The `CSHIFT' function does a circular shift on the array `[array]' along
the dimension `[dim]' (default 1); elements shifted off one end reappear at
the other end, so no boundary value is used.
Caution: Connection Machine arguments have a different order.
}
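On a single row (or column), the two shifts reduce to a subscript rule: result(j) = source(j + shift). For CSHIFT the subscript wraps around modulo the extent. For EOSHIFT, a subscript that falls off the end is replaced by the boundary value. The C sketch below follows that reading, with 0-based subscripts and hypothetical function names. It reproduces the row results 3 5 1, 5 1 3, 7 1 3 and 5 0 0 of the test code below.

```c
/* Circular shift of a vector of length n: result[j] = v[(j + shift) mod n]. */
void cshift1(const int *v, int n, int shift, int *result)
{
    for (int j = 0; j < n; j++) {
        int k = (j + shift) % n;
        if (k < 0)
            k += n;          /* C's % can be negative; wrap into 0..n-1 */
        result[j] = v[k];
    }
}

/* End-off shift: subscripts outside 0..n-1 take the boundary value. */
void eoshift1(const int *v, int n, int shift, int boundary, int *result)
{
    for (int j = 0; j < n; j++) {
        int k = j + shift;
        result[j] = (k >= 0 && k < n) ? v[k] : boundary;
    }
}
```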
Fortran90 Array Location Functions
MAXLOC([array][,[mask]])
{The `MAXLOC' function returns the subscripts of the first element of target array
`[array]' having the maximum value, relative to the conditional mask `[mask]', if present.
}
MINLOC([array][,[mask]])
{The `MINLOC' function returns the subscripts of the first element of target array
`[array]' having the minimum value, relative to the conditional mask `[mask]', if present.
}
Fortran90 Array Matrix Multiply Functions
MATMUL([array1],[array2])
{The `MATMUL' function computes the matrix product of the arrays
`[array1]' and `[array2]', which must be conformable for multiplication, with
a result matrix of the appropriate size. This function is also used for
matrix-vector multiplication.
}
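The C sketch of MATMUL below assumes arrays stored in Fortran column-major order: an `m X k' array `a', a `k X n' array `b', and element (i,j) of an array with r rows stored at [i + j*r] (0-based in C). With the sections b(:,1:2) and b(:,2:3) of the test code below as inputs, it gives the columns (15,22) and (23,34). A matrix-vector product is the case n = 1. The name `matmul_cm' is hypothetical.

```c
/* Sketch of c = MATMUL(a, b) for column-major arrays:
   a is m x k, b is k x n, c is m x n. */
void matmul_cm(const double *a, const double *b, double *c,
               int m, int k, int n)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double s = 0.0;
            for (int l = 0; l < k; l++)
                s += a[i + l * m] * b[l + j * k];   /* a(i,l) * b(l,j) */
            c[i + j * m] = s;
        }
}
```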
DOT_PRODUCT([vector1],[vector2])
{The `DOT_PRODUCT' function computes the scalar dot product of the
vectors `[vector1]' and `[vector2]'.
Caution: the Connection Machine function is `dotproduct'.
}
Fortran90 Array Functions TEST CODE
T90 Fortran90 (f90) Differences:
Here is a sample T90 Fortran90 code,
`pgm.f' = `t90f90test.f',
with many examples, heavily commented, followed
by the actual output of a run on t90.npaci.edu using the commands
f90 -O3 -r3 -o run pgm.f &
run >& pgm.out &
Question: if b is the 2 by 3 array with rows (1 3 5) and (2 4 6) and
logical mask=b.gt.3, then s3=sum(b,1,mask) and s2=sum(b,2,mask) work when
real s3(3),s2(2), but isum=sum(b,mask), isum=sum(b,,mask) and isum=sum(b,:,mask)
do NOT work. That is, how does one enter a scalar dim for the whole array?
%%%%%%%%%%% pgm.f=t90f90test.f %%%%%%%%%
program f90test
code98: compare ranf() and random_number pseudo random number generators
code97: update by removing old comments to cmfortran
code96: retest=f90test.f redone on borg = convex spp1200/xa-16
integer, parameter :: m = 6
integer, parameter :: n = 4
integer :: i,j
integer, dimension(2) :: s2, ctr1, ctr2, ctr3, b2
integer, dimension(3) :: s3 ,at ,ar1 ,ar2 ,br1 ,br2
integer, dimension(4) :: as(4)
integer, dimension(2,2) :: c ,bi
integer, dimension(2,3) :: b, a
integer, dimension(3,2) :: ct
integer, dimension(3,4) :: cs
integer, dimension(4,3) :: cst
logical, dimension(2,3) :: test
logical, dimension(64,64) :: inmask
real, parameter :: tol = 0.5e-5
integer, parameter :: niter = 5000
real :: diffav
real, dimension(8,8) :: us
real, dimension(64,64) :: u , du
real :: ranf, xran
real, dimension(m,n) :: uniranf, uniran
real, dimension(n,m) :: truniranf, truniran
intrinsic sum,maxval,minval,product
& ,dot_product,matmul,transpose
& ,cshift,eoshift,spread
data b/1,2,3,4,5,6/ !replace constructors initialization
data as/2,3,4,5/
data at/2,3,4/
c --------------------Array Constructors:
b(1,1:3) = (/1, 3, 5/) ! initialize first row, along dimension 2.
b(2,1:3) = (/2, 4, 6/) ! initialize second row, along dimension 2.
print*,'Note: constructors like "(/1,2/)" allowed in fc9.5'
br1 = b(1,:)
br2 = b(2,:)
print60,br1,br2
60 format(' b(2,3)'/(3i3))
c --------------------Sum Function sum:
isum = sum(b) ! => isum = 21; i.e., Front-End scalar.
print61,' isum=sum(b)=',isum
61 format(1x,a36,i4)
isum = sum(b(:,1:3:2)) ! => isum = 14; sole ':' means all values '1:2'.
print61,' isum = sum("b(:,1:3:2)")=',isum
bi=b(:,1:3:2)
isum=sum(bi)
print61,' isum = sum("b(:,1:3:2)")=',isum
print*,'CAUTION: "dim=", etc., markers= NOT allowed in intrinsics'
s2 = sum(b,2) ! s2 is declared dimension(2), the shape of the row-sum result.
print62,' s2 = sum(b,2)=',s2 ! => s2 = (/9,12/), row sums
62 format(1x,a32,2i3)
s3 = sum(b,1) ! => s3 = (/3,7,11/); column sums.
print63,' s3 = sum(b,1)=',s3
63 format(1x,a32,3i3)
print*,'CAUTION: "mask=" marker= STILL not allowed either.'
s3 = sum(b,1,b.gt.3) ! => s3 = (/0,4,11/); i.e., conditional col sum
print63,' s3 = sum(b,1,"b.gt.3") =',s3
test=b.gt.3
s3 = sum(b,1,test) ! => s3 = (/0,4,11/); i.e., conditional col sum
print63,' s3 = sum(b,1,"b.gt.3") =',s3
s2 = sum(b,2,test) ! => s2 = (/5,10/); i.e., conditional row sum
print62,' s2 = sum(b,2,b.gt.3) =',s2
cf8er:isum = sum(b,0,test) ! => isum = 15; i.e., add only elements
cf8er:print61,' isum = sum(b,0,b.gt.3) =',isum ! that are greater than three.
print*,' CAUTION: If "sum(array[dim[,mask]])", CANT use zero (0)'
& ,' for [dim] for whole array when there is a mask.'
c --------------------Maximum Value Function maxval:
imax = maxval(b) ! => imax = 6; array maximum value.
print61,' imax = maxval(b)=',imax
s3 = maxval(b,1) ! => s3 = (/2,4,6/); column maximums.
print63,' s3 = maxval(b,1)=',s3
s2 = maxval(b,2) ! => s2 = (/5,6/); row maximums.
print62,' s2 = maxval(b,2)=',s2
c --------------------Minimum Value Function minval:
imin = minval(b) ! => imin = 1; array minimum value.
print61,' imin = minval(b)=',imin
c --------------------Product Function product:
s2 = product(b,2) ! => s2 = (/15,48/); products of row elements.
print62,' s2 = product(b,2)=',s2
c --------------------Dot Product Function dot_product:
idot = dot_product(br1,br2) ! => idot = 44; dot product of row
print61,' idot = dot_product(b(1,:),b(2,:))=',idot ! vectors of b.
print*,' CAUTION: Array syntax not allowed in actual arguments.'
c --------------------Matrix Multiplication Function matmul:
! assuming array b of the previous section.
![Ans] = matmul([Array_1],[Array_2]) ! computes matrix multiplication
! of two rank two matrices.
c = matmul(b(:,1:2),b(:,2:3)) ! => c(1,:)=(/15,23/);c(2,:)=(/22,34/).
c=transpose(c)
print623,'c=matmul(b(:,1:2),b(:,2:3))=',c
623 format(1x,a36/(2i3))
![Ans] = transpose([Array]) ! transforms an array to its transpose.
ct = transpose(b) ! => ct(1,:)=(/1,2/);ct(2,:)=(/3,4/);ct(3,:)=(/5,6/).
ctr1 = ct(1,:)
ctr2 = ct(2,:)
ctr3 = ct(3,:)
print623,'ct = transpose(b)=',ctr1,ctr2,ctr3
c --------------------Circular Shift Function cshift:
! assume b is again initialized as
! b = 1 3 5
! 2 4 6
a = cshift(b,1,2) ! => a = 3 5 1
! 4 6 2
cshift EG1:
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = cshift(a,1,2)=',ar1,ar2
633 format(1x,a36/(3i3))
! i.e., b(i,(j+shift) "mod" n) -> a(i,j) for j=1:2, etc.;
! nonstandard modulus fn: 0 "mod" n = n; 1 "mod" n = 1; ...; n "mod" n = n
! i.e., the result is computed from shifting subscript in specified
! dimension of the source array by the specified shift.
a = cshift(b,-1,2) ! => a = 5 1 3
! 6 2 4
cshift EG2:
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = cshift(b,-1,2)=',ar1,ar2
! i.e., b(i,(j+shift) "mod" n) -> a(i,j) for j=2:3, etc.
cshift EG3:
s2(1) = 1
s2(2) = 2
a = cshift(b,s2,2) ! a = 3 5 1
! 6 2 4
! i.e., an array-valued shift, or shift per row.
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = cshift(b,(/1,2/),2)=',ar1,ar2
cshift Laplace Example:
! Jacobi Iteration for a 5-star discretization of
! 2D Laplace's equation:
u = 0
u(1,:)=2
u(64,:)=2
u(:,1)=2
u(:,64)=1
inmask = .FALSE.
inmask(2:63,2:63) = .TRUE.
diffav = 1
iter=0
do while (diffav.gt.tol.and.iter.lt.niter)
iter=iter+1
du = 0
where(inmask)
du = 0.25*(cshift(u,1,1)+cshift(u,-1,1)+cshift(u,1,2)
& +cshift(u,-1,2)) - u
u = u + du
end where
du = du*du
diffav = sqrt(sum(du)/(62*62))
end do
! which is the main program fragment of laplace.fcm.
print*,'CAUTION: array sections not allowed in print'
us = u(1:64:9,1:64:9)
us=transpose(us)
print66,'u = laplace-shift(u)= ; iter=',iter,'; av-diff ='
& ,diffav,us
66 format(1x,a36,i5,a11,e10.3/(8f8.4))
c --------------------End Off Shift Function eoshift:
a = eoshift(b,-1,0,1) ! a = 0 0 0 note default boundary value is 0.
! 1 3 5
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = eoshift(b,-1,0,1)=',ar1,ar2
s2=(/-1,0/)
b2=(/7,8/)
a = eoshift(b,s2,b2,2) ! => a = 7 1 3
! 2 4 6
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = eoshift(b,(/-1,0/),(/7,8/),2)=',ar1,ar2
a = eoshift(b,2,0,2) ! => a = 5 0 0
! => 6 0 0
ar1 = a(1,:)
ar2 = a(2,:)
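print623,'a = eoshift(b,2,2)=',ar1,ar2
! note: format 623 prints two values per line, so the rows 5 0 0 and 6 0 0
! appear split across three lines in the output below.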
c --------------------Spread Function spread:
cs = spread(as,1,3)
! contents of cs:
! 2 3 4 5
! 2 3 4 5
! 2 3 4 5
cst = transpose(cs)
print64,'as =',as
64 format(1x,a32,4i3)
print643,'cs = spread(as,1,3)=',cst
643 format(1x,a36/(4i3))
c --------------------
cs = spread(at,2,4)
! contents of cs:
! 2 2 2 2
! 3 3 3 3
! 4 4 4 4
cst = transpose(cs)
print63,'at =',at
print643,'cs = spread(at,2,4)=',cst
c ---------------------------------------------------------------------------
! i.e., b=spread(a,d,c) =>
! a(n_1,n_2,...,n_(d-1),n_d,...,n_r) -> b(n_1,n_2,...,n_(d-1),c,n_d,...,n_r)
! where r is the rank of source array a and n_i is the size of dimension i;
! noting that a new dimension of size c is added before dimension d.
c ---------------------------------------------------------------------------
! Initialize scalar xran with a pseudo random number
call random_number(harvest=xran)
call random_number(uniran)
! xran and uniran contain uniformly distributed random numbers
truniran = transpose(uniran)
write(6,65) xran, truniran
65 format(' f90 uniform random_number(): xran =',f14.10/
& ' and f90 subroutine random_number() uniform random array:'
& /(4f14.10))
! standard UNICOS random number generator ranf:
do i = 1, m
do j = 1, n
uniranf(i,j) = ranf()
enddo
enddo
truniranf = transpose(uniranf)
write(6,651) truniranf
651 format(' UNICOS function ranf() uniform random array:'/(4f14.10))
stop
end
%%%%%%%%%%% end pgm.f=t90f90test.f %%%%%%%%%
Here is the output
t90f90test.output:
%%%%%%%%%%% begin pgm.output = t90f90test.output %%%%%%%%%
Note: constructors like "(/1,2/)" allowed in fc9.5
b(2,3)
1 3 5
2 4 6
isum=sum(b)= 21
isum = sum("b(:,1:3:2)")= 14
isum = sum("b(:,1:3:2)")= 14
CAUTION: "dim=", etc., markers= NOT allowed in intrinsics
s2 = sum(b,2)= 9 12
s3 = sum(b,1)= 3 7 11
CAUTION: "mask=" marker= STILL not allowed either.
s3 = sum(b,1,"b.gt.3") = 0 4 11
s3 = sum(b,1,"b.gt.3") = 0 4 11
s2 = sum(b,2,b.gt.3) = 5 10
CAUTION: If "sum(array[dim[,mask]])", CANT use zero (0) for [dim] for whole array when there is a mask.
imax = maxval(b)= 6
s3 = maxval(b,1)= 2 4 6
s2 = maxval(b,2)= 5 6
imin = minval(b)= 1
s2 = product(b,2)= 15 48
idot = dot_product(b(1,:),b(2,:))= 44
CAUTION: Array syntax not allowed in actual arguments.
c=matmul(b(:,1:2),b(:,2:3))=
15 23
22 34
ct = transpose(b)=
1 2
3 4
5 6
a = cshift(a,1,2)=
3 5 1
4 6 2
a = cshift(b,-1,2)=
5 1 3
6 2 4
a = cshift(b,(/1,2/),2)=
3 5 1
6 2 4
CAUTION: array sections not allowed in print
u = laplace-shift(u)= ; iter= 4730; av-diff = 0.499E-05
2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 1.0000
2.0000 1.9762 1.9479 1.9090 1.8491 1.7440 1.5208 1.0000
2.0000 1.9573 1.9068 1.8387 1.7387 1.5836 1.3402 1.0000
2.0000 1.9469 1.8844 1.8014 1.6836 1.5141 1.2817 1.0000
2.0000 1.9469 1.8844 1.8014 1.6836 1.5141 1.2817 1.0000
2.0000 1.9573 1.9068 1.8387 1.7387 1.5836 1.3402 1.0000
2.0000 1.9762 1.9479 1.9090 1.8491 1.7440 1.5208 1.0000
2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 1.0000
a = eoshift(b,-1,0,1)=
0 0 0
1 3 5
a = eoshift(b,(/-1,0/),(/7,8/),2)=
7 1 3
2 4 6
a = eoshift(b,2,2)=
5 0
0 6
0 0
as = 2 3 4 5
cs = spread(as,1,3)=
2 3 4 5
2 3 4 5
2 3 4 5
at = 2 3 4
cs = spread(at,2,4)=
2 2 2 2
3 3 3 3
4 4 4 4
f90 uniform random_number(): xran = 0.5801136486
and f90 subroutine random_number() uniform random array:
0.9505127350 0.3056509439 0.0986253383 0.6938844384
0.7863714253 0.6891007107 0.2765484551 0.9344770142
0.2976202640 0.3826622387 0.6204460278 0.2120929553
0.4536999003 0.1329027055 0.0835029668 0.1306527482
0.0062619416 0.8318579032 0.9903771206 0.8625969805
0.2757364264 0.5829797958 0.9793469434 0.8189092940
UNICOS function ranf() uniform random array:
0.5407187129 0.0187994091 0.3141160167 0.7651821004
0.9415271082 0.2893071356 0.5849975196 0.9030257778
0.8866798463 0.4966670053 0.3964840582 0.8718218141
0.9311052262 0.5954839343 0.2096123584 0.8881281192
0.4641396487 0.6280308383 0.4467249313 0.4578495774
0.2349011311 0.7635970977 0.5911920675 0.4438340178
STOP executed at line 222 in Fortran routine 'F90TEST'
CPU: 1.827s, Wallclock: 0.533s, 24.5% of 14-CPU Machine
Memory HWM: 308988, Stack HWM: 37805, Stack segment expansions: 0
%%%%%%%%%%% end pgm.output = t90f90test.output %%%%%%%%%
Cray T3E f90 Differences: on the T3E, a test code `fpgm.f' is compiled and run with the commands
f90 -O3 -r3 -Xm -o fpgm fpgm.f &
mpprun -n1 fpgm >& fpgm.output &
%%%%%%%%%%% pgm.f=cf97test.f %%%%%%%%%
f90 Library Functions
[variable] = ssum (n,a(m),k)
{The optimized scientific library
SCILIB function `ssum' computes the sum of `n' elements
of array `a' starting from element `m' in steps of `k'; the equivalent,
but not optimal, do loop is {sum=0; do 1 i=m,m+(n-1)*k,k;
1 sum=sum+a(i)}; e.g., `sum=ssum(n,a,1)'
returns the sum of the first `n' elements of the array
`a'; if `a' is an m X n 2-subscript array, use
`t = ssum(m*n,a(1,1),1)'; use `man ssum' for more information;
the `-l libsci.a' option in the `segldr' should be optional.
UPDATE: In cf77 version 6.0, `ssum' has been replaced by the Fortran90 `sum'
function.}
[variable] = sdot (n,a,1,b,1)
{The optimized SCILIB function `sdot' returns the calculated value of
the dot product of `n' elements of the vectors `a' and `b' in steps of 1;
the `-l libsci.a' option of the `segldr' should be optional.
UPDATE: In cf77 version 6.0, `sdot' has been replaced by the Fortran90
`DOT_PRODUCT' function.}
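The strided loops that `ssum' and `sdot' replace can be written out in C as below. This is a sketch only. The pointer argument plays the role of `a(m)', the first element to use. Only positive increments are handled, and the library's treatment of negative increments is not modeled. The function names are hypothetical.

```c
/* Sum of n elements of a, taken in steps of k (k > 0). */
double strided_sum(int n, const double *a, int k)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i * k];
    return sum;
}

/* Dot product of n elements of a and b with increments inca, incb (> 0). */
double strided_dot(int n, const double *a, int inca, const double *b, int incb)
{
    double dot = 0.0;
    for (int i = 0; i < n; i++)
        dot += a[i * inca] * b[i * incb];
    return dot;
}
```

For the 2 by 3 array b = 1 3 5 / 2 4 6 of the test code, stored column-major, `strided_sum(3, b, 2)' is the first row sum 9. `strided_dot(3, b, 2, b + 1, 2)' is the row dot product 44.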
call mxm (a,m,b,kmax,c,n)
{The optimized SCILIB subroutine returns the calculated value of the
full matrix by matrix product of
the `m X kmax' array `a' and the `kmax X n'
array `b' into the `m X n' output array `c';
use `mxma' for multiplication of sub-matrices when the matrices are not
full; use `man mxm' (ignore UNICOS function of the same name) or
'man mxma' for more information; the `-l libsci.a' option
of the `segldr' should be optional.
UPDATE: In cf77 version 6.0, `mxm' has been replaced by the Fortran90
`MATMUL' function.}
call mxv (a,m,b,n,c)
{The optimized SCILIB subroutine returns the calculated value of the
full matrix by vector product of
the `m X n' array `a' and the `n'
vector `b' into the length-`m' output vector `c', by rolling up the `j'
loop; use `man mxv'; the `-l libsci.a' option
of the `segldr' should be optional.
UPDATE: In the T90 cf77 version 6.0, `mxv' has been replaced by the Fortran90
`MATMUL' function.}
call random_number([HARVEST=][variable])
{F90 pseudo-random number generator on [0,1), as an intrinsic subroutine rather
than an intrinsic function, that gets the first or next random number or array
and stores it in the user output variable or array `[variable]'.
For example:
real s, r(100,100)
call random_number(harvest=s)
call random_number(r)
See `man random_number' for more information,
or `man random_seed' for changing the random sequence.
}
[random-variable] = ranf()
{UNICOS Pseudo-random number generator on [0,1]
that gets the first or next random
number, e.g.,
real s, r(100,100)
s = ranf()
do i = 1,100
do j = 1,100
r(i,j) = ranf()
enddo
enddo
or use
`r(1:n) = ranf()' in Fortran90 notation; use `x(1:n) = -log(r(1:n))' to
convert to an exponential distribution; change the random
generator seed using `call ranset([new-seed])', but it is not necessary
to start with a seed; `ranf' is an efficient
random number generator since it properly vectorizes in loops;
use `man ranf' for more information, including use for C/C++ as `_ranf()'
which requires the following include and declaration statements:
#include
}
call wheni[reln]([nfind],[iarray],[inc],[itarget],[index],[nval]) (CR)
{Finds all integer array (`[iarray]') elements in relation
(`[reln]') to the integer target (`[itarget]'); `[reln]' = `lt', `le',
`gt' or `ge'; `[nfind]' is the number of elements to be searched in
increments of `[inc]';
`[index]' is the integer array of the indices of the elements found; and
`[nval]' is the number of indices found.}
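The C sketch below shows the `lt' case of this search under stated assumptions. The function `when_lt' is hypothetical and is not the SCILIB routine itself. Its returned indices are taken to be 1-based positions along the strided sequence, which may differ from the library's own convention.

```c
/* Scan nfind elements of iarray with stride inc; record in index[] the
   1-based positions (along the strided sequence) whose value is less than
   itarget, and return the count (the analog of [nval]). */
int when_lt(int nfind, const int *iarray, int inc, int itarget, int *index)
{
    int nval = 0;
    for (int i = 0; i < nfind; i++)
        if (iarray[i * inc] < itarget)
            index[nval++] = i + 1;
    return nval;
}
```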
T90 Fortran90 (f90) Compiler Vector Toggling Directives
!DIR$ VECTOR
{Compiler directive causes all following
inner DO loops to be vectorized, unless
a loop is known to have only one iteration, until superseded
by another directive that alters vectorization.}
!DIR$ NOVECTOR
{Directive turns off vectorization, starting at the next DO loop,
until it is turned back on.}
!DIR$ VSEARCH
{Directive permits optimization of loops that can have a premature
exit, as with convergence of an iteration. !DIR$ NOVSEARCH
directive turns it off.}
!DIR$ INLINE
{Directive turns on inlining, inline code generation, of subprograms
if `-I' or `-o inline' f90 options are used; !DIR$ NOINLINE
turns inlining off.}
T90 Fortran90 (f90) Compiler Scalar Optimization Directives
!DIR$ BL
{Directive turns on bottom loading for loops, pre-fetching data for
the next loop iteration; !DIR$ NOBL turns BL off.}
!DIR$ NOSIDEEFFECTS [subprogram-name]
{Allows keeping data in registers across calls to the subprogram, if no global
data (i.e., arguments or common blocks) are changed.}
!DIR$ SUPPRESS [variable-list]
{Directive temporarily suppresses scalar optimization on variables
in loops containing the directive.}
T90 Fortran90 (f90) Compiler Loop Directives
!DIR$ IVDEP
{Directive causes the compiler to Ignore Vector DEPendencies in only
the next innermost DO loop. Disabled by the NOVECTOR directive.}
!DIR$ NEXTSCALAR
{Directive causes only the very NEXT DO loop to be executed in SCALAR
Mode with vectorization resuming if on. Disabled by the NOVECTOR
directive.}
!DIR$ SHORTLOOP
{Directive reduces vectorization overhead for the very next loop,
which is presumed SHORT, i.e., to have fewer than 64 iterations. Disabled by
the NOVECTOR directive.}
!DIR$ RECURRENCE
{Directive turns on vectorization of reduction loops (e.g., sum loops);
!DIR$ NORECURRENCE turns it off.
Disabled by the NOVECTOR directive.}
T90 Fortran90 (f90) Compiler Storage Directives
!DIR$ VFUNCTION [external-function-list]
{Directive declares Vector version of an external CAL FUNCTION, where
CAL is the Cray Assembler Language; the function cannot be
declared in an External statement. Works with a list of CAL functions
separated by commas. See the f90 Vol. 1 for other restrictions.
}
!DIR$ AUXILIARY [array-list]
{Storage directive allows assignment to the secondary disk storage
for the Cray Y-MP only.}
T90 Fortran90 (f90) Compiler Diagnostic Directives
!DIR$ BOUNDS [array-names]
{Allows the checking of array subscript bounds, but inhibits
vectorization;
applies to all arrays unless particular ones are listed as arguments.}
!DIR$ NOBOUNDS
{Prevents checking of subscript bounds.}
!DIR$ FLOW
{Turns on Flow-trace and `!DIR$ NOFLOW' turns it off.}
T90 Fortran (f90) Multitasking Options
ja (CR)
${TMP}/[fn] < [data] >& [output] & (CR)
with the job accounting information appearing in a file of the form
`.jacct[jobid]'. Including the pass-through option
`-Wd"-l [fn].ml"' will also produce an fpp summary listing
in `[fn].ml' (but no executable)
with the markers `P' for autotasked, `V' for
vectorized, `N' for not chosen or not optimized, and `D' for
data dependent.
f90 -O full -M [fn].f > [fn].m & (CR)
: The `-M' option results in the intermediate Fortran file
`[fn].m' with microtasking directives automatically inserted into
the `[fn].f' source using the dependence analysis of the `fpp'
preprocessor; no object or executable file is produced;
the user can insert additional compiler directives into `[fn].m'
and compile it with the Cray Fortran multitasking processor
`fmp', the translator of the directives,
by `fmp [fn].m > [fn].j (CR)'; the intermediate expanded
file `[fn].j' is further assembled, linked and loaded by
`sld -o [exec] [fn].j (CR)'.
Cray T90 f90 and cc Timing Utility Functions.
T90 Fortran90 (f90) Timing Utility Functions
real tv(100),cputim(100)
character*24 tchar(100)
kt = 1
tv(kt) = second() ! first 2 calls get the overhead
kt = kt + 1
tv(kt) = second() ! initial time
code-continues
... more code ...
code-continues
kt = kt + 1
tchar(kt) = 'loop [999]'
tv(kt) = second()
do [999] i = 1, [1000]
code-continues
... rest of do ....
code-continues
999 continue
kt = kt + 1
tv(kt) = second() !tv(kt) - tv(kt-1) = do-cputime
code-continues
... more do loops and more timing step pairs ....
code-continues
kt = kt +1
tv(kt) = second() !final time
overhd = tv(2) - tv(1) !timer second overhead
do [99999] ks =3, kt - 2 !cpu-time for each timed loop
cputim(ks) = tv(ks+1) - tv(ks) - overhd
write(6,[99998]) ks, cputim(ks), tchar(ks)
Comment: writes hinder vector optimization, so save writes until last
99999 continue
99998 format(1x,i3,' time =',f12.7,' for ',a)
cputot = tv(kt) - tv(2) - (kt-2)*overhd
Caution: due to overhead variability, total can be off for small job
write(6,*) 'total cpu-time =',cputot
For timing small loops, put the small loop inside another
loop that just does a large number of repetitions of the small
loop, say N, then divide the time difference by N;
use `man second' for other information.
[flag] = gettimeofday(&tp,&tzp); : C/C++ microsecond
wall clock timer and time zone utility. Requires the special header
include statement: `#include
struct timeval tp ;
/* timeval is a structure, here named tp, having */
/* time_t tp.tv_sec giving seconds since Jan. 1, 1970 */
/* long tp.tv_usec giving microseconds */
struct timezone tzp; /* needed only for time zone data */
See `man gettimeofday' for more information and the Cray T90 C Starter
example: `t90startcc.c'.
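The repetition advice for small loops and the `gettimeofday' timer can be combined in C as below. This is a minimal sketch that assumes a POSIX system with <sys/time.h>. The function names and the timed loop are hypothetical. The time-zone argument is passed as NULL because only wall-clock time is needed.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds from gettimeofday (microsecond resolution). */
double wall_seconds(void)
{
    struct timeval tp;
    gettimeofday(&tp, NULL);          /* time-zone argument not needed */
    return (double)tp.tv_sec + 1.0e-6 * (double)tp.tv_usec;
}

/* Time a small loop by repeating it nrep times and dividing the elapsed
   wall-clock time by nrep. */
double seconds_per_pass(double *a, size_t n, long nrep)
{
    double t0 = wall_seconds();
    for (long r = 0; r < nrep; r++)
        for (size_t i = 0; i < n; i++)
            a[i] = a[i] * 0.5 + 1.0;  /* the small loop being timed */
    double t1 = wall_seconds();
    return (t1 - t0) / (double)nrep;
}
```

Because this is wall-clock time, other load on the machine is included, so repeat the measurement and prefer the smaller results.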
cc Timing Utility Function
gettimeofday
Example C program using the gettimeofday function:
#include
Table of T90/T3E Timers
T90 (perhaps MPP) Timer Summary ... MCS572 F95/FBH
Timer         TimeMeasured    Units         Comments
-----         ------------    -----         --------
clock         System&User     Microseconds
cpused        User            ClockTicks    RTC ticks
gettimeofday  WallTime        Microseconds  plus many other things from TOD; C function
ja            ElapsedUserSys  Seconds       plus more; on T90; only mppexec for T3E
rtclock       User            ClockTicks    current RTC ticks
RTC           RealTimeClock   ClockTicks    float version
IRTC          RealTimeClock   ClockTicks    int version
second        User            Seconds       coarse, not useful for small timings
secondr       ElapsedWall     Seconds       coarse, not useful for small timings
sysclock      RealTimeClock   ClockTicks    plus #wraps (overflows)
timef         ElapsedWall     Milliseconds  function gives elapsed time since 1st call
times         Process&Child   ClockTicks    needs include
Notes:
There are several other timers, but not appropriate for scientific computing.
For actual use, consult the timer man page.
Ideally, a timer should give user time in intervals as small as microseconds.
Hence, an ideal timer for the T3E would have to be designed from an rtc clock.
Job accounting (ja) is done on the T90, but gives the mppexec time (which must be the T3E time).
Using the C routine `gettimeofday' would give only a rough approximation,
as suggested on the now-extinct Thinking Machines Corp. CM-5.
T3E MPI Wall Timer
Web Source: http://www.math.uic.edu/~hanson/tcs03guide.html