Cautions:
MPI Code PBS/NQS Generic Batch Job Script:
"qsub cpgm[n].job" for C OR "qsub fpgm[n].job" for F90
is the queue submit command on the TCS command line in the "${SCRATCH}" directory, or on NCSA in the "${HOME}" directory (it submits the job to the batch queue, and the job executes using a node-local directory just for this job, but the output should appear in the directory from which it was submitted), for some number "[n]" of processors;
MPI Code Batch Execution Initiated on Argo's Command Line:
These can also be set in your .cshrc C-Shell resource configuration file, as long as the full explicit path is given for the latter two, since they are recursive; this full path can be determined, for example, using echo $PATH or echo $LD_LIBRARY_PATH
(Caution: the Argo Running Jobs page has several errors in this format). Also, the GNU F77 g77 compiler or the GNU C++ g++ compiler can be used instead of gcc; or any of the Portland Group compilers such as /usr/common/pgi/linux86/bin/pgcc can be used if their full path is given.
where scasub is the SCALI batch job submit command used on the Linux command line in place of the qsub Unix batch job script submit command, mpirun is the standard MPI run command, [#processors] is the requested number of processors for 1 to 16 compute nodes, and [executable] is the executable compiled with the C or another compiler. The first output should be the Job Number [Job#] in the format [Job#].argo.cc.edu. Unlike the large-scale clusters, TCS and Platinum, file input by redirection is not convenient in this format, so using either assignments, data initialization, definitions, or file open/scan (fopen/scanf) is suggested for data input (see the sketch after this discussion).
where the "nodes" option -- is two dashes and the node/process format is of the form
Actually, on Argo, mpirun is a wrapper around mpimon, unlike the standard mpirun.
and after the job has Run (R) and Exited (E), the output will be found in the $HOME default files [mpirun].o[Job#] or [mpimon].o[Job#] for standard output and [mpirun].e[Job#] or [mpimon].e[Job#] for standard error, respectively, for MPIRUN or MPIMON jobs.
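Since Unix redirection of standard input is awkward under this Argo batch format, here is a minimal sketch of the suggested fopen/fscanf alternative, assuming (for illustration only) an input file named "cdata" containing a single integer in the submission directory: process 0 reads the value and broadcasts it to the other processes.

/* Sketch: in-code file input for Argo, where stdin redirection is awkward.
   Process 0 reads the value with fopen/fscanf and broadcasts it;
   the file name "cdata" and the single integer are assumed for illustration. */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                      /* only the root touches the file */
        FILE *fp = fopen("cdata", "r");
        if (fp == NULL || fscanf(fp, "%d", &n) != 1) {
            fprintf(stderr, "cannot read cdata; using a default\n");
            n = 1000;                     /* fall back to data initialization */
        }
        if (fp != NULL) fclose(fp);
    }
    /* everyone else gets the input by message passing, not by file I/O */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("process %d has n = %d\n", rank, n);
    MPI_Finalize();
    return 0;
}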
Trapezoidal Rule Code Example (Adapted from Pacheco's PPMPI Code):
trap.c for C OR trap.f for F90
For nodes:procs = 1:1 => trap1c.output for C OR trap1f.output for F90
For nodes:procs = 1:4 => trap4c.output for C OR trap4f.output for F90
For nodes:procs = 2:8 => trap8c.output for C OR trap8f.output for F90
"prun -N[#nodes] -n[#processors] ./cpgm < cdata" for C
OR better
"prun -N ${RMS_NODES} -n ${RMS_PROCS} ./cpgm < cdata" for F90.
trapdata for C or F90 is a dummy input file, since the trapezoidal code does not require input. On Argo, data input by Unix redirection is not convenient anyway. However, the number of nodes assigned should be sufficiently large, and the function being integrated should be sufficiently complicated, to constitute a super job of super performance.
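For reference, here is a minimal sketch in the spirit of Pacheco's parallel trapezoidal rule, not the posted trap.c itself: the endpoints a, b, the number of trapezoids n, and the integrand f(x) = x*x are assumptions set in the code (consistent with trapdata being only a dummy input file), and the partial integrals are combined with MPI_Reduce on process 0.

/* Sketch of a parallel trapezoidal rule in the spirit of Pacheco's trap.c;
   a, b, n, and the integrand f(x) = x*x are assumptions for illustration,
   not the values in the posted code. */
#include <stdio.h>
#include "mpi.h"

static double f(double x) { return x * x; }   /* assumed integrand */

/* serial trapezoidal rule on [left, right] with local_n trapezoids of width h */
static double Trap(double left, double right, int local_n, double h)
{
    double sum = (f(left) + f(right)) / 2.0;
    int i;
    for (i = 1; i < local_n; i++)
        sum += f(left + i * h);
    return sum * h;
}

int main(int argc, char *argv[])
{
    int rank, size, local_n;
    double a = 0.0, b = 1.0;     /* set in code: no input file needed */
    int n = 1024;                /* total number of trapezoids (assumed) */
    double h, local_a, local_b, local_sum, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    h = (b - a) / n;             /* same trapezoid width on every process */
    local_n = n / size;          /* assumes size divides n evenly */
    local_a = a + rank * local_n * h;
    local_b = local_a + local_n * h;
    local_sum = Trap(local_a, local_b, local_n, h);

    /* add up the partial integrals on process 0 */
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("With n = %d trapezoids, integral from %f to %f = %.12f\n",
               n, a, b, total);
    MPI_Finalize();
    return 0;
}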
PI Code Example:
pi_mpi.c for C OR pi_mpia.c for C on Argo (data input in code) OR pi_mpi_cpp.c for C++ (untested) OR pi_mpi.f for F90
For nodes:procs = 1:1 => pi1c.output for C OR pi1f.output for F90
For nodes:procs = 1:4 => pi4c.output for C OR pi4f.output for F90
For nodes:procs = 2:8 => pi8c.output for C OR pi8f.output for F90
pidata for C or F90. The final '0' is a flag to stop scanning/reading the number of nodes. However, the number of nodes should be sufficiently large to constitute a super job of super performance.
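For reference, here is a sketch of the familiar MPI pi computation (midpoint rule for the integral of 4/(1+x^2) on [0, 1]), not the posted pi_mpi.c: process 0 repeatedly reads the number of intervals from standard input (as redirected from pidata) and broadcasts it, and a final 0 stops the run, matching the flag described above.

/* Sketch of the familiar MPI pi computation; not the posted pi_mpi.c.
   Process 0 reads the number of intervals n repeatedly from stdin
   (e.g. redirected from pidata); a final 0 stops the run. */
#include <stdio.h>
#include <math.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size, n, i;
    double h, x, local_sum, pi;
    const double PI_REF = 3.141592653589793;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    while (1) {
        if (rank == 0) {
            if (scanf("%d", &n) != 1) n = 0;      /* read n; 0 (or EOF) stops */
        }
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0) break;

        h = 1.0 / (double) n;
        local_sum = 0.0;
        for (i = rank + 1; i <= n; i += size) {   /* cyclic split of intervals */
            x = h * ((double) i - 0.5);           /* midpoint of interval i */
            local_sum += 4.0 / (1.0 + x * x);
        }
        local_sum *= h;

        MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("n = %d: pi is approximately %.16f, error %.2e\n",
                   n, pi, fabs(pi - PI_REF));
    }

    MPI_Finalize();
    return 0;
}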
2D Laplace Equation by Jacobi Method Code Example:
lap4mpi.c for C OR lap4mpia.c for C on Argo (data input in code) OR lap4mpi.f for F90
lap4c.output for nonconverging 1000 iterations in C OR lap4_f.output for nonconverging 1000 iterations in F90
lapdata for C or F90 supplies the number of iterations.
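For reference, here is a sketch of a Jacobi solver for the 2D Laplace equation with a 1-D row decomposition and ghost-row exchange, in the spirit of, but not identical to, the posted lap4mpi.c: the grid size, boundary values, and evenly dividing process count are assumptions made for illustration; the iteration count is read on process 0 and broadcast, as lapdata supplies it, with no convergence test, consistent with the nonconverging 1000-iteration outputs above.

/* Sketch of a Jacobi solver for the 2D Laplace equation u_xx + u_yy = 0
   with a 1-D row decomposition and ghost-row exchange between neighbors.
   The grid size N, boundary values, and the assumption that the process
   count divides N are illustrative, not the settings of the posted code. */
#include <stdio.h>
#include "mpi.h"

#define N 64                       /* interior grid is N x N (assumed) */

int main(int argc, char *argv[])
{
    int rank, size, iters = 1000, it, i, j, up, down, rows;
    static double u[N + 2][N + 2], unew[N + 2][N + 2];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    rows = N / size;               /* local interior rows; assumes size divides N */
    up   = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    down = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    if (rank == 0) {
        if (scanf("%d", &iters) != 1) iters = 1000;  /* iteration count, as in lapdata */
    }
    MPI_Bcast(&iters, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* initial guess 0 everywhere; fix u = 100 on the global top boundary */
    for (i = 0; i <= rows + 1; i++)
        for (j = 0; j <= N + 1; j++)
            u[i][j] = unew[i][j] = 0.0;
    if (rank == 0)
        for (j = 0; j <= N + 1; j++)
            u[0][j] = unew[0][j] = 100.0;

    for (it = 0; it < iters; it++) {
        /* exchange ghost rows with the neighbors above and below */
        MPI_Sendrecv(u[1],        N + 2, MPI_DOUBLE, up,   0,
                     u[rows + 1], N + 2, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, &status);
        MPI_Sendrecv(u[rows],     N + 2, MPI_DOUBLE, down, 1,
                     u[0],        N + 2, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, &status);

        /* Jacobi update: new value is the average of the four old neighbors */
        for (i = 1; i <= rows; i++)
            for (j = 1; j <= N; j++)
                unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                     + u[i][j - 1] + u[i][j + 1]);

        for (i = 1; i <= rows; i++)       /* copy back for the next sweep */
            for (j = 1; j <= N; j++)
                u[i][j] = unew[i][j];
    }

    if (rank == 0)
        printf("Jacobi: %d iterations done; u[1][N/2] = %f\n", iters, u[1][N / 2]);

    MPI_Finalize();
    return 0;
}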
Email Comments or Questions to Professor Hanson