***************************************************************************** * Laplace Equation Numerically Solved by Jacobi Iteration Method on PSC T3E * Temperature T is initially 225 everywhere except at boundaries. * Revision for Correcting Errors and Stopping for Convergence. * * Physical Problem Domain Decomposed Array Problem * T=300+100*x/LX 0X,J * |y | | |* | | | | | * | | | I| | | | | | * T=100. |T=225.0| T=100.0 0x LY NR+1|____|____|____|____| |Y,I** * 0.0 LX 0 J NC+1 * T=100.0 0.0 LX/4 LX/2 3LX/4 LX * * Use Central Differencing Method with 4 PEs. * Rectangular Domain is decomposed into 4 vertical panels for Col-wise Fortran. * Each process gets its own copy of the code to execute, * with parallel synchronization handled by MPI in mpprun environment. * Each process only has subgrid for VPE=0,1,2,...,npes-1=3 here, * where VPE=VirtualOrUserProcessingElement. * Each Processor works on sub grid and then sends its Boundaries to neighbours * ** In array formulation, Y,I axis is Reversed Since T(0,0) Element at TopLeft * Note: Array T(i,j) is Only Local to Each PE and its Panel! * *97: FH/MCS572F97 interactive run suggestions: *97 compile and load: f90 -O2 -r3 -Xm -l mpi -o fpgm fpgm.f & *97 execute: mpprun -n4 fpgmfpgm.output& *97 Try ps Process Status Command While Running & Should See 4 Jobs Executing! *97 Try jobs Jobs Status Command to Check for Running, Exit or Done. *97 Input contains number of interations: niter * * Susheel Chitre PSC.... Feb 1994 *********************************************************** program laplace implicit none * New Definitions Correct Coarse Mesh Error In Original * Let NPROC=#procs;NYI=#YSubInts;NXI=#XSubInts;NXIL=#XSubIntsLocal; * NR=#InteriorRows;NC=#InteriorCols;NC4=#InteriorColsLocal; * MXITER=LimitMaxIterations: integer NPROC, NYI, NXI, NXIL, & NR, NC, NC4, MXITER parameter (NPROC=4, NYI=200, NXI=200, NXIL=NXI/NPROC, & NR=NYI-1, NC=NXI-1, NC4=NXIL, MXITER=5000) * Caution: Count Subintervals Rather Than Interior(Rows,Cols) which are * not properly divisible while Subintervals are (ERROR CORRECTED). * Let (LY,LX)=(Y,X)Lengths;(HY,HX)=(DY,DX)=(Y,X)StepSize: Corected Original real*8 LY, LX, HY, HX parameter (LY=1.e0, LX= 1.e0, HY=LY/NYI, HX=LX/NXI) * Let NR2=#Rows/2;NC0=StartCol;NCE=EndCol;DNC=ColStep for Sample Print integer NR2, NC0, NCE, NCL2, DNC parameter (NR2=NYI/2,NC0=NXIL/25,NCE=NXIL-NC0+1,NCL2=NXIL/2 & ,DNC=NCL2-NC0) integer NCL, NRO0,NROE,DNRO,NCO0,NCOE,DNCO parameter(NRO0=0,NROE=NYI,DNRO=NYI/20,NCO0=0,NCOE=NXIL & ,DNCO=NXIL/5) * Let stoptol=Stopping Tolerance for dtg=GlobalMaxAbsChange real*8 stoptol parameter (stoptol=0.5e-2) * Let LEFT=LeftMsgTag;RIGHT=RightMsgTag;ROOT=MasterVPE#: integer LEFT, RIGHT, ROOT parameter (LEFT=100, RIGHT=101, ROOT=0) * Timer: Declare MPI Wall Timer Variables (double precision is not supported): real*8 starttime,stoptime,timerres,totalwalltime * MPI Step: Include MPI Fortran Header Definitions and Declarations include 'mpif.h' * Let T(*,*)=LocalPanelTemp;Told(*,*)=OldLocalPanelTemp;dt=LocalMaxAbsChange; * and dtg=GlobalMaxAbsChange: real*8 T(0:NYI,0:NXIL+1), Told(0:NYI,0:NXIL+1), dt, dtg * Let i=Row(y)Index;j=Col(x)Index;iter=CurrentIteration;niter=#Iterations; * and mype=LocalVPE#;npes=#PEs;ierr=MPIReturnOrErrorCode;nstop=StopIter: integer i, j, iter, niter, mype, npes, ierr, nstop * MPI Step: Declare status, given included MPI_STATUS_SIZE value: integer status(MPI_STATUS_SIZE) * MPI Step: Initialize MPI, giving each PE copy of code;MPI required subroutine; * using programming model:SPMD=SglPgmMultiData. call MPI_Init(ierr) * MPI Step: Start MPI Wall Timer and get Tick Resolution: starttime = MPI_Wtime() timerres = MPI_Wtick() * MPI Step: Get T3E assigned npes=#PEs or Communication_size: call MPI_Comm_size(MPI_COMM_WORLD, npes, ierr) * where MPI_COMM_WORLD=MPI_Communication_World=UserSetOfPEs. * MPI Step: Get local (this PE) mype=PE#, also called Communication_rank: call MPI_Comm_rank(MPI_COMM_WORLD, mype, ierr) * Laplace: Set Local Initial Conditions for Laplace Equation: call initialize( T ) * Laplace: Set Local Boundary Conditions for Laplace Equation: call set_bcs( T, mype, npes ) * Initialize Iteration Stopping Parameter nstop = 0 NCL = NC4 * Laplace: Root Reads Number of Iterations for this Job from "data" Input File: if( mype .eq. ROOT ) then read*, niter if( niter.gt.MXITER ) niter = MXITER * Correct Local Interior Point Count so Sum = NC = #TotalInteriorPoints NCL=NC4-1 endif * Laplace: Save Old Iteration: do i=0, NR+1 do j=0, NCL+1 Told(i,j) = T(i,j) enddo enddo * MPI Step: All VPES Get Collective Communication of niter from Root: call MPI_Bcast(niter, 1, MPI_INTEGER, ROOT, MPI_COMM_WORLD, ierr) * where count is 1 and datatype is MPI_INTEGER. * * Do Computation on Local Sub-grid for niter 5-Star Jacobi Iterations: * Corrected for Arbitrary (LY,LX)Lengths. * Do 110 iter=1,niter Do j=1,NCL Do i=1,NR T(i,j) = 0.5 * ((Told(i+1,j)+Told(i-1,j))*HX**2 $ + (Told(i,j+1)+Told(i,j-1))*HY**2) $ /(HX**2+HY**2) Enddo Enddo * * Get dt=LocalMaxAbsChange for this PE and Save Old Temperature: * dt = 0 Do j=1,NCL Do i=1,NR dt = max( abs(T(i,j) - Told(i,j)), dt ) Told(i,j) = T(i,j) Enddo Enddo * * MPI Step: mype Sends Right Boundary Data to Right Neighbor mype+1, * using Point_to_Point Send Communication for mype=0,...,npes-2: if (mype .lt. npes-1) & call MPI_Send(T(1,NCL), NR, MPI_DOUBLE_PRECISION, mype+1, & RIGHT, MPI_COMM_WORLD, ierr) * starting at address T(1,NCL) for NR rows using datatype MPI_DOUBLE_PRECISION. * * MPI Step: mype Sends Left Boundary Data to Left Neighbor mype-1, * using Point_to_Point Send Communication for mype=1,...,npes-1: if (mype .ne. 0) & call MPI_Send(T(1,1 ), NR, MPI_DOUBLE_PRECISION, mype-1, & LEFT, MPI_COMM_WORLD, ierr) * starting at address T(1,1) for NR rows using datatype MPI_DOUBLE_PRECISION. * * MPI Step: mype Receives its Left Boundary Data from Left Neighbor mype-1 * (here using wildcard name MPI_ANY_SOURCE; note MsgTag=RIGHT), * using Point_to_Point Receive Communication for mype=1,...,npes-1: if (mype .ne. 0) & call MPI_Recv(T(1,0), NR, MPI_DOUBLE_PRECISION, & MPI_ANY_SOURCE, RIGHT, MPI_COMM_WORLD, status, ierr) * starting at address T(1,0) for NR rows using datatype MPI_DOUBLE_PRECISION, * with Additional Vector Structure status Argument. * * MPI Step: mype Receives its Right Boundary Data from its Right Neighbor mype+1 * (here using wildcard name MPI_ANY_SOURCE; note MsgTag=LEFT), * using Point_to_Point Receive Communication for mype=1,...,npes-1: * if (mype .ne. npes-1) & call MPI_Recv(T(1,NCL+1), NR, MPI_DOUBLE_PRECISION, & MPI_ANY_SOURCE, LEFT, MPI_COMM_WORLD, status, ierr) * starting at address T(1,NCL+1) for NR rows using datatype MPI_DOUBLE_PRECISION, * with Additional Vector Structure status Argument. * * MPI Step: mype Sends to RootPE LocalMaxAbsChange=dt using Collective (Global) * Reduction Maximum Function MPI_MAX with Result Eventually Collected in * GlobalMaxAbsChange=dtg of count 1 and datatype MPI_DOUBLE_PRECISION: call MPI_Reduce(dt, dtg, 1, MPI_DOUBLE_PRECISION, MPI_MAX, & ROOT, MPI_COMM_WORLD, ierr) * * Root Print Selected Values at Every 100th Iteration: * Important that Writes are in Serial Sections on DMP T3E: * Root Also Check Stopping Criteria Tolerance If( mype.eq.0 ) then If( mod(iter,100).eq.0 ) then write(6,1) mype, iter, dtg * 97: VPE=0 Writes Out a Sample Central Value: write(6,2) mype, iter, NR2, NC0, NCL,DNC, & T(NR2,NC0),T(NR2,NCL2),T(NR2,NCE) endif If(dtg.lt.stoptol) then nstop = 1 endif endif * Global Call Broadcast of Possible Stopping Value By All. call MPI_Bcast(nstop, 1, MPI_INTEGER, ROOT, MPI_COMM_WORLD, ierr) 1 format(' MYPE =',I4,'; ITER =',i5, & ,'; GlobalMaxAbsChange:dtg = ',e10.3) 2 format(' MYPE =',I4,'; ITER =',i5 & ,'; T(',i3,',',i3,':',i3,':',i3,') =',3f9.4) * Call Generic T3E barrier to Make Slaves Wait for Root to Write: call barrier * MPI Step: Barrier is Final Call for all PEs to Stop and Wait for Rest * in MPI_COMM_WORLD to Finish Before Continuing on to Next Iteration: call MPI_Barrier( MPI_COMM_WORLD, ierr ) if(nstop.eq.1) exit 110 CONTINUE * Stopping Criterion Stop: if( mype.eq.0 ) then * MPI Step: Stop MPI Wall Timer: stoptime = MPI_Wtime() totalwalltime = stoptime - starttime write(6,3) dtg, mype, totalwalltime, timerres endif 3 format(' Stopping for Small GlobalMaxAbsChange:dtg = ',e10.3 & /' MYPE =',i4,'; TotalWallTime =',e12.4 & ,' secs; WallTick =',e12.4,' secs;') * Print Final Sample Output Temperatures for Each mype's Panel: write(6,4) mype, iter, (j,j=NCO0,NCL,DNCO),NCL+1 4 format(' Sample Output: MYPE =',I4,'; ITER =',i5/' I/J ',10i7) do i = NROE, NRO0, -DNRO write(6,5) i, (T(i,j), j=NCO0,NCL,DNCO),T(i,NCL+1) end do 5 format(i4,1x,10f7.2) * MPI Step: MPI Required Finalization Step: call MPI_Finalize(ierr) * * End of Main Program Compilation! * END *----------------------------------------------------------------------* subroutine initialize( T ) * Laplace/Jacobi: Set Initial Conditions on Interior to Start Iterations. implicit none integer NPROC, NYI, NXI, NXIL, NR, NC, NC4, MXITER parameter (NPROC=4, NYI=200, NXI=200, NXIL=NXI/NPROC, & NR=NYI-1, NC=NXI-1, NC4=NXIL, MXITER=5000) real*8 LY, LX, HY, HX parameter (LY=1.e0, LX= 1.e0, HY=LY/NYI, HX=LX/NXI) real*8 T(0:NYI,0:NXIL+1), Told(0:NYI,0:NXIL+1) integer i, j do i=0, NYI do j=0, NXIL+1 T(i,j) = 225.0 enddo enddo return end *----------------------------------------------------------------------* subroutine set_bcs( T, mype, npes ) * Laplace: Set Boundary Conditions on Left, Right, Top and Bottom Boundaries. implicit none integer NPROC, NYI, NXI, NXIL, NR, NC, NC4, MXITER parameter (NPROC=4, NYI=200, NXI=200, NXIL=NXI/NPROC, & NR=NYI-1, NC=NXI-1, NC4=NXIL, MXITER=5000) real*8 LY, LX, HY, HX parameter (LY=1.e0, LX= 1.e0, HY=LY/NYI, HX=LX/NXI) real*8 T(0:NYI,0:NXIL+1), Told(0:NYI,0:NXIL+1) integer i, j, mype, npes, jx, NCL *97 Compute Local Step Sizes NCL = NC4 jx=1 * * Set Left and Right Boundaries: * if( mype.eq.0 ) then do i=1,NR * Geometric Right And Array Right: T(i,0 ) = 100.0 + 200.0*i*hy enddo NCL = NC4 - 1 jx = 0 endif if( mype.eq.npes-1 ) then do i=1,NR * Geometric Left And Array Left: T(i,NXIL+1) = 100.0 + 300.0*i*HY enddo endif * * Set Bottom and Top Boundaries: * do j=0,NXIL+1 * Geometric Bottom, But Array Top: T(0 ,j) = 100.0 * Geometric Top, But Array Bottom: T(NR+1,j) = 300.0 + 100.0*(mype*LX/4+(j-jx)*HX) enddo return end