Using MPI
=========

We illustrate the collective communication commands to scatter data
and to gather results.  Point-to-point communication happens via
a send and a recv (receive) command.

Scatter and Gather
------------------

Consider the addition of 100 numbers on a distributed memory
4-processor computer.  For simplicity of coding, we sum the first
one hundred positive integers and compute

.. math::

   S = \sum_{i=1}^{100} i.

A parallel algorithm to sum 100 numbers proceeds in four stages:

1. Distribute the 100 numbers evenly among the 4 processors.

2. Every processor sums 25 numbers.

3. Collect the 4 partial sums to the manager node.

4. Add the 4 partial sums and print the result.

Scattering an array of 100 numbers over 4 processors and gathering
the partial sums computed by the 4 processors to the root is
displayed in :numref:`figscattergather`.

.. _figscattergather:

.. figure:: ./figscattergather.png
   :align: center

   Scattering data and gathering results.

The scatter and the gather are of the :index:`collective communication`
type, as every process in the universe participates in these operations.
The MPI commands to :index:`scatter` and :index:`gather` are respectively
``MPI_Scatter`` and ``MPI_Gather``.
The specifications of the MPI command to scatter data from one member
to all members of a group are described in :numref:`tabmpiscatter`.
The specifications of the MPI command to gather data from all members
to one member in a group are listed in :numref:`tabmpigather`.

.. index:: MPI_SCATTER

.. _tabmpiscatter:

.. table:: Arguments of the ``MPI_Scatter`` command.

   +------------------------------------------------------------------------------+
   | MPI_SCATTER(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype,root,comm)  |
   +===========+==================================================================+
   | sendbuf   | address of send buffer                                           |
   +-----------+------------------------------------------------------------------+
   | sendcount | number of elements sent to each process                          |
   +-----------+------------------------------------------------------------------+
   | sendtype  | data type of send buffer elements                                |
   +-----------+------------------------------------------------------------------+
   | recvbuf   | address of receive buffer                                        |
   +-----------+------------------------------------------------------------------+
   | recvcount | number of elements in receive buffer                             |
   +-----------+------------------------------------------------------------------+
   | recvtype  | data type of receive buffer elements                             |
   +-----------+------------------------------------------------------------------+
   | root      | rank of sending process                                          |
   +-----------+------------------------------------------------------------------+
   | comm      | communicator                                                     |
   +-----------+------------------------------------------------------------------+

.. index:: MPI_GATHER

.. _tabmpigather:

.. table:: Arguments of the ``MPI_Gather`` command.

   +-----------------------------------------------------------------------------+
   | MPI_GATHER(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype,root,comm)  |
   +===========+=================================================================+
   | sendbuf   | starting address of send buffer                                 |
   +-----------+-----------------------------------------------------------------+
   | sendcount | number of elements in send buffer                               |
   +-----------+-----------------------------------------------------------------+
   | sendtype  | data type of send buffer elements                               |
   +-----------+-----------------------------------------------------------------+
   | recvbuf   | address of receive buffer                                       |
   +-----------+-----------------------------------------------------------------+
   | recvcount | number of elements for any single receive                       |
   +-----------+-----------------------------------------------------------------+
   | recvtype  | data type of receive buffer elements                            |
   +-----------+-----------------------------------------------------------------+
   | root      | rank of receiving process                                       |
   +-----------+-----------------------------------------------------------------+
   | comm      | communicator                                                    |
   +-----------+-----------------------------------------------------------------+
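
Building and running such programs depends on the MPI installation.
As a minimal sketch, assuming the compiler wrapper ``mpicc`` and the
launcher ``mpirun`` of a typical installation (such as MPICH or Open MPI),
and with the name of the executable chosen freely, the program
``parallel_sum.c`` listed below could be compiled and run
with 4 processes as

::

   $ mpicc parallel_sum.c -o parallel_sum
   $ mpirun -np 4 ./parallel_sum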
The code for parallel summation, in the program ``parallel_sum.c``,
illustrates the scatter and the gather.

::

   #include <stdio.h>
   #include <stdlib.h>
   #include <mpi.h>
   #define v 1 /* verbose flag, output if 1, no output if 0 */

   int main ( int argc, char *argv[] )
   {
      int myid,j,*data,tosum[25],sums[4];

      MPI_Init(&argc,&argv);
      MPI_Comm_rank(MPI_COMM_WORLD,&myid);
      if(myid==0) /* manager allocates and initializes the data */
      {
         data = (int*)calloc(100,sizeof(int));
         for (j=0; j<100; j++) data[j] = j+1;
         if(v>0)
         {
            printf("The data to sum : ");
            for (j=0; j<100; j++) printf(" %d",data[j]);
            printf("\n");
         }
      }
      MPI_Scatter(data,25,MPI_INT,tosum,25,MPI_INT,0,MPI_COMM_WORLD);
      if(v>0) /* after the scatter, every node has 25 numbers to sum */
      {
         printf("Node %d has numbers to sum :",myid);
         for(j=0; j<25; j++) printf(" %d", tosum[j]);
         printf("\n");
      }
      sums[myid] = 0;
      for(j=0; j<25; j++) sums[myid] += tosum[j];
      if(v>0) printf("Node %d computes the sum %d\n",myid,sums[myid]);
      MPI_Gather(&sums[myid],1,MPI_INT,sums,1,MPI_INT,0,MPI_COMM_WORLD);
      if(myid==0) /* after the gather, sums contains the four sums */
      {
         printf("The four sums : ");
         printf("%d",sums[0]);
         for(j=1; j<4; j++) printf(" + %d", sums[j]);
         for(j=1; j<4; j++) sums[0] += sums[j];
         printf(" = %d, which should be 5050.\n",sums[0]);
      }
      MPI_Finalize();
      return 0;
   }

Send and Recv
-------------

To illustrate :index:`point-to-point communication`, we consider
the problem of squaring numbers in an array.  An example of an input
sequence is :math:`2, 4, 8, 16, \ldots` with corresponding output
sequence :math:`4, 16, 64, 256, \ldots`.  Instead of squaring,
we could apply an expensive function :math:`y = f(x)` to an array
of values for :math:`x`.

A session with the parallel code on 4 processes runs as

::

   $ mpirun -np 4 /tmp/parallel_square
   The data to square : 2 4 8 16
   Node 1 will square 4
   Node 2 will square 8
   Node 3 will square 16
   The squared numbers : 4 16 64 256
   $

A parallel algorithm to square :math:`p` numbers runs in three stages:

1. The manager sends the :math:`p-1` numbers
   :math:`x_1, x_2, \ldots, x_{p-1}` to the workers.
   Every worker receives one number:
   the :math:`i`-th worker receives :math:`x_i` in :math:`f`.
   The manager copies :math:`x_0` to :math:`f`: :math:`f = x_0`.

2. Every node (manager and all workers) squares :math:`f`.

3. Every worker sends :math:`f` to the manager.
   The manager receives :math:`x_i` from the :math:`i`-th worker,
   for :math:`i=1,2,\ldots,p-1`.
   The manager copies :math:`f` to :math:`x_0`: :math:`x_0 = f`,
   and prints the results.
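
Each stage that moves data relies on one matched pair of a send and
a receive.  The sketch below (with the value, the message tag, and the
variable names chosen arbitrarily, and assuming a run with at least
two processes) shows the smallest such exchange: process 0 sends one
integer and process 1 receives it.

::

   #include <stdio.h>
   #include <mpi.h>

   int main ( int argc, char *argv[] )
   {
      int myid,n;
      MPI_Status status;

      MPI_Init(&argc,&argv);
      MPI_Comm_rank(MPI_COMM_WORLD,&myid);
      if(myid == 0)      /* process 0 sends one integer to process 1 */
      {
         n = 42;
         MPI_Send(&n,1,MPI_INT,1,100,MPI_COMM_WORLD);
      }
      else if(myid == 1) /* process 1 receives the integer from process 0 */
      {
         MPI_Recv(&n,1,MPI_INT,0,100,MPI_COMM_WORLD,&status);
         printf("Process 1 received %d\n",n);
      }
      MPI_Finalize();
      return 0;
   }

The destination, the source, and the message tag in the two calls must
match, as detailed in the tables that follow.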
The MPI commands to perform point-to-point communication are
``MPI_Send`` and ``MPI_Recv``.
The syntax of the :index:`blocking send` operation is given in
:numref:`tabmpisend` and :numref:`tabmpirecv` explains
the :index:`blocking receive` operation.

.. index:: MPI_SEND

.. _tabmpisend:

.. table:: The ``MPI_SEND`` command.

   +--------------------------------------------------+
   | MPI_SEND(buf,count,datatype,dest,tag,comm)        |
   +==========+=======================================+
   | buf      | initial address of the send buffer    |
   +----------+---------------------------------------+
   | count    | number of elements in send buffer     |
   +----------+---------------------------------------+
   | datatype | data type of each send buffer element |
   +----------+---------------------------------------+
   | dest     | rank of destination                   |
   +----------+---------------------------------------+
   | tag      | message tag                           |
   +----------+---------------------------------------+
   | comm     | communicator                          |
   +----------+---------------------------------------+

.. index:: MPI_RECV

.. _tabmpirecv:

.. table:: The ``MPI_RECV`` command.

   +-----------------------------------------------------+
   | MPI_RECV(buf,count,datatype,source,tag,comm,status)  |
   +==========+==========================================+
   | buf      | initial address of the receive buffer    |
   +----------+------------------------------------------+
   | count    | number of elements in receive buffer     |
   +----------+------------------------------------------+
   | datatype | data type of each receive buffer element |
   +----------+------------------------------------------+
   | source   | rank of source                           |
   +----------+------------------------------------------+
   | tag      | message tag                              |
   +----------+------------------------------------------+
   | comm     | communicator                             |
   +----------+------------------------------------------+
   | status   | status object                            |
   +----------+------------------------------------------+

The code for the parallel squaring is below.
Every ``MPI_Send`` is matched by a ``MPI_Recv``.
Observe that there are two loops in the code.
One loop is explicitly executed by the root.
The other loop is implicit, as it is executed
by the ``mpiexec -n p`` command.

::

   #include <stdio.h>
   #include <stdlib.h>
   #include <mpi.h>
   #define v 1     /* verbose flag, output if 1, no output if 0 */
   #define tag 100 /* tag for sending a number */

   int main ( int argc, char *argv[] )
   {
      int p,myid,i,f,*x;
      MPI_Status status;

      MPI_Init(&argc,&argv);
      MPI_Comm_size(MPI_COMM_WORLD,&p);
      MPI_Comm_rank(MPI_COMM_WORLD,&myid);
      if(myid == 0) /* the manager allocates and initializes x */
      {
         x = (int*)calloc(p,sizeof(int));
         x[0] = 2;
         for (i=1; i<p; i++) x[i] = 2*x[i-1];
         if(v>0)
         {
            printf("The data to square : ");
            for (i=0; i<p; i++) printf(" %d",x[i]);
            printf("\n");
         }
      }
      if(myid == 0) /* the manager copies x[0] to f */
      {             /* and sends x[i] to processor i */
         f = x[0];
         for(i=1; i<p; i++)
            MPI_Send(&x[i],1,MPI_INT,i,tag,MPI_COMM_WORLD);
      }
      else /* every worker receives its number in f */
      {
         MPI_Recv(&f,1,MPI_INT,0,tag,MPI_COMM_WORLD,&status);
         if(v>0) printf("Node %d will square %d\n",myid,f);
      }
      f *= f;       /* every node does the squaring */
      if(myid == 0) /* the manager receives f in x[i] from processor i */
         for(i=1; i<p; i++)
            MPI_Recv(&x[i],1,MPI_INT,i,tag,MPI_COMM_WORLD,&status);
      else          /* every worker sends f to the manager */
         MPI_Send(&f,1,MPI_INT,0,tag,MPI_COMM_WORLD);
      if(myid == 0) /* the manager prints the squared numbers */
      {
         x[0] = f;
         printf("The squared numbers : ");
         for(i=0; i<p; i++) printf(" %d",x[i]);
         printf("\n");
      }
      MPI_Finalize();
      return 0;
   }

MPI for Python
--------------

MPI is also available in Python, via the mpi4py package of Dalcin, Paz,
and Storti, listed in the bibliography at the end of this section.
The script below performs the parallel summation of the first 100
positive integers with the point-to-point commands ``COMM.send``
and ``COMM.recv``: every process sums its own range of numbers,
the workers send their partial sums to the manager,
and the manager receives the partial sums, adds them up,
and prints the total.

::

   from mpi4py import MPI

   COMM = MPI.COMM_WORLD
   SIZE = COMM.Get_size()
   RANK = COMM.Get_rank()

   N = 100               # the numbers 1, 2, ..., N are summed
   NBR = N/SIZE          # every process sums NBR numbers
   START = RANK*NBR + 1  # first number summed by this process
   S = sum(range(START, START + NBR))

   SUMS = [0 for i in range(SIZE)]
   if(RANK > 0):
       COMM.send(S, dest=0)
   else:
       SUMS[0] = S
       for i in range(1, SIZE):
           SUMS[i] = COMM.recv(source=i)
       print 'total sum =', sum(SUMS)

Recall that Python is case sensitive and that the distinction between
``Send`` and ``send``, and between ``Recv`` and ``recv``, is important.
In particular, ``COMM.send`` and ``COMM.recv`` have no type declarations,
whereas ``COMM.Send`` and ``COMM.Recv`` have type declarations.
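
As a small sketch of the typed variants, assuming NumPy is installed
and a run with at least two processes, the exchange below sends a
buffer of four integers from process 0 to process 1 with ``COMM.Send``
and ``COMM.Recv``, which operate on typed buffers such as NumPy arrays
instead of pickled Python objects.

::

   from mpi4py import MPI
   import numpy as np

   COMM = MPI.COMM_WORLD
   RANK = COMM.Get_rank()

   DATA = np.zeros(4, dtype='i')  # buffer of four C integers
   if(RANK == 0):
       DATA = np.array([2, 4, 8, 16], dtype='i')
       COMM.Send([DATA, MPI.INT], dest=1)    # typed send of the buffer
   elif(RANK == 1):
       COMM.Recv([DATA, MPI.INT], source=0)  # receive into the buffer
       print 'process 1 received', DATA

For large NumPy arrays, the typed ``Send`` and ``Recv`` avoid the
pickling overhead of the lowercase variants.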
Bibliography
------------

1. L. Dalcin, R. Paz, and M. Storti.
   **MPI for Python**.
   *Journal of Parallel and Distributed Computing*, 65:1108--1115, 2005.

2. M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra.
   *MPI - The Complete Reference Volume 1, The MPI Core*.
   Massachusetts Institute of Technology, second edition, 1998.

Exercises
---------

1. Adjust the parallel summation to work for :math:`p` processors,
   where the dimension :math:`n` of the array is a multiple of :math:`p`.

2. Use C or Python to rewrite the program to sum 100 numbers
   using ``MPI_Send`` and ``MPI_Recv`` instead of
   ``MPI_Scatter`` and ``MPI_Gather``.

3. Use C or Python to rewrite the program to square :math:`p` numbers
   using ``MPI_Scatter`` and ``MPI_Gather``.

4. Show that a hypercube network topology has enough direct connections
   between processors for a fan out broadcast.