Introduction to Pthreads
========================

We illustrate the use of Pthreads to implement the work crew model,
in which threads process a sequence of jobs, given in a queue.

the POSIX threads programming interface
---------------------------------------

Before we start programming shared memory parallel computers,
let us specify the relation between threads and processes.

.. index:: thread, process, stack, heap

A thread is a single sequential flow within a process.
Multiple threads within one process share heap storage,
for dynamic allocation and deallocation;
static storage, fixed space; and code.
Each thread has its own registers and stack.
Threads share the same single address space and synchronization
is needed when threads access the same memory locations.
A single threaded process is depicted in :numref:`figthreadedprocess`
next to a multithreaded process.

.. _figthreadedprocess:

.. figure:: ./figthreadedprocess.png
   :align: center

   At the left we see a process with one single thread
   and at the right a multithreaded process.

The difference between the stack and the heap:

* stack: Memory is allocated by reserving a block of fixed size
  on top of the stack.
  Deallocation is adjusting the pointer to the top.

* heap: Memory can be allocated at any time and of any size.

.. index:: thread safe

Every call to ``calloc`` (or ``malloc``) and the deallocation
with ``free`` involves the heap.
Memory should typically be allocated before the threads start
and deallocated after all threads have finished.
Otherwise, in a multithreaded process, the memory allocation
and deallocation should occur in a critical section.
Code is *thread safe* if its simultaneous execution
by multiple threads is correct.
For UNIX systems, a standardized C language threads programming
interface has been specified by the IEEE POSIX 1003.1c standard.
POSIX stands for Portable Operating System Interface.
Implementations of this :index:`POSIX` threads programming interface
are referred to as POSIX threads, or Pthreads.
We can see that ``gcc`` supports POSIX threads
when we ask for its version number:

::

   $ gcc -v
   ... output omitted ...
   Thread model: posix
   ... output omitted ...

In a C program we just insert

::

   #include <pthread.h>

and compilation may require the switch ``-pthread``

::

   $ gcc -pthread program.c

Using Pthreads
--------------

Our first program with Pthreads is once again a hello world.
We define the function each thread executes:

::

   #include <stdlib.h>
   #include <stdio.h>
   #include <pthread.h>

   void *say_hi ( void *args );
   /*
    * Every thread executes say_hi.
    * The argument contains the thread id. */

   int main ( int argc, char* argv[] ) { ... }

   void *say_hi ( void *args )
   {
      int *i = (int*) args;
      printf("hello world from thread %d!\n",*i);
      return NULL;
   }

Typing ``gcc -o /tmp/hello_pthreads hello_pthreads.c``
at the command prompt compiles the program
and execution goes as follows:

::

   $ /tmp/hello_pthreads
   How many threads ? 5
   creating 5 threads ...
   waiting for threads to return ...
   hello world from thread 0!
   hello world from thread 2!
   hello world from thread 3!
   hello world from thread 1!
   hello world from thread 4!
   $

Below is the main program:

::

   int main ( int argc, char* argv[] )
   {
      printf("How many threads ? ");
      int n; scanf("%d",&n);
      {
         pthread_t t[n];
         pthread_attr_t a;
         int i,id[n];
         printf("creating %d threads ...\n",n);
         for(i=0; i<n; i++)
         {
            id[i] = i;
            pthread_attr_init(&a);
            pthread_create(&t[i],&a,say_hi,(void*)&id[i]);
         }
         printf("waiting for threads to return ...\n");
         for(i=0; i<n; i++) pthread_join(t[i],NULL);
      }
      return 0;
   }

In the work crew model, the threads process a sequence of jobs,
given in a queue.
Every thread works with a ``jobqueue`` structure
that holds its own label ``id``,
while ``nextjob`` and ``work`` point to data shared by all threads:

::

   typedef struct
   {
      int id;       /* label of the thread */
      int nb;       /* number of jobs */
      int *nextjob; /* index of the next job, shared by all threads */
      int *work;    /* array of nb jobs, shared by all threads */
   } jobqueue;

The function ``make_jobqueue`` generates a queue of ``n`` jobs.
Each job is a random number between 1 and 5,
the number of seconds a thread sleeps to process the job:

::

   jobqueue *make_jobqueue ( int n )
   {
      jobqueue *jobs;

      jobs = (jobqueue*) calloc(1,sizeof(jobqueue));
      jobs->nb = n;
      jobs->nextjob = (int*)calloc(1,sizeof(int));
      *(jobs->nextjob) = 0;
      jobs->work = (int*) calloc(n,sizeof(int));
      int i;
      for(i=0; i<n; i++)
         jobs->work[i] = 1 + rand() % 5;
      return jobs;
   }

The function to process the jobs by ``n`` threads is defined below:

::

   int process_jobqueue ( jobqueue *jobs, int n )
   {
      pthread_t t[n];
      pthread_attr_t a;
      jobqueue q[n];
      int i;
      printf("creating %d threads ...\n",n);
      for(i=0; i<n; i++)
      {
         q[i].nb = jobs->nb;
         q[i].id = i;
         q[i].nextjob = jobs->nextjob;
         q[i].work = jobs->work;
         pthread_attr_init(&a);
         pthread_create(&t[i],&a,do_job,(void*)&q[i]);
      }
      printf("waiting for threads to return ...\n");
      for(i=0; i<n; i++) pthread_join(t[i],NULL);
      return *(jobs->nextjob);
   }

implementing a critical section with mutex
------------------------------------------

Running the processing of the job queue can go as follows:

::

   $ /tmp/process_jobqueue
   How many jobs ? 4
   4 jobs : 3 5 4 4
   How many threads ? 2
   creating 2 threads ...
   waiting for threads to return ...
   thread 0 requests lock ...
   thread 0 releases lock
   thread 1 requests lock ...
   thread 1 releases lock
   *** thread 1 does job 1 ***
   thread 1 sleeps 5 seconds
   *** thread 0 does job 0 ***
   thread 0 sleeps 3 seconds
   thread 0 requests lock ...
   thread 0 releases lock
   *** thread 0 does job 2 ***
   thread 0 sleeps 4 seconds
   thread 1 requests lock ...
   thread 1 releases lock
   *** thread 1 does job 3 ***
   thread 1 sleeps 4 seconds
   thread 0 requests lock ...
   thread 0 releases lock
   thread 0 is finished
   thread 1 requests lock ...
   thread 1 releases lock
   thread 1 is finished
   done 4 jobs
   4 jobs : 0 1 0 1
   $

After the run, every job in the queue is marked
with the label of the thread that processed it.

.. index:: mutex, critical section

There are three steps to use a ``mutex`` (mutual exclusion):

1. initialization: ``pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;``

2. request a lock: ``pthread_mutex_lock(&L);``

3. release the lock: ``pthread_mutex_unlock(&L);``

The main function is defined below.
The flag ``v`` controls the verbosity of the output
and ``write_jobqueue`` prints the jobs in the queue:

::

   pthread_mutex_t read_lock = PTHREAD_MUTEX_INITIALIZER;

   int main ( int argc, char* argv[] )
   {
      printf("How many jobs ? ");
      int njobs; scanf("%d",&njobs);
      jobqueue *jobs = make_jobqueue(njobs);
      if(v > 0) write_jobqueue(jobs);
      printf("How many threads ? ");
      int nthreads; scanf("%d",&nthreads);
      int done = process_jobqueue(jobs,nthreads);
      printf("done %d jobs\n",done);
      if(v>0) write_jobqueue(jobs);
      return 0;
   }

Below is the definition of the function ``do_job``:

::

   void *do_job ( void *args )
   {
      jobqueue *q = (jobqueue*) args;
      int dojob;
      do
      {
         dojob = -1;
         if(v > 0) printf("thread %d requests lock ...\n",q->id);
         pthread_mutex_lock(&read_lock);
         int *j = q->nextjob;
         if(*j < q->nb) dojob = (*j)++;
         if(v>0) printf("thread %d releases lock\n",q->id);
         pthread_mutex_unlock(&read_lock);
         if(dojob == -1) break;
         if(v>0) printf("*** thread %d does job %d ***\n",
                        q->id,dojob);
         int w = q->work[dojob];
         if(v>0) printf("thread %d sleeps %d seconds\n",q->id,w);
         q->work[dojob] = q->id; /* mark job with thread label */
         sleep(w);
      } while (dojob != -1);

      if(v>0) printf("thread %d is finished\n",q->id);

      return NULL;
   }

Only the reading and the incrementing of the index of the next job
happen in the critical section.
Each thread sleeps outside the critical section,
so the threads process their jobs in parallel.

Pthreads allow for the finest granularity.
Applied to the computation of the Mandelbrot set:
One job is the computation of the grayscale of one pixel,
in a 5,000-by-5,000 matrix.
The next job has number :math:`n = 5{,}000 \times i + j`,
where :math:`i = \lfloor n/5{,}000 \rfloor`
and :math:`j = n ~{\rm mod}~ 5{,}000`.

The Dining Philosophers Problem
-------------------------------

A classic example to illustrate the synchronization problem
in parallel programs is the dining philosophers problem.

The problem setup, rules of the game:

1. Five philosophers are seated at a round table.

2. Each philosopher sits in front of a plate of food.

3. Between every two adjacent plates is exactly one chopstick.

4. A philosopher thinks, eats, thinks, eats, ...

5. To start eating, every philosopher

   1. first picks up the left chopstick, and

   2. then picks up the right chopstick.

Why is there a problem?
The problem of the starving philosophers:

* every philosopher picks up the left chopstick, at the same time,

* there is no right chopstick left, every philosopher waits, ...

Bibliography
------------

1. Compaq Computer Corporation.
   **Guide to the POSIX Threads Library**, April 2001.

2. Mac OS X Developer Library.
   **Threading Programming Guide**, 2010.

Exercises
---------

1. Modify the ``hello world!`` program so that the
   master thread prompts the user for a name which is used in the greeting
   displayed by thread 5.
   Note that only one thread, the one with number 5, greets the user.

2. Consider the Monte Carlo simulations we have developed with MPI
   for the estimation of :math:`\pi`.
   Write a version with Pthreads and examine the speedup.

3. Consider the computation of the Mandelbrot set as implemented in
   the program ``mandelbrot.c`` of lecture 7.
   Write code for a work crew model of threads to compute the grayscales
   pixel by pixel.
   Compare the running time of your program using Pthreads
   with your MPI implementation.

4. Write a simulation for the dining philosophers problem.
   Could you observe starvation? Explain.