Lab 12 - Concurrency

Lab goals:


The tiny Web Server

In your repository for this lab, you will find the source code for a simple web server, known as the tiny web server. In the first exercise in this lab, you will modify this provided tiny server source code to make use of thread concurrency, one of the new concurrency topics covered by this lab.

The provided web server source files are located in the tiny subdirectory of your repository. To build tiny, use the Unix commands:

cd tiny
make

To run the tiny server, use the Unix command on CLEAR:

./tiny  port  &

where port is the port number at which tiny can then be reached. This must be unique integer between 1024 and 65,535, inclusive. And, note that if you want to be able to connect to tiny from outside of the CLEAR machines, you need to use a port number between 18000 and 18200.

To connect to tiny, navigate your web browser to the URL

http://hostname:port

where hostname is the full name of the CLEAR server that you are logged on to, and port is the same port number that you used in starting tiny.  Note that, between hostname and port there is a colon character with no spaces.  If you don't know the specific name of the CLEAR server you are logged on to, it can be found by running the command

uname -n

You must use the full name of the CLEAR server you are logged on to, including the .clear.rice.edu part.

IMPORTANT:  As with the echo server in the previous lab, when you are done testing your tiny web server, be sure to stop the execution of the server.  This can be done either with the kill command or with a combination of the fg command and typing control-C.  Be sure not to leave your tiny server running in the background.


Process Concurrency Using Pre-Forking

In the last lab, we looked at the use of process concurrency using the fork-after-accept technique.  Process concurrency can also be created instead using the pre-forking technique, which is more efficient.

In particular, the overhead of forking a new process is relatively high.  This forking of a new process after accepting a connection will increase the latency of the response to the request and the time until the next request can be accepted.  A more efficient design is to fork before the accept.  Specifically, in this technique, you create a pool of processes that are all attempting to accept and process new requests.  Here is an example of this technique using the echo server from the previous lab:

#include "csapp.h"
#include "echo.h"       // Declares echo().

void
process(int listenfd)
{
        struct sockaddr_in clientaddr;
        socklen_t clientlen = sizeof(struct sockaddr_in);
        int connfd;

        while (true) {
                clientlen = sizeof(clientaddr);
                connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
                echo(connfd);
                Close(connfd);
        }
}

int
main(int argc, char **argv)
{
        int i, listenfd, numprocs, status;

        if (argc != 3) {
                fprintf(stderr, "usage: %s <port> <numprocs>\n", argv[0]);
                exit(1);
        }
        listenfd = Open_listenfd(argv[1]);
        numprocs = atoi(argv[2]);
        for (i = 0; i < numprocs; i++) {
                if (Fork() == 0)
                        process(listenfd);
        }
        while (true) {
                wait(&status);
                if (WIFEXITED(status) || WIFSIGNALED(status)) {
                        /*
                         * Start a new process to replace the terminated one.
                         */
                        if (Fork() == 0)
                                process(listenfd);
                }
        }
}

This works because accept() is an operating system kernel call, and internally, the Linux kernel forces the operation of accept() to be atomic. Here atomic means that, even if there are many different processes at the same time waiting to return from an accept() call on the same listening file descriptor, when a new connection is then made from some client, the kernel will pick only a single one of those processes to succeed and actually accept the connection.

The parent process in this model serves as a nanny for all of its children. Note that this application uses the capital letter versions of all system functions, and recall that these functions exit if an error is encountered. If any child terminates because of a function encountering an error, the nanny will fork another child, so there will always be a constant number of children. This makes the server more robust to failure. The nanny process should not be used for anything other than waiting on children and restarting them, to help prevent the nanny itself from failing.


Thread Concurrency

Process concurrency is not well suited for some applications. For example, if the application needs to share a large amount of information between each process, the overhead of the inter-process communication can become substantial. Sharing information between threads within a single process is easier in this case because the threads share the same entire address space of the process. In these types of applications, it would be better to use threads.

Each thread within a process executes independently, but all of the threads within the process share all of the memory and all file descriptors of the process. Support for threads on Unix is provided by the POSIX threads standard, or Pthreads for short.

The Pthreads function for creating a new thread is pthread_create. It has the following prototype:

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);

The four arguments to pthread_create are as follows:

As described above, the newly created thread begins execution with a call to the procedure at start_routine, passing it the single argument arg. If you want to pass more than one argument to that function, you should define a struct to contain the desired arguments; you should create and initialize that type of struct and then call pthread_create to have it pass the address of it as the single argument to the start_routine.

There is no asynchronous method for a thread to be notified of the completion of another thread like there is for processes with SIGCHLD. The only method for checking if another thread has finished is by polling with the pthread_join function. See man pthread_join for information on how to use this function.

If you want to have a thread that does not return a value, the thread must be marked as detached with the pthread_detach function. See man pthread_detach for information on how to use this function.

Here is an example of using Pthreads for an echo server:

#include "csapp.h"
#include "echo.h"       // Declares echo().

void    *start_routine(void *vargp);

int
main(int argc, char **argv)
{
        struct sockaddr_in clientaddr;
        socklen_t clientlen = sizeof(struct sockaddr_in);
        pthread_t tid;
        int *connfdp, listenfd;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <port>\n", argv[0]);
                exit(0);
        }
        listenfd = Open_listenfd(argv[1]);
        while (true) {
                connfdp = Malloc(sizeof(int));
                *connfdp = Accept(listenfd, (SA *)&clientaddr, &clientlen);
                Pthread_create(&tid, NULL, start_routine, connfdp);
        }
}

void *
start_routine(void *vargp)
{
        int connfd = *(int *)vargp;

        Pthread_detach(pthread_self());
        Free(vargp);
        echo(connfd);
        Close(connfd);
        return (NULL);
}

Note that even though only a single integer is being passed to the newly created thread, space for this integer is allocated using malloc rather than the integer being located on the stack. This design is needed because there is a subtle bug that can occur if the integer is on the stack of the calling thread. If the calling thread modifies the argument (either deliberately or accidentally) before the new thread is first scheduled and begins execution, then the argument may change values behind the new thread's back; depending on the timing, the new thread may see either the original or the modified value for this argument when it begins execution. This problem can occur very easily by accident if the argument is on the calling thread's stack, for example simply by the calling thread returning from the procedure in which it created the new thread. Due to this problem, the argument to a new thread should always be in private memory, such as that allocated with malloc.


Mutexes

A mutex is a type of variable that can be used to provide mutual exclusion for safely executing critical sections. A piece of code is said to be a critical section (or part of a critical section, for related pieces of code) if more than one thread can execute this code and in it access one or more variables that are shared between these threads. For correctness, the idea is to have a thread that is executing in the critical section exclude other threads from entering in the critical section at the same time. Specifically, mutual exclusion means that no more than one thread at a time can be executing in the critical section, and thus no more than one thread at a time can be accessing these shared variables.

When using mutexes in this way, a thread may enter the critical section only after acquiring the mutex lock protecting that critical section. When the thread exits the critical section, it must release the mutex. When the mutex has been acquired and is being held by some thread, any other thread attempting to execute the critical section must wait until the mutex is released and can be acquired by this thread. Mutexes have some similarities to binary semaphores, but importantly, unlike semaphores, only the thread that acquires the mutex can release it.

The typical use of a mutex to protect a critical section looks like this:

LOCK MUTEX
critical section begins
...
...
...
critical section ends
UNLOCK MUTEX

The Pthreads library supports the following functions for using mutexes:

Pthreads Mutex Example

This example program illustrates the use of a Pthreads mutex. The shared counter variable cnt is initialized to 0, and the program then creates two threads that each iterate NITERS number of times, incrementing the count in this variable that is shared between the two threads. The final value of cnt should thus be 2 × NITERS, but without proper synchronization, this might not be the result. The complete source code for this program is provided in the file echo-key+sor/goodcnt-mutex.c in your repository for this lab.

#define NITERS 10000000         /* Number of iterations. */
unsigned int cnt = 0;           /* Shared count variable. */
pthread_mutex_t mutex;          /* Shared pthread mutex. */

int
main(void)
{
        pthread_t tid1, tid2;

        /* Initialize pthread variables. */
        Pthread_mutex_init(&mutex, NULL); /* INITIALIZE MUTEX */

        /* Start threads. */
        Pthread_create(&tid1, NULL, count, NULL);
        Pthread_create(&tid2, NULL, count, NULL);

        /* Wait for threads to finish. */
        Pthread_join(tid1, NULL);
        Pthread_join(tid2, NULL);

        if (cnt != (unsigned)NITERS * 2)
                printf("BOOM! cnt = %d\n", cnt);
        else
                printf("OK cnt = %d\n", cnt);

        /* Clean up. */
        Pthread_mutex_destroy(&mutex); /* DESTROY MUTEX */

        return 0;
}

/* Each thread executes this function. */
void *
count(void *arg)
{
        int i;

        for (i = 0; i < NITERS; i++) {
                /* Acquire mutex lock. */
                Pthread_mutex_lock(&mutex); /* ACQUIRE LOCK */
                cnt++;
                /* Release mutex lock. */
                Pthread_mutex_unlock(&mutex); /* RELEASE LOCK */
        }
        return NULL;
}

Condition Variables

A condition variable allows threads to synchronize with each other based typically on the value of some shared data. Imagine the scenario below in which a thread has to perform some task but cannot do so until the value of a shared variable shared_val is not equal to 0 and can perform that task only when the value of that variable is still not equal to 0. Checking the value of this variable and performing the task must be done inside a critical section, as otherwise the value of the variable could change while this is being done. This means that the thread must acquire a mutex before checking the value of the variable and, if not equal to 0, must release the mutex only after performing the task.

Consider the following inefficient example with two threads:

Thread 1   Thread 2
while (true) {
        LOCK MUTEX
        if (shared_val != 1)
                UNLOCK MUTEX
        else
                break
}
perform task
...
...
UNLOCK MUTEX
LOCK MUTEX
...
...
shared_val = 1
UNLOCK MUTEX

Condition variables allow for efficient handling of such scenarios. A condition variable does not hold any value but instead represents a condition or an event. A thread can wait on a condition variable until the condition is satisfied or the event occurs. It is the responsibility of other threads to signal the condition variable when this happens.

So in the above example, waiting for the value of the shared variable to be equal to 1 can be represented by a condition variable. One or more threads wait on this condition variable. When another thread sets the value to be equal to 1 (or, alternatively, detects that this is so), it also signals the condition variable.

The example below is equivalent to the previous example, but it uses a condition variable efficiently to provide the synchronization for waiting on the event (note, however, the caveat in Exercise #2 below):

Thread 1   Thread 2
LOCK MUTEX
if (shared_val != 1)
        COND WAIT
perform task
...
...
UNLOCK MUTEX
LOCK MUTEX
...
...
shared_val = 1
COND SIGNAL
UNLOCK MUTEX

Note that condition variables are always used in conjunction with mutexes. A condition variable is never used by itself.

The Pthreads library supports the following functions for using condition variables:


Barriers

A barrier can be used to synchronize all running threads at a particular common point within the execution of the algorithm. This is most useful when several threads execute in parallel some step of the algorithm in which they modify some variables shared with other threads. A barrier at the end of this portion of the code blocks each thread at that point until all of them reach the corresponding barrier. Then all of the threads are allowed to continue and execute beyond that point. The example below shows how barriers can be used (assume here that all threads execute the same code in parallel, as shown below):

All threads 1 ... n
all threads execute code section 1 in parallel
....
BARRIER
     /* each thread is blocked here until all threads reach this barrier, and then all threads continue again */
...
all threads execute code section 2 in parallel
...
BARRIER
     /* each thread is blocked here until all threads reach this barrier, and then all threads continue again */
...

The Pthreads library supports the following functions for using barriers:


GitHub Repository for This Lab

To obtain your private repo for this lab, please point your browser to this link for the starter code for the lab. Follow the same steps as for previous labs and assignments to create your repository on GitHub and then to clone it onto CLEAR. The directory for your repository for this lab will be

lab-12-concurrency-name

where name is your GitHub userid.


Submission

Be sure to git push the appropriate C source files for this lab before 11:55 PM tonight, to get credit for this lab.