PowerPC405 CMP

Description

This shared-memory CMP simulator models several PowerPC 405 32-bit RISC CPUs, each with a scalar 5-stage pipeline and separate instruction and data caches. All the data and instruction caches are attached to a system bus, which is connected to the main memory.

The instruction cache, which does not appear in the figure, is embedded within the PowerPC 405 CPU module.

This simulator can run benchmarks that use pthreads, with at most as many running threads as there are CPUs in the simulator. Bus snooping on the system bus is used to implement cache coherency on shared data.

The whole memory is fully shared, as in a threaded application.

How to get the simulator

This simulator is available at https://unisim.org/svn/public/components/CycleLevel/simulators/ppc-pthread.

Compiling it requires downloading the whole public branch of the repository, which contains all the components that make up the simulator. More information can be found on the Public Branch page.

Simulating pthread benchmarks

Dealing with POSIX threads at the simulator level means that the simulator has to take care of the complex thread scheduling (including thread migration), as the operating system would. Since implementing a whole Linux scheduler would be a huge task for a user-level simulator, we decided to restrict ourselves to a subclass of pthread benchmarks. Benchmarks are restricted to the part of the standard pthread interface consisting of:

  • pthread_create and pthread_exit to start and terminate a thread.
  • pthread_self to get the current thread id.
  • pthread_join to wait for another thread to terminate.
  • pthread_mutex_init and pthread_mutex_destroy to define and release a new mutual exclusion lock.
  • pthread_mutex_lock and pthread_mutex_unlock to use a previously defined mutual exclusion lock.

The number of running threads must also not exceed the number of simulated processors, so that no complex thread scheduling has to be implemented.
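As an illustration of a benchmark that stays within this subset, here is a minimal sketch. The names `Shared`, `worker` and `run_counter_demo` are hypothetical and not part of UNISIM. It is written against the native <pthread.h> so it can be checked on a host machine. It assumes that the simulator is configured with enough CPUs for the main thread and every worker.

```cpp
#include <cstddef>
#include <pthread.h>  // on the simulator: <unisim/simulated_pthread.h>

// Shared state: one counter protected by one mutex.
struct Shared {
    pthread_mutex_t lock;
    long counter;
    int iterations;
};

// Worker thread: increments the shared counter `iterations` times,
// taking the lock around every increment.
static void *worker(void *arg)
{
    Shared *s = static_cast<Shared *>(arg);
    for (int i = 0; i < s->iterations; ++i) {
        pthread_mutex_lock(&s->lock);
        s->counter++;
        pthread_mutex_unlock(&s->lock);
    }
    pthread_exit(NULL);
    return NULL;  // not reached
}

// Spawns `nthreads` workers, joins them all, and returns the final counter.
// Returns -1 if more workers are requested than this sketch supports.
long run_counter_demo(int nthreads, int iterations)
{
    const int kMaxThreads = 8;  // arbitrary bound chosen for the sketch
    if (nthreads < 0 || nthreads > kMaxThreads)
        return -1;

    Shared s;
    s.counter = 0;
    s.iterations = iterations;
    pthread_mutex_init(&s.lock, NULL);

    pthread_t tid[kMaxThreads];
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&tid[i], NULL, worker, &s);
    for (int i = 0; i < nthreads; ++i)
        pthread_join(tid[i], NULL);

    pthread_mutex_destroy(&s.lock);
    return s.counter;
}
```

Calling run_counter_demo(2, 1000) uses three threads in total (the main thread plus two workers), so under the restriction above it would need a simulator with at least three CPUs.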

To explain how the simulator reacts to a pthread benchmark, let’s consider the following example:

  void *child(void *arg)
  {
    /* ... child thread code ... */
    pthread_exit(NULL);
  }
  
  int main(int argc, char **argv)
  {
    pthread_t t;  /* id of the child thread */
    /* ... initialization ... */
    pthread_create(&t, NULL, child, NULL);
    /* ... some other computation ... */
    pthread_join(t, NULL);
    pthread_exit(NULL);
    return 0;
  }

The expected behavior is that the main thread does some initialization and then spawns a child thread. The main thread and the child thread then run in parallel. At the end of the main thread, it waits for the child thread to finish if it has not already done so.

How pthreads behave in the simulator

The simulated benchmark is started on the first CPU, with all the remaining CPUs asleep. A sleeping CPU is dormant: it does not react to incoming signals and does not try to fetch instructions.

Execution remains on this CPU until it reaches a pthread_create, which should spawn a new thread. At that time one of the sleeping CPUs is initialized and woken up with its PC set to the function associated with the thread.
In the figure below, the main thread spawns a new child thread.

From this point on, both threads execute independently, each on its own CPU and with its own stack address, until some synchronization is required by a pthread_* call.

Upon a pthread_join on an active thread, the CPU running the calling thread falls asleep, waiting for the target thread to terminate.
In the figure below, the main thread joins its child thread and goes to sleep until the child thread terminates.

Upon a pthread_exit, the thread terminates: its CPU goes back to sleep, and all the threads waiting for this particular thread to terminate are woken up.
In the figure below, the child thread terminates, putting its own CPU back to sleep and waking up the joining main thread.
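The sleep and wake-up rules for pthread_create, pthread_join and pthread_exit described above can be summarized as a small state machine. The sketch below is a simplified model written for illustration only, not the simulator's actual code: `CpuState`, `Cpu` and `Machine` are hypothetical names, the CPU index doubles as the thread id, and a CPU is assumed to host at most one thread during a run (the text above does not say whether CPUs are reused).

```cpp
#include <vector>

enum class CpuState { Asleep, Running, WaitingJoin };

struct Cpu {
    CpuState state = CpuState::Asleep;
    int join_target = -1;  // CPU waited on while WaitingJoin, else -1
    bool used = false;     // assumption: a CPU hosts at most one thread
};

struct Machine {
    std::vector<Cpu> cpus;

    // The benchmark starts on CPU 0; every other CPU is asleep.
    // Requires n >= 1.
    explicit Machine(int n) : cpus(n) {
        cpus[0].state = CpuState::Running;
        cpus[0].used = true;
    }

    // pthread_create: wake up one sleeping CPU. Returns its index
    // (used here as the thread id), or -1 if no CPU is free.
    int create() {
        for (int i = 0; i < static_cast<int>(cpus.size()); ++i) {
            if (cpus[i].state == CpuState::Asleep && !cpus[i].used) {
                cpus[i].state = CpuState::Running;
                cpus[i].used = true;
                return i;
            }
        }
        return -1;
    }

    // pthread_join: the caller falls asleep only if the target is active.
    void join(int caller, int target) {
        if (cpus[target].state != CpuState::Asleep) {
            cpus[caller].state = CpuState::WaitingJoin;
            cpus[caller].join_target = target;
        }
    }

    // pthread_exit: put this CPU to sleep and wake up every joiner.
    void exit_thread(int cpu) {
        cpus[cpu].state = CpuState::Asleep;
        for (Cpu &c : cpus) {
            if (c.state == CpuState::WaitingJoin && c.join_target == cpu) {
                c.state = CpuState::Running;
                c.join_target = -1;
            }
        }
    }
};
```

With Machine m(2), the sequence m.create(), m.join(0, 1), m.exit_thread(1) walks through the three figures: CPU 1 wakes up, CPU 0 goes to sleep in the join, and CPU 0 runs again once CPU 1 has exited.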

Upon a pthread_mutex_lock, if the shared lock is unlocked, it atomically becomes locked, and execution of the locking thread carries on.
In the figure below, the main thread requests an available lock.

A pthread_mutex_lock may also try to acquire a lock that is not available. In that case, execution of the thread that requested the lock does not carry on: the processor simulating this thread re-executes the pthread_mutex_lock in the following cycles until it obtains the lock.
In the figure below, the child thread is trying to acquire a lock which is owned by the main thread. As it is not able to get this lock, the child thread is stuck at the instruction asking for the lock.

A lock is released by a pthread_mutex_unlock. When a lock is released, all the threads that are waiting for it, looping on pthread_mutex_lock, have an opportunity to acquire it.
In the figure below, the main thread releases a lock which is needed by the child thread. After the release, the child thread obtains the lock.
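The retry behavior of pthread_mutex_lock can be pictured with the following cycle-by-cycle sketch. It is a single-threaded toy model under stated assumptions, not the simulator's implementation: `SimMutex`, `sim_lock_attempt`, `sim_unlock` and `cycles_until_acquired` are hypothetical names, the waiting CPU is assumed to retry exactly once per cycle, and an unlock is assumed to take effect before the retry of the same cycle.

```cpp
// Hypothetical model of a simulated mutex: whether it is held, and by whom.
struct SimMutex {
    bool locked;
    int owner;  // CPU index of the holder, -1 when the lock is free
};

// One execution of pthread_mutex_lock on `cpu`.
// Returns true if the lock was taken (execution carries on),
// false if the instruction has to be replayed in a later cycle.
bool sim_lock_attempt(SimMutex &m, int cpu)
{
    if (m.locked)
        return false;
    m.locked = true;
    m.owner = cpu;
    return true;
}

// pthread_mutex_unlock: release the lock so that waiters can retry.
void sim_unlock(SimMutex &m)
{
    m.locked = false;
    m.owner = -1;
}

// Replays the lock attempt of `waiter` once per cycle, while the current
// owner releases the lock at cycle `release_cycle`. Returns the cycle in
// which `waiter` acquires the lock, or -1 if that does not happen within
// `max_cycles` cycles.
int cycles_until_acquired(SimMutex &m, int waiter,
                          int release_cycle, int max_cycles)
{
    for (int cycle = 0; cycle < max_cycles; ++cycle) {
        if (cycle == release_cycle)
            sim_unlock(m);
        if (sim_lock_attempt(m, waiter))
            return cycle;
    }
    return -1;
}
```

In this model the waiting CPU makes no forward progress while the lock is held, which matches the description above of a thread stuck at the instruction asking for the lock.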

Compiling pthread benchmarks for the Simulator

The Cross Compilers page of the Tools section describes how to build a PowerPC 405 cross compiler. Such a compiler can be used to build binaries that use POSIX threads.

Because the simulator does not handle pthreads as a real operating system would, and instead associates each running thread with one active processor, the benchmark should not be linked with the POSIX thread library at compilation time. It should be linked with a dedicated library that defines the same function prototypes and allows the simulator to handle the pthread calls.

The following example details the changes required at the source code level:

The initial main.cpp file:

#include <pthread.h>

void *child(void *arg)
{
  /* ... child thread code ... */
  pthread_exit(NULL);
}

int main(int argc, char **argv)
{
  pthread_t t;  /* id of the child thread */
  /* ... initialization ... */
  pthread_create(&t, NULL, child, NULL);
  /* ... some other computation ... */
  pthread_join(t, NULL);
  pthread_exit(NULL);
  return 0;
}

Compiled with the following command line:

g++ -o main main.cpp -lpthread

Should be replaced by the following source code:

#include <unisim/simulated_pthread.h>

void *child(void *arg)
{
  /* ... child thread code ... */
  pthread_exit(NULL);
}

int main(int argc, char **argv)
{
  pthread_t t;  /* id of the child thread */
  /* ... initialization ... */
  pthread_create(&t, NULL, child, NULL);
  /* ... some other computation ... */
  pthread_join(t, NULL);
  pthread_exit(NULL);
  return 0;
}

And compiled with the following command line:

g++ -o main -I/where/unisim/is/installed/include main.cpp

As you can see, the changes only consist of:

  • replacing the pthread #include with a dedicated one
  • removing the link against the pthread library
  • adding the path where the #include can be found.

The new include path can be automatically probed using unisim_compiler -i as shown below:

g++ -o main -I`unisim_compiler -i` main.cpp
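If the same source file has to build both natively and for the simulator, one possible approach is to select the header with a preprocessor macro. The sketch below assumes a macro named USE_UNISIM_PTHREAD, which is a hypothetical name chosen for this example and is not defined by UNISIM. The function `run_child_once` is likewise illustrative.

```cpp
#include <cstddef>

// USE_UNISIM_PTHREAD is a hypothetical macro name chosen for this sketch.
#ifdef USE_UNISIM_PTHREAD
#include <unisim/simulated_pthread.h>  // simulator build, no -lpthread
#else
#include <pthread.h>                   // native build, linked with -lpthread
#endif

// Child thread: writes a value through `arg` so the parent can read it.
static void *child(void *arg)
{
    *static_cast<int *>(arg) = 42;
    pthread_exit(NULL);
    return NULL;  // not reached
}

// Spawns one child, waits for it, and returns the value it wrote,
// or -1 if the thread could not be created.
int run_child_once()
{
    int result = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, child, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;
}
```

Under that assumption, the native build keeps the original command line (g++ -o main main.cpp -lpthread), and the simulator build adds -DUSE_UNISIM_PTHREAD to the command line shown above.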

Modules composing the simulator

Services used by the simulator

  • elf32-loader
  • linuxos
  • gdb-server debugger
  • inline debugger
 
simulators/cycle/ppc-pthread.txt · Last modified: 2007/08/28 18:34 by girbal