The programs described below simulate the execution of the following code,
where the types are: int A,B,C,D  int *P,*Q
 1.  A = 1;
 2.  B = 2;
 3.  C = 3;
 4.  P = &A;
 5.  Q = &C;
 6.  B = 4;
 7.  P = &B;
 8.  Q = P;
 9.  D = *Q;
The code is treated as running in a single thread.  A, B and P live in the
cache of a cpu other than the one running the thread; C, D and Q live in
the cache of the cpu running the thread.  Values of variables take some
time to pass from the remote cache to the thread's cache: call this
time 'DELAY'.

A read barrier is implemented between lines 7 and 8 using function
rb().  That function returns 1 if the updated value of P is not yet
available to the thread.  When it returns 1, the program counter does
not advance (count-- cancels the increment), which simulates a wait
at the barrier.
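
The stall can be sketched as below.  This is a minimal illustration, not
the code of the programs themselves: only the name rb() and the count--
idiom come from the description above.  The variable p_ready_cycle, the
function stalled_cycles(), the cycle argument to rb(), and the rule that a
value becomes visible on the cycle after it lands are all assumptions made
for the sketch.

```c
/* Hypothetical bookkeeping: the cycle at the end of which the new P lands. */
int p_ready_cycle = 0;

/* Returns 1 while the updated value of P is not yet available. */
int rb(int cycle)
{
    return cycle <= p_ready_cycle;
}

/* Runs lines 7 and 8 with a read barrier in front of line 8 and
 * returns the number of cycles the thread spent waiting. */
int stalled_cycles(int delay)
{
    int stalls = 0;
    int cycle = 6;                      /* lines 1..6 already executed */

    for (int count = 7; count <= 8; count++) {
        cycle++;
        if (count == 7) {
            p_ready_cycle = cycle + delay;   /* line 7: P = &B is issued */
        } else if (rb(cycle)) {
            stalls++;
            count--;    /* undo the loop's count++ so line 8 is retried */
        }
    }
    return stalls;
}
```

Under these assumptions the thread waits for exactly `delay` cycles before
line 8 is allowed to run.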

A write barrier is implemented between lines 6 and 7 using function
wb().  It operates like rb().

Functions setP and setB are used to temporarily record new values of 
P and B and the (clock) cycle on which these become available.
Functions runP and runB set the new values from the temporary values 
when the appropriate (clock) cycle is reached.
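
One possible shape for such a pair of functions is sketched here for P.
The names setP and runP are taken from the text; their signatures, the
pending_P variables and the use of -1 as a "nothing pending" marker are
assumptions, so the real programs may differ.

```c
int A = 1, B = 2;
int *P = &A;                  /* value currently visible to the thread */

/* Hypothetical record of a store that is still in flight. */
int *pending_P;
int  pending_P_cycle = -1;    /* -1 means nothing is pending */

/* Record the new value of P and the cycle on which it becomes available. */
void setP(int *value, int cycle, int delay)
{
    pending_P = value;
    pending_P_cycle = cycle + delay;
}

/* Called once per clock cycle: install the new value when its time comes. */
void runP(int cycle)
{
    if (pending_P_cycle >= 0 && cycle >= pending_P_cycle) {
        P = pending_P;
        pending_P_cycle = -1;
    }
}
```

For example, after setP(&B, 7, 2) a call to runP(8) leaves P pointing at A,
while runP(9) switches it to B.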

01-barrier:
  Code is separated over two threads.  There are no barriers and 
  cache delays are simulated.
  run it like this:  01-barrier <number> where 
  <number> is a value between 0 and 7.  Run it without an
  argument for the meaning of the numbers.

     results
     -------
     D = 1 or D = 4  /* statements are all run in order (number = 0) */

     D = 1           /* statements in order, delay at cpu1 (number = 1) */

     D = 1 or D = 4  /* statements in order, delay at cpu2 (number = 2) */

     D = 1           /* statements in order, delays at cpu1/2 (number = 3) */

     D = 1 or D = 4  /* cpu1 statements out of order, no delays (number = 4) */

     D = 1 or D = 2  /* cpu1 statements out of order, delay at cpu1 (number = 5) */

     D = 1 or D = 4  /* cpu1 statements out of order, delay at cpu2 (number = 6) */

     D = 1 or D = 4  /* cpu1 statements out of order, delays at cpu1/2 (number = 7) */


11-barrier:
  Only the effect of delays on P is accounted for.
  There are no barriers.
  run it like this:  11-barrier DELAY
     results
     -------
     DELAY=0  D=4
     DELAY=1  D=1
     DELAY=2  D=1
     DELAY=3  D=1
     ...

12-barrier:
  Only the effect of delays on P is accounted for.
  There is a read barrier between lines 7 and 8.
  run it like this:  12-barrier DELAY
     results
     -------
     DELAY=0  D=4
     DELAY=1  D=4
     DELAY=2  D=4
     ...
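
The pattern in the 11-barrier and 12-barrier tables can be reproduced with
a compact sketch like the one below.  It is not the source of either
program.  The function simulate(), its parameters, and the timing model
(one statement per cycle, with a delayed store delivered at the end of the
cycle on which it becomes ready) are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Simulates lines 1..9 with a delayed store to P only.
 * use_rb != 0 places a read barrier in front of line 8.  Returns D. */
int simulate(int delay, int use_rb)
{
    int A = 1, B = 2, C = 3, D = 0;
    int *P = &A, *Q = &C;       /* state after lines 1..5 */
    int *new_P = NULL;          /* store to P still in flight */
    int ready = -1;             /* cycle on which new_P lands; -1 = none */
    int cycle = 5;

    for (int count = 6; count <= 9; count++) {
        cycle++;
        switch (count) {
        case 6: B = 4; break;
        case 7: new_P = &B; ready = cycle + delay; break;
        case 8:
            if (use_rb && ready >= 0) {
                count--;        /* barrier: stay on line 8 this cycle */
                break;
            }
            Q = P;
            break;
        case 9: D = *Q; break;
        }
        /* End of cycle: deliver the store if its time has come. */
        if (ready >= 0 && cycle >= ready) {
            P = new_P;
            ready = -1;
        }
    }
    return D;
}
```

With these assumptions simulate(d, 0) returns 4 for d = 0 and 1 for
d = 1, 2, 3, and simulate(d, 1) returns 4 for each of those delays.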

13-barrier:
   Only the effect of delays on B is accounted for.
   There are no barriers.
   run it like this: 13-barrier DELAY
     results
     -------
     DELAY=0  D=4
     DELAY=1  D=4
     DELAY=2  D=4
     DELAY=3  D=2
     ...

14-barrier:
   Only the effect of delays on B is accounted for.
   There is a write barrier between lines 6 and 7.
   run it like this: 14-barrier DELAY
     results
     -------
     DELAY=0  D=4
     DELAY=1  D=4
     DELAY=2  D=4
     DELAY=3  D=4
     ...
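
The write-barrier case can be sketched in the same spirit.  Again this is
not the source of 13-barrier or 14-barrier: simulate_b(), its parameters
and the timing model (one statement per cycle, delayed store to B delivered
at the end of the cycle on which it becomes ready) are assumptions.

```c
/* Simulates lines 1..9 with a delayed store to B only.
 * use_wb != 0 places a write barrier in front of line 7.  Returns D. */
int simulate_b(int delay, int use_wb)
{
    int A = 1, B = 2, C = 3, D = 0;
    int *P = &A, *Q = &C;       /* state after lines 1..5 */
    int new_B = 0;              /* store to B still in flight */
    int ready = -1;             /* cycle on which new_B lands; -1 = none */
    int cycle = 5;

    for (int count = 6; count <= 9; count++) {
        cycle++;
        switch (count) {
        case 6: new_B = 4; ready = cycle + delay; break;
        case 7:
            if (use_wb && ready >= 0) {
                count--;        /* barrier: B = 4 must land before P = &B */
                break;
            }
            P = &B;
            break;
        case 8: Q = P; break;
        case 9: D = *Q; break;
        }
        /* End of cycle: deliver the store if its time has come. */
        if (ready >= 0 && cycle >= ready) {
            B = new_B;
            ready = -1;
        }
    }
    return D;
}
```

Under this model simulate_b(3, 0) returns the stale value 2, while
simulate_b(3, 1) returns 4 because line 7 waits until B has been updated.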

15-barrier:
   The effects of delays on both B and P are accounted for.
   There are no barriers.
   run it like this: 15-barrier DELAY
     results
     -------
     DELAY=0  D=4
     DELAY=1  D=1
     DELAY=2  D=1
     DELAY=3  D=1
     DELAY=4  crash - must fix

16-barrier:
   The effects of delays on both B and P are accounted for.
   There are read and write barriers.
   run it like this: 16-barrier DELAY
     results
     -------
     DELAY=0  D=4
     DELAY=1  D=4
     DELAY=2  D=4
     DELAY=3  D=4
     ...