Pipelining allows the stages of various instructions to be executed in parallel.
In `gcc`, the `__builtin_expect()` function allows the program to give the compiler hints about which way a branch is likely to go.

State | Observed | Generated | Next State |
---|---|---|---|
Valid | PrRd | ~ | Valid |
Valid | PrWr | BusWr | Valid |
Valid | BusWr | ~ | Invalid |
Invalid | PrWr | BusWr | Valid |
Invalid | PrRd | BusRd | Valid |

State | Observed | Generated | Next State |
---|---|---|---|
Modified | PrRd | ~ | Modified |
Modified | PrWr | ~ | Modified |
Modified | BusRd | BusWB | Shared |
Modified | BusRdX | BusWB | Invalid |
Shared | PrRd | ~ | Shared |
Shared | BusRd | ~ | Shared |
Shared | BusRdX | ~ | Invalid |
Shared | PrWr | BusRdX | Modified |
Invalid | PrRd | BusRd | Shared |
Invalid | PrWr | BusRdX | Modified |
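
The Modified/Shared/Invalid table above can be read as a small state machine. The following is a minimal sketch of that reading in C, not a model of any real cache controller. The type and function names (`msi_state`, `msi_event`, `msi_action`, `msi_step`) are hypothetical, and the handling of bus traffic seen in the Invalid state is an assumption, since the table does not list those rows.

```c
/* Cache line states from the table above. */
typedef enum { MODIFIED, SHARED, INVALID } msi_state;

/* Observed events: processor requests and snooped bus traffic. */
typedef enum { PR_RD, PR_WR, BUS_RD, BUS_RDX } msi_event;

/* Bus actions generated in response (ACT_NONE corresponds to "~"). */
typedef enum { ACT_NONE, ACT_BUS_RD, ACT_BUS_RDX, ACT_BUS_WB } msi_action;

typedef struct {
    msi_state next;
    msi_action action;
} msi_result;

/* Hypothetical helper: look up one row of the table. */
msi_result msi_step(msi_state state, msi_event event) {
    msi_result r = { state, ACT_NONE };
    switch (state) {
    case MODIFIED:
        /* Local reads and writes hit; remote requests force a write-back. */
        if (event == BUS_RD)  { r.next = SHARED;  r.action = ACT_BUS_WB; }
        if (event == BUS_RDX) { r.next = INVALID; r.action = ACT_BUS_WB; }
        break;
    case SHARED:
        /* A local write must claim exclusive ownership first. */
        if (event == PR_WR)   { r.next = MODIFIED; r.action = ACT_BUS_RDX; }
        if (event == BUS_RDX) { r.next = INVALID; }
        break;
    case INVALID:
        if (event == PR_RD)   { r.next = SHARED;   r.action = ACT_BUS_RD; }
        if (event == PR_WR)   { r.next = MODIFIED; r.action = ACT_BUS_RDX; }
        /* Assumption: BusRd/BusRdX seen while Invalid are ignored. */
        break;
    }
    return r;
}
```

For example, `msi_step(SHARED, PR_WR)` yields the Modified state with a generated BusRdX, matching the `Shared | PrWr | BusRdX | Modified |` row.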

The `volatile` keyword provides the following features:

- Use of variables across `setjmp` and `longjmp`.
- Use of `sig_atomic_t` variables in signal handlers.

The `volatile` keyword does not prevent reordering of instructions.

`pthreads` fix the issues of `clone` and provide a uniform interface for most systems.

See Lecture 6 - Working with Threads for pthreads.

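
Returning to the `volatile` points above, the signal-handler use case can be sketched as follows. This is a minimal illustration, assuming a POSIX-like system where `raise()` runs the installed handler before returning. The names `got_signal`, `on_signal`, and `install_and_raise` are hypothetical.

```c
#include <signal.h>

/* Flag shared between the handler and normal code. volatile forces the
 * compiler to re-read it from memory; sig_atomic_t makes each access
 * a single, uninterruptible read or write. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: only sets the flag, which is safe inside a signal handler. */
static void on_signal(int signo) {
    (void)signo;
    got_signal = 1;
}

/* Hypothetical helper: install the handler, deliver SIGINT to ourselves,
 * and report whether the flag was set. Returns -1 on failure. */
int install_and_raise(void) {
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)
        return -1;
    signal(SIGINT, SIG_DFL);  /* restore the default disposition */
    return got_signal;
}
```

Note that `volatile` here only guarantees the flag is actually re-read. It does not order other memory accesses around it, which is why it is not a substitute for locks or atomics between threads.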
~ | Read 2nd | Write 2nd |
---|---|---|
Read 1st | RAR - No Dependency | WAR - Antidependency |
Write 1st | RAW - True Dependency | WAW - Output Dependency |
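
The four cells of the table above can each be illustrated with a pair of statements touching the same variable `x`. This is a minimal sketch; the function names are hypothetical and the values are chosen only to make the ordering constraints visible.

```c
/* RAR (no dependency): two reads of x can run in either order. */
int rar_example(int x) {
    int y = x + 1;  /* read x */
    int z = x + 2;  /* read x */
    return y + z;
}

/* RAW (true dependency): the second statement reads what the first wrote. */
int raw_example(int a) {
    int x = a + 1;  /* write x */
    int y = x * 2;  /* read x: must come after the write */
    return y;
}

/* WAR (antidependency): the second statement overwrites what the first read. */
int war_example(int x) {
    int y = x + 1;  /* read x */
    x = 10;         /* write x: must not move above the read */
    return y + x;
}

/* WAW (output dependency): both statements write x; the last one must win. */
int waw_example(void) {
    int x;
    x = 1;          /* write x */
    x = 2;          /* write x: final value must come from here */
    return x;
}
```

Only RAW is a dependency on data flowing between the statements. WAR and WAW arise from reusing the same name, so renaming the variable in the second statement removes them.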

See Lecture 8 - Asynchronous I/O for cURL.

See Lecture 9 - Of Asgard Hel for Valgrind.

If threads call `sleep()` frequently, then the threads in the lock convoy have an increased probability of making progress.

If `notify()` is used to wake a single thread instead of all the threads, it is possible that the `notify()` becomes lost.

The `gcc` flags `-fdump-tree-gimple` and `-fdump-tree-all` can be used to see all the three-address code.

`restrict` Qualifier

The `restrict` qualifier on a pointer `p` tells the compiler that it may assume that, in the scope of `p`, the program will not use any other pointer `q` to access the data at `*p`.

The `restrict` qualifier allows a compiler to optimize code, especially critical loops, better.

- `icc`: Intel C Compiler.
- `cc`: Solaris Studio Compiler.
- `gcc`: GNU C Compiler - Graphite.
- `clang`: Clang Compiler - Polly.

```c
for (i = 0; i < 1000; i++)
    x[i] = i + 3;
```

```c
for (i = 0; i < 100; i++)
    for (j = 0; j < 100; j++)
        x[i][j] = x[i][j] + y[i - 1][j];
```

```c
for (i = 0; i < 10; i++)
    x[2 * i + 1] = x[2 * i];
```

```c
for (j = 0; j <= 10; j++)
    if (j > 5) x[i] = i + 3;
```

```c
for (i = 0; i < 100; i++)
    for (j = i; j < 100; j++)
        x[i][j] = 5;
```
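
Returning to the `restrict` qualifier described above, a loop like the ones just shown is where it tends to matter. The following is a minimal sketch, assuming a C99 compiler. The function name `add_arrays` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Each pointer is restrict-qualified, so the compiler may assume that
 * dst, a, and b never refer to overlapping memory. Without that promise
 * it must allow for a store to dst[i] changing a later a[j] or b[j],
 * which can block vectorization of the loop. */
void add_arrays(size_t n, float *restrict dst,
                const float *restrict a, const float *restrict b) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The qualifier is a promise made by the programmer, not something the compiler checks. Calling `add_arrays` with overlapping arrays is undefined behavior.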
See Lecture 13 - OpenMP for OpenMP.
See Lecture 14 - OpenMP Tasks for OpenMP Tasks.
```c
#pragma omp flush [(list)]
```
See Lecture 15 - Memory Consistency.

- `mfence`: All loads and stores before the barrier become visible before any loads and stores after the barrier become visible.
- `sfence`: All stores before the barrier become visible before all stores after the barrier become visible.
- `lfence`: All loads before the barrier become visible before all loads after the barrier become visible.

See Lectures 17 to 36 for Post-Midterm Content; School Closure b/c Pandemic.
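
The barrier descriptions above refer to x86 instructions. As a portable stand-in, the following sketch uses C11 `atomic_thread_fence` from `<stdatomic.h>` rather than issuing `mfence`, `sfence`, or `lfence` directly, so it shows the same ordering idea and not those exact instructions. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                 /* ordinary, non-atomic data */
static atomic_bool ready = false;   /* flag that announces the data */

/* Writer: the release fence keeps the store to payload from being
 * reordered after the store to the flag. */
void publish(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: the acquire fence keeps the load of payload from being
 * reordered before the load of the flag. Returns false if nothing
 * has been published yet. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

In intended use, one thread calls `publish` and another polls `try_consume`. If the reader sees the flag set, the paired fences guarantee it also sees the published value.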