OpenMP Directives Explained


Here is an example of how a C compiler might transform an
omp-parallel directive into multithreaded code. (This
example doesn't show how the compiler would handle data,
such as private and shared variables.)


// Original Code                   // Transformed Code
{                                  {
   // single threaded code            // single threaded code (master thread)

   #pragma omp parallel               // fork n-1 threads using the
   {                                  // thread function defined below
      // structured block             for ( _i = 1; _i <= _n-1; _i++ )
   }                                     _ompc_fork( &_ompc_region_fn1, ... );
                                      _ompc_region_fn1(...); // master thread is thread 0
   // single threaded code            _ompc_join();
}
                                      // single threaded code  (master thread)
                                   }

                                   // turn the "structured block" into
                                   // the body of a thread function
                                   void _ompc_region_fn1(...)
                                   {
                                      // structured block
                                   }
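
The fork/join pattern above can be sketched on top of POSIX threads. This is
only an illustration, not how any particular compiler does it: the names
ompc_fork(), ompc_join(), run_parallel_region(), the trampoline helper, and the
fixed MAX_THREADS limit are all assumptions made for this sketch (the leading
underscores are dropped because such names are reserved in C), and the
"structured block" is reduced to each thread marking its own slot in a shared
array.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64            /* assumed upper bound on the team size */

typedef void (*region_fn)(int tid, void *shared);

struct fork_arg { region_fn fn; int tid; void *shared; };

static pthread_t       workers[MAX_THREADS];
static struct fork_arg fork_args[MAX_THREADS];
static int             n_forked = 0;

/* Adapts the pthread entry-point signature to the outlined function. */
static void *trampoline(void *p)
{
    struct fork_arg *a = p;
    a->fn(a->tid, a->shared);
    return NULL;
}

/* Hypothetical counterpart of _ompc_fork(): start one worker thread. */
static int ompc_fork(region_fn fn, int tid, void *shared)
{
    assert(n_forked < MAX_THREADS);
    struct fork_arg *a = &fork_args[n_forked];
    a->fn = fn;
    a->tid = tid;
    a->shared = shared;
    int rc = pthread_create(&workers[n_forked], NULL, trampoline, a);
    if (rc == 0)
        n_forked++;
    return rc;
}

/* Hypothetical counterpart of _ompc_join(): wait for every forked worker. */
static void ompc_join(void)
{
    for (int i = 0; i < n_forked; i++)
        pthread_join(workers[i], NULL);
    n_forked = 0;
}

/* The "structured block", outlined into a thread function. Each thread
 * writes a distinct array element, so there is no data race. */
static void region_fn1(int tid, void *shared)
{
    int *visited = shared;
    visited[tid] = 1;
}

/* The transformed parallel region for a team of n threads. */
static void run_parallel_region(int n, int *visited)
{
    for (int i = 1; i <= n - 1; i++)      /* fork n-1 workers */
        ompc_fork(region_fn1, i, visited);
    region_fn1(0, visited);               /* master thread is thread 0 */
    ompc_join();
}
```

Calling run_parallel_region(4, visited) with a zeroed four-element array should
leave every element set to 1 once the call returns, since ompc_join() does not
return until all workers have finished. A real runtime would typically keep a
pool of threads alive between regions instead of creating them each time.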

==================================================================================

Here is an example of how a C compiler might transform an omp-for worksharing
directive with static scheduling. Notice that the heart of this transformation
is the function _ompc_static(), which computes the loop limits for each thread
based on the thread's id number and the total number of threads.
(How might you implement this function?)


// Original Code                   // Transformed Code
#pragma omp for                    int _tid = _ompc_get_thread_num();
for( i = a; i < b; i++ )           int _lower = a, _upper = b-1;
{                                  _ompc_static( _tid, &_lower, &_upper );
  // loop body                     for( _i = _lower; _i <= _upper; _i++ )
}                                  {
                                     // loop body
                                   }
                                   _ompc_barrier();
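
One possible answer to the question above, as a sketch only. It assumes the
bounds are inclusive (matching the initialization _upper = b-1), that the
iterations are split into contiguous blocks of nearly equal size, and that the
thread count is passed in as an extra parameter, nthreads, which the
three-argument call in the listing would have to obtain from the runtime. The
name ompc_static() (without the reserved leading underscore) is likewise just
for this illustration.

```c
#include <assert.h>

/* Hypothetical version of _ompc_static().
 * On entry, *lower..*upper is the whole loop's inclusive iteration range.
 * On return, it is the calling thread's inclusive share of that range.
 * A thread that receives no iterations ends up with *lower > *upper. */
static void ompc_static(int tid, int nthreads, int *lower, int *upper)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);

    int total = *upper - *lower + 1;       /* number of iterations */
    if (total < 0)
        total = 0;

    int base  = total / nthreads;          /* every thread gets at least this many */
    int extra = total % nthreads;          /* the first `extra` threads get one more */

    /* Threads before this one own tid*base iterations plus one extra
     * iteration for each of them that has an id below `extra`. */
    int start = *lower + tid * base + (tid < extra ? tid : extra);
    int count = base + (tid < extra ? 1 : 0);

    *lower = start;
    *upper = start + count - 1;
}
```

For a = 0, b = 10 and four threads, this hands out the ranges 0..2, 3..5, 6..7
and 8..9. Notice that the function reads nothing but its own arguments, so no
thread ever needs to communicate with another to find its share.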


==================================================================================

Here is an example of how a C compiler might transform an omp-sections
worksharing directive. Notice that the compiler transforms the sections
directive into an equivalent omp-for worksharing directive, which would
then itself be transformed in a manner similar to the above example.


// Original Code                 // Transformed Code
#pragma omp sections             #pragma omp for schedule(static, 1)
{                                for( _i = 0; _i < 3; _i++)
   #pragma omp section           {
   {                                switch(_i)
      // code fragment 1            {
   }                                case 0:
   #pragma omp section                 // code fragment 1
   {                                   break;
      // code fragment 2            case 1:
   }                                   // code fragment 2
   #pragma omp section                 break;
   {                                case 2:
      // code fragment 3               // code fragment 3
   }                                   break;
}                                   }
                                 }
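
To see what the second transformation step might produce, here is a sketch of
the loop above after the schedule(static, 1) clause has been lowered as well.
With a chunk size of 1, chunks are dealt out round-robin by thread number, so
thread tid takes section indices tid, tid + nthreads, and so on. The function
name run_sections(), its parameters, and the ran[] array (which stands in for
the code fragments by recording which thread executed each section) are
assumptions made for this illustration.

```c
/* Hypothetical lowering of the sections loop for one thread of the team.
 * ran[i] is set to the id of the thread that executed section i. */
static void run_sections(int tid, int nthreads, int ran[3])
{
    /* schedule(static, 1): iteration i belongs to thread i % nthreads */
    for (int i = tid; i < 3; i += nthreads) {
        switch (i) {
        case 0:
            ran[0] = tid;   /* stands in for code fragment 1 */
            break;
        case 1:
            ran[1] = tid;   /* stands in for code fragment 2 */
            break;
        case 2:
            ran[2] = tid;   /* stands in for code fragment 3 */
            break;
        }
    }
    /* a real lowering would end with a barrier here */
}
```

With a team of two threads, thread 0 executes sections 0 and 2 while thread 1
executes section 1. With three or more threads, each section goes to a
different thread and the remaining threads do nothing.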

==================================================================================

Here is an example of how a C compiler might transform an omp-for worksharing
directive with dynamic scheduling. Once again, the heart of the transformation
is a function, _ompc_dynamic(), which computes the limits of the next loop slice
that should be worked on. The implementation of _ompc_dynamic() is much more
complicated than that of _ompc_static() since it must coordinate, at run time,
with all of the threads working on the loop. On the other hand, _ompc_static()
need not coordinate or communicate with any other thread; it's completely "static".

                                              // Transformed Code
// Original Code                              struct { int lower, upper, done; }
#pragma omp for schedule(dynamic, chunk)         _loop  = { a, b, 0 },  // shared by all threads
for( i = a; i < b; i++ )                         _slice = { 0, 0, 0 };  // private to each thread
{                                             while (1)
  // loop body                                {  // request a new slice of the loop
}                                                _ompc_dynamic(&_loop, &_slice, chunk);
                                                 if ( _slice.done )
                                                    break;
                                                 for(_i = _slice.lower;
                                                     _i < _slice.upper;
                                                     _i++)
                                                 {
                                                   // loop body
                                                 }
                                              }
                                              _ompc_barrier();
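
A minimal sketch of how the slice dispenser might work, assuming POSIX threads
and a single mutex guarding the shared loop state. The struct name loop_state,
the lock, and the function name ompc_dynamic() are assumptions made for this
illustration. A production runtime would more likely use an atomic
fetch-and-add than a lock, but the mutex makes the required coordination
explicit. Slices are half-open ranges [lower, upper), matching the
_i < _slice.upper test in the listing.

```c
#include <assert.h>
#include <pthread.h>

struct loop_state { int lower, upper, done; };

/* One lock for the one loop in this sketch; a real runtime would keep
 * the lock (or an atomic counter) alongside each loop's shared state. */
static pthread_mutex_t loop_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical version of _ompc_dynamic().
 * `loop` is shared by the whole team and holds the iterations not yet
 * handed out, as [loop->lower, loop->upper). `slice` is private to the
 * caller and receives the next chunk, or done = 1 if nothing is left. */
static void ompc_dynamic(struct loop_state *loop,
                         struct loop_state *slice, int chunk)
{
    assert(chunk > 0);

    pthread_mutex_lock(&loop_lock);
    if (loop->lower >= loop->upper) {
        slice->done = 1;                     /* no iterations remain */
    } else {
        slice->lower = loop->lower;
        slice->upper = loop->lower + chunk;
        if (slice->upper > loop->upper)      /* last chunk may be short */
            slice->upper = loop->upper;
        slice->done = 0;
        loop->lower = slice->upper;          /* advance the shared cursor */
    }
    pthread_mutex_unlock(&loop_lock);
}
```

For a = 0, b = 10 and chunk = 4, successive calls return the slices [0, 4),
[4, 8) and [8, 10), and the fourth call reports done. Which thread receives
which slice depends on the order in which the threads reach the lock, which is
exactly what makes this schedule "dynamic".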