AreWeThereYetAreWeThereYetAreWeThereYet

Oops


// Given a list of vehicles pending an update, eliminate those who
// are not yet ready for an update (not finished spawning, not heard
// from in a while, etc). Also perform all the various discrete,
// preliminary steps that can safely be executed in parallel.
static void _evaluateVehicleCandidates(VEHICLES& vehicles)
{
  // Have as many threads as possible spawn tasks of the
  // workload.
  #pragma omp parallel for schedule(guided) shared(vehicles)
  for ( size_t i = 0 ; i < vehicles.size() ; ++i )
  {
    // Turn each vehicle into a task, since the execution length
    // is variable.
    #pragma omp task firstprivate(i)
    {
      if ( _evaluateCandidate(vehicles[i]) == false )
        vehicles[i] = NULL ;
    }
  }
}

Cpu0: Idle: 99.8%
Cpu1: Idle: 0%

With 0 people on the server. Hmm.

I slowly backed out my various #pragma omp inclusions, and still it was using 100% of one CPU. The only odd thing was … it’s “the other” CPU…

valgrind --tool=cachegrind didn’t help. Apparently it was spending 25% of its time doing memset (and apparently kcachegrind can’t read the output file from this particular version of valgrind, sigh).
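For reference, the invocation I mean looks like the below. (Note: kcachegrind is really built around callgrind output; cachegrind only models the cache, while callgrind records the call graph. `./your-server` is a placeholder binary name, not the actual server.)

```shell
# Cache simulation only (what I ran):
valgrind --tool=cachegrind ./your-server

# Call-graph profiling, which is what kcachegrind visualises best:
valgrind --tool=callgrind ./your-server
# Then open the generated file:
kcachegrind callgrind.out.*
```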

I dig, I prod, I poke. Nope. Just sinking faster. Now I’m at 93% and 0% idle.

15 minutes with the kittens and back to the computer. And it just came to me.

I’m actually only using 0.2% of the CPU, the other CPU is asking “Are we there yet?” (or more specifically, “Is there work yet?”).

The problem is that with 0 people on the server, the “#pragma omp parallel” still creates threads, with the second thread going off looking for work in a “futex spinlock”.

So the first fix was: check if the bloody vehicle list is empty :)

But then as soon as I spawned, back to 100% CPU. Gah. Yes, of course, because only the first thread gets any work to do, and while it’s doing it, the second thread goes into “are we there yet” mode on the second core…
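Worth noting, assuming an OpenMP 3.0+ runtime: the spin-versus-sleep behaviour of idle threads can also be influenced from the environment rather than in code. These are runtime tuning knobs, not a fix for dispatching no work at all:

```shell
# Ask the OpenMP runtime to put waiting threads to sleep rather than
# spin (OpenMP 3.0+; exact behaviour is implementation-defined).
export OMP_WAIT_POLICY=passive

# GNU libgomp additionally lets you cap how long a thread spins
# before it blocks (libgomp-specific knob).
export GOMP_SPINCOUNT=10000
```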

The solution is to use “#pragma omp parallel for … if(vehicles.size() >= (size_t)omp_get_max_threads())” so that it only launches threads when the number of vehicles needing checking warrants a full complement of threads. (Note that it has to be omp_get_max_threads(); omp_get_num_threads() returns 1 when called outside a parallel region, which would make the condition nearly always true.) In fact, it’s probably only worth going parallel when there are at least twice as many vehicles to check as there are threads…


// Given a list of vehicles pending an update, eliminate those who
// are not yet ready for an update (not finished spawning, not heard
// from in a while, etc). Also perform all the various discrete,
// preliminary steps that can safely be executed in parallel.
static void _evaluateVehicleCandidates(VEHICLES& vehicles)
{
  if ( vehicles.empty() )
    return ;
  // Spin up a team of threads to consume the tasks, but only
  // when there is enough work to keep them busy.
  #pragma omp parallel shared(vehicles) if(vehicles.size() >= 16) // Only go parallel if enough vehicles.
  {
    #pragma omp single // Have one thread dole-out the workload.
    for ( size_t i = 0 ; i < vehicles.size() ; ++i )
    {
      // Turn each vehicle into a task, since the execution length
      // is variable.
      #pragma omp task firstprivate(i)
      {
        if ( _evaluateCandidate(vehicles[i]) == false )
          vehicles[i] = NULL ;
      }
    }
  }
}

But I’m not sure that isn’t a mis-use of the task paradigm. The difference in run-times of individual iterations can be quite drastic, though.
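If tasks turn out to be the wrong tool, OpenMP’s worksharing loops can also cope with wildly uneven iteration times via schedule(dynamic), which hands out iterations one chunk at a time to whichever thread is free. A minimal sketch of that shape, using a hypothetical stand-in for _evaluateCandidate() and int IDs instead of vehicle pointers:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for _evaluateCandidate(): treat odd IDs as
// "not ready" so we can see the elimination happen.
static bool evaluateCandidate(int id) { return id % 2 == 0; }

// Same elimination pass, but as a worksharing loop: dynamic
// scheduling hands out iterations one at a time, so threads that
// finish short iterations pick up more work, without the task
// machinery. The if() clause keeps it serial for small lists.
static void evaluateVehicleCandidates(std::vector<int>& vehicles)
{
    if (vehicles.empty())
        return;
    #pragma omp parallel for schedule(dynamic) if(vehicles.size() >= 16)
    for (size_t i = 0; i < vehicles.size(); ++i)
    {
        if (!evaluateCandidate(vehicles[i]))
            vehicles[i] = -1; // mark "not ready", analogous to NULL-ing the slot
    }
}
```

Whether dynamic scheduling beats tasks depends on how drastic the per-iteration variance actually is; the chunk hand-off has its own locking cost.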

6 Comments

Olly, add this plugin to your WordPress please; reading your code is quite difficult due to the colour and how small it is.

http://www.lastengine.com/syntax-highlighter-wordpress-plugin/

Just a thought :)

I can’t comment much as I have not used C/C++ with those constructs, but the usual way one does this in the environments I work with is to have a synchronised FIFO queue and N worker threads which, when there is nothing to do, are waiting/sleeping (consuming no CPU). When an item is added to the queue, the threads are woken; one of them gets the lock and the task, and the rest go back to idle/wait after noticing that there is in fact no work for them. So when there is no task, there is no running thread asking “are we there yet”, because they are all waiting. It adapts to load and number of threads without much effect on CPU.

Not sure how one would do that with OpenMP, or if it would be possible or work as cleanly. But just a thought.

S!

Sres: I’m using wordpress.com, so I don’t have any control over the plugins. That said, WordPress.com uses that syntax highlighter, and what I see on this page looks exactly like the syntax highlighter you linked to.

[Screenshots: “My code” vs. “Their code”]

OpenMP threads are done for you by the pragmas; I’m pretty sure they use spinlocks and futexes rather than smart queues – so you want to be dispatching work to them as quickly as possible.

Odd; I’m behind Websense, so it could well be that the http://s0.wordpress.com/ subdomain is blocked… I’ve noticed that on a few other WordPress sites that looked like arse.

Trackbacks and Pingbacks

Going Parallel « kfsone's pittance, March 30, 2010 at 7:37 pm

[…] Why? Because they are in a spinlock waiting for work. […]
