The little mutex that wouldn’t…

Every now and again, you run into a piece of code that does the impossible… It always comes down to finding the “misunderstanding” between you and the compiler (or interpreter) on some particular of an instruction. You might have said “halt and catch fire” and the computer might interpret “catch fire” as “stop it from hitting the ground” instead of “combust”.

A little but if Mutex-wrapped code I’ve inherited is insisting on crashing as two threads simultaneously try to work inside a Mutex. Which is impossible: Mutex is short for ‘Mutual Exclusion‘.

This is not for the feint hearted (a symptom of feint heartedness can be programming in Java :-P)

class Supercell::LoadThread
{
// …
    bool queued, loading, loaded;
    MutexLock _queueMutex, _loadMutex;
    Condition doLoad; 

     typedef std::list SuperCellList;
     SuperCellList _cells; 

// …
}

void Supercell::LoadThread::AddSupercell(Supercell* sc)
{
    _queueMutex.lock();
     _cells.push_back(sc);
    doload.signal(&_queueMutex);
    _queueMutex.unlock(); 
}

void Supercell::LoadThread::Run()
{
    SuperCell* sc = NULL;

    // Have to lock the queue before we begin so that we can
    // look at ‘done’ and ‘_cells’ and begin the “wait” process.
    _queueMutex.lock();

    for (;;)
    {
        while(false == _done && true == _cells.empty())
        {
            doLoad.wait();
        }
        if(_done) break;

        // Make sure we can lock the work mutex before we pick
        // up anything to work on.
        // Race-condition elimination; TRY and get the lock but
        // if we can’t get it immediately, we won’t do any work.
        bool lockedLoad = ( _loadMutex.tryLock() == 0 ) ;

        if ( lockedLoad )
        {
            sc = _cells.front();
            _cells.pop_front(); 
        }

        // Release the queue early because works non-trivial time.
        _queueLock.unlock() ;

        if (  lockedLoad )
        {
if ( NULL != sc ) 
            sc->Load();
            _loadMutex.unlock() ;
        }
 
        // Now we need to re-lock the queue so we can look
        // at “done” and _cells and start the wait loop again. 
        _queueMutex.lock() ;
    }
    _queueMutex.unlock() ;
}

// When a Renderer destructs, it removes all of its supercells from the loadthread…
void LoadThread::RemoveRendererSuperCells(Renderer* r)
{
    // Lock the queue, prevent the loader from grabbing the next queue entry.
    _queueMutex.lock() ;
    // We also need to wait for the loader to finishing up whatever it might be doing.
    _loadMutex.lock() ;

    // Code to remove all Supercells in _cells which have sc->_renderer == r
    for(SuperCellList::iterator it = _cells.begin() ; it != cells.end() ; )
    { 
        if((*it)->_renderer == r) _cells.remove(it++);
        else ++it;
    } 

    _loadMutex.unlock();
    _queueMutex.unlock();
}

And yet, it still crashes with Thread 8 instead sc->Load while Thread 1 is in RemoveRendererSuperCells .

HOW THE HELL?

Urgle.

I took the “do over” stick to it. I made sure all the derived classes have virtual destructors – not that it should matter; and I switched that std::list for a std::vector

   for ( SupercellList::iterator it = _cells.begin() ; it != _cells.end() ; )
   {
      if ( (*it)->_renderer == r )
      {
         *it = _cells.back() ;     // Move the last entry here
         _cells.pop_back() ;      // Remove the last entry 
      }
      else
         ++it ;
   }

Similar changes to the Loader … And now it’s a tiny, wee bit faster but gives the visual illusion of being noticeably faster (because of a change to the order in which it renders) … but more importantly, doesn’t crash.

Scratch Ticket #1984: PC and Mac client crashes at Map Screen when clicking navigation buttons.

5 Comments

What’s with the _loadMutex.unlock() bit in the middle of all that other stuff? That looks a little hinky.

You mean this?

if ( lockedLoad )
{
sc->Load();
_loadMutex.unlock() ;
}

Think about it very carefully… :)

I suppose that “_loadThread.tryLock()” instead of “_loadMutex.tryLock()” is a transcription error rather than a problem in the original code?

Actually, it’s an idiot error :) I copied & pasted the source (a sharp eye can tell which are the lines I’ve added to the code while debugging it vs the original lines). Anyway, while I was reviewing it for anything I might need to remove to sanitize it for blog posting (which I didn’t) I saw that and ‘corrected’ it.

dear lord. Netsted thread locking with inheritence. Talk about head-ache inducing :)

Leave a Reply

Name and email address are required. Your email address will not be published.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

You may use these HTML tags and attributes:

<a href="" title="" rel=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <pre> <q cite=""> <s> <strike> <strong> 

%d bloggers like this: