Parallel exercise: Palindromes and reversible words.

I’ve been looking for projects to test various methods of work distribution without being yet another Fibonacci series that has been done to death already, and I stumbled on the idea of finding all the reversible words and palindromes within a given range of lengths (3-7 letters).

It immediately seemed like a fun challenge: the simplest approach would be to load an entire dictionary in from one of the open-source resources someplace, and simply iterate over each word and find those which still make a valid word when reversed.

But just as easily, you could start ignorant of the lexicon and use brute (or genetic) force to find it… Remember, this is an exercise, it’s not really so much about finding the results.

One of my variations used ZeroMQ to distribute tasks across 20 free Amazon EC2 servers: there are both Linux and Windows instances available for free (within limits) if you use “t1.micro”.

Searching for palindromes and reversible words (“Elle”, “tuba” => “abut”) provides a number of sub-tasks which you can compartmentalize in a variety of different ways. For example: combine letters to generate a word, match word against a dictionary, reverse the word and match that against a dictionary.

Computer, Operating and Programming.

If we’re ever going to turn the corner from single-processor computing to massively parallel and/or distributed computing, we need 3 big changes that must happen cooperatively: 1) A new instruction set, 2) a new programming language and 3) a new OS.

For the last decade, each of these critical components has evolved in a vacuum, sometimes seemingly in spite of the others.

Put in car terms: (CPUs) Tire manufacturers have “innovated” putting four tires on each wheel, giving you four times as much traction and therefore potentially four times as much speed; (Languages) engine manufacturers made engines greener to the point spiders live in them (while hoping you didn’t notice diesel engines get 2x the mpg); (OSes) car manufacturers have concentrated on making the vehicles look prettier in a larger variety of car parks.

Each is hamstrung by the demands and constraints of the next, producing a net imperfection.

Sorting sucks.

Sorting and parallelization… Yeuch.

You can get significant performance gains by parallelizing various sort algorithms cleverly, but ultimately there’s some data on the left that needs to be on the right, and vice versa.
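A minimal sketch of that tension, using plain std::thread rather than any particular library: you can sort the two halves in parallel, but the final merge is the step that inherently has to move data from one side to the other.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Sort each half on its own thread, then merge. The two sorts touch
// disjoint ranges, so they parallelize cleanly; the merge is the part
// where data on the left ends up on the right, and vice versa.
void parallel_sort(std::vector<int>& v) {
    auto mid = v.begin() + v.size() / 2;
    std::thread left([&] { std::sort(v.begin(), mid); });
    std::sort(mid, v.end());  // sort the right half on this thread
    left.join();
    std::inplace_merge(v.begin(), mid, v.end());
}
```

You can recurse the split for more than two cores, but the merges at each level remain the serial bottleneck.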

Answer tucked away in Changelogs…

The answer to my question about Threading Building Blocks’ usefulness for non-algorithmic parallelization (i.e., in a word, threading ;) was tucked away in a change log (TBB’s documentation could use a revision pass, and the book predates the change).

ISO C++ thread class – a thin, portable wrapper around OS threads. It’s a close approximation of the ISO C++ 200x class thread (Section 30.2 of the draft standard). So now TBB lets you choose which is best, task-based or thread-based, for your situation. Threads are typically better than tasks when the “work” is really more waiting than computation, such as for:

  • GUI, I/O or network interface threads.
  • Threads that need to wait on external events.
  • Programs that previously needed to use both native threads and Intel® TBB tasks.

Well, at least now I have a more immediately useful pet project to work on: writing a generic encapsulation of a resource-sharing and resource-specific worker pool. Resource-sharing: a shared pool of resources which the workers use interchangeably – which one serves a given job is indefinite, i.e. any resource. Resource-specific: a pool that handles specific actions for specific resources, i.e. this resource and no other.

Resource-sharing: Write this data to any database handle in the pool; Resource-specific: Someone write this data to connection #42.

OpenMP joy from Visual Studio :)

An update to my earlier post: I got a reply to my OpenMP 3.0 ticket. For those of you who don’t have a Windows Live account:

Hopefully that means it’ll be in VS 2010 by release time. Superb.

<3 Visual Studio, yet again :)

Going Parallel

I’ve been looking at OpenMP and Threading Building Blocks. I’ve actually managed to OpenMP one part of the client, which seemed to reduce the “chug” as you move at high speed between large areas of terrain, but didn’t otherwise impact performance (maybe at spawning, I dunno). I also did a few parts of the cell host: I reorganized the order in which the host prepares vehicles for their world update. The initial part of that is quite thread-safe, and boils down to a simple culling algorithm; on the live servers it should significantly reduce the time the world update dispatch takes, meaning the data sent to clients is “fresh”er by 5-10 ms under busy load.

My initial understanding of both systems was fairly hazy – I’m trying to run before I can walk, but then the documentation for both starts at a sprint.

The worst mistake I made was misunderstanding their focus, which is algorithmic parallelization.
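That focus – algorithmic parallelization – is easiest to see in the classic data-parallel loop, which in OpenMP is a one-pragma change (a small sketch; without OpenMP enabled the pragma is simply ignored and the loop runs serially):

```cpp
#include <vector>

// Square each index into the output vector. The iterations are
// independent, so OpenMP is free to split the loop across cores;
// compiled without -fopenmp (or /openmp) the pragma is ignored
// and this is just an ordinary serial loop.
std::vector<long> squares(int n) {
    std::vector<long> out(n);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        out[i] = static_cast<long>(i) * i;
    return out;
}
```

Loops like this are where both OpenMP and TBB shine; long-lived worker threads that mostly wait around, as in the previous post, are a different problem entirely.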