With parallelism options like OpenMP and TBB, you know the data is local and can take various shortcuts by handing over pointers. ZeroMQ takes the same shortcut for "inproc://" connections, and the zero-copy (that provides the 'Zero' part of the name) matches OpenMP and TBB for efficiency (which themselves rely on mutexes, spinlocks, etc.).
A local-only parallel sort can take advantage of this; since you are only going to assign any given range to one thread at a time, you can just pass the address of the data and the working-range indexes and know that threads aren’t going to experience resource contention.
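To make the local-only case concrete, here is a minimal sketch using plain std::thread rather than OpenMP, TBB or ZeroMQ. The names sortRange and parallelSort are hypothetical, and the split into exactly two halves is an assumption made to keep the example short. Each thread receives only the base address and its own index range, so nothing is copied and the two ranges never overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sort data[first, last) in place. The worker gets only the base address
// and its own index range, so no data is copied.
void sortRange(int* data, std::size_t first, std::size_t last)
{
    assert(first <= last);
    std::sort(data + first, data + last);
}

// Hypothetical helper: sort a vector with two threads on disjoint halves.
void parallelSort(std::vector<int>& values)
{
    const std::size_t mid = values.size() / 2;
    int* base = values.data();

    // [0, mid) and [mid, size) never overlap, so the threads need no locking.
    std::thread lower(sortRange, base, std::size_t{0}, mid);
    std::thread upper(sortRange, base, mid, values.size());
    lower.join();
    upper.join();

    // Merge the two sorted halves on the calling thread.
    std::inplace_merge(values.begin(),
                       values.begin() + static_cast<std::ptrdiff_t>(mid),
                       values.end());
}
```

Calling parallelSort(v) on a std::vector<int> leaves v sorted in place. A real implementation would likely split into more ranges and recurse, but the pointer-plus-indexes handoff is the same.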
But if you want to make it scalable, e.g. by also accepting connections from remote processors over InfiniBand via tcp://,
    #include <zmq.hpp>

    zmq::context_t zmqContext(2) ;
    // For sending work downstream to workers. A socket has to be created
    // from its context. (Newer ZeroMQ releases call this type ZMQ_PUSH.)
    zmq::socket_t myEndpoint(zmqContext, ZMQ_DOWNSTREAM) ;

    // Accept downstream worker connections from
    // local threads via inproc://
    myEndpoint.bind("inproc://local-thread-workers") ;

    // Accept downstream worker connections
    // from remote InfiniBand processes via tcp
    myEndpoint.bind("tcp://0.0.0.0:12345") ;
well, now you probably have to send the workers the actual data in the ranges they are going to be working on, because a pointer means nothing to a remote process.
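As a rough illustration of what "sending the data" implies, the sketch below copies a range into a self-contained byte buffer of the kind that could become a message body, and rebuilds it on the receiving side. The names packRange and unpackRange are hypothetical, the actual ZeroMQ send and receive calls are left out, and the raw memcpy assumes both ends share the same int size and byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Sender side: copy values[first, last) into a byte buffer. Unlike the
// local case, the worker cannot be handed a pointer and two indexes.
std::vector<unsigned char> packRange(const std::vector<int>& values,
                                     std::size_t first, std::size_t last)
{
    assert(first <= last && last <= values.size());
    const std::size_t bytes = (last - first) * sizeof(int);
    std::vector<unsigned char> payload(bytes);
    if (bytes != 0)
        std::memcpy(payload.data(), values.data() + first, bytes);
    return payload;
}

// Worker side: rebuild the ints from the received bytes.
// Assumes sender and receiver agree on sizeof(int) and endianness.
std::vector<int> unpackRange(const std::vector<unsigned char>& payload)
{
    assert(payload.size() % sizeof(int) == 0);
    std::vector<int> values(payload.size() / sizeof(int));
    if (!payload.empty())
        std::memcpy(values.data(), payload.data(), payload.size());
    return values;
}
```

The extra copy in each direction, plus the need to ship sorted results back, is the cost that the pointer-passing version avoids.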
I suppose I could start by just adapting a traditional parallel sort to ZeroMQ and let someone smarter than me work out a scalable variant.