Fix some of the multi-threading problems in the cid allocation. Two bugs
specifically:
- Since we do not have a queue for incoming fragments with an unknown cid, we
need to synchronize all processes before leaving communicator creation. This
synchronization was (and still is) located in comm_activate, which is however
too late for the multi-threaded case. Thus, in multi-threaded scenarios we now
synchronize 'before' we allow another thread to enter the cid-allocation
loop.
- For synchronization we used, for the sake of simplicity, allreduce
operations. It turns out that these operations interfered with the
allreductions in the cid-allocation routine, which led to nonsense results
in the cid allocation and potentially to endless loops.
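
To illustrate why a stray value in those allreductions is harmful, here is a
minimal simulation of an agreement loop of this kind. It is a sketch under
assumptions, not the actual routine: the names allreduce and next_cid, the
MAX-then-MIN scheme, and the per-process sets are all hypothetical, and the
"processes" are just entries in a Python list.

```python
def allreduce(values, op):
    """Stand-in for an allreduce: every participant sees op(all values)."""
    return op(values)


def next_cid(used_per_proc, start=0):
    """Agree on the lowest cid that is free on every simulated process.

    used_per_proc holds one set of already-used cids per process
    (hypothetical data layout, chosen only for this sketch).
    """
    while True:
        # Each process proposes its lowest locally free cid >= start.
        proposals = []
        for used in used_per_proc:
            cid = start
            while cid in used:
                cid += 1
            proposals.append(cid)
        candidate = allreduce(proposals, max)

        # Each process votes 1 if the candidate is free locally, else 0.
        votes = [0 if candidate in used else 1 for used in used_per_proc]
        if allreduce(votes, min) == 1:
            # Everyone agreed: reserve the cid on all processes.
            for used in used_per_proc:
                used.add(candidate)
            return candidate

        # At least one process could not take it; retry above the candidate.
        start = candidate + 1


used = [{0, 1, 2}, {0, 1, 3}, {0, 1}]
print(next_cid(used))  # 4
```

Both reductions assume that every contribution belongs to the same round of
the same allocation. If a value from an unrelated allreduce (for example one
used purely for synchronization) were combined into either of them, the
candidate or the vote would no longer reflect the real free cids, and the
loop might never reach agreement.
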
Multi-threaded communicator creation seems to work now, but is still 'very
very' slow. I think the busy wait of threads is killing the performance of
the active threads in the cid allocation. But this is another topic.
This commit was SVN r15910.