Brian Barrett bfbc281e93 Fix slow startup issue with the MX MTL. At the API level mx_connect() is a one-sided operation, but it does not interrupt the target: it can only make progress while the target is inside the MX library. So if most of the processes exit mtl_mx_add_procs() and enter the stage gate 2 barrier, the remaining processes can progress their mx_connect() calls only when the targets enter the MX library. Because the event library is in EV_ONELOOP mode, this happens only once a second. The MX progress thread (hidden in the MX library) also wakes up only once a second, so mx_connect() calls can take a second to complete.
The temporary solution is to switch into EV_NONBLOCK mode earlier (right after the mx_connect() loop) so that there isn't a giant slowdown when some processes enter the stage gate 2 barrier before other processes. They will no longer block in the event library for any period of time, which appears to give a 50% speedup when running at > 64 procs.
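
The difference between the two modes can be illustrated with a small standalone sketch. This is not the Open MPI or MX code: toy_event_loop(), the loop_mode enum, and the block_ms parameter are hypothetical names made up for illustration, and poll() with no descriptors stands in for an event loop that has nothing ready. It only shows why a progress loop that blocks in the event library on every pass can make progress no faster than the blocking timeout, while a non-blocking pass returns at once.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <time.h>

/* Hypothetical stand-ins for the two event-loop modes named above. */
enum loop_mode { LOOP_ONCE_BLOCKING, LOOP_NONBLOCK };

/*
 * One pass of a toy event loop with no ready descriptors.
 * In blocking mode it sleeps for up to block_ms milliseconds;
 * in non-blocking mode it returns immediately.
 */
static int toy_event_loop(enum loop_mode mode, int block_ms)
{
    int timeout = (mode == LOOP_NONBLOCK) ? 0 : block_ms;
    return poll(NULL, 0, timeout);
}

/*
 * Time `passes` iterations of a progress loop that calls the toy
 * event loop once per iteration. Returns elapsed milliseconds.
 */
static double elapsed_ms(enum loop_mode mode, int passes, int block_ms)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < passes; i++) {
        /* A real progress loop would also poll the network library
         * here; that is the step that gets starved while blocked. */
        toy_event_loop(mode, block_ms);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - start.tv_sec) * 1e3 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Calling elapsed_ms(LOOP_ONCE_BLOCKING, 3, 1000) would model three passes with a one-second block each, while elapsed_ms(LOOP_NONBLOCK, 3, 1000) returns almost immediately. The trade-off, under the same assumptions, is that the non-blocking loop spins instead of sleeping.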

Refs trac:645

This commit was SVN r12713.

The following Trac tickets were found above:
  Ticket 645 --> https://svn.open-mpi.org/trac/ompi/ticket/645
2006-12-01 02:49:01 +00:00
..
2006-08-15 21:59:37 +00:00