Fix slow startup with the MX MTL.

The problem is that mx_connect() is a one-sided operation at the API level, but it does not interrupt the target: a connect request is only serviced when the target process enters the MX library. So if most processes leave mtl_mx_add_procs() and enter the stage gate 2 barrier, the remaining processes can only make progress on their mx_connect() calls when those targets happen to enter the MX library. Because the event library is in EV_ONELOOP mode, a process waiting in the barrier blocks in the event library and enters MX only about once a second. The MX progress thread (internal to the MX library) also wakes up only once a second, so each mx_connect() call can take up to a second to complete.

The temporary solution is to switch into EV_NONBLOCK mode earlier (right after the mx_connect loop) so that processes entering the stage gate 2 barrier before other processes no longer cause a large slowdown. Those processes now do not block in the event library at all, which appears to give roughly a 50% speedup when running with more than 64 processes.

Refs trac:645

This commit was SVN r12713.

The following Trac tickets were found above:
  Ticket 645 --> https://svn.open-mpi.org/trac/ompi/ticket/645
2006-12-01 02:49:01 +00:00 · 2006-12-01 02:49:01 +00:00 · bfbc281e93
--- a/ompi/mca/mtl/mx/mtl_mx.c
+++ b/ompi/mca/mtl/mx/mtl_mx.c
@@ -151,6 +151,16 @@ ompi_mtl_mx_add_procs(struct mca_mtl_base_module_t *mtl,
        mtl_peer_data[i] =  (struct mca_mtl_base_endpoint_t*) 
            mtl_mx_endpoint;
    }
+
+    /* because mx_connect isn't an interrupting function, we need to
+       progress MX as often as possible during the stage gate 2.  This
+       would have happened after the stage gate anyway, so we're just
+       speeding things up a bit. */
+#if OMPI_ENABLE_PROGRESS_THREADS == 0
+    /* switch from letting us sit in the event library for a bit each
+       time through opal_progress() to completely non-blocking */
+    opal_progress_set_event_flag(OPAL_EVLOOP_NONBLOCK);
+#endif
    
    return OMPI_SUCCESS;
 }