This commit removes the --with-mpi-thread-multiple option and forces
MPI_THREAD_MULTIPLE support. This cleans up an abstration violation
in opal where OMPI_ENABLE_THREAD_MULTIPLE determines whether the
opal_using_threads is meaningful. To reduce the performance hit on
MPI_THREAD_SINGLE programs an OPAL_UNLIKELY is used for the
check on opal_using_threads in OPAL_THREAD_* macros.
This commit does not clean up the arguments to the various functions
that take whether muti-threading support is enabled. That should be
done at a later time.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug that can occur when communicating via XRC to
peers on the same node. UDCM was not saving the SRQ numbers on the
loopback endpoint (which shares its ib_addr info with all local peers)
so any messages to local peers use an invalid SRQ number.
Fixesopen-mpi/ompi#1383
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
There is a bug in MPMD detection that disables totalview if a : is
found anywhere on the command line. This includes inside an argument
option or MCA variable value. This commit changes the check to look
for the string " : " instead of the character : which should eliminate
the issue in most cases.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This bug fixes two issue with the ib_addr lock:
- The ib_addr lock must always be obtained regardless of
opal_using_threads() as the CPC is run in a seperate thread.
- The ib_addr lock is held in mca_btl_openib_endpoint_connected when
calling back into the CPC start_connect on any pending
connections. This will attempt to obtain the ib_addr lock
again. Since this is not a performance-critical part of the code
the lock has been changed to be recursive.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug that occurs when attempting a get or put
operation on an endpoint that is not already connected. In this case
the remote_srqn may be set to an invalid value as the rem_srqs array
on the endpoint is not populated. This commit moves the usage of the
rem_srqs array to the internal put/get functions where it is
guaranteed this array is populated.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
If ibv_fork_init() has been invoked the pages are marked MADV_DONTFORK.
If we only partially use a page, any data allocated on the remainder of
the page will be inaccessible to the child process.
Fixesopen-mpi/ompi#1363
This commit fixes a memory corruption bug when parsing lines of the
form:
-x FOO=bar
The code was making changes to the size of the buffer allocated for
key_buffer without making the appropriate changes to
key_buffer_len. This was causing subsequent calls to save_param_name
to write to invalid memory.
This commit makes the following changes:
- Fix the above bug by modifying trim_name to move the string within
the buffer instead of re-allocating space for the trimmed string.
- Cleaned up both trim_name and save_param_name. Both functions took
a prefix and suffix to trim. Problem was the prefix was not
treated like a prefix. Instead the "prefix" was located inside the
string using strstr then the trimmed value started after the
substring (even in the middle of the string). To allow trimming
both -x and --x (as well as -mca and --mca) trim_name is now
called with each prefix.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit ensures ib_addr->remote_xrc_rcv_qp_num value is set when
creating the loopback queue pair. This is needed when communicating
with any other local peer.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>