Add more stubs to reduce likelihood of future
mysterious segfaults if some of the newer pmix
funcs start to get used within ompi.
Add a get_version to return the version of the
Cray PMI library being used, since the Cray PMI
library actually has a function to get that info.
Be more accurate about which functions have a hope
of being implemented using Cray PMI and those which
never will.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
* update to configury to silence ident messages (thanks Gilles!)
* fix for warnings Jeff saw when get didn't find the requested data
* fix for Mac OSX operations
Bring Slurm PMI-1 component online
Bring the s2 component online
Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.
Bring the OMPI pubsub/pmi component online
Get comm_spawn working again
Ensure we always provide a cpuset, even if it is NULL
pmix/cray: adjust cray pmix component for pmix
Make changes so cray pmix can work within the integrated
ompi/pmix framework.
Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet
Cleanup comm_spawn - procs now starting, error in connect_accept
Complete integration
There is currently a path through the grdma mpool and vma rcache that
leads to deadlock. It happens during the rcache insert. Before the
insert the rcache mutex is locked. During the call a new vma item is
allocated and then inserted into the rcache tree. The allocation
currently goes through the malloc hooks which may (and does) call back
into the mpool if the ptmalloc heap needs to be reallocated. This
callback tries to lock the rcache mutex which leads to the
deadlock. This has been observed with multi-threaded tests and the
openib btl.
This change may lead to some minor slowdown in the rcache vma when
threading is enabled. This will only affect larger message paths in
some of the btls.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This new class is the same as the opal_mutex_t class but has a
different constructor. This constructor adds the recursive flag to the
mutex attributes for the lock. This class can be used where there may
be re-enty into the lock from within the same thread.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
There were several issues preventing the openib btl from running in
thread multiple mode:
- Missing locks in UDCM when generating a loopback endpoint. Fixed in
open-mpi/ompi@8205d79819.
- Incorrect sequence numbers generated in debug mode. This did not
prevent the openib btl from running but instead produced incorrect
error messages in debug builds.
- Recursive locking of the rcache lock caused by the malloc
hooks. This is fixed by open-mpi/ompi#827
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
When using eager RDMA in debug builds the openib btl generates a
sequence number for each send. The code independently updated the head
index and the sequence number for the eager rdma transaction. If
multiple threads enter this code at the same time and run in the
following order:
thread 1: update sequence (0 -> 1)
thread 2: update sequence (1 -> 2)
thread 2: update head (0 -> 1)
thread 1: update head (1 -> 2)
the sequence number for head[0] gets 1 and the sequence number for
head[1] gets 0. The fix is to generate the sequence number from the
head index.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>