This reverts commit open-mpi/ompi@f7257a8310.
Ensure that we properly cleanup the session directory tree. Prior code had issues with symlinks, especially if the file that the link points to was already removed as we traverse the tree. Also found that the dirent checks for directory type weren't fully portable, and so fall back to the stat-based approach which is known to be portable.
Fix singularity singletons by detecting we are in a container and properly setting the pmix selection to pick the isolated component. Remove a stale restriction blocking use of the sm btl
Or at least that was the origin of the issue. It turns out
we were freeing the wrong buffer (but as it only happen in the
case of an error we never noticed).
This patch addresses most (if not all) @derbeyn concerns
expressed on #1015. I added checks for the requests allocation
in all functions, ompi_coll_base_free_reqs is called with the
right number of requests, I removed the unnecessary basic_module_comm_t
and use the base_module_comm_t instead, I remove all uses of the
COLL_BASE_BCAST_USE_BLOCKING define, and other minor fixes.
This commit brings the scif btl up to date with changes made on master
to rework the mpool and rcache frameworks.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds the data necessesary for supporting dynamic add_procs
to the rdma message (opal_process_name_t). The endpoint lookup
function has been updated to match the code in udcm.
Closesopen-mpi/ompi#1468.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
A bus error occurs in sm OSC under the following conditions.
- sparc64 or any other architectures which need strict alignment.
- `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created
by sm OSC.
- The communicator size is odd and greater than 3.
The lines 283-285 in current `ompi/mca/osc/sm/osc_sm_component.c` has
the following code.
```c
module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
module->posts[0] = (uint64_t *) (module->node_states + comm_size);
```
The size of `ompi_osc_sm_node_state_t` is multiples of 4 but not
multiples of 8. So if `comm_size` is odd, `module->posts[0]` does
not aligned to 8. This causes a bus error when accessing
`module->posts[i][j]`.
This patch fixes the alignment of `module->posts[0]` by setting
`module->posts[0]` first.
Fix CID 1345825 (1 of 1): Dereference before null check (REVERSE_INULL):
ib_proc should not be NULL in this case. Removed the check and added a
check for NULL after OBJ_NEW.
CID 1269821 (1 of 1): Dereference null return value (NULL_RETURNS):
I labeled this one as a false positive (which it is) but the code in
question could stand be be cleaned up.
Fix CID 1356424 (1 of 1): Argument cannot be negative (NEGATIVE_RETURNS):
While trying to silence another Coverity issue another was
flagged. Protect the close of fd with if (fd >= 0).
CID 70772 (1 of 1): Dereference null return value (NULL_RETURNS):
CID 70773 (1 of 1): Dereference null return value (NULL_RETURNS):
CID 70774 (1 of 1): Dereference null return value (NULL_RETURNS):
None of these are errors and are intentional but now that we have a
list release function use that to make these go away. The cleanup is
similar to CID 1269821.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 1315298: Resource leak (RESOURCE_LEAK) :
Fix CID 1315300: Resource leak (RESOURCE_LEAK):
Fix CID 1315299: Resource leak (RESOURCE_LEAK):
Fix CID 1315297 (#1 of 1): Resource leak (RESOURCE_LEAK):
Confirmed leaks in error paths. Added the leaked arrays to the
ERR_EXIT macro to ensure they are freed.
Fix CID 1315296 (#1 of 1): Resource leak (RESOURCE_LEAK):
Confirmed leak in error paths. Both the oversub and reqs arrays are
leaked. Free these arrays on error.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>