Per https://github.com/open-mpi/ompi/issues/5031, if the user didn't specify a particular PMIx installation, then default back to the internal version if it is newer than the discovered external one. PMIx doesn't yet provide a full signature so we have to just get as close as possible for now.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1e6aaf7f226f5a4d940e544079e3977229746c11)
- there worked progress was missed on startup which caused hang
on one of ranks
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a081fba0465e0e03472fc45c9a4a7154f539e3f6)
Due to decreasing support by vendors/other orgs for the OpenIB BTL,
only look for iWarp/RoCE devices by default. Allow IB HCAs
with ports configured for ethernet.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
OFI BTL uses context for completion but never ask for it in
fi_getinfo(3). This commit makes sure that we always ask for FI_CONTEXT
to eliminate any potential error.
Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
OMPIO now uses the correct delete function depending on the fs
mca_common_ompio_file_delete now works this way instead
of calling POSIX unlink:
- create a minimal file handle with the given file name
- select the best fs component using this file handle
- call the component-specific file delete function
Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
It is needed because the fs components might be queried due to a MPI_File_delete call.
And in this case, we don't have a communicator value.
Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
instead of invoking ompi_request_test_all(), that will end up
calling opal_progress() recursively, manually check the status
of the requests.
the same method is used in ompi_comm_request_progress()
Refs open-mpi/ompi#3901
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Blocking `MPI_SCATTER` and `MPI_SCATTERV` were fixed in 506d0e96f4
but noblocking `MPI_ISCATTER` and `MPI_ISCATTERV` were not fixed yet.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
Move memory hooks init (for request based operation) in osc ucx to window
creation time, to avoid performance issue in MPI initialization.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
This commit fixes two bugs in the RMA/atomic emulation code:
1) Fix a fragment leak when using AMO emulation.
2) Always initialize the single-copy emulation code. This is required
to use the AMO support.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a crash that occurs when using btl/vader as an RDMA
btl. This btl supports using CPU atomics and does not support using
the btl for self communication so we must use the local memory
optimizations in osc/rdma.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Data transferred by `MPI_BSEND` may corrupt if all of the following
conditions are met.
- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL`
for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND`
returns.
The problem is in the way of pending a send request in ob1 PML.
The `mca_pml_ob1_send_request_start_copy` function retruns
`OMPI_ERR_OUT_OF_RESOURCE` if `mca_bml_base_alloc` function returns
`des = NULL`. In this case, the send request is added to the
`send_pending` list and `MPI_BSEND` returns immediately. Next time
the `mca_pml_ob1_send_request_start_copy` function tries sending,
the user buffer may have been overwritten by the MPI program.
Call hierarchy of `MPI_BSEND`:
```
MPI_Bsend
mca_pml_ob1_send
if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
mca_pml_ob1_isend
MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
mca_pml_ob1_send_request_start_seq
mca_pml_ob1_send_request_start_btl
if (size <= eager_limit)
if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
mca_pml_ob1_send_request_start_copy
mca_bml_base_alloc
btl_alloc
if (OMPI_ERR_OUT_OF_RESOURCE == rc)
add_request_to_send_pending
ompi_request_free
```
To solve this problem, we should save the data to the buffer
attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`.
This problem was introduced by ob1 optimization (commits 2b57f422
and a06e491c) in v1.8 series.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>