According to MPI-3.1 p.52 and p.53 (quoted below), a request
created by `MPI_*_INIT` but not yet started by `MPI_START` or
`MPI_STARTALL` is inactive; therefore `MPI_WAIT` and its friends
must return immediately if such a request is passed.
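The current implementation hangs in `MPI_WAIT` and its friends
in such a case because a persistent request is initialized with
`req_complete = REQUEST_PENDING`. This commit fixes that
initialization so that `MPI_WAIT` and its friends return
immediately for an inactive persistent request.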
Also, this commit fixes the internal requests used in `MPI_PROBE`
and `MPI_IPROBE`, which were wrongly marked as persistent.
MPI-3.1 p.52:
We shall use the following terminology: A null handle is a handle
with value MPI_REQUEST_NULL. A persistent request and the handle
to it are inactive if the request is not associated with any ongoing
communication (see Section 3.9). A handle is active if it is neither
null nor inactive. An empty status is a status which is set to return
tag = MPI_ANY_TAG, source = MPI_ANY_SOURCE, error = MPI_SUCCESS, and
is also internally configured so that calls to MPI_GET_COUNT,
MPI_GET_ELEMENTS, and MPI_GET_ELEMENTS_X return count = 0 and
MPI_TEST_CANCELLED returns false. We set a status variable to empty
when the value returned by it is not significant. Status is set in
this way so as to prevent errors due to accesses of stale information.
MPI-3.1 p.53:
One is allowed to call MPI_WAIT with a null or inactive request
argument. In this case the operation returns immediately with empty status.
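The rule can be modelled with a heavily simplified request type. Every `sk_*` name below is hypothetical and invented for illustration; it is not Open MPI's internal request API. The sketch assumes, as the hang described above suggests, that the wait loop polls a completion flag before it looks at the request state, so an inactive request that is flagged as pending never leaves the loop:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified request model (not Open MPI's ompi_request_t). */
#define SK_ANY_SOURCE (-1)
#define SK_ANY_TAG    (-1)
#define SK_SUCCESS    0

enum sk_state { SK_INACTIVE, SK_ACTIVE };

struct sk_status {
    int source;
    int tag;
    int error;
    int count;
};

struct sk_request {
    bool persistent;
    enum sk_state state;
    bool complete;            /* completion flag polled by sk_wait() */
    struct sk_status status;
};

/* The "empty status" from MPI-3.1 p.52. */
static void sk_set_empty_status(struct sk_status *st)
{
    st->source = SK_ANY_SOURCE;
    st->tag = SK_ANY_TAG;
    st->error = SK_SUCCESS;
    st->count = 0;
}

/* Models MPI_*_INIT: the new request is inactive and must already count
   as complete. Setting complete = false here reproduces the hang. */
static void sk_persistent_init(struct sk_request *req)
{
    req->persistent = true;
    req->state = SK_INACTIVE;
    req->complete = true;
    sk_set_empty_status(&req->status);
}

/* Models MPI_START: the request becomes active and pending. */
static void sk_start(struct sk_request *req)
{
    assert(req->persistent && req->state == SK_INACTIVE);
    req->state = SK_ACTIVE;
    req->complete = false;
}

/* Toy progress engine: finishes an active request in one step and never
   touches an inactive one. */
static void sk_progress(struct sk_request *req)
{
    if (req->state == SK_ACTIVE && !req->complete) {
        req->status.source = 0;
        req->status.tag = 7;
        req->status.error = SK_SUCCESS;
        req->status.count = 1;
        req->complete = true;
    }
}

/* Models MPI_WAIT: spins on the completion flag first, so an inactive
   request left "pending" would loop here forever. */
static void sk_wait(struct sk_request *req, struct sk_status *st)
{
    while (!req->complete) {
        sk_progress(req);
    }
    if (req->persistent && req->state == SK_INACTIVE) {
        sk_set_empty_status(st);   /* inactive: return at once, empty status */
        return;
    }
    *st = req->status;
    if (req->persistent) {
        req->state = SK_INACTIVE;  /* ready for the next sk_start() */
    }
}
```

Changing `req->complete = true` to `false` in `sk_persistent_init()` leaves the first `sk_wait()` spinning forever, which mirrors the reported hang.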
Signed-off-by: KAWASHIMA Takahiro <>
Adds the new API `hcoll_context_free`, which resolves the issues
observed with the ctx cache and `group_destroy_notify`.
Signed-off-by: Valentin Petrov <>
`struct mca_pml_ob1_comm_proc_t`, which is allocated per
connected rank in a communicator, had two padding holes, after
`expected_sequence` and after `send_sequence`, due to alignment.
By reordering the members, the size of
`mca_pml_ob1_comm_proc_t` is reduced by 8 bytes on 64-bit platforms.
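A minimal sketch of the effect, assuming a typical LP64 ABI (8-byte, 8-aligned pointers, 4-byte `int32_t`, 2-byte `uint16_t`). The member types and the `proc`/`frags` members are hypothetical; this is not the real layout of `mca_pml_ob1_comm_proc_t`, only an illustration of how two interior padding holes shrink to one small tail pad once the narrow members are grouped:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical members; only the mix of field widths matters here. */
struct proc_padded {
    uint16_t expected_sequence; /* 2 bytes + 6 bytes padding before proc */
    void *proc;                 /* 8 bytes */
    int32_t send_sequence;      /* 4 bytes + 4 bytes padding before frags */
    void *frags;                /* 8 bytes */
};                              /* 32 bytes on a typical LP64 ABI */

/* Same members, widest first: the narrow fields share one 8-byte slot. */
struct proc_reordered {
    void *proc;                 /* offset 0 */
    void *frags;                /* offset 8 */
    int32_t send_sequence;      /* offset 16 */
    uint16_t expected_sequence; /* offset 20, then 2 bytes tail padding */
};                              /* 24 bytes on a typical LP64 ABI */
```

On such an ABI the padded version occupies 32 bytes and the reordered one 24, the same 8-byte saving per connected rank described above.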
Signed-off-by: KAWASHIMA Takahiro <>
This fixes a bug, reported in-house, that occurs with this component. It is triggered when the amounts of data assigned to different aggregators differ greatly, so that the aggregators need different numbers of internal iterations to handle them.
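To make the iteration mismatch concrete, assume each aggregator processes its share in rounds bounded by a fixed-size cycle buffer. The helper names are hypothetical, and this illustrates the symptom rather than the fix in this commit. The round count is a ceiling division, so aggregators with very different amounts of data need very different counts, and any step in which all aggregators must take part in the same round would have to run for the maximum:

```c
#include <assert.h>
#include <stddef.h>

/* Rounds needed to move `bytes` through a buffer of `cycle_buffer_size`
   bytes (ceiling division). Hypothetical helper. */
static size_t cycles_needed(size_t bytes, size_t cycle_buffer_size)
{
    assert(cycle_buffer_size > 0);
    return bytes / cycle_buffer_size + (bytes % cycle_buffer_size != 0);
}

/* Largest round count over all aggregators. Hypothetical helper. */
static size_t max_cycles(const size_t *bytes_per_aggr, size_t n,
                         size_t cycle_buffer_size)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        size_t c = cycles_needed(bytes_per_aggr[i], cycle_buffer_size);
        if (c > m) {
            m = c;
        }
    }
    return m;
}
```

For example, with a 64 KiB buffer an aggregator holding 1 MiB needs 16 rounds while one holding 10 KiB needs only 1.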
Signed-off-by: Edgar Gabriel <>
protect the mca_coll_libnbc_component.active_requests list with
the new mca_coll_libnbc_component.lock mutex.
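As a hedged illustration of the locking discipline, the sketch below guards a singly linked stand-in for `active_requests` with a `lock` mutex so that several threads can add and remove requests safely. It uses POSIX threads instead of Open MPI's own threading wrappers, and every `sk_*` name is hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the component's request list and lock. */
struct sk_req_node {
    struct sk_req_node *next;
};

struct sk_component {
    pthread_mutex_t lock;                /* guards active_requests */
    struct sk_req_node *active_requests;
};

static void sk_component_init(struct sk_component *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->active_requests = NULL;
}

static void sk_add_active(struct sk_component *c, struct sk_req_node *n)
{
    pthread_mutex_lock(&c->lock);
    n->next = c->active_requests;
    c->active_requests = n;
    pthread_mutex_unlock(&c->lock);
}

/* Unlinks n if present; returns 1 if it was found, 0 otherwise. */
static int sk_remove_active(struct sk_component *c, struct sk_req_node *n)
{
    int found = 0;
    pthread_mutex_lock(&c->lock);
    for (struct sk_req_node **pp = &c->active_requests; *pp != NULL;
         pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&c->lock);
    return found;
}

static size_t sk_count_active(struct sk_component *c)
{
    size_t n = 0;
    pthread_mutex_lock(&c->lock);
    for (struct sk_req_node *p = c->active_requests; p != NULL; p = p->next) {
        n++;
    }
    pthread_mutex_unlock(&c->lock);
    return n;
}

/* Thread entry point that adds a batch of nodes. */
struct sk_worker_arg {
    struct sk_component *c;
    struct sk_req_node *nodes;
    size_t n;
};

static void *sk_worker(void *p)
{
    struct sk_worker_arg *a = p;
    for (size_t i = 0; i < a->n; i++) {
        sk_add_active(a->c, &a->nodes[i]);
    }
    return NULL;
}
```

Every access, including the read-only count, takes the lock; an unlocked traversal could observe a half-updated `next` pointer while another thread is linking or unlinking a node.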
Thanks Jie Hu for the report
Signed-off-by: Gilles Gouaillardet <>
change the default value of the mca_io_ompio_cycle_buffer_size parameter in order to avoid accidental truncation of a file for very large individual operations.
Thanks to @cniethammer for reporting it.
Signed-off-by: Edgar Gabriel <>
- instead of coll_base_comm_get_reqs(2) for irecv/isend, use only
one request allocated on the stack and do an irecv/send
- instead of ompi_request_wait_all(2), simply use ompi_request_wait
Signed-off-by: Gilles Gouaillardet <>
this is generally done in mca_pml_ob1_recv_request_free(), but that function is not invoked
via mca_pml_ob1_recv(), so do it manually
Thanks Yvan Fournier for the report
Signed-off-by: Gilles Gouaillardet <>
* If (legal) non-uniform data type signatures are used in ibcast,
then the chosen algorithm may fail on the request and, in the worst
case, produce wrong answers.
* Add an MCA parameter that, by default, protects the user from this
scenario. If the user really wants to use it then they have to
'opt-in' by setting the following parameter to false:
- `-mca coll_libnbc_ibcast_skip_dt_decision f`
* Once the following Issues are resolved then this parameter can
be removed.
Signed-off-by: Joshua Hursey <>
Adds mapping of the MPI Fortran pair types (2INTEGER, 2REAL, 2DBLPREC)
to the corresponding hcoll dtypes.
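A sketch of such a mapping, assuming that default-kind Fortran `INTEGER`, `REAL` and `DOUBLE PRECISION` correspond to C `int`, `float` and `double`. The enum names are hypothetical placeholders; they are neither the MPI handles nor hcoll's actual datatype identifiers:

```c
#include <assert.h>

/* Hypothetical stand-ins for MPI_2INTEGER, MPI_2REAL, MPI_2DOUBLE_PRECISION. */
enum sk_fortran_pair { SK_F_2INTEGER, SK_F_2REAL, SK_F_2DBLPREC, SK_F_OTHER };

/* Hypothetical stand-ins for the collective library's pair dtypes. */
enum sk_coll_dtype {
    SK_DT_PAIR_INT,     /* { int, int } */
    SK_DT_PAIR_FLOAT,   /* { float, float } */
    SK_DT_PAIR_DOUBLE,  /* { double, double } */
    SK_DT_UNSUPPORTED   /* caller would use a non-hcoll path (assumption) */
};

static enum sk_coll_dtype sk_map_fortran_pair(enum sk_fortran_pair t)
{
    switch (t) {
    case SK_F_2INTEGER: return SK_DT_PAIR_INT;
    case SK_F_2REAL:    return SK_DT_PAIR_FLOAT;
    case SK_F_2DBLPREC: return SK_DT_PAIR_DOUBLE;
    default:            return SK_DT_UNSUPPORTED;
    }
}
```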
Signed-off-by: Valentin Petrov <>
recvreq->req_recv.req_base.req_type should always be set before invoking
MCA_PML_OB1_RECV_REQUEST_INIT(recvreq, ...); otherwise, the previous type
might still be set, and you could end up with MCA_PML_REQUEST_IMPROBE when
Thanks Chris Pattison for the report and test case.
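The bug pattern can be reproduced in isolation. Assuming, as the message implies, that request objects are reused and that the INIT step resets other fields but not the type, a stale type survives re-initialization unless it is set first. All names here are hypothetical stand-ins, not the ob1 structures or macros:

```c
#include <assert.h>

/* Hypothetical request kinds, loosely modelled on the PML request types. */
enum sk_req_type { SK_REQ_RECV, SK_REQ_IMPROBE };

struct sk_recv_request {
    enum sk_req_type type;  /* NOT touched by sk_recv_request_init() */
    int tag;
    int count;
};

/* Like an INIT macro that resets the other fields but leaves the type. */
static void sk_recv_request_init(struct sk_recv_request *r, int tag, int count)
{
    r->tag = tag;
    r->count = count;
}

/* Correct order: set the type explicitly, then initialize the rest. */
static void sk_prepare_recv(struct sk_recv_request *r, int tag, int count)
{
    r->type = SK_REQ_RECV;
    sk_recv_request_init(r, tag, count);
}
```

Calling `sk_recv_request_init()` on a recycled object whose type is still `SK_REQ_IMPROBE` leaves that stale type in place, whereas `sk_prepare_recv()` always yields a receive.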
* If an error is detected internal to libnbc (e.g., PML truncation error)
this patch makes sure that the request is completed and the `MPI_ERROR`
field is set appropriately.
* Make an attempt to clean up outstanding requests before returning.
- This is a "best attempt" since not all PMLs support canceling requests.
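A hedged sketch of that error path, using hypothetical `sk_*` types rather than libnbc's own. On an internal error, each outstanding sub-request is cancelled if the simulated PML supports it, the error is recorded in a field standing in for `MPI_ERROR`, and the request is marked complete so that a later wait returns instead of hanging. The return value counts sub-requests that could not be cancelled, reflecting the best-attempt nature of the cleanup:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_SUCCESS      0
#define SK_ERR_TRUNCATE 15   /* hypothetical error code */

struct sk_subreq {
    bool done;
    bool cancellable;        /* whether the simulated PML can cancel it */
    bool cancelled;
};

struct sk_nbc_request {
    bool complete;
    int mpi_error;           /* stands in for status.MPI_ERROR */
    struct sk_subreq *subs;
    size_t nsubs;
};

static bool sk_try_cancel(struct sk_subreq *s)
{
    if (!s->cancellable) {
        return false;        /* not every PML supports cancellation */
    }
    s->cancelled = true;
    s->done = true;
    return true;
}

/* Best-effort cleanup, then complete the request with the error.
   Returns how many sub-requests are still outstanding. */
static size_t sk_fail_request(struct sk_nbc_request *req, int err)
{
    size_t left = 0;
    assert(err != SK_SUCCESS);
    for (size_t i = 0; i < req->nsubs; i++) {
        struct sk_subreq *s = &req->subs[i];
        if (!s->done && !sk_try_cancel(s)) {
            left++;
        }
    }
    req->mpi_error = err;
    req->complete = true;
    return left;
}
```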
To optimize for MPI_IN_PLACE, data is sent from the receive buffer;
consequently, it must be sent with the receive type and count.
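The buffer, count and type selection can be sketched as follows. `SK_IN_PLACE` is a hypothetical sentinel standing in for `MPI_IN_PLACE`, and datatypes are reduced to plain `int` handles; this is not the code of the affected collective:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel playing the role of MPI_IN_PLACE. */
static const char sk_in_place_sentinel;
#define SK_IN_PLACE ((const void *)&sk_in_place_sentinel)

struct sk_send_args {
    const void *buf;
    int count;
    int type;                /* opaque datatype handle for the sketch */
};

static struct sk_send_args sk_select_send_args(const void *sendbuf, int sendcount,
                                               int sendtype, const void *recvbuf,
                                               int recvcount, int recvtype)
{
    struct sk_send_args a;
    if (sendbuf == SK_IN_PLACE) {
        /* The data lives in the receive buffer, so describe it with the
           receive count and type; sendcount/sendtype are ignored. */
        a.buf = recvbuf;
        a.count = recvcount;
        a.type = recvtype;
    } else {
        a.buf = sendbuf;
        a.count = sendcount;
        a.type = sendtype;
    }
    return a;
}
```

For the collectives that accept `MPI_IN_PLACE`, the standard says the send count and type are ignored, so they may hold arbitrary values and must not be used to describe data taken from the receive buffer.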
Thanks Josh Hursey for the report and test case
Refs open-mpi/ompi#2256