Update ORTE support for dynamic PMIx operations e.g., PMIx_Spawn
Update to track master
Ensure that --disable-pmix-dstore actually disables the dstore. Sync to a few debugger updates
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
This commit fixes errors in the lb and extent of darray datatypes. For
these datatypes the lb should be the start offset of the rank's data
in the array and the extent should be the size of the entire
datatype. In master the lb was always 0 and the extent was always to
small. This commit updates the call to opal_datatype_resize to set the
correct lb and fixes the extent calculation.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
According to the MPI-3.1 p.52 and p.53 (cited below), a request
created by `MPI_*_INIT` but not yet started by `MPI_START` or
`MPI_STARTALL` is inactive therefore `MPI_WAIT` or its friends
must return immediately if such a request is passed.
The current implementation hangs in `MPI_WAIT` and its friends
in such case because a persistent request is initialized as
`req_complete = REQUEST_PENDING`. This commit fixes the
initialization.
Also, this commit fixes internal requests used in `MPI_PROBE`
and `MPI_IPROBE` which was marked wrongly as persistent.
MPI-3.1 p.52:
We shall use the following terminology: A null handle is a handle
with value MPI_REQUEST_NULL. A persistent request and the handle
to it are inactive if the request is not associated with any ongoing
communication (see Section 3.9). A handle is active if it is neither
null nor inactive. An empty status is a status which is set to return
tag = MPI_ANY_TAG, source = MPI_ANY_SOURCE, error = MPI_SUCCESS, and
is also internally configured so that calls to MPI_GET_COUNT,
MPI_GET_ELEMENTS, and MPI_GET_ELEMENTS_X return count = 0 and
MPI_TEST_CANCELLED returns false. We set a status variable to empty
when the value returned by it is not significant. Status is set in
this way so as to prevent errors due to accesses of stale information.
MPI-3.1 p.53:
One is allowed to call MPI_WAIT with a null or inactive request
argument. In this case the operation returns immediately with empty
status.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
instead of compilation date __DATE__, use a MPI_Get_library_version() like string
Thanks Alastair McKinstry for the report
Fixesopen-mpi/ompi#2518
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
F90 types cannot be freed by the enduser as specified by the standard.
but since they are ompi_datatype_dup'ed from predefined datatypes,
they have to be explicitly free'd at finalize time in order
to avoid a memory leak.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
declare ompi_mpi_show_mca_params_file as NULL
so MPI_T_Init_thread() can be invoked without leaking memory
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Adds the new API hcoll_conetxt_free that resolves the issues
observed with the ctx cache and group_destroy_notify.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
`sturct mca_pml_ob1_comm_proc_t`, which is allocated per
connected rank in a communicator, had two paddings after
`expected_sequence` and `send_sequence` by alignments.
By changing the order of the members, the size of
`mca_pml_ob1_comm_proc_t` is reduced by 8 bytes on 64-bit
architectures.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
This fixes a bug reported in-house occuring with this component. It is triggered if the data assigned to different aggregators is highly differing, leading to different number of internal iterations required to handle it.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
protect the mca_coll_libnbc_component.active_requests list with
the new mca_coll_libnbc_component.lock mutex.
Thanks Jie Hu for the report
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
change the default value of the mca_io_ompio_cycle_buffer_size parameter in order to avoid accidental truncation of a file for very large individual operations.
Thanks to @cniethammer for reporting it.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
- instead of coll_base_comm_get_reqs(2) for irecv/isend, use only
one request allocated in the stack and do a irecv/send
- instead of ompi_request_wait_all(2), simpy ompi_request_wait
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
this is generally done in mca_pml_ob1_recv_request_free(), but this is not invoked
in via mca_pml_ob1_recv(), so do it manually
Thanks Yvan Fournier for the report
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
* If (legal) non-uniform data type signatures are used in ibcast
then the chosen algorithm may fail on the request, and worst case
it could produce wrong answers.
* Add an MCA parameter that, by default, protects the user from this
scenario. If the user really wants to use it then they have to
'opt-in' by setting the following parameter to false:
- `-mca coll_libnbc_ibcast_skip_dt_decision f`
* Once the following Issues are resolved then this parameter can
be removed.
- https://github.com/open-mpi/ompi/issues/2256
- https://github.com/open-mpi/ompi/issues/1763
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Adds mapping of the MPI Fortran pair types (2INTEGER, 2REAL, 2DBLPREC)
to the corresponding hcoll dtypes.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
MPI_Sizeof related stuff has been moved to their own files.
Remove MPI_Sizeof from Fortran interfaces when it cannot be built
(e.g. stock gcc 4.8 on CentOS 7)
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
recvreq->req_recv.req_base.req_type should always be set before invoking
MCA_PML_OB1_RECV_REQUEST_INIT(recvreq, ...) otherwise, the previous type
might be set, and you could end up with MPC_PML_REQUEST_IMPROBE when
MCA_PML_REQUEST_RECV is expected.
Thanks Chris Pattison for the report and test case.
Fixesopen-mpi/ompi#2275
* If an error is detected internal to libnbc (e.g., PML truncation error)
this patch makes sure that the request is completed and the `MPI_ERROR`
field is set approprately.
* Make an attempt to cleanup outstanding requests before returning.
- This is a "best attempt" since not all PMLs support canceling requests.
In order to optimize for MPI_IN_PLACE, data is sent from the receive buffer.
consequently, it should be sent with the receive type and count.
Thanks Josh Hursey for the report and test case
Refs open-mpi/ompi#2256
- pass field_mask to ucp_init().
- use non-blocking disconnect.
- recv() with pre-allocated request.
- call opal_progress() from iprobe() and improbe().
- use shift pattern in connect/disconnect.
Multiple conduits can exist at the same time, and can even point to the same base transport. Each conduit can have its own characteristics (e.g., flow control) based on the info keys provided to the "open_conduit" call. For ease during the transition period, the "legacy" RML interfaces remain as wrappers over the new conduit-based APIs using a default conduit opened during orte_init - this default conduit is tied to the OOB framework so that current behaviors are preserved. Once the transition has been completed, a one-time cleanup will be done to update all RML calls to the new APIs and the "legacy" interfaces will be deleted.
While we are at it: Remove oob/usock component to eliminate the TMPDIR length problem - get all working, including oob_stress
Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent()
to get the local and remote lower bound. For derived types like
subarray, true_lb is the correct offset for RDMA operations.
Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent()
to get the origin and target lower bound. For derived types like
subarray, true_lb is the correct offset for RDMA operations. Also,
instead of the extent use the size of the datatype.
As long as it is illegal to call MPI_T_init_thread() after MPI_Finalize(),
be gentle and release as much memory as possible in MPI_Finalize().
opal_cleanup() will be invoked again by the OPAL destructor, but will
do nothing since classes was set to NULL
This commit adds some glue code to support the C++ bindings and
updates the bindings to use the new glue code. This protects our
internal headers (which are C99) from C++. This is done as a quick
workaround to compilation errors when the legacy C++ bindings are
requested.
Fixesopen-mpi/ompi#2055
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* In open-mpi/ompi@f6f24a4f67 I missed
updating the library references for the wrapper compilers.
* Fixes the CXX wrapper compiler and CXX library is renamed as needed.
* Fixes the Java wrapper compiler and the Java library is renamed as needed.
* Add a configure time option to rename libmpi(_FOO).*
- `--with-libmpi-name=STRING`
* This commit only impacts the installed libraries.
Internal, temporary libraries have not been renamed to limit the
scope of the patch to only what is needed.
For example:
```shell
shell$ ./configure --with-libmpi-name=wookie
...
shell$ find . -name "libmpi*"
shell$ find . -name "libwookie*"
./lib/libwookie.so.0.0.0
./lib/libwookie.so.0
./lib/libwookie.so
./lib/libwookie.la
./lib/libwookie_mpifh.so.0.0.0
./lib/libwookie_mpifh.so.0
./lib/libwookie_mpifh.so
./lib/libwookie_mpifh.la
./lib/libwookie_usempi.so.0.0.0
./lib/libwookie_usempi.so.0
./lib/libwookie_usempi.so
./lib/libwookie_usempi.la
shell$
```
Relax CPU usage pressure from the application processes when doing
modex and barrier in ompi_mpi_init.
We see significant latencies in SLURM/pmix plugin barrier progress
because app processes are aggressively call opal_progress pushing
away daemon process doing collective progress.
--disable-io-ompio is a shortcut that disable the following
frameworks and components
- fbtl
- fcoll
- sharedfp
- common/ompio
- io/ompio
Fixesopen-mpi/ompi#1934
- move the mpi-io configury option into config/ompi_configure_options.m4
- add ompi/mca/common/ompio/configure.m4 so this component is not built when
Open MPI is configure'd with --disable-mpi-io
Fixesopen-mpi/ompi#2009
This commit fixes a typo in compare-and-swap when retrieving the
memory region associated with a displacement. It was erroneously 8
bytes instead of the datatype size. This can cause an incorrect RMA
range error when the compare-and-swap is less than 4 bytes from the
end of the region.
Fixedopen-mpi/ompi#2080
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
use MPI_MIN instead of MPI_MAX when appropriate, otherwise
a currently used CID can be reused, and bad things will likely happen.
Refs open-mpi/ompi#2061
This commit improves and corrects error handling. In
cases where existing objects are altered after a call
to ompi_java_exceptionCheck, the results of the exception
check method are checked. In the case of an exception,
memory is cleaned up and the code returns to Java without
altering existing objects.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
This commit updates the intercomm allgather to do a local comm bcast
as the final step. This should resolve a hang seen in intercomm
tests.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit adds support for using network AMOs for MPI_Accumulate,
MPI_Fetch_and_op, and MPI_Compare_and_swap. This support is only
enabled if the ompi_single_intrinsic info key is specified or the
acc_single_interinsic MCA variable is set. This configuration
indicates to this implementation that no long accumulates will be
performed since these do not currently mix with the AMO
implementation.
This commit also cleans up the code somwhat. This includes removing
unnecessary struct keywords where the type is also typedef'd.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit cleans up some code in the passive target path. The code
used the buffered frag control send path but it is more appropriate to
use the unbuffered one. This avoids checking structures that are
should not be in use in this path.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
store oshmem related per proc data in an oshmem_proc_data_t struct,
that is stored in the padding section of an ompi_proc_t
this data can be accessed via the OSHMEM_PROC_DATA(proc) macro
Fixesopen-mpi/ompi#2023
if sendbuf is equal to recvbuf, that should not be interpreted
as equivalent to MPI_IN_PLACE on the non root rank(s)
Thanks Valentin Petrov for the report
predefined datatypes such as MPI_LONG_DOUBLE_INT are not really contiguous,
so use span as returned by opal_datatype_span() instead of type extent,
otherwise data might be written above allocated memory.
Thanks Valentin Petrov for the report