The data endpoint was not being set correctly for local peers in some
cases. This commit fixes the bug and cleans the associated code to
simplify the logic.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This now passes the loop test, and so we believe it resolves the random hangs in finalize.
Changes in PMIx master that are included here:
* Fixed a bug in the PMIx_Get logic
* Fixed self-notification procedure
* Made pmix_output functions thread safe
* Fixed a number of thread safety issues
* Updated configury to use 'uname -n' when hostname is unavailable
Work on cleaning up the event handler thread safety problem
Rarely used functions, but protect them anyway
Fix the last part of the intercomm problem
Ensure we don't cover any PMIx calls with the framework-level lock.
Protect against NULL argv comm_spawn
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
make NBC_Handle (almost) an internal structure created
by NBC_Schedule_request()
use a local variable instead of what was previously handle->tmpbuf
Refs open-mpi/ompi#3487
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
update the cartesian communicator based grouping strategy to match the other
algorithms used in the aggregator selection process.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
the cart_based_grouping aggregator strategy was not correctly updated
during the last major rewrite of the aggregator selection algorithm.
It is also not supposed to be called from file_open (but from
file_set_view).
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
the 'hostname' command might not be available on some platforms
such as Fedora Core 26, so mimick config/libtool.m4 and fallback
to 'uname -n' if needed
Refs. #3680
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
* Add work around support for the following missing ops in ROMIO 3.1.4
- `MPI_File_iread_at_all`
- `MPI_File_iwrite_at_all`
- `MPI_File_iread_all`
- `MPI_File_iwrite_all`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* Protects us from segv when ROMIO 314 is selected and one of the
following operations is called:
- MPI_File_iread_at_all
- MPI_File_iwrite_at_all
- MPI_File_iread_all
- MPI_File_iwrite_all
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
This commit adds code to allow support for the info assertions added
by mpi-forum/mpi-issues#11. The assertions added are:
mpi_assert_no_any_tag, mpi_assert_no_any_source,
mpi_assert_exact_length, and mpi_assert_allow_overtaking.
This commit also adds support for the mpi_assert_no_any_source and
mpi_assert_allow_overtaking info keys to the ob1 pml.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Hostname and PID are output as a message prefix in many places in
our code. Their printf-formats were either `[%s:%d]` or `[%s:%05d]`.
This commit changes `[%s:%d]` to `[%s:%05d]`. The latter was more
widely used in our code (including OPAL output system and the signal
handler).
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
This commit expands the effect of the MCA parameter `opal_abort_delay`
to the OPAL signal handler. This allows attaching of a debugger on
segmentation fault etc. before quitting the job.
The sleep code is moved to the `opal_delay_abort` function from the
`ompi_mpi_abort` and `oshmem_shmem_abort` functions for code cleanup.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
Example (using MPI_ORDER_C so the below has 6 rows of 4 ints to parcel out)
size = 4;
rank = 0;
ndims=2;
gsizes[0] = 6;
gsizes[1] = 4;
distribs[0] = MPI_DISTRIBUTE_CYCLIC;
distribs[1] = MPI_DISTRIBUTE_BLOCK;
dargs[0] = 2;
dargs[1] = 2;
psizes[0] = 2;
psizes[1] = 2;
MPI_Type_create_darray(size, rank, ndims,
gsizes, distribs, dargs, psizes,
MPI_ORDER_C, MPI_INT, &mydt);
Expectation for the layout:
inner dimension (1) is
4 items (ints) distributed block over 2 ranks with 2 items each
eg for rank 0: [ x x . . ]
outer dimension (0) is:
6 items (the above [ x x . .]) cyclic over 2 ranks with 2 items each
eg for rank 0:
[ x x . . ] : offset=0 bytes=8
[ x x . . ] : ofset=16 bytes=8
[ . . . . ]
[ . . . . ]
[ x x . . ] : offset=64 bytes=8
[ x x . . ] : offset=80 bytes=8
Or more specifically a stream of ints 0,1,2,3,4,5,6,7 sent into that
type should be
[ 0 1 . . ]
[ 2 3 . . ]
[ . . . . ]
[ . . . . ]
[ 4 5 . . ]
[ 6 7 . . ]
The data was laying out though as
[ 0 1 2 3 ]
[ . . . . ]
[ . . . . ]
[ . . . . ]
[ 4 5 6 7 ]
[ . . . . ]
because the recursive construction inside the block() function (which
creates the smaller row datatype [ x x . . ]) wasn't setting the extent
of that type.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
This commit adds the `req_start` member to the `ompi_request_t` struct.
The `MPI_START` and `MPI_STARTALL` routines call this callback function
instead of `MCA_PML_CALL(start(...))`. So components that return
persistent request must set this member to their request objects.
`mca_pml_base_module_t::pml_start` is not deleted because
`MCA_PML_CALL(start(...))` is still used elsewhere across OMPI.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
rename ompi/mpi/fortran/base/strings.h so it does not get pulled
when /usr/include/strings.h is expected.
Refs open-mpi/ompi#3639
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Convert the predefined MPI object padding to a fixed number of bytes
(vs. a multiple of sizeof(void*)) so that the padding is the same size
between 32 and 64 bit builds. I.e., we won't have a situation where
we've run out of padding in 32 bit builds but still have more space
available in 64 bit builds.
Fixes#3610
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit moves the info subscribe for the blocking_fence to after
the global_state is allocated and moves setting win->w_osc_module to
before the info subscribe for alloc_shared_contig. This fixes a SEGV
caught by MTT.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Without this change, the directory of `javadoc` command must be
included in the `PATH` environment variable at `make`-time.
Paths of `javac`, `javah`, and `jar` commands are detected in
`configure`. So the path of `javadoc` also should be detected.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
a buffer defined by (buf, count, dt)
will have data starting at buf+offset and ending len bytes later with
len = opal_datatype_span(&dt.super, count, &offset);
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This commit removes a nonexistent function that was causing build
problems under certain environments.
Reference #3442
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
A component with implementation of R. Rabenseifner's algorithm for Reduce and Allreduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by a gather or an allgather.
Current limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators onl
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
coll/spacc: Modify implementation to use `ompi_coll_base_sendrecv()`
Replace irecv() + isend() + ompi_request_wait() to ompi_coll_base_sendrecv().
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
origin_datatype and target_datatype might be different and hence have different extent,
so use either origin_extent or target_extent when appropriate.
Refs open-mpi/ompi#3569
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
All other man pages don't have an empty line after
the "! or the older form: INCLUDE 'mpif.h'" line
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
The osc_rdma_get_remote_segment() has the 3rd and 4th args as
* target_disp
* length
which it uses to determine if the rdma falls within the bounds of
the window or not (actually it only checks the upper bound, but I'm
okay with that).
Anyway the caller previously was passing in the length argument as
target_datatype->super.size * target_count
which which doesn't really represent the number of bytes after target_disp
for which data exists. In particular I could create a datatype as
{ disp -4, len 4 } and use target_disp 4
and that would be bytes 0-3 of the window where the original code
would think it was bytes 4-7 and could abort at the range check.
Ive changed it to use the opal_datatype_span() function.
Signed-off-by: Mark Allen <markalle@us.ibm.com>