Keep track of the connected procs in vader_add_procs().
Otherwise, the same rank will reconnect the same shmem
segment (rank 0+...) multiple times instead of the next
one as intended.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit f69c8d6819)
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig`
- correct the value of `blocking_fence`
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
(cherry picked from commit a07a83d189)
This commit changes the behavior of the individual sharedfp component. If
the component cannot create either the datafile or the metadatafile during File_open,
no error is being raised going forward. This allows applications that do not use shared
file pointer operations to continue execution without any issue.
If the user however subsequently calls MPI_File_write_shared or similar operations, an error
will be raised.
Fixes issue #7429
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit df6e3e503a)
This commit increaes the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachement value can be set by the user on other
systems.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 54c8233f4f)
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit addresses two issues in osc/rdma:
1) It is erroneous to attach regions that overlap. This was being
allowed but the standard does not allow overlapping attachments.
2) Overlapping registration regions (4k alignment of attachments)
appear to be allowed. Add attachment bases to the bookeeping
structure so we can keep better track of what can be detached.
It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.
References #7384
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 6649aef8bd)
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
- pgcc18 defines __GNUC__ similar to Intel compilers. So we must
check for pgi higher up, or else configury will mistake
it for gcc.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit 14785deb3c)
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1
Fixes#7362
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 907ad854b4)
- increase number of max segments to allow application be launched
on some Ubuntu configurations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit f742f289ea)
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS. Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).
So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.
This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit ab398f4b9a)
These -D's are for C compilation, not Fortran compilation. Remove
this useless statement.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4a47a5a8e)
Both opal_hwloc_base_get_relative_locality() and _get_locality_string()
iterate over hwloc levels to build the proc locality information.
Unfortunately, NUMA nodes are not in those normal levels anymore since 2.0.
We have to explicitly look a the special NUMA level to get that locality info.
I am factorizing the core of the iterations inside dedicated "_by_depth"
functions and calling them again for the NUMA level at the end of the loops.
Thanks to Hatem Elshazly for reporting the NUMA communicator split failure
at https://www.mail-archive.com/users@lists.open-mpi.org/msg33589.html
It looks like only the opal_hwloc_base_get_locality_string() part is needed
to fix that split, but there's no reason not to fix get_relative_locality()
as well.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit ea80a20e10)
Fix the C types for the following:
* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE
There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree). We erroneously had several of these array-like sentinel
values be "array-like" values in C. E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)". On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.
That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.
(**) NOTE: This code has been wrong for years. This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same. But it caused linker warning messages,
and even caused errors in some cases.
Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.
This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):
% mpifort -fuse-ld=lld /dev/null
ld.lld: error: corrupt input file: version definition index 0 for symbol
mpi_fortran_argv_null_ is out of bounds
>>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
...
If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.
This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd).
Fixesopen-mpi/ompi#7209
Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5609268e90)