OMPI can only support PMIx v3 and above. PRRTE requires at least PMIx
v4, so protect against the case where OMPI is built against an external
PMIx v3.
Fix check of PMIx_Init return code for singleton operations.
Ensure that the PMIx framework gets properly opened.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Per the devel mailing list, we discussed the need/desirability of this change. Here is the logic behind not including it:
If you call "hwloc_topology_load", then hwloc merrily does its discovery and slams many-core systems. If you call "opal_hwloc_get_topology", then that is fine - it checks if we already have it, tries to get it from PMIx (using shared mem for hwloc 2.x), and only does the discovery if no other method is available.
We previously decided to let those who need the topology call "opal_hwloc_get_topology" to ensure the topo was available so that we don't load it unless someone actually needs it - in the case where it isn't available via PMIx, this avoids paying the startup time and memory footprint penalties for no reason. The function is protected so it will simply return SUCCESS if the topology is already defined.
After discussion, it was decided to stick with that "only setup the topology if someone actually needs it" approach. Hence, we will not blanket init the topology, and the mtl/ofi component will call opal_hwloc_get_topology to ensure the topo has been defined prior to using it.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
Use "prte" instead of "prun" for proxy execution of cmds like mpirun.
This avoids the fork/exec-rendezvous complexities and should result in
more reliable operation.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Restore missing call to get_topology - others were doing it in their
components as repeated calls just return success, but let's ensure it is
always present.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Extend the PMIx modex recv macros to cover the full set of
immediate/optional combinations. If PMIx_Init cannot reach a server,
then declare the MPI proc to be a singleton.
Provide full support for info values via PMIx
Catch all the values used in the "info" area of OMPI using data
available from PMIx instead of via envars. Update PMIx and PRRTE to sync
with their capabilities.
PMIx
- ensure cleanup of fork/exec children
- fix bug in gds/hash that left app info off of list
PRRTE
- fix multi-app bugs
- port setup_child logic from orte
- OMPI env changes
- set app->first_rank
- ensure common hostname across prun, prte, and pmix
- Fix "nolocal" support
Silence a warning from btl/vader
Signed-off-by: Ralph Castain <rhc@pmix.org>
The fence logic in MPI_Init got messed up somehow such that we were
always executing a fence, which is not desirable. The logic is supposed
to be:
* if async fence is requested and we are not collecting data, then do
not fence at all
* if async fence is requested and we are collecting data, then execute
the fence in the background - wait for completion at the end of MPI_Init.
* if async fence is not requested, then execute a blocking fence at that
point, collecting data as directed. Note that we cannot actually do a
blocking fence as we need to cycle the event library via opal_progress
as the PMIx progress thread is tied to the OMPI event base.
Signed-off-by: Ralph Castain <rhc@pmix.org>
orte is gone, and we don't want to require other
RM's to either use a prrte specific env, or to
set their own.
OMPI_MCA_orte_ess_num_procs -> OMPI_MCA_num_procs
OMPI_MCA_orte_cpu_type -> OMPI_MCA_cpu_type
See PRRTE PR's:
https://github.com/openpmix/prrte/pull/443https://github.com/openpmix/prrte/pull/440
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
This commit fixes an issue with the include usage in some
ompi source files. These source files are using the <> form
of include when the "" form is correct (as these are internal,
**not** system headers).
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
- Port memchecker call from a1d502c.
- Remove unused memcheck macro variables.
- Some code readability improvements.
- Remove some stray +1's in dynamic comm cleanup.
- Re-add OPAL_ENABLE_DEBUG macro to osc header.
- Cleanup some printf's, and includes.
- Refactor cleanup of dpm_disconnect_objs.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
- fix handling of contiguous datatypes with a non-zero true lower bound
- fix handling of datatypes using block of non contiguous predefined datatypes
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
OpenPMIx fills in a variety of info when it detects that we are in
singleton mode. Best way of detecting it is to look for the "singleton"
at the beginning of the returned nspace.
Make the modex recvs optional so we don't bounce up to the server and
then to the host trying to retrieve job-level info that must be given to
us at job start.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Temporary solution for the PML inconsistency issue discussed in #7475.
This patch address 2 things: first it make the PMIx key optional so that
if we are not in a full modex mode we don't do a direct modex, and
second it get the PML info from the vpid 0 instead of from the local
rank.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
This commit contains the following changes:
The C++ bindings were removed from the standard in MPI-3.0. This
commit removes the entirety of the C++ bindings as well as the
support configury.
Removes all references to C++ from the man pages. This includes the
bindings themselves, all references to what C++ bindings return,
all not-available comments, and differences between C++ and other
language bindings.
If the user passes --enable-mpi-cxx, --enable-mpi-cxx-seek, or
--enable-cxx-exceptions, print a warning message an abort configure.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
1- Remove the common symbols issue: global variable not initialized. (#7424)
Move the variables to local scope within the set_info function.
2- Remove GPFS hints using datashipping: not used anymore
3- Redirect output stream to corresponding fs framework.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
The communicator might be not existent yet when mca_fs_gpfs_component_file_query() is called.
Therefore, we need to check it first before calling brodcast function.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
This commit changes the behavior of the individual sharedfp component. If
the component cannot create either the datafile or the metadatafile during File_open,
no error is being raised going forward. This allows applications that do not use shared
file pointer operations to continue execution without any issue.
If the user however subsequently calls MPI_File_write_shared or similar operations, an error
will be raised.
Fixes issue #7429
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
If someone gives us a namespace that doesn't easily translate to an
integer, we have to create a mechanism for working around the
disconnect. PRRTE has been updated to give us a flag so we know we were
"natively" launched. If we don't see it, then fall back to generating a
hash of the nspace as our jobid. We then have to translate back/forth
between nspace and jobid using a lookup table.
Probably not the right long-term solution, but hopefully helps get us
thru for a bit.
Includes update of PRRTE pointer
Signed-off-by: Ralph Castain <rhc@pmix.org>
This commit fixes an issue with non-debug builds where adding an
attachment to the attachment list doesn't actually happen. This
causes all MPI_Win_detach calls to fail. The call was within an
assert which is optimized out in optimized builds.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit increaes the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachement value can be set by the user on other
systems.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit addresses two issues in osc/rdma:
1) It is erroneous to attach regions that overlap. This was being
allowed but the standard does not allow overlapping attachments.
2) Overlapping registration regions (4k alignment of attachments)
appear to be allowed. Add attachment bases to the bookeeping
structure so we can keep better track of what can be detached.
It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.
References #7384
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Fix a couple of spots in OMPI to resolve warnings. The one in comm_cid
in particular may be responsible for some/all of the comm_spawn issues
as it was passing an incorrect pointer to a macro, thus causing memory
corruption.
Update PRRTE and PMIx to deal with v3/v4 differences.
Signed-off-by: Ralph Castain <rhc@pmix.org>
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig`
- correct the value of `blocking_fence`
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.
remove orte: misc fixes
- UCX fixes
- VPATH issue
- oshmem fixes
- remove useless definition
- Add PRRTE submodule
- Get autogen.pl to traverse PRRTE submodule
- Remove stale orcm reference
- Configure embedded PRRTE
- Correctly pass the prefix to PRRTE
- Correctly set the OMPI_WANT_PRRTE am_conditional
- Move prrte configuration to the end of OMPI's configure.ac
- Make mpirun a symlink to prun, when available
- Fix makedist with --no-orte/--no-prrte option
- Add a `--no-prrte` option which is the same as the legacy
`--no-orte` option.
- Remove embedded PMIx tarball. Replace it with new submodule
pointing to OpenPMIx master repo's master branch
- Some cleanup in PRRTE integration and add config summary entry
- Correctly set the hostname
- Fix locality
- Fix singleton operations
- Fix support for "tune" and "am" options
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS. Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).
So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.
This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Fix the C types for the following:
* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE
There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree). We erroneously had several of these array-like sentinel
values be "array-like" values in C. E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)". On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.
That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.
(**) NOTE: This code has been wrong for years. This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same. But it caused linker warning messages,
and even caused errors in some cases.
Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.
This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):
% mpifort -fuse-ld=lld /dev/null
ld.lld: error: corrupt input file: version definition index 0 for symbol
mpi_fortran_argv_null_ is out of bounds
>>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
...
If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.
This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd).
Fixesopen-mpi/ompi#7209
Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Instead of triggering the fault early in the initialization process, do
a serialized initialization and report the error once all the supporting
infrastructure is up and running.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
OpenMPI doesn't compile anymore with IME because the header
file "ompi/mca/fs/base/base.h" needs to be include in every
file where mca_fs_base_get_mpi_err() is used.
Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
The opal_gethostname() function provides a more robust mechanism
to retrieve the hostname than gethostname(), which can return
results that are not null-terminated, and which can vary in its
behavior from system to system.
opal_gethostname() just returns the value in opal_process_info.nodename;
this is populated in opal_init_gethostname() inside opal_init.c.
-Changed all gethostname calls in opal subtree to opal_gethostname
-Changed all gethostname calls in orte subtree to opal_gethostname
-Changed all gethostname calls in ompi subdir to opal_gethostname
-Changed all gethostname calls in oshmem subdir to opal_gethostname
-Changed opal_if.c in test subdir to use opal_gethostname
-Changed opal_init.c to include opal_init_gethostname. This function
returns an int and directly sets opal_process_info.nodename per
jsquyres' modifications.
Relates to open-mpi#6801
Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
Some versions of Libfabric contain a bug in EFA where FI_REMOTE_COMM and
FI_LOCAL_COMM are not advertised. In order to workaround this, we need to call
fi_getinfo() without those capability bits to see if EFA is available first.
Also move around some of the provider include/exclude list logic so we can skip
this workaround if applicable.
Signed-off-by: Robert Wespetal <wesper@amazon.com>
Make sure to get an RDM provider that can provide both local and
remote communication. We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.
Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version. Use this check in two
places:
1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
>= v1.1, but the usnic component was checking for >= v1.3. So
update the btl/usnic configury to use the new macro and check for
>= v1.3.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Add MCA parameter to allow support for message logging even when the MPI
library is initialized with support for thread multiple.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Add a comment about the stages in the vprotocols initialization and why
we cant use the threading provided during the PML initialization.
Instead, use the OMPI internal threading status to prevent the use of
message logging in multi-threaded applications.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Change the provider include and exclude list name comparison check to
ignore case. The UDP provider's name is uppercase and was being selected
despite being in the exclude list.
Signed-off-by: Robert Wespetal <wesper@amazon.com>
Squash compiler warning due to whitespace/brace problems.
The code block from lines 829-839 was improperly indented, which led to
both the code being confusing and a compiler warning. Comparing this code to
the current version in the MPICH repo made it clear that the code was simply
improperly indented. Fixing the indentation both makes the code readable and
squashes the compiler warning.
Signed-off-by: Maxwell Coil <mcoil@nd.edu>
Squash compiler warning.
ROMIO is third-party software but has an annoying compiler warning;
this is the minimum distance fix.
Signed-off-by: William Bailey <wbailey2@nd.edu>
I removed the implementation and/or prototypes of all unused functions defined for all components.
To reduce recurrent code, I created functions under base for the management of error codes and setting of file permission and amode.
Then, I replaced these recurrent code by those function for all components.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
add a missing header file
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>