ompi_mtl_portals4_get_endpoint() was incorrectly making a direct
call to ompi_mtl_portals4_add_procs(). Instead use the actve PML
to call add_procs(). If add_procs() fails, call ompi_rte_abort()
to terminate the job.
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
- Remove a duplicate "Configure host" output line
- No longer show the OPAL version. Since OPAL has never separated
into its own project, its version is always the same as Open MPI.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Even if the compiler supports an "alternate" short float type (e.g.,
_Float16), check to make sure that the compiler will correctly link
applications that perform mathematical operations on that type.
Carefully choose the mathematical test in the configure check to
ensure the mathematical operation is not removed by compiler
optimization (when setting CFLAGS=-O1 or higher).
Out of the box, clang 6.0.x and 7.0.x will fail to link applications
that try to perform addition (and other mathematical operations) on
_Float16 variables (an additional CLI flag is required to enable
software emulation of _Float16). If we detect a situation where the
type is supported by a sample program fails to link and the basename
of $CC is "clang", emit a warning and point the user to a relevant
README.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@fujitsu.com>
Keep all comments in the user-facing mpi.h.in as "old style" C
comments: /* */. This gives us maximum portability, just on the off
chance that a user's C compiler does not support //-style comments.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
1. __STDC_VERSION__ isn't necessarily defined (e.g., by C++
compilers). So check to make sure it is defined before we actually
check the value.
2. If we're in C++11 (or later), use static_assert().
3. Split the static assert macro in two macros:
* THIS_SYMBOL_WAS_REMOVED_IN_MPI30(...): Insert a valid expression
(i.e., 0, because it's only used with MPI_Datatype values, and
since MPI_Datatype is a pointer, 0 is a valid RHS expression)
before invoking the static assert so that we don't get a syntax
error instead of the actual static assert error.
* THIS_FUNCTION_WAS_REMOVED_IN_MPI30(...): No need for the valid
expression; just invoke the assert functionality.
Also remove an errant "\".
Thanks to Constantine Khrulev and Martin Audet for identifying the
issue and suggesting to use C11's static_assert().
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This is a fix based on a bugreport on github/mailing list from CGNS.
The core of the problem was that different processes entered different branches of
our aggregator selection logic, due to the fact that in some cases processes had
a matching file_view size and contiguous chunk size (thus assuming 1-D distribution),
and some processes did not (thus assuming 2-D distribution). The fix is to calculate
the avg. file view size across all processes and use this value, thus ensuring that
all processes enter the same branch.
Fixes issue #7809
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Once any number of events are read, return immediately, rather than
waiting for fi_cq_read() to return FI_EAGAIN or an error. This can
improve observed latency if the user application is in a blocking call
waiting for us to return. Deleting the while loop here also means
ofi_progress_event_count serves as an upper bound for the total number
of events read in a single call (with the while loop we might read far
more, as long as new events continue to arrive).
Signed-off-by: Eric Badger <eric@badgerio.us>
mpi-next)
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
Ordering must match fortran definition index for errhandlers, and we
don't want to change the old ones.
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
As discussed, a feature is being added to libpsm2 to correctly handle
the case where the library is opened by multiple OMPI transports in the same
process. (For example, the OFI BTL and the PSM2 MTL).
* Improved error message to indicate required libpsm2 version.
* Adds a test at autogen/configure time for the existence of
PSM2_LIB_REFCOUNT_CAP.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
The gather and scatter operations did not use the correct message size
(Only did datatype size * com size). This did not correctly reflect the
total message size and prevents fine tuning within a com size. This
patch multiplies the value by the number of elements sent.
Signed-off-by: William Zhang <wilzhang@amazon.com>
Change ompi_mtl_ofi_get_endpoint() to call the active PML's add_procs()
rather than the OFI MTL add_procs() directly when discovering a new
process during operation.
Functionally, this has no impact in correct operation. However, the
current behavior means that the heterogenous and active PML checks
are not being executed in the dynamic discovery case.
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
also add common verbose variable.
Note the verbosity thing is a little tricky owing to the way the MCA frameworks and components are registered and
and initialized. The BTL's are registered/initialized prior to the MTL components even getting registered.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Also added infrastructure to have developers write man pages in
Markdown (vs. nroff). Pandoc >=v1.12 is used to convert those
Markdown files into actual nroff man pages.
Dist tarballs will contain generated nroff man pages; we don't want to
require users to have Pandoc installed. Anyone who builds Open MPI
from a git clone will need to have Pandoc installed (similar to how we
treat Flex). You can opt out of Open MPI's Pandoc-generated man pages
by configuring Open MPI with --disable-man-pages. This will also
disable "make dist" (i.e., "make dist" will error if you configured
with --disable-man-pages).
Also removed the stuff to re-generate man pages.
This commit also:
1. Includes a new man page, written in Markdown
(ompi/mpi/man/man5/MPI_T.5.md) that contains Open MPI-specific
information about MPI_T.
2. Includes a converted ompi/mpi/man/man3/MPI_T_init_thread.3.md (from
MPI_T_init_thread.3in -- i.e., nroff) just to show that Markdown
can be used throughout the Open MPI code base for man pages.
3. Made the Makefiles in ompi/mpi/man/man?/ be full-fledged
Makefile.am's (vs. Makefile.extras that are designed to be included
in ompi/Makefile.am). It is more convenient to test generation /
installation of man pages when you can "make" and "make install" in
their respective directories (vs. doing a build / install for the
entire ompi project).
4. Removed logic from ompi/Makefile.am that re-generated man pages if
opal_config.h changes.
Other man pages -- hopefully all of them! -- will be converted to
Markdown over time.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
- added detection of new API into configuration
- added tag_send call implemented using new API
- added MPI_Send/MPI_Isend/MPI_Recv/MPI_Irecv implementations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs are available that share the same cpuset, the selection process
will create a list of all available NICs and make a selection based on
(local rank) % (number of NICs)
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
Deprecate the current OMPI-specific MPI_Info key definitions for
MPI_Comm_spawn and replace them with their PMIx equivalents. Issue a
deprecation/conversion warning as this is done. Also issue deprecation
warnings for options such as "ompi_non_mpi" that are no longer used.
Handle both cases where the user might pass either the PMIx attribute
name itself (e.g., "PMIX_MAPBY") or the string value of the attribute
(e.g., PMIX_MAPBY, which translates to "pmix.mapby"). This can only be
done for PMIx v4 and above, so protect that code.
Silence a couple of Coverity warnings and add a test along the way.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Consolidate the ompi_process_info and opal_process_info structs to
remove duplicate storage and conversion issues. Unwind some interweaving
of include files using opal.h. Silence a couple of warnings.
For now, set the arch to local if PMIX_ARCH is not found.
Signed-off-by: Ralph Castain <rhc@pmix.org>
For direct modex, all procs publish the selected pml module
and then at add_procs pml module for each proc is checked
against every other proc in the add_proc call.
For full modex, there is no change in functionality. Only Rank0
publishes its selected pml, all other procs in the add_proc call
check their selected pml against Rank0.
If pml's do not match, throw error and exit.
Signed-off-by: Dipti Kothari <dkothar@amazon.com>
adding PMIX_NUMA_RANK info to process metadata so that the local NUMA
rank can be accessed through the opal_process_info object.
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
The locality for remote procs is not provided as it is only a local
concept. Thus, you must _always_ use modex_recv_optional to ensure you
don't hang waiting for a response until dmodex times out.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Do some code cleanup in the connect/accept code. Ensure that the OMPI
layer has access to the PMIx identifier for the process. Add macros for
converting PMIx names to/from strings. Cleanup a few of the simple test
programs. Add a little more info to a btl/tcp error message.
Signed-off-by: Ralph Castain <rhc@pmix.org>
As indicated in the MPI3.2 document 14.3.10 page 599 line 1, the only
MPI error code possible is MPI_SUCCESS. All other errors must be in the
error class MPI_T_ERR*.
Fix the return of few pvar/cvar function that failed to correctly
convert to an MPI error code.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Found a handful of other URLs that weren't https-ized, so I updated
them, too (after verifying that they support https, of course).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Pass the correct ompi_proc_t and array length to
mca_pml_base_pml_check_selected() during dynamic modex.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
Add a framework to support different types of threading models including
user space thread packages such as Qthreads and argobot:
https://github.com/pmodels/argobotshttps://github.com/Qthreads/qthreads
The default threading model is pthreads. Alternate thread models are
specificed at configure time using the --with-threads=X option.
The framework is static. The theading model to use is selected at
Open MPI configure/build time.
mca/threads: implement Argobots threading layer
config: fix thread configury
- Add double quotations
- Change Argobot to Argobots
config: implement Argobots check
If the poll time is too long, MPI hangs.
This quick fix just sets it to 0, but it is not good for the
Pthreads version. Need to find a good way to abstract it.
Note that even 1 (= 1 millisecond) causes disastrous performance
degradation.
rework threads MCA framework configury
It now works more like the ompi/mca/rte configury,
modulo some edge items that are special for threading package
linking, etc.
qthreads module
some argobots cleanup
Signed-off-by: Noah Evans <noah.evans@gmail.com>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Change ompi_mtl_ofi_get_endpoint() to call the active PML's
add_procs() rather than the OFI MTL add_procs() directly when
discovering a new process during operation.
Functionally, this has no impact in correct operation. However,
the current behavior means that the heterogenous and active PML
checks are not being executed in the dynamic discovery case.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we had to therefore store it somewhere proc-local. Obviously, this ccarried a memory penalty for storing all those strings, and so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs.
Unfortunately, this still results in an 8-byte/proc memory cost as we have a char* pointer in the opal_proc_t that is contained in the ompi_proc_t so that we can store the hostname of the other procs if we fall below the cutoff. At scale, this can consume a fair amount of memory.
With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication.
All RM's are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving us 8-bytes/proc.
Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd version of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and free'd wherever used. This shouldn't be a burden as the hostname is only used in one of two circumstances:
(a) in an error message
(b) in a verbose output for debugging purposes
Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today.
Signed-off-by: Ralph Castain <rhc@pmix.org>
OMPI can only support PMIx v3 and above. PRRTE requires at least PMIx
v4, so protect against the case where OMPI is built against an external
PMIx v3.
Fix check of PMIx_Init return code for singleton operations.
Ensure that the PMIx framework gets properly opened.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Per the devel mailing list, we discussed the need/desirability of this change. Here is the logic behind not including it:
If you call "hwloc_topology_load", then hwloc merrily does its discovery and slams many-core systems. If you call "opal_hwloc_get_topology", then that is fine - it checks if we already have it, tries to get it from PMIx (using shared mem for hwloc 2.x), and only does the discovery if no other method is available.
We previously decided to let those who need the topology call "opal_hwloc_get_topology" to ensure the topo was available so that we don't load it unless someone actually needs it - in the case where it isn't available via PMIx, this avoids paying the startup time and memory footprint penalties for no reason. The function is protected so it will simply return SUCCESS if the topology is already defined.
After discussion, it was decided to stick with that "only setup the topology if someone actually needs it" approach. Hence, we will not blanket init the topology, and the mtl/ofi component will call opal_hwloc_get_topology to ensure the topo has been defined prior to using it.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
Use "prte" instead of "prun" for proxy execution of cmds like mpirun.
This avoids the fork/exec-rendezvous complexities and should result in
more reliable operation.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Restore missing call to get_topology - others were doing it in their
components as repeated calls just return success, but let's ensure it is
always present.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Extend the PMIx modex recv macros to cover the full set of
immediate/optional combinations. If PMIx_Init cannot reach a server,
then declare the MPI proc to be a singleton.
Provide full support for info values via PMIx
Catch all the values used in the "info" area of OMPI using data
available from PMIx instead of via envars. Update PMIx and PRRTE to sync
with their capabilities.
PMIx
- ensure cleanup of fork/exec children
- fix bug in gds/hash that left app info off of list
PRRTE
- fix multi-app bugs
- port setup_child logic from orte
- OMPI env changes
- set app->first_rank
- ensure common hostname across prun, prte, and pmix
- Fix "nolocal" support
Silence a warning from btl/vader
Signed-off-by: Ralph Castain <rhc@pmix.org>
The fence logic in MPI_Init got messed up somehow such that we were
always executing a fence, which is not desirable. The logic is supposed
to be:
* if async fence is requested and we are not collecting data, then do
not fence at all
* if async fence is requested and we are collecting data, then execute
the fence in the background - wait for completion at the end of MPI_Init.
* if async fence is not requested, then execute a blocking fence at that
point, collecting data as directed. Note that we cannot actually do a
blocking fence as we need to cycle the event library via opal_progress
as the PMIx progress thread is tied to the OMPI event base.
Signed-off-by: Ralph Castain <rhc@pmix.org>
orte is gone, and we don't want to require other
RM's to either use a prrte specific env, or to
set their own.
OMPI_MCA_orte_ess_num_procs -> OMPI_MCA_num_procs
OMPI_MCA_orte_cpu_type -> OMPI_MCA_cpu_type
See PRRTE PR's:
https://github.com/openpmix/prrte/pull/443https://github.com/openpmix/prrte/pull/440
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
This commit fixes an issue with the include usage in some
ompi source files. These source files are using the <> form
of include when the "" form is correct (as these are internal,
**not** system headers).
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
- Port memchecker call from a1d502c.
- Remove unused memcheck macro variables.
- Some code readability improvements.
- Remove some stray +1's in dynamic comm cleanup.
- Re-add OPAL_ENABLE_DEBUG macro to osc header.
- Cleanup some printf's, and includes.
- Refactor cleanup of dpm_disconnect_objs.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
- fix handling of contiguous datatypes with a non-zero true lower bound
- fix handling of datatypes using block of non contiguous predefined datatypes
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
OpenPMIx fills in a variety of info when it detects that we are in
singleton mode. Best way of detecting it is to look for the "singleton"
at the beginning of the returned nspace.
Make the modex recvs optional so we don't bounce up to the server and
then to the host trying to retrieve job-level info that must be given to
us at job start.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Temporary solution for the PML inconsistency issue discussed in #7475.
This patch address 2 things: first it make the PMIx key optional so that
if we are not in a full modex mode we don't do a direct modex, and
second it get the PML info from the vpid 0 instead of from the local
rank.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
This commit contains the following changes:
The C++ bindings were removed from the standard in MPI-3.0. This
commit removes the entirety of the C++ bindings as well as the
support configury.
Removes all references to C++ from the man pages. This includes the
bindings themselves, all references to what C++ bindings return,
all not-available comments, and differences between C++ and other
language bindings.
If the user passes --enable-mpi-cxx, --enable-mpi-cxx-seek, or
--enable-cxx-exceptions, print a warning message an abort configure.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
1- Remove the common symbols issue: global variable not initialized. (#7424)
Move the variables to local scope within the set_info function.
2- Remove GPFS hints using datashipping: not used anymore
3- Redirect output stream to corresponding fs framework.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
The communicator might be not existent yet when mca_fs_gpfs_component_file_query() is called.
Therefore, we need to check it first before calling brodcast function.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
This commit changes the behavior of the individual sharedfp component. If
the component cannot create either the datafile or the metadatafile during File_open,
no error is being raised going forward. This allows applications that do not use shared
file pointer operations to continue execution without any issue.
If the user however subsequently calls MPI_File_write_shared or similar operations, an error
will be raised.
Fixes issue #7429
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
If someone gives us a namespace that doesn't easily translate to an
integer, we have to create a mechanism for working around the
disconnect. PRRTE has been updated to give us a flag so we know we were
"natively" launched. If we don't see it, then fall back to generating a
hash of the nspace as our jobid. We then have to translate back/forth
between nspace and jobid using a lookup table.
Probably not the right long-term solution, but hopefully helps get us
thru for a bit.
Includes update of PRRTE pointer
Signed-off-by: Ralph Castain <rhc@pmix.org>
This commit fixes an issue with non-debug builds where adding an
attachment to the attachment list doesn't actually happen. This
causes all MPI_Win_detach calls to fail. The call was within an
assert which is optimized out in optimized builds.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit increaes the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachement value can be set by the user on other
systems.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit addresses two issues in osc/rdma:
1) It is erroneous to attach regions that overlap. This was being
allowed but the standard does not allow overlapping attachments.
2) Overlapping registration regions (4k alignment of attachments)
appear to be allowed. Add attachment bases to the bookeeping
structure so we can keep better track of what can be detached.
It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.
References #7384
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Fix a couple of spots in OMPI to resolve warnings. The one in comm_cid
in particular may be responsible for some/all of the comm_spawn issues
as it was passing an incorrect pointer to a macro, thus causing memory
corruption.
Update PRRTE and PMIx to deal with v3/v4 differences.
Signed-off-by: Ralph Castain <rhc@pmix.org>
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig`
- correct the value of `blocking_fence`
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.
remove orte: misc fixes
- UCX fixes
- VPATH issue
- oshmem fixes
- remove useless definition
- Add PRRTE submodule
- Get autogen.pl to traverse PRRTE submodule
- Remove stale orcm reference
- Configure embedded PRRTE
- Correctly pass the prefix to PRRTE
- Correctly set the OMPI_WANT_PRRTE am_conditional
- Move prrte configuration to the end of OMPI's configure.ac
- Make mpirun a symlink to prun, when available
- Fix makedist with --no-orte/--no-prrte option
- Add a `--no-prrte` option which is the same as the legacy
`--no-orte` option.
- Remove embedded PMIx tarball. Replace it with new submodule
pointing to OpenPMIx master repo's master branch
- Some cleanup in PRRTE integration and add config summary entry
- Correctly set the hostname
- Fix locality
- Fix singleton operations
- Fix support for "tune" and "am" options
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS. Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).
So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.
This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Fix the C types for the following:
* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE
There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree). We erroneously had several of these array-like sentinel
values be "array-like" values in C. E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)". On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.
That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.
(**) NOTE: This code has been wrong for years. This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same. But it caused linker warning messages,
and even caused errors in some cases.
Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.
This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):
% mpifort -fuse-ld=lld /dev/null
ld.lld: error: corrupt input file: version definition index 0 for symbol
mpi_fortran_argv_null_ is out of bounds
>>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
...
If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.
This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd).
Fixesopen-mpi/ompi#7209
Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Instead of triggering the fault early in the initialization process, do
a serialized initialization and report the error once all the supporting
infrastructure is up and running.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
OpenMPI doesn't compile anymore with IME because the header
file "ompi/mca/fs/base/base.h" needs to be include in every
file where mca_fs_base_get_mpi_err() is used.
Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
The opal_gethostname() function provides a more robust mechanism
to retrieve the hostname than gethostname(), which can return
results that are not null-terminated, and which can vary in its
behavior from system to system.
opal_gethostname() just returns the value in opal_process_info.nodename;
this is populated in opal_init_gethostname() inside opal_init.c.
-Changed all gethostname calls in opal subtree to opal_gethostname
-Changed all gethostname calls in orte subtree to opal_gethostname
-Changed all gethostname calls in ompi subdir to opal_gethostname
-Changed all gethostname calls in oshmem subdir to opal_gethostname
-Changed opal_if.c in test subdir to use opal_gethostname
-Changed opal_init.c to include opal_init_gethostname. This function
returns an int and directly sets opal_process_info.nodename per
jsquyres' modifications.
Relates to open-mpi#6801
Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
Some versions of Libfabric contain a bug in EFA where FI_REMOTE_COMM and
FI_LOCAL_COMM are not advertised. In order to workaround this, we need to call
fi_getinfo() without those capability bits to see if EFA is available first.
Also move around some of the provider include/exclude list logic so we can skip
this workaround if applicable.
Signed-off-by: Robert Wespetal <wesper@amazon.com>
Make sure to get an RDM provider that can provide both local and
remote communication. We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.
Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version. Use this check in two
places:
1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
>= v1.1, but the usnic component was checking for >= v1.3. So
update the btl/usnic configury to use the new macro and check for
>= v1.3.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Add MCA parameter to allow support for message logging even when the MPI
library is initialized with support for thread multiple.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Add a comment about the stages in the vprotocols initialization and why
we cant use the threading provided during the PML initialization.
Instead, use the OMPI internal threading status to prevent the use of
message logging in multi-threaded applications.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Change the provider include and exclude list name comparison check to
ignore case. The UDP provider's name is uppercase and was being selected
despite being in the exclude list.
Signed-off-by: Robert Wespetal <wesper@amazon.com>
Squash compiler warning due to whitespace/brace problems.
The code block from lines 829-839 was improperly indented, which led to
both the code being confusing and a compiler warning. Comparing this code to
the current version in the MPICH repo made it clear that the code was simply
improperly indented. Fixing the indentation both makes the code readable and
squashes the compiler warning.
Signed-off-by: Maxwell Coil <mcoil@nd.edu>
Squash compiler warning.
ROMIO is third-party software but has an annoying compiler warning;
this is the minimum distance fix.
Signed-off-by: William Bailey <wbailey2@nd.edu>
I removed the implementation and/or prototypes of all unused functions defined for all components.
To reduce recurrent code, I created functions under base for the management of error codes and setting of file permission and amode.
Then, I replaced these recurrent code by those function for all components.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
add a missing header file
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
When O_CREAT and O_EXCL are set, open() shall fail if the file exists. Therefore, I assigned the file creation to the root process only.
I also translated the errno codes to their corresponding MPI error codes.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
I updated the gpfs component to match the latest structures and functions definitions and remove compiler wranings.
Disabled mpi info setting for the gpfs option "Data Shipping" since it is no longer supported in the latest gpfs versions.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
update the configure logic of the gpfs component
based on what we learned from user feedback over the last
two years for the other components
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Delete check for amode which should go to a higler layer, e.g. ompi_file_open.
Only perform Info value check if key is found.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Until now sqrt(n) was missed as a factor for odd square numbers n. This
lead to suboptimal results of MPI_Dims_create for input numbers like 9,
25, 49, ... Fix the results by including sqrt(n) in the search for
factors.
Refs: #7186
Signed-off-by: Michael Lass <bevan@bi-co.net>
In fcoll_two_phase_supprot_fns.c: calculation of the aggregator index
failed for large offsets on 32bit machine, due to improper handling of
64bit offsets.
Fixes Issue #7110
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Compiling OMPI on cray systems using latest Cray compilers (clang based)
yielded some compiler warnings from ompio/lustre. Squash these warnings.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This is closely related to Platform-MPI's old -prot feature.
The long-format of the tables it prints could look like this:
> Host 0 [myhost001] ranks 0 - 1
> Host 1 [myhost002] ranks 2 - 3
> Host 2 [myhost003] ranks 4
> Host 3 [myhost004] ranks 5
> Host 4 [myhost005] ranks 6
> Host 5 [myhost006] ranks 7
> Host 6 [myhost007] ranks 8
> Host 7 [myhost008] ranks 9
> Host 8 [myhost009] ranks 10
>
> host | 0 1 2 3 4 5 6 7 8
> ======|==============================================
> 0 : sm tcp tcp tcp tcp tcp tcp tcp tcp
> 1 : tcp sm tcp tcp tcp tcp tcp tcp tcp
> 2 : tcp tcp self tcp tcp tcp tcp tcp tcp
> 3 : tcp tcp tcp self tcp tcp tcp tcp tcp
> 4 : tcp tcp tcp tcp self tcp tcp tcp tcp
> 5 : tcp tcp tcp tcp tcp self tcp tcp tcp
> 6 : tcp tcp tcp tcp tcp tcp self tcp tcp
> 7 : tcp tcp tcp tcp tcp tcp tcp self tcp
> 8 : tcp tcp tcp tcp tcp tcp tcp tcp self
>
> Connection summary:
> on-host: all connections are sm or self
> off-host: all connections are tcp
In this example hosts 0 and 1 had multiple ranks so "sm" was more
meaningful than "self" to identify how the ranks on the host are
talking to each other. While host 2..8 were one rank per host so
"self" was more meaningful as their btl.
Above a certain number of hosts (12 by default) the above table gets too big
so we shrink to a more abbreviated looking table that has the same data:
> host | 0 1 2 3 4 8
> ======|====================
> 0 : A C C C C C C C C
> 1 : C A C C C C C C C
> 2 : C C B C C C C C C
> 3 : C C C B C C C C C
> 4 : C C C C B C C C C
> 5 : C C C C C B C C C
> 6 : C C C C C C B C C
> 7 : C C C C C C C B C
> 8 : C C C C C C C C B
> key: A == sm
> key: B == self
> key: C == tcp
Then above 36 hosts we stop printing the 2d table entirely and just print the
summary:
> Connection summary:
> on-host: all connections are sm or self
> off-host: all connections are tcp
The options to control it are
-mca comm_method 1 : print the above table at the end of MPI_Init
-mca comm_method 2 : print the above table at the beginning of MPI_Finalize
-mca comm_method_max <n> : number of hosts <n> for which to print a full size 2d
-mca comm_method_brief 1 : only print summary output, no 2d table
-mca comm_method_fakefile <filename> : for debugging only
* printing at init vs finalize:
The most important difference between these two is that when printing the table
during MPI_Init(), we send extra messages to make sure all hosts are connected to
each other. So the table ends up working against the idea of on-demand connections
(although it's only forcing the n^2 connections in the number of hosts, not the
total ranks). If printing at MPI_Finalize() we don't create any connections that
aren't already connected, so the table is more likely to have "n/a" entries if
some hosts never connected to each other.
* how many hosts <n> for which to print a full size 2d table
The option -mca comm_method_max <n> can be used to specify a number of hosts <n>
(default 12) that controls at what host-count the unabbreviated / abbreviated
2d tables get printed:
1 - n : full size 2d table
n+1 - 3n : shortened 2d table
3n+1 - inf : summary only, no 2d table
* brief
The option -mca comm_method_brief 1 can be used to skip the printing of the 2d
table and only show the short summary
* fakefile
This is a debugging option that allows easeir testing of all the printout
routines by letting all the detected communication methods between the hosts
be overridden by fake data from a file.
The source of the information used in the table is the .mca_component_name
In the case of BTLs, the module always had a .btl_component linking back to the
component. The vars mca_pml_base_selected_component and ompi_mtl_base_selected_component
offer similar functionality for pml/mtl.
So with the ability to identify the component, we can then access
the component name with code like this
mca_pml_base_selected_component.pmlm_version.mca_component_name
See the three lookup_{pml,mtl,btl}_name() functions in hook_comm_method_fns.c,
and their use in comm_method() to parse the strings and produce an integer
to represent the connection type being used.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
This is based on a bug reported on the mailing list using a netcdf testcase.
The problem occurs if processes are using a custom file view, but on some
of them it appears as if the default file view is being used. Because of that,
the simple-grouping option lead to different number of aggregators used on different
processes, and ultimately to a deadlock. This patch fixes the problem by not using
the file_view size anymore for the calculation in the simple-grouping option,
but the contiguous chunk size (which is identical on all processes).
Fixes issue #7109
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>