1
1
Граф коммитов

10600 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
174e967dbc
Remove ORTE project
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.

remove orte: misc fixes

 - UCX fixes
 - VPATH issue
 - oshmem fixes
 - remove useless definition
 - Add PRRTE submodule
 - Get autogen.pl to traverse PRRTE submodule
 - Remove stale orcm reference
 - Configure embedded PRRTE
 - Correctly pass the prefix to PRRTE
 - Correctly set the OMPI_WANT_PRRTE am_conditional
 - Move prrte configuration to the end of OMPI's configure.ac
 - Make mpirun a symlink to prun, when available
 - Fix makedist with --no-orte/--no-prrte option
 - Add a `--no-prrte` option which is the same as the legacy
   `--no-orte` option.
 - Remove embedded PMIx tarball. Replace it with new submodule
   pointing to OpenPMIx master repo's master branch
 - Some cleanup in PRRTE integration and add config summary entry
 - Correctly set the hostname
 - Fix locality
 - Fix singleton operations
 - Fix support for "tune" and "am" options

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-02-07 18:20:06 -08:00
Jeff Squyres
ab398f4b9a fortran: ensure not to use [AM_]CPPFLAGS
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS.  Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).

So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.

This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-02-03 14:45:32 -08:00
Jeff Squyres
f4a47a5a8e fortran: remove useless CPPFLAGS assignment
These -D's are for C compilation, not Fortran compilation.  Remove
this useless statement.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-03 14:45:25 -08:00
Fangrui Song
5609268e90 Make C and Fortran types for MPI sentinels agree in size
Fix the C types for the following:

* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE

There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree).  We erroneously had several of these array-like sentinel
values be "array-like" values in C.  E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)".  On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.

That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years.  This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same.  But it caused linker warning messages,
and even caused errors in some cases.

Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.

This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):

  % mpifort -fuse-ld=lld /dev/null
  ld.lld: error: corrupt input file: version definition index 0 for symbol
  mpi_fortran_argv_null_ is out of bounds
  >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
  ...

If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd).

Fixes open-mpi/ompi#7209

Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-02 04:20:39 -08:00
Austen Lauria
3c0fa537c2
Merge pull request #7113 from devreal/osc_rdma_cas
RDMA osc: perform CAS in shared memory if possible
2020-01-28 10:59:17 -05:00
Austen Lauria
10f6a77640
Merge pull request #7315 from abouteiller/export/tcp_errors_v2
Handle error cases in TCP BTL (v2)
2020-01-27 17:03:07 -05:00
Aurelien Bouteiller
395fcc4253
Disable inband PML error reporting during MPI Finalize as it interferes with the Finalize process.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:32:34 -05:00
Austen Lauria
969eb0286c
Merge pull request #6970 from ggouaillardet/topic/cuda_no_orte
coll/cuda: remove unnecessary references to ORTE
2020-01-27 13:12:32 -05:00
Austen Lauria
4f6978466d
Merge pull request #7284 from bosilca/fix/monitoring_registration
Minor cleanup in the monitoring PML.
2020-01-27 13:01:30 -05:00
Jeff Squyres
f16d2903aa
Merge pull request #7328 from DDNStorage/ime_fs_missing_header
fs/ime: fix compilation errors due to missing header inclusion
2020-01-23 14:13:59 -05:00
Sylvain Didelot
01ae0c22b8 fs/ime: fix compilation errors due to missing header inclusion
OpenMPI doesn't compile anymore with IME because the header
file "ompi/mca/fs/base/base.h" needs to be include in every
file where mca_fs_base_get_mpi_err() is used.

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
2020-01-22 17:56:55 +01:00
Austen Lauria
df9745a251
Merge pull request #7299 from awlauria/fix_warnings
Fix some compiler warnings.
2020-01-17 11:33:41 -05:00
Jeff Squyres
25931ea8bf
Merge pull request #7200 from cpshereda/master-opal_gethostname-change
Fix unsafe use of gethostname()
2020-01-13 16:07:43 -05:00
Charles Shereda
cbc6feaab2 Created opal_gethostname() as safer gethostname substitute.
The opal_gethostname() function provides a more robust mechanism
to retrieve the hostname than gethostname(), which can return
results that are not null-terminated, and which can vary in its
behavior from system to system.

opal_gethostname() just returns the value in opal_process_info.nodename;
this is populated in opal_init_gethostname() inside opal_init.c.

-Changed all gethostname calls in opal subtree to opal_gethostname
-Changed all gethostname calls in orte subtree to opal_gethostname
-Changed all gethostname calls in ompi subdir to opal_gethostname
-Changed all gethostname calls in oshmem subdir to opal_gethostname
-Changed opal_if.c in test subdir to use opal_gethostname
-Changed opal_init.c to include opal_init_gethostname. This function
 returns an int and directly sets opal_process_info.nodename per
 jsquyres' modifications.

Relates to open-mpi#6801

Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
2020-01-13 08:52:17 -08:00
Robert Wespetal
49128a7adb mtl/ofi: Add workaround for EFA local/remote capabilities bug
Some versions of Libfabric contain a bug in EFA where FI_REMOTE_COMM and
FI_LOCAL_COMM are not advertised. In order to workaround this, we need to call
fi_getinfo() without those capability bits to see if EFA is available first.

Also move around some of the provider include/exclude list logic so we can skip
this workaround if applicable.

Signed-off-by: Robert Wespetal <wesper@amazon.com>
2020-01-13 08:26:01 -08:00
Jeff Squyres
21bc9042e1 mtl/ofi: check for FI_LOCAL_COMM+FI_REMOTE_COMM
Make sure to get an RDM provider that can provide both local and
remote communication.  We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.

Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version.  Use this check in two
places:

1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
   FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
   >= v1.1, but the usnic component was checking for >= v1.3.  So
   update the btl/usnic configury to use the new macro and check for
   >= v1.3.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-01-13 08:19:53 -08:00
George Bosilca
05093f9cb1
Minor cleanup in the monitoring PML.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-01-13 09:24:00 -05:00
Austen Lauria
b65ec27307 Fix some compiler warnings.
Silence unused variables, incompatible pointer types,
un-initialized variables, and signed/unsigned comparisons.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-01-10 13:10:53 -05:00
Jeff Squyres
dd2d7d2866
Merge pull request #7189 from michaellass/fix-dims_create
dims_create: fix calculation of factors for odd squares
2020-01-10 09:47:09 -05:00
bosilca
2d659323b9
Merge pull request #7233 from bosilca/fix/unused_fortran_protected
Remove unused variable.
2020-01-08 08:04:06 -05:00
Eisuke Kawashima
d26d4e1d63
Fix typo and update URLs (https, redirection) [skip ci]
Signed-off-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2020-01-07 03:52:25 +09:00
Howard Pritchard
b17cd9049f
Merge pull request #7238 from rwespetal/ofi-prov-name-fix
mtl/ofi: ignore case when comparing provider names
2020-01-02 12:23:39 -07:00
Jeff Squyres
8b424c3863
Merge pull request #7232 from bosilca/hjelmn_neighbor_alltoall_fix
Neighbor alltoall fix
2019-12-17 17:24:05 -05:00
Robert Wespetal
9b72e9465d mtl/ofi: ignore case when comparing provider names
Change the provider include and exclude list name comparison check to
ignore case. The UDP provider's name is uppercase and was being selected
despite being in the exclude list.

Signed-off-by: Robert Wespetal <wesper@amazon.com>
2019-12-16 13:05:00 -08:00
George Bosilca
97eb5d0cf2 Remove unused variable.
It got left over during the Fortran rework.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-12-11 16:59:50 -05:00
George Bosilca
86acdee460
Fix the communication ordering for all cartesian neighbor collectives.
This work is rooted in the [MPI Forum issue
153](https://github.com/mpi-forum/mpi-issues/issues/153).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-12-11 12:40:38 -05:00
Todd Kordenbrock
1af6dbe277
Merge pull request #7066 from tkordenbrock/topic/master/portals4.fix.flowcontrol.bugs
portals4: fix flow control bugs
2019-12-11 06:31:26 -06:00
Jeff Squyres
73ecece55d
Merge pull request #7211 from mcoil1/pr/fixing-compiler-warnings
Fix a few compiler warnings
2019-12-09 07:26:15 -05:00
Maxwell Coil
52241dbbcd libnbc: fixed uninitialized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
2019-12-08 14:03:48 -05:00
Maxwell Coil
3ced33c2eb ompi/dpm/dpm.c: Fix uninititalized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
2019-12-08 14:03:48 -05:00
William Bailey
e2718e0196 fcoll/two_phase: Compiler warning for wrong variable type used
Squash compiler warning. Changed output specifier to match variable type (long int -> long long int).

Signed-off-by: William Bailey <wbailey2@nd.edu>
2019-12-08 13:15:30 -05:00
Maxwell Coil
8c237e2684 romio: Update ADIOI_R_Exchange_data function
Squash compiler warning due to whitespace/brace problems.

The code block from lines 829-839 was improperly indented, which led to
both the code being confusing and a compiler warning. Comparing this code to
the current version in the MPICH repo made it clear that the code was simply
improperly indented. Fixing the indentation both makes the code readable and
squashes the compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
2019-12-05 18:24:02 -05:00
William Bailey
30bda56bce romio: fix uninitialized variable
Squash compiler warning.

ROMIO is third-party software but has an annoying compiler warning;
this is the minimum distance fix.

Signed-off-by: William Bailey <wbailey2@nd.edu>
2019-12-02 17:35:05 -05:00
Gilles Gouaillardet
967cf68027 configury: fix a typo in mpiext/shortfloat
remove an extra and useless comma

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-11-29 11:09:54 +09:00
raafatfeki
7b2d83c898 mca/fs: Remove unused functions and prototypes & reduce recurent code through all components
I removed the implementation and/or prototypes of all unused functions defined for all components.
To reduce recurrent code, I created functions under base for the management of error codes and setting of file permission and amode.
Then, I replaced these recurrent code by those function for all components.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>

add a missing header file

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-11-25 09:01:38 -06:00
raafatfeki
5988297bd8 fs/gpfs: Assign file creation to one process and management of error codes translation.
When O_CREAT and O_EXCL are set, open() shall fail if the file exists. Therefore, I assigned the file creation to the root process only.
I also translated the errno codes to their corresponding MPI error codes.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2019-11-25 09:01:38 -06:00
raafatfeki
a2b1a03e7b fs/gpfs: Update component to compile with latest gpfs versions
I updated the gpfs component to match the latest structures and functions definitions and remove compiler wranings.
Disabled mpi info setting for the gpfs option "Data Shipping" since it is no longer supported in the latest gpfs versions.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2019-11-25 09:01:38 -06:00
Edgar Gabriel
715a6e8c27 fs/gpfs: update component and module file
to match what we are doing meanwhile in the other fs compomenents.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-11-25 09:01:38 -06:00
Edgar Gabriel
8d31ba3a09 fs/gpfs: update configure logic
update the configure logic of the gpfs component
based on what we learned from user feedback over the last
two years for the other components

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-11-25 09:01:38 -06:00
XuanWang1982
cd76e53479 Changed parameter name from siox to gpfs
Adding get_info function for gpfs module
annotate printf information

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-11-25 09:01:38 -06:00
Christoph Niethammer
d3c11396fd First code review round of the GPFS module.
Delete check for amode which should go to a higler layer, e.g. ompi_file_open.
Only perform Info value check if key is found.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-11-25 09:01:38 -06:00
XuanWang1982
b1dc58eeb2 First version for GPFS module. To be tested
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-11-25 09:01:38 -06:00
Howard Pritchard
ebf003581b
Merge pull request #7183 from hppritcha/topic/quiet_lustre
lustre: squash some compiler warnings
2019-11-22 19:19:08 -07:00
Barton Chittenden
47816ef83b Remove mxm, yalla and ikrit
Signed-off-by: Barton Chittenden <bartonski@gmail.com>
2019-11-22 13:40:16 -08:00
Michael Lass
67490118ad dims_create: fix calculation of factors for odd squares
Until now sqrt(n) was missed as a factor for odd square numbers n. This
lead to suboptimal results of MPI_Dims_create for input numbers like 9,
25, 49, ... Fix the results by including sqrt(n) in the search for
factors.

Refs: #7186

Signed-off-by: Michael Lass <bevan@bi-co.net>
2019-11-22 20:12:58 +01:00
Edgar Gabriel
ea1355beae fcoll/two_phase: fix error in calculating aggregators in 32bit mode
In fcoll_two_phase_supprot_fns.c: calculation of the aggregator index
failed for large offsets on 32bit machine, due to improper handling of
64bit offsets.

Fixes Issue #7110

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-11-22 09:45:19 -06:00
Howard Pritchard
e66a7cef11 lustre: squash some compiler warnings
Compiling OMPI on cray systems using latest Cray compilers (clang based)
yielded some compiler warnings from ompio/lustre.  Squash these warnings.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-11-21 14:14:40 -06:00
Geoff Paulsen
9e17f028b5
Merge pull request #5507 from markalle/prot_table
Adding -mca comm_method to print table of communication methods
2019-11-14 14:24:06 -06:00
Austen Lauria
edcd6d8aeb
Merge pull request #7146 from bgoglin/master
fix typos hlwoc->hwloc
2019-11-13 15:38:00 -05:00
Yossi Itigin
40e2fbb093
Merge pull request #7114 from brminich/topic/mlx_scat_tuning
COLL/TUNED: Add linear scatter using isend for mlnx platform
2019-11-07 13:38:21 +02:00