1
1

10848 Коммитов

Автор SHA1 Сообщение Дата
Artem Polyakov
7c17a38c96 timings: Fix timings when 'prefix' is used
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2020-03-07 09:36:43 -08:00
Austen Lauria
04a3a28a74 Some memchecker cleanup and others.
- Port memchecker call from a1d502c.
- Remove unused memcheck macro variables.
- Some code readability improvements.
- Remove some stray +1's in dynamic comm cleanup.
- Re-add OPAL_ENABLE_DEBUG macro to osc header.
- Cleanup some printf's, and includes.
- Refactor cleanup of dpm_disconnect_objs.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-03-05 16:44:18 -05:00
Gilles Gouaillardet
e2ad184db5 pml/ob1: silence valgrind errors
always define and initialize padding in various structs
when OPAL_ENABLE_DEBUG is set

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-03-05 16:10:43 -05:00
Gilles Gouaillardet
5751dfe91a mpi/c: fix memchecker invokation
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-03-05 16:10:42 -05:00
Gilles Gouaillardet
fc2516457b osc/pt2pt: silence valgrind warnings
explicitly add and initialize padding to keep valgrind happy

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-03-05 16:10:42 -05:00
Gilles Gouaillardet
4d92b5fcd8 memchecker: fix memchecker_call
- fix handling of contiguous datatypes with a non-zero true lower bound
- fix handling of datatypes using block of non contiguous predefined datatypes

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-03-05 16:10:42 -05:00
Gilles Gouaillardet
0db5a15696 ompi/dpm: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-03-04 14:54:40 -05:00
Harumi Kuno
397cc44aa4 Fix typo in mca_pml_base_pml_check_selected
Addresses issue https://github.com/open-mpi/ompi/issues/7494

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2020-03-02 22:49:40 -08:00
Ralph Castain
23303eaee3
Merge pull request #7492 from rhc54/topic/sing
Fix singleton operation
2020-02-29 15:15:04 -08:00
Ralph Castain
674134430c
Fix singleton operation
OpenPMIx fills in a variety of info when it detects that we are in
singleton mode. Best way of detecting it is to look for the "singleton"
at the beginning of the returned nspace.

Make the modex recvs optional so we don't bounce up to the server and
then to the host trying to retrieve job-level info that must be given to
us at job start.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-29 12:18:30 -08:00
Ralph Castain
4fe9ae329c
Add missing include and remove stale PML
The "yalla" pml no longer exists

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-29 11:54:38 -08:00
Ralph Castain
cbbe67eff9
Merge pull request #7487 from bosilca/topic/pml_from_vpid0
Make sure the PML selection is consistent across the world.
2020-02-28 17:19:26 -08:00
bosilca
c4d36859ec
Merge pull request #7228 from devreal/progress-returns
Harmonize return values of progress callbacks
2020-02-28 20:15:37 -05:00
bosilca
806b35157d
Merge pull request #7261 from bosilca/fix/vprotocol
Fix/vprotocol initialization
2020-02-28 20:14:30 -05:00
George Bosilca
21d743393f
Make sure the PML is consistent across the world.
Temporary solution for the PML inconsistency issue discussed in #7475.
This patch address 2 things: first it make the PMIx key optional so that
if we are not in a full modex mode we don't do a direct modex, and
second it get the PML info from the vpid 0 instead of from the local
rank.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-02-28 17:53:48 -05:00
Jeff Squyres
af1ec9a594
Merge pull request #7323 from bosilca/fix/7320
Trap wrong parameters to MPI_Init_thread.
2020-02-27 06:28:44 -05:00
Tsubasa Yanagibashi
b604f1f1fe add a description in MPI_WIN_DETACH man page.
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2020-02-27 18:19:55 +09:00
Tsubasa Yanagibashi
070d4c15bc update a description in MPI_Request_free man page.
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2020-02-27 18:17:44 +09:00
Tsubasa Yanagibashi
6c342aef68 fix some typos and spacing in man pages.
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2020-02-27 18:14:26 +09:00
Jeff Squyres
ba1f016508
Merge pull request #7428 from hjelmn/finally_kill_the_old_cpp_bindings
ompi: remove obsolete c++ bindings
2020-02-26 17:40:25 -05:00
Nathan Hjelm
0b8baa217d ompi: remove obsolete c++ bindings
This commit contains the following changes:

The C++ bindings were removed from the standard in MPI-3.0. This
commit removes the entirety of the C++ bindings as well as the
support configury.

Removes all references to C++ from the man pages. This includes the
bindings themselves, all references to what C++ bindings return,
all not-available comments, and differences between C++ and other
language bindings.

If the user passes --enable-mpi-cxx, --enable-mpi-cxx-seek, or
--enable-cxx-exceptions, print a warning message an abort configure.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-26 13:04:55 -08:00
raafatfeki
f46cfc120d fs/gpfs: Solve issues while setting GPFS hints
1- Remove the common symbols issue: global variable not initialized. (#7424)
Move the variables to local scope within the set_info function.
2- Remove GPFS hints using datashipping: not used anymore
3- Redirect output stream to corresponding fs framework.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2020-02-25 18:14:20 -05:00
raafatfeki
9ba6ab8209 mca/fs: Check the existence of communicator in file query
The communicator might be not existent yet when mca_fs_gpfs_component_file_query() is called.
Therefore, we need to check it first before calling brodcast function.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2020-02-25 16:26:46 -05:00
Geoff Paulsen
207b267135
Merge pull request #7455 from rhc54/topic/warn
Silence a bunch of warnings
2020-02-24 08:55:30 -06:00
Geoff Paulsen
6f28a18f4e
Merge pull request #7444 from tjahns/master
Fix incorrect argument in manual page.
2020-02-24 08:40:04 -06:00
Ralph Castain
89f3418c2c
Merge pull request #7459 from rhc54/topic/cid
Fix comm_spawn
2020-02-23 15:29:44 -08:00
Edgar Gabriel
28776c5d95
Merge pull request #7448 from edgargabriel/topic/individual-as-dummy-module
sharedfp/individual: defer error when not being able to open datafile
2020-02-23 16:35:23 -06:00
Ralph Castain
b35b0f7897
Fix comm_spawn
Use the correct data type in the CID exchange

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-23 13:51:22 -08:00
Ralph Castain
86de81baca
Silence a bunch of warnings
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-22 13:05:28 -08:00
Ralph Castain
76b9c15825
Remove lingering ORTE references
Wrapper compiler is trying to link in a libopen-rte. Man pages are
setting an ORTE release date.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-22 07:59:19 -08:00
Edgar Gabriel
df6e3e503a sharedfp/individual: defer error when not being able to open datafile
This commit changes the behavior of the individual sharedfp component. If
the component cannot create either the datafile or the metadatafile during File_open,
no error is being raised going forward. This allows applications that do not use shared
file pointer operations to continue execution without any issue.

If the user however subsequently calls MPI_File_write_shared or similar operations, an error
will be raised.

Fixes issue #7429

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-02-21 12:13:39 -06:00
Ralph Castain
829fd478b3
Create a hack to protect against non-integer jobids
If someone gives us a namespace that doesn't easily translate to an
integer, we have to create a mechanism for working around the
disconnect. PRRTE has been updated to give us a flag so we know we were
"natively" launched. If we don't see it, then fall back to generating a
hash of the nspace as our jobid. We then have to translate back/forth
between nspace and jobid using a lookup table.

Probably not the right long-term solution, but hopefully helps get us
thru for a bit.

Includes update of PRRTE pointer

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-21 06:04:55 -08:00
Thomas Jahns
8ecbe1ce97 Fix incorrect argument in manual page.
Signed-off-by: Thomas Jahns <jahns@dkrz.de>
2020-02-21 11:25:38 +01:00
Austen Lauria
dd5991f513
Merge pull request #7204 from devreal/shmwin_contig
Correctly set baseptr in contiguous shared memory window with local size zero
2020-02-20 13:22:40 -05:00
Nathan Hjelm
8ee80d8855 osc/rdma: fix bug in attach for non-debug builds
This commit fixes an issue with non-debug builds where adding an
attachment to the attachment list doesn't actually happen. This
causes all MPI_Win_detach calls to fail. The call was within an
assert which is optimized out in optimized builds.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-18 12:59:46 -08:00
Nathan Hjelm
eeb3d7f845
Merge pull request #7387 from hjelmn/osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap
osc/rdma: modify attach to check for region overlap
2020-02-17 14:26:53 -07:00
Nathan Hjelm
54c8233f4f osc/rdma: bump the default max dynamic attachments to 64
This commit increaes the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachement value can be set by the user on other
systems.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-16 17:09:20 -08:00
Nathan Hjelm
6649aef8bd osc/rdma: modify attach to check for region overlap
This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References #7384

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-16 17:09:06 -08:00
Ralph Castain
4bdb5a8103
Merge pull request #7402 from rhc54/topic/up3
Resolve the PMIx v3 incompatibility
2020-02-15 06:05:08 -08:00
Ralph Castain
133e8eba22
Resolve the PMIx v3 incompatibility
Fix a couple of spots in OMPI to resolve warnings. The one in comm_cid
in particular may be responsible for some/all of the comm_spawn issues
as it was passing an incorrect pointer to a macro, thus causing memory
corruption.

Update PRRTE and PMIx to deal with v3/v4 differences.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-14 21:01:10 -08:00
Tsubasa Yanagibashi
a07a83d189 osc/sm: fix typo and minor correction
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig`
- correct the value of `blocking_fence`

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2020-02-14 17:52:56 +09:00
Gilles Gouaillardet
174e967dbc
Remove ORTE project
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.

remove orte: misc fixes

 - UCX fixes
 - VPATH issue
 - oshmem fixes
 - remove useless definition
 - Add PRRTE submodule
 - Get autogen.pl to traverse PRRTE submodule
 - Remove stale orcm reference
 - Configure embedded PRRTE
 - Correctly pass the prefix to PRRTE
 - Correctly set the OMPI_WANT_PRRTE am_conditional
 - Move prrte configuration to the end of OMPI's configure.ac
 - Make mpirun a symlink to prun, when available
 - Fix makedist with --no-orte/--no-prrte option
 - Add a `--no-prrte` option which is the same as the legacy
   `--no-orte` option.
 - Remove embedded PMIx tarball. Replace it with new submodule
   pointing to OpenPMIx master repo's master branch
 - Some cleanup in PRRTE integration and add config summary entry
 - Correctly set the hostname
 - Fix locality
 - Fix singleton operations
 - Fix support for "tune" and "am" options

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-02-07 18:20:06 -08:00
Jeff Squyres
ab398f4b9a fortran: ensure not to use [AM_]CPPFLAGS
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS.  Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).

So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.

This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-02-03 14:45:32 -08:00
Jeff Squyres
f4a47a5a8e fortran: remove useless CPPFLAGS assignment
These -D's are for C compilation, not Fortran compilation.  Remove
this useless statement.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-03 14:45:25 -08:00
Fangrui Song
5609268e90 Make C and Fortran types for MPI sentinels agree in size
Fix the C types for the following:

* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE

There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree).  We erroneously had several of these array-like sentinel
values be "array-like" values in C.  E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)".  On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.

That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years.  This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same.  But it caused linker warning messages,
and even caused errors in some cases.

Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.

This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):

  % mpifort -fuse-ld=lld /dev/null
  ld.lld: error: corrupt input file: version definition index 0 for symbol
  mpi_fortran_argv_null_ is out of bounds
  >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
  ...

If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd11fb90d18122c521686f692a32cdd8).

Fixes open-mpi/ompi#7209

Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-02 04:20:39 -08:00
George Bosilca
72501f8f9c Consistent return from all progress functions.
This fix ensures that all progress functions return the number of
completed events.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-01-28 20:16:53 +01:00
Joseph Schuchart
2c97187ee0 Harmonize return values of progress callbacks
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-01-28 20:15:03 +01:00
Austen Lauria
3c0fa537c2
Merge pull request #7113 from devreal/osc_rdma_cas
RDMA osc: perform CAS in shared memory if possible
2020-01-28 10:59:17 -05:00
Austen Lauria
10f6a77640
Merge pull request #7315 from abouteiller/export/tcp_errors_v2
Handle error cases in TCP BTL (v2)
2020-01-27 17:03:07 -05:00
Aurelien Bouteiller
395fcc4253
Disable inband PML error reporting during MPI Finalize as it interferes with the Finalize process.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:32:34 -05:00