10735 Commits

Author SHA1 Message Date
Joseph Schuchart
d8696aa8c4 UCX osc: centralize decision on whether to use AMOs
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
427d4bd226 UCX osc: do not acquire accumulate lock if exclusive lock was taken
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
471d76777a UCX osc: fence active operations before releasing accumulate lock and free memory if required
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
4d7a3856fa UCX osc: Use accumulate for operations/datatypes that are not covered by UCX
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
899f58cef5 UCX osc: simplify output address computation
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d888b4fd76 UCX osc: correctly handle MPI_NO_OP
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
7cfc0e71da UCX osc: allow to asynchronously compare-and-swap
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
557ae80858 UCX osc: allow for overlap with (some) request-based atomic operations
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
1a3c6bbf35 UCX osc: re-use value returned by cswap to save additional get
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
8606a02b87 UCX osc: fix macro parameter name usage in OMPI_OSC_UCX_REQUEST_RETURN
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d448efd49c UCX osc: properly clean up requests in case of errors
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
73a183408f UCX osc: add support for acc_single_intrinsic info key / mca param
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Jeff Squyres
47df60717c check_alt_short_float: ensure compiler supports math
Even if the compiler supports an "alternate" short float type (e.g.,
_Float16), check to make sure that the compiler will correctly link
applications that perform mathematical operations on that type.

Carefully choose the mathematical test in the configure check to
ensure the mathematical operation is not removed by compiler
optimization (when setting CFLAGS=-O1 or higher).

Out of the box, clang 6.0.x and 7.0.x will fail to link applications
that try to perform addition (and other mathematical operations) on
_Float16 variables (an additional CLI flag is required to enable
software emulation of _Float16).  If we detect a situation where the
type is supported but a sample program fails to link and the basename
of $CC is "clang", emit a warning and point the user to a relevant
README.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@fujitsu.com>
2020-06-17 08:38:42 -07:00
Edgar Gabriel
9af19ab205
Merge pull request #7819 from edgargabriel/pr/avg-fview-size
common/ompio: use avg. file view size in the aggregator selection logic
2020-06-16 10:04:31 -05:00
Jeff Squyres
d522c27037 mpi.h.in: Remove //-style comments
Keep all comments in the user-facing mpi.h.in as "old style" C
comments: /* */.  This gives us maximum portability, just on the off
chance that a user's C compiler does not support //-style comments.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-06-15 12:56:51 -07:00
Jeff Squyres
835f8f1834 mpi.h.in: fixups for static assert messages
1. __STDC_VERSION__ isn't necessarily defined (e.g., by C++
   compilers).  So check to make sure it is defined before we actually
   check the value.
2. If we're in C++11 (or later), use static_assert().
3. Split the static assert macro in two macros:
   * THIS_SYMBOL_WAS_REMOVED_IN_MPI30(...): Insert a valid expression
     (i.e., 0, because it's only used with MPI_Datatype values, and
     since MPI_Datatype is a pointer, 0 is a valid RHS expression)
     before invoking the static assert so that we don't get a syntax
     error instead of the actual static assert error.
   * THIS_FUNCTION_WAS_REMOVED_IN_MPI30(...): No need for the valid
     expression; just invoke the assert functionality.

Also remove an errant "\".

Thanks to Constantine Khrulev and Martin Audet for identifying the
issue and suggesting to use C11's static_assert().

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-06-15 12:55:46 -07:00
Edgar Gabriel
4a8a330bba common/ompio: use avg. file view size in the aggregator selection logic
This is a fix based on a bug report from CGNS on GitHub / the mailing list.
The core of the problem was that different processes entered different branches of
our aggregator selection logic, due to the fact that in some cases processes had
a matching file_view size and contiguous chunk size (thus assuming 1-D distribution),
and some processes did not (thus assuming 2-D distribution). The fix is to calculate
the avg. file view size across all processes and use this value, thus ensuring that
all processes enter the same branch.

Fixes issue #7809

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-06-15 09:17:44 -05:00
Sergey Oblomov
df0f2ac026 OMPI/HCOLL: fixed typo in vars description
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-05-29 20:13:35 +03:00
bosilca
2b1f053345
Merge pull request #7758 from wckzhang/fixdynamic
coll/tuned: Fix dynamic message size for gather and scatter
2020-05-22 14:15:32 -04:00
Michael Heinz
e21c31f54c
Merge pull request #7722 from mwheinz/mwheinz-7721
Add check for PSM2 reference counting to PSM2 MTL #7721
2020-05-19 08:06:41 -04:00
Michael Heinz
f10305a49f Add check for PSM2 reference counting to PSM2 MTL #7721
As discussed, a feature is being added to libpsm2 to correctly handle
the case where the library is opened by multiple OMPI transports in the same
process. (For example, the OFI BTL and the PSM2 MTL).

* Improved error message to indicate required libpsm2 version.

* Adds a test at autogen/configure time for the existence of
  PSM2_LIB_REFCOUNT_CAP.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
2020-05-18 15:25:22 -04:00
Ralph Castain
4468691eeb
Sync up with PRRTE and cleanup stale code
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-16 14:48:31 -07:00
William Zhang
50823fe9a9 coll/tuned: Fix dynamic message size for gather and scatter
The gather and scatter operations did not use the correct message size
(they only computed datatype size * comm size). This did not reflect the
total message size and prevented fine tuning within a given comm size. This
patch multiplies the value by the number of elements sent.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-05-14 12:17:52 -07:00
Howard Pritchard
f744668f5f
Merge pull request #7646 from hppritcha/topic/ofi_common_wl
add a common ofi whitelist/blacklist
2020-05-13 06:44:05 -06:00
Michael Heinz
4a5622a436
Merge pull request #7713 from mwheinz/master-7699
PSM2: Call add_procs through PML
2020-05-13 07:59:43 -04:00
Michael Heinz
548060e43f PSM2: Call add_procs through PML
Change ompi_mtl_ofi_get_endpoint() to call the active PML's add_procs()
rather than the OFI MTL add_procs() directly when discovering a new
process during operation.

Functionally, this has no impact on correct operation. However, the
current behavior means that the heterogeneous and active PML checks
are not being executed in the dynamic discovery case.

Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
2020-05-12 12:35:39 -04:00
Howard Pritchard
9f1081a07a add a common ofi whitelist/blacklist
Also add a common verbose variable.

Note: the verbose variable is a little tricky owing to the way the MCA frameworks and components are registered
and initialized. The BTLs are registered/initialized before the MTL components are even registered.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-05-09 14:50:31 -06:00
Michael Heinz
dbbdb8f2e2
Merge pull request #7621 from jsquyres/pr/remove-osc-pt2pt
Remove OSC pt2pt component
2020-05-08 12:43:57 -04:00
Brian Barrett
0dc2325297
Merge pull request #7641 from dancejic/multi-NIC
Added multi-NIC support to provider selection
2020-05-07 15:24:41 -07:00
Jeff Squyres
9afe58643e
Merge pull request #7600 from jsquyres/pr/mpit-general-docs
MPI_T general docs
2020-05-07 10:11:40 -04:00
Ralph Castain
42b3541242
Update mtl_psm2.c
Track change in PMIx

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-06 17:50:45 -07:00
Michael Heinz
c55c9e67f4 PSM2 update to use PRRTE instead of ORTE
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
2020-05-06 16:16:27 -04:00
Jeff Squyres
70993e1670 Move "MPI" and "OpenMPI" man pages to section 5
Make the main man page be Open-MPI(5), and set nroff-native aliases
for MPI(5) and OpenMPI(5).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-05-02 12:45:32 -07:00
Jeff Squyres
7ace873b50 Add MPI_T.5 man page for Open MPI-specific info
Also added infrastructure to have developers write man pages in
Markdown (vs. nroff).  Pandoc >=v1.12 is used to convert those
Markdown files into actual nroff man pages.

Dist tarballs will contain generated nroff man pages; we don't want to
require users to have Pandoc installed.  Anyone who builds Open MPI
from a git clone will need to have Pandoc installed (similar to how we
treat Flex).  You can opt out of Open MPI's Pandoc-generated man pages
by configuring Open MPI with --disable-man-pages.  This will also
disable "make dist" (i.e., "make dist" will error if you configured
with --disable-man-pages).

Also removed the stuff to re-generate man pages.

This commit also:

1. Includes a new man page, written in Markdown
   (ompi/mpi/man/man5/MPI_T.5.md) that contains Open MPI-specific
   information about MPI_T.
2. Includes a converted ompi/mpi/man/man3/MPI_T_init_thread.3.md (from
   MPI_T_init_thread.3in -- i.e., nroff) just to show that Markdown
   can be used throughout the Open MPI code base for man pages.
3. Made the Makefiles in ompi/mpi/man/man?/ be full-fledged
   Makefile.am's (vs. Makefile.extras that are designed to be included
   in ompi/Makefile.am).  It is more convenient to test generation /
   installation of man pages when you can "make" and "make install" in
   their respective directories (vs. doing a build / install for the
   entire ompi project).
4. Removed logic from ompi/Makefile.am that re-generated man pages if
   opal_config.h changes.

Other man pages -- hopefully all of them! -- will be converted to
Markdown over time.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-05-02 12:45:31 -07:00
Yossi Itigin
b61bf9a00a
Merge pull request #7349 from hoopoepg/topic/ucx-new-api-nbx
OPAL/UCX: enabling new API provided by UCX
2020-05-02 14:30:44 +03:00
Ralph Castain
f608575eec
Remove references to numa_rank
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-01 13:32:29 -07:00
Ralph Castain
86709b1c80
Fix PMIx_Fence call signature
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-01 12:27:42 -07:00
Sergey Oblomov
75bda25ddb OPAL/UCX: enabling new API provided by UCX
- added detection of new API into configuration
- added tag_send call implemented using new API
- added MPI_Send/MPI_Isend/MPI_Recv/MPI_Irecv implementations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-05-01 17:58:29 +03:00
Nikola Dancejic
167d75b42a common/ofi: Added multi-NIC support to provider selection
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs are available that share the same cpuset, the selection process
will create a list of all available NICs and make a selection based on
(local rank) % (number of NICs).

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
2020-05-01 01:05:13 +00:00
Ralph Castain
bd29ab0ae9
Update dpm to handle deprecation of MPI_Info keys
Deprecate the current OMPI-specific MPI_Info key definitions for
MPI_Comm_spawn and replace them with their PMIx equivalents. Issue a
deprecation/conversion warning as this is done. Also issue deprecation
warnings for options such as "ompi_non_mpi" that are no longer used.

Handle both cases where the user might pass either the PMIx attribute
name itself (e.g., "PMIX_MAPBY") or the string value of the attribute
(e.g., PMIX_MAPBY, which translates to "pmix.mapby"). This can only be
done for PMIx v4 and above, so protect that code.

Silence a couple of Coverity warnings and add a test along the way.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-29 14:56:38 -07:00
Brian Barrett
4f03f44ced
Merge pull request #7582 from dipti-kothari/pml_check
mca/pml: PML check for direct modex
2020-04-27 12:29:11 -07:00
Austen Lauria
2e22a247bb
Merge pull request #7650 from devreal/fix-7617-oscpt2pt-leak
PT2PT osc: don't extra retain datatype
2020-04-24 08:55:28 -04:00
Austen Lauria
9f2f98e3ec
Merge pull request #7651 from devreal/fix-7617-oscrdma-complete_atomic
RDMA osc: remove extra retain on pending_op
2020-04-24 08:55:08 -04:00
Ralph Castain
91be01beb2
Merge pull request #7652 from rhc54/topic/het
Cleanup heterogeneous builds
2020-04-22 16:20:06 -07:00
Ralph Castain
6d29bbfde8
Cleanup heterogeneous builds
Consolidate the ompi_process_info and opal_process_info structs to
remove duplicate storage and conversion issues. Unwind some interweaving
of include files using opal.h. Silence a couple of warnings.

For now, set the arch to local if PMIX_ARCH is not found.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-22 12:46:27 -07:00
Dipti Kothari
5418cc56dd mca/pml: PML check for direct modex
For direct modex, all procs publish their selected PML module, and then
at add_procs the PML module of each proc is checked against every other
proc in the add_procs call.
For full modex, there is no change in functionality. Only rank 0
publishes its selected PML; all other procs in the add_procs call
check their selected PML against rank 0's.
If the PMLs do not match, throw an error and exit.

Signed-off-by: Dipti Kothari <dkothar@amazon.com>
2020-04-22 16:25:01 +00:00
Joseph Schuchart
de67ada442 RDMA osc: remove extra retain on pending_op
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-04-21 22:49:48 +02:00
Joseph Schuchart
07d1011afe OSC base: fix typos in documentation
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-04-21 21:53:36 +02:00
Joseph Schuchart
154cf571b6 OSC base: do not retain datatype by default
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-04-21 21:53:10 +02:00
William Zhang
771f9c011d coll/tuned: Add NULL check to prevent segfault
Signed-off-by: William Zhang <wilzhang@amazon.com>

cr https://code.amazon.com/reviews/CR-23837553
2020-04-21 17:53:46 +00:00