30708 Commits

Author SHA1 Message Date
Ralph Castain
4468691eeb
Sync up with PRRTE and cleanup stale code
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-16 14:48:31 -07:00
Ralph Castain
c1374afd0d
Merge pull request #7744 from rhc54/topic/sync
Pickup the OMPI system-default parameters
2020-05-16 14:00:33 -07:00
Ralph Castain
54f8b6d23c
Pickup the OMPI system-default parameters
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-16 12:43:04 -07:00
Ralph Castain
ac5ec62563
Merge pull request #7741 from rhc54/topic/sync
Sync to PMIx and PRRTE masters
2020-05-16 11:06:45 -07:00
Ralph Castain
337fcb0047
Sync to PMIx and PRRTE masters
Roll in new mapping/binding methods and report outputs. Fix a few bugs

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-16 07:39:31 -07:00
Josh Hursey
9c0a2bb2d6
Merge pull request #7734 from jjhursey/fix-lsf-libevent
Move to `libevent_core` and add checks for libevent.so conflict with LSF
2020-05-15 14:36:22 -05:00
Austen Lauria
9996b9f54d
Merge pull request #7720 from abouteiller/bugfix/tcp-failed-lock
Race condition when closing TCP endpoint with error
2020-05-13 16:52:21 -04:00
Joshua Hursey
33afdb6649 Move from legacy -levent to recommended -levent_core
* `libevent_core.so` contains the core functionality that we depend upon
   - `libevent.so` library has been identified as the legacy target.
   - `libevent_core.so` exists as far back as Libevent 2.0.5 (oldest supported by OMPI)
 * `libevent_pthreads.so` can work with either `-levent` or `-levent_core`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 886f41fe3381a338eac215f26360980c612e6bb8)
2020-05-13 10:48:24 -04:00
Joshua Hursey
959353b421 Add checks for libevent.so conflict with LSF
* LSF ships a `libevent.so` that is not related to the `libevent.so`
   shipped with Libevent.
 * Add some checks to the configure logic to detect scenarios where this
   conflict can be detected, and provide the user with a descriptive
   warning message.
   - When detected by `event/external` this is just a warning since
     the internal component may be able to be used instead.
     - This happens when the user supplies the LSF path via the
       `LDFLAGS` envar instead of via `--with-lsf-libdir`.
   - When detected by a LSF component and LSF was explicitly requested
     then this becomes an error. Otherwise it will just print the warning
     and that component will fail to build.
 * Note: for `master`, the `orte_check_lsf.m4` portion of this cherry-pick
   was moved to `prrte/config/prrte_check_lsf.m4`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit fc4199e3ba567a672ce1da0dc46efbfd996d71f6)
2020-05-13 10:47:02 -04:00
Joshua Hursey
a73a89f6cf event/external: Fix typo in LDFLAGS vs LIBS var before check
* This should have been `LDFLAGS` not `LIBS`. Either works, but
   `LDFLAGS` is more correct. We should also include `CPPFLAGS`
   just in case the header is important to the check.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 22d8fa197b73eff7afc6d5fd11a99ced396c388a)
2020-05-13 10:45:26 -04:00
Howard Pritchard
f744668f5f
Merge pull request #7646 from hppritcha/topic/ofi_common_wl
add a common ofi whitelist/blacklist
2020-05-13 06:44:05 -06:00
Michael Heinz
4a5622a436
Merge pull request #7713 from mwheinz/master-7699
PSM2: Call add_procs through PML
2020-05-13 07:59:43 -04:00
Michael Heinz
548060e43f PSM2: Call add_procs through PML
Change ompi_mtl_ofi_get_endpoint() to call the active PML's add_procs()
rather than the OFI MTL add_procs() directly when discovering a new
process during operation.

Functionally, this has no impact on correct operation. However, the
current behavior means that the heterogeneous and active PML checks
are not being executed in the dynamic discovery case.

Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
2020-05-12 12:35:39 -04:00
Howard Pritchard
3078485eee
Merge pull request #7712 from shintaro-iwasaki/fix7697
opal/mca/threads/argobots: fix compilation error
2020-05-11 09:02:22 -06:00
Aurelien Bouteiller
0e93d0f647
Bugfix: when a TCP socket is closed in error, it could update the
endpoint state without holding the endpoint lock, resulting in a race
condition.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-05-11 01:11:05 -04:00
Howard Pritchard
9f1081a07a add a common ofi whitelist/blacklist
also add common verbose variable.

Note the verbosity thing is a little tricky owing to the way the MCA frameworks and components are registered
and initialized.  The BTLs are registered/initialized prior to the MTL components even getting registered.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-05-09 14:50:31 -06:00
bosilca
4460e8ba8e
Merge pull request #7714 from devreal/opal-progress-unregister-oob
Fix potential out-of-bounds write in opal_progress_unregister
2020-05-08 16:57:28 -04:00
Joseph Schuchart
fa1b12ac33 Fix potential out-of-bounds write in opal_progress_unregister
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-05-08 21:11:51 +02:00
Michael Heinz
dbbdb8f2e2
Merge pull request #7621 from jsquyres/pr/remove-osc-pt2pt
Remove OSC pt2pt component
2020-05-08 12:43:57 -04:00
Brian Barrett
0dc2325297
Merge pull request #7641 from dancejic/multi-NIC
Added multi-NIC support to provider selection
2020-05-07 15:24:41 -07:00
Shintaro Iwasaki
0fc2033c75 opal/mca/threads/argobots: fix compilation error
Fixes #7697

Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
2020-05-07 16:07:12 +00:00
Jeff Squyres
9afe58643e
Merge pull request #7600 from jsquyres/pr/mpit-general-docs
MPI_T general docs
2020-05-07 10:11:40 -04:00
Ralph Castain
698e6bb549
Merge pull request #7708 from mwheinz/master-7700
PSM2 update to use PRRTE instead of ORTE
2020-05-06 19:03:45 -07:00
Ralph Castain
42b3541242
Update mtl_psm2.c
Track change in PMIx

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-06 17:50:45 -07:00
Ralph Castain
ac87e812e0
Merge pull request #7709 from rhc54/topic/ps
Update PMIx to fix some bugs
2020-05-06 17:48:47 -07:00
Ralph Castain
120cd31aaa
Update PMIx to fix some bugs
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-06 15:53:11 -07:00
Ralph Castain
6b5b7125d0
Merge pull request #7706 from rhc54/topic/syn
Update PMIx and PRRTE
2020-05-06 14:48:09 -07:00
Michael Heinz
c55c9e67f4 PSM2 update to use PRRTE instead of ORTE
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
2020-05-06 16:16:27 -04:00
Ralph Castain
ebd164b4c1
Update PMIx and PRRTE
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-06 12:40:11 -07:00
Jeff Squyres
b66e27d3ca
Merge pull request #7671 from hjelmn/add_support_for_component_aliasing_and_reluctantly_rename_vader_to_btl_sm
Add support for component aliasing
2020-05-05 10:57:16 -04:00
Nathan Hjelm
9d8f634044 btl/vader: rename vader -> sm
Now that the old sm btl has been gone for some time there was a request
to rename vader to sm. This commit does just that (reluctantly).

An alias has been generated so specifying vader in the btl selection
variable or specifying vader parameters will continue to work.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-05-05 06:43:19 -07:00
Nathan Hjelm
9fae5bfdf3 mca/base: add support for component aliasing
This commit adds support for aliasing component names. A component
name alias is created by calling: mca_base_alias_register. The name
of the project and framework are optional. The component name and
component alias are required. Once an alias is registered all
variables registered after the alias creation will have synonyms
also registered. For example:

```c
mca_base_alias_register("opal", "btl", "vader", "sm", false);
```

would cause all of the variables registered by btl/vader to have
aliases that start with btl_sm. Ex: btl_vader_single_copy_mechanism
would have the synonym: btl_sm_single_copy_mechanism.

If aliases are registered before component filtering the alias
can also be used for component selection. For example, if sm is
registered as an alias to vader in the btl framework register
function then ```--mca btl self,sm``` would be equivalent to
```--mca btl self,vader```.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-05-05 06:43:19 -07:00
Nathan Hjelm
3a036f8486 opal/class: add additional object helper functions
This commit adds two additional helpers to opal/class:

 - OPAL_HASH_TABLE_FOREACH_PTR: Same as OPAL_HASH_TABLE_FOREACH but
   operating on ptr hash tables. This is needed because the _ptr
   iterator functions take an additional argument.

 - OPAL_LIST_FOREACH_DECL: Same as OPAL_LIST_FOREACH but declares
   the variable specified in the first argument.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-05-05 06:43:19 -07:00
Ralph Castain
1b6622f5d9
Merge pull request #7685 from rhc54/topic/sync
Update PRRTE
2020-05-04 09:13:26 -07:00
Ralph Castain
b60ea7a6ad
Update PRRTE
Protect against systems that forward entire environment

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-04 07:57:22 -07:00
Jeff Squyres
de22f84f1a configure.ac: fix minor output ordering issue
Put Perl test below the section title.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-05-02 12:45:32 -07:00
Jeff Squyres
cce62678a5 configure.ac: update Flex warning language
Make the Flex "you do not need Flex if..." warning consistent
with the language about Pandoc.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-05-02 12:45:32 -07:00
Jeff Squyres
70993e1670 Move "MPI" and "OpenMPI" man pages to section 5
Make the main man page be Open-MPI(5), and set nroff-native aliases
for MPI(5) and OpenMPI(5).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-05-02 12:45:32 -07:00
Jeff Squyres
7ace873b50 Add MPI_T.5 man page for Open MPI-specific info
Also added infrastructure to have developers write man pages in
Markdown (vs. nroff).  Pandoc >=v1.12 is used to convert those
Markdown files into actual nroff man pages.

Dist tarballs will contain generated nroff man pages; we don't want to
require users to have Pandoc installed.  Anyone who builds Open MPI
from a git clone will need to have Pandoc installed (similar to how we
treat Flex).  You can opt out of Open MPI's Pandoc-generated man pages
by configuring Open MPI with --disable-man-pages.  This will also
disable "make dist" (i.e., "make dist" will error if you configured
with --disable-man-pages).

Also removed the stuff to re-generate man pages.

This commit also:

1. Includes a new man page, written in Markdown
   (ompi/mpi/man/man5/MPI_T.5.md) that contains Open MPI-specific
   information about MPI_T.
2. Includes a converted ompi/mpi/man/man3/MPI_T_init_thread.3.md (from
   MPI_T_init_thread.3in -- i.e., nroff) just to show that Markdown
   can be used throughout the Open MPI code base for man pages.
3. Made the Makefiles in ompi/mpi/man/man?/ be full-fledged
   Makefile.am's (vs. Makefile.extras that are designed to be included
   in ompi/Makefile.am).  It is more convenient to test generation /
   installation of man pages when you can "make" and "make install" in
   their respective directories (vs. doing a build / install for the
   entire ompi project).
4. Removed logic from ompi/Makefile.am that re-generated man pages if
   opal_config.h changes.

Other man pages -- hopefully all of them! -- will be converted to
Markdown over time.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-05-02 12:45:31 -07:00
Ralph Castain
0dce61b1fc
Merge pull request #7679 from rhc54/topic/nm
Update PRRTE to resolve naming issues
2020-05-02 06:33:42 -07:00
Ralph Castain
34f3f3bbf2
Update PRRTE to resolve naming issues
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-02 05:31:25 -07:00
Yossi Itigin
b61bf9a00a
Merge pull request #7349 from hoopoepg/topic/ucx-new-api-nbx
OPAL/UCX: enabling new API provided by UCX
2020-05-02 14:30:44 +03:00
Ralph Castain
0e17e5b8e4
Merge pull request #7678 from rhc54/topic/nit
Ensure proper handling of default MCA param files
2020-05-01 15:18:18 -07:00
Ralph Castain
f608575eec
Remove references to numa_rank
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-01 13:32:29 -07:00
Ralph Castain
86709b1c80
Fix PMIx_Fence call signature
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-01 12:27:42 -07:00
Ralph Castain
10c93a10e2
Ensure proper handling of default MCA param files
Update PMIx/PRRTE to ensure we pickup the default system and user MCA
param definitions during PMIx_server_setup_application so they get
propagated. Protect OPAL's MCA var processing so it doesn't try to
process a NULL filename when PMIx provides the params for it.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-01 12:02:10 -07:00
Sergey Oblomov
75bda25ddb OPAL/UCX: enabling new API provided by UCX
- added detection of new API into configuration
- added tag_send call implemented using new API
- added MPI_Send/MPI_Isend/MPI_Recv/MPI_Irecv implementations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-05-01 17:58:29 +03:00
Nikola Dancejic
167d75b42a common/ofi: Added multi-NIC support to provider selection
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs are available that share the same cpuset, the selection process
will create a list of all available NICs and make a selection based on
(local rank) % (number of NICs).

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
2020-05-01 01:05:13 +00:00
Ralph Castain
0fa7ead700
Merge pull request #7662 from rhc54/topic/dpm
Update dpm to handle deprecation of MPI_Info keys
2020-04-29 16:32:50 -07:00
Ralph Castain
bd29ab0ae9
Update dpm to handle deprecation of MPI_Info keys
Deprecate the current OMPI-specific MPI_Info key definitions for
MPI_Comm_spawn and replace them with their PMIx equivalents. Issue a
deprecation/conversion warning as this is done. Also issue deprecation
warnings for options such as "ompi_non_mpi" that are no longer used.

Handle both cases where the user might pass either the PMIx attribute
name itself (e.g., "PMIX_MAPBY") or the string value of the attribute
(e.g., PMIX_MAPBY, which translates to "pmix.mapby"). This can only be
done for PMIx v4 and above, so protect that code.

Silence a couple of Coverity warnings and add a test along the way.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-29 14:56:38 -07:00