Commit graph

3933 commits

Author SHA1 Message Date
Howard Pritchard
b9498ec31b rework argobots configury to be smarter
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-05-23 14:46:41 -07:00
Joshua Hursey
05e095a1ee A slightly stronger check for LSF's libevent
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-05-18 15:08:10 -04:00
Ralph Castain
54f8b6d23c
Pickup the OMPI system-default parameters
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-16 12:43:04 -07:00
Ralph Castain
337fcb0047
Sync to PMIx and PRRTE masters
Roll in new mapping/binding methods and report outputs. Fix a few bugs

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-16 07:39:31 -07:00
Josh Hursey
9c0a2bb2d6
Merge pull request #7734 from jjhursey/fix-lsf-libevent
Move to `libevent_core` and add checks for libevent.so conflict with LSF
2020-05-15 14:36:22 -05:00
Austen Lauria
9996b9f54d
Merge pull request #7720 from abouteiller/bugfix/tcp-failed-lock
Race condition when closing TCP endpoint with error
2020-05-13 16:52:21 -04:00
Joshua Hursey
33afdb6649 Move from legacy -levent to recommended -levent_core
* `libevent_core.so` contains the core functionality that we depend upon
   - `libevent.so` library has been identified as the legacy target.
   - `libevent_core.so` exists as far back as Libevent 2.0.5 (oldest supported by OMPI)
 * `libevent_pthreads.so` can work with either `-levent` or `-levent_core`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 886f41fe33)
2020-05-13 10:48:24 -04:00
Joshua Hursey
959353b421 Add checks for libevent.so conflict with LSF
* LSF ships a `libevent.so` that is not related to the `libevent.so`
   shipped with Libevent.
 * Add some checks to the configure logic to detect scenarios where this
   conflict can be detected, and provide the user with a descriptive
   warning message.
   - When detected by `event/external` this is just a warning since
     the internal component may be able to be used instead.
     - This happens when the user supplies the LSF path via the
       `LDFLAGS` envar instead of via `--with-lsf-libdir`.
   - When detected by an LSF component and LSF was explicitly requested,
     this becomes an error. Otherwise it will just print the warning
     and that component will fail to build.
 * Note for `master` the `orter_check_lsf.m4` portion of this cherry-pick
   was moved to `prrte/config/prrte_check_lsf.m4`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit fc4199e3ba)
2020-05-13 10:47:02 -04:00
Joshua Hursey
a73a89f6cf event/external: Fix typo in LDFLAGS vs LIBS var before check
* This should have been `LDFLAGS` not `LIBS`. Either works, but
   `LDFLAGS` is more correct. We should also include `CPPFLAGS`
   just in case the header is important to the check.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 22d8fa197b)
2020-05-13 10:45:26 -04:00
Howard Pritchard
f744668f5f
Merge pull request #7646 from hppritcha/topic/ofi_common_wl
add a common ofi whitelist/blacklist
2020-05-13 06:44:05 -06:00
Howard Pritchard
3078485eee
Merge pull request #7712 from shintaro-iwasaki/fix7697
opal/mca/threads/argobots: fix compilation error
2020-05-11 09:02:22 -06:00
Aurelien Bouteiller
0e93d0f647
Bugfix: when a TCP socket is closed in error, it could update the
endpoint state without holding the endpoint lock, resulting in a race
condition.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-05-11 01:11:05 -04:00
Howard Pritchard
9f1081a07a add a common ofi whitelist/blacklist
also add common verbose variable.

Note the verbosity thing is a little tricky owing to the way the MCA frameworks and components are registered
and initialized.  The BTLs are registered/initialized prior to the MTL components even getting registered.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-05-09 14:50:31 -06:00
Brian Barrett
0dc2325297
Merge pull request #7641 from dancejic/multi-NIC
Added multi-NIC support to provider selection
2020-05-07 15:24:41 -07:00
Shintaro Iwasaki
0fc2033c75 opal/mca/threads/argobots: fix compilation error
Fixes #7697

Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
2020-05-07 16:07:12 +00:00
Ralph Castain
120cd31aaa
Update PMIx to fix some bugs
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-06 15:53:11 -07:00
Ralph Castain
ebd164b4c1
Update PMIx and PRRTE
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-06 12:40:11 -07:00
Nathan Hjelm
9d8f634044 btl/vader: rename vader -> sm
Now that the old sm btl has been gone for some time there was a request
to rename vader to sm. This commit does just that (reluctantly).

An alias has been generated so specifying vader in the btl selection
variable or specifying vader parameters will continue to work.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-05-05 06:43:19 -07:00
Nathan Hjelm
9fae5bfdf3 mca/base: add support for component aliasing
This commit adds support for aliasing component names. A component
name alias is created by calling: mca_base_alias_register. The name
of the project and framework are optional. The component name and
component alias are required. Once an alias is registered all
variables registered after the alias creation will have synonyms
also registered. For example:

```c
mca_base_alias_register("opal", "btl", "vader", "sm", false);
```

would cause all of the variables registered by btl/vader to have
aliases that start with btl_sm. Ex: btl_vader_single_copy_mechanism
would have the synonym: btl_sm_single_copy_mechanism.

If aliases are registered before component filtering the alias
can also be used for component selection. For example, if sm is
registered as an alias to vader in the btl framework register
function then `--mca btl self,sm` would be equivalent to
`--mca btl self,vader`.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-05-05 06:43:19 -07:00
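
The aliasing behavior described in commit 9fae5bfdf3 can be illustrated with a small standalone sketch. This is not the Open MPI implementation: the `toy_alias_*` functions, the fixed-size table, and the omission of the project name are assumptions made for illustration only. It shows the two effects the message describes, resolving an alias during component selection and deriving a variable synonym.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ALIASES 8

/* One record: within `framework`, `alias` is another name for `component`.
 * Hypothetical structure, not taken from Open MPI. */
struct toy_alias {
    const char *framework;
    const char *component;
    const char *alias;
};

static struct toy_alias toy_aliases[TOY_MAX_ALIASES];
static int toy_alias_count = 0;

/* Record an alias. Returns 0 on success, -1 if the table is full. */
static int toy_alias_register(const char *framework, const char *component,
                              const char *alias)
{
    assert(framework != NULL && component != NULL && alias != NULL);
    if (toy_alias_count >= TOY_MAX_ALIASES) {
        return -1;
    }
    toy_aliases[toy_alias_count].framework = framework;
    toy_aliases[toy_alias_count].component = component;
    toy_aliases[toy_alias_count].alias = alias;
    toy_alias_count++;
    return 0;
}

/* Map a name used for component selection to the real component name.
 * Names with no registered alias are returned unchanged. */
static const char *toy_alias_resolve(const char *framework, const char *name)
{
    for (int i = 0; i < toy_alias_count; i++) {
        if (strcmp(toy_aliases[i].framework, framework) == 0 &&
            strcmp(toy_aliases[i].alias, name) == 0) {
            return toy_aliases[i].component;
        }
    }
    return name;
}

/* Write the synonym of variable `var` into `out`, replacing the
 * "<framework>_<component>_" prefix with "<framework>_<alias>_".
 * Returns 0 on success, -1 if no alias applies or `out` is too small. */
static int toy_alias_synonym(const char *framework, const char *var,
                             char *out, size_t outlen)
{
    char prefix[128];

    for (int i = 0; i < toy_alias_count; i++) {
        const struct toy_alias *a = &toy_aliases[i];
        if (strcmp(a->framework, framework) != 0) {
            continue;
        }
        int plen = snprintf(prefix, sizeof prefix, "%s_%s_",
                            a->framework, a->component);
        if (plen < 0 || (size_t)plen >= sizeof prefix) {
            continue;
        }
        if (strncmp(var, prefix, (size_t)plen) != 0) {
            continue;
        }
        int n = snprintf(out, outlen, "%s_%s_%s",
                         a->framework, a->alias, var + plen);
        return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
    }
    return -1;
}
```

After `toy_alias_register("btl", "vader", "sm")`, resolving `sm` in the `btl` framework yields `vader`, and `btl_vader_single_copy_mechanism` gains the synonym `btl_sm_single_copy_mechanism`, mirroring the example in the commit message.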
Ralph Castain
10c93a10e2
Ensure proper handling of default MCA param files
Update PMIx/PRRTE to ensure we pickup the default system and user MCA
param definitions during PMIx_server_setup_application so they get
propagated. Protect OPAL's MCA var processing so it doesn't try to
process a NULL filename when PMIx provides the params for it.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-05-01 12:02:10 -07:00
Nikola Dancejic
167d75b42a common/ofi: Added multi-NIC support to provider selection
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs are available that share the same cpuset, the selection process
will create a list of all available NICs and make a selection based on
(local rank) % (number of NICs)

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
2020-05-01 01:05:13 +00:00
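
The selection rule described in commit 167d75b42a can be sketched as follows. This is a simplified illustration, not the `common/ofi` code: `struct toy_nic`, `toy_select_nic`, and the use of a plain bitmask for the cpuset are assumptions, and "shares the same cpuset" is modeled here as a non-empty bitmask intersection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical NIC description: bit i of `cpuset` set means the NIC
 * is local to CPU i. */
struct toy_nic {
    const char *name;
    unsigned long cpuset;
};

/* Return the index of the selected NIC in `nics`, or -1 if there are none.
 * NICs whose cpuset overlaps the process cpuset are preferred; the choice
 * among them is (local rank) % (number of such NICs). If no NIC overlaps,
 * fall back to (local rank) % (number of all NICs). */
static int toy_select_nic(const struct toy_nic *nics, size_t num_nics,
                          unsigned long proc_cpuset, unsigned int local_rank)
{
    size_t num_local = 0;

    assert(nics != NULL || num_nics == 0);
    if (num_nics == 0) {
        return -1;
    }

    /* First pass: count NICs that share CPUs with the process. */
    for (size_t i = 0; i < num_nics; i++) {
        if (nics[i].cpuset & proc_cpuset) {
            num_local++;
        }
    }

    /* No local NIC: select from the full list instead. */
    if (num_local == 0) {
        return (int)(local_rank % num_nics);
    }

    /* Second pass: walk to the chosen entry among the local NICs. */
    size_t target = local_rank % num_local;
    for (size_t i = 0; i < num_nics; i++) {
        if (nics[i].cpuset & proc_cpuset) {
            if (target == 0) {
                return (int)i;
            }
            target--;
        }
    }
    return -1; /* not reached: num_local > 0 guarantees a match above */
}
```

With this rule, processes on the same node spread round-robin across the NICs closest to them, and the fallback keeps the same round-robin behavior when locality information gives no match.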
Ralph Castain
bd29ab0ae9
Update dpm to handle deprecation of MPI_Info keys
Deprecate the current OMPI-specific MPI_Info key definitions for
MPI_Comm_spawn and replace them with their PMIx equivalents. Issue a
deprecation/conversion warning as this is done. Also issue deprecation
warnings for options such as "ompi_non_mpi" that are no longer used.

Handle both cases where the user might pass either the PMIx attribute
name itself (e.g., "PMIX_MAPBY") or the string value of the attribute
(e.g., PMIX_MAPBY, which translates to "pmix.mapby"). This can only be
done for PMIx v4 and above, so protect that code.

Silence a couple of Coverity warnings and add a test along the way.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-29 14:56:38 -07:00
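
The two accepted key forms described in commit bd29ab0ae9 can be sketched with a small lookup. This is an illustration only and does not reproduce the `dpm` code: `toy_normalize_key` and the table are hypothetical, and the table holds just the one pair named in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of a PMIx attribute name with its string value. */
struct toy_pmix_attr {
    const char *name;   /* the attribute name as text, e.g. "PMIX_MAPBY" */
    const char *value;  /* the string the attribute translates to */
};

/* Illustrative table; only the pair from the commit message is listed. */
static const struct toy_pmix_attr toy_attrs[] = {
    { "PMIX_MAPBY", "pmix.mapby" },
};

/* Accept either form of a key and return the attribute's string value,
 * or NULL if the key is not a known attribute in either form. */
static const char *toy_normalize_key(const char *key)
{
    size_t n = sizeof toy_attrs / sizeof toy_attrs[0];

    assert(key != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(key, toy_attrs[i].name) == 0 ||
            strcmp(key, toy_attrs[i].value) == 0) {
            return toy_attrs[i].value;
        }
    }
    return NULL;
}
```

Both `"PMIX_MAPBY"` and `"pmix.mapby"` normalize to `"pmix.mapby"`, while a key such as `"ompi_non_mpi"` returns `NULL` and would be handled by a separate path (for example, a deprecation warning).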
Ralph Castain
6146d52772
Sync PMIx
Remove pmix_config.h from the tarball. Deal with the case of no local
procs when register_nspace is called.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-29 09:00:04 -07:00
Ralph Castain
fd098d0eba
Sync PMIx and PRRTE
Remove prrte_config.h from tarball plus misc bug fixes

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-28 07:46:21 -07:00
Ralph Castain
ae90412098
Remove no-longer-used hwloc support fns
Remove a set of functions that were only used by ORTE as they are no
longer required. We can probably remove more of them with a little
cleanup in the rest of the code.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-25 21:05:07 -07:00
Ralph Castain
e1841fab17
Update PMIx
Fixes #7663

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-25 16:00:12 -07:00
Ralph Castain
29832798ef
Sync PMIx to pickup dmodex fix
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-23 13:52:32 -07:00
Ralph Castain
111f0a53ef
Sync PMIx and PRRTE
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-23 08:46:54 -07:00
Ralph Castain
3a15ab0ab5
Update PMIx to add missing include to tarball
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-22 20:13:18 -07:00
Ralph Castain
60c650e79b
Ensure "mpirun --version" reports as Open MPI
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-22 15:11:21 -07:00
Ralph Castain
6d29bbfde8
Cleanup heterogeneous builds
Consolidate the ompi_process_info and opal_process_info structs to
remove duplicate storage and conversion issues. Unwind some interweaving
of include files using opal.h. Silence a couple of warnings.

For now, set the arch to local if PMIX_ARCH is not found.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-22 12:46:27 -07:00
Ralph Castain
336f44ecc3
Sync with PMIx and PRRTE masters
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-20 10:10:16 -07:00
Ralph Castain
f61774fb83
Update PMIx
Pickup fixes in the OMPI envar setting support

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-15 13:02:20 -07:00
Ralph Castain
3252d26183
Sync to PMIx and PRRTE master
- fix potential hang in direct modex
- add support for reachable framework in PRRTE/oob

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-14 10:05:47 -07:00
Ralph Castain
4ab74450d4
Update PRRTE and PMIx
- ensure we return timeout error status
- lots of various bug fixes

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-12 07:05:12 -07:00
Ralph Castain
02346ee6a2
Silence Coverity warning
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-09 07:09:22 -07:00
Ralph Castain
f32febd7f7
Update PMIx and PRRTE
PMIx:
- restore OPA support

PRRTE:
Restore support for several options
* -N for ppr:N:node
* INHERIT modifier for --map-by option, indicating that
  the spawned job should inherit the placement options
  of its parent. Only applicable to dynamically spawned
  jobs

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-08 09:24:44 -07:00
Ralph Castain
a210f8046f
Cleanup ompi/dpm operations
Do some code cleanup in the connect/accept code. Ensure that the OMPI
layer has access to the PMIx identifier for the process. Add macros for
converting PMIx names to/from strings. Cleanup a few of the simple test
programs. Add a little more info to a btl/tcp error message.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-08 08:37:25 -07:00
Ralph Castain
80568bb388
Update for support of PMIX_NUMA_RANK values
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-06 13:06:26 -07:00
Ralph Castain
0d52c2dad7
Sync updates for PMIx and PRRTE
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-06 08:42:13 -07:00
Ralph Castain
3fbfeabff2
Update PRRTE schizo framework
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-03 11:37:19 -07:00
Jeff Squyres
fc0f0b38fd
Merge pull request #7590 from jsquyres/pr/update-to-https
Update text references to HTTPS
2020-04-02 20:46:58 -04:00
Ralph Castain
44c97e3842
Update again and hope to fix the integration points
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-02 11:58:58 -07:00
Jeff Squyres
9687d5e867 Upgrade all www.open-mpi.org URLs to https
Found a handful of other URLs that weren't https-ized, so I updated
them, too (after verifying that they support https, of course).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-04-02 10:43:50 -04:00
Ralph Castain
50d05e7b64
Revert "Add extra libs to PRRTE binaries for external deps"
This reverts commit 1aabbe456d.

Update PMIx and PRRTE, plus PRRTE config integration

Cleanup how we pass the extra libs and LDFLAGS for linking against
external libevent, hwloc, and pmix installs.

Catch the flag indicating that PMIx provided the user-level default MCA
params so we don't go looking for them ourselves.

Cleanup misc config warnings

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-01 20:17:32 -07:00
Ralph Castain
538d2de860
Merge pull request #7566 from rhc54/topic/up2
Update PMIx and PRRTE
2020-03-31 10:29:19 -07:00
Nathan Hjelm
160ff188b8
Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf
configure: use -iquote for non-system include paths
2020-03-30 09:22:54 -07:00
Ralph Castain
f88f271054
Cleanup few errors associated with tool support
Properly mark/detect that a daemon sourced the event broadcast to avoid
reinjecting it into the PMIx server library. Correct the source field
for the event notify call on launcher ready.

Update event notification for tool support
Deal with a variety of race conditions related to tool reconnection to a
different server.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-29 11:58:43 -07:00
Ralph Castain
95db66d0c8
Fix typo in usnic btl
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-27 20:27:45 -07:00
Howard Pritchard
f136a20cae
Merge pull request #6578 from hppritcha/topic/thread_framework2
Implement a MCA framework for threads
2020-03-27 15:55:48 -06:00