1
1

988 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
9ffac85650 build: Move libevent to a 3rd-party package
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as
MCA components and instead 1) start relying on external libraries
whenever possible and 2) Open MPI builds the 3rd party
libraries (if needed) as independent libraries, rather than
linked into libopen-pal.

This patch moves libevent from an MCA framework to a stand-alone
library built outside of OPAL.  A wrapper in opal/util is provided
to minimize the unnecessary changes in the rest of the code.  When
using the internal Libevent, it will be installed as a stand-alone
libevent.a, instead of bundled in OPAL.  Any pre-installed version
of Libevent at or after 2.0.21 is preferred over the internal
version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-10-01 16:55:58 +00:00
George Bosilca
49da998f33
Fix a copy/paste in the RDMA emulation.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-09-13 22:56:58 -04:00
Nathan Hjelm
556a4ac0da btl: remove unused descriptor flags
This PR removes the MCA_BTL_DES_FLAGS_PUT and MCA_BTL_DES_FLAGS_GET
descriptor flags. At some point these had some meaning but they were
replaced by the rcache access flags.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-08-31 13:07:32 -06:00
Brian Barrett
f3832c1ab9
Merge pull request #7973 from wckzhang/btlexclude
btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
2020-08-12 13:34:03 -07:00
William Zhang
41acfee2bb btl/ofi: Disable ofi_rxm provider
The ofi_rxm provider is dependent upon the underlying hardware for its
implementation of FI_DELIVERY_COMPLETE. Since this can lead to early
completions, we disable the provider to avoid correctness issues.

This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-08-11 16:47:19 -07:00
bosilca
b6a06ca37b
Merge pull request #7974 from abouteiller/bugfix/vader_des_tag
bug fix: des->tag = hdr->frag, should be hdr->tag
2020-08-11 11:13:14 -04:00
William Zhang
9b8f463a76 btl/ofi: Use common provider include/exclude list
The btl/ofi does not currently utilize the common ofi include/exclude
list. Added verification code similar to the mtl/ofi that will check if
the info object is in the include or exclude list. If it isn't in the
include list or is in the exclude list, validate_info will return
OPAL_ERROR. The btl/ofi will no longer pass a provider name as a hint
when calling getinfo, instead filtering the provider during
validate_info.

This patch also moves the is_in_list MTL function into common code and
adds additional debugging output to the BTL to match the MTL standard.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-07-31 12:13:00 -07:00
William Zhang
a7dcfd9874 btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
EFA incorrectly implements FI_DELIVERY_COMPLETE in earlier libfabric
versions. While FI_DELIVERY_COMPLETE would be advertised by the
provider, completions would return too early by not accounting for
bounce buffers on the receive side. This would cause the BTL
to receive early completions that lead to correctness issues.

This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-07-30 13:53:16 -07:00
Aurelien Bouteiller
8e0cb1d49d
des->tag = hdr->frag, should be hdr->tag
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-30 14:02:22 -04:00
George Bosilca
8bc1f3d8fb Don't allow any asynchronous CUDA operations.
There are 2 reasons for this:
- pending CUDA events are not progressed by this BTL, so anything that becomes
  asychronous will never be completed.
- we use the packed data on the shared memory backing file, and this will be
  returned to the peer process upon return (thus if we copy asynchronously we
  might not copy the right data).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-15 01:37:09 -04:00
George Bosilca
fd4ca394e2 Make the smcuda BTL great again.
It has been broken for months because of the lack of initialization of the
HWLOC library. The smcuda process creating the backing file (local rank 0)
uses opal_cache_line_size to align the objects in the backing file, and the
opal_cache_line_size is initialized by default to 128. Later on, when the rest
of the processes attach the same backing file, HWLOC has been called and the
cache size has now been updated to the correct value. If this value is
different than the default one (and they are as most cache sizes are 64 bytes
right now) the objects in the backing file will be misaligned.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-14 01:48:08 -04:00
George Bosilca
96e8cbe25f First step on fixing the BTL API conversion for the SMCUDA BTL
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-13 14:46:10 -04:00
Nathan Hjelm
d0c0cb7144
Merge pull request #7913 from hjelmn/btl_base_atomics_are_awesome
btl: change argument type of BTL receive callbacks
2020-07-11 12:13:26 -06:00
Nathan Hjelm
88f51fbb8e btl: change argument type of BTL receive callbacks
This commit updates the btl interface to change the parameters
passed to receive callbacks. The interface used to pass the tag,
a btl base descriptor, and the callback context. Most of the
values in the btl base descriptor were unused and only helped
simplify the callbacks from the self btl. All of the arguments
have now been replaced with a single receive callback descriptor.
This descriptor contains the incoming endpoint, data segment(s),
tag, and callback context. All btls have been updated to use
the new callback and the btl interface version has been bumped
to v3.2.0.

As part of this change the descriptor argument (and the segments
contained within it) have been marked as const. The were treated
as const before but this change could allow the compiler to make
better optimization decisions and will enforce that the callback
does not attempt to change the data in the descriptor.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-07-08 07:38:46 -07:00
tomhers
88f9d2c90f BTL/OFI: Fix missing include file.
The missing include file causes an error when using an external version of LibEvent.

Signed-off-by: tomhers <tom.herschberg@gmail.com>
2020-07-06 16:32:37 -04:00
Christoph Niethammer
f0f206b247
Merge pull request #7673 from cniethammer/uct-supported-version-update
Accept UCX 1.8 in configure of btl/uct
2020-06-26 20:53:36 +02:00
Joseph Schuchart
ae3974d249 Add missing free call to mca_btl_tcp_component_exchange
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 14:30:07 +02:00
Harumi Kuno
f1b21cb776 mtl_btl_ofi_rcache_init() before creating domain
mtl_btl_ofi_rcache_init() initializes patcher which should only take
place things are single threaded.  OFI providers may start spawn threads,
so initialize the rcache before creating OFI objects to prevent races.

Authored-by: John L. Byrne <john.l.byrne@hpe.com>
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2020-06-03 09:56:29 -06:00
Austen Lauria
9996b9f54d
Merge pull request #7720 from abouteiller/bugfix/tcp-failed-lock
Race condition when closing TCP endpoint with error
2020-05-13 16:52:21 -04:00
Aurelien Bouteiller
0e93d0f647
Bugfix: when a TCP socket is closed in error, it could update the
endpoint state without holding the endpoint lock, resulting in a race
condition.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-05-11 01:11:05 -04:00
Howard Pritchard
9f1081a07a add a common ofi whitelist/blacklist
also add common verbose variable.

Note the verbosity thing is a little tricky owing to the way the MCA frameworks and components are registered and
and initialized.  The BTL's are registered/initialized prior to the MTL components even getting registered.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-05-09 14:50:31 -06:00
Brian Barrett
0dc2325297
Merge pull request #7641 from dancejic/multi-NIC
Added multi-NIC support to provider selection
2020-05-07 15:24:41 -07:00
Nathan Hjelm
9d8f634044 btl/vader: rename vader -> sm
Now that the old sm btl has been gone for some time there was a request
to rename vader to sm. This commit does just that (reluctantly).

An alias has been generated so specifying vader in the btl selection
variable or specifying vader parameters will continue to work.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-05-05 06:43:19 -07:00
Nikola Dancejic
167d75b42a common/ofi: Added multi-NIC support to provider selection
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs are available that share the same cpuset, the selection process
will create a list of all available NICs and make a selection based on
(local rank) % (number of NICs)

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
2020-05-01 01:05:13 +00:00
Christoph Niethammer
9b10f46126 Accept UCX 1.8 in configure of btl/uct
The configure script for the btl uct component reports an error for
the new UCX 1.8.0 versions as it was fixed up to UCX 1.7.

This fixes #7612

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
2020-04-30 09:28:10 +02:00
Ralph Castain
bd29ab0ae9
Update dpm to handle deprecation of MPI_Info keys
Deprecate the current OMPI-specific MPI_Info key definitions for
MPI_Comm_spawn and replace them with their PMIx equivalents. Issue a
deprecation/conversion warning as this is done. Also issue deprecation
warnings for options such as "ompi_non_mpi" that are no longer used.

Handle both cases where the user might pass either the PMIx attribute
name itself (e.g., "PMIX_MAPBY") or the string value of the attribute
(e.g., PMIX_MAPBY, which translates to "pmix.mapby"). This can only be
done for PMIx v4 and above, so protect that code.

Silence a couple of Coverity warnings and add a test along the way.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-29 14:56:38 -07:00
Ralph Castain
6d29bbfde8
Cleanup heterogeneous builds
Consolidate the ompi_process_info and opal_process_info structs to
remove duplicate storage and conversion issues. Unwind some interweaving
of include files using opal.h. Silence a couple of warnings.

For now, set the arch to local if PMIX_ARCH is not found.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-22 12:46:27 -07:00
Ralph Castain
a210f8046f
Cleanup ompi/dpm operations
Do some code cleanup in the connect/accept code. Ensure that the OMPI
layer has access to the PMIx identifier for the process. Add macros for
converting PMIx names to/from strings. Cleanup a few of the simple test
programs. Add a little more info to a btl/tcp error message.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-08 08:37:25 -07:00
Jeff Squyres
9687d5e867 Upgrade all www.open-mpi.org URLs to https
Found a handful of other URLs that weren't https-ized, so I updated
them, too (after verifying that they support https, of course).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-04-02 10:43:50 -04:00
Nathan Hjelm
160ff188b8
Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf
configure: use -iquote for non-system include paths
2020-03-30 09:22:54 -07:00
Ralph Castain
95db66d0c8
Fix typo in usnic btl
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-27 20:27:45 -07:00
Howard Pritchard
f136a20cae
Merge pull request #6578 from hppritcha/topic/thread_framework2
Implement a MCA framework for threads
2020-03-27 15:55:48 -06:00
Noah Evans
ee3517427e Add threads framework
Add a framework to support different types of threading models including
user space thread packages such as Qthreads and argobot:

https://github.com/pmodels/argobots

https://github.com/Qthreads/qthreads

The default threading model is pthreads.  Alternate thread models are
specificed at configure time using the --with-threads=X option.

The framework is static.  The theading model to use is selected at
Open MPI configure/build time.

mca/threads: implement Argobots threading layer

config: fix thread configury

- Add double quotations
- Change Argobot to Argobots
config: implement Argobots check

If the poll time is too long, MPI hangs.

This quick fix just sets it to 0, but it is not good for the
Pthreads version. Need to find a good way to abstract it.

Note that even 1 (= 1 millisecond) causes disastrous performance
degradation.

rework threads MCA framework configury

It now works more like the ompi/mca/rte configury,
modulo some edge items that are special for threading package
linking, etc.

qthreads module
some argobots cleanup

Signed-off-by: Noah Evans <noah.evans@gmail.com>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-03-27 10:15:45 -06:00
Ralph Castain
33ab928e1b ompi_proc_t size reduction: part 1
We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we had to therefore store it somewhere proc-local. Obviously, this ccarried a memory penalty for storing all those strings, and so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs.

Unfortunately, this still results in an 8-byte/proc memory cost as we have a char* pointer in the opal_proc_t that is contained in the ompi_proc_t so that we can store the hostname of the other procs if we fall below the cutoff. At scale, this can consume a fair amount of memory.

With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication.

All RM's are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving us 8-bytes/proc.

Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd version of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and free'd wherever used. This shouldn't be a burden as the hostname is only used in one of two circumstances:

(a) in an error message
(b) in a verbose output for debugging purposes

Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-23 12:49:44 -07:00
Austen Lauria
b560fc5fae
Merge pull request #7505 from hkuno/john.l.byrne/btl_ofi
Fix btl ofi clean-up logic
2020-03-23 10:10:33 -04:00
Jeff Squyres
1870b04017 usnic: remove typo
Remove an amusing -- but harmless -- typo.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-03-20 11:16:52 -07:00
Ralph Castain
6b4fb509e9
Cleanup singleton detection and data retrieval
Extend the PMIx modex recv macros to cover the full set of
immediate/optional combinations. If PMIx_Init cannot reach a server,
then declare the MPI proc to be a singleton.

Provide full support for info values via PMIx

Catch all the values used in the "info" area of OMPI using data
available from PMIx instead of via envars. Update PMIx and PRRTE to sync
with their capabilities.

PMIx
- ensure cleanup of fork/exec children
- fix bug in gds/hash that left app info off of list

PRRTE
- fix multi-app bugs
- port setup_child logic from orte
- OMPI env changes
- set app->first_rank
- ensure common hostname across prun, prte, and pmix
- Fix "nolocal" support

Silence a warning from btl/vader

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-16 12:25:28 -07:00
Harumi Kuno
ab4875ddc2 set ep to NULL to avoid double close
Per suggestion of @awlauria

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2020-03-10 17:39:59 -06:00
Gilles Gouaillardet
69bc2e8372 misc: fix <> vs "" includes throught the ompi codebase
This commit fixes an issue with the include usage in some
ompi source files. These source files are using the <> form
of include when the "" form is correct (as these are internal,
**not** system headers).

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-03-09 21:13:49 -04:00
Harumi Kuno
1bc3dab118 Add comments about order of close ops
Per suggestion of @awlauria, added some comments about
the need to free ep before resources it points to.

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2020-03-07 14:08:39 -07:00
harumi kuno
3095fabf94 Fix mca_btl_ofi_finalize clean-up logic
This fix is from John L. Byrne (john.l.byrne@hpe.com).

When OFI Libfabric binds objects to endpoints, before the object can
be successfully closed, the endpoint must first be freed.  For scalable
endpoints, objects can also be bound to transmit and receive contexts,
and for objects that are bound to contexts, we need to first free the
contexts before freeing the endpoint. We also need to clear the memory
registration cache.

If we don't clean up properly, then fi\_close may not be able to close
the domain because the dom will have a non-zero ref count.

Signed-off-by: harumi kuno <harumi.kuno@hpe.com>
2020-03-04 17:51:08 -07:00
Austen Lauria
f69c8d6819 Fix segv in btl/vader.
Keep track of the connected procs in vader_add_procs().
Otherwise, the same rank will reconnect the same shmem
segment (rank 0+...) multiple times instead of the next
one as intended.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-03-04 09:32:58 -05:00
bosilca
c4d36859ec
Merge pull request #7228 from devreal/progress-returns
Harmonize return values of progress callbacks
2020-02-28 20:15:37 -05:00
Ralph Castain
9e2db26732
Fix vader local modex
Restrict the search to the "immediate" range so at worst we check with
our local server and don't go up to the host daemon.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-27 18:15:34 -08:00
Jeff Squyres
cdf478e963 btl/sm: remove the deprecation-notice shell
The SM BTL was effectively removed a long time ago.  All that was left
was a shell that warned people if they tried to use the SM BTL.  For
v5.0, we plan to finally remove this ancient shell (and possibly
replace it with vader).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-25 11:48:42 -05:00
Gilles Gouaillardet
174e967dbc
Remove ORTE project
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.

remove orte: misc fixes

 - UCX fixes
 - VPATH issue
 - oshmem fixes
 - remove useless definition
 - Add PRRTE submodule
 - Get autogen.pl to traverse PRRTE submodule
 - Remove stale orcm reference
 - Configure embedded PRRTE
 - Correctly pass the prefix to PRRTE
 - Correctly set the OMPI_WANT_PRRTE am_conditional
 - Move prrte configuration to the end of OMPI's configure.ac
 - Make mpirun a symlink to prun, when available
 - Fix makedist with --no-orte/--no-prrte option
 - Add a `--no-prrte` option which is the same as the legacy
   `--no-orte` option.
 - Remove embedded PMIx tarball. Replace it with new submodule
   pointing to OpenPMIx master repo's master branch
 - Some cleanup in PRRTE integration and add config summary entry
 - Correctly set the hostname
 - Fix locality
 - Fix singleton operations
 - Fix support for "tune" and "am" options

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-02-07 18:20:06 -08:00
Austen Lauria
824dbcbcf3 Protect use of _Static_assert().
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-02-04 13:46:58 -05:00
Joseph Schuchart
2c97187ee0 Harmonize return values of progress callbacks
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-01-28 20:15:03 +01:00
Aurelien Bouteiller
9f4365fef6
Merge pull request #6783 from abouteiller/export/macos-epipe
Prevent EPIPE on OSX.
2020-01-28 11:18:46 -05:00
Brian Barrett
fc8c7a5869
Merge pull request #7134 from wckzhang/btl_tcp_interface_match
btl tcp: Use reachability and graph solving for global interface matching
2020-01-27 15:38:49 -08:00