1
1

7211 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
26e42f9a0c op/avx: check for _mm512_mullo_epi64() AVX512 intrinsic
PGI (20.4) compiler do not define this intrinsic, so only build
AVX512 support if _mm512_mullo_epi64() intrisic is defined.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-11-04 14:45:03 +09:00
Yossi Itigin
1f3e33441c
Merge pull request #8140 from hoopoepg/topic/pml-ucx-recv-improved-errhandling
PML/UCX: improved error processing in MPI_Recv
2020-11-03 13:08:42 +02:00
Jeff Squyres
e9e5dab8b9
Merge pull request #8153 from dancejic/multi
Using package_rank to select between NIC of equal distance from the process.
2020-11-02 15:27:37 -05:00
Sergey Oblomov
eb9405d53f PML/UCX: improved error processing in MPI_Recv
- improved error processing in MPI_Recv implementation
  of pml UCX
- added error handling for pml_ucx_mrecv call

Signed-off-by: Sergey Oblomov <sergeyo@nvidia.com>
2020-11-02 11:25:28 +02:00
Nikola Dancejic
8017f12801 Using package_rank to select between NIC of equal distance from the process.
If PMIX_PACKAGE_RANK is available, uses this value to select between multiple
NIC of equal distance between the current process. If this value is not
available, try to calculate it by getting the locality string from each local
process and assign a package_rank. If everything fails, fall back to using
process_id.rank to select the NIC. This last case is not ideal, but has a small
chance of occuring, and causes an output to be displayed to notify that this is
occuring.

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
2020-11-02 00:32:03 -08:00
Jeff Squyres
ee405ccaa5 coll/adapt and coll/han: fix trivial compiler warnings
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-10-30 10:41:14 -04:00
George Bosilca
154117515a Fix HAN issues reported by Coverity.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-10-28 15:44:16 -04:00
Howard Pritchard
487bbf31ba
Merge pull request #8052 from hkuno/john.l.byrne/ofi_create_recv_tag_mask
use sync_send mask for ofi_create_recv_tag
2020-10-26 15:56:35 -06:00
Raghu Raja
2e5303f618
Merge pull request #8124 from rajachan/peek-probe-errorflow
mtl/ofi: Fix erroneous FI_PEEK/FI_CLAIM usage
2020-10-26 10:42:12 -07:00
bosilca
ce97090673
Merge pull request #7735 from bosilca/coll/han
A hierarchical, architecture-aware collective communication module
2020-10-26 00:07:03 -04:00
George Bosilca
16b49dc5b3 A complete overhaul of the HAN code.
Among many other things:
- Fix an imbalance bug in MPI_allgather
- Accept more human readable configuration files. We can now specify
  the collective by name instead of a magic number, and the component
  we want to use also by name.
- Add the capability to have optional arguments in the collective
  communication configuration file. Right now the capability exists
  for segment lengths, but is yet to be connected with the algorithms.
- Redo the initialization of all HAN collectives.

Cleanup the fallback collective support.
- In case the module is unable to deliver the expected result, it will fallback
  executing the collective operation on another collective component. This change
  make the support for this fallback simpler to use.
- Implement a fallback allowing a HAN module to remove itself as
  potential active collective module, and instead fallback to the
  next module in line.
- Completely disable the HAN modules on error. From the moment an error is
  encountered they remove themselves from the communicator, and in case some
  other modules calls them simply behave as a pass-through.

Communicator: provide ompi_comm_split_with_info to split and provide info at the same time
Add ompi_comm_coll_preference info key to control collective component selection

COLL HAN: use info keys instead of component-level variable to communicate topology level between abstraction layers
- The info value is a comma-separated list of entries, which are chosen with
  decreasing priorities. This overrides the priority of the component,
  unless the component has disqualified itself.
  An entry prefixed with ^ starts the ignore-list. Any entry following this
  character will be ingnored during the collective component selection for the
  communicator.
  Example: "sm,libnbc,^han,adapt" gives sm the highest preference, followed
  by libnbc. The components han and adapt are ignored in the selection process.
- Allocate a temporary buffer for all lower-level leaders (length 2 segments)
- Fix the handling of MPI_IN_PLACE for gather and scatter.

COLL HAN: Fix topology handling
 - HAN should not rely on node names to determine the ordering of ranks.
   Instead, use the node leaders as identifiers and short-cut if the
   node-leaders agree that ranks are consecutive. Also, error out if
   the rank distribution is imbalanced for now.

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-10-25 18:13:16 -04:00
Raghu Raja
39f8a86b65 mtl/ofi: Fix erroneous FI_PEEK/FI_CLAIM usage
The current iprobe/improbe implementations merely checks the return
code on the posted receive operation to tell if there is a match or
not. This commit moves the check to the probe's error callback
instead. Per the semantics defined in libfabric, the peek operation is
asynchronous and the results are to be fetched from the completion
queue. If no message is found matching the tags specified in the peek
request, then a completion queue error entry with err field set to
FI_ENOMSG will be available.

Signed-off-by: Raghu Raja <craghun@amazon.com>
2020-10-23 17:07:16 +00:00
Raghu Raja
415dddb9af mtl/ofi: Do not fail if error CQ is empty
In multi-threaded scenarios, any thread that attempts to read a CQ
when there's a pending error CQ entry gets an -FI_EAVAIL. Without
any serialization here (which is okay, since libfabric will protect
access to critical CQ objects), all threads proceed to read from the
error CQ, but only one thread fetches the entry while others get
-FI_EAGAIN indicating an empty queue, which is not erroneous.

Signed-off-by: Raghu Raja <craghun@amazon.com>
2020-10-21 21:27:29 +00:00
Nysal Jan K A
992e8f91bb
Merge pull request #8105 from AboorvaDevarajan/fix_zero2
pml/ucx: fix zero sized datatype transfers
2020-10-20 19:53:15 +05:30
Austen Lauria
23269ccaa6
Merge pull request #8079 from markalle/exported_symbol_names
symbol pollution
2020-10-13 11:48:43 -04:00
Aboorva Devarajan
202b81d95c pml/ucx: fix zero sized datatype transfers
Signed-off-by: Aboorva Devarajan <aburvadevarajan@gmail.com>
2020-10-13 01:58:50 -04:00
bsergentm
220b997a58 Coll/han Bull
* first import of Bull specific modifications to HAN

* Cleaning, renaming and compilation fixing Changed all future into han.

* Import BULL specific modifications in coll/tuned and coll/base

* Fixed compilation issues in Han

* Changed han_output to directly point to coll framework output.

* The verbosity MCA parameter was removed as a duplicated of coll verbosity

* Add fallback in han reduce when op cannot commute and ppn are imbalanced

* Added fallback wfor han bcast when nodes do not have the same number of process

* Add fallback in han scatter when ppn are imbalanced

+ fixed missing scatter_fn pointer in the module interface

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
Co-authored-by: a700850 <pierre.lemarinier@atos.net>
Co-authored-by: germainf <florent.germain@atos.net>
2020-10-09 14:17:46 -04:00
Xi Luo
182c333b21 Initial import of the HAN collective module
a hierarchical, architecture-aware collective communication module.

Add Reduce and remove up_seg_size and low_seg_size in Bcast
Increase HAN's priority

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-10-09 14:17:46 -04:00
bosilca
a541ab933d
Merge pull request #8066 from devreal/spc-pml-ucx
PML UCX: add SPC instrumentation for sent/received message sizes
2020-10-07 21:56:27 -04:00
Mark Allen
34b42b1b09 symbol pollution
Made a couple vars static if they didn't look like they were used
more than one place, and added prefixes to a few.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2020-10-07 13:04:39 -04:00
bosilca
9f114c6afe
Merge pull request #8067 from devreal/fix-ob1-spc
PML OB1: Fix potential overcounting of SPC sent/received message sizes and account for fin messages
2020-10-05 22:26:24 -04:00
Brian Barrett
0e9581d478 build: Move hwloc to a 3rd-party package
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as
MCA components and instead 1) start relying on external libraries
whenever possible and 2) Open MPI builds the 3rd party
libraries (if needed) as independent libraries, rather than
linked into libopen-pal.

This patch moves the hwloc library bundled with Open MPI from a
MCA framework to a stand-alone library built outside of OPAL.  Due
to the amount of code in the MCA base (and its assumptions about
being part of an MCA framework), the framework is left with no
active components.  Any pre-installed version of HWLOC 1.6 or
newer is preferred over the internal version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-10-01 16:55:59 +00:00
Brian Barrett
9ffac85650 build: Move libevent to a 3rd-party package
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as
MCA components and instead 1) start relying on external libraries
whenever possible and 2) Open MPI builds the 3rd party
libraries (if needed) as independent libraries, rather than
linked into libopen-pal.

This patch moves libevent from an MCA framework to a stand-alone
library built outside of OPAL.  A wrapper in opal/util is provided
to minimize the unnecessary changes in the rest of the code.  When
using the internal Libevent, it will be installed as a stand-alone
libevent.a, instead of bundled in OPAL.  Any pre-installed version
of Libevent at or after 2.0.21 is preferred over the internal
version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-10-01 16:55:58 +00:00
Nathan Hjelm
1f5ed0b83d
Merge pull request #8070 from devreal/osc-page-align
OSC RDMA: put memory for each process into separate pages
2020-09-29 07:54:45 -06:00
Joseph Schuchart
52b52b8ebb OSC RDMA: only touch pages before memory registration, don't fill them
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-09-29 07:45:07 +02:00
Joseph Schuchart
d11ccbada9 OSC RDMA: put memory of each process into separate pages
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-09-29 07:44:17 +02:00
Joseph Schuchart
1fdf05f634 pml/ob1: fix SPC potential over-counting when sending ack and requesting put
mca_pml_ob1_recv_request_put_frag is used to request a put from the peer if get fails
mca_pml_ob1_recv_request_ack_send_btl is used to send an acknowledgement, not data

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-09-28 15:38:47 +02:00
Joseph Schuchart
a9ed53aa66 pml/ob1: add SPC instrumentation of sent fin messages
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-09-28 15:19:29 +02:00
Joseph Schuchart
91a94201d2 PML UCX: add SPC instrumentation for message size sent/received
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-09-28 15:12:24 +02:00
George Bosilca
96fea22cdd
Don't allow some rank to don't count the collective if they have no data
to exchange.

This is the same logic as in 77eaa5c applied to ialltoallw.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-09-24 13:29:01 -04:00
George Bosilca
77eaa5c8b8
Keep the non-blocking collective tags globally in sync.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-09-18 12:52:14 -04:00
George Bosilca
c98e387a53
Many fixes and improvements to ADAPT
- Add support for fallback to previous coll module on non-commutative operations (#30)
- Replace mutexes by atomic operations.
- Use the correct nbc request type (for both ibcast and ireduce)
  * coll/base: document type casts in ompi_coll_base_retain_*
- add module-wide topology cache
- use standard instead of synchronous send and add mca parameter to control mode of initial send in ireduce/ibcast
- reduce number of memory allocations
- call the default request completion.
  - Remove the requests from the Fortran lookup conversion tables before completing
    and free it.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>

Co-authored-by: Joseph Schuchart <schuchart@hlrs.de>
2020-09-18 12:50:17 -04:00
Harumi Kuno
18baa5e291 use sync_send mask for ofi_create_recv_tag
The upper 2 bits of an ompi tag encode the synchronize send and
synchronize send ack.
Because the mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag
functions both use ompi_mtl_ofi.sync_proto_mask instead of
ompi_mtl_ofi.sync_send when generating their "ignore" masks, they hide
the ack bit, turning the tag into an "any tag receive"

This is an issue because ssend is implemented by doing a send and
receive internally.  So if there happens to be an outstanding posted
receive posted before the ssend, that receive will end up consuming the
internal message intended for the ssend's internal receive.

Updating mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag functions
to use ompi_mtl_ofi.sync_send fixes this.

Authored-by: John L. Byrne <john.l.byrne@hpe.com>

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2020-09-16 18:12:22 -06:00
Jeff Squyres
560ebc5780
Merge pull request #7716 from bosilca/coll/adapt
ADAPT: Event-driven collective implementation
2020-09-01 11:29:53 -04:00
Aurelien Bouteiller
4df5fcf48c
errors_are_fatal_comm_handler takes a pointer to the error constant as
input.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-08-26 16:05:30 -04:00
Howard Pritchard
c1c71b22b9
Merge pull request #8002 from hppritcha/topic/ofi_gni_prov_patch_for_mtl
OFI: patch OFI MTL for GNI provider
2020-08-26 12:30:50 -06:00
Howard Pritchard
d6ac41cbbd OFI: patch OFI MTL for GNI provider
Uncovered a problem using the GNI provider with the OFI MTL.
See https://github.com/ofiwg/libfabric/issues/6194.

Related to #8001

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2020-08-26 08:25:53 -06:00
William Zhang
57b95bcb45 coll/tuned: Revert RSB and RS default algorithms
Reduce scatter block and reduce scatter algorithms were hitting
correctness issues for non commutative strided tests. We will revert to
the original default algorithms for those two collectives (basic linear
and non overlapping respectively) in the non commutative op case.

See #8010

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-08-25 08:44:24 -07:00
George Bosilca
ee592f3672 Address the comments on the PR.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
Xi Luo
e59bde912e Remove the code handling zero count cases in ADAPT.
Set request in ibcast.c to empty when the count is 0.

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
George Bosilca
c2970a3695 Correctly handle non-blocking collectives tags
As it is possible to have multiple outstanding non-blocking collectives
provided by different collective modules, we need a consistent
mechanism to allow them to select unique tags for each instance of a
collective.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
George Bosilca
d71264569e Fix the atomic management of the bcast and reduce freelist
API consistent with other collective modules
Add comments
Other minor cleanups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
bsergentm
a4be3bb93d Coll/adapt Bull (#15)
* piggybacking Bull functionalities

* coll/adapt: Fix naming conventions and C11 atomic use

This commit fixes some naming convention issues, such as function names
which should follow the naming ompi_coll_adapt instead of
mca_coll_adapt, reserved for component and module naming (cf. tuned
collective component);

It also fixes the use of _Atomic construct, which is only valid in C11.
OPAL constructs have already been adapted to that use, so use
opal_atomic_* types instead.

* coll/adapt: Remove unused component field in module

This commit removes an unneeded field referencing the component in the
module of adapt, as it is already available through the
mca_coll_adapt_component global variable.

Signed-off-by: Marc Sergent <marc.sergent@atos.net>
Co-authored-by: Lemarinier, Pierre <pierre.lemarinier@atos.net>
Co-authored-by: pierrele <31764860+pierrele@users.noreply.github.com>
2020-08-24 12:13:38 -07:00
Xi Luo
fe73586808 Add ADAPT module
Add comments in the ADAPT module

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
Howard Pritchard
eefaadf7f1
Merge pull request #8012 from hppritcha/topic/mprobe_with_ofi_fix
ofi mtl: fix problem with mrecv
2020-08-18 17:21:37 -06:00
Howard Pritchard
e6f81ed6d6 ofi mtl: fix problem with mrecv
the ofi mtl mrecv was not properly setting the message in/out
arg to MPI_MRECV to MPI_MESSAGE_NULL.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2020-08-18 15:39:19 -06:00
Jeff Squyres
20c772e733 Cleanup language about MPI exceptions --> errors
MPI-4 is finally cleaning up its language: an MPI "exception" does not
actually exist.  The only thing that exists is an MPI "error" (and
associated handlers).  This commit replaces all relevant uses of the
word "exception" with "error".  Note that this is still applicable in
versions of the MPI standard less than MPI-4.0 (indeed, nearly all the
cases fixed in this commit are just changes to comments, anyway).

One exception to this is the Java bindings, where there's an
MPIException class.  In hindsight, it probably should have been named
MPIError, but changing it now would break anyone who is using the Java
bindings.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-08-17 13:57:47 -04:00
Jeff Squyres
9a0f661a66
Merge pull request #7975 from wckzhang/btlcommonlist
btl/ofi: Use common provider include/exclude list
2020-08-10 14:41:53 -04:00
Aurelien Bouteiller
efbc6ff6a5
Merge pull request #7798 from abouteiller/mpi-next/unbounderr-self
MPI-4 error handling: 'unbound' errors to MPI_COMM_SELF
2020-08-03 15:59:14 -04:00
Aurelien Bouteiller
ee149fcfcb
MPI3 (unchanged in 4) says that errors after MPI_REQUEST_FREE are FATAL
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-31 17:49:38 -04:00