31085 commits

Author SHA1 Message Date
Joseph Schuchart
91a94201d2 PML UCX: add SPC instrumentation for message size sent/received
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-09-28 15:12:24 +02:00
bosilca
08f68671db
Merge pull request #8060 from bosilca/fix/ialltoallw
Ensure every rank increments the non-blocking collective tag, even if it has no data to exchange.
2020-09-26 12:25:13 -04:00
Nathan Hjelm
920315611e
Merge pull request #8054 from hjelmn/kill_the_never_going_to_work_patcher_linux_component_to_prevent_future_confusion_as_to_its_effectiveness
patcher: remove the linux component
2020-09-24 19:27:02 -06:00
George Bosilca
96fea22cdd
Don't allow a rank to skip counting the collective if it has no data
to exchange.

This is the same logic as in 77eaa5c applied to ialltoallw.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-09-24 13:29:01 -04:00
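The reason a rank with no data must still count the collective can be sketched as follows. This is a minimal illustration, not the Open MPI implementation: `demo_comm_t`, `demo_reserve_nbc_tag`, `demo_start_collective`, and the tag constants are hypothetical names. It assumes each rank derives the tag from a local per-communicator call counter, so the counters stay in step only if every rank advances its counter on every call.

```c
#include <stdint.h>

/* Hypothetical tag window, for illustration only. */
#define DEMO_TAG_BASE  1000
#define DEMO_TAG_RANGE 64

/* Hypothetical per-communicator state held by each rank. */
typedef struct {
    uint32_t nbc_calls;   /* non-blocking collectives started so far */
} demo_comm_t;

/* Reserve the tag for the next non-blocking collective. */
static int demo_reserve_nbc_tag(demo_comm_t *comm)
{
    int tag = DEMO_TAG_BASE + (int)(comm->nbc_calls % DEMO_TAG_RANGE);
    comm->nbc_calls++;
    return tag;
}

/* Start a collective; the tag is reserved before the zero-count check. */
static int demo_start_collective(demo_comm_t *comm, int count)
{
    int tag = demo_reserve_nbc_tag(comm);
    if (0 == count) {
        return tag;   /* nothing to exchange, but the tag is still consumed */
    }
    /* A real implementation would post sends and receives with 'tag' here. */
    return tag;
}
```

If the zero-count check came before the reservation, a rank with nothing to exchange would fall one tag behind its peers, and every later collective on that communicator would be started with mismatched tags.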
Yossi Itigin
b532564643
Merge pull request #8041 from brminich/topic/shmem_scoll_fix
SHMEM/SCOLL: Fix inplace reductions
2020-09-23 13:56:10 +03:00
Mikhail Brinskii
dfe20e0472 SHMEM/SCOLL: Fix inplace reductions
Signed-off-by: Mikhail Brinskii <mikhailb@nvidia.com>
2020-09-23 10:06:36 +03:00
bosilca
21c9c666ab
Merge pull request #8039 from bosilca/fix/adapt
Fix some corner cases with ADAPT
2020-09-18 17:18:41 -04:00
George Bosilca
77eaa5c8b8
Keep the non-blocking collective tags globally in sync.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-09-18 12:52:14 -04:00
George Bosilca
c98e387a53
Many fixes and improvements to ADAPT
- Add support for fallback to previous coll module on non-commutative operations (#30)
- Replace mutexes by atomic operations.
- Use the correct nbc request type (for both ibcast and ireduce)
  * coll/base: document type casts in ompi_coll_base_retain_*
- add module-wide topology cache
- use standard instead of synchronous send and add mca parameter to control mode of initial send in ireduce/ibcast
- reduce number of memory allocations
- call the default request completion.
  - Remove the requests from the Fortran lookup conversion tables before completing
    and freeing them.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>

Co-authored-by: Joseph Schuchart <schuchart@hlrs.de>
2020-09-18 12:50:17 -04:00
Nathan Hjelm
7fca99b2f7 patcher: remove the linux component
The Linux component was an attempt to hook calls by patching the dynamic
symbol table. It, unfortunately, does not work as it will always miss
calls made internally by glibc. For example, it might catch a user call
directly to munmap but will miss the chain free -> munmap. Since the
latter is the common case we were trying to hook, this made the component
unusable. This PR finally kills the component.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-09-18 10:23:01 -06:00
Harumi Kuno
18baa5e291 use sync_send mask for ofi_create_recv_tag
The upper 2 bits of an ompi tag encode the synchronize send and
synchronize send ack.
Because the mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag
functions both use ompi_mtl_ofi.sync_proto_mask instead of
ompi_mtl_ofi.sync_send when generating their "ignore" masks, they hide
the ack bit, turning the tag into an "any tag receive".

This is an issue because ssend is implemented by doing a send and
receive internally.  So if there happens to be an outstanding receive
posted before the ssend, that receive will end up consuming the
internal message intended for the ssend's internal receive.

Updating mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag functions
to use ompi_mtl_ofi.sync_send fixes this.

Authored-by: John L. Byrne <john.l.byrne@hpe.com>

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2020-09-16 18:12:22 -06:00
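The effect of the two ignore masks can be sketched in isolation. This is an illustration under stated assumptions, not the MTL code: the bit positions, the macro names, and `tag_matches` are hypothetical, and the matching rule assumed is the usual one for tagged receives, where bits set in the ignore mask are excluded from the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag layout: the two upper bits carry protocol flags. */
#define DEMO_SYNC_SEND       (UINT64_C(1) << 63)
#define DEMO_SYNC_SEND_ACK   (UINT64_C(1) << 62)
#define DEMO_SYNC_PROTO_MASK (DEMO_SYNC_SEND | DEMO_SYNC_SEND_ACK)

/* A posted receive matches an incoming message when the tags agree on
 * every bit that is not set in the ignore mask. */
static bool tag_matches(uint64_t posted, uint64_t incoming, uint64_t ignore)
{
    return ((posted ^ incoming) & ~ignore) == 0;
}
```

With the full protocol mask as the ignore mask, a user receive posted for tag 7 also matches an incoming `7 | DEMO_SYNC_SEND_ACK`, which is the internal ack. With only the sync-send bit ignored, the same receive still accepts `7 | DEMO_SYNC_SEND` but no longer matches the ack.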
bosilca
eca00a7a3b
Merge pull request #8042 from bosilca/fix/sm_emu
Fix a copy/paste in the RDMA emulation.
2020-09-14 11:43:00 -04:00
Jeff Squyres
3a93e4f94d
Merge pull request #8038 from devreal/fix-opal-pmix-cond-init
Use correct conditional variable initializer in opal/mca/pmix/base
2020-09-14 09:38:43 -04:00
Jeff Squyres
d5791b2770
Merge pull request #8043 from ggouaillardet/topic/status_f2f08
mpif-h: fix a typo in MPI_Status_f2f08()
2020-09-14 09:32:34 -04:00
Gilles Gouaillardet
fb8bfccb83 mpif-h: fix a typo in MPI_Status_f2f08()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-09-14 13:56:16 +09:00
George Bosilca
49da998f33
Fix a copy/paste in the RDMA emulation.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-09-13 22:56:58 -04:00
Jeff Squyres
1b0dfcdfab
Merge pull request #7762 from ggouaillardet/topic/mpi_status_f08_c
Add missing MPI_Status conversion subroutines
2020-09-10 09:45:58 -04:00
Jeff Squyres
c04dc355de mpi/man: convert MPI_Status conversion man pages to Markdown
Convert the MPI_Status_f082f, MPI_Status_f082c, and MPI_Status_f2c man
pages to Markdown.  Fix some typos and improve the text a bit along
the way.

Left the raw NROFF redirect pages MPI_Status_f2f08, MPI_Status_c2f08,
and MPI_Status_c2f files as they were -- they're 1-line redirects, and
it seems simpler to leave those (vs. duplicating the Markdown).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-09-09 06:59:12 -07:00
Gilles Gouaillardet
e97d3ce645 Add missing MPI_Status conversion subroutines
Only in C bindings:
 - MPI_Status_c2f08()
 - MPI_Status_f082c()

In all bindings but mpif.h:
 - MPI_Status_f082f()
 - MPI_Status_f2f08()

and the PMPI_* related subroutines

As initially intended by the MPI forum, the Fortran to/from Fortran 2008
conversion subroutines are *not* implemented in the mpif.h bindings.
See the discussion at https://github.com/mpi-forum/mpi-issues/issues/298

Refs. open-mpi/ompi#1475

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-09-09 06:59:12 -07:00
Gilles Gouaillardet
466a2b31e0 configury: cleanup .mod file
Manually clean up the generated .mod file in OMPI_FORTRAN_CHECK_BIND_C_TYPE

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-09-09 06:59:12 -07:00
Gilles Gouaillardet
7fce2f3057 update MPI_F08_status type
Make the C MPI_F08_status type definition match the updated
mpi_f08 type(MPI_Status) definition.

This fixes the inconsistency introduced in open-mpi/ompi@98bc7af7d4

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-09-09 06:59:12 -07:00
Joseph Schuchart
b78c7e93db Use correct conditional variable initializer in opal/mca/pmix/base
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-09-09 09:05:30 +02:00
Joseph Schuchart
43e3addca6
Merge pull request #8035 from devreal/osc-ucx-fix-win-dynamic-segfault
UCX: do not dereference NULL pointer in wpmem_[free|flush]
2020-09-04 17:56:45 +02:00
Joseph Schuchart
fc025c78df UCX: do not dereference NULL pointer in wpmem_[free|flush]
Flushing or freeing a newly created dynamic window causes NULL to be passed.

Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-09-04 09:31:18 +02:00
Jeff Squyres
560ebc5780
Merge pull request #7716 from bosilca/coll/adapt
ADAPT: Event-driven collective implementation
2020-09-01 11:29:53 -04:00
Nathan Hjelm
01dcc39170
Merge pull request #8031 from hjelmn/some_btl_interface_cleanup
btl: remove unused descriptor flags
2020-08-31 16:29:12 -06:00
Nathan Hjelm
556a4ac0da btl: remove unused descriptor flags
This PR removes the MCA_BTL_DES_FLAGS_PUT and MCA_BTL_DES_FLAGS_GET
descriptor flags. At some point these had some meaning but they were
replaced by the rcache access flags.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-08-31 13:07:32 -06:00
Jeff Squyres
c17968c738
Merge pull request #8028 from devreal/fix-mpi3-manpage
Fix MPI versions in MPI.3 manpage
2020-08-31 09:15:32 -04:00
Joseph Schuchart
4d420348f7 Fix MPI versions in MPI.3 manpage
Thanks to Andy Riebs for reporting that on the Open MPI user mailing list (https://www.mail-archive.com/users@lists.open-mpi.org/msg34103.html)

Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-08-31 09:21:26 +02:00
bosilca
2b62a2b8c1
Merge pull request #8023 from abouteiller/bugfix/ob1_err_abort
errors_are_fatal_comm_handler takes a pointer to the error constant
2020-08-29 11:50:57 -04:00
Aurelien Bouteiller
4df5fcf48c
errors_are_fatal_comm_handler takes a pointer to the error constant as
input.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-08-26 16:05:30 -04:00
Howard Pritchard
c1c71b22b9
Merge pull request #8002 from hppritcha/topic/ofi_gni_prov_patch_for_mtl
OFI: patch OFI MTL for GNI provider
2020-08-26 12:30:50 -06:00
Jeff Squyres
8727e981b2
Merge pull request #8015 from hppritcha/topic/squash_icc_no_log_warning
suppress icc long double message
2020-08-26 10:40:26 -04:00
Howard Pritchard
d6ac41cbbd OFI: patch OFI MTL for GNI provider
Uncovered a problem using the GNI provider with the OFI MTL.
See https://github.com/ofiwg/libfabric/issues/6194.

Related to #8001

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2020-08-26 08:25:53 -06:00
Brian Barrett
b1874e400e
Merge pull request #8019 from wckzhang/rsbandrsfix
coll/tuned: Revert RSB and RS default algorithms
2020-08-25 15:02:31 -07:00
William Zhang
57b95bcb45 coll/tuned: Revert RSB and RS default algorithms
Reduce scatter block and reduce scatter algorithms were hitting
correctness issues for non-commutative strided tests. We will revert to
the original default algorithms for those two collectives (basic linear
and non-overlapping respectively) in the non-commutative op case.

See #8010

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-08-25 08:44:24 -07:00
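Why non-commutative operations constrain the choice of algorithm can be sketched with a small example. It does not reproduce any coll/tuned algorithm; `affine_t`, `compose`, and `reduce_in_order` are hypothetical names. The operation used, composition of integer affine maps, is associative but not commutative, so regrouping operands is safe while reordering them is not.

```c
/* Hypothetical operand type: the map x -> a*x + b. */
typedef struct {
    long a;
    long b;
} affine_t;

/* Compose two maps: apply lhs first, then rhs. */
static affine_t compose(affine_t lhs, affine_t rhs)
{
    affine_t r = { rhs.a * lhs.a, rhs.a * lhs.b + rhs.b };
    return r;
}

/* Reduce the operands strictly in index (rank) order. */
static affine_t reduce_in_order(const affine_t *v, int n)
{
    affine_t acc = v[0];
    for (int i = 1; i < n; i++) {
        acc = compose(acc, v[i]);
    }
    return acc;
}
```

For the operands (2,1), (3,0), (1,5), reducing in index order gives (6,8), and so does the regrouped form `compose(v[0], compose(v[1], v[2]))`. Swapping the first two operands gives (6,6) instead, which is the kind of discrepancy an algorithm that reorders contributions would produce.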
George Bosilca
ee592f3672 Address the comments on the PR.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
Xi Luo
e59bde912e Remove the code handling zero count cases in ADAPT.
Set request in ibcast.c to empty when the count is 0.

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
George Bosilca
c2970a3695 Correctly handle non-blocking collectives tags
As it is possible to have multiple outstanding non-blocking collectives
provided by different collective modules, we need a consistent
mechanism to allow them to select unique tags for each instance of a
collective.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
George Bosilca
8582e10d2b Consistent handling of zero counts in the MPI API.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
George Bosilca
d71264569e Fix the atomic management of the bcast and reduce freelist
API consistent with other collective modules
Add comments
Other minor cleanups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
bsergentm
a4be3bb93d Coll/adapt Bull (#15)
* piggybacking Bull functionalities

* coll/adapt: Fix naming conventions and C11 atomic use

This commit fixes some naming convention issues, such as function names
which should follow the naming ompi_coll_adapt instead of
mca_coll_adapt, reserved for component and module naming (cf. tuned
collective component).

It also fixes the use of the _Atomic construct, which is only valid in C11.
OPAL constructs have already been adapted to that use, so use
opal_atomic_* types instead.

* coll/adapt: Remove unused component field in module

This commit removes an unneeded field referencing the component in the
module of adapt, as it is already available through the
mca_coll_adapt_component global variable.

Signed-off-by: Marc Sergent <marc.sergent@atos.net>
Co-authored-by: Lemarinier, Pierre <pierre.lemarinier@atos.net>
Co-authored-by: pierrele <31764860+pierrele@users.noreply.github.com>
2020-08-24 12:13:38 -07:00
Xi Luo
fe73586808 Add ADAPT module
Add comments in the ADAPT module

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
Howard Pritchard
6df0e53421 suppress icc long double message
Improve configury to check whether icc accepts the -Wno-long-double option.
This prevents seeing 100s of messages like this:

icc: command line warning #10148: option '-Wno-long-double' not supported

A similar patch will be needed for pmix.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2020-08-19 21:38:11 +00:00
Howard Pritchard
eefaadf7f1
Merge pull request #8012 from hppritcha/topic/mprobe_with_ofi_fix
ofi mtl: fix problem with mrecv
2020-08-18 17:21:37 -06:00
Howard Pritchard
e6f81ed6d6 ofi mtl: fix problem with mrecv
The OFI MTL mrecv was not properly setting the message in/out
argument of MPI_MRECV to MPI_MESSAGE_NULL.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2020-08-18 15:39:19 -06:00
Jeff Squyres
bf4e1b4376
Merge pull request #8008 from jsquyres/pr/cleanup-of-mpi-errors-and-exceptions
Cleanup of MPI errors and exceptions
2020-08-17 16:41:25 -04:00
Jeff Squyres
20c772e733 Cleanup language about MPI exceptions --> errors
MPI-4 is finally cleaning up its language: an MPI "exception" does not
actually exist.  The only thing that exists is an MPI "error" (and
associated handlers).  This commit replaces all relevant uses of the
word "exception" with "error".  Note that this is still applicable in
versions of the MPI standard less than MPI-4.0 (indeed, nearly all the
cases fixed in this commit are just changes to comments, anyway).

One exception to this is the Java bindings, where there's an
MPIException class.  In hindsight, it probably should have been named
MPIError, but changing it now would break anyone who is using the Java
bindings.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-08-17 13:57:47 -04:00
Jeff Squyres
1e11933660 ompi: cleanup C++ MPI::ERRORS_THROW_EXCEPTIONS
The C++ bindings were removed a while ago;
MPI::ERRORS_THROW_EXCEPTIONS and MPI_ERRORS_THROW_EXCEPTIONS no longer
exist.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-08-17 13:52:02 -04:00
Brian Barrett
f3832c1ab9
Merge pull request #7973 from wckzhang/btlexclude
btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
2020-08-12 13:34:03 -07:00