
30946 Commits

Author SHA1 Message Date
Xi Luo
e59bde912e Remove the code handling zero count cases in ADAPT.
Set request in ibcast.c to empty when the count is 0.

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
George Bosilca
c2970a3695 Correctly handle non-blocking collectives tags
As it is possible to have multiple outstanding non-blocking collectives
provided by different collective modules, we need a consistent
mechanism to allow them to select unique tags for each instance of a
collective.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
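The tag-selection idea in commit c2970a3695 can be illustrated with a small sketch. This is not the code from that commit: the type sketch_comm_t, the function names, and the tag range are all hypothetical, and C11 <stdatomic.h> is used here only for brevity (the tree itself uses opal_atomic_* types). The assumption is that each communicator owns one shared counter, so every non-blocking collective instance draws its tag from the same sequence no matter which collective module started it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical tag window reserved for non-blocking collectives. */
enum { SKETCH_TAG_FIRST = 1000, SKETCH_TAG_COUNT = 4 };

/* Hypothetical per-communicator state: one counter shared by all modules. */
typedef struct {
    atomic_uint seq;
} sketch_comm_t;

static void sketch_comm_init(sketch_comm_t *comm)
{
    assert(comm != NULL);
    atomic_init(&comm->seq, 0u);
}

/* Hand out the next tag. The atomic fetch-and-add guarantees that two
 * concurrent callers never observe the same sequence number; the modulo
 * wraps the sequence back into the reserved window. */
static int sketch_next_nbc_tag(sketch_comm_t *comm)
{
    assert(comm != NULL);
    unsigned seq = atomic_fetch_add(&comm->seq, 1u);
    return SKETCH_TAG_FIRST + (int)(seq % SKETCH_TAG_COUNT);
}
```

Under these assumptions, tags stay distinct as long as fewer than SKETCH_TAG_COUNT collectives are outstanding on the communicator at once; a real window would be far larger than four.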
George Bosilca
8582e10d2b Consistent handling of zero counts in the MPI API.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
George Bosilca
d71264569e Fix the atomic management of the bcast and reduce freelist
API consistent with other collective modules
Add comments
Other minor cleanups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
bsergentm
a4be3bb93d Coll/adapt Bull (#15)
* piggybacking Bull functionalities

* coll/adapt: Fix naming conventions and C11 atomic use

This commit fixes some naming convention issues, such as function names
which should follow the naming ompi_coll_adapt instead of
mca_coll_adapt, reserved for component and module naming (cf. tuned
collective component);

It also fixes the use of _Atomic construct, which is only valid in C11.
OPAL constructs have already been adapted to that use, so use
opal_atomic_* types instead.

* coll/adapt: Remove unused component field in module

This commit removes an unneeded field referencing the component in the
module of adapt, as it is already available through the
mca_coll_adapt_component global variable.

Signed-off-by: Marc Sergent <marc.sergent@atos.net>
Co-authored-by: Lemarinier, Pierre <pierre.lemarinier@atos.net>
Co-authored-by: pierrele <31764860+pierrele@users.noreply.github.com>
2020-08-24 12:13:38 -07:00
Xi Luo
fe73586808 Add ADAPT module
Add comments in the ADAPT module

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
Howard Pritchard
eefaadf7f1
Merge pull request #8012 from hppritcha/topic/mprobe_with_ofi_fix
ofi mtl: fix problem with mrecv
2020-08-18 17:21:37 -06:00
Howard Pritchard
e6f81ed6d6 ofi mtl: fix problem with mrecv
the ofi mtl mrecv was not properly setting the message in/out
arg to MPI_MRECV to MPI_MESSAGE_NULL.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2020-08-18 15:39:19 -06:00
Jeff Squyres
bf4e1b4376
Merge pull request #8008 from jsquyres/pr/cleanup-of-mpi-errors-and-exceptions
Cleanup of MPI errors and exceptions
2020-08-17 16:41:25 -04:00
Jeff Squyres
20c772e733 Cleanup language about MPI exceptions --> errors
MPI-4 is finally cleaning up its language: an MPI "exception" does not
actually exist.  The only thing that exists is an MPI "error" (and
associated handlers).  This commit replaces all relevant uses of the
word "exception" with "error".  Note that this is still applicable in
versions of the MPI standard less than MPI-4.0 (indeed, nearly all the
cases fixed in this commit are just changes to comments, anyway).

One exception to this is the Java bindings, where there's an
MPIException class.  In hindsight, it probably should have been named
MPIError, but changing it now would break anyone who is using the Java
bindings.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-08-17 13:57:47 -04:00
Jeff Squyres
1e11933660 ompi: cleanup C++ MPI::ERRORS_THROW_EXCEPTIONS
The C++ bindings were removed a while ago;
MPI::ERRORS_THROW_EXCEPTIONS and MPI_ERRORS_THROW_EXCEPTIONS no longer
exist.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-08-17 13:52:02 -04:00
Brian Barrett
f3832c1ab9
Merge pull request #7973 from wckzhang/btlexclude
btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
2020-08-12 13:34:03 -07:00
William Zhang
41acfee2bb btl/ofi: Disable ofi_rxm provider
The ofi_rxm provider is dependent upon the underlying hardware for its
implementation of FI_DELIVERY_COMPLETE. Since this can lead to early
completions, we disable the provider to avoid correctness issues.

This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-08-11 16:47:19 -07:00
bosilca
b6a06ca37b
Merge pull request #7974 from abouteiller/bugfix/vader_des_tag
bug fix: des->tag = hdr->frag, should be hdr->tag
2020-08-11 11:13:14 -04:00
Josh Hursey
1d07933a78
Merge pull request #7992 from mkurnosov/fix-parsing-locality-str
opal/hwloc: fix a typo in parsing locality string
2020-08-11 08:58:59 -05:00
Mikhail Kurnosov
4708458d6b Fix a typo in parsing locality string: L0 changed to L1
(`prte_hwloc_base_get_locality_string` never returns locality string with L0).

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2020-08-11 08:43:47 +07:00
Jeff Squyres
9a0f661a66
Merge pull request #7975 from wckzhang/btlcommonlist
btl/ofi: Use common provider include/exclude list
2020-08-10 14:41:53 -04:00
Nathan Hjelm
a44914cb6b
Merge pull request #7915 from bosilca/fix/intel_2330_warning_take2
Second take on fixing the Intel _Atomic atomic operation warning
2020-08-08 06:30:15 -06:00
Jeff Squyres
f5cb1a49b1
Merge pull request #7897 from cniethammer/cmd_line_fixes
Minor fix in cmd line parser help
2020-08-08 07:13:22 -04:00
Aurelien Bouteiller
8a2127bcd3
Merge pull request #7984 from abouteiller/bugfix/java-errh
Missing function to populate java errors_abort handler
2020-08-06 11:47:31 -04:00
Aurelien Bouteiller
e6c7731d9b
Missing function to populate java error handler abort
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-08-06 10:29:42 -04:00
Aurelien Bouteiller
efbc6ff6a5
Merge pull request #7798 from abouteiller/mpi-next/unbounderr-self
MPI-4 error handling: 'unbound' errors to MPI_COMM_SELF
2020-08-03 15:59:14 -04:00
Aurelien Bouteiller
ee149fcfcb
MPI3 (unchanged in 4) says that errors after MPI_REQUEST_FREE are FATAL
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-31 17:49:38 -04:00
Aurelien Bouteiller
bec7dfc1b1
Errors in non-api calls remain fatal
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-31 17:49:35 -04:00
Aurelien Bouteiller
e0df0f4bd9
Make errors_mpi3 compat a global mpi-3 compatibility flag
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-31 17:48:47 -04:00
Aurelien Bouteiller
7dfe6c1adc
Thread-shift errors reported by PMIx to the main MPI progress engine
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

make things happen before the terminal call

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-31 17:48:44 -04:00
William Zhang
9b8f463a76 btl/ofi: Use common provider include/exclude list
The btl/ofi does not currently utilize the common ofi include/exclude
list. Added verification code similar to the mtl/ofi that will check if
the info object is in the include or exclude list. If it isn't in the
include list or is in the exclude list, validate_info will return
OPAL_ERROR. The btl/ofi will no longer pass a provider name as a hint
when calling getinfo, instead filtering the provider during
validate_info.

This patch also moves the is_in_list MTL function into common code and
adds additional debugging output to the BTL to match the MTL standard.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-07-31 12:13:00 -07:00
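The include/exclude check described in commit 9b8f463a76 can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual btl/ofi code: the function names, the SKETCH_* return codes, the NULL-terminated list representation, and the exact-match comparison are all hypothetical. The only behavior taken from the commit message is the rule itself: a provider that is missing from the include list, or present in the exclude list, is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the OPAL return codes. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = -1 };

/* Return true when `name` exactly matches an entry of the
 * NULL-terminated `list`. A NULL list is treated as empty. */
static bool sketch_is_in_list(const char *const *list, const char *name)
{
    assert(name != NULL);
    if (list == NULL) {
        return false;
    }
    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Reject a provider that is absent from a configured include list
 * or present in the exclude list; accept it otherwise. */
static int sketch_validate_provider(const char *const *include,
                                    const char *const *exclude,
                                    const char *prov_name)
{
    if (include != NULL && !sketch_is_in_list(include, prov_name)) {
        return SKETCH_ERROR;
    }
    if (sketch_is_in_list(exclude, prov_name)) {
        return SKETCH_ERROR;
    }
    return SKETCH_SUCCESS;
}
```

With an exclude list containing "efa", for example, sketch_validate_provider(NULL, exclude, "efa") returns SKETCH_ERROR while any unlisted provider name is accepted. Filtering after discovery in this way matches the commit's note that the provider name is no longer passed as a hint to getinfo.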
Artem Polyakov
dfb0ae748f
Merge pull request #7681 from janjust/master-tls-refactor_v3
ompi/osc/ucx: remove global TLS tables
2020-07-31 10:39:53 -07:00
William Zhang
a7dcfd9874 btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
EFA incorrectly implements FI_DELIVERY_COMPLETE in earlier libfabric
versions. While FI_DELIVERY_COMPLETE would be advertised by the
provider, completions would return too early by not accounting for
bounce buffers on the receive side. This would cause the BTL
to receive early completions that lead to correctness issues.

This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-07-30 13:53:16 -07:00
Aurelien Bouteiller
8e0cb1d49d
des->tag = hdr->frag, should be hdr->tag
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-30 14:02:22 -04:00
Tommy Janjusic
2c8da2c0a9 Further code reduction and simplifications.
Co-authored-by: Artem Polyakov <artpol84@gmail.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 20:00:22 +03:00
Tomislav Janjusic
cbfc9a3263 opal/mca/common/ucx: Use new TSD api
Co-authored-by: Artem Y. Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 00:21:26 +03:00
Tomislav Janjusic
72296e12f4 opal/common/ucx:
-mutex lock/unlock suggestions
-common destructor/cleanup

Co-authored-with: Artem Y. Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 00:21:26 +03:00
Tomislav Janjusic
27ba4b612f ompi/osc/ucx: Remove workerpool's global thread storage tables.
Co-authored-by: Artem Y. Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 00:21:26 +03:00
Brian Barrett
41df122083
Merge pull request #7730 from wckzhang/newdefaults
coll/tuned: Change the default collective algorithm selection
2020-07-28 15:27:46 -07:00
William Zhang
ce40cfbaa5 coll/tuned: Change the default collective algorithm selection
The default algorithm selections were out of date and not performing
well. After gathering data from OMPI developers, new default algorithm
decisions were selected for:

    allgather
    allgatherv
    allreduce
    alltoall
    alltoallv
    barrier
    bcast
    gather
    reduce
    reduce_scatter_block
    reduce_scatter
    scatter

These results were gathered using the ompi-collectives-tuning package
and then averaged amongst the results gathered from multiple OMPI
developers on their clusters.

You can access the graphs and averaged data here:
https://drive.google.com/drive/folders/1MV5E9gN-5tootoWoh62aoXmN0jiWiqh3

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-07-28 10:41:48 -07:00
Austen Lauria
d0152eb51e
Merge pull request #7940 from awlauria/revert_libevent_commit
Revert "Address a race condition in libevent select."
2020-07-28 11:34:59 -04:00
Jeff Squyres
c07d77fbf2
Merge pull request #7957 from bosilca/fix/avx_alignment
Use the unaligned SSE memory access primitive.
2020-07-27 15:50:40 -04:00
Artem Polyakov
e5ef80fe8c
Merge pull request #7936 from janjust/master-new-tsd-thread-api
Master: new thread-specific-data (tsd) api
2020-07-24 14:58:03 -07:00
Ralph Castain
863a058f8d
Merge pull request #7964 from rhc54/topic/sync
Sync to PRRTE master
2020-07-24 14:57:32 -07:00
Ralph Castain
8c0269cd4f
Sync to PRRTE master
Pickup the FT and libev cleanups

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-07-24 14:11:34 -07:00
Tomislav Janjusic
d809f6ba27 New TSD API interface fix for various components
Co-authored by: Artem Polykaov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-24 18:29:40 +03:00
Tomislav Janjusic
cba5a0e117 Rename tsd interface function calls
Co-authored by: Artem Polykaov <artemp@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-24 18:29:07 +03:00
Tomislav Janjusic
cb1955bb53 Fix renamed interface functions for argo, q, and pthreads
Co-authored by: Artem Polykaov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-24 18:29:07 +03:00
Tomislav Janjusic
07dc86eb3a opal/thread: New TSD API
Co-authored-by: Artem Polyakov <artemp@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-24 18:29:07 +03:00
Ralph Castain
06c585c316
Merge pull request #7962 from rhc54/topic/sync
Sync to PMIx and PRRTE master
2020-07-23 16:22:32 -07:00
Ralph Castain
c0bc89dc50
Sync to PMIx and PRRTE master
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-07-23 12:35:17 -07:00
Aurelien Bouteiller
06c563625a
Add a test for mpi_errors_mpi3 behavior and non-catastrophic errors
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-23 05:09:29 -04:00
Aurélien Bouteiller
b37202c74e
Add compliance mode with MPI-4 routing of errors to MPI_COMM_SELF by
default
And other streamlining of aborting behavior.

Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>

Remove OMPI_COMM_ERRORS and use NOHANDLE macros instead.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

route unbound errors to self error handler

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

Do not raise the error handler from within components

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-23 05:09:29 -04:00
George Bosilca
c4e88a43a3
Check unaligned ops for correctness.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-22 11:26:07 -04:00