
5901 Commits

Author SHA1 Message Date
Jeff Squyres
3edd62e568
Merge pull request #8203 from jsquyres/pr/fix-warnings
Fix many compiler warnings
2020-11-23 10:15:15 -05:00
Sergey Oblomov
1aa6e74d1b PML/UCX/WPOOL: fix potential leak in error processing
- fixed a potential leak in error handling

Signed-off-by: Sergey Oblomov <sergeyo@nvidia.com>
2020-11-17 18:08:15 +02:00
Sergey Oblomov
a6e00e3d41 PML/UCX/WPOOL: fixed Coverity issue
- fixed an issue reported by Coverity

Signed-off-by: Sergey Oblomov <sergeyo@nvidia.com>
2020-11-16 10:30:32 +02:00
Jeff Squyres
14aa5fae3c Fix many compiler warnings:
- Add some missing AC_CHECK_SIZEOF's in configure.ac
- Remove some unused variables
- Initialize some variables
- Fix some parameter types
- Cast where appropriate/safe to fix warnings
- Move ompi/mca/common/monitoring Fortran bindings to a separate .c
  file so that they can use different #define's than the C bindings,
  and therefore compile properly / without warnings.
- Fix signedness discrepancies
- Who knew?  The construct below is undefined behavior, so these were
  separated into multiple #if's instead:
  ```
  // This is undefined behavior
  #define HAVE_FOO defined(FOO)
  #define YOW (HAVE_FOO && defined(BAR))
  ```
- Fix some typos in OMPI_BUILD_HOST logic
- Don't "2>/dev/null" in OMPI_BUILD_HOST logic; it just hides errors

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-11-14 07:20:30 -08:00
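The undefined-behavior bullet in the commit above refers to a C preprocessor rule: if macro replacement inside an `#if` produces the `defined` operator, the behavior is undefined (C11 6.10.1). A minimal sketch of the split-`#if` pattern, assuming hypothetical feature macros `FOO` and `BAR`, evaluates `defined()` directly in each `#if` and then combines plain 0/1 macros:

```c
/* Hypothetical feature macros for illustration; comment out either one
   and YOW becomes 0. */
#define FOO 1
#define BAR 1

/* defined() appears directly in the #if, never produced by macro expansion. */
#if defined(FOO)
#  define HAVE_FOO 1
#else
#  define HAVE_FOO 0
#endif

/* HAVE_FOO now expands to a plain integer, which is safe to combine. */
#if HAVE_FOO && defined(BAR)
#  define YOW 1
#else
#  define YOW 0
#endif
```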
Jeff Squyres
67421c5d23
Merge pull request #8207 from ggouaillardet/topic/retain_datatypes_w
coll/base: do not drop const qualifier
2020-11-14 09:21:37 -05:00
Artem Polyakov
f9ef4b4ac0
Merge pull request #7632 from devreal/osc-ucx-progress
UCX osc: make progress on idle worker if none are active
2020-11-13 13:27:31 -08:00
Gilles Gouaillardet
c49e5e5c4a coll/base: do not drop const qualifier
MPI_Ialltoallw() and friends take a const MPI_Datatype types[] argument.
In order to be able to call OBJ_RELEASE(types[0]), we used to simply
drop the const modifier. This change makes it right by introducing the
OBJ_RELEASE_NO_NULLIFY(object) macro, which does not set object = NULL
when the object is freed.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-11-13 17:31:04 +09:00
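The const problem can be illustrated with a hypothetical reference-counted object (this is not Open MPI's `opal_object_t` or its real `OBJ_RELEASE`). In Open MPI, `MPI_Datatype` is a pointer type, so `const MPI_Datatype types[]` makes each element a const pointer: a macro that assigns `NULL` to its argument cannot be applied to `types[i]`, while one that only drops the reference can. The `_SKETCH` macro names are made up for this example:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object, for illustration only. */
typedef struct obj {
    int refcount;
} obj_t;

static int objects_freed = 0;  /* counts frees so the behavior is observable */

static obj_t *obj_new(void) {
    obj_t *o = malloc(sizeof *o);
    if (o != NULL) o->refcount = 1;
    return o;
}

/* Drop one reference and free the object when none remain.
   Returns 1 if the object was freed, 0 otherwise. */
static int obj_release(obj_t *o) {
    assert(o != NULL);
    if (--o->refcount == 0) {
        free(o);
        objects_freed++;
        return 1;
    }
    return 0;
}

/* Nullifying variant: assigns to its argument, so the argument must be a
   modifiable lvalue (not an element of a const-qualified array). */
#define OBJ_RELEASE_SKETCH(p)           \
    do {                                \
        if (obj_release(p)) (p) = NULL; \
    } while (0)

/* Non-nullifying variant: never writes to its argument. */
#define OBJ_RELEASE_NO_NULLIFY_SKETCH(p) ((void)obj_release(p))

/* Same shape as a function taking "const MPI_Datatype types[]". */
static void release_all(obj_t *const types[], int n) {
    for (int i = 0; i < n; i++) {
        OBJ_RELEASE_NO_NULLIFY_SKETCH(types[i]);
        /* OBJ_RELEASE_SKETCH(types[i]); would fail to compile here. */
    }
}
```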
Joseph Schuchart
581478dc91 UCX osc: make progress on default worker if none are active
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-11-11 23:23:31 +01:00
Ralph Castain
2f7f1feca5
Fix confusion between cpuset and locality
Ensure we correctly collect and save the cpuset of the process
separately from its locality string. Ensure we use the correct one when
computing things like relative locality between processes.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-11-10 16:41:00 -08:00
Jeff Squyres
c960d292ec Convert all README files to Markdown
A mindless task for a lazy weekend: convert all the README and
README.txt files to Markdown.  Paired with the slow conversion of all
of our man pages to Markdown, this gives a uniform language to the
Open MPI docs.

This commit moved a bunch of copyright headers out of the top-level
README.txt file, so I updated the relevant copyright header years in
the top-level LICENSE file to match what was removed from README.txt.

Additionally, this commit did (very) little to update the actual
content of the README files.  A very small number of updates were made
for topics that I found blatantly obvious while Markdown-izing the
content, but in general, I did not update content during this commit.
For example, there's still quite a bit of text about ORTE that was not
meaningfully updated.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Co-authored-by: Josh Hursey <jhursey@us.ibm.com>
2020-11-10 13:52:29 -05:00
Raghu Raja
917269b699 Coverity fixes for recent OFI changes
8017f12 introduced a new function to get the package rank of a process
with a pass-by-value signature (opal_process_info_t), which Coverity
flagged. This commit changes the signature to take a pointer to
opal_process_info_t instead.

Signed-off-by: Raghu Raja <craghun@amazon.com>
2020-11-04 00:18:54 +00:00
Jeff Squyres
e9e5dab8b9
Merge pull request #8153 from dancejic/multi
Using package_rank to select between NICs of equal distance from the process.
2020-11-02 15:27:37 -05:00
Nikola Dancejic
8017f12801 Using package_rank to select between NICs of equal distance from the process.
If PMIX_PACKAGE_RANK is available, use this value to select between multiple
NICs that are equally distant from the current process. If this value is not
available, try to calculate it by getting the locality string from each local
process and assigning a package_rank. If everything fails, fall back to using
process_id.rank to select the NIC. This last case is not ideal, but has a small
chance of occurring, and a message is displayed to notify the user when it
occurs.

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
2020-11-02 00:32:03 -08:00
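The fallback chain described in this commit can be sketched as follows. The struct, its field names, and the round-robin mapping `rank % num_nics` are assumptions for illustration; the commit message does not spell out how the chosen rank is mapped onto the NICs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical inputs; in the commit they come from PMIX_PACKAGE_RANK,
   the local peers' locality strings, or process_id.rank. */
typedef struct {
    bool have_pmix_package_rank;  /* PMIX_PACKAGE_RANK was provided */
    int  pmix_package_rank;
    bool have_computed_rank;      /* derived from locality strings */
    int  computed_package_rank;
    int  process_rank;            /* last-resort fallback */
} nic_rank_inputs_t;

/* Choose one of num_nics NICs that are equally distant from the process.
   Assumption: ranks are spread over the NICs round-robin. */
static int select_nic(const nic_rank_inputs_t *in, int num_nics) {
    assert(in != NULL && num_nics > 0);
    int rank;
    if (in->have_pmix_package_rank) {
        rank = in->pmix_package_rank;
    } else if (in->have_computed_rank) {
        rank = in->computed_package_rank;
    } else {
        /* Not ideal: the global rank says nothing about package placement. */
        fprintf(stderr, "warning: using process rank to select the NIC\n");
        rank = in->process_rank;
    }
    return rank % num_nics;
}
```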
Jeff Squyres
8ed1d28fb4 keyval_parse.c: update whitespace/comments
Slightly improve comments and update some whitespace.

No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-10-31 04:16:00 -07:00
Jeff Squyres
eac0ab5c3a keyval_parse.c: ensure to init values
Coverity complained about uninitialized variables; ensure that they
are initialized to 0 in all cases.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-10-31 04:11:49 -07:00
Joseph Schuchart
320a9a1660 OPAL: fix string buffer allocation for large env variables
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-10-30 14:28:23 +01:00
bosilca
ce97090673
Merge pull request #7735 from bosilca/coll/han
A hierarchical, architecture-aware collective communication module
2020-10-26 00:07:03 -04:00
George Bosilca
cc6432b4a2 Fix partial packing of non data elements.
There was a bug allowing partial packing of non-data elements (such as loop
and end_loop markers) during the exit condition of a pack/unpack call, which
is meaningless. Prevent this by making sure the element points to data
before trying to partially pack it.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-10-25 18:15:09 -04:00
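A minimal sketch of the guard described above, assuming a simplified, hypothetical element description (Open MPI's real datatype description and pack engine are considerably more involved). Only data elements are eligible for partial packing; loop markers are skipped:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified datatype description elements. */
typedef enum { ELEM_DATA, ELEM_LOOP, ELEM_END_LOOP } elem_kind_t;

typedef struct {
    elem_kind_t kind;
    size_t      size;  /* bytes described; meaningful only for ELEM_DATA */
} elem_t;

/* Copy up to 'space' bytes of element 'e' from src to dst and return the
   number of bytes packed. Loop markers carry no data, so they are never
   partially packed. */
static size_t pack_partial(const elem_t *e, const char *src,
                           char *dst, size_t space) {
    assert(e != NULL);
    if (e->kind != ELEM_DATA) {
        return 0;  /* the guard: skip non-data elements */
    }
    size_t n = e->size < space ? e->size : space;
    memcpy(dst, src, n);
    return n;
}
```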
Mark Allen
bca3c0ed17 make Type_create_resized set FLAG_USER_UB
In the below type creation sequence
    MPI_Type_create_resized(MPI_INT, 0, 6, &mydt1);
    MPI_Type_contiguous(1, mydt1, &mydt2);
I think both mydt1 and mydt2 should have extent 6.

Type_create_resized adds a UB marker to the type map,
and by definition Type_contiguous keeps the same
markers in the new map.

The only counterargument I can think of is if
we declared mydt1 illegal because it puts data
at addresses that don't satisfy the alignment requirement.

But in my interpretation of the standard the term "alignment
requirement" is a property of the system memory, and MPI defines
"extent" in a way to make it easy to create MPI datatypes that
support the system's alignment requirements.  But the standard
isn't saying it's illegal to make MPI datatypes that don't satisfy
the system's alignment requirements.  I think this is also true
because MPI datatypes might be used in file I/O, where the
requirements are different.  That's my long-winded explanation
for why I don't think we can declare mydt1 illegal.

Complete example:
    #include <stdio.h>
    #include <mpi.h>
    int main() {
        MPI_Datatype mydt1, mydt2;
        MPI_Aint lb, ext;
        MPI_Init(0, 0);
        MPI_Type_create_resized(MPI_INT, 0, 6, &mydt1);
        MPI_Type_commit(&mydt1);
        MPI_Type_contiguous(1, mydt1, &mydt2);
        MPI_Type_commit(&mydt2);

        MPI_Type_get_extent(mydt1, &lb, &ext);
        printf("mydt1 extent %d\n", (int)ext);
        MPI_Type_get_extent(mydt2, &lb, &ext);
        printf("mydt2 extent %d\n", (int)ext);

        MPI_Type_free(&mydt1);
        MPI_Type_free(&mydt2);
        MPI_Finalize();
        return(0);
    }
% mpicc -o x test.c
% mpirun -np 1 ./x
Without this PR the output is
> mydt1 extent 6
> mydt2 extent 8
With this PR both extents are 6.

FWIW, I also tested with MPICH, which gives 6 for both extents.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2020-10-15 16:10:50 -05:00
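One way to see where the 8 came from: if the contiguous type's extent is padded to the alignment of `MPI_INT` (assumed to be 4 bytes here) unless the user set the upper bound, 6 rounds up to 8. The following is a hypothetical model of that rule written in plain C, not Open MPI's datatype engine; it only reproduces the two outputs quoted above:

```c
#include <assert.h>
#include <stdbool.h>

/* Round x up to the next multiple of align. */
static long round_up(long x, long align) {
    assert(align > 0);
    return (x + align - 1) / align * align;
}

/* Hypothetical model of the extent of MPI_Type_contiguous(count, old).
   Assumption: without a user-set upper bound (FLAG_USER_UB), the extent is
   padded to the basic type's alignment; with it, the extent is kept as-is. */
static long contiguous_extent(long count, long old_extent, long align,
                              bool user_set_ub) {
    long extent = count * old_extent;
    return user_set_ub ? extent : round_up(extent, align);
}
```

With `count = 1`, `old_extent = 6`, and `align = 4`, the model gives 8 when the flag is not set (the old behavior) and 6 when it is (the behavior this PR aims for).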
Jeff Squyres
0bcef049c9
Merge pull request #8053 from shintaro-iwasaki/topic/fix_issue_8036
opal/mca/threads/qthreads: Fix #8036
2020-10-08 09:40:58 -04:00
NARIBAYASHI Akira
3dc3bbc1b1 opal/util: Fix typo
Signed-off-by: NARIBAYASHI Akira <a.naribayashi@fujitsu.com>
2020-10-08 16:13:09 +09:00
Shintaro Iwasaki
84dcb233bf mca/threads: set THREAD_* flags in the component's root configure.m4
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
2020-10-05 11:39:26 -05:00
Shintaro Iwasaki
919a16300c mca/threads/qthreads: implement missing functionality
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
2020-10-05 11:39:18 -05:00
Shintaro Iwasaki
db3e598b6a mca/threads/qthreads: remove Argobots dependency
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
2020-10-05 11:39:09 -05:00
Shintaro Iwasaki
6cc17b0c6a mca/threads/qthreads: rework configury to be smarter
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
2020-10-05 11:39:01 -05:00
Brian Barrett
8f89d15d31 build: Move PMIx to a 3rd-party package
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE, as
MCA components and instead 1) rely on external libraries
whenever possible and 2) build the 3rd-party
libraries (if needed) as independent libraries, rather than
linking them into libopen-pal.

This patch moves the PMIx library bundled with Open MPI from an
MCA framework to a stand-alone library built outside of OPAL.  Due
to the amount of code in the MCA base (and its assumptions about
being part of an MCA framework), the framework is left with no
active components.  Any pre-installed version of PMIx 3.0.0 or
newer is preferred over the internal version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-10-01 16:56:00 +00:00
Brian Barrett
0e9581d478 build: Move hwloc to a 3rd-party package
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE, as
MCA components and instead 1) rely on external libraries
whenever possible and 2) build the 3rd-party
libraries (if needed) as independent libraries, rather than
linking them into libopen-pal.

This patch moves the hwloc library bundled with Open MPI from an
MCA framework to a stand-alone library built outside of OPAL.  Due
to the amount of code in the MCA base (and its assumptions about
being part of an MCA framework), the framework is left with no
active components.  Any pre-installed version of HWLOC 1.6 or
newer is preferred over the internal version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-10-01 16:55:59 +00:00
Brian Barrett
9ffac85650 build: Move libevent to a 3rd-party package
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE, as
MCA components and instead 1) rely on external libraries
whenever possible and 2) build the 3rd-party
libraries (if needed) as independent libraries, rather than
linking them into libopen-pal.

This patch moves libevent from an MCA framework to a stand-alone
library built outside of OPAL.  A wrapper in opal/util is provided
to minimize unnecessary changes in the rest of the code.  When
using the internal Libevent, it will be installed as a stand-alone
libevent.a, instead of bundled in OPAL.  Any pre-installed version
of Libevent at or after 2.0.21 is preferred over the internal
version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-10-01 16:55:58 +00:00
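The three build commits above each name a minimum version for preferring a pre-installed library (PMIx 3.0.0, hwloc 1.6, Libevent 2.0.21). The real decision is made by configure; the C sketch below only illustrates the comparison, using hypothetical helpers that parse up to three dot-separated numbers and treat missing components as 0:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Compare "major.minor.release" strings; missing parts count as 0.
   Returns <0, 0, or >0, like strcmp. */
static int version_cmp(const char *a, const char *b) {
    int va[3] = {0, 0, 0}, vb[3] = {0, 0, 0};
    sscanf(a, "%d.%d.%d", &va[0], &va[1], &va[2]);
    sscanf(b, "%d.%d.%d", &vb[0], &vb[1], &vb[2]);
    for (int i = 0; i < 3; i++) {
        if (va[i] != vb[i]) return va[i] < vb[i] ? -1 : 1;
    }
    return 0;
}

/* Hypothetical selection rule: use the pre-installed library if one was
   found (installed != NULL) and it meets the minimum version; otherwise
   build the bundled copy. */
static bool use_external(const char *installed, const char *minimum) {
    assert(minimum != NULL);
    return installed != NULL && version_cmp(installed, minimum) >= 0;
}
```

Comparing numerically per component matters here: a plain string comparison would rank "1.11" below "1.6".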
Nathan Hjelm
7fca99b2f7 patcher: remove the linux component
The Linux component was an attempt to hook calls by patching the dynamic
symbol table. It, unfortunately, does not work as it will always miss
calls made internally by glibc. For example, it might catch a user call
directly to munmap but will miss the chain free -> munmap. Since the
latter is the common case we were trying to hook, this made the component
unusable. This PR finally kills the component.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-09-18 10:23:01 -06:00
bosilca
eca00a7a3b
Merge pull request #8042 from bosilca/fix/sm_emu
Fix a copy/paste in the RDMA emulation.
2020-09-14 11:43:00 -04:00
Jeff Squyres
3a93e4f94d
Merge pull request #8038 from devreal/fix-opal-pmix-cond-init
Use correct condition variable initializer in opal/mca/pmix/base
2020-09-14 09:38:43 -04:00
George Bosilca
49da998f33
Fix a copy/paste in the RDMA emulation.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-09-13 22:56:58 -04:00
Joseph Schuchart
b78c7e93db Use correct condition variable initializer in opal/mca/pmix/base
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-09-09 09:05:30 +02:00
Joseph Schuchart
fc025c78df UCX: do not dereference NULL pointer in wpmem_[free|flush]
Flushing or freeing a newly created dynamic window causes NULL to be passed.

Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-09-04 09:31:18 +02:00
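A minimal sketch of the kind of guard this fix implies, using hypothetical names (`wpmem_t`, `wpmem_flush_sketch`, `wpmem_free_sketch`) rather than the real UCX OSC code. A freshly created dynamic window has no memory attached yet, so both operations must tolerate a NULL handle:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-window memory handle; NULL until memory is attached. */
typedef struct {
    void *buffer;
    int   dirty;
} wpmem_t;

/* Flushing a window with no attached memory is a no-op, not a crash. */
static int wpmem_flush_sketch(wpmem_t *mem) {
    if (mem == NULL) {
        return 0;
    }
    mem->dirty = 0;
    return 0;
}

/* free(NULL) is already safe in C; the guard protects the field access
   that would otherwise dereference NULL. */
static void wpmem_free_sketch(wpmem_t *mem) {
    if (mem == NULL) {
        return;
    }
    free(mem->buffer);
    free(mem);
}
```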
Nathan Hjelm
556a4ac0da btl: remove unused descriptor flags
This PR removes the MCA_BTL_DES_FLAGS_PUT and MCA_BTL_DES_FLAGS_GET
descriptor flags. At some point these had some meaning but they were
replaced by the rcache access flags.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-08-31 13:07:32 -06:00
Brian Barrett
f3832c1ab9
Merge pull request #7973 from wckzhang/btlexclude
btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
2020-08-12 13:34:03 -07:00
William Zhang
41acfee2bb btl/ofi: Disable ofi_rxm provider
The ofi_rxm provider is dependent upon the underlying hardware for its
implementation of FI_DELIVERY_COMPLETE. Since this can lead to early
completions, we disable the provider to avoid correctness issues.

This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-08-11 16:47:19 -07:00
bosilca
b6a06ca37b
Merge pull request #7974 from abouteiller/bugfix/vader_des_tag
bug fix: des->tag = hdr->frag, should be hdr->tag
2020-08-11 11:13:14 -04:00
Mikhail Kurnosov
4708458d6b Fix a typo in parsing locality string: L0 changed to L1
(`prte_hwloc_base_get_locality_string` never returns locality string with L0).

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2020-08-11 08:43:47 +07:00
Jeff Squyres
9a0f661a66
Merge pull request #7975 from wckzhang/btlcommonlist
btl/ofi: Use common provider include/exclude list
2020-08-10 14:41:53 -04:00
Nathan Hjelm
a44914cb6b
Merge pull request #7915 from bosilca/fix/intel_2330_warning_take2
Second take on fixing the Intel _Atomic atomic operation warning
2020-08-08 06:30:15 -06:00
Jeff Squyres
f5cb1a49b1
Merge pull request #7897 from cniethammer/cmd_line_fixes
Minor fix in cmd line parser help
2020-08-08 07:13:22 -04:00
William Zhang
9b8f463a76 btl/ofi: Use common provider include/exclude list
The btl/ofi does not currently utilize the common ofi include/exclude
list. Added verification code similar to the mtl/ofi that will check if
the info object is in the include or exclude list. If it isn't in the
include list or is in the exclude list, validate_info will return
OPAL_ERROR. The btl/ofi will no longer pass a provider name as a hint
when calling getinfo, instead filtering the provider during
validate_info.

This patch also moves the is_in_list MTL function into common code and
adds additional debugging output to the BTL to match the MTL standard.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-07-31 12:13:00 -07:00
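The include/exclude check described above can be sketched as follows. The function names, return codes, and NULL-terminated string lists are hypothetical stand-ins for the common `is_in_list` code and `validate_info`; the sketch also assumes that a NULL include list means no include list was set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SUCCESS 0
#define SKETCH_ERROR  (-1)

/* Return true if name appears in the NULL-terminated list. */
static bool is_in_list_sketch(const char *const *list, const char *name) {
    if (list == NULL) return false;
    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcmp(list[i], name) == 0) return true;
    }
    return false;
}

/* Reject the provider if an include list is set and it is not on it,
   or if it appears on the exclude list. */
static int validate_provider(const char *prov, const char *const *include,
                             const char *const *exclude) {
    assert(prov != NULL);
    if (include != NULL && !is_in_list_sketch(include, prov)) return SKETCH_ERROR;
    if (is_in_list_sketch(exclude, prov)) return SKETCH_ERROR;
    return SKETCH_SUCCESS;
}
```

Filtering after `fi_getinfo()` returns, rather than passing a provider name as a hint, lets a single list-handling routine serve both the MTL and the BTL.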
William Zhang
a7dcfd9874 btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
EFA incorrectly implements FI_DELIVERY_COMPLETE in earlier libfabric
versions. While FI_DELIVERY_COMPLETE would be advertised by the
provider, completions would return too early by not accounting for
bounce buffers on the receive side. This would cause the BTL
to receive early completions that lead to correctness issues.

This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-07-30 13:53:16 -07:00
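The two provider restrictions described in this log (ofi_rxm always, EFA before libfabric 1.12.0) can be sketched as a single filter. The function name, the substring match on provider names, and the locally defined `MAKE_VERSION` encoding are assumptions for this example; libfabric has its own `FI_VERSION` macro, which the sketch deliberately does not depend on:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical version encoding: major in the high 16 bits, minor below. */
#define MAKE_VERSION(major, minor) (((unsigned)(major) << 16) | (unsigned)(minor))

/* Return true if the BTL should skip this provider.
   - ofi_rxm: FI_DELIVERY_COMPLETE depends on the underlying hardware.
   - efa before 1.12.0: completions may be reported too early because
     receive-side bounce buffers are not accounted for. */
static bool btl_should_skip_provider(const char *prov_name,
                                     unsigned fabric_version) {
    assert(prov_name != NULL);
    if (strstr(prov_name, "ofi_rxm") != NULL) {
        return true;
    }
    if (strstr(prov_name, "efa") != NULL &&
        fabric_version < MAKE_VERSION(1, 12)) {
        return true;
    }
    return false;
}
```

Neither restriction applies to the mtl/ofi, which does not require FI_DELIVERY_COMPLETE.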
Aurelien Bouteiller
8e0cb1d49d
des->tag = hdr->frag, should be hdr->tag
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-30 14:02:22 -04:00
Tommy Janjusic
2c8da2c0a9 Further code reduction and simplifications.
Co-authored-by: Artem Polyakov <artpol84@gmail.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 20:00:22 +03:00
Tomislav Janjusic
cbfc9a3263 opal/mca/common/ucx: Use new TSD api
Co-authored-by: Artem Y. Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 00:21:26 +03:00
Tomislav Janjusic
72296e12f4 opal/common/ucx:
- mutex lock/unlock suggestions
- common destructor/cleanup

Co-authored-with: Artem Y. Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 00:21:26 +03:00
Tomislav Janjusic
27ba4b612f ompi/osc/ucx: Remove workerpool's global thread storage tables.
Co-authored-by: Artem Y. Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 00:21:26 +03:00
Austen Lauria
d0152eb51e
Merge pull request #7940 from awlauria/revert_libevent_commit
Revert "Address a race condition in libevent select."
2020-07-28 11:34:59 -04:00