1
1

6827 Коммитов

Автор SHA1 Сообщение Дата
KAWASHIMA Takahiro
303d7842d9
Merge pull request #6074 from kawashima-fj/pr/remove-c99-type-check
Remove `#if HAVE_[TYPE]` for types available in C99
2018-11-20 11:42:13 +09:00
Aurelien Bouteiller
20447be744
Someone left a debug printf in NBC
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2018-11-16 10:37:04 -05:00
KAWASHIMA Takahiro
cacd6f389c datatype: Remove #if HAVE_[TYPE] for C99 types
Now Open MPI requires a C99 compiler. Checking availability of
the following types is no more needed.

- `long long` (`signed` and `unsigned`)
- `long double`
- `float _Complex`
- `double _Complex`
- `long double _Complex`

Furthermore, the `#if HAVE_[TYPE]` style checking is not correct.
Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`.
`AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h`
if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]`
(instead of defining as `0`) if it is not available. So even if we
need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`.

I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac`
since someone may use `HAVE_[TYPE]` macros somewhere.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-11-14 09:32:52 +09:00
Aravind Gopalakrishnan
5cf43de445 MTL/OFI: Check threshold number of peers allowed per rank
When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.

Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-11-01 14:03:00 -07:00
Matias Cabral
2da31706bf
Merge pull request #5970 from aravindksg/coll-tuned-fix
coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
2018-10-31 11:20:07 -07:00
Ralph Castain
05ac8fa71c Remove stale defunct tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-30 08:48:16 -07:00
Aravind Gopalakrishnan
88d781056f coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.

MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and
Allgatherv operations. So, extending the check to these algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-10-24 15:31:33 -07:00
Joseph Schuchart
a193ae26bf Fix regression introduced earlier by re-adding a barrier after shared memory has been registered
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2018-10-24 15:54:19 -04:00
Yossi Itigin
4c442f2601
Merge pull request #5934 from hoopoepg/topic/suppressed-cov-warn-added-log-msg
COMMON/UCX: suppressed coverity warnings
2018-10-22 11:00:47 +03:00
Sergey Oblomov
1099d5f023 COMMON/UCX: added error code to log output
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-21 11:37:25 +03:00
Nathan Hjelm
a66373454e
Merge pull request #5943 from bosilca/fix/libnbc_warnings
Remove few warnings in libnbc identified by clang-1000.11.45.2
2018-10-20 21:24:30 -06:00
bosilca
c3abedbd2c
Merge pull request #5759 from bosilca/fix/monitoring
Fix/monitoring
2018-10-19 07:18:41 -07:00
Nathan Hjelm
dbae9c0958 romio/romio321: silence some compiler warnings
Some compilers complain when comparing signed and unsigned. romio321
was doing just this. The check is meant to check whether a size (which
is an ADIO_Offset-- a signed number) will work with memcpy which takes
a size_t. To silence the warning I added a new type (ADIO_Size) which
is an unsigned type and cast the ADIO_Offset to this new type.

Fixes #5951

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-18 13:36:51 -06:00
George Bosilca
dc972f0b92
Fix the PML monitoring.
The monitoring PML hides it's existence from the OMPI infrastructure by
removing itself from the list of PML loaded components, remaining hidden
until MPI_Finalize.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-10-18 00:29:23 -04:00
George Bosilca
668aa15dda
Early selection of the best PML.
With this patch the best PML is selected earlier, before finalizing
the others PML. This provides a simpler mechanism to intercept and
highjack the PML (as done in the monitoring PML)

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-10-18 00:29:23 -04:00
Ralph Castain
1bd772e8eb Remove the stale orte-dvm code
Users should migrate to https://github.com/pmix/prrte

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-17 15:11:38 -07:00
George Bosilca
66182a294d
Remove few warnings in libnbc identified by clang-1000.11.45.2
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-10-17 18:04:39 -04:00
Howard Pritchard
a435bfe1cf
Merge pull request #5933 from hppritcha/topic/remove_bfo_pml
remove the bfo pml
2018-10-17 09:39:58 -06:00
Nathan Hjelm
43547ade4c
Merge pull request #5663 from mkurnosov/coll-ireduce-rabenseifner
coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce
2018-10-17 09:02:06 -06:00
Nathan Hjelm
979a199b4f
Merge pull request #5896 from mkurnosov/coll-iallgather-recursivedoubling
coll/libnbc: add recursive doubling algorithm for MPI_Iallgather
2018-10-17 09:01:14 -06:00
Sergey Oblomov
df765595e3 COMMON/UCX: suppressed coverity warnings
- suppressed coverity warnings - added log messages on failed calls

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-17 16:11:03 +03:00
Howard Pritchard
7d6774acf8 remove the bfo pml
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-10-17 06:50:11 -06:00
Nathan Hjelm
1ff3cfedb6
Merge pull request #5921 from devreal/ompi-rdma-preinit
RDMA OSC: initialize segment memory before registering the segment
2018-10-16 15:10:02 -06:00
Joseph Schuchart
d9dcdfdfba RDMA OSC: initialize segment memory before registering the segment
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2018-10-16 16:12:14 -04:00
Edgar Gabriel
069084e6ad
Merge pull request #5907 from edgargabriel/topic/testmpio-fixes
Topic/testmpio fixes
2018-10-16 13:03:22 -07:00
Edgar Gabriel
ba95588332 io/ompio: add verification for data representations.
check for providing a data representation that is actually supported
by ompio.

Add also one check for a non-NULL pointer in mpi/c/file_set_view
for the data representation.

Also fixes parts of issue #5643

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-16 12:45:33 -05:00
Jeff Squyres
54ca3310ea ompi: cleanup various string operations
Several fixes to string handling:

1. strncpy() -> opal_string_copy() (because opal_string_copy()
   guarantees to NULL-terminate, and strncpy() does not)
2. Simplify a few places, such as:
   * Since opal_string_copy() guarantees to NULL terminate, eliminate
     some memsets(), etc.
   * Use opal_asprintf() to eliminate multi-step string creation

There's more work that could be done; e.g., this commit doesn't
attempt to clean up any strcpy() usage.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-10-14 16:10:20 -07:00
Edgar Gabriel
849d0452a0 io/ompio: execute barrier before sync
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-11 17:39:05 -05:00
Edgar Gabriel
bf058ca6b0 common/ompio: check datatypes when setting file view
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-11 14:43:32 -05:00
Edgar Gabriel
05d25383c2 common/ompio: return correct error code for improper access
return MPI_ERR_ACCESS if the user tries to read from  a file
that was opened using MPI_MODE_WRONLY

return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-11 14:41:58 -05:00
Edgar Gabriel
c0d7b578be io/ompio: fix seek position calculation for SEEK_CUR
This commit fixes the calculation of the position where to
seek to, in case SEEK_CUR is used.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-11 14:09:03 -05:00
Yossi Itigin
b71e85b8d5 pml_ucx: fix return code from mca_pml_ucx_init() error flow
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-11 18:48:54 +03:00
Mikhail Kurnosov
a7386c1e09 coll/libnbc: add recursive doubling algorithm for MPI_Iallgather
Implements recursive doubling algorithm for MPI_Iallgather.
The algorithm can be used only for power-of-two number of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-11 21:43:13 +07:00
Yossi Itigin
b8e1af6fcb osc_ucx: add worker flush before osc module free
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:47:16 +03:00
Yossi Itigin
bcc48515e4 Revert "osc_ucx: fix hang/timeout in component finalize"
This reverts commit 438d13b4ca1e7333b789ca3fb536fda17b0feb38.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:47:13 +03:00
Yossi Itigin
27d8c8e83c
Merge pull request #5878 from yosefe/topic/pml-ucx-fix-datatype-leak
pml_ucx: add ompi datatype attribute to release ucp_datatype
2018-10-10 20:18:39 +03:00
Yossi Itigin
a012ee91d8
Merge pull request #5886 from yosefe/topic/osc-ucx-fix-finalize-hang
osc_ucx: fix hang/timeout in component finalize
2018-10-10 16:29:29 +03:00
Yossi Itigin
9a365555b0
Merge pull request #5879 from hoopoepg/topic/fixed-zero-size-window
OSC/UCX: fixed zero-size window processing
2018-10-10 16:28:55 +03:00
Yossi Itigin
40ac9e4771 pml_ucx: fix return code from mca_pml_ucx_init()
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 14:41:05 +03:00
Yossi Itigin
dc6809495d osc_ucx: fix hang/timeout in component finalize
Add barrier to make sure all endpoints are destroyed before destroying
the worker.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 14:38:06 +03:00
Sergey Oblomov
ae6f81983f OSC/UCX: fixed zero-size window processing
- added processing of zero-size MPI window

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-10 13:08:01 +03:00
Yossi Itigin
4763822a64 pml_ucx: add ompi datatype attribute to release ucp_datatype
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-09 17:34:34 +03:00
Mikhail Kurnosov
b0429d25df coll/libnbc: add knomial tree algorithm for MPI_Ibcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-09 20:43:04 +07:00
Mikhail Kurnosov
7bd63e79c8 coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce
An implementation of R. Rabenseifner's algorithm for MPI_Ireduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by a gather.

Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-09 20:27:09 +07:00
Brian Barrett
e9e4d2a4bc Handle asprintf errors with opal_asprintf wrapper
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error.  However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined.  Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
Mikhail Kurnosov
9557fa087f Resolve merge conflicts
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-05 21:40:27 +07:00
Nathan Hjelm
88a560fa3c
Merge pull request #5744 from mkurnosov/coll-iscan-recursivedoubling
coll/libnbc: add recursive doubling algorithm for MPI_Iscan
2018-10-03 09:02:05 -06:00
Brian Barrett
2e24e6ec08 coll libnbc: Remove dead code
Remove dead code that was causing warnings about unused static
functions.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-02 13:35:15 -04:00
Brian Barrett
c5eaa38491 mtl ofi: Change from opt-in to opt-out provider selection
Change default provider selection logic for the OFI MTL.  The
old logic was whitelist-only, so any new HPC NIC provider would
have to ask users to do extra work or wait for an OMPI release
to be whitelisted.  The reason for the logic was to avoid
selecting a "generic" provider like sockets or shm that would
frequently have worse performance than the optimized BTL options
Open MPI supports.

With the change, we blacklist the (small, relatively static) list
of providers that duplicate internal capabilities.  Users can use
one of thse blacklisted providers in two ways: first, they can
explicitly request the provider in the include list (which will
override the default exclude list) and second, the can set a new
empty exclude list.

Since most HPC networks require special libraries and therefore
an explicit build of libfabric, it is highly unlikely that this
change will cause users to use libfabric when they didn't want to
do so.  It does, however, solve the whitelisting problem.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-09-27 11:02:18 -07:00
Jeff Squyres
6bb356ab87 Squash a bunch of harmless compiler warnings.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-26 12:15:21 -07:00