Commit graph

29203 commits

Author SHA1 Message Date
Gaëtan Bossu
ccc96efc2e DDN's Infinite Memory Engine support for OMPIO
Changes made:
 - Create a new fs component for IME
 - Create a new fbtl component for IME
 - Modify the close function of OMPIO to finalize IME if necessary

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
2018-08-16 11:45:47 +02:00
Brian Barrett
ac53ab9f5b dist: Update master NEWS with 2.1.4
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-08-15 12:21:08 -07:00
Aurelien Bouteiller
6acebc40a1
Handle error cases in TCP BTL
When an error is returned by the socket operations, trigger the
appropriate error path in the PML to give an opportunity for
rerouting/error handling.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2018-08-14 15:35:24 -04:00
Jeff Squyres
1b96be5f2f
Merge pull request #5536 from hjelmn/btl_vader_mr_fix
btl/vader: move memory barrier to where it belongs
2018-08-14 11:32:49 -04:00
Nathan Hjelm
dca3516765 btl/vader: move memory barrier to where it belongs
The write memory barrier was intended to precede setting a fast-box
header but instead follows it. This commit moves the memory barrier to
the intended location.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-13 10:14:34 -06:00
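
The ordering described in dca3516765 can be illustrated with a minimal C11 sketch. This is not the btl/vader code: `fastbox_t`, `fastbox_publish`, and `fastbox_poll` are hypothetical names, and the slot layout is an assumption. The point is only that the write barrier sits between filling the payload and storing the header.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fast-box slot: a header word followed by a small payload. */
typedef struct {
    _Atomic uint32_t header;      /* 0 means "empty"; non-zero is the payload length */
    unsigned char payload[64];
} fastbox_t;

/* Writer: fill the payload, issue the write barrier, then set the header. */
static void fastbox_publish(fastbox_t *box, const void *data, uint32_t len)
{
    assert(len > 0 && len <= sizeof(box->payload));
    memcpy(box->payload, data, len);
    /* The barrier precedes the header store, so a reader that sees the
       header also sees the completed payload. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&box->header, len, memory_order_relaxed);
}

/* Reader: returns the payload length, or 0 if nothing has been published. */
static uint32_t fastbox_poll(fastbox_t *box, void *out)
{
    uint32_t len = atomic_load_explicit(&box->header, memory_order_relaxed);
    if (0 == len) {
        return 0;
    }
    atomic_thread_fence(memory_order_acquire);
    memcpy(out, box->payload, len);
    /* Mark the slot empty again once the payload has been copied out. */
    atomic_store_explicit(&box->header, 0, memory_order_release);
    return len;
}
```

If the fence followed the header store instead, a reader on another core could observe a non-zero header while the payload writes were still in flight.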
Nathan Hjelm
e0f73866ef
Merge pull request #5525 from hjelmn/event_threading
opal/progress: protect against multiple threads in event base
2018-08-13 09:21:44 -06:00
Jeff Squyres
f1346cba42
Merge pull request #5533 from jsquyres/pr/fix-libevent-hwloc-gwlargflappbt
Fix internal/external hwloc and libevent
2018-08-11 14:59:24 -04:00
Jeff Squyres
01e4570af7 hwloc201/configure.m4: make it safe when used with hwloc:external
The Autoconf AC_CONFIG_* macros can only be instantiated exactly once
for any given file, *and* they must be in a code execution path at run
time for the target file to be generated at the end of configure.

For example, if you want to generate file ABC at the end of configure,
you must invoke the AC_CONFIG_FILES(ABC) macro in a code path that
will get executed when configure is run.

That's pretty straightforward.

What's not straightforward is two corner cases:

1. You cannot invoke the AC_CONFIG_FILES(ABC) macro for the same file
   more than once.  If you do, autoreconf will fail (even before you
   can run configure).
2. If AC_CONFIG_FILES(ABC) is not in a code path that is executed by
   configure, the file ABC is not registered properly, and ABC will
   not be generated at the end of configure.

This applies to hwloc because hwloc's HWLOC_SETUP_CORE macro calls
both AC_CONFIG_FILES and AC_CONFIG_HEADER to set up its Makefiles
(etc.) so that targets like "make distclean" and "make distcheck" will
work properly.  Hence, we *have* to invoke HWLOC_SETUP_CORE.

However, the MCA_opal_hwloc_hwloc201_CONFIG macro has a few side
effects.  It would be nice to be able to do something like this:

```
    if hwloc:external is going to be used:
        Invoke minimal HWLOC_SETUP_CORE (with no side effects)
    else
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

But we can't, because autoreconf will detect that AC_CONFIG_FILES has
been invoked on the same files more than once (regardless of whether
those code paths will be executed at run time or not).  Kaboom.

Similarly, we can't do this:

```
    if hwloc:external is not going to be used:
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

Because then hwloc's AC_CONFIG_FILES won't be registered properly when
hwloc:external *is* used (i.e., when the HWLOC_SETUP_CORE macro is not
in a code path that is executed at run time), and targets like "make
distclean" will fail because hwloc's Makefiles won't have been set up.
Kaboom.

But remember that the hwloc framework is a bit special: there will
only ever be 2 components: external and internal.  External is
guaranteed to be configured first because of its priority.  So the
internal component (i.e., this component) immediately knows if it is
going to be used or not based on whether the external component
configuration succeeded or failed.

Specifically: regardless of whether the internal component (i.e., this
component) is going to be used, we have to invoke HWLOC_SETUP_CORE.
But we can manage the side effects: allow the side effects when
this/internal component is going to be used, and avoid the side
effects when this/internal component is not going to be used.

This is a little less clean than I would have liked, but because of
Autoconf's oddity about its AC_CONFIG_* macros, this is the only
solution I could come up with.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-11 11:05:23 -07:00
Jeff Squyres
69aa46e167 libevent2022/configure.m4: always invoke sub-configure
In order to make "make distclean" (and friends) work, we need to
*always* invoke the embedded configure script -- even if we know that
we're not going to use this component.

But in cases where we know we're not going to use this component, we
also need to avoid the side effects of the code path that is used when
we *do* want to use this component.  So split the two possibilities
into two different macros:

1. MCA_opal_event_libevent2022_FAKE_CONFIG: which does almost nothing
   except invoke the underlying "configure" script.
2. MCA_opal_event_libevent2022_REAL_CONFIG: which does all the real
   work (including invoking the underlying "configure" script).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-11 06:03:46 -07:00
Jeff Squyres
80df3f040b libevent2022/configure.m4: trivial cleanup
Put argument to AM_CONDITIONAL inside [].  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-10 10:30:45 -07:00
Jeff Squyres
17aa64e438 libevent2022/configure.m4: minor comment cleanup
Change # -> dnl.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-10 10:30:04 -07:00
Aravind Gopalakrishnan
ed2343034d MTL OFI: Fix race condition due to global progress entries array
Since the progress entries array is globally allocated, it is susceptible
to race conditions in multi-threaded applications. Allocating it on the
stack resolves any potential races, as it is then thread-local by default.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-08-09 10:52:28 -07:00
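
A small C sketch of the idea in ed2343034d, under stated assumptions: `sketch_entry_t`, `SKETCH_MAX_EVENTS`, and `progress_sketch` are hypothetical names, and the `pending` array stands in for a completion queue. It is not the MTL OFI progress function; it only shows a per-call array replacing a shared file-scope one.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_EVENTS 8   /* hypothetical batch size */

typedef struct { int tag; } sketch_entry_t;

/* Processes up to SKETCH_MAX_EVENTS pending completions and returns the
   sum of their tags. */
static int progress_sketch(const int *pending, size_t npending)
{
    /* Per-call array on the stack: every thread that calls this function
       gets its own copy, unlike a single file-scope array shared by all
       callers. */
    sketch_entry_t entries[SKETCH_MAX_EVENTS];
    size_t n = npending < SKETCH_MAX_EVENTS ? npending : SKETCH_MAX_EVENTS;
    int sum = 0;

    assert(pending != NULL || 0 == npending);
    for (size_t i = 0; i < n; ++i) {
        entries[i].tag = pending[i];   /* stand-in for reading completions */
    }
    for (size_t i = 0; i < n; ++i) {
        sum += entries[i].tag;         /* stand-in for handling each one */
    }
    return sum;
}
```

Because `entries` lives in the caller's stack frame, two threads progressing at the same time never write to the same slots.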
Jeff Squyres
09ad7258e2
Merge pull request #5527 from jsquyres/pr/external-internal-hwloc-libevent-fixes
External internal hwloc libevent fixes
2018-08-09 09:45:58 -04:00
Jeff Squyres
6a2891e092
Merge pull request #5526 from jsquyres/pr/man-page-script-abstraction-break
Fix script abstraction break: mv make_manpage.pl to config
2018-08-09 08:31:39 -04:00
Jeff Squyres
b063cb6b0f libevent2022: only configure if event:external fails
We know that event:external will be configured first (because of its
priority).  Take advantage of that here in libevent2022 by having it
refuse to configure / politely fail if event:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., event:external succeeded, so this component
will be skipped, or event:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 10:22:38 -07:00
Jeff Squyres
4e5f432786 hwloc201: only configure if hwloc:external fails
We know that hwloc:external will be configured first (because of its
priority).  Take advantage of that here in hwloc201 by having it
refuse to configure / politely fail if hwloc:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., hwloc:external succeeded, so this component
will be skipped, or hwloc:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 10:22:38 -07:00
Jeff Squyres
89773c41a2 Fix script abstraction break: mv make_manpage.pl to config
Having the "make_manpage.pl" script in the ompi/ tree broke
"./autogen.pl --no-ompi" (specifically: "make distcheck" of --no-ompi
builds would break).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 08:50:55 -07:00
Nathan Hjelm
bdbb853461 opal/progress: protect against multiple threads in event base
libevent does not support multiple threads calling the event loop on
the same event base. This causes external libevent installations to print out
re-entrant warning messages. This commit fixes the issue by protecting
the call to the event loop with an atomic swap check.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-07 16:13:43 -06:00
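
The guard described in bdbb853461 can be sketched with a C11 `atomic_flag`. This is an illustration, not the opal/progress code: `progress_events` and `run_event_loop_once` are hypothetical names, and the stand-in loop body only counts calls instead of invoking libevent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag event_loop_busy = ATOMIC_FLAG_INIT;
static bool in_loop = false;   /* only touched while the flag is held */
static int loop_calls = 0;     /* how many times the stand-in loop ran */

/* Stand-in for the real event-loop call; not libevent API. */
static void run_event_loop_once(void)
{
    assert(!in_loop);          /* the guard must prevent re-entry */
    in_loop = true;
    ++loop_calls;
    in_loop = false;
}

/* Returns true if this caller ran the loop, false if another thread was
   already inside it and the call was skipped. */
static bool progress_events(void)
{
    /* Atomic swap: set the flag and learn whether it was already set. */
    if (atomic_flag_test_and_set_explicit(&event_loop_busy,
                                          memory_order_acquire)) {
        return false;
    }
    run_event_loop_once();
    atomic_flag_clear_explicit(&event_loop_busy, memory_order_release);
    return true;
}
```

A caller that loses the swap simply returns; it does not block, which suits a progress function that is polled repeatedly.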
Todd Kordenbrock
e9f378e851
Merge pull request #5500 from tkordenbrock/topic/master/fix.PtlMEUnlink.in.use
coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
2018-08-07 11:21:00 -05:00
Nathan Hjelm
c294bbc352
Merge pull request #5508 from hjelmn/fuzzy_match
Bring fuzzy matching support into master
2018-08-06 13:52:04 -06:00
Nathan Hjelm
eeae3f9b93
Merge pull request #5517 from bosilca/topic/treematch_warnings
Remove few warnings identified by @rhc in #5514.
2018-08-06 13:25:07 -06:00
Matthew Dosanjh
c8d13486cc Fixed promotion bug
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-06 12:56:36 -06:00
Zoltán Mizsei
ac3f8a16ed fcntl include bugfix
Signed-off-by: Zoltán Mizsei <zmizsei@extrowerk.com>
2018-08-06 19:45:59 +02:00
Ralph Castain
97da19f203
Merge pull request #5498 from karasevb/pmix_fence_status
pmix: added check for pmix fence status
2018-08-06 05:33:44 -07:00
Boris Karasev
57683366ca pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-08-06 15:01:57 +06:00
George Bosilca
6d11a45f44
Remove few warnings identified by @rhc in #5514.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-08-03 16:21:06 -04:00
Thananon Patinyasakdikul
080115d440 btl/ofi: Added 2 side communication support.
Two-sided communication support is added so that non-tag-matching
providers can take advantage of this BTL and PML OB1. The current state
is "functional" and not optimized for performance.

Two-sided support is disabled by default and can be turned on with the
MCA parameter "mca_btl_ofi_mode".

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-08-03 12:30:03 -07:00
George Bosilca
a5fbfa476a
Be conservative with the array_of_indices
We were assuming that the array_of_indices has the same size as the
number of requests (incount), instead of the number of actually
active requests. While the patch is trivial, the question of the
size of the array_of_indices should be clarified in the MPI Forum.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-08-03 14:58:13 -04:00
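
One way to read the conservative approach in a5fbfa476a is sketched below in C. The function name `collect_completed` and the `active`/`complete` flag arrays are hypothetical, and this is not the Open MPI request code. The sketch assumes only that the caller's `array_of_indices` may be as small as the number of active requests, so it never writes more slots than that.

```c
#include <assert.h>

/* Writes the index of every request that is both active and complete into
   array_of_indices and returns how many were written. The number of writes
   can never exceed the number of active requests. */
static int collect_completed(const int *active, const int *complete,
                             int incount, int *array_of_indices)
{
    int outcount = 0;

    assert(incount >= 0);
    for (int i = 0; i < incount; ++i) {
        if (active[i] && complete[i]) {
            array_of_indices[outcount++] = i;
        }
    }
    return outcount;
}
```

With four requests of which two are active, the function touches at most two slots, whatever `incount` is.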
Ralph Castain
ae03014690
Merge pull request #5515 from rhc54/topic/ignore
Update ignores
2018-08-03 08:52:52 -07:00
Ralph Castain
cfbb630243 Update ignores
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-08-03 08:51:36 -07:00
Nathan Hjelm
dd74c6252f pml/ob1: custom matching cleanup and configury
This commit updates the new custom matching code in pml/ob1 so it can
now be enabled with a configure option. This commit also renames the
fuzzy-matching headers to avoid potential name conflicts and removes
the use of C reserved identifiers.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-02 13:06:19 -06:00
Matthew Dosanjh
572694b621 Adding custom match source.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-02 12:23:08 -06:00
Nathan Hjelm
ea6f936900
Merge pull request #5505 from hjelmn/orted_threads
orte/runtime: always set opal_using_threads for orted/mpirun
2018-08-01 09:45:02 -06:00
Nathan Hjelm
551133fd1a orte/runtime: always set opal_using_threads for orted/mpirun
Both orted and mpirun use threads to speed up local process spawning.
In order to avoid data corruption when calling the opal_output
interface we need to ensure that opal_using_threads is set to true.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-01 08:53:22 -06:00
Ralph Castain
1aef0a64aa
Merge pull request #5477 from nrspruit/ns_mtl_send_isend
MTL OFI: send/isend split into blocking/non-blocking paths
2018-07-31 13:08:37 -07:00
Ralph Castain
8744320a18
Merge pull request #5476 from nrspruit/ns_cancel_fix
MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
2018-07-31 13:07:41 -07:00
Todd Kordenbrock
f3f2a826b4 coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME.  This should not
be considered an error.  This commit adds a retry loop around
PtlMEUnlink().

In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all.  Check them with the same retry loop as
above.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2018-07-31 10:20:55 -05:00
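
The retry loop described in f3f2a826b4 might look roughly like the C sketch below. It does not call the Portals4 API: the `SKETCH_*` status codes stand in for `PTL_OK` and `PTL_IN_USE`, and `release_with_retry` and `fake_unlink` are hypothetical. The `max_tries` bound is an assumption added so the sketch cannot spin forever; the commit message does not say whether the real loop is bounded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the Portals4 return values. */
enum { SKETCH_OK = 0, SKETCH_IN_USE = 1, SKETCH_FAIL = 2 };

typedef int (*release_fn)(void *handle);

/* Calls release() again while it reports "in use", up to max_tries times.
   "In use" is treated as "not finished yet", not as an error. */
static int release_with_retry(release_fn release, void *handle, int max_tries)
{
    int rc = SKETCH_IN_USE;

    assert(release != NULL && max_tries > 0);
    for (int i = 0; i < max_tries && SKETCH_IN_USE == rc; ++i) {
        rc = release(handle);
    }
    return rc;
}

/* Test double: reports "in use" until its counter reaches zero. */
static int fake_unlink(void *handle)
{
    int *busy_polls = handle;

    if (*busy_polls > 0) {
        --*busy_polls;
        return SKETCH_IN_USE;
    }
    return SKETCH_OK;
}
```

Routing every release call through one helper also covers the second point of the commit: no return value goes unchecked.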
Mark Allen
f413ef6b14 apply romio314 patch to romio321
When romio314 was first pulled in, an extra patch was applied to it; see commit
92f6c7c1e2. Most of that patch is already present
in vanilla romio321, but the fix for MPIO_DATATYPE_ISCOMMITTED() isn't.

If that macro doesn't set err_ then some paths end up with a variable being used
uninitialized. In particular you can trace through romio321/romio/mpi-io/read.c
to see what happens with error_code. It's an uninitialized stack variable that goes
through three MPIO_CHECK_* macros none of which set it. The macros consistently set
error_code to a failure if they see something wrong, but they don't consistently
set it to success when things are fine.

And then in the last macro MPIO_CHECK_DATATYPE it tries to look at the value
of error_code that was never set.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2018-07-30 17:14:56 -04:00
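
The failure mode described in f413ef6b14 can be reduced to a short C sketch. The macros below are hypothetical simplifications, not the ROMIO `MPIO_CHECK_*` macros: one sets the error variable only on failure, the other sets it on both paths, which is the behavior the fix relies on.

```c
#include <assert.h>

enum { SKETCH_SUCCESS = 0, SKETCH_ERR_TYPE = 1 };

/* Sets err_ only on failure: on success err_ keeps whatever value it had,
   which is indeterminate for an uninitialized stack variable. */
#define CHECK_FAILURE_ONLY(committed_, err_) \
    do { if (!(committed_)) { (err_) = SKETCH_ERR_TYPE; } } while (0)

/* Sets err_ on both paths, so later code can always read it safely. */
#define CHECK_BOTH_PATHS(committed_, err_) \
    do { (err_) = (committed_) ? SKETCH_SUCCESS : SKETCH_ERR_TYPE; } while (0)

/* Mimics the shape of the read path: an uninitialized error_code that is
   only ever assigned by a check macro. */
static int read_sketch(int datatype_committed)
{
    int error_code;   /* deliberately uninitialized, as in the description */

    assert(0 == datatype_committed || 1 == datatype_committed);
    CHECK_BOTH_PATHS(datatype_committed, error_code);
    return error_code;
}
```

Swapping `CHECK_BOTH_PATHS` for `CHECK_FAILURE_ONLY` in `read_sketch` would return an indeterminate value whenever the datatype is committed.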
Yossi Itigin
f614438518
Merge pull request #5480 from hoopoepg/topic/ucx-init-c99
PML/SPML/UCX: init global objects using C99 style
2018-07-28 16:20:42 +03:00
Sergey Oblomov
d204b8a678 PML/SPML/UCX/COMPONENT: applied C99 initialization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-28 09:44:03 +03:00
Mikhail Kurnosov
b45e190e66 coll/base/allgatherv: fix MPI_IN_PLACE processing
Calling MPI_Allgatherv with the sendbuf and sendtype parameters set to MPI_IN_PLACE and NULL, respectively, causes a segmentation fault.

The problem is that sendtype is used even when sendbuf is MPI_IN_PLACE. According to the standard, the sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-27 09:34:17 +07:00
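
The rule behind b45e190e66 can be shown without MPI in a small C sketch. Everything here is a stand-in: `SKETCH_IN_PLACE` plays the role of `MPI_IN_PLACE`, `sketch_type_t` is a toy datatype with only an extent, and `place_local_block` is a hypothetical helper for the step that copies a rank's own contribution into the receive buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy datatype: only the extent in bytes is modeled. */
typedef struct { size_t extent; } sketch_type_t;

/* Unique address used as the "in place" sentinel. */
static char sketch_in_place_token;
#define SKETCH_IN_PLACE ((const void *)&sketch_in_place_token)

/* Copies the local contribution into its slot in recvbuf and returns the
   number of bytes copied. When sendbuf is the in-place sentinel, sendcount
   and sendtype are ignored and never dereferenced. */
static size_t place_local_block(const void *sendbuf, size_t sendcount,
                                const sketch_type_t *sendtype,
                                void *recvbuf, size_t displ,
                                const sketch_type_t *recvtype)
{
    assert(recvbuf != NULL && recvtype != NULL);
    if (SKETCH_IN_PLACE == sendbuf) {
        return 0;   /* data already sits in recvbuf; sendtype may be NULL */
    }
    assert(sendbuf != NULL && sendtype != NULL);
    size_t nbytes = sendcount * sendtype->extent;
    memcpy((char *)recvbuf + displ * recvtype->extent, sendbuf, nbytes);
    return nbytes;
}
```

The sentinel check comes before any use of `sendtype`, which is what makes a NULL `sendtype` safe in the in-place case.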
Ralph Castain
d67619b760
Merge pull request #5487 from rhc54/topic/grr
Fix typo
2018-07-25 21:36:35 -07:00
Ralph Castain
f7a537cf04 Fix typo
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 20:05:33 -07:00
Ralph Castain
fcefa44ad0
Merge pull request #5483 from rhc54/topic/maps
Fix the map-by modifier parsing
2018-07-25 19:51:15 -07:00
Ralph Castain
bcdb1f45ac Fix the multiple pe/proc option
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.

Thanks to @tonyreina for the error report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 18:47:39 -07:00
Ralph Castain
a32fc958f4 Add a bunch of missing ignores
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 16:13:08 -07:00
Ralph Castain
55cefedf9b Cleanup pmix selection check
Allow for versions > 3

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 16:11:32 -07:00
Sergey Oblomov
2806504290 PML/SPML/UCX: init global objects using C99 style
- use C99-style object initialization to avoid mixing up values

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-25 14:52:45 +03:00
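
The difference that 2806504290 refers to can be seen in a short C sketch. The struct `sketch_component_t` and its fields and values are hypothetical, not the UCX component structures; the sketch only contrasts positional and C99 designated initialization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component descriptor. */
typedef struct {
    const char *name;
    int priority;
    int num_workers;
    int verbose;
} sketch_component_t;

/* Positional: each value is matched to a field by order alone, so adding
   or reordering fields silently shifts the values. */
static const sketch_component_t positional = { "sketch", 10, 1, 0 };

/* C99 designated: each value names its field; omitted fields are zero. */
static const sketch_component_t designated = {
    .name = "sketch",
    .priority = 10,
    .num_workers = 1,
    /* .verbose omitted on purpose: it is zero-initialized */
};

/* Basic sanity check on a descriptor. */
static void check_component(const sketch_component_t *c)
{
    assert(c->name != NULL);
    assert(c->num_workers > 0);
}
```

Both objects hold the same values today, but only the designated form stays correct if a field is later inserted before `priority`.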
Howard Pritchard
99ad8d4f2a VERSION: move master to 4.1
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-07-24 12:37:03 -07:00
Matias Cabral
f3db153f75
Merge pull request #5468 from aravindksg/aravindksg/mem_tag_format
MTL OFI: Add support for mem_tag_format
2018-07-24 10:34:22 -07:00