29140 Commits

Author SHA1 Message Date
Jeff Squyres
6a2891e092
Merge pull request #5526 from jsquyres/pr/man-page-script-abstraction-break
Fix script abstraction break: mv make_manpage.pl to config
2018-08-09 08:31:39 -04:00
Jeff Squyres
b063cb6b0f libevent2022: only configure if event:external fails
We know that event:external will be configured first (because of its
priority).  Take advantage of that here in libevent2022 by having it
refuse to configure / politely fail if event:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., event:external succeeded, so this component
will be skipped, or event:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 10:22:38 -07:00
Jeff Squyres
4e5f432786 hwloc201: only configure if hwloc:external fails
We know that hwloc:external will be configured first (because of its
priority).  Take advantage of that here in hwloc201 by having it
refuse to configure / politely fail if hwloc:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., hwloc:external succeeded, so this component
will be skipped, or hwloc:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 10:22:38 -07:00
Jeff Squyres
89773c41a2 Fix script abstraction break: mv make_manpage.pl to config
Having the "make_manpage.pl" script in the ompi/ tree broke
"./autogen.pl --no-ompi" (specifically: "make distcheck" of --no-ompi
builds would break).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 08:50:55 -07:00
Nathan Hjelm
bdbb853461 opal/progress: protect against multiple threads in event base
libevent does not support multiple threads calling the event loop on
the same event base. This causes external libevent installations to print out
re-entrant warning messages. This commit fixes the issue by protecting
the call to the event loop with an atomic swap check.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-07 16:13:43 -06:00
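The atomic swap check described in bdbb853461 can be sketched roughly as follows. This is a minimal illustration using C11 atomics, not the code from the commit: `event_loop_busy`, `run_event_loop_once`, and `progress_events` are hypothetical names, and the event loop itself is replaced by a counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard: 0 = event loop idle, 1 = a thread is inside it. */
static atomic_int event_loop_busy = 0;

/* Stand-in for the real event loop call; it only counts invocations. */
static int loop_calls = 0;
static void run_event_loop_once(void) { ++loop_calls; }

/* Returns true if this thread ran the loop, false if it was skipped. */
static bool progress_events(void)
{
    /* atomic_exchange returns the previous value, so only the thread
     * that swaps 0 -> 1 is allowed to enter the loop. */
    if (atomic_exchange(&event_loop_busy, 1) != 0) {
        return false;
    }
    run_event_loop_once();
    atomic_store(&event_loop_busy, 0);
    return true;
}
```

A thread that loses the swap simply skips the event loop for that progress call instead of blocking, which is one plausible way to keep the progress path cheap.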
Todd Kordenbrock
e9f378e851
Merge pull request #5500 from tkordenbrock/topic/master/fix.PtlMEUnlink.in.use
coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
2018-08-07 11:21:00 -05:00
Nathan Hjelm
c294bbc352
Merge pull request #5508 from hjelmn/fuzzy_match
Bring fuzzy matching support into master
2018-08-06 13:52:04 -06:00
Nathan Hjelm
eeae3f9b93
Merge pull request #5517 from bosilca/topic/treematch_warnings
Remove a few warnings identified by @rhc in #5514.
2018-08-06 13:25:07 -06:00
Matthew Dosanjh
c8d13486cc Fixed promotion bug
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-06 12:56:36 -06:00
Zoltán Mizsei
ac3f8a16ed fcntl include bugfix
Signed-off-by: Zoltán Mizsei <zmizsei@extrowerk.com>
2018-08-06 19:45:59 +02:00
Ralph Castain
97da19f203
Merge pull request #5498 from karasevb/pmix_fence_status
pmix: added check for pmix fence status
2018-08-06 05:33:44 -07:00
Boris Karasev
57683366ca pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-08-06 15:01:57 +06:00
George Bosilca
6d11a45f44
Remove a few warnings identified by @rhc in #5514.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-08-03 16:21:06 -04:00
Thananon Patinyasakdikul
080115d440 btl/ofi: Added two-sided communication support.
Two-sided communication support is added so that providers without tag
matching can take advantage of this BTL and PML OB1. The current state is
"functional" and not optimized for performance.

Two-sided support is disabled by default and can be turned on with the MCA
parameter "mca_btl_ofi_mode".

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-08-03 12:30:03 -07:00
George Bosilca
a5fbfa476a
Be conservative with the array_of_indices
We were assuming that the array_of_indices has the same size as the
number of requests (incount), instead of the number of actually
active requests. While the patch is trivial, the question of the
size of the array_of_indices should be clarified in the MPI Forum.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-08-03 14:58:13 -04:00
Ralph Castain
ae03014690
Merge pull request #5515 from rhc54/topic/ignore
Update ignores
2018-08-03 08:52:52 -07:00
Ralph Castain
cfbb630243 Update ignores
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-08-03 08:51:36 -07:00
Nathan Hjelm
dd74c6252f pml/ob1: custom matching cleanup and configury
This commit updates the new custom matching code in pml/ob1 so it can
now be enabled with a configure option. This commit also renames the
fuzzy-matching headers to avoid potential name conflicts and removes
the use of C reserved identifiers.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-02 13:06:19 -06:00
Matthew Dosanjh
572694b621 Adding custom match source.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-02 12:23:08 -06:00
Nathan Hjelm
ea6f936900
Merge pull request #5505 from hjelmn/orted_threads
orte/runtime: always set opal_using_threads for orted/mpirun
2018-08-01 09:45:02 -06:00
Nathan Hjelm
551133fd1a orte/runtime: always set opal_using_threads for orted/mpirun
Both orted and mpirun use threads to speed up local process spawning.
In order to avoid data corruption when calling the opal_output
interface we need to ensure that opal_using_threads is set to true.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-01 08:53:22 -06:00
Ralph Castain
1aef0a64aa
Merge pull request #5477 from nrspruit/ns_mtl_send_isend
MTL OFI: send/isend split into blocking/non-blocking paths
2018-07-31 13:08:37 -07:00
Ralph Castain
8744320a18
Merge pull request #5476 from nrspruit/ns_cancel_fix
MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
2018-07-31 13:07:41 -07:00
Todd Kordenbrock
f3f2a826b4 coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME.  This should not
be considered an error.  This commit adds a retry loop around
PtlMEUnlink().

In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all.  Check them with the same retry loop as
above.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2018-07-31 10:20:55 -05:00
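The retry loop described in f3f2a826b4 might look roughly like the sketch below. It does not use the Portals4 headers: the `SKETCH_*` codes stand in for the real return values, and `fake_unlink` is a hypothetical stand-in that reports "in use" a few times before succeeding.

```c
/* Stand-in return codes; the real values come from the Portals4 headers. */
enum { SKETCH_OK = 0, SKETCH_IN_USE = 1, SKETCH_FAIL = 2 };

/* Simulated unlink: reports "in use" for the first busy_polls calls. */
static int busy_polls = 3;
static int fake_unlink(void)
{
    if (busy_polls > 0) {
        --busy_polls;
        return SKETCH_IN_USE;
    }
    return SKETCH_OK;
}

/* Retry while the resource is still in use; any other return code
 * (success or a real error) ends the loop and is passed to the caller. */
static int unlink_with_retry(int (*unlink_fn)(void), int *attempts)
{
    int ret;
    *attempts = 0;
    do {
        ret = unlink_fn();
        ++*attempts;
    } while (SKETCH_IN_USE == ret);
    return ret;
}
```

The sketch retries without a bound, which matches the idea that "in use" is a transient state during cleanup; a production loop may want a limit or a progress call between attempts.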
Mark Allen
f413ef6b14 apply romio314 patch to romio321
When romio314 was first pulled in an extra patch was applied to it, see commit
92f6c7c1e210c559471a05aaac9b19e0bd3d71bb. Most of that patch is already present
in vanilla romio321, but the fix for MPIO_DATATYPE_ISCOMMITTED() isn't.

If that macro doesn't set err_ then some paths end up with a variable being used
uninitialized. In particular you can trace through romio321/romio/mpi-io/read.c
to see what happens with error_code. It's an uninitialized stack variable that goes
through three MPIO_CHECK_* macros none of which set it. The macros consistently set
error_code to a failure if they see something wrong, but they don't consistently
set it to success when things are fine.

And then in the last macro MPIO_CHECK_DATATYPE it tries to look at the value
of error_code that was never set.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2018-07-30 17:14:56 -04:00
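The macro problem described in f413ef6b14 can be illustrated with a simplified pair of macros. These are hypothetical stand-ins, not the ROMIO macros themselves: the first writes the error variable only on failure, the second writes it on both paths so an uninitialized stack variable is never read afterwards.

```c
#define SKETCH_SUCCESS  0
#define SKETCH_ERR_TYPE 1

/* Problematic shape: err_ is written only when the check fails. */
#define CHECK_COMMITTED_BUGGY(committed_, err_) \
    do {                                        \
        if (!(committed_)) {                    \
            (err_) = SKETCH_ERR_TYPE;           \
        }                                       \
    } while (0)

/* Fixed shape: err_ is written on both the failure and success paths. */
#define CHECK_COMMITTED_FIXED(committed_, err_) \
    do {                                        \
        if (!(committed_)) {                    \
            (err_) = SKETCH_ERR_TYPE;           \
        } else {                                \
            (err_) = SKETCH_SUCCESS;            \
        }                                       \
    } while (0)

/* Mirrors the pattern in the commit text: an uninitialized stack
 * variable whose value is inspected after the check macro runs. */
static int read_sketch(int committed)
{
    int error_code; /* deliberately left uninitialized */
    CHECK_COMMITTED_FIXED(committed, error_code);
    return error_code;
}
```

With the buggy variant, `read_sketch(1)` would return an indeterminate value; with the fixed variant it returns `SKETCH_SUCCESS`.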
Yossi Itigin
f614438518
Merge pull request #5480 from hoopoepg/topic/ucx-init-c99
PML/SPML/UCX: init global objects using C99 style
2018-07-28 16:20:42 +03:00
Sergey Oblomov
d204b8a678 PML/SPML/UCX/COMPONENT: applied C99 initialization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-28 09:44:03 +03:00
Mikhail Kurnosov
b45e190e66 coll/base/allgatherv: fix MPI_IN_PLACE processing
Calling MPI_Allgatherv with the sendbuf and sendtype parameters set to MPI_IN_PLACE and NULL, respectively, produces a segmentation fault.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-27 09:34:17 +07:00
Ralph Castain
d67619b760
Merge pull request #5487 from rhc54/topic/grr
Fix typo
2018-07-25 21:36:35 -07:00
Ralph Castain
f7a537cf04 Fix typo
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 20:05:33 -07:00
Ralph Castain
fcefa44ad0
Merge pull request #5483 from rhc54/topic/maps
Fix the map-by modifier parsing
2018-07-25 19:51:15 -07:00
Ralph Castain
bcdb1f45ac Fix the multiple pe/proc option
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.

Thanks to @tonyreina for the error report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 18:47:39 -07:00
Ralph Castain
a32fc958f4 Add a bunch of missing ignores
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 16:13:08 -07:00
Ralph Castain
55cefedf9b Cleanup pmix selection check
Allow for versions > 3

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 16:11:32 -07:00
Sergey Oblomov
2806504290 PML/SPML/UCX: init global objects using C99 style
- use C99-style object initialization to avoid mixing up values

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-25 14:52:45 +03:00
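The motivation in 2806504290 can be shown with a small comparison of positional and C99 designated initializers. The struct below is hypothetical and much smaller than any real component type; the field names and values are placeholders chosen for illustration.

```c
/* Hypothetical component descriptor, not the real UCX component type. */
struct sketch_component {
    const char *name;
    int priority;
    int num_disconnect;
    int verbose;
};

/* Positional style: values must follow declaration order, so adding or
 * reordering a field in the struct silently shifts every value. */
static struct sketch_component positional = { "example", 10, 1, 0 };

/* C99 designated style: each value names its field, and any field that
 * is left out is zero-initialized. */
static struct sketch_component designated = {
    .name = "example",
    .priority = 10,
    .num_disconnect = 1,
};
```

Both objects end up with the same contents here, but only the designated form stays correct if the struct layout changes later.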
Howard Pritchard
99ad8d4f2a VERSION: move master to 4.1
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-07-24 12:37:03 -07:00
Matias Cabral
f3db153f75
Merge pull request #5468 from aravindksg/aravindksg/mem_tag_format
MTL OFI: Add support for mem_tag_format
2018-07-24 10:34:22 -07:00
Spruit, Neil R
7dc8c8ba3f MTL OFI: send/isend split into blocking/non-blocking paths
-Updated blocking send to call the send functionality directly and
set the number of expected completion events to 0 initially. This allows
optimization for providers that support fi_tinject up to larger sizes. This
also reduces latency when running the OFI mtl with smaller sizes without
requiring calls to progress, given that fi_tinject is required to complete
the message before returning and will not create any events in the
Completion Queue.

-Updated non-blocking send to directly call fi_tsend and avoid calling
fi_tinject, as the functionality should not wait on completions. This
resolves a bug where applications calling MPI_Isend can overrun the
TX buffer with small (inject) messages, causing a deadlock. In addition,
this improves message rates by not waiting for a message of any size
to complete in non-blocking sends.

-Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv
which is common between isend and send code paths.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-24 07:54:24 -07:00
Nathan Hjelm
f5221a473c
Merge pull request #5473 from hoopoepg/topic/atomic-init-conflict
MCA/ATOMIC: atomic_init renamed to atomic_startup
2018-07-24 07:08:09 -06:00
Spruit, Neil R
767135c580 MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
- If a message for a recv that is being cancelled gets completed after
the call to fi_cancel, then the OFI mtl will enter a deadlock state
waiting for ofi_req->super.ompi_req->req_status._cancelled which will
never happen since the recv was successfully finished.

- To resolve this issue, the OFI mtl now checks ofi_req->req_started
to see if the request has been started within the loop waiting for the
event to be cancelled. If the request is being completed, then the loop
is broken and fi_cancel exits setting
ofi_req->super.ompi_req->req_status._cancelled = false;

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-24 03:12:44 -07:00
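The wait loop described in 767135c580 can be sketched as below. This is a simplified model, not the MTL code: `struct sketch_req` stands in for the real request, the two `progress_*` functions simulate what a progress call might observe, and the assumption is that `req_started` becomes true once the message starts completing.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a receive request. */
struct sketch_req {
    bool req_started; /* message matched and is being completed */
    bool cancelled;   /* cancel event was delivered */
};

/* Simulated progress outcomes, used only for illustration. */
static void progress_cancels(struct sketch_req *req)   { req->cancelled = true; }
static void progress_completes(struct sketch_req *req) { req->req_started = true; }

/* Waits for the cancel event, but leaves the loop if the receive
 * completed instead, so the wait cannot spin forever. */
static bool wait_for_cancel(struct sketch_req *req,
                            void (*progress)(struct sketch_req *))
{
    while (!req->cancelled) {
        progress(req);
        if (req->req_started) {
            /* The message won the race: report "not cancelled". */
            req->cancelled = false;
            return false;
        }
    }
    return true;
}
```

Without the `req_started` check, the second scenario would loop indefinitely, which is the deadlock the commit message describes.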
Sergey Oblomov
3295b23800 MCA/ATOMIC: atomic_init renamed to atomic_startup
- there is a C11 naming conflict: atomic_init is a C macro,
  which causes build issues

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-24 08:49:06 +03:00
Nathan Hjelm
47ed8e8830 btl/uct: fix compile warnings/errors
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-23 14:04:38 -06:00
Matias Cabral
d996f529c0 MTL OFI: Add support for mem_tag_format
OFI providers may reserve some of the upper bits of the tag for
internal usage and expose it using mem_tag_format. Check for that
and adjust communicator bits as needed.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-07-23 11:39:40 -07:00
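The adjustment described in d996f529c0 can be approximated with a little bit arithmetic. The sketch assumes that reserved bits show up as clear bits at the most significant end of the 64-bit mem_tag_format mask; the function names, and the split into "communicator bits" and "other field bits", are hypothetical and do not reproduce the MTL's actual tag layout.

```c
#include <stdint.h>

/* Counts how many of the most significant bits of the mask are clear.
 * Assumption for this sketch: those are the bits the provider reserves. */
static int reserved_upper_bits(uint64_t mem_tag_format)
{
    int reserved = 0;
    uint64_t probe = UINT64_C(1) << 63;
    while (reserved < 64 && 0 == (mem_tag_format & probe)) {
        ++reserved;
        probe >>= 1;
    }
    return reserved;
}

/* Bits left for the communicator ID once the reserved bits and the bits
 * needed by the other tag fields are taken out of the 64-bit tag. */
static int usable_cid_bits(uint64_t mem_tag_format, int wanted_cid_bits,
                           int other_field_bits)
{
    int available = 64 - reserved_upper_bits(mem_tag_format) - other_field_bits;
    if (available < 0) {
        available = 0;
    }
    return available < wanted_cid_bits ? available : wanted_cid_bits;
}
```

For example, a mask with the top 8 bits clear leaves 56 tag bits; if the other fields need 34 of them, at most 22 remain for the communicator ID in this model.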
Ralph Castain
7d6fd346cb
Merge pull request #5466 from rhc54/topic/oops
Leave opal_event_external_support exposed as global var
2018-07-23 11:11:10 -07:00
Ralph Castain
5cab823979 Leave opal_event_external_support exposed as global var
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-23 09:18:48 -07:00
Gilles Gouaillardet
92d89411ca
Merge pull request #5395 from jsquyres/pr/prefer-externals
Change defaults to prefer external libevent/hwloc
2018-07-23 11:46:01 +09:00
Jeff Squyres
a70ecf5267 event/external: prefer external event component
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-07-23 09:20:27 +09:00
Jeff Squyres
83e4a45a9f event: trivial comment change
Switch from #-style to dnl-style.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-07-23 09:20:27 +09:00
Gilles Gouaillardet
ce2c9fffd4 hwloc: prefer external hwloc component
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-07-23 09:20:27 +09:00
Matias Cabral
30fb635836
Merge pull request #5446 from nrspruit/ns_mtl_ofi_overflow
MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
2018-07-20 14:53:53 -07:00