Commit graph

26455 commits

Author SHA1 Message Date
Nathan Hjelm
a718743a5c opal/timer: add code to check if rdtsc is core invariant
Newer x86 processors have a core invariant tsc. On these systems it is
safe to use the rdtsc instruction as a monotonic timer. This commit
adds a new function to the opal timer code to check if the timer
backend is monotonic. On x86 it checks the appropriate bit and on
other architectures it parrots back the OPAL_TIMER_MONOTONIC value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-12-16 15:11:50 -07:00
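
As an illustration of the check described in a718743a5c, here is a minimal sketch, assuming a GCC or Clang toolchain that provides `<cpuid.h>`. On x86, the invariant TSC capability is reported in CPUID leaf 0x80000007, EDX bit 8. The names `example_timer_is_monotonic`, `edx_reports_invariant_tsc`, and `EXAMPLE_TIMER_MONOTONIC` are hypothetical and are not taken from the Open MPI sources; the commit's actual code may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Hypothetical stand-in for a per-platform "timer is monotonic" constant. */
#define EXAMPLE_TIMER_MONOTONIC 0

/* Invariant TSC is advertised in CPUID leaf 0x80000007, EDX bit 8. */
static bool edx_reports_invariant_tsc(uint32_t edx)
{
    return (edx & (UINT32_C(1) << 8)) != 0;
}

/* On x86, query the CPU; elsewhere, return the compile-time constant. */
static bool example_timer_is_monotonic(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return edx_reports_invariant_tsc(edx);
#else
    return EXAMPLE_TIMER_MONOTONIC != 0;
#endif
}
```

A caller would consult `example_timer_is_monotonic()` once at startup and fall back to a slower clock source when it returns false.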
Josh Hursey
ced245d093 Merge pull request #2590 from jjhursey/topic/osc-pt2pt-1-thread-fixes
Topic/osc pt2pt 1 thread fixes
2016-12-16 12:26:59 -06:00
Mark Allen
eec1d5bf2e osc/pt2pt: Fix hang with Put and Win_lock_all
* When using `MPI_Put` with `MPI_Win_lock_all` a hang is possible since
   the `put` is waiting on `eager_send_active` to become `true` but
   that variable might not be reset in the case of `MPI_Win_lock_all`
   depending on other incoming events (e.g., `post` or ACKs of lock
   requests).

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2016-12-16 11:52:53 -05:00
Mark Allen
0d1336b4a8 osc/pt2pt: Fix Lock/Unlock and Get wrong answer
* When using `MPI_Lock`/`MPI_Unlock` with `MPI_Get` and non-contiguous
   datatypes it is possible that the unlock finishes too early before
   the data is actually present in the recv buffer.
 * We need to wait for the irecv to complete before unlocking the target.
   This commit waits for the outgoing fragment counts to become equal
   before unlocking.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2016-12-16 11:52:51 -05:00
Mark Allen
1ebf9fd3a4 osc/pt2pt: Fix PSCW after Fence wrong answer.
* If the user uses PSCW synchronization after a Fence then the previous
   epoch is not reset which can cause the PSCW to transfer data before
   it is ready leading to wrong answers.
 * This commit resets the `eager_send_active` in the start call.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2016-12-16 11:52:49 -05:00
Joshua Ladd
d8c1a3de3a Merge pull request #2589 from xinzhao3/topic/ucx-mt-support
PML/SPML/UCX: add UCX MT support to PML and SPML.
2016-12-16 08:53:50 -05:00
Xin Zhao
0ecf3c951c PML/SPML/UCX: add UCX MT support to PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2016-12-15 23:59:15 +02:00
Martin Kontsek
30d076a2f7 Add arguments to rpmbuild script and update README, implement pull request suggestions.
Signed-off-by: Martin Kontsek <mkontsek@cisco.com>
2016-12-15 11:18:41 -08:00
rhc54
00b87ea829 Merge pull request #2584 from rhc54/topic/warnings
Reduce the flood of warnings due to uninitialized variables, mismatch…
2016-12-15 10:09:01 -08:00
rhc54
e84f738c11 Merge pull request #2587 from rhc54/topic/oversub
Ensure that we don't bind-by-default in an oversubscribed condition
2016-12-15 09:59:48 -08:00
Ralph Castain
2af677b1cf Ensure that we don't bind-by-default in an oversubscribed condition
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-15 07:58:52 -08:00
Ralph Castain
585540bcee Reduce the flood of warnings due to uninitialized variables, mismatched types, and unused things to a more bearable trickle
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 16:33:50 -08:00
Gilles Gouaillardet
a019095b84 pmix2x/class: correctly handle concurrent class initialization
(back-ported from upstream commit pmix/master@ceedbd67fd)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-15 09:07:24 +09:00
rhc54
15b6eaf2d4 Merge pull request #2562 from rhc54/topic/pmix2
Update the PMIx2 support to include the latest shared memory optimizations
2016-12-14 15:18:33 -08:00
Ralph Castain
884fb7fcf2 Update the PMIx2 support to include the latest shared memory optimizations
Update ORTE support for dynamic PMIx operations e.g., PMIx_Spawn
Update to track master
Ensure that --disable-pmix-dstore actually disables the dstore. Sync to a few debugger updates

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 15:00:10 -08:00
rhc54
db32d1d600 Merge pull request #2577 from rhc54/topic/exitcode
Ensure jobs that fail always return a non-zero exit code.
2016-12-14 10:35:07 -08:00
Jeff Squyres
3cb3220094 AUTHORS: update via make-authors.pl script
Update .mailmap to catch some inconsistencies.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-14 10:28:48 -08:00
Jeff Squyres
a28ae984ee make-authors: we no longer require organizations
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-14 10:20:56 -08:00
Ralph Castain
9f69b0183f Ensure jobs that fail always return a non-zero exit code.
Thanks to Ashley Pittman for the report.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 09:41:06 -08:00
rhc54
82110dcb53 Merge pull request #2575 from rhc54/topic/tmpdir
Use the server tmpdir instead of the system tmpdir for tool contact files
2016-12-14 09:40:13 -08:00
Ralph Castain
1961a1c22a Use the server tmpdir instead of the system tmpdir for tool contact files
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 08:42:09 -08:00
Alex Mikheev
67d66c2326 oshmem: sshmem: make mmap allocator a default instead of verbs
By default use mmap() to allocate memory for the symmetric heap.
It is a safer and more portable choice than sysv and verbs.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-12-14 13:31:16 +02:00
Nathan Hjelm
8155124adc Merge pull request #2558 from hjelmn/datatype_fix
ompi/datatype: fix bug in darray that causes MPI/IO failures
2016-12-13 14:02:15 -07:00
Yossi
fa6e263821 Merge pull request #2537 from alinask/topic/pml-spml-ucx-api
PML/SPML/UCX: Adapt to the API changes in the UCX lib.
2016-12-13 20:01:47 +02:00
Nathan Hjelm
eb439228b1 ompi/datatype: fix bug in darray that causes MPI/IO failures
This commit fixes errors in the lb and extent of darray datatypes. For
these datatypes the lb should be the start offset of the rank's data
in the array and the extent should be the size of the entire
datatype. In master the lb was always 0 and the extent was always too
small. This commit updates the call to opal_datatype_resize to set the
correct lb and fixes the extent calculation.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-12-13 09:25:16 -07:00
Jeff Squyres
f9e8a55a0e Merge pull request #2543 from ggouaillardet/topic/dll_bit_reproducible
ompi/debuggers: make the binary bit reproducible
2016-12-09 06:35:47 -05:00
KAWASHIMA Takahiro
ae056d957c Merge pull request #2545 from kawashima-fj/pr/inactive-persistent-request
ompi/request: Fix a persistent request creation bug
2016-12-09 08:42:31 +09:00
Jeff Squyres
1187212f5d scaling.pl: minor change to perl quoting
Makes emacs syntax highlighting work better.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-08 09:25:08 -08:00
Ralph Castain
d5a428b646 Scaling test should only launch one proc/node
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-08 09:24:22 -08:00
KAWASHIMA Takahiro
6510800c16 ompi/request: Fix a persistent request creation bug
According to MPI-3.1 p.52 and p.53 (cited below), a request
created by `MPI_*_INIT` but not yet started by `MPI_START` or
`MPI_STARTALL` is inactive; therefore `MPI_WAIT` and its friends
must return immediately if such a request is passed.

The current implementation hangs in `MPI_WAIT` and its friends
in such a case because a persistent request is initialized as
`req_complete = REQUEST_PENDING`. This commit fixes the
initialization.

Also, this commit fixes internal requests used in `MPI_PROBE`
and `MPI_IPROBE`, which were wrongly marked as persistent.

MPI-3.1 p.52:

We shall use the following terminology: A null handle is a handle
with value MPI_REQUEST_NULL. A persistent request and the handle
to it are inactive if the request is not associated with any ongoing
communication (see Section 3.9). A handle is active if it is neither
null nor inactive. An empty status is a status which is set to return
tag = MPI_ANY_TAG, source = MPI_ANY_SOURCE, error = MPI_SUCCESS, and
is also internally configured so that calls to MPI_GET_COUNT,
MPI_GET_ELEMENTS, and MPI_GET_ELEMENTS_X return count = 0 and
MPI_TEST_CANCELLED returns false. We set a status variable to empty
when the value returned by it is not significant. Status is set in
this way so as to prevent errors due to accesses of stale information.

MPI-3.1 p.53:

One is allowed to call MPI_WAIT with a null or inactive request
argument. In this case the operation returns immediately with empty
status.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2016-12-08 21:42:05 +09:00
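
The rule cited in 6510800c16 can be sketched as a small state machine. This is a toy model and not Open MPI code: the `toy_*` names and the three states are assumptions made for illustration, and no MPI library is involved. It shows why a persistent request has to start out inactive: a wait on an inactive request returns at once, whereas a request created in a "pending" state would leave the waiter blocked until something completes it.

```c
#include <stdbool.h>

/* Toy request states; the names are illustrative, not Open MPI's. */
typedef enum { TOY_INACTIVE, TOY_PENDING, TOY_COMPLETE } toy_state_t;

typedef struct {
    bool persistent;
    toy_state_t state;
} toy_request_t;

/* Like MPI_*_INIT: create a persistent request that is not yet started. */
static toy_request_t toy_init(void)
{
    toy_request_t req = { true, TOY_INACTIVE };
    return req;
}

/* Like MPI_START: the request becomes active. */
static void toy_start(toy_request_t *req)
{
    req->state = TOY_PENDING;
}

/* Stand-in for the progress engine finishing the communication. */
static void toy_progress(toy_request_t *req)
{
    if (req->state == TOY_PENDING) {
        req->state = TOY_COMPLETE;
    }
}

/* Like MPI_WAIT. Returns true when the request was inactive, which is
 * the case where MPI would return immediately with an empty status. */
static bool toy_wait(toy_request_t *req)
{
    if (req->state == TOY_INACTIVE) {
        return true;
    }
    while (req->state != TOY_COMPLETE) {
        toy_progress(req);
    }
    /* A completed persistent request becomes inactive again. */
    if (req->persistent) {
        req->state = TOY_INACTIVE;
    }
    return false;
}
```

In this model, calling `toy_wait` directly after `toy_init` takes the early-return branch, while calling it after `toy_start` waits for completion and leaves the request ready for reuse.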
Alina Sklarevich
e9d2d029c6 PML/SPML/UCX: Adapt to the API changes in the UCX lib.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2016-12-08 11:33:29 +02:00
Gilles Gouaillardet
804a784fce Merge pull request #2544 from ggouaillardet/topic/mca_spml_yoda_get
spml/yoda: fix support for BTLs that do not register memory in mca_sp…
2016-12-08 17:26:07 +09:00
Gilles Gouaillardet
062ed9c919 spml/yoda: fix support for BTLs that do not register memory in mca_spml_yoda_get()
Refs open-mpi/ompi#2499

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-08 15:56:25 +09:00
Gilles Gouaillardet
4d8f606420 ompi/debuggers: make the binary bit reproducible
Instead of the compilation date __DATE__, use an MPI_Get_library_version()-like string

Thanks Alastair McKinstry for the report

Fixes open-mpi/ompi#2518

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-08 13:46:43 +09:00
rhc54
341ab683de Merge pull request #2532 from rhc54/topic/pmixptl
Update to latest PMIx master + PTL branch
2016-12-07 17:28:22 -08:00
rhc54
25a3e27b07 Merge pull request #2542 from rhc54/topic/ashley
Correctly clean up the local children and node map info on remote orte…
2016-12-07 15:53:08 -08:00
Ralph Castain
e1aa7939ef Correctly clean up the local children and node map info on remote orteds upon job completion. Ensure that register_nspace only includes procs from that job in the proc map
Thanks to Ashley Pittman for the report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-07 13:53:00 -08:00
rhc54
309c967946 Merge pull request #2536 from ggouaillardet/topic/ess_base_update_routing_plan
ess/base: invoke orte_routed.update_routing_plan() earlier
2016-12-07 07:20:20 -08:00
Gilles Gouaillardet
123036dbf8 ess/base: invoke orte_routed.update_routing_plan() earlier
Fix an issue that can be reproduced with two nodes:
n0$ mpirun --host n1:1 --mca oob_tcp_static_ipv4_ports 1234 -np 1 --mca routed radix --mca oob tcp true

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-07 17:19:25 +09:00
Ralph Castain
fbed2d794a Update to latest PMIx master + PTL branch
Update the usock component to disable it

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-06 20:47:44 -08:00
rhc54
f95f11d285 Merge pull request #2534 from rhc54/topic/pmixconfig
Update pmix check headers to support OpenBSD
2016-12-06 20:46:34 -08:00
Gilles Gouaillardet
1635c293e6 Merge pull request #2522 from ggouaillardet/topic/misc_asm_fixes
Topic/misc asm fixes
2016-12-07 13:31:49 +09:00
Ralph Castain
d51821cbc7 Update pmix check headers to support OpenBSD
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-06 19:37:06 -08:00
rhc54
c2f57581df Merge pull request #2527 from rhc54/topic/sig2
Update signal handling to introduce a pause between SIGCONT and SIGT…
2016-12-06 14:02:25 -08:00
Ralph Castain
85a634926b Update signal handling to introduce a pause between SIGCONT and SIGTERM, followed by another pause before SIGKILL. Do this within the odls/kill_local_procs function while we know we are blocked in an event, and before the daemon shuts down the event progress loop
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-06 12:34:42 -08:00
alex-mikheev
a3e4c33f0e Merge pull request #2524 from jladd-mlnx/topic/shmemx_h-master
Remove shmemx.h from shmem.h. Add shmem.h to shmemx.h
2016-12-06 09:49:34 +02:00
Gilles Gouaillardet
299a6f8d7c configury: auto-detect armhf and armel architectures on Debian
Thanks Alastair McKinstry for the patch

Fixes open-mpi/ompi#2514

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-06 14:49:54 +09:00
Gilles Gouaillardet
596613c0aa configury: add support for x32 architecture
Thanks Alastair McKinstry for the patch

Fixes open-mpi/ompi#2515

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-06 14:49:37 +09:00
Gilles Gouaillardet
c8b51a2d3b configury: remove some dead code
perl is now mandatory to build Open MPI,
so there is no need to check for it

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-06 14:49:37 +09:00
Joshua Ladd
dc6f4a0feb Remove shmemx.h from shmem.h. Add shmem.h to shmemx.h
Fixes #2483
Signed-off-by: Joshua Ladd <joshual@mellanox.com>
2016-12-06 06:42:26 +02:00