27615 commits

Author SHA1 Message Date
Ralph Castain
68029b27e4 Fix the orte-dvm operations so that orterun can connect and execute an application. There is a lingering problem, though. The first invocation of orterun succeeds every time. However, subsequent invocations have a high probability of hanging in the OOB connection handshake.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-23 17:31:08 -07:00
Ralph Castain
2e23fba5c4 Merge pull request #4136 from rhc54/topic/pmixup
Continue tracking PMIx v2.1.0
2017-08-23 11:16:39 -07:00
Ralph Castain
0561d64748 Continue tracking PMIx v2.1.0
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-23 09:38:27 -07:00
Ralph Castain
f6fd699d44 Merge pull request #4133 from rhc54/topic/modex
Optimize discovery of HWLOC topology
2017-08-22 21:00:49 -07:00
Ralph Castain
e02c39385a Merge branch 'master' into topic/modex
2017-08-22 20:06:35 -07:00
George Bosilca
50f471e31e Cleanup a set of warnings reported by Ralph.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-08-22 23:00:18 -04:00
Gilles Gouaillardet
565b516dae hwloc/base: fix opal_output() usage
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-08-23 10:24:47 +09:00
Ralph Castain
d80b0c7990 If the HWLOC shared memory system is unable to connect, then fall back to providing the topology via XML. Do not automatically provide the XML to every process, as that defeats the purpose of the shared memory system. Instead, use PMIx_Query_info_nb to get the info from the server when required.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 18:12:26 -07:00
Ralph Castain
8273cea9d6 Merge pull request #4132 from rhc54/topic/ext
Fix the external PMIx and HWLOC components
2017-08-22 15:18:55 -07:00
Ralph Castain
38e363c515 Fix the #if check for hwloc version
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 14:07:36 -07:00
Ralph Castain
e3213386ec Fix the internal PMIx installation - matching changes have been upstreamed
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 13:49:07 -07:00
Ralph Castain
a1b15c5666 Roll in update to PMIx master. Transfer updates from pmix2x component to ext2x
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 13:06:47 -07:00
Jeff Squyres
b991135634 Merge pull request #4128 from jsquyres/pr/fix-info-delete-return-value
mpi/info_delete: fix return code
2017-08-22 14:33:29 -04:00
Jeff Squyres
ea5093fc14 mpi/info_delete: fix return code
Per MPI-3.1, raise an MPI exception with value
MPI_ERR_INFO_NOKEY if we try to MPI_INFO_DELETE a key that does not
exist. Thanks to @dalcinl (Lisandro Dalcin) for raising the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-22 08:56:40 -07:00
Ralph Castain
f5fb43e9c7 Merge pull request #4120 from bgoglin/master
fixes and debug messages to the hwloc/shmem use
2017-08-22 07:59:45 -07:00
Brice Goglin
046d870124 rtc/hwloc/shmem: add Inria copyrights
The code for finding the hole for the shmem region actually came from me.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 23:09:57 +02:00
Brice Goglin
2d242ab9f0 hwloc/shmem: don't abort on failure to load from shmem
Adopting can fail if the server-side hole isn't available on the client.

We can fall back to other ways of loading the topology.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 19:57:38 +02:00
Brice Goglin
ffd209fc2e hwloc/shmem: dump /proc/self/maps if failed to find a hole and verbosity > 4
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 19:57:38 +02:00
Brice Goglin
baf762d99d rtc/hwloc/shmem: dump /proc/self/maps if failed to find a hole and verbosity > 4
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 19:57:38 +02:00
Brice Goglin
8f6afbb641 rtc/hwloc/shmem: fix "heap" hole search kind
There can be multiple consecutive [heap] entries in /proc/<pid>/maps,
and there's no room between them.
Don't use a hole after the first [heap] if there's another [heap]
immediately after it.

This code would fail to find the last [heap] if there were multiple
[heap] entries interleaved with non-heap VMAs, but our "after heap" kind
wouldn't be meaningful anymore anyway.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 15:42:38 +02:00
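A minimal C sketch of the corrected "after heap" search described in 8f6afbb641, assuming the mappings are already available as `/proc/<pid>/maps`-style lines. The helpers `parse_vma` and `find_hole_after_heap` are hypothetical names for illustration; this is not the actual rtc/hwloc/shmem code. The point it shows is that a run of consecutive `[heap]` lines is treated as one region, so the hole is taken after the last of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One mapping from a maps-style line: "start-end perms offset dev inode [name]". */
typedef struct {
    uintptr_t start;
    uintptr_t end;
    int is_heap;
} vma_t;

/* Parse the hex address range and detect the "[heap]" pseudo-path. 0 on success. */
static int parse_vma(const char *line, vma_t *v)
{
    unsigned long start, end;
    if (sscanf(line, "%lx-%lx", &start, &end) != 2)
        return -1;
    v->start = (uintptr_t) start;
    v->end = (uintptr_t) end;
    v->is_heap = strstr(line, "[heap]") != NULL;
    return 0;
}

/* Hypothetical helper: find the unmapped gap right after the heap.
 * Consecutive [heap] lines are merged, so the hole starts after the LAST one
 * (there is no room between them). On success fills [*hole_start, *hole_end)
 * and returns 0; returns -1 if there is no heap or no gap after it. */
static int find_hole_after_heap(const char **lines, size_t n,
                                uintptr_t *hole_start, uintptr_t *hole_end)
{
    assert(lines != NULL && hole_start != NULL && hole_end != NULL);
    for (size_t i = 0; i < n; i++) {
        vma_t heap, next;
        if (parse_vma(lines[i], &heap) != 0 || !heap.is_heap)
            continue;
        /* Skip any [heap] entries that immediately follow this one. */
        size_t j = i + 1;
        while (j < n && parse_vma(lines[j], &next) == 0 && next.is_heap) {
            heap = next;
            j++;
        }
        /* Need a following mapping that starts strictly after the heap ends. */
        if (j >= n || parse_vma(lines[j], &next) != 0 || next.start <= heap.end)
            return -1;
        *hole_start = heap.end;
        *hole_end = next.start;
        return 0;
    }
    return -1;
}
```

With two back-to-back `[heap]` lines ending at `0x01042000` and the next mapping starting at `0x40000000`, the sketch reports the hole `[0x01042000, 0x40000000)` instead of the zero-sized gap after the first `[heap]`.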
Brice Goglin
b8b46b253b rtc/hwloc/shmem: fix "libs" hole search kind
We want the biggest hole *between* heap and stack, not outside.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 15:40:36 +02:00
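Commit b8b46b253b restricts the "libs" search to the biggest hole *between* heap and stack. A minimal C sketch of that rule, again assuming `/proc/<pid>/maps`-style lines as input; the names `parse_vma` and `find_biggest_hole_between_heap_and_stack` are hypothetical, and this is not the actual hwloc code. Gaps below the heap and above the stack are never considered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Pseudo-paths this sketch distinguishes in a maps-style line. */
enum vma_kind { VMA_OTHER, VMA_HEAP, VMA_STACK };

/* Parse the hex address range and classify the mapping. 0 on success. */
static int parse_vma(const char *line, uintptr_t *start, uintptr_t *end,
                     enum vma_kind *kind)
{
    unsigned long s, e;
    if (sscanf(line, "%lx-%lx", &s, &e) != 2)
        return -1;
    *start = (uintptr_t) s;
    *end = (uintptr_t) e;
    if (strstr(line, "[heap]"))
        *kind = VMA_HEAP;
    else if (strstr(line, "[stack]"))
        *kind = VMA_STACK;
    else
        *kind = VMA_OTHER;
    return 0;
}

/* Hypothetical helper: biggest unmapped gap after the end of the heap and before
 * the start of the stack. Lines must be in ascending address order, as in maps.
 * Returns 0 and fills [*hole_start, *hole_end) on success, -1 otherwise. */
static int find_biggest_hole_between_heap_and_stack(const char **lines, size_t n,
                                                    uintptr_t *hole_start,
                                                    uintptr_t *hole_end)
{
    assert(lines != NULL && hole_start != NULL && hole_end != NULL);
    int seen_heap = 0, found = 0;
    uintptr_t prev_end = 0, best_size = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t start, end;
        enum vma_kind kind;
        if (parse_vma(lines[i], &start, &end, &kind) != 0)
            continue;
        if (kind == VMA_HEAP) {
            /* (Re)start the search after the most recent heap mapping. */
            seen_heap = 1;
            found = 0;
            best_size = 0;
            prev_end = end;
            continue;
        }
        if (!seen_heap)
            continue; /* still below the heap: not a candidate */
        if (start > prev_end && start - prev_end > best_size) {
            best_size = start - prev_end;
            *hole_start = prev_end;
            *hole_end = start;
            found = 1;
        }
        if (kind == VMA_STACK)
            return found ? 0 : -1; /* nothing above the stack is considered */
        prev_end = end;
    }
    return -1; /* no heap, or no stack after it */
}
```

The search stops at the stack mapping, so a large gap above the stack (outside the heap/stack range) can never be selected.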
Gilles Gouaillardet
a3e31fa8d0 ompi/communicator: plug a memory leak in ompi_comm_init()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-08-21 11:47:11 +09:00
Ralph Castain
9d3f4516e6 Merge pull request #4116 from rhc54/topic/notify
Don't restrict broadcast notifications
2017-08-18 18:13:47 -07:00
Ralph Castain
d515f48885 The local PMIx server is notifying its clients of all events, but for some reason I don't recall, the broadcast notification was marked for delivery only to non-default event handlers. This creates a discrepancy between the two behaviors, so don't restrict the broadcast notifications.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-18 17:26:11 -07:00
Brian Barrett
c667719a3f Merge pull request #3955 from mohanasudhan/master
Btl tcp: Improved diagnostic output and failure mode
2017-08-18 11:42:27 -07:00
Mohan
fc32ae401e Btl Tcp: Updated tcp handshake methods
This commit makes two changes:

1. Adding a magic string to the handshake can cause
issues when used with an older version of MPI. Hence, set
the RCVTIMEO parameter to 2 seconds.
2. Use a single call during the handshake instead of
two calls.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-18 10:06:52 -07:00
Mohan
e3dfe11da9 Btl tcp: Improving verbose around tcp
As part of improving the TCP BTL, we
are improving verbose output in general.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 17:22:16 -07:00
Mohan
4bc7b214dc Btl tcp: Improving verbose around IPV6
As part of improving TCP BTL debugging
and verbose output, we are improving verbose output around IPv6.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:14 -07:00
Mohan
0741fad479 Btl tcp: BTL_ERROR to show_help & update func behaviour
As part of improving TCP debugging,
we are moving a few BTL_ERROR calls to show_help, and also
updating the behaviour of
mca_btl_tcp_endpoint_complete_connect so that it returns
SUCCESS or ERROR accordingly.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:14 -07:00
Mohan
368f9f0dfc Btl tcp: Using magic string to verify mpi connection
As part of improving failure handling
in the TCP BTL, we use a magic string to verify the MPI
connection. If the magic string is missing or does not
match, we can tell that we are trying to
connect to some other process.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:13 -07:00
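Commits 368f9f0dfc and fc32ae401e describe verifying a peer with a magic string during the TCP handshake and bounding the wait with a 2-second RCVTIMEO. The C sketch below illustrates that idea with plain POSIX sockets only. The value `SKETCH_MAGIC` and the function `verify_peer_magic` are hypothetical; this is not the btl/tcp wire format or its actual handshake code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL magic string for this sketch; not the value Open MPI sends. */
#define SKETCH_MAGIC "OMPI-TCP-SKETCH"
#define SKETCH_MAGIC_LEN sizeof(SKETCH_MAGIC) /* includes the trailing NUL */

/* Returns 1 if the peer sent exactly the expected magic string; 0 if it sent
 * something else, closed the connection, or stayed silent past the timeout;
 * -1 if the receive timeout could not be set. */
static int verify_peer_magic(int fd)
{
    /* Bound the wait so a peer that never sends the magic string
     * (e.g. a non-MPI process) cannot block the handshake forever. */
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0)
        return -1;

    char buf[SKETCH_MAGIC_LEN];
    size_t got = 0;
    while (got < sizeof(buf)) {
        ssize_t r = recv(fd, buf + got, sizeof(buf) - got, 0);
        if (r > 0) {
            got += (size_t) r; /* partial reads are possible on stream sockets */
        } else if (r < 0 && errno == EINTR) {
            continue;          /* interrupted by a signal: retry */
        } else {
            return 0;          /* EOF, timeout (EAGAIN/EWOULDBLOCK), or error */
        }
    }
    assert(got == SKETCH_MAGIC_LEN);
    return memcmp(buf, SKETCH_MAGIC, sizeof(buf)) == 0;
}
```

A silent peer makes `recv` fail with `EAGAIN`/`EWOULDBLOCK` after about two seconds, which the sketch reports as a failed verification rather than hanging; a mismatched string identifies a connection to some other process.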
Mohan
c30a42917c Btl tcp: Refactoring non-blocking send/receive function
Moving the non-blocking send/receive functions to btl_tcp
will help reuse these functions wherever needed.
In this case, we plan to reuse the receive function to
retrieve the magic string and validate that an established connection
is from an MPI process.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:13 -07:00
Ralph Castain
b67b1e88a5 Merge pull request #4111 from rhc54/topic/multiconnect
Cleanup some issues in connect/accept support across jobs started by …
2017-08-17 12:49:01 -07:00
Ralph Castain
d85239e052 Cleanup some issues in connect/accept support across jobs started by different mpirun commands. Still not fully operational, but someone else will have to finish debugging it
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-17 11:58:48 -07:00
Ralph Castain
a855ebd86b Merge pull request #4110 from rhc54/topic/cov
Silence coverity warnings
2017-08-17 10:57:31 -07:00
Ralph Castain
088b6cdeee Silence coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-17 09:49:35 -07:00
Jeff Squyres
4e763796b1 Merge pull request #4100 from jsquyres/pr/fix-nmcheck-prefix
nmcheck_prefix: more updates for more compilers
2017-08-16 20:39:34 -04:00
Ralph Castain
1f799afa30 Merge pull request #4106 from rhc54/topic/hwloc
Add diagnostics for hwloc get_topology
2017-08-16 15:47:47 -07:00
Ralph Castain
41df973359 Add diagnostics for hwloc get_topology
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-16 14:21:27 -07:00
Jeff Squyres
cd8db5313e Merge pull request #4101 from jsquyres/pr/usnic-restore-configure-summary-line
btl/usnic: restore configure usNIC summary line
2017-08-16 16:36:19 -04:00
Josh Hursey
e0931714ea Merge pull request #4090 from jjhursey/config/old-xl-ppc-support
config: Remove support for big endian PPC, XL compiler older than 13.1
2017-08-16 15:31:35 -05:00
Ralph Castain
f21dfd3189 Merge pull request #4097 from rhc54/topic/dlopepn
Change test per recommendation of @jsquyres
2017-08-16 13:22:18 -07:00
Jeff Squyres
a591159fb4 btl/usnic: restore configure usNIC summary line
Not sure how/when this got deleted, but put back the "Cisco usNIC"
line in the transport summary at the end of configure.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-16 12:37:59 -07:00
Jeff Squyres
9d09fe0151 nmcheck_prefix: more updates for more compilers
Ignore a few more symbols to pass Absoft and modern gcc.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-16 12:28:49 -07:00
Ralph Castain
c4d5dbfcdc Change test per recommendation of @jsquyres
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-16 11:19:15 -07:00
Jeff Squyres
1f0b6f783c Merge pull request #4095 from jsquyres/pr/fix-compiler-warning
rcash_base_frame: fix compiler warning
2017-08-16 14:02:21 -04:00
Jeff Squyres
ce3a032b5e rcash_base_frame: fix compiler warning
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-16 09:48:31 -07:00
Ralph Castain
4cacd222d6 Merge pull request #4094 from rhc54/topic/pmix210rc1
Update to PMIx v2.1.0a1
2017-08-15 21:20:39 -07:00
Ralph Castain
eb69df02ae Update to PMIx v2.1.0rc1
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-15 19:59:15 -07:00
Ralph Castain
23ffbeb8f8 Merge pull request #4093 from rhc54/topic/toolsupport
Update tool support by adding MCA params to direct orted's to drop
2017-08-15 19:41:45 -07:00
Ralph Castain
65fb6070d9 Update tool support by adding MCA params to direct orted's to drop
session and/or system-level tool rendezvous files. Ensure PMIx is
enabled for tools.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-15 17:49:47 -07:00