Ralph Castain
68029b27e4
Fix the orte-dvm operations so that orterun can connect and execute an application. There is a lingering problem, though. The first invocation of orterun succeeds every time. However, subsequent invocations have a high probability of hanging in the OOB connection handshake.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-23 17:31:08 -07:00
Ralph Castain
2e23fba5c4
Merge pull request #4136 from rhc54/topic/pmixup
...
Continue tracking PMIx v2.1.0
2017-08-23 11:16:39 -07:00
Ralph Castain
0561d64748
Continue tracking PMIx v2.1.0
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-23 09:38:27 -07:00
Ralph Castain
f6fd699d44
Merge pull request #4133 from rhc54/topic/modex
...
Optimize discovery of HWLOC topology
2017-08-22 21:00:49 -07:00
Ralph Castain
e02c39385a
Merge branch 'master' into topic/modex
2017-08-22 20:06:35 -07:00
George Bosilca
50f471e31e
Cleanup a set of warnings reported by Ralph.
...
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-08-22 23:00:18 -04:00
Gilles Gouaillardet
565b516dae
hwloc/base: fix opal_output() usage
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-08-23 10:24:47 +09:00
Ralph Castain
d80b0c7990
If the HWLOC shared memory system is unable to connect, then fall back to providing the topology via XML. Do not automatically provide the XML to every process, as that defeats the purpose of the shared memory system. Instead, use PMIx_Query_info_nb to get the info from the server when required.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 18:12:26 -07:00
Ralph Castain
8273cea9d6
Merge pull request #4132 from rhc54/topic/ext
...
Fix the external PMIx and HWLOC components
2017-08-22 15:18:55 -07:00
Ralph Castain
38e363c515
Fix the #if check for hwloc version
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 14:07:36 -07:00
Ralph Castain
e3213386ec
Fix the internal PMIx installation - matching changes have been upstreamed
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 13:49:07 -07:00
Ralph Castain
a1b15c5666
Roll in update to PMIx master. Transfer updates from pmix2x component to ext2x
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 13:06:47 -07:00
Jeff Squyres
b991135634
Merge pull request #4128 from jsquyres/pr/fix-info-delete-return-value
...
mpi/info_delete: fix return code
2017-08-22 14:33:29 -04:00
Jeff Squyres
ea5093fc14
mpi/info_delete: fix return code
...
Per MPI-3.1, raise an MPI exception with value
MPI_ERR_INFO_NOKEY if we try to MPI_INFO_DELETE a key that does not
exist. Thanks to @dalcinl (Lisandro Dalcin) for raising the issue.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-22 08:56:40 -07:00
Ralph Castain
f5fb43e9c7
Merge pull request #4120 from bgoglin/master
...
fixes and debug messages to the hwloc/shmem use
2017-08-22 07:59:45 -07:00
Brice Goglin
046d870124
rtc/hwloc/shmem: add Inria copyrights
...
The code for finding the hole for the shmem region actually came from me.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 23:09:57 +02:00
Brice Goglin
2d242ab9f0
hwloc/shmem: don't abort on failure to load from shmem
...
Adopting can fail if the server-side hole isn't available on the client.
We can fall back to other ways to load the topology.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 19:57:38 +02:00
Brice Goglin
ffd209fc2e
hwloc/shmem: dump /proc/self/maps if failed to find a hole and verbosity > 4
...
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 19:57:38 +02:00
Brice Goglin
baf762d99d
rtc/hwloc/shmem: dump /proc/self/maps if failed to find a hole and verbosity > 4
...
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 19:57:38 +02:00
Brice Goglin
8f6afbb641
rtc/hwloc/shmem: fix "heap" hole search kind
...
There can be multiple consecutive [heap] entries in /proc/<pid>/maps,
and there's no room between them.
Don't use a hole after the first [heap] if there's another [heap]
immediately after it.
This code would fail to find the last [heap] if there were multiple
[heap] entries interleaved with non-heap VMAs, but our kind "after heap"
wouldn't be meaningful anymore anyway.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 15:42:38 +02:00
Brice Goglin
b8b46b253b
rtc/hwloc/shmem: fix "libs" hole search kind
...
We want the biggest hole *between* heap and stack, not outside.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 15:40:36 +02:00
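The two rtc/hwloc/shmem hole-search fixes above can be illustrated with a minimal sketch. It is not the code from those commits: the `struct vma` type and both function names are hypothetical, and it assumes the entries are already sorted by address, as /proc/<pid>/maps lists them. It skips over a run of consecutive [heap] entries and then picks the biggest unmapped gap that lies between the heap and the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of /proc/<pid>/maps: [start, end) plus name. */
struct vma {
    unsigned long start;
    unsigned long end;
    char name[64];
};

/* Parse "start-end perms offset dev inode [name]". Returns 0 on success, -1 if
 * the address range cannot be read. The name stays empty for anonymous maps. */
static int parse_maps_line(const char *line, struct vma *out)
{
    int n;

    assert(line != NULL && out != NULL);
    out->name[0] = '\0';
    n = sscanf(line, "%lx-%lx %*s %*s %*s %*s %63s",
               &out->start, &out->end, out->name);
    return (n >= 2) ? 0 : -1;
}

/* Find the largest gap after the last [heap] of the first consecutive run and
 * before [stack]. Returns 0 and fills the outputs, or -1 if nothing is found. */
static int biggest_hole_between_heap_and_stack(const struct vma *v, size_t n,
                                               unsigned long *hole_start,
                                               unsigned long *hole_size)
{
    size_t i = 0, heap_last, stack;

    /* Locate the first [heap] entry. */
    while (i < n && strcmp(v[i].name, "[heap]") != 0) {
        i++;
    }
    if (i == n) {
        return -1;
    }
    /* Consecutive [heap] entries leave no room between them: skip to the last. */
    while (i + 1 < n && strcmp(v[i + 1].name, "[heap]") == 0) {
        i++;
    }
    heap_last = i;

    /* Locate [stack] somewhere after the heap. */
    stack = heap_last + 1;
    while (stack < n && strcmp(v[stack].name, "[stack]") != 0) {
        stack++;
    }
    if (stack == n) {
        return -1;
    }

    /* Only gaps between heap and stack are candidates, never those outside. */
    *hole_size = 0;
    for (i = heap_last; i < stack; i++) {
        unsigned long gap = v[i + 1].start - v[i].end;
        if (gap > *hole_size) {
            *hole_size = gap;
            *hole_start = v[i].end;
        }
    }
    return (*hole_size > 0) ? 0 : -1;
}
```

A caller would fill an array with `parse_maps_line()` for each line read from the maps file and then pass it to `biggest_hole_between_heap_and_stack()`.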
Gilles Gouaillardet
a3e31fa8d0
ompi/communicator: plug a memory leak in ompi_comm_init()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-08-21 11:47:11 +09:00
Ralph Castain
9d3f4516e6
Merge pull request #4116 from rhc54/topic/notify
...
Don't restrict broadcast notifications
2017-08-18 18:13:47 -07:00
Ralph Castain
d515f48885
The local PMIx server is notifying its clients of all events, but for some reason I don't recall, the broadcast notification was marked for delivery only to non-default event handlers. This creates a discrepancy between the two behaviors, so don't restrict the broadcast notifications.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-18 17:26:11 -07:00
Brian Barrett
c667719a3f
Merge pull request #3955 from mohanasudhan/master
...
Btl tcp: Improved diagnostic output and failure mode
2017-08-18 11:42:27 -07:00
Mohan
fc32ae401e
Btl Tcp: Updated tcp handshake methods
...
This commit makes two changes:
1. Adding the magic string to the handshake can cause
issues when used with an older version of MPI. Hence set
the RCVTIMEO parameter to 2 seconds.
2. Use a single call during the handshake instead of
two calls.
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-18 10:06:52 -07:00
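The handshake behaviour described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the btl/tcp code: the `HS_MAGIC` value, the `struct handshake` layout, and both function names are hypothetical, and both ends are assumed to run on the same host so byte order is ignored. The receiver sets a 2-second SO_RCVTIMEO so that a peer that never sends the magic string cannot block it forever, and the magic string and peer identifier travel in a single send/recv call.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical magic string; the real value used by btl/tcp is not shown here. */
#define HS_MAGIC "EXAMPLE-MAGIC"

/* Hypothetical wire format: magic string and peer id sent as one message. */
struct handshake {
    char magic[16];
    unsigned int peer_id;
};

/* Send magic string and id with a single send() call. Returns 0 or -1. */
static int send_handshake(int fd, unsigned int peer_id)
{
    struct handshake hs;

    memset(&hs, 0, sizeof(hs));
    strncpy(hs.magic, HS_MAGIC, sizeof(hs.magic) - 1);
    hs.peer_id = peer_id;
    return (send(fd, &hs, sizeof(hs), 0) == (ssize_t)sizeof(hs)) ? 0 : -1;
}

/* Receive and validate the handshake. Returns 0 on success, -1 on timeout,
 * short read, or magic string mismatch. */
static int recv_handshake(int fd, unsigned int *peer_id)
{
    struct handshake hs;
    struct timeval tv;
    ssize_t got;

    assert(peer_id != NULL);

    /* Give up after 2 seconds if the peer does not send a full handshake. */
    tv.tv_sec = 2;
    tv.tv_usec = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0) {
        return -1;
    }

    /* One call retrieves the whole message. */
    got = recv(fd, &hs, sizeof(hs), MSG_WAITALL);
    if (got != (ssize_t)sizeof(hs)) {
        return -1;
    }

    /* A missing or different magic string means some other kind of process. */
    if (memcmp(hs.magic, HS_MAGIC, sizeof(HS_MAGIC)) != 0) {
        return -1;
    }
    *peer_id = hs.peer_id;
    return 0;
}
```

The timeout matters mostly for the mixed-version case: a peer that predates the magic string sends less data than the receiver expects, so without a timeout the receive would wait indefinitely.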
Mohan
e3dfe11da9
Btl tcp: Improving verbose around tcp
...
As part of improving the tcp btl, we
are improving verbose output in general
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 17:22:16 -07:00
Mohan
4bc7b214dc
Btl tcp: Improving verbose around IPV6
...
As part of improving tcp btl debugging
and verbose output, we are improving verbose output around IPv6
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:14 -07:00
Mohan
0741fad479
Btl tcp: BTL_ERROR to show_help & update func behaviour
...
As part of improving tcp debugging,
we are moving a few BTL_ERROR calls to show_help and also
updating the behaviour of
mca_btl_tcp_endpoint_complete_connect to return
SUCCESS and ERROR cases.
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:14 -07:00
Mohan
368f9f0dfc
Btl tcp: Using magic string to verify mpi connection
...
As part of improving failure handling
in btl tcp, we use a magic string to verify the MPI
connection. If the magic string is mismatched or
missing, we can tell that we are trying to
connect to some other process.
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:13 -07:00
Mohan
c30a42917c
Btl tcp: Refactoring non-blocking send/receive function
...
Moving the non-blocking send/receive functions to btl_tcp
makes them reusable wherever needed.
In this case we plan to reuse the receive function to
retrieve the magic string and validate that the established
connection is from an MPI process.
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:13 -07:00
Ralph Castain
b67b1e88a5
Merge pull request #4111 from rhc54/topic/multiconnect
...
Cleanup some issues in connect/accept support across jobs started by …
2017-08-17 12:49:01 -07:00
Ralph Castain
d85239e052
Cleanup some issues in connect/accept support across jobs started by different mpirun commands. Still not fully operational, but someone else will have to finish debugging it
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-17 11:58:48 -07:00
Ralph Castain
a855ebd86b
Merge pull request #4110 from rhc54/topic/cov
...
Silence coverity warnings
2017-08-17 10:57:31 -07:00
Ralph Castain
088b6cdeee
Silence coverity warnings
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-17 09:49:35 -07:00
Jeff Squyres
4e763796b1
Merge pull request #4100 from jsquyres/pr/fix-nmcheck-prefix
...
nmcheck_prefix: more updates for more compilers
2017-08-16 20:39:34 -04:00
Ralph Castain
1f799afa30
Merge pull request #4106 from rhc54/topic/hwloc
...
Add diagnostics for hwloc get_topology
2017-08-16 15:47:47 -07:00
Ralph Castain
41df973359
Add diagnostics for hwloc get_topology
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-16 14:21:27 -07:00
Jeff Squyres
cd8db5313e
Merge pull request #4101 from jsquyres/pr/usnic-restore-configure-summary-line
...
btl/usnic: restore configure usNIC summary line
2017-08-16 16:36:19 -04:00
Josh Hursey
e0931714ea
Merge pull request #4090 from jjhursey/config/old-xl-ppc-support
...
config: Remove support for big endian PPC, XL compiler older than 13.1
2017-08-16 15:31:35 -05:00
Ralph Castain
f21dfd3189
Merge pull request #4097 from rhc54/topic/dlopepn
...
Change test per recommendation of @jsquyres
2017-08-16 13:22:18 -07:00
Jeff Squyres
a591159fb4
btl/usnic: restore configure usNIC summary line
...
Not sure how/when this got deleted, but put back the "Cisco usNIC"
line in the transport summary at the end of configure.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-16 12:37:59 -07:00
Jeff Squyres
9d09fe0151
nmcheck_prefix: more updates for more compilers
...
Ignore a few more symbols to pass Absoft and modern gcc.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-16 12:28:49 -07:00
Ralph Castain
c4d5dbfcdc
Change test per recommendation of @jsquyres
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-16 11:19:15 -07:00
Jeff Squyres
1f0b6f783c
Merge pull request #4095 from jsquyres/pr/fix-compiler-warning
...
rcash_base_frame: fix compiler warning
2017-08-16 14:02:21 -04:00
Jeff Squyres
ce3a032b5e
rcash_base_frame: fix compiler warning
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-16 09:48:31 -07:00
Ralph Castain
4cacd222d6
Merge pull request #4094 from rhc54/topic/pmix210rc1
...
Update to PMIx v2.1.0a1
2017-08-15 21:20:39 -07:00
Ralph Castain
eb69df02ae
Update to PMIx v2.1.0rc1
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-15 19:59:15 -07:00
Ralph Castain
23ffbeb8f8
Merge pull request #4093 from rhc54/topic/toolsupport
...
Update tool support by adding MCA params to direct orted's to drop
2017-08-15 19:41:45 -07:00
Ralph Castain
65fb6070d9
Update tool support by adding MCA params to direct orted's to drop
...
session and/or system-level tool rendezvous files. Ensure PMIx is
enabled for tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-15 17:49:47 -07:00