Ralph Castain
92c996487c
Update how we pass the node regex so we pass _all_ nodes, even those without daemons. This allows the backend daemons to form a complete picture of the allocation. Include info on which nodes have daemons on them, and populate that info on the backend as well.
...
Set the daemons' state to "running" and mark them as "alive" by default when constructing the nidmap
Get the DVM running again
Fix direct modex by eliminating race condition caused by releasing data while sending it
Up the size limit before compressing
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-03 19:25:15 -07:00
Ralph Castain
2cc5fea8be
Update to PMIx v2.0alpha
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-03 10:02:29 -07:00
Ralph Castain
7dd34d0c9a
Use the correct callback data - the callback function was expecting a bool*, not a pmix_ptl_sr_t*.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-28 17:21:47 -07:00
Ralph Castain
35f817911e
Fix coverity issues
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 08:09:46 -07:00
Ralph Castain
c0bcd11bcf
Fix permissions - no CI required
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-23 08:05:52 -07:00
Ralph Castain
55e4fba5f5
If we lose connection to the server after initiating a send/recv in PMIx (e.g., in PMIx_Abort), then we need to "resolve" all pending recvs to avoid hanging.
...
Fixes #3225
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-23 02:53:21 -07:00
Ralph Castain
d645557fa0
Update to include the PMIx 2.0 APIs for monitoring and job control. Include required integration, but leave the monitors off for now. Move the sensor framework out of ORTE as it is being absorbed into PMIx
...
Fix typo and silence warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 17:47:08 -07:00
Ralph Castain
4b6d220a83
You cannot include both pmi.h and pmi2.h as they have conflicting defines in them.
...
Thanks to Kilian Cavalotti for pointing it out
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-19 11:53:54 -07:00
Ralph Castain
c6bc3ccb76
Sync to latest PMIx master and PMIx reference server
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 12:50:38 -08:00
Ralph Castain
aca7091114
Fix some minor compatibility issues by ensuring job-level data gets stored against wildcard rank in the cray, s1, and s2 components, and that the ext1 component translates all wildcard rank requests into the peer's rank since v1.x of PMIx doesn't understand wildcard ranks
...
Closes #3101
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-05 10:30:59 -08:00
Ralph Castain
1de72ff023
Silence an unnecessary error log
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-02 17:18:34 -08:00
Ralph Castain
e86a0dbf39
Update to PMIx master to include dlopen fixes and addition of libltdl support
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-22 11:54:33 -08:00
Ralph Castain
8cffdcf127
Ensure that the pmix headers and lib get installed when --with-devel-headers is given so that PMIx applications can be built and executed against the "embedded" PMIx version
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-21 13:46:46 -08:00
Gilles Gouaillardet
bb2481a84b
pmix2x: synchronize to the latest PMIx master
...
pmix/master@f57d9b2953
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-20 10:45:17 +09:00
Ralph Castain
f49118eaab
Fix some pmix configuration code
...
Remove stale file reference that caused a check to always fail. Update psm2 function check to new libs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-16 10:54:47 -08:00
Howard Pritchard
b272f87926
Merge pull request #2968 from hjelmn/pmix_cray
...
pmix/cray: performance improvements and cleanup
2017-02-16 11:41:59 -07:00
Ralph Castain
201f8571ca
Ensure we retain the peer object until we are done with it, then detect that the socket has closed due to a lost connection and cleanly release the message event
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 18:30:55 -08:00
Ralph Castain
9cd7349d7c
Instead of completely free'ing the event base, pause the PMIx progress thread before tearing down the infrastructure, and then release the event base at the end of the procedure. This allows any infrastructure objects holding events to delete them prior to free'ing the event base.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 05:02:43 -08:00
Ralph Castain
f7fe2f7189
Merge pull request #2977 from rhc54/topic/spawn
...
Fix comm_spawn by registering nspace info only when needed
2017-02-15 04:31:54 -08:00
Ralph Castain
68b53e2179
Fix comm_spawn by registering nspace info only when needed - either when we have local procs, or when job-level info is required by connecting jobs
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-14 19:47:56 -08:00
Ralph Castain
0c8609ca16
Update to newest PMIx master (includes configuration cleanups). Silence trivial Coverity warning in hwloc base.
...
Clean up a race condition segfault during finalize by ensuring the PMIx progress thread is stopped before we start tearing down the messaging components
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-14 15:14:00 -08:00
Nathan Hjelm
3b912ea2a7
pmix/cray: performance improvements and cleanup
...
Do not use opal_output_verbose inside O(n) loops. This was causing us
to make O(n) calls to snprintf which was greatly slowing launch at
scale.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-14 11:13:10 -07:00
Ralph Castain
35578b4009
Update to latest PMIx master
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-13 23:19:26 -08:00
Gilles Gouaillardet
7acef4833e
pmix2x: Update to latest PMIx master
...
pmix/master@6ed27be839
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-08 13:23:27 +09:00
KAWASHIMA Takahiro
4b2eba34a6
Merge pull request #2933 from kawashima-fj/pr/dstore-config-desc
...
pmix/pmix2x: Correct configure option description
2017-02-08 13:03:27 +09:00
Jeff Squyres
100b112d3c
pmix: fix zlib protection macro usage
...
It's possible that we can have zlib.h but still not have zlib support.
Use the correct macro to protect the usage of calling zlib functions.
This fixes 32-bit MTT builds at Cisco (e.g.,
https://mtt.open-mpi.org/index.php?do_redir=2389 ).
Submitted upstream to PMIX: https://github.com/pmix/master/pull/290
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-07 05:52:32 -08:00
KAWASHIMA Takahiro
750406f67b
pmix/pmix2x: Correct configure option description
...
`--enable-pmix-dstore` option was enabled by default in f4a5511.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-02-07 11:52:56 +09:00
Ralph Castain
edcfdf2365
Update to latest PMIx master
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-31 08:01:37 -08:00
Gilles Gouaillardet
b078e57e73
pmix/ext1x: fix misc memory leaks in namespace registration
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
f51fc293a2
ext1x/pmix1x_client: plug misc memory leaks
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
022cca79ea
pmix/ext1x: plug a memory leak in opal_lkupcbfunc()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:36 +09:00
Gilles Gouaillardet
f485d12a82
pmix: rename the ext11 component into ext1x
...
Also use the same naming scheme as pmix/ext2x
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:35 +09:00
Gilles Gouaillardet
dccb1899e6
pmix/ext11: correctly use PMIx_server_register_nspace()
...
PMIx_server_register_nspace() is an asynchronous operation, so
the pmix glue waits for it to complete before returning.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:19 +09:00
Gilles Gouaillardet
6955e1e25c
pmix/ext11: fix compilation
...
the argc field from the opal_pmix_app_t struct was removed,
so adjust the pmix/ext11 glue accordingly.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:18 +09:00
Ralph Castain
3302864a7d
Clean up a typo that can cause a segfault - use a local variable name different from the one passed into the function
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 16:49:25 -08:00
Gilles Gouaillardet
896434b1bd
pmix/ext2x: plug a memory leak in opal_lkupcbfunc()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-26 14:07:15 +09:00
Gilles Gouaillardet
6b8e1c217c
pmix/ext2x: plug misc memory leaks
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-26 14:06:58 +09:00
Gilles Gouaillardet
142b95df87
pmix/ext2x: plug misc memory leaks regarding opal_pmix2x_event_chain_t handling
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-25 16:17:10 +09:00
Gilles Gouaillardet
7a3d39f079
pmix/ext2x: plug a memory leak in _reg_nspace()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-25 16:17:01 +09:00
Gilles Gouaillardet
189da7fdab
pmix2x: plug a memory leak in _event_hdlr()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:30 +09:00
Gilles Gouaillardet
acbc32d3b2
pmix2x: plug a memory leak in opal_lkupcbfunc()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:29 +09:00
Gilles Gouaillardet
b5b21043c4
pmix2x: plug a memory leak in _reg_nspace()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:29 +09:00
Gilles Gouaillardet
0f47310a75
pmix2x/pmix2x_client: plug misc memory leaks
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:29 +09:00
Ralph Castain
8c960bae8d
Update to latest PMIx master
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-23 07:07:40 -08:00
Ralph Castain
e568b211e4
Silence Coverity CID 1398541
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-10 15:30:50 -08:00
Gilles Gouaillardet
6d59b476de
Merge pull request #2686 from ggouaillardet/topic/pmix2x_ptl_base_sendrecv
...
pmix2x: ptl/base: send header and message data together via writev()
2017-01-10 16:26:10 +09:00
Gilles Gouaillardet
44c1ff60f1
Merge pull request #2672 from ggouaillardet/topic/misc_memory_leaks
...
Plug misc memory leaks
2017-01-10 13:16:04 +09:00
Gilles Gouaillardet
a01960bee5
pmix2x: ptl/base: send header and message data together via writev()
...
On Linux, sending the header and then the message data separately severely
degrades the performance of ptl/tcp: on the receiver, reading the data
often results in PMIX_ERR_RESOURCE_BUSY or PMIX_ERR_WOULD_BLOCK.
This commit sends both the header and the message data at the same time via
writev(), which makes ptl/tcp virtually as efficient as ptl/usock.
Short writev() calls generally occur when the kernel buffer is full, so there
is no point in retrying in that case.
FWIW, no such degradation was observed on OS X.
Refs open-mpi/ompi#2657
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-10 13:07:39 +09:00
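The technique described in a01960bee5 (header and message data sent together via writev()) can be sketched roughly as follows. This is an illustration only, not the PMIx code: `demo_hdr_t` and `demo_send_msg` are hypothetical names, and the real PMIx header layout and send path differ.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical fixed-size header, used only for this sketch. */
typedef struct {
    uint32_t tag;
    uint32_t nbytes;   /* payload length in bytes */
} demo_hdr_t;

/* Hand the header and the payload to the kernel in one writev() call
 * instead of two separate writes.
 * Returns the number of bytes written (possibly fewer than requested),
 * or -1 with errno set. */
static ssize_t demo_send_msg(int fd, const demo_hdr_t *hdr, const void *payload)
{
    struct iovec iov[2];
    ssize_t rc;

    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = sizeof(*hdr);
    iov[1].iov_base = (void *)payload;
    iov[1].iov_len  = hdr->nbytes;

    /* Retry only if interrupted by a signal; a short write is returned
     * to the caller rather than retried immediately. */
    do {
        rc = writev(fd, iov, 2);
    } while (rc < 0 && errno == EINTR);

    return rc;
}
```

In this sketch a short write is left to the caller, which would typically queue the remaining bytes until the descriptor is writable again, in line with the commit's note that retrying right away is pointless when the kernel buffer is full.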
Ralph Castain
67fce2861b
Merge pull request #2685 from rhc54/topic/cov
...
Resolve Coverity issues
2017-01-07 13:11:40 -08:00
Ralph Castain
e25e69dc2f
Resolve Coverity issues
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-07 10:45:52 -08:00