Nathan Hjelm
8c4da756cf
pmix: do not touch recently freed memory
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 08:44:50 -06:00
Ralph Castain
4c654ffd94
Sync to PMIx master
2015-09-21 21:27:06 -07:00
Ralph Castain
1b7930ad52
Silence some warnings and address Coverity issues
2015-09-16 07:58:22 -07:00
Ralph Castain
c1bbbb5e2f
Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds
2015-09-15 13:08:35 -07:00
Ralph Castain
22d7c0081a
Fix the no-disconnect test by resolving a segfault on free. opal_dss.unload returns the portion of a buffer that has not yet been unpacked, so it cannot hand back a pointer to that data: the pointer may lie partway inside a malloc'd region and cannot be freed on its own. Copy the data out of the buffer instead.
2015-09-11 13:01:35 -07:00
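The copy-out fix above can be illustrated with a minimal sketch. The struct and function names (`demo_buffer_t`, `demo_unload_copy`) are hypothetical and do not reflect the real `opal_buffer_t` layout or the `opal_dss.unload` signature; the sketch only shows why a caller-freeable result must be its own allocation rather than a pointer into the middle of the buffer's storage.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packed buffer; NOT the real opal_buffer_t. */
typedef struct {
    char  *base;        /* start of the malloc'd storage */
    size_t bytes_used;  /* number of bytes packed into base */
    size_t unpack_off;  /* offset of the next byte not yet unpacked */
} demo_buffer_t;

/* Hand back the not-yet-unpacked remainder as a fresh allocation.
 * Returning buf->base + buf->unpack_off instead would give the caller a
 * pointer into the middle of base; passing that to free() is undefined
 * behavior and typically crashes. A copy is safe to free() on its own. */
static int demo_unload_copy(const demo_buffer_t *buf, char **out, size_t *len)
{
    *out = NULL;
    *len = 0;
    if (buf->unpack_off >= buf->bytes_used) {
        return 0;                   /* nothing left to hand back */
    }
    size_t remaining = buf->bytes_used - buf->unpack_off;
    char *copy = malloc(remaining);
    if (NULL == copy) {
        return -1;                  /* allocation failure */
    }
    memcpy(copy, buf->base + buf->unpack_off, remaining);
    *out = copy;
    *len = remaining;
    return 0;
}
```

The cost is one extra allocation and copy per unload, in exchange for the caller being able to `free()` the result unconditionally.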
Ralph Castain
dc5796b8a1
Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local""
Fix the locality computation by correctly computing the vpid of the local peer
This reverts commit open-mpi/ompi@6a8fad49e5.
2015-09-11 08:29:51 -07:00
Ralph Castain
6a8fad49e5
Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"
This reverts commit f94f3cda214ab937c46802896fb53b84bec6cc3a.
2015-09-11 02:01:25 -07:00
Ralph Castain
e0a52354d4
Sync to PMIx master at open-mpi/pmix@89680d6663
Includes changes to support BigEndian machines
2015-09-10 20:47:40 -07:00
Ralph Castain
a2a15cea8a
Fix the s1 component so direct launch is supported for SLURM
2015-09-10 16:07:37 -07:00
rhc54
3430f154fc
Merge pull request #885 from hppritcha/topic/pmix_not_pmix1xx_u16_prob
pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
2015-09-10 15:38:54 -07:00
Howard Pritchard
2bbf22e2d0
pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
ess_pmi_module.c retrieves OPAL_PMIX_LOCAL_SIZE as a u32,
while the s1/s2/cray pmix components were storing it
as a u16. This commit makes those components store
it as a u32.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-10 11:41:39 -07:00
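The width mismatch described above can be sketched with a hypothetical tagged value. The names `demo_value_t`, `demo_store_local_size`, and `demo_get_u32` are illustrative only and are not the real `opal_value_t` API; the point is that producer and consumer must agree on the stored type, since reading a 32-bit field after only 16 bits were written picks up bytes that were never set.

```c
#include <stdint.h>

/* Hypothetical tagged value; NOT the real opal_value_t. */
typedef enum { DEMO_UINT16, DEMO_UINT32 } demo_type_t;

typedef struct {
    demo_type_t type;
    union {
        uint16_t u16;
        uint32_t u32;
    } data;
} demo_value_t;

/* Producer side (the s1/s2/cray role): store the local size with the
 * same width the consumer will read, i.e. 32 bits. */
static void demo_store_local_size(demo_value_t *v, uint32_t local_size)
{
    v->type = DEMO_UINT32;
    v->data.u32 = local_size;
}

/* Consumer side (the ess_pmi_module.c role): read a 32-bit value.
 * If the producer had written only data.u16, reading data.u32 would pick
 * up two bytes that were never set, so refuse a mismatched type. */
static int demo_get_u32(const demo_value_t *v, uint32_t *out)
{
    if (DEMO_UINT32 != v->type) {
        return -1;
    }
    *out = v->data.u32;
    return 0;
}
```

Checking the type tag on retrieval turns a silent misread into an explicit error, which makes this kind of producer/consumer disagreement easier to spot.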
Ralph Castain
f94f3cda21
Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local
2015-09-10 10:25:30 -07:00
Ralph Castain
4c47c498ac
Sync to latest PMIx master
...
Allow the blocking send and recv to keep trying
2015-09-09 11:48:47 -07:00
Gilles Gouaillardet
7f0ed74d24
pmix1xx: fix CPPFLAGS when DSOs are not built
2015-09-09 14:20:12 +09:00
rhc54
f6b6b9a9ca
Merge pull request #877 from rhc54/topic/s1s2
Cleanup s1 and s2 components
2015-09-08 19:20:59 -07:00
Ralph Castain
1cdb86b8c7
Cleanup s1 and s2 components, and ensure that mpirun and orteds only use non-direct-launch pmix components.
2015-09-08 18:37:09 -07:00
rhc54
3a446c9797
Merge pull request #876 from rhc54/topic/hnp
Fix segfault upon job error
2015-09-08 15:10:51 -07:00
rhc54
47f437608d
Merge pull request #875 from rhc54/topic/dynamics
Stop a segfault in the test by correctly passing all the argv during spawn
2015-09-08 14:35:42 -07:00
Ralph Castain
459f169e06
Fix segfault upon job error
Silence some unnecessary error-logs
2015-09-08 14:03:06 -07:00
Ralph Castain
ae7156cabb
Stop a segfault in the test by correctly passing all the argv during spawn
2015-09-08 13:42:46 -07:00
rhc54
8053357fcc
Merge pull request #873 from rhc54/topic/static
Add the libs required for PMIx to support static builds (and trim all excess whitespace)
2015-09-08 11:28:47 -07:00
Ralph Castain
291afe502f
Add the libs required for PMIx to support static builds
Remove unneeded CPPFLAGS
2015-09-08 10:21:06 -07:00
Jeff Squyres
bc9e5652ff
whitespace: purge whitespace at end of lines
Generated by running "./contrib/whitespace-purge.sh".
2015-09-08 09:47:17 -07:00
Ralph Castain
e6add86e4f
Deal with connect/accept between two jobs from different mpirun's. Somewhat optimize connect/accept by using MPI bcast to distribute the participants instead of another PMIx lookup. Cleanup some Coverity issues.
2015-09-07 09:19:24 -07:00
Ralph Castain
37c3ed68e7
Cleanup connect/disconnect and bring comm_spawn back online!
2015-09-06 10:27:39 -07:00
rhc54
665b30376a
Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
d97bc29102
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 16:54:40 -07:00
rhc54
d45ccda813
Merge pull request #866 from rhc54/topic/updatepmix
Update PMIx support
2015-09-04 11:09:36 -07:00
Ralph Castain
f6948c2bb4
Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working
2015-09-04 10:07:17 -07:00
Howard Pritchard
0557beee22
Merge pull request #864 from hppritcha/topic/pmix_cray_more_funcs
pmix/cray: more stubs plus a get_version method
2015-09-03 14:52:46 -06:00
Howard Pritchard
6e7345c790
pmix/cray: more stubs plus a get_version method
Add more stubs to reduce likelihood of future
mysterious segfaults if some of the newer pmix
funcs start to get used within ompi.
Add a get_version to return the version of the
Cray PMI library being used, since the Cray PMI
library actually has a function to get that info.
Be more accurate about which functions have a hope
of being implemented using Cray PMI and those which
never will.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-03 12:51:50 -07:00
Ralph Castain
a772b46c15
Bring the MPI_Publish and friends online
2015-09-02 12:04:07 -07:00
Ralph Castain
95dbd70f44
Sync to PMIx 1.1, sha 51479b0
2015-09-01 14:09:25 -07:00
rhc54
d8cb3fe705
Merge pull request #852 from rhc54/topic/pmix
Sync to PMIx tarball - includes:
2015-09-01 06:54:34 -07:00
Gilles Gouaillardet
6dfa996760
configury: fix a typo in opal/mca/pmix/pmix1xx/configure.m4
2015-09-01 14:59:07 +09:00
Ralph Castain
c1bbd7bc78
Sync to PMIx tarball - includes:
* update to configury to silence ident messages (thanks Gilles!)
* fix for warnings Jeff saw when get didn't find the requested data
* fix for Mac OSX operations
2015-08-31 21:51:02 -07:00
Ralph Castain
ef69958e01
Only copy the value across if the "get" operation succeeded
2015-08-31 17:11:26 -07:00
Ralph Castain
a3842af709
Sync to PMIx tarball
2015-08-31 07:47:46 -07:00
Ralph Castain
bcabd1e282
Sync with PMIx tarball, bringing across the warning fixes pointed out by Gilles
2015-08-30 21:13:55 -07:00
Gilles Gouaillardet
7e6a213465
pmix: fix compilation error
compilation failed because of missing prototypes when configure'd with --enable-debug --enable-picky on a CentOS 7 box
2015-08-31 10:33:13 +09:00
rhc54
51a8a0f5d7
Merge pull request #842 from rhc54/topic/smfix
Fix shared memory operations by resolving local peers
2015-08-30 14:49:43 -07:00
Ralph Castain
b0d7564400
Sync to PMIx 1.1 - do not check pmix version when making connections
2015-08-30 12:15:30 -07:00
Ralph Castain
38ba54366c
Fix shared memory operations by resolving local peers
2015-08-30 12:07:14 -07:00
Ralph Castain
0d5814b5ca
Cleanup Coverity issues
2015-08-29 21:19:27 -07:00
Ralph Castain
3cab860a01
Some cleanups - still some errors that impact shared memory operations
2015-08-29 18:11:11 -07:00
Ralph Castain
1d71037139
Update some APIs
2015-08-29 17:26:32 -07:00
Ralph Castain
79827ceaa8
Remove stale directory
2015-08-29 17:15:17 -07:00
Ralph Castain
cf6137b530
Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online
Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.
Bring the OMPI pubsub/pmi component online
Get comm_spawn working again
Ensure we always provide a cpuset, even if it is NULL
pmix/cray: adjust cray pmix component for pmix
Make changes so cray pmix can work within the integrated
ompi/pmix framework.
Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet
Cleanup comm_spawn - procs now starting, error in connect_accept
Complete integration
2015-08-29 16:04:10 -07:00
Jeff Squyres
d7c25f683e
pmix_native: update to the new opal_progress_thread API
2015-08-07 10:13:40 -07:00
Ralph Castain
219c4dfba5
Create a new opal_async_event_base and have the pmix/native and ORTE level use it. This reduces our thread count by one.
2015-07-12 08:23:34 -07:00