
129 Commits

Author SHA1 Message Date
Nathan Hjelm
8c4da756cf pmix: do not touch recently freed memory
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 08:44:50 -06:00
Ralph Castain
4c654ffd94 Sync to PMIx master 2015-09-21 21:27:06 -07:00
Ralph Castain
1b7930ad52 Silence some warnings and address Coverity issues 2015-09-16 07:58:22 -07:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
Ralph Castain
22d7c0081a Fix the no-disconnect test by resolving a segfault on free - opal_dss.unload will return the remaining unpacked portion of a buffer. As such, it cannot return the pointer to that info as it might be partway inside of a malloc'd region. So copy the data out of the buffer. 2015-09-11 13:01:35 -07:00
Ralph Castain
dc5796b8a1 Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local""
Fix the locality computation by correctly computing the vpid of the local peer

This reverts commit open-mpi/ompi@6a8fad49e5.
2015-09-11 08:29:51 -07:00
Ralph Castain
6a8fad49e5 Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"
This reverts commit f94f3cda214ab937c46802896fb53b84bec6cc3a.
2015-09-11 02:01:25 -07:00
Ralph Castain
e0a52354d4 Sync to PMIx master at open-mpi/pmix@89680d6663
Includes changes to support BigEndian machines
2015-09-10 20:47:40 -07:00
Ralph Castain
a2a15cea8a Fix the s1 component so direct launch is supported for SLURM 2015-09-10 16:07:37 -07:00
rhc54
3430f154fc Merge pull request #885 from hppritcha/topic/pmix_not_pmix1xx_u16_prob
pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
2015-09-10 15:38:54 -07:00
Howard Pritchard
2bbf22e2d0 pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
Looks like in ess_pmi_module.c u32 is being used
for retrieving OPAL_PMIX_LOCAL_SIZE, while s1/s2/cray
pmix components were storing as u16.

This commit fixes this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-10 11:41:39 -07:00
Ralph Castain
f94f3cda21 Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local 2015-09-10 10:25:30 -07:00
Ralph Castain
4c47c498ac Sync to latest PMIx master
Allow the blocking send and recv to keep trying
2015-09-09 11:48:47 -07:00
Gilles Gouaillardet
7f0ed74d24 pmix1xx: fix CPPFLAGS when DSO are not built 2015-09-09 14:20:12 +09:00
rhc54
f6b6b9a9ca Merge pull request #877 from rhc54/topic/s1s2
Cleanup s1 and s2 components
2015-09-08 19:20:59 -07:00
Ralph Castain
1cdb86b8c7 Cleanup s1 and s2 components, and ensure that mpirun and orteds only use non-direct-launch pmix components. 2015-09-08 18:37:09 -07:00
rhc54
3a446c9797 Merge pull request #876 from rhc54/topic/hnp
Fix segfault upon job error
2015-09-08 15:10:51 -07:00
rhc54
47f437608d Merge pull request #875 from rhc54/topic/dynamics
Stop a segfault in the test by correctly passing all the argv during spawn
2015-09-08 14:35:42 -07:00
Ralph Castain
459f169e06 Fix segfault upon job error
Silence some unnecessary error-logs
2015-09-08 14:03:06 -07:00
Ralph Castain
ae7156cabb Stop a segfault in the test by correctly passing all the argv during spawn 2015-09-08 13:42:46 -07:00
rhc54
8053357fcc Merge pull request #873 from rhc54/topic/static
Add the libs required for PMIx to support static builds (and trim all excess whitespace)
2015-09-08 11:28:47 -07:00
Ralph Castain
291afe502f Add the libs required for PMIx to support static builds
Remove unneeded CPPFLAGS
2015-09-08 10:21:06 -07:00
Jeff Squyres
bc9e5652ff whitespace: purge whitespace at end of lines
Generated by running "./contrib/whitespace-purge.sh".
2015-09-08 09:47:17 -07:00
Ralph Castain
e6add86e4f Deal with connect/accept between two jobs from different mpirun's. Somewhat optimize connect/accept by using MPI bcast to distribute the participants instead of another PMIx lookup. Cleanup some Coverity issues. 2015-09-07 09:19:24 -07:00
Ralph Castain
37c3ed68e7 Cleanup connect/disconnect and bring comm_spawn back online! 2015-09-06 10:27:39 -07:00
rhc54
665b30376a Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
rhc54
d45ccda813 Merge pull request #866 from rhc54/topic/updatepmix
Update PMIx support
2015-09-04 11:09:36 -07:00
Ralph Castain
f6948c2bb4 Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working 2015-09-04 10:07:17 -07:00
Howard Pritchard
0557beee22 Merge pull request #864 from hppritcha/topic/pmix_cray_more_funcs
pmix/cray: more stubs plus a get_version method
2015-09-03 14:52:46 -06:00
Howard Pritchard
6e7345c790 pmix/cray: more stubs plus a get_version method
Add more stubs to reduce likelihood of future
mysterious segfaults if some of the newer pmix
funcs start to get used within ompi.

Add a get_version to return the version of the
Cray PMI library being used, since the Cray PMI
library actually has a function to get that info.

Be more accurate about which functions have a hope
of being implemented using Cray PMI and those which
never will.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-03 12:51:50 -07:00
Ralph Castain
a772b46c15 Bring the MPI_Publish and friends online 2015-09-02 12:04:07 -07:00
Ralph Castain
95dbd70f44 Sync to PMIx 1.1, sha- 51479b0 2015-09-01 14:09:25 -07:00
rhc54
d8cb3fe705 Merge pull request #852 from rhc54/topic/pmix
Sync to PMIx tarball - includes:
2015-09-01 06:54:34 -07:00
Gilles Gouaillardet
6dfa996760 configury: fix a typo in opal/mca/pmix/pmix1xx/configure.m4 2015-09-01 14:59:07 +09:00
Ralph Castain
c1bbd7bc78 Sync to PMIx tarball - includes:
* update to configury to silence ident messages (thanks Gilles!)
* fix for warnings Jeff saw when get didn't find the requested data
* fix for Mac OSX operations
2015-08-31 21:51:02 -07:00
Ralph Castain
ef69958e01 Only copy the value across if the "get" operation succeeded 2015-08-31 17:11:26 -07:00
Ralph Castain
a3842af709 Sync to PMIx tarball 2015-08-31 07:47:46 -07:00
Ralph Castain
bcabd1e282 Sync with PMIx tarball, bringing across the warning fixes pointed out by Gilles 2015-08-30 21:13:55 -07:00
Gilles Gouaillardet
7e6a213465 pmix: fix compilation error
compilation failed because of missing prototypes when configure'd with --enable-debug --enable-picky on a CentOS 7 box
2015-08-31 10:33:13 +09:00
rhc54
51a8a0f5d7 Merge pull request #842 from rhc54/topic/smfix
Fix shared memory operations by resolving local peers
2015-08-30 14:49:43 -07:00
Ralph Castain
b0d7564400 Sync to PMIx 1.1 - do not check pmix version when making connections 2015-08-30 12:15:30 -07:00
Ralph Castain
38ba54366c Fix shared memory operations by resolving local peers 2015-08-30 12:07:14 -07:00
Ralph Castain
0d5814b5ca Cleanup Coverity issues 2015-08-29 21:19:27 -07:00
Ralph Castain
3cab860a01 Some cleanups - still some errors that impact shared memory operations 2015-08-29 18:11:11 -07:00
Ralph Castain
1d71037139 Update some APIs 2015-08-29 17:26:32 -07:00
Ralph Castain
79827ceaa8 Remove stale directory 2015-08-29 17:15:17 -07:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Jeff Squyres
d7c25f683e pmix_native: update to the new opal_progress_thread API 2015-08-07 10:13:40 -07:00
Ralph Castain
219c4dfba5 Create a new opal_async_event_base and have the pmix/native and ORTE level use it. This reduces our thread count by one. 2015-07-12 08:23:34 -07:00