Commit graph

23 commits

Author SHA1 Message Date
308bbcbad1 ompi/dpm: retrieves OPAL_PMIX_ARCH in heterogeneous mode
also remove code duplication by using ompi_proc_complete_init_single()

Thanks Siegmar Gross for reporting this issue, and Ralph for the guidance.
2016-02-22 11:01:06 +09:00
030a5f2054 sentinel: use type uintptr_t for sentinel
MSB is now automatically cleared when right shifting
Thanks George for pointing this out
2016-02-10 11:28:56 +09:00
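
The reason an unsigned type matters for the sentinel can be illustrated with a minimal sketch. The helper names below are hypothetical and are not the Open MPI functions; the sketch only assumes that a sentinel is an identifier shifted left by one bit with the low bit set as a tag. Right-shifting a negative signed value is implementation-defined in C and typically preserves the sign bit, whereas right-shifting a `uintptr_t` always shifts in a zero.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only, not the Open MPI API.
 * An id is packed into a pointer-sized word; bit 0 tags it as a sentinel. */
static inline uintptr_t demo_id_to_sentinel(uintptr_t id)
{
    /* the top bit of id is lost by the shift, so ids must fit in width-1 bits */
    return (id << 1) | (uintptr_t) 0x1;
}

static inline int demo_is_sentinel(uintptr_t value)
{
    /* real pointers to aligned objects have bit 0 clear */
    return (int) (value & (uintptr_t) 0x1);
}

static inline uintptr_t demo_sentinel_to_id(uintptr_t sentinel)
{
    /* unsigned right shift is a logical shift: the MSB becomes 0,
     * so no extra masking is needed to recover the id */
    return sentinel >> 1;
}
```

With a signed `intptr_t`, an id whose second-highest bit is set would produce a negative sentinel, and decoding it with `>>` could sign-extend instead of restoring the original value.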
c0f8f2ce32 ompi/dpm: correctly handle sentinels in construct peers
This fix is similar to open-mpi/ompi@4c1ea4a171
and open-mpi/ompi@213b2abde4
2016-01-18 09:57:38 +09:00
4c1ea4a171 dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept()
this commit includes missing bits from open-mpi/ompi@213b2abde4
2016-01-07 09:11:03 +09:00
213b2abde4 dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept() 2016-01-06 16:21:13 +09:00
5334d22a37 ompi/group: release ompi_proc_t's at group destruction
This commit changes the way ompi_proc_t's are retained/released by
ompi_group_t's. Before this change ompi_proc_t's were retained once
for the group and then once for each retain of a group. This method
adds unnecessary overhead (need to traverse the group list each time
the group is retained) and causes problems when using an async
add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 23:03:47 -07:00
4cb42f8264 ompi: fix coverity issues
Fixes CID 715741: Logically dead code

Verified. Removed dead code.

Fixes CID 1320878: Resource leak

Free proc_list before returning.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-10-09 08:41:27 -06:00
a4a3dfd480 Cleanup the code a bit by simply adding our nspace to the top of the list of jobid <-> nspace correlations. Add two new APIs to opal_pmix for registering new jobid/nspace pairs and retrieving an nspace given a jobid - these are required to support connect/accept. No impact on the PMIx library. 2015-09-28 08:50:13 -07:00
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
9c45c63143 ompi/dpm: fix typo in dynamic communicator detection
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-15 12:42:58 -06:00
ed005f2a61 ompi/dpm: improve scalability of ompi_dpm_mark_dyncomm
This commit removes the use of ompi_group_peer_lookup in the
ompi_dpm_mark_dyncomm function. The function now uses
ompi_group_get_proc_name which does not allocate an ompi_proc_t if one
does not already exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 10:50:58 -06:00
b79cffc73b Protect ourselves - if the active pmix component doesn't have some optional functions, then gracefully decline to perform the operation OR use a required alternative (e.g., fence in place of disconnect)
This fixes the Slurm pmi2 support - still something wrong in pmi1
2015-09-09 02:29:00 -07:00
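
The "gracefully decline or use a required alternative" behaviour described in the commit above can be sketched as a NULL check on an optional function pointer. The struct, field names, and return codes below are hypothetical and do not reproduce the opal_pmix module layout; the sketch only assumes that a component exposes a table in which some entries may be absent.

```c
#include <stddef.h>

/* Hypothetical return codes, for illustration only. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_SUPPORTED = -1 };

/* Hypothetical module table: fence is treated as required,
 * disconnect as optional (may be NULL). */
typedef struct {
    int (*fence)(void);
    int (*disconnect)(void);
} demo_module_t;

/* Stub fence used only to exercise the fallback path. */
static int demo_fence_calls = 0;
static int demo_stub_fence(void)
{
    demo_fence_calls++;
    return DEMO_SUCCESS;
}

static int demo_disconnect(const demo_module_t *mod)
{
    if (NULL != mod->disconnect) {
        return mod->disconnect();      /* component supports it directly */
    }
    if (NULL != mod->fence) {
        return mod->fence();           /* fall back to the required alternative */
    }
    return DEMO_ERR_NOT_SUPPORTED;     /* decline instead of calling NULL */
}
```

A caller can then treat `DEMO_ERR_NOT_SUPPORTED` as a non-fatal result instead of crashing on a missing entry.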
3a446c9797 Merge pull request #876 from rhc54/topic/hnp
Fix segfault upon job error
2015-09-08 15:10:51 -07:00
459f169e06 Fix segfault upon job error
Silence some unnecessary error-logs
2015-09-08 14:03:06 -07:00
ae7156cabb Stop a segfault in the test by correctly passing all the argv during spawn 2015-09-08 13:42:46 -07:00
e6add86e4f Deal with connect/accept between two jobs from different mpirun's. Somewhat optimize connect/accept by using MPI bcast to distribute the participants instead of another PMIx lookup. Cleanup some Coverity issues. 2015-09-07 09:19:24 -07:00
37c3ed68e7 Cleanup connect/disconnect and bring comm_spawn back online! 2015-09-06 10:27:39 -07:00
665b30376a Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
f6948c2bb4 Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working 2015-09-04 10:07:17 -07:00
a772b46c15 Bring the MPI_Publish and friends online 2015-09-02 12:04:07 -07:00
0d5814b5ca Cleanup Coverity issues 2015-08-29 21:19:27 -07:00
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00