Commit graph

19 commits

Author SHA1 Message Date
Ralph Castain
5cfa2a7fca Complete integration of job_control
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-08-20 16:10:50 -07:00
Ralph Castain
5ac2ce6346 Cover all the PMIx data types
Cover all data types for OPAL-to-PMIx conversion, generating error logs when we hit something we don't support

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-20 09:06:19 -07:00
Ralph Castain
48f27655a6 Sync to PMIx v3.0rc and add ext4x
Sync to the draft rc for PMIx v3.0. Add an external component for PMIx master, which is at v4.0

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-11 05:54:23 -07:00
Ralph Castain
55ac526a67 Enable the PMIx ompi/rte component
Get the OMPI rte/pmix component working. This was tested using PRRTE as the RM, configuring OMPI using:

* autogen --no-orte

* with external libevent, external hwloc, and external PMIx master

* configuring PMIx master with the same libevent and hwloc

* execute the application using PRRTE's "prun" launcher, which has the same cmd line as ORTE's mpirun

Note that PMIx master appears to have a bug in the event notification system that caches job termination events. Thus, the first execution runs fine, but subsequent executions cause an "abort" when the OMPI default error handler is invoked upon notification of the prior job's termination. Will work on that separately.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 134cca9ac0de092d767999357573a31703f72292)
2018-06-03 07:25:12 -07:00
Ralph Castain
e443adc7a1 Reset OMPI master to PMIx master
Track PMIx master instead of the reference server; this fixes the problem with external PMIx master builds.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-25 08:36:46 -07:00
Ralph Castain
17c40f4cea Implement support for proctable queries
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-02 02:00:31 -08:00
Ralph Castain
0434b615b5 Update ORTE to support PMIx v3
This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on":

* initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch.

* IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet.

* Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later.

* Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allow envars to be excluded.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-02 02:00:31 -08:00
Ralph Castain
1a7dfd7d54 Sync to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-07 12:16:51 -08:00
Ralph Castain
a5679ef000 Update the PMIx 3.x component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:34:44 -08:00
Ralph Castain
6216225bda Ensure cleanup of registered files/dirs
Resolve a race condition between registering for a file to be removed upon termination and actual creation of that file by providing attributes that identify whether the path is a file or directory. This removes the need for PMIx to detect the difference.

Refs #4686

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-11 11:05:30 -08:00
Ralph Castain
07427c6d89 Update to PMIx v3.0 PR for cleanup registration
If available, have apps use registration capability to clean up their session directories. Set up capability for vader to register its shared memory file location - let someone familiar with that code do so.

Final cleanup to track uid/gid, update the opal/pmix API to pass flags for ignore and leave top directory alone

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-18 06:53:11 -08:00
Ralph Castain
6ea3c8a0bd Update the interlib example to show an alternative method for model declaration. Add a missing range value to the OPAL layer. Make it easier to see OMPI model callbacks
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-23 11:27:42 -07:00
Ralph Castain
c696e04c5e Since PMIx is moving to release v3.0, embed the new release candidate in opal/pmix framework. Move the pmix2x code over to the ext2x component. Create a new ext3x component
Remove some build product. Tell PMIx that we don't need a new nspace generated when OMPI calls connect
Add missing Makefile

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-09 13:51:08 -07:00
Ralph Castain
1a0bccb536 Now that PMIx has settled on its release strategy and numbering, update the OPAL pmix framework to track
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-02 15:44:43 -08:00
Ralph Castain
649301a3a2 Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier).
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
2016-10-23 21:52:39 -07:00
Ralph Castain
6f65d0a173 Repair event notification support. Clean up the long-suffering "epoll: warning" coming out of libevent whenever a process abnormally terminated.
Add changes to test program

Sync to PMIx master
2016-10-13 16:27:39 -07:00
Ralph Castain
e773c17cf3 Put show_help thru the PMIx "log" API. This pushes the show_help output from apps into the pmix thread, thus avoiding conflicts in the RML thread, which should help with thread lock situations.
2016-10-02 16:02:23 -07:00
Ralph Castain
0ea1cff733 Implement notification of completion on comm_spawn'd child jobs. Add a configure flag to enable PMIx 3's shared memory datastore, and set it disabled by default so that comm_spawn functions again. Will reverse the default once that feature is fully functional.
2016-09-01 13:10:10 -07:00
Ralph Castain
af67f16422 Update configury to support multiple PMIx versions, rename pmix2x component to pmix3x for support of PMIx master
Update support for external v1.1.x and v2.x libraries. Minor corrections to the v3.x component
2016-08-25 18:19:05 -07:00