1
1

10 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
ac522a521f Ensure that prun doesn't prematurely exit
Ensure that prun doesn't exit until notified that its own child job
terminated.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-11 19:03:32 -08:00
Ralph Castain
3b71be4db4 Update the scaling script to avoid use of "system" command, thus ensuring that each command sees the same environment. Fix prun to pickup and propagate OMPI MCA params
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-23 16:27:41 -07:00
Ralph Castain
60b338e857 Sync to PMIx v3. Ensure prun uses the ess/tool component.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-14 08:24:57 -07:00
Ralph Castain
388034c814 Add support for the -v (verbose) option to prun and silence the "executing" and "completed" output otherwise.
Debounce "unreachable" notifications for tools when they disconnect
Enable the -x cmd line option for prun

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 0a5b36180a22959654461ac1303cec35313f8b4a)
2017-10-10 12:54:49 -07:00
Ralph Castain
5352c31914 Enable remote tool connections for the DVM. Fix notifications so we "de-bounce" termination calls
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-06 10:47:05 -07:00
Ralph Castain
fe9b584c05 Fully support OMPI spawn options. Fix a bug in the round-robin mappers where we weren't adding nodes to the job map node array, and so resources were not released
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 285d8cfef74ffc899e9c51e1d9c597b7fb2ceb89)
2017-09-21 10:29:27 -07:00
Ralph Castain
e575c4d6f9 Fix tool connection logic so we properly search for default session server, perform specified number of retries, etc.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 7c755e01004f8b86c71f1729662979ea45ab1adb)
2017-09-19 13:35:46 -07:00
Ralph Castain
3c914a7a97 Complete the fix of the ORTE DVM. We will now use "prun" instead of "orterun -hnp foo" to execute jobs. This provides the feature of automatic discovery of the orte-dvm so you don't need to manually enter URI's or contact file locations. All IO is forwarded to prun.
Still in the "needs to be done" category:

* mapping/ranking/binding options aren't correctly supported

* if the DVM encounters some errors (e.g., not enough resources for the job), the resulting error is globally set and impacts any subsequent job submission

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-16 13:13:07 -07:00
Ralph Castain
7c7d8a69a0 Backport changes from PMIx reference server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-14 11:48:56 -07:00
Ralph Castain
bbd83fd4c0 Add a new launcher "prun" for starting applications against the ORTE DVM.
Unlike "orterun", "prun" is a PMIx-only program that discovers the DVM connection instead of requiring that we explicitly provide it. Only build "prun" if PMIx v2.x is available.

This gets the DVM working again, but still is showing problems for multiple executions. I'll detail those in a separate issue. Thus, the DVM should still be considered "broken".

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-12 21:40:41 -07:00