Ralph Castain
ca69403cc8
In MPMD case, add slots given to each of the executables instead of overwriting
2016-05-15 08:55:43 -07:00
Ralph Castain
7767882346
Per user request, add some missing data and definitions:
OPAL_PMIX_UNIV_RANK - synonym for OPAL_PMIX_GLOBAL_RANK
OPAL_PMIX_APP_SIZE - #ranks in the application of this proc
2016-05-09 08:39:01 -07:00
Ralph Castain
1911d74095
Prevent segfault when -debug given to mpirun
2016-05-08 10:19:05 -07:00
Ralph Castain
7e5ef6a240
Fix the env_list support - the MCA param was being set way too early, so provide a "backdoor" way of providing the value
2016-05-06 15:38:39 -07:00
Ralph Castain
58dd41facf
Repair the processing of cmd line options that mapped to MCA params. This was responsible for breaking things like map-by <foo>.
Remove debug, let orterun send terminate cmd to DVM
Recover the DVM support
2016-05-06 13:14:03 -07:00
rhc54
ff8518853e
Merge pull request #1604 from rhc54/topic/psm2
Improve the transport key print statement to ensure that we don't get…
2016-05-03 13:43:10 -07:00
Jeff Squyres
265e5b9795
Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
rhc54
2fa8b6c6ac
Merge pull request #1525 from rhc54/topic/schizo
Extend the schizo framework
2016-05-01 15:09:08 -07:00
Ralph Castain
6ac7929bd0
Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need.
Cleanups per @jjhursey review
2016-05-01 11:30:25 -07:00
Ralph Castain
0f05893952
Ensure consistency between max_procs and univ_size values - since orte wants max_procs, have the proc get that value instead of univ_size
Make the singleton module consistent as well
2016-05-01 11:13:33 -07:00
Ralph Castain
29bc24bdd5
Improve the transport key print statement to ensure that we don't get zero fields as this can be a problem for PSM
2016-04-28 20:11:12 -07:00
Ralph Castain
fac409d094
Ensure the personality gets set for the debugger job launch when attaching
2016-04-28 15:28:55 -07:00
Ralph Castain
e6ad1ad621
Up-port of change for 2.x: if user directs oversubscribe, then do not bind as we will otherwise overload resources
2016-04-28 13:21:10 -07:00
Ralph Castain
75dc4c305a
Correctly set the #procs in the job to "job_size", and the max_procs to "univ_size"
2016-04-27 12:00:19 -07:00
Gilles Gouaillardet
6bf57c799f
orte/rml: ORTE_RML_SEND_COMPLETE handles messages with both NULL iov and cbfunc.buffer
2016-04-26 09:19:31 +09:00
Karol Mroz
5c11bdb251
orte: fixup hostname max length usage
Also removes orte specific max hostname value.
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-25 07:08:23 +02:00
Joshua Hursey
29b49351af
ras/lsf: Fix affinity for MPMD jobs running under LSF
2016-04-22 11:18:34 -05:00
Jeff Squyres
68c1a5eb6c
Merge pull request #1567 from jsquyres/pr/fix-ompi-to-opal-name-conversion
m4: rename OMPI_SUMMARY_* macros to OPAL_SUMMARY_*
2016-04-20 13:10:06 -04:00
Jeff Squyres
6800ef9ec0
m4: rename OMPI_SUMMARY_* macros to OPAL_SUMMARY_*
These macros should really be named OPAL_SUMMARY_*; they're used in
all projects, and therefore should be in the lowest layer project (OPAL).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-04-20 08:40:00 -07:00
Ralph Castain
449ec41532
Roll to PMIx 1.1.4rc1 and remove the PMIx 1.2.0 directory as the community has decided to not do that release version. This incorporates a number of bug fixes that have been identified and repaired in the PMIx and OMPI code bases. Also includes several minor corrections to the PMIx code so it now supports run-thru without hanging on collectives involving a process that exits
2016-04-15 10:11:11 -07:00
Ralph Castain
1fa236b26c
Ensure that we exit with a non-zero status when oversubscribe fails
2016-04-14 05:51:10 -07:00
Ralph Castain
437f5b4289
Fix map-by node and do-not-launch
2016-04-13 09:21:19 -07:00
Ralph Castain
2432daf065
Some minor cleanups of a memory leak and error output
2016-04-08 07:46:18 -07:00
Rainer Keller
ad690a4bc0
Move the help into the proper file: all orte_show_help in orte/orted/pmix/pmix_server.c reference orterun.
2016-04-07 22:52:23 +02:00
Rainer Keller
52080a5736
As per the pull request to pmix/master:
https://github.com/pmix/master/pull/71
Have OMPI's current version of pmix120 fail gracefully when sun_path is
too long (longer than 108 chars, or 103 chars on OS X),
and have OMPI return proper error messages with hints on how to
fix the problem.
2016-04-07 22:12:53 +02:00
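The sun_path check described in the commit above can be illustrated with a minimal sketch. It is not the PMIx or OMPI code: the function name `check_socket_path`, the constant `SUN_PATH_LIMIT`, and the wording of the hint are hypothetical, and the limits are simply the numbers quoted in the commit message.

```python
import os
import sys

# Limits as quoted in the commit message (assumption: the authoritative value
# is the size of sun_path in the platform's <sys/un.h>).
SUN_PATH_LIMIT = 103 if sys.platform == "darwin" else 108


def check_socket_path(path, limit=SUN_PATH_LIMIT):
    """Return None if the path fits, otherwise an error message with a hint.

    Hypothetical helper: the real code reports the problem through the
    project's own help-message machinery.
    """
    length = len(os.fsencode(path))  # the limit applies to bytes, not characters
    if length > limit:
        return (
            f"socket path is {length} bytes, limit is {limit}: {path}\n"
            "Hint: use a shorter temporary directory, for example via TMPDIR."
        )
    return None
```

Checking the length before calling bind() lets the caller fail early with an actionable message instead of a truncated path or an obscure system error.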
rhc54
a95de6e8ef
Merge pull request #1353 from rhc54/topic/host
Per the discussion on the telecon, change the -host behavior yet again
2016-04-04 10:30:36 -07:00
Gilles Gouaillardet
d757fbba5d
oob/usock: drop message to be sent in process_send()
2016-04-04 16:04:54 +09:00
Gilles Gouaillardet
170734182b
oob/usock: mca_oob_usock_peer_close() sets peer->sd = -1 after close()
so usock_peer_create_socket knows it must re-create the socket
/* assuming it is ever supposed to occur */
also fix a typo (peer->sd >= 0) in usock_peer_create_socket
2016-04-04 16:02:05 +09:00
Gilles Gouaillardet
2ede47c462
pmix: fix misc missing conversion and type issues
2016-04-04 10:12:34 +09:00
Ralph Castain
503e1274a9
Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements
Ensure the returned exit status is non-zero if we fail to map
If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported.
If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules.
If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row #7 on Jeff's spreadsheet is incorrect.
With that one correction, this now passes all the given use-cases on that spreadsheet.
Make things behave under unmanaged allocations more like their managed cousins - if the #slots is given, then no-np shall fill things up.
Fixes #1344
2016-03-29 11:21:57 -07:00
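The slot-counting rules for -host in the commit above (with -np given) can be sketched as follows. This is an illustration only, not ORTE's implementation: `slots_from_host_list` and the `detected_pes` mapping are hypothetical names, the rules are applied per host, and an explicit "foo:N" is assumed to override repetition counts.

```python
def slots_from_host_list(host_arg, detected_pes):
    """Return {host: slots} for a comma-separated -host value, assuming -np was given.

    detected_pes maps a host name to its detected number of processing
    elements (hypothetical input standing in for topology discovery).
    """
    occurrences = {}  # host -> number of times it was listed without ":N"
    explicit = {}     # host -> N from "host:N"
    order = []        # hosts in first-seen order
    for entry in host_arg.split(","):
        name, sep, count = entry.strip().partition(":")
        if name not in order:
            order.append(name)
        if sep:
            explicit[name] = int(count)
        else:
            occurrences[name] = occurrences.get(name, 0) + 1

    slots = {}
    for name in order:
        if name in explicit:
            # "-host foo:N" sets the slot count directly (assumed to win).
            slots[name] = explicit[name]
        elif occurrences[name] > 1:
            # Host listed several times: one slot per listing.
            slots[name] = occurrences[name]
        else:
            # Host listed once: default to the detected processing elements.
            slots[name] = detected_pes[name]
    return slots
```

For example, `slots_from_host_list("foo,foo,bar", {"foo": 8, "bar": 4})` gives foo two slots (listed twice) and bar four (listed once, so the detected count applies). Oversubscription rules would then be enforced against these slot counts.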
Ralph Castain
bd18d9c9d5
Ensure the compiler knows that a critical variable is volatile
2016-03-29 09:18:25 -07:00
Howard Pritchard
e7433fcb44
Merge pull request #1486 from hppritcha/topic/fix_wlm_detect_code
plm/alps: fix usage of cray wlm_detect methods
2016-03-26 13:22:50 -06:00
Ralph Castain
0e1350f5b7
Add missing header files
2016-03-25 09:06:51 -07:00
Ralph Castain
a3fea58d1c
Minor cleanups to prior PR commit
2016-03-24 15:55:14 -07:00
rhc54
6756e19aa2
Merge pull request #1457 from anandhis/master
rml changes
2016-03-24 15:17:29 -07:00
rhc54
ba8c8700aa
Merge pull request #1493 from rhc54/topic/sing
Update singularity support to track changes in upstream Singularity code
2016-03-24 15:16:38 -07:00
Ralph Castain
8c14df2328
Revert "Modify singularity support per patch from Greg Kurtzer"
This reverts commit open-mpi/ompi@f7257a8310 .
Ensure that we properly cleanup the session directory tree. Prior code had issues with symlinks, especially if the file that the link points to was already removed as we traverse the tree. Also found that the dirent checks for directory type weren't fully portable, and so fall back to the stat-based approach which is known to be portable.
Fix singularity singletons by detecting we are in a container and properly setting the pmix selection to pick the isolated component. Remove a stale restriction blocking use of the sm btl
2016-03-24 11:27:18 -07:00
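The session-directory cleanup described in the commit above relies on a stat-based type check rather than dirent type fields, and has to cope with symlinks whose targets are already gone. A minimal sketch of that idea, assuming a POSIX system (`remove_tree` is a hypothetical name, not the ORTE function):

```python
import os
import stat


def remove_tree(path):
    """Remove path recursively without following symlinks."""
    try:
        info = os.lstat(path)  # lstat describes the link itself, not its target
    except FileNotFoundError:
        return  # already removed, nothing to do
    if stat.S_ISDIR(info.st_mode):
        for name in os.listdir(path):
            remove_tree(os.path.join(path, name))
        os.rmdir(path)
    else:
        # Regular files and symlinks, including dangling ones, are unlinked.
        os.unlink(path)
```

Because the type comes from lstat, a link to a directory is removed as a link and never traversed, and a dangling link does not raise an error during the walk.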
Ralph Castain
378d9cbb5e
Extend the abort on non zero status flag to apply to processes which die as the result of signals.
2016-03-24 08:33:55 -07:00
Ralph Castain
cdd3dc99ca
Correct the binding for the --map-by node case - we should still use our default binding algorithms
2016-03-23 09:55:24 -07:00
Ralph Castain
6e6bbfda91
Very minor typo
2016-03-23 08:31:47 -07:00
Ralph Castain
4a623778a9
Fix the debugger attach - previous commit had fixed one instance of a check prior to sending the release message, but there was a second code path that included a similar check that was missed. Thanks to John DelSignore for spotting it!
2016-03-23 08:25:25 -07:00
Howard Pritchard
69200e6229
plm/alps: fix usage of cray wlm_detect methods
Turns out there are some cases where the Cray
wlm_detect_get_active may return NULL, in which
case fallback to wlm_detect_get_default method
is suggested. Make use of the fallback to
avoid segfaults under some circumstances in the
ALPS plm selection method.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-03-22 11:40:56 -07:00
Ralph Castain
c146c4969b
Revert part of open-mpi/ompi@c1bbbb5e2f to restore the usock component, thus fixing show_help aggregation.
Fixes #1467
Restore debugger attach operations
Fixes #1225
2016-03-18 21:49:04 -07:00
Ralph Castain
8f410d7897
Revert one part of open-mpi/ompi@4d0cc27eb7
2016-03-18 07:23:30 -07:00
Ralph Castain
2970becd6b
Revert "Merge pull request #1451 from ggouaillardet/topic/orte_fork_wrapper_fullname"
This reverts commit efafd62d38bb12c161330d5a6e4f338e9b560a7e, reversing
changes made to a93b849f13b12a7b1c1cdde71a9e491ddc220e17.
2016-03-18 07:18:36 -07:00
Ralph Castain
a67ff065ae
Silence coverity warnings
2016-03-16 08:43:16 -07:00
Nysal Jan K.A
f6e932c864
Fix memory corruption in orte-ps
orte-ps ends up free'ing the same pointer multiple times
2016-03-15 16:03:31 +05:30
Ralph Castain
6d7ada9675
Silence Coverity warning
2016-03-14 09:42:43 -07:00
Gilles Gouaillardet
589924c4aa
odls/base: use the full app name when using an orte fork agent
2016-03-14 11:18:21 +09:00
Anandhi S Jayakumar
a31292abc7
fixes to ud for removing qos channel
2016-03-10 18:03:17 -08:00