openmpi/orte/util
Ralph Castain 503e1274a9 Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements
Ensure the returned exit status is non-zero if we fail to map

If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported.

If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules.

If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row #7 on Jeff's spreadsheet is incorrect.

With that one correction, this now passes all the given use-cases on that spreadsheet.

Make unmanaged allocations behave more like their managed cousins: if the #slots is given, then omitting -np shall fill the available slots.

Fixes #1344
2016-03-29 11:21:57 -07:00
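
A minimal usage sketch of the -np/-host rules described in the commit message above, assuming a hypothetical host "foo" with 4 detected processing elements (the actual slot counts and oversubscription outcomes depend on the hardware found at launch):

    mpirun -np 8 -host foo ./a.out        # no slots given: #slots defaults to the 4 detected PEs, oversubscription rules apply
    mpirun -np 8 -host foo,foo ./a.out    # "foo" listed twice: #slots on foo is 2, oversubscription rules apply
    mpirun -np 8 -host foo:4 ./a.out      # #slots set explicitly via the "-host foo:N" form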
comm Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
dash_host Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements 2016-03-29 11:21:57 -07:00
hostfile Refactor the ORTE DVM code so that external codes can submit multiple jobs using only a single connection to the HNP. 2016-02-13 08:10:44 -08:00
attr.c Revert "Modify singularity support per patch from Greg Kurtzer" 2016-03-24 11:27:18 -07:00
attr.h Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements 2016-03-29 11:21:57 -07:00
context_fns.c Silence Coverity warning 2016-03-14 09:42:43 -07:00
context_fns.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
error_strings.c Refactor the ORTE DVM code so that external codes can submit multiple jobs using only a single connection to the HNP. 2016-02-13 08:10:44 -08:00
error_strings.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
help-regex.txt Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
hnp_contact.c Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
hnp_contact.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
listener.c Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
listener.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Makefile.am Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
name_fns.c orte_util_snprintf_jobid: return ORTE_SUCCESS or ORTE_ERROR 2016-01-18 09:44:33 +09:00
name_fns.h sentinel: fix sentinel to proc_name conversion 2016-02-10 15:44:07 +09:00
nidmap.c Refactor the ORTE DVM code so that external codes can submit multiple jobs using only a single connection to the HNP. 2016-02-13 08:10:44 -08:00
nidmap.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
parse_options.c Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
parse_options.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
pre_condition_transports.c more c99 updates 2015-06-25 10:14:13 -06:00
pre_condition_transports.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
proc_info.c Plug a few memory leaks identified by valgrind 2015-09-23 15:21:04 -07:00
proc_info.h orte proc_info.h: use symbolic names 2015-11-10 13:39:21 -08:00
regex.c Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
regex.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
session_dir.c Now that we have an "isolated" PLM component, we cannot just let rsh silently decline to run when it cannot find a launch agent - if we do, then we will -always- run on the local node. So if the user specifies a launch agent and we can't find it, then generate a pretty error message, report a fatal error back to the component select, and exit out. 2015-09-24 07:16:48 -07:00
session_dir.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
show_help.c Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
show_help.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00