openmpi/orte/util
Ralph Castain d672fad849 Repair rsh/ssh tree spawn
Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn.

Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 11:35:00 -08:00
comm Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier). 2016-10-23 21:52:39 -07:00
dash_host Extend the -host:N syntax to accept "*" or "auto" to indicate "auto-detect the #cpus and set #slots to that value" 2017-01-24 10:21:01 -08:00
hostfile util/hostfile: plug a memory leak 2017-01-06 15:38:45 +09:00
attr.c Deprecate the --slot-list parameter in favor of --cpu-list. Remove the --cpu-set param (mark it as deprecated) and use --cpu-list instead as it was confusing having the two params. The --cpu-list param defines the cpus to be used by procs of this job, and the binding policy will be overlaid on top of it. 2017-01-24 13:33:22 -08:00
attr.h Deprecate the --slot-list parameter in favor of --cpu-list. Remove the --cpu-set param (mark it as deprecated) and use --cpu-list instead as it was confusing having the two params. The --cpu-list param defines the cpus to be used by procs of this job, and the binding policy will be overlaid on top of it. 2017-01-24 13:33:22 -08:00
compress.c Compress the xcast message if bigger than a defined size to further improve launch performance at scale 2017-01-19 22:08:02 -08:00
compress.h Compress the xcast message if bigger than a defined size to further improve launch performance at scale 2017-01-19 22:08:02 -08:00
context_fns.c Silence Coverity warning 2016-03-14 09:42:43 -07:00
context_fns.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
error_strings.c Repair rsh/ssh tree spawn 2017-01-27 11:35:00 -08:00
error_strings.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
help-regex.txt Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
hnp_contact.c Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier). 2016-10-23 21:52:39 -07:00
hnp_contact.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
listener.c Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
listener.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Makefile.am Compress the xcast message if bigger than a defined size to further improve launch performance at scale 2017-01-19 22:08:02 -08:00
name_fns.c orte_util_snprintf_jobid: return ORTE_SUCCESS or ORTE_ERROR 2016-01-18 09:44:33 +09:00
name_fns.h sentinel: fix sentinel to proc_name conversion 2016-02-10 15:44:07 +09:00
nidmap.c Cleanup launch 2017-01-25 22:06:09 -08:00
nidmap.h Next step in reducing launch time: begin reducing the size of the launch message itself. Start by expressing the daemon map as a set of three regular expression strings. On an 8k cluster, this reduces the nidmap contribution from over 200kBytes to 21 bytes in size. 2017-01-23 19:54:47 -08:00
parse_options.c Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
parse_options.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
pre_condition_transports.c Enable PSM to support dynamic processes 2016-09-02 10:22:04 -07:00
pre_condition_transports.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
proc_info.c Clean out old cruft from the ORCM project 2016-09-21 00:13:30 -07:00
proc_info.h Clean out old cruft from the ORCM project 2016-09-21 00:13:30 -07:00
regex.c Next step in reducing launch time: begin reducing the size of the launch message itself. Start by expressing the daemon map as a set of three regular expression strings. On an 8k cluster, this reduces the nidmap contribution from over 200kBytes to 21 bytes in size. 2017-01-23 19:54:47 -08:00
regex.h Next step in reducing launch time: begin reducing the size of the launch message itself. Start by expressing the daemon map as a set of three regular expression strings. On an 8k cluster, this reduces the nidmap contribution from over 200kBytes to 21 bytes in size. 2017-01-23 19:54:47 -08:00
session_dir.c Fix the session directory cleanup - only remove the jobfam session dir level if we are the local daemon and are cleaning up our own session directory. 2016-12-03 09:59:18 -08:00
session_dir.h Several fixes related to session directories: 2016-09-05 07:48:44 +03:00
show_help.c Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node. 2017-01-05 10:32:17 -08:00
show_help.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00