openmpi/orte/util
Ralph Castain fde83a44ab This confusion has been around for awhile, caused by a long-ago decision to track slots allocated to a specific job as opposed to allocated to the overall mpirun instance. We eliminated that quite a while ago, but never consolidated the "slots_alloc" and "slots" fields in orte_node_t. As a result, confusion has grown in the code base as to which field to look at and/or update.
So (finally) consolidate these two fields into one "slots" field. Add a field in orte_job_t to indicate when all the procs for a job will be launched together, so that staged operations can know when MPI operations are allowed.

This commit was SVN r27239.
2012-09-05 01:30:39 +00:00
..
comm Sorry for mid-day commit, but I had promised on the call to do this upon my return. 2012-04-06 14:23:13 +00:00
dash_host If (and only if) a user requests, set the default number of slots on any node to the number of objects of the specified type. This *only* takes effect in an unmanaged environment - i.e., if an external resource manager assigns us a number of slots, then that is what we use. However, if we are using a hostfile, then the user may or may not have given us a value for the number of slots on each node. 2012-09-04 20:58:26 +00:00
hostfile This confusion has been around for awhile, caused by a long-ago decision to track slots allocated to a specific job as opposed to allocated to the overall mpirun instance. We eliminated that quite a while ago, but never consolidated the "slots_alloc" and "slots" fields in orte_node_t. As a result, confusion has grown in the code base as to which field to look at and/or update. 2012-09-05 01:30:39 +00:00
context_fns.c Hostname is not used in this function. 2010-07-21 11:07:28 +00:00
context_fns.h ... Delayed due to notifier commits earlier this day ... 2009-04-29 01:32:14 +00:00
error_strings.c Introduce staged execution. If you don't have adequate resources to run everything without oversubscribing, don't want to oversubscribe, and aren't using MPI, then staged execution lets you (a) run as many procs as there are available resources, and (b) start additional procs as others complete and free up resources. Adds a new mapper as well as a new state machine. 2012-08-28 21:20:17 +00:00
error_strings.h Introduce staged execution. If you don't have adequate resources to run everything without oversubscribing, don't want to oversubscribe, and aren't using MPI, then staged execution lets you (a) run as many procs as there are available resources, and (b) start additional procs as others complete and free up resources. Adds a new mapper as well as a new state machine. 2012-08-28 21:20:17 +00:00
help-regex.txt Fix regular expression analyzer for slurmd - use a slurm-specific version 2011-07-13 22:49:56 +00:00
hnp_contact.c Sorry for mid-day commit, but I had promised on the call to do this upon my return. 2012-04-06 14:23:13 +00:00
hnp_contact.h As requested by Aurelien at the July design meeting - long time coming, but finally got around to it. 2008-12-10 17:10:39 +00:00
Makefile.am Revamp the errmgr framework to provide a greater range of optional behaviors, including different behaviors for daemons, and remove several looping messages across the code base: 2010-04-23 04:44:41 +00:00
name_fns.c Add database framework to ORTE and refactor modex code to utilize it. Create the "hash" db component from the prior modex db code. Leave the other components ignored for now - will activate them later. 2012-06-19 13:38:42 +00:00
name_fns.h Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required. 2012-06-27 14:53:55 +00:00
nidmap.c Add some protection to allow NULL bytes in byte objects and NULL strings to be handled cleanly in nidmaps and modex entries. Ensure there is a valid nidmap available for the HNP to pass down to any local procs when it is operating alone. 2012-08-31 01:07:36 +00:00
nidmap.h Introduce staged execution. If you don't have adequate resources to run everything without oversubscribing, don't want to oversubscribe, and aren't using MPI, then staged execution lets you (a) run as many procs as there are available resources, and (b) start additional procs as others complete and free up resources. Adds a new mapper as well as a new state machine. 2012-08-28 21:20:17 +00:00
parse_options.c Use ports as multicast channels instead of networks so we avoid stepping into reserved spaces. 2011-04-29 18:46:40 +00:00
parse_options.h Use ports as multicast channels instead of networks so we avoid stepping into reserved spaces. 2011-04-29 18:46:40 +00:00
pre_condition_transports.c This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build. 2011-11-22 21:24:35 +00:00
pre_condition_transports.h In the case of direct-launched processes running under slurm, psm requires that the pre_condition_transports MCA param be set. This is normally computed by mpirun and inserted into each proc's environ, but that doesn't work here. 2011-04-28 13:54:33 +00:00
proc_info.c Remove debug 2012-08-13 21:35:21 +00:00
proc_info.h For cases where the alpha+non-zero prefix must be removed from a node name, be sure to do it everywhere we access node names - otherwise, modex methods such as pmi will fail to correctly identify procs on the same node 2012-08-13 20:44:56 +00:00
regex.c This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build. 2011-11-22 21:24:35 +00:00
regex.h Fix some major bit-rot on scalable launch. If static ports are provided, then daemons can connect back to the HNP via the routed connection tree instead of doing so directly. In order to do that at scale, the node list must be passed as a regular expression - otherwise, the orted command line gets too long. 2011-07-07 18:54:30 +00:00
session_dir.c Some cleanup of the tmpdir session directory specifications. Remove the --tmpdir option from orterun as it was confusing. Create an orte_local_tmpdir_base mca param in its place. Clarify the role of the local vs remote vs global tmpdir base params, and ensure that you don't set conflicting options. 2012-02-16 16:10:01 +00:00
session_dir.h - Revert r20739 2009-03-05 21:56:03 +00:00
show_help.c Backout the ORCA commit. :( 2012-06-27 01:28:28 +00:00
show_help.h Backout the ORCA commit. :( 2012-06-27 01:28:28 +00:00