openmpi/orte/mca
Ralph Castain 554da83865 Set the locality for remote procs even after a comm_spawn. Ensure we store our own local cpuset upon launch so it will be shared during comm_join.
This provides full locality - i.e., not just node-level, but all the way down to whatever common binding level exists between the procs.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31106.
2014-03-18 14:51:07 +00:00
..
dfs Fix longstanding issue with our multi-project support. Rather than using 2014-01-07 22:11:15 +00:00
errmgr Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we will order abnormal termination before daemons callback to us, and thus any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails. 2014-03-14 15:32:30 +00:00
ess Set the locality for remote procs even after a comm_spawn. Ensure we store our own local cpuset upon launch so it will be shared during comm_join. 2014-03-18 14:51:07 +00:00
filem Protect array against crossing boundaries 2014-01-17 21:36:20 +00:00
grpcomm Fully fix the PMI2 warning - turned out to be larger than originally thought due to the way the function was being handled across multiple files. Properly resolve the problem by not compiling the file if PMI2 is not desired, and then appropriately setting the visibility of the function within the module 2014-03-17 17:36:37 +00:00
iof Cleanup some potential memory overruns 2014-01-19 16:31:26 +00:00
odls Remove the job_control_forwarding logic as we want *any* signal to go to all members of the process group 2014-03-17 22:45:33 +00:00
oob There is no OOB component object - it is a simple struct with an opal_list_item_t element at the beginning 2014-03-17 21:23:59 +00:00
plm Cleanup copy/paste errors to ensure we progress the launch 2014-03-18 01:24:49 +00:00
ras Paul Hargrove has pointed out that some big SMP systems (e.g., from SGI) configure Torque differently - instead of listing each node name once/slot in the nodefile, they list the node only once and set an envar to indicate the number of procs/node being allocated. Add an MCA param users can set to indicate we are in such an environment, and then use the envar to set the slots. Error out if the mode flag is given, but (a) we don't find the PBS_PPN envar, or (b) we find a node actually listed more than once in the PBS_Nodefile. 2014-02-05 15:51:17 +00:00
rmaps When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it 2014-03-10 15:53:07 +00:00
rml use the newly created JOB_STATE_FT_* events 2014-03-12 12:37:14 +00:00
routed Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we will order abnormal termination before daemons callback to us, and thus any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails. 2014-03-14 15:32:30 +00:00
sensor Do a little cleanup - only resusage needs the node/proc info, so remove it from the sensor base 2014-03-17 21:26:46 +00:00
snapc Use unique collective ids for the checkpoint/restart code 2014-02-04 14:03:05 +00:00
sstore fix "warning: 'sstore_stage_select' defined but not used" 2014-03-06 16:53:27 +00:00
state Continue to resolve priority issues. Cleanup the case of forced termination in mpirun during launch processing by ensuring we can respond to socket closures, and ensuring that the remote daemons correctly close their sockets when terminating. 2014-03-13 04:02:24 +00:00