openmpi

Ralph Castain 554da83865 Set the locality for remote procs even after a comm_spawn. Ensure we store our own local cpuset upon launch so it will be shared during comm_join. This provides full locality - i.e., not just node-level, but all the way down to whatever common binding level exists between the procs. cmr=v1.7.5:reviewer=jsquyres This commit was SVN r31106.		2014-03-18 14:51:07 +00:00
dfs	Fix longstanding issue with our multi-project support. Rather than using	2014-01-07 22:11:15 +00:00
errmgr	Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we will order abnormal termination before the daemons call back to us, and thus any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails.	2014-03-14 15:32:30 +00:00
ess	Set the locality for remote procs even after a comm_spawn. Ensure we store our own local cpuset upon launch so it will be shared during comm_join.	2014-03-18 14:51:07 +00:00
filem	Protect array against crossing boundaries	2014-01-17 21:36:20 +00:00
grpcomm	Fully fix the PMI2 warning - turned out to be larger than originally thought due to the way the function was being handled across multiple files. Properly resolve the problem by not compiling the file if PMI2 is not desired, and then appropriately setting the visibility of the function within the module	2014-03-17 17:36:37 +00:00
iof	Cleanup some potential memory overruns	2014-01-19 16:31:26 +00:00
odls	Remove the job_control_forwarding logic as we want any signal to go to all members of the process group	2014-03-17 22:45:33 +00:00
oob	There is no OOB component object - it is a simple struct with an opal_list_item_t element at the beginning	2014-03-17 21:23:59 +00:00
plm	Cleanup copy/paste errors to ensure we progress the launch	2014-03-18 01:24:49 +00:00
ras	Paul Hargrove has pointed out that some big SMP systems (e.g., from SGI) configure Torque differently - instead of listing each node name once/slot in the nodefile, they list the node only once and set an envar to indicate the number of procs/node being allocated. Add an MCA param users can set to indicate we are in such an environment, and then use the envar to set the slots. Error out if the mode flag is given, but (a) we don't find the PBS_PPN envar, or (b) we find a node actually listed more than once in the PBS_Nodefile.	2014-02-05 15:51:17 +00:00
rmaps	When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it	2014-03-10 15:53:07 +00:00
rml	use the newly created JOB_STATE_FT_* events	2014-03-12 12:37:14 +00:00
routed	Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we will order abnormal termination before the daemons call back to us, and thus any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails.	2014-03-14 15:32:30 +00:00
sensor	Do a little cleanup - only resusage needs the node/proc info, so remove it from the sensor base	2014-03-17 21:26:46 +00:00
snapc	Use unique collective ids for the checkpoint/restart code	2014-02-04 14:03:05 +00:00
sstore	fix "warning: 'sstore_stage_select' defined but not used"	2014-03-06 16:53:27 +00:00
state	Continue to resolve priority issues. Cleanup the case of forced termination in mpirun during launch processing by ensuring we can respond to socket closures, and ensuring that the remote daemons correctly close their sockets when terminating.	2014-03-13 04:02:24 +00:00