openmpi/mca at 5f3b81e291789d25162a477321a7a5a1a25d122a - openmpi - СВД Встраиваемые Системы. Git

ports/openmpi

Dave Goodell 5f3b81e291 oob: delete events when destroying a peer

Without this patch, running ring_c with the usnic BTL under valgrind
causes the orteds to segfault.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Ralph Castain <rhc@open-mpi.org>

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31161.

2014-03-19 22:15:49 +00:00

Fix longstanding issue with our multi-project support. Rather than using

2014-01-07 22:11:15 +00:00

Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we order abnormal termination before the daemons call back to us, so any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails.

2014-03-14 15:32:30 +00:00

Set the locality for remote procs even after a comm_spawn. Ensure we store our own local cpuset upon launch so it will be shared during comm_join.

2014-03-18 14:51:07 +00:00

Protect array against crossing boundaries

2014-01-17 21:36:20 +00:00

Fully fix the PMI2 warning. It turned out to be larger than originally thought because of the way the function was handled across multiple files. Properly resolve the problem by not compiling the file if PMI2 is not desired, and then setting the visibility of the function appropriately within the module.

2014-03-17 17:36:37 +00:00

Cleanup some potential memory overruns

2014-01-19 16:31:26 +00:00

More corrections w.r.t. process groups

2014-03-18 21:31:01 +00:00

oob: delete events when destroying a peer

2014-03-19 22:15:49 +00:00

Surrender to the tyranny of C++ and give up on enum for node states, as nice as that would be, in favor of retaining memory footprint constraints.

2014-03-19 16:15:24 +00:00
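The trade-off mentioned in the node-state commit above can be illustrated with a minimal sketch. It is not the code from the commit: the type name `node_state_t`, the `NODE_STATE_*` constants, and the helper `node_state_name` are hypothetical names chosen for this example. The underlying facts are general C/C++ behavior: the size of an `enum` type is implementation-defined (commonly that of `int`), and C++ rejects implicit conversion from an integer to an enum, so a fixed-width integer typedef with named constants keeps the field at one byte and compiles cleanly in both languages.

```c
#include <stdint.h>

/* Hypothetical one-byte state type: a typedef plus named constants
 * instead of an enum, so the storage size is fixed at one byte. */
typedef uint8_t node_state_t;

#define NODE_STATE_UNKNOWN ((node_state_t)0)
#define NODE_STATE_DOWN    ((node_state_t)1)
#define NODE_STATE_UP      ((node_state_t)2)

/* Map a state value to a printable name (illustrative helper). */
static const char *node_state_name(node_state_t s)
{
    switch (s) {
    case NODE_STATE_DOWN: return "DOWN";
    case NODE_STATE_UP:   return "UP";
    default:              return "UNKNOWN";
    }
}
```

What is lost is the compiler's enum type checking: any 8-bit value can be stored in a `node_state_t`, so validity has to be checked by convention or by helpers such as the one above.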

Paul Hargrove has pointed out that some big SMP systems (e.g., from SGI) configure Torque differently: instead of listing each node name once per slot in the nodefile, they list the node only once and set an envar to indicate the number of procs per node being allocated. Add an MCA param users can set to indicate we are in such an environment, and then use the envar to set the slots. Error out if the mode flag is given, but (a) we don't find the PBS_PPN envar, or (b) we find a node actually listed more than once in the PBS_Nodefile.

2014-02-05 15:51:17 +00:00
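The slot-assignment rule described in the Torque commit above can be sketched as follows. This is an illustration of the stated logic only, not the code from the commit: the function name `assign_slots_ppn_mode` and the `PPN_*` return codes are hypothetical, the node names are passed in as an array rather than read from the PBS_Nodefile, and treating an unparsable or non-positive PBS_PPN value the same as a missing one is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not Open MPI constants). */
enum {
    PPN_OK = 0,
    PPN_ERR_NO_ENVAR = -1,       /* case (a): PBS_PPN missing or unusable */
    PPN_ERR_DUPLICATE_NODE = -2  /* case (b): node listed more than once  */
};

/* names: node names as they appear in the nodefile, one entry per line.
 * ppn:   value of the PBS_PPN envar, e.g. getenv("PBS_PPN"); may be NULL.
 * slots: output array of length n; each node receives ppn slots. */
static int assign_slots_ppn_mode(const char *const names[], size_t n,
                                 const char *ppn, int slots[])
{
    char *end;
    long per_node;
    size_t i, j;

    if (ppn == NULL) {
        return PPN_ERR_NO_ENVAR;
    }
    per_node = strtol(ppn, &end, 10);
    if (end == ppn || *end != '\0' || per_node <= 0) {
        return PPN_ERR_NO_ENVAR;  /* assumption: bad value treated as missing */
    }

    for (i = 0; i < n; i++) {
        /* In this mode every node must be listed exactly once. */
        for (j = 0; j < i; j++) {
            if (strcmp(names[i], names[j]) == 0) {
                return PPN_ERR_DUPLICATE_NODE;
            }
        }
        slots[i] = (int)per_node;
    }
    return PPN_OK;
}
```

A caller would read the node names from the file, pass `getenv("PBS_PPN")` as `ppn`, and abort with an error message on either non-zero return code.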

When pretty-printing binding info, we need to pass the topology down to the routine, as the mapper isn't always working with the local topology; otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it.

2014-03-10 15:53:07 +00:00

use the newly created JOB_STATE_FT_* events

2014-03-12 12:37:14 +00:00

Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we order abnormal termination before the daemons call back to us, so any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails.

2014-03-14 15:32:30 +00:00

Do a little cleanup - only resusage needs the node/proc info, so remove it from the sensor base

2014-03-17 21:26:46 +00:00

Use unique collective ids for the checkpoint/restart code

2014-02-04 14:03:05 +00:00

fix "warning: 'sstore_stage_select' defined but not used"

2014-03-06 16:53:27 +00:00

Continue to resolve priority issues. Cleanup the case of forced termination in mpirun during launch processing by ensuring we can respond to socket closures, and ensuring that the remote daemons correctly close their sockets when terminating.

2014-03-13 04:02:24 +00:00