openmpi/orte/runtime
Ralph Castain d28dd55d33 Minimize the amount of topology info returned by the daemons. Most clusters, especially at scale, use the same node topology on every node, so there is no reason to return the topology from every daemon. Borrow a page from the --hetero-apps page and let users indicate that the node topology differs by adding a --hetero-nodes option to mpirun. If the option is set, then every daemon returns topology info. If not set, then only daemon vpid=1 returns it.

We always want one daemon to return the topology as the head node is often different from the compute nodes. Having one daemon return the compute node topology allows us to detect any such difference. All compute nodes are then set to the same topology.

This commit was SVN r25408.
2011-11-01 18:43:10 +00:00
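
The reporting rule described in the commit message above can be sketched in C as follows. This is a minimal illustration, not the ORTE implementation: `node_info_t`, `daemon_reports_topology`, and `share_compute_topology` are hypothetical names, and the sketch assumes that array index equals daemon vpid, with index 0 standing in for the head node.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a node's topology record; not an ORTE type. */
typedef struct {
    int topology_id;   /* 0 means "no topology reported yet" */
} node_info_t;

/* Decide whether a daemon should send its topology back.
 * hetero_nodes mirrors the --hetero-nodes option from the commit message. */
bool daemon_reports_topology(bool hetero_nodes, uint32_t vpid)
{
    if (hetero_nodes) {
        return true;       /* every daemon reports its own topology */
    }
    return vpid == 1;      /* otherwise only daemon vpid=1 reports */
}

/* Receiving side, homogeneous case: copy the topology reported by
 * vpid=1 to every other compute node. Index 0 (assumed to be the head
 * node) keeps its own topology, so a difference stays detectable. */
void share_compute_topology(node_info_t *nodes, size_t n, bool hetero_nodes)
{
    if (hetero_nodes || n < 2) {
        return;            /* nothing to share */
    }
    for (size_t i = 2; i < n; i++) {
        nodes[i].topology_id = nodes[1].topology_id;
    }
}
```

With `hetero_nodes` false, only the entry at index 1 needs to arrive over the network; the remaining compute entries are filled locally, which is the saving the commit message describes.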
..
data_type_support Bring back the local node's binding capabilities along with its topology. Clean up indentation. 2011-10-30 13:20:16 +00:00
help-orte-runtime.txt Remove lingering references to opal_profile option 2011-05-18 18:27:29 +00:00
Makefile.am Start reducing our dependency on the event library by removing at least one instance where we use it to redirect the program counter. Rolf reported occasional hangs of mpirun in very specific circumstances after all daemons were done. A review of MTT results indicates this may have been happening more generally in a small fraction of cases. 2010-07-17 21:03:27 +00:00
orte_cr.c Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac. 2010-10-24 18:35:54 +00:00
orte_cr.h Correct several export declarations. 2011-08-15 09:45:51 +00:00
orte_data_server.c By popular demand the epoch code is now disabled by default. 2011-08-26 22:16:14 +00:00
orte_data_server.h Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
orte_finalize.c * opal_atomic_trylock is documented to return 0 if the lock was acquired, 2011-10-11 18:43:45 +00:00
orte_globals.c Minimize the amount of topology info returned by the daemons. Most clusters, especially at scale, use the same node topology on every node, so there is no re 2011-11-01 18:43:10 +00:00
orte_globals.h Minimize the amount of topology info returned by the daemons. Most clusters, especially at scale, use the same node topology on every node, so there is no re 2011-11-01 18:43:10 +00:00
orte_init.c Provide a generic fix for the termination issue instead of r25248. The 2011-10-18 03:07:37 +00:00
orte_locks.c Start reducing our dependency on the event library by removing at least one instance where we use it to redirect the program counter. Rolf reported occasional hangs of mpirun in very specific circumstances after all daemons were done. A review of MTT results indicates this may have been happening more generally in a small fraction of cases. 2010-07-17 21:03:27 +00:00
orte_locks.h Start reducing our dependency on the event library by removing at least one instance where we use it to redirect the program counter. Rolf reported occasional hangs of mpirun in very specific circumstances after all daemons were done. A review of MTT results indicates this may have been happening more generally in a small fraction of cases. 2010-07-17 21:03:27 +00:00
orte_mca_params.c Minimize the amount of topology info returned by the daemons. Most clusters, especially at scale, use the same node topology on every node, so there is no re 2011-11-01 18:43:10 +00:00
orte_quit.c Provide a generic fix for the termination issue instead of r25248. The 2011-10-18 03:07:37 +00:00
orte_quit.h Start reducing our dependency on the event library by removing at least one instance where we use it to redirect the program counter. Rolf reported occasional hangs of mpirun in very specific circumstances after all daemons were done. A review of MTT results indicates this may have been happening more generally in a small fraction of cases. 2010-07-17 21:03:27 +00:00
orte_wait.c * opal_atomic_trylock is documented to return 0 if the lock was acquired, 2011-10-11 18:43:45 +00:00
orte_wait.h By popular demand the epoch code is now disabled by default. 2011-08-26 22:16:14 +00:00
runtime_internals.h Modify the accounting system to recycle jobids. Properly recover resources from nodes and jobs upon completion. Adjustments in several places were required to deal with sparsely populated job, node, and proc arrays as a result of this change. 2009-03-03 16:39:13 +00:00
runtime.h Provide a generic fix for the termination issue instead of r25248. The 2011-10-18 03:07:37 +00:00