openmpi

Author	SHA1	Message	Date
Ralph Castain	866edf6a89	Now that George has found his problem, we no longer need the bozo check. Interesting how these platform-specific issues surface... This commit was SVN r25493.	2011-11-18 17:43:14 +00:00
George Bosilca	b613c7eacb	Fix the issue with the round robin mapper. When mixing different precisions, one should manually promote the participants to the expected type. In this particular example as opal_list_get_size returns an unsigned long, the computation on the left side is translated to an unsigned. If the hostfile contains more nodes that what required (via the -np), this leads to a gigantic value for the balance, and breaks the round robin algorithm. This commit was SVN r25492.	2011-11-18 17:03:35 +00:00
Ralph Castain	1e5e9bde77	Add protection against a bozo case where we could end up in an infinite loop while calculating ranks This commit was SVN r25491.	2011-11-18 15:35:55 +00:00
George Bosilca	88d32312d6	The bind_level should be initialized to zero or weird things happens. I'm not yet sure how and why, but packing a uint8_t with opal_dss lead to weird values during unpack (except if the original value is already set to zero). This commit was SVN r25490.	2011-11-18 10:22:58 +00:00
George Bosilca	61f273b987	Do not tolerate uninitialized variables. This commit was SVN r25489.	2011-11-18 10:19:24 +00:00
Ralph Castain	b34acd0476	Grrr....get the correct number too! This commit was SVN r25478.	2011-11-15 11:11:47 +00:00
Ralph Castain	593fc388a9	Make it a little more obvious as to which nodes are from each topology by labeling them with a letter. This commit was SVN r25477.	2011-11-15 11:10:39 +00:00
Ralph Castain	6310361532	At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here: https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation. In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions: 1. I have at long last bowed my head in submission to the system admin's of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior. 2. both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation. 3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so. As Jeff stated, it is impossible to fully test a change of this size. 
I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes. This commit was SVN r25476.	2011-11-15 03:40:11 +00:00
Ralph Castain	c8e105bd8c	Remove stale code This commit was SVN r25475.	2011-11-14 23:39:23 +00:00
Ralph Castain	793f4c688f	Extend capability to support heterogeneous clusters with multiple topologies This commit was SVN r25474.	2011-11-13 23:23:09 +00:00
Ralph Castain	6b5e1b89cf	Turn off tree spawn as it doesn't currently work - will fix shortly. Add topology collection This commit was SVN r25472.	2011-11-11 23:42:36 +00:00
Ralph Castain	d008aeb531	Silence debug This commit was SVN r25471.	2011-11-11 16:42:45 +00:00
George Bosilca	3d318a4c26	Put the interface of our MPIR support in sync with the document accepted by the MPI Forum (http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf). This commit was SVN r25456.	2011-11-08 01:24:16 +00:00
George Bosilca	85a18dab74	MPIR_partial_attach_ok is not a volatile, but a constant. This commit was SVN r25455.	2011-11-08 01:00:38 +00:00
Ralph Castain	a3ce355a60	Revert r25453 and r25450 until we can fix the libevent2013 configure code - still not getting the includedir to eval correctly. This commit was SVN r25454. The following SVN revision numbers were found above: r25450 --> open-mpi/ompi@7f7d5c4f1f r25453 --> open-mpi/ompi@c9fe8c32e2	2011-11-07 16:23:44 +00:00
Samuel Gutierrez	3ea59cce96	minor cleanup to getenv_pmi.c. This commit was SVN r25449.	2011-11-07 03:18:07 +00:00
Samuel Gutierrez	e03bc93fb7	only use pmi grpcomm and pubsub during the direct launch case. use PMI environment variable to setup vpid in ess alps on cray xe systems. add pmi test code. This commit was SVN r25447.	2011-11-06 17:28:40 +00:00
Ralph Castain	34f0a27cb6	Initialize the locality info - at time of pmap creation, we at least know node locality This commit was SVN r25446.	2011-11-06 17:06:41 +00:00
Ralph Castain	729935dffb	Minor cleanups, mirroring what Jeff did to ompi_info This commit was SVN r25438.	2011-11-05 00:42:49 +00:00
Ralph Castain	fcee46b063	Add an option for printing a diffable process map for testing mappers This commit was SVN r25428.	2011-11-03 14:22:07 +00:00
Samuel Gutierrez	3fe7b3ee54	add PMI support to ess alps module. xt system guys: please yell at me if i missed something in cnos. This commit was SVN r25423.	2011-11-03 04:04:32 +00:00
Samuel Gutierrez	27b9bcfafd	update ess alps configuration file to include CNOS and PMI checks. some of the features committed here aren't being used, but they will be. also update orte_check_pmi.m4 to include missing call to action-if-not-found if --with-pmi is not specified or is disabled. This commit was SVN r25422.	2011-11-03 02:14:47 +00:00
Jeff Squyres	7f6f7bd0eb	Remove this component; twitter long ago switched to the oauth authentication, and no one has ever updated this component to match. It can be revived out of history if anyone cares. This commit was SVN r25421.	2011-11-02 21:04:49 +00:00
Ralph Castain	891027c10d	Cleanup error reports This commit was SVN r25420.	2011-11-02 18:34:19 +00:00
Ralph Castain	b2e2d24726	As in the rsh module, report failed daemons to the errmgr for proper cleanup This commit was SVN r25419.	2011-11-02 18:30:22 +00:00
Ralph Castain	3e4165fd8d	Cleanup includes This commit was SVN r25418.	2011-11-02 18:28:28 +00:00
Ralph Castain	b77552c45d	Cleanup some include files, return a silent error in open/select as the complaining component already output a message This commit was SVN r25416.	2011-11-02 17:42:06 +00:00
Ralph Castain	198e001554	Add another test This commit was SVN r25415.	2011-11-02 15:59:16 +00:00
Ralph Castain	55b996678e	Minor indentation changes This commit was SVN r25414.	2011-11-02 15:56:56 +00:00
Ralph Castain	f00753881e	Handle the case where mpirun -is- of the same topology as the compute nodes. This commit was SVN r25412.	2011-11-01 22:26:03 +00:00
Ralph Castain	d28dd55d33	Minimize the amount of topology info returned by the daemons. Most clusters, especially at scale, use the same node topology on every node, so there is no reason to return the topology from every daemon. Borrow a page from the --hetero-apps page and let users indicate that the node topology differs by adding a --hetero-nodes option to mpirun. If the option is set, then every daemon returns topology info. If not set, then only daemon vpid=1 returns it. We always want one daemon to return the topology as the head node is often different from the compute nodes. Having one daemon return the compute node topology allows us to detect any such difference. All compute nodes are then set to the same topology. This commit was SVN r25408.	2011-11-01 18:43:10 +00:00
Ralph Castain	14966e0f8f	Cleanup PMI startup - if a component isn't selected, it should finalize PMI IFF it started it. Otherwise, components that aren't selected can finalize PMI when it is in use by other parts of the system. This commit was SVN r25407.	2011-11-01 16:25:12 +00:00
Ralph Castain	71ed8e3cd3	Bring back the local node's binding capabilities along with its topology. Clean up indentation. This commit was SVN r25399.	2011-10-30 13:20:16 +00:00
Ralph Castain	d492b20975	Bozo check for topology info This commit was SVN r25398.	2011-10-30 11:49:38 +00:00
Ralph Castain	4232115a98	Ensure pruning remains within the current job/app being mapped. This commit was SVN r25397.	2011-10-30 00:02:20 +00:00
Ralph Castain	648c85b41b	Add a simple pattern mapper as an example of how to use the topology info to create desired mappings. Let the user specify a pattern based on resource types, and map that pattern across all available nodes as resources permit. Don't automatically display the topology for each node when --display-devel-map is set as it can overwhelm the reader. Use a separate flag --display-topo to get it. This commit was SVN r25396.	2011-10-29 15:12:45 +00:00
Ralph Castain	12a589130a	Add some debug This commit was SVN r25395.	2011-10-29 15:07:58 +00:00
Ralph Castain	965b04d1a5	Use the new utilities to get a topology that reflects available cpus This commit was SVN r25394.	2011-10-29 15:07:36 +00:00
Ralph Castain	e50bcbf028	Add the ability to specify a topology-containing xml file to describe the simulated nodes to support mapping tests against arbitrary topologies This commit was SVN r25388.	2011-10-29 02:01:11 +00:00
Ralph Castain	7fa5f82d70	Add simulator component to support testing of large scale mapping methods. Automatically sets do-not-resolve and do-not-launch, and creates however many nodes the user wants to simulate in the system. This commit was SVN r25386.	2011-10-28 23:48:53 +00:00
Ralph Castain	e2eb8d5f78	Remove bad param registration - that param was already registered as an int_name in another location. This commit was SVN r25381.	2011-10-28 19:14:43 +00:00
Josh Hursey	6726590b1c	Remove the 'ess_node_rank' accessor from here. This caused running under 'tm' to segv at the orteds. It just looks like this part of the component was not updated during r25331. It was removed from the 'env' and 'slurm' environments in that patch. It looks like 'tm' was updated, but did not get this particular piece. This commit was SVN r25380. The following SVN revision numbers were found above: r25331 --> open-mpi/ompi@b44f8d4b28	2011-10-28 17:41:35 +00:00
Josh Hursey	59ff1dbbfb	Fix indentation problem that caused a segv when running without regex. This was introduced in r25063. This commit was SVN r25379. The following SVN revision numbers were found above: r25063 --> open-mpi/ompi@e58623cd5b	2011-10-28 13:39:32 +00:00
Samuel Gutierrez	922e41a318	fix typo. use PMI_Initialized for init status instead of PMI_Init. This commit was SVN r25377.	2011-10-27 22:27:30 +00:00
Ralph Castain	951d72692c	Reverse the #if direction so we report daemon failure to the errmgr - otherwise, we just hang if a daemon fails to start. Reviewed with Josh. This commit was SVN r25366.	2011-10-25 19:09:52 +00:00
Ralph Castain	c55cba55a7	Totally trivial spelling fix This commit was SVN r25361.	2011-10-24 14:06:33 +00:00
Ralph Castain	955d8e7d46	Allow apps to use pmi when launched by mpirun, if desired, without affecting daemons This commit was SVN r25359.	2011-10-23 15:57:13 +00:00
Nathan Hjelm	e8af0d8589	don't use alps paffinity This commit was SVN r25358.	2011-10-21 22:52:03 +00:00
Abhishek Kulkarni	46952e9008	Fix C/R functionality in trunk. Intra-node checkpointing of a job now works as expected. Signed-off-by: Abhishek Kulkarni <adkulkar@osl.iu.edu> This commit was SVN r25357.	2011-10-21 22:07:35 +00:00
Nathan Hjelm	7b1172b346	need a terminating character in the decoded string This commit was SVN r25355.	2011-10-21 16:46:28 +00:00


3326 Commits