Ralph Castain
6310361532
At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here:
https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement
The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps some initial curiosity-motivated experimentation.
In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions:
1. I have at long last bowed my head in submission to the system admins of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior.
2. Both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation.
3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so.
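The new oversubscription default described in item 1 can be sketched roughly as follows. This is only an illustration: the enum and function names (`alloc_source_t`, `oversub_override_t`, `oversubscribe_allowed`, `placement_ok`) are hypothetical and are not taken from the Open MPI source, and the actual override flags are not modeled beyond "force allow" and "force deny".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not Open MPI's internal API. */
typedef enum {
    ALLOC_FROM_HOSTFILE,  /* nodes given via hostfile or dash-host */
    ALLOC_FROM_RM         /* nodes handed out by a resource manager */
} alloc_source_t;

typedef enum {
    OVERSUB_UNSET,        /* user gave no override flag */
    OVERSUB_FORCE_ALLOW,  /* user explicitly allowed oversubscription */
    OVERSUB_FORCE_DENY    /* user explicitly forbade oversubscription */
} oversub_override_t;

/* Decide whether oversubscription is permitted.  An explicit override
 * always wins; otherwise the default depends on where the nodes came from. */
static bool oversubscribe_allowed(alloc_source_t src, oversub_override_t ovr)
{
    if (OVERSUB_FORCE_ALLOW == ovr) {
        return true;
    }
    if (OVERSUB_FORCE_DENY == ovr) {
        return false;
    }
    /* Defaults: hostfile/dash-host -> allow, RM-allocated -> do NOT allow */
    return ALLOC_FROM_HOSTFILE == src;
}

/* Check whether nprocs processes may be placed on a node with `slots`
 * allocated slots under the policy above. */
static bool placement_ok(int nprocs, int slots,
                         alloc_source_t src, oversub_override_t ovr)
{
    assert(nprocs >= 0 && slots >= 0);
    return nprocs <= slots || oversubscribe_allowed(src, ovr);
}
```

Under this sketch, placing 5 processes on a 4-slot node succeeds by default for a hostfile allocation and fails by default for an RM allocation, unless an override says otherwise.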
As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes.
This commit was SVN r25476.
2011-11-15 03:40:11 +00:00
Ralph Castain
c8e105bd8c
Remove stale code
This commit was SVN r25475.
2011-11-14 23:39:23 +00:00
Ralph Castain
793f4c688f
Extend capability to support heterogeneous clusters with multiple topologies
This commit was SVN r25474.
2011-11-13 23:23:09 +00:00
Ralph Castain
6b5e1b89cf
Turn off tree spawn as it doesn't currently work - will fix shortly. Add topology collection
This commit was SVN r25472.
2011-11-11 23:42:36 +00:00
Ralph Castain
d008aeb531
Silence debug
This commit was SVN r25471.
2011-11-11 16:42:45 +00:00
Jeff Squyres
e8dcad6017
This typo has been here since August 2005. :-)
This commit was SVN r25468.
2011-11-11 03:01:52 +00:00
Brian Barrett
45a27e4f9f
For now, ignore LINK event
This commit was SVN r25467.
2011-11-11 02:49:03 +00:00
Jeff Squyres
face13157c
Why would Barrier be in the hello world example? Looks like a really
old accidental commit. :-)
This commit was SVN r25466.
2011-11-10 19:41:46 +00:00
Brad Benton
96395c916e
de-tab'd
This commit was SVN r25465.
2011-11-09 19:45:12 +00:00
Brad Benton
0712b911a5
Updated IBM copyright
This commit was SVN r25464.
2011-11-09 19:38:53 +00:00
Mike Dubman
00c27afd52
fix pid
This commit was SVN r25463.
2011-11-09 17:53:59 +00:00
Shiqing Fan
fc46ed6438
Use the new libevent on Windows.
This commit was SVN r25462.
2011-11-09 14:05:35 +00:00
George Bosilca
bd55a19db4
Disable vector optimization for ICC v12.1.0 release 2011.6.233.
This commit was SVN r25461.
2011-11-08 21:23:30 +00:00
Nathan Hjelm
d603f31976
removed ptr member from seg_key union
This commit was SVN r25460.
2011-11-08 15:44:54 +00:00
Mike Dubman
71398b658e
fix: OMPI_ERR_CONNECTION_FAILED available in v1.5, unavailable in trunk
This commit was SVN r25459.
2011-11-08 12:34:01 +00:00
Jeff Squyres
78538b701d
Turns out that we're not even using that $includedir properly, so just
remove it.
This commit was SVN r25458.
2011-11-08 03:18:09 +00:00
Ralph Castain
a931c5b1eb
Redo a patch from late last night that replaces libevent 2.0.7 with libevent 2.0.13 as our default event library. Clean up the libevent renames to correctly state 2013 as our new version.
This commit was SVN r25457.
2011-11-08 01:36:15 +00:00
George Bosilca
3d318a4c26
Put the interface of our MPIR support in sync with the document accepted by the MPI
Forum (http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf ).
This commit was SVN r25456.
2011-11-08 01:24:16 +00:00
George Bosilca
85a18dab74
MPIR_partial_attach_ok is not a volatile, but a constant.
This commit was SVN r25455.
2011-11-08 01:00:38 +00:00
Ralph Castain
a3ce355a60
Revert r25453 and r25450 until we can fix the libevent2013 configure code - still not getting the includedir to eval correctly.
This commit was SVN r25454.
The following SVN revision numbers were found above:
r25450 --> open-mpi/ompi@7f7d5c4f1f
r25453 --> open-mpi/ompi@c9fe8c32e2
2011-11-07 16:23:44 +00:00
Ralph Castain
c9fe8c32e2
Fix a bug in the configure.m4 to ensure we disable unused libevent modes. Edit their Makefile.am to remove bufferevent support since we don't use it either.
Sorry for mid-day correction.
This commit was SVN r25453.
2011-11-07 15:21:29 +00:00
Mike Dubman
4cf9e1323d
fix: return correct error on connection failure
This commit was SVN r25452.
2011-11-07 06:13:17 +00:00
Samuel Gutierrez
1eb97a903e
update plat files to include ugni btl.
This commit was SVN r25451.
2011-11-07 05:00:46 +00:00
Nathan Hjelm
7f7d5c4f1f
RFC: upgrade to libevent 2.0.13 (removing 2.0.7) timeout. Removed libevent 2.0.7
This commit was SVN r25450.
2011-11-07 04:32:36 +00:00
Samuel Gutierrez
3ea59cce96
minor cleanup to getenv_pmi.c.
This commit was SVN r25449.
2011-11-07 03:18:07 +00:00
Nathan Hjelm
8962ce25b0
fixed some compiler errors caused by seg_key changes. osc/rdma may need to be updated to use btls that use 128 bit segment keys
This commit was SVN r25448.
2011-11-06 20:19:14 +00:00
Samuel Gutierrez
e03bc93fb7
only use pmi grpcomm and pubsub during the direct launch case. use PMI environment variable to setup vpid in ess alps on cray xe systems. add pmi test code.
This commit was SVN r25447.
2011-11-06 17:28:40 +00:00
Ralph Castain
34f0a27cb6
Initialize the locality info - at time of pmap creation, we at least know node locality
This commit was SVN r25446.
2011-11-06 17:06:41 +00:00
Nathan Hjelm
520a7c570e
changes to seg_key needed for a new btl
This commit was SVN r25445.
2011-11-06 16:19:09 +00:00
Ralph Castain
729935dffb
Minor cleanups, mirroring what Jeff did to ompi_info
This commit was SVN r25438.
2011-11-05 00:42:49 +00:00
Jeff Squyres
38451d4972
Add the MPI API version to the ompi_info output. How did we never
have this in there before?
This commit was SVN r25437.
2011-11-04 23:30:59 +00:00
Jeff Squyres
b43deb7091
Update for SVN 1.7.x, which only has a single top-level .svn directory
(no more .svn directories scattered throughout the tree)
This commit was SVN r25435.
2011-11-04 19:52:12 +00:00
Rolf vandeVaart
f777fe8eba
Change tab to spaces.
This commit was SVN r25433.
2011-11-04 17:18:30 +00:00
Jeff Squyres
f08b8bf2d4
Per this thread:
http://www.open-mpi.org/community/lists/devel/2011/10/9878.php
I am making a final decision on what happens
when an MCA parameter is re-registered and changes types. In
developer builds (i.e., OPAL_ENABLE_DEBUG==1), a show_help message
will be displayed. In all builds, an error status will be returned.
Specifically, the logic looks like this:
{{{
if (detected_reregistration_with_type_change) {
#if OPAL_ENABLE_DEBUG
    opal_show_help(...);
#endif
    return OPAL_ERR_VALUE_OUT_OF_BOUNDS;
}
}}}
If someone would like to change this behavior, they are welcome to do
so. :-) I am committing this so that ''some'' action occurs (rather
than talking about the issue and then nothing happens).
This commit was SVN r25432.
2011-11-04 14:16:49 +00:00
Christopher Yeoh
fb57a74a40
Removes pointless memmove which because of a previous memcpy will always
have identical source and destination pointers. See #2871
Plugs a couple of minor memory leaks related to remote qp info
This commit was SVN r25431.
2011-11-04 00:15:08 +00:00
Christopher Yeoh
7e7701e7fc
Removes misleading debug warning from opal_free when a NULL
pointer is passed to it.
Fixes trac:2884
This commit was SVN r25430.
The following Trac tickets were found above:
Ticket 2884 --> https://svn.open-mpi.org/trac/ompi/ticket/2884
2011-11-03 23:57:26 +00:00
Jeff Squyres
886a9d589b
Custom patch from Brice for the hwloc-1.2.2ompi distro, per an issue
that Chris Yeoh/IBM found. See the thread below for more info:
http://www.open-mpi.org/community/lists/hwloc-devel/2011/11/2521.php
This commit was SVN r25429.
2011-11-03 14:53:22 +00:00
Ralph Castain
fcee46b063
Add an option for printing a diffable process map for testing mappers
This commit was SVN r25428.
2011-11-03 14:22:07 +00:00
Ralph Castain
5f73b874d9
Update ignores
This commit was SVN r25427.
2011-11-03 13:57:09 +00:00
Jeff Squyres
1d6d39d2ea
Missed this free/re-strdup
This commit was SVN r25426.
2011-11-03 11:31:37 +00:00
Jeff Squyres
6139256e45
v may get incremented, so be sure to save the ''original'' strdup'ed
pointer and free ''that'' -- not the (possibly incremented) pointer
This commit was SVN r25425.
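The pitfall this commit fixes can be illustrated with a minimal sketch. The helper name `dup_without_leading_space` and the whitespace-skipping logic are hypothetical and are not the code that was changed; the point is only that the pointer returned by `strdup()` is saved and freed, rather than the working pointer that may have been advanced.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict C modes */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `value` with
 * leading whitespace skipped.  The caller frees the result. */
static char *dup_without_leading_space(const char *value)
{
    char *orig, *v, *result;

    assert(NULL != value);

    /* Save the pointer returned by strdup(); this is what must be freed. */
    orig = strdup(value);
    if (NULL == orig) {
        return NULL;
    }

    /* v is a working pointer and may get incremented past orig. */
    v = orig;
    while ('\0' != *v && isspace((unsigned char) *v)) {
        ++v;
    }

    result = strdup(v);

    /* Free the original pointer, not the (possibly incremented) v:
     * passing an interior pointer to free() is undefined behavior. */
    free(orig);
    return result;
}
```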
2011-11-03 11:23:17 +00:00
Mike Dubman
7595a80a63
fix self pid
This commit was SVN r25424.
2011-11-03 06:46:20 +00:00
Samuel Gutierrez
3fe7b3ee54
add PMI support to ess alps module. xt system guys: please yell at me if i missed something in cnos.
This commit was SVN r25423.
2011-11-03 04:04:32 +00:00
Samuel Gutierrez
27b9bcfafd
update ess alps configuration file to include CNOS and PMI checks. some of the features committed here aren't being used, but they will be. also update orte_check_pmi.m4 to include missing call to action-if-not-found if --with-pmi is not specified or is disabled.
This commit was SVN r25422.
2011-11-03 02:14:47 +00:00
Jeff Squyres
7f6f7bd0eb
Remove this component; Twitter long ago switched to OAuth
authentication, and no one has ever updated this component to match.
It can be revived out of history if anyone cares.
This commit was SVN r25421.
2011-11-02 21:04:49 +00:00
Ralph Castain
891027c10d
Cleanup error reports
This commit was SVN r25420.
2011-11-02 18:34:19 +00:00
Ralph Castain
b2e2d24726
As in the rsh module, report failed daemons to the errmgr for proper cleanup
This commit was SVN r25419.
2011-11-02 18:30:22 +00:00
Ralph Castain
3e4165fd8d
Cleanup includes
This commit was SVN r25418.
2011-11-02 18:28:28 +00:00
Ralph Castain
1bfc2bb424
Minor cleanup
This commit was SVN r25417.
2011-11-02 18:24:19 +00:00
Ralph Castain
b77552c45d
Cleanup some include files, return a silent error in open/select as the complaining component already output a message
This commit was SVN r25416.
2011-11-02 17:42:06 +00:00