Commit graph

16246 commits

Author SHA1 Message Date
Ralph Castain
6310361532 At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here:
https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement

The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation.

In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions:

1. I have at long last bowed my head in submission to the system admins of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior.

2. Both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation.

3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so.
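
The new oversubscription default described in item 1 can be sketched as a small decision function. This is only an illustration under stated assumptions, not the actual ORTE mapper code: the type and function names (`alloc_source_t`, `oversub_override_t`, `oversubscribe_allowed`, `node_can_accept`) are hypothetical and do not appear in the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: where the node allocation came from. */
typedef enum {
    ALLOC_HOSTFILE,
    ALLOC_DASH_HOST,
    ALLOC_RESOURCE_MANAGER
} alloc_source_t;

/* Hypothetical: an explicit user flag that overrides the default. */
typedef enum {
    OVERRIDE_NONE,
    OVERRIDE_ALLOW,
    OVERRIDE_FORBID
} oversub_override_t;

/* An explicit override always wins. Otherwise, oversubscription is
 * allowed for hostfile/dash-host allocations and forbidden for
 * RM-allocated nodes. */
bool oversubscribe_allowed(alloc_source_t source,
                           oversub_override_t user_override)
{
    if (user_override == OVERRIDE_ALLOW) {
        return true;
    }
    if (user_override == OVERRIDE_FORBID) {
        return false;
    }
    return source != ALLOC_RESOURCE_MANAGER;
}

/* A node can take one more process if oversubscription is allowed
 * or a free slot remains. */
bool node_can_accept(int procs_on_node, int slots, bool oversubscribe_ok)
{
    assert(procs_on_node >= 0 && slots >= 0);
    return oversubscribe_ok || procs_on_node < slots;
}
```

With these definitions, a full node under an RM allocation rejects another process unless the user overrides the default, while the same node listed in a hostfile accepts it.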

As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes.

This commit was SVN r25476.
2011-11-15 03:40:11 +00:00
Ralph Castain
c8e105bd8c Remove stale code
This commit was SVN r25475.
2011-11-14 23:39:23 +00:00
Ralph Castain
793f4c688f Extend capability to support heterogeneous clusters with multiple topologies
This commit was SVN r25474.
2011-11-13 23:23:09 +00:00
Ralph Castain
6b5e1b89cf Turn off tree spawn as it doesn't currently work - will fix shortly. Add topology collection
This commit was SVN r25472.
2011-11-11 23:42:36 +00:00
Ralph Castain
d008aeb531 Silence debug
This commit was SVN r25471.
2011-11-11 16:42:45 +00:00
Jeff Squyres
e8dcad6017 This typo has been here since August 2005. :-)
This commit was SVN r25468.
2011-11-11 03:01:52 +00:00
Brian Barrett
45a27e4f9f For now, ignore LINK event
This commit was SVN r25467.
2011-11-11 02:49:03 +00:00
Jeff Squyres
face13157c Why would Barrier be in the hello world example? Looks like a really
old accidental commit.  :-)

This commit was SVN r25466.
2011-11-10 19:41:46 +00:00
Brad Benton
96395c916e de-tab'd
This commit was SVN r25465.
2011-11-09 19:45:12 +00:00
Brad Benton
0712b911a5 Updated IBM copyright
This commit was SVN r25464.
2011-11-09 19:38:53 +00:00
Mike Dubman
00c27afd52 fix pid
This commit was SVN r25463.
2011-11-09 17:53:59 +00:00
Shiqing Fan
fc46ed6438 Use the new libevent on Windows.
This commit was SVN r25462.
2011-11-09 14:05:35 +00:00
George Bosilca
bd55a19db4 Disable vector optimization for ICC v12.1.0 release 2011.6.233.
This commit was SVN r25461.
2011-11-08 21:23:30 +00:00
Nathan Hjelm
d603f31976 removed ptr member from seg_key union
This commit was SVN r25460.
2011-11-08 15:44:54 +00:00
Mike Dubman
71398b658e fix: OMPI_ERR_CONNECTION_FAILED available in v1.5, unavailable in trunk
This commit was SVN r25459.
2011-11-08 12:34:01 +00:00
Jeff Squyres
78538b701d Turns out that we're not even using that $includedir properly, so just
remove it.

This commit was SVN r25458.
2011-11-08 03:18:09 +00:00
Ralph Castain
a931c5b1eb Redo a patch from late last night that replaces libevent 2.0.7 with libevent 2.0.13 as our default event library. Cleanup the libevent renames to correctly state 2013 as our new version.
This commit was SVN r25457.
2011-11-08 01:36:15 +00:00
George Bosilca
3d318a4c26 Put the interface of our MPIR support in sync with the document accepted by the MPI
Forum (http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf).

This commit was SVN r25456.
2011-11-08 01:24:16 +00:00
George Bosilca
85a18dab74 MPIR_partial_attach_ok is not a volatile, but a constant.
This commit was SVN r25455.
2011-11-08 01:00:38 +00:00
Ralph Castain
a3ce355a60 Revert r25453 and r25450 until we can fix the libevent2013 configure code - still not getting the includedir to eval correctly.
This commit was SVN r25454.

The following SVN revision numbers were found above:
  r25450 --> open-mpi/ompi@7f7d5c4f1f
  r25453 --> open-mpi/ompi@c9fe8c32e2
2011-11-07 16:23:44 +00:00
Ralph Castain
c9fe8c32e2 Fix a bug in the configure.m4 to ensure we disable unused libevent modes. Edit their Makefile.am to remove bufferevent support since we don't use it either.
Sorry for mid-day correction.

This commit was SVN r25453.
2011-11-07 15:21:29 +00:00
Mike Dubman
4cf9e1323d fix: return correct error on connection failure
This commit was SVN r25452.
2011-11-07 06:13:17 +00:00
Samuel Gutierrez
1eb97a903e update plat files to include ugni btl.
This commit was SVN r25451.
2011-11-07 05:00:46 +00:00
Nathan Hjelm
7f7d5c4f1f RFC: upgrade to libevent 2.0.13 (removing 2.0.7) timeout. Removed libevent 2.0.7
This commit was SVN r25450.
2011-11-07 04:32:36 +00:00
Samuel Gutierrez
3ea59cce96 minor cleanup to getenv_pmi.c.
This commit was SVN r25449.
2011-11-07 03:18:07 +00:00
Nathan Hjelm
8962ce25b0 fixed some compiler errors caused by seg_key changes. osc/rdma may need to be updated to use btls that use 128 bit segment keys
This commit was SVN r25448.
2011-11-06 20:19:14 +00:00
Samuel Gutierrez
e03bc93fb7 only use pmi grpcomm and pubsub during the direct launch case. use PMI environment variable to setup vpid in ess alps on cray xe systems. add pmi test code.
This commit was SVN r25447.
2011-11-06 17:28:40 +00:00
Ralph Castain
34f0a27cb6 Initialize the locality info - at time of pmap creation, we at least know node locality
This commit was SVN r25446.
2011-11-06 17:06:41 +00:00
Nathan Hjelm
520a7c570e changes to seg_key needed for a new btl
This commit was SVN r25445.
2011-11-06 16:19:09 +00:00
Ralph Castain
729935dffb Minor cleanups, mirroring what Jeff did to ompi_info
This commit was SVN r25438.
2011-11-05 00:42:49 +00:00
Jeff Squyres
38451d4972 Add the MPI API version to the ompi_info output. How did we never
have this in there before?

This commit was SVN r25437.
2011-11-04 23:30:59 +00:00
Jeff Squyres
b43deb7091 Update for SVN 1.7.x, which only has a single top-level .svn directory
(no more .svn directories scattered throughout the tree)

This commit was SVN r25435.
2011-11-04 19:52:12 +00:00
Rolf vandeVaart
f777fe8eba Change tab to spaces.
This commit was SVN r25433.
2011-11-04 17:18:30 +00:00
Jeff Squyres
f08b8bf2d4 Per this thread:
http://www.open-mpi.org/community/lists/devel/2011/10/9878.php

I am making a final decision on what happens
when an MCA parameter is re-registered with a different type.  In
developer builds (i.e., OPAL_ENABLE_DEBUG==1), a show_help message
will be displayed.  In all builds, an error status will be returned.
Specifically, the logic looks like this:

{{{
    if (detect_reregistration_with_type_change) {
#if OPAL_ENABLE_DEBUG
        opal_show_help(...);
#endif
        return OPAL_ERR_VALUE_OUT_OF_BOUNDS;
    }
}}}

If someone would like to change this behavior, they are welcome to do
so.  :-) I am committing this so that ''some'' action occurs (rather
than talking about the issue and then nothing happens).
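
A self-contained sketch of the same logic, for illustration only. It does not use the real MCA parameter API: the registry, the `demo_*` names, and the `DEMO_*` constants are hypothetical stand-ins (the real code returns `OPAL_ERR_VALUE_OUT_OF_BOUNDS` and calls `opal_show_help`, as stated above).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for OPAL_ENABLE_DEBUG. */
#ifndef DEMO_ENABLE_DEBUG
#define DEMO_ENABLE_DEBUG 1
#endif

enum {
    DEMO_SUCCESS = 0,
    DEMO_ERR_VALUE_OUT_OF_BOUNDS = -1,
    DEMO_ERR_OUT_OF_RESOURCE = -2
};

typedef enum { DEMO_TYPE_INT, DEMO_TYPE_STRING } demo_type_t;

typedef struct {
    const char *name;   /* not copied; caller keeps the string alive */
    demo_type_t type;
} demo_param_t;

#define DEMO_MAX_PARAMS 16
static demo_param_t demo_params[DEMO_MAX_PARAMS];
static int demo_num_params = 0;

/* Register a parameter. Re-registering with the same type succeeds;
 * re-registering with a different type returns an error in all builds
 * and additionally prints a message in debug builds. */
int demo_param_register(const char *name, demo_type_t type)
{
    assert(name != NULL);
    for (int i = 0; i < demo_num_params; ++i) {
        if (strcmp(demo_params[i].name, name) == 0) {
            if (demo_params[i].type != type) {
#if DEMO_ENABLE_DEBUG
                fprintf(stderr,
                        "parameter \"%s\" re-registered with a new type\n",
                        name);
#endif
                return DEMO_ERR_VALUE_OUT_OF_BOUNDS;
            }
            return DEMO_SUCCESS;
        }
    }
    if (demo_num_params == DEMO_MAX_PARAMS) {
        return DEMO_ERR_OUT_OF_RESOURCE;
    }
    demo_params[demo_num_params].name = name;
    demo_params[demo_num_params].type = type;
    ++demo_num_params;
    return DEMO_SUCCESS;
}
```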

This commit was SVN r25432.
2011-11-04 14:16:49 +00:00
Christopher Yeoh
fb57a74a40 Removes a pointless memmove which, because of a previous memcpy, will always
have identical source and destination pointers. See #2871.
Plugs a couple of minor memory leaks related to remote qp info.

This commit was SVN r25431.
2011-11-04 00:15:08 +00:00
Christopher Yeoh
7e7701e7fc Removes misleading debug warning from opal_free when a NULL
pointer is passed to it.
Fixes trac:2884

This commit was SVN r25430.

The following Trac tickets were found above:
  Ticket 2884 --> https://svn.open-mpi.org/trac/ompi/ticket/2884
2011-11-03 23:57:26 +00:00
Jeff Squyres
886a9d589b Custom patch from Brice for the hwloc-1.2.2ompi distro, per an issue
that Chris Yeoh/IBM found.  See the thread below for more info:

  http://www.open-mpi.org/community/lists/hwloc-devel/2011/11/2521.php

This commit was SVN r25429.
2011-11-03 14:53:22 +00:00
Ralph Castain
fcee46b063 Add an option for printing a diffable process map for testing mappers
This commit was SVN r25428.
2011-11-03 14:22:07 +00:00
Ralph Castain
5f73b874d9 Update ignores
This commit was SVN r25427.
2011-11-03 13:57:09 +00:00
Jeff Squyres
1d6d39d2ea Missed this free/re-strdup
This commit was SVN r25426.
2011-11-03 11:31:37 +00:00
Jeff Squyres
6139256e45 v may get incremented, so be sure to save the ''original'' strdup'ed
pointer and free ''that'' -- not the (possibly incremented) pointer
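
The pattern behind this fix can be sketched as follows. This is a generic illustration rather than the code touched by the commit: the function name `copy_without_leading_spaces` and the whitespace-skipping loop are assumptions chosen only to show a working pointer that moves away from the start of its allocation.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `input`
 * with leading spaces skipped, or NULL on allocation failure.
 * The caller frees the result. */
char *copy_without_leading_spaces(const char *input)
{
    assert(input != NULL);

    char *orig = strdup(input);  /* remember what strdup returned */
    if (orig == NULL) {
        return NULL;
    }

    char *v = orig;
    while (*v == ' ') {
        ++v;                     /* v may no longer equal orig */
    }

    char *result = strdup(v);

    free(orig);                  /* free the original pointer, never v */
    return result;
}
```

Passing the incremented `v` to `free` would be undefined behavior, because `free` must receive exactly the pointer returned by the allocator.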

This commit was SVN r25425.
2011-11-03 11:23:17 +00:00
Mike Dubman
7595a80a63 fix self pid
This commit was SVN r25424.
2011-11-03 06:46:20 +00:00
Samuel Gutierrez
3fe7b3ee54 add PMI support to ess alps module. xt system guys: please yell at me if i missed something in cnos.
This commit was SVN r25423.
2011-11-03 04:04:32 +00:00
Samuel Gutierrez
27b9bcfafd update ess alps configuration file to include CNOS and PMI checks. some of the features committed here aren't being used, but they will be. also update orte_check_pmi.m4 to include missing call to action-if-not-found if --with-pmi is not specified or is disabled.
This commit was SVN r25422.
2011-11-03 02:14:47 +00:00
Jeff Squyres
7f6f7bd0eb Remove this component; Twitter long ago switched to OAuth
authentication, and no one has ever updated this component to match.
It can be revived out of history if anyone cares.

This commit was SVN r25421.
2011-11-02 21:04:49 +00:00
Ralph Castain
891027c10d Cleanup error reports
This commit was SVN r25420.
2011-11-02 18:34:19 +00:00
Ralph Castain
b2e2d24726 As in the rsh module, report failed daemons to the errmgr for proper cleanup
This commit was SVN r25419.
2011-11-02 18:30:22 +00:00
Ralph Castain
3e4165fd8d Cleanup includes
This commit was SVN r25418.
2011-11-02 18:28:28 +00:00
Ralph Castain
1bfc2bb424 Minor cleanup
This commit was SVN r25417.
2011-11-02 18:24:19 +00:00
Ralph Castain
b77552c45d Cleanup some include files, return a silent error in open/select as the complaining component already output a message
This commit was SVN r25416.
2011-11-02 17:42:06 +00:00