602 commits

Author SHA1 Message Date
Jeff Squyres
807dc5383d Check for a function that is only available in recent versions of the
IBCM library.  Fixes trac:1280.

This commit was SVN r18397.

The following Trac tickets were found above:
  Ticket 1280 --> https://svn.open-mpi.org/trac/ompi/ticket/1280
2008-05-07 11:51:55 +00:00
Pak Lui
f5311903ee Correct the check with AC_LINK_IFELSE per Jeff's suggestion
This commit was SVN r18368.
2008-05-05 02:13:30 +00:00
Jeff Squyres
ba5615a18f Merge in /tmp-public/cpc3 branch to trunk. oob/xoob still remains the
default CPC.

This commit was SVN r18356.
2008-05-02 11:52:33 +00:00
Jeff Squyres
357428f82f Per http://www.open-mpi.org/community/lists/devel/2008/04/3778.php, Ralph W.'s suggestion to remove an unnecessary escape
This commit was SVN r18354.
2008-05-01 22:33:49 +00:00
Jeff Squyres
518bd99e17 Per thread started here:
http://www.open-mpi.org/community/lists/users/2008/04/5483.php

Make the error message a bit more user-friendly.

This commit was SVN r18293.
2008-04-25 11:09:43 +00:00
Jeff Squyres
a198971fa2 Temporarily disable Solaris ports support in libevent. Refs trac:1273
This commit was SVN r18199.

The following Trac tickets were found above:
  Ticket 1273 --> https://svn.open-mpi.org/trac/ompi/ticket/1273
2008-04-17 23:14:43 +00:00
Jeff Squyres
939d50dff6 Minor configure help message fix. Thanks Bernhard Fischer
This commit was SVN r18077.
2008-04-02 22:55:38 +00:00
Rainer Keller
e1e13631cc - Starting with gcc-4.4, the compiler does not reject unrecognized
   warning options starting with -Wno- (if there are no other
   warnings).
   Try it with your favorite warning: -Wno-britney alone will not
   fail, while it will be recognized as faulty if you also pass
   -Wno-britney -Wspears....

This commit was SVN r18070.
2008-04-02 07:44:17 +00:00
George Bosilca
8e8b8950ef Add support for Interix.
This commit was SVN r17983.
2008-03-26 23:20:33 +00:00
Jeff Squyres
314ab2c6e7 Update internal libevent to upstream (v1.4.2-rc + OMPI changes).
Greatly reduce the number of "foo" -> "opal_foo" symbol renames in the
libevent source, and instead greatly expand the event_rename.h file
that uses preprocessor macros to make all public symbols be
"opal_foo".

This commit was SVN r17923.
2008-03-23 12:33:04 +00:00
Jeff Squyres
ace1717ca7 Patch from Brian to add in proper linker libraries
This commit was SVN r17919.
2008-03-21 23:00:54 +00:00
George Bosilca
cce542dd73 Don't do anything if the compile step failed. Make the correct
detection on Windows.

This commit was SVN r17791.
2008-03-07 23:57:56 +00:00
George Bosilca
79d292fe31 Do the microsoft checks.
This commit was SVN r17790.
2008-03-07 22:35:26 +00:00
George Bosilca
023fa2663d Typos.
This commit was SVN r17788.
2008-03-07 21:13:20 +00:00
Ralph Castain
b104a59b08 Remove obsolescent configure option
This commit was SVN r17753.
2008-03-06 03:09:42 +00:00
Pak Lui
4dd5683715 Typo in help message
This commit was SVN r17743.
2008-03-05 16:02:33 +00:00
Matthias Jurenz
07bbdd0de0 Re-enable building of contrib packages by default (the VT configury issues are fixed)
This commit was SVN r17740.
2008-03-05 15:30:50 +00:00
Tim Prins
1b34620d8e Make the default to enable symbol visibility.
Fixes trac:1222

This commit was SVN r17712.

The following Trac tickets were found above:
  Ticket 1222 --> https://svn.open-mpi.org/trac/ompi/ticket/1222
2008-03-05 01:30:32 +00:00
Jeff Squyres
8189fcc7d5 Back out r17702; it went very badly.
This commit was SVN r17704.

The following SVN revision numbers were found above:
  r17702 --> open-mpi/ompi@3df754ebd7
2008-03-05 00:42:39 +00:00
Jeff Squyres
8e631d4dc0 Suggestions from Ralf W. to use the official git HTTP mirrors to get
the latest config.sub and config.guess.

This commit was SVN r17695.
2008-03-04 21:22:51 +00:00
Jeff Squyres
6aba701f65 Change the default to ''not'' build any contrib packages by default
(per consensus on the devel list, at least until the VT configury
issues are fixed).

This commit was SVN r17683.
2008-03-04 13:43:12 +00:00
Matthias Jurenz
70fe703057 set OMPI_CONTRIB_DIST_SUBDIRS only if the contributed software is enabled
This commit was SVN r17680.
2008-03-03 15:59:52 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Galen Shipman
44003a41f2 Update common_portals to allow using portals interconnect with a modex rather
than relying on cnos to get the nid/pid map. 

This commit was SVN r17588.
2008-02-25 19:17:21 +00:00
Josh Hursey
99144db970 Improve checkpoint/restart support by allowing a checkpoint to progress when the process is *not* in the MPI library. This involves creating a separate thread for polling for a checkpoint request. This thread is active when the MPI process is not in the MPI library, and paused when the MPI process is in the library.
Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI.

Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave.

Fixed the configure check for {{{--enable-ft-thread}}} so it has a known dependence on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}).

Added a line for Checkpoint/Restart support to {{{ompi_info}}}.

Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime.

There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime.

This commit was SVN r17516.
2008-02-19 22:15:52 +00:00
Jeff Squyres
5bb1e5151f Suggestions/patches from Brian to make stuff better:
 * Include all the stuff that is necessary for running autogen.sh in a
   distribution tarball.
 * Remove from config/Makefile.am's EXTRA_DIST that which is
   automatically included in the tarball in recent versions of
   Automake (i.e., all the m4 files that are acincluded).
 * Make ROMIO's configure script look for something that is actually
   included in the tarball.

Fixes trac:1025.

This commit was SVN r17505.

The following Trac tickets were found above:
  Ticket 1025 --> https://svn.open-mpi.org/trac/ompi/ticket/1025
2008-02-19 01:49:52 +00:00
Galen Shipman
cec3d96a94 Configure changes for Cray XT CNL (adds --with-alps param); auto detection is
not included (or desired).

This commit was SVN r17481.
2008-02-17 19:02:36 +00:00
Matthias Jurenz
12782ba700 Added Ralf's patch to make future contrib integration easier. Thanks, Ralf!
This commit was SVN r17426.
2008-02-12 11:48:01 +00:00
Pavel Shamis
f0c478e7e0 XRC - replacing the new old API with new one.
This commit was SVN r17369.
2008-02-04 14:03:38 +00:00
Andreas Knüpfer
c53e19be46 bringing VampirTrace integration to the trunk
This commit was SVN r17278.
2008-01-28 08:39:48 +00:00
Jeff Squyres
fe6ba96dd6 Be a little friendlier for mercurial checkouts.
This commit was SVN r17271.
2008-01-28 03:04:53 +00:00
Jeff Squyres
9b1b27fa8d Allow the get_version script to be run under ksh as well. Thanks to Elan Ruusamae for pointing out the problem; thanks to Ralf Wildenhues for providing a shell-independent fix.
This commit was SVN r17246.
2008-01-26 00:01:03 +00:00
Rainer Keller
17906c008f - Take this night's changes to .m4: we no longer have pml-teg, so delete refs.
This commit was SVN r17226.
2008-01-25 09:32:33 +00:00
Jeff Squyres
2227d5ec4a Add configure check for struct ibv_device.transport_type, which was added in OFED v1.2. Still need to fix up oob and rdma_cm cpc's to do something better with this information...
This commit was SVN r17198.
2008-01-24 12:14:21 +00:00
Jon Mason
a0d4122606 The new cpc selection framework is now in place. The patch below allows
for dynamic selection of cpc methods based on what is available.  It
also allows for inclusion/exclusions of methods.  It even further allows
for modifying the priorities of certain cpc methods to better determine
the optimal cpc method.

This patch also contains XRC compile time disablement (per Jeff's
patch).

At a high level, the cpc selection works by walking through each cpc
and allowing it to test to see if it is permissible to run on this
mpirun.  It returns a priority if it is permissible or a -1 if not.  All
of the cpc names and priorities are rolled into a string.  This string
is then encapsulated in a message and passed around all the ompi
processes.  Once received and unpacked, the list received is compared
to a local copy of the list.  The connection method is chosen by
comparing the lists passed around to all nodes via modex with the list
generated locally.  Any non-negative number is a potentially valid
connection method.  The method below of determining the optimal
connection method is to take the cross-section of the two lists.  The
highest single value (and the other side being non-negative) is selected
as the cpc method.
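
The selection rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the patch: the function name `choose_cpc`, the dictionary representation of the lists, the tie-breaking by name, and the example priorities are all assumptions made for the example.

```python
def choose_cpc(local, remote):
    """Pick a connection method from two {name: priority} maps.

    A priority of -1 means the method is not permissible on that side.
    Returns the chosen method name, or None if no method is usable on both sides.
    """
    best_name, best_prio = None, -1
    # Only methods present on both sides are candidates (the cross-section).
    for name in sorted(local.keys() & remote.keys()):
        lp, rp = local[name], remote[name]
        if lp < 0 or rp < 0:
            continue  # one side cannot use this method
        prio = max(lp, rp)  # the highest single value decides
        if prio > best_prio:
            best_name, best_prio = name, prio
    return best_name


# Hypothetical priorities, for illustration only.
local = {"oob": 50, "xoob": -1, "rdma_cm": 30}
remote = {"oob": 10, "xoob": 60, "rdma_cm": 70}
chosen = choose_cpc(local, remote)
```

Here `xoob` is skipped because one side reported -1, and `rdma_cm` is chosen because 70 is the highest single value among the methods that are non-negative on both sides.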

svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ .

This commit was SVN r17138.
2008-01-14 23:22:03 +00:00
Ethan Mallove
cb7c435a9c Added OMPI_WHICH macro as an alternative to using {{{`which
<prog>`}}} in `configure`. It is preferable to simply using {{{`which
<prog>`}}} because backticks (`) (aka backquotes) invoke a sub-shell
which may source a "noisy" `~/.whatever` file, and we do not want the
error messages to be part of the assignment in {{{foo=`which
<prog>`}}}.
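
One way to locate a program without spawning a sub-shell is to walk the search path directly. The sketch below shows that idea in Python; it is not the m4 macro itself, and the name `find_in_path` is hypothetical.

```python
import os


def find_in_path(prog, path=None):
    """Return the first executable file named prog on the search path, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty path entries
        candidate = os.path.join(directory, prog)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Because no shell start-up file is sourced, nothing other than the resolved path (or `None`) can end up in the result.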

This commit was SVN r16955.
2007-12-14 02:39:58 +00:00
Jeff Squyres
8320a491fe Fix problem when $CC or $CXX were multiple tokens. A similar fix
went in for $F77 and $FC long ago.

Thanks to Brian Barrett for noticing and submitting a patch.

This commit was SVN r16864.
2007-12-06 11:38:35 +00:00
Gleb Natapov
bd47da4699 Initial XRC support by Mellanox.
This commit was SVN r16787.
2007-11-28 07:18:59 +00:00
Jeff Squyres
71715b05ec Add missing $LDFLAGS in the fortran linker line. This missing flag
erroneously caused the test to fail on Cray systems.

This commit was SVN r16777.
2007-11-27 23:49:08 +00:00
George Bosilca
d0f30e521b After the 10.5.1 update this bug is still valid. Remove the -g from all
Leopard versions (until they fix it).

This commit was SVN r16762.
2007-11-21 03:10:05 +00:00
Jeff Squyres
e491318081 Per #1181, make our use of rm be consistent with the rest of AC/AM.
LT uses $RM, but AC/AM appear to use "rm ...".  So we'll go with
that.

This commit was SVN r16672.
2007-11-06 12:20:58 +00:00
Jeff Squyres
33257f2b56 Remove -g from CCASFLAGS if on OS X Leopard. Fixes trac:1179.
This commit was SVN r16671.

The following Trac tickets were found above:
  Ticket 1179 --> https://svn.open-mpi.org/trac/ompi/ticket/1179
2007-11-06 12:02:11 +00:00
Jeff Squyres
748fc31906 Change everywhere we do a "rm -f conftest*" to "rm -rf conftest" to
cover the case where a subdirectory is also built that needs to be
removed.

Note that there are other macros that we don't control (AC, AM, and/or
LT) that also exhibit this problem that we cannot fix.  :-\

Fixes trac:1180.

This commit was SVN r16669.

The following Trac tickets were found above:
  Ticket 1180 --> https://svn.open-mpi.org/trac/ompi/ticket/1180
2007-11-06 01:32:42 +00:00
Ethan Mallove
005652c9d4 * Embed ident strings into the Open MPI libraries using one of the following
methods (in order of precedence):
  1. #pragma ident <ident string> (e.g., Intel and Sun)
  2. #ident <ident string> (e.g., GCC)
  3. static const char ident[] = <ident string> (all others)
By default, the ident string used is the standard Open MPI version string. Only
the following libraries will get the embedded version strings (e.g., DSOs will
not):
  * libmpi.so
  * libmpi_cxx.so
  * libmpi_f77.so
  * libopen-pal.so
  * libopen-rte.so
* Added two new configure options:
  * `--with-package-name="STRING"` (defaults to "Open MPI username@hostname
    Distribution"). `STRING` is displayed by `ompi_info` next to the "Package"
    heading.
  * `--with-ident-string="STRING"` (defaults to the standard Open MPI version
    string - e.g., X.Y.Zr######). `%VERSION%` will expand to the Open MPI
    version string if it is supplied to this configure option.

This commit was SVN r16644.
2007-11-03 02:40:22 +00:00
Jeff Squyres
00da8605a5 PUSH and POP shell variable scopes like this:
{{{
OMPI_VAR_SCOPE_PUSH([var1 var2 var3])
...use $var1 $var2 and $var3
OMPI_VAR_SCOPE_PUSH([var4 var5 var6])
...use $var1 $var2 and $var3
...use $var4 $var5 and $var6
OMPI_VAR_SCOPE_POP
...use $var1 $var2 and $var3
OMPI_VAR_SCOPE_POP
}}}

The PUSH macro does a simple sanity check to ensure that the variables
listed are not already set with other values.  If they are set, it
will abort configure, assuming that this is a programming error.  If
none of the names are set as environment variables containing values,
the names are saved for later POP'ing.  The POP will unset all the
variables from a corresponding PUSH.

As the names imply, these macros effect a stack-like behavior.  So a
POP must correspond to a PUSH, etc.

These macros are intended to be simple sanity checks for OMPI
configure programmers, and also help keep the environment clean by
unsetting variables when they are no longer used.
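
The stack-like behavior can be modeled with a small Python sketch. It is a toy model of the semantics described above, not the m4 macros; the class `VarScope` and the use of a dictionary to stand in for the shell environment are assumptions for illustration.

```python
class VarScope:
    """Toy model of PUSH/POP variable scoping over a dict of shell variables."""

    def __init__(self, env):
        self.env = env    # stands in for the shell environment
        self.stack = []   # one list of names per PUSH

    def push(self, names):
        for name in names:
            # Sanity check: a name that already holds a value is treated
            # as a programming error.
            if self.env.get(name):
                raise RuntimeError(f"variable {name} is already set")
        self.stack.append(list(names))

    def pop(self):
        if not self.stack:
            raise RuntimeError("POP without a matching PUSH")
        for name in self.stack.pop():
            self.env.pop(name, None)  # unset the variable


env = {}
scope = VarScope(env)
scope.push(["var1", "var2", "var3"])
env["var1"] = "a"
scope.push(["var4", "var5", "var6"])
env["var4"] = "b"
scope.pop()   # unsets var4, var5, var6; var1 is still set
scope.pop()   # unsets var1, var2, var3
```

After the second `pop`, the environment is empty again, which mirrors the "keep the environment clean" goal stated above.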

This commit was SVN r16592.
2007-10-26 23:35:02 +00:00
George Bosilca
938be44f07 Complete the removal of the mvapi BTL.
This commit was SVN r16491.
2007-10-17 22:02:52 +00:00
George Bosilca
e9aa15f9d5 On behalf of Ralf Wildenhues:
config/ompi_check_visibility.m4 (OMPI_CHECK_VISIBILITY):
Rename ompi_vc_cc_fvisibility to ompi_cv_cc_fvisibility, so
that it will be cached.

This commit was SVN r16435.
2007-10-11 22:06:39 +00:00
Ralph Castain
53af94fd87 Modify the configure system so that gridengine support is only built in specific conditions:
1. --with-sge, always builds
2. --without-sge, never builds
3. if neither is specified, build if and only if either SGE_ROOT is set or "qrsh" is found in the path
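
The three rules can be written as a small decision function. This Python sketch only restates the logic above; the function name `should_build_sge` and its parameters are hypothetical and do not correspond to anything in the configure scripts.

```python
def should_build_sge(with_sge, environ, qrsh_found):
    """Decide whether gridengine support is built.

    with_sge is True for --with-sge, False for --without-sge, and None
    when neither option was given.
    """
    if with_sge is not None:
        return with_sge  # an explicit request always wins
    # Neither option given: build only if SGE_ROOT is set or qrsh is in the path.
    return "SGE_ROOT" in environ or qrsh_found
```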

This commit was SVN r16422.
2007-10-10 21:39:16 +00:00
Jeff Squyres
74fd678de8 Fix a help message to also show the default value.
This commit was SVN r16369.
2007-10-06 14:25:38 +00:00
Ralph Castain
54b2cf747e These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC.
The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component.

This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done:

As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in.

In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in.

The incoming changes revamp these procedures in three ways:

1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step.

The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic.

Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure.
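
For readers unfamiliar with the term, the result of an allgather can be simulated in a few lines of Python. This is a single-process illustration of what the collective produces, not the orte_grpcomm interface; the function name `allgather` and the sample payloads are assumptions.

```python
def allgather(contributions):
    """Simulate an allgather: every rank ends up with every rank's contribution.

    contributions maps rank -> the data that rank contributes.
    Returns a map rank -> full copy of all contributions.
    """
    gathered = dict(contributions)
    return {rank: dict(gathered) for rank in contributions}


# Hypothetical modex payloads for three processes.
result = allgather({0: "info-0", 1: "info-1", 2: "info-2"})
```

Each rank's view in `result` is identical and contains all three payloads, which is the property the modex relies on.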


2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed.

The size of this data has been reduced in three ways:

(a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes.

To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose.

(b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction.

(c) when the mca param requesting that nodenames be sent to support pretty error messages is set, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using.
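
The field sizes in (a) and (b) can be checked with a little arithmetic. The sketch below assumes the 32k limit comes from treating the 16-bit fields as signed, so that only non-negative values are usable; that reading is an assumption, not something stated in the message. The helper names are hypothetical.

```python
def usable_ids(bits, signed=True):
    """Number of non-negative values that fit in a field of the given width."""
    return 2 ** (bits - 1 if signed else bits)


def name_bytes(bits):
    """Size of a process name made of two fields (jobid and vpid)."""
    return 2 * bits // 8


default_limit = usable_ids(16)   # 32768, i.e. "32k"
default_name = name_bytes(16)    # 4 bytes per process name
jumbo_name = name_bytes(32)      # 8 bytes per process name
```

Under the same assumption, a single 16-bit nodeid takes 2 bytes and a 32-bit one takes 4 bytes, matching the "2 (or at most 4) bytes" in (b).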

While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly.


3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup.

It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging.
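
The estimate above can be reproduced directly. The sketch takes "32k" as 32,000 to match the rounded figures in the text; the variable names are only for illustration.

```python
procs = 32_000          # "32k" processes, rounded as in the text
bytes_per_proc = 50     # approximate OOB contact info per process

message_bytes = procs * bytes_per_proc   # size of one startup message
total_bytes = message_bytes * procs      # that message delivered to every process

message_mbytes = message_bytes / 1e6     # 1.6
total_gbytes = total_bytes / 1e9         # 51.2
```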

Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future.
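
The path a message takes under the routed option can be sketched as follows. This is a toy model in Python, not the RML code; the function name `route`, the `("orted", node)` notation, and the handling of two procs on the same node (a single orted hop) are assumptions.

```python
def route(sender, recipient, node_of):
    """Return the hops a message takes when the orteds act as routers.

    node_of maps a process name to the node it runs on. The orted on a
    node is written here as ("orted", node).
    """
    src = ("orted", node_of[sender])
    dst = ("orted", node_of[recipient])
    hops = [sender, src]          # each proc talks only to its local orted
    if dst != src:
        hops.append(dst)          # forward to the orted on the recipient's node
    hops.append(recipient)        # that orted delivers to its local proc
    return hops


# Hypothetical placement of three procs on two nodes.
node_of = {"p0": "n0", "p1": "n1", "p2": "n0"}
remote_path = route("p0", "p1", node_of)
local_path = route("p0", "p2", node_of)
```

In this model each proc holds one socket (to its local orted) regardless of how many peers it communicates with.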


There are a few minor additional changes in the commit that I'll just note in passing:

* propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details.

* requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details.

* cleanup of some stale header files

This commit was SVN r16364.
2007-10-05 19:48:23 +00:00