
537 Commits

Author SHA1 Message Date
Josh Hursey
bd6426d3dd This is a minor cleanup of the configure.m4 (per suggestion from Jeff).
Refs trac:1987
The patch for v1.3 attached to Ticket #1987 already includes this change.

I did not have a chance to commit this last night, so sorry for the delay.

This commit was SVN r21777.

The following Trac tickets were found above:
  Ticket 1987 --> https://svn.open-mpi.org/trac/ompi/ticket/1987
2009-08-07 23:38:54 +00:00
Josh Hursey
063f5b2ff6 After talking through the patch with Jeff, we have a couple more fixes to r21766 that should also go over to v1.3 in Ticket #1987.
* Check for {{{dlfcn.h}}} in the self component's configure.m4 (also clean up the .m4 a bit).
 * Adjust the priority of the BLCR component so that the self component has a higher priority (if the application went to the trouble of writing the routines, why not use them?) The 'self' component checks for the appropriate functions during query, so it knows if it -can- be used during component selection.
 * Adjust some copyrights that I missed before
 * Fix a warning when casting the result of dlsym() into a function pointer. There is a bit of pointer magic to make this happen (thanks to the following website and the RedHat EL 4 man pages for illustrating it):
  http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html

Passing to Jeff for a final review of the patch before moving to v1.3.

This commit was SVN r21768.

The following SVN revision numbers were found above:
  r21766 --> open-mpi/ompi@91e52d062b
2009-08-05 22:07:37 +00:00
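The dlsym() "pointer magic" mentioned above can be sketched in C as follows. This is a minimal example under stated assumptions, not the 'self' component's code: the function-pointer typedef, the helper name lookup_size_fn() and the use of dlopen(NULL, ...) to search the running program are all illustrative. On older glibc releases it needs to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical function-pointer type for the symbol we look up. */
typedef size_t (*size_fn_t)(const char *);

/* Look up `name` among the symbols already loaded into the running
 * program and return it as a function pointer (NULL on failure). */
static size_fn_t lookup_size_fn(const char *name)
{
    size_fn_t fn = NULL;
    void *handle = dlopen(NULL, RTLD_NOW); /* handle for the main program */

    if (NULL == handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    /* ISO C defines no conversion from void * (an object pointer) to a
     * function pointer, so "fn = (size_fn_t) dlsym(...)" draws a warning
     * under -pedantic. The POSIX dlsym() page instead stores the void *
     * result through a void ** view of the function pointer. */
    *(void **) (&fn) = dlsym(handle, name);
    if (NULL == fn) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
    }

    /* The main program is never unloaded, so fn stays valid afterwards. */
    dlclose(handle);
    return fn;
}
```

For example, lookup_size_fn("strlen") returns libc's strlen(), which can then be called through the pointer without any cast at the call site.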
Josh Hursey
91e52d062b Fix the 'self' CRS component.
Due to the visibility patch to libltdl in r21731, this module can no longer access or use the libltdl interfaces directly. Instead, just call the dlopen/dlsym/dlclose functions directly. There is a portability implication here, but for the moment it does not seem to bite us.

Also in this patch, clean up some of the 'self' specific code paths.
 * opal-restart need not special-case the 'self' component since it can now interact with it as if it were a normal component.
 * Clean up the initialization of the cmd line arguments in opal-restart.
 * Make sure to mark opal-restart as a 'tool', but do so by setting the global variable directly instead of setting the environment variable, which could be inherited by the application.
 * Most of the functions in the 'self' component should not be used by a command line tool (exception being 'restart'), so make sure that if we accidentally call them then errors are returned.
 * Increase the priority of the 'none' component to be above that of 'self' when being selected in a command line tool. This allows for both mpirun and opal-restart to work correctly with the 'self' module.

This commit was SVN r21766.

The following SVN revision numbers were found above:
  r21731 --> open-mpi/ompi@0278b86456
2009-08-05 16:21:51 +00:00
Jeff Squyres
98b0a7af3d Per http://www.open-mpi.org/community/lists/devel/2009/08/6555.php,
add an lt_dlerror() in the error output one of the error cases.

This commit was SVN r21751.
2009-08-03 16:29:52 +00:00
Jeff Squyres
9455eb804f Oops -- fix the type.
This commit was SVN r21750.
2009-08-03 16:26:15 +00:00
Jeff Squyres
7b1f65095b Update to match the current code and be a bit more explicit (since
others are currently looking at this code).

This commit was SVN r21746.
2009-07-30 12:45:59 +00:00
Jeff Squyres
cb653bc4e8 Change the test memalign() call to use an alignment of 4 so that some
debuggers stop complaining. :-)

This commit was SVN r21744.
2009-07-29 20:33:38 +00:00
Jeff Squyres
69139e4171 Print a warning if someone tries to set opal_ptmalloc2_disable via an
MCA parameter file.

This commit was SVN r21743.
2009-07-29 20:05:56 +00:00
Jeff Squyres
d12db20089 This function actually returns an int, not a bool (OPAL_SUCCESS or
OPAL_ERROR).  Also add a line to the docs describing that it's ok to
pass in NULL for the source_file.

This commit was SVN r21742.
2009-07-29 19:52:18 +00:00
Ralph Castain
c459615f8f When someone specifies a rank-file slot-list of N:*, stop the loop at the proper place (we were going through the loop one too many times).
Thanks to Eugene for spotting it.

This commit was SVN r21728.
2009-07-23 17:51:15 +00:00
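The off-by-one above is easiest to see in a small expansion routine. This is a hypothetical sketch, not the rank-file parser's code: the function name, the fixed cores_per_socket argument and the output arrays are assumptions. It expands a slot spec of the form "N:*" (socket N, every core on it) into explicit socket/core pairs.

```c
#include <stdlib.h>

/* Hypothetical helper: expand "N:*" into (socket, core) pairs.
 * Returns the number of pairs written, or -1 if the spec is malformed. */
static int expand_socket_star(const char *spec, int cores_per_socket,
                              int out_socket[], int out_core[], int max_out)
{
    char *end = NULL;
    long sock = strtol(spec, &end, 10);
    int n = 0;

    /* Accept exactly "<number>:*". */
    if (end == spec || end[0] != ':' || end[1] != '*' || end[2] != '\0') {
        return -1;
    }

    /* Cores are numbered 0 .. cores_per_socket - 1, so the bound must be
     * "<"; writing "<=" walks the loop one time too many. */
    for (int core = 0; core < cores_per_socket && n < max_out; ++core) {
        out_socket[n] = (int) sock;
        out_core[n] = core;
        ++n;
    }
    return n;
}
```

With four cores per socket, "1:*" yields exactly the pairs (1,0) through (1,3).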
Jeff Squyres
bfd689f0ef Per discussion on the mailing list, backing out r21723.
This commit was SVN r21725.

The following SVN revision numbers were found above:
  r21723 --> open-mpi/ompi@2250be582d
2009-07-22 00:02:00 +00:00
Jeff Squyres
38faae6eab Ignore this component until it can be fixed properly.
This commit was SVN r21724.
2009-07-21 22:35:45 +00:00
Iain Bason
2250be582d Added autodetect installdirs component. Currently supports Solaris and Linux.
* Installation directories will be inferred from the actual location
  of the shared library that contains the component.

* OPAL_PREFIX and other environment variables allow users to override
  the inferred directories.  They should no longer be necessary in
  most cases, though.

* Any directories that cannot be inferred will fall back to whatever
  is provided by the config installdirs component.

This commit was SVN r21723.
2009-07-21 20:19:38 +00:00
George Bosilca
3e971e61f3 The system headers are supposed to be protected by #ifdef and not by #if.
This commit was SVN r21700.
2009-07-16 18:27:33 +00:00
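A minimal sketch of that rule, assuming the usual autoconf convention: AC_CHECK_HEADERS defines HAVE_<HEADER>_H to 1 when a header is found and leaves it undefined otherwise. #ifdef only asks whether the macro exists, while #if evaluates its value. The macro HAVE_NO_SUCH_HEADER_H and the header no_such_header.h are hypothetical.

```c
/* Stand-in for a configure-generated config.h (assumed for this sketch).
 * AC_CHECK_HEADERS([unistd.h]) would emit "#define HAVE_UNISTD_H 1" when
 * the header is found and leave the macro undefined when it is not. */
#define HAVE_UNISTD_H 1
/* HAVE_NO_SUCH_HEADER_H is deliberately never defined: it models a
 * header that configure did not find. */

/* Guard with #ifdef: it only asks whether the macro exists. */
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif

/* Skipped cleanly. "#if HAVE_NO_SUCH_HEADER_H" would instead evaluate an
 * undefined identifier as 0 (a -Wundef warning), and would fail to
 * compile if the macro were ever defined with an empty body. */
#ifdef HAVE_NO_SUCH_HEADER_H
#include <no_such_header.h>
#endif

/* Report whether the guarded <unistd.h> was pulled in. */
static int have_unistd(void)
{
#ifdef HAVE_UNISTD_H
    return 1;
#else
    return 0;
#endif
}
```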
Ralph Castain
d3fb39073f Initialize a variable to ensure we get the correct number of bound processors
This commit was SVN r21590.
2009-07-02 17:48:04 +00:00
Jeff Squyres
246caafe06 Correct the logic of the check for the env variable
OMPI_MCA_memory_ptmalloc2_disable and also add an explicit check for
FAKEROOTKEY (see http://bugs.debian.org/531522).

This commit was SVN r21489.
2009-06-20 11:22:06 +00:00
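A hedged sketch of such an init-time check. The commit does not spell out the corrected logic, so the rule below is an assumption: the component is disabled when OMPI_MCA_memory_ptmalloc2_disable holds a nonzero integer, or when FAKEROOTKEY is present. fakeroot exports FAKEROOTKEY to the processes it runs, which is what makes it a usable marker. The helper name ptmalloc2_should_disable() is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: should the ptmalloc2 component stay disabled? */
static bool ptmalloc2_should_disable(void)
{
    const char *val = getenv("OMPI_MCA_memory_ptmalloc2_disable");

    /* Assumed rule: disable only if the variable is set AND holds a
     * nonzero integer, so "0" or an empty value leaves it enabled. */
    if (NULL != val && 0 != atoi(val)) {
        return true;
    }

    /* Running under fakeroot (see Debian bug 531522): back off too. */
    if (NULL != getenv("FAKEROOTKEY")) {
        return true;
    }
    return false;
}
```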
Jeff Squyres
f42727707b Per http://bugs.debian.org/531522, add an MCA param/environment
variable to allow the disabling of the ptmalloc2 component at init
time.

This commit was SVN r21479.
2009-06-19 10:50:23 +00:00
Jeff Squyres
6777f01380 Also look for /dev/ipath
This commit was SVN r21410.
2009-06-11 00:35:21 +00:00
Ralph Castain
c3c1ab1337 Correct a comment in paffinity.h about what paffinity_get returns - it was inaccurate.
Revamp the affinity detection/set procedure in mpi_init to correctly detect when we have already been bound to processors, given the revised understanding of paffinity_get. Add a new paffinity macro to make checking for already bound a little nicer.

This commit was SVN r21402.
2009-06-09 14:33:35 +00:00
Josh Hursey
70333b9441 Some components were still using OMPI_*_VERSION instead of OPAL_*_VERSION, so convert them over (Jeff is taking care of PLPA, so that is not included here).
This commit was SVN r21384.
2009-06-05 15:34:59 +00:00
Shiqing Fan
3137001772 Read from the correct registry entry on Windows Vista and Server 2008.
This commit was SVN r21224.
2009-05-13 15:56:37 +00:00
Ralph Castain
aa25a51c92 Do not mark the mpi_paffinity_alone param as deprecated so we don't scare Jeff...er...users.
This commit was SVN r21218.
2009-05-12 15:41:11 +00:00
Jeff Squyres
05d87ee7b4 Because this error comes up over and over and over and over and ...
Libltdl erroneously returns an error string of "file not found" for
lots of reasons, even if the file really *is* there, but just failed
to dlopen() for some reason.  So if lt_dlerror() returns "file not
found", do some simple heuristics and if we *do* find a file, print a
slightly better error message.

This commit was SVN r21214.
2009-05-12 12:41:42 +00:00
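A sketch of that heuristic in C. It assumes only what the message states, that lt_dlerror() can return the literal string "file not found" even when the file exists. The helper names, the suffix list and the replacement message are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical check: does `base` exist with any of the suffixes a
 * dlopen-with-extension call might have tried? Returns 1 or 0. */
static int component_file_exists(const char *base)
{
    static const char *suffixes[] = { "", ".la", ".so", ".dylib" };
    char path[4096];

    for (size_t i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); ++i) {
        int len = snprintf(path, sizeof(path), "%s%s", base, suffixes[i]);
        if (len < 0 || (size_t) len >= sizeof(path)) {
            continue; /* candidate path does not fit: skip it */
        }
        if (0 == access(path, F_OK)) {
            return 1;
        }
    }
    return 0;
}

/* Replace a misleading "file not found" when the file is really there. */
static const char *describe_open_failure(const char *base, const char *err)
{
    if (NULL != err && 0 == strcmp(err, "file not found") &&
        component_file_exists(base)) {
        return "file exists but could not be opened "
               "(possibly missing dependent libraries)";
    }
    return err;
}
```

Only the misleading case is rewritten; every other lt_dlerror() string is passed through unchanged.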
Ralph Castain
d396f0a6fc Per the discussion on the devel list, move the binding of processes to processors from MPI_Init to process start. This involves:
1. replacing mpi_paffinity_alone with opal_paffinity_alone - for back-compatibility, I have aliased mpi_paffinity_alone to the new param name. This causes a mild abstraction break in the opal/mca/paffinity framework - per the devel discussion...live with it. :-) I also moved the ompi_xxx global variable that tracked maffinity setup so it could be properly closed in MPI_Finalize to the opal/mca/maffinity framework to avoid an abstraction break.

2. Added code to the odls/default module to perform paffinity binding and maffinity init between process fork and exec. This has been tested on IU's odin cluster and works for both MPI and non-MPI apps.

3. Revise MPI_Init to detect if affinity has already been set, and to attempt to set it if not already done. I have *not* tested this as I haven't yet figured out a way to do so - I couldn't get slurm to perform cpu bindings, even though it supposedly does do so.

This has only been lightly tested and would definitely benefit from a wider range of evaluation...

This commit was SVN r21209.
2009-05-12 02:18:35 +00:00
Josh Hursey
d920a302f3 Some more C/R related commits that have been sitting off-trunk for a while.
* Pass the sequence number of the checkpoint along with the reference from the global to the local coordinator.
 * 'orte-restart --apponly' now just generates the app context file, and does not run with it. This provides the user the ability to edit the file before launching.
 * Add an OPAL_CRS_NONE state
 * Split the INC into three distinct parts.
 * Implement a restart mechanism for the 'none' component. If given a context it simply execvp()'s it.

This commit was SVN r21195.
2009-05-08 20:51:13 +00:00
Josh Hursey
5d0607395d A couple of C/R related commits that have been sitting off-trunk for a while.
* Add 'orte-checkpoint -l' option that lists all checkpoints currently available on the system.
 * Add 'orte-restart -i' which prints information regarding the checkpoint targeted for restart.
 * Add ability to extract the timing metadata.
 * Fix show_help() in the orte-checkpoint and orte-restart tools. They should be using the opal versions instead of the orte versions (otherwise nothing is printed).

This commit was SVN r21194.
2009-05-08 19:41:11 +00:00
Rainer Keller
b0754071b7 - For compilation with BLCR and --with-ft=cr, #include <string.h>
This commit was SVN r21185.
2009-05-07 16:14:59 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Shiqing Fan
cd565923d3 Completely remove ltdl support for Windows build.
This commit was SVN r21170.
2009-05-05 18:59:13 +00:00
Josh Hursey
8b8bee04d6 It seems that some of the patches were missed in r21131. :(
This patch contains the following items:
 * Fix the flag passed to open() for the read side of the named pipe between the local and app coordinator. There is a race condition when using O_RDWR on a named pipe (not sure how that bug got in there in the first place).
 * Adjust control in the C/R thread timing
 * Clarify return code in BLCR component
 * Allow the user to adjust the max wait time for the named pipes in the FileM local coordinator by using the MCA parameter "snapc_full_max_wait_time" (Default: 20 seconds)
 * If the application terminates while there are active FileM operations, force mpirun to wait on these operations to complete.
 * Allow the user to set the local copy command (Default: cp) via MCA parameter "filem_rsh_cp"
 * Implement the ability to throttle the number of outgoing connections in FileM. At larger scales this type of explicit throttling helps prevent overwhelming the HNP machine. Default: 10, set via MCA parameter: {{{filem_rsh_max_outgoing}}}

This commit was SVN r21167.

The following SVN revision numbers were found above:
  r21131 --> open-mpi/ompi@0deb009225
2009-05-05 16:45:49 +00:00
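The throttling idea can be sketched as a simple counting limiter: start at most max_outgoing transfers, queue the rest, and start the next queued transfer only when an active one completes. The struct and function names are hypothetical, not FileM's; the value 10 in the usage note mirrors the filem_rsh_max_outgoing default quoted above.

```c
/* Hypothetical throttle state for outgoing file transfers. */
typedef struct {
    int max_outgoing; /* cap on concurrently active transfers */
    int active;       /* transfers currently in flight */
    int pending;      /* transfers waiting for a free slot */
    int started;      /* total transfers started so far */
} throttle_t;

/* Start as many pending transfers as the cap allows. */
static void throttle_progress(throttle_t *t)
{
    while (t->pending > 0 && t->active < t->max_outgoing) {
        --t->pending;
        ++t->active;
        ++t->started; /* a real implementation would launch the copy here */
    }
}

/* Queue `n` new transfers and start whatever fits under the cap. */
static void throttle_submit(throttle_t *t, int n)
{
    t->pending += n;
    throttle_progress(t);
}

/* Record one finished transfer, freeing a slot for the next one. */
static void throttle_complete(throttle_t *t)
{
    if (t->active > 0) {
        --t->active;
    }
    throttle_progress(t);
}
```

With max_outgoing = 10 and 25 queued transfers, only 10 are in flight at first; each completion admits exactly one more until the queue drains, so the HNP never sees more than 10 concurrent incoming copies.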
Ralph Castain
468800996b Make it possible to no-build the carto framework
Could swear we had done this before...but I guess not!

This commit was SVN r21150.
2009-05-05 03:54:58 +00:00
Josh Hursey
1327c57e9d add back a missing header
This commit was SVN r21148.
2009-05-04 21:30:11 +00:00
Ralph Castain
e1673778be Replace missing headers
This commit was SVN r21136.
2009-05-01 15:09:10 +00:00
Josh Hursey
ab63ab6568 forgot to update the copyright
This commit was SVN r21128.
2009-04-30 16:39:54 +00:00
Josh Hursey
759c2b5596 Add a 'crs_blcr_dev_null' MCA parameter. This causes BLCR to checkpoint directly to /dev/null instead of to a file.
Though this is not useful in checkpointing an application, it can be a useful diagnostic.

This commit was SVN r21125.
2009-04-30 16:32:55 +00:00
Shiqing Fan
ff0e51f686 Include a missing header.
This commit was SVN r21121.
2009-04-30 09:03:21 +00:00
Shiqing Fan
e7b6445b32 Add a missed .windows file into the tarball.
This commit was SVN r21105.
2009-04-29 10:31:10 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh, after applying
   patches that include headers which would otherwise be
   "lost" due to inclusion in one of the now-deleted headers...

   In total, 817 files are touched.
   In ompi/mpi/c/, header files are moved up into the actual C file
   where necessary (these are the only additional #includes);
   otherwise, it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Shiqing Fan
3d4e0472d6 Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Brian Barrett
2ca0b7fe44 remove some checks which are not needed after the recent ptmalloc2 changes
This commit was SVN r21042.
2009-04-19 18:17:05 +00:00
Jeff Squyres
e90ecb6020 Fix a compiler warning. Put in a good comment explaining why it is
declared the way it is.  Sigh.

This commit was SVN r21040.
2009-04-17 21:59:31 +00:00
Jeff Squyres
35fc9fedd2 MTT is your friend: Cisco tests --enable-static --disable-shared, but
we had already tested this scenario manually to know that it seemed to
be working.  What we ''didn't'' test was --enable-static
--disable-shared --disable-dlopen -- but my MTT '''did.'''  Yay!

This commit fixes that scenario.  Essentially we need to call a dummy
function in hooks.c to ensure that the linker pulls in all those
symbols into the final executable (and therefore pulls in the
malloc_initialize_hook, etc.).  Thanks for the heads-up from Brian in
fixing this one!

This commit was SVN r21022.
2009-04-15 19:09:10 +00:00
Shiqing Fan
0ea6d48320 Add a missed .windows file for timer component, which should be built always statically.
This commit was SVN r20987.
2009-04-14 12:19:21 +00:00
Jeff Squyres
3cfa8f55c4 Gaah; I meant to include a better comment in the last commit but had
forgotten to save before the commit was sent.

This comment explains why we're doing a cache check here rather than a
real check.

This commit was SVN r20975.
2009-04-10 21:16:23 +00:00
Jeff Squyres
9fcd01035d Fix a problem reported by Steve Kagl on the user's list; the posix
component (which we probably don't test regularly because we probably
only test environments where the other paffinity components are used)
was not getting built because it had a bad configure test.

This commit was SVN r20974.
2009-04-10 21:15:20 +00:00
Shiqing Fan
6e04a4de08 On Windows, define a equivalent type for in_addr_t, and correctly include unistd.h.
This commit was SVN r20951.
2009-04-07 16:07:05 +00:00
Jeff Squyres
a13dfb2140 Add in a proper test for munmap.
This commit was SVN r20936.
2009-04-04 00:43:17 +00:00
Jeff Squyres
52a0e5fe69 Add some checks for more network driver types.
This commit was SVN r20934.
2009-04-02 19:17:21 +00:00
Jeff Squyres
3bf8c7025a Remove compiler warning about function not being prototyped.
This commit was SVN r20929.
2009-04-02 13:06:47 +00:00
Jeff Squyres
0f517c3d3f Gah; some non-final code got merged in by accident. Remove debugging
printf and put in the final test code for malloc.

This commit was SVN r20924.
2009-04-01 18:20:23 +00:00