
12080 Commits

Author SHA1 Message Date
Ralph Castain
3107545709 Ensure that ORTE processes such as mpirun and orted never inadvertently bind themselves to cores. Change the mca param name used by the rank_file mapper to get user directives on slot lists to be different from that used by MPI procs to discover their binding. Add a cmd line option to orterun to make it easier for a user to specify the slot list (basically, hide the mca param name).
Discussed and reviewed with Lenny and Jeff.

This commit was SVN r19062.
2008-07-28 14:18:36 +00:00
Adrian Knoth
5096512c3a Cosmetics, only typos.
This commit was SVN r19061.
2008-07-28 13:33:08 +00:00
Ralph Castain
96a74c2d09 Update ignore properties
This commit was SVN r19060.
2008-07-28 12:52:24 +00:00
Jeff Squyres
477cdb0b62 Had to abandon the first approach from r19040: it caused problems with
"make distclean".  It's not clear whether it's an Automake bug or
whether what I did simply is not supported (I've got pending mail into
Ralf W. asking about it).  The short version is that during "make
distclean", ompi/mpi/f77/Makefile would rm -rf ompi/mpi/f77/.deps.
But ompi/Makefile still includes some .Plo files from that directory,
so Bad Things happened when "make distclean" unrolled from the
ompi/mpi/f77 dir back up to the ompi/ dir.

So I went with George's original suggestion and moved the f77 "base"
files in question into a new directory: ompi/mpi/f77/base and put a
Makefile.include in there.  That way, this directory is not traversed
twice by distclean, and .deps is only removed when it is supposed to
be.  Maybe we'll be able to do it a little better someday, but that's
the way it is now.

I'll check this with a fresh checkout once this is committed to SVN as
well; some of these kinds of problems don't show up until you do a
build from a completely fresh SVN checkout.

This commit was SVN r19054.

The following SVN revision numbers were found above:
  r19040 --> open-mpi/ompi@9f4d4c4312
2008-07-26 20:38:30 +00:00
Jeff Squyres
505ffc6719 This file was never used
This commit was SVN r19053.
2008-07-26 19:27:18 +00:00
Tim Mattox
73b528b050 Include the base version (trunk, v1.2, etc.) in the body of a
create failure e-mail as well as on the subject line.

This commit was SVN r19052.
2008-07-26 14:13:20 +00:00
Jeff Squyres
7a4359b43f Since we no longer generate a Makefile or Makefile.in in this
directory, no longer ignore those files.

This commit was SVN r19051.
2008-07-26 14:05:57 +00:00
Tim Mattox
8753064190 Have create failure e-mails say what base version failed.
This commit was SVN r19050.
2008-07-26 13:31:06 +00:00
Jeff Squyres
78b8bac900 First part of "make distcheck" fix, but at least one more commit will
be required for a total fix.  Still looking into the problem...

This commit was SVN r19049.
2008-07-26 13:25:11 +00:00
Jeff Squyres
7c4d46a8d9 Grrr: I ''did'' remove these files on the initial commit of the new
SVN version (r19045), but I also edited the svn:ignore to ignore these
files in the same SVN commit -- I suspect that SVN got confused and
did not actually delete them.

This commit was SVN r19048.

The following SVN revision numbers were found above:
  r19045 --> open-mpi/ompi@63b63d48c3
2008-07-26 13:00:24 +00:00
Jeff Squyres
6dac4706ea Somehow this file got missed in the SVN import.
This commit was SVN r19047.
2008-07-26 12:54:15 +00:00
Jeff Squyres
4d034383d9 Apply patch from Ralf W. to remove a non-portable use of ==.
This commit was SVN r19046.
2008-07-26 12:36:24 +00:00
Jeff Squyres
63b63d48c3 Fixes trac:1370, #1147
Update the version of ROMIO to that which was contained in
MPICH2-1.0.7, plus a few patches from the upstream ROMIO maintainers
(because OMPI uses a few code paths in ROMIO that MPICH2 does not;
there were a few compile bugs in the ROMIO from MPICH2-1.0.7).

Added an info MCA param to be able to tell which version of ROMIO is
contained in OMPI: io_romio_version.

Many, many thanks to romio-maint@mcs.anl.gov for all their help in
integrating this new version of ROMIO into Open MPI.

This commit was SVN r19045.

The following Trac tickets were found above:
  Ticket 1370 --> https://svn.open-mpi.org/trac/ompi/ticket/1370
2008-07-26 12:23:30 +00:00
Ralph Castain
0735d6f1c2 This commit fixes ticket #1414
Clean up the logic in the odls for when processes terminate. It turns out that we were only going through the kill_proc logic once instead of looping over all local children when we ordered a daemon to kill its local procs. This went unnoticed for some time because on most systems the local procs were terminated anyway when the daemon terminated, due to the parent/child relationship.

Solaris is apparently different - the children are not automatically terminated when the parent dies. As a result, it acts as a detector for this bug.

Mucho thanks to Rolf V. for his help in debugging - and to IM for letting me follow his gdb progress in quasi real-time!

This commit was SVN r19044.
2008-07-26 02:54:43 +00:00
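The fix described in the entry above (visit every local child rather than running the kill logic once) can be illustrated with a minimal sketch. The `struct child` type and the `kill_local_children` function are hypothetical stand-ins, not the actual odls data structures, and clearing the `alive` flag stands in for real signal delivery.

```c
#include <stddef.h>

/* Hypothetical record for one locally launched process. */
struct child {
    int pid;
    int alive;
};

/* Visit every local child, not just the first one, and mark each
 * live child as terminated. Returns how many children were killed. */
size_t kill_local_children(struct child *children, size_t count)
{
    size_t killed = 0;
    for (size_t i = 0; i < count; ++i) {
        if (children[i].alive) {
            children[i].alive = 0;  /* stand-in for sending a signal */
            ++killed;
        }
    }
    return killed;
}
```

Because the loop covers the whole array, the result does not depend on the operating system cleaning up orphaned children when the parent exits.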
Jeff Squyres
92c10cd187 Remove some old cruft from Makefile.am's -- likely the result of
copying some old Makefile.am a long time ago.

This commit was SVN r19043.
2008-07-26 00:27:42 +00:00
Jeff Squyres
773f79f495 Follow-on to r19040; missed one file on the checkin.
This commit was SVN r19041.

The following SVN revision numbers were found above:
  r19040 --> open-mpi/ompi@9f4d4c4312
2008-07-25 21:35:29 +00:00
Jeff Squyres
9f4d4c4312 Fixes trac:1409: ensure that the C++, F77, and F90 bindings libraries
are properly linked against libmpi.la.

This required a little creative AM usage, inspired by discussion on
OMPI devel list:

 * Make a new ompi/mpi/f77/Makefile_f77base.include; effectively move
   the building of the f77 "base" glue stuff (libmpi_f77base.la) into
   this Makefile and away from ompi/mpi/f77/Makefile.am.  The sources
   in question require some specific CPPFLAGS, so we couldn't just add
   the raw sources into libmpi_la_SOURCES, unfortunately.
 * Include this new Makefile in the top-level ompi/Makefile.am
 * The libmpi_f77base.la LT convenience library was already sucked
   into libmpi.la; breaking it out into its own Makefile allows us
   to build it earlier and therefore complete building libmpi.la
   earlier.
 * Side effect: the ompi/mpi/Makefile.am is now mostly unnecessary; it
   no longer specifies a SUBDIRS for each of the bindings directories
   to traverse into (since they are now in the top-level SUBDIRS).  As
   such, the man pages are now also included in the top-level
   ompi/Makefile.am.

The end result is that libmpi.la -- including a few sources
from mpi/f77 -- is fully built before the C++, F77, and F90 bindings
are built.  Therefore, the C++, F77, and F90 bindings libraries can
all link against libmpi.la.

This commit was SVN r19040.

The following Trac tickets were found above:
  Ticket 1409 --> https://svn.open-mpi.org/trac/ompi/ticket/1409
2008-07-25 21:18:05 +00:00
Ralph Castain
d5a916d350 Fix a problem reported by IBM: nolocal and bynode combined to map byslot. Problem actually was that any time multiple mapping policy directives were provided, we would only map byslot due to incorrect if statement conditions.
Thanks to Kris Davis for his patience while we tracked this down!

This commit was SVN r19039.
2008-07-25 17:50:46 +00:00
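The entry above does not show the faulty if statement. One common way this class of bug arises, assumed here purely for illustration, is testing a combined set of policy flags by equality instead of testing each bit. The flag names and helper functions below are hypothetical and are not the ORTE definitions.

```c
/* Hypothetical mapping policy bits; several may be set at once. */
enum {
    MAP_BYSLOT  = 0x01,
    MAP_BYNODE  = 0x02,
    MAP_NOLOCAL = 0x04
};

/* Test the individual bit. An equality test such as
 * (policy == MAP_BYNODE) would fail as soon as a second
 * directive like MAP_NOLOCAL is also set. */
int maps_bynode(unsigned policy)
{
    return (policy & MAP_BYNODE) != 0;
}

/* Same idea for the "do not use the local node" directive. */
int excludes_local_node(unsigned policy)
{
    return (policy & MAP_NOLOCAL) != 0;
}
```

With bit tests, `MAP_BYNODE | MAP_NOLOCAL` is still recognized as a bynode mapping that also excludes the local node.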
Ralph Castain
718cceddaa Ensure that we only launch procs on the HNP if that node is actually included in the allocation.
This commit was SVN r19038.
2008-07-25 17:13:22 +00:00
Ralph Castain
cb93775cca Just for the AR - remove unnecessary typecast
This commit was SVN r19034.
2008-07-25 15:30:37 +00:00
Thomas Herault
28dc80b67e Deal with the SIGCHLD issue in LSF.
lsb_launch tampers with SIGCHLD signal handler. We are forced to reinstall our own signal handler after a call to this function.

This commit fixes trac:1356.

This commit was SVN r19033.

The following Trac tickets were found above:
  Ticket 1356 --> https://svn.open-mpi.org/trac/ompi/ticket/1356
2008-07-25 15:23:23 +00:00
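The workaround in the entry above, reinstalling our own SIGCHLD handler after a call that changes it, might look roughly like the following POSIX sketch. `launch_that_tampers` is a hypothetical stand-in for the library call; it is not the LSF API, and the handler body is a placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t child_events = 0;

/* Our handler: only records that a child changed state. */
static void on_sigchld(int signo)
{
    (void)signo;
    child_events = 1;
}

/* Install (or reinstall) our SIGCHLD handler. Returns 0 on success. */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical stand-in for a launch call that resets SIGCHLD. */
void launch_that_tampers(void)
{
    signal(SIGCHLD, SIG_DFL);
}

/* Call the launcher, then put our handler back in place. */
int launch_and_restore(void)
{
    launch_that_tampers();
    return install_sigchld_handler();
}

/* Report whether the current SIGCHLD disposition is our handler. */
int sigchld_handler_is_ours(void)
{
    struct sigaction cur;
    if (sigaction(SIGCHLD, NULL, &cur) != 0) {
        return 0;
    }
    return cur.sa_handler == on_sigchld;
}
```

Querying the disposition with a NULL new action, as `sigchld_handler_is_ours` does, is a cheap way to confirm that the handler survived the call.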
Ralph Castain
7e6e104fc3 Add more debugging to the RML when it fails to find a route - specifically, have it print a stacktrace so we can figure out where it came from.
This commit was SVN r19032.
2008-07-25 15:01:41 +00:00
Ralph Castain
42c134cb32 Silence stupid compiler warning - and a certain someone who keeps reminding me of it... :-)
This commit was SVN r19031.
2008-07-25 14:01:06 +00:00
Ralph Castain
a1d296ae03 This commit fixes ticket #1410
Fix a few bugs in the mappers:

1. Ensure that bynode with no -np fills all available slots - it just does so with the ranks set bynode instead of byslot

2. fix --nolocal behavior so it works correctly in all cases. We still have to test the host's name using opal_ifislocal in the mapper because the name returned by gethostname to orte_process_info.hostname can be an FQDN, but a hostfile may contain a non-FQDN version.

3. Add missing --nolocal logic to the seq mapper

Oversubscribed mapping seemed to be working okay without repair, so I couldn't verify my own bug report in that regard.

Also included are some preliminary changes to support the modified hostfile behavior, which will be committed shortly:

1. removed the totally useless "allocate" field in the orte_node_t object since every node is automatically allocated for use - and everything ignored the field anyway

2. correctly initialize the slots_alloc field when the allocation is read

This commit was SVN r19030.
2008-07-25 13:35:12 +00:00
Jeff Squyres
31df89ccb2 Add bullet about mpi_leave_pinned and the openib BTL
This commit was SVN r19029.
2008-07-25 11:29:37 +00:00
Jeff Squyres
4adc4a632a This option is neither documented nor implemented.
This commit was SVN r19027.
2008-07-24 23:37:16 +00:00
Jeff Squyres
e3e79c0881 Fixes trac:1379:
 * Use synonym/deprecated MCA param API for some mca base params
 * In openib BTL, if we have appropriate memory hooks support, and if
   mpi_leave_pinned and mpi_leave_pinned_pipeline were not set by the
   user, set mpi_leave_pinned to 1.
 * Defer checking mpi_leave_pinned_* until as late as possible (i.e.,
   until after the btl's have had a chance to set mpi_leave_pinned to
   1):
   * in ob1 pml
   * in rdma mpool

This commit was SVN r19022.

The following Trac tickets were found above:
  Ticket 1379 --> https://svn.open-mpi.org/trac/ompi/ticket/1379
2008-07-24 22:51:26 +00:00
Josh Hursey
ca43968418 Fix a deadlock scenario when registering deprecated MCA parameters. The internal loop uses the 'item' variable that is used by the outer loop as well. So when the outer loop checks the value of 'item' it will never equal the end of the list since it no longer references the same list.
Kinda found by MTT. MTT calls 'ompi_info --all --parsable' and it was livelocked and had to be killed by hand.

I'm going to push this one to Jeff to push to v1.3 since he did the original implementation and should check this code.

This commit was SVN r19014.
2008-07-24 15:51:54 +00:00
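The shared-cursor problem in the entry above can be sketched with a simplified list. The `struct node` type and `count_matches` function are hypothetical; they use a NULL-terminated list rather than the list type in the actual code, where the end of a list is a per-list marker and a cursor left on the wrong list never reaches it.

```c
#include <stddef.h>

/* Hypothetical singly linked, NULL-terminated list node. */
struct node {
    struct node *next;
    int value;
};

/* Count (outer, inner) pairs with equal values. */
size_t count_matches(const struct node *outer_head,
                     const struct node *inner_head)
{
    size_t matches = 0;
    for (const struct node *item = outer_head; item != NULL; item = item->next) {
        /* The inner loop gets its own cursor. Reusing `item` here would
         * leave the outer loop positioned on the inner list, so its
         * end-of-list test would no longer refer to the outer list. */
        for (const struct node *inner = inner_head; inner != NULL; inner = inner->next) {
            if (inner->value == item->value) {
                ++matches;
            }
        }
    }
    return matches;
}
```

Declaring each cursor in its own `for` statement keeps the two traversals independent and makes accidental reuse a compile-time scoping question rather than a runtime hang.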
George Bosilca
6c21851160 Only register the BTL progress function if there is a need for it. This requires
a little bit more than "BTL was able to add some procs". The real condition to
allow the BTL progress is that we will use it to send/recv data to/from some
of the peers (this includes the BTL exclusivity in the process).

This commit was SVN r19010.
2008-07-24 10:33:17 +00:00
Ralph Castain
fdb2408bf2 Rename the osx paffinity component the "posix" component since it really has nothing osx specific in it - it is just a generic posix call to determine #processors. Set the priority low so that both linux and solaris components override it if they build. It shouldn't build in Windows at all.
Modify the odls to remove a (size_t) typecast in front of the num_processors variable just in case it is returned negative. This usually is accompanied by an opal_error, so this shouldn't make any difference - but it is more technically correct.

This commit was SVN r19008.
2008-07-24 01:54:51 +00:00
Ralph Castain
d880d6282a Update LANL platform files
This commit was SVN r19005.
2008-07-23 18:57:03 +00:00
Lenny Verkhovsky
b4d54dda57 Fixed possible segfault when using RANKFILE, but not all ranks assigned
Fixed allocation of all ranks when using RANKFILE, but not all ranks assigned
Abort a little earlier if using RANKFILE, but np wasn't specified
Clean mca_rmaps_rank_file_component.debug

This commit was SVN r19004.
2008-07-23 17:44:02 +00:00
Shiqing Fan
0646cd2491 - Move wait object instance code out of the #ifdef block, so that systems with waitpid and Windows can both use it. Thanks to Ralph.
This commit was SVN r19003.
2008-07-23 16:20:42 +00:00
Jeff Squyres
1fd5b0402a Refs trac:1250
 * Fix linux paffinity component to make a "best" guess when PLPA
   can't find topology information in the Linux kernel.  That is, if
   PLPA can't tell us the max_processor_id, just assume that it's the
   same as the number of processors.  If you have a more complex
   system than that (e.g., you have holes in your available processor
   IDs), you'll likely be running a Linux kernel that supports the
   topology information, and this problem won't happen.
 * Make sure to convert the return codes from PLPA to OPAL_ERR* codes.

This commit was SVN r19001.

The following Trac tickets were found above:
  Ticket 1250 --> https://svn.open-mpi.org/trac/ompi/ticket/1250
2008-07-23 15:47:43 +00:00
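The "best guess" fallback in the entry above amounts to a small decision. In this sketch the function name and the convention that a negative value means "topology information unavailable" are assumptions made for illustration; they are not the PLPA or paffinity interfaces.

```c
/* Hypothetical helper: pick the max processor ID to report.
 * topology_max_id < 0 means the kernel gave no topology information. */
int best_guess_max_processor_id(int topology_max_id, int num_processors)
{
    if (topology_max_id >= 0) {
        return topology_max_id;   /* trust the kernel when it answers */
    }
    /* No topology data: assume IDs are dense and, as the entry above
     * describes, use the processor count as the best guess. */
    return num_processors;
}
```

The guess is only wrong on systems with holes in their processor IDs, which, per the commit message, are likely to run kernels that do expose topology information.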
Ralph Castain
e3c3d28bf1 Add some more debugging to tell us how many processors were found when setting sched_yield
This commit was SVN r18999.
2008-07-23 15:28:51 +00:00
Thomas Herault
b6affd35e9 Small typos for LSF compilation and update Makefile.am
This commit was SVN r18998.
2008-07-23 14:42:26 +00:00
Jeff Squyres
5b9219565c Remove the use of __cpu_to_be64() and replace it with hton64().
This commit was SVN r18995.
2008-07-23 12:08:55 +00:00
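For readers unfamiliar with 64-bit byte-order conversion, here is a minimal portable sketch of what a host-to-network conversion does. `sketch_hton64` is a hypothetical name; the project's own hton64() may be implemented differently (for example, with byte-swap macros and endianness checks).

```c
#include <stdint.h>

/* Return a value whose in-memory bytes are `host` in big-endian
 * (network) order, regardless of the host's own byte order. */
uint64_t sketch_hton64(uint64_t host)
{
    union {
        uint64_t value;
        unsigned char bytes[8];
    } out;

    for (int i = 0; i < 8; ++i) {
        /* bytes[0] receives the most significant byte. */
        out.bytes[i] = (unsigned char)(host >> (56 - 8 * i));
    }
    return out.value;
}
```

Building the result byte by byte with shifts avoids any compile-time endianness test, at the cost of a loop that an optimizing compiler may or may not reduce to a single byte-swap instruction.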
Shiqing Fan
5f021e47a9 - Add support for get_processor_info in windows paffinity module.
This commit was SVN r18992.
2008-07-23 07:59:03 +00:00
Ralph Castain
76600d9e51 Set properties on new component
This commit was SVN r18991.
2008-07-23 04:11:30 +00:00
Ralph Castain
83e7c19d33 Remove deprecated function - this was incorporated into the paffinity framework a long time ago. Fortunately, nobody was actually using it!
This commit was SVN r18990.
2008-07-23 03:43:31 +00:00
Ralph Castain
f32e24ab86 Move the POSIX-specific code out of the paffinity base. Add support for OSX in its own component.
For now, hide the OSX component with .ompi_ignore so only I can see it until I can ensure that it doesn't inadvertently interfere with Linux and Solaris support.

This clears the conflict with Windows.

This commit was SVN r18989.
2008-07-23 03:29:43 +00:00
Ralph Castain
dbc35b60f6 Okay, one last time - get the xml output of the map correct...sigh.
This commit was SVN r18988.
2008-07-23 02:45:08 +00:00
Ralph Castain
76f2659527 Very minor cleanup to slurm support
This commit was SVN r18987.
2008-07-23 02:35:03 +00:00
Ralph Castain
1f665425e7 Fix some compile problems in the LSF support
This commit was SVN r18986.
2008-07-23 02:34:41 +00:00
Jeff Squyres
2f208f885c Fixes trac:1295: change language in openib BTL from IB-specific to be
"OpenFabrics" / neutral (i.e., refer to IB and/or iWARP).

 * Mostly just type, variable/field, and function name changes, such as
   s/hca/device/g, etc.
 * Changed the INI file for the hardware-specific parameters to be
   mca-btl-openib-device-params.ini.
 * Updated a lot of help messages in the help-*.txt files, not just to
   update it to be OpenFabrics/neutral language, but also for some
   consistency of tone, indenting, etc.
 * Deprecated a bunch of MCA params in favor of language-neutral new
   ones:
   * btl_openib_warn_no_hca_params_found (s/hca/device/)
   * btl_openib_hca_param_files
   * btl_openib_ib_cq_size (s/_ib_/_of_/)
   * btl_openib_ib_max_inline_data
   * btl_openib_ib_psn
   * btl_openib_ib_mtu
   * btl_openib_ib_pkey_ix
   * btl_openib_ib_pkey_val

This commit was SVN r18985.

The following Trac tickets were found above:
  Ticket 1295 --> https://svn.open-mpi.org/trac/ompi/ticket/1295
2008-07-23 00:28:59 +00:00
Aurelien Bouteiller
086cb6190e Use the generic version number instead of hardcoded ones
This commit was SVN r18983.
2008-07-22 21:10:51 +00:00
Jon Mason
f80404d991 Add openib error handling during wireup for rdmacm
The rdmacm event handler has no way of reporting fatal errors to the upper
layers.  By calling mca_btl_openib_endpoint_invoke_error in the rdmacm event
handler for the errors encountered, these errors can now be handled
appropriately.

Closes out Ticket #1283

This commit was SVN r18980.
2008-07-22 19:03:13 +00:00
Rolf vandeVaart
ed4920ba5f Fix a couple problems with orte-clean. Also add a new
--debug flag to help developers figure out possible future issues.

This fixes trac:1335.

This commit was SVN r18979.

The following Trac tickets were found above:
  Ticket 1335 --> https://svn.open-mpi.org/trac/ompi/ticket/1335
2008-07-22 17:41:06 +00:00
Ralph Castain
26cfac94e6 Fix a formatting problem with xml output of map
This commit was SVN r18976.
2008-07-22 13:14:02 +00:00
Jeff Squyres
d37a25a2d0 Remove per http://www.open-mpi.org/community/lists/devel/2008/07/4386.php
This commit was SVN r18972.
2008-07-22 00:57:23 +00:00