Commit graph

12010 commits

Author SHA1 Message Date
Thomas Herault
28dc80b67e Deal with the SIGCHLD issue in LSF.
lsb_launch tampers with SIGCHLD signal handler. We are forced to reinstall our own signal handler after a call to this function.

This commit fixes trac:1356.

This commit was SVN r19033.

The following Trac tickets were found above:
  Ticket 1356 --> https://svn.open-mpi.org/trac/ompi/ticket/1356
2008-07-25 15:23:23 +00:00
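The workaround described in 28dc80b67e (putting our own SIGCHLD handler back after a call that replaces it) can be sketched with plain POSIX `sigaction`. This is a minimal illustration, not the Open MPI code: `tampering_launch` is a hypothetical stand-in for `lsb_launch`, and every other function name here is made up for the example. The sketch saves the current disposition and restores it afterwards, which has the same effect as reinstalling the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Our own SIGCHLD handler. A real one would reap children with waitpid(). */
void my_sigchld_handler(int signo)
{
    (void)signo;
}

/* Install my_sigchld_handler for SIGCHLD. Returns 0 on success. */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = my_sigchld_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical stand-in for a launcher that, as a side effect,
 * resets the SIGCHLD disposition to the default. */
int tampering_launch(void)
{
    struct sigaction dfl;
    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    return sigaction(SIGCHLD, &dfl, NULL);
}

/* Save the current SIGCHLD disposition, call the launcher, then
 * restore the saved disposition regardless of what the launcher did. */
int launch_and_restore_sigchld(int (*launch)(void))
{
    struct sigaction saved;
    int rc;

    assert(launch != NULL);
    if (sigaction(SIGCHLD, NULL, &saved) != 0) {
        return -1;
    }
    rc = launch();
    if (sigaction(SIGCHLD, &saved, NULL) != 0) {
        return -1;
    }
    return rc;
}

/* Return 1 if my_sigchld_handler is currently installed, else 0. */
int sigchld_handler_is_ours(void)
{
    struct sigaction cur;
    if (sigaction(SIGCHLD, NULL, &cur) != 0) {
        return 0;
    }
    return cur.sa_handler == my_sigchld_handler;
}
```

Calling `tampering_launch()` directly leaves SIGCHLD at its default disposition, while going through `launch_and_restore_sigchld(tampering_launch)` leaves the handler in place.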
Ralph Castain
7e6e104fc3 Add more debugging to the RML when it fails to find a route - specifically, have it print a stacktrace so we can figure out where it came from.
This commit was SVN r19032.
2008-07-25 15:01:41 +00:00
Ralph Castain
42c134cb32 Silence stupid compiler warning - and a certain someone who keeps reminding me of it... :-)
This commit was SVN r19031.
2008-07-25 14:01:06 +00:00
Ralph Castain
a1d296ae03 This commit fixes ticket #1410
Fix a few bugs in the mappers:

1. Ensure that bynode with no -np fills all available slots - it just does so with the ranks set bynode instead of byslot

2. fix --nolocal behavior so it works correctly in all cases. We still have to test the host's name using opal_ifislocal in the mapper because the name returned by gethostname to orte_process_info.hostname can be an FQDN, but a hostfile may contain a non-FQDN version.

3. Add missing --nolocal logic to the seq mapper

Oversubscribed mapping seemed to be working okay without repair, so I couldn't verify my own bug report in that regard.

Also included are some preliminary changes to support the modified hostfile behavior, which will be committed shortly:

1. removed the totally useless "allocate" field in the orte_node_t object since every node is automatically allocated for use - and everything ignored the field anyway

2. correctly initialize the slots_alloc field when the allocation is read

This commit was SVN r19030.
2008-07-25 13:35:12 +00:00
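The difference between byslot and bynode rank placement mentioned in a1d296ae03 can be illustrated with a small sketch. It assumes that "fills all available slots" means one rank per slot when no -np is given. The function names, the `SKETCH_MAX_NODES` limit, and the flat `rank_to_node` array are hypothetical and do not reflect the actual rmaps data structures.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 16

/* Sum of the slot counts over all nodes. */
int total_slots(const int *slots, int nnodes)
{
    int total = 0;
    for (int n = 0; n < nnodes; n++) {
        total += slots[n];
    }
    return total;
}

/* byslot: fill every slot of node 0, then node 1, and so on.
 * Returns the number of ranks placed, or -1 if the output is too small. */
int map_byslot(const int *slots, int nnodes, int *rank_to_node, int max_ranks)
{
    int rank = 0;
    assert(slots != NULL && rank_to_node != NULL);
    if (total_slots(slots, nnodes) > max_ranks) {
        return -1;
    }
    for (int n = 0; n < nnodes; n++) {
        for (int s = 0; s < slots[n]; s++) {
            rank_to_node[rank++] = n;
        }
    }
    return rank;
}

/* bynode: hand out ranks round-robin across nodes, skipping nodes that
 * are already full, until every slot on every node is occupied. */
int map_bynode(const int *slots, int nnodes, int *rank_to_node, int max_ranks)
{
    int used[SKETCH_MAX_NODES] = {0};
    int total, rank = 0;

    assert(slots != NULL && rank_to_node != NULL);
    total = total_slots(slots, nnodes);
    if (nnodes > SKETCH_MAX_NODES || total > max_ranks) {
        return -1;
    }
    while (rank < total) {
        for (int n = 0; n < nnodes && rank < total; n++) {
            if (used[n] < slots[n]) {
                rank_to_node[rank++] = n;
                used[n]++;
            }
        }
    }
    return rank;
}
```

For three nodes with 2, 1, and 3 slots, byslot places ranks 0..5 on nodes 0, 0, 1, 2, 2, 2, while bynode places them on nodes 0, 1, 2, 0, 2, 2. Both fill all six slots; only the rank order differs.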
Jeff Squyres
31df89ccb2 Add bullet about mpi_leave_pinned and the openib BTL
This commit was SVN r19029.
2008-07-25 11:29:37 +00:00
Jeff Squyres
4adc4a632a This option is neither documented nor implemented.
This commit was SVN r19027.
2008-07-24 23:37:16 +00:00
Jeff Squyres
e3e79c0881 Fixes trac:1379:
 * Use synonym/deprecated MCA param API for some mca base params
 * In openib BTL, if we have appropriate memory hooks support, and if
   mpi_leave_pinned and mpi_leave_pinned_pipeline were not set by the
   user, set mpi_leave_pinned to 1.
 * Defer checking mpi_leave_pinned_* until as late as possible (i.e.,
   until after the btl's have had a chance to set mpi_leave_pinned to
   1):
   * in ob1 pml
   * in rdma mpool

This commit was SVN r19022.

The following Trac tickets were found above:
  Ticket 1379 --> https://svn.open-mpi.org/trac/ompi/ticket/1379
2008-07-24 22:51:26 +00:00
Josh Hursey
ca43968418 Fix a deadlock scenario when registering deprecated MCA parameters. The internal loop uses the 'item' variable that is used by the outer loop as well. So when the outer loop checks the value of 'item' it will never equal the end of the list since it no longer references the same list.
Kinda found by MTT. MTT calls 'ompi_info --all --parsable' and it was livelocked and had to be killed by hand.

I'm going to push this one to Jeff to push to v1.3 since he did the original implementation and should check this code.

This commit was SVN r19014.
2008-07-24 15:51:54 +00:00
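The bug class fixed in ca43968418 is easy to reproduce with any sentinel-terminated list: if an inner loop reuses the outer loop's cursor, the outer loop ends up comparing a pointer into the inner list against the outer list's end marker, and the comparison never succeeds. The sketch below shows the corrected shape with a separate cursor per loop. The `struct list` and `struct item` types are hypothetical and only loosely modeled on a circular list with a sentinel; they are not the OPAL list API.

```c
#include <assert.h>
#include <stddef.h>

struct list;

struct item {
    struct item *next;
    struct list *synonyms;   /* optional nested list, may be NULL */
};

struct list {
    struct item sentinel;    /* sentinel.next is the first item;
                                the last item points back to the sentinel */
};

void list_init(struct list *l)
{
    assert(l != NULL);
    l->sentinel.next = &l->sentinel;
    l->sentinel.synonyms = NULL;
}

void list_prepend(struct list *l, struct item *it)
{
    it->next = l->sentinel.next;
    l->sentinel.next = it;
}

/* Count the entries of every nested list. The inner loop walks with its
 * own cursor 'syn', so the outer cursor 'item' always stays on the outer
 * list and eventually reaches the outer sentinel. */
size_t count_synonyms(struct list *params)
{
    size_t total = 0;
    struct item *item;
    struct item *syn;

    for (item = params->sentinel.next;
         item != &params->sentinel;
         item = item->next) {
        if (item->synonyms == NULL) {
            continue;
        }
        for (syn = item->synonyms->sentinel.next;
             syn != &item->synonyms->sentinel;
             syn = syn->next) {
            total++;
        }
    }
    return total;
}
```

Had the inner loop assigned to `item` instead of `syn`, the outer loop would exit the inner loop with `item` equal to the inner sentinel and then keep circling the inner list, which matches the livelock described in the commit message.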
George Bosilca
6c21851160 Only register the BTL progress function if there is a need for it. This requires
a little bit more than "BTL was able to add some procs". The real condition to
allow the BTL progress is that we will use it to send/recv data to/from some
of the peers (this includes the BTL exclusivity in the process).

This commit was SVN r19010.
2008-07-24 10:33:17 +00:00
Ralph Castain
fdb2408bf2 Rename the osx paffinity component the "posix" component since it really has nothing osx specific in it - it is just a generic posix call to determine #processors. Set the priority low so that both linux and solaris components override it if they build. It shouldn't build in Windows at all.
Modify the odls to remove a (size_t) typecast in front of the num_processors variable just in case it is returned negative. This usually is accompanied by an opal_error, so this shouldn't make any difference - but it is more technically correct.

This commit was SVN r19008.
2008-07-24 01:54:51 +00:00
Ralph Castain
d880d6282a Update LANL platform files
This commit was SVN r19005.
2008-07-23 18:57:03 +00:00
Lenny Verkhovsky
b4d54dda57 Fixed possible segfault when using RANKFILE, but not all ranks assigned
Fixed allocation of all ranks when using RANKFILE, but not all ranks assigned
Abort a little earlier if using RANKFILE, but np wasn't specified
Clean mca_rmaps_rank_file_component.debug

This commit was SVN r19004.
2008-07-23 17:44:02 +00:00
Shiqing Fan
0646cd2491 - Move wait object instance code out of the #ifdef block, so that systems with waitpid and Windows can both use it. Thanks to Ralph.
This commit was SVN r19003.
2008-07-23 16:20:42 +00:00
Jeff Squyres
1fd5b0402a Refs trac:1250
 * Fix linux paffinity component to make a "best" guess when PLPA
   can't find topology information in the Linux kernel.  That is, if
   PLPA can't tell us the max_processor_id, just assume that it's the
   same as the number of processors.  If you have a more complex
   system than that (e.g., you have holes in your available processor
   IDs), you'll likely be running a Linux kernel that supports the
   topology information, and this problem won't happen.
 * Make sure to convert the return codes from PLPA to OPAL_ERR* codes.

This commit was SVN r19001.

The following Trac tickets were found above:
  Ticket 1250 --> https://svn.open-mpi.org/trac/ompi/ticket/1250
2008-07-23 15:47:43 +00:00
Ralph Castain
e3c3d28bf1 Add some more debugging to tell us how many processors were found when setting sched_yield
This commit was SVN r18999.
2008-07-23 15:28:51 +00:00
Thomas Herault
b6affd35e9 Small typos for LSF compilation and update Makefile.am
This commit was SVN r18998.
2008-07-23 14:42:26 +00:00
Jeff Squyres
5b9219565c Remove the use of __cpu_to_be64() and replace it with hton64().
This commit was SVN r18995.
2008-07-23 12:08:55 +00:00
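Commit 5b9219565c swaps the Linux-specific `__cpu_to_be64()` for `hton64()`. As an illustration of what a portable 64-bit host-to-network conversion has to do, here is a minimal sketch that works regardless of host endianness. The name `hton64_sketch` is hypothetical, and this is not claimed to be how Open MPI implements `hton64()`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Convert a 64-bit value from host byte order to network (big-endian)
 * byte order. The bytes are produced by shifting, so the result does not
 * depend on the endianness of the machine running the code. */
uint64_t hton64_sketch(uint64_t host)
{
    unsigned char bytes[8];
    uint64_t net;

    assert(CHAR_BIT == 8);   /* the sketch assumes 8-bit bytes */
    for (int i = 0; i < 8; i++) {
        /* Most significant byte goes first in memory. */
        bytes[i] = (unsigned char)(host >> (56 - 8 * i));
    }
    memcpy(&net, bytes, sizeof net);
    return net;
}
```

The returned value is meant to be written to the wire (or copied into a buffer) as-is; on a big-endian host it equals the input, on a little-endian host its bytes are reversed.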
Shiqing Fan
5f021e47a9 - Add support for get_processor_info in windows paffinity module.
This commit was SVN r18992.
2008-07-23 07:59:03 +00:00
Ralph Castain
76600d9e51 Set properties on new component
This commit was SVN r18991.
2008-07-23 04:11:30 +00:00
Ralph Castain
83e7c19d33 Remove deprecated function - this was incorporated into the paffinity framework a long time ago. Fortunately, nobody was actually using it!
This commit was SVN r18990.
2008-07-23 03:43:31 +00:00
Ralph Castain
f32e24ab86 Move the POSIX-specific code out of the paffinity base. Add support for OSX in its own component.
For now, hide the OSX component with .ompi_ignore so only I can see it until I can ensure that it doesn't inadvertently interfere with Linux and Solaris support.

This clears the conflict with Windows.

This commit was SVN r18989.
2008-07-23 03:29:43 +00:00
Ralph Castain
dbc35b60f6 Okay, one last time - get the xml output of the map correct...sigh.
This commit was SVN r18988.
2008-07-23 02:45:08 +00:00
Ralph Castain
76f2659527 Very minor cleanup to slurm support
This commit was SVN r18987.
2008-07-23 02:35:03 +00:00
Ralph Castain
1f665425e7 Fix some compile problems in the LSF support
This commit was SVN r18986.
2008-07-23 02:34:41 +00:00
Jeff Squyres
2f208f885c Fixes trac:1295: change language in openib BTL from IB-specific to be
"OpenFabrics" / neutral (i.e., refer to IB and/or iWARP).

 * Mostly just type, variable/field, and function name changes, such as
   s/hca/device/g, etc.
 * Changed the INI file for the hardware-specific parameters to be
   mca-btl-openib-device-params.ini.
 * Updated a lot of help messages in the help-*.txt files, not just to
   update it to be OpenFabrics/neutral language, but also for some
   consistency of tone, indenting, etc.
 * Deprecated a bunch of MCA params in favor of language-neutral new
   ones:
   * btl_openib_warn_no_hca_params_found (s/hca/device/)
   * btl_openib_hca_param_files
   * btl_openib_ib_cq_size (s/_ib_/_of_/)
   * btl_openib_ib_max_inline_data
   * btl_openib_ib_psn
   * btl_openib_ib_mtu
   * btl_openib_ib_pkey_ix
   * btl_openib_ib_pkey_val

This commit was SVN r18985.

The following Trac tickets were found above:
  Ticket 1295 --> https://svn.open-mpi.org/trac/ompi/ticket/1295
2008-07-23 00:28:59 +00:00
Aurelien Bouteiller
086cb6190e Use the generic version number instead of hardcoded ones
This commit was SVN r18983.
2008-07-22 21:10:51 +00:00
Jon Mason
f80404d991 Add openib error handling during wireup for rdmacm
The rdmacm event handler has no way of reporting fatal errors to the upper
layers.  By calling mca_btl_openib_endpoint_invoke_error in the rdmacm event
handler for the errors encountered, these errors can now be handled
appropriately.

Closes out Ticket #1283

This commit was SVN r18980.
2008-07-22 19:03:13 +00:00
Rolf vandeVaart
ed4920ba5f Fix a couple problems with orte-clean. Also add a new
--debug flag to help developers figure out possible future issues.

This fixes trac:1335.

This commit was SVN r18979.

The following Trac tickets were found above:
  Ticket 1335 --> https://svn.open-mpi.org/trac/ompi/ticket/1335
2008-07-22 17:41:06 +00:00
Ralph Castain
26cfac94e6 Fix a formatting problem with xml output of map
This commit was SVN r18976.
2008-07-22 13:14:02 +00:00
Jeff Squyres
d37a25a2d0 Remove per http://www.open-mpi.org/community/lists/devel/2008/07/4386.php
This commit was SVN r18972.
2008-07-22 00:57:23 +00:00
Ralph Castain
a4f0fa6e3a Update the routed framework to:
1. add a new API delete_route(orte_process_name_t*) to delete the specified proc from the routing table

2. modify update_route so that it actually updates pre-existing routes instead of only adding routing info to the end of the hash table

This fixes ticket #1403

This commit was SVN r18970.
2008-07-21 21:37:09 +00:00
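The two behaviors described in a4f0fa6e3a (an update that overwrites an existing entry instead of appending a duplicate, plus an explicit delete) can be sketched with a small fixed-size table. Everything here is hypothetical: the commit refers to a hash table keyed by `orte_process_name_t`, whereas this sketch uses plain `int` keys, a linear scan, and `sketch_`-prefixed names that are not part of the routed framework.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ROUTES 8

struct sketch_route {
    int in_use;
    int target;     /* process we want to reach */
    int next_hop;   /* process to send through */
};

struct sketch_table {
    struct sketch_route slots[SKETCH_MAX_ROUTES];
};

void sketch_table_init(struct sketch_table *t)
{
    assert(t != NULL);
    for (int i = 0; i < SKETCH_MAX_ROUTES; i++) {
        t->slots[i].in_use = 0;
    }
}

static struct sketch_route *sketch_find(struct sketch_table *t, int target)
{
    for (int i = 0; i < SKETCH_MAX_ROUTES; i++) {
        if (t->slots[i].in_use && t->slots[i].target == target) {
            return &t->slots[i];
        }
    }
    return NULL;
}

/* Overwrite the route if the target is already known; otherwise take the
 * first free slot. Returns 0 on success, -1 if the table is full. */
int sketch_update_route(struct sketch_table *t, int target, int next_hop)
{
    struct sketch_route *r = sketch_find(t, target);
    if (r != NULL) {
        r->next_hop = next_hop;
        return 0;
    }
    for (int i = 0; i < SKETCH_MAX_ROUTES; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].in_use = 1;
            t->slots[i].target = target;
            t->slots[i].next_hop = next_hop;
            return 0;
        }
    }
    return -1;
}

/* Remove the route for the target. Returns 0 if removed, -1 if unknown. */
int sketch_delete_route(struct sketch_table *t, int target)
{
    struct sketch_route *r = sketch_find(t, target);
    if (r == NULL) {
        return -1;
    }
    r->in_use = 0;
    return 0;
}

/* Return the next hop for the target, or -1 if there is no route. */
int sketch_lookup_route(struct sketch_table *t, int target)
{
    struct sketch_route *r = sketch_find(t, target);
    return r != NULL ? r->next_hop : -1;
}

/* Number of routes currently stored. */
int sketch_count_routes(const struct sketch_table *t)
{
    int n = 0;
    for (int i = 0; i < SKETCH_MAX_ROUTES; i++) {
        n += t->slots[i].in_use ? 1 : 0;
    }
    return n;
}
```

Updating the same target twice leaves a single entry holding the latest next hop, which is the property the commit message says the earlier append-only behavior lacked.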
Jeff Squyres
4180667adb configure.params should not be svn:ignore'ed
This commit was SVN r18969.
2008-07-21 21:18:43 +00:00
Jeff Squyres
148a774ba3 For the lazy: a first-cut .hgignore file to ignore .svn directories
This commit was SVN r18968.
2008-07-21 21:05:58 +00:00
Jeff Squyres
25bcf0f1d3 Oops -- cut-n-paste-error -- use OMPI in the OMPI layer...
This commit was SVN r18966.
2008-07-21 20:07:37 +00:00
Jeff Squyres
54dbd95243 Fix some component version numbers to be the same as the OMPI release
This commit was SVN r18965.
2008-07-21 20:05:29 +00:00
Ralph Castain
3137ed9255 Update the manpages for comm_spawn(_multiple) - add man page to explain host/hostfile behavior
This commit was SVN r18961.
2008-07-21 17:58:12 +00:00
Ralph Castain
28ca14297c Add minimal support (#processors only) for OSX and other systems that don't have paffinity modules.
This commit was SVN r18959.
2008-07-21 16:54:14 +00:00
George Bosilca
bcac9a0540 Remove a warning about using map when it is not initialized.
This commit was SVN r18957.
2008-07-21 14:35:05 +00:00
George Bosilca
4f9ea0155b Remove 2 compiler warnings.
This commit was SVN r18956.
2008-07-21 12:55:40 +00:00
Pavel Shamis
849a8f86a7 Bug fix for #1388 - fixing ib_cm_listen() random failures.
This commit was SVN r18952.
2008-07-20 06:21:32 +00:00
Jeff Squyres
750ea30961 So apparently my clever fix in r18873 was not good -- apparently, we
can have a pub_endpoint and a sub_endpoint that are not equal but go
to the same place (fd).  I didn't think that that was possible. :-\

So just use a bool to track whether we have forwarded the fragment at
all; if we have, then don't forward to the sub_endpoint.

IOF is going to be re-written for v1.4.

This commit was SVN r18950.

The following SVN revision numbers were found above:
  r18873 --> open-mpi/ompi@773c92a6eb
2008-07-18 20:04:26 +00:00
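The approach described in 750ea30961 (track with a bool whether the fragment has already been forwarded, rather than comparing endpoints) can be sketched as follows. The `struct endpoint`, the callback type, and the function names are hypothetical and much simpler than the real IOF code; the point is only that two distinct endpoint objects may share one file descriptor, so pointer inequality is not enough to avoid a duplicate write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct endpoint {
    int fd;
};

typedef void (*write_fn)(int fd, void *ctx);

/* Example callback: counts how many writes were issued. */
void count_write(int fd, void *ctx)
{
    (void)fd;
    ++*(int *)ctx;
}

/* Forward a fragment at most once. If it went to the pub endpoint, skip
 * the sub endpoint, because both may lead to the same fd even when the
 * two endpoint objects are different. Returns the number of writes. */
int forward_fragment(const struct endpoint *pub_ep,
                     const struct endpoint *sub_ep,
                     write_fn do_write, void *ctx)
{
    bool forwarded = false;
    int writes = 0;

    assert(do_write != NULL);
    if (pub_ep != NULL) {
        do_write(pub_ep->fd, ctx);
        forwarded = true;
        writes++;
    }
    if (!forwarded && sub_ep != NULL) {
        do_write(sub_ep->fd, ctx);
        writes++;
    }
    return writes;
}
```

With a pub and a sub endpoint that are separate objects but carry the same fd, the fragment is written once; with only a sub endpoint, it is still delivered.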
Ethan Mallove
294f07a13d Fixes trac:1401 (-xvector* needs to be counted as an optimization flag for orterun to compile)
This commit was SVN r18947.

The following Trac tickets were found above:
  Ticket 1401 --> https://svn.open-mpi.org/trac/ompi/ticket/1401
2008-07-18 19:19:22 +00:00
Ralph Castain
6135943382 Update the paffinity call in the ODLS so we retrieve the number of processors on the local node, thus allowing us to correctly set the sched_yield parameter.
This commit was SVN r18946.
2008-07-18 19:19:16 +00:00
Jeff Squyres
e28f71f6e6 Have ompi_info also output where the value was set from (default,
environment, file, or API override).

Refs trac:1397

This commit was SVN r18943.

The following Trac tickets were found above:
  Ticket 1397 --> https://svn.open-mpi.org/trac/ompi/ticket/1397
2008-07-18 10:52:21 +00:00
Rolf vandeVaart
9c080b27d6 Fix for bug when running 64-bit heterogeneous.
This commit fixes trac:1341.

This commit was SVN r18940.

The following Trac tickets were found above:
  Ticket 1341 --> https://svn.open-mpi.org/trac/ompi/ticket/1341
2008-07-17 19:04:40 +00:00
George Bosilca
3ba0a8c0c1 In the case where the environment is homogeneous we can ALWAYS create
the receiver convertor when we create the request (as we know all
architectures are identical).

This commit was SVN r18934.
2008-07-17 04:57:55 +00:00
George Bosilca
902a2892b6 Fix typo.
This commit was SVN r18933.
2008-07-17 04:55:23 +00:00
George Bosilca
cb66115512 Add more optimizations in the case where heterogeneous support
is not enabled.

This commit was SVN r18932.
2008-07-17 04:54:47 +00:00
George Bosilca
939fa3001d Small cleanups. Remove some switch cases that cannot be reached. Rename
a struct field.

This commit was SVN r18931.
2008-07-17 04:50:39 +00:00
George Bosilca
319a8b3219 Once matched, the proc attached to the request should be the source
of the message and not the first on the list. This fixes ticket
#1386.

This commit was SVN r18929.
2008-07-17 03:04:28 +00:00