1
1
Граф коммитов

12228 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
4c558ed637 Enable aggregation checking for "*** An error occurred..." MPI layer
help messages so that users only see the message once instead of N
times when their MPI app crashes.

Note that there is a tradeoff here -- we now call malloc in this
particular "show the error" code path.  This shouldn't usually be a
problem, because the errors typically displayed through this mechanism
are MPI API argument problems (e.g., sending a negative count to
MPI_SEND), and not memory errors.  But such API argument errors could
be a consequence of of a prior memory error, so there's a nonzero
chance that the error failure will fail to print because malloc
failed.  In this case, the user can disable help message aggregation
(via the orte_base_want_aggregate MCA parameter) and we'll fall back
to the no-malloc code path (but without aggregation).

Note that we won't aggregate before MPI_INIT or after MPI_FINALIZE.
So if you call an MPI function before MPI_INIT / after MPI_FINALIZE,
you'll still see the error message N times.  Nothing we can do about
that; we need ORTE to do the aggregation properly (which is obviously
unavailable before MPI_INIT / after MPI_FINALIZE).

This commit was SVN r19611.
2008-09-23 17:19:24 +00:00
Jeff Squyres
a676874f47 Disable global ID resolution when sparse groups are used. Tested by
Terry and George in the non-sparse-groups scenarios.  Fixes trac:1464.

Will file a new ticket to actually resolve IDs when sparse groups are
used.

This commit was SVN r19610.

The following Trac tickets were found above:
  Ticket 1464 --> https://svn.open-mpi.org/trac/ompi/ticket/1464
2008-09-23 16:27:01 +00:00
Pavel Shamis
bd09bbf851 Disabling IBCM support by default.
The component still is not stable.

This commit was SVN r19609.
2008-09-23 15:57:55 +00:00
Ralph Castain
e64b79f30f Modify the --display-map and --display-alloc per note on devel list to reduce info for user understanding.
Add --display-devel-map and --display-devel-alloc to display all the detailed info we used to provide - it is only of use/interest to developers anyway and confuses users.

This commit was SVN r19608.
2008-09-23 15:46:34 +00:00
Josh Hursey
90c936b292 Cleanup BLCR configure logic. Add a '--with-blcr-libdir' option to allow a user to specify a library directory outside of the '--with-blcr' option.
Needs to be moved to v1.3

This commit was SVN r19607.
2008-09-22 19:48:47 +00:00
Josh Hursey
36b824effd Make sure to protect the symbol, so builds that do not involve threads will build properly.
Thanks to Jeff for pointing this out to me.

This commit was SVN r19606.
2008-09-22 19:03:41 +00:00
Kenneth Matney
68248a32ef Add #include for stdio.h to allow make check to run with gcc 4.2.4 (on
Cray XT platform).

This commit was SVN r19605.
2008-09-22 18:00:30 +00:00
Jeff Squyres
e0a991a8c2 Print out a message telling the user how to enable non-aggregated help
/ error messages.

This commit was SVN r19604.
2008-09-22 17:42:56 +00:00
Jeff Squyres
d6696c46a6 Oops -- sometimes we actually pass NULL for the error_code. Make sure
to handle that nicely without segv'ing.

This commit was SVN r19603.
2008-09-22 17:41:39 +00:00
Josh Hursey
0cd65bfaa8 Fix a SIGPIPE that may occur when checkpointing a restarted process. This was a result of calling system() in the BLCR CRS. After inspection and testing it was determined that the operation was no longer necessary. So the call was removed thus fixing the bug.
This commit was SVN r19601.
2008-09-22 16:49:56 +00:00
Jeff Squyres
8eccda391a Fix comment to match the code.
This commit was SVN r19598.
2008-09-20 12:35:48 +00:00
Jeff Squyres
02f2cbe85a * Added bullet about upgrading autotools
* Added bullet about removing duplicate error messages
 * Some minor grammar and syntax fixes.

This commit was SVN r19597.
2008-09-20 11:42:59 +00:00
Jeff Squyres
5fd742e769 Add in the standardized way to notify a debugger if the MPI job is
about to abort.  Fixes trac:1509.

This commit was SVN r19596.

The following Trac tickets were found above:
  Ticket 1509 --> https://svn.open-mpi.org/trac/ompi/ticket/1509
2008-09-20 11:34:37 +00:00
Matthias Jurenz
16561fa297 Added config.h.in to svn:ignore
This commit was SVN r19593.
2008-09-19 15:17:36 +00:00
Matthias Jurenz
5755d35045 Removed - This file will be created by autotools
This commit was SVN r19591.
2008-09-19 15:09:46 +00:00
Jeff Squyres
53967f2b4e Merge in PLPA v1.2rc2 (README fixes, new version of Autotools, and
have PLPA report its version correctly).

This commit was SVN r19590.
2008-09-19 15:05:03 +00:00
Jeff Squyres
b1ff61b19e Update to PLPA v1.2rc1
This commit was SVN r19589.
2008-09-19 14:49:53 +00:00
Jeff Squyres
7d119a1c3b Fix CID 1116: ensure to check return code (patch approved by George
:-) ).

This commit was SVN r19584.
2008-09-19 13:28:04 +00:00
Jeff Squyres
d0a8be6d2f Fix CID 1117: ensure to check return values.
This commit was SVN r19583.
2008-09-19 13:27:30 +00:00
Lenny Verkhovsky
ca0a5ea60b Fixed the warnings on the crays.
base/paffinity_base_service.c:153: warning: 'phys_core' may be used uninitialized in this function
base/paffinity_base_service.c:153: note: 'phys_core' was declared here

This commit was SVN r19580.
2008-09-18 11:31:12 +00:00
Matthias Jurenz
d42592113b Fixed compiler warning (unused variable)
This commit was SVN r19577.
2008-09-17 14:39:19 +00:00
Josh Hursey
778e387618 fix a compiler warning
This commit was SVN r19574.
2008-09-17 14:01:31 +00:00
Josh Hursey
80d05cf957 Cleanup the patch from r19566.
Thanks to George and Jeff for pointing out a better way to do this.

This commit was SVN r19573.

The following SVN revision numbers were found above:
  r19566 --> open-mpi/ompi@351c3a3a86
2008-09-17 13:55:21 +00:00
Jeff Squyres
d2d06008a0 Change the default value of mpi_leave_pinned to -1, meaning that we'll
figure it out at runtime (really meaning: we'll still default to "0"
unless something explicitly overrides to 1, such as the openib BTL).
This way, ompi_info doesn't confusingly report mpi_leave_pinned==0 for
mpi_leave_pinned, but we end up running with mpi_leave_pinned==1.

Fixes trac:1502.

This commit was SVN r19571.

The following Trac tickets were found above:
  Ticket 1502 --> https://svn.open-mpi.org/trac/ompi/ticket/1502
2008-09-16 22:06:14 +00:00
Josh Hursey
351c3a3a86 The ft_event function needs access to the bml_r2_remove_btl_progress() to ensure
that all progress events are flushed as needed across a checkpoint/restart.

This commit was SVN r19566.
2008-09-16 19:06:53 +00:00
Jeff Squyres
270f482fea Addendum to r19561: also remove a comment that is no longer true and
some code that is commented out.

This commit was SVN r19564.

The following SVN revision numbers were found above:
  r19561 --> open-mpi/ompi@17e65369be
2008-09-16 13:02:10 +00:00
George Bosilca
6a9514ee08 Make the code match the comment. I checked with Jelena, and based on the papers we
published this is the expected algorithm for the specified message and communicator
size.

This commit closes ticket #1330.

This commit was SVN r19563.
2008-09-15 23:28:40 +00:00
George Bosilca
acd3406aa7 Never drop messages. No never no more.
This is supposed to fix the ticket #1460.

This commit was SVN r19562.
2008-09-15 23:04:18 +00:00
George Bosilca
17e65369be Fix the deadlock when we run out of resources on the BTLs. Move the progress
function from the BML into the PML. The BTL progress functions are now directly
registered with the event library.

This commit was SVN r19561.
2008-09-15 22:56:23 +00:00
Shiqing Fan
68f6fdf111 - a small fix for windows, use different environment separators based on the system type.
This commit was SVN r19554.
2008-09-15 15:05:47 +00:00
Ralph Castain
5f3861572d Update LANL platform files
This commit was SVN r19552.
2008-09-12 16:48:31 +00:00
Jeff Squyres
4cf909a932 Fix a typpppo (thanks Rolf!).
This commit was SVN r19549.
2008-09-11 21:43:01 +00:00
Jeff Squyres
f794580bbe Print a [much] better error message when MPI processes are unable to
reach each other (this problem just bit me; I had forgotten how horrid
our previous error message was).

This commit was SVN r19548.
2008-09-11 20:52:58 +00:00
Jeff Squyres
eeabae49b9 Per http://www.open-mpi.org/community/lists/devel/2008/09/4648.php,
remove the unconditional opal_output's when mmap() fails, and instead,
conditionally output the failure message via btl_base_verbose settings.

This commit was SVN r19547.
2008-09-11 19:02:33 +00:00
Ralph Castain
e44ac3f36d Remove some historical, but now unused, cruft
This commit was SVN r19545.
2008-09-11 12:49:11 +00:00
Rainer Keller
5c58b4e7cd - Move the OMPI_DECLSPEC from .c to .h
file. breaks windows compilation. see r19502

This commit was SVN r19544.

The following SVN revision numbers were found above:
  r19502 --> open-mpi/ompi@ce42e749a0
2008-09-11 12:26:33 +00:00
Ralph Castain
16e4b0b698 Ensure that a child job inherits its parent job's prefix dir during comm_spawn operations
This commit was SVN r19538.
2008-09-10 19:05:23 +00:00
Josh Hursey
36185ad964 Replace the old coordinated component ('coord') and replace it with a much more refined version ('bkmrk').
The new component fixes a number of problems with the old component. The core algorithm is the same, but by changing the data strucutres a bit we have improved performance and memory utilization.

There are still a couple corner cases that still need some work. However, I did not want to delay bringing this into the trunk (and v1.3 branch) for too much longer.

This commit was SVN r19537.
2008-09-10 18:29:17 +00:00
Rolf vandeVaart
1ad9d0459e Add a check for LOCK_SHARED in the sys/synch.h file. If it exists then smash it to avoid problems with preprocessor and C++.
This fixes trac:1477.

Help provided by Jeff and Terry.

This commit was SVN r19533.

The following Trac tickets were found above:
  Ticket 1477 --> https://svn.open-mpi.org/trac/ompi/ticket/1477
2008-09-10 12:58:30 +00:00
Ralph Castain
f326ee356e Add some error output to the plm rsh
This commit was SVN r19532.
2008-09-10 01:59:49 +00:00
Ralph Castain
20ece3cb86 Add new test that stresses MPI send/recv
This commit was SVN r19530.
2008-09-09 15:47:31 +00:00
Rolf vandeVaart
3193cf1008 Fix workaround for Sun Studio compilers and message queue when compiling with threads.
This commit was SVN r19528.
2008-09-09 13:46:36 +00:00
Jeff Squyres
0ae2c27d3b Ensure that the mutex is properly constructed/destructed.
This commit was SVN r19527.
2008-09-09 12:57:45 +00:00
Jeff Squyres
4b5de753d4 Bring in new PLPA v1.2b5 to fix a typo found by Lenny.
This commit was SVN r19526.
2008-09-09 12:29:31 +00:00
Rainer Keller
1d7fd1f51d - Initialize the lock as well
Found by Christoph Niethammer using the mpi_test_suite

This commit was SVN r19523.
2008-09-09 08:01:41 +00:00
Shiqing Fan
8558ba1f51 - Remove the duplicated declarations, which causes linkage errors and warning when building shared libraries on Windows.
This commit was SVN r19520.
2008-09-08 16:53:26 +00:00
Ralph Castain
c0d7fbaf88 A few mapping cleanups - mostly aimed to properly balancing loads so multi app-context comm_spawns don't dump everything on one node.
This commit was SVN r19519.
2008-09-08 15:45:55 +00:00
Ralph Castain
9b8473fdbf Cleanup orted cmd line - we don't need to pass nodenames, and shouldn't pass heartbeat unless the orted is going to use it. This helps shorten the cmd line for future use.
Cleanup when an orted actually opens the PLM. Unfortunately, some unmentionable people are pushing head node environs out to remote nodes, causing the daemons to think they are the HNP. This helps prevent the confusion.

This commit was SVN r19518.
2008-09-08 15:45:11 +00:00
Shiqing Fan
04ee20a880 - Mainly type casts. Microsoft VC++ compiler is too strict.
This commit was SVN r19517.
2008-09-08 15:39:30 +00:00
Jeff Squyres
2f50fc8b92 Add comment about MPI::Comm::Call_errhandler.
This commit was SVN r19515.
2008-09-08 14:43:22 +00:00