1
1
Граф коммитов

14640 Коммитов

Автор SHA1 Сообщение Дата
Edgar Gabriel
5881719d84 checks for sendcount and recvcount(s) being zero have slightly different
consequences depending on whether the communicator is an intra or an inter
communicator. 

fixes trac:2415

This commit was SVN r23187.

The following Trac tickets were found above:
  Ticket 2415 --> https://svn.open-mpi.org/trac/ompi/ticket/2415
2010-05-20 22:21:26 +00:00
Ethan Mallove
e751f3c21c Add a check for a duplicate rank assignment in the rankfile parser (Fixes trac:2414)
This commit was SVN r23186.

The following Trac tickets were found above:
  Ticket 2414 --> https://svn.open-mpi.org/trac/ompi/ticket/2414
2010-05-20 18:38:03 +00:00
Ralph Castain
ef3c88cbd2 If we have ordered jobs to terminate, then we should ignore comm_failed reports from daemons as they may be dropping out
This commit was SVN r23185.
2010-05-20 12:37:09 +00:00
Ralph Castain
05e05089b8 Ignore failed comm connections if it is our connection that failed
This commit was SVN r23184.
2010-05-20 03:13:09 +00:00
Jeff Squyres
6d0465971e An obvious optimization. :-)
This commit was SVN r23183.
2010-05-20 00:41:36 +00:00
Jeff Squyres
5eff7ae63b AC_USE_SYSTEM_EXTENSIONS will modify CFLAGS if nothing was in there
beforehand.  We don't want that.  So if there was nothing in
CFLAGS, put nothing back in there.

This commit was SVN r23182.
2010-05-20 00:25:23 +00:00
Abhishek Kulkarni
abe13d802c Silence warnings by commenting out unused functions in the "hnp" notifier component.
This commit was SVN r23181.
2010-05-19 22:46:05 +00:00
Abhishek Kulkarni
118ce0e166 OMPI FTB component updates
* register FTB events from an event schema file
  * define more FTB events
  * minor fixes

This commit was SVN r23180.
2010-05-19 22:05:06 +00:00
George Bosilca
b56ab33ff6 Indent and fix some uninitialized variables.
This commit was SVN r23179.
2010-05-19 21:20:33 +00:00
George Bosilca
c51932c250 Don't forget to initialize "line" in all cases.
This commit was SVN r23178.
2010-05-19 21:19:45 +00:00
Josh Hursey
71fa89aca5 Move the sos_init() after the initialization of opal_show_help.
I was getting a funny segv if the param_register failed, and show_help was not initialized yet.

This commit was SVN r23177.
2010-05-19 20:47:05 +00:00
Josh Hursey
77532c9f44 minor test fix, found my MTT
This commit was SVN r23176.
2010-05-19 17:02:13 +00:00
Ralph Castain
c7d7a18318 Little more cleanup from SOS
This commit was SVN r23175.
2010-05-19 16:28:58 +00:00
Rolf vandeVaart
03b3e75f86 Add two arguments to the PML error callback function. This
allows the BTL to specify a specific ompi_proc_t that had an
error.  Also add an optional descriptive string.  Currently, arguments
are not used but will be by future failover PML. 
Changes based on RFC.  Reviewed by George Bosilca.

This commit was SVN r23174.
2010-05-19 11:55:45 +00:00
Abhishek Kulkarni
c63c4d6892 Fix bugs where (OMPI_ERROR == *) checks cannot be converted to (OMPI_SUCCESS != *) since the return codes are overloaded to return an "index" on success.
The fix is to just check if the return value is positive or not, since all the SOS encoded errors are *always* negative.

The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index as a pointer.

This commit was SVN r23173.
2010-05-18 20:54:11 +00:00
Josh Hursey
499b90e1e8 some svn ignore fixes
This commit was SVN r23172.
2010-05-18 19:04:05 +00:00
Jeff Squyres
32417b9802 Bump up to hwloc v1.0.
This commit was SVN r23171.
2010-05-18 17:11:45 +00:00
Josh Hursey
f57e73d4e5 add a few more missing SOS includes
This commit was SVN r23168.
2010-05-18 15:00:07 +00:00
Rolf vandeVaart
cdd2d09c69 Fix broken compile.
This commit was SVN r23167.
2010-05-18 12:43:21 +00:00
Jeff Squyres
76d1bb3c96 Per some off-list discussion with Ralph and George, MPIR_being_debugged
really should be volatile.

This commit was SVN r23166.
2010-05-18 11:42:41 +00:00
Abhishek Kulkarni
260eaa24dd Add a missing header. Bleh.
This commit was SVN r23165.
2010-05-17 23:55:53 +00:00
Abhishek Kulkarni
8335ff3893 Fixing a branch merge issue. Looks like we picked the wrong branch when resolving
a conflict.

This commit was SVN r23164.
2010-05-17 23:49:46 +00:00
Abhishek Kulkarni
0b3e5f5d79 Silence a opal_sos compiler warning.
This commit was SVN r23163.
2010-05-17 23:14:44 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
Abhishek Kulkarni
f7f4dd87ab Change ORTE_ERR_LOG macro to make it SOS-aware.
If the logged error is an SOS-encoded error, use the SOS log function otherwise
use the existing errmgr log function.

This commit was SVN r23161.
2010-05-17 23:02:13 +00:00
Abhishek Kulkarni
b0e963299a Adding a new function to return the stack trace (not including the call to the function itself)
as a string (which must be freed by the caller).

This commit was SVN r23160.
2010-05-17 22:57:42 +00:00
Abhishek Kulkarni
5e05546194 Adding SOS headers and package data to the Makefile.
This commit was SVN r23159.
2010-05-17 22:53:33 +00:00
Abhishek Kulkarni
4e33e6aeaa Merge OPAL SOS into the trunk.
The OPAL SOS framework tries to meet the following objectives:

 * reduce the cascading error messages and the amount of code needed to print an error message.
 * build and aggregate stacks of encountered errors and associate related individual errors with each other.
 * allow registration of custom callbacks to intercept error events.

For more information, refer to
https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages

This commit was SVN r23158.
2010-05-17 22:51:52 +00:00
Abhishek Kulkarni
9c5860706f Merge improvements to the "notifier" framework from the OPAL SOS and the ORTE WDC mercurial branches into the SVN trunk.
A brief description of the improvements can be found at
https://svn.open-mpi.org/trac/ompi/wiki/ORTEWDC#ChangesdonetotheORTEnotifier

This commit was SVN r23157.
2010-05-17 22:48:05 +00:00
Abhishek Kulkarni
f5b9bc4ff1 Add a new "HNP" component to the notifier framework.
This component proxies notification messages up to the HNP.  This
component runs in both the HNP and non-HNP processes for ease of
selection (e.g., so you can "--mca notifier hnp" (vs. "--mca
notifier hnp,non_hnp").  It auto-detects where it is running and
does the Right Thing -- if it's in the HNP process, it sets up to
receive incoming proxied messages.  If it's not in the HNP, then it
proxies all messages to the HNP.

This commit was SVN r23156.
2010-05-17 22:43:43 +00:00
Abhishek Kulkarni
197ec7586d Add a new "file" component to the notifier framework.
When this component is selected, the notification messages are sent to a file. 
The file can be a plain file or stdout or stderr.

The MCA parameter "notifier_file_name" can be used to specify the suffix of the file the notification messages should be sent to.
The default suffix is "wdc" and the file full name is "output-wdc".

This commit was SVN r23155.
2010-05-17 22:39:52 +00:00
Jeff Squyres
b43288f01e Add missing header file
This commit was SVN r23154.
2010-05-17 21:31:24 +00:00
Jeff Squyres
b0cfe91eca Re-enable hwloc component; it should be working now.
I forgot to mention one more thing in the r23152 commit message:

 * Copy the fix for hwloc's m4 to disable the configure flag
   --enable-debug when building in embedding mode, because it can be
   hijacked by the outter-level application.  In this case, if you
   configured OMPI with --enable-debug (or have --enable-debug in a
   platform file), you'd see all of hwloc's debug output.  Ick.  hwloc
   1.0 will include this fix.

This commit was SVN r23153.

The following SVN revision numbers were found above:
  r23152 --> open-mpi/ompi@ca3362021e
2010-05-17 21:07:57 +00:00
Jeff Squyres
ca3362021e Fix some problems noted by Ralph:
* Fix disabling hwloc build (i.e., put the AM_CONDITIONALs where they
   belong in the configure.m4 file)
 * Update some svn:ignores
 * r23142 removed some extraneous code, but forgot to remove the
   variables used only by that code

This commit was SVN r23152.

The following SVN revision numbers were found above:
  r23142 --> open-mpi/ompi@610fc67d12
2010-05-17 21:05:27 +00:00
Ralph Castain
cc8ebe7dd5 Protect against NULL when looking for an MCA param in an environment
This commit was SVN r23151.
2010-05-17 02:50:39 +00:00
Ralph Castain
7e6985edbf Cleanup warnings
This commit was SVN r23150.
2010-05-16 20:23:26 +00:00
Ralph Castain
de85049477 Cleanup warnings
This commit was SVN r23149.
2010-05-16 20:22:18 +00:00
Ralph Castain
12590202d8 Cleanup warnings
This commit was SVN r23148.
2010-05-16 20:22:00 +00:00
Ralph Castain
da170a7ab9 Turn off the blasted hwloc component as it generates a ton of garbage. Note that this means linux-based systems will -not- have paffinity for now since the good old plpa module was removed.
Clean up some missing ignores

This commit was SVN r23147.
2010-05-16 20:06:14 +00:00
Jeff Squyres
657ece775c Add 1.4.3 section.
This commit was SVN r23146.
2010-05-15 13:10:56 +00:00
Jeff Squyres
91507e595f Fix bug reported on user list; set the errhandler type properly.
This commit was SVN r23145.
2010-05-15 13:04:32 +00:00
Ralph Castain
88f5217a12 Cleanup the debugger daemon co-launch code and add an ability to test it. Implement ability to co-launch debugger daemons upon attach to a running job for jobs launched under rsh, slurm, and tm environments (others can easily be added if desired).
Add new mca params to test:

orte_debugger_test_daemon: Name of the executable to be used to simulate a debugger colaunch
orte_debugger_test_attach: Test debugger colaunch after debugger attachment

To test co-launch at job start, just set the orte_debugger_test_daemon param.

To test co-launch upon attach:
set orte_debugger_test_daemon
set orte_debugger_test_attach=1
set orte_enable_debug_cospawn_while_running=1
set orte_debugger_check_rate=<N> - defines the number of seconds to wait before "checking" for a debugger attaching

Added a "debugger" program to orte/test/mpi that just spins to simulate a debugger daemon.

This commit was SVN r23144.
2010-05-14 18:44:49 +00:00
Jeff Squyres
e2ab4f2baf Should be working now...
This commit was SVN r23143.
2010-05-14 15:20:47 +00:00
Jeff Squyres
610fc67d12 Oops -- don't convert to a processor ID here; just return the OS index
of the core.

This commit was SVN r23142.
2010-05-14 15:14:28 +00:00
Jeff Squyres
98720b7456 Update svn:ignore
This commit was SVN r23141.
2010-05-14 13:21:52 +00:00
Jeff Squyres
a27da2473a Ensure the whole directory is built.
This commit was SVN r23140.
2010-05-14 13:21:09 +00:00
Jeff Squyres
3ba4086b0f Remove another debugging message.
This commit was SVN r23139.
2010-05-14 13:20:46 +00:00
Jeff Squyres
a1848ef8d5 Arf. Ignore this component while I fix vpath builds...
This commit was SVN r23138.
2010-05-14 13:03:02 +00:00
Jeff Squyres
2d01a67516 Remove these generates files from SVN.
This commit was SVN r23137.
2010-05-14 11:58:17 +00:00
Jeff Squyres
8c8efa9bf3 Remove debugging message.
This commit was SVN r23136.
2010-05-14 11:57:43 +00:00