Commit Graph

3005 Commits

Author SHA1 Message Date
Abhishek Kulkarni
fd7ef7a1f1 Fixes broken trunk compile: call process status notify
only when ft-enable-cr is selected.

This commit was SVN r24255.
2011-01-14 18:37:07 +00:00
Abhishek Kulkarni
87d2c9b31d Few fault tolerance updates related to the CIFTS project (http://www.mcs.anl.gov/research/cifts/)
 * Improve the FTB notifier to publish (C/R, process/communication failure) events to the FTB with the
   OMPI jobid as the associated payload.
 * Add notifier calls for C/R events and process status events in SnapC and ErrMgr components.
 * Fix a bug where the SnapC states and process states collide before being thrown out over the notifier.

This commit was SVN r24251.
2011-01-13 20:13:49 +00:00
Ralph Castain
b09f57b03d Update the multicast subsystem - ported from Cisco branch
This commit was SVN r24246.
2011-01-13 01:54:05 +00:00
Terry Dontje
f3aaa885a3 corrected a couple places in orte where it said cpu_model when it should have been cpu_type.
This commit was SVN r24221.
2011-01-11 19:56:26 +00:00
Abhishek Kulkarni
11ffa854ff Update the FTB notifier
 * fix indentation issues
 * update the name of one of the fault events published to the FTB (per the FTB MPI standard)

This commit was SVN r24213.
2011-01-10 18:58:31 +00:00
Ralph Castain
80ef1af8ba Add psm key generator program
This commit was SVN r24197.
2010-12-30 20:54:58 +00:00
Nathan Hjelm
c082d05ecb Reset the timer on MPIR_being_debugged only if MPIR_being_debugged is not set. Fix typo in return code.
This commit was SVN r24187.
2010-12-20 21:00:49 +00:00
Jeff Squyres
a525e70f46 Convert "opal_show_help" to be a global variable pointer.
It is statically initialized to the real back-end OPAL show_help
function.  During orte_show_help_init(), the variable is re-assigned
with the value of the back-end ORTE show_help function (the one that
does error message aggregation).  

Therefore, anything that calls opal_show_help() after a certain point
in orte_init() will have their show_help messages be aggregated.
w00t!  Even code down in OPAL -- that has no knowledge of ORTE -- will
have their messages aggregated.  '''Double w00t!'''

During orte_show_help_finalize(), we restore the original pointer
value so that if something calls opal_show_help() after
orte_finalize(), it'll still work properly (but it won't be
aggregated).  

This commit was SVN r24185.
2010-12-16 23:00:25 +00:00
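
The pointer swap described in a525e70f46 above can be sketched as follows. This is only a minimal illustration of the technique, not the actual Open MPI code: every name here (demo_show_help, demo_init, demo_finalize, the two back-end functions, and the one-argument signature) is a hypothetical stand-in, and the "aggregating" back end just counts calls instead of merging messages.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified signature; the real show_help API differs. */
typedef int (*demo_show_help_fn_t)(const char *topic);

static int direct_calls = 0;      /* messages printed immediately */
static int aggregated_calls = 0;  /* messages handed to the aggregator */

/* Low-level back end: prints the message right away. */
static int demo_direct_show_help(const char *topic)
{
    direct_calls++;
    printf("help: %s\n", topic);
    return 0;
}

/* Upper-layer back end: a real one would queue and merge duplicates;
 * this stand-in only records that it was called. */
static int demo_aggregating_show_help(const char *topic)
{
    (void)topic;
    aggregated_calls++;
    return 0;
}

/* Global function pointer, statically initialized to the low-level
 * back end so it works before the upper layer is initialized. */
demo_show_help_fn_t demo_show_help = demo_direct_show_help;

static demo_show_help_fn_t saved_show_help = NULL;

/* Upper-layer init: remember the original pointer, then redirect. */
void demo_init(void)
{
    saved_show_help = demo_show_help;
    demo_show_help = demo_aggregating_show_help;
}

/* Upper-layer finalize: restore the original pointer so later
 * callers still work, just without aggregation. */
void demo_finalize(void)
{
    if (saved_show_help != NULL) {
        demo_show_help = saved_show_help;
        saved_show_help = NULL;
    }
}
```

Callers always invoke demo_show_help(...) through the pointer, so code in the lower layer picks up the aggregating behavior between demo_init() and demo_finalize() without knowing the upper layer exists.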
Jeff Squyres
de97962aac Fixes trac:2651.
Fix off-by-one error when /dev/urandom doesn't exist.  Thanks to "pth"
for the patch.

This commit was SVN r24170.

The following Trac tickets were found above:
  Ticket 2651 --> https://svn.open-mpi.org/trac/ompi/ticket/2651
2010-12-14 14:52:51 +00:00
Ralph Castain
b251a59cdf Cleanup nidmap finalize
This commit was SVN r24164.
2010-12-11 16:42:06 +00:00
Ralph Castain
2dc5cbb483 Remove stale code and API from the RML/OOB frameworks. Stopped using this code years ago.
This commit was SVN r24153.
2010-12-05 15:58:21 +00:00
Rolf vandeVaart
b67d3398da It is convention to have orte_config.h included at top of file.
This commit was SVN r24146.
2010-12-03 16:13:31 +00:00
Shiqing Fan
f43862420c Convert the bad dos line endings to unix style for all windows related files.
This commit was SVN r24137.
2010-12-02 12:08:08 +00:00
Ralph Castain
aaad8ae891 Remove unused var
This commit was SVN r24136.
2010-12-02 02:38:13 +00:00
Ralph Castain
f9ffff59f8 Ensure clean termination of threads and tcp multicast
This commit was SVN r24134.
2010-12-02 00:23:42 +00:00
Nathan Hjelm
75605faa75 added support for reattaching a debugger using the MPIR_attach_fifo
This commit was SVN r24132.
2010-12-01 20:13:58 +00:00
Ralph Castain
ad814f26cd One more time, into the breach!
Restore the use of override_oversubscribe to indicate that the data source for resources on the backend nodes used in mapping is unreliable. In this situation (e.g., data came from hostfile, or we are just using localhost because nothing was provided), we don't trust the oversubscribe condition passed by the mapper. Instead, we check locally to ensure we set sched_yield correctly.

This commit was SVN r24130.
2010-12-01 15:15:26 +00:00
Ralph Castain
eba65e97f3 Extend the rmcast APIs to allow enable/disable of comm, required for clean termination by upper layer users.
Point the recv thread event base to the right place so it can wakeup when required.

Add a new error code for "comm disabled" when attempting to communicate after disabling comm.

This commit was SVN r24129.
2010-12-01 13:41:19 +00:00
Ralph Castain
9224302c10 Remove debug
This commit was SVN r24128.
2010-12-01 13:12:24 +00:00
Ralph Castain
4f5625d699 Not totally necessary, but good form - init the oversubscribed field in the orte_nid_t object
This commit was SVN r24127.
2010-12-01 12:58:37 +00:00
Ralph Castain
30c37ea536 Ensure that the oversubscribed condition of nodes is accurately reported by the mapper, and that the results are communicated and used by the backend orteds when setting sched_yield on local procs. Restores prior behavior that was somehow lost along the way.
Includes a patch from Damien Guinier to fix vpid assignments when cpus-per-task is specified.

This commit was SVN r24126.
2010-12-01 12:51:39 +00:00
Ralph Castain
85a974b0de Better check for NULL before using the value
This commit was SVN r24122.
2010-12-01 04:48:50 +00:00
Ralph Castain
c56185887b Change the event base "wakeup" support to enable the passing of events to the central thread for add/del. Add a macro OPAL_UPDATE_EVBASE for this purpose as it will likely be widely used.
Update the ORTE thread support to utilize this capability. Update the rmcast framework to track the change.

This commit was SVN r24121.
2010-12-01 04:26:43 +00:00
Ralph Castain
963336ee5a Remove the test for libevent internal threads
This commit was SVN r24120.
2010-12-01 04:24:10 +00:00
Ralph Castain
0441e81882 Oops - ensure that multicast msgs get circulated properly with the tcp module
This commit was SVN r24118.
2010-11-30 21:13:53 +00:00
Ralph Castain
d20c023348 Checkpoint the threading support for multicast - will be revised shortly, but this version currently works.
This commit was SVN r24117.
2010-11-30 17:30:16 +00:00
Ralph Castain
0465605a9c Cleanup condition check for a param so it doesn't show if not usable.
This commit was SVN r24116.
2010-11-30 17:28:53 +00:00
Ralph Castain
09f02b3087 Update the ORTE thread acquire/release/wakeup macros to trigger release from event_loop so that conditions can be checked.
Add macro versions of condition_wait and friends for debug use.

This commit was SVN r24115.
2010-11-30 17:27:58 +00:00
Ralph Castain
d2547e84a3 MPI procs never use orte progress threads
This commit was SVN r24093.
2010-11-29 03:52:46 +00:00
Ralph Castain
71669720a3 Just get the output once on sigpipe error, and include the fd
This commit was SVN r24092.
2010-11-25 15:32:48 +00:00
Ralph Castain
30c635fd4d Don't endlessly output sigpipe errors. Count the number of times we trap it, and abort if we get more than 10 of them.
This commit was SVN r24091.
2010-11-25 15:25:24 +00:00
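
The counting approach in 30c635fd4d above might look roughly like this on a POSIX system. It is a sketch under stated assumptions, not the ORTE implementation: the names record_sigpipe, on_sigpipe, install_sigpipe_limit, and MAX_SIGPIPE are hypothetical, and the original code may trap the signal through the event library rather than through sigaction().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_SIGPIPE 10  /* threshold taken from the commit message */

static volatile sig_atomic_t sigpipe_count = 0;

/* Count one SIGPIPE; return 1 once more than MAX_SIGPIPE were seen. */
static int record_sigpipe(void)
{
    sigpipe_count++;
    return sigpipe_count > MAX_SIGPIPE;
}

/* Handler: uses only async-signal-safe calls (write, abort). */
static void on_sigpipe(int signo)
{
    (void)signo;
    if (record_sigpipe()) {
        static const char msg[] = "too many SIGPIPE errors, aborting\n";
        ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)n;
        abort();
    }
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigpipe_limit(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}
```

Keeping the threshold check in record_sigpipe() separates the policy from the signal plumbing, so the limit can be exercised without actually raising signals.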
Ralph Castain
b9b2d101dc Add an mca param to indicate if orte progress threads are to be enabled. Error out if this is given and libevent thread support was not built.
This commit was SVN r24089.
2010-11-24 23:28:00 +00:00
Rolf vandeVaart
1d62542c23 Fix another Sun Studio warning. jobid and vpid need to
be uint32_t. 

This commit was SVN r24074.
2010-11-19 18:12:46 +00:00
Rolf vandeVaart
09fdd5cc23 Include fcntl.h, not sys/fcntl.h so we get the definition
of the open system call.  That is what man page says to do.
Fixes warning on Solaris.

This commit was SVN r24073.
2010-11-19 17:40:02 +00:00
Shiqing Fan
358b4a5cba Add an option to enable the debug postfix for executables.
This commit was SVN r24070.
2010-11-19 15:54:13 +00:00
Ethan Mallove
66f2301170 Just plain "Grid Engine" instead of "Sun Grid Engine"
This commit was SVN r24068.
2010-11-18 19:30:04 +00:00
Abhishek Kulkarni
78a67654d4 add notifier events for process migration
This commit was SVN r24058.
2010-11-16 17:57:44 +00:00
Abhishek Kulkarni
6e6ccae082 Update the checkpoint notification events that we throw out over the FTB with a payload embedded in {}
This commit was SVN r24057.
2010-11-16 17:55:57 +00:00
Ralph Castain
58e711a412 Update a test and add two new ones for testing event lib thread support
This commit was SVN r24051.
2010-11-13 15:39:28 +00:00
Jeff Squyres
e4744b4ed5 Per http://www.open-mpi.org/community/lists/devel/2010/11/8671.php,
change a bunch of OMPI_<foo> names to OPAL_<foo>.

This commit was SVN r24046.
2010-11-12 23:22:11 +00:00
Ralph Castain
703684e071 Output the mca params for debug purposes
This commit was SVN r24042.
2010-11-11 20:06:29 +00:00
Shiqing Fan
c03ea1a5f3 A more clean way to build on Windows.
It's not possible to combine two shared libraries on Windows, so we have to do it a bit differently. First generate a small static event library by just linking the object files, then link it into the other libraries that need the libevent API.

This commit was SVN r24039.
2010-11-11 12:02:54 +00:00
Ralph Castain
bb521c6b7e Properly count local procs to set oversubscribed condition
This commit was SVN r24037.
2010-11-10 21:59:35 +00:00
Ralph Castain
021bd77bf1 Don't free the event base if we aren't using progress threads
This commit was SVN r24036.
2010-11-10 21:58:58 +00:00
Ralph Castain
9c72737414 Send the recovery flag
This commit was SVN r24035.
2010-11-10 21:26:28 +00:00
Ralph Castain
57257ab9b4 Use the right event base if threads are disabled. Always update the seq num
This commit was SVN r24034.
2010-11-10 21:26:04 +00:00
Ralph Castain
cbb758c4fb Allow mcast threads to be disabled
This commit was SVN r24032.
2010-11-10 20:16:41 +00:00
Ralph Castain
22e40d92a0 Cleanup thread termination
This commit was SVN r24031.
2010-11-10 19:36:44 +00:00
Ralph Castain
f5e50abab2 Make class visible
This commit was SVN r24022.
2010-11-09 19:07:45 +00:00
Nathan Hjelm
986265fc6e fixed crash in orte-ps caused by calls to OBJ_RELEASE on an opal_event_t object.
This commit was SVN r24020.
2010-11-09 18:41:43 +00:00