Wesley Bland
4e7ff0bd5e
By popular demand, the epoch code is now disabled by default.
To enable epochs and the resilient ORTE code, use the configure flag:
--enable-resilient-orte
This will define both:
ORTE_ENABLE_EPOCH
ORTE_RESIL_ORTE
This commit was SVN r25093.
2011-08-26 22:16:14 +00:00
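The effect of the two macros named in the entry above can be illustrated with a small sketch. It assumes the macros are defined to 1 when --enable-resilient-orte is given and falls back to 0 otherwise; the real configure output may differ. The type demo_name_t and the function demo_build_mode are hypothetical names used only for illustration and are not part of ORTE.

```c
#include <stdio.h>

/* Assumption: configure defines these to 1 when --enable-resilient-orte
 * is given. The fallbacks below model the new "disabled by default" case. */
#ifndef ORTE_ENABLE_EPOCH
#define ORTE_ENABLE_EPOCH 0
#endif
#ifndef ORTE_RESIL_ORTE
#define ORTE_RESIL_ORTE 0
#endif

/* Hypothetical process name: the epoch field is compiled in only
 * when epoch support is enabled. */
typedef struct {
    unsigned jobid;
    unsigned vpid;
#if ORTE_ENABLE_EPOCH
    unsigned epoch;
#endif
} demo_name_t;

/* Hypothetical helper reporting which code path was compiled. */
static const char *demo_build_mode(void)
{
#if ORTE_ENABLE_EPOCH && ORTE_RESIL_ORTE
    return "resilient";
#else
    return "default";
#endif
}

/* Print the compiled mode and the size of the name structure. */
static void demo_report(void)
{
    printf("mode=%s, name size=%zu bytes\n",
           demo_build_mode(), sizeof(demo_name_t));
}
```

Building this without either macro defined selects the "default" path and leaves the epoch field out of the structure, which mirrors the intent of disabling the epoch code unless it is explicitly requested.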
Wesley Bland
e1ba09ad51
Add resilience to ORTE. Allows the runtime to continue after a process (or
ORTED) failure. Note that more work will be necessary to allow the MPI layer to
take advantage of this.
Per RFC:
http://www.open-mpi.org/community/lists/devel/2011/06/9299.php
This commit was SVN r24815.
2011-06-23 20:38:02 +00:00
Ralph Castain
033cbbed31
Don't automatically assign group channels if not given - let the layer above figure it out.
This commit was SVN r24771.
2011-06-10 16:28:18 +00:00
Ralph Castain
d7e029cb40
Convert heartbeat to multicast basis
This commit was SVN r24570.
2011-03-24 19:05:39 +00:00
Ralph Castain
90698a2c02
Ensure that blocking recvs wait until the data is actually received
This commit was SVN r24558.
2011-03-22 18:45:54 +00:00
Ralph Castain
33b68132cc
Update the rmcast framework
This commit was SVN r24370.
2011-02-12 16:52:03 +00:00
Ralph Castain
b09f57b03d
Update the multicast subsystem - ported from Cisco branch
This commit was SVN r24246.
2011-01-13 01:54:05 +00:00
Ralph Castain
f9ffff59f8
Ensure clean termination of threads and tcp multicast
This commit was SVN r24134.
2010-12-02 00:23:42 +00:00
Ralph Castain
eba65e97f3
Extend the rmcast APIs to allow enable/disable of comm, required for clean termination by upper layer users.
Point the recv thread event base to the right place so it can wakeup when required.
Add a new error code for "comm disabled" when attempting to communicate after disabling comm.
This commit was SVN r24129.
2010-12-01 13:41:19 +00:00
Ralph Castain
9224302c10
Remove debug
This commit was SVN r24128.
2010-12-01 13:12:24 +00:00
Ralph Castain
c56185887b
Change the event base "wakeup" support to enable the passing of events to the central thread for add/del. Add a macro OPAL_UPDATE_EVBASE for this purpose as it will likely be widely used.
Update the ORTE thread support to utilize this capability. Update the rmcast framework to track the change.
This commit was SVN r24121.
2010-12-01 04:26:43 +00:00
Ralph Castain
0441e81882
Oops - ensure that multicast msgs get circulated properly with the tcp module
This commit was SVN r24118.
2010-11-30 21:13:53 +00:00
Ralph Castain
a47b33678b
Add orte-level thread support to avoid some of the opal_if_threads protection used solely for ompi.
Use threads to help process multicast messages.
This commit was SVN r24009.
2010-11-08 19:09:23 +00:00
Ralph Castain
bf665692c3
Update the rmcast callback function API to return message sequence number. Update orte_mcast test to stress the system.
This commit was SVN r24004.
2010-11-07 23:29:52 +00:00
Ralph Castain
37f566bf1e
Cancel recvs when finalizing
This commit was SVN r23871.
2010-10-07 22:02:12 +00:00
Ralph Castain
40a2bfa238
WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.
Please note that configure requirements on components HAVE CHANGED. For example, a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.
This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Ralph Castain
b81358815c
Add some debug
This commit was SVN r23686.
2010-08-29 13:45:10 +00:00
Ralph Castain
4491a0e5dc
Add a channel for reporting errors, fix a bug in the tcp module
This commit was SVN r23610.
2010-08-13 15:04:22 +00:00
Ralph Castain
9fbd7c1949
Fix a bug in tcp multicast
This commit was SVN r23547.
2010-08-04 01:37:54 +00:00
Ralph Castain
94d4887452
Clean up some race conditions at job start
This commit was SVN r23515.
2010-07-27 18:24:11 +00:00
Ralph Castain
0c201486e4
Allow for multiple channels for the same channel "name" - could be both input and output channels
This commit was SVN r23498.
2010-07-27 01:38:39 +00:00
Ralph Castain
140e427a79
Ensure that wildcard recvs go to the end of the matching list so that recvs for specific tags take precedence.
Ensure we don't try to send tcp mcast messages to procs that haven't reported back yet.
This commit was SVN r23491.
2010-07-23 19:31:34 +00:00
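The ordering rule described in the entry above (wildcard recvs at the end of the matching list, so recvs for specific tags win) can be sketched as follows. This is a minimal illustration under stated assumptions, not the rmcast implementation: the list layout, the DEMO_TAG_WILDCARD value, and the demo_* names are all hypothetical.

```c
#include <stddef.h>

#define DEMO_TAG_WILDCARD (-1)   /* hypothetical wildcard tag value */

/* Hypothetical posted-recv record kept on a singly linked list. */
typedef struct demo_recv {
    int tag;                     /* specific tag or DEMO_TAG_WILDCARD */
    int id;                      /* identifies the recv in this demo */
    struct demo_recv *next;
} demo_recv_t;

/* Post a recv. A wildcard walks to the tail of the list; a specific-tag
 * recv stops at the first wildcard and is linked in ahead of it. */
static void demo_post_recv(demo_recv_t **head, demo_recv_t *r)
{
    demo_recv_t **pp = head;
    while (*pp != NULL) {
        if (r->tag != DEMO_TAG_WILDCARD && (*pp)->tag == DEMO_TAG_WILDCARD) {
            break;
        }
        pp = &(*pp)->next;
    }
    r->next = *pp;
    *pp = r;
}

/* Return the first posted recv that accepts the incoming tag. Because
 * wildcards sit at the end, a specific match is always found first. */
static demo_recv_t *demo_match(demo_recv_t *head, int tag)
{
    for (demo_recv_t *r = head; r != NULL; r = r->next) {
        if (r->tag == tag || r->tag == DEMO_TAG_WILDCARD) {
            return r;
        }
    }
    return NULL;
}
```

With this ordering, a wildcard recv posted before a recv for a specific tag still does not intercept messages carrying that tag; it only catches messages that no specific recv accepts.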
Ralph Castain
62a8b73f1a
Correctly handle output sent to the group input channel
This commit was SVN r23362.
2010-07-07 14:17:48 +00:00
Ralph Castain
628ffd1d6e
Make the mcast channel assignments unsigned ints so they can be used as array indices. Assign input/output channels for apps. Clean up some bugs in open_channel.
This commit was SVN r23275.
2010-06-16 19:40:59 +00:00
Ralph Castain
6cbe947810
Modify the multicast scheme so that applications have separate input and output channels to avoid cross-talk. Update the multicast test to conform.
This commit was SVN r23271.
2010-06-15 03:50:31 +00:00
Ralph Castain
4ce07ace61
Allow the user to set the send/recv buf size for udp. Don't declare existing nb recvs to be an error.
This commit was SVN r23210.
2010-05-26 14:29:36 +00:00
Ralph Castain
ab6e06f5b3
Reorganize the rmcast code to capture common code elements. Increase max msg size for spread and udp transports. Clean up the spread configuration doc.
This commit was SVN r23207.
2010-05-25 22:36:57 +00:00
Ralph Castain
bcff0d6301
Some minor cleanup in the rmcast framework, ensure that a default multicast group is always defined for each app
This commit was SVN r23079.
2010-05-03 04:07:14 +00:00
Josh Hursey
b43d621f30
Remove an errant '$' in the configure.m4 files. Was causing problems with configure.
This commit was SVN r22821.
2010-03-12 20:08:22 +00:00
Ralph Castain
18c7aaff08
Update the grpcomm framework to be more thread-friendly.
Modify the orte configure options to specify --enable-multicast such that it directs components to build or not instead of littering the code base with #if's. Remove those #if's where they used to occur.
Add a new grpcomm "mcast" module to support multicast operations. Still some work required to properly perform daemon collectives for comm_spawn operations. New module only builds when --enable-multicast is provided, and when specifically selected.
This commit was SVN r22709.
2010-02-25 01:11:29 +00:00
Ralph Castain
9a5fdbb622
Continue development of reliable multicast
This commit was SVN r22616.
2010-02-14 19:20:56 +00:00
Ralph Castain
86dd1d41af
Handle zero-length iovecs in multicast messages
This commit was SVN r22507.
2010-01-28 15:29:43 +00:00
Ralph Castain
3fe5e3e142
Propagate the user's callback data during non-blocking sends
This commit was SVN r22432.
2010-01-15 20:02:47 +00:00
Ralph Castain
16b16c5cb8
Fix a silly typo
This commit was SVN r22387.
2010-01-09 15:34:49 +00:00
Ralph Castain
add84178ef
Fix a silly typo that prevented tcp multicast messages from being delivered
This commit was SVN r22384.
2010-01-08 20:30:27 +00:00
Ralph Castain
b3a58f8b83
Pass the correct address when packing iovec bytes for multicast.
Thanks to Rick Payne for the correction.
This commit was SVN r22351.
2009-12-30 20:59:31 +00:00
Ralph Castain
89a6131032
Check the return status code on all dss operations within the rmcast modules
This commit was SVN r22349.
2009-12-30 01:45:31 +00:00
Ralph Castain
9acec283af
Add a new TCP module to the reliable multicast framework. This module uses ORTE's grpcomm.xcast functionality to "fake" multicasts for environments where regular multicast isn't reliable.
Modify the startup logic to allow for this use-case.
This commit was SVN r22310.
2009-12-15 01:18:27 +00:00