
43 Commits

Author SHA1 Message Date
Ralph Castain
33b68132cc Update the rmcast framework
This commit was SVN r24370.
2011-02-12 16:52:03 +00:00
Ralph Castain
b09f57b03d Update the multicast subsystem - ported from Cisco branch
This commit was SVN r24246.
2011-01-13 01:54:05 +00:00
Ralph Castain
f9ffff59f8 Ensure clean termination of threads and tcp multicast
This commit was SVN r24134.
2010-12-02 00:23:42 +00:00
Ralph Castain
eba65e97f3 Extend the rmcast APIs to allow enable/disable of comm, required for clean termination by upper layer users.
Point the recv thread event base to the right place so it can wakeup when required.

Add a new error code for "comm disabled" when attempting to communicate after disabling comm.

This commit was SVN r24129.
2010-12-01 13:41:19 +00:00
Ralph Castain
85a974b0de Better check for NULL before using the value
This commit was SVN r24122.
2010-12-01 04:48:50 +00:00
Ralph Castain
c56185887b Change the event base "wakeup" support to enable the passing of events to the central thread for add/del. Add a macro OPAL_UPDATE_EVBASE for this purpose as it will likely be widely used.
Update the ORTE thread support to utilize this capability. Update the rmcast framework to track the change.

This commit was SVN r24121.
2010-12-01 04:26:43 +00:00
Ralph Castain
d20c023348 Checkpoint the threading support for multicast - will be revised shortly, but this version currently works.
This commit was SVN r24117.
2010-11-30 17:30:16 +00:00
Ralph Castain
021bd77bf1 Don't free the event base if we aren't using progress threads
This commit was SVN r24036.
2010-11-10 21:58:58 +00:00
Ralph Castain
57257ab9b4 Use the right event base if threads are disabled. Always update the seq num
This commit was SVN r24034.
2010-11-10 21:26:04 +00:00
Ralph Castain
cbb758c4fb Allow mcast threads to be disabled
This commit was SVN r24032.
2010-11-10 20:16:41 +00:00
Ralph Castain
22e40d92a0 Cleanup thread termination
This commit was SVN r24031.
2010-11-10 19:36:44 +00:00
Ralph Castain
01347926d1 Be a little more thorough about cleaning up during finalize
This commit was SVN r24014.
2010-11-09 14:56:27 +00:00
Shiqing Fan
d3701ccba8 type casts.
This commit was SVN r24013.
2010-11-09 09:17:22 +00:00
Ralph Castain
f2f41d1ca9 Be nice to those who don't enable-multicast...poor wretches.
This commit was SVN r24011.
2010-11-09 05:08:55 +00:00
Ralph Castain
a47b33678b Add orte-level thread support to avoid some of the opal_if_threads protection used solely for ompi.
Use threads to help process multicast messages.

This commit was SVN r24009.
2010-11-08 19:09:23 +00:00
Ralph Castain
bf665692c3 Update the rmcast callback function API to return message sequence number. Update orte_mcast test to stress the system.
This commit was SVN r24004.
2010-11-07 23:29:52 +00:00
Ralph Castain
9ea2b196ce Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.

Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.

This commit was SVN r23966.
2010-10-28 15:22:46 +00:00
Ralph Castain
86c7365e8e Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.

This commit was SVN r23943.
2010-10-26 02:41:42 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone through the code base and converted all the libevent calls I could find. However, I can neither compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
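
Editor's sketch for the run-time note in the commit above: zeroing an object with memset leaves it without whatever state its constructor would have installed, which is why the message insists on OBJ_NEW or OBJ_CONSTRUCT. The example below is a minimal illustration only. The `demo_event_t` type and every `demo_*` function are hypothetical stand-ins, not the real `opal_event_t` or the OPAL object macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an object that needs a constructor.
 * This is NOT the real opal_event_t. */
typedef struct demo_event {
    void (*destruct)(struct demo_event *self); /* installed by the constructor */
    int constructed;                           /* 1 once the constructor has run */
} demo_event_t;

/* Hypothetical destructor: marks the object as no longer usable. */
static void demo_event_destruct(demo_event_t *ev)
{
    ev->constructed = 0;
}

/* Hypothetical constructor: zeroes the object, then installs the
 * state that a plain memset would leave missing. */
static void demo_event_construct(demo_event_t *ev)
{
    memset(ev, 0, sizeof(*ev));
    ev->destruct = demo_event_destruct;
    ev->constructed = 1;
}

/* Returns 1 if the object was properly constructed, 0 if it was only zeroed. */
static int demo_event_is_usable(const demo_event_t *ev)
{
    return ev->constructed && ev->destruct != NULL;
}
```

An object that was only zeroed fails `demo_event_is_usable`, while one passed through `demo_event_construct` passes it. In the real code base the analogous failure would surface as the segfault the commit message warns about, rather than as a checkable flag.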
Ralph Castain
bbf84fd92b Refine the protection from cross-dvm communications
This commit was SVN r23615.
2010-08-16 16:33:39 +00:00
Ralph Castain
1ba8bbe1a9 Don't require the existence of a multicast interface if --enable-multicast wasn't specified.
This commit was SVN r23494.
2010-07-26 15:09:57 +00:00
Ralph Castain
140e427a79 Ensure that wildcard recvs go to the end of the matching list so that recvs for specific tags take precedence.
Ensure we don't try to send tcp mcast messages to procs that haven't reported back yet.

This commit was SVN r23491.
2010-07-23 19:31:34 +00:00
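
Editor's sketch for the ordering rule in the commit above: if matching walks the list from the front, keeping wildcard recvs at the end lets a recv for a specific tag win whenever both could match. The code below is one possible way to get that ordering, assuming a fixed-size array and a hypothetical wildcard value. None of the `demo_*` names come from the rmcast framework, and inserting specific tags before the first wildcard is this sketch's assumption, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TAG_WILDCARD (-1)  /* hypothetical wildcard tag value */
#define DEMO_MAX_RECVS    16    /* hypothetical fixed capacity */

/* Hypothetical list of posted recvs, identified only by tag. */
typedef struct {
    int    tags[DEMO_MAX_RECVS];
    size_t count;
} demo_recv_list_t;

/* Post a recv. A wildcard is appended at the end; a specific tag is
 * inserted before the first wildcard so it is always checked first.
 * Returns 0 on success, -1 if the list is full. */
static int demo_post_recv(demo_recv_list_t *list, int tag)
{
    size_t pos, i;

    if (list->count >= DEMO_MAX_RECVS) {
        return -1;
    }
    pos = list->count;
    if (tag != DEMO_TAG_WILDCARD) {
        for (pos = 0; pos < list->count; pos++) {
            if (list->tags[pos] == DEMO_TAG_WILDCARD) {
                break;
            }
        }
    }
    /* Shift later entries up by one slot to open a gap at pos. */
    for (i = list->count; i > pos; i--) {
        list->tags[i] = list->tags[i - 1];
    }
    list->tags[pos] = tag;
    list->count++;
    return 0;
}

/* Return the index of the first recv that matches the tag, or -1. */
static int demo_match(const demo_recv_list_t *list, int tag)
{
    size_t i;

    for (i = 0; i < list->count; i++) {
        if (list->tags[i] == tag || list->tags[i] == DEMO_TAG_WILDCARD) {
            return (int)i;
        }
    }
    return -1;
}
```

Posting a wildcard first and then recvs for tags 5 and 7 leaves the list ordered as 5, 7, wildcard, so a message with tag 7 reaches the specific recv and only unmatched tags fall through to the wildcard.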
Ralph Castain
6cbe947810 Modify the multicast scheme so that applications have separate input and output channels to avoid cross-talk. Update the multicast test to conform.
This commit was SVN r23271.
2010-06-15 03:50:31 +00:00
Shiqing Fan
2697a37363 Use the correct type for IO vector base.
This commit was SVN r23229.
2010-06-01 15:40:11 +00:00
Ralph Castain
36e6c11c5e Little cleanup
This commit was SVN r23211.
2010-05-27 02:49:09 +00:00
Ralph Castain
ab6e06f5b3 Reorganize the rmcast code to capture common code elements. Increase max msg size for spread and udp transports. Cleanup the spread configuration doc.
This commit was SVN r23207.
2010-05-25 22:36:57 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
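
Editor's sketch for the two rules in the commit above: once a return value may carry extra encoded context, comparing it directly against a native error code can miss, so the code must be decoded first, while a check against the success value needs no decoding. The encoding below (native code in the low 8 bits, extra information above) is purely an assumption for illustration. It is not the actual SOS layout, and the `DEMO_*` names and positive error value are hypothetical.

```c
#include <assert.h>

#define DEMO_SUCCESS        0
#define DEMO_ERR_NOT_FOUND  13  /* hypothetical native error code */

/* Hypothetical encoding: low 8 bits hold the native code,
 * the upper bits hold extra context about the error. */
#define DEMO_ENCODE(code, info)  (((info) << 8) | ((code) & 0xFF))
#define DEMO_GET_ERR_CODE(ret)   ((ret) & 0xFF)

/* Hypothetical function that returns an encoded error for negative keys
 * and plain success otherwise. */
static int demo_lookup(int key)
{
    if (key < 0) {
        return DEMO_ENCODE(DEMO_ERR_NOT_FOUND, 42);
    }
    return DEMO_SUCCESS;
}
```

With this sketch, `demo_lookup(-1) == DEMO_ERR_NOT_FOUND` is false even though the lookup failed, whereas `DEMO_GET_ERR_CODE(demo_lookup(-1)) == DEMO_ERR_NOT_FOUND` is true. A test of the form `DEMO_SUCCESS != ret` works on the raw value because success is left unencoded.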
Ralph Castain
bcff0d6301 Some minor cleanup in the rmcast framework, ensure that a default multicast group is always defined for each app
This commit was SVN r23079.
2010-05-03 04:07:14 +00:00
Ralph Castain
3f262bf0b6 Add a new reliable multicast component based on the "spread" library
Thanks to Srini Nariangadu (Cisco) for the contribution!

This commit was SVN r23076.
2010-05-02 17:29:41 +00:00
Ralph Castain
c6448587fe It is okay to not select an rmcast module
This commit was SVN r22719.
2010-02-26 02:39:04 +00:00
Ralph Castain
9a5fdbb622 Continue development of reliable multicast
This commit was SVN r22616.
2010-02-14 19:20:56 +00:00
Ralph Castain
48486df4fe Cleanup some diagnostics
This commit was SVN r22389.
2010-01-12 01:25:19 +00:00
Ralph Castain
4a82dd9a45 Add message sequence numbers to multicast messages, tracked by channel
This commit was SVN r22262.
2009-12-04 04:17:44 +00:00
Ralph Castain
840766a894 Update the rmcast APIs to include tag params and reorder them to look like their rml cousins
This commit was SVN r22218.
2009-11-17 15:58:59 +00:00
Ralph Castain
6496ce7212 Expand the reliable multicast APIs to support sending/recving of iovecs
This commit was SVN r22213.
2009-11-11 22:10:35 +00:00
Tim Mattox
4acfbe6554 Unfortunately, the typos that r22129 tried to fix were not
as simple as I or Ralph had hoped.  This should be the real fix,
or very close to it.  I can now see both the sensor and rmcast
information from ompi_info when configured
with --enable-monitoring --enable-multicast

This commit was SVN r22131.

The following SVN revision numbers were found above:
  r22129 --> open-mpi/ompi@02ff00dfb5
2009-10-23 02:38:51 +00:00
Ralph Castain
49ce2b4342 Add a new interface to the rmcast framework to query the output channel for the proc
This commit was SVN r22105.
2009-10-15 17:47:42 +00:00
Ralph Castain
99c67183d2 Minor cleanups, mainly to ensure we correctly block on blocking sends
This commit was SVN r22102.
2009-10-15 02:39:15 +00:00
Ralph Castain
18960a9c5a Refactor the multicast support so the data type objects can be accessed beyond just the one component
Ensure that the local node is included in the allocation prior to bootstrap discovery

This commit was SVN r22099.
2009-10-14 17:43:40 +00:00
Ralph Castain
84cc847be8 Next phase of auto-wireup using multicast. Enable use of multicast groups to separate comm from different application groups. Have the orted bootstrap message go to a different rml tag so the node can be added to the pool.
This commit was SVN r22083.
2009-10-10 01:19:56 +00:00
Ralph Castain
1d7ab97c84 Update the multicast framework to allow specification of different message scopes per various RFCs. Redefine the API a little to utilize channel numbers without worrying about the specifics of their addressing
This commit was SVN r22037.
2009-09-30 14:40:43 +00:00
Ralph Castain
3167f0a0a0 Complete the next round of the multicast framework development. Needs further polish, upgrade to handle message fragmentation - but good enough for auto-bootstrap of orteds.
Teach the ess cm module to bootstrap orted launch

This commit was SVN r22006.
2009-09-23 20:57:49 +00:00
Ralph Castain
c3f9096fd9 Add a reliable multicast framework, with an initial basic module. This is configured out unless specifically requested via --enable-multicast.
This commit was SVN r21988.
2009-09-22 00:58:29 +00:00