Commit graph

28 commits

Author SHA1 Message Date
Prabhanjan Kambadur
2439244f0c These changes enable dynamic builds to go through on Windows. Most of them add or remove Windows symbol-exporting code.
This commit was SVN r4377.
2005-02-10 19:08:35 +00:00
Prabhanjan Kambadur
34622a5f18 checking in changes that would make this thing work
This commit was SVN r4142.
2005-01-26 00:20:35 +00:00
Ralph Castain
65f874c556 Fix some memory problems. This will temporarily break comm_spawn again - Tim needs to grab this to take a look at something in connect_accept.
This commit was SVN r3772.
2004-12-10 16:27:09 +00:00
Ralph Castain
ed197f0186 More minor changes that continue to make progress on comm_spawn. Nothing significant - no impact on other operations.
PLEASE NOTE: there are some diagnostic messages in oob_xcast that will print out. Please don't have a cow about them - they won't hurt or injure anyone, and they're just there for a little while to help Tim and me debug a problem. Just didn't want to create yet another MCA parameter to debug 10 lines of code. :-)

This commit was SVN r3756.
2004-12-09 04:54:37 +00:00
Ralph Castain
d21c0027df Well, we are getting closer to resolving the comm_spawn problem. For the benefit of those that haven't been in the midst of this discussion, the problem is that this is the first case where the process starting a set of processes has not been mpirun and is not guaranteed to be alive throughout the lifetime of the spawned processes. This sounds simple, but actually has some profound impacts.
Most of this checkin consists of more debugging stuff. Hopefully, you won't see any printf's that aren't protected by debug flags - if you do, let me know and I'll take them out with my apologies.

Outside of debugging, the biggest change was a revamp of the shutdown process. For several reasons, we had chosen to have all processes "wait" for a shutdown message before exiting. This message is typically generated by mpirun, but in the case of comm_spawn we needed to do something else. We have decided that the best way to solve this problem is to:

(a) replace the shutdown message (which needed to be generated by somebody - usually mpirun) with an oob_barrier call. This still requires that the rank 0 process be alive. However, we terminate all processes if one abnormally terminates anyway, so this isn't a problem (with the standard or our implementation); and

(b) have the state-of-health monitoring subsystem issue the call to clean up the job from the registry. Since the state-of-health subsystem isn't available yet, we have temporarily assigned that responsibility to the rank 0 process. Once the state-of-health subsystem is available, we will have it monitor the job for all-processes-complete and then it can tell the registry to clean up the job (i.e., remove all data relating to this job).

Hope that helps a little. I'll put all this into the design docs soon.

This commit was SVN r3754.
2004-12-08 21:44:41 +00:00
Jeff Squyres
616269a9be Add HLRS copyright
This commit was SVN r3665.
2004-11-28 20:09:25 +00:00
Jeff Squyres
e9ed717748 First cut at copyrights: IU, UTK, and some OSU. LANL and HLRS still
pending.

This commit was SVN r3655.
2004-11-22 01:38:40 +00:00
Ralph Castain
bf9087d9d1 The merged main trunk and gpr integration branch. Tested on Mac only so far - will check out and test on Linux. If that has a problem, will back all changes out (again), but I think we have this one correct. Will send out a more complete change notice once testing is complete.
This commit was SVN r3644.
2004-11-20 19:12:43 +00:00
Brian Barrett
23a6d5bb60 * roll back r3584 (gpr changes to reduce floods) as it appears to cause
some instability on Linux

This commit was SVN r3587.

The following SVN revision numbers were found above:
  r3584 --> open-mpi/ompi@52add381d0
2004-11-17 02:30:07 +00:00
Brian Barrett
52add381d0 * Merge over the gpr changes Ralph has made on the gpr-integration branch.
This may trigger a complete rebuild :(.  Short overview of changes:

  - reduce number of network slams at startup
  - prevent gpr from hanging when doing process death code
  - general gpr cleanups

This commit was SVN r3584.
2004-11-16 22:53:33 +00:00
Ralph Castain
b42a361302 Patch a few things that were causing trouble for programs that re-entered the registry during a callback function. Also fixed a timing problem in rte_monitor - ensured that we were in fact already waiting on a condition before generating a wakeup signal. Adjusted the timing of mpirun to ensure that the synchro to alert mpirun of all-processes-done got registered before they completed.
This commit was SVN r2885.
2004-09-29 21:54:57 +00:00
Ralph Castain
311e6c1f23 ka-ching
This commit was SVN r2833.
2004-09-23 14:37:04 +00:00
Tim Woodall
0ffa11b904 modified locking
This commit was SVN r2698.
2004-09-16 08:26:57 +00:00
Ralph Castain
d0e308fbc4 First attempt at making the registry and name server subsystems thread-safe. Comment out the duplicate calls to register processes in mpi_init and mpirun2.
This commit was SVN r2697.
2004-09-16 04:14:35 +00:00
Ralph Castain
03096df34a Correct a mismatch in a parameter passed to one of the functions. Somehow, the parameter list to construct a notify message got twisted into passing key values instead of tokens. This meant that the parameters sent to "get" were also incorrect, guaranteeing that the whole notify system would fail.
Not sure how the mistake got in there - possibly just some lack of coordination. Anyway, it has now been corrected.

This commit was SVN r2440.
2004-09-02 01:44:19 +00:00
Ralph Castain
854ac21038 Provide a couple of new synchro options:
1. greater than or equal
2. less than or equal

Adjust ascending/descending mode to require transition through level. Change initial checks to only check levels.

This commit was SVN r2428.
2004-09-01 15:27:50 +00:00
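
The synchro options named in r2428 can be read as two kinds of trigger: level checks (greater than or equal, less than or equal) and transition checks (ascending, descending). The sketch below is one possible interpretation only, not the registry's actual code; the names `Mode`, `level_check`, and `should_fire` are hypothetical, and the exact comparison used for "transition through level" is an assumption.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical names for the four synchro modes mentioned in the commit.
    ASCENDING = "ascending"    # fire on a transition up through the level
    DESCENDING = "descending"  # fire on a transition down through the level
    GE = "greater_or_equal"    # fire whenever count >= level
    LE = "less_or_equal"       # fire whenever count <= level


def level_check(mode, count, level):
    """Level-only check, the kind that can run when a synchro is first registered."""
    if mode is Mode.GE:
        return count >= level
    if mode is Mode.LE:
        return count <= level
    # Assumption: transition modes cannot fire without a previous count.
    return False


def should_fire(mode, previous, count, level):
    """Check run after the object count changes from `previous` to `count`."""
    if mode is Mode.ASCENDING:
        return previous < level <= count
    if mode is Mode.DESCENDING:
        return previous > level >= count
    return level_check(mode, count, level)
```

Under this reading, an ascending synchro at level 3 fires when the count moves from 2 to 3, but not when it moves from 3 to 4, whereas a greater-than-or-equal synchro would report true in both cases.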
Ralph Castain
71ad56d894 Update the registry to complete the publish/subscribe system. Add new function "cancel_synchro" to remove a synchro trigger, enable subscribe/unsubscribe on proxies, minor cleanups. Registry should now be fully functional. Note that I am currently unable to test this in a multi-process environment - can only guarantee it compiles and passes replica-only tests.
This commit was SVN r2393.
2004-08-30 15:15:27 +00:00
Ralph Castain
37f331dcff Update to the registry, bringing publish/subscribe system online. Most functions now available - exceptions are a couple of more esoteric notify modes, and the "unsubscribe" function. Will bring those online over the weekend.
This commit was SVN r2346.
2004-08-28 01:59:56 +00:00
Ralph Castain
1df5cca479 Clean up constructors/destructors for proper memory management.
This commit was SVN r2325.
2004-08-27 15:36:53 +00:00
Ralph Castain
f1ab634fab Massive update of the registry system to incorporate publish/subscribe notifications, including new "synchro" function that allows a barrier-like operation to be performed on a registry segment. Corresponding update to unit test for replica - system passes, but additional new functionality needs to be added to test.
This commit was SVN r2317.
2004-08-27 05:23:04 +00:00
Ralph Castain
ee1f0b13f4 Add a new function - synchro - that allows the caller to be notified when a specified number of objects have been placed on a segment. The tokens describing the objects must also be provided, which means you can intermix objects on the segment and still be notified when you reach the specified number of objects meeting your description.
Code compiles, but has not been functionally validated yet.

This commit was SVN r2290.
2004-08-25 01:59:36 +00:00
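
The synchro idea in r2290 (notify the caller once a given number of objects matching a set of tokens sit on a segment) can be sketched as follows. This is an illustration under stated assumptions, not the GPR implementation: the segment is modeled as a plain list of `(tokens, value)` pairs, an object is assumed to match only if it carries every requested token, and the function names are hypothetical.

```python
def count_matching(segment, tokens):
    """Count objects on the segment whose token set contains every requested token."""
    wanted = set(tokens)
    return sum(1 for obj_tokens, _value in segment if wanted <= set(obj_tokens))


def synchro_reached(segment, tokens, trigger):
    """Return True once at least `trigger` matching objects are on the segment."""
    return count_matching(segment, tokens) >= trigger


# Objects with different descriptions can be intermixed on one segment;
# only those matching the requested tokens count toward the trigger.
segment = [
    (["job-1", "proc"], "a"),
    (["job-1", "proc"], "b"),
    (["job-2", "proc"], "c"),
]
```

Here `synchro_reached(segment, ["job-1"], 2)` is true, because two objects carry the `job-1` token, while the `job-2` object on the same segment is ignored.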
Ralph Castain
8edd599cf4 Update header files to reflect change in API - registry objects will now be void*. Users can call the packing routines to generate packed info, then extract and pass the resulting object to the GPR for storage. Users can also simply send a string for storage without packing as characters carry across platforms just fine.
Added OOB connection - still needs testing, but test tree is currently kaput.

This commit was SVN r2226.
2004-08-19 15:14:29 +00:00
Ralph Castain
964ad6ae46 Added hook to test internal subsystems. Expanded unit test to exploit. Fixed a few bugs.
This commit was SVN r2177.
2004-08-17 04:23:06 +00:00
Ralph Castain
2db94bb016 Checkpoint the registry. Now have several of the basic functions operating - can define segments, get index of dictionaries, etc - all the basic elements required to build get/put. Will continue tonight.
This commit was SVN r2169.
2004-08-16 21:36:50 +00:00
Ralph Castain
60a5428e43 Update the registry - restore the gpr_replica.c file into the make system, update functionality.
This commit was SVN r2164.
2004-08-16 16:20:47 +00:00
Ralph Castain
b433fc3ad9 Added the local support functions for the replica. Everything compiles fine on my machine - next step is to test these functions.
This commit was SVN r2154.
2004-08-15 22:37:01 +00:00
Ralph Castain
49e8e16148 Fix the registry software to compile - get all the naming errors finally out, etc. The functionality is not present, so don't use it yet - nothing will happen. I'll be restoring the functionality over the next week or two.
This commit was SVN r2146.
2004-08-15 05:49:55 +00:00
Ralph Castain
c0121cb927 Major update to the general purpose registry. Cleaned up the mess from name changes, began building the functionality. Long way to go....
I have to commit this to clean up a break in my tree. I'm hoping it won't break the compile of the tree, but will fix it as quickly as possible.

Jeff - you are welcome to set an "ignore" on the gpr if you like - I'll let you know when I've got the "kinks" out.

This commit was SVN r2145.
2004-08-15 03:33:13 +00:00