40 Commits

Author SHA1 Message Date
Ralph Castain
445f8a802e Insert a flag into the registry replica to indicate that we are already processing callbacks, so don't call process_callbacks again. Helps to limit the stack size.
This commit was SVN r6047.
2005-06-13 17:00:57 +00:00
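
The guard flag described in 445f8a802e can be illustrated with a minimal sketch. Everything here is hypothetical: `replica_t`, its fields, and `run_callback` are stand-ins invented for illustration and are not the ORTE registry structures. The point is only that a re-entrant call returns immediately while the outer call keeps draining the queue, so the stack does not grow with each nested callback.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the registry replica state (not the ORTE type). */
typedef struct replica {
    bool processing_callbacks; /* set while the callback queue is being drained */
    int  pending;              /* callbacks waiting to be delivered */
    int  delivered;            /* callbacks delivered so far */
    int  depth;                /* current nesting of the drain loop */
    int  max_depth;            /* deepest nesting observed */
    int  respawn;              /* callbacks that will queue one more callback */
} replica_t;

void process_callbacks(replica_t *r);

/* Simulates a user callback that re-enters the registry. */
static void run_callback(replica_t *r)
{
    r->delivered++;
    if (r->respawn > 0) {
        r->respawn--;
        r->pending++;          /* the callback caused another notification */
        process_callbacks(r);  /* re-entrant call: returns at once, flag is set */
    }
}

void process_callbacks(replica_t *r)
{
    if (r->processing_callbacks) {
        return;                /* an outer call is already draining the queue */
    }
    r->processing_callbacks = true;
    r->depth++;
    if (r->depth > r->max_depth) {
        r->max_depth = r->depth;
    }
    while (r->pending > 0) {   /* the outer loop picks up newly queued work */
        r->pending--;
        run_callback(r);
    }
    r->depth--;
    r->processing_callbacks = false;
}
```

In this sketch, starting with one pending callback and five re-entering callbacks still delivers all six from a single drain loop, with a nesting depth of one.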
Ralph Castain
1c57ae20b0 Checkpoint the notifier work - notify when something is added now works, need to simply turn on the other checks.
Existing code shouldn't see any impacts. Tested on up to 125 processes.

This commit was SVN r6020.
2005-06-09 20:37:25 +00:00
Ralph Castain
51380eba13 Checkpoint the continuing re-enablement of the notifiers.
Also added a check to protect the callback system from an error being seen by Tim P. - should help with debugging.

This commit was SVN r6010.
2005-06-09 13:35:35 +00:00
Ralph Castain
ba7673a83f Checkpoint the first step in re-enabling the notification for subscriptions that monitor value changes. Added a new array that stores the actions each time the registry is called via a function that modifies its values. Updated the dump function to output the action records.
This commit was SVN r5995.
2005-06-08 19:40:38 +00:00
Ralph Castain
2451f3bdc9 Several fixes in this commit:
1. Fixed the GPR search engine so that AND-ing keys works, and so that multiple objects with the same key no longer break the search.

2. Added an orte_bitmap function based on the existing ompi_bitmap one, but without the Fortran "pollution".

3. Added a new name service function called create_my_name to remove the duplicate name creation that was happening with the RML. Basically, the RML has to assign a name when a process makes first contact if the process doesn't already have a name. For processes that get a name passed into them, this was okay - the name was already assigned. For other processes (e.g., singletons), this was not okay - the first message to the seed daemon was to create a name, which caused the RML to assign one, and then the name service to assign another.

4. Change orted so it gets its name the way everyone else does - during orte_init.

This commit was SVN r5842.
2005-05-24 13:39:15 +00:00
Josh Hursey
cc6cb5cac5 Checkpoint on Windows build.
Many changes to headers for OMPI_DECLSPEC, and 
proper placement of c_plusplus defines in those files.

mca/gpr/replica and tools are the two sets of directories
that still need work for the Windows build for this pass.

This commit was SVN r5688.
2005-05-11 20:21:10 +00:00
Jeff Squyres
a28b5ae43b Fix for a bunch of size_t issues; reviewed by George and Ralph.
- Change all uses of *printf'ing a size_t to use an explicit cast to
  (unsigned long) and the %lu escape
- change ORTE_GPR_REPLICA_MAX_SIZE to INT_MAX until bug 1345 is fixed
  (i.e., until we allow size_t in MCA params)
- ns_base_local_fns.c:orte_ns_base_get_proc_name_string(): changed
  from %0X -> %lu
- ORTE_NAME_ARGS added explicit (unsigned long) casts, and changed all
  usages of ORTE_NAME_ARGS to use %lu's

This commit was SVN r5644.
2005-05-08 13:22:55 +00:00
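
The size_t convention from a28b5ae43b (explicit cast to `unsigned long`, printed with `%lu`) might look like the sketch below. The names `format_size`, `example_name_t`, and `EXAMPLE_NAME_ARGS` are hypothetical and are not the ORTE definitions; the macro only mirrors the idea of an argument-list macro whose members are already cast. `snprintf` is used here purely to keep the sketch testable.

```c
#include <stdio.h>
#include <stddef.h>

/* Format a size_t without relying on %zu: cast explicitly and use %lu.
 * Returns the number of characters written (excluding the terminator). */
int format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%lu", (unsigned long) value);
}

/* Hypothetical process-name type with size_t members (illustration only). */
typedef struct {
    size_t job;
    size_t rank;
} example_name_t;

/* Hypothetical argument-list macro: every member is cast so that it
 * matches a "%lu" conversion in the caller's format string. */
#define EXAMPLE_NAME_ARGS(n) \
    (unsigned long) (n)->job, (unsigned long) (n)->rank

/* Format a name as "[job,rank]" using the macro above. */
int format_name(char *buf, size_t buflen, const example_name_t *name)
{
    return snprintf(buf, buflen, "[%lu,%lu]", EXAMPLE_NAME_ARGS(name));
}
```

The cast matters because passing a size_t where `%lu` expects an `unsigned long` is undefined behavior on platforms where the two types differ in width.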
Jeff Squyres
aa70022dc2 Commit 2 of 4 for bringing the changes over from the hetero branch.
Merged in from:

svn merge -r5448:5496 https://svn.open-mpi.org/svn/ompi/tmp/hetero .

This commit was SVN r5550.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r5448
  r5496
2005-05-01 00:53:00 +00:00
Ralph Castain
8686c60223 Fix the subscription system so it correctly deals with triggers at a level (as opposed to comparing two counters). Combined with Brian's latest checkin, this corrects the tendency for orterun to "hang" when one or more processes abnormally terminate.
This commit was SVN r5109.
2005-03-31 14:24:36 +00:00
Ralph Castain
dfe49d0fd2 Fix a subtle bug in the registry callback system that was manifesting itself in the singleton case and (randomly) in the multiprocess case.
Update the unit-test-status matrix to include priority.

Add several new registry diagnostics that helped track down the above bug.
M    test/mca/gpr/gpr_triggers.c
M    test/Unit-Test-Status.xls
M    test/Unit-Test-Status.pdf
M    src/mpi/runtime/ompi_mpi_init.c
M    src/mca/oob/base/oob_base_xcast.c
M    src/mca/ns/base/ns_base_nds_env.c
M    src/mca/gpr/replica/api_layer/gpr_replica_dump_api.c
M    src/mca/gpr/replica/api_layer/gpr_replica_api.h
M    src/mca/gpr/replica/communications/gpr_replica_comm.h
M    src/mca/gpr/replica/communications/gpr_replica_remote_msg.c
M    src/mca/gpr/replica/communications/gpr_replica_cmd_processor.c
M    src/mca/gpr/replica/communications/gpr_replica_dump_cm.c
M    src/mca/gpr/replica/gpr_replica_component.c
M    src/mca/gpr/replica/gpr_replica.h
M    src/mca/gpr/replica/functional_layer/gpr_replica_dump_fn.c
M    src/mca/gpr/replica/functional_layer/gpr_replica_fn.h
M    src/mca/gpr/replica/functional_layer/gpr_replica_trig_ops_fn.c
M    src/mca/gpr/replica/functional_layer/gpr_replica_messaging_fn.c
M    src/mca/gpr/replica/functional_layer/gpr_replica_segment_fn.c
M    src/mca/gpr/proxy/gpr_proxy_dump.c
M    src/mca/gpr/proxy/gpr_proxy.h
M    src/mca/gpr/proxy/gpr_proxy_component.c
M    src/mca/gpr/gpr_types.h
M    src/mca/gpr/base/base.h
M    src/mca/gpr/base/unpack_api_response/gpr_base_dump_notify.c
M    src/mca/gpr/base/pack_api_cmd/gpr_base_pack_dump.c
M    src/mca/gpr/gpr.h

This commit was SVN r5080.
2005-03-28 22:37:54 +00:00
Jeff Squyres
3f5541349a Add UC copyright
This commit was SVN r5009.
2005-03-24 12:43:37 +00:00
Brian Barrett
6822a519bb * results from initial merge of the tim branch into the trunk. Compiles and
ompi_info works, but that's all that has been tested.

This commit was SVN r4827.
2005-03-14 20:57:21 +00:00
Prabhanjan Kambadur
2439244f0c These changes enable dynamic builds to go through on Windows. Most of them add or delete Windows symbol-export declarations.
This commit was SVN r4377.
2005-02-10 19:08:35 +00:00
Prabhanjan Kambadur
34622a5f18 checking in changes that would make this thing work
This commit was SVN r4142.
2005-01-26 00:20:35 +00:00
Ralph Castain
65f874c556 Fix some memory problems. This will temporarily break comm_spawn again - Tim needs to grab this to take a look at something in connect_accept.
This commit was SVN r3772.
2004-12-10 16:27:09 +00:00
Ralph Castain
ed197f0186 More minor changes that continue to make progress on comm_spawn. Nothing significant - no impact on other operations.
PLEASE NOTE: there are some diagnostic messages in oob_xcast that will print out. Please don't have a cow about them - they won't hurt or injure anyone, and they are just there for a little while to help Tim and me debug a problem. Just didn't want to create yet another MCA parameter to debug 10 lines of code. :-)

This commit was SVN r3756.
2004-12-09 04:54:37 +00:00
Ralph Castain
d21c0027df Well, we are getting closer to resolving the comm_spawn problem. For the benefit of those that haven't been in the midst of this discussion, the problem is that this is the first case where the process starting a set of processes has not been mpirun and is not guaranteed to be alive throughout the lifetime of the spawned processes. This sounds simple, but actually has some profound impacts.
Most of this checkin consists of more debugging stuff. Hopefully, you won't see any printf's that aren't protected by debug flags - if you do, let me know and I'll take them out with my apologies.

Outside of debugging, the biggest change was a revamp of the shutdown process. For several reasons, we had chosen to have all processes "wait" for a shutdown message before exiting. This message is typically generated by mpirun, but in the case of comm_spawn we needed to do something else. We have decided that the best way to solve this problem is to:

(a) replace the shutdown message (which needed to be generated by somebody - usually mpirun) with an oob_barrier call. This still requires that the rank 0 process be alive. However, we terminate all processes if one abnormally terminates anyway, so this isn't a problem (with the standard or our implementation); and

(b) have the state-of-health monitoring subsystem issue the call to clean up the job from the registry. Since the state-of-health subsystem isn't available yet, we have temporarily assigned that responsibility to the rank 0 process. Once the state-of-health subsystem is available, we will have it monitor the job for all-processes-complete and then it can tell the registry to clean up the job (i.e., remove all data relating to this job).

Hope that helps a little. I'll put all this into the design docs soon.

This commit was SVN r3754.
2004-12-08 21:44:41 +00:00
Jeff Squyres
616269a9be Add HLRS copyright
This commit was SVN r3665.
2004-11-28 20:09:25 +00:00
Jeff Squyres
e9ed717748 First cut at copyrights: IU, UTK, and some OSU. LANL and HLRS still
pending.

This commit was SVN r3655.
2004-11-22 01:38:40 +00:00
Ralph Castain
bf9087d9d1 The merged main trunk and gpr integration branch. Tested on Mac only so far - will check out and test on Linux. If that has a problem, will back all changes out (again), but I think we have this one correct. Will send out a more complete change notice once testing is complete.
This commit was SVN r3644.
2004-11-20 19:12:43 +00:00
Brian Barrett
23a6d5bb60 * roll back r3584 (gpr changes to reduce floods) as it appears to cause
some instability on Linux

This commit was SVN r3587.

The following SVN revision numbers were found above:
  r3584 --> open-mpi/ompi@52add381d0
2004-11-17 02:30:07 +00:00
Brian Barrett
52add381d0 * Merge over the gpr changes Ralph has made on the gpr-integration branch.
This may trigger a complete rebuild :(.  Short overview of changes:

  - reduce number of network slams at startup
  - prevent gpr from hanging when doing process death code
  - general gpr cleanups

This commit was SVN r3584.
2004-11-16 22:53:33 +00:00
Ralph Castain
b42a361302 Patch a few things that were causing trouble for programs that re-entered the registry during a callback function. Also fixed a timing problem in rte_monitor - ensured that we were in fact already waiting on a condition before generating a wakeup signal. Adjusted the timing of mpirun to ensure that the synchro to alert mpirun of all-processes-done got registered before they completed.
This commit was SVN r2885.
2004-09-29 21:54:57 +00:00
Ralph Castain
311e6c1f23 ka-ching
This commit was SVN r2833.
2004-09-23 14:37:04 +00:00
Tim Woodall
0ffa11b904 modified locking
This commit was SVN r2698.
2004-09-16 08:26:57 +00:00
Ralph Castain
d0e308fbc4 First attempt to make the registry and name server subsystems thread-safe. Comment out the duplicate calls to register processes in mpi_init and mpirun2.
This commit was SVN r2697.
2004-09-16 04:14:35 +00:00
Ralph Castain
03096df34a Correct a mismatch in a parameter passed to one of the functions. Somehow, the parameter list to construct a notify message got twisted to passing key values instead of tokens. This meant that the parameters sent to "get" were also incorrect, guaranteeing that the whole notify system would fail.
Not sure how the mistake got in there - possibly just some lack of coordination. Anyway, it has now been corrected.

This commit was SVN r2440.
2004-09-02 01:44:19 +00:00
Ralph Castain
854ac21038 Provide a couple of new synchro options:
1. greater than or equal
2. less than or equal

Adjust ascending/descending mode to require transition through level. Change initial checks to only check levels.

This commit was SVN r2428.
2004-09-01 15:27:50 +00:00
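
One possible reading of the synchro modes listed in 854ac21038 is sketched below. This is an interpretation, not the registry's code: the enum, the function name `synchro_fires`, and the exact comparison rules are assumptions. The level modes look only at the current count, while the ascending and descending modes are assumed to fire only when the count moves through the level between two observations.

```c
#include <stddef.h>

/* Hypothetical mode names (not the GPR constants). */
typedef enum {
    SYNCHRO_ASCENDING,   /* count rises through the level */
    SYNCHRO_DESCENDING,  /* count falls through the level */
    SYNCHRO_GE,          /* count is greater than or equal to the level */
    SYNCHRO_LE           /* count is less than or equal to the level */
} synchro_mode_t;

/* Returns 1 if the trigger should fire given the previous and current
 * object counts on a segment, 0 otherwise. */
int synchro_fires(synchro_mode_t mode, size_t prev, size_t cur, size_t level)
{
    switch (mode) {
    case SYNCHRO_ASCENDING:
        return prev < level && cur >= level;   /* requires an upward transition */
    case SYNCHRO_DESCENDING:
        return prev > level && cur <= level;   /* requires a downward transition */
    case SYNCHRO_GE:
        return cur >= level;                   /* pure level check */
    case SYNCHRO_LE:
        return cur <= level;                   /* pure level check */
    }
    return 0;
}
```

Under these assumed rules, a count going from 5 to 6 with a level of 3 satisfies the greater-than-or-equal mode but not the ascending mode, because the count was already above the level.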
Ralph Castain
71ad56d894 Update the registry to complete the publish/subscribe system. Add new function "cancel_synchro" to remove a synchro trigger, enable subscribe/unsubscribe on proxies, minor cleanups. Registry should now be fully functional. Note that I am currently unable to test this in a multi-process environment - can only guarantee it compiles and passes replica-only tests.
This commit was SVN r2393.
2004-08-30 15:15:27 +00:00
Ralph Castain
37f331dcff Update to the registry, bringing publish/subscribe system online. Most functions now available - exceptions are a couple of more esoteric notify modes, and the "unsubscribe" function. Will bring those online over the weekend.
This commit was SVN r2346.
2004-08-28 01:59:56 +00:00
Ralph Castain
1df5cca479 Clean up constructors/destructors for proper memory management.
This commit was SVN r2325.
2004-08-27 15:36:53 +00:00
Ralph Castain
f1ab634fab Massive update of the registry system to incorporate publish/subscribe notifications, including new "synchro" function that allows a barrier-like operation to be performed on a registry segment. Corresponding update to unit test for replica - system passes, but additional new functionality needs to be added to test.
This commit was SVN r2317.
2004-08-27 05:23:04 +00:00
Ralph Castain
ee1f0b13f4 Add a new function - synchro - that allows the caller to be notified when a specified number of objects have been placed on a segment. The tokens describing the objects must also be provided, which means you can intermix objects on the segment and still be notified when you reach the specified number of objects meeting your description.
Code compiles, but has not been functionally validated yet.

This commit was SVN r2290.
2004-08-25 01:59:36 +00:00
Ralph Castain
8edd599cf4 Update header files to reflect change in API - registry objects will now be void*. Users can call the packing routines to generate packed info, then extract and pass the resulting object to the GPR for storage. Users can also simply send a string for storage without packing as characters carry across platforms just fine.
Added OOB connection - still needs testing, but test tree is currently kaput.

This commit was SVN r2226.
2004-08-19 15:14:29 +00:00
Ralph Castain
964ad6ae46 Added hook to test internal subsystems. Expanded unit test to exploit. Fixed a few bugs.
This commit was SVN r2177.
2004-08-17 04:23:06 +00:00
Ralph Castain
2db94bb016 Checkpoint the registry. Now have several of the basic functions operating - can define segments, get index of dictionaries, etc - all the basic elements required to build get/put. Will continue tonight.
This commit was SVN r2169.
2004-08-16 21:36:50 +00:00
Ralph Castain
60a5428e43 Update the registry - restore the gpr_replica.c file into the make system, update functionality.
This commit was SVN r2164.
2004-08-16 16:20:47 +00:00
Ralph Castain
b433fc3ad9 Added the local support functions for the replica. Everything compiles fine on my machine - next step is to test these functions.
This commit was SVN r2154.
2004-08-15 22:37:01 +00:00
Ralph Castain
49e8e16148 Fix the registry software to compile - get all the naming errors finally out, etc. The functionality is not present, so don't use it yet - nothing will happen. I'll be restoring the functionality over the next week or two.
This commit was SVN r2146.
2004-08-15 05:49:55 +00:00
Ralph Castain
c0121cb927 Major update to the general purpose registry. Cleaned up the mess from name changes, begin building the functionality. Long way to go....
I have to commit this to clean up a break in my tree. I'm hoping it won't break the compile of the tree, but will fix it as quickly as possible.

Jeff - you are welcome to set an "ignore" on the gpr if you like - I'll let you know when I've got the "kinks" out.

This commit was SVN r2145.
2004-08-15 03:33:13 +00:00