1
1

22 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
47bf2574e1 Ensure that subscriptions for a specific requestor/subscription return id only get registered once. It appears that sometimes the system registers a subscription for the same return location multiple times. This prevents getting multiple callbacks when that happens. Still need to track down why it is happening at all.
This commit was SVN r7197.
2005-09-06 16:33:41 +00:00
Ralph Castain
96f4bb7a63 Hey, sports fans!! Guess what??
Here's the huge registry check-in you've all been waiting for with baited breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears.

In addition, the commit contains the following:

1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe

2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well.

In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first).

This commit was SVN r7118.
2005-09-01 01:07:30 +00:00
Rainer Keller
f0c2f78dd4 - Another one, just missed.
This commit was SVN r6976.
2005-08-22 18:12:05 +00:00
Rainer Keller
1ac8c75965 - Nothing of interest: Fixed comments, indentation...
To get a clear view on the next patch.

This commit was SVN r6975.
2005-08-22 18:02:10 +00:00
Josh Hursey
3b187c4db3 Fix the 'delete container' logic in gpr to prevent recursive delete of all
containers when one is requested.

Fix a bug in gpr_replica_del_index_api which doesn't preset num_tokens and
num_keys, but assumes they are 0.

Fix orte_ras_base_node_delete() function to operate properly to delete the
appropriate container in the 'orte-node' segment when requested.

This commit was SVN r6756.
2005-08-05 23:37:39 +00:00
Ralph Castain
4e1837687b Finish simplified interfaces for put and subscribe - more details to come.
This commit was SVN r6713.
2005-08-02 19:43:29 +00:00
Ralph Castain
8c6c78c47a Add a few new functions that were requested last week - not tested yet, so please don't use them! I will test them this afternoon on a different computer. For now, they won't cause any problems since they aren't being called.
This commit was SVN r6689.
2005-08-01 16:38:15 +00:00
Ralph Castain
4e79a51395 Add a job_info segment to the system that holds a container for each job. Within each container is a keyval indicating the job state (i.e., all procs at stage1, finalized, etc.). This provides a rough state-of-health for the job.
This required a little fiddling with a number of areas. Biggest problem was that it uncovered a potential for an infinite loop to be created in the registry. If a callback function modified the registry, the registry checked the triggers to see if anything had fired. Well, if the original callback was due to a trigger firing, that condition hadn't changed - so the trigger fired again....which caused the callback to be called, which modified the registry, which checked the triggers, etc. etc.

Triggers are now checked and then "flagged" as being "in process" so that the registry will NOT recheck that trigger until all callbacks have been processed. Tried doing this with subscriptions as well, but that caused a problem - when we release processes from a stagegate, they (at the moment) immediately place data on the registry that should cause a subscription to fire. Unfortunately, the system will just hang if that subscription doesn't get processed. So, I have left the subscription system alone - any callback function that modifies the registry in a fashion that will fire a subscription will indeed fire that subscription. We'll have to see if this causes problems - it shouldn't, but a careless user could lock things up if the callback generates a callback to itself.

Also fixed the code that placed a process' RML contact info on the registry to eliminate the leading '/' from the string.

This commit was SVN r6684.
2005-07-29 14:11:19 +00:00
Ralph Castain
13fdcff66b Fix a bug Greg was seeing on subscription returns - problem in pointer arithmetic
This commit was SVN r6594.
2005-07-22 20:46:07 +00:00
Ralph Castain
f604fb72db Turn "on" the delete functionality for the registry. Should now be able to delete entries and segments, and get an index of the dictionary entries on the registry.
Haven't fully tested these yet (nobody is using them at the moment that I know of - good thing, since they haven't been working for a long time - though I know the MPI-2 stuff needs the functionality), but will do so shortly. For now, they compile.

This commit was SVN r6567.
2005-07-20 18:07:46 +00:00
Ralph Castain
5e437f9a09 Fix a potential "free" that shouldn't happen
This commit was SVN r6552.
2005-07-19 16:21:06 +00:00
Ralph Castain
9af1739d33 Correct an opal_hash_table_get/set_proc name to orte_hash_table_get/set_proc.
Remove a couple of unused variable complaints from registry dump.

This commit was SVN r6550.
2005-07-19 13:33:04 +00:00
Jeff Squyres
7e413d6c26 Remove mistaken return with a value in a void function.
This commit was SVN r6548.
2005-07-19 12:23:41 +00:00
Ralph Castain
485e549f38 missing file
This commit was SVN r6545.
2005-07-18 21:18:26 +00:00
Ralph Castain
19d58ee17e First phase of the scalable RTE changes:
1. Modify the registry to eliminate redundant data copying for startup messages.

2. Revise the subscription/trigger system to avoid redundant storage of triggers and subscriptions. This dramatically reduces the search time when a registry action occurs - to illustrate the point, there are now only a handful of triggers on the system for each job. Before, there were a handful of triggers for each PROCESS in the job, all of which had to be checked every time something happened on the registry. This is much, much faster now.

3. Update all subscriptions to the new format. There are now "named" subscriptions - this allows you to "name" a subscription that all the processes will be using. The first one to hit the registry actually defines the subscription. From then on, any subsequent "subscribes" to the same name just cause that process to "attach" to the existing subscription. This keeps the number of subscriptions being tracked by the registry to a minimum, while ensuring that each process still gets notified.

4. Do the same for triggers.

Also fixed a duplicate subscription problem that was causing people to receive data equal to the number of processes times the data they should have received from a trigger/subscription. Sorry about that... :-( ...but it's all better now!

Uncovered a situation where the modex data seems to be getting entered on the registry a second time - the latter time coming after the compound command has been "fired", thereby causing all the subscriptions to fire. Asked Tim and Jeff to look into this.

Second phase of the changes will involve modifying the xcast system so that the same message gets sent to all processes. This will further reduce the message traffic, and - once we have a true "broadcast" version of xcast - really speed things up and improve scalability.

This commit was SVN r6542.
2005-07-18 18:49:00 +00:00
Ralph Castain
526217b9fc Two things here:
1. Fix the reigstry's overwrite logic. It was only overwriting the first keyval specified in a value - the rest were just added on regardless of whether or not the keyval already existed. This was the source of the multiple keyvals some people were seeing - should be fixed now.

2. Change the orted command parsing options so it reports options that aren't recognized - should help reduce confusion

This commit was SVN r6536.
2005-07-16 23:08:15 +00:00
Brian Barrett
9f44b80291 * rename ompi_argv to opal_argv
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few

This commit was SVN r6330.
2005-07-04 00:13:44 +00:00
Brian Barrett
a13166b500 * rename ompi_output to opal_output
This commit was SVN r6329.
2005-07-03 23:31:27 +00:00
Brian Barrett
39dbeeedfb * rename locking code from ompi to opal
This commit was SVN r6327.
2005-07-03 22:45:48 +00:00
Brian Barrett
761402f95f * rename ompi_list to opal_list
This commit was SVN r6322.
2005-07-03 16:22:16 +00:00
Brian Barrett
499e4de1e7 * rename ompi_object and ompi_class to opal_object and opal_class
This commit was SVN r6321.
2005-07-03 16:06:07 +00:00
Jeff Squyres
1b18979f79 Initial population of orte tree
This commit was SVN r6266.
2005-07-02 13:42:54 +00:00