1
1
Граф коммитов

115 Коммитов

Автор SHA1 Сообщение Дата
George Bosilca
de324502bc Update the Windows wait functions. The most important change is for
the event registration, which in the case of a process dead detection
should be marked as fire once and taking long time.

This commit was SVN r15068.
2007-06-14 04:35:46 +00:00
Brian Barrett
84d1512fba Add the potential for doing some basic error checking on mutexes during
single threaded builds.  In its default configuration, all this does
is ensure that there's at least a good chance of threads building
based on non-threaded development (since the variable names will be
checked).  There is also code to make sure that a "mutex" is never
"double locked" when using the conditional macro mutex operations.
This is off by default because there are a number of places in both
ORTE and OMPI where this alarm spews mega bytes of errors on a
simple test.  So we have some work to do on our path towards
thread support.

Also removed the macro versions of the non-conditional thread locks,
as the only places they were used, the author of the code intended
to use the conditional thread locks.  So now you have upper-case
macros for conditional thread locks and lowercase functions for
non-conditional locks.  Simple, right? :).

This commit was SVN r15011.
2007-06-12 16:25:26 +00:00
Ralph Castain
85df3bd92f Bring in the generalized xcast communication system along with the correspondingly revised orted launch. I will send a message out to developers explaining the basic changes. In brief:
1. generalize orte_rml.xcast to become a general broadcast-like messaging system. Messages can now be sent to any tag on the daemons or processes. Note that any message sent via xcast will be delivered to ALL processes in the specified job - you don't get to pick and choose. At a later date, we will introduce an augmented capability that will use the daemons as relays, but will allow you to send to a specified array of process names.

2. extended orte_rml.xcast so it supports more scalable message routing methodologies. At the moment, we support three: (a) direct, which sends the message directly to all recipients; (b) linear, which sends the message to the local daemon on each node, which then relays it to its own local procs; and (b) binomial, which sends the message via a binomial algo across all the daemons, each of which then relays to its own local procs. The crossover points between the algos are adjustable via MCA param, or you can simply demand that a specific algo be used.

3. orteds no longer exhibit two types of behavior: bootproxy or VM. Orteds now always behave like they are part of a virtual machine - they simply launch a job if mpirun tells them to do so. This is another step towards creating an "orteboot" functionality, but also provided a clean system for supporting message relaying.

Note one major impact of this commit: multiple daemons on a node cannot be supported any longer! Only a single daemon/node is now allowed.

This commit is known to break support for the following environments: POE, Xgrid, Xcpu, Windows. It has been tested on rsh, SLURM, and Bproc. Modifications for TM support have been made but could not be verified due to machine problems at LANL. Modifications for SGE have been made but could not be verified. The developers for the non-verified environments will be separately notified along with suggestions on how to fix the problems.

This commit was SVN r15007.
2007-06-12 13:28:54 +00:00
Brian Barrett
1d11cc4b2d Fix mis-declared variable type
This commit was SVN r14994.
2007-06-11 16:48:35 +00:00
Ralph Castain
983fd3432a Fix singleton comm_spawn. Ensure that singleton's start the RML receive function so they can receive RML updates during xconnect procedures once any comm_spawn'd children start. Since singleton's only use the RMGR/URM component, update that component to also hold us until xconnect is completed (if it is invoked) before returning to the caller.
This commit was SVN r14914.
2007-06-06 17:39:23 +00:00
Brian Barrett
508da4e959 OS X apparently really doesn't like shared libraries with unresolvable
symbols in them and environ is defined only in the final application
(probably in crt1.o).  Apple provides a function for getting at the
environment, so use that instead if it's available.

This commit was SVN r14857.
2007-06-05 03:03:59 +00:00
Brian Barrett
34fea87819 * Only need to to the opal_progress_event_users_increment() once between
OPAL and ORTE.  Since we now do opal_progress_init(), we do it
    there.  Fixes a performance issue introduced in r14773.
  * While trying to find the above, notived that we did the reference
    counting for the init in init_util and for finalize in fini.  That
    isn't right, so make them both in the non-util versions.

This commit was SVN r14830.

The following SVN revision numbers were found above:
  r14773 --> open-mpi/ompi@1e678c3f55
2007-06-01 02:43:46 +00:00
George Bosilca
905570a6d2 Call opal_show_help with the expected number of arguments.
This commit was SVN r14802.
2007-05-30 18:49:43 +00:00
Josh Hursey
1e678c3f55 per conversation with Ralph and Jeff take out the opal_init_only logic.
This commit moves the initalization/finalization of opal_event and opal_progress
to opal_init/finalize. These were previously init/final in ORTE which is an
abstraction violation. After talking about it we concluded that there are no
ordering issues that require these to be init/final in ORTE instead of OPAL.

I ran the IBM test suite against this commit and it didn't turn up any new
failures so I think it is good to go.

Let us know if this causes problems.

This commit was SVN r14773.
2007-05-24 21:54:58 +00:00
Ralph Castain
b582d98d4a Fix singleton comm_spawn so it can see available resources
This commit was SVN r14719.
2007-05-22 13:29:07 +00:00
Ralph Castain
677eb5e4bc Ensure the singleton wakes up when comm_spawn fails
This commit was SVN r14714.
2007-05-21 20:13:31 +00:00
Ralph Castain
4fff584a68 Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that *did* start.
The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system.

Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed.

Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief.

With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn.

Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put".

This commit was SVN r14711.
2007-05-21 18:31:28 +00:00
Josh Hursey
596062d34b Seems that the recent changes in the sds and oob exposed some invalid
assumptions in the FT restart code for the ORTE layer.

This fixes those problems by having the RML completely shutdown and 
restart the OOB framework (instead of just the module as before).
This makes it much easier to manage, and maintainable as the OOB
changes in the future.

The SDS now does communication as part of its startup procedure, so
we need to make sure we restart the RML before the SDS so that it can
communicate properly.

OOB base [close|open] used a static bool to determine if they have
been called previously or not. I needed to expose this boolean so 
that I can close() then open() the oob base in the restart procedure.
The functionality has not changed, we just now have the ability to 
open/close the framework as many times as we need to as long as we
always call them in that order. (So calling open twice in a row is not allowed
as before, it is only allowed if you open(), close(), then open() again).

Things seem to be working now.

This commit was SVN r14515.
2007-04-25 19:51:52 +00:00
Ralph Castain
18cb5c9762 Complete modifications for failed-to-start of applications. Modifications for failed-to-start of orteds coming next.
This completes the minor changes required to the PLS components. Basically, there is a small change required to the parameter list of the orted cmd functions. I caught and did it for xcpu and poe, in addition to the components listed in my email - so I think that only leaves xgrid unconverted.

The orted fail-to-start mods will also make changes in the PLS components, but those can be localized so they come in one at a time.

This commit was SVN r14499.
2007-04-24 20:53:54 +00:00
Ralph Castain
18b2dca51c Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly.
There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd *really* have to try).

This also involved a slight change to the oob.xcast API, so propagated that as required.

Note: this has *only* been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-)

Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately.

This commit was SVN r14475.
2007-04-23 18:41:04 +00:00
Ralph Castain
f47e7382e3 Add a new function to wake orterun up - used in failed-to-start scenarios, but can be used anytime a lower level needs to ensure orterun wakes up
This commit was SVN r14466.
2007-04-23 12:49:25 +00:00
George Bosilca
ef4baeb6ab Don't reset the pid, as at this point it is already set.
This commit was SVN r14235.
2007-04-05 20:13:50 +00:00
George Bosilca
f2a6b9394f Deal with the include spree. Protect "environ" on Windows.
Some others minors modifications in order to make it
compile [again] on Windows.

This commit was SVN r14188.
2007-04-01 16:16:54 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
George Bosilca
7750ed22e0 Correct the Windows part of the universe detection.
This commit was SVN r13547.
2007-02-07 22:37:28 +00:00
Rolf vandeVaart
bf5113198d Update to orte-clean so it will remove files on local and
remote nodes.  It will also kill off rogue orteds and orterun
processes.  The killing of processes is ifdef'ed out for Windows
since I do not know how to do it there.  Note that this change
will requite an autogen.  

This commit was SVN r13477.
2007-02-03 00:25:42 +00:00
Jeff Squyres
78a13bc3ea Fix the MPI_ABORT problem. We added an orte_initialized variable
yesterday and set it to "true" in orte_init().  But ompi_mpi_init()
doesn't call orte_init() -- it calls orte_init_stage1() and
orte_init_stage2(). So orte_initialized was never set to true, and
Badness happend from there (w.r.t. ompi_mpi_abort()).

This patch moves the setting of orte_initialized to orte_init_stage2()
so that everyone will always get it set properly.

It also moves setting orte_universe_info.state to RUNNING into
stage2() as well -- Ralph confirmed that that should have been there
for the same reasons that orte_initialized needs to be there.

This commit was SVN r13374.
2007-01-30 23:00:43 +00:00
Jeff Squyres
e90b3e415b * Before this commit, if we called ompi_mpi_abort() before MPI_INIT
completed successfully, Bad Things(tm) could happen.
 * Now we explicitly check orte_initialized (a new global in ORTE
   indicating whether we are between orte_init() and orte_finalize()
   or not), and if so, react accordingly.
 * If ORTE is initialized, use orte_system_info.nodename; otherwise,
   use gethostname().
 * Add loop protection to ensure that ompi_mpi_abort() is not invoked
   multiple times recursively.

This commit was SVN r13354.
2007-01-29 22:01:28 +00:00
Jeff Squyres
974dcebf9f Finish backing out r13316 by also removing the comments that it
insertted.

This commit was SVN r13324.

The following SVN revision numbers were found above:
  r13316 --> open-mpi/ompi@35c1370a13
2007-01-26 13:09:18 +00:00
George Bosilca
29597cf0c5 We need to initialize the ODLS as they are the only one to define
the ORTE_DAEMON_CMD type. Which, unfortunately, is used all over
the place. Without this, we get error:
[msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../ompi-trunk/orte/dss/dss_pack.c at line 83
[msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../ompi-trunk/orte/dss/dss_pack.c at line 58
[msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../../../ompi-trunk/orte/mca/pls/base/pls_base_orted_cmds.c at line 136

This commit was SVN r13320.
2007-01-26 04:32:15 +00:00
Ralph Castain
0905dfdfba Make sure the params.h file gets included in the tarballs
This commit was SVN r13318.
2007-01-26 03:05:30 +00:00
Rich Graham
35c1370a13 odls components are handled only by daemon procs.
This commit was SVN r13316.
2007-01-25 21:18:59 +00:00
Ralph Castain
ab5ea61100 Bring over the rest of the ctrl-c fixes. This commit includes:
1. add a "cancel_operation" API to the pls components that allows orterun to demand that an orted operation (e.g., terminate_job) be immediately cancelled and abandoned.

2. changes the pls orted commands from blocking to non-blocking. This allows us to interrupt those operations should an orted be non-responsive. The change also adds an orte_abort_timeout that limits how long orterun will automatically wait for the orteds to respond - if the terminate command, for example, doesn't see orted response within that time, then we printout an appropriate error message and just give up.

3. modifies orterun to allow multiple ctrl-c's to simply abort the program even if the orteds have not responded

4. does some cleanup on the orte-level mca params so that their implementation looks a lot more like that of ompi - makes it easier to maintain. This change also includes the definition of an orte_abort_timeout struct and associated MCA param (can't have too many!) so you can set the time after which orterun gives up on waiting for orteds to respond

This needs more testing before migrating to 1.2.

This commit was SVN r13304.
2007-01-25 14:17:44 +00:00
George Bosilca
5711583bdf Force only one thread to come out from the
socket engine.

This commit was SVN r13298.
2007-01-25 07:36:42 +00:00
George Bosilca
950b07d860 Work around the Windows sockets model.
This commit was SVN r13294.
2007-01-25 00:19:02 +00:00
George Bosilca
3169a29da4 Revert commit r13235.
This commit was SVN r13238.

The following SVN revision numbers were found above:
  r13235 --> open-mpi/ompi@2636881324
2007-01-22 06:46:58 +00:00
George Bosilca
2636881324 Remove unused variables.
This commit was SVN r13235.
2007-01-22 05:46:57 +00:00
George Bosilca
93c3e3a21f __WINDOWS__ is defined or not.
This commit was SVN r13234.
2007-01-22 05:46:30 +00:00
Brian Barrett
2755d5ccef Only do Windows things if we're on Windows. Need another case for when we
don't have windows and we don't have waitpid() (ie, the Cray)

This commit was SVN r13173.
2007-01-17 23:16:52 +00:00
George Bosilca
409d1b8a8d Make the universe creation functions Windows friendly again.
This commit was SVN r13041.
2007-01-08 21:58:57 +00:00
Rolf vandeVaart
fdf44cc4ab Add the ability to not only report broken files and directories,
but remove them also.  This current set of changes will affect
nothing as no one is making use of this ability.  However, orte-clean
will be changed soon to utilize this new feature.

This commit was SVN r12996.
2007-01-04 21:48:34 +00:00
Ralph Castain
90f5e3fad8 Fix a buglet in the singleton startup procedure. For purposes of minimizing the xcast message, we "strip" the descriptive info on all subscription messages. This means, though, that we have to store the process name and other info so it can be retrieved in the body of the subscription data (as opposed to in the description). This wasn't being done for singletons because they don't call the RMAPS to "map" themselves.
This has now been corrected. The singleton startup will dutifully call the mapper framework so that the proper data storage locations get initialized. Unfortunately, we then had to instruct the RMAPS not to allocate a vpid range for this job - otherwise, it would make a mistake and think there were two processes in it. Hence, a change was required to RMAPS to tell it "map this job, but don't allocate a vpid range for it".

This change will need to migrate across to 1.2 after it "soaks" the appropriate time.

This commit was SVN r12952.
2007-01-02 16:14:44 +00:00
Ralph Castain
82946cb220 Add a new option to orterun: "--do-not-launch" directs the system to do the allocation, map, job setup, etc., but don't actually launch the job. This lets us test all the setup portions of the code.
Also, take the first step in updating how we handle mca params in ORTE - bring it closer to how it is done in the other two layers. Much more work to be done here.

This commit was SVN r12838.
2006-12-13 04:51:38 +00:00
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
Rainer Keller
e61dd8722e - Silence compiler on ORTE_TRANSPORT_KEY_FMT, it is fixed to llx
- No functional changes, just indentation and corrections to error
   output.

This commit was SVN r12734.
2006-12-03 13:59:23 +00:00
Rainer Keller
b63500f62c - Dont unlock ompi_rte_mutex unconditionally, use the macro instead.
This commit was SVN r12655.
2006-11-22 21:01:43 +00:00
Ralph Castain
6fca1431f3 Back out some prior commits. These commits fixed bproc so it would run, but broke several other things (singleton comm_spawn and hostfile operations have been identified so far). Since bproc is the culprit here, let's leave bproc broken for now - I'll work on a fix for that environment that doesn't impact everythig else.
This commit was SVN r12648.
2006-11-22 13:30:21 +00:00
Brian Barrett
33320b7165 Rework the opal_progress interface to better support dynamic processes and at
the same time, remove some of the MPI-related options from OPAL:

  - provide mechanism to change at runtime whether sched_yield() should 
    be called when the progress engine is idle
  - provide mechanism for changing the rate at which the event engine
    is called when there are "no" users of the event engine (ie, when
    using MPI but not TCP)
  - fix some function names in the progress engine to better match
    their intended use (and remove MPI naming scheme)
  - remove progress_mpi_enable / progress_mpi_disable because 
    we can now use the functions to set the sched_yield and
    tick rate interfaces
  - rename opal_progress_events() to opal_progress_set_event_flag()
    because the first really isn't descriptive of what the function
    does and I always got confused by it

This commit was SVN r12645.
2006-11-22 02:06:52 +00:00
Ralph Castain
9f3dcd147a Round and round the mulberry bush we go...
Fix comm_spawn by singletons. orte_init does some voodoo to let the system know about localhost when we are a singleton. This includes allocating it so that any comm_spawn'd children can use their parent "allocation". Unfortunately, the fix that bproc needs (due to that smr filling up the node segment!) causes the singleton startup to fail. The fix is to just have the singleton startup force an allocation of its localhost.

Only issue here is: what happens if we are in a persistent universe? The singleton will now overwrite any prior info on slots used on localhost by other jobs (won't affect anything else). The answer, of course, is to do something more intelligent - lookup localhost on the registry and just update its info instead of overwriting it.

Something for another day (or month....or year)

This commit was SVN r12644.
2006-11-21 21:51:58 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
Ralph Castain
a3be8261fb Fix a bug that had us generate an error message and abort startup when there were stale universe directories around. Now, we just ignore them.
This commit was SVN r12472.
2006-11-07 21:34:57 +00:00
Ralph Castain
194bdd413b Cleanup the problem of connecting to default universes.
This commit was SVN r12438.
2006-11-06 15:28:38 +00:00
Ralph Castain
7a77ef0ae3 Given the amount of pain singletons cause, one can't help but wonder if it REALLY was that much trouble for people to type "mpirun -n 1 foo"....sigh.
Get the ordering right so that a singleton can start.

Protect the rmgr copy app_context function from NULL fields

Tell the mapper it is okay for there not to be a pre-existing mapping plan for a parent when dynamically spawning processes

This commit was SVN r12257.
2006-10-23 15:15:45 +00:00
Ralph Castain
153e38ffc9 Lesson to be learned: if you send an ack to a recv'd command, better not send it to the same tag it came from - at least, not if there is a persistent recv on that tag!
Fix the persistent daemon problem where it was exiting when a job completed. Problem was that the persistent daemon would order the job daemons to exit. They would then send an 'ack' back to the persistent daemon - but the ack consisted of an echo of the "exit" command, which was recv'd by the wrong listener who treated it as a properly sent cmd....and exited.

This commit was SVN r12243.
2006-10-21 02:53:19 +00:00
Ralph Castain
13227e36ab This commit looks a lot bigger than it is, so relax :-)
Fix the problem observed by multiple people that comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off.

To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place.

I used those capabilities in two places:

1. Added an attribute list to the rmgr.spawn interface.

2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h).

So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms.

This commit was SVN r12138.
2006-10-17 16:06:17 +00:00