1
1
Граф коммитов

16 Коммитов

Автор SHA1 Сообщение Дата
Tim Woodall
142b7cc682 merge from release branch
This commit was SVN r8167.
2005-11-16 17:10:49 +00:00
Tim Woodall
3699c924bd callback for init prior to launch - allow app to hookup stdout/stderr
prior to launch

This commit was SVN r8083.
2005-11-10 04:47:41 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Tim Woodall
7300112564 removed debug
This commit was SVN r7896.
2005-10-27 17:08:30 +00:00
Tim Woodall
60754acae8 - modified rmaps data structures to point directly to ras node
- modified rsh to NOT query for each nodes mapping, as all data is
  already available in the rmaps structures

This commit was SVN r7894.
2005-10-27 17:04:10 +00:00
Jeff Squyres
0629cdc2d7 Bring back the changes from /tmp/jjhursey-rmaps. Specific merge
command:

svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps .

(where "." is a trunk checkout)

The logs from this branch are much more descriptive than I will put
here (including a *really* long description from last night).  Here's
the short version:

- fixed some broken implementations in ras and rmaps
- "orterun --host ..." now works and has clearly defined semantics
  (this was the impetus for the branch and all these fixes -- LANL had
  a requirement for --host to work for 1.0)
- there is still a little bit of cleanup left to do post-1.0 (we got
  correct functionality for 1.0 -- we did not fix bad implementations
  that still "work")
  - rds/hostfile and ras/hostfile handshaking
  - singleton node segment assignments in stage1
  - remove the default hostfile (no need for it anymore with the
    localhost ras component)
  - clean up pls components to avoid duplicate ras mapping queries
  - [possible] -bynode/-byslot being specific to a single app context 

This commit was SVN r7664.
2005-10-07 22:24:52 +00:00
Jeff Squyres
eb24fe4fd8 If the job fails to launch properly, set its state to ABORTED, which
will fire some subscriptions that will eventually result in invoking
terminate_job (i.e., terminate anything that may have been
successfully started by launch).

This commit was SVN r7622.
2005-10-04 17:19:23 +00:00
Andrew Friedley
555ae37255 Add lib{opal,orte,mpi}.la to appropriate LIBADD's, some whitespace cleanup as well.
This commit was SVN r7477.
2005-09-22 12:28:54 +00:00
Ralph Castain
2656ec93b5 Fix a typo so that stage_gate_2 gets correctly passed back to orterun...
This commit was SVN r7446.
2005-09-20 19:12:59 +00:00
Ralph Castain
b589a93e29 Continue to lace the trace functionality into orte...
This commit was SVN r7427.
2005-09-19 15:29:14 +00:00
Josh Hursey
8bf587475b Added a flag to orte_rmgr_base_proc_stage_gate_subscribe() allowing the
caller to specify a subset of the state variables that it can can subscribe to.
This is specified with one of three special flags defined in rmgr/rmgr_types.h

This is useful when we only care about a subset of the state changes, such as
in orted which only needs to know when a job has terminated or aborted.

This commit was SVN r7356.
2005-09-13 21:14:34 +00:00
Brian Barrett
ed56e743b7 * update configure.ac to use the modern version of AC_INIT and
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
  number to be set at autoconf time (instead of at configure time, as
  it was before).  Set the version number, minus the subversion r number,
  at autoconf time.  Override the internal variables to include the r
  number (if needed) at configure time.  Basically, the right thing
  should always happen.  The only place it might not is the version
  reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
  them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
  in the directory containing source files, even if the Makefile.am is
  in another directory.  This should start making it feasible to
  reduce the number of Makefile.am files we have in the tree, which
  will greatly reduce the time to run autogen and configure.

This commit was SVN r7211.
2005-09-07 05:54:53 +00:00
Ralph Castain
19d58ee17e First phase of the scalable RTE changes:
1. Modify the registry to eliminate redundant data copying for startup messages.

2. Revise the subscription/trigger system to avoid redundant storage of triggers and subscriptions. This dramatically reduces the search time when a registry action occurs - to illustrate the point, there are now only a handful of triggers on the system for each job. Before, there were a handful of triggers for each PROCESS in the job, all of which had to be checked every time something happened on the registry. This is much, much faster now.

3. Update all subscriptions to the new format. There are now "named" subscriptions - this allows you to "name" a subscription that all the processes will be using. The first one to hit the registry actually defines the subscription. From then on, any subsequent "subscribes" to the same name just cause that process to "attach" to the existing subscription. This keeps the number of subscriptions being tracked by the registry to a minimum, while ensuring that each process still gets notified.

4. Do the same for triggers.

Also fixed a duplicate subscription problem that was causing people to receive data equal to the number of processes times the data they should have received from a trigger/subscription. Sorry about that... :-( ...but it's all better now!

Uncovered a situation where the modex data seems to be getting entered on the registry a second time - the latter time coming after the compound command has been "fired", thereby causing all the subscriptions to fire. Asked Tim and Jeff to look into this.

Second phase of the changes will involve modifying the xcast system so that the same message gets sent to all processes. This will further reduce the message traffic, and - once we have a true "broadcast" version of xcast - really speed things up and improve scalability.

This commit was SVN r6542.
2005-07-18 18:49:00 +00:00
Jeff Squyres
ba99409628 Major simplifications to component versioning:
- After long discussions and ruminations on how we run components in
  LAM/MPI, made the decision that, by default, all components included
  in Open MPI will use the version number of their parent project
  (i.e., OMPI or ORTE).  They are certaint free to use a different
  number, but this simplification makes the common cases easy:
  - components are only released when the parent project is released
  - it is easy (trivial?) to distinguish which version component goes
    with with version of the parent project
- removed all autogen/configure code for templating the version .h
  file in components
- made all ORTE components use ORTE_*_VERSION for version numbers
- made all OMPI components use OMPI_*_VERSION for version numbers
- removed all VERSION files from components
- configure now displays OPAL, ORTE, and OMPI version numbers
- ditto for ompi_info
- right now, faking it -- OPAL and ORTE and OMPI will always have the
  same version number (i.e., they all come from the same top-level
  VERSION file).  But this paves the way for the Great Configure
  Reorganization, where, among other things, each project will have
  its own version number.

So all in all, we went from a boatload of version numbers to
[effectively] three.  That's pretty good.  :-)

This commit was SVN r6344.
2005-07-04 20:12:36 +00:00
Brian Barrett
a13166b500 * rename ompi_output to opal_output
This commit was SVN r6329.
2005-07-03 23:31:27 +00:00
Jeff Squyres
a314578d94 Oops -- rmgr should be in orte, not ompi.
This commit was SVN r6274.
2005-07-02 14:14:42 +00:00