Commit graph

396 commits

Author SHA1 Message Date
Ralph Castain
f219cc1e6e A few changes to the lsf components - mostly cleanup, no major logic changes
This commit was SVN r15563.
2007-07-23 18:38:36 +00:00
Ralph Castain
ef141d1fbc Ensure daemons know contact info for all other daemons. Update binomial xcast to work in revised design. Add debug output to orted so the daemon lets us know it launched (if --debug-daemons set) early on in case it fails during orte_init
This commit was SVN r15555.
2007-07-23 15:00:39 +00:00
Jeff Squyres
78d214fec8 Oops -- didn't mean to commit the test program...
This commit was SVN r15538.
2007-07-20 20:15:51 +00:00
Jeff Squyres
2baa866026 Compiles to the new API, but doesn't quite work yet...
This commit was SVN r15537.
2007-07-20 19:49:27 +00:00
Brian Barrett
5b9fa7e998 reapply r15517 and r15520, which were removed in r15527 so that I could get
the RML/OOB merge in slightly easier

This commit was SVN r15530.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
  r15520 --> open-mpi/ompi@9cbc9df1b8
  r15527 --> open-mpi/ompi@2d17dd9516
2007-07-20 02:34:29 +00:00
Brian Barrett
39a6057fc6 A number of improvements / changes to the RML/OOB layers:
  * General TCP cleanup for OPAL / ORTE
  * Simplifying the OOB by moving much of the logic into the RML
  * Allowing the OOB RML component to do routing of messages
  * Adding a component framework for handling routing tables
  * Moving the xcast functionality from the OOB base to its own framework

Includes merge from tmp/bwb-oob-rml-merge revisions:

    r15506, r15507, r15508, r15510, r15511, r15512, r15513

This commit was SVN r15528.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15506
  r15507
  r15508
  r15510
  r15511
  r15512
  r15513
2007-07-20 01:34:02 +00:00
Brian Barrett
2d17dd9516 Temporarily back out r15517 and r15520 so that I can get the RML / OOB changes
to apply cleanly

This commit was SVN r15527.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
2007-07-20 01:10:34 +00:00
Ralph Castain
41977fcc95 Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but...
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.

Note that I could not possibly find all outputs that directly print the orte_process_name_t fields; I only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.

This commit was SVN r15517.
2007-07-19 20:56:46 +00:00
Tim Prins
e41f86dfe6 add a small amount of debugging output
This commit was SVN r15483.
2007-07-18 15:20:55 +00:00
Jeff Squyres
b20248709a Next round of LSF commits. Getting farther, but it still doesn't
fully work yet (everything is still .ompi_ignore'ed for everyone).

This commit was SVN r15398.
2007-07-13 11:57:17 +00:00
George Bosilca
52eebd706f Update the xgrid PLS to fit the current interface of the PLS.
This commit was SVN r15396.
2007-07-13 06:18:16 +00:00
Ralph Castain
bd65f8ba88 Bring in an updated launch system for the orteds. This commit restores the ability to execute singletons and singleton comm_spawn, both in single node and multi-node environments.
Short description: major changes include -

1. singletons now fork/exec a local daemon to manage their operations.

2. the orte daemon code now resides in libopen-rte

3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided.

I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine.

This commit was SVN r15390.
2007-07-12 19:53:18 +00:00
Jeff Squyres
aa2c64d66d It compiles! That's a start... :-)
This commit was SVN r15382.
2007-07-12 14:41:09 +00:00
Jeff Squyres
e51bb19fab Fix some include files
This commit was SVN r15381.
2007-07-12 14:22:47 +00:00
Ralph Castain
a1bf04f39e First cut at revamping bproc support to separate it out from LANL's configuration.
First cut at adding support for LSF

Lots of ompi_ignores so only Jeff and I will see this stuff

This commit was SVN r15321.
2007-07-10 12:43:05 +00:00
Ralph Castain
684aa1bc9f Since universe size now is an orte thing, we may as well give it some direct support. Create rmgr set/get functions so it becomes more obvious where this value is being defined and how to retrieve it. Modify the bproc pls to pass it to the app procs when launched. Modify one of the test programs to verify it has been correctly set.
This commit was SVN r15266.
2007-07-02 16:45:40 +00:00
Josh Hursey
f88aa6c273 This commit cleans up the AMCA parameter implementation a bit.
* Remove the 'opal_mca_base_param_use_amca_sets' global variable
* Harness the fact that you can (read: should) call the cmd_line functions
  before initializing opal_init_util(). This pushes the MCA/GMCA/AMCA
  command line options into the environment before OPAL inits and starts
  to use these values. By putting the cmd_line parse before opal_init_util
  in orterun and orted we only parse the *MCA parameter files once, and 
  correctly (alleviating the need to 'recache' the files on init.)
* Small bits of cleanup.

This commit was SVN r15219.
2007-06-27 01:03:31 +00:00
Sven Stork
0edcf1d47e - export required symbol
This commit was SVN r15190.
2007-06-25 14:27:04 +00:00
Jeff Squyres
bd56dc7e5d Fixes trac:1060
Per suggestion, if we don't find a valid shell via getpwuid(), also
check the $SHELL environment variable.  Also perform a few minor
cleanups along the way.

This commit was SVN r15156.

The following Trac tickets were found above:
  Ticket 1060 --> https://svn.open-mpi.org/trac/ompi/ticket/1060
2007-06-21 11:40:42 +00:00
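
The fallback described in bd56dc7e5d (consult the password database first, then the $SHELL environment variable) can be sketched roughly as follows. This is an illustration in Python, not the Open MPI code itself, which is written in C; `find_login_shell` and its parameters are hypothetical names introduced for the example, and the sketch assumes a Unix-like system where the `pwd` module is available.

```python
import os
import pwd


def find_login_shell(environ=None, lookup=pwd.getpwuid, uid=None):
    """Return the user's shell, or None if it cannot be determined.

    Hypothetical helper: tries the password database first and falls
    back to the SHELL environment variable, mirroring the order the
    commit message describes.
    """
    environ = os.environ if environ is None else environ
    uid = os.getuid() if uid is None else uid
    try:
        # pwd.getpwuid raises KeyError when the uid has no entry.
        shell = lookup(uid).pw_shell
    except KeyError:
        shell = ""
    if shell:
        return shell
    # Fallback: use $SHELL if it is set and non-empty.
    return environ.get("SHELL") or None
```

The `environ` and `lookup` parameters exist only so the fallback path can be exercised without changing the real environment or password database.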
George Bosilca
99e701062a The Windows job scheduler PLS. Initial commit as I have to move to
another Windows cluster. Right now it's not in a usable state.

This commit was SVN r15113.
2007-06-17 04:54:07 +00:00
Ralph Castain
fde15ac97d Bring the TM launcher online
This commit was SVN r15076.
2007-06-14 12:33:34 +00:00
George Bosilca
8dfa06a617 Only output when the user requests it.
This commit was SVN r15067.
2007-06-14 04:33:18 +00:00
Pak Lui
de0f1eef89 No major changes here. Just updates to remove unused code and comments.
This commit was SVN r15051.
2007-06-13 17:23:03 +00:00
Pak Lui
03a93a38c5 Added an option for daemonizing orted. The existing --no-daemonize
behavior for gridengine is unchanged.

This commit was SVN r15050.
2007-06-13 17:11:37 +00:00
George Bosilca
18c2bb0ed6 Don't forget to set the name argument before spawning the daemon.
This commit was SVN r15047.
2007-06-13 15:45:34 +00:00
Pak Lui
8e7daea11f Bring more changes in line with r15007.
This commit was SVN r15044.

The following SVN revision numbers were found above:
  r15007 --> open-mpi/ompi@85df3bd92f
2007-06-13 15:30:18 +00:00
Ralph Castain
425fed95ff Bring the SGE component online
This commit was SVN r15043.
2007-06-13 15:02:47 +00:00
George Bosilca
9d342ccb61 Shorter warning message.
This commit was SVN r15031.
2007-06-12 23:22:09 +00:00
George Bosilca
715f6012cf The DSS pack function can use the const attribute for the src field
as it is never modified by the pack functions directly. Enforce it
all over the code base.

This commit was SVN r15026.
2007-06-12 22:47:14 +00:00
George Bosilca
432185d617 Forgot to remove the MCA parameter corresponding to the 2 unused
fields in the RSH PLS component.

This commit was SVN r15023.
2007-06-12 22:41:38 +00:00
George Bosilca
49e7bf3069 Be a little bit more clear when we fail to identify the shell.
This commit was SVN r15022.
2007-06-12 22:40:44 +00:00
George Bosilca
5b7796dfcd Remove 2 unused fields.
This commit was SVN r15021.
2007-06-12 22:39:57 +00:00
George Bosilca
bf6f30a42c Make the Windows PLS component match the current requirements for
a PLS module.

This commit was SVN r15019.
2007-06-12 22:34:56 +00:00
Ralph Castain
af64009368 Bring the CNOS component of the PLS back online
This commit was SVN r15018.
2007-06-12 22:17:05 +00:00
Ralph Castain
4e8081ed1e Cleanup a now unnecessary variable
This commit was SVN r15010.
2007-06-12 14:23:33 +00:00
Ralph Castain
85df3bd92f Bring in the generalized xcast communication system along with the correspondingly revised orted launch. I will send a message out to developers explaining the basic changes. In brief:
1. generalize orte_rml.xcast to become a general broadcast-like messaging system. Messages can now be sent to any tag on the daemons or processes. Note that any message sent via xcast will be delivered to ALL processes in the specified job - you don't get to pick and choose. At a later date, we will introduce an augmented capability that will use the daemons as relays, but will allow you to send to a specified array of process names.

2. extended orte_rml.xcast so it supports more scalable message routing methodologies. At the moment, we support three: (a) direct, which sends the message directly to all recipients; (b) linear, which sends the message to the local daemon on each node, which then relays it to its own local procs; and (c) binomial, which sends the message via a binomial algo across all the daemons, each of which then relays to its own local procs. The crossover points between the algos are adjustable via MCA param, or you can simply demand that a specific algo be used.

3. orteds no longer exhibit two types of behavior: bootproxy or VM. Orteds now always behave like they are part of a virtual machine - they simply launch a job if mpirun tells them to do so. This is another step towards creating an "orteboot" functionality, but also provided a clean system for supporting message relaying.

Note one major impact of this commit: multiple daemons on a node cannot be supported any longer! Only a single daemon/node is now allowed.

This commit is known to break support for the following environments: POE, Xgrid, Xcpu, Windows. It has been tested on rsh, SLURM, and Bproc. Modifications for TM support have been made but could not be verified due to machine problems at LANL. Modifications for SGE have been made but could not be verified. The developers for the non-verified environments will be separately notified along with suggestions on how to fix the problems.

This commit was SVN r15007.
2007-06-12 13:28:54 +00:00
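
As a rough illustration of the binomial relay mentioned in 85df3bd92f, the sketch below computes which daemons each daemon would forward a message to in a standard binomial tree. It assumes daemons are numbered 0..N-1 with daemon 0 as the origin; `binomial_children` and `simulate_xcast` are hypothetical names, and the tree that orte_rml.xcast actually builds may differ.

```python
def binomial_children(rank, size):
    """Return the ranks that `rank` relays to in a binomial tree rooted at 0."""
    children = []
    step = 1
    # Skip powers of two that do not exceed this rank: those links
    # belong to the path from the root down to this rank.
    while step <= rank:
        step <<= 1
    # Every remaining power of two names one child, if it exists.
    while rank + step < size:
        children.append(rank + step)
        step <<= 1
    return children


def simulate_xcast(size):
    """Relay from daemon 0 and return {daemon: hops from the origin}."""
    hops = {0: 0}
    pending = [0]
    while pending:
        daemon = pending.pop()
        for child in binomial_children(daemon, size):
            hops[child] = hops[daemon] + 1
            pending.append(child)
    return hops
```

With 8 daemons, daemon 0 relays to 1, 2, and 4, and every daemon is reached within 3 hops, compared with 7 separate sends from the origin in the direct scheme.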
George Bosilca
3b7f3e5565 Keep the unknown shell string.
This commit was SVN r14929.
2007-06-06 20:24:42 +00:00
Brian Barrett
508da4e959 OS X apparently really doesn't like shared libraries with unresolvable
symbols in them and environ is defined only in the final application
(probably in crt1.o).  Apple provides a function for getting at the
environment, so use that instead if it's available.

This commit was SVN r14857.
2007-06-05 03:03:59 +00:00
Ralph Castain
b771cfcce3 Fix compile problem
This commit was SVN r14713.
2007-05-21 20:11:03 +00:00
Galen Shipman
542937ee2f ompi running on bproc again.
This commit was SVN r14675.
2007-05-16 19:55:43 +00:00
Galen Shipman
df86202202 get bproc to compile, other issues still remain..
This commit was SVN r14661.
2007-05-15 23:11:33 +00:00
Ralph Castain
ad541e163e Fix compiler warning
This commit was SVN r14605.
2007-05-08 13:21:18 +00:00
Sven Stork
a04c8eb39a - Bring over the visibility feature, for a finer symbol export control
via the visibility feature that is provided by some compilers.

  This feature is disabled by default. To enable it, configure with
  --enable-visibility; you also need a compiler with visibility
  support. Please refer to the wiki for more information.
  https://svn.open-mpi.org/trac/ompi/wiki/Visibility

This commit was SVN r14582.
2007-05-04 09:03:37 +00:00
Ralph Castain
2683c85085 Update the TM launcher so it provides an appropriate error message when encountering an invalid launch_id. This is a first step towards fixing ticket #1016, but needs to be followed by a more complete solution.
This commit was SVN r14578.
2007-05-03 20:14:24 +00:00
Shiqing Fan
c166e3d02c Too few arguments for call, fixed according to the corresponding definition.
This commit was SVN r14538.
2007-04-27 13:14:43 +00:00
Ralph Castain
7d6d0a1c00 Update reuse_daemons to find the daemons again - requires that orteds now report their nodenames (probably temporary patch pending upcoming minor revision of orted)
This commit was SVN r14533.
2007-04-26 15:09:54 +00:00
Ralph Castain
c733a7916b Update the gridengine pls to handle failed-to-start. Fix a few places where the fork'd child incorrectly called "return" instead of "exit" (undoubtedly copied from the same error in the old rsh pls).
This commit was SVN r14532.
2007-04-26 15:08:37 +00:00
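
The return-versus-exit issue noted in c733a7916b is a general hazard of fork-based launchers: a child that returns keeps executing the parent's code path as a second copy of the parent. The sketch below shows the safe pattern in Python rather than the C of the actual component; `run_in_child` is a hypothetical name, and the example assumes a Unix-like system where `os.fork` is available.

```python
import os


def run_in_child(task):
    """Run `task` in a forked child and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: always leave via os._exit(). Returning here would let
        # the child fall back into the caller's code and run it twice.
        code = 1
        try:
            code = int(task())
        except BaseException:
            code = 1
        finally:
            os._exit(code)
    # Parent: wait for the child and report how it finished.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1
```

A failed start then shows up in the parent as a nonzero exit code instead of as a stray duplicate process.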
Ralph Castain
bca2de3a57 Complete the update of the rsh pls to handle failed-to-start
This commit was SVN r14531.
2007-04-26 15:07:40 +00:00
Ralph Castain
8517a5a3a6 cleanup a few compiler warnings
This commit was SVN r14507.
2007-04-25 11:51:18 +00:00
Jeff Squyres
321e08c605 Add some missing header files
This commit was SVN r14500.
2007-04-24 21:39:12 +00:00