11 commits

Author SHA1 Message Date
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Josh Hursey
92429dc90f Fix for a problem Edgar and Jeff identified WRT PLS determining if we are
oversubscribed on a node, and thus whether to call sched_yield or not.

The value of node->node_slots_inuse does not currently represent the number of
slots actually in use. This is actually a bug in the RAS/RMAPS
base components, but the fix for that specific bug is bigger than we want to
address at the moment (but will certainly do so in the near future).

Since we cannot trust this value, use the total number of mapped processes
from the process mapping (which was properly set by the RMAPS component upon
mapping, just not properly propagated back to the registry's node segment).

In addition to this change, I cleaned up a couple of the debug messages. It
seems that TM and RSH are the only two directly affected by this. SLURM
would be if that section of code wasn't currently inactive, but I put the fix
in for posterity.

This commit was SVN r7743.
2005-10-13 03:26:48 +00:00
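
The oversubscription check described in 92429dc90f can be sketched roughly as follows. This is an illustration in Python, not the Open MPI C code: the `Node` class, the list-based mapping, and the "more mapped processes than slots" rule are assumptions made for the example. The point is only that the decision is derived from the process mapping rather than from the untrusted in-use counter.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a node entry; not the real ORTE structure."""
    name: str
    slots: int            # slots the node advertises
    slots_inuse: int = 0  # registry counter; the commit says it cannot be trusted yet


def count_mapped_procs(mapping, node_name):
    """Count how many processes the mapper placed on node_name.

    mapping is assumed to be a list with one node name per mapped process.
    """
    return sum(1 for proc_node in mapping if proc_node == node_name)


def is_oversubscribed(node, mapping):
    """Decide from the process mapping, ignoring node.slots_inuse entirely."""
    return count_mapped_procs(mapping, node.name) > node.slots


if __name__ == "__main__":
    node = Node(name="n0", slots=2)
    mapping = ["n0", "n0", "n0", "n1"]  # three processes mapped to n0
    # An oversubscribed node would make its processes yield the CPU while waiting.
    print(is_oversubscribed(node, mapping))
```
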
Jeff Squyres
0629cdc2d7 Bring back the changes from /tmp/jjhursey-rmaps. Specific merge
command:

svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps .

(where "." is a trunk checkout)

The logs from this branch are much more descriptive than I will put
here (including a *really* long description from last night).  Here's
the short version:

- fixed some broken implementations in ras and rmaps
- "orterun --host ..." now works and has clearly defined semantics
  (this was the impetus for the branch and all these fixes -- LANL had
  a requirement for --host to work for 1.0)
- there is still a little bit of cleanup left to do post-1.0 (we got
  correct functionality for 1.0 -- we did not fix bad implementations
  that still "work")
  - rds/hostfile and ras/hostfile handshaking
  - singleton node segment assignments in stage1
  - remove the default hostfile (no need for it anymore with the
    localhost ras component)
  - clean up pls components to avoid duplicate ras mapping queries
  - [possible] -bynode/-byslot being specific to a single app context 

This commit was SVN r7664.
2005-10-07 22:24:52 +00:00
Jeff Squyres
d44fc0fa2a - Clarify the help file text a little
- Remove an extraneous \n in opal_output() output

This commit was SVN r7581.
2005-10-02 11:58:51 +00:00
Jeff Squyres
0459678f82 Fixes to make the SLURM pls handle --prefix properly
This commit was SVN r7569.
2005-09-30 21:44:05 +00:00
Josh Hursey
a5e5924217 Added a custom arguments MCA param for Slurm PLS.
This allows the user to specify certain options to srun when an application
is launched with this PLS.

A useful example is the need to set how long to wait between when the first
process completes and when SLURM kills the remaining processes:

  pls_slurm_args=--wait=1200

This commit was SVN r7206.
2005-09-06 21:52:28 +00:00
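
One way to picture what a5e5924217 enables: the value of the parameter is split into separate arguments and added to the srun command line. The sketch below is a Python illustration only. `build_srun_argv`, its parameters, and the placement of the custom arguments before the launched command are assumptions, not the component's actual code.

```python
import shlex


def build_srun_argv(srun_options, custom_args, command):
    """Assemble an srun command line (hypothetical helper).

    srun_options: options the launcher itself computed, e.g. ["--nodes=2"]
    custom_args:  user-supplied string such as "--wait=1200", or None
    command:      the program srun should start, with its arguments
    """
    argv = ["srun"]
    argv.extend(srun_options)
    if custom_args:
        # shlex.split honors shell-style quoting, so one string can
        # carry several options.
        argv.extend(shlex.split(custom_args))
    argv.extend(command)
    return argv


if __name__ == "__main__":
    # "orted" is used here only as a placeholder command name.
    print(build_srun_argv(["--nodes=2"], "--wait=1200", ["orted"]))
```
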
Brian Barrett
fc71fd5744 * Fix a place where Jeff changed an exit to a return and we really wanted
  it to be an exit.
* Put the srun process (or what is about to become the srun process) in
  its own process group so that group-wide signals (such as the
  SIGINT sent by hitting Ctrl-C in a shell) are not sent to the srun
  process.

This commit was SVN r7068.
2005-08-27 17:08:48 +00:00
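
The process-group change in fc71fd5744 can be illustrated with a small POSIX-only sketch. It is written in Python for brevity and is not the component's C code; `launch_in_own_process_group` is a hypothetical helper. The child calls `setpgid(0, 0)` before exec, so it becomes the leader of a new process group and no longer receives signals delivered to the parent's group.

```python
import os
import subprocess
import sys


def launch_in_own_process_group(argv):
    """Start argv as a child that leads its own process group (POSIX only).

    preexec_fn runs in the child between fork and exec; setpgid(0, 0)
    moves the calling process into a new group whose ID equals its PID.
    The Python docs caution against preexec_fn in threaded programs.
    """
    return subprocess.Popen(argv, preexec_fn=lambda: os.setpgid(0, 0))


if __name__ == "__main__":
    # A short-lived Python child stands in for the srun process.
    child = launch_in_own_process_group(
        [sys.executable, "-c", "import time; time.sleep(1)"]
    )
    print("parent group:", os.getpgrp())
    print("child group: ", os.getpgid(child.pid))
    child.wait()
```

A signal sent to the parent's group, such as the SIGINT a shell sends on Ctrl-C, would then reach the parent but not this child; the parent has to forward or handle termination itself.
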
Brian Barrett
3e8740e740 * Mostly working SLURM component. Had to add an sds for the daemons so that
  we could vector launch the daemons and still have the nodenames fixed
  up in the end

This commit was SVN r7041.
2005-08-25 22:29:23 +00:00
Jeff Squyres
524ded4896 A little cleanup and progress:
- build a proper srun argv
- launch the srun
- still have several "JMS" comments that need to be addressed

This commit was SVN r7036.
2005-08-25 16:38:42 +00:00
Jeff Squyres
9755a7f7fa First cut -- not working yet -- checkpointing to move to another
machine.

This commit was SVN r7018.
2005-08-24 22:19:48 +00:00
Jeff Squyres
1b18979f79 Initial population of orte tree
This commit was SVN r6266.
2005-07-02 13:42:54 +00:00