
23 Commits

Author SHA1 Message Date
Ralph Castain
fdfe457578 Bring in the remote launch changes. This still isn't fully functional, but impacted a few other places that were worth fixing.
1. Added a new function to launch head node processes on remote nodes.

2. Added new tool "orteprobe" that checks to see if a daemon is running on a node. If so, it reports the contact info back to the requestor. If not, it will (eventually - but not now) fork/exec a daemon on the node, report the contact info back to requestor, and then die.

3. Modified orted to handle universe name parameters, and added separate command line flags for debugging the daemon and for saving daemon debugging output to a file. The "debug" flag now turns on the runtime debug info instead of the daemon debug info; thus, you can now get just the daemon debug info if you like.

4. Fix the dps to handle zero length strings correctly.

5. Modify the fork and rsh launchers to pass the required environment variables to the daemons and processes.

6. Pulled the redirection of stdin/stdout/stderr for the daemon out of orted and put it into the daemon_init function to simplify orted logic.

7. Modified sys_info to correctly handle a passed MCA parameter.

8. Modified univ_info to parse incoming universe location information.

This commit was SVN r5705.
2005-05-12 21:44:23 +00:00
Jeff Squyres
f5657fb8ee For the rsh pls, if the launch is on the local node, just exec it --
don't bother using the launching agent (typically rsh or ssh).

This commit was SVN r5702.
2005-05-12 19:12:53 +00:00
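A minimal C sketch of the local-node shortcut described in this commit. Everything here is hypothetical: `sketch_build_launch_argv` is not the rsh pls's actual code, and it assumes that comparing the target node name with the local host name via `strcmp` is enough to detect a local launch. It only builds the argument vector; a caller would then hand it to `execvp`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: build the argv used to launch cmd[0..nargs-1] on target.
 * If target equals local_host, the command is run directly (no agent).
 * Otherwise it is prefixed with the launching agent (e.g. "ssh") and the
 * target host name. argv must have room for nargs + 3 entries.
 * Returns the number of arguments written (argv is NULL-terminated). */
static int sketch_build_launch_argv(const char *target, const char *local_host,
                                    const char *agent, const char *const cmd[],
                                    int nargs, const char **argv)
{
    int argc = 0;

    assert(target != NULL && local_host != NULL && agent != NULL);
    assert(cmd != NULL && argv != NULL && nargs >= 0);

    /* Remote node: go through the launching agent. */
    if (strcmp(target, local_host) != 0) {
        argv[argc++] = agent;
        argv[argc++] = target;
    }

    /* Local or remote, the command itself follows. */
    for (int i = 0; i < nargs; ++i) {
        argv[argc++] = cmd[i];
    }
    argv[argc] = NULL;
    return argc;
}
```

A real launcher would also have to treat aliases, such as a fully qualified domain name versus a short host name, as the same node; this sketch does not.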
Jeff Squyres
a28b5ae43b Fix for a bunch of size_t issues; reviewed by George and Ralph.
- Change all uses of *printf'ing a size_t to use an explicit cast to
  (unsigned long) and the %lu escape
- change ORTE_GPR_REPLICA_MAX_SIZE to INT_MAX until bug 1345 is fixed
  (i.e., until we allow size_t in MCA params)
- ns_base_local_fns.c:orte_ns_base_get_proc_name_string(): changed
  from %0X -> %lu
- ORTE_NAME_ARGS added explicit (unsigned long) casts, and changed all
  usages of ORTE_NAME_ARGS to use %lu's

This commit was SVN r5644.
2005-05-08 13:22:55 +00:00
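The printf convention from this commit can be illustrated with a small C sketch. `sketch_format_size` is a hypothetical helper, not a function from the tree; it only shows the explicit `(unsigned long)` cast paired with `%lu`, which is well-defined even under C89, where no size_t conversion specifier exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a size_t into buf using the cast-and-%lu idiom.
 * The cast makes the argument type match the %lu specifier exactly, so the
 * call is correct regardless of which unsigned type size_t is on a platform.
 * Returns the value snprintf returns (characters that would be written). */
static int sketch_format_size(char *buf, size_t buflen, size_t value)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%lu", (unsigned long) value);
}
```

C99 later added `%zu` for size_t. The cast idiom stays portable to older compilers, but it can truncate values on platforms where `unsigned long` is narrower than `size_t` (64-bit Windows, for example).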
Ralph Castain
659d57f300 Several things in this commit - shouldn't impact any existing work:
1. Added pid_t to the dps

2. Processes now "register" their local pid and update their location (i.e., nodename) on the registry during mpi_init

3. Added a new error code for values that exceed maximum for their data type (useful when transitioning a value from one variable to another of different size)

4. Fixed a few places where size_t was being incorrectly handled

5. Updated dps_test to cover pid_t types

This should now provide support for TotalView connection - which David is pursuing.

This commit was SVN r5623.
2005-05-06 17:00:06 +00:00
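Item 3 of this commit adds an error code for values that do not fit their destination type. A minimal C sketch of that kind of checked narrowing, assuming hypothetical `SKETCH_SUCCESS` and `SKETCH_ERR_VALUE_OUT_OF_BOUNDS` codes (the actual ORTE constant names are not given in this log):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical return codes standing in for the real ORTE error constants. */
#define SKETCH_SUCCESS                  0
#define SKETCH_ERR_VALUE_OUT_OF_BOUNDS (-1)

/* Copy a size_t into an int only if it fits. On overflow, *out is left
 * untouched and an error code is returned instead of silently truncating. */
static int sketch_size_to_int(size_t in, int *out)
{
    assert(out != NULL);
    if (in > (size_t) INT_MAX) {
        return SKETCH_ERR_VALUE_OUT_OF_BOUNDS;
    }
    *out = (int) in;
    return SKETCH_SUCCESS;
}
```

Leaving the destination unchanged on failure lets the caller decide how to react, which matters when a value moves from a wider variable (such as a size_t count) into a narrower one (such as an int MCA parameter).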
George Bosilca
9387013ca2 If it's an unsigned long, then the format string should be %lu.
This commit was SVN r5614.
2005-05-05 22:58:47 +00:00
Jeff Squyres
462adee81a Commit 1 of 4 to bring in the hetero branch to the trunk. Merged in
from:

svn merge -r5440:5448 https://svn.open-mpi.org/svn/ompi/tmp/hetero .

This commit was SVN r5549.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r5440
  r5448
2005-05-01 00:47:35 +00:00
Tim Woodall
ee52046876 make sure setup is complete before waiting on child
This commit was SVN r5476.
2005-04-20 15:59:22 +00:00
Tim Woodall
99ca522d39 debug option to not execute ompid, but just print out the command line, to enable
debugging of the daemon

This commit was SVN r5473.
2005-04-20 15:43:55 +00:00
Jeff Squyres
f9ef7d4657 Make the pls's clean up the session directory of each process that dies.
This commit was SVN r5403.
2005-04-15 21:34:07 +00:00
Tim Woodall
a831729d6f split close into finalize/close so that rmgr can finalize all
sub-components prior to entering close. moved pls logic to
wait on children from close to finalize.

This commit was SVN r5392.
2005-04-15 17:04:57 +00:00
Brian Barrett
ce07c10a7e * fix error message formatting. resolves bug #1290.
This commit was SVN r5284.
2005-04-12 22:13:55 +00:00
Tim Woodall
53d51363ba move recv_cancel from close to finalize
This commit was SVN r5117.
2005-03-31 22:37:46 +00:00
Tim Woodall
81e9377c87 use nonblocking send/recv to send terminate message to the daemons
as they may have already exited

This commit was SVN r5111.
2005-03-31 18:53:35 +00:00
Tim Woodall
447d370905 - added proxy resource manager which is loaded when not the seed
- added support to pls fork/rsh modules for terminate_job

This commit was SVN r5110.
2005-03-31 15:47:37 +00:00
Brian Barrett
5753c6a47f Only set the state of the processes the daemon was responsible for to
ABORTED if the ssh that started the daemon exited abnormally.  Otherwise,
bad things happen if all the processes on that node exit before the
processes on other nodes.

This patch is bigger than it should be because I had to indent a bunch of code 
when I moved the if statement.

This commit was SVN r5107.
2005-03-31 04:23:55 +00:00
Tim Woodall
72b3a823c3 need to use integer when passing jobid on command line
This commit was SVN r5095.
2005-03-29 19:41:29 +00:00
Brian Barrett
cdbf179d40 * add header files that "go missing" if compiling with optimizations
* Fix one file that didn't have a comment header

This commit was SVN r5085.
2005-03-29 13:50:15 +00:00
Jeff Squyres
ffc75a623f Remove redundant declaration of orte_soh and move it into
src/mca/soh/base/base.h (similar to most other frameworks).

This commit was SVN r5073.
2005-03-28 20:54:45 +00:00
Tim Woodall
805095986c - mods to support daemon command line parameters
- check return value correctly when posting non-blocking recvs
- use any values that have been set in the global structs as the
  defaults when registering mca parameters - this prevents any
  values that have been set in the structs from the command line
  parser from being overwritten

This commit was SVN r5011.
2005-03-24 15:45:44 +00:00
Jeff Squyres
3f5541349a Add UC copyright
This commit was SVN r5009.
2005-03-24 12:43:37 +00:00
Brian Barrett
30af9a7b90 * More changes from the tim branch. Still has problems with ABORTed procs,
  but now tells you when it can't find orted.  Also includes memory leak
  plugs, bproc fixes, and gm repairs.

This commit was SVN r4937.
2005-03-18 23:58:36 +00:00
Brian Barrett
77c65d69cc * Merge changes from tim branch from r4821 to r4892. Tree can now run
  MPI and non-ORTE applications for RSH on one node with or without
  threads.  I think we're approaching convergence with the tim branch

This commit was SVN r4895.
2005-03-18 03:43:59 +00:00
Brian Barrett
6822a519bb * results from initial merge of the tim branch into the trunk. Compiles and
  ompi_info works, but that's all that has been tested.

This commit was SVN r4827.
2005-03-14 20:57:21 +00:00