1
1
Граф коммитов

4 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
bd65f8ba88 Bring in an updated launch system for the orteds. This commit restores the ability to execute singletons and singleton comm_spawn, both in single node and multi-node environments.
Short description: major changes include -

1. singletons now fork/exec a local daemon to manage their operations.

2. the orte daemon code now resides in libopen-rte

3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided.

I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine.

This commit was SVN r15390.
2007-07-12 19:53:18 +00:00
Ralph Castain
ef71055cf8 Teach the odls to properly test for and report failed-to-start for application processes.
Test for system limits (where known) prior to doing things like fork and pipe since some systems aren't very nice about it when we try to exceed such limits.

This commit was SVN r14494.
2007-04-24 18:54:45 +00:00
Ralph Castain
4636125e2d Modify the RMGR components to allow job setup with a given jobid, and add another attribute so that we can setup triggers without launching.
Add some debugging output to the ODLS default module, and the orted.

Remove the nodename data from the ODLS info report - that info is already stored in the registry by the RMAPS framework upon completing the mapping procedure.

Add another test program that does an ORTE-only dynamic spawn (gasp!). Looks just like comm_spawn - just no MPI involved.

Modify the ODLS to release the processor when we "kill" local procs in a more scalable fashion. It previously had a sleep in it that Jeff's prior commit removed. However, he introduced some Windows code into the non-Windows component (protected by "if"s, but unnecessary). This is a more general solution he proposed - included here so I could get things to compile properly.

This commit was SVN r12579.
2006-11-13 18:51:18 +00:00
Ralph Castain
4e50cdae52 This commit accomplishes two things:
1. Fix the "hang" condition when an application isn't found. It turned out that the ODLS had some difficulty with the process actually not having been started - hence, it never called the waitpid callback. As a result, the "terminated" trigger didn't fire, and so mpirun didn't wake up. With this change, the HNP's errmgr forces the issue by causing the trigger to fire itself when an abort condition occurs.

2. Shift the recording of the pid and the nodename from mpi_init to the orted launcher. This allows programs such as Eclipse PTP to get the pids even for non-MPI applications. In the case of bproc, the pls handles this chore since we don't use orteds in that system.

This commit was SVN r12558.
2006-11-11 04:03:45 +00:00