openmpi/orte/mca/pls/base
Ralph Castain 437f2b044d Modify the orted command communication system in two ways:
1. use non-blocking sends to transmit commands (this was actually done in a prior commit)

2. have an "ack" message sent back from the orted when it completes the command

The latter item is the new one here. With my prior commit, it was possible for the HNP to move on to other things before the orted had completed its command. This caused the HNP to occasionally exit before the orted, thus generating "lost connection" errors. With this change, we retain the parallel nature of the command communications, but still hold the HNP at that point until the orteds are done.

Best of both worlds.
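
The pattern described above (post every command without blocking, then hold until every daemon has acknowledged completion) can be sketched as a small single-process simulation. This is only an illustration under stated assumptions, not the code in pls_base_orted_cmds.c: `daemon_t`, `post_command`, `progress`, and `command_all_and_wait` are hypothetical names, and the "work" counts stand in for however long each orted takes to finish its command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names come from ORTE. */
enum { MAX_DAEMONS = 16 };

typedef struct {
    bool has_cmd;    /* a command was delivered and is not yet finished */
    int  work_left;  /* simulated progress steps needed to finish it */
    bool ack_sent;   /* the daemon has reported completion */
} daemon_t;

/* Non-blocking "send": record the command and return immediately. */
static void post_command(daemon_t *d, int work)
{
    d->has_cmd = true;
    d->work_left = work;
    d->ack_sent = false;
}

/* One pass of a progress loop: every busy daemon advances one step.
 * Returns how many daemons finished (and acked) during this pass. */
static int progress(daemon_t *daemons, size_t n)
{
    int new_acks = 0;
    for (size_t i = 0; i < n; i++) {
        daemon_t *d = &daemons[i];
        if (!d->has_cmd) {
            continue;
        }
        if (d->work_left > 0) {
            d->work_left--;
        }
        if (d->work_left <= 0) {
            d->has_cmd = false;
            d->ack_sent = true;   /* the "ack" message back to the sender */
            new_acks++;
        }
    }
    return new_acks;
}

/* Post a command to all n daemons, then hold until all n acks have arrived.
 * Returns the number of progress passes spent waiting, or -1 if n is too large.
 * The total ack count is written to *acks_out. */
int command_all_and_wait(size_t n, const int *work, int *acks_out)
{
    daemon_t daemons[MAX_DAEMONS];
    int acks = 0;
    int passes = 0;

    assert(work != NULL && acks_out != NULL);
    if (n > MAX_DAEMONS) {
        return -1;
    }

    /* Phase 1: all commands go out before waiting on any of them. */
    for (size_t i = 0; i < n; i++) {
        post_command(&daemons[i], work[i]);
    }

    /* Phase 2: the sender does not move on until every ack is in. */
    while ((size_t)acks < n) {
        acks += progress(daemons, n);
        passes++;
    }

    *acks_out = acks;
    return passes;
}
```

Calling `command_all_and_wait(3, work, &acks)` with `work = {3, 1, 2}` takes as many passes as the slowest daemon needs rather than the sum of all three, which is the parallelism the commit wants to keep, while the caller still cannot return before the last ack.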

This commit was SVN r12605.
2006-11-15 15:09:28 +00:00
..
base.h Add another test program - an MPI app that just spins. This supports testing of system response to signal-terminated processes. 2006-11-13 21:51:34 +00:00
help-pls-base.txt Fix the bug that caused mpirun to hang when a remote executable wasn't found using the rsh launcher. Will now test on a remote node 2006-10-11 18:43:13 +00:00
Makefile.am Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done. 2006-09-14 21:29:51 +00:00
pls_base_close.c Add another test program - an MPI app that just spins. This supports testing of system response to signal-terminated processes. 2006-11-13 21:51:34 +00:00
pls_base_dmn_registry_fns.c Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). 2006-11-14 19:34:59 +00:00
pls_base_general_support_fns.c Newline is required by some compilers at the end of a file. 2006-10-21 05:56:04 +00:00
pls_base_open.c Add another test program - an MPI app that just spins. This supports testing of system response to signal-terminated processes. 2006-11-13 21:51:34 +00:00
pls_base_orted_cmds.c Modify the orted command communication system in two ways: 2006-11-15 15:09:28 +00:00
pls_base_receive.c Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). 2006-11-14 19:34:59 +00:00
pls_base_select.c Continue bringing comm_spawn back online. Ensure all RM frameworks post their HNP receives. Fix the rmgr proxy component. 2006-10-02 00:46:31 +00:00
pls_private.h Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). 2006-11-14 19:34:59 +00:00