openmpi/orte/mca/odls/base
Ralph Castain 555bbf0c02 Fix the iof race conditions wrt proc termination. This is comprised of two sections:
1. modify the iof to track when a proc actually closes all of its open iof output pipes. When this occurs, notify the odls that the proc's iof is complete. This is done via a zero-time event so that we can step out of the read event before processing the notification.

2. in the odls, modify the waitpid callback so it only flags that it was called. Add a function to receive the iof-complete notification, and a function that checks for both iof complete and waitpid callback before declaring a proc fully terminated. This ensures that we read and deliver -all- of the IO prior to declaring the job complete.
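
The two-condition check described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the actual odls code: the `child_t` record, its field names, and the functions `check_proc_complete`, `on_waitpid`, and `on_iof_complete` are all hypothetical names chosen for this example. It assumes each callback only records that it fired, and that termination is declared exactly once, when both have been seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-child record; the real ORTE child structure differs. */
typedef struct {
    int  pid;
    bool waitpid_fired;  /* set when the waitpid callback has run */
    bool iof_complete;   /* set when all IOF output pipes have closed */
    bool terminated;     /* set once, when both conditions hold */
} child_t;

/* Returns true only on the call that first observes both conditions. */
static bool check_proc_complete(child_t *c)
{
    assert(c != NULL);
    if (c->terminated) {
        return false;            /* already declared; do not report twice */
    }
    if (c->waitpid_fired && c->iof_complete) {
        c->terminated = true;    /* all IO delivered and the process reaped */
        return true;
    }
    return false;                /* still waiting on the other event */
}

/* The waitpid callback only flags that it was called, then runs the check. */
static bool on_waitpid(child_t *c)
{
    c->waitpid_fired = true;
    return check_proc_complete(c);
}

/* The iof-complete notification does the same for its own flag. */
static bool on_iof_complete(child_t *c)
{
    c->iof_complete = true;
    return check_proc_complete(c);
}
```

Because either event may arrive first, both handlers funnel into the same check; whichever arrives second is the one that declares the proc terminated.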

Also modified the odls call to orte_iof.close (and the component's implementation) so it only closes stdin, leaving the other io channels alone. This fixes the other half of the known problem.

This should fix the ticket on this subject, but I'll wait to close it pending further testing in the trunk.

This commit was SVN r19991.
2008-11-12 23:32:01 +00:00
base.h Fix the iof race conditions wrt proc termination. This is comprised of two sections: 2008-11-12 23:32:01 +00:00
help-orte-odls-base.txt Bring over the jjh-filem branch which contains a non-blocking FileM interface 2007-09-27 13:13:29 +00:00
Makefile.am Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm. 2008-06-18 03:15:56 +00:00
odls_base_close.c Nothing relevant, few indentations and replace tab by spaces. 2008-10-31 22:24:52 +00:00
odls_base_default_fns.c Fix the iof race conditions wrt proc termination. This is comprised of two sections: 2008-11-12 23:32:01 +00:00
odls_base_open.c Fix the iof race conditions wrt proc termination. This is comprised of two sections: 2008-11-12 23:32:01 +00:00
odls_base_select.c Fix some Coverity 'Event set_but_not_used' highlights. 2008-06-06 14:38:41 +00:00
odls_base_state.c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. 2008-06-09 14:53:58 +00:00
odls_private.h Fix the iof race conditions wrt proc termination. This is comprised of two sections: 2008-11-12 23:32:01 +00:00