openmpi/orte
Ralph Castain 555bbf0c02 Fix the iof race conditions with respect to proc termination. The fix has two parts:
1. Modify the iof to track when a proc has actually closed all of its open iof output pipes. When this occurs, notify the odls that the proc's iof is complete. This is done via a zero-time event so that we can step out of the read event before processing the notification.

2. In the odls, modify the waitpid callback so it only flags that it was called. Add a function to receive the iof-complete notification, and a function that checks for both iof complete and the waitpid callback before declaring a proc fully terminated. This ensures that we read and deliver -all- of the IO prior to declaring the job complete.
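
The two-condition check described in item 2 can be sketched roughly as follows. This is a minimal illustration only, not the actual odls code: the `child_t` record, its field names, and the three functions are hypothetical names chosen for the example, and the zero-time event used to deliver the iof notification is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-process record; the field names are illustrative
 * and are not taken from the ORTE sources. */
typedef struct {
    int  pid;
    bool waitpid_fired;  /* set by the waitpid callback */
    bool iof_complete;   /* set once all output pipes have closed */
    bool terminated;     /* set only when both flags above are set */
} child_t;

/* Declare the proc terminated only after both events have been seen. */
static bool check_proc_complete(child_t *c)
{
    assert(c != NULL);
    if (c->waitpid_fired && c->iof_complete && !c->terminated) {
        c->terminated = true;
        printf("proc %d fully terminated\n", c->pid);
    }
    return c->terminated;
}

/* The waitpid callback only records that it was called. */
static bool on_waitpid(child_t *c)
{
    c->waitpid_fired = true;
    return check_proc_complete(c);
}

/* Called when the iof reports that all output pipes are closed. */
static bool on_iof_complete(child_t *c)
{
    c->iof_complete = true;
    return check_proc_complete(c);
}
```

Because each handler only sets its own flag and then runs the shared check, the two events may arrive in either order; the proc is reported as terminated only on whichever one arrives second.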

Also modified the odls call to orte_iof.close (and the component's implementation) so it only closes stdin, leaving the other io channels alone. This fixes the other half of the known problem.

This should fix the ticket on this subject, but I'll wait to close it pending further testing in the trunk.

This commit was SVN r19991.
2008-11-12 23:32:01 +00:00
..
etc Many thanks to Ralf W. for finding a subtle bug in these Makefile.am's 2008-06-04 01:28:03 +00:00
include Roll in the revamped IOF subsystem. Per the devel mailing list email, this is a complete rewrite of the iof framework designed to simplify the code for maintainability, and to support features we had planned to do, but were too difficult to implement in the old code. Specifically, the new code: 2008-10-18 00:00:49 +00:00
mca Fix the iof race conditions wrt proc termination. This is comprised of two sections: 2008-11-12 23:32:01 +00:00
orted This is a first step towards supporting fully-routed OOB communications: 2008-10-31 21:10:00 +00:00
runtime Fix the iof race conditions wrt proc termination. This is comprised of two sections: 2008-11-12 23:32:01 +00:00
test This is a first step towards supporting fully-routed OOB communications: 2008-10-31 21:10:00 +00:00
tools fix some typos. should be moved to v1.3 2008-11-10 19:05:26 +00:00
util Fix typo found in Makefile that caused problems with "make distclean"; 2008-11-05 20:58:27 +00:00
Doxyfile Fix the broken Doxyfile so people can generate what little code base documentation we have :-) 2006-04-13 12:52:17 +00:00
Makefile.am Remove the orte_proc_table. Migrate all users of it to the opal_hash_table and a new name hash function in orte. 2008-03-05 22:44:35 +00:00