openmpi/opal/mca
Josh Hursey 66af515061 Fix C/R functionality with the new libtool. This fixes the case where the restarted process cannot be checkpointed or finalized.
Short Version:
--------------
Event engine needs to be flushed so it does not use old/stale file descriptors.

Long Version:
-------------
The symptom was that the restarted process waited for the socket to the local daemon to finish establishing during the 'sync' operation. The underlying problem was that the daemon sent a 36-byte header, but the restarted process received only 35 bytes of it, so it became stuck waiting for the last byte to arrive.

After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (which it uses to signal itself when a signal arrives). So when the daemon wrote to the new fd, the event engine stole the first byte (*shakes fist at event engine*) before the recv() could be posted.

The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem.


A few other minor things:
-------------------------
 * Add a check to make sure the event engine's init and finalize calls are balanced.
 * Add an opal_event_base_close() call to the BLCR restart exec function (still not 100% sure it is needed, but there it is).

This commit was SVN r24296.
2011-01-25 22:43:47 +00:00
backtrace Convert the bad dos line endings to unix style for all windows related files. 2010-12-02 12:08:08 +00:00
base Add a proper help message for the mca_verbose MCA param (and shuffle 2011-01-14 20:18:06 +00:00
carto Fix mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake will always create a set of directories under installdir, even if you do not --with-devel-headers. The set of directories will be empty, but still -- they should not be there at all. This commit fixes that -- the directories are not created at all if you do not --with-devel-headers 2010-09-24 22:53:28 +00:00
compress Decode SOS error code before checking it with the native error code. 2011-01-20 23:21:38 +00:00
crs Fix C/R functionality with the new libtool. This fixes the case where the restarted process cannot be checkpointed or finalized. 2011-01-25 22:43:47 +00:00
event Fix C/R functionality with the new libtool. This fixes the case where the restarted process cannot be checkpointed or finalized. 2011-01-25 22:43:47 +00:00
if Revert r23928 as being the incorrect fix. The correct fix is not to include ipv6 interfaces when ipv6 support was not requested. 2010-10-25 14:31:18 +00:00
installdirs Convert the bad dos line endings to unix style for all windows related files. 2010-12-02 12:08:08 +00:00
maffinity Somehow this has been sitting, uncommitted, in a local checkout since 2011-01-24 14:39:16 +00:00
memchecker Per http://www.open-mpi.org/community/lists/devel/2010/11/8671.php, 2010-11-12 23:22:11 +00:00
memcpy Fix mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake will always create a set of directories under installdir, even if you do not --with-devel-headers. The set of directories will be empty, but still -- they should not be there at all. This commit fixes that -- the directories are not created at all if you do not --with-devel-headers 2010-09-24 22:53:28 +00:00
memory Per http://www.open-mpi.org/community/lists/devel/2010/11/8671.php, 2010-11-12 23:22:11 +00:00
paffinity Remove a few useless files that were missed last night. 2011-01-11 14:15:31 +00:00
pstat Per http://www.open-mpi.org/community/lists/devel/2010/11/8671.php, 2010-11-12 23:22:11 +00:00
sysinfo removing a file I should not have added 2011-01-11 19:02:08 +00:00
timer Convert the bad dos line endings to unix style for all windows related files. 2010-12-02 12:08:08 +00:00
Makefile.am Update the copyright notices for IU and UTK. 2005-11-05 19:57:48 +00:00
mca.h Per http://www.open-mpi.org/community/lists/devel/2010/01/7283.php, allow MCA components to fail the component.register and component.open methods without the MCA base printing errors. 2010-01-12 19:29:12 +00:00