1
1
Josh Hursey 66af515061 Fix C/R functionality with the new libtool. This fixes the case where the restarted process cannot be checkpointed or finalized.
Short Version:
--------------
Event engine needs to be flushed so it does not use old/stale file descriptors.

Long Version:
-------------
The problem was that the restarted process was waiting for the socket to the local daemon to finish establishing during the 'sync' operation. The core problem was that the daemon was sending a header of 36 bytes, but the restarted process only received 35 bytes of the message. So the restarted process became stuck waiting for the last byte to arrive.

After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (to signal itself when a signal arrives). So when the daemon wrote in to the new fd the event engine was stealing the first byte (*shakes fist at event engine*) before the recv() could be posted.

The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem.


A few other minor things:
-------------------------
 * Add a check to make sure the event engine is balanced in its init/finalize
 * Add the opal_event_base_close() to the BLCR restart exec function (still not 100% sure it is needed, but there it is).

This commit was SVN r24296.
2011-01-25 22:43:47 +00:00
..

Last updated: 15 Sep 2010

How to update the Libevent embedded in OPAL
-------------------------------------------

OPAL requires some modification of the Libevent build system in order
to properly operate. In addition, OPAL accesses the Libevent functions
through a set of wrappers - this is done for three reasons:

1. Hide the Libevent functions. Some applications directly call
   libevent APIs and expect to operate against a locally installed
   library. Since the library used by OPAL may differ in version, and
   to avoid linker errors for multiply-defined symbols, it is
   important that the libevent functions included in OPAL be "hidden"
   from external view. Thus, OPAL's internal copy of libevent is built
   with visibility set to "hidden" and all access from the OPAL code
   base is done through "opal_xxx" wrapper API calls.

   In those cases where the system is built against a compiler that
   doesn't support visibility, conflicts can (unfortunately)
   arise. However, since only a very few applications would be
   affected, and since most compilers support visibility, we do not
   worry about this possibility.

2. Correct some deficiencies in the distributed Libevent configuration
   tests.  Specifically, the distributed tests for kqueue and epoll
   support provide erroneous results on some platforms (as determined
   by our empirical testing). OPAL therefore provides enhanced tests
   to correctly assess those environments.
   
3. Enable greater flexibility in configuring Libevent for the specific
   environment. In particular, OPAL has no need of Libevent's dns,
   http, and rpc events, so configuration options to remove that code
   from Libevent have been added.

The procedure for updating Libevent has been greatly simplified
compared to prior versions in the OPAL code base by replacing
file-by-file edits with configuration logic. Thus, updating the
included libevent code can generally be accomplished by:

1. create a new opal/mca/event component for the updated version, using
   a name "libeventxxx", where xxx = libevent version. For example,
   libevent 2.0.7 => component libevent207

2. create a subdirectory "libevent" in the new component and unpack the
   new libevent code tarball into it.

3. copy the configure.m4, autogen.subdirs, component.c, moduule.c,
   Makefile.am, and .h files from a prior version to the new component.
   You will need to customize them for the new version, but they can
   serve as a good template. In many cases, you will just have to update
   the component name.

4. edit libevent/configure.in to add OMPI specific options and modes.
   Use the corresponding file from a prior version as a guide. The
   necessary changes are marked with "OMPI" comments. These changes
   have been pushed upstream to libevent, and so edits may no longer
   be required in the new version.

5. Modify libevent/Makefile.am. Here again, you should use the file from
   a prior version as an example. Hopefully, you can just use the file
   without change - otherwise, the changes will have to be done by hand.
   Required changes reflect the need for OMPI to turn "off" unused
   subsystems such as http. These changes have been pushed upstream to
   libevent, and so edits may no longer be required in the new version.

6. in your new component Makefile.am, note that the libevent headers
   are listed by name when WITH_INSTALL_HEADERS is given. This is required
   to support the OMPI --with-devel-headers configure option. Please review
   the list and update it to include all libevent headers for the new
   version.