Set up the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.
This commit was SVN r23943.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.
Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.
Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework à la installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.
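To picture the "no selection logic" arrangement, here is a minimal sketch under heavy simplification: the struct layout, the `sketch_*` names, and `sketch_component_init()` are hypothetical and are not the real opal_event interface. The one compiled-in component fills a global table of function pointers, and callers go through that table, the same way `opal_event.evtimer_new` replaces a direct call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced timer event; not the real opal_event_t. */
typedef struct {
    void (*cb)(void *arg);
    void *arg;
} sketch_timer_t;

/* Table of entry points that callers use instead of naming libevent calls. */
typedef struct {
    const char *component_name;
    sketch_timer_t *(*evtimer_new)(void (*cb)(void *), void *arg);
    void (*event_free)(sketch_timer_t *ev);
} sketch_event_api_t;

/* Global table; empty until the single compiled component fills it. */
sketch_event_api_t sketch_event;

/* Implementations supplied by the (only) compiled component. */
static sketch_timer_t *component_evtimer_new(void (*cb)(void *), void *arg)
{
    sketch_timer_t *ev = malloc(sizeof(*ev));
    if (NULL != ev) {
        ev->cb = cb;
        ev->arg = arg;
    }
    return ev;
}

static void component_event_free(sketch_timer_t *ev)
{
    free(ev);
}

/* No selection step: the component just installs its function pointers. */
void sketch_component_init(void)
{
    assert(NULL == sketch_event.evtimer_new); /* only one component loads */
    sketch_event.component_name = "libevent-sketch";
    sketch_event.evtimer_new = component_evtimer_new;
    sketch_event.event_free = component_event_free;
}
```

A caller would then write `sketch_event.evtimer_new(cb, arg)`; trying out a newer libevent means building a different component that fills the same table, with no change at the call sites.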
I have gone through the code base and converted all the libevent calls I could find. However, I cannot compile or test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:
1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)
2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.
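Why a zeroed struct is not a constructed object can be shown with a toy object system. This is a hypothetical imitation only: the `sketch_*` types and functions stand in for OBJ_CONSTRUCT and the real opal_object_t layout, which this sketch does not reproduce. Constructing attaches a class pointer, sets a reference count, and runs the class constructor; memset does none of these.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct sketch_object sketch_object_t;
typedef void (*sketch_ctor_t)(sketch_object_t *obj);

typedef struct {
    const char *cls_name;
    sketch_ctor_t cls_construct;
} sketch_class_t;

struct sketch_object {
    const sketch_class_t *obj_class;  /* stays NULL after memset */
    int obj_refcount;
};

typedef struct {
    sketch_object_t super;            /* parent object must come first */
    int ev_flags;                     /* field set by the constructor */
} sketch_event_t;

static void sketch_event_construct(sketch_object_t *obj)
{
    sketch_event_t *ev = (sketch_event_t *)obj;
    ev->ev_flags = 0x1;               /* nonzero default memset would miss */
}

static const sketch_class_t sketch_event_class = {
    "sketch_event_t", sketch_event_construct
};

/* The old habit: zero the memory and assume it is ready to use. */
void sketch_old_style_init(sketch_event_t *ev)
{
    memset(ev, 0, sizeof(*ev));       /* no class, no constructor run */
}

/* Rough analogue of constructing an object in place. */
void sketch_obj_construct(sketch_object_t *obj, const sketch_class_t *cls)
{
    assert(NULL != cls);
    obj->obj_class = cls;
    obj->obj_refcount = 1;
    if (NULL != cls->cls_construct) {
        cls->cls_construct(obj);
    }
}

/* Code that later dereferences obj_class can only trust constructed objects. */
int sketch_obj_is_constructed(const sketch_object_t *obj)
{
    return NULL != obj->obj_class && obj->obj_refcount > 0;
}
```

In the real framework, code that follows the class pointer of a merely zeroed event is what produces the segfaults described above.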
There is also the issue of the new libevent "no recursion" behavior. As I described in a recent email, we will have to discuss this and figure out what, if anything, we need to do.
This commit was SVN r23925.
I did not want to make this change globally since there could be good reason to keep the check before calling SIGKILL that I am not seeing at the moment.
This commit was SVN r23821.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.
Please note that configure requirements on components HAVE CHANGED. For example, a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.
This commit was SVN r23764.
All interface APIs for accessing the info remain unchanged in opal/util/if.c.
This has been tested on Mac, Linux, and NetBSD. Nobody else seemed interested in testing it, so there may be some future problems revealed as people try it on other OSs.
This commit was SVN r23743.
administrator can specify compiler flags that get
inserted into the command before the user's flags.
These flags can be specified at configure time.
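The ordering can be pictured as assembling an argument vector for the underlying compiler. This is a sketch only: `sketch_build_command()` and the flag and file names in the usage note are hypothetical, and how the real wrapper obtains its configure-time flags is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build a NULL-terminated argv in the order: compiler, administrator
 * prefix flags, then the user's own arguments. Returns the number of
 * entries (excluding the NULL), or -1 if out_max is too small.
 */
int sketch_build_command(const char *compiler,
                         const char *const *admin_flags, int n_admin,
                         const char *const *user_args, int n_user,
                         const char **out, int out_max)
{
    int n = 0;

    assert(NULL != out);
    if (1 + n_admin + n_user + 1 > out_max) {
        return -1;
    }
    out[n++] = compiler;
    for (int i = 0; i < n_admin; ++i) {
        out[n++] = admin_flags[i];    /* inserted before the user's flags */
    }
    for (int i = 0; i < n_user; ++i) {
        out[n++] = user_args[i];
    }
    out[n] = NULL;
    return n;
}
```

For example, with administrator flag `-DSITE_FLAG` and user arguments `-O2 hello.c`, the command comes out as `cc -DSITE_FLAG -O2 hello.c`, so a user flag that conflicts with a site flag appears later on the command line.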
Reviewed by Jeff Squyres.
This fixes ticket #2474.
This commit was SVN r23709.
- Add one instance where we do not use a parameter in a function
- Fix a buglet in commit r23689, where the attribute-for-function ptrs
was applied.
This commit was SVN r23690.
The following SVN revision numbers were found above:
r23689 --> open-mpi/ompi@5eb571c458
be tested on function pointers and assigned accordingly,
instead of using the pre-processor in the header files.
A functional change is (re-)specifying __opal_attribute_noreturn__
on orte_errmgr_base_abort(): All modules in the errmgr framework
either use this function, or define their own abort function,
which sets __opal_attribute_noreturn__.
This attribute was taken out with the errmgr overhaul in r22872.
This commit was SVN r23689.
The following SVN revision numbers were found above:
r22872 --> open-mpi/ompi@e4f2d03d28
assigned to function-declarations.
Check for this case and mark the only such case currently in trunk.
Thanks to Paul Hargrove for bringing this up.
Let's test the svn commit msg CMR:v1.5
This commit was SVN r23676.
extravaganza.
= Short version =
This commit does several things, but the short version is that it
re-orients the error message creation of the ODLS default module to
generate error strings in the child process for errors that occur
after the fork but before the exec (such errors are ''usually''
related to paffinity). A show_help string is rendered in the child
and then IPC'ed up to the parent, who displays the string through
normal ORTE show_help aggregation mechanisms. We also broke up the
ginormous paffinity-setting logic into a few separate functions, both
to help us understand the code, and hopefully to ease future
maintenance.
The logic for the ODLS default binding should not have changed -- this
is mainly a code reshuffle and improvement on error reporting.
= Rationale =
The reasoning for this commit is complex. As mentioned above, it's
the first step in some paffinity cleanup. Here's the line of dominoes
that must fall (in this order):
1. Add hwloc paffinity component (already done).
1. While testing hwloc, we discovered that the error reporting from
the ODLS default module was abysmal. So we fixed it.
1. Further, we reorganized the code in the odls_default_module.c a bit
to help our understanding of it.
1. We also discovered a few bugs in the original ODLS default module
logic that existed before this code shuffle; separate tickets
will be filed to fix them.
1. Next up will be some improvements to paffinity / odls default to
ensure that binding to a core binds to ''all''
hardware threads contained in that core (similar for sockets:
binding to a socket will bind to ''all'' hardware threads in that
socket).
1. Next will be improvements to paffinity to expose binding to
hardware threads through the paffinity framework API.
1. Finally, we'll expose these binding controls to the user (e.g.,
through mpirun command line arguments, MCA parameters, etc.).
This commit represents the first few bullets; the last 4 bullets are
being worked on right now, but there is no definite timeline for
completion.
= Miscellaneous =
A few points worth mentioning:
* We have tested this new code a bunch; we're pretty sure it behaves
just like the trunk -- but with better / more precise error
reporting. More testing is needed on a wider array of platforms,
however.
* A big comment at the top of odls_default_module.c explains the
(new) general scheme for the error reporting.
* The error reporting in the parent process is now really dumb;
almost all the intelligence about creating error messages is in the
child.
* The show_help file was renamed to be more consistent with other
help files (help-odls-default.txt -> help-orte-odls-default.txt)
* Removed the use of sched_yield() because of recent changes in the
Linux 2.6.3x kernels. We already had an #else clause for
select()'ing for 1us if we didn't have sched_yield() -- that is now
the only code path. This is not a performance-critical section of
the code, so this shouldn't be controversial.
* Replaced the macro-based error reporting with function-based
reporting. It's a bit more bulky, but it helped us understand the
code and saved us multiple times with compile-time parameter
checking, etc.
* Cleaned up the use of several show_help messages to ensure that
they mapped to real messages in help*.txt files.
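Two of the bullets above can be sketched together. The 1us select() call is the standard POSIX idiom for a short timed wait; the function-based reporter shows why a function helps where a macro does not, since the compiler checks its parameter types and, with the GCC/Clang `format` attribute, its format string against the arguments. The `sketch_*` names and the macro are hypothetical and are not the ODLS code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

/* Wait about 1 microsecond with select(), used here instead of sched_yield(). */
void sketch_brief_pause(void)
{
    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 1;
    (void) select(0, NULL, NULL, NULL, &tv);
}

#if defined(__GNUC__)
#define SKETCH_PRINTF_LIKE(f, a) __attribute__((format(printf, f, a)))
#else
#define SKETCH_PRINTF_LIKE(f, a)
#endif

/* Declared with the format attribute so bad varargs are caught at compile time. */
int sketch_report_error(char *buf, size_t len, const char *fmt, ...)
    SKETCH_PRINTF_LIKE(3, 4);

int sketch_report_error(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int rc;

    assert(NULL != buf && len > 0);
    va_start(ap, fmt);
    rc = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return rc;  /* length the full message needed, as vsnprintf reports */
}
```

A call such as `sketch_report_error(buf, sizeof(buf), "bind to core %d failed", "3")` would draw a compiler warning about the mismatched argument, which a text-substituting macro would not.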
This commit was SVN r23652.
as orte_show_help(), but it takes a fully-rendered string instead of a
varargs list that must be rendered. This function is useful in cases
where one entity renders the "show help" string and a different entity
sends the string via the normal orte "show help" mechanisms for
aggregation, etc.
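The split between rendering and delivery can be sketched as follows. The real orte_show_help() works from help files and topics, and its signature is not reproduced here; `sketch_show_help()`, `sketch_show_help_norender()`, and the buffer standing in for the aggregation path are all hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the aggregation/output path; it just records the text. */
static char sketch_delivered[256];

/* Accepts a string that is already rendered; performs no formatting. */
int sketch_show_help_norender(const char *rendered)
{
    if (NULL == rendered) {
        return -1;
    }
    strncpy(sketch_delivered, rendered, sizeof(sketch_delivered) - 1);
    sketch_delivered[sizeof(sketch_delivered) - 1] = '\0';
    return 0;
}

/* Renders the varargs first, then hands off to the same delivery path. */
int sketch_show_help(const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    assert(NULL != fmt);
    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    return sketch_show_help_norender(buf);
}
```

Because both entry points end in the same delivery function, a string rendered elsewhere is treated exactly like one rendered locally.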
Example usage: errors occur in the ODLS after forking but before
exec'ing. In such cases, it makes sense for the child process to
render the "show help" string because it has all the details about the
error. But the child process can't call orte_show_help() itself
because it is not an ORTE process -- it can't OOB send the message
to the HNP, etc.
After rendering the help string, the child sends the rendered string
to its parent via normal IPC (e.g., via a pipe) and the parent can
then invoke orte_show_help_norender() with the ready-to-go string.
The message is then displayed via the normal mechanisms (i.e., out via
the HNP, aggregated/coalesced, etc.).
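The child-to-parent path can be sketched with plain POSIX fork() and pipe(). This is an illustration under stated assumptions, not the ODLS code: `sketch_fork_and_report()` is hypothetical, the process and core numbers in the message are made-up values, and where this sketch simply returns the text, the real parent would pass it to orte_show_help_norender().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that fails after fork() but before exec(), renders its own
 * message, and writes it to a pipe. The parent only collects the finished
 * string. Returns bytes read into msg, or -1 on error.
 */
ssize_t sketch_fork_and_report(char *msg, size_t len, int *child_status)
{
    int fds[2];
    pid_t pid;
    ssize_t total = 0;

    assert(NULL != msg);
    if (len == 0 || pipe(fds) != 0) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: it has the details, so it renders the text itself. */
        char text[128];
        int n = snprintf(text, sizeof(text),
                         "could not bind process %d to core %d", 7, 3);
        close(fds[0]);
        if (n > 0 && write(fds[1], text, (size_t) n) < 0) {
            _exit(2);
        }
        close(fds[1]);
        _exit(1);  /* the exec never happens */
    }
    /* Parent: no rendering logic, just read until EOF or the buffer fills. */
    close(fds[1]);
    for (;;) {
        ssize_t r = read(fds[0], msg + total, len - 1 - (size_t) total);
        if (r <= 0) {
            break;
        }
        total += r;
    }
    msg[total] = '\0';
    close(fds[0]);
    waitpid(pid, child_status, 0);
    return total;
}
```

Keeping all message construction in the child means the parent needs no knowledge of what went wrong; it forwards whatever arrives on the pipe.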
This commit was SVN r23651.