2004-12-07 22:03:20 +03:00
#
2005-11-05 22:57:48 +03:00
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
# University Research and Technology
# Corporation. All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
# of Tennessee Research Foundation. All rights
# reserved.
2004-12-07 22:03:20 +03:00
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
# University of Stuttgart. All rights reserved.
2005-03-24 15:43:37 +03:00
# Copyright (c) 2004-2005 The Regents of the University of California.
# All rights reserved.
This commit is the first of several steps in a paffinity makeover
extravaganza.
= Short version =
This commit does several things, but the short version is that it
re-orients the error message creation of the ODLS default module to
generate error strings in the child process for errors that occur
after the fork but before the exec (such errors are ''usually''
related to paffinity). A show_help string is rendered in the child
and then IPC'ed up to the parent, who displays the string through
normal ORTE show_help aggregation mechanisms. We also broke up the
ginormous paffinity-setting logic into a few separate functions, both
to help us understand the code, and hopefully to ease future
maintenance.
The logic for the ODLS default binding should not have changed -- this
is mainly a code reshuffle and improvement on error reporting.
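The child-side reporting described above can be illustrated with a minimal C sketch. This is not the ORTE implementation: `launch_with_report` is a hypothetical helper, the message text is invented for illustration, and a plain pipe stands in for whatever IPC and show_help rendering the real module uses. The idea it shows is only that the child formats the error string after the fork and the parent merely relays it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not an ORTE function): fork, try to exec `path`,
 * and return 0 if the exec succeeded.  If something fails in the child
 * between fork() and exec(), the child renders the message itself and
 * sends it up a pipe; the parent copies it into errbuf and returns -1. */
static int launch_with_report(const char *path, char *errbuf, size_t len)
{
    int fds[2];
    if (len == 0 || pipe(fds) != 0) {
        return -1;
    }
    errbuf[0] = '\0';

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: keep only the write end, and have it close on exec so
         * a successful exec shows up in the parent as a bare EOF. */
        close(fds[0]);
        fcntl(fds[1], F_SETFD, FD_CLOEXEC);

        /* Process binding would be attempted here; any failure would be
         * reported the same way as the exec failure below. */
        execl(path, path, (char *) NULL);

        /* Only reached if exec failed: render the message in the child. */
        char msg[256];
        int n = snprintf(msg, sizeof(msg), "could not exec %s: %s",
                         path, strerror(errno));
        if (n > 0) {
            size_t to_write = (size_t) n < sizeof(msg) ? (size_t) n
                                                       : sizeof(msg) - 1;
            ssize_t w = write(fds[1], msg, to_write);
            (void) w;
        }
        _exit(1);
    }

    /* Parent: "dumb" relay.  Read whatever the child rendered. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < len - 1 &&
           (n = read(fds[0], errbuf + total, len - 1 - total)) > 0) {
        total += (size_t) n;
    }
    errbuf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);

    return total == 0 ? 0 : -1;
}
```

In this sketch the parent never decides what the error text says; it only detects that text arrived and passes it on, which mirrors the division of labor the commit describes.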
= Rationale =
The reasoning for this commit is complex. As mentioned above, it's
the first step in some paffinity cleanup. Here's the line of dominoes
that must fall (in this order):
1. Add hwloc paffinity component (already done).
1. While testing hwloc, we discovered that the error reporting from
the ODLS default module was abysmal. So we fixed it.
1. Further, we reorganized the code in odls_default_module.c a bit
to help our understanding of it.
1. We also discovered a few bugs in the original ODLS default module
logic that existed before this code shuffle; separate tickets
will be filed to fix them.
1. Next up will be some improvements to paffinity / odls default so
that binding to a core binds to ''all'' hardware threads contained
in that core (similarly for sockets: binding to a socket will bind
to ''all'' hardware threads in that socket).
1. Next will be improvements to paffinity to expose binding to
hardware threads through the paffinity framework API.
1. Finally, we'll expose these binding controls to the user (e.g.,
through mpirun command line arguments, MCA parameters, etc.).
This commit represents the first few bullets; the last 4 bullets are
being worked on right now, but there is no definite timeline for
completion.
= Miscellaneous =
A few points worth mentioning:
* We have tested this new code a bunch; we're pretty sure it behaves
just like the trunk -- but with better / more precise error
reporting. More testing is needed on a wider array of platforms,
however.
* A big comment at the top of odls_default_module.c explains the
(new) general scheme for the error reporting.
* The error reporting in the parent process is now really dumb;
almost all the intelligence about creating error messages is in the
child.
* The show_help file was renamed to be more consistent with other
help files (help-odls-default.txt -> help-orte-odls-default.txt)
* Removed the use of sched_yield() because of recent changes in the
Linux 2.6.3x kernels. We already had an #else clause for
select()'ing for 1us if we didn't have sched_yield() -- that is now
the only code path. This is not a performance-critical section of
the code, so this shouldn't be controversial.
* Replaced the macro-based error reporting with function-based
reporting. It's a bit more bulky, but it helped us understand the
code and saved us multiple times with compile-time parameter
checking, etc.
* Cleaned up the use of several show_help messages to ensure that
they mapped to real messages in help*.txt files.
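As an illustration of the select()-based wait mentioned in the sched_yield() bullet, here is a minimal C sketch. `tiny_sleep` is a hypothetical name, not a function from the ODLS module; the sketch only shows the standard idiom of calling select() with no file descriptors and a 1 microsecond timeout.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: give up the CPU briefly by waiting in select()
 * on no descriptors with a 1 microsecond timeout.  Returns 0 when the
 * timeout expires, or -1 if select() fails (for example, on a signal). */
static int tiny_sleep(void)
{
    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 1;
    return select(0, NULL, NULL, NULL, &tv);
}
```

The actual delay is usually longer than 1us because of timer granularity, which is acceptable for a code path that is not performance-critical.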
This commit was SVN r23652.
2010-08-24 23:38:29 +04:00
# Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
2004-12-07 22:03:20 +03:00
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
2010-08-24 23:38:29 +04:00
dist_pkgdata_DATA = help-orte-odls-default.txt
2004-12-07 22:03:20 +03:00
sources = \
2006-09-15 01:29:51 +04:00
    odls_default.h \
    odls_default_component.c \
    odls_default_module.c
2004-12-07 22:03:20 +03:00
2005-03-14 23:57:21 +03:00
# Make the output library in this directory, and name it either
# mca_<type>_<name>.la (for DSO builds) or libmca_<type>_<name>.la
# (for static builds).
2004-12-07 22:03:20 +03:00
2006-09-15 01:29:51 +04:00
if OMPI_BUILD_odls_default_DSO
2004-12-07 22:03:20 +03:00
component_noinst =
2006-09-15 01:29:51 +04:00
component_install = mca_odls_default.la
2004-12-07 22:03:20 +03:00
else
2006-09-15 01:29:51 +04:00
component_noinst = libmca_odls_default.la
2004-12-07 22:03:20 +03:00
component_install =
endif
2007-04-12 15:19:42 +04:00
mcacomponentdir = $(pkglibdir)
2004-12-07 22:03:20 +03:00
mcacomponent_LTLIBRARIES = $(component_install)
2006-09-15 01:29:51 +04:00
mca_odls_default_la_SOURCES = $(sources)
mca_odls_default_la_LDFLAGS = -module -avoid-version
2004-12-07 22:03:20 +03:00
noinst_LTLIBRARIES = $(component_noinst)
2006-09-15 01:29:51 +04:00
libmca_odls_default_la_SOURCES = $(sources)
libmca_odls_default_la_LDFLAGS = -module -avoid-version