paffinity hwloc was returning "NOT_SUPPORTED" when the real problem
was that the underlying hwloc simply hadn't been initialized yet. So
let's clearly delineate this case: return OPAL_ERR_NOT_INITIALIZED if
the underlying hwloc is not initialized.
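
A minimal sketch of the distinction described above, assuming a NULL topology handle means "not loaded yet". All names and values here (the `sketch_*` type and function, the `SKETCH_*` codes) are hypothetical stand-ins, not the real OPAL definitions:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the OPAL status codes; values are illustrative. */
enum {
    SKETCH_SUCCESS             =  0,
    SKETCH_ERR_NOT_SUPPORTED   = -1,
    SKETCH_ERR_NOT_INITIALIZED = -2
};

/* Stand-in for the shared topology handle; NULL means "not loaded yet". */
typedef struct sketch_topology {
    int binding_supported;   /* nonzero if the platform can bind processes */
} sketch_topology_t;

/* Report why binding cannot proceed, keeping the two failure modes distinct. */
static int sketch_check_binding(const sketch_topology_t *topo)
{
    if (NULL == topo) {
        /* Nothing has been loaded yet: this is not a platform limitation. */
        return SKETCH_ERR_NOT_INITIALIZED;
    }
    if (!topo->binding_supported) {
        return SKETCH_ERR_NOT_SUPPORTED;
    }
    return SKETCH_SUCCESS;
}
```

A caller can then retry after initialization instead of concluding that the platform lacks affinity support.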
This commit was SVN r25902.
Use hwloc to obtain the cpuset for each process during mpi_init, and share that info in the modex. As it arrives, use a new opal_hwloc_base utility function to parse the value against the local proc's cpuset and determine where they overlap. Cache the value in the pmap object as it may be referenced multiple times.
Thus, the return value from orte_ess.proc_get_locality is a 16-bit bitmask that describes the resources being shared with you. This bitmask can be tested using the macros in opal/mca/paffinity/paffinity.h.
Locality is available for all procs, whether launched via mpirun or directly with an external launcher such as slurm or aprun.
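
To illustrate how such a locality bitmask might be built and tested, here is a rough sketch. The bit names, their values, and the `sketch_place_t` structure are assumptions made for this example; the real macros live in opal/mca/paffinity/paffinity.h and may look different:

```c
#include <stdint.h>

/* Hypothetical locality bits; not the real paffinity.h definitions. */
typedef uint16_t sketch_locality_t;
#define SKETCH_ON_NODE    ((sketch_locality_t)0x0001)
#define SKETCH_ON_SOCKET  ((sketch_locality_t)0x0002)
#define SKETCH_ON_CORE    ((sketch_locality_t)0x0004)

/* Test whether a locality value includes a given level. */
#define SKETCH_SHARES(loc, level) (0 != ((loc) & (level)))

/* Simplified placement of a process (a real cpuset is a bitmap). */
typedef struct {
    int node;
    int socket;
    int core;
} sketch_place_t;

/* Set one bit per level of the hierarchy that both processes share. */
static sketch_locality_t sketch_locality(sketch_place_t me, sketch_place_t peer)
{
    sketch_locality_t loc = 0;

    if (me.node != peer.node) {
        return loc;              /* different nodes: nothing is shared */
    }
    loc |= SKETCH_ON_NODE;
    if (me.socket != peer.socket) {
        return loc;
    }
    loc |= SKETCH_ON_SOCKET;
    if (me.core == peer.core) {
        loc |= SKETCH_ON_CORE;
    }
    return loc;
}
```

Because the result is computed once per peer, caching it (as the commit does in the pmap object) avoids repeating the comparison on every lookup.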
This commit was SVN r25331.
Since hwloc has a dynamic bitmap size, it could actually have bits set
that will not fit in the paffinity mask. We already made sure that we
didn't overrun the paffinity mask; now also set the return value to
OPAL_ERR_VALUE_OUT_OF_BOUNDS (wow, we really thought of everything
with those error codes, eh?) if the hwloc bitmap has bits set higher
than what will fit into the paffinity bitmask.
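
The behavior can be sketched roughly as follows. The fixed width of 64, the byte-per-bit mask layout, and all `sketch_*` / `SKETCH_*` names are assumptions for illustration only; the real code works on hwloc bitmaps and the paffinity cpu set type:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; values are illustrative. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_VALUE_OUT_OF_BOUNDS = -1 };

#define SKETCH_MASK_BITS 64   /* assumed fixed mask width */

/* Fixed-size mask, one byte per bit to keep the sketch simple. */
typedef struct {
    unsigned char bit[SKETCH_MASK_BITS];
} sketch_mask_t;

/* Copy set-bit indexes from a dynamically sized source into the fixed mask.
 * Bits that fit are always copied; any bit that does not fit is skipped
 * and reported through the return value. */
static int sketch_copy_bits(const unsigned *set_bits, size_t n,
                            sketch_mask_t *out)
{
    int rc = SKETCH_SUCCESS;

    assert(NULL != out);
    memset(out, 0, sizeof(*out));
    for (size_t i = 0; i < n; ++i) {
        if (set_bits[i] < SKETCH_MASK_BITS) {
            out->bit[set_bits[i]] = 1;
        } else {
            rc = SKETCH_ERR_VALUE_OUT_OF_BOUNDS;   /* never write past the mask */
        }
    }
    return rc;
}
```

Note that the mask is still filled with every bit that does fit, so the caller gets both a usable partial result and a signal that it is incomplete.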
This commit was SVN r25179.
The following Trac tickets were found above:
Ticket 2854 --> https://svn.open-mpi.org/trac/ompi/ticket/2854
paffinity/hwloc components were still calling hwloc_topology_init/load
themselves, and not using the opal_hwloc_topology. Doh!
This commit fixes that -- these 2 components no longer have their own
copy of the topology tree; they just use opal_hwloc_topology.
This commit was SVN r25151.
the command line, hwloc is just like any other external dependency
in OMPI: if we find it, we'll use it. If we don't find it, we'll
ignore it. See comments in opal/mca/hwloc/configure.m4 for an
explanation.
* Fix some copy-n-paste errors in opal/mca/hwloc/configure.m4
w.r.t. flags coming in from the winning component.
* Add another line in ompi_info's output about whether hwloc support
is included or not.
This commit was SVN r25134.
Nth core, so it fell over to try to find the Nth PU.
-----
hwloc isn't able to find cores on all platforms. Example: PPC64
running RHEL 5.4 (Linux kernel 2.6.18) only reports NUMA nodes and
PUs. Fine.
However, note that hwloc_get_obj_by_type() will return NULL in 2
(effectively) different cases:
- no objects of the requested type were found
- the Nth object of the requested type was not found
So first we have to see if we can find *any* cores by looking for the
0th core. If we find it, then try to find the Nth core. Otherwise,
try to find the Nth PU.
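
The lookup order above can be sketched against a mock topology. `sketch_get_obj()` is a hypothetical stand-in that only mimics the "NULL in two different cases" behavior described for hwloc_get_obj_by_type(); it is not the hwloc API:

```c
#include <stddef.h>

typedef enum { SKETCH_OBJ_CORE, SKETCH_OBJ_PU } sketch_type_t;

typedef struct {
    sketch_type_t type;
    unsigned index;
} sketch_obj_t;

#define SKETCH_MAX 8

/* Mock topology: a platform that reports no cores has num_cores == 0. */
typedef struct {
    sketch_obj_t cores[SKETCH_MAX];
    unsigned num_cores;
    sketch_obj_t pus[SKETCH_MAX];
    unsigned num_pus;
} sketch_topo_t;

/* Returns NULL both when no objects of the type exist and when the
 * Nth object of the type does not exist. */
static const sketch_obj_t *sketch_get_obj(const sketch_topo_t *t,
                                          sketch_type_t type, unsigned n)
{
    if (SKETCH_OBJ_CORE == type) {
        return (n < t->num_cores) ? &t->cores[n] : NULL;
    }
    return (n < t->num_pus) ? &t->pus[n] : NULL;
}

/* Probe for the 0th core first; only fall back to PUs when the
 * platform reports no cores at all. */
static const sketch_obj_t *sketch_find_nth(const sketch_topo_t *t, unsigned n)
{
    if (NULL != sketch_get_obj(t, SKETCH_OBJ_CORE, 0)) {
        return sketch_get_obj(t, SKETCH_OBJ_CORE, n);
    }
    return sketch_get_obj(t, SKETCH_OBJ_PU, n);
}
```

With this ordering, a missing Nth core on a platform that does report cores stays an error rather than silently resolving to an unrelated PU.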
This commit was SVN r24632.
appeared multiple times in ompi_info output (so did others, but this
is the one that was noticed). Ensure that we don't repeat
opal_paffinity_base_register_params() multiple times.
This commit was SVN r24569.
filenames -- don't include the project name ("opal")
* Don't link maffinity/hwloc and paffinity/hwloc against the common
hwloc in the static build case (because this will result in
duplicate symbols)
This commit was SVN r24447.
Temporarily remove hwloc's internal version of myriexpress.h. It is
causing a problem when compiling Open MPI with MX support because
hwloc uses AC_CONFIG_HEADER in hwloc's hwloc.m4 to generate
opal/mca/paffinity/hwloc/hwloc/include/hwloc/config.h.
AC_CONFIG_HEADER apparently has the (undocumented) side effect of
adding -I$(top_builddir)/opal/mca/paffinity/hwloc/hwloc/include/hwloc
to OMPI's compilation flags. Hence, when the OMPI MX components are
compiled and #include "myriexpress.h" (or <myriexpress.h>) they see
hwloc's myriexpress.h before the system one. Badness ensues.
This removal is temporary because we need to figure out a better
solution. But for now, OMPI is not using hwloc's myriexpress.h file --
so it's safe to remove. I'll push this issue upstream to hwloc to
figure out a better solution...
This commit was SVN r24354.
The following Trac tickets were found above:
Ticket 2690 --> https://svn.open-mpi.org/trac/ompi/ticket/2690
the module to use the new hwloc bitmap API (the cpuset API is both
klunkier and deprecated), which simplified a few things.
This commit was SVN r24217.
{{{
base/paffinity_base_service.c: In function ‘opal_paffinity_base_cset2mapstr’:
base/paffinity_base_service.c:623: warning: unused variable ‘range_last’
base/paffinity_base_service.c:623: warning: unused variable ‘range_first’
base/paffinity_base_service.c:622: warning: unused variable ‘count’
base/paffinity_base_service.c:622: warning: unused variable ‘m’
}}}
{{{
connect/btl_openib_connect_oob.c: In function ‘init_ud_qp’:
connect/btl_openib_connect_oob.c:1111: warning: control reaches end of non-void function
connect/btl_openib_connect_oob.c: In function ‘init_device’:
connect/btl_openib_connect_oob.c:1235: warning: unused variable ‘i’
connect/btl_openib_connect_oob.c: In function ‘get_pathrecord_sl’:
connect/btl_openib_connect_oob.c:1323: warning: unused variable ‘i’
}}}
This commit was SVN r24196.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.
Please note that configure requirements on components HAVE CHANGED. For example, a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.
This commit was SVN r23764.
linking against libibverbs on Solaris.
Sorry for the mid-day configure change folks; I meant to commit this
last night and forgot. :-(
This commit was SVN r23606.
logic (even though the "else" clause for handling it was there). This
commit puts back the specific check for the word "external".
Thanks to Jed Brown for noticing the issue. Fixes trac:2503.
This commit was SVN r23475.
The following Trac tickets were found above:
Ticket 2503 --> https://svn.open-mpi.org/trac/ompi/ticket/2503
not using in Open MPI (i.e., that stuff is only used in the standalone
builds of hwloc -- it's not compiled/installed/used by Open MPI).
This commit was SVN r23416.
platforms (e.g., PPC64 running RHEL 5.4) -- sometimes it only finds
PUs. So in that case, just run the same calculation, but with PUs
instead of cores.
This commit was SVN r23305.
* Remove OPAL_ERR_PAFFINITY_NOT_SUPPORTED; fit it into the generic
OPAL_ERR_NOT_SUPPORTED case.
* When odls_default detects that processor affinity is not supported,
it prints a specific message about it, and then it suppresses a
generic HNP help message that would normally follow it (i.e., it's
easier to have the "processor affinity is not supported" show_help
message last).
* Use some symbolic names in odls_default instead of fixed ints,
just for slight readability improvements in the code.
* Introduce orte_show_help_suppress(), which gives the ability to
suppress any future showings of any arbitrary show_help() message.
This is useful if you display message X and want to suppress
message Y. This suppression *only* works in environments where
orte_show_help() does coalescing.
This commit was SVN r23249.
distribution tarball, and would therefore cause automake to fail (in
case someone invokes autogen.sh on a distribution tarball).
This commit was SVN r23218.
* If < 0, it's an OPAL_ERR_* value
* If >= 0, it's the actual output value of the function
This is problematic for the OPAL_SOS stuff. This commit changes those
functions to always return OPAL_* statuses and send the output value
back through output parameters (like 95% of the rest of the code
base). This avoids the confusion with OPAL_SOS stuff and makes
paffinity work again (e.g., mpirun --bind-to-core ...).
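
The new calling convention looks roughly like this. The function name, its arguments, and the returned values are invented for illustration and are not the actual paffinity signatures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; values are illustrative. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_FOUND = -1 };

/* The status always travels in the return value; the result travels
 * through an output parameter and is only written on success. */
static int sketch_get_core_count(int socket, int *num_cores)
{
    assert(NULL != num_cores);

    if (socket < 0 || socket > 1) {      /* pretend there are two sockets */
        return SKETCH_ERR_NOT_FOUND;
    }
    *num_cores = 4;                      /* pretend each socket has 4 cores */
    return SKETCH_SUCCESS;
}
```

Callers compare the return value against the success code and read the output parameter only afterwards, so a status can never be mistaken for a result.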
I updated all paffinity modules for the new function signatures, and
bumped the paffinity API version up to 2.0.1. I don't think the
version change will matter, though, because we'll be introducing
support for hardware threads soon, which will either bump the
paffinity version again or we'll replace paffinity with
a new framework.
This commit was SVN r23197.
I forgot to mention one more thing in the r23152 commit message:
* Copy the fix for hwloc's m4 to disable the configure flag
--enable-debug when building in embedding mode, because it can be
hijacked by the outer-level application. In this case, if you
configured OMPI with --enable-debug (or have --enable-debug in a
platform file), you'd see all of hwloc's debug output. Ick. hwloc
1.0 will include this fix.
This commit was SVN r23153.
The following SVN revision numbers were found above:
r23152 --> open-mpi/ompi@ca3362021e
* Fix disabling hwloc build (i.e., put the AM_CONDITIONALs where they
belong in the configure.m4 file)
* Update some svn:ignores
* r23142 removed some extraneous code, but forgot to remove the
variables used only by that code
This commit was SVN r23152.
The following SVN revision numbers were found above:
r23142 --> open-mpi/ompi@610fc67d12