
Now that it has been officially released, update the embedded HWLOC to 1.11.0

This commit is contained in:
Ralph Castain 2015-06-28 14:07:45 -07:00
parent 8a5e1611ab
commit 75ceec663a
7 changed files with 452 additions and 340 deletions

View file

@ -15,7 +15,7 @@ enable_mpi_fortran=yes
 enable_mpi_cxx=no
 enable_mpi_cxx_seek=no
 enable_cxx_exceptions=no
-enable_mpi_java=yes
+enable_mpi_java=no
 enable_io_romio=no
 enable_contrib_no_build=libnbc
 with_memory_manager=no

View file

@ -1,35 +1,3 @@
-Applied the following patches from the upstream hwloc 1.9 branch after
-the v1.9.1 release:
+Applied the following patches from the upstream hwloc 1.11 branch after
+the v1.11.0 release:
-All relevant commits up to open-mpi/hwloc@4e23b12 (i.e., the HEAD as
-of 27 March 2015). "Relevant" commits are defined as those that
-included files that are embedded in the Open MPI tree (e.g., updates
-to files in docs/, utils/, etc. aren't relevant because they are not
-embedded in the Open MPI tree). To be specific, the following commits
-have been cherry-picked over to Open MPI:
-* open-mpi/hwloc@7c03216 v1.9.1 released, doing 1.9.2rc1 now
-* open-mpi/hwloc@b35ced8 misc.h: Fix hwloc_strncasecmp() build under strict flags on BSD
-* open-mpi/hwloc@d8c3f3d misc.h: Fix hwloc_strncasecmp() with some icc
-* open-mpi/hwloc@f705a23 Use gcc's __asm__ version of the asm extension, which can be used in all standards
-* open-mpi/hwloc@307726a configure: fix the check for X11/Xutil.h
-* open-mpi/hwloc@ec58c05 errors: improve the advice to send hwloc-gather-topology files in the OS error message
-* open-mpi/hwloc@35c743d NEWS update
-* open-mpi/hwloc@868170e API: clearly state that os_index isn't unique while logical_index is
-* open-mpi/hwloc@851532d x86 and OSF: Don't forget to set NUMA node nodeset
-* open-mpi/hwloc@790aa2e cpuid-x86: Fix duplicate asm labels in case of heavy inlining on x86-32
-* open-mpi/hwloc@dd09aa5 debug: fix an overzealous assertion about the parent cpuset vs its children
-* open-mpi/hwloc@769b9b5 core: fix the merging of identical objects in presence of Misc objects
-* open-mpi/hwloc@71da0f1 core: reorder children in merge_useless_child() as well
-* open-mpi/hwloc@c9cef07 hpux: improve hwloc_hpux_find_ldom() looking for NUMA node
-* open-mpi/hwloc@cdffea6 x86: use ulong for cache sizes, uint won't be enough in the near future
-* open-mpi/hwloc@55b0676 x86: use Group instead of Misc for unknown x2apic levels
-* open-mpi/hwloc@7764ce5 synthetic: Misc levels are not allowed in the synthetic description
-* open-mpi/hwloc@5b2dce1 error: point to the FAQ when displaying the big OS error message
-* open-mpi/hwloc@c7bd9e6 pci: fix SR-IOV VF vendor/device names
-* open-mpi/hwloc@a0f72ef distances: when we fail to insert an intermediate group, don't try to group further above
-* open-mpi/hwloc@e419811 AIX: Fix PU os_index
-* open-mpi/hwloc@08ab793 groups: add complete sets when inserting distance/pci groups
-* open-mpi/hwloc@c66e714 core: only update root->complete sets if insert succeeds
-* open-mpi/hwloc@01da9b9 bitmap: fix a corner case in hwloc_bitmap_isincluded() with infinite sets
-* open-mpi/hwloc@e7b192b pci: fix bridge depth

View file

@ -63,8 +63,10 @@ Version 1.11.0
- Automatically scales graphical box width to the inner text in Cairo,
ASCII and Windows outputs.
- Add --rect to lstopo to force rectangular layout even for NUMA nodes.
- Objects may have a Type info attribute to specific a better type name
- Add --restrict-flags to configure the behavior of --restrict.
- Objects may have a "Type" info attribute to specify a better type name
and display it in lstopo.
- Really export all verbose information to the given output file.
+ hwloc-annotate
- May now operate on all types of objects, including I/O.
- May now insert Misc objects in the topology.
@ -75,12 +77,15 @@ Version 1.11.0
thanks to Imre Kerr for reporting the problem.
+ Fix PCI Bridge-specific depth attribute.
+ Fix hwloc_bitmap_intersect() for two infinite bitmaps.
+ Fix some corner cases in the building of levels on large NUMA machines
with non-uniform NUMA groups and I/Os.
+ Improve the performance of object insertion by cpuset for large
topologies.
+ Prefix verbose XML import errors with the source name.
+ Improve pkg-config checks and error messages.
+ Fix excluding after a component with an argument in the HWLOC_COMPONENTS
environment variable.
* Documentation
+ Fix the recommended way in documentation and examples to allocate memory
on some node, it should use HWLOC_MEMBIND_BIND.
Thanks to Nicolas Bouzat for reporting the issue.

View file

@ -1,34 +1,33 @@
Introduction
hwloc provides command line tools and a C API to obtain the
hierarchical map of key computing elements, such as: NUMA memory nodes,
shared caches, processor packages, processor cores, processing units
(logical processors or "threads") and even I/O devices. hwloc also
gathers various attributes such as cache and memory information, and is
portable across a variety of different operating systems and platforms.
Additionally it may assemble the topologies of multiple machines into a
single one so as to let applications consult the topology of an entire
fabric or cluster at once.
hwloc provides command line tools and a C API to obtain the hierarchical map of
key computing elements, such as: NUMA memory nodes, shared caches, processor
packages, processor cores, processing units (logical processors or "threads")
and even I/O devices. hwloc also gathers various attributes such as cache and
memory information, and is portable across a variety of different operating
systems and platforms. Additionally it may assemble the topologies of multiple
machines into a single one so as to let applications consult the topology of an
entire fabric or cluster at once.
hwloc primarily aims at helping high-performance computing (HPC)
applications, but is also applicable to any project seeking to exploit
code and/or data locality on modern computing platforms.
hwloc primarily aims at helping high-performance computing (HPC) applications,
but is also applicable to any project seeking to exploit code and/or data
locality on modern computing platforms.
Note that the hwloc project represents the merger of the libtopology
project from inria and the Portable Linux Processor Affinity (PLPA)
sub-project from Open MPI. Both of these prior projects are now
deprecated. The first hwloc release was essentially a "re-branding" of
the libtopology code base, but with both a few genuinely new features
and a few PLPA-like features added in. Prior releases of hwloc included
documentation about switching from PLPA to hwloc; this documentation
has been dropped on the assumption that everyone who was using PLPA has
already switched to hwloc.
Note that the hwloc project represents the merger of the libtopology project
from inria and the Portable Linux Processor Affinity (PLPA) sub-project from
Open MPI. Both of these prior projects are now deprecated. The first hwloc
release was essentially a "re-branding" of the libtopology code base, but with
both a few genuinely new features and a few PLPA-like features added in. Prior
releases of hwloc included documentation about switching from PLPA to hwloc;
this documentation has been dropped on the assumption that everyone who was
using PLPA has already switched to hwloc.
hwloc supports the following operating systems:
* Linux (including old kernels not having sysfs topology information,
with knowledge of cpusets, offline CPUs, ScaleMP vSMP, NumaScale
NumaConnect, and Kerrighed support) on all supported hardware,
including Intel Xeon Phi (either standalone or as a coprocessor).
* Linux (including old kernels not having sysfs topology information, with
knowledge of cpusets, offline CPUs, ScaleMP vSMP, NumaScale NumaConnect,
and Kerrighed support) on all supported hardware, including Intel Xeon Phi
(either standalone or as a coprocessor).
* Solaris
* AIX
* Darwin / OS X
@ -39,127 +38,126 @@ hwloc supports the following operating systems:
* Microsoft Windows
* IBM BlueGene/Q Compute Node Kernel (CNK)
Since it uses standard Operating System information, hwloc's support is
mostly independant from the processor type (x86, powerpc, ...) and just
relies on the Operating System support. The only exception to this is
kFreeBSD, which does not support topology information, and hwloc thus
uses an x86-only CPUID-based backend (which can be used for other OSes
too, see the Components and plugins section).
Since it uses standard Operating System information, hwloc's support is mostly
independant from the processor type (x86, powerpc, ...) and just relies on the
Operating System support. The only exception to this is kFreeBSD, which does
not support topology information, and hwloc thus uses an x86-only CPUID-based
backend (which can be used for other OSes too, see the Components and plugins
section).
To check whether hwloc works on a particular machine, just try to build
it and run lstopo or lstopo-no-graphics. If some things do not look
right (e.g. bogus or missing cache information), see Questions and Bugs
below.
To check whether hwloc works on a particular machine, just try to build it and
run lstopo or lstopo-no-graphics. If some things do not look right (e.g. bogus
or missing cache information), see Questions and Bugs below.
hwloc only reports the number of processors on unsupported operating
systems; no topology information is available.
hwloc only reports the number of processors on unsupported operating systems;
no topology information is available.
For development and debugging purposes, hwloc also offers the ability
to work on "fake" topologies:
* Symmetrical tree of resources generated from a list of level
arities
* Remote machine simulation through the gathering of Linux sysfs
topology files
For development and debugging purposes, hwloc also offers the ability to work
on "fake" topologies:
hwloc can display the topology in a human-readable format, either in
graphical mode (X11), or by exporting in one of several different
formats, including: plain text, PDF, PNG, and FIG (see CLI Examples
below). Note that some of the export formats require additional support
libraries.
* Symmetrical tree of resources generated from a list of level arities
* Remote machine simulation through the gathering of Linux sysfs topology
files
hwloc offers a programming interface for manipulating topologies and
objects. It also brings a powerful CPU bitmap API that is used to
describe topology objects location on physical/logical processors. See
the Programming Interface below. It may also be used to binding
applications onto certain cores or memory nodes. Several utility
programs are also provided to ease command-line manipulation of
topology objects, binding of processes, and so on.
hwloc can display the topology in a human-readable format, either in graphical
mode (X11), or by exporting in one of several different formats, including:
plain text, PDF, PNG, and FIG (see CLI Examples below). Note that some of the
export formats require additional support libraries.
hwloc offers a programming interface for manipulating topologies and objects.
It also brings a powerful CPU bitmap API that is used to describe topology
objects location on physical/logical processors. See the Programming Interface
below. It may also be used to binding applications onto certain cores or memory
nodes. Several utility programs are also provided to ease command-line
manipulation of topology objects, binding of processes, and so on.
Perl bindings are available from Bernd Kallies on CPAN.
Python bindings are available from Guy Streeter:
* Fedora RPM and tarball.
* git tree (html).
Installation
hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the
BSD license. It is hosted as a sub-project of the overall Open MPI
project (http://www.open-mpi.org/). Note that hwloc does not require
any functionality from Open MPI -- it is a wholly separate (and much
smaller!) project and code base. It just happens to be hosted as part
of the overall Open MPI project.
hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the BSD
license. It is hosted as a sub-project of the overall Open MPI project (http://
www.open-mpi.org/). Note that hwloc does not require any functionality from
Open MPI -- it is a wholly separate (and much smaller!) project and code base.
It just happens to be hosted as part of the overall Open MPI project.
Nightly development snapshots are available on the web site. Additionally, the
code can be directly cloned from Git:
Nightly development snapshots are available on the web site.
Additionally, the code can be directly cloned from Git:
shell$ git clone https://github.com/open-mpi/hwloc.git
shell$ cd hwloc
shell$ ./autogen.sh
Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are
required when building from a Git clone.
Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are required
when building from a Git clone.
Installation by itself is the fairly common GNU-based process:
shell$ ./configure --prefix=...
shell$ make
shell$ make install
The hwloc command-line tool "lstopo" produces human-readable topology
maps, as mentioned above. It can also export maps to the "fig" file
format. Support for PDF, Postscript, and PNG exporting is provided if
the "Cairo" development package (usually cairo-devel or libcairo2-dev)
can be found in "lstopo" when hwloc is configured and build.
The hwloc command-line tool "lstopo" produces human-readable topology maps, as
mentioned above. It can also export maps to the "fig" file format. Support for
PDF, Postscript, and PNG exporting is provided if the "Cairo" development
package (usually cairo-devel or libcairo2-dev) can be found in "lstopo" when
hwloc is configured and build.
The hwloc core may also benefit from the following development
packages:
* libnuma for memory binding and migration support on Linux
(numactl-devel or libnuma-dev package).
The hwloc core may also benefit from the following development packages:
* libnuma for memory binding and migration support on Linux (numactl-devel or
libnuma-dev package).
* libpciaccess for full I/O device discovery (libpciaccess-devel or
libpciaccess-dev package). On Linux, PCI discovery may still be
performed (without vendor/device names) even if libpciaccess cannot
be used.
libpciaccess-dev package). On Linux, PCI discovery may still be performed
(without vendor/device names) even if libpciaccess cannot be used.
* the AMD OpenCL implementation for OpenCL device discovery.
* the NVIDIA CUDA Toolkit for CUDA device discovery.
* the NVIDIA Tesla Development Kit for NVML device discovery.
* the NV-CONTROL X extension library (NVCtrl) for NVIDIA display
discovery.
* the NV-CONTROL X extension library (NVCtrl) for NVIDIA display discovery.
* libxml2 for full XML import/export support (otherwise, the internal
minimalistic parser will only be able to import XML files that were
exported by the same hwloc release). See Importing and exporting
topologies from/to XML files for details. The relevant development
package is usually libxml2-devel or libxml2-dev.
* libudev on Linux for easier discovery of OS device information
(otherwise hwloc will try to manually parse udev raw files). The
relevant development package is usually libudev-devel or
libudev-dev.
* libtool's ltdl library for dynamic plugin loading. The relevant
development package is usually libtool-ltdl-devel or libltdl-dev.
minimalistic parser will only be able to import XML files that were
exported by the same hwloc release). See Importing and exporting topologies
from/to XML files for details. The relevant development package is usually
libxml2-devel or libxml2-dev.
* libudev on Linux for easier discovery of OS device information (otherwise
hwloc will try to manually parse udev raw files). The relevant development
package is usually libudev-devel or libudev-dev.
* libtool's ltdl library for dynamic plugin loading. The relevant development
package is usually libtool-ltdl-devel or libltdl-dev.
PCI and XML support may be statically built inside the main hwloc
library, or as separate dynamically-loaded plugins (see the Components
and plugins section).
PCI and XML support may be statically built inside the main hwloc library, or
as separate dynamically-loaded plugins (see the Components and plugins
section).
Note that because of the possibility of GPL taint, the pciutils library
libpci will not be used (remember that hwloc is BSD-licensed).
Note that because of the possibility of GPL taint, the pciutils library libpci
will not be used (remember that hwloc is BSD-licensed).
Also note that if you install supplemental libraries in non-standard
locations, hwloc's configure script may not be able to find them
without some help. You may need to specify additional CPPFLAGS,
LDFLAGS, or PKG_CONFIG_PATH values on the configure command line.
Also note that if you install supplemental libraries in non-standard locations,
hwloc's configure script may not be able to find them without some help. You
may need to specify additional CPPFLAGS, LDFLAGS, or PKG_CONFIG_PATH values on
the configure command line.
For example, if libpciaccess was installed into /opt/pciaccess, hwloc's
configure script may not find it be default. Try adding PKG_CONFIG_PATH
to the ./configure command line, like this:
configure script may not find it be default. Try adding PKG_CONFIG_PATH to the
./configure command line, like this:
./configure PKG_CONFIG_PATH=/opt/pciaccess/lib/pkgconfig ...
CLI Examples
On a 4-package 2-core machine with hyper-threading, the lstopo tool may
show the following graphical output:
On a 4-package 2-core machine with hyper-threading, the lstopo tool may show
the following graphical output:
dudley.png
dudley.png
Here's the equivalent output in textual form:
Machine (16GB)
Package L#0 + L3 L#0 (4096KB)
L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
@ -190,17 +188,17 @@ Machine (16GB)
PU L#14 (P#7)
PU L#15 (P#15)
Note that there is also an equivalent output in XML that is meant for
exporting/importing topologies but it is hardly readable to
human-beings (see Importing and exporting topologies from/to XML files
for details).
Note that there is also an equivalent output in XML that is meant for exporting
/importing topologies but it is hardly readable to human-beings (see Importing
and exporting topologies from/to XML files for details).
On a 4-package 2-core Opteron NUMA machine, the lstopo tool may show
the following graphical output:
On a 4-package 2-core Opteron NUMA machine, the lstopo tool may show the
following graphical output:
hagrid.png
hagrid.png
Here's the equivalent output in textual form:
Machine (32GB)
NUMANode L#0 (P#0 8190MB) + Package L#0
L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
@ -215,12 +213,13 @@ Machine (32GB)
L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)
On a 2-package quad-core Xeon (pre-Nehalem, with 2 dual-core dies into
each package):
On a 2-package quad-core Xeon (pre-Nehalem, with 2 dual-core dies into each
package):
emmett.png
emmett.png
Here's the same output in textual form:
Machine (16GB)
Package L#0
L2 L#0 (4096KB)
@ -239,117 +238,290 @@ Machine (16GB)
Programming Interface
The basic interface is available in hwloc.h. Some higher-level
functions are available in hwloc/helper.h to reduce the need to
manually manipulate objects and follow links between them.
Documentation for all these is provided later in this document.
Developers may also want to look at hwloc/inlines.h which contains the
actual inline code of some hwloc.h routines, and at this document,
The basic interface is available in hwloc.h. Some higher-level functions are
available in hwloc/helper.h to reduce the need to manually manipulate objects
and follow links between them. Documentation for all these is provided later in
this document. Developers may also want to look at hwloc/inlines.h which
contains the actual inline code of some hwloc.h routines, and at this document,
which provides good higher-level topology traversal examples.
To precisely define the vocabulary used by hwloc, a Terms and
Definitions section is available and should probably be read first.
To precisely define the vocabulary used by hwloc, a Terms and Definitions
section is available and should probably be read first.
Each hwloc object contains a cpuset describing the list of processing
units that it contains. These bitmaps may be used for CPU binding and
Memory binding. hwloc offers an extensive bitmap manipulation interface
in hwloc/bitmap.h.
Each hwloc object contains a cpuset describing the list of processing units
that it contains. These bitmaps may be used for CPU binding and Memory binding.
hwloc offers an extensive bitmap manipulation interface in hwloc/bitmap.h.
Moreover, hwloc also comes with additional helpers for interoperability
with several commonly used environments. See the Interoperability With
Other Software section for details.
Moreover, hwloc also comes with additional helpers for interoperability with
several commonly used environments. See the Interoperability With Other
Software section for details.
The complete API documentation is available in a full set of HTML
pages, man pages, and self-contained PDF files (formatted for both both
US letter and A4 formats) in the source tarball in doc/doxygen-doc/.
The complete API documentation is available in a full set of HTML pages, man
pages, and self-contained PDF files (formatted for both both US letter and A4
formats) in the source tarball in doc/doxygen-doc/.
NOTE: If you are building the documentation from a Git clone, you will
need to have Doxygen and pdflatex installed -- the documentation will
be built during the normal "make" process. The documentation is
installed during "make install" to $prefix/share/doc/hwloc/ and your
systems default man page tree (under $prefix, of course).
NOTE: If you are building the documentation from a Git clone, you will need to
have Doxygen and pdflatex installed -- the documentation will be built during
the normal "make" process. The documentation is installed during "make install"
to $prefix/share/doc/hwloc/ and your systems default man page tree (under
$prefix, of course).
Portability
As shown in CLI Examples, hwloc can obtain information on a wide
variety of hardware topologies. However, some platforms and/or
operating system versions will only report a subset of this
information. For example, on an PPC64-based system with 32 cores (each
with 2 hardware threads) running a default 2.6.18-based kernel from
RHEL 5.4, hwloc is only able to glean information about NUMA nodes and
processor units (PUs). No information about caches, packages, or cores
is available.
As shown in CLI Examples, hwloc can obtain information on a wide variety of
hardware topologies. However, some platforms and/or operating system versions
will only report a subset of this information. For example, on an PPC64-based
system with 32 cores (each with 2 hardware threads) running a default
2.6.18-based kernel from RHEL 5.4, hwloc is only able to glean information
about NUMA nodes and processor units (PUs). No information about caches,
packages, or cores is available.
Similarly, Operating System have varying support for CPU and memory
binding, e.g. while some Operating Systems provide interfaces for all
kinds of CPU and memory bindings, some others provide only interfaces
for a limited number of kinds of CPU and memory binding, and some do
not provide any binding interface at all. Hwloc's binding functions
would then simply return the ENOSYS error (Function not implemented),
meaning that the underlying Operating System does not provide any
interface for them. CPU binding and Memory binding provide more
information on which hwloc binding functions should be preferred
because interfaces for them are usually available on the supported
Operating Systems.
Similarly, Operating System have varying support for CPU and memory binding,
e.g. while some Operating Systems provide interfaces for all kinds of CPU and
memory bindings, some others provide only interfaces for a limited number of
kinds of CPU and memory binding, and some do not provide any binding interface
at all. Hwloc's binding functions would then simply return the ENOSYS error
(Function not implemented), meaning that the underlying Operating System does
not provide any interface for them. CPU binding and Memory binding provide more
information on which hwloc binding functions should be preferred because
interfaces for them are usually available on the supported Operating Systems.
Here's the graphical output from lstopo on this platform when
Simultaneous Multi-Threading (SMT) is enabled:
Here's the graphical output from lstopo on this platform when Simultaneous
Multi-Threading (SMT) is enabled:
ppc64-with-smt.png
ppc64-with-smt.png
And here's the graphical output from lstopo on this platform when SMT
is disabled:
And here's the graphical output from lstopo on this platform when SMT is
disabled:
ppc64-without-smt.png
ppc64-without-smt.png
Notice that hwloc only sees half the PUs when SMT is disabled. PU #15,
for example, seems to change location from NUMA node #0 to #1. In
reality, no PUs "moved" -- they were simply re-numbered when hwloc only
saw half as many. Hence, PU #15 in the SMT-disabled picture probably
corresponds to PU #30 in the SMT-enabled picture.
Notice that hwloc only sees half the PUs when SMT is disabled. PU #15, for
example, seems to change location from NUMA node #0 to #1. In reality, no PUs
"moved" -- they were simply re-numbered when hwloc only saw half as many.
Hence, PU #15 in the SMT-disabled picture probably corresponds to PU #30 in the
SMT-enabled picture.
This same "PUs have disappeared" effect can be seen on other platforms
-- even platforms / OSs that provide much more information than the
above PPC64 system. This is an unfortunate side-effect of how operating
systems report information to hwloc.
This same "PUs have disappeared" effect can be seen on other platforms -- even
platforms / OSs that provide much more information than the above PPC64 system.
This is an unfortunate side-effect of how operating systems report information
to hwloc.
Note that upgrading the Linux kernel on the same PPC64 system mentioned
above to 2.6.34, hwloc is able to discover all the topology
information. The following picture shows the entire topology layout
when SMT is enabled:
Note that upgrading the Linux kernel on the same PPC64 system mentioned above
to 2.6.34, hwloc is able to discover all the topology information. The
following picture shows the entire topology layout when SMT is enabled:
ppc64-full-with-smt.png
ppc64-full-with-smt.png
Developers using the hwloc API or XML output for portable applications
should therefore be extremely careful to not make any assumptions about
the structure of data that is returned. For example, per the above
reported PPC topology, it is not safe to assume that PUs will always be
descendants of cores.
Developers using the hwloc API or XML output for portable applications should
therefore be extremely careful to not make any assumptions about the structure
of data that is returned. For example, per the above reported PPC topology, it
is not safe to assume that PUs will always be descendants of cores.
Additionally, future hardware may insert new topology elements that are
not available in this version of hwloc. Long-lived applications that
are meant to span multiple different hardware platforms should also be
careful about making structure assumptions. For example, there may
someday be an element "lower" than a PU, or perhaps a new element may
exist between a core and a PU.
Additionally, future hardware may insert new topology elements that are not
available in this version of hwloc. Long-lived applications that are meant to
span multiple different hardware platforms should also be careful about making
structure assumptions. For example, there may someday be an element "lower"
than a PU, or perhaps a new element may exist between a core and a PU.
API Example
The following small C example (named ``hwloc-hello.c'') prints the
topology of the machine and bring the process to the first logical
processor of the second core of the machine. More examples are
available in the doc/examples/ directory of the source tree.
The following small C example (named ``hwloc-hello.c'') prints the topology of
the machine and bring the process to the first logical processor of the second
core of the machine. More examples are available in the doc/examples/ directory
of the source tree.
/* Example hwloc API program.
*
* See other examples under doc/examples/ in the source tree
* for more details.
*
* Copyright (c) 2009-2015 Inria. All rights reserved.
* Copyright (c) 2009-2011 Université Bordeaux
* Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
*
* hwloc-hello.c
*/
#include <hwloc.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
static void print_children(hwloc_topology_t topology, hwloc_obj_t obj,
int depth)
{
char type[32], attr[1024];
unsigned i;
hwloc_obj_type_snprintf(type, sizeof(type), obj, 0);
printf("%*s%s", 2*depth, "", type);
if (obj->os_index != (unsigned) -1)
printf("#%u", obj->os_index);
hwloc_obj_attr_snprintf(attr, sizeof(attr), obj, " ", 0);
if (*attr)
printf("(%s)", attr);
printf("\n");
for (i = 0; i < obj->arity; i++) {
print_children(topology, obj->children[i], depth + 1);
}
}
int main(void)
{
int depth;
unsigned i, n;
unsigned long size;
int levels;
char string[128];
int topodepth;
hwloc_topology_t topology;
hwloc_cpuset_t cpuset;
hwloc_obj_t obj;
/* Allocate and initialize topology object. */
hwloc_topology_init(&topology);
/* ... Optionally, put detection configuration here to ignore
some objects types, define a synthetic topology, etc....
The default is to detect all the objects of the machine that
the caller is allowed to access. See Configure Topology
Detection. */
/* Perform the topology detection. */
hwloc_topology_load(topology);
/* Optionally, get some additional topology information
in case we need the topology depth later. */
topodepth = hwloc_topology_get_depth(topology);
/*****************************************************************
* First example:
* Walk the topology with an array style, from level 0 (always
* the system level) to the lowest level (always the proc level).
*****************************************************************/
for (depth = 0; depth < topodepth; depth++) {
printf("*** Objects at level %d\n", depth);
for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth);
i++) {
hwloc_obj_type_snprintf(string, sizeof(string),
hwloc_get_obj_by_depth
(topology, depth, i), 0);
printf("Index %u: %s\n", i, string);
}
}
/*****************************************************************
* Second example:
* Walk the topology with a tree style.
*****************************************************************/
printf("*** Printing overall tree\n");
print_children(topology, hwloc_get_root_obj(topology), 0);
/*****************************************************************
* Third example:
* Print the number of packages.
*****************************************************************/
depth = hwloc_get_type_depth(topology, HWLOC_OBJ_PACKAGE);
if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) {
printf("*** The number of packages is unknown\n");
} else {
printf("*** %u package(s)\n",
hwloc_get_nbobjs_by_depth(topology, depth));
}
/*****************************************************************
* Fourth example:
* Compute the amount of cache that the first logical processor
* has above it.
*****************************************************************/
levels = 0;
size = 0;
for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
obj;
obj = obj->parent)
if (obj->type == HWLOC_OBJ_CACHE) {
levels++;
size += obj->attr->cache.size;
}
printf("*** Logical processor 0 has %d caches totaling %luKB\n",
levels, size / 1024);
/*****************************************************************
* Fifth example:
* Bind to only one thread of the last core of the machine.
*
* First find out where cores are, or else smaller sets of CPUs if
* the OS doesn't have the notion of a "core".
*****************************************************************/
depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);
/* Get last core. */
obj = hwloc_get_obj_by_depth(topology, depth,
hwloc_get_nbobjs_by_depth(topology, depth) - 1);
if (obj) {
/* Get a copy of its cpuset that we may modify. */
cpuset = hwloc_bitmap_dup(obj->cpuset);
/* Get only one logical processor (in case the core is
SMT/hyper-threaded). */
hwloc_bitmap_singlify(cpuset);
/* And try to bind ourself there. */
if (hwloc_set_cpubind(topology, cpuset, 0)) {
char *str;
int error = errno;
hwloc_bitmap_asprintf(&str, obj->cpuset);
printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error));
free(str);
}
/* Free our cpuset copy */
hwloc_bitmap_free(cpuset);
}
/*****************************************************************
* Sixth example:
* Allocate some memory on the last NUMA node, bind some existing
* memory to the last NUMA node.
*****************************************************************/
/* Get last node. */
n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
if (n) {
void *m;
size = 1024*1024;
obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, n - 1);
m = hwloc_alloc_membind_nodeset(topology, size, obj->nodeset,
HWLOC_MEMBIND_BIND, 0);
hwloc_free(topology, m, size);
m = malloc(size);
hwloc_set_area_membind_nodeset(topology, m, size, obj->nodeset,
HWLOC_MEMBIND_BIND, 0);
free(m);
}
/* Destroy topology object. */
hwloc_topology_destroy(topology);
return 0;
}
hwloc provides a pkg-config executable to obtain relevant compiler and linker
flags. For example, it can be used thusly to compile applications that utilize
the hwloc library (assuming GNU Make):
hwloc provides a pkg-config executable to obtain relevant compiler and
linker flags. For example, it can be used thusly to compile
applications that utilize the hwloc library (assuming GNU Make):
CFLAGS += $(pkg-config --cflags hwloc)
LDLIBS += $(pkg-config --libs hwloc)
cc hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS)
On a machine with 4GB of RAM and 2 processor packages -- each package
of which has two processing cores -- the output from running
hwloc-hello could be something like the following:
On a machine with 4GB of RAM and 2 processor packages -- each package of which
has two processing cores -- the output from running hwloc-hello could be
something like the following:
shell$ ./hwloc-hello
*** Objects at level 0
Index 0: Machine(3938MB)
@ -383,45 +555,43 @@ shell$
Questions and Bugs
Questions should be sent to the devel mailing list
(http://www.open-mpi.org/community/lists/hwloc.php). Bug reports should
be reported in the tracker (https://git.open-mpi.org/trac/hwloc/).
Questions should be sent to the devel mailing list (http://www.open-mpi.org/
community/lists/hwloc.php). Bug reports should be reported in the tracker (
https://git.open-mpi.org/trac/hwloc/).
If hwloc discovers an incorrect topology for your machine, the very
first thing you should check is to ensure that you have the most recent
updates installed for your operating system. Indeed, most of hwloc
topology discovery relies on hardware information retrieved through the
operation system (e.g., via the /sys virtual filesystem of the Linux
kernel). If upgrading your OS or Linux kernel does not solve your
problem, you may also want to ensure that you are running the most
recent version of the BIOS for your machine.
If hwloc discovers an incorrect topology for your machine, the very first thing
you should check is to ensure that you have the most recent updates installed
for your operating system. Indeed, most of hwloc topology discovery relies on
hardware information retrieved through the operation system (e.g., via the /sys
virtual filesystem of the Linux kernel). If upgrading your OS or Linux kernel
does not solve your problem, you may also want to ensure that you are running
the most recent version of the BIOS for your machine.
If those things fail, contact us on the mailing list for additional
help. Please attach the output of lstopo after having given the
--enable-debug option to ./configure and rebuilt completely, to get
debugging output. Also attach the /proc + /sys tarball generated by the
installed script hwloc-gather-topology when submitting problems about
Linux, or send the output of kstat cpu_info in the Solaris case, or the
output of sysctl hw in the Darwin or BSD cases.
If those things fail, contact us on the mailing list for additional help.
Please attach the output of lstopo after having given the --enable-debug option
to ./configure and rebuilt completely, to get debugging output. Also attach the
/proc + /sys tarball generated by the installed script hwloc-gather-topology
when submitting problems about Linux, or send the output of kstat cpu_info in
the Solaris case, or the output of sysctl hw in the Darwin or BSD cases.
History / Credits
hwloc is the evolution and merger of the libtopology
(http://runtime.bordeaux.inria.fr/libtopology/) project and the
Portable Linux Processor Affinity (PLPA)
(http://www.open-mpi.org/projects/plpa/) project. Because of functional
and ideological overlap, these two code bases and ideas were merged and
released under the name "hwloc" as an Open MPI sub-project.
hwloc is the evolution and merger of the libtopology (http://
runtime.bordeaux.inria.fr/libtopology/) project and the Portable Linux
Processor Affinity (PLPA) (http://www.open-mpi.org/projects/plpa/) project.
Because of functional and ideological overlap, these two code bases and ideas
were merged and released under the name "hwloc" as an Open MPI sub-project.
libtopology was initially developed by the inria Runtime Team-Project
(http://runtime.bordeaux.inria.fr/) (headed by Raymond Namyst
(http://dept-info.labri.fr/~namyst/). PLPA was initially developed by
the Open MPI development team as a sub-project. Both are now deprecated
in favor of hwloc, which is distributed as an Open MPI sub-project.
libtopology was initially developed by the inria Runtime Team-Project (http://
runtime.bordeaux.inria.fr/) (headed by Raymond Namyst (http://
dept-info.labri.fr/~namyst/). PLPA was initially developed by the Open MPI
development team as a sub-project. Both are now deprecated in favor of hwloc,
which is distributed as an Open MPI sub-project.
Further Reading
The documentation chapters include
* Terms and Definitions
* Command-Line Tools
* Environment Variables
@ -439,8 +609,4 @@ The documentation chapters include
* Frequently Asked Questions
Make sure to have had a look at those too!
__________________________________________________________________
Generated on 5 Jun 2015 for Hardware Locality (hwloc) by doxygen
1.6.1

View file

@ -16,17 +16,17 @@ release=0
# requirement is that it must be entirely printable ASCII characters
# and have no white space.
greek=rc2
greek=
# The date when this release was created
date="Unreleased developer copy"
date="Jun 18, 2015"
# If snapshot=1, then use the value from snapshot_version as the
# entire hwloc version (i.e., ignore major, minor, release, and
# greek). This is only set to 1 when making snapshot tarballs.
snapshot=1
snapshot_version=dev-450-g1cc3012
snapshot=0
snapshot_version=${major}.${minor}.${release}${greek}-git
# The shared library version of hwloc's public library. This version
# is maintained in accordance with the "Library Interface Versions"

View file

@ -5,7 +5,7 @@ includedir=@includedir@
Name: hwloc
Description: Hardware locality detection and management library
Version: @VERSION@
Version: @HWLOC_VERSION@
Requires.private: @HWLOC_REQUIRES@
Cflags: -I${includedir}
Libs: -L${libdir} -lhwloc

View file

@ -576,20 +576,6 @@ hwloc_topology_dup(hwloc_topology_t *newp,
return -1;
}
/*
* How to compare objects based on types.
*
* Note that HIGHER/LOWER is only a (consistent) heuristic, used to sort
* objects with same cpuset consistently.
* Only EQUAL / not EQUAL can be relied upon.
*/
enum hwloc_type_cmp_e {
HWLOC_TYPE_HIGHER,
HWLOC_TYPE_DEEPER,
HWLOC_TYPE_EQUAL
};
/* WARNING: The indexes of this array MUST match the ordering that of
the obj_order_type[] array, below. Specifically, the values must
be laid out such that:
@ -702,7 +688,15 @@ int hwloc_compare_types (hwloc_obj_type_t type1, hwloc_obj_type_t type2)
return order1 - order2;
}
static enum hwloc_type_cmp_e
enum hwloc_obj_cmp_e {
HWLOC_OBJ_EQUAL = HWLOC_BITMAP_EQUAL, /**< \brief Equal */
HWLOC_OBJ_INCLUDED = HWLOC_BITMAP_INCLUDED, /**< \brief Strictly included into */
HWLOC_OBJ_CONTAINS = HWLOC_BITMAP_CONTAINS, /**< \brief Strictly contains */
HWLOC_OBJ_INTERSECTS = HWLOC_BITMAP_INTERSECTS, /**< \brief Intersects, but no inclusion! */
HWLOC_OBJ_DIFFERENT = HWLOC_BITMAP_DIFFERENT /**< \brief No intersection */
};
static enum hwloc_obj_cmp_e
hwloc_type_cmp(hwloc_obj_t obj1, hwloc_obj_t obj2)
{
hwloc_obj_type_t type1 = obj1->type;
@ -711,60 +705,52 @@ hwloc_type_cmp(hwloc_obj_t obj1, hwloc_obj_t obj2)
compare = hwloc_compare_types(type1, type2);
if (compare == HWLOC_TYPE_UNORDERED)
return HWLOC_TYPE_EQUAL; /* we cannot do better */
return HWLOC_OBJ_DIFFERENT; /* we cannot do better */
if (compare > 0)
return HWLOC_TYPE_DEEPER;
return HWLOC_OBJ_INCLUDED;
if (compare < 0)
return HWLOC_TYPE_HIGHER;
return HWLOC_OBJ_CONTAINS;
/* Caches have the same types but can have different depths. */
if (type1 == HWLOC_OBJ_CACHE) {
if (obj1->attr->cache.depth < obj2->attr->cache.depth)
return HWLOC_TYPE_DEEPER;
return HWLOC_OBJ_INCLUDED;
else if (obj1->attr->cache.depth > obj2->attr->cache.depth)
return HWLOC_TYPE_HIGHER;
return HWLOC_OBJ_CONTAINS;
else if (obj1->attr->cache.type > obj2->attr->cache.type)
/* consider icache deeper than dcache and dcache deeper than unified */
return HWLOC_TYPE_DEEPER;
return HWLOC_OBJ_INCLUDED;
else if (obj1->attr->cache.type < obj2->attr->cache.type)
/* consider icache deeper than dcache and dcache deeper than unified */
return HWLOC_TYPE_HIGHER;
return HWLOC_OBJ_CONTAINS;
}
/* Group objects have the same types but can have different depths. */
if (type1 == HWLOC_OBJ_GROUP) {
if (obj1->attr->group.depth == (unsigned) -1
|| obj2->attr->group.depth == (unsigned) -1)
return HWLOC_TYPE_EQUAL;
return HWLOC_OBJ_EQUAL;
if (obj1->attr->group.depth < obj2->attr->group.depth)
return HWLOC_TYPE_DEEPER;
return HWLOC_OBJ_INCLUDED;
else if (obj1->attr->group.depth > obj2->attr->group.depth)
return HWLOC_TYPE_HIGHER;
return HWLOC_OBJ_CONTAINS;
}
/* Bridges objects have the same types but can have different depths. */
if (type1 == HWLOC_OBJ_BRIDGE) {
if (obj1->attr->bridge.depth < obj2->attr->bridge.depth)
return HWLOC_TYPE_DEEPER;
return HWLOC_OBJ_INCLUDED;
else if (obj1->attr->bridge.depth > obj2->attr->bridge.depth)
return HWLOC_TYPE_HIGHER;
return HWLOC_OBJ_CONTAINS;
}
return HWLOC_TYPE_EQUAL;
return HWLOC_OBJ_EQUAL;
}
/*
* How to compare objects based on cpusets.
*/
enum hwloc_obj_cmp_e {
HWLOC_OBJ_EQUAL = HWLOC_BITMAP_EQUAL, /**< \brief Equal */
HWLOC_OBJ_INCLUDED = HWLOC_BITMAP_INCLUDED, /**< \brief Strictly included into */
HWLOC_OBJ_CONTAINS = HWLOC_BITMAP_CONTAINS, /**< \brief Strictly contains */
HWLOC_OBJ_INTERSECTS = HWLOC_BITMAP_INTERSECTS, /**< \brief Intersects, but no inclusion! */
HWLOC_OBJ_DIFFERENT = HWLOC_BITMAP_DIFFERENT /**< \brief No intersection */
};
static int
hwloc_obj_cmp_sets(hwloc_obj_t obj1, hwloc_obj_t obj2)
{
@ -786,32 +772,6 @@ hwloc_obj_cmp_sets(hwloc_obj_t obj1, hwloc_obj_t obj2)
return hwloc_bitmap_compare_inclusion(set1, set2);
}
static int
hwloc_obj_cmp_types(hwloc_obj_t obj1, hwloc_obj_t obj2)
{
/* Same sets, subsort by type to have a consistent ordering. */
int typeres = hwloc_type_cmp(obj1, obj2);
if (typeres == HWLOC_TYPE_DEEPER)
return HWLOC_OBJ_INCLUDED;
if (typeres == HWLOC_TYPE_HIGHER)
return HWLOC_OBJ_CONTAINS;
/* HWLOC_TYPE_EQUAL */
if (obj1->type == HWLOC_OBJ_MISC) {
/* Misc objects may vary by name */
int res = strcmp(obj1->name, obj2->name);
if (res < 0)
return HWLOC_OBJ_INCLUDED;
if (res > 0)
return HWLOC_OBJ_CONTAINS;
if (res == 0)
return HWLOC_OBJ_EQUAL;
}
/* Same sets and types! Let's hope it's coherent. */
return HWLOC_OBJ_EQUAL;
}
/* Compare object cpusets based on complete_cpuset if defined (always correctly ordered),
* or fallback to the main cpusets (only correctly ordered during early insert before disallowed/offline bits are cleared).
*
@ -968,7 +928,15 @@ hwloc___insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t cur
*/
} else {
/* otherwise compare actual types to decide of the inclusion */
res = hwloc_obj_cmp_types(obj, child);
res = hwloc_type_cmp(obj, child);
if (res == HWLOC_OBJ_EQUAL && obj->type == HWLOC_OBJ_MISC) {
/* Misc objects may vary by name */
int ret = strcmp(obj->name, child->name);
if (ret < 0)
res = HWLOC_OBJ_INCLUDED;
else if (ret > 0)
res = HWLOC_OBJ_CONTAINS;
}
}
}
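Note on the hunk above: the separate hwloc_obj_cmp_types() wrapper is gone. hwloc_type_cmp() now returns the inclusion-style result directly, and the name-based tie-break for Misc objects is done at the call site. The sketch below illustrates that shape in isolation. It is not the hwloc code: every sketch_* name, the three-value enum, and the ordering of the placeholder types are assumptions made up for illustration.

```c
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real hwloc definitions. */
enum sketch_cmp { SKETCH_EQUAL, SKETCH_INCLUDED, SKETCH_CONTAINS };

/* Assumed ordering for this sketch only: a larger value means "deeper". */
enum sketch_type { SKETCH_TYPE_PACKAGE, SKETCH_TYPE_CORE, SKETCH_TYPE_MISC };

struct sketch_obj {
  enum sketch_type type;
  const char *name; /* only read for Misc objects in this sketch */
};

/* Type-only comparison that already answers in inclusion terms:
 * a deeper type is reported as "included", a higher one as "contains". */
static enum sketch_cmp
sketch_type_cmp(const struct sketch_obj *a, const struct sketch_obj *b)
{
  if (a->type > b->type)
    return SKETCH_INCLUDED;
  if (a->type < b->type)
    return SKETCH_CONTAINS;
  return SKETCH_EQUAL;
}

/* Call-site tie-break: two Misc objects of equal type are ordered by
 * name, so that identical sets still get a consistent ordering. */
static enum sketch_cmp
sketch_cmp_for_insert(const struct sketch_obj *obj,
                      const struct sketch_obj *child)
{
  enum sketch_cmp res = sketch_type_cmp(obj, child);
  if (res == SKETCH_EQUAL && obj->type == SKETCH_TYPE_MISC) {
    int ret = strcmp(obj->name, child->name);
    if (ret < 0)
      res = SKETCH_INCLUDED;
    else if (ret > 0)
      res = SKETCH_CONTAINS;
  }
  return res;
}
```

Because the comparison and the insertion logic share one result enum, the caller needs no translation step between a "type" result and an "object" result; that appears to be the simplification this hunk is after.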
@ -2056,18 +2024,23 @@ hwloc_connect_children(hwloc_obj_t parent)
}
/*
* Check whether there is an object below ROOT that has the same type as OBJ
* Check whether there is an object below ROOT that has the same type as OBJ.
* Only used for building levels.
* Stop at I/O or Misc since these don't go into levels, and we never have
* normal objects under them.
*/
static int
find_same_type(hwloc_obj_t root, hwloc_obj_t obj)
{
hwloc_obj_t child;
if (hwloc_type_cmp(root, obj) == HWLOC_TYPE_EQUAL)
if (hwloc_type_cmp(root, obj) == HWLOC_OBJ_EQUAL)
return 1;
for (child = root->first_child; child; child = child->next_sibling)
if (find_same_type(child, obj))
if (!hwloc_obj_type_is_io(child->type)
&& child->type != HWLOC_OBJ_MISC
&& find_same_type(child, obj))
return 1;
return 0;
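Note on the hunk above: find_same_type() no longer descends into I/O or Misc children, since those never go into levels and never have normal objects below them. A minimal, self-contained sketch of that pruned recursive search follows. The sketch_* types and the four placeholder kinds are hypothetical and exist only to make the example compile without hwloc.

```c
#include <stddef.h>

/* Hypothetical node kinds; stand-ins for hwloc object types. */
enum sketch_kind { SKETCH_NORMAL_A, SKETCH_NORMAL_B, SKETCH_IO, SKETCH_MISC };

struct sketch_node {
  enum sketch_kind kind;
  struct sketch_node *first_child;
  struct sketch_node *next_sibling;
};

/* Stand-in for an "is this an I/O type?" predicate. */
static int sketch_is_io(enum sketch_kind kind)
{
  return kind == SKETCH_IO;
}

/* Return 1 if ROOT or any object below it has kind WANTED, without
 * walking into I/O or Misc subtrees; return 0 otherwise. */
static int
sketch_find_same_kind(const struct sketch_node *root, enum sketch_kind wanted)
{
  const struct sketch_node *child;

  if (root->kind == wanted)
    return 1;

  for (child = root->first_child; child; child = child->next_sibling)
    if (!sketch_is_io(child->kind)
        && child->kind != SKETCH_MISC
        && sketch_find_same_kind(child, wanted))
      return 1;

  return 0;
}
```

In this sketch the kind checks come before the recursive call in the && chain, so a pruned subtree is never visited at all; a matching object hidden under an I/O or Misc child is deliberately not reported.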
@ -2088,7 +2061,7 @@ hwloc_level_take_objects(hwloc_obj_t top_obj,
unsigned i, j;
for (i = 0; i < n_current_objs; i++)
if (hwloc_type_cmp(top_obj, current_objs[i]) == HWLOC_TYPE_EQUAL) {
if (hwloc_type_cmp(top_obj, current_objs[i]) == HWLOC_OBJ_EQUAL) {
/* Take it, add children. */
taken_objs[taken_i++] = current_objs[i];
for (j = 0; j < current_objs[i]->arity; j++)
@ -2276,7 +2249,7 @@ hwloc_connect_levels(hwloc_topology_t topology)
/* See if this is actually the topmost object */
for (i = 0; i < n_objs; i++) {
if (hwloc_type_cmp(top_obj, objs[i]) != HWLOC_TYPE_EQUAL) {
if (hwloc_type_cmp(top_obj, objs[i]) != HWLOC_OBJ_EQUAL) {
if (find_same_type(objs[i], top_obj)) {
/* OBJS[i] is strictly above an object of the same type as TOP_OBJ, so it
* is above TOP_OBJ. */
@ -2292,7 +2265,7 @@ hwloc_connect_levels(hwloc_topology_t topology)
n_taken_objs = 0;
n_new_objs = 0;
for (i = 0; i < n_objs; i++)
if (hwloc_type_cmp(top_obj, objs[i]) == HWLOC_TYPE_EQUAL) {
if (hwloc_type_cmp(top_obj, objs[i]) == HWLOC_OBJ_EQUAL) {
n_taken_objs++;
n_new_objs += objs[i]->arity;
}
@ -3150,7 +3123,7 @@ hwloc_topology_check(struct hwloc_topology *topology)
assert(obj->logical_index == j);
/* check that all objects in the level have the same type */
if (prev) {
assert(hwloc_type_cmp(obj, prev) == HWLOC_TYPE_EQUAL);
assert(hwloc_type_cmp(obj, prev) == HWLOC_OBJ_EQUAL);
assert(prev->next_cousin == obj);
assert(obj->prev_cousin == prev);
}