Now that it has been officially released, update the embedded HWLOC to 1.11.0
Parent: 8a5e1611ab
Commit: 75ceec663a
@@ -15,7 +15,7 @@ enable_mpi_fortran=yes
 enable_mpi_cxx=no
 enable_mpi_cxx_seek=no
 enable_cxx_exceptions=no
-enable_mpi_java=yes
+enable_mpi_java=no
 enable_io_romio=no
 enable_contrib_no_build=libnbc
 with_memory_manager=no

@@ -1,35 +1,3 @@
-Applied the following patches from the upstream hwloc 1.9 branch after
-the v1.9.1 release:
+Applied the following patches from the upstream hwloc 1.11 branch after
+the v1.11.0 release:
 
-All relevant commits up to open-mpi/hwloc@4e23b12 (i.e., the HEAD as
-of 27 March 2015). "Relevant" commits are defined as those that
-included files that are embedded in the Open MPI tree (e.g., updates
-to files in docs/, utils/, etc. aren't relevant because they are not
-embedded in the Open MPI tree). To be specific, the following commits
-have been cherry-picked over to Open MPI:
-
-* open-mpi/hwloc@7c03216 v1.9.1 released, doing 1.9.2rc1 now
-* open-mpi/hwloc@b35ced8 misc.h: Fix hwloc_strncasecmp() build under strict flags on BSD
-* open-mpi/hwloc@d8c3f3d misc.h: Fix hwloc_strncasecmp() with some icc
-* open-mpi/hwloc@f705a23 Use gcc's __asm__ version of the asm extension, which can be used in all standards
-* open-mpi/hwloc@307726a configure: fix the check for X11/Xutil.h
-* open-mpi/hwloc@ec58c05 errors: improve the advice to send hwloc-gather-topology files in the OS error message
-* open-mpi/hwloc@35c743d NEWS update
-* open-mpi/hwloc@868170e API: clearly state that os_index isn't unique while logical_index is
-* open-mpi/hwloc@851532d x86 and OSF: Don't forget to set NUMA node nodeset
-* open-mpi/hwloc@790aa2e cpuid-x86: Fix duplicate asm labels in case of heavy inlining on x86-32
-* open-mpi/hwloc@dd09aa5 debug: fix an overzealous assertion about the parent cpuset vs its children
-* open-mpi/hwloc@769b9b5 core: fix the merging of identical objects in presence of Misc objects
-* open-mpi/hwloc@71da0f1 core: reorder children in merge_useless_child() as well
-* open-mpi/hwloc@c9cef07 hpux: improve hwloc_hpux_find_ldom() looking for NUMA node
-* open-mpi/hwloc@cdffea6 x86: use ulong for cache sizes, uint won't be enough in the near future
-* open-mpi/hwloc@55b0676 x86: use Group instead of Misc for unknown x2apic levels
-* open-mpi/hwloc@7764ce5 synthetic: Misc levels are not allowed in the synthetic description
-* open-mpi/hwloc@5b2dce1 error: point to the FAQ when displaying the big OS error message
-* open-mpi/hwloc@c7bd9e6 pci: fix SR-IOV VF vendor/device names
-* open-mpi/hwloc@a0f72ef distances: when we fail to insert an intermediate group, don't try to group further above
-* open-mpi/hwloc@e419811 AIX: Fix PU os_index
-* open-mpi/hwloc@08ab793 groups: add complete sets when inserting distance/pci groups
-* open-mpi/hwloc@c66e714 core: only update root->complete sets if insert succeeds
-* open-mpi/hwloc@01da9b9 bitmap: fix a corner case in hwloc_bitmap_isincluded() with infinite sets
-* open-mpi/hwloc@e7b192b pci: fix bridge depth

@@ -63,8 +63,10 @@ Version 1.11.0
     - Automatically scales graphical box width to the inner text in Cairo,
       ASCII and Windows outputs.
     - Add --rect to lstopo to force rectangular layout even for NUMA nodes.
-    - Objects may have a Type info attribute to specific a better type name
+    - Add --restrict-flags to configure the behavior of --restrict.
+    - Objects may have a "Type" info attribute to specify a better type name
       and display it in lstopo.
+    - Really export all verbose information to the given output file.
   + hwloc-annotate
     - May now operate on all types of objects, including I/O.
     - May now insert Misc objects in the topology.
@@ -75,12 +77,15 @@ Version 1.11.0
     thanks to Imre Kerr for reporting the problem.
   + Fix PCI Bridge-specific depth attribute.
   + Fix hwloc_bitmap_intersect() for two infinite bitmaps.
+  + Fix some corner cases in the building of levels on large NUMA machines
+    with non-uniform NUMA groups and I/Os.
   + Improve the performance of object insertion by cpuset for large
     topologies.
   + Prefix verbose XML import errors with the source name.
   + Improve pkg-config checks and error messages.
   + Fix excluding after a component with an argument in the HWLOC_COMPONENTS
     environment variable.
+* Documentation
   + Fix the recommended way in documentation and examples to allocate memory
     on some node, it should use HWLOC_MEMBIND_BIND.
     Thanks to Nicolas Bouzat for reporting the issue.

@@ -1,34 +1,33 @@
 Introduction
 
-hwloc provides command line tools and a C API to obtain the
-hierarchical map of key computing elements, such as: NUMA memory nodes,
-shared caches, processor packages, processor cores, processing units
-(logical processors or "threads") and even I/O devices. hwloc also
-gathers various attributes such as cache and memory information, and is
-portable across a variety of different operating systems and platforms.
-Additionally it may assemble the topologies of multiple machines into a
-single one so as to let applications consult the topology of an entire
-fabric or cluster at once.
+hwloc provides command line tools and a C API to obtain the hierarchical map of
+key computing elements, such as: NUMA memory nodes, shared caches, processor
+packages, processor cores, processing units (logical processors or "threads")
+and even I/O devices. hwloc also gathers various attributes such as cache and
+memory information, and is portable across a variety of different operating
+systems and platforms. Additionally it may assemble the topologies of multiple
+machines into a single one so as to let applications consult the topology of an
+entire fabric or cluster at once.
 
-hwloc primarily aims at helping high-performance computing (HPC)
-applications, but is also applicable to any project seeking to exploit
-code and/or data locality on modern computing platforms.
+hwloc primarily aims at helping high-performance computing (HPC) applications,
+but is also applicable to any project seeking to exploit code and/or data
+locality on modern computing platforms.
 
-Note that the hwloc project represents the merger of the libtopology
-project from inria and the Portable Linux Processor Affinity (PLPA)
-sub-project from Open MPI. Both of these prior projects are now
-deprecated. The first hwloc release was essentially a "re-branding" of
-the libtopology code base, but with both a few genuinely new features
-and a few PLPA-like features added in. Prior releases of hwloc included
-documentation about switching from PLPA to hwloc; this documentation
-has been dropped on the assumption that everyone who was using PLPA has
-already switched to hwloc.
+Note that the hwloc project represents the merger of the libtopology project
+from inria and the Portable Linux Processor Affinity (PLPA) sub-project from
+Open MPI. Both of these prior projects are now deprecated. The first hwloc
+release was essentially a "re-branding" of the libtopology code base, but with
+both a few genuinely new features and a few PLPA-like features added in. Prior
+releases of hwloc included documentation about switching from PLPA to hwloc;
+this documentation has been dropped on the assumption that everyone who was
+using PLPA has already switched to hwloc.
 
 hwloc supports the following operating systems:
-* Linux (including old kernels not having sysfs topology information,
-with knowledge of cpusets, offline CPUs, ScaleMP vSMP, NumaScale
-NumaConnect, and Kerrighed support) on all supported hardware,
-including Intel Xeon Phi (either standalone or as a coprocessor).
+
+* Linux (including old kernels not having sysfs topology information, with
+knowledge of cpusets, offline CPUs, ScaleMP vSMP, NumaScale NumaConnect,
+and Kerrighed support) on all supported hardware, including Intel Xeon Phi
+(either standalone or as a coprocessor).
 * Solaris
 * AIX
 * Darwin / OS X
@@ -39,127 +38,126 @@ hwloc supports the following operating systems:
 * Microsoft Windows
 * IBM BlueGene/Q Compute Node Kernel (CNK)
 
-Since it uses standard Operating System information, hwloc's support is
-mostly independant from the processor type (x86, powerpc, ...) and just
-relies on the Operating System support. The only exception to this is
-kFreeBSD, which does not support topology information, and hwloc thus
-uses an x86-only CPUID-based backend (which can be used for other OSes
-too, see the Components and plugins section).
+Since it uses standard Operating System information, hwloc's support is mostly
+independant from the processor type (x86, powerpc, ...) and just relies on the
+Operating System support. The only exception to this is kFreeBSD, which does
+not support topology information, and hwloc thus uses an x86-only CPUID-based
+backend (which can be used for other OSes too, see the Components and plugins
+section).
 
-To check whether hwloc works on a particular machine, just try to build
-it and run lstopo or lstopo-no-graphics. If some things do not look
-right (e.g. bogus or missing cache information), see Questions and Bugs
-below.
+To check whether hwloc works on a particular machine, just try to build it and
+run lstopo or lstopo-no-graphics. If some things do not look right (e.g. bogus
+or missing cache information), see Questions and Bugs below.
 
-hwloc only reports the number of processors on unsupported operating
-systems; no topology information is available.
+hwloc only reports the number of processors on unsupported operating systems;
+no topology information is available.
 
-For development and debugging purposes, hwloc also offers the ability
-to work on "fake" topologies:
-* Symmetrical tree of resources generated from a list of level
-arities
-* Remote machine simulation through the gathering of Linux sysfs
-topology files
+For development and debugging purposes, hwloc also offers the ability to work
+on "fake" topologies:
 
-hwloc can display the topology in a human-readable format, either in
-graphical mode (X11), or by exporting in one of several different
-formats, including: plain text, PDF, PNG, and FIG (see CLI Examples
-below). Note that some of the export formats require additional support
-libraries.
+* Symmetrical tree of resources generated from a list of level arities
+* Remote machine simulation through the gathering of Linux sysfs topology
+files
 
-hwloc offers a programming interface for manipulating topologies and
-objects. It also brings a powerful CPU bitmap API that is used to
-describe topology objects location on physical/logical processors. See
-the Programming Interface below. It may also be used to binding
-applications onto certain cores or memory nodes. Several utility
-programs are also provided to ease command-line manipulation of
-topology objects, binding of processes, and so on.
+hwloc can display the topology in a human-readable format, either in graphical
+mode (X11), or by exporting in one of several different formats, including:
+plain text, PDF, PNG, and FIG (see CLI Examples below). Note that some of the
+export formats require additional support libraries.
+
+hwloc offers a programming interface for manipulating topologies and objects.
+It also brings a powerful CPU bitmap API that is used to describe topology
+objects location on physical/logical processors. See the Programming Interface
+below. It may also be used to binding applications onto certain cores or memory
+nodes. Several utility programs are also provided to ease command-line
+manipulation of topology objects, binding of processes, and so on.
 
 Perl bindings are available from Bernd Kallies on CPAN.
 
 Python bindings are available from Guy Streeter:
+
 * Fedora RPM and tarball.
 * git tree (html).
 
 Installation
 
-hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the
-BSD license. It is hosted as a sub-project of the overall Open MPI
-project (http://www.open-mpi.org/). Note that hwloc does not require
-any functionality from Open MPI -- it is a wholly separate (and much
-smaller!) project and code base. It just happens to be hosted as part
-of the overall Open MPI project.
+hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the BSD
+license. It is hosted as a sub-project of the overall Open MPI project (http://
+www.open-mpi.org/). Note that hwloc does not require any functionality from
+Open MPI -- it is a wholly separate (and much smaller!) project and code base.
+It just happens to be hosted as part of the overall Open MPI project.
+
+Nightly development snapshots are available on the web site. Additionally, the
+code can be directly cloned from Git:
 
-Nightly development snapshots are available on the web site.
-Additionally, the code can be directly cloned from Git:
 shell$ git clone https://github.com/open-mpi/hwloc.git
 shell$ cd hwloc
 shell$ ./autogen.sh
 
-Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are
-required when building from a Git clone.
+Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are required
+when building from a Git clone.
 
 Installation by itself is the fairly common GNU-based process:
+
 shell$ ./configure --prefix=...
 shell$ make
 shell$ make install
 
-The hwloc command-line tool "lstopo" produces human-readable topology
-maps, as mentioned above. It can also export maps to the "fig" file
-format. Support for PDF, Postscript, and PNG exporting is provided if
-the "Cairo" development package (usually cairo-devel or libcairo2-dev)
-can be found in "lstopo" when hwloc is configured and build.
+The hwloc command-line tool "lstopo" produces human-readable topology maps, as
+mentioned above. It can also export maps to the "fig" file format. Support for
+PDF, Postscript, and PNG exporting is provided if the "Cairo" development
+package (usually cairo-devel or libcairo2-dev) can be found in "lstopo" when
+hwloc is configured and build.
 
-The hwloc core may also benefit from the following development
-packages:
-* libnuma for memory binding and migration support on Linux
-(numactl-devel or libnuma-dev package).
+The hwloc core may also benefit from the following development packages:
+
+* libnuma for memory binding and migration support on Linux (numactl-devel or
+libnuma-dev package).
 * libpciaccess for full I/O device discovery (libpciaccess-devel or
-libpciaccess-dev package). On Linux, PCI discovery may still be
-performed (without vendor/device names) even if libpciaccess cannot
-be used.
+libpciaccess-dev package). On Linux, PCI discovery may still be performed
+(without vendor/device names) even if libpciaccess cannot be used.
+
 * the AMD OpenCL implementation for OpenCL device discovery.
 * the NVIDIA CUDA Toolkit for CUDA device discovery.
 * the NVIDIA Tesla Development Kit for NVML device discovery.
-* the NV-CONTROL X extension library (NVCtrl) for NVIDIA display
-discovery.
+* the NV-CONTROL X extension library (NVCtrl) for NVIDIA display discovery.
 * libxml2 for full XML import/export support (otherwise, the internal
-minimalistic parser will only be able to import XML files that were
-exported by the same hwloc release). See Importing and exporting
-topologies from/to XML files for details. The relevant development
-package is usually libxml2-devel or libxml2-dev.
-* libudev on Linux for easier discovery of OS device information
-(otherwise hwloc will try to manually parse udev raw files). The
-relevant development package is usually libudev-devel or
-libudev-dev.
-* libtool's ltdl library for dynamic plugin loading. The relevant
-development package is usually libtool-ltdl-devel or libltdl-dev.
+minimalistic parser will only be able to import XML files that were
+exported by the same hwloc release). See Importing and exporting topologies
+from/to XML files for details. The relevant development package is usually
+libxml2-devel or libxml2-dev.
+* libudev on Linux for easier discovery of OS device information (otherwise
+hwloc will try to manually parse udev raw files). The relevant development
+package is usually libudev-devel or libudev-dev.
+* libtool's ltdl library for dynamic plugin loading. The relevant development
+package is usually libtool-ltdl-devel or libltdl-dev.
 
-PCI and XML support may be statically built inside the main hwloc
-library, or as separate dynamically-loaded plugins (see the Components
-and plugins section).
+PCI and XML support may be statically built inside the main hwloc library, or
+as separate dynamically-loaded plugins (see the Components and plugins
+section).
 
-Note that because of the possibility of GPL taint, the pciutils library
-libpci will not be used (remember that hwloc is BSD-licensed).
+Note that because of the possibility of GPL taint, the pciutils library libpci
+will not be used (remember that hwloc is BSD-licensed).
 
-Also note that if you install supplemental libraries in non-standard
-locations, hwloc's configure script may not be able to find them
-without some help. You may need to specify additional CPPFLAGS,
-LDFLAGS, or PKG_CONFIG_PATH values on the configure command line.
+Also note that if you install supplemental libraries in non-standard locations,
+hwloc's configure script may not be able to find them without some help. You
+may need to specify additional CPPFLAGS, LDFLAGS, or PKG_CONFIG_PATH values on
+the configure command line.
 
 For example, if libpciaccess was installed into /opt/pciaccess, hwloc's
-configure script may not find it be default. Try adding PKG_CONFIG_PATH
-to the ./configure command line, like this:
+configure script may not find it be default. Try adding PKG_CONFIG_PATH to the
+./configure command line, like this:
+
 ./configure PKG_CONFIG_PATH=/opt/pciaccess/lib/pkgconfig ...
 
 CLI Examples
 
-On a 4-package 2-core machine with hyper-threading, the lstopo tool may
-show the following graphical output:
+On a 4-package 2-core machine with hyper-threading, the lstopo tool may show
+the following graphical output:
 
-dudley.png
+dudley.png
 
 Here's the equivalent output in textual form:
+
 Machine (16GB)
   Package L#0 + L3 L#0 (4096KB)
     L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
@@ -190,17 +188,17 @@ Machine (16GB)
       PU L#14 (P#7)
       PU L#15 (P#15)
 
-Note that there is also an equivalent output in XML that is meant for
-exporting/importing topologies but it is hardly readable to
-human-beings (see Importing and exporting topologies from/to XML files
-for details).
+Note that there is also an equivalent output in XML that is meant for exporting
+/importing topologies but it is hardly readable to human-beings (see Importing
+and exporting topologies from/to XML files for details).
 
-On a 4-package 2-core Opteron NUMA machine, the lstopo tool may show
-the following graphical output:
+On a 4-package 2-core Opteron NUMA machine, the lstopo tool may show the
+following graphical output:
 
-hagrid.png
+hagrid.png
 
 Here's the equivalent output in textual form:
+
 Machine (32GB)
   NUMANode L#0 (P#0 8190MB) + Package L#0
     L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
@@ -215,12 +213,13 @@ Machine (32GB)
     L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
     L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)
 
-On a 2-package quad-core Xeon (pre-Nehalem, with 2 dual-core dies into
-each package):
+On a 2-package quad-core Xeon (pre-Nehalem, with 2 dual-core dies into each
+package):
 
-emmett.png
+emmett.png
 
 Here's the same output in textual form:
+
 Machine (16GB)
   Package L#0
     L2 L#0 (4096KB)
@ -239,117 +238,290 @@ Machine (16GB)
|
||||
|
||||
Programming Interface
|
||||
|
||||
The basic interface is available in hwloc.h. Some higher-level
|
||||
functions are available in hwloc/helper.h to reduce the need to
|
||||
manually manipulate objects and follow links between them.
|
||||
Documentation for all these is provided later in this document.
|
||||
Developers may also want to look at hwloc/inlines.h which contains the
|
||||
actual inline code of some hwloc.h routines, and at this document,
|
||||
The basic interface is available in hwloc.h. Some higher-level functions are
|
||||
available in hwloc/helper.h to reduce the need to manually manipulate objects
|
||||
and follow links between them. Documentation for all these is provided later in
|
||||
this document. Developers may also want to look at hwloc/inlines.h which
|
||||
contains the actual inline code of some hwloc.h routines, and at this document,
|
||||
which provides good higher-level topology traversal examples.
|
||||
|
||||
To precisely define the vocabulary used by hwloc, a Terms and
|
||||
Definitions section is available and should probably be read first.
|
||||
To precisely define the vocabulary used by hwloc, a Terms and Definitions
|
||||
section is available and should probably be read first.
|
||||
|
||||
Each hwloc object contains a cpuset describing the list of processing
|
||||
units that it contains. These bitmaps may be used for CPU binding and
|
||||
Memory binding. hwloc offers an extensive bitmap manipulation interface
|
||||
in hwloc/bitmap.h.
|
||||
Each hwloc object contains a cpuset describing the list of processing units
|
||||
that it contains. These bitmaps may be used for CPU binding and Memory binding.
|
||||
hwloc offers an extensive bitmap manipulation interface in hwloc/bitmap.h.
|
||||
|
||||
Moreover, hwloc also comes with additional helpers for interoperability
|
||||
with several commonly used environments. See the Interoperability With
|
||||
Other Software section for details.
|
||||
Moreover, hwloc also comes with additional helpers for interoperability with
|
||||
several commonly used environments. See the Interoperability With Other
|
||||
Software section for details.
|
||||
|
||||
The complete API documentation is available in a full set of HTML
|
||||
pages, man pages, and self-contained PDF files (formatted for both both
|
||||
US letter and A4 formats) in the source tarball in doc/doxygen-doc/.
|
||||
The complete API documentation is available in a full set of HTML pages, man
|
||||
pages, and self-contained PDF files (formatted for both both US letter and A4
|
||||
formats) in the source tarball in doc/doxygen-doc/.
|
||||
|
||||
NOTE: If you are building the documentation from a Git clone, you will
|
||||
need to have Doxygen and pdflatex installed -- the documentation will
|
||||
be built during the normal "make" process. The documentation is
|
||||
installed during "make install" to $prefix/share/doc/hwloc/ and your
|
||||
systems default man page tree (under $prefix, of course).
|
||||
NOTE: If you are building the documentation from a Git clone, you will need to
|
||||
have Doxygen and pdflatex installed -- the documentation will be built during
|
||||
the normal "make" process. The documentation is installed during "make install"
|
||||
to $prefix/share/doc/hwloc/ and your systems default man page tree (under
|
||||
$prefix, of course).
|
||||
|
||||
Portability
|
||||
|
||||
As shown in CLI Examples, hwloc can obtain information on a wide
|
||||
variety of hardware topologies. However, some platforms and/or
|
||||
operating system versions will only report a subset of this
|
||||
information. For example, on an PPC64-based system with 32 cores (each
|
||||
with 2 hardware threads) running a default 2.6.18-based kernel from
|
||||
RHEL 5.4, hwloc is only able to glean information about NUMA nodes and
|
||||
processor units (PUs). No information about caches, packages, or cores
|
||||
is available.
|
||||
As shown in CLI Examples, hwloc can obtain information on a wide variety of
|
||||
hardware topologies. However, some platforms and/or operating system versions
|
||||
will only report a subset of this information. For example, on an PPC64-based
|
||||
system with 32 cores (each with 2 hardware threads) running a default
|
||||
2.6.18-based kernel from RHEL 5.4, hwloc is only able to glean information
|
||||
about NUMA nodes and processor units (PUs). No information about caches,
|
||||
packages, or cores is available.
|
||||
|
||||
Similarly, Operating System have varying support for CPU and memory
|
||||
binding, e.g. while some Operating Systems provide interfaces for all
|
||||
kinds of CPU and memory bindings, some others provide only interfaces
|
||||
for a limited number of kinds of CPU and memory binding, and some do
|
||||
not provide any binding interface at all. Hwloc's binding functions
|
||||
would then simply return the ENOSYS error (Function not implemented),
|
||||
meaning that the underlying Operating System does not provide any
|
||||
interface for them. CPU binding and Memory binding provide more
|
||||
information on which hwloc binding functions should be preferred
|
||||
because interfaces for them are usually available on the supported
|
||||
Operating Systems.
|
||||
Similarly, Operating System have varying support for CPU and memory binding,
|
||||
e.g. while some Operating Systems provide interfaces for all kinds of CPU and
|
||||
memory bindings, some others provide only interfaces for a limited number of
|
||||
kinds of CPU and memory binding, and some do not provide any binding interface
|
||||
at all. Hwloc's binding functions would then simply return the ENOSYS error
|
||||
(Function not implemented), meaning that the underlying Operating System does
|
||||
not provide any interface for them. CPU binding and Memory binding provide more
|
||||
information on which hwloc binding functions should be preferred because
|
||||
interfaces for them are usually available on the supported Operating Systems.
|
||||
|
||||
Here's the graphical output from lstopo on this platform when
|
||||
Simultaneous Multi-Threading (SMT) is enabled:
|
||||
Here's the graphical output from lstopo on this platform when Simultaneous
|
||||
Multi-Threading (SMT) is enabled:
|
||||
|
||||
ppc64-with-smt.png
|
||||
ppc64-with-smt.png
|
||||
|
||||
And here's the graphical output from lstopo on this platform when SMT
|
||||
is disabled:
|
||||
And here's the graphical output from lstopo on this platform when SMT is
|
||||
disabled:
|
||||
|
||||
ppc64-without-smt.png
|
||||
ppc64-without-smt.png
|
||||
|
||||
Notice that hwloc only sees half the PUs when SMT is disabled. PU #15,
|
||||
for example, seems to change location from NUMA node #0 to #1. In
|
||||
reality, no PUs "moved" -- they were simply re-numbered when hwloc only
|
||||
saw half as many. Hence, PU #15 in the SMT-disabled picture probably
|
||||
corresponds to PU #30 in the SMT-enabled picture.
|
||||
Notice that hwloc only sees half the PUs when SMT is disabled. PU #15, for
|
||||
example, seems to change location from NUMA node #0 to #1. In reality, no PUs
|
||||
"moved" -- they were simply re-numbered when hwloc only saw half as many.
|
||||
Hence, PU #15 in the SMT-disabled picture probably corresponds to PU #30 in the
|
||||
SMT-enabled picture.
|
||||
|
||||
This same "PUs have disappeared" effect can be seen on other platforms
|
||||
-- even platforms / OSs that provide much more information than the
|
||||
above PPC64 system. This is an unfortunate side-effect of how operating
|
||||
systems report information to hwloc.
|
||||
This same "PUs have disappeared" effect can be seen on other platforms -- even
|
||||
platforms / OSs that provide much more information than the above PPC64 system.
|
||||
This is an unfortunate side-effect of how operating systems report information
|
||||
to hwloc.
|
||||
|
||||
Note that upgrading the Linux kernel on the same PPC64 system mentioned
|
||||
above to 2.6.34, hwloc is able to discover all the topology
|
||||
information. The following picture shows the entire topology layout
|
||||
when SMT is enabled:
|
||||
Note that upgrading the Linux kernel on the same PPC64 system mentioned above
|
||||
to 2.6.34, hwloc is able to discover all the topology information. The
|
||||
following picture shows the entire topology layout when SMT is enabled:
|
||||
|
||||
ppc64-full-with-smt.png
|
||||
ppc64-full-with-smt.png
|
||||
|
||||
Developers using the hwloc API or XML output for portable applications
|
||||
should therefore be extremely careful to not make any assumptions about
|
||||
the structure of data that is returned. For example, per the above
|
||||
reported PPC topology, it is not safe to assume that PUs will always be
|
||||
descendants of cores.
|
||||
Developers using the hwloc API or XML output for portable applications should
|
||||
therefore be extremely careful to not make any assumptions about the structure
|
||||
of data that is returned. For example, per the above reported PPC topology, it
|
||||
is not safe to assume that PUs will always be descendants of cores.
|
||||
|
||||
Additionally, future hardware may insert new topology elements that are
|
||||
not available in this version of hwloc. Long-lived applications that
|
||||
are meant to span multiple different hardware platforms should also be
|
||||
careful about making structure assumptions. For example, there may
|
||||
someday be an element "lower" than a PU, or perhaps a new element may
|
||||
exist between a core and a PU.
|
||||
Additionally, future hardware may insert new topology elements that are not
|
||||
available in this version of hwloc. Long-lived applications that are meant to
|
||||
span multiple different hardware platforms should also be careful about making
|
||||
structure assumptions. For example, there may someday be an element "lower"
|
||||
than a PU, or perhaps a new element may exist between a core and a PU.

API Example

The following small C example (named ``hwloc-hello.c'') prints the
topology of the machine and binds the process to a single logical
processor of the last core of the machine. More examples are
available in the doc/examples/ directory of the source tree.
The following small C example (named ``hwloc-hello.c'') prints the topology of
the machine and binds the process to a single logical processor of the last
core of the machine. More examples are available in the doc/examples/ directory
of the source tree.

/* Example hwloc API program.
 *
 * See other examples under doc/examples/ in the source tree
 * for more details.
 *
 * Copyright (c) 2009-2015 Inria. All rights reserved.
 * Copyright (c) 2009-2011 Université Bordeaux
 * Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved.
 * See COPYING in top-level directory.
 *
 * hwloc-hello.c
 */

#include <hwloc.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>  /* malloc(), free() */
#include <string.h>

static void print_children(hwloc_topology_t topology, hwloc_obj_t obj,
                           int depth)
{
    char type[32], attr[1024];
    unsigned i;

    hwloc_obj_type_snprintf(type, sizeof(type), obj, 0);
    printf("%*s%s", 2*depth, "", type);
    if (obj->os_index != (unsigned) -1)
        printf("#%u", obj->os_index);
    hwloc_obj_attr_snprintf(attr, sizeof(attr), obj, " ", 0);
    if (*attr)
        printf("(%s)", attr);
    printf("\n");
    for (i = 0; i < obj->arity; i++) {
        print_children(topology, obj->children[i], depth + 1);
    }
}

int main(void)
{
    int depth;
    unsigned i, n;
    unsigned long size;
    int levels;
    char string[128];
    int topodepth;
    hwloc_topology_t topology;
    hwloc_cpuset_t cpuset;
    hwloc_obj_t obj;

    /* Allocate and initialize topology object. */
    hwloc_topology_init(&topology);

    /* ... Optionally, put detection configuration here to ignore
       some object types, define a synthetic topology, etc....

       The default is to detect all the objects of the machine that
       the caller is allowed to access. See Configure Topology
       Detection. */

    /* Perform the topology detection. */
    hwloc_topology_load(topology);

    /* Optionally, get some additional topology information
       in case we need the topology depth later. */
    topodepth = hwloc_topology_get_depth(topology);

    /*****************************************************************
     * First example:
     * Walk the topology with an array style, from level 0 (always
     * the system level) to the lowest level (always the PU level).
     *****************************************************************/
    for (depth = 0; depth < topodepth; depth++) {
        printf("*** Objects at level %d\n", depth);
        for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth);
             i++) {
            hwloc_obj_type_snprintf(string, sizeof(string),
                                    hwloc_get_obj_by_depth(topology, depth, i), 0);
            printf("Index %u: %s\n", i, string);
        }
    }

    /*****************************************************************
     * Second example:
     * Walk the topology with a tree style.
     *****************************************************************/
    printf("*** Printing overall tree\n");
    print_children(topology, hwloc_get_root_obj(topology), 0);

    /*****************************************************************
     * Third example:
     * Print the number of packages.
     *****************************************************************/
    depth = hwloc_get_type_depth(topology, HWLOC_OBJ_PACKAGE);
    if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) {
        printf("*** The number of packages is unknown\n");
    } else {
        printf("*** %u package(s)\n",
               hwloc_get_nbobjs_by_depth(topology, depth));
    }

    /*****************************************************************
     * Fourth example:
     * Compute the amount of cache that the first logical processor
     * has above it.
     *****************************************************************/
    levels = 0;
    size = 0;
    for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
         obj;
         obj = obj->parent)
        if (obj->type == HWLOC_OBJ_CACHE) {
            levels++;
            size += obj->attr->cache.size;
        }
    printf("*** Logical processor 0 has %d caches totaling %luKB\n",
           levels, size / 1024);

    /*****************************************************************
     * Fifth example:
     * Bind to only one thread of the last core of the machine.
     *
     * First find out where cores are, or else smaller sets of CPUs if
     * the OS doesn't have the notion of a "core".
     *****************************************************************/
    depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);

    /* Get last core. */
    obj = hwloc_get_obj_by_depth(topology, depth,
                                 hwloc_get_nbobjs_by_depth(topology, depth) - 1);
    if (obj) {
        /* Get a copy of its cpuset that we may modify. */
        cpuset = hwloc_bitmap_dup(obj->cpuset);

        /* Get only one logical processor (in case the core is
           SMT/hyper-threaded). */
        hwloc_bitmap_singlify(cpuset);

        /* And try to bind ourselves there. */
        if (hwloc_set_cpubind(topology, cpuset, 0)) {
            char *str;
            int error = errno;
            /* Report the single-PU set we actually tried to bind to. */
            hwloc_bitmap_asprintf(&str, cpuset);
            printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error));
            free(str);
        }

        /* Free our cpuset copy */
        hwloc_bitmap_free(cpuset);
    }

    /*****************************************************************
     * Sixth example:
     * Allocate some memory on the last NUMA node, bind some existing
     * memory to the last NUMA node.
     *****************************************************************/
    /* Get last node. */
    n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
    if (n) {
        void *m;
        size = 1024*1024;

        obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, n - 1);
        m = hwloc_alloc_membind_nodeset(topology, size, obj->nodeset,
                                        HWLOC_MEMBIND_BIND, 0);
        /* The allocation may fail; only free what was allocated. */
        if (m)
            hwloc_free(topology, m, size);

        m = malloc(size);
        if (m) {
            hwloc_set_area_membind_nodeset(topology, m, size, obj->nodeset,
                                           HWLOC_MEMBIND_BIND, 0);
            free(m);
        }
    }

    /* Destroy topology object. */
    hwloc_topology_destroy(topology);

    return 0;
}

hwloc provides a pkg-config file (hwloc.pc) to obtain the relevant compiler and
linker flags. For example, it can be used as follows to compile applications
that use the hwloc library (assuming GNU Make):

hwloc provides a pkg-config file (hwloc.pc) to obtain the relevant compiler
and linker flags. For example, it can be used as follows to compile
applications that use the hwloc library (assuming GNU Make):

CFLAGS += $(shell pkg-config --cflags hwloc)
LDLIBS += $(shell pkg-config --libs hwloc)

hwloc-hello: hwloc-hello.c
	$(CC) hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS)

On a machine with 4GB of RAM and 2 processor packages -- each package
of which has two processing cores -- the output from running
hwloc-hello could be something like the following:
On a machine with 4GB of RAM and 2 processor packages -- each package of which
has two processing cores -- the output from running hwloc-hello could be
something like the following:

shell$ ./hwloc-hello
*** Objects at level 0
Index 0: Machine(3938MB)
@ -383,45 +555,43 @@ shell$

Questions and Bugs

Questions should be sent to the devel mailing list
(http://www.open-mpi.org/community/lists/hwloc.php). Bug reports should
be filed in the tracker (https://git.open-mpi.org/trac/hwloc/).
Questions should be sent to the devel mailing list
(http://www.open-mpi.org/community/lists/hwloc.php). Bug reports should be
filed in the tracker (https://git.open-mpi.org/trac/hwloc/).

If hwloc discovers an incorrect topology for your machine, the very
first thing you should do is ensure that you have the most recent
updates installed for your operating system. Indeed, most of hwloc's
topology discovery relies on hardware information retrieved through the
operating system (e.g., via the /sys virtual filesystem of the Linux
kernel). If upgrading your OS or Linux kernel does not solve your
problem, you may also want to ensure that you are running the most
recent version of the BIOS for your machine.
If hwloc discovers an incorrect topology for your machine, the very first thing
you should do is ensure that you have the most recent updates installed for
your operating system. Indeed, most of hwloc's topology discovery relies on
hardware information retrieved through the operating system (e.g., via the /sys
virtual filesystem of the Linux kernel). If upgrading your OS or Linux kernel
does not solve your problem, you may also want to ensure that you are running
the most recent version of the BIOS for your machine.

If those things fail, contact us on the mailing list for additional
help. Please attach the output of lstopo after passing the
--enable-debug option to ./configure and rebuilding completely, to get
debugging output. Also attach the /proc + /sys tarball generated by the
installed script hwloc-gather-topology when submitting problems about
Linux, or send the output of kstat cpu_info in the Solaris case, or the
output of sysctl hw in the Darwin or BSD cases.
If those things fail, contact us on the mailing list for additional help.
Please attach the output of lstopo after passing the --enable-debug option to
./configure and rebuilding completely, to get debugging output. Also attach the
/proc + /sys tarball generated by the installed script hwloc-gather-topology
when submitting problems about Linux, or send the output of kstat cpu_info in
the Solaris case, or the output of sysctl hw in the Darwin or BSD cases.

History / Credits

hwloc is the evolution and merger of the libtopology
(http://runtime.bordeaux.inria.fr/libtopology/) project and the
Portable Linux Processor Affinity (PLPA)
(http://www.open-mpi.org/projects/plpa/) project. Because of functional
and ideological overlap, these two code bases and ideas were merged and
released under the name "hwloc" as an Open MPI sub-project.
hwloc is the evolution and merger of the libtopology
(http://runtime.bordeaux.inria.fr/libtopology/) project and the Portable Linux
Processor Affinity (PLPA) (http://www.open-mpi.org/projects/plpa/) project.
Because of functional and ideological overlap, these two code bases and ideas
were merged and released under the name "hwloc" as an Open MPI sub-project.

libtopology was initially developed by the Inria Runtime Team-Project
(http://runtime.bordeaux.inria.fr/) (headed by Raymond Namyst
(http://dept-info.labri.fr/~namyst/)). PLPA was initially developed by
the Open MPI development team as a sub-project. Both are now deprecated
in favor of hwloc, which is distributed as an Open MPI sub-project.
libtopology was initially developed by the Inria Runtime Team-Project
(http://runtime.bordeaux.inria.fr/) (headed by Raymond Namyst
(http://dept-info.labri.fr/~namyst/)). PLPA was initially developed by the
Open MPI development team as a sub-project. Both are now deprecated in favor
of hwloc, which is distributed as an Open MPI sub-project.

Further Reading

The documentation chapters include

* Terms and Definitions
* Command-Line Tools
* Environment Variables
@ -439,8 +609,4 @@ The documentation chapters include
* Frequently Asked Questions

Make sure to have had a look at those too!
__________________________________________________________________

Generated on 5 Jun 2015 for Hardware Locality (hwloc) by doxygen 1.6.1

@ -16,17 +16,17 @@ release=0
# requirement is that it must be entirely printable ASCII characters
# and have no white space.

greek=rc2
greek=

# The date when this release was created

date="Unreleased developer copy"
date="Jun 18, 2015"

# If snapshot=1, then use the value from snapshot_version as the
# entire hwloc version (i.e., ignore major, minor, release, and
# greek). This is only set to 1 when making snapshot tarballs.
snapshot=1
snapshot_version=dev-450-g1cc3012
snapshot=0
snapshot_version=${major}.${minor}.${release}${greek}-git
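
The fields above combine into a single version string. The sketch below is a hypothetical C model of that rule; it is not the build scripts' actual code. It uses snapshot_version verbatim when snapshot=1, as the comment describes. Otherwise it assumes the plain major.minor.release form followed by greek, matching the new snapshot_version default. The real scripts may differ in details, such as how a zero release is printed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the VERSION file fields shown above. */
struct version_fields {
    unsigned major, minor, release;
    const char *greek;            /* e.g. "rc2", or "" for a final release */
    int snapshot;                 /* 1 only when making snapshot tarballs */
    const char *snapshot_version; /* used verbatim when snapshot == 1 */
};

/* Write the version string into buf (of size len) and return buf.
 * Assumes the "major.minor.release" + greek form when not a snapshot. */
static char *version_string(const struct version_fields *v,
                            char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (v->snapshot)
        snprintf(buf, len, "%s", v->snapshot_version);
    else
        snprintf(buf, len, "%u.%u.%u%s", v->major, v->minor, v->release,
                 v->greek);
    return buf;
}
```

With greek set to empty and snapshot=0, as in the new values above, this model yields a plain release string such as 1.11.0.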

# The shared library version of hwloc's public library. This version
# is maintained in accordance with the "Library Interface Versions"

@ -5,7 +5,7 @@ includedir=@includedir@

Name: hwloc
Description: Hardware locality detection and management library
Version: @VERSION@
Version: @HWLOC_VERSION@
Requires.private: @HWLOC_REQUIRES@
Cflags: -I${includedir}
Libs: -L${libdir} -lhwloc

@ -576,20 +576,6 @@ hwloc_topology_dup(hwloc_topology_t *newp,
  return -1;
}

/*
 * How to compare objects based on types.
 *
 * Note that HIGHER/LOWER is only a (consistent) heuristic, used to sort
 * objects with same cpuset consistently.
 * Only EQUAL / not EQUAL can be relied upon.
 */

enum hwloc_type_cmp_e {
  HWLOC_TYPE_HIGHER,
  HWLOC_TYPE_DEEPER,
  HWLOC_TYPE_EQUAL
};

/* WARNING: The indexes of this array MUST match the ordering of
   the obj_order_type[] array, below. Specifically, the values must
   be laid out such that:
@ -702,7 +688,15 @@ int hwloc_compare_types (hwloc_obj_type_t type1, hwloc_obj_type_t type2)
  return order1 - order2;
}

static enum hwloc_type_cmp_e
enum hwloc_obj_cmp_e {
  HWLOC_OBJ_EQUAL = HWLOC_BITMAP_EQUAL, /**< \brief Equal */
  HWLOC_OBJ_INCLUDED = HWLOC_BITMAP_INCLUDED, /**< \brief Strictly included into */
  HWLOC_OBJ_CONTAINS = HWLOC_BITMAP_CONTAINS, /**< \brief Strictly contains */
  HWLOC_OBJ_INTERSECTS = HWLOC_BITMAP_INTERSECTS, /**< \brief Intersects, but no inclusion! */
  HWLOC_OBJ_DIFFERENT = HWLOC_BITMAP_DIFFERENT /**< \brief No intersection */
};

static enum hwloc_obj_cmp_e
hwloc_type_cmp(hwloc_obj_t obj1, hwloc_obj_t obj2)
{
  hwloc_obj_type_t type1 = obj1->type;
@ -711,60 +705,52 @@ hwloc_type_cmp(hwloc_obj_t obj1, hwloc_obj_t obj2)

  compare = hwloc_compare_types(type1, type2);
  if (compare == HWLOC_TYPE_UNORDERED)
    return HWLOC_TYPE_EQUAL; /* we cannot do better */
    return HWLOC_OBJ_DIFFERENT; /* we cannot do better */
  if (compare > 0)
    return HWLOC_TYPE_DEEPER;
    return HWLOC_OBJ_INCLUDED;
  if (compare < 0)
    return HWLOC_TYPE_HIGHER;
    return HWLOC_OBJ_CONTAINS;

  /* Caches have the same type but can have different depths. */
  if (type1 == HWLOC_OBJ_CACHE) {
    if (obj1->attr->cache.depth < obj2->attr->cache.depth)
      return HWLOC_TYPE_DEEPER;
      return HWLOC_OBJ_INCLUDED;
    else if (obj1->attr->cache.depth > obj2->attr->cache.depth)
      return HWLOC_TYPE_HIGHER;
      return HWLOC_OBJ_CONTAINS;
    else if (obj1->attr->cache.type > obj2->attr->cache.type)
      /* consider icache deeper than dcache and dcache deeper than unified */
      return HWLOC_TYPE_DEEPER;
      return HWLOC_OBJ_INCLUDED;
    else if (obj1->attr->cache.type < obj2->attr->cache.type)
      /* consider icache deeper than dcache and dcache deeper than unified */
      return HWLOC_TYPE_HIGHER;
      return HWLOC_OBJ_CONTAINS;
  }

  /* Group objects have the same type but can have different depths. */
  if (type1 == HWLOC_OBJ_GROUP) {
    if (obj1->attr->group.depth == (unsigned) -1
        || obj2->attr->group.depth == (unsigned) -1)
      return HWLOC_TYPE_EQUAL;
      return HWLOC_OBJ_EQUAL;
    if (obj1->attr->group.depth < obj2->attr->group.depth)
      return HWLOC_TYPE_DEEPER;
      return HWLOC_OBJ_INCLUDED;
    else if (obj1->attr->group.depth > obj2->attr->group.depth)
      return HWLOC_TYPE_HIGHER;
      return HWLOC_OBJ_CONTAINS;
  }

  /* Bridge objects have the same type but can have different depths. */
  if (type1 == HWLOC_OBJ_BRIDGE) {
    if (obj1->attr->bridge.depth < obj2->attr->bridge.depth)
      return HWLOC_TYPE_DEEPER;
      return HWLOC_OBJ_INCLUDED;
    else if (obj1->attr->bridge.depth > obj2->attr->bridge.depth)
      return HWLOC_TYPE_HIGHER;
      return HWLOC_OBJ_CONTAINS;
  }

  return HWLOC_TYPE_EQUAL;
  return HWLOC_OBJ_EQUAL;
}
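
The following is a small self-contained model of the cache subsorting in the new hwloc_type_cmp(). The struct and enums are hypothetical stand-ins for hwloc's. The kind enum is deliberately ordered unified < data < instruction so that, as the comment above says, an instruction cache counts as deeper than a data cache, which in turn counts as deeper than a unified one. As in the new return values, "deeper" is reported as INCLUDED and "higher" as CONTAINS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache kinds, ordered so that a larger value is "deeper",
 * matching the icache/dcache/unified comment above. */
enum cache_kind { CACHE_UNIFIED, CACHE_DATA, CACHE_INSTRUCTION };

/* Hypothetical subset of the comparison results used above. */
enum cmp_result { CMP_EQUAL, CMP_INCLUDED, CMP_CONTAINS };

/* Hypothetical cache description: depth 1 is L1, depth 2 is L2, ... */
struct cache {
    unsigned depth;
    enum cache_kind kind;
};

/* A lower-level cache (smaller depth) is "deeper", i.e. INCLUDED; at equal
 * depth the kind decides; identical depth and kind compare EQUAL. */
static enum cmp_result cache_cmp(const struct cache *a, const struct cache *b)
{
    assert(a != NULL && b != NULL);
    if (a->depth < b->depth)
        return CMP_INCLUDED;
    if (a->depth > b->depth)
        return CMP_CONTAINS;
    if (a->kind > b->kind)
        return CMP_INCLUDED;
    if (a->kind < b->kind)
        return CMP_CONTAINS;
    return CMP_EQUAL;
}
```

Depth is tested before kind, so an L1 instruction cache and an L1 data cache are only ordered relative to each other, while both sit below the L2 cache.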

/*
 * How to compare objects based on cpusets.
 */

enum hwloc_obj_cmp_e {
  HWLOC_OBJ_EQUAL = HWLOC_BITMAP_EQUAL, /**< \brief Equal */
  HWLOC_OBJ_INCLUDED = HWLOC_BITMAP_INCLUDED, /**< \brief Strictly included into */
  HWLOC_OBJ_CONTAINS = HWLOC_BITMAP_CONTAINS, /**< \brief Strictly contains */
  HWLOC_OBJ_INTERSECTS = HWLOC_BITMAP_INTERSECTS, /**< \brief Intersects, but no inclusion! */
  HWLOC_OBJ_DIFFERENT = HWLOC_BITMAP_DIFFERENT /**< \brief No intersection */
};

static int
hwloc_obj_cmp_sets(hwloc_obj_t obj1, hwloc_obj_t obj2)
{
@ -786,32 +772,6 @@ hwloc_obj_cmp_sets(hwloc_obj_t obj1, hwloc_obj_t obj2)
  return hwloc_bitmap_compare_inclusion(set1, set2);
}
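
The enum hwloc_obj_cmp_e values above are defined from the HWLOC_BITMAP_* results that hwloc_bitmap_compare_inclusion() returns. The sketch below is a rough model of what each result means, comparing two single-word CPU masks. It is not hwloc's bitmap code, and its handling of empty masks is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical result codes, in the spirit of enum hwloc_obj_cmp_e above. */
enum set_cmp { SET_EQUAL, SET_INCLUDED, SET_CONTAINS, SET_INTERSECTS, SET_DIFFERENT };

/* Compare two single-word CPU masks (bit i set = CPU i present).
 * Equality is tested first, so two empty masks compare EQUAL here; an
 * empty mask against a non-empty one compares DIFFERENT. */
static enum set_cmp mask_cmp(unsigned long a, unsigned long b)
{
    unsigned long common = a & b;

    if (a == b)
        return SET_EQUAL;
    if (common == 0)
        return SET_DIFFERENT;
    if (common == a)       /* a is a strict subset of b */
        return SET_INCLUDED;
    if (common == b)       /* b is a strict subset of a */
        return SET_CONTAINS;
    return SET_INTERSECTS; /* overlap, but neither includes the other */
}
```

For example, the masks 0x3 and 0x6 share CPU 1 but neither contains the other, so they compare as INTERSECTS.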

static int
hwloc_obj_cmp_types(hwloc_obj_t obj1, hwloc_obj_t obj2)
{
  /* Same sets, subsort by type to have a consistent ordering. */
  int typeres = hwloc_type_cmp(obj1, obj2);
  if (typeres == HWLOC_TYPE_DEEPER)
    return HWLOC_OBJ_INCLUDED;
  if (typeres == HWLOC_TYPE_HIGHER)
    return HWLOC_OBJ_CONTAINS;

  /* HWLOC_TYPE_EQUAL */

  if (obj1->type == HWLOC_OBJ_MISC) {
    /* Misc objects may vary by name */
    int res = strcmp(obj1->name, obj2->name);
    if (res < 0)
      return HWLOC_OBJ_INCLUDED;
    if (res > 0)
      return HWLOC_OBJ_CONTAINS;
    if (res == 0)
      return HWLOC_OBJ_EQUAL;
  }
  /* Same sets and types! Let's hope it's coherent. */
  return HWLOC_OBJ_EQUAL;
}

/* Compare object cpusets based on complete_cpuset if defined (always correctly ordered),
 * or fall back to the main cpusets (only correctly ordered during early insert before disallowed/offline bits are cleared).
 *
@ -968,7 +928,15 @@ hwloc___insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t cur
         */
      } else {
        /* otherwise compare actual types to decide on the inclusion */
        res = hwloc_obj_cmp_types(obj, child);
        res = hwloc_type_cmp(obj, child);
        if (res == HWLOC_OBJ_EQUAL && obj->type == HWLOC_OBJ_MISC) {
          /* Misc objects may vary by name */
          int ret = strcmp(obj->name, child->name);
          if (ret < 0)
            res = HWLOC_OBJ_INCLUDED;
          else if (ret > 0)
            res = HWLOC_OBJ_CONTAINS;
        }
      }
    }
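
The new insertion code first compares types. Then, for two Misc objects that still compare EQUAL, it breaks the tie by name. The following is a hypothetical, self-contained model of that tie-break. Its names and enum are illustrative, and it assumes both names are non-NULL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of comparison results. */
enum obj_cmp { OBJ_EQUAL, OBJ_INCLUDED, OBJ_CONTAINS };

/* Refine a type comparison between two Misc objects with the same cpuset.
 * Only an EQUAL result is refined; strcmp() then provides an arbitrary
 * but consistent order between the two names. */
static enum obj_cmp misc_tiebreak(enum obj_cmp res,
                                  const char *name1, const char *name2)
{
    int ret;

    if (res != OBJ_EQUAL)
        return res;
    assert(name1 != NULL && name2 != NULL); /* assumption of this sketch */
    ret = strcmp(name1, name2);
    if (ret < 0)
        return OBJ_INCLUDED;
    if (ret > 0)
        return OBJ_CONTAINS;
    return OBJ_EQUAL;
}
```

A non-EQUAL type result is passed through unchanged, so names only matter when the types cannot tell the two objects apart.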

@ -2056,18 +2024,23 @@ hwloc_connect_children(hwloc_obj_t parent)
}

/*
 * Check whether there is an object below ROOT that has the same type as OBJ
 * Check whether there is an object below ROOT that has the same type as OBJ.
 * Only used for building levels.
 * Stop at I/O or Misc since these don't go into levels, and we never have
 * normal objects under them.
 */
static int
find_same_type(hwloc_obj_t root, hwloc_obj_t obj)
{
  hwloc_obj_t child;

  if (hwloc_type_cmp(root, obj) == HWLOC_TYPE_EQUAL)
  if (hwloc_type_cmp(root, obj) == HWLOC_OBJ_EQUAL)
    return 1;

  for (child = root->first_child; child; child = child->next_sibling)
    if (find_same_type(child, obj))
    if (!hwloc_obj_type_is_io(child->type)
        && child->type != HWLOC_OBJ_MISC
        && find_same_type(child, obj))
      return 1;

  return 0;
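
The new find_same_type() no longer descends into I/O or Misc children. The sketch below models that pruning on a hypothetical first-child/next-sibling tree. Unlike the real function, it compares a plain kind field instead of calling hwloc_type_cmp(), so it ignores cache, group and bridge depths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds; K_IO and K_MISC stand for objects outside levels. */
enum kind { K_MACHINE, K_PACKAGE, K_CORE, K_PU, K_IO, K_MISC };

/* Hypothetical tree node using first-child/next-sibling links. */
struct tnode {
    enum kind kind;
    struct tnode *first_child, *next_sibling;
};

/* Return 1 if root or any descendant has the given kind, without
 * descending into I/O or Misc children; return 0 otherwise. */
static int has_kind_below(const struct tnode *root, enum kind kind)
{
    const struct tnode *child;

    assert(root != NULL);
    if (root->kind == kind)
        return 1;
    for (child = root->first_child; child; child = child->next_sibling)
        if (child->kind != K_IO && child->kind != K_MISC
            && has_kind_below(child, kind))
            return 1;
    return 0;
}
```

Pruning these subtrees is safe only because, as the comment above states, normal objects are never placed under I/O or Misc objects.
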
@ -2088,7 +2061,7 @@ hwloc_level_take_objects(hwloc_obj_t top_obj,
  unsigned i, j;

  for (i = 0; i < n_current_objs; i++)
    if (hwloc_type_cmp(top_obj, current_objs[i]) == HWLOC_TYPE_EQUAL) {
    if (hwloc_type_cmp(top_obj, current_objs[i]) == HWLOC_OBJ_EQUAL) {
      /* Take it, add children. */
      taken_objs[taken_i++] = current_objs[i];
      for (j = 0; j < current_objs[i]->arity; j++)
@ -2276,7 +2249,7 @@ hwloc_connect_levels(hwloc_topology_t topology)

    /* See if this is actually the topmost object */
    for (i = 0; i < n_objs; i++) {
      if (hwloc_type_cmp(top_obj, objs[i]) != HWLOC_TYPE_EQUAL) {
      if (hwloc_type_cmp(top_obj, objs[i]) != HWLOC_OBJ_EQUAL) {
        if (find_same_type(objs[i], top_obj)) {
          /* OBJS[i] is strictly above an object of the same type as TOP_OBJ, so it
           * is above TOP_OBJ. */
@ -2292,7 +2265,7 @@ hwloc_connect_levels(hwloc_topology_t topology)
    n_taken_objs = 0;
    n_new_objs = 0;
    for (i = 0; i < n_objs; i++)
      if (hwloc_type_cmp(top_obj, objs[i]) == HWLOC_TYPE_EQUAL) {
      if (hwloc_type_cmp(top_obj, objs[i]) == HWLOC_OBJ_EQUAL) {
        n_taken_objs++;
        n_new_objs += objs[i]->arity;
      }
@ -3150,7 +3123,7 @@ hwloc_topology_check(struct hwloc_topology *topology)
      assert(obj->logical_index == j);
      /* check that all objects in the level have the same type */
      if (prev) {
        assert(hwloc_type_cmp(obj, prev) == HWLOC_TYPE_EQUAL);
        assert(hwloc_type_cmp(obj, prev) == HWLOC_OBJ_EQUAL);
        assert(prev->next_cousin == obj);
        assert(obj->prev_cousin == prev);
      }