Introduction

hwloc provides command line tools and a C API to obtain the hierarchical map of
key computing elements, such as: NUMA memory nodes, shared caches, processor
sockets, processor cores, and processing units (logical processors or
"threads"). hwloc also gathers various attributes such as cache and memory
information, and is portable across a variety of different operating systems
and platforms.

hwloc primarily aims at helping high-performance computing (HPC) applications,
but is also applicable to any project seeking to exploit code and/or data
locality on modern computing platforms.

Note that the hwloc project represents the merger of the libtopology project
from INRIA and the Portable Linux Processor Affinity (PLPA) sub-project from
Open MPI. Both of these prior projects are now deprecated. The first hwloc
release is essentially a "re-branding" of the libtopology code base, but with
both a few genuinely new features and a few PLPA-like features added in. More
new features and more PLPA-like features will be added to hwloc over time. See
Switching from PLPA to hwloc for more details about converting your application
from PLPA to hwloc.

hwloc supports the following operating systems:

  * Linux (including old kernels not having sysfs topology information, with
    knowledge of cpusets, offline CPUs, ScaleMP vSMP, and Kerrighed support)
  * Solaris
  * AIX
  * Darwin / OS X
  * FreeBSD and its variants, such as kFreeBSD/GNU
  * OSF/1 (a.k.a., Tru64)
  * HP-UX
  * Microsoft Windows

hwloc only reports the number of processors on unsupported operating systems;
no topology information is available.

For development and debugging purposes, hwloc also offers the ability to work
on "fake" topologies:

  * Symmetrical tree of resources generated from a list of level arities
  * Remote machine simulation through the gathering of Linux sysfs topology
    files
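
As an illustration of the first kind of fake topology, the sketch below
computes how many objects a symmetrical tree has at each depth, given its list
of level arities. It is plain C and does not call hwloc; the function name
level_counts is an assumption made for this example and is not part of the
hwloc API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not hwloc API).
 * arities[d] is the number of children of every object at depth d.
 * On return, counts[d] holds the number of objects at depth d for
 * d = 0 .. nlevels; counts must have room for nlevels + 1 entries. */
void level_counts(const unsigned *arities, size_t nlevels,
                  unsigned long *counts)
{
    size_t d;

    assert(arities != NULL && counts != NULL);

    counts[0] = 1; /* the root of the tree */
    for (d = 0; d < nlevels; d++) {
        counts[d + 1] = counts[d] * arities[d];
    }
}
```

For the arities {4, 2, 2} (4 sockets, 2 cores per socket, 2 PUs per core), this
yields 1, 4, 8, and 16 objects per level, which are the socket, core, and PU
counts of the 4-socket machine shown in the Examples section.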

hwloc can display the topology in a human-readable format, either in graphical
mode (X11), or by exporting in one of several different formats, including:
plain text, PDF, PNG, and FIG (see Examples below). Note that some of the
export formats require additional support libraries.

hwloc offers a programming interface for manipulating topologies and objects.
It also provides a powerful CPU bitmap API that is used to describe the
location of topology objects on physical/logical processors; see the
Programming interface section below. The API may also be used to bind
applications to certain cores or memory nodes. Several utility programs are
also provided to ease command-line manipulation of topology objects, binding
of processes, and so on.

Installation

hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the BSD
license. It is hosted as a sub-project of the overall Open MPI project
(http://www.open-mpi.org/). Note that hwloc does not require any functionality
from Open MPI -- it is a wholly separate (and much smaller!) project and code
base. It just happens to be hosted as part of the overall Open MPI project.

Nightly development snapshots are available on the web site. Additionally, the
code can be directly checked out of Subversion:

shell$ svn checkout http://svn.open-mpi.org/svn/hwloc/trunk hwloc-trunk
shell$ cd hwloc-trunk
shell$ ./autogen.sh

Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are required
when building from a Subversion checkout.

Installation itself follows the common GNU build process:

shell$ ./configure --prefix=...
shell$ make
shell$ make install

The hwloc command-line tool "lstopo" produces human-readable topology maps, as
mentioned above. It can also export maps to the "fig" file format. Support for
PDF, PostScript, and PNG exporting is provided if the "Cairo" development
package can be found when hwloc is configured and built. Similarly, lstopo's
XML support requires the libxml2 development package.

Examples

On a 4-socket 2-core machine with hyperthreading, the lstopo tool may show the
following outputs:

                               dudley.png

Machine (16GB)
  Socket #0 + L3 #0 (4096KB)
    L2 #0 (1024KB) + L1 #0 (16KB) + Core #0
      PU #0 (phys=0)
      PU #1 (phys=8)
    L2 #1 (1024KB) + L1 #1 (16KB) + Core #1
      PU #2 (phys=4)
      PU #3 (phys=12)
  Socket #1 + L3 #1 (4096KB)
    L2 #2 (1024KB) + L1 #2 (16KB) + Core #2
      PU #4 (phys=1)
      PU #5 (phys=9)
    L2 #3 (1024KB) + L1 #3 (16KB) + Core #3
      PU #6 (phys=5)
      PU #7 (phys=13)
  Socket #2 + L3 #2 (4096KB)
    L2 #4 (1024KB) + L1 #4 (16KB) + Core #4
      PU #8 (phys=2)
      PU #9 (phys=10)
    L2 #5 (1024KB) + L1 #5 (16KB) + Core #5
      PU #10 (phys=6)
      PU #11 (phys=14)
  Socket #3 + L3 #3 (4096KB)
    L2 #6 (1024KB) + L1 #6 (16KB) + Core #6
      PU #12 (phys=3)
      PU #13 (phys=11)
    L2 #7 (1024KB) + L1 #7 (16KB) + Core #7
      PU #14 (phys=7)
      PU #15 (phys=15)
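
In this listing, the number after "#" is the logical index assigned by hwloc,
while "phys=" is the physical index reported by the operating system. The two
numberings differ because this particular machine numbers its processors
round-robin across sockets. The sketch below reproduces that mapping for this
one listing only; the function name and the formula are assumptions derived
from the output above, not something hwloc provides or guarantees on other
machines.

```c
#include <assert.h>

/* Hypothetical helper: maps a logical PU index (0..15) of the listing
 * above to the physical index printed next to it. Valid only for this
 * specific 4-socket, 2-core, 2-thread example. */
unsigned dudley_phys_index(unsigned logical)
{
    unsigned sock, core, thread;

    assert(logical < 16);

    sock = logical / 4;         /* 4 PUs per socket */
    core = (logical / 2) % 2;   /* 2 cores per socket */
    thread = logical % 2;       /* 2 PUs per core */

    /* Physical numbering walks sockets first, then cores, then threads. */
    return sock + 4 * core + 8 * thread;
}
```

For example, logical PU #6 is the first thread of the second core of socket 1,
so the function returns 1 + 4 = 5, matching "PU #6 (phys=5)".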

On an 8-socket 2-core Opteron NUMA machine, the lstopo tool may show the
following outputs:

                               hagrid.png

Machine (64GB)
  NUMANode #0 (phys=0 8190MB) + Socket #0
    L2 #0 (1024KB) + L1 #0 (64KB) + Core #0 + PU #0 (phys=0)
    L2 #1 (1024KB) + L1 #1 (64KB) + Core #1 + PU #1 (phys=1)
  NUMANode #1 (phys=1 8192MB) + Socket #1
    L2 #2 (1024KB) + L1 #2 (64KB) + Core #2 + PU #2 (phys=2)
    L2 #3 (1024KB) + L1 #3 (64KB) + Core #3 + PU #3 (phys=3)
  NUMANode #2 (phys=2 8192MB) + Socket #2
    L2 #4 (1024KB) + L1 #4 (64KB) + Core #4 + PU #4 (phys=4)
    L2 #5 (1024KB) + L1 #5 (64KB) + Core #5 + PU #5 (phys=5)
  NUMANode #3 (phys=3 8192MB) + Socket #3
    L2 #6 (1024KB) + L1 #6 (64KB) + Core #6 + PU #6 (phys=6)
    L2 #7 (1024KB) + L1 #7 (64KB) + Core #7 + PU #7 (phys=7)
  NUMANode #4 (phys=4 8192MB) + Socket #4
    L2 #8 (1024KB) + L1 #8 (64KB) + Core #8 + PU #8 (phys=8)
    L2 #9 (1024KB) + L1 #9 (64KB) + Core #9 + PU #9 (phys=9)
  NUMANode #5 (phys=5 8192MB) + Socket #5
    L2 #10 (1024KB) + L1 #10 (64KB) + Core #10 + PU #10 (phys=10)
    L2 #11 (1024KB) + L1 #11 (64KB) + Core #11 + PU #11 (phys=11)
  NUMANode #6 (phys=6 8192MB) + Socket #6
    L2 #12 (1024KB) + L1 #12 (64KB) + Core #12 + PU #12 (phys=12)
    L2 #13 (1024KB) + L1 #13 (64KB) + Core #13 + PU #13 (phys=13)
  NUMANode #7 (phys=7 8192MB) + Socket #7
    L2 #14 (1024KB) + L1 #14 (64KB) + Core #14 + PU #14 (phys=14)
    L2 #15 (1024KB) + L1 #15 (64KB) + Core #15 + PU #15 (phys=15)

On a 2-socket quad-core Xeon (pre-Nehalem, with 2 dual-core dies in each
socket):

                               emmett.png

Machine (16GB)
  Socket #0
    L2 #0 (4096KB)
      L1 #0 (32KB) + Core #0 + PU #0 (phys=0)
      L1 #1 (32KB) + Core #1 + PU #1 (phys=4)
    L2 #1 (4096KB)
      L1 #2 (32KB) + Core #2 + PU #2 (phys=2)
      L1 #3 (32KB) + Core #3 + PU #3 (phys=6)
  Socket #1
    L2 #2 (4096KB)
      L1 #4 (32KB) + Core #4 + PU #4 (phys=1)
      L1 #5 (32KB) + Core #5 + PU #5 (phys=5)
    L2 #3 (4096KB)
      L1 #6 (32KB) + Core #6 + PU #6 (phys=3)
      L1 #7 (32KB) + Core #7 + PU #7 (phys=7)

Programming interface

The basic interface is available in hwloc.h. It essentially offers low-level
routines for advanced programmers that want to manually manipulate objects and
follow links between them. Developers should also look at hwloc/helper.h, which
provides good higher-level topology traversal examples.
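
To give an idea of what "following links between objects" means, here is a
minimal sketch that walks parent links from a processing unit up to the root
and sums the caches found on the way. The toy_obj structure and the
toy_cache_above function are simplified stand-ins invented for this example;
real hwloc objects carry many more fields and are declared in hwloc.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object model (not the hwloc structures). */
enum toy_type { TOY_MACHINE, TOY_SOCKET, TOY_CACHE, TOY_CORE, TOY_PU };

struct toy_obj {
    enum toy_type type;
    unsigned long cache_size;   /* in bytes; only used for TOY_CACHE */
    struct toy_obj *parent;     /* NULL for the root object */
};

/* Follow parent links from obj to the root. Returns the total size of
 * all caches met on the way and stores their number in *levels. */
unsigned long toy_cache_above(const struct toy_obj *obj, int *levels)
{
    unsigned long total = 0;

    assert(levels != NULL);

    *levels = 0;
    for (; obj != NULL; obj = obj->parent) {
        if (obj->type == TOY_CACHE) {
            (*levels)++;
            total += obj->cache_size;
        }
    }
    return total;
}
```

Applied to a chain modeled on the Xeon example above (PU, Core, a 32KB L1, a
4096KB L2, Socket, Machine), the function reports 2 cache levels totaling
4128KB.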

Each object contains a cpuset describing the list of processing units that it
contains. These cpusets may be used for Binding. hwloc offers an extensive
cpuset manipulation interface in hwloc/cpuset.h.
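
Conceptually, a cpuset is a set of processor indexes. The sketch below models
one as a 64-bit mask to show the kind of operations involved, including
reducing a set to a single processor before binding. It is a toy model under
stated assumptions: the toy_cpuset_* names are invented for this example, the
64-processor limit is an artifact of the sketch, and keeping the
lowest-numbered processor is a choice made here. Real hwloc cpusets are
manipulated only through the functions in hwloc/cpuset.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy cpuset: bit i is set when processor i is in the set. */
typedef uint64_t toy_cpuset_t;

/* Return a copy of set with processor cpu added. */
toy_cpuset_t toy_cpuset_set(toy_cpuset_t set, unsigned cpu)
{
    assert(cpu < 64);
    return set | ((toy_cpuset_t)1 << cpu);
}

/* Return 1 if processor cpu is in set, 0 otherwise. */
int toy_cpuset_isset(toy_cpuset_t set, unsigned cpu)
{
    assert(cpu < 64);
    return (int)((set >> cpu) & 1);
}

/* Keep only the lowest-numbered processor of set (empty stays empty). */
toy_cpuset_t toy_cpuset_singlify(toy_cpuset_t set)
{
    return set & (~set + 1); /* isolates the lowest set bit */
}
```

For a core whose two hardware threads have physical indexes 4 and 12, the set
contains both bits; after toy_cpuset_singlify only processor 4 remains, so a
process bound to the result cannot migrate between the two threads.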

Moreover, hwloc also comes with additional helpers for interoperability with
several commonly used environments. See the Interoperability with other
software section for details.

To precisely define the vocabulary used by hwloc, a Terms and Definitions
section is available and should probably be read first.

Further documentation is available in a full set of HTML pages, man pages, and
self-contained PDF files (formatted for both US letter and A4 formats) in
the source tarball in doc/doxygen-doc/. If you are building from a Subversion
checkout, you will need to have Doxygen and pdflatex installed -- the
documentation will be built during the normal "make" process. The documentation
is installed during "make install" to $prefix/share/doc/hwloc/ and your system's
default man page tree (under $prefix, of course).

The following section presents an example of API usage.

API example

The following small C example (named "hwloc-hello.c") prints the topology of
the machine and binds the process to a single logical processor of the last
core of the machine.

/* Example hwloc API program.
 *
 * Copyright © 2009 INRIA, Université Bordeaux 1
 * Copyright © 2009-2010 Cisco Systems, Inc.  All rights reserved.
 *
 * hwloc-hello.c
 */

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* free */

#include <hwloc.h>

static void print_children(hwloc_topology_t topology, hwloc_obj_t obj,
                        int depth)
{
 char string[128];
 unsigned i;

 hwloc_obj_snprintf(string, sizeof(string), topology, obj, "#", 0);
 printf("%*s%s\n", 2*depth, "", string);
 for (i = 0; i < obj->arity; i++) {
     print_children(topology, obj->children[i], depth + 1);
 }
}

int main(void)
{
 int depth;
 unsigned i;
 unsigned long size;
 int levels;
 char string[128];
 int topodepth;
 hwloc_topology_t topology;
 hwloc_cpuset_t cpuset;
 hwloc_obj_t obj;

 /* Allocate and initialize topology object. */
 hwloc_topology_init(&topology);

 /* ... Optionally, put detection configuration here to ignore
    some objects types, define a synthetic topology, etc....

    The default is to detect all the objects of the machine that
    the caller is allowed to access.  See Configure Topology
    Detection. */

 /* Perform the topology detection. */
 hwloc_topology_load(topology);

 /* Optionally, get some additional topology information
    in case we need the topology depth later. */
 topodepth = hwloc_topology_get_depth(topology);

 /*****************************************************************
  * First example:
  * Walk the topology with an array style, from level 0 (always
  * the system level) to the lowest level (always the PU level).
  *****************************************************************/
 for (depth = 0; depth < topodepth; depth++) {
     printf("*** Objects at level %d\n", depth);
     for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth);
          i++) {
         hwloc_obj_snprintf(string, sizeof(string), topology,
                    hwloc_get_obj_by_depth(topology, depth, i),
                    "#", 0);
         printf("Index %u: %s\n", i, string);
     }
 }

 /*****************************************************************
  * Second example:
  * Walk the topology with a tree style.
  *****************************************************************/
 printf("*** Printing overall tree\n");
 print_children(topology, hwloc_get_root_obj(topology), 0);

 /*****************************************************************
  * Third example:
  * Print the number of sockets.
  *****************************************************************/
 depth = hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET);
 if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) {
     printf("*** The number of sockets is unknown\n");
 } else {
     printf("*** %u socket(s)\n",
            hwloc_get_nbobjs_by_depth(topology, depth));
 }

 /*****************************************************************
  * Fourth example:
  * Compute the amount of cache that the first logical processor
  * has above it.
  *****************************************************************/
 levels = 0;
 size = 0;
 for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
      obj;
      obj = obj->parent) {
     if (obj->type == HWLOC_OBJ_CACHE) {
         levels++;
         size += obj->attr->cache.size;
     }
 }
 printf("*** Logical processor 0 has %d caches totaling %luKB\n",
        levels, size / 1024);

 /*****************************************************************
  * Fifth example:
  * Bind to only one thread of the last core of the machine.
  *
  * First find out where cores are, or else smaller sets of CPUs if
  * the OS doesn't have the notion of a "core".
  *****************************************************************/
 depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);

 /* Get last core. */
 obj = hwloc_get_obj_by_depth(topology, depth,
                hwloc_get_nbobjs_by_depth(topology, depth) - 1);
 if (obj) {
     /* Get a copy of its cpuset that we may modify. */
     cpuset = hwloc_cpuset_dup(obj->cpuset);

     /* Get only one logical processor (in case the core is
        SMT/hyperthreaded). */
     hwloc_cpuset_singlify(cpuset);

     /* And try to bind ourself there. */
     if (hwloc_set_cpubind(topology, cpuset, 0)) {
         char *str;
         hwloc_cpuset_asprintf(&str, cpuset);
         printf("Couldn't bind to cpuset %s\n", str);
         free(str);
     }

     /* Free our cpuset copy */
     hwloc_cpuset_free(cpuset);
 }

 /* Destroy topology object. */
 hwloc_topology_destroy(topology);

 return 0;
}

hwloc supports pkg-config for obtaining the relevant compiler and linker
flags. For example, it can be used as follows to compile applications that use
the hwloc library (assuming GNU Make):

CFLAGS += $(shell pkg-config --cflags hwloc)
LDLIBS += $(shell pkg-config --libs hwloc)

hwloc-hello: hwloc-hello.c
	$(CC) hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS)

On a machine with 4GB of RAM and 2 processor sockets -- each socket of which
has two processing cores -- the output from running hwloc-hello could be
something like the following:

shell$ ./hwloc-hello
*** Objects at level 0
Index 0: Machine(3938MB)
*** Objects at level 1
Index 0: Socket#0
Index 1: Socket#1
*** Objects at level 2
Index 0: Core#0
Index 1: Core#1
Index 2: Core#3
Index 3: Core#2
*** Objects at level 3
Index 0: PU#0
Index 1: PU#1
Index 2: PU#2
Index 3: PU#3
*** Printing overall tree
Machine(3938MB)
  Socket#0
    Core#0
      PU#0
    Core#1
      PU#1
  Socket#1
    Core#3
      PU#2
    Core#2
      PU#3
*** 2 socket(s)
shell$

Questions and bugs

Questions should be sent to the devel mailing list
(http://www.open-mpi.org/community/lists/hwloc.php). Bugs should be reported in
the tracker (https://svn.open-mpi.org/trac/hwloc/).

If hwloc discovers an incorrect topology for your machine, the very first thing
you should check is that you have the most recent updates installed for your
operating system. Indeed, most of hwloc's topology discovery relies on hardware
information retrieved through the operating system (e.g., via the /sys virtual
filesystem of the Linux kernel). If upgrading your OS or Linux kernel does not
solve your problem, you may also want to ensure that you are running the most
recent version of the BIOS for your machine.

If those things fail, contact us on the mailing list for additional help.

History / credits

hwloc is the evolution and merger of the libtopology
(http://runtime.bordeaux.inria.fr/libtopology/) project and the Portable Linux
Processor Affinity (PLPA) (http://www.open-mpi.org/projects/plpa/) project.
Because of functional and ideological overlap, these two code bases and ideas
were merged and released under the name "hwloc" as an Open MPI sub-project.

libtopology was initially developed by the INRIA Runtime Team-Project
(http://runtime.bordeaux.inria.fr/), headed by Raymond Namyst
(http://dept-info.labri.fr/~namyst/). PLPA was initially developed by the Open
MPI development team as a sub-project. Both are now deprecated in favor of
hwloc, which is distributed as an Open MPI sub-project.
