1
1
Граф коммитов

382 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
0e4e3af1db Remove problem installation of hwloc 2.0
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 18:18:08 -07:00
Ralph Castain
7d8d877837 Remove build product and update .gitignore to avoid picking it up again
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 11:49:48 -07:00
Gilles Gouaillardet
593e4ce63f hwloc: add hwloc2x
internal hwloc 2x is used with --with-hwloc=future

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:51 +09:00
Gilles Gouaillardet
60aa9cfcb6 hwloc: add support for hwloc v2 API
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:44 +09:00
Gilles Gouaillardet
9f29f3bff4 hwloc: since WHOLE_SYSTEM is no more used, remove useless
checks related to offline and disallowed elements

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:21 +09:00
Gilles Gouaillardet
1a34224948 hwloc: do not set the HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM flag
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:16 +09:00
Ralph Castain
2753f53e6d Detect that we have a mix of BE/LE in the system, provide a warning that OMPI doesn't currently support this environment, and error out
Fixes #2817

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-03 15:47:05 -07:00
Nathan Hjelm
ffd8ee2dfd opal: use opal_list_t convienience macros
This commit cleans up code in opal to use OPAL_LIST_FOREACH(_SAFE),
OPAL_LIST_DESTRUCT, and OPAL_LIST_RELEASE.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-20 12:37:12 -06:00
Gilles Gouaillardet
10ea991d0a hwloc: add CUDA include dir to CPPFLAGS
so hwloc configury can find nvml.h when CUDA support is built

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 11:46:22 +09:00
Gilles Gouaillardet
8d7541f766 hwloc: disable nvml is CUDA support is not built in Open MPI
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 11:07:34 +09:00
Gilles Gouaillardet
81062b7cd2 hwloc: update hwloc to 1.11.6
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-31 13:35:16 +09:00
Jeff Squyres
b8dfd49e97 hwloc: re-enable use of autogen.pl in a tarball
Commit fec519a793 broke the ability to
run autogen.pl in a distribution tarball.  This commit restores that
ability by also distributing opal/mca/hwloc/autogen.options in the
tarball.

Skipping CI because CI does not test this functionality:

[skip ci]
bot:notest

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-17 11:41:17 -07:00
Gilles Gouaillardet
7e01be60d9 hwloc: add support for hwloc v1.5
hwloc v1.5 does not support HWLOC_OBJ_OSDEV_COPROC
nor hwloc_topology_dup(), so for this version :
- do not search for coprocessors
- do not try hwloc_topology_dup(), note this is not
  used anywhere in the code base

Thanks Jeff for helping with the wording

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-03 09:39:24 +09:00
Jeff Squyres
fec519a793 hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).

This commit:

1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
   find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
   rest of the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:48:42 -08:00
Ralph Castain
223495325d Fix binding policy bug and support pe=1 modifier
Allow someone to specify the "pe=N" modifier to a mapping policy when N=1. This equates to just "bind-to core", but helps people who use a script to set the PE policy. Fix a bug where setting the binding policy left a lingering "if-supported" flag that shouldn't be there.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 14:55:17 -08:00
Ralph Castain
0c8609ca16 Update to newest PMIx master (includes configuration cleanups). Silence trivial Coverity warning in hwloc base.
Cleanup a race condition segfault during finalize by ensuring the PMIx progress thread is stopped prior to starting to tear down the messaging components

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-14 15:14:00 -08:00
Ralph Castain
ef86707fbe Deprecate the --slot-list paramaeter in favor of --cpu-list. Remove the --cpu-set param (mark it as deprecated) and use --cpu-list instead as it was confusing having the two params. The --cpu-list param defines the cpus to be used by procs of this job, and the binding policy will be overlayed on top of it.
Note: since the discovered cpus are filtered against this list, #slots will be set to the #cpus in the list if no slot values are given in a -host or -hostname specification.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-24 13:33:22 -08:00
Ralph Castain
08c76a42bb Update to latest PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Plug a minor memory leak. Tell the PMIx server not to create a dstore memory region for the daemon job as there is nobody to share it with.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Protect users of hwloc membind functions

Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Update PMIx to include NULL string protection

Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Update to PMIx master to include key overwrite protection

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-30 12:44:47 -08:00
Ralph Castain
fe68f23099 Only instantiate the HWLOC topology in an MPI process if it actually will be used.
There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced:

* shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string
  instead of the topology itself, if available, thus avoiding instantiating the topology

* openib BTL. This uses the distance matrix. At present, I haven't developed a method
  for replacing that reference. Thus, this component will instantiate the topology

* usnic BTL. Uses the distance matrix.

* treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate
  the topology

* ess base functions. If a process is direct launched and not bound at launch, this
  code attempts to bind it. Thus, procs in this scenario will instantiate the
  topology

Note that instantiating the topology on complex chips such as KNL can consume
megabytes of memory.

Fix pernode binding policy

Properly handle the unbound case

Correct pointer usage

Do not free static error messages!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-29 10:33:29 -08:00
Ralph Castain
3a2d6a5ab6 Begin to reduce reliance of application procs on the topology tree itself by having the daemon provide more detailed info. In this case, provide the topology description string so that procs can readily determine the number of types of objects on the node, and a "locality" string that describes which objects this process is executing upon. The latter allows a process to compute the objects of overlap between itself and another proc without consulting the topology tree.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-28 09:14:26 -08:00
Jeff Squyres
3571c3c5bb hwloc external: minor fixes to 9649c44
- Fix capitolization typos
- Make comment more correct / flow better
- Use AM_CPPFLAGS, not DEFAULT_INCLUDES
- Remove extra "hwloc/" from external hwloc.h specification

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-21 09:06:24 -08:00
Gilles Gouaillardet
9649c44fa0 hwloc: correctly handle --with-hwloc=external
- simply #include "hwloc.h" to use the external hwloc header
- do use the external hwloc header instead of opal/mca/hwloc/hwloc.h

Thanks Orion Poplawski for the report

Fixes open-mpi/ompi#2616

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-21 11:58:10 +09:00
Ralph Castain
585540bcee Reduce the flood of warnings due to uninitialized variables, mismatched types, and unused things to a more bearable trickle
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 16:33:50 -08:00
Gilles Gouaillardet
45732fd764 hwloc/base: fix a memory leak in buffer_cleanup()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:29 +09:00
Ralph Castain
be3197fe27 Ensure that the libevent headers are installed for external libevent when --with-devel-headers is given. Correct the path for opal_config.h in the external hwloc header 2016-10-20 20:57:50 -07:00
Gilles Gouaillardet
4e19cd51b1 hwloc/external: add a missing include file 2016-10-14 09:27:33 +09:00
Ralph Castain
a14ec3bdbc Mucho thanks to Gilles - his patch to reorder the CPPFLAGS solves the problem of inadvertently picking up hwloc and libevent headers from locations in CPPFLAGS while continuing to build the embedded versions. Also silence a minor warning about an uninitialized var. 2016-09-22 07:39:22 -07:00
Gilles Gouaillardet
cd2b5a82ed hwloc: plug memory leak
as reported by Coverity with CID 1270441
2016-09-07 10:08:44 +09:00
Jeff Squyres
7bea563e02 hwloc: fix Valgrind warning
Cherry picked from open-mpi/hwloc@d4565c351e

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-01 18:50:40 -07:00
Gilles Gouaillardet
acda07472a configury: revamp and re-ident sub configure.m4 after open-mpi/ompi@846360fd4c 2016-07-06 11:59:51 +09:00
Gilles Gouaillardet
846360fd4c configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components
Thanks Jeff for the guidance

Fixes open-mpi/ompi#1683

note:
in order to keep this commit easy to review, some AS_IF([...]) were replaced with
AS_IF([false], ...) or AS_IF_([true], ...)
these will be removed and re-idented in a subsequent commit
2016-07-06 11:57:24 +09:00
Jeff Squyres
d175fd692d README.ompi: track patches added to hwloc
Track post-v1.11.3-release patches applied to the hwloc copy embedded
in Open MPI.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-01 07:17:05 -07:00
Jeff Squyres
3867bd3640 hwloc.m4: only check for valgrind in non-embedded mode
This fixes https://github.com/open-mpi/ompi/issues/1732: i.e., the
case where the outer project has its own check for
<valgrind/valgrind.h>, but also supplements CPPFLAGS (to find
Valgrind's header files) before doing that check.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

Ideally, we would tell OMPI to disable autoconf's caching of our
valgrind check result so that its check gets the right result after
adding CPPFLAGS. Not sure if we can do that.

For now, just disable our Valgrind code in embedded mode.
This will keep the x86 backend enabled under Valgrind but
it will auto-disable itself when finding identical APIC ids anyway
(because CPUID returns same outputs for all PUs).

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>

Fixes open-mpi/ompi#1732

(cherry picked from commit open-mpi/hwloc@8b44fb1c81)
2016-06-01 06:58:53 -07:00
Jeff Squyres
5cfee95ea4 hwloc1113: add missing file to Makefile.am
Lack of this file causes a failure when you run autogen.pl on a
distribution tarball.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-31 09:57:50 -07:00
Brice Goglin
ca621330a6 Update hwloc to v1.11.3
Remove contrib/windows/
Merge hwlocXYZ/hwloc/README-ompi.txt back into hwlocXYZ/README-ompi.txt instead of having both.
Add README.txt in new automake-required directory contrib/systemd/

Keep the following patches applied since they are not in 1.11.3
    linux: actually enable libudev based on the result of AC_CHECK_LIB
    (cherry picked from open-mpi/hwloc@9549fd59af)
    configure: check the actual may_alias syntax that we use
    (cherry picked from open-mpi/hwloc@0ab7af5e90)
2016-05-20 07:20:16 +02:00
Jeff Squyres
eccf0ff4cd hwloc/external: set WRAPPER_EXTRA_* vars in proper location
WRAPPER_EXTRA flags are checked *before* the POST_CONFIG macro is
invoked.  So set them in the main CONFIG macro.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-10 07:34:56 -07:00
Brice Goglin
6839d928c2 configure: check the actual may_alias syntax that we use
xlc 13.1.0 crashes because of our may_alias attributes in nolibxml.c
on Power7. libxml.c and nolibxml.c are the only may_alias users for now,
so change our configure check to match the actual code using it.

Thanks to Paul Hargrove for reporting and debugging the issue,
and providing the patch.

https://www.open-mpi.org/community/lists/devel/2016/05/18918.php

(cherry picked from open-mpi/hwloc@0ab7af5e90)
2016-05-08 22:22:30 +02:00
Ralph Castain
7594b95e4b Ensure the hwloc external header is include when --with-devel-headers is given 2016-05-08 10:18:14 -07:00
Brice Goglin
a2a721f961 linux: actually enable libudev based on the result of AC_CHECK_LIB
instead of doing AC_CHECK_HEADERS+AC_CHECK_LIB and only using the result of the former.

Thanks to Paul Hargrove for reporting the issue (OMPI build with -m32).

(cherry picked from open-mpi/hwloc@9549fd59af)
2016-05-03 10:00:40 +02:00
Karol Mroz
e1c64e6e59 opal: standardize on max hostname length
Define OPAL_MAXHOSTNAMELEN to be either:
  (MAXHOSTNAMELEN + 1) or
  (limits.h:HOST_NAME_MAX + 1) or
  (255 + 1)

For pmix code, define above using PMIX_MAXHOSTNAMELEN.

Fixup opal layer to use the new max.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-24 08:19:47 +02:00
Gilles Gouaillardet
d529951206 hwloc: correctly count cores with at least one allowed PU
when SMT is enabled, a core must be counted as long as one of its hwthread is allowed

Thanks Ben Menadue for the report.

This fixes a regression from open-mpi/ompi@6d149554a7
2016-01-29 11:54:34 +09:00
Gilles Gouaillardet
6d149554a7 hwloc: have opal_hwloc_base_get_pu search for HWLOC_OBJ_PU when mpirun is invoked with --use-hwthread-cpus
Fixes open-mpi/ompi#1247
2016-01-26 18:10:33 +09:00
Tim Mattox
958de82471 hwloc_base_util.c: Remove newly unused variable 'i'. 2016-01-14 16:35:47 -05:00
Tim Mattox
f2d4a8d266 Replace a bit counting loop with a call to an efficient population count routine 2016-01-12 10:48:56 -05:00
Nathan Hjelm
15007b4e2b linux: use mntent.h instead of manually parsing /proc/mounts
setmntent() doesn't support root_fd, but manual parsing of
/proc/mounts is fragile, and actually buggy for very long mount lines
(see open-mpi/hwloc#142 (comment)).

Since we only openat("/proc/mounts") there, just manually concatenate
the fsroot_path and use setmntent().

Thanks to Nathan Hjelm for the report.

(Cherry-picked from open-mpi/hwloc@d2d07b9a22)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-07 12:55:03 -07:00
Nathan Hjelm
1384559fcd Update hwloc to v1.11.2
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-07 12:33:12 -07:00
Gilles Gouaillardet
b20a219ad0 hwloc/external: abort if hwloc v2 is detected since it is not yet supported 2015-12-29 09:23:27 +09:00
Gilles Gouaillardet
fec973efda configury: test portability
replace test ... -o ... with test ... || test ...
and test ... -a ... with test ... && test ...
2015-12-28 13:58:45 +09:00
Ralph Castain
64b695669a Cleanup warnings in opal and orte layers when building optimized on Mac 2015-12-17 07:51:24 -08:00
Jeff Squyres
f69364e768 hwloc: upgrade from v1.11.0 to v1.11.1
Taken from upstream v1.11.1 release.

Fixes open-mpi/ompi#981.
2015-10-15 08:58:33 -07:00
Jeff Squyres
12e796dcaf hwloc: headers are not in $includedir
They are in $opalincludedir.  Use the neutral "$pkgincludedir", which
will get translated under the covers to opalincludedir.
2015-10-13 05:59:52 -07:00
Gilles Gouaillardet
2ac09d5a8d pci: do not probe PCI topology on Solaris unless effective uid is root
Otherwise libpciaccess sends a big error message to stderr:
  Error opening /devices/pci@0,0:reg: Permission denied

(cherry picked from commit open-mpi/hwloc@d93c7c0960)
2015-09-29 09:42:58 +09:00
Gilles Gouaillardet
975b6fd51b hwloc: do not count not allowed cores in df_search_cores 2015-09-17 13:10:34 +09:00
Nathan Hjelm
899bf548a2 opal/hwloc: fix topology detection when socket is above numa
The OPAL_PROC_ON_* definitions have been changed from values to
flags. This should not cause any problems as these values were already
used as flags throughout the code base. Note, there will be a
difference between localities produced by the new code and the
old. For example, if a machine does not have a level-3 but two cores
share a level-1 or level-2 cache cache the level-3 bit will not be set
in the locality and OPAL_PROC_ON_LOCAL_L3CACHE will return 0. Before
this change it would have returned 1.

In addition the OPAL_PROC_ON_LOCAL_* macros have been simplified.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 14:17:45 -06:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
Nathan Hjelm
156ce6af21 periodic whitespace purge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:32:33 -06:00
Ralph Castain
b42545b0cb Update x86_32 cpuid assembly code. Cheery-picked from
open-mpi/hwloc@40f9978bcc
2015-07-31 11:40:38 -07:00
Ralph Castain
ed93154e43 Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line. 2015-07-07 12:52:16 -07:00
Ralph Castain
75ceec663a Now that it has been officially released, update the embedded HWLOC to 1.11.0 2015-06-28 14:07:45 -07:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Ralph Castain
ff92781ec4 Replace hwloc191 with hwloc1110
Fix hwloc compile. Ignore LAMA mapper due to deprecated hwloc functions
2015-06-13 10:11:45 -07:00
Ralph Castain
d9f23627fd Add in hwloc 1.11.0rc1 - will overwrite with final version 2015-06-04 15:35:56 -07:00
Ralph Castain
5003be5c5c If the user specifies a --map-by <foo> option, then default to bind-to <foo> unless they specify a bind-to option. If they map-by slot/node, then use the default policy based on num_procs. 2015-04-23 13:30:21 -07:00
Nathan Hjelm
33181b2543 opal: use C99 subobject naming for component initialization
This commit helps future-proof opal components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
Nathan Hjelm
3436f2917d Merge pull request #449 from hjelmn/mca_base_update
mca/base update
2015-04-16 08:41:48 -06:00
Jeff Squyres
fadc3ad01a hwloc/external/configure.m4: no need to unset
Instead, use a safe environment variable name (that is SCOPE_PUSHed and
SCOPE_POPed).
2015-04-14 07:04:01 -07:00
Ralph Castain
033418f62a Correct a typo that reversed the default binding pattern. Ensure we default bind to hwthread if user specified --use-hwthread-cpus if nprocs <= 2, and bind to hwthread if told to do so. 2015-04-10 15:58:35 -07:00
Ralph Castain
c32609b1c7 Bring over open-mpi/hwloc@f714f8d
linux: only use the device-tree on Power machines

It's available on ARM but the assumption that cpus' "reg" start at 0
is invalid.
We could make that work but the device-tree doesn't currently
bring anything better than sysfs on ARM, so don't bother for now.
2015-04-04 09:30:21 -07:00
Ralph Castain
b67b3619fc If we are using the default bindings, and one or more nodes are not setup to support binding, then don't error out - just don't bind.
Thanks to Annu Desari for pointing out the problem.
2015-03-28 08:20:24 -07:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Jeff Squyres
0c502d90cd hwloc README-ompi.txt: update for what we pulled from hwloc
Document what we pulled from the hwloc tree.
2015-03-27 06:49:42 -07:00
Brice Goglin
29ccbfd590 hwloc pci: fix bridge depth
It was setup in the PCI backend before filtering,
and partially updated after filtering in the core.
Only setup once correctly after filtering in the core.

(cherry picked from commit open-mpi/hwloc@9659653d24)

Conflicts:
	tests/hwloc/linux/40intel64-2g2n4c+pci.output
	tests/hwloc/xml/192em64t-12gr2n8c2t-distancegroups.xml
	tests/hwloc/xml/192em64t-24n8c2t-distancegroups.xml
	tests/hwloc/xml/192em64t-24n8c2t-nodistancegroups.xml
	tests/hwloc/xml/24em64t-2n6c2t-pci.xml
	tests/hwloc/xml/32em64t-2n8c2t-pci-normalio.xml
	tests/hwloc/xml/96em64t-4n4d3ca2co-pci.xml
	utils/hwloc/test-hwloc-compress-dir.input.tar.gz
	utils/hwloc/test-hwloc-compress-dir.output.tar.gz

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 06:49:39 -07:00
Brice Goglin
1905f35a1e hwloc: bitmap: fix a corner case in hwloc_bitmap_isincluded() with infinite sets
If super_set contains more allocated ulongs than sub_set,
we did not check the last ulongs.
We would return true instead of false when sub_set is
infinite while the last ulongs in super_set are not full.

This fixes tests/hwloc_bitmap_compare_inclusion on some platforms.

(cherry picked from commit open-mpi/hwloc@299e6e846f)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:44 -07:00
Brice Goglin
5c9157c547 hwloc: core: only update root->complete sets if insert succeeds
Otherwise we get spurious bits for crazy topologies such as 8em64t-2s2ca2c-buggynuma.output

Will make debug asserts easier.

(cherry picked from commit open-mpi/hwloc@546cd9330a)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:44 -07:00
Brice Goglin
dec01097f8 hwloc: groups: add complete sets when inserting distance/pci groups
Make sure we define complete cpuset/nodeset when we define groups' main cpuset/nodeset
during later insert of groups (for PCI hostbridges or distances).
Otherwise they may end up clearing child/parent complete sets which
suddenly become incoherent while they were fixed earlier.

Needed to fix allowed_nodeset meaning.

(cherry picked from commit open-mpi/hwloc@7c88d17add)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:33 -07:00
Brice Goglin
d6e415cd41 hwloc: AIX: Fix PU os_index
When looking for PUs inside R_MAXSDL rads, some AIX 6.1 releases
return one first rad without any PU.
AIX 6.1 00F63F144C00 does (on quad-power7).
AIX 6.1 00CBAAC24C00 doesn't (on 16x power6).

So we can't assume rad #x contains PU #x. But we already have the right
code to fill the cpuset from the rad, so use that to obtain the PU os_index
as well.

Cannot be used to obtain NUMA node os_index since there's no way to directly
retrieve NUMA nodes from rads (mempools seem unrelated). Just keep using #rad
for NUMA nodes os_index and document that convention when converting back in
set_membind().

Thanks to Hendryk Bockelmann and Erik Schnetter for helping debugging.

(cherry picked from commit open-mpi/hwloc@60006c7b88)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:33 -07:00
Brice Goglin
80140bbe7b hwloc: distances: when we fail to insert an intermediate group, don't try to group further above
Otherwise we'll have some NULL objects above, would be annoying.
No need to dig further, the distance matrix is likely buggy.

We still keep the inserted groups at this level (incomplete level)
because removing them is hard.

(cherry picked from commit open-mpi/hwloc@312a971ec9)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:33 -07:00
Brice Goglin
29c99156cf hwloc: pci: fix SR-IOV VF vendor/device names
Commit 626129d2818693e62b83c1cfa2ba6e058e5bed66 fixed the hwloc
device/vendor numbers obtained from libpciaccess.
But the corresponding names are still retrieved from pciaccess numbers,
so fix these numbers inside pciaccess structures before retrieving the names.

(cherry picked from commit open-mpi/hwloc@85ea6e4acc)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:32 -07:00
Brice Goglin
da164be0ef hwloc: error: point to the FAQ when displaying the big OS error message
(cherry picked from commit open-mpi/hwloc@b191f816f6)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:32 -07:00
Brice Goglin
3f96e7a271 hwloc: synthetic: Misc levels are not allowed in the synthetic description
Misc objects were used between system and machine in the past
but quickly got replaced with groups.

(cherry picked from commit open-mpi/hwloc@6c2aa6d1ea)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
050bb35feb hwloc: x86: use Group instead of Misc for unknown x2apic levels
Misc are reserved for annotating the topology, the core
doesn't like merging them. Group is more appropriate.

(cherry picked from commit open-mpi/hwloc@3c47649591)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
379c7b0d8b hwloc: x86: use ulong for cache sizes, uint won't be enough in the near future
(cherry picked from commit open-mpi/hwloc@ae82597773)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
6caf9edbea hwloc: hpux: improve hwloc_hpux_find_ldom() looking for NUMA node
hwloc_get_first_largest_obj_inside_cpuset() returns the largest/highest object,
but it could still have a child with the same cpuset.
So check children as well in case there's a matching NUMA node there.

(cherry picked from commit open-mpi/hwloc@57a1c4fbe4)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
fff1bb5dcd hwloc: core: reorder children in merge_useless_child() as well
When ignore_keep_structure is enabled, intermediate level can disappear
between parent and child, making the new child complete_cpuset smaller,
causing the child list to require a reorder just like in remove_ignored().

(cherry picked from commit open-mpi/hwloc@88afbe6b62)

Embed this related commit:
core: abstract out reorder_children(), needed when merging modifies the list of children
(cherry picked from commit open-mpi/hwloc@14db82d391)

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
77978a846e hwloc: core: fix the merging of identical objects in presence of Misc objects
If object A contains B + I/O as children, we can "ignore" I/Os and still
try to merge A and B. We now do the same for Misc objects without cpusets
instead of I/Os.

This fixes a corner case when export/reimport to XML creates a slightly
different topology (making hwloc_insert_misc fail inside a Linux cgroup).

Thanks to Dave Love for reporting the problem.

Fixes #118

(cherry picked from commit open-mpi/hwloc@650371e115)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
5427b33caf hwloc: debug: fix an overzealous assertion about the parent cpuset vs its children
When I/O are attached under a PU, removing the children's cpusets from
the parent cpuset doesn't give 0, it gives the PU cpuset.
The assertion fails on single-pu machines with I/O when --merge is given,
only one PU remains with I/O under it.

But if we insert Misc by cpuset under PU, it gives 0 as expected.

Fix the assertion accordingly.

Thanks to Thomas Van Doren for reporting the issue.

(cherry picked from commit open-mpi/hwloc@45c94c336d)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
9b59d532fc hwloc: cpuid-x86: Fix duplicate asm labels in case of heavy inlining on x86-32
hwloc_x86_discover() calls hwloc_look_x86() twice, which calls hwloc_have_x86_cpuid().
If everything gets inlined, the asm label inside hwloc_have_x86_cpuid()
is duplicated.
Use a local label with f annotation in jumps to avoid the problem.

Thanks to Thomas Van Doren for reporting the issue (found with gcc -m32).

(cherry picked from commit open-mpi/hwloc@50e447f5bc)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
86a536ca58 hwloc: x86 and OSF: Don't forget to set NUMA node nodeset
x86: Not critical since BSDs that use this backend have no membind support,
but better fix it for uniformization.
(cherry picked from commit open-mpi/hwloc@a431361c7d)

OSF: Looks like nobody ever tried to play with memory binding on OSF/Tru64.
(cherry picked from commit open-mpi/hwloc@2d6c73356d)

Conflicts:
	NEWS

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
db5bc72496 hwloc: API: clearly state that os_index isn't unique while logical_index is
(cherry picked from commit open-mpi/hwloc@6c75302ab2)

Conflicts:
	opal/mca/hwloc/hwloc191/hwloc/include/hwloc.h

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
a636790604 hwloc: opal/mca/hwloc/hwloc191/hwloc/NEWS update
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:13:41 -07:00
Brice Goglin
7c96aecfaf hwloc: errors: improve the advice to send hwloc-gather-topology files in the OS error message
(cherry picked from commit open-mpi/hwloc@f77aa01b3c)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:13:41 -07:00
Brice Goglin
d5f8c89527 hwloc: configure: fix the check for X11/Xutil.h
At least some solaris enforce the need to #include X11/Xlib.h first.

Thanks to Siegmar Gross for reporting the issue.

(cherry picked from commit open-mpi/hwloc@005a7e89b6)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:13:01 -07:00
Brice Goglin
50b035dddb hwloc: misc.h: Fix hwloc_strncasecmp() with some icc
tolower needs <ctype.h>

Thanks to Ralph Castain for reporting the failure.

(cherry picked from commit open-mpi/hwloc@038c372a58)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:10:57 -07:00
Brice Goglin
6764413aa3 hwloc: misc.h: Fix hwloc_strncasecmp() build under strict flags on BSD
strncasecmp() needs <strings.h>

Thanks to Pavan Balaji for reporting the failure.

(cherry picked from commit open-mpi/hwloc@37439c4801)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:10:46 -07:00
Brice Goglin
6b0011f138 hwloc: v1.9.1 released, doing 1.9.2rc1 now
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:06:15 -07:00
Ralph Castain
ed5d10b816 Somehow slipped by - ensure we correctly count the cores 2015-03-19 17:56:18 -07:00
Ralph Castain
43a3baad5e Ensure we use the first compute node's topology for mapping
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.

Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.

Correctly count the number of available PUs under each object when given a cpuset

Fix the default binding settings, and correctly count PUs when no cpuset is given

Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00
Jeff Squyres
4ab9e67832 hwloc external: portability updates
Change "test -a" to "&& test", and change foo="$bar" to foo=$bar.  No
substantive code changes.
2015-03-13 04:40:09 -07:00
Jeff Squyres
4d63c88ed1 hwloc external: whitespace cleanup, no code changes 2015-03-13 04:40:05 -07:00