Commit graph

361 commits

Author SHA1 Message Date
Gilles Gouaillardet
9649c44fa0 hwloc: correctly handle --with-hwloc=external
- simply #include "hwloc.h" to use the external hwloc header
- do use the external hwloc header instead of opal/mca/hwloc/hwloc.h

Thanks Orion Poplawski for the report

Fixes open-mpi/ompi#2616

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-21 11:58:10 +09:00
Ralph Castain
585540bcee Reduce the flood of warnings due to uninitialized variables, mismatched types, and unused things to a more bearable trickle
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 16:33:50 -08:00
Gilles Gouaillardet
45732fd764 hwloc/base: fix a memory leak in buffer_cleanup()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:29 +09:00
Ralph Castain
be3197fe27 Ensure that the libevent headers are installed for external libevent when --with-devel-headers is given. Correct the path for opal_config.h in the external hwloc header 2016-10-20 20:57:50 -07:00
Gilles Gouaillardet
4e19cd51b1 hwloc/external: add a missing include file 2016-10-14 09:27:33 +09:00
Ralph Castain
a14ec3bdbc Mucho thanks to Gilles - his patch to reorder the CPPFLAGS solves the problem of inadvertently picking up hwloc and libevent headers from locations in CPPFLAGS while continuing to build the embedded versions. Also silence a minor warning about an uninitialized var. 2016-09-22 07:39:22 -07:00
Gilles Gouaillardet
cd2b5a82ed hwloc: plug memory leak
as reported by Coverity with CID 1270441
2016-09-07 10:08:44 +09:00
Jeff Squyres
7bea563e02 hwloc: fix Valgrind warning
Cherry picked from open-mpi/hwloc@d4565c351e

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-01 18:50:40 -07:00
Gilles Gouaillardet
acda07472a configury: revamp and re-indent sub configure.m4 after open-mpi/ompi@846360fd4c 2016-07-06 11:59:51 +09:00
Gilles Gouaillardet
846360fd4c configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components
Thanks Jeff for the guidance

Fixes open-mpi/ompi#1683

note:
in order to keep this commit easy to review, some AS_IF([...]) were replaced with
AS_IF([false], ...) or AS_IF([true], ...)
these will be removed and re-indented in a subsequent commit
2016-07-06 11:57:24 +09:00
Jeff Squyres
d175fd692d README.ompi: track patches added to hwloc
Track post-v1.11.3-release patches applied to the hwloc copy embedded
in Open MPI.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-01 07:17:05 -07:00
Jeff Squyres
3867bd3640 hwloc.m4: only check for valgrind in non-embedded mode
This fixes https://github.com/open-mpi/ompi/issues/1732: i.e., the
case where the outer project has its own check for
<valgrind/valgrind.h>, but also supplements CPPFLAGS (to find
Valgrind's header files) before doing that check.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

Ideally, we would tell OMPI to disable autoconf's caching of our
valgrind check result so that its check gets the right result after
adding CPPFLAGS. Not sure if we can do that.

For now, just disable our Valgrind code in embedded mode.
This will keep the x86 backend enabled under Valgrind but
it will auto-disable itself when finding identical APIC ids anyway
(because CPUID returns same outputs for all PUs).

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>

Fixes open-mpi/ompi#1732

(cherry picked from commit open-mpi/hwloc@8b44fb1c81)
2016-06-01 06:58:53 -07:00
Jeff Squyres
5cfee95ea4 hwloc1113: add missing file to Makefile.am
Lack of this file causes a failure when you run autogen.pl on a
distribution tarball.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-31 09:57:50 -07:00
Brice Goglin
ca621330a6 Update hwloc to v1.11.3
Remove contrib/windows/
Merge hwlocXYZ/hwloc/README-ompi.txt back into hwlocXYZ/README-ompi.txt instead of having both.
Add README.txt in new automake-required directory contrib/systemd/

Keep the following patches applied since they are not in 1.11.3
    linux: actually enable libudev based on the result of AC_CHECK_LIB
    (cherry picked from open-mpi/hwloc@9549fd59af)
    configure: check the actual may_alias syntax that we use
    (cherry picked from open-mpi/hwloc@0ab7af5e90)
2016-05-20 07:20:16 +02:00
Jeff Squyres
eccf0ff4cd hwloc/external: set WRAPPER_EXTRA_* vars in proper location
WRAPPER_EXTRA flags are checked *before* the POST_CONFIG macro is
invoked.  So set them in the main CONFIG macro.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-10 07:34:56 -07:00
Brice Goglin
6839d928c2 configure: check the actual may_alias syntax that we use
xlc 13.1.0 crashes because of our may_alias attributes in nolibxml.c
on Power7. libxml.c and nolibxml.c are the only may_alias users for now,
so change our configure check to match the actual code using it.

Thanks to Paul Hargrove for reporting and debugging the issue,
and providing the patch.

https://www.open-mpi.org/community/lists/devel/2016/05/18918.php

(cherry picked from open-mpi/hwloc@0ab7af5e90)
2016-05-08 22:22:30 +02:00
Ralph Castain
7594b95e4b Ensure the hwloc external header is included when --with-devel-headers is given 2016-05-08 10:18:14 -07:00
Brice Goglin
a2a721f961 linux: actually enable libudev based on the result of AC_CHECK_LIB
instead of doing AC_CHECK_HEADERS+AC_CHECK_LIB and only using the result of the former.

Thanks to Paul Hargrove for reporting the issue (OMPI build with -m32).

(cherry picked from open-mpi/hwloc@9549fd59af)
2016-05-03 10:00:40 +02:00
Karol Mroz
e1c64e6e59 opal: standardize on max hostname length
Define OPAL_MAXHOSTNAMELEN to be either:
  (MAXHOSTNAMELEN + 1) or
  (limits.h:HOST_NAME_MAX + 1) or
  (255 + 1)

For pmix code, define above using PMIX_MAXHOSTNAMELEN.

Fixup opal layer to use the new max.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-24 08:19:47 +02:00
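The three-way fallback described in this commit can be sketched with the preprocessor. This is a minimal illustration, not the code from the commit: the `SKETCH_` names are hypothetical, and the assumption that `<sys/param.h>` and `<limits.h>` are where the platform constants come from is mine.

```c
#include <limits.h>     /* may define HOST_NAME_MAX */
#include <stddef.h>
#include <sys/param.h>  /* may define MAXHOSTNAMELEN */

/* Hypothetical stand-in for OPAL_MAXHOSTNAMELEN: pick the first limit
 * the platform offers and add one byte for the NUL terminator. */
#if defined(MAXHOSTNAMELEN)
#  define SKETCH_MAXHOSTNAMELEN (MAXHOSTNAMELEN + 1)
#elif defined(HOST_NAME_MAX)
#  define SKETCH_MAXHOSTNAMELEN (HOST_NAME_MAX + 1)
#else
#  define SKETCH_MAXHOSTNAMELEN (255 + 1)
#endif

/* A buffer declared with the macro always has room for the terminator. */
char sketch_hostname[SKETCH_MAXHOSTNAMELEN];

/* Report the buffer size so callers never hard-code it. */
size_t sketch_hostname_capacity(void)
{
    return sizeof(sketch_hostname);
}
```

Sizing every hostname buffer through one macro avoids the off-by-one mismatches that appear when some call sites include the terminator and others do not.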
Gilles Gouaillardet
d529951206 hwloc: correctly count cores with at least one allowed PU
when SMT is enabled, a core must be counted as long as one of its hwthreads is allowed

Thanks Ben Menadue for the report.

This fixes a regression from open-mpi/ompi@6d149554a7
2016-01-29 11:54:34 +09:00
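The counting rule in this commit (a core counts if any of its hardware threads is allowed) can be sketched with plain bitmasks. The struct and function names below are hypothetical, and the sketch uses a 64-bit mask in place of hwloc cpusets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a core: one bit per hardware thread (PU). */
typedef struct {
    uint64_t pu_mask;
} sketch_core_t;

/* Count the cores that own at least one PU present in allowed_pus.
 * Requiring every PU of a core to be allowed would undercount with SMT. */
size_t sketch_count_usable_cores(const sketch_core_t *cores, size_t ncores,
                                 uint64_t allowed_pus)
{
    size_t count = 0;
    for (size_t i = 0; i < ncores; i++) {
        if ((cores[i].pu_mask & allowed_pus) != 0) {
            count++;
        }
    }
    return count;
}
```

With three two-thread cores (masks 0x3, 0xC, 0x30) and allowed PUs 0x9, the first two cores each have one allowed thread and are counted, while the third is not.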
Gilles Gouaillardet
6d149554a7 hwloc: have opal_hwloc_base_get_pu search for HWLOC_OBJ_PU when mpirun is invoked with --use-hwthread-cpus
Fixes open-mpi/ompi#1247
2016-01-26 18:10:33 +09:00
Tim Mattox
958de82471 hwloc_base_util.c: Remove newly unused variable 'i'. 2016-01-14 16:35:47 -05:00
Tim Mattox
f2d4a8d266 Replace a bit counting loop with a call to an efficient population count routine 2016-01-12 10:48:56 -05:00
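The commit does not say which population count routine was used. As an illustration only, the sketch below contrasts a bit-by-bit loop with Kernighan's method, which iterates once per set bit; both function names are hypothetical.

```c
#include <stdint.h>

/* Naive version: inspects all 64 bit positions regardless of content. */
unsigned sketch_popcount_loop(uint64_t v)
{
    unsigned n = 0;
    for (int i = 0; i < 64; i++) {
        n += (unsigned)((v >> i) & 1u);
    }
    return n;
}

/* Kernighan's method: v &= v - 1 clears the lowest set bit, so the loop
 * body runs once per set bit instead of once per bit position. */
unsigned sketch_popcount(uint64_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v &= v - 1;
        n++;
    }
    return n;
}
```

Many compilers also provide a builtin that maps to a single instruction where the hardware supports it; the portable version above is the fallback shape.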
Nathan Hjelm
15007b4e2b linux: use mntent.h instead of manually parsing /proc/mounts
setmntent() doesn't support root_fd, but manual parsing of
/proc/mounts is fragile, and actually buggy for very long mount lines
(see open-mpi/hwloc#142 (comment)).

Since we only openat("/proc/mounts") there, just manually concatenate
the fsroot_path and use setmntent().

Thanks to Nathan Hjelm for the report.

(Cherry-picked from open-mpi/hwloc@d2d07b9a22)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-07 12:55:03 -07:00
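A rough sketch of the approach this commit describes: concatenate the filesystem root with "/proc/mounts" and let `setmntent()`/`getmntent()` do the parsing. The function name, its parameters, and the fixed path buffer size are assumptions for illustration; `<mntent.h>` is a glibc/Linux interface.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Look for the first mount of the given filesystem type.
 * Returns 1 and copies the mount point into dir if found, 0 if not
 * found, and -1 if the mounts file cannot be opened. */
int sketch_find_mount(const char *fsroot_path, const char *fstype,
                      char *dir, size_t dirlen)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/proc/mounts", fsroot_path);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        return -1; /* path did not fit */
    }

    FILE *fp = setmntent(path, "r");
    if (fp == NULL) {
        return -1;
    }

    int found = 0;
    struct mntent *ent;
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_type, fstype) == 0) {
            snprintf(dir, dirlen, "%s", ent->mnt_dir);
            found = 1;
            break;
        }
    }
    endmntent(fp);
    return found;
}
```

Passing an empty string as `fsroot_path` reads the live `/proc/mounts`; a non-empty root lets the same code read a saved copy of a system's tree.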
Nathan Hjelm
1384559fcd Update hwloc to v1.11.2
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-07 12:33:12 -07:00
Gilles Gouaillardet
b20a219ad0 hwloc/external: abort if hwloc v2 is detected since it is not yet supported 2015-12-29 09:23:27 +09:00
Gilles Gouaillardet
fec973efda configury: test portability
replace test ... -o ... with test ... || test ...
and test ... -a ... with test ... && test ...
2015-12-28 13:58:45 +09:00
Ralph Castain
64b695669a Cleanup warnings in opal and orte layers when building optimized on Mac 2015-12-17 07:51:24 -08:00
Jeff Squyres
f69364e768 hwloc: upgrade from v1.11.0 to v1.11.1
Taken from upstream v1.11.1 release.

Fixes open-mpi/ompi#981.
2015-10-15 08:58:33 -07:00
Jeff Squyres
12e796dcaf hwloc: headers are not in $includedir
They are in $opalincludedir.  Use the neutral "$pkgincludedir", which
will get translated under the covers to opalincludedir.
2015-10-13 05:59:52 -07:00
Gilles Gouaillardet
2ac09d5a8d pci: do not probe PCI topology on Solaris unless effective uid is root
Otherwise libpciaccess sends a big error message to stderr:
  Error opening /devices/pci@0,0:reg: Permission denied

(cherry picked from commit open-mpi/hwloc@d93c7c0960)
2015-09-29 09:42:58 +09:00
Gilles Gouaillardet
975b6fd51b hwloc: do not count not allowed cores in df_search_cores 2015-09-17 13:10:34 +09:00
Nathan Hjelm
899bf548a2 opal/hwloc: fix topology detection when socket is above numa
The OPAL_PROC_ON_* definitions have been changed from values to
flags. This should not cause any problems as these values were already
used as flags throughout the code base. Note, there will be a
difference between localities produced by the new code and the
old. For example, if a machine does not have a level-3 cache but two cores
share a level-1 or level-2 cache, the level-3 bit will not be set
in the locality and OPAL_PROC_ON_LOCAL_L3CACHE will return 0. Before
this change it would have returned 1.

In addition the OPAL_PROC_ON_LOCAL_* macros have been simplified.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 14:17:45 -06:00
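To make the value-versus-flag distinction concrete, here is a small sketch. The `SKETCH_` macros and their bit positions are hypothetical and are not the OPAL_PROC_ON_* definitions; only the idea (one bit per shared level, tested independently) comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical locality bits: one independent flag per shared level. */
#define SKETCH_ON_NODE     0x01u
#define SKETCH_ON_SOCKET   0x02u
#define SKETCH_ON_L3CACHE  0x04u
#define SKETCH_ON_L2CACHE  0x08u
#define SKETCH_ON_L1CACHE  0x10u
#define SKETCH_ON_CORE     0x20u

/* Each test looks only at its own bit. */
#define SKETCH_ON_LOCAL_L3CACHE(loc) (((loc) & SKETCH_ON_L3CACHE) != 0)
#define SKETCH_ON_LOCAL_L2CACHE(loc) (((loc) & SKETCH_ON_L2CACHE) != 0)

/* Build the locality of two processes from what they actually share. */
uint16_t sketch_locality(int same_socket, int same_l3, int same_l2)
{
    uint16_t loc = SKETCH_ON_NODE;
    if (same_socket) loc |= SKETCH_ON_SOCKET;
    if (same_l3)     loc |= SKETCH_ON_L3CACHE;
    if (same_l2)     loc |= SKETCH_ON_L2CACHE;
    return loc;
}
```

On a machine without a level-3 cache, two cores sharing a level-2 cache get `sketch_locality(1, 0, 1)`: the L2 test is true and the L3 test is false, which matches the new behavior the commit describes.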
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
Nathan Hjelm
156ce6af21 periodic whitespace purge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:32:33 -06:00
Ralph Castain
b42545b0cb Update x86_32 cpuid assembly code. Cherry-picked from
open-mpi/hwloc@40f9978bcc
2015-07-31 11:40:38 -07:00
Ralph Castain
ed93154e43 Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line. 2015-07-07 12:52:16 -07:00
Ralph Castain
75ceec663a Now that it has been officially released, update the embedded HWLOC to 1.11.0 2015-06-28 14:07:45 -07:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Ralph Castain
ff92781ec4 Replace hwloc191 with hwloc1110
Fix hwloc compile. Ignore LAMA mapper due to deprecated hwloc functions
2015-06-13 10:11:45 -07:00
Ralph Castain
d9f23627fd Add in hwloc 1.11.0rc1 - will overwrite with final version 2015-06-04 15:35:56 -07:00
Ralph Castain
5003be5c5c If the user specifies a --map-by <foo> option, then default to bind-to <foo> unless they specify a bind-to option. If they map-by slot/node, then use the default policy based on num_procs. 2015-04-23 13:30:21 -07:00
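The defaulting rule in this commit can be written as a small decision function. This is a sketch under assumptions: the enum, the function name, and the idea of passing the num_procs-based default in as a parameter are all hypothetical, since the message does not state what that default is.

```c
/* Hypothetical levels; SKETCH_UNSET means the option was not given. */
typedef enum {
    SKETCH_UNSET,
    SKETCH_SLOT,
    SKETCH_NODE,
    SKETCH_HWTHREAD,
    SKETCH_CORE,
    SKETCH_SOCKET
} sketch_level_t;

/* Resolve the binding target:
 *  1. an explicit bind-to option always wins;
 *  2. map-by slot/node (or no map-by) falls back to the default chosen
 *     from the number of processes, supplied here by the caller;
 *  3. otherwise bind to the same object type used for mapping. */
sketch_level_t sketch_resolve_binding(sketch_level_t map_by,
                                      sketch_level_t bind_to,
                                      sketch_level_t nprocs_default)
{
    if (bind_to != SKETCH_UNSET) {
        return bind_to;
    }
    if (map_by == SKETCH_UNSET || map_by == SKETCH_SLOT ||
        map_by == SKETCH_NODE) {
        return nprocs_default;
    }
    return map_by;
}
```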
Nathan Hjelm
33181b2543 opal: use C99 subobject naming for component initialization
This commit helps future-proof opal components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
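Initializing members by name uses C99 designated initializers. The struct below is a hypothetical component descriptor, not the MCA component type; it only illustrates why the style is robust when fields are added or reordered.

```c
#include <stddef.h>

/* Hypothetical component descriptor. */
typedef struct {
    const char *name;
    int major_version;
    int minor_version;
    int (*open)(void);
    int (*close)(void);
} sketch_component_t;

int sketch_open(void)
{
    return 0;
}

/* Each member is named explicitly, so inserting a new field into the
 * struct cannot silently shift values into the wrong slot. Members that
 * are not named (close, here) are zero-initialized. */
const sketch_component_t sketch_component = {
    .name = "sketch",
    .major_version = 2,
    .minor_version = 1,
    .open = sketch_open,
};
```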
Nathan Hjelm
3436f2917d Merge pull request #449 from hjelmn/mca_base_update
mca/base update
2015-04-16 08:41:48 -06:00
Jeff Squyres
fadc3ad01a hwloc/external/configure.m4: no need to unset
Instead, use a safe environment variable name (that is SCOPE_PUSHed and
SCOPE_POPed).
2015-04-14 07:04:01 -07:00
Ralph Castain
033418f62a Correct a typo that reversed the default binding pattern. Ensure we default bind to hwthread if user specified --use-hwthread-cpus if nprocs <= 2, and bind to hwthread if told to do so. 2015-04-10 15:58:35 -07:00
Ralph Castain
c32609b1c7 Bring over open-mpi/hwloc@f714f8d
linux: only use the device-tree on Power machines

It's available on ARM but the assumption that cpus' "reg" start at 0
is invalid.
We could make that work but the device-tree doesn't currently
bring anything better than sysfs on ARM, so don't bother for now.
2015-04-04 09:30:21 -07:00
Ralph Castain
b67b3619fc If we are using the default bindings, and one or more nodes are not setup to support binding, then don't error out - just don't bind.
Thanks to Annu Desari for pointing out the problem.
2015-03-28 08:20:24 -07:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Jeff Squyres
0c502d90cd hwloc README-ompi.txt: update for what we pulled from hwloc
Document what we pulled from the hwloc tree.
2015-03-27 06:49:42 -07:00
Brice Goglin
29ccbfd590 hwloc pci: fix bridge depth
It was setup in the PCI backend before filtering,
and partially updated after filtering in the core.
Only setup once correctly after filtering in the core.

(cherry picked from commit open-mpi/hwloc@9659653d24)

Conflicts:
	tests/hwloc/linux/40intel64-2g2n4c+pci.output
	tests/hwloc/xml/192em64t-12gr2n8c2t-distancegroups.xml
	tests/hwloc/xml/192em64t-24n8c2t-distancegroups.xml
	tests/hwloc/xml/192em64t-24n8c2t-nodistancegroups.xml
	tests/hwloc/xml/24em64t-2n6c2t-pci.xml
	tests/hwloc/xml/32em64t-2n8c2t-pci-normalio.xml
	tests/hwloc/xml/96em64t-4n4d3ca2co-pci.xml
	utils/hwloc/test-hwloc-compress-dir.input.tar.gz
	utils/hwloc/test-hwloc-compress-dir.output.tar.gz

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 06:49:39 -07:00
Brice Goglin
1905f35a1e hwloc: bitmap: fix a corner case in hwloc_bitmap_isincluded() with infinite sets
If super_set contains more allocated ulongs than sub_set,
we did not check the last ulongs.
We would return true instead of false when sub_set is
infinite while the last ulongs in super_set are not full.

This fixes tests/hwloc_bitmap_compare_inclusion on some platforms.

(cherry picked from commit open-mpi/hwloc@299e6e846f)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:44 -07:00
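The corner case in this commit can be reproduced with a simplified bitmap model: an array of words plus an "infinite" flag meaning every bit past the array is set. The types and names below are hypothetical and this is not the hwloc implementation, only a sketch of an inclusion test that also checks the words the sub-set does not physically store.

```c
#include <stddef.h>

/* Simplified bitmap: explicit words, then an implicit tail that is
 * all ones when infinite is nonzero and all zeros otherwise. */
typedef struct {
    const unsigned long *words;
    size_t nwords;
    int infinite;
} sketch_bitmap_t;

/* Word i of the bitmap, synthesizing the implicit tail when needed. */
static unsigned long sketch_word(const sketch_bitmap_t *b, size_t i)
{
    if (i < b->nwords) {
        return b->words[i];
    }
    return b->infinite ? ~0UL : 0UL;
}

/* Return 1 if every bit of sub is also set in super, else 0. */
int sketch_isincluded(const sketch_bitmap_t *sub, const sketch_bitmap_t *super)
{
    size_t n = sub->nwords > super->nwords ? sub->nwords : super->nwords;

    /* Compare up to the longer array, so super's extra words are
     * checked against sub's implicit tail. */
    for (size_t i = 0; i < n; i++) {
        unsigned long s = sketch_word(sub, i);
        if ((s & sketch_word(super, i)) != s) {
            return 0;
        }
    }
    /* Past both arrays only the infinite flags matter. */
    if (sub->infinite && !super->infinite) {
        return 0;
    }
    return 1;
}
```

An infinite one-word sub-set tested against a two-word super-set whose second word is 0x0F is rejected here, because the sub-set's implicit second word is all ones.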
Brice Goglin
5c9157c547 hwloc: core: only update root->complete sets if insert succeeds
Otherwise we get spurious bits for crazy topologies such as 8em64t-2s2ca2c-buggynuma.output

Will make debug asserts easier.

(cherry picked from commit open-mpi/hwloc@546cd9330a)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:44 -07:00
Brice Goglin
dec01097f8 hwloc: groups: add complete sets when inserting distance/pci groups
Make sure we define complete cpuset/nodeset when we define groups' main cpuset/nodeset
during later insert of groups (for PCI hostbridges or distances).
Otherwise they may end up clearing child/parent complete sets which
suddenly become incoherent while they were fixed earlier.

Needed to fix allowed_nodeset meaning.

(cherry picked from commit open-mpi/hwloc@7c88d17add)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:33 -07:00
Brice Goglin
d6e415cd41 hwloc: AIX: Fix PU os_index
When looking for PUs inside R_MAXSDL rads, some AIX 6.1 releases
return one first rad without any PU.
AIX 6.1 00F63F144C00 does (on quad-power7).
AIX 6.1 00CBAAC24C00 doesn't (on 16x power6).

So we can't assume rad #x contains PU #x. But we already have the right
code to fill the cpuset from the rad, so use that to obtain the PU os_index
as well.

Cannot be used to obtain NUMA node os_index since there's no way to directly
retrieve NUMA nodes from rads (mempools seem unrelated). Just keep using #rad
for NUMA nodes os_index and document that convention when converting back in
set_membind().

Thanks to Hendryk Bockelmann and Erik Schnetter for helping debugging.

(cherry picked from commit open-mpi/hwloc@60006c7b88)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:33 -07:00
Brice Goglin
80140bbe7b hwloc: distances: when we fail to insert an intermediate group, don't try to group further above
Otherwise we'll have some NULL objects above, would be annoying.
No need to dig further, the distance matrix is likely buggy.

We still keep the inserted groups at this level (incomplete level)
because removing them is hard.

(cherry picked from commit open-mpi/hwloc@312a971ec9)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:33 -07:00
Brice Goglin
29c99156cf hwloc: pci: fix SR-IOV VF vendor/device names
Commit 626129d2818693e62b83c1cfa2ba6e058e5bed66 fixed the hwloc
device/vendor numbers obtained from libpciaccess.
But the corresponding names are still retrieved from pciaccess numbers,
so fix these numbers inside pciaccess structures before retrieving the names.

(cherry picked from commit open-mpi/hwloc@85ea6e4acc)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:32 -07:00
Brice Goglin
da164be0ef hwloc: error: point to the FAQ when displaying the big OS error message
(cherry picked from commit open-mpi/hwloc@b191f816f6)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:32 -07:00
Brice Goglin
3f96e7a271 hwloc: synthetic: Misc levels are not allowed in the synthetic description
Misc objects were used between system and machine in the past
but quickly got replaced with groups.

(cherry picked from commit open-mpi/hwloc@6c2aa6d1ea)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
050bb35feb hwloc: x86: use Group instead of Misc for unknown x2apic levels
Misc are reserved for annotating the topology, the core
doesn't like merging them. Group is more appropriate.

(cherry picked from commit open-mpi/hwloc@3c47649591)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
379c7b0d8b hwloc: x86: use ulong for cache sizes, uint won't be enough in the near future
(cherry picked from commit open-mpi/hwloc@ae82597773)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
6caf9edbea hwloc: hpux: improve hwloc_hpux_find_ldom() looking for NUMA node
hwloc_get_first_largest_obj_inside_cpuset() returns the largest/highest object,
but it could still have a child with the same cpuset.
So check children as well in case there's a matching NUMA node there.

(cherry picked from commit open-mpi/hwloc@57a1c4fbe4)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
fff1bb5dcd hwloc: core: reorder children in merge_useless_child() as well
When ignore_keep_structure is enabled, intermediate level can disappear
between parent and child, making the new child complete_cpuset smaller,
causing the child list to require a reorder just like in remove_ignored().

(cherry picked from commit open-mpi/hwloc@88afbe6b62)

Embed this related commit:
core: abstract out reorder_children(), needed when merging modifies the list of children
(cherry picked from commit open-mpi/hwloc@14db82d391)

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
77978a846e hwloc: core: fix the merging of identical objects in presence of Misc objects
If object A contains B + I/O as children, we can "ignore" I/Os and still
try to merge A and B. We now do the same for Misc objects without cpusets
instead of I/Os.

This fixes a corner case when export/reimport to XML creates a slightly
different topology (making hwloc_insert_misc fail inside a Linux cgroup).

Thanks to Dave Love for reporting the problem.

Fixes #118

(cherry picked from commit open-mpi/hwloc@650371e115)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
5427b33caf hwloc: debug: fix an overzealous assertion about the parent cpuset vs its children
When I/O are attached under a PU, removing the children's cpusets from
the parent cpuset doesn't give 0, it gives the PU cpuset.
The assertion fails on single-pu machines with I/O when --merge is given,
only one PU remains with I/O under it.

But if we insert Misc by cpuset under PU, it gives 0 as expected.

Fix the assertion accordingly.

Thanks to Thomas Van Doren for reporting the issue.

(cherry picked from commit open-mpi/hwloc@45c94c336d)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
9b59d532fc hwloc: cpuid-x86: Fix duplicate asm labels in case of heavy inlining on x86-32
hwloc_x86_discover() calls hwloc_look_x86() twice, which calls hwloc_have_x86_cpuid().
If everything gets inlined, the asm label inside hwloc_have_x86_cpuid()
is duplicated.
Use a local label with f annotation in jumps to avoid the problem.

Thanks to Thomas Van Doren for reporting the issue (found with gcc -m32).

(cherry picked from commit open-mpi/hwloc@50e447f5bc)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
86a536ca58 hwloc: x86 and OSF: Don't forget to set NUMA node nodeset
x86: Not critical since BSDs that use this backend have no membind support,
but better fix it for uniformization.
(cherry picked from commit open-mpi/hwloc@a431361c7d)

OSF: Looks like nobody ever tried to play with memory binding on OSF/Tru64.
(cherry picked from commit open-mpi/hwloc@2d6c73356d)

Conflicts:
	NEWS

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
db5bc72496 hwloc: API: clearly state that os_index isn't unique while logical_index is
(cherry picked from commit open-mpi/hwloc@6c75302ab2)

Conflicts:
	opal/mca/hwloc/hwloc191/hwloc/include/hwloc.h

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
a636790604 hwloc: opal/mca/hwloc/hwloc191/hwloc/NEWS update
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:13:41 -07:00
Brice Goglin
7c96aecfaf hwloc: errors: improve the advice to send hwloc-gather-topology files in the OS error message
(cherry picked from commit open-mpi/hwloc@f77aa01b3c)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:13:41 -07:00
Brice Goglin
d5f8c89527 hwloc: configure: fix the check for X11/Xutil.h
At least some Solaris versions enforce the need to #include X11/Xlib.h first.

Thanks to Siegmar Gross for reporting the issue.

(cherry picked from commit open-mpi/hwloc@005a7e89b6)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:13:01 -07:00
Brice Goglin
50b035dddb hwloc: misc.h: Fix hwloc_strncasecmp() with some icc
tolower needs <ctype.h>

Thanks to Ralph Castain for reporting the failure.

(cherry picked from commit open-mpi/hwloc@038c372a58)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:10:57 -07:00
Brice Goglin
6764413aa3 hwloc: misc.h: Fix hwloc_strncasecmp() build under strict flags on BSD
strncasecmp() needs <strings.h>

Thanks to Pavan Balaji for reporting the failure.

(cherry picked from commit open-mpi/hwloc@37439c4801)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:10:46 -07:00
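Both of these fixes are about headers: `tolower()` is declared in `<ctype.h>` and `strncasecmp()` in `<strings.h>`. The sketch below shows a self-contained case-insensitive comparison of the kind a fallback might use when the system function is unavailable; the function name is hypothetical and this is not the hwloc code.

```c
#include <ctype.h>   /* tolower(); omitting this is the icc failure mode */
#include <stddef.h>

/* Compare at most n characters, ignoring case. Returns 0 when equal,
 * a negative or positive value otherwise. */
int sketch_strncasecmp(const char *a, const char *b, size_t n)
{
    while (n-- > 0) {
        /* Convert through unsigned char: passing a negative char to
         * tolower() is undefined behavior. */
        unsigned char ca = (unsigned char)*a++;
        unsigned char cb = (unsigned char)*b++;
        int la = tolower(ca);
        int lb = tolower(cb);
        if (la != lb) {
            return la - lb;
        }
        if (ca == '\0') {
            return 0; /* both strings ended together */
        }
    }
    return 0;
}
```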
Brice Goglin
6b0011f138 hwloc: v1.9.1 released, doing 1.9.2rc1 now
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:06:15 -07:00
Ralph Castain
ed5d10b816 Somehow slipped by - ensure we correctly count the cores 2015-03-19 17:56:18 -07:00
Ralph Castain
43a3baad5e Ensure we use the first compute node's topology for mapping
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.

Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.

Correctly count the number of available PUs under each object when given a cpuset

Fix the default binding settings, and correctly count PUs when no cpuset is given

Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00
Jeff Squyres
4ab9e67832 hwloc external: portability updates
Change "test -a" to "&& test", and change foo="$bar" to foo=$bar.  No
substantive code changes.
2015-03-13 04:40:09 -07:00
Jeff Squyres
4d63c88ed1 hwloc external: whitespace cleanup, no code changes 2015-03-13 04:40:05 -07:00
Nysal Jan K.A
881a9f3d58 Fix cache line size detection on power
Due to the nature of the cache architecture on power,
we don't export coherency_line_size for L2 in sysfs.
If we are unable to get the L2 cache line size, try L1.

See open-mpi/ompi#383 for more information.
2015-02-25 17:26:28 +05:30
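The fallback described here (use the L2 line size, and if it cannot be read, use L1) might look like the following. The function names are hypothetical, and the caller is assumed to supply the two sysfs `coherency_line_size` paths, since which cache index corresponds to which level varies between systems.

```c
#include <stdio.h>

/* Read a single unsigned integer from a file; 0 means "unavailable". */
static unsigned sketch_read_uint(const char *path)
{
    unsigned value = 0;
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return 0;
    }
    if (fscanf(fp, "%u", &value) != 1) {
        value = 0;
    }
    fclose(fp);
    return value;
}

/* Prefer the L2 line size; fall back to L1 when L2 does not export one.
 * Returns 0 if neither file yields a value. */
unsigned sketch_cache_line_size(const char *l2_path, const char *l1_path)
{
    unsigned size = sketch_read_uint(l2_path);
    if (size == 0) {
        size = sketch_read_uint(l1_path);
    }
    return size;
}
```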
Howard Pritchard
c9e81b54fb Merge pull request #412 from hppritcha/topic/owner_files
add owner files to opal/ompi/orte mca directories
2015-02-23 09:48:20 -07:00
Gilles Gouaillardet
8d44d7086a hwloc/base: fix misc memory leaks
as reported by Coverity with CIDs 710636 and 1270441
2015-02-23 13:55:04 +09:00
Howard Pritchard
bf89131f9e add owner files to opal/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Gilles Gouaillardet
55948f2a6d hwloc: fix misc memory leak
as reported by Coverity with CID 1270441
(previous commit open-mpi/ompi@c25185f3a9 did not fully fix that one)
2015-02-17 14:06:15 +09:00
Gilles Gouaillardet
c25185f3a9 opal/hwloc: fix misc memory leaks
as reported by Coverity with CIDs 710631-710638, 1196705,
1196716, 1196717, 1196752, 1196753
2015-02-16 12:23:37 +09:00
Gilles Gouaillardet
8dd77c692e opal/hwloc: fix misc bugs
as reported by Coverity with CIDs 72224, 703566,
1196821, 1196842, 1196657 and 1196658
2015-02-16 11:59:48 +09:00
Gilles Gouaillardet
f6da257477 configury: test external hwloc version is 1.8 or greater
hwloc_topology_dup is only available from hwloc 1.8
2014-12-22 13:42:38 +09:00
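The commit performs this test in the configure scripts. The same condition can be sketched in C, assuming hwloc's `HWLOC_API_VERSION` encoding in which 1.8 is `0x00010800` (major in bits 16 and up, minor in bits 8 to 15); the macro and function names here are hypothetical.

```c
/* Assumed encoding of hwloc 1.8 in HWLOC_API_VERSION form. */
#define SKETCH_HWLOC_MIN_API 0x00010800u

/* Nonzero if the given API version is new enough to provide
 * hwloc_topology_dup(), which first appeared in hwloc 1.8. */
int sketch_hwloc_has_topology_dup(unsigned api_version)
{
    return api_version >= SKETCH_HWLOC_MIN_API;
}
```

In real code the argument would be the `HWLOC_API_VERSION` macro from the external `hwloc.h`, typically checked at compile time with `#if` rather than at run time.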
Jeff Squyres
40dd4c5b76 configury: manually remove some stamp-h? files
Due to what might be a bug in Automake, we need to remove stamp-h?
files manually.  See
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19418.
2014-12-20 08:32:57 -08:00
Ralph Castain
123fdd603f If we are using hwthread cpus, then default to binding there, letting the user override to whatever they want 2014-12-19 08:04:28 -08:00
Jeff Squyres
140bb3d421 hwloc configure: fix typo -- add missing $
Arrgh!  Missed a "$" in the last commit, making the test always
false.
2014-12-18 10:25:43 -08:00
Jeff Squyres
be6d46490f hwloc: only add CPPFLAGS if hwloc is actually being built
As pointed out by @ggouaillardet, we were adding some unnecessary -I
flags to CPPFLAGS when --without-hwloc was being used.  This commit
slightly updates the hwloc191 component configury to only add such
things when the component is, in fact, going to be
compiled/installed.
2014-12-18 08:56:49 -08:00
Ralph Castain
0630680f36 Two cleanups required for transfer to 1.8.4:
* Use %d format for the topo signature as some systems apparently have problems with %u
* Use correct variable in show_help message
2014-12-12 17:23:32 -08:00
Ralph Castain
18d9fdfd8d Restore full topology comparison to support inventory monitoring 2014-12-09 01:33:06 -08:00
Ralph Castain
9b2f8cd840 Add the processor architecture to the topology signature 2014-12-09 01:17:00 -08:00
Ralph Castain
bb529ebd8e Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pick up major differences (e.g., where we have different numbers of sockets or assigned core bindings).
Retain the hetero-nodes flag for those cases where the user *knows* that there are differences and our automated system isn't good enough to see it.

Will obviously require further refinement as we find out which variances it can detect, and which it cannot.
2014-12-08 15:38:14 -08:00
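A signature built from per-type object counts could be sketched as below. The string format, the struct, and the function names are hypothetical and are not the format used by mpirun; the sketch only shows the idea of comparing compact summaries instead of full topologies.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical summary: how many objects of each type a node has. */
typedef struct {
    unsigned sockets;
    unsigned cores;
    unsigned pus;
} sketch_counts_t;

/* Render the counts as a short string that fits on a command line.
 * Returns 0 on success, -1 if the buffer is too small. */
int sketch_signature(const sketch_counts_t *c, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%uS:%uC:%uH", c->sockets, c->cores, c->pus);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* A daemon compares its own signature with the one it was handed. */
int sketch_same_topology(const char *local_sig, const char *remote_sig)
{
    return strcmp(local_sig, remote_sig) == 0;
}
```

As the commit message notes, equal counts do not prove identical topologies, so a mismatch is conclusive while a match is only a strong hint.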
Ralph Castain
cb15cc06e1 Minor changes per Jeff's request on PR for 1.8.4 2014-12-02 19:54:10 -08:00
Ralph Castain
960ef34988 Ensure the LSF ras adds the hosts to the allocation. Correctly handle the semi-colon vs comma situation in hwloc slot_lists 2014-11-30 14:37:37 -08:00
Ralph Castain
3f9d9ae8b6 Provide tighter LSF integration by correctly handling scenarios where the user has asked LSF to assign bindings. Fix a couple of typos in lex parser definitions. Tell hostfile parser to ignore binding designations in hostfiles. Add an attribute to indicate that cpusets were provided as physical cpu ids.
Once validated, a version of this will be backported to the v1.8.4 release.
2014-11-30 11:50:31 -08:00
Ralph Castain
d0704ef118 Restore handling of physical processors in rankfiles. Note that the prior implementation was likely incorrect as it falsely assumed that physical core indices were unique, which isn't always true. Stipulate that physical rankfiles can only include PU numbers, and bind the result to the core that contains that physical PU. Update the mpirun man page to cover the new use-case. 2014-11-10 14:00:40 -08:00
Ralph Castain
2a90788724 Support physical processor ids in rankfile 2014-11-10 14:00:40 -08:00