1
1
Граф коммитов

279 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
899bf548a2 opal/hwloc: fix topology detection when socket is above numa
The OPAL_PROC_ON_* definitions have been changed from values to
flags. This should not cause any problems as these values were already
used as flags throughout the code base. Note, there will be a
difference between localities produced by the new code and the
old. For example, if a machine does not have a level-3 but two cores
share a level-1 or level-2 cache cache the level-3 bit will not be set
in the locality and OPAL_PROC_ON_LOCAL_L3CACHE will return 0. Before
this change it would have returned 1.

In addition the OPAL_PROC_ON_LOCAL_* macros have been simplified.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 14:17:45 -06:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
Nathan Hjelm
156ce6af21 periodic whitespace purge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:32:33 -06:00
Ralph Castain
b42545b0cb Update x86_32 cpuid assembly code. Cheery-picked from
open-mpi/hwloc@40f9978bcc
2015-07-31 11:40:38 -07:00
Ralph Castain
ed93154e43 Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line. 2015-07-07 12:52:16 -07:00
Ralph Castain
75ceec663a Now that it has been officially released, update the embedded HWLOC to 1.11.0 2015-06-28 14:07:45 -07:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Ralph Castain
ff92781ec4 Replace hwloc191 with hwloc1110
Fix hwloc compile. Ignore LAMA mapper due to deprecated hwloc functions
2015-06-13 10:11:45 -07:00
Ralph Castain
d9f23627fd Add in hwloc 1.11.0rc1 - will overwrite with final version 2015-06-04 15:35:56 -07:00
Ralph Castain
5003be5c5c If the user specifies a --map-by <foo> option, then default to bind-to <foo> unless they specify a bind-to option. If they map-by slot/node, then use the default policy based on num_procs. 2015-04-23 13:30:21 -07:00
Nathan Hjelm
33181b2543 opal: use C99 subobject naming for component initialization
This commit helps future-proof opal components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
Nathan Hjelm
3436f2917d Merge pull request #449 from hjelmn/mca_base_update
mca/base update
2015-04-16 08:41:48 -06:00
Jeff Squyres
fadc3ad01a hwloc/external/configure.m4: no need to unset
Instead, use a safe environment variable name (that is SCOPE_PUSHed and
SCOPE_POPed).
2015-04-14 07:04:01 -07:00
Ralph Castain
033418f62a Correct a typo that reversed the default binding pattern. Ensure we default bind to hwthread if user specified --use-hwthread-cpus if nprocs <= 2, and bind to hwthread if told to do so. 2015-04-10 15:58:35 -07:00
Ralph Castain
c32609b1c7 Bring over open-mpi/hwloc@f714f8d
linux: only use the device-tree on Power machines

It's available on ARM but the assumption that cpus' "reg" start at 0
is invalid.
We could make that work but the device-tree doesn't currently
bring anything better than sysfs on ARM, so don't bother for now.
2015-04-04 09:30:21 -07:00
Ralph Castain
b67b3619fc If we are using the default bindings, and one or more nodes are not setup to support binding, then don't error out - just don't bind.
Thanks to Annu Desari for pointing out the problem.
2015-03-28 08:20:24 -07:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Jeff Squyres
0c502d90cd hwloc README-ompi.txt: update for what we pulled from hwloc
Document what we pulled from the hwloc tree.
2015-03-27 06:49:42 -07:00
Brice Goglin
29ccbfd590 hwloc pci: fix bridge depth
It was setup in the PCI backend before filtering,
and partially updated after filtering in the core.
Only setup once correctly after filtering in the core.

(cherry picked from commit open-mpi/hwloc@9659653d24)

Conflicts:
	tests/hwloc/linux/40intel64-2g2n4c+pci.output
	tests/hwloc/xml/192em64t-12gr2n8c2t-distancegroups.xml
	tests/hwloc/xml/192em64t-24n8c2t-distancegroups.xml
	tests/hwloc/xml/192em64t-24n8c2t-nodistancegroups.xml
	tests/hwloc/xml/24em64t-2n6c2t-pci.xml
	tests/hwloc/xml/32em64t-2n8c2t-pci-normalio.xml
	tests/hwloc/xml/96em64t-4n4d3ca2co-pci.xml
	utils/hwloc/test-hwloc-compress-dir.input.tar.gz
	utils/hwloc/test-hwloc-compress-dir.output.tar.gz

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 06:49:39 -07:00
Brice Goglin
1905f35a1e hwloc: bitmap: fix a corner case in hwloc_bitmap_isincluded() with infinite sets
If super_set contains more allocated ulongs than sub_set,
we did not check the last ulongs.
We would return true instead of false when sub_set is
infinite while the last ulongs in super_set are not full.

This fixes tests/hwloc_bitmap_compare_inclusion on some platforms.

(cherry picked from commit open-mpi/hwloc@299e6e846f)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:44 -07:00
Brice Goglin
5c9157c547 hwloc: core: only update root->complete sets if insert succeeds
Otherwise we get spurious bits for crazy topologies such as 8em64t-2s2ca2c-buggynuma.output

Will make debug asserts easier.

(cherry picked from commit open-mpi/hwloc@546cd9330a)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:44 -07:00
Brice Goglin
dec01097f8 hwloc: groups: add complete sets when inserting distance/pci groups
Make sure we define complete cpuset/nodeset when we define groups' main cpuset/nodeset
during later insert of groups (for PCI hostbridges or distances).
Otherwise they may end up clearing child/parent complete sets which
suddenly become incoherent while they were fixed earlier.

Needed to fix allowed_nodeset meaning.

(cherry picked from commit open-mpi/hwloc@7c88d17add)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:33 -07:00
Brice Goglin
d6e415cd41 hwloc: AIX: Fix PU os_index
When looking for PUs inside R_MAXSDL rads, some AIX 6.1 releases
return one first rad without any PU.
AIX 6.1 00F63F144C00 does (on quad-power7).
AIX 6.1 00CBAAC24C00 doesn't (on 16x power6).

So we can't assume rad #x contains PU #x. But we already have the right
code to fill the cpuset from the rad, so use that to obtain the PU os_index
as well.

Cannot be used to obtain NUMA node os_index since there's no way to directly
retrieve NUMA nodes from rads (mempools seem unrelated). Just keep using #rad
for NUMA nodes os_index and document that convention when converting back in
set_membind().

Thanks to Hendryk Bockelmann and Erik Schnetter for helping debugging.

(cherry picked from commit open-mpi/hwloc@60006c7b88)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:33 -07:00
Brice Goglin
80140bbe7b hwloc: distances: when we fail to insert an intermediate group, don't try to group further above
Otherwise we'll have some NULL objects above, would be annoying.
No need to dig further, the distance matrix is likely buggy.

We still keep the inserted groups at this level (incomplete level)
because removing them is hard.

(cherry picked from commit open-mpi/hwloc@312a971ec9)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:33 -07:00
Brice Goglin
29c99156cf hwloc: pci: fix SR-IOV VF vendor/device names
Commit 626129d2818693e62b83c1cfa2ba6e058e5bed66 fixed the hwloc
device/vendor numbers obtained from libpciaccess.
But the corresponding names are still retrieved from pciaccess numbers,
so fix these numbers inside pciaccess structures before retrieving the names.

(cherry picked from commit open-mpi/hwloc@85ea6e4acc)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:32 -07:00
Brice Goglin
da164be0ef hwloc: error: point to the FAQ when displaying the big OS error message
(cherry picked from commit open-mpi/hwloc@b191f816f6)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:32 -07:00
Brice Goglin
3f96e7a271 hwloc: synthetic: Misc levels are not allowed in the synthetic description
Misc objects were used between system and machine in the past
but quickly got replaced with groups.

(cherry picked from commit open-mpi/hwloc@6c2aa6d1ea)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
050bb35feb hwloc: x86: use Group instead of Misc for unknown x2apic levels
Misc are reserved for annotating the topology, the core
doesn't like merging them. Group is more appropriate.

(cherry picked from commit open-mpi/hwloc@3c47649591)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
379c7b0d8b hwloc: x86: use ulong for cache sizes, uint won't be enough in the near future
(cherry picked from commit open-mpi/hwloc@ae82597773)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
6caf9edbea hwloc: hpux: improve hwloc_hpux_find_ldom() looking for NUMA node
hwloc_get_first_largest_obj_inside_cpuset() returns the largest/highest object,
but it could still have a child with the same cpuset.
So check children as well in case there's a matching NUMA node there.

(cherry picked from commit open-mpi/hwloc@57a1c4fbe4)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
fff1bb5dcd hwloc: core: reorder children in merge_useless_child() as well
When ignore_keep_structure is enabled, intermediate level can disappear
between parent and child, making the new child complete_cpuset smaller,
causing the child list to require a reorder just like in remove_ignored().

(cherry picked from commit open-mpi/hwloc@88afbe6b62)

Embed this related commit:
core: abstract out reorder_children(), needed when merging modifies the list of children
(cherry picked from commit open-mpi/hwloc@14db82d391)

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:16 -07:00
Brice Goglin
77978a846e hwloc: core: fix the merging of identical objects in presence of Misc objects
If object A contains B + I/O as children, we can "ignore" I/Os and still
try to merge A and B. We now do the same for Misc objects without cpusets
instead of I/Os.

This fixes a corner case when export/reimport to XML creates a slightly
different topology (making hwloc_insert_misc fail inside a Linux cgroup).

Thanks to Dave Love for reporting the problem.

Fixes #118

(cherry picked from commit open-mpi/hwloc@650371e115)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
5427b33caf hwloc: debug: fix an overzealous assertion about the parent cpuset vs its children
When I/O are attached under a PU, removing the children's cpusets from
the parent cpuset doesn't give 0, it gives the PU cpuset.
The assertion fails on single-pu machines with I/O when --merge is given,
only one PU remains with I/O under it.

But if we insert Misc by cpuset under PU, it gives 0 as expected.

Fix the assertion accordingly.

Thanks to Thomas Van Doren for reporting the issue.

(cherry picked from commit open-mpi/hwloc@45c94c336d)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
9b59d532fc hwloc: cpuid-x86: Fix duplicate asm labels in case of heavy inlining on x86-32
hwloc_x86_discover() calls hwloc_look_x86() twice, which calls hwloc_have_x86_cpuid().
If everything gets inlined, the asm label inside hwloc_have_x86_cpuid()
is duplicated.
Use a local label with f annotation in jumps to avoid the problem.

Thanks to Thomas Van Doren for reporting the issue (found with gcc -m32).

(cherry picked from commit open-mpi/hwloc@50e447f5bc)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
86a536ca58 hwloc: x86 and OSF: Don't forget to set NUMA node nodeset
x86: Not critical since BSDs that use this backend have no membind support,
but better fix it for uniformization.
(cherry picked from commit open-mpi/hwloc@a431361c7d)

OSF: Looks like nobody ever tried to play with memory binding on OSF/Tru64.
(cherry picked from commit open-mpi/hwloc@2d6c73356d)

Conflicts:
	NEWS

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
db5bc72496 hwloc: API: clearly state that os_index isn't unique while logical_index is
(cherry picked from commit open-mpi/hwloc@6c75302ab2)

Conflicts:
	opal/mca/hwloc/hwloc191/hwloc/include/hwloc.h

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:14:15 -07:00
Brice Goglin
a636790604 hwloc: opal/mca/hwloc/hwloc191/hwloc/NEWS update
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:13:41 -07:00
Brice Goglin
7c96aecfaf hwloc: errors: improve the advice to send hwloc-gather-topology files in the OS error message
(cherry picked from commit open-mpi/hwloc@f77aa01b3c)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:13:41 -07:00
Brice Goglin
d5f8c89527 hwloc: configure: fix the check for X11/Xutil.h
At least some solaris enforce the need to #include X11/Xlib.h first.

Thanks to Siegmar Gross for reporting the issue.

(cherry picked from commit open-mpi/hwloc@005a7e89b6)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:13:01 -07:00
Brice Goglin
50b035dddb hwloc: misc.h: Fix hwloc_strncasecmp() with some icc
tolower needs <ctype.h>

Thanks to Ralph Castain for reporting the failure.

(cherry picked from commit open-mpi/hwloc@038c372a58)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:10:57 -07:00
Brice Goglin
6764413aa3 hwloc: misc.h: Fix hwloc_strncasecmp() build under strict flags on BSD
strncasecmp() needs <strings.h>

Thanks to Pavan Balaji for reporting the failure.

(cherry picked from commit open-mpi/hwloc@37439c4801)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:10:46 -07:00
Brice Goglin
6b0011f138 hwloc: v1.9.1 released, doing 1.9.2rc1 now
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2015-03-27 05:06:15 -07:00
Ralph Castain
ed5d10b816 Somehow slipped by - ensure we correctly count the cores 2015-03-19 17:56:18 -07:00
Ralph Castain
43a3baad5e Ensure we use the first compute node's topology for mapping
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.

Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.

Correctly count the number of available PUs under each object when given a cpuset

Fix the default binding settings, and correctly count PUs when no cpuset is given

Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00
Jeff Squyres
4ab9e67832 hwloc external: portability updates
Change "test -a" to "&& test", and change foo="$bar" to foo=$bar.  No
substantive code changes.
2015-03-13 04:40:09 -07:00
Jeff Squyres
4d63c88ed1 hwloc external: whitespace cleanup, no code changes 2015-03-13 04:40:05 -07:00
Nysal Jan K.A
881a9f3d58 Fix cache line size detection on power
Due to the nature of the cache architecture on power,
we don't export coherency_line_size for L2 in sysfs.
If we are unable to get the L2 cache line size, try L1.

See open-mpi/ompi#383 for more information.
2015-02-25 17:26:28 +05:30
Howard Pritchard
c9e81b54fb Merge pull request #412 from hppritcha/topic/owner_files
add owner files to opa/ompi/orte mca directories
2015-02-23 09:48:20 -07:00
Gilles Gouaillardet
8d44d7086a hwloc/base: fix misc memory leaks
as reported by Coverity with CIDs 710636 and 1270441
2015-02-23 13:55:04 +09:00