since ibv_create_xrc_rcv_qp is now deprecated, and in order to
be "future-proof", we have to consider the case in which only XRC Domains are supported.
Thanks Paul Hargrove for the detailled report.
The usnic BTL configure.m4 no longer needs to OPAL_CHECK_LIBFABRIC; it
just uses the results from opal/mca/common/libfabric's configure.m4.
We also now don't need to link against libfabric -- they just link
against the opal_common_libfabric library.
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
The definition of MPI_T_pvar_get_index was incorrect. This commit
fixes the definition and adds a missing return code.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
So add a brief timer event to kick us out of the communication. The precise amount of time we should wait is somewhat TBD, but set something short for now and we can adjust.
This function is only used in the DL case -- it can be #if'ed out if
we're not compiling with DL support to avoid a compiler warning about
defined-but-not-used.
Minor comment/whitespace fixes. Also some minor logic changes that
are mainly for defensive programming purposes (i.e., ensure to always
set malloc_hook_set to true or false, and then check it before we try
to actually invoke it).
Instead of unconditionally setting the memory hook, only set it when
the memory hooks are both available and have been enabled (e.g.,
opal/mca/memory/linux has decided that it *can* be enabled, and when
the mpi_leave_pinned MCA param is set to 1, or is set to -1 and some
component requested the memory hooks be enabled).
If we set the memory hook when memory hooks are not enabled,
__malloc_hook will be NULL, which will cause problems when
btl_openib_malloc_hook() tries to invoke it.
Fixesopen-mpi/ompi#638.
Also add another (superflous but symmetric) continue statement.
This missing "continue" statement allows IPv4 "private network"
matches to fall through and allow IPv6 matches to be made -- thereby
overriding the IPv4 match that was already made.
Fixes#585 (although several of the other issues identified on #585
still exist, the primary / initial bug that was reported there is now
fixed).
CID 1301389 Resource leak (RESOURCE_LEAK)
There is no conceivable reason to strdup cr_argv[0] in either
location. Removed the calls to strdup.
CID 741357 Resource leak (RESOURCE_LEAK)
cr_argv was created by opal_argv_split (tmp_argv[0], ' '). Why should
we call opal_argv_join (' ') on this array. Leak fixed by printing out
tmp_argv[0] instead of calling opal_argv_join.
CID 741358 Resource leak (RESOURCE_LEAK)
The code does not handle exec failure correctly. The error should be
communicated to the parent process but the function in question is
only called by the parent. This calls into question some of the
structure of the function in general (like what is the point of
returning the child process id). That said, I will go ahead and add
the opal_argv_free to quiet this error.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1269841 Out-of-bounds access (OVERRUN)
Correct issue. If the string being concatingated fills the remaining
buffer then a \0 is written past the end of the string. In practice
this should never happen but it should be fixed. I re-organized the
code a bit to clear this error.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Changing the client to leave its socket as blocking during the connect doesn't solve the problem by itself - you also have to introduce a sleep delay once the backlog is hit to avoid simply machine-gunning your way thru retries. This gets somewhat difficult to adjust as you don't want to unnecessarily prolong startup time.
We've solved this before by adding a listening thread that simply reaps accepts and shoves them into the event library for subsequent processing. This would resolve the problem, but meant yet another daemon-level thread. So I centralized the listening thread support and let multiple elements register listeners on it. Thus, each daemon now has a single listening thread that reaps accepts from multiple sources - for now, the orte/pmix server and the oob/usock support are using it. I'll add in the oob/tcp component later.
This still didn't fully resolve the SMP problem, especially on coprocessor cards (e.g., KNC). Removing the shared memory dstore support helped further improve the behavior - it looks like there is some kind of memory paging issue there that needs further understanding. Given that the shared memory support was about to be lost when I bring over the PMIx integration (until it is restored in that library), it seemed like a reasonable thing to just remove it at this point.
CID 1269707 Logically dead code (DEADCODE)
Coverity is correct that tmp3 can never be NULL here. Deleted the dead
code.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1269730 Dereference after null check (FORWARD_NULL)
The code checked for cb == NULL before checking for a callback
function but did not have the same protection around the
OBJ_RELEASE(cb).
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1269821 Dereference null return value (NULL_RETURNS)
This is another false positive that can be silenced by looping on
opal_list_remove_first instead of using both opal_list_is_empty and
opal_list_remove_first.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1301390 Dereference before null check (REVERSE_INULL)
endpoint can not be NULL here. Remove NULL check.
CID 1269836 Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
CID 1301388 Bad bit shift operation (BAD_SHIFT)
Add ull to integer constants to ensure the math is done in 64-bits not
32.
CID 715749 Explicit null dereferenced (FORWARD_NULL)
As far as I can tell this parser function does not accept a line that
does match key = value. If that is the case then value should never be
NULL. If it is it is a parse error. Updated the code to reflect
this. Also modified the intify function to do something more sane
(strtol vs atoi with hex detection).
CID 1269820 Dereference null return value (NULL_RETURNS)
This is a false positive as strchr will never return NULL here. It
makes sense, though, to quiet the warning by changing the do {} while
() loop to a while () loop.
CID 1269780 Dereference after null check (FORWARD_NULL)
Just return an error if the endpoint's cpc data is NULL.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1269674 Ignoring number of bytes read (CHECKED_RETURN)
Check that we read enough bytes to get a complete async command.
CID 1269793 Missing break in switch (MISSING_BREAK)
Added comment to indicate fall through was intentional.
CID 1269702: Constant variable guards dead code (DEADCODE)
Remove an unused argument to opal_show_help. This will quiet the
coverity issue.
CID 1269675 Ignoring number of bytes read (CHECKED_RETURN)
Check that at least sizeof(int) bytes are read. If this is not the
case then it is an error.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1196720 Resource leak (RESOURCE_LEAK)
CID 1196721 Resource leak (RESOURCE_LEAK)
The code in question does leak loc_token and loc_value. Cleaned up the
code a bit and plugged the leak.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1269864 Resource leak (RESOURCE_LEAK)
CID 1269865 Resource leak (RESOURCE_LEAK)
Slightly refactored the code to remove extra goto statements and
ensure the if_include_list and if_exclude_list are actually released
on success.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1269931 Uninitialized scalar variable (UNINIT)
Initialize complete async message. This was not a bug but the fix
contributes to valgrind cleanness (uninitialed write).
CID 1269915 Unintended sign extension (SIGN_EXTENSION)
Should never happen. Quieting this by explicitly casting to uint64_t.
CID 1269824 Dereference null return value (NULL_RETURNS)
It is impossible for opal_list_remove_first to return NULL if
opal_list_is_empty returns false. I refactored the code in question to
not use opal_list_is_empty but loop until NULL is returned by
opal_list_remove_first. That will quiet the issue.
CID 1269913 Dereference before null check (REVERSE_INULL)
The storage parameter should never be NULL. The check intended to
check if *storage was NULL not storage.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1269933 Uninitialized scalar variable (UNINIT)
This CID isn't really an error but it is best for both valgrind and
coverity cleanness to not write uninitialized data. Added an
initializer for async_command in btl_openib_component_close.
CID 1269930 Uninitialized scalar variable (UNINIT)
Same as above. Best not to write uninitialized data. Added an
initializer for async_command.
CID 1269701 Logically dead code (DEADCODE)
Coverity is correct. The smallest_pp_qp will always be 0. Changed the
initial value so that the smallest_pp_qp is set as intended. If no
per-per queue pair exists then use the last shared queue pair. This
queue pair should have the smallest message size. This will reduce
buffer waste.
CID 1269713 Logically dead code (DEADCODE)
False positive but easy to silence. The two check are meaningless if
HAVE_XRC is 0 so protect them with #if HAVE_XRC.
CID 1269726 Division or modulo by zero (DIVIDE_BY_ZERO)
Indeed an issue. If we get an invalid value for rd_win then this will
cause a divide-by-zero exception. Added a check to ensure rd_win is >
0. Also updated the help message to reflect this requirement.
CID 1269672 Ignoring number of bytes read (CHECKED_RETURN)
This error was somewhat intentional. Linux parameter files are
probably not empty but it is safer to check the return code of read to
make sure we got something. If 0 bytes are read this code could SEGV
whe running strtoull.
CID 1269836 Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
Add a range check to read_module_param to ensure we do not
overflow. In the future it might be worthwhile to report an error
because these parameters should never cause overflow in this
calculation.
CID 1269692 Calling risky function (DC.WEAK_CRYPTO)
??? This call was added in 2006 but I see no calls to the rest of the
rand48 family of functions. Anyway, we SHOULD NEVER be calling seed48,
srand, etc because it messes with user code. Removed the call to
seed48.
CID 1269823 Dereference null return value (NULL_RETURNS)
This is likely a false positive. The endpoint lock is being held so no
other thread should be able to remove fragments from the list. Also,
mca_btl_openib_endpoint_post_send should not be removing items from
the list. If a NULL fragment is ever returned it will likely be a
coding error on the part of an Open MPI developer. Added an assert()
to catch this and quiet the coverity error.
CID 1269671 Unchecked return value (CHECKED_RETURN)
Added a check for the return code of mca_btl_openib_endpoint_post_send
to quiet the coverity error. It is unlikely this error path will be
traversed.
CID 1270229 Missing break in switch (MISSING_BREAK)
Add a comment to indicate that the fall-through is intentional.
CID 1269735 Dereference after null check (FORWARD_NULL)
There should always be an endpoint when handling a work
completion. The endpoint is either stored on the fragment or can be
looked up using the immediate data. Move the immediate data code up
and add an assert for a NULL endpoint.
CID 1269740 Dereference after null check (FORWARD_NULL)
CID 1269741 Explicit null dereferenced (FORWARD_NULL)
Similar to CID 1269735 fix.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1292483 Uninitialized pointer read (UNINIT)
Initialize the method and credential members of the opal_sec_cred_t to
avoid possible invalid read when calling cleanup_cred.
CID 1292484 Double free (USE_AFTER_FREE)
Set method and credential members to NULL after freeing in
cleanup_cred.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Followup to open-mpi/ompi@65b66ab: if we're not debugging, then #if
out an entire block so that the compiler doesn't warn about variables
that are assigned and not used.
CID 70622 Dereference before null check (REVERSE_INULL)
CID 70459 Logically dead code (DEADCODE)
Cleanup some cludgy code which (among other things) reimplemented
strcat, strdup, and strchr. In the process this resolved two
outstanding coverity issues.
CID 70631 Dereference before null check (REVERSE_INULL)
best_module can not be NULL in this code path. Remove NULL check and
unnecessary goto statements.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1047278 Unchecked return value
Updated check for mca_base_var_generate_full_name4 to match other
checks. Logically equivalent to the old check. Not a bug.
CID 1196685 Dereference null return
Added check for NULL when looking up the original variable for a
synonym.
CID 1269705 Logically dead code
Removed code that set the project to NULL. Code was intended to be
removed with an earlier commit that added the project name into the
component structure. Added code to actually support searching for a
group with a wildcard ('*').
CID 1292739 Dereference null return
CID 1269819 Dereference null return
Removed unnecessary string duplication and strchr.
CID 1287030 Logically dead code
Refactored fixup_files code and confirmed that the code in question is
not reachable. Removed the dead code.
CID 1292740 Use of untrusted string
Use strdup to silence coverity warning.
CID 1294413 Free of address-of expression
Reset mitem to NULL after the OPAL_LIST_FOREACH loop to ensure we
never try to free the list sentinel.
CID 1294414 Unchecked return value
Use (void) to indicate we do not care about the return code in this
instance.
CID 1294415 Resource leak
On error free all the base pointer.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
When using an external libfabric (or really any libfabric newer than
libfabric commit 607e863), we must use fi_getname to determine the local
port of our endpoint. Without this fix, OMPI will hang endlessly
while retransmitting packets to port 0 on the remote host.
Code for setting proc node locality
was absent after the removal of Cray
PMI KVS usage. This commit puts that
functionality back in place.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This commit fixes synonyms so the source file is correctly printed out
by ompi_info. This commit also adds support for printing out the line
number where the variable is set.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Only install the fake usnic libibverbs driver when there are actually
usnic kernel devices present. This prevents some run-time weirdness
on the Cray verbs emulation environment, where apparently
ibv_register_driver() either is not implemented or does not work
properly.
In days past, some implementations of Portals4 could not cover all
of memory with a single Memory Descriptor so multiple large
overlapping Memory Descriptors were created. Because none of the
current implementations have this limitation (and no future
implementations should either), this commit removes the overlapping
Memory Descriptors code.
A few uninitialized common symbols are remaining:
common symbols generated by flex :
* opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng
* opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext
* opal/util/show_help_lex.l: opal_show_help_yyleng
* opal/util/show_help_lex.l: opal_show_help_yytext
common symbol generated by "external" hwloc library:
* opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map
This is likely short-lived: now that libfabric has a 1.0.0 release
available, the embedded libfabric may disappear from the OMPI tree
sometime soon. However, we still need it for the time being...
The ompi libfabric/Makefile.am to build the libmca_component_libfabric
lib was missing a recently added psmx_eq.c in the list of source
files for the psm provider.
Fixes#569
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
When a libibverbs driver returns NULL for its context, it's the Open
MPI libibverbs fake driver. Hence, this device is simply not
supported -- ignore it.
libibverbs will complain to stderr if it sees device entries in
/sys/class/infiniband for which it has no userspace plugins.
The Cisco usNIC device no longer exports a verbs interface, thereby
causing libibverbs to emit this annoying stderr warning.
To avoid this, use the public ibv API to register a "fake" libibverbs
driver at run-time (right after we call ibv_fork_init(), but --
critically -- *before* we call ibv_get_device_list()). The purpose of
this driver is solely to convince libibverbs that there *is* a driver
for /sys/class/infininband/usnic_verbs devices. ...although this
driver will never return a valid ibv context (and therefore will never
be used).
Defer initializing the CPCs until we know that we have devices/ports
to use. This both prevents some useless work at startup when there
are no devices/ports to use, and also prevents librdmacm complaining
that there are no verbs-capable RDMA devices available (e.g., if a
Cisco usNIC device is present, but does not present a verbs RDMA
interface).
Defer initializing the CPCs until we know that we have devices/ports
to use. This both prevents some useless work at startup when there
are no devices/ports to use, and also prevents librdmacm complaining
that there are no verbs-capable RDMA devices available (e.g., if a
Cisco usNIC device is present, but does not present a verbs RDMA
interface).
This commit fixes an assert when trying to cleanup a module we failed
to initialize. There is no protection around the OBJ_DESTRUCT calls so
they will always be called so similarly we should always call
OBJ_CONSTRUCT at init.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit also fixes a problem with the lazy opening of topo
components. The topo framework incorrectly: 1) checked if the topo
framework was open by checking the length of the components list, and
2) called the framework open directly instead of using
mca_base_framework_open.
fixes#544
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The Portals4 spec requires that PtlSetMap() be called before any
Portals4 call other than PtlNIInit(), PtlGetMap() and
PtlGetPhysId(). To satisfy this requirement, this commit delays
interface initialization until the end add_procs() at which time
PtlSetMap() has been called.
The Portals4 BTL is registered with the PML as an RDMA BTL, so
prepare_src() is only used in limited cases. This commit removes
the code path from prepare_src() for unbuffered contiguous buffers
with no PML reserve. This is now handled in register_mem().
In the large message case, the sender issues a PtlMEAppend() in
order to generate events when the receiver issues a PtlGet(). This
commit moves the PtlMEAppend() from mca_btl_portals4_prepare_src()
to mca_btl_portals4_register_mem() which is the way it's done in
BTL 3.0.
This commit is a rework of the component repository. The changes
included in this commit are:
- Remove the component dependency code based off .ompi_info
files. This code is legacy code dating back 10 years that and is no
longer used.
- Move the plugin scanning code to the component repository. New
calls have been added to add new scanning paths, query available
components, and dlopen/load components.
- Pass the framework down to mca_base_component_find/filter. Eventually
the framework structure will be used to further validate components
before they are used.
- Add support to the MCA framework system to disable scanning for
dlopened components on open (support already existed in
register). This is really only relevant to installdirs as it has no
register function and no DSO components.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a typo in mca_btl_vader_progress_endpoints where
OPAL_THREAD_LOCK was used when OPAL_THREAD_UNLOCK was intended.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several vagrind errors. Included:
- installdirs did not correctly reinitialize all pointers to NULL
at close. This causes valgrind errors on a subsequent call to
opal_init_tool.
- several opal strings were leaked by opal_deregister_params which
was setting them to NULL instead of letting them be freed by the
MCA variable system.
- move opal_net_init to AFTER the variable system is initialized and
opal's MCA variables have been registered. opal_net_init uses a
variable registered by opal_register_params!
- do not leak ompi_mpi_main_thread when it is allocated by
MPI_T_init_thread.
- do not overwrite ompi_mpi_main_thread if it is already set (by
MPI_T_init_thread).
- mca_base_var: read_files was overwritting mca_base_var_file_list
even if it was non-NULL.
- mca_base_var: set all file global variables to initial states on
finalize.
- btl/vader: decrement enumerator reference count to ensure that it
is freed.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes the following bugs:
- opal_output_finalize did not properly set internal state. This
caused problems when calling the sequence opal_output_init (),
opal_output_finalize (), opal_output_init ().
- opal_info support called mca_base_open () but never called the
matching mca_base_close (). mca_base_open () and mca_base_close ()
have been updated to use a open count instead of an open flag to
allow mca_base_open to be called through multiple paths (as may be
the case when MPI_T is in use).
- orte_info support did not register opal variables. This can cause
orte-info to not return opal variables.
- opal_info, orte_info, and ompi_info support have been updated to
use a register count.
- When opening the dl framework the reference count was added to
ensure the framework stuck around. The framework being closed
prematurely was a bug in the MCA base that has since been
corrected. The increment (and associated decrement) have been
removed.
- dl/dlopen did not set the value of
mca_dl_dlopen_component.filename_suffixes_mca_storage on each call
to register. Instead the value was set in the component
structure. This caused the value to be lost when re-loading the
component. Fixed by setting the default value in register.
- Reset shmem framework state on close to avoid returning a stale
component after reloading opal/shmem.
- MCA base parameters were not properly deregistered when the MCA
base was closed.
This commit may fix#374.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
linux: only use the device-tree on Power machines
It's available on ARM but the assumption that cpus' "reg" start at 0
is invalid.
We could make that work but the device-tree doesn't currently
bring anything better than sysfs on ARM, so don't bother for now.
On 32-bit architectures loads/stores of fast box headers may take
multiple instructions. This can lead to a data race between the
sender/receiver when reading/writing the sequence number. This can
lead to a situation where the receiver could process incomplete
data. To fix the issue this commit re-orders the fast box header to
put the sequence number and the tag in the same 32-bits to ensure they
are always loaded/stored together.
Fixes#473
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
fi_av_insert() is invoked with a context containing each endpoint
USNIC_NUM_CHANNELS times. If the address on that endpoint fails to
resolve / is unreachable / has some error, we'll therefore get
USNIC_NUM_CHANNELS error completions with that same endpoint. We
therefore only want to warn about the unreachability of (and
OBJ_RELEASE) that endpoint the *first* time.
Fixes CSCut46822.
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.
All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
It was setup in the PCI backend before filtering,
and partially updated after filtering in the core.
Only setup once correctly after filtering in the core.
(cherry picked from commit open-mpi/hwloc@9659653d24)
Conflicts:
tests/hwloc/linux/40intel64-2g2n4c+pci.output
tests/hwloc/xml/192em64t-12gr2n8c2t-distancegroups.xml
tests/hwloc/xml/192em64t-24n8c2t-distancegroups.xml
tests/hwloc/xml/192em64t-24n8c2t-nodistancegroups.xml
tests/hwloc/xml/24em64t-2n6c2t-pci.xml
tests/hwloc/xml/32em64t-2n8c2t-pci-normalio.xml
tests/hwloc/xml/96em64t-4n4d3ca2co-pci.xml
utils/hwloc/test-hwloc-compress-dir.input.tar.gz
utils/hwloc/test-hwloc-compress-dir.output.tar.gz
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
If super_set contains more allocated ulongs than sub_set,
we did not check the last ulongs.
We would return true instead of false when sub_set is
infinite while the last ulongs in super_set are not full.
This fixes tests/hwloc_bitmap_compare_inclusion on some platforms.
(cherry picked from commit open-mpi/hwloc@299e6e846f)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Otherwise we get spurious bits for crazy topologies such as 8em64t-2s2ca2c-buggynuma.output
Will make debug asserts easier.
(cherry picked from commit open-mpi/hwloc@546cd9330a)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Make sure we define complete cpuset/nodeset when we define groups' main cpuset/nodeset
during later insert of groups (for PCI hostbridges or distances).
Otherwise they may end up clearing child/parent complete sets which
suddenly become incoherent while they were fixed earlier.
Needed to fix allowed_nodeset meaning.
(cherry picked from commit open-mpi/hwloc@7c88d17add)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
When looking for PUs inside R_MAXSDL rads, some AIX 6.1 releases
return one first rad without any PU.
AIX 6.1 00F63F144C00 does (on quad-power7).
AIX 6.1 00CBAAC24C00 doesn't (on 16x power6).
So we can't assume rad #x contains PU #x. But we already have the right
code to fill the cpuset from the rad, so use that to obtain the PU os_index
as well.
Cannot be used to obtain NUMA node os_index since there's no way to directly
retrieve NUMA nodes from rads (mempools seem unrelated). Just keep using #rad
for NUMA nodes os_index and document that convention when converting back in
set_membind().
Thanks to Hendryk Bockelmann and Erik Schnetter for helping debugging.
(cherry picked from commit open-mpi/hwloc@60006c7b88)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Otherwise we'll have some NULL objects above, would be annoying.
No need to dig further, the distance matrix is likely buggy.
We still keep the inserted groups at this level (incomplete level)
because removing them is hard.
(cherry picked from commit open-mpi/hwloc@312a971ec9)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Commit 626129d2818693e62b83c1cfa2ba6e058e5bed66 fixed the hwloc
device/vendor numbers obtained from libpciaccess.
But the corresponding names are still retrieved from pciaccess numbers,
so fix these numbers inside pciaccess structures before retrieving the names.
(cherry picked from commit open-mpi/hwloc@85ea6e4acc)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Misc objects were used between system and machine in the past
but quickly got replaced with groups.
(cherry picked from commit open-mpi/hwloc@6c2aa6d1ea)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Misc are reserved for annotating the topology, the core
doesn't like merging them. Group is more appropriate.
(cherry picked from commit open-mpi/hwloc@3c47649591)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
hwloc_get_first_largest_obj_inside_cpuset() returns the largest/highest object,
but it could still have a child with the same cpuset.
So check children as well in case there's a matching NUMA node there.
(cherry picked from commit open-mpi/hwloc@57a1c4fbe4)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
When ignore_keep_structure is enabled, intermediate level can disappear
between parent and child, making the new child complete_cpuset smaller,
causing the child list to require a reorder just like in remove_ignored().
(cherry picked from commit open-mpi/hwloc@88afbe6b62)
Embed this related commit:
core: abstract out reorder_children(), needed when merging modifies the list of children
(cherry picked from commit open-mpi/hwloc@14db82d391)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
If object A contains B + I/O as children, we can "ignore" I/Os and still
try to merge A and B. We now do the same for Misc objects without cpusets
instead of I/Os.
This fixes a corner case when export/reimport to XML creates a slightly
different topology (making hwloc_insert_misc fail inside a Linux cgroup).
Thanks to Dave Love for reporting the problem.
Fixes#118
(cherry picked from commit open-mpi/hwloc@650371e115)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
When I/O are attached under a PU, removing the children's cpusets from
the parent cpuset doesn't give 0, it gives the PU cpuset.
The assertion fails on single-pu machines with I/O when --merge is given,
only one PU remains with I/O under it.
But if we insert Misc by cpuset under PU, it gives 0 as expected.
Fix the assertion accordingly.
Thanks to Thomas Van Doren for reporting the issue.
(cherry picked from commit open-mpi/hwloc@45c94c336d)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
hwloc_x86_discover() calls hwloc_look_x86() twice, which calls hwloc_have_x86_cpuid().
If everything gets inlined, the asm label inside hwloc_have_x86_cpuid()
is duplicated.
Use a local label with f annotation in jumps to avoid the problem.
Thanks to Thomas Van Doren for reporting the issue (found with gcc -m32).
(cherry picked from commit open-mpi/hwloc@50e447f5bc)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
x86: Not critical since BSDs that use this backend have no membind support,
but better fix it for uniformization.
(cherry picked from commit open-mpi/hwloc@a431361c7d)
OSF: Looks like nobody ever tried to play with memory binding on OSF/Tru64.
(cherry picked from commit open-mpi/hwloc@2d6c73356d)
Conflicts:
NEWS
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
At least some solaris enforce the need to #include X11/Xlib.h first.
Thanks to Siegmar Gross for reporting the issue.
(cherry picked from commit open-mpi/hwloc@005a7e89b6)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
tolower needs <ctype.h>
Thanks to Ralph Castain for reporting the failure.
(cherry picked from commit open-mpi/hwloc@038c372a58)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
strncasecmp() needs <strings.h>
Thanks to Pavan Balaji for reporting the failure.
(cherry picked from commit open-mpi/hwloc@37439c4801)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.
Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.
Correctly count the number of available PUs under each object when given a cpuset
Fix the default binding settings, and correctly count PUs when no cpuset is given
Ensure the binding policy gets set in all cases
When we use LIBADD for static libraries, the dependent libraries get
propagated properly. For example, the dl/dlopen component will almost
certainly require the -ldl library; when using LIBS, that doesn't get
propagated elsewhere in the tree, but when using LIBADD, it does
(e.g., when linking opal_wrapper_compiler).