Since version hwloc 2.0.0 has a new organization of NUMA nodes on the
topology tree. This commit adds the detection of local NUMA object for
hwloc => 2.0.0, which fixes the procs bindings policy for rmaps mindist
component.
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
The Autoconf AC_CONFIG_* macros can only be instantiated exacly once
for any given file, *and* they must be in a code execution path at run
time for the target file to be generated at the end of configure.
For example, if you want to generate file ABC at the end of configure,
you must invoke the AC_CONFIG_FILES(ABC) macro in a code path that
will get executed when configure is run.
That's pretty straightforward.
What's not straightforward is two corner cases:
1. You cannot invoke the AC_CONFIG_FILES(ABC) macro for the same file
more than once. If you do, autoreconf will fail (even before you
can run configure).
2. If AC_CONFIG_FILES(ABC) is not in a code path that is executed by
configure, the file ABC is not registered properly, and ABC will
not be generated at the end of configure.
This applies to hwloc because hwloc's HWLOC_SETUP_CORE macro calls
both AC_CONFIG_FILES and AC_CONFIG_HEADER to setup its Makefiles
(etc.) so that targets like "make distclean" and "make distcheck" will
work properly. Hence, we *have* to invoke HWLOC_SETUP_CORE.
However, the MCA_opal_hwloc_hwloc201_CONFIG macro has a few side
effects. It would be nice to do able to do something like this:
```
if hwloc:extern is going to be used:
Invoke minimal HWLOC_SETUP_CORE (with no side effects)
else
Invoke full HWLOC_SETUP_CORE (with side effects)
fi
```
But we can't, because autoreconf will detect that AC_CONFIG_FILES has
been invoked on the same files more than once (regardless of whether
those code paths will be executed at run time or not). Kaboom.
Similarly, we can't do this:
```
if hwloc:extern is not going to be used:
Invoke full HWLOC_SETUP_CORE (with side effects)
fi
```
Because then hwloc's AC_CONFIG_FILES won't be registered properly when
hwloc:external *is* used (i.e., when the HWLOC_SETUP_CORE macro is not
in a code path that is executed at run time), and targets like "make
distclean" will fail because hwloc's Makefiles won't have been setup.
Kaboom.
But remember that the hwloc framework is a bit special: there will
only ever be 2 comoponents: external and internal. External is
guaranteed to be configured first because of its priority. So the
internal component (i.e., this component) immediately knows if it is
going to be used or not based on whether the external component
configuration succeeded or failed.
Specifically: regardless of whether the internal component (i.e., this
component) is going to be used, we have to invoke HWLOC_SETUP_CORE.
But we can manage the side effects: allow the side effects when
this/internal component is going to be used, and avoid the side
effects when this/internal component is not going to be used.
This is a little less clean than I would have liked, but because of
Autoconf's oddity about its AC_CONFIG_* macros, this is the only
solution I could come up with.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
We know that hwloc:external will be configured first (because of its
priority). Take advantage of that here in hwloc201 by having it
refuse to configure / politely fail if hwloc:external succeeded.
Also print out some additional lines in configure output indicating
what is going on (i.e., hwloc:external succeeded, so this component
will be skipped, or hwloc:external failed, so this component will be
used).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
If OMPI is configured with `--with-hwloc=external` or `--with-hwloc=DIR`
and gfortran is used, I see a lot of warnings when compiling files
under the `ompi/mpi/fortran` directory.
```
f951: Warning: Nonexistent include directory
'BUILD_DIR/opal/mca/hwloc/external/hwloc/include' [-Wmissing-include-dirs]
```
There is no such `include` directory in the source tree and `configure`-
created tree. I think these lines in the `configure.m4` file are wrongly
copied from that for the embedded `hwlocXXX` component in the past.
The `-Wmissing-include-dirs` option is enabled in gfortran by default
but it is not enabled by default (or even with `-Wall`) in gcc and g++.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
Since the new binding option is tied to the --cpu-list orterun CLI
option, make the --bind-to option reflect the same name (vs. the
--cpu-set CLI option, which is entirely different). For example:
mpirun --bind-to cpu-list:ordered ...
Note that "--bind-to cpulist:ordered" is accepted as a synonym,
because people will be lazy.
Also add some minor updates to the orterun.1in man page for
clarification.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Allow users to request that procs be bound to a cpu in a given cpu-list based on their corresponding local rank
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Don't do a recursive search (hence no need for *idx anymore).
Find the level depth, to hide cache-issues first.
Then iterate over that level to find the objects we want.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
df_search_min_bound() would need to be fixed for hwloc 2.0,
but it's only used in opal_hwloc_base_find_min_bound_target_under_obj()
which isn't used anymore. So just remove all of them.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Be a little more deliberate about convering OMPI's --with-cuda CLI
value to hwloc's --enable-cuda configure option.
Also, unconditionally disable hwloc NVML support (because Open MPI is
not currently using it).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Adopting can fail if the server-side hole isn't available on the client.
We can fallback to other ways to load the topology.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
This commit cleans up code in opal to use OPAL_LIST_FOREACH(_SAFE),
OPAL_LIST_DESTRUCT, and OPAL_LIST_RELEASE.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Commit fec519a793 broke the ability to
run autogen.pl in a distribution tarball. This commit restores that
ability by also distributing opal/mca/hwloc/autogen.options in the
tarball.
Skipping CI because CI does not test this functionality:
[skip ci]
bot:notest
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
hwloc v1.5 does not support HWLOC_OBJ_OSDEV_COPROC
nor hwloc_topology_dup(), so for this version :
- do not search for coprocessors
- do not try hwloc_topology_dup(), note this is not
used anywhere in the code base
Thanks Jeff for helping with the wording
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>