380 Commits

Author SHA1 Message Date
Mark Allen
bdd92a7a64 -cpu-set as a constraint rather than as a binding
The first category of issue I'm addressing is that recent code changes
seem to only consider -cpu-set as a binding option. Eg a command like
this
  % mpirun -np 2 --report-bindings --use-hwthread-cpus \
      --bind-to cpulist:ordered --map-by hwthread --cpu-set 6,7 hostname
which just round robins over the --cpu-set list.

Example output which seems fine to me:
> MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]

It should also be possible though to pass a --cpu-set to most other
map/bind options and have it be a constraint on that binding. Eg
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname

The first command above fails with this error:
> Conflicting directives for mapping policy are causing the policy
> to be redefined:
>   New policy:   RANK_FILE
>   Prior policy:  BYHWTHREAD

The error check in orte_rmaps_rank_file_open() is likely too aggressive.
The intent seems to be that any option like "--map-by whatever" will
check to see if a rankfile is in use, and report that mapping via rmaps
and using an explicit rankfile is a conflict.

But the check has been expanded to not just check
    NULL != orte_rankfile
but also errors out if
    (NULL != opal_hwloc_base_cpu_list &&
    !OPAL_BIND_ORDERED_REQUESTED(opal_hwloc_binding_policy))
which seems to be only recognizing -cpu-set as a binding option and
ignoring -cpu-set as a constraint on other binding policies.

For now I've changed the
    NULL != opal_hwloc_base_cpu_list
to
    OPAL_BIND_TO_CPUSET == OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy)
so it hopefully only errors out if -cpu-set is being used as a binding
policy.  Whether I did that right or not it's enough to get to the next
stage of testing the example commands I have above.

Another place similar logic is used is hwloc_base_frame.c where it has
    /* did the user provide a slot list? */
    if (NULL != opal_hwloc_base_cpu_list) {
        OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_CPUSET);
    }
where it used to (long ago) only do that if
    !OPAL_BINDING_POLICY_IS_SET(opal_hwloc_binding_policy)
I think the new code is making it impossible to use --cpu-set as anything
other than a binding policy.

That brings us past the error detection and into the real functionality, some of
which has been stripped out, probably in moving to hwloc-2:
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
> MCW rank 0: [B.../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [.B../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]

The rank_by() function in rmaps_base_ranking.c makes an array out of objects
returned from
    opal_hwloc_base_get_obj_by_type(,,,i,)
which uses df_search().  That function changed quite a bit from hwloc-1 to 2
but it used to include a check for
    available = opal_hwloc_base_get_available_cpus(topo, start)
which is where the bitmask from --cpu-set goes.  And it used to skip objs that
had hwloc_bitmap_iszero(available).

So I restored that behavior in df_search() by adding a "constrained_cpuset" to
replace start->cpuset that it was otherwise processing.  With that change in
place the first command works:
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
> MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
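
A minimal sketch of the skip-if-unavailable idea, assuming a flat list of hardware threads and a plain 64-bit mask in place of hwloc bitmaps. The function name and the flat model are hypothetical; the real df_search() walks hwloc objects.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index of the n-th hardware thread that survives the
 * --cpu-set constraint, or -1 if fewer than n+1 threads are allowed.
 * `allowed` plays the role of the available-cpus mask. */
static int nth_allowed_hwthread(uint64_t allowed, int num_hwthreads, int n)
{
    assert(num_hwthreads >= 0 && num_hwthreads <= 64 && n >= 0);

    int seen = 0;
    for (int cpu = 0; cpu < num_hwthreads; cpu++) {
        uint64_t obj_cpuset = UINT64_C(1) << cpu;      /* this object's cpus */
        uint64_t constrained = obj_cpuset & allowed;   /* apply the constraint */

        if (constrained == 0) {
            continue;   /* nothing available here: skip the object */
        }
        if (seen == n) {
            return cpu;
        }
        seen++;
    }
    return -1;
}
```

With bits 6 and 7 set in `allowed`, indexes 0 and 1 map to hardware threads 6 and 7, which matches the bindings reported above. With an unconstrained mask, index 0 maps to hardware thread 0, which is the behavior seen before the fix.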

The other command uses a different path though that still ignored the
available mask:
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
> MCW rank 0: [BB../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..BB/..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
In bind_generic() the code used to call
opal_hwloc_base_find_min_bound_target_under_obj() which used
opal_hwloc_base_get_ncpus(), and that's where it would
intersect objects with the available cpuset and skip over ones
that weren't available. To match the old behavior I added a few
lines in bind_generic() to skip over objects that don't intersect
the available mask. After that we get
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
> MCW rank 0: [..../..BB/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../..../..../BB../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
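
The pe=2 case can be sketched the same way: each rank takes the next `pe` cpus that intersect the available mask. This is a simplified model with a hypothetical function name and a flat 64-bit mask; the real bind_generic() operates on hwloc objects rather than on a flat list.

```c
#include <assert.h>
#include <stdint.h>

/* Return the mask of cpus bound to `rank` when every rank gets `pe` cpus
 * and only cpus set in `allowed` may be used. Returns 0 if the allowed
 * set runs out before this rank gets its full share. */
static uint64_t cpus_for_rank(uint64_t allowed, int num_hwthreads,
                              int rank, int pe)
{
    assert(num_hwthreads >= 0 && num_hwthreads <= 64);
    assert(rank >= 0 && pe > 0);

    uint64_t bound = 0;
    int skip = rank * pe;   /* allowed cpus already consumed by lower ranks */
    int taken = 0;

    for (int cpu = 0; cpu < num_hwthreads; cpu++) {
        uint64_t bit = UINT64_C(1) << cpu;

        if ((bit & allowed) == 0) {
            continue;       /* does not intersect the available mask */
        }
        if (skip > 0) {
            skip--;         /* belongs to a lower rank */
            continue;
        }
        bound |= bit;
        if (++taken == pe) {
            return bound;
        }
    }
    return 0;
}
```

For an allowed set of 6,7,12,13 and pe=2, rank 0 gets cpus 6 and 7 and rank 1 gets cpus 12 and 13, as in the output above.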

I think the above changes are improvements, but I don't feel like they're
comprehensive.  I only traced through enough code to fix the two specific
bugs I was dealing with.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-04-12 15:33:56 -04:00
Ralph Castain
0f26d8c76b Silence warnings
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-19 10:27:39 -07:00
Josh Hursey
ad8c842e7d
Merge pull request #6477 from markalle/report_bindings_strlen
opal_hwloc_base_cset2str() off-by-1 in its strncat()
2019-03-14 12:42:50 -05:00
Mark Allen
30d60994d2 opal_hwloc_base_cset2str() off-by-1 in its strncat()
I think the strncat() calls here need to be of the form
    strncat(str, new_str_to_add, len - strlen(str) - 1);
since in the OMPI calls len is being used as total number of bytes
in str.

strncat(dest,src,n) on the other hand is documented as writing up to
n chars from the incoming string plus 1 for the null, for n+1 total
bytes it can write.
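
A small sketch of a bounded append under that reading of strncat(), where the size argument is the total size of the destination buffer. The helper name is hypothetical and is not taken from the OMPI source.

```c
#include <assert.h>
#include <string.h>

/* Append src to dst, where dst_size is the TOTAL number of bytes in dst.
 * strncat() may write n characters plus a terminating NUL, so n has to be
 * the free space minus one. */
static void append_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t used = strlen(dst);
    if (used + 1 >= dst_size) {
        return;   /* no room for anything beyond the terminator */
    }
    strncat(dst, src, dst_size - used - 1);
}
```

With an 8-byte buffer holding "abc", appending "defghij" copies only "defg", leaving "abcdefg" plus the terminator in exactly 8 bytes.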

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-03-11 14:35:53 -04:00
Ben Menadue
17dcc7041a Hold off running hwloc:external feature tests until after we decide if we're using the internal or external component. This fixes #6430.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2019-02-25 16:58:11 +11:00
Jeff Squyres
59c8ab6da4 m4: remove all configury related to libibverbs
Now that all components that use libibverbs are gone, remove
OPAL_CHECK_VERBS and the confusingly-named OPAL_CHECK_OPENFABRICS
(which really just checked for verbs things -- not all the possible
OpenFabrics APIs/libraries).

The only code left in Open MPI that calls verbs is hwloc -- and that's
just the APIs that take an IBV device and return topological
information about it.  Since nothing in the Open MPI code base uses
the "ibv_*" API any more, we have no need for this hwloc functionality
so we'll even remove the --with-verbs configure options.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
Gilles Gouaillardet
73d104f695 hwloc/base: fix some off-by-one errors
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-29 07:36:56 -08:00
Jeff Squyres
f22b7d4f46 hwloc/external.h: fix a clash with external HWLOC_VERSION[*]
Some macros defined by the embedded hwloc end up in opal_config.h
because hwloc configury m4 files are slurped into Open MPI.  These
macros are not required here, and they might conflict with an external
hwloc install, so simply #undef them in hwloc/external/external.h
after including <opal_config.h> but before including the external
<hwloc.h>.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-01-29 07:36:01 -08:00
Brian Barrett
e9e4d2a4bc Handle asprintf errors with opal_asprintf wrapper
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error.  However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined.  Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.
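
The contract described here (pointer set to NULL on every failure) can be sketched as follows. This is not the opal_asprintf() implementation: the function name is hypothetical, and the sketch is built on ISO C vsnprintf() instead of wrapping the platform asprintf(), purely to keep the example portable.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated string. On success, *out owns the string
 * and the length is returned. On any failure, *out is NULL and -1 is
 * returned, so callers may test either the pointer or the return code. */
static int checked_asprintf(char **out, const char *fmt, ...)
{
    va_list ap, ap2;
    int len;
    char *buf;

    assert(out != NULL && fmt != NULL);
    *out = NULL;                          /* defined value on every error path */

    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);    /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return -1;
    }

    buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        va_end(ap2);
        return -1;
    }

    len = vsnprintf(buf, (size_t)len + 1, fmt, ap2);   /* second pass: write */
    va_end(ap2);
    if (len < 0) {
        free(buf);
        return -1;
    }

    *out = buf;
    return len;
}
```

A caller that writes `checked_asprintf(&s, "%s-%d", "rank", 7)` receives "rank-7" and must free(s) afterwards.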

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
Boris Karasev
ed42f568ae pmix: check the old topo key to keep compatibility with old RMs
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-09-25 18:13:54 +03:00
Boris Karasev
beb0697f24 Fixed copyrights of prev commit.
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-08-27 09:50:11 +03:00
Boris Karasev
e5291ccc34 Fixed the NUMA obj detection for hwloc ver >= 2.0.0
Since version 2.0.0, hwloc organizes NUMA nodes differently in the
topology tree. This commit adds detection of the local NUMA object for
hwloc >= 2.0.0, which fixes the process binding policy for the rmaps
mindist component.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-08-24 19:11:52 +03:00
Jeff Squyres
01e4570af7 hwloc201/configure.m4: make it safe when used with hwloc:external
The Autoconf AC_CONFIG_* macros can only be instantiated exactly once
for any given file, *and* they must be in a code execution path at run
time for the target file to be generated at the end of configure.

For example, if you want to generate file ABC at the end of configure,
you must invoke the AC_CONFIG_FILES(ABC) macro in a code path that
will get executed when configure is run.

That's pretty straightforward.

What's not straightforward is two corner cases:

1. You cannot invoke the AC_CONFIG_FILES(ABC) macro for the same file
   more than once.  If you do, autoreconf will fail (even before you
   can run configure).
2. If AC_CONFIG_FILES(ABC) is not in a code path that is executed by
   configure, the file ABC is not registered properly, and ABC will
   not be generated at the end of configure.

This applies to hwloc because hwloc's HWLOC_SETUP_CORE macro calls
both AC_CONFIG_FILES and AC_CONFIG_HEADER to setup its Makefiles
(etc.) so that targets like "make distclean" and "make distcheck" will
work properly.  Hence, we *have* to invoke HWLOC_SETUP_CORE.

However, the MCA_opal_hwloc_hwloc201_CONFIG macro has a few side
effects.  It would be nice to be able to do something like this:

```
    if hwloc:external is going to be used:
        Invoke minimal HWLOC_SETUP_CORE (with no side effects)
    else
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

But we can't, because autoreconf will detect that AC_CONFIG_FILES has
been invoked on the same files more than once (regardless of whether
those code paths will be executed at run time or not).  Kaboom.

Similarly, we can't do this:

```
    if hwloc:external is not going to be used:
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

Because then hwloc's AC_CONFIG_FILES won't be registered properly when
hwloc:external *is* used (i.e., when the HWLOC_SETUP_CORE macro is not
in a code path that is executed at run time), and targets like "make
distclean" will fail because hwloc's Makefiles won't have been setup.
Kaboom.

But remember that the hwloc framework is a bit special: there will
only ever be 2 components: external and internal.  External is
guaranteed to be configured first because of its priority.  So the
internal component (i.e., this component) immediately knows if it is
going to be used or not based on whether the external component
configuration succeeded or failed.

Specifically: regardless of whether the internal component (i.e., this
component) is going to be used, we have to invoke HWLOC_SETUP_CORE.
But we can manage the side effects: allow the side effects when
this/internal component is going to be used, and avoid the side
effects when this/internal component is not going to be used.

This is a little less clean than I would have liked, but because of
Autoconf's oddity about its AC_CONFIG_* macros, this is the only
solution I could come up with.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-11 11:05:23 -07:00
Jeff Squyres
4e5f432786 hwloc201: only configure if hwloc:external fails
We know that hwloc:external will be configured first (because of its
priority).  Take advantage of that here in hwloc201 by having it
refuse to configure / politely fail if hwloc:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., hwloc:external succeeded, so this component
will be skipped, or hwloc:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 10:22:38 -07:00
Gilles Gouaillardet
ce2c9fffd4 hwloc: prefer external hwloc component
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-07-23 09:20:27 +09:00
KAWASHIMA Takahiro
3e179ba95f hwloc/external: Suppress missing-include-dirs warning
If OMPI is configured with `--with-hwloc=external` or `--with-hwloc=DIR`
and gfortran is used, I see a lot of warnings when compiling files
under the `ompi/mpi/fortran` directory.

```
f951: Warning: Nonexistent include directory
'BUILD_DIR/opal/mca/hwloc/external/hwloc/include' [-Wmissing-include-dirs]
```

There is no such `include` directory in the source tree or in the
`configure`-created tree. I think these lines in the `configure.m4` file
were wrongly copied in the past from the one for the embedded `hwlocXXX`
component.

The `-Wmissing-include-dirs` option is enabled in gfortran by default
but it is not enabled by default (or even with `-Wall`) in gcc and g++.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-09 10:55:33 +09:00
Jeff Squyres
4603852740 orterun: use consistent CLI option name for --bind-to
Since the new binding option is tied to the --cpu-list orterun CLI
option, make the --bind-to option reflect the same name (vs. the
--cpu-set CLI option, which is entirely different).  For example:

    mpirun --bind-to cpu-list:ordered ...

Note that "--bind-to cpulist:ordered" is accepted as a synonym,
because people will be lazy.

Also add some minor updates to the orterun.1in man page for
clarification.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-21 08:22:00 -07:00
Ralph Castain
f17d47087a Define a new binding method and qualifier
Allow users to request that procs be bound to a cpu in a given cpu-list based on their corresponding local rank
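
A minimal sketch of picking a cpu from a comma-separated list by local rank. The function name is hypothetical, ranges such as "6-9" are not handled, and returning -1 when the list is shorter than the local rank is an assumption; the message does not say what the real code does in that case.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the cpu at position `local_rank` in a list such as "6,7,12",
 * or -1 if the list is malformed or has too few entries. */
static long cpu_for_local_rank(const char *cpu_list, int local_rank)
{
    assert(cpu_list != NULL && local_rank >= 0);

    const char *p = cpu_list;
    int idx = 0;

    while (*p != '\0') {
        char *end;
        long cpu = strtol(p, &end, 10);

        if (end == p) {
            return -1;          /* no digits where a cpu number was expected */
        }
        if (idx == local_rank) {
            return cpu;
        }
        idx++;

        if (*end == ',') {
            p = end + 1;        /* step over the separator */
        } else if (*end == '\0') {
            p = end;            /* end of list */
        } else {
            return -1;          /* unexpected character (e.g. a range) */
        }
    }
    return -1;
}
```

For the list "6,7", local rank 0 maps to cpu 6 and local rank 1 maps to cpu 7.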

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-20 21:26:09 -07:00
Peter Gottesman
afe363b73b Ensure required hwloc directories are in dist tarballs
Fixes MTT failure when running autogen on a tarball

Signed-off-by: Peter Gottesman <pgottesm@cisco.com>
2018-06-12 08:33:50 -07:00
Ralph Castain
014bb3c8de Fix external hwloc builds
Remove spurious comma in header file definition. Remove unused variables

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-03 11:24:21 -07:00
Brice Goglin
847f2e9933 opal/hwloc: remove now unused available field from opal_hwloc_obj_data_t
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
b260600450 opal/hwloc: simplify df_search() and make it work with hwloc 2.x NUMA nodes
Don't do a recursive search (hence no need for *idx anymore).
Find the level depth, to hide cache-issues first.
Then iterate over that level to find the objects we want.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
a06fc74664 opal/hwloc: remove an obsolete comment about offline CPUs etc
Only online/available objects are enabled in OMPI now.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
369a7ea279 opal/hwloc: remove df_search_cores and fix things for hwloc 2.x NUMA nodes
Just iterate over cores inside the given object cpuset.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
0cd0c12111 opal/hwloc: remove min_bound() functions
df_search_min_bound() would need to be fixed for hwloc 2.0,
but it's only used in opal_hwloc_base_find_min_bound_target_under_obj()
which isn't used anymore. So just remove all of them.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
d12ef324c9 hwloc 2.0 doesn't have hwloc/myriexpress.h anymore
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
33ea2f0de4 fix OPAL_HWLOC_WANT_SHMEM management in opal/mca/hwloc/external/external.h
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
bd08a6ead9 hwloc: fix hwloc/shmem.h in the external case
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Jeff Squyres
af4299ebc5 hwloc: updates for hwloc 2.0.x API
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-24 11:53:07 +02:00
Brice Goglin
77cc3fcda5 hwloc: update to hwloc 2.0.1
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:52:59 +02:00
Ralph Castain
c341b53475 Fix the embedded hwloc configure to always disable cuda support. Add definitions for updated hwloc objects when old external versions are used
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-04 11:35:20 -07:00
Jeff Squyres
2ec2a329dc hwloc2a/configure.m4: be more careful in with_cuda->enable_cuda
Be a little more deliberate about converting OMPI's --with-cuda CLI
value to hwloc's --enable-cuda configure option.

Also, unconditionally disable hwloc NVML support (because Open MPI is
not currently using it).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-09-24 05:36:23 -07:00
Brice Goglin
84a721d17a hwloc: disable GL and OpenCL in the hwloc component
Open MPI doesn't use GL or OpenCL OS devices, so just disable them in
hwloc.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-09-21 08:25:46 -07:00
Jeff Squyres
f5d51dc2f5 hwloc: do not build hwloc CUDA support if --without-cuda used
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-09-21 08:24:54 -07:00
Ralph Castain
e02c39385a Merge branch 'master' into topic/modex
2017-08-22 20:06:35 -07:00
Gilles Gouaillardet
565b516dae hwloc/base: fix opal_output() usage
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-08-23 10:24:47 +09:00
Ralph Castain
d80b0c7990 If the HWLOC shared memory system is unable to connect, then fall back to providing the topology via XML. Do not automatically provide the XML to every process as that defeats the purpose of the shared memory system. Instead, use PMIx_Query_info_nb to get the info from the server when required.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 18:12:26 -07:00
Brice Goglin
2d242ab9f0 hwloc/shmem: don't abort on failure to load from shmem
Adopting can fail if the server-side hole isn't available on the client.

We can fall back to other ways to load the topology.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 19:57:38 +02:00
Brice Goglin
ffd209fc2e hwloc/shmem: dump /proc/self/maps if failed to find a hole and verbosity > 4
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2017-08-21 19:57:38 +02:00
Ralph Castain
41df973359 Add diagnostics for hwloc get_topology
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-16 14:21:27 -07:00
Ralph Castain
98f36711e3 Update hwloc to latest shmem branch. Correct typos in update-my-copyright.pl.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-15 13:32:12 -07:00
Ralph Castain
daf548b328 Apply patch from @bgoglin
Fixes #4027

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-11 07:16:14 -07:00
Ralph Castain
d1b7c3d8d5 Silence some compile-time warnings. Update scripts now that AUTHORS is gone
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-04 20:08:31 -07:00
Ralph Castain
f39ce67982 Merge pull request #3951 from rhc54/topic/hwloc2
Update to hwloc 2.0.0a
2017-08-01 15:18:31 -06:00
Gilles Gouaillardet
825116044e hwloc/base: fix info message for opal_hwloc_base_binding_policy
if np > 2, the default binding is now "numa"

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-28 11:17:15 +09:00
Ralph Castain
6ebaed8c01 Restore support for user-provided cpulist
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 23:51:21 -07:00
Ralph Castain
7a83fdb9bb Update to hwloc 2.0.0a with shmem support.
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 20:26:22 -07:00
Ralph Castain
96f07aebfa Restore binding support
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 18:44:44 -07:00
Ralph Castain
0e4e3af1db Remove problem installation of hwloc 2.0
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 18:18:08 -07:00
Ralph Castain
7d8d877837 Remove build product and update .gitignore to avoid picking it up again
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 11:49:48 -07:00