1
1
Граф коммитов

29893 Коммитов

Автор SHA1 Сообщение Дата
Mark Allen
bdd92a7a64 -cpu-set as a constraint rather than as a binding
The first category of issue I'm addressing is that recent code changes
seem to only consider -cpu-set as a binding option. Eg a command like
this
  % mpirun -np 2 --report-bindings --use-hwthread-cpus \
      --bind-to cpulist:ordered --map-by hwthread --cpu-set 6,7 hostname
which just round robins over the --cpu-set list.

Example output which seems fine to me:
> MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]

It should also be possible though to pass a --cpu-set to most other
map/bind options and have it be a constraint on that binding. Eg
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by ppr:2:node,pe=2 --cpu-set 6,7,12,13 hostname

The first command above errors that
> Conflicting directives for mapping policy are causing the policy
> to be redefined:
>   New policy:   RANK_FILE
>   Prior policy:  BYHWTHREAD

The error check in orte_rmaps_rank_file_open() is likely too aggressive.
The intent seems to be that any option like "--map-by whatever" will
check to see if a rankfile is in use, and report that mapping via rmaps
and using an explicit rankfile is a conflict.

But the check has been expanded to not just check
    NULL != orte_rankfile
but also errors out if
    (NULL != opal_hwloc_base_cpu_list &&
    !OPAL_BIND_ORDERED_REQUESTED(opal_hwloc_binding_policy))
which seems to be only recognizing -cpu-set as a binding option and
ignoring -cpu-set as a constraint on other binding policies.

For now I've changed the
    NULL != opal_hwloc_base_cpu_list
to
    OPAL_BIND_TO_CPUSET == OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy)
so it hopefully only errors out if -cpu-set is being used as a binding
policy.  Whether I did that right or not it's enough to get to the next
stage of testing the example commands I have above.

Another place similar logic is used is hwloc_base_frame.c where it has
    /* did the user provide a slot list? */
    if (NULL != opal_hwloc_base_cpu_list) {
        OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_CPUSET);
    }
where it used to (long ago) only do that if
    !OPAL_BINDING_POLICY_IS_SET(opal_hwloc_binding_policy)
I think the new code is making it impossible to use --cpu-set as anything
other than a binding policy.

That brings us past the error detection and into the real functionality, some of
which has been stripped out, probably in moving to hwloc-2:
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
> MCW rank 0: [B.../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [.B../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]

The rank_by() function in rmaps_base_ranking.c makes an array out of objects
returned from
    opal_hwloc_base_get_obj_by_type(,,,i,)
which uses df_search().  That function changed quite a bit from hwloc-1 to 2
but it used to include a check for
    available = opal_hwloc_base_get_available_cpus(topo, start)
which is where the bitmask from --cpu-set goes.  And it used to skip objs that
had hwloc_bitmap_iszero(available).

So I restored that behavior in ds_search() by adding a "constrained_cpuset" to
replace start->cpuset that it was otherwise processing.  With that change in
place the first command works:
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
> MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]

The other command uses a different path though that still ignored the
available mask:
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
> MCW rank 0: [BB../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..BB/..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
In bind_generic() the code used to call
opal_hwloc_base_find_min_bound_target_under_obj() which used
opal_hwloc_base_get_ncpus(), and that's where it would
intersect objects with the available cpuset and skip over ones
that were't available. To match the old behavior I added a few
lines in bind_generic() to skip over objects that don't intersect
the available mask. After that we get
  % mpirun -np 2 --report-bindings \
      --bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
> MCW rank 0: [..../..BB/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../..../..../BB../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]

I think the above changes are improvements, but I don't feel like they're
comprehensive.  I only traced through enough code to fix the two specific
bugs I was dealing with.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-04-12 15:33:56 -04:00
valentin petrov
cd5fa9706e
Merge pull request #6574 from vspetrov/master
Fixes the O(N^2) loop in the mca_scoll_mpi_comm_query
2019-04-11 17:25:55 +03:00
bosilca
8cf7a7e87d
Merge pull request #6538 from bosilca/topic/issue6522
Prevent a segfault when accessing a rank outside a communicator.
2019-04-09 18:08:49 -04:00
Jeff Squyres
9bb8fd509b
Merge pull request #6566 from davideberius/name_fix
Component: Changes
2019-04-09 09:03:05 -05:00
Gilles Gouaillardet
fd686d7e1f
Merge pull request #6580 from ggouaillardet/topic/pmix_refresh
pmix/pmix4x: refresh to the latest PMIx
2019-04-09 15:26:45 +09:00
Gilles Gouaillardet
9ce8d7b568 pmix/pmix4x: refresh to the latest PMIx
refrest pmi4x to pmix/pmix@2531c0c3d1

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-04-09 14:03:00 +09:00
Valentin Petrov
2fa93332ce Fixes the O(N^2) loop in the mca_scoll_mpi_comm_query
The new proc group is created from the "world_group" based on the
      ranks mapping which can be directly taken from proc_name->vpid.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-08 17:49:00 +03:00
KAWASHIMA Takahiro
163bbd4f04
Merge pull request #6567 from kawashima-fj/pr/sys-timer-cleanup
opal/sys: Native timer cleanup
2019-04-08 09:22:43 +09:00
Yossi Itigin
0a446b0a3f
Merge pull request #6563 from benmenadue/fix-shmem-context
master: add missing #include to oshmem/shmem/c/shmem_context.c
2019-04-04 16:22:27 +03:00
KAWASHIMA Takahiro
77286a41aa opal/sys: Introduce OPAL_HAVE_SYS_TIMER_GET_FREQ macro
... to avoid using an architecture name macro in
`opal/mca/timer/linux/timer_linux_component.c`.

The function name `opal_sys_timer_freq` is also changed for
consistency with `opal_sys_timer_get_cycles`.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-04-04 11:48:02 +09:00
KAWASHIMA Takahiro
f2c108bd8e opal/sys: Correct OPAL_HAVE_SYS_TIMER_GET_CYCLES value
... in the case of `OPAL_GCC_INLINE_ASSEMBLY == 0`

In this case, `OPAL_HAVE_SYS_TIMER_GET_CYCLES` should be 0 because
the `opal_sys_timer_get_cycles` function is not defined.

The history:

1. Before 8d4175ad89, `OPAL_HAVE_SYS_TIMER_GET_CYCLES` was 0.
2. In 8d4175ad89, adf92d6237, adf92d6237, and c62ce1593a,
   `OPAL_HAVE_SYS_TIMER_GET_CYCLES` was changed to 1 by introducing
   `opal/asm/base/*.asm`.
3. In ebce88b7ad, `opal/asm/base/*.asm` were removed.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-04-04 11:46:37 +09:00
David Eberius
461d8bc77b Fixed a potential name collision.
Signed-off-by: David Eberius <deberius@vols.utk.edu>
2019-04-03 16:43:48 -04:00
Ben Menadue
063596b828 Add missing #include to oshmem/shmem/c/shmem_context.c.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2019-04-03 15:58:13 +11:00
markalle
98fdeeeb41
Merge pull request #6448 from markalle/macro_writing_input_arg
in-place conversion macro writes into INPUT argument
2019-04-02 11:33:18 -05:00
Brelle Emmanuel
e630046a4b pml/ob1: fixed local handle sent during PUT control message
In case of using a btl_put in ob1, the handle of the locally registered
memory is sent with a PUT control message. In the current master code
the sent handle is necessary the handle in the frag but if the handle
has been successfully registered in the request, the frag structure does
not have any valid handle and all fragments use the request one.

I suggest to check if the handle in the fragment is valid and if not to
send the handle from the request.

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
2019-04-01 18:45:05 +02:00
Brelle Emmanuel
9c689f2225 pml/ob1: fixed exit from get_frag_fail when falling back on btl_put
In the case the btl_get fails Ob1 tries to fallback on btl_put first but
the return code was ignored. So the code fell back on both btl_put and
btl_send.

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
2019-04-01 18:17:10 +02:00
Mark Allen
0a7f1e3cc5 in-place conversion macro writes into INPUT argument
In fint_2_int.h there are some conversion macros for logicals. It has
one path for OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT where a new array
would be allocated and the conversions then might expand to
    c_array[i] = (array[i] == 0 ? 0 : 1)
and another path for OMPI_SIZEOF_FORTRAN_LOGICAL == SIZEOF_INT where it
does things "in place", so the same conversion there would just be
    array[i] = (array[i] == 0 ? 0 : 1)

The problem is some of the logical arrays being converted are INPUT
arguments. And it's possible for some compilers to even put the argument
in read-only memory so the above "in place" conversion SEGV's.  A
testcase I have used
    call MPI_CART_SUB(oldcomm, (/.true.,.false./), newcomm, ierr)
and gfortran put the second arg in read-only mem.

In cart_sub_f.c you can trace the ompi_fortran_logical_t *remain_dims arg.
remain_dims[] is for input only, but the file uses
    OMPI_LOGICAL_ARRAY_NAME_DECL(remain_dims);
    OMPI_ARRAY_LOGICAL_2_INT(remain_dims, ndims);
    PMPI_Cart_sub(..., OMPI_LOGICAL_ARRAY_NAME_CONVERT(remain_dims), ...);
    OMPI_ARRAY_INT_2_LOGICAL(remain_dims, ndims);
to convert it to c-ints make a C call then restore it to Fortran logicals
before returning.

It's not always wrong to convert purely in-place, eg cart_get_f.c has
a periods[] that's exclusively for OUTPUT and it would be fine with the
macros as they were. But I still say the macros are invalid because they
don't distinguish whether they're being used on INPUT or OUTPUT args and
thus they can't be used in a way that's legal for both cases.

It might be possible to fix the macros by adding more of them so that
cart_create_f.c and cart_get_f.c would use different macros that give
more context. But my fix here is just to turn off the first block and
make all paths run as if OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT.

The main macros that get enlarged by this change are
    define OMPI_ARRAY_LOGICAL_2_INT_ALLOC : mallocs now
    define OMPI_ARRAY_LOGICAL_2_INT : also mallocs now
But these are only used in 4 places, three of which are the purpose of
this checkin, to avoid the former in-place expansion of an INPUT arg:
    cart_create_f.c
    cart_map_f.c
    cart_sub_f.c
and one of which is an OUPUT arg that was fine and that gets
unnecessarily expanded into a separate array by this checkin.
    cart_get_f.c

So I think an unnecessary malloc in cart_get_f.c is the only downside
to this change, where the logicals array argument could have been used
and converted in place.

Signed-off-by: Mark Allen <markalle@us.ibm.com>

Update provided by Gilles Gouaillardet to keep the in-place option
if OMPI_FORTRAN_VALUE_TRUE == 1 where no conversion is needed.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-04-01 10:38:05 -04:00
Mark Allen
eb888118e8 shmat/shmdt additions for patcher
This is mostly based off recent UCX additions to their patcher:
    https://github.com/openucx/ucx/pull/2703

They added triggers for
* mmap when (flags & MAP_FIXED) && (addr != NULL)
* shmat when (shmflg & SHM_REMAP) && (shmaddr != NULL)

Beyond that I noticed they already had a trigger for
* madvise when (advice == MADV_FREE)
that we didn't so I added that.

And the other main thing is we didn't really have shmat/shmdt
active for some systems because we only had a path for
syscall(SYS_shmdt, ) but we needed to also have a path for
syscall(SYS_ipc, IPCOP_shmdt, ) and same for shmat.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-03-29 14:38:46 -04:00
KAWASHIMA Takahiro
76516bc70c
Merge pull request #6542 from kawashima-fj/pr/man-typo
man: Fix typo of MPI_TYPE_GET_NAME
2019-03-29 13:06:46 +09:00
KAWASHIMA Takahiro
63a1968459 man: Fix typo of MPI_TYPE_GET_NAME
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-03-29 13:01:52 +09:00
bosilca
b54fdf5dd9
Merge pull request #6541 from bwbarrett/bugfix/enotconn
btl/tcp: Skip printing error message in racy cleanup path
2019-03-28 22:42:52 -04:00
Brian Barrett
d5360711fa btl/tcp: Skip printing error message in racy cleanup path
Avoid printing an error message about ENOTCONN return codes from
getpeername() when handling an incoming connection request.  At
this point in the receive state machine, the remote process has
been verified to be a valid OMPI instance.  In all-to-all startup
at 4k rank scale, we're seeing this error message when the remote
side drops the connection because it realizes it's the "loser"
in the connection race.  We were already doing all the right things,
other than printing a scary error message.  So skip the error
message and call it good.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2019-03-28 23:12:35 +00:00
Jeff Squyres
05c5e2034b
Merge pull request #6527 from James-A-Clark/master
Add compilation flag to allow unwinding through files that are present in the stack when attaching with MPIR
2019-03-28 18:16:02 -04:00
George Bosilca
6ea0c4eab9
Prevent a segfault when accessing a rank outside a communicator.
This is not fixing any issue, it is simply preventing a sefault if the
communicator creation has not happened as expected. Thus, this code path
should never really be hit in a correct MPI application with a valid
communicator creation support.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-03-28 12:03:29 -04:00
Jeff Squyres
3c1b33c93a
Merge pull request #6140 from bertwesarg/fix-cpp-condition
Fix use of bitwise operation in CPP condition
2019-03-28 10:06:20 -04:00
Nathan Hjelm
34d0790558
Merge pull request #6526 from ggouaillardet/topic/vader_fini
btl/vader: fix finalize sequence
2019-03-27 12:12:00 -06:00
James Clark
20f5840cbb Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init.
This is so when a debugger attaches using MPIR, it can step out of this stack back into main.
This cannot be done with certain aggressive optimisations and missing debug information.

Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

Co-authored-by: Jeff Squyres <jsquyres@cisco.com>
2019-03-27 14:32:15 +00:00
Gilles Gouaillardet
77060cad07 btl/vader: fix finalize sequence
free the component mpool in mca_btl_vader_component_close()
and after freeing soem objects that depend on it such as
mca_btl_vader_component.vader_frags_user

Thanks Christoph Niethammer for reporting this.

Refs. open-mpi/ompi#6524

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-27 11:57:40 +09:00
Ralph Castain
4e5cacc8db
Merge pull request #6523 from rhc54/topic/nid
Sync nidmap to PRRTE to fix hetero topo problem
2019-03-26 09:22:58 -07:00
Ralph Castain
8174286530 Sync nidmap to PRRTE to fix hetero topo problem
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-26 08:24:09 -07:00
Ralph Castain
dfbc14430d
Merge pull request #6440 from ggouaillardet/topic/yield_when_idle
schizo/ompi: correctly handle the yield_when_idle option
2019-03-25 12:17:34 -07:00
Geoff Paulsen
44b3aa244b
Merge pull request #6510 from sam6258/int4_cswap_fix
shmem/fortran: Fix invalid datatype size in call to atomic cswap
2019-03-25 11:49:00 -05:00
Gilles Gouaillardet
97b7fab872
Merge pull request #6516 from ggouaillardet/topic/pmix_refresh
pmix/pmix4x: refresh to the latest PMIx
2019-03-25 14:48:45 +09:00
Gilles Gouaillardet
e844f76725 pmix/pmix4x: refresh to the latest PMIx
refrest pmi4x to pmix/pmix@20cc9c041e

Fixes open-mpi/ompi#6513

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-25 13:33:18 +09:00
Artem Polyakov
bfff5783f9
Merge pull request #6371 from artpol84/osc/select_dbg
osc/base: Add debug output stating a selected component
2019-03-22 22:24:04 -07:00
Joshua Ladd
9ab6ecba65
Merge pull request #6492 from janjust/oshmem-multiple-contexts-master
Oshmem multiple contexts
2019-03-22 17:34:46 -04:00
Xin Zhao
9c3d00b144 ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:45 +02:00
Xin Zhao
e0414006b0 ompi/oshmem/spml/ucx:delete oob path of getting rkeys in spml ucx
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:45 +02:00
Xin Zhao
e1c1ab0202 ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:37 +02:00
Scott Miller
6b294e0641 shmem/fortran: Fix invalid datatype size in call to atomic cswap
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
2019-03-20 21:57:08 -04:00
Josh Hursey
53cd31ed7e
Merge pull request #6504 from jjhursey/rm-hash-pmix4
Do not force 'hash' gds on direct modex in pmix4x
2019-03-19 20:35:12 -05:00
Ralph Castain
4e0905cda7
Merge pull request #6505 from rhc54/topic/pmxup
Sync to latest PMIx master and silence hwloc warnings
2019-03-19 12:53:15 -07:00
Ralph Castain
0f26d8c76b Silence warnings
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-19 10:27:39 -07:00
Ralph Castain
c4be211741 Sync to latest PMIx master
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-19 10:27:12 -07:00
Joshua Hursey
1314cf2640 Do not force 'hash' gds on direct modex in pmix4x
* Forcing the 'hash' gds component should not be necessary any more.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-19 11:53:26 -05:00
Josh Hursey
836c80c442
Merge pull request #6498 from jjhursey/rm-hash-pmix3
Do not force 'hash' gds on direct modex
2019-03-19 10:45:11 -05:00
Nathan Hjelm
bf5fb5b589
Merge pull request #6500 from nysal/spinlock_fix
opal/atomics: Add acquire semantics back for spinlocks
2019-03-19 07:54:37 -06:00
Jeff Squyres
5111dbd480
Merge pull request #6493 from rhc54/topic/order
Ensure that nodes are always used in order provided
2019-03-19 09:40:21 -04:00
Nysal Jan K.A
00f27a80fc opal/atomics: Add acquire semantics back for spinlocks
This was introduced in commit 9d0b3fe9

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2019-03-19 16:27:03 +05:30
Joshua Hursey
c2581d0e33 Do not force 'hash' gds on direct modex
* Forcing the 'hash' gds component should not be necessary any more.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-18 21:52:32 -05:00