29927 Commits

Author SHA1 Message Date
Mark Allen
0a7f1e3cc5 in-place conversion macro writes into INPUT argument
In fint_2_int.h there are some conversion macros for logicals. It has
one path for OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT where a new array
would be allocated and the conversions then might expand to
    c_array[i] = (array[i] == 0 ? 0 : 1)
and another path for OMPI_SIZEOF_FORTRAN_LOGICAL == SIZEOF_INT where it
does things "in place", so the same conversion there would just be
    array[i] = (array[i] == 0 ? 0 : 1)

The problem is that some of the logical arrays being converted are INPUT
arguments, and some compilers may even put such an argument in read-only
memory, so the "in place" conversion above SEGVs. A test case I have
used is
    call MPI_CART_SUB(oldcomm, (/.true.,.false./), newcomm, ierr)
and gfortran put the second arg in read-only mem.

In cart_sub_f.c you can trace the ompi_fortran_logical_t *remain_dims arg.
remain_dims[] is for input only, but the file uses
    OMPI_LOGICAL_ARRAY_NAME_DECL(remain_dims);
    OMPI_ARRAY_LOGICAL_2_INT(remain_dims, ndims);
    PMPI_Cart_sub(..., OMPI_LOGICAL_ARRAY_NAME_CONVERT(remain_dims), ...);
    OMPI_ARRAY_INT_2_LOGICAL(remain_dims, ndims);
to convert it to C ints, make a C call, then restore it to Fortran
logicals before returning.

It's not always wrong to convert purely in place, e.g. cart_get_f.c has
a periods[] that's exclusively for OUTPUT and it would be fine with the
macros as they were. But I still say the macros are invalid because they
don't distinguish whether they're being used on INPUT or OUTPUT args and
thus they can't be used in a way that's legal for both cases.

It might be possible to fix the macros by adding more of them so that
cart_create_f.c and cart_get_f.c would use different macros that give
more context. But my fix here is just to turn off the first block and
make all paths run as if OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT.
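
The "always allocate" path can be sketched as a helper that only reads the
Fortran array and writes 0/1 values into a fresh C array. This is a minimal
illustration, not the actual fint_2_int.h macros: `fortran_logical_t` as an
int-sized type and the helper name `logical_array_to_int` are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: a Fortran LOGICAL is an int-sized integer where 0 is .false.
 * and any nonzero value is .true. (compilers differ on the .true. value). */
typedef int fortran_logical_t;

/* Hypothetical helper: copy an INPUT logical array into a newly allocated
 * C int array of 0/1 values. The source is only read, so it may live in
 * read-only memory. Returns NULL on allocation failure; caller frees. */
static int *logical_array_to_int(const fortran_logical_t *logicals, size_t n)
{
    assert(logicals != NULL || n == 0);
    int *c_array = malloc((n > 0 ? n : 1) * sizeof(*c_array));
    if (c_array == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        c_array[i] = (logicals[i] == 0 ? 0 : 1);
    }
    return c_array;
}
```

In the cart_sub case, the C call would receive the returned array, and the
caller's `remain_dims` is never written, so no "restore" step is needed.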

The main macros that get enlarged by this change are
    define OMPI_ARRAY_LOGICAL_2_INT_ALLOC : mallocs now
    define OMPI_ARRAY_LOGICAL_2_INT : also mallocs now
But these are only used in 4 places, three of which are the purpose of
this checkin, to avoid the former in-place expansion of an INPUT arg:
    cart_create_f.c
    cart_map_f.c
    cart_sub_f.c
and one of which is an OUTPUT arg that was fine and that gets
unnecessarily expanded into a separate array by this checkin.
    cart_get_f.c

So I think an unnecessary malloc in cart_get_f.c is the only downside
to this change, where the logicals array argument could have been used
and converted in place.

Signed-off-by: Mark Allen <markalle@us.ibm.com>

Update provided by Gilles Gouaillardet to keep the in-place option
if OMPI_FORTRAN_VALUE_TRUE == 1 where no conversion is needed.
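
The compile-time choice described in this update can be sketched as follows.
`FORTRAN_LOGICAL_IS_INT_SIZED` and `FORTRAN_VALUE_TRUE` are hypothetical
stand-ins for the configure results (OMPI_SIZEOF_FORTRAN_LOGICAL == SIZEOF_INT
and OMPI_FORTRAN_VALUE_TRUE), and `logicals_as_int` is an illustrative name.
When .true. is already 1 in an int-sized slot, the array is passed through
without any write; otherwise a converted copy is allocated.

```c
#include <assert.h>
#include <stdlib.h>

typedef int fortran_logical_t;            /* assumption: int-sized LOGICAL */
#define FORTRAN_LOGICAL_IS_INT_SIZED 1    /* hypothetical configure result */
#define FORTRAN_VALUE_TRUE 1              /* hypothetical configure result */

/* Return an int view of an INPUT logical array. *owned is set to 1 when the
 * result was allocated and must be freed by the caller, 0 otherwise. */
static const int *logicals_as_int(const fortran_logical_t *logicals, size_t n,
                                  int *owned)
{
    assert(owned != NULL);
#if FORTRAN_LOGICAL_IS_INT_SIZED && FORTRAN_VALUE_TRUE == 1
    /* Values are already 0 or 1 in int-sized slots (assuming the program
     * only stores canonical .true./.false.), so nothing is converted. */
    (void) n;
    *owned = 0;
    return (const int *) logicals;
#else
    /* Otherwise convert into a separate array; the input is only read. */
    int *c_array = malloc((n > 0 ? n : 1) * sizeof(*c_array));
    *owned = (c_array != NULL);
    if (c_array != NULL) {
        for (size_t i = 0; i < n; i++) {
            c_array[i] = (logicals[i] == 0 ? 0 : 1);
        }
    }
    return c_array;
#endif
}
```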

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-04-01 10:38:05 -04:00
Mark Allen
eb888118e8 shmat/shmdt additions for patcher
This is mostly based off recent UCX additions to their patcher:
    https://github.com/openucx/ucx/pull/2703

They added triggers for
* mmap when (flags & MAP_FIXED) && (addr != NULL)
* shmat when (shmflg & SHM_REMAP) && (shmaddr != NULL)

Beyond that I noticed they already had a trigger for
* madvise when (advice == MADV_FREE)
that we didn't so I added that.
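
The three trigger conditions listed above can be written as simple
predicates. The function names are hypothetical and this is not the patcher's
actual code; the idea is that each returns true when the call may replace or
discard pages that could already be registered, so a memory-release event
should fire before the real call runs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/shm.h>

/* mmap with MAP_FIXED at a caller-chosen address replaces existing pages. */
static bool mmap_needs_release(const void *addr, int flags)
{
    return (flags & MAP_FIXED) && addr != NULL;
}

/* shmat with SHM_REMAP takes over whatever was mapped at shmaddr. */
static bool shmat_needs_release(const void *shmaddr, int shmflg)
{
    return (shmflg & SHM_REMAP) && shmaddr != NULL;
}

/* madvise(MADV_FREE) lets the kernel reclaim the pages lazily. Other advice
 * values are outside the scope of this sketch. */
static bool madvise_needs_release(int advice)
{
#ifdef MADV_FREE
    return advice == MADV_FREE;
#else
    (void) advice;
    return false;
#endif
}
```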

The other main thing is that shmat/shmdt weren't really active on some
systems, because we only had a path for syscall(SYS_shmdt, ) but we
also needed a path for syscall(SYS_ipc, IPCOP_shmdt, ), and the same
for shmat.
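
A minimal sketch of the two raw-syscall routes for shmdt, assuming Linux
headers. `raw_shmdt` is a hypothetical name, and the fallback value 22 for
`IPCOP_shmdt` is the kernel's SHMDT operation number from <linux/ipc.h>,
defined here only if the build does not already provide it. For the
multiplexed ipc(2) call the address is passed as the `ptr` argument (the
fifth argument after the syscall number).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef IPCOP_shmdt
#define IPCOP_shmdt 22  /* assumption: SHMDT in the kernel's linux/ipc.h */
#endif

/* Detach a SysV segment without the (possibly patched) libc wrapper.
 * Returns 0 on success, or -1 with errno set. */
static int raw_shmdt(const void *shmaddr)
{
#if defined(SYS_shmdt)
    /* Architectures with a dedicated shmdt syscall, e.g. x86_64. */
    return (int) syscall(SYS_shmdt, shmaddr);
#elif defined(SYS_ipc)
    /* ipc(call, first, second, third, ptr, fifth): shmdt uses only ptr. */
    return (int) syscall(SYS_ipc, (long) IPCOP_shmdt, 0L, 0L, 0L, shmaddr);
#else
#error "no raw syscall path for shmdt on this platform"
#endif
}
```

The same pattern (dedicated syscall if available, otherwise SYS_ipc with the
matching operation number) would apply to shmat.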

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-03-29 14:38:46 -04:00
KAWASHIMA Takahiro
76516bc70c
Merge pull request #6542 from kawashima-fj/pr/man-typo
man: Fix typo of MPI_TYPE_GET_NAME
2019-03-29 13:06:46 +09:00
KAWASHIMA Takahiro
63a1968459 man: Fix typo of MPI_TYPE_GET_NAME
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-03-29 13:01:52 +09:00
bosilca
b54fdf5dd9
Merge pull request #6541 from bwbarrett/bugfix/enotconn
btl/tcp: Skip printing error message in racy cleanup path
2019-03-28 22:42:52 -04:00
Brian Barrett
d5360711fa btl/tcp: Skip printing error message in racy cleanup path
Avoid printing an error message about ENOTCONN return codes from
getpeername() when handling an incoming connection request.  At
this point in the receive state machine, the remote process has
been verified to be a valid OMPI instance.  In all-to-all startup
at 4k rank scale, we're seeing this error message when the remote
side drops the connection because it realizes it's the "loser"
in the connection race.  We were already doing all the right things,
other than printing a scary error message.  So skip the error
message and call it good.
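
The behavior described above amounts to treating ENOTCONN from getpeername()
as an expected outcome of the connection race rather than an error worth
reporting. This sketch is not the btl/tcp code; `query_peer` is a
hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 0 and fills *peer on success, -1 on failure. ENOTCONN means the
 * remote side already dropped the connection (it lost the race), so it is
 * not reported; any other errno is printed. errno is left as set by
 * getpeername() in the ENOTCONN case. */
static int query_peer(int fd, struct sockaddr_storage *peer)
{
    socklen_t len = sizeof(*peer);
    if (getpeername(fd, (struct sockaddr *) peer, &len) == 0) {
        return 0;
    }
    if (errno != ENOTCONN) {
        fprintf(stderr, "getpeername() failed: %s\n", strerror(errno));
    }
    return -1;
}
```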

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2019-03-28 23:12:35 +00:00
Jeff Squyres
05c5e2034b
Merge pull request #6527 from James-A-Clark/master
Add compilation flag to allow unwinding through files that are present in the stack when attaching with MPIR
2019-03-28 18:16:02 -04:00
George Bosilca
6ea0c4eab9
Prevent a segfault when accessing a rank outside a communicator.
This does not fix any issue; it simply prevents a segfault if the
communicator creation has not happened as expected. Thus, this code path
should never be hit in a correct MPI application with valid communicator
creation support.
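
The defensive check can be illustrated with a bounds test before indexing a
per-rank table. `toy_comm_t` and `comm_peer` are hypothetical and much simpler
than Open MPI's communicator and group structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical communicator: a table with one peer record per rank. */
typedef struct {
    int size;
    const char **peers;  /* may be NULL if creation did not complete */
} toy_comm_t;

/* Return the peer for `rank`, or NULL if the rank is outside the
 * communicator or the table was never filled in, instead of reading
 * out of bounds. */
static const char *comm_peer(const toy_comm_t *comm, int rank)
{
    if (comm == NULL || comm->peers == NULL || rank < 0 || rank >= comm->size) {
        return NULL;
    }
    return comm->peers[rank];
}
```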

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-03-28 12:03:29 -04:00
Jeff Squyres
3c1b33c93a
Merge pull request #6140 from bertwesarg/fix-cpp-condition
Fix use of bitwise operation in CPP condition
2019-03-28 10:06:20 -04:00
Nathan Hjelm
34d0790558
Merge pull request #6526 from ggouaillardet/topic/vader_fini
btl/vader: fix finalize sequence
2019-03-27 12:12:00 -06:00
James Clark
20f5840cbb Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init.
This is so when a debugger attaches using MPIR, it can step out of this stack back into main.
This cannot be done with certain aggressive optimisations and missing debug information.

Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

Co-authored-by: Jeff Squyres <jsquyres@cisco.com>
2019-03-27 14:32:15 +00:00
Gilles Gouaillardet
77060cad07 btl/vader: fix finalize sequence
Free the component mpool in mca_btl_vader_component_close(),
after freeing some objects that depend on it, such as
mca_btl_vader_component.vader_frags_user
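
The ordering constraint can be modeled with a toy pool that counts buffers
still checked out: dependents are destructed first, then the pool. All names
here (`toy_pool_t`, `toy_frags_t`, `component_close`, and so on) are
hypothetical; in the real component the wrong order risks using freed memory,
which this sketch models as `pool_destroy` refusing to proceed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool that counts buffers still checked out of it. */
typedef struct {
    int outstanding;
} toy_pool_t;

/* Hypothetical dependent object holding one buffer from the pool. */
typedef struct {
    toy_pool_t *pool;
    void *buf;
} toy_frags_t;

static void *pool_alloc(toy_pool_t *p, size_t n)
{
    void *b = malloc(n);
    if (b != NULL) {
        p->outstanding++;
    }
    return b;
}

static void pool_free(toy_pool_t *p, void *b)
{
    assert(p->outstanding > 0);
    free(b);
    p->outstanding--;
}

/* Refuse to tear the pool down while dependents still hold its memory. */
static int pool_destroy(toy_pool_t *p)
{
    return p->outstanding == 0 ? 0 : -1;
}

static void frags_destruct(toy_frags_t *f)
{
    if (f->buf != NULL) {
        pool_free(f->pool, f->buf);
        f->buf = NULL;
    }
}

/* Close sequence: release the dependent objects first, then the pool. */
static int component_close(toy_pool_t *pool, toy_frags_t *frags)
{
    frags_destruct(frags);
    return pool_destroy(pool);
}
```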

Thanks Christoph Niethammer for reporting this.

Refs. open-mpi/ompi#6524

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-27 11:57:40 +09:00
Ralph Castain
4e5cacc8db
Merge pull request #6523 from rhc54/topic/nid
Sync nidmap to PRRTE to fix hetero topo problem
2019-03-26 09:22:58 -07:00
Ralph Castain
8174286530 Sync nidmap to PRRTE to fix hetero topo problem
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-26 08:24:09 -07:00
Ralph Castain
dfbc14430d
Merge pull request #6440 from ggouaillardet/topic/yield_when_idle
schizo/ompi: correctly handle the yield_when_idle option
2019-03-25 12:17:34 -07:00
Geoff Paulsen
44b3aa244b
Merge pull request #6510 from sam6258/int4_cswap_fix
shmem/fortran: Fix invalid datatype size in call to atomic cswap
2019-03-25 11:49:00 -05:00
Gilles Gouaillardet
97b7fab872
Merge pull request #6516 from ggouaillardet/topic/pmix_refresh
pmix/pmix4x: refresh to the latest PMIx
2019-03-25 14:48:45 +09:00
Gilles Gouaillardet
e844f76725 pmix/pmix4x: refresh to the latest PMIx
refresh pmix4x to pmix/pmix@20cc9c041e

Fixes open-mpi/ompi#6513

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-25 13:33:18 +09:00
Artem Polyakov
bfff5783f9
Merge pull request #6371 from artpol84/osc/select_dbg
osc/base: Add debug output stating a selected component
2019-03-22 22:24:04 -07:00
Joshua Ladd
9ab6ecba65
Merge pull request #6492 from janjust/oshmem-multiple-contexts-master
Oshmem multiple contexts
2019-03-22 17:34:46 -04:00
Xin Zhao
9c3d00b144 ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:45 +02:00
Xin Zhao
e0414006b0 ompi/oshmem/spml/ucx: delete oob path of getting rkeys in spml ucx
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:45 +02:00
Xin Zhao
e1c1ab0202 ompi/oshmem/spml/ucx: defer cleanup of shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:37 +02:00
Scott Miller
6b294e0641 shmem/fortran: Fix invalid datatype size in call to atomic cswap
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
2019-03-20 21:57:08 -04:00
Josh Hursey
53cd31ed7e
Merge pull request #6504 from jjhursey/rm-hash-pmix4
Do not force 'hash' gds on direct modex in pmix4x
2019-03-19 20:35:12 -05:00
Ralph Castain
4e0905cda7
Merge pull request #6505 from rhc54/topic/pmxup
Sync to latest PMIx master and silence hwloc warnings
2019-03-19 12:53:15 -07:00
Ralph Castain
0f26d8c76b Silence warnings
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-19 10:27:39 -07:00
Ralph Castain
c4be211741 Sync to latest PMIx master
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-19 10:27:12 -07:00
Joshua Hursey
1314cf2640 Do not force 'hash' gds on direct modex in pmix4x
* Forcing the 'hash' gds component should not be necessary any more.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-19 11:53:26 -05:00
Josh Hursey
836c80c442
Merge pull request #6498 from jjhursey/rm-hash-pmix3
Do not force 'hash' gds on direct modex
2019-03-19 10:45:11 -05:00
Nathan Hjelm
bf5fb5b589
Merge pull request #6500 from nysal/spinlock_fix
opal/atomics: Add acquire semantics back for spinlocks
2019-03-19 07:54:37 -06:00
Jeff Squyres
5111dbd480
Merge pull request #6493 from rhc54/topic/order
Ensure that nodes are always used in order provided
2019-03-19 09:40:21 -04:00
Nysal Jan K.A
00f27a80fc opal/atomics: Add acquire semantics back for spinlocks
This was introduced in commit 9d0b3fe9
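
For context, acquire semantics on lock (paired with release on unlock) keep
critical-section accesses from being reordered outside the lock. The sketch
below is a generic C11 spinlock, not opal's atomics implementation; the type
and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_int state;
} toy_spinlock_t;

#define TOY_SPINLOCK_INIT { 0 }

static bool spin_trylock(toy_spinlock_t *l)
{
    int expected = 0;
    /* Acquire on success: loads/stores in the critical section cannot be
     * hoisted above the point where the lock is taken. */
    return atomic_compare_exchange_strong_explicit(&l->state, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void spin_lock(toy_spinlock_t *l)
{
    while (!spin_trylock(l)) {
        /* Wait with cheap relaxed loads until the lock looks free. */
        while (atomic_load_explicit(&l->state, memory_order_relaxed) != 0) {
        }
    }
}

static void spin_unlock(toy_spinlock_t *l)
{
    /* Release: writes made while holding the lock are visible to the
     * next thread that acquires it. */
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```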

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2019-03-19 16:27:03 +05:30
Joshua Hursey
c2581d0e33 Do not force 'hash' gds on direct modex
* Forcing the 'hash' gds component should not be necessary any more.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-18 21:52:32 -05:00
Ralph Castain
5aa775c02e Correctly set the byte_object size
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-18 14:29:37 -07:00
Ralph Castain
aed06e68b9 Protect against NULL node pointer
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-16 01:31:28 -07:00
Ralph Castain
2794ae43b3 Update nidmap
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-16 01:20:15 -07:00
Ralph Castain
35a597178d Ensure that nodes are always used in order provided
If a user provides a list of nodes to use via -host or -hostfile, then
ensure that the ranks are placed according to that order. Also fix a bug
where the number of slots on a node was incorrectly computed for
localhost if the name given didn't exactly match the return from
get_hostname.
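
One way to avoid relying on an exact string match is a looser local-node
test. This is a hypothetical sketch, not the mapper's actual logic:
`names_local_node` takes the locally reported hostname as a parameter and
accepts an exact match, common loopback aliases, or a match on the short
name (text before the first '.'). Whether short-name matching is safe depends
on the site's naming.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Does the user-supplied `given` name refer to the node whose hostname is
 * `local`? */
static bool names_local_node(const char *given, const char *local)
{
    if (strcmp(given, local) == 0 ||
        strcmp(given, "localhost") == 0 ||
        strcmp(given, "127.0.0.1") == 0) {
        return true;
    }
    /* Compare short names, e.g. "node01" vs "node01.cluster.example". */
    size_t glen = strcspn(given, ".");
    size_t llen = strcspn(local, ".");
    return glen == llen && strncmp(given, local, glen) == 0;
}
```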

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-15 12:58:10 -07:00
Xin Zhao
48033ac1f4 ompi/oshmem: add spml_context back to sshmem_type in memheap, to keep track of ucx_ctx_default's rkeys
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:48:21 +02:00
Xin Zhao
9a06000962 ompi/oshmem/spml/ucx: let shmem_finalize clean up any ctx left
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:48:07 +02:00
Xin Zhao
289595e45d OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:47:50 +02:00
Xin Zhao
79ba752667 ompi/oshmem/spml/ucx: fix eps destroy in shmem_ctx_destroy().
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:47:38 +02:00
Xin Zhao
b00209e1f5 Revert "OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx."
This reverts commit f1b095c784de6d1908fa40dcf76e733110cbeaf2.

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:46:56 +02:00
Josh Hursey
ad8c842e7d
Merge pull request #6477 from markalle/report_bindings_strlen
opal_hwloc_base_cset2str() off-by-1 in its strncat()
2019-03-14 12:42:50 -05:00
Yossi Itigin
9b91cf09cc
Merge pull request #6481 from hoopoepg/topic/check-ucx-params
PML/SPML/UCX: added evaluation of mmap events
2019-03-14 11:53:42 +02:00
Sergey Oblomov
c319cf9ade COMMON/UCX: rewording of hooks suggestion
- also updated output macro

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-14 11:00:57 +02:00
bosilca
0173796008
Merge pull request #6482 from awlauria/indexed_datatype_overflows
Fix integer overflows with indexed datatype creation.
2019-03-13 11:46:43 -04:00
Austen Lauria
b61e6242d3 Fix integer overflows with indexed datatype creation.
The types of count, disp, and extent passed into
ompi_datatype_add() should be size_t, ptrdiff_t and ptrdiff_t,
respectively. This prevents integer overflows and errors in
computing the size of large indexed datatypes.
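
The overflow can be illustrated with the byte offset of the end of one
block. `block_end` is a hypothetical helper, not ompi_datatype_add(); it uses
an unsigned size_t count and signed ptrdiff_t displacement and extent
(displacements may be negative), so on LP64 platforms the product stays in
a 64-bit type instead of overflowing int.

```c
#include <assert.h>
#include <stddef.h>

/* Offset just past the last of `count` elements that start at `disp` and
 * are `extent` bytes apart. */
static ptrdiff_t block_end(size_t count, ptrdiff_t disp, ptrdiff_t extent)
{
    return disp + (ptrdiff_t) count * extent;
}
```

For example, 100000 elements with a 100000-byte extent span 10^10 bytes,
which does not fit in a 32-bit int.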

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2019-03-13 09:39:57 -04:00
Sergey Oblomov
d8e3562bae PML/SPML/UCX: added evaluation of mmap events
- a set of reported UCX-related issues were caused by mmap API
  hook conflicts. We added diagnostics for such problems to
  simplify the bug-resolution pipeline

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-12 21:14:27 +02:00
Geoff Paulsen
a14bb4bc89
Merge pull request #6471 from hppritcha/topic/issue_6470
ompi_info: report whether MPI1 compat is enabled
2019-03-11 21:11:55 -05:00