10430 Commits

Author SHA1 Message Date
bosilca
6089608858
Merge pull request #6647 from bosilca/fix/length_0
Fix/length 0
2019-05-14 17:59:15 -04:00
Jeff Squyres
9442989e2c
Merge pull request #6382 from jsquyres/pr/ofi-mtl-gitignore
mtl/ofi: add a .gitignore
2019-05-13 12:00:41 -04:00
George Bosilca
42119254c7 Fix incorrect behavior with length == 0
Fixes #6575.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-10 19:53:34 -04:00
George Bosilca
d141bf7912 Update the datatype dump to match the actual types.
Update the comments to better reflect what is going on.
Minor indentations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-10 18:03:57 -04:00
Nathan Hjelm
4345308dfd osc/rdma: fix CAS 32-bit network atomic compatibility check
When checking for btl compatibility with 32-bit CAS osc/rdma was
checking the incorrect flag field.

Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
2019-05-10 07:27:53 -06:00
KAWASHIMA Takahiro
dabad084b5
Merge pull request #6621 from bosilca/topic/persistent_req_leak
Fix the leak of fragments for persistent sends (issue #6565)
2019-05-03 15:21:42 +09:00
George Bosilca
a16cf0e4dd
Fix the leak of fragments for persistent sends.
The rdma_frag attached to the send request was not correctly released
upon request completion, leaking until MPI_Finalize. A quick solution
would have been to add RDMA_FRAG_RETURN at different locations on the
send request completion, but it would have unnecessarily made the
sendreq completion path more complex. Instead, I added the length to
the RDMA fragment so that it can be completed during the remote ack.

Be more explicit on the comment.

The rdma_frag can only be freed once when the peer forced a protocol
change (from RDMA GET to send/recv). Otherwise the fragment will be
returned once all data pertaining to it has been transferred.
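
The idea of storing a length in the fragment so that an ack can complete it might be sketched as follows. This is only an illustration: the struct, its field names, and `demo_frag_ack` are hypothetical and are not taken from the ob1 sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment record. Only the idea of storing a length comes
 * from the commit message; the field names are illustrative. */
typedef struct {
    size_t length;     /* total bytes this fragment covers */
    size_t completed;  /* bytes acknowledged by the peer so far */
    int    returned;   /* 1 once the fragment has been handed back */
} demo_rdma_frag_t;

/* Account for an ack covering `bytes` bytes. Returns 1 if this ack
 * completed the fragment (and marked it returned), 0 otherwise. */
int demo_frag_ack(demo_rdma_frag_t *frag, size_t bytes)
{
    assert(NULL != frag && !frag->returned);
    assert(bytes <= frag->length - frag->completed);

    frag->completed += bytes;
    if (frag->completed == frag->length) {
        frag->returned = 1;  /* stand-in for returning it to a free list */
        return 1;
    }
    return 0;
}
```

With the length stored in the fragment, the completion decision stays local to the ack handler instead of being spread over the send request completion path.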

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-02 09:40:11 -04:00
Jeff Squyres
ac54d771ec mtl/ofi: add a .gitignore
Ignore generated files.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-01 14:00:00 -07:00
Yossi Itigin
5d2200a7d6
Merge pull request #6605 from brminich/topic/shmem_all2all_put
SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
2019-05-01 12:00:21 +03:00
bosilca
399b7133ab
Merge pull request #6556 from EmmanuelBRELLE/PR_fix_local_handle_in_PUT_message
pml/ob1: fixed local handle sent during PUT control message
2019-04-27 13:51:22 -04:00
Mikhail Brinskii
2ef5bd8b36 SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-04-26 14:47:58 +03:00
Mark Allen
d85cac8f1a fixing an unsafe usage of integer disps[] (romio321 gpfs)
There are a couple MPI_Alltoallv calls in ad_gpfs_aggrs.c where the
send/recv data comes from places like req[r].lens, and the send
buffer and send displacements for example were being calculated as
    sbuf = pick one of the reqs: req[bottom].lens
    sdisps[r] = req[r].lens - req[bottom].lens
which might be okay if the .lens was data inside of req[] so they'd
all be close to each other. But each .lens field is just a pointer
that's malloced, so those addresses can be all over the place, so the
integer-sized sdisps[] isn't safe.

I changed it to have a new extra array sbuf and rbuf for those two
Alltoallv calls, and copied the data into the sbuf from the same
locations it used to be setting up the sdisps[] at, and after the
Alltoallv I copy the data out of the new rbuf into the same
locations it used to be setting up the rdisps[] at.

For what it's worth I was able to get this to fail -np 2 on a GPFS
filesystem with hints romio_cb_write enable. I didn't whittle the
test down to something small, but it was failing in an
MPI_File_write_all call.
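
A minimal sketch of the packing approach described above, assuming a simplified request type; `demo_req_t` and `demo_pack_lens` are hypothetical names and this is not the ROMIO code. Because the data is copied into one contiguous buffer, each displacement is an element offset bounded by the total count rather than a difference between unrelated heap addresses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a request whose lens field is a separately
 * malloc'ed array, as described in the commit message. */
typedef struct {
    int *lens;   /* heap pointer, unrelated to the other requests' lens */
    int  count;  /* number of ints stored in lens */
} demo_req_t;

/* Pack every req[r].lens into one contiguous buffer and record, for each
 * r, the element offset of its data inside that buffer. Returns the packed
 * buffer (the caller frees it) or NULL on allocation failure. */
int *demo_pack_lens(const demo_req_t *req, int nreq, int *sdisps)
{
    size_t total = 0;
    for (int r = 0; r < nreq; r++) {
        assert(req[r].count >= 0);
        total += (size_t)req[r].count;
    }

    int *sbuf = malloc((total > 0 ? total : 1) * sizeof(int));
    if (NULL == sbuf) {
        return NULL;
    }

    int offset = 0;
    for (int r = 0; r < nreq; r++) {
        sdisps[r] = offset;  /* small, non-negative element offset */
        if (req[r].count > 0) {
            memcpy(sbuf + offset, req[r].lens,
                   (size_t)req[r].count * sizeof(int));
        }
        offset += req[r].count;
    }
    return sbuf;
}
```

The receive side would mirror this: receive into one contiguous buffer, then copy each slice back to its original destination.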

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-04-23 16:01:55 -04:00
Jeff Squyres
9a9d106296
Merge pull request #6555 from EmmanuelBRELLE/PR-pmlob1_fix_rc_for_putfrag_when_get_failed
pml/ob1: fixed exit from get_frag_fail when falling back on btl_put
2019-04-22 17:19:12 -04:00
Gilles Gouaillardet
251477c518
Merge pull request #6431 from ggouaillardet/topic/mpiext_nolib
mpiext/shortfloat: do not create empty libraries
2019-04-22 11:23:19 +09:00
Edgar Gabriel
c80a842036
Merge pull request #6602 from edgargabriel/topic/io_array_refactor
common/ompio: refactor the build_io_array function
2019-04-18 13:44:48 -05:00
Gilles Gouaillardet
e1098dae4b mpiext/shortfloat: do not build an empty library
The shortfloat extension consists only of header files
and hence does not require a library to be built.

Refs. open-mpi/ompi#6205

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-04-18 13:42:18 -04:00
Gilles Gouaillardet
e70780b762 configury: allow mpi extensions with no libraries
Do not require an archive when the OMPI_MPIEXT_<ext>_HAVE_OBJECT
macro is defined to 0.
See `ompi/mpiext/example/configure.m4`.

This allows some extensions to be built on OS X, where the creation of
archives with no files is not permitted.

Refs. open-mpi/ompi#6205

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-04-18 13:42:01 -04:00
Gilles Gouaillardet
232055fc7a fortran/use-mpi-f08: fix intent of the internal ompi_*_f bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-04-18 13:29:19 +09:00
Edgar Gabriel
d43427fc76 common/ompio: refactor the build_io_array function
abstract out the io_array structure to be used in common_ompio_build_io_array function.
This is preparation for a future component that would like to use the same function,
but not modify the io_array stored on the file handle itself.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-04-17 14:42:33 -05:00
Valentin Petrov
30970bdfdf OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
Additional bugfix: origin_addr -> result_addr for no_op, replace_op
    and sum_op fetch destination.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-17 10:30:21 +03:00
bosilca
8cf7a7e87d
Merge pull request #6538 from bosilca/topic/issue6522
Prevent a segfault when accessing a rank outside a communicator.
2019-04-09 18:08:49 -04:00
David Eberius
461d8bc77b Fixed a potential name collision.
Signed-off-by: David Eberius <deberius@vols.utk.edu>
2019-04-03 16:43:48 -04:00
markalle
98fdeeeb41
Merge pull request #6448 from markalle/macro_writing_input_arg
in-place conversion macro writes into INPUT argument
2019-04-02 11:33:18 -05:00
Brelle Emmanuel
e630046a4b pml/ob1: fixed local handle sent during PUT control message
In case of using a btl_put in ob1, the handle of the locally registered
memory is sent with a PUT control message. In the current master code
the sent handle is necessarily the handle in the frag, but if the handle
has been successfully registered in the request, the frag structure does
not have any valid handle and all fragments use the request one.

I suggest checking whether the handle in the fragment is valid and, if not,
sending the handle from the request.
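
The selection rule suggested above could look roughly like this. The types and `demo_put_handle` are hypothetical, and treating a NULL pointer as "not valid" is an assumption made for the sketch, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the ob1 structures. */
typedef struct { int id; } demo_handle_t;

typedef struct {
    demo_handle_t *local_handle;  /* set when the whole request was registered */
} demo_request_t;

typedef struct {
    demo_handle_t  *local_handle; /* set only for a per-fragment registration */
    demo_request_t *req;          /* owning request */
} demo_frag_t;

/* Pick the handle to place in the PUT control message: the fragment's own
 * handle when it has one, otherwise the handle registered on the request.
 * Assumption: an unset handle is represented by NULL. */
demo_handle_t *demo_put_handle(const demo_frag_t *frag)
{
    assert(NULL != frag && NULL != frag->req);
    return (NULL != frag->local_handle) ? frag->local_handle
                                        : frag->req->local_handle;
}
```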

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
2019-04-01 18:45:05 +02:00
Brelle Emmanuel
9c689f2225 pml/ob1: fixed exit from get_frag_fail when falling back on btl_put
If btl_get fails, ob1 first tries to fall back on btl_put, but the
return code was ignored, so the code fell back on both btl_put and
btl_send.
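
A minimal sketch of a fallback chain that honors the return code, using stub transports that only count calls. All names here (`demo_ctx_t`, `demo_put`, `demo_send`, `demo_get_frag_fail`) are hypothetical and do not reproduce the ob1 functions.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_SUCCESS = 0, DEMO_ERROR = -1 };

/* Stub state: whether put is usable, and how often each path was taken. */
typedef struct {
    int put_works;
    int puts;
    int sends;
} demo_ctx_t;

static int demo_put(demo_ctx_t *ctx)
{
    ctx->puts++;
    return ctx->put_works ? DEMO_SUCCESS : DEMO_ERROR;
}

static int demo_send(demo_ctx_t *ctx)
{
    ctx->sends++;
    return DEMO_SUCCESS;
}

/* After a failed get: try put first and stop if it succeeded. Only a
 * failed put leads to the send path, so the two never both run. */
int demo_get_frag_fail(demo_ctx_t *ctx)
{
    assert(NULL != ctx);
    if (DEMO_SUCCESS == demo_put(ctx)) {
        return DEMO_SUCCESS;
    }
    return demo_send(ctx);
}
```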

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
2019-04-01 18:17:10 +02:00
Mark Allen
0a7f1e3cc5 in-place conversion macro writes into INPUT argument
In fint_2_int.h there are some conversion macros for logicals. It has
one path for OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT where a new array
would be allocated and the conversions then might expand to
    c_array[i] = (array[i] == 0 ? 0 : 1)
and another path for OMPI_SIZEOF_FORTRAN_LOGICAL == SIZEOF_INT where it
does things "in place", so the same conversion there would just be
    array[i] = (array[i] == 0 ? 0 : 1)

The problem is some of the logical arrays being converted are INPUT
arguments. And it's possible for some compilers to even put the argument
in read-only memory so the above "in place" conversion SEGV's.  A
testcase I have used
    call MPI_CART_SUB(oldcomm, (/.true.,.false./), newcomm, ierr)
and gfortran put the second arg in read-only mem.

In cart_sub_f.c you can trace the ompi_fortran_logical_t *remain_dims arg.
remain_dims[] is for input only, but the file uses
    OMPI_LOGICAL_ARRAY_NAME_DECL(remain_dims);
    OMPI_ARRAY_LOGICAL_2_INT(remain_dims, ndims);
    PMPI_Cart_sub(..., OMPI_LOGICAL_ARRAY_NAME_CONVERT(remain_dims), ...);
    OMPI_ARRAY_INT_2_LOGICAL(remain_dims, ndims);
to convert it to C ints, make a C call, then restore it to Fortran logicals
before returning.

It's not always wrong to convert purely in-place, eg cart_get_f.c has
a periods[] that's exclusively for OUTPUT and it would be fine with the
macros as they were. But I still say the macros are invalid because they
don't distinguish whether they're being used on INPUT or OUTPUT args and
thus they can't be used in a way that's legal for both cases.

It might be possible to fix the macros by adding more of them so that
cart_create_f.c and cart_get_f.c would use different macros that give
more context. But my fix here is just to turn off the first block and
make all paths run as if OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT.

The main macros that get enlarged by this change are
    define OMPI_ARRAY_LOGICAL_2_INT_ALLOC : mallocs now
    define OMPI_ARRAY_LOGICAL_2_INT : also mallocs now
But these are only used in 4 places, three of which are the purpose of
this checkin, to avoid the former in-place expansion of an INPUT arg:
    cart_create_f.c
    cart_map_f.c
    cart_sub_f.c
and one of which is an OUTPUT arg that was fine and that gets
unnecessarily expanded into a separate array by this checkin.
    cart_get_f.c

So I think an unnecessary malloc in cart_get_f.c is the only downside
to this change, where the logicals array argument could have been used
and converted in place.
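
The out-of-place path can be illustrated with a small helper. This is a sketch under assumptions: `demo_logical_to_int` is a hypothetical function, Fortran logicals are modeled as plain `int`, and the real code uses the macros named above rather than a function.

```c
#include <assert.h>
#include <stdlib.h>

/* Convert n Fortran-style logicals to C ints (0 or 1) in a NEW array.
 * The input is only read, so it may safely live in read-only memory.
 * Returns NULL on allocation failure; the caller frees the result. */
int *demo_logical_to_int(const int *logicals, int n)
{
    assert(n >= 0);
    int *c_array = malloc((n > 0 ? (size_t)n : 1) * sizeof(int));
    if (NULL == c_array) {
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        c_array[i] = (logicals[i] == 0 ? 0 : 1);
    }
    return c_array;
}
```

Taking the input as `const int *` lets the compiler reject any accidental write to an INPUT argument, which is the failure mode the in-place macro could not rule out.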

Signed-off-by: Mark Allen <markalle@us.ibm.com>

Update provided by Gilles Gouaillardet to keep the in-place option
if OMPI_FORTRAN_VALUE_TRUE == 1 where no conversion is needed.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-04-01 10:38:05 -04:00
KAWASHIMA Takahiro
63a1968459 man: Fix typo of MPI_TYPE_GET_NAME
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-03-29 13:01:52 +09:00
Jeff Squyres
05c5e2034b
Merge pull request #6527 from James-A-Clark/master
Add compilation flag to allow unwinding through files that are present in the stack when attaching with MPIR
2019-03-28 18:16:02 -04:00
George Bosilca
6ea0c4eab9
Prevent a segfault when accessing a rank outside a communicator.
This is not fixing any issue, it is simply preventing a segfault if the
communicator creation has not happened as expected. Thus, this code path
should never really be hit in a correct MPI application with a valid
communicator creation support.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-03-28 12:03:29 -04:00
Jeff Squyres
3c1b33c93a
Merge pull request #6140 from bertwesarg/fix-cpp-condition
Fix use of bitwise operation in CPP condition
2019-03-28 10:06:20 -04:00
James Clark
20f5840cbb Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init.
This is so when a debugger attaches using MPIR, it can step out of this stack back into main.
This cannot be done with certain aggressive optimisations and missing debug information.

Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

Co-authored-by: Jeff Squyres <jsquyres@cisco.com>
2019-03-27 14:32:15 +00:00
Ralph Castain
dfbc14430d
Merge pull request #6440 from ggouaillardet/topic/yield_when_idle
schizo/ompi: correctly handle the yield_when_idle option
2019-03-25 12:17:34 -07:00
Artem Polyakov
bfff5783f9
Merge pull request #6371 from artpol84/osc/select_dbg
osc/base: Add debug output stating a selected component
2019-03-22 22:24:04 -07:00
Yossi Itigin
9b91cf09cc
Merge pull request #6481 from hoopoepg/topic/check-ucx-params
PML/SPML/UCX: added evaluation of mmap events
2019-03-14 11:53:42 +02:00
Austen Lauria
b61e6242d3 Fix integer overflows with indexed datatype creation.
The types of count, disp, and extent passed into
ompi_datatype_add() should be size_t, ptrdiff_t and ptrdiff_t,
respectively. This prevents integer overflows and errors in
computing the size of large indexed datatypes.
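
To see why the wider types matter, consider the product of a count and an extent. The helpers below are hypothetical and do not call `ompi_datatype_add()`; they only show a size that no longer fits in `int` while still fitting in `ptrdiff_t` on a 64-bit platform.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when count * extent would not fit in a non-negative int. */
int demo_overflows_int(size_t count, ptrdiff_t extent)
{
    assert(extent >= 0);
    if (0 == extent) {
        return 0;
    }
    return count > (size_t)INT_MAX / (size_t)extent;
}

/* Span in bytes computed with wide types. The precondition guards against
 * overflow of ptrdiff_t itself. */
ptrdiff_t demo_span(size_t count, ptrdiff_t extent)
{
    assert(extent >= 0);
    assert(0 == extent || count <= (size_t)PTRDIFF_MAX / (size_t)extent);
    return (ptrdiff_t)count * extent;
}
```

For example, 600000000 elements of extent 4 span 2400000000 bytes, which exceeds a 32-bit `INT_MAX` of 2147483647 but is representable in a 64-bit `ptrdiff_t`.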

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2019-03-13 09:39:57 -04:00
Sergey Oblomov
d8e3562bae PML/SPML/UCX: added evaluation of mmap events
- A set of UCX-related issues was reported that were caused by
  conflicts between mmap API hooks. We added diagnostics for such
  problems to simplify the bug-resolving pipeline.

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-12 21:14:27 +02:00
Geoff Paulsen
a14bb4bc89
Merge pull request #6471 from hppritcha/topic/issue_6470
ompi_info: report whether MPI1 compat is enabled
2019-03-11 21:11:55 -05:00
Howard Pritchard
61ccc65302 ompi_info: report MPI1 compat is disabled
MPI1 compat disabled beyond v4.0.x

Related to #6470

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-11 13:50:29 -06:00
Gilles Gouaillardet
26c1b833c7 man: remove man pages of removed MPI1 subroutines
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-05 15:01:07 +09:00
Gilles Gouaillardet
cc97c0f611 schizo/ompi: correctly handle the yield_when_idle option
In schizo/ompi, set the new OMPI_MCA_mpi_oversubscribe environment
variable according to the node oversubscription state.

This MCA parameter is used to set the default value of the
mpi_yield_when_idle parameter.

This two-step tango is needed so the mpi_yield_when_idle setting
is always honored when set in a config file.
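
The two-step logic can be sketched as a pure function. Everything here is an assumption made for illustration: `demo_yield_when_idle` is a hypothetical name, the oversubscription state is modeled as the string "1", and the real MCA parameter lookup is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* oversubscribe_env: value set by the launcher, or NULL if unset.
 * config_value:      the user's explicit setting, or NULL if none.
 * Step 1 derives only a DEFAULT from the oversubscription state.
 * Step 2 lets an explicit setting win over that default. */
int demo_yield_when_idle(const char *oversubscribe_env, const int *config_value)
{
    int dflt = (NULL != oversubscribe_env &&
                0 == strcmp(oversubscribe_env, "1"));

    if (NULL != config_value) {
        assert(0 == *config_value || 1 == *config_value);
        return *config_value;
    }
    return dflt;
}
```

Because the launcher only influences the default, a value the user wrote in a config file is never silently overridden.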

Refs. open-mpi/ompi#6433

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-28 09:53:29 +09:00
Nathan Hjelm
73085e9ce3
Merge pull request #6413 from nuriallv/issue_osc_rdma
osc/rdma: fix when determining the node with the rank_array info for a peer
2019-02-27 16:30:06 -07:00
Geoffrey Paulsen
a6d6be2853 mpi.h.in: delete removed MPI1 functions/datatypes (API change!)
This commit DELETES the removed MPI1 functions and datatypes from
both the mpi.h header and from the library (they were deleted from the
MPI standard in MPI-3.0).

WARNING: This changes the MPI API in a non-backwards compatible way.
         This also removes the configure option that was added in Open
         MPI v4.0.x, requiring users to change their apps if they are
         using any of these almost 20 year old APIs.

This commit removes the following MPI1 removed functions and datatypes:

         MPI_Address
         MPI_Errhandler_create
         MPI_Errhandler_get
         MPI_Errhandler_set
         MPI_Type_extent
         MPI_Type_hindexed
         MPI_Type_hvector
         MPI_Type_struct
         MPI_Type_UB
         MPI_Type_LB

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-02-27 08:24:11 -08:00
Geoffrey Paulsen
3136a1706c mpi.h.in: Revamp MPI-1 removed function warnings
Refs https://github.com/open-mpi/ompi/issues/6278.

This commit is intended to be cherry-picked to v4.0.x and
the following commit will amend this functionality for
master's removal.

Changes the prototypes for MPI removed functions in the
following ways:

There are 4 cases:

 1) User wants MPI-1 compatibility (--enable-mpi1-compatibility)

    MPI_Address (and friends) are declared in mpi.h with
    deprecation notice

 2) User does not want MPI-1 compatibility, and has a C11-capable
    compiler

    Declare an MPI_Address (etc.) macro in mpi.h, which will
    cause a compile-time error using _Static_assert C11 feature

 3) User does not want MPI-1 compatibility, and does not have a
    C11-capable compiler, but the compiler supports error function
    attributes.

    Declare an MPI_Address (etc.) macro in mpi.h, which will
    cause a compile-time error using error function attribute.

 4) User does not want MPI-1 compatibility, and does not have a
    C11-capable compiler, or a compiler that supports error
    function attributes.

    Do not declare MPI_Address (etc.) in mpi.h at all.
    Unless the user is compiling with something like -Werror,
    this will allow the user's code to compile. We are
    choosing this because it seems like a losing battle to
    make some kind of compile time error that is friendly to
    the user (and doesn't make it look like mpi.h itself is broken).

    On v4.0.x, this will allow the user code to both compile
    (albeit with a warning) and link (because the MPI_Address
    will be in the MPI library because we are preserving ABI
    back to 3.0.x).

    On master/v5.0.x, this will allow the user code to compile,
    but it will fail to link (because the MPI_Address symbol will
    not be in the MPI library).
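
The four cases above can be sketched as a preprocessor ladder. This is a simplified illustration, not the contents of mpi.h: `DEMO_ENABLE_MPI1_COMPAT`, `demo_removed_func`, `demo_new_func`, and `demo_case_name` are hypothetical, and case 3 is shown as a plain declaration carrying the GCC `error` attribute rather than a macro.

```c
#include <assert.h>

/* DEMO_ENABLE_MPI1_COMPAT is a hypothetical stand-in for whatever a
 * compatibility configure option would define. */
#if defined(DEMO_ENABLE_MPI1_COMPAT)
    /* Case 1: keep the declaration (a real header would also mark it
     * as deprecated). */
#   define DEMO_CASE 1
    int demo_removed_func(void);
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    /* Case 2: any use of the name expands to a failing _Static_assert. */
#   define DEMO_CASE 2
#   define demo_removed_func(...) \
        _Static_assert(0, "demo_removed_func was removed; use demo_new_func")
#elif defined(__GNUC__)
    /* Case 3: calling the function triggers the error attribute. */
#   define DEMO_CASE 3
    int demo_removed_func(void)
        __attribute__((error("demo_removed_func was removed; use demo_new_func")));
#else
    /* Case 4: no declaration at all. */
#   define DEMO_CASE 4
#endif

/* Human-readable name of the strategy selected for this compiler. */
const char *demo_case_name(int c)
{
    static const char *const names[] = {
        "declared (compat)", "static assert", "error attribute", "undeclared"
    };
    assert(c >= 1 && c <= 4);
    return names[c - 1];
}
```

Nothing in the sketch actually uses `demo_removed_func`, so it compiles under every branch; adding a call to it is what would produce the diagnostic in cases 2 and 3.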

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-02-27 08:24:11 -08:00
bosilca
8400502d8a
Merge pull request #6353 from bosilca/topic/fix_monitoring_pvar
Fix the PVAR allocation usage.
2019-02-25 16:03:56 -05:00
Howard Pritchard
9b3a9c2579
Merge pull request #6417 from abouteiller/bugfix/cart_create_cid
Cart/Graph create would not run the next_cid algorithm
2019-02-22 13:05:59 -07:00
Howard Pritchard
d6cdbdfd39
Merge pull request #6412 from hppritcha/topic/fix_pgi_usempif08
fortran:fix for PGI linking
2019-02-21 20:31:14 -07:00
Aurelien Bouteiller
fb17115ba9
Cart/Graph create would not run the next_cid algorithm and would create
disjoint communicators with inconsistent cids.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2019-02-21 11:40:22 -05:00
Howard Pritchard
266bc3aced fortran:use mpif08 fix for PGI linking
commit c6070fd2e broke building fortran bindings
with PGI compilers. It turns out PGI compilers need
to link in the *.o from a module file whether or
not module subroutines are defined in
the module file.

Related to #6411

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-02-20 12:33:25 -07:00
Nuria Losada
3cae149262 osc/rdma: fix when determining the node with the rank_array info for a peer
Signed-off-by: Nuria Losada <nlosada@icl.utk.edu>
2019-02-20 13:12:00 -05:00
Artem Polyakov
13a8e42108
Merge pull request #6163 from artpol84/osc/mt_submission
Refactoring of osc/ucx component for MT
2019-02-20 09:41:27 -08:00