Commit graph

10530 commits

Author SHA1 Message Date
Geoff Paulsen
4e1e6f8972
Merge pull request #6993 from awlauria/fix_warnings_master
Fix miscellaneous compiler warnings.
2019-10-09 09:17:02 -05:00
Gilles Gouaillardet
33361aa124 pml/ucx: correctly handle zero size datatypes
zero-size derived datatypes are now flagged as OPAL_DATATYPE_FLAG_CONTIGUOUS
so update mca_pml_ucx_init_datatype() to correctly handle them.
Since 'size' is a 'size_t', the assertion can simply be removed.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-10-09 16:54:00 +09:00
Edgar Gabriel
a130f569df common_ompio_file_read/write: fix 2GB limiting issue
individual read/write operations exceeding 2GB fail in ompio
due to improper conversions from size_t to int in two different
locations. This commit fixes an issue reported by Richard Warren
from the HDF5 group.

Fixes Issue #397

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-10-05 09:50:02 -05:00
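The 2 GB limit described in commit a130f569df comes from passing a size_t length through an int. The following is a minimal sketch of one common way to avoid that truncation: splitting a large transfer into pieces that each fit in an int. It is not the actual ompio fix, and the helper names next_chunk_len and count_chunks are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not part of ompio): length to hand to a single
 * low-level call whose length parameter is an int. */
static size_t next_chunk_len(size_t remaining)
{
    return remaining > (size_t)INT_MAX ? (size_t)INT_MAX : remaining;
}

/* Hypothetical helper: number of int-sized calls needed to cover
 * a transfer of 'total' bytes. */
static size_t count_chunks(size_t total)
{
    size_t calls = 0;
    while (total > 0) {
        size_t len = next_chunk_len(total);
        /* Every piece can be converted to int without truncation. */
        assert(len <= (size_t)INT_MAX);
        total -= len;
        calls++;
    }
    return calls;
}
```

Keeping the running total in size_t and converting only the per-call length means the conversion to int happens at a point where the value is known to fit.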
Austen Lauria
0d4004cc3c Fix miscellaneous compiler warnings.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2019-10-01 16:27:25 -04:00
Howard Pritchard
d6d73b7724 mtl/ofi: replace OMPI_UNLIKELY with OPAL version
one off patch for v4.0.x.  for some reason commit on master
didn't have this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 5f3dbdb5c8)

Note that this commit is actually a cherry-pick from the v4.0.x
branch.  This is the opposite direction than what we normally do: we
usually commit to master first and then cherry-pick to the release
branches (vs. the other way around).

As is probably evident from the original commit message above, through
a comedy of errors, this commit was actually applied to the v4.0.x
branch first and then cherry-picked back to master (i.e., the problem
*did* exist in the original master commit 3aca4af548, but it was not
recognized at the time).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-10-01 09:52:27 -07:00
Jeff Squyres
ee3564a2dc
Merge pull request #7004 from mwheinz/REFS6976-master
REF6976 Silent failure of OMPI over OFI with large message sizes
2019-09-23 17:31:21 -04:00
Michael Heinz
3aca4af548 REF6976 Silent failure of OMPI over OFI with large message sizes
INTERNAL: STL-59403

The OFI (libfabric) MTL does not respect the maximum message size
parameter that OFI provides in the fi_info data.

This patch adds this missing max_msg_size field to the mca_ofi_module_t
structure and adds a length check to the low-level send routines.

Change-Id: I05aa71d332f2df897133b30c28bf37d98f061996
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
Reviewed-by: Adam Goldman <adam.goldman@intel.com>
Reviewed-by: Brendan Cunningham <brendan.cunningham@intel.com>
2019-09-23 15:23:48 -04:00
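Commit 3aca4af548 describes adding a max_msg_size field and a length check to the low-level send routines. The following is a minimal sketch of such a check under stated assumptions: only the field name max_msg_size comes from the commit message, while the struct, the function, and the return codes are hypothetical stand-ins rather than the real mca_ofi_module_t or MTL code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the module structure; only the
 * max_msg_size field name is taken from the commit message. */
typedef struct {
    size_t max_msg_size; /* limit reported by the provider */
} example_ofi_module_t;

/* Hypothetical return codes for this sketch. */
enum { EXAMPLE_SUCCESS = 0, EXAMPLE_ERR_TOO_LARGE = -1 };

/* Reject a send whose length exceeds the provider limit instead of
 * letting it fail silently further down the stack. */
static int example_send_check(const example_ofi_module_t *module,
                              size_t length)
{
    assert(module != NULL);
    if (length > module->max_msg_size) {
        return EXAMPLE_ERR_TOO_LARGE;
    }
    return EXAMPLE_SUCCESS;
}
```

Returning an explicit error at the send entry point turns the silent failure named in the commit title into one the caller can detect.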
Jeff Squyres
cc586d808a
Merge pull request #6991 from devreal/grequestx-progress
Ensure that grequestx continuously makes progress
2019-09-18 13:46:46 -04:00
Joseph Schuchart
37e6bbb1e1 Ensure that grequestx continuously makes progress
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-09-18 18:55:11 +02:00
Sergey Oblomov
e0aee1ba5a MPI.H: fixed a few typos in comments
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-09-10 13:34:38 +03:00
George Bosilca
3522916971
Mark predefined empty datatype contiguous.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-09-07 14:40:21 +10:00
Geoff Paulsen
5ff6cb6e6a
Merge pull request #6756 from markalle/romio_info
romio info: letting romio keep its internal setup
2019-09-05 15:43:07 -05:00
Raafat Feki
7877743784
Merge pull request #6857 from raafatfeki/pr/ompio_coll_write_clean
Pr/ompio_fcoll_write_clean
2019-09-04 11:06:56 -05:00
Mark Allen
14e3d7b8b0 romio info: letting romio keep its internal setup
I'm restoring the info function pointers to the IO module
but allowing the function pointers to be NULL (eg in ompio).
And letting romio321 set its function pointers for those
routines.

This means the info system uses the new OMPI-level info
system for most things, but skips it and uses the pre-existing
romio info system just for the romio module.

It's possible to convert romio, but I went a ways down that
path and found it kind of convoluted.  Having pointers from
the lower level ADIO_File back to the higher level ompi_file_t
wasn't too bad, but I got stuck trying to figure out where/how
to register the infosubscribe_subscribe callbacks vs the way
initial k/v values are scattered around the romio code currently.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-09-03 14:08:19 -04:00
Nathan Hjelm
c4d0752036
Merge pull request #6803 from guserav/fix-osc-sm-post-32-bit-atomics
Fix osc sm posts when only 32-bit atomics are supported
2019-08-27 18:23:48 -07:00
Valentin Petrov
a0d99ad190 Coll/hcoll: fixes hcoll non-blocking colls support
open-mpi/ompi@0fe756d416 introduced a bug in the coll/hcoll component.
The ompi_requests allocated by libhcoll would be treated as
coll_base_nbc_request during the ompi_coll_base_retain_<> call.
Afterwards this would lead to a segv in the request cleanup.

Fix: since the libhcoll interface does not distinguish between
blocking and non-blocking requests, use coll_base_nbc_request all the
time and initialize it properly in coll/hcoll/get_coll_handle().
It is still within 2 cache lines.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-08-27 17:22:58 +03:00
raafatfeki
2c6a5eed29 fcoll/dynamic_gen2: Adjustment of displacement index in collective write
Within the shuffle iteration, the aggregators have to set a displacement array needed to receive data from other processes. The array had 1 extra element. We adjust the displacement index to match the number of elements.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2019-08-26 10:03:23 -05:00
raafatfeki
f45e9cfdbe fcoll/vulcan: Adjustment of displacement index in collective write
Within the shuffle iteration, the aggregators have to set a displacement array needed to receive data from other processes. The array had 1 extra element. We adjust the displacement index to match the number of elements.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2019-08-26 10:03:23 -05:00
George Bosilca
2930bd9d21 Whitespace cleanup
No code or logic changes.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-08-14 11:06:47 -04:00
Artem Polyakov
d58c59eb71
Merge pull request #6893 from janjust/osc_error_path_fix
osc/ucx: Fix error path
2019-08-12 21:23:57 -07:00
Jeff Squyres
ae1f7e0c3b
Merge pull request #6879 from mwheinz/REF6877-master
PSM MTL is obsolete and should be removed
2019-08-12 15:08:25 -04:00
Tomislav Janjusic
d5f6b088ae osc/ucx: Fix error path
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-08-12 21:54:01 +03:00
Gilles Gouaillardet
63d3ccde9d coll/base: only retain datatypes/op if the request has not yet completed
a non blocking collective might return ompi_request_null, so we should not
retain anything in that case.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-09 09:57:56 +09:00
Gilles Gouaillardet
0862c409f1 coll/base: cleanup ompi_coll_base_nbc_request_t elements
Since ompi_coll_base_nbc_request_t is to be used in an
opal_free_list_t, it must be returned into a "clean" state.
So cleanup some data in the callback completion subroutines.

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-08 10:48:06 +09:00
Gilles Gouaillardet
f8eef0fde9 coll/libnbc: fixes ompi ompi_coll_libnbc_request_t parent
base ompi_coll_libnbc_request_t on top of ompi_coll_base_nbc_request_t
to correctly support the retention of datatypes/operators

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-08 10:47:48 +09:00
Michael Heinz
0348d14ff3 PSM MTL is obsolete and should be removed
The PSM MTL for Intel's TrueScale Infiniband HCAs is not being actively
maintained and should be removed from the master branch.

Fixes issue: #6877

Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
2019-08-07 11:43:03 -04:00
Yossi Itigin
ec9def1406
Merge pull request #6864 from hoopoepg/topic/ucx-ppn-hint
UCX: added PPN hint for UCX context
2019-08-07 13:45:38 +03:00
Edgar Gabriel
34b06dc8bd io_ompio_file_open: fix offset calculation with SEEK_END
and SEEK_CUR. Fixes an issue reported by Wei-keng Liao.

Fixes Issue #6858

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-08-05 15:56:25 -05:00
Ralph Castain
0e878c1ac3
Silence Coverity warning
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-05 09:20:54 -07:00
Sergey Oblomov
43186e494b UCX: added PPN hint for UCX context
- added PPN hint for UCX context init

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-08-05 18:07:06 +03:00
Gilles Gouaillardet
01fe53d531 fortran/use-mpi-f08: slurp missing code
Split the sentinel library in ompi/mpi/fortran/use-mpi-f08 into
 - the real sentinel that contains no code (only used to build the .mod files)
 - an internal library that does contain some code
and have libmpi_usempif08.la slurp the latter.

This fixes a regression introduced in open-mpi/ompi@5de5e751ed

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-04 17:06:33 +09:00
Gilles Gouaillardet
68ef097f1d
Merge pull request #6811 from ggouaillardet/topic/usempif08_sentinel
fortran/use-mpi-f08: do not slurp the sentinel module files
2019-08-01 10:45:47 +09:00
Nysal Jan K A
3c45542c51
Merge pull request #6840 from nysal/ucx_accumulate_fix
osc/ucx: Fix data corruption with non-contiguous accumulates
2019-07-25 22:11:52 +05:30
Yossi Itigin
98d0ecfe14
Merge pull request #6814 from brminich/tuned_all2all_select
COLL/TUNED: Update alltoall selection rule for mellanox platform
2019-07-25 17:51:55 +03:00
Mikhail Brinskii
65618f8db8 COLL/TUNED: Minor var names/comments fixes
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-07-24 10:23:38 +00:00
Nysal Jan K.A
3529d44702 osc/ucx: Fix data corruption with non-contiguous accumulates
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2019-07-24 13:07:59 +05:30
bosilca
94f26f5a51
Merge pull request #6695 from bosilca/fix/vector_stride_0
A big refresh of the datatype engine
2019-07-23 15:20:14 -04:00
Ralph Castain
8f32a59304
Merge pull request #6830 from rhc54/topic/dpm
Provide locality for all procs on node
2019-07-23 08:10:57 -07:00
Nysal Jan K A
20dd06c151
Merge pull request #6826 from nysal/ucx_nolocks_infokey
osc/ucx: Add support for the no_locks info key
2019-07-23 15:33:39 +05:30
Gilles Gouaillardet
102a46e28a
Merge pull request #6812 from ggouaillardet/topic/mpifh_c_ierr
fortran/mpif-h: fix C to Fortran error code conversion
2019-07-23 17:07:26 +09:00
KAWASHIMA Takahiro
facf8c5e98 pcollreq/mpif-h: fix MPIX_Alltoallw_init() binding
These issues were introduced in the recent commit b71af0eca0.
This commit fixes Coverity CID 1451661 and 1451660.

Though `c_info` part was an actual bug, the `c_sendtypes` part was not.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-07-23 08:45:17 +09:00
Ralph Castain
d202e10c14
Provide locality for all procs on node
Update PMIx to latest master to get supporting updates. For
connect/accept (part of comm_spawn as well), lookup locality for all
participating procs on the node and compute the relative locality so it
can be used for MPI operations.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-07-22 09:23:38 -07:00
Nysal Jan K.A
14808922cf osc/ucx: Add support for the no_locks info key
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2019-07-18 17:29:01 +05:30
Gilles Gouaillardet
b71af0eca0 pcollreq/mpif-h: fix MPIX_Alltoallw_init() binding
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-07-17 11:58:18 +09:00
Gilles Gouaillardet
ed703bec1b fortran/mpif-h: fix [i]alltoallw bindings
Fix a regression introduced in open-mpi/ompi@cdaed89d04

Fixes CID 1451610, 1451611 and 1451612

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-07-17 11:14:35 +09:00
Mikhail Brinskii
404c480068 COLL/TUNED: Update alltoall selection rule for mlx
Use linear with sync alltoall algorithm for certain message/comm size
ranges. Does not affect default fixed decision, unless HPCX (with its
custom parameters) is used or corresponding mca is set.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-07-13 23:27:40 +03:00
Gilles Gouaillardet
cdaed89d04 fortran/mpif-h: fix MPI_[I]Alltoallw() binding
 - ignore sendcounts, sendispls and sendtypes arguments when MPI_IN_PLACE is used
 - use the right size when an inter-communicator is used.

Thanks Markus Geimer for reporting this.

Refs. open-mpi/ompi#5459

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-07-13 22:34:30 +09:00
Gilles Gouaillardet
223e6cc537 fortran/mpif-h: fix C to Fortran error code conversion
 - remove incorrect use of OMPI_INT_2_FINT()
 - use homogeneous syntax (e.g. c_ierr = PMPI_...())

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-07-13 18:36:12 +09:00
Gilles Gouaillardet
5de5e751ed fortran/use-mpi-f08: do not slurp the sentinel module files
A sentinel is only an internal Fortran module and hence should not
be slurped into libmpi_usempif08.so

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-07-13 16:50:55 +09:00
Gilles Gouaillardet
020a5918af
Merge pull request #2154 from ggouaillardet/topic/retain_op_and_datatypes
non-blocking collectives: retain MPI_op and MPI_Datatype(s)
2019-07-13 10:20:36 +09:00