1
1

29593 Коммитов

Автор SHA1 Сообщение Дата
Matias Cabral
0601b3e982
Merge pull request #6325 from aravindksg/fix_help_reference
mtl/ofi: Fix reference to help text object
2019-02-05 07:22:51 -08:00
Ralph Castain
e2c7224281
Merge pull request #6356 from rhc54/topic/pmixup
Update to latest PRI master
2019-02-04 13:04:22 -08:00
Ralph Castain
baef25338a Update to latest PRI master
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-04 10:10:58 -08:00
bosilca
89fa06135e
Merge pull request #6348 from ggouaillardet/topic/opal_datatype_clone
opal/datatype: reset ptypes in opal_datatype_clone()
2019-02-01 00:36:33 -05:00
Gilles Gouaillardet
b395342c9f opal/datatype: reset ptypes in opal_datatype_clone()
Reset ptypes when cloning a datatype in order to prevent
a double free() in the opal_datatype_t destructor.

This fixes a bug introduced in open-mpi/ompi@7c938f070f

Fixes open-mpi/ompi#6346

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-01 11:20:13 +09:00
bosilca
2cf6944e70
Merge pull request #6326 from bosilca/fix/convertor_raw
Provide a better fix for #6285.
2019-01-31 18:20:46 -05:00
George Bosilca
5a82c4fd07
Provide a better fix for #6285.
The issue was a little complicated due to the internal stack used in the
convertor. The main issue was that in the case where we run out of iov
space to save the raw description of the data while hanbdling a
repetition (loop), instead of saving the current position and bailing out
directly we reading of the next predefined type element. It worked in
most cases, except the one identified by the HDF5 test. However, the
biggest issue here was the drop in performance for all ensuing calls to
the convertor pack/unpack, as instead of handling contiguous loops as a
whole (and minimizing the number of memory copies) we copied data
description by data description.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-01-31 10:01:48 -05:00
Jeff Squyres
4c64322db4
Merge pull request #6334 from jsquyres/pr/make-mpi-h-a-little-more-c++-friendly
mpi.h.in: use C++ static_cast<> where appropriate
2019-01-31 07:14:34 -05:00
Jeff Squyres
30afdcead9 mpi.h.in: use C++ static_cast<> where appropriate
When compiling mpi.h with a modern C++ compiler and a high degree of
pickyness (e.g., -Wold-style-cast), casting using (void*) in the
OMPI_PREDEFINED_GLOBAL and MPI_STATUS*_IGNORE macros will emit
warnings.  So if we're compiling with a C++ compiler, use C++'s
static_cast<> instead of (void*).

Thanks to @shadow-fax for identifying the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-01-31 03:22:26 -08:00
Ralph Castain
c03407320d
Merge pull request #6338 from rhc54/topic/rmlofi
Remove stale rml/ofi component
2019-01-30 14:10:55 -08:00
Ralph Castain
8794077520 Remove stale rml/ofi component
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-01-30 12:41:50 -08:00
Thananon Patinyasakdikul
782ec851ea
Merge pull request #6319 from thananon/pr/allow_overtake
pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
2019-01-30 15:32:04 -05:00
Thananon Patinyasakdikul
58244b36d1
Merge pull request #6320 from thananon/pr/wait_sync_fix
opal/threads: reverted #6199
2019-01-30 15:31:27 -05:00
Nathan Hjelm
2c8f745d8d
Merge pull request #6337 from hjelmn/btl_vader_fix_a_stupid_error_in_the_fragment_sizes_used_by_the_free_lists_that_can_cause_weird_results
btl/vader: fix fragment sizes used by free lists
2019-01-30 13:26:57 -07:00
Nathan Hjelm
b51c8f888c btl/vader: fix fragment sizes used by free lists
This commit fixes a bug introduced in
f62d26ddbc8cda4d985cceee531a2ec32406d1f6. That commit changed how
vader allocates fragment memory from the shared memory
segment. Unfortunately, the values used for the fragment sizes did not
include space for the fragment header. This can cause an overrun of
data from one fragment to the header of the next fragment.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-30 12:31:34 -07:00
Jeff Squyres
2203f8d900
Merge pull request #6185 from ggouaillardet/topic/hwloc_macros
hwloc: remove public hwloc macros from opal_config.h
2019-01-30 07:32:22 -05:00
Gilles Gouaillardet
0aeb27f776 topo/treematch: silence a hwloc related warning
treematch/km_partitioning.c #include "config.h",
but there is no such file when the embedded treematch is used.

In order to prevent the embedded treematch from incorrectly using
the config.h from the embedded hwloc, generate a dummy config.h.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-30 14:51:38 +09:00
Aravind Gopalakrishnan
9cabcfdbba mtl/ofi: Fix reference to help text object
When we exceed the threshold number of contexts created, print appropriate help
text

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2019-01-29 15:10:06 -08:00
bosilca
29915fc943
Merge pull request #6292 from ggouaillardet/topic/opal_datatype_destruct
opal/datatype: plug a memory leak in opal_datatype_t destructor
2019-01-29 17:33:18 -05:00
Thananon Patinyasakdikul
0263456cf4 pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
We missed an assert to check if ALLOW_OVERTAKE is set or not before
validating the sequence number and this will cause deadlock.

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2019-01-29 14:55:06 -05:00
Thananon Patinyasakdikul
56d3e0a43d opal/threads: reverted #6199
This commit reverted pr #6199 as it introduced deadlock in some cases.
Also removed the assert as the condition is obsoleted.

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2019-01-29 13:34:44 -05:00
Nathan Hjelm
ea40d48899
Merge pull request #6295 from ggouaillardet/topic/opal_convertor_raw
opal/datatype: fix opal_convertor_raw()
2019-01-29 10:57:29 -07:00
Nathan Hjelm
f9338dac93
Merge pull request #6312 from ggouaillardet/topic/op
ompi/op: fix support of non predefined datatypes with predefined oper…
2019-01-29 10:55:00 -07:00
Brian Barrett
23da9fac23
Merge pull request #6294 from bwbarrett/mtl-ofi-no-device-warning
mtl/ofi: Print descriptive error message on modex failure
2019-01-29 08:32:49 -08:00
Brian Barrett
1bb7a73a9c
Merge pull request #6302 from bwbarrett/feature/ofi-av-count
mtl/ofi: Provide av count hint during initialization
2019-01-29 08:32:24 -08:00
Edgar Gabriel
7023357843
Merge pull request #6286 from edgargabriel/pr/floating-point-division-problem
common/ompio: fix a floating point division problem
2019-01-29 10:07:09 -06:00
Gilles Gouaillardet
c8790d29de opal: remove unnecessary #include file
opal_config_bottom.h can only be #include'd in opal_config.h,
so there is no need to #include "opal_config.h" inside.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-29 07:40:54 -08:00
Gilles Gouaillardet
73d104f695 hwloc/base: fix some off-by-one errors
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-29 07:36:56 -08:00
Jeff Squyres
f22b7d4f46 hwloc/external.h: fix a clash with external HWLOC_VERSION[*]
Some macros defined by the embedded hwloc ends up in opal_config.h
because hwloc configury m4 files are slurped into Open MPI.  These
macros are not required here, and they might conflict with an external
hwloc install, so simply #undef them in hwloc/external/external.h
after including <opal_config.h> but before including the external
<hwloc.h>.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-01-29 07:36:01 -08:00
Gilles Gouaillardet
bc1cab5498 ompi/op: fix support of non predefined datatypes with predefined operators
ACCUMULATE, unlike REDUCE, can use with derived
datatypes with predefinied operations, with some
restrictions outlined in MPI-3:11.3.4.  The derived
datatype must be composed entierly from one predefined
datatype (so you can do all the construction you want,
but at the bottom, you can only use one datatype, say,
MPI_INT).

Refs. open-mpi/ompi#6275

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-29 09:33:39 +09:00
bosilca
402606ea8e
Merge pull request #6305 from ggouaillardet/topic/ompi_datatype_set_args
ompi/datatype: fix how we compute the space needed for the args
2019-01-28 12:52:18 -05:00
Nathan Hjelm
7633c427b1
Merge pull request #6304 from uberlinuxguy/issue_6303_race_fix
Adding changes for issue #6303 for branch master.
2019-01-28 08:53:34 -07:00
Gilles Gouaillardet
45fb69b2b9 ompi/datatype: fix how we compute the space needed for the args
Refs. open-mpi/ompi#6275

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-28 15:26:11 +09:00
Jason Williams
98d81a5f7a Adding changes for issue #6303 for branch master.
Signed-off-by: Jason Williams <uberlinuxguy@gmail.com>
2019-01-26 10:49:47 -05:00
Brian Barrett
44be7f139a mtl/ofi: Provide av count hint during initialization
Provide the av_attr.count hint (number of addresses that will be
inserted into the address vector through the life of the process)
at initialization of the address vector.  It's ok to be a bit
wrong, but some endpoints (RxR) can benefit by not going through
the slow growth realloc churn.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2019-01-24 15:47:24 -08:00
Gilles Gouaillardet
0832ab5acc opal/datatype: fix opal_convertor_raw
correctly handle the case in which iovec is full and the
last accessed element of the datatype is the beginning of a loop

Refs. open-mpi/ompi#6285

Thanks Axel Huebl for reporting this

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-23 15:38:43 +09:00
Gilles Gouaillardet
7c938f070f opal/datatype: plug a memory leak in opal_datatype_t destructor
correctly free ptypes if the datatype is not pre-defined.

Thanks Axel Huebl for reporting this.

Refs. open-mpi/ompi#6291

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-22 10:57:57 +09:00
Edgar Gabriel
c0f8ce0fff common/ompio: fix a floating point division problem
This commit fixes  a problem reported on the mailing list with
individual writes larger than 512 MB.

The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.

See issue #6285 and

 https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118

Thanks for Axel Huebl and René Widera for reporting the issue.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-01-21 17:59:12 -06:00
Brian Barrett
fe25097194 mtl/ofi: Print descriptive error message on modex failure
With MTLs, there's no "other transport" when the remote side
does not have an active NIC, so we should print a useful error
message when the modex failed (indicating lack of a NIC on
the remote side).

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2019-01-21 23:50:31 +00:00
KAWASHIMA Takahiro
352b667323
Merge pull request #6210 from kawashima-fj/pr/mpiext-use-mod
Use mpi_f08 module in mpi_f08_ext module
2019-01-21 11:56:41 +09:00
Edgar Gabriel
e67133c36c
Merge pull request #6287 from psychocoderHPC/fix-possibleFloatRoundIssue
common/ompio: possible rounding issue
2019-01-19 08:44:27 -06:00
René Widera
a91fab80a1 common/ompio: possible rounding issue
Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <r.widera@hzdr.de>
2019-01-18 14:05:23 +01:00
Yossi Itigin
387b2ff56f
Merge pull request #6260 from hoopoepg/topic/removed-fca
COLL: removed FCA component
2019-01-17 00:05:07 +08:00
KAWASHIMA Takahiro
b380dd58b5 config/ompi_ext: use mpi module in mpi_ext module
If MPI extensions are enabled, all
`ompi/mpiext/pcollreq/use-mpi/mpiext_*_usempi.h` are included in
`ompi/mpi/fortran/mpiext-use-mpi/mpi-ext-module.F90` and all
`ompi/mpiext/pcollreq/use-mpi/mpiext_*_usempif08.h` are included in
`ompi/mpi/fortran/mpiext-use-mpi-f08/mpi-f08-ext-module.F90` using
`#include` directives.

In `mpiext_*_usempi.h` and `mpiext_*_usempif08.h`, some MPI extension
may want to use constants or handles defined in the `mpi` module and
the `mpi_f08` module. For example, if you want to define a new
datatype in `mpi_f08_ext`, you'll need the definition of
`type(mpi_datatype)`. However, putting `use mpi_f08` line in thier
`mpiext_*_usempif08.h` may cause a compilation error if more than
one MPI extensions are enabled because the `use` statement must be
put prior to any variable declarations.

To resolve this problem, this commit puts `use mpi` and `use mpi_f08`
as first lines of `mpi-ext-module.F90` and `mpi-f08-ext-module.F90`
respectively.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-01-16 11:55:55 +09:00
KAWASHIMA Takahiro
2220623f34 config/ompi_ext: Don't include mpiext_*_mpifh.h in mpi_f08_ext
Including `mpiext_*_mpifh.h` in the source file of the `mpi_f08_ext`
module is not always appropriate. For example, if you want to define
a new datatype in an MPI extension, the `include 'mpif-ext.h'` binding
defines the datatype as `integer` but the `use mpi_f08_ext` binding
defines it as `type(mpi_datatype)`. They conflict.

This commit allows each MPI extension to declare whether it wants to
include its `mpiext_*_mpifh.h` in `mpi_f08` and `mpi_f08_ext`
respectively. The default (no declaration) is 'want'.

See `ompi/mpiext/example/configure.m4` for an example.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-01-16 11:55:55 +09:00
Nathan Hjelm
fe5d5c75f9
Merge pull request #6279 from hjelmn/ompi_memory_usage
mpool: add new base module type "basic"
2019-01-15 09:48:38 -07:00
Nathan Hjelm
f62d26ddbc btl/vader: use basic mpool type to handle frag/fbox allocation
This commit updates btl/vader to use an mpool for handling all shared
memory allocations (frags, fboxes).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-14 15:54:16 -07:00
Nathan Hjelm
6ffc7cc96c mpool: add new base module type "basic"
This commit adds a new mpool base module type: basic. This module can
be used with an opal_free_list_t to allocate space from a
pre-allocated block (such as a shared memory region). The new module
only supports allocation and is not meant for more dynamic use cases
at this time.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-14 15:48:46 -07:00
Matias Cabral
dc6eb5d1a2
Merge pull request #6226 from aravindksg/sep_mca
mtl/ofi: Add MCA variables to enable SEP and to request OFI contexts
2019-01-14 11:00:37 -08:00
Aravind Gopalakrishnan
37f9aff2a0 mtl/ofi: Add MCA variables to enable SEP and to request number of OFI contexts
Moving to a model where we have users actively _enable_ SEP feature for use
rather than opening SEP by default if provider supports it. This allows us to
not regress (either functionally or for performance reasons) any apps that were
working correctly on regular endpoints.

Also, providing MCA to specify number of OFI contexts to create and default
this value to 1 (Given btl/ofi also creates one by default, this reduces the
incidence of a scenario where we allocate all available contexts by default and
if btl/ofi asks for one more, then provider breaks as it doesn't support it).

While at it, spruce up README on SEP content.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2019-01-14 09:58:36 -08:00