1
1

9359 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
6aa658ae33 ompi/request: change semantics of ompi request callbacks
This commit changes the sematics of ompi request callbacks. If a
request's callback has freed or re-posted (using start) a request
the callback must return 1 instead of OMPI_SUCCESS. This indicates
to ompi_request_complete that the request should not be modified
further. This fixes a race condition in osc/pt2pt that could lead
to the req_state being inconsistent if a request is freed between
the callback and setting the request as complete.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-17 20:14:01 -06:00
Jeff Squyres
fb894e6e3e pkg-config: fix static linking
We need to list all major project libraries in the private libraries
line to enable static linking to work properly.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-17 20:37:51 -05:00
Sylvain Jeaugey
61e900eea5 Fix typo calling allreduce with the allgather module.
That was causing CUDA collective to crash.
2016-08-17 17:05:13 -07:00
Edgar Gabriel
e14c23ba79 Merge pull request #1980 from edgargabriel/topic/coverty-cleanup
io/ompio: Topic/coverty cleanup
2016-08-17 17:27:51 -05:00
Edgar Gabriel
2c8437ce62 fs/pvfs2: fix a common symbol 2016-08-17 13:10:32 -05:00
Edgar Gabriel
eba5293586 fix coverty warning CID 1369021 2016-08-17 13:02:45 -05:00
Nathan Hjelm
40b70889e5 osc/pt2pt: make receive count an unsigned int
This receive_count MCA variable should never be negative. Change it
to an unsigned int.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-08-17 08:14:24 -06:00
Gilles Gouaillardet
8faa1edafa osc/pt2pt: silence misc warnings 2016-08-17 14:24:14 +09:00
LANL OMPI Bot
96c7762050 Merge pull request #1942 from hppritcha/topic/minor_ofi_fix
mtl/ofi: use mca param to set av type
2016-08-16 14:14:12 -06:00
Nathan Hjelm
9444df1eb7 osc/pt2pt: make lock_all locking on-demand
The original lock_all algorithm in osc/pt2pt sent a lock message to
each peer in the communicator even if the peer is never the target of
an operation. Since this scales very poorly the implementation has
been replaced by one that locks the remote peer on first communication
after a call to MPI_Win_lock_all.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-11 15:33:07 -06:00
Nathan Hjelm
7589a25377 osc/pt2pt: do not repost receive from request callback
This commit fixes an issue that can occur if a target gets overwhelmed with
requests. This can cause osc/pt2pt to go into deep recursion with a stack
like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb
-> ... . At small scale this is fine as the recursion depth stays small but
at larger scale we can quickly exhaust the stack processing frag requests.
To fix the issue the request callback now simply puts the request on a
list and returns. The osc/pt2pt progress function then handles the
processing and reposting of the request.

As part of this change osc/pt2pt can now post multiple fragment receive
requests per window. This should help prevent a target from being overwhelmed.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-08-11 15:33:07 -06:00
George Bosilca
8d0baf140f If the RTE fails to deliver the daemon information,
gracefully fallback to a non-reordered communicator.
Optimize the loops building the process hierarchy.
2016-08-11 13:04:27 -04:00
Howard Pritchard
e46eee3fcb mtl/ofi: use mca param to set av type
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-08-10 16:10:17 -06:00
Gilles Gouaillardet
dfbf2b7be4 opal/threads: add OPAL_THREAD_SUB_SIZE_T macro
-1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1),
simply OPAL_THREAD_SUB_SIZE_T(..., 1) and keep picky compilers happy
2016-08-10 13:37:36 +09:00
Nathan Hjelm
799104f688 Merge pull request #1947 from hjelmn/perf
pml/ob1: be more selective when using rdma capable btls
2016-08-09 22:15:09 -06:00
Nathan Hjelm
4079eec974 pml/ob1: be more selective when using rdma capable btls
This commit updates the btl selection logic for the RDMA and RDMA
pipeline protocols to use a btl iff: 1) the btl is also used for eager
messages (high exclusivity), or 2) no other RDMA btl is available on
an endpoint and the pml_ob1_use_all_rdma MCA variable is true. This
fixes a performance regression with shared memory when an RDMA capable
network is available.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-09 20:54:42 -06:00
Ralph Castain
ba77d9beff Remove forced debugs 2016-08-08 13:20:24 -07:00
Nathan Hjelm
2788083b98 Merge pull request #1936 from hjelmn/osc_pt2pt_fix
osc/pt2pt: do not set rdma_frag after start
2016-08-08 14:17:40 -06:00
Nathan Hjelm
e4d7ea75a9 Merge pull request #1935 from hjelmn/persistent_fix
pml/ob1: reset req_bytes_packed on start
2016-08-08 14:17:13 -06:00
Todd Kordenbrock
3be6052523 Merge pull request #1896 from PDeveze/Patchs-on-coll-portals4
Patchs on coll portals4
2016-08-08 14:57:02 -05:00
Edgar Gabriel
fb9fa4fbc4 Merge pull request #1938 from edgargabriel/pr/barrier-on-close
io/ompio: Add barrier to file_close and to file_set_size
2016-08-08 09:22:08 -05:00
Edgar Gabriel
4709f4229b Merge pull request #1929 from edgargabriel/pr/ompio-code-reorg
io/ompio: next step in code-reorganization
2016-08-08 09:20:54 -05:00
Thananon Patinyasakdikul
23b27c510c romio: make romio use internal opal_random instead of rand(3).
This fixes issue #1877
2016-08-05 09:04:52 -07:00
Howard Pritchard
ff669e7b15 code cleanup: clang is now a happier panda
Clang 5.1 on my mac was a sad panda compiling a couple
of files,  complaining about uninitialized stack variables.

This commit makes clang a happier panda (or at least not so sad).

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-08-04 19:34:44 -06:00
Edgar Gabriel
9c3180160c io/ompio: Add barrier to file_close and to file_set_size
This fixes a bug reported on the mailing for ompio.
https://www.open-mpi.org/community/lists/users/2016/05/29333.php
2016-08-04 11:20:31 -05:00
Jeff Squyres
c2634612d7 Merge pull request #1870 from jsquyres/pr/fix-unweighted-and-weights-empty
mpi.h: fix types of MPI_UNWEIGHTED and MPI_WEIGHTS_EMPTY
2016-08-04 08:49:58 -07:00
Gilles Gouaillardet
60e91e890a coll/base: give a boost to ompi_coll_base_sendrecv_nonzero_actual()
Based on current implementation it is faster to use a blocking
send than the non-blocking version. Switch the exchange function
used in the barrier to use the blocking version combined with
the non-blocking version of the receive.

This is similar to open-mpi/ompi@223d75595d
2016-08-04 13:31:07 +09:00
Nathan Hjelm
11c853d05e osc/pt2pt: do not set rdma_frag after start
It is possible for the start call to complete the requests. For this
reason the module rdma_frag field should be filled in before start is
called. If the request completes the completion callback will reset
the rdma_frag field to NULL. Fixes a bug discovered by @tkordenbrock.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-03 15:20:36 -06:00
Nathan Hjelm
889dd32806 pml/ob1: reset req_bytes_packed on start
On start we were not correctly resetting all request fields. This was
leading to a double-completion on persistent receives. This commit
updates the base start code to reset the receive req_bytes_packed and
the send request convertor.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-03 11:29:30 -06:00
KAWASHIMA Takahiro
642acfbb8c Merge pull request #1927 from kawashima-fj/pr/fortran-named-constants
fortran: Correct named constants and a datatype name
2016-08-03 08:40:36 +09:00
Edgar Gabriel
aa7e852e44 common/ompio: files are only compiled in case MPI I/O is requested
fixes: open-mpi/ompi#1932
2016-08-02 15:01:38 -05:00
George Bosilca
087761c2dc Fix a warning and other small cleanups. 2016-08-02 17:33:53 +02:00
Edgar Gabriel
19fe5cac50 io/ompio: next step in code-reorganization
- move the sort_iovec operations to fcoll/base
 - move set_view_internal to common/ompio
 - move set_file_default to common/ompio
 - remove io_ompio_sort, not used anymore.
2016-08-02 09:18:29 -05:00
KAWASHIMA Takahiro
722e898e8e fortran: Correct the name of MPI_INTEGER16.
The name of `MPI_INTEGER16` obtained using `MPI_TYPE_GET_NAME`
from Fortran program was incorrect (`MPI_INTEGER8` was obtained)
when `INTEGER*16` is not supported by a compiler.

This bug affects only the Fortran binding because `MPI_INTEGER16`
is not defined in `mpi.h` if a compiler does not support it.
2016-08-02 22:37:58 +09:00
KAWASHIMA Takahiro
5383003eab fortran: Add missing predefined datatype named constants.
This commit add the following Fortran named constants which are
defined in the MPI standard but are missing in Open MPI.

- `MPI_LONG_LONG` (defined as a synonym of `MPI_LONG_LONG_INT`)
- `MPI_CXX_FLOAT_COMPLEX`
- `MPI_C_BOOL`

And this commit also changes the value of the following Fortran
named constant for consistency.

- `MPI_C_COMPLEX`
  `(MPI_C_FLOAT_COMPLEX` is defined as a synonym of this)

Each needs a different solution described below.

For `MPI_LONG_LONG`:

The value of `MPI_LONG_LONG` is defined to have a same value
as `MPI_LONG_LONG_INT` because of the following reasons.

1. It is defined as a synonym of `MPI_LONG_LONG_INT` in
   the MPI standard.
2. `MPI_LONG_LONG_INT` and `MPI_LONG_LONG` has a same value
   for C in `mpi.h`.
3. `ompi_mpi_long_long` is not defined in
   `ompi/datatype/ompi_datatype_module.c`.

For `MPI_CXX_FLOAT_COMPLEX`:

Existing `MPI_CXX_COMPLEX` is replaced with `MPI_CXX_FLOAT_COMPLEX`
bacause `MPI_CXX_FLOAT_COMPLEX` is the right name defined in MPI-3.1
and `MPI_CXX_COMPLEX` is not defined in MPI-3.1 (nor older).
But for compatibility, `MPI_CXX_COMPLEX` is treated as a synonym
of `MPI_CXX_FLOAT_COMPLEX` on Open MPI.

For `MPI_C_BOOL`:

`MPI_C_BOOL` is newly added. The value which `MPI_C_COMPLEX` had
used (68) is assinged for it because the value becomes no longer
in use (described later) and it is a suited position as a datatype
added on MPI-2.2.

For `MPI_C_COMPLEX`:

Existing `MPI_C_FLOAT_COMPLEX` is replaced with `MPI_C_COMPLEX`
and `MPI_C_FLOAT_COMPLEX` is changed to have the same value.
In other words, make `MPI_C_COMPLEX` the canonical name and
make `MPI_C_FLOAT_COMPLEX` an alias of it.
This is bacause the relation of these datatypes is same as
the relation of `MPI_LONG_LONG_INT` and `MPI_LONG_LONG`, and
`MPI_LONG_LONG_INT` and `MPI_LONG_LONG` are implemented like that.
But in the datatype engine, we use `ompi_mpi_c_float_complex`
instead of `ompi_mpi_c_complex` as a variable name to keep
the consistency with the other similar types such as
`ompi_mpi_c_double_complex` (see George's comment in open-mpi/ompi#1927).
We don't delete `ompi_mpi_c_complex` now because it is used in
some other places in Open MPI code. It may be cleand up in the future.

In addition, `MPI_CXX_COMPLEX`, which was defined only in the Open MPI
Fortran binding, is added to `mpi.h` for the C binding.

This commit breaks binary compatibility of Fortran `MPI_C_COMPLEX`.
When this commit is merged into v2.x branch, the change of
`MPI_C_COMPLEX` should be excluded.
2016-08-02 22:36:41 +09:00
Gilles Gouaillardet
917d96ba50 coll/libnbc: cleanup handling of the second temporary buffer in ireduce 2016-08-02 16:32:15 +09:00
Gilles Gouaillardet
ed9139ca13 coll/libnbc: correctly handle datatype alignment when allocating two buffers at once 2016-08-02 15:44:12 +09:00
KAWASHIMA Takahiro
0cb5dfe18d fortran: Correct predefined datatype named constants. 2016-08-02 13:12:22 +09:00
KAWASHIMA Takahiro
ad3b590172 fortran: Add missing MPI_NO_OP and MPI_WIN_* named constants. 2016-08-02 13:10:44 +09:00
Edgar Gabriel
c0bd8728fd io/ompio: move aggregator selection code to a separate file
- move all functions related to aggregator selection to a single file
- perform code cleanup fixing many Coverty complains along the way.
2016-08-01 14:04:27 -05:00
Jeff Squyres
50952c3a31 Merge pull request #1912 from rivis/pr/mpisync
mpisync: Fix a compilation error.
2016-07-29 14:53:44 -04:00
Edgar Gabriel
160d9a78c1 Merge pull request #1886 from edgargabriel/pr/ompio-reorg
io/ompio: move io/ompio functionality to common/ompio
2016-07-29 12:24:21 -05:00
Joshua Ladd
4a03a657c6 Merge pull request #1913 from vspetrov/hcoll_derived_datatypes
coll/hcoll mpi datatypes support
2016-07-29 10:08:23 -04:00
Nathan Hjelm
1da558407c Merge pull request #1911 from hjelmn/threads
opal/thread: clean up and add additional OPAL_THREAD macros
2016-07-29 06:44:11 -06:00
Valentin Petrov
3582bba6b7 coll/hcoll mpi datatypes support 2016-07-29 10:06:39 +03:00
Gilles Gouaillardet
9f3e1a0620 Merge pull request #1898 from ggouaillardet/topic/poc_configury_cli
configury: capture configury command line
2016-07-29 11:55:21 +09:00
Howard Pritchard
5ff6b81eee Merge pull request #1871 from hppritcha/topic/ofi_mtl_params
mtl/ofi: add some more mca parameters
2016-07-28 18:21:23 -06:00
Gilles Gouaillardet
273e56096b configury: capture configury command line
configury command line is quoted and made available via the OPAL_CONFIGURE_CLI macro.
it can be retrieved via {orte-info,ompi_info,oshmem_info} -c, or
{orte-info,ompi_info,oshmem_info} --all --parseable | grep ^config:cli:
2016-07-29 09:14:09 +09:00
Ralph Castain
cacb582ecd Support timeout values when performing connect/accept operations. Bump default timeout to 10 minutes so folks have time to start the partnering application 2016-07-28 14:09:06 -07:00
KAWASHIMA Takahiro
2a932f48ad mpisync: Fix a compilation error.
This commit fixes the undefined `OPAL_MAXHOSTNAMELEN` error
which arises only when `--enable-timing` is specified for
`configure`.

This bug exists only in master branch because the commit 3322347
is not merged into other branches.
2016-07-29 02:38:25 +09:00