1
1

9707 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
26f44da429 coll/base: fix mca_coll_base_alltoallv_intra_basic_inplace()
correctly handle the case when a MPI task has no data to send/recv

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 15:19:14 +09:00
Gilles Gouaillardet
eaf050cfe1 romio314: adio/ad_nfs: fix buffer overflows in ADIOI_NFS_{Read,Write}Strided
Refs: models/mpich#2338
Refs: models/mpich#2617

Signed-off-by: Rob Latham <robl@mcs.anl.gov>

(back-ported from upstream commit pmodels/mpich@642db57648)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:11:12 +09:00
Gilles Gouaillardet
02af10ce6e romio314: update NFS read/write routines for large xfers
When we updated UFS and others we left NFS alone.  HDF group would like
a fix, so here we go.

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>

(back-ported from upstream commit pmodels/mpich@684df9f4c9)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:07:47 +09:00
Jeff Squyres
7185567d50 Merge pull request #3455 from jsquyres/pr/fix-lustre-configure
Lustre configure fixes
2017-05-08 16:49:23 -04:00
Ralph Castain
ef0e0171c9 Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c).
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-08 10:04:50 -07:00
KAWASHIMA Takahiro
e453e42279 group: Fix ompi_group_have_remote_peers
`ompi_group_t::grp_proc_pointers[i]` may have sentinel values even
for processes which reside in the local node because the array for
`MPI_COMM_WORLD` is set up before `ompi_proc_complete_init`, which
allocates `ompi_proc_t` objects for processes reside in the local
node, is called in `MPI_INIT`. So using `ompi_proc_is_sentinel`
against `ompi_group_t::grp_proc_pointers[i]` in order to determine
whether the process resides in a remote node is not appropriate.

This bug sometimes causes an `MPI_ERR_RMA_SHARED` error when
`MPI_WIN_ALLOCATE_SHARED` is called, where sm OSC uses
`ompi_group_have_remote_peers`.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-05-08 20:28:51 +09:00
KAWASHIMA Takahiro
913adce59b Revert "group: Fix ompi_group_have_remote_peers" 2017-05-08 18:42:18 +09:00
Artem Polyakov
858d8cdff7 Merge pull request #3375 from artpol84/comm_create/master
ompi/comm: Improve MPI_Comm_create algorithm
2017-05-05 20:41:16 -07:00
Jeff Squyres
c81bc50198 fs/lustre: remove redundant/dead code
We check for liblustreapi.h in OMPI_CHECK_LUSTRE, so this code was
commented out here.  Might as well fully delete it, since it's
redundant and dead.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-05 05:28:33 -07:00
Nathan Hjelm
4676575343 Merge pull request #3410 from kawashima-fj/pr/group-remote-peers
group: Fix `ompi_group_have_remote_peers`
2017-05-04 09:20:35 -06:00
KAWASHIMA Takahiro
28281190eb Merge pull request #3402 from kawashima-fj/pr/java
mpi/java: Add missing Java binding methods
2017-04-27 15:45:49 +09:00
Yossi
f56847542e Merge pull request #3347 from alinask/topic/ucx-sync-send
PML UCX: handle a synchronous send.
2017-04-26 18:02:09 +03:00
Alina Sklarevich
49913c692a PML UCX: unite the code for all the sending modes.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-26 13:17:06 +03:00
Gilles Gouaillardet
96b00b0fcf f08: make procedure(MPI_User_function) type available from mpi_f08
Refs. open-mpi/ompi#3409

Thanks Nathan T. Weeks for the report

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-25 13:33:00 +09:00
KAWASHIMA Takahiro
f036bac4c2 group: Fix ompi_group_have_remote_peers
`ompi_group_t::grp_proc_pointers[i]` may have sentinel values even
for processes which reside in the local node because the array for
`MPI_COMM_WORLD` is set up before `ompi_proc_complete_init`, which
allocates `ompi_proc_t` objects for processes reside in the local
node, is called in `MPI_INIT`. So using `ompi_proc_is_sentinel`
against `ompi_group_t::grp_proc_pointers[i]` in order to determine
whether the process resides in a remote node is not appropriate.

This bug sometimes causes an `MPI_ERR_RMA_SHARED` error when
`MPI_WIN_ALLOCATE_SHARED` is called, where sm OSC uses
`ompi_group_have_remote_peers`.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-25 11:00:52 +09:00
Jeff Squyres
7ea05954bf Merge pull request #3399 from jsquyres/pr/add-aint-add-diff
mpif-externals.h: add missing MPI_AINT_ADD/MPI_AINT_DIFF
2017-04-24 15:47:43 -04:00
KAWASHIMA Takahiro
3699ce1f75 mpi/java: Set the given error handler to Win
Probably setting `MPI_ERRORS_RETURN` is unintentional. Probably...

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-24 16:55:13 +09:00
KAWASHIMA Takahiro
8558185c85 mpi/java: Add missing Java binding methods
This commit add the following methods.

| Language-indep. notation | Java binding            |
| ------------------------ | ----------------------- |
| MPI_WIN_GET_ERRHANDLER   | mpi.Win.getErrhandler   |
| MPI_FILE_SET_ERRHANDLER  | mpi.File.setErrhandler  |
| MPI_FILE_GET_ERRHANDLER  | mpi.File.getErrhandler  |
| MPI_COMM_CALL_ERRHANDLER | mpi.Comm.callErrhandler |
| MPI_FILE_CALL_ERRHANDLER | mpi.File.callErrhandler |
| MPI_FILE_IREAD_AT_ALL    | mpi.File.iReadAtAll     |
| MPI_FILE_IWRITE_AT_ALL   | mpi.File.iWriteAtAll    |
| MPI_FILE_IREAD_ALL       | mpi.File.iReadAll       |
| MPI_FILE_IWRITE_ALL      | mpi.File.iWriteAll      |
| MPI_FILE_GET_ATOMICITY   | mpi.File.getAtomicity   |

`MPI_FILE_I{READ,WRITE}(_AT)_ALL` routines are added in MPI-3.1.
I don't know why other methods were missing.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-24 16:55:03 +09:00
KAWASHIMA Takahiro
0fcd96486a fortran: Fix MPI_ARGV(S)_NULL compilation error
Fortran constants `MPI_ARGV_NULL` and `MPI_ARGVS_NULL` are defined
in MPI-3.1 p.680 as below.

> `MPI_ARGVS_NULL`
>   2-dim. array of `CHARACTER*(*)`
> `MPI_ARGV_NULL`
>   array of `CHARACTER*(*)`

`MPI_ARGV_NULL` and `MPI_ARGVS_NULL` are used as an argument of
`MPI_COMM_SPAWN` and `MPI_COMM_SPAWN_MULTIPLE` respectively and
their argument `argv` and `array_of_argv` are defined as below
for `USE mpi_f08` binding in MPI-3.1.

```
CHARACTER(LEN=*), INTENT(IN) :: argv(*)
CHARACTER(LEN=*), INTENT(IN) :: array_of_argv(count, *)
```

Defining them as `INTEGER` in `mpi_f08` module will cause
a compilation error of user programs like
"There is no specific subroutine for the generic 'mpi_comm_spawn'".

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-24 13:53:12 +09:00
Ralph Castain
8b1f01dfe6 Set the default modex parameters back to full blocking modex while we continue to test and debug the slow modex - it seems to be having issues on the Cray
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-22 15:19:46 -07:00
Jeff Squyres
d32eff6ea2 mpif-externals.h: add missing MPI_AINT_ADD/MPI_AINT_DIFF
MPI_AINT_ADD and MPI_AINT_DIFF are functions and must be declared as
externals with the proper return type.  This is already done properly
in the mpi and mpi_f08 modules; these declarations for these functions
were only missing from mpif.h (i.e., mpif-externals.h).

Thanks to Aboorva Devarajan (@AboorvaDevarajan) for the bug report.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-22 08:57:54 -07:00
Gilles Gouaillardet
ebe6125750 mpi/c: MPI_PROC_NULL is not a valid rank in MPI_Win_{lock,unlock}
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-22 11:13:13 +09:00
Ralph Castain
f2ed293ecd Merge pull request #3398 from rhc54/topic/modex
Implement a background fence that collects all data during modex operation
2017-04-21 15:15:49 -07:00
Ralph Castain
9fc3079ac2 Implement a background fence that collects all data during modex operation
The direct modex operation is slow, especially at scale for even modestly-connected applications. Likewise, blocking in MPI_Init while we wait for a full modex to complete takes too long. However, as George pointed out, there is a middle ground here. We could kickoff the modex operation in the background, and then trap any modex_recv's until the modex completes and the data is delivered. For most non-benchmark apps, this may prove to be the best of the available options as they are likely to perform other (non-communicating) setup operations after MPI_Init, and so there is a reasonable chance that the modex will actually be done before the first modex_recv gets called.

Once we get instant-on-enabled hardware, this won't be necessary. Clearly, zero time will always out-perform the time spent doing a modex. However, this provides a decent compromise in the interim.

This PR changes the default settings of a few relevant params to make "background modex" the default behavior:

* pmix_base_async_modex -> defaults to true

* pmix_base_collect_data -> continues to default to true (no change)

* async_mpi_init - defaults to true. Note that the prior code attempted to base the default setting of this value on the setting of pmix_base_async_modex. Unfortunately, the pmix value isn't set prior to setting async_mpi_init, and so that attempt failed to accomplish anything.

The logic in MPI_Init is:

* if async_modex AND collect_data are set, AND we have a non-blocking fence available, then we execute the background modex operation

* if async_modex is set, but collect_data is false, then we simply skip the modex entirely - no fence is performed

* if async_modex is not set, then we block until the fence completes (regardless of collecting data or not)

* if we do NOT have a non-blocking fence (e.g., we are not using PMIx), then we always perform the full blocking modex operation.

* if we do perform the background modex, and the user requested the barrier be performed at the end of MPI_Init, then we check to see if the modex has completed when we reach that point. If it has, then we execute the barrier. However, if the modex has NOT completed, then we block until the modex does complete and skip the extra barrier. So we never perform two barriers in that case.

HTH
Ralph

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-21 10:29:23 -07:00
Howard Pritchard
462342d148 Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi
common/libfabric: move libfabric to ofi
2017-04-21 07:50:38 -06:00
Artem Polyakov
68167ec879 ompi/comm: Improve MPI_Comm_create algorithm
Force only procs that are participating in the ne Comm to decide what
    CID is appropriate. This will have 2 advantages:
    * Speedup Comm creation for small communicators: non-participating procs
      will not interfere
    * Reduce CID fragmentation: non-overlaping groups will be allowed to use
      same CID.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-21 08:33:29 +07:00
Howard Pritchard
841192645b common/libfabric: move libfabric to ofi
This PR renames the common library for OFI libfabric from
libfabric to ofi.  There are a number of reasons this
is good to do:

1) its shorter and replaces 9 characters with three for
   function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
   the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
   the term "ofi" internally and also in their configure options
   relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)

There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.

This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and with-ofi-libdir.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-20 13:07:16 -06:00
Ralph Castain
c86f71376a Increase fine grain of timing info
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-20 00:17:40 -07:00
Gilles Gouaillardet
ded63c5e0c ompi: use ompi_coll_base_sendrecv_actual() whenever possible
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-20 10:01:28 +09:00
Gilles Gouaillardet
52551d96c1 Merge pull request #3285 from ggouaillardet/topic/coll_zerobyte_messages
coll/base: always send/recv zero-byte messages
2017-04-20 09:22:47 +09:00
Gilles Gouaillardet
fa5cd0dbe5 use ptrdiff_t instead of OPAL_PTRDIFF_TYPE
since Open MPI now requires a C99, and ptrdiff_t type is part of C99,
there is no more need for the abstract OPAL_PTRDIFF_TYPE type.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 13:41:56 +09:00
Gilles Gouaillardet
dcf9cca21f ompi/datatype: add the OMPI_DATATYPE_INIT_UNAVAILABLE_BASIC_TYPE macro
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 13:09:33 +09:00
bosilca
872cf44c28 Improve the opal_pointer_array & more (#3369)
* Complete rewrite of opal_pointer_array
Instead of a cache oblivious linear search use a bits array
to speed up the management of the free space. As a result we
slightly increase the memory used by the structure, but we get a
significant boost in performance.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Do not register datatypes in the f2c translation table.
The registration is now done up into the Fortran layer, by
forcing a call to MPI_Type_c2f.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-04-18 21:41:26 -04:00
Gilles Gouaillardet
23dad50d51 mpi/c: allow MPI_PROC_NULL in MPI_Win_shared_query()
This fixes a regression introduced in open-mpi/ompi@b3a20100d3

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 10:06:41 +09:00
Yossi
9ebcafd6d6 Merge pull request #3260 from derbeyn/fix_yalla
Fix yalla PML: MPI_Recv does not return MPI_ERR_TRUNCATE upon overflow
2017-04-18 11:37:48 +03:00
Alina Sklarevich
d93b67257b PML UCX: handle a synchronous send.
MCA_PML_BASE_SEND_SYNCHRONOUS

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 18:11:55 +03:00
Alina Sklarevich
eec310c99c PML/UCX/YALLA: Fix the message release call.
Set message to MPI_MESSAGE_NULL.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 14:41:13 +03:00
Gilles Gouaillardet
6886c1229a Merge pull request #3327 from jeffhammond/fix-issue-3326
check for negative ranks in ompi_win_peer_invalid
2017-04-13 10:53:32 +09:00
Ralph Castain
dadc924cde Cleanup warnings when timing is not enabled
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-11 17:29:27 -07:00
Jeff Hammond
b3a20100d3 check for negative ranks in ompi_win_peer_invalid
resolves #3326 (https://github.com/open-mpi/ompi/issues/3326)

Signed-off-by: jeff.r.hammond@intel.com
2017-04-11 14:26:16 -07:00
Nathan Hjelm
bea7d9e4f7 Merge pull request #3320 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix infinite frag allocation loop
2017-04-11 09:09:30 -06:00
Artem Polyakov
4477b87e1d Merge pull request #3303 from karasevb/timing2/master
OMPI timings
2017-04-11 07:52:40 -07:00
Boris Karasev
d132eab4a5 ompi/timings: fixed the error of opal timings env import
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-04-11 12:08:48 +06:00
Nathan Hjelm
12b52b2b2c osc/pt2pt: fix infinite frag allocation loop
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-10 16:30:47 -06:00
KAWASHIMA Takahiro
b4599d7bb7 datatype: Fix darray MPI_ACCUMULATE bug
Array sizes of `array_of_gsizes`, `array_of_distribs`, `array_of_dargs`,
and `array_of_psizes` parameters of the `ompi_datatype_create_darray`
function (and `MPI_TYPE_CREATE_DARRAY`) are all `ndims`.
`ndims` are `i[2]`, not `i[0]`. See MPI-3.1 p.122.

Because this function `__ompi_datatype_create_from_args` is used by
pt2pt OSC, using a datatype created by `MPI_TYPE_CREATE_DARRAY` for
`MPI_(R)(GET_)ACCUMULATE` caused a segmentation fault or something
on a target process.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-10 17:31:59 +09:00
Ralph Castain
95ae0d1df3 Cleanup timing macros for portability across compilers. Rename the --enable-timing configure option to be --enable-pmix-timing so it doesn't pickup external timing requests. Remove a stale function reference in PMIx so it can compile with timing enabled.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-10 12:56:38 +06:00
Howard Pritchard
f5942ff23c Merge pull request #3304 from hppritcha/topic/de-ortization-of-ompi
de-ORTEfy the ompi tree
2017-04-07 14:14:41 -06:00
Noah Evans
ef29fb13cb de-ORTEfy the ompi tree
The ompi tree should be runtime independent, but over time a few
ORTE depedent definitions and functions have escaped into the ompi
tree. I'm working on my own runtime so I've used this as an opportunity
to get rid of ORTE dependencies in the ompi/ tree. I still need to go
back and change orte to conform to the new world and these changes are
untested, but I can now compile (but not link) without orte so I'm
commiting this changeset.

Signed-off-by: Noah Evans <noah.evans@gmail.com>
2017-04-07 12:35:58 -06:00
Boris Karasev
36a0e71f2d ompi/timings: preparing to production state
Adds:
- enabling/disabling of timings throught environment variable `OMPI_TIMING_ENABLE`
- output format: [file name]:[function name]:[description]: avg/min/max
- dynamically extending array of results for case then inited size was exhausted
- catch and collect errors
- cleanup

Note:
For use feature need to configure with `--enable-timings`
and set env `OMPI_TIMING_ENABLE = 1`

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-04-07 21:16:57 +06:00
Artem Polyakov
e3acf2a339 ompi/timings: add OMPI-level timing framework.
This is an extension of OPAL timing framework that allows to use
MPI_reduce to provide the compact representation of the collected
timings throughout the whole application.

NOTE: the functionality is disabled now, it will be enabled after
the runtime verification.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:22 +06:00