1
1
Граф коммитов

9570 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
6886c1229a Merge pull request #3327 from jeffhammond/fix-issue-3326
check for negative ranks in ompi_win_peer_invalid
2017-04-13 10:53:32 +09:00
Ralph Castain
dadc924cde Cleanup warnings when timing is not enabled
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-11 17:29:27 -07:00
Jeff Hammond
b3a20100d3 check for negative ranks in ompi_win_peer_invalid
resolves #3326 (https://github.com/open-mpi/ompi/issues/3326)

Signed-off-by: jeff.r.hammond@intel.com
2017-04-11 14:26:16 -07:00
Nathan Hjelm
bea7d9e4f7 Merge pull request #3320 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix infinite frag allocation loop
2017-04-11 09:09:30 -06:00
Artem Polyakov
4477b87e1d Merge pull request #3303 from karasevb/timing2/master
OMPI timings
2017-04-11 07:52:40 -07:00
Boris Karasev
d132eab4a5 ompi/timings: fixed the error of opal timings env import
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-04-11 12:08:48 +06:00
Nathan Hjelm
12b52b2b2c osc/pt2pt: fix infinite frag allocation loop
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-10 16:30:47 -06:00
KAWASHIMA Takahiro
b4599d7bb7 datatype: Fix darray MPI_ACCUMULATE bug
Array sizes of `array_of_gsizes`, `array_of_distribs`, `array_of_dargs`,
and `array_of_psizes` parameters of the `ompi_datatype_create_darray`
function (and `MPI_TYPE_CREATE_DARRAY`) are all `ndims`.
`ndims` are `i[2]`, not `i[0]`. See MPI-3.1 p.122.

Because this function `__ompi_datatype_create_from_args` is used by
pt2pt OSC, using a datatype created by `MPI_TYPE_CREATE_DARRAY` for
`MPI_(R)(GET_)ACCUMULATE` caused a segmentation fault or something
on a target process.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-10 17:31:59 +09:00
Ralph Castain
95ae0d1df3 Cleanup timing macros for portability across compilers. Rename the --enable-timing configure option to be --enable-pmix-timing so it doesn't pickup external timing requests. Remove a stale function reference in PMIx so it can compile with timing enabled.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-10 12:56:38 +06:00
Howard Pritchard
f5942ff23c Merge pull request #3304 from hppritcha/topic/de-ortization-of-ompi
de-ORTEfy the ompi tree
2017-04-07 14:14:41 -06:00
Noah Evans
ef29fb13cb de-ORTEfy the ompi tree
The ompi tree should be runtime independent, but over time a few
ORTE depedent definitions and functions have escaped into the ompi
tree. I'm working on my own runtime so I've used this as an opportunity
to get rid of ORTE dependencies in the ompi/ tree. I still need to go
back and change orte to conform to the new world and these changes are
untested, but I can now compile (but not link) without orte so I'm
commiting this changeset.

Signed-off-by: Noah Evans <noah.evans@gmail.com>
2017-04-07 12:35:58 -06:00
Boris Karasev
36a0e71f2d ompi/timings: preparing to production state
Adds:
- enabling/disabling of timings throught environment variable `OMPI_TIMING_ENABLE`
- output format: [file name]:[function name]:[description]: avg/min/max
- dynamically extending array of results for case then inited size was exhausted
- catch and collect errors
- cleanup

Note:
For use feature need to configure with `--enable-timings`
and set env `OMPI_TIMING_ENABLE = 1`

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-04-07 21:16:57 +06:00
Artem Polyakov
e3acf2a339 ompi/timings: add OMPI-level timing framework.
This is an extension of OPAL timing framework that allows to use
MPI_reduce to provide the compact representation of the collected
timings throughout the whole application.

NOTE: the functionality is disabled now, it will be enabled after
the runtime verification.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:22 +06:00
Artem Polyakov
1063c0d567 opal/timing: remove timings from MPI_Init and MPI_Finalize
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:21 +06:00
Nadia Derbey
f918d88c3e Fix yalla PML: Update previous commit after Yossofe's review
Signed-off-by: Nadia Derbey <Nadia.Derbey@atos.net>
2017-04-06 07:58:26 +02:00
Gilles Gouaillardet
f3581c8259 coll/base: have alltoallv send/recv zero-bytes messages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 13:44:17 +09:00
Gilles Gouaillardet
5492edd71e coll/base: have ompi_coll_base_sendrecv() send/recv zero-bytes messages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 13:44:05 +09:00
Nathan Hjelm
1322e5dee8 Merge pull request #3274 from hjelmn/osc_rdma_fix
osc/rdma: fix typo in atomic code
2017-04-04 00:20:42 -06:00
Gilles Gouaillardet
5dfd4ab6ca coll/tuned: remove set-but-not-used variables
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-04 13:18:11 +09:00
Nathan Hjelm
fad0803920 osc/rdma: fix typo in atomic code
Fixes #3267

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-03 15:54:28 -06:00
Nadia Derbey
b6de94e449 Fix yalla PML: MPI_Recv does not return MPI_ERR_TRUNCATE upon overflow
Signed-off-by: Nadia Derbey <Nadia.Derbey@atos.net>
2017-03-30 15:18:31 +02:00
Xin Zhao
ee952fcccd Passing estimated_num_procs to UCX init in PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Nathan Hjelm
c72fb30eb5 osc/pt2pt: fix typo
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2017-03-23 09:00:21 -06:00
Xin Zhao
6a99c60fbd Add multithreading support in PML UCX framework.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-20 19:55:00 +02:00
Jeff Squyres
ce0e1cd32c Merge pull request #3201 from hppritcha/jjhursey-topic/timer-gettimeofday
Jjhursey topic/timer gettimeofday
2017-03-18 20:12:36 -04:00
Howard Pritchard
b9331527f5 timer: hack use of clock_gettime
better solution needed later
 workaround for #3003

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-18 15:08:59 -05:00
Ralph Castain
45b46dc446 Merge pull request #3181 from artpol84/add_proc_fix_2/master
ompi: Avoid unnecessary PMIx lookups when adding procs (step 2).
2017-03-16 15:06:08 -07:00
Jeff Squyres
760db0d5ce osc/pt2pt: fix compiler warning
Remove unused variable.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-16 05:46:11 -07:00
Jeff Squyres
1947280865 topo/treematch: squash some compiler warnings
Only define MIN/MAX if they are not already defined.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-16 05:44:26 -07:00
Joshua Hursey
48d13aa8ef mpi/c: Force wtick/wtime to use gettimeofday
* See https://github.com/open-mpi/ompi/issues/3003 for a discussion about
   this patch. Once we get a better version in place we can revert this
   change.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-03-15 21:24:37 -05:00
Artem Polyakov
1f7a3a2d54 ompi: Avoid unnecessary PMIx lookups when adding procs (step 2).
Follow-up for 717f3fef62.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-03-16 07:47:27 +07:00
Nathan Hjelm
37214eda09 Merge pull request #3164 from hjelmn/ob1_pinned
pml/ob1: do not cache leave_pinned
2017-03-14 13:22:18 -06:00
Nathan Hjelm
3e7ef48c13 pml/ob1: do not cache leave_pinned
This commit fixes a bug that disabled both the RDMA pipeline and RDMA
protocols in ob1. ob1 was internally caching the values of
opal_leave_pinned and opal_leave_pinned_pipeline at init time. This is
no longer valid as opal_leave_pinned may be set by any call to a btl's
add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 09:00:40 -06:00
Valentin Petrov
fe069c9570 Fixes the coll_allgather usage bug
One should use the correct module object when calling
      c_coll.coll_allgather. Otherwise there will be a segfault in the
      case, for example, when hcoll is used. In that case
      c_coll.coll_allgather = mca_coll_hcoll_allgather while
      c_coll.coll_gather_module = tuned.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2017-03-14 09:47:39 +02:00
Jeff Squyres
086748bb70 Merge pull request #3102 from omor1/master
Add missing definition of MPI_T_PVAR_SESSION_NULL (resolve #2652)
2017-03-13 15:27:05 -04:00
Alex Mikheev
c081239f88
ompi: pml ucx: fix persistant request init CR changes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 13:26:29 +02:00
Alex Mikheev
c113c37a7a
ompi: pml ucx: fix persistant request initialization
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 10:59:41 +02:00
Nathan Hjelm
0195d15401 osc/pt2pt: flush pending fragments on lock ack
This commit addresses an issue that can occur in cases where a lot of
fragments are outstanding.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-06 13:58:46 -07:00
Edgar Gabriel
607dc2c039 Merge pull request #3103 from edgargabriel/pr/sharedfp-name-collision-fix
sharedfp/lockedfile and sm: fix the namecollision
2017-03-05 14:46:20 -06:00
Edgar Gabriel
2d462b3b80 sharedfp/lockedfile and sm: fix name collision
this fixes the issue reported by Nicolas Joly on the mailing: the sharedfp/lockedfile component does not support right now a scenario where multiple jobs read from the same input file, due to a collision of the filenames utilized for the sharedfp handle. Although not part of the oroginal report, the same occurs for the sharedfp/sm component. Add therefore the jobid to be part of the lockedfilename/sm file name.

use the OMPI_CAST_RTE_NAME macro to determine jobid

Fixes: #3098

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-05 11:28:28 -06:00
Omri Mor
20ab37a297 Add missing MPI_T_PVAR_SESSION_NULL to mpi.h
MPI_T_pvar_session_free() should reject null sessions and set *session to MPI_T_PVAR_SESSION_NULL

Signed-off-by: Omri Mor <omri50@gmail.com>
2017-03-05 09:03:30 -06:00
Artem Polyakov
9448814c40 ompi/pml/ucx: Fix uninitialized UCX request field.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-03-05 03:06:30 +07:00
Edgar Gabriel
d1fed77781 Merge pull request #3094 from edgargabriel/pr/master-lustre-priority
io/ompio: adjust the priority of the OMPIO component on lustre
2017-03-03 09:29:14 -06:00
KAWASHIMA Takahiro
39294caf04 Merge pull request #3086 from kawashima-fj/pr/coll-base-defs
coll: Update `ompi/mca/coll/base/coll_base_functions.h`
2017-03-03 18:53:00 +09:00
KAWASHIMA Takahiro
7cb42d9aaa Merge pull request #3085 from kawashima-fj/pr/pml-bfo-typo
pml/bfo: Correct a function name and header filenames
2017-03-03 18:48:01 +09:00
Edgar Gabriel
9e19834327 io/ompio: adjust the priority of the OMPIO component on lustre
this commit brings over the behavior from the 2.x series to master, mostly with the fork for the 3.x series in mind.
Also, use strncasecmp instead of two strncmps

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-02 12:10:11 -06:00
Jeff Squyres
dc53cd5f74 MPI_Wtick: may return a higher resolution than 10e-6 these days
Thanks to Mark Dixon (@ccaamad) for reporting the error.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-02 10:39:28 -05:00
KAWASHIMA Takahiro
c4ca5e703d coll: Update ompi/mca/coll/base/coll_base_functions.h
- Support MPI-2.2 and MPI-3.0 COLL features.

  * `MPI_REDUCE_SCATTER_BLOCK`
  * neighborhood collective communication
  * nonblocking collective communication

- Add `*_BASE_ARGS` and `*_BASE_ARG_NAMES` for convenience.

- Use parameter names used in the MPI Standard.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-03-02 17:58:02 +09:00
KAWASHIMA Takahiro
96aa0d90c1 pml/bfo: Correct a function name and header filenames
These lines were incorrectly modified in 90f2940.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-03-02 16:02:53 +09:00
Alex Mikheev
152f77df59
ompi: pml ucx: fix datatype packing error in bsend
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-01 16:18:19 +02:00
Yossi Itigin
33471c44ee pml_yalla/mtl_mxm/hcoll: open memory component to activate memory hooks.
Memory hooks are now set-up on demand. pml/yalla, mtl/mxm and
coll/hcoll need the memory hooks, so make sure those are installed.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-03-01 12:12:20 +02:00
Gilles Gouaillardet
880f2d5431 mpi/c: revamp error handling in MPI_{Pack,Unpack}[_external]
Thanks Alex and the folks at Mellanox for the help.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-01 10:03:31 +09:00
Jeff Squyres
d5266aba90 Merge pull request #2955 from jsquyres/pr/hwloc-external-fixes
Fix --with-hwloc=external
2017-02-28 14:57:07 -05:00
Josh Hursey
0006f0d7c5 Merge pull request #2773 from jjhursey/topic/hook-fwk
Add a 'hook' framework
2017-02-28 12:29:50 -06:00
Ralph Castain
735fbf8f67 Merge pull request #3011 from artpol84/add_proc_fix/master
ompi: Avoid unnecessary PMIx lookups when adding procs.
2017-02-28 08:25:08 -08:00
Jeff Squyres
fec519a793 hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).

This commit:

1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
   find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
   rest of the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:48:42 -08:00
Jeff Squyres
0cd3b6c235 treematch: do not include <hwloc.h>
Instead, include "opal/mca/hwloc/hwloc.h"

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:45:23 -08:00
Josh Hursey
b1c4e50500 Merge pull request #2934 from jjhursey/topic/coll-comm-restructure
Move coll structure outside of the communicator
2017-02-28 08:45:18 -06:00
Nathan Hjelm
032bcf915a osc/rdma: fix compile warning
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-27 16:26:00 -07:00
George Bosilca
366d64b7e5 Move the collective structure outside the communicator.
As we changed the ABI (forcing a major release), we can limit
the size of the predefined communicators by moving the collective
structure outside the communicator. This might have a minimal,
but unnoticeable, impact on performance. This approach has been
discussed during the January 2017 devel meeting.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 11:54:17 -06:00
Joshua Hursey
c10bbfded6 ompi/hook: Add the hook/license framework
* Include a 'demo' component that shows some of the features.
 * Currently has hooks for:
   - MPI_Initialized
     - top, bottom
   - MPI_Init_thread
     - top, bottom
   - MPI_Finalized
     - top, bottom
   - MPI_Init
     - top (pre-opal_init), top (post-opal_init), error, bottom
   - MPI_Finalize
     - top, bottom
 * Other places in ompi can 'register' to hook into any one of these places
   by passing back a component structure filled with function pointers.
 * Add a `MCA_BASE_COMPONENT_FLAG_REQUIRED` flag to the MCA structure that
   is checked by the `hook` framework. If a required, static component has
   been excluded then the `hook` framework will fail to initialize.
   - See note in `opal/mca/mca.h` as to why this is checked in the `hook`
     framework and not in `opal/mca/base/mca_base_component_find.c`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 12:05:53 -05:00
Nathan Hjelm
581bff9871 Merge pull request #3034 from hjelmn/osc_rdma_atomic
osc/rdma: make locking code more robust
2017-02-27 08:46:52 -07:00
Nathan Hjelm
4707c7c5e0 osc/rdma: make locking code more robust
Under heavy load the locking code could fail if the underlying btl
module started to return OPAL_ERR_OUT_OF_RESOURCE on atomic
operations. This commit updates the code to gracefully handle btl
errors.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-27 00:01:26 -07:00
Gilles Gouaillardet
af0b5cffb4 asm: rename the AMD64 into X86_64
in this context, AMD64 really means amd64 or em64t, so let's
rename this into X86_64 in order to avoid any confusion

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-27 15:10:50 +09:00
Sylvain Jeaugey
f827b6b8dd Fix more typos using the allgather module for allreduce operations, causing a crash when CUDA collectives are enabled.
Signed-off-by: Sylvain Jeaugey <sjeaugey@nvidia.com>
Signed-off-by: Akshay Venkatesh <akvenkatesh@nvidia.com>
2017-02-24 16:35:29 -08:00
Yossi
fb67c966a8 Merge pull request #2944 from alex-mikheev/topic/pml_ucx_bsend
ompi: pml ucx: add support for the buffered send
2017-02-22 12:21:03 +02:00
Artem Polyakov
717f3fef62 ompi: Avoid unnecessary PMIx lookups when adding procs.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-02-22 16:09:30 +07:00
Alex Mikheev
b015c8bb48 ompi: pml ucx: add support for the buffered send
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-21 17:19:22 +02:00
Gilles Gouaillardet
4184c01be5 Merge pull request #2393 from bosilca/topic/no_predefined_ddt_refcount
Don't refcount the predefined datatypes.
2017-02-21 09:38:11 +09:00
Todd Kordenbrock
048f757d9f osc-portals4: add support for noncontiguous datatypes
This commit implements onesided operations for noncontiguous
datatypes using two different algorithms.

 * If the result and/or origin datatype is noncontiguous and the
   target datatype is contiguous, then an iovec MD is created for
   the result and origin.  The operation is performed using a
   single Portals4 call (unless it exceeds the max message size).
 * If the target datatype is noncontigous, then an algorithm
   similar to the one in osc-rdma is used to loop over the
   contiguous blocks of each datatype.  The operation is
   performed using multiple Portals4 calls.

This commit ensures that individual operations do not exceed the
max atomic size or the max message size supported by the device.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-02-15 16:17:13 -06:00
Gilles Gouaillardet
cd4537193c osc/sm: fix MPI_Win_allocate_shared() alignment
add padding so the memory allocated by MPI_Win_allocate_shared()
is 64 bytes aligned.

Thanks Joseph Schuchart for the bug report

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-15 13:40:48 +09:00
Josh Hursey
0b273c2561 Merge pull request #2808 from jjhursey/fix/ibm/reduce-local-to-coll
coll: Move reduce_local into the coll framework
2017-02-14 15:54:15 -06:00
Nathan Hjelm
cc4a0fabcf Merge pull request #2727 from hjelmn/osc_rdma
osc/rdma: fix typo in check for MPI_MODE_NOCHECK
2017-02-14 10:50:33 -07:00
Joshua Hursey
78006f93a4 coll: Move reduce_local into the coll framework
* Since we are adding a new function to `mca_coll_base_module_2_1_0_t`
   we need to increase the version of the module structure to `2_2_0`.
 * Add a comment just above the PREDEFINED_COMMUNICATOR_PAD describing
   it's purpose and when it should change. To help future developers
   trying to answer the question noted in the comment.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-14 08:56:07 -06:00
Gilles Gouaillardet
e70a30cca4 coll/libnbc: optimize zero size ialltoall{v,w} with MPI_IN_PLACE
and incidentally avoids malloc(0)

Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#2945

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-13 15:21:28 +09:00
Gilles Gouaillardet
12949547f4 coll/libnbc: fix a2aw_sched_linear() with zero size datatype or zero count
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-13 15:21:28 +09:00
Joshua Hursey
383330a50d coll/basic: Expand check for negative input values
* Negative values are parameter errors for neighborhood collectives
   - Add checks to the mpi/c interface `MPI_PARAM_CHECK`
 * Fix a success check for neighbor_alltoallw with dist_graph

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-08 14:26:32 -06:00
Geoff Paulsen
4917e44a7d Merge pull request #2832 from jjhursey/topic/ibm/osc-base-dt-abort
osc/base: Detect unsupported data types and abort
2017-02-05 04:26:04 -06:00
Howard Pritchard
f4ad119693 Merge pull request #2914 from hppritcha/topic/nbc_compiler_warning
swat some compiler warnings
2017-02-04 11:56:52 -05:00
Howard Pritchard
acaecb2448 swat some compiler warnings
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-02-03 08:28:15 -07:00
Gilles Gouaillardet
e879d2910a coll/tuned: make coll_tuned_gather_algorithms MCA settable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-02 11:00:38 +09:00
Nathan Hjelm
362ac8b87e osc/pt2pt: fix threading issues
This commit fixes a number of threading issues discovered in
osc/pt2pt. This includes:

 - Lock the synchronization object not the module in osc_pt2pt_start.
   This fixes a race between the start function and processing post
   messages.

 - Always lock before calling cond_broadcast. Fixes a race between
   the waiting thread and signaling thread.

 - Make all atomically updated values volatile.

 - Make the module lock recursive to protect against some deadlock
   conditions. Will roll this back once the locks have been
   re-designed.

 - Mark incoming complete *after* completing an accumulate not
   before. This was causing an incorrect answer under certain
   conditions.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-01 10:33:01 -07:00
Gilles Gouaillardet
02558134ef coll/base: remove unused local variable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:54:17 +09:00
Gilles Gouaillardet
ad44ecb2ba pml/base: initialize global variables
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:49:47 +09:00
bosilca
c331e6794c Allow all tuned MCA parameters to be modified programatically. (#2829)
Fix a comment in the MCA header.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-31 21:47:36 -05:00
Josh Hursey
5fcd69da52 Merge pull request #2831 from jjhursey/topic/ibm/pml-bsend
pml/base: Expose some bsend varaibles so PMLs may reference them
2017-01-31 10:31:42 -06:00
Gilles Gouaillardet
9bcadbd51b coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
this fixes a regression introduced in open-mpi/ompi@045d0c5f4c

Fixes open-mpi/ompi#2879

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 14:19:45 +09:00
Yossi Itigin
13c3bf0dd7 yalla: fix memory leak with blocking non-contig send.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-01-29 18:51:43 +02:00
Ralph Castain
3440b46e5e Merge pull request #2820 from rhc54/topic/async
Per f2f meeting: if async modex is given, default to no MPI init barr…
2017-01-27 15:43:43 -08:00
Josh Hursey
f4a86904c4 Merge pull request #2813 from jjhursey/fix/ibm/comm-cleanup
communicator: Fix uninitialized variable
2017-01-26 14:35:32 -06:00
Josh Hursey
ebc90f926e Merge pull request #2806 from jjhursey/fix/ibm/aint-diff-type
Fix a minor error at MPI_AINT_DIFF.
2017-01-26 14:23:21 -06:00
Josh Hursey
0408c116eb Merge pull request #2805 from jjhursey/fix/ibm/base-allgatherv
coll/base: Allgatherv MPI_IN_PLACE Bug
2017-01-26 14:21:57 -06:00
Geoffrey Paulsen
d2527cff46 Fixing comment only in MPI_IN_PLACE case for ireduce in libnbc.
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2017-01-26 10:58:51 -08:00
Geoffrey Paulsen
045d0c5f4c Fix for Ireduce + MPI_IN_PLACE.
Fixes a wrong answer from MPI_Ireduce when the red_sched_chain()
path was taken (which only happens for np<=4 and mesgsize>=64k).

The way libnbc treats MPI_IN_PLACE is to set sbuf == rbuf, and
whether an algorithm will work cleanly or not after that depends on the
details.

In this case the last steps of the algorithm amounted to
    (right neighbor is sending us reduction results from ranks 1..n-1)
    recv into rbuf from right neighbor
    add the contribution from our sbuf into rbuf
this would be fine in general, but if sbuf==rbuf, that recv overwrites
the sbuf. I changed it to recv into a tmpbuf if MPI_IN_PLACE was used.

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2017-01-25 18:08:08 -08:00
Nysal Jan K.A
94f92f6b49 osc/base: Detect unsupported data types and abort
Using MPI_MINLOC or MPI_MAXLOC with the following data types
leads to data corruption:
 * MPI_DOUBLE_INT
 * MPI_LONG_INT
 * MPI_SHORT_INT
 * MPI_LONG_DOUBLE_INT

Detect this print a error message and abort.
This workaround should be removed once the following issue is resolved:
 * https://github.com/open-mpi/ompi/issues/1666

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-25 15:28:28 -06:00
Sameh S. Sharkawi
320ab3b84f pml/base: Expose some bsend varaibles so PMLs may reference them
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-25 15:21:53 -06:00
Ralph Castain
a7b8190fdc Per f2f meeting: if async modex is given, default to no MPI init barrier, letting the user override that if desired.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-25 10:13:53 -08:00
Joshua Hursey
a2d45f6e9f communicator: Fix uninitialized variable
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 16:46:13 -06:00
Zhi Ming Wang
9718bbac82 Fix a minor error at MPI_AINT_DIFF.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 16:06:14 -06:00
Mark Allen
a3452adfa9 coll/base: Allgatherv MPI_IN_PLACE Bug
MPI_Allgatherv with MPI_IN_PLACE reads data from wrong location.

They were locating the MPI_IN_PLACE send buffer as
```c
         send_buf = (char*)rbuf;
         for (i = 0; i < rank; ++i) {
             send_buf += ((ptrdiff_t)rcounts[i] * extent);
         }
```
when it should be
```c
         send_buf = (char*)rbuf;
         send_buf += ((ptrdiff_t)disps[rank] * extent);
```
because disps[] specifies where things are in the v-style buffers.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 15:52:36 -06:00
Edgar Gabriel
cbb3cb9745 fs/ufs: avoid using the exclusive flag with shared file pointer
when a file is opened a second time for shared file pointer operations,
avoid setting the create and exclusive flag.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-24 12:11:29 -06:00
Edgar Gabriel
f5289a1803 common/ompio: store correctly the SHAREDFP_IS_SET flag
it looks like disabling the lazy_open flag for sharedfp components
revealead a bug that lead to a crash in file_close in some tests. Make
sure the SHAREDFP_IS_SET flag is correctly set (and not overwritten again),
and we use that to avoid a double-free of the communicator.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-24 12:09:56 -06:00
Gilles Gouaillardet
d5aa310884 mpiext/affinity: initialize all output variables of OMPI_Affinity_str()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:29 +09:00
Gilles Gouaillardet
501eb8dc7e ompio: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:19 +09:00
Gilles Gouaillardet
d0629f18c2 coll/libnbc: optimize size one communicators
simply "return" with ompi_request_empty if the communicator size is 1

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:12:47 +09:00
Gilles Gouaillardet
6f2ca5809b man: fix a typo in MPI_Win_get_name()
Thanks Nicolas Joly for the report

Fixes open-mpi/ompi#2782

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:08:13 +09:00
Edgar Gabriel
4dc09de3b8 common/ompio: update comment based on the previsou commit.
No source code changed.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-23 13:38:05 -06:00
Edgar Gabriel
3eae0eecd0 io/ompio: change default for sharedfp_lazy_open parameter
Revert the logic of io_ompio_sharedfp_lazy_open. The user now has to explicitely
disable shared fp in order for the structures not to be allocated.
Otherwise, resetting the shared fp e.g. in case the file was opened
in append mode will not work correctly, the code could deadlock.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-23 08:59:22 -06:00
Edgar Gabriel
d3a8d38cc6 common/ompio: correctly position shared fp in append mode
Fixes a bug reported on the mailing list. ompio did only reposition the individual
file pointer when the file was opened in append mode. Set the shared file
pointer also to point to the end of the file, similarly to the individual
file pointer.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-23 08:59:05 -06:00
Nathan Hjelm
0497ec0b70 osc/rdma: fix typo in check for MPI_MODE_NOCHECK
This commit fixes two typos in the lock_all path that inverted the
MPI_MODE_NOCHECK flag.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-12 11:28:11 -07:00
Gilles Gouaillardet
4932391002 ompi/proc: fix ompi_proc_finalize()
revert bits of open-mpi/ompi@cf534d0c95
we cannot del_procs here since the pml framework has already been closed

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-12 11:41:35 +09:00
George Bosilca
c2cd717f82 Don't refcount the predefined datatypes.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-11 16:48:59 -05:00
Gilles Gouaillardet
2189c5bcc3 ompi/dpm: plug a memory leak in disconnect_waitall()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
cf534d0c95 ompi/proc: plug a memory leak in ompi_proc_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
1daa80d78f mtl/psm2: plug a memory leak in ompi_mtl_psm2_component_open()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 09:28:32 +09:00
Joshua Ladd
57c0c847d0 Merge pull request #2603 from xinzhao3/topic/revert-ucx-mt
Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."
2017-01-04 11:50:37 -05:00
Ralph Castain
66131b4183 Remove the bcol, coll/ml, and sbgp code as stale and lacking a maintainer
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-03 19:32:48 -08:00
Ralph Castain
dadc6fbaf6 Merge pull request #2448 from thananon/remove_request_lock
Completely removed ompi_request_lock and ompi_request_cond
2017-01-03 19:31:46 -08:00
Jeff Squyres
33d2988985 Merge pull request #2647 from OMGtechy/master
Fixed -Wmisleading-indentation in ad_read_coll.c
2017-01-03 12:24:22 -05:00
Ralph Castain
fe68f23099 Only instantiate the HWLOC topology in an MPI process if it actually will be used.
There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced:

* shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string
  instead of the topology itself, if available, thus avoiding instantiating the topology

* openib BTL. This uses the distance matrix. At present, I haven't developed a method
  for replacing that reference. Thus, this component will instantiate the topology

* usnic BTL. Uses the distance matrix.

* treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate
  the topology

* ess base functions. If a process is direct launched and not bound at launch, this
  code attempts to bind it. Thus, procs in this scenario will instantiate the
  topology

Note that instantiating the topology on complex chips such as KNL can consume
megabytes of memory.

Fix pernode binding policy

Properly handle the unbound case

Correct pointer usage

Do not free static error messages!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-29 10:33:29 -08:00
Joshua Gerrard
94e87654c6 Fixed -Wmisleading-indentation in ad_read_coll.c
Signed-off-by: Joshua Gerrard <joshuagerrard+ompi-commit@protonmail.com>
2016-12-28 20:14:13 +00:00
Jeff Squyres
d772fcf8f1 Merge pull request #2509 from OMGtechy/master
Fixed memory leak and some -Werror=unused-result warnings
2016-12-27 17:13:23 -05:00
Nysal Jan K.A
25ba507ada mpit: Fix MPI_T_pvar_get_index
MPI_T_pvar_get_index was returning an incorrect index. The index
was never set correctly while registering the performance variables.
Additionally fix a missing case in the mca_base_var_type_t to MPI
datatype conversion. This type is currently used for control variables
registered by mxm, fca and hcoll components.

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2016-12-22 12:30:21 +05:30
Gilles Gouaillardet
773cad6b3e ompi/debugger: fix mqs_version_string()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-22 15:00:47 +09:00
Xin Zhao
2d77912c19 Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."
This reverts commit 0ecf3c951c.

Signed-off-by: Xin Zhao <xinz@mellanox.com>
2016-12-19 18:57:48 +02:00
Joshua Gerrard
3332a7d630 Fixed memory leak and some -Werror=unused-result warnings
Signed-off-by: Joshua Gerrard <joshuagerrard+ompi-commit@protonmail.com>
2016-12-17 17:43:14 +00:00
Mark Allen
eec1d5bf2e osc/pt2pt: Fix hang with Put and Win_lock_all
* When using `MPI_Put` with `MPI_Win_lock_all` a hang is possible since
   the `put` is waiting on `eager_send_active` to become `true` but
   that variable might not be reset in the case of `MPI_Win_lock_all`
   depending on other incoming events (e.g., `post` or ACKs of lock
   requests.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2016-12-16 11:52:53 -05:00
Mark Allen
0d1336b4a8 osc/pt2pt: Fix Lock/Unlock and Get wrong answer
* When using `MPI_Lock`/`MPI_Unlock` with `MPI_Get` and non-contiguous
   datatypes is is possible that the unlock finishes too early before
   the data is actually present in the recv buffer.
 * We need to wait for the irecv to complete before unlocking the target.
   This commit waits for the outgoing fragment counts to become equal
   before unlocking.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2016-12-16 11:52:51 -05:00
Mark Allen
1ebf9fd3a4 osc/pt2pt: Fix PSCW after Fence wrong answer.
* If the user uses PSCW synchronization after a Fence then the previous
   epoch is not reset which can cause the PSCW to transfer data before
   it is ready leading to wrong answers.
 * This commit resets the `eager_send_active` in the start call.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2016-12-16 11:52:49 -05:00
Xin Zhao
0ecf3c951c PML/SPML/UCX: add UCX MT support to PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2016-12-15 23:59:15 +02:00
Ralph Castain
585540bcee Reduce the flood of warnings due to uninitialized variables, mismatched types, and unused things to a more bearable trickle
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 16:33:50 -08:00
Ralph Castain
884fb7fcf2 Update the PMIx2 support to include the latest shared memory optimizations
Update ORTE support for dynamic PMIx operations e.g., PMIx_Spawn
Update to track master
Ensure that --disable-pmix-dstore actually disables the dstore. Sync to a few debugger updates

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 15:00:10 -08:00
Nathan Hjelm
8155124adc Merge pull request #2558 from hjelmn/datatype_fix
ompi/datatype: fix bug in darray that causes MPI/IO failures
2016-12-13 14:02:15 -07:00
Yossi
fa6e263821 Merge pull request #2537 from alinask/topic/pml-spml-ucx-api
PML/SPML/UCX: Adapt to the API changes in the UCX lib.
2016-12-13 20:01:47 +02:00
Nathan Hjelm
eb439228b1 ompi/datatype: fix bug in darray that causes MPI/IO failures
This commit fixes errors in the lb and extent of darray datatypes. For
these datatypes the lb should be the start offset of the rank's data
in the array and the extent should be the size of the entire
datatype. In master the lb was always 0 and the extent was always to
small. This commit updates the call to opal_datatype_resize to set the
correct lb and fixes the extent calculation.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-12-13 09:25:16 -07:00
Jeff Squyres
f9e8a55a0e Merge pull request #2543 from ggouaillardet/topic/dll_bit_reproducible
ompi/debuggers: make the binary bit reproducible
2016-12-09 06:35:47 -05:00
KAWASHIMA Takahiro
6510800c16 ompi/request: Fix a persistent request creation bug
According to the MPI-3.1 p.52 and p.53 (cited below), a request
created by `MPI_*_INIT` but not yet started by `MPI_START` or
`MPI_STARTALL` is inactive therefore `MPI_WAIT` or its friends
must return immediately if such a request is passed.

The current implementation hangs in `MPI_WAIT` and its friends
in such case because a persistent request is initialized as
`req_complete = REQUEST_PENDING`. This commit fixes the
initialization.

Also, this commit fixes internal requests used in `MPI_PROBE`
and `MPI_IPROBE` which was marked wrongly as persistent.

MPI-3.1 p.52:

We shall use the following terminology: A null handle is a handle
with value MPI_REQUEST_NULL. A persistent request and the handle
to it are inactive if the request is not associated with any ongoing
communication (see Section 3.9). A handle is active if it is neither
null nor inactive. An empty status is a status which is set to return
tag = MPI_ANY_TAG, source = MPI_ANY_SOURCE, error = MPI_SUCCESS, and
is also internally configured so that calls to MPI_GET_COUNT,
MPI_GET_ELEMENTS, and MPI_GET_ELEMENTS_X return count = 0 and
MPI_TEST_CANCELLED returns false. We set a status variable to empty
when the value returned by it is not significant. Status is set in
this way so as to prevent errors due to accesses of stale information.

MPI-3.1 p.53:

One is allowed to call MPI_WAIT with a null or inactive request
argument. In this case the operation returns immediately with empty
status.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2016-12-08 21:42:05 +09:00
Alina Sklarevich
e9d2d029c6 PML/SPML/UCX: Adapt to the API changes in the UCX lib.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2016-12-08 11:33:29 +02:00
Gilles Gouaillardet
4d8f606420 ompi/debuggers: make the binary bit reproducible
instead of compilation date __DATE__, use a MPI_Get_library_version() like string

Thanks Alastair McKinstry for the report

Fixes open-mpi/ompi#2518

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-08 13:46:43 +09:00
Joshua Ladd
59f40e7cc5 Merge pull request #2500 from vspetrov/hcoll_ctx_free_detection
Detect hcoll_context_free at config
2016-12-05 22:39:40 -05:00
Jeff Squyres
40d94fdc5a Merge pull request #2422 from edgargabriel/pr/cycle-buf-default-val
io/ompio: change the default value of mca parameter
2016-12-05 15:33:52 -05:00
Jeff Squyres
6319332258 Merge pull request #2491 from OMGtechy/master
Swapped use of fprintf for opal_output_verbose
2016-12-03 07:32:03 -05:00
Valentin Petrov
e13e264185 Detect hcoll_context_free at config
Needed for better flexibility with versioning

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2016-12-02 22:09:20 +02:00
Jeff Squyres
1504ffb18d ompi_file_delete: output a better error message
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-02 11:08:04 -05:00
Joshua Gerrard
d5a45bc12e Swapped use of fprintf for opal_output_verbose
Signed-off-by: Joshua Gerrard <enquiries@joshuagerrard.com>
2016-12-01 19:56:06 +00:00
Gilles Gouaillardet
188b9668e4 ompi/attribute: plug a memory leak in set_value()
OBJ_RELEASE() the previous attribute value if any

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:30 +09:00
Gilles Gouaillardet
d94e8c97a0 ompi/runtime: release F90 types in ompi_mpi_finalize()
F90 types cannot be freed by the enduser as specified by the standard.
but since they are ompi_datatype_dup'ed from predefined datatypes,
they have to be explicitly free'd at finalize time in order
to avoid a memory leak.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:30 +09:00
Gilles Gouaillardet
b2aca6c753 ompi/proc: plug a memory leak in ompi_proc_unpack()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:29 +09:00
Gilles Gouaillardet
ae278fd5df ompi/runtime: plug a memory leak
declare ompi_mpi_show_mca_params_file as NULL
so MPI_T_Init_thread() can be invoked without leaking memory

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:29 +09:00
Gilles Gouaillardet
43ee08b20e ompi/c: remove unused variable in [i]gatherv
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 13:59:25 +09:00