9446 Commits

Author SHA1 Message Date
Gilles Gouaillardet
e70a30cca4 coll/libnbc: optimize zero size ialltoall{v,w} with MPI_IN_PLACE
and incidentally avoids malloc(0)

Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#2945

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-13 15:21:28 +09:00
Gilles Gouaillardet
12949547f4 coll/libnbc: fix a2aw_sched_linear() with zero size datatype or zero count
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-13 15:21:28 +09:00
Joshua Hursey
383330a50d coll/basic: Expand check for negative input values
* Negative values are parameter errors for neighborhood collectives
  - Add checks to the mpi/c interface `MPI_PARAM_CHECK`
* Fix a success check for neighbor_alltoallw with dist_graph

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-08 14:26:32 -06:00
Geoff Paulsen
4917e44a7d Merge pull request #2832 from jjhursey/topic/ibm/osc-base-dt-abort
osc/base: Detect unsupported data types and abort
2017-02-05 04:26:04 -06:00
Howard Pritchard
f4ad119693 Merge pull request #2914 from hppritcha/topic/nbc_compiler_warning
swat some compiler warnings
2017-02-04 11:56:52 -05:00
Howard Pritchard
acaecb2448 swat some compiler warnings
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-02-03 08:28:15 -07:00
Gilles Gouaillardet
e879d2910a coll/tuned: make coll_tuned_gather_algorithms MCA settable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-02 11:00:38 +09:00
Nathan Hjelm
362ac8b87e osc/pt2pt: fix threading issues
This commit fixes a number of threading issues discovered in
osc/pt2pt. This includes:

 - Lock the synchronization object not the module in osc_pt2pt_start.
   This fixes a race between the start function and processing post
   messages.

 - Always lock before calling cond_broadcast. Fixes a race between
   the waiting thread and signaling thread.

 - Make all atomically updated values volatile.

 - Make the module lock recursive to protect against some deadlock
   conditions. Will roll this back once the locks have been
   re-designed.

 - Mark incoming complete *after* completing an accumulate not
   before. This was causing an incorrect answer under certain
   conditions.
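
The "always lock before calling cond_broadcast" item can be illustrated with plain POSIX threads. The following is a minimal sketch, not the osc/pt2pt code: `struct sync_obj`, `waiter`, and `post_and_wait` are hypothetical names invented for this example. If the signaling thread set the flag and broadcast without holding the lock, the waiter could test the flag, find it unset, and then go to sleep just after the broadcast had already happened, missing the wakeup.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical synchronization object: a flag guarded by its own lock. */
struct sync_obj {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             posted;   /* predicate the waiter sleeps on */
};

/* Waiting thread: re-check the predicate under the lock in a loop. */
static void *waiter(void *arg)
{
    struct sync_obj *s = arg;

    pthread_mutex_lock(&s->lock);
    while (!s->posted) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Signaling side: update the predicate and broadcast while holding the
 * same lock, so the waiter cannot slip between its check and its wait.
 * Returns the final value of posted, or -1 if the thread cannot start. */
int post_and_wait(void)
{
    struct sync_obj s;
    pthread_t t;
    int rc, result;

    rc = pthread_mutex_init(&s.lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&s.cond, NULL);
    assert(rc == 0);
    (void)rc;
    s.posted = 0;

    if (pthread_create(&t, NULL, waiter, &s) != 0) {
        pthread_cond_destroy(&s.cond);
        pthread_mutex_destroy(&s.lock);
        return -1;
    }

    pthread_mutex_lock(&s.lock);
    s.posted = 1;
    pthread_cond_broadcast(&s.cond);
    pthread_mutex_unlock(&s.lock);

    pthread_join(t, NULL);
    result = s.posted;

    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    return result;
}
```

Calling `post_and_wait()` should return 1 once the waiter has observed the flag, regardless of which thread runs first.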

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-01 10:33:01 -07:00
Gilles Gouaillardet
02558134ef coll/base: remove unused local variable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:54:17 +09:00
Gilles Gouaillardet
ad44ecb2ba pml/base: initialize global variables
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:49:47 +09:00
bosilca
c331e6794c Allow all tuned MCA parameters to be modified programmatically. (#2829)
Fix a comment in the MCA header.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-31 21:47:36 -05:00
Josh Hursey
5fcd69da52 Merge pull request #2831 from jjhursey/topic/ibm/pml-bsend
pml/base: Expose some bsend variables so PMLs may reference them
2017-01-31 10:31:42 -06:00
Gilles Gouaillardet
9bcadbd51b coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
this fixes a regression introduced in open-mpi/ompi@045d0c5f4c

Fixes open-mpi/ompi#2879

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 14:19:45 +09:00
Yossi Itigin
13c3bf0dd7 yalla: fix memory leak with blocking non-contig send.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-01-29 18:51:43 +02:00
Ralph Castain
3440b46e5e Merge pull request #2820 from rhc54/topic/async
Per f2f meeting: if async modex is given, default to no MPI init barr…
2017-01-27 15:43:43 -08:00
Josh Hursey
f4a86904c4 Merge pull request #2813 from jjhursey/fix/ibm/comm-cleanup
communicator: Fix uninitialized variable
2017-01-26 14:35:32 -06:00
Josh Hursey
ebc90f926e Merge pull request #2806 from jjhursey/fix/ibm/aint-diff-type
Fix a minor error at MPI_AINT_DIFF.
2017-01-26 14:23:21 -06:00
Josh Hursey
0408c116eb Merge pull request #2805 from jjhursey/fix/ibm/base-allgatherv
coll/base: Allgatherv MPI_IN_PLACE Bug
2017-01-26 14:21:57 -06:00
Geoffrey Paulsen
d2527cff46 Fixing comment only in MPI_IN_PLACE case for ireduce in libnbc.
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2017-01-26 10:58:51 -08:00
Geoffrey Paulsen
045d0c5f4c Fix for Ireduce + MPI_IN_PLACE.
Fixes a wrong answer from MPI_Ireduce when the red_sched_chain()
path was taken (which only happens for np<=4 and mesgsize>=64k).

The way libnbc treats MPI_IN_PLACE is to set sbuf == rbuf, and
whether an algorithm will work cleanly or not after that depends on the
details.

In this case the last steps of the algorithm amounted to
    (right neighbor is sending us reduction results from ranks 1..n-1)
    recv into rbuf from right neighbor
    add the contribution from our sbuf into rbuf
this would be fine in general, but if sbuf==rbuf, that recv overwrites
the sbuf. I changed it to recv into a tmpbuf if MPI_IN_PLACE was used.
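
The aliasing problem described above can be reproduced without MPI. This is a simplified sketch under stated assumptions, not the libnbc schedule code: `fake_recv`, `finish_naive`, and `finish_with_tmpbuf` are hypothetical names, the receive is modeled as a `memcpy`, and the reduction is an integer sum.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for "recv from right neighbor": copies the neighbor's partial
 * result into dst. In a real schedule this would be a receive operation. */
static void fake_recv(int *dst, const int *from_neighbor, size_t n)
{
    memcpy(dst, from_neighbor, n * sizeof(int));
}

/* Original final step: receive into rbuf, then add our sbuf.
 * If sbuf == rbuf, the receive overwrites our own contribution first. */
void finish_naive(const int *sbuf, int *rbuf, const int *partial, size_t n)
{
    assert(rbuf != NULL || n == 0);
    fake_recv(rbuf, partial, n);
    for (size_t i = 0; i < n; ++i) {
        rbuf[i] += sbuf[i];
    }
}

/* Adjusted final step: when the buffers alias (the MPI_IN_PLACE case),
 * receive into a temporary and add it to rbuf, which still holds our
 * contribution. Returns 0 on success, -1 if the allocation fails. */
int finish_with_tmpbuf(const int *sbuf, int *rbuf, const int *partial, size_t n)
{
    if (sbuf != rbuf) {
        finish_naive(sbuf, rbuf, partial, n);
        return 0;
    }
    if (n == 0) {
        return 0;               /* nothing to do, and no malloc(0) */
    }

    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) {
        return -1;
    }
    fake_recv(tmp, partial, n);
    for (size_t i = 0; i < n; ++i) {
        rbuf[i] += tmp[i];
    }
    free(tmp);
    return 0;
}
```

With a local contribution of `{1, 2}` and a neighbor partial of `{10, 20}`, the naive path on aliased buffers yields `{20, 40}` (the partial counted twice), while the temporary-buffer path yields the expected `{11, 22}`.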

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2017-01-25 18:08:08 -08:00
Nysal Jan K.A
94f92f6b49 osc/base: Detect unsupported data types and abort
Using MPI_MINLOC or MPI_MAXLOC with the following data types
leads to data corruption:
 * MPI_DOUBLE_INT
 * MPI_LONG_INT
 * MPI_SHORT_INT
 * MPI_LONG_DOUBLE_INT

Detect this, print an error message, and abort.
This workaround should be removed once the following issue is resolved:
 * https://github.com/open-mpi/ompi/issues/1666

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-25 15:28:28 -06:00
Sameh S. Sharkawi
320ab3b84f pml/base: Expose some bsend variables so PMLs may reference them
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-25 15:21:53 -06:00
Ralph Castain
a7b8190fdc Per f2f meeting: if async modex is given, default to no MPI init barrier, letting the user override that if desired.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-25 10:13:53 -08:00
Joshua Hursey
a2d45f6e9f communicator: Fix uninitialized variable
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 16:46:13 -06:00
Zhi Ming Wang
9718bbac82 Fix a minor error at MPI_AINT_DIFF.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 16:06:14 -06:00
Mark Allen
a3452adfa9 coll/base: Allgatherv MPI_IN_PLACE Bug
MPI_Allgatherv with MPI_IN_PLACE reads data from wrong location.

The code located the MPI_IN_PLACE send buffer as
```c
         send_buf = (char*)rbuf;
         for (i = 0; i < rank; ++i) {
             send_buf += ((ptrdiff_t)rcounts[i] * extent);
         }
```
when it should be
```c
         send_buf = (char*)rbuf;
         send_buf += ((ptrdiff_t)disps[rank] * extent);
```
because disps[] specifies where things are in the v-style buffers.
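
To see why the two computations differ, here is a small standalone sketch, assuming offsets are counted in elements. The helper names `offset_from_counts`, `offset_from_disps`, and `in_place_send_buf` are hypothetical and exist only for this illustration. Summing the receive counts is only correct when the blocks are packed back to back in rank order; the displacement array allows gaps and reordering.

```c
#include <assert.h>
#include <stddef.h>

/* Old computation: sum the receive counts of all lower ranks.
 * Only valid when blocks are contiguous and stored in rank order. */
ptrdiff_t offset_from_counts(const int *rcounts, int rank)
{
    ptrdiff_t off = 0;
    for (int i = 0; i < rank; ++i) {
        off += (ptrdiff_t)rcounts[i];
    }
    return off;
}

/* Corrected computation: read the offset from the displacement array. */
ptrdiff_t offset_from_disps(const int *disps, int rank, int size)
{
    assert(rank >= 0 && rank < size);
    return (ptrdiff_t)disps[rank];
}

/* Locate the in-place send data for `rank` inside the receive buffer. */
char *in_place_send_buf(void *rbuf, const int *disps, int rank, int size,
                        ptrdiff_t extent)
{
    return (char *)rbuf + offset_from_disps(disps, rank, size) * extent;
}
```

For example, with `rcounts = {2, 3, 1}` and `disps = {4, 0, 8}`, rank 2's data lives at element 8, but summing the counts would point at element 5.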

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 15:52:36 -06:00
Edgar Gabriel
cbb3cb9745 fs/ufs: avoid using the exclusive flag with shared file pointer
when a file is opened a second time for shared file pointer operations,
avoid setting the create and exclusive flag.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-24 12:11:29 -06:00
Edgar Gabriel
f5289a1803 common/ompio: store correctly the SHAREDFP_IS_SET flag
It looks like disabling the lazy_open flag for sharedfp components
revealed a bug that led to a crash in file_close in some tests. Make
sure the SHAREDFP_IS_SET flag is correctly set (and not overwritten again),
and use it to avoid a double-free of the communicator.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-24 12:09:56 -06:00
Gilles Gouaillardet
d5aa310884 mpiext/affinity: initialize all output variables of OMPI_Affinity_str()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:29 +09:00
Gilles Gouaillardet
501eb8dc7e ompio: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:19 +09:00
Gilles Gouaillardet
d0629f18c2 coll/libnbc: optimize size one communicators
simply "return" with ompi_request_empty if the communicator size is 1

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:12:47 +09:00
Gilles Gouaillardet
6f2ca5809b man: fix a typo in MPI_Win_get_name()
Thanks Nicolas Joly for the report

Fixes open-mpi/ompi#2782

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:08:13 +09:00
Edgar Gabriel
4dc09de3b8 common/ompio: update comment based on the previous commit.
No source code changed.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-23 13:38:05 -06:00
Edgar Gabriel
3eae0eecd0 io/ompio: change default for sharedfp_lazy_open parameter
Revert the logic of io_ompio_sharedfp_lazy_open. The user now has to explicitly
disable shared fp in order for the structures not to be allocated.
Otherwise, resetting the shared fp, e.g. when the file was opened
in append mode, will not work correctly and the code could deadlock.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-23 08:59:22 -06:00
Edgar Gabriel
d3a8d38cc6 common/ompio: correctly position shared fp in append mode
Fixes a bug reported on the mailing list. ompio only repositioned the individual
file pointer when the file was opened in append mode. Also set the shared file
pointer to point to the end of the file, as is done for the individual
file pointer.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-23 08:59:05 -06:00
Nathan Hjelm
0497ec0b70 osc/rdma: fix typo in check for MPI_MODE_NOCHECK
This commit fixes two typos in the lock_all path that inverted the
MPI_MODE_NOCHECK flag.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-12 11:28:11 -07:00
Gilles Gouaillardet
4932391002 ompi/proc: fix ompi_proc_finalize()
revert bits of open-mpi/ompi@cf534d0c95
we cannot del_procs here since the pml framework has already been closed

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-12 11:41:35 +09:00
George Bosilca
c2cd717f82 Don't refcount the predefined datatypes.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-11 16:48:59 -05:00
Gilles Gouaillardet
2189c5bcc3 ompi/dpm: plug a memory leak in disconnect_waitall()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
cf534d0c95 ompi/proc: plug a memory leak in ompi_proc_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
1daa80d78f mtl/psm2: plug a memory leak in ompi_mtl_psm2_component_open()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 09:28:32 +09:00
Joshua Ladd
57c0c847d0 Merge pull request #2603 from xinzhao3/topic/revert-ucx-mt
Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."
2017-01-04 11:50:37 -05:00
Ralph Castain
66131b4183 Remove the bcol, coll/ml, and sbgp code as stale and lacking a maintainer
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-03 19:32:48 -08:00
Ralph Castain
dadc6fbaf6 Merge pull request #2448 from thananon/remove_request_lock
Completely removed ompi_request_lock and ompi_request_cond
2017-01-03 19:31:46 -08:00
Jeff Squyres
33d2988985 Merge pull request #2647 from OMGtechy/master
Fixed -Wmisleading-indentation in ad_read_coll.c
2017-01-03 12:24:22 -05:00
Ralph Castain
fe68f23099 Only instantiate the HWLOC topology in an MPI process if it actually will be used.
There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced:

* shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string
  instead of the topology itself, if available, thus avoiding instantiating the topology

* openib BTL. This uses the distance matrix. At present, I haven't developed a method
  for replacing that reference. Thus, this component will instantiate the topology

* usnic BTL. Uses the distance matrix.

* treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate
  the topology

* ess base functions. If a process is direct launched and not bound at launch, this
  code attempts to bind it. Thus, procs in this scenario will instantiate the
  topology

Note that instantiating the topology on complex chips such as KNL can consume
megabytes of memory.

Fix pernode binding policy

Properly handle the unbound case

Correct pointer usage

Do not free static error messages!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-29 10:33:29 -08:00
Joshua Gerrard
94e87654c6 Fixed -Wmisleading-indentation in ad_read_coll.c
Signed-off-by: Joshua Gerrard <joshuagerrard+ompi-commit@protonmail.com>
2016-12-28 20:14:13 +00:00
Jeff Squyres
d772fcf8f1 Merge pull request #2509 from OMGtechy/master
Fixed memory leak and some -Werror=unused-result warnings
2016-12-27 17:13:23 -05:00
Nysal Jan K.A
25ba507ada mpit: Fix MPI_T_pvar_get_index
MPI_T_pvar_get_index was returning an incorrect index. The index
was never set correctly while registering the performance variables.
Additionally fix a missing case in the mca_base_var_type_t to MPI
datatype conversion. This type is currently used for control variables
registered by mxm, fca and hcoll components.

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2016-12-22 12:30:21 +05:30
Gilles Gouaillardet
773cad6b3e ompi/debugger: fix mqs_version_string()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-22 15:00:47 +09:00