1
1
Граф коммитов

8896 Коммитов

Автор SHA1 Сообщение Дата
Igor Ivanov
8b05f308f9 opal/memory: Move Memory Allocation Hooks usage from openib
These changes fix issue https://github.com/open-mpi/ompi/issues/1336

- improve abstractions: opal/memory/linux component should be single place that opeartes with
Memory Allocation Hooks.
- avoid collisions in case dynamic component open/close: it is safe because it is linked statically.
- does not change original behaivour.
2016-02-11 14:46:35 +02:00
Gilles Gouaillardet
96310f439b sentinel: fix 32 bits arch
since a sentinel is only made from the current job, only store the
first 31 bits of the vpid into the sentinel.
2016-02-10 15:44:07 +09:00
Gilles Gouaillardet
b55b9e6aee sentinel: fix sentinel to proc_name conversion
converting an opal_process_name_t means the loss of one bit,
it was decided to restrict the local job id to 15 bits, so the
useful information of an opal_process_name_t can fit in 63 bits.
2016-02-10 15:44:07 +09:00
Gilles Gouaillardet
030a5f2054 sentinel: use type uintptr_t for sentinel
MSB is now automatically cleared when right shifting
Thanks George for pointing this
2016-02-10 11:28:56 +09:00
Jeff Squyres
d537ee9f26 Merge pull request #1340 from jsquyres/pr/decrease-mpi_add_procs_cutoff
RFC: ompi_mpi_params.c: set mpi_add_procs_cutoff default to 0
2016-02-09 13:36:43 -05:00
Jeff Squyres
902b477aac ompi_mpi_params.c: set mpi_add_procs_cutoff default to 0
Decrease the default value of the "mpi_add_procs_cutoff" MCA param
from 1024 to 0.
2016-02-09 09:41:36 -08:00
George Bosilca
7c574a3530 Typo. 2016-02-07 07:22:22 +02:00
Nathan Hjelm
5b9c82a964 osc/pt2pt: bug fixes
This commit fixes several bugs identified by @ggouaillardet and MTT:

 - Fix SEGV in long send completion caused by missing update to the
   request callback data.

 - Add an MPI_Barrier to the fence short-cut. This fixes potential
   semantic issues where messages may be received before fence is
   reached.

 - Ensure fragments are flushed when using request-based RMA. This
   allows MPI_Test/MPI_Wait/etc to work as expected.

 - Restore the tag space back to 16-bits. It was intended that the
   space be expanded to 32-bits but the required change to the
   fragment headers was not committed. The tag space may be expanded
   in a later commit.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-04 16:59:39 -07:00
Gilles Gouaillardet
6eac6a8b00 osc/sm: create datafile into the per proc directory in order to make it unique per communicator
Thanks Peter Wind for the report
2016-02-03 10:12:37 +09:00
Nathan Hjelm
519fffb65e osc/pt2pt: eager sends are always active if MPI_MODE_NOCHECK is used
This commit fixes open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-02 12:44:17 -07:00
Nathan Hjelm
d7264aa613 osc/pt2pt: various threading fixes
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:

 - The code that signaled the actual start of an access epoch changed
   the eager_send_active flag on a synchronization object without
   holding the object's lock. This could cause another thread waiting
   on eager sends to block indefinitely because the entirety of
   ompi_osc_pt2pt_sync_expected could exectute between the check of
   eager_send_active and the conditon wait of
   ompi_osc_pt2pt_sync_wait.

 - The bookkeeping of fragments could get screwed up when performing
   long put/accumulate operations from different threads. This was
   caused by the fragment flush code at the end of both put and
   accumulate. This code was put in place to avoid sending a large
   number of unexpected messages to a peer. To fix the bookkeeping
   issue we now 1) wait for eager sends to be active before stating
   any large isend's, and 2) keep track of the number of large isends
   associated with a fragment. If the number of large isends reaches
   32 the active fragment is flushed.

 - Use atomics to update the large receive/send tag counters. This
   prevents duplicate tags from being used. The tag space has also
   been updated to use the entire 16-bits of the tag space.

These changes should also fix open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-02 12:33:33 -07:00
Gilles Gouaillardet
cda094afc7 mpi_f08: correctly implements MPI_{COMM,TYPE,WIN}_{DUP,NULL_{COPY,DELETE}}_FN
Fixes open-mpi/ompi#1323
2016-02-02 13:38:01 +09:00
Gilles Gouaillardet
728a97c558 use-mpi-f08: remove duplicates from Makefile.am 2016-02-02 13:33:07 +09:00
Jeff Squyres
910eca751f Merge pull request #1327 from ggouaillardet/poc/mpi_xxx_dup_yyy_no_bind
f08: do not BIND(C) to subroutines with LOGICAL parameters
2016-02-01 17:56:27 -05:00
Edgar Gabriel
3f7fff5780 Merge pull request #1331 from edgargabriel/solaris-statfs-fix
Solaris statfs fix
2016-01-28 20:16:33 -06:00
Gilles Gouaillardet
69ba2a9b6b ddt: fix support of MPI_COMBINER_RESIZED in __ompi_datatype_create_from_args
Thanks James Ramsey for the report
2016-01-28 11:32:29 +09:00
Nathan Hjelm
a19c265ab5 osc/rdma: fix typo in ompi_osc_rdma_complete_atomic
The typo caused SEGVs on systems with only fetching atomic
support.

Fixes open-mpi/ompi#1329

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-26 15:44:07 -07:00
Edgar Gabriel
b4a725c26a need to check for the parent dir as well, since the file might not exist yet. 2016-01-26 13:49:21 -06:00
Edgar Gabriel
722aab92e6 - extend opal_path_nfs to retrieve the file system type
- use opal_path_nfs in the fs_base function to avoid code duplication.
2016-01-26 13:36:21 -06:00
Gilles Gouaillardet
704f14f91e f08: do not BIND(C) to subroutines with LOGICAL parameters
Thanks Paul Romano for reporting this issue.
2016-01-26 13:56:24 +09:00
Joshua Ladd
69e3c6f289 Merge pull request #1321 from jladd-mlnx/topic/add-allgatherv-reduce
Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce.
2016-01-25 20:46:52 -05:00
Nathan Hjelm
500e90422d Merge pull request #1320 from hjelmn/osc_rdma_fix
osc/rdma: fix hang when performing large unaligned gets
2016-01-25 09:36:13 -07:00
Nathan Hjelm
45da311473 osc/rdma: fix hang when performing large unaligned gets
This commit adds code to handle large unaligned gets. There are two
possible code paths for these transactions:

 1) The remote region and local region have the same alignment. In
 this case the get will be broken down into at most three get
 transactions: 1 transaction to get the unaligned start of the region
 (buffered), 1 transaction to get the aligned portion of the region,
 and 1 transaction to get the end of the region.

 2) The remote and local regions do not have the same alignment. This
 should be an uncommon case and is not optimized. In this case a
 buffer is allocated and registered locally to hold the aligned data
 from the remote region. There may be cases where this fails (low
 memory, can't register memory). Those conditions are unlikely and
 will be handled later.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-22 21:06:46 -07:00
Valentin Petrov
5e2a2c0755 BufFix for coll/hcoll: coll_request must be set to ACTIVE when alloced
If the state of the request is not set to OMPI_REQUEST_ACTIVE
       then MPI_Test would immediately signal such request completed
       while hcoll may still be working on it.

Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-23 03:23:59 +02:00
Joshua Ladd
e398bf6f3a Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-23 03:09:29 +02:00
Nathan Hjelm
49d2f44b97 osc/rdma: use correct endpoint for local state
If atomics are not globally visible (cpu and nic atomics do not mix)
then a btl endpoint must be used to access local ranks. To avoid
issues that are caused by having the same region registered with
multiple handles osc/rdma was updated to always use the handle for
rank 0. There was a bug in the update that caused osc/rdma to continue
using the local endpoint for accessing the state even though the
pointer/handle are not valid for that endpoint. This commit fixes the
bug.

Fixes open-mpi/ompi#1241.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-22 10:41:27 -07:00
Nathan Hjelm
243d973cfe Merge pull request #1316 from hjelmn/datatype_pack_threads
ompi/datatype: make datatype pack thread safe
2016-01-21 20:14:10 -07:00
Nathan Hjelm
b921831f2b ompi/datatype: make datatype pack thread safe
This commit makes ompi_datatype_get_pack_description thread safe. The
call is used by osc/pt2pt to send the packed description to remote
peers. Before this commit if MPI_THREAD_MULTIPLE is enabled and the
user uses MPI_Put, MPI_Get, etc we could hit a race where multiple
threads attempt to store the packed description on the datatype. Since
the code in question is not performance-critical the threading fix
uses opal_atomic_* calls instead of bothering with OPAL_THREAD_*.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-21 17:53:28 -07:00
Nathan Hjelm
6180386bea osc/rdma: disable put aggregation when using threads
Optimizing put aggregation in the presence of threads will require a
redesign of the code. For now just ensure that put aggregation is
turned off when MPI_THREAD_MULTIPLE is enabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-21 15:50:35 -07:00
Edgar Gabriel
b253d4e887 fix CID 1349739, CID 1349738, CID 1349736 and (probably) CID 1349740 (not entirely sure about the last one, since I don't understand why block[i] is a problem but max_len[i] allocated and treated exactly the same way 1 line later is not). 2016-01-21 08:32:23 -06:00
Edgar Gabriel
9b8d769e41 will rivist the addproc component later in spring, right now it is constantly in the way of doing my tests. 2016-01-20 15:05:51 -06:00
Edgar Gabriel
1671604dbc Merge pull request #1307 from edgargabriel/fcoll-dynamic_gen2
Fcoll dynamic gen2
2016-01-20 10:19:56 -06:00
Francois WELLENREITER
411b7301c3 OSC portals4 : do not generate an EVENT_SEND to avoid to filter it 2016-01-20 11:47:46 +01:00
Gilles Gouaillardet
2adbe273d6 mpi: have MPI_Wtick() return the period (and not the frequency) if OPAL_TIMER_CYCLE_NATIVE 2016-01-20 14:14:47 +09:00
Gilles Gouaillardet
c0f8f2ce32 ompi/dpm: correctly handle sentinels in construct peers
This fix is similar to open-mpi/ompi@4c1ea4a171
and open-mpi/ompi@213b2abde4
2016-01-18 09:57:38 +09:00
Edgar Gabriel
a9ca37059a improve the communicaton abstraction. This commit also allows all aggregators to work simultaniously, instead of the slightly staggered way of the previous version. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
56e11bfc97 initialize the stripe_size variable as well. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
26c57ef374 separate the size of the buffer used for the shuffle step and the size of the buffer used for a pwritev operation. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
39d5c8c281 further bug fixes silencing a compiler warning and fixing a memory overrun 2016-01-17 09:48:49 -06:00
Edgar Gabriel
2bcae84e11 further debugging 2016-01-17 09:48:49 -06:00
Edgar Gabriel
2bdd6ba17a correctly free some buffers, and ensure that lustre_stripe_size and stripe_count are always read from the file system. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
4bbb22bd0b add a new field to the ompio data structure (stripe_count) and set it correctly on pvfs2 and lustre. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
d282e94b67 add the new dynamic_gen2 component, designed to coexist for now with the original dynamic component 2016-01-17 09:48:49 -06:00
Jeff Squyres
60ffe713b8 common syms: whitelist bison-generated common symbols
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-01-16 03:53:14 -08:00
Jeff Squyres
96f94f8228 fortran: whitelist deliberate common symbols
The Fortran library has a number of common symbols that are
deliberate, so whitelist them.
2016-01-16 03:53:14 -08:00
Joshua Ladd
18c5a21562 Fix typo in error handling flow. 2016-01-14 22:28:54 +02:00
Joshua Ladd
afa62d8ca1 Addressing reviewers' comments for https://github.com/open-mpi/ompi-release/pull/891 2016-01-14 19:22:27 +02:00
Tomislav Janjusic
3858bc8e62 Adding support for dynamic endpoint creation
Signed-off-by: Tomislav Janjusic <tomislavj@mngx-apl-01.mtl.labs.mlnx>
Signed-off-by: Tomislavj Janjusic <tomislavj@mellanox.com>
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-12 22:17:03 +02:00
Nathan Hjelm
dd4d49cbbb Merge pull request #1278 from ggouaillardet/poc/osc_pt2pt
osc/pt2pt: use two distinct "namespaces" for tags
2016-01-12 09:49:31 -07:00
Nathan Hjelm
d26cc3fece ompi/group: do no decrement parent group proc pointers in destruct
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-11 12:56:11 -07:00
Edgar Gabriel
0a1b735eed use the actual preadv and pwritev functions if available. That's what the fbtl interfaces have been designed for. 2016-01-07 08:29:17 -06:00
Gilles Gouaillardet
4c1ea4a171 dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept()
this commit includes missing bits from open-mpi/ompi@213b2abde4
2016-01-07 09:11:03 +09:00
Gilles Gouaillardet
213b2abde4 dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept() 2016-01-06 16:21:13 +09:00
Edgar Gabriel
1b0b849994 remove the MCA parameter setting the number of hosts in PLFS, since the plfs_setxattr function used is causing linking problems with PLFS 2.5
remove unused variables.
2016-01-05 11:13:23 -06:00
Edgar Gabriel
7861a8c357 revise the logic in the fbtl plfs avoiding the memcpy operation 2016-01-05 10:04:46 -06:00
Edgar Gabriel
da309ac962 - use a unique pid for each process as requested by the API
- sync the file before closing it
- use plfs_access() instead of access() before closing the file
2016-01-05 10:04:12 -06:00
KAWASHIMA Takahiro
ad26899110 osc/sm: Fix a bus error on MPI_WIN_{POST,START}.
A bus error occurs in sm OSC under the following conditions.

- sparc64 or any other architectures which need strict alignment.
- `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created
  by sm OSC.
- The communicator size is odd and greater than 3.

The lines 283-285 in current `ompi/mca/osc/sm/osc_sm_component.c` has
the following code.

```c
module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
module->posts[0] = (uint64_t *) (module->node_states + comm_size);
```

The size of `ompi_osc_sm_node_state_t` is multiples of 4 but not
multiples of 8. So if `comm_size` is odd, `module->posts[0]` does
not aligned to 8. This causes a bus error when accessing
`module->posts[i][j]`.

This patch fixes the alignment of `module->posts[0]` by setting
`module->posts[0]` first.
2016-01-05 19:04:53 +09:00
Gilles Gouaillardet
06ecdb6aa7 osc/pt2pt: use two distinct "namespaces" for tags 2016-01-05 16:57:37 +09:00
Gilles Gouaillardet
14fdf75944 fs/pvfs2: fix typo
Thanks Dave Love for reporting this issue.

Fixes #1272
2016-01-03 23:28:35 +09:00
Artem Polyakov
2abb2972ac Fix Mellanox copyrights with respect to the following PRs:
* https://github.com/open-mpi/ompi/pull/1184
* https://github.com/open-mpi/ompi/pull/1188
* https://github.com/open-mpi/ompi/pull/1197
* https://github.com/open-mpi/ompi/pull/1202
* https://github.com/open-mpi/ompi/pull/1210
* https://github.com/open-mpi/ompi/pull/1216
* https://github.com/open-mpi/ompi/pull/1236
* https://github.com/open-mpi/ompi/pull/1237
* https://github.com/open-mpi/ompi/pull/1248
* https://github.com/open-mpi/ompi/pull/1260
* https://github.com/open-mpi/ompi/pull/1264
2015-12-30 00:12:19 +06:00
Ralph Castain
810f2446b7 Add pmix120 component, update the error handling functions in the PMIx API.
Update the configure logic for the new pmix120 component

ckpt

Get the pmix120 component to work - still not really registering or handling notifications, but infrastructure now operates

Cleanup some of the symbol scopes, and provide a more comprehensive rename.h file. Will pretty it up later - let's see how this works

Cleanup the rename files to use the pretty macros
2015-12-28 23:15:44 +09:00
Gilles Gouaillardet
fec973efda configury: test portability
replace test ... -o ... with test ... || test ...
and test ... -a ... with test ... && test ...
2015-12-28 13:58:45 +09:00
Gilles Gouaillardet
47ab2fcb89 man: fix MPI_Neighbor_alltoall{v,w} prototypes
Thanks Willem Vermin for bringing this to our attention
2015-12-28 09:39:33 +09:00
Gilles Gouaillardet
ccc96ad204 fbtl/base: add missing #include "opal/util/output.h"
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:26 +09:00
Gilles Gouaillardet
cebde2a753 coll/tuned: add missing #include "opal/util/output.h"
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:17 +09:00
Gilles Gouaillardet
ad9693c604 pml/yalla: add missing #include <alloca.h> 2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
b38c17dbcb pml/cm: add missing #include <alloca.h>
Thanks Paul Hargrove for reporting this issue
2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
071ae39a44 osc/rdma: add missing #include <alloca.h> 2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
77f199d1d7 coll/fca: add missing #include <alloca.h> 2015-12-24 14:33:58 +09:00
Todd Kordenbrock
8a3660138e mtl-portals4: initialize endpoint nid/pid when using logical mapping
When mtl-portals4 is configured for logical mapping, coll-portals4
must disqualify because it does not yet support logical mapping.
coll-portals4 looks for the endpoint pid to be zero which tells it
that mtl-portals4 is configured for logical mapping.  This commit
initializes the endpoint nid/pid to zero for logical mapping.
2015-12-22 11:20:18 -06:00
Gilles Gouaillardet
e918d75fae java: try do dlopen libmpi with the full path
Since OS X 10.11 (aka El Capitan) DYLD_LIBRARY_PATH is no more
propagated to children, so try to dlopen libmpi with the full path
using the directory of libmpi_java

Fixes open-mpi/ompi#1220

Thanks Alexander Daryin for reporting this
2015-12-22 11:09:46 +09:00
rhc54
aa17bdf6e8 Merge pull request #1239 from rhc54/topic/cleanup
Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX
2015-12-21 07:23:31 -08:00
Edgar Gabriel
46c20a1246 correctly set all variables storing information on the file pointer position to zero when setting the file view 2015-12-21 09:41:39 +09:00
George Bosilca
12dad8b37f Fix the missing resize of the returned type for
the subarray and darray types.

Thanks Keith Bennett and Dan Garmann for reporting this issue

Fixes open-mpi/ompi#1191
2015-12-21 09:41:30 +09:00
George Bosilca
6e6fd14a19 Fix indentation. 2015-12-20 03:15:19 -05:00
George Bosilca
c895eb7068 Remove extraneous declaration. 2015-12-19 01:34:48 -05:00
Ralph Castain
ac6289dca6 Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX
Cleanup per George's comments
2015-12-17 17:39:15 -08:00
Nathan Hjelm
d0b4aa1f9a Merge pull request #1237 from artpol84/add_proc_deadlck_fix
Fix add_proc deadlock.
2015-12-17 12:09:40 -07:00
Artem Polyakov
6a791c3026 Fix add_proc deadlock. 2015-12-17 21:18:33 +06:00
igor.ivanov@itseez.com
041a6a9f53 ompi/pml: Fix warnings in yalla component 2015-12-16 16:22:30 +02:00
igor.ivanov@itseez.com
38c253c74c ompi/mtl: Fix warnings in mxm component 2015-12-16 16:22:29 +02:00
igor.ivanov@itseez.com
0a9956927a ompi/coll: Fix warnings in fca components
warning: assignment from incompatible pointer type
2015-12-16 16:22:16 +02:00
igor.ivanov@itseez.com
8f45d83d46 ompi/coll: Fix warnings in hcoll component
warning: assignment from incompatible pointer type
2015-12-16 14:52:29 +02:00
Ralph Castain
3a56f0d34b Create the pmix external component. Fix a few places where opal/util/argv.h were required when building with an external pmix (go figure).
NOTE: Building with external pmix *requires* that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated.

Closes #1204  (replaces it)
Fixes #1064
2015-12-15 15:26:13 -08:00
Nathan Hjelm
4992c22f4a Merge pull request #1224 from hjelmn/osc_fixes
osc/rdma: fix bugs when running more than one process per node
2015-12-15 14:01:01 -08:00
Nathan Hjelm
0de9445fc7 osc/rdma: fix bugs when running more than one process per node
A previous commit updated the one-sided code to register the state
region only once. This created an issue when using the scratch lock
with fetching atomics. In this case on any rank that isn't local rank
0 the module->state_handle is NULL. This commit fixes the issue by
removing the scratch lock and using a fragment pointer instead.

Fixes open-mpi/ompi#1290

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-15 11:25:25 -07:00
Jeff Squyres
a2a5d650f9 Merge pull request #1180 from ggouaillardet/mpi_xxx_dup_fn
fortran: add missing MPI_xxx_DUP_FN bindings
2015-12-15 13:15:27 -05:00
Gilles Gouaillardet
f0df2a7b2b ompi: silence CID 1343322 2015-12-15 13:33:43 +09:00
Nathan Hjelm
139799f3c4 Merge pull request #1202 from artpol84/alltoall_fix
Fix MPI_Alltoall to support inter-communicators.
2015-12-14 14:33:23 -08:00
Nathan Hjelm
b7ba301310 Merge pull request #1165 from hjelmn/add_procs_group
ompi/group: release ompi_proc_t's at group destruction
2015-12-14 13:53:42 -08:00
Nathan Hjelm
9d659465b7 Merge pull request #1210 from artpol84/icbarrier_fix
Fix NBC iBarrier for inter-communicators.
2015-12-14 13:52:38 -08:00
Nathan Hjelm
4b3dac5933 Merge pull request #1216 from artpol84/icgatherv_fix
Fix NBC iGatherv for inter-communicators.
2015-12-14 13:51:58 -08:00
Matias Cabral
7cfd7d50b9 Merge pull request #1219 from matcabral/PSM2_tag_hashing
Support for PSM2 hashing lookup in message queue.
2015-12-14 12:01:55 -08:00
matcabral
9a1f9be146 A new internal feature in PSM2 will use hash tables to
accelerate message queue lookups if the lookups have
the proper tag&mask layout. OpenMPI should follow
PSM2's preferred tag&mask spec, so that PSM2 can provide
a performance benefit.
2015-12-14 10:13:39 -08:00
Artem Polyakov
2d0919dbdc Fix NBC iGatherv for inter-communicators.
We need to use remote size to form a schedule.
2015-12-14 12:19:10 +06:00
Ralph Castain
5e5adebf8e Port the changes from #782 to the master. Not everything applies here as the code in the 1.10 series is a little different. In addition, we asked for a few changes (e.g., using MPI_ERR_ARG instead of "13") that are incorporated here.
Thanks to @jsharpe for the PR
2015-12-12 12:40:34 -08:00
Artem Polyakov
fc17deca43 Fix NBC iBarrier for inter-communicators.
Remove send of the extra message. This bug hase triggered on
MPICH/coll/nbicbarrier test. In this test a series of communicators
are created.
This extre-message was reseived after original communicator was destroyed
and queued into non_existing_communicator_pending. When new completely
unrelated communicator with the same id as original was created this message
was pushed into the frags_cant_match queue and caused seq numbers skew and hang.
2015-12-12 13:27:31 +06:00
Gilles Gouaillardet
3a3b13ea12 coll/base: fix an integer overflow in ompi_coll_base_reduce_generic
Refs open-mpi/ompi#1198
2015-12-11 13:55:59 +09:00
Artem Polyakov
25077fc5d9 Fix MPI_Alltoall to support inter-communicators.
Remove excessive parameter check to avoid premature exit from the collective.

MPI standard says:
The type signature associated with sendcount, sendtype, at a process must be equal to
the type signature associated with recvcount, recvtype at any other process. This implies
that the amount of data sent must be equal to the amount of data received, pairwise between
every pair of processes.

In case of inter-communicator we have 2 group of processes and "left" group may call
MPI_Alltoall(NULL, 0, MPI_INT, buf, 10, MPI_INT, comm, ...);
and the right one:
MPI_Alltoall(buf,10,MPI_INT, NULL, 0, MPI_INT, comm, ...);

And it would be legal though one of the group will receive 0 bytes from others.

This was triggered by MPICH/coll test called icalltoall.
2015-12-11 08:50:34 +06:00
Alina Sklarevich
3ffd8dcd20 PML UCX: fix typo (following 7becc54d). 2015-12-10 13:51:10 +02:00
Artem Polyakov
ee71e35a90 Fix ompi_comm_create when source communicator is inter-communicator.
This bug was triggered by probe-intercom and icm tests from MPICH suite.
2015-12-09 15:44:26 +02:00
Gilles Gouaillardet
3a62341b30 Merge pull request #1189 from ggouaillardet/topic/empty_ddt_fix
ddt: duplicate MPI_DATATYPE_NULL when ompi_datatype_create_indexed of…
2015-12-09 15:29:03 +09:00
Nathan Hjelm
f317ba5262 Merge pull request #1163 from hjelmn/ompi_proc_threads
ompi/proc: make proc system always thread safe
2015-12-08 10:36:55 -07:00
Nathan Hjelm
b47a64f27d Merge pull request #1188 from artpol84/intercomm_split_fix
Yet one more fix to intercommunicator splitting logic.
2015-12-08 07:09:46 -07:00
Nathan Hjelm
dae3746d2f Merge pull request #1190 from kawashima-fj/pr/sm-win-test-fix
osc/sm: Fix a bug that `MPI_WIN_TEST` does not update `flag` to 0
2015-12-08 06:39:16 -07:00
KAWASHIMA Takahiro
9c7b6a4352 osc/sm: Fix a bug that MPI_WIN_TEST does not update flag to 0.
`MPI_WIN_TEST` must update the `flag` parameter to 0 when not all
origin processes called `MPI_WIN_COMPLETE`. But sm OSC doesn't.
If the caller initialize the `flag` argument to a non-0 value,
the caller will receive the non-0 `flag` value.
2015-12-08 19:23:21 +09:00
Gilles Gouaillardet
59a361b781 ompio: correctly handle zero f_cc_size in mca_io_ompio_simple_grouping 2015-12-08 17:00:11 +09:00
Gilles Gouaillardet
d43ad3fada ddt: duplicate MPI_DATATYPE_NULL when ompi_datatype_create_indexed of ompi_datatype_create_indexed_block is invoked with a zero count 2015-12-08 16:25:36 +09:00
Artem Polyakov
7690f4027a Yet one more fix to intercommunicator splitting logic.
Previous commit f2794740 reverts Nathans changes. However it turns out
that I was unable to trace his logic until I started investigation of
icsplit hang. Bug was triggered when splitting Intercom was giving a group
where on side of the communicator was empty (icsplit, intercom create #2).
in this case remote_size == 0 and there is no way to distinguish between
inter- and intra-communicator.
Conclusion: We do need to distinguish between intra- and inter-communicators.
So we should use ompi_mpi_group_null.group.
2015-12-08 08:43:08 +02:00
Nathan Hjelm
63d8feb31c Merge pull request #1187 from hjelmn/bsend_fix
pml/ob1: add missing ompi_request_wait_completion for buffered sends
2015-12-07 23:09:04 -07:00
Nathan Hjelm
f68c315188 pml/ob1: add missing ompi_request_wait_completion for buffered sends
This commit adds a call to ompi_request_wait_completion for buffered
sends. Without this line it is possible to get into a state where the
data is never sent.

Fixes open-mpi/ompi#1185

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-07 22:28:07 -07:00
Nathan Hjelm
eb830b9501 ompi_proc_pack: correctly handle proc sentinels
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-07 17:27:38 -07:00
Artem Polyakov
f2794740b3 Fix intercommunicator split (was triggered by MPICH/icsend test) 2015-12-07 15:41:29 +02:00
Gilles Gouaillardet
bfe8e03d9d fcoll/two_phase: use ompi_mpi_abort instead of PMPI_Abort
Thanks Jeff for the review
2015-12-07 11:34:36 +09:00
Gilles Gouaillardet
ef03bc726c ompi: fix comment in ompi/mpi/c/Makefile.am
Thanks Jeff for the review
2015-12-07 11:34:01 +09:00
Gilles Gouaillardet
37c978f5e9 coll/libnbc: correctly handle changed types.
this fixes open-mpi/ompi@d816d1c194
thanks Jeff for the review
2015-12-07 10:13:43 +09:00
Gilles Gouaillardet
26b2ed1069 fortran: add missing MPI_xxx_DUP_FN bindings in use-mpi-tkr
- MPI_COMM_DUP_FN
- MPI_TYPE_DUP_FN
- MPI_WIN_DUP_FN
2015-12-07 09:10:48 +09:00
George Bosilca
3a9664ac9d Fix Coverity CIDs 1341584-1341589. 2015-12-06 14:06:36 -05:00
Jeff Squyres
ad35a363fa Merge pull request #1179 from jsquyres/pr/mpi-testsome-man-page-update
Pr/mpi testsome man page update
2015-12-04 05:55:33 -05:00
bosilca
8fee96c086 Merge pull request #1091 from bosilca/topic/datatype_span
Cleanup the temporary memory allocation in collectives
2015-12-03 19:25:04 -05:00
Jeff Squyres
0adcb5b5cd MPI_Testsome.3in: wrap some long lines
Wrap some long lines; no other text or semantics changes.
2015-12-03 17:06:43 -05:00
Jeff Squyres
11c571b568 MPI_Testsome.3in: add explicit verbiage about return values
Instead of solely relying on the out value definitions in
MPI_Waitsome.3, explicitly copy this text here.

Note that the original text in this man page was copied verbatim from
the MPI spec; we've now added a bit more text (copied from
MPI_Waitsome.3in) that explains the out values so that users don't
have to cross-reference to another man page.

Thanks to Eric Schnetter for the suggestion.

Fixes open-mpi/ompi#1153
2015-12-03 17:06:22 -05:00
Gilles Gouaillardet
a5440ade5f topo/treematch: do not invoke hwloc_topology_{init,load}
* this is not necessary
 * this overwrites existing topology, that could be different if hwloc_base_topo_file is used
2015-12-03 11:24:32 +09:00
George Bosilca
688108cf7f Patch submitted by @ggouaillardet on ticket #1091. 2015-12-02 20:42:18 -05:00
George Bosilca
4d00c59b2e Cleanup the memory handling for temporary buffers in
some of the collective modules. Added a new function
opan_datatype_span, to compute the memory span of
count number of datatype, excluding the gaps in the
beginning and at the end. If a memory allocation is
made using the returned value, the gap (also returned)
should be removed from the allocated pointer.
2015-12-02 20:42:18 -05:00
Gilles Gouaillardet
351bd03249 ompi_proc_sentinel_to_name: clear the top left bit 2015-12-02 17:18:56 +09:00
Jeff Squyres
15325c8094 op/x86: change the owner to Ralph
Cisco no longer cares about this component, but Intel might.
Transferring ownership to Ralph.
2015-12-01 15:08:07 -08:00
igor-ivanov
d8c85738ab Merge pull request #1151 from igor-ivanov/pr/opal-abort-vars
Add new mca variables opal_abort_delay and opal_abort_print_stack
2015-12-01 16:27:11 +04:00
Nathan Hjelm
406b9ff1e6 ompi/group: add helper function for creating plist groups
This commit adds a helper function for creating groups from proc
lists. The function is used by ompi_comm_fill_rest to create the local
and remote groups.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 23:52:57 -07:00
Nathan Hjelm
5334d22a37 ompi/group: release ompi_proc_t's at group destruction
This commit changes the way ompi_proc_t's are retained/released by
ompi_group_t's. Before this change ompi_proc_t's were retained once
for the group and then once for each retain of a group. This method
adds unnecessary overhead (need to traverse the group list each time
the group is retained) and causes problems when using an async
add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 23:03:47 -07:00
Ryan Grant
324534b191 Merge pull request #1161 from tkordenbrock/topic/add.triggered.scatter
coll-portals4: add scatter and iscatter implementations that use Portals4 triggered operations
2015-11-30 16:53:47 -07:00
Nathan Hjelm
22af95b266 ompi/proc: make proc system always thread safe
This commit changes the OPAL_THREAD_LOCK/OPAL_THREAD_UNLOCK calls in
ompi/proc to opal_mutex_lock/opal_mutex_unlock. This will allow
multi-threaded BTLs the ability to creat ompi_proc_t's without having
to set opal_using_threads. There should be no performance hits as none
of the lock points are in the critical path.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 16:37:09 -07:00
Todd Kordenbrock
4721b70dd5 coll-portals4: add scatter and iscatter implementations that use Portals4 triggered operations
This commit adds implementations of scatter and iscatter using
Portals4 triggered operations.  Currently, the only algorithm
is linear.
2015-11-30 15:07:18 -06:00
Todd Kordenbrock
f6f525e0d8 coll-portals4: remove unneeded code from gather
This commit removes two pieces of unneeded code from gather.  First
it removes destroy_tree() calls from linear_top(), because the
linear algorithm does not create a tree, so there is no need to
destroy it.  Second it removes unpack_bytes from the gather request
because it was calculated but never used.
2015-11-30 10:38:51 -06:00
Gilles Gouaillardet
80f02518ff topo/base: correctly free the topo object in mca_topo_base_dist_graph_create_adjacent 2015-11-30 15:33:59 +09:00
Gilles Gouaillardet
8227bc6320 ompi_proc_find_and_add: use ompi_proc_allocate in order to update *both* ompi_proc_list and ompi_proc_hash 2015-11-30 14:00:59 +09:00
igor.ivanov@itseez.com
c15bf147bf opal: Add opal_abort_print_stack mca variable with aliases for ompi/oshmem
This commit allows to control output during abnormal oshmem/ompi application
termination.
Fixed issue in backtrace output. HAVE_BACKTRACE was never set so user was limited
in control of this variable.
Two related mca variables are moved to opal layer. Corresponding aliases are
added for ompi and oshmem.
2015-11-25 18:18:33 +02:00
Ryan Grant
81d482dca6 Merge pull request #1137 from francois-wellenreiter/trig_mtl_rdv
MTL portals4 : improve the rendez-vous protocol using PtlTriggeredGet…
2015-11-24 17:31:31 -07:00
Ryan Grant
219581e87e Merge pull request #1090 from tkordenbrock/topic/check.for.invalid.handles.in.finalize
mtl-portals4: test for valid handle before releasing resources
2015-11-20 07:54:44 -06:00
Mike Dubman
c544620a7c Merge pull request #1138 from igor-ivanov/pr/yalla-valgrind
yalla: fix valgrind error due to uninitialized status field.
2015-11-20 07:19:11 -05:00
Gilles Gouaillardet
002c7b8b3a fcoll/two_phase: use PMPI_* insted of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
561e7f6647 vprotocol/pessimist: use internal ompi_* insted of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
025fd8a9fc osc: use PMPI_* insted of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
d816d1c194 coll/libnbc: use PMPI_* and internal ompi_* insted of MPI_* 2015-11-20 13:46:19 +09:00
yosefe
3bb1270715 yalla: fix valgrind error due to uninitialized status field. 2015-11-19 10:59:31 +02:00
Francois WELLENREITER
9126ea5e82 MTL portals4 : improve the rendez-vous protocol using PtlTriggeredGet operation 2015-11-19 09:52:53 +01:00
Edgar Gabriel
9e5ade4e8b argh, a debugging sleep statement got into the source code. 2015-11-16 13:26:57 -06:00
Edgar Gabriel
dbfbcdecd5 make adjustments for the default settings of grouping parameters and the default contiguous group size option.
minor bug fix in the simple grouping strategy.
2015-11-16 08:17:27 -06:00
Edgar Gabriel
27628774c7 add a new option for a simple aggregator selection which has zero communication costs. 2015-11-16 08:17:26 -06:00
Edgar Gabriel
66c1ea5fcb change the default value of the grouping option. Also add new grouping option which skips the refinement step in the aggregator selection. 2015-11-16 08:17:23 -06:00