1
1
Граф коммитов

9090 Коммитов

Автор SHA1 Сообщение Дата
Edgar Gabriel
e65e189671 io/ompio: fix file size after file_preallocate
Thanks for @dalcini for reporting
Fixes open-mpi/ompi#1633
2016-05-06 08:20:59 -05:00
Edgar Gabriel
d358965134 io/ompio: fix envelope of datatype returned by getview
Thanks for @dalcini for reporting
Fixes open-mpi/ompi#1632
2016-05-06 08:19:48 -05:00
Edgar Gabriel
7c92acaa78 Merge pull request #1637 from edgargabriel/pr/netbsd-compilation-problems
fs/lustre and fs/pvfs2: fix netbsd compilation problems
2016-05-06 08:05:36 -05:00
Jeff Squyres
810db734c4 Merge pull request #1640 from jsquyres/pr/mpir-cleanup
debuggers: remove some useless code
2016-05-05 21:23:30 -04:00
Gilles Gouaillardet
6c9d65c0ca coll/libnbc: fix MPI_Ireduce_scatter_block for one task communicator
Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#248
2016-05-06 09:43:29 +09:00
Ralph Castain
08022d7af1 Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required. 2016-05-05 15:28:13 -07:00
Jeff Squyres
83c2d04aa3 debuggers: remove some useless code
MPIR-1.0 specifies that the following symbols are only relevant in the
starter process:

- MPIR_Breakpoint
- MPIR_being_debugged
- MPIR_debug_state
- MPIR_debug_abort_string

I.e., the code filling in values in these various symbols was useless
/ never used.

MPIR-1.1 will define that MPIR_being_debugged *is* relevant in MPI
processes.  That symbol is currently defined in libopen-rte (which is
currently causing a duplicate symbol error for static builds -- this
commit fixes that error), and is therefore still available for MPI
processes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-05 14:22:55 -07:00
Jeff Squyres
f167be1c91 ompio: always return valid info from FILE_GET_INFO
MPI-3.1 says that even if no info keys are set on the file, we need to
return a new, empty info.

Thanks to Lisandro Dalcin for identifying the issue.

Fixes open-mpi/ompi#1630

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-05 12:03:29 -07:00
Aurélien Bouteiller
4899c89731 Fix a race condition when multiple threads try to create a bml endpoint simultaneously. 2016-05-05 10:49:30 -04:00
Aurélien Bouteiller
627a89bf71 Fix a race condition when multiple threads do the "first send" to an endpoint simultaneously. 2016-05-05 09:04:10 -04:00
Joshua Ladd
4771c9ece6 Merge pull request #1617 from jladd-mlnx/topic/disable-hcoll-barrier-in-finalize-ompi-trunk
HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla
2016-05-04 10:12:34 -04:00
Aurélien Bouteiller
8344d00418 use-mpi extensions do not have a .la lib, so the fortran module should not depend on them. 2016-05-03 11:54:35 -04:00
Edgar Gabriel
78fa8bb2c4 remove some unused variables that can cause compilation problems on netbsd 2016-05-03 10:25:15 -05:00
Todd Kordenbrock
3498bed650 Merge pull request #1555 from shawone/check_reduce_ret
coll-portals4: check return value from reduce kary tree functions
2016-05-03 10:17:23 -05:00
Jeff Squyres
33dd8ca81e osc_rdma_peer: properly include ompi_config.h
Thanks to Paul Hargrove for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-03 07:39:55 -07:00
Devendar Bureddy
cafd55f18c HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla
tear down

HCOLL barrier may not complete if HCOLL progress is not called periodically.
which is the case in HCOLL teardown progress in the finalize.
(cherry picked from commit 793244d75dd94d1d5e0243bcccf6d04318750f3f)
2016-05-03 00:49:57 +03:00
Nathan Hjelm
d3d779f6d9 osc/rdma: clear all_sync object when obtaining a lock
This commit fixes a bad synchronization detection bug that occurs when
mixing MPI_Win_fence() and MPI_Win_lock(). If no communication has
occurred in the fence epoch it is safe to just clear the all_sync
object (it was set up by fence).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 15:28:47 -06:00
Jeff Squyres
265e5b9795 Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
Ralph Castain
6ac7929bd0 Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need.
Cleanups per @jjhursey review
2016-05-01 11:30:25 -07:00
George Bosilca
6e6ed62a3c Allow NULL arrays for emoty datatypes.
When building an empty datatype (aka. size = 0) because the count of
included datatypes is 0, be less strict on what the arguments are
(allow NULL pointers).
2016-05-01 12:37:02 -04:00
Nathan Hjelm
ec66a6a1f8 Merge pull request #1605 from hjelmn/rdma_fixes
osc/rdma: fix global index array calculation
2016-04-28 20:41:36 -06:00
Nathan Hjelm
7bda3eb2dc osc/rdma: fix global index array calculation
This commit fixes a bug that occurs when ranks are either not mapped
evenly or by something other than core.

Fixes open-mpi/ompi#1599

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-28 19:11:11 -06:00
Nathan Hjelm
1783d94f91 ompi/group: fix sparse group proc reference counting
This commit fixes a bug when sparse groups are in use. Since sparse
group do not actually increment the reference counts of any procs
(they just retain the parent group) it is wrong to decrement the
reference counts of all procs in the group using
ompi_group_decrement_proc_count(). This commit makes the call to
ompi_group_decrement_proc_count() conditional on the group being
dense.

Fixes open-mpi/ompi#1593

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-27 15:55:13 -06:00
Gilles Gouaillardet
01c90d4e71 fortran/mpif-h: fix *_create_keyval_f
correctly handle out parameter *_keyval when OMPI_SIZEOF_FORTRAN_INTEGER > SIZEOF_INT
2016-04-27 13:34:32 +09:00
Gilles Gouaillardet
178dde6a20 fortran/mpif-h: fix MPI_Win_shared_query
correctly handle out parameter disp_unit when OMPI_SIZEOF_FORTRAN_INTEGER > SIZEOF_INT
2016-04-27 11:22:09 +09:00
Gilles Gouaillardet
7f59d2a8c7 fortran/mpif-h: fix MPI_Win_free_keyval
initialize inout parameter when OMPI_SIZEOF_FORTRAN_INTEGER > SIZEOF_INT
2016-04-27 10:46:14 +09:00
Nathan Hjelm
f0f3383006 Merge pull request #1590 from hjelmn/thread_multiple
osc/pt2pt: do not drop/reacquire the ompi_request_lock
2016-04-26 16:48:37 -06:00
Nathan Hjelm
34ff6293bd osc/pt2pt: do not drop/reacquire the ompi_request_lock
This lock is now recursive so it is safe to call into the pml without
dropping the lock.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-26 14:19:38 -06:00
George Bosilca
bf190671e9 Make the request lock recursive.
If during the request completion callback we post another request that
completes right away (such a small send or a match for an unexpected
short message) we will try to complete the second request while holding
the lock for the completion of the first. For performance reasons
(mainly to avoid unlocking and locking the request mutex several times)
we have made the request lock recursive.
2016-04-26 16:16:07 -04:00
Nathan Hjelm
1e4daa2a0e mpi_init: move opal_set_using_threads() earlier in MPI_Init()
There is a potential race condition in MPI_Init() where an orte even
thread could be in a function that uses OPAL_THREAD_LOCK /
OPAL_THREAD_UNLOCK when ompi_mpi_init calls opal_set_using_threads().

Closes open-mpi/ompi#1586

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-26 13:02:42 -06:00
Nathan Hjelm
c16e639b2f Merge pull request #1563 from hjelmn/ompi_coverity
ompi coverity fixes
2016-04-26 09:17:48 -06:00
Jeff Squyres
8ab88f2051 ompi_mpi_finalize: add/update comments
This is a follow-on to open-mpi/ompi@7373111: add some comments
explaining why the code is the way it is.  Also update a previous
comment.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-04-25 13:42:30 -07:00
Ralph Castain
7373111662 Somehow, the logic for finalize got lost, so restore it here. If pmix.fence_nb is available, then call it and cycle opal_progress until complete. If pmix.fence_nb is not available, then do an MPI_Barrier and call pmix.fence.
Needs to go over to 2.x
2016-04-25 08:04:35 -07:00
Karol Mroz
3322347da9 ompi: fixup hostname max length usage
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-25 07:08:23 +02:00
Nathan Hjelm
ae0ffbb67f Merge pull request #1397 from hjelmn/enable_thread_multiple
ompi: always enable MPI_THREAD_MULTIPLE support
2016-04-23 08:40:22 -06:00
Joshua Ladd
0d5a57d9d3 Merge pull request #1558 from vspetrov/hcoll_complex_dtype_support
Adds mapping to hcoll complex data type
2016-04-20 08:35:33 -04:00
Gilles Gouaillardet
490b538ad6 ompi/datatype: fix MPI_LONG_LONG_INT type name
MPI_LONG_LONG_INT is a named predefined datatype, so its name is now MPI_LONG_LONG_INT
MPI_LONG_LONG is a synonym of MPI_LONG_LONG_INT, and its name is also MPI_LONG_LONG_INT
2016-04-20 09:34:20 +09:00
Nathan Hjelm
1ff3d3b16b pml/ob1: fix coverity issue
Fix CID 1357978 (1 of 1): Logically dead code (DEADCODE):

Remove duplicate check for NULL == endpoint.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 14:48:13 -06:00
Nathan Hjelm
70533e6d50 fcoll/static: fix coverity issues
Fix CID 72362: Explicit null dereferenced (FORWARD_NULL)

From what I can tell the code @ fcoll_static_file_read_all.c:649
should be setting bytes_per_process[i] to 0 not bytes_per_process.

Fix CID 72361: Explicit null dereferenced (FORWARD_NULL)

Modified check to check for blocklen_per_process non-NULL before
trying to free blocklen_per_process[l]. This is sufficient because
free (NULL) is safe. Also cleaned up the initialization of this an a
couple other arrays. They were allocated with malloc() then
initialized to 0. Changed to used calloc().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 14:48:13 -06:00
Nathan Hjelm
8871bdb2f8 fcoll/two_phase: fix coverity issues
Fix CID 72296: Resource leak (RESOURCE_LEAK):

Changed code to goto exit instead of returning to ensure memory is
freed.

Fix CID 712589: Out-of-bounds read (OVERRUN):

In this loop i and j are identical and always less than
iov_count. The CID was triggered because i was incremented if i was <
iov_count. This meant that if the loop did go on the next iteration
would access an invalid index.

Fix CID 741363: Uninitialized scalar variable (UNINIT):

Allocate tmp_len with calloc to insure every index is initialized.

Fix CID 741364: Uninitialized pointer read (UNINIT):

Allocate recv_types with calloc to ensure all indices are always
initialized. Also added a check to not loop and destroy if recv_types
is NULL.

Also added a NULL check on the allocation of decoded iov. This is not
the cause of CID 126784 but should be fixed.

Fix CID 712588: Out-of-bounds read (OVERRUN):

Similar to CID 712589. Should silence the issue.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 14:47:41 -06:00
Valentin Petrov
21f1c572c0 Adds mapping to hcoll complex dte 2016-04-19 14:14:28 +03:00
Nicolas Chevalier
c86d4035d2 coll-portals4: check return value from reduce kary tree functions 2016-04-18 12:02:30 +00:00
Ralph Castain
7829fbdc29 Per request from Jeff, aggregate all help messages during MPI_Init thru MPI_Finalize as long as the RTE is available 2016-04-15 13:37:22 -07:00
KAWASHIMA Takahiro
e854404570 fortran: Change the line order of #pragma
No code change.
These lines were introduced in my recent commit 17d32ac.
I had a editing mistake and the order is different from
other lines/files.
2016-04-15 12:49:03 +09:00
Nathan Hjelm
3245428e82 Merge pull request #1535 from kawashima-fj/pr/osc-pt2pt-header-fix
osc/pt2pt: Fix a struct name typo
2016-04-14 15:55:25 -06:00
Jeff Squyres
4566286b9a Merge pull request #1538 from kawashima-fj/pr/fortran-binding-fix
fortran: Fix many Fortran binding bugs
2016-04-14 17:18:59 -04:00
Nathan Hjelm
330302c4b4 Merge pull request #1534 from kawashima-fj/pr/parallel-rma-fix
osc/pt2pt: Fix tag conflicts on parallel RMA communications
2016-04-14 15:13:32 -06:00
Jeff Squyres
fdf33674b3 Merge pull request #1532 from kmroz/wip-hindexed-cleanup-1
romio,java: cleanup deprecated hindexed call
2016-04-14 17:07:31 -04:00
Jeff Squyres
2374d8fcf7 Merge pull request #1536 from kawashima-fj/pr/inplace-fix
mpi/c, mpi/fortran: Fix `MPI_IN_PLACE`-related bugs
2016-04-14 15:56:55 -04:00
Nathan Hjelm
b4e5b5c09e Merge pull request #1531 from hjelmn/bml
bml: always enable the bml
2016-04-14 10:22:33 -06:00
Nathan Hjelm
1e6b4f2f55 Merge pull request #1495 from hjelmn/new_hooks
Add new patcher memory hooks
2016-04-13 18:19:23 -06:00
Nathan Hjelm
11e2d7886e opal/memory: update component structure
This commit makes it possible to set relative priorities for
components. Before the addition of the patched component there was
only one component that would run on any system but that is no longer
the case. When determining which component to open each component's
query function is called and the one that returns the highest priority
is opened. The default priority of the patcher component is set
slightly higher than the old ptmalloc2/ummunotify component.

This commit fixes a long-standing break in the abstration of the
memory components. ompi_mpi_init.c was referencing the linux malloc
hook initilize function to ensure the hooks are initialized for
libmpi.so. The abstraction break has been fixed by adding a memory
base function that calls the open memory component's malloc hook init
function if it has one. The code is not yet complete but is intended
to support ptmalloc in 2.0.0. In that case the base function will
always call the ptmalloc hook init if exists.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:14:51 -06:00
KAWASHIMA Takahiro
4944ba7edc datatype: Fix incorrect predefined datatype names and other datatype bugs (#1537)
* datatype: Fix a incorrect datatype name of `MPI_UNSIGNED`

Name of predefined datatype for C `unsigned int` gotten by
`MPI_TYPE_GET_NAME` should be `MPI_UNSIGNED`, not `MPI_UNSIGNED_INT`.

* datatype: Fix incorrect datatype names of `MPI_C_BOOL` and `MPI_CXX_*`

Names of predefined datatypes gotten by `MPI_TYPE_GET_NAME` are:

after this commit (correct) | before this commit (incorrect)
-----------------------------------------------------------
MPI_C_BOOL                    MPI_BOOL
MPI_CXX_BOOL                  MPI_BOOL
MPI_CXX_FLOAT_COMPLEX         MPI_C_FLOAT_COMPLEX
MPI_CXX_DOUBLE_COMPLEX        MPI_C_DOUBLE_COMPLEX
MPI_CXX_LONG_DOUBLE_COMPLEX   MPI_C_LONG_DOUBLE_COMPLEX

* datatype: Fix a incorrect datatype name of `MPI_2DOUBLE_PRECISION`

Name of the predefined datatype for Fortran two `double precision`
gotten by `MPI_TYPE_GET_NAME` should be `MPI_2DOUBLE_PRECISION`,
not `MPI_2DBLPREC`.

This bug was caused by setting the name to `opal_datatype_t::name`
instead of `ompi_datatype_t::name`.

* datatype: Fix `MPI_UNSIGNED_CHAR` internal flag

`MPI_UNSIGNED_CHAR` is an integer type.

* ompi/cxx: Fix C++ `MPI::LONG_DOUBLE_INT` definition

Just a typo fix. Without this fix, `MPI::MAX_LOC` and `MPI::MIN_LOC`
cannot be used with `MPI::LONG_DOUBLE_INT` in C++ programs.

I know the C++ binding is obsolete, but fixing this is harmless.

* Add FUJITSU copyright
2016-04-12 20:17:46 +02:00
KAWASHIMA Takahiro
17d32acbb6 fortran: Add missing (P)MPI_Alloc_mem_cptr_{f,f08} symbols
This commit adds the following symbols

  MPI_Alloc_mem_cptr_f
  MPI_Alloc_mem_cptr_f08
  PMPI_Alloc_mem_cptr_f
  PMPI_Alloc_mem_cptr_f08

These are implemented in the same way as other `_cptr` routines.
2016-04-12 22:40:58 +09:00
KAWASHIMA Takahiro
d48c8525ed fortran: Fix incorrect weak symbol names 2016-04-12 22:16:32 +09:00
KAWASHIMA Takahiro
5d32a601ff fortran: Add missing interfaces (part 2) 2016-04-12 22:06:35 +09:00
KAWASHIMA Takahiro
6f09d53e34 fortran: Add missing interfaces 2016-04-12 21:44:33 +09:00
KAWASHIMA Takahiro
f3b9a49ad1 fortran: Add missing PMPI interfaces 2016-04-12 20:55:41 +09:00
KAWASHIMA Takahiro
b6cb0bc257 fortran: Fix an incorrect interface name 2016-04-12 20:48:08 +09:00
KAWASHIMA Takahiro
96e93a9c5f fortran: Sort declared subroutins in alphabetical order
And insert necessary empty lines and remove unnecessary empty lines.

No code change.
2016-04-12 20:36:46 +09:00
KAWASHIMA Takahiro
334c63cf0a fortran: Change subroutine declaration order
Same order for `comm`, `type`, and `win`.

No code change.
2016-04-12 20:10:15 +09:00
KAWASHIMA Takahiro
10c11ff5b5 fortran: Add missing MPI_DUP_FN subroutine
Though the `MPI_DUP_FN` subroutine is depricated, it is not yet removed
as of MPI-3.1.
2016-04-12 20:06:50 +09:00
KAWASHIMA Takahiro
35ea9e5c3c Add FUJITSU copyright 2016-04-12 13:47:53 +09:00
KAWASHIMA Takahiro
39bcbe439a osc/pt2pt: Fix a struct name typo
Fortunately the sizes of `ompi_osc_pt2pt_header_put_t` and
`ompi_osc_pt2pt_header_get_t` are same. So this doesn't affect
the behavior.
2016-04-11 20:55:22 +09:00
KAWASHIMA Takahiro
d3d6386578 mpi/forran: Support MPI_IN_PLACE on (I)ALLTOALLW and (I)EXSCAN
`MPI_IN_PLACE` support for `MPI_ALLTOALLW` and `MPI_EXSCAN` was
added in MPI-2.2 but it was missed in OMPI Fortran binding code.
2016-04-11 20:38:28 +09:00
KAWASHIMA Takahiro
28a0577364 osc/pt2pt: Insert breaks in long lines 2016-04-11 19:06:01 +09:00
KAWASHIMA Takahiro
5ac95df9dc osc/pt2pt: use two distinct "namespaces" for tags - revised
Before this commit, a same PML tag may be used for distinct
communications for long messages. For example, consider a condition
where rank A calls ```MPI_PUT``` targeting rank B and rank B calls
```MPI_GET``` targeting rank A simultaneously.
A PML tag for the ```MPI_PUT``` is acquired on rank A and is used
for the long-message communication from rank A to rank B.
A PML tag for the ```MPI_GET``` is acquired on rank B and is used
for the long-message communication from rank A to rank B.
These two tags may become a same value because they are managed
independently on each rank. This will cause a data corruption.

This commit separates the tag used in a single RMA communication
call, one for communication from an origin to a target, and one
for communication from a target to an origin. A "base" tag
is acquired using ```get_tag``` function and PML tag is caluculated
from the base tag by ```tag_to_target``` and ```tag_to_origin```
function.
2016-04-11 19:05:20 +09:00
KAWASHIMA Takahiro
3576ecafa7 Revert "osc/pt2pt: use two distinct "namespaces" for tags"
This reverts commit 06ecdb6aa7
to reimplement the fix completely.
2016-04-11 19:04:11 +09:00
KAWASHIMA Takahiro
eb5c31521b mpi/c: Fix MPI_IALLTOALLW memchecker 2016-04-11 18:47:30 +09:00
KAWASHIMA Takahiro
1ced7f213c mpi/c: Fix IALLTOALL{V|W} + MPI_IN_PLACE param check
`sendcounts`, `sdispls`, and `sendtype(s)` must be ignored
if `MPI_IN_PLACE` is specified for `sendbuf`.
This commit makes the param check code same as the blocking
`ALLTOALL{V|W}` function.
2016-04-11 18:34:11 +09:00
Karol Mroz
f8ecdbd623 java: replace deprecated hindexed call
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-10 19:56:22 +02:00
Karol Mroz
5c54184986 romio: replace deprecated hindexed call
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-10 19:56:22 +02:00
Nathan Hjelm
c6b19818be bml: always enable the bml
This commit ensures the bml is always enabled whether or not it will
be used. This ensures that any available btls communicate their modex
so that they can be used for one-sided communication.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-08 21:14:17 -06:00
George Bosilca
896f857fc4 Thanks @hjelmn for catching up the typo. 2016-04-07 13:56:26 -04:00
Thananon Patinyasakdikul
92290b94e0 Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN) 2016-04-07 12:52:17 -04:00
Ryan Grant
7cdf50533c Merge pull request #1314 from francois-wellenreiter/osc_disable_portals4_evt_send
OSC portals4 : do not generate an EVENT_SEND to avoid to filter it
2016-04-07 10:04:27 -06:00
Gilles Gouaillardet
7b803ac557 MPI_Unpack: fix error code when insize <= 0
this fixes a regression from open-mpi/ompi@f2e33c725f
2016-04-06 09:47:21 +09:00
Karol Mroz
a468c3ba1a opal_info_support: pass component map when handling params
Pass component_map to opal_info_do_params(). It will be needed to output
component versions.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-02 21:17:44 +02:00
Gilles Gouaillardet
f2e33c725f MPI_Unpack: fix return status
this regression was previously introduced in open-mpi/ompi@221e6e2eab
2016-03-31 09:56:54 +09:00
Gilles Gouaillardet
5932287cef datatype/[un]pack_external[_size]: move subroutines down to ompi/datatype
so it can be directly used by test/datatype/external32
2016-03-30 13:01:33 +09:00
Gilles Gouaillardet
221e6e2eab Add the datatype checks to the pack/unpack functions.
The datatype must satisfy the same constraints as for the
corresponding communication function (send for pack and
recv for unpack).
2016-03-30 11:40:08 +09:00
Gilles Gouaillardet
a89f113507 mpi/c: add missing OPAL_CR_EXIT_LIBRARY() in [un]pack[_external] 2016-03-30 11:25:21 +09:00
George Bosilca
004c0cc05b Fix issues identified by @derbeyn. 2016-03-29 15:50:32 -04:00
Jeff Squyres
91c54d7a07 Merge pull request #1491 from ICLDisco/progress_thread
BTL TCP async progress
2016-03-29 06:26:10 -04:00
George Bosilca
f69eba1bc4 Update the copyright and cleanup the code.
Per @jsquyres suggestion remove all trailing spaces.
Credit to `sed -i.bak 's/ *$//' */[ch]`.
2016-03-28 14:41:01 -04:00
Thananon Patinyasakdikul
92062492b9 Enable Threading in the BTL TCP
Added mca parameter to turn progress thread on/off
Add a flag to check if we have btl progress thread.
Added macro for ob1 matching lock.
Update the AUTHORS file.
2016-03-28 14:41:01 -04:00
Nathan Hjelm
9d5eeecb8a pml/ob1: detect unreachable errors
This commit adds code to detect when procs are unreachable when using
the dynamic add_procs functionality.

Fixes #1501

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-28 10:52:40 -06:00
Gilles Gouaillardet
1baed498b6 win: silence a warning in alloc_window(...) 2016-03-28 14:57:31 +09:00
Nathan Hjelm
d6e90f24b1 Merge pull request #1483 from hjelmn/flag_enum_2
RFC: Add support for flag enumerators for MCA variables
2016-03-26 11:43:33 -06:00
George Bosilca
57eadb0dd6 Fix for Coverity CID 1357152.
Or at least that was the origin of the issue. It turns out
we were freeing the wrong buffer (but as it only happen in the
case of an error we never noticed).
2016-03-24 00:53:30 -04:00
George Bosilca
4b38b6bd0c Fix multiple issues with the collective requests.
This patch addresses most (if not all) @derbeyn concerns
expressed on #1015. I added checks for the requests allocation
in all functions, ompi_coll_base_free_reqs is called with the
right number of requests, I removed the unnecessary basic_module_comm_t
and use the base_module_comm_t instead, I remove all uses of the
COLL_BASE_BCAST_USE_BLOCKING define, and other minor fixes.
2016-03-23 18:35:41 -04:00
Nathan Hjelm
a1420003b6 ompi/comm: clean up includes in comm_request.h
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-22 09:17:38 -06:00
Nathan Hjelm
b15a45088c mca: add support for flag enumerators
This commit adds a new type of enumerator meant to support flag
values. The enumerator parses comma-delimited strings and matches
each string or value to a list of valid flags. Additionally, the
enumerator does some basic checks to see if 1) a flag is valid in the
enumerator, and 2) if any conflicting flags are specified.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-21 15:20:56 -06:00
Todd Kordenbrock
2122a15217 Merge pull request #1443 from francois-wellenreiter/fix_trig_rndv
MTL portals4 : fix around triggered rndv operations
2016-03-21 08:16:33 -05:00
Ralph Castain
c146c4969b Revert part of open-mpi/ompi@c1bbbb5e2f to restore the usock component, thus fixing show_help aggregation.
Fixes #1467

Restore debugger attach operations

Fixes #1225
2016-03-18 21:49:04 -07:00
Nathan Hjelm
075dfa4121 topo/treematch: fix component coverity issues
Fix CID 1315298: Resource leak (RESOURCE_LEAK) :
Fix CID 1315300: Resource leak (RESOURCE_LEAK):
Fix CID 1315299: Resource leak (RESOURCE_LEAK):
Fix CID 1315297 (#1 of 1): Resource leak (RESOURCE_LEAK):

Confirmed leaks in error paths. Added the leaked arrays to the
ERR_EXIT macro to ensure they are freed.

Fix CID 1315296 (#1 of 1): Resource leak (RESOURCE_LEAK):

Confirmed leak in error paths. Both the oversub and reqs arrays are
leaked. Free these arrays on error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 11:31:11 -06:00
Nathan Hjelm
3540b65f7d bcol: fix coverity issues
Fix CID 1269976 (#1 of 1): Unused value (UNUSED_VALUE):
Fix CID 1269979 (#1 of 1): Unused value (UNUSED_VALUE):

Removed unused variables k_temp1 and k_temp2.

Fix CID 1269981 (#1 of 1): Unused value (UNUSED_VALUE):
Fix CID 1269974 (#1 of 1): Unused value (UNUSED_VALUE):

Removed gotos and use the matched flags to decide whether to return.

Fix CID 715755 (#1 of 1): Dereference null return value (NULL_RETURNS):

This was also a leak. The items on cs->ctl_structures are allocated using OBJ_NEW so they mist be released using OBJ_RELEASE not OBJ_DESTRUCT. Replaced the loop with OPAL_LIST_DESTRUCT().

Fix CID 715776 (#1 of 1): Dereference before null check (REVERSE_INULL):

Rework error path to remove REVERSE_INULL. Also added a free to an error path where it was missing.

Fix CID 1196603 (#1 of 1): Bad bit shift operation (BAD_SHIFT):
Fix CID 1196601 (#1 of 1): Bad bit shift operation (BAD_SHIFT):

Both of these are false positives but it is still worthwhile to fix so they no longer appear. The loop conditional has been updated to use radix_mask_pow instead of radix_mask to quiet these issues.

Fix CID 1269804 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS):

In general close (-1) is safe but coverity doesn’t like it. Reworked the error path for open to not try to close (-1).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 10:59:46 -06:00
Nathan Hjelm
c8b077f232 coll/ml: fix coverity issues
Fix CID 715744 (#1 of 1): Logically dead code (DEADCODE):
Fix CID 715745 (#1 of 1): Logically dead code (DEADCODE):

The free of scratch_num in either place is defensive programming. Instead of removing the free the conditional around the free has been removed to quiet the warning.

Fix CID 715753 (#1 of 1): Dereference after null check (FORWARD_NULL):
Fix CID 715778 (#1 of 1): Dereference before null check (REVERSE_INULL):

Fixed the conditional to check for collective_alg != NULL instead of collective_alg->functions != NULL.

Fix CID 715749 (#1 of 4): Explicit null dereferenced (FORWARD_NULL):

Updated code to ensure that none of the parse functions are reached with a non-NULL value.

Fix CID 715746 (#1 of 1): Logically dead code (DEADCODE):

Removed dead code.

Fix CID 715768 (#1 of 1): Resource leak (RESOURCE_LEAK):
Fix CID 715769 (#2 of 2): Resource leak (RESOURCE_LEAK):
Fix CID 715772 (#1 of 1): Resource leak (RESOURCE_LEAK):

Move free calls to before error checks to cleanup leak in error paths.

Fix CID 741334 (#1 of 1): Explicit null dereferenced (FORWARD_NULL):

Added a check to ensure temp is not dereferenced if it is NULL.

Fix CID 1196605 (#1 of 1): Bad bit shift operation (BAD_SHIFT):

Fixed overflow in calculation by replacing int mask with 1ul.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 10:11:16 -06:00
Nathan Hjelm
2f4e5325aa coll/base: fix coverity issues
Fix CID 1325868 (#1 of 1): Dereference after null check (FORWARD_NULL):
Fix CID 1325869 (#1-2 of 2): Dereference after null check (FORWARD_NULL):

Here reqs can indeed be NULL. Added a check to
ompi_coll_base_free_reqs to prevent dereferencing NULL pointer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 09:31:43 -06:00
Nathan Hjelm
2ed4501490 osc: fix coverity issues
Fix CID 1324726 (#1 of 1): Free of address-of expression (BAD_FREE):

Indeed, if a lock conflicts with the lock_all we will end up trying to
free an invalid pointer.

Fix CID 1328826 (#1 of 1): Dereference after null check (FORWARD_NULL):

This was intentional but it would be a good idea to check for
module->comm being non_NULL to be safe. Also cleaned out some checks
for NULL before free().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 09:11:48 -06:00
Nathan Hjelm
b9d100929b man: fix typo in MPI_Win_allocate_shared
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-16 14:47:40 -06:00
Nathan Hjelm
ec9712050b Merge pull request #1118 from hjelmn/mpool_rewrite
mpool/rcache rewrite
2016-03-15 10:46:24 -06:00
Nathan Hjelm
deae9e52bf Merge pull request #1259 from kawashima-fj/pr/osc-sm-align
osc/sm: Fix a bus error on MPI_WIN_{POST,START}.
2016-03-15 09:13:38 -06:00
Francois WELLENREITER
2bc432d95f MTL portals4 : fix around triggered rndv operations 2016-03-15 15:31:04 +01:00
Nathan Hjelm
d4afb16f5a opal: rework mpool and rcache frameworks
This commit rewrites both the mpool and rcache frameworks. Summary of
changes:

 - Before this change a significant portion of the rcache
   functionality lived in mpool components. This meant that it was
   impossible to add a new memory pool to use with rdma networks
   (ugni, openib, etc) without duplicating the functionality of an
   existing mpool component. All the registration functionality has
   been removed from the mpool and placed in the rcache framework.

 - All registration cache mpools components (udreg, grdma, gpusm,
   rgpusm) have been changed to rcache components. rcaches are
   allocated and released in the same way mpool components were.

 - It is now valid to pass NULL as the resources argument when
   creating an rcache. At this time the gpusm and rgpusm components
   support this. All other rcache components require non-NULL
   resources.

 - A new mpool component has been added: hugepage. This component
   supports huge page allocations on linux.

 - Memory pools are now allocated using "hints". Each mpool component
   is queried with the hints and returns a priority. The current hints
   supported are NULL (uses posix_memalign/malloc), page_size=x (huge
   page mpool), and mpool=x.

 - The sm mpool has been moved to common/sm. This reflects that the sm
   mpool is specialized and not meant for any general
   allocations. This mpool may be moved back into the mpool framework
   if there is any objection.

 - The opal_free_list_init arguments have been updated. The unused0
   argument is not used to pass in the registration cache module. The
   mpool registration flags are now rcache registration flags.

 - All components have been updated to make use of the new framework
   interfaces.

As this commit makes significant changes to both the mpool and rcache
frameworks both versions have been bumped to 3.0.0.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-14 10:50:41 -06:00
Gilles Gouaillardet
eb690432e8 fortran: add missing constants for MPI_WIN_CREATE_FLAVOR and MPI_WIN_MODEL
also add valid values used by MPI_WIN_CREATE_FLAVOR :
- MPI_WIN_FLAVOR_CREATE
- MPI_WIN_FLAVOR_ALLOCATE
- MPI_WIN_FLAVOR_DYNAMIC
- MPI_WIN_FLAVOR_SHARED
2016-03-14 10:19:21 +09:00
Nathan Hjelm
60a3eb12ac comm_spawn_multiple_f: fix coverity issue
Fix CID 1327338: Resource leak (RESOURCE_LEAK):

Confirmed that the c_info array was being leaked. Free the array
before returning.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-11 13:07:01 -07:00
Gilles Gouaillardet
fbed6df4a3 coll/base: fix a typo
typo was introduced in open-mpi/ompi@c98e97a46e
2016-03-11 14:18:03 +09:00
Gilles Gouaillardet
0da1374f22 man: fix typo in MPI_File related man pages 2016-03-11 14:16:21 +09:00
Gilles Gouaillardet
d08fb46ec7 ompi/win: use type int* for MPI_WIN_DISP_UNIT, MPI_WIN_CREATE_FLAVOR and MPI_WIN_MODEL
Thanks Alastair McKinstry for the report.
2016-03-11 09:22:25 +09:00
Aurélien Bouteiller
c98e97a46e Do not return MPI_ERR_PENDING from collectives. 2016-03-09 16:13:34 -05:00
Joshua Ladd
4dffae2f88 Fixing MXM Yalla and MTL add procs behavior. MXM cannot support dynamic add procs, so propaget this info to the MTL and PML layers. 2016-03-08 01:46:24 +02:00
Aurélien Bouteiller
892e1ed57e Fix a potential race condition in which a progress matching thread could match a request while we are cancelling it. 2016-03-01 16:43:45 -05:00
Gilles Gouaillardet
8aff67c399 topo/base: correctly support MPI_UNWEIGHTED in mca_topo_base_dist_graph_neighbors()
Thanks Jun Kudo for the bug report.
2016-03-01 10:28:28 +09:00
Jeff Squyres
89d0a033b7 cxx: "rank" is now a function in C++11
Use "myrank" instead (I tried using ::rank, but had varied
success... so I just renamed the variable).
2016-02-25 15:56:08 -06:00
George Bosilca
dbe93b0b19 Use mca_bml_base_get_endpoint
Correctly use mca_bml_base_get_endpoint instead of accessing the
endpoint directly.
2016-02-25 11:00:30 -06:00
Sylvain Jeaugey
5f32f49eb8 pml/ob1: Fix segmentation fault on CUDA path.
Fix segfault due to mca_pml_ob1_cuda_need_buffers not handling the case of the
endpoint not being there. Calling mca_bml_get_endpoint() seems to fix the problem.

Fixes open-mpi/ompi#1402
2016-02-24 21:32:25 -08:00
Gilles Gouaillardet
d8482ce6f4 opal/mca/memory: add a memoryc_set_alignment subroutine to the OPAL memory MCA
this commit also (partially) reverts :
 - open-mpi/ompi@7de01b347c
 - open-mpi/ompi@8b05f308f9
2016-02-24 09:50:12 +09:00
Nathan Hjelm
230d04327e ompi: always enable MPI_THREAD_MULTIPLE support
This commit removes the --with-mpi-thread-multiple option and forces
MPI_THREAD_MULTIPLE support. This cleans up an abstration violation
in opal where OMPI_ENABLE_THREAD_MULTIPLE determines whether the
opal_using_threads is meaningful. To reduce the performance hit on
MPI_THREAD_SINGLE programs an OPAL_UNLIKELY is used for the
check on opal_using_threads in OPAL_THREAD_* macros.

This commit does not clean up the arguments to the various functions
that take whether muti-threading support is enabled. That should be
done at a later time.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-23 10:02:14 -07:00
Edgar Gabriel
45003ef78d fix the data size counter for large ops for the static fcoll component 2016-02-23 08:33:50 -06:00
Gilles Gouaillardet
308bbcbad1 ompi/dpm: retrieves OPAL_PMIX_ARCH in heterogeneous mode
also remove code duplication by using ompi_proc_complete_init_single()

Thanks Siegmar Gross for reporting this issue, and Ralph for the guidance.
2016-02-22 11:01:06 +09:00
Gilles Gouaillardet
a4aa4c9571 ompi_proc_complete_init_single: make the subroutine public
and accept a proc from a different job
2016-02-22 11:01:06 +09:00
yohann
59b6d041f8 mtl/ofi: Check allocated pointer. 2016-02-19 16:59:47 -08:00
yohann
bd47062764 mtl/ofi: Fix error handling. 2016-02-19 16:58:41 -08:00
yohann
404987e9b3 mtl/ofi: Fix mismatching types. 2016-02-19 16:57:26 -08:00
yohann
3ad59435ce mtl/ofi: Prevent possible memory leak. 2016-02-19 16:57:02 -08:00
Edgar Gabriel
92d1b99468 optimize the shuffle step:
1. use communicator collectives if possible for performance reasons
 2. combined multiple allgathers into a single one
2016-02-19 11:04:04 -06:00
Edgar Gabriel
e63836c653 clean up the mca parameter handling of the component. Add new parameters for number of sub groups and write chunk size. This will allow to perform a systematic parameter study. 2016-02-19 10:15:28 -06:00
Edgar Gabriel
4f400314e0 add the dynamic_gen2 component into the fcoll selection table. 2016-02-19 09:32:54 -06:00
Edgar Gabriel
268d525053 change the tag to be a positive value. handle 0-byte situations correctly. 2016-02-19 08:28:50 -06:00
Edgar Gabriel
ad79012059 first cut on the version which overlaps the communication/computation of 2 iterations. 2016-02-19 08:28:50 -06:00
Jeff Squyres
7b73c868d5 memchecker.h: fix memchecker no-data case
Thanks to @clintonstimpson for reporting the issue.

Fixes open-mpi/ompi#100

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-02-18 10:48:11 -08:00
Ralph Castain
60a7bc2e50 Enable the PMIx notification callback system. This currently is only supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion.
Fixes ##1225
2016-02-18 09:29:12 -08:00
yohann
7fe395c82a mtl/ofi: cleanup 2016-02-16 09:57:57 -08:00
yohann
22eddfee10 mtl/ofi: update copyright dates. 2016-02-16 09:56:09 -08:00
Gilles Gouaillardet
7de01b347c ompi/init: fix abstraction violation
This fixes open-mpi/ompi@8b05f308f9

libmpi.so cannot be built (unresolved symbols) with configure'd with
--disable-mem-debug --disable-mem-profile --disable-memchecker --without-memory-manager
2016-02-16 16:39:21 +09:00
igor-ivanov
d9eefefa74 Merge pull request #1351 from igor-ivanov/pr/issue-1336
opal/memory: Move Memory Allocation Hooks usage from openib
2016-02-15 14:07:36 +04:00
George Bosilca
56425a5d48 Fix issue identified by Lisandro Dalcin regarding the lack
of support for NULL value in MPI_Type_set_attr.
Provides a fix for issue #1359.
2016-02-14 00:07:08 -05:00
George Bosilca
68c36ea9dc Fix two annoying warnings in our UCX support. 2016-02-14 00:02:16 -05:00
Jeff Squyres
7bc62e8f4c Merge pull request #1356 from hjelmn/get_address
Fix MPI_Get_address (MPI_BOTTOM, ...)
2016-02-13 08:27:18 -05:00
Nathan Hjelm
064a67f5b9 Fix MPI_Get_address (MPI_BOTTOM, ...)
Nowhere in the standard does it say that it is invalid to pass
MPI_BOTTOM to MPI_Get_address yet we were returning an error. This
commit removes the error check on NULL == location.

Fixes open-mpi/ompi#1355.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-12 16:34:21 -07:00
yohann
67ce4a080a mtl/ofi: FI_AV_MAP support only. 2016-02-12 10:06:52 -08:00
yohann
b3d8ead76e mtl/ofi: Fix dynamic add_procs. 2016-02-12 10:05:52 -08:00
Jeff Squyres
d98616b9ed Merge pull request #1337 from ggouaillardet/poc/f08_fn
mpi_f08: correctly implements MPI_{COMM,TYPE,WIN}_{DUP,NULL_{COPY,DEL…
2016-02-11 12:27:29 -05:00
Igor Ivanov
8b05f308f9 opal/memory: Move Memory Allocation Hooks usage from openib
These changes fix issue https://github.com/open-mpi/ompi/issues/1336

- improve abstractions: opal/memory/linux component should be single place that opeartes with
Memory Allocation Hooks.
- avoid collisions in case dynamic component open/close: it is safe because it is linked statically.
- does not change original behaivour.
2016-02-11 14:46:35 +02:00
Gilles Gouaillardet
96310f439b sentinel: fix 32 bits arch
since a sentinel is only made from the current job, only store the
first 31 bits of the vpid into the sentinel.
2016-02-10 15:44:07 +09:00
Gilles Gouaillardet
b55b9e6aee sentinel: fix sentinel to proc_name conversion
converting an opal_process_name_t means the loss of one bit,
it was decided to restrict the local job id to 15 bits, so the
useful information of an opal_process_name_t can fit in 63 bits.
2016-02-10 15:44:07 +09:00
Gilles Gouaillardet
030a5f2054 sentinel: use type uintptr_t for sentinel
MSB is now automatically cleared when right shifting
Thanks George for pointing this
2016-02-10 11:28:56 +09:00
Jeff Squyres
d537ee9f26 Merge pull request #1340 from jsquyres/pr/decrease-mpi_add_procs_cutoff
RFC: ompi_mpi_params.c: set mpi_add_procs_cutoff default to 0
2016-02-09 13:36:43 -05:00
Jeff Squyres
902b477aac ompi_mpi_params.c: set mpi_add_procs_cutoff default to 0
Decrease the default value of the "mpi_add_procs_cutoff" MCA param
from 1024 to 0.
2016-02-09 09:41:36 -08:00
George Bosilca
7c574a3530 Typo. 2016-02-07 07:22:22 +02:00
Nathan Hjelm
5b9c82a964 osc/pt2pt: bug fixes
This commit fixes several bugs identified by @ggouaillardet and MTT:

 - Fix SEGV in long send completion caused by missing update to the
   request callback data.

 - Add an MPI_Barrier to the fence short-cut. This fixes potential
   semantic issues where messages may be received before fence is
   reached.

 - Ensure fragments are flushed when using request-based RMA. This
   allows MPI_Test/MPI_Wait/etc to work as expected.

 - Restore the tag space back to 16-bits. It was intended that the
   space be expanded to 32-bits but the required change to the
   fragment headers was not committed. The tag space may be expanded
   in a later commit.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-04 16:59:39 -07:00
Gilles Gouaillardet
6eac6a8b00 osc/sm: create datafile into the per proc directory in order to make it unique per communicator
Thanks Peter Wind for the report
2016-02-03 10:12:37 +09:00
Nathan Hjelm
519fffb65e osc/pt2pt: eager sends are always active if MPI_MODE_NOCHECK is used
This commit fixes open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-02 12:44:17 -07:00
Nathan Hjelm
d7264aa613 osc/pt2pt: various threading fixes
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:

 - The code that signaled the actual start of an access epoch changed
   the eager_send_active flag on a synchronization object without
   holding the object's lock. This could cause another thread waiting
   on eager sends to block indefinitely because the entirety of
   ompi_osc_pt2pt_sync_expected could exectute between the check of
   eager_send_active and the conditon wait of
   ompi_osc_pt2pt_sync_wait.

 - The bookkeeping of fragments could get screwed up when performing
   long put/accumulate operations from different threads. This was
   caused by the fragment flush code at the end of both put and
   accumulate. This code was put in place to avoid sending a large
   number of unexpected messages to a peer. To fix the bookkeeping
   issue we now 1) wait for eager sends to be active before stating
   any large isend's, and 2) keep track of the number of large isends
   associated with a fragment. If the number of large isends reaches
   32 the active fragment is flushed.

 - Use atomics to update the large receive/send tag counters. This
   prevents duplicate tags from being used. The tag space has also
   been updated to use the entire 16-bits of the tag space.

These changes should also fix open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-02 12:33:33 -07:00
Gilles Gouaillardet
cda094afc7 mpi_f08: correctly implements MPI_{COMM,TYPE,WIN}_{DUP,NULL_{COPY,DELETE}}_FN
Fixes open-mpi/ompi#1323
2016-02-02 13:38:01 +09:00
Gilles Gouaillardet
728a97c558 use-mpi-f08: remove duplicates from Makefile.am 2016-02-02 13:33:07 +09:00
Jeff Squyres
910eca751f Merge pull request #1327 from ggouaillardet/poc/mpi_xxx_dup_yyy_no_bind
f08: do not BIND(C) to subroutines with LOGICAL parameters
2016-02-01 17:56:27 -05:00
Edgar Gabriel
3f7fff5780 Merge pull request #1331 from edgargabriel/solaris-statfs-fix
Solaris statfs fix
2016-01-28 20:16:33 -06:00
Gilles Gouaillardet
69ba2a9b6b ddt: fix support of MPI_COMBINER_RESIZED in __ompi_datatype_create_from_args
Thanks James Ramsey for the report
2016-01-28 11:32:29 +09:00
Nathan Hjelm
a19c265ab5 osc/rdma: fix typo in ompi_osc_rdma_complete_atomic
The typo caused SEGVs on systems with only fetching atomic
support.

Fixes open-mpi/ompi#1329

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-26 15:44:07 -07:00
Edgar Gabriel
b4a725c26a need to check for the parent dir as well, since the file might not exist yet. 2016-01-26 13:49:21 -06:00
Edgar Gabriel
722aab92e6 - extend opal_path_nfs to retrieve the file system type
- use opal_path_nfs in the fs_base function to avoid code duplication.
2016-01-26 13:36:21 -06:00
Gilles Gouaillardet
704f14f91e f08: do not BIND(C) to subroutines with LOGICAL parameters
Thanks Paul Romano for reporting this issue.
2016-01-26 13:56:24 +09:00
Joshua Ladd
69e3c6f289 Merge pull request #1321 from jladd-mlnx/topic/add-allgatherv-reduce
Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce.
2016-01-25 20:46:52 -05:00
Nathan Hjelm
500e90422d Merge pull request #1320 from hjelmn/osc_rdma_fix
osc/rdma: fix hang when performing large unaligned gets
2016-01-25 09:36:13 -07:00
Nathan Hjelm
45da311473 osc/rdma: fix hang when performing large unaligned gets
This commit adds code to handle large unaligned gets. There are two
possible code paths for these transactions:

 1) The remote region and local region have the same alignment. In
 this case the get will be broken down into at most three get
 transactions: 1 transaction to get the unaligned start of the region
 (buffered), 1 transaction to get the aligned portion of the region,
 and 1 transaction to get the end of the region.

 2) The remote and local regions do not have the same alignment. This
 should be an uncommon case and is not optimized. In this case a
 buffer is allocated and registered locally to hold the aligned data
 from the remote region. There may be cases where this fails (low
 memory, can't register memory). Those conditions are unlikely and
 will be handled later.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-22 21:06:46 -07:00
Valentin Petrov
5e2a2c0755 BufFix for coll/hcoll: coll_request must be set to ACTIVE when alloced
If the state of the request is not set to OMPI_REQUEST_ACTIVE
       then MPI_Test would immediately signal such request completed
       while hcoll may still be working on it.

Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-23 03:23:59 +02:00
Joshua Ladd
e398bf6f3a Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-23 03:09:29 +02:00
Nathan Hjelm
49d2f44b97 osc/rdma: use correct endpoint for local state
If atomics are not globally visible (cpu and nic atomics do not mix)
then a btl endpoint must be used to access local ranks. To avoid
issues that are caused by having the same region registered with
multiple handles osc/rdma was updated to always use the handle for
rank 0. There was a bug in the update that caused osc/rdma to continue
using the local endpoint for accessing the state even though the
pointer/handle are not valid for that endpoint. This commit fixes the
bug.

Fixes open-mpi/ompi#1241.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-22 10:41:27 -07:00
Nathan Hjelm
243d973cfe Merge pull request #1316 from hjelmn/datatype_pack_threads
ompi/datatype: make datatype pack thread safe
2016-01-21 20:14:10 -07:00
Nathan Hjelm
b921831f2b ompi/datatype: make datatype pack thread safe
This commit makes ompi_datatype_get_pack_description thread safe. The
call is used by osc/pt2pt to send the packed description to remote
peers. Before this commit if MPI_THREAD_MULTIPLE is enabled and the
user uses MPI_Put, MPI_Get, etc we could hit a race where multiple
threads attempt to store the packed description on the datatype. Since
the code in question is not performance-critical the threading fix
uses opal_atomic_* calls instead of bothering with OPAL_THREAD_*.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-21 17:53:28 -07:00
Nathan Hjelm
6180386bea osc/rdma: disable put aggregation when using threads
Optimizing put aggregation in the presence of threads will require a
redesign of the code. For now just ensure that put aggregation is
turned off when MPI_THREAD_MULTIPLE is enabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-21 15:50:35 -07:00
Edgar Gabriel
b253d4e887 fix CID 1349739, CID 1349738, CID 1349736 and (probably) CID 1349740 (not entirely sure about the last one, since I don't understand why block[i] is a problem but max_len[i] allocated and treated exactly the same way 1 line later is not). 2016-01-21 08:32:23 -06:00
Edgar Gabriel
9b8d769e41 will rivist the addproc component later in spring, right now it is constantly in the way of doing my tests. 2016-01-20 15:05:51 -06:00
Edgar Gabriel
1671604dbc Merge pull request #1307 from edgargabriel/fcoll-dynamic_gen2
Fcoll dynamic gen2
2016-01-20 10:19:56 -06:00
Francois WELLENREITER
411b7301c3 OSC portals4 : do not generate an EVENT_SEND to avoid to filter it 2016-01-20 11:47:46 +01:00
Gilles Gouaillardet
2adbe273d6 mpi: have MPI_Wtick() return the period (and not the frequency) if OPAL_TIMER_CYCLE_NATIVE 2016-01-20 14:14:47 +09:00
Gilles Gouaillardet
c0f8f2ce32 ompi/dpm: correctly handle sentinels in construct peers
This fix is similar to open-mpi/ompi@4c1ea4a171
and open-mpi/ompi@213b2abde4
2016-01-18 09:57:38 +09:00
Edgar Gabriel
a9ca37059a improve the communicaton abstraction. This commit also allows all aggregators to work simultaniously, instead of the slightly staggered way of the previous version. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
56e11bfc97 initialize the stripe_size variable as well. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
26c57ef374 separate the size of the buffer used for the shuffle step and the size of the buffer used for a pwritev operation. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
39d5c8c281 further bug fixes silencing a compiler warning and fixing a memory overrun 2016-01-17 09:48:49 -06:00
Edgar Gabriel
2bcae84e11 further debugging 2016-01-17 09:48:49 -06:00
Edgar Gabriel
2bdd6ba17a correctly free some buffers, and ensure that lustre_stripe_size and stripe_count are always read from the file system. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
4bbb22bd0b add a new field to the ompio data structure (stripe_count) and set it correctly on pvfs2 and lustre. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
d282e94b67 add the new dynamic_gen2 component, designed to coexist for now with the original dynamic component 2016-01-17 09:48:49 -06:00
Jeff Squyres
60ffe713b8 common syms: whitelist bison-generated common symbols
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-01-16 03:53:14 -08:00
Jeff Squyres
96f94f8228 fortran: whitelist deliberate common symbols
The Fortran library has a number of common symbols that are
deliberate, so whitelist them.
2016-01-16 03:53:14 -08:00
Joshua Ladd
18c5a21562 Fix typo in error handling flow. 2016-01-14 22:28:54 +02:00
Joshua Ladd
afa62d8ca1 Addressing reviewers' comments for https://github.com/open-mpi/ompi-release/pull/891 2016-01-14 19:22:27 +02:00
Tomislav Janjusic
3858bc8e62 Adding support for dynamic endpoint creation
Signed-off-by: Tomislav Janjusic <tomislavj@mngx-apl-01.mtl.labs.mlnx>
Signed-off-by: Tomislavj Janjusic <tomislavj@mellanox.com>
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-12 22:17:03 +02:00
Nathan Hjelm
dd4d49cbbb Merge pull request #1278 from ggouaillardet/poc/osc_pt2pt
osc/pt2pt: use two distinct "namespaces" for tags
2016-01-12 09:49:31 -07:00
Nathan Hjelm
d26cc3fece ompi/group: do no decrement parent group proc pointers in destruct
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-11 12:56:11 -07:00
Edgar Gabriel
0a1b735eed use the actual preadv and pwritev functions if available. That's what the fbtl interfaces have been designed for. 2016-01-07 08:29:17 -06:00
Gilles Gouaillardet
4c1ea4a171 dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept()
this commit includes missing bits from open-mpi/ompi@213b2abde4
2016-01-07 09:11:03 +09:00
Gilles Gouaillardet
213b2abde4 dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept() 2016-01-06 16:21:13 +09:00
Edgar Gabriel
1b0b849994 remove the MCA parameter setting the number of hosts in PLFS, since the plfs_setxattr function used is causing linking problems with PLFS 2.5
remove unused variables.
2016-01-05 11:13:23 -06:00
Edgar Gabriel
7861a8c357 revise the logic in the fbtl plfs avoiding the memcpy operation 2016-01-05 10:04:46 -06:00
Edgar Gabriel
da309ac962 - use a unique pid for each process as requested by the API
- sync the file before closing it
- use plfs_access() instead of access() before closing the file
2016-01-05 10:04:12 -06:00