1
1

9050 Коммитов

Автор SHA1 Сообщение Дата
Aurélien Bouteiller
7f65c2b18e forgot to update copyright in commits 627a89b 4899c89 2016-05-13 11:34:59 -04:00
George Bosilca
37e03e3e5b Don't update req_bytes_received if no bytes were received. 2016-05-12 23:39:32 -04:00
rhc54
4d026e223c Merge pull request #1661 from matcabral/master
PSM and PSM2 MTLs to detect drivers and link
2016-05-11 17:43:17 -07:00
George Bosilca
f8facb177d atomically update the refcount on the datatype args. 2016-05-11 12:40:18 -04:00
Matias A Cabral
528abff6ae Merge remote-tracking branch 'upstream/master' 2016-05-10 15:42:08 -07:00
Matias A Cabral
d28ee62a96 Update in PSM and PSM2 MTLs to detect entries created by drivers for
Intel TrueScale and Intel OmniPath, and detect a link in ACTIVE state.
This fix addresses the scenario reported in the below OMPI users email,
including formerly named Qlogic IB, now Intel True scale. Given the
nature of the PSM/PSM2 mtls this fix applies to OmniPath:
https://www.open-mpi.org/community/lists/users/2016/04/29018.php
2016-05-09 12:08:44 -07:00
Gilles Gouaillardet
0a19337371 coll/base: return MPI_ERR_UNSUPPORTED_OPERATION when coll_base_*_two_procs algo is used on a communicator that has no two tasks
Thanks Dave Love for the report
2016-05-09 14:18:40 +09:00
Gilles Gouaillardet
b159587325 io/romio: fix filesystem type check on OpenBSD 5.7
check the existence of the f_type field in struct statfs

Thanks Paul Hargrove for the report
2016-05-09 13:54:46 +09:00
Ralph Castain
6b24e2779b Remove stale component - I'm not going to get to it 2016-05-07 04:13:34 -07:00
Edgar Gabriel
def1b95fd7 Merge pull request #1646 from edgargabriel/getview-preallocate-fixes
io/ompio: file_getview and file_preallocate fixes
2016-05-06 11:46:00 -05:00
Edgar Gabriel
e65e189671 io/ompio: fix file size after file_preallocate
Thanks for @dalcini for reporting
Fixes open-mpi/ompi#1633
2016-05-06 08:20:59 -05:00
Edgar Gabriel
d358965134 io/ompio: fix envelope of datatype returned by getview
Thanks for @dalcini for reporting
Fixes open-mpi/ompi#1632
2016-05-06 08:19:48 -05:00
Edgar Gabriel
7c92acaa78 Merge pull request #1637 from edgargabriel/pr/netbsd-compilation-problems
fs/lustre and fs/pvfs2: fix netbsd compilation problems
2016-05-06 08:05:36 -05:00
Jeff Squyres
810db734c4 Merge pull request #1640 from jsquyres/pr/mpir-cleanup
debuggers: remove some useless code
2016-05-05 21:23:30 -04:00
Gilles Gouaillardet
6c9d65c0ca coll/libnbc: fix MPI_Ireduce_scatter_block for one task communicator
Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#248
2016-05-06 09:43:29 +09:00
Ralph Castain
08022d7af1 Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required. 2016-05-05 15:28:13 -07:00
Jeff Squyres
83c2d04aa3 debuggers: remove some useless code
MPIR-1.0 specifies that the following symbols are only relevant in the
starter process:

- MPIR_Breakpoint
- MPIR_being_debugged
- MPIR_debug_state
- MPIR_debug_abort_string

I.e., the code filling in values in these various symbols was useless
/ never used.

MPIR-1.1 will define that MPIR_being_debugged *is* relevant in MPI
processes.  That symbol is currently defined in libopen-rte (which is
currently causing a duplicate symbol error for static builds -- this
commit fixes that error), and is therefore still available for MPI
processes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-05 14:22:55 -07:00
Jeff Squyres
f167be1c91 ompio: always return valid info from FILE_GET_INFO
MPI-3.1 says that even if no info keys are set on the file, we need to
return a new, empty info.

Thanks to Lisandro Dalcin for identifying the issue.

Fixes open-mpi/ompi#1630

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-05 12:03:29 -07:00
Aurélien Bouteiller
4899c89731 Fix a race condition when multiple threads try to create a bml endpoint simultaneously. 2016-05-05 10:49:30 -04:00
Aurélien Bouteiller
627a89bf71 Fix a race condition when multiple threads do the "first send" to an endpoint simultaneously. 2016-05-05 09:04:10 -04:00
Joshua Ladd
4771c9ece6 Merge pull request #1617 from jladd-mlnx/topic/disable-hcoll-barrier-in-finalize-ompi-trunk
HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla
2016-05-04 10:12:34 -04:00
Aurélien Bouteiller
8344d00418 use-mpi extensions do not have a .la lib, so the fortran module should not depend on them. 2016-05-03 11:54:35 -04:00
Edgar Gabriel
78fa8bb2c4 remove some unused variables that can cause compilation problems on netbsd 2016-05-03 10:25:15 -05:00
Todd Kordenbrock
3498bed650 Merge pull request #1555 from shawone/check_reduce_ret
coll-portals4: check return value from reduce kary tree functions
2016-05-03 10:17:23 -05:00
Jeff Squyres
33dd8ca81e osc_rdma_peer: properly include ompi_config.h
Thanks to Paul Hargrove for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-03 07:39:55 -07:00
Devendar Bureddy
cafd55f18c HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla
tear down

HCOLL barrier may not complete if HCOLL progress is not called periodically.
which is the case in HCOLL teardown progress in the finalize.
(cherry picked from commit 793244d75dd94d1d5e0243bcccf6d04318750f3f)
2016-05-03 00:49:57 +03:00
Nathan Hjelm
d3d779f6d9 osc/rdma: clear all_sync object when obtaining a lock
This commit fixes a bad synchronization detection bug that occurs when
mixing MPI_Win_fence() and MPI_Win_lock(). If no communication has
occurred in the fence epoch it is safe to just clear the all_sync
object (it was set up by fence).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 15:28:47 -06:00
Jeff Squyres
265e5b9795 Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
Ralph Castain
6ac7929bd0 Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need.
Cleanups per @jjhursey review
2016-05-01 11:30:25 -07:00
George Bosilca
6e6ed62a3c Allow NULL arrays for emoty datatypes.
When building an empty datatype (aka. size = 0) because the count of
included datatypes is 0, be less strict on what the arguments are
(allow NULL pointers).
2016-05-01 12:37:02 -04:00
Nathan Hjelm
ec66a6a1f8 Merge pull request #1605 from hjelmn/rdma_fixes
osc/rdma: fix global index array calculation
2016-04-28 20:41:36 -06:00
Nathan Hjelm
7bda3eb2dc osc/rdma: fix global index array calculation
This commit fixes a bug that occurs when ranks are either not mapped
evenly or by something other than core.

Fixes open-mpi/ompi#1599

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-28 19:11:11 -06:00
Nathan Hjelm
1783d94f91 ompi/group: fix sparse group proc reference counting
This commit fixes a bug when sparse groups are in use. Since sparse
group do not actually increment the reference counts of any procs
(they just retain the parent group) it is wrong to decrement the
reference counts of all procs in the group using
ompi_group_decrement_proc_count(). This commit makes the call to
ompi_group_decrement_proc_count() conditional on the group being
dense.

Fixes open-mpi/ompi#1593

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-27 15:55:13 -06:00
Gilles Gouaillardet
01c90d4e71 fortran/mpif-h: fix *_create_keyval_f
correctly handle out parameter *_keyval when OMPI_SIZEOF_FORTRAN_INTEGER > SIZEOF_INT
2016-04-27 13:34:32 +09:00
Gilles Gouaillardet
178dde6a20 fortran/mpif-h: fix MPI_Win_shared_query
correctly handle out parameter disp_unit when OMPI_SIZEOF_FORTRAN_INTEGER > SIZEOF_INT
2016-04-27 11:22:09 +09:00
Gilles Gouaillardet
7f59d2a8c7 fortran/mpif-h: fix MPI_Win_free_keyval
initialize inout parameter when OMPI_SIZEOF_FORTRAN_INTEGER > SIZEOF_INT
2016-04-27 10:46:14 +09:00
Nathan Hjelm
f0f3383006 Merge pull request #1590 from hjelmn/thread_multiple
osc/pt2pt: do not drop/reacquire the ompi_request_lock
2016-04-26 16:48:37 -06:00
Nathan Hjelm
34ff6293bd osc/pt2pt: do not drop/reacquire the ompi_request_lock
This lock is now recursive so it is safe to call into the pml without
dropping the lock.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-26 14:19:38 -06:00
George Bosilca
bf190671e9 Make the request lock recursive.
If during the request completion callback we post another request that
completes right away (such a small send or a match for an unexpected
short message) we will try to complete the second request while holding
the lock for the completion of the first. For performance reasons
(mainly to avoid unlocking and locking the request mutex several times)
we have made the request lock recursive.
2016-04-26 16:16:07 -04:00
Nathan Hjelm
1e4daa2a0e mpi_init: move opal_set_using_threads() earlier in MPI_Init()
There is a potential race condition in MPI_Init() where an orte even
thread could be in a function that uses OPAL_THREAD_LOCK /
OPAL_THREAD_UNLOCK when ompi_mpi_init calls opal_set_using_threads().

Closes open-mpi/ompi#1586

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-26 13:02:42 -06:00
Nathan Hjelm
c16e639b2f Merge pull request #1563 from hjelmn/ompi_coverity
ompi coverity fixes
2016-04-26 09:17:48 -06:00
Jeff Squyres
8ab88f2051 ompi_mpi_finalize: add/update comments
This is a follow-on to open-mpi/ompi@7373111: add some comments
explaining why the code is the way it is.  Also update a previous
comment.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-04-25 13:42:30 -07:00
Ralph Castain
7373111662 Somehow, the logic for finalize got lost, so restore it here. If pmix.fence_nb is available, then call it and cycle opal_progress until complete. If pmix.fence_nb is not available, then do an MPI_Barrier and call pmix.fence.
Needs to go over to 2.x
2016-04-25 08:04:35 -07:00
Karol Mroz
3322347da9 ompi: fixup hostname max length usage
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-25 07:08:23 +02:00
Nathan Hjelm
ae0ffbb67f Merge pull request #1397 from hjelmn/enable_thread_multiple
ompi: always enable MPI_THREAD_MULTIPLE support
2016-04-23 08:40:22 -06:00
Joshua Ladd
0d5a57d9d3 Merge pull request #1558 from vspetrov/hcoll_complex_dtype_support
Adds mapping to hcoll complex data type
2016-04-20 08:35:33 -04:00
Gilles Gouaillardet
490b538ad6 ompi/datatype: fix MPI_LONG_LONG_INT type name
MPI_LONG_LONG_INT is a named predefined datatype, so its name is now MPI_LONG_LONG_INT
MPI_LONG_LONG is a synonym of MPI_LONG_LONG_INT, and its name is also MPI_LONG_LONG_INT
2016-04-20 09:34:20 +09:00
Nathan Hjelm
1ff3d3b16b pml/ob1: fix coverity issue
Fix CID 1357978 (1 of 1): Logically dead code (DEADCODE):

Remove duplicate check for NULL == endpoint.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 14:48:13 -06:00
Nathan Hjelm
70533e6d50 fcoll/static: fix coverity issues
Fix CID 72362: Explicit null dereferenced (FORWARD_NULL)

From what I can tell the code @ fcoll_static_file_read_all.c:649
should be setting bytes_per_process[i] to 0 not bytes_per_process.

Fix CID 72361: Explicit null dereferenced (FORWARD_NULL)

Modified check to check for blocklen_per_process non-NULL before
trying to free blocklen_per_process[l]. This is sufficient because
free (NULL) is safe. Also cleaned up the initialization of this an a
couple other arrays. They were allocated with malloc() then
initialized to 0. Changed to used calloc().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 14:48:13 -06:00
Nathan Hjelm
8871bdb2f8 fcoll/two_phase: fix coverity issues
Fix CID 72296: Resource leak (RESOURCE_LEAK):

Changed code to goto exit instead of returning to ensure memory is
freed.

Fix CID 712589: Out-of-bounds read (OVERRUN):

In this loop i and j are identical and always less than
iov_count. The CID was triggered because i was incremented if i was <
iov_count. This meant that if the loop did go on the next iteration
would access an invalid index.

Fix CID 741363: Uninitialized scalar variable (UNINIT):

Allocate tmp_len with calloc to insure every index is initialized.

Fix CID 741364: Uninitialized pointer read (UNINIT):

Allocate recv_types with calloc to ensure all indices are always
initialized. Also added a check to not loop and destroy if recv_types
is NULL.

Also added a NULL check on the allocation of decoded iov. This is not
the cause of CID 126784 but should be fixed.

Fix CID 712588: Out-of-bounds read (OVERRUN):

Similar to CID 712589. Should silence the issue.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 14:47:41 -06:00