1
1
Граф коммитов

8967 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
8445c885ce pml/cm: update for request changes
This fixes a hang caused by the request refactor work. The cm pml was
not updated and was hanging is most cases.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:35:32 -06:00
Nathan Hjelm
ef11ba9394 request: fix compilation error
The request.h header is unfortunately included files in the C++
bindings. C++ does not allow assigning from void * to another
pointer without a cast. This commit adds the cast. We can clean this
up when the C++ bindings are deleted.

Fixes open-mpi/ompi#1707

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 09:52:23 -06:00
Valentin Petrov
5ff6372886 coll/hcoll: bugfix: initialize req_type field
If left uninitialized then segfault is possible in MPI_Waitall in
    the case the field by chance equals OMPI_REQUEST_GEN.
2016-05-25 15:38:01 +03:00
George Bosilca
2b868c4952 Fix MPI datatype args.
Compensate for the datatype ID that we add to the array.
2016-05-24 23:36:54 -04:00
bosilca
b90c83840f Refactor the request completion (#1422)
* Remodel the request.
Added the wait sync primitive and integrate it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).

* Fix the condition to release the request.
2016-05-24 18:20:51 -05:00
Nathan Hjelm
5126da5377 win: add support for accumulate_ordering info key
This commit adds support for the MPI-3.1 accumulate_ordering info
key. The default value is rar,war,raw,waw and is supported using an
MCA variable flag enumerator.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-24 11:13:30 -06:00
Jeff Squyres
e7d46b96a3 Merge pull request #1680 from yburette/topic/fix_provider_selection
mtl/ofi: Change default provider selection behavior.
2016-05-23 15:06:02 -04:00
Gilles Gouaillardet
bca44592af Merge pull request #1643 from ggouaillardet/topic/romio_openbsd57
io/romio: fix filesystem type check on OpenBSD
2016-05-23 16:33:56 +09:00
George Bosilca
16d9f71d01 Correctly compute the space needed for the args.
Add checks to bail out if our precomputed value is less
than needed (we are already at fault).

bot:milestone:v1.10.3
bot:milestone:v2.0
bot🏷️bug
bot:assign: @ggouaillardet
2016-05-21 16:01:16 -04:00
George Bosilca
0641005dab Only check the parameters on valid dimensions. 2016-05-21 15:54:04 -04:00
George Bosilca
6aac0d9c22 Remove useless output stream. 2016-05-21 15:54:04 -04:00
Nathan Hjelm
31bfeede82 bml/r2: always add btl progress function
This commit changes the behavior of bml/r2 from conditionally
registering btl progress functions to always registering progress
functions. Any progress function beloning to a btl that is not yet in
use is registered as low-priority. As soon as a proc is added that
will make use of the btl is is re-registered normally.

This works around an issue with some btls. In order to progress a
first message from an unknown peer both ugni and openib need to have
their progress functions called. If either btl is not in use after the
first call to add_procs the callback was never happening. This commit
ensures the btl progress function is called at some point but the
number of progress callbacks is reduced from normal to ensure lower
overhead when a btl is not used. The current ratio is 1 low priority
progress callback for every 8 calls to opal_progress().

Fixes open-mpi/ompi#1676

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-21 15:54:04 -04:00
yohann
2f0cde791a mtl/ofi: Change default provider selection behavior.
As more providers get added to libfabric, the default exclude list would need
to be updated.
Instead, we choose to include only the providers known to work by default.

New default:
  - include: psm,psm2,gni
  - exclude: none
2016-05-19 10:59:25 -07:00
Ralph Castain
a35bb8453a Unlock the mutex prior to destructing it.
Thanks to Nicolas Joly for the report
2016-05-19 10:36:58 -07:00
rhc54
8b534e9897 Merge pull request #1668 from rhc54/topic/slurm
When direct launching applications, we must allow the MPI layer to pr…
2016-05-16 12:23:19 -07:00
Jeff Squyres
5275e5e2a1 bml_r2: use __func__ to identify function names
There were some old/stale function names in some debugging/verbose
opal_output calls.  Use __func__ instead, so that they won't become
stale in the future.

Thanks to Durga Choudhury for pointing out the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-16 11:06:47 -04:00
Ralph Castain
01ba861f2a When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provide non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread sping in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize cpu utilization.
Update external as well

Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro
2016-05-14 16:37:00 -07:00
Aurélien Bouteiller
7f65c2b18e forgot to update copyright in commits 627a89b 4899c89 2016-05-13 11:34:59 -04:00
George Bosilca
37e03e3e5b Don't update req_bytes_received if no bytes were received. 2016-05-12 23:39:32 -04:00
rhc54
4d026e223c Merge pull request #1661 from matcabral/master
PSM and PSM2 MTLs to detect drivers and link
2016-05-11 17:43:17 -07:00
George Bosilca
f8facb177d atomically update the refcount on the datatype args. 2016-05-11 12:40:18 -04:00
Matias A Cabral
528abff6ae Merge remote-tracking branch 'upstream/master' 2016-05-10 15:42:08 -07:00
Matias A Cabral
d28ee62a96 Update in PSM and PSM2 MTLs to detect entries created by drivers for
Intel TrueScale and Intel OmniPath, and detect a link in ACTIVE state.
This fix addresses the scenario reported in the below OMPI users email,
including formerly named Qlogic IB, now Intel True scale. Given the
nature of the PSM/PSM2 mtls this fix applies to OmniPath:
https://www.open-mpi.org/community/lists/users/2016/04/29018.php
2016-05-09 12:08:44 -07:00
Gilles Gouaillardet
0a19337371 coll/base: return MPI_ERR_UNSUPPORTED_OPERATION when coll_base_*_two_procs algo is used on a communicator that has no two tasks
Thanks Dave Love for the report
2016-05-09 14:18:40 +09:00
Gilles Gouaillardet
b159587325 io/romio: fix filesystem type check on OpenBSD 5.7
check the existence of the f_type field in struct statfs

Thanks Paul Hargrove for the report
2016-05-09 13:54:46 +09:00
Ralph Castain
6b24e2779b Remove stale component - I'm not going to get to it 2016-05-07 04:13:34 -07:00
Edgar Gabriel
def1b95fd7 Merge pull request #1646 from edgargabriel/getview-preallocate-fixes
io/ompio: file_getview and file_preallocate fixes
2016-05-06 11:46:00 -05:00
Edgar Gabriel
e65e189671 io/ompio: fix file size after file_preallocate
Thanks for @dalcini for reporting
Fixes open-mpi/ompi#1633
2016-05-06 08:20:59 -05:00
Edgar Gabriel
d358965134 io/ompio: fix envelope of datatype returned by getview
Thanks for @dalcini for reporting
Fixes open-mpi/ompi#1632
2016-05-06 08:19:48 -05:00
Edgar Gabriel
7c92acaa78 Merge pull request #1637 from edgargabriel/pr/netbsd-compilation-problems
fs/lustre and fs/pvfs2: fix netbsd compilation problems
2016-05-06 08:05:36 -05:00
Jeff Squyres
810db734c4 Merge pull request #1640 from jsquyres/pr/mpir-cleanup
debuggers: remove some useless code
2016-05-05 21:23:30 -04:00
Gilles Gouaillardet
6c9d65c0ca coll/libnbc: fix MPI_Ireduce_scatter_block for one task communicator
Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#248
2016-05-06 09:43:29 +09:00
Ralph Castain
08022d7af1 Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required. 2016-05-05 15:28:13 -07:00
Jeff Squyres
83c2d04aa3 debuggers: remove some useless code
MPIR-1.0 specifies that the following symbols are only relevant in the
starter process:

- MPIR_Breakpoint
- MPIR_being_debugged
- MPIR_debug_state
- MPIR_debug_abort_string

I.e., the code filling in values in these various symbols was useless
/ never used.

MPIR-1.1 will define that MPIR_being_debugged *is* relevant in MPI
processes.  That symbol is currently defined in libopen-rte (which is
currently causing a duplicate symbol error for static builds -- this
commit fixes that error), and is therefore still available for MPI
processes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-05 14:22:55 -07:00
Jeff Squyres
f167be1c91 ompio: always return valid info from FILE_GET_INFO
MPI-3.1 says that even if no info keys are set on the file, we need to
return a new, empty info.

Thanks to Lisandro Dalcin for identifying the issue.

Fixes open-mpi/ompi#1630

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-05 12:03:29 -07:00
Aurélien Bouteiller
4899c89731 Fix a race condition when multiple threads try to create a bml endpoint simultaneously. 2016-05-05 10:49:30 -04:00
Aurélien Bouteiller
627a89bf71 Fix a race condition when multiple threads do the "first send" to an endpoint simultaneously. 2016-05-05 09:04:10 -04:00
Joshua Ladd
4771c9ece6 Merge pull request #1617 from jladd-mlnx/topic/disable-hcoll-barrier-in-finalize-ompi-trunk
HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla
2016-05-04 10:12:34 -04:00
Aurélien Bouteiller
8344d00418 use-mpi extensions do not have a .la lib, so the fortran module should not depend on them. 2016-05-03 11:54:35 -04:00
Edgar Gabriel
78fa8bb2c4 remove some unused variables that can cause compilation problems on netbsd 2016-05-03 10:25:15 -05:00
Todd Kordenbrock
3498bed650 Merge pull request #1555 from shawone/check_reduce_ret
coll-portals4: check return value from reduce kary tree functions
2016-05-03 10:17:23 -05:00
Jeff Squyres
33dd8ca81e osc_rdma_peer: properly include ompi_config.h
Thanks to Paul Hargrove for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-03 07:39:55 -07:00
Devendar Bureddy
cafd55f18c HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla
tear down

HCOLL barrier may not complete if HCOLL progress is not called periodically.
which is the case in HCOLL teardown progress in the finalize.
(cherry picked from commit 793244d75dd94d1d5e0243bcccf6d04318750f3f)
2016-05-03 00:49:57 +03:00
Nathan Hjelm
d3d779f6d9 osc/rdma: clear all_sync object when obtaining a lock
This commit fixes a bad synchronization detection bug that occurs when
mixing MPI_Win_fence() and MPI_Win_lock(). If no communication has
occurred in the fence epoch it is safe to just clear the all_sync
object (it was set up by fence).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 15:28:47 -06:00
Jeff Squyres
265e5b9795 Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
Ralph Castain
6ac7929bd0 Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need.
Cleanups per @jjhursey review
2016-05-01 11:30:25 -07:00
George Bosilca
6e6ed62a3c Allow NULL arrays for emoty datatypes.
When building an empty datatype (aka. size = 0) because the count of
included datatypes is 0, be less strict on what the arguments are
(allow NULL pointers).
2016-05-01 12:37:02 -04:00
Nathan Hjelm
ec66a6a1f8 Merge pull request #1605 from hjelmn/rdma_fixes
osc/rdma: fix global index array calculation
2016-04-28 20:41:36 -06:00
Nathan Hjelm
7bda3eb2dc osc/rdma: fix global index array calculation
This commit fixes a bug that occurs when ranks are either not mapped
evenly or by something other than core.

Fixes open-mpi/ompi#1599

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-28 19:11:11 -06:00
Nathan Hjelm
1783d94f91 ompi/group: fix sparse group proc reference counting
This commit fixes a bug when sparse groups are in use. Since sparse
group do not actually increment the reference counts of any procs
(they just retain the parent group) it is wrong to decrement the
reference counts of all procs in the group using
ompi_group_decrement_proc_count(). This commit makes the call to
ompi_group_decrement_proc_count() conditional on the group being
dense.

Fixes open-mpi/ompi#1593

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-27 15:55:13 -06:00