1
1

25110 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
fd93d236b1 opal/util/ethtool: fix compilation on older Linux when struct ethtool_cmd has no speed_hi field
Refs: open-mpi/ompi#1628
2016-05-19 11:58:04 +09:00
Nathan Hjelm
a679cc072b bml/r2: always add btl progress function
This commit changes the behavior of bml/r2 from conditionally
registering btl progress functions to always registering progress
functions. Any progress function beloning to a btl that is not yet in
use is registered as low-priority. As soon as a proc is added that
will make use of the btl is is re-registered normally.

This works around an issue with some btls. In order to progress a
first message from an unknown peer both ugni and openib need to have
their progress functions called. If either btl is not in use after the
first call to add_procs the callback was never happening. This commit
ensures the btl progress function is called at some point but the
number of progress callbacks is reduced from normal to ensure lower
overhead when a btl is not used. The current ratio is 1 low priority
progress callback for every 8 calls to opal_progress().

Fixes open-mpi/ompi#1676

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-18 16:00:31 -06:00
Jeff Squyres
66f53ec29a Merge pull request #1628 from kmroz/wip-btl-tcp-ethtool-speed
btl/tcp: autodetect bandwidth and latency if unset by the user
2016-05-18 12:12:55 -04:00
Nathan Hjelm
9371a6a52d Merge pull request #1673 from hjelmn/fix_rcache_deadlock
rcache: fix deadlock in multi-threaded environments
2016-05-18 08:32:21 -07:00
Karol Mroz
ca6ddf3270 btl/tcp: autodetect bandwidth and latency if unset
Fixes open-mpi/ompi#120

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-05-18 16:25:52 +02:00
Karol Mroz
b9c6c43c6b btl/tcp: add default defines for bandwidth and latency
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-05-18 16:25:52 +02:00
Karol Mroz
31e33a64f9 opal/util: add function to obtain interface speed
If kernel ethtool_cmd_speed() is not available, use copies if possible.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-05-18 16:25:51 +02:00
Nathan Hjelm
ab8ed177f5 rcache: fix deadlock in multi-threaded environments
This commit fixes several bugs in the registration cache code:

 - Fix a programming error in the grdma invalidation function that can
   cause an infinite loop if more than 100 registrations are
   associated with a munmapped region. This happens because the
   mca_rcache_base_vma_find_all function returns the same 100
   registrations on each call. This has been fixed by adding an
   iterate function to the vma tree interface.

 - Always obtain the vma lock when needed. This is required because
   there may be other threads in the system even if
   opal_using_threads() is false. Additionally, since it is safe to do
   so (the vma lock is recursive) the vma interface has been made
   thread safe.

 - Avoid calling free() while holding a lock. This avoids race
   conditions with locks held outside the Open MPI code.

Fixes open-mpi/ompi#1654.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-17 09:02:40 -06:00
Nathan Hjelm
f6938868bd Merge pull request #1659 from hjelmn/sync_64
sync_builtin: check for 64-bit atomic support
2016-05-17 05:40:04 -07:00
rhc54
8b534e9897 Merge pull request #1668 from rhc54/topic/slurm
When direct launching applications, we must allow the MPI layer to pr…
2016-05-16 12:23:19 -07:00
Howard Pritchard
9066d26c67 Merge pull request #1672 from hppritcha/topic/fix_cray_pmix_bust
pmix/cray: fix some breakage
2016-05-16 20:06:51 +01:00
Jeff Squyres
daa26f4578 Merge pull request #1671 from jsquyres/pr/fix-bml-opal-output-func-names
bml_r2: use __func__ to identify function names
2016-05-16 14:17:04 -04:00
Howard Pritchard
1a676e5b35 pmix/cray: fix some breakage
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-05-16 12:45:05 -05:00
Jeff Squyres
5275e5e2a1 bml_r2: use __func__ to identify function names
There were some old/stale function names in some debugging/verbose
opal_output calls.  Use __func__ instead, so that they won't become
stale in the future.

Thanks to Durga Choudhury for pointing out the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-16 11:06:47 -04:00
Gilles Gouaillardet
4e21933a74 memory/patcher: declare __curbrk as extern in order not to generate an (unitialized) common symbol 2016-05-16 09:30:11 +09:00
rhc54
e0fef390a8 Merge pull request #1670 from rhc54/topic/dashhost
In MPMD case, add slots given to each of the executables instead of overwriting
2016-05-15 13:51:54 -07:00
Ralph Castain
ca69403cc8 In MPMD case, add slots given to each of the executables instead of overwriting 2016-05-15 08:55:43 -07:00
Ralph Castain
01ba861f2a When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provide non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread sping in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize cpu utilization.
Update external as well

Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro
2016-05-14 16:37:00 -07:00
Aurélien Bouteiller
7f65c2b18e forgot to update copyright in commits 627a89b 4899c89 2016-05-13 11:34:59 -04:00
George Bosilca
37e03e3e5b Don't update req_bytes_received if no bytes were received. 2016-05-12 23:39:32 -04:00
Gilles Gouaillardet
456b73da69 btl/openib: fix error path in init_one_device()
do not explicitly release ib verbs components since they will
be released in the object destructor

Thanks Durga for the report
2016-05-13 09:03:48 +09:00
rhc54
4d026e223c Merge pull request #1661 from matcabral/master
PSM and PSM2 MTLs to detect drivers and link
2016-05-11 17:43:17 -07:00
Matias A Cabral
40717198c6 add glob.h check fro psm and psm2 mtls 2016-05-11 13:05:43 -07:00
George Bosilca
f8facb177d atomically update the refcount on the datatype args. 2016-05-11 12:40:18 -04:00
rhc54
055a3add73 Merge pull request #1667 from rhc54/topic/pmixext
Ensure we fail to configure if external PMIx was requested and is not found.
2016-05-11 08:56:16 -07:00
Ralph Castain
9a5ef60602 Ensure we fail to configure if external PMIx was requested and is not found. 2016-05-11 07:59:05 -07:00
Matias A Cabral
528abff6ae Merge remote-tracking branch 'upstream/master' 2016-05-10 15:42:08 -07:00
Jeff Squyres
30f913f217 Merge pull request #1652 from jsquyres/pr/remove-aix-timer
timer/aix: remove stale code
2016-05-10 15:47:02 -04:00
Jeff Squyres
5aa38cc300 Merge pull request #1644 from ggouaillardet/topic/two_proc_algo_error
coll/tuned: two_proc algo errors when forced on comm size > 2
2016-05-10 11:54:15 -04:00
Jeff Squyres
3706c955cc Merge pull request #1663 from jsquyres/pr/hwloc-wrapper-extra-flags-fix
hwloc/external: set WRAPPER_EXTRA_* vars in proper location
2016-05-10 11:51:33 -04:00
Jeff Squyres
eccf0ff4cd hwloc/external: set WRAPPER_EXTRA_* vars in proper location
WRAPPER_EXTRA flags are checked *before* the POST_CONFIG macro is
invoked.  So set them in the main CONFIG macro.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-10 07:34:56 -07:00
Nathan Hjelm
b2f33bc076 opal/asm: fall back on inline asm atomics in some cases
This commit changes the asm configure logic to fall back on inline asm
atomics on systems that 1) have __sync atomics, 2) do not have 64-bit
__sync atomics, and 3) support 64-bit asm.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-05-10 04:27:58 -06:00
Gilles Gouaillardet
ef3ee027b0 autogen: patch configure in order to correctly detect Solaris Studio 12.5 beta compilers
previously: f90: Sun Fortran 95 8.7 Linux_i386 2014/10/20
now : Studio 12.5 Fortran 95 8.8 Linux_i386 Beta 2015/11/17
note the "Sun" branding has gone

Thanks Paul Hargrove for the report and the patch
2016-05-10 16:28:33 +09:00
Matias A Cabral
d28ee62a96 Update in PSM and PSM2 MTLs to detect entries created by drivers for
Intel TrueScale and Intel OmniPath, and detect a link in ACTIVE state.
This fix addresses the scenario reported in the below OMPI users email,
including formerly named Qlogic IB, now Intel True scale. Given the
nature of the PSM/PSM2 mtls this fix applies to OmniPath:
https://www.open-mpi.org/community/lists/users/2016/04/29018.php
2016-05-09 12:08:44 -07:00
Josh Hursey
44d95cb610 Merge pull request #1657 from bgoglin/hwloc-for-2.0
configure: check the actual may_alias syntax that we use
2016-05-09 13:37:08 -05:00
Ralph Castain
7767882346 Per user request, add some missing data and definitions:
OPAL_PMIX_UNIV_RANK - synonym for OPAL_PMIX_GLOBAL_RANK
OPAL_PMIX_APP_SIZE - #ranks in the application of this proc
2016-05-09 08:39:01 -07:00
Nathan Hjelm
d99a9786b6 sync_builtin: check for 64-bit atomic support
This commit adds an additional check for 64-bit atomic support for __sync
builtins. If 64-bit support is not available the opal_atomic_*_64 atomics
are disabled.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-05-09 03:17:51 -06:00
Gilles Gouaillardet
0a19337371 coll/base: return MPI_ERR_UNSUPPORTED_OPERATION when coll_base_*_two_procs algo is used on a communicator that has no two tasks
Thanks Dave Love for the report
2016-05-09 14:18:40 +09:00
Gilles Gouaillardet
b159587325 io/romio: fix filesystem type check on OpenBSD 5.7
check the existence of the f_type field in struct statfs

Thanks Paul Hargrove for the report
2016-05-09 13:54:46 +09:00
Brice Goglin
6839d928c2 configure: check the actual may_alias syntax that we use
xlc 13.1.0 crashes because of our may_alias attributes in nolibxml.c
on Power7. libxml.c and nolibxml.c are the only may_alias users for now,
so change our configure check to match the actual code using it.

Thanks to Paul Hargrove for reporting and debugging the issue,
and providing the patch.

https://www.open-mpi.org/community/lists/devel/2016/05/18918.php

(cherry picked from open-mpi/hwloc@0ab7af5e90)
2016-05-08 22:22:30 +02:00
Ralph Castain
1911d74095 Prevent segfault when -debug given to mpirun 2016-05-08 10:19:05 -07:00
Ralph Castain
7594b95e4b Ensure the hwloc external header is include when --with-devel-headers is given 2016-05-08 10:18:14 -07:00
Nathan Hjelm
b110f76c8e Merge pull request #1653 from jsquyres/pr/fix-memory-patcher-sys-syscall-h
memory/patcher: check for <sys/syscall.h>
2016-05-08 04:03:17 -06:00
Jeff Squyres
acbd2c608d memory/patcher: check for <sys/syscall.h>
Thanks to Paul Hargrove for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-07 09:48:14 -07:00
Jeff Squyres
b4982d7725 timer/aix: remove stale code
Per discussion on the mailing list and with IBM, remove the AIX timer
code (since AIX is no longer supported).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-07 09:31:34 -07:00
rhc54
58de86e1a5 Merge pull request #1651 from rhc54/topic/op86
Remove stale component - I'm not going to get to it
2016-05-07 07:49:50 -07:00
Ralph Castain
6b24e2779b Remove stale component - I'm not going to get to it 2016-05-07 04:13:34 -07:00
rhc54
2839484737 Merge pull request #1649 from rhc54/topic/env
Fix the env_list support - the MCA param was being set way too early,…
2016-05-06 16:51:36 -07:00
Ralph Castain
7e5ef6a240 Fix the env_list support - the MCA param was being set way too early, so provide a "backdoor" way of providing the value 2016-05-06 15:38:39 -07:00
rhc54
6311f86939 Merge pull request #1648 from rhc54/topic/repair
Repair the processing of cmd line options that mapped to MCA params.
2016-05-06 14:39:23 -07:00