Commit Graph

4132 Commits

Author SHA1 Message Date
Nathan Hjelm
37e9e2c660 mca/base: fix typo in flag enumeration
This commit fixes a typo in flag enumeration that can cause the parser
to miss valid flags or crash.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-23 12:21:34 -06:00
Gilles Gouaillardet
d5a2ac6f2f btl/openib: fix #if vs #ifdef 2016-05-23 14:27:33 +09:00
Gilles Gouaillardet
5a8cbe5a8f btl/openib: remove obsolete reference to MEMORY_LINUX_MALLOC_ALIGN_ENABLED macro 2016-05-23 14:12:21 +09:00
Gilles Gouaillardet
8466a3daf3 pmix: update .gitignore
git ignore opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in
git rm opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in
git ignore opal/mca/pmix/pmix*/...
2016-05-23 11:58:07 +09:00
Nathan Hjelm
31bfeede82 bml/r2: always add btl progress function
This commit changes the behavior of bml/r2 from conditionally
registering btl progress functions to always registering progress
functions. Any progress function belonging to a btl that is not yet in
use is registered as low-priority. As soon as a proc is added that
will make use of the btl, it is re-registered normally.

This works around an issue with some btls. In order to progress a
first message from an unknown peer both ugni and openib need to have
their progress functions called. If either btl is not in use after the
first call to add_procs the callback was never happening. This commit
ensures the btl progress function is called at some point but the
number of progress callbacks is reduced from normal to ensure lower
overhead when a btl is not used. The current ratio is 1 low priority
progress callback for every 8 calls to opal_progress().

Fixes open-mpi/ompi#1676

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-21 15:54:04 -04:00
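The low-priority scheme described in 31bfeede82 can be sketched roughly as follows. This is a minimal illustration, not the bml/r2 or opal_progress() code: every name here (progress_fn_t, register_low_priority, register_normal, progress_once, demo_cb) is hypothetical, and only the 1-in-8 ratio is taken from the commit message.

```c
#include <stddef.h>

/* Hypothetical names for illustration; this is not the Open MPI API. */
typedef int (*progress_fn_t)(void);

#define MAX_CALLBACKS      16
#define LOW_PRIORITY_RATIO 8   /* ratio quoted in the commit message */

static progress_fn_t normal_cbs[MAX_CALLBACKS];
static progress_fn_t low_cbs[MAX_CALLBACKS];
static size_t n_normal, n_low;
static unsigned long n_calls;

/* Append fn to a callback list; returns 0 on success, -1 if the list is full. */
static int list_add(progress_fn_t *list, size_t *n, progress_fn_t fn)
{
    if (*n >= MAX_CALLBACKS) {
        return -1;
    }
    list[(*n)++] = fn;
    return 0;
}

/* Remove fn from a list if present (order is not preserved). */
static void list_remove(progress_fn_t *list, size_t *n, progress_fn_t fn)
{
    for (size_t i = 0; i < *n; ++i) {
        if (list[i] == fn) {
            list[i] = list[--(*n)];
            return;
        }
    }
}

/* A transport that is not yet in use registers here. */
int register_low_priority(progress_fn_t fn)
{
    return list_add(low_cbs, &n_low, fn);
}

/* Once a peer needs the transport, move it to the normal list. */
int register_normal(progress_fn_t fn)
{
    list_remove(low_cbs, &n_low, fn);
    return list_add(normal_cbs, &n_normal, fn);
}

/* One pass of the progress loop: normal callbacks run on every call,
 * low-priority callbacks only on every LOW_PRIORITY_RATIO-th call. */
int progress_once(void)
{
    int events = 0;
    for (size_t i = 0; i < n_normal; ++i) {
        events += normal_cbs[i]();
    }
    if (0 == (++n_calls % LOW_PRIORITY_RATIO)) {
        for (size_t i = 0; i < n_low; ++i) {
            events += low_cbs[i]();
        }
    }
    return events;
}

/* Example callback that counts how often it is invoked. */
static int demo_hits;
static int demo_cb(void)
{
    ++demo_hits;
    return 1;
}
```

With demo_cb registered as low-priority, 16 calls to progress_once() invoke it twice; after register_normal(demo_cb) it runs on every call.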
Ralph Castain
4e0749f03d Remove verbose error messages 2016-05-20 10:04:26 -07:00
Ralph Castain
42ecffb6d0 Move the registration of MCA params out of the init of the var system - put them in with the rest of the OPAL MCA param registrations
Take another shot at untangling the spaghetti

orterun: fix for command line parsing

orte-submit calls opal_init_util () before parsing out MCA command line
options (-mca, -am, etc). This prevents mpirun from setting opal MCA
variables for some frameworks as well as the MCA base. This is because
when a framework is opened all of its variables are set to read-only.
Eventually we want to lift this restriction on some MCA variables but
since -mca is affected we must parse out the MCA command line options
before opal_init_util(). This commit fixes the bug by adding a new
option to opal_cmd_line_parse (ignore unknown option) so orte-submit
can pre-parse the command line for MCA options.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>

Minor cleanups to avoid releasing/recreating the cmd line
2016-05-20 09:59:50 -07:00
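The pre-parse step described in 42ecffb6d0 can be illustrated with a small sketch. It does not use opal_cmd_line_parse or its real option handling; preparse_mca and mca_pair_t are hypothetical names, and the sketch assumes the simple form "-mca <name> <value>" (or "--mca"). Everything it does not recognize is skipped rather than reported as an error, which is the "ignore unknown option" idea from the commit message.

```c
#include <string.h>

/* Hypothetical types and names; not the Open MPI command line API. */
typedef struct {
    const char *name;
    const char *value;
} mca_pair_t;

/* Scan argv for "-mca <name> <value>" or "--mca <name> <value>".
 * Unknown options are skipped, so this can run before the full
 * parser knows about every option. Returns the number of pairs
 * stored in out (at most max_out). */
int preparse_mca(int argc, char **argv, mca_pair_t *out, int max_out)
{
    int found = 0;
    for (int i = 1; i < argc && found < max_out; ++i) {
        int is_mca = (0 == strcmp(argv[i], "-mca")) ||
                     (0 == strcmp(argv[i], "--mca"));
        if (is_mca && i + 2 < argc) {
            out[found].name  = argv[i + 1];
            out[found].value = argv[i + 2];
            ++found;
            i += 2;   /* consume the name and value */
        }
        /* anything else is left for the full parse that runs later */
    }
    return found;
}
```

As a simplification, this sketch scans the whole argument vector, including arguments that belong to the launched application; a real launcher would stop at the executable name.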
Gilles Gouaillardet
5ec1eedbae Merge pull request #1682 from ggouaillardet/topic/fix-ethtool-again
opal/util/ethtool: use system ethtool_cmd_speed when available
2016-05-20 10:30:43 +09:00
Gilles Gouaillardet
cbbdce05b1 pmix/pmix114: silence a warning 2016-05-20 09:35:26 +09:00
Gilles Gouaillardet
ed3fd1775f rcache/grdma: silence a warning 2016-05-20 09:30:29 +09:00
Gilles Gouaillardet
a01a5487a8 opal/util/ethtool: use system ethtool_cmd_speed when available
Refs: open-mpi/ompi#1679
2016-05-20 09:05:09 +09:00
rhc54
99d3c283f5 Merge pull request #1681 from rhc54/topic/pmixupdate
Update PMIx 114 to current release candidate
2016-05-19 13:50:16 -07:00
Ralph Castain
6f743f81b6 Update PMIx 114 to current release candidate 2016-05-19 12:55:05 -07:00
Jeff Squyres
87233aae49 ethtool: better handle portability
Be sure to handle the case where we don't have ethtool support at all.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-19 10:57:14 -07:00
Gilles Gouaillardet
fd93d236b1 opal/util/ethtool: fix compilation on older Linux when struct ethtool_cmd has no speed_hi field
Refs: open-mpi/ompi#1628
2016-05-19 11:58:04 +09:00
Jeff Squyres
66f53ec29a Merge pull request #1628 from kmroz/wip-btl-tcp-ethtool-speed
btl/tcp: autodetect bandwidth and latency if unset by the user
2016-05-18 12:12:55 -04:00
Nathan Hjelm
9371a6a52d Merge pull request #1673 from hjelmn/fix_rcache_deadlock
rcache: fix deadlock in multi-threaded environments
2016-05-18 08:32:21 -07:00
Karol Mroz
ca6ddf3270 btl/tcp: autodetect bandwidth and latency if unset
Fixes open-mpi/ompi#120

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-05-18 16:25:52 +02:00
Karol Mroz
b9c6c43c6b btl/tcp: add default defines for bandwidth and latency
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-05-18 16:25:52 +02:00
Karol Mroz
31e33a64f9 opal/util: add function to obtain interface speed
If kernel ethtool_cmd_speed() is not available, use copies if possible.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-05-18 16:25:51 +02:00
Nathan Hjelm
ab8ed177f5 rcache: fix deadlock in multi-threaded environments
This commit fixes several bugs in the registration cache code:

 - Fix a programming error in the grdma invalidation function that can
   cause an infinite loop if more than 100 registrations are
   associated with a munmapped region. This happens because the
   mca_rcache_base_vma_find_all function returns the same 100
   registrations on each call. This has been fixed by adding an
   iterate function to the vma tree interface.

 - Always obtain the vma lock when needed. This is required because
   there may be other threads in the system even if
   opal_using_threads() is false. Additionally, since it is safe to do
   so (the vma lock is recursive) the vma interface has been made
   thread safe.

 - Avoid calling free() while holding a lock. This avoids race
   conditions with locks held outside the Open MPI code.

Fixes open-mpi/ompi#1654.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-17 09:02:40 -06:00
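The first fix in ab8ed177f5 replaces a fixed-size lookup with an iterate function. The idea can be sketched as below, assuming a flat array in place of the vma tree; reg_t, reg_iterate and invalidate_cb are hypothetical names and do not reflect the actual mca_rcache_base_vma interface. Because the iterator visits each overlapping registration exactly once, it terminates no matter how many registrations cover the region, unlike a loop that repeatedly asks for "up to 100 matches" and keeps receiving the same ones.

```c
#include <stddef.h>

/* Hypothetical registration record; the real code uses a vma tree. */
typedef struct {
    size_t base;     /* first byte covered */
    size_t bound;    /* last byte covered  */
    int    invalid;  /* set once the registration is invalidated */
} reg_t;

typedef int (*reg_iter_fn_t)(reg_t *reg, void *ctx);

/* Call fn once for every registration overlapping [base, bound].
 * Stops early and returns fn's value if fn returns non-zero. */
int reg_iterate(reg_t *regs, size_t n, size_t base, size_t bound,
                reg_iter_fn_t fn, void *ctx)
{
    for (size_t i = 0; i < n; ++i) {
        if (regs[i].base <= bound && regs[i].bound >= base) {
            int rc = fn(&regs[i], ctx);
            if (0 != rc) {
                return rc;
            }
        }
    }
    return 0;
}

/* Example callback: mark the registration invalid and count the visit. */
static int invalidate_cb(reg_t *reg, void *ctx)
{
    reg->invalid = 1;
    ++*(size_t *)ctx;
    return 0;
}
```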
Nathan Hjelm
f6938868bd Merge pull request #1659 from hjelmn/sync_64
sync_builtin: check for 64-bit atomic support
2016-05-17 05:40:04 -07:00
rhc54
8b534e9897 Merge pull request #1668 from rhc54/topic/slurm
When direct launching applications, we must allow the MPI layer to pr…
2016-05-16 12:23:19 -07:00
Howard Pritchard
1a676e5b35 pmix/cray: fix some breakage
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-05-16 12:45:05 -05:00
Gilles Gouaillardet
4e21933a74 memory/patcher: declare __curbrk as extern in order not to generate an (uninitialized) common symbol 2016-05-16 09:30:11 +09:00
Ralph Castain
01ba861f2a When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provide non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread spin in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize CPU utilization.
Update external as well

Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro
2016-05-14 16:37:00 -07:00
Gilles Gouaillardet
456b73da69 btl/openib: fix error path in init_one_device()
do not explicitly release ib verbs components since they will
be released in the object destructor

Thanks Durga for the report
2016-05-13 09:03:48 +09:00
Jeff Squyres
30f913f217 Merge pull request #1652 from jsquyres/pr/remove-aix-timer
timer/aix: remove stale code
2016-05-10 15:47:02 -04:00
Jeff Squyres
eccf0ff4cd hwloc/external: set WRAPPER_EXTRA_* vars in proper location
WRAPPER_EXTRA flags are checked *before* the POST_CONFIG macro is
invoked.  So set them in the main CONFIG macro.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-10 07:34:56 -07:00
Josh Hursey
44d95cb610 Merge pull request #1657 from bgoglin/hwloc-for-2.0
configure: check the actual may_alias syntax that we use
2016-05-09 13:37:08 -05:00
Ralph Castain
7767882346 Per user request, add some missing data and definitions:
OPAL_PMIX_UNIV_RANK - synonym for OPAL_PMIX_GLOBAL_RANK
OPAL_PMIX_APP_SIZE - #ranks in the application of this proc
2016-05-09 08:39:01 -07:00
Nathan Hjelm
d99a9786b6 sync_builtin: check for 64-bit atomic support
This commit adds an additional check for 64-bit atomic support for __sync
builtins. If 64-bit support is not available the opal_atomic_*_64 atomics
are disabled.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-05-09 03:17:51 -06:00
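A check of the kind described in d99a9786b6 usually amounts to compiling and linking a small probe that applies the __sync builtins to a 64-bit integer. The sketch below is an assumption about what such a probe could look like, not the test added by the commit; probe_add64 and probe_cas64 are hypothetical names. It relies on the GCC/Clang __sync builtins; on targets without 64-bit support the probe is expected to fail to build, typically at link time, and the build system can then disable the 64-bit atomics.

```c
#include <stdint.h>

/* Hypothetical probe functions; a configure test would try to
 * compile and link code like this. */

/* Atomically add inc to *v and return the new value. */
int64_t probe_add64(int64_t *v, int64_t inc)
{
    return __sync_add_and_fetch(v, inc);
}

/* Atomically replace *v with newval if it equals oldval.
 * Returns non-zero when the swap happened. */
int probe_cas64(int64_t *v, int64_t oldval, int64_t newval)
{
    return __sync_bool_compare_and_swap(v, oldval, newval);
}
```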
Brice Goglin
6839d928c2 configure: check the actual may_alias syntax that we use
xlc 13.1.0 crashes because of our may_alias attributes in nolibxml.c
on Power7. libxml.c and nolibxml.c are the only may_alias users for now,
so change our configure check to match the actual code using it.

Thanks to Paul Hargrove for reporting and debugging the issue,
and providing the patch.

https://www.open-mpi.org/community/lists/devel/2016/05/18918.php

(cherry picked from open-mpi/hwloc@0ab7af5e90)
2016-05-08 22:22:30 +02:00
Ralph Castain
7594b95e4b Ensure the hwloc external header is included when --with-devel-headers is given 2016-05-08 10:18:14 -07:00
Jeff Squyres
acbd2c608d memory/patcher: check for <sys/syscall.h>
Thanks to Paul Hargrove for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-07 09:48:14 -07:00
Jeff Squyres
b4982d7725 timer/aix: remove stale code
Per discussion on the mailing list and with IBM, remove the AIX timer
code (since AIX is no longer supported).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-07 09:31:34 -07:00
Ralph Castain
7e5ef6a240 Fix the env_list support - the MCA param was being set way too early, so provide a "backdoor" way of providing the value 2016-05-06 15:38:39 -07:00
Ralph Castain
58dd41facf Repair the processing of cmd line options that mapped to MCA params. This was responsible for breaking things like map-by <foo>.
Remove debug, let orterun send terminate cmd to DVM

Recover the DVM support
2016-05-06 13:14:03 -07:00
Josh Hursey
35ae7e33d7 Merge pull request #1639 from jjhursey/topic/dl-open-null-fname
dl/dlopen/libltdl: Allow opal_dl_open to take a NULL filename.
2016-05-05 22:15:46 -05:00
Ralph Castain
8ec1891d11 Silence warning 2016-05-05 20:04:10 -07:00
Ralph Castain
08022d7af1 Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required. 2016-05-05 15:28:13 -07:00
Joshua Hursey
677178f206 dl/dlopen/libltdl: Allow opal_dl_open to take a NULL filename. 2016-05-05 17:07:26 -04:00
Nathan Hjelm
80f45925bc Merge pull request #1629 from hjelmn/new_hooks_update
New hooks update
2016-05-04 18:53:25 -06:00
Joshua Hursey
788cf1a9fe asm/powerpc: Fix empty colon list in asm for XL compiler on power
Thanks to Paul Hargrove for reporting the problem and submitting a patch.
 * https://www.open-mpi.org/community/lists/devel/2016/05/18886.php
2016-05-04 14:14:33 -05:00
Nathan Hjelm
ff2a54bd37 patcher/linux: code cleanup
Update based on cleanup made to the upstream version on OpenUCX.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:53:45 -06:00
Nathan Hjelm
6c9a0e1c55 patcher/overwrite: disable ia64 support for now
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:53:24 -06:00
Nathan Hjelm
6ad68da407 patcher/linux: disable the linux patcher component
This commit disables the linux patcher component due to a limitation
in loader patching. While this component is effective in patching
calls made within Open MPI and by the application it fails to hook
calls made within glibc. This means the munmap call made by free is
not correctly hooked. Until this problem can be resolved this
component will remain disabled. If it can't be resolved this component
should probably be removed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:50:51 -06:00
Nathan Hjelm
71be36d380 patcher: fix ppc32 support
The table of contents (TOC) code appears to apply only to
ppc64. The code was incorrectly assuming the existence of the TOC on
ppc32. This commit updates the necessary code to only apply to ppc64.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:50:32 -06:00
Nathan Hjelm
41f00b7465 memory/patcher: initialize patcher framework when needed
This commit moves the patcher framework initialization to the
memory/patcher component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:46:42 -06:00
Nathan Hjelm
0f54a95408 Merge pull request #1626 from hjelmn/vader_32
btl/vader: fix compilation on 32-bit systems
2016-05-03 16:39:46 -06:00