1
1

24807 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
1fa236b26c Ensure that we exit with a non-zero status when oversubscribe fails 2016-04-14 05:51:10 -07:00
Nathan Hjelm
1e6b4f2f55 Merge pull request #1495 from hjelmn/new_hooks
Add new patcher memory hooks
2016-04-13 18:19:23 -06:00
Nathan Hjelm
c2b6fbb124 opal/memory: move initialization to first rcache creation
Because of the removal of the linux memory component it is no longer
necessary to initialize the memory component in opal_init(). This
commit moves the initialization to the creation of the first rcache
component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:21:46 -06:00
Nathan Hjelm
80ec79cfc8 memory/patcher: updates to memory hooks
This commit fixes bugs that can cause crashes and memory corruption
when the mremap hook is called. The problem occurs because of the
ellipses (...) in the mremap intercept function. The ellipses cover
the optional new_addr argument on Linux. This commit removes the
ellipses and adds an explicit 5th argument.

This commit also adds a hook for shmdt. The code only works on Linux
at the moment as it needs to read /proc/self/maps to determine the
size of the shared memory segment.

Additionally, this commit removes the mmap hook. There is no
apparent benefit for detecting mmap(..., PROT_NONE, ...) and it
seems to cause problems when threads are in use.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:20:24 -06:00
Nathan Hjelm
91bcab93cb opal/memory: remove ptmalloc2
This commit removes the ptmalloc2 memory hooks. This is necessary in
order to support lazy registration of memory hooks. A feature that is
not supported by the ptmalloc hooks but is supported by the new
patcher hooks.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:18:15 -06:00
Nathan Hjelm
27f8a4e806 opal: add code patcher framework
This commit adds a framework to abstract runtime code patching.
Components in the new framework can provide functions for either
patching a named function or a function pointer. The later
functionality is not being used but may provide a way to allow memory
hooks when dlopen functionality is disabled.

This commit adds two different flavors of code patching. The first is
provided by the overwrite component. This component overwrites the
first several instructions of the target function with code to jump to
the provided hook function. The hook is expected to provide the full
functionality of the hooked function.

The linux patcher component is based on the memory hooks in ucx. It
only works on linux and operates by overwriting function pointers in
the symbol table. In this case the hook is free to call the original
function using the function pointer returned by dlsym.

Both components restore the original functions when the patcher
framework closes.

Changes had to be made to support Power/PowerPC with the Linux
dynamic loader patcher. Some of the changes:

 - Move code necessary for powerpc/power support to the patcher
   base. The code is needed by both the overwrite and linux
   components.

 - Move patch structure down to base and move the patch list to
   mca_patcher_base_module_t. The structure has been modified to
   include a function pointer to the function that will unapply the
   patch. This allows the mixing of multiple different types of
   patches in the patch_list.

 - Update linux patching code to keep track of the matching between
   got entry and original (unpatched) address. This allows us to
   completely clean up the patch on finalize.

All patchers keep track of the changes they made so that they can be
reversed when the patcher framework is closed.

At this time there are bugs in the Linux dynamic loader patcher so
its priority is lower than the overwrite patcher.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:13 -06:00
Nathan Hjelm
b1670f844d contrib/platform: don't disable dlopen
The --enable-static gives us what we want: statically linked components.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:12 -06:00
Nathan Hjelm
4cac623aeb opal/patch: add call to check if binary patching is supported
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:12 -06:00
Nathan Hjelm
11e2d7886e opal/memory: update component structure
This commit makes it possible to set relative priorities for
components. Before the addition of the patched component there was
only one component that would run on any system but that is no longer
the case. When determining which component to open each component's
query function is called and the one that returns the highest priority
is opened. The default priority of the patcher component is set
slightly higher than the old ptmalloc2/ummunotify component.

This commit fixes a long-standing break in the abstration of the
memory components. ompi_mpi_init.c was referencing the linux malloc
hook initilize function to ensure the hooks are initialized for
libmpi.so. The abstraction break has been fixed by adding a memory
base function that calls the open memory component's malloc hook init
function if it has one. The code is not yet complete but is intended
to support ptmalloc in 2.0.0. In that case the base function will
always call the ptmalloc hook init if exists.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:14:51 -06:00
Nathan Hjelm
7aa03d66b3 opal/memory: add support for patch based memory hooks
This commit adds support for runtime binary patching. The support is
broken down into two parts: util/opal_patcher.[ch] which contains the
functionality for runtime patching of symbols, and mca/memory/patcher
which patches the various symbols needed to provide support for memory
hooks. This work is preliminary and is based off work donated by IBM.

The patcher code is disabled if dlopen is disabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:14:31 -06:00
rhc54
e500ea2c8c Merge pull request #1540 from rhc54/topic/map
Fix map-by node and do-not-launch
2016-04-13 10:29:28 -07:00
Ralph Castain
437f5b4289 Fix map-by node and do-not-launch 2016-04-13 09:21:19 -07:00
KAWASHIMA Takahiro
4944ba7edc datatype: Fix incorrect predefined datatype names and other datatype bugs (#1537)
* datatype: Fix a incorrect datatype name of `MPI_UNSIGNED`

Name of predefined datatype for C `unsigned int` gotten by
`MPI_TYPE_GET_NAME` should be `MPI_UNSIGNED`, not `MPI_UNSIGNED_INT`.

* datatype: Fix incorrect datatype names of `MPI_C_BOOL` and `MPI_CXX_*`

Names of predefined datatypes gotten by `MPI_TYPE_GET_NAME` are:

after this commit (correct) | before this commit (incorrect)
-----------------------------------------------------------
MPI_C_BOOL                    MPI_BOOL
MPI_CXX_BOOL                  MPI_BOOL
MPI_CXX_FLOAT_COMPLEX         MPI_C_FLOAT_COMPLEX
MPI_CXX_DOUBLE_COMPLEX        MPI_C_DOUBLE_COMPLEX
MPI_CXX_LONG_DOUBLE_COMPLEX   MPI_C_LONG_DOUBLE_COMPLEX

* datatype: Fix a incorrect datatype name of `MPI_2DOUBLE_PRECISION`

Name of the predefined datatype for Fortran two `double precision`
gotten by `MPI_TYPE_GET_NAME` should be `MPI_2DOUBLE_PRECISION`,
not `MPI_2DBLPREC`.

This bug was caused by setting the name to `opal_datatype_t::name`
instead of `ompi_datatype_t::name`.

* datatype: Fix `MPI_UNSIGNED_CHAR` internal flag

`MPI_UNSIGNED_CHAR` is an integer type.

* ompi/cxx: Fix C++ `MPI::LONG_DOUBLE_INT` definition

Just a typo fix. Without this fix, `MPI::MAX_LOC` and `MPI::MIN_LOC`
cannot be used with `MPI::LONG_DOUBLE_INT` in C++ programs.

I know the C++ binding is obsolete, but fixing this is harmless.

* Add FUJITSU copyright
2016-04-12 20:17:46 +02:00
bosilca
0ff6efce03 Merge pull request #1539 from hjelmn/sync_builtin
config: check for more __sync builtins
2016-04-12 19:12:58 +02:00
Nathan Hjelm
98ce659e0b config: check for more __sync builtins
This commit updates the check for __sync builtin atomics to see if the
compiler supports both __sync_bool_compare_and_swap and
__sync_add_and_fetch. If either of these functions are not available
then we can't use the __sync builtins.

Fixes #1487

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-12 10:02:39 -06:00
Jeff Squyres
8e988dff40 AUTHORS: Fix a typo
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-04-11 16:59:19 -04:00
Mike Dubman
4b3cf12fc8 Merge pull request #1533 from alinask/topic/mlnx-opt-rdmacm-ibaddr
mellanox/optimized: set enable_openib_rdmacm_ibaddr=yes in the mellanox/optimized file.
2016-04-11 18:58:11 +03:00
Alina Sklarevich
6cd7282631 mellanox/optimized: set enable_openib_rdmacm_ibaddr=yes in the mellanox/optimized file. 2016-04-11 18:01:16 +03:00
Gilles Gouaillardet
c72688e8cf Merge pull request #1362 from ggouaillardet/topic/openib_warn_default_gid_prefix
btl/openib: correctly issue a warning when two btls or more are in th…
2016-04-11 13:22:48 +09:00
Gilles Gouaillardet
4ab6c8ad56 mpool/hugepage: use statvfs() instead of statfs() when needed.
Thanks Siegmar Gross for the report.
2016-04-11 11:13:29 +09:00
Ralph Castain
4135cdf2c5 Add the 1.10.3 NEWS items 2016-04-08 08:39:15 -07:00
Ralph Castain
2432daf065 Some minor cleanups of a memory leak and error output 2016-04-08 07:46:18 -07:00
rhc54
5b8a40ad65 Merge pull request #1528 from hpcraink/pr/osx_sun_path
OSX tempdir too long for sun_path
2016-04-08 07:16:26 -07:00
Rainer Keller
ad690a4bc0 Move the help into the proper file: all orte_show_help in
orte/orted/pmix/pmix_server.c reference orterun.
2016-04-07 22:52:23 +02:00
Rainer Keller
52080a5736 As per the pull request to pmix/master:
https://github.com/pmix/master/pull/71

Have OMPI's current version of pmix120 nicely fail in case of
too long sun_path (longer than 108 or in case of OSX 103 chars).
And have OMPI return proper error messages with hints how to
amend.
2016-04-07 22:12:53 +02:00
George Bosilca
896f857fc4 Thanks @hjelmn for catching up the typo. 2016-04-07 13:56:26 -04:00
Thananon Patinyasakdikul
92290b94e0 Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN) 2016-04-07 12:52:17 -04:00
Ryan Grant
7cdf50533c Merge pull request #1314 from francois-wellenreiter/osc_disable_portals4_evt_send
OSC portals4 : do not generate an EVENT_SEND to avoid to filter it
2016-04-07 10:04:27 -06:00
Ralph Castain
3597a0834c Update license copyright year for Intel 2016-04-06 11:54:00 -07:00
Gilles Gouaillardet
7b803ac557 MPI_Unpack: fix error code when insize <= 0
this fixes a regression from open-mpi/ompi@f2e33c725f
2016-04-06 09:47:21 +09:00
rhc54
f858647779 Merge pull request #1522 from kmroz/wip-ompi-info-params-fix
opal_info_support: fix memory leak and refactor for pvars
2016-04-05 10:02:42 -07:00
Nathan Hjelm
444190093a Merge pull request #1516 from kmroz/wip-ompi-info-cleanup
opal_info_support: fix api comments
2016-04-05 07:31:33 -06:00
Nathan Hjelm
9efd465539 Merge pull request #1517 from hjelmn/ugni_fixes
Gemini/Aries bug fixes
2016-04-05 07:23:18 -06:00
George Bosilca
26fc8533f8 Remove compiler warnings. 2016-04-04 16:34:23 -04:00
Karol Mroz
13979f559f opal_info_support: refactor component output separation
Adding component name to the pvar pretty output as well.

Further, I think keeping the asprintf()/opal_info_out(msg,msg,------)
within the loops is needed to avoid printing any component information
(independently for group_vars and group_pvars) in case the upcoming
parameters are internal and not to be displayed.

Lastly, unnecessarily duplicating the dashed output should not happen as
each invocation of opal_info_show_mca_group_params() passes a new group
structure which we check.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-04 19:49:57 +02:00
rhc54
a95de6e8ef Merge pull request #1353 from rhc54/topic/host
Per the discussion on the telecon, change the -host behavior yet again
2016-04-04 10:30:36 -07:00
Karol Mroz
43254391a8 opal_info_support: fix memory leak
Fixing a memory leak I introduced with a3229c3a1f855ecf31b379705ce46937ef4c0654.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-04 18:11:11 +02:00
rhc54
74293bc235 Merge pull request #1519 from ggouaillardet/poc/oob_usock
Poc/oob usock
2016-04-04 06:43:51 -07:00
Gilles Gouaillardet
d757fbba5d oob/usock: drop message to be sent in process_send() 2016-04-04 16:04:54 +09:00
Gilles Gouaillardet
170734182b oob/usock: mca_oob_usock_peer_close() sets peer->sd = -1 after close()
so usock_peer_create_socket know it must re-create the socket
/* assuming it is ever supposed to occur */
also fix a typo (peer->sd >= 0) in usock_peer_create_socket
2016-04-04 16:02:05 +09:00
Gilles Gouaillardet
6f450630d8 pmix/external: fix misc missing conversion and type issues 2016-04-04 10:12:34 +09:00
Gilles Gouaillardet
2ede47c462 pmix: fix misc missing conversion and type issues 2016-04-04 10:12:34 +09:00
rhc54
a548232c6b Merge pull request #1518 from kmroz/wip-ompi-info-param-output-1
opal_info_support: add component to param pretty output
2016-04-03 06:39:00 -07:00
rhc54
d724d8a673 Merge pull request #1515 from kmroz/wip-ompi-info-all-2
opal_info_support: output component versions
2016-04-03 06:36:50 -07:00
Karol Mroz
e1eb23e7eb opal_info_support: separate parameter groups in pretty output
Use a dashed line to separate parameters based on component when pretty
printing.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-03 10:56:39 +02:00
Karol Mroz
a3229c3a1f opal_info_support: add component to param pretty output
When listing available parameters, add the component name to the MCA
framework field. Parsable option is already doing this, makes sense
for the pretty print option to do it as well.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-03 10:49:32 +02:00
Karol Mroz
20e448c7d8 opal_info_support: output component versions
When invoking, for example, `ompi_info` with:
   -a
   --params foo all
   --params foo bar
it's useful to have the appropriate components and their versions be
displayed, regardless of whether they have registered any parameters.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-02 21:21:40 +02:00
Karol Mroz
a468c3ba1a opal_info_support: pass component map when handling params
Pass component_map to opal_info_do_params(). It will be needed to output
component versions.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-02 21:17:44 +02:00
Nathan Hjelm
7e806791aa rcache/udreg: bug fixes
This commit fixes bugs that caused hangs or crashes when running out
of registration resources.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-02 12:24:20 -06:00
Nathan Hjelm
d7874920aa btl/ugni: set the frag reference count in the eager get path
This comit adds code that sets the fragment reg_cnt to 1 when sending
the completion message for an eager get. Without this the btl will
either hang or abort.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-02 12:10:22 -06:00