1
1
Граф коммитов

2693 Коммитов

Автор SHA1 Сообщение Дата
Brice Goglin
6839d928c2 configure: check the actual may_alias syntax that we use
xlc 13.1.0 crashes because of our may_alias attributes in nolibxml.c
on Power7. libxml.c and nolibxml.c are the only may_alias users for now,
so change our configure check to match the actual code using it.

Thanks to Paul Hargrove for reporting and debugging the issue,
and providing the patch.

https://www.open-mpi.org/community/lists/devel/2016/05/18918.php

(cherry picked from open-mpi/hwloc@0ab7af5e90)
2016-05-08 22:22:30 +02:00
Ralph Castain
7594b95e4b Ensure the hwloc external header is include when --with-devel-headers is given 2016-05-08 10:18:14 -07:00
Jeff Squyres
acbd2c608d memory/patcher: check for <sys/syscall.h>
Thanks to Paul Hargrove for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-07 09:48:14 -07:00
Jeff Squyres
b4982d7725 timer/aix: remove stale code
Per discussion on the mailing list and with IBM, remove the AIX timer
code (since AIX is no longer supported).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-07 09:31:34 -07:00
Ralph Castain
7e5ef6a240 Fix the env_list support - the MCA param was being set way too early, so provide a "backdoor" way of providing the value 2016-05-06 15:38:39 -07:00
Ralph Castain
58dd41facf Repair the processing of cmd line options that mapped to MCA params. This was responsible for breaking things like map-by <foo>.
Remove debug, let orterun send terminate cmd to DVM

Recover the DVM support
2016-05-06 13:14:03 -07:00
Josh Hursey
35ae7e33d7 Merge pull request #1639 from jjhursey/topic/dl-open-null-fname
dl/dlopen/libltdl: Allow opal_dl_open to take a NULL filename.
2016-05-05 22:15:46 -05:00
Ralph Castain
8ec1891d11 Silence warning 2016-05-05 20:04:10 -07:00
Ralph Castain
08022d7af1 Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required. 2016-05-05 15:28:13 -07:00
Joshua Hursey
677178f206 dl/dlopen/libltdl: Allow opal_dl_open to take a NULL filename. 2016-05-05 17:07:26 -04:00
Nathan Hjelm
ff2a54bd37 patcher/linux: code cleanup
Update based on cleanup made to the upstream version on OpenUCX.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:53:45 -06:00
Nathan Hjelm
6c9a0e1c55 patcher/overwrite: disable ia64 support for now
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:53:24 -06:00
Nathan Hjelm
6ad68da407 patcher/linux: disable the linux patcher component
This commit disables the linux patcher component due to a limitation
in loader patching. While this component is effective in patching
calls made within Open MPI and by the application it fails to hook
calls made within glibc. This means the munmap call made by free is
not correctly hooked. Until this problem can be resolved this
component will remain disabled. If it can't be resolved this component
should probably be removed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:50:51 -06:00
Nathan Hjelm
71be36d380 patcher: fix ppc32 support
The table of contents (TOC) code only appears to only apply to
ppc64. The code was incorrectly assuming the existence of the TOC on
ppc32. This commit updates the necessary code to only apply to ppc64.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:50:32 -06:00
Nathan Hjelm
41f00b7465 memory/patcher: initialize patcher framework when needed
This commit moves the patcher framework initialization to the
memory/patcher component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:46:42 -06:00
Nathan Hjelm
0f54a95408 Merge pull request #1626 from hjelmn/vader_32
btl/vader: fix compilation on 32-bit systems
2016-05-03 16:39:46 -06:00
Nathan Hjelm
4a740e9f27 Merge pull request #1619 from hjelmn/ext_verbs_fix
btl/openib: fix check for exp verbs struct members
2016-05-03 14:16:17 -06:00
Nathan Hjelm
e7ccbdee27 btl/vader: fix compilation on 32-bit systems
This commit fixes a compile/link issue caused by vader. The vader btl
was using OPAL_THREAD_ADD64 to increment a counter which may not be
available on 32-bit systems. Changed to use OPAL_THREAD_ADD_SIZE_T
which will be 64-bit or 32-bit depending on the system.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-03 10:14:44 -06:00
Nathan Hjelm
2d0e2b6233 patcher: do not clobber ebx
ebx can not be clobbered when using -fPIC so save and restore the
register instead of allowing it to be clobbered.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-05-03 08:24:33 -06:00
Brice Goglin
a2a721f961 linux: actually enable libudev based on the result of AC_CHECK_LIB
instead of doing AC_CHECK_HEADERS+AC_CHECK_LIB and only using the result of the former.

Thanks to Paul Hargrove for reporting the issue (OMPI build with -m32).

(cherry picked from open-mpi/hwloc@9549fd59af)
2016-05-03 10:00:40 +02:00
Nathan Hjelm
da695a6ce6 Merge pull request #1618 from hjelmn/new_hooks_update
More hook updates
2016-05-02 18:12:50 -06:00
Nathan Hjelm
a65af6d079 btl/openib: fix check for exp verbs struct members
This commit fixes a compilation issue with some versions of exp
verbs. In some cases struct ibv_exp_device_attr does not have either
the exp_atom or exp_atomic_cap fields. It is fine to drop one check
and fall back to the non-exp attribute check on the other.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 17:13:33 -06:00
Nathan Hjelm
1ff79656dd patcher: remove debug fprintf
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 17:11:00 -06:00
Nathan Hjelm
581e47c271 patcher: check for clflush
Add a feature check for clflush before trying to use the clflush
instruction. As far as I can tell there is no equivalent before the
SSE2 instruction set.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 17:10:42 -06:00
Nathan Hjelm
67fd6fa6eb Merge pull request #1615 from hjelmn/new_hooks_update
memory/patcher: add #if check for MREMAP_FIXED
2016-05-02 16:26:58 -06:00
Nathan Hjelm
eb14b34f04 memory/patcher: fix compilation on BSDs
The function signature of mremap on BSD (NetBSD, FreeBSD) differs from
the linux version. Added support for the BSD style of mremap.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 14:54:08 -06:00
Nathan Hjelm
52edb43bdc memory/patcher: check for linux/mman.h
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 14:29:46 -06:00
Nathan Hjelm
f8b3be6236 patcher/overwrite: fix ia64 compilation
Fixed a couple of typos in ia64 code.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 14:10:34 -06:00
Nathan Hjelm
14c34ae9f0 memory/patcher: add #if check for MREMAP_FIXED
This commit fixes a compile error when the system has mremap but not
MREMAP_FIXED. In this case we do not care about the value of
new_address as the argument does not exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 13:58:51 -06:00
rhc54
648043597a Merge pull request #1612 from ggouaillardet/poc/pmix_external_configury
pmix/external: revamp external pmix package detection
2016-05-02 09:46:05 -07:00
Jeff Squyres
265e5b9795 Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
Gilles Gouaillardet
45f9a47d77 pmix/external: fix typo and silence a warning 2016-05-02 17:15:52 +09:00
Gilles Gouaillardet
08d91b9a03 pmix/external: revamp external pmix package detection 2016-05-02 16:23:31 +09:00
Ralph Castain
6ac7929bd0 Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need.
Cleanups per @jjhursey review
2016-05-01 11:30:25 -07:00
George Bosilca
3445577f4c Avoid race conditions during BTP TCP handshake.
In some rare cases when a process receives the connect ack while
locally updating the peer endpoint structure, we could drop the
incomming connect ack due to the fact that the send handler is
protected with a try lock (on the endpoint) and our initial send
event was not persistent. Making the send event persistent solves
all issues.
2016-05-01 14:19:29 -04:00
George Bosilca
702f80ad7e Remove "signed vs. unsigned" warnings. 2016-05-01 11:45:48 -04:00
Ralph Castain
42d9d861fc Fix minor typo in PMIx packing of pmix_app_t - thanks to Gilles for pointing it out 2016-04-29 08:55:46 -07:00
Howard Pritchard
f52dd511d4 Merge pull request #1600 from hppritcha/topic/pmix_fix_for_finalize
pmix/cray: set fence_nb to NULL
2016-04-28 13:50:15 -06:00
hppritcha
aa1d7b9c50 pmix/cray: set fence_nb to NULL
Rather than have a stub function for the pmix fence_nb
operation, just set to NULL.  Causes fewer problems.

Fixes #1597
Fixes #1527

Signed-off-by: hppritcha <howardp@lanl.gov>
2016-04-28 13:48:54 -05:00
Nysal Jan K.A
18cf65dc24 Remove a stray print statement 2016-04-28 18:00:52 +05:30
Nathan Hjelm
03f4a854cb btl/tcp: fix add_procs race condition
This commit fixes a race between a thread calling the tcp btl's
add_procs and a thread processing an incomming connection. The race
occured because the add_procs thread adds a newly created proc object
to the hash table *before* the object is fully initialized. The
connection thread then attempts to use the object before the endpoints
array on the object has beeen allocation. The fix is to only add the
proc to the hash table after it has been completely initialized.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-27 10:24:39 -06:00
Nathan Hjelm
8f93b15e90 Merge pull request #1580 from hjelmn/new_hooks_update
memory/patcher: cast away const in shmdt hook
2016-04-26 17:48:01 -06:00
Nathan Hjelm
df194087c7 Merge pull request #1591 from hjelmn/rcache_update
rcache: fix leave_pinned failure path
2016-04-26 16:50:06 -06:00
Nathan Hjelm
25a97af695 rcache: fix leave_pinned failure path
This commit fixes an error in the failure path of leave_pinned. When
the rcache tries to enable leave_pinned but leave_pinned was not
specifically requested (opal_leave_pinned == -1) the code was
erroneously printing an error and returning NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-26 14:39:23 -06:00
Ralph Castain
02876564d4 Silence warning of zero-byte malloc 2016-04-26 11:55:59 -07:00
Nathan Hjelm
5612998d21 memory/patcher: cast away const in shmdt hook
The opal_mem_hooks_release_hook does not have const on the pointer
(though it probably should). This commit eliminates a warning by
casting away the const until opal_mem_hooks_release_hook is updated to
use const.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-25 15:32:11 -06:00
Jeff Squyres
78b367eb0d memory patcher: add some clarifying comments
This is complicated stuff: add some comments so that future
maintainers have some rationale to understand the way things have been
done.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-04-25 13:12:02 -07:00
Geoffrey Paulsen
55a15fb1d0 Missed one IBM Copyright message for contributions in memory patcher component 2016-04-25 15:47:15 -04:00
Geoffrey Paulsen
ed6f508735 Updated IBM Copyright message for contributions in memory patcher component. 2016-04-25 15:13:38 -04:00
Karol Mroz
e1c64e6e59 opal: standardize on max hostname length
Define OPAL_MAXHOSTNAMELEN to be either:
  (MAXHOSTNAMELEN + 1) or
  (limits.h:HOST_NAME_MAX + 1) or
  (255 + 1)

For pmix code, define above using PMIX_MAXHOSTNAMELEN.

Fixup opal layer to use the new max.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-24 08:19:47 +02:00
Jeff Squyres
dc18c32437 usnic: fix resource check
The math for checking the number of QPs and CQs per usNIC/VF was
incorrect, allowing you to run MPI processes even when usNICs (i.e.,
VIC VFs) had fewer QPs and CQs than were necessary.  This led to a
confusing error later when fi_enable(3) failed (because we lazily
create QPs).  Fixing the math here ensure that we actually print a
helpful error message telling the user specifically what is wrong.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-04-22 15:58:27 -07:00
Jeff Squyres
68c1a5eb6c Merge pull request #1567 from jsquyres/pr/fix-ompi-to-opal-name-conversion
m4: rename OMPI_SUMMARY_* macros to OPAL_SUMMARY_*
2016-04-20 13:10:06 -04:00
Jeff Squyres
6800ef9ec0 m4: rename OMPI_SUMMARY_* macros to OPAL_SUMMARY_*
These macros should really be named OPAL_SUMMARY_*; they're used in
all projects, and therefore should be in the lowest later project (OPAL).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-04-20 08:40:00 -07:00
Nathan Hjelm
db854c368a memory/patcher: do not hook madvise if the syscall doesn't exist
Fixes open-mpi/ompi#1565

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-20 09:18:31 -06:00
Nathan Hjelm
fdd1ff7c29 Merge pull request #1562 from hjelmn/opal_coverity
mca/base: fix coverity issue
2016-04-19 14:29:05 -06:00
Nathan Hjelm
3f15d442de Merge pull request #1561 from hjelmn/mpool_rewrite
rcache/base: add missing file to tarball
2016-04-19 12:00:06 -06:00
Nathan Hjelm
16c28399cd rcache/base: add missing file to tarball
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 11:03:38 -06:00
Nathan Hjelm
d981a9fc7d patcher/overwrite: fix compile error on x86
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 10:16:39 -06:00
Nathan Hjelm
5bc9d9d1f8 patcher/linux: fix compiler warnings
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 10:16:20 -06:00
Nathan Hjelm
1147fb3dd1 patcher/linux: ensure component is only enabled on Linux
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 10:16:00 -06:00
Nathan Hjelm
a8e90e8796 memory/patcher: munmap hook could be called from within a malloc() implementation
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-18 15:00:07 -06:00
Nathan Hjelm
7f271171f6 memory/patcher: fix coverity warning
Fix CID 1358512:  Error handling issues  (NEGATIVE_RETURNS):

C libraries usually handle read (-1, ...) fine but it is safer to
avoid calling read with a negative handle. Added negative file
descriptor check.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-18 11:29:20 -06:00
Gilles Gouaillardet
d96919638f pmix: remote autogenerated file and update .gitignore
removed: opal/mca/pmix/pmix114/pmix/src/include/private/autogen/config.h.in
2016-04-18 12:57:41 +09:00
Ralph Castain
b009e58d25 Roll to PMIx 1.1.4rc2 - replaces some code that was incorrectly removed in prior update 2016-04-16 18:24:24 -07:00
Ralph Castain
8ff114e668 Update to official PMIx 1.1.4rc1 2016-04-15 21:47:46 -07:00
Ralph Castain
449ec41532 Roll to PMIx 1.1.4rc1 and remove the PMIx 1.2.0 directory as the community has decided to not do that release version. This incorporates a number of bug fixes that have been identified and repaired in the PMIx and OMPI code bases. Also includes several minor corrections to the PMIx code so it now supports run-thru without hanging on collectives involving a process that exits 2016-04-15 10:11:11 -07:00
Nathan Hjelm
4d9c047f04 Merge pull request #1546 from hjelmn/mpool_rewrite
rcache: add missing file
2016-04-14 10:22:09 -06:00
Nathan Hjelm
9046424be5 rcache: add missing file
Fixes #1545

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-14 09:21:09 -06:00
Ralph Castain
4b3995dd27 Trivial change to silence warning 2016-04-14 05:54:02 -07:00
Nathan Hjelm
1e6b4f2f55 Merge pull request #1495 from hjelmn/new_hooks
Add new patcher memory hooks
2016-04-13 18:19:23 -06:00
Nathan Hjelm
c2b6fbb124 opal/memory: move initialization to first rcache creation
Because of the removal of the linux memory component it is no longer
necessary to initialize the memory component in opal_init(). This
commit moves the initialization to the creation of the first rcache
component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:21:46 -06:00
Nathan Hjelm
80ec79cfc8 memory/patcher: updates to memory hooks
This commit fixes bugs that can cause crashes and memory corruption
when the mremap hook is called. The problem occurs because of the
ellipses (...) in the mremap intercept function. The ellipses cover
the optional new_addr argument on Linux. This commit removes the
ellipses and adds an explicit 5th argument.

This commit also adds a hook for shmdt. The code only works on Linux
at the moment as it needs to read /proc/self/maps to determine the
size of the shared memory segment.

Additionally, this commit removes the mmap hook. There is no
apparent benefit for detecting mmap(..., PROT_NONE, ...) and it
seems to cause problems when threads are in use.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:20:24 -06:00
Nathan Hjelm
91bcab93cb opal/memory: remove ptmalloc2
This commit removes the ptmalloc2 memory hooks. This is necessary in
order to support lazy registration of memory hooks. A feature that is
not supported by the ptmalloc hooks but is supported by the new
patcher hooks.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:18:15 -06:00
Nathan Hjelm
27f8a4e806 opal: add code patcher framework
This commit adds a framework to abstract runtime code patching.
Components in the new framework can provide functions for either
patching a named function or a function pointer. The later
functionality is not being used but may provide a way to allow memory
hooks when dlopen functionality is disabled.

This commit adds two different flavors of code patching. The first is
provided by the overwrite component. This component overwrites the
first several instructions of the target function with code to jump to
the provided hook function. The hook is expected to provide the full
functionality of the hooked function.

The linux patcher component is based on the memory hooks in ucx. It
only works on linux and operates by overwriting function pointers in
the symbol table. In this case the hook is free to call the original
function using the function pointer returned by dlsym.

Both components restore the original functions when the patcher
framework closes.

Changes had to be made to support Power/PowerPC with the Linux
dynamic loader patcher. Some of the changes:

 - Move code necessary for powerpc/power support to the patcher
   base. The code is needed by both the overwrite and linux
   components.

 - Move patch structure down to base and move the patch list to
   mca_patcher_base_module_t. The structure has been modified to
   include a function pointer to the function that will unapply the
   patch. This allows the mixing of multiple different types of
   patches in the patch_list.

 - Update linux patching code to keep track of the matching between
   got entry and original (unpatched) address. This allows us to
   completely clean up the patch on finalize.

All patchers keep track of the changes they made so that they can be
reversed when the patcher framework is closed.

At this time there are bugs in the Linux dynamic loader patcher so
its priority is lower than the overwrite patcher.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:13 -06:00
Nathan Hjelm
4cac623aeb opal/patch: add call to check if binary patching is supported
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:12 -06:00
Nathan Hjelm
11e2d7886e opal/memory: update component structure
This commit makes it possible to set relative priorities for
components. Before the addition of the patched component there was
only one component that would run on any system but that is no longer
the case. When determining which component to open each component's
query function is called and the one that returns the highest priority
is opened. The default priority of the patcher component is set
slightly higher than the old ptmalloc2/ummunotify component.

This commit fixes a long-standing break in the abstration of the
memory components. ompi_mpi_init.c was referencing the linux malloc
hook initilize function to ensure the hooks are initialized for
libmpi.so. The abstraction break has been fixed by adding a memory
base function that calls the open memory component's malloc hook init
function if it has one. The code is not yet complete but is intended
to support ptmalloc in 2.0.0. In that case the base function will
always call the ptmalloc hook init if exists.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:14:51 -06:00
Nathan Hjelm
7aa03d66b3 opal/memory: add support for patch based memory hooks
This commit adds support for runtime binary patching. The support is
broken down into two parts: util/opal_patcher.[ch] which contains the
functionality for runtime patching of symbols, and mca/memory/patcher
which patches the various symbols needed to provide support for memory
hooks. This work is preliminary and is based off work donated by IBM.

The patcher code is disabled if dlopen is disabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:14:31 -06:00
Gilles Gouaillardet
c72688e8cf Merge pull request #1362 from ggouaillardet/topic/openib_warn_default_gid_prefix
btl/openib: correctly issue a warning when two btls or more are in th…
2016-04-11 13:22:48 +09:00
Gilles Gouaillardet
4ab6c8ad56 mpool/hugepage: use statvfs() instead of statfs() when needed.
Thanks Siegmar Gross for the report.
2016-04-11 11:13:29 +09:00
Ralph Castain
2432daf065 Some minor cleanups of a memory leak and error output 2016-04-08 07:46:18 -07:00
Rainer Keller
52080a5736 As per the pull request to pmix/master:
https://github.com/pmix/master/pull/71

Have OMPI's current version of pmix120 nicely fail in case of
too long sun_path (longer than 108 or in case of OSX 103 chars).
And have OMPI return proper error messages with hints how to
amend.
2016-04-07 22:12:53 +02:00
Thananon Patinyasakdikul
92290b94e0 Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN) 2016-04-07 12:52:17 -04:00
Nathan Hjelm
9efd465539 Merge pull request #1517 from hjelmn/ugni_fixes
Gemini/Aries bug fixes
2016-04-05 07:23:18 -06:00
George Bosilca
26fc8533f8 Remove compiler warnings. 2016-04-04 16:34:23 -04:00
Gilles Gouaillardet
6f450630d8 pmix/external: fix misc missing conversion and type issues 2016-04-04 10:12:34 +09:00
Gilles Gouaillardet
2ede47c462 pmix: fix misc missing conversion and type issues 2016-04-04 10:12:34 +09:00
Nathan Hjelm
7e806791aa rcache/udreg: bug fixes
This commit fixes bugs that caused hangs or crashes when running out
of registration resources.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-02 12:24:20 -06:00
Nathan Hjelm
d7874920aa btl/ugni: set the frag reference count in the eager get path
This comit adds code that sets the fragment reg_cnt to 1 when sending
the completion message for an eager get. Without this the btl will
either hang or abort.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-02 12:10:22 -06:00
Ralph Castain
2bfd6a92b0 Add missing include 2016-04-02 08:52:13 -07:00
Nathan Hjelm
6df5a663b7 mca/base: fix coverity issue
Fixes CID 1357979: Resource leak (RESOURCE_LEAK):

flags array was being leaked. Fixed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-30 12:49:37 -06:00
Ralph Castain
7eca2f9650 Add missing include 2016-03-30 01:34:01 -07:00
Gilles Gouaillardet
e852d85cc1 btl/tcp: add missing mca_btl_tcp_dump() subroutine 2016-03-30 16:10:15 +09:00
Ralph Castain
f70c5c495b Tsk...tsk...replace references to ompi values with opal 2016-03-29 13:35:43 -07:00
George Bosilca
d0165818b3 Initialize all common symbols. 2016-03-29 16:08:27 -04:00
Jeff Squyres
91c54d7a07 Merge pull request #1491 from ICLDisco/progress_thread
BTL TCP async progress
2016-03-29 06:26:10 -04:00
George Bosilca
f69eba1bc4 Update the copyright and cleanup the code.
Per @jsquyres suggestion remove all trailing spaces.
Credit to `sed -i.bak 's/ *$//' */[ch]`.
2016-03-28 14:41:01 -04:00
Thananon Patinyasakdikul
92062492b9 Enable Threading in the BTL TCP
Added mca parameter to turn progress thread on/off
Add a flag to check if we have btl progress thread.
Added macro for ob1 matching lock.
Update the AUTHORS file.
2016-03-28 14:41:01 -04:00
George Bosilca
32277db6ab Add support for async progress in the BTL TCP.
All BTL-only operations (basically all data movements
with the exception of the matching operation) can now
be handled for the TCP BTL by a progress thread.
2016-03-28 14:40:50 -04:00
Jeff Squyres
4a3c986a80 usnic: remove need for hwloc verbs helper
Haven't needed this for a while, but it got left in by accident.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-03-28 09:10:12 -07:00
Jeff Squyres
05e2423756 usnic: specify the cache name
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-03-28 09:02:52 -07:00
Gilles Gouaillardet
4a76f23f40 btl/openib: do not issue an error message if modex cannot retrieve openib info 2016-03-28 10:42:16 +09:00
Gilles Gouaillardet
861564df94 btl/openib: correctly issue a warning when two btls or more are in the default subnet gid
Thanks Matias Cabral for reporting this issue.

Fixes #1352
2016-03-28 09:17:29 +09:00
Nysal Jan K.A
75233573d1 pmix: Increment the reference count in PMIx_Init
The reference counting was broken which led PMIx_Finalize
to release resources early. This fixes the "use after free" scenarios
that I encountered.

(based on commit pmix/master@abfaa4c)
2016-03-27 04:11:25 -04:00
Nathan Hjelm
d6e90f24b1 Merge pull request #1483 from hjelmn/flag_enum_2
RFC: Add support for flag enumerators for MCA variables
2016-03-26 11:43:33 -06:00
Jeff Squyres
017f242b1b opal: remove some unused variables / compiler warnings
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-03-26 03:50:57 -07:00
Josh Hursey
099170bb31 Merge pull request #1496 from jjhursey/topic/pmix120-obj-patch
pmix/pmix120: Fix OBJ_ to PMIX_ symbol name
2016-03-25 19:35:43 -05:00
Ralph Castain
0b4310b186 Remove an unnecessary header that forced exposure of the PMIx internal headers 2016-03-25 16:57:41 -07:00
Ralph Castain
e8246e079b Minor cleanup to match the changes in the PMIx master 2016-03-25 15:12:41 -07:00
Joshua Hursey
8ebeaa5861 pmix/pmix120: Fix OBJ_ to PMIX_ symbol name 2016-03-25 16:17:08 -05:00
rhc54
ba8c8700aa Merge pull request #1493 from rhc54/topic/sing
Update singularity support to track changes in upstream Singularity code
2016-03-24 15:16:38 -07:00
Ralph Castain
8c14df2328 Revert "Modify singularity support per patch from Greg Kurtzer"
This reverts commit open-mpi/ompi@f7257a8310.

Ensure that we properly cleanup the session directory tree. Prior code had issues with symlinks, especially if the file that the link points to was already removed as we traverse the tree. Also found that the dirent checks for directory type weren't fully portable, and so fall back to the stat-based approach which is known to be portable.

Fix singularity singletons by detecting we are in a container and properly setting the pmix selection to pick the isolated component. Remove a stale restriction blocking use of the sm btl
2016-03-24 11:27:18 -07:00
Joshua Ladd
c2813a48e6 Merge pull request #1488 from alinask/topic/openib_enhance_verbose
btl/openib: enhance the verbosity level when using rdmacm without a first PP QP
2016-03-24 10:17:21 -04:00
George Bosilca
ab8008e9fe Nathan missed one reference to mpool. 2016-03-24 00:52:59 -04:00
Nathan Hjelm
478feaf622 btl/scif: update for mpool/rcache rewrite
This commit brings the scif btl up to date with changes made on master
to rework the mpool and rcache frameworks.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-23 16:30:10 -06:00
Alina Sklarevich
2e5a1372dd btl/openib: enhance the verbosity level when using rdmacm without a first PP QP. 2016-03-23 18:49:12 +02:00
Nathan Hjelm
7572c8b74f btl: use flag enumerator for btl_*_flags and btl_*_atomic_flags
This commit uses the new flag "enumerator" to support comma-delimited
lists of flags for both the btl and btl atomic flags. After this
commit is is valid to specify something like -mca btl_foo_flags
self,put,get,in-place. All non-deprecated flags are supported by the
enumerator.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-21 15:23:37 -06:00
Nathan Hjelm
b15a45088c mca: add support for flag enumerators
This commit adds a new type of enumerator meant to support flag
values. The enumerator parses comma-delimited strings and matches
each string or value to a list of valid flags. Additionally, the
enumerator does some basic checks to see if 1) a flag is valid in the
enumerator, and 2) if any conflicting flags are specified.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-21 15:20:56 -06:00
Nathan Hjelm
645bd9d9dd btl/openib: update rdmacm for dynamic add_procs
This commit adds the data necessesary for supporting dynamic add_procs
to the rdma message (opal_process_name_t). The endpoint lookup
function has been updated to match the code in udcm.

Closes open-mpi/ompi#1468.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-21 10:00:41 -06:00
Nathan Hjelm
4d4fa28f75 opal: fix coverity issues
Fix CID 1345825 (1 of 1): Dereference before null check (REVERSE_INULL):

ib_proc should not be NULL in this case. Removed the check and added a
check for NULL after OBJ_NEW.

CID 1269821 (1 of 1): Dereference null return value (NULL_RETURNS):

I labeled this one as a false positive (which it is) but the code in
question could stand be be cleaned up.

Fix CID 1356424 (1 of 1): Argument cannot be negative (NEGATIVE_RETURNS):

While trying to silence another Coverity issue another was
flagged. Protect the close of fd with if (fd >= 0).

CID 70772 (1 of 1): Dereference null return value (NULL_RETURNS):
CID 70773 (1 of 1): Dereference null return value (NULL_RETURNS):
CID 70774 (1 of 1): Dereference null return value (NULL_RETURNS):

None of these are errors and are intentional but now that we have a
list release function use that to make these go away. The cleanup is
similar to CID 1269821.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 15:56:08 -06:00
Nathan Hjelm
676a33bfff rcache/grdma: do not OBJ_RELEASE vma tree too early
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-17 11:31:41 -06:00
Nathan Hjelm
852cc8cfbc opal: fix various coverity errors
Fix CID 1356358:  Null pointer dereferences  (REVERSE_INULL):

flist->fl_mpool can no longer be NULL. Removed the conditional.

Fix CID 1356357:  Resource leaks  (RESOURCE_LEAK):

Added the call to free the hints array.

Fix CID 1356356:  Resource leaks  (RESOURCE_LEAK):

This is a false error but it is safe to call close (-1) so just always
call close.

Fix CID 1356354:  Control flow issues  (MISSING_BREAK):
Fix CID 1356353:  Control flow issues  (MISSING_BREAK):

Add comments that indicate the fall-through is intentional.

Fix CID 1356351:  Null pointer dereferences  (FORWARD_NULL):

Fix potential SEGV if the page_size key is malformed.

Fix CID 1356350:  Error handling issues  (CHECKED_RETURN):

Add (void) to indicate that we do not care about the return code of
sscanf in this case.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-17 10:05:57 -06:00
Mike Dubman
8f1838df4b Merge pull request #1459 from alinask/topic/openib_diff_subnets
btl/openib: enable connecting processes from different subnets.
2016-03-17 08:40:08 +02:00
Nathan Hjelm
cbce085b12 rcache/grdma: fix typo
This typo was originally fixed on the mpool_rewrite branch but the change
was lost.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-16 18:30:44 -06:00
Jeff Squyres
d44804f0c9 usnic: use version 1 of the API, not the current version 2016-03-16 16:03:51 -07:00
Jeff Squyres
e7ef711455 usnic: allow mpool_hints to be empty
Follow on to open-mpi/ompi@eac0b11
2016-03-16 15:04:39 -07:00
Alina Sklarevich
bbcbe3cacd btl/openib: enable connecting processes from different subnets.
+ Added an mca parameter to allow connecting processes from different
subnets. Its current default value is 'false' - don't allow, to keep the
current flow the way it is now.

+ rmdacm: when calling ibv_query_gid, use the gid index from
btl_openib_gid_index.
2016-03-16 10:52:06 +02:00
Gilles Gouaillardet
99809162b0 rcache: initialize common symbol mca_rcache_base_used_mem_hooks 2016-03-16 09:27:33 +09:00
Ralph Castain
beecf1b6eb Add missing include, remove unused vairable 2016-03-15 13:45:27 -07:00
Nathan Hjelm
eac0b110b8 btl/usnic: update for mpool/rcache rewrite
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-14 10:50:41 -06:00
Nathan Hjelm
522c2f2b82 rcache: add major/minor/release version macros
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-14 10:50:41 -06:00
Nathan Hjelm
69d9266497 Update memkind mpool for new mpool interface
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-14 10:50:41 -06:00
Vishwanath Venkatesan
8024142f46 The latest version of memkind uses jemalloc as a submodule.
This means we need not check for jemalloc in the configure script for
this component. Removing this.

In some machines having the TLS option on can cause errors in
opening this component. --disable-tls while configuring jemalloc.
Please look for instructions for installing jemalloc as a static
library linked directly into memkind in CONTRIBUTING file
github.com/memkind/memkindw
2016-03-14 10:50:41 -06:00
Vishwanath Venkatesan
3d98a1a01e Adding memkind component to use MPI_Alloc_mem through memkind 2016-03-14 10:50:41 -06:00
Nathan Hjelm
d4afb16f5a opal: rework mpool and rcache frameworks
This commit rewrites both the mpool and rcache frameworks. Summary of
changes:

 - Before this change a significant portion of the rcache
   functionality lived in mpool components. This meant that it was
   impossible to add a new memory pool to use with rdma networks
   (ugni, openib, etc) without duplicating the functionality of an
   existing mpool component. All the registration functionality has
   been removed from the mpool and placed in the rcache framework.

 - All registration cache mpools components (udreg, grdma, gpusm,
   rgpusm) have been changed to rcache components. rcaches are
   allocated and released in the same way mpool components were.

 - It is now valid to pass NULL as the resources argument when
   creating an rcache. At this time the gpusm and rgpusm components
   support this. All other rcache components require non-NULL
   resources.

 - A new mpool component has been added: hugepage. This component
   supports huge page allocations on linux.

 - Memory pools are now allocated using "hints". Each mpool component
   is queried with the hints and returns a priority. The current hints
   supported are NULL (uses posix_memalign/malloc), page_size=x (huge
   page mpool), and mpool=x.

 - The sm mpool has been moved to common/sm. This reflects that the sm
   mpool is specialized and not meant for any general
   allocations. This mpool may be moved back into the mpool framework
   if there is any objection.

 - The opal_free_list_init arguments have been updated. The unused0
   argument is not used to pass in the registration cache module. The
   mpool registration flags are now rcache registration flags.

 - All components have been updated to make use of the new framework
   interfaces.

As this commit makes significant changes to both the mpool and rcache
frameworks both versions have been bumped to 3.0.0.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-14 10:50:41 -06:00
Jeff Squyres
48c650c47a configury: minor updates to config summary output 2016-03-10 13:02:52 -08:00
Jeff Squyres
97714716ec usnic: add some cagent verification checks
Add primitive magic number and version checking in the connectivity
checker protocol.  These checks doesn't *guarantee* to we won't get
false PINGs and ACKs, but they do significantly reduce the possibility
of interpretating random incoming fragments as PINGs or ACKs.
2016-03-09 13:25:00 -08:00
Nathan Hjelm
f8469de832 Merge pull request #1415 from hjelmn/configure_summary
configure: add a summary section at the end of configure output
2016-03-09 12:25:39 -07:00
Nathan Hjelm
fdebebc4c0 Merge pull request #1439 from hjelmn/btl_openib_send_size
btl/openib: fix inconsistency in the default settings
2016-03-09 09:31:18 -07:00
rhc54
007ca2ae08 Merge pull request #1442 from rhc54/topic/cleanup
Cleanup a debug statement. Plug a memory leak
2016-03-08 20:42:39 -08:00
Ralph Castain
af1444b6e1 Cleanup a debug statement. Plug a memory leak 2016-03-08 18:27:55 -08:00
Jeff Squyres
584b80147d usnic: set MCA_BTL_FLAGS_SINGLE_ADD_PROCS
The btl_recv.h:lookup_sender() function uses the hashed ORTE proc name
to determine the sender of the packet.  With add_procs_cutoff>0, the
usnic BTL may not have knowledge of all the senders.

Until the usNIC BTL can be adjusted to do something like the
openib/ugni BTLs (i.e., use opal_proc_for_name() to lookup unknown
sender proc names), set MCA_BTL_FLAGS_SINGLE_ADD_PROCS, which means
that ob1 will only all add_procs() once -- with all the procs in it.

Also in this commit, adapt the connectivity checker to not rely on
knowing all the senders (which is a bit easier than adapting the main
BTL send path): the receiving connectivity agent will simply echo back
the same PING message (which contains the sender's IP address+UDP
port) back to the sender without checking that it knows who the sender
is.  If the sender receives the echoed PING back on the expexted
interface, it will find a match in the pending pings list.  If the
sender receives the echoed PING back an unexpected interface, a match
will not be found, and the incoming PING message will be dropped.

Fixes open-mpi/ompi#1440
2016-03-08 17:41:42 -08:00
Jeff Squyres
4975fdcd5c usnic: allow connect(2) to fail temporarily
When connecting the connectivity checker client to its agent fails
with ECONNREFUSED, just delay a little and try again a few more times.
2016-03-08 15:35:34 -08:00
Nathan Hjelm
2ef2763f72 btl/openib: fix inconsistency in the default settings
This commit fixes an inconsistency between btl_openib_receive_queues,
btl_openib_max_send_size and btl_openib_eager_limit. Before this
commit if the ini file specified a set of default receive queues that
happen to not contain one large enough for the default max_send_size
of eager_limit users would see an error like:

   WARNING: The largest queue pair buffer size specified in the
   btl_openib_receive_queues MCA parameter is smaller than the maximum
   send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
   that no queue is large enough to receive the largest possible incoming
   message fragment.  The OpenFabrics (openib) BTL will therefore be
   deactivated for this run.

     Local host: somehost
     Largest buffer size: 65536
     Maximum send fragment size: 131072

This commit adds code that detects the source of the max_send_size and
eager_limit values and sets either or both of them to the size
supported by the largest queue pair if both 1) the value is larger
than the largest queue pair size, and 2) the value was not set by the
user or a MCA configuration file.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-08 16:17:33 -07:00
Nathan Hjelm
d2f5fca82a configure: add a summary section at the end of configure output
This commit adds two m4 macros: OPAL_SUMMARY_ADD, OPAL_SUMMARY_PRINT.
OPAL_SUMMARY_ADD adds an item to a section in the summary. For example
OPAL_SUMMARY_ADD([[Transports]],[[Foo]],...,[yes]) will add the
following to the summary:

Transports
-----------------------
Foo: yes

With this commit two sections are added: Transports, Resource Managers.

The OPAL_SUMMARY_PRINT macro is called after AC_OUTPUT and prints out
some information about the build (version, projects, etc) and then
the summarys sections. It will additionally print a warning if
internal debugging is enabled.

Example output:

Open MPI configuration:
-----------------------
Version: 3.0.0 a1
Build Open Platform Abstration project: yes
Build Open Runtime project: yes
Build Open MPI project: yes
Build Open SHMEM project: no
MPI C++ bindings (deprecated): no
MPI Fortran bindings: mpif.h, use mpi, use mpi_f08
Debug build: yes

Transports
-----------------------
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
KNEM Shared Memory: no
Linux CMA IPC: no
Mellanox MXM: no
Open UCX: no
OpenFabrics libfabric: no
OpenFabrics Verbs: no
portals4: no
QLogic Infinipath (PSM): no
tcp: yes
XPMEM Shared Memory: no

Resource Managers
-----------------------
Cray Alps: no
Grid Engine: no
LSF: no
Slurm: yes
Torque: yes

INTERNAL DEBUGGING IS ENABLED. DO NOT USE THIS BUILD FOR PERFORMANCE MEASUREMENTS!

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-03-08 10:04:15 -07:00
Ralph Castain
bac6290b22 Ensure the process name is positive when using direct launch
Fixes #1425
2016-03-08 08:31:05 -08:00
Ralph Castain
4d0cc27eb7 Update the singularity support to match that of the latest singularity master. Remove the restriction on shared memory components by instructing singularity to not isolate the PID space. Add a new schizo API to allow setting up the original app_context. Ensure the container is installed prior to execution. 2016-03-05 21:47:42 -08:00
Ralph Castain
b57a191ccc Update the external client to the new PMIx init/finalize signatures 2016-03-03 20:50:20 -08:00
igor-ivanov
5e9fdabdbb Merge pull request #1420 from ggouaillardet/topic/memory_linux_memalign_enum
memory/linux: make memory_linux_memalign an enum
2016-03-03 21:21:55 +04:00
rhc54
d38e2e6655 Merge pull request #1423 from rhc54/topic/suicide
Fix registration of error handlers thru the pmix120 component.
2016-03-02 17:43:06 -08:00
Gilles Gouaillardet
5c685e2332 memory/linux: make memory_linux_memalign an enum
Thanks Igor Ivanov for the review.
2016-03-03 08:38:46 +09:00