Commit graph

3538 commits

Author SHA1 Message Date
Jeff Squyres
097b48d521 mca_base_component_respository.c: fix compiler warning
This function is only used in the DL case -- it can be #if'ed out if
we're not compiling with DL support to avoid a compiler warning about
defined-but-not-used.
2015-06-17 08:54:59 -07:00
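A minimal sketch of the pattern described in this commit message, not the actual Open MPI code. The feature macro HAVE_DL_SUPPORT and both function names are hypothetical; the point is only that a static helper used in one configuration is compiled in that configuration alone.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature macro; a real build would set it from configure. */
#ifndef HAVE_DL_SUPPORT
#define HAVE_DL_SUPPORT 1
#endif

#if HAVE_DL_SUPPORT
/* Compiled only when DL support is on, so a build without it never
 * sees a defined-but-not-used static function. */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name);
    size_t s = strlen(suffix);
    return n >= s && 0 == strcmp(name + n - s, suffix);
}
#endif

/* Returns 1 if the file name looks like a loadable component, else 0. */
int is_loadable_component(const char *filename)
{
#if HAVE_DL_SUPPORT
    return has_suffix(filename, ".so");
#else
    (void) filename;   /* nothing can be loaded without DL support */
    return 0;
#endif
}
```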
Jeff Squyres
dfa36197ea usnic/Makefile.am: ensure static builds include -lfabric 2015-06-17 08:15:29 -07:00
Gilles Gouaillardet
2cef2d0fe6 opal/memory: silence a warning
as reported by Coverity with CID 71663
2015-06-17 11:17:55 +09:00
Gilles Gouaillardet
58d1b3f4d0 opal_os_dirpath_create: fix TOCTOU
as reported by Coverity with CID 70396
2015-06-17 11:17:54 +09:00
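For readers unfamiliar with TOCTOU (time-of-check to time-of-use) races in directory creation: one common way to avoid them is to skip the "does it exist?" check and let mkdir() report the outcome. The sketch below illustrates that general idea with a hypothetical helper name; it is not necessarily how opal_os_dirpath_create was changed.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create a directory without a separate existence check beforehand.
 * Returns 0 if the directory exists afterwards, -1 otherwise. */
int create_dir_no_race(const char *path, mode_t mode)
{
    if (0 == mkdir(path, mode)) {
        return 0;                       /* we created it */
    }
    if (EEXIST == errno) {
        /* Something already has that name; accept it only if it is a
         * directory. */
        struct stat st;
        if (0 == stat(path, &st) && S_ISDIR(st.st_mode)) {
            return 0;
        }
    }
    return -1;
}
```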
Gilles Gouaillardet
de66447ebb opal_cmd_line_get_usage_msg: silence warning
as reported by Coverity with CID 1269967
2015-06-17 11:17:54 +09:00
Gilles Gouaillardet
f2f66e6e63 opal_daemon_init: silence warning
as reported by Coverity with CID 710642
2015-06-17 11:17:53 +09:00
Gilles Gouaillardet
8427e87ee9 opal_argv_delete: silence warning
as reported by Coverity with CID 71914
2015-06-17 11:17:53 +09:00
Gilles Gouaillardet
d9c490cf9f refactor opal_bitmap_get_string
make it more efficient and fix CID 71992 (dead code)
2015-06-17 11:17:53 +09:00
Jeff Squyres
44e7646de9 usnic/configure.m4: convert to use external libfabric
Use the new OPAL_CHECK_LIBFABRIC macro.
2015-06-15 15:17:06 -07:00
Jeff Squyres
3e1b85ceb3 libfabric: remove embedded libfabric
OMPI now only builds against external libfabric installations.
2015-06-15 15:17:05 -07:00
Jeff Squyres
c74ab51dd4 opal/mca/dl/dl.h: fix the #ifndef/#define name
Thanks to Scott Atchley for noticing the name mismatch.
2015-06-15 13:08:57 -07:00
rhc54
adbff46a13 Merge pull request #642 from rhc54/topic/hwloc
Update hwloc to 1.11.0
2015-06-13 12:09:58 -07:00
Ralph Castain
ff92781ec4 Replace hwloc191 with hwloc1110
Fix hwloc compile. Ignore LAMA mapper due to deprecated hwloc functions
2015-06-13 10:11:45 -07:00
Jeff Squyres
4384131e65 openib: minor style and defensive programming fixes
Minor comment/whitespace fixes.  Also some minor logic changes that
are mainly for defensive programming purposes (i.e., ensure to always
set malloc_hook_set to true or false, and then check it before we try
to actually invoke it).
2015-06-12 20:11:47 -07:00
Jeff Squyres
2f137ff151 openib: reset memalign threshold properly
Now that open-mpi/ompi#638 is fixed, reset the openib BTL memalign
threshold properly.

This effectively re-instates commit
open-mpi/ompi@ce915b5757.
2015-06-12 20:11:47 -07:00
Jeff Squyres
88c13adc8c openib: only set the memory hook if it is enabled
Instead of unconditionally setting the memory hook, only set it when
the memory hooks are both available and have been enabled (e.g.,
opal/mca/memory/linux has decided that it *can* be enabled, and when
the mpi_leave_pinned MCA param is set to 1, or is set to -1 and some
component requested the memory hooks be enabled).

If we set the memory hook when memory hooks are not enabled,
__malloc_hook will be NULL, which will cause problems when
btl_openib_malloc_hook() tries to invoke it.

Fixes open-mpi/ompi#638.
2015-06-12 20:11:47 -07:00
Ralph Castain
12d3c9ca22 Revert "Fix a typo that incorrectly set the alignment threshold in the openib BTL."
This reverts commit ce915b5757.
2015-06-10 14:02:49 -07:00
Gilles Gouaillardet
8885b34637 mca/base: fix a misc memory leak
as reported by Coverity with CID 1294415
2015-06-10 15:10:57 +09:00
Gilles Gouaillardet
9e278a21ce opal/crs: fix a string overflow
and revamp out of resource handling
fixes resource leak as reported by Coverity with CID 1304752
2015-06-10 14:23:25 +09:00
Nathan Hjelm
6772d32b85 opal/crs: silence clang warnings introduced by coverity fixes
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-06-08 09:16:13 -06:00
Gilles Gouaillardet
bcdb2d1380 add missing #include
sscanf requires stdio.h
fixes commit open-mpi/ompi@6ca57724c4
2015-06-08 09:13:11 +09:00
Jeff Squyres
4b59be4e4c btl tcp: cosmetic changes and updates
No logic changes.

Update some stale/incorrect comments, fix some indenting and style.
2015-06-06 10:17:20 -07:00
Jeff Squyres
d164fe9bc5 opal_params.c: fix typo in comment 2015-06-06 10:17:20 -07:00
Jeff Squyres
0acec2b676 opal/util/net.c: remove stale comment
Also wrap a long "if" statement -- but make no code logic changes.
2015-06-06 10:17:20 -07:00
Jeff Squyres
6ca57724c4 opal/util/net.c: remove superfluous #include 2015-06-06 10:17:20 -07:00
Jeff Squyres
cddc8945e0 btl_tcp_proc.c: add missing "continue"
Also add another (superfluous but symmetric) continue statement.

This missing "continue" statement allows IPv4 "private network"
matches to fall through and allow IPv6 matches to be made -- thereby
overriding the IPv4 match that was already made.

Fixes #585 (although several of the other issues identified on #585
still exist, the primary / initial bug that was reported there is now
fixed).
2015-06-06 10:17:12 -07:00
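The effect of a missing "continue" can be shown with a simplified loop. The types and names below are hypothetical and much simpler than the matching logic in btl_tcp_proc.c; the sketch only shows how an earlier match is protected from being overwritten by a later test in the same iteration.

```c
enum match_kind { MATCH_NONE, MATCH_IPV4_PRIVATE, MATCH_IPV6 };

/* Hypothetical candidate pair: flags say which kinds of match are possible. */
struct candidate {
    int ipv4_private_ok;
    int ipv6_ok;
};

/* Classify each candidate, preferring the IPv4 private-network match. */
void classify(const struct candidate *c, int n, enum match_kind *out)
{
    for (int i = 0; i < n; ++i) {
        out[i] = MATCH_NONE;
        if (c[i].ipv4_private_ok) {
            out[i] = MATCH_IPV4_PRIVATE;
            continue;   /* without this, the IPv6 test below would run
                           and could override the match just made */
        }
        if (c[i].ipv6_ok) {
            out[i] = MATCH_IPV6;
            continue;   /* superfluous, kept for symmetry */
        }
    }
}
```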
Ralph Castain
d9f23627fd Add in hwloc 1.11.0rc1 - will overwrite with final version 2015-06-04 15:35:56 -07:00
Rolf vandeVaart
8622b34664 Check for GPU Direct RDMA and leave pinned turned off 2015-06-04 14:25:24 -04:00
Nathan Hjelm
f72b6d45c7 crs/none: fix coverity issues
CID 1301389 Resource leak (RESOURCE_LEAK)

There is no conceivable reason to strdup cr_argv[0] in either
location. Removed the calls to strdup.

CID 741357 Resource leak (RESOURCE_LEAK)

cr_argv was created by opal_argv_split (tmp_argv[0], ' '). Why should
we call opal_argv_join (' ') on this array? Leak fixed by printing out
tmp_argv[0] instead of calling opal_argv_join.

CID 741358 Resource leak (RESOURCE_LEAK)

The code does not handle exec failure correctly. The error should be
communicated to the parent process but the function in question is
only called by the parent. This calls into question some of the
structure of the function in general (like what is the point of
returning the child process id). That said, I will go ahead and add
the opal_argv_free to quiet this error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-06-01 16:00:51 -06:00
Nathan Hjelm
7e34997746 event/libevent2022: fix coverity issue
CID 1269841 Out-of-bounds access (OVERRUN)

Correct issue. If the string being concatenated fills the remaining
buffer then a \0 is written past the end of the string. In practice
this should never happen but it should be fixed. I re-organized the
code a bit to clear this error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-06-01 15:38:54 -06:00
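The overrun described here is the classic off-by-one in bounded string concatenation: the terminating \0 needs its own byte. A minimal sketch of a bounded append that accounts for it follows; the helper name is hypothetical and this is not the libevent code.

```c
#include <string.h>

/* Append src to the NUL-terminated string in dst, whose total capacity
 * is dst_size bytes (including the terminator).
 * Returns 0 on success, -1 if the result would not fit. */
int append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    /* The remaining space must hold src plus the terminating NUL, so
     * src has to be strictly shorter than what is left. */
    if (used >= dst_size || need >= dst_size - used) {
        return -1;
    }
    memcpy(dst + used, src, need + 1);  /* copies the NUL as well */
    return 0;
}
```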
Ralph Castain
ea35e47228 Fat SMPs (i.e., systems with nodes containing large numbers of cpus) were failing to start due to connection failures of the opal/pmix support. Root cause was that (a) we were setting the client socket to non-blocking before calling connect, and (b) the server was using the event library to harvest the accepts, and also did the handshake while in that event. So the server would back up beyond the connection backlog limit, and we would fail.
Changing the client to leave its socket as blocking during the connect doesn't solve the problem by itself - you also have to introduce a sleep delay once the backlog is hit to avoid simply machine-gunning your way thru retries. This gets somewhat difficult to adjust as you don't want to unnecessarily prolong startup time.

We've solved this before by adding a listening thread that simply reaps accepts and shoves them into the event library for subsequent processing. This would resolve the problem, but meant yet another daemon-level thread. So I centralized the listening thread support and let multiple elements register listeners on it. Thus, each daemon now has a single listening thread that reaps accepts from multiple sources - for now, the orte/pmix server and the oob/usock support are using it. I'll add in the oob/tcp component later.

This still didn't fully resolve the SMP problem, especially on coprocessor cards (e.g., KNC). Removing the shared memory dstore support helped further improve the behavior - it looks like there is some kind of memory paging issue there that needs further understanding. Given that the shared memory support was about to be lost when I bring over the PMIx integration (until it is restored in that library), it seemed like a reasonable thing to just remove it at this point.
2015-05-29 14:37:14 -07:00
Nathan Hjelm
7b7993e406 pmix/base: fix coverity issue
CID 1269707 Logically dead code (DEADCODE)

Coverity is correct that tmp3 can never be NULL here. Deleted the dead
code.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-29 09:02:56 -06:00
Nathan Hjelm
1d27b1f944 pmix/native: fix coverity issue
CID 1269730 Dereference after null check (FORWARD_NULL)

The code checked for cb == NULL before checking for a callback
function but did not have the same protection around the
OBJ_RELEASE(cb).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-29 08:48:15 -06:00
Nathan Hjelm
5e2bc2c662 btl/openib: fix coverity issue
CID 1269821 Dereference null return value (NULL_RETURNS)

This is another false positive that can be silenced by looping on
opal_list_remove_first instead of using both opal_list_is_empty and
opal_list_remove_first.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-29 08:44:03 -06:00
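The refactoring mentioned here, looping on the remove call until it returns NULL instead of testing for emptiness first, can be sketched with a tiny singly linked list. The list type and functions below are hypothetical stand-ins, not the opal_list_t API.

```c
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

struct list {
    struct node *head;
};

/* Detach and return the first node, or NULL if the list is empty. */
struct node *list_remove_first(struct list *l)
{
    struct node *n = l->head;
    if (NULL != n) {
        l->head = n->next;
        n->next = NULL;
    }
    return n;
}

/* Drain the list and return the sum of the values. The NULL return is
 * the only loop condition, so a static analyzer sees every
 * dereference guarded by the check. */
int drain_sum(struct list *l)
{
    int sum = 0;
    struct node *n;

    while (NULL != (n = list_remove_first(l))) {
        sum += n->value;
    }
    return sum;
}
```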
Nathan Hjelm
65472a383f mca/base: add yes/no as valid values for boolean variables
This commit expands the set of accepted values for boolean values to
include yes/no as synonyms for 1/0.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-29 08:41:51 -06:00
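A rough sketch of what accepting yes/no alongside 1/0 might look like. The function name is hypothetical, the case-insensitive comparison is an assumption, and the real MCA variable code may accept further spellings that this sketch does not cover.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Parse a boolean setting.
 * Returns 0 and stores 0 or 1 in *out on success, -1 on unknown input. */
int parse_bool(const char *s, int *out)
{
    if (NULL == s || NULL == out) {
        return -1;
    }
    if (0 == strcmp(s, "1") || 0 == strcasecmp(s, "yes")) {
        *out = 1;
        return 0;
    }
    if (0 == strcmp(s, "0") || 0 == strcasecmp(s, "no")) {
        *out = 0;
        return 0;
    }
    return -1;
}
```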
Nathan Hjelm
0d763ea0bc Merge pull request #611 from hjelmn/opal_coverity
btl/openib: more coverity fixes
2015-05-28 13:01:37 -06:00
Nathan Hjelm
b038eb6434 btl/openib: more coverity fixes
CID 1301390 Dereference before null check (REVERSE_INULL)

endpoint can not be NULL here. Remove NULL check.

CID 1269836 Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
CID 1301388 Bad bit shift operation (BAD_SHIFT)

Add ull to integer constants to ensure the math is done in 64-bits not
32.

CID 715749 Explicit null dereferenced (FORWARD_NULL)

As far as I can tell this parser function does not accept a line that
does not match key = value. If that is the case then value should never be
NULL. If it is, that is a parse error. Updated the code to reflect
this. Also modified the intify function to do something more sane
(strtol vs atoi with hex detection).

CID 1269820 Dereference null return value (NULL_RETURNS)

This is a false positive as strchr will never return NULL here. It
makes sense, though, to quiet the warning by changing the do {} while
() loop to a while () loop.

CID 1269780 Dereference after null check (FORWARD_NULL)

Just return an error if the endpoint's cpc data is NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 11:58:17 -06:00
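Two of the fixes in this commit message lend themselves to a short illustration: forcing arithmetic into 64 bits with a ull suffix, and parsing numbers with strtol (which detects a 0x prefix when the base is 0) instead of atoi. The function names below are hypothetical and the code is a sketch, not the openib BTL source.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* 1 << shift would be computed in (typically 32-bit) int; the ull
 * suffix makes the shift happen in 64 bits. Requires shift < 64. */
uint64_t bit_mask_64(unsigned shift)
{
    return 1ull << shift;
}

/* Same idea for multiplication: widen before multiplying, not after. */
uint64_t bytes_from_mib(uint32_t mib)
{
    return mib * 1024ull * 1024ull;
}

/* Parse decimal, hexadecimal (0x...) or octal (0...) text.
 * Returns 0 on success, -1 on empty, trailing-garbage or out-of-range
 * input; atoi cannot report any of these. */
int parse_long(const char *text, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 0);
    if (end == text || '\0' != *end || 0 != errno) {
        return -1;
    }
    *out = v;
    return 0;
}
```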
Nathan Hjelm
9c170a8c00 Merge pull request #608 from hjelmn/opal_coverity
Opal coverity fixes
2015-05-28 09:06:31 -06:00
Nathan Hjelm
ceb319170a btl/openib: fix more coverity issues
CID 1269674 Ignoring number of bytes read (CHECKED_RETURN)

Check that we read enough bytes to get a complete async command.

CID 1269793 Missing break in switch (MISSING_BREAK)

Added comment to indicate fall through was intentional.

CID 1269702: Constant variable guards dead code (DEADCODE)

Remove an unused argument to opal_show_help. This will quiet the
coverity issue.

CID 1269675 Ignoring number of bytes read (CHECKED_RETURN)

Check that at least sizeof(int) bytes are read. If this is not the
case then it is an error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:10 -06:00
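The CHECKED_RETURN items above come down to not trusting read() to deliver a full message in one call. A minimal sketch of a helper that insists on a complete read follows; the name read_exact is hypothetical and the openib code may handle short reads differently.

```c
#include <errno.h>
#include <unistd.h>

/* Read exactly len bytes from fd into buf.
 * Returns 0 on success, -1 on error or if EOF arrives early. */
int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (EINTR == errno) {
                continue;       /* interrupted by a signal: retry */
            }
            return -1;
        }
        if (0 == n) {
            return -1;          /* EOF before the full message */
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```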
Nathan Hjelm
9353fcea95 crs/base: fix coverity issues
CID 1196720 Resource leak (RESOURCE_LEAK)
CID 1196721 Resource leak (RESOURCE_LEAK)

The code in question does leak loc_token and loc_value. Cleaned up the
code a bit and plugged the leak.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:10 -06:00
Nathan Hjelm
0e3c32a98a opal/sys_limits: fix coverity issue
CID 996175 Dereference before null check (REVERSE_INULL)

If lims is NULL then we ran out of memory. Return an error and remove
the NULL check at cleanup.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:10 -06:00
Nathan Hjelm
3edb421adc common/verbs: fix coverity issues
CID 1269864 Resource leak (RESOURCE_LEAK)
CID 1269865 Resource leak (RESOURCE_LEAK)

Slightly refactored the code to remove extra goto statements and
ensure the if_include_list and if_exclude_list are actually released
on success.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:10 -06:00
Nathan Hjelm
43d678e7ca btl/openib: fix more coverity issues
CID 1269931 Uninitialized scalar variable (UNINIT)

Initialize complete async message. This was not a bug but the fix
contributes to valgrind cleanness (uninitialized write).

CID 1269915 Unintended sign extension (SIGN_EXTENSION)

Should never happen. Quieting this by explicitly casting to uint64_t.

CID 1269824 Dereference null return value (NULL_RETURNS)

It is impossible for opal_list_remove_first to return NULL if
opal_list_is_empty returns false. I refactored the code in question to
not use opal_list_is_empty but loop until NULL is returned by
opal_list_remove_first. That will quiet the issue.

CID 1269913 Dereference before null check (REVERSE_INULL)

The storage parameter should never be NULL. The check intended to
check if *storage was NULL not storage.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:10 -06:00
Nathan Hjelm
32d4d7b6ea opal/dss: silence coverity issues
CID 1269988 Use after free (USE_AFTER_FREE)
CID 1269987 Use after free (USE_AFTER_FREE)

Both are false positives as convert is always overwritten by the call
to opal_dss_unpack_string(). Set convert to prevent this issue from
re-appearing.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:09 -06:00
Nathan Hjelm
6b86e74218 btl/openib: fix coverity issues
CID 1269933 Uninitialized scalar variable (UNINIT)

This CID isn't really an error but it is best for both valgrind and
coverity cleanness to not write uninitialized data. Added an
initializer for async_command in btl_openib_component_close.

CID 1269930 Uninitialized scalar variable (UNINIT)

Same as above. Best not to write uninitialized data. Added an
initializer for async_command.

CID 1269701 Logically dead code (DEADCODE)

Coverity is correct. The smallest_pp_qp will always be 0. Changed the
initial value so that the smallest_pp_qp is set as intended. If no
per-peer queue pair exists then use the last shared queue pair. This
queue pair should have the smallest message size. This will reduce
buffer waste.

CID 1269713 Logically dead code (DEADCODE)

False positive but easy to silence. The two checks are meaningless if
HAVE_XRC is 0 so protect them with #if HAVE_XRC.

CID 1269726 Division or modulo by zero (DIVIDE_BY_ZERO)

Indeed an issue. If we get an invalid value for rd_win then this will
cause a divide-by-zero exception. Added a check to ensure rd_win is >
0. Also updated the help message to reflect this requirement.

CID 1269672 Ignoring number of bytes read (CHECKED_RETURN)

This error was somewhat intentional. Linux parameter files are
probably not empty but it is safer to check the return code of read to
make sure we got something. If 0 bytes are read this code could SEGV
when running strtoull.

CID 1269836 Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)

Add a range check to read_module_param to ensure we do not
overflow. In the future it might be worthwhile to report an error
because these parameters should never cause overflow in this
calculation.

CID 1269692 Calling risky function (DC.WEAK_CRYPTO)

??? This call was added in 2006 but I see no calls to the rest of the
rand48 family of functions. Anyway, we SHOULD NEVER be calling seed48,
srand, etc because it messes with user code. Removed the call to
seed48.

CID 1269823 Dereference null return value (NULL_RETURNS)

This is likely a false positive. The endpoint lock is being held so no
other thread should be able to remove fragments from the list. Also,
mca_btl_openib_endpoint_post_send should not be removing items from
the list. If a NULL fragment is ever returned it will likely be a
coding error on the part of an Open MPI developer. Added an assert()
to catch this and quiet the coverity error.

CID 1269671 Unchecked return value (CHECKED_RETURN)

Added a check for the return code of mca_btl_openib_endpoint_post_send
to quiet the coverity error. It is unlikely this error path will be
traversed.

CID 1270229 Missing break in switch (MISSING_BREAK)

Add a comment to indicate that the fall-through is intentional.

CID 1269735 Dereference after null check (FORWARD_NULL)

There should always be an endpoint when handling a work
completion. The endpoint is either stored on the fragment or can be
looked up using the immediate data. Move the immediate data code up
and add an assert for a NULL endpoint.

CID 1269740 Dereference after null check (FORWARD_NULL)
CID 1269741 Explicit null dereferenced (FORWARD_NULL)

Similar to CID 1269735 fix.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:09 -06:00
Nathan Hjelm
13e0a9da3a sec/base: fix coverity issues
CID 1292483 Uninitialized pointer read (UNINIT)

Initialize the method and credential members of the opal_sec_cred_t to
avoid possible invalid read when calling cleanup_cred.

CID 1292484 Double free (USE_AFTER_FREE)

Set method and credential members to NULL after freeing in
cleanup_cred.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:09 -06:00
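Both fixes in this commit follow a familiar rule: initialize pointers before any cleanup path can see them, and reset them to NULL after freeing. The struct below is a hypothetical stand-in that borrows only the two member names mentioned above; it is not the real opal_sec_cred_t.

```c
#include <stdlib.h>

/* Hypothetical stand-in with the two members named in the commit. */
struct cred {
    char *method;
    char *credential;
};

/* Make cleanup safe to call on a credential that was never filled in. */
void cred_init(struct cred *c)
{
    c->method = NULL;
    c->credential = NULL;
}

/* free(NULL) is a no-op, so resetting the members to NULL makes a
 * second cleanup call harmless instead of a double free. */
void cred_cleanup(struct cred *c)
{
    free(c->method);
    c->method = NULL;
    free(c->credential);
    c->credential = NULL;
}
```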
Nathan Hjelm
f5389cbb03 opal/keyval: fix coverity issues
CID 1292738 Dereference after null check (FORWARD_NULL)

It is an error if NULL is passed for val in add_to_env_str. Removed
the NULL-check @ keyval_parse.c:253 and added a NULL check and an
error return.

CID 1292737 Logically dead code (DEADCODE)

Coverity is correct, the error code at the end of parse_line_new is
never reached. This means we fail to report parsing errors when
parsing -x and -mca lines in keyval files. I moved the error code into
the loop and removed the checks @ keyval_parse.c:314.

I also named the parse state enum type and updated parse_line_new to
use this type.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:09 -06:00
Gilles Gouaillardet
b896ce7c4a opal/ddt: fix opal_dt_swap_bytes
fix a bug from commit open-mpi/ompi@ef74566734
2015-05-28 16:00:03 +09:00
George Bosilca
ef74566734 Optimize the heterogeneous case when there are multiple
identical types.
2015-05-27 01:07:29 -04:00
George Bosilca
79d5e2a92b Use C99 array initialization. Add a comment about
the limitations of the functions generated with
COPY_TYPE_HETEROGENEOUS.
2015-05-27 00:44:17 -04:00
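For context, C99 array initialization with designators looks roughly like the sketch below. The enum and table are hypothetical and unrelated to the actual datatype tables touched by this commit.

```c
#include <stddef.h>

enum dtype { DT_INT8, DT_INT16, DT_INT32, DT_INT64, DT_MAX };

/* Designated initializers tie each entry to its enum constant, so
 * reordering the enum cannot silently shift the table. Entries that
 * are not listed are zero-initialized. */
static const size_t dtype_size[DT_MAX] = {
    [DT_INT8]  = 1,
    [DT_INT16] = 2,
    [DT_INT32] = 4,
    [DT_INT64] = 8,
};

/* Returns the size in bytes, or 0 for an out-of-range type. */
size_t size_of_dtype(enum dtype t)
{
    return ((unsigned) t < DT_MAX) ? dtype_size[t] : 0;
}
```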
Jeff Squyres
ec57aa1805 usnic: also update btl_usnic_compat.c for versioning 2015-05-26 18:56:29 -07:00
Jeff Squyres
9c19dd4e5b usnic: make v1.10 and v2.0 fit in version checking scheme
v1.10 is now in the same compatibility level as v1.7/v1.8 (there
is/will be no v1.9 series).  v2.0 now takes over for what used to be
called v1.9.
2015-05-26 18:42:33 -07:00
Jeff Squyres
95a2b14543 usnic: avoid some compiler warnings
Followup to open-mpi/ompi@65b66ab: if we're not debugging, then #if
out an entire block so that the compiler doesn't warn about variables
that are assigned and not used.
2015-05-26 14:27:26 -07:00
Nathan Hjelm
a3eb3e2c9c Merge pull request #604 from hjelmn/opal_coverity
opal/crs: clean up parsing code to fix coverity issues
2015-05-26 15:26:01 -06:00
Jeff Squyres
68c33355c6 Merge pull request #595 from goodell/pr/usnic_getname
usnic: use fi_getname in newer libfabric
2015-05-26 17:20:13 -04:00
Nathan Hjelm
9e56ef0da9 opal/crs: clean up parsing code to fix coverity issues
CID 70622 Dereference before null check (REVERSE_INULL)
CID 70459 Logically dead code (DEADCODE)

Clean up some kludgy code which (among other things) reimplemented
strcat, strdup, and strchr. In the process this resolved two
outstanding coverity issues.

CID 70631 Dereference before null check (REVERSE_INULL)

best_module can not be NULL in this code path. Remove NULL check and
unnecessary goto statements.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-26 14:45:35 -06:00
Gilles Gouaillardet
bc105afb10 opal/ddt: add copy functions for complex types
long double internal representation is arch specific,
and no conversion is done (yet), so MPI_LONG_DOUBLE should not
be used (yet) in heterogeneous mode
2015-05-26 16:59:36 +09:00
Ralph Castain
ce915b5757 Fix a typo that incorrectly set the alignment threshold in the openib BTL.
Thanks to Xavier Besseron for pointing it out
2015-05-25 07:12:57 -07:00
Gilles Gouaillardet
e47cb9636d Revert "opal_pack_homogeneous_contig_with_gaps_function: correctly handle contiguous ddt made of more than one basic type"
This reverts commit e4846746f4.
2015-05-22 17:25:33 +09:00
Gilles Gouaillardet
60e4d6c795 btl: add conversion macros for mca_btl_base_segment_t for heterogeneous support 2015-05-22 15:52:32 +09:00
Gilles Gouaillardet
e4846746f4 opal_pack_homogeneous_contig_with_gaps_function: correctly handle contiguous ddt made of more than one basic type
Fix an issue that can only be seen on a heterogeneous cluster when sending MPI_LONG_INT type and friends
2015-05-22 15:44:08 +09:00
Nathan Hjelm
2f93fe63b9 Merge pull request #597 from hjelmn/mca_base_coverity
mca/base: fix coverity issues and enable project name in MCA groups
2015-05-21 15:01:35 -06:00
Nathan Hjelm
cea735b3c3 mca/base: fix coverity issues and enable project name in MCA groups
CID 1047278 Unchecked return value

Updated check for mca_base_var_generate_full_name4 to match other
checks. Logically equivalent to the old check. Not a bug.

CID 1196685 Dereference null return

Added check for NULL when looking up the original variable for a
synonym.

CID 1269705 Logically dead code

Removed code that set the project to NULL. Code was intended to be
removed with an earlier commit that added the project name into the
component structure. Added code to actually support searching for a
group with a wildcard ('*').

CID 1292739 Dereference null return
CID 1269819 Dereference null return

Removed unnecessary string duplication and strchr.

CID 1287030 Logically dead code

Refactored fixup_files code and confirmed that the code in question is
not reachable. Removed the dead code.

CID 1292740 Use of untrusted string

Use strdup to silence coverity warning.

CID 1294413 Free of address-of expression

Reset mitem to NULL after the OPAL_LIST_FOREACH loop to ensure we
never try to free the list sentinel.

CID 1294414 Unchecked return value

Use (void) to indicate we do not care about the return code in this
instance.

CID 1294415 Resource leak

On error, free the base pointer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-21 13:33:31 -06:00
Rolf vandeVaart
b3e4ae71d5 Fix finalize code when cuda support is not fully initialized 2015-05-21 13:42:22 -04:00
Nathan Hjelm
c540a9e59a Merge pull request #582 from hjelmn/mca_var_file_fix
mca/base: fix source file name bug for synonyms
2015-05-21 11:34:12 -06:00
Dave Goodell
65b66ab4ae usnic: use fi_getname in newer libfabric
When using an external libfabric (or really any libfabric newer than
libfabric commit 607e863), we must use fi_getname to determine the local
port of our endpoint.  Without this fix, OMPI will hang endlessly
while retransmitting packets to port 0 on the remote host.
2015-05-21 08:51:03 -07:00
Nathan Hjelm
108f55a963 btl/vader: clean up progress of waiting endpoints
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-20 16:14:58 -06:00
Nathan Hjelm
69e70776aa btl/vader: fix double unlock
References #594

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-20 14:35:22 -06:00
Howard Pritchard
62a278d29c Merge pull request #590 from hppritcha/topic/coverity_133
pmix/base: fix coverity error
2015-05-18 06:52:37 -06:00
Gilles Gouaillardet
69f900ab9d libfabric: check the psm_epconn_t type is available before building the PSM provider
embedded libfabric configury does it its own way, so "backport" ofiwg/libfabric#1031
2015-05-18 14:04:41 +09:00
Howard Pritchard
0980423c5f pmix/base: fix coverity error
Remove some obviously dead code and thus fix a coverity
error - CID #133

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-05-16 13:24:03 -06:00
Howard Pritchard
d9f080b0c7 btl/ugni: silence common symbol squawk
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-05-16 10:23:06 -05:00
Howard Pritchard
a1d65cfd8b pmix/cray: fix locality setting
Code for setting proc node locality
was absent after the removal of Cray
PMI KVS usage.  This commit puts that
functionality back in place.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-05-15 12:17:15 -07:00
George Bosilca
675dccf9d9 Print the port in host byte order. 2015-05-15 00:14:28 -04:00
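As a reminder of why byte order matters when printing a port: sin_port is stored in network byte order, so it has to go through ntohs() before being formatted. A minimal sketch with a hypothetical helper name (not the code changed by this commit):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* Format the port of an IPv4 socket address as decimal text.
 * Returns the snprintf() result. */
int format_port(const struct sockaddr_in *addr, char *buf, size_t len)
{
    /* ntohs converts from network (big-endian) to host byte order;
     * printing sin_port directly shows a byte-swapped number on
     * little-endian hosts. */
    return snprintf(buf, len, "%u", (unsigned) ntohs(addr->sin_port));
}
```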
Nathan Hjelm
427aebbaca Fix cuda support MCA variables
This commit fixes some issues with the cuda support parameters. There
were a couple of duplicate registrations and an incorrect synonym (one
variable was made a synonym of mpi_preconnect_mpi).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-12 09:52:51 -06:00
Nathan Hjelm
9caffa5dd8 mca/base: fix source file name bug for synonyms
This commit fixes synonyms so the source file is correctly printed out
by ompi_info. This commit also adds support for printing out the line
number where the variable is set.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-12 09:52:31 -06:00
Jeff Squyres
e95010b095 common verbs: only install fake usnic driver when relevant
Only install the fake usnic libibverbs driver when there are actually
usnic kernel devices present.  This prevents some run-time weirdness
on the Cray verbs emulation environment, where apparently
ibv_register_driver() either is not implemented or does not work
properly.
2015-05-11 12:57:06 -07:00
Todd Kordenbrock
9df163f116 portals4: use a single Memory Descriptor to cover all of memory
In days past, some implementations of Portals4 could not cover all
of memory with a single Memory Descriptor so multiple large
overlapping Memory Descriptors were created.  Because none of the
current implementations have this limitation (and no future
implementations should either), this commit removes the overlapping
Memory Descriptors code.
2015-05-11 11:49:41 -05:00
Howard Pritchard
94576993b0 Merge pull request #574 from hppritcha/topic/ugni_common_symbol
common/ugni: fix common symbol problem
2015-05-08 05:55:15 -06:00
Howard Pritchard
341b773724 common/ugni: fix common symbol problem
Stop nagging of common symbol detection script for ugni
stuff.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-05-08 04:27:01 -07:00
Jeff Squyres
7a577c0ed2 libfabric: delete tarball
Oops -- the tarball itself should not have been committed to the repo.
2015-05-08 03:24:09 -07:00
Gilles Gouaillardet
c809aace47 initialize common symbols from opal
A few uninitialized common symbols are remaining:

common symbols generated by flex :
 * opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng
 * opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext
 * opal/util/show_help_lex.l: opal_show_help_yyleng
 * opal/util/show_help_lex.l: opal_show_help_yytext

common symbol generated by "external" hwloc library:
 * opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map
2015-05-08 09:48:51 +09:00
Jeff Squyres
a1770950c6 libfabric: update to 1.0.0
This is likely short-lived: now that libfabric has a 1.0.0 release
available, the embedded libfabric may disappear from the OMPI tree
sometime soon.  However, we still need it for the time being...
2015-05-07 11:14:13 -07:00
Ralph Castain
9cb2fcfa5c Cleanup the qos code when --enable-timings is given 2015-05-06 20:24:27 -07:00
Ralph Castain
1f8de276de Consolidate all the QOS changes into one clean commit 2015-05-06 19:48:42 -07:00
Ralph Castain
554c7c3551 Per request from Nathan, let the user provide a NULL return list for dstore.fetch to indicate they just want to know if the key is present (but don't care about the actual value). Saves dealing with the list and copying data when not needed. 2015-05-06 08:20:19 -07:00
Jeff Squyres
676673189b Merge pull request #565 from jsquyres/pr/fake-usnic-ibv-driver
Squelch libibverbs complaints about lack of usnic userspace plugin
2015-05-05 10:27:33 -04:00
Jeff Squyres
f79e137247 Merge pull request #555 from jsquyres/pr/openib-delay-cpc-init
btl openib: only initialize CPCs if there are devices to use
2015-05-05 10:26:55 -04:00
Howard Pritchard
b5fc5404c6 libfabric/embedded: add missing psmx_eq.c
The ompi libfabric/Makefile.am to build the libmca_component_libfabric
lib was missing a recently added psmx_eq.c in the list of source
files for the psm provider.

Fixes #569

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-05-04 14:54:59 -06:00
George Bosilca
459e15479f Remove double ; 2015-04-30 14:43:19 -04:00
Jeff Squyres
76222f462e btl openib: if ibv_open_device() returns NULL, it's not supported
When a libibverbs driver returns NULL for its context, it's the Open
MPI libibverbs fake driver.  Hence, this device is simply not
supported -- ignore it.
2015-04-29 18:07:12 -07:00
Jeff Squyres
a2b55e12f2 common verbs: insert fake usnic_verbs libibverbs driver
libibverbs will complain to stderr if it sees device entries in
/sys/class/infiniband for which it has no userspace plugins.

The Cisco usNIC device no longer exports a verbs interface, thereby
causing libibverbs to emit this annoying stderr warning.

To avoid this, use the public ibv API to register a "fake" libibverbs
driver at run-time (right after we call ibv_fork_init(), but --
critically -- *before* we call ibv_get_device_list()).  The purpose of
this driver is solely to convince libibverbs that there *is* a driver
for /sys/class/infiniband/usnic_verbs devices.  ...although this
driver will never return a valid ibv context (and therefore will never
be used).
2015-04-29 18:07:12 -07:00
Jeff Squyres
202f64868c btl openib: only initialize CPCs if there are devices to use
Defer initializing the CPCs until we know that we have devices/ports
to use.  This both prevents some useless work at startup when there
are no devices/ports to use, and also prevents librdmacm complaining
that there are no verbs-capable RDMA devices available (e.g., if a
Cisco usNIC device is present, but does not present a verbs RDMA
interface).
2015-04-29 17:54:11 -07:00
Jeff Squyres
df6f7597a4 btl openib: only initialize CPCs if there are devices to use
Defer initializing the CPCs until we know that we have devices/ports
to use.  This both prevents some useless work at startup when there
are no devices/ports to use, and also prevents librdmacm complaining
that there are no verbs-capable RDMA devices available (e.g., if a
Cisco usNIC device is present, but does not present a verbs RDMA
interface).
2015-04-29 17:52:41 -07:00
Jeff Squyres
4cc5c5261d libfabric: disable all semblance of verbs
Including the usnic fake ibv verbs driver.

This fix was mistakenly not included in open-mpi/ompi@d0937c6.
2015-04-29 17:46:12 -07:00
Jeff Squyres
d0937c6f42 libfabric: update to upstream c01338a53abf969799ac0722de152ca0bd96fa3c
Fixes a usnic bug with respect to porting to v1.8
2015-04-29 17:38:19 -07:00
Jeff Squyres
faf3324b0e libfabric: update to upstream d4ab6e56e23124e565ada939054a159737e52102
Fix a critical usnic bug, and other misc updates.
2015-04-29 16:02:08 -07:00
Jeff Squyres
a50ad505e7 There were corner cases that allowed max_reg to be uninitialized. Set
a default value so that those corner cases would still have an
initialized value in max_reg.
2015-04-28 14:34:17 -07:00
Rolf vandeVaart
b260dc4228 Cleanup interface that handles events. No functional changes 2015-04-28 15:15:24 -04:00
Rolf vandeVaart
2b99b44a16 Fix error only seen when running with CUDA 5.5 or less. Introduced with BTL 3.0 2015-04-27 17:09:53 -04:00