1
1
Граф коммитов

22936 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
ec57aa1805 usnic: also update btl_usnic_compat.c for versioning 2015-05-26 18:56:29 -07:00
Jeff Squyres
9c19dd4e5b usnic: make v1.10 and v2.0 fit in version checking scheme
v1.10 is now in the same compatibility level as v1.7/v1.8 (there
is/will be no v1.9 series).  v2.0 now takes over for what used to be
called v1.9.
2015-05-26 18:42:33 -07:00
Jeff Squyres
7ffc1e2383 VERSION: Bump to v2.0.0 2015-05-26 18:35:18 -07:00
Jeff Squyres
95a2b14543 usnic: avoid some compiler warnings
Followup to open-mpi/ompi@65b66ab: if we're not debugging, then #if
out an entire block so that the compiler doesn't warn about variables
that are assigned and not used.
2015-05-26 14:27:26 -07:00
Nathan Hjelm
a3eb3e2c9c Merge pull request #604 from hjelmn/opal_coverity
opal/crs: clean up parsing code to fix coverity issues
2015-05-26 15:26:01 -06:00
Jeff Squyres
68c33355c6 Merge pull request #595 from goodell/pr/usnic_getname
usnic: use fi_getname in newer libfabric
2015-05-26 17:20:13 -04:00
Nathan Hjelm
9e56ef0da9 opal/crs: clean up parsing code to fix coverity issues
CID 70622 Dereference before null check (REVERSE_INULL)
CID 70459 Logically dead code (DEADCODE)

Cleanup some cludgy code which (among other things) reimplemented
strcat, strdup, and strchr. In the process this resolved two
outstanding coverity issues.

CID 70631 Dereference before null check (REVERSE_INULL)

best_module can not be NULL in this code path. Remove NULL check and
unnecessary goto statements.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-26 14:45:35 -06:00
Nathan Hjelm
7db48c581d orte_quit: Remove logically dead code
CID 71993 Logically dead code (DEADCODE)

As indicated by coverity proc can not be NULL at any point after the
continue. Removed dead code.

CID 1269682 Unchecked return value (CHECKED_RETURN)

Check the return code of orte_get_attribute. I assume we still need to
check for a NULL proc in case the aborted proc attribute is set to
NULL. This might be better as an assert ().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-26 12:16:12 -06:00
Nathan Hjelm
d21bd24126 ompi/crcp: fix logic issue after component selection
CID 70630 Dereference before null check

Cleaned up useless goto statements and deleted NULL check. If
mca_base_select returns success than best_module and best_component
will always be non-NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-26 11:48:40 -06:00
Gilles Gouaillardet
bc105afb10 opal/ddt: add copy functions for complex types
long double internal representation is arch specific,
and no conversion is done (yet), so MPI_LONG_DOUBLE should not
be used (yet) in heterogeneous mode
2015-05-26 16:59:36 +09:00
Gilles Gouaillardet
899fb89392 MPI_Sendrecv_replace : use the right process convertor 2015-05-26 16:59:36 +09:00
Gilles Gouaillardet
0a2f60994c ompi/ddt: fix #ifdef vs #if HAVE_xxx 2015-05-26 16:59:36 +09:00
Gilles Gouaillardet
e980958ad4 pml/ob1: silence a warning 2015-05-26 15:05:44 +09:00
Ralph Castain
ce915b5757 Fix a typo that incorrectly set the alignment threshold in the openib BTL.
Thanks to Xavier Besseron for pointing it out
2015-05-25 07:12:57 -07:00
Ralph Castain
566505afbf Update NEWS 2015-05-23 12:31:56 -07:00
Ralph Castain
cfd2cc49fd Get the Java bindings to compile again - add missing header 2015-05-23 11:22:24 -07:00
Ralph Castain
c21cd1c91e Ensure the ssh session is dead 2015-05-23 08:14:29 -07:00
Ralph Castain
920562d9b4 Ensure that all ssh sessions are terminated when abnormally terminating the job 2015-05-23 08:14:29 -07:00
Nathan Hjelm
68614a211b Merge pull request #596 from hjelmn/errorcode_fixes
Handle ompi error codes in java code and remove non-standard MPI error code from mpi.h.
2015-05-23 07:29:44 -06:00
Jeff Squyres
5e52ce26b5 help-errmgr-base.txt: remove trailing newline
Removed spurrious newline at end of file so that the emitted help
message doesn't contain a blank line before the final "-----" output.
2015-05-23 03:33:23 -07:00
Ralph Castain
55cd2a07f6 Update exit code 2015-05-22 21:06:43 -07:00
Ralph Castain
3510bb4ced Set the exit code when a daemon fails 2015-05-22 21:05:23 -07:00
rhc54
37d7ae14a7 Merge pull request #598 from rhc54/topic/oob
Fix abnormal shutdown when a node dies
2015-05-22 21:50:48 -06:00
Nathan Hjelm
163a1b4505 Remove non-standard MPI_ERR_SYSRESOURCE error code
Replaced internal usage with OMPI_ERR_OUT_OF_RESOURCE.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-05-22 19:59:37 -06:00
Ralph Castain
bc7a7f3de5 Fix abnormal shutdown when a node dies 2015-05-22 17:29:06 -07:00
Nathan Hjelm
9da29c3621 java: remove debug code
Talked to @ggouaillardet about this code. It was not intended to be committed to
master. Removing to fix coverity issue.

CID 1270134 Unchecked return value

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-22 08:36:14 -06:00
Gilles Gouaillardet
e47cb9636d Revert "opal_pack_homogeneous_contig_with_gaps_function: correctly handle contiguous ddt made of more than one basic type"
This reverts commit e4846746f4.
2015-05-22 17:25:33 +09:00
Gilles Gouaillardet
60e4d6c795 btl: add conversion macros for mca_btl_base_segment_t for heterogeneous support 2015-05-22 15:52:32 +09:00
Gilles Gouaillardet
85c45e2275 pml/ob1: fix mca_pml_ob1_recv_request_put_frag(...) in heterogeneous mode 2015-05-22 15:48:45 +09:00
Gilles Gouaillardet
e4846746f4 opal_pack_homogeneous_contig_with_gaps_function: correctly handle contiguous ddt made of more than one basic type
Fix an issue that can only be seen on an heterogeneous cluster when sending MPI_LONG_INT type and friends
2015-05-22 15:44:08 +09:00
Nathan Hjelm
2f93fe63b9 Merge pull request #597 from hjelmn/mca_base_coverity
mca/base: fix coverity issues and enable project name in MCA groups
2015-05-21 15:01:35 -06:00
Nathan Hjelm
cea735b3c3 mca/base: fix coverity issues and enable project name in MCA groups
CID 1047278 Unchecked return value

Updated check for mca_base_var_generate_full_name4 to match other
checks. Logically equivalent to the old check. Not a bug.

CID 1196685 Dereference null return

Added check for NULL when looking up the original variable for a
synonym.

CID 1269705 Logically dead code

Removed code that set the project to NULL. Code was intended to be
removed with an earlier commit that added the project name into the
component structure. Added code to actually support searching for a
group with a wildcard ('*').

CID 1292739 Dereference null return
CID 1269819 Dereference null return

Removed unnecessary string duplication and strchr.

CID 1287030 Logically dead code

Refactored fixup_files code and confirmed that the code in question is
not reachable. Removed the dead code.

CID 1292740 Use of untrusted string

Use strdup to silence coverity warning.

CID 1294413 Free of address-of expression

Reset mitem to NULL after the OPAL_LIST_FOREACH loop to ensure we
never try to free the list sentinel.

CID 1294414 Unchecked return value

Use (void) to indicate we do not care about the return code in this
instance.

CID 1294415 Resource leak

On error free all the base pointer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-21 13:33:31 -06:00
Rolf vandeVaart
b3e4ae71d5 Fix finalize code when cuda support is not fully initialized 2015-05-21 13:42:22 -04:00
Nathan Hjelm
c540a9e59a Merge pull request #582 from hjelmn/mca_var_file_fix
mca/base: fix source file name bug for synonyms
2015-05-21 11:34:12 -06:00
Nathan Hjelm
403b3b20d7 Handle ompi error codes in java code
This commit also adds protection against negative error codes in ompi
error code functions. There is one outstanding issue. There is a
negative MPI error code defined in mpi.h. This will need to be fixed
separetely.

This commit fixes coverity IDs 1271533 and 1270156.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-21 10:39:09 -06:00
Nathan Hjelm
757c021951 Fix coverity ID 1270164
The sargs array and its elements were malloced but not freed. Note
that strings passed to NewStringUTF are copied into Java's heap and it
is the callers responsibility to free the original string.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-21 10:15:10 -06:00
Dave Goodell
65b66ab4ae usnic: use fi_getname in newer libfabric
When using an external libfabric (or really any libfabric newer than
libfabric commit 607e863), we must use fi_getname to determine the local
port of our endpoint.  Without this fix, OMPI will hang endlessly
while retransmitting packets to port 0 on the remote host.
2015-05-21 08:51:03 -07:00
Ralph Castain
96cd42699e Cleanup warnings for uninitialized vars and convert bare debug output to verbose 2015-05-21 07:41:26 -07:00
Jeff Squyres
0d4a6d7326 Merge pull request #588 from jsquyres/pr/keepalive-madness
Keepalive cleanups
2015-05-20 21:38:44 -04:00
Jeff Squyres
3069daa015 oob_tcp_listener: slightly refactor EAGAIN/EWOULDBLOCK
Have only a single level of "if" conditionals.  Also, slightly change
the logic such that we only die/break out of the loop if we get EMFILE
-- all other errors are ok to go on to the next fd.

Finally, use a real show_help() message to warn when other errors occur.
2015-05-20 21:10:11 -04:00
Jeff Squyres
e43c8dc291 oob tcp: label a few #endif's
Only bother labeling the ones that are a little far away from their
corresponding #if statements.
2015-05-20 21:10:11 -04:00
Jeff Squyres
4b2f0d4827 oob tcp: reset MCA params from level 9
Set various MCA param levels
2015-05-20 21:10:11 -04:00
Jeff Squyres
1a4c9960e1 oob tcp: set KEEPALIVE timeout 60s, retry interval 5s
The timeout is frequency at which to send keepalive pings; the retry
interval is how often to send successive pings once a keepalive has
not replied.

Also update comments and MCA param help strings.

60 seconds -- squashme
2015-05-20 21:08:37 -04:00
Nathan Hjelm
108f55a963 btl/vader: clean up progress of waiting endpoints
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-20 16:14:58 -06:00
Jeff Squyres
c95215dfc2 oob_tcp: do not set KEEPALIVE on listening sockets 2015-05-20 17:28:45 -04:00
Jeff Squyres
32d81af35f oob tcp: re-enable keepalive option for Mac
Plus very minor #if/#endif reduction.
2015-05-20 17:28:45 -04:00
Nathan Hjelm
69e70776aa btl/vader: fix double unlock
References #594

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-20 14:35:22 -06:00
Nathan Hjelm
ce48eabd84 pml/ob1: use c99 flexible array members instead of size 1 arrays
This commit updates several ob1 structures to take advantage of C99's
flexible array member.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-20 10:31:35 -06:00
rhc54
95c40e64b9 Merge pull request #584 from nkogteva/oob_ud_stress_test
oob ud: fixed a bug that prevented the work with QoS framework
2015-05-20 09:56:08 -06:00
Gilles Gouaillardet
b6c67e051d io/ompio: fix misc memory leaks
as reported by Coverity with CIDs 72147-72149,72187,72188,731274,731275,741356,
1269889,1269893,1271535 and 1269872
2015-05-20 17:19:39 +09:00