23052 Commits

Author SHA1 Message Date
Nathan Hjelm
6b86e74218 btl/openib: fix coverity issues
CID 1269933 Uninitialized scalar variable (UNINIT)

This CID isn't really an error but it is best for both valgrind and
coverity cleanness to not write uninitialized data. Added an
initializer for async_command in btl_openib_component_close.

CID 1269930 Uninitialized scalar variable (UNINIT)

Same as above. Best not to write uninitialized data. Added an
initializer for async_command.

CID 1269701 Logically dead code (DEADCODE)

Coverity is correct. The smallest_pp_qp will always be 0. Changed the
initial value so that the smallest_pp_qp is set as intended. If no
per-peer queue pair exists then use the last shared queue pair. This
queue pair should have the smallest message size. This will reduce
buffer waste.

CID 1269713 Logically dead code (DEADCODE)

False positive but easy to silence. The two checks are meaningless if
HAVE_XRC is 0 so protect them with #if HAVE_XRC.

CID 1269726 Division or modulo by zero (DIVIDE_BY_ZERO)

Indeed an issue. If we get an invalid value for rd_win then this will
cause a divide-by-zero exception. Added a check to ensure rd_win is >
0. Also updated the help message to reflect this requirement.
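
The rd_win guard described above can be sketched roughly as follows. This is a minimal illustration and not the openib BTL code: `windows_needed` and its parameters are hypothetical names, and the division stands in for whatever calculation the BTL performs with rd_win.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical consumer of rd_win. Returns -1 for an invalid window size
 * instead of dividing by it. */
static int32_t windows_needed(int32_t rd_num, int32_t rd_win)
{
    /* rd_num is assumed to be validated elsewhere; a negative value here
     * would be a caller bug rather than bad user input. */
    assert(rd_num >= 0);

    /* rd_win comes from user-supplied configuration, so reject anything
     * that is not strictly positive before using it as a divisor. */
    if (rd_win <= 0) {
        return -1;
    }

    return rd_num / rd_win;
}
```

A caller would treat the -1 return as a configuration error and print the help message rather than continue.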

CID 1269672 Ignoring number of bytes read (CHECKED_RETURN)

This error was somewhat intentional. Linux parameter files are
probably not empty but it is safer to check the return code of read to
make sure we got something. If 0 bytes are read this code could SEGV
when running strtoull.
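
A sketch of the pattern, assuming a small text file that holds one unsigned decimal number. `read_param_file` is a hypothetical helper written for illustration; it is not the function changed by this commit.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read an unsigned integer from a parameter file.
 * Returns 0 on success, or -1 if the file cannot be opened, is empty,
 * or does not start with a number. */
static int read_param_file(const char *path, uint64_t *value)
{
    char buffer[64];
    char *end = NULL;
    ssize_t nread;
    int fd;

    assert(path != NULL && value != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    nread = read(fd, buffer, sizeof(buffer) - 1);
    close(fd);

    /* Check how many bytes were read before parsing: on error or on an
     * empty file there is nothing valid in the buffer. */
    if (nread <= 0) {
        return -1;
    }
    buffer[nread] = '\0';

    errno = 0;
    unsigned long long parsed = strtoull(buffer, &end, 10);
    if (end == buffer || errno == ERANGE) {
        return -1;
    }

    *value = (uint64_t) parsed;
    return 0;
}
```

Terminating the buffer only after the byte count is known keeps strtoull from ever seeing uninitialized memory.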

CID 1269836 Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)

Add a range check to read_module_param to ensure we do not
overflow. In the future it might be worthwhile to report an error
because these parameters should never cause overflow in this
calculation.

CID 1269692 Calling risky function (DC.WEAK_CRYPTO)

??? This call was added in 2006 but I see no calls to the rest of the
rand48 family of functions. Anyway, we SHOULD NEVER be calling seed48,
srand, etc because it messes with user code. Removed the call to
seed48.

CID 1269823 Dereference null return value (NULL_RETURNS)

This is likely a false positive. The endpoint lock is being held so no
other thread should be able to remove fragments from the list. Also,
mca_btl_openib_endpoint_post_send should not be removing items from
the list. If a NULL fragment is ever returned it will likely be a
coding error on the part of an Open MPI developer. Added an assert()
to catch this and quiet the coverity error.

CID 1269671 Unchecked return value (CHECKED_RETURN)

Added a check for the return code of mca_btl_openib_endpoint_post_send
to quiet the coverity error. It is unlikely this error path will be
traversed.

CID 1270229 Missing break in switch (MISSING_BREAK)

Add a comment to indicate that the fall-through is intentional.

CID 1269735 Dereference after null check (FORWARD_NULL)

There should always be an endpoint when handling a work
completion. The endpoint is either stored on the fragment or can be
looked up using the immediate data. Move the immediate data code up
and add an assert for a NULL endpoint.

CID 1269740 Dereference after null check (FORWARD_NULL)
CID 1269741 Explicit null dereferenced (FORWARD_NULL)

Similar to CID 1269735 fix.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:09 -06:00
Nathan Hjelm
13e0a9da3a sec/base: fix coverity issues
CID 1292483 Uninitialized pointer read (UNINIT)

Initialize the method and credential members of the opal_sec_cred_t to
avoid possible invalid read when calling cleanup_cred.

CID 1292484 Double free (USE_AFTER_FREE)

Set method and credential members to NULL after freeing in
cleanup_cred.
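
The two fixes combine into a common pattern, sketched below under assumptions. `example_cred_t` is a stand-in for opal_sec_cred_t (the real structure may have different members), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for opal_sec_cred_t; the member set is an assumption. */
typedef struct {
    char *method;
    char *credential;
    size_t size;
} example_cred_t;

/* Start from a known state so cleanup never reads garbage pointers. */
static void example_cred_init(example_cred_t *cred)
{
    assert(cred != NULL);
    cred->method = NULL;
    cred->credential = NULL;
    cred->size = 0;
}

/* Free both members and clear them, so calling cleanup twice is harmless. */
static void example_cred_cleanup(example_cred_t *cred)
{
    assert(cred != NULL);
    free(cred->method);       /* free(NULL) is defined to do nothing */
    free(cred->credential);
    cred->method = NULL;
    cred->credential = NULL;
    cred->size = 0;
}
```

Because free(NULL) is a no-op, initializing and then resetting the pointers covers both the uninitialized read and the double free.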

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:09 -06:00
Nathan Hjelm
f5389cbb03 opal/keyval: fix coverity issues
CID 1292738 Dereference after null check (FORWARD_NULL)

It is an error if NULL is passed for val in add_to_env_str. Removed
the NULL-check @ keyval_parse.c:253 and added a NULL check and an
error return.

CID 1292737 Logically dead code (DEADCODE)

Coverity is correct, the error code at the end of parse_line_new is
never reached. This means we fail to report parsing errors when
parsing -x and -mca lines in keyval files. I moved the error code into
the loop and removed the checks @ keyval_parse.c:314.

I also named the parse state enum type and updated parse_line_new to
use this type.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:09 -06:00
Gilles Gouaillardet
b896ce7c4a opal/ddt: fix opal_dt_swap_bytes
fix a bug from commit open-mpi/ompi@ef74566734
2015-05-28 16:00:03 +09:00
Edgar Gabriel
aa72e5b2ca fix the selection logic so the new aggregator side does not overwrite the list of
aggregators determined by the algorithm.
2015-05-27 22:35:45 -05:00
Jithin Jose
5ba5a9ade2 Offset buffer by datatype true_lb to handle resized datatypes.
- Follow up patch for 56869bff38e264ee91ea68ae2fabfafe9456548e

Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-27 13:51:05 -07:00
Jeff Squyres
7b9e349498 iopenmpi-nightly-tarball.sh: make v1.10 tarballs 2015-05-27 11:45:40 -07:00
Jithin Jose
c745854d9b Avoid opal_convertor_pack for contiguous data types in MXM mtl
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-27 11:09:25 -07:00
Mike Dubman
ae0d577513 Merge pull request #606 from miked-mellanox/topic/fix_confgure_logic
mxm: fix configure logic
2015-05-27 10:44:00 +03:00
Mike Dubman
be5601b5d0 build: fix mxm configure
configure w/ CPPFLAGS/LDFLAGS can add empty -L statement from mxm
Thanks to David Shrader <dshrader@lanl.gov> for patch
2015-05-27 09:59:42 +03:00
George Bosilca
ef74566734 Optimize the heterogeneous case when there are multiple
identical types.
2015-05-27 01:07:29 -04:00
George Bosilca
79d5e2a92b Use C99 array initialization. Add a comment about
the limitations of the functions generated with
COPY_TYPE_HETEROGENEOUS.
2015-05-27 00:44:17 -04:00
Jithin Jose
07043894bd Avoid extra lookup for ompi_proc in homogeneous build
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-26 21:42:42 -07:00
Jithin Jose
50089977ac Inline PML-CM
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-26 21:42:41 -07:00
Jithin Jose
56869bff38 Avoid datatype pack/unpack for contiguous data on homogeneous systems.
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-26 21:42:41 -07:00
Jeff Squyres
fb12572438 OFI: make v1.10 and v2.0 fit in version checking scheme
v1.10 is now in the same compatibility level as v1.7/v1.8 (there
is/will be no v1.9 series).  v2.0 now takes over for what used to be
called v1.9.
2015-05-26 18:58:28 -07:00
Jeff Squyres
ec57aa1805 usnic: also update btl_usnic_compat.c for versioning 2015-05-26 18:56:29 -07:00
Jeff Squyres
9c19dd4e5b usnic: make v1.10 and v2.0 fit in version checking scheme
v1.10 is now in the same compatibility level as v1.7/v1.8 (there
is/will be no v1.9 series).  v2.0 now takes over for what used to be
called v1.9.
2015-05-26 18:42:33 -07:00
Jeff Squyres
7ffc1e2383 VERSION: Bump to v2.0.0 2015-05-26 18:35:18 -07:00
Jeff Squyres
95a2b14543 usnic: avoid some compiler warnings
Followup to open-mpi/ompi@65b66ab: if we're not debugging, then #if
out an entire block so that the compiler doesn't warn about variables
that are assigned and not used.
2015-05-26 14:27:26 -07:00
Nathan Hjelm
a3eb3e2c9c Merge pull request #604 from hjelmn/opal_coverity
opal/crs: clean up parsing code to fix coverity issues
2015-05-26 15:26:01 -06:00
Jeff Squyres
68c33355c6 Merge pull request #595 from goodell/pr/usnic_getname
usnic: use fi_getname in newer libfabric
2015-05-26 17:20:13 -04:00
Nathan Hjelm
9e56ef0da9 opal/crs: clean up parsing code to fix coverity issues
CID 70622 Dereference before null check (REVERSE_INULL)
CID 70459 Logically dead code (DEADCODE)

Clean up some kludgy code which (among other things) reimplemented
strcat, strdup, and strchr. In the process this resolved two
outstanding coverity issues.

CID 70631 Dereference before null check (REVERSE_INULL)

best_module can not be NULL in this code path. Remove NULL check and
unnecessary goto statements.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-26 14:45:35 -06:00
Nathan Hjelm
7db48c581d orte_quit: Remove logically dead code
CID 71993 Logically dead code (DEADCODE)

As indicated by coverity proc can not be NULL at any point after the
continue. Removed dead code.

CID 1269682 Unchecked return value (CHECKED_RETURN)

Check the return code of orte_get_attribute. I assume we still need to
check for a NULL proc in case the aborted proc attribute is set to
NULL. This might be better as an assert ().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-26 12:16:12 -06:00
Nathan Hjelm
d21bd24126 ompi/crcp: fix logic issue after component selection
CID 70630 Dereference before null check

Cleaned up useless goto statements and deleted NULL check. If
mca_base_select returns success then best_module and best_component
will always be non-NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-26 11:48:40 -06:00
Gilles Gouaillardet
bc105afb10 opal/ddt: add copy functions for complex types
long double internal representation is arch specific,
and no conversion is done (yet), so MPI_LONG_DOUBLE should not
be used (yet) in heterogeneous mode
2015-05-26 16:59:36 +09:00
Gilles Gouaillardet
899fb89392 MPI_Sendrecv_replace : use the right process convertor 2015-05-26 16:59:36 +09:00
Gilles Gouaillardet
0a2f60994c ompi/ddt: fix #ifdef vs #if HAVE_xxx 2015-05-26 16:59:36 +09:00
Gilles Gouaillardet
e980958ad4 pml/ob1: silence a warning 2015-05-26 15:05:44 +09:00
Ralph Castain
ce915b5757 Fix a typo that incorrectly set the alignment threshold in the openib BTL.
Thanks to Xavier Besseron for pointing it out
2015-05-25 07:12:57 -07:00
Ralph Castain
566505afbf Update NEWS 2015-05-23 12:31:56 -07:00
Ralph Castain
cfd2cc49fd Get the Java bindings to compile again - add missing header 2015-05-23 11:22:24 -07:00
Ralph Castain
c21cd1c91e Ensure the ssh session is dead 2015-05-23 08:14:29 -07:00
Ralph Castain
920562d9b4 Ensure that all ssh sessions are terminated when abnormally terminating the job 2015-05-23 08:14:29 -07:00
Nathan Hjelm
68614a211b Merge pull request #596 from hjelmn/errorcode_fixes
Handle ompi error codes in java code and remove non-standard MPI error code from mpi.h.
2015-05-23 07:29:44 -06:00
Jeff Squyres
5e52ce26b5 help-errmgr-base.txt: remove trailing newline
Removed spurious newline at end of file so that the emitted help
message doesn't contain a blank line before the final "-----" output.
2015-05-23 03:33:23 -07:00
Ralph Castain
55cd2a07f6 Update exit code 2015-05-22 21:06:43 -07:00
Ralph Castain
3510bb4ced Set the exit code when a daemon fails 2015-05-22 21:05:23 -07:00
rhc54
37d7ae14a7 Merge pull request #598 from rhc54/topic/oob
Fix abnormal shutdown when a node dies
2015-05-22 21:50:48 -06:00
Nathan Hjelm
163a1b4505 Remove non-standard MPI_ERR_SYSRESOURCE error code
Replaced internal usage with OMPI_ERR_OUT_OF_RESOURCE.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-05-22 19:59:37 -06:00
Ralph Castain
bc7a7f3de5 Fix abnormal shutdown when a node dies 2015-05-22 17:29:06 -07:00
Nathan Hjelm
9da29c3621 java: remove debug code
Talked to @ggouaillardet about this code. It was not intended to be committed to
master. Removing to fix coverity issue.

CID 1270134 Unchecked return value

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-22 08:36:14 -06:00
Gilles Gouaillardet
e47cb9636d Revert "opal_pack_homogeneous_contig_with_gaps_function: correctly handle contiguous ddt made of more than one basic type"
This reverts commit e4846746f4d4a746cba5e1b1e62fcd536830f136.
2015-05-22 17:25:33 +09:00
Gilles Gouaillardet
60e4d6c795 btl: add conversion macros for mca_btl_base_segment_t for heterogeneous support 2015-05-22 15:52:32 +09:00
Gilles Gouaillardet
85c45e2275 pml/ob1: fix mca_pml_ob1_recv_request_put_frag(...) in heterogeneous mode 2015-05-22 15:48:45 +09:00
Gilles Gouaillardet
e4846746f4 opal_pack_homogeneous_contig_with_gaps_function: correctly handle contiguous ddt made of more than one basic type
Fix an issue that can only be seen on a heterogeneous cluster when sending MPI_LONG_INT type and friends
2015-05-22 15:44:08 +09:00
Nathan Hjelm
2f93fe63b9 Merge pull request #597 from hjelmn/mca_base_coverity
mca/base: fix coverity issues and enable project name in MCA groups
2015-05-21 15:01:35 -06:00
Nathan Hjelm
cea735b3c3 mca/base: fix coverity issues and enable project name in MCA groups
CID 1047278 Unchecked return value

Updated check for mca_base_var_generate_full_name4 to match other
checks. Logically equivalent to the old check. Not a bug.

CID 1196685 Dereference null return

Added check for NULL when looking up the original variable for a
synonym.

CID 1269705 Logically dead code

Removed code that set the project to NULL. Code was intended to be
removed with an earlier commit that added the project name into the
component structure. Added code to actually support searching for a
group with a wildcard ('*').

CID 1292739 Dereference null return
CID 1269819 Dereference null return

Removed unnecessary string duplication and strchr.

CID 1287030 Logically dead code

Refactored fixup_files code and confirmed that the code in question is
not reachable. Removed the dead code.

CID 1292740 Use of untrusted string

Use strdup to silence coverity warning.

CID 1294413 Free of address-of expression

Reset mitem to NULL after the OPAL_LIST_FOREACH loop to ensure we
never try to free the list sentinel.
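
To illustrate why the reset matters, here is a minimal sketch using a hand-rolled circular list with an embedded sentinel. It does not use the OPAL list classes or OPAL_LIST_FOREACH; all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int key;
} node_t;

/* The sentinel is embedded in the list head, so its address must never
 * be passed to free(). */
typedef struct {
    node_t sentinel;
} list_t;

static void list_init(list_t *list)
{
    list->sentinel.next = &list->sentinel;
    list->sentinel.key = 0;
}

static void list_push(list_t *list, node_t *item)
{
    item->next = list->sentinel.next;
    list->sentinel.next = item;
}

/* Return the node with the given key, or NULL if there is none. */
static node_t *list_find(list_t *list, int key)
{
    node_t *item;

    assert(list != NULL);

    for (item = list->sentinel.next; item != &list->sentinel; item = item->next) {
        if (item->key == key) {
            break;
        }
    }

    /* If the loop ran to completion, item points at the sentinel. Reset it
     * so callers that free or dereference the result see NULL instead. */
    if (item == &list->sentinel) {
        item = NULL;
    }

    return item;
}
```

Without the reset, a failed search would hand back the address of a member of the list head, which is the kind of pointer the "Free of address-of expression" report refers to.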

CID 1294414 Unchecked return value

Use (void) to indicate we do not care about the return code in this
instance.

CID 1294415 Resource leak

On error, free all the base pointers.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-21 13:33:31 -06:00
Rolf vandeVaart
b3e4ae71d5 Fix finalize code when cuda support is not fully initialized 2015-05-21 13:42:22 -04:00
Nathan Hjelm
c540a9e59a Merge pull request #582 from hjelmn/mca_var_file_fix
mca/base: fix source file name bug for synonyms
2015-05-21 11:34:12 -06:00