1
1
Граф коммитов

24856 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
0e1350f5b7 Add missing header files 2016-03-25 09:06:51 -07:00
Ralph Castain
a3fea58d1c Minor cleanups to prior PR commit 2016-03-24 15:55:14 -07:00
rhc54
6756e19aa2 Merge pull request #1457 from anandhis/master
rml changes
2016-03-24 15:17:29 -07:00
rhc54
ba8c8700aa Merge pull request #1493 from rhc54/topic/sing
Update singularity support to track changes in upstream Singularity code
2016-03-24 15:16:38 -07:00
Ralph Castain
8c14df2328 Revert "Modify singularity support per patch from Greg Kurtzer"
This reverts commit open-mpi/ompi@f7257a8310.

Ensure that we properly cleanup the session directory tree. Prior code had issues with symlinks, especially if the file that the link points to was already removed as we traverse the tree. Also found that the dirent checks for directory type weren't fully portable, and so fall back to the stat-based approach which is known to be portable.

Fix singularity singletons by detecting we are in a container and properly setting the pmix selection to pick the isolated component. Remove a stale restriction blocking use of the sm btl
2016-03-24 11:27:18 -07:00
rhc54
f5f1751877 Merge pull request #1492 from rhc54/topic/ashley
Extend the abort on non zero status flag to apply to processes which die as the result of signals.
2016-03-24 09:42:21 -07:00
Ralph Castain
378d9cbb5e Extend the abort on non zero status flag to apply to processes which die as the result of signals. 2016-03-24 08:33:55 -07:00
Joshua Ladd
c2813a48e6 Merge pull request #1488 from alinask/topic/openib_enhance_verbose
btl/openib: enhance the verbosity level when using rdmacm without a first PP QP
2016-03-24 10:17:21 -04:00
Gilles Gouaillardet
cb84f582b2 oshmem: add missing prototypes for pshmem_alltoall[s]{32,64} 2016-03-24 15:22:29 +09:00
George Bosilca
57eadb0dd6 Fix for Coverity CID 1357152.
Or at least that was the origin of the issue. It turns out
we were freeing the wrong buffer (but as it only happen in the
case of an error we never noticed).
2016-03-24 00:53:30 -04:00
George Bosilca
ab8008e9fe Nathan missed one reference to mpool. 2016-03-24 00:52:59 -04:00
Nathan Hjelm
4298c6957f Merge pull request #1490 from hjelmn/scif
btl/scif: update for mpool/rcache rewrite
2016-03-23 17:49:03 -06:00
George Bosilca
4b38b6bd0c Fix multiple issues with the collective requests.
This patch addresses most (if not all) @derbeyn concerns
expressed on #1015. I added checks for the requests allocation
in all functions, ompi_coll_base_free_reqs is called with the
right number of requests, I removed the unnecessary basic_module_comm_t
and use the base_module_comm_t instead, I remove all uses of the
COLL_BASE_BCAST_USE_BLOCKING define, and other minor fixes.
2016-03-23 18:35:41 -04:00
Nathan Hjelm
478feaf622 btl/scif: update for mpool/rcache rewrite
This commit brings the scif btl up to date with changes made on master
to rework the mpool and rcache frameworks.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-23 16:30:10 -06:00
rhc54
dec23f3d39 Merge pull request #1489 from rhc54/topic/bind
Correct the binding for the --map-by node case
2016-03-23 12:00:42 -07:00
Ralph Castain
cdd3dc99ca Correct the binding for the --map-by node case - we should still use our default binding algorithms 2016-03-23 09:55:24 -07:00
Alina Sklarevich
2e5a1372dd btl/openib: enhance the verbosity level when using rdmacm without a first PP QP. 2016-03-23 18:49:12 +02:00
Ralph Castain
6e6bbfda91 Very minor typo 2016-03-23 08:31:47 -07:00
Ralph Castain
4a623778a9 Fix the debugger attach - previous commit had fixed one instance of a check prior to sending the release message, but there was a second code path that included a similar check that was missed. Thanks to John DelSignore for spotting it! 2016-03-23 08:25:25 -07:00
Gilles Gouaillardet
4b91237464 opal/class/opal_lifo: use a standard syntax to initialize a local variable
opal_lifo.h is now included from C++, and g++ does not seem to accept C99 initializer.
2016-03-23 09:46:46 +09:00
Howard Pritchard
69200e6229 plm/alps: fix usage of cray wlm_detect methods
Turns out there are some cases where the Cray
wlm_detect_get_active may return NULL, in which
case fallback to wlm_detect_get_default method
is suggested.  Make use of the fallback to
avoid segfaults under some circumstances in the
ALPS plm selection method.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-03-22 11:40:56 -07:00
Nathan Hjelm
8ff1d09907 Merge pull request #1484 from hjelmn/header_cleanup
ompi/comm: clean up includes in comm_request.h
2016-03-22 10:17:04 -06:00
Nathan Hjelm
a1420003b6 ompi/comm: clean up includes in comm_request.h
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-22 09:17:38 -06:00
Mike Dubman
1d8fbfefb0 Merge pull request #1478 from igor-ivanov/pr/oshmem-v1.3-alltoall
oshmem: Add alltoall
2016-03-22 07:51:36 +02:00
Mike Dubman
7483a66ef6 Merge pull request #1455 from igor-ivanov/pr/oshmem-v1.3
oshmem: Add Non-blocking Remote Memory Access Routines
2016-03-22 07:50:11 +02:00
Nathan Hjelm
7572c8b74f btl: use flag enumerator for btl_*_flags and btl_*_atomic_flags
This commit uses the new flag "enumerator" to support comma-delimited
lists of flags for both the btl and btl atomic flags. After this
commit is is valid to specify something like -mca btl_foo_flags
self,put,get,in-place. All non-deprecated flags are supported by the
enumerator.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-21 15:23:37 -06:00
Nathan Hjelm
b15a45088c mca: add support for flag enumerators
This commit adds a new type of enumerator meant to support flag
values. The enumerator parses comma-delimited strings and matches
each string or value to a list of valid flags. Additionally, the
enumerator does some basic checks to see if 1) a flag is valid in the
enumerator, and 2) if any conflicting flags are specified.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-21 15:20:56 -06:00
Nathan Hjelm
00bf598fac Merge pull request #1481 from hjelmn/rdmacm_dynamic_add_procs
btl/openib: update rdmacm for dynamic add_procs
2016-03-21 11:02:40 -06:00
Nathan Hjelm
645bd9d9dd btl/openib: update rdmacm for dynamic add_procs
This commit adds the data necessesary for supporting dynamic add_procs
to the rdma message (opal_process_name_t). The endpoint lookup
function has been updated to match the code in udcm.

Closes open-mpi/ompi#1468.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-21 10:00:41 -06:00
Todd Kordenbrock
2122a15217 Merge pull request #1443 from francois-wellenreiter/fix_trig_rndv
MTL portals4 : fix around triggered rndv operations
2016-03-21 08:16:33 -05:00
igor-ivanov
4315435963 Merge pull request #1476 from igor-ivanov/pr/fix-scoll-basic-barrier
oshmem/scoll: Fix bug in basic/barrier algorithm
2016-03-21 13:04:42 +03:00
Igor Ivanov
1bed5d8aee oshmem: Align OSHMEM API with spec v1.3 (update scoll/basic) 2016-03-21 11:46:01 +02:00
Igor Ivanov
9825157fc4 oshmem: Align OSHMEM API with spec v1.3 (Add man for alltoall) 2016-03-21 10:43:45 +02:00
Igor Ivanov
3e1e131744 oshmem: Align OSHMEM API with spec v1.3 (Add alltoall Fortran) 2016-03-21 10:43:44 +02:00
Igor Ivanov
bd6eaac561 oshmem: Align OSHMEM API with spec v1.3 (Add alltoall C function) 2016-03-21 10:43:43 +02:00
Igor Ivanov
50906b34b3 oshmem: Align OSHMEM API with spec v1.3 (Add scoll/alltoall interface) 2016-03-21 10:43:31 +02:00
KAWASHIMA Takahiro
84b79dbf59 osc/sm: Fix a bus error on MPI_WIN_{POST,START}.
A bus error occurs in sm OSC under the following conditions.

- sparc64 or any other architectures which need strict alignment.
- `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created
  by sm OSC.
- The communicator size is odd and greater than 3.

The lines 283-285 in current `ompi/mca/osc/sm/osc_sm_component.c` has
the following code.

```c
module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
module->posts[0] = (uint64_t *) (module->node_states + comm_size);
```

The size of `ompi_osc_sm_node_state_t` is multiples of 4 but not
multiples of 8. So if `comm_size` is odd, `module->posts[0]` does
not aligned to 8. This causes a bus error when accessing
`module->posts[i][j]`.

This patch fixes the alignment of `module->posts[0]` by setting
`module->posts[0]` first.
2016-03-21 10:38:02 +02:00
Igor Ivanov
e690521cdd oshmem/scoll: Fix bug in basic/barrier algorithm 2016-03-21 10:34:55 +02:00
rhc54
10de9c7919 Merge pull request #1480 from rhc54/topic/usock
Fix debugger operations and show_help aggregation
2016-03-19 02:33:17 -07:00
Ralph Castain
c146c4969b Revert part of open-mpi/ompi@c1bbbb5e2f to restore the usock component, thus fixing show_help aggregation.
Fixes #1467

Restore debugger attach operations

Fixes #1225
2016-03-18 21:49:04 -07:00
Nathan Hjelm
e020566924 Merge pull request #1479 from hjelmn/opal_coverity
opal: fix coverity issues
2016-03-18 17:00:53 -06:00
Nathan Hjelm
4d4fa28f75 opal: fix coverity issues
Fix CID 1345825 (1 of 1): Dereference before null check (REVERSE_INULL):

ib_proc should not be NULL in this case. Removed the check and added a
check for NULL after OBJ_NEW.

CID 1269821 (1 of 1): Dereference null return value (NULL_RETURNS):

I labeled this one as a false positive (which it is) but the code in
question could stand be be cleaned up.

Fix CID 1356424 (1 of 1): Argument cannot be negative (NEGATIVE_RETURNS):

While trying to silence another Coverity issue another was
flagged. Protect the close of fd with if (fd >= 0).

CID 70772 (1 of 1): Dereference null return value (NULL_RETURNS):
CID 70773 (1 of 1): Dereference null return value (NULL_RETURNS):
CID 70774 (1 of 1): Dereference null return value (NULL_RETURNS):

None of these are errors and are intentional but now that we have a
list release function use that to make these go away. The cleanup is
similar to CID 1269821.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 15:56:08 -06:00
Nathan Hjelm
018e3ebeb6 Merge pull request #1471 from hjelmn/lanl_platform
contrib/lanl: update platform files for TOSS2
2016-03-18 15:04:34 -06:00
Nathan Hjelm
81085dd89d Merge pull request #1477 from hjelmn/ompi_coverity
Fix some issues in OMPI identified by Coverity
2016-03-18 14:55:54 -06:00
Ralph Castain
972026b9c1 Add the option to not make the greek tarball, only making the non-greek one 2016-03-18 12:25:20 -07:00
Nathan Hjelm
075dfa4121 topo/treematch: fix component coverity issues
Fix CID 1315298: Resource leak (RESOURCE_LEAK) :
Fix CID 1315300: Resource leak (RESOURCE_LEAK):
Fix CID 1315299: Resource leak (RESOURCE_LEAK):
Fix CID 1315297 (#1 of 1): Resource leak (RESOURCE_LEAK):

Confirmed leaks in error paths. Added the leaked arrays to the
ERR_EXIT macro to ensure they are freed.

Fix CID 1315296 (#1 of 1): Resource leak (RESOURCE_LEAK):

Confirmed leak in error paths. Both the oversub and reqs arrays are
leaked. Free these arrays on error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 11:31:11 -06:00
Nathan Hjelm
3540b65f7d bcol: fix coverity issues
Fix CID 1269976 (#1 of 1): Unused value (UNUSED_VALUE):
Fix CID 1269979 (#1 of 1): Unused value (UNUSED_VALUE):

Removed unused variables k_temp1 and k_temp2.

Fix CID 1269981 (#1 of 1): Unused value (UNUSED_VALUE):
Fix CID 1269974 (#1 of 1): Unused value (UNUSED_VALUE):

Removed gotos and use the matched flags to decide whether to return.

Fix CID 715755 (#1 of 1): Dereference null return value (NULL_RETURNS):

This was also a leak. The items on cs->ctl_structures are allocated using OBJ_NEW so they mist be released using OBJ_RELEASE not OBJ_DESTRUCT. Replaced the loop with OPAL_LIST_DESTRUCT().

Fix CID 715776 (#1 of 1): Dereference before null check (REVERSE_INULL):

Rework error path to remove REVERSE_INULL. Also added a free to an error path where it was missing.

Fix CID 1196603 (#1 of 1): Bad bit shift operation (BAD_SHIFT):
Fix CID 1196601 (#1 of 1): Bad bit shift operation (BAD_SHIFT):

Both of these are false positives but it is still worthwhile to fix so they no longer appear. The loop conditional has been updated to use radix_mask_pow instead of radix_mask to quiet these issues.

Fix CID 1269804 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS):

In general close (-1) is safe but coverity doesn’t like it. Reworked the error path for open to not try to close (-1).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 10:59:46 -06:00
Nathan Hjelm
c8b077f232 coll/ml: fix coverity issues
Fix CID 715744 (#1 of 1): Logically dead code (DEADCODE):
Fix CID 715745 (#1 of 1): Logically dead code (DEADCODE):

The free of scratch_num in either place is defensive programming. Instead of removing the free the conditional around the free has been removed to quiet the warning.

Fix CID 715753 (#1 of 1): Dereference after null check (FORWARD_NULL):
Fix CID 715778 (#1 of 1): Dereference before null check (REVERSE_INULL):

Fixed the conditional to check for collective_alg != NULL instead of collective_alg->functions != NULL.

Fix CID 715749 (#1 of 4): Explicit null dereferenced (FORWARD_NULL):

Updated code to ensure that none of the parse functions are reached with a non-NULL value.

Fix CID 715746 (#1 of 1): Logically dead code (DEADCODE):

Removed dead code.

Fix CID 715768 (#1 of 1): Resource leak (RESOURCE_LEAK):
Fix CID 715769 (#2 of 2): Resource leak (RESOURCE_LEAK):
Fix CID 715772 (#1 of 1): Resource leak (RESOURCE_LEAK):

Move free calls to before error checks to cleanup leak in error paths.

Fix CID 741334 (#1 of 1): Explicit null dereferenced (FORWARD_NULL):

Added a check to ensure temp is not dereferenced if it is NULL.

Fix CID 1196605 (#1 of 1): Bad bit shift operation (BAD_SHIFT):

Fixed overflow in calculation by replacing int mask with 1ul.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 10:11:16 -06:00
Nathan Hjelm
2f4e5325aa coll/base: fix coverity issues
Fix CID 1325868 (#1 of 1): Dereference after null check (FORWARD_NULL):
Fix CID 1325869 (#1-2 of 2): Dereference after null check (FORWARD_NULL):

Here reqs can indeed be NULL. Added a check to
ompi_coll_base_free_reqs to prevent dereferencing NULL pointer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 09:31:43 -06:00
rhc54
d30911f186 Merge pull request #1474 from rhc54/topic/revert
Revert problematic change to app context handling
2016-03-18 08:14:20 -07:00