Commit graph

22322 commits

Author SHA1 Message Date
Ralph Castain
4ded049cbc Modify MCA param description 2015-03-16 11:57:32 -07:00
Ralph Castain
019bba5caf Cleanup a bit - don't need to lookup the protocol number if we just use the right define 2015-03-16 11:54:51 -07:00
Ralph Castain
69ac25bf55 Add support for TCP keepalive on inter-node sockets 2015-03-16 09:59:44 -07:00
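
The two entries above concern TCP keepalive and using a protocol define instead of a lookup. A minimal sketch of that idea, assuming a POSIX socket API: `enable_keepalive` and its `idle_secs` parameter are hypothetical names for illustration, not the Open MPI implementation. The option level is given directly as `IPPROTO_TCP`, so no protocol-number lookup is needed, and `TCP_KEEPIDLE` is guarded because it is not available on every platform.

```c
#include <assert.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, where available */
#include <sys/socket.h>   /* setsockopt, SOL_SOCKET, SO_KEEPALIVE */

/*
 * Hypothetical helper (illustration only): turn on keepalive probes for a
 * connected or unconnected TCP socket. Returns 0 on success, -1 on error.
 */
static int enable_keepalive(int fd, int idle_secs)
{
    assert(fd >= 0);

    /* Enable keepalive on the socket itself. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }

#if defined(TCP_KEEPIDLE)
    /* Idle time before the first probe. The level is the IPPROTO_TCP
       define, so there is no need to look up the protocol number. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0) {
        return -1;
    }
#else
    (void) idle_secs;  /* tuning knob not available on this platform */
#endif

    return 0;
}
```

A caller would invoke `enable_keepalive(fd, 60)` right after creating or accepting the socket; the 60-second idle time is an arbitrary example value.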
Ralph Castain
0cfb4f29aa Silence compiler warning 2015-03-16 09:59:21 -07:00
Mike Dubman
7640507438 Merge pull request #472 from miked-mellanox/topic/fix_compile_warn
btl/openib: fix compiler warning, by HalR
2015-03-13 14:06:07 +02:00
Jeff Squyres
0166318966 opal_check_pmi: protect un-prefixed shell variables
Since there's unfortunately only a global namespace for shell
variables, we need to protect un-prefixed shell variables with
OPAL_VAR_SCOPE_PUSH/POP.
2015-03-13 04:48:31 -07:00
Jeff Squyres
4ab9e67832 hwloc external: portability updates
Change "test -a" to "&& test", and change foo="$bar" to foo=$bar.  No
substantive code changes.
2015-03-13 04:40:09 -07:00
Jeff Squyres
4d63c88ed1 hwloc external: whitespace cleanup, no code changes 2015-03-13 04:40:05 -07:00
Mike Dubman
00784ae3ba btl/openib: fix compiler warning, by HalR 2015-03-13 13:17:23 +02:00
Todd Kordenbrock
9350b06f7d btl-portals4: fix compiler warnings 2015-03-12 20:34:04 -05:00
Todd Kordenbrock
515d9e8cc9 mtl-portals4: fix compiler warnings 2015-03-12 20:34:04 -05:00
Jeff Squyres
65a0e041ac dl: need to use LIBADD, not LIBS
When we use LIBADD for static libraries, the dependent libraries get
propagated properly.  For example, the dl/dlopen component will almost
certainly require the -ldl library; when using LIBS, that doesn't get
propagated elsewhere in the tree, but when using LIBADD, it does
(e.g., when linking opal_wrapper_compiler).
2015-03-12 15:01:14 -07:00
Ryan Grant
6f76984a3c Merge pull request #470 from tkordenbrock/topic/update-portals4-to-btl3
btl-portals4: implement the BTL 3.0 interface
2015-03-12 15:34:05 -06:00
Jeff Squyres
a1daa39425 libfabric: update to Github libfabric 90ac5a258418e
Update to latest upstream Github libfabric in order to fix some usnic
bugs.
2015-03-12 13:23:32 -07:00
Todd Kordenbrock
d1656347c8 btl-portals4: implement the BTL 3.0 interface 2015-03-12 14:19:44 -05:00
adrianreber
714d9aa67e Merge pull request #348 from adrianreber/topic/orte_cr_continue_like_restart
Topic/orte cr continue like restart
2015-03-12 14:54:02 +01:00
Mike Dubman
b4d6420797 Merge pull request #468 from alinask/topic/fix_yalla_mxm_cov
MTL_MXM/PML_YALLA: fix coverity issues.
2015-03-12 13:22:44 +02:00
Alina Sklarevich
28586caecf MTL_MXM/PML_YALLA: fix coverity issues. 2015-03-12 11:49:22 +02:00
Nathan Hjelm
695dcd5a28 oob/ud: fix compiler warning 2015-03-11 10:53:32 -06:00
Howard Pritchard
da85d5fc0a Merge pull request #467 from hppritcha/topic/minor_fcoll_static_coverity_fix
fcoll/static: minor fix for coverity
2015-03-11 10:28:05 -06:00
Nathan Hjelm
fd78491768 Merge pull request #451 from elenash/master
fix: mca_base_env_var mca parameter is never handled if it's set from am...
2015-03-11 09:54:25 -06:00
Nathan Hjelm
ce6caab2a7 Merge pull request #463 from hjelmn/cuda_async
btl/openib: cuda: fix CUDA-aware support with async copy
2015-03-11 09:52:48 -06:00
Howard Pritchard
66fee3bd18 fcoll/static: minor fix for coverity
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-03-11 09:11:49 -06:00
Jeff Squyres
c61dd4d56f usnic: each err eq entry reports *1* completion
Actually, the return from fi_eq_readerr() only indicates a *single*
error completion (not err_entry.data completions).
2015-03-11 08:07:20 -07:00
Ralph Castain
2de5cd6e5f Ensure we don't install the libevent internal headers 2015-03-11 07:35:20 -07:00
Nathan Hjelm
395635f017 Merge pull request #461 from hjelmn/btl_openib_cleanup
btl/openib: remove derived btl segment type
2015-03-11 08:20:41 -06:00
Jeff Squyres
9c926e5e82 usnic: add more comments/explanation about error cases
If we really get a catastrophic error from a libfabric call, don't
bother trying to continue (because data has been corrupted and there's
nothing sane left to do).  Just call opal_btl_usnic_exit() (which
tries to call the PML error callback, but we're so early in the
module_init process that this likely hasn't been setup yet, so the job
will likely abort).
2015-03-11 07:16:28 -07:00
Jeff Squyres
51583789fb usnic: re-indent some show_help code
Nothing too substantial here, but two of the messages moved from
"libfabric API failed" to "internal error during init", just to be a
bit more descriptive.
2015-03-11 07:15:28 -07:00
Jeff Squyres
1b836d784c usnic: subtract number of errored insertions from loop count
When we get errors, the entry.data field tells us how many errors are
being reported.  So decrement the loop count variable by that much.

This fixes CSCut30441.
2015-03-11 07:13:10 -07:00
Adrian Reber
c08e234af7 FT: fix compilation using --with-ft (5/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

With the changes introduced in the previous patches in this series
some goto constructs for cleanup are no longer necessary and removed.
2015-03-11 14:23:33 +01:00
Adrian Reber
8ba41a834a FT: fix compilation using --with-ft (4/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

This patch tries to handle the new xcast semantic.
2015-03-11 14:23:33 +01:00
Adrian Reber
9b84fe45d3 FT: fix compilation using --with-ft (3/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

Follow-up of 552c9ca5a0. This patch
implements the necessary changes in mentioned commit in the FT code.
2015-03-11 14:23:33 +01:00
Adrian Reber
1c5a8df724 FT: fix compilation using --with-ft (2/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

The FT code used barrier mechanisms which have been removed
with aec5cd08bd. This patch replaces
all those different barriers with opal_pmix.fence(NULL, 0);
I am not sure this is completely correct but at least a starting
point for a review.
2015-03-11 14:23:33 +01:00
Adrian Reber
f45dd069bd FT: fix compilation using --with-ft (1/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

This first patch moves orte_cr_continue_like_restart from ORTE
to opal_cr_continue_like_restart in OPAL. This only leaves three
calls from OPAL to ORTE in the FT code. As it is not yet 100%
clear how to handle these calls the code orte_sstore.set_attr()
has been #ifdef'd out for now.
2015-03-11 14:23:33 +01:00
Mike Dubman
a188cb2ff9 Merge pull request #465 from alinask/topic/fix_yalla_warn
PML_YALLA: fix compilation warnings.
2015-03-11 11:38:41 +02:00
Alina Sklarevich
f9a9b936a1 PML_YALLA: fix compilation warnings. 2015-03-11 10:58:54 +02:00
Nathan Hjelm
b308afa8fd btl/openib: remove derived btl segment type
The derived segment type (btl_openib_segment_t) was intended to store
the registration info needed for put and get. In BTL 3.0 this is no
longer required. I intended to remove this type as part of
open-mpi/ompi@74f1af4548 .

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-03-10 14:41:15 -06:00
Nathan Hjelm
3d32dbd793 btl/openib: cuda: fix CUDA-aware support with async copy
This commit should resolve an issue seen with CUDA-aware support. The
problem came in with BTL 3.0. Before 3.0 the size of the copy was
stored in the incoming segment's des_remote_count field. This field
does not exist in BTL 3.0 so I stored the value in the
des_segment_count field. This caused problems with the cuda support
code. To fix the issue the endpoint pointer is now stored in the in
fragment's endpoint pointer which frees up the segment's des_cbdata
pointer for storing the transfer size.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-03-10 14:38:12 -06:00
Nathan Hjelm
d929137768 osc/pt2pt: need to unlock self before waiting for unlock acks
This commit fixes a bug in osc/pt2pt which causes MPI_Win_unlock_all
to hang. The problem was caused by code refactoring that moved the
unlock of the local process to after the loop that waits for unlock
acks. This will cause the code to loop forever waiting on the self
ack.

Fixes #444
2015-03-10 14:10:37 -06:00
Yohann Burette
d48a8ab8f0 mtl/ofi: Use fi_allocinfo(). 2015-03-10 12:50:55 -07:00
Jeff Squyres
cbd99d5f60 libfabric: update to Github upstream 1b4bb2285b
Get a usnic bug fix.
2015-03-10 12:09:02 -07:00
Jeff Squyres
2e8ee003b0 ofi: endpoint type hint moved to a sub-struct, BUFFERED went away
Update to match new libfabric API/structure change.
2015-03-10 09:55:45 -07:00
Jeff Squyres
d97551bdb1 usnic: endpoint type hint moved to a sub-struct
Update to match new libfabric API/structure change.
2015-03-10 09:47:41 -07:00
Jeff Squyres
1a1be2efa0 libfabric: update to Github upstream 7095f3dc 2015-03-10 09:47:40 -07:00
Jeff Squyres
afec1454f5 usnic: only setup the connectivity checker if we have modules
If we ended up with no modules (e.g., all usnic devices were
excluded), there was a race condition in that the connectivity agent
could tear down its local socket before one or more of the local
clients saw it.  Therefore, the local clients would time out waiting
for the socket to appear.

So move the connectivity checker init later in the bootstrapping
process (it *must* be setup before module_init()), and have it only
invoked if we actually ended up with one or more modules.
2015-03-10 07:43:20 -07:00
Jeff Squyres
06accb721c usnic: ensure to free all resources if no usnic BTLs found
If all usnic devices are excluded, then we need to ensure the error
path includes freeing the filter.

This was Coverity CID 1288085
2015-03-10 07:43:20 -07:00
Jeff Squyres
8fef4e865f dl dlopen: fix use-after-free
Re-structure the loop looking for duplicates a little so that we only
have a single free of the string that happens regardless of whether we
found a duplicate or not.

This was Coverity CID 1288090
2015-03-10 07:43:20 -07:00
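
The restructuring described in the entry above (one free of the temporary string, reached whether or not a duplicate was found) can be sketched roughly as follows. This is not the actual dl/dlopen code: `add_unique`, `copy_string`, and the fixed-capacity list are hypothetical stand-ins chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s, or NULL on allocation failure. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *out = malloc(len);
    if (NULL != out) {
        memcpy(out, s, len);
    }
    return out;
}

/*
 * Hypothetical helper: add `name` to `list` unless an equal string is
 * already present. Returns true only if a new entry was stored.
 */
static bool add_unique(char **list, size_t *count, size_t capacity,
                       const char *name)
{
    assert(NULL != list && NULL != count && NULL != name);

    /* Temporary string owned by this function (stands in for a name
       built while scanning a directory). */
    char *candidate = copy_string(name);
    if (NULL == candidate) {
        return false;
    }

    /* Only record whether a duplicate exists; do not free inside the loop. */
    bool found = false;
    for (size_t i = 0; i < *count; ++i) {
        if (0 == strcmp(list[i], candidate)) {
            found = true;
            break;
        }
    }

    bool added = false;
    if (!found && *count < capacity) {
        char *stored = copy_string(candidate);
        if (NULL != stored) {
            list[(*count)++] = stored;
            added = true;
        }
    }

    /* Single free of the temporary, on every path past the allocation. */
    free(candidate);
    return added;
}
```

Because the loop only sets a flag, nothing reads `candidate` after it has been released, and no path releases it twice.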
Jeff Squyres
3efb5f56ae dl dlopen: ensure dirs is not NULL
opal_argv_split() may have returned NULL.

This was Coverity CID 1288088
2015-03-10 07:43:20 -07:00
Jeff Squyres
86968dcdda dl dlopen: fix resource leak
closedir() was one block higher than it should have been.

This was Coverity CID 1288087.
2015-03-10 07:43:20 -07:00
Jeff Squyres
546ad3f060 dl dlopen: free resources upon error
Ensure to take the right path out upon errors (that will free any
pending resources).

This was Coverity CID 1288086
2015-03-10 07:43:19 -07:00