Jeff Squyres
df800286e4
Merge pull request #709 from avilcheslopez/master
...
Improving opal_pointer_array bounds checking.
2015-07-23 14:45:11 -04:00
Alejandro Vilches
994ed60b3d
Improving opal_pointer_array bounds checking (using
...
OPAL_UNLIKELY).
2015-07-23 11:53:16 -07:00
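The bounds-checking pattern named in this commit can be sketched as follows. This is a minimal illustration, not the actual opal_pointer_array code: `SKETCH_UNLIKELY`, `sketch_pointer_array_t`, and `sketch_pointer_array_get` are hypothetical names, and the macro assumes a GCC/Clang-style `__builtin_expect`.

```c
#include <stddef.h>

/* Hypothetical stand-in for OPAL_UNLIKELY: tells the compiler that the
 * condition is expected to be false on the hot path. */
#if defined(__GNUC__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

/* Hypothetical, simplified pointer array (not the real OPAL structure). */
typedef struct {
    int size;     /* number of slots in addr */
    void **addr;  /* slot storage */
} sketch_pointer_array_t;

/* Return the item at index, or NULL if the index is out of range. */
static void *sketch_pointer_array_get(const sketch_pointer_array_t *arr, int index)
{
    if (SKETCH_UNLIKELY(index < 0 || index >= arr->size)) {
        return NULL;  /* rare error path */
    }
    return arr->addr[index];
}
```

The hint only affects code layout and branch prediction; the check itself behaves the same with or without it.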
Rolf vandeVaart
773b509407
Merge pull request #737 from rolfv/pr/add-cuda-war
...
Add a workaround for an issue in the libcuda.so library
2015-07-22 16:14:14 -04:00
Rolf vandeVaart
7703c96496
Add a workaround for an issue in the libcuda.so library
2015-07-22 11:35:27 -04:00
Jeff Squyres
ec3a38384f
Merge pull request #688 from jsquyres/pr/usnic-libfabric-msg-prefix-fix
...
usnic fixes for differences between libfabric v1.0.0 and v1.1.0
2015-07-21 10:18:36 -04:00
Gilles Gouaillardet
f7cf7d5070
configury: fix XRC detection on OFED < 3.12
...
Since ibv_create_xrc_rcv_qp is now deprecated, and in order to
be "future-proof", we have to consider the case in which only XRC Domains are supported.
Also, correctly handle distros that ship broken ibverbs devel headers.
Thanks to Paul Hargrove for the detailed report.
2015-07-13 10:43:22 +09:00
Ralph Castain
219c4dfba5
Create a new opal_async_event_base and have the pmix/native and ORTE level use it. This reduces our thread count by one.
2015-07-12 08:23:34 -07:00
Ralph Castain
683efcb850
Rename the current opal_event_base to opal_sync_event_base in preparation for adding an async progress thread to opal. No functional changes made here - just a simple rename.
2015-07-11 10:08:19 -07:00
rhc54
053d9b2a7c
Merge pull request #713 from rhc54/topic/errhandler
...
Add an opal/errhandler so opal-level errors can be up-leveled
2015-07-11 07:58:57 -07:00
Ralph Castain
a2243dcddd
Add an opal/errhandler so opal-level errors can be up-leveled
2015-07-11 07:09:11 -07:00
Ralph Castain
61fb067f14
Update the opal_hotel class to support a given event base instead of defaulting to using opal_event_base
2015-07-11 06:42:23 -07:00
Jeff Squyres
633da6641e
usnic: gracefully handle when we can't alloc an ACK
...
The comment didn't match the debugging code (which was ugly, and
apparently never happens, anyway). Just return and let the sender
retransmit.
2015-07-10 14:19:33 -07:00
Jeff Squyres
3327fa56b5
usnic: minor code cleanups
2015-07-10 10:10:43 -07:00
Jeff Squyres
f9c65a701e
usnic: "sin" assignment needs to be outside the #if
...
The "sin" variable is used below; need to ensure that it is assigned
for all builds (not just debug builds).
2015-07-10 06:51:03 -07:00
Jeff Squyres
cd87c8ad41
usnic: misc compiler warnings fixes
2015-07-10 06:51:03 -07:00
Jeff Squyres
ba429dc890
usnic: temporarily disable the BTL put method
...
The usnic BTL put method is currently broken. Disable it until we can
fix it properly.
2015-07-10 06:51:03 -07:00
Jeff Squyres
f265358fbe
usnic: handle FI_MSG_PREFIX differences libfabric v1.0.0->v1.1.0
...
In libfabric v1.0.0 (i.e., API v1.0), the usnic provider handled
FI_MSG_PREFIX inconsistently between sends and receives. This has
been fixed in libfabric v1.1.0 (i.e., API v1.1): FI_MSG_PREFIX is
handled consistently for both sends and receives.
Run-time detect which libfabric we are running with and adapt behavior
appropriately.
2015-07-10 06:51:03 -07:00
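The run-time detection described above might look roughly like the sketch below. It assumes the libfabric API version is packed as `major << 16 | minor`; the `SKETCH_*` macros and `sketch_prefix_is_consistent` are hypothetical names for illustration, and in a real build the version value would presumably come from libfabric's `fi_version()` rather than being passed in by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; the major<<16 | minor packing is an assumption
 * of this sketch. */
#define SKETCH_VERSION(major, minor) \
    (((uint32_t)(major) << 16) | (uint32_t)(minor))
#define SKETCH_MAJOR(v) ((uint32_t)(v) >> 16)
#define SKETCH_MINOR(v) ((uint32_t)(v) & 0xFFFFu)

/* True when the running library handles FI_MSG_PREFIX the same way for
 * sends and receives (API v1.1 and later, per the commit message). */
static bool sketch_prefix_is_consistent(uint32_t runtime_version)
{
    /* The major number sits in the high bits, so a plain integer
     * comparison orders versions correctly. */
    return runtime_version >= SKETCH_VERSION(1, 1);
}
```

A caller would branch on the result once at startup and pick the matching send/receive buffer layout.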
Jeff Squyres
ddd0de6cfc
usnic: make more OS-bypass memory Valgrind-defined
...
This helps reduce false positives when running MPI apps through
Valgrind.
2015-07-10 06:51:03 -07:00
Jeff Squyres
9bc7a54e0c
usnic: correctly count CRC errors
...
Handle the differences between libfabric v1.0.0 and v1.1.0 in the
return value of fi_cq_readerr().
Also consolidate CRC and truncation errors into the same handling
block, since truncation errors are typically another symptom of CRC
errors. This ensures that buffers get reposted properly.
2015-07-10 06:51:03 -07:00
Jeff Squyres
fc686f5538
usnic: make configure complain if libfabric cannot be found
...
Instead of silently determining that the usnic BTL can't be built,
announce that usnic is checking for libfabric support, and then
AC_MSG_RESULT the result of that check.
2015-07-10 06:45:33 -07:00
Jeff Squyres
4341639a66
Revert "configury: fix (again) XRC detection on OFED < 3.12"
...
@ggouaillardet is likely offline for the weekend, but master is broken
on RHEL 6.5 systems that do not have MOFED installed. So I'm taking
the liberty of reverting this commit; I'm guessing Gilles will fix it up
and re-commit next week.
This reverts commit 77f8282d51.
2015-07-10 06:45:33 -07:00
Gilles Gouaillardet
77f8282d51
configury: fix (again) XRC detection on OFED < 3.12
...
since ibv_create_xrc_rcv_qp is now deprecated, and in order to
be "future-proof", we have to consider the case in which only XRC Domains are supported.
Thanks to Paul Hargrove for the detailed report.
2015-07-10 15:31:45 +09:00
Rolf vandeVaart
ae0f3cfee7
Make an explicit call to initialize MCA parameters in the common CUDA code. This allows us to view them with ompi_info and possibly modify them with the tools interface
2015-07-09 12:51:55 -04:00
Rolf vandeVaart
cdffa4724d
Force smcuda BTL to use CUDA IPC path for all GPU buffers where possible
2015-07-08 17:11:25 -04:00
Ralph Castain
ed93154e43
Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line.
2015-07-07 12:52:16 -07:00
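The class of bug described in this commit (a buffer sized once for the first node and then reused for larger nodes) can be avoided by sizing the buffer from each node's own core count. The sketch below is illustrative only and is not the hwloc utility code; `sketch_binding_map` and its `'B'`/`'.'` output format are hypothetical.

```c
#include <stdlib.h>

/* Build a printable binding map for one node: 'B' for a bound core,
 * '.' for an unbound one. The buffer is sized from this node's own
 * core count on every call, so a small node displayed first cannot
 * leave an undersized buffer for a larger node displayed later.
 * The caller frees the returned string. */
static char *sketch_binding_map(const int *bound, int ncores)
{
    if (ncores < 0) {
        return NULL;
    }
    char *map = malloc((size_t)ncores + 1);
    if (map == NULL) {
        return NULL;
    }
    for (int i = 0; i < ncores; i++) {
        map[i] = bound[i] ? 'B' : '.';
    }
    map[ncores] = '\0';
    return map;
}
```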
Gilles Gouaillardet
9f171de412
btl/openib: queue pending fragments once only when running out of credit
...
Fixes open-mpi/ompi#640
2015-07-06 09:45:01 +09:00
bosilca
77367ca02c
Merge pull request #687 from rolfv/pr/fix-smcuda-perfprob
...
Add the ability to use different size buffers for host and CUDA buffers
2015-07-02 18:42:41 -04:00
Jeff Squyres
4e7d979f8d
Merge pull request #686 from jsquyres/pr/autogen-no-ompi-bool-fixes
...
bool: use SIZEOF__BOOL, not SIZEOF_BOOL
2015-07-02 12:19:07 -04:00
Rolf vandeVaart
30a872b478
Add the ability to send host buffers through staging buffers of one size and CUDA buffers through buffers of a different size. Fixes performance issues
2015-07-02 11:11:15 -04:00
Jeff Squyres
f1353947ff
libfabric: fix wrappers for static builds
...
Need to set the WRAPPER_EXTRA flags so that the wrappers for static
builds pull in -lfabric.
Also update/fix some comments.
2015-07-02 07:58:16 -07:00
Jeff Squyres
cd5751c217
bool: use SIZEOF__BOOL, not SIZEOF_BOOL
...
When you "autogen.pl --no-ompi", the AC_SIZEOF(bool) test is not run.
But we *do* run AC_SIZEOF(_Bool), which is the equivalent. So switch
the uses of SIZEOF_BOOL in the code base to be SIZEOF__BOOL, and it's
all good.
2015-07-02 07:32:02 -07:00
Ralph Castain
861fe1d9dd
This is the third time I am fixing this - I have no idea who is resetting it or why.
2015-07-02 08:39:48 -05:00
Alina Sklarevich
27797654db
openib btl: added a new vendor_part_id for Mellanox ConnectX4-LX.
2015-06-29 13:50:43 +03:00
Ralph Castain
75ceec663a
Now that it has been officially released, update the embedded HWLOC to 1.11.0
2015-06-28 14:07:45 -07:00
bureddy
c78b8e9b8e
Merge pull request #664 from bureddy/master
...
powerpc: update mem barrier instructions
2015-06-25 14:09:49 -07:00
Jeff Squyres
a172bd161e
usnic: switch to use the new libfabric common library
...
The usnic BTL configure.m4 no longer needs to OPAL_CHECK_LIBFABRIC; it
just uses the results from opal/mca/common/libfabric's configure.m4.
We also now don't need to link against libfabric -- they just link
against the opal_common_libfabric library.
2015-06-25 13:33:15 -07:00
Ralph Castain
8d128fe090
Remove the non-null attributes from the cmd_line parser as this isn't something we can guarantee, and the optimization isn't worth the potential for error
2015-06-25 13:26:20 -07:00
Ralph Castain
ea0e21bb06
Add a common/libfabric component to the opal layer where we can place common functions
2015-06-25 11:04:00 -07:00
Nathan Hjelm
ee36d813dc
Merge pull request #657 from hjelmn/c99
...
more c99 updates
2015-06-25 11:21:09 -06:00
Howard Pritchard
f45914db9b
Merge pull request #670 from hppritcha/topic/ownership_update
...
ownership: update ownership files
2015-06-25 11:02:45 -06:00
Nathan Hjelm
4d92c9989e
more c99 updates
...
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
rhc54
1a767ed47c
Merge pull request #654 from rhc54/topic/config
...
Remove internal bool type definitions
2015-06-25 09:10:21 -07:00
Howard Pritchard
e49a37c034
ownership: update ownership files
...
per discussions at OMPI devel workshop
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-06-25 10:04:42 -06:00
Devendar Bureddy
ed406b05cb
powerpc: update mem barrier instructions
...
- added isync interface.
- define opal_atomic_wmb() to lwsync as it is recommended over eieio
on cache-enabled storage
(http://www.ibm.com/developerworks/systems/articles/powerpc.html).
2015-06-25 10:54:44 +03:00
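A write barrier built on `lwsync` could be sketched as below. This is not the Open MPI implementation: `sketch_wmb`, `sketch_publish`, and the two globals are hypothetical, the inline assembly assumes a GCC/Clang-compatible compiler, and the non-PowerPC fallback uses the `__atomic_thread_fence` builtin so the sketch still compiles elsewhere.

```c
/* Hypothetical write memory barrier: earlier stores become visible
 * before later stores. */
static inline void sketch_wmb(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    /* lwsync orders stores to cache-enabled storage. */
    __asm__ __volatile__ ("lwsync" : : : "memory");
#else
    /* Fallback for other architectures (assumption of this sketch). */
    __atomic_thread_fence(__ATOMIC_RELEASE);
#endif
}

static int sketch_payload;
static volatile int sketch_ready;

/* Typical use: write the data, issue the barrier, then set the flag,
 * so a reader that sees the flag also sees the data. */
static void sketch_publish(int value)
{
    sketch_payload = value;
    sketch_wmb();
    sketch_ready = 1;
}
```

A reader would need a matching read barrier after observing the flag; that half is omitted here.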
Nathan Hjelm
4552afff06
Fix definition of MPI_T_pvar_get_index
...
The definition of MPI_T_pvar_get_index was incorrect. This commit
fixes the definition and adds a missing return code.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-24 17:31:26 -06:00
Ralph Castain
869041f770
Purge whitespace from the repo
2015-06-23 20:59:57 -07:00
Ralph Castain
a809902c0a
Now that we require C99, and stdbool.h is part of C99, we no longer need to define our own bool types. Since bool is commonly used in a lot of places, just include stdbool.h in opal_config_bottom.h
2015-06-23 11:31:48 -07:00
Ralph Castain
cc9b416ab3
Ensure we properly commit suicide if/when we lose connection to the daemon. There are multiple paths by which a lost daemon can be reported, and so a race condition exists in the pmix support. Our MPI layer wants the ability to determine the response to the failure, and so it will call down to the RTE with any abort request. This comes down to the pmix layer as a "pmix_abort" command, which involves communicating the request to the daemon - who is gone. Sadly, the pmix component may not know that just yet, and so we hang.
...
So add a brief timer event to kick us out of the communication. The precise amount of time we should wait is somewhat TBD, but set something short for now and we can adjust.
2015-06-18 09:45:52 -07:00
Jeff Squyres
8ab2b11f88
btl_openib.c: fix another compiler warning
...
Remove this unused variable
2015-06-17 09:00:12 -07:00
Jeff Squyres
f688289aaf
btl_openib.c: fix compiler warning
...
This return code is not used; tell the compiler we're not going to
use it.
2015-06-17 08:56:56 -07:00