Jeff Squyres
9e1e563120
event: remove opal_async_event_base
...
opal_async_event_base is not used anywhere. The opal_progress_thread
API should be used instead.
2015-08-07 10:13:41 -07:00
Jeff Squyres
d7c25f683e
pmix_native: update to the new opal_progress_thread API
2015-08-07 10:13:40 -07:00
Jeff Squyres
b5c37dbfe2
CSCuv67889: usnic: fix an error corner case
...
Ensure that we have non-NULL pointers at all levels, which will
save us if errors force an early exit during component/module
initialization.
2015-08-06 10:54:28 -07:00
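The defensive pattern described in this commit can be sketched as a cleanup routine that checks every pointer level before dereferencing it, so it is safe to call even when initialization aborted partway. All struct and function names below are hypothetical and do not mirror the actual usnic BTL data structures.

```c
#include <stdlib.h>

/* Hypothetical module: owns an endpoint array that may never have been allocated. */
typedef struct {
    int *endpoints;       /* may be NULL if init failed before allocation */
    size_t num_endpoints;
} hypothetical_module_t;

/* Hypothetical component: owns an array of module pointers. */
typedef struct {
    hypothetical_module_t **modules;  /* may be NULL */
    size_t num_modules;
} hypothetical_component_t;

/* Release everything that was allocated, tolerating NULL at every level
 * so this is safe after an early initialization error.
 * Returns the number of modules actually freed. */
size_t hypothetical_component_cleanup(hypothetical_component_t *comp)
{
    size_t freed = 0;
    if (comp == NULL || comp->modules == NULL) {
        return 0;
    }
    for (size_t i = 0; i < comp->num_modules; ++i) {
        hypothetical_module_t *mod = comp->modules[i];
        if (mod == NULL) {
            continue;               /* slot was never populated */
        }
        free(mod->endpoints);       /* free(NULL) is a no-op */
        free(mod);
        comp->modules[i] = NULL;
        ++freed;
    }
    free(comp->modules);
    comp->modules = NULL;
    comp->num_modules = 0;
    return freed;
}
```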
Jeff Squyres
cbcd16b399
usnic: remove a stale shell variable name
2015-07-31 18:53:54 -07:00
Jeff Squyres
0ee8295e6e
usnic: ensure that we have libfabric >= v1.1
2015-07-31 18:53:54 -07:00
rhc54
c6cc1a9707
Merge pull request #766 from rhc54/topic/hwloc
...
Update x86_32 cpuid assembly code.
2015-07-31 12:53:16 -07:00
Jeff Squyres
2e7f794aae
usnic: convert to use fi_recvmsg / FI_MORE
...
Minor optimization to post 16 receive buffers at a time (vs. 1).
2015-07-31 12:45:40 -07:00
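In libfabric, setting `FI_MORE` on a call such as `fi_recvmsg()` hints to the provider that more requests will be posted immediately afterwards, which lets it optimize its access to the hardware. The batching idea can be illustrated without libfabric itself: the sketch below uses a hypothetical `HYP_FLAG_MORE` flag and counts simulated "doorbell" flushes. It is a simulation of the technique, not the usnic BTL code.

```c
#include <stddef.h>
#include <stdint.h>

#define HYP_FLAG_MORE  ((uint64_t)1 << 0)  /* hypothetical stand-in for FI_MORE */
#define HYP_BATCH_SIZE 16                  /* batch size mentioned in the commit */

/* Simulated endpoint: posts accumulate until one arrives without
 * HYP_FLAG_MORE, which flushes the whole batch at once. */
typedef struct {
    size_t pending;     /* posts not yet made visible to hardware */
    size_t doorbells;   /* number of simulated flushes */
} hyp_endpoint_t;

static void hyp_post_recv(hyp_endpoint_t *ep, uint64_t flags)
{
    ep->pending++;
    if (!(flags & HYP_FLAG_MORE)) {
        ep->doorbells++;   /* one flush covers every pending post */
        ep->pending = 0;
    }
}

/* Post n receive buffers, setting the MORE flag on every post except
 * the last one of each batch (and the very last one overall). */
void hyp_post_recvs_batched(hyp_endpoint_t *ep, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int last_in_batch = ((i + 1) % HYP_BATCH_SIZE == 0) || (i + 1 == n);
        hyp_post_recv(ep, last_in_batch ? 0 : HYP_FLAG_MORE);
    }
}
```

Posting 64 buffers this way costs 4 flushes; posting them one at a time without the flag would cost 64.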
Ralph Castain
b42545b0cb
Update x86_32 cpuid assembly code. Cherry-picked from open-mpi/hwloc@40f9978bcc
2015-07-31 11:40:38 -07:00
George Bosilca
c03b3b135c
Don't allow multiple pvars with the same pvar_index.
...
Fix Cisco copyright.
2015-07-25 15:57:50 -04:00
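The rule in this commit's subject line, that no two performance variables may share a `pvar_index`, can be sketched as a registry that refuses a second registration at an occupied index. The fixed-capacity registry, the return codes and the function name are hypothetical; Open MPI's actual MPI_T pvar bookkeeping is not shown in this log.

```c
#include <stddef.h>

#define HYP_MAX_PVARS 64

/* Hypothetical return codes. */
enum { HYP_SUCCESS = 0, HYP_ERR_EXISTS = -1, HYP_ERR_BAD_PARAM = -2 };

typedef struct {
    const char *names[HYP_MAX_PVARS];  /* NULL means the slot is free */
} hyp_pvar_registry_t;

/* Register name at pvar_index; reject the request if that index is
 * already taken, so no two pvars can ever share an index. */
int hyp_pvar_register(hyp_pvar_registry_t *reg, int pvar_index, const char *name)
{
    if (reg == NULL || name == NULL || pvar_index < 0 || pvar_index >= HYP_MAX_PVARS) {
        return HYP_ERR_BAD_PARAM;
    }
    if (reg->names[pvar_index] != NULL) {
        return HYP_ERR_EXISTS;
    }
    reg->names[pvar_index] = name;
    return HYP_SUCCESS;
}
```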
Guillaume Papauré
98b6d65385
avoid use of uninitialized variable
2015-07-25 15:29:32 -04:00
Rolf vandeVaart
1f32fa21ae
Fix arguments to error message, remove tabs and trailing spaces
2015-07-23 10:02:45 -04:00
Rolf vandeVaart
773b509407
Merge pull request #737 from rolfv/pr/add-cuda-war
...
Add a workaround for an issue in the libcuda.so library
2015-07-22 16:14:14 -04:00
Rolf vandeVaart
7703c96496
Add a workaround for an issue in the libcuda.so library
2015-07-22 11:35:27 -04:00
Jeff Squyres
ec3a38384f
Merge pull request #688 from jsquyres/pr/usnic-libfabric-msg-prefix-fix
...
usnic fixes for differences between libfabric v1.0.0 and v1.1.0
2015-07-21 10:18:36 -04:00
Gilles Gouaillardet
f7cf7d5070
configury: fix XRC detection on OFED < 3.12
...
Since ibv_create_xrc_rcv_qp is now deprecated, and in order to
be "future-proof", we have to consider the case in which only XRC Domains are supported.
Also, correctly handle distros that ship broken ibverbs devel headers.
Thanks to Paul Hargrove for the detailed report.
2015-07-13 10:43:22 +09:00
Ralph Castain
219c4dfba5
Create a new opal_async_event_base and have the pmix/native and ORTE level use it. This reduces our thread count by one.
2015-07-12 08:23:34 -07:00
Ralph Castain
683efcb850
Rename the current opal_event_base to opal_sync_event_base in preparation for adding an async progress thread to opal. No functional changes made here - just a simple rename.
2015-07-11 10:08:19 -07:00
rhc54
053d9b2a7c
Merge pull request #713 from rhc54/topic/errhandler
...
Add an opal/errhandler so opal-level errors can be up-leveled
2015-07-11 07:58:57 -07:00
Ralph Castain
a2243dcddd
Add an opal/errhandler so opal-level errors can be up-leveled
2015-07-11 07:09:11 -07:00
Ralph Castain
61fb067f14
Update the opal_hotel class to support a given event base instead of defaulting to using opal_event_base
2015-07-11 06:42:23 -07:00
Jeff Squyres
633da6641e
usnic: gracefully handle when we can't alloc an ACK
...
The comment didn't match the debugging code (which was ugly, and
apparently never happens, anyway). Just return and let the sender
retransmit.
2015-07-10 14:19:33 -07:00
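The "just return and let the sender retransmit" approach can be sketched as follows: if no ACK descriptor is available, the receiver skips the ACK, and the peer's retransmission logic recovers. The fixed-size pool and all names are hypothetical, not the usnic BTL's actual free lists.

```c
#include <stddef.h>

#define HYP_ACK_POOL_SIZE 2

typedef struct { unsigned seq; } hyp_ack_t;

/* Hypothetical fixed-size pool of ACK descriptors. */
typedef struct {
    hyp_ack_t slots[HYP_ACK_POOL_SIZE];
    size_t in_use;
    unsigned last_acked;
} hyp_ack_pool_t;

static hyp_ack_t *hyp_ack_alloc(hyp_ack_pool_t *pool)
{
    if (pool->in_use == HYP_ACK_POOL_SIZE) {
        return NULL;                       /* pool exhausted */
    }
    return &pool->slots[pool->in_use++];
}

/* Returns 1 if an ACK for seq was queued, 0 if none could be allocated.
 * On failure we simply return: the peer gets no ACK for seq, and its
 * retransmission logic resends the segment later. */
int hyp_send_ack(hyp_ack_pool_t *pool, unsigned seq)
{
    hyp_ack_t *ack = hyp_ack_alloc(pool);
    if (ack == NULL) {
        return 0;
    }
    ack->seq = seq;
    pool->last_acked = seq;
    return 1;
}
```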
Jeff Squyres
3327fa56b5
usnic: minor code cleanups
2015-07-10 10:10:43 -07:00
Jeff Squyres
f9c65a701e
usnic: "sin" assignment needs to be outside the #if
...
The "sin" variable is used below; need to ensure that it is assigned
for all builds (not just debug builds).
2015-07-10 06:51:03 -07:00
Jeff Squyres
cd87c8ad41
usnic: misc compiler warnings fixes
2015-07-10 06:51:03 -07:00
Jeff Squyres
ba429dc890
usnic: temporarily disable the BTL put method
...
The usnic BTL put method is currently broken. Disable it until we can
fix it properly.
2015-07-10 06:51:03 -07:00
Jeff Squyres
f265358fbe
usnic: handle FI_MSG_PREFIX differences libfabric v1.0.0->v1.1.0
...
In libfabric v1.0.0 (i.e., API v1.0), the usnic provider handled
FI_MSG_PREFIX inconsistently between sends and receives. This has
been fixed in libfabric v1.1.0 (i.e., API v1.1): FI_MSG_PREFIX is
handled consistently for both sends and receives.
Run-time detect which libfabric we are running with and adapt behavior
appropriately.
2015-07-10 06:51:03 -07:00
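One plausible way to do this run-time detection: libfabric reports its version through `fi_version()`, which can be compared against the `FI_VERSION(1, 1)` macro. To keep the sketch compilable without libfabric, it uses a hypothetical `HYP_VERSION` macro that packs major and minor the same way (major in the upper 16 bits), and a hypothetical enum for the two prefix behaviors; it is not the code from this commit.

```c
#include <stdint.h>

/* Packs major/minor like libfabric's FI_VERSION(): major in the high
 * 16 bits, minor in the low 16 bits. */
#define HYP_VERSION(major, minor) (((uint32_t)(major) << 16) | (uint32_t)(minor))

/* Hypothetical description of how FI_MSG_PREFIX space is handled. */
typedef enum {
    HYP_PREFIX_INCONSISTENT,  /* API v1.0: sends and receives differ */
    HYP_PREFIX_CONSISTENT     /* API v1.1+: same handling for both */
} hyp_prefix_mode_t;

/* Choose behavior from the version reported at run time (with real
 * libfabric, this value would come from fi_version()). */
hyp_prefix_mode_t hyp_prefix_mode(uint32_t runtime_version)
{
    return runtime_version >= HYP_VERSION(1, 1) ? HYP_PREFIX_CONSISTENT
                                                : HYP_PREFIX_INCONSISTENT;
}
```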
Jeff Squyres
ddd0de6cfc
usnic: make more OS-bypass memory Valgrind-defined
...
This helps reduce false positives when running MPI apps through
Valgrind.
2015-07-10 06:51:03 -07:00
Jeff Squyres
9bc7a54e0c
usnic: correctly count CRC errors
...
Handle the differences between libfabric v1.0.0 and v1.1.0 in the
return value of fi_cq_readerr().
Also consolidate CRC and truncation errors into the same handling
block, since truncation errors are typically another symptom of CRC
errors. This ensures that buffers get reposted properly.
2015-07-10 06:51:03 -07:00
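The consolidation described in this commit can be sketched as a single error-handling block shared by both cases: truncation and CRC errors are counted together and the receive buffer is always reposted. libfabric reports truncation as `FI_ETRUNC`; the enum, stats struct and function below are hypothetical stand-ins, and the CRC error code is purely illustrative.

```c
#include <stddef.h>

/* Hypothetical completion-error classes. */
typedef enum { HYP_ERR_NONE, HYP_ERR_CRC, HYP_ERR_TRUNC, HYP_ERR_OTHER } hyp_cq_err_t;

typedef struct {
    size_t crc_errors;   /* CRC and truncation errors are counted together */
    size_t reposted;     /* receive buffers returned to the receive queue */
    size_t fatal;        /* errors not recoverable here */
} hyp_stats_t;

/* Handle one receive-completion error. Truncation is usually another
 * symptom of a CRC error, so it shares the CRC path: count it and
 * always repost the buffer so the receive queue does not drain. */
void hyp_handle_recv_error(hyp_stats_t *st, hyp_cq_err_t err)
{
    switch (err) {
    case HYP_ERR_CRC:
    case HYP_ERR_TRUNC:
        st->crc_errors++;
        st->reposted++;      /* stand-in for reposting the buffer */
        break;
    case HYP_ERR_NONE:
        break;
    default:
        st->fatal++;
        break;
    }
}
```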
Jeff Squyres
fc686f5538
usnic: make configure complain if libfabric cannot be found
...
Instead of silently determining that the usnic BTL can't be built,
announce that usnic is checking for libfabric support, and then
AC_MSG_RESULT the result of that check.
2015-07-10 06:45:33 -07:00
Jeff Squyres
4341639a66
Revert "configury: fix (again) XRC detection on OFED < 3.12"
...
@ggouaillardet is likely offline for the weekend, but master is broken
on RHEL 6.5 systems that do not have MOFED installed. So I'm taking
the liberty of reverting this commit; I'm guessing Gilles will fix it up
and re-commit next week.
This reverts commit 77f8282d51d8f40f6ae988ef84c9c852de75c625.
2015-07-10 06:45:33 -07:00
Gilles Gouaillardet
77f8282d51
configury: fix (again) XRC detection on OFED < 3.12
...
Since ibv_create_xrc_rcv_qp is now deprecated, and in order to
be "future-proof", we have to consider the case in which only XRC Domains are supported.
Thanks to Paul Hargrove for the detailed report.
2015-07-10 15:31:45 +09:00
Rolf vandeVaart
ae0f3cfee7
Make an explicit call to initialize MCA parameters in common CUDA code. This allows us to view them with ompi_info and possibly modify them with the tools interface.
2015-07-09 12:51:55 -04:00
Rolf vandeVaart
cdffa4724d
Force smcuda BTL to use CUDA IPC path for all GPU buffers where possible
2015-07-08 17:11:25 -04:00
Ralph Castain
ed93154e43
Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line.
2015-07-07 12:52:16 -07:00
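The essence of the fix is to size each node's display buffer from that node's own core count instead of reusing a buffer sized for whichever node was displayed first. A minimal sketch, assuming a hypothetical rendering where 'B' marks a bound core and '.' an unbound one (this is not ORTE's actual binding display format):

```c
#include <stdlib.h>

/* Render one node's binding map as one character per core.
 * The buffer is allocated from THIS node's core count on every call,
 * so a node with more cores than the first one displayed cannot
 * overrun it. The caller frees the result. */
char *hyp_format_binding(const int *bound, size_t ncores)
{
    char *out = malloc(ncores + 1);   /* +1 for the terminating NUL */
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < ncores; ++i) {
        out[i] = bound[i] ? 'B' : '.';
    }
    out[ncores] = '\0';
    return out;
}
```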
Gilles Gouaillardet
9f171de412
btl/openib: queue pending fragments once only when running out of credit
...
Fixes open-mpi/ompi#640
2015-07-06 09:45:01 +09:00
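The commit message gives no detail beyond the issue reference, so the following is only a generic sketch of the "queue at most once when out of credits" idea suggested by its title: a per-fragment flag prevents a fragment that is retried while still out of credits from being queued twice. The queue-pair struct, the flag, and the function are hypothetical, not the openib BTL's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define HYP_MAX_PENDING 8

/* Hypothetical fragment with a flag recording whether it already sits
 * on the pending queue. */
typedef struct {
    int id;
    bool queued;
} hyp_frag_t;

typedef struct {
    hyp_frag_t *items[HYP_MAX_PENDING];
    size_t count;
    int credits;
} hyp_qp_t;

/* Try to send frag. With no credits left, park it on the pending queue,
 * but only if it is not already there.
 * Returns 1 if sent, 0 if deferred, -1 if the pending queue is full. */
int hyp_send_or_queue(hyp_qp_t *qp, hyp_frag_t *frag)
{
    if (qp->credits > 0) {
        qp->credits--;
        return 1;
    }
    if (!frag->queued) {
        if (qp->count == HYP_MAX_PENDING) {
            return -1;
        }
        qp->items[qp->count++] = frag;
        frag->queued = true;
    }
    return 0;
}
```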
bosilca
77367ca02c
Merge pull request #687 from rolfv/pr/fix-smcuda-perfprob
...
Add the ability to use different-size buffers for host and CUDA buffers
2015-07-02 18:42:41 -04:00
Rolf vandeVaart
30a872b478
Add the ability to send host buffers through staging buffers of one size and CUDA buffers through buffers of a different size. Fixes performance issues.
2015-07-02 11:11:15 -04:00
Jeff Squyres
f1353947ff
libfabric: fix wrappers for static builds
...
Need to set the WRAPPER_EXTRA flags so that the wrappers for static
builds pull in -lfabric.
Also update/fix some comments.
2015-07-02 07:58:16 -07:00
Ralph Castain
861fe1d9dd
This is the third time I am fixing this. I have no idea who is resetting it or why.
2015-07-02 08:39:48 -05:00
Alina Sklarevich
27797654db
openib btl: added a new vendor_part_id for Mellanox ConnectX4-LX.
2015-06-29 13:50:43 +03:00
Ralph Castain
75ceec663a
Now that it has been officially released, update the embedded HWLOC to 1.11.0
2015-06-28 14:07:45 -07:00
Jeff Squyres
a172bd161e
usnic: switch to use the new libfabric common library
...
The usnic BTL configure.m4 no longer needs to invoke OPAL_CHECK_LIBFABRIC; it
just uses the results from opal/mca/common/libfabric's configure.m4.
We also no longer need to link against libfabric directly; we just link
against the opal_common_libfabric library.
2015-06-25 13:33:15 -07:00
Ralph Castain
ea0e21bb06
Add a common/libfabric component to the opal layer where we can place common functions
2015-06-25 11:04:00 -07:00
Nathan Hjelm
ee36d813dc
Merge pull request #657 from hjelmn/c99
...
more c99 updates
2015-06-25 11:21:09 -06:00
Nathan Hjelm
4d92c9989e
more c99 updates
...
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Howard Pritchard
e49a37c034
ownership: update ownership files
...
Per discussions at the OMPI devel workshop.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-06-25 10:04:42 -06:00
Nathan Hjelm
4552afff06
Fix definition of MPI_T_pvar_get_index
...
The definition of MPI_T_pvar_get_index was incorrect. This commit
fixes the definition and adds a missing return code.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-24 17:31:26 -06:00
Ralph Castain
869041f770
Purge whitespace from the repo
2015-06-23 20:59:57 -07:00
Ralph Castain
cc9b416ab3
Ensure we properly commit suicide if/when we lose connection to the daemon. There are multiple paths by which a lost daemon can be reported, and so a race condition exists in the pmix support. Our MPI layer wants the ability to determine the response to the failure, and so it will call down to the RTE with any abort request. This comes down to the pmix layer as a "pmix_abort" command, which involves communicating the request to the daemon - who is gone. Sadly, the pmix component may not know that just yet, and so we hang.
...
So add a brief timer event to kick us out of the communication. The precise amount of time we should wait is somewhat TBD, but set something short for now and we can adjust.
2015-06-18 09:45:52 -07:00
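The commit bounds the wait with a timer event. To stay self-contained, the sketch below replaces the timer with a fixed budget of progress polls, which shows the same idea: an abort request sent to a daemon that has already died must not block forever. The connection struct, status codes and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes. */
enum { HYP_OK = 0, HYP_ERR_TIMEOUT = -1 };

/* Hypothetical connection to the local daemon; once `alive` is false,
 * a reply will never arrive. */
typedef struct {
    bool alive;
    size_t polls_until_reply;   /* simulated latency of a live daemon */
} hyp_daemon_conn_t;

/* One progress step: returns true once the daemon's reply has arrived. */
static bool hyp_progress(hyp_daemon_conn_t *conn)
{
    if (!conn->alive) {
        return false;
    }
    if (conn->polls_until_reply > 0) {
        conn->polls_until_reply--;
    }
    return conn->polls_until_reply == 0;
}

/* Wait for the abort request to be acknowledged, but give up after
 * max_ticks progress steps. The tick budget plays the role of the timer
 * event: without it, a dead daemon would make this loop hang. */
int hyp_wait_for_abort_ack(hyp_daemon_conn_t *conn, size_t max_ticks)
{
    for (size_t tick = 0; tick < max_ticks; ++tick) {
        if (hyp_progress(conn)) {
            return HYP_OK;
        }
    }
    return HYP_ERR_TIMEOUT;   /* caller proceeds with local termination */
}
```

As the commit notes, the right budget is somewhat TBD; a short value keeps termination prompt while still letting a live daemon answer.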
Jeff Squyres
8ab2b11f88
btl_openib.c: fix another compiler warning
...
Remove this unused variable
2015-06-17 09:00:12 -07:00