
225 commits

Author SHA1 Message Date
Ryan Grant
5d5e9bc1f8 fixes OpenIB connect error reporting for ibv_* calls that return an errno 2015-02-04 09:09:14 -07:00
Jeff Squyres
cb7cc171f9 usnic: update README.txt notes
Update notes about copying the usnic BTL between master and the v1.8
branch.
2015-02-03 15:54:36 -08:00
Jeff Squyres
edf7232e00 usnic: enable building with an external libfabric 2015-02-03 13:46:06 -08:00
Jeff Squyres
bfa54d5d7b usnic: update to match new libfabric 2015-02-03 13:46:06 -08:00
Todd Kordenbrock
37e6096fe7 Copyright update. 2015-01-29 11:08:13 -06:00
Todd Kordenbrock
ca30e129e8 Add the option to use the Portals4 logical to physical table.
This commit adds an MCA variable to select Portals4 logical
addressing, populates the logical-to-physical mapping table and
initializes the NI in this mode.
2015-01-29 11:08:13 -06:00
George Bosilca
b9a63cbe7a One less warning. 2015-01-27 13:25:55 -05:00
Jeff Squyres
436223959d usnic: update to match new libfabric APIs 2015-01-24 05:49:36 -08:00
Gilles Gouaillardet
9f80aa2d28 btl/openib: regression fix when rdmacm or udcm are disabled
This fixes a regression introduced in open-mpi/ompi@661c35ca67

Thanks to Mark Santcroos for reporting this issue
2015-01-20 11:31:50 +09:00
Jeff Squyres
65a279019e usnic: fix typo in memchecker usage 2015-01-16 09:42:19 -08:00
Gilles Gouaillardet
661c35ca67 cleanup dead code caused by the removal of the --with-threads configure option 2015-01-16 19:13:59 +09:00
Nathan Hjelm
006074c48d Merge pull request #332 from hjelmn/openib_updates
Openib updates
2015-01-15 15:05:18 -06:00
Jeff Squyres
d13c14ec82 CSCus22527: fix off-by-one error in checking the number of VFs
Ensure to count *this* process when checking for how many VFs we need
on the local server.

(cherry picked from commit 386c01934e98cb8dcb48ff648ecdfb0c8677baa9)
2015-01-15 11:44:29 -08:00
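
The off-by-one described in d13c14ec82 can be illustrated with a minimal sketch. The function names and the "fixed number of VFs per process" model are assumptions made for illustration; they are not taken from the usnic BTL source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: number of processes on this server that need VFs.
 * num_local_peers counts only the *other* processes on the server, so the
 * calling process has to be added explicitly. */
static int local_procs_needing_vfs(int num_local_peers)
{
    assert(num_local_peers >= 0);
    return num_local_peers + 1;  /* +1 for this process */
}

/* Hypothetical check: are there enough VFs for every local process,
 * assuming each process needs vfs_per_proc of them? */
static bool have_enough_vfs(int num_local_peers, int vfs_per_proc,
                            int vfs_available)
{
    assert(vfs_per_proc > 0 && vfs_available >= 0);
    return local_procs_needing_vfs(num_local_peers) * vfs_per_proc
           <= vfs_available;
}
```

With 4 peers, 1 VF per process, and 4 VFs available, the check fails because 5 processes need a VF. Leaving out the `+ 1` would wrongly report that the resources suffice.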
Aurélien Bouteiller
f49981bb2a Disable coalescing until pull request #332 gets in. 2015-01-14 14:12:47 -05:00
Jeff Squyres
e4e5e7dbc0 usnic: ensure to clean up nicely in case of low resources
If there are not enough resources (e.g., low VFs), we can end up
calling finalize_one_channel() on the same channel multiple times.  So
ensure to NULL out fields that we have freed already so that we do not
try to free them a second time.

Fixes CSCus26648.
2015-01-13 14:37:31 -08:00
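
The pattern described in e4e5e7dbc0 (reset freed fields to NULL so that a repeated finalize is harmless) might look roughly like the sketch below. `struct channel` and `finalize_channel()` are hypothetical stand-ins; the real usnic channel structure and finalize_one_channel() are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a channel; the real usnic structure differs. */
struct channel {
    void *recv_buf;
    void *send_buf;
};

/* Safe to call more than once on the same channel: every freed field is
 * reset to NULL, and free(NULL) is a no-op in standard C. */
static void finalize_channel(struct channel *ch)
{
    assert(ch != NULL);

    free(ch->recv_buf);
    ch->recv_buf = NULL;

    free(ch->send_buf);
    ch->send_buf = NULL;
}
```

Calling `finalize_channel()` twice on the same channel frees each buffer once; the second call only sees NULL pointers.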
Jeff Squyres
d00cede718 usnic: fix if_include/exclude of CIDR-specified networks
Fix the ordering so that we obtain the usnic netmask information
*before* we do the filtering based on CIDR-specified networks.

Also requires upstream Github libfabric commit 3976745.

Fixes CSCus22495.
2015-01-13 12:04:51 -08:00
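
Why the ordering in d00cede718 matters can be seen in a small sketch of CIDR matching. The helper names and the exact matching rule (same mask and same network) are assumptions for illustration, not the usnic BTL's actual filter code. All addresses are IPv4 in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 netmask from a prefix length (0..32). */
static uint32_t prefix_to_netmask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    return prefix_len == 0 ? 0u
                           : (uint32_t)0xFFFFFFFFu << (32u - prefix_len);
}

/* Hypothetical filter test: does the interface (address plus netmask) sit
 * on the network given as cidr_addr/cidr_prefix?  The interface netmask
 * must already be known when this runs; a zero (not yet obtained) netmask
 * can never match a non-trivial CIDR entry. */
static bool iface_matches_cidr(uint32_t if_addr, uint32_t if_netmask,
                               uint32_t cidr_addr, unsigned cidr_prefix)
{
    uint32_t cidr_mask = prefix_to_netmask(cidr_prefix);
    return if_netmask == cidr_mask
        && (if_addr & if_netmask) == (cidr_addr & cidr_mask);
}
```

For an interface at 10.1.2.3 with a /16 netmask, the entry 10.1.0.0/16 matches. If the same test runs before the netmask has been filled in (netmask still 0), the interface is filtered out incorrectly.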
Jeff Squyres
a220b92cf8 usnic: fix function name in opal_output 2015-01-13 12:04:07 -08:00
Jeff Squyres
5ed688a074 usnic: ensure that we only get "usnic"-named providers
Also, a minor update to a verbose message.
2015-01-12 13:21:22 -08:00
Jeff Squyres
881b1dcf19 usnic: document libfabric abstractions
Handy tips to remember the libfabric abstractions and what they
correspond to in usnic/VIC terms.
2015-01-09 15:21:51 -08:00
Gilles Gouaillardet
194d9f84d3 btl/usnic: move call to check_reg_mem_basics()
avoid annoying memlock related messages when there is no usnic device.
2015-01-09 11:37:45 +09:00
George Bosilca
1344097d35 Turn OFF the TCP dump mechanism. 2015-01-08 18:50:49 -05:00
George Bosilca
8ddd3b3b09 Cleanup the TCP dump mechanism. 2015-01-08 18:50:05 -05:00
Nathan Hjelm
c65f026fee btl/vader: fix typo in xpmem setup 2015-01-08 12:52:38 -07:00
Gilles Gouaillardet
4c29d8e247 btl/openib: silence warning (unused code) 2015-01-08 17:18:07 +09:00
Gilles Gouaillardet
8ab605d9c5 btl/tcp: fix overflow in mca_btl_tcp_endpoint_dump() 2015-01-08 15:40:16 +09:00
Nathan Hjelm
7d206ae769 btl/ugni: fix a couple of bugs
Two fixes:

 - Do not try to return a mailbox to the free list if one wasn't
   allocated.

 - Do not try to tear down IRQ CQs if they were not created.
2015-01-07 13:48:17 -07:00
Dave Goodell
49069bc661 usnic: fix fi_av_insert (ARP resolution) bugs
We had several problems in the old code:

1. We were specifying an arbitrary timeout (100 ms) and then abandoning
   all remaining pending AV insert operations.  We would then free the
   endpoint buffer that we gave to fi_av_insert(), usually causing
   libfabric's progress thread to write to a freed buffer.

2. We were claiming in a show_help message that the timeout was
   controllable via an MCA parameter.  This commit removes that
   parameter, since there's no good method for us to specify a timeout
   like this to libfabric right now.

3. We also weren't waiting for the correct number of fi_av_insert()
   operations to complete.  We were waiting for nprocs, which is
   accidentally fine for 2 procs on separate hosts, but not for most
   other proc counts.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
2015-01-07 08:25:17 -08:00
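
Item 3 in 49069bc661 comes down to waiting for the number of operations that were actually submitted rather than for a fixed count such as nprocs. A minimal sketch of that bookkeeping follows; it deliberately uses no libfabric calls, and `struct insert_tracker` and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for asynchronous insert operations. */
struct insert_tracker {
    size_t submitted;  /* operations handed to the provider */
    size_t completed;  /* completions observed so far */
};

/* Record that one more insert operation was submitted. */
static void tracker_submit(struct insert_tracker *t)
{
    assert(t != NULL);
    t->submitted++;
}

/* Record one completion; there can never be more than were submitted. */
static void tracker_complete(struct insert_tracker *t)
{
    assert(t != NULL && t->completed < t->submitted);
    t->completed++;
}

/* Done only when every submitted operation has completed. */
static bool tracker_done(const struct insert_tracker *t)
{
    assert(t != NULL);
    return t->completed == t->submitted;
}
```

A caller would invoke `tracker_submit()` once per insert it issues, `tracker_complete()` once per completion event, and keep progressing until `tracker_done()` returns true, so that no buffer is released while an operation is still pending.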
Gilles Gouaillardet
06e071454e btl/openib: cleanup duplicate code 2015-01-07 14:07:30 +09:00
Gilles Gouaillardet
135ecce0eb btl/openib: rename OPAL_HAVE_XRCD macro into OPAL_HAVE_CONNECTX_XRC_DOMAINS 2015-01-07 13:27:25 +09:00
Nathan Hjelm
6733d89cf9 btl/vader: fix return code check when opening ptrace_scope file 2015-01-06 15:17:56 -07:00
Nathan Hjelm
cde79bfa60 btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths) 2015-01-06 11:39:23 -07:00
Nathan Hjelm
9bae131589 btl/openib: fix message coalescing
There was a bug in the openib btl handling this valid sequence of
calls:

desc = btl_alloc ();
btl_free (desc);

When triggered the bug would cause either fragment loss or undefined
behavior (SEGV, etc). The problem occurred because btl_alloc contained
the logic to modify the pending fragment (length, etc) and these
changes were not corrected if the fragment was freed instead of sent.

To fix this issue I 1) moved some of the coalescing logic to the
btl_send function, and 2) retry the coalesced fragment on btl_free
if it was never sent. This appears to completely address the issue.
2015-01-06 11:39:16 -07:00
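
The first part of the fix in 9bae131589 (modify the pending fragment at send time, not at allocation time) can be modeled with a toy sketch. All types and functions below are hypothetical and far simpler than the openib BTL; the retry-on-free step from part 2 is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pending, coalescable fragment. */
struct pending_frag {
    size_t length;  /* bytes already committed to this fragment */
};

/* Hypothetical descriptor handed out by the allocator. */
struct desc {
    struct pending_frag *target;  /* fragment this descriptor may join */
    size_t size;                  /* payload size requested */
    bool sent;                    /* set once the descriptor has been sent */
};

/* Allocation only records intent; the pending fragment is left untouched. */
static struct desc desc_alloc(struct pending_frag *target, size_t size)
{
    assert(target != NULL);
    struct desc d = { target, size, false };
    return d;
}

/* The pending fragment is modified only here, when data is really sent. */
static void desc_send(struct desc *d)
{
    assert(d != NULL && d->target != NULL && !d->sent);
    d->target->length += d->size;
    d->sent = true;
}

/* Freeing an unsent descriptor needs no rollback: alloc changed nothing. */
static void desc_free(struct desc *d)
{
    assert(d != NULL);
    d->target = NULL;
}
```

In this model an alloc followed directly by a free leaves the pending fragment's length unchanged, which is the property the original code lacked.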
Nathan Hjelm
9aaac11648 btl/openib: fix receive queue source detection 2015-01-06 11:39:11 -07:00
Howard Pritchard
7df648f1cf btl/openib: fix problems from commit b3617e73
For systems whose OFED lacks XRC support, commit b3617e73
broke the build of the openib btl.  This commit addresses
the issues introduced by that commit.
2015-01-06 11:31:12 -07:00
Gilles Gouaillardet
b3617e736e btl/openib: add XRC support with OFED 3.12+
based on an original patch contributed by Bull.
2015-01-06 15:30:52 +09:00
Howard Pritchard
c857cc926c Merge pull request #327 from hppritcha/topic/async_progress
Topic/async progress
2015-01-05 16:20:44 -07:00
Howard Pritchard
0a6f841d5f xpmem/config: simple xpmem search on Cray's
Use the pkg-config related m4 functions to find out where
Cray's xpmem.h and libxpmem are located on a system.

With this commit, there is no longer any need to explicitly
indicate an xpmem install location on the configure
line, at least for Cray systems running CLE 4.X and 5.X.
2014-12-24 14:40:06 -07:00
Howard Pritchard
065c756860 btl/ugni: improve error handling
Improve error handling when pthread functions return errors.
Remove stale debug code.
2014-12-24 11:50:24 -07:00
Howard Pritchard
f8e354ce00 btl/ugni: add a request_progress_thread mca param
Replace temporary environment variables with an MCA
parameter for the ugni btl.  A user wishing to
use the ugni btl async. progress thread needs to
set the request_progress_thread param to true.
For example, using env. variable format:

export OMPI_MCA_btl_ugni_request_progress_thread=1
2014-12-24 11:50:24 -07:00
Howard Pritchard
8b250cc15b btl/ugni: more debug cleanup 2014-12-24 11:50:24 -07:00
Howard Pritchard
f0c519517b btl/ugni: switch to using opal_progress
Switch to invoking opal_progress from the async progress
thread, rather than calling ugni btl specific progress.
2014-12-24 11:50:24 -07:00
Howard Pritchard
47747c1b27 btl/ugni: remove some debug output 2014-12-24 11:50:24 -07:00
Howard Pritchard
2d14c2a204 btl/ugni: switch to using tx cq irqs for rdma
Verified via unit tests and other testing that BTE TX
descriptors using CQs configured to generate IRQs
work correctly on Cray XC.  Disable the
send-message-back-to-self path and just use IRQs generated
by completion of TX descriptors posted to BTE.
2014-12-24 11:50:24 -07:00
Howard Pritchard
acd07d98da btl/ugni: turn off chatty debug in irq cq setup 2014-12-24 11:50:24 -07:00
Howard Pritchard
0dec2f4af7 btl/ugni: mark btl frags for irqs as btl owned
Make sure frags allocated to generate irqs to wake
the progress thread, etc. set the MCA_BTL_DES_FLAGS_BTL_OWNERSHIP
flag.
2014-12-24 11:50:23 -07:00
Howard Pritchard
d188f0bc6f btl/ugni: honor enable_mpi_threads
Honor enable_mpi_threads setting to enable the ugni btl
async progress thread.  If the app doesn't request thread-multiple
the thread will not be created.
2014-12-24 11:50:23 -07:00
Howard Pritchard
43cdcb745f btl/ugni: add missing mutex lock 2014-12-24 11:50:23 -07:00
Howard Pritchard
83bcbd1cf9 btl/ugni: compilation fixes
Fix compilation problems in ugni btl associated with
async progress additions.
2014-12-24 11:50:23 -07:00
Howard Pritchard
13ab8a9e5a btl/ugni: use MCA_BTL_DES_FLAGS_SIGNAL
Use MCA_BTL_DES_FLAGS_SIGNAL frag flag to indicate
whether or not an interrupt needs to be delivered
along with a control message going through smsg.
2014-12-24 11:50:23 -07:00
Howard Pritchard
3fc7b389ff initial async progress changes for gni 2014-12-24 11:50:23 -07:00