1
1
Граф коммитов

391 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
593f97ae92 btl: add support for 64-bit atomic operations
This commit adds an interface for btl's to export support for 64-bit atomic
operations on integers. BTL's that can support atomic operations should
implement these functions and set the appropriate btl_flags and btl_atomic_flags.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:36 -07:00
Nathan Hjelm
f8e15ca83d Update the interface to provide a cleaner interface for RDMA operations.
The old BTL interface provided support for RDMA through the use of
the btl_prepare_src and btl_prepare_dst functions. These functions were
expected to prepare as much of the user buffer as possible for the RDMA
operation and return a descriptor. The descriptor contained segment
information on the prepared region. The btl user could then pass the
RDMA segment information to a remote peer. Once the peer received that
information it then packed it into a similar descriptor on the other
side that could then be passed into a single btl_put or btl_get
operation.

Changes:

 - Added functions to register and deregister memory regions with the
   btl. If no registration is needed a btl should set these function
   pointers to NULL. These function take over for btl_prepare_src/dst
   and btl_free for RDMA operations. The caller should specify the
   maximum permissions needed on the memory.

 - Changed the function signatures for both btl_put and btl_get. In
   place of a prepared descriptor the caller should provide the source
   and destination addresses and registration handles as well as a
   new callback function. The callback will be provided with the local
   address and registration handle, callback context, callback data, and
   status. See mca_btl_base_rdma_completion_fn_t in btl.h.

 - Added a new btl constraint: MCA_BTL_REG_HANDLE_MAX_SIZE. This
   value specifies the maximum size of any btl's registration handle.

 - Removed the btl_prepare_dst function. This reflects the fact that
   RDMA operations no longer depend on "prepared" descriptors.

 - Removed the btl_seg_size member. There is no need to btl's to
   subclass the mca_btl_base_segment_t class anymore.

 - Expose the btl's put/get limitations with new struct members:
   btl_put_limit, btl_put_alignment, btl_get_limit, btl_get_alignment.

 - Remove the mca_mpool_base_registration_t argument from the btl_prepare_src
   function. The argument was intended to support RDMA operations and is no
   longer necessary.

 - Remove des_remote/des_remote_count from the mca_btl_base_descriptor_t
   structure. This structure member was originally used to specify the remote
   segment for RDMA operations. Since the new btl interface no longer uses
   desriptors for RDMA this member no longer has a purpose. In addition
   to removing these members the local segment structure fields have been
   renamed to from des_local/des_local_count to des_segments/des_segment_count.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:36 -07:00
Jeff Squyres
f7b4b23383 usnic: ensure to NULL-terminate the string/not overflow
This was CID 1269921.
2015-02-12 13:41:30 -08:00
Jeff Squyres
8febd41a39 usnic: fix minor memory leak
This was CID 1269859.
2015-02-12 13:41:30 -08:00
Jeff Squyres
4c074da1c2 usnic: fix minor memory leak
This was CID 1269853.
2015-02-12 13:41:30 -08:00
Jeff Squyres
a7ce2d406c usnic: don't bother comparing unsigned values for <0
This was CID 1269812.
2015-02-12 13:41:30 -08:00
Jeff Squyres
caacc6ad91 usnic: properly differentiate data pool vs. malloc
usnic_fls() can actually return 0, leading us to incorrectly free() a
buffer instead of OMPI_FREE_LIST_RETURN_MT'ing it.

So add an explicit bool in the struct that tracks whether the buffer
came from malloc or a freelist.

This was CID 1269660.
2015-02-12 13:41:30 -08:00
Jeff Squyres
3b39535ebb usnic: ensure that the string is NULL-terminated
This was CID 1269666.
2015-02-12 13:41:30 -08:00
Jeff Squyres
41c6e26a38 usnic: ensure the copied string is NULL-terminated
This was CID 1269667
2015-02-12 13:41:30 -08:00
Jeff Squyres
81585c0a7c usnic: strengthen the check-if-accept()-failed test
This was Coverity CID 1269801.
2015-02-12 13:41:30 -08:00
Nathan Hjelm
f1dc29b145 btl/vader: fix modex size when xpmem is in use
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-12 14:06:24 -07:00
Jeff Squyres
8be0e0b0ca usnic: don't close fp upon error
Let the caller close fp.  Properly check for errors when calling
subroutines.

This was Coverity CID 1269995.
2015-02-12 10:24:01 -08:00
Rolf vandeVaart
08dceda2c0 Fix logic for handling priority and eager RDMA. There was some refactoring that was done
in this code and it ended up changing the logic that is used to set up eager RDMA.
Rather than setting up eager RDMA with a high priority message, it did it the other
way around.  For some reason, CUDA-aware support did not like this.  So, basically,
restore the logic to the way it was prior to the refactoring.  The refactoring did not
intend to change this.  Lightly reviewed by hjelmn.
2015-02-11 16:38:36 -05:00
George Bosilca
e173f9b0c0 Somehow we lost one of the most critical parameter
allowing the PML to decide how to order the different
interconnects. Bring it back !
2015-02-10 20:32:05 -05:00
Mike Dubman
6816e3421f Merge pull request #377 from regrant/ib_wr_fix
fix problem with get_pathrecord posting too many recv requests
2015-02-10 08:47:23 +02:00
Ryan Grant
de93497789 fix problem with get_pathrecord posting too many recv requests 2015-02-04 09:53:58 -07:00
Ryan Grant
5d5e9bc1f8 fixes OpenIB connect error reporting for ibv_* calls that return an errno 2015-02-04 09:09:14 -07:00
Jeff Squyres
cb7cc171f9 usnic: update README.txt notes
Update notes about copying the usnic BTL between master and the v1.8
branch.
2015-02-03 15:54:36 -08:00
Jeff Squyres
edf7232e00 usnic: enable building with an external libfabric 2015-02-03 13:46:06 -08:00
Jeff Squyres
bfa54d5d7b usnic: update to match new libfabric 2015-02-03 13:46:06 -08:00
Todd Kordenbrock
37e6096fe7 Copyright update. 2015-01-29 11:08:13 -06:00
Todd Kordenbrock
ca30e129e8 Add the option to use the Portals4 logical to physical table.
This commit adds an MCA variable to select Portals4 logical
addressing, populates the logical-to-physical mapping table and
initializes the NI in this mode.
2015-01-29 11:08:13 -06:00
George Bosilca
b9a63cbe7a One less warning. 2015-01-27 13:25:55 -05:00
Jeff Squyres
436223959d usnic: update to match new libfabric APIs 2015-01-24 05:49:36 -08:00
Gilles Gouaillardet
9f80aa2d28 btl/openib: regression fix when rdmacm or udcm are disabled
This fixes a regression introduced in open-mpi/ompi@661c35ca67

Thanks to Mark Santcroos for reporting this issue
2015-01-20 11:31:50 +09:00
Jeff Squyres
65a279019e usnic: fix typo in memchecker usage 2015-01-16 09:42:19 -08:00
Gilles Gouaillardet
661c35ca67 cleanup dead code caused by the removal of the --with-threads configure option 2015-01-16 19:13:59 +09:00
Nathan Hjelm
006074c48d Merge pull request #332 from hjelmn/openib_updates
Openib updates
2015-01-15 15:05:18 -06:00
Jeff Squyres
d13c14ec82 CSCus22527: fix off-by-one error in checking the number of VFs
Ensure to count *this* process when checking for how many VFs we need
on the local server.

(cherry picked from commit 386c01934e98cb8dcb48ff648ecdfb0c8677baa9)
2015-01-15 11:44:29 -08:00
Aurélien Bouteiller
f49981bb2a Disable coalescing until pull request #332 gets in. 2015-01-14 14:12:47 -05:00
Jeff Squyres
e4e5e7dbc0 usnic: ensure to clean up nicely in case of low resources
If there are not enough resources (e.g., low VFs), we can end up
calling finalize_one_channel() on the same channel multiple times.  So
ensure to NULL out fields that we have freed already so that we do not
try to free them a second time.

Fixes CSCus26648.
2015-01-13 14:37:31 -08:00
Jeff Squyres
d00cede718 usnic: fix if_include/exclude of CIDR-specified networks
Fix the ordering so that we obtain the usnic netmask information
*before* we do the filtering based on CIDR-specified networks.

Also requires upstream Github libfabric commit 3976745.

Fixes CSCus22495.
2015-01-13 12:04:51 -08:00
Jeff Squyres
a220b92cf8 usnic: fix function name in opal_output 2015-01-13 12:04:07 -08:00
Jeff Squyres
5ed688a074 usnic: enusre that we only get "usnic"-named providers
Also, a minor update to a verbose message.
2015-01-12 13:21:22 -08:00
Jeff Squyres
881b1dcf19 usnic: document libfabric abstractions
Handy tips to remember the libfabric abstractions and what they
correspond to in usnic/VIC terms.
2015-01-09 15:21:51 -08:00
Gilles Gouaillardet
194d9f84d3 btl/usnic: move call to check_reg_mem_basics()
avoid annoying memlock related messages when there is no usnic device.
2015-01-09 11:37:45 +09:00
George Bosilca
1344097d35 Turn OFF the TCP dump mechanism. 2015-01-08 18:50:49 -05:00
George Bosilca
8ddd3b3b09 Cleanup the TCP dump mechanism. 2015-01-08 18:50:05 -05:00
Nathan Hjelm
c65f026fee btl/vader: fix typo in xpmem setup 2015-01-08 12:52:38 -07:00
Gilles Gouaillardet
4c29d8e247 btl/openib: silence warning (unused code) 2015-01-08 17:18:07 +09:00
Gilles Gouaillardet
8ab605d9c5 btl/tcp: fix overflow in mca_btl_tcp_endpoint_dump() 2015-01-08 15:40:16 +09:00
Nathan Hjelm
7d206ae769 btl/ugni: fix a couple of bugs
Two fixes:

 - Do not try to return a mailbox to the free list if one wasn't
   allocated.

 - Do not try to tear down IRQ CQs if they were not created.
2015-01-07 13:48:17 -07:00
Dave Goodell
49069bc661 usnic: fix fi_av_insert (ARP resolution) bugs
We had several problems in the old code:

1. We were specifying an arbitrary timeout (100 ms) and then abandoning
   all remaining pending AV insert operations.  We would then free the
   endpoint buffer that we gave to fi_av_insert(), usually causing
   libfabric's progress thread to write to a freed buffer.

2. We were claiming in a show_help message that the timeout was
   controllable via an MCA parameter.  This commit removes that
   parameter, since there's no good method for us to specify a timeout
   like this to libfabric right now.

3. We also weren't waiting for the correct number of fi_av_insert()
   operations to complete.  We were waiting for nprocs, which is
   accidentally fine for 2 procs on separate hosts, but not for most
   other proc counts.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
2015-01-07 08:25:17 -08:00
Gilles Gouaillardet
06e071454e btl/openib: cleanup duplicate code 2015-01-07 14:07:30 +09:00
Gilles Gouaillardet
135ecce0eb btl/openib: rename OPAL_HAVE_XRCD macro into OPAL_HAVE_CONNECTX_XRC_DOMAINS 2015-01-07 13:27:25 +09:00
Nathan Hjelm
6733d89cf9 btl/vader: fix return code check when opening ptrace_scope file 2015-01-06 15:17:56 -07:00
Nathan Hjelm
cde79bfa60 btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths) 2015-01-06 11:39:23 -07:00
Nathan Hjelm
9bae131589 btl/openib: fix message coalescing
There was a bug in the openib btl handling this valid sequence of
calls:

desc = btl_alloc ();
btl_free (desc);

When triggered the bug would cause either fragment loss or undefined
behavior (SEGV, etc). The problem occured because btl_alloc contained
the logic to modify the pending fragment (length, etc) and these
changes were not corrected if the fragment was freed instead of sent.

To fix this issue I 1) moved some of the coalescing logic to the
btl_send function, and 2) retry the coalesced fragment on btl_free
if it was never sent. This appears to completely address the issue.
2015-01-06 11:39:16 -07:00
Nathan Hjelm
9aaac11648 btl/openib: fix recieve queue source detection 2015-01-06 11:39:11 -07:00
Howard Pritchard
7df648f1cf btl/openib: fix problems from commit b3617e73
For systems with OFED's lacking XRC support, commit b3617e73
broke the build of the openib btl.  This commit addresses
the issues introduced by this commit.
2015-01-06 11:31:12 -07:00
Gilles Gouaillardet
b3617e736e btl/openib: add XRC support with OFED 3.12+
based on an original patch contributed by Bull.
2015-01-06 15:30:52 +09:00
Howard Pritchard
c857cc926c Merge pull request #327 from hppritcha/topic/async_progress
Topic/async progress
2015-01-05 16:20:44 -07:00
Howard Pritchard
0a6f841d5f xpmem/config: simple xpmem search on Cray's
Use the pkg-config related m4 functions to find out where
Cray's xpmem.h and libxpmem are located on a system.

With this commit, there is no longer any need to have to
explicitly indicate an xpmem install location on the configure
line, at least for Cray systems running CLE 4.X and 5.X.
2014-12-24 14:40:06 -07:00
Howard Pritchard
065c756860 btl/ugni: improve error handling
Improve error handling when pthread functions return errors.
Remove stale debug code.
2014-12-24 11:50:24 -07:00
Howard Pritchard
f8e354ce00 btl/ugni: add a request_progress_thread mca param
Replace temporary environment variables with a MCA
parameter for the ugni btl.  A user wishing to
use the ugni btl async. progress thread needs to
set the request_progress_thread param to true.
For example, using env. variable format:

export OMPI_MCA_btl_ugni_request_progress_thread=1
2014-12-24 11:50:24 -07:00
Howard Pritchard
8b250cc15b btl/ugni: more debug cleanup 2014-12-24 11:50:24 -07:00
Howard Pritchard
f0c519517b btl/ugni: switch to using opal_progress
Switch to invoking opal_progress from the async progress
thread, rather than calling ugni btl specific progress.
2014-12-24 11:50:24 -07:00
Howard Pritchard
47747c1b27 btl/ugni: remove some debug output 2014-12-24 11:50:24 -07:00
Howard Pritchard
2d14c2a204 btl/ugni: switch to using tx cq irqs for rdma
Verified via testing with unit tests, etc. that
in fact BTE TX descriptors using CQs configured to
generate IRQs were in fact working correctly on Cray XC.  Disable
send message back to self and just use IRQs generated
by completion of TX descriptors posted to BTE.
2014-12-24 11:50:24 -07:00
Howard Pritchard
acd07d98da btl/ugni: turn off chatty debug in irq cq setup 2014-12-24 11:50:24 -07:00
Howard Pritchard
0dec2f4af7 btl/ugni: mark btl frags for irqs as btl owned
Make sure frags allocated to generate irqs to wake
the progress thread, etc. set the MCA_BTL_DES_FLAGS_BTL_OWNERSHIP
flag.
2014-12-24 11:50:23 -07:00
Howard Pritchard
d188f0bc6f btl/ugni: honor enable_mpi_threads
Honor enable_mpi_threads setting to enable the ugni btl
async progress thread.  If the app doesn't request thread-multiple
the thread will not be created.
2014-12-24 11:50:23 -07:00
Howard Pritchard
43cdcb745f btl/ugni: add missing mutex lock 2014-12-24 11:50:23 -07:00
Howard Pritchard
83bcbd1cf9 btl/ugni: compilation fixes
Fix compilation problems in ugni btl associated with
async progress additions.
2014-12-24 11:50:23 -07:00
Howard Pritchard
13ab8a9e5a btl/ugni: use MCA_BTL_DES_FLAGS_SIGNAL
Use MCA_BTL_DES_FLAGS_SIGNAL frag flag to indicate
whether or not an interrupt needs to be delivered
along with a control message going through smsg.
2014-12-24 11:50:23 -07:00
Howard Pritchard
3fc7b389ff initial async progress changes for gni 2014-12-24 11:50:23 -07:00
Devendar Bureddy
ccafc62c07 OMPI: btl openib: fix max registarable memory caluclation
- by default allow to register maximum possible (i.e 2 * total_memory)
      memory. This beheviour can be turned off using mca parameter
      "btl_openib_allow_max_memory_registration"

    - In fallback case, use device specific parameters to calulate
      memory limit.
2014-12-23 23:35:54 +02:00
Howard Pritchard
ffbf9738a3 btl/vader: disable SGI UV xpmem for now
This commit allows master to build again on SGI UV systems.

Fixes #322
2014-12-23 12:04:25 -07:00
Rolf vandeVaart
26482db736 Bump up max send size. Gives much better performance for GPU transfers while only decreasing host transfers by a small amount. 2014-12-18 13:22:58 -08:00
Jeff Squyres
c621d1e622 libfabric: don't LIBADD the common library in the static case
Adding the libfabric common library in the --disable-dlopen case will
result in duplicate symbols.
2014-12-18 11:04:08 -08:00
Jeff Squyres
269d7f9713 openib: don't use opal_using_threads() in component_init
Use the flag that was passed in, instead.
2014-12-17 15:08:43 -08:00
Jeff Squyres
d6f059f538 configury: add some descriptive output messages in configure
Ensure that the ofi MTL and the usnic BTL have good descriptive output
messages in configure.
2014-12-17 13:36:01 -08:00
Rolf vandeVaart
f55de452ab Change the way we register the sm memory pool with CUDA. Rather than just registering local free lists, register the entire pool as the local process does not know which memory the remote processes are using for free lists. Fixes performance problem we were seeing with copying out of memory (since host piece was not pinned). 2014-12-17 14:21:34 -05:00
George Bosilca
830df07202 Fix the indentation. 2014-12-16 16:07:42 -05:00
George Bosilca
146ab96e29 These variables are now unnecessary. 2014-12-16 16:05:00 -05:00
Aurélien Bouteiller
ee3b090316 The fallback case when yama is not installed was not correct in CMA vader 2014-12-16 14:39:14 -05:00
Aurélien Bouteiller
0bf860ef02 indentation 2014-12-16 14:22:26 -05:00
Jeff Squyres
95da4a5a0e usnic: no longer use opal_using_threads()
Instead, use the flag that is passed in.
2014-12-16 08:49:01 -08:00
George Bosilca
2fec570fe7 There is no need to keep track of these events. They are scheduled
as triggers in libevent, so one bookkepping should be enough.
2014-12-15 22:35:29 -05:00
George Bosilca
46baab350c The event is automatically deleted by default. 2014-12-15 21:59:20 -05:00
George Bosilca
b01abfa0d7 Don't over-do it! 2014-12-15 21:33:32 -05:00
George Bosilca
f87a4b691b Solve another handshake problem, where one threads was calling del_event
while cleaning up after receiving a zero byte on the connect socket
(localyy started connection), while another was trying to accept a
new connection from the same peer. Create a zero-timed event and
delocalize the accept into a timer_event.
Add support for registering an error callback, that can be used when a
connection is discovered as failed during the initialization process.
2014-12-15 20:27:32 -05:00
George Bosilca
e20413c885 Rearrange the code to remove a compiler complaint about
the missing return from a non-void function.
2014-12-15 15:42:57 -05:00
George Bosilca
2edbe16c47 Add the necessary infrastructure to allow the dumping of all TCP
informations related to an endpoint (status and all pending fragments).
Do some minor space cleanup.
2014-12-13 01:59:55 -05:00
George Bosilca
5b8616d890 Fix the race condition in endpoint connection initialization. The race
was quite subtle, and only happened on the process with the smallest
guid (as this process will tear down the connection created locally and
replace it with the result of accept). If multiple threads are active in
the system, the deadlock occurs during the recv event deletion as one
thread will hold the recv event lock of the endpoint and try to access
the TCP event base lock, while the other thread will hold the TCP event
base lock while trying to access the recv event lock (in case data is
available on the socket).

The proposed solution let the event callback fail to process the data,
preventing the deadlock and allowing the other thread to always complete
it's job. As the event is not execute the same triggered will trigger
again at the next opportunity, so this solution introduce a minimal
delay in the connection establishement.
2014-12-13 01:45:00 -05:00
Nathan Hjelm
38d66272c5 btl/vader: fix compile on SGI UV 2014-12-12 09:09:01 -07:00
Jeff Squyres
cd0a54d76f usnic: short term fix to enable builds on non-libfabric platforms
This isn't quite the Right fix yet, because it doesn't address usnic
for external libfabric builds.  I'll fix that separately / later.
2014-12-09 09:19:26 -08:00
Jeff Squyres
6e24a1eb85 usnic: update for libfabric API change
Use FI_ADDR_UNSPEC for posting a receive from an unspecified source.
2014-12-09 06:06:52 -08:00
Jeff Squyres
9547345b18 usnic: fix show_help message
Rename a few symbols to use libfabric-friendly names.  Fix a show_help
message when fi_av_insert times out.
2014-12-08 11:39:07 -08:00
Jeff Squyres
8e49cc754f usnic: update to latest libfabric API changes 2014-12-08 11:37:37 -08:00
Jeff Squyres
984982790a usnic: convert from verbs to libfabric (yay!)
This commit represents the conversion of the usnic BTL from verbs to
libfabric.

For the moment, libfabric is embedded in Open MPI (currently in the
usnic BTL).  This is because the libfabric API is still changing, and
also has not yet been released.  Ultimately, this embedded copy of
libfabric will likely disappear and the usnic BTL will rely on an
external installation of libfabric.

New configure options:

* --with-libfabric: will cause configure to fail if libfabric support
    cannot be built
* --without-libfabric: will prevent libfabric support from being built
* --with-libfabric=DIR: use an external libfabric installation
* --with-libfabric-libdir=LIBDIR: when paired with --with-libfabric=DIR,
    use LIBDIR for the libfabric installation library dir

The --with-libnl3[-libdir] arguments are now gone.
2014-12-08 11:37:37 -08:00
Nathan Hjelm
f989fe27b8 btl/vader: workaround to make jenkins happy 2014-12-03 15:51:58 -07:00
Todd Kordenbrock
c0c680bccb Portals4 BTL: Do not disqualify if a peer does not put Portals4 BTL modex info
If OPAL_MODEX_RECV() returns OPAL_ERR_NOT_FOUND, the peer didn't
send any Portals4 BTL info.  This is not a fatal error.  Instead of
disqualifying the Portals4 BTL just ignore that peer.

@jsquyres reported this in #194.
2014-12-03 14:22:10 -06:00
George Bosilca
dee243c58d ompi_proc_finalize has an interesting side effect. A proc is
inserted in the ompi_proc_list as soon as it is created and it
is removed only upon the call to the destructor. In ompi_proc_finalize
we loop over all procs in ompi_proc_finalize and release them once.
However, as a proc is not removed from this list right away, we
decrease the ref count for each proc until it reach zero and the
proc is finally removed. Thus, we cannot clean the BML/BTL after
the call the ompi_proc_finalize.
A quick fix is to delay the call to ompi_proc_finalize until all
other frameworks have been finalized, and then the behavior
depicted above will give the expected outcome.
2014-11-28 18:26:36 -05:00
Gilles Gouaillardet
a6744b8177 fix misc memory leaks specific to the master 2014-11-25 13:52:10 +09:00
Gilles Gouaillardet
758f7ab768 Revert "btl/vader: use FRAG_ALLOC_USER when single_copy_mechanism is VADER_NONE"
as discussed with @hjelmn in open-mpi/ompi-release#86

This reverts commit d2d7f39a4b.
2014-11-20 16:04:55 +09:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6, reversing
changes made to 6a19bf85dd.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
b1f9569b7d Revert "btl/openib: fix warnings"
This reverts commit 6e6c786b49.
2014-11-19 23:16:16 -07:00
Nathan Hjelm
6e6c786b49 btl/openib: fix warnings 2014-11-19 15:57:01 -07:00
Nathan Hjelm
2b579610f2 btl/openib: fix compilation issues with XRC 2014-11-19 11:44:48 -07:00
Nathan Hjelm
2a382c2ec1 add btl comment 2014-11-19 11:33:04 -07:00
Nathan Hjelm
bf7daac388 btl/openib: add atomic operation support 2014-11-19 11:33:04 -07:00
Nathan Hjelm
45d1fac8af ugni thread safety fixes 2014-11-19 11:33:03 -07:00
Nathan Hjelm
5e7c77c576 btl/ugni: add support for atomic operations 2014-11-19 11:33:03 -07:00
Nathan Hjelm
90554d0f95 btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths) 2014-11-19 11:33:03 -07:00
Nathan Hjelm
4122067236 btl/openib: fix message coalescing 2014-11-19 11:33:03 -07:00
Nathan Hjelm
38e9611930 btl/openib: fix recieve queue source detection 2014-11-19 11:33:03 -07:00
Nathan Hjelm
7c43b566d2 more openib updates 2014-11-19 11:33:03 -07:00
Nathan Hjelm
3ea10476a4 btl/sm fix compilation when not using CMA or KNEM 2014-11-19 11:33:03 -07:00
Nathan Hjelm
4ccb20b097 btl: fix warning about enum type and modify btl_sendi to allow the
value NULL for the descriptor

The send inline optimization uses the btl_sendi function to achieve
lower latency and higher message rates. The problem is the btl_sendi
function was allowed to return a descriptor to the caller. This is fine
for some paths but not ok for the send inline optimization. To fix
this the btl now must be able to handle descriptor = NULL.
2014-11-19 11:33:03 -07:00
Nathan Hjelm
2a70238f4d First crack at adding atomic operation support 2014-11-19 11:33:03 -07:00
Nathan Hjelm
249e5e009f Fix knem support in both sm and vader 2014-11-19 11:33:02 -07:00
Nathan Hjelm
e03956e099 Update the scif and openib btls for the new btl interface
Other changes:
 - Remove the registration argument from prepare_src since it no
   longer is meant for RDMA buffers.

 - Additional cleanup and bugfixes.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
ec33374339 btl: remove des_remote/des_remote_count from the mca_btl_base_descriptor_t
structure

This structure member was originally used to specify the remote segment
for an RDMA operation. Since the new btl interface no longer uses
desriptors for RDMA this member no longer has a purpose. In addition
to removing these members the local segment information has been
renamed to des_segments/des_segment_count.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
2d381f800f Update the interface to provide a cleaner interface for RDMA operations.
The old BTL interface provided support for RDMA through the use of
the btl_prepare_src and btl_prepare_dst functions. These functions were
expected to prepare as much of the user buffer as possible for the RDMA
operation and return a descriptor. The descriptor contained segment
information on the prepared region. The btl user could then pass the
RDMA segment information to a remote peer. Once the peer received that
information it then packed it into a similar descriptor on the other
side that could then be passed into a single btl_put or btl_get
operation.

Changes:

 - Removed the btl_prepare_dst function. This reflects the fact that
   RDMA operations no longer depend on "prepared" descriptors.

 - Removed the btl_seg_size member. There is no need to btl's to
   subclass the mca_btl_base_segment_t class anymore.

...

Add more
2014-11-19 11:33:02 -07:00
Nathan Hjelm
cfbb9cba16 btl/vader: don't assume the address in the put/get segment is unmodified when
using knem

It is valid to modify the remote segment that will be used with the
btl put/get operations as long as the resulting address range falls in
the originally prepared segment. Vader should have been calculating the
offset of the remote address in the registered region. This commit
fixes this issue.
2014-11-12 10:12:52 -07:00
Gilles Gouaillardet
b088175705 btl/vader: fix a typo in mca_btl_vader_put_knem 2014-11-12 19:00:00 +09:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Gilles Gouaillardet
d2d7f39a4b btl/vader: use FRAG_ALLOC_USER when single_copy_mechanism is VADER_NONE 2014-11-10 17:02:45 +09:00
Howard Pritchard
5c08aa8552 enable ugni btl to work without disable-dlopen
There were mistakes in the Makefiles for the ugni btl and
mca/common/ugni that prevented the ugni btl from being
used unless one happened to set the --disable-dlopen option
on the config line.

This commit fixes this problem.
2014-11-09 15:19:47 -07:00
rolfv
022612c83b Missed a removal from previous commit 2014-11-07 11:08:41 -08:00
rolfv
cbb43d5ac3 Make sure initialization happens 2014-11-07 11:00:45 -08:00
George Bosilca
8da5dcc22e Don't release the provided opal_proc in the error path. 2014-11-06 08:42:23 -08:00
Gilles Gouaillardet
e269a52ac7 btl/openib: send openib modex with the PMIX_GLOBAL flag 2014-11-06 08:42:23 -08:00
Steve Wise
7316a88754 openib btl: add Soft iWARP device to the ini file
This enables IBM's software iWARP provider.  With this driver you can
run iWARP/RDMA over any ethernet NIC.  Useful for testing OMPI RDMA
logic without requiring an expensive RDMA adapter/infrastructure.

The Soft iWARP code is at: https://www.gitorious.org/softiwarp
2014-11-04 14:48:43 -08:00
Gilles Gouaillardet
76ee98c86a btl/scif: start the listening thread once only 2014-10-31 16:34:02 +09:00
Gilles Gouaillardet
b4e445afb5 btl/sm: fix a typo in the error message 2014-10-28 11:25:42 +09:00
rolfv
9134f48d4c Do not use sendi path with GPU buffer 2014-10-24 13:35:01 -07:00
Nathan Hjelm
d72fc7a05f btl/vader: more updates to the help messages 2014-10-23 08:48:54 -06:00
Gilles Gouaillardet
55a5c99ff0 btl/vader: fix typos in the help file 2014-10-23 19:28:09 +09:00
Nathan Hjelm
e1bc2de853 btl/vader: defensive programming: use an actual function for the dummy btl_get and btl_put 2014-10-22 14:57:55 -06:00
Nathan Hjelm
19fbe868b8 btl/sm: defensive programming: use an actual function for the dummy btl_get 2014-10-22 14:57:55 -06:00
Aurélien Bouteiller
f232e94c02 Merge branch 'master' of github.com:open-mpi/ompi 2014-10-22 16:56:06 -04:00
Aurélien Bouteiller
55e49470de Patch from Nathan outlined with a crash the mishandling of the case where CMA is requested but not available. 2014-10-22 16:55:18 -04:00
Nathan Hjelm
998e69a6fa btl/sm: add some protection for the use_knem = -1 case
Need to unset the dummy btl_get and remove the MCA_BTL_FLAGS_GET flag
if neither knem nor cma can be used.
2014-10-22 13:57:01 -06:00
Nathan Hjelm
d7c7bb3993 btl/sm: re-enable the use of CMA and knem
At some point we added a sanity check to the btl base to ensure that
the btl flags match the available functions (this prevents user's from
specifying get or put when no function exists). This check was
disabling get for the sm btl since at the time of the check there is
no btl_get function. The simplest fix is to set a dummy value to btl_get
that will be overwritten with the proper value on btl initialization.

Closes #239.
2014-10-22 13:30:27 -06:00
Jeff Squyres
ec4268b59c usnic: do not send zero-length modex message
If there are no usnic BTL modules, then just avoid sending any modex
message at all (other BTLs do this; it's safe to do).

The change is smaller than it looks: I added a "if 0 ==..." check at
the top to return immediately if there are no BTL modules.  Then I
removed some now-unnecessary conditionals and un-indented as
appropriate.

Fixes #248
2014-10-22 11:11:58 -07:00
Jeff Squyres
e415c8f9a8 vader: Remove stale comment 2014-10-22 10:32:33 -07:00
Jeff Squyres
c22e1ae33b configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros
These two macros set the prefix for the OPAL and ORTE libraries,
respectively.  Specifically, the OPAL library will be named
libPREFIXopen-pal.la and the ORTE library will be named
libPREFIXopen-rte.la.

These macros must be called, even if the prefix argument is empty.

The intent is that Open MPI will call these macros with an empty
prefix, but other projects (such as ORCM) will call these macros with
a non-empty prefix.  For example, ORCM libraries can be named
liborcm-open-pal.la and liborcm-open-rte.la.

This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL.  For example,
when running MPI applications under ORTE, if the ORTE and OPAL
libraries between OMPI and ORCM are not identical (which, because they
are released at different times, are likely to be different), we need
to ensure that the OMPI applications link against their ORTE and OPAL
libraries, but the ORCM executables link against their ORTE and OPAL
libraries.
2014-10-22 10:32:19 -07:00
Gilles Gouaillardet
75e8387a4e vader: vader_add_procs report the error if init_vader_endpoint fails 2014-10-22 19:11:54 +09:00
Nathan Hjelm
1a3734ae57 btl/vader: fix compilation on OS X 2014-10-21 09:27:36 -06:00
Gilles Gouaillardet
f56169cee6 btl/vader: silence warning
correctly check HAVE_SYS_PRCTL_H
2014-10-21 19:51:29 +09:00
Gilles Gouaillardet
d60f0cbd88 btl/vader: report an error when a segment cannot be attached 2014-10-21 10:42:22 +09:00
Nathan Hjelm
13643f5b6e btl/vader: improved single-copy support
This commit makes the folowing changes:

 - Add support for the knem single-copy mechanism. Initially vader will only
   support the synchronous copy mode. Asynchronous copy support may be added
   int the future.

 - Improve Linux cross memory attach (CMA) when using restrictive ptrace
   settings. This will allow Open MPI to use CMA without modifying the system
   settings to support ptrace attach (see /etc/sysctl.d/10-ptrace.conf).

 - Allow runtime selection of the single copy mechanism. The default behavior
   is to use the best available. The priority list of single-copy mehanisms is
   as follows: xpmem, cma, and knem.

 - Allow disabling support for kernel-assisted single copy.

 - Some tuning and bug fixes.
2014-10-20 11:44:52 -06:00
Aurélien Bouteiller
e3be1fb9a5 Quick pass over the sm-knem code, indent fixes 2014-10-17 10:38:35 -04:00
Jeff Squyres
43aff4d8b3 btl sm: error if knem support is requested and cannot be activated
Restore the functionality to error out (and show a helpful message) if
knem support is requested by is either not compiled in or cannot be
activated.

Thanks to Gus Correa for bringing the matter to our attention.
2014-10-16 20:01:26 -07:00
Jeff Squyres
b04a2634c6 btl sm: restore btl_sm_have_knem_support MCA param
Somehow, this MCA param was accidentally dropped after v1.6.5.  Thanks
to Gus Correa for bringing this matter to our attention.

Also moving some MCA params down from level 9 to levels 4/5.
2014-10-16 19:48:21 -07:00
George Bosilca
7541c03b4c Mark all instances where atomic operations are used but their return value is unnecessary 2014-10-15 21:47:32 -04:00
Jeff Squyres
51027a6635 usnic: fix minor typo
Change harmless-but-weird comma to semicolon.  Found during code
review.
2014-10-15 05:32:36 -07:00
Nathan Hjelm
a31cf3b740 btl/vader: missing include 2014-10-09 13:57:21 -06:00
Nathan Hjelm
9e0c07e4ce btl/ugni: improve the handling of eager get fragments when the btl runs out
of preregistered buffers

Before this change eager gets we retried on each progress loop. This commit
modifies the protocol to only retry eager gets when another eager get has
completed. This commit also cleans up some callback code that is no longer
needed.
2014-10-09 13:57:21 -06:00
Howard Pritchard
ebc368d26b remove GNI_RDMAMODE_FENCE bit in GNI_PostRdma
The GNI_RDMAMODE_FENCE bit was a left over from
async progress work that is not needed at this point
in the gni BTL.  Removing the bit also allows
for the removal of the GNI_CDM_MODE_BTE_SINGLE_CHANNEL
bit from the GNI_CdmCreate call.
2014-10-09 12:41:19 -06:00
Howard Pritchard
9947758d98 initial thread safety for ugni btl
This commit adds initial ugni thread safety support.
With this commit, sun thread tests (excepting MPI-2 RMA)
pass with various process counts and threads/process.
Also osu_latency_mt passes.
2014-10-08 10:13:22 -06:00
Ralph Castain
fd6a044b7f Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages.
Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.
2014-10-03 16:02:57 -06:00
Howard Pritchard
1df933ea27 remove ompi/runtime/params.h include in ugni btl
This commit was SVN r32813.
2014-09-29 19:26:33 +00:00
Nathan Hjelm
e0eb1f2e73 btl/vader: make vader registration lookup/caching thread safe
This commit was SVN r32798.
2014-09-25 22:24:06 +00:00
George Bosilca
53e012ae97 Fix typo.
This commit was SVN r32795.
2014-09-25 17:18:27 +00:00
Nathan Hjelm
aba87f3776 btl/vader:silence warning
This commit was SVN r32788.
2014-09-24 22:10:23 +00:00
Nathan Hjelm
79881ca892 btl/vader: prevent double-destruction of endpoints and move endpoint teardown code into destructor
This commit was SVN r32779.
2014-09-23 21:51:15 +00:00
Nathan Hjelm
2d8fba0861 btl/vader: silence warning
This commit was SVN r32778.
2014-09-23 21:33:45 +00:00
Nathan Hjelm
8bd3160432 btl/vader: fix several typos in vader update
This commit was SVN r32775.
2014-09-23 20:25:36 +00:00
Nathan Hjelm
12bfd13150 btl/vader: improve performance for both single and multiple threads
This is a large update that does the following:

 - Only allocate fast boxes for a peer if a send count threshold
   has been reached (default: 16). This will greatly reduce the memory
   usage with large numbers of local peers.

 - Improve performance by limiting the number of fast boxes that can
   be allocated per peer (default: 32). This will reduce the amount
   of time spent polling for fast box messages.

 - Provide new MCA variables to configure the size, maximum count,
   and send count thresholds for fast boxes allocations.

 - Updated buffer design to increase the range of message sizes that
   can be sent with a fast box.

 - Add thread protection around fast box allocation (locks). When
   spin locks are available this should be updated to use spin locks.

 - Various fixes and cleanup.

This commit was SVN r32774.
2014-09-23 18:11:22 +00:00
Vasily Filipov
e26af91a64 BTL/OPENIB: set "max_lmc" param to be "1" and not "all available values" by default.
cmr=v1.8.3:reviewer=miked 

This commit was SVN r32736.
2014-09-15 13:56:41 +00:00
Alex Mikheev
31d0724a08 OMPI: btl openib: fix detection of max registarable memory
Deal with the case when mlx4 module is loaded but device
is not present

cmr=v1.8.3:reviewer=miked

This commit was SVN r32734.
2014-09-15 12:17:23 +00:00
Howard Pritchard
e43715574a remove ignored restrct return type qualifier
The use of restrict in the return type qualifier for mca_btl_vader_reserve_fbox
is being ignored by gnu compiler.  for newer gcc, one sees this warning only
with -Wignored-qualifiers set, but for older variants of gcc it was reported
that numerous warning messages about this ignored qualifier were being
generated as vader is being compiled.

The warning reported by gcc is

btl_vader_fbox.h:53:47: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
 static inline mca_btl_vader_fbox_t * restrict mca_btl_vader_reserve_fbox (struct mca_btl_base_endpoint_t *ep, const size_t size)

This commit was SVN r32714.
2014-09-11 21:12:41 +00:00
Ralph Castain
cb2ad98f57 Silence an unused function warning
This commit was SVN r32704.
2014-09-10 17:36:34 +00:00
Ralph Castain
a7c5b77d70 Just because the openib BTL can't reach a process doesn't mean it is a job-ending error. If we have other methods for reaching the process (e.g., sm for a local proc), then that's okay. If there is no method for reaching a proc, then that's an error - but the BML will report that situation.
The question of whether or not the openib BTL supports loopback is a separate question. It may be more appropriate to make the modex be PMIX_GLOBAL for cases where openib can support loopback so someone can run without a shared memory component. I'll leave that decision to the IB vendors.

This commit was SVN r32702.
2014-09-10 17:02:16 +00:00
Ralph Castain
3fed455bbc If something goes wrong in add_procs, let's not segfault during finalize
This commit was SVN r32665.
2014-09-03 17:27:31 +00:00
Ralph Castain
9ac75451ff Nathan had requested this before as he needs to know the #procs in the job to optimize the UGNI btl. Add the fetch for that data - the native pmix component already provides it, but ensure the Slurm PMI-1 support does too. If not found, fall back to the non-optimized number
This commit was SVN r32648.
2014-08-29 22:53:35 +00:00
Gilles Gouaillardet
6916bfc368 btl/openib: fix use of mca_btl_openib_component.default_recv_qps
- do not have mca_btl_openib_component.default_recv_qps point to the stack
- do not reset mca_btl_openib_component.default_recv_qps in btl_openib_component_open

cmr=v1.8.3:reviewer=miked

This commit was SVN r32642.
2014-08-29 04:41:34 +00:00
Gilles Gouaillardet
b8a2e90f2d btl/openib: fix a typo
cmr=v1.8.3:reviewer=miked

This commit was SVN r32639.
2014-08-29 04:23:42 +00:00
Jeff Squyres
733316372b usnic: remove suggestion of enabling no-drop in the fabric
Reviewed by Reese Faucette

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32628.
2014-08-28 23:56:56 +00:00
Jeff Squyres
b0dfb9f401 usnic: avoid a possible race condition
Per #4874, code review revealed a possible race condition in the
module struct and the connectivity agent.  Move the setup of the
connectivity agent listener until the module struct has been fully
setup.

This commit was SVN r32573.
2014-08-22 02:34:24 +00:00
Jeff Squyres
a896f90712 btl_base_select: fix faulty/incorrect show_help message
When no components were able to be found, btl_base_select() was
showing the wrong help message -- one that indicated that a specific
component could not be found.  And it left off a string argument, so
the end of the help message was garbage.

This commit creates a new help message for this case and updates the
show_help call to use the new message.

This commit was SVN r32572.
2014-08-22 01:53:38 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Mike Dubman
c3beb0472e openib/btl: better detect max reg memory. OFED has no runtime versioning API :(
based on http://www.open-mpi.org/community/lists/users/2014/08/25048.php

reviewed by AlexM
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32569.
2014-08-21 12:12:43 +00:00
Gilles Gouaillardet
cfc0773c8c btl/scif: use safe syntax
Thanks to Ashley Pittman for pointing the modex should now be zero'ed

cmr=v1.8.2:ticket=trac:4871

This commit was SVN r32568.

The following Trac tickets were found above:
  Ticket 4871 --> https://svn.open-mpi.org/trac/ompi/ticket/4871
2014-08-21 10:32:05 +00:00
Mike Dubman
acd5a9acac udcm: psn should be 24 bit, new OFED actually checks and fails if it is not 24 bit.
This commit was SVN r32567.
2014-08-21 08:49:43 +00:00
Gilles Gouaillardet
3c1944054e btl/scif: use safe syntax
PGI compilers 2013 and older do not support the following syntax :
mca_btl_scif_modex_t modex = {.port_id = mca_btl_scif_module.port_id};
so split it on two lines

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32555.
2014-08-20 02:48:47 +00:00
Jeff Squyres
ac7c907f8d usnic: ensure to have a safe destruction of an opal_list_item_t
It turns out that we ''can'' get to the endpoint destructor with the
endpoint still on the "endpoints needing ACKs" list.  So if it's on
the list, remove it first, and then DESTRUCT the opal_list_item_t.

This prevents an assert() fail in debug builds.  We'd like to let this
soak over the weekend.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32546.
2014-08-15 21:52:36 +00:00
Jeff Squyres
1cdcb7290b usnic: no need to check before calling this function
This function is intentionally always safe to call -- no need for a
double redundant check.

This commit was SVN r32545.
2014-08-15 21:39:29 +00:00
Jeff Squyres
082ab15d19 usnic: increase the listen() backlog size
Rarely -- but it happens -- the connectivity client gets ECONNREFUSED
because the connectivity agent listen() backlog is too small.  Rather
than put in a loop on the client side, take the simple way out for
now: increase the backlog size to an arbitrarily-large number.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32543.
2014-08-15 19:12:18 +00:00
Jeff Squyres
9373d6420e usnic: when a module is finalized, "unlisten" the connectivity checker
Instead of waiting to destroy the connectivity agent during component
shutdown, have the module shutdown send an "unlisten" command to the
cagent that will tell it to stop listening on a given interface.

This commit was SVN r32536.
2014-08-15 00:52:43 +00:00
Jeff Squyres
6b592d3016 usnic: convert some BTL_ERRORs to more descriptive show_help messages
1. After we receive N abnormally-short messages (meaning: corrupted),
print a show_help message about it.  N defaults to 25.  N can be set
to 0 disable the message via btl_usnic_max_short_packets.
1. If we receive a completion error for something other than a
receive, display a show_help message.

Reviewed by Dave Goodell.

CMR'ing to v1.8.3, but it will require a custom patch because of the
OMPI->OPAL BTL move.

cmr=v1.8.3

This commit was SVN r32522.
2014-08-13 15:01:20 +00:00
Mike Dubman
5b90af601c btl/openib: add missing definition for ConnectX3 card
This commit was SVN r32521.
2014-08-13 13:56:34 +00:00
Rolf vandeVaart
37dc9477d0 As requested in RFC and discussed at weekly meeting, change default setting of ibv_fork_init() to off.
Link to RFC: http://www.open-mpi.org/community/lists/devel/2014/07/15393.php

cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32514.
2014-08-12 20:57:11 +00:00
Rolf vandeVaart
fb835a3b04 Needs to be a plus.
This commit was SVN r32513.
2014-08-12 20:39:00 +00:00
Rolf vandeVaart
c53c981506 Fix initialization and cleanup code for CUDA-aware code. Eliminates all resource leaks.
This commit was SVN r32512.
2014-08-12 19:41:46 +00:00
Howard Pritchard
5bbcf5f7fb remove ompi comm related constructs from ugni btl
There were remaining references to size of MPI_COMM_WORLD,
etc. in ugni btl which prevented building of opal library
following btl move to opal.

Eventually another mechanism will need to be found for
providing hints to BTLs about how to setup internal
resources relevant to max. number possible endpoints, etc.

Conflicts:
	opal/mca/btl/ugni/btl_ugni_component.c

This commit was SVN r32507.
2014-08-11 16:15:39 +00:00
George Bosilca
de7191132d Remove few warnings.
This commit was SVN r32506.
2014-08-11 13:34:44 +00:00
Gilles Gouaillardet
f24699623f check-help-strings cleanup
This commit was SVN r32495.
2014-08-11 03:25:22 +00:00
George Bosilca
beec6b4b4b Remove a lost #include.
This commit was SVN r32478.
2014-08-08 23:42:40 +00:00
Howard Pritchard
1e02bb056f openib btl check-help-strings cleanup
This commit was SVN r32470.
2014-08-08 20:40:18 +00:00
Jeff Squyres
65767aff68 usnic: remove errant OMPI header file
This commit was SVN r32469.
2014-08-08 20:34:50 +00:00
Howard Pritchard
a1f6ecf1e6 initial fixes for ugni btl move to opal
This commit was SVN r32466.
2014-08-08 18:02:46 +00:00
Jeff Squyres
323b9f346c usnic: update connectivity checker help message
Show an example of using the btl_usnic_connectivity_map option.  Also,
mention that another reason for the "total connectivity failure" may
be due to asymmetric / unexpected routing.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32465.
2014-08-08 17:18:29 +00:00
Rolf vandeVaart
909cfa35aa Fix some error messages.
This commit was SVN r32458.
2014-08-08 14:33:46 +00:00
Jeff Squyres
040989edde btl_sm_component.c: update outdated topic for call to show_help
This commit was SVN r32456.
2014-08-08 13:35:52 +00:00
Jeff Squyres
2d2534a1bc btl_openib_component.c: remove inline logic and use 2 calls to show_help
The contrib/check-help-strings.pl gets confused if the topic is an
inline logic check, so separate it into two calls to show_help.

This commit was SVN r32455.
2014-08-08 13:35:29 +00:00
Jeff Squyres
b0897031f0 btl vader: removed unused helpfile
vader does not invoke show_help() at all -- all the topics in this
helpfile were copied from the original sm btl helpfile.

This commit was SVN r32454.
2014-08-08 13:35:05 +00:00