1
1
Граф коммитов

902 Коммитов

Автор SHA1 Сообщение Дата
bosilca
ab68aced23 Merge pull request #3738 from bosilca/topic/tcp_event_count
Fix the TCP performance impact when BTL not used
2017-09-19 23:08:58 -04:00
Brian Barrett
bffcc3bca0 util: move graph solver from usnic to util
Cisco wrote a bipartite graph solver to properly solve
interface pair selection for usNIC.  Using the reachable
framework, the TCP BTL (and possibly the runtime network
code) can use the graph solver to make more optimal pair
selection.  Jeff was happy to have the code more broadly
used, but didn't have time to do the move, hence this
commit.

There are a couple of minor changes to the code compared
to the usNIC version.  Obviously, the functions have
been renamed to match naming convention for their new
home.  Since it's easier to write unit tests for
util/ code, the unit tests have been made first class
tests run at "make check" time.  This last bit required
moving some of the definitions into a new header,
bipartite_graph_internal.h, so that they could be
included in both the library code and the test code.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2017-09-15 15:08:47 -07:00
Nathan Hjelm
0851122cce btl/openib/udcm: add support for connection across subnets
This commit adds the code necessary to support forming connections across
subnets. The primary changes are to 1) add the gid to the modex, and 2)
use the gid to create the address handle.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2017-09-14 06:42:06 -10:00
George Bosilca
d10522a01c
Set a hard limit on the TCP max fragment size.
Some OSes have hardcoded limits to prevent overflowing over an int32_t.
We can either detect this at configure (which might be a nicer but
incomplete solution), or always force the pipelined protocol over TCP.
As it only covers data larger than 1GB, no performance penalty is to be
expected.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-01 18:52:48 -04:00
George Bosilca
c340da2586
A first cut at the large data problem with TCP. As long
as the writev and readv support a sum larger than a uint32_t
this version will work. For the other OSes a different patch
is required. This patch is a slight modification of the one
proposed by @ggouaillardet.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-01 18:52:48 -04:00
Josh Hursey
ad87aa2674 Merge pull request #4121 from jjhursey/explore/dlopen-local
mca: Dynamic components link against project lib
2017-08-25 13:15:51 -05:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la"
```

Note: The changes in this commit were automated by the script in
the commit that proceeds it with the `libadd_mca_comp_update.py`
script. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
George Bosilca
50f471e31e
Cleanup a set of warnings reported by Ralph.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-08-22 23:00:18 -04:00
Ralph Castain
38e363c515 Fix the #if check for hwloc version
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 14:07:36 -07:00
Brian Barrett
c667719a3f Merge pull request #3955 from mohanasudhan/master
Btl tcp: Improved diagnostic output and failure mode
2017-08-18 11:42:27 -07:00
Mohan
fc32ae401e Btl Tcp: Updated tcp handshake methods
This commit has two changes

1. Adding magic string during handshake can cause
issue when used with older version of MPI. Hence set
RCVTIMEO paramter to 2 second
2. Using single call during handshake instead of
two calls

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-18 10:06:52 -07:00
Mohan
e3dfe11da9 Btl tcp: Improving verbose around tcp
As part of improvement towards tcp btl we
are improving verbose in general

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 17:22:16 -07:00
Mohan
4bc7b214dc Btl tcp: Improving verbose around IPV6
As part of improvement around tcp btl debugging
& verbose. we are improving verbose around IPV6

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:14 -07:00
Mohan
0741fad479 Btl tcp: BTL_ERROR to show_help & update func behaviour
As part of improvement towards tcp debugging
we are moving few BTL_ERROR to show_help and also
update the function behaviour of
mca_btl_tcp_endpoint_complete_connect to return
SUCCESS and ERROR cases.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:14 -07:00
Mohan
368f9f0dfc Btl tcp: Using magic string to verify mpi connection
As part of improvement towards handling failure case
in btl tcp we are using magic string to verify mpi
connection. In case if there is mismatch or missing
magic string we can identify that we are trying to
connect with someother process.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:13 -07:00
Mohan
c30a42917c Btl tcp: Refactoring non-blocking send/receive function
Moving non-blocking send/receive function to btl_tcp
will help reusing these function where ever needed.
In this case we plan to reuse receive function to
retrive magic string to validate established connection
is from mpi process.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:13 -07:00
Jeff Squyres
a591159fb4 btl/usnic: restore configure usNIC summary line
Not sure how/when this got deleted, but put back the "Cisco usNIC"
line in the transport summary at the end of configure.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-16 12:37:59 -07:00
Jeff Squyres
6889948475 Merge pull request #4058 from thananon/pr/usnic_fix_credit
btl/usnic: assign the number of send credit correctly.
2017-08-09 11:46:42 -04:00
Thananon Patinyasakdikul
68658e4bab btl/usnic: assign the number of send credit correctly.
usnic endpoints was always created with default send credit value of 8. This
commit assign the correct number from the hardware instead.

Signed-off-by: Thananon Patinyasakdikul <apatinya@cisco.com>
2017-08-08 17:01:16 -07:00
Nathan Hjelm
76320a8ba5 opal: rename opal_atomic_init to opal_atomic_lock_init
This function is used to initalize and opal atomic lock. The old name
was confusing.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-08-07 14:15:11 -06:00
Joshua Hursey
196b314643 btl/sm: Missing argv header
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-04 21:10:49 -04:00
Howard Pritchard
5ce07a6983 Merge pull request #3997 from hppritcha/topic/swat_compiler_warning
btl/ugni: swat compiler warning
2017-08-02 15:44:09 -06:00
Ralph Castain
f39ce67982 Merge pull request #3951 from rhc54/topic/hwloc2
Update to hwloc 2.0.0a
2017-08-01 15:18:31 -06:00
Howard Pritchard
12a5aacdfd btl/ugni: swat compiler warning
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2017-08-01 12:21:57 -06:00
Jeff Squyres
d954167ecf Merge pull request #3881 from bharatpotnuri/master
master: btl/openib: Handle EOPNOTSUPP
2017-07-26 11:32:40 -04:00
Ralph Castain
7a83fdb9bb Update to hwloc 2.0.0a with shmem support.
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 20:26:22 -07:00
Gilles Gouaillardet
60aa9cfcb6 hwloc: add support for hwloc v2 API
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:44 +09:00
Gilles Gouaillardet
9f29f3bff4 hwloc: since WHOLE_SYSTEM is no more used, remove useless
checks related to offline and disallowed elements

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:21 +09:00
Nathan Hjelm
e5343c16c0 btl/vader: remove debug code that should not be in a release
References #3902. Close when in master, v3.0.x, and v2.x.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-17 11:58:47 -05:00
Gilles Gouaillardet
6e35cfc19a btl/sm: fix misc memory leak
as reported by Coverity with CID 1415105

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-16 13:02:55 +09:00
Jeff Squyres
5cf64e6555 btl/sm: effectively delete the SM BTL
If a user explicitly asks for the "sm" BTL, print a show_help message
saying that the SM BTL is dead, and the user should be using "vader".

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-07-15 09:33:08 -07:00
Potnuri Bharat Teja
9154ade8b1 btl/openib: Handle EOPNOTSUPP
Updated openib BTL to handle EOPNOTSUPP as per
https://www.open-mpi.org/community/lists/devel/2016/04/18839.php

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
2017-07-13 21:05:32 +05:30
Gilles Gouaillardet
32606ad476 btl/tcp: fix heterogeneous support for put / large messages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-12 10:27:45 +09:00
Nathan Hjelm
c18007d095 btl/vader: work around ob1 pending fragment bug
This commit ensures that the pml callback is always made when
sending fragments. This is needed to avoid #3845. Once that is
fixed the #if 0'd code can be restored.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-11 15:59:56 -06:00
Ralph Castain
31130a4bee Replace syntax with something less strictly C99
Fixes #3809

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-05 16:54:36 -07:00
George Bosilca
bd5650d680
Fix the TCP performance impact when not used
Based on an idea from Brian move the libevent trigger update to a later
stage instead of the generic add/del procs. So, we are doing the
increment/decrement when we register the recv handler for an endpoint,
so basically when we create and connect a socket to a peer. The benefit
is that as long as TCP is not used, there should be no impact on the
performance of other BTLs. The drawback is that the first TCP connection
will be slightly slower, but then once we have a peer connected over
TCP things go back to normal.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-23 11:15:45 +02:00
Nathan Hjelm
ffd8ee2dfd opal: use opal_list_t convienience macros
This commit cleans up code in opal to use OPAL_LIST_FOREACH(_SAFE),
OPAL_LIST_DESTRUCT, and OPAL_LIST_RELEASE.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-20 12:37:12 -06:00
Thananon Patinyasakdikul
bf7534d32c btl/usnic: changed fi_ep_bind flags for AV from NULL to 0 due to
compiler warning.

This commit fixed compiler warning generated from earlier commit :
ddbe1726c5

Signed-off-by: Thananon Patinyasakdikul <apatinya@cisco.com>
2017-05-22 10:09:43 -07:00
Thananon Patinyasakdikul
a705f2cf7b usNIC: fix fi_ep_bind flag. FI_RECV should not be associated with av.
Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2017-05-16 18:22:28 -04:00
Ralph Castain
a737d0f963 Merge pull request #3430 from bosilca/topic/tcp_hostname
Use the OPAL function to get the hostname.
2017-05-03 06:42:02 -07:00
Brian Barrett
3b991498be btl tcp: Don't set socket buffer size by default
Set the default send and receive socket buffer size to 0,
which means Open MPI will not try to set a buffer size during
startup.

The default behavior since near day one of the TCP BTL has been
to set the send and receive socket buffer sizes to 128 KiB.  A
number that works great on 1 GbE, but not so great on 10 GbE
fabrics of any real size.  Modern TCP stacks, particularly on
Linux, have gotten much smarter about buffer sizes and are much
less efficient if a buffer size is set (even if set to something
large).

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2017-04-28 14:14:49 -07:00
George Bosilca
2d8943d920
Use the OPAL function to get the hostname. 2017-04-28 02:48:15 -04:00
Nathan Hjelm
387467c358 btl/ugni: remove erroneous mca_btl_ugni_frag_return call
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-27 09:14:51 -06:00
Howard Pritchard
f2a27cc991 Merge pull request #3396 from hppritcha/topic/swat_compiler_warning
btl/sm: swat a compiler warning
2017-04-22 14:31:21 -06:00
Jeff Squyres
1d5e08f44a usnic: more iov_limit fixes
Follow on to 7bd2de9960: move setting
the iov_limit to 1 earlier in the startup sequence.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-21 09:14:28 -07:00
Howard Pritchard
782f1bb9af btl/sm: swat a compiler warning
gnu 6.3.1 complaining about uninitialized variable

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-21 10:02:56 -05:00
Howard Pritchard
462342d148 Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi
common/libfabric: move libfabric to ofi
2017-04-21 07:50:38 -06:00
Jeff Squyres
7bd2de9960 usnic: ensure to set the iov_limit to 1
The usNIC BTL does not use more than 1 iov, so be sure to set it to 1
so that we don't allocate cq/rq/sq entries based on a default (i.e.,
>1) number of iovs per entry.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-20 13:28:15 -07:00
Howard Pritchard
841192645b common/libfabric: move libfabric to ofi
This PR renames the common library for OFI libfabric from
libfabric to ofi.  There are a number of reasons this
is good to do:

1) its shorter and replaces 9 characters with three for
   function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
   the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
   the term "ofi" internally and also in their configure options
   relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)

There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.

This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and with-ofi-libdir.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-20 13:07:16 -06:00
Mark Allen
655a06f559 IB fork
The key change was in btl_openib_connect_udcm.c where a buffer was
being pinned with size 65664 (whether openib was being used or not).
The start of the buffer was page aligned, but because of the size
the end wasn't. That makes it too easy for a forked child to accidentally
touch pinned memory on the same page as the end of that buffer.

So this change increases the size of the allocated buffer to use the
rest of the page.

I inspected the rest of the ibv_reg_mr() calls and changed one other
place to page align its buffer too, although I think the above is
the one that really matters.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-04-05 17:35:52 -04:00
Ralph Castain
b398d721d5 Merge pull request #3236 from rhc54/topic/craycleanups
Silence a flood of warnings when compiling with gcc on Cray
2017-03-24 13:33:46 -07:00
Ralph Castain
ecc8000136 Silence a flood of warnings when compiling with gcc on Cray
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 13:37:11 -06:00
Ralph Castain
470452cba0 Correctly check the sa_family and cast the data correctly before passing it to inet_nop, and don't be quite as fancy with the pointer arithmetic as the combination was causing us to segfault every time this debug message was called.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 11:42:57 -07:00
Nathan Hjelm
6b210fa2c4 btl/ugni: do not return a frag from sendi if an endpoint is waitlisted
This fixes a hang that can occur when running bandwidth tests.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 10:14:13 -06:00
Nathan Hjelm
2e42b0afbd btl/ugni: move connection check into sync event
This commit makes datagram checks time based and reduces their
frequency when only the wildcard datagram is posted. This change
improves latency on knl systems.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 10:10:05 -06:00
Nathan Hjelm
d5aaeb74b6 btl/ugni: return a descriptor from sendi
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:56:54 -06:00
Nathan Hjelm
a19e7023d1 btl/ugni: always check local SMSG CQ
This commit removes the local operation count check from the local SMSG
completion queue. This check was leading to hangs due to an undocumented
feature of the ugni library. The local SMSG CQ is used to send credit
return messages back to the sender. The ugni library never checks for
the completion itself but relying on the SMSG user to periodically
check the CQ.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:56:54 -06:00
Nathan Hjelm
d5cdeb81d0 btl/ugni: improve multi-threaded performance
This commit updates the ugni btl to make use of multiple device
contexts to improve the multi-threaded RMA performance. This commit
contains the following:

 - Cleanup the endpoint structure by removing unnecessary field. The
   structure now also contains all the fields originally handled by the
   common/ugni endpoint.

 - Clean up the fragment allocation code to remove the need to
   initialize the my_list member of the fragment structure. This
   member is not initialized by the free list initializer function.

 - Remove the (now unused) common/ugni component. btl/ugni no longer
   need the component. common/ugni was originally split out of
   btl/ugni to support bcol/ugni. As that component exists there is no
   reason to keep this component.

 - Create wrappers for the ugni functionality required by
   btl/ugni. This was done to ease supporting multiple device
   contexts. The wrappers are thread safe and currently use a spin
   lock instead of a mutex. This produces better performance when
   using multiple threads spread over multiple cores. In the future
   this lock may be replaced by another serialization mechanism. The
   wrappers are located in a new file: btl_ugni_device.h.

 - Remove unnecessary device locking from serial parts of the ugni
   btl. This includes the first add-procs and module finalize.

 - Clean up fragment wait list code by moving enqueue into common
   function.

 - Expose the communication domain flags as an MCA variable. The
   defaults have been updated to reflect the recommended setting for
   knl and haswell.

 - Avoid allocating fragments for communication with already
   overloaded peers.

 - Allocate RDMA endpoints dyncamically. This is needed to support
   spreading RMA operations accross multiple contexts.

 - Add support for spreading RMA communication over multiple ugni
   device contexts. This should greatly improve the threading
   performance when communicating with multiple peers. By default the
   number of virtual devices depends on 1) whether
   opal_using_threads() is set, 2) how many local processes are in the
   job, and 3) how many bits are available in the pid. The last is
   used to ensure that each CDM is created with a unique id.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:46:06 -06:00
Nathan Hjelm
12bf38a25c btl/ugni: add MPI_T performance variables for ugni counters
This commit exposes ugni statistics for use with MPI_T. There is
no overhead to providing these counters.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:42:58 -06:00
Joshua Ladd
b28647857f Adding latest ConnectX-5 adapter vendor part id to OpenIB device params.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2017-03-07 00:19:54 +02:00
Jeff Squyres
5b484c91f4 btl/tcp: use show_help to print the dropped-TCP warning
Make the message more friendly / more detailed, and de-duplicate it
(just in case it happens a lot).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-01 16:31:29 -08:00
George Bosilca
ec4a235e6a
Allow a TCP proc release during the create.
This is mostly for error cases, where we need to release the
newly created proc. Currently the code deadlocks because the endpoint
lock is help at the release and the lock is not recursive.

Aslo added some code to print the IP addresses that don't match during
the TCP connection step.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-03-01 13:17:54 -05:00
Jeff Squyres
fec519a793 hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).

This commit:

1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
   find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
   rest of the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:48:42 -08:00
Jeff Squyres
a8247a76c9 Merge pull request #2948 from jsquyres/pr/update-warn-component-unused
help btl base: tell how to disable the warning
2017-02-09 21:10:01 -05:00
Jeff Squyres
e272250531 help btl base: tell how to disable the warning
As reported in
https://www.mail-archive.com/users@lists.open-mpi.org/msg30607.html,
give instructions in the show_help message how to disable the
warning.  Thanks to Susan Schwarz for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-09 15:51:30 -08:00
George Bosilca
bc2890ed11
Upon a new connection go over all available ifaces.
Add a verbose to show all the failed attempts to match the
remote interfaces based on the modex info.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-02-07 19:15:49 -05:00
Gilles Gouaillardet
c62498ab3d btl/tcp: remove reference to just removed tcp_local
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-07 09:32:09 +09:00
Jeff Squyres
368ab4d9a5 Merge pull request #2684 from bosilca/topic/tcp_fixes
Remove the tcp_local field from the TCP component.
2017-02-06 16:32:06 -05:00
Nathan Hjelm
9f28c0af39 verbs: remove extra event user increment/decrement operation
Since the oob and connections systems do not work the same way they
did in older versions of Open MPI these operations are no longer
necessary. At best they do nothing and at worst they hurt performance
by making us enter the event library more often in opal_progress().

Fixes #2839

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-25 18:37:06 -07:00
George Bosilca
999d4973a9
Fix an issue with extremely large data identified by tjb900.
Due to the conversion from ssize_t to int we were losing bytes, and
ended up writing outside the receiver buffer. Similarly on the send,
due to the conversion to a lesser type, we could missinterpret the
end of the fragment.
2017-01-18 10:33:12 -05:00
Gilles Gouaillardet
aeee48357a btl/sm: correctly handle nodes with zero NUMA hwloc object
the hwloc topology might not contain a NUMA object with hwloc < v2
if the node is not NUMA, so force the NUMA object count to one
in order to correctly allocate mca_btl_sm_component.sm_mpools.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-12 11:45:29 +09:00
Jeff Squyres
b980e334dc usnic: add completion stats
This should probably not go to the v2.x branch, since it changes the
output format of the usnic stats.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:54 -08:00
Jeff Squyres
706f53bb01 usnic: ensure that stats string is always truncated
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:54 -08:00
Jeff Squyres
1fdd0fe228 usnic: add missing params to show_help() call
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:54 -08:00
Jeff Squyres
7048adec04 usnic: add some assert()s
Add some run-time assert checks for debug builds.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:32 -08:00
Jeff Squyres
2d28ccb5fd usnic: add verbose output of queue lengths
Show the actual RX/TX and CQ length returned by libfabric in verbose
output.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:32 -08:00
Jeff Squyres
bd5b8ed754 usnic: ensure that queues are long enough
Double check the queue lengths that we get back from libfabric to
ensure that they are at least as long as we need.  They *should* never
be shorter than we need, but let's just check to be sure.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:32 -08:00
Jeff Squyres
53dc75a89c usnic: ensure to reset flags on returned frags
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:31 -08:00
Jeff Squyres
c4d7876ca0 usnic: check send credits on data channel for data frags
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:31 -08:00
Jeff Squyres
879d25e5df usnic: ensure to check send credits for ACKs
Don't just blindly send ACKs; ensure that we have send credits before
doing so.  If we don't have any send credits, just don't send the ACK
(it'll come again soon enough; it's not a tragedy if we don't send it
now).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:31 -08:00
Jeff Squyres
7787dad4db usnic: ensure CQs are long enough
The libfabric usnic provider may give you back TX/RX queues that are
longer than you asked for.  So just use the TX/RQ/CQ lengths that we
asked for, regardless of what length comes back.

Additionally, keep the length of the priority channel CQ separate from
the length of the data CQ.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:03:53 -08:00
Jeff Squyres
b02d8c48f5 usnic: make the releasing safer
Since the usnic BTL is single-threaded in this area, there really is
no danger, but don't use one of the pointers hanging off the frag
after we return it to the freelist.  Instead, save the endpoint
pointer before returning the frag.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:03:53 -08:00
Jeff Squyres
e25b860627 usnic: clarify types
The types are technically typedef equivalent, but it's less confusing
to use the types that agree with the name of the constructor.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:03:53 -08:00
Jeff Squyres
40fe575132 usnic: trivial updates (no code/logic changes)
- Add more explanatory comments
- Trivial whitespace / style updates
- Rename opal_btl_usnic_force_retrans() -> opal_btl_usnic_fast_retrans()

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 10:40:02 -08:00
George Bosilca
cfeeecd381 Remove the tcp_local field from the TCP component.
Instead use the OPAL process name to get the name
of the local process.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-07 13:24:18 -05:00
Gilles Gouaillardet
c2ddb1e2fc mca/base: plug a memory leak
register mca_base_var_enum_value_flag_t so they can be free'd
upon finalize

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:36 +09:00
Gilles Gouaillardet
7e5da7382e btl/tcp: plug leaks when closing component
remove tcp_local from the tcp_procs table, and release it

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Ralph Castain
fe68f23099 Only instantiate the HWLOC topology in an MPI process if it actually will be used.
There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced:

* shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string
  instead of the topology itself, if available, thus avoiding instantiating the topology

* openib BTL. This uses the distance matrix. At present, I haven't developed a method
  for replacing that reference. Thus, this component will instantiate the topology

* usnic BTL. Uses the distance matrix.

* treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate
  the topology

* ess base functions. If a process is direct launched and not bound at launch, this
  code attempts to bind it. Thus, procs in this scenario will instantiate the
  topology

Note that instantiating the topology on complex chips such as KNL can consume
megabytes of memory.

Fix pernode binding policy

Properly handle the unbound case

Correct pointer usage

Do not free static error messages!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-29 10:33:29 -08:00
Ralph Castain
3a2d6a5ab6 Begin to reduce reliance of application procs on the topology tree itself by having the daemon provide more detailed info. In this case, provide the topology description string so that procs can readily determine the number of types of objects on the node, and a "locality" string that describes which objects this process is executing upon. The latter allows a process to compute the objects of overlap between itself and another proc without consulting the topology tree.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-28 09:14:26 -08:00
Gilles Gouaillardet
54c84196a6 btl/vader: plug a memory leak
as reported by Coverity with CID 1362691

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-22 16:04:36 +09:00
Gilles Gouaillardet
3a76a78bff btl/openib: plug a memory leak in btl_openib_register_mca_params()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:30 +09:00
Gilles Gouaillardet
1a279c4ee9 btl/self: fix fragment segment length in mca_btl_self_prepare_src()
opal_convertor_pack() might pack less bytes than requested,
so always set frag->segments[0].seg_len.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-24 10:44:56 +09:00
Howard Pritchard
09f47fcf8e btl/ugni:vader swat some compiler warnings
Swat some compiler warnings.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-11-21 14:58:34 -06:00
Joshua Ladd
4907085c6f Add the ConnectX-5 device ID to openib BTL.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-11-16 21:42:37 +02:00
George Bosilca
d0dddef53d
Protect the tcp_endpoints list from concurrent accesses.
Thanks Gilles for your help.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2016-11-11 00:06:03 -05:00
Gilles Gouaillardet
a49422fe84 btl/tcp: get rid of the MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD macro
since pthreads are now mandatory, the MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD
is always true and hence can be safely removed

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-08 14:00:05 +09:00
Jeff Squyres
a4ffa590c8 Merge pull request #2308 from hjelmn/vader_mem
btl/vader: reduce memory footprint when using xpmem
2016-11-02 10:28:26 -04:00
Steve Wise
7050969d47 openib btl: remove BTL_OPENIB_FAILOVER_ENABLED code
Remove BTL_OPENIB_FAILOVER_ENABLED code in the openib btl source.

Remove the failover-specific files from the openib btl.

Update the openib/Makefile.am accordingly.

Remove the -enable-openib-failover config logic.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
2016-11-01 14:45:36 -07:00
Jeff Squyres
149b660666 btl/usnic: fix compiler warning
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-28 07:36:20 -07:00
Nathan Hjelm
9d92075e60 btl/self: rewrite to decrease memory usage (#2307)
This commit rewrites much of the btl/self component to fix a long
standing memory usage bug. Before this commit the prepare_src path
would always allocate a max send fragment (256kB). This caused the
rank to allocate 32 * 256k useless buffers from one send. This commit
makes the following changes:

 - Add the MCA_BTL_FLAGS_GET flag by default. No reason not to set it.

 - Reduce the eager limit, max send size, buffers per allocation, and
   maximum buffer count per fragment size. These changes should have
   no noticible affect on performance but should greatly reduce the
   memory usage of the component.

 - Implement the sendi function. This should reduce self send latency
   somewhat.

 - Rewrite prepare_src to never allocate a eager or max send fragment
   for contiguous data.

 - add_procs needs to return something in the peer array for the proc
   self not just set the reachability bit. Now stores (void *) 1.

 - Various cleanups. Removed and unused file.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-10-27 12:34:54 -04:00
Nathan Hjelm
a652a193ea btl/vader: reduce memory footprint when using xpmem
The vader btl kept a per-peer registration cache to keep track of
attachments. This is not really a problem with small numbers of local
ranks but can be a problem with large SMP machines. To reduce the
footprint there is now one registration cache for all xpmem
attachments. This will probably increase the lookup time for large
transfers but is a worthwhile trade-off.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-10-27 10:09:43 -06:00
Potnuri Bharat Teja
29f1aa836f btl/openib: remove unwanted ompi header inclusion in opal code.
OMPI header cannot be included in OPAL source code, hence removed it.
Fixes: (740b636db) btl/openib: Disqualify rdmacm CPC if
MPI_THREAD_MULTIPLE.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
2016-10-13 16:21:36 +05:30
Jeff Squyres
bcbf0bc4f9 usnic: s/OMPI/OPAL/
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-11 16:43:35 -07:00
Gilles Gouaillardet
c92e9a5406 use the new OPAL_HASH_TABLE_FOREACH convenience macro 2016-10-08 16:58:20 +09:00
Jeff Squyres
67684be7c9 usnic: fix one last stray fabric_attr->name --> linux_device_name
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-04 18:17:38 -07:00
Jeff Squyres
8b77359cac usnic: remove some legacy libfabric 1.0/1.1 code
We only support running with libfabric v1.3 or greater.  So it's safe
to remove the legacy/adaptive cq_readerr() behavior.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-03 11:59:41 -07:00
Jeff Squyres
345c07a252 usnic: require libfabric >= v1.3 at run time
There are critical usnic libfabric AV insert bugs before v1.3, so
don't allow any version prior to v1.3 at run time (still allow
*compiling* with earlier versions, though, since the ABI guarantees
allow us to compile with an earlier libfabric and run with a later
libfabric).

Switch to using fi_version() to check the version (instead of calling
fi_getinfo()) as a potentially lighter-weight / simpler solution.
This allows us to only call fi_getinfo() once.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-03 11:59:41 -07:00
Jeff Squyres
b13813810f usnic: print a helpful message invoke PML error callback
The previous message was unhelpful / confusing.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-03 11:59:41 -07:00
Jeff Squyres
545d8f2e66 usnic cagent: correctly compute the "large" ping message size
The (effective) "+42" computation was, in fact, the incorrect answer
in this case (gasp!).

We should just take the max_msg_size from the command (which came from
the libfabric endpoint max_msg_size attribute in the client) and
subtract off the max header size: 68 (which is explained in the
comment).  This will result in a "large" message size which is likely
slightly smaller than the MTU, but still right up near the MTU, and
therefore good enough.

Note: the old computation (i.e., -(68-42)) worked fine when we asked
for Libfabric API v1.1 because the usnic provider would return a
max_msg_size that was already less than the MTU due to FI_PREFIX
behavior shenanigans.  Once we started asking for Libfabric API v1.4,
the usnic Libfabric provider started returning (MTU + prefix_size),
and the -(68-42) computation started giving a value that was over the
MTU.  This caused sendto() on the connectivity checker UDP socket
to fail.

This commit also removes an old/misleading comment.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-09-30 17:01:05 -07:00
Joshua Hursey
f6f24a4f67 build: Custom libmpi(_FOO) name option in configure
* Add a configure time option to rename libmpi(_FOO).*
   - `--with-libmpi-name=STRING`
 * This commit only impacts the installed libraries.
   Internal, temporary libraries have not been renamed to limit the
   scope of the patch to only what is needed.

For example:
```shell
shell$ ./configure --with-libmpi-name=wookie
...
shell$ find . -name "libmpi*"
shell$ find . -name "libwookie*"
./lib/libwookie.so.0.0.0
./lib/libwookie.so.0
./lib/libwookie.so
./lib/libwookie.la
./lib/libwookie_mpifh.so.0.0.0
./lib/libwookie_mpifh.so.0
./lib/libwookie_mpifh.so
./lib/libwookie_mpifh.la
./lib/libwookie_usempi.so.0.0.0
./lib/libwookie_usempi.so.0
./lib/libwookie_usempi.so
./lib/libwookie_usempi.la
shell$
```
2016-09-29 21:47:24 -05:00
Jeff Squyres
1a5a5fb400 Merge pull request #1861 from bharatpotnuri/master
btl/openib: Disqualify rdmacm CPC if MPI_THREAD_MULTIPLE
2016-09-27 13:03:35 -04:00
Potnuri Bharat Teja
740b636dbe btl/openib: Disqualify rdmacm CPC if MPI_THREAD_MULTIPLE
The rdmacm CPC in the openib BTL is not thread safe. The rdmacm CPC
should disqualify itself (instead of failing in random ways) if
MPI_THREAD_MULTIPLE is the thread level.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
2016-09-27 14:20:59 +05:30
George Bosilca
93fa94f96f Re-enable support for local addresses.
This patch is based on the "RFC: Reenabling the TCP BTL over local
interfaces (when specifically requested)". It removes the hardcoded
exception for the local devices that has been enforced by the
TCP BTL. Instead, we exclude the local interface only via the
exclude MCA (both IPv4 and IPv6 local addresses are already in the
default if_exclude), which is also the behavior currently described in
our README file.
2016-09-23 13:04:33 -04:00
Nathan Hjelm
a681837ba8 btl/tcp: fix double list remove
This commit fixes an abort during finalize because pending events were
removed from the list twice.

References #2030

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-09-13 09:23:12 -06:00
Jeff Squyres
527efec4fb Merge pull request #2050 from jsquyres/pr/btl-tcp-help-messages
Add a show_help message to TCP BTL when peer unexpectedly disconnects
2016-09-06 09:40:31 -04:00
Jeff Squyres
1953e3406f btl/tcp: add show_help message when peer hangs up
We commonly see messages on the users list where a peer has hung up
because it has crashed.  Instead of having just a BTL_ERROR message,
make this a real opal_show_help() message that tells the user that the
peer unexpectedly hung up, and they should look into *why* that peer
hung up.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-09-06 09:40:03 -04:00
Gilles Gouaillardet
4b208e4463 btl/tcp: make mca_btl_tcp_proc_insert re-entrant
otherwise bad things happen with
 --mca btl_tcp_progress_thread 1 (non default)
and
 --mca mpi_add_procs_cutoff 0 (default)
2016-09-05 15:57:34 +09:00
Jeff Squyres
95c6f6cfc0 btl/tcp: fix help message
It looks like one help message was accidentally pasted in the middle
of another.  Disentangle the two messages from each other, and
slightly tweak the one message to say that the job may also crash (in
addition to hanging).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-09-02 17:14:22 -04:00
Nathan Hjelm
f93c1f2106 btl/ugni: fix erroneous warning message
This commit prevents the connection code from trying to connect an
endpoint if the directed datagram has been posted but not received.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-09-02 09:17:44 -06:00
Jeff Squyres
87a5ccc060 usnic: show the local UDP ports
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-26 12:25:18 -07:00
Jeff Squyres
9ae51a09f2 Merge pull request #1989 from jsquyres/pr/update-usnic-to-libfabric-v1.4
Update usnic BTL to libfabric v1.4
2016-08-26 09:53:07 -04:00
Jeff Squyres
f56b16f079 usnic: remove unused variable
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-25 03:53:18 -07:00
Jeff Squyres
9717bcb7e6 btl/usnic: remove stale comment
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-25 03:53:18 -07:00
Jeff Squyres
6f5e377fe0 btl/usnic: update for libfabric v1.4
With libfabric v1.4, the usnic provider changed the values of its
fabric and domain name strings (compared to libfabric <v1.4).  Update
the Open MPI usNIC BTL to handle both pre-v1.4 and v1.4 fabric/domain
names.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-25 03:53:17 -07:00
George Bosilca
3adff9d323 Fixes #1793.
Reshape the tearing down process (connection close) to prevent race
conditions between the main thread and the progress thread.

Minor cleanups.
2016-08-24 22:45:19 -04:00
Nathan Hjelm
83062db7cb btl/ugni: actually make the endpoint lock recursive
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-24 10:36:08 -06:00
Potnuri Bharat Teja
9b7f9ece20 Add Chelsio T6 adapter device parameters.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
2016-08-23 10:38:13 +05:30
Nathan Hjelm
adb668209b btl/ugni: fix another connection race
This commit fixes a race that can occur when two threads are in the
ugni progress function at the same time. This race occurs when one
thread calls GNI_PostDataProbeById then goes to sleep then another
thread calls GNI_PostDataProbeById then GNI_EpPostDataWaitById before
the other thread wakes up. If this happens the first thread will print
a warning on GNI_EpPostDataWaitById about no matching post.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-08 15:38:11 -06:00
Todd Kordenbrock
b90da992c8 Merge pull request #1895 from PDeveze/Patchs-on-btl-portals4
btl/portals4: Take into account the limitation of portals4 (max_msg_s…
2016-08-08 15:12:50 -05:00
Nathan Hjelm
14b36d4503 btl/ugni: protect against re-entry and races in connections
This commit fixes two issues that can occur during a connection:

 - Re-entry to connection progress from modex lookup. Added an
   additional endpoint state that will keep the code from re-entering
   the common endpoint create.

 - Fixed a race between a process posting a directed datagram through
   a send and a connection being progressed through opal_progress().
   The progress code was not obtaining the endpoint lock before
   attempting to update the endpoint. To limit the amount of code
   changed for 2.0.1 this commit makes the endpoint lock recursive. In
   a future update this may be changed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-04 16:08:01 -06:00
Nathan Hjelm
5e13e1ab7d btl/openib: set send flags only after endpoint is connected
The max inline send size on a queue pair is not available until after
the endpoint is connected. Before this commit the send flags
(including the inline flag) were set before this value was
initialized. This commit moves setting the send_flags down to
mca_btl_openib_put_internal which is only called after the endpoint is
connected. This fixes a bug when using osc/rdma.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-26 16:01:11 -06:00
Gilles Gouaillardet
91ccec342c btl/openib: remove some dead code
remove useless call to opal_mem_hooks_support_level() and the value local variable.
2016-07-22 09:26:33 +09:00
Gilles Gouaillardet
1b3be0ac8c configury + btl/openib: fix a typo
test for existence of struct ibv_exp_device_attr.exp_atomic_cap.
That was previously mistyped struct ibv_exp_device_attr.ext_atomic_cap
2016-07-22 09:26:33 +09:00
Pascal Deveze
6d6ec66705 btl/portals4: Take into account the limitation of portals4 (max_msg_size) 2016-07-19 15:19:29 +02:00
Nathan Hjelm
01d6da31af btl/openib: fix rdmacm locking bug
This commit fixes a long standing bug in rdmacm. It is required that
the thread that calls mca_btl_openib_endpoint_cpc_complete holds the
endpoint lock. This was not the case for rdmacm. This causes debug
builds to abort. This change also required changing
mca_btl_openib_endpoint_send_cts to require the endpoint lock to be
held when calling.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-30 15:50:07 -06:00
Nathan Hjelm
960fcd292c btl/openib: fix rdma hang
This commit is an attempt to fix a hang in finalize of rdmacm. This fixes
a path where no rdmacm client is found for an endpoint.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-29 20:31:26 -06:00
Jeff Squyres
f18d6606da Merge pull request #1824 from hjelmn/rdmacm_fix
btl/openib: fix segmentation fault
2016-06-28 18:10:35 -04:00
Nathan Hjelm
8128c8eb29 btl/openib: fix segmentation fault
This commit fixes a segmentation fault that occurs if a device can be
initialized but not used. In this case the devices_count is not equal
to the number of usable devices in the devices pointer array.

Thanks to @artpol84 for tracking this down.

Fixes open-mpi/ompi#1823

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-28 10:31:32 -06:00
Nathan Hjelm
dac9201f3b Merge pull request #1770 from hjelmn/rdma_wth
btl/openib: fix rdmacm
2016-06-24 22:46:53 -06:00
Thananon Patinyasakdikul
afe07cd5d5 Fixed common symbol in btl/usnic
- This commit fixes the accidental common symbol btl_usnic_lock
- It also moves the btl_usnic_lock declaration to btl_usnic.h
2016-06-20 10:05:44 -07:00
Jeff Squyres
7a8d7fb948 openib: fix compiler warnings
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-18 07:15:11 -07:00
Thananon Patinyasakdikul
7bd18214a7 Fix btl/usnic deadlock when the connectivity check is turned off. 2016-06-15 07:42:55 -07:00
Thananon Patinyasakdikul
ee85204c12 Added MPI_THREAD_MULTIPLE support for btl/usnic. 2016-06-13 13:47:06 -07:00
Nathan Hjelm
17ae1aceeb btl/openib: fix rdmacm
The rdma_disconnect function specifies that both the server and client
should call rdma_disconnect. The code was not calling rdma_disconnect
on an endpoint if the event came before the endpoint finalization.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-07 17:53:58 -06:00
Nathan Hjelm
dd519c55b1 btl/openib: fix cq resize calculation
Before dynamic add_procs the openib_btl_size_queues was called exactly
once for non-dynamic jobs. Now the function is called on each new
connection so the calculation was wrong. Re-wrote the function to
correctly calculate the CQ size and only attempt to adjust the CQ if
the requested size has changed. This fixes a bug when using the openib
btl on psm2 hardware that is caused by the time needed to resize a
CQ. The overhead was causing udcm to timeout and fail.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-07 16:05:56 -06:00
Nathan Hjelm
6169d03ea3 btl: adjust values of new atomic flags
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-06-02 19:21:34 -06:00
Nathan Hjelm
9f43b23725 Merge pull request #1710 from hjelmn/ugni_atomics
Additional ugni atomics
2016-06-02 18:25:49 -06:00
Nathan Hjelm
ceb2912838 Merge pull request #1736 from hjelmn/ugni_fixes
ugni BTL fixes
2016-06-01 14:59:55 -06:00
Gilles Gouaillardet
57978a75d0 Merge pull request #1717 from ggouaillardet/topic/lex_cleanup
configury: clean the flex generated .c files
2016-06-01 13:06:21 +09:00
Nathan Hjelm
5d4bcce042 Merge pull request #1700 from shamisp/topic/cma_config
CMA: Fixing logic for CMA system call detection
2016-05-31 20:33:48 -06:00
Nathan Hjelm
340152a635 Merge pull request #1720 from shamisp/topic/vader/max_addr
VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms.
2016-05-31 20:33:28 -06:00
Gilles Gouaillardet
5f565dfec3 configury: clean the flex generated .c files 2016-06-01 11:13:31 +09:00
Nathan Hjelm
bf10d79914 btl/ugni: remove erroneous unlock
The endpoint lock was being released twice in mca_btl_ugni_get_ep.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-31 16:52:53 -06:00
Nathan Hjelm
cc96097873 btl/ugni: fix bug when attempting unaligned get on aries
This commit fixes a programming error when using an aries nic. The
documentation of ugni shows that only the local alignment restriction
for get was lifted on aries. There is still a remote address alignment
restriction.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-31 16:52:09 -06:00
George Bosilca
d2abff583e Fix race condition during BTL TCP tear-down.
bot🏷️bug
bot:assign:@hjelmn
2016-05-30 10:47:14 -05:00
Nathan Hjelm
28dfa36a3f btl/ugni: fix bug when attempting unaligned get on aries
This commit fixes a programming error when using an aries nic. The
documentation of ugni shows that only the local alignment restriction
for get was lifted on aries. There is still a remote address alignment
restriction.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-27 08:22:13 -06:00
Nathan Hjelm
c19426ac1b btl/ugni: add support for additional atomic operations
This commit adds support for Cray Aries atomic operations. This
includes 32-bit and floating point support.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-27 08:22:13 -06:00
Nathan Hjelm
23fe19a956 btl: add support for more atomics
This commit add support for more atomic operations and type. The
operations added are logical and, logical or, logical xor, swap, min,
and max. New types are 32-bit int by using the
MCA_BTL_ATOMIC_FLAG_32BIT flag, 64-bit float by using the
MCA_BTL_ATOMIC_FLAG_FLOAT flag, and 32-bit float by using both
flags. Floating point numbers are supported by packing the number in
as an int64_t or int32_t. We will update the btl interface in the
future to make this less confusing.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-27 08:22:13 -06:00
Nathan Hjelm
8c9292d5d1 Merge pull request #1721 from hjelmn/xrc_fix
btl/openib: fix XRC WQE calculation
2016-05-26 17:00:31 -06:00
Nathan Hjelm
56bdcd0888 btl/openib: fix XRC WQE calculation
Before dynamic add_procs support was committed to master we called
add_procs with every proc in the job. The XRC code in the openib btl
was taking advantage of this and setting the number of work queue
entries (WQE) based on all the procs on a remote node. Since that is
no longer the case we can not simply increment the sd_wqe field on the
queue pair. To fix the issue a new field has been added to the xrc
queue pair structure to keep track of how many wqes there are total on
the queue pair. If a new endpoint is added that increases the number
of wqes and the xrc queue pair is already connected the code will
attempt to modify the number of wqes on the queue pair. A failure is
ignored because all that will happen is the number of active send work
requests on an XRC queue pair will be more limited.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-26 15:58:31 -06:00
Aurelien Bouteiller
49bd28d0ac Merge pull request #1714 from hjelmn/scif_exclusivity
btl/scif: reduce default exclusivity
2016-05-26 17:53:11 -04:00
Pavel Shamis (Pasha)
60fd25f3fb VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms.
The original VADER_MAX_ADDRESS was tunned for x86_64 platforms only.
For non x86_64 platforms we can use XPMEM_MAXADDR_SIZE.

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2016-05-26 16:38:04 -05:00
Nathan Hjelm
99627319f0 btl/ugni: reduce overhead of progress function
This commit reduces the overhead of calling the ugni progress
function. It does the following:

 - Check for new connections once every eight calls.

 - Do not call remote smsg progress unless we are connected to at
   least one remote peer.

 - Do not call rdma progress unless at least one rdma fragment is
   outstanding.

 - Check endpoint wait list size before obtaining a lock.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 14:27:34 -06:00
Nathan Hjelm
5caf12cd9b btl/scif: reduce default exclusivity
This commit reduces the default exclusivity so that btl/scif is not
used for send/recv over other shared memory transports.

Fixes open-mpi/ompi#1712

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 14:25:07 -06:00
Pavel Shamis (Pasha)
d984b4b3f9 CMA: Fixing logic for CMA system call detection
The OPAL_CMA_NEED_SYSCALL_DEFS is always defined/set to 0 or 1.  Therefore
instead of checking if the macro is defined, we have to look at the value
itself.

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2016-05-24 14:53:25 -05:00
Gilles Gouaillardet
d5a2ac6f2f btl/openib: fix #if vs #ifdef 2016-05-23 14:27:33 +09:00
Gilles Gouaillardet
5a8cbe5a8f btl/openib: remove obsolete reference to MEMORY_LINUX_MALLOC_ALIGN_ENABLED macro 2016-05-23 14:12:21 +09:00
Jeff Squyres
66f53ec29a Merge pull request #1628 from kmroz/wip-btl-tcp-ethtool-speed
btl/tcp: autodetect bandwidth and latency if unset by the user
2016-05-18 12:12:55 -04:00
Karol Mroz
ca6ddf3270 btl/tcp: autodetect bandwidth and latency if unset
Fixes open-mpi/ompi#120

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-05-18 16:25:52 +02:00
Karol Mroz
b9c6c43c6b btl/tcp: add default defines for bandwidth and latency
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-05-18 16:25:52 +02:00
Nathan Hjelm
ab8ed177f5 rcache: fix deadlock in multi-threaded environments
This commit fixes several bugs in the registration cache code:

 - Fix a programming error in the grdma invalidation function that can
   cause an infinite loop if more than 100 registrations are
   associated with a munmapped region. This happens because the
   mca_rcache_base_vma_find_all function returns the same 100
   registrations on each call. This has been fixed by adding an
   iterate function to the vma tree interface.

 - Always obtain the vma lock when needed. This is required because
   there may be other threads in the system even if
   opal_using_threads() is false. Additionally, since it is safe to do
   so (the vma lock is recursive) the vma interface has been made
   thread safe.

 - Avoid calling free() while holding a lock. This avoids race
   conditions with locks held outside the Open MPI code.

Fixes open-mpi/ompi#1654.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-17 09:02:40 -06:00
Gilles Gouaillardet
456b73da69 btl/openib: fix error path in init_one_device()
do not explicitly release ib verbs components since they will
be released in the object destructor

Thanks Durga for the report
2016-05-13 09:03:48 +09:00
Ralph Castain
08022d7af1 Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required. 2016-05-05 15:28:13 -07:00
Nathan Hjelm
0f54a95408 Merge pull request #1626 from hjelmn/vader_32
btl/vader: fix compilation on 32-bit systems
2016-05-03 16:39:46 -06:00
Nathan Hjelm
e7ccbdee27 btl/vader: fix compilation on 32-bit systems
This commit fixes a compile/link issue caused by vader. The vader btl
was using OPAL_THREAD_ADD64 to increment a counter which may not be
available on 32-bit systems. Changed to use OPAL_THREAD_ADD_SIZE_T
which will be 64-bit or 32-bit depending on the system.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-03 10:14:44 -06:00
Nathan Hjelm
a65af6d079 btl/openib: fix check for exp verbs struct members
This commit fixes a compilation issue with some versions of exp
verbs. In some cases struct ibv_exp_device_attr does not have either
the exp_atom or exp_atomic_cap fields. It is fine to drop one check
and fall back to the non-exp attribute check on the other.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-02 17:13:33 -06:00
George Bosilca
3445577f4c Avoid race conditions during BTP TCP handshake.
In some rare cases when a process receives the connect ack while
locally updating the peer endpoint structure, we could drop the
incomming connect ack due to the fact that the send handler is
protected with a try lock (on the endpoint) and our initial send
event was not persistent. Making the send event persistent solves
all issues.
2016-05-01 14:19:29 -04:00
George Bosilca
702f80ad7e Remove "signed vs. unsigned" warnings. 2016-05-01 11:45:48 -04:00
Nathan Hjelm
03f4a854cb btl/tcp: fix add_procs race condition
This commit fixes a race between a thread calling the tcp btl's
add_procs and a thread processing an incomming connection. The race
occured because the add_procs thread adds a newly created proc object
to the hash table *before* the object is fully initialized. The
connection thread then attempts to use the object before the endpoints
array on the object has beeen allocation. The fix is to only add the
proc to the hash table after it has been completely initialized.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-27 10:24:39 -06:00
Jeff Squyres
dc18c32437 usnic: fix resource check
The math for checking the number of QPs and CQs per usNIC/VF was
incorrect, allowing you to run MPI processes even when usNICs (i.e.,
VIC VFs) had fewer QPs and CQs than were necessary.  This led to a
confusing error later when fi_enable(3) failed (because we lazily
create QPs).  Fixing the math here ensure that we actually print a
helpful error message telling the user specifically what is wrong.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-04-22 15:58:27 -07:00
Jeff Squyres
6800ef9ec0 m4: rename OMPI_SUMMARY_* macros to OPAL_SUMMARY_*
These macros should really be named OPAL_SUMMARY_*; they're used in
all projects, and therefore should be in the lowest later project (OPAL).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-04-20 08:40:00 -07:00
Nathan Hjelm
1e6b4f2f55 Merge pull request #1495 from hjelmn/new_hooks
Add new patcher memory hooks
2016-04-13 18:19:23 -06:00
Nathan Hjelm
c2b6fbb124 opal/memory: move initialization to first rcache creation
Because of the removal of the linux memory component it is no longer
necessary to initialize the memory component in opal_init(). This
commit moves the initialization to the creation of the first rcache
component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:21:46 -06:00
Gilles Gouaillardet
c72688e8cf Merge pull request #1362 from ggouaillardet/topic/openib_warn_default_gid_prefix
btl/openib: correctly issue a warning when two btls or more are in th…
2016-04-11 13:22:48 +09:00
Thananon Patinyasakdikul
92290b94e0 Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN) 2016-04-07 12:52:17 -04:00
Nathan Hjelm
9efd465539 Merge pull request #1517 from hjelmn/ugni_fixes
Gemini/Aries bug fixes
2016-04-05 07:23:18 -06:00
George Bosilca
26fc8533f8 Remove compiler warnings. 2016-04-04 16:34:23 -04:00
Nathan Hjelm
d7874920aa btl/ugni: set the frag reference count in the eager get path
This comit adds code that sets the fragment reg_cnt to 1 when sending
the completion message for an eager get. Without this the btl will
either hang or abort.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-02 12:10:22 -06:00
Ralph Castain
7eca2f9650 Add missing include 2016-03-30 01:34:01 -07:00
Gilles Gouaillardet
e852d85cc1 btl/tcp: add missing mca_btl_tcp_dump() subroutine 2016-03-30 16:10:15 +09:00
Ralph Castain
f70c5c495b Tsk...tsk...replace references to ompi values with opal 2016-03-29 13:35:43 -07:00
George Bosilca
d0165818b3 Initialize all common symbols. 2016-03-29 16:08:27 -04:00
Jeff Squyres
91c54d7a07 Merge pull request #1491 from ICLDisco/progress_thread
BTL TCP async progress
2016-03-29 06:26:10 -04:00
George Bosilca
f69eba1bc4 Update the copyright and cleanup the code.
Per @jsquyres suggestion remove all trailing spaces.
Credit to `sed -i.bak 's/ *$//' */[ch]`.
2016-03-28 14:41:01 -04:00
Thananon Patinyasakdikul
92062492b9 Enable Threading in the BTL TCP
Added mca parameter to turn progress thread on/off
Add a flag to check if we have btl progress thread.
Added macro for ob1 matching lock.
Update the AUTHORS file.
2016-03-28 14:41:01 -04:00
George Bosilca
32277db6ab Add support for async progress in the BTL TCP.
All BTL-only operations (basically all data movements
with the exception of the matching operation) can now
be handled for the TCP BTL by a progress thread.
2016-03-28 14:40:50 -04:00
Jeff Squyres
4a3c986a80 usnic: remove need for hwloc verbs helper
Haven't needed this for a while, but it got left in by accident.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-03-28 09:10:12 -07:00
Jeff Squyres
05e2423756 usnic: specify the cache name
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-03-28 09:02:52 -07:00
Gilles Gouaillardet
4a76f23f40 btl/openib: do not issue an error message if modex cannot retrieve openib info 2016-03-28 10:42:16 +09:00
Gilles Gouaillardet
861564df94 btl/openib: correctly issue a warning when two btls or more are in the default subnet gid
Thanks Matias Cabral for reporting this issue.

Fixes #1352
2016-03-28 09:17:29 +09:00