1
1
Граф коммитов

801 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
bffcc3bca0 util: move graph solver from usnic to util
Cisco wrote a bipartite graph solver to properly solve
interface pair selection for usNIC.  Using the reachable
framework, the TCP BTL (and possibly the runtime network
code) can use the graph solver to make more optimal pair
selection.  Jeff was happy to have the code more broadly
used, but didn't have time to do the move, hence this
commit.

There are a couple of minor changes to the code compared
to the usNIC version.  Obviously, the functions have
been renamed to match naming convention for their new
home.  Since it's easier to write unit tests for
util/ code, the unit tests have been made first class
tests run at "make check" time.  This last bit required
moving some of the definitions into a new header,
bipartite_graph_internal.h, so that they could be
included in both the library code and the test code.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2017-09-15 15:08:47 -07:00
Nathan Hjelm
0851122cce btl/openib/udcm: add support for connection across subnets
This commit adds the code necessary to support forming connections across
subnets. The primary changes are to 1) add the gid to the modex, and 2)
use the gid to create the address handle.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2017-09-14 06:42:06 -10:00
George Bosilca
d10522a01c
Set a hard limit on the TCP max fragment size.
Some OSes have hardcoded limits to prevent overflowing over an int32_t.
We can either detect this at configure (which might be a nicer but
incomplete solution), or always force the pipelined protocol over TCP.
As it only covers data larger than 1GB, no performance penalty is to be
expected.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-01 18:52:48 -04:00
George Bosilca
c340da2586
A first cut at the large data problem with TCP. As long
as the writev and readv support a sum larger than a uint32_t
this version will work. For the other OSes a different patch
is required. This patch is a slight modification of the one
proposed by @ggouaillardet.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-01 18:52:48 -04:00
Josh Hursey
ad87aa2674 Merge pull request #4121 from jjhursey/explore/dlopen-local
mca: Dynamic components link against project lib
2017-08-25 13:15:51 -05:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la"
```

Note: The changes in this commit were automated by the script in
the commit that proceeds it with the `libadd_mca_comp_update.py`
script. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
George Bosilca
50f471e31e
Cleanup a set of warnings reported by Ralph.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-08-22 23:00:18 -04:00
Ralph Castain
38e363c515 Fix the #if check for hwloc version
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 14:07:36 -07:00
Brian Barrett
c667719a3f Merge pull request #3955 from mohanasudhan/master
Btl tcp: Improved diagnostic output and failure mode
2017-08-18 11:42:27 -07:00
Mohan
fc32ae401e Btl Tcp: Updated tcp handshake methods
This commit has two changes

1. Adding magic string during handshake can cause
issue when used with older version of MPI. Hence set
RCVTIMEO paramter to 2 second
2. Using single call during handshake instead of
two calls

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-18 10:06:52 -07:00
Mohan
e3dfe11da9 Btl tcp: Improving verbose around tcp
As part of improvement towards tcp btl we
are improving verbose in general

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 17:22:16 -07:00
Mohan
4bc7b214dc Btl tcp: Improving verbose around IPV6
As part of improvement around tcp btl debugging
& verbose. we are improving verbose around IPV6

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:14 -07:00
Mohan
0741fad479 Btl tcp: BTL_ERROR to show_help & update func behaviour
As part of improvement towards tcp debugging
we are moving few BTL_ERROR to show_help and also
update the function behaviour of
mca_btl_tcp_endpoint_complete_connect to return
SUCCESS and ERROR cases.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:14 -07:00
Mohan
368f9f0dfc Btl tcp: Using magic string to verify mpi connection
As part of improvement towards handling failure case
in btl tcp we are using magic string to verify mpi
connection. In case if there is mismatch or missing
magic string we can identify that we are trying to
connect with someother process.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:13 -07:00
Mohan
c30a42917c Btl tcp: Refactoring non-blocking send/receive function
Moving non-blocking send/receive function to btl_tcp
will help reusing these function where ever needed.
In this case we plan to reuse receive function to
retrive magic string to validate established connection
is from mpi process.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-08-17 16:45:13 -07:00
Jeff Squyres
a591159fb4 btl/usnic: restore configure usNIC summary line
Not sure how/when this got deleted, but put back the "Cisco usNIC"
line in the transport summary at the end of configure.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-16 12:37:59 -07:00
Jeff Squyres
6889948475 Merge pull request #4058 from thananon/pr/usnic_fix_credit
btl/usnic: assign the number of send credit correctly.
2017-08-09 11:46:42 -04:00
Thananon Patinyasakdikul
68658e4bab btl/usnic: assign the number of send credit correctly.
usnic endpoints was always created with default send credit value of 8. This
commit assign the correct number from the hardware instead.

Signed-off-by: Thananon Patinyasakdikul <apatinya@cisco.com>
2017-08-08 17:01:16 -07:00
Nathan Hjelm
76320a8ba5 opal: rename opal_atomic_init to opal_atomic_lock_init
This function is used to initalize and opal atomic lock. The old name
was confusing.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-08-07 14:15:11 -06:00
Joshua Hursey
196b314643 btl/sm: Missing argv header
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-04 21:10:49 -04:00
Howard Pritchard
5ce07a6983 Merge pull request #3997 from hppritcha/topic/swat_compiler_warning
btl/ugni: swat compiler warning
2017-08-02 15:44:09 -06:00
Ralph Castain
f39ce67982 Merge pull request #3951 from rhc54/topic/hwloc2
Update to hwloc 2.0.0a
2017-08-01 15:18:31 -06:00
Howard Pritchard
12a5aacdfd btl/ugni: swat compiler warning
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2017-08-01 12:21:57 -06:00
Jeff Squyres
d954167ecf Merge pull request #3881 from bharatpotnuri/master
master: btl/openib: Handle EOPNOTSUPP
2017-07-26 11:32:40 -04:00
Ralph Castain
7a83fdb9bb Update to hwloc 2.0.0a with shmem support.
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 20:26:22 -07:00
Gilles Gouaillardet
60aa9cfcb6 hwloc: add support for hwloc v2 API
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:44 +09:00
Gilles Gouaillardet
9f29f3bff4 hwloc: since WHOLE_SYSTEM is no more used, remove useless
checks related to offline and disallowed elements

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:21 +09:00
Nathan Hjelm
e5343c16c0 btl/vader: remove debug code that should not be in a release
References #3902. Close when in master, v3.0.x, and v2.x.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-17 11:58:47 -05:00
Gilles Gouaillardet
6e35cfc19a btl/sm: fix misc memory leak
as reported by Coverity with CID 1415105

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-16 13:02:55 +09:00
Jeff Squyres
5cf64e6555 btl/sm: effectively delete the SM BTL
If a user explicitly asks for the "sm" BTL, print a show_help message
saying that the SM BTL is dead, and the user should be using "vader".

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-07-15 09:33:08 -07:00
Potnuri Bharat Teja
9154ade8b1 btl/openib: Handle EOPNOTSUPP
Updated openib BTL to handle EOPNOTSUPP as per
https://www.open-mpi.org/community/lists/devel/2016/04/18839.php

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
2017-07-13 21:05:32 +05:30
Gilles Gouaillardet
32606ad476 btl/tcp: fix heterogeneous support for put / large messages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-12 10:27:45 +09:00
Nathan Hjelm
c18007d095 btl/vader: work around ob1 pending fragment bug
This commit ensures that the pml callback is always made when
sending fragments. This is needed to avoid #3845. Once that is
fixed the #if 0'd code can be restored.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-11 15:59:56 -06:00
Ralph Castain
31130a4bee Replace syntax with something less strictly C99
Fixes #3809

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-05 16:54:36 -07:00
George Bosilca
bd5650d680
Fix the TCP performance impact when not used
Based on an idea from Brian move the libevent trigger update to a later
stage instead of the generic add/del procs. So, we are doing the
increment/decrement when we register the recv handler for an endpoint,
so basically when we create and connect a socket to a peer. The benefit
is that as long as TCP is not used, there should be no impact on the
performance of other BTLs. The drawback is that the first TCP connection
will be slightly slower, but then once we have a peer connected over
TCP things go back to normal.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-23 11:15:45 +02:00
Nathan Hjelm
ffd8ee2dfd opal: use opal_list_t convienience macros
This commit cleans up code in opal to use OPAL_LIST_FOREACH(_SAFE),
OPAL_LIST_DESTRUCT, and OPAL_LIST_RELEASE.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-20 12:37:12 -06:00
Thananon Patinyasakdikul
bf7534d32c btl/usnic: changed fi_ep_bind flags for AV from NULL to 0 due to
compiler warning.

This commit fixed compiler warning generated from earlier commit :
ddbe1726c5

Signed-off-by: Thananon Patinyasakdikul <apatinya@cisco.com>
2017-05-22 10:09:43 -07:00
Thananon Patinyasakdikul
a705f2cf7b usNIC: fix fi_ep_bind flag. FI_RECV should not be associated with av.
Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2017-05-16 18:22:28 -04:00
Ralph Castain
a737d0f963 Merge pull request #3430 from bosilca/topic/tcp_hostname
Use the OPAL function to get the hostname.
2017-05-03 06:42:02 -07:00
Brian Barrett
3b991498be btl tcp: Don't set socket buffer size by default
Set the default send and receive socket buffer size to 0,
which means Open MPI will not try to set a buffer size during
startup.

The default behavior since near day one of the TCP BTL has been
to set the send and receive socket buffer sizes to 128 KiB.  A
number that works great on 1 GbE, but not so great on 10 GbE
fabrics of any real size.  Modern TCP stacks, particularly on
Linux, have gotten much smarter about buffer sizes and are much
less efficient if a buffer size is set (even if set to something
large).

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2017-04-28 14:14:49 -07:00
George Bosilca
2d8943d920
Use the OPAL function to get the hostname. 2017-04-28 02:48:15 -04:00
Nathan Hjelm
387467c358 btl/ugni: remove erroneous mca_btl_ugni_frag_return call
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-27 09:14:51 -06:00
Howard Pritchard
f2a27cc991 Merge pull request #3396 from hppritcha/topic/swat_compiler_warning
btl/sm: swat a compiler warning
2017-04-22 14:31:21 -06:00
Jeff Squyres
1d5e08f44a usnic: more iov_limit fixes
Follow on to 7bd2de9960: move setting
the iov_limit to 1 earlier in the startup sequence.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-21 09:14:28 -07:00
Howard Pritchard
782f1bb9af btl/sm: swat a compiler warning
gnu 6.3.1 complaining about uninitialized variable

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-21 10:02:56 -05:00
Howard Pritchard
462342d148 Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi
common/libfabric: move libfabric to ofi
2017-04-21 07:50:38 -06:00
Jeff Squyres
7bd2de9960 usnic: ensure to set the iov_limit to 1
The usNIC BTL does not use more than 1 iov, so be sure to set it to 1
so that we don't allocate cq/rq/sq entries based on a default (i.e.,
>1) number of iovs per entry.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-20 13:28:15 -07:00
Howard Pritchard
841192645b common/libfabric: move libfabric to ofi
This PR renames the common library for OFI libfabric from
libfabric to ofi.  There are a number of reasons this
is good to do:

1) its shorter and replaces 9 characters with three for
   function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
   the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
   the term "ofi" internally and also in their configure options
   relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)

There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.

This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and with-ofi-libdir.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-20 13:07:16 -06:00
Mark Allen
655a06f559 IB fork
The key change was in btl_openib_connect_udcm.c where a buffer was
being pinned with size 65664 (whether openib was being used or not).
The start of the buffer was page aligned, but because of the size
the end wasn't. That makes it too easy for a forked child to accidentally
touch pinned memory on the same page as the end of that buffer.

So this change increases the size of the allocated buffer to use the
rest of the page.

I inspected the rest of the ibv_reg_mr() calls and changed one other
place to page align its buffer too, although I think the above is
the one that really matters.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-04-05 17:35:52 -04:00
Ralph Castain
b398d721d5 Merge pull request #3236 from rhc54/topic/craycleanups
Silence a flood of warnings when compiling with gcc on Cray
2017-03-24 13:33:46 -07:00
Ralph Castain
ecc8000136 Silence a flood of warnings when compiling with gcc on Cray
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 13:37:11 -06:00
Ralph Castain
470452cba0 Correctly check the sa_family and cast the data correctly before passing it to inet_nop, and don't be quite as fancy with the pointer arithmetic as the combination was causing us to segfault every time this debug message was called.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 11:42:57 -07:00
Nathan Hjelm
6b210fa2c4 btl/ugni: do not return a frag from sendi if an endpoint is waitlisted
This fixes a hang that can occur when running bandwidth tests.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 10:14:13 -06:00
Nathan Hjelm
2e42b0afbd btl/ugni: move connection check into sync event
This commit makes datagram checks time based and reduces their
frequency when only the wildcard datagram is posted. This change
improves latency on knl systems.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 10:10:05 -06:00
Nathan Hjelm
d5aaeb74b6 btl/ugni: return a descriptor from sendi
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:56:54 -06:00
Nathan Hjelm
a19e7023d1 btl/ugni: always check local SMSG CQ
This commit removes the local operation count check from the local SMSG
completion queue. This check was leading to hangs due to an undocumented
feature of the ugni library. The local SMSG CQ is used to send credit
return messages back to the sender. The ugni library never checks for
the completion itself but relying on the SMSG user to periodically
check the CQ.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:56:54 -06:00
Nathan Hjelm
d5cdeb81d0 btl/ugni: improve multi-threaded performance
This commit updates the ugni btl to make use of multiple device
contexts to improve the multi-threaded RMA performance. This commit
contains the following:

 - Cleanup the endpoint structure by removing unnecessary field. The
   structure now also contains all the fields originally handled by the
   common/ugni endpoint.

 - Clean up the fragment allocation code to remove the need to
   initialize the my_list member of the fragment structure. This
   member is not initialized by the free list initializer function.

 - Remove the (now unused) common/ugni component. btl/ugni no longer
   need the component. common/ugni was originally split out of
   btl/ugni to support bcol/ugni. As that component exists there is no
   reason to keep this component.

 - Create wrappers for the ugni functionality required by
   btl/ugni. This was done to ease supporting multiple device
   contexts. The wrappers are thread safe and currently use a spin
   lock instead of a mutex. This produces better performance when
   using multiple threads spread over multiple cores. In the future
   this lock may be replaced by another serialization mechanism. The
   wrappers are located in a new file: btl_ugni_device.h.

 - Remove unnecessary device locking from serial parts of the ugni
   btl. This includes the first add-procs and module finalize.

 - Clean up fragment wait list code by moving enqueue into common
   function.

 - Expose the communication domain flags as an MCA variable. The
   defaults have been updated to reflect the recommended setting for
   knl and haswell.

 - Avoid allocating fragments for communication with already
   overloaded peers.

 - Allocate RDMA endpoints dyncamically. This is needed to support
   spreading RMA operations accross multiple contexts.

 - Add support for spreading RMA communication over multiple ugni
   device contexts. This should greatly improve the threading
   performance when communicating with multiple peers. By default the
   number of virtual devices depends on 1) whether
   opal_using_threads() is set, 2) how many local processes are in the
   job, and 3) how many bits are available in the pid. The last is
   used to ensure that each CDM is created with a unique id.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:46:06 -06:00
Nathan Hjelm
12bf38a25c btl/ugni: add MPI_T performance variables for ugni counters
This commit exposes ugni statistics for use with MPI_T. There is
no overhead to providing these counters.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:42:58 -06:00
Joshua Ladd
b28647857f Adding latest ConnectX-5 adapter vendor part id to OpenIB device params.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2017-03-07 00:19:54 +02:00
Jeff Squyres
5b484c91f4 btl/tcp: use show_help to print the dropped-TCP warning
Make the message more friendly / more detailed, and de-duplicate it
(just in case it happens a lot).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-01 16:31:29 -08:00
George Bosilca
ec4a235e6a
Allow a TCP proc release during the create.
This is mostly for error cases, where we need to release the
newly created proc. Currently the code deadlocks because the endpoint
lock is help at the release and the lock is not recursive.

Aslo added some code to print the IP addresses that don't match during
the TCP connection step.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-03-01 13:17:54 -05:00
Jeff Squyres
fec519a793 hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).

This commit:

1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
   find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
   rest of the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:48:42 -08:00
Jeff Squyres
a8247a76c9 Merge pull request #2948 from jsquyres/pr/update-warn-component-unused
help btl base: tell how to disable the warning
2017-02-09 21:10:01 -05:00
Jeff Squyres
e272250531 help btl base: tell how to disable the warning
As reported in
https://www.mail-archive.com/users@lists.open-mpi.org/msg30607.html,
give instructions in the show_help message how to disable the
warning.  Thanks to Susan Schwarz for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-09 15:51:30 -08:00
George Bosilca
bc2890ed11
Upon a new connection go over all available ifaces.
Add a verbose to show all the failed attempts to match the
remote interfaces based on the modex info.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-02-07 19:15:49 -05:00
Gilles Gouaillardet
c62498ab3d btl/tcp: remove reference to just removed tcp_local
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-07 09:32:09 +09:00
Jeff Squyres
368ab4d9a5 Merge pull request #2684 from bosilca/topic/tcp_fixes
Remove the tcp_local field from the TCP component.
2017-02-06 16:32:06 -05:00
Nathan Hjelm
9f28c0af39 verbs: remove extra event user increment/decrement operation
Since the oob and connections systems do not work the same way they
did in older versions of Open MPI these operations are no longer
necessary. At best they do nothing and at worst they hurt performance
by making us enter the event library more often in opal_progress().

Fixes #2839

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-25 18:37:06 -07:00
George Bosilca
999d4973a9
Fix an issue with extremely large data identified by tjb900.
Due to the conversion from ssize_t to int we were losing bytes, and
ended up writing outside the receiver buffer. Similarly on the send,
due to the conversion to a lesser type, we could missinterpret the
end of the fragment.
2017-01-18 10:33:12 -05:00
Gilles Gouaillardet
aeee48357a btl/sm: correctly handle nodes with zero NUMA hwloc object
the hwloc topology might not contain a NUMA object with hwloc < v2
if the node is not NUMA, so force the NUMA object count to one
in order to correctly allocate mca_btl_sm_component.sm_mpools.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-12 11:45:29 +09:00
Jeff Squyres
b980e334dc usnic: add completion stats
This should probably not go to the v2.x branch, since it changes the
output format of the usnic stats.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:54 -08:00
Jeff Squyres
706f53bb01 usnic: ensure that stats string is always truncated
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:54 -08:00
Jeff Squyres
1fdd0fe228 usnic: add missing params to show_help() call
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:54 -08:00
Jeff Squyres
7048adec04 usnic: add some assert()s
Add some run-time assert checks for debug builds.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:32 -08:00
Jeff Squyres
2d28ccb5fd usnic: add verbose output of queue lengths
Show the actual RX/TX and CQ length returned by libfabric in verbose
output.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:32 -08:00
Jeff Squyres
bd5b8ed754 usnic: ensure that queues are long enough
Double check the queue lengths that we get back from libfabric to
ensure that they are at least as long as we need.  They *should* never
be shorter than we need, but let's just check to be sure.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:32 -08:00
Jeff Squyres
53dc75a89c usnic: ensure to reset flags on returned frags
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:31 -08:00
Jeff Squyres
c4d7876ca0 usnic: check send credits on data channel for data frags
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:31 -08:00
Jeff Squyres
879d25e5df usnic: ensure to check send credits for ACKs
Don't just blindly send ACKs; ensure that we have send credits before
doing so.  If we don't have any send credits, just don't send the ACK
(it'll come again soon enough; it's not a tragedy if we don't send it
now).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:31 -08:00
Jeff Squyres
7787dad4db usnic: ensure CQs are long enough
The libfabric usnic provider may give you back TX/RX queues that are
longer than you asked for.  So just use the TX/RQ/CQ lengths that we
asked for, regardless of what length comes back.

Additionally, keep the length of the priority channel CQ separate from
the length of the data CQ.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:03:53 -08:00
Jeff Squyres
b02d8c48f5 usnic: make the releasing safer
Since the usnic BTL is single-threaded in this area, there really is
no danger, but don't use one of the pointers hanging off the frag
after we return it to the freelist.  Instead, save the endpoint
pointer before returning the frag.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:03:53 -08:00
Jeff Squyres
e25b860627 usnic: clarify types
The types are technically typedef equivalent, but it's less confusing
to use the types that agree with the name of the constructor.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:03:53 -08:00
Jeff Squyres
40fe575132 usnic: trivial updates (no code/logic changes)
- Add more explanatory comments
- Trivial whitespace / style updates
- Rename opal_btl_usnic_force_retrans() -> opal_btl_usnic_fast_retrans()

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 10:40:02 -08:00
George Bosilca
cfeeecd381 Remove the tcp_local field from the TCP component.
Instead use the OPAL process name to get the name
of the local process.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-07 13:24:18 -05:00
Gilles Gouaillardet
c2ddb1e2fc mca/base: plug a memory leak
register mca_base_var_enum_value_flag_t so they can be free'd
upon finalize

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:36 +09:00
Gilles Gouaillardet
7e5da7382e btl/tcp: plug leaks when closing component
remove tcp_local from the tcp_procs table, and release it

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Ralph Castain
fe68f23099 Only instantiate the HWLOC topology in an MPI process if it actually will be used.
There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced:

* shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string
  instead of the topology itself, if available, thus avoiding instantiating the topology

* openib BTL. This uses the distance matrix. At present, I haven't developed a method
  for replacing that reference. Thus, this component will instantiate the topology

* usnic BTL. Uses the distance matrix.

* treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate
  the topology

* ess base functions. If a process is direct launched and not bound at launch, this
  code attempts to bind it. Thus, procs in this scenario will instantiate the
  topology

Note that instantiating the topology on complex chips such as KNL can consume
megabytes of memory.

Fix pernode binding policy

Properly handle the unbound case

Correct pointer usage

Do not free static error messages!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-29 10:33:29 -08:00
Ralph Castain
3a2d6a5ab6 Begin to reduce reliance of application procs on the topology tree itself by having the daemon provide more detailed info. In this case, provide the topology description string so that procs can readily determine the number of types of objects on the node, and a "locality" string that describes which objects this process is executing upon. The latter allows a process to compute the objects of overlap between itself and another proc without consulting the topology tree.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-28 09:14:26 -08:00
Gilles Gouaillardet
54c84196a6 btl/vader: plug a memory leak
as reported by Coverity with CID 1362691

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-22 16:04:36 +09:00
Gilles Gouaillardet
3a76a78bff btl/openib: plug a memory leak in btl_openib_register_mca_params()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:30 +09:00
Gilles Gouaillardet
1a279c4ee9 btl/self: fix fragment segment length in mca_btl_self_prepare_src()
opal_convertor_pack() might pack less bytes than requested,
so always set frag->segments[0].seg_len.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-24 10:44:56 +09:00
Howard Pritchard
09f47fcf8e btl/ugni:vader swat some compiler warnings
Swat some compiler warnings.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-11-21 14:58:34 -06:00
Joshua Ladd
4907085c6f Add the ConnectX-5 device ID to openib BTL.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-11-16 21:42:37 +02:00
George Bosilca
d0dddef53d
Protect the tcp_endpoints list from concurrent accesses.
Thanks Gilles for your help.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2016-11-11 00:06:03 -05:00
Gilles Gouaillardet
a49422fe84 btl/tcp: get rid of the MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD macro
since pthreads are now mandatory, the MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD
is always true and hence can be safely removed

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-08 14:00:05 +09:00
Jeff Squyres
a4ffa590c8 Merge pull request #2308 from hjelmn/vader_mem
btl/vader: reduce memory footprint when using xpmem
2016-11-02 10:28:26 -04:00
Steve Wise
7050969d47 openib btl: remove BTL_OPENIB_FAILOVER_ENABLED code
Remove BTL_OPENIB_FAILOVER_ENABLED code in the openib btl source.

Remove the failover-specific files from the openib btl.

Update the openib/Makefile.am accordingly.

Remove the -enable-openib-failover config logic.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
2016-11-01 14:45:36 -07:00
Jeff Squyres
149b660666 btl/usnic: fix compiler warning
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-28 07:36:20 -07:00
Nathan Hjelm
9d92075e60 btl/self: rewrite to decrease memory usage (#2307)
This commit rewrites much of the btl/self component to fix a long
standing memory usage bug. Before this commit the prepare_src path
would always allocate a max send fragment (256kB). This caused the
rank to allocate 32 * 256k useless buffers from one send. This commit
makes the following changes:

 - Add the MCA_BTL_FLAGS_GET flag by default. No reason not to set it.

 - Reduce the eager limit, max send size, buffers per allocation, and
   maximum buffer count per fragment size. These changes should have
   no noticible affect on performance but should greatly reduce the
   memory usage of the component.

 - Implement the sendi function. This should reduce self send latency
   somewhat.

 - Rewrite prepare_src to never allocate a eager or max send fragment
   for contiguous data.

 - add_procs needs to return something in the peer array for the proc
   self not just set the reachability bit. Now stores (void *) 1.

 - Various cleanups. Removed and unused file.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-10-27 12:34:54 -04:00
Nathan Hjelm
a652a193ea btl/vader: reduce memory footprint when using xpmem
The vader btl kept a per-peer registration cache to keep track of
attachments. This is not really a problem with small numbers of local
ranks but can be a problem with large SMP machines. To reduce the
footprint there is now one registration cache for all xpmem
attachments. This will probably increase the lookup time for large
transfers but is a worthwhile trade-off.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-10-27 10:09:43 -06:00