
804 Commits

Author SHA1 Message Date
Jeff Squyres
db825abc00 usnic: don't overrun the fi_av_insert() EQ
Add endpoints in a blocked manner so that we don't overrun the
fi_av_insert() event queue.  Also make the AV EQ length an MCA param,
and report it in mca_btl_base_verbose >=5 output.
2016-01-30 08:33:48 -08:00
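The blocked insertion described above can be sketched roughly as follows. This is not the usnic code: `insert_blocked`, the callback types, and the demo callbacks are hypothetical names, and draining each block before submitting the next is an assumption about how the overrun is avoided.

```c
#include <stddef.h>

/* Hypothetical callbacks standing in for the real insert and drain steps. */
typedef int (*submit_fn)(size_t first, size_t count, void *ctx);
typedef int (*drain_fn)(size_t count, void *ctx);

/* Submit `total` items in blocks of at most `eq_len`, draining each
 * block's completions before submitting the next, so that no more than
 * `eq_len` events are ever outstanding. Returns 0 on success. */
static int insert_blocked(size_t total, size_t eq_len,
                          submit_fn submit, drain_fn drain, void *ctx)
{
    if (eq_len == 0) {
        return -1; /* a zero-length queue can never make progress */
    }
    for (size_t first = 0; first < total; ) {
        size_t count = total - first;
        if (count > eq_len) {
            count = eq_len;
        }
        int rc = submit(first, count, ctx);
        if (rc != 0) {
            return rc;
        }
        rc = drain(count, ctx);
        if (rc != 0) {
            return rc;
        }
        first += count;
    }
    return 0;
}

/* Demo callbacks that only track how many events are outstanding. */
struct eq_stats {
    size_t outstanding;
    size_t max_outstanding;
    size_t completed;
};

static int demo_submit(size_t first, size_t count, void *ctx)
{
    struct eq_stats *s = ctx;
    (void)first;
    s->outstanding += count;
    if (s->outstanding > s->max_outstanding) {
        s->max_outstanding = s->outstanding;
    }
    return 0;
}

static int demo_drain(size_t count, void *ctx)
{
    struct eq_stats *s = ctx;
    s->outstanding -= count;
    s->completed += count;
    return 0;
}
```

With the demo callbacks, inserting 10 items through a queue of length 4 submits blocks of 4, 4, and 2, so the number of outstanding events never exceeds the queue length.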
Jeff Squyres
d624e0d60f usnic: fix wraparound sequence number issue
Sequence numbers will wrap around; it is not sufficient to check for
(seq-1) -- must use the SEQ_DIFF macro to properly handle the
wraparound.

This bug wasn't serious; it just meant we might retransmit one or two
extra times when retransmits were triggered and the sequence numbers
wrapped around their sliding windows.
2016-01-30 08:32:13 -08:00
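A minimal sketch of a wraparound-safe sequence comparison, to illustrate why a plain (seq-1) check is not enough. The actual SEQ_DIFF macro is not reproduced here; the `seq_t` type, its 16-bit width, and the helper names are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-bit sequence number type; the real width may differ. */
typedef uint16_t seq_t;

/* Signed distance from b to a. The result is correct across wraparound
 * as long as the two values are less than half the sequence space apart. */
static inline int16_t seq_diff(seq_t a, seq_t b)
{
    return (int16_t)(seq_t)(a - b);
}

/* True if a comes strictly before b in sequence order. */
static inline bool seq_lt(seq_t a, seq_t b)
{
    return seq_diff(a, b) < 0;
}
```

Here `seq_diff(0, 65535)` is 1, so sequence number 0 is correctly treated as the successor of 65535, whereas a naive `0 > 65535` comparison would say otherwise.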
Jeff Squyres
4de4a263f5 usnic: ensure all messages are sent on the data channel
Messages should go on the data channel, even if they're short.  Only
ACKs go on the priority channel.
2016-01-30 08:31:21 -08:00
Jeff Squyres
348ac507c2 usnic: explain why we still have OPAL_HAVE_HWLOC
Put in a comment explaining why btl_usnic_compat.h still defines
OPAL_HAVE_HWLOC, even though master/v2.x no longer does.
2016-01-16 04:11:05 -08:00
Jeff Squyres
0f5fcf9029 usnic: fix common symbol 2016-01-16 03:55:27 -08:00
Jeff Squyres
60ffe713b8 common syms: whitelist bison-generated common symbols
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-01-16 03:53:14 -08:00
Artem Polyakov
84e4fb308b Fix race condition in UDCM where service thread sees that
`cm_message_event_active == 1` but main thread has already stopped
processing messages and thus we will have the situation where one
message was left unhandled leading to a hang.
2016-01-08 23:56:21 +06:00
Jeff Squyres
6d073a8da4 btl_sm: add a comment explaining why we rename(2)
Per open-mpi/ompi#1230, add a comment explaining why we write to a
temporary file and then rename(2) the file, just so that future code
maintainers don't wonder why we do this seemingly-useless step.
2016-01-04 14:51:52 -05:00
Artem Polyakov
2abb2972ac Fix Mellanox copyrights with respect to the following PRs:
* https://github.com/open-mpi/ompi/pull/1184
* https://github.com/open-mpi/ompi/pull/1188
* https://github.com/open-mpi/ompi/pull/1197
* https://github.com/open-mpi/ompi/pull/1202
* https://github.com/open-mpi/ompi/pull/1210
* https://github.com/open-mpi/ompi/pull/1216
* https://github.com/open-mpi/ompi/pull/1236
* https://github.com/open-mpi/ompi/pull/1237
* https://github.com/open-mpi/ompi/pull/1248
* https://github.com/open-mpi/ompi/pull/1260
* https://github.com/open-mpi/ompi/pull/1264
2015-12-30 00:12:19 +06:00
Gilles Gouaillardet
fec973efda configury: test portability
replace test ... -o ... with test ... || test ...
and test ... -a ... with test ... && test ...
2015-12-28 13:58:45 +09:00
Nathan Hjelm
700a21022a Merge pull request #1260 from artpol84/openib_proc_account_fix
Openib proc accounting fix
2015-12-27 15:19:52 -07:00
Artem Polyakov
a20826e6b4 Fix vader resource leak.
This nasty bug was nicely masked. It was causing `mca_btl_vader_component.vader_frags_user`
to overflow and, as a result, rare hangs of ompi-test-suite.
2015-12-28 00:41:45 +06:00
Gilles Gouaillardet
2d9aa38e6a btl/openib: fix heterogeneous support 2015-12-25 16:31:35 +09:00
Artem Polyakov
3031affdb7 Fix openib process accounting when procs are dynamically added. 2015-12-24 17:56:35 +06:00
Artem Polyakov
400af6c52d openib addproc improvements:
1. finer grained locks;
2. separate srq creation from cq adjustments.
2015-12-24 17:56:35 +06:00
Artem Polyakov
41c325f15a Shift common code for calculating a port count and btl_rank in openib
into the static function
2015-12-24 17:56:35 +06:00
Gilles Gouaillardet
5fa63f086a btl/tcp: add missing #include <unistd.h>
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:46 +09:00
Gilles Gouaillardet
15ed7ad9f5 btl/sm: add missing #include <unistd.h>
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:41 +09:00
Gilles Gouaillardet
42313acd58 btl/usnic: add missing #include <alloca.h> 2015-12-24 14:33:58 +09:00
Nathan Hjelm
84d890b7e7 Merge pull request #1248 from artpol84/openib_proc_init_race
Openib dynamic add proc race conditions
2015-12-22 21:48:05 -07:00
Artem Polyakov
08ad8357a8 Fix local process accounting in openib when dynamic add_proc is on. 2015-12-22 22:44:46 +06:00
Artem Polyakov
3c2f6d5560 Protect openib_btl->device data with explicit opal_mutex locks. 2015-12-22 18:33:26 +06:00
Gilles Gouaillardet
607d7c7545 btl/sm: rename file after file descriptor has been closed.
Thanks George for spotting this.
2015-12-22 13:56:53 +09:00
Artem Polyakov
e06bffe213 Fix ib_proc locking 2015-12-21 18:52:31 +06:00
Artem Polyakov
3eb4756a17 Force locking regardless of the opal_using_threads() setting. 2015-12-21 18:52:31 +06:00
Artem Polyakov
11b72d9add Make important fields of ib_proc volatile. 2015-12-21 18:52:31 +06:00
Artem Polyakov
86c0c3ec52 Provide additional information: whether ib_proc was newly created or
it was already existing.
2015-12-21 18:52:31 +06:00
Artem Polyakov
9325bd3d69 Protect device initialization 2015-12-21 18:52:31 +06:00
Artem Polyakov
0f77bc7ea7 Perform endpoint initialization atomically. 2015-12-21 18:52:31 +06:00
Artem Polyakov
afaf9c9ea6 Shift ib_proc initialization to the separate function. 2015-12-21 18:52:31 +06:00
Artem Polyakov
3c9fd567b6 Fix openib race condition when direct modex is used.
The problem was in mca_btl_openib_proc_create. This function may be called
from several places simultaneously:
* from the main thread when somebody wants to do `MPI_Send()` (for example) for
the first time;
* from udcm if the counterpart peer is trying to connect and `mca_btl_openib_get_ep()`
is called.

In this case one of the threads may add an uninitialized proc structure
to the `mca_btl_openib_component.ib_procs` and the other will read it and
treat it as initialized.

This commit turns ib_proc initialization into a single atomic operation.
2015-12-21 18:52:30 +06:00
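The idea of making creation and initialization a single atomic step can be sketched as below. This is an illustration with POSIX threads, not the openib code: `struct proc`, `proc_get_or_create`, and the fields shown are hypothetical, and the real component uses its own list and lock types.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-peer proc record. */
struct proc {
    int peer_id;
    int port_count;      /* example field that must be set before use */
    struct proc *next;
};

static struct proc *proc_list = NULL;
static pthread_mutex_t proc_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Find the record for peer_id, or create it. Lookup, initialization, and
 * insertion all happen under one lock, so no other thread can observe a
 * half-initialized record. *is_new reports whether this call created it. */
static struct proc *proc_get_or_create(int peer_id, bool *is_new)
{
    struct proc *p;

    pthread_mutex_lock(&proc_list_lock);
    for (p = proc_list; p != NULL; p = p->next) {
        if (p->peer_id == peer_id) {
            *is_new = false;
            pthread_mutex_unlock(&proc_list_lock);
            return p;
        }
    }

    p = calloc(1, sizeof(*p));
    if (p != NULL) {
        /* Fully initialize the record before publishing it on the list. */
        p->peer_id = peer_id;
        p->port_count = 1;
        p->next = proc_list;
        proc_list = p;
    }
    *is_new = (p != NULL);
    pthread_mutex_unlock(&proc_list_lock);
    return p;
}
```

Both callers named above (the main thread and the connection-manager thread) would go through the same function, and the `is_new` flag lets the caller that actually created the record perform any follow-up work exactly once.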
Gilles Gouaillardet
db4f483653 btl/sm: fix race condition
write to file and then rename, so when the file is open for read, its content is known to have been written.

Fixes open-mpi/ompi#1230
2015-12-21 16:37:51 +09:00
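A minimal sketch of the write-then-rename pattern using POSIX calls. It is not the btl/sm code: `publish_file`, the `.tmp` suffix, and the file mode are assumptions chosen for illustration. The rename happens only after the descriptor is closed, so a reader that can open the final path sees the complete content.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write buf to "<path>.tmp", close it, then rename(2) it to path.
 * Returns 0 on success and -1 on failure. */
static int publish_file(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        return -1;
    }

    int fd = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before writing anything; retry */
            }
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Close first, then rename, so the file is complete when it appears. */
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```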
Nathan Hjelm
e77199fd4f Merge pull request #1235 from ggouaillardet/topic/ibv_exp_fixes
btl/openib: do not mix exp and non exp verbs
2015-12-17 08:36:09 -07:00
Gilles Gouaillardet
994a627f82 btl/openib: do not mix exp and non exp verbs 2015-12-17 16:45:43 +09:00
Artem Polyakov
0951a34e95 Fix openib memory registration limit calculation if cutoff = 0. 2015-12-17 13:45:19 +06:00
Jeff Squyres
2b9341a38a usnic: fix embarrassing typo 2015-12-15 19:01:19 -08:00
Jeff Squyres
944d5061a6 usnic: sendto() can return EPERM if we send too fast
If we send too fast, sendto() can run out of resources and return
EPERM.  So delay a little and try again.
2015-12-15 15:31:29 -08:00
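The delay-and-retry behavior can be sketched as follows. This is only an illustration: `sendto_retry`, the retry limit, and the delay value are hypothetical parameters, and the commit does not say how long the real code waits or how many times it retries.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Call sendto(2); if it fails with EPERM, sleep for delay_ns nanoseconds
 * (must be below one second) and try again, up to max_retries additional
 * attempts. Any other result is returned to the caller unchanged. */
static ssize_t sendto_retry(int fd, const void *buf, size_t len,
                            const struct sockaddr *dest, socklen_t destlen,
                            int max_retries, long delay_ns)
{
    for (int attempt = 0; ; ++attempt) {
        ssize_t rc = sendto(fd, buf, len, 0, dest, destlen);
        if (rc >= 0 || errno != EPERM || attempt >= max_retries) {
            return rc;
        }
        struct timespec delay = { 0, delay_ns };
        nanosleep(&delay, NULL);
    }
}
```

Errors other than EPERM are not retried, so the caller still sees the original errno value for them.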
Jeff Squyres
ab1bbca5b9 usnic: improve error message
When sendto() fails, it would be helpful to see the errno value.
2015-12-15 15:04:25 -08:00
Jeff Squyres
c1a6beac8d usnic: fix error message
There were too many "%s" instances.  Re-order the output so that we
show file, line, and then the error message.
2015-12-15 14:48:38 -08:00
Nathan Hjelm
c98086f028 Merge pull request #1223 from hjelmn/ib_use_srq
btl/openib: use only SRQ on ib by default
2015-12-15 14:04:19 -08:00
Nathan Hjelm
00da520fd5 Merge pull request #1222 from hjelmn/vader_fix
btl/vader: do not attempt to munmap opal/shmem pointer
2015-12-15 09:06:50 -08:00
Nathan Hjelm
b24b3a4ae4 btl/openib: use only SRQ on ib by default
It was decided some time ago that there is no benefit to using any
per-peer receive queues on infiniband. At the time we decided not to
change the default but that objection has been dropped. This commit
changes the 128 message queue to use SRQ instead of PP. This has no
impact on iWarp which sets the default in a different way.

Closes open-mpi/ompi#1156

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-15 09:48:03 -07:00
Nathan Hjelm
60591ae753 btl/vader: do not attempt to munmap opal/shmem pointer
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-15 08:48:04 -07:00
Todd Kordenbrock
7b97963669 btl-portals4: remove unnecessary PtlMDBind result check
When PtlMDBind was removed, the result check was left in which
causes intermittent failures depending on the junk value found in
the 'ret' variable.  The commit removes the result check.
2015-12-14 12:09:01 -06:00
Nathan Hjelm
f692576f1e btl/openib: add check for IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG
Mofed 2.2 does not have the IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG attribute
flag. Add a check to fix compilation for mofed 2.2. This commit only
fixes compilation with the older mofed. It will not allow an Open MPI
compiled with mofed 2.3 or newer to work on a machine with mofed 2.2.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-09 17:02:36 -07:00
Todd Kordenbrock
2b7e983989 btl-portals4: set endpoint rank even if endpoint already exists
If btl-portals4 is configured to use logical mapping of ranks to
physical nodes, then the endpoint must have the rank field set.
This commit fixes a bug that caused the endpoint to have the
nid/pid instead of the rank if the endpoint already exists.
2015-12-08 12:29:00 -06:00
Nathan Hjelm
c9382f23e9 mlx5: need to set comp_mask to get experimental verbs attributes
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-08 10:34:16 -07:00
Nathan Hjelm
191aebb9c8 btl/openib: fix compile problems when using experimental verbs
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 22:21:26 -07:00
Nathan Hjelm
bb8e347371 btl/openib: update experimental verbs support
This update adds an additional check (if supported) to see if 8-byte
atomics are supported by the hardware. If 8-byte atomics are not
supported the atomics support is disabled.

This commit also includes some cleanup.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 12:32:04 -07:00
Nathan Hjelm
02a6c6856d btl/openib: add support for mlx5 atomic operations
This commit adds support for fetch-and-add and compare-and-swap when
using the mlx5 driver. The support is only enabled if the expanded
verbs interface is detected. This is required because mlx5 HCAs return
the atomic result in network byte order. This support may need to be
tweaked if Mellanox commits their changes into upstream verbs.

Closes open-mpi/ompi#1077

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-23 16:07:12 -07:00
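Because the atomic result arrives in network byte order, it has to be converted before the host can use it. The helper below is a portable sketch of that conversion for a 64-bit value; `net64_to_host` is a hypothetical name and this is not the code the commit added.

```c
#include <stdint.h>
#include <string.h>

/* Interpret a 64-bit value whose bytes are in network (big-endian) order
 * and return it in host order, independent of the host's endianness. */
static inline uint64_t net64_to_host(uint64_t net)
{
    unsigned char bytes[8];
    uint64_t host = 0;

    memcpy(bytes, &net, sizeof(bytes));
    for (int i = 0; i < 8; ++i) {
        host = (host << 8) | bytes[i];
    }
    return host;
}
```

On a big-endian host the function returns its argument unchanged; on a little-endian host it reverses the byte order.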