1
1

274 Коммитов

Автор SHA1 Сообщение Дата
Joseph Schuchart
3d08d790e9 osc/rdma: fail query_btls if no endpoint for non-local peer is found
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit eebc451ec8313975998a63e25938fb4e0b4d6c44)
2020-07-17 08:38:37 +02:00
Jeff Squyres
249c57a4bc
Merge pull request #7889 from devreal/osc-rdma-noncontig-requests-v4.1.x
osc rdma: check for outstanding fragments before completing a request (II) (v4.1.x)
2020-06-29 15:19:49 -04:00
Joseph Schuchart
2d3f862f1d osc rdma: check for outstanding fragments before completing a request in ompi_osc_rdma_put_complete_flush as well
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit caed3b2eed478c76f34d56b5d0495bf26e44a9bb)
2020-06-29 15:44:44 +02:00
Jeff Squyres
6d550a32ca
Merge pull request #7856 from hjelmn/v4.1.x_add_a_missing_osc_rdma_commit_for_dynamic_memory_windows_that_was_forgotten_before_the_last_release_doh
osc/rdma: fix bug in attach for non-debug builds
2020-06-22 17:16:42 -04:00
Nathan Hjelm
059c9614a6 osc/rdma: fix bug in attach for non-debug builds
This commit fixes an issue with non-debug builds where adding an
attachment to the attachment list doesn't actually happen. This
causes all MPI_Win_detach calls to fail. The call was within an
assert which is optimized out in optimized builds.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 8ee80d885561c4349ab97725416613614d63c77f)
2020-06-22 08:34:43 -06:00
Joseph Schuchart
1af51d07d6 osc rdma: check for outstanding fragments before completing a request
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 85ed26f2f859283522188a68dea493cdc99d76aa)
2020-06-22 08:32:14 +02:00
Joseph Schuchart
2503b5f10f RDMA osc: remove extra retain on pending_op
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit de67ada44251b2792f3fef19d890dc7aad39dd73)
2020-04-24 09:16:57 -06:00
Nathan Hjelm
f4bc0f46d6 osc/rdma: bump the default max dynamic attachments to 64
This commit increaes the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachement value can be set by the user on other
systems.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 54c8233f4f670ee43e59d95316b8dc68f8258ba0)
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-17 13:49:36 -08:00
Nathan Hjelm
eeb6821550 osc/rdma: modify attach to check for region overlap
This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References #7384

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 6649aef8bde750d2f1f28e52ff295919face3e0b)
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-17 13:45:55 -08:00
Joseph Schuchart
c5cf3432b9 OSC rdma win allocate: synchronize error codes across shared memory group
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 8f27cc26d9845b5b207979b2a4621ef1089d1afb)
2019-06-24 17:49:26 +02:00
Joseph Schuchart
900f0fa21f OSC rdma: make sure accumulating in shared memory is safe
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit c67e2291937a09947c421dc84c6b3a8d07bec07f)
2019-06-07 12:45:00 +02:00
Gilles Gouaillardet
749f51845b osc/rdma: correctly handle communications to self
mark the "self" peer OMPI_OSC_RDMA_PEER_LOCAL_BASE when
the window is dynamically created and use_cpu_atomics is set
in order to correctly handle communications to self.

Thanks Bart Janssens for reporting this issue.

Refs. open-mpi/ompi#6394

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@fe05fcc11a)
2019-02-20 13:06:05 +09:00
Joseph Schuchart
c5346751e6 Plug two memory leaks in rdma osc
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 91885f5876129aa4fb43ed4b3404c9d1ca7e08b8)
2018-11-29 10:19:26 -05:00
Geoff Paulsen
bc798b6135
Merge pull request #5755 from gpaulsen/osc_rdma_cleanup
osc/rdma: clean out stale aggregation code
2018-09-22 15:00:21 -05:00
Nathan Hjelm
72fc8acb50 osc/rdma: quiet warning
gcc complains about ret possibly being used uninitialized. That will
never happen but we should still quiet the warning. This commit sets
ret to a valid value.

Fixes #5513

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:44:56 -05:00
Nathan Hjelm
56e31f8206 osc/rdma: clean out stale aggregation code
The aggregation code in osc/rdma is currently broken and will likely
not be reused. This commit cleans it out.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:42:45 -05:00
Nathan Hjelm
304a6a52d4 osc/rdma: use local base for local process when possible
This commit fixes a crash that occurs when using btl/vader as an RDMA
btl. This btl supports using CPU atomics and does not support using
the btl for self communication so we must use the local memory
optimizations in osc/rdma.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-12 15:50:50 -06:00
Nathan Hjelm
037656bc1d osc/rdma: fix bug introduced in b90c838
This commit fixes an bug that was introduced back in 2016 which
impacts request-based RMA in some cases.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-10 18:17:55 -06:00
Nathan Hjelm
e4989714c2 osc/rdma: fix data race on teardown
The osc/rdma module did not wait for all pending atomics to complete
before tearing down. This could lead to weird issues as the target
location may no longer be registered or allocated.

This commit also fixes an offset calculation issue in
ompi_osc_get_data_blocking ().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-06-25 11:47:34 -06:00
Howard Pritchard
64de269cc3 topo/treematch - quash compiler warning
quash a compiler warning showing up with gcc 7.3

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-06-13 16:34:17 -05:00
Nathan Hjelm
d0d59b1d7d osc/rdma: add support for controlling location of backing store
This commit adds a new MCA variable to set the location of the backing
store: osc_rdma_backing_directory. The default on Linux has been
changed to use /dev/shm to improve performance in cases where /tmp is
not a tmpfs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-29 21:43:33 -06:00
bosilca
2ab628b92e
Merge pull request #5074 from bosilca/topic/remove_warnings
Remove warnings identified by clang.
2018-05-15 11:15:23 -04:00
Nathan Hjelm
cf585d725c osc/rdma: fix SEGV will null origin in FOP in debug build
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-08 14:10:20 -06:00
George Bosilca
6ff11267fb
Remove warnings identified by clang.
Plus minor spacing and indentation issues.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-04-14 17:14:12 -04:00
Nathan Hjelm
e79debc320 osc/rdma: fix overflow in offset calculation
This commit fixes a bug is osc/rdma that can occur if the total size
of the shared memory segment gets larger than 4 GiB. The bug was
caused by a typo. The type of my_base_offset should have been size_t
not int.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-03-27 09:33:44 -06:00
Nathan Hjelm
f7faacca4e osc/rdma: fix 32-bit builds
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-03-27 09:16:04 -06:00
Jeff Squyres
06af6f1c4c
Merge pull request #4962 from jsquyres/pr/cid-fixes
A bunch of CID fixes
2018-03-26 22:30:31 -04:00
Jeff Squyres
124208198c osc/rdma: fix CID 1424327
Fix minor memory leak.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-26 14:21:21 -07:00
Ralph Castain
3a93b535ec Silence the flood of OSC/RDMA warnings
Fixes #4950

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-25 16:12:41 -07:00
Nathan Hjelm
7f4872d483 osc/rdma: performance improvments and bug fixes
This commit is a large update to the osc/rdma component. Included in
this commit:

 - Add support for using hardware atomics for fetch-and-op and single
   count accumulate  when using the accumulate lock. This will improve
   the performance of these operations even when not setting the
   single intrinsic info key.

 - Rework how large accumulates are done. They now block on the get
   operation to fix some bugs discovered by an IBM one-sided test. I
   may roll back some of the changes if the underlying bug in the
   original design is discovered. There appear to be no real
   difference (on the hardware this was tested with) in performance so
   its probably a non-issue. References #2530.

 - Add support for an additional lock-all algorithm: on-demand. The
   on-demand algorithm will attempt to acquire the peer lock when
   starting an RMA operation. The lock algorithm default has not
   changed. The algorithm can be selected by setting the
   osc_rdma_locking_mode MCA variable. The valid values are two_level
   and on_demand.

 - Make use of the btl_flush function if available. This can improve
   performance with some btls.

 - When using btl_flush do not keep track of the number of put
   operations. This reduces the number of atomic operations in the
   critical path.

 - Make the window buffers more friendly to multi-threaded
   applications. This was done by dropping support for multiple
   buffers per MPI window. I intend to re-add that support once the
   underlying performance bug under the old buffering scheme is
   fixed.

 - Fix a bug in request completion in the accumulate, get, and put
   paths. This also helps with #2530.

 - General code cleanup and fixes.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-03-15 14:53:53 -06:00
Matias A Cabral
009ba475e1 osc/rmda: fix missing opal_argv_free in mtls search.
Use asprintf in description message to avoid missing default values
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-01-11 14:29:16 -08:00
Nathan Hjelm
1282e98a01 opal/asm: rename existing arithmetic atomic functions
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Nathan Hjelm
9d0b3fe9f4 opal/asm: remove opal_atomic_bool_cmpset functions
This commit eliminates the old opal_atomic_bool_cmpset functions. They
have been replaced by the opal_atomic_compare_exchange_strong
functions.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Nathan Hjelm
45db3637af osc/rdma: bug fixes
This commit fixes the following bugs:

 - Allow a btl to be used for communication if it can communicate with
   all non-self peers and it supports global atomic visibility. In
   this case CPU atomics can be used for self and the btl for any
   other peer.

 - It was possible to get into a state where different threads of an
   MPI process could issue conflicting accumulate operations to a
   remote peer. To eliminate this race we now update the peer flags
   atomically.

 - Queue up and re-issue put operations that failed during a BTL
   callback. This can occur during an accumulate operation. This was
   an unhandled error case.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-29 12:43:58 -07:00
Nathan Hjelm
67e26b6e5a
Merge pull request #4482 from matcabral/osc_rdma_skip_mtls
osc/rdma: mca parameter to list MTLs that lower osc rdma priority
2017-11-29 09:34:33 -07:00
Ralph Castain
3906aaf41a Silence warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-25 11:50:18 -08:00
Matias A Cabral
80c8858c5a osc/rdma: add an mca parameter to list MTLs for which osc pt2pt should have
higher priority than rdma and default to psm2.

Context: the Intel Omni-path driver (hfi1) has verbs support, so the openib
btl is available to use. However, at a bad performance. Without this
change osc rdma using btl openib is the default choice when running on Intel
Omni-path, with a lower performance than osc pt2pt over mtl psm2.

Signed-off-by: Matias A Cabral <matias.a.cabral@intel.com>
2017-11-09 11:54:11 -08:00
Nathan Hjelm
3ff34af355 opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*
This commit renames the atomic compare-and-swap functions to indicate
the return value. This is in preperation for adding support for a
compare-and-swap that returns the old value. At the same time the
return type has been changed to bool.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-10-31 12:47:23 -06:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la"
```

Note: The changes in this commit were automated by the script in
the commit that proceeds it with the `libadd_mca_comp_update.py`
script. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
Ralph Castain
7a83fdb9bb Update to hwloc 2.0.0a with shmem support.
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 20:26:22 -07:00
Nathan Hjelm
022c658bbf osc/rdma: rework locking code to improve behavior of unlock
This commit changes the locking code to allow the lock release to be
non-blocking. This helps with releasing the accumulate lock which may
occur in a BTL callback.

Fixes #3616

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-27 15:29:51 -06:00
Nathan Hjelm
31ab83362a osc/rdma: cleanup local peer setup and fix a bug
The data endpoint was not being set correctly for local peers in some
cases. This commit fixes the bug and cleans the associated code to
simplify the logic.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-22 13:28:45 -06:00
Gilles Gouaillardet
5e9be7667b Merge pull request #3600 from ggouaillardet/topic/osc_rdma_get_segment
osc/rdma: fix osc_rdma_get_remote_segment() length parameter
2017-06-01 13:09:14 +09:00
Nathan Hjelm
e1a997c0cb Merge pull request #3593 from hjelmn/bug_3575
osc/rdma: fix typo in ompi_osc_rdma_lock_acquire_exclusive
2017-05-31 08:54:40 -06:00
Gilles Gouaillardet
e622ca8c1c osc/rdma: fix osc_rdma_get_remote_segment() length parameter
a buffer defined by (buf, count, dt)
will have data starting at buf+offset and ending len bytes later with
len = opal_datatype_span(&dt.super, count, &offset);

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-29 11:08:03 +09:00
Nathan Hjelm
b83c5dbee5 osc/rdma: fix typo in ompi_osc_rdma_lock_acquire_exclusive
Fixes #3575

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-05-26 14:21:08 -06:00
Josh Hursey
4bfb0fcddd Merge pull request #3577 from markalle/pr/osc_rdma_rangecheck
fix for buffer length check (rdma osc w/ odd datatypes)
2017-05-26 10:44:33 -05:00
Gilles Gouaillardet
0f79259b94 osc/rdma: use extent of the appropriate datatype in ompi_osc_rdma_rget_accumulate_internal()
origin_datatype and target_datatype might be different and hence have different extent,
so use either origin_extent or target_extent when appropriate.

Refs open-mpi/ompi#3569

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-26 13:59:38 +09:00
Mark Allen
df14cbf039 fix for buffer length check (rdma osc w/ odd datatypes)
The osc_rdma_get_remote_segment() has the 3rd and 4th args as
* target_disp
* length
which it uses to determine if the rdma falls within the bounds of
the window or not (actually it only checks the upper bound, but I'm
okay with that).

Anyway the caller previously was passing in the length argument as
    target_datatype->super.size * target_count
which which doesn't really represent the number of bytes after target_disp
for which data exists. In particular I could create a datatype as
    { disp -4, len 4 } and use target_disp 4
and that would be bytes 0-3 of the window where the original code
would think it was bytes 4-7 and could abort at the range check.

Ive changed it to use the opal_datatype_span() function.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-24 19:10:39 -04:00
Mark Allen
c9f31a8d39 fix for 1sided with some hosts single rank
See bug report
    https://github.com/open-mpi/ompi/issues/3548

If a 1sided test is launched -host hostA:2,hostB:1 some of the ranks
call allocate_state_single() and others call allocate_state_shared().
These functions were producing different values for module->state_size
but that's used when they lookup peer info from each other in
ompi_osc_rdma_peer_setup() so they need to all have matching
module->state_offset values.

This change adds a few unused bytes in the memory allocate_state_single()
creates so it matches.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-22 15:10:49 -04:00