This commit increaes the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachement value can be set by the user on other
systems.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit addresses two issues in osc/rdma:
1) It is erroneous to attach regions that overlap. This was being
allowed but the standard does not allow overlapping attachments.
2) Overlapping registration regions (4k alignment of attachments)
appear to be allowed. Add attachment bases to the bookeeping
structure so we can keep better track of what can be detached.
It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.
References #7384
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig`
- correct the value of `blocking_fence`
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
To avoid fully initializing the osc/ucx component for MPI application
that are not using One-Sided functionality, the initialization happens
at the first MPI window creation.
This commit ensures atomicity of global state modifications.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
Atomic lock must progress local worker while obtaining the remote lock,
otherwise an active message which actually releases the lock might not
be processed while polling on local memory location.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
mark the "self" peer OMPI_OSC_RDMA_PEER_LOCAL_BASE when
the window is dynamically created and use_cpu_atomics is set
in order to correctly handle communications to self.
Thanks Bart Janssens for reporting this issue.
Refs. open-mpi/ompi#6394
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Place the content of common_ucx_int.h back to the common_ucx.h and
include common_ucx_wpool.h explicitly.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
Now Open MPI requires a C99 compiler. Checking availability of
the following types is no more needed.
- `long long` (`signed` and `unsigned`)
- `long double`
- `float _Complex`
- `double _Complex`
- `long double _Complex`
Furthermore, the `#if HAVE_[TYPE]` style checking is not correct.
Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`.
`AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h`
if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]`
(instead of defining as `0`) if it is not available. So even if we
need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`.
I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac`
since someone may use `HAVE_[TYPE]` macros somewhere.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>