external32 data representation is now support by ompio for everything
but non-blocking collective I/O operations. The support can further be improved
in a second step to limit the temporary buffer size (at least for blocking operations),
but it does work now for many scenarios.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
introduce separate convertors for memory vs. file representation. Adjust the interfaces for decode_datatype to provide the convertor to be used for that.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
the infrastructure put in place to manage cuda buffers is actually
a lot more generic than just for cuda buffers. Specifically, we ca
reuse much of the code to implement the external32 data representation.
This commit converts the code from common_ompio_cuda* to
common_ompio_buffer*. There are just very few places where we actually need to keep the OPAL_CUDA_SUPPORT ifdef in place.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
If user sets HCOLL_EXTERNAL_UCM_EVENTS=1 then we try init opal
memory framework and register a mem release cb. Otherwise, rely on ucx.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
Atomic lock must progress local worker while obtaining the remote lock,
otherwise an active message which actually releases the lock might not
be processed while polling on local memory location.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
SLURM 19 discontinued the use of --cpu_bind (and changed it to
--cpu-bind). There's no easy way to test at run time which one is
accepted, so set the environment variable SLURM_CPU_BIND to "none",
which should do the same thing as the srun CLI parameter.
Signed-off-by: Jordan Hayes <jhayes@ucr.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
- in some cases realloc operation may be completed without
allocation of new buffer (and without additional data copy)
- added logic to reallocate buffer inplace if possible
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
Remove compatibility code for multiple versions of BTL_IN_OPAL,
BTL_VERSION, and RCACHE_VERSION. This stuff was really only necessary
when we were actively swapping code between multiple release branches
that had large variations in core OMPI infrastructure. These large
variations have now been around for quite a while, so the need for
this "compat" layer is significantly reduced. It hasn't been removed
simply because a few of the "compat" names a slightly more friendly
than the real names (e.g., the SEND/RECV/PUT names).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit fixes an error in the 32-bit compare-and-swap atomic support
for Aries networks. The code was incorrectly using the non-fetching
version of cswap which was causing the routing to return
OPAL_ERR_BAD_ARG.
Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>