introduce separate convertors for memory vs. file representation. Adjust the interfaces for decode_datatype to provide the convertor to be used for that.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
the infrastructure put in place to manage cuda buffers is actually
a lot more generic than just for cuda buffers. Specifically, we ca
reuse much of the code to implement the external32 data representation.
This commit converts the code from common_ompio_cuda* to
common_ompio_buffer*. There are just very few places where we actually need to keep the OPAL_CUDA_SUPPORT ifdef in place.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
If user sets HCOLL_EXTERNAL_UCM_EVENTS=1 then we try init opal
memory framework and register a mem release cb. Otherwise, rely on ucx.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
Atomic lock must progress local worker while obtaining the remote lock,
otherwise an active message which actually releases the lock might not
be processed while polling on local memory location.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
SLURM 19 discontinued the use of --cpu_bind (and changed it to
--cpu-bind). There's no easy way to test at run time which one is
accepted, so set the environment variable SLURM_CPU_BIND to "none",
which should do the same thing as the srun CLI parameter.
Signed-off-by: Jordan Hayes <jhayes@ucr.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Remove compatibility code for multiple versions of BTL_IN_OPAL,
BTL_VERSION, and RCACHE_VERSION. This stuff was really only necessary
when we were actively swapping code between multiple release branches
that had large variations in core OMPI infrastructure. These large
variations have now been around for quite a while, so the need for
this "compat" layer is significantly reduced. It hasn't been removed
simply because a few of the "compat" names a slightly more friendly
than the real names (e.g., the SEND/RECV/PUT names).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit fixes an error in the 32-bit compare-and-swap atomic support
for Aries networks. The code was incorrectly using the non-fetching
version of cswap which was causing the routing to return
OPAL_ERR_BAD_ARG.
Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
The rdma_frag attached to the send request was not correctly released
upon request completion, leaking until MPI_Finalize. A quick solution
would have been to add RDMA_FRAG_RETURN at different locations on the
send request completion, but it would have unnecessarily made the
sendreq completion path more complex. Instead, I added the length to
the RDMA fragment so that it can be completed during the remote ack.
Be more explicit on the comment.
The rdma_frag can only be freed once when the peer forced a protocol
change (from RDMA GET to send/recv). Otherwise the fragment will be
returned once all data pertaining to it has been trasnferred.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
In OMPI 2.1.2, buildrpm.sh could work with a value of rpmtopdir that was
set in the environment. In newer versions this is no longer true,
causing such values to be ignored. This patch adds a new argument to
buildrpm.sh, -R, which allows the user to specify where to build the
RPMs.
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>