Avoid printing an error message about ENOTCONN return codes from
getpeername() when handling an incoming connection request. At
this point in the receive state machine, the remote process has
been verified to be a valid OMPI instance. In all-to-all startup
at 4k rank scale, we're seeing this error message when the remote
side drops the connection because it realizes it's the "loser"
in the connection race. We were already doing all the right things,
other than printing a scary error message. So skip the error
message and call it good.
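
A minimal sketch of the idea, not the actual OMPI code: treat ENOTCONN
from getpeername() as a quiet "peer already went away" result and report
only other errors. The helper name query_peer() and the PEER_* return
values are hypothetical and used for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical result codes for this sketch (not OMPI constants). */
enum { PEER_OK = 0, PEER_GONE = 1, PEER_ERROR = -1 };

/* Hypothetical helper: look up the peer address of socket sd.
 * ENOTCONN is reported to the caller without printing anything, because
 * the remote side may legitimately have dropped the connection. */
static int query_peer(int sd, struct sockaddr_storage *addr)
{
    socklen_t len = sizeof(*addr);

    assert(addr != NULL);
    if (getpeername(sd, (struct sockaddr *) addr, &len) == 0) {
        return PEER_OK;
    }
    if (errno == ENOTCONN) {
        /* Peer closed the connection; clean up quietly. */
        return PEER_GONE;
    }
    /* Any other failure is unexpected and still worth a message. */
    fprintf(stderr, "getpeername() failed: %s (%d)\n", strerror(errno), errno);
    return PEER_ERROR;
}
```

A caller would close the socket in both the PEER_GONE and PEER_ERROR
cases; only the diagnostic output differs.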
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Free the component mpool in mca_btl_vader_component_close(), but only
after freeing the objects that depend on it, such as
mca_btl_vader_component.vader_frags_user.
Thanks Christoph Niethammer for reporting this.
Refs. open-mpi/ompi#6524
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
- A set of UCX-related issues was reported that were caused by
conflicts between mmap API hooks. Add diagnostics for such
problems to simplify debugging.
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
I think the strncat() calls here need to be of the form
strncat(str, new_str_to_add, len - strlen(str) - 1);
since in the OMPI calls len is being used as the total number of bytes
in str.
strncat(dest, src, n), on the other hand, is documented as writing up to
n chars from the incoming string plus 1 for the null, for n+1 total
bytes it can write.
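
As an illustration of the arithmetic, here is a small sketch. The helper
bounded_append() is hypothetical (it is not an OMPI function), and it
assumes dest already holds a null-terminated string and that dest_size is
the total size of the buffer in bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: append src to dest without writing past
 * dest_size total bytes, including the terminating null. */
static void bounded_append(char *dest, size_t dest_size, const char *src)
{
    size_t used;

    assert(dest != NULL && src != NULL && dest_size > 0);
    used = strlen(dest);
    if (used + 1 >= dest_size) {
        return; /* buffer is already full; nothing can be appended */
    }
    /* strncat() copies at most n chars and then adds a null, so n must
     * leave room for the existing text and for that null. */
    strncat(dest, src, dest_size - used - 1);
}
```

With an 8-byte buffer that already holds "abc", at most 4 more characters
are appended, so the null terminator still fits in the last byte.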
Signed-off-by: Mark Allen <markalle@us.ibm.com>
Use $(AM_CPPFLAGS) in $(usnic_btl_run_tests_CPPFLAGS) so that we don't
have to replicate hard-coded values.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
OPAL_XLC_INLINE_ASSEMBLY was removed in commit ebce88b7ad.
Removing dead code, which also fixes a compiler warning.
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
do define the OMPI_LIBMPI_NAME macro via the CPPFLAGS.
The issue occurs when Open MPI is configured with
--enable-opal-btl-usnic-unit-tests
Thanks George Marselis for reporting this issue
Refs. open-mpi/ompi#6441
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
When Slurm is built against PMIx, some installations place a copy of the
PMIx library that Slurm is linking against in the Slurm PMI location.
Current configury ignores that location. The desired behavior is to look
for a PMIx lib in that location when --with-pmi is given. If the user
also specifies --with-pmix and gives a different location, then override
anything previously found and look for it where the user directed.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Move the content of common_ucx_int.h back into common_ucx.h and
include common_ucx_wpool.h explicitly.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>