A bunch of empirical testing has shown that increasing the retranmit
timeout from 1ms to 5ms doesn't adversely affect performance, yet
decreases the number of gratuitious retransmissions.
Add endpoints in a blocked manner so that we don't overrun the
fi_av_insert() event queue. Also make the AV EQ length an MCA param,
and report it in mca_btl_base_verbose >=5 output.
Sequence numbers will wrap around; it is not sufficient to check for
(seq-1) -- must use the SEQ_DIFF macro to properly handle the
wraparound.
This bug wasn't serious; it just meant we might retransmit one or two
extra times when retransmits were triggerd and the sequence numbers
wrapped around their sliding windows.
when SMT is enabled, a core must be counted as long as one of its hwthread is allowed
Thanks Ben Menadue for the report.
This fixes a regression from open-mpi/ompi@6d149554a7
This commit adds code to handle large unaligned gets. There are two
possible code paths for these transactions:
1) The remote region and local region have the same alignment. In
this case the get will be broken down into at most three get
transactions: 1 transaction to get the unaligned start of the region
(buffered), 1 transaction to get the aligned portion of the region,
and 1 transaction to get the end of the region.
2) The remote and local regions do not have the same alignment. This
should be an uncommon case and is not optimized. In this case a
buffer is allocated and registered locally to hold the aligned data
from the remote region. There may be cases where this fails (low
memory, can't register memory). Those conditions are unlikely and
will be handled later.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
If the state of the request is not set to OMPI_REQUEST_ACTIVE
then MPI_Test would immediately signal such request completed
while hcoll may still be working on it.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
If atomics are not globally visible (cpu and nic atomics do not mix)
then a btl endpoint must be used to access local ranks. To avoid
issues that are caused by having the same region registered with
multiple handles osc/rdma was updated to always use the handle for
rank 0. There was a bug in the update that caused osc/rdma to continue
using the local endpoint for accessing the state even though the
pointer/handle are not valid for that endpoint. This commit fixes the
bug.
Fixesopen-mpi/ompi#1241.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit makes ompi_datatype_get_pack_description thread safe. The
call is used by osc/pt2pt to send the packed description to remote
peers. Before this commit if MPI_THREAD_MULTIPLE is enabled and the
user uses MPI_Put, MPI_Get, etc we could hit a race where multiple
threads attempt to store the packed description on the datatype. Since
the code in question is not performance-critical the threading fix
uses opal_atomic_* calls instead of bothering with OPAL_THREAD_*.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Optimizing put aggregation in the presence of threads will require a
redesign of the code. For now just ensure that put aggregation is
turned off when MPI_THREAD_MULTIPLE is enabled.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>