order the BTL depending on the real latency for the eager protocol. From now on, the
latency one can specify for the devices is in microseconds, while the bandwidth is in Mbps
(as it was before).
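
As an illustration only, ordering devices by latency could look like the sketch below. The btl_info_t struct, its field names, and order_btls_for_eager() are hypothetical and are not the actual BTL module layout; only the units (microseconds, Mbps) come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor, NOT the real Open MPI BTL module struct. */
typedef struct {
    const char *name;
    uint32_t latency_us;     /* latency in microseconds */
    uint32_t bandwidth_mbps; /* bandwidth in Mbps */
} btl_info_t;

/* qsort comparator: ascending latency, written to avoid integer overflow. */
static int compare_latency(const void *a, const void *b)
{
    const btl_info_t *x = a;
    const btl_info_t *y = b;
    return (x->latency_us > y->latency_us) - (x->latency_us < y->latency_us);
}

/* Put the lowest-latency device first, so it is tried first for eager sends. */
static void order_btls_for_eager(btl_info_t *btls, size_t count)
{
    assert(btls != NULL);
    qsort(btls, count, sizeof(btls[0]), compare_latency);
}
```

After the call, btls[0] holds the device with the smallest latency_us; the bandwidth field is carried along but does not take part in this ordering.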
This commit was SVN r10566.
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and multiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1. If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).
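
The rule described above can be sketched roughly as follows. The params_t struct and clamp_btls_for_leave_pinned() are hypothetical stand-ins for the real MCA parameter handling; only the three parameter names come from this message, and the warning text is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical mirror of the three MCA parameters named above. */
typedef struct {
    int  mpi_leave_pinned;
    int  btl_openib_max_btls;
    bool btl_openib_warn_leave_pinned_multi_port;
} params_t;

/* Returns true if btl_openib_max_btls was forced to 1. */
static bool clamp_btls_for_leave_pinned(params_t *p, int num_hca_ports)
{
    assert(p != NULL);
    if (p->mpi_leave_pinned != 1 || num_hca_ports <= 1) {
        return false; /* nothing to do: no leave_pinned or a single port */
    }
    p->btl_openib_max_btls = 1;
    if (p->btl_openib_warn_leave_pinned_multi_port) {
        /* Illustrative wording only, not the real warning message. */
        fprintf(stderr, "warning: mpi_leave_pinned with %d HCA ports; "
                        "btl_openib_max_btls set to 1\n", num_hca_ports);
    }
    return true;
}
```

Setting btl_openib_warn_leave_pinned_multi_port to false keeps the clamp but silences the message.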
This commit was SVN r10521.
explicitly enabled at run-time with the MCA parameter
io_romio_enable_parallel_optimizations set to something non-zero.
This will enable some magic flags in Panasas if the user didn't
set them (either on or off) and do some slightly better things
with strided collective writes.
This commit was SVN r10516.
standard). This macro allows us to specify the length of the fragment. Now we are
able to know how the message is fragmented between the network devices or inside
the communication protocol.
This commit was SVN r10508.
specified, check that the put function is available for the BTL. Same safety check for
the GET function. At the end, make sure that at least one communication protocol is
specified, otherwise force the send flag.
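
A minimal sketch of this kind of sanity check is shown below. The flag values, the btl_t struct, and sanitize_flags() are hypothetical. Clearing a flag whose function is missing is an assumption, since the message does not say what happens when the check fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, not the real BTL flag constants. */
enum { FLAG_SEND = 0x1, FLAG_PUT = 0x2, FLAG_GET = 0x4 };

typedef int (*rdma_fn_t)(void);

/* Hypothetical, stripped-down BTL descriptor. */
typedef struct {
    uint32_t  flags;
    rdma_fn_t put;
    rdma_fn_t get;
} btl_t;

/* Stub standing in for a real put/get implementation. */
static int stub_rdma(void) { return 0; }

static void sanitize_flags(btl_t *btl)
{
    assert(btl != NULL);
    /* Assumption: drop PUT/GET when the matching function is missing. */
    if ((btl->flags & FLAG_PUT) && btl->put == NULL) {
        btl->flags &= ~(uint32_t)FLAG_PUT;
    }
    if ((btl->flags & FLAG_GET) && btl->get == NULL) {
        btl->flags &= ~(uint32_t)FLAG_GET;
    }
    /* At least one protocol must remain, otherwise force the send flag. */
    if ((btl->flags & (FLAG_SEND | FLAG_PUT | FLAG_GET)) == 0) {
        btl->flags |= FLAG_SEND;
    }
}
```

For example, a descriptor that requests only PUT but has no put function ends up with just the send flag set.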
This commit was SVN r10507.
by the BTL (btl_max_rdma_size). Now the PUT protocol is pipelined even if there
is just one network between the 2 peers. Unfortunately, this problem is present
in the 1.1 (no pipeline for the PUT protocol).
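
To illustrate what pipelining by a maximum RDMA size means, here is a rough sketch that splits one message into fragments no larger than a given limit. pipeline_put() and its callback type are hypothetical and only mimic the idea; this is not the PML's actual scheduling code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that would issue one PUT for [offset, offset+length). */
typedef void (*put_fn_t)(size_t offset, size_t length, void *ctx);

/* Split `total` bytes into fragments of at most `max_rdma_size` bytes.
 * Returns the number of fragments issued. */
static size_t pipeline_put(size_t total, size_t max_rdma_size,
                           put_fn_t put, void *ctx)
{
    assert(max_rdma_size > 0);
    size_t offset = 0;
    size_t count = 0;
    while (offset < total) {
        size_t len = total - offset;
        if (len > max_rdma_size) {
            len = max_rdma_size; /* cap each fragment at the BTL limit */
        }
        if (put != NULL) {
            put(offset, len, ctx);
        }
        offset += len;
        count++;
    }
    return count;
}
```

With a 10-byte message and a 4-byte limit this issues fragments of 4, 4, and 2 bytes, regardless of how many networks connect the two peers.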
This commit was SVN r10499.
time.
UD is connectionless, and as long as peers are statically assigned to QPs,
there is no reason to set up the addressing information lazily.
Lots of code was axed, as endpoints no longer have state. Removed a
number of other elements in the endpoint struct to make it as lightweight
as possible.
I was able to remove an entire function call/branch in the send path,
which I believe is the main contributor to a 2us drop in NetPIPE latency.
Some whitespace cleanups as well.
Passes the IBM test suite, and all but certain Intel tests that were failing
before the change, over the ob1 PML.
This commit was SVN r10494.
Moved a lot of the module-specific init from the component init to the module init.
Tried keeping a pointer to reduce indexing; it didn't seem to help, but leaving it
in place for now.
This commit was SVN r10485.
Playing around with OPAL_LIKELY/UNLIKELY, no real gains yet.
Reworked progress() to process many WCs at a time, as well
as immediately repost groups of receive buffers.
This commit was SVN r10481.
was smaller than the CACHE_LINE_SIZE. Here is the version that works.
In fact this works in 2 steps. First we set the element size to a
multiple of the desired alignment. Then when we allocate memory, we compute
the total size, and we will align each of the elements (we allocate
multiple of them every time) to the CACHE_LINE_SIZE.
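
The two steps can be sketched as follows. The CACHE_LINE_SIZE value, round_up(), and alloc_aligned_elements() are assumptions made for illustration; the real free list code is organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 128 /* assumed value, must be a power of two */

/* Step 1: round the element size up to a multiple of the alignment. */
static size_t round_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Step 2: allocate `count` elements in one block and align the first one.
 * Because the stride is a multiple of CACHE_LINE_SIZE, every following
 * element is aligned too. *raw must be passed to free() by the caller.
 * Overflow of count * stride is not checked in this sketch. */
static unsigned char *alloc_aligned_elements(size_t elem_size, size_t count,
                                             size_t *stride, void **raw)
{
    *stride = round_up(elem_size, CACHE_LINE_SIZE);
    /* Extra bytes leave room to shift the start up to the next boundary. */
    *raw = malloc(count * *stride + CACHE_LINE_SIZE - 1);
    if (*raw == NULL) {
        return NULL;
    }
    uintptr_t addr = (uintptr_t)*raw;
    uintptr_t aligned = (addr + CACHE_LINE_SIZE - 1)
                        & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    return (unsigned char *)*raw + (aligned - addr);
}
```

Element i then lives at first + i * stride, where first is the returned pointer.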
This commit was SVN r10479.
bytes). The simplest way to make sure they are aligned is to update
the size of the basic element to a multiple of the desired alignment.
It will use a little bit more memory, but the improvements on the SM BTL
seem quite interesting.
This commit was SVN r10478.
cannot include the PMPI_WTIME|WTICK functions in the external and
double precision statements because some compilers complain about
this. Instead, we need to use the macro that is defined by
configure.ac (MPIF_H_PMPI_W_FUNCS). This unfortunately means that we
need to generate mpif.h (in addition to mpif-config.h) because the
"external" statement is toxic to F90 compilers.
This commit was SVN r10464.
with the other methodology even if there are no choice buffers and no
special constants. But it keeps the Makefile.am simple and the
methodology consistent.
This commit was SVN r10462.
was that declaring the type of MPI_WTICK and MPI_WTIME in mpif-common.h
would allow the F90 bindings to call through to the back end f77
function and have the right return type. But upon reflection, that's
silly -- we were just declaring the variables MPI_WTICK and MPI_WTIME
that were of type double precision. Duh.
So add some fixed (non-generated) wrapper F90 functions to call the
back-end *C* MPI_WTICK and MPI_WTIME functions (vs. the back end *F77*
functions). We have to call the back-end C functions because there's
a name conflict if we try to call the back-end F77 functions -- for
the same reasons that we can't "implicitly" define MPI_WTIME and
MPI_WTICK in the f90 module, we can't call such an implicitly-defined
function. So we had to add new back-end C functions that are directly
callable from Fortran, the easiest implementation of which was to
provide 4 one-line functions for each (rather than muck around with
weak symbols).
This commit was SVN r10448.
1. ompi/mca/btl/udapl/btl_udapl_proc.c should be including
btl_udapl_endpoint.h for mca_btl_udapl_proc_insert function.
2. In btl_udapl_endpoint.c, it looks like you are using
&endpoint->endpoint_lock when you should use &ep->endpoint_lock in an
OPAL_THREAD_LOCK call.
3. btl_udapl_frag.h has a couple of opal_list_item_t's that should be
ompi_free_list_item_t in the _FRAG_ALLOC_{EAGER,MAX} macros.
This commit was SVN r10442.
call to the _UNPACK macro, in the case where the length of the received data
is zero. This might happen in the PUT protocol.
This commit was SVN r10431.