1. The send path get shorter. The BTL is allowed to return > 0 to specify that the
descriptor was pushed to the networks, and that the memory attached to it is
available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag
can be used by the PML to force the BTL to always trigger the callback.
Unmodified BTL will continue to work as expected, as they will return OMPI_SUCCESS
which force the PML to have exactly the same behavior as before. Some BTLs have
been modified: self, sm, tcp, mx.
2. Add send immediate interface to BTL.
The idea is to have a mechanism of allowing the BTL to take advantage of
send optimizations such as the ability to deliver data "inline". Some
network APIs such as Portals allow data to be sent using a "thin" event
without packing data into a memory descriptor. This interface change
allows the BTL to use such capabilities and allows for other optimizations
in the future. All existing BTLs except for Portals and sm have this interface
set to NULL.
This commit was SVN r18551.
than using a single receive callback followed by a switch on the header.
Also fast pathed the matching for small fragments.
This commit was SVN r18549.
from the posted generic_recv-queue.
- Move the PERUSE_COMM_MSG_MATCH_POSTED_REQ from
MCA_PML_OB1_RECV_REQUEST_MATCHED to
mca_pml_ob1_recv_frag_match() as suggested by Terry Dontje
Only post, if this is not a probe/iprobe request.
- Do not post PERUSE_COMM_REQ_MATCH_UNEX for probes / iprobes and
do in correct order before PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q
This commit was SVN r15947.
receive queues are shared among all PMLs, they are declared in the base PML,
and the selected PML is in charge of initializing and releasing them.
The CM PML is slightly different compared with OB1 or DR. Internally it use
2 different types of requests: light and heavy. However, now with this patch
both types of requests are stored in the same queue, and cast appropriately
on the allocation macro. This means we might use less memory than we allocate,
but in exchange we got full support for most of the parallel debuggers.
Another thing with this patch, is that now for all PML (CM included) the basic
PML requests start with the same fields, and they are declared in the same order
in the request structure. Moreover, the fields have been moved in such a way
that only one volatile/atomic will exist per line of cache (hopefully).
This commit was SVN r15346.
relative bandwidths of each BTL. Precalculate what part of a message should
be send via each BTL in advance instead of doing it during scheduling.
This commit was SVN r15248.
each BTL. Precalculate what part of a message should be send via each BTL in
advance instead of doing it during scheduling.
This commit was SVN r15247.
that allows to send any range of a request by send/recv instaed of RDMA
and use it to send data from the end of a request in pipeline protocol.
This commit was SVN r14841.
* Make sure that the pval always writes to the correct portion of the
lval. This only matters on 32 bit big endian machines.
* On 32 bit machines when assigning to pval, the other 4 bytes of lval
weren't being written, which could lead to bogus data
We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes. Refs trac:587
This commit was SVN r12974.
The following Trac tickets were found above:
Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
udapl/openib/vapi/gm mpools a deprecated. rdma mpool has parameter that allows
to limit its size mpool_rdma_rcache_size_limit (default is 0 - unlimited).
This commit was SVN r12878.
r12714) for supporting compilers / architectures with different
padding rules.
This commit was SVN r12749.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r12491
r12714
set it up before the match when we know the peer, saving some
time on the critical path. If the receive is ANY_SOURCE then
we initialize the convertor on _MATCHED. Anyway, we will set it
up only once per receive.
This commit was SVN r12484.
allocation logic is completely done outside the data-type engine (in the PML) there is
no need for any special case inside the data-type engine. There is less arguments for
the ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is
not required anymore as there is no memory allocated in the engine itself). This change
affect all components using datatypes. I test most of them, but it might happens that I
miss some ... If it's the case please let me know (don't shoot the pianist!!).
This commit was SVN r12331.
long ago) supposed to be used as a cache for accessing the PML procs. But in
all of the PMLs the PML proc contain only one field i.e. a pointer to the ompi_proc.
This pointer can be accessed using the c_remote_group easily. Therefore, there is no
meaning of keeping the PML procs around. Slim fast commit ...
This commit was SVN r11730.
standard). This macro allow us to specify the length of the fragment. Now we are
able to know how the message is fragmented between the network devices or inside
the communication protocol.
This commit was SVN r10508.
message from a known peer (not MPI_ANY_SOURCE) then we can attach the
remote proc and initialize the convertor as soon as we know the data-type,
and the count (so basically in the _INIT macro). If it's not the case, then
create them in the _MATCHED macro (as in the original version). Of course,
beforeinitializing the convertor we check that there will be some data
in the message.
This commit, plus the convertor improvements from few days ago, lower the
latency for my test case environment (mvapi) by 0.1 microseconds. The convertor
now is as slim as it can be, I don't think there is anything else to
remove/improve.
This commit was SVN r9843.
We support all the events in the PERUSE specifications, but right now only one event
of each type can be attached to a communicator. This will be worked out in the future.
The events were places in such a way, that we will be able to measure the overhead
for our threading implementation (the cost of the synchronization objects).
This commit was SVN r9500.
flag, new flags to be included when convertor is initialized
- modified pml/btl module defs and added stub functions for diagnostic
output routines to dump state of queues / endpoints
- updates to data reliability pml
This commit was SVN r9329.
to let the PML (or io, more generally the low level request manager)
to have it's own release function (what was before the req_fini). This
function will only be called from the low level while the req_free will
be called from the upper level (MPI layer) in order to mark the request
as not used by the user anymore.
From the request point of view the requests will be marked as inactive
everytime we read their status (true for persistent as well). As
MPI_REQUEST_NULL is already marked as inactive, the test and wait functions
are simpler. The drawback is that now we have to change in the
ompi_request_{test|wait} the req_status of the request once we get it's
status.
This commit was SVN r9290.