mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1. If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).
This commit was SVN r10521.
time.
UD is connectionless, and as long as peers are statically assigned to QPs,
there is no reason to set up the adressing information lazily.
Lots of code was axed, as endpoints no longer have state. Removed a
number of other elements in the endpoint struct to make it as lightweight
as possible.
I was able to remove an entire function call/branch in the send path,
which I believe is the main contributor to a 2us drop in NetPIPE latency.
Some whitespace cleanups as well.
Passes IBM test suite, and all but certain intel tests that were failing
before the change, over ob1 PML.
This commit was SVN r10494.
Moved a lot of the module-specific init from the component init to the module init.
Try keeping a pointer to reduce indexing, didn't seem to help - leaving in place
for now.
This commit was SVN r10485.
Playing around with OPAL_LIKELY/UNLIKELY, no real gains yet.
Reworked progress() to process many WC's at a time, as well
as immediately repost groups of receive buffers.
This commit was SVN r10481.
1. ompi/mca/btl/udapl/btl_udapl_proc.c should be including
btl_udapl_endpoint.h for mca_btl_udapl_proc_insert function.
2. btl_udapl_endpoint.c it looks like you are using
&endpoint->endpoint_lock when you should use &ep->endpoint_lock in a
OPAL_THREAD_LOCK call.
3. btl_udapl_frag.h has a couple opal_list_item_t's that should be
ompi_free_list_item_t in the _FRAG_ALLOC_{EAGER,MAX} macros.
This commit was SVN r10442.
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1. If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).
This commit was SVN r10424.
send/receive queues if there is something available in the short message
rdma queues, then we have to poll *ALL* the rdma queues before exiting,
or we aren't fair about frag reception and fall into degenerate matching
cases.
This commit was SVN r10410.
- Added some basic flow control to limit number of posted sends.
- Merged endpoint send/recv lock into single endpoint lock.
- Set the LMR triplet length in the send path, not at allocation time.
This has to be done because upper layers might send less than the
amount allocated.
- Alter the tie-breaker if statement protecting the second call
to dat_ep_connect(). The logic was reversed compared to the tie-
breaker for the first dat_ep_connect(), making it possible for
3 or more processes to form a deadlock loop.
- Some asserts were added for debugging purposes.. leaving them
in place for now.
This commit was SVN r10317.
btl_openib_ib_mtu and btl_mvapi_ib_mtu MCA params by showing the valid
values what what they represent (got a question about this from Cisco
testing engineers).
This commit was SVN r10277.
UD is the Unreliable Datagram transport for Infiniband, specifically OpenIB. This BTL is derived from the existing openib BTL, which is RC (Reliable Connection) based.
Still a work in progress, as there is a lot of work left to do. Specifically, performance, scalability, and flow control need to be addressed.
Currently I'm playing around with different methods for handling receive buffers, as well as profiling to figure out where the time is going.
This commit was SVN r10271.
support for progress threads, so we shouldn't build them or try to use
them when support for progress threads has been requested. The TCP, GM,
SELF, and SM BTLs should have progress thread support, so they aren't
disabled. The Portals BTL isn't compiled on platforms with threads,
so it doens't need to be updated.
This commit was SVN r10156.
Do this rather than the my_list pointer because we need to do some
things that are somewhat special because we pre-pin eager fragments but
not send fragments. Also makes a couple ideas I have slightly easier to
play around with.
This commit was SVN r10127.
Instead of figuring out which free list the fragment belongs to based on size
we simply store a pointer to the list which it belongs in the fragment.
This was reviewed by Brian and should hit all the branches.
This commit was SVN r10072.
Trying to remember what I did here.. eager/max messages should work now, no RDMA yet. A number of other fixes and cleanups.
I do know of two problems:
Bad stuff happens when flooded with send frags too quickly - the BTL doesn't handle flow control.
Certain IBM tests turn up a length assertion in the datatype engine - needs more investigation.
This commit was SVN r10070.
derefence through it. It is legal for endpoint_addr to be NULL in the
destructor because if btl_tcp_add_procs() -> btl_tcp_proc_insert()
returns UNREACH, then endpoint_addr will be NULL and we'll OBJ_RELEASE
it.
This commit was SVN r9940.
apparently don't work properly: r9869, r9868 (sm btl alignment issues)
This commit was SVN r9936.
The following SVN revision numbers were found above:
r9868 --> open-mpi/ompi@9b985c3216
r9869 --> open-mpi/ompi@adedf511fb
easier to do event accounting that way
* greatly increase receive event and buffer sizes. We're still about half
of what Cray defaults to, so I don't feel bad about the increases
* Implement a pre-pinning optimization for eager fragments - will be
pinned on first use and left pinned for the life of the fragment
* Since we can't have two receive frag callbacks fired at the same time,
don't have receive free list - just keep one receive fragment in the
module. Saves a big free list and all that interaction.
This commit was SVN r9915.
right now. Some testing on large NUMA machines should be done in order
to make sure that we need to export this variable out to the MCA layer.
This commit was SVN r9868.