Trying to remember what I did here.. eager/max messages should work now, no RDMA yet. A number of other fixes and cleanups.
I do know of two problems:
Bad stuff happens when flooded with send frags too quickly - the BTL doesn't handle flow control.
Certain IBM tests turn up a length assertion in the datatype engine - needs more investigation.
This commit was SVN r10070.
- Some initial work on prepare_src
- Move some fragment initialization around
- Fix a union casting issue on picky compilers, identified by Don Kerr
- Other small cleanups/bugfixes
This commit was SVN r9662.
we send our local addr_t OOB. Remote side then matches endpoints and calls
dat_ep_connect(). Everything should be the same as before from here, except
that client/server roles are reversed.
- Properly set our buffer size when posting receives. When the frag used to
transfer address information is recycled by the free list, the wrong buffer
size was being used, which caused buffer overflow errors.
- Finally put the uDAPL error handling stuff in the mpool component.
- Remove a few more OPAL_OUTPUTs.
This commit was SVN r9569.
Not much got tested that wasn't already - I've uncovered a connection
establishment deadlock and wanted to get these changes committed before I
attack it.
The big changes:
- Moved much of the connection code from btl_udapl_component.c to
btl_udapl_endpoint.c.
- Cleaned up initialization of various fragment members.
- MCA_BTL_UDAPL_ERROR macro, which is compiled in/out appropriately.
This commit was SVN r9496.
- Grab the mpool_registration in _frag_common_constructor()
- Save the LMR context in the segment key
- No need for cookie variables - can just cast the frag
- No need to memcpy() data when recv'ing
- Add an LMR triplet to the fragment structure and initialize it
in btl_udapl_alloc().
- Whitespace/typo fixes, remove some opal_output() calls
Looks like I can use triplets describing sub-regions of registered LMR's. So I
do this - prior to this patch I was sending the entire free list memory over,
which isn't correct :)
Back to an earlier problem - when sending address information right after
connection establishment, the receiving end receives a DTO completion event and
appears to have good data. But the sending end never receives a DTO completion
event indicating the send completed, and never completes the client side of the
connection.
This commit was SVN r9386.
In short, I'm very close to having connection establishment and eager send/recv working.
Part of the connection process involves sending address information from the
client to server. For some reason, I am never receiving an event indicating
completetion of the send on the client side. Otherwise, connection
establishment is working and eager send/recv should be trivial from here.
Some more detailed changes:
- Send partially implemented, just handles starting up new connections.
- Several support functions implemented for establishing connection. Client
side code went in btl_udapl_endpoint.c, server side in btl_udapl_component.c
- Frags list and send/recv locks added to the endpoint structure.
- BTL sets up a public service point, which listens for new connections.
Steps over ports that are already bound, iterating through a range of ports.
- Remove any traces of recv frags, don't think I need them after all.
- Pieces of component_progress() implemented for connection establishment.
- Frags have two new types for connection establishment - CONN_SEND and
CONN_RECV.
- Many other minor cleanups not affecting functionality
This commit was SVN r9345.
- Copied the template BTL and renamed everything
- Compiles and shows up correctly in ompi_info, not tested past that
- Should be ignored for everyone but me
This commit was SVN r8544.