- the registration array is now global instead of one per BTL.
- each framework has to declare which entries of the registration array it reserves. It then
has to define how (or whether) these entries are shared among all of its
components. For example, the PML will not share them, as there is only one active PML
at any moment, while the BTLs will have to. The tag is 8 bits long: the first 3
are reserved for the framework, while the remaining 5 are used internally by each
framework.
- The registration function is optional. If a BTL does not provide such a function,
nothing happens. However, if such a function is provided in the BTL
structure, the BML will call it when a tag is registered.
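A minimal C sketch of the tag split and the optional hook, assuming the 3 framework bits are the high-order bits of the 8-bit tag (the log only says "the first 3"). Every sketch_* name, the callback signature, and the 256-slot array are hypothetical illustrations, not the actual BML/BTL interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the 3 framework bits are the high-order bits of the tag. */
#define SKETCH_TAG_LOCAL_BITS 5
#define SKETCH_TAG_LOCAL_MASK ((1u << SKETCH_TAG_LOCAL_BITS) - 1u) /* 0x1F */
#define SKETCH_TAG_FW_MASK    0x7u                                 /* 3 bits */

/* Build a tag from a framework id (0-7) and a framework-internal id (0-31). */
static uint8_t sketch_make_tag(unsigned fw, unsigned local)
{
    return (uint8_t)(((fw & SKETCH_TAG_FW_MASK) << SKETCH_TAG_LOCAL_BITS) |
                     (local & SKETCH_TAG_LOCAL_MASK));
}

static unsigned sketch_tag_framework(uint8_t tag) { return tag >> SKETCH_TAG_LOCAL_BITS; }
static unsigned sketch_tag_local(uint8_t tag)     { return tag & SKETCH_TAG_LOCAL_MASK; }

/* Hypothetical optional per-BTL hook; NULL means "not provided". */
typedef int (*sketch_btl_register_fn_t)(uint8_t tag, void *cbdata);

typedef struct {
    sketch_btl_register_fn_t btl_register;
} sketch_btl_t;

/* One global registration array, indexed by the full 8-bit tag. */
static void *sketch_registration[256];

/* BML-side registration: record the callback data, then call the hook of
 * every BTL that supplies one. BTLs without a hook are simply skipped. */
static int sketch_bml_register(sketch_btl_t *btls, size_t nbtls,
                               uint8_t tag, void *cbdata)
{
    sketch_registration[tag] = cbdata;
    for (size_t i = 0; i < nbtls; ++i) {
        if (NULL == btls[i].btl_register) {
            continue; /* no hook provided: nothing happens */
        }
        int rc = btls[i].btl_register(tag, cbdata);
        if (0 != rc) {
            return rc;
        }
    }
    return 0;
}
```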
Now it's time for the second step: converting OB1 from a switch-based PML to an
active-message one.
This commit was SVN r17140.
(sometime after the merge with the ORTE branch), the opal_pointer_array
will become the only pointer_array implementation (the orte_pointer_array
will be removed).
This commit was SVN r17007.
than just the PML/BTLs these days. Also clean up the code so that it
handles the situation where not all nodes register information for a given
node (rather than just spinning until that node sends information, like
we do today).
Includes r15234 and r15265 from the /tmp/bwb-modex branch.
This commit was SVN r15310.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15234
r15265
This is required to tighten up the BTL semantics. Ordering is not guaranteed,
but if the BTL returns an order tag in a descriptor (other than
MCA_BTL_NO_ORDER), then we may request another descriptor that will obey
ordering w.r.t. the first descriptor.
This will allow sane behavior for RDMA networks, where local completion of an
RDMA operation on the active side does not imply remote completion on the
passive side. If we send a FIN message after local completion and the FIN is
not ordered w.r.t. the RDMA operation then badness may occur as the passive
side may now try to deregister the memory and the RDMA operation may still be
pending on the passive side.
Note that this has no impact on networks that don't suffer from this
limitation as the ORDER tag can simply always be specified as
MCA_BTL_NO_ORDER.
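To make the ordering rule concrete, here is a C sketch under stated assumptions: MCA_BTL_NO_ORDER is given the value 255 here (check the BTL header for the real value), and sketch_des_t and the helper names are hypothetical. The FIN request simply inherits the RDMA descriptor's order tag, so a BTL that always reports MCA_BTL_NO_ORDER imposes no constraint.

```c
#include <stdint.h>

/* Assumed value of the "no ordering" sentinel; verify against the BTL header. */
#define MCA_BTL_NO_ORDER 255

/* Hypothetical, stripped-down descriptor carrying only the order tag. */
typedef struct {
    uint8_t order; /* ordering channel chosen by the BTL, or MCA_BTL_NO_ORDER */
} sketch_des_t;

/* Build the FIN descriptor request for a locally completed RDMA operation.
 * Reusing the RDMA descriptor's order tag asks the BTL to keep the FIN
 * behind the RDMA; MCA_BTL_NO_ORDER propagates as "no constraint". */
static sketch_des_t sketch_fin_for(const sketch_des_t *rdma)
{
    sketch_des_t fin;
    fin.order = rdma->order;
    return fin;
}

/* Nonzero if `later` is guaranteed to be delivered after `earlier`. */
static int sketch_ordered_after(const sketch_des_t *earlier,
                                const sketch_des_t *later)
{
    return earlier->order != MCA_BTL_NO_ORDER && earlier->order == later->order;
}
```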
This commit was SVN r14768.
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
* Make sure that the pval always writes to the correct portion of the
lval. This only matters on 32-bit big-endian machines.
* On 32-bit machines, when assigning to pval, the other 4 bytes of lval
weren't being written, which could lead to bogus data.
We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes. Refs trac:587
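Roughly what such macros can look like, as a hedged sketch: the union and macro names are hypothetical, and the real Open MPI macros may differ. On a 32-bit big-endian host the 4-byte pval overlays the high-order half of lval, so writing pval and reading lval (or the reverse) picks up the wrong bytes; routing every access through the 64-bit lval with a uintptr_t cast writes all 8 bytes and puts the pointer in the low-order half on any endianness.

```c
#include <stdint.h>

/* Hypothetical stand-in for a union that carries a pointer in 64 bits. */
typedef union {
    uint64_t lval;
    void    *pval;
} sketch_ptr_t;

/* Store a pointer through the 64-bit member: all 8 bytes are written, and
 * the pointer value lands in the numerically low half on any endianness. */
#define SKETCH_PTR_SET(u, p) ((u).lval = (uint64_t)(uintptr_t)(p))

/* Read it back through the same member, so 32-bit big-endian hosts do not
 * pick up the high-order 4 bytes by mistake. */
#define SKETCH_PTR_GET(u) ((void *)(uintptr_t)(u).lval)
```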
This commit was SVN r12974.
The following Trac tickets were found above:
Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
udapl/openib/vapi/gm mpools are deprecated. The rdma mpool has a parameter,
mpool_rdma_rcache_size_limit, that limits its size (the default is 0, meaning unlimited).
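A sketch of the limit check only, assuming the semantics stated above (0 means unlimited). What the rcache does when a registration does not fit, such as evicting unused entries or failing, is not described here and is left out; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decide whether a new registration of `len` bytes fits in the cache.
 * `limit` follows the described mpool_rdma_rcache_size_limit semantics:
 * 0 means unlimited. Written to avoid overflow in `in_use + len`. */
static bool sketch_rcache_fits(size_t in_use, size_t len, size_t limit)
{
    if (0 == limit) {
        return true;
    }
    return len <= limit && in_use <= limit - len;
}
```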
This commit was SVN r12878.
In order to provide backwards compatibility, the framework versions are bumped
and the handler registration function is at the end of the btl struct.
Testing was done on sm, openib, and gm.
This commit was SVN r11256.
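The reason for appending rather than inserting can be checked mechanically: adding a member at the end leaves the offsets of all existing members unchanged, so code built against the old layout still finds its function pointers, while the version bump tells the framework whether the trailing member exists. Both structs below are hypothetical miniatures, not the real BTL module struct.

```c
#include <stddef.h>

/* Hypothetical module layout before the change. */
typedef struct {
    int   (*btl_send)(void *module, void *des);
    void *(*btl_alloc)(void *module, size_t size);
} sketch_btl_old_t;

/* Hypothetical layout after the change: the new hook is appended last. */
typedef struct {
    int   (*btl_send)(void *module, void *des);
    void *(*btl_alloc)(void *module, size_t size);
    int   (*btl_register)(void *module, unsigned char tag, void *cbdata);
} sketch_btl_new_t;
```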
I've introduced a race condition - seeing occasional LOCAL_LENGTH errors on the receive side. I think I'm mixing up eager/max somehow - will look at it more on Monday.
This commit was SVN r10690.
- Added some basic flow control to limit the number of posted sends.
- Merged endpoint send/recv lock into single endpoint lock.
- Set the LMR triplet length in the send path, not at allocation time.
This has to be done because upper layers might send less than the
amount allocated.
- Alter the tie-breaker if statement protecting the second call
to dat_ep_connect(). The logic was reversed compared to the tie-breaker
for the first dat_ep_connect(), making it possible for
3 or more processes to form a deadlock loop.
- Some asserts were added for debugging purposes; leaving them
in place for now.
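A sketch of the tie-breaker fix, assuming a hypothetical fixed-size address type: every code path that may call dat_ep_connect() uses the same predicate. With one rule based on a total order of addresses, exactly one side of each pair initiates, and no cycle of processes can each wait for its neighbor to connect first.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical fixed-size endpoint address; real uDAPL addresses differ. */
typedef struct {
    unsigned char bytes[16];
} sketch_addr_t;

/* Single tie-breaker shared by every path that may call dat_ep_connect():
 * the side with the smaller address initiates the connection. */
static bool sketch_i_should_connect(const sketch_addr_t *local,
                                    const sketch_addr_t *remote)
{
    return memcmp(local->bytes, remote->bytes, sizeof local->bytes) < 0;
}
```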
This commit was SVN r10317.
Trying to remember what I did here... eager/max messages should work now, but no RDMA yet. A number of other fixes and cleanups.
I do know of two problems:
- Bad stuff happens when flooded with send frags too quickly - the BTL doesn't handle flow control.
- Certain IBM tests turn up a length assertion in the datatype engine - needs more investigation.
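The flooding problem is the kind of thing a cap on outstanding sends addresses. A single-threaded C sketch of such a cap follows; the counters and names are hypothetical, max_posted is an assumed tunable, and the endpoint locking a real BTL needs is omitted.

```c
#include <stdbool.h>

/* Hypothetical endpoint counters for a cap on posted sends. */
typedef struct {
    int posted;     /* sends currently posted to the provider */
    int max_posted; /* cap on outstanding sends (assumed tunable) */
    int queued;     /* sends held back until a completion frees a slot */
} sketch_endpoint_t;

/* Post immediately if under the cap, otherwise queue.
 * Returns true if the send was posted right away. */
static bool sketch_send(sketch_endpoint_t *ep)
{
    if (ep->posted < ep->max_posted) {
        ep->posted++;
        return true;
    }
    ep->queued++;
    return false;
}

/* On a send completion, release a slot and post one queued send if any.
 * Returns true if a queued send was posted. */
static bool sketch_send_complete(sketch_endpoint_t *ep)
{
    ep->posted--;
    if (ep->queued > 0) {
        ep->queued--;
        ep->posted++;
        return true;
    }
    return false;
}
```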
This commit was SVN r10070.
- Some initial work on prepare_src
- Move some fragment initialization around
- Fix a union casting issue on picky compilers, identified by Don Kerr
- Other small cleanups/bugfixes
This commit was SVN r9662.
we send our local addr_t OOB. Remote side then matches endpoints and calls
dat_ep_connect(). Everything should be the same as before from here, except
that client/server roles are reversed.
- Properly set our buffer size when posting receives. When the frag used to
transfer address information is recycled by the free list, the wrong buffer
size was being used, which caused buffer overflow errors.
- Finally put the uDAPL error handling stuff in the mpool component.
- Remove a few more OPAL_OUTPUTs.
This commit was SVN r9569.
Not much got tested that wasn't already - I've uncovered a connection
establishment deadlock and wanted to get these changes committed before I
attack it.
The big changes:
- Moved much of the connection code from btl_udapl_component.c to
btl_udapl_endpoint.c.
- Cleaned up initialization of various fragment members.
- Added the MCA_BTL_UDAPL_ERROR macro, which is compiled in or out as appropriate.
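A common way to write a macro that compiles in or out is the do { ... } while (0) idiom selected by a preprocessor switch. In this sketch SKETCH_UDAPL_DEBUG is an assumed build flag, and the counter exists only so the behavior can be checked; neither comes from the uDAPL BTL.

```c
#include <stdio.h>

/* Assumed build switch; the real condition is whatever configure defines. */
#ifndef SKETCH_UDAPL_DEBUG
#define SKETCH_UDAPL_DEBUG 1
#endif

static int sketch_udapl_errors_reported = 0;

#if SKETCH_UDAPL_DEBUG
/* Debug builds: report file, line, message, and return code. */
#define SKETCH_UDAPL_ERROR(rc, msg)                                 \
    do {                                                            \
        fprintf(stderr, "%s:%d: %s (rc=%d)\n",                      \
                __FILE__, __LINE__, (msg), (int)(rc));              \
        sketch_udapl_errors_reported++;                             \
    } while (0)
#else
/* Release builds: the macro reduces to a no-op statement. */
#define SKETCH_UDAPL_ERROR(rc, msg) do { (void)(rc); (void)(msg); } while (0)
#endif
```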
This commit was SVN r9496.
- Grab the mpool_registration in _frag_common_constructor()
- Save the LMR context in the segment key
- No need for cookie variables - can just cast the frag
- No need to memcpy() data when recv'ing
- Add an LMR triplet to the fragment structure and initialize it
in btl_udapl_alloc().
- Whitespace/typo fixes, remove some opal_output() calls
Looks like I can use triplets describing sub-regions of registered LMRs, so I
now do this; prior to this patch I was sending the entire free list memory over,
which isn't correct :)
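A sketch of describing a sub-region of a registered LMR, assuming a simplified stand-in for the DAT LMR triplet (context, virtual address, segment length); the struct and function names are hypothetical. The bounds check is written to avoid integer overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for an LMR triplet. */
typedef struct {
    uint32_t lmr_context;
    uint64_t virtual_address;
    uint64_t segment_length;
} sketch_lmr_triplet_t;

/* Hypothetical record of one registered region (e.g. a free list chunk). */
typedef struct {
    uint32_t lmr_context;
    uint64_t base;
    uint64_t length;
} sketch_lmr_t;

/* Describe only [addr, addr + len) instead of the whole registration.
 * Returns false if the sub-region is not inside the registered region. */
static bool sketch_triplet_for(const sketch_lmr_t *lmr, uint64_t addr,
                               uint64_t len, sketch_lmr_triplet_t *out)
{
    if (addr < lmr->base || len > lmr->length ||
        addr - lmr->base > lmr->length - len) {
        return false;
    }
    out->lmr_context = lmr->lmr_context;
    out->virtual_address = addr;
    out->segment_length = len;
    return true;
}
```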
Back to an earlier problem - when sending address information right after
connection establishment, the receiving end receives a DTO completion event and
appears to have good data. But the sending end never receives a DTO completion
event indicating the send completed, and never completes the client side of the
connection.
This commit was SVN r9386.
In short, I'm very close to having connection establishment and eager send/recv working.
Part of the connection process involves sending address information from the
client to server. For some reason, I am never receiving an event indicating
completion of the send on the client side. Otherwise, connection
establishment is working and eager send/recv should be trivial from here.
Some more detailed changes:
- Send partially implemented, just handles starting up new connections.
- Several support functions implemented for establishing connection. Client
side code went in btl_udapl_endpoint.c, server side in btl_udapl_component.c
- Frags list and send/recv locks added to the endpoint structure.
- BTL sets up a public service point, which listens for new connections.
Steps over ports that are already bound, iterating through a range of ports.
- Remove any traces of recv frags; I don't think I need them after all.
- Pieces of component_progress() implemented for connection establishment.
- Frags have two new types for connection establishment - CONN_SEND and
CONN_RECV.
- Many other minor cleanups not affecting functionality
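Stepping over already-bound ports can be sketched as a loop over a port range. The create hook stands in for a provider call such as dat_psp_create(), and the result codes are hypothetical placeholders for the provider's "in use" and failure returns; a busy port moves the search on, while any other failure stops it.

```c
#include <stdint.h>

/* Hypothetical result codes standing in for the provider's return values. */
typedef enum { SKETCH_PSP_OK, SKETCH_PSP_IN_USE, SKETCH_PSP_ERROR } sketch_psp_rc_t;

/* Hypothetical hook that tries to create a public service point on a port. */
typedef sketch_psp_rc_t (*sketch_psp_create_fn)(uint32_t port, void *ctx);

/* Try each port in [first, last]; skip ports already bound, stop on any
 * other error. Returns the bound port, or -1 if nothing could be bound. */
static int64_t sketch_bind_service_point(uint32_t first, uint32_t last,
                                         sketch_psp_create_fn create, void *ctx)
{
    for (uint64_t port = first; port <= last; ++port) {
        sketch_psp_rc_t rc = create((uint32_t)port, ctx);
        if (SKETCH_PSP_OK == rc) {
            return (int64_t)port;
        }
        if (SKETCH_PSP_IN_USE != rc) {
            return -1; /* a real failure, not just a busy port */
        }
    }
    return -1;
}

/* Test double: every port below *(uint32_t *)ctx is treated as busy. */
static sketch_psp_rc_t sketch_fake_create(uint32_t port, void *ctx)
{
    return port < *(const uint32_t *)ctx ? SKETCH_PSP_IN_USE : SKETCH_PSP_OK;
}
```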
This commit was SVN r9345.
- move files out of toplevel include/ and etc/, moving them into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h,
mpi.h, and mpif.h)
This commit was SVN r8985.
- Start filling in the progress function, focusing on connection establishment.
- Initialize udapl mpool and free lists
- Create/destroy a protection zone with each IA
- Misc organization as I learn how things work
This commit was SVN r8969.
- Remove printing of CFLAGS in configure.m4
- Set MCA_BTL_FLAGS_SEND flag
- Improved error handling during module initialization
- Extract the address of each interface with dat_ia_query
- Start playing around with fragment stuff - probably wrong
- Misc code cleanup (removal of GM-specific code)
This commit was SVN r8801.
- Move mca_btl_udapl_error/mca_btl_module_init to mca_btl_udapl.c and rename it
- Whitespace cleanups
- Free the uDAPL evd and ia handles in mca_btl_udapl_finalize
This commit was SVN r8705.
- Borrow configure.m4 from the mvapi btl. One of the uDAPL headers emits a
warning when -pedantic is enabled, so strip it out.
- Change function check in ompi_check_dapl.m4 from dat_ia_open to
dat_registry_list_providers, since dat_ia_open wasn't working right
- Make the references to prepare_dst, put, and get NULL for now
- Add opal_output() calls in all the udapl interface functions for debugging
- Add evd_qlen component parameter to control event dispatcher queue length
- First stab at component_init and module_init
- Misc cleanups - whitespace, dead code removal
- Update copyrights to 2006
This commit was SVN r8701.