
32 Commits

Author SHA1 Message Date
Donald Kerr
3f428af7b8 couple of minor changes to fix #973 and separated eager rdma fragments into structure-only and data-only areas
This commit was SVN r14470.
2007-04-23 17:41:34 +00:00
George Bosilca
1cb26e3b9c Finally the convertor exports a convenience function to allow a consistent
computation of the current location in the pack/unpack process. This can
be used both for retrieving the pointer to the first byte (in the special
case of the cached RDMA protocol) and for getting the current
position (for the pipelined protocol).

I modified all BTLs, but most of them are still untested.

This commit was SVN r14180.
2007-03-30 22:02:45 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Donald Kerr
ed097d17c1 fix for bug #749, though I can not confirm without a linux compiler
This commit was SVN r13090.
2007-01-11 22:25:13 +00:00
Donald Kerr
80f2cbb498 add udapl rdma capabilities into the udapl btl
This commit was SVN r13082.
2007-01-11 15:22:08 +00:00
Brian Barrett
48ec0b2071 Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix
for now...

This commit was SVN r12997.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c
2007-01-04 22:07:37 +00:00
Brian Barrett
27cea44a9c Fix a number of issues with the ompi_ptr_t:
  * Make sure that the pval always writes to the correct portion of the
    lval.  This only matters on 32 bit big endian machines.
  * On 32 bit machines when assigning to pval, the other 4 bytes of lval
    weren't being written, which could lead to bogus data

We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes.  Refs trac:587

This commit was SVN r12974.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 19:47:48 +00:00
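The ompi_ptr_t problem described in commit 27cea44a9c can be illustrated with a minimal sketch. It assumes a union that overlays a pointer on a 64-bit integer; the names `sketch_ptr_t`, `sketch_ptr_set`, and `sketch_ptr_get` are hypothetical and are not the Open MPI definitions. The commit uses macros to direct the pval assignment to the correct 4 bytes; the sketch takes a simpler route and always goes through the 64-bit member, so every byte is written regardless of pointer width or byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a pointer overlaid on a 64-bit value. */
typedef union {
    uint64_t lval;  /* 64-bit view, e.g. for sending over the wire */
    void *pval;     /* native pointer view (4 bytes on 32-bit machines) */
} sketch_ptr_t;

/* Store a pointer through the 64-bit member. The value is zero-extended,
 * so on a 32-bit machine the other 4 bytes are set to zero instead of
 * keeping whatever was there before. */
void sketch_ptr_set(sketch_ptr_t *p, void *ptr)
{
    assert(p != NULL);
    p->lval = (uint64_t)(uintptr_t)ptr;
}

/* Read the pointer back through the same 64-bit member, which avoids
 * depending on which half of lval the pval member overlaps. */
void *sketch_ptr_get(const sketch_ptr_t *p)
{
    assert(p != NULL);
    return (void *)(uintptr_t)p->lval;
}
```

Writing directly to `pval` on a 32-bit machine would touch only 4 of the 8 bytes, and on a big endian machine those would be the high-order bytes of `lval`; routing both directions through `lval` sidesteps both issues in this sketch.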
Donald Kerr
899297c8f4 udapl btl was not compiling after r12878 on 12/17/2006, some minor changes to allow btl to compile
This commit was SVN r12963.

The following SVN revision numbers were found above:
  r12878 --> open-mpi/ompi@190e7a27cd
2007-01-02 21:44:12 +00:00
Gleb Natapov
190e7a27cd Merge with gleb-mpool branch. All RDMA components now use the same mpool (rdma).
The udapl/openib/vapi/gm mpools are deprecated. The rdma mpool has a parameter,
mpool_rdma_rcache_size_limit, that limits its size (default is 0, meaning unlimited).

This commit was SVN r12878.
2006-12-17 12:26:41 +00:00
George Bosilca
126a68dc9a Big datatype commit. Remove all unused features of the datatype engine. As the memory
allocation logic is done completely outside the data-type engine (in the PML), there is
no need for any special case inside the data-type engine. There are fewer arguments for
ompi_convertor_pack and ompi_convertor_unpack as well (the last field, free_after, is
no longer required as no memory is allocated in the engine itself). This change
affects all components using datatypes. I tested most of them, but I might have
missed some ... If that's the case, please let me know (don't shoot the pianist!!).

This commit was SVN r12331.
2006-10-26 23:11:26 +00:00
George Bosilca
640178c4b3 Grepping through the source files I found these calls to the data-type engine
with the wrong type of arguments.

This commit was SVN r12148.
2006-10-17 21:05:04 +00:00
Terry Dontje
d636db5832 Fixed bug trac #213 by moving the udapl btl header to being a footer.
Also fixed bug trac #346.

This commit was SVN r11760.
2006-09-22 19:28:09 +00:00
Dan Lacher
f2526d60ed Minor fix for a dropped comma.
This commit was SVN r11259.
2006-08-18 17:55:57 +00:00
Galen Shipman
e5c594c211 More updates for the async error handler for BTLs
In order to provide backwards compatibility, the framework versions are bumped
and the handler registration function is at the end of the btl struct.
Testing done on sm, openib, and gm.

This commit was SVN r11256.
2006-08-17 22:02:01 +00:00
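Commit e5c594c211 places the new registration function at the end of the btl struct. A minimal sketch of why appending helps: members that existed before keep their offsets, so code that only knows the older layout still finds them in the same place. The struct and type names below are hypothetical and much smaller than the real BTL module struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function pointer types, for illustration only. */
typedef int (*sketch_send_fn_t)(const void *buf, size_t len);
typedef int (*sketch_register_error_fn_t)(void (*callback)(int error));

/* Older layout: what previously built components expect. */
struct sketch_btl_v1 {
    size_t eager_limit;
    sketch_send_fn_t send;
};

/* Newer layout: identical prefix, new member appended at the end. */
struct sketch_btl_v2 {
    size_t eager_limit;
    sketch_send_fn_t send;
    sketch_register_error_fn_t register_error;  /* new in v2 */
};

/* Returns 1 if every v1 member sits at the same offset in v2 and the
 * new member starts after the end of the old members. */
int sketch_layout_is_compatible(void)
{
    return offsetof(struct sketch_btl_v1, eager_limit) ==
               offsetof(struct sketch_btl_v2, eager_limit)
        && offsetof(struct sketch_btl_v1, send) ==
               offsetof(struct sketch_btl_v2, send)
        && offsetof(struct sketch_btl_v2, register_error) >=
               offsetof(struct sketch_btl_v1, send) + sizeof(sketch_send_fn_t);
}
```

Inserting the new member in the middle would shift `send` and break that check, which is presumably why the commit also bumps the framework versions rather than relying on layout alone.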
Galen Shipman
3b49953ce2 Add error callback to the btl interface. This allows errors to be delivered to
the upper layer asynchronously, although there are some issues with this, such
as there being multiple consumers of the BTLs.. who gets the

This commit was SVN r11232.
2006-08-16 20:21:38 +00:00
Donald Kerr
2e5e01a8df Remove dependency on known port range and allow udapl to provide the port number.
This commit was SVN r11040.
2006-07-28 13:58:21 +00:00
Andrew Friedley
c68c6ac122 A number of fixes and the usual cleanup..
- Added some basic flow control to limit number of posted sends.
- Merged endpoint send/recv lock into single endpoint lock.
- Set the LMR triplet length in the send path, not at allocation time.
  This has to be done because upper layers might send less than the
  amount allocated.
- Alter the tie-breaker if statement protecting the second call
  to dat_ep_connect().  The logic was reversed compared to the tie-
  breaker for the first dat_ep_connect(), making it possible for
  3 or more processes to form a deadlock loop.
- Some asserts were added for debugging purposes.. leaving them
  in place for now.

This commit was SVN r10317.
2006-06-12 22:42:01 +00:00
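The tie-breaker fix in commit c68c6ac122 comes down to using the same ordering rule at both connect call sites. The following is a sketch under assumptions: the address type `sketch_addr_t` and both function names are hypothetical, and which side of the comparison initiates in the real BTL is not stated in the commit. The point is only that one shared comparison gives every pair of peers exactly one initiator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical peer address; the real BTL uses uDAPL address types. */
typedef struct {
    unsigned char bytes[16];
    unsigned short port;
} sketch_addr_t;

/* Total order over addresses: raw bytes first, then port. */
int sketch_addr_cmp(const sketch_addr_t *a, const sketch_addr_t *b)
{
    int c = memcmp(a->bytes, b->bytes, sizeof a->bytes);
    if (c != 0) {
        return c;
    }
    return (a->port > b->port) - (a->port < b->port);
}

/* Single tie-breaker meant to guard every connect call. Returns nonzero
 * if the local side should initiate. Because both call sites would use
 * this one function, the direction cannot be reversed between them. */
int sketch_local_initiates(const sketch_addr_t *local,
                           const sketch_addr_t *remote)
{
    assert(local != NULL && remote != NULL);
    return sketch_addr_cmp(local, remote) > 0;
}
```

With a reversed test at one of the two sites, process A could wait on B, B on C, and C on A; a single total order rules out such a cycle because the peer with the lowest address in any group never waits to initiate in this sketch.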
Andrew Friedley
8a3d0862ca I can commit! *happy dance*
Trying to remember what I did here.. eager/max messages should work now, no RDMA yet.  A number of other fixes and cleanups.

I do know of two problems:
 Bad stuff happens when flooded with send frags too quickly - the BTL doesn't handle flow control.
 Certain IBM tests turn up a length assertion in the datatype engine - needs more investigation.

This commit was SVN r10070.
2006-05-25 15:47:59 +00:00
Andrew Friedley
345551cb36 Checkpoint before starting work on max-sized frags (maybe user too?).
- Some initial work on prepare_src
- Move some fragment initialization around
- Fix a union casting issue on picky compilers, identified by Don Kerr
- Other small cleanups/bugfixes

This commit was SVN r9662.
2006-04-19 22:20:22 +00:00
Andrew Friedley
d461b55696 - Implement OOB connection handshaking via the ORTE RML. To start a connect,
  we send our local addr_t OOB.  Remote side then matches endpoints and calls
  dat_ep_connect().  Everything should be the same as before from here, except
  that client/server roles are reversed.
- Properly set our buffer size when posting receives.  When the frag used to
  transfer address information is recycled by the free list, the wrong buffer
  size was being used, which caused buffer overflow errors.
- Finally put the uDAPL error handling stuff in the mpool component.
- Remove a few more OPAL_OUTPUTs.

This commit was SVN r9569.
2006-04-07 15:26:05 +00:00
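The buffer-size bug fixed in commit d461b55696 can be sketched as follows, assuming a fragment that records both its full capacity and the length of its current segment. The struct and function names are hypothetical. A fragment that was last used to send a short address message keeps that short length when it returns to the free list, so the receive path has to reset the length before posting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment: a fixed buffer plus the length currently in use. */
typedef struct {
    unsigned char *base;
    size_t capacity;  /* size of the underlying buffer */
    size_t seg_len;   /* length advertised for the next send or receive */
} sketch_frag_t;

/* Using a frag to send address information shrinks the segment length. */
void sketch_prepare_addr_send(sketch_frag_t *frag, size_t addr_len)
{
    assert(frag != NULL && addr_len <= frag->capacity);
    frag->seg_len = addr_len;
}

/* Posting a receive must not trust the leftover seg_len from an earlier
 * use; it resets the length to the full buffer size and returns it. */
size_t sketch_post_recv(sketch_frag_t *frag)
{
    assert(frag != NULL);
    frag->seg_len = frag->capacity;
    return frag->seg_len;
}
```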
Andrew Friedley
74b2f77a4c The expected cleanup/refactoring commit..
Not much got tested that wasn't already - I've uncovered a connection
establishment deadlock and wanted to get these changes committed before I
attack it.

The big changes:
 - Moved much of the connection code from btl_udapl_component.c to
   btl_udapl_endpoint.c.
 - Cleaned up initialization of various fragment members.
 - MCA_BTL_UDAPL_ERROR macro, which is compiled in/out appropriately.

This commit was SVN r9496.
2006-03-31 16:25:19 +00:00
Andrew Friedley
0eba366b07 Various pieces all over to make basic small message send/recv work. Next step
is clean up the code.. it is in need of refactoring and testing.

Thanks to Brian for help in troubleshooting!

This commit was SVN r9466.
2006-03-29 21:55:41 +00:00
Andrew Friedley
48d61cd99a Mostly fragment/LMR handling fixes:
 - Grab the mpool_registration in _frag_common_constructor()
 - Save the LMR context in the segment key
 - No need for cookie variables - can just cast the frag
 - No need to memcpy() data when recv'ing
 - Add an LMR triplet to the fragment structure and initialize it
   in btl_udapl_alloc().
 - Whitespace/typo fixes, remove some opal_output() calls

Looks like I can use triplets describing sub-regions of registered LMR's.  So I
do this - prior to this patch I was sending the entire free list memory over,
which isn't correct :)

Back to an earlier problem - when sending address information right after
connection establishment, the receiving end receives a DTO completion event and
appears to have good data.  But the sending end never receives a DTO completion
event indicating the send completed, and never completes the client side of the
connection.

This commit was SVN r9386.
2006-03-23 16:21:08 +00:00
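Commit 48d61cd99a notes that a triplet can describe a sub-region of a registered LMR instead of the whole registered area. A minimal sketch of building such a triplet follows; `sketch_triplet_t` is a hypothetical stand-in and not the uDAPL triplet type, and the context value is an arbitrary number supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical triplet: registration context, start address, length. */
typedef struct {
    uint32_t lmr_context;
    uint64_t virtual_address;
    uint64_t segment_length;
} sketch_triplet_t;

/* Describe len bytes starting at offset inside a registered region.
 * Returns 0 on success, or -1 if the sub-region would fall outside
 * the registered memory. */
int sketch_triplet_for_frag(uint32_t lmr_context,
                            const unsigned char *region_base,
                            size_t region_len,
                            size_t offset, size_t len,
                            sketch_triplet_t *out)
{
    assert(region_base != NULL && out != NULL);
    if (offset > region_len || len > region_len - offset) {
        return -1;
    }
    out->lmr_context = lmr_context;
    out->virtual_address = (uint64_t)(uintptr_t)(region_base + offset);
    out->segment_length = (uint64_t)len;
    return 0;
}
```

Keeping the length as a separate argument also fits the later change of setting the triplet length in the send path, since the amount actually sent can be smaller than the amount allocated.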
Andrew Friedley
cf9246f7b9 Long overdue commit.. many changes.
In short, I'm very close to having connection establishment and eager send/recv working.

Part of the connection process involves sending address information from the
client to server.  For some reason, I am never receiving an event indicating
completion of the send on the client side.  Otherwise, connection
establishment is working and eager send/recv should be trivial from here.


Some more detailed changes:
 - Send partially implemented, just handles starting up new connections.
 - Several support functions implemented for establishing connection.  Client
   side code went in btl_udapl_endpoint.c, server side in btl_udapl_component.c
 - Frags list and send/recv locks added to the endpoint structure.
 - BTL sets up a public service point, which listens for new connections.
   Steps over ports that are already bound, iterating through a range of ports.
 - Remove any traces of recv frags, don't think I need them after all.
 - Pieces of component_progress() implemented for connection establishment.
 - Frags have two new types for connection establishment - CONN_SEND and
   CONN_RECV.
 - Many other minor cleanups not affecting functionality

This commit was SVN r9345.
2006-03-21 00:12:55 +00:00
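The port search mentioned in commit cf9246f7b9 (stepping over ports that are already bound while iterating through a range) might look like the sketch below. The bind attempt is abstracted behind a hypothetical callback, since the actual uDAPL call used to create the public service point is not shown in the commit message; `sketch_fake_bind` is an in-memory stand-in for demonstration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bind attempt: returns 0 on success, nonzero if the
 * port is already in use. */
typedef int (*sketch_try_bind_fn_t)(unsigned port, void *ctx);

/* Walk the inclusive range [first, last] and return the first port
 * that binds, or -1 if every port in the range is taken. */
int sketch_find_port(unsigned first, unsigned last,
                     sketch_try_bind_fn_t try_bind, void *ctx)
{
    assert(try_bind != NULL && first <= last);
    for (unsigned port = first; port <= last; ++port) {
        if (try_bind(port, ctx) == 0) {
            return (int)port;
        }
    }
    return -1;
}

/* Demonstration stub: a list of ports that count as already bound. */
typedef struct {
    const unsigned *taken;
    size_t count;
} sketch_taken_ports_t;

int sketch_fake_bind(unsigned port, void *ctx)
{
    const sketch_taken_ports_t *t = ctx;
    for (size_t i = 0; i < t->count; ++i) {
        if (t->taken[i] == port) {
            return 1;  /* already bound */
        }
    }
    return 0;
}
```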
Tim Woodall
712468dbef add diagnostic interface
This commit was SVN r9328.
2006-03-17 17:39:41 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
  - move files out of toplevel include/ and etc/, moving them into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Andrew Friedley
b37e18916f Many different things, the big ones:
 - Start filling in the progress function, focusing on connection establishment.
 - Initialize udapl mpool and free lists
 - Create/destroy a protection zone with each IA
 - Misc organization as I learn how things work

This commit was SVN r8969.
2006-02-10 21:49:15 +00:00
Andrew Friedley
ec995160e6 Checkpoint for switch to mpool work:
 - Remove printing of CFLAGS in configure.m4
 - Set MCA_BTL_FLAGS_SEND flag
 - Improved error handling during module initialization
 - Extract the address of each interface with dat_ia_query
 - Start playing around with fragment stuff - probably wrong
 - Misc code cleanup (removal of GM-specific code)

This commit was SVN r8801.
2006-01-25 02:21:34 +00:00
Andrew Friedley
5ccab7bcda Checkpoint:
 - Move mca_btl_udapl_error/mca_btl_module_init to mca_btl_udapl.c and rename it
 - White space cleanups
 - Free the uDAPL evd and ia handles in mca_btl_udapl_finalize

This commit was SVN r8705.
2006-01-16 21:54:50 +00:00
Andrew Friedley
a4abe3bdbe Checkpoint:
 - Borrow configure.m4 from the mvapi btl.  One of the uDAPL headers emits a
   warning when -pedantic is enabled, so strip it out.
 - Change function check in ompi_check_dapl.m4 from dat_ia_open to
   dat_registry_list_providers.. dat_ia_open wasn't working right
 - Make the references to prepare_dst, put, and get NULL for now
 - Add opal_output() calls in all the udapl interface functions for debugging
 - Add evd_qlen component parameter to control event dispatcher queue length
 - First stab at component_init and module_init
 - Misc cleanups - whitespace, dead code removal
 - Update copyrights to 2006

This commit was SVN r8701.
2006-01-16 03:01:12 +00:00
Andrew Friedley
c0bad339af - Use the GM BTL as a template instead, per Tim's suggestion
- Begin adding uDAPL-specific stuff
- Added config/ompi_check_udapl.m4 - hopefully I did this right

This commit was SVN r8681.
2006-01-12 04:05:02 +00:00
Andrew Friedley
f402854a96 Initial commit of uDAPL BTL component.
 - Copied the template BTL and renamed everything
 - Compiles and shows up correctly in ompi_info, not tested past that
 - Should be ignored for everyone but me

This commit was SVN r8544.
2005-12-19 16:37:05 +00:00