1
1
Граф коммитов

641 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
043153dad3 * fix opal_list_item_t -> ompi_free_list_item_t type change
This commit was SVN r10659.
2006-07-05 17:02:16 +00:00
Brian Barrett
47725c9b02 * Add new PML (CM) and network drivers (MTL) for high speed
interconnects that provide matching logic in the library.
  Currently includes support for MX and some support for
  Portals
* Fix overuse of proc_pml pointer on the ompi_proc structuer, 
  splitting into proc_pml for pml data and proc_bml for
  the BML endpoint data
* bug fixes in bsend init code, which wasn't being used by
  the OB1 or DR PMLs...

This commit was SVN r10642.
2006-07-04 01:20:20 +00:00
Galen Shipman
7e079d20ab fix for stupid casting.. addresses issue on PPC64 where sizes get set
improperly and badness ensues..

This commit was SVN r10574.
2006-06-29 21:58:50 +00:00
George Bosilca
7d59a6885b Remove all references to the MRU list. Add back the repost list checks. For some reasons
it decrease the latency by around 0.3 micro-seconds ...

This commit was SVN r10571.
2006-06-29 19:25:44 +00:00
George Bosilca
78f0de127d Typo.
This commit was SVN r10567.
2006-06-29 15:16:25 +00:00
George Bosilca
238147f576 Help the compiler to optimize the code. Now the order in the enum reflect the
order we use them in the switch.

This commit was SVN r10565.
2006-06-29 15:10:58 +00:00
George Bosilca
9bf281bca2 Remove the gm_mru_reg list as it is never used. Cleanup the repost logic. Now we repost
a receive fragment only when we're done with the message from inside and we try to add it
to the list.

This commit was SVN r10564.
2006-06-29 15:10:11 +00:00
George Bosilca
43b7b17033 Release the memory registration when the descriptors get freed.
This commit was SVN r10540.
2006-06-28 15:24:16 +00:00
George Bosilca
d9daa34a6c Set the registration field to NULL when we create a new fragment.
This commit was SVN r10539.
2006-06-28 15:23:36 +00:00
Gleb Natapov
c8f75c472a remove modulo op from fast path. Improvement 0.02-0.04ms.
This commit was SVN r10538.
2006-06-28 12:00:47 +00:00
Gleb Natapov
e58a89ef3e OMPI_ENABLE_DEBUG is always defined (to 0 or 1). Use #if and nto #ifdef.
This commit was SVN r10537.
2006-06-28 11:25:09 +00:00
Gleb Natapov
704a5eb645 Support for LMC (lid mask count) and multiple QPs per port.
This commit was SVN r10536.
2006-06-28 07:23:08 +00:00
Galen Shipman
e6cd8db0e5 DR will now checksum on a per btl basis (see MCA_BTL_FLAGS_NEED_CSUM). We
still always send ACK's, teasing apart completion for ACK/no ACK looks like a
pain in the .. 

This commit was SVN r10530.
2006-06-27 20:23:47 +00:00
Jeff Squyres
df45221a3e Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10521.
2006-06-27 10:43:03 +00:00
Gleb Natapov
52208d7bf9 Whe don't need to register zero sized frags.
This commit was SVN r10519.
2006-06-27 08:50:12 +00:00
Galen Shipman
8855e5b73a Fixes for DR as well as better diagnostic..
Successfully passing the intel test suite with/without induced errors/drops. 

This commit was SVN r10518.
2006-06-26 22:29:29 +00:00
Gleb Natapov
b7715395cb Return descriptor before sending credits one more time. We may need it.
This commit was SVN r10495.
2006-06-26 07:05:58 +00:00
Andrew Friedley
7bfac82ce7 Change over from lazy connection setup to setting up at initialization
time.

UD is connectionless, and as long as peers are statically assigned to QPs,
there is no reason to set up the adressing information lazily.

Lots of code was axed, as endpoints no longer have state.  Removed a
number of other elements in the endpoint struct to make it as lightweight
as possible.

I was able to remove an entire function call/branch in the send path,
which I believe is the main contributor to a 2us drop in NetPIPE latency.

Some whitespace cleanups as well.

Passes IBM test suite, and all but certain intel tests that were failing
before the change, over ob1 PML.

This commit was SVN r10494.
2006-06-23 16:50:50 +00:00
Andrew Friedley
046f4cd4ae Enough cleanup for now.
Moved a lot of the module-specific init from the component init to the module init.

Try keeping a pointer to reduce indexing, didn't seem to help - leaving in place
for now.

This commit was SVN r10485.
2006-06-22 22:12:13 +00:00
Andrew Friedley
8392ed4cac A checkpoint before I really do some cleanup.. nothing pretty here.
Playing around with OPAL_LIKELY/UNLIKELY, no real gains yet.

Reworked progress() to process many WC's at a time, as well
as immediately repost groups of receive buffers.

This commit was SVN r10481.
2006-06-22 18:06:55 +00:00
Andrew Friedley
365c81d6e9 Fix a few issues reported by Terry Dontje:
1. ompi/mca/btl/udapl/btl_udapl_proc.c should be including
btl_udapl_endpoint.h for mca_btl_udapl_proc_insert function.

2. btl_udapl_endpoint.c it looks like you are using
&endpoint->endpoint_lock when you should use &ep->endpoint_lock in a
OPAL_THREAD_LOCK call.

3. btl_udapl_frag.h has a couple opal_list_item_t's that should be
ompi_free_list_item_t in the _FRAG_ALLOC_{EAGER,MAX} macros.

This commit was SVN r10442.
2006-06-20 17:13:44 +00:00
George Bosilca
044868df45 Set the destination descriptor before calling the recv registration. Once
this call is completed, we have to remove it in order to be able to cleanup
correctly the fragments.

This commit was SVN r10428.
2006-06-20 14:11:09 +00:00
George Bosilca
1b18b7d934 Change the parameter registration of this BTL to the new calls (new is relative
here). Change the self BTL to use RDMA protocol.

This commit was SVN r10427.
2006-06-20 14:09:58 +00:00
Jeff Squyres
1d27ca5d0a Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10424.
2006-06-20 11:32:46 +00:00
Jeff Squyres
600bf4295a Update the help message to be slightly more concise and clear
This commit was SVN r10422.
2006-06-20 11:23:38 +00:00
Brian Barrett
3d027e57a8 * fix for ticket #141. If we are going to shortcut out of polling the
send/receive queues if there is something available in the short message
  rdma queues, then we have to poll *ALL* the rdma queues before exiting,
  or we aren't fair about frag reception and fall into degenerate matching
  cases.

This commit was SVN r10410.
2006-06-17 21:32:25 +00:00
Brian Barrett
05046e8ad2 if MX isn't running on some hosts, but is on others, we were blocking in the modex receive
waiting for the non-running procs to publish their contact information.  Publish their
(lack of) contact information.

This commit was SVN r10355.
2006-06-14 19:07:38 +00:00
George Bosilca
aca71521db Complete the move of the mpool registration from opal_list_item_t to the
ompi_free_list_item_t.

This commit was SVN r10354.
2006-06-14 17:43:50 +00:00
Brian Barrett
d367dc5d56 * Fix for bug #115 -- we need to decrement the use count on a pinned buffer
so that memory is actually deregistered.  Reviewed by Galen.

This commit was SVN r10349.
2006-06-14 13:38:24 +00:00
Andrew Friedley
c68c6ac122 A number of fixes and the usual cleanup..
- Added some basic flow control to limit number of posted sends.
- Merged endpoint send/recv lock into single endpoint lock.
- Set the LMR triplet length in the send path, not at allocation time.
  This has to be done because upper layers might send less than the
  amount allocated.
- Alter the tie-breaker if statement protecting the second call
  to dat_ep_connect().  The logic was reversed compared to the tie-
  breaker for the first dat_ep_connect(), making it possible for
  3 or more processes to form a deadlock loop.
- Some asserts were added for debugging purposes.. leaving them
  in place for now.

This commit was SVN r10317.
2006-06-12 22:42:01 +00:00
Galen Shipman
218a438509 finished the ompi_free_list_t class nightmare..
This commit was SVN r10314.
2006-06-12 22:09:03 +00:00
Brian Barrett
d5acb4e3cc * silence dumb (and mostly useless) warning during cleanup
This commit was SVN r10280.
2006-06-09 21:09:53 +00:00
Jeff Squyres
a4030ad2d9 Improve the tremendously unhelpful MCA help message for the
btl_openib_ib_mtu and btl_mvapi_ib_mtu MCA params by showing the valid
values what what they represent (got a question about this from Cisco
testing engineers).

This commit was SVN r10277.
2006-06-09 18:02:45 +00:00
Andrew Friedley
75176370ae blah. somehow missed adding .ompi_ignore/.ompi_unignore.
This commit was SVN r10272.
2006-06-09 00:15:36 +00:00
Andrew Friedley
cca1616368 Finally committing the UD BTL.
UD is the Unreliable Datagram transport for Infiniband, specifically OpenIB.  This BTL is derived from the existing openib BTL, which is RC (Reliable Connection) based.

Still a work in progress, as there is a lot of work left to do.  Specifically, performance, scalability, and flow control need to be addressed.

Currently I'm playing around with different methods for handling receive buffers, as well as profiling to figure out where the time is going.

This commit was SVN r10271.
2006-06-09 00:13:45 +00:00
Galen Shipman
90799f82cd copy paste error..
This commit was SVN r10220.
2006-06-06 02:38:29 +00:00
Galen Shipman
cc54b07aa0 add better error messages for vapi retry exceeded errors.
This commit was SVN r10219.
2006-06-06 02:04:56 +00:00
Galen Shipman
9e6e7575b9 doh... add the file..
This commit was SVN r10210.
2006-06-05 21:24:42 +00:00
Galen Shipman
f05dee0435 add help file to explain why things went south..
This commit was SVN r10209.
2006-06-05 21:23:45 +00:00
Galen Shipman
74c97fb784 cleanup error reporting.. use ompi_proc_t->proc_name if available this gives
us source/dest hostnames for communication errors.. 

This goes to 1.1 branch (reviewed by Brian).. 

This commit was SVN r10200.
2006-06-05 20:02:41 +00:00
Galen Shipman
0344ae4ac5 Fix to allow eager limit and max send size to be any size (within resource limitations). Instead of storing the ompi_free_list_t * in the fragment, we use the frag type enum, this tells us where the frag came from and where it should return.. This could also be done in mvapi but is not a high priority moving forward..
Review by Brian, needs to hit the trunk + 1.1 release.. 

This commit was SVN r10157.
2006-06-01 02:32:18 +00:00
Brian Barrett
5163f2b296 Fix for bug #36. The MX, MVAPI, and OpenIB components don't have
support for progress threads, so we shouldn't build them or try to use
them when support for progress threads has been requested.  The TCP, GM,
SELF, and SM BTLs should have progress thread support, so they aren't
disabled.  The Portals BTL isn't compiled on platforms with threads,
so it doens't need to be updated.

This commit was SVN r10156.
2006-06-01 01:30:16 +00:00
Galen Shipman
c79efc9efb track which list a fragment came from, allows returning based on list, not
on size. 

This commit was SVN r10142.
2006-05-31 14:24:32 +00:00
Brian Barrett
c723d196c5 Rather than using fragment size to determine fragment type, use an enum.
Do this rather than the my_list pointer because we need to do some
things that are somewhat special because we pre-pin eager fragments but
not send fragments.  Also makes a couple ideas I have slightly easier to
play around with.

This commit was SVN r10127.
2006-05-31 03:34:32 +00:00
Galen Shipman
2667c52a5d Track fragments by list, not by size..
-- reviewed by Brian, needs to hit all the branches.. 

This commit was SVN r10078.
2006-05-25 18:07:26 +00:00
Galen Shipman
38a0561d9b Allow maximum send size to be less than the eager limit.
Instead of figuring out which free list the fragment belongs to based on size
we simply store a pointer to the list which it belongs in the fragment.

This was reviewed by Brian and should hit all the branches.

This commit was SVN r10072.
2006-05-25 16:57:14 +00:00
Andrew Friedley
8a3d0862ca I can commit! *happy dance*
Trying to remember what I did here.. eager/max messages should work now, no RDMA yet.  A number of other fixes and cleanups.

I do know of two problems:
 Bad stuff happens when flooded with send frags too quickly - the BTL doesn't handle flow control.
 Certain IBM tests turn up a length assertion in the datatype engine - needs more investigation.

This commit was SVN r10070.
2006-05-25 15:47:59 +00:00
Gleb Natapov
f590d8a190 fix eager RDMA on PPC64.
This commit was SVN r10059.
2006-05-25 11:05:12 +00:00
Jeff Squyres
dd44d36be0 Fix for ticket #25. Ensure that in the threaded case where we have
This commit was SVN r10043.
2006-05-24 16:15:07 +00:00
George Bosilca
085cac552f Don't let TCP to create local connections, we have the self BTL for this purpose.
This commit was SVN r10018.
2006-05-23 03:06:32 +00:00
Jeff Squyres
7b59847765 Ensure that endpoint->endpoint_addr is not NULL before trying to
derefence through it.  It is legal for endpoint_addr to be NULL in the
destructor because if btl_tcp_add_procs() -> btl_tcp_proc_insert()
returns UNREACH, then endpoint_addr will be NULL and we'll OBJ_RELEASE
it.

This commit was SVN r9940.
2006-05-16 19:01:08 +00:00
Jeff Squyres
e24377a89c Back out a pair of commits from George from last week because they
apparently don't work properly: r9869, r9868 (sm btl alignment issues)

This commit was SVN r9936.

The following SVN revision numbers were found above:
  r9868 --> open-mpi/ompi@9b985c3216
  r9869 --> open-mpi/ompi@adedf511fb
2006-05-16 16:48:43 +00:00
Brian Barrett
dcc6b47fa2 * put rdma operations in the send event queue instead of receive because it's
easier to do event accounting that way
* greatly increase receive event and buffer sizes.  We're still about half
  of what Cray defaults to, so I don't feel bad about the increases
* Implement a pre-pinning optimization for eager fragments - will be
  pinned on first use and left pinned for the life of the fragment
* Since we can't have two receive frag callbacks fired at the same time,
  don't have receive free list - just keep one receive fragment in the
  module.  Saves a big free list and all that interaction.

This commit was SVN r9915.
2006-05-14 04:23:26 +00:00
Andrew Friedley
4c3aa05c83 uDAPL has an expects memory for enumerating interface adapters in a really
weird way - fix up to do things 'properly'.

Add my sandia username to the unignore.

This commit was SVN r9879.
2006-05-10 19:50:30 +00:00
George Bosilca
adedf511fb Remove the printf that I unfortunately commit.
This commit was SVN r9869.
2006-05-10 00:02:54 +00:00
George Bosilca
9b985c3216 Force the useful data to be aligned on special boundary. It is 32 bits
right now. Some testing on large NUMA machines should be done in order
to make sure that we need to export this variable out to the MCA layer.

This commit was SVN r9868.
2006-05-09 21:46:10 +00:00
George Bosilca
a386fccccc Increase the default limits for the SM BTL. These new
values allow better performances on all the clusters
I was able to test.

This commit was SVN r9867.
2006-05-09 21:44:24 +00:00
Gleb Natapov
0c34d5c9e6 fix endpoint matching in on demand connection establishment. This fix is in mvapi btl already.
This commit was SVN r9855.
2006-05-09 12:12:52 +00:00
Tim Woodall
350d5b1713 change hardcoded values into mca params
This commit was SVN r9815.
2006-05-04 15:20:18 +00:00
George Bosilca
bdecdc8d41 Cleanup the MX BTL. Remove all mpool related code as there will never be a MX mpool.
This commit was SVN r9808.
2006-05-04 06:55:45 +00:00
Tim Woodall
4fd2a71b6c removed debug code - free list implementation has changed
This commit was SVN r9750.
2006-04-27 15:34:12 +00:00
Brian Barrett
9cab1bb54a * re-enable the eager fragment throttling, this time with the proper threshold value for when
the memory descriptor is closing itself, so that it actually works properly ;).  I think I
  was just getting lucky and not sending enough short messages with the reference impl.

This commit was SVN r9748.
2006-04-27 14:13:52 +00:00
Brian Barrett
66d1d3b83f * add a quick debugging sanity check
* It appears that Cray's SeaStar has some horrible performance for iovecs - IN_pLACE
  was actually slower than copying into eager frags.  Ugh.  And we don't even pre-pin
  eager frags yet!

This commit was SVN r9738.
2006-04-27 02:55:31 +00:00
George Bosilca
3e968d4f63 There is no length on the free list.
This commit was SVN r9704.
2006-04-24 23:13:51 +00:00
Brian Barrett
9a65ddd788 * back out r9005, which for some reason works fine on the reference implementation
but causes resource exhaustion on the Red Storm implementation.  Sigh...

This commit was SVN r9686.

The following SVN revision numbers were found above:
  r9005 --> open-mpi/ompi@20d06e889e
2006-04-22 20:12:33 +00:00
Andrew Friedley
345551cb36 Checkpoint before starting work on max-sized frags (maybe user too?).
- Some initial work on prepare_src
- Move some fragment initialization around
- Fix a union casting issue on picky compilers, identified by Don Kerr
- Other small cleanups/bugfixes

This commit was SVN r9662.
2006-04-19 22:20:22 +00:00
George Bosilca
61bea41350 The same in MX (missing copyright).
This commit was SVN r9661.
2006-04-19 21:37:30 +00:00
George Bosilca
afe9821d84 Add a missing copyright.
This commit was SVN r9660.
2006-04-19 21:36:22 +00:00
Tim Woodall
10f343734f decrease eager limit to 12K (improves latency)
This commit was SVN r9646.
2006-04-14 22:28:37 +00:00
Tim Woodall
6523c12e4b - decrease eager limit to 12K (improves latency)
- trigger event library while setting up connections

This commit was SVN r9645.
2006-04-14 22:28:05 +00:00
Tim Woodall
c6489cb5aa - turn on eager rdma by default
This commit was SVN r9641.
2006-04-14 21:11:14 +00:00
George Bosilca
b3cc3d82d3 Activate the OOB while we setup connections for MVAPI. Same thing should be done for the
Open IB ...

This commit was SVN r9640.
2006-04-14 20:53:42 +00:00
Gleb Natapov
98282a3567 fix spelling. threashold -> threshold.
This commit was SVN r9577.
2006-04-08 08:13:37 +00:00
Andrew Friedley
d461b55696 - Implement OOB connection handshaking via the ORTE RML. To start a connect,
we send our local addr_t OOB.  Remote side then matches endpoints and calls
  dat_ep_connect().  Everything should be the same as before from here, except
  that client/server roles are reversed.
- Properly set our buffer size when posting receives.  When the frag used to
  transfer address information is recycled by the free list, the wrong buffer
  size was being used, which caused buffer overflow errors.
- Finally put the uDAPL error handling stuff in the mpool component.
- Remove a few more OPAL_OUTPUTs.

This commit was SVN r9569.
2006-04-07 15:26:05 +00:00
Gleb Natapov
b6ab1f4262 fix compilation warnings.
This commit was SVN r9515.
2006-04-02 11:32:25 +00:00
Andrew Friedley
74b2f77a4c The expected cleanup/refactoring commit..
Not much got tested that wasn't already - I've uncovered a connection
establishment deadlock and wanted to get these changes committed before I
attack it.

The big changes:
 - Moved much of the connection code from btl_udapl_component.c to
   btl_udapl_endpoint.c.
 - Cleaned up initialization of various fragment members.
 - MCA_BTL_UDAPL_ERROR macro, which is compiled in/out appropriately.

This commit was SVN r9496.
2006-03-31 16:25:19 +00:00
Gleb Natapov
256bf70530 Forgot to add file to previous commit
This commit was SVN r9480.
2006-03-30 17:37:52 +00:00
Gleb Natapov
79bcfb096f Add type to frag. Sometimes we need to know that a frag is from short rdma area.
I used hack for this that doesn't work for mvapi, so changing it to something more sane.

This commit was SVN r9477.
2006-03-30 15:26:21 +00:00
Gleb Natapov
ea11582191 Porting of short message RDMA from openib BTL. Endpoint registers circular buffer and sends its address and rkey to the peer. Peer uses this buffer to eagerly RDMA small message into it. Endpoint polls the buffer for message arrival before checking HP/LP QPs. Set btl_mvapi_use_eager_rdma to 1 to enable it.
This commit was SVN r9474.
2006-03-30 12:55:31 +00:00
Andrew Friedley
0eba366b07 Various pieces all over to make basic small message send/recv work. Next step
is clean up the code.. it is in need of refactoring and testing.

Thanks to Brian for help in troubleshooting!

This commit was SVN r9466.
2006-03-29 21:55:41 +00:00
Gleb Natapov
590c992a7e fix recursive lock of openib_btl->ib_lock.
This commit was SVN r9427.
2006-03-26 15:02:43 +00:00
Gleb Natapov
01a119c3c5 fix compilation bug with --enable-mpi-threads
This commit was SVN r9426.
2006-03-26 13:24:10 +00:00
Gleb Natapov
a5a78b10cc Implementation of short message RDMA. Endpoint registers circular buffer and sends its address and rkey to the peer. Peer uses this buffer to eagerly RDMA small message into it. Endpoint polls the buffer for message arrival before checking HP/LP QPs. Set btl_openib_use_eager_rdma to 1 to enable it.
This commit was SVN r9425.
2006-03-26 08:30:50 +00:00
Andrew Friedley
48d61cd99a Mostly fragment/LMR handling fixes:
- Grab the mpool_registration in _frag_common_constructor()
 - Save the LMR context in the segment key
 - No need for cookie variables - can just cast the frag
 - No need to memcpy() data when recv'ing
 - Add an LMR triplet to the fragment structure and initialize it
   in btl_udapl_alloc().
 - Whitespace/typo fixes, remove some opal_output() calls

Looks like I can use triplets describing sub-regions of registered LMR's.  So I
do this - prior to this patch I was sending the entire free list memory over,
which isn't correct :)

Back to an earlier problem - when sending address information right after
connection establishment, the receiving end receives a DTO completion event and
appears to have good data.  But the sending end never receives a DTO completion
event indicating the send completed, and never completes the client side of the
connection.

This commit was SVN r9386.
2006-03-23 16:21:08 +00:00
Tim Woodall
c7ee5e13bc simplification - dont swap src/dst pointers - always leave both
src/dst pointing to same segments

This commit was SVN r9357.
2006-03-21 18:20:17 +00:00
George Bosilca
f7a5a582c5 Diagnostic function for mvapi. It print all the credits used for the flow control.
This commit was SVN r9355.
2006-03-21 17:02:14 +00:00
Andrew Friedley
cf9246f7b9 Long overdue commit.. many changes.
In short, I'm very close to having connection establishment and eager send/recv working.

Part of the connection process involves sending address information from the
client to server.  For some reason, I am never receiving an event indicating
completetion of the send on the client side.  Otherwise, connection
establishment is working and eager send/recv should be trivial from here.


Some more detailed changes:
 - Send partially implemented, just handles starting up new connections.
 - Several support functions implemented for establishing connection.  Client
   side code went in btl_udapl_endpoint.c, server side in btl_udapl_component.c
 - Frags list and send/recv locks added to the endpoint structure.
 - BTL sets up a public service point, which listens for new connections.
   Steps over ports that are already bound, iterating through a range of ports.
 - Remove any traces of recv frags, don't think I need them after all.
 - Pieces of component_progress() implemented for connection establishment.
 - Frags have two new types for connection establishment - CONN_SEND and
   CONN_RECV.
 - Many other minor cleanups not affecting functionality

This commit was SVN r9345.
2006-03-21 00:12:55 +00:00
George Bosilca
e181153f16 Remove the bogus prototype.
This commit was SVN r9333.
2006-03-19 19:22:35 +00:00
George Bosilca
a0d25ab6ef Add missing prototype for the mvapi diagnostic function.
This commit was SVN r9331.
2006-03-18 19:38:56 +00:00
Tim Woodall
bd870519fd - modified convertor copy_and_prepare routines to accept an addition
flag, new flags to be included when convertor is initialized
- modified pml/btl module defs and added stub functions for diagnostic
  output routines to dump state of queues / endpoints
- updates to data reliability pml

This commit was SVN r9329.
2006-03-17 18:46:48 +00:00
Tim Woodall
712468dbef add diagnostic interface
This commit was SVN r9328.
2006-03-17 17:39:41 +00:00
Tim Woodall
c34f4c2cb7 correct cleanup for threaded case
This commit was SVN r9291.
2006-03-16 00:05:39 +00:00
Galen Shipman
440417e92c Add max_btls option
This commit was SVN r9263.
2006-03-13 17:03:21 +00:00
Sven Stork
12b94972e2 Fix comment of a paramter.
This commit was SVN r9261.
2006-03-13 09:11:46 +00:00
Brian Barrett
d041558f85 * protect lock when not building threaded
This commit was SVN r9254.
2006-03-11 03:20:50 +00:00
Brian Barrett
3e2c51dea8 * fix some silly commenting done by a previous developer that are good for
a laugh but probably not good for usability ;)

This commit was SVN r9253.
2006-03-11 03:09:24 +00:00
Tim Woodall
9ae910044b resolve threading issue
This commit was SVN r9234.
2006-03-09 17:59:05 +00:00
Tim Woodall
8bf6ed7a36 - corrected locking in gm btl - gm api is not thread safe
- initial support for gm progress thread
- corrected threading issue in pml
- added polling progress for a configurable number of cycles to wait for threaded case

This commit was SVN r9188.
2006-03-02 00:39:07 +00:00
Brian Barrett
579e74290f * make gm wire-up endian safe
This commit was SVN r9179.
2006-02-28 02:03:46 +00:00
Brian Barrett
bfd49d248b * (hopefully) fix MPI_BOTTOM for portals, same way as oll the other RDMA btls from
eons ago...

This commit was SVN r9172.
2006-02-27 17:07:24 +00:00
Brian Barrett
9b19e3fef0 * remove some debugging output that shouldn't have been committed. Doh!
This commit was SVN r9171.
2006-02-27 16:23:52 +00:00
Brian Barrett
285581dff2 More endian-related cleanups:
- moved hton64 and ntoh64 from the bunch of places it had been copied
    into one header file
  - properly set and use the btl_tcp's nbo option to put things in
    network byte order on the wire if both sides don't have the same
    endianness
  - Put the OB1 PML's headers (with a couple exceptions I need to discuss
    with Tim) in network byte order on the wire if both sides don't have
    the same endianness
  - since it was needed for the TCP BTL, move the orte_process_name_t
    HTON and NTOH macros from the TCP OOB to ns_types.h

This commit was SVN r9145.
2006-02-26 00:45:54 +00:00
Jeff Squyres
628125599d Fix the TCL btl module endpoint matching during setup for the scenario
when running an MPI job spanning a node that has two TCP NICs and a
node that has one TCP NIC.  Previously, for the 2 NIC/module process,
we would return the first peer IP address if we couldn't find a subnet
match with any of the peer's published IP addresses -- this was to
support running OMPI across subnet boundaries.  Changed the behavior
to only do that behavior if the IP address we're trying to match is
public (i.e., not 10.x.y.z, 192.168.x.y, or 172.16.x.y) *and* any of
the remote peer's addresses are public (working on the assumption that
if we both have public addresses, they're routable to each other).

This definitely will not work in all scenarios, such as when we go to
WAN kinds of executions, and will need to be revisited at that time.

This commit was SVN r9119.
2006-02-23 02:02:19 +00:00
Galen Shipman
e58b758031 standardize behavior of btl_alloc, if the size is larger than the max send
size, btl_alloc returns NULL. 

This commit was SVN r9114.
2006-02-22 17:37:59 +00:00
Brian Barrett
0d098c9d57 * after talking with galen, take into account we might truncate
This commit was SVN r9113.
2006-02-22 17:03:04 +00:00
Brian Barrett
765d2ffc29 * the self btl should set the segment size field on alloc like the other btls
* clean up duplicate free in long message accumulates that looks like it was
  a cut-n-paste error

This commit was SVN r9112.
2006-02-22 16:20:13 +00:00
Brian Barrett
08747dcaf8 * Throttle the number of incoming receives so that we don't overrun our receive
event queue and lose receive messages.

This commit was SVN r9006.
2006-02-13 15:59:54 +00:00
Brian Barrett
20d06e889e * revert out some of the attempts to better use the Portals 3.3.2-2 user-space
run-time support, as it appears to be doing bad things to memory.  Update
  the hack to get the local nid to match the recent TCP nal changes, and
  update the P3RT api useage

This commit was SVN r9005.
2006-02-13 15:41:00 +00:00
George Bosilca
ecc3e00362 Various cleanups.
This commit was SVN r9002.
2006-02-12 21:36:07 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Andrew Friedley
b37e18916f Many different things, the big ones:
- Start filling in the progress function, focusing on connection establishment.
 - Initialize udapl mpool and free lists
 - Create/destroy a protection zone with each IA
 - Misc organization as I learn how things work

This commit was SVN r8969.
2006-02-10 21:49:15 +00:00
George Bosilca
9f1357fb89 Remove all the useless includes. Most of the endpoint do not depend on the
orte includes.

This commit was SVN r8932.
2006-02-08 05:10:48 +00:00
Galen Shipman
c8045bf397 Fixup for ORTE datatype checkin,
- use appropriate header files 
- change calls from orte_dps to orte_dss 

This commit was SVN r8920.
2006-02-07 15:20:44 +00:00
George Bosilca
20fd358327 A BTL cannot depend on the orte data-types.
This commit was SVN r8915.
2006-02-07 06:05:36 +00:00
Ralph Castain
4b9f015c0b Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list.
This commit was SVN r8912.
2006-02-07 03:32:36 +00:00
Rainer Keller
7ac0ffc349 - Instead of doing the unlock inside the if, just move the if-statement
later after the mandatory unlock.

This commit was SVN r8885.
2006-02-02 17:32:22 +00:00
Tim Woodall
a2fde48f2f changes from release branch
This commit was SVN r8858.
2006-01-31 16:17:18 +00:00
Tim Woodall
bcd6c525f8 removed duplicate locks
This commit was SVN r8857.
2006-01-31 16:12:37 +00:00
Tim Woodall
9d484916db remove locks already held
This commit was SVN r8853.
2006-01-31 14:23:08 +00:00
Brian Barrett
b1d2424013 Merge in present work on the MPI-2 onesided chapter. The current code is not
complete, but stable enough that it will have no impact on general development,
so into the trunk it goes.  Changes in this commit include:

 - Remove the --with option for disabling MPI-2 onesided support.  It
   complicated code, and has no real reason for existing
 - add a framework osc (OneSided Communication) for encapsulating
   all the MPI-2 onesided functionality
 - Modify the MPI interface functions for the MPI-2 onesided chapter
   to properly call the underlying framework and do the required
   error checking
 - Created an osc component pt2pt, which is layered over the BML/BTL
   for communication (although it also uses the PML for long message
   transfers).  Currently, all support functions, all communication
   functions (Put, Get, Accumulate), and the Fence synchronization
   function are implemented.  The PWSC active synchronization
   functions and Lock/Unlock passive synchronization functions are
   still not implemented

This commit was SVN r8836.
2006-01-28 15:38:37 +00:00
George Bosilca
0f1c6d79e8 Make the MVAPI BTL thread safe again. The problem was a double locking on the endpoint mutex.
It's still not very clean as we still lock the mvapi_btl mutex inside a critical section
protected by the endpoint mutex ...

This commit was SVN r8810.
2006-01-25 23:14:06 +00:00
Galen Shipman
ddc22d8c7e Use endpoint_lock, not ib_lock (copy paste error from openib btl)
This commit was SVN r8806.
2006-01-25 15:04:37 +00:00
Andrew Friedley
ec995160e6 Checkpoint for switch to mpool work:
- Remove printing of CFLAGS in configure.m4
 - Set MCA_BTL_FLAGS_SEND flag
 - Improved error handling during module initialization
 - Extract the address of each interface with dat_ia_query
 - Start playing around with fragment stuff - probably wrong
 - Misc code cleanup (removal of GM-specific code)

This commit was SVN r8801.
2006-01-25 02:21:34 +00:00
Tim Woodall
51ec050647 port of revised flow control from openib
This commit was SVN r8799.
2006-01-24 23:44:30 +00:00
Tim Woodall
e861158fcd - removed debug code
- removed extraneous memset

This commit was SVN r8798.
2006-01-24 23:38:41 +00:00
Galen Shipman
1e0ea9dd6d Major fixes for the RDMA registration cache (leave_pinned).
This commit fixes issues with HPL runs on node counts > 4. 

This commit was SVN r8793.
2006-01-23 22:51:50 +00:00
Brian Barrett
8b2a285f8f * update reference implementation support to match latest release
This commit was SVN r8783.
2006-01-22 04:56:18 +00:00
Galen Shipman
d657052510 misc cleanup..
This commit was SVN r8731.
2006-01-18 16:20:50 +00:00
Galen Shipman
84a09e4f4e use #if not #ifdef..
This commit was SVN r8720.
2006-01-17 21:07:34 +00:00
Galen Shipman
0c81c0a6ce use ibv_get_device_list if present
(submitted from roland)

This commit was SVN r8712.
2006-01-17 16:23:35 +00:00
Andrew Friedley
5ccab7bcda Checkpoint:
- Move mca_btl_udapl_error/mca_btl_module_init to mca_btl_udapl.c and rename it
 - White space cleanups
 - Free the uDAPL evd and ia handles in mca_btl_udapl_finalize

This commit was SVN r8705.
2006-01-16 21:54:50 +00:00
Andrew Friedley
a4abe3bdbe Checkpoint:
- Borrow configure.m4 from the mvapi btl.  One of the uDAPL headers emits a
   warning when -pedantic is enabled, so strip it out.
 - Change function check in ompi_check_dapl.m4 from dat_ia_open to
   dat_registry_list_providers.. dat_ia_open wasn't working right
 - Make the references to prepare_dst, put, and get NULL for now
 - Add opal_output() calls in all the udapl interface functions for debugging
 - Add evd_qlen component parameter to control event dispatcher queue length
 - First stab at component_init and module_init
 - Misc cleanups - whitespace, dead code removal
 - Update copyrights to 2006

This commit was SVN r8701.
2006-01-16 03:01:12 +00:00
George Bosilca
d4699037f7 Protect an assert if the endpoint cache is not activated.
This commit was SVN r8695.
2006-01-14 21:10:09 +00:00
George Bosilca
3317bf81ad A better implementation for the TCP endpoint cache + few comments.
This commit was SVN r8692.
2006-01-14 20:21:44 +00:00
Tim Woodall
a584c60dbe re-worked flow control logic to take into account the return
of credits from the peer prior to local completion, so that
we don't overrun the number of send wqes available.

This commit was SVN r8683.
2006-01-12 23:42:44 +00:00
Andrew Friedley
c0bad339af - Use the GM BTL as a template instead, per Tim's suggestion
- Begin adding uDAPL-specific stuff
- Added config/ompi_check_udapl.m4 - hopefully I did this right

This commit was SVN r8681.
2006-01-12 04:05:02 +00:00
Tim Woodall
63d0438991 merge in changes from release branch
This commit was SVN r8637.
2006-01-04 16:34:45 +00:00
George Bosilca
479d510eaf Use the common SM component to unmap the shared memory file.
This commit was SVN r8623.
2005-12-31 15:07:48 +00:00
George Bosilca
228f966798 A little trick to force qsort to order the sm modules in the order we expect.
On some systems (like windows) qsort can modify the order of modules when the
comparaison function return 0. As we expect to have the
mca_btl_sm_add_procs_same_base_addr called before mca_btl_sm_add_procs we have
force the qsort to return the SM modules in the correct order. Giving to the
same_addr module a slightly higher priority solve this problem.

This commit was SVN r8620.
2005-12-31 14:59:53 +00:00
Tim Woodall
4ff5316b2d correct copy/paste error
This commit was SVN r8601.
2005-12-22 18:07:46 +00:00
Tim Woodall
5d91c492d6 improve diagnostic error messages
This commit was SVN r8600.
2005-12-22 18:01:47 +00:00
Tim Woodall
e9498f7a75 improve error reporting when registrations fail
This commit was SVN r8598.
2005-12-22 16:05:28 +00:00
Andrew Friedley
f402854a96 Initial commit of uDAPL BTL component.
- Copied the template BTL and renamed everything
 - Compiles and shows up correctly in ompi_info, not tested past that
 - Should be ignored for everyone but me

This commit was SVN r8544.
2005-12-19 16:37:05 +00:00
Brian Barrett
a5af07cd6b fixes suggested by Ralf for supporting both Libtool 1 and 2 in Open MPI...
This commit was SVN r8538.
2005-12-19 03:10:23 +00:00
George Bosilca
1b667067d6 I need to know the number of iovec attached to the fragment.
This commit was SVN r8447.
2005-12-10 23:28:16 +00:00
George Bosilca
e5158142b9 The lb should be extracted from the datatype not from the convertor.
This commit was SVN r8446.
2005-12-10 23:27:20 +00:00
George Bosilca
01b0db91ae Get the lower-bound from the data not from the convertor.
This commit was SVN r8444.
2005-12-10 22:38:25 +00:00
George Bosilca
7baae4f394 Protect the headers and remove the unused ones.
This commit was SVN r8439.
2005-12-10 22:04:28 +00:00
Tim Woodall
1929a97d2f corrections for MPI_BOTTOM
This commit was SVN r8429.
2005-12-09 23:27:55 +00:00
George Bosilca
8888bfb063 And the thread-safe version. The lock/unlock macros are supposed to be
empty for non threaded builds, but somehow just by moving the code a
little bit around and removing 2 call to lock/unlock the latency for TCP
went down by 2 micro-seconds ...

This commit was SVN r8426.
2005-12-09 05:16:50 +00:00
Jeff Squyres
6fbd321442 Fix a bunch of install locations for header files
This commit was SVN r8406.
2005-12-08 00:54:44 +00:00
George Bosilca
5851b55647 Improve the latency for small and medium messages. The idea is to decrease the
number of recv system call by caching the data. Each endpoint has a buffer
(the size is an MCA parameter) that can be use as a cache. Before each receive
operation this buffer is added at the end of the iovec list. All data that are
not expected by the fragment will go in this cache. If the cache contain data
all subsequent receive will just memcpy the data into the BTL buffers.

The only drawback is that we will spin around the receive_handle until all the
cached data is readed by the PML layer. This limitation come from the fact that
the event library is unable to call us if there is no events on the socket.
Therefore we are unable to keep the data in the cache until the next loop
into the progress engine.

This commit was SVN r8398.
2005-12-07 00:12:59 +00:00
Brian Barrett
38391e3406 disable shared receive queue support at compile time if the mvapi implementation
does not support shared receive queues (such as the one shipped by SilverStorm / 
Infinicon for OS X).  Reviewed by Galen.

This commit was SVN r8389.
2005-12-06 15:46:30 +00:00
Tim Woodall
3f396aeae9 fix send to self for large messages
This commit was SVN r8379.
2005-12-05 23:36:33 +00:00
Tim Woodall
8c443832ae add a parameter to limit max number of btls (HCA ports used)
This commit was SVN r8342.
2005-11-30 22:18:21 +00:00
Tim Woodall
5db38b38f5 corrections for latency issue
- don't do additional select until non-blocking read fails 
- don't do an additional read for 0 byte message

This commit was SVN r8312.
2005-11-29 17:33:01 +00:00
George Bosilca
9d990af4a5 Remove 2 useless functions. They have been replaced by the mca_base version few commits ago.
This commit was SVN r8287.
2005-11-28 20:14:23 +00:00
Galen Shipman
55a9fbefd8 fix misc compiler warnings..
This commit was SVN r8263.
2005-11-27 22:53:30 +00:00
George Bosilca
bfa4a40983 Cast it the a well known type to remove a warning.
This commit was SVN r8261.
2005-11-26 21:17:15 +00:00
George Bosilca
b9a739e2b6 Remove 2 useless assignments (they are done at the end before the return).
This commit was SVN r8260.
2005-11-26 21:16:30 +00:00
George Bosilca
00c10a6372 Make the MX BTL startup scalable. When the number of processes involved in the MPI application
increase the previous connection code was broken. It can take as much as 60 seconds to connect
64 processes. Now we do not create the connections when we add the procs but only when we send
them the first message. Now it take only 1.6 seconds to setup a 64 procs MPI job over MX (doing a 2 steps barrier in order to insure that we create all the connections).

This commit was SVN r8252.
2005-11-23 23:48:56 +00:00
George Bosilca
7c73095440 Update the correct sended size.
This commit was SVN r8237.
2005-11-22 21:51:04 +00:00
Brian Barrett
20cea60b82 * fix "make distclean" error in PML
* turns out (duh!) that there was a reason that the <projectdir>dir
  variable was set in the AM conditional.  If not, stupid directories
  are created and not needed...  duh.

This commit was SVN r8205.
2005-11-20 07:41:09 +00:00
Brian Barrett
8faa1884f0 * The last of the build system optimizations. Combine the component and
component/base Makefile.am files, reducing the time configure spends
  stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed

This commit was SVN r8200.
2005-11-20 01:03:01 +00:00
Galen Shipman
eb3ccdb4d8 make compiler happy on false postive warning..
This commit was SVN r8192.
2005-11-18 18:48:11 +00:00
Galen Shipman
4fce90a37b one last warning fixed on 32 bit platforms.
This commit was SVN r8191.
2005-11-18 17:27:09 +00:00
Galen Shipman
635e7a682b fix for 32bit compile warnings.
This commit was SVN r8190.
2005-11-18 17:08:51 +00:00
George Bosilca
bba42f5e49 We are allowed to call mx_set_error_handler before any other MX functions, even before mx_init.
With the errors set to return mx_init will not force the application to exit if there is no MX kernel
module loaded.

This commit was SVN r8184.
2005-11-17 18:47:27 +00:00
Galen Shipman
dde38d4119 reset sg_entry->addr to point at header when sending control messages.
cast to uint64_t (the correct datatype per verbs.h) instead of uintptr_t. 

This commit was SVN r8175.
2005-11-17 05:45:33 +00:00
Tim Woodall
58dd6c2493 - merge from release branch
This commit was SVN r8174.
2005-11-17 05:32:30 +00:00
Tim Woodall
01b94862df merge from release branch
This commit was SVN r8168.
2005-11-16 17:12:44 +00:00
Tim Woodall
142b7cc682 merge from release branch
This commit was SVN r8167.
2005-11-16 17:10:49 +00:00
George Bosilca
7ad6b2b70e Add a MCA params to allow/disable the MX shared memory capabilities. Right now this param
is labeled as internal so the users will not see it but it is not read-only so we can still
play with it (that's for our internal tests). This is supposed to dissapear later after the
next (or next next) release of the MX library, but we need it now as a quick fix before the
release.

This commit was SVN r8161.
2005-11-15 20:54:45 +00:00
Tim Woodall
54b6acb2b4 merge from release branch
This commit was SVN r8149.
2005-11-13 23:31:20 +00:00
Jeff Squyres
e6a3a406e2 Remove debugging printf
This commit was SVN r8139.
2005-11-13 14:57:44 +00:00
Jeff Squyres
97b97f84b8 Next checkpoint in the sm btl fixes:
- Add big comment about a general overview of what the sm btl is doing
- random small code cleanups
- fix instances of mca_btl_sm[0] to mca_btl_sm[1] where relevant
- remove a lot of unused, confusing, and incorrect interface functions
  from ompi_fifo.h and ompi_circular_buffer.h.  These functions, if
  they were used, would not work properly with the scheme that the sm
  btl uses with the fifos (i.e., receiver makes right -- if necessary)
- add some missing offset computations in the fifo and circular buffers
- change the types of offsets to be ssize_t, not size_t
- remove an offset parameter from a function that didn't need it

This commit was SVN r8135.
2005-11-12 22:32:09 +00:00
Jeff Squyres
6444887373 - Add copyright headers to btl_sm_frag.h
- Ensure to convert base_shared_mem_flags to be a relative offset in
  the global storage, and then to convert that back to an absolute
  virtual address before we try to use it
- Don't double increment n_local_procs when calculating the peer rank
  during bootstrapping of the different base address case

Something else is still wrong; if mmap() returns a different base
address, things don't work (i.e., segv or hang forever when you try to
send a message).  More specifically, the bootstrapping now seems to
correctly handle the case when mmap() base addresses are different,
but the message passing does *not* -- it always assumes that the
mmap() base addresses are the same.

Still working on the fix for that -- want to checkpoint what has been
done so far to facilitate working on different machines...

This commit was SVN r8134.
2005-11-12 14:04:46 +00:00
Galen Shipman
5a4b1ebdd4 in mca_btl_openib_endpoint_post_send: set opcode on work request before potentially inserting it on pending list..
This commit was SVN r8127.
2005-11-12 02:11:14 +00:00
George Bosilca
e297b58fbd Add more MCA arguments.
Make some of them system (not seems by the user) and read-only.
Small cleanups.

This commit was SVN r8126.
2005-11-12 00:31:59 +00:00
Galen Shipman
5cf2d8d40c default to first available IP address if no matching subnets found..
This commit was SVN r8125.
2005-11-12 00:31:34 +00:00
Tim Woodall
654ba6d262 srq cleanup
This commit was SVN r8106.
2005-11-10 23:29:54 +00:00
Tim Woodall
2013104d1a SRQ cleanup
This commit was SVN r8104.
2005-11-10 20:51:56 +00:00
Tim Woodall
4a06e8463c port of flow control from mvapi
This commit was SVN r8102.
2005-11-10 20:15:02 +00:00
Tim Woodall
985c2ca943 cleanup
This commit was SVN r8093.
2005-11-10 15:40:27 +00:00
George Bosilca
8119c970db Improve the connection algorithm for MX. There are 2 problems here:
- first we setup the connections in the begining with all the peers
- MX does not handle well the case where several peers make connections to the same
  destination simultaneously.

So I change the order in which we connect. First we compute our rank in the array,
then in a round-robin fashion we setup connection starting with our left neighboard.

This commit was SVN r8075.
2005-11-10 01:15:49 +00:00
George Bosilca
dc1ad885d1 Move the output message outside the loop. We print an error message only once when we fail to
connect to a peer. Bonus, we print some additional informations like its MAC Address or name
if it's on our tables.

This commit was SVN r8074.
2005-11-10 01:13:18 +00:00
Tim Woodall
62fd74140b decrease socket buffers sizes to same as ptl code
This commit was SVN r8072.
2005-11-10 00:40:55 +00:00
Tim Woodall
b5ed723ea4 - check for null return
- disable debug

This commit was SVN r8070.
2005-11-10 00:02:18 +00:00
Galen Shipman
3079fc2da1 use correct lock for threaded build..
This commit was SVN r8055.
2005-11-09 16:09:05 +00:00
Tim Woodall
78522ed454 send credits on correct qp
This commit was SVN r8050.
2005-11-08 22:59:44 +00:00
Tim Woodall
b4ca28da4b removed debug
This commit was SVN r8046.
2005-11-08 21:41:02 +00:00
Tim Woodall
2d9c509add flow control
This commit was SVN r8039.
2005-11-08 16:50:07 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Tim Woodall
31eb35c3f1 correct rnr parameter - need to review this code and pass correct data type
This commit was SVN r7936.
2005-10-31 17:18:39 +00:00
George Bosilca
b0def3f6bf MX has 2 limitations regarding the iovecs. First they do not support iovec witha total size
larger than 32K for inter-nodes transfert ... and then they do not support iovecs larger than
16K for inter-node transfert. Therefore we have to set the size of our first fragment to
16K to match both cases.

This commit was SVN r7926.
2005-10-28 20:37:43 +00:00
Galen Shipman
4a15761732 add support for srq limit reached async event, even though it doesn't appear
to  be supported by mellanox vapi.. perhaps this will be supported in the near 
future, for now it doesn't hurt to have it in the trunk


Also cleanup the receive descriptor posting macro's.. 

This commit was SVN r7903.
2005-10-27 22:47:19 +00:00
Tim Woodall
3bd5b81dfa Submitted: Gleb Natapov
This commit was SVN r7899.
2005-10-27 17:48:40 +00:00
Tim Woodall
13409ec53b correction for hang, check for additional fragments before callback,
which may queue a new fragment

This commit was SVN r7889.
2005-10-27 01:39:39 +00:00
Galen Shipman
cb84a57c57 add endpoint and srq flow-control..
Note, we are failing the ring tests in the intel p2p test suite, but we seem
to fail the same tests under the current trunk.. will look into this further. 

This commit was SVN r7823.
2005-10-21 02:21:45 +00:00
Galen Shipman
0d1d231169 convert to new mca params, adding description strings.
changed mca param rr_buf_min/max to rd_min/max 
Add bandwidth param to openib 

This commit was SVN r7815.
2005-10-20 02:55:21 +00:00
Brian Barrett
de5e501519 Rather than hard spinning waiting for something to happen when doing shared
memory initialization, call opal_progress() to push any pending events
around and possibly yield the processor if nothing entertaining is happening.

This should probably go to the 1.0 branch.

This commit was SVN r7808.
2005-10-19 00:56:14 +00:00
Galen Shipman
4d2d39b0a6 intial checking of SRQ flow control support for mvapi
This commit was SVN r7796.
2005-10-18 14:55:11 +00:00
Galen Shipman
3efecaaeda convert openib btl to use new mca_param registration.. Also, change rr_buf_min
and rr_buf_max to rd_min and rd_max 

This commit was SVN r7786.
2005-10-17 20:00:34 +00:00
Tim Woodall
c944988b9e merge in changes from release branch - acquire/release send token for put/get
This commit was SVN r7784.
2005-10-17 18:59:28 +00:00
Brian Barrett
1302cb4072 The next in a long line of crazed build system changes from Brian. This was
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install).  Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am).  Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed.  This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)

This commit was SVN r7777.
2005-10-17 00:21:10 +00:00
Tim Woodall
d859855dea merge in changes from 1.0
This commit was SVN r7728.
2005-10-12 15:54:35 +00:00
Galen Shipman
23cbac25c8 lower default free list sizes..
This commit was SVN r7676.
2005-10-09 18:15:12 +00:00
Galen Shipman
fb19cc4177 compiler warning fixes..
This commit was SVN r7661.
2005-10-07 17:38:34 +00:00
George Bosilca
1fe18814da Decrease the default length for the first fragment.
This commit was SVN r7643.
2005-10-06 00:05:01 +00:00
George Bosilca
0f04132b13 mx_connect in the MX documentation is supposed to take a timeout in seconds. However, in real life it seems that the timeout should be in micro-second.
This commit was SVN r7642.
2005-10-06 00:04:27 +00:00
Tim Woodall
3b4a134a24 - removed unused define
- correct free to release registration rather than retain it

This commit was SVN r7611.
2005-10-04 14:33:26 +00:00
George Bosilca
6b3d02b514 Warning cleanups. On some OSes the iov_base member of the iovec structure is defined as an void * when
on others as an char*. Thus the right side of all assignment should be explicitly casted to an void* in
order to avoid any casting complaints from the compilers.

This commit was SVN r7607.
2005-10-04 12:36:07 +00:00
George Bosilca
3453a6c0e9 Remove some compiler warnings about unused variables
Correctly define the 64 bits constants.
Some minor cleanups.

This commit was SVN r7606.
2005-10-04 12:29:51 +00:00
George Bosilca
492c0e59dc Correct the casting type and remove some useless output (already commented out).
This commit was SVN r7605.
2005-10-04 12:28:47 +00:00
Galen Shipman
eefe0fd04a fix threaded compile
fix misc warnings 
cleanup posting of receive descriptors 
comment why we retain before deregister in rcache_rb_mru.c 

This commit was SVN r7595.
2005-10-03 16:35:12 +00:00
Galen Shipman
f46548e691 Add SRQ support to OpenIB btl, removed old mca param - not used..
This commit was SVN r7585.
2005-10-02 18:58:57 +00:00
Galen Shipman
67d38b7896 Add multi-nic support to openib
Fix connection establishment race in openib 
Other misc 

This commit was SVN r7570.
2005-09-30 22:58:09 +00:00
Brian Barrett
db872a0fbb * check that return from ibv_get_devices isn't NULL before calling dlist_start().
On thor, if IB is down, we get NULL back from ibv_get_devices(), which then
  caused segfaults in dlist_start().
* Pretty-print error message if no HCAs found

This commit was SVN r7557.
2005-09-30 14:58:59 +00:00
Jeff Squyres
fcef1774d5 Per advice from Ralf W., change the pkgdata declarations in
Makefile.am's to be a *slightly* more correct (and, more importantly,
less error-prone) construct.

This commit was SVN r7554.
2005-09-30 13:32:39 +00:00
Jeff Squyres
80b7deb4d7 Add in EXTRA_DIST to get helpfile in tarballs
This commit was SVN r7553.
2005-09-30 10:25:04 +00:00
Brian Barrett
7b20370306 * pretty-print an error message if a btl component loads but can't find
any NICs to use
* Make mvapi, gm, and mx components all publish information, even if there
  are no NICs available so that modex_recv doesn't hang.  If there are no
  NICs available, don't set the reachable bit, but don't do anything
  to fail.  This unfortunately doesn't cover the hangs that will result if
  different procs load different sets of components, but it's a start

This commit was SVN r7550.
2005-09-30 04:39:44 +00:00
Brian Barrett
a77c908496 * the last of the tuning params for portals
This commit was SVN r7548.
2005-09-30 04:05:31 +00:00
Galen Shipman
8239e635b9 fix misc warnings, cleanup macro..
This commit was SVN r7547.
2005-09-30 03:13:51 +00:00
Brian Barrett
997644af31 * There are now two forms of ibv_create_cq, one with 3 params and one with 5.
Try to detect which form this version of Open IB uses, defaulting to the 5
  version if we can't figure it out (the new version has 5 params)
* Only add -lcm if it exists on the system - some versions of Open IB
  apparently don't need it.

This commit was SVN r7542.
2005-09-29 13:35:57 +00:00
Galen Shipman
26a74d42fa release, not retain on gm_free
This commit was SVN r7535.
2005-09-28 20:18:52 +00:00
Galen Shipman
af04b3e1ab fix warnings..
This commit was SVN r7515.
2005-09-27 14:23:51 +00:00
Galen Shipman
3c97b3f722 Modified the registration to include a base_align and bound_align for
searching the tree. Modified the memory callback to search the tree at each
page boundary for registrations. This is necessary as an application may
malloc memory and send out of any portion of that memory, even discontiguous
regions. 

This commit was SVN r7510.
2005-09-27 02:01:21 +00:00
Brian Barrett
d9e80d8f2a * increase size of event queue for receives - it was too small to be useful
on a reasonably sized machine
* if no mpool exists, don't try to malloc out an array of 0 bytes

This commit was SVN r7507.
2005-09-25 17:04:03 +00:00
Galen Shipman
9fe5844071 decrement ref count on removal of registration from mru and tree.
add misc asserts to check for proper reference counting. 

ugly hack 1 -- use mallopt to never release memory ala sbrk - this is
commented out in mca_btl_mvapi_component_init

ugly hack 2 -- test registrations comming out of the tree via rcache_find, for
an unknown reason the tree is returning registrations where the address is not
within the base or bound of the registration. If this happens, we return
NULL. 

comment out code to enable mem hooks if leave_pinned is set, note we can do
this via an mca param and will default it to leave_pinned with mem_hooks when
we iron out these issues. 

I am adding a unit test for the rcache. Note that we have a unit test for the
rb tree but the compare function is significantly different than that used for
registrations. After we have tracked down the issues with rcache_rb we will
remove the above hacks. 

This commit was SVN r7499.
2005-09-24 00:24:49 +00:00
Brian Barrett
50dc5499b4 * fix some remaining --with-btl-portals configure issues
This commit was SVN r7498.
2005-09-24 00:11:40 +00:00
Brian Barrett
0d68728b94 * add some more debugging output for send fragment issue to figure out why
Red Storm is complaining about invalid memory pointer (need to go back
  to Linux and look at this with valgrind)
* Turn off send in place for now, so I can run the tests on RS and see if
  everything else is ok

This commit was SVN r7497.
2005-09-23 19:30:54 +00:00
Brian Barrett
07b0b8c943 * add some useful debugging output
* fix dumb bug in btl_portals_get where I using the dest descriptor key instead
  of the source descriptor key for the match bits, resulting in a PtlGet() with
  the wrong match bits

This commit was SVN r7496.
2005-09-23 15:30:18 +00:00
Tim Woodall
147716c249 added hostname to error output
This commit was SVN r7486.
2005-09-22 16:41:34 +00:00
Andrew Friedley
555ae37255 Add lib{opal,orte,mpi}.la to appropriate LIBADD's, some whitespace cleanup as well.
This commit was SVN r7477.
2005-09-22 12:28:54 +00:00
Tim Woodall
a74ca0062a reductions to initial memory footprint
This commit was SVN r7455.
2005-09-21 19:10:56 +00:00
Galen Shipman
4296e723c9 default free_lists to smaller size..
This commit was SVN r7454.
2005-09-21 18:55:07 +00:00
Galen Shipman
96ab5a6bd3 we can be in WAITING_ACK state without a race if the OOB ack is "slower" than
the scheduling of queued IB send operations. 

This commit was SVN r7452.
2005-09-21 16:47:08 +00:00
Tim Woodall
0ee34051f8 debug asserts
This commit was SVN r7449.
2005-09-21 15:30:17 +00:00
Tim Woodall
1b73d3856e possible race condition - set endpoint state before sending connect ack
This commit was SVN r7448.
2005-09-20 21:03:55 +00:00
Brian Barrett
d81726833e * Add memory barriers for shared memory. Rich and I think we got them
all and the Intel tests pass slightly oversubscribed.

This commit was SVN r7431.
2005-09-19 16:28:25 +00:00
Tim Woodall
aeb5bc3f57 still need to cleanup/revise the template for mpool changes
This commit was SVN r7425.
2005-09-19 14:34:24 +00:00
George Bosilca
b5cb27c006 The self should use self named files.
This commit was SVN r7421.
2005-09-18 12:37:15 +00:00
Galen Shipman
808b2c1c53 threaded build fix for btl_gm..
This commit was SVN r7409.
2005-09-16 17:18:15 +00:00
Tim Woodall
31d392af95 correct name
This commit was SVN r7376.
2005-09-14 22:35:58 +00:00
Tim Woodall
d190e6a315 handle losing a connection
This commit was SVN r7373.
2005-09-14 21:27:30 +00:00
Tim Woodall
c25fb5dab0 - fixed issue w/ btl send-in-place option that was affecting tcp
- reduced size of match header by an additional 4 bytes to 16 bytes
- corrections for buffered send (work in progress)

This commit was SVN r7371.
2005-09-14 17:08:08 +00:00
Brian Barrett
e98415eb7b * make tree compile on OS X
This commit was SVN r7370.
2005-09-14 15:52:42 +00:00
Galen Shipman
f0b1ea52bc if all else fails in prepare_src,, pack
init the rdma_pending list in ob1

This commit was SVN r7366.
2005-09-14 04:41:33 +00:00
Brian Barrett
1290b8eed2 * some debugging to figure out why get isn't working on RS
This commit was SVN r7354.
2005-09-13 20:52:56 +00:00
George Bosilca
ad0c0cdc03 Make the GM btl compile again. There were just some typos.
This commit was SVN r7352.
2005-09-13 20:19:21 +00:00