1
1
Граф коммитов

421 Коммитов

Автор SHA1 Сообщение Дата
Andrew Friedley
365c81d6e9 Fix a few issues reported by Terry Dontje:
1. ompi/mca/btl/udapl/btl_udapl_proc.c should be including
btl_udapl_endpoint.h for mca_btl_udapl_proc_insert function.

2. btl_udapl_endpoint.c it looks like you are using
&endpoint->endpoint_lock when you should use &ep->endpoint_lock in a
OPAL_THREAD_LOCK call.

3. btl_udapl_frag.h has a couple opal_list_item_t's that should be
ompi_free_list_item_t in the _FRAG_ALLOC_{EAGER,MAX} macros.

This commit was SVN r10442.
2006-06-20 17:13:44 +00:00
George Bosilca
044868df45 Set the destination descriptor before calling the recv registration. Once
this call is completed, we have to remove it in order to be able to cleanup
correctly the fragments.

This commit was SVN r10428.
2006-06-20 14:11:09 +00:00
George Bosilca
1b18b7d934 Change the parameter registration of this BTL to the new calls (new is relative
here). Change the self BTL to use RDMA protocol.

This commit was SVN r10427.
2006-06-20 14:09:58 +00:00
Jeff Squyres
1d27ca5d0a Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10424.
2006-06-20 11:32:46 +00:00
Jeff Squyres
600bf4295a Update the help message to be slightly more concise and clear
This commit was SVN r10422.
2006-06-20 11:23:38 +00:00
Brian Barrett
3d027e57a8 * fix for ticket #141. If we are going to shortcut out of polling the
send/receive queues if there is something available in the short message
  rdma queues, then we have to poll *ALL* the rdma queues before exiting,
  or we aren't fair about frag reception and fall into degenerate matching
  cases.

This commit was SVN r10410.
2006-06-17 21:32:25 +00:00
Brian Barrett
05046e8ad2 if MX isn't running on some hosts, but is on others, we were blocking in the modex receive
waiting for the non-running procs to publish their contact information.  Publish their
(lack of) contact information.

This commit was SVN r10355.
2006-06-14 19:07:38 +00:00
George Bosilca
aca71521db Complete the move of the mpool registration from opal_list_item_t to the
ompi_free_list_item_t.

This commit was SVN r10354.
2006-06-14 17:43:50 +00:00
Brian Barrett
d367dc5d56 * Fix for bug #115 -- we need to decrement the use count on a pinned buffer
so that memory is actually deregistered.  Reviewed by Galen.

This commit was SVN r10349.
2006-06-14 13:38:24 +00:00
Andrew Friedley
c68c6ac122 A number of fixes and the usual cleanup..
- Added some basic flow control to limit number of posted sends.
- Merged endpoint send/recv lock into single endpoint lock.
- Set the LMR triplet length in the send path, not at allocation time.
  This has to be done because upper layers might send less than the
  amount allocated.
- Alter the tie-breaker if statement protecting the second call
  to dat_ep_connect().  The logic was reversed compared to the tie-
  breaker for the first dat_ep_connect(), making it possible for
  3 or more processes to form a deadlock loop.
- Some asserts were added for debugging purposes.. leaving them
  in place for now.

This commit was SVN r10317.
2006-06-12 22:42:01 +00:00
Galen Shipman
218a438509 finished the ompi_free_list_t class nightmare..
This commit was SVN r10314.
2006-06-12 22:09:03 +00:00
Brian Barrett
d5acb4e3cc * silence dumb (and mostly useless) warning during cleanup
This commit was SVN r10280.
2006-06-09 21:09:53 +00:00
Jeff Squyres
a4030ad2d9 Improve the tremendously unhelpful MCA help message for the
btl_openib_ib_mtu and btl_mvapi_ib_mtu MCA params by showing the valid
values what what they represent (got a question about this from Cisco
testing engineers).

This commit was SVN r10277.
2006-06-09 18:02:45 +00:00
Andrew Friedley
75176370ae blah. somehow missed adding .ompi_ignore/.ompi_unignore.
This commit was SVN r10272.
2006-06-09 00:15:36 +00:00
Andrew Friedley
cca1616368 Finally committing the UD BTL.
UD is the Unreliable Datagram transport for Infiniband, specifically OpenIB.  This BTL is derived from the existing openib BTL, which is RC (Reliable Connection) based.

Still a work in progress, as there is a lot of work left to do.  Specifically, performance, scalability, and flow control need to be addressed.

Currently I'm playing around with different methods for handling receive buffers, as well as profiling to figure out where the time is going.

This commit was SVN r10271.
2006-06-09 00:13:45 +00:00
Galen Shipman
90799f82cd copy paste error..
This commit was SVN r10220.
2006-06-06 02:38:29 +00:00
Galen Shipman
cc54b07aa0 add better error messages for vapi retry exceeded errors.
This commit was SVN r10219.
2006-06-06 02:04:56 +00:00
Galen Shipman
9e6e7575b9 doh... add the file..
This commit was SVN r10210.
2006-06-05 21:24:42 +00:00
Galen Shipman
f05dee0435 add help file to explain why things went south..
This commit was SVN r10209.
2006-06-05 21:23:45 +00:00
Galen Shipman
74c97fb784 cleanup error reporting.. use ompi_proc_t->proc_name if available this gives
us source/dest hostnames for communication errors.. 

This goes to 1.1 branch (reviewed by Brian).. 

This commit was SVN r10200.
2006-06-05 20:02:41 +00:00
Galen Shipman
0344ae4ac5 Fix to allow eager limit and max send size to be any size (within resource limitations). Instead of storing the ompi_free_list_t * in the fragment, we use the frag type enum, this tells us where the frag came from and where it should return.. This could also be done in mvapi but is not a high priority moving forward..
Review by Brian, needs to hit the trunk + 1.1 release.. 

This commit was SVN r10157.
2006-06-01 02:32:18 +00:00
Brian Barrett
5163f2b296 Fix for bug #36. The MX, MVAPI, and OpenIB components don't have
support for progress threads, so we shouldn't build them or try to use
them when support for progress threads has been requested.  The TCP, GM,
SELF, and SM BTLs should have progress thread support, so they aren't
disabled.  The Portals BTL isn't compiled on platforms with threads,
so it doens't need to be updated.

This commit was SVN r10156.
2006-06-01 01:30:16 +00:00
Galen Shipman
c79efc9efb track which list a fragment came from, allows returning based on list, not
on size. 

This commit was SVN r10142.
2006-05-31 14:24:32 +00:00
Brian Barrett
c723d196c5 Rather than using fragment size to determine fragment type, use an enum.
Do this rather than the my_list pointer because we need to do some
things that are somewhat special because we pre-pin eager fragments but
not send fragments.  Also makes a couple ideas I have slightly easier to
play around with.

This commit was SVN r10127.
2006-05-31 03:34:32 +00:00
Galen Shipman
2667c52a5d Track fragments by list, not by size..
-- reviewed by Brian, needs to hit all the branches.. 

This commit was SVN r10078.
2006-05-25 18:07:26 +00:00
Galen Shipman
38a0561d9b Allow maximum send size to be less than the eager limit.
Instead of figuring out which free list the fragment belongs to based on size
we simply store a pointer to the list which it belongs in the fragment.

This was reviewed by Brian and should hit all the branches.

This commit was SVN r10072.
2006-05-25 16:57:14 +00:00
Andrew Friedley
8a3d0862ca I can commit! *happy dance*
Trying to remember what I did here.. eager/max messages should work now, no RDMA yet.  A number of other fixes and cleanups.

I do know of two problems:
 Bad stuff happens when flooded with send frags too quickly - the BTL doesn't handle flow control.
 Certain IBM tests turn up a length assertion in the datatype engine - needs more investigation.

This commit was SVN r10070.
2006-05-25 15:47:59 +00:00
Gleb Natapov
f590d8a190 fix eager RDMA on PPC64.
This commit was SVN r10059.
2006-05-25 11:05:12 +00:00
Jeff Squyres
dd44d36be0 Fix for ticket #25. Ensure that in the threaded case where we have
This commit was SVN r10043.
2006-05-24 16:15:07 +00:00
George Bosilca
085cac552f Don't let TCP to create local connections, we have the self BTL for this purpose.
This commit was SVN r10018.
2006-05-23 03:06:32 +00:00
Jeff Squyres
7b59847765 Ensure that endpoint->endpoint_addr is not NULL before trying to
derefence through it.  It is legal for endpoint_addr to be NULL in the
destructor because if btl_tcp_add_procs() -> btl_tcp_proc_insert()
returns UNREACH, then endpoint_addr will be NULL and we'll OBJ_RELEASE
it.

This commit was SVN r9940.
2006-05-16 19:01:08 +00:00
Jeff Squyres
e24377a89c Back out a pair of commits from George from last week because they
apparently don't work properly: r9869, r9868 (sm btl alignment issues)

This commit was SVN r9936.

The following SVN revision numbers were found above:
  r9868 --> open-mpi/ompi@9b985c3216
  r9869 --> open-mpi/ompi@adedf511fb
2006-05-16 16:48:43 +00:00
Brian Barrett
dcc6b47fa2 * put rdma operations in the send event queue instead of receive because it's
easier to do event accounting that way
* greatly increase receive event and buffer sizes.  We're still about half
  of what Cray defaults to, so I don't feel bad about the increases
* Implement a pre-pinning optimization for eager fragments - will be
  pinned on first use and left pinned for the life of the fragment
* Since we can't have two receive frag callbacks fired at the same time,
  don't have receive free list - just keep one receive fragment in the
  module.  Saves a big free list and all that interaction.

This commit was SVN r9915.
2006-05-14 04:23:26 +00:00
Andrew Friedley
4c3aa05c83 uDAPL has an expects memory for enumerating interface adapters in a really
weird way - fix up to do things 'properly'.

Add my sandia username to the unignore.

This commit was SVN r9879.
2006-05-10 19:50:30 +00:00
George Bosilca
adedf511fb Remove the printf that I unfortunately commit.
This commit was SVN r9869.
2006-05-10 00:02:54 +00:00
George Bosilca
9b985c3216 Force the useful data to be aligned on special boundary. It is 32 bits
right now. Some testing on large NUMA machines should be done in order
to make sure that we need to export this variable out to the MCA layer.

This commit was SVN r9868.
2006-05-09 21:46:10 +00:00
George Bosilca
a386fccccc Increase the default limits for the SM BTL. These new
values allow better performances on all the clusters
I was able to test.

This commit was SVN r9867.
2006-05-09 21:44:24 +00:00
Gleb Natapov
0c34d5c9e6 fix endpoint matching in on demand connection establishment. This fix is in mvapi btl already.
This commit was SVN r9855.
2006-05-09 12:12:52 +00:00
Tim Woodall
350d5b1713 change hardcoded values into mca params
This commit was SVN r9815.
2006-05-04 15:20:18 +00:00
George Bosilca
bdecdc8d41 Cleanup the MX BTL. Remove all mpool related code as there will never be a MX mpool.
This commit was SVN r9808.
2006-05-04 06:55:45 +00:00
Tim Woodall
4fd2a71b6c removed debug code - free list implementation has changed
This commit was SVN r9750.
2006-04-27 15:34:12 +00:00
Brian Barrett
9cab1bb54a * re-enable the eager fragment throttling, this time with the proper threshold value for when
the memory descriptor is closing itself, so that it actually works properly ;).  I think I
  was just getting lucky and not sending enough short messages with the reference impl.

This commit was SVN r9748.
2006-04-27 14:13:52 +00:00
Brian Barrett
66d1d3b83f * add a quick debugging sanity check
* It appears that Cray's SeaStar has some horrible performance for iovecs - IN_pLACE
  was actually slower than copying into eager frags.  Ugh.  And we don't even pre-pin
  eager frags yet!

This commit was SVN r9738.
2006-04-27 02:55:31 +00:00
George Bosilca
3e968d4f63 There is no length on the free list.
This commit was SVN r9704.
2006-04-24 23:13:51 +00:00
Brian Barrett
9a65ddd788 * back out r9005, which for some reason works fine on the reference implementation
but causes resource exhaustion on the Red Storm implementation.  Sigh...

This commit was SVN r9686.

The following SVN revision numbers were found above:
  r9005 --> open-mpi/ompi@20d06e889e
2006-04-22 20:12:33 +00:00
Andrew Friedley
345551cb36 Checkpoint before starting work on max-sized frags (maybe user too?).
- Some initial work on prepare_src
- Move some fragment initialization around
- Fix a union casting issue on picky compilers, identified by Don Kerr
- Other small cleanups/bugfixes

This commit was SVN r9662.
2006-04-19 22:20:22 +00:00
George Bosilca
61bea41350 The same in MX (missing copyright).
This commit was SVN r9661.
2006-04-19 21:37:30 +00:00
George Bosilca
afe9821d84 Add a missing copyright.
This commit was SVN r9660.
2006-04-19 21:36:22 +00:00
Tim Woodall
10f343734f decrease eager limit to 12K (improves latency)
This commit was SVN r9646.
2006-04-14 22:28:37 +00:00
Tim Woodall
6523c12e4b - decrease eager limit to 12K (improves latency)
- trigger event library while setting up connections

This commit was SVN r9645.
2006-04-14 22:28:05 +00:00