1
1

613 Коммитов

Автор SHA1 Сообщение Дата
Galen Shipman
c93711cfdb checking for max_inline_data == 0 as an error condition is not valid,, so
don't do it.. 

This commit was SVN r11132.
2006-08-08 16:53:47 +00:00
Brian Barrett
0ba0a60ada * Merge in new version of the pt2pt one-sided communication component,
implemented entirely on top of the PML.  This allows us to have a
  one-sided interface even when we are using the CM PML and MTLs for
  point-to-point transport (and therefore not using the BML/BTLs)
* Old pt2pt component was renamed "rdma", as it will soon be having
  real RDMA support added to it.

Work was done in a temporary branch.  Commit is the result of the
merge command:

  svn merge -r10862:11099 https://svn.open-mpi.org/svn/ompi/tmp/bwb-osc-pt2pt

This commit was SVN r11100.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r10862
  r11099
2006-08-03 00:10:19 +00:00
Brian Barrett
9c30aefff5 * constant is always defined -- use #if, not #ifdef
This commit was SVN r11089.
2006-08-02 18:37:41 +00:00
Galen Shipman
fb9210463f clarify assignment..
This commit was SVN r11065.
2006-07-31 20:54:54 +00:00
Galen Shipman
ce0b8d9b48 cleanup of cq/srq sizing..
This commit was SVN r11061.
2006-07-31 17:24:39 +00:00
Galen Shipman
c9e0eda190 Initialize the completion queue to a reasonable size based on maximum number
of send/receives outstanding.

Use ibv_cq_resize if available after initial creation of completion queue if
cq_size is too small (based on number of peers). 

This commit was SVN r11053.
2006-07-30 00:58:40 +00:00
Donald Kerr
2e5e01a8df Remove dependency on known port range and allow udapl to provide the port number.
This commit was SVN r11040.
2006-07-28 13:58:21 +00:00
Donald Kerr
fcb932a6d9 Workaround for bug in Solaris udapl library where dat_evd_dequeue does not dequeue DAT_CONNECTION_REQUEST_EVENT.
This commit was SVN r11032.
2006-07-27 16:13:46 +00:00
Gleb Natapov
72575d81d2 Create separate pool for control messages. It is unlimited, but the maximum number of element that are allocated from it is limited by number of connections.
This commit was SVN r11028.
2006-07-27 14:09:30 +00:00
Gleb Natapov
4b605295b3 remove unused field.
This commit was SVN r10965.
2006-07-24 06:12:16 +00:00
Gleb Natapov
3b34dc8df8 remove MCA_BTL_IB_FRAG_ALIGN. Alignment is handled in free_list_t.
This commit was SVN r10945.
2006-07-23 12:33:49 +00:00
Gleb Natapov
91f48f9a79 Merge with gleb-pml branch. Add out of resource handling support to PML layer.
If resource is not available request is added to one of the pending list and retried later.

This commit was SVN r10900.
2006-07-20 14:44:35 +00:00
Gleb Natapov
383694c68d Add support to get alignemnt buffers from free_list_t. Convert openib BTL to new interface.
This commit was SVN r10899.
2006-07-20 14:39:05 +00:00
Brian Barrett
4c101c6394 * rename the collectives sm bootstrap area to be consistent with other
shared memory segments
* make sure to properly unlink the collectives sm bootstrap area at
  shutdown
* Add missing / in the path for the mpool shared memory segment
* make sure to release the common_mmap structure in the SM btl
  after unlinking the file during shutdown

This commit was SVN r10886.
2006-07-19 20:55:29 +00:00
George Bosilca
21c542f0a5 Make the SM BTL FT friendly. Now there are 3 FT friendly BTLs: TCP, SM
and self.

This commit was SVN r10780.
2006-07-13 07:42:18 +00:00
George Bosilca
d00e6e29e8 Create a close function for the mpool SM module, in order to allow the cleanup. The
mca_common_sm_mmap file was left over by the SM mpool, and there was nobody able
to unmap and unlink it.

This commit was SVN r10770.
2006-07-12 22:12:07 +00:00
George Bosilca
fd39203262 As the self proc is marked as local, there will always be at least one local
proc. Don't create the SM file until we really know there is someone lse on
the same node.

This commit was SVN r10740.
2006-07-11 17:05:13 +00:00
George Bosilca
14b3f141db Nothing relevant !!!
This commit was SVN r10711.
2006-07-11 00:30:26 +00:00
Andrew Friedley
b7e0484c37 Give up on dat_ep_query() and instead manually send our address information across the wire after connection establishment.
I've introduced a race condition - seeing occasional LOCAL_LENGTH errors on the receive side.  I think I'm mixing up eager/max somehow - will look at it more on monday.

This commit was SVN r10690.
2006-07-07 21:48:16 +00:00
Gleb Natapov
e05ec69dc4 print "flush error" only once.
This commit was SVN r10672.
2006-07-06 08:03:01 +00:00
Gleb Natapov
9b0807e547 Put pending fragment on the right waiting list.
This commit was SVN r10671.
2006-07-06 07:51:23 +00:00
Brian Barrett
4ee4acb6a6 * ignore some Cray-only code when not on the Cray machine
This commit was SVN r10660.
2006-07-05 17:16:27 +00:00
Brian Barrett
043153dad3 * fix opal_list_item_t -> ompi_free_list_item_t type change
This commit was SVN r10659.
2006-07-05 17:02:16 +00:00
Brian Barrett
47725c9b02 * Add new PML (CM) and network drivers (MTL) for high speed
interconnects that provide matching logic in the library.
  Currently includes support for MX and some support for
  Portals
* Fix overuse of proc_pml pointer on the ompi_proc structuer, 
  splitting into proc_pml for pml data and proc_bml for
  the BML endpoint data
* bug fixes in bsend init code, which wasn't being used by
  the OB1 or DR PMLs...

This commit was SVN r10642.
2006-07-04 01:20:20 +00:00
Galen Shipman
7e079d20ab fix for stupid casting.. addresses issue on PPC64 where sizes get set
improperly and badness ensues..

This commit was SVN r10574.
2006-06-29 21:58:50 +00:00
George Bosilca
7d59a6885b Remove all references to the MRU list. Add back the repost list checks. For some reasons
it decrease the latency by around 0.3 micro-seconds ...

This commit was SVN r10571.
2006-06-29 19:25:44 +00:00
George Bosilca
78f0de127d Typo.
This commit was SVN r10567.
2006-06-29 15:16:25 +00:00
George Bosilca
238147f576 Help the compiler to optimize the code. Now the order in the enum reflect the
order we use them in the switch.

This commit was SVN r10565.
2006-06-29 15:10:58 +00:00
George Bosilca
9bf281bca2 Remove the gm_mru_reg list as it is never used. Cleanup the repost logic. Now we repost
a receive fragment only when we're done with the message from inside and we try to add it
to the list.

This commit was SVN r10564.
2006-06-29 15:10:11 +00:00
George Bosilca
43b7b17033 Release the memory registration when the descriptors get freed.
This commit was SVN r10540.
2006-06-28 15:24:16 +00:00
George Bosilca
d9daa34a6c Set the registration field to NULL when we create a new fragment.
This commit was SVN r10539.
2006-06-28 15:23:36 +00:00
Gleb Natapov
c8f75c472a remove modulo op from fast path. Improvement 0.02-0.04ms.
This commit was SVN r10538.
2006-06-28 12:00:47 +00:00
Gleb Natapov
e58a89ef3e OMPI_ENABLE_DEBUG is always defined (to 0 or 1). Use #if and nto #ifdef.
This commit was SVN r10537.
2006-06-28 11:25:09 +00:00
Gleb Natapov
704a5eb645 Support for LMC (lid mask count) and multiple QPs per port.
This commit was SVN r10536.
2006-06-28 07:23:08 +00:00
Galen Shipman
e6cd8db0e5 DR will now checksum on a per btl basis (see MCA_BTL_FLAGS_NEED_CSUM). We
still always send ACK's, teasing apart completion for ACK/no ACK looks like a
pain in the .. 

This commit was SVN r10530.
2006-06-27 20:23:47 +00:00
Jeff Squyres
df45221a3e Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10521.
2006-06-27 10:43:03 +00:00
Gleb Natapov
52208d7bf9 Whe don't need to register zero sized frags.
This commit was SVN r10519.
2006-06-27 08:50:12 +00:00
Galen Shipman
8855e5b73a Fixes for DR as well as better diagnostic..
Successfully passing the intel test suite with/without induced errors/drops. 

This commit was SVN r10518.
2006-06-26 22:29:29 +00:00
Gleb Natapov
b7715395cb Return descriptor before sending credits one more time. We may need it.
This commit was SVN r10495.
2006-06-26 07:05:58 +00:00
Andrew Friedley
7bfac82ce7 Change over from lazy connection setup to setting up at initialization
time.

UD is connectionless, and as long as peers are statically assigned to QPs,
there is no reason to set up the adressing information lazily.

Lots of code was axed, as endpoints no longer have state.  Removed a
number of other elements in the endpoint struct to make it as lightweight
as possible.

I was able to remove an entire function call/branch in the send path,
which I believe is the main contributor to a 2us drop in NetPIPE latency.

Some whitespace cleanups as well.

Passes IBM test suite, and all but certain intel tests that were failing
before the change, over ob1 PML.

This commit was SVN r10494.
2006-06-23 16:50:50 +00:00
Andrew Friedley
046f4cd4ae Enough cleanup for now.
Moved a lot of the module-specific init from the component init to the module init.

Try keeping a pointer to reduce indexing, didn't seem to help - leaving in place
for now.

This commit was SVN r10485.
2006-06-22 22:12:13 +00:00
Andrew Friedley
8392ed4cac A checkpoint before I really do some cleanup.. nothing pretty here.
Playing around with OPAL_LIKELY/UNLIKELY, no real gains yet.

Reworked progress() to process many WC's at a time, as well
as immediately repost groups of receive buffers.

This commit was SVN r10481.
2006-06-22 18:06:55 +00:00
Andrew Friedley
365c81d6e9 Fix a few issues reported by Terry Dontje:
1. ompi/mca/btl/udapl/btl_udapl_proc.c should be including
btl_udapl_endpoint.h for mca_btl_udapl_proc_insert function.

2. btl_udapl_endpoint.c it looks like you are using
&endpoint->endpoint_lock when you should use &ep->endpoint_lock in a
OPAL_THREAD_LOCK call.

3. btl_udapl_frag.h has a couple opal_list_item_t's that should be
ompi_free_list_item_t in the _FRAG_ALLOC_{EAGER,MAX} macros.

This commit was SVN r10442.
2006-06-20 17:13:44 +00:00
George Bosilca
044868df45 Set the destination descriptor before calling the recv registration. Once
this call is completed, we have to remove it in order to be able to cleanup
correctly the fragments.

This commit was SVN r10428.
2006-06-20 14:11:09 +00:00
George Bosilca
1b18b7d934 Change the parameter registration of this BTL to the new calls (new is relative
here). Change the self BTL to use RDMA protocol.

This commit was SVN r10427.
2006-06-20 14:09:58 +00:00
Jeff Squyres
1d27ca5d0a Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10424.
2006-06-20 11:32:46 +00:00
Jeff Squyres
600bf4295a Update the help message to be slightly more concise and clear
This commit was SVN r10422.
2006-06-20 11:23:38 +00:00
Brian Barrett
3d027e57a8 * fix for ticket #141. If we are going to shortcut out of polling the
send/receive queues if there is something available in the short message
  rdma queues, then we have to poll *ALL* the rdma queues before exiting,
  or we aren't fair about frag reception and fall into degenerate matching
  cases.

This commit was SVN r10410.
2006-06-17 21:32:25 +00:00
Brian Barrett
05046e8ad2 if MX isn't running on some hosts, but is on others, we were blocking in the modex receive
waiting for the non-running procs to publish their contact information.  Publish their
(lack of) contact information.

This commit was SVN r10355.
2006-06-14 19:07:38 +00:00
George Bosilca
aca71521db Complete the move of the mpool registration from opal_list_item_t to the
ompi_free_list_item_t.

This commit was SVN r10354.
2006-06-14 17:43:50 +00:00