
137 Commits

Author SHA1 Message Date
Gleb Natapov
c9a1b06771 Remove trailing whitespace. No code changes in this commit.
This commit was SVN r17167.
2008-01-21 12:11:18 +00:00
Pavel Shamis
add4d9df8a XRC fixes for MPI2 dynamics.
This commit was SVN r17144.
2008-01-15 21:14:48 +00:00
George Bosilca
6310ce955c The first patch related to the Active Message stuff. So far, here is what we have:
- the registration array is now global instead of one per BTL.
- each framework has to declare which entries of the registration array it reserves. Then
  it has to define the internal way of sharing (or not) these entries between all
  components. As an example, the PML will not share, as there is only one active PML
  at any moment, while the BTLs will have to. The tag is 8 bits long: the first 3
  are reserved for the framework, while the remaining 5 are used internally by each
  framework.
- The registration function is optional. If a BTL does not provide such a function,
  nothing happens. However, if such a function is provided in the BTL
  structure, the BML will call it when a tag is registered.
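The sketch below shows the 8-bit tag split and the optional registration hook in C. It is a minimal illustration, and it assumes that the "first 3" framework bits are the high-order bits of the tag. The names `tag_make`, `tag_registry`, `bml_register_tag` and `btl_register_fn_t` are hypothetical and are not Open MPI's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 3 high-order framework bits, 5 low-order framework-internal bits. */
#define TAG_INTERNAL_BITS 5
#define TAG_FRAMEWORK_MAX ((1u << 3) - 1)                  /* 7 */
#define TAG_INTERNAL_MASK ((1u << TAG_INTERNAL_BITS) - 1)  /* 0x1f */

/* Build a tag from a framework id (0..7) and a framework-internal id (0..31). */
static uint8_t tag_make(unsigned framework, unsigned internal)
{
    assert(framework <= TAG_FRAMEWORK_MAX && internal <= TAG_INTERNAL_MASK);
    return (uint8_t)((framework << TAG_INTERNAL_BITS) | internal);
}

static unsigned tag_framework(uint8_t tag) { return (unsigned)tag >> TAG_INTERNAL_BITS; }
static unsigned tag_internal(uint8_t tag)  { return tag & TAG_INTERNAL_MASK; }

/* One global registration array, indexed by the full 8-bit tag (hypothetical types). */
typedef void (*tag_cb_t)(uint8_t tag, void *cbdata);
typedef struct { tag_cb_t cb; void *cbdata; } tag_entry_t;
static tag_entry_t tag_registry[256];

/* Hypothetical optional BTL hook; a BTL that does not need it leaves it NULL. */
typedef int (*btl_register_fn_t)(uint8_t tag, tag_cb_t cb, void *cbdata);

/* Record the callback globally, then notify the BTL only if it provides a hook. */
static int bml_register_tag(uint8_t tag, tag_cb_t cb, void *cbdata,
                            btl_register_fn_t btl_register)
{
    tag_registry[tag].cb = cb;
    tag_registry[tag].cbdata = cbdata;
    return (btl_register != NULL) ? btl_register(tag, cb, cbdata) : 0;
}
```

For example, `tag_make(2, 5)` yields `0x45`: framework 2 in the top three bits, internal id 5 in the bottom five.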

Now, it's time for the second step... Converting OB1 from a switch based PML to an
active message one.

This commit was SVN r17140.
2008-01-15 05:32:53 +00:00
Jon Mason
a0d4122606 The new cpc selection framework is now in place. The patch below allows
for dynamic selection of cpc methods based on what is available.  It
also allows for inclusion/exclusion of methods.  It further allows
for modifying the priorities of certain cpc methods to better determine
the optimal cpc method.

This patch also contains XRC compile time disablement (per Jeff's
patch).

At a high level, cpc selection works by walking through each cpc
and allowing it to test whether it is permissible to run on this
mpirun.  It returns a priority if it is permissible or -1 if not.  All
of the cpc names and priorities are rolled into a string.  This string
is then encapsulated in a message and passed around to all the ompi
processes.  Once received and unpacked, the received list is compared
to a local copy of the list.  The connection method is chosen by
comparing the lists passed around to all nodes via modex with the list
generated locally.  Any non-negative number is a potentially valid
connection method.  The method this patch uses to determine the optimal
connection method is to take the intersection of the two lists.  The
highest single value (with the other side being non-negative) is selected
as the cpc method.
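The selection rule can be sketched in C as follows. The sketch keeps only methods that both sides report as non-negative, then picks the one with the highest priority on either side. This is one reading of "highest single value (with the other side being non-negative)", not the patch's actual code. `cpc_entry_t` and `cpc_select` are hypothetical names, and the method names used below are illustrative.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical entry: a connection method and the priority its probe returned (-1 = unusable). */
typedef struct { const char *name; int priority; } cpc_entry_t;

/* Return the chosen method name, or NULL if no method is usable on both sides. */
static const char *cpc_select(const cpc_entry_t *local, size_t nlocal,
                              const cpc_entry_t *remote, size_t nremote)
{
    assert((local != NULL || nlocal == 0) && (remote != NULL || nremote == 0));
    const char *best = NULL;
    int best_prio = -1;
    for (size_t i = 0; i < nlocal; i++) {
        if (local[i].priority < 0)
            continue;                       /* not permissible locally */
        for (size_t j = 0; j < nremote; j++) {
            if (strcmp(local[i].name, remote[j].name) != 0 || remote[j].priority < 0)
                continue;                   /* not in the intersection */
            /* Both sides can use it: score by the higher of the two priorities. */
            int prio = local[i].priority > remote[j].priority
                           ? local[i].priority : remote[j].priority;
            if (prio > best_prio) {
                best_prio = prio;
                best = local[i].name;
            }
        }
    }
    return best;
}
```

Because every peer has to reach the same answer, equal scores would also need a deterministic tie-break (for example, by name). The sketch does not include one.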

svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ .

This commit was SVN r17138.
2008-01-14 23:22:03 +00:00
Jon Mason
626e0814a2 Style clean-up
This commit was SVN r17126.
2008-01-12 18:47:17 +00:00
Pavel Shamis
99f51482e3 Fixing openib finalization flow.
This commit was SVN r17085.
2008-01-09 12:36:30 +00:00
Gleb Natapov
621fa223c5 Create free lists of fragments per HCA, not per BTL. Saves memory in case of
multiple LMCs.

This commit was SVN r17082.
2008-01-09 10:26:21 +00:00
Gleb Natapov
5ce3213158 Rearrange function order so that functions are defined before they are used. No
code changes here.

This commit was SVN r17081.
2008-01-09 10:05:41 +00:00
Gleb Natapov
c3bbf69356 Set send_flags correctly in btl_openib_put. Otherwise we may reuse flags from
previous use of the buffer and they may be incorrect.

This commit was SVN r17058.
2008-01-07 10:19:07 +00:00
Gleb Natapov
2fb6947f88 Destroy endpoints that use eager rdma communication before destroying the SRQ. Don't
skip async event thread destruction if the SRQ was not destroyed, or it will segfault
on module removal.

This commit was SVN r17025.
2007-12-23 13:58:31 +00:00
Gleb Natapov
b06d92bdab The OpenIB BTL has three channels through which data can be received (eager rdma,
high prio QPs and low prio QPs). Because not all of them are polled each time
progress() is called (to save on latency), starvation is possible. This commit
fixes this. Now each channel is polled, but higher priority channels are polled
more often. Three new parameters are introduced that control the polling ratios
between the different channels.
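One simple way to get "every channel is polled, higher priority ones more often" is a per-channel ratio: channel c is polled on every ratio[c]-th progress call. This is a hypothetical sketch. The commit does not name its three parameters or give this exact scheme, and `poller_t` and `poller_progress` are illustrative.

```c
#include <assert.h>

/* The three receive channels named in the commit message. */
enum { CH_EAGER_RDMA, CH_HP_QP, CH_LP_QP, NUM_CHANNELS };

typedef struct {
    unsigned ratio[NUM_CHANNELS];        /* hypothetical: poll channel c every ratio[c] calls */
    unsigned long calls;                 /* progress calls so far */
    unsigned long polled[NUM_CHANNELS];  /* how often each channel was polled */
} poller_t;

/* One progress call: every channel is reached periodically, so none starves,
 * but a channel with a smaller ratio is polled more often. */
static void poller_progress(poller_t *p)
{
    for (int c = 0; c < NUM_CHANNELS; c++) {
        assert(p->ratio[c] > 0);
        if (p->calls % p->ratio[c] == 0)
            p->polled[c]++;  /* a real BTL would poll this channel here */
    }
    p->calls++;
}
```

With ratios {1, 2, 4}, twelve calls poll the eager RDMA channel 12 times, the high prio QPs 6 times and the low prio QPs 3 times.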

This commit was SVN r17024.
2007-12-23 12:29:34 +00:00
George Bosilca
906e8bf1d1 Replace the ompi_pointer_array with opal_pointer_array. In the next step
(sometime after the merge with the ORTE branch), the opal_pointer_array
will become the only pointer_array implementation (the orte_pointer_array
will be removed).

This commit was SVN r17007.
2007-12-21 06:02:00 +00:00
Gleb Natapov
2a59b2a68f 1. Set segment lengths in prepare_src() after packing, because the actual size may be
smaller than the allocated size.

2. If reserve is zero, don't allocate a coalesced frag, since it will be RDMAed, not
sent.  The logic was the other way around.

This commit was SVN r16928.
2007-12-11 13:10:52 +00:00
Gleb Natapov
17611dafbe Fix pointer casting on 32bit machines.
This commit was SVN r16907.
2007-12-09 14:15:35 +00:00
Gleb Natapov
2f9c5b46cf Return OMPI_ERR_RESOURCE_BUSY from openib_btl_send() if the fragment is not on the wire.
This commit was SVN r16906.
2007-12-09 14:14:11 +00:00
Gleb Natapov
493951e09d Add heterogeneous support to message coalescing.
This commit was SVN r16903.
2007-12-09 14:10:25 +00:00
Gleb Natapov
b4698dc6df Use flags provided during allocation to coalesce into the correct priority queue.
This commit was SVN r16902.
2007-12-09 14:08:55 +00:00
Gleb Natapov
e2e211f23b Add flags parameter to btl_alloc() and btl_prepare_src() functions. If the BTL
knows the priority of a descriptor at allocation time, it may do some
optimizations.

This commit was SVN r16901.
2007-12-09 14:08:01 +00:00
Gleb Natapov
5313a2baa7 Message coalescing for openib BTL. If a fragment is waiting to be transmitted in
a pending queue, pack another message into it if there is enough space.
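The coalescing check reduces to two conditions: the fragment has not been posted yet, and it still has room. The sketch below is hypothetical. `frag_t`, `FRAG_CAPACITY` and `frag_try_coalesce` are illustrative names, and a real implementation would also need a per-message header so the receiver can split the fragment again. That header is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FRAG_CAPACITY 256  /* hypothetical fragment buffer size in bytes */

typedef struct {
    unsigned char data[FRAG_CAPACITY];
    size_t used;     /* bytes already packed */
    int    nmsgs;    /* messages packed into this fragment */
    bool   on_wire;  /* true once the fragment has been posted */
} frag_t;

/* Append msg to a fragment still sitting in a pending queue.
 * Returns false if it was already sent or lacks space; the caller then allocates a new frag. */
static bool frag_try_coalesce(frag_t *f, const void *msg, size_t len)
{
    assert(f != NULL && f->used <= FRAG_CAPACITY && (msg != NULL || len == 0));
    if (f->on_wire || len > FRAG_CAPACITY - f->used)
        return false;
    memcpy(f->data + f->used, msg, len);
    f->used += len;
    f->nmsgs++;
    return true;
}
```

Two 100-byte messages fit into one 256-byte fragment. A third does not, and neither does anything once the fragment is on the wire.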

This commit was SVN r16900.
2007-12-09 14:05:13 +00:00
Gleb Natapov
7302cd24eb Call btl_alloc() from btl_prepare_src() to have one point of frag allocation.
This commit was SVN r16899.
2007-12-09 14:02:32 +00:00
Gleb Natapov
7364b7cf47 Add endpoint parameter to btl_alloc() function. Enables various optimizations
inside BTL.

This commit was SVN r16898.
2007-12-09 14:00:42 +00:00
Pavel Shamis
57728986f8 Fixing XRC multiport/multisubnet support.
This commit was SVN r16819.
2007-12-03 09:49:53 +00:00
Gleb Natapov
a774cd98f8 Put send completions into the low prio CQ. Receive is more important.
This commit was SVN r16817.
2007-12-02 14:46:37 +00:00
Pavel Shamis
8aca6eb31b OFED 1.3 doesn't implement ibv_resize_cq for ConnectX.
If ibv_resize_cq returns an error, we should check whether the function
is implemented.

This commit was SVN r16799.
2007-11-28 15:23:19 +00:00
Pavel Shamis
3e2e4f6d2a Removing unused lid.
This commit was SVN r16794.
2007-11-28 10:06:57 +00:00
Pavel Shamis
aa79bdabc8 Removing port_touse - we don't really need it
This commit was SVN r16793.
2007-11-28 09:57:48 +00:00
Pavel Shamis
2ffbe8776a Fixing compilation problems in openib
This commit was SVN r16792.
2007-11-28 09:38:49 +00:00
Gleb Natapov
218adb2a96 Account for eager rdma credit fragments when creating the send queue. Create the XRC
receive QP with zero receive and send queue lengths. We are not going to use this
QP for sends, and receives are posted to SRQs.

This commit was SVN r16791.
2007-11-28 07:22:01 +00:00
Gleb Natapov
601952a952 Don't share the endpoint->qps array, only the pointer to the actual QP. Calculate the send
queue size for a shared QP based on all endpoints that want to use it.

This commit was SVN r16790.
2007-11-28 07:21:07 +00:00
Gleb Natapov
b46c9cc7bc Make XRC use srq_qp unions instead of xrc_qp, which is exactly like srq_qp.
This commit was SVN r16789.
2007-11-28 07:20:26 +00:00
Gleb Natapov
bd47da4699 Initial XRC support by Mellanox.
This commit was SVN r16787.
2007-11-28 07:18:59 +00:00
Gleb Natapov
923666b75c Process pending put/get frags on endpoint connection establishment.
This commit was SVN r16785.
2007-11-28 07:16:52 +00:00
Gleb Natapov
5a4e953aaa Allow the same QP to be shared by different buffer sizes. Needed for XRC support.
This commit was SVN r16783.
2007-11-28 07:15:20 +00:00
Gleb Natapov
b123696d57 Fix async thread creation and destruction. Create async thread only when it is
needed instead of creating it and then canceling if it is not needed. Change
error handling during finalize so that it will not skip async thread
destruction. Otherwise async thread may segfault during openib module unloading.

This commit was SVN r16782.
2007-11-28 07:14:34 +00:00
Gleb Natapov
a9f864d15c If there is an eager rdma credit but no WQE to send a packet, we add it
to the pending queue of the eager rdma QP instead of the correct pending list. This patch
fixes this by getting rid of the "eager rdma qp" notion. A packet is always sent
over its order QP. The patch also adds two pending queues for high and low prio
packets. Only high prio packets are sent over the eager RDMA channel.

This commit was SVN r16780.
2007-11-28 07:12:44 +00:00
Gleb Natapov
6a2d210b7d Use the OMPI object system to make the fragment hierarchy more object oriented. The
main idea (apart from cleanup) is to save on initialisation of unneeded fields
and to use the C type checking system to catch obvious errors.

This commit was SVN r16779.
2007-11-28 07:11:14 +00:00
Gleb Natapov
267cd2342a Cleanup. Remove unused functions.
This commit was SVN r16778.
2007-11-28 07:08:56 +00:00
Gleb Natapov
3a63eb6c17 Cleanup macro definitions.
This commit was SVN r16554.
2007-10-23 13:33:19 +00:00
Jeff Squyres
94b1e9cff9 Update to use BTL_VERBOSE and BTL_ERROR instead of opal_output'ing to
the mca_btl_base_output stream directly (and relying on it to be -1 if
we didn't want any output).

This commit was SVN r16449.
2007-10-15 17:53:02 +00:00
Gleb Natapov
60af46d541 We have QP description in component structure, module structure and endpoint.
Each one of them has a field to store QP type, but this is redundant.
Store qp type only in one structure (the component one).

This commit was SVN r16272.
2007-09-30 16:14:17 +00:00
Gleb Natapov
c7105eadc7 Update Voltaire copyright.
This commit was SVN r16189.
2007-09-24 10:11:52 +00:00
Brian Barrett
59b22533f2 Enable RDMA for heterogeneous situations. Currently done by overloading
the ompi_convertor_need_buffers function to only return 0 if the convertor
is homogeneous (which it never does on the trunk, but does on v1.2, but
that's a different issue).  Only enable the heterogeneous rdma code for
a btl if it supports it (via a flag), as some btls need some work for this
to work properly.  Currently only TCP and OpenIB have been extensively tested.

This commit was SVN r15990.
2007-08-28 21:23:44 +00:00
Gleb Natapov
d8f3063895 Create only one CQ for all BTLs on the same HCA. Many BTLs can be created for
one HCA (multiple ports, LMC, multiple BTLs per LID). Having only one CQ for
all of them substantially reduces polling time.
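A common way to share one CQ among all BTL modules on an HCA is a per-device, reference-counted object: the first module creates it, later modules reuse it, and the last one frees it. The sketch assumes that design, which the commit does not spell out. `hca_t`, `shared_cq_t` and the acquire/release helpers are hypothetical. A real implementation would call `ibv_create_cq()`/`ibv_destroy_cq()` where this sketch calls `calloc()`/`free()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an HCA-wide completion queue. */
typedef struct { int refcount; } shared_cq_t;

/* Hypothetical per-HCA state: one CQ shared by every BTL module on the device. */
typedef struct { shared_cq_t *cq; } hca_t;

/* First caller creates the CQ; later callers on the same HCA reuse it. */
static shared_cq_t *hca_cq_acquire(hca_t *hca)
{
    if (hca->cq == NULL) {
        hca->cq = calloc(1, sizeof *hca->cq);
        if (hca->cq == NULL)
            return NULL;
    }
    hca->cq->refcount++;
    return hca->cq;
}

/* Last caller to release destroys the CQ. */
static void hca_cq_release(hca_t *hca)
{
    assert(hca->cq != NULL && hca->cq->refcount > 0);
    if (--hca->cq->refcount == 0) {
        free(hca->cq);
        hca->cq = NULL;
    }
}
```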

This commit was SVN r15933.
2007-08-20 12:28:25 +00:00
Galen Shipman
438a56e0d7 update copyrights for ib_multifrag commit
This commit was SVN r15612.
2007-07-25 15:03:34 +00:00
Galen Shipman
325c184fb4 remove debugging "abort()"
fix a debugging assert

This commit was SVN r15611.
2007-07-25 14:51:19 +00:00
Pavel Shamis
d837f1446b This is a workaround for Ticket #1092.
It prevents the failure in openib finalize,
but it doesn't resolve the actual issue. I guess that
the one-sided tests somehow allocate memory (mpool?) and don't
release it. Need to check it.

This commit was SVN r15488.
2007-07-18 18:02:13 +00:00
Gleb Natapov
45fcb45e31 Remove debug checks that produce lots of warnings during compilation.
This commit was SVN r15479.
2007-07-18 13:49:15 +00:00
Gleb Natapov
30b2183314 Remove debug output from a hot path.
This commit was SVN r15478.
2007-07-18 12:48:34 +00:00
Jeff Squyres
8ace07efed This commit brings in two major things:
1. Galen's fine-grain control of queue pair resources in the openib
   BTL.
2. Pasha's new implementation of asynchronous HCA event handling.

Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.  

Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes).  :-(

== Fine-grain control of queue pair resources ==

Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).

Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments.  When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers.  One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
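Under the definition above, utilization is simply the received size divided by the size of the buffer the data landed in, expressed as a percentage. The small C sketch below computes it and pairs it with a hypothetical "smallest buffer that fits" policy over ascending sizes. That policy is an illustrative assumption, not a statement of how the BTL picks a QP.

```c
#include <assert.h>
#include <stddef.h>

/* Received data as a percentage of the receive buffer it occupied. */
static double rx_utilization_pct(size_t received, size_t buf_size)
{
    assert(buf_size > 0 && received <= buf_size);
    return 100.0 * (double)received / (double)buf_size;
}

/* Hypothetical policy: index of the smallest buffer (sizes ascending) that fits len, or -1. */
static int pick_smallest_fit(const size_t *sizes, int n, size_t len)
{
    for (int i = 0; i < n; i++)
        if (len <= sizes[i])
            return i;
    return -1;
}
```

A 100-byte message uses about 0.15% of a 65536-byte buffer, but 78.125% of a 128-byte one. This is why several QPs with graduated buffer sizes can raise utilization.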

The new design allows multiple QPs to be specified at runtime.  Each
QP can be set up to use PP or SRQ receive buffers, with
fine-grained control over receive buffer size, number of receive
buffers to post, and when to replenish the receive queue (low water mark);
for SRQ QPs, the number of outstanding sends can also be
specified.  The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:

{{{
-mca btl_openib_receive_queues \
     "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}

Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma).  The above
example therefore describes 4 QPs.

The first QP is:

    P,128,16,4

Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes).  The third field indicates the number of receive buffers
to allocate to the QP (16).  The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).

The second QP is:

    S,1024,256,128,32

Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP.  The second, third and fourth fields are the same as in the
per-peer based QP.  The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32).  This provides a
"good enough" mechanism of flow control for some regular communication
patterns.

QPs MUST be specified in ascending receive buffer size order.  This
requirement may be removed prior to 1.3 release.
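Parsing the btl_openib_receive_queues value described above takes only a few lines of C. The sketch follows the syntax in this message: ";" separates QPs, "," separates fields, "P" entries have 3 numeric fields and "S" entries have 4. It also rejects lists whose buffer sizes decrease. It is a minimal sketch and not the BTL's own parser. `qp_spec_t` and `parse_receive_queues` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical representation of one QP description. */
typedef struct {
    char type;         /* 'P' = per-peer, 'S' = shared receive queue */
    int  buf_size;     /* receive buffer size in bytes */
    int  num_bufs;     /* number of receive buffers to allocate */
    int  low_water;    /* repost receive buffers when this many remain */
    int  max_pending;  /* 'S' only: outstanding sends allowed; -1 for 'P' */
} qp_spec_t;

/* Parse spec into out[0..max_qps-1]. Returns the number of QPs, or -1 on error. */
static int parse_receive_queues(const char *spec, qp_spec_t *out, int max_qps)
{
    assert(spec != NULL && out != NULL);
    int n = 0;
    const char *p = spec;
    while (*p != '\0') {
        qp_spec_t q = {0, 0, 0, 0, -1};
        int used = 0;  /* characters consumed by this QP description (%n) */
        if (n >= max_qps)
            return -1;
        if (*p == 'P') {
            if (sscanf(p, "P,%d,%d,%d%n", &q.buf_size, &q.num_bufs,
                       &q.low_water, &used) != 3)
                return -1;
        } else if (*p == 'S') {
            if (sscanf(p, "S,%d,%d,%d,%d%n", &q.buf_size, &q.num_bufs,
                       &q.low_water, &q.max_pending, &used) != 4)
                return -1;
        } else {
            return -1;  /* unknown QP type */
        }
        q.type = *p;
        if (q.buf_size <= 0 || q.num_bufs <= 0)
            return -1;
        /* QPs must be listed in ascending receive buffer size order. */
        if (n > 0 && q.buf_size < out[n - 1].buf_size)
            return -1;
        out[n++] = q;
        p += used;
        if (*p == ';')
            p++;        /* move on to the next QP description */
        else if (*p != '\0')
            return -1;  /* trailing garbage after a description */
    }
    return n;
}
```

On the example value above, this returns 4, and the first entry is the per-peer QP with 128-byte buffers, 16 buffers and a low watermark of 4.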

This commit was SVN r15474.
2007-07-18 01:15:59 +00:00
Gleb Natapov
b88b7dedfe Rename btl_rdma_offset to btl_pipeline_send_length.
This commit was SVN r15153.
2007-06-21 07:12:40 +00:00