1
1
Граф коммитов

251 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
0074126886 Per #1352, most iWARP adapters today cannot handle connections between
two processes on the same server (!).  So for today, we'll simply mark
all local processes that use iWARP adapters as "unreachable".

More details in #1352.

This commit was SVN r18699.
2008-06-20 22:08:00 +00:00
Pavel Shamis
4537827973 Making the qp allocation more optimized.
- sq parameter was replaced with max_inline parameter
- inline is allocated only for relevant QPs

This commit was SVN r18675.
2008-06-19 08:40:39 +00:00
Pavel Shamis
dc3f14736d Fixing QP initialization stuff.
This commit was SVN r18650.
2008-06-11 16:31:39 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Pavel Shamis
379e00050c Fixing openib btl finalize flow. Bug fix for #1286.
This commit was SVN r18590.
2008-06-05 12:20:13 +00:00
George Bosilca
e361bcb64c Send optimizations.
1. The send path get shorter. The BTL is allowed to return > 0 to specify that the
   descriptor was pushed to the networks, and that the memory attached to it is 
   available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag
   can be used by the PML to force the BTL to always trigger the callback.
   Unmodified BTL will continue to work as expected, as they will return OMPI_SUCCESS
   which force the PML to have exactly the same behavior as before. Some BTLs have
   been modified: self, sm, tcp, mx.
2. Add send immediate interface to BTL.
   The idea is to have a mechanism of allowing the BTL to take advantage of
   send optimizations such as the ability to deliver data "inline". Some
   network APIs such as Portals allow data to be sent using a "thin" event
   without packing data into a memory descriptor. This interface change
   allows the BTL to use such capabilities and allows for other optimizations
   in the future. All existing BTLs except for Portals and sm have this interface
   set to NULL.

This commit was SVN r18551.
2008-05-30 03:58:39 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so you if explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
Jeff Squyres
ba5615a18f Merge in /tmp-public/cpc3 branch to trunk. oob/xoob still remains the
default CPC.

This commit was SVN r18356.
2008-05-02 11:52:33 +00:00
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
Ralph Castain
dc7f45dafd Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.

This commit was SVN r17926.
2008-03-23 23:10:15 +00:00
Pavel Shamis
54ad8d7446 The issue was reported/fixed by Jon Mason one month ago but the fix was not committed. So I'm commiting it now.
This commit was SVN r17835.
2008-03-17 11:13:06 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Gleb Natapov
60c151608c Set flags inside fragment allocation function.
This commit was SVN r17508.
2008-02-19 12:26:45 +00:00
George Bosilca
fa31ec81d0 Add the ownership flags to the PML/BTL interface. The layer
owning the descriptor is responsible for releasing it once
the descriptor is not in use anymore.

This commit was SVN r17497.
2008-02-18 17:39:30 +00:00
Gleb Natapov
c9a1b06771 Remove trailing whitespaces. No code changes in this commit.
This commit was SVN r17167.
2008-01-21 12:11:18 +00:00
Pavel Shamis
add4d9df8a XRC fixes for MPI2 dynamics.
This commit was SVN r17144.
2008-01-15 21:14:48 +00:00
George Bosilca
6310ce955c The first patch related to the Active Message stuff. So far, here is what we have:
- the registration array is now global instead of one by BTL.
- each framework have to declare the entries in the registration array reserved. Then
  it have to define the internal way of sharing (or not) these entries between all
  components. As an example, the PML will not share as there is only one active PML
  at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3
  are reserved for the framework while the remaining 5 are use internally by each
  framework.
- The registration function is optional. If a BTL do not provide such function,
  nothing happens. However, in the case where such function is provided in the BTL
  structure, it will be called by the BML, when a tag is registered.

Now, it's time for the second step... Converting OB1 from a switch based PML to an
active message one.

This commit was SVN r17140.
2008-01-15 05:32:53 +00:00
Jon Mason
a0d4122606 The new cpc selection framework is now in place. The patch below allows
for dynamic selection of cpc methods based on what is available.  It
also allows for inclusion/exclusions of methods.  It even futher allows
for modifying the priorities of certain cpc methods to better determine
the optimal cpc method.

This patch also contains XRC compile time disablement (per Jeff's
patch).

At a high level, the cpc selections works by walking through each cpc
and allowing it to test to see if it is permissable to run on this
mpirun.  It returns a priority if it is permissable or a -1 if not.  All
of the cpc names and priorities are rolled into a string.  This string
is then encapsulated in a message and passed around all the ompi
processes.  Once received and unpacked, the list received is compared
to a local copy of the list.  The connection method is chosen by
comparing the lists passed around to all nodes via modex with the list
generated locally.  Any non-negative number is a potentially valid
connection method.  The method below of determining the optimal
connection method is to take the cross-section of the two lists.  The
highest single value (and the other side being non-negative) is selected
as the cpc method.

svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ .

This commit was SVN r17138.
2008-01-14 23:22:03 +00:00
Jon Mason
626e0814a2 Style clean-up
This commit was SVN r17126.
2008-01-12 18:47:17 +00:00
Pavel Shamis
99f51482e3 Fixing openib finalization flow.
This commit was SVN r17085.
2008-01-09 12:36:30 +00:00
Gleb Natapov
621fa223c5 Create free lists of fragments per HCA, not per BTL. Saves memory in case of
multiple LMCs.

This commit was SVN r17082.
2008-01-09 10:26:21 +00:00
Gleb Natapov
5ce3213158 Rearrange functions order so that functions are defined before they are used. No
code changes here.

This commit was SVN r17081.
2008-01-09 10:05:41 +00:00
Gleb Natapov
c3bbf69356 Set send_flags correctly in btl_openib_put. Otherwise we may reuse flags from
previous use of the buffer and they may be incorrect.

This commit was SVN r17058.
2008-01-07 10:19:07 +00:00
Gleb Natapov
2fb6947f88 Destroy endpoints that use eager rdma communication before destroying SRQ. Do't
skip async event thread destruction if SRQ was not destroyed, or it will segfault
on module removal.

This commit was SVN r17025.
2007-12-23 13:58:31 +00:00
Gleb Natapov
b06d92bdab OpenIB BTL has three channels through which data can be received (eager rdma,
high prio QPs and low prio QPs) and because not all of them are polled each time
progrgess() is called (to save on latency) starvation is possible. The commit
fixes this. Now each channel is polled, but higher priority channels are polled
more often. Three new parameters are introduced that control polling ratios 
between different channels.

This commit was SVN r17024.
2007-12-23 12:29:34 +00:00
George Bosilca
906e8bf1d1 Replace the ompi_pointer_array with opal_pointer_array. The next step
(sometimes after the merge with the ORTE branch), the opal_pointer_array
will became the only pointer_array implementation (the orte_pointer_array
will be removed).

This commit was SVN r17007.
2007-12-21 06:02:00 +00:00
Gleb Natapov
2a59b2a68f 1. Set segments length in prepare_src() after packing because actual size may be
smaller then allocated size.

2. If reserve zero don't allocate coalesced frag since it will be RDMAed, not
send.  The logic was other way around.

This commit was SVN r16928.
2007-12-11 13:10:52 +00:00
Gleb Natapov
17611dafbe Fix pointer casting on 32bit machines.
This commit was SVN r16907.
2007-12-09 14:15:35 +00:00
Gleb Natapov
2f9c5b46cf Return OMPI_ERR_RESOURCE_BUSY from openib_btl_send() if fragment is not on wire.
This commit was SVN r16906.
2007-12-09 14:14:11 +00:00
Gleb Natapov
493951e09d Add heterogeneous support to message coalescing.
This commit was SVN r16903.
2007-12-09 14:10:25 +00:00
Gleb Natapov
b4698dc6df Use flags provided during allocation to coalesce to correct priority queue.
This commit was SVN r16902.
2007-12-09 14:08:55 +00:00
Gleb Natapov
e2e211f23b Add flags parameter to btl_alloc() and btl_prepare_src() functions. If BTL
knows at the time of allocation priority of a descriptor it may do some
optimizations.

This commit was SVN r16901.
2007-12-09 14:08:01 +00:00
Gleb Natapov
5313a2baa7 Message coalescing for openib BTL. If fragment is waiting to be transmitted in
a pending queue pack another message into it if there is enough space there.

This commit was SVN r16900.
2007-12-09 14:05:13 +00:00
Gleb Natapov
7302cd24eb Call btl_alloc() from btl_prepare_src() to have one point of frag allocation.
This commit was SVN r16899.
2007-12-09 14:02:32 +00:00
Gleb Natapov
7364b7cf47 Add endpoint parameter to btl_alloc() function. Enables various optimizations
inside BTL.

This commit was SVN r16898.
2007-12-09 14:00:42 +00:00
Pavel Shamis
57728986f8 Fixing XRC multiport/multisubnet support.
This commit was SVN r16819.
2007-12-03 09:49:53 +00:00
Gleb Natapov
a774cd98f8 Put send completions to low prio CQ. Receive is more important.
This commit was SVN r16817.
2007-12-02 14:46:37 +00:00
Pavel Shamis
8aca6eb31b OFED 1.3 doesn't implement ibv_resize_cq for connectX.
On error exit from ibv_resize_cq we should to check if the function
is implemented.

This commit was SVN r16799.
2007-11-28 15:23:19 +00:00
Pavel Shamis
3e2e4f6d2a Removing unused lid.
This commit was SVN r16794.
2007-11-28 10:06:57 +00:00
Pavel Shamis
aa79bdabc8 Removing port_touse - we don't really need it
This commit was SVN r16793.
2007-11-28 09:57:48 +00:00
Pavel Shamis
2ffbe8776a Fixing compilation problems in openib
This commit was SVN r16792.
2007-11-28 09:38:49 +00:00
Gleb Natapov
218adb2a96 Account for eager rdma credit fragments when creating send queue. Create XRC
receive QP with zero receive and send queue length. We don't going to use this
QP for send and receives a posted to SRQs.

This commit was SVN r16791.
2007-11-28 07:22:01 +00:00
Gleb Natapov
601952a952 Don't shared endpoint->qps array, only pointer to actual QP. Calculate send
queue size for shared QP based on all endpoints that want to use it.

This commit was SVN r16790.
2007-11-28 07:21:07 +00:00
Gleb Natapov
b46c9cc7bc Make xrc use srq_qp unions instead of the xrc_qp which is exactly like srq_qp.
This commit was SVN r16789.
2007-11-28 07:20:26 +00:00
Gleb Natapov
bd47da4699 Initial XRC support by Mellanox.
This commit was SVN r16787.
2007-11-28 07:18:59 +00:00
Gleb Natapov
923666b75c Process pending put/get frags on endpoint connection establishment.
This commit was SVN r16785.
2007-11-28 07:16:52 +00:00
Gleb Natapov
5a4e953aaa Allow share the same qp for different buffer sizes. Needed for XRC support.
This commit was SVN r16783.
2007-11-28 07:15:20 +00:00
Gleb Natapov
b123696d57 Fix async thread creation and destruction. Create async thread only when it is
needed instead of creating it and then canceling if it is not needed. Change
error handling during finalize so that it will not skip async thread
destruction. Otherwise async thread may segfault during openib module unloading.

This commit was SVN r16782.
2007-11-28 07:14:34 +00:00
Gleb Natapov
a9f864d15c If there is an eager rdma credit, but there is no WQE to send a packet we add it
to a pending queue of eager rdma QP instead of correct pending list. This patch
fixes this by getting reed of "eager rdma qp" notion. Packet is always send
over its order QP. The patch also adds two pending queues for high and low prio
packets. Only high prio packets are sent over eager RDMA channel.

This commit was SVN r16780.
2007-11-28 07:12:44 +00:00
Gleb Natapov
6a2d210b7d Use OMPI object system to make fragment hierarchy more object oriented. The
main idea (except of cleanup) is to save on initialisation of unneeded fields
and to use C type checking system to catch obvious errors.

This commit was SVN r16779.
2007-11-28 07:11:14 +00:00
Gleb Natapov
267cd2342a Cleanup. Remove unused functions.
This commit was SVN r16778.
2007-11-28 07:08:56 +00:00
Gleb Natapov
3a63eb6c17 Cleanup macro definitions.
This commit was SVN r16554.
2007-10-23 13:33:19 +00:00
Jeff Squyres
94b1e9cff9 Update to use BTL_VERBOSE and BTL_ERROR instead of opal_output'ing to
the mca_btl_base_output stream directly (and relying on it to be -1 if
we didn't want any output).

This commit was SVN r16449.
2007-10-15 17:53:02 +00:00
Gleb Natapov
60af46d541 We have QP description in component structure, module structure and endpoint.
Each one of them has a field to store QP type, but this is redundant.
Store qp type only in one structure (the component one).

This commit was SVN r16272.
2007-09-30 16:14:17 +00:00
Gleb Natapov
c7105eadc7 Update Voltaire copyright.
This commit was SVN r16189.
2007-09-24 10:11:52 +00:00
Brian Barrett
59b22533f2 Enable RDMA for heterogeneous situations. Currently done by overloading
the ompi_convertor_need_buffers function to only return 0 if the convertor
is homogeneous (which it never does on the trunk, but does to on v1.2, but
that's a different issue).  Only enable the heterogeneous rdma code for
a btl if it supports it (via a flag), as some btls need some work for this
to work properly.  Currently only TCP and OpenIB extensively tested

This commit was SVN r15990.
2007-08-28 21:23:44 +00:00
Gleb Natapov
d8f3063895 Create only one CQ for all BTLs on the same HCA. Many BTLs can be created for
one HCA. Multiple ports, LMC, multiple BTLs per one LID. Having only one CQ for
all of them substantially reduce polling time.

This commit was SVN r15933.
2007-08-20 12:28:25 +00:00
Galen Shipman
438a56e0d7 update copyrights for ib_multifrag commit
This commit was SVN r15612.
2007-07-25 15:03:34 +00:00
Galen Shipman
325c184fb4 remove debugging "abort()"
fix a debugging assert

This commit was SVN r15611.
2007-07-25 14:51:19 +00:00
Pavel Shamis
d837f1446b It is work around for Ticket #1092.
It will prevent the error failure in openib finalize
but it doesn't resolve the actual issue. I guess that
oneside tests some how allocates memory (mpool?) and doesn't 
release it. Need to check it.

This commit was SVN r15488.
2007-07-18 18:02:13 +00:00
Gleb Natapov
45fcb45e31 Remove debug checks that produce lots of warnings during compilation.
This commit was SVN r15479.
2007-07-18 13:49:15 +00:00
Gleb Natapov
30b2183314 Remove debug output from a hot path.
This commit was SVN r15478.
2007-07-18 12:48:34 +00:00
Jeff Squyres
8ace07efed This commit brings in two major things:
1. Galen's fine-grain control of queue pair resources in the openib
   BTL.
1. Pasha's new implementation of asychronous HCA event handling.

Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.  

Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artifical changes).  :-(

== Fine-grain control of queue pair resources ==

Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).

Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments.  When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers.  One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.

The new design allows multiple QPs to be specified at runtime.  Each
QP can be setup to use PP or SRQ receive buffers as well as giving
fine-grained control over receive buffer size, number of receive
buffers to post, when to replenish the receive queue (low water mark)
and for SRQ QPs, the number of outstanding sends can also be
specified.  The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:

{{{
-mca btl_openib_receive_queues \
     "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}

Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma).  The above
example therefore describes 4 QPs.

The first QP is:

    P,128,16,4

Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes).  The third field indicates the number of receive buffers
to allocate to the QP (16).  The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).

The second QP is:

    S,1024,256,128,32

Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP.  The second, third and fourth fields are the same as in the
per-peer based QP.  The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32).  This provides a
"good enough" mechanism of flow control for some regular communication
patterns.

QPs MUST be specified in ascending receive buffer size order.  This
requirement may be removed prior to 1.3 release.

This commit was SVN r15474.
2007-07-18 01:15:59 +00:00
Gleb Natapov
b88b7dedfe Rename btl_rdma_offset to btl_pipeline_send_length.
This commit was SVN r15153.
2007-06-21 07:12:40 +00:00
Gleb Natapov
7b9ae49fe1 This time correctly calculate local BTL rank among all BTLs in a subnet.
This commit was SVN r15073.
2007-06-14 10:27:11 +00:00
Gleb Natapov
5c3f511451 Properly determine btl's rank among all btls withing the same subnet.
This commit was SVN r15038.
2007-06-13 11:15:58 +00:00
Brian Barrett
a446af5b6b * Remove unneeded SRQ test -- we no longer support OFED builds that don't
have the SRQ interface.
  * Instead of setting AC_DEFINEs per MCA component, set per test.  THe
    answers can never be difference, and this will speed sed just a teeny
    bit

This commit was SVN r14856.
2007-06-05 01:49:26 +00:00
Gleb Natapov
444762456e Don't dereference NULL pointer. Fix bug introduced in r14768.
This commit was SVN r14781.

The following SVN revision numbers were found above:
  r14768 --> open-mpi/ompi@3401bd2b07
2007-05-27 09:24:56 +00:00
Galen Shipman
3401bd2b07 Add optional ordering to the BTL interface.
This is required to tighten up the BTL semantics. Ordering is not guaranteed,
but, if the BTL returns a order tag in a descriptor (other than
MCA_BTL_NO_ORDER) then we may request another descriptor that will obey
ordering w.r.t. to the other descriptor.


This will allow sane behavior for RDMA networks, where local completion of an
RDMA operation on the active side does not imply remote completion on the
passive side. If we send a FIN message after local completion and the FIN is
not ordered w.r.t. the RDMA operation then badness may occur as the passive
side may now try to deregister the memory and the RDMA operation may still be
pending on the passive side. 

Note that this has no impact on networks that don't suffer from this
limitation as the ORDER tag can simply always be specified as
MCA_BTL_NO_ORDER.

This commit was SVN r14768.
2007-05-24 19:51:26 +00:00
Gleb Natapov
3ebaff8dfe Implement new BTL parameters:
We eagerly send data up to btl_*_eager_limit with the match
Upon ACK of the MATCH we start using send/receives of size
btl_*_max_send_size up to the btl_*_rdma_pipeline_offset
After the btl_*_rdma_pipeline_offset we begin using RDMA writes of
size btl_*_rdma_pipeline_frag_size.

Now, on a per message basis we only use the above protocol if the
message is larger than btl_*_min_rdma_pipeline_size

btl_*_eager_limit - > same
btl_*_max_send_size -> same
btl_*_rdma_pipeline_offset -> btl_*_min_rdma_size
btl_*_rdma_pipeline_frag_size -> btl_*_max_rdma_size


btl_*_min_rdma_pipeline_size is new..

This patch also moves all BTL common parameters initialisation into
btl_base_mca.c file.

This commit was SVN r14681.
2007-05-17 07:54:27 +00:00
Pavel Shamis
e2d0e27111 Adding:
* openib_finalize flow for openib btl
* async event handler for openib btl

This commit was SVN r14623.
2007-05-08 21:47:21 +00:00
Rainer Keller
1aceece03f - Add a few comments for elements for structs, a few spelling fixes.
No functional change.

This commit was SVN r14534.
2007-04-26 21:03:38 +00:00
George Bosilca
1cb26e3b9c Finally the convertor export a convenience function to allow a consistent
computation of the current location on the pack/unpack process. This can
be used both for retrieving the pointer to the first byte (in the special
case of the cached RDMA protocol) and for getting the current
position (for the pipelined protocol).

I modified all BTLs, but most of them are still untested.

This commit was SVN r14180.
2007-03-30 22:02:45 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Gleb Natapov
2b6cbd6299 Separate frag lists for RDMA descriptors to two, one for src descriptors
and another for dst descriptors. This provide partial solution to OB1 protocol
deadlock problem. We can limit number of RDMA descriptors (by setting
btl_openib_free_list_max to something different from -1) and if we will be
lucky to hit this limit before we fail to register more memory the protocol
will not deadlock. When we had only one list for src/dst descriptors we
deadlocked when we reached max limit for the list.

This commit was SVN r13844.
2007-02-28 13:43:38 +00:00
Pavel Shamis
edeab0e912 Adding Mellanox Technologies copyright to files touched by Mellanox.
This commit was SVN r13669.
2007-02-15 18:03:20 +00:00
George Bosilca
56ffbfc5ff Get rid of the warnings in the Open IB BTL.
This commit was SVN r13424.
2007-02-01 19:07:04 +00:00
Brian Barrett
ee753694e0 Print out the memlock limit when we can't allocate memory
This commit was SVN r13372.
2007-01-30 21:22:56 +00:00
Jeff Squyres
6fea000e5f Oops -- get the right function name (copy-n-paste error).
This commit was SVN r13290.
2007-01-24 22:31:13 +00:00
Jeff Squyres
6b69ea664d Make a much, much better error message for a not-uncommon failure
scenario (user/sysadmin forgot to set the memlock limits high
enough).

This commit was SVN r13289.
2007-01-24 22:25:40 +00:00
Jeff Squyres
c9fe68c406 Better patch from Gleb to do the per-port (endpoint) specification of
whether to use eager RDMA or not

This commit was SVN r13262.
2007-01-23 22:40:59 +00:00
Galen Shipman
df099a4731 call it what it is...
we are looking at subnet_id's and we are counting active ports per subnet. 
move subnet count out of procs loop,, no need to do it there... 

This commit was SVN r13105.
2007-01-12 22:42:20 +00:00
Brian Barrett
e130f18cc2 Fix some compiler warnings that have slipped in lately...
This commit was SVN r13037.
2007-01-08 17:20:09 +00:00
Brian Barrett
8900d3ae43 Second take at fixing the issues with using ompi_ptr_t. Add helper functions for converting from .pval to .lval and vice-versa. Users of ompi_ptr_t types should only use one of the fields in the union unless using the helper conversion functions. For the BTLs, local pointers will always be stored in the .pval field and remote pointers always stored in the .lval field.
George wrote the initial patch, I extended it slightly and am responsible for all bugs found.

Refs trac:587

This commit was SVN r13023.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-07 01:48:57 +00:00
Brian Barrett
48ec0b2071 Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix
for now...

This commit was SVN r12997.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c
2007-01-04 22:07:37 +00:00
Galen Shipman
d207a6c988 endpoint should use a uint64_t for subnet, as everyone else does.. makes bad
things happen when packing into a 64 bit buffer... 

Misc cleanup.. 

This commit was SVN r12993.
2007-01-04 20:25:28 +00:00
Galen Shipman
f12bbe0591 Handle different subnets correctly and multiple nic endpoint negotiation
This is somewhat limited currently for expample,  if you have 3 ports on Node A and 5 ports
on Node B then the peers will use 3 ports to communicate with each other. 
This is on a subnet basis, so for any pair of nodes we take the
intersection of the available ports within a subnet.

We use subnets to determine reachability for lazy connection establishment. So
if Node A and Node B each have two HCA's (on seperate networks) then the
subnet's must be distinct, otherwise we will try to wire up HCA's on seperate
networks.  

This commit was SVN r12978.
2007-01-03 22:35:41 +00:00
Brian Barrett
27cea44a9c Fix a number of issues with the ompi_ptr_t:
* Make sure that the pval always writes to the correct portion of the
    lval.  This only matters on 32 bit big endian machines.
  * On 32 bit machines when assigning to pval, the other 4 bytes of lval
    weren't being written, which could lead to bogus data

We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes.  Refs trac:587

This commit was SVN r12974.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 19:47:48 +00:00
Gleb Natapov
190e7a27cd Merge with gleb-mpool branch. All RDMA components use same mpool now (rdma).
udapl/openib/vapi/gm mpools a deprecated. rdma mpool has parameter that allows
to limit its size mpool_rdma_rcache_size_limit (default is 0 - unlimited).

This commit was SVN r12878.
2006-12-17 12:26:41 +00:00
Gleb Natapov
b4fd2d7d50 Fix warnings from progress thread patch.
This commit was SVN r12434.
2006-11-06 12:34:56 +00:00
Pavel Shamis
566667ac61 Adding progress thread support to OpenIB BTL.
Reviewed by Gleb.

This commit was SVN r12411.
2006-11-02 16:15:21 +00:00
George Bosilca
126a68dc9a Big datatype commit. Remove all unused features of the datatype engine. As the memory
allocation logic is completely done outside the data-type engine (in the PML) there is
no need for any special case inside the data-type engine. There is less arguments for
the ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is
not required anymore as there is no memory allocated in the engine itself). This change
affect all components using datatypes. I test most of them, but it might happens that I
miss some ... If it's the case please let me know (don't shoot the pianist!!).

This commit was SVN r12331.
2006-10-26 23:11:26 +00:00
George Bosilca
640178c4b3 Grepping through the source files I found these calls to the data-type engine
with the wrong type of arguments.

This commit was SVN r12148.
2006-10-17 21:05:04 +00:00
Gleb Natapov
30608c5a50 Add separate queues for pending rget and rput frags. Process only limited
number of pending packets at once, otherwise we can spin forever.

This commit was SVN r11861.
2006-09-28 11:41:45 +00:00
Gleb Natapov
7999c08107 consolidate credit management and CQ polling code.
This commit was SVN r11622.
2006-09-12 09:17:59 +00:00
Gleb Natapov
d0caffa0aa Consolidate receive buffers prepost code for HP/LP QPs.
This commit was SVN r11552.
2006-09-07 13:05:41 +00:00
Gleb Natapov
298c825592 Remove #if OMPI_MCA_BTL_OPENIB_HAVE_SRQ. Always compile SRQ.
This commit was SVN r11537.
2006-09-06 05:45:37 +00:00
Gleb Natapov
424e412391 Make eager rdma work with SRQ enabled.
This commit was SVN r11530.
2006-09-05 16:04:04 +00:00
Gleb Natapov
fe932ca7bf consolidate part of HP/LP fields.
This commit was SVN r11528.
2006-09-05 16:00:18 +00:00
Galen Shipman
e5c594c211 More updates for the async error handler for btl's
In order to provide backwards compatability the framework versions are bumped
and the handler registeration function is at the end of the btl struct.
Testing done on sm, openib, and gm.. 

This commit was SVN r11256.
2006-08-17 22:02:01 +00:00
Galen Shipman
3b49953ce2 Add error callback to the btl interface, this allows error to be delivered to
the upperlayer assynchronously although there are some issues with this.. such
as there are multiple consumers of the btl's.. who get's the

This commit was SVN r11232.
2006-08-16 20:21:38 +00:00
Brian Barrett
9c30aefff5 * constant is always defined -- use #if, not #ifdef
This commit was SVN r11089.
2006-08-02 18:37:41 +00:00
Galen Shipman
fb9210463f clarify assignment..
This commit was SVN r11065.
2006-07-31 20:54:54 +00:00
Galen Shipman
ce0b8d9b48 cleanup of cq/srq sizing..
This commit was SVN r11061.
2006-07-31 17:24:39 +00:00
Galen Shipman
c9e0eda190 Initialize the completion queue to a reasonable size based on maximum number
of send/receives outstanding.

Use ibv_cq_resize if available after initial creation of completion queue if
cq_size is too small (based on number of peers). 

This commit was SVN r11053.
2006-07-30 00:58:40 +00:00
Gleb Natapov
91f48f9a79 Merge with gleb-pml branch. Add out of resource handling support to PML layer.
If resource is not available request is added to one of the pending list and retried later.

This commit was SVN r10900.
2006-07-20 14:44:35 +00:00
Gleb Natapov
9b0807e547 Put pending fragment on the right waiting list.
This commit was SVN r10671.
2006-07-06 07:51:23 +00:00
Gleb Natapov
704a5eb645 Support for LMC (lid mask count) and multiple QPs per port.
This commit was SVN r10536.
2006-06-28 07:23:08 +00:00
Galen Shipman
0344ae4ac5 Fix to allow eager limit and max send size to be any size (within resource limitations). Instead of storing the ompi_free_list_t * in the fragment, we use the frag type enum, this tells us where the frag came from and where it should return.. This could also be done in mvapi but is not a high priority moving forward..
Review by Brian, needs to hit the trunk + 1.1 release.. 

This commit was SVN r10157.
2006-06-01 02:32:18 +00:00
Gleb Natapov
01a119c3c5 fix compilation bug with --enable-mpi-threads
This commit was SVN r9426.
2006-03-26 13:24:10 +00:00
Tim Woodall
712468dbef add diagnostic interface
This commit was SVN r9328.
2006-03-17 17:39:41 +00:00
Galen Shipman
e58b758031 standardize behavior of btl_alloc, if the size is larger than the max send
size, btl_alloc returns NULL. 

This commit was SVN r9114.
2006-02-22 17:37:59 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Tim Woodall
e861158fcd - removed debug code
- removed extraneous memset

This commit was SVN r8798.
2006-01-24 23:38:41 +00:00
Tim Woodall
a584c60dbe re-worked flow control logic to take into account the return
of credits from the peer prior to local completion, so that
we don't overrun the number of send wqes available.

This commit was SVN r8683.
2006-01-12 23:42:44 +00:00
Tim Woodall
63d0438991 merge in changes from release branch
This commit was SVN r8637.
2006-01-04 16:34:45 +00:00
Tim Woodall
e9498f7a75 improve error reporting when registrations fail
This commit was SVN r8598.
2005-12-22 16:05:28 +00:00
George Bosilca
e5158142b9 The lb should be extracted from the datatype not from the convertor.
This commit was SVN r8446.
2005-12-10 23:27:20 +00:00
Tim Woodall
1929a97d2f corrections for MPI_BOTTOM
This commit was SVN r8429.
2005-12-09 23:27:55 +00:00
Galen Shipman
4fce90a37b one last warning fixed on 32 bit platforms.
This commit was SVN r8191.
2005-11-18 17:27:09 +00:00
Galen Shipman
635e7a682b fix for 32bit compile warnings.
This commit was SVN r8190.
2005-11-18 17:08:51 +00:00
Galen Shipman
dde38d4119 reset sg_entry->addr to point at header when sending control messages.
cast to uint64_t (the correct datatype per verbs.h) instead of uintptr_t. 

This commit was SVN r8175.
2005-11-17 05:45:33 +00:00
Tim Woodall
654ba6d262 srq cleanup
This commit was SVN r8106.
2005-11-10 23:29:54 +00:00
Tim Woodall
4a06e8463c port of flow control from mvapi
This commit was SVN r8102.
2005-11-10 20:15:02 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Galen Shipman
cb84a57c57 add endpoint and srq flow-control..
Note, we are failing the ring tests in the intel p2p test suite, but we seem
to fail the same tests under the current trunk.. will look into this further. 

This commit was SVN r7823.
2005-10-21 02:21:45 +00:00
Galen Shipman
eefe0fd04a fix threaded compile
fix misc warnings 
cleanup posting of receive descriptors 
comment why we retain before deregister in rcache_rb_mru.c 

This commit was SVN r7595.
2005-10-03 16:35:12 +00:00
Galen Shipman
f46548e691 Add SRQ support to OpenIB btl, removed old mca param - not used..
This commit was SVN r7585.
2005-10-02 18:58:57 +00:00
Galen Shipman
67d38b7896 Add multi-nic support to openib
Fix connection establishment race in openib 
Other misc 

This commit was SVN r7570.
2005-09-30 22:58:09 +00:00
Brian Barrett
997644af31 * There are now two forms of ibv_create_cq, one with 3 params and one with 5.
Try to detect which form this version of Open IB uses, defaulting to the 5
  version if we can't figure it out (the new version has 5 params)
* Only add -lcm if it exists on the system - some versions of Open IB
  apparently don't need it.

This commit was SVN r7542.
2005-09-29 13:35:57 +00:00
Galen Shipman
f0b1ea52bc if all else fails in prepare_src,, pack
init the rdma_pending list in ob1

This commit was SVN r7366.
2005-09-14 04:41:33 +00:00
Galen Shipman
d932cfd342 merge of rcache work into the trunk.. lotsa fun ;-)..
I regression tested before the merge, I will regression test tonight and
correct issues that might have crept in. 

This commit was SVN r7329.
2005-09-12 22:28:23 +00:00
Galen Shipman
afdfa70f73 Added support for openib RDMA READ.. note that performance is currently an
issue so PUT is default.. We are determining if this is an openib issue or a
btl issue as we have seen performance increases on mvapi. 

This commit was SVN r6928.
2005-08-18 17:08:27 +00:00
Galen Shipman
ee6999fa90 typo in threaded build..
This commit was SVN r6898.
2005-08-16 13:22:08 +00:00
Galen Shipman
8e1e2eec3d Misc fixes for threaded builds..
This commit was SVN r6874.
2005-08-14 19:03:09 +00:00
Galen Shipman
73757b300c Added BTL_VERBOSE and OMPI_MCA_btl_base_debug , if set to 1 DEBUG output if
set to 2 VERBOSE output.. 

This commit was SVN r6783.
2005-08-09 17:49:39 +00:00
Tim Woodall
2214f0502d - first cut at tcp btl (working but not optimal)
- reworked btl error logging macros

This commit was SVN r6701.
2005-08-02 13:20:50 +00:00
Galen Shipman
4c119def85 Misc fixes..
This commit was SVN r6583.
2005-07-21 20:26:17 +00:00
Galen Shipman
fd969ac833 More code cleanup.. Also converted post receive requests to macros..
This commit was SVN r6566.
2005-07-20 17:43:31 +00:00
Galen Shipman
946402b980 More openib cleanup.. still note ready for public consumption ;-)
This commit was SVN r6565.
2005-07-20 15:17:18 +00:00
Galen Shipman
2f67ab82bb Working version of openib btl ;-)
Fixed receive descriptor counts that limited mvapi and openib to 2 procs.                                                   
Begin porting error messages to use the BTL_ERROR macro. 

This commit was SVN r6554.
2005-07-19 21:04:22 +00:00
Galen Shipman
5af3cc8045 carryover mvapi mpool changes to openib
This commit was SVN r6525.
2005-07-15 16:05:05 +00:00
Galen Shipman
b75560796c Fix up error handling in openib.. Added a simple debug test for memory
registration.. 

This commit was SVN r6520.
2005-07-15 15:13:19 +00:00
Galen Shipman
c1c4a5efba Compiling checkin of openib btl and mpool..
This commit was SVN r6452.
2005-07-13 00:17:08 +00:00
Galen Shipman
d7bdc46ac9 compile error and warining fixes for openib..
This commit was SVN r6449.
2005-07-12 21:49:30 +00:00
Galen Shipman
454fdff824 Initial commit of changes to the mvapi btl to the openib btl. Still need to
work on the configure.stub to correctly locate the ib libraries. 

This commit was SVN r6435.
2005-07-12 13:38:54 +00:00
Brian Barrett
e55f99d23a * rename ompi_if to opal_if
* rename ompi_malloc to opal_malloc
* rename ompi_numtostr to opal_numtostr
* start of rename of ompi_environ to opal_environ

This commit was SVN r6332.
2005-07-04 01:36:20 +00:00
Brian Barrett
a13166b500 * rename ompi_output to opal_output
This commit was SVN r6329.
2005-07-03 23:31:27 +00:00
Brian Barrett
39dbeeedfb * rename locking code from ompi to opal
This commit was SVN r6327.
2005-07-03 22:45:48 +00:00
Brian Barrett
761402f95f * rename ompi_list to opal_list
This commit was SVN r6322.
2005-07-03 16:22:16 +00:00