Commit graph

1152 commits

Author SHA1 Message Date
Jeff Squyres
671f0c379d Remove a whole pile of orte/util/show_help.h's that I missed. :-(
This commit was SVN r18437.
2008-05-14 11:32:33 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we have already done the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not the opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so if you explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent via the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing of messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to make configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
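The duplicate suppression that the r18434 message describes for orte_show_help() can be sketched as follows. This is a minimal Python illustration and not the ORTE implementation: the class name `HelpAggregator`, its methods, the explicit `now` argument, and the wording of the summary line are all hypothetical. Only the behaviour comes from the commit message: show the first copy, count later duplicates, and report the count roughly every 5 seconds.

```python
class HelpAggregator:
    """Toy model of HNP-side help-message aggregation (all names hypothetical)."""

    def __init__(self, interval=5.0):
        self.interval = interval   # seconds between duplicate summaries
        self.seen = set()          # (filename, topic) pairs already displayed
        self.new_copies = {}       # (filename, topic) -> duplicates not yet reported
        self.last_summary = 0.0    # time of the last summary, same clock as `now`

    def submit(self, filename, topic, text, now):
        """Return the lines that would be displayed for this message."""
        key = (filename, topic)
        if key not in self.seen:
            # First copy of this message: display it immediately.
            self.seen.add(key)
            return [text]
        # Duplicate: count it, then emit a summary only if one is due.
        self.new_copies[key] = self.new_copies.get(key, 0) + 1
        return self.flush(now)

    def flush(self, now):
        """Emit one summary line per message that gained new copies."""
        if not self.new_copies or now - self.last_summary < self.interval:
            return []
        lines = [
            f"{count} new copies of help message {filename}:{topic}"
            for (filename, topic), count in sorted(self.new_copies.items())
        ]
        self.new_copies.clear()
        self.last_summary = now
        return lines
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real implementation would more likely drive the summary from a timer event.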
Jon Mason
125eb5a2ed Convert from the Linux ifaddrs to the OMPI ifaddrs, which should unbreak Solaris.
This commit was SVN r18433.
2008-05-13 18:34:22 +00:00
Jeff Squyres
d8e5608053 Remove all retransmission code; the IBCM kernel module handles all of
that for us.

This commit was SVN r18432.
2008-05-13 16:10:34 +00:00
Jon Mason
74bf1ae25f Fix compiler warnings
This commit was SVN r18431.
2008-05-13 16:01:58 +00:00
Jon Mason
4ead9442b5 Add in IDs for all Chelsio iWARP capable adapters
This commit was SVN r18428.
2008-05-12 21:59:03 +00:00
Jeff Squyres
6b26895ad4 A little style update -- constants on the left...
This commit was SVN r18426.
2008-05-12 12:05:16 +00:00
Jeff Squyres
16cde0e5fa Fix compile error on older OFED systems
This commit was SVN r18425.
2008-05-12 11:56:14 +00:00
Gleb Natapov
6844ff32ba Return OMPI_ERR_RESOURCE_BUSY from sm->btl_send() function if there is no place in cb. This will prevent OB1 from doing early completion of small sends.
This commit was SVN r18424.
2008-05-12 07:15:29 +00:00
Gleb Natapov
0827e537fa Don't include rdma/rdma_cma.h if !OMPI_HAVE_RDMACM.
This commit was SVN r18422.
2008-05-11 11:58:02 +00:00
Jon Mason
99ab66e131 RDMACM code cleanup
This patch adds some much needed comments, reduces the amount of code
wrapping, and rearranges and removes redundant code.

This commit was SVN r18417.
2008-05-08 21:20:12 +00:00
Jon Mason
88e5f2a339 Abstract iWARP subnet ID functions (sans build break)
The iWARP subnet ID determination should not be in the RDMACM cpc, as
it was in the previous version, as this violates the cpc abstraction that is
present throughout the code.  Also, this patch uses the opal_list_t
data struct instead of using its own linked lists.

This attempt includes *iwarp.c and *iwarp.h

This commit was SVN r18414.
2008-05-08 14:38:14 +00:00
Jeff Squyres
60f39a30f6 Revert r18409; that commit broke the build because it forgot to add
the btl_openib_iwarp.c and btl_openib_iwarp.h files.

This commit was SVN r18410.

The following SVN revision numbers were found above:
  r18409 --> open-mpi/ompi@056bbb68c8
2008-05-08 00:22:21 +00:00
Jon Mason
056bbb68c8 Abstract iWARP subnet ID functions
The iWARP subnet ID determination should not be in the RDMACM cpc, as
it was in the previous version, as this violates the cpc abstraction that is
present throughout the code.  Also, this patch uses the opal_list_t
data struct instead of using its own linked lists.

This commit was SVN r18409.
2008-05-07 23:59:43 +00:00
Ralph Castain
7c7b9b0486 Do a little cleanup on the opal graph class and opal carto framework to conform to OMPI naming conventions and avoid potential conflict with user applications - no change in functionality, passes carto test program
This commit was SVN r18407.
2008-05-07 19:33:49 +00:00
Jeff Squyres
157cea378f * A few fixes to make IP address and port number comparisons work properly
* A few indenting and style fixes

This commit was SVN r18405.
2008-05-07 16:56:07 +00:00
Jeff Squyres
bfae8ea828 The comment wasn't long enough; I felt the need to make it longer (and
explain a little more ;-) ).

This commit was SVN r18404.
2008-05-07 16:53:05 +00:00
Jeff Squyres
63abb3eb9b Clarify a comment / fix typos.
This commit was SVN r18402.
2008-05-07 14:51:36 +00:00
Jon Mason
502d164908 Create subnet ID's for iWARP.
This enables subnet differentiation for iWARP devices, and rearranges
initialization so that the services are available when they are needed.

This commit was SVN r18393.
2008-05-06 22:43:52 +00:00
Jon Mason
9c724128f8 Handle no IP Address in rdmacm more resiliently
If there is no IP Address, have rdmacm log the correct error and let
another cpc have a go at it.  This is being done by splitting off the
IP address checking logic for the modex message creation, and having
it log the correct error in the error case.

This commit was SVN r18392.
2008-05-06 22:31:29 +00:00
Jon Mason
46bfd42c09 Fix compile warnings in rdmacm
Fix some reported compiler warnings and make the code a little prettier.

This commit was SVN r18391.
2008-05-06 22:19:28 +00:00
Jon Mason
9066168cd1 Prevent iWARP qp flush errors.
For iWARP, the TCP connection is tied to the QP once the QP is in RTS.  
And destroying the QP is thus tied to connection teardown for iWARP.  
This is a key distinction from IB, I think.   Anyway, to destroy the 
connection in iWARP you must move the QP out of RTS, either into CLOSING 
for a nice graceful close, or to ERROR if you want to be rude.  In both 
cases, all pending non-completed SQ and RQ WRs must be flushed.

This patch ignores all flush errors reaped by the cq and removes an
earlier attempt to work around this in the rdmacm cpc.

This commit was SVN r18388.
2008-05-06 21:57:40 +00:00
Jeff Squyres
a06d4023b8 Oops -- missed one sys_errlist -> strerror().
This commit was SVN r18378.
2008-05-06 13:22:36 +00:00
Jeff Squyres
4154e587de strerror() is much better.
This commit was SVN r18376.
2008-05-05 21:06:07 +00:00
Jon Mason
a3bf503e01 Remove error on rdma cm
If there are multiple QP's, RDMACM will not send a message if the
qpnum != 0.  In doing so, it will log an error unnecessarily.  This
removes that.

This commit was SVN r18363.
2008-05-02 20:12:01 +00:00
Jon Mason
3989981578 Enable support of num_proc > num_nodes
Add the logic to support using port numbers, instead of simply using
the IP address of the sending node to determine which endpoint to
connect.  Since each process calls the cpc query function, it will
generate its own port to listen on, thus enabling this to work.

This commit was SVN r18362.
2008-05-02 16:20:28 +00:00
Jeff Squyres
ba5615a18f Merge in /tmp-public/cpc3 branch to trunk. oob/xoob still remains the
default CPC.

This commit was SVN r18356.
2008-05-02 11:52:33 +00:00
Donald Kerr
843a35094f adding local work queue accounting
This commit was SVN r18352.
2008-05-01 21:01:51 +00:00
George Bosilca
a69ac964df Allow any order in the list of Elan vpid.
This commit was SVN r18350.
2008-05-01 20:32:03 +00:00
Pavel Shamis
61cc8843bf The r17940 broke the XRC code.
The endpoint may be appended to list during XOOB connection bring up.

This commit was SVN r18328.

The following SVN revision numbers were found above:
  r17940 --> open-mpi/ompi@ebfdd133f5
2008-04-29 13:22:40 +00:00
Brad Penoff
c699236be2 updating SCTP BTL to configure properly with FreeBSD 7
This commit was SVN r18324.
2008-04-28 04:19:10 +00:00
Adrian Knoth
c53d3c3c22 reverted r18169,r18170 due to connection reset by peer on odin/sif
This commit was SVN r18255.

The following SVN revision numbers were found above:
  r18169 --> open-mpi/ompi@20473bfda2
  r18170 --> open-mpi/ompi@d34dfbe12c
2008-04-23 15:26:15 +00:00
Jeff Squyres
c40740947f Fix minor spelling error.
This commit was SVN r18229.
2008-04-22 13:11:50 +00:00
Galen Shipman
27c425b304 make portals level ack's optional (require ACK by default)
This commit was SVN r18228.
2008-04-21 22:22:18 +00:00
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
Adrian Knoth
d34dfbe12c fixed misleading comment.
This commit was SVN r18170.
2008-04-16 11:26:15 +00:00
Adrian Knoth
20473bfda2 on incoming connections, compare with every possible source address.
Rationale (taken from the code):

    /* This is PITA. We never know which source address an 
    * incoming/outgoing packet will have, so even with 
    * btl_tcp_if_include/exclude on the remote end, we 
    * might get a different source address. 
    * 
    * If this address isn't included in btl_proc->proc_addrs, 
    * we would erroneously drop the connection 
    */ 

merge -r18165:18167 to the trunk.

This commit was SVN r18169.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r18165
  r18167
2008-04-16 11:24:09 +00:00
Adrian Knoth
e981a259bb btl_tcp_disable_family=4 and btl_tcp_disable_family=6 are mutually
exclusive, so this should result in "unreachable" when set differently
between peers.

This commit was SVN r18168.
2008-04-16 10:14:58 +00:00
Adrian Knoth
75c54616c7 renamed opal_sockaddr2str to opal_net_get_hostname for WANT_PEER_DUMP=1
This commit was SVN r18154.
2008-04-15 19:23:47 +00:00
Jeff Squyres
72af302360 Remove unused variable.
This commit was SVN r18151.
2008-04-15 14:58:32 +00:00
Aurelien Bouteiller
0f311ed824 Make sure the function returns NULL when no elan adapter is available instead of a random value.
This commit was SVN r18136.
2008-04-11 21:03:01 +00:00
Aurelien Bouteiller
20592cbcbf Fixes a warning about mallocing 0 bytes when no elan adapter is available.
This commit was SVN r18135.
2008-04-11 20:59:12 +00:00
Jon Mason
08ead87604 Potential double free of locks
mca_btl_openib_endpoint_post_rr_nolock is freeing the endpoint lock on
the error case, but most/all of the functions calling this free the lock
regardless of its error case.  Thus resulting in a double free of the
lock.

This commit was SVN r18131.
2008-04-10 21:15:01 +00:00
Donald Kerr
38e298cc9a report error message in all libs, not just debug
This commit was SVN r18103.
2008-04-08 22:58:28 +00:00
Gleb Natapov
713a27dc71 Counter of created RDMA channels should be incremented immediately after channel
creation (not in control message completion) otherwise more than max_eager_rdma
channel may be created.

This commit was SVN r18082.
2008-04-06 13:48:45 +00:00
Jeff Squyres
7072a32703 * Properly protect XRC stuff
* A few minor style fixes

This commit was SVN r18076.
2008-04-02 19:52:03 +00:00
George Bosilca
944453c4c1 Cleanups.
This commit was SVN r18068.
2008-04-02 06:37:42 +00:00
Jeff Squyres
d0f12f3df0 Make a better error message.
This commit was SVN r18014.
2008-03-29 12:54:24 +00:00
George Bosilca
be4b153f0d Another patch for thread safety in the TCP BTL (thanks to Pierre).
This commit was SVN r17993.
2008-03-27 18:36:08 +00:00
Jeff Squyres
5320c91ab3 Oops -- fix the constructor to also use opal_object_t instead of
opal_list_item_t.

This commit was SVN r17945.
2008-03-25 11:59:50 +00:00
Jeff Squyres
ebfdd133f5 AFAICT, we never put endpoints on a list.
This commit was SVN r17940.
2008-03-24 18:32:55 +00:00
Ralph Castain
dc7f45dafd Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.

This commit was SVN r17926.
2008-03-23 23:10:15 +00:00
Galen Shipman
dcac824f59 Fix problem in releasing fragments during GET_END event (didn't check that
portals btl has ownership and therefore didn't free the frag as it should); this
causes leakage and hangs in MPI_Finalize. 

Also added a bit more debugging. 

This commit was SVN r17900.
2008-03-20 22:46:32 +00:00
George Bosilca
1d04ec4ded Correct the connection logic for TCP. Now we have not only a cleaner
connection, but a more thread safe one. Thanks to Pierre for his
help on this.

This commit was SVN r17853.
2008-03-18 02:42:16 +00:00
Gleb Natapov
9b6db25182 Fix compilation warning.
This commit was SVN r17839.
2008-03-17 13:37:57 +00:00
Pavel Shamis
54ad8d7446 The issue was reported/fixed by Jon Mason one month ago but the fix was not committed. So I'm committing it now.
This commit was SVN r17835.
2008-03-17 11:13:06 +00:00
Brad Penoff
be13b86fc5 Clarifying and fixing SCTP btl_sctp_if_11 parameter
This commit was SVN r17834.
2008-03-17 09:18:31 +00:00
Gleb Natapov
f488b94899 More SM BTL initialization cleanups.
This commit was SVN r17833.
2008-03-16 10:01:56 +00:00
Jeff Squyres
6c77c995c2 Add missing dependencies in the static build case.
This commit was SVN r17825.
2008-03-15 12:11:36 +00:00
George Bosilca
5e229fe688 Thanks Ma for the patch. Correct the multi-rail support and
rename some fields to something more clear.

This commit was SVN r17824.
2008-03-14 19:17:28 +00:00
George Bosilca
ecebd5ae77 Update the Elan BTL to take into account multiple networks, and correctly deal
with the node position in the network.

This commit was SVN r17822.
2008-03-14 17:32:35 +00:00
Gleb Natapov
772772b944 Remove unneeded include.
This commit was SVN r17813.
2008-03-12 10:01:20 +00:00
Gleb Natapov
90c70e37b9 Clean up SM btl startup code. Remove no longer needed code leftovers from two
BTL times. Remove old and no longer correct comment.

This commit was SVN r17805.
2008-03-11 14:39:10 +00:00
Gleb Natapov
ffa09c44fd Pass correct pointer to mpool_base function.
This commit was SVN r17795.
2008-03-09 13:22:12 +00:00
Gleb Natapov
b0b21c68b4 Remove trailing spaces from SM BTL.
This commit was SVN r17794.
2008-03-09 13:17:13 +00:00
Tim Prins
5de3e1965e Remove the orte_proc_table. Migrate all users of it to the opal_hash_table and a new name hash function in orte.
Everything should work, however I am unable to compile and test the sctp BTL.

This commit was SVN r17751.
2008-03-05 22:44:35 +00:00
Donald Kerr
ef8f807c1c was not passing correct variable to dat_strerror
This commit was SVN r17749.
2008-03-05 21:45:16 +00:00
Jeff Squyres
ea5c0cb4a2 Now that the nightly tarball has safely been made, let's try this
commit again.  Remove the svn:ignore from problematic directories and
try a merge from /tmp-public/plpa-merge-area2.

This commit was SVN r17718.
2008-03-05 02:45:15 +00:00
Jeff Squyres
8189fcc7d5 Back out r17702; it went very badly.
This commit was SVN r17704.

The following SVN revision numbers were found above:
  r17702 --> open-mpi/ompi@3df754ebd7
2008-03-05 00:42:39 +00:00
Jeff Squyres
3df754ebd7 Bring over PLPA v1.1 from /tmp-public/plpa-v1.1 branch.
This commit was SVN r17702.
2008-03-05 00:16:49 +00:00
Christian Bell
c3d0a81cd3 Add new QLogic adapters to hca-params.init
This commit was SVN r17699.
2008-03-04 22:14:27 +00:00
Gleb Natapov
08abafdaa1 Initialize ib_pd to NULL.
This commit was SVN r17674.
2008-03-02 09:11:23 +00:00
Tim Prins
84b2099fe8 Remove the now-unused orte_value_array. As this is the last 'class' split between orte and ompi, remove the big comment about the split in ompi_bitmap.
Also, update some properties (source files should not be executeable...), and remove a couple unneeded inclusions of orte_proc_table.h

This commit was SVN r17655.
2008-02-28 21:39:42 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Galen Shipman
44003a41f2 Update common_portals to allow using portals interconnect with a modex rather
than relying on cnos to get the nid/pid map. 

This commit was SVN r17588.
2008-02-25 19:17:21 +00:00
Brian Barrett
bc8d863ce3 * Make Portals BTL compile again (looks like the frag ownership stuff didn't
get copied well)
* Clean up a bunch of warnings

This commit was SVN r17562.
2008-02-23 01:45:36 +00:00
Donald Kerr
437e280829 removing a few superfluous casts when the base or super is available
This commit was SVN r17554.
2008-02-22 20:10:55 +00:00
Donald Kerr
fe51084d8e fix compile warning by casting btl udapl module to base module before call to mca_btl_udapl_free
This commit was SVN r17541.
2008-02-21 16:19:06 +00:00
Pierre Lemarinier
2a99f89631 Modification of the mutex lock order to prevent races during connection stage.
This commit was SVN r17535.
2008-02-20 18:17:58 +00:00
Pavel Shamis
a0d12a9c92 Adding support for APM over different ports
This commit was SVN r17521.
2008-02-20 13:44:05 +00:00
Gleb Natapov
60c151608c Set flags inside fragment allocation function.
This commit was SVN r17508.
2008-02-19 12:26:45 +00:00
Nysal Jan
479f36adfc Fix a SEGV on ppc64. size_t is 8 bytes on a 64-bit build
This commit was SVN r17507.
2008-02-19 11:01:21 +00:00
Jeff Squyres
f22f62ef1f Fix typos.
This commit was SVN r17502.
2008-02-18 21:26:21 +00:00
Jeff Squyres
33a4aff18e Make openib btl a bit more resilient in the face of driver errors --
return OMPI_ERR_UNREACH if the port returns an invalid speed or
width.  OMPI_ERR_VALUE_OUT_OF_BOUNDS is reserved for when we exceed
the number of allowable BTLs.

This commit was SVN r17500.
2008-02-18 20:28:06 +00:00
George Bosilca
7a21d77b29 Remove some compilation warnings.
This commit was SVN r17498.
2008-02-18 18:55:32 +00:00
George Bosilca
fa31ec81d0 Add the ownership flags to the PML/BTL interface. The layer
owning the descriptor is responsible for releasing it once
the descriptor is not in use anymore.

This commit was SVN r17497.
2008-02-18 17:39:30 +00:00
George Bosilca
be2579467a With the new ompi_free_list this is not needed anymore.
This commit was SVN r17465.
2008-02-15 03:22:16 +00:00
Donald Kerr
58bf7f5a1d add uintptr_t to prevent the possibility of a sign extension occurring
This commit was SVN r17456.
2008-02-14 19:16:34 +00:00
Jeff Squyres
6420db7088 Add missing header file that caused compilation errors in the
rhc-step2b branch last night.

This commit was SVN r17453.
2008-02-14 14:10:27 +00:00
George Bosilca
255cd2186b Improve the performance of the MX BTL. Correct the fake PUT
protocol.

This commit was SVN r17452.
2008-02-14 04:38:55 +00:00
Adrian Knoth
f1648f08df Advanced address selection code from Thomas Peiselt. Re #1207, #1027
This commit was SVN r17450.
2008-02-13 21:53:00 +00:00
Sharon Melamed
5b2dab2439 Reverted commit # r17443
This commit was SVN r17446.

The following SVN revision numbers were found above:
  r17443 --> open-mpi/ompi@88ce5a2b73
2008-02-13 14:07:12 +00:00
Sharon Melamed
88ce5a2b73 Replaced PLPA to the latest PLPA (plpa-1.1a3r123)
This commit was SVN r17443.
2008-02-13 13:09:11 +00:00
Rainer Keller
7621800477 - Fix and add comments -- output full name for pd
- Protect argument in macro...

This commit was SVN r17434.
2008-02-12 16:59:59 +00:00
Gleb Natapov
cf801edfe5 Use carto topology framework to choose which HCAs to use.
This commit was SVN r17414.
2008-02-11 10:34:11 +00:00
George Bosilca
ee321748a6 The lost space.
This commit was SVN r17413.
2008-02-10 22:08:49 +00:00
Pavel Shamis
df787bbeab Fixing compilation issue on machines with ofed under 1.3.
Also a fix in the apm migration flow.

This commit was SVN r17383.
2008-02-06 13:54:58 +00:00
Pavel Shamis
3ba3f70624 Adding apm support for xrc.
This commit was SVN r17382.
2008-02-06 10:19:51 +00:00
Gleb Natapov
03c80bdfe3 Fix old libibverbs case.
This commit was SVN r17370.
2008-02-04 14:05:01 +00:00
Pavel Shamis
f0c478e7e0 XRC - replacing the new old API with new one.
This commit was SVN r17369.
2008-02-04 14:03:38 +00:00
Gleb Natapov
67f752dd50 Add compatibility function between old libibverbs and current libibverbs
way of detecting HCAs.

This commit was SVN r17365.
2008-02-03 15:16:24 +00:00
George Bosilca
3a6d2e3894 The latest and greatest Elan improvements.
This commit was SVN r17361.
2008-02-01 21:29:57 +00:00
Gleb Natapov
f73adf69c0 Fix compiler warnings on 32bit systems.
This commit was SVN r17346.
2008-01-31 09:05:25 +00:00
Adrian Knoth
8ae4a10b4c Reverted r17331, r17332. Still broken. I'm in a bad hurry. :-( Re #1206
This commit was SVN r17333.

The following SVN revision numbers were found above:
  r17331 --> open-mpi/ompi@3846e2a797
  r17332 --> open-mpi/ompi@c03de08c55
2008-01-30 16:51:55 +00:00
Adrian Knoth
c03de08c55 Logic is wrong. I'm going to revert it again. Re #1206
This commit was SVN r17332.
2008-01-30 16:48:50 +00:00
Adrian Knoth
3846e2a797 When checking incoming connections, also care about aliased interfaces.
Re #1206

This commit was SVN r17331.
2008-01-30 16:45:41 +00:00
Adrian Knoth
7f79c68930 Reverted r17307 and r17308. It broke parallel TCP connections. Re #1206
This commit was SVN r17329.

The following SVN revision numbers were found above:
  r17307 --> open-mpi/ompi@7a59b3f58c
  r17308 --> open-mpi/ompi@72b29bc21f
2008-01-30 14:31:47 +00:00
Adrian Knoth
72b29bc21f Cosmetic patch. Use IN6_ARE_ADDR_EQUAL instead of memcmp(). Re #1206.
This commit was SVN r17308.
2008-01-29 16:02:24 +00:00
Adrian Knoth
7a59b3f58c accept incoming connections from hosts with multiple addresses.
We loop over all peer addresses and accept when one of them matches.
Note that this might break functionality: mca_btl_tcp_proc_insert now
always inserts the same endpoint. (is the lack of endpoints the problem?
should there be one for every remote address?)

Re #1206

This commit was SVN r17307.
2008-01-29 15:55:56 +00:00
Pavel Shamis
7b59f8ae0b Fixing warning in apm code.
This commit was SVN r17306.
2008-01-29 15:45:18 +00:00
Gleb Natapov
bb03e07ec4 Move eager RDMA channels accounting into completion callback. Otherwise it
can go wrong with XRC as endpoint may be not yet connected at the time
eager rdma channel is created.

This commit was SVN r17302.
2008-01-29 14:35:33 +00:00
Pavel Shamis
92ef832472 Making sure that XRC will not overrun ib_dev_attr.max_qp_wr
This commit was SVN r17300.
2008-01-29 13:15:21 +00:00
Pavel Shamis
7d83f34eb0 Protecting the apm code with OMPI_HAVE_THREADS.
This commit was SVN r17284.
2008-01-28 16:10:18 +00:00
Jeff Squyres
6a49c97368 Remove erroneous #if
This commit was SVN r17282.
2008-01-28 14:38:03 +00:00
Pavel Shamis
28a3917306 Adding APM support (over different lids).
This commit was SVN r17280.
2008-01-28 10:38:08 +00:00
George Bosilca
3418485085 Replace the tport by a queue.
This commit was SVN r17221.
2008-01-25 01:15:18 +00:00
Donald Kerr
66acac8ff3 the value for invalid idx was just plain wrong, a more appropriate value is now used
This commit was SVN r17201.
2008-01-24 15:01:26 +00:00
Jeff Squyres
2227d5ec4a Add configure check for struct ibv_device.transport type, which was added in OFED v1.2. Still need to fix up oob and rdma_cm cpc's to do something better with this information...
This commit was SVN r17198.
2008-01-24 12:14:21 +00:00
Gleb Natapov
52c94fa7ea Fix compilation warnings.
This commit was SVN r17169.
2008-01-21 15:07:39 +00:00
Gleb Natapov
c9a1b06771 Remove trailing whitespaces. No code changes in this commit.
This commit was SVN r17167.
2008-01-21 12:11:18 +00:00
George Bosilca
170416797d This commit was SVN r17162.
2008-01-18 20:10:57 +00:00
George Bosilca
0081202195 Mark the receives as ELAN_TPORT_RXBUF | ELAN_TPORT_RXANY ...
This commit was SVN r17161.
2008-01-18 20:00:44 +00:00
George Bosilca
bf299bb833 Keep most of the functions as static. Improve the progress function. Get rid
of all internal queues that are not really useful.

This commit was SVN r17160.
2008-01-18 19:28:50 +00:00
Donald Kerr
5f884b1ca4 fix for #1130 - adds support for multi-rail configurations
This commit was SVN r17152.
2008-01-17 17:30:50 +00:00
Donald Kerr
908b514ac5 update use of internal tag values to accommodate the active message change found in r17140
This commit was SVN r17148.

The following SVN revision numbers were found above:
  r17140 --> open-mpi/ompi@6310ce955c
2008-01-16 21:17:25 +00:00
Pavel Shamis
add4d9df8a XRC fixes for MPI2 dynamics.
This commit was SVN r17144.
2008-01-15 21:14:48 +00:00
Jeff Squyres
251842ff6a Remove this AS_IF -- it breaks "make dist".
This commit was SVN r17143.
2008-01-15 12:33:08 +00:00
George Bosilca
e8ac5ff04d Typos.
This commit was SVN r17141.
2008-01-15 05:37:42 +00:00
George Bosilca
6310ce955c The first patch related to the Active Message stuff. So far, here is what we have:
- the registration array is now global instead of one per BTL.
- each framework has to declare the entries it reserves in the registration array. Then
  it has to define the internal way of sharing (or not) these entries between all
  components. As an example, the PML will not share as there is only one active PML
  at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3
  are reserved for the framework while the remaining 5 are used internally by each
  framework.
- The registration function is optional. If a BTL does not provide such a function,
  nothing happens. However, in the case where such a function is provided in the BTL
  structure, it will be called by the BML when a tag is registered.

Now, it's time for the second step... Converting OB1 from a switch based PML to an
active message one.

This commit was SVN r17140.
2008-01-15 05:32:53 +00:00
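The 8-bit tag layout described in r17140 (3 bits for the framework, 5 bits used internally by each framework) can be illustrated with a little bit arithmetic. This is a sketch under assumptions: the commit message says only that "the first 3" bits belong to the framework, so placing them in the high-order bits is a guess, and the names `make_tag` and `split_tag` are hypothetical, not Open MPI functions.

```python
FRAMEWORK_BITS = 3                          # from the commit message
INTERNAL_BITS = 5                           # from the commit message
INTERNAL_MASK = (1 << INTERNAL_BITS) - 1    # 0x1f


def make_tag(framework_id, internal_id):
    """Pack a framework id and a per-framework id into one 8-bit tag.

    Assumption: the framework id occupies the three high-order bits.
    """
    if not 0 <= framework_id < (1 << FRAMEWORK_BITS):
        raise ValueError("framework_id must fit in 3 bits")
    if not 0 <= internal_id <= INTERNAL_MASK:
        raise ValueError("internal_id must fit in 5 bits")
    return (framework_id << INTERNAL_BITS) | internal_id


def split_tag(tag):
    """Return (framework_id, internal_id) for an 8-bit tag."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must fit in 8 bits")
    return tag >> INTERNAL_BITS, tag & INTERNAL_MASK
```

With this split there is room for 8 frameworks, each with 32 private tag values.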
Jon Mason
a0d4122606 The new cpc selection framework is now in place. The patch below allows
for dynamic selection of cpc methods based on what is available.  It
also allows for inclusion/exclusions of methods.  It even further allows
for modifying the priorities of certain cpc methods to better determine
the optimal cpc method.

This patch also contains XRC compile time disablement (per Jeff's
patch).

At a high level, the cpc selection works by walking through each cpc
and allowing it to test to see if it is permissible to run on this
mpirun.  It returns a priority if it is permissible or a -1 if not.  All
of the cpc names and priorities are rolled into a string.  This string
is then encapsulated in a message and passed around all the ompi
processes.  Once received and unpacked, the list received is compared
to a local copy of the list.  The connection method is chosen by
comparing the lists passed around to all nodes via modex with the list
generated locally.  Any non-negative number is a potentially valid
connection method.  The method below of determining the optimal
connection method is to take the cross-section of the two lists.  The
highest single value (and the other side being non-negative) is selected
as the cpc method.

svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ .

This commit was SVN r17138.
2008-01-14 23:22:03 +00:00
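The selection rule in r17138 (intersect the local and remote priority lists, then take the method with the highest single value whose other side is non-negative) can be sketched as below. This is an illustration only, not the openib BTL code: the function name `select_cpc`, the dictionary representation, and the example priorities are hypothetical, and breaking ties by name is an assumption added so that both peers reach the same answer.

```python
def select_cpc(local, remote):
    """Pick a connection method from two {name: priority} maps.

    A priority of -1 means the method is not permissible on that side.
    Returns the chosen method name, or None if the peers share no usable method.
    """
    candidates = [
        (max(local[name], remote[name]), name)
        for name in local.keys() & remote.keys()      # the cross-section of the lists
        if local[name] >= 0 and remote[name] >= 0     # both sides must allow it
    ]
    if not candidates:
        return None
    best = max(priority for priority, _ in candidates)
    # Assumed tie-break: alphabetical order, so the result is symmetric.
    return min(name for priority, name in candidates if priority == best)


# Hypothetical priorities for illustration only.
local_list = {"oob": 50, "xoob": -1, "rdmacm": 30}
remote_list = {"oob": 40, "xoob": 60, "rdmacm": 70}
chosen = select_cpc(local_list, remote_list)
```

Because the score is the maximum of the two sides and the tie-break does not depend on argument order, swapping `local` and `remote` yields the same method.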
Pavel Shamis
6e50fca2dd Fixing permissions for XRC domain file.
This commit was SVN r17127.
2008-01-13 19:23:11 +00:00
Jon Mason
626e0814a2 Style clean-up
This commit was SVN r17126.
2008-01-12 18:47:17 +00:00
Jon Mason
3970c3ff6c Add Chelsio T3 to ompi/mca/btl/openib/mca-btl-openib-hca-params.ini
This commit was SVN r17101.
2008-01-09 22:14:18 +00:00
Jon Mason
597c7e68f1 Minor cleanups
This commit was SVN r17100.
2008-01-09 21:54:11 +00:00
Rolf vandeVaart
870fa8b1f1 Pad the sm btl header to double-word alignment. Preserves PML
header as double-word aligned and prevents bus errors on SPARC
based servers.  This is part of fix for #1148.

Refs trac:1148

This commit was SVN r17090.

The following Trac tickets were found above:
  Ticket 1148 --> https://svn.open-mpi.org/trac/ompi/ticket/1148
2008-01-09 18:50:51 +00:00
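The padding fix in r17090 amounts to rounding the header size up to the next multiple of the alignment, so that whatever follows the header (here, the PML header) starts on an aligned address. A minimal sketch of that arithmetic, assuming "double-word" means 8 bytes; the function name `pad_to_alignment` is hypothetical and the sizes are not taken from the sm BTL:

```python
def pad_to_alignment(size, alignment=8):
    """Round `size` up to the next multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    if size < 0:
        raise ValueError("size must be non-negative")
    # Adding alignment-1 and clearing the low bits rounds up without a branch.
    return (size + alignment - 1) & ~(alignment - 1)
```

For example, a 12-byte header would be padded to 16 bytes, while a 16-byte header is left unchanged.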
Gleb Natapov
25ce70bb92 Call mca_btl_openib_endpoint_post_send() holding endpoint lock and not holding
qp lock since this is what the function assumes.

This commit was SVN r17086.
2008-01-09 14:46:41 +00:00
Pavel Shamis
99f51482e3 Fixing openib finalization flow.
This commit was SVN r17085.
2008-01-09 12:36:30 +00:00
Gleb Natapov
51d6ca0cb6 Provide no lock version of mca_btl_openib_endpoint_post_rr(). On connection
creation we call it with endpoint lock already held.

This commit was SVN r17084.
2008-01-09 10:39:35 +00:00
Gleb Natapov
50af6b9e78 Rearrange functions order so that functions are defined before they are used. No
code changes here.

This commit was SVN r17083.
2008-01-09 10:27:15 +00:00
Gleb Natapov
621fa223c5 Create free lists of fragments per HCA, not per BTL. Saves memory in case of
multiple LMCs.

This commit was SVN r17082.
2008-01-09 10:26:21 +00:00
Gleb Natapov
5ce3213158 Rearrange functions order so that functions are defined before they are used. No
code changes here.

This commit was SVN r17081.
2008-01-09 10:05:41 +00:00
Pavel Shamis
fbf7bcd9a9 We need to prepost on srq/xrc before reply with ENDPOINT_XOOB_CONNECT_XRC_RESPONSE.
This commit was SVN r17066.
2008-01-08 10:30:16 +00:00
Rolf vandeVaart
0f0fde3490 Partial fix for #1148. Enable this for 32-bit sparc as well as 64-bit sparc.
This commit was SVN r17059.
2008-01-07 15:43:44 +00:00
Gleb Natapov
c3bbf69356 Set send_flags correctly in btl_openib_put. Otherwise we may reuse flags from
previous use of the buffer and they may be incorrect.

This commit was SVN r17058.
2008-01-07 10:19:07 +00:00
George Bosilca
48f5a26e8c Cast to keep VC happy (quiet).
This commit was SVN r17054.
2008-01-04 23:13:32 +00:00
Jeff Squyres
a234ba198a Remove superfluous / unused -D from Makefile.am.
This commit was SVN r17030.
2008-01-02 18:00:20 +00:00
Jeff Squyres
c9bea80f8f Fix unbalanced parentheses noticed by Paul Hargrove.
This commit was SVN r17029.
2008-01-02 13:34:07 +00:00
Gleb Natapov
2fb6947f88 Destroy endpoints that use eager rdma communication before destroying SRQ. Don't
skip async event thread destruction if SRQ was not destroyed, or it will segfault
on module removal.

This commit was SVN r17025.
2007-12-23 13:58:31 +00:00
Gleb Natapov
b06d92bdab OpenIB BTL has three channels through which data can be received (eager rdma,
high prio QPs and low prio QPs) and because not all of them are polled each time
progress() is called (to save on latency) starvation is possible. The commit
fixes this. Now each channel is polled, but higher priority channels are polled
more often. Three new parameters are introduced that control polling ratios 
between different channels.

This commit was SVN r17024.
2007-12-23 12:29:34 +00:00
Brad Penoff
4c2571b54c fixed more 64 bit SCTP BTL warnings
This commit was SVN r17022.
2007-12-21 21:50:00 +00:00