Commit graph

1152 commits

Author SHA1 Message Date
Jeff Squyres
b45d59ea2e Adjust API usage for new parameter in mca_base_param_lookup_source()
This commit was SVN r19115.
2008-07-31 21:56:20 +00:00
George Bosilca
7f01be2830 Remove useless defines. The MCA was bumped to 2.0, there is no reason to keep
these defines around.

This commit was SVN r19091.
2008-07-30 10:40:18 +00:00
Donald Kerr
2899f64146 moving vendor_id info to top of file
This commit was SVN r19087.
2008-07-29 22:33:17 +00:00
Donald Kerr
513225c9f3 add Sun Microsystems, Inc. vendor_id
This commit was SVN r19085.
2008-07-29 19:31:41 +00:00
Jeff Squyres
0af7ac53f2 Fixes trac:1392, #1400
 * add "register" function to mca_base_component_t
   * converted coll:basic and paffinity:linux and paffinity:solaris to
     use this function
   * we'll convert the rest over time (I'll file a ticket once all
     this is committed)
 * add 32 bytes of "reserved" space to the end of mca_base_component_t
   and mca_base_component_data_2_0_0_t to make future upgrades
   [slightly] easier
   * new mca_base_component_t size: 196 bytes
   * new mca_base_component_data_2_0_0_t size: 36 bytes
 * MCA base version bumped to v2.0
   * '''We now refuse to load components that are not MCA v2.0.x'''
 * all MCA frameworks versions bumped to v2.0
 * be a little more explicit about version numbers in the MCA base
   * add big comment in mca.h about versioning philosophy

This commit was SVN r19073.

The following Trac tickets were found above:
  Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
2008-07-28 22:40:57 +00:00
Lenny Verkhovsky
f278609d77 Fix funny warning message about rd_win size
This commit was SVN r19063.
2008-07-28 14:30:57 +00:00
Adrian Knoth
5096512c3a Cosmetics, only typos.
This commit was SVN r19061.
2008-07-28 13:33:08 +00:00
Jeff Squyres
e3e79c0881 Fixes trac:1379:
 * Use synonym/deprecated MCA param API for some mca base params
 * In openib BTL, if we have appropriate memory hooks support, and if
   mpi_leave_pinned and mpi_leave_pinned_pipeline were not set by the
   user, set mpi_leave_pinned to 1.
 * Defer checking mpi_leave_pinned_* until as late as possible (i.e.,
   until after the btl's have had a chance to set mpi_leave_pinned to
   1):
   * in ob1 pml
   * in rdma mpool

This commit was SVN r19022.

The following Trac tickets were found above:
  Ticket 1379 --> https://svn.open-mpi.org/trac/ompi/ticket/1379
2008-07-24 22:51:26 +00:00
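The defaulting rule described in the commit above (r19022) can be sketched as follows. This is only an illustration of the stated logic, not the openib BTL code: the function name `resolve_leave_pinned`, the use of `None` for "not set by the user", and the fallback value of 0 are all assumptions made for the example.

```python
def resolve_leave_pinned(have_memory_hooks, leave_pinned=None, leave_pinned_pipeline=None):
    """Return the effective mpi_leave_pinned value.

    None means the user did not set the parameter (an assumption of this sketch).
    """
    user_set_either = leave_pinned is not None or leave_pinned_pipeline is not None
    # Only override when hooks are available and the user expressed no preference.
    if have_memory_hooks and not user_set_either:
        return 1
    # Otherwise respect the user's value; 0 as the unset fallback is hypothetical.
    return leave_pinned if leave_pinned is not None else 0
```

Because the BTL may change the value, anything that reads it (the ob1 PML and the rdma mpool in the commit) has to do so after the BTLs have initialized, which is why the check is deferred.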
Jeff Squyres
5b9219565c Remove the use of __cpu_to_be64() and replace it with hton64().
This commit was SVN r18995.
2008-07-23 12:08:55 +00:00
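For readers unfamiliar with what a 64-bit host-to-network conversion does, here is a minimal sketch of the idea. It is written in Python purely for illustration and is not the Open MPI `hton64()` implementation; the function below merely borrows the name.

```python
import sys


def hton64(value: int) -> int:
    """Convert a 64-bit unsigned integer from host to network (big-endian) order."""
    if not 0 <= value < 2**64:
        raise ValueError("value must fit in 64 unsigned bits")
    # Lay the value out in host byte order, then reinterpret those bytes as big-endian.
    return int.from_bytes(value.to_bytes(8, sys.byteorder), "big")
```

On a big-endian host this is the identity; on a little-endian host it reverses the eight bytes. Applying it twice returns the original value on either kind of host.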
Jeff Squyres
2f208f885c Fixes trac:1295: change language in openib BTL from IB-specific to be
"!OpenFabrics" / neutral (i.e., refer to IB and/or iWARP).

 * Mostly just type, variable/field, and function name changes, such as
   s/hca/device/g, etc.
 * Changed the INI file for the hardware-specific parameters to be
   mca-btl-openib-device-params.ini.
 * Updated a lot of help messages in the help-*.txt files, not just to
   update it to be !OpenFabrics/neutral language, but also for some
   consistency of tone, indenting, etc.
 * Deprecated a bunch of MCA params in favor of language-neutral new
   ones:
   * btl_openib_warn_no_hca_params_found (s/hca/device/)
   * btl_openib_hca_param_files
   * btl_openib_ib_cq_size (s/_ib_/_of_/)
   * btl_openib_ib_max_inline_data
   * btl_openib_ib_psn
   * btl_openib_ib_mtu
   * btl_openib_ib_pkey_ix
   * btl_openib_ib_pkey_val

This commit was SVN r18985.

The following Trac tickets were found above:
  Ticket 1295 --> https://svn.open-mpi.org/trac/ompi/ticket/1295
2008-07-23 00:28:59 +00:00
Jon Mason
f80404d991 Add openib error handling during wireup for rdmacm
The rdmacm event handler has no way of reporting fatal errors to the upper
layers.  By calling mca_btl_openib_endpoint_invoke_error in the rdmacm event
handler for the errors encountered, these errors can now be handled
appropriately.

Closes out Ticket #1283

This commit was SVN r18980.
2008-07-22 19:03:13 +00:00
Pavel Shamis
849a8f86a7 Bug fix for #1388 - fixing ib_cm_listen() random failures.
This commit was SVN r18952.
2008-07-20 06:21:32 +00:00
Andrew Friedley
d479d981a7 Missed this on my last commit regarding if include/exclude -- should now iterate from 0, not 1.
This commit was SVN r18924.
2008-07-16 14:15:25 +00:00
Andrew Friedley
dabe6defb3 Add support for btl_ofud_{in,ex}clude MCA parameters.
This commit was SVN r18916.
2008-07-15 17:57:52 +00:00
Pavel Shamis
e233c11df0 Disabling IBCM cpc by default. In order to enable it, you need to request it explicitly on the command line.
This commit was SVN r18911.
2008-07-15 08:02:00 +00:00
Pavel Shamis
807c53c7b1 "failed to ib_cm_listen 10 times" is a known bug in IBCM, and we should print it
only if IBCM was explicitly requested.

This commit was SVN r18897.
2008-07-13 16:36:41 +00:00
Pavel Shamis
12379e7f3e Fixing race condition between main thread and async event thread
during openib finalization.

This commit was SVN r18895.
2008-07-13 16:21:49 +00:00
Pavel Shamis
a34bb98f8a Bug fix for #1376.
If IBCM was explicitly specified with the exclude/include parameter,
the OpenIB BTL will enable verbose reporting of the "/dev/infiniband/ucm" error;
otherwise the error will not be reported.

This commit was SVN r18868.
2008-07-10 15:08:49 +00:00
Brad Benton
9f0280bd55 arghh...I inadvertently checked this in to the 1.3 branch rather than
first to the trunk.  So, here is the trunk checkin:

The call to orte_show_help() to notify truncation of the max_inline value
was missing the want_error_header boolean, which eventually results in
a SEGV.  This change corrects the call with the bool set to true.

This commit was SVN r18839.
2008-07-08 15:28:53 +00:00
Pavel Shamis
452141bfb8 Bugfix for #1375.
- Adding configure options that allow disabling IB/RDMA-CM support.
- Code cleanup in openib section of configure

This commit was SVN r18830.
2008-07-08 06:32:54 +00:00
Jeff Squyres
160ba5fe11 Set max_inline_data for the iWARP adapters to be 64
This commit was SVN r18782.
2008-06-30 14:25:32 +00:00
Pavel Shamis
eaa7676c57 Changing default maximum inline data size
from Maximum_Supported_By_Device to 128 (our original value).

This commit was SVN r18774.
2008-06-30 07:47:09 +00:00
Jeff Squyres
f42f55f84b Several improvements and fixes to the openib IBCM CPC:
 * Move the passive side's QP transition to RTS to before we send the reply
   (vs. sending it after we get the RTU).  A lengthy comment explains
   the need for this.
 * Add some timers to the code for analyzing where time is spent.
 * Clarify a few error messages.
 * Currently have a loop around ib_cm_listen() because sometimes it
   fails for seemingly no reason.  Have pending e-mails in to Sean
   Hefty to see if we can figure out why this is happening.  Note that
   the more MPI processes you add, the more likely this error is to
   occur (e.g., ran 720 processes and it happens at least 50% of the
   time).  This makes IBCM somewhat unattractive for general use;
   hopefully we can get a fix...

This commit was SVN r18766.
2008-06-28 10:53:58 +00:00
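The "loop around ib_cm_listen()" mentioned in the commit above (r18766) is a bounded retry. A minimal sketch of that pattern follows; `listen` is a hypothetical callable standing in for the real call, and the limit of 10 attempts is an assumption for the example rather than a value confirmed by this commit.

```python
def listen_with_retry(listen, max_attempts=10):
    """Call listen() until it succeeds or max_attempts is exhausted.

    Returns the 1-based attempt number that succeeded, or None on failure.
    """
    for attempt in range(1, max_attempts + 1):
        if listen():
            return attempt
    # Every attempt failed; the caller decides whether to warn or fall back.
    return None
```

A retry like this only hides intermittent failures. As the commit notes, the failure rate grows with the number of processes, so the loop is a workaround rather than a fix.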
Jeff Squyres
67933e743c Move some of the things I did in r18762 to the openib BTL proper (in
endpoint.c) because it's almost identical in all the CPC's.  OOB,
XOOB, and IBCM now all invoke the btl error handler properly if
there's an error during wireup.  RDMACM still needs to be done.

This commit was SVN r18764.

The following SVN revision numbers were found above:
  r18762 --> open-mpi/ompi@3eda04578f
2008-06-27 22:48:45 +00:00
Jeff Squyres
3eda04578f Clean up a lot of error handling in IBCM CPC; properly pass error to
upper level btl (and therefore the PML) when something goes wrong
during wireup.

Refs trac:1283.

This commit was SVN r18762.

The following Trac tickets were found above:
  Ticket 1283 --> https://svn.open-mpi.org/trac/ompi/ticket/1283
2008-06-27 21:02:05 +00:00
Jeff Squyres
ad16d8f335 Fix some issues in the ibcm CPC:
 * Properly handle non-symmetric subnet ID's
 * Be a bit more stringent when checking for the GID
 * Add lots of BTL_VERBOSE's for diagnostics

This commit was SVN r18754.
2008-06-26 20:23:56 +00:00
Jeff Squyres
dd563f9297 Add more output to btl_base_verbose
This commit was SVN r18743.
2008-06-25 20:16:34 +00:00
Jeff Squyres
b8b1aded4e Hook all the ibcm diagnostics up to "--mca btl_base_verbose 1":
 * Use BTL_VERBOSE
 * Remove trailing \n's
 * Remove redundant prefixes (BTL_VERBOSE outputs __FILE__, function,
   and __LINE__)

This commit was SVN r18742.
2008-06-25 19:19:03 +00:00
Jeff Squyres
26f6d9cf0c Properly protect the use of dev->transport_type and
IBV_TRANSPORT_IWARP.

This commit was SVN r18738.
2008-06-25 14:50:59 +00:00
Jeff Squyres
64034b65d0 Remove debugging message cruft
This commit was SVN r18737.
2008-06-25 11:32:33 +00:00
George Bosilca
df9e54194b Fix a deadlock in the put protocol for MX. The callback was never triggered,
as the send and the put share the same execution path and in the case of the
put the MCA_BTL_DES_SEND_ALWAYS_CALLBACK was not set.

This commit was SVN r18735.
2008-06-25 03:45:30 +00:00
George Bosilca
24316b77cf This commit is related to ticket #1362. The following devices respect the
behavior described on the ticket: elan, gm, mx, self, sm, tcp.

This commit was SVN r18734.
2008-06-25 03:35:07 +00:00
Jeff Squyres
edad3b66a2 Add warning message if you request one max_inline_data value and the
device reduces it.

This commit was SVN r18728.
2008-06-24 20:40:03 +00:00
Jon Mason
eefc8956b6 Fix a bug introduced in r18725
ompi_info will crash due to a reference to an unallocated struct.  Add a check for
this struct to prevent this crash.

This commit was SVN r18727.

The following SVN revision numbers were found above:
  r18725 --> open-mpi/ompi@c2351d39f1
2008-06-24 20:38:47 +00:00
Jon Mason
c2351d39f1 Fix Ticket #1359
If rdmacm cpc is disabled and an iwarp device is forced to be used, then
mca_btl_openib_get_iwarp_subnet_id will attempt to reference an uninitialized
list.  Modify the code to check for the uninitialized list and return zero.

Also, fix a memory leak caused by not freeing allocated address structs during
shutdown.

This commit was SVN r18725.
2008-06-24 19:16:06 +00:00
George Bosilca
fbf341068a Remove the pending queue from the shared memory BTL. The PML is in charge of managing
the fragments that failed to be sent, there is no need to replicate the same
mechanism in the BTL.
Force the SM BTL to empty all ack fragments in the component progress function.

This commit was SVN r18724.
2008-06-24 19:01:26 +00:00
Jeff Squyres
ea21c31f44 * MCA param btl_openib_use_eager_rdma can now override the
   INI file use_eager_rdma value (fixes trac:1169)
 * fixed a typo in a MCA param help message
 * made the check for enabling short/eager RDMA more robust in the
   presence of progress threads; it now emits a show_help warning

This commit was SVN r18723.

The following Trac tickets were found above:
  Ticket 1169 --> https://svn.open-mpi.org/trac/ompi/ticket/1169
2008-06-24 18:31:46 +00:00
Jeff Squyres
e0545460ff Fixes trac:1355: allow INI file to set max_inline_data value, and if not
specified, probe for max value supported by device.

This commit was SVN r18720.

The following Trac tickets were found above:
  Ticket 1355 --> https://svn.open-mpi.org/trac/ompi/ticket/1355
2008-06-24 17:18:07 +00:00
George Bosilca
1eb62b6c48 Remove a warning. Close ticket #1357.
This commit was SVN r18717.
2008-06-24 14:23:02 +00:00
Jeff Squyres
3f95b906c5 Fixes trac:1085: Improves SLURM configure logic to also allow OS X or any platform where srun is found in the PATH.
This commit was SVN r18714.

The following Trac tickets were found above:
  Ticket 1085 --> https://svn.open-mpi.org/trac/ompi/ticket/1085
2008-06-23 23:12:55 +00:00
Jeff Squyres
281c37afcc Ensure to ignore the "empty" CPC components.
This commit was SVN r18702.
2008-06-21 11:39:53 +00:00
Jeff Squyres
807e2cc742 Mark a notable place where we need to return an error up to the BTL or
PML.

This commit was SVN r18701.
2008-06-20 22:11:49 +00:00
Jeff Squyres
5ded50df0e * Fix a > that should be ==
 * Ensure to destroy the correct QP (local->id[num]->qp will always
   have a valid pointer in it, even if we setup a dummy qp)
 * Note two notable places where we need to figure out how to
   propagate errors up from the CPC to the main BTL / PML when errors
   occur.  Probably have the same issue in IBCM, too.

This commit was SVN r18700.
2008-06-20 22:09:30 +00:00
Jeff Squyres
0074126886 Per #1352, most iWARP adapters today cannot handle connections between
two processes on the same server (!).  So for today, we'll simply mark
all local processes that use iWARP adapters as "unreachable".

More details in #1352.

This commit was SVN r18699.
2008-06-20 22:08:00 +00:00
Jeff Squyres
f4145fce7a Ensure that we don't try to shut down a thread that is not [yet] there
(e.g., if you're excluding some devices, their destructors will be
invoked before the async event thread was setup for them).

This commit was SVN r18698.
2008-06-20 19:30:51 +00:00
Jeff Squyres
ed17b51204 Adjust the max_inline default size down so that it can be accepted on
multiple adapters (e.g., Chelsio T3).

But we need to figure out how to determine a good value for the
resident adapter(s) at runtime.  It's problematic because, for
example, Mellanox ConnectX and Chelsio T3 report max_inline values
differently at run-time.  If you ibv_create_qp with a max_inline value
of 0, ConnectX reports back a value that is a formula based on a few
other values (e.g., max_send_sge and max_recv_sge).  But T3 always
reports back "64".

We're looking into this to figure out the best way -- reducing the
default right now should allow other adapters to run while we figure
it out.

This commit was SVN r18697.
2008-06-20 18:24:04 +00:00
George Bosilca
54e7e03695 One less warning.
This commit was SVN r18695.
2008-06-20 17:50:19 +00:00
Jeff Squyres
7905db57bd Slightly decrease the number of buffers for the NetXen adapter
This commit was SVN r18691.
2008-06-20 01:00:22 +00:00
Pavel Shamis
4537827973 Making the qp allocation more optimized.
- sq parameter was replaced with max_inline parameter
- inline is allocated only for relevant QPs

This commit was SVN r18675.
2008-06-19 08:40:39 +00:00
Lenny Verkhovsky
f4811d6c4d NUMA Awareness support. Gleb's patch
This commit was SVN r18658.
2008-06-15 13:43:28 +00:00