openmpi/ompi/mca/btl/openib/connect/help-mpi-btl-openib-cpc-rdmacm.txt
Jeff Squyres c42ab8ea37 Fixes trac:1210, #1319
Commit from a long-standing Mercurial tree that ended up incorporating a lot of things:

 * A few fixes for CPC interface changes in all the CPCs
 * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC
 * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC)
   * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag)
   * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. 
   * CPCs allocate a CTS buffer only if they're about to make a connection
 * RDMA CM improvements:
   * Use threaded mode openib fd monitoring to wait for RDMA CM events
   * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions
   * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors
   * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type
   * Renamed many variables to be internally consistent
   * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator
   * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) 
   * Use rdma_create_qp() instead of ibv_create_qp()
 * openib fd monitoring improvements:
   * Renamed a bunch of functions and variables to be a little more obvious as to their true function
   * Use pipes to communicate between main thread and service thread
   * Add ability for main thread to invoke a function back on the service thread 
   * Ensure that initiator_depth and responder_resources are set properly by putting max_qp_rd_atom and max_qp_init_rd_atom in the modex (see rdma_connect(3))
   * Set the source IP address in rdma_resolve() so that we select the correct OpenFabrics source port
   * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout
 * Other improvements:
   * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp")
   * Somewhat improved error handling
   * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements
   * Oodles of little coding style fixes
   * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction
   * Added some more show_help error messages
   * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) 

This commit was SVN r19686.

The following Trac tickets were found above:
  Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 00:46:02 +00:00
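
The commit message mentions centralizing, in one inline function, the decision about which side initiates a connection. A minimal sketch of such a tie-break follows. The function name, its parameters, and the rule itself (compare IP addresses, then TCP ports) are assumptions made for illustration, not the rule used in the actual CPC code. The only property it aims to show is that both peers evaluate the same comparison and reach opposite answers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical tie-break (assumed for illustration only): the side with the
 * numerically larger (IP address, TCP port) pair initiates. Both peers run
 * the same comparison with the arguments swapped, so exactly one of them
 * decides to initiate as long as the two pairs differ.
 */
static inline bool i_am_initiator(uint32_t local_ip, uint16_t local_port,
                                  uint32_t remote_ip, uint16_t remote_port)
{
    if (local_ip != remote_ip) {
        return local_ip > remote_ip;
    }
    /* Same IP address on both sides: fall back to the TCP port. */
    return local_port > remote_port;
}
```

Keeping the rule in a single function means the connect path and the accept path cannot disagree about who sends the first message.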

69 lines
1.7 KiB
Plaintext

# -*- text -*-
#
# Copyright (c) 2008 Cisco Systems, Inc. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English help file for Open MPI's OpenFabrics RDMA CM
# support (the openib BTL).
#
[no valid ip]
It appears that an OpenFabrics device does not have an IP address
associated with it. The OpenFabrics RDMA CM connection scheme
*requires* IP addresses to be set up in order to function properly.
Local host: %s
Local device: %s
#
[could not find matching endpoint]
The OpenFabrics device in an MPI process received an RDMA CM connect
request for a peer that it could not identify as part of this MPI job.
This should not happen. Your process is likely to abort; sorry.
Local host: %s
Local device: %s
Remote address: %s
Remote TCP port: %d
#
[illegal tcp port]
The btl_openib_connect_rdmacm_port MCA parameter was used to specify
an illegal TCP port value. TCP ports must be between 0 and 65535
(ports below 1024 can only be used by root).
TCP port: %d
This value was ignored.
#
[illegal timeout]
The btl_openib_connect_rdmacm_resolve_timeout parameter was used to
specify an illegal timeout value. Timeout values are specified in
milliseconds and must be greater than 0.
Timeout value: %d
This value was ignored.
#
[rdma cm device removal]
The RDMA CM reported that the device Open MPI was trying to use has
been removed.
Local host: %s
Local device: %s
Your MPI job will now abort, sorry.
#
[rdma cm event error]
The RDMA CM returned an event error while attempting to make a
connection. This type of error usually indicates a network
configuration error.
Local host: %s
Local device: %s
Error name: %s
Peer: %s
Your MPI job will now abort, sorry.
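
The [illegal tcp port] and [illegal timeout] messages above imply two simple range checks on MCA parameter values. The sketch below shows what such checks might look like. The function names and the constant are hypothetical and are not taken from the openib BTL sources. It assumes that port 0 is accepted and that 65535 is the largest usable TCP port.

```c
#include <stdbool.h>

/* Largest valid TCP port number (16-bit unsigned maximum). */
#define SKETCH_MAX_TCP_PORT 65535

/*
 * Hypothetical check for a user-supplied TCP port value.
 * Assumption: 0 is allowed; anything negative or above 65535 is rejected.
 */
static bool sketch_port_is_legal(int port)
{
    return port >= 0 && port <= SKETCH_MAX_TCP_PORT;
}

/*
 * Hypothetical check for a user-supplied resolve timeout.
 * The help text states that timeouts are in milliseconds and must be
 * greater than 0.
 */
static bool sketch_timeout_is_legal(int timeout_ms)
{
    return timeout_ms > 0;
}
```

A caller would presumably show the matching help message and ignore the value when either check fails, as the "This value was ignored." lines suggest.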