# -*- text -*-
#
# Copyright (c) 2008 Cisco Systems, Inc. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English help file for Open MPI's OpenFabrics RDMA CM
# support (the openib BTL).
#
[no valid ip]
It appears that an OpenFabrics device does not have an IP address
associated with it. The OpenFabrics RDMA CM connection scheme
*requires* IP addresses to be set up in order to function properly.
Local host: %s
Local device: %s
#
[could not find matching endpoint]
The OpenFabrics device in an MPI process received an RDMA CM connect
request for a peer that it could not identify as part of this MPI job.
This should not happen. Your process is likely to abort; sorry.
Local host: %s
Local device: %s
Remote address: %s
Remote TCP port: %d
#
[illegal tcp port]
The btl_openib_connect_rdmacm_port MCA parameter was used to specify
an illegal TCP port value. TCP ports must be between 0 and 65535
(ports below 1024 can only be used by root).
TCP port: %d
This value was ignored.
#
[illegal timeout]
The btl_openib_connect_rdmacm_resolve_timeout MCA parameter was used to
specify an illegal timeout value. Timeout values are specified in
milliseconds and must be greater than 0.
Timeout value: %d
This value was ignored.
#
[rdma cm device removal]
The RDMA CM reported that the device Open MPI was trying to use has
been removed.
Local host: %s
Local device: %s
Your MPI job will now abort, sorry.
#
[rdma cm event error]
The RDMA CM returned an event error while attempting to make a
connection. This type of error usually indicates a network
configuration error.
Local host: %s
Local device: %s
Error name: %s
Peer: %s
Your MPI job will now abort, sorry.