552c9ca5a0
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down into OPAL.

All the components required for inter-process communication are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, accessible to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTLs directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes.

UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with a few exceptions (mainly BTLs that I have no way of compiling/testing). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
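Concretely, the move relocates each framework's source tree from the OMPI layer into OPAL. As a sketch of the resulting locations, assuming the standard MCA layer/mca/framework directory layout:

    ompi/mca/btl/        ->  opal/mca/btl/
    ompi/mca/mpool/      ->  opal/mca/mpool/
    ompi/mca/rcache/     ->  opal/mca/rcache/
    ompi/mca/allocator/  ->  opal/mca/allocator/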
# -*- text -*-
#
# Copyright (c) 2008 Cisco Systems, Inc. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English help file for Open MPI's OpenFabrics RDMA CM
# support (the openib BTL).
#
[no valid ip]
It appears that an OpenFabrics device does not have an IP address
associated with it. The OpenFabrics RDMA CM connection scheme
*requires* IP addresses to be set up in order to function properly.

  Local host:   %s
  Local device: %s
#
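# Note: one common way to satisfy this requirement is to assign an
# address to the device's IPoIB interface. For example (the interface
# name "ib0" and the address below are illustrative placeholders; use
# values appropriate for your site):
#
#   ip addr add 192.168.10.1/24 dev ib0
#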
[could not find matching endpoint]
The OpenFabrics device in an MPI process received an RDMA CM connect
request for a peer that it could not identify as part of this MPI job.
This should not happen. Your process is likely to abort; sorry.

  Local host:      %s
  Local device:    %s
  Remote address:  %s
  Remote TCP port: %d
#
[illegal tcp port]
The btl_openib_connect_rdmacm_port MCA parameter was used to specify
an illegal TCP port value. TCP ports must be between 0 and 65535
(ports below 1024 can only be used by root).

  TCP port: %d

This value was ignored.
#
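# Note: a valid value can be set on the mpirun command line; for
# example (the port 12345, the process count, and the application name
# are placeholders):
#
#   mpirun --mca btl_openib_connect_rdmacm_port 12345 -np 4 ./my_app
#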
[illegal timeout]
The btl_openib_connect_rdmacm_resolve_timeout parameter was used to
specify an illegal timeout value. Timeout values are specified in
milliseconds and must be greater than 0.

  Timeout value: %d

This value was ignored.
#
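# Note: a valid value is a positive number of milliseconds; for
# example (the timeout of 30000, the process count, and the application
# name are placeholders):
#
#   mpirun --mca btl_openib_connect_rdmacm_resolve_timeout 30000 -np 4 ./my_app
#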
[rdma cm device removal]
The RDMA CM returned that the device Open MPI was trying to use has
been removed.

  Local host:   %s
  Local device: %s

Your MPI job will now abort, sorry.
#
[rdma cm event error]
The RDMA CM returned an event error while attempting to make a
connection. This type of error usually indicates a network
configuration error.

  Local host:   %s
  Local device: %s
  Error name:   %s
  Peer:         %s

Your MPI job will now abort, sorry.