# -*- text -*-
#
# Copyright (c) 2012-2014 Cisco Systems, Inc.  All rights reserved.
#
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English help file for the Open MPI usnic BTL.
#
[ibv API failed]
Open MPI failed a basic verbs operation on a Cisco usNIC device. This
is highly unusual and shouldn't happen. It suggests that there may be
something wrong with the usNIC or OpenFabrics configuration on this
server.

In addition to any suggestions listed below, you might want to check
the Linux "memlock" limits on your system (they should probably be
"unlimited"). See this FAQ entry for details:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Open MPI will skip this device/port in the usnic BTL, which may result
in either lower performance or your job aborting.

  Server:           %s
  Device:           %s
  Port:             %d
  Failed function:  %s (%s:%d)
  Description:      %s
#
[not enough usnic resources]
There are not enough usNIC resources on a VIC for all the MPI
processes on this server.

This means that you have either not provisioned enough usNICs on this
VIC, or there are not enough total receive, transmit, or completion
queues on the provisioned usNICs. On each VIC in a given server, you
need to provision at least as many usNICs as MPI processes on that
server. In each usNIC, you need to provision at least two each of the
following: send queues, receive queues, and completion queues.
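
For example, if 16 MPI processes will run on this server, each VIC
that the job uses on this server needs at least 16 usNICs
provisioned, and each of those usNICs needs at least 2 send queues,
2 receive queues, and 2 completion queues.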

Open MPI will skip this device in the usnic BTL, which may result in
either lower performance or your job aborting.

  Server:       %s
  Device:       %s
  Description:  %s
#
[create ibv resource failed]
Open MPI failed to allocate a usNIC-related resource on a VIC. This
usually means one of two things:

1. You are running something other than this MPI job on this server
   that is consuming usNIC resources.
2. You have run out of locked Linux memory. You should probably set
   the Linux "memlock" limits to "unlimited". See this FAQ entry for
   details:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

This Open MPI job will skip this device/port in the usnic BTL, which
may result in either lower performance or the job aborting.

  Server:           %s
  Device:           %s
  Failed function:  %s (%s:%d)
  Description:      %s
#
[async event]
Open MPI detected a fatal error on a usNIC port. Your MPI job will
now abort; sorry.

  Server:            %s
  Device:            %s
  Port:              %d
  Async event code:  %s (%d)
#
[internal error during init]
An internal error has occurred in the Open MPI usNIC BTL. This is
highly unusual and shouldn't happen. It suggests that there may be
something wrong with the usNIC or OpenFabrics configuration on this
server.

Open MPI will skip this device/port in the usnic BTL, which may result
in either lower performance or your job aborting.

  Server:       %s
  Device:       %s
  Port:         %d
  Failure:      %s (%s:%d)
  Description:  %s
#
[internal error after init]
An internal error has occurred in the Open MPI usNIC BTL. This is
highly unusual and shouldn't happen. It suggests that there may be
something wrong with the usNIC or OpenFabrics configuration on this
server.

  Server:   %s
  Message:  %s
  File:     %s
  Line:     %d
  Error:    %s
#
[verbs_port_bw failed]
Open MPI failed to query the supported bandwidth of a port on a Cisco
usNIC device. This is unusual and shouldn't happen. It suggests that
there may be something wrong with the usNIC or OpenFabrics
configuration on this server.

Open MPI will skip this device/port in the usnic BTL, which may result
in either lower performance or your job aborting.

  Server:  %s
  Device:  %s
  Port:    %d
#
[check_reg_mem_basics fail]
The usNIC BTL failed to initialize while trying to register some
memory. This typically indicates that the "memlock" limits are set
too low. For most HPC installations, the memlock limits should be set
to "unlimited". The failure occurred here:

  Local host:     %s
  Memlock limit:  %s

You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
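
For example, on many Linux systems the current limit can be checked
in a shell with "ulimit -l", and it can be raised for all users by
adding lines such as the following to /etc/security/limits.conf (the
exact mechanism varies by distribution and resource manager, so treat
this only as an illustration; see the FAQ entry above for details):

    * soft memlock unlimited
    * hard memlock unlimited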
#
[invalid if_inexclude]
WARNING: An invalid value was given for btl_usnic_if_%s. This
value will be ignored.

  Local host:  %s
  Value:       %s
  Message:     %s
#
[MTU mismatch]
The MTU does not match on local and remote hosts. All interfaces on all
hosts participating in an MPI job must be configured with the same MTU.
The device and port listed below will not be used to communicate with this
remote host.
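
For example, on Linux the MTU currently configured on an interface can
usually be checked with a command like the following (the interface
name "eth0" is only an example):

    ip link show eth0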

  Local host:   %s
  Device/port:  %s/%d
  Local MTU:    %d
  Remote host:  %s
  Remote MTU:   %d
#
[rtnetlink init fail]
The usnic BTL failed to initialize the rtnetlink query subsystem.

  Server:         %s
  Error message:  %s
#
[connectivity error: small ok, large bad]
The Open MPI usNIC BTL was unable to establish full connectivity
between at least one pair of servers in the MPI job. Specifically,
small UDP messages seem to flow between the servers, but large UDP
messages do not.

Your MPI job is going to abort now.

  Source:
    Hostname / IP:    %s (%s)
    Host interfaces:  %s / %s
    MAC address:      %s
  Destination:
    Hostname / IP:    %s (%s)
    MAC address:      %s

  Small message size:  %u
  Large message size:  %u

Note that this behavior usually indicates that the MTU of some network
link is too small between these two servers. You should verify that
UDP traffic with payloads up to the "large message size" listed above
can flow between these two servers.
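
For example, on many Linux systems a quick (if approximate) check for
an MTU problem is to send a large, non-fragmenting ping between the
two servers; the payload size shown is only an illustration and should
be chosen based on the "large message size" above:

    ping -M do -s 8972 <remote host>

Note that this tests ICMP rather than UDP, so treat it as a hint
rather than a definitive check.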
#
[connectivity error: small bad, large ok]
The Open MPI usNIC BTL was unable to establish full connectivity
between at least one pair of servers in the MPI job. Specifically,
large UDP messages seem to flow between the servers, but small UDP
messages do not.

Your MPI job is going to abort now.

  Source:
    Hostname / IP:    %s (%s)
    Host interfaces:  %s / %s
    MAC address:      %s
  Destination:
    Hostname / IP:    %s (%s)
    MAC address:      %s

  Small message size:  %u
  Large message size:  %u

This is a very strange network error, and should not occur in most
situations. You may be experiencing high amounts of congestion, or
this may indicate some kind of network misconfiguration. You should
verify that UDP traffic with payloads up to the "large message size"
listed above can flow between these two servers.
#
[connectivity error: small bad, large bad]
The Open MPI usNIC BTL was unable to establish any connectivity
between at least one pair of servers in the MPI job. Specifically,
no UDP messages seemed to flow between these two servers.

Your MPI job is going to abort now.

  Source:
    Hostname / IP:    %s (%s)
    Host interfaces:  %s / %s
    MAC address:      %s
  Destination:
    Hostname / IP:    %s (%s)
    MAC address:      %s

  Small message size:  %u
  Large message size:  %u

Note that this behavior usually indicates some kind of network
misconfiguration. You should verify that UDP traffic with payloads up
to the "large message size" listed above can flow between these two
servers.
#
[ibv_create_ah timeout]
The usnic BTL failed to create addresses for remote peers within the
specified timeout. When using the usNIC/UDP transport, this usually
means that ARP requests failed to resolve in time. You may be able to
solve the problem by increasing the usnic BTL's ARP timeout. If that
doesn't work, you should diagnose why ARP replies are apparently not
being delivered in a timely manner.
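
For example, the ARP timeout can be raised via the
btl_usnic_arp_timeout MCA parameter; the value shown here is only an
illustration:

    mpirun --mca btl_usnic_arp_timeout 30 ...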

The usNIC interface listed below will be ignored. Your MPI
application will likely either run with degraded performance or
abort.

  Server:               %s
  Device:               %s:%d (%s)
  Current ARP timeout:  %d (btl_usnic_arp_timeout MCA param)