733316372b
Reviewed by Reese Faucette cmr=v1.8.3:reviewer=ompi-rm1.8 This commit was SVN r32628.
367 строки
12 KiB
Plaintext
367 строки
12 KiB
Plaintext
# -*- text -*-
|
|
#
|
|
# Copyright (c) 2012-2014 Cisco Systems, Inc. All rights reserved.
|
|
#
|
|
# $COPYRIGHT$
|
|
#
|
|
# Additional copyrights may follow
|
|
#
|
|
# $HEADER$
|
|
#
|
|
# This is the US/English help file for the Open MPI usnic BTL.
|
|
#
|
|
[ibv API failed]
|
|
Open MPI failed a basic verbs operation on a Cisco usNIC interface.
|
|
This is highly unusual and shouldn't happen. It suggests that there
|
|
may be something wrong with the usNIC or OpenFabrics configuration on
|
|
this server.
|
|
|
|
In addition to any suggestions listed below, you might want to check
|
|
the Linux "memlock" limits on your system (they should probably be
|
|
"unlimited"). See this FAQ entry for details:
|
|
|
|
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
|
|
|
|
Open MPI will skip this usNIC interface in the usnic BTL, which may
|
|
result in either lower performance or your job aborting.
|
|
|
|
Server: %s
|
|
usNIC interface: %s (which is %s)
|
|
Failed function: %s (%s:%d)
|
|
Description: %s
|
|
#
|
|
[not enough usnic resources]
|
|
There are not enough usNIC resources on a VIC for all the MPI
|
|
processes on this server.
|
|
|
|
This means that you have either not provisioned enough usNICs on this
|
|
VIC, or there are not enough total receive, transmit, or completion
|
|
queues on the provisioned usNICs. On each VIC in a given server, you
|
|
need to provision at least as many usNICs as MPI processes on that
|
|
server. In each usNIC, you need to provision at least two each of the
|
|
following: send queues, receive queues, and completion queues.
|
|
|
|
Open MPI will skip this usNIC interface in the usnic BTL, which may
|
|
result in either lower performance or your job aborting.
|
|
|
|
Server: %s
|
|
usNIC interface: %s
|
|
Description: %s
|
|
#
|
|
[create ibv resource failed]
|
|
Open MPI failed to allocate a usNIC-related resource on a VIC. This
|
|
usually means one of two things:
|
|
|
|
1. You are running something other than this MPI job on this server
|
|
that is consuming usNIC resources.
|
|
2. You have run out of locked Linux memory. You should probably set
|
|
the Linux "memlock" limits to "unlimited". See this FAQ entry for
|
|
details:
|
|
|
|
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
|
|
|
|
This Open MPI job will skip this usNIC interface in the usnic BTL,
|
|
which may result in either lower performance or the job aborting.
|
|
|
|
Server: %s
|
|
usNIC interface: %s (which is %s)
|
|
Failed function: %s (%s:%d)
|
|
Description: %s
|
|
#
|
|
[async event]
|
|
Open MPI detected a fatal error on a usNIC interface. Your MPI job
|
|
will now abort; sorry.
|
|
|
|
Server: %s
|
|
usNIC interface: %s (which is %s)
|
|
Async event code: %s (%d)
|
|
#
|
|
[internal error during init]
|
|
An internal error has occurred in the Open MPI usNIC BTL. This is
|
|
highly unusual and shouldn't happen. It suggests that there may be
|
|
something wrong with the usNIC or OpenFabrics configuration on this
|
|
server.
|
|
|
|
Open MPI will skip this usNIC interface in the usnic BTL, which may
|
|
result in either lower performance or your job aborting.
|
|
|
|
Server: %s
|
|
usNIC interface: %s (which is %s)
|
|
Failure: %s (%s:%d)
|
|
#
|
|
[internal error after init]
|
|
An internal error has occurred in the Open MPI usNIC BTL. This is
|
|
highly unusual and shouldn't happen. It suggests that there may be
|
|
something wrong with the usNIC or OpenFabrics configuration on this
|
|
server.
|
|
|
|
Server: %s
|
|
Message: %s
|
|
File: %s
|
|
Line: %d
|
|
Error: %s
|
|
#
|
|
[verbs_port_bw failed]
|
|
Open MPI failed to query the supported bandwidth of a usNIC interface.
|
|
This is unusual and shouldn't happen. It suggests that there may be
|
|
something wrong with the usNIC or OpenFabrics configuration on this
|
|
server.
|
|
|
|
Open MPI will skip this usNIC interface in the usnic BTL, which may
|
|
result in either lower performance or your job aborting.
|
|
|
|
Server: %s
|
|
usNIC interface: %s (which is %s)
|
|
#
|
|
[check_reg_mem_basics fail]
|
|
The usNIC BTL failed to initialize while trying to register some
|
|
memory. This typically can indicate that the "memlock" limits are set
|
|
too low. For most HPC installations, the memlock limits should be set
|
|
to "unlimited". The failure occurred here:
|
|
|
|
Local host: %s
|
|
Memlock limit: %s
|
|
|
|
You may need to consult with your system administrator to get this
|
|
problem fixed. This FAQ entry on the Open MPI web site may also be
|
|
helpful:
|
|
|
|
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
|
|
#
|
|
[invalid if_inexclude]
|
|
WARNING: An invalid value was given for btl_usnic_if_%s. This
|
|
value will be ignored.
|
|
|
|
Local host: %s
|
|
Value: %s
|
|
Message: %s
|
|
#
|
|
[MTU mismatch]
|
|
The MTU does not match on local and remote hosts. All interfaces on
|
|
all hosts participating in an MPI job must be configured with the same
|
|
MTU. The usNIC interface listed below will not be used to communicate
|
|
with this remote host.
|
|
|
|
Local host: %s
|
|
usNIC interface: %s (which is %s)
|
|
Local MTU: %d
|
|
Remote host: %s
|
|
Remote MTU: %d
|
|
#
|
|
[rtnetlink init fail]
|
|
The usnic BTL failed to initialize the rtnetlink query subsystem.
|
|
|
|
Server: %s
|
|
Error message: %s
|
|
#
|
|
[connectivity error: small ok, large bad]
|
|
The Open MPI usNIC BTL was unable to establish full connectivity
|
|
between at least one pair of interfaces on servers in the MPI job.
|
|
Specifically, small UDP messages seem to flow between the interfaces,
|
|
but large UDP messages do not.
|
|
|
|
Your MPI job is going to abort now.
|
|
|
|
Source:
|
|
Hostname / IP: %s (%s)
|
|
Host interfaces: %s / %s
|
|
MAC address: %s
|
|
Destination:
|
|
Hostname / IP: %s (%s)
|
|
MAC address: %s
|
|
|
|
Small message size: %u
|
|
Large message size: %u
|
|
|
|
Note that this behavior usually indicates that the MTU of some network
|
|
link is too small between these two interfaces.
|
|
|
|
You should verify that UDP traffic with payloads up to the "large
|
|
message size" listed above can flow between these two interfaces. You
|
|
should also verify that Open MPI is choosing to pair IP interfaces
|
|
consistently. For example:
|
|
|
|
mpirun --mca btl_usnic_connectivity_map mymap ...
|
|
|
|
Check the resulting "mymap*" files to see the exact pairing of IP
|
|
interfaces. Inconsistent results may be indicative of underlying
|
|
network misconfigurations.
|
|
#
|
|
[connectivity error: small bad, large ok]
|
|
The Open MPI usNIC BTL was unable to establish full connectivity
|
|
between at least one pair of interfaces on servers in the MPI job.
|
|
Specifically, large UDP messages seem to flow between the interfaces,
|
|
but small UDP messages do not.
|
|
|
|
Your MPI job is going to abort now.
|
|
|
|
Source:
|
|
Hostname / IP: %s (%s)
|
|
Host interfaces: %s / %s
|
|
MAC address: %s
|
|
Destination:
|
|
Hostname / IP: %s (%s)
|
|
MAC address: %s
|
|
|
|
Small message size: %u
|
|
Large message size: %u
|
|
|
|
This is a very strange network error, and should not occur in most
|
|
situations. You may be experiencing high amounts of congestion, or
|
|
this may indicate some kind of network misconfiguration.
|
|
|
|
You should verify that UDP traffic with payloads up to the "large
|
|
message size" listed above can flow between these two interfaces. You
|
|
should also verify that Open MPI is choosing to pair IP interfaces
|
|
consistently. For example:
|
|
|
|
mpirun --mca btl_usnic_connectivity_map mymap ...
|
|
|
|
Check the resulting "mymap*" files to see the exact pairing of IP
|
|
interfaces. Inconsistent results may be indicative of underlying
|
|
network misconfigurations.
|
|
#
|
|
[connectivity error: small bad, large bad]
|
|
The Open MPI usNIC BTL was unable to establish any connectivity
|
|
between at least one pair of interfaces on servers in the MPI job.
|
|
This can happen for several reasons, including:
|
|
|
|
1. No UDP traffic is able to flow between the interfaces listed below.
|
|
2. There is asymmetric routing between the interfaces listed below,
|
|
leading Open MPI to discard UDP traffic it thinks is from an
|
|
unexpected source.
|
|
|
|
Your MPI job is going to abort now.
|
|
|
|
Source:
|
|
Hostname / IP: %s (%s)
|
|
Host interfaces: %s / %s
|
|
MAC address: %s
|
|
Destination:
|
|
Hostname / IP: %s (%s)
|
|
MAC address: %s
|
|
|
|
Small message size: %u
|
|
Large message size: %u
|
|
|
|
Note that this behavior usually indicates some kind of network
|
|
misconfiguration.
|
|
|
|
You should verify that UDP traffic with payloads up to the "large
|
|
message size" listed above can flow between these two interfaces. You
|
|
should also verify that Open MPI is choosing to pair IP interfaces
|
|
consistently. For example:
|
|
|
|
mpirun --mca btl_usnic_connectivity_map mymap ...
|
|
|
|
Check the resulting "mymap*" files to see the exact pairing of IP
|
|
interfaces. Inconsistent results may be indicative of underlying
|
|
network misconfigurations.
|
|
#
|
|
[ibv_create_ah timeout]
|
|
The usnic BTL failed to create addresses for remote peers within the
|
|
specified timeout. When using the usNIC/UDP transport, this usually
|
|
means that ARP requests failed to resolve in time. You may be able to
|
|
solve the problem by increasing the usnic BTL's ARP timeout. If that
|
|
doesn't work, you should diagnose why ARP replies are apparently not
|
|
being delivered in a timely manner.
|
|
|
|
The usNIC interface listed below will be ignored. Your MPI
|
|
application will likely either run with degraded performance and/or
|
|
abort.
|
|
|
|
Server: %s
|
|
usNIC interface: %s (which is %s)
|
|
Current ARP timeout: %d (btl_usnic_arp_timeout MCA param)
|
|
#
|
|
[transport mismatch]
|
|
The underlying transports used by the usNIC driver stack on multiple
|
|
servers do not match. This configuration is unsupported and is almost
|
|
certainly not what you want.
|
|
|
|
This error indicates that the VIC firmware, Linux usNIC kernel driver,
|
|
and/or Linux usNIC userspace drivers are not compatible between at
|
|
least the following two servers:
|
|
|
|
Local server: %s
|
|
Remote server: %s
|
|
|
|
The usnic MPI transport will be deactivated in at least the one local
|
|
MPI process that reported the problem. This may lead to performance
|
|
degradation, and may also result in aborting the overall MPI job.
|
|
|
|
It is usually easiest to have the same VIC firmware, Linux usNIC
|
|
kernel driver, and Linux usNIC userspace driver installed on all
|
|
servers.
|
|
#
|
|
[create_ah failed]
|
|
WARNING: Open MPI failed to find a route to a peer IP address via a
|
|
specific usNIC interface. This usually indicates a problem in the IP
|
|
routing between these peers.
|
|
|
|
Open MPI will skip this usNIC interface when communicating with that
|
|
peer, which may result in lower performance to that peer. It may also
|
|
result in your job aborting if there are no other network paths to
|
|
that peer.
|
|
|
|
Note that this error message defaults to only printing for the *first*
|
|
pair of MPI process peers that exhibit this problem; this same problem
|
|
may exist for other peer pairs, too.
|
|
|
|
Local interface: %s:%s (which is %s and %s)
|
|
Peer: %s:%s
|
|
|
|
NOTE: You can set the MCA param btl_usnic_show_route_failures to 0 to
|
|
disable this warning.
|
|
#
|
|
[cannot write to map file]
|
|
WARNING: The usnic BTL was unable to open the requested output
|
|
connectivity map file. Your job will continue, but this output
|
|
connectivity map file will not be written.
|
|
|
|
Local server: %s
|
|
Output map file: %s
|
|
Working directory: %s
|
|
Error: %s (%d)
|
|
#
|
|
[received too many short packets]
|
|
WARNING: The usnic BTL received a significant number of abnormally
|
|
short packets on a single network interface. This may be due to
|
|
corruption or congestion in the network fabric. It may be useful to
|
|
run a physical/layer 0 diagnostic.
|
|
|
|
Your job will continue, but if this poor network behavior continues,
|
|
you may experience lower-than-expected performance due to overheads
|
|
caused by higher-than-usual retransmission rates (to compensate for
|
|
the corrupted received packets).
|
|
|
|
Local server: %s
|
|
usNIC interface: %s (which is %s)
|
|
# of short packets
|
|
received so far: %d
|
|
|
|
You will only receive this warning once per MPI process per job.
|
|
|
|
If you know that your network environment is lossy/heavily congested
|
|
such that short/corrupted packets are expected, you can disable this
|
|
warning by setting the btl_usnic_max_short_packets MCA parameter to 0.
|
|
#
|
|
[non-receive completion error]
|
|
WARNING: The usnic BTL has detected an error in the completion of a
|
|
non-receive event. This is highly unusual, and may indicate an error
|
|
in the usNIC subsystem on this server.
|
|
|
|
Your MPI job will continue, but you should monitor the job and ensure
|
|
that it behaves correctly.
|
|
|
|
Local server: %s
|
|
usNIC interface: %s (which is %s)
|
|
Channel index: %d
|
|
Completion status: %d
|
|
Work request ID: %p
|
|
Opcode: %d
|
|
Vendor error: %d
|
|
|
|
If this error keeps happening, you should contact Cisco technical
|
|
support.
|