# -*- text -*-
#
# Copyright (c) 2012-2014 Cisco Systems, Inc. All rights reserved.
#
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English help file for the Open MPI usnic BTL.
#
[ibv API failed]
Open MPI failed a basic verbs operation on a Cisco usNIC interface.
This is highly unusual and shouldn't happen. It suggests that there
may be something wrong with the usNIC or OpenFabrics configuration on
this server.

In addition to any suggestions listed below, you might want to check
the Linux "memlock" limits on your system (they should probably be
"unlimited"). See this FAQ entry for details:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Open MPI will skip this usNIC interface in the usnic BTL, which may
result in either lower performance or your job aborting.

  Server:           %s
  usNIC interface:  %s (which is %s)
  Failed function:  %s (%s:%d)
  Description:      %s
#
[not enough usnic resources]
There are not enough usNIC resources on a VIC for all the MPI
processes on this server. This means that you have either not
provisioned enough usNICs on this VIC, or there are not enough total
receive, transmit, or completion queues on the provisioned usNICs.

On each VIC in a given server, you need to provision at least as many
usNICs as MPI processes on that server. In each usNIC, you need to
provision at least two each of the following: send queues, receive
queues, and completion queues.

Open MPI will skip this usNIC interface in the usnic BTL, which may
result in either lower performance or your job aborting.

  Server:           %s
  usNIC interface:  %s
  Description:      %s
#
[create ibv resource failed]
Open MPI failed to allocate a usNIC-related resource on a VIC. This
usually means one of two things:

1. You are running something other than this MPI job on this server
   that is consuming usNIC resources.

2. You have run out of locked Linux memory. You should probably set
   the Linux "memlock" limits to "unlimited". See this FAQ entry for
   details:

       http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

This Open MPI job will skip this usNIC interface in the usnic BTL,
which may result in either lower performance or the job aborting.

  Server:           %s
  usNIC interface:  %s (which is %s)
  Failed function:  %s (%s:%d)
  Description:      %s
#
[async event]
Open MPI detected a fatal error on a usNIC interface. Your MPI job
will now abort; sorry.

  Server:            %s
  usNIC interface:   %s (which is %s)
  Async event code:  %s (%d)
#
[internal error during init]
An internal error has occurred in the Open MPI usNIC BTL. This is
highly unusual and shouldn't happen. It suggests that there may be
something wrong with the usNIC or OpenFabrics configuration on this
server.

Open MPI will skip this usNIC interface in the usnic BTL, which may
result in either lower performance or your job aborting.

  Server:           %s
  usNIC interface:  %s (which is %s)
  Failure:          %s (%s:%d)
  Description:      %s
#
[internal error after init]
An internal error has occurred in the Open MPI usNIC BTL. This is
highly unusual and shouldn't happen. It suggests that there may be
something wrong with the usNIC or OpenFabrics configuration on this
server.

  Server:   %s
  Message:  %s
  File:     %s
  Line:     %d
  Error:    %s
#
[verbs_port_bw failed]
Open MPI failed to query the supported bandwidth of a usNIC interface.
This is unusual and shouldn't happen. It suggests that there may be
something wrong with the usNIC or OpenFabrics configuration on this
server.
Open MPI will skip this usNIC interface in the usnic BTL, which may
result in either lower performance or your job aborting.

  Server:           %s
  usNIC interface:  %s (which is %s)
#
[check_reg_mem_basics fail]
The usNIC BTL failed to initialize while trying to register some
memory. This typically can indicate that the "memlock" limits are set
too low. For most HPC installations, the memlock limits should be set
to "unlimited".

The failure occurred here:

  Local host:     %s
  Memlock limit:  %s

You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
#
[invalid if_inexclude]
WARNING: An invalid value was given for btl_usnic_if_%s. This value
will be ignored.

  Local host:  %s
  Value:       %s
  Message:     %s
#
[MTU mismatch]
The MTU does not match on local and remote hosts. All interfaces on
all hosts participating in an MPI job must be configured with the
same MTU. The usNIC interface listed below will not be used to
communicate with this remote host.

  Local host:       %s
  usNIC interface:  %s (which is %s)
  Local MTU:        %d
  Remote host:      %s
  Remote MTU:       %d
#
[rtnetlink init fail]
The usnic BTL failed to initialize the rtnetlink query subsystem.

  Server:         %s
  Error message:  %s
#
[connectivity error: small ok, large bad]
The Open MPI usNIC BTL was unable to establish full connectivity
between at least one pair of servers in the MPI job. Specifically,
small UDP messages seem to flow between the servers, but large UDP
messages do not.

Your MPI job is going to abort now.

  Source:
    Hostname / IP:    %s (%s)
    Host interfaces:  %s / %s
    MAC address:      %s

  Destination:
    Hostname / IP:  %s (%s)
    MAC address:    %s

  Small message size:  %u
  Large message size:  %u

Note that this behavior usually indicates that the MTU of some
network link is too small between these two servers. You should
verify that UDP traffic with payloads up to the "large message size"
listed above can flow between these two servers.
#
[connectivity error: small bad, large ok]
The Open MPI usNIC BTL was unable to establish full connectivity
between at least one pair of servers in the MPI job. Specifically,
large UDP messages seem to flow between the servers, but small UDP
messages do not.

Your MPI job is going to abort now.

  Source:
    Hostname / IP:    %s (%s)
    Host interfaces:  %s / %s
    MAC address:      %s

  Destination:
    Hostname / IP:  %s (%s)
    MAC address:    %s

  Small message size:  %u
  Large message size:  %u

This is a very strange network error, and should not occur in most
situations. You may be experiencing high amounts of congestion, or
this may indicate some kind of network misconfiguration. You should
verify that UDP traffic with payloads up to the "large message size"
listed above can flow between these two servers.
#
[connectivity error: small bad, large bad]
The Open MPI usNIC BTL was unable to establish any connectivity
between at least one pair of servers in the MPI job. Specifically, no
UDP messages seemed to flow between these two servers.

Your MPI job is going to abort now.

  Source:
    Hostname / IP:    %s (%s)
    Host interfaces:  %s / %s
    MAC address:      %s

  Destination:
    Hostname / IP:  %s (%s)
    MAC address:    %s

  Small message size:  %u
  Large message size:  %u

Note that this behavior usually indicates some kind of network
misconfiguration. You should verify that UDP traffic with payloads up
to the "large message size" listed above can flow between these two
servers.
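
For example, one way to check (outside of Open MPI) whether UDP
datagrams of a given size can pass between the two servers is with a
UDP-capable network test tool; the tool and options below are only a
suggestion, assuming iperf3 is installed on both servers:

    server$ iperf3 -s
    client$ iperf3 -c <remote server> -u -l <large message size>

Any other method of sending UDP datagrams of the relevant sizes
between the two servers works just as well.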
#
[ibv_create_ah timeout]
The usnic BTL failed to create addresses for remote peers within the
specified timeout. When using the usNIC/UDP transport, this usually
means that ARP requests failed to resolve in time. You may be able to
solve the problem by increasing the usnic BTL's ARP timeout. If that
doesn't work, you should diagnose why ARP replies are apparently not
being delivered in a timely manner.

The usNIC interface listed below will be ignored. Your MPI
application will likely run with degraded performance and/or abort.

  Server:               %s
  usNIC interface:      %s (which is %s)
  Current ARP timeout:  %d (btl_usnic_arp_timeout MCA param)
#
[transport mismatch]
The underlying transports used by the usNIC driver stack on multiple
servers do not match. This configuration is unsupported and is almost
certainly not what you want.

This error indicates that the VIC firmware, Linux usNIC kernel
driver, and/or Linux usNIC userspace drivers are not compatible
between at least the following two servers:

  Local server:   %s
  Remote server:  %s

The usnic MPI transport will be deactivated in at least the one local
MPI process that reported the problem. This may lead to performance
degradation, and may also result in aborting the overall MPI job.

It is usually easiest to have the same VIC firmware, Linux usNIC
kernel driver, and Linux usNIC userspace driver installed on all
servers.
#
[create_ah failed]
WARNING: Open MPI failed to find a route to a peer IP address via a
specific usNIC interface. This usually indicates a problem in the IP
routing between these peers.

Open MPI will skip this usNIC interface when communicating with that
peer, which may result in lower performance to that peer. It may also
result in your job aborting if there are no other network paths to
that peer.

Note that this error message defaults to only printing for the
*first* pair of MPI process peers that exhibit this problem; this
same problem may exist for other peer pairs, too.

  Local interface:  %s:%s (which is %s and %s)
  Peer:             %s:%s

NOTE: You can set the MCA param btl_usnic_show_route_failures to 0 to
disable this warning.
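
For example, assuming a typical mpirun command line, the parameter
can be set by adding it directly to that command line (the trailing
"..." stands for the rest of your unchanged mpirun arguments):

    mpirun --mca btl_usnic_show_route_failures 0 ...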