usnic: update connectivity checker help message
Show an example of using the btl_usnic_connectivity_map option. Also, mention that the "total connectivity failure" may be caused by asymmetric / unexpected routing.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32465.
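The help text updated by this commit asks users to verify that UDP traffic with payloads up to the "large message size" can flow between two interfaces. A minimal sketch of such a check in Python follows. It is an illustration only and not part of Open MPI: the function name `udp_payload_flows` and its parameters are hypothetical, and because both sockets live in one process it only exercises a local address. For a real check between servers, the receiving half would have to run on the remote host.

```python
import socket


def udp_payload_flows(payload_size, recv_addr="127.0.0.1", timeout=1.0):
    """Return True if one UDP datagram of payload_size bytes arrives intact.

    Hypothetical helper: sender and receiver share this process, so it
    only tests the path to a local address (loopback by default).
    """
    # Non-trivial byte pattern so truncation or corruption is detectable.
    payload = bytes(i % 251 for i in range(payload_size))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((recv_addr, 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)
        port = receiver.getsockname()[1]
        try:
            sender.sendto(payload, (recv_addr, port))
            data, _ = receiver.recvfrom(65535)
        except OSError:
            # Covers timeouts and "message too long" errors alike.
            return False
        return data == payload


if __name__ == "__main__":
    for size in (64, 1024, 4096):
        print(size, udp_payload_flows(size))
```

A single datagram per size keeps the sketch short; since UDP is lossy, a real check would send several datagrams before concluding that a given payload size cannot flow.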
Parent: c5b2f9c8a5
Commit: 323b9f346c
@@ -174,9 +174,18 @@ Your MPI job is going to abort now.
 Large message size: %u
 
 Note that this behavior usually indicates that the MTU of some network
-link is too small between these two interfaces. You should verify that
-UDP traffic with payloads up to the "large message size" listed above
-can flow between the specified interfaces on these servers.
+link is too small between these two interfaces.
+
+You should verify that UDP traffic with payloads up to the "large
+message size" listed above can flow between these two interfaces. You
+should also verify that Open MPI is choosing to pair IP interfaces
+consistently. For example:
+
+    mpirun --mca btl_usnic_connectivity_map mymap ...
+
+Check the resulting "mymap*" files to see the exact pairing of IP
+interfaces. Inconsistent results may be indicative of underlying
+network misconfigurations.
 #
 [connectivity error: small bad, large ok]
 The Open MPI usNIC BTL was unable to establish full connectivity
@@ -199,15 +208,28 @@ Your MPI job is going to abort now.
 
 This is a very strange network error, and should not occur in most
 situations. You may be experiencing high amounts of congestion, or
-this may indicate some kind of network misconfiguration. You should
-verify that UDP traffic with payloads up to the "large message size"
-listed above can flow between the specified interfaces on these
-servers.
+this may indicate some kind of network misconfiguration.
+
+You should verify that UDP traffic with payloads up to the "large
+message size" listed above can flow between these two interfaces. You
+should also verify that Open MPI is choosing to pair IP interfaces
+consistently. For example:
+
+    mpirun --mca btl_usnic_connectivity_map mymap ...
+
+Check the resulting "mymap*" files to see the exact pairing of IP
+interfaces. Inconsistent results may be indicative of underlying
+network misconfigurations.
 #
 [connectivity error: small bad, large bad]
 The Open MPI usNIC BTL was unable to establish any connectivity
 between at least one pair of interfaces on servers in the MPI job.
-Specifically, no UDP messages seemed to flow between the interfaces.
+This can happen for several reasons, including:
+
+1. No UDP traffic is able to flow between the interfaces listed below.
+2. There is asymmetric routing between the interfaces listed below,
+   leading Open MPI to discard UDP traffic it thinks is from an
+   unexpected source.
 
 Your MPI job is going to abort now.
 
@@ -223,9 +245,18 @@ Your MPI job is going to abort now.
 Large message size: %u
 
 Note that this behavior usually indicates some kind of network
-misconfiguration. You should verify that UDP traffic with payloads up
-to the "large message size" listed above can flow between the
-specified interfaces on these servers.
+misconfiguration.
+
+You should verify that UDP traffic with payloads up to the "large
+message size" listed above can flow between these two interfaces. You
+should also verify that Open MPI is choosing to pair IP interfaces
+consistently. For example:
+
+    mpirun --mca btl_usnic_connectivity_map mymap ...
+
+Check the resulting "mymap*" files to see the exact pairing of IP
+interfaces. Inconsistent results may be indicative of underlying
+network misconfigurations.
 #
 [ibv_create_ah timeout]
 The usnic BTL failed to create addresses for remote peers within the