# -*- text -*-
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
#                         University Research and Technology
#                         Corporation. All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
#                         of Tennessee Research Foundation. All rights
#                         reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
#                         University of Stuttgart. All rights reserved.
# Copyright (c) 2004-2006 The Regents of the University of California.
#                         All rights reserved.
# Copyright (c) 2006-2011 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2007-2009 Mellanox Technologies. All rights reserved.
# Copyright (c) 2009      Sun Microsystems, Inc. All rights reserved.
# Copyright (c) 2013-2014 NVIDIA Corporation. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English help file for Open MPI's OpenFabrics support
# (the openib BTL).
#
[ini file:file not found]
The Open MPI OpenFabrics (openib) BTL component was unable to find or
read an INI file that was requested via the
btl_openib_device_param_files MCA parameter. Please check this file
and/or modify the btl_openib_device_param_files MCA parameter:

    %s
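
For example (the file name below is only illustrative; substitute the
path you actually intend to use), this parameter can be set on the
mpirun command line:

    mpirun --mca btl_openib_device_param_files /path/to/my-params.ini ...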
#
[ini file:not in a section]
In parsing the OpenFabrics (openib) BTL parameter file, values were
found that were not in a valid INI section. These values will be
ignored. Please re-check this file:

    %s

At line %d, near the following text:

    %s
#
[ini file:unexpected token]
In parsing the OpenFabrics (openib) BTL parameter file, unexpected
tokens were found (this may cause significant portions of the INI file
to be ignored). Please re-check this file:

    %s

At line %d, near the following text:

    %s
#
[ini file:expected equals]
In parsing the OpenFabrics (openib) BTL parameter file, unexpected
tokens were found (this may cause significant portions of the INI file
to be ignored). An equals sign ("=") was expected but was not found.
Please re-check this file:

    %s

At line %d, near the following text:

    %s
#
[ini file:expected newline]
In parsing the OpenFabrics (openib) BTL parameter file, unexpected
tokens were found (this may cause significant portions of the INI file
to be ignored). A newline was expected but was not found. Please
re-check this file:

    %s

At line %d, near the following text:

    %s
#
[ini file:unknown field]
In parsing the OpenFabrics (openib) BTL parameter file, an
unrecognized field name was found. Please re-check this file:

    %s

At line %d, the field named:

    %s

This field, and any other unrecognized fields, will be skipped.
#
[no device params found]
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            %s
  Device name:           %s
  Device vendor ID:      0x%04x
  Device vendor part ID: %d

Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
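
An entry in such a file typically looks like the following (the section
name and the vendor/part IDs shown here are only illustrative; use the
values reported above for your device):

    [My HCA]
    vendor_id = 0x02c9
    vendor_part_id = 25408
    use_eager_rdma = 1
    mtu = 2048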
#
[init-fail-no-mem]
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory. This typically can indicate that the
memlock limits are set too low. For most HPC installations, the
memlock limits should be set to "unlimited". The failure occurred
here:

  Local host:    %s
  OMPI source:   %s:%d
  Function:      %s()
  Device:        %s
  Memlock limit: %s

You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
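
As a quick check (illustrative commands; your site may manage limits
differently), you can see the current locked-memory limit for your
shell with:

    ulimit -l

and, on many Linux systems, raise it for all users by adding lines
such as the following to /etc/security/limits.conf:

    * soft memlock unlimited
    * hard memlock unlimited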
#
[init-fail-create-q]
The OpenFabrics (openib) BTL failed to initialize while trying to
create an internal queue. This typically indicates a failed
OpenFabrics installation, faulty hardware, or that Open MPI is
attempting to use a feature that is not supported on your hardware
(i.e., is a shared receive queue specified in the
btl_openib_receive_queues MCA parameter with a device that does not
support it?). The failure occurred here:

  Local host:  %s
  OMPI source: %s:%d
  Function:    %s()
  Error:       %s (errno=%d)
  Device:      %s

You may need to consult with your system administrator to get this
problem fixed.
#
[pp rnr retry exceeded]
The OpenFabrics "receiver not ready" retry count on a per-peer
connection between two MPI processes has been exceeded. In general,
this should not happen because Open MPI uses flow control on per-peer
connections to ensure that receivers are always ready when data is
sent.

This error usually means one of two things:

1. There is something awry within the network fabric itself.
2. A bug in Open MPI has caused flow control to malfunction.

#1 is usually more likely. You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.

Below is some information about the host that raised the error and the
peer to which it was connected:

  Local host:   %s
  Local device: %s
  Peer host:    %s

You may need to consult with your system administrator to get this
problem fixed.
#
[srq rnr retry exceeded]
The OpenFabrics "receiver not ready" retry count on a shared receive
queue or XRC receive queue has been exceeded. This error can occur if
the btl_openib_ib_rnr_retry MCA parameter is set to a value less than
7 (7 is the default value and effectively means "infinite retry"). If
your rnr_retry value is 7, there might be something awry within the
network fabric itself. In this case, you should note the hosts on
which this error has occurred; it has been observed that rebooting or
removing a particular host from the job can sometimes resolve this
issue.

Below is some information about the host that raised the error and the
peer to which it was connected:

  Local host:   %s
  Local device: %s
  Peer host:    %s

You may need to consult with your system administrator to get this
problem fixed.
#
[pp retry exceeded]
The InfiniBand retry count between two MPI processes has been
exceeded. "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):

    The total number of times that the sender wishes the receiver to
    retry timeout, packet sequence, etc. errors before posting a
    completion error.

This error typically means that there is something awry within the
InfiniBand fabric itself. You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.

Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:

* btl_openib_ib_retry_count - The number of times the sender will
  attempt to retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
  to 20). The actual timeout value used is calculated as:

    4.096 microseconds * (2^btl_openib_ib_timeout)
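
For example, with the default btl_openib_ib_timeout value of 20, the
timeout is 4.096 microseconds * 2^20, or roughly 4.3 seconds; a value
of 14 would give roughly 67 milliseconds.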

See the InfiniBand spec 1.2 (section 12.7.34) for more details.

Below is some information about the host that raised the error and the
peer to which it was connected:

  Local host:   %s
  Local device: %s
  Peer host:    %s

You may need to consult with your system administrator to get this
problem fixed.
#
[no active ports found]
WARNING: At least one non-excluded OpenFabrics device was found, but
no active ports were detected (or Open MPI was unable to use them).
This is most certainly not what you wanted. Check your cables, subnet
manager configuration, etc. The openib BTL will be ignored for this
job.

  Local host: %s
#
[error in device init]
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   %s
  Local device: %s
#
[no devices right type]
WARNING: No OpenFabrics devices of the right type were found within
the requested bus distance. The OpenFabrics BTL will be ignored for
this run.

  Local host:     %s
  Requested type: %s

If the "requested type" is "<any>", this usually means that *no*
OpenFabrics devices were found within the requested bus distance.
#
[default subnet prefix]
WARNING: There is more than one active port on host '%s', but the
default subnet GID prefix was detected on more than one of these
ports. If these ports are connected to different physical IB
networks, this configuration will fail in Open MPI. This version of
Open MPI requires that every physically separate IB subnet that is
used between connected MPI processes must have different subnet ID
values.

Please see this FAQ entry for more details:

    http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_default_gid_prefix to 0.
#
[ibv_fork requested but not supported]
WARNING: fork() support was requested for the OpenFabrics (openib)
BTL, but it is not supported on the host %s. Deactivating the
OpenFabrics BTL.
#
[ibv_fork_init fail]
WARNING: fork() support was requested for the OpenFabrics (openib)
BTL, but the library call ibv_fork_init() failed on the host %s.
Deactivating the OpenFabrics BTL.
#
[wrong buffer alignment]
An invalid buffer alignment of %d was configured on host '%s'. The
alignment must be greater than zero and a power of two. The default
value of %d will be used instead.
#
[of error event]
The OpenFabrics stack has reported a network error event. Open MPI
will try to continue, but your job may end up failing.

  Local host:      %s
  MPI process PID: %d
  Error number:    %d (%s)

This error may indicate connectivity problems within the fabric;
please contact your system administrator.
#
[of unknown event]
The OpenFabrics stack has reported an unknown network error event.
Open MPI will try to continue, but the job may end up failing.

  Local host:      %s
  MPI process PID: %d
  Error number:    %d

This error may indicate that you are using an OpenFabrics library
version that is not currently supported by Open MPI. You might try
recompiling Open MPI against your OpenFabrics library installation to
get more information.
#
[specified include and exclude]
ERROR: You have specified more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or
btl_openib_ipaddr_exclude MCA parameters. These four parameters are
mutually exclusive; you can only specify one.

For reference, the values that you specified are:

  btl_openib_if_include:     %s
  btl_openib_if_exclude:     %s
  btl_openib_ipaddr_include: %s
  btl_openib_ipaddr_exclude: %s
#
[nonexistent port]
WARNING: One or more nonexistent OpenFabrics devices/ports were
specified:

  Host:                 %s
  MCA parameter:        btl_openib_if_%sclude
  Nonexistent entities: %s

These entities will be ignored. You can disable this warning by
setting the btl_openib_warn_nonexistent_if MCA parameter to 0.
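
For example (the device and port names below are only illustrative;
use the names reported by ibv_devinfo on your host), you might
specify:

    mpirun --mca btl_openib_if_include mlx4_0:1 ...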
#
[invalid mca param value]
WARNING: An invalid MCA parameter value was found for the OpenFabrics
(openib) BTL.

  Problem:    %s
  Resolution: %s
#
[no qps in receive_queues]
WARNING: No queue pairs were defined in the btl_openib_receive_queues
MCA parameter. At least one queue pair must be defined. The
OpenFabrics (openib) BTL will therefore be deactivated for this run.

  Local host: %s
#
[invalid qp type in receive_queues]
WARNING: An invalid queue pair type was specified in the
btl_openib_receive_queues MCA parameter. The OpenFabrics (openib) BTL
will be deactivated for this run.

Valid queue pair types are "P" for per-peer and "S" for shared receive
queue.

  Local host:                %s
  btl_openib_receive_queues: %s
  Bad specification:         %s
#
[invalid pp qp specification]
WARNING: An invalid per-peer receive queue specification was detected
as part of the btl_openib_receive_queues MCA parameter. The
OpenFabrics (openib) BTL will therefore be deactivated for this run.

Per-peer receive queues require between 2 and 5 parameters:

1. Buffer size in bytes (mandatory)
2. Number of buffers (mandatory)
3. Low buffer count watermark (optional; defaults to (num_buffers / 2))
4. Credit window size (optional; defaults to (low_watermark / 2),
   must be > 0)
5. Number of buffers reserved for credit messages (optional;
   defaults to (num_buffers * 2 - 1) / credit_window)

Example: P,128,256,128,16
- 128 byte buffers
- 256 buffers to receive incoming MPI messages
- When the number of available buffers reaches 128, re-post 128 more
  buffers to reach a total of 256
- If the number of available credits reaches 16, send an explicit
  credit message to the sender
- Defaulting to ((256 * 2) - 1) / 16 = 31; this many buffers are
  reserved for explicit credit messages
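
A specification such as the example above is typically passed on the
mpirun command line, e.g. (the value shown is only illustrative):

    mpirun --mca btl_openib_receive_queues P,128,256,128,16 ...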

  Local host:              %s
  Bad queue specification: %s
#
[invalid srq specification]
WARNING: An invalid shared receive queue specification was detected as
part of the btl_openib_receive_queues MCA parameter. The OpenFabrics
(openib) BTL will therefore be deactivated for this run.

Shared receive queues can take between 2 and 6 parameters:

1. Buffer size in bytes (mandatory)
2. Number of buffers (mandatory)
3. Low buffer count watermark (optional; defaults to (num_buffers / 2))
4. Maximum number of outstanding sends a sender can have (optional;
   defaults to (low_watermark / 4))
5. Number of receive buffers that will initially be pre-posted
   (optional; defaults to (num_buffers / 4))
6. Event limit buffer count watermark (optional; defaults to 3/16 of
   the initial number of pre-posted buffers)

Example: S,1024,256,128,32,32,8
- 1024 byte buffers
- 256 buffers to receive incoming MPI messages
- When the number of available buffers reaches 128, re-post 128 more
  buffers to reach a total of 256
- A sender will not send to a peer unless it has fewer than 32
  outstanding sends to that peer.
- 32 receive buffers will be pre-posted.
- When the number of unused shared receive buffers reaches 8, more
  buffers (32 in this case) will be posted.

  Local host:              %s
  Bad queue specification: %s
#
[rd_num must be > rd_low]
WARNING: The number of buffers for a queue pair specified via the
btl_openib_receive_queues MCA parameter must be greater than the low
buffer count watermark. The OpenFabrics (openib) BTL will therefore
be deactivated for this run.

  Local host:              %s
  Bad queue specification: %s
#
[rd_num must be >= rd_init]
WARNING: The number of buffers for a queue pair specified via the
btl_openib_receive_queues MCA parameter (parameter #2) must be greater
than or equal to the initial SRQ size (parameter #5). The OpenFabrics
(openib) BTL will therefore be deactivated for this run.

  Local host:              %s
  Bad queue specification: %s
#
[srq_limit must be > rd_num]
WARNING: The number of buffers for a queue pair specified via the
btl_openib_receive_queues MCA parameter (parameter #2) must be greater
than the limit buffer count (parameter #6). The OpenFabrics (openib)
BTL will therefore be deactivated for this run.

  Local host:              %s
  Bad queue specification: %s
#
[biggest qp size is too small]
WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is smaller than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
that no queue is large enough to receive the largest possible incoming
message fragment. The OpenFabrics (openib) BTL will therefore be
deactivated for this run.

  Local host:                 %s
  Largest buffer size:        %d
  Maximum send fragment size: %d
#
[biggest qp size is too big]
WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is larger than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter). This
means that memory will be wasted because the largest possible incoming
message fragment will not fill a buffer allocated for incoming
fragments.

  Local host:                 %s
  Largest buffer size:        %d
  Maximum send fragment size: %d
#
[freelist too small]
WARNING: The maximum freelist size that was specified was too small
for the requested receive queue sizes. The maximum freelist size must
be at least equal to the sum of the largest number of buffers posted
to a single queue plus the corresponding number of reserved/credit
buffers for that queue. It is suggested that the maximum be quite a
bit larger than this for performance reasons.

  Local host:                     %s
  Specified freelist size:        %d
  Minimum required freelist size: %d
#
[XRC with PP or SRQ]
WARNING: An invalid queue pair type was specified in the
btl_openib_receive_queues MCA parameter. The OpenFabrics (openib) BTL
will be deactivated for this run.

Note that XRC ("X") queue pairs cannot be used with per-peer ("P") and
SRQ ("S") queue pairs. This restriction may be removed in future
versions of Open MPI.

  Local host:                %s
  btl_openib_receive_queues: %s
#
[XRC with BTLs per LID]
WARNING: An invalid queue pair type was specified in the
btl_openib_receive_queues MCA parameter. The OpenFabrics (openib) BTL
will be deactivated for this run.

XRC ("X") queue pairs cannot be used when (btls_per_lid > 1). This
restriction may be removed in future versions of Open MPI.

  Local host:                %s
  btl_openib_receive_queues: %s
  btls_per_lid:              %d
#
[XRC on device without XRC support]
WARNING: You configured the OpenFabrics (openib) BTL to run with %d
XRC queues. The device %s does not have XRC capabilities; the
OpenFabrics BTL will ignore this device. If no devices are found with
XRC capabilities, the OpenFabrics BTL will be disabled.

  Local host: %s
#
[No XRC support]
WARNING: The Open MPI build was compiled without XRC support, but XRC
("X") queues were specified in the btl_openib_receive_queues MCA
parameter. The OpenFabrics (openib) BTL will therefore be deactivated
for this run.

  Local host:                %s
  btl_openib_receive_queues: %s
#
[non optimal rd_win]
WARNING: The rd_win specification is not optimal. For maximum
performance it is advisable to configure rd_win to be larger than
(rd_num - rd_low), but currently rd_win = %d and (rd_num - rd_low) = %d.
#
[apm without lmc]
WARNING: APM support cannot be enabled when the LMC bit is configured
to 0. APM support will be disabled.
#
[apm with wrong lmc]
Cannot provide %d alternative paths with the LMC bit configured to %d.
#
[apm not enough ports]
WARNING: APM over ports requires at least 2 active ports, but only a
single active port was found. Disabling APM over ports.
#
[locally conflicting receive_queues]
Open MPI detected two devices on a single server that have different
"receive_queues" parameter values (in the openib BTL). Open MPI
currently only supports one OpenFabrics receive_queues value in an MPI
job, even if you have different types of OpenFabrics adapters on the
same host.

Device 2 (in the details shown below) will be ignored for the duration
of this MPI job.

You can fix this issue by one or more of the following:

1. Set the MCA parameter btl_openib_receive_queues to a value that
   is usable by all the OpenFabrics devices that you will use.
2. Use the btl_openib_if_include or btl_openib_if_exclude MCA
   parameters to select exactly which OpenFabrics devices to use in
   your MPI job.

Finally, note that the "receive_queues" values may have been set by
the Open MPI device default settings file. You may want to look in
this file and see if your devices are getting receive_queues values
from this file:

    %s/mca-btl-openib-device-params.ini

Here is more detailed information about the receive_queues value
conflict:

  Local host:        %s
  Device 1:          %s (vendor 0x%x, part ID %d)
    Receive queues:  %s
  Device 2:          %s (vendor 0x%x, part ID %d)
    Receive queues:  %s
#
[eager RDMA and progress threads]
WARNING: The openib BTL was directed to use "eager RDMA" for short
messages, but the openib BTL was compiled with progress threads
support. Short eager RDMA is not yet supported with progress threads;
its use has been disabled in this job.

This is a warning only; your job will attempt to continue.
#
[ptmalloc2 with no threads]
WARNING: It appears that ptmalloc2 was compiled into this process via
-lopenmpi-malloc, but there is no thread support. This combination is
known to cause memory corruption in the openib BTL. Open MPI is
therefore disabling the use of the openib BTL in this process for this
run.

  Local host: %s
#
[cannot raise btl error]
The OpenFabrics driver in Open MPI tried to raise a fatal error, but
failed. Hopefully there was an error message before this one that
gave some more detailed information.

  Local host:  %s
  Source file: %s
  Source line: %d

Your job is now going to abort, sorry.
#
[no iwarp support]
Open MPI does not support iWARP devices with this version of OFED.
You need to upgrade to a later version of OFED (1.3 or later) for Open
MPI to support iWARP devices.

(This message is being displayed because you told Open MPI to use
iWARP devices via the btl_openib_device_type MCA parameter.)
#
[invalid ipaddr_inexclude]
WARNING: An invalid value was given for btl_openib_ipaddr_%s. This
value will be ignored.

  Local host: %s
  Value:      %s
  Message:    %s
#
[unsupported queues configuration]
The Open MPI receive queue configurations for the OpenFabrics devices
on two nodes are incompatible, meaning that MPI processes on two
specific nodes were unable to communicate with each other. This
generally happens when you are using OpenFabrics devices from
different vendors on the same network. You should be able to use the
btl_openib_receive_queues MCA parameter to set a uniform receive
queue configuration for all the devices in the MPI job, and therefore
be able to run successfully.
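
For example, you could force the same receive queue specification on
every node by passing it on the mpirun command line (the value shown
is only illustrative, taken from the shared receive queue example
earlier in this file; any specification accepted by all of your
devices will do):

    mpirun --mca btl_openib_receive_queues S,1024,256,128,32,32,8 ...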

  Local host:    %s
  Local adapter: %s (vendor 0x%x, part ID %d)
  Local queues:  %s

  Remote host:    %s
  Remote adapter: (vendor 0x%x, part ID %d)
  Remote queues:  %s
#
[conflicting transport types]
Open MPI detected two different OpenFabrics transport types in the
same InfiniBand network. Such a mixed network transport configuration
is not supported by Open MPI.

  Local host:            %s
  Local adapter:         %s (vendor 0x%x, part ID %d)
  Local transport type:  %s

  Remote host:            %s
  Remote adapter:         (vendor 0x%x, part ID %d)
  Remote transport type:  %s
#
[gid index too large]
Open MPI tried to use a GID index that was too large for an
OpenFabrics device (i.e., the GID index does not exist on this
device).

  Local host:    %s
  Local adapter: %s
  Local port:    %d

  Requested GID index:     %d (specified by the btl_openib_gid_index MCA param)
  Max allowable GID index: %d

Use "ibv_devinfo -v" on the local host to see the GID table of this
device.
[reg mem limit low]
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:          %s
  Registerable memory: %lu MiB
  Total memory:        %lu MiB

%s
[CUDA_no_gdr_support]
You requested to run with CUDA GPU Direct RDMA support but the Open MPI
library was not built with that support. The Open MPI library must be
configured with CUDA 6.0 or later.

  Local host: %s
[driver_no_gdr_support]
You requested to run with CUDA GPU Direct RDMA support but this OFED
installation does not have that support. Contact Mellanox to figure
out how to get an OFED stack with that support.

  Local host: %s
[no_fork_with_gdr]
You cannot have fork support and CUDA GPU Direct RDMA support enabled
at the same time. Please disable one of them. Deactivating the openib
BTL.

  Local host: %s
#
[CUDA_gdr_and_nopinned]
You requested to run with CUDA GPU Direct RDMA support but also with
"leave pinned" turned off. This will result in very poor performance
with CUDA GPU Direct RDMA. Either disable GPU Direct RDMA support or
enable "leave pinned" support. Deactivating the openib BTL.

  Local host: %s
#
[do_not_set_openib_value]
Open MPI has detected that you have attempted to set the
btl_openib_cuda_max_send_size value. This is not supported; it will be
set back to the default value of 0.

  Local host: %s