openmpi/opal/mca/btl/openib/mca-btl-openib-device-params.ini
Ralph Castain 552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes.  UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00


#
# Copyright (c) 2006-2013 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2006-2011 Mellanox Technologies. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# This is the default NIC/HCA parameters file for Open MPI's OpenIB
# BTL. If NIC/HCA vendors wish to add their respective values into
# this file (that is distributed with Open MPI), please contact the
# Open MPI development team. See http://www.open-mpi.org/ for
# details.
# This file is in the "ini" style, meaning that it has sections
# identified by section names enclosed in square brackets (e.g.,
# "[Section name]") followed by "key = value" pairs indicating values
# for a specific NIC/HCA vendor and model. NICs/HCAs are identified
# by their vendor ID and vendor part ID, which can be obtained by
# running the diagnostic utility command "ibv_devinfo". The fields
# "vendor_id" and "vendor_part_id" are the vendor ID and vendor part
# ID, respectively.
# The sections in this file only accept a few fields:
# vendor_id: a comma-delimited list of integers of NIC/HCA vendor IDs,
# expressed either in decimal or hexadecimal (e.g., "13" or "0xd").
# Individual values can be taken directly from the output of
# "ibv_devinfo". NIC/HCA vendor IDs correspond to IEEE OUIs, for
# which you can find the canonical list here:
# http://standards.ieee.org/regauth/oui/. Example:
#
# vendor_id = 0x05ad
#
# Note: Several vendors resell Mellanox hardware and put their own firmware
# on the cards, therefore overriding the default Mellanox vendor ID.
#
# Mellanox 0x02c9
# Cisco 0x05ad
# Silverstorm 0x066a
# Voltaire 0x08f1
# HP 0x1708
# Sun 0x03ba
# Bull 0x119f
# vendor_part_id: a comma-delimited list of integers of different
# NIC/HCA models from a single vendor, expressed in either decimal or
# hexadecimal (e.g., "13" or "0xd"). Individual values can be
# obtained from the output of "ibv_devinfo". Example:
#
# vendor_part_id = 25208,25218
# mtu: an integer indicating the maximum transfer unit (MTU) to be
# used with this NIC/HCA. The effective MTU will be the minimum of a
# NIC's/HCA's MTU value and its peer NIC's/HCA's MTU value. Valid
# values are 256, 512, 1024, 2048, and 4096. Example:
#
# mtu = 1024
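#
# For instance (an illustrative pairing, not a measured result): if
# this NIC/HCA is configured with "mtu = 4096" and its peer with
# "mtu = 2048", the effective MTU for that connection is 2048.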
# use_eager_rdma: an integer indicating whether RDMA should be used
# for eager messages. 0 values indicate "no" (false); non-zero values
# indicate "yes" (true). This flag should only be enabled for
# NICs/HCAs that can provide guarantees about ordering of data in
# memory -- that the last byte of an incoming RDMA write will always
# be written last. Certain cards cannot provide this guarantee, while
# others can. Example:
#
# use_eager_rdma = 1
# receive_queues: a list of "bucket shared receive queues" (BSRQ) that
# are opened between MPI process peer pairs for point-to-point
# communications of messages shorter than the total length required
# for RDMA transfer. The use of multiple RQs, each with different
# sized posted receive buffers, can allow [much] better registered
# memory utilization -- MPI messages are sent on the QP with the
# smallest buffer size that will fit the message. Note that flow
# control messages are always sent across the QP with the smallest
# buffer size. Also note that the buffers *must* be listed in
# increasing buffer size. This parameter matches the
# mca_btl_openib_receive_queues MCA parameter; see the ompi_info help
# message and FAQ for a description of its values. BSRQ
# specifications are found in this precedence:
# highest: specifying the mca_btl_openib_receive_queues MCA param
# next: finding a value in this file
# lowest: using the default mca_btl_openib_receive_queues MCA param value
# Example:
#
# receive_queues = P,128,256,192,128:S,65536,256,192,128
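#
# As a rough guide to reading the value above: the field meanings
# below follow the Open MPI FAQ description of the
# mca_btl_openib_receive_queues parameter and are not defined in this
# file, so check the ompi_info help message for your version before
# relying on them. Queues are separated by colons; each queue is a
# comma-delimited list.
#
# P,128,256,192,128
# P = per-peer receive queue
# 128 = size of each receive buffer, in bytes
# 256 = number of receive buffers
# 192 = low buffer count watermark
# 128 = credit window size
#
# S,65536,256,192,128
# S = shared receive queue (SRQ)
# 65536 = size of each receive buffer, in bytes
# 256 = number of receive buffers
# 192 = low buffer count watermark
# 128 = maximum number of outstanding sends per sender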
# max_inline_data: an integer specifying the maximum inline data (in
# bytes) supported by the device. -1 means to use a run-time probe to
# figure out the maximum value supported by the device. Example:
#
# max_inline_data = 1024
# rdmacm_reject_causes_connect_error: a boolean indicating whether,
# when an RDMA CM REJECT is issued on the device, you might get a
# CONNECT_ERROR event back instead of the expected REJECT event.
# Open MPI uses RDMA CM REJECT messages in its normal wireup
# procedure; some connections are *expected* to be rejected. However,
# with some older drivers, if process A issues a REJECT, process B
# will receive a CONNECT_ERROR event instead of a REJECT event. So if
# this flag is set to true and we receive a CONNECT_ERROR event on a
# connection where we are expecting a REJECT, then just treat the
# CONNECT_ERROR exactly as we would have treated the REJECT. Setting
# this flag to true allows Open MPI to work around the behavior
# described above. It is [mostly] safe to set this flag to true even
# after a driver has been fixed; the scope of where this flag is used
# is small enough that it *shouldn't* mask real CONNECT_ERROR events.
# Example:
#
# rdmacm_reject_causes_connect_error = 1
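#
# Putting the fields together, a section for a new device might look
# like the sketch below. The section name and both ID values are
# hypothetical placeholders and do not describe a real device; the
# lines are left commented out so that they have no effect.
#
# [Example Vendor Example HCA]
# vendor_id = 0x1234
# vendor_part_id = 100,0x65
# use_eager_rdma = 0
# mtu = 2048
# max_inline_data = -1
#
# In this sketch, use_eager_rdma is left at 0 on the assumption that
# the device's memory-ordering guarantee is unknown, and
# max_inline_data = -1 requests the run-time probe described above.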
############################################################################
[default]
# These are the default values, identified by the vendor and part ID
# numbers of 0 and 0. If a queried NIC/HCA does not return vendor and
# part ID numbers that match any of the sections in this file, the
# values in this section are used. Vendor IDs and part IDs can be hex
# or decimal.
vendor_id = 0
vendor_part_id = 0
use_eager_rdma = 0
mtu = 1024
max_inline_data = 128
############################################################################
[Mellanox Tavor Infinihost]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
vendor_part_id = 23108
use_eager_rdma = 1
mtu = 1024
max_inline_data = 128
############################################################################
[Mellanox Arbel InfiniHost III MemFree/Tavor]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
vendor_part_id = 25208,25218
use_eager_rdma = 1
mtu = 1024
max_inline_data = 128
############################################################################
[Mellanox Sinai Infinihost III]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
vendor_part_id = 25204,24204
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128
############################################################################
# A.k.a. ConnectX
[Mellanox Hermon]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 25408,25418,25428,25448,26418,26428,26438,26448,26468,26478,26488,4099,4103
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128
############################################################################
[Mellanox ConnectIB]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 4113
use_eager_rdma = 1
mtu = 4096
max_inline_data = 256
############################################################################
[Mellanox ConnectX4]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 4115
use_eager_rdma = 1
mtu = 4096
max_inline_data = 256
############################################################################
[IBM eHCA 4x and 12x]
vendor_id = 0x5076
vendor_part_id = 0
use_eager_rdma = 1
mtu = 2048
receive_queues = P,128,256,192,128:P,65536,256,192,128
max_inline_data = 0
############################################################################
[IBM eHCA-2 4x and 12x]
vendor_id = 0x5076
vendor_part_id = 1
use_eager_rdma = 1
mtu = 4096
receive_queues = P,128,256,192,128:P,65536,256,192,128
max_inline_data = 0
############################################################################
# See http://lists.openfabrics.org/pipermail/general/2008-June/051920.html
# 0x1fc1 and 0x1077 are PCI ID's; at least one of QL's OUIs is 0x1175
[QLogic InfiniPath 1]
vendor_id = 0x1fc1,0x1077,0x1175
vendor_part_id = 13
use_eager_rdma = 1
mtu = 2048
max_inline_data = 0
[QLogic InfiniPath 2]
vendor_id = 0x1fc1,0x1077,0x1175
vendor_part_id = 16,29216
use_eager_rdma = 1
mtu = 4096
max_inline_data = 0
[QLogic InfiniPath 3]
vendor_id = 0x1fc1,0x1077,0x1175
vendor_part_id = 16,29474
use_eager_rdma = 1
mtu = 4096
max_inline_data = 0
############################################################################
# Chelsio's OUI is 0x0743. 0x1425 is the PCI ID.
[Chelsio T3]
vendor_id = 0x1425
vendor_part_id = 0x0020,0x0021,0x0022,0x0023,0x0024,0x0025,0x0026,0x0030,0x0031,0x0032,0x0035,0x0036
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,256,192,128
max_inline_data = 64
[Chelsio T4]
vendor_id = 0x1425
vendor_part_id = 0xa000,0x4400,0x4401,0x4402,0x4403,0x4404,0x4405,0x4406,0x4407,0x4408,0x4409,0x440a,0x440b,0x440c,0x440d,0x440e,0x4480,0x4481
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,64
max_inline_data = 280
[Chelsio T5]
vendor_id = 0x1425
vendor_part_id = 0xb000,0xb001,0x5400,0x5401,0x5402,0x5403,0x5404,0x5405,0x5406,0x5407,0x5408,0x5409,0x540a,0x540b,0x540c,0x540d,0x540e,0x540f,0x5410,0x5411,0x5412,0x5413
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,64
max_inline_data = 280
############################################################################
# I'm *assuming* that 0x4040 is the PCI ID...
[NetXen]
vendor_id = 0x4040
vendor_part_id = 0x0001,0x0002,0x0003,0x0004,0x0005,0x0024,0x0025,0x0100
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,248,192,128
max_inline_data = 64
############################################################################
# NetEffect's OUI is 0x1255. 0x1678 is the PCI ID. ...but then
# NetEffect was bought by Intel. Intel's OUI is 0x1b21.
[NetEffect/Intel NE020]
vendor_id = 0x1678,0x1255,0x1b21
vendor_part_id = 0x0100,0x0110
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,256,192,128
max_inline_data = 64
############################################################################
# Intel has several OUI's, including 0x8086. Amusing. :-) Intel has
# advised us (June, 2013) to ignore the Intel Phi OpenFabrics
# device... at least for now.
[Intel Xeon Phi]
vendor_id = 0x8086
vendor_part_id = 0
ignore_device = 1