552c9ca5a0
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down into OPAL.

All the components required for inter-process communication are currently deeply integrated into the OMPI layer. Several groups/institutions have expressed interest in a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, accessible to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTLs directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes.

UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with a few exceptions (mainly BTLs that I have no way of compiling/testing). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
#
# Copyright (c) 2006-2013 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2006-2011 Mellanox Technologies. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#

# This is the default NIC/HCA parameters file for Open MPI's OpenIB
# BTL. If NIC/HCA vendors wish to add their respective values into
# this file (that is distributed with Open MPI), please contact the
# Open MPI development team. See http://www.open-mpi.org/ for
# details.

# This file is in the "ini" style, meaning that it has sections
# identified by section names enclosed in square brackets (e.g.,
# "[Section name]") followed by "key = value" pairs indicating values
# for a specific NIC/HCA vendor and model. NICs/HCAs are identified
# by their vendor ID and vendor part ID, which can be obtained by
# running the diagnostic utility command "ibv_devinfo". The fields
# "vendor_id" and "vendor_part_id" are the vendor ID and vendor part
# ID, respectively.

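# For reference, the relevant lines of "ibv_devinfo" output look roughly
# like the abridged, illustrative excerpt below (exact formatting and
# values vary by device and driver stack):
#
#   hca_id: mlx4_0
#           ...
#           vendor_id:          0x02c9
#           vendor_part_id:     4099
#           ...
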
# The sections in this file only accept a few fields:

# vendor_id: a comma-delimited list of integers of NIC/HCA vendor IDs,
# expressed either in decimal or hexadecimal (e.g., "13" or "0xd").
# Individual values can be taken directly from the output of
# "ibv_devinfo". NIC/HCA vendor ID's correspond to IEEE OUI's, for
# which you can find the canonical list here:
# http://standards.ieee.org/regauth/oui/. Example:
#
# vendor_id = 0x05ad
#
# Note: Several vendors resell Mellanox hardware and put their own firmware
# on the cards, therefore overriding the default Mellanox vendor ID.
#
# Mellanox    0x02c9
# Cisco       0x05ad
# Silverstorm 0x066a
# Voltaire    0x08f1
# HP          0x1708
# Sun         0x03ba
# Bull        0x119f

# vendor_part_id: a comma-delimited list of integers of different
# NIC/HCA models from a single vendor, expressed in either decimal or
# hexadecimal (e.g., "13" or "0xd"). Individual values can be
# obtained from the output of "ibv_devinfo". Example:
#
# vendor_part_id = 25208,25218

# mtu: an integer indicating the maximum transmission unit (MTU) to be
# used with this NIC/HCA. The effective MTU will be the minimum of a
# NIC's/HCA's MTU value and its peer NIC's/HCA's MTU value. Valid
# values are 256, 512, 1024, 2048, and 4096. Example:
#
# mtu = 1024

# use_eager_rdma: an integer indicating whether RDMA should be used
# for eager messages. 0 values indicate "no" (false); non-zero values
# indicate "yes" (true). This flag should only be enabled for
# NICs/HCAs that can provide guarantees about ordering of data in
# memory -- that the last byte of an incoming RDMA write will always
# be written last. Certain cards cannot provide this guarantee, while
# others can.

# use_eager_rdma = 1

# receive_queues: a list of "bucket shared receive queues" (BSRQ) that
# are opened between MPI process peer pairs for point-to-point
# communications of messages shorter than the total length required
# for RDMA transfer. The use of multiple RQs, each with differently
# sized posted receive buffers, can allow [much] better registered
# memory utilization -- MPI messages are sent on the QP with the
# smallest buffer size that will fit the message. Note that flow
# control messages are always sent across the QP with the smallest
# buffer size. Also note that the buffers *must* be listed in
# increasing buffer size. This parameter matches the
# mca_btl_openib_receive_queues MCA parameter; see the ompi_info help
# message and FAQ for a description of its values. BSRQ
# specifications are found in this precedence:

# highest: specifying the mca_btl_openib_receive_queues MCA param
# next:    finding a value in this file
# lowest:  using the default mca_btl_openib_receive_queues MCA param value

# receive_queues = P,128,256,192,128:S,65536,256,192,128

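# As a rough, illustrative sketch only (the ompi_info help message and FAQ
# are authoritative): each colon-separated specification begins with a
# letter -- "P" for a per-peer queue pair, "S" for a shared receive queue --
# and the first two numbers that follow are the receive buffer size in
# bytes and the number of buffers posted. A hypothetical three-queue
# layout, listed in increasing buffer size, might look like:
#
# receive_queues = P,128,256,192,128:P,4096,256,192,128:S,65536,256,192,128
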
# max_inline_data: an integer specifying the maximum inline data (in
# bytes) supported by the device. -1 means to use a run-time probe to
# figure out the maximum value supported by the device.

# max_inline_data = 1024

# rdmacm_reject_causes_connect_error: a boolean indicating that, when
# an RDMA CM REJECT is issued on the device, the peer might get a
# CONNECT_ERROR event instead of the expected REJECT event back.
# Open MPI uses RDMA CM REJECT messages in its normal wireup
# procedure; some connections are *expected* to be rejected. However,
# with some older drivers, if process A issues a REJECT, process B
# will receive a CONNECT_ERROR event instead of a REJECT event. So if
# this flag is set to true and we receive a CONNECT_ERROR event on a
# connection where we are expecting a REJECT, we treat the
# CONNECT_ERROR exactly as we would have treated the REJECT. Setting
# this flag to true allows Open MPI to work around the behavior
# described above. It is [mostly] safe to set this flag to true even
# after a driver has been fixed; the scope of where this flag is used
# is small enough that it *shouldn't* mask real CONNECT_ERROR events.

# rdmacm_reject_causes_connect_error = 1

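# Putting the fields above together, a complete entry has the following
# shape. This is an illustrative sketch only -- the section name, vendor
# ID, and part IDs below are placeholders, not a real device:
#
# [Some Vendor Example HCA]
# vendor_id = 0x1234
# vendor_part_id = 42,43
# use_eager_rdma = 1
# mtu = 2048
# max_inline_data = 128
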
############################################################################

[default]
# These are the default values, identified by the vendor and part ID
# numbers of 0 and 0. If the queried NIC/HCA does not return vendor and
# part ID numbers that match any of the sections in this file, the
# values in this section are used. Vendor IDs and part IDs can be hex
# or decimal.
vendor_id = 0
vendor_part_id = 0
use_eager_rdma = 0
mtu = 1024
max_inline_data = 128

############################################################################

[Mellanox Tavor Infinihost]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
vendor_part_id = 23108
use_eager_rdma = 1
mtu = 1024
max_inline_data = 128

############################################################################

[Mellanox Arbel InfiniHost III MemFree/Tavor]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
vendor_part_id = 25208,25218
use_eager_rdma = 1
mtu = 1024
max_inline_data = 128

############################################################################

[Mellanox Sinai Infinihost III]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
vendor_part_id = 25204,24204
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128

############################################################################

# A.k.a. ConnectX
[Mellanox Hermon]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 25408,25418,25428,25448,26418,26428,26438,26448,26468,26478,26488,4099,4103
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128

############################################################################

[Mellanox ConnectIB]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 4113
use_eager_rdma = 1
mtu = 4096
max_inline_data = 256

############################################################################

[Mellanox ConnectX4]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 4115
use_eager_rdma = 1
mtu = 4096
max_inline_data = 256

############################################################################

[IBM eHCA 4x and 12x]
vendor_id = 0x5076
vendor_part_id = 0
use_eager_rdma = 1
mtu = 2048
receive_queues = P,128,256,192,128:P,65536,256,192,128
max_inline_data = 0

############################################################################

[IBM eHCA-2 4x and 12x]
vendor_id = 0x5076
vendor_part_id = 1
use_eager_rdma = 1
mtu = 4096
receive_queues = P,128,256,192,128:P,65536,256,192,128
max_inline_data = 0

############################################################################

# See http://lists.openfabrics.org/pipermail/general/2008-June/051920.html
# 0x1fc1 and 0x1077 are PCI ID's; at least one of QL's OUIs is 0x1175

[QLogic InfiniPath 1]
vendor_id = 0x1fc1,0x1077,0x1175
vendor_part_id = 13
use_eager_rdma = 1
mtu = 2048
max_inline_data = 0

[QLogic InfiniPath 2]
vendor_id = 0x1fc1,0x1077,0x1175
vendor_part_id = 16,29216
use_eager_rdma = 1
mtu = 4096
max_inline_data = 0

[QLogic InfiniPath 3]
vendor_id = 0x1fc1,0x1077,0x1175
vendor_part_id = 16,29474
use_eager_rdma = 1
mtu = 4096
max_inline_data = 0

############################################################################

# Chelsio's OUI is 0x0743. 0x1425 is the PCI ID.

[Chelsio T3]
vendor_id = 0x1425
vendor_part_id = 0x0020,0x0021,0x0022,0x0023,0x0024,0x0025,0x0026,0x0030,0x0031,0x0032,0x0035,0x0036
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,256,192,128
max_inline_data = 64

[Chelsio T4]
vendor_id = 0x1425
vendor_part_id = 0xa000,0x4400,0x4401,0x4402,0x4403,0x4404,0x4405,0x4406,0x4407,0x4408,0x4409,0x440a,0x440b,0x440c,0x440d,0x440e,0x4480,0x4481
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,64
max_inline_data = 280

[Chelsio T5]
vendor_id = 0x1425
vendor_part_id = 0xb000,0xb001,0x5400,0x5401,0x5402,0x5403,0x5404,0x5405,0x5406,0x5407,0x5408,0x5409,0x540a,0x540b,0x540c,0x540d,0x540e,0x540f,0x5410,0x5411,0x5412,0x5413
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,64
max_inline_data = 280

############################################################################

# I'm *assuming* that 0x4040 is the PCI ID...

[NetXen]
vendor_id = 0x4040
vendor_part_id = 0x0001,0x0002,0x0003,0x0004,0x0005,0x0024,0x0025,0x0100
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,248,192,128
max_inline_data = 64

############################################################################

# NetEffect's OUI is 0x1255. 0x1678 is the PCI ID. ...but then
# NetEffect was bought by Intel. Intel's OUI is 0x1b21.

[NetEffect/Intel NE020]
vendor_id = 0x1678,0x1255,0x1b21
vendor_part_id = 0x0100,0x0110
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,256,192,128
max_inline_data = 64

############################################################################

# Intel has several OUI's, including 0x8086. Amusing. :-) Intel has
# advised us (June, 2013) to ignore the Intel Phi OpenFabrics
# device... at least for now.

[Intel Xeon Phi]
vendor_id = 0x8086
vendor_part_id = 0
ignore_device = 1
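# Note: "ignore_device" is not described in the field documentation at the
# top of this file; as used here, a value of 1 marks devices matching this
# section to be skipped entirely (per the June 2013 Intel guidance above).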