1
1
openmpi/ompi/mca/mtl/ofi/help-mtl-ofi.txt
Michael Heinz 3aca4af548 REF6976 Silent failure of OMPI over OFI with large messages sizes
INTERNAL: STL-59403

The OFI (libfabric) MTL does not respect the maximum message size
parameter that OFI provides in the fi_info data.

This patch adds this missing max_msg_size field to the mca_ofi_module_t
structure and adds a length check to the low-level send routines.

Change-Id: I05aa71d332f2df897133b30c28bf37d98f061996
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
Reviewed-by: Adam Goldman <adam.goldman@intel.com>
Reviewed-by: Brendan Cunningham <brendan.cunningham@intel.com>
2019-09-23 15:23:48 -04:00

80 строки
2.4 KiB
Plaintext

# -*- text -*-
#
# Copyright (c) 2013-2018 Intel, Inc. All rights reserved
#
# Copyright (c) 2017 Cisco Systems, Inc. All rights reserved
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
[OFI call fail]
Open MPI failed an OFI Libfabric library call (%s). This is highly
unusual; your job may behave unpredictably (and/or abort) after this.
Local host: %s
Location: %s:%d
Error: %s (%zd)
#
[Not enough bits for CID]
OFI provider "%s" does not have enough free bits in its tag to fit the MPI
Communicator ID. See the mem_tag_format of the provider by running:
fi_info -v -p %s
Local host: %s
Location: %s:%d
[SEP unavailable]
Scalable Endpoint feature is enabled by the user but it is not supported by
%s provider. Try disabling this feature or use a different provider that
supports it using mtl_ofi_provider_include.
Local host: %s
Location: %s:%d
[SEP required]
Scalable Endpoint feature is required for Thread Grouping feature to work.
Please try enabling Scalable Endpoints using mtl_ofi_enable_sep.
Local host: %s
Location: %s:%d
[SEP thread grouping ctxt limit]
Reached limit (%d) for number of OFI contexts set by mtl_ofi_num_ctxts.
Please set mtl_ofi_num_ctxts to a larger value if you need more contexts.
If an MPI application creates more communicators than mtl_ofi_num_ctxts,
OFI MTL will make the new communicators re-use existing contexts in
round-robin fashion which will impact performance.
Local host: %s
Location: %s:%d
[Local ranks exceed ofi contexts]
Number of local ranks exceed the number of available OFI contexts in %s
provider and we cannot provision enough contexts for each rank. Try disabling
Scalable Endpoint feature.
Local host: %s
Location: %s:%d
[Ctxts exceeded available]
User requested for more than available contexts from provider. Limiting
to max allowed (%d). Contexts will be re used in round-robin fashion if there
are more threads than the available contexts.
Local host: %s
Location: %s:%d
[modex failed]
The OFI MTL was not able to find endpoint information for a remote
endpoint. Most likely, this means that the remote process was unable
to initialize the Libfabric NIC correctly. This error is not
recoverable and your application is likely to abort.
Local host: %s
Remote host: %s
Error: %s (%d)
[message too big]
Message size %llu bigger than supported by selected transport. Max = %llu