# -*- text -*-
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
#                         University Research and Technology
#                         Corporation.  All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
#                         of Tennessee Research Foundation.  All rights
#                         reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
#                         University of Stuttgart.  All rights reserved.
# Copyright (c) 2004-2006 The Regents of the University of California.
#                         All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English general help file for Open MPI.
#
[btl_mvapi:retry-exceeded]
The retry count is a down counter initialized when the QP (queue pair)
is created.  It is defined in the InfiniBand Spec 1.2 (12.7.38):

    The total number of times that the sender wishes the receiver to
    retry timeout, packet sequence, etc. errors before posting a
    completion error.

Note that two MCA parameters are involved here:

btl_mvapi_ib_retry_count - The number of times the sender will attempt
to retry (defaults to 7, the maximum value).

btl_mvapi_ib_timeout - The local ack timeout parameter (defaults to
10).  The actual timeout value used is calculated as:

    4.096 microseconds * 2^btl_mvapi_ib_timeout

See InfiniBand Spec 1.2 (12.7.34) for more details.

What to do next:

Note the hosts on which this error has occurred.  It has been observed
that rebooting a particular host, or removing it from the job, can
resolve this issue.  If you can identify a specific cause or have
additional troubleshooting information, please report it to
devel@open-mpi.org.
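
As a worked example of the timeout formula given above for
btl_mvapi_ib_timeout, the following Python sketch evaluates
4.096 microseconds * 2^btl_mvapi_ib_timeout for a few exponents.  The
function name local_ack_timeout_us is hypothetical and is not part of
Open MPI.  The sketch only evaluates the arithmetic and does not model
any special or out-of-range values that the InfiniBand specification
may define.

```python
# Hypothetical helper (not part of Open MPI): evaluates the formula
# quoted in the help text, 4.096 microseconds * 2^btl_mvapi_ib_timeout.

BASE_US = 4.096  # base interval in microseconds, taken from the help text


def local_ack_timeout_us(ib_timeout: int) -> float:
    """Return the local ack timeout in microseconds for a given exponent."""
    if ib_timeout < 0:
        raise ValueError("ib_timeout must be a non-negative integer")
    return BASE_US * (2 ** ib_timeout)


if __name__ == "__main__":
    # 10 is the default named in the help text; 14 and 20 are arbitrary
    # larger values chosen only to show how quickly the timeout grows.
    for exponent in (10, 14, 20):
        microseconds = local_ack_timeout_us(exponent)
        print(f"btl_mvapi_ib_timeout={exponent}: {microseconds / 1000:.3f} ms")
```

With the default of 10, the formula gives 4194.304 microseconds, or
roughly 4.2 milliseconds.  Each increment of the parameter doubles the
timeout.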