
btl_usnic_mca.c: Increase default connectivity checker frequency

In abusive MPI communication patterns, sending a UDP ping only once a
second may not be sufficient -- all the UDP pings may be dropped.  So
increase the frequency of the pings to every quarter second, and allow
more total pings to be sent.

Total timeout time is still the same (10 seconds) -- we'll just now
try 40 times (i.e., once every quarter second) as opposed to 10 times
(i.e., once a second).  Testing has shown that this frequency allows
the connectivity checker to always succeed even in the many-to-one
abusive communication patterns.
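
The arithmetic above can be checked with a small sketch. The helper names below are hypothetical and are not part of the usNIC BTL. The sketch assumes the worst-case wait is simply the per-ping ACK timeout multiplied by the number of retries, as the message describes.

```c
/* Hypothetical helper, not taken from btl_usnic: worst-case time (in
 * milliseconds) before the connectivity checker gives up, assuming one
 * ping is sent per ACK timeout interval. */
static int total_timeout_ms(int ack_timeout_ms, int num_retries)
{
    return ack_timeout_ms * num_retries;
}

/* Hypothetical helper: number of pings sent per second for a given
 * ACK timeout. Integer division; assumes ack_timeout_ms > 0. */
static int pings_per_second(int ack_timeout_ms)
{
    return 1000 / ack_timeout_ms;
}
```

With the old defaults (1000 ms, 10 retries) and the new defaults (250 ms, 40 retries), `total_timeout_ms` yields 10000 ms in both cases, while `pings_per_second` goes from 1 to 4.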

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r31602.
This commit is contained in:
Jeff Squyres 2014-05-02 11:06:18 +00:00
parent ccd33a17b8
commit 10e8ab493e


@@ -269,14 +269,14 @@ int ompi_btl_usnic_component_register(void)
&mca_btl_usnic_component.connectivity_enabled,
OPAL_INFO_LVL_3));
-mca_btl_usnic_component.connectivity_ack_timeout = 1000;
+mca_btl_usnic_component.connectivity_ack_timeout = 250;
CHECK(reg_int("connectivity_ack_timeout",
"Timeout, in milliseconds, while waiting for an ACK while verification connectivity between usNIC devices. If 0, the connectivity check is disabled (must be >=0).",
mca_btl_usnic_component.connectivity_ack_timeout,
&mca_btl_usnic_component.connectivity_ack_timeout,
REGINT_GE_ZERO, OPAL_INFO_LVL_3));
-mca_btl_usnic_component.connectivity_num_retries = 10;
+mca_btl_usnic_component.connectivity_num_retries = 40;
CHECK(reg_int("connectivity_error_num_retries",
"Number of times to retry usNIC connectivity verification before aborting the MPI job (must be >0).",
mca_btl_usnic_component.connectivity_num_retries,