1b6231a9b5
Each 's' partition has its own TCP network. It is fine to use this network for jobs that fit inside one partition, but the TCP OOB fails when trying to connect across two partitions, because the two networks are disjoint. Each node also has another TCP network connecting ALL nodes together. So the solution is to try all the available TCP interfaces on a node instead of erroring out when the first one fails. Also, the default TCP connect() timeout is far too long (5 minutes); use our own timeout mechanism instead, with the timeout value exposed as an MCA parameter.

This commit was SVN r11718.
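A minimal sketch of the two ideas in the message above, written with plain POSIX sockets rather than the actual OOB code: each connect() is made non-blocking and bounded by our own timeout via poll(), and every known address of the peer (assumed to be IPv4, one per interface, already collected) is tried in turn instead of giving up after the first failure. The names `connect_with_timeout` and `connect_any` are hypothetical, and `timeout_ms` only stands in for the value the commit reads from an MCA parameter, whose name is not given here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: one non-blocking connect() bounded by timeout_ms
 * instead of the kernel's much longer default. Returns a connected,
 * blocking fd, or -1 on error or timeout (EINTR is not retried here). */
static int connect_with_timeout(const struct sockaddr_in *addr, int timeout_ms)
{
    struct pollfd pfd;
    int err = 0, flags, rc;
    socklen_t len = sizeof err;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0)
        goto done;                 /* connected immediately */
    if (errno != EINPROGRESS)
        goto fail;                 /* immediate failure, e.g. ECONNREFUSED */

    /* Wait until the socket becomes writable or our timeout expires. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        goto fail;                 /* 0 = timed out, < 0 = poll error */

    /* Writable does not mean success: fetch the deferred connect() result. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        goto fail;

done:
    if (fcntl(fd, F_SETFL, flags) < 0)   /* restore blocking mode */
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}

/* Hypothetical helper: try each of the peer's addresses in order and
 * return the first fd that connects, rather than failing on the first. */
static int connect_any(const struct sockaddr_in *addrs, size_t n, int timeout_ms)
{
    assert(timeout_ms >= 0);
    for (size_t i = 0; i < n; ++i) {
        int fd = connect_with_timeout(&addrs[i], timeout_ms);
        if (fd >= 0)
            return fd;
    }
    return -1;
}
```

Trying addresses one after another keeps the logic simple; the trade-off is that each unreachable interface can delay the connection by up to `timeout_ms` before the next address is attempted, which is why a short, tunable timeout matters here.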

configure.m4
configure.params
Makefile.am
oob_tcp_addr.c
oob_tcp_addr.h
oob_tcp_hdr.h
oob_tcp_msg.c
oob_tcp_msg.h
oob_tcp_peer.c
oob_tcp_peer.h
oob_tcp_ping.c
oob_tcp_recv.c
oob_tcp_send.c
oob_tcp.c
oob_tcp.h