openmpi/orte/mca/oob/tcp
Andrew Friedley 1b6231a9b5 Fix for running jobs that span multiple 's' partitions on IU BigRed.
Each 's' partition has its own TCP network.  It's fine to use this network for jobs that fit inside one partition, but the TCP OOB fails when trying to connect across two partitions, because their networks are disjoint.  Each node also has a second TCP network that connects ALL nodes together.

So the solution is to actually try all the available TCP interfaces on a node, instead of erroring when the first one fails.
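
The fallback idea can be sketched in a few lines of C. This is only an illustration, not the code from this commit: `try_all_addrs`, the `connect_fn` callback type, and the stub `connect_if_match` are hypothetical names, and the addresses are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*connect_fn)(const char *addr, void *ctx);

/*
 * Try each candidate address in order. Returns the index of the first
 * address that connects, or -1 if every attempt fails.
 */
int try_all_addrs(const char *const *addrs, size_t n,
                  connect_fn try_connect, void *ctx)
{
    assert(try_connect != NULL);
    for (size_t i = 0; i < n; i++) {
        if (try_connect(addrs[i], ctx) == 0) {
            return (int)i;
        }
        /* A failure on this interface is not fatal; move on to the next. */
    }
    return -1;
}

/* Stand-in for a real connect attempt: succeeds only for the address in ctx. */
int connect_if_match(const char *addr, void *ctx)
{
    return strcmp(addr, (const char *)ctx) == 0 ? 0 : -1;
}
```

With a peer that is reachable only on its second address, `try_all_addrs` skips the first entry and returns index 1 rather than giving up after the first failure.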

Also, the default TCP connect() timeout is far too long (5 minutes), so use our own timeout mechanism, with the timeout value exposed as an MCA parameter.
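
One common way to bound a connect() attempt is a non-blocking socket plus poll(). The sketch below shows that general technique under stated assumptions; it is not the mechanism this commit adds. The function names are hypothetical, and `timeout_ms` stands in for whatever value the MCA parameter supplies. Interrupted poll() calls (EINTR) are not retried here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 on success, -1 on failure with errno set (ETIMEDOUT on timeout). */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t len, int timeout_ms)
{
    assert(timeout_ms >= 0);

    /* Switch to non-blocking mode so connect() returns immediately. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        return -1;
    }

    int rc = connect(fd, addr, len);
    if (rc < 0 && errno == EINPROGRESS) {
        /* Wait until the socket becomes writable or the timeout expires. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            errno = ETIMEDOUT;
            rc = -1;
        } else if (rc > 0) {
            /* Writable does not mean connected; fetch the real result. */
            int err = 0;
            socklen_t elen = sizeof(err);
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) < 0) {
                rc = -1;
            } else if (err != 0) {
                errno = err;
                rc = -1;
            } else {
                rc = 0;
            }
        }
    }

    /* Restore the original flags without clobbering errno. */
    int saved = errno;
    fcntl(fd, F_SETFL, flags);
    errno = saved;
    return rc;
}

/* Convenience wrapper: returns a connected fd, or -1 on failure. */
int connect_ipv4_with_timeout(const char *ip, uint16_t port, int timeout_ms)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    if (connect_with_timeout(fd, (struct sockaddr *)&sa, sizeof(sa),
                             timeout_ms) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Because a timed-out attempt returns quickly with ETIMEDOUT, a caller can move on to the next interface instead of blocking for minutes on an unreachable one.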

This commit was SVN r11718.
2006-09-19 19:33:49 +00:00
..
configure.m4 Update the copyright notices for IU and UTK. 2005-11-05 19:57:48 +00:00
configure.params Update the copyright notices for IU and UTK. 2005-11-05 19:57:48 +00:00
Makefile.am Update the copyright notices for IU and UTK. 2005-11-05 19:57:48 +00:00
oob_tcp_addr.c With the branch to 1.2 made.... 2006-08-15 19:54:10 +00:00
oob_tcp_addr.h And ORTE is ready for prime-time. All Windows tricks are in: 2006-08-23 03:32:36 +00:00
oob_tcp_hdr.h Next step in the project split, mainly source code re-arranging 2006-02-12 01:33:29 +00:00
oob_tcp_msg.c Cleanup a historical naming convention problem. Move the socket_errno definitions to the OPAL layer and change the name accordingly. This cleans up some interrelationship issues as well as removing a name confusion. 2006-08-14 20:14:44 +00:00
oob_tcp_msg.h * use opal_free_list_item_t as the type of items stored in an opal_free_list_t, 2006-07-17 21:51:50 +00:00
oob_tcp_peer.c Fix for running jobs that span multiple 's' partitions on IU BigRed. 2006-09-19 19:33:49 +00:00
oob_tcp_peer.h And ORTE is ready for prime-time. All Windows tricks are in: 2006-08-23 03:32:36 +00:00
oob_tcp_ping.c And ORTE is ready for prime-time. All Windows tricks are in: 2006-08-23 03:32:36 +00:00
oob_tcp_recv.c * use opal_free_list_item_t as the type of items stored in an opal_free_list_t, 2006-07-17 21:51:50 +00:00
oob_tcp_send.c Next step in the project split, mainly source code re-arranging 2006-02-12 01:33:29 +00:00
oob_tcp.c Fix for running jobs that span multiple 's' partitions on IU BigRed. 2006-09-19 19:33:49 +00:00
oob_tcp.h Fix for running jobs that span multiple 's' partitions on IU BigRed. 2006-09-19 19:33:49 +00:00