When an error is returned by the socket operations, trigger the
appropriate error path in the PML to give an opportunity for
rerouting/error handling.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Their is racing condition in TCP connection establishment
during simultaneous handshake. This PR handles the fix for
it.
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
This commit has two changes
1. Adding magic string during handshake can cause
issue when used with older version of MPI. Hence set
RCVTIMEO paramter to 2 second
2. Using single call during handshake instead of
two calls
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
Moving non-blocking send/receive function to btl_tcp
will help reusing these function where ever needed.
In this case we plan to reuse receive function to
retrive magic string to validate established connection
is from mpi process.
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
Based on an idea from Brian move the libevent trigger update to a later
stage instead of the generic add/del procs. So, we are doing the
increment/decrement when we register the recv handler for an endpoint,
so basically when we create and connect a socket to a peer. The benefit
is that as long as TCP is not used, there should be no impact on the
performance of other BTLs. The drawback is that the first TCP connection
will be slightly slower, but then once we have a peer connected over
TCP things go back to normal.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
This commit makes two changes to the tcp btl:
- If a tcp proc does not exist when handling a new connection create
a new proc and use it. The current implementation uses the
opal_proc_by_name() function to get the opal_proc_t then calls
add_procs on all btl modules. It may be sufficient to just call
add_procs until an endpoint is created so this may change somewhat.
- In add_procs add a check for an existing endpoint before creating
one.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
structure
This structure member was originally used to specify the remote segment
for an RDMA operation. Since the new btl interface no longer uses
desriptors for RDMA this member no longer has a purpose. In addition
to removing these members the local segment information has been
renamed to des_segments/des_segment_count.
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL
All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.
This commit was SVN r32317.