1
1
openmpi/opal/mca/btl/openib/connect/base.h

106 строки
2.5 KiB
C
Исходник Обычный вид История

/*
* Copyright (c) 2007-2008 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2013 Mellanox Technologies, Inc.
* All rights reserved.
*
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#ifndef BTL_OPENIB_CONNECT_BASE_H
#define BTL_OPENIB_CONNECT_BASE_H
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
#include "opal/mca/btl/openib/connect/connect.h"
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
#ifdef OPAL_HAVE_RDMAOE
#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl) \
(((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) || \
(IBV_LINK_LAYER_ETHERNET == ((btl)->ib_port_attr.link_layer))) ? \
true : false)
#else
#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl) \
((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) ? \
true : false)
#endif
BEGIN_C_DECLS
Fixes trac:1210, #1319 Commit from a long-standing Mercurial tree that ended up incorporating a lot of things: * A few fixes for CPC interface changes in all the CPCs * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC) * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag) * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. * CPCs allocate a CTS buffer only if they're about to make a connection * RDMA CM improvements: * Use threaded mode openib fd monitoring to wait for for RDMA CM events * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type * Renamed many variables to be internally consistent * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) * Use rdma_create_qp() instead of ibv_create_qp() * openib fd monitoring improvements: * Renamed a bunch of functions and variables to be a little more obvious as to their true function * Use pipes to communicate between main thread and service thread * Add ability for main thread to invoke a function back on the service thread * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3)) * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout * Other improvements: * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp") * Somewhat improved error handling * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements * Oodles of little coding style fixes * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction * Added some more show_help error messages * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) This commit was SVN r19686. The following Trac tickets were found above: Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 04:46:02 +04:00
/*
* Forward declaration to resolve circular dependency
*/
struct mca_btl_base_endpoint_t;
/*
* Open function
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
int opal_btl_openib_connect_base_register(void);
/*
* Component-wide CPC init
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
int opal_btl_openib_connect_base_init(void);
/*
* Query CPCs to see if they want to run on a specific module
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
int opal_btl_openib_connect_base_select_for_local_port
(mca_btl_openib_module_t *btl);
/*
* Forward reference to avoid an include file loop
*/
struct mca_btl_openib_proc_modex_t;
/*
* Select function
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
int opal_btl_openib_connect_base_find_match
(mca_btl_openib_module_t *btl,
struct mca_btl_openib_proc_modex_t *peer_port,
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
opal_btl_openib_connect_base_module_t **local_cpc,
opal_btl_openib_connect_base_module_data_t **remote_cpc_data);
/*
* Find a CPC's index so that we can send it in the modex
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
int opal_btl_openib_connect_base_get_cpc_index
(opal_btl_openib_connect_base_component_t *cpc);
/*
* Lookup a CPC by its index (received from the modex)
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
opal_btl_openib_connect_base_component_t *
opal_btl_openib_connect_base_get_cpc_byindex(uint8_t index);
/*
Fixes trac:1210, #1319 Commit from a long-standing Mercurial tree that ended up incorporating a lot of things: * A few fixes for CPC interface changes in all the CPCs * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC) * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag) * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. * CPCs allocate a CTS buffer only if they're about to make a connection * RDMA CM improvements: * Use threaded mode openib fd monitoring to wait for for RDMA CM events * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type * Renamed many variables to be internally consistent * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) * Use rdma_create_qp() instead of ibv_create_qp() * openib fd monitoring improvements: * Renamed a bunch of functions and variables to be a little more obvious as to their true function * Use pipes to communicate between main thread and service thread * Add ability for main thread to invoke a function back on the service thread * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3)) * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout * Other improvements: * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp") * Somewhat improved error handling * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements * Oodles of little coding style fixes * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction * Added some more show_help error messages * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) This commit was SVN r19686. The following Trac tickets were found above: Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 04:46:02 +04:00
* Allocate a CTS frag
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
int opal_btl_openib_connect_base_alloc_cts(
Fixes trac:1210, #1319 Commit from a long-standing Mercurial tree that ended up incorporating a lot of things: * A few fixes for CPC interface changes in all the CPCs * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC) * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag) * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. * CPCs allocate a CTS buffer only if they're about to make a connection * RDMA CM improvements: * Use threaded mode openib fd monitoring to wait for for RDMA CM events * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type * Renamed many variables to be internally consistent * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) * Use rdma_create_qp() instead of ibv_create_qp() * openib fd monitoring improvements: * Renamed a bunch of functions and variables to be a little more obvious as to their true function * Use pipes to communicate between main thread and service thread * Add ability for main thread to invoke a function back on the service thread * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3)) * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout * Other improvements: * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp") * Somewhat improved error handling * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements * Oodles of little coding style fixes * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction * Added some more show_help error messages * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) This commit was SVN r19686. The following Trac tickets were found above: Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 04:46:02 +04:00
struct mca_btl_base_endpoint_t *endpoint);
/*
Fixes trac:1210, #1319 Commit from a long-standing Mercurial tree that ended up incorporating a lot of things: * A few fixes for CPC interface changes in all the CPCs * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC) * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag) * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. * CPCs allocate a CTS buffer only if they're about to make a connection * RDMA CM improvements: * Use threaded mode openib fd monitoring to wait for for RDMA CM events * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type * Renamed many variables to be internally consistent * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) * Use rdma_create_qp() instead of ibv_create_qp() * openib fd monitoring improvements: * Renamed a bunch of functions and variables to be a little more obvious as to their true function * Use pipes to communicate between main thread and service thread * Add ability for main thread to invoke a function back on the service thread * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3)) * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout * Other improvements: * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp") * Somewhat improved error handling * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements * Oodles of little coding style fixes * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction * Added some more show_help error messages * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) This commit was SVN r19686. The following Trac tickets were found above: Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 04:46:02 +04:00
* Free a CTS frag
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
int opal_btl_openib_connect_base_free_cts(
Fixes trac:1210, #1319 Commit from a long-standing Mercurial tree that ended up incorporating a lot of things: * A few fixes for CPC interface changes in all the CPCs * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC) * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag) * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. * CPCs allocate a CTS buffer only if they're about to make a connection * RDMA CM improvements: * Use threaded mode openib fd monitoring to wait for for RDMA CM events * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type * Renamed many variables to be internally consistent * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) * Use rdma_create_qp() instead of ibv_create_qp() * openib fd monitoring improvements: * Renamed a bunch of functions and variables to be a little more obvious as to their true function * Use pipes to communicate between main thread and service thread * Add ability for main thread to invoke a function back on the service thread * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3)) * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout * Other improvements: * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp") * Somewhat improved error handling * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements * Oodles of little coding style fixes * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction * Added some more show_help error messages * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) This commit was SVN r19686. The following Trac tickets were found above: Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 04:46:02 +04:00
struct mca_btl_base_endpoint_t *endpoint);
/*
* Start a new connection to an endpoint
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
int opal_btl_openib_connect_base_start(
opal_btl_openib_connect_base_module_t *cpc,
Fixes trac:1210, #1319 Commit from a long-standing Mercurial tree that ended up incorporating a lot of things: * A few fixes for CPC interface changes in all the CPCs * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC) * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag) * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. * CPCs allocate a CTS buffer only if they're about to make a connection * RDMA CM improvements: * Use threaded mode openib fd monitoring to wait for for RDMA CM events * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type * Renamed many variables to be internally consistent * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) * Use rdma_create_qp() instead of ibv_create_qp() * openib fd monitoring improvements: * Renamed a bunch of functions and variables to be a little more obvious as to their true function * Use pipes to communicate between main thread and service thread * Add ability for main thread to invoke a function back on the service thread * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3)) * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout * Other improvements: * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp") * Somewhat improved error handling * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements * Oodles of little coding style fixes * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction * Added some more show_help error messages * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) This commit was SVN r19686. The following Trac tickets were found above: Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 04:46:02 +04:00
struct mca_btl_base_endpoint_t *endpoint);
/*
* Component-wide CPC finalize
*/
George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.
2014-07-26 04:47:28 +04:00
void opal_btl_openib_connect_base_finalize(void);
END_C_DECLS
#endif