1
1

9 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
56bdcd0888 btl/openib: fix XRC WQE calculation
Before dynamic add_procs support was committed to master we called
add_procs with every proc in the job. The XRC code in the openib btl
was taking advantage of this and setting the number of work queue
entries (WQE) based on all the procs on a remote node. Since that is
no longer the case we can not simply increment the sd_wqe field on the
queue pair. To fix the issue a new field has been added to the xrc
queue pair structure to keep track of how many wqes there are total on
the queue pair. If a new endpoint is added that increases the number
of wqes and the xrc queue pair is already connected the code will
attempt to modify the number of wqes on the queue pair. A failure is
ignored because all that will happen is the number of active send work
requests on an XRC queue pair will be more limited.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-26 15:58:31 -06:00
Nathan Hjelm
371df45bf8 btl/openib: fix locking bugs with XRC ib_addr lock
This bug fixes two issue with the ib_addr lock:

 - The ib_addr lock must always be obtained regardless of
   opal_using_threads() as the CPC is run in a seperate thread.

 - The ib_addr lock is held in mca_btl_openib_endpoint_connected when
   calling back into the CPC start_connect on any pending
   connections. This will attempt to obtain the ib_addr lock
   again. Since this is not a performance-critical part of the code
   the lock has been changed to be recursive.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-18 15:55:34 -07:00
Gilles Gouaillardet
d02ccd67de btl/openib: remove OFED version runtime check when XRC is used
this test seems broken :
 - some false positive were reported
 - it fails to detect some OFED version mismatch
this commit simply removes this test, which means the application
will likely fail if XRC is used ad OFED version is different
between compile time and runtime
2015-08-14 09:10:03 +09:00
Gilles Gouaillardet
135ecce0eb btl/openib: rename OPAL_HAVE_XRCD macro into OPAL_HAVE_CONNECTX_XRC_DOMAINS 2015-01-07 13:27:25 +09:00
Gilles Gouaillardet
b3617e736e btl/openib: add XRC support with OFED 3.12+
based on an original patch contributed by Bull.
2015-01-06 15:30:52 +09:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Rolf vandeVaart
e39c673dd2 Fix typo.
This commit was SVN r32339.
2014-07-29 13:21:24 +00:00
George Bosilca
f217661ee0 Use opal_process_info whenever possible. Some other minor cleanups.
This commit was SVN r32325.
2014-07-26 21:48:23 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00