and another for dst descriptors. This provide partial solution to OB1 protocol
deadlock problem. We can limit number of RDMA descriptors (by setting
btl_openib_free_list_max to something different from -1) and if we will be
lucky to hit this limit before we fail to register more memory the protocol
will not deadlock. When we had only one list for src/dst descriptors we
deadlocked when we reached max limit for the list.
This commit was SVN r13844.
not the component. This potentially allows for a mix of HCAs that
support eager RDMA and those who do not on a port-by-port basis.
This commit was SVN r13242.
we are looking at subnet_id's and we are counting active ports per subnet.
move subnet count out of procs loop,, no need to do it there...
This commit was SVN r13105.
udapl/openib/vapi/gm mpools a deprecated. rdma mpool has parameter that allows
to limit its size mpool_rdma_rcache_size_limit (default is 0 - unlimited).
This commit was SVN r12878.
Add ability for ini files to recognize "use_eager_rdma" flag. Set the
default to "no" (because we should assume that HCAs cannot support the
property necessary for using RDMA for eager messages -- that the last
byte of the message is guaranteed to be written to memory last --
unless proven otherwise. For example, iWARP cards apparently do not
provide this guarantee), and then set all Mellanox and IBM HCAs to
override the default to enable this behavior on these cards.
This commit was SVN r12851.
The following Trac tickets were found above:
Ticket 366 --> https://svn.open-mpi.org/trac/ompi/ticket/366
r12714) for supporting compilers / architectures with different
padding rules.
This commit was SVN r12749.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r12491
r12714
mca_btl_openib_endpoint_connect_eager_rdma() is called recursively. He also
noticed that orte_pointer_array_add() can't fail because we allocate max number
of elements at init time. So just remove error handling and locking. No locking
- no deadlocks.
This commit was SVN r12388.
something is going wrong down in the code it is removed from the array. So add
mutex to prevent concurrent access to the array from different threads.
This commit was SVN r12385.
What's happening is that we're holding openib_btl->eager_rdma_lock when
we call mca_btl_openib_endpoint_send_eager_rdma() on
btl_openib_endpoint.c:1227. This in turn calls
mca_btl_openib_endpoint_send() on line 1179. Then, if the endpoint
state isn't MCA_BTL_IB_CONNECTED or MCA_BTL_IB_FAILED, we call
opal_progress(), where we eventually try to lock
openib_btl->eager_rdma_lock at btl_openib_component.c:997.
The fix removes this lock altogether. Instead we atomically set local RDMA
pointer to prevent other threads to create rdma buffer for the same endpoint.
And we increment eager_rdma_buffers_count atomically thus polling thread doesn't
need lock around it.
This commit was SVN r12369.
GIDs (there can be more than one) and not GIDs of the HCA on the network. Entry
zero always have to be initialized so we use it, and warn user if there is more
then one port active and default subnet is configured on at least one of them.
This commit was SVN r11815.
In order to provide backwards compatability the framework versions are bumped
and the handler registeration function is at the end of the btl struct.
Testing done on sm, openib, and gm..
This commit was SVN r11256.
the upperlayer assynchronously although there are some issues with this.. such
as there are multiple consumers of the btl's.. who get's the
This commit was SVN r11232.
addition to my design and testing, it was conceptually approved by
Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat
lightly] tested by Galen. We may still have to shake out some bugs
during the next few months, but it seems to be working for all the
cases that I can throw at it.
Here's a summary of the changes from that branch:
* Move MCA parameter registration to a new file (btl_openib_mca.c):
* Properly check the retun status of registering MCA params
* Check for valid values of MCA parameters
* Make help strings better
* Otherwise, the only default value of an MCA param that was
changed was max_btls; it went from 4 to -1 (meaning: use all
available)
* Properly prototyped internal functions in _component.c
* Made a bunch of functions static that didn't need to be public
* Renamed to remove "mca_" prefix from static functions
* Call new MCA param registration function
* Call new INI file read/lookup/finalize functions
* Updated a bunch of macros to be "BTL_" instead of "ORTE_"
* Be a little more consistent with return values
* Handle -1 for the max_btls MCA param
* Fixed a free() that should have been an OBJ_RELEASE()
* Some re-indenting
* Added INI-file parsing
* New flex file: btl_openib_ini.l
* New default HCA params .ini file (probably to be expanded over
time by other HCA vendors)
* Added more show_help messages for parsing problems
* Read in INI files and cache the values for later lookup
* When component opens an HCA, lookup to see if any corresponding
values were found in the INI files (ID'ed by the HCA vendor_id
and vendor_part_id)
* Added btl_openib_verbose MCA param that shows what the INI-file
stuff does (e.g., shows which MTU your HCA ends up using)
* Added btl_openib_hca_param_files as a colon-delimited list of INI
files to check for values during startup (in order,
left-to-right, just like the MCA base directory param).
* MTU is currently the only value supported in this framework.
* It is not a fatal error if we don't find params for the HCA in
the INI file(s). Instead, just print a warning. New MCA param
btl_openib_warn_no_hca_params_found can be used to disable
printing the warning.
* Add MTU to peer negotiation when making a connection
* Exchange maximum MTU; select the lesser of the two
This commit was SVN r11182.
of send/receives outstanding.
Use ibv_cq_resize if available after initial creation of completion queue if
cq_size is too small (based on number of peers).
This commit was SVN r11053.
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1. If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).
This commit was SVN r10521.
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1. If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).
This commit was SVN r10424.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
Note, we are failing the ring tests in the intel p2p test suite, but we seem
to fail the same tests under the current trunk.. will look into this further.
This commit was SVN r7823.