* btl sendi(): if message can be send inline try to avoid signal
* signal is requested one per 64 or when
there are no send wqes
when message can not be send inline
any other btl method then sendi()
This commit was SVN r27724.
pml/v:
- If vprotocol is not being used vprotocol_include_list is leaked. Assume vprotocol never takes ownership (see below) and always free the string.
coll/ml:
- (patch verified) calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
- Need to free mca_coll_ml_component.config_file_name on component close.
btl/openib:
- calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
vprotocol/base:
- There was no way for pml/v to determine if vprotocol took ownership of vprotocol_include_list. Fix by always never ownership (use strdup).
mca/base:
- param_lookup will result in storage->stringval to be a newly allocated string if the mca parameter has a string value. ensure this string is always freed.
cmr:v1.7
This commit was SVN r27569.
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occuring before the command line parser changes but it appears to be resolved now.
This commit was SVN r27527.
The following SVN revision numbers were found above:
r27451 --> open-mpi/ompi@d59034e6ef
r27456 --> open-mpi/ompi@ecdbf34937
* Add OMPI_COMMON_VERBS_FLAGS_NOT_RC, which looks for a device that
does ''not'' support RC
* Add ompi_common_verbs_find_max_inline(), and remove that code from
the openib BTL component
This commit was SVN r27393.
* Moved "check basics" sanity check from openib BTL to common/verbs
(which also allows us to have openib ''not'' include
<infiniband/driver.h>, which is a Very Good Thing)
* Add new ompi_common_verbs_qp_test() function, which tests to see
whether a device supports RC and/or UD QPs. The openib BTL now
uses this function to ensure that the device supports RC QPs.
* Rename ompi_common_verbs_find_ibv_ports() to be
ompi_common_verbs_find_ports() -- the "ibv" was redundant.
* Re-work ompi_common_verbs_find_ports() to use
ompi_common_verbs_qp_test() instead of testing for RC/UD QPs itself
* Add bunches of opal_output_verbose() to the find_ports() routine
(to help diagnosing connectivity problems -- imaging running with
--mca btl_base_verbose 10; you'll see all the find_ports() test
results)
* Make ompi_common_verbs_qp_test() warn if devices/ports are supplied
in the if_include/if_exclude strings that do not exists (quite
similar to what the openib BTL does today).
* Add ompi_common_verbs_mca_register() function, which registers
common verbs MCA params. It will also register MCA param synonyms
for thse MCA params to upper-level components (e.g.,
btl_<upper-level-component>_<the-mca-param>).
* common_verbs_warn_nonexistent_if: warn if
if_include/if_exclude-specified devices or ports do not exist.
This commit was SVN r27332.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't). If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!". Kaboom.
The only problem is that it didn't give you any indication of where
this value was being set. Quite maddening, from a user perspective.
So we changed the ompi_info handles this case. If any framework open
function return OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions return OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received and error, and then dump out the parameters
that it has received so far in the framework that had a problem.
At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).
We updated ompi_info to check for O???_ERR_BAD_PARAM from each from
the framework opens. Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior. And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes. Why? I think we used them in exactly one
place in the code base (mca_base_components_open.c). So we deleted
those and just used the normal OPAL_* codes instead.
While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).
This commit was SVN r27306.
The following Trac tickets were found above:
Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
some new common OpenFabrics functionality to ompi/mca/common/verbs.
Also move everything that was in ompi/mca/common/ofautils under
ompi/mca/common/verbs.
* Move ofautils -> verbs
* Add new functionality in ompi/mca/common/verbs (see doxygen
* comments in ompi/mca/common/verbs/common_verbs.h for details):
* ompi_common_verbs_find_ibv_ports()
* ompi_common_verbs_port_bw()
* ompi_common_verbs_mtu()
* '''If you're writing verbs-based code, you should be using this
common functionality'''
* Adapt openib BTL to use some trivial common functionality in
common/verbs
* Don't use "#ifdef OMPI_HAVE_RDMAOE",use
"#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)"
* Update the following to include/link against common/verbs
* bcol/iboffload
* sbgp/ibnet
* btl/openib
This commit was SVN r27212.
that causes MPI jobs to abort if there is not enough registered memory
available (vs. just warning).
This commit was SVN r27140.
The following Trac tickets were found above:
Ticket 3258 --> https://svn.open-mpi.org/trac/ompi/ticket/3258
- OMPI_SUCCESS
- OMPI_ERROR
- OMPI_ERR_RESOURCE_BUSY
If an "OMPI_ERR_OUT_OF_RESOURCE" occurs, the request is added to the pending list, and will be handled later. An error message
should not be printed to the user in this case. This is not an error, but rather a notification of a possible valid condition.
Only in the case of "OMPI_ERROR" should it be printed to the user.
This commit was SVN r27065.
btl_openib_connect_udcm when notifying not to listen to an fd to ensure
that the main thread does not continue until the service thread has
processed the message
Adds ability to send message to openib async thread to tell it to
ignore the ERR state on a specific QP. Adds this call to udcm_module_finalize
so when we set the error state on the QP it doesn't cause the
openib async thread to abort the mpi program prematurely
Fixes trac:3161
This commit was SVN r27064.
The following Trac tickets were found above:
Ticket 3161 --> https://svn.open-mpi.org/trac/ompi/ticket/3161
receive queues value) so that we don't break the use of RDMA CM, and
therefore break RoCE.
This commit was SVN r27017.
The following SVN revision numbers were found above:
r26869 --> open-mpi/ompi@fe0e7f81df
- For now we'll use 8192 as a base value
- We leave the adjust_cq() as is
- For the long term we can work on an appropriate setting to expose through the INI file.
8K CQEs are 512K per process, which is 8MB for ppn=16
This commit was SVN r26877.
ibv_get_device_list_compat() and not finding it, I finally realized
that it was a function in OMPI. So let's name it with a proper ompi_
prefix, not an ibv_ prefix.
This commit was SVN r26867.
1000. Refs trac:3154.
IB/iWarp vendors need to get together to figure out a real fix.
This commit was SVN r26777.
The following SVN revision numbers were found above:
r26730 --> open-mpi/ompi@5315c91baf
The following Trac tickets were found above:
Ticket 3154 --> https://svn.open-mpi.org/trac/ompi/ticket/3154
* If the MCA param btl_openib_cq_size is set to 0 (which is the
default), use the device CQ max size. Otherwise, use the MCA param
value (and never adjust it again).
* Remove the CQ size adjustment code. Since we default to max CQ
size, there really isn't much point in having it any more. I think
people setting an absolute CQ size is going to be rare, so let's
not do anything fancy with it.
* If the MCA param value is larger than what the device supports,
print a warning (only once per process) and default to using the
device max
* Add a BTL_VERBOSE displaying which CQ size we used
This commit was SVN r26730.
The following Trac tickets were found above:
Ticket 3152 --> https://svn.open-mpi.org/trac/ompi/ticket/3152
hook early in the setup, but ''not'' during the component register
function. And then properly unset it if was set.
This commit was SVN r26697.
The following Trac tickets were found above:
Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130