necessarily mean an error -- it could (and usually does) mean that the
peer realized that we both initiated a connect at the same time, and
therefore it decided to hang up.
I also added a friendly show_help error message for other cases where
recv_blocking() fails (i.e., "Something went wrong. Kaboom! Your job
will abort...").
This commit was SVN r28023.
The following Trac tickets were found above:
Ticket 3494 --> https://svn.open-mpi.org/trac/ompi/ticket/3494
flags, and mca flags are kept seperate until the very end. The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro. MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.
This commit was SVN r27950.
all pending fragments when the destination goes down. This allows the PML
to recalibrate its behavior, either find an alternate route or just give up.
This commit was SVN r27881.
using the modex or RML to share sm initialization information, have node rank 0
create a file containing initialization information in a well-known place. Then
during add_procs, the rest of the node processes requiring sm BTL initialization
will just read from that file to complete their initialization.
This commit was SVN r27789.
There was a race condition in the eager get protocol where the RDMA complete message could be received before the local completion of the SMSG message that started the eager get protocol.
cmr:v1.7
This commit was SVN r27740.
* btl sendi(): if message can be send inline try to avoid signal
* signal is requested one per 64 or when
there are no send wqes
when message can not be send inline
any other btl method then sendi()
This commit was SVN r27724.
An upcoming BTL from Cisco used ofud as a starting point, and should
probably be used as a starting point for any future UD-based BTL.
And this OFUD BTL is obviously still in history if anyone ever wants
to resurrect it.
This commit was SVN r27655.
of modules, print a BTL_ERROR and exit(1) (previous behavior was to
segv). This at least explicitly tells the developer that their BTL
component is behaving badly.
This commit was SVN r27634.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality.
This commit was SVN r27570.
pml/v:
- If vprotocol is not being used vprotocol_include_list is leaked. Assume vprotocol never takes ownership (see below) and always free the string.
coll/ml:
- (patch verified) calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
- Need to free mca_coll_ml_component.config_file_name on component close.
btl/openib:
- calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
vprotocol/base:
- There was no way for pml/v to determine if vprotocol took ownership of vprotocol_include_list. Fix by always never ownership (use strdup).
mca/base:
- param_lookup will result in storage->stringval to be a newly allocated string if the mca parameter has a string value. ensure this string is always freed.
cmr:v1.7
This commit was SVN r27569.
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occuring before the command line parser changes but it appears to be resolved now.
This commit was SVN r27527.
The following SVN revision numbers were found above:
r27451 --> open-mpi/ompi@d59034e6ef
r27456 --> open-mpi/ompi@ecdbf34937
* Add OMPI_COMMON_VERBS_FLAGS_NOT_RC, which looks for a device that
does ''not'' support RC
* Add ompi_common_verbs_find_max_inline(), and remove that code from
the openib BTL component
This commit was SVN r27393.
* Moved "check basics" sanity check from openib BTL to common/verbs
(which also allows us to have openib ''not'' include
<infiniband/driver.h>, which is a Very Good Thing)
* Add new ompi_common_verbs_qp_test() function, which tests to see
whether a device supports RC and/or UD QPs. The openib BTL now
uses this function to ensure that the device supports RC QPs.
* Rename ompi_common_verbs_find_ibv_ports() to be
ompi_common_verbs_find_ports() -- the "ibv" was redundant.
* Re-work ompi_common_verbs_find_ports() to use
ompi_common_verbs_qp_test() instead of testing for RC/UD QPs itself
* Add bunches of opal_output_verbose() to the find_ports() routine
(to help diagnosing connectivity problems -- imaging running with
--mca btl_base_verbose 10; you'll see all the find_ports() test
results)
* Make ompi_common_verbs_qp_test() warn if devices/ports are supplied
in the if_include/if_exclude strings that do not exists (quite
similar to what the openib BTL does today).
* Add ompi_common_verbs_mca_register() function, which registers
common verbs MCA params. It will also register MCA param synonyms
for thse MCA params to upper-level components (e.g.,
btl_<upper-level-component>_<the-mca-param>).
* common_verbs_warn_nonexistent_if: warn if
if_include/if_exclude-specified devices or ports do not exist.
This commit was SVN r27332.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't). If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!". Kaboom.
The only problem is that it didn't give you any indication of where
this value was being set. Quite maddening, from a user perspective.
So we changed the ompi_info handles this case. If any framework open
function return OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions return OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received and error, and then dump out the parameters
that it has received so far in the framework that had a problem.
At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).
We updated ompi_info to check for O???_ERR_BAD_PARAM from each from
the framework opens. Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior. And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes. Why? I think we used them in exactly one
place in the code base (mca_base_components_open.c). So we deleted
those and just used the normal OPAL_* codes instead.
While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).
This commit was SVN r27306.
The following Trac tickets were found above:
Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
some new common OpenFabrics functionality to ompi/mca/common/verbs.
Also move everything that was in ompi/mca/common/ofautils under
ompi/mca/common/verbs.
* Move ofautils -> verbs
* Add new functionality in ompi/mca/common/verbs (see doxygen
* comments in ompi/mca/common/verbs/common_verbs.h for details):
* ompi_common_verbs_find_ibv_ports()
* ompi_common_verbs_port_bw()
* ompi_common_verbs_mtu()
* '''If you're writing verbs-based code, you should be using this
common functionality'''
* Adapt openib BTL to use some trivial common functionality in
common/verbs
* Don't use "#ifdef OMPI_HAVE_RDMAOE",use
"#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)"
* Update the following to include/link against common/verbs
* bcol/iboffload
* sbgp/ibnet
* btl/openib
This commit was SVN r27212.
that causes MPI jobs to abort if there is not enough registered memory
available (vs. just warning).
This commit was SVN r27140.
The following Trac tickets were found above:
Ticket 3258 --> https://svn.open-mpi.org/trac/ompi/ticket/3258