1. Fix ompi_info memory leak in usnic BTL: do not allocate memory in
the component register function, because ompi_info only calls the
component register function and then dlclose's the component -- it
does not call component finalize. Instead, defer parsing the MCA
param (and alloc'ing memory) until the component init function so
that any allocated memory can be freed in the component close
function.
1. Also add a new check to ensure that we actually have some part
numbers to check. Add a show_help message if we don't find any
vendor part IDs to check.
1. Add a verbose output if usnic disqualifies itself from selection
because THREAD_MULTIPLE was specified.
cmr=v1.7.5:reviewer=dgoodell
This commit was SVN r30073.
- Modifications to coll/hcoll component related to the changes in the libhcoll API.
Now, hcoll_destroy_context accepts one more parameter that indicates if the context was
really destroyed as a result of the call.
This new "non-blocking" context destruction fixes hang discovered in IMB with mcast enabled.
- Clean up all the left contexts (if any) on the comm_world destruction.
fixed by Val, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30055.
This patch changes all send/send_buffer occurrences in the C/R code
to send_nb/send_buffer_nb.
The new code compiles but does not work.
Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED
Changes from V2:
* just replace the blocking calls with the non-blocking calls
* all #ifdef's introduced in V1 are gone
* send_* returns error code or ORTE_SUCCESS (not the number of bytes)
This commit was SVN r30036.
This patch changes all recv/recv_buffer occurrences in the C/R code
to recv_nb/recv_buffer_nb.
The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED).
The new code compiles but does not work.
Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED
Changes from V2:
* only #ifdef out the code where the behaviour is changed
(used to be blocking; now non-blocking)
This commit was SVN r30035.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.
Refs trac:4003
This commit was SVN r30033.
The following Trac tickets were found above:
Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
Some NFS scenarios can result in an infinite ESTALE return, which will
hang ROMIO. This commit causes ROMIO to error out after a large number
of retries instead of spinning forever.
This is MPICH commit b250d338:
http://git.mpich.org/mpich.git/commit/b250d338e66667a8a1071a5f73a4151fd59f83b2
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r29993.
function pointers set to the _map functions, and we get segv's in MTT
testing (e.g., the C++ suite, which actually calls MPI_Cart_map and
MPI_Graph_map).
cmr=v1.7.4:reviewer=bosilca:subject=Fix topo _map function pointer assignments
This commit was SVN r29988.
branch (it's not necessary on trunk/v1.7 because they require C99,
which allows variadic macros).
Also fix another compiler warning (using %p to print a (void*)).
Submitted by Jeff, reviewed by Dave.
cmr=v1.7.4:reviewer=ompi-rm1.7:subject=two usnic BTL fixes
This commit was SVN r29966.
usnic_channel_finalize() was deregistering recv buffers before
destroying the QP to which they were posted. The QP needs to be
destroyed first so that the NIC does not attemp tto write to
deregistered memory, causing the DMAR messages.
Submitted by Reese, reviewed by Jeff.
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r29963.
debugger code (not mca_topo_base_module_2_1_0_t).
I checked: we do a similar thing for coll in the communicator struct
(i.e., leave the version number off the module struct). I confess to
not remembering ''why'' we leave the version number off, but it seems
to be consistent this way...
cmr=v1.7.4:reviewer=bosilca:subject=fix debugger type symbol lookup for mca_topo_base_module_t
This commit was SVN r29953.
The following Trac tickets were found above:
Ticket 3958 --> https://svn.open-mpi.org/trac/ompi/ticket/3958
* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff
* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This would provide the hostname for debugging purposes as we only report errors on messages, and so we must have called modex_recv to get the endpoint info
* BTLs are not to call modex_recv until they need the endpoint info for first message - i.e., not during add_procs so we don't call it for every process in the job, but only those with whom we communicate
My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. Meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse than the prior behavior.
This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.
cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock
This commit was SVN r29931.
The following SVN revision numbers were found above:
r29917 --> open-mpi/ompi@1a972e2c9d
includes various fixes all over the C/R code which are
hard to group like the other patches.
Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values
Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)
This commit was SVN r29922.