This commit fixes one warning that should have caused coll/ml to segfault
on reduce. The fix should be correct but we will continue to investigate.
cmr=v1.7.5:ticket=trac:4158
This commit was SVN r30477.
The following Trac tickets were found above:
Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
After IM with Nathan, apply patch from ticket after verification by Paul Hargrove that it fixes the problem on non-x86 32-bit platforms
Verified by Paul, RM-approved
cmr=v1.7.4:reviewer=ompi-gk1.7
This commit was SVN r30411.
The following Trac tickets were found above:
Ticket 4143 --> https://svn.open-mpi.org/trac/ompi/ticket/4143
allgather.
The new collectives provide a significant performance increase over tuned for
small and medium messages. We are initially setting the priority lower than
tuned until this has had some time to soak in the trunk. Please set
coll_ml_priority to 90 for MTT runs.
Credit for this work goes to Manjunath Gorentla Venkata (ORNL), Pavel Shamis (ORNL),
and Nathan Hjelm (LANL).
Commit details (for reference):
Import ORNL's collectives for MPI_Allreduce, MPI_Reduce, and MPI_Allgather.
We need to take the basesmuma header into account when calculating the
ptpcoll small message thresholds. Add a define to bcol.h indicating the
maximum header size so we can take the header into account while not
making ptpcoll dependent on information from basesmuma.
This resolves an issue with allreduce where ptpcoll overwrites the
header of the next buffer in the basesmuma bank.
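The threshold fix comes down to reserving room for the largest possible header in each bank buffer. The sketch below is a minimal illustration that assumes the header sits at the start of each buffer. `BCOL_MAX_HDR_SIZE` and `small_msg_threshold()` are hypothetical names, not the actual define in bcol.h or the ptpcoll code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the maximum-header-size define added to bcol.h;
 * the real macro name and value may differ. */
#define BCOL_MAX_HDR_SIZE ((size_t)128)

/* Hypothetical helper: the largest payload that still fits in one bank
 * buffer once room is reserved for the biggest header any bcol (e.g.
 * basesmuma) may write at the start of the buffer. Returns 0 if the header
 * alone fills the buffer. */
static size_t small_msg_threshold(size_t buffer_size)
{
    assert(buffer_size > 0);
    if (buffer_size <= BCOL_MAX_HDR_SIZE) {
        return 0;
    }
    return buffer_size - BCOL_MAX_HDR_SIZE;
}
```

Because the define is bcol-wide, ptpcoll can size its messages without knowing anything about the basesmuma header layout.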
Fix reduce and make a sequential collective launcher in coll_ml_inlines.h
The root calculation for reduce was wrong for any root != 0. There are
four possibilities for the root:
- The root is not the current process but is in the current hierarchy. In
this case the root is the index of the global root as specified in the
root vector.
- The root is not the current process and is not in the next level of the
hierarchy. In this case 0 must be the local root since this process will
never communicate with the real root.
- The root is not the current process but will be in the next level of the
hierarchy. In this case the current process must be the root.
- I am the root. The root is my index.
Tested with IMB which rotates the root on every call to MPI_Reduce. Consider
IMB the reproducer for the issue this commit solves.
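The four root cases can be folded into a single decision function. This is a hedged sketch: `struct level_view` and `level_root()` are hypothetical and do not mirror the real coll ml data structures. Each call answers "which index acts as root at this level?" for the calling process.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of one hierarchy level from the calling process. */
struct level_view {
    int  my_index;            /* this process's index in the level */
    bool i_am_root;           /* this process is the global root */
    bool root_in_level;       /* the global root is a member of this level */
    int  root_index;          /* root's index (valid only if root_in_level) */
    bool root_in_next_level;  /* the root is reached via the next level */
};

/* Returns the index of the local root for this level. */
static int level_root(const struct level_view *v)
{
    assert(v->my_index >= 0);
    if (v->i_am_root) {
        return v->my_index;          /* I am the root: use my index */
    }
    if (v->root_in_level) {
        return v->root_index;        /* index taken from the root vector */
    }
    if (v->root_in_next_level) {
        return v->my_index;          /* I relay toward the real root */
    }
    return 0;                        /* never talks to the real root */
}
```

Checking `i_am_root` first matters: a process that is the root is also "in the current level", so the order of the tests encodes the precedence between the cases.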
Make the bcast algorithm decision an enumerated variable
Resolve various assert failures when destructing coll ml requests.
Two issues:
- Always reset the request to be invalid before returning it to the
free list. This will avoid an assert in ompi_request_t's destructor.
OMPI_REQUEST_FINI does this (and also releases the fortran handle
index).
- Never explicitly construct or destruct the superclass of an opal
object. This screws up the class function tables and will cause
either an assert failure or a segmentation fault when destructing
coll ml requests.
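The first issue can be shown in plain C, without the opal object system. This is a hedged sketch: `request_return()`, `request_destruct()` and the `REQ_*` states are hypothetical. `OMPI_REQUEST_FINI` does more than this, since it also releases the Fortran handle index. The sketch shows why a request must be reset before it goes back on the free list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request states. */
enum req_state { REQ_INVALID, REQ_ACTIVE, REQ_COMPLETE };

struct request {
    enum req_state state;
    struct request *next;   /* free-list link */
};

static struct request *free_list = NULL;

/* Reset the request, then push it onto the free list. Skipping the reset
 * leaves a completed request on the list in a state the destructor rejects. */
static void request_return(struct request *req)
{
    req->state = REQ_INVALID;
    req->next = free_list;
    free_list = req;
}

/* Destructor-style check: only invalid (reset) requests may be destroyed. */
static void request_destruct(struct request *req)
{
    assert(req->state == REQ_INVALID);
    (void)req;   /* silence unused-parameter warnings under NDEBUG */
}
```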
Cleanup allgather.
I removed the duplicate non-blocking and blocking functions and modeled
the cleanup after what I found in allreduce. Also cleaned up the code
somewhat.
Don't bother copying from the send to the receive buffer in
bcol_basesmuma_allreduce_intra_fanin_fanout if the pointers are the
same.
This eliminates a warning about memcpy and aliasing and avoids an
unnecessary call to memcpy.
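The guard is a pointer comparison in front of the copy. Here is a minimal sketch with a hypothetical helper name. It assumes the two buffers are either identical or fully disjoint. Partial overlap would need memmove instead.

```c
#include <assert.h>
#include <string.h>

/* Copy the send buffer into the receive buffer only when they differ.
 * memcpy requires non-overlapping buffers, so copying a buffer onto itself
 * is undefined behaviour as well as wasted work. */
static void copy_if_distinct(void *rbuf, const void *sbuf, size_t len)
{
    assert(rbuf != NULL && sbuf != NULL);
    if (rbuf != sbuf) {
        memcpy(rbuf, sbuf, len);
    }
}
```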
Always call CHECK_AND_RECYCLE on memsync collectives.
There was a call to OBJ_RETAIN on the collective communicator but
because CHECK_AND_RECYCLE was never called there was no matching call
to OBJ_RELEASE. This caused coll ml to leak communicators.
Make allreduce use the sequential collective launcher in coll_ml_inlines.h
Just launch the next collective in the component progress.
I am a little unsure about this patch. There appears to be some sort
of race between collectives that causes buffer exhaustion in some cases
(IMB Allreduce is a reproducer). Changing progress to only launch the
next bcol seems to resolve the issue but might not be the best fix.
Note that I see little to no performance penalty for this change.
Fix allreduce when there are extra sources.
There was an issue with the buffer offset calculation when there are
extra sources. In the case of extra sources == 1 the offset was set
to buffer_size (just past the header of the next buffer). I adjusted
the buffer size to take into account the maximum header size (see the
earlier commit that added this) and simplified the offset calculation.
Make reduce/allreduce non-blocking. This is required for MPI_Comm_idup
to work correctly.
This has been tested with various layouts using the ibm testsuite and
imb and appears to have the same performance as the old blocking version.
Fix allgather for non-contiguous layouts and simplify parsing the
topology.
Some things in this patch:
- There were several comments to the effect that level 0 of the
hierarchy MUST contain all of the ranks. At least one function
made this assumption but it was not true. I changed the sbgp
components and the coll ml initialization code to enforce this
requirement.
- Ensure that hierarchy level 0 has the ranks in the correct
scatter gather order. This removes the need for a separate
sort list and fixes the offset calculation for allgather.
- There were several passes over the hierarchy to determine
properties of the hierarchy. I eliminated these extra passes
and the memory allocation associated with them and calculate the
tree properties on the fly. The same DFS recursion also handles
the re-order of level 0.
All these changes have been verified with MPI_Allreduce, MPI_Reduce, and
MPI_Allgather. All functions now pass all IBM/Open MPI and IMB tests.
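A single DFS pass can produce the level 0 ordering and a tree property together. This is a hedged sketch under two assumptions. First, the hierarchy is modelled as a hypothetical tree (`struct hnode`) whose leaves are ranks. Second, "scatter gather order" means that ranks sharing a subgroup end up contiguous, which is what a DFS over the leaves yields. The real coll ml code tracks more properties than depth.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical hierarchy node: leaves carry a rank, inner nodes group
 * children (for example socket -> node -> switch). */
struct hnode {
    int rank;                                   /* valid only for leaves */
    int nchildren;
    const struct hnode *children[MAX_CHILDREN];
};

/* One DFS that appends leaf ranks to order[] (so each subgroup's ranks are
 * contiguous) and returns the depth of the subtree, counting leaves as 1. */
static int dfs_order(const struct hnode *n, int *order, int *count)
{
    assert(n->nchildren >= 0 && n->nchildren <= MAX_CHILDREN);
    if (n->nchildren == 0) {
        order[(*count)++] = n->rank;
        return 1;
    }
    int depth = 0;
    for (int i = 0; i < n->nchildren; ++i) {
        int d = dfs_order(n->children[i], order, count);
        if (d > depth) {
            depth = d;
        }
    }
    return depth + 1;
}
```

In this example, ranks 0 and 2 share one node and ranks 1 and 3 share another, so the DFS emits 0, 2, 1, 3.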
coll/ml: correct pointer usage for MPI_BOTTOM
Since contiguous datatypes are copied via memcpy (bypassing the convertor) we
need to adjust for the lb of the datatype. This corrects problems found testing
code that uses MPI_BOTTOM (NULL) as the send pointer.
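With the convertor bypassed, the lower bound has to be added by hand before the memcpy. This is a minimal sketch. `struct contig_type` and `contig_copy()` are hypothetical, and in Open MPI the lower bound lives in the datatype object. With MPI_BOTTOM (NULL) as the buffer, the lower bound carries the absolute address of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal contiguous datatype: lb is the lower bound in bytes,
 * size is the number of bytes per element. */
struct contig_type {
    ptrdiff_t lb;
    size_t size;
};

/* Copy count elements with memcpy. Since the convertor is not involved, the
 * lower bound must be applied to the user pointer explicitly; with a NULL
 * (MPI_BOTTOM) buffer, lb alone locates the data. */
static void contig_copy(void *dst, const void *src,
                        const struct contig_type *t, size_t count)
{
    const char *from = (const char *)((uintptr_t)src + (uintptr_t)t->lb);
    assert(from != NULL);
    memcpy(dst, from, t->size * count);
}
```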
Add fallback collectives for allreduce and reduce.
cmr=v1.7.5:reviewer=pasha
This commit was SVN r30363.
- Adds a coll_hcoll_np MCA parameter similar to that of the fca component (defaults to 32). Users of hcoll should be aware that from now on communicators with fewer than 32 processes will run without hcoll by default.
- Resolves a fallback issue when libhcoll runs out of allowed contexts. The solution is to move hcoll_context_create from comm_enable to comm_query. In short, comm_enable should never return OMPI_ERROR in the coll component with the highest priority (hcoll). Otherwise ompi coll_base_select will unselect the coll function pointers and module references, leaving the communicator without coll pointers, which causes the failure. The same behavior can be reproduced even with tuned by hardcoding a "return OMPI_ERROR" into its module_enable function.
- Additionally, removed all the dead code under #if 0, unused variables (path for library, active_modules list) and unused classes (module list wrapper).
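The key idea is that every reason to refuse a communicator gets decided in comm_query, where declining is harmless. This is a hedged sketch. `struct hcoll_sketch` and `hcoll_query_sketch()` are hypothetical, and "creating a context" is modelled as decrementing a counter rather than calling libhcoll.

```c
#include <assert.h>

/* Hypothetical component state; not the real hcoll API. */
struct hcoll_sketch {
    int np;              /* coll_hcoll_np: minimum communicator size */
    int contexts_left;   /* contexts libhcoll can still hand out */
};

/* Query-time decision: return 1 to take the communicator, 0 to decline.
 * Declining here lets a lower-priority component take over, so comm_enable
 * never needs to report an error for these cases. */
static int hcoll_query_sketch(struct hcoll_sketch *c, int comm_size)
{
    assert(comm_size > 0);
    if (comm_size < c->np) {
        return 0;                /* small communicator: skip hcoll */
    }
    if (c->contexts_left == 0) {
        return 0;                /* context creation would fail: fall back */
    }
    c->contexts_left--;          /* "create" the context during query */
    return 1;
}
```

With the default np of 32, a 16-process communicator is declined without touching the context budget, and a 64-process one is declined only once the contexts are exhausted.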
Fixed by Val, Reviewed by Devendar/Josh/Miked
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30341.
Set comm attribute with keyval.
Wait for pending hcoll module tasks in the comm delete callback, where the
PML is still valid on the communicator. Safely destroy the hcoll context in
the hcoll module destructor.
Author: Devendar Bureddy
reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30175.
- HCOLL close without init
- Call hcoll progress after comm finalize
- mpirun default for coll_hcoll_enable is 1
fixed by Igor, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30156.
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configury macro
names:
* pkgdatdir -> ompidatadir
* pkglibdir -> ompilibdir
* pkgincludedir -> ompiincludedir
This commit was SVN r30145.
The following SVN revision numbers were found above:
r30140 --> open-mpi/ompi@8b778903d8
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
- Modifications to coll/hcoll component related to the changes in the libhcoll API.
Now, hcoll_destroy_context accepts one more parameter that indicates if the context was
really destroyed as a result of the call.
This new "non-blocking" context destruction fixes a hang discovered in IMB with mcast enabled.
- Clean up any remaining contexts on comm_world destruction.
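The pattern is to try once, keep what was not destroyed, and drain the rest at comm_world teardown. This is a hedged sketch built around a mock. `mock_destroy()` only imitates the extra "was it really destroyed" output parameter and is not the libhcoll prototype. `comm_destroy()` and `comm_world_destroy()` are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 16

/* Hypothetical context: it can only be torn down once its outstanding
 * operations have drained. */
struct ctx {
    int outstanding;
};

/* Mock destroy call: makes progress on one outstanding operation and reports
 * through *destroyed whether the context is really gone. */
static void mock_destroy(struct ctx *c, bool *destroyed)
{
    if (c->outstanding > 0) {
        c->outstanding--;
    }
    *destroyed = (c->outstanding == 0);
}

static struct ctx *pending[MAX_PENDING];
static int npending = 0;

/* Communicator teardown: try once and, instead of blocking until the context
 * is gone, remember it for later. */
static void comm_destroy(struct ctx *c)
{
    bool destroyed;
    mock_destroy(c, &destroyed);
    if (!destroyed) {
        assert(npending < MAX_PENDING);
        pending[npending++] = c;
    }
}

/* comm_world teardown: keep retrying until every leftover context is gone. */
static void comm_world_destroy(void)
{
    while (npending > 0) {
        bool destroyed;
        mock_destroy(pending[npending - 1], &destroyed);
        if (destroyed) {
            npending--;
        }
    }
}
```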
fixed by Val, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30055.
(aka the root). This commit is based on a patch provided by Pierre
Jolivet.
Fix all the output to match the failing MPI call.
This commit was SVN r29761.
To support the new mpool two changes were made to the mpool infrastructure:
1) Added an mpool flag to indicate that an mpool does not need the memory
hooks to use the leave pinned protocols. This flag is checked in the
mpool lookup.
2) Add a mpool context to the base registration. This new member is used
by the udreg mpool to store the udreg context associated with the
particular registration. The new member will not break ABI
compatibility because it is currently used only by the udreg
mpool.
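Point 1 can be pictured as a capability check in the lookup. This sketch rests on an assumption the commit does not spell out: that without memory hooks, only an mpool declaring the flag is acceptable for leave pinned. `MPOOL_FLAG_NO_MEM_HOOKS`, `struct mpool_desc` and `mpool_lookup()` are all hypothetical names. The mpool names are used only as labels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: this mpool supports leave pinned without memory hooks. */
#define MPOOL_FLAG_NO_MEM_HOOKS 0x1u

struct mpool_desc {
    const char *name;
    unsigned flags;
};

/* Hypothetical lookup: find the mpool by name, but when memory hooks are not
 * available accept it only if it declares that it does not need them. */
static const struct mpool_desc *
mpool_lookup(const struct mpool_desc *pools, size_t n, const char *name,
             bool have_mem_hooks)
{
    assert(pools != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(pools[i].name, name) != 0) {
            continue;
        }
        if (!have_mem_hooks && !(pools[i].flags & MPOOL_FLAG_NO_MEM_HOOKS)) {
            return NULL;
        }
        return &pools[i];
    }
    return NULL;
}
```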
Dynamics support for Cray systems makes use of the global rank provided by
orte to give the ugni library a unique rank for each process. Dynamics
support is not available under direct-launch (srun).
cmr=v1.7.4
This commit was SVN r29719.
Only use Portals on communicators with more than one rank
Fix computation of number of children when using the hypercube tree
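For reference, here is one common way to count children in a hypercube-based (binomial) tree. It assumes ranks are relative to the root, and that the children of rank r are r + 2^k for every 2^k > r with r + 2^k < size. The Portals code may use a different convention, and `hypercube_num_children()` is a hypothetical name.

```c
#include <assert.h>

/* Count the children of relative rank r in a binomial spanning tree of a
 * hypercube with size processes: r + 2^k for every 2^k > r that is still
 * below size. Every rank except 0 gets exactly one parent, so the child
 * counts over all ranks sum to size - 1. */
static int hypercube_num_children(int r, int size)
{
    assert(r >= 0 && r < size);
    int n = 0;
    int mask = 1;
    while (mask <= r) {          /* skip bits at or below r's highest bit */
        mask <<= 1;
    }
    for (; r + mask < size; mask <<= 1) {
        n++;
    }
    return n;
}
```

For a non-power-of-two size such as 6, rank 2 gets no child because 2 + 4 is out of range. Getting this edge right is exactly where a naive log2(size) count goes wrong.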
This commit was SVN r29616.
and tuned to correctly handle 0 recvcounts.
Tested with the reproducer from #1550.
Refs trac:1559
This commit was SVN r29542.
The following Trac tickets were found above:
Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
The algorithms are intended for MPI-3.0 compliance and are not
optimized. We should aim to add better algorithms in the future through
cheetah.
MPI_Iallreduce and MPI_Igatherv on intercommunicators are required for
MPI_Comm_idup support.
cmr=v1.7.4:reviewer=brbarret:ticket=trac:2715
This commit was SVN r29333.
The following Trac tickets were found above:
Ticket 2715 --> https://svn.open-mpi.org/trac/ompi/ticket/2715
1. Change in the RTE API implementation: comm_world is now used for p2p.
This means we no longer need to worry about other communicators being destroyed.
2. Added a notification mechanism through which the runtime can tell libhcoll that the RTE API can no longer be used:
pass a pointer to a flag, and its size, to libhcoll.
The flag changes when the RTE is no longer available.
Currently this flag is just the ompi_mpi_finalized global bool value.
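On the library side, the mechanism reduces to storing the pointer and size and checking the flag before each RTE call. This is a hedged sketch. `lib_set_rte_flag()` and `lib_rte_usable()` are hypothetical, not the libhcoll interface. Reading the flag as raw bytes is one plausible reason for passing its size: the library then needs no knowledge of the flag's C type (a bool today).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical library-side copy of what the runtime passed in. */
static const void *rte_flag_ptr = NULL;
static size_t rte_flag_size = 0;

static void lib_set_rte_flag(const void *flag, size_t size)
{
    assert(flag != NULL || size == 0);
    rte_flag_ptr = flag;
    rte_flag_size = size;
}

/* The RTE may be used while every byte of the flag is zero; any nonzero
 * byte (e.g. the finalized bool becoming true) means it is gone. */
static bool lib_rte_usable(void)
{
    const unsigned char *bytes = rte_flag_ptr;
    for (size_t i = 0; bytes != NULL && i < rte_flag_size; ++i) {
        if (bytes[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Usage: the runtime registers the address of its finalized flag once, and the library consults `lib_rte_usable()` before issuing any p2p through the RTE.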
cmr=v1.7.3:reviewer=jladd
This commit was SVN r29331.
Blocking versions are simple linear algorithms implemented in coll/basic. Non-
blocking versions are from libnbc 1.1.1. All algorithms have been tested with
simple test cases.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29265.
of MPI_Alltoall.
- add support for MPI_IN_PLACE in the self collective component.
- fix the extent usage in the tuned collective component.
- correctly use the peer counts instead of local.
Thanks to Fujitsu for the patch.
This commit was SVN r29187.
configure-time dynamic allocation of flags. The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process. Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.
This commit was SVN r29100.
option to autodetect whether fragmentation should be enabled
cmr=v1.7.3:ticket=trac:3717
This commit was SVN r29065.
The following Trac tickets were found above:
Ticket 3717 --> https://svn.open-mpi.org/trac/ompi/ticket/3717