openmpi/ompi/mca/coll/tuned
Rainer Keller 4e6a6fc146 - Check whether the compiler supports __builtin_clz (count leading zeroes);
   if so, use it for bit operations like opal_cube_dim and opal_hibit.
   Implement two versions of power-of-two.
   In case of opal_next_poweroftwo, this reduces the average execution
   time from 83 cycles to 4 cycles (Intel Nehalem, icc, -O2, inlining,
   measured with rdtsc in a loop over 2^27 values).
   Numbers for other functions are similar (but of course heavily depend
   on the usage, e.g. opal_hibit() with a start of 4 does not save
   much).  The bsr instruction on AMD Opteron is also not as fast.

 - Replace various places where the next power-of-two is computed.
   
   Tested on Intel Nehalem Cluster with openib, compilers GNU-4.6.1 and
   Intel-12.0.4 using mpi_testsuite -t "Collective" with 128 processes.

This commit was SVN r25270.
2011-10-11 22:49:01 +00:00
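
The idea described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the names next_pow2_loop and next_pow2_clz are hypothetical, the "strictly greater than value" semantics is an assumption, and the compiler check is done here with predefined macros rather than with a configure-time test.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers for illustration; these are NOT the Open MPI functions. */

/* Portable fallback: shift the value out bit by bit.
 * Returns the smallest power of two strictly greater than value
 * (assumed semantics for this sketch). */
static inline unsigned next_pow2_loop(unsigned value)
{
    unsigned power2 = 1u;
    while (value > 0u) {
        value >>= 1;
        power2 <<= 1;
    }
    return power2;
}

/* Same result, computed from the number of leading zero bits. */
static inline unsigned next_pow2_clz(unsigned value)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Assumption: compiler support is detected via predefined macros here. */
    unsigned width = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    unsigned bits;

    if (value == 0u) {
        return 1u;                /* __builtin_clz(0) is undefined */
    }
    bits = width - (unsigned)__builtin_clz(value);
    assert(bits < width);         /* the result must fit in an unsigned */
    return 1u << bits;
#else
    return next_pow2_loop(value); /* no builtin available */
#endif
}
```

With these definitions, next_pow2_clz(5) and next_pow2_loop(5) both yield 8. The builtin version replaces a loop whose trip count grows with the position of the highest set bit by a single bit-scan plus a shift, which is the kind of saving the cycle counts above refer to.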
coll_tuned_allgather.c - Check, whether the compiler supports __builtin_clz (count leading 2011-10-11 22:49:01 +00:00
coll_tuned_allgatherv.c - Split the datatype engine into two parts: an MPI specific part in 2009-07-13 04:56:31 +00:00
coll_tuned_allreduce.c - Check, whether the compiler supports __builtin_clz (count leading 2011-10-11 22:49:01 +00:00
coll_tuned_alltoall.c Don't forget to initialize "line" in all cases. 2010-05-19 21:19:45 +00:00
coll_tuned_alltoallv.c - Split the datatype engine into two parts: an MPI specific part in 2009-07-13 04:56:31 +00:00
coll_tuned_barrier.c - Check, whether the compiler supports __builtin_clz (count leading 2011-10-11 22:49:01 +00:00
coll_tuned_bcast.c - Split the datatype engine into two parts: an MPI specific part in 2009-07-13 04:56:31 +00:00
coll_tuned_component.c Rework the selection logic for the tuned collectives. All supported collectives 2009-08-14 21:06:23 +00:00
coll_tuned_decision_dynamic.c Rework the selection logic for the tuned collectives. All supported collectives 2009-08-14 21:06:23 +00:00
coll_tuned_decision_fixed.c - Check, whether the compiler supports __builtin_clz (count leading 2011-10-11 22:49:01 +00:00
coll_tuned_dynamic_file.c ... Delayed due to notifier commits earlier this day ... 2009-04-29 01:32:14 +00:00
coll_tuned_dynamic_file.h - Replace combinations of 2009-08-20 11:42:18 +00:00
coll_tuned_dynamic_rules.c Rework the selection logic for the tuned collectives. All supported collectives 2009-08-14 21:06:23 +00:00
coll_tuned_dynamic_rules.h - Replace combinations of 2009-08-20 11:42:18 +00:00
coll_tuned_gather.c Only get the receive datatype extent on the root process, as every 2010-02-17 16:01:50 +00:00
coll_tuned_module.c Ensure that the com_rules[] array entries are initialized to NULL in 2010-07-07 14:04:18 +00:00
coll_tuned_reduce_scatter.c - Check, whether the compiler supports __builtin_clz (count leading 2011-10-11 22:49:01 +00:00
coll_tuned_reduce.c Rework the selection logic for the tuned collectives. All supported collectives 2009-08-14 21:06:23 +00:00
coll_tuned_scatter.c - Split the datatype engine into two parts: an MPI specific part in 2009-07-13 04:56:31 +00:00
coll_tuned_topo.c - Check, whether the compiler supports __builtin_clz (count leading 2011-10-11 22:49:01 +00:00
coll_tuned_topo.h - Replace combinations of 2009-08-20 11:42:18 +00:00
coll_tuned_util.c Cleanups. 2011-02-25 00:28:32 +00:00
coll_tuned_util.h - Replace combinations of 2009-08-20 11:42:18 +00:00
coll_tuned.h Rework the selection logic for the tuned collectives. All supported collectives 2009-08-14 21:06:23 +00:00
Makefile.am WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues. 2010-09-17 23:04:06 +00:00