4e6a6fc146
zeroes); if so, use it for bit operations such as opal_cube_dim and opal_hibit. Implement two versions of the power-of-two computation. For opal_next_poweroftwo, this reduces the average execution time from 83 cycles to 4 cycles (Intel Nehalem, icc, -O2, inlined, measured with rdtsc over a loop of 2^27 values). Other functions show similar gains, although these depend heavily on usage; e.g. opal_hibit() with a start of 4 does not save much. The bsr instruction is also not as fast on AMD Opteron.

- Replace various places where the next power of two is computed.

Tested on an Intel Nehalem cluster with openib, using compilers GNU-4.6.1 and Intel-12.0.4, running mpi_testsuite -t "Collective" with 128 processes.

This commit was SVN r25270.
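The commit message describes replacing bit-by-bit loops with the compiler's count-leading-zeroes builtin. The following is a minimal C sketch of that idea, not the Open MPI code. The `sketch_*` names are hypothetical. Their semantics are also assumptions: next power of two strictly greater than the argument, `cube_dim` as ceil(log2(value)), and `hibit` searching only the bits below `start`. The sketch assumes a 32-bit `int` and a compiler that provides `__builtin_clz` (GCC and Clang do). Each helper has a portable loop version and a builtin version, so the two can be checked against each other.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers (sketch_*), not Open MPI's API. Assumes 32-bit int
 * and a compiler that provides __builtin_clz (e.g. GCC or Clang). */
#define SKETCH_INT_BITS ((int)(sizeof(unsigned int) * CHAR_BIT))

/* Smallest power of two strictly greater than value (assumed semantics). */
static int sketch_next_pow2_loop(int value)
{
    int power2 = 1;
    assert(value >= 0 && value < (1 << 30));
    for (; value > 0; value >>= 1) {   /* one iteration per significant bit */
        power2 <<= 1;
    }
    return power2;
}

static int sketch_next_pow2_clz(int value)
{
    assert(value >= 0 && value < (1 << 30));
    if (value == 0) {
        return 1;                      /* __builtin_clz(0) is undefined */
    }
    /* significant bits = word width - leading zeroes */
    return 1 << (SKETCH_INT_BITS - __builtin_clz((unsigned int)value));
}

/* Smallest dim with 2^dim >= value, i.e. ceil(log2(value)); 0 for value <= 1. */
static int sketch_cube_dim_loop(int value)
{
    int dim = 0;
    assert(value >= 0 && value < (1 << 30));
    for (int size = 1; size < value; size <<= 1) {
        ++dim;
    }
    return dim;
}

static int sketch_cube_dim_clz(int value)
{
    assert(value >= 0 && value < (1 << 30));
    if (value <= 1) {
        return 0;
    }
    return SKETCH_INT_BITS - __builtin_clz((unsigned int)(value - 1));
}

/* Highest set bit of value among bit positions [0, start), or -1 if none. */
static int sketch_hibit_loop(int value, int start)
{
    assert(start >= 0 && start <= 31);
    for (int bit = start - 1; bit >= 0; --bit) {  /* at most start iterations */
        if ((unsigned int)value & (1u << bit)) {
            return bit;
        }
    }
    return -1;
}

static int sketch_hibit_clz(int value, int start)
{
    assert(start >= 0 && start <= 31);
    unsigned int mask = (unsigned int)value & ((1u << start) - 1u);
    if (mask == 0) {
        return -1;                     /* also avoids __builtin_clz(0) */
    }
    return SKETCH_INT_BITS - 1 - __builtin_clz(mask);
}

/* Returns 1 if both variants of every helper agree for all v in [0, limit). */
static int sketch_bitops_agree(int limit)
{
    for (int v = 0; v < limit; ++v) {
        if (sketch_next_pow2_loop(v) != sketch_next_pow2_clz(v) ||
            sketch_cube_dim_loop(v) != sketch_cube_dim_clz(v)) {
            return 0;
        }
        for (int s = 0; s <= 31; ++s) {
            if (sketch_hibit_loop(v, s) != sketch_hibit_clz(v, s)) {
                return 0;
            }
        }
    }
    return 1;
}
```

Keeping the loop as a fallback mirrors the commit's approach of using the builtin only when the compiler supports it. The loop versions cost one iteration per bit examined. This is consistent with the note that opal_hibit() with a small start such as 4 gains little, because its loop runs at most four times.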
Makefile.am
ompi_numtostr.c
opal_argv.c
opal_basename.c
opal_bit_ops.c
opal_error.c
opal_if.c
opal_os_create_dirpath.c
opal_os_path.c
opal_path_nfs.c
opal_sos.c
opal_timer.c
orte_session_dir.c
orte_universe_setup_file_io.c