openmpi

ports/openmpi

Commit graph

Author	SHA1	Message	Date
Jeff Squyres	435eaf4671	This is an opal test; it should include opal_config.h, not ompi_config.h. This matters if you autogen.pl --no-ompi. This commit was SVN r29855.	2013-12-11 03:31:25 +00:00
Rainer Keller	4e6a6fc146	- Check, whether the compiler supports __builtin_clz (count leading zeroes); if so, use it for bit-operations like opal_cube_dim and opal_hibit. Implement two versions of power-of-two. In case of opal_next_poweroftwo, this reduces the average execution time from 83 cycles to 4 cycles (Intel Nehalem, icc, -O2, inlining, measured rdtsc, with loop over 2^27 values). Numbers for other functions are similar (but of course heavily depend on the usage, e.g. opal_hibit() with a start of 4 does not save much). The bsr instruction on AMD Opteron is also not as fast. - Replace various places where the next power-of-two is computed. Tested on Intel Nehalem Cluster with openib, compilers GNU-4.6.1 and Intel-12.0.4 using mpi_testsuite -t "Collective" with 128 processes. This commit was SVN r25270.	2011-10-11 22:49:01 +00:00
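
The technique described in commit 4e6a6fc146 (replacing a shift loop with a count-leading-zeros builtin) can be sketched roughly as follows. This is an illustrative sketch, not the code from the commit: the names next_pow2_loop, next_pow2_clz and next_pow2 are hypothetical, the "smallest power of two greater than or equal to v" semantics are an assumption and may differ from opal_next_poweroftwo, and the compiler check here is a simple preprocessor test rather than the configure-time check the commit message mentions. It also assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: keep doubling until we reach v.
 * Takes up to 31 iterations for a 32-bit value. */
static inline uint32_t next_pow2_loop(uint32_t v)
{
    uint32_t p = 1;
    assert(v <= UINT32_C(0x80000000)); /* larger inputs would overflow */
    while (p < v) {
        p <<= 1;
    }
    return p;
}

#if defined(__GNUC__) || defined(__clang__)
/* Builtin version: __builtin_clz counts the leading zero bits of an
 * unsigned int (assumed 32 bits wide here). Its result is undefined
 * for 0, so inputs 0 and 1 are handled separately. */
static inline uint32_t next_pow2_clz(uint32_t v)
{
    assert(v <= UINT32_C(0x80000000));
    if (v <= 1) {
        return 1;
    }
    /* The highest set bit of (v - 1) is at index 31 - clz(v - 1);
     * the result is the next bit above it. */
    return UINT32_C(1) << (32 - __builtin_clz(v - 1));
}
#endif

/* Hypothetical dispatcher: use the builtin when the compiler has it. */
static inline uint32_t next_pow2(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return next_pow2_clz(v);
#else
    return next_pow2_loop(v);
#endif
}
```

Under these assumptions both variants agree on every input up to 2^31; for example, next_pow2(5) and next_pow2_loop(5) both give 8, while an exact power of two such as 64 is returned unchanged. On x86 the builtin typically compiles to a single bsr or lzcnt instruction, which is consistent with the kind of cycle-count difference the commit message reports.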

2 commits