http://www.open-mpi.org/community/lists/devel/2011/10/9878.php
I am making a final decision on the behavior when an MCA parameter is
re-registered and changes types. In
developer builds (i.e., OPAL_ENABLE_DEBUG==1), a show_help message
will be displayed. In all builds, an error status will be returned.
Specifically, the logic looks like this:
{{{
if (reregistered_with_type_change) {
#if OPAL_ENABLE_DEBUG
    opal_show_help(...);
#endif
    return OPAL_ERR_VALUE_OUT_OF_BOUNDS;
}
}}}
If someone would like to change this behavior, they are welcome to do
so. :-) I am committing this so that ''some'' action occurs (rather
than discussing the issue and then having nothing happen).
This commit was SVN r25432.
handled properly when MCA parameters are re-registered and their types
change. Specifically, this case was broken:
1. Register an int MCA param with a non-zero default value
1. Re-register the same MCA param as a string with a NULL default value
The second step caused a segv because the int default value from the
first registration wasn't being reset properly. Here's sample code that
shows the issue:
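{{{
#include "opal_config.h"
#include <stdio.h>
#include <stdlib.h>
#include "opal/runtime/opal.h"
#include "opal/mca/base/mca_base_param.h"

int main(int argc, char *argv[])
{
    int ibogus;
    char *sbogus;

    opal_init(&argc, &argv);
    /* Step 1: register an int param with a non-zero default (3) */
    mca_base_param_reg_int_name("type", "name", "help", false, false, 3, &ibogus);
    printf("Ibogus: %d\n", ibogus);
    /* Step 2: re-register it as a string with a NULL default (segv'd before this fix) */
    mca_base_param_reg_string_name("type", "name", "help", false, false, NULL, &sbogus);
    printf("Sbogus: %s\n", (NULL == sbogus) ? "NULL" : sbogus);
    exit(0);
}
}}}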
This commit fixes the problem shown in the sample code above, as well as
a similar issue for file-set MCA params and override values. It
also resets default values for MCA params initially registered as a
string but then re-registered as an int.
This commit was SVN r25392.
Use hwloc to obtain the cpuset for each process during mpi_init, and share that info in the modex. As it arrives, use a new opal_hwloc_base utility function to parse the value against the local proc's cpuset and determine where they overlap. Cache the result in the pmap object, since it may be referenced multiple times.
Thus, the return value from orte_ess.proc_get_locality is a 16-bit bitmask that describes the resources shared with the calling process. This bitmask can be tested using the macros in opal/mca/paffinity/paffinity.h.
Locality is available for all procs, whether launched via mpirun or directly with an external launcher such as slurm or aprun.
This commit was SVN r25331.
calling main():
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> which ompi_info
~/openmpi-1.5.4/COMPILE-intel-12.1.0/usr/bin/ompi_info
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> ompi_info
Segmentation fault
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> gdb usr/bin/ompi_info
...
(gdb) run
Starting program:
...
Program received signal SIGSEGV, Segmentation fault.
opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at
../../../../../opal/mca/memory/linux/malloc.c:4080
4080 /* remove from unsorted list */
(gdb) where
#0 opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at
../../../../../opal/mca/memory/linux/malloc.c:4080
#1 0x00007ffff7c232b9 in opal_memory_linux_malloc_hook
(sz=140737354040280, caller=0x1006) at
../../../../../opal/mca/memory/linux/hooks.c:687
#2 0x0000003dd96a6871 in __alloc_dir () from /lib64/libc.so.6
#3 0x0000003ddfa053cd in ?? () from /usr/lib64/libnuma.so.1
#4 0x0000003dd8e0e445 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Many combinations were tried, to no avail.
Intel v11.0 worked...
Thanks to Hubert Haberstock (Intel) for providing the hint in:
http://software.intel.com/en-us/forums/showthread.php?t=87132
This was tested on openmpi-1.5.4 and therefore should
cmr:v1.5
This commit was SVN r25290.
zeroes);
if so, use it for bit operations like opal_cube_dim and opal_hibit.
Implement two versions of the power-of-two computation.
For opal_next_poweroftwo, this reduces the average execution
time from 83 cycles to 4 cycles (Intel Nehalem, icc, -O2, inlined,
measured with rdtsc over a loop of 2^27 values).
Numbers for other functions are similar, though they depend heavily
on usage (e.g., opal_hibit() with a start of 4 does not save
much). The bsr instruction is also slower on AMD Opteron.
- Replace various places where the next power-of-two is computed.
Tested on an Intel Nehalem cluster with openib, compilers GNU-4.6.1 and
Intel-12.0.4, using mpi_testsuite -t "Collective" with 128 processes.
This commit was SVN r25270.
since the 1.4 release (and configure would abort when run with sparc v8), but
the code was left in place. Sparc v9 (32 or 64 bit) are still supported
targets.
This commit was SVN r25258.
1 otherwise. It was doing the opposite, so this patch fixes the
return values. All uses (all in ORTE) used the actual return values,
not the documented values, so fix them as well.
This commit was SVN r25257.
match similar stuff in the event framework; only add the CPPFLAGS,
LDFLAGS, LIBS, and corresponding WRAPPER_EXTRA_* values for the single
winning component (because this framework is compile-time,
one-of-many).
This commit was SVN r25199.
only became evident when there was more than one event component.
The libevent2013 component is still ompi_ignore'd for most developers.
This commit was SVN r25198.
Since hwloc has a dynamic bitmap size, it could actually have bits set
that will not fit in the paffinity mask. We already made sure that we
didn't overrun the paffinity mask; now also set the return value to
OPAL_ERR_VALUE_OUT_OF_BOUNDS (wow, we really thought of everything
with those error codes, eh?) if the hwloc bitmap has bits set higher
than what will fit into the paffinity bitmask.
This commit was SVN r25179.
The following Trac tickets were found above:
Ticket 2854 --> https://svn.open-mpi.org/trac/ompi/ticket/2854
* change components from setting <framework>_base_include to
opal_<framework>_<component>_include; the framework m4 will figure
out the winning component and pick the right "include" shell
variable. Ditto for the other shell variables (cppflags, ldflags,
etc.).
* misc fixes to hwloc/external
* add a bunch of missing "opal_" prefixes to shell variables
* add a few more comments and update a few others in the framework m4 files
This commit was SVN r25174.