Worked with Oracle to verify that hwloc PCI detection is correctly
disabled on the Suse 10/64 bit platform and is enabled by default on
all other platforms. The --[en|dis]able-hwloc-pci switch is also
available for manual override of the configure decision about hwloc
PCI support.
This commit was SVN r26175.
This commit was SVN r26165.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r4402
hwloc component weren't reverse applied to the external hwloc
component. Additionaly, if we add stuff to LDFLAGS/LIBS, we also may
need to append (DY)LD_LIBRARY_PATH (here in this configure process
only), otherwise future configure tests may fail because they can't
find libhwloc.so (e.g., if you --with-hwloc=/some/path, we need to add
/some/path/lib to (DY)LD_LIBRARY_PATH).
This commit was SVN r26082.
The following Trac tickets were found above:
Ticket 3043 --> https://svn.open-mpi.org/trac/ompi/ticket/3043
This commit was SVN r26037.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r4340
This commit was SVN r25987.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r4319
This commit was SVN r25986.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r4314
Easiest solution is to set the include in a POST_CONFIG macro based on
whether the configure system says the component was selected or not.
This commit was SVN r25968.
1. no binding support - indicated by a negative return code from get_cpubind
2. binding supported, but not bound - the bitset returned by get_cpubind is the same as the available cpuset
3. binding supported and bound - bitset from get_cpubind is a subset of available cpuset
4. only one cpu is available - in this case, get_cpubind matches the available cpuset, but we are effectively bound
This commit was SVN r25957.
in the tarball. Thanks to Paul Hargrove for the fix.
This commit was SVN r25952.
The following Trac tickets were found above:
Ticket 2951 --> https://svn.open-mpi.org/trac/ompi/ticket/2951
paffinity hwloc was returning "NOT_SUPPORTED" when the real problem
was that the underlying hwloc simply hadn't been initialized yet. So
let's clearly delineate this case: return OPAL_ERR_NOT_INITIALIZED if
the underlying hwloc is not initialized.
This commit was SVN r25902.
into account when configuring the components of the faramework. If
--without-memory-manager was given, then we really don't want any
memory managers to be used.
This commit was SVN r25807.
The following Trac tickets were found above:
Ticket 2844 --> https://svn.open-mpi.org/trac/ompi/ticket/2844
problem on SuSE 10 (which might be related to Oracle's dual-bitness
builds, but we aren't completely sure yet).
So just turn it off for now, and bring this over to v1.5. Find a
proper fix (that enables pci support properly) for trunk/v1.7 later.
This commit was SVN r25769.
The following Trac tickets were found above:
Ticket 2952 --> https://svn.open-mpi.org/trac/ompi/ticket/2952
This commit was SVN r25759.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r4094
This commit was SVN r25707.
The following SVN revision numbers were found above:
r4102 --> open-mpi/ompi@8961ca568d
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r4104
.ompi_ignored to allow other developers to test with it. It is
expected that we'll remove the .ompi_ignore here soon, and
simultaneously remove the hwloc 1.2.2ompi component.
There was one very minor patch added to stock hwloc 1.3.1 in
hwloc/config/hwloc.m4:
{{{
--- hwloc-1.3.1/config/hwloc.m4 2011-12-14
+++ ompi3/opal/mca/hwloc/hwloc131/hwloc/config/hwloc.m4
@@ -583,6 +583,7 @@
])
fi
AC_SUBST(HWLOC_PCI_LIBS)
+ HWLOC_LIBS="$HWLOC_LIBS $HWLOC_PCI_LIBS"
# If we asked for pci support but couldn't deliver, fail
AS_IF([test "$enable_pci" = "yes" -a "$hwloc_pci_happy" = "no"],
[AC_MSG_WARN([Specified --enable-pci switch, but could
not])
}}}
This will be pushed upstream to hwloc.
This commit was SVN r25706.
No need for it to be in the base (we mistakenly thought it was used in
multiple shmem components).
This commit was SVN r25662.
The following SVN revision numbers were found above:
r25652 --> open-mpi/ompi@7e223b5799
* Fixes trac:2329 : Improves the error message, and ensures opal-restart will not segv in opal_finalize.
This commit was SVN r25586.
The following Trac tickets were found above:
Ticket 2329 --> https://svn.open-mpi.org/trac/ompi/ticket/2329
to -I the opal/config directory so that autoregen can find our .m4
files.
This commit was SVN r25564.
The following SVN revision numbers were found above:
r25511 --> open-mpi/ompi@5751c45916
supposed to. I.e., half-baked/not complete stuff.
This commit backs out all of r25545. Sorry folks!
This commit was SVN r25546.
The following SVN revision numbers were found above:
r25545 --> open-mpi/ompi@7f9ae11faf
to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS
X, we need to use the following compiler (linker) flag:
-Wl,-commons,use_dylibs
So if we're compiling on OS X, test to see if that flag works with the
compiler. If so, add it to the wrapper FFLAGS and FCFLAGS (note that
per a future update, we'll only have one Fortran compiler anyway).
Fixes trac:1982.
This commit was SVN r25545.
The following Trac tickets were found above:
Ticket 1982 --> https://svn.open-mpi.org/trac/ompi/ticket/1982
https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement
The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation.
In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions:
1. I have at long last bowed my head in submission to the system admin's of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior.
2. both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation.
3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so.
As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes.
This commit was SVN r25476.
http://www.open-mpi.org/community/lists/devel/2011/10/9878.php
I am making a final decision to decide the behavior of what happens
when an MCA parameter is re-registered and changes types. In
developer builds (i.e., OPAL_ENABLE_DEBUG==1), a show_help message
will be displayed. In all builds, an error status will be returned.
Specifically, the logic looks like this:
{{{
if (detect_re-registration_with_type_change) {
#if OPAL_ENABLE_DEBUG
opal_show_help(...);
#endif
return OPAL_ERR_VALUE_OUT_OF_BOUNDS;
}
}}}
If someone would like to change this behavior, they are welcome to do
so. :-) I am committing this so that ''some'' action occurs (rather
than talking about the issue and then nothing happens).
This commit was SVN r25432.
handled properly when MCA parameters are re-registered and their types
change. Specifically, this case was broken:
1. Register an int MCA param with a non-zero default value
1. Re-register the same MCA param as a string with a NULL default value
The 2nd step would cause a segv because the first int default value
wasn't being reset properly. Here's sample code that shows the issue:
{{{
{
int ibogus;
char *sbogus;
opal_init(&argc, &argv);
mca_base_param_reg_int_name("type", "name", "help", false, false, 3, &ibogus);
printf("Ibogus: %d\n", ibogus);
mca_base_param_reg_string_name("type", "name", "help", false, false, NULL, &sbogus);
printf("Sbogus: %s\n", (NULL == sbogus) ? "NULL" : sbogus);
exit(0);
}
}}}
This commit fixes the problem from the sample code above as well as
the a similar issue for file-set MCA params and override values. It
also resets default values for MCA params initially registered as a
string but then re-registered as an int.
This commit was SVN r25392.
Use hwloc to obtain the cpuset for each process during mpi_init, and share that info in the modex. As it arrives, use a new opal_hwloc_base utility function to parse the value against the local proc's cpuset and determine where they overlap. Cache the value in the pmap object as it may be referenced multiple times.
Thus, the return value from orte_ess.proc_get_locality is a 16-bit bitmask that describes the resources being shared with you. This bitmask can be tested using the macros in opal/mca/paffinity/paffinity.h
Locality is available for all procs, whether launched via mpirun or directly with an external launcher such as slurm or aprun.
This commit was SVN r25331.
calling main():
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> which ompi_info
~/openmpi-1.5.4/COMPILE-intel-12.1.0/usr/bin/ompi_info
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> ompi_info
Segmentation fault
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> gdb usr/bin/ompi_info
...
(gdb) run
Starting program:
...
Program received signal SIGSEGV, Segmentation fault.
opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at
../../../../../opal/mca/memory/linux/malloc.c:4080
4080 /* remove from unsorted list */
(gdb) where
#0 opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at
../../../../../opal/mca/memory/linux/malloc.c:4080
#1 0x00007ffff7c232b9 in opal_memory_linux_malloc_hook
(sz=140737354040280, caller=0x1006) at
../../../../../opal/mca/memory/linux/hooks.c:687
#2 0x0000003dd96a6871 in __alloc_dir () from /lib64/libc.so.6
#3 0x0000003ddfa053cd in ?? () from /usr/lib64/libnuma.so.1
#4 0x0000003dd8e0e445 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
A lot of combinations and trials have been done, yet to no avail.
Intel v11.0 worked...
Thanks to Hubert Haberstock (Intel) providing the hint in:
http://software.intel.com/en-us/forums/showthread.php?t=87132
This was tested on openmpi-1.5.4 and therefore should
cmr:v1.5
This commit was SVN r25290.