Commit graph

1192 commits

Author SHA1 Message Date
Jeff Squyres
1d7fef001c Record the upstream hwloc commit that we've committed here in the OMPI
tree

This commit was SVN r26422.
2012-05-10 12:15:23 +00:00
Jeff Squyres
9c9d7e77df Commit a fix for hwloc -- still checking with upstream to see if this
will be the final solution.  But I'm committing it now so that
Oracle's Solaris Studio builds can resume.

The issue is that the C++ bindings are now (eventually) including
<hwloc.h>.  We use !__hwloc_inline__ and #define it to an appropriate
value at compile-time.  The issue is that when we're compiling C++
code, we should just set !__hwloc_inline__ to "inline", because that's
a keyword in the C++ language (as opposed to !__inline__, or
somesuch).

This commit was SVN r26418.
2012-05-09 21:03:45 +00:00
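The keyword selection described in commit 9c9d7e77df can be sketched roughly as follows. This is an illustration only, not the hwloc source: `DEMO_INLINE` and `demo_add` are hypothetical names, and the C branch assumes a GCC-compatible compiler that accepts `__inline__` (in the real tree the C value is chosen at compile time, as the message says).

```c
/* Hypothetical stand-in for the inline macro discussed in the commit. */
#ifdef __cplusplus
/* "inline" is a keyword in C++, so it can be used directly. */
#  define DEMO_INLINE inline
#else
/* Assumption: a GCC-compatible C compiler. A build-time check would
 * normally pick the appropriate spelling here. */
#  define DEMO_INLINE __inline__
#endif

/* A header-style helper that uses the macro. */
static DEMO_INLINE int demo_add(int a, int b)
{
    return a + b;
}
```

Because the macro expands to a spelling that each language accepts, the same header can be included from both C and C++ translation units.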
Jeff Squyres
de4bbacd13 It turns out that we can't always include the hwloc OpenFabrics verbs
helper file, even if we find that the system has <infiniband/verbs.h>.
The reason is because there are some inline functions in that verbs
helper file that invoke ibv_* functions.  Some linkers (e.g., Solaris
Studio Compilers) will instantiate those static inline functions --
even if we don't use them -- and therefore we need to be able to
resolve the ibv_* symbols at link time.

But since -libverbs is only specified in places where we use other
ibv_* functions (e.g., the OpenFabrics-based BTLs), that means that
linking random executables can/will fail (e.g., orterun).

So instead, introduce a new #define: OPAL_HWLOC_WANT_VERBS_HELPER.  If
this macro is set to 1 before including opal/mca/hwloc/hwloc.h, then
you'll also get the hwloc OpenFabrics verbs helper header file (*if*
hwloc found <infiniband/verbs.h> -- otherwise, it'll #error).

This commit was SVN r26417.
2012-05-09 20:18:31 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
 * Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Jeff Squyres
aba398ce09 Per RFC
(http://www.open-mpi.org/community/lists/devel/2012/04/10905.php), set
opal_cache_line_size via hwloc data, if we have it.
opal_cache_line_size will be set to an hwloc-inspired value by the end
of orte_init(), but will always have a safe value to use (i.e., a
default value 128) -- even before opal_init() has completed.

Default to the same value of 128 that Open MPI has used for several
years if a) we have no hwloc data, or b) we weren't able to find L2
objects in the hwloc data.

This commit was SVN r26322.
2012-04-24 17:31:06 +00:00
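The fallback rule in commit aba398ce09 can be sketched as below. This is a minimal sketch under stated assumptions: `demo_topo_info` and `demo_cache_line_size` are hypothetical names standing in for whatever the real code reads from hwloc; only the default of 128 and the two fallback conditions come from the commit message.

```c
#include <stddef.h>

/* Default named in the commit message. */
#define DEMO_DEFAULT_CACHE_LINE_SIZE 128

/* Hypothetical summary of what a topology query found. */
struct demo_topo_info {
    int have_topology;  /* 0 if no hwloc data is available */
    int l2_line_size;   /* 0 if no L2 object was found */
};

/* Return the cache line size to use, falling back to the default when
 * (a) there is no topology data or (b) no L2 object was found. */
static int demo_cache_line_size(const struct demo_topo_info *info)
{
    if (NULL == info || !info->have_topology) {
        return DEMO_DEFAULT_CACHE_LINE_SIZE;
    }
    if (info->l2_line_size <= 0) {
        return DEMO_DEFAULT_CACHE_LINE_SIZE;
    }
    return info->l2_line_size;
}
```

Callers always get a usable value, which matches the stated goal of having a safe value available at any time.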
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Mike Dubman
ff1c84c53f revert previous commit
This commit was SVN r26206.
2012-03-29 14:07:13 +00:00
Mike Dubman
43a5775e8a performance fix: set alignment for openib internal buffers
This commit was SVN r26205.
2012-03-29 14:00:08 +00:00
Jeff Squyres
028f471a20 Using the right env variable name helps!
This commit was SVN r26204.
2012-03-28 17:59:21 +00:00
Jeff Squyres
8a2df3311d Fixes trac:2812: check for env. markers indicating that we're in a
fakeroot.  If so, exit out of the pre-main hook immediately (without
calling functions such as stat, which will be replaced by fakeroot to
things that are not safe to call in a pre-main environment).

This commit was SVN r26203.

The following Trac tickets were found above:
  Ticket 2812 --> https://svn.open-mpi.org/trac/ompi/ticket/2812
2012-03-28 16:41:29 +00:00
Pavel Shamis
39a55df333 Adding exported libevent global variables to the opal_rename file.
Otherwise the variables cause name conflicts.

This commit was SVN r26201.
2012-03-27 17:26:21 +00:00
Ralph Castain
811413e9bc Correctly handle multiple cpu-set ranges. Correctly support optional binding directives combined with cpu-set.
This commit was SVN r26187.
2012-03-23 14:50:41 +00:00
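As an illustration of handling multiple ranges (commit 811413e9bc), here is a small parser sketch. The commit does not give the option's value syntax, so the comma-separated `lo-hi` form used here is an assumption, as are the name `demo_parse_cpu_set` and the 64-CPU limit; this is not the ORTE implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a list such as "0-3,8,10-11" into a bitmask covering CPUs
 * 0..63.  Returns 0 on success and -1 on malformed input.
 * The syntax is an assumption made for this sketch. */
static int demo_parse_cpu_set(const char *spec, uint64_t *mask)
{
    uint64_t result = 0;
    const char *p = spec;

    if (NULL == spec || NULL == mask || '\0' == *spec) {
        return -1;
    }
    while (1) {
        char *end;
        long lo, hi, c;

        /* Each item must start with a digit. */
        if (*p < '0' || *p > '9') {
            return -1;
        }
        lo = strtol(p, &end, 10);
        hi = lo;
        p = end;

        /* Optional "-hi" part turns a single CPU into a range. */
        if ('-' == *p) {
            p++;
            if (*p < '0' || *p > '9') {
                return -1;
            }
            hi = strtol(p, &end, 10);
            p = end;
        }
        if (lo > hi || hi > 63) {
            return -1;
        }
        for (c = lo; c <= hi; c++) {
            result |= (uint64_t)1 << c;
        }

        /* Items are separated by commas; end of string ends the list. */
        if ('\0' == *p) {
            break;
        }
        if (',' != *p) {
            return -1;
        }
        p++;
    }
    *mask = result;
    return 0;
}
```

The output mask is only written on success, so a caller can keep a previous value when parsing fails.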
Ralph Castain
ce0caf7567 Support -cpu-set by binding to the specified cpus in the absence of any other binding directive. Allows users to subdivide nodes for multiple parallel mpirun invocations.
This commit was SVN r26186.
2012-03-23 14:05:52 +00:00
Ralph Castain
6f6930eb66 Resolve infinite loop when -cpu-set is specified
This commit was SVN r26184.
2012-03-23 07:18:58 +00:00
Jeff Squyres
95148f3310 Don't force the use of libpci support in hwloc in the default case --
just let hwloc decide for itself.

This commit was SVN r26178.
2012-03-22 15:28:35 +00:00
Jeff Squyres
3bf038bb1c Per RFC from long ago:
http://www.open-mpi.org/community/lists/devel/2011/10/9784.php

Bring support for a DMTCP CRS module into the trunk.  See
http://dmtcp.sourceforge.net/ for a description of DMTCP.  Thanks to
the contribution from Alex Brick at Northeastern University, and all
the others up there who helped shepherd this into being ready to
submit.

This commit was SVN r26176.
2012-03-22 12:01:46 +00:00
Jeff Squyres
d30bbc2ef9 Fix an old issue: enable hwloc PCI detection except on SuSE 10 64 bit.
Worked with Oracle to verify that hwloc PCI detection is correctly
disabled on the Suse 10/64 bit platform and is enabled by default on
all other platforms.  The --[en|dis]able-hwloc-pci switch is also
available for manual override of the configure decision about hwloc
PCI support.

This commit was SVN r26175.
2012-03-22 11:30:57 +00:00
Jeff Squyres
ab543fce58 We have no common components in opal any more, so we can remove this directory.
This commit was SVN r26169.
2012-03-20 21:21:49 +00:00
Jeff Squyres
0322db7cde Bring over r4402 from hwloc trunk.
This commit was SVN r26165.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4402
2012-03-19 16:39:54 +00:00
Jeff Squyres
aeca190744 Refs trac:3046: feedback from Brian -- don't set DYLD_LIBRARY_PATH.
This commit was SVN r26108.

The following Trac tickets were found above:
  Ticket 3046 --> https://svn.open-mpi.org/trac/ompi/ticket/3046
2012-03-07 13:12:22 +00:00
Ralph Castain
366f9d1518 Add some missing localities to the hwloc pretty-print, fix pmi modex
This commit was SVN r26105.
2012-03-06 06:21:10 +00:00
Jeff Squyres
f84c16bb65 Fixes trac:3043. Looks like some of the improvements to the hwloc132
hwloc component weren't reverse applied to the external hwloc
component.  Additionally, if we add stuff to LDFLAGS/LIBS, we also may
need to append (DY)LD_LIBRARY_PATH (here in this configure process
only), otherwise future configure tests may fail because they can't
find libhwloc.so (e.g., if you --with-hwloc=/some/path, we need to add
/some/path/lib to (DY)LD_LIBRARY_PATH).

This commit was SVN r26082.

The following Trac tickets were found above:
  Ticket 3043 --> https://svn.open-mpi.org/trac/ompi/ticket/3043
2012-03-02 20:15:07 +00:00
Jeff Squyres
e77653511b Bring in upstream hwloc v1.3 branch SVN commit r4345
This commit was SVN r26048.

The following SVN revision numbers were found above:
  r4345 --> open-mpi/ompi@b6c2a5b602
2012-02-24 13:57:18 +00:00
Jeff Squyres
f8f7f6b3ef Bring over upstream hwloc fix r4340
This commit was SVN r26037.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4340
2012-02-23 20:44:21 +00:00
Jeff Squyres
d0df08c953 Bring in upstream hwloc SVN r4319.
This commit was SVN r25987.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4319
2012-02-21 15:39:21 +00:00
Jeff Squyres
9f7b1d76cd Apply upstream hwloc fix; hwloc SVN r4314
This commit was SVN r25986.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4314
2012-02-21 15:10:40 +00:00
Brian Barrett
2d4bbfb083 Need to make sure that only the winning component sets the include file.
Easiest solution is to set the include in a POST_CONFIG macro based on
whether the configure system says the component was selected or not.

This commit was SVN r25968.
2012-02-20 16:45:54 +00:00
Brian Barrett
628aa0d84d Altix check needs to occur before linux check.
This commit was SVN r25967.
2012-02-20 16:07:41 +00:00
Ralph Castain
534d70025f Cleanup the detection of process binding during mpi_init. There are several cases that need to be checked:
1. no binding support - indicated by a negative return code from get_cpubind

2. binding supported, but not bound - the bitset returned by get_cpubind is the same as the available cpuset

3. binding supported and bound - bitset from get_cpubind is a subset of available cpuset

4. only one cpu is available - in this case, get_cpubind matches the available cpuset, but we are effectively bound

This commit was SVN r25957.
2012-02-17 21:18:53 +00:00
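The four cases listed in commit 534d70025f can be sketched as a small classifier. This is a sketch under stated assumptions, not the code from the commit: plain 64-bit masks stand in for hwloc bitmaps, `rc` stands in for the return code of the get_cpubind call, and every `demo_` name is hypothetical.

```c
#include <stdint.h>

enum demo_bind_state {
    DEMO_BIND_UNSUPPORTED,  /* case 1: no binding support */
    DEMO_BIND_NOT_BOUND,    /* case 2: supported, but not bound */
    DEMO_BIND_BOUND         /* cases 3 and 4: bound, or effectively bound */
};

/* Count the CPUs present in a mask. */
static int demo_count_cpus(uint64_t set)
{
    int n = 0;
    while (set != 0) {
        set &= set - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Classify a process given the query result and the two cpusets. */
static enum demo_bind_state demo_classify_binding(int rc,
                                                  uint64_t bound,
                                                  uint64_t available)
{
    if (rc < 0) {
        return DEMO_BIND_UNSUPPORTED;        /* case 1 */
    }
    if (demo_count_cpus(available) == 1) {
        return DEMO_BIND_BOUND;              /* case 4 */
    }
    if (bound == available) {
        return DEMO_BIND_NOT_BOUND;          /* case 2 */
    }
    return DEMO_BIND_BOUND;                  /* case 3: expected subset */
}
```

The single-CPU check comes before the equality check because, in that case, the two sets match even though the process cannot move.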
Jeff Squyres
72e44cfefe Fixes trac:2951: make .../hwloc/include/autogen/config.h not be included
in the tarball.  Thanks to Paul Hargrove for the fix.

This commit was SVN r25952.

The following Trac tickets were found above:
  Ticket 2951 --> https://svn.open-mpi.org/trac/ompi/ticket/2951
2012-02-17 14:27:27 +00:00
Jeff Squyres
eb47a97025 After a bunch more back-n-forth with Paul Hargrove, hopefully this
visibility stuff will now be fixed!

This commit was SVN r25944.
2012-02-17 00:09:32 +00:00
Jeff Squyres
a055c5662c This is already ompi_ignore'd -- let's remove it.
This commit was SVN r25943.
2012-02-16 22:58:58 +00:00
Ralph Castain
7fd3ee6662 Ensure we see the correct config.h, and silence the warnings caused by duplicate defines
This commit was SVN r25938.
2012-02-16 02:50:57 +00:00
Jeff Squyres
14457accd7 Add hwloc 1.3.2 and ompi_ignore hwloc 1.3.1 (with the intent of
removing 1.3.1 in the near future).

This commit was SVN r25927.
2012-02-14 21:01:36 +00:00
Jeff Squyres
63a96e92b5 In a recent v1.5 branch issue, it took a while to figure out that
paffinity hwloc was returning "NOT_SUPPORTED" when the real problem
was that the underlying hwloc simply hadn't been initialized yet.  So
let's clearly delineate this case: return OPAL_ERR_NOT_INITIALIZED if
the underlying hwloc is not initialized.

This commit was SVN r25902.
2012-02-10 18:29:52 +00:00
Jeff Squyres
8d0bc199df hwloc131_module.c isn't necessary -- there's no module.
This commit was SVN r25901.
2012-02-10 18:09:19 +00:00
Jeff Squyres
6557d74e01 Make sure we get the entire hwloc tree, including IO devices.
This commit was SVN r25887.
2012-02-09 16:59:38 +00:00
Jeff Squyres
6dde3b6d86 Remove the old hwloc component; we bumped up to 1.3.1 a long time ago.
This commit was SVN r25885.
2012-02-09 12:27:00 +00:00
Ralph Castain
a3ab70c53f Correctly parse socket:core syntax in rankfile
This commit was SVN r25848.
2012-02-01 01:50:05 +00:00
Jeff Squyres
ba1b02dea0 Don't install this extra libevent file.
This commit was SVN r25808.
2012-01-27 20:38:00 +00:00
Jeff Squyres
9e9b06d9f7 Fixes trac:2844: ensure to take the value of --with(out)-memory-manager
into account when configuring the components of the framework.  If
--without-memory-manager was given, then we really don't want any
memory managers to be used.

This commit was SVN r25807.

The following Trac tickets were found above:
  Ticket 2844 --> https://svn.open-mpi.org/trac/ompi/ticket/2844
2012-01-27 18:05:48 +00:00
Ralph Castain
3f31feee6f Handle the case where a user's rankfile specifies only cpus, and not socket:cpu pairs.
This commit was SVN r25803.
2012-01-27 12:21:45 +00:00
Shiqing Fan
debe91aefa Change the syntax to be compatible with C++ compiler, as this has to be compiled as C++ on Windows. Thanks Ralph.
This commit was SVN r25785.
2012-01-26 14:53:45 +00:00
Jeff Squyres
e162945090 This script is generated and should not be in SVN.
This commit was SVN r25778.
2012-01-25 16:38:25 +00:00
Jeff Squyres
40e23e3979 Refs trac:2952: temporarily turn off hwloc PCI support because it causes a
problem on SuSE 10 (which might be related to Oracle's dual-bitness
builds, but we aren't completely sure yet).

So just turn it off for now, and bring this over to v1.5.  Find a
proper fix (that enables pci support properly) for trunk/v1.7 later.

This commit was SVN r25769.

The following Trac tickets were found above:
  Ticket 2952 --> https://svn.open-mpi.org/trac/ompi/ticket/2952
2012-01-24 15:07:41 +00:00
Jeff Squyres
6c6d19f5f5 We always want to add HWLOC_EMBEDDED_LIBS to the wrapper flags. It'll
either be empty or have meaningful stuff in it.

This commit was SVN r25761.
2012-01-21 02:57:17 +00:00
Jeff Squyres
6cad1f34e0 Bring r4182 from the hwloc v1.3 branch: fix static linking issues with
libhwloc_embedded.la.

This commit was SVN r25760.

The following SVN revision numbers were found above:
  r4182 --> open-mpi/ompi@b240395d9a
2012-01-21 02:56:42 +00:00
Jeff Squyres
878a0365be Bring over r4094 from the hwloc v1.3 branch: add missing HWLOC_PCI_LIBS
This commit was SVN r25759.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4094
2012-01-21 02:17:07 +00:00
Jeff Squyres
1d15d39fb8 Remove the libnuma component; the hwloc maffinity component does everything that this component used to do. Good riddance!
This commit was SVN r25749.
2012-01-19 23:53:05 +00:00
Jeff Squyres
45636b0558 Make hwloc 1.3.1 the default. Will likely remove 1.2.2ompi shortly.
This commit was SVN r25748.
2012-01-19 23:18:40 +00:00
Jeff Squyres
1a73ba6ce8 Note the upstream patches that we have in addition to stock hwloc 1.3.1.
This commit was SVN r25708.
2012-01-11 00:22:34 +00:00
Jeff Squyres
4243cb7af0 Bring over hwloc r4102 and r4104 for some upstream patches.
This commit was SVN r25707.

The following SVN revision numbers were found above:
  r4102 --> open-mpi/ompi@8961ca568d

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4104
2012-01-11 00:21:47 +00:00
Jeff Squyres
50e5b0937c Add hwloc 1.3.1, but it is not yet the default -- it is currently
.ompi_ignored to allow other developers to test with it.  It is
expected that we'll remove the .ompi_ignore here soon, and
simultaneously remove the hwloc 1.2.2ompi component.

There was one very minor patch added to stock hwloc 1.3.1 in
hwloc/config/hwloc.m4:

{{{
--- hwloc-1.3.1/config/hwloc.m4	2011-12-14 
+++ ompi3/opal/mca/hwloc/hwloc131/hwloc/config/hwloc.m4
@@ -583,6 +583,7 @@
         ])
     fi
     AC_SUBST(HWLOC_PCI_LIBS)
+    HWLOC_LIBS="$HWLOC_LIBS $HWLOC_PCI_LIBS"
     # If we asked for pci support but couldn't deliver, fail
     AS_IF([test "$enable_pci" = "yes" -a "$hwloc_pci_happy" = "no"],
           [AC_MSG_WARN([Specified --enable-pci switch, but could
	   not])
}}}

This will be pushed upstream to hwloc.

This commit was SVN r25706.
2012-01-10 23:38:14 +00:00
Samuel Gutierrez
0ca6603fa0 remove some unused cruft in shmem. minor common sm cleanup.
This commit was SVN r25665.
2011-12-16 22:43:55 +00:00
Jeff Squyres
1d3dc0af28 Gah! opal_shmem_base_register_params() ''wasn't'' added for the mmap
on NFS warning -- it was already there!  So put it back so that it can
register base_verbose and RUNTIME_QUERY_hint.

This commit was SVN r25663.
2011-12-15 21:14:34 +00:00
Jeff Squyres
9cef715194 Updates to r25652 -- put this MCA param in the shmem/mmap component.
No need for it to be in the base (we mistakenly thought it was used in
multiple shmem components).

This commit was SVN r25662.

The following SVN revision numbers were found above:
  r25652 --> open-mpi/ompi@7e223b5799
2011-12-15 20:41:14 +00:00
Ralph Castain
7e223b5799 Okay, okay...stop the whining! Put the mca param registration in the shmem base.
This commit was SVN r25652.
2011-12-14 22:25:32 +00:00
Ralph Castain
4303958968 Allow users to silence warning
This commit was SVN r25650.
2011-12-14 21:50:34 +00:00
Shiqing Fan
a58e4ae809 Add a missing .windows file into the tarball.
This commit was SVN r25638.
2011-12-14 13:29:26 +00:00
Jeff Squyres
efd8106d0a Fix typo in help message
This commit was SVN r25628.
2011-12-13 21:55:48 +00:00
Samuel Gutierrez
8b9bb66b1c fixes shared memory in windows.
This commit was SVN r25623.
2011-12-12 22:16:31 +00:00
Nathan Hjelm
239e9c8740 clean up tabs
This commit was SVN r25622.
2011-12-12 20:54:14 +00:00
Nathan Hjelm
885d5cbcf8 enable ptmalloc with using uGNI
This commit was SVN r25621.
2011-12-12 20:52:51 +00:00
Samuel Gutierrez
de8d3a4f79 more windows shmem updates. maybe this is closer..?
This commit was SVN r25607.
2011-12-09 06:34:06 +00:00
Samuel Gutierrez
989d75bfec some more windows update... this may really break things :-(.
This commit was SVN r25592.
2011-12-08 00:10:09 +00:00
Samuel Gutierrez
dcb965e60e Some shmem Windows updates. I'm not sure if this makes a difference. Shiqing - can you please test?.
This commit was SVN r25591.
2011-12-07 21:56:27 +00:00
Josh Hursey
076db435cd * Fixes trac:2807 : Improve the BLCR configure option so that it checks if the {{{--with-blcr}}} option is specified, but not {{{--with-ft=cr}}}, then configure returns an error (since this is clearly not what the user intended).
This commit was SVN r25590.

The following Trac tickets were found above:
  Ticket 2807 --> https://svn.open-mpi.org/trac/ompi/ticket/2807
2011-12-07 21:36:03 +00:00
Josh Hursey
58938b2f50 * Clarified show help when CRS component cannot be loaded.
* Fixes trac:2329 : Improves the error message, and ensures opal-restart will not segv in opal_finalize.

This commit was SVN r25586.

The following Trac tickets were found above:
  Ticket 2329 --> https://svn.open-mpi.org/trac/ompi/ticket/2329
2011-12-07 14:58:08 +00:00
Ralph Castain
6fefe236a4 Warn users if they set opal_paffinity_alone, either to true or false, that this parameter is no longer functional - they must use the --bind-to option and its corresponding mca param.
This commit was SVN r25567.
2011-12-03 01:10:52 +00:00
Jeff Squyres
93a797cc33 r25511 didn't fully fix the OPAL_CHECK_VISIBILITY issue; we also need
to -I the opal/config directory so that autoregen can find our .m4
files. 

This commit was SVN r25564.

The following SVN revision numbers were found above:
  r25511 --> open-mpi/ompi@5751c45916
2011-12-02 19:45:39 +00:00
Jeff Squyres
ecf6ba910c Silence a few icc warnings about mixing enums with other types.
This commit was SVN r25560.
2011-12-02 13:18:54 +00:00
Jeff Squyres
6fbbfd0f7a Gah! r25545 accidentally included ''waaaay'' more stuff than it was
supposed to.  I.e., half-baked/not complete stuff.

This commit backs out all of r25545.  Sorry folks!

This commit was SVN r25546.

The following SVN revision numbers were found above:
  r25545 --> open-mpi/ompi@7f9ae11faf
2011-11-29 23:24:52 +00:00
Jeff Squyres
7f9ae11faf Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php,
to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS
X, we need to use the following compiler (linker) flag:

    -Wl,-commons,use_dylibs 

So if we're compiling on OS X, test to see if that flag works with the
compiler.  If so, add it to the wrapper FFLAGS and FCFLAGS (note that
per a future update, we'll only have one Fortran compiler anyway).

Fixes trac:1982.  

This commit was SVN r25545.

The following Trac tickets were found above:
  Ticket 1982 --> https://svn.open-mpi.org/trac/ompi/ticket/1982
2011-11-29 23:05:54 +00:00
Samuel Gutierrez
375162c693 this commit fixes a few things. 1. silence warning in common sm. 2. remove unneeded config code in common sm. 3. move opal_shmem_base_close to a better place in opal_finalize. 4. fix opal_path_nfs output.
This commit was SVN r25518.
2011-11-28 23:41:19 +00:00
Samuel Gutierrez
b4edf0ff5c getting ready for 1.5 port of the shared memory enhancements. remove some unused/unneeded stuff and minor style update.
This commit was SVN r25513.
2011-11-28 16:08:32 +00:00
George Bosilca
5751c45916 Actually ... OMPI is lacking visibility, as it was all moved down in OPAL.
This commit was SVN r25511.
2011-11-28 04:27:06 +00:00
Terry Dontje
1f53b32216 This commit fixes trac:2917 by using the cleaned-up version of check_visibility that is in the hwloc trunk repo.
This commit was SVN r25495.

The following Trac tickets were found above:
  Ticket 2917 --> https://svn.open-mpi.org/trac/ompi/ticket/2917
2011-11-22 00:01:09 +00:00
George Bosilca
61f273b987 Do not tolerate uninitialized variables.
This commit was SVN r25489.
2011-11-18 10:19:24 +00:00
Samuel Gutierrez
15249dfc01 rename some variables following ompi's naming convention.
This commit was SVN r25487.
2011-11-17 19:22:58 +00:00
Samuel Gutierrez
47499d1d3d fixes CID 2256.
This commit was SVN r25486.
2011-11-17 18:06:28 +00:00
Samuel Gutierrez
c1012f502f added backing file relocation capability in shmem mmap. two new mca
parameters control its behavior (shmem_mmap_relocate_backing_file and
shmem_mmap_backing_file_base_dir).

This commit was SVN r25480.
2011-11-16 01:38:23 +00:00
Ralph Castain
6310361532 At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here:
https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement

The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation.

In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions:

1. I have at long last bowed my head in submission to the system admins of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior.

2. both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation.

3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so.

As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes.

This commit was SVN r25476.
2011-11-15 03:40:11 +00:00
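The oversubscription default described in item 1 of commit 6310361532 can be sketched as a small decision function. The types and names below are hypothetical and only model the rule as stated in the message (hostfile/dash-host allocations allow oversubscription by default, resource-manager allocations do not, and an explicit flag overrides either default); they are not the ORTE mapper code.

```c
#include <stdbool.h>

/* Where the node allocation came from (hypothetical names). */
enum demo_alloc_source {
    DEMO_ALLOC_HOSTFILE,          /* hostfile or dash-host */
    DEMO_ALLOC_RESOURCE_MANAGER   /* RM-allocated nodes */
};

/* Whether the user passed an explicit override flag. */
enum demo_override {
    DEMO_OVERRIDE_NONE,
    DEMO_OVERRIDE_ALLOW,
    DEMO_OVERRIDE_FORBID
};

/* Decide whether more processes than slots may run on a node. */
static bool demo_oversubscribe_allowed(enum demo_alloc_source src,
                                       enum demo_override ovr)
{
    /* An explicit user choice always wins over the default. */
    if (DEMO_OVERRIDE_ALLOW == ovr) {
        return true;
    }
    if (DEMO_OVERRIDE_FORBID == ovr) {
        return false;
    }
    /* Defaults: allowed for hostfile/dash-host, not for RM nodes. */
    return DEMO_ALLOC_HOSTFILE == src;
}
```

Keeping the override separate from the allocation source makes it clear that the change only touches the default, as the message emphasizes.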
Shiqing Fan
fc46ed6438 Use the new libevent on Windows.
This commit was SVN r25462.
2011-11-09 14:05:35 +00:00
George Bosilca
bd55a19db4 Disable vector optimization for ICC v12.1.0 release 2011.6.233.
This commit was SVN r25461.
2011-11-08 21:23:30 +00:00
Jeff Squyres
78538b701d Turns out that we're not even using that $includedir properly, so just
remove it.

This commit was SVN r25458.
2011-11-08 03:18:09 +00:00
Ralph Castain
a931c5b1eb Redo a patch from late last night that replaces libevent 2.0.7 with libevent 2.0.13 as our default event library. Cleanup the libevent renames to correctly state 2013 as our new version.
This commit was SVN r25457.
2011-11-08 01:36:15 +00:00
Ralph Castain
a3ce355a60 Revert r25453 and r25450 until we can fix the libevent2013 configure code - still not getting the includedir to eval correctly.
This commit was SVN r25454.

The following SVN revision numbers were found above:
  r25450 --> open-mpi/ompi@7f7d5c4f1f
  r25453 --> open-mpi/ompi@c9fe8c32e2
2011-11-07 16:23:44 +00:00
Ralph Castain
c9fe8c32e2 Fix a bug in the configure.m4 to ensure we disable unused libevent modes. Edit their Makefile.am to remove bufferevent support since we don't use it either.
Sorry for mid-day correction.

This commit was SVN r25453.
2011-11-07 15:21:29 +00:00
Nathan Hjelm
7f7d5c4f1f RFC: upgrade to libevent 2.0.13 (removing 2.0.7) timeout. Removed libevent 2.0.7
This commit was SVN r25450.
2011-11-07 04:32:36 +00:00
Jeff Squyres
f08b8bf2d4 Per this thread:
http://www.open-mpi.org/community/lists/devel/2011/10/9878.php

I am making a final decision to decide the behavior of what happens
when an MCA parameter is re-registered and changes types.  In
developer builds (i.e., OPAL_ENABLE_DEBUG==1), a show_help message
will be displayed.  In all builds, an error status will be returned.
Specifically, the logic looks like this:

{{{
    if (detect_re-registration_with_type_change) {
#if OPAL_ENABLE_DEBUG
        opal_show_help(...);
#endif
        return OPAL_ERR_VALUE_OUT_OF_BOUNDS;
    }
}}}

If someone would like to change this behavior, they are welcome to do
so.  :-) I am committing this so that ''some'' action occurs (rather
than talking about the issue and then nothing happens).

This commit was SVN r25432.
2011-11-04 14:16:49 +00:00
Jeff Squyres
886a9d589b Custom patch from Brice for the hwloc-1.2.2ompi distro, per an issue
that Chris Yeoh/IBM found.  See the thread below for more info:

  http://www.open-mpi.org/community/lists/hwloc-devel/2011/11/2521.php

This commit was SVN r25429.
2011-11-03 14:53:22 +00:00
Jeff Squyres
4fe26b0392 Fix some minor memory leaks
This commit was SVN r25410.
2011-11-01 20:22:26 +00:00
Ralph Castain
4368199c86 Missing include
This commit was SVN r25402.
2011-10-31 13:39:57 +00:00
Ralph Castain
96332a2859 Fix typo
This commit was SVN r25400.
2011-10-30 13:23:42 +00:00
Ralph Castain
71ed8e3cd3 Bring back the local node's binding capabilities along with its topology. Clean up indentation.
This commit was SVN r25399.
2011-10-30 13:20:16 +00:00
Ralph Castain
7ba4675adf Bring over some useful utilities and definitions for working with hwloc inside ORTE/OMPI. Cache frequently computed info to save processing time when handling multiple nodes with the same topology. Deal with available cpus as defined by online vs allowed vs user-specified limits. Help deal with hwloc's unfortunate decision to lump all caches in the same object type.
This commit was SVN r25393.
2011-10-29 14:58:58 +00:00
Jeff Squyres
6092b50ebb Fix the cases where the default values of MCA params were not always
handled properly when MCA parameters are re-registered and their types
change.  Specifically, this case was broken:

 1. Register an int MCA param with a non-zero default value
 1. Re-register the same MCA param as a string with a NULL default value

The 2nd step would cause a segv because the first int default value
wasn't being reset properly.  Here's sample code that shows the issue:

{{{
{
    int ibogus;
    char *sbogus;
    opal_init(&argc, &argv);
    mca_base_param_reg_int_name("type", "name", "help", false, false, 3, &ibogus);
    printf("Ibogus: %d\n", ibogus);
    mca_base_param_reg_string_name("type", "name", "help", false, false, NULL, &sbogus);
    printf("Sbogus: %s\n", (NULL == sbogus) ? "NULL" : sbogus);
    exit(0);
}
}}}

This commit fixes the problem from the sample code above as well as
a similar issue for file-set MCA params and override values.  It
also resets default values for MCA params initially registered as a
string but then re-registered as an int.

This commit was SVN r25392.
2011-10-29 12:29:31 +00:00
Ralph Castain
21d45b0807 Just some cleanup in case of error
This commit was SVN r25387.
2011-10-29 01:55:19 +00:00
Ralph Castain
a7cbc25658 Minor cleanups - check hwloc returns everywhere. Thanks to Chris Yeoh for pointing this out.
This commit was SVN r25360.
2011-10-24 14:05:26 +00:00
Shiqing Fan
5711414eb7 Fix Windows build
This commit was SVN r25351.
2011-10-21 14:46:58 +00:00
Jeff Squyres
cbafea8f69 Add a DEPENDENCIES line so that if you edit something down in the
hwloc tree, it'll get picked up by the component (and therefore by
libopen-pal).

Thanks to Terry for finding the problem.

This commit was SVN r25349.
2011-10-21 11:39:52 +00:00
Ralph Castain
b44f8d4b28 Complete implementation of the ess.proc_get_locality API. Up to this point, the API was only capable of telling if the specified proc was sharing a node with you. However, the returned value was capable of telling you much more detailed info - e.g., if the proc shares a socket, a cache, or numa node. We just didn't have the data to provide that detail.
Use hwloc to obtain the cpuset for each process during mpi_init, and share that info in the modex. As it arrives, use a new opal_hwloc_base utility function to parse the value against the local proc's cpuset and determine where they overlap. Cache the value in the pmap object as it may be referenced multiple times.

Thus, the return value from orte_ess.proc_get_locality is a 16-bit bitmask that describes the resources being shared with you. This bitmask can be tested using the macros in opal/mca/paffinity/paffinity.h

Locality is available for all procs, whether launched via mpirun or directly with an external launcher such as slurm or aprun.

This commit was SVN r25331.
2011-10-19 20:18:14 +00:00
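The locality bitmask described in commit b44f8d4b28 can be illustrated with the sketch below. The flag values, macro names, and the `demo_location` structure are all hypothetical; the real macros live in opal/mca/paffinity/paffinity.h and are not reproduced here. The sketch also compares simple resource IDs instead of overlapping hwloc cpusets, which is a deliberate simplification.

```c
#include <stdint.h>

/* Hypothetical flag bits for a 16-bit locality mask. */
#define DEMO_LOC_ON_NODE    ((uint16_t)0x0001)
#define DEMO_LOC_ON_NUMA    ((uint16_t)0x0002)
#define DEMO_LOC_ON_SOCKET  ((uint16_t)0x0004)
#define DEMO_LOC_ON_CACHE   ((uint16_t)0x0008)

/* Hypothetical test macros in the spirit of those the commit mentions. */
#define DEMO_PROC_ON_NODE(loc)    (((loc) & DEMO_LOC_ON_NODE) != 0)
#define DEMO_PROC_ON_NUMA(loc)    (((loc) & DEMO_LOC_ON_NUMA) != 0)
#define DEMO_PROC_ON_SOCKET(loc)  (((loc) & DEMO_LOC_ON_SOCKET) != 0)
#define DEMO_PROC_ON_CACHE(loc)   (((loc) & DEMO_LOC_ON_CACHE) != 0)

/* Simplified description of where a process runs. */
struct demo_location {
    int node;
    int numa;
    int socket;
    int cache;
};

/* Build the mask of resources that two processes share. */
static uint16_t demo_locality(const struct demo_location *me,
                              const struct demo_location *peer)
{
    uint16_t loc = 0;

    /* Processes on different nodes share nothing. */
    if (me->node != peer->node) {
        return loc;
    }
    loc = (uint16_t)(loc | DEMO_LOC_ON_NODE);
    if (me->numa == peer->numa) {
        loc = (uint16_t)(loc | DEMO_LOC_ON_NUMA);
    }
    if (me->socket == peer->socket) {
        loc = (uint16_t)(loc | DEMO_LOC_ON_SOCKET);
    }
    if (me->cache == peer->cache) {
        loc = (uint16_t)(loc | DEMO_LOC_ON_CACHE);
    }
    return loc;
}
```

Computing the mask once and caching it, as the commit describes, lets callers test individual bits cheaply each time they need the answer.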
Rainer Keller
ec6ac33b75 - On Linux x86-64 with intel compiler v12.1, any ompi-app fails before
calling main():
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> which ompi_info
~/openmpi-1.5.4/COMPILE-intel-12.1.0/usr/bin/ompi_info
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> ompi_info
Segmentation fault
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> gdb usr/bin/ompi_info
...
(gdb) run
Starting program:
...
Program received signal SIGSEGV, Segmentation fault.
opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at
../../../../../opal/mca/memory/linux/malloc.c:4080
4080          /* remove from unsorted list */
(gdb) where
#0  opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at
../../../../../opal/mca/memory/linux/malloc.c:4080
#1  0x00007ffff7c232b9 in opal_memory_linux_malloc_hook
(sz=140737354040280, caller=0x1006) at
../../../../../opal/mca/memory/linux/hooks.c:687
#2  0x0000003dd96a6871 in __alloc_dir () from /lib64/libc.so.6
#3  0x0000003ddfa053cd in ?? () from /usr/lib64/libnuma.so.1
#4  0x0000003dd8e0e445 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    A lot of combinations and trials have been done, yet to no avail.
    Intel v11.0 worked...

    Thanks to Hubert Haberstock (Intel) providing the hint in:
    http://software.intel.com/en-us/forums/showthread.php?t=87132

    This was tested on openmpi-1.5.4 and therefore should
    cmr:v1.5

This commit was SVN r25290.
2011-10-14 20:47:08 +00:00
Ralph Castain
69a0882207 Correctly setup hwloc when passing a topology from an external source
This commit was SVN r25277.
2011-10-12 21:34:46 +00:00
Jeff Squyres
ff97b57c90 Change the names to be slightly more descriptive.
This commit was SVN r25271.
2011-10-12 16:07:09 +00:00
Jeff Squyres
951c745590 We always have hwloc xml support (now that it's built into hwloc
without needing libxml2).  So OPAL_HAVE_HWLOC_XML is no longer
necessary.  

This commit was SVN r25263.
2011-10-11 20:20:59 +00:00
Jeff Squyres
e4f8b662a1 Remove this component; it was wholly superseded by hwloc122ompi a
little while ago.

This commit was SVN r25261.
2011-10-11 20:13:33 +00:00
Jeff Squyres
7fd3a7f696 Remove no-longer-true comment. Process-wide memory affinity policy is
set by calling opal_hwloc_base_set_process_membind_policy().

This commit was SVN r25260.
2011-10-11 20:00:45 +00:00
Ralph Castain
8c4512a994 Fix the verbose output for caches (again) so they are properly labeled, pending adoption of the upstream patch we supplied.
This commit was SVN r25251.
2011-10-11 05:54:26 +00:00
Swen Boehm
08b4322a1a patched the lex files to not issue the following compiler warning:
'yyunput' defined but not used

This commit was SVN r25246.
2011-10-10 18:13:04 +00:00
George Bosilca
649af6c925 Enumerated mixed with another type (int) is tolerated but
easily fixable.

This commit was SVN r25241.
2011-10-09 03:54:52 +00:00
George Bosilca
9d68d7c0c8 Fix a bunch of warnings.
This commit was SVN r25227.
2011-10-03 18:46:49 +00:00
George Bosilca
b4c076ad28 Remove an unused function.
This commit was SVN r25226.
2011-10-03 18:46:27 +00:00
Jeff Squyres
34deb0db97 Sync with final hwloc 1.2.2 release
This commit was SVN r25221.
2011-10-03 14:12:38 +00:00
Jeff Squyres
6a32aa4a04 Oops -- it looks like we ''do'' still use this variable in the
trunk... 

This commit was SVN r25203.
2011-09-28 12:12:37 +00:00
Jeff Squyres
bc3e213a69 After fixing an svn/hg kerfuffle, there are a few files left over from
last night's hwloc/paffinity/maffinity minor update.  Nothing huge;
just a little cleanup.

This commit was SVN r25202.
2011-09-28 11:46:28 +00:00
Jeff Squyres
9fa2130cfb Fix typo that prevents VPATH builds.
This commit was SVN r25201.
2011-09-28 11:29:12 +00:00
Jeff Squyres
970a75a7b6 Update to a custom OMPI roll of hwloc v1.2.2. Upgrade the configury to
match similar stuff in the event framework; only add CPPFLAGS /
LDFLAGS / LIBS / and WRAPPER_EXTRA_* of the same for the one, single,
winning component (because this framework is compile-time,
one-of-many).

This commit was SVN r25199.
2011-09-27 23:54:09 +00:00
Jeff Squyres
3d61d0f357 Fix up some long-latency bugs in the MCA event framework configury that
only became evident when there was more than one event component.

The libevent2013 component is still ompi_ignore'd for most developers.

This commit was SVN r25198.
2011-09-27 23:18:07 +00:00
Jeff Squyres
d4603f080d Refs trac:2854.
Since hwloc has a dynamic bitmap size, it could actually have bits set
that will not fit in the paffinity mask.  We already made sure that we
didn't overrun the paffinity mask; now also set the return value to
OPAL_ERR_VALUE_OUT_OF_BOUNDS (wow, we really thought of everything
with those error codes, eh?) if the hwloc bitmap has bits set higher
than what will fit into the paffinity bitmask.
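
The overflow check described above can be sketched as follows. This is a self-contained illustration, not the component's code: the fixed mask width, the `ex_` names, and the error value are assumptions standing in for the paffinity mask and OPAL_ERR_VALUE_OUT_OF_BOUNDS.

```c
#include <stddef.h>

#define EX_MASK_BITS 64                   /* hypothetical fixed mask width */
#define EX_SUCCESS 0
#define EX_ERR_VALUE_OUT_OF_BOUNDS (-1)   /* stand-in for the OPAL error code */

/* Copy a list of set-bit indices (as a dynamically sized bitmap might
 * report them) into a fixed-width mask. Bits that fit are always copied;
 * if any bit does not fit, the mask is still filled without overrunning
 * it, but the return value signals that information was lost. */
static int ex_copy_bits(const unsigned *set_bits, size_t n,
                        unsigned long long *mask)
{
    int rc = EX_SUCCESS;
    size_t i;

    *mask = 0;
    for (i = 0; i < n; ++i) {
        if (set_bits[i] < EX_MASK_BITS) {
            *mask |= 1ULL << set_bits[i];
        } else {
            rc = EX_ERR_VALUE_OUT_OF_BOUNDS;
        }
    }
    return rc;
}
```

The point of the design is that the caller gets both a usable (truncated) mask and an explicit signal that the source bitmap held bits beyond the mask's range.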

This commit was SVN r25179.

The following Trac tickets were found above:
  Ticket 2854 --> https://svn.open-mpi.org/trac/ompi/ticket/2854
2011-09-24 13:52:27 +00:00
Jeff Squyres
57323570e3 These calls were mistakenly added in r25164.
This commit was SVN r25176.

The following SVN revision numbers were found above:
  r25164 --> open-mpi/ompi@51129cc2a8
2011-09-22 18:20:30 +00:00
Jeff Squyres
82c93611e6 Fix some problems with the libevent and hwloc frameworks:
* change components from setting <framework>_base_include to
   opal_<framework>_<component>_include; the framework m4 will figure
   out the winning component and pick the right "include" shell
   variable.  Ditto for the other shell variables (cppflags, ldflags,
   etc.). 
 * misc fixes to hwloc/external
 * add a bunch of missing "opal_" prefixes to shell variables
 * add a few more / update a few comments in framework m4's

This commit was SVN r25174.
2011-09-21 23:06:13 +00:00
Ralph Castain
a818101d19 Silence compiler warning
This commit was SVN r25172.
2011-09-21 16:39:59 +00:00
Shiqing Fan
3e0ee394ef Select only one libevent component.
This commit was SVN r25169.
2011-09-20 16:10:01 +00:00
Shiqing Fan
4caed984ed Need to exclude another file for windows build.
This commit was SVN r25168.
2011-09-20 16:09:03 +00:00
Ralph Castain
51129cc2a8 If built without hwloc xml support, we cannot currently pass the local topology from the daemon to an MPI app. This makes it impossible to set affinity, for example. In this case, have the app get its own copy of the topology at startup.
For safety's sake, protect hwloc-based affinity modules from NULL topology

This commit was SVN r25164.
2011-09-20 14:46:55 +00:00
Ralph Castain
052ccd4b1e Set ignores
This commit was SVN r25163.
2011-09-20 13:47:05 +00:00
Ralph Castain
45396d8f9c Add missing files
This commit was SVN r25162.
2011-09-20 13:37:22 +00:00
Nathan Hjelm
7cd8f21b7f add libevent 2.0.13 module
This commit was SVN r25161.
2011-09-20 00:13:05 +00:00
Jeff Squyres
9db4542c2b Move maffinity_base_alloc_policy and
maffinity_base_bind_failure_action MCA params to the hwloc base
(hwloc_base_alloc_policy and hwloc_base_bind_failure_action).  Since
these MCA parameters were never on a release branch, I'm just
moving/renaming them outright and not leaving aliases to the old
names.

Note that some upper layer needs to call
opal_hwloc_base_set_process_membind_policy() to set the
set-by-MCA-param process-wide memory affinity policy.  We can't do
this automatically during hwloc_base_open() because, for reasons
described elsewhere, opal_hwloc_topology is not automatically filled
during hwloc_base_open() (in short: potential scalability issues when
launching many MPI processes simultaneously on a single machine, for
example).

This commit was SVN r25156.
2011-09-19 16:10:37 +00:00
Jeff Squyres
dc70100cee In reviewing CMR #2866, it was noticed that the maffinity/hwloc and
paffinity/hwloc components were still calling hwloc_topology_init/load
themselves, and not using the opal_hwloc_topology.  Doh!

This commit fixes that -- these 2 components no longer have their own
copy of the topology tree; they just use opal_hwloc_topology.

This commit was SVN r25151.
2011-09-17 13:13:36 +00:00
Jeff Squyres
ecd603256a * Rename opal_hwloc_components to opal_hwloc_base_components
* Fix some comments

This commit was SVN r25150.
2011-09-17 11:54:36 +00:00
Jeff Squyres
4a2cf81c6f Fixes to ensure that dependent libraries are carried forward from the embedded hwloc
This commit was SVN r25140.
2011-09-13 22:43:39 +00:00
Jeff Squyres
d6682523f6 Put in proper basename so that "make dist" can find it.
This commit was SVN r25135.
2011-09-13 11:09:56 +00:00
Jeff Squyres
4771c36061 * With some m4 trickery, if no form of --with-hwloc is specified on
the command line, hwloc is just like any other external dependency
   in OMPI: if we find it, we'll use it. If we don't find it, we'll
   ignore it.  See comments in opal/mca/hwloc/configure.m4 for an
   explanation. 
 * Fix some copy-n-paste errors in opal/mca/hwloc/configure.m4
   w.r.t. flags coming in from the winning component.
 * Add another line in ompi_info's output about whether hwloc support
   is included or not.

This commit was SVN r25134.
2011-09-13 00:39:14 +00:00
Jeff Squyres
7dc352a328 Add some notes about maintaining the hwloc framework.
This commit was SVN r25132.
2011-09-12 19:40:18 +00:00
Jeff Squyres
c5bfa09574 Fixes from Brice Goglin, post hwloc v1.2.1 for AMD Magny-Cours. See
http://www.open-mpi.org/community/lists/users/2011/09/17164.php.

This commit was SVN r25131.
2011-09-12 19:03:48 +00:00
Shiqing Fan
b61eed801f Fix the problem of building hwloc on Windows. Temporarily not using it for Windows.
This commit was SVN r25128.
2011-09-12 13:55:34 +00:00
Ralph Castain
6460fe5480 Silence warning
This commit was SVN r25127.
2011-09-12 13:32:21 +00:00
Ralph Castain
92c7372e20 Per the RFC from Jeff, move hwloc from opal/mca/common to its own static framework ala libevent. Have ORTE daemons collect the topology info at startup and, if --enable-hwloc-xml is set, send that info back to the HNP for later use. The HNP only retains unique topology "templates" to reduce memory footprint. Have the daemon include the local topology info in the nidmap buffer sent to each app so the apps don't all hammer the local system to discover it for themselves.
Remove the sysinfo framework as hwloc replaces that functionality.

This commit was SVN r25124.
2011-09-11 19:02:24 +00:00
Ralph Castain
b2971df7df Ensure we loop over all cpu's.
Thanks to Nadia Derbey for spotting the error.

This commit was SVN r25102.
2011-08-29 14:24:34 +00:00
Eugene Loh
55a7b474dd Change a stray __volatile to __volatile__.
This commit was SVN r25092.
2011-08-26 15:36:10 +00:00
Jeff Squyres
495ceef60d Upgrade hwloc to v1.2.1.
This commit was SVN r25088.
2011-08-26 13:14:26 +00:00
Ralph Castain
df28c63164 If we are on a single processor, then we are effectively bound - so have the macro correctly report it.
Thanks to Pascal Deveze for the patch.

This commit was SVN r25068.
2011-08-22 16:28:40 +00:00
Shiqing Fan
3af7c9f7bb Complete the MinGW build support on Windows.
This commit was SVN r25048.
2011-08-15 09:47:23 +00:00
Jeff Squyres
1cbfb53801 r24976 wasn't quite right -- you now actually get a warning if you
specify btl_tcp_if_include because btl_tcp_if_exclude is defaulted to
the loopback devices.

This commit does a few things:

 * Introduce a new OPAL MCA base function:
   mca_base_param_check_exclusive_string().  It checks to see that the
   ''user'' does not set two MCA parameters that are mutually
   exclusive by checking the source of those MCA param values.
 * Use the above function in many BTLs (and the OOB TCP) to ensure
   that <foo>_if_include and <foo>_if_exclude are not both specified
   ''by the user''.
 * Re-arrange many of these BTLs to move their MCA registration code
   into a separate component_register() function (vs. the
   component_open() function).
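
The idea behind the exclusivity check can be sketched like this. The types and function names below are hypothetical and do not reproduce the signature of mca_base_param_check_exclusive_string(); the sketch only shows why looking at the source of a value (default versus user-supplied) avoids the spurious warning described above.

```c
#include <stddef.h>

/* Hypothetical record of where a parameter value came from. */
typedef enum {
    EX_SOURCE_DEFAULT,   /* built-in default, not chosen by the user */
    EX_SOURCE_ENV,       /* set via the environment or command line */
    EX_SOURCE_FILE       /* set via a parameter file */
} ex_source_t;

typedef struct {
    const char *name;
    const char *value;
    ex_source_t source;
} ex_param_t;

/* A value counts as user-set only if it did not come from the default. */
static int ex_set_by_user(const ex_param_t *p)
{
    return p->value != NULL && p->source != EX_SOURCE_DEFAULT;
}

/* Return 0 if the pair is acceptable, -1 if the user set both. */
static int ex_check_exclusive(const ex_param_t *a, const ex_param_t *b)
{
    return (ex_set_by_user(a) && ex_set_by_user(b)) ? -1 : 0;
}
```

With this approach, a user-supplied include list combined with a defaulted exclude list (such as the loopback devices) passes the check, while setting both explicitly is rejected.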

This code has been nominally reviewed and checked by Ralph, George,
Terry, and Shiqing.

This commit was SVN r25043.

The following SVN revision numbers were found above:
  r24976 --> open-mpi/ompi@8f4ac54336
2011-08-10 17:24:36 +00:00
Samuel Gutierrez
bb791eaa23 change opal_output_verbose level to be consistent with shmem base.
This commit was SVN r25036.
2011-08-09 21:34:12 +00:00
Samuel Gutierrez
b144c8c343 silence warning in shmem posix run-time test when err is not equal to EEXIST.
This commit was SVN r25034.
2011-08-09 21:13:28 +00:00
Jeff Squyres
ba432393d4 Remove some really old (internal) cruft that never ended up getting
used. 

This commit was SVN r24988.
2011-08-04 15:24:37 +00:00
Samuel Gutierrez
adde221413 use memcpy in ds_copy.
This commit was SVN r24942.
2011-07-25 17:16:29 +00:00
Ralph Castain
8a7f9f8997 Hide libevent symbols when internal thread support enabled
This commit was SVN r24922.
2011-07-22 19:49:47 +00:00
Ralph Castain
3f0d13efe2 Fix libevent internal thread support
This commit was SVN r24920.
2011-07-22 19:18:49 +00:00
Shiqing Fan
edaa7b96e4 This should not be commented out.
This commit was SVN r24914.
2011-07-21 12:56:18 +00:00
Shiqing Fan
665d1284be Fix a bug that memcpy'ed the wrong temp string.
This commit was SVN r24912.
2011-07-21 12:53:03 +00:00
Ralph Castain
6201581544 Fix the symbol visibility issue for libevent by renaming all visible libevent symbols
This commit was SVN r24902.
2011-07-14 07:10:52 +00:00
Ralph Castain
5e99d45ae4 Remove unused variable
This commit was SVN r24887.
2011-07-13 03:42:20 +00:00
Nadia Derbey
0d0cead33a Fix a hang in carto_base_select() if carto_module_init() fails
This commit was SVN r24876.
2011-07-12 05:47:28 +00:00
Abhishek Kulkarni
7363938ba8 add a missing include.
This commit was SVN r24873.
2011-07-11 00:04:31 +00:00
Jeff Squyres
08a05a1e35 Minor additions to make OMPI trunk compatible with the latest GNU
Autotools:

 * Autoconf 2.68
 * Automake 1.11.1
 * Libtool 2.4
 * m4 1.4.16

This commit was SVN r24867.
2011-07-10 12:11:47 +00:00
Jeff Squyres
e2df4d4a8d Some platforms don't have <execinfo.h>, even if they have backtrace()
function (e.g., NetBSD).  Thanks to Aleksej Saushev for pointing out
the issue. 

This commit was SVN r24866.
2011-07-10 11:14:19 +00:00
Jeff Squyres
b2b781e537 Fix a few miscellaneous memory leaks.
This commit was SVN r24865.
2011-07-08 16:39:58 +00:00
Terry Dontje
86a80411f0 update changes from review comments of #2816
This commit was SVN r24856.
2011-07-05 22:51:39 +00:00
Terry Dontje
8c0af7838a add configure check for Solaris Legacy munmap prototype
This commit was SVN r24839.
2011-06-29 23:45:27 +00:00
Ralph Castain
4dc3ee369f If event threads are enabled, we don't need to wakeup the event lib to pickup new events - so help valgrind to quit whining about it.
This commit was SVN r24837.
2011-06-29 22:52:28 +00:00
Samuel Gutierrez
93110ce805 place a bandage on ds_copy plus minor cleanup. i need to rethink this part of the framework. thanks to Rolf for pointing out the issue.
This commit was SVN r24831.
2011-06-28 19:37:12 +00:00
Ralph Castain
cd6b8417ec Cleanup a set of warnings that appear to be caused by failure of PRIsize_t on Linux.
Set ignore properties

This commit was SVN r24812.
2011-06-23 15:07:58 +00:00
Samuel Gutierrez
61ff422562 fix a few more spots in posix.
This commit was SVN r24808.
2011-06-22 23:17:26 +00:00
Samuel Gutierrez
7fcf806dc9 fix posix builds on solaris. shmem still needs more cleanup on solaris, but at least shmem will stop breaking builds (i hope).
This commit was SVN r24807.
2011-06-22 23:08:58 +00:00
Samuel Gutierrez
5b5ce434fc fix shmem sysv build on solaris.
This commit was SVN r24806.
2011-06-22 18:05:08 +00:00
Samuel Gutierrez
867df203bc fix shmem mmap build on solaris. thanks terry.
This commit was SVN r24805.
2011-06-22 16:05:50 +00:00
Rolf vandeVaart
856a9c43b1 Add string.h. Needed when configuring with --enable-picky
This commit was SVN r24804.
2011-06-22 15:48:32 +00:00
Samuel Gutierrez
81f38b258a commit of new shared memory backing facility framework (shmem) and its components.
This commit was SVN r24795.
2011-06-21 15:41:57 +00:00
Josh Hursey
b223d355fc Add explicit number for opal_crs_state_type_t enum (for debugging). Also add a MAX so we can easily check for out of bounds states during debugging.
This commit was SVN r24766.
2011-06-09 14:27:24 +00:00
Ralph Castain
8f401a0563 Enable the ability to constrain applications to hosts on the basis of resources.
This commit was SVN r24736.
2011-05-28 22:18:19 +00:00
Ralph Castain
b47ec2ee87 Remove lingering references to opal_profile option
This commit was SVN r24709.
2011-05-18 18:27:29 +00:00
Ralph Castain
ddf4914094 Plug fd leak
This commit was SVN r24707.
2011-05-18 13:46:27 +00:00
Ralph Castain
4083e23073 Complete cleanup of pstat linux
This commit was SVN r24701.
2011-05-16 14:08:08 +00:00
Ralph Castain
08c3ecd608 Handle the case where memory stats are in different order, or don't exist on that platform
This commit was SVN r24700.
2011-05-16 13:32:42 +00:00
Ralph Castain
a3e43594a4 Extend node stats to include additional memory info. Change "darwin" pstat module to "test" as we don't really know how to get all the stat info for darwin.
Add a new OPAL_ERROR_LOG macro similar to the ORTE_ERROR_LOG one.

This commit was SVN r24692.
2011-05-08 14:45:16 +00:00
Jeff Squyres
0882d636a6 Oops -- need string.h, too (for strcasecmp).
This commit was SVN r24649.
2011-04-28 15:42:35 +00:00
Jeff Squyres
7362a0730a Change the default to "none". David Singleton raises a good point
that enabling "local_only" by default could cause excessive
by-NUMA-node paging and/or OOMs (rather than allowing memory
allocations to spill over to other NUMA nodes).

This brought home the very real-world example of people buying servers
with more processors/cores than they need, just to get more memory.
We wouldn't want Badness to occur in such scenarios by default.
Instead, let people turn on "only allow memory allocations on my local
NUMA node" if their application would benefit from it.

This commit was SVN r24648.
2011-04-28 15:16:39 +00:00
Jeff Squyres
7b48042ffd Commit patch from upstream hwloc: r3482. Fixes some compiler
warnings. 

This commit was SVN r24641.

The following SVN revision numbers were found above:
  r3482 --> open-mpi/ompi@2435be8d49
2011-04-27 17:08:15 +00:00
Jeff Squyres
d134ff9b4d Refs trac:2698
After a long period of development with many starts and stops, we
finally got this where we wanted it.

This commit introduces 2 new MCA params (note that the
"maffinity_libnuma_policy" MCA param introduced by r24290 was removed
when libnuma support was removed).  Remember that maffinity policies
are only in effect when paffinity is enabled -- i.e., when processes
are bound to processors!

 * '''maffinity_base_alloc_policy:''' Policy that determines how
   general memory allocations are bound after MPI_INIT.  A value of
   "none" means that no memory policy is applied.  A value of
   "local_only" means that all memory allocations will be restricted
   to the local NUMA node where each process is placed.  Note that
   operating system paging policies are unaffected by this setting.
   For example, if "local_only" is used and local NUMA node memory is
   exhausted, a new memory allocation may cause paging.
 * '''maffinity_base_bind_failure_action:''' What Open MPI will do if
   it explicitly tries to bind memory to a specific NUMA location, and
   fails.  Note that this is a different case than the general
   allocation policy described by maffinity_base_alloc_policy.  A
   value of "warn" means that Open MPI will warn the first time this
   happens, but allow the job to continue (possibly with degraded
   performance).  A value of "error" means that Open MPI will abort
   the job if this happens.
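
A minimal sketch of how the two parameter values described above might be interpreted. Everything here is illustrative: the enum and function names are hypothetical, case-sensitive matching is an assumption, and the real components register and read these values through the MCA parameter system.

```c
#include <stdio.h>
#include <string.h>

typedef enum { EX_POLICY_NONE, EX_POLICY_LOCAL_ONLY, EX_POLICY_INVALID } ex_alloc_policy_t;
typedef enum { EX_FAIL_WARN, EX_FAIL_ERROR, EX_FAIL_INVALID } ex_fail_action_t;

/* Map the alloc-policy string to an enum value. */
static ex_alloc_policy_t ex_parse_alloc_policy(const char *s)
{
    if (0 == strcmp(s, "none"))       return EX_POLICY_NONE;
    if (0 == strcmp(s, "local_only")) return EX_POLICY_LOCAL_ONLY;
    return EX_POLICY_INVALID;
}

/* Map the bind-failure-action string to an enum value. */
static ex_fail_action_t ex_parse_fail_action(const char *s)
{
    if (0 == strcmp(s, "warn"))  return EX_FAIL_WARN;
    if (0 == strcmp(s, "error")) return EX_FAIL_ERROR;
    return EX_FAIL_INVALID;
}

/* React to a failed explicit memory bind: "error" asks the caller to
 * abort (-1); "warn" prints once and lets the job continue (0). */
static int ex_on_bind_failure(ex_fail_action_t action, int *already_warned)
{
    if (EX_FAIL_ERROR == action) {
        return -1;
    }
    if (!*already_warned) {
        fprintf(stderr, "warning: memory bind failed; continuing\n");
        *already_warned = 1;
    }
    return 0;
}
```

The `already_warned` flag models the "warn the first time this happens" behavior; later failures return silently.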

This needs at least a little soak time on the trunk before going to
v1.5.

This commit was SVN r24639.

The following SVN revision numbers were found above:
  r24290 --> open-mpi/ompi@afa654746c

The following Trac tickets were found above:
  Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
2011-04-26 13:31:07 +00:00
Jeff Squyres
926af377fe Refs trac:2778.
Upgrade to hwloc 1.2 (from hwloc 1.1.2).  This should fix the problems
Nathan's seeing in #2778.

Let's let this soak on the trunk for a little while and see how LANL's
MTT's work out.  If that works, then we can CMR this to v1.5.

This commit was SVN r24635.

The following Trac tickets were found above:
  Ticket 2778 --> https://svn.open-mpi.org/trac/ompi/ticket/2778
2011-04-25 19:31:49 +00:00
Jeff Squyres
b8af3b7c4a New comment explains it all -- previous code was failing to find the
Nth core, so it fell over to try to find the Nth PU.

-----

hwloc isn't able to find cores on all platforms.  Example: PPC64
running RHEL 5.4 (linux kernel 2.6.18) only reports NUMA nodes and
PU's.  Fine.

However, note that hwloc_get_obj_by_type() will return NULL in 2
(effectively) different cases:

- no objects of the requested type were found
- the Nth object of the requested type was not found

So first we have to see if we can find *any* cores by looking for the
0th core.  If we find it, then try to find the Nth core.  Otherwise,
try to find the Nth PU.
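
The lookup order described above can be sketched with a toy topology. The `ex_` types and `ex_get_obj()` are hypothetical stand-ins for an hwloc topology and hwloc_get_obj_by_type(); they exist only so the example is self-contained. Treating a missing Nth core as a failure (rather than falling back to PUs) when cores do exist is this sketch's reading of the message.

```c
#include <stddef.h>

typedef enum { EX_OBJ_CORE, EX_OBJ_PU } ex_obj_type_t;

/* Toy topology: just counts of each object type. */
typedef struct { unsigned ncores; unsigned npus; } ex_topo_t;
typedef struct { ex_obj_type_t type; unsigned index; } ex_obj_t;

/* Returns 0 (like a NULL result) both when no objects of the type exist
 * and when idx is out of range; the caller cannot tell the cases apart. */
static int ex_get_obj(const ex_topo_t *t, ex_obj_type_t type,
                      unsigned idx, ex_obj_t *out)
{
    unsigned count = (EX_OBJ_CORE == type) ? t->ncores : t->npus;
    if (idx >= count) {
        return 0;
    }
    out->type = type;
    out->index = idx;
    return 1;
}

/* Probe for the 0th core first to learn whether cores exist at all.
 * If they do, only the Nth core is acceptable; otherwise use the Nth PU. */
static int ex_find_nth(const ex_topo_t *t, unsigned n, ex_obj_t *out)
{
    ex_obj_t probe;
    if (ex_get_obj(t, EX_OBJ_CORE, 0, &probe)) {
        return ex_get_obj(t, EX_OBJ_CORE, n, out);
    }
    return ex_get_obj(t, EX_OBJ_PU, n, out);
}
```

Probing index 0 is what disambiguates the two failure cases: it answers "are there any cores?" before asking for a specific one.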

This commit was SVN r24632.
2011-04-25 16:55:27 +00:00
Ralph Castain
9988b97b97 Extend/update how we handle process stats. Add the ability to collect node-level stats separate from the process stats. Update the process stat memory fields to report in MBytes instead of KBytes as I can't find any process that runs in KBytes nowadays.
Rename the memusage sensor plugin to "resusage" as it will soon be updated to include full process stat monitoring.

Extend the heartbeat sensor to report node and process stats in the heartbeat.

Store the process and node stats in their respective orte_xxx_t object.

This commit was SVN r24629.
2011-04-21 22:55:45 +00:00
Jeff Squyres
2fe94b929a Manually add hwloc v1.1 branch r3418 commit (went in after v1.1.2
released): 

backport hwloc r3416 from trunk: Add cache info entry _after_ checking
that we need one, thanks Andriy Gapon for the fix

This commit was SVN r24612.

The following SVN revision numbers were found above:
  r3418 --> open-mpi/ompi@9972663a12
2011-04-12 14:41:46 +00:00
Jeff Squyres
9dc3a1aa54 Upgrade to hwloc 1.1.2; most likely the last release of the hwloc
1.1.x series

This commit was SVN r24611.
2011-04-12 14:35:26 +00:00
Jeff Squyres
38d3cdd4a6 Update hwloc to 1.1.1. Next stop: 1.1.2.
This commit was SVN r24610.
2011-04-12 14:16:37 +00:00
Shiqing Fan
4b3b713bfc Update the windows installdir component.
Don't use the old env component for windows, so remove the .windows file.

This commit was SVN r24597.
2011-04-05 12:15:41 +00:00
Ralph Castain
f40edd6b4f Add the stupid test word
This commit was SVN r24578.
2011-03-26 03:38:59 +00:00
Ralph Castain
5bfb01c6c8 Only build the linux component of sysinfo if linux is the operating system.
Thanks to Paul Hargrove for the suggestion.

This commit was SVN r24576.
2011-03-25 20:55:57 +00:00
Jeff Squyres
cf6c5e8d48 Fix a bug noted by Gus Correa on the user's list: mpi_paffinity_alone
appeared multiple times in ompi_info output (so did others, but this
is the one that was noticed).  Ensure that we don't repeat
opal_paffinity_base_register_params() multiple times.

This commit was SVN r24569.
2011-03-24 00:58:25 +00:00
Jeff Squyres
324b90142f Fix CID 1583: hwloc bitmap leak.
This commit was SVN r24496.
2011-03-08 16:47:26 +00:00
Josh Hursey
7c737b9274 Some string and state cleanup. Thanks to George Bosilca for the initial patch.
This commit was SVN r24471.
2011-03-01 20:12:23 +00:00
Jeff Squyres
ad985260d3 Ensure to disable XML and Cairo support in hwloc; OMPI doesn't use it. Additionally, ensure that the right flags are passed back to the wrappers in the case of static builds. We probably won't need these (especially since XML has been disabled), but it's the Right Thing to do.
This commit was SVN r24451.
2011-02-23 23:11:45 +00:00
Jeff Squyres
e8ba72258e Patch for PPC64 platforms with smt=off, issue raised by Brad. This
fix will be included in hwloc 1.1.2.

Brad -- can you verify that this fixes the issue for you?

Fixes trac:2732.

This commit was SVN r24450.

The following Trac tickets were found above:
  Ticket 2732 --> https://svn.open-mpi.org/trac/ompi/ticket/2732
2011-02-23 22:43:58 +00:00
Jeff Squyres
8143b201a9 Custom patch for hwloc (that will be included in hwloc 1.1.2) so that
we don't barf on Linux non-NUMA (NNUMA, aka UMA ;-) ) platforms.

This commit was SVN r24448.
2011-02-23 21:02:02 +00:00
Jeff Squyres
2368410eff * Ensure to follow standard filename conventions for output MCA DSO
filenames -- don't include the project name ("opal")
 * Don't link maffinity/hwloc and paffinity/hwloc against the common
   hwloc in the static build case (because this will result in
   duplicate symbols)

This commit was SVN r24447.
2011-02-23 21:00:20 +00:00
Jeff Squyres
5e082d68f6 Fix the compile error for libnuma 0.9.x introduced in r24442;
hopefully, this now compiles for libnuma 0.9.x and libnuma 2.0.x.

Fixes for the strategy discussed in the commit message for r24442
(i.e., check against numa_get_mems_allowed(), which only exists in
libnuma 2.0.x) and the new new new plan on #2698 coming in a separate
commit.

This commit was SVN r24443.

The following SVN revision numbers were found above:
  r24442 --> open-mpi/ompi@90a8fe4aad
2011-02-23 13:44:46 +00:00
Rainer Keller
90a8fe4aad - Addendum to r24421: get mca_maffinity_libnuma to compile on linux
(with libnuma-2.0.4 / LIBNUMA_API_VERSION 2): numa_get_run_node_mask
   returns a struct bitmask *.

   Whether it's a good idea to blindly pass that on to
   numa_set_membind() is another matter: one might want to match against
   the list returned by numa_get_mems_allowed(), which may be set by the
   outside environment.

   Refs trac:2698.

This commit was SVN r24442.

The following SVN revision numbers were found above:
  r24421 --> open-mpi/ompi@31510e683b

The following Trac tickets were found above:
  Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
2011-02-23 12:59:49 +00:00
Jeff Squyres
c1b26005d7 Create new opal/mca/common area, similar to ompi/mca/common. Move hwloc into this new opal MCA common area, and link the hwloc paffinity component against it. Also add a new hwloc maffinity component, and also link it against the opal MCA common hwloc. More development coming soon regarding this common hwloc instance (i.e., an OPAL-ized version of the hwloc API via a new framework so that we can safely use hwloc's services throughout the rest of the OPAL/ORTE/OMPI code bases).
This commit was SVN r24440.
2011-02-22 23:21:48 +00:00
Jeff Squyres
31510e683b Replace r24290 with something more meaningful. In this case, find out
what memory node the process is running on (which is guaranteed to be
a good answer because maffinity won't be invoked unless the process is
already bound to a specific processor), and then bind our memory to
that. 

Refs trac:2698.

This commit was SVN r24421.

The following SVN revision numbers were found above:
  r24290 --> open-mpi/ompi@afa654746c

The following Trac tickets were found above:
  Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
2011-02-21 20:07:11 +00:00
Shiqing Fan
ddc05e05d7 Avoid blocking select on Windows.
This commit was SVN r24396.
2011-02-16 08:48:21 +00:00
Ralph Castain
bf1cff3711 Plug a couple of additional memory leaks - try to highlight a little better that strings returned from reg_string_name must be freed by caller
This commit was SVN r24383.
2011-02-14 20:58:22 +00:00
Ralph Castain
d85916c1c2 Plug memory leak
This commit was SVN r24380.
2011-02-14 19:51:33 +00:00
Jeff Squyres
b0ce9bae8e Oops. Also need to remove myriexpress.h from the Makefile.am.
This commit was SVN r24357.
2011-02-04 03:29:49 +00:00
Jeff Squyres
6421abecc7 Fixes trac:2690.
Temporarily remove hwloc's internal version of myriexpress.h.  It is
causing a problem when compiling Open MPI with MX support because
hwloc uses AC_CONFIG_HEADER in hwloc's hwloc.m4 to generate
opal/mca/paffinity/hwloc/hwloc/include/hwloc/config.h.
AC_CONFIG_HEADER apparently has the (undocumented) side effect of
adding -I$(top_builddir)/opal/mca/paffinity/hwloc/hwloc/include/hwloc
to OMPI's compilation flags.  Hence, when the OMPI MX components are
compiled and #include "myriexpress.h" (or <myriexpress.h>) they see
hwloc's myriexpress.h before the system one.  Badness ensues.

This removal is temporary because we need to figure out a better
solution.  But for now, OMPI is not using hwloc's myriexpress.h file --
so it's safe to remove.  I'll push this issue upstream to hwloc to
figure out a better solution...

This commit was SVN r24354.

The following Trac tickets were found above:
  Ticket 2690 --> https://svn.open-mpi.org/trac/ompi/ticket/2690
2011-02-03 14:24:32 +00:00
Jeff Squyres
4674e62929 These files are superfluous.
This commit was SVN r24331.
2011-02-01 21:31:35 +00:00
Jeff Squyres
c8badb79df Don't instantiate variables in for loops; we don't assume C99
compilers. 
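
For illustration, the pre-C99 style this refers to looks like the following; the function is a made-up example, not code from the tree.

```c
/* C89-compatible loop: the counter is declared at the top of the block
 * instead of inside the for statement (which requires C99). */
static int ex_sum(const int *values, int n)
{
    int i;
    int total = 0;

    for (i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```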

This commit was SVN r24330.
2011-02-01 19:23:14 +00:00
Nysal Jan
42015cf30a Fix build failure on AIX
This commit was SVN r24321.
2011-01-28 08:09:45 +00:00
Nysal Jan
857c32784e Fix detection of fd_mask
This commit was SVN r24320.
2011-01-28 06:20:32 +00:00
Jeff Squyres
6c8de8fb76 Bump up to hwloc 1.1.1
This commit was SVN r24312.
2011-01-26 23:20:26 +00:00
Josh Hursey
66af515061 Fix C/R functionality with the new libtool. This fixes the case where the restarted process cannot be checkpointed or finalized.
Short Version:
--------------
Event engine needs to be flushed so it does not use old/stale file descriptors.

Long Version:
-------------
The problem was that the restarted process was waiting for the socket to the local daemon to finish establishing during the 'sync' operation. The core problem was that the daemon was sending a header of 36 bytes, but the restarted process only received 35 bytes of the message. So the restarted process became stuck waiting for the last byte to arrive.

After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (to signal itself when a signal arrives). So when the daemon wrote in to the new fd the event engine was stealing the first byte (*shakes fist at event engine*) before the recv() could be posted.

The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem.


A few other minor things:
-------------------------
 * Add a check to make sure the event engine is balanced in its init/finalize
 * Add the opal_event_base_close() to the BLCR restart exec function (still not 100% sure it is needed, but there it is).

This commit was SVN r24296.
2011-01-25 22:43:47 +00:00
Jeff Squyres
afa654746c Somehow this has been sitting, uncommitted, in a local checkout since
last December.  :-(

Add new MCA param: maffinity_libnuma_policy.  Thanks to David
Singleton for the suggestion.  Here's the help text about it:

{{{
   MCA maffinity: parameter "maffinity_libnuma_policy" (current value:
                  <loose>, data source: default value)
                  Binding policy that determines what happens if memory
                  is unavailable on the local NUMA node.  A value of
                  "strict" means that the memory allocation will fail;
                  a value of "loose" means that the memory allocation
                  will spill over to another NUMA node.
}}}

This commit was SVN r24290.
2011-01-24 14:39:16 +00:00
Abhishek Kulkarni
3243b16bb3 Decode SOS error code before checking it with the native error code.
This commit was SVN r24281.
2011-01-20 23:21:38 +00:00
Jeff Squyres
189b541dbd Add a proper help message for the mca_verbose MCA param (and shuffle
the code to be slightly more efficient).

This commit was SVN r24256.
2011-01-14 20:18:06 +00:00
Terry Dontje
56c03a3853 removing a file I should not have added
This commit was SVN r24220.
2011-01-11 19:02:08 +00:00
Terry Dontje
a374661ead add configure.params to solaris sysinfo module to allow it to be built
This commit was SVN r24219.
2011-01-11 18:31:55 +00:00
Jeff Squyres
cd8f12d8e5 Remove a few useless files that were missed last night.
This commit was SVN r24218.
2011-01-11 14:15:31 +00:00
Jeff Squyres
54cb4eb2b5 Merge over new version of hwloc 1.1 from the vendor branch. Update
the module to use the new hwloc bitmap API (the cpuset API is both
klunkier and deprecated), which simplified a few things.

This commit was SVN r24217.
2011-01-11 01:41:10 +00:00
Josh Hursey
bbfdf04a81 Fix a couple of 'unused variable' warnings, and one return value warning.
{{{
base/paffinity_base_service.c: In function ‘opal_paffinity_base_cset2mapstr’:
base/paffinity_base_service.c:623: warning: unused variable ‘range_last’
base/paffinity_base_service.c:623: warning: unused variable ‘range_first’
base/paffinity_base_service.c:622: warning: unused variable ‘count’
base/paffinity_base_service.c:622: warning: unused variable ‘m’
}}}

{{{
connect/btl_openib_connect_oob.c: In function ‘init_ud_qp’:
connect/btl_openib_connect_oob.c:1111: warning: control reaches end of non-void function
connect/btl_openib_connect_oob.c: In function ‘init_device’:
connect/btl_openib_connect_oob.c:1235: warning: unused variable ‘i’
connect/btl_openib_connect_oob.c: In function ‘get_pathrecord_sl’:
connect/btl_openib_connect_oob.c:1323: warning: unused variable ‘i’
}}}

This commit was SVN r24196.
2010-12-30 15:37:50 +00:00
Terry Dontje
6da16ab0d7 add format parameter and layout format to OMPI_Affinity_str
This commit was SVN r24182.
2010-12-16 15:11:17 +00:00
Shiqing Fan
ec82e73bce use sockets instead of pipes on Windows.
This commit was SVN r24174.
2010-12-15 14:34:25 +00:00
Rolf vandeVaart
3f7dd84278 Fix libevent so it can compile in the few cases where sys/queue.h does not exist.
1. Remove it from libevent207.h because it is not needed.
2. Add compat to the include list so it can use queue.h when needed.

This commit was SVN r24144.
2010-12-02 23:05:02 +00:00
Shiqing Fan
f43862420c Convert the bad dos line endings to unix style for all windows related files.
This commit was SVN r24137.
2010-12-02 12:08:08 +00:00
Ralph Castain
c56185887b Change the event base "wakeup" support to enable the passing of events to the central thread for add/del. Add a macro OPAL_UPDATE_EVBASE for this purpose as it will likely be widely used.
Update the ORTE thread support to utilize this capability. Update the rmcast framework to track the change.

This commit was SVN r24121.
2010-12-01 04:26:43 +00:00
Ralph Castain
2523c9b2e8 Overload the event_base_t struct to include a (hopefully) temporary change to deal with cross-event-base synchronization. This is done transparently so no code changes are required within the rest of the code base. Comments explain what was changed and why.
This commit was SVN r24105.
2010-11-30 16:14:19 +00:00
Ralph Castain
71f116d21f Expose the event_active API
This commit was SVN r24090.
2010-11-24 23:30:13 +00:00
Ralph Castain
380835602c Add support for internal libevent threading support. Add configure logic to define an appropriate flag, and then use that flag to expose the required functions.
This commit was SVN r24088.
2010-11-24 23:24:53 +00:00
Shiqing Fan
39c9f7468e Add support for managing priorities of windows mca components.
Correct the generated strings in mpi.h.

This commit was SVN r24082.
2010-11-23 19:09:06 +00:00
Rolf vandeVaart
e7ff9375d7 Use pid_t to avoid warnings on some platforms.
This commit was SVN r24072.
2010-11-19 17:14:33 +00:00
Rolf vandeVaart
1735f98c78 Avoid potential warnings by using pid_t in all places.
This commit was SVN r24071.
2010-11-19 16:29:45 +00:00
Shiqing Fan
4fea0f021e Per r24062, this should also be removed.
This commit was SVN r24064.

The following SVN revision numbers were found above:
  r24062 --> open-mpi/ompi@3b0caf7dea
2010-11-17 17:14:55 +00:00
Rolf vandeVaart
3b0caf7dea Remove inclusion of stdbool.h where not needed.
Change OMPI code in libevent to not use bool.
Add some comments to indicate OMPI specific code.
This should fix compiles on Sun Studio Solaris.

This commit was SVN r24062.
2010-11-17 15:14:00 +00:00
Ralph Castain
32be69eaef Update the OMPI libevent interface module and the internal libevent event.c file to provide ability to disable specific event modes. Basically an issue between #define and checking to see if the value was defined to zero.
This commit was SVN r24056.
2010-11-16 16:06:32 +00:00
Ralph Castain
1b3421f16e Fix a bug spotted by Rolf - ensure that disable-event-xxx results in the corresponding have_event_xxx being undefined or defined to 0
This commit was SVN r24055.
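
The "#define vs. defined to zero" pitfall behind this fix and r24056 can be sketched like this. The `TOY_HAVE_EVENT_*` macros are hypothetical stand-ins, not the names generated by configure; the point is only that `#ifdef` treats "defined to 0" as enabled, while `#if defined(X) && X` does not.

```c
#include <assert.h>

/* Hypothetical configure output: epoll was disabled, so it is defined to 0.
 * TOY_HAVE_EVENT_KQUEUE is deliberately left undefined. */
#define TOY_HAVE_EVENT_EPOLL 0

/* Naive check: only asks whether the macro exists at all. */
int naive_epoll_enabled(void)
{
#ifdef TOY_HAVE_EVENT_EPOLL
    return 1; /* wrong here: "defined to 0" still counts as defined */
#else
    return 0;
#endif
}

/* Robust check: the macro must exist AND be nonzero. */
int robust_epoll_enabled(void)
{
#if defined(TOY_HAVE_EVENT_EPOLL) && TOY_HAVE_EVENT_EPOLL
    return 1;
#else
    return 0;
#endif
}

/* The same robust check also handles a macro that is undefined. */
int robust_kqueue_enabled(void)
{
#if defined(TOY_HAVE_EVENT_KQUEUE) && TOY_HAVE_EVENT_KQUEUE
    return 1;
#else
    return 0;
#endif
}

/* Small self-check of the behavior described above. */
void toy_config_selfcheck(void)
{
    assert(naive_epoll_enabled() == 1);
    assert(robust_epoll_enabled() == 0);
    assert(robust_kqueue_enabled() == 0);
}
```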
2010-11-16 04:37:30 +00:00
Ralph Castain
b43a4509ac Remove stale mca param. Ensure that verbosity gets properly set for event framework debug
This commit was SVN r24050.
2010-11-13 15:37:17 +00:00
Ralph Castain
db014edb0b Initialize boolean
This commit was SVN r24048.
2010-11-13 15:31:55 +00:00
Jeff Squyres
e4744b4ed5 Per http://www.open-mpi.org/community/lists/devel/2010/11/8671.php,
change a bunch of OMPI_<foo> names to OPAL_<foo>.

This commit was SVN r24046.
2010-11-12 23:22:11 +00:00
Shiqing Fan
1f4eae2046 Type cast for compiling under VS 2010.
This commit was SVN r24044.
2010-11-12 08:31:23 +00:00
Jeff Squyres
dded8a9756 Ensure to always remove the .new file
This commit was SVN r24023.
2010-11-09 23:33:53 +00:00
Terry Dontje
8e0b24a45b add comment to r23998 code change to be able to track libevent code change better
This commit was SVN r24005.

The following SVN revision numbers were found above:
  r23998 --> open-mpi/ompi@e8aa8984a8
2010-11-08 14:36:28 +00:00
Terry Dontje
e8aa8984a8 corrected stdbool.h inclusion to allow Oracle C++ compilers to work with OMPI
This commit was SVN r23998.
2010-11-05 18:54:19 +00:00
Josh Hursey
676adfb7cc fix compile error, need to revisit this line later
This commit was SVN r23991.
2010-11-03 17:34:11 +00:00
Shiqing Fan
a7dc32afb0 Remove the OPAL_DECLSPEC for the event functions.
This commit was SVN r23987.
2010-11-03 09:10:12 +00:00
Shiqing Fan
505efbaa27 Update the CMake scripts, solve a few export symbols for Windows.
This commit was SVN r23976.
2010-11-02 16:39:27 +00:00
Jeff Squyres
9c15a30b75 Really fix the libevent make distcheck problem. The main issue is how
libevent creates its event-config.h during "make all" (vs. during
configure).  The prior method around this didn't work because it wrote
an event-config.h.in in the source tree -- a Bad Idea(tm).  The new
way uses AC_CONFIG_COMMANDS to get stuff executed at the end of
config.status to create event-config.h.  This seems to work properly
during make distcheck.

This commit was SVN r23975.
2010-11-01 23:28:50 +00:00
Jeff Squyres
6bd41cf5d8 Fixes for vpath builds; this should enable 'make dist' again.
This commit was SVN r23973.
2010-10-29 22:07:52 +00:00
Ralph Castain
0171e05942 Only add include paths for event headers if --with-devel-headers was specified
This commit was SVN r23968.
2010-10-29 00:43:10 +00:00
Ralph Castain
838ed14401 Include the libevent headers when --with-devel-headers is specified. Ensure that the proper include paths are added to the wrapper compilers - thanks to Jeff for figuring out how to do it.
This commit was SVN r23967.
2010-10-28 21:26:07 +00:00
Ralph Castain
9ea2b196ce Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.

Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.

This commit was SVN r23966.
2010-10-28 15:22:46 +00:00
Brian Barrett
50394a05f2 Restore ordering to installdirs components
This commit was SVN r23964.
2010-10-28 01:03:16 +00:00
Terry Dontje
b3f2ac8d46 removed direct include of stdbool.h from event.h that was causing studio C++ issues. Also removed include of stdbool.h in a couple other places since it was already being pulled in via opal_config_bottom.h.
This commit was SVN r23963.
2010-10-27 20:47:42 +00:00
Jeff Squyres
33c3b71317 We had long-ago added a new loop type to libevent: EVLOOP_ONELOOP.
After talking with Brian, we're pretty sure that this is only because
really, really old libevent didn't allow bitwise or-ing of the other
loop types, because what we really need is (EVLOOP_ONCE |
EVLOOP_NONBLOCK).  And that's what EVLOOP_ONELOOP did (i.e., we
changed the logic of libevent's event.c to let ONELOOP do both ONCE
and NONBLOCK things).

In the new libevent version, we didn't implement EVLOOP_ONELOOP
properly.  As a result, we got hangs in the SM BTL add_procs
function.  Note that the SM BTL wasn't to blame -- it was purely a
side-effect of bad ONELOOP integration (i.e., if you got past the SM
BTL add_procs, you may well have hung somewhere else).

This commit removes all ONELOOP customizations from event.c and
returns it to (almost) its original state from the libevent 2.0.7-rc
distribution.  Everywhere in the code base where we used ONELOOP, we
now use (ONCE | NONBLOCK).
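
Why a separate ONELOOP value is unnecessary once the flags can be or-ed together can be sketched as below. This is a toy model, not libevent code: the `TOY_EVLOOP_*` names and both helper functions are hypothetical, and only the bit-flag idea is taken from the description above.

```c
#include <assert.h>

/* Hypothetical flag values, each occupying its own bit. */
enum {
    TOY_EVLOOP_ONCE     = 0x01, /* make a single pass over the events */
    TOY_EVLOOP_NONBLOCK = 0x02  /* never wait for events to arrive */
};

/* Does the caller want the loop to return after one pass? */
int toy_single_pass(int flags)
{
    assert((flags & ~(TOY_EVLOOP_ONCE | TOY_EVLOOP_NONBLOCK)) == 0);
    return (flags & TOY_EVLOOP_ONCE) != 0;
}

/* Is the loop allowed to block while waiting for events? */
int toy_may_block(int flags)
{
    assert((flags & ~(TOY_EVLOOP_ONCE | TOY_EVLOOP_NONBLOCK)) == 0);
    return (flags & TOY_EVLOOP_NONBLOCK) == 0;
}
```

With `TOY_EVLOOP_ONCE | TOY_EVLOOP_NONBLOCK`, both tests see their bit set, so the caller gets "one pass, no blocking" without a third loop type.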

This commit was SVN r23957.
2010-10-26 20:29:22 +00:00
Ralph Castain
a5c440c974 Turn off libevent's internal thread support to (hopefully) minimize performance hit
This commit was SVN r23956.
2010-10-26 20:10:44 +00:00
Shiqing Fan
a3d9c91ff7 Exclude stdbool.h for Windows, and use the definition in opal. Port the socket pair support over from libevent. Fix other minor things and make it compile.
This commit was SVN r23951.
2010-10-26 14:53:50 +00:00
Ralph Castain
847e43703f Remove cruft
This commit was SVN r23950.
2010-10-26 14:49:36 +00:00
Shiqing Fan
b2c3cb300c Correctly configure the new libevent mca for Windows.
This commit was SVN r23946.
2010-10-26 09:33:47 +00:00
Ralph Castain
86c7365e8e Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.

This commit was SVN r23943.
2010-10-26 02:41:42 +00:00
Jeff Squyres
ed1e9a412a Need these files in all tarballs -- so don't conditionally add them to
EXTRA_DIST. 

This commit was SVN r23938.
2010-10-25 18:31:38 +00:00
Jeff Squyres
d14474969b Need this variable in optimized builds, too.
This commit was SVN r23937.
2010-10-25 18:31:01 +00:00
George Bosilca
bc3e1376ba event-config.h only exists in the builddir, so we need to explicitly
include it while building.

This commit was SVN r23936.
2010-10-25 18:29:52 +00:00
George Bosilca
c2e40f8616 Remove a warning about signed to unsigned comparison.
This commit was SVN r23935.
2010-10-25 18:29:11 +00:00
George Bosilca
b9a06afd98 opal_event_libevent207 is prototyped as const, so it should be defined as const.
This commit was SVN r23934.
2010-10-25 18:28:42 +00:00
Jeff Squyres
1d1571a86c Fix vpath builds.
This commit was SVN r23932.
2010-10-25 17:48:02 +00:00
Ralph Castain
a04da165bc Remove the sample and test code from the libevent distro - don't need to include them in ompi
This commit was SVN r23931.
2010-10-25 14:53:33 +00:00
Ralph Castain
bab990d812 Revert r23928 as being the incorrect fix. The correct fix is not to include ipv6 interfaces when ipv6 support was not requested.
This commit was SVN r23930.

The following SVN revision numbers were found above:
  r23928 --> open-mpi/ompi@7394f6d167
2010-10-25 14:31:18 +00:00
Abhishek Kulkarni
c671ec52d1 Fix broken trunk compile after the libevent changes.
This commit was SVN r23929.
2010-10-25 14:11:48 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.
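
Why zeroing an object is not the same as constructing it can be sketched as follows. This is a toy model only: `toy_event_t` and its helpers are hypothetical and do not reproduce the real opal_object_t machinery behind OBJ_NEW and OBJ_CONSTRUCT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object: the constructor fills in state that memset cannot. */
typedef struct toy_event {
    const char *class_name; /* set by the constructor; NULL if never constructed */
    int fd;                 /* -1 means "no descriptor attached yet" */
} toy_event_t;

/* Stand-in for a real constructor. */
void toy_event_construct(toy_event_t *ev)
{
    assert(ev != NULL);
    ev->class_name = "toy_event";
    ev->fd = -1;
}

/* Code that relies on constructed state can detect a zeroed object. */
int toy_event_is_constructed(const toy_event_t *ev)
{
    return ev != NULL && ev->class_name != NULL;
}

/* The old habit: zero the struct and assume it is ready to use. */
void toy_event_zero(toy_event_t *ev)
{
    assert(ev != NULL);
    memset(ev, 0, sizeof(*ev));
}
```

In the toy, the zeroed object is merely flagged as unconstructed; in the real code base the equivalent mistake is what shows up as a segfault.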

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Jeff Squyres
e09bbb49a9 No need to have this AC_ARG_WITH in every component configure.m4 -- just put it up in the framework-level configure.m4.
This commit was SVN r23890.
2010-10-14 22:39:48 +00:00
Sylvain Jeaugey
5fb2a2f2c9 Add a check for the ummunotify device before setting up ptmalloc2 hooks.
This commit was SVN r23882.
2010-10-11 15:05:57 +00:00
Sylvain Jeaugey
78176d2aeb Fix missing include in ummunotify
This commit was SVN r23881.
2010-10-11 15:03:00 +00:00
Jeff Squyres
69a64e5905 Fix typo that prevented the valgrind component from configuring properly
This commit was SVN r23874.
2010-10-07 22:39:08 +00:00
Jeff Squyres
a95ca7444e Fix a meaningless compare of an unsigned against 0. Rework the logic
a bit so that the secondary loop isn't even necessary; makes the whole
thing much simpler, anyway.
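
For reference, the general pitfall looks like this: a condition such as `i >= 0` is always true when `i` is unsigned, so a countdown loop written that way never terminates on its own. A minimal sketch of a safe countdown follows; `toy_last_index_of` is a hypothetical example and not the code touched by this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Return the last index at which value occurs in a[0..n-1], or -1 if absent. */
long toy_last_index_of(const int *a, size_t n, int value)
{
    assert(a != NULL || n == 0);

    /* "i-- > 0" tests before decrementing, so the body sees n-1 down to 0
     * and the loop stops without ever comparing an unsigned against 0
     * with ">=". */
    for (size_t i = n; i-- > 0; ) {
        if (a[i] == value) {
            return (long)i;
        }
    }
    return -1;
}
```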

This commit was SVN r23860.
2010-10-07 15:04:50 +00:00
Jeff Squyres
73bcc4a36b Fix mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake will always create a set of directories under installdir, even if you do not --with-devel-headers. The set of directories will be empty, but still -- they should not be there at all. This commit fixes that -- the directories are not created at all if you do not --with-devel-headers
This commit was SVN r23801.

The following SVN revision numbers were found above:
  r23764 --> open-mpi/ompi@40a2bfa238
2010-09-24 22:53:28 +00:00
Jeff Squyres
7ef20f60f3 Autoconf updates to make us compatible with AC 2.68. Thanks to Ralf W. for the patch!
This commit was SVN r23797.
2010-09-23 22:37:52 +00:00
Ralph Castain
407eefc66d Update the if configure to include "opal" so they will build!
This commit was SVN r23787.
2010-09-22 03:19:15 +00:00
Jeff Squyres
0ca617e570 Make this a warning, not an error.
This commit was SVN r23767.
2010-09-18 07:14:58 +00:00
Ralph Castain
40a2bfa238 WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.

Please note that configure requirements on components HAVE CHANGED. For example, a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.

This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Rolf vandeVaart
09750d0310 Need output.h header file for opal_output() definition.
Otherwise, build will fail when configuring with --enable-picky.

This commit was SVN r23763.
2010-09-17 12:22:17 +00:00
Shiqing Fan
9a47ca1995 Correct the place of including the if.h, and change retain_loopback to opal_if_retain_loopback for windows module too.
This commit was SVN r23756.
2010-09-14 14:03:48 +00:00
Ralph Castain
c74ce1632a Catch a couple of places (one hidden inside an #if 0, other in solaris module) where retain_loopback needs to be opal_if_retain_loopback
This commit was SVN r23755.
2010-09-14 11:37:10 +00:00
Shiqing Fan
95b17c1e82 Add a missing header for the windows module of the if framework.
This commit was SVN r23754.
2010-09-14 07:51:38 +00:00
Ralph Castain
e96b5f486f Reorganize the opal interface code in opal/util/if.c per prior emails and telecon discussions. Move the interface discovery code into a framework so that configuration logic can separate it out (instead of the prior #if-#else confusion).
All interface APIs for accessing the info remain unchanged in opal/util/if.c.

This has been tested on Mac, Linux, and NetBSD. Nobody else seemed interested in testing it, so there may be some future problems revealed as people try it on other OSs.

This commit was SVN r23743.
2010-09-13 01:58:51 +00:00
Jeff Squyres
3b14366c85 Fix a copyright statement
This commit was SVN r23741.
2010-09-12 09:55:01 +00:00
Jeff Squyres
3eedbee7a4 Fixes trac:2541. Ensure that we keep CPPFLAGS if a non-standard valgrind location was specified. CMR:v1.4.3 CMR:v1.5
This commit was SVN r23680.

The following Trac tickets were found above:
  Ticket 2541 --> https://svn.open-mpi.org/trac/ompi/ticket/2541
2010-08-27 22:45:02 +00:00
Jeff Squyres
97fb426325 Per long-ago RFC, now that the odls default module reports errors nicely, remove all paffinity components except for hwloc and test.
This commit was SVN r23666.
2010-08-25 22:34:30 +00:00
Ralph Castain
23904c2f3e Correct the extra_dist path to the .windows file
This commit was SVN r23613.
2010-08-14 01:21:58 +00:00
Jeff Squyres
a2f349167e Update hwloc to 1.0.3a1r2398. This fixes a problem with
linking against libibverbs on Solaris.

Sorry for the mid-day configure change folks; I meant to commit this
last night and forgot.  :-(

This commit was SVN r23606.
2010-08-13 13:18:09 +00:00
Shiqing Fan
550f180014 Add a windows support file into the tarball.
This commit was SVN r23605.
2010-08-13 11:54:13 +00:00
Shiqing Fan
330999e36c Some fixes for C/R enhancement on Windows. Add the option and fix some type casts, just let it compile.
This commit was SVN r23599.
2010-08-12 13:31:37 +00:00
Josh Hursey
e12ca48cd9 A number of C/R enhancements per RFC below:
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php

Documentation:
  http://osl.iu.edu/research/ft/

Major Changes: 
-------------- 
 * Added C/R-enabled Debugging support. 
   Enabled with the --enable-crdebug flag. See the following website for more information: 
   http://osl.iu.edu/research/ft/crdebug/ 
 * Added Stable Storage (SStore) framework for checkpoint storage 
   * 'central' component does a direct to central storage save 
   * 'stage' component stages checkpoints to central storage while the application continues execution. 
     * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress) 
     * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching) 
 * Added Compression (compress) framework to support 
 * Add two new ErrMgr recovery policies 
   * {{{crmig}}} C/R Process Migration 
   * {{{autor}}} C/R Automatic Recovery 
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component 
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option) 
   * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342) 
   * {{{OMPI_CR_Restart}}} 
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules) 
   * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192) 
   * {{{OMPI_CR_Quiesce_start}}} 
   * {{{OMPI_CR_Quiesce_checkpoint}}} 
   * {{{OMPI_CR_Quiesce_end}}} 
   * {{{OMPI_CR_self_register_checkpoint_callback}}} 
   * {{{OMPI_CR_self_register_restart_callback}}} 
   * {{{OMPI_CR_self_register_continue_callback}}} 
 * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future. 
 * Add a progress meter to: 
   * FileM rsh (filem_rsh_process_meter) 
   * SnapC full (snapc_full_progress_meter) 
   * SStore stage (sstore_stage_progress_meter) 
 * Added 2 new command line options to ompi-restart 
   * --showme : Display the full command line that would have been exec'ed. 
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413) 
 * Deprecated some MCA params: 
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir 
   * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir 
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared 
   * snapc_base_store_in_place deprecated, replaced with different components of SStore 
   * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref 
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported 
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem 

Minor Changes: 
-------------- 
 * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing. 
 * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components 
 * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it. 
 * Fixes trac:2208 : Honor various TMPDIR variables instead of forcing {{{/tmp}}} 
 * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set. 
 * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality. 
 * Cleanup the CRS framework and components to work with the SStore framework. 
 * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably). 
 * Add 'quiesce' hook to CRCP for a future enhancement. 
 * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}. 
 * Add optional application level INC callbacks (registered through the CR MPI Ext interface). 
 * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive. 
 * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked. 
 * {{{opal-restart}}} also supports local decompression before restarting 
 * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata 
 * {{{orte-restart}}} now uses the SStore framework to work with the metadata 
 * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality. 
 * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}. 
 * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped. 
 * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away. 
 * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options. 
 * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities. 
 * Improve the checks for 'already checkpointing' error path. 
 * Add a recovery output timer, to show how long it takes to restart a job 
 * Do a better job of cleaning up the old session directory on restart. 
 * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (which does not work for MPI apps at the moment). 
 * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize. 

This commit was SVN r23587.

The following Trac tickets were found above:
  Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
  Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
  Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
  Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
  Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
  Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
  Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
2010-08-10 20:51:11 +00:00
Terry Dontje
b74ef351b7 Added new solaris sysinfo module. Also added code to assign
orte_local_chip_type and orte_local_chip_model in MPI processes if the
appropriate sysinfo module found the values on the machine.

This commit was SVN r23581.
2010-08-09 19:28:56 +00:00
Jeff Squyres
88b7923fc5 At least on NetBSD 5.0_STABLE with Libtool 2.2.6b, lt_dlerror() can
sometimes return NULL, so be sure to handle that case properly.
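
The defensive pattern can be sketched without libltdl itself. `toy_errmsg_or` is a hypothetical helper, not the function changed by this commit; it only shows how to substitute a fallback when an error-string call hands back NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Return msg if the error-string call produced one, otherwise the fallback.
 * The caller must always supply a non-NULL fallback. */
const char *toy_errmsg_or(const char *msg, const char *fallback)
{
    assert(fallback != NULL);
    return (msg != NULL) ? msg : fallback;
}
```

A caller would wrap the possibly-NULL result before passing it to a printf-style function, since passing NULL for a `%s` argument is undefined behavior.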

This commit was SVN r23503.
2010-07-27 14:15:53 +00:00
Jeff Squyres
7d7c0aa48f Somehow the check for the specific value "external" got dropped in the
logic (even though the "else" clause for handling it was there).  This
commit puts back the specific check for the word "external".

Thanks to Jed Brown for noticing the issue.  Fixes trac:2503.

This commit was SVN r23475.

The following Trac tickets were found above:
  Ticket 2503 --> https://svn.open-mpi.org/trac/ompi/ticket/2503
2010-07-22 11:42:15 +00:00
Jeff Squyres
64cb8f5d7f Another round of man page cleanups from Debian maintainer Manuel
Prinz.  Many thanks!

This commit was SVN r23445.
2010-07-20 14:07:18 +00:00
Christopher Yeoh
8a3d5d4e1c Adds missing sys/stat.h include needed for more recent versions of glibc
This commit was SVN r23440.
2010-07-20 06:31:16 +00:00
Jeff Squyres
5ab634555a Apparently, Cisco plans to be working on Open MPI for a veeeeery long time!
This commit was SVN r23433.
2010-07-19 19:31:59 +00:00
Jeff Squyres
57d89d1c0c Remove a lot of cruft from the hwloc paffinity directory that we're
not using in Open MPI (i.e., that stuff is only used in the standalone
builds of hwloc -- it's not compiled/installed/used by Open MPI).

This commit was SVN r23416.
2010-07-14 20:46:47 +00:00