Commit graph

1049 commits

Author SHA1 Message Date
Ralph Castain
8a2ca3d96f Delete built files
This commit was SVN r26481.
2012-05-23 18:14:26 +00:00
Ralph Castain
11d5a31b1e Update ignores, remove build result file from repo
This commit was SVN r26460.
2012-05-21 00:59:04 +00:00
Ralph Castain
40317c0290 Remove the old event lib version - new one seems to be working just fine.
This commit was SVN r26459.
2012-05-21 00:57:47 +00:00
Ralph Castain
83d69b6c95 Enable the ORTE progress thread for apps (not needed in the tools as they already continuously loop in the event lib). This appears to be working, at least for MPI apps that only use shared memory (a simple "hello"). More testing is required to identify where problems will occur - this is only intended to allow further development.
In order to use the progress thread, you must configure with:

--enable-orte-progress-threads --enable-event-thread-support

This commit was SVN r26457.
2012-05-20 15:14:43 +00:00
Ralph Castain
1ce59d08b5 Continue cleaning up the libevent ignores
This commit was SVN r26455.
2012-05-19 16:11:24 +00:00
Ralph Castain
1826b513ee Add missing windows file
This commit was SVN r26433.
2012-05-12 02:05:25 +00:00
Ralph Castain
09f413025b As per the RFC, upgrade OMPI to libevent 2.0.19. Leave the 2.0.13 release in the system, but inactive, for now in case we discover a need to rollback.
This commit was SVN r26431.
2012-05-11 01:05:36 +00:00
Jeff Squyres
1d7fef001c Record the upstream hwloc commit that we've committed here in the OMPI
tree

This commit was SVN r26422.
2012-05-10 12:15:23 +00:00
Jeff Squyres
9c9d7e77df Commit a fix for hwloc -- still checking with upstream to see if this
will be the final solution.  But I'm committing it now so that
Oracle's Solaris Studio builds can resume.

The issue is that the C++ bindings are now (eventually) including
<hwloc.h>.  We use !__hwloc_inline__ and #define it to an appropriate
value at compile-time.  The issue is that when we're compiling C++
code, we should just set !__hwloc_inline__ to "inline", because that's
a keyword in the C++ language (as opposed to !__inline__, or
somesuch).
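
The macro selection described above can be sketched roughly as follows. `DEMO_INLINE` and `demo_last_index` are hypothetical names used only for illustration; this is not the actual hwloc definition, only the general pattern of choosing the plain `inline` keyword when compiling as C++.

```c
#include <assert.h>

/* DEMO_INLINE is a hypothetical stand-in for the hwloc inline macro. */
#if defined(__cplusplus)
#  define DEMO_INLINE inline        /* "inline" is a keyword in C++ */
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define DEMO_INLINE inline        /* C99 and later also have "inline" */
#elif defined(__GNUC__)
#  define DEMO_INLINE __inline__    /* GNU spelling for older C modes */
#else
#  define DEMO_INLINE               /* no inline support: plain static */
#endif

/* A small helper declared through the macro, as a header would do. */
static DEMO_INLINE int demo_last_index(int count)
{
    assert(count > 0);
    return count - 1;
}
```

Because the macro is resolved by the preprocessor, the same header can be included from both C and C++ translation units without further changes.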

This commit was SVN r26418.
2012-05-09 21:03:45 +00:00
Jeff Squyres
de4bbacd13 It turns out that we can't always include the hwloc OpenFabrics verbs
helper file, even if we find that the system has <infiniband/verbs.h>.
The reason is because there are some inline functions in that verbs
helper file that invoke ibv_* functions.  Some linkers (e.g., Solaris
Studio Compilers) will instantiate those static inline functions --
even if we don't use them -- and therefore we need to be able to
resolve the ibv_* symbols at link time.

But since -libverbs is only specified in places where we use other
ibv_* functions (e.g., the OpenFabrics-based BTLs), that means that
linking random executables can/will fail (e.g., orterun).

So instead, introduce a new #define: OPAL_HWLOC_WANT_VERBS_HELPER.  If
this macro is set to 1 before including opal/mca/hwloc/hwloc.h, then
you'll also get the hwloc OpenFabrics verbs helper header file (*if*
hwloc found <infiniband/verbs.h> -- otherwise, it'll #error).
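
A minimal sketch of this opt-in pattern, assuming hypothetical `DEMO_*` names in place of the real macros: the including file sets a "want" macro before the include, and the wrapper header fails at compile time if the system header was not detected.

```c
#include <assert.h>

/* All DEMO_* names are hypothetical; only the pattern follows the text. */

/* What configure would have recorded: 1 if <infiniband/verbs.h> was found. */
#define DEMO_HAVE_VERBS_H 1

/* A source file that wants the helper sets this before the include. */
#define DEMO_WANT_VERBS_HELPER 1

/* ---- contents of the (hypothetical) wrapper header start here ---- */
#ifndef DEMO_WANT_VERBS_HELPER
#  define DEMO_WANT_VERBS_HELPER 0      /* default: no helper */
#endif

#if DEMO_WANT_VERBS_HELPER
#  if !DEMO_HAVE_VERBS_H
#    error "verbs helper requested but <infiniband/verbs.h> was not found"
#  endif
   /* The real header would #include the helper file at this point. */
#  define DEMO_VERBS_HELPER_ACTIVE 1
#else
#  define DEMO_VERBS_HELPER_ACTIVE 0
#endif
/* ---- end of wrapper header ---- */

static int demo_verbs_helper_active(void)
{
    /* The helper can only be active when the system header exists. */
    assert(!DEMO_VERBS_HELPER_ACTIVE || DEMO_HAVE_VERBS_H);
    return DEMO_VERBS_HELPER_ACTIVE;
}
```

Files that never define the "want" macro get no helper, so they pick up no references to ibv_* symbols.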

This commit was SVN r26417.
2012-05-09 20:18:31 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
 * Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Jeff Squyres
aba398ce09 Per RFC
(http://www.open-mpi.org/community/lists/devel/2012/04/10905.php), set
opal_cache_line_size via hwloc data, if we have it.
opal_cache_line_size will be set to an hwloc-inspired value by the end
of orte_init(), but will always have a safe value to use (i.e., a
default value 128) -- even before opal_init() has completed.

Default to the same value of 128 that Open MPI has used for several
years if a) we have no hwloc data, or b) we weren't able to find L2
objects in the hwloc data.
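
The fallback rule above can be sketched like this. The struct and function names are hypothetical, and the topology lookup itself is left out; only the "use L2 data if present, otherwise 128" decision is shown.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DEFAULT_CACHE_LINE_SIZE 128

/* Hypothetical summary of what a topology query returned. */
struct demo_topo_info {
    int have_topology;   /* 0 if no hwloc data is available */
    int have_l2;         /* 0 if no L2 object was found */
    int l2_line_size;    /* L2 line size in bytes, if known */
};

/* Pick the cache line size: topology value if usable, else the default. */
static int demo_cache_line_size(const struct demo_topo_info *info)
{
    int result = DEMO_DEFAULT_CACHE_LINE_SIZE;

    if (NULL != info && info->have_topology && info->have_l2 &&
        info->l2_line_size > 0) {
        result = info->l2_line_size;
    }
    assert(result > 0);   /* callers always get a safe, positive value */
    return result;
}
```

Passing NULL models the period before any topology has been loaded, when the default is still returned.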

This commit was SVN r26322.
2012-04-24 17:31:06 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Mike Dubman
ff1c84c53f revert previous commit
This commit was SVN r26206.
2012-03-29 14:07:13 +00:00
Mike Dubman
43a5775e8a performance fix: set alignment for openib internal buffers
This commit was SVN r26205.
2012-03-29 14:00:08 +00:00
Jeff Squyres
028f471a20 Using the right env variable name helps!
This commit was SVN r26204.
2012-03-28 17:59:21 +00:00
Jeff Squyres
8a2df3311d Fixes trac:2812: check for env. markers indicating that we're in a
fakeroot.  If so, exit out of the pre-main hook immediately (without
calling functions such as stat, which will be replaced by fakeroot to
things that are not safe to call in a pre-main environment).

This commit was SVN r26203.

The following Trac tickets were found above:
  Ticket 2812 --> https://svn.open-mpi.org/trac/ompi/ticket/2812
2012-03-28 16:41:29 +00:00
Pavel Shamis
39a55df333 Adding exported libevent global variables to the opal_rename file.
Otherwise the variables cause name conflicts.

This commit was SVN r26201.
2012-03-27 17:26:21 +00:00
Ralph Castain
811413e9bc Correctly handle multiple cpu-set ranges. Correctly support optional binding directives combined with cpu-set.
This commit was SVN r26187.
2012-03-23 14:50:41 +00:00
Ralph Castain
ce0caf7567 Support -cpu-set by binding to the specified cpus in the absence of any other binding directive. Allows users to subdivide nodes for multiple parallel mpirun invocations.
This commit was SVN r26186.
2012-03-23 14:05:52 +00:00
Ralph Castain
6f6930eb66 Resolve infinite loop when -cpu-set is specified
This commit was SVN r26184.
2012-03-23 07:18:58 +00:00
Jeff Squyres
95148f3310 Don't force the use of libpci support in hwloc in the default case --
just let hwloc decide for itself.

This commit was SVN r26178.
2012-03-22 15:28:35 +00:00
Jeff Squyres
3bf038bb1c Per RFC from long ago:
http://www.open-mpi.org/community/lists/devel/2011/10/9784.php

Bring support for a DMTCP CRS module into the trunk.  See
http://dmtcp.sourceforge.net/ for a description of DMTCP.  Thanks to
the contribution from Alex Brick at Northeastern University, and all
the others up there who helped shepherd this into being ready to
submit.

This commit was SVN r26176.
2012-03-22 12:01:46 +00:00
Jeff Squyres
d30bbc2ef9 Fix an old issue: enable hwloc PCI detection except on SuSE 10 64 bit.
Worked with Oracle to verify that hwloc PCI detection is correctly
disabled on the SuSE 10/64 bit platform and is enabled by default on
all other platforms.  The --[en|dis]able-hwloc-pci switch is also
available for manual override of the configure decision about hwloc
PCI support.

This commit was SVN r26175.
2012-03-22 11:30:57 +00:00
Jeff Squyres
ab543fce58 We have no common components in opal any more, so we can remove this directory.
This commit was SVN r26169.
2012-03-20 21:21:49 +00:00
Jeff Squyres
0322db7cde Bring over r4402 from hwloc trunk.
This commit was SVN r26165.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4402
2012-03-19 16:39:54 +00:00
Jeff Squyres
aeca190744 Refs trac:3046: feedback from Brian -- don't set DYLD_LIBRARY_PATH.
This commit was SVN r26108.

The following Trac tickets were found above:
  Ticket 3046 --> https://svn.open-mpi.org/trac/ompi/ticket/3046
2012-03-07 13:12:22 +00:00
Ralph Castain
366f9d1518 Add some missing localities to the hwloc pretty-print, fix pmi modex
This commit was SVN r26105.
2012-03-06 06:21:10 +00:00
Jeff Squyres
f84c16bb65 Fixes trac:3043. Looks like some of the improvements to the hwloc132
hwloc component weren't reverse applied to the external hwloc
component.  Additionally, if we add stuff to LDFLAGS/LIBS, we also may
need to append (DY)LD_LIBRARY_PATH (here in this configure process
only), otherwise future configure tests may fail because they can't
find libhwloc.so (e.g., if you --with-hwloc=/some/path, we need to add
/some/path/lib to (DY)LD_LIBRARY_PATH).

This commit was SVN r26082.

The following Trac tickets were found above:
  Ticket 3043 --> https://svn.open-mpi.org/trac/ompi/ticket/3043
2012-03-02 20:15:07 +00:00
Jeff Squyres
e77653511b Bring in upstream hwloc v1.3 branch SVN commit r4345
This commit was SVN r26048.

The following SVN revision numbers were found above:
  r4345 --> open-mpi/ompi@b6c2a5b602
2012-02-24 13:57:18 +00:00
Jeff Squyres
f8f7f6b3ef Bring over upstream hwloc fix r4340
This commit was SVN r26037.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4340
2012-02-23 20:44:21 +00:00
Jeff Squyres
d0df08c953 Bring in upstream hwloc SVN r4319.
This commit was SVN r25987.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4319
2012-02-21 15:39:21 +00:00
Jeff Squyres
9f7b1d76cd Apply upstream hwloc fix; hwloc SVN r4314
This commit was SVN r25986.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4314
2012-02-21 15:10:40 +00:00
Brian Barrett
2d4bbfb083 Need to make sure that only the winning component sets the include file.
Easiest solution is to set the include in a POST_CONFIG macro based on
whether the configure system says the component was selected or not.

This commit was SVN r25968.
2012-02-20 16:45:54 +00:00
Brian Barrett
628aa0d84d Altix check needs to occur before linux check.
This commit was SVN r25967.
2012-02-20 16:07:41 +00:00
Ralph Castain
534d70025f Cleanup the detection of process binding during mpi_init. There are several cases that need to be checked:
1. no binding support - indicated by a negative return code from get_cpubind

2. binding supported, but not bound - the bitset returned by get_cpubind is the same as the available cpuset

3. binding supported and bound - bitset from get_cpubind is a subset of available cpuset

4. only one cpu is available - in this case, get_cpubind matches the available cpuset, but we are effectively bound
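
The four cases can be sketched as a single classification function. Cpusets are modelled here as 64-bit masks rather than hwloc bitmaps, and every `demo_` name is hypothetical; this illustrates the decision order, not the code in the commit.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DEMO_BIND_UNSUPPORTED,   /* case 1 */
    DEMO_BIND_NOT_BOUND,     /* case 2 */
    DEMO_BIND_BOUND          /* cases 3 and 4 */
} demo_bind_state_t;

/* Count the set bits in a mask. */
static int demo_popcount(uint64_t v)
{
    int n = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* rc: return code of the binding query; bound: the mask it returned;
 * available: the mask of cpus available to the process. */
static demo_bind_state_t demo_classify_binding(int rc, uint64_t bound,
                                               uint64_t available)
{
    if (rc < 0) {
        return DEMO_BIND_UNSUPPORTED;          /* case 1 */
    }
    assert(0 != available);
    if (1 == demo_popcount(available)) {
        return DEMO_BIND_BOUND;                /* case 4, before case 2 */
    }
    if (bound == available) {
        return DEMO_BIND_NOT_BOUND;            /* case 2 */
    }
    /* Case 3 (a proper subset). Any other mask is also treated as
     * bound here; that is an assumption of this sketch. */
    return DEMO_BIND_BOUND;
}
```

Case 4 has to be tested before case 2, because with a single available cpu the two masks are equal even though the process is effectively bound.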

This commit was SVN r25957.
2012-02-17 21:18:53 +00:00
Jeff Squyres
72e44cfefe Fixes trac:2951: make .../hwloc/include/autogen/config.h not be included
in the tarball.  Thanks to Paul Hargrove for the fix.

This commit was SVN r25952.

The following Trac tickets were found above:
  Ticket 2951 --> https://svn.open-mpi.org/trac/ompi/ticket/2951
2012-02-17 14:27:27 +00:00
Jeff Squyres
eb47a97025 After a bunch more back-n-forth with Paul Hargrove, hopefully this
visibility stuff will now be fixed!

This commit was SVN r25944.
2012-02-17 00:09:32 +00:00
Jeff Squyres
a055c5662c This is already ompi_ignore'd -- let's remove it.
This commit was SVN r25943.
2012-02-16 22:58:58 +00:00
Ralph Castain
7fd3ee6662 Ensure we see the correct config.h, and silence the warnings caused by duplicate defines
This commit was SVN r25938.
2012-02-16 02:50:57 +00:00
Jeff Squyres
14457accd7 Add hwloc 1.3.2 and ompi_ignore hwloc 1.3.1 (with the intent of
removing 1.3.1 in the near future).

This commit was SVN r25927.
2012-02-14 21:01:36 +00:00
Jeff Squyres
63a96e92b5 In a recent v1.5 branch issue, it took a while to figure out that
paffinity hwloc was returning "NOT_SUPPORTED" when the real problem
was that the underlying hwloc simply hadn't been initialized yet.  So
let's clearly delineate this case: return OPAL_ERR_NOT_INITIALIZED if
the underlying hwloc is not initialized.

This commit was SVN r25902.
2012-02-10 18:29:52 +00:00
Jeff Squyres
8d0bc199df hwloc131_module.c isn't necessary -- there's no module.
This commit was SVN r25901.
2012-02-10 18:09:19 +00:00
Jeff Squyres
6557d74e01 Make sure we get the entire hwloc tree, including IO devices.
This commit was SVN r25887.
2012-02-09 16:59:38 +00:00
Jeff Squyres
6dde3b6d86 Remove the old hwloc component; we bumped up to 1.3.1 a long time ago.
This commit was SVN r25885.
2012-02-09 12:27:00 +00:00
Ralph Castain
a3ab70c53f Correctly parse socket:core syntax in rankfile
This commit was SVN r25848.
2012-02-01 01:50:05 +00:00
Jeff Squyres
ba1b02dea0 Don't install this extra libevent file.
This commit was SVN r25808.
2012-01-27 20:38:00 +00:00
Jeff Squyres
9e9b06d9f7 Fixes trac:2844: ensure to take the value of --with(out)-memory-manager
into account when configuring the components of the framework.  If
--without-memory-manager was given, then we really don't want any
memory managers to be used.

This commit was SVN r25807.

The following Trac tickets were found above:
  Ticket 2844 --> https://svn.open-mpi.org/trac/ompi/ticket/2844
2012-01-27 18:05:48 +00:00
Ralph Castain
3f31feee6f Handle the case where a user's rankfile specifies only cpus, and not socket:cpu pairs.
This commit was SVN r25803.
2012-01-27 12:21:45 +00:00
Shiqing Fan
debe91aefa Change the syntax to be compatible with C++ compiler, as this has to be compiled as C++ on Windows. Thanks Ralph.
This commit was SVN r25785.
2012-01-26 14:53:45 +00:00
Jeff Squyres
e162945090 This script is generated and should not be in SVN.
This commit was SVN r25778.
2012-01-25 16:38:25 +00:00
Jeff Squyres
40e23e3979 Refs trac:2952: temporarily turn off hwloc PCI support because it causes a
problem on SuSE 10 (which might be related to Oracle's dual-bitness
builds, but we aren't completely sure yet).

So just turn it off for now, and bring this over to v1.5.  Find a
proper fix (that enables pci support properly) for trunk/v1.7 later.

This commit was SVN r25769.

The following Trac tickets were found above:
  Ticket 2952 --> https://svn.open-mpi.org/trac/ompi/ticket/2952
2012-01-24 15:07:41 +00:00
Jeff Squyres
6c6d19f5f5 We always want to add HWLOC_EMBEDDED_LIBS to the wrapper flags. It'll
either be empty or have meaningful stuff in it.

This commit was SVN r25761.
2012-01-21 02:57:17 +00:00
Jeff Squyres
6cad1f34e0 Bring r4182 from the hwloc v1.3 branch: fix static linking issues with
libhwloc_embedded.la.

This commit was SVN r25760.

The following SVN revision numbers were found above:
  r4182 --> open-mpi/ompi@b240395d9a
2012-01-21 02:56:42 +00:00
Jeff Squyres
878a0365be Bring over r4094 from the hwloc v1.3 branch: add missing HWLOC_PCI_LIBS
This commit was SVN r25759.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4094
2012-01-21 02:17:07 +00:00
Jeff Squyres
1d15d39fb8 Remove the libnuma component; the hwloc maffinity component does everything that this component used to do. Good riddance!
This commit was SVN r25749.
2012-01-19 23:53:05 +00:00
Jeff Squyres
45636b0558 Make hwloc 1.3.1 the default. Will likely remove 1.2.2ompi shortly.
This commit was SVN r25748.
2012-01-19 23:18:40 +00:00
Jeff Squyres
1a73ba6ce8 Note the upstream patches that we have in addition to stock hwloc 1.3.1.
This commit was SVN r25708.
2012-01-11 00:22:34 +00:00
Jeff Squyres
4243cb7af0 Bring over hwloc r4102 and r4104 for some upstream patches.
This commit was SVN r25707.

The following SVN revision numbers were found above:
  r4102 --> open-mpi/ompi@8961ca568d

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4104
2012-01-11 00:21:47 +00:00
Jeff Squyres
50e5b0937c Add hwloc 1.3.1, but it is not yet the default -- it is currently
.ompi_ignored to allow other developers to test with it.  It is
expected that we'll remove the .ompi_ignore here soon, and
simultaneously remove the hwloc 1.2.2ompi component.

There was one very minor patch added to stock hwloc 1.3.1 in
hwloc/config/hwloc.m4:

{{{
--- hwloc-1.3.1/config/hwloc.m4	2011-12-14 
+++ ompi3/opal/mca/hwloc/hwloc131/hwloc/config/hwloc.m4
@@ -583,6 +583,7 @@
         ])
     fi
     AC_SUBST(HWLOC_PCI_LIBS)
+    HWLOC_LIBS="$HWLOC_LIBS $HWLOC_PCI_LIBS"
     # If we asked for pci support but couldn't deliver, fail
     AS_IF([test "$enable_pci" = "yes" -a "$hwloc_pci_happy" = "no"],
           [AC_MSG_WARN([Specified --enable-pci switch, but could
	   not])
}}}

This will be pushed upstream to hwloc.

This commit was SVN r25706.
2012-01-10 23:38:14 +00:00
Samuel Gutierrez
0ca6603fa0 remove some unused cruft in shmem. minor common sm cleanup.
This commit was SVN r25665.
2011-12-16 22:43:55 +00:00
Jeff Squyres
1d3dc0af28 Gah! opal_shmem_base_register_params() ''wasn't'' added for the mmap
on NFS warning -- it was already there!  So put it back so that it can
register base_verbose and RUNTIME_QUERY_hint.

This commit was SVN r25663.
2011-12-15 21:14:34 +00:00
Jeff Squyres
9cef715194 Updates to r25652 -- put this MCA param in the shmem/mmap component.
No need for it to be in the base (we mistakenly thought it was used in
multiple shmem components).

This commit was SVN r25662.

The following SVN revision numbers were found above:
  r25652 --> open-mpi/ompi@7e223b5799
2011-12-15 20:41:14 +00:00
Ralph Castain
7e223b5799 Okay, okay...stop the whining! Put the mca param registration in the shmem base.
This commit was SVN r25652.
2011-12-14 22:25:32 +00:00
Ralph Castain
4303958968 Allow users to silence warning
This commit was SVN r25650.
2011-12-14 21:50:34 +00:00
Shiqing Fan
a58e4ae809 Add a missing .windows file into the tarball.
This commit was SVN r25638.
2011-12-14 13:29:26 +00:00
Jeff Squyres
efd8106d0a Fix typo in help message
This commit was SVN r25628.
2011-12-13 21:55:48 +00:00
Samuel Gutierrez
8b9bb66b1c fixes shared memory in windows.
This commit was SVN r25623.
2011-12-12 22:16:31 +00:00
Nathan Hjelm
239e9c8740 clean up tabs
This commit was SVN r25622.
2011-12-12 20:54:14 +00:00
Nathan Hjelm
885d5cbcf8 enable ptmalloc when using uGNI
This commit was SVN r25621.
2011-12-12 20:52:51 +00:00
Samuel Gutierrez
de8d3a4f79 more windows shmem updates. maybe this is closer..?
This commit was SVN r25607.
2011-12-09 06:34:06 +00:00
Samuel Gutierrez
989d75bfec some more windows update... this may really break things :-(.
This commit was SVN r25592.
2011-12-08 00:10:09 +00:00
Samuel Gutierrez
dcb965e60e Some shmem Windows updates. I'm not sure if this makes a difference. Shiqing - can you please test?
This commit was SVN r25591.
2011-12-07 21:56:27 +00:00
Josh Hursey
076db435cd * Fixes trac:2807 : Improve the BLCR configure option so that it checks if the {{{--with-blcr}}} option is specified, but not {{{--with-ft=cr}}}, then configure returns an error (since this is clearly not what the user intended).
This commit was SVN r25590.

The following Trac tickets were found above:
  Ticket 2807 --> https://svn.open-mpi.org/trac/ompi/ticket/2807
2011-12-07 21:36:03 +00:00
Josh Hursey
58938b2f50 * Clarified show help when CRS component cannot be loaded.
* Fixes trac:2329 : Improves the error message, and ensures opal-restart will not segv in opal_finalize.

This commit was SVN r25586.

The following Trac tickets were found above:
  Ticket 2329 --> https://svn.open-mpi.org/trac/ompi/ticket/2329
2011-12-07 14:58:08 +00:00
Ralph Castain
6fefe236a4 Warn users if they set opal_paffinity_alone, either to true or false, that this parameter is no longer functional - they must use the --bind-to option and its corresponding mca param.
This commit was SVN r25567.
2011-12-03 01:10:52 +00:00
Jeff Squyres
93a797cc33 r25511 didn't fully fix the OPAL_CHECK_VISIBILITY issue; we also need
to -I the opal/config directory so that autoregen can find our .m4
files. 

This commit was SVN r25564.

The following SVN revision numbers were found above:
  r25511 --> open-mpi/ompi@5751c45916
2011-12-02 19:45:39 +00:00
Jeff Squyres
ecf6ba910c Silence a few icc warnings about mixing enums with other types.
This commit was SVN r25560.
2011-12-02 13:18:54 +00:00
Jeff Squyres
6fbbfd0f7a Gah! r25545 accidentally included ''waaaay'' more stuff than it was
supposed to.  I.e., half-baked/not complete stuff.

This commit backs out all of r25545.  Sorry folks!

This commit was SVN r25546.

The following SVN revision numbers were found above:
  r25545 --> open-mpi/ompi@7f9ae11faf
2011-11-29 23:24:52 +00:00
Jeff Squyres
7f9ae11faf Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php,
to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS
X, we need to use the following compiler (linker) flag:

    -Wl,-commons,use_dylibs 

So if we're compiling on OS X, test to see if that flag works with the
compiler.  If so, add it to the wrapper FFLAGS and FCFLAGS (note that
per a future update, we'll only have one Fortran compiler anyway).

Fixes trac:1982.  

This commit was SVN r25545.

The following Trac tickets were found above:
  Ticket 1982 --> https://svn.open-mpi.org/trac/ompi/ticket/1982
2011-11-29 23:05:54 +00:00
Samuel Gutierrez
375162c693 this commit fixes a few things. 1. silence warning in common sm. 2. remove unneeded config code in common sm. 3. move opal_shmem_base_close to a better place in opal_finalize. 4. fix opal_path_nfs output.
This commit was SVN r25518.
2011-11-28 23:41:19 +00:00
Samuel Gutierrez
b4edf0ff5c getting ready for 1.5 port of the shared memory enhancements. remove some unused/unneeded stuff and minor style update.
This commit was SVN r25513.
2011-11-28 16:08:32 +00:00
George Bosilca
5751c45916 Actually ... OMPI is lacking visibility, as it was all moved down in OPAL.
This commit was SVN r25511.
2011-11-28 04:27:06 +00:00
Terry Dontje
1f53b32216 This commit fixes trac:2917 by using the cleaned-up version of check_visibility that is in the hwloc trunk repo.
This commit was SVN r25495.

The following Trac tickets were found above:
  Ticket 2917 --> https://svn.open-mpi.org/trac/ompi/ticket/2917
2011-11-22 00:01:09 +00:00
George Bosilca
61f273b987 Do not tolerate uninitialized variables.
This commit was SVN r25489.
2011-11-18 10:19:24 +00:00
Samuel Gutierrez
15249dfc01 rename some variables following ompi's naming convention.
This commit was SVN r25487.
2011-11-17 19:22:58 +00:00
Samuel Gutierrez
47499d1d3d fixes CID 2256.
This commit was SVN r25486.
2011-11-17 18:06:28 +00:00
Samuel Gutierrez
c1012f502f added backing file relocation capability in shmem mmap. two new mca
parameters control its behavior (shmem_mmap_relocate_backing_file and
shmem_mmap_backing_file_base_dir).

This commit was SVN r25480.
2011-11-16 01:38:23 +00:00
Ralph Castain
6310361532 At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here:
https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement

The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation.

In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions:

1. I have at long last bowed my head in submission to the system admins of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior.

2. both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation.

3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so.

As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes.
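
The new oversubscription default from item 1 can be sketched as a small policy function. The enum and function names are hypothetical and do not correspond to actual ORTE symbols; the override argument stands in for whatever flags the user passed.

```c
#include <assert.h>

typedef enum {
    DEMO_ALLOC_HOSTFILE,          /* nodes listed in a hostfile */
    DEMO_ALLOC_DASH_HOST,         /* nodes given on the command line */
    DEMO_ALLOC_RESOURCE_MANAGER   /* nodes handed out by a resource manager */
} demo_alloc_source_t;

typedef enum {
    DEMO_OVERRIDE_NONE,           /* user gave no flag: use the default */
    DEMO_OVERRIDE_ALLOW,
    DEMO_OVERRIDE_FORBID
} demo_override_t;

/* Return 1 if oversubscription is allowed, 0 otherwise. */
static int demo_oversubscribe_allowed(demo_alloc_source_t source,
                                      demo_override_t user_override)
{
    assert((int)user_override >= 0 &&
           (int)user_override <= (int)DEMO_OVERRIDE_FORBID);

    /* An explicit user flag always wins over the default. */
    if (DEMO_OVERRIDE_ALLOW == user_override) {
        return 1;
    }
    if (DEMO_OVERRIDE_FORBID == user_override) {
        return 0;
    }
    /* Default: allowed for hostfile/dash-host, not for RM allocations. */
    return DEMO_ALLOC_RESOURCE_MANAGER != source;
}
```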

This commit was SVN r25476.
2011-11-15 03:40:11 +00:00
Shiqing Fan
fc46ed6438 Use the new libevent on Windows.
This commit was SVN r25462.
2011-11-09 14:05:35 +00:00
George Bosilca
bd55a19db4 Disable vector optimization for ICC v12.1.0 release 2011.6.233.
This commit was SVN r25461.
2011-11-08 21:23:30 +00:00
Jeff Squyres
78538b701d Turns out that we're not even using that $includedir properly, so just
remove it.

This commit was SVN r25458.
2011-11-08 03:18:09 +00:00
Ralph Castain
a931c5b1eb Redo a patch from late last night that replaces libevent 2.0.7 with libevent 2.0.13 as our default event library. Cleanup the libevent renames to correctly state 2013 as our new version.
This commit was SVN r25457.
2011-11-08 01:36:15 +00:00
Ralph Castain
a3ce355a60 Revert r25453 and r25450 until we can fix the libevent2013 configure code - still not getting the includedir to eval correctly.
This commit was SVN r25454.

The following SVN revision numbers were found above:
  r25450 --> open-mpi/ompi@7f7d5c4f1f
  r25453 --> open-mpi/ompi@c9fe8c32e2
2011-11-07 16:23:44 +00:00
Ralph Castain
c9fe8c32e2 Fix a bug in the configure.m4 to ensure we disable unused libevent modes. Edit their Makefile.am to remove bufferevent support since we don't use it either.
Sorry for mid-day correction.

This commit was SVN r25453.
2011-11-07 15:21:29 +00:00
Nathan Hjelm
7f7d5c4f1f RFC: upgrade to libevent 2.0.13 (removing 2.0.7) timeout. Removed libevent 2.0.7
This commit was SVN r25450.
2011-11-07 04:32:36 +00:00
Jeff Squyres
f08b8bf2d4 Per this thread:
http://www.open-mpi.org/community/lists/devel/2011/10/9878.php

I am making a final decision about what happens
when an MCA parameter is re-registered and changes types.  In
developer builds (i.e., OPAL_ENABLE_DEBUG==1), a show_help message
will be displayed.  In all builds, an error status will be returned.
Specifically, the logic looks like this:

{{{
    if (detect_re-registration_with_type_change) {
#if OPAL_ENABLE_DEBUG
        opal_show_help(...);
#endif
        return OPAL_ERR_VALUE_OUT_OF_BOUNDS;
    }
}}}

If someone would like to change this behavior, they are welcome to do
so.  :-) I am committing this so that ''some'' action occurs (rather
than talking about the issue and then nothing happens).
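
For readers who want something compilable, the logic can be sketched against a toy parameter table. Everything prefixed `DEMO_`/`demo_` is hypothetical, including the error values; the real registry and `opal_show_help()` are replaced by a fixed array and `fprintf`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_ENABLE_DEBUG 1               /* stand-in for a developer build */
#define DEMO_SUCCESS 0
#define DEMO_ERR_VALUE_OUT_OF_BOUNDS (-1) /* hypothetical numeric values */
#define DEMO_ERR_OUT_OF_RESOURCE (-2)
#define DEMO_MAX_PARAMS 16

typedef enum { DEMO_TYPE_INT, DEMO_TYPE_STRING } demo_type_t;

struct demo_param {
    const char *name;   /* not copied: caller keeps the string alive */
    demo_type_t type;
};

static struct demo_param demo_params[DEMO_MAX_PARAMS];
static int demo_num_params = 0;

/* Register a parameter; re-registering with a new type is an error. */
static int demo_register(const char *name, demo_type_t type)
{
    int i;

    assert(NULL != name);
    for (i = 0; i < demo_num_params; ++i) {
        if (0 == strcmp(demo_params[i].name, name)) {
            if (demo_params[i].type != type) {
#if DEMO_ENABLE_DEBUG
                /* Developer builds also print a diagnostic. */
                fprintf(stderr, "%s: re-registered with a new type\n", name);
#endif
                return DEMO_ERR_VALUE_OUT_OF_BOUNDS;
            }
            return DEMO_SUCCESS;   /* same type: nothing to do */
        }
    }
    if (demo_num_params >= DEMO_MAX_PARAMS) {
        return DEMO_ERR_OUT_OF_RESOURCE;
    }
    demo_params[demo_num_params].name = name;
    demo_params[demo_num_params].type = type;
    ++demo_num_params;
    return DEMO_SUCCESS;
}
```

The error is returned in every build; only the printed message depends on the debug switch.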

This commit was SVN r25432.
2011-11-04 14:16:49 +00:00
Jeff Squyres
886a9d589b Custom patch from Brice for the hwloc-1.2.2ompi distro, per an issue
that Chris Yeoh/IBM found.  See the thread below for more info:

  http://www.open-mpi.org/community/lists/hwloc-devel/2011/11/2521.php

This commit was SVN r25429.
2011-11-03 14:53:22 +00:00
Jeff Squyres
4fe26b0392 Fix some minor memory leaks
This commit was SVN r25410.
2011-11-01 20:22:26 +00:00
Ralph Castain
4368199c86 Missing include
This commit was SVN r25402.
2011-10-31 13:39:57 +00:00
Ralph Castain
96332a2859 Fix typo
This commit was SVN r25400.
2011-10-30 13:23:42 +00:00
Ralph Castain
71ed8e3cd3 Bring back the local node's binding capabilities along with its topology. Clean up indentation.
This commit was SVN r25399.
2011-10-30 13:20:16 +00:00
Ralph Castain
7ba4675adf Bring over some useful utilities and definitions for working with hwloc inside ORTE/OMPI. Cache frequently computed info to save processing time when handling multiple nodes with the same topology. Deal with available cpus as defined by online vs allowed vs user-specified limits. Help deal with hwloc's unfortunate decision to lump all caches in the same object type.
This commit was SVN r25393.
2011-10-29 14:58:58 +00:00
Jeff Squyres
6092b50ebb Fix the cases where the default values of MCA params were not always
handled properly when MCA parameters are re-registered and their types
change.  Specifically, this case was broken:

 1. Register an int MCA param with a non-zero default value
 1. Re-register the same MCA param as a string with a NULL default value

The 2nd step would cause a segv because the first int default value
wasn't being reset properly.  Here's sample code that shows the issue:

{{{
{
    int ibogus;
    char *sbogus;
    opal_init(&argc, &argv);
    mca_base_param_reg_int_name("type", "name", "help", false, false, 3, &ibogus);
    printf("Ibogus: %d\n", ibogus);
    mca_base_param_reg_string_name("type", "name", "help", false, false, NULL, &sbogus);
    printf("Sbogus: %s\n", (NULL == sbogus) ? "NULL" : sbogus);
    exit(0);
}
}}}

This commit fixes the problem from the sample code above as well as
a similar issue for file-set MCA params and override values.  It
also resets default values for MCA params initially registered as a
string but then re-registered as an int.
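
One way such a reset could look is sketched below, using a tagged union in place of the real MCA parameter storage. The names are hypothetical and this is not the code from the commit; it only shows why the old default has to be cleared when the type changes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { DEMO_INT, DEMO_STRING } demo_type_t;

struct demo_param {
    demo_type_t type;
    union {
        int intval;
        const char *stringval;
    } default_value;
};

/* Initial registration as an int with the given default. */
static void demo_register_int(struct demo_param *p, int def)
{
    assert(NULL != p);
    memset(&p->default_value, 0, sizeof(p->default_value));
    p->type = DEMO_INT;
    p->default_value.intval = def;
}

/* Re-registration as a string: clear the old storage first so that
 * leftover int bits are never read back as a pointer. */
static void demo_reregister_string(struct demo_param *p, const char *def)
{
    assert(NULL != p);
    memset(&p->default_value, 0, sizeof(p->default_value));
    p->type = DEMO_STRING;
    p->default_value.stringval = def;
}

/* Return the string default, or NULL if the param is not a string. */
static const char *demo_string_default(const struct demo_param *p)
{
    assert(NULL != p);
    return (DEMO_STRING == p->type) ? p->default_value.stringval : NULL;
}
```

This mirrors the sample sequence: an int default of 3 followed by a string re-registration with a NULL default.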

This commit was SVN r25392.
2011-10-29 12:29:31 +00:00
Ralph Castain
21d45b0807 Just some cleanup in case of error
This commit was SVN r25387.
2011-10-29 01:55:19 +00:00
Ralph Castain
a7cbc25658 Minor cleanups - check hwloc returns everywhere. Thanks to Chris Yeoh for pointing this out.
This commit was SVN r25360.
2011-10-24 14:05:26 +00:00
Shiqing Fan
5711414eb7 Fix Windows build
This commit was SVN r25351.
2011-10-21 14:46:58 +00:00
Jeff Squyres
cbafea8f69 Add a DEPENDENCIES line so that if you edit something down in the
hwloc tree, it'll get picked up by the component (and therefore by
libopen-pal).

Thanks to Terry for finding the problem.

This commit was SVN r25349.
2011-10-21 11:39:52 +00:00
Ralph Castain
b44f8d4b28 Complete implementation of the ess.proc_get_locality API. Up to this point, the API was only capable of telling if the specified proc was sharing a node with you. However, the returned value was capable of telling you much more detailed info - e.g., if the proc shares a socket, a cache, or numa node. We just didn't have the data to provide that detail.
Use hwloc to obtain the cpuset for each process during mpi_init, and share that info in the modex. As it arrives, use a new opal_hwloc_base utility function to parse the value against the local proc's cpuset and determine where they overlap. Cache the value in the pmap object as it may be referenced multiple times.

Thus, the return value from orte_ess.proc_get_locality is a 16-bit bitmask that describes the resources being shared with you. This bitmask can be tested using the macros in opal/mca/paffinity/paffinity.h

Locality is available for all procs, whether launched via mpirun or directly with an external launcher such as slurm or aprun.
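
As an illustration of how such a locality bitmask might be tested, here is a minimal sketch. The type, flag values, and macro names below are hypothetical and chosen only for this example; the real definitions are the ones in opal/mca/paffinity/paffinity.h and may differ.

```c
#include <stdint.h>

/* Hypothetical 16-bit locality type and flag values, for illustration
 * only; they are NOT the definitions from paffinity.h. */
typedef uint16_t example_locality_t;

#define EXAMPLE_ON_NODE    0x0001
#define EXAMPLE_ON_NUMA    0x0002
#define EXAMPLE_ON_SOCKET  0x0004
#define EXAMPLE_ON_CACHE   0x0008

/* True if the given resource bit is set in the locality value. */
#define EXAMPLE_SHARES(loc, flag) (((loc) & (flag)) != 0)

/* Return a label for the closest resource shared with the peer,
 * checking from the most specific level to the least specific. */
static const char *example_closest_shared(example_locality_t loc)
{
    if (EXAMPLE_SHARES(loc, EXAMPLE_ON_CACHE))  return "cache";
    if (EXAMPLE_SHARES(loc, EXAMPLE_ON_SOCKET)) return "socket";
    if (EXAMPLE_SHARES(loc, EXAMPLE_ON_NUMA))   return "numa";
    if (EXAMPLE_SHARES(loc, EXAMPLE_ON_NODE))   return "node";
    return "none";
}
```

A caller would pass the value returned for a peer and branch on the result, for example to prefer a shared-memory path only when at least the node bit is set.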

This commit was SVN r25331.
2011-10-19 20:18:14 +00:00
Rainer Keller
ec6ac33b75 - On Linux x86-64 with intel compiler v12.1, any ompi-app fails before
calling main():
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> which ompi_info
~/openmpi-1.5.4/COMPILE-intel-12.1.0/usr/bin/ompi_info
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> ompi_info
Segmentation fault
xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> gdb usr/bin/ompi_info
...
(gdb) run
Starting program:
...
Program received signal SIGSEGV, Segmentation fault.
opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at
../../../../../opal/mca/memory/linux/malloc.c:4080
4080          /* remove from unsorted list */
(gdb) where
#0  opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at
../../../../../opal/mca/memory/linux/malloc.c:4080
#1  0x00007ffff7c232b9 in opal_memory_linux_malloc_hook
(sz=140737354040280, caller=0x1006) at
../../../../../opal/mca/memory/linux/hooks.c:687
#2  0x0000003dd96a6871 in __alloc_dir () from /lib64/libc.so.6
#3  0x0000003ddfa053cd in ?? () from /usr/lib64/libnuma.so.1
#4  0x0000003dd8e0e445 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    A lot of combinations and trials have been done, yet to no avail.
    Intel v11.0 worked...

    Thanks to Hubert Haberstock (Intel) for providing the hint in:
    http://software.intel.com/en-us/forums/showthread.php?t=87132

    This was tested on openmpi-1.5.4 and therefore should
    cmr:v1.5

This commit was SVN r25290.
2011-10-14 20:47:08 +00:00
Ralph Castain
69a0882207 Correctly setup hwloc when passing a topology from an external source
This commit was SVN r25277.
2011-10-12 21:34:46 +00:00
Jeff Squyres
ff97b57c90 Change the names to be slightly more descriptive.
This commit was SVN r25271.
2011-10-12 16:07:09 +00:00
Jeff Squyres
951c745590 We always have hwloc xml support (now that it's built into hwloc
without needing libxml2).  So OPAL_HAVE_HWLOC_XML is no longer
necessary.  

This commit was SVN r25263.
2011-10-11 20:20:59 +00:00
Jeff Squyres
e4f8b662a1 Remove this component; it was wholly superseded by hwloc122ompi a
little while ago.

This commit was SVN r25261.
2011-10-11 20:13:33 +00:00
Jeff Squyres
7fd3a7f696 Remove no-longer-true comment. Process-wide memory affinity policy is
set by calling opal_hwloc_base_set_process_membind_policy().

This commit was SVN r25260.
2011-10-11 20:00:45 +00:00
Ralph Castain
8c4512a994 Fix the verbose output for caches (again) so they are properly labeled, pending adoption of the upstream patch we supplied.
This commit was SVN r25251.
2011-10-11 05:54:26 +00:00
Swen Boehm
08b4322a1a patched the lex files to not issue the following compiler warning:
'yyunput' defined but not used

This commit was SVN r25246.
2011-10-10 18:13:04 +00:00
George Bosilca
649af6c925 Enumerated mixed with another type (int) is tolerated but
easily fixable.

This commit was SVN r25241.
2011-10-09 03:54:52 +00:00
George Bosilca
9d68d7c0c8 Fix a bunch of warnings.
This commit was SVN r25227.
2011-10-03 18:46:49 +00:00
George Bosilca
b4c076ad28 Remove an unused function.
This commit was SVN r25226.
2011-10-03 18:46:27 +00:00
Jeff Squyres
34deb0db97 Sync with final hwloc 1.2.2 release
This commit was SVN r25221.
2011-10-03 14:12:38 +00:00
Jeff Squyres
6a32aa4a04 Oops -- it looks like we ''do'' still use this variable in the
trunk... 

This commit was SVN r25203.
2011-09-28 12:12:37 +00:00
Jeff Squyres
bc3e213a69 After fixing an svn/hg kerfuffle, there are a few files left over from
last night's hwloc/paffinity/maffinity minor update.  Nothing huge;
just a little cleanup.

This commit was SVN r25202.
2011-09-28 11:46:28 +00:00
Jeff Squyres
9fa2130cfb Fix typo that prevents VPATH builds.
This commit was SVN r25201.
2011-09-28 11:29:12 +00:00
Jeff Squyres
970a75a7b6 Update to a custom OMPI roll of hwloc v1.2.2. Upgrade the configury to
match similar stuff in the event framework; only add CPPFLAGS /
LDFLAGS / LIBS / and WRAPPER_EXTRA_* of the same for the one, single,
winning component (because this framework is compile-time,
one-of-many).

This commit was SVN r25199.
2011-09-27 23:54:09 +00:00
Jeff Squyres
3d61d0f357 Fix up some long-latency bugs in the MCA event framework configury that
only became evident when there was more than one event component.

The libevent2013 component is still ompi_ignore'd for most developers.

This commit was SVN r25198.
2011-09-27 23:18:07 +00:00
Jeff Squyres
d4603f080d Refs trac:2854.
Since hwloc has a dynamic bitmap size, it could actually have bits set
that will not fit in the paffinity mask.  We already made sure that we
didn't overrun the paffinity mask; now also set the return value to
OPAL_ERR_VALUE_OUT_OF_BOUNDS (wow, we really thought of everything
with those error codes, eh?) if the hwloc bitmap has bits set higher
than what will fit into the paffinity bitmask.
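
The behavior described above can be sketched roughly as follows. This is a simplified stand-in, not the actual component code: the function name, the error constants, and the 64-bit mask width are assumptions made for the example, and the dynamic bitmap is represented as a plain array of set-bit indices.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes and mask width, for illustration only. */
#define EXAMPLE_SUCCESS            0
#define EXAMPLE_ERR_OUT_OF_BOUNDS  (-1)
#define EXAMPLE_MASK_BITS          64u

/* Copy the set bits of a dynamically sized bitmap (given here as an
 * array of bit indices) into a fixed-size mask. Bits that do not fit
 * are skipped so the mask is never overrun, and their presence is
 * reported through the return value. */
static int example_fill_mask(const unsigned *set_bits, size_t count,
                             uint64_t *mask)
{
    int rc = EXAMPLE_SUCCESS;

    *mask = 0;
    for (size_t i = 0; i < count; ++i) {
        if (set_bits[i] >= EXAMPLE_MASK_BITS) {
            /* Remember the overflow but keep copying the bits that fit. */
            rc = EXAMPLE_ERR_OUT_OF_BOUNDS;
            continue;
        }
        *mask |= (uint64_t)1 << set_bits[i];
    }
    return rc;
}
```

With this shape, a caller still receives a usable (truncated) mask and can decide whether an out-of-bounds result is fatal or only worth a warning.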

This commit was SVN r25179.

The following Trac tickets were found above:
  Ticket 2854 --> https://svn.open-mpi.org/trac/ompi/ticket/2854
2011-09-24 13:52:27 +00:00
Jeff Squyres
57323570e3 These calls were mistakenly added in r25164.
This commit was SVN r25176.

The following SVN revision numbers were found above:
  r25164 --> open-mpi/ompi@51129cc2a8
2011-09-22 18:20:30 +00:00
Jeff Squyres
82c93611e6 Fix some problems with the libevent and hwloc frameworks:
* change components from setting <framework>_base_include to
   opal_<framework>_<component>_include; the framework m4 will figure
   out the winning component and pick the right "include" shell
   variable.  Ditto for the other shell variables (cppflags, ldflags,
   etc.). 
 * misc fixes to hwloc/external
 * add a bunch of missing "opal_" prefixes to shell variables
 * add a few more / update a few comments in framework m4's

This commit was SVN r25174.
2011-09-21 23:06:13 +00:00
Ralph Castain
a818101d19 Silence compiler warning
This commit was SVN r25172.
2011-09-21 16:39:59 +00:00
Shiqing Fan
3e0ee394ef Select only one libevent component.
This commit was SVN r25169.
2011-09-20 16:10:01 +00:00
Shiqing Fan
4caed984ed Need to exclude another file for windows build.
This commit was SVN r25168.
2011-09-20 16:09:03 +00:00
Ralph Castain
51129cc2a8 If built without hwloc xml support, we cannot currently pass the local topology from the daemon to an MPI app. This makes it impossible to set affinity, for example. In this case, have the app get its own copy of the topology at startup.
For safety's sake, protect hwloc-based affinity modules from NULL topology

This commit was SVN r25164.
2011-09-20 14:46:55 +00:00
Ralph Castain
052ccd4b1e Set ignores
This commit was SVN r25163.
2011-09-20 13:47:05 +00:00
Ralph Castain
45396d8f9c Add missing files
This commit was SVN r25162.
2011-09-20 13:37:22 +00:00
Nathan Hjelm
7cd8f21b7f add libevent 2.0.13 module
This commit was SVN r25161.
2011-09-20 00:13:05 +00:00
Jeff Squyres
9db4542c2b Move maffinity_base_alloc_policy and
maffinity_base_bind_failure_action MCA params to the hwloc base
(hwloc_base_alloc_policy and hwloc_base_bind_failure_action).  Since
these MCA parameters were never on a release branch, I'm just
moving/renaming them outright and not leaving aliases to the old
names.

Note that some upper layer needs to call
opal_hwloc_base_set_process_membind_policy() to set the
set-by-MCA-param process-wide memory affinity policy.  We can't do
this automatically during hwloc_base_open() because, for reasons
described elsewhere, opal_hwloc_topology is not automatically filled
during hwloc_base_open() (in short: potential scalability issues when
launching many MPI processes simultaneously on a single machine, for
example).

This commit was SVN r25156.
2011-09-19 16:10:37 +00:00
Jeff Squyres
dc70100cee In reviewing CMR #2866, it was noticed that the maffinity/hwloc and
paffinity/hwloc components were still calling hwloc_topology_init/load
themselves, and not using the opal_hwloc_topology.  Doh!

This commit fixes that -- these 2 components no longer have their own
copy of the topology tree; they just use opal_hwloc_topology.

This commit was SVN r25151.
2011-09-17 13:13:36 +00:00
Jeff Squyres
ecd603256a * Rename opal_hwloc_components to opal_hwloc_base_components
* Fix some comments

This commit was SVN r25150.
2011-09-17 11:54:36 +00:00
Jeff Squyres
4a2cf81c6f Fixes to ensure that dependent libraries are carried forward from the embedded hwloc
This commit was SVN r25140.
2011-09-13 22:43:39 +00:00
Jeff Squyres
d6682523f6 Put in proper basename so that "make dist" can find it.
This commit was SVN r25135.
2011-09-13 11:09:56 +00:00
Jeff Squyres
4771c36061 * With some m4 trickery, if no form of --with-hwloc is specified on
the command line, hwloc is just like any other external dependency
   in OMPI: if we find it, we'll use it. If we don't find it, we'll
   ignore it.  See comments in opal/mca/hwloc/configure.m4 for an
   explanation. 
 * Fix some copy-n-paste errors in opal/mca/hwloc/configure.m4
   w.r.t. flags coming in from the winning component.
 * Add another line in ompi_info's output about whether hwloc support
   is included or not.

This commit was SVN r25134.
2011-09-13 00:39:14 +00:00
Jeff Squyres
7dc352a328 Add some notes about maintaining the hwloc framework.
This commit was SVN r25132.
2011-09-12 19:40:18 +00:00
Jeff Squyres
c5bfa09574 Fixes from Brice Goglin, post hwloc v1.2.1 for AMD Magny-Cours. See
http://www.open-mpi.org/community/lists/users/2011/09/17164.php.

This commit was SVN r25131.
2011-09-12 19:03:48 +00:00
Shiqing Fan
b61eed801f Fix the problem of building hwloc on Windows. Temporarily not using it for Windows.
This commit was SVN r25128.
2011-09-12 13:55:34 +00:00
Ralph Castain
6460fe5480 Silence warning
This commit was SVN r25127.
2011-09-12 13:32:21 +00:00
Ralph Castain
92c7372e20 Per the RFC from Jeff, move hwloc from opal/mca/common to its own static framework ala libevent. Have ORTE daemons collect the topology info at startup and, if --enable-hwloc-xml is set, send that info back to the HNP for later use. The HNP only retains unique topology "templates" to reduce memory footprint. Have the daemon include the local topology info in the nidmap buffer sent to each app so the apps don't all hammer the local system to discover it for themselves.
Remove the sysinfo framework as hwloc replaces that functionality.
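
One way to picture the "unique templates" idea is a small store that hands back an existing entry when an identical topology has already been seen. The sketch below is only an illustration under stated assumptions: the type and function names are hypothetical, and comparing topologies by a string signature is an assumption of this example, not a description of how the HNP actually compares them.

```c
#include <string.h>

/* Hypothetical fixed-capacity store, for illustration only. */
#define EXAMPLE_MAX_TEMPLATES 16

typedef struct {
    const char *sigs[EXAMPLE_MAX_TEMPLATES]; /* one signature per template */
    int count;                               /* number of stored templates */
} example_template_store_t;

/* Return the index of the template matching sig, adding a new entry
 * only if no identical one is stored yet. Returns -1 if the store is
 * full. The caller keeps ownership of the signature string. */
static int example_intern_topology(example_template_store_t *store,
                                   const char *sig)
{
    for (int i = 0; i < store->count; ++i) {
        if (0 == strcmp(store->sigs[i], sig)) {
            return i; /* already known: reuse the existing template */
        }
    }
    if (store->count == EXAMPLE_MAX_TEMPLATES) {
        return -1;
    }
    store->sigs[store->count] = sig;
    return store->count++;
}
```

On a homogeneous cluster every daemon would map to the same index, so only one template is kept regardless of the number of nodes.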

This commit was SVN r25124.
2011-09-11 19:02:24 +00:00
Ralph Castain
b2971df7df Ensure we loop over all CPUs.
Thanks to Nadia Derbey for spotting the error.

This commit was SVN r25102.
2011-08-29 14:24:34 +00:00
Eugene Loh
55a7b474dd Change a stray __volatile to __volatile__.
This commit was SVN r25092.
2011-08-26 15:36:10 +00:00
Jeff Squyres
495ceef60d Upgrade hwloc to v1.2.1.
This commit was SVN r25088.
2011-08-26 13:14:26 +00:00