Commit graph

615 commits

Author SHA1 Message Date
Jeff Squyres
21178f9379 Remove the "linux" paffinity component (i.e., the one that was based
on the now-defunct PLPA) -- the new hwloc component supersedes it.  

So long, PLPA -- we loved ya!

This commit was SVN r23126.
2010-05-13 23:59:21 +00:00
Jeff Squyres
3129ccd9ec Make the hwloc paffinity component available for everyone. hwloc
supports a wide variety of operating systems and platforms; see the
opal/mca/paffinity/hwloc/hwloc/README file for details.

This component includes an embedded copy of hwloc, currently based on
hwloc-1.0rc6.  But note that hwloc is properly SVN imported into the
/vendor branch, so it will be easy to update when 1.0 GA is released.
Note that the hwloc tree embedded in opal/mca/paffinity/hwloc/hwloc is
identical to a hwloc distribution tarball, except that much of the
documentation was rm -rf'ed (because we don't need it for the embedded
case).

Since the paffinity framework currently does not understand hardware
threads, the hwloc component compensates for this by identifying cores
by the "first" hardware thread on that core.  Hopefully we'll update
paffinity someday to understand hardware threads.  :-)

configure grew a --with-hwloc option, analogous to what we do for many
other external libraries that OMPI supports.  However, there's a new
feature: due to the request of several distros, OMPI can be configured
to build with its internal copy of hwloc or with an external copy of
hwloc (e.g., a system-installed hwloc).

 1. If --with-hwloc is not specified, Open MPI will try to use its
    internal copy (but silently fail/ignore hwloc if that fails).
 2. If --with-hwloc=<dir> is supplied, Open MPI looks for hwloc
    support in <dir> (and --with-hwloc-libdir=<dir>, if specified).
 3. If --with-hwloc=external is supplied, Open MPI will look for hwloc
    in a compiler/linker default external location.
 4. If --with-hwloc=internal is supplied, Open MPI will use its
    internal copy of hwloc.

Some of OMPI's main configury had to be slightly re-arranged in the
bootstrapping phase to accommodate hwloc's configury needs.

This commit was SVN r23125.
2010-05-13 23:56:05 +00:00
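
The four --with-hwloc cases listed in the commit above can be sketched as a small decision function. This is only an illustration of the described behavior, not the actual configure logic: the function name, the returned dictionary, and its keys are hypothetical, and the assumption that an explicit request is not silently ignored on failure is mine (the commit message only states the silent fallback for the unspecified case).

```python
from typing import Optional


def resolve_hwloc_source(with_hwloc: Optional[str]) -> dict:
    """Map a --with-hwloc value to where hwloc would be looked up.

    Hypothetical helper: it mirrors the four cases in the commit
    message and nothing else (other values are treated as <dir>).
    """
    if with_hwloc is None:
        # Option not given: try the internal copy, and silently
        # ignore hwloc if that fails.
        return {"source": "internal", "search_dir": None, "fail_silently": True}
    if with_hwloc == "internal":
        # Explicit request for the embedded copy.
        return {"source": "internal", "search_dir": None, "fail_silently": False}
    if with_hwloc == "external":
        # Rely on the compiler/linker default search locations.
        return {"source": "external", "search_dir": None, "fail_silently": False}
    # Any other value is treated as the <dir> to search.
    return {"source": "directory", "search_dir": with_hwloc, "fail_silently": False}
```

For example, `resolve_hwloc_source("/opt/hwloc")` reports a directory lookup rooted at `/opt/hwloc` (a made-up path), while `resolve_hwloc_source(None)` reports the internal copy with silent failure.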
Jeff Squyres
ca6d95a9c8 Clean up some comments; make paffinity/base/base.h comments agree with
paffinity/paffinity.h. 

This commit was SVN r23124.
2010-05-13 23:43:28 +00:00
Jeff Squyres
bf7954c1de Bump up to 1.0rc6 from the vendor branch.
This commit was SVN r23117.
2010-05-12 17:04:48 +00:00
Jeff Squyres
c7c3de87f5 Add ummunotify support to Open MPI. See
http://marc.info/?l=linux-mm-commits&m=127352503417787&w=2 for more
details.

 * Remove the ptmalloc memory component; replace it with a new "linux"
   memory component.
 * The linux memory component will conditionally compile in support
   for ummunotify.  At run-time, if it has ummunotify support and
   finds run-time support for ummunotify (i.e., /dev/ummunotify), it
   uses it.  If not, it tries to use ptmalloc via the glibc memory
   hooks. 
 * Add some more API functions to the memory framework to accommodate
   the ummunotify model (i.e., poll to see if memory has "changed").
 * Add appropriate calls in the rcache to the new memory APIs to see
   if memory has changed, and to react accordingly.
 * Add a few comments in the openib BTL to indicate why we don't need
   to notify the OPAL memory framework about specific instances of
   registered memory.
 * Add dummy API calls in the solaris malloc component (since it
   doesn't have polling/"did memory change" support).

This commit was SVN r23113.
2010-05-11 21:43:19 +00:00
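
The run-time choice described in the ummunotify commit above (use ummunotify only when both compile-time support and /dev/ummunotify are present, otherwise fall back to ptmalloc via the glibc memory hooks) can be sketched roughly as follows. The function name and the returned labels are hypothetical; only the device path comes from the commit message.

```python
import os


def pick_memory_backend(compiled_with_ummunotify: bool,
                        device_path: str = "/dev/ummunotify") -> str:
    """Return which mechanism the sketch would use at run time.

    Hypothetical helper: the real component is C code inside OPAL.
    """
    # Both conditions are needed: support compiled in AND the device
    # node present on the running system.
    if compiled_with_ummunotify and os.path.exists(device_path):
        return "ummunotify"
    # Otherwise try ptmalloc via the glibc memory hooks.
    return "ptmalloc-glibc-hooks"
```

The `device_path` parameter exists only so the sketch can be exercised on machines without the device.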
Ralph Castain
d6a1d7a082 Little more cleanup on paffinity. Provide a specific error code for affinity not supported so we can better report the problem. Move the error reporting to orterun so we only get one error message. Update the darwin paffinity module to return the correct new error codes.
This commit was SVN r23107.
2010-05-07 14:04:55 +00:00
Ralph Castain
d4f56cff61 More cleanup on paffinity....groan
It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected.

Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages.

This commit was SVN r23106.
2010-05-06 20:57:17 +00:00
Jeff Squyres
71cbe1a69f Bump up to hwloc v1.0rc3
This commit was SVN r23070.
2010-04-29 15:59:01 +00:00
Jeff Squyres
f064056a07 We don't need all this stuff in OMPI.
This commit was SVN r23056.
2010-04-28 00:31:15 +00:00
Jeff Squyres
2fe1bc043d Bump up to hwloc 1.0rc2
This commit was SVN r23042.
2010-04-26 21:57:51 +00:00
Ralph Castain
13a7338289 Ensure we get past the '=' in the parameter
This commit was SVN r23039.
2010-04-26 20:46:50 +00:00
Ralph Castain
e1b9f400ba Add some new utilities that support searching an environ string list (not just our own environ) for specific MCA params and returning their value. Helpful when a daemon needs to check an app_context's environ for params that can impact how the daemon launches and/or interacts with it, but don't pertain to the daemon's own environ.
This commit was SVN r23034.
2010-04-26 03:35:09 +00:00
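
A minimal sketch of the idea in the commit above: look up an MCA parameter in an arbitrary list of `NAME=value` strings rather than in the caller's own environment. The function name is hypothetical, and the `OMPI_MCA_` prefix is stated here as an assumption about how the parameters appear in the environment. Matching on the key including the `=` is also what keeps the returned value from starting at the `=` itself.

```python
from typing import List, Optional

# Assumption: MCA params appear in the environment with this prefix.
MCA_ENV_PREFIX = "OMPI_MCA_"


def find_mca_param(environ: List[str], name: str) -> Optional[str]:
    """Return the value of MCA param `name` from a list of NAME=value strings."""
    # Including '=' in the key prevents "foo" from matching "foo_bar=..."
    # and makes the slice below start just past the '='.
    key = MCA_ENV_PREFIX + name + "="
    for entry in environ:
        if entry.startswith(key):
            return entry[len(key):]
    return None
```

With `["PATH=/usr/bin", "OMPI_MCA_foo_bar=1"]`, looking up `foo_bar` yields `"1"` and looking up `foo` yields `None`.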
Jeff Squyres
ea8b0ea569 Add a new function in the paffinity base:
opal_paffinity_base_cset2str().  This function basically makes a
prettyprint string out of an opal_paffinity_base_cset_t.

This commit was SVN r23017.
2010-04-21 17:26:36 +00:00
Jeff Squyres
53ab6600e6 Minor update to comments.
This commit was SVN r23013.
2010-04-20 20:59:42 +00:00
Jeff Squyres
f1d4a748eb Minor fix: pass by pointer to the new function so that the caller
can see the results.

This commit was SVN r23012.
2010-04-20 19:52:47 +00:00
Ralph Castain
7717c970a3 Ahem...it requires 2 hex chars to describe each byte of a bitmask...
This commit was SVN r23001.
2010-04-20 05:11:16 +00:00
Ralph Castain
86228aee38 Provide two new opal paffinity utilities for printing a hex representation of the cpu set and parsing that string back into a cpu set on the other end. Also add a new MCA param for passing the cpu set applied to a process during launch down to that process so it can know what we attempted to do.
All to be used in some new MPI extensions provided by Jeff so that users can easily query their binding situation.

This commit was SVN r22998.
2010-04-19 22:16:35 +00:00
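
The two utilities described above (print a cpu set as hex, parse it back on the other end) can be illustrated with a short round trip. This is a sketch under assumptions: the function names, the byte order, and the bit order within each byte are hypothetical choices, not the OPAL implementation. It does show the point of the follow-up fix, that every byte of the bitmask needs two hex characters.

```python
from typing import Set


def cpuset_to_hex(cpus: Set[int], num_bytes: int) -> str:
    """Encode a set of cpu ids as a hex string, two hex chars per byte."""
    mask = bytearray(num_bytes)
    for cpu in cpus:
        if not 0 <= cpu < num_bytes * 8:
            raise ValueError(f"cpu {cpu} does not fit in {num_bytes} bytes")
        # Assumed layout: cpu N lives in byte N // 8, bit N % 8.
        mask[cpu // 8] |= 1 << (cpu % 8)
    return mask.hex()


def hex_to_cpuset(text: str) -> Set[int]:
    """Decode a string produced by cpuset_to_hex back into cpu ids."""
    mask = bytes.fromhex(text)
    return {
        index * 8 + bit
        for index, byte in enumerate(mask)
        for bit in range(8)
        if byte & (1 << bit)
    }
```

A 2-byte mask always encodes to 4 characters, so the receiver can size its buffer from the mask width alone.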
Jeff Squyres
338920656f Remove the compile-time priorities for paffinity modules (they were
done this way a long time ago for the "gee whiz!" factor -- when in
reality, they really only need one-of-many-run-time priority
selection).

Changed run-time priorities to be as follows:

 * darwin: 20
 * linux: 20
 * posix: 10
 * solaris: 30
 * test: 5
 * windows: 20

I have a very dim (possibly untrue) recollection that Solaris needs to
have a higher priority than others just to ensure that no other is
chosen under Solaris.  Make all other "native" components have a
priority of 20 (they shouldn't conflict with each other).  Make the
posix fallback component have a priority of 10.  Make the test
component priority 5, meaning someone can always select it, but you
can also make a "never select me" component that prioritizes itself
under test.

This commit was SVN r22997.
2010-04-19 22:14:06 +00:00
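
The one-of-many run-time priority selection mentioned above amounts to picking the highest-priority component among those usable on the current system. The priorities in this sketch are the ones listed in the commit; the function name and the notion of an "available" list are hypothetical simplifications of the MCA selection logic.

```python
from typing import Iterable, Optional

# Run-time priorities as listed in the commit message.
PRIORITIES = {
    "darwin": 20,
    "linux": 20,
    "posix": 10,
    "solaris": 30,
    "test": 5,
    "windows": 20,
}


def select_component(available: Iterable[str]) -> Optional[str]:
    """Pick the available component with the highest priority."""
    candidates = [name for name in available if name in PRIORITIES]
    if not candidates:
        return None
    # Ties keep the first candidate; the "native" components are not
    # expected to be available together, so ties should not matter.
    return max(candidates, key=PRIORITIES.__getitem__)
```

On a system where only `posix` and `test` are usable, the sketch picks `posix`; wherever `solaris` is usable, it wins over everything else.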
Jeff Squyres
9f5ddbcc6e 3rd party import hwloc 1.0rc1 into the SVN trunk
This commit was SVN r22996.
2010-04-19 19:48:58 +00:00
Jeff Squyres
8b163ccd70 Add dummy hwloc directory for staged import into svn
This commit was SVN r22994.
2010-04-19 19:43:43 +00:00
Ralph Castain
4d06125a33 Establish a method by which a process knows if it has been bound by mpirun. This helps resolve a problem where a process gets "bound" to all available resources, which looks to the opal paffinity system as "not bound". This can cause mpi_init to attempt to "bind" the process itself, causing unintended behavior.
This commit was SVN r22985.
2010-04-17 01:58:26 +00:00
Ralph Castain
41428e6b61 Issue a warning if a requested binding operation results in processes being bound to all available processors, which is the equivalent of not being bound at all.
See the following email thread for further details:

http://www.open-mpi.org/community/lists/devel/2010/04/7745.php

This commit was SVN r22984.
2010-04-17 01:02:41 +00:00
Jeff Squyres
798202c424 Allow the mca_component_path to change over time.
This commit was SVN r22957.
2010-04-12 22:02:34 +00:00
Jeff Squyres
f77257d931 These don't belong in this file.
This commit was SVN r22956.
2010-04-12 20:50:23 +00:00
Jeff Squyres
1919ba225d Allow static_components to be NULL for cases where we ''know'' there
will be no static components to be searched.

This commit was SVN r22954.
2010-04-12 14:51:47 +00:00
Terry Dontje
282a537cf7 This commit fixes 2370, by having the solaris paffinity module return error codes for get_physical_processor_id and having odls_default_fork_local_proc check get_physical_processor_id for OPAL_ERROR
This commit was SVN r22948.
2010-04-09 15:10:46 +00:00
Brad Benton
58a9aeff5a Modify the OPAL_PAFFINITY_PROCESS_IS_BOUND macro to search the cpuset for
the maximum possible number of cpus rather than just the number of cpus
currently online.  This corrects a problem where mpi_paffinity_alone was
not working properly on systems in which there can be cpu namespaces with
holes, such as on ppc64 with smt off (as discussed in #2365).

This commit was SVN r22927.
2010-04-02 18:24:12 +00:00
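
Why the scan range matters can be shown with a small sketch. It assumes, as an illustration only, that "bound" means at least one cpu is set but fewer than the number of online cpus (consistent with the earlier commits that treat "bound to everything" as not bound). The function and parameter names are hypothetical, not the macro's actual code.

```python
from typing import Set


def is_bound(cpuset: Set[int], num_online: int, scan_limit: int) -> bool:
    """Count cpus set in ids [0, scan_limit) and compare with the online count."""
    num_set = sum(1 for cpu in range(scan_limit) if cpu in cpuset)
    # Bound to nothing, or to every online cpu, counts as "not bound".
    return 0 < num_set < num_online


# Hypothetical ppc64-like layout with SMT off: 4 cpus online, ids 0, 4, 8, 12.
# Scanning only ids 0..3 misses a binding to cpu 4; scanning up to the
# maximum possible id (16 here) finds it.
bound_short_scan = is_bound({4}, num_online=4, scan_limit=4)
bound_full_scan = is_bound({4}, num_online=4, scan_limit=16)
```

With the short scan the process looks unbound even though it is pinned to cpu 4, which matches the misbehavior the commit describes for cpu namespaces with holes.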
Josh Hursey
62f8d3c471 r22885 missed a few symbol updates when it changed ompi_want_ft to opal_want_ft
This commit was SVN r22916.

The following SVN revision numbers were found above:
  r22885 --> open-mpi/ompi@522a23d6a3
2010-03-30 16:47:39 +00:00
Jeff Squyres
59126b1e0b Update copyrights.
This commit was SVN r22867.
2010-03-23 12:03:20 +00:00
Jeff Squyres
136f926fd1 Really fixes trac:2104. There is a lengthy discussion about this patch on
#2322.

The short version is that this patch consolidates two pieces of code
that call the back-end munmap and ensures that (if dlsym is used) the
corresponding dlsym is only invoked once and that the variable holding
the result is volatile.

This commit was SVN r22863.

The following Trac tickets were found above:
  Ticket 2104 --> https://svn.open-mpi.org/trac/ompi/ticket/2104
2010-03-23 01:04:25 +00:00
Josh Hursey
e9b5162d79 Fix the configure logic for --with-ft so that it properly takes a comma separated list.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.

The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.

This commit was SVN r22824.
2010-03-12 23:57:50 +00:00
Christopher Yeoh
bccafbb5df Fixes the problem where the rcache and core memory allocation can deadlock itself
This commit fixes trac:2104. Request a cmr:v1.4

This commit was SVN r22675.

The following Trac tickets were found above:
  Ticket 2104 --> https://svn.open-mpi.org/trac/ompi/ticket/2104
2010-02-22 05:12:10 +00:00
Jeff Squyres
a89dc623b0 Brice Goglin noticed that mpi_paffinity_alone didn't seem to be doing
anything for non-MPI apps.  Oops!  (But before you freak out, gentle
reader, note that mpi_paffinity_alone for MPI apps still worked fine)
When we made the switchover somewhere in the 1.3 series to have the
orted's do processor binding, then stuff like:

  mpirun --mca mpi_paffinity_alone 1 hostname

should have bound hostname to processor 0.  But it didn't because of a
subtle startup ordering issue: the MCA param registration for
opal_paffinity_alone was in the paffinity base (vs. being in
opal/runtime/opal_params.c), but it didn't actually get registered
until after the global variable opal_paffinity_alone was checked to
see if we wanted old-style affinity bindings.  Oops.

However, for MPI apps, even though the orted didn't do the binding,
ompi_mpi_init() would notice that opal_paffinity_alone was set, yet
the process didn't seem to be bound.  So the MPI process would bind
itself (this was done to support the running-without-orteds
scenarios).  Hence, MPI apps still obeyed mpi_paffinity_alone
semantics.

But note that the error described above caused the new mpirun switch
--report-bindings to not work with mpi_paffinity_alone=1, meaning that
the orted would not report the bindings when mpi_paffinity_alone was
set to 1 (it ''did'' correctly report bindings if you used
--bind-to-core or one of the other binding options).

This commit separates out the paffinity base MCA param registration
into a small function that can be called at the Right place during the
startup sequence.

This commit was SVN r22602.
2010-02-10 22:32:00 +00:00
Rainer Keller
80136ac9e2 - We don't configure-check for errno.h and don't need errno.h here...
This commit was SVN r22587.
2010-02-09 01:12:52 +00:00
Brian Barrett
8b4825ff37 Updates to make trunk run on Catamount again:
 * Don't build the pstat component if all defines needed aren't there.
 * Update platform file to work better
 * Work around two places that depended on modex being operational

This commit was SVN r22536.
2010-02-03 05:07:40 +00:00
Josh Hursey
b749ecbab8 This commit fixes trac:2190.
Originally the patch was to improve the error message, but when digging into the code I found a subtle bug. If the daemon does not tell the HNP what CRS component it used, then the HNP tries to figure it out from the metadata (this is an uncommon case). The path the HNP used was not complete, so it was unable to find the metadata information. This patch fixes this by adding the 'snapshot_reference' to the 'snapshot_location' which completes the path for this search.

cmr:v1.4 (needs a custom patch)

cmr:v1.5

This commit was SVN r22479.

The following Trac tickets were found above:
  Ticket 2190 --> https://svn.open-mpi.org/trac/ompi/ticket/2190
2010-01-25 20:28:38 +00:00
Jeff Squyres
596473e7ca Patch from Aleksej Saushev to properly only check for /proc/cpuinfo on Linux-based systems
This commit was SVN r22417.
2010-01-14 23:16:31 +00:00
Ralph Castain
f0646b3603 Need separate flag for select when initializing sysinfo framework
This commit was SVN r22394.
2010-01-12 23:22:46 +00:00
Ralph Castain
b35486d945 The CM ess module needs to open the sysinfo framework and select modules prior to when others need it. Thus, setup a flag to avoid multiple open/select within that framework.
This commit was SVN r22393.
2010-01-12 22:03:49 +00:00
Jeff Squyres
d9fc4e0a9d Per http://www.open-mpi.org/community/lists/devel/2010/01/7283.php, allow MCA components to fail the component.register and component.open methods without the MCA base printing errors.
This commit was SVN r22391.
2010-01-12 19:29:12 +00:00
Jeff Squyres
dbb29663e8 Update the embedded PLPA version to v1.3.2. Since this is a 3rd
party/"vendor" import, the changes are actually far smaller than the
size of this changeset implies.  Here's a list of the changes:

 * Update the AMD license header in plpa_map.c to be less restrictive
   (see https://svn.open-mpi.org/trac/plpa/changeset/262 for details)
   -- '''this is the most/only important change of this update.'''  No
   code is changed by this; only removing a clause from a license
   header in plpa_map.c.
 * Changes to the generated {{{configure}}}, {{{config.guess}}}, and
   {{{config.sub}}} scripts (which aren't used by OMPI).
 * soname version tracking changes (which also aren't used by OMPI;
   they're only used when PLPA is built/installed in "standalone"
   mode).
 * Update the "get version" m4 (which was stolen from OMPI's m4 to
   begin with, and is only used during OMPI's autogen.sh step).
 * Update various PLPA version numbers to 1.3.2.
 * Bug fix in plpa-taskset (which is not built in the OMPI PLPA build).

This commit was SVN r22367.
2010-01-06 00:44:14 +00:00
Josh Hursey
313acba4ce Move the mca_base_is_component_required() functionality to mca/base per suggestion so that it can be reused in other components.
This commit was SVN r22327.
2009-12-17 15:12:26 +00:00
Josh Hursey
6e584c151f We need to check the value of {{{opal_crs_base_metadata_read_token}}} since it may segv if we have a malformed metadata file.
Bug found by Sergio Diaz Montes:
  http://www.open-mpi.org/community/lists/users/2009/11/11176.php

This commit was SVN r22290.
2009-12-09 18:41:56 +00:00
Josh Hursey
e8de64d5a0 Make sure that we release the components that do not qualify for selection. These components are never open'ed really so we never need to close them.
This will need to be applied to v1.4 and v1.5, CMRs to follow.

This commit was SVN r22288.
2009-12-09 15:45:53 +00:00
Rainer Keller
787538ae38 Correct the spelling, and try cmr:v1.5 This should succeed
This commit was SVN r22280.
2009-12-08 18:46:46 +00:00
Ralph Castain
70e385bcab Picky, picky, picky...the a-retentive amongst us wants the default value to show in ompi_info! Of all the nerve...
:-)

Okay, cleanup the prior commit so that the default component search path shows in ompi_info, and remains available in component_find.

This commit was SVN r22278.
2009-12-08 17:32:22 +00:00
Ralph Castain
703ec3d6ce Some minor cleanups to the handling of multi-path component find
This commit was SVN r22275.
2009-12-08 09:34:49 +00:00
Ralph Castain
0b654ba4dc Extend the mca_component_path param usage by allowing a user to add paths to the default system and user ones defined in the program. Thus, the user can specify a param value of:
"my_perfect_path":SYSTEM_DEFAULT:USER_DEFAULT

and OPAL will substitute its internally derived values for the defaults (instead of forcing the user to figure them out).

This commit was SVN r22272.
2009-12-07 20:29:28 +00:00
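
The substitution described in the commit above can be sketched as a simple token replacement over a colon-separated path list. The `SYSTEM_DEFAULT` and `USER_DEFAULT` tokens come from the commit message; the function name, its parameters, and the example directories are hypothetical.

```python
def expand_component_path(value: str, system_default: str,
                          user_default: str, sep: str = ":") -> str:
    """Replace the SYSTEM_DEFAULT / USER_DEFAULT tokens in a path list."""
    substitutions = {
        "SYSTEM_DEFAULT": system_default,
        "USER_DEFAULT": user_default,
    }
    # Entries that are not one of the two tokens pass through unchanged.
    return sep.join(substitutions.get(part, part) for part in value.split(sep))
```

For instance, `expand_component_path("/my/path:SYSTEM_DEFAULT", "/usr/lib/example", "/home/me/example")` gives `"/my/path:/usr/lib/example"`, so the user never has to spell out the internally derived defaults.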
Ralph Castain
a0d5c80ce0 Add a new framework for discovering local resource information such as cpu type/model, #cpus, available physical memory, etc. Two initial components (darwin and linux) are provided. This is needed to support bootstrap operations where daemons are started at node boot, and applications where initial knowledge of cpu identification is needed to guide framework component selection.
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.

If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.

Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).

Adjust some platform files to enable these capabilities.

This commit was SVN r22244.
2009-11-30 23:11:25 +00:00
Jeff Squyres
5e6c494269 Remove the mistaken line (confirmed by Shiqing).
This commit was SVN r22175.
2009-10-30 12:45:05 +00:00