
188 Commits

Author SHA1 Message Date
Jeff Squyres
7d7c0aa48f Somehow the check for the specific value "external" got dropped in the
logic (even though the "else" clause for handling it was there).  This
commit puts back the specific check for the word "external".

Thanks to Jed Brown for noticing the issue.  Fixes trac:2503.

This commit was SVN r23475.

The following Trac tickets were found above:
  Ticket 2503 --> https://svn.open-mpi.org/trac/ompi/ticket/2503
2010-07-22 11:42:15 +00:00
Jeff Squyres
57d89d1c0c Remove a lot of cruft from the hwloc paffinity directory that we're
not using in Open MPI (i.e., that stuff is only used in the standalone
builds of hwloc -- it's not compiled/installed/used by Open MPI).

This commit was SVN r23416.
2010-07-14 20:46:47 +00:00
Jeff Squyres
6d07a1cc0b Per comments in this commit, hwloc isn't able to find cores on all
platforms (e.g., PPC64 running RHEL 5.4) -- sometimes it only finds
PUs.  So in that case, just run the same calculation, but with PUs
instead of cores.

This commit was SVN r23305.
2010-06-25 21:36:53 +00:00
Jeff Squyres
5cdd79ef13 Oops -- set the bits one at a time via _set. Using _cpu effectively
zeroed out the cpuset before setting the bit (i.e., we always ended up
with a cpuset containing only 1 CPU).

This commit was SVN r23298.
2010-06-23 20:56:59 +00:00
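A minimal sketch of the bug fixed in 5cdd79ef13, using a hypothetical 64-bit `toy_cpuset_t` in place of the real hwloc cpuset type. The two helpers only model the behavior the commit message describes: the "_cpu" style empties the set before setting one bit, while the "_set" style adds a bit to what is already there. They are illustrations, not the hwloc functions themselves.

```c
#include <stdint.h>

/* Hypothetical 64-bit stand-in for a cpuset; not the hwloc API. */
typedef uint64_t toy_cpuset_t;

/* Models "_cpu" semantics: empty the set, then set exactly one bit. */
static void toy_cpuset_cpu(toy_cpuset_t *set, unsigned cpu)
{
    *set = (toy_cpuset_t)1 << cpu;
}

/* Models "_set" semantics: add one bit, keeping the bits already set. */
static void toy_cpuset_set(toy_cpuset_t *set, unsigned cpu)
{
    *set |= (toy_cpuset_t)1 << cpu;
}

/* Build a cpuset from a list of CPU indices (each must be < 64). */
static toy_cpuset_t build(const unsigned *cpus, int n, int use_set)
{
    toy_cpuset_t set = 0;
    for (int i = 0; i < n; ++i) {
        if (use_set) {
            toy_cpuset_set(&set, cpus[i]);   /* accumulates every CPU */
        } else {
            toy_cpuset_cpu(&set, cpus[i]);   /* bug: only the last CPU survives */
        }
    }
    return set;
}
```

With CPUs {0, 2, 5}, the accumulating version yields 0x25, while the buggy loop leaves only bit 5 (0x20) set.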
Jeff Squyres
8ce59bb3e3 Use HWLOC_EMBEDDED_LIBS properly (new variable as of 1.0.2a12214).
Should fix some Solaris build issues.

This commit was SVN r23266.
2010-06-09 19:58:42 +00:00
Jeff Squyres
2887fe77c5 Refresh hwloc to an as-yet unreleased tarball from the hwloc 1.0
release branch in order to fix some Solaris bugs.

This commit was SVN r23265.
2010-06-09 19:56:18 +00:00
Jeff Squyres
f1a7b5cc33 Make "processor affinity not supported" error message a little better:
 * Remove OPAL_ERR_PAFFINITY_NOT_SUPPORTED; fold it into the generic
   OPAL_ERR_NOT_SUPPORTED case.
 * When odls_default detects that processor affinity is not supported,
   it prints a specific message about it, and then suppresses the
   generic HNP help message that would normally follow it (i.e., it's
   easier to have the "processor affinity is not supported" show_help
   message last).
 * Use some symbolic names in odls_default instead of fixed ints,
   just for slight readability improvements in the code.
 * Introduce orte_show_help_suppress(), which gives the ability to
   suppress any future showings of any arbitrary show_help() message.
   This is useful if you display message X and want to suppress
   message Y.  This suppression *only* works in environments where
   orte_show_help() does coalescing.

This commit was SVN r23249.
2010-06-08 20:16:07 +00:00
Jeff Squyres
61f5528ec4 Update to hwloc 1.0.1rc1:
 * Should fix the issues with 32 bit builds on 64 bit platforms
 * A few Windows fixes
 * A few other minor / misc fixes

This commit was SVN r23226.
2010-06-01 14:51:25 +00:00
Jeff Squyres
e41603fb64 Add files into 3 directories that would not otherwise exist in a
distribution tarball, and would therefore cause automake to fail (in
case someone invokes autogen.sh on a distribution tarball).

This commit was SVN r23218.
2010-05-28 19:33:22 +00:00
Jeff Squyres
fec7918eea Some paffinity functions had their return status overloaded:
 * If < 0, it's an OPAL_ERR_* value
 * If >= 0, it's the actual output value of the function

This is problematic for the OPAL_SOS stuff.  This commit changes those
functions to always return OPAL_* statuses and send the output value
back through output parameters (like 95% of the rest of the code
base).  This avoids the confusion with OPAL_SOS stuff and makes
paffinity work again (e.g., mpirun --bind-to-core ...).

I updated all paffinity modules for the new function signatures, and
bumped the paffinity API version up to 2.0.1.  I don't think the
version change will matter, though, because we'll be introducing
support for hardware threads soon, at which point we'll either bump
the paffinity version again or replace paffinity with a new
framework.

This commit was SVN r23197.
2010-05-21 16:55:28 +00:00
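A minimal sketch of the calling convention fec7918eea moves to: the return value is only ever a status, and the function's result travels back through an output parameter. The function names and status values below are hypothetical and are not the real paffinity signatures or OPAL_* constants.

```c
/* Hypothetical status codes modelled on the OPAL_* convention. */
#define SKETCH_SUCCESS        0
#define SKETCH_ERR_NOT_FOUND (-13)

/* Old style (overloaded): negative means error, >= 0 is the answer,
 * so every caller has to know which range means what. */
static int get_core_count_overloaded(int available, int ncores)
{
    return available ? ncores : SKETCH_ERR_NOT_FOUND;
}

/* New style: the return value is only a status; the answer goes back
 * through an output parameter that is written only on success. */
static int get_core_count(int available, int ncores, int *count)
{
    if (!available) {
        return SKETCH_ERR_NOT_FOUND;
    }
    *count = ncores;
    return SKETCH_SUCCESS;
}
```

Because the status and the value never share one integer, any extra encoding applied to error codes cannot be mistaken for a valid result.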
Jeff Squyres
32417b9802 Bump up to hwloc v1.0.
This commit was SVN r23171.
2010-05-18 17:11:45 +00:00
Jeff Squyres
b0cfe91eca Re-enable hwloc component; it should be working now.
I forgot to mention one more thing in the r23152 commit message:

 * Copy the fix for hwloc's m4 to disable the configure flag
   --enable-debug when building in embedding mode, because it can be
   hijacked by the outer-level application.  In this case, if you
   configured OMPI with --enable-debug (or have --enable-debug in a
   platform file), you'd see all of hwloc's debug output.  Ick.  hwloc
   1.0 will include this fix.

This commit was SVN r23153.

The following SVN revision numbers were found above:
  r23152 --> open-mpi/ompi@ca3362021e
2010-05-17 21:07:57 +00:00
Jeff Squyres
ca3362021e Fix some problems noted by Ralph:
 * Fix disabling hwloc build (i.e., put the AM_CONDITIONALs where they
   belong in the configure.m4 file)
 * Update some svn:ignores
 * r23142 removed some extraneous code, but forgot to remove the
   variables used only by that code

This commit was SVN r23152.

The following SVN revision numbers were found above:
  r23142 --> open-mpi/ompi@610fc67d12
2010-05-17 21:05:27 +00:00
Ralph Castain
12590202d8 Cleanup warnings
This commit was SVN r23148.
2010-05-16 20:22:00 +00:00
Ralph Castain
da170a7ab9 Turn off the blasted hwloc component as it generates a ton of garbage. Note that this means Linux-based systems will -not- have paffinity for now, since the good old PLPA module was removed.
Clean up some missing ignores

This commit was SVN r23147.
2010-05-16 20:06:14 +00:00
Jeff Squyres
e2ab4f2baf Should be working now...
This commit was SVN r23143.
2010-05-14 15:20:47 +00:00
Jeff Squyres
610fc67d12 Oops -- don't convert to a processor ID here; just return the OS index
of the core.

This commit was SVN r23142.
2010-05-14 15:14:28 +00:00
Jeff Squyres
a27da2473a Ensure the whole directory is built.
This commit was SVN r23140.
2010-05-14 13:21:09 +00:00
Jeff Squyres
3ba4086b0f Remove another debugging message.
This commit was SVN r23139.
2010-05-14 13:20:46 +00:00
Jeff Squyres
a1848ef8d5 Arf. Ignore this component while I fix vpath builds...
This commit was SVN r23138.
2010-05-14 13:03:02 +00:00
Jeff Squyres
2d01a67516 Remove these generated files from SVN.
This commit was SVN r23137.
2010-05-14 11:58:17 +00:00
Jeff Squyres
8c8efa9bf3 Remove debugging message.
This commit was SVN r23136.
2010-05-14 11:57:43 +00:00
Jeff Squyres
21178f9379 Remove the "linux" paffinity component (i.e., the one that was based
on the now-defunct PLPA) -- the new hwloc component supersedes it.  

So long, PLPA -- we loved ya!

This commit was SVN r23126.
2010-05-13 23:59:21 +00:00
Jeff Squyres
3129ccd9ec Make the hwloc paffinity component available for everyone. hwloc
supports a wide variety of operating systems and platforms; see the
opal/mca/paffinity/hwloc/hwloc/README file for details.

This component includes an embedded copy of hwloc, currently based on
hwloc-1.0rc6.  But note that hwloc is properly SVN imported into the
/vendor branch, so it will be easy to update when 1.0 GA is released.
Note that the hwloc tree embedded in opal/mca/paffinity/hwloc/hwloc is
identical to a hwloc distribution tarball, except that much of the
documentation was rm -rf'ed (because we don't need it for the embedded
case).

Since the paffinity framework currently does not understand hardware
threads, the hwloc component compensates for this by identifying cores
by the "first" hardware thread on that core.  Hopefully we'll update
paffinity someday to understand hardware threads.  :-)

configure grew a --with-hwloc option, analogous to what we do for many
other external libraries that OMPI supports.  However, there's a new
feature: due to the request of several distros, OMPI can be configured
to build with its internal copy of hwloc or with an external copy of
hwloc (e.g., a system-installed hwloc).

 1. If --with-hwloc is not specified, Open MPI will try to use its
    internal copy (and silently ignore hwloc if that fails).
 2. If --with-hwloc=<dir> is supplied, Open MPI looks for hwloc
    support in <dir> (and --with-hwloc-libdir=<dir>, if specified).
 3. If --with-hwloc=external is supplied, Open MPI will look for hwloc
    in a compiler/linker default external location.
 4. If --with-hwloc=internal is supplied, Open MPI will use its
    internal copy of hwloc.

Some of OMPI's main configury had to be slightly re-arranged in the
bootstrapping phase to accommodate hwloc's configury needs.

This commit was SVN r23125.
2010-05-13 23:56:05 +00:00
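The four --with-hwloc cases in 3129ccd9ec can be modelled as a small decision function. This is a hypothetical C sketch only; the real logic lives in Open MPI's configure (m4/shell) code, and the enum and function names are invented here. It treats a NULL value as "option not given" and does not model other autoconf values such as "yes" or "no". The explicit test for "external" mirrors the check that 7d7c0aa48f (r23475) had to restore.

```c
#include <string.h>

/* Hypothetical model of how the --with-hwloc value picks an hwloc source. */
typedef enum {
    WITH_HWLOC_TRY_INTERNAL, /* option not given: try internal, ignore failure */
    WITH_HWLOC_INTERNAL,     /* --with-hwloc=internal */
    WITH_HWLOC_EXTERNAL,     /* --with-hwloc=external: compiler/linker defaults */
    WITH_HWLOC_DIR           /* --with-hwloc=<dir> */
} with_hwloc_choice_t;

static with_hwloc_choice_t classify_with_hwloc(const char *value)
{
    if (value == NULL) {
        return WITH_HWLOC_TRY_INTERNAL;
    }
    if (strcmp(value, "internal") == 0) {
        return WITH_HWLOC_INTERNAL;
    }
    /* Keep this explicit test: r23475 restored a dropped check for
     * the literal word "external". */
    if (strcmp(value, "external") == 0) {
        return WITH_HWLOC_EXTERNAL;
    }
    /* Anything else is taken to be an installation directory. */
    return WITH_HWLOC_DIR;
}
```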
Jeff Squyres
ca6d95a9c8 Clean up some comments; make paffinity/base/base.h comments agree with
paffinity/paffinity.h. 

This commit was SVN r23124.
2010-05-13 23:43:28 +00:00
Jeff Squyres
bf7954c1de Bump up to 1.0rc6 from the vendor branch.
This commit was SVN r23117.
2010-05-12 17:04:48 +00:00
Ralph Castain
d6a1d7a082 Little more cleanup on paffinity. Provide a specific error code for affinity not supported so we can better report the problem. Move the error reporting to orterun so we only get one error message. Update the darwin paffinity module to return the correct new error codes.
This commit was SVN r23107.
2010-05-07 14:04:55 +00:00
Ralph Castain
d4f56cff61 More cleanup on paffinity....groan
It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected.

Clean up error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages.

This commit was SVN r23106.
2010-05-06 20:57:17 +00:00
Jeff Squyres
71cbe1a69f Bump up to hwloc v1.0rc3
This commit was SVN r23070.
2010-04-29 15:59:01 +00:00
Jeff Squyres
f064056a07 We don't need all this stuff in OMPI.
This commit was SVN r23056.
2010-04-28 00:31:15 +00:00
Jeff Squyres
2fe1bc043d Bump up to hwloc 1.0rc2
This commit was SVN r23042.
2010-04-26 21:57:51 +00:00
Jeff Squyres
ea8b0ea569 Add a new function in the paffinity base:
opal_paffinity_base_cset2str().  This function basically makes a
prettyprint string out of an opal_paffinity_base_cset_t.

This commit was SVN r23017.
2010-04-21 17:26:36 +00:00
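The commit message for ea8b0ea569 does not say what string opal_paffinity_base_cset2str() produces. As a hedged illustration of the idea, the hypothetical `toy_cset2str()` below renders a 64-bit mask as a comma-separated list of the set bit indices; the real function's type, signature, and output format may all differ.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: render a 64-bit mask as "0,2,5". len must be >= 1.
 * Not the real opal_paffinity_base_cset2str(). */
static void toy_cset2str(uint64_t mask, char *buf, size_t len)
{
    size_t used = 0;
    buf[0] = '\0';
    for (unsigned bit = 0; bit < 64; ++bit) {
        if (!(mask & ((uint64_t)1 << bit))) {
            continue;
        }
        int n = snprintf(buf + used, len - used, "%s%u",
                         used > 0 ? "," : "", bit);
        if (n < 0 || (size_t)n >= len - used) {
            break;  /* buffer full: output is truncated but still terminated */
        }
        used += (size_t)n;
    }
}
```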
Jeff Squyres
53ab6600e6 Minor update to comments.
This commit was SVN r23013.
2010-04-20 20:59:42 +00:00
Jeff Squyres
f1d4a748eb Minor fix: pass by pointer to the new function so that the caller
can see the results.

This commit was SVN r23012.
2010-04-20 19:52:47 +00:00
Ralph Castain
7717c970a3 Ahem...it requires 2 hex chars to describe each byte of a bitmask...
This commit was SVN r23001.
2010-04-20 05:11:16 +00:00
Ralph Castain
86228aee38 Provide two new opal paffinity utilities for printing a hex representation of the cpu set and parsing that string back into a cpu set on the other end. Also add a new MCA param for passing the cpu set applied to a process during launch down to that process so it can know what we attempted to do.
All to be used in some new MPI extensions provided by Jeff so that users can easily query their binding situation.

This commit was SVN r22998.
2010-04-19 22:16:35 +00:00
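A minimal sketch of the hex round trip described in 86228aee38, including the point of 7717c970a3 that every byte needs exactly two hex characters. With a plain "%x", a byte such as 0x01 would print as one character and shift every later digit, so the parser on the other end could not split the string back into bytes. The function names and the most-significant-byte-first order are assumptions, not Open MPI's actual utilities.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical encoder: out must hold 2 * nbytes + 1 chars.
 * Bytes are written most significant first, two digits each. */
static void mask_to_hex(const unsigned char *mask, size_t nbytes, char *out)
{
    for (size_t i = 0; i < nbytes; ++i) {
        snprintf(out + 2 * i, 3, "%02x", mask[nbytes - 1 - i]);
    }
    out[2 * nbytes] = '\0';
}

static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;  /* also catches the terminating '\0' on short input */
}

/* Hypothetical decoder: returns 0 on success, -1 on malformed input. */
static int hex_to_mask(const char *hex, unsigned char *mask, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; ++i) {
        int hi = hex_digit(hex[2 * i]);
        int lo = hi < 0 ? -1 : hex_digit(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) {
            return -1;
        }
        mask[nbytes - 1 - i] = (unsigned char)(hi * 16 + lo);
    }
    return hex[2 * nbytes] == '\0' ? 0 : -1;
}
```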
Jeff Squyres
338920656f Remove the compile-time priorities for paffinity modules (they were
done this way a long time ago for the "gee whiz!" factor -- when in
reality, they really only need one-of-many run-time priority
selection).

Changed run-time priorities to be as follows:

 * darwin: 20
 * linux: 20
 * posix: 10
 * solaris: 30
 * test: 5
 * windows: 20

I have a very dim (possibly untrue) recollection that Solaris needs to
have a higher priority than others just to ensure that no other is
chosen under Solaris.  Make all other "native" components have a
priority of 20 (they shouldn't conflict with each other).  Make the
posix fallback component have a priority of 10.  Make the test
component priority 5, meaning someone can always select it, but you
can also make a "never select me" component that prioritizes itself
under test.

This commit was SVN r22997.
2010-04-19 22:14:06 +00:00
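One-of-many run-time priority selection, as described in 338920656f, comes down to "among the components that can run here, pick the highest priority". The sketch below is a simplified, hypothetical model that uses the priorities listed in that commit. The `available` flag stands in for whatever query a component makes, and the sketch does not model explicit user selection through MCA parameters.

```c
#include <stddef.h>

/* Hypothetical component record; not the real MCA structures. */
typedef struct {
    const char *name;
    int priority;
    int available;  /* nonzero if the component can work on this system */
} component_t;

/* Return the available component with the highest priority, or NULL.
 * On a tie, the earlier entry wins. */
static const component_t *select_component(const component_t *c, size_t n)
{
    const component_t *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (c[i].available &&
            (best == NULL || c[i].priority > best->priority)) {
            best = &c[i];
        }
    }
    return best;
}
```

On a Linux box, "linux" (20) beats the "posix" fallback (10) and "test" (5). Solaris's 30 only matters where the solaris component reports itself available.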
Jeff Squyres
9f5ddbcc6e 3rd party import hwloc 1.0rc1 into the SVN trunk
This commit was SVN r22996.
2010-04-19 19:48:58 +00:00
Jeff Squyres
8b163ccd70 Add dummy hwloc directory for staged import into svn
This commit was SVN r22994.
2010-04-19 19:43:43 +00:00
Ralph Castain
4d06125a33 Establish a method by which a process knows if it has been bound by mpirun. This helps resolve a problem where a process gets "bound" to all available resources, which looks to the opal paffinity system as "not bound". This can cause mpi_init to attempt to "bind" the process itself, causing unintended behavior.
This commit was SVN r22985.
2010-04-17 01:58:26 +00:00
Ralph Castain
41428e6b61 Issue a warning if a requested binding operation results in processes being bound to all available processors, which is the equivalent of not being bound at all.
See the following email thread for further details:

http://www.open-mpi.org/community/lists/devel/2010/04/7745.php

This commit was SVN r22984.
2010-04-17 01:02:41 +00:00
Terry Dontje
282a537cf7 This commit fixes 2370, by having the solaris paffinity module return error codes for get_physical_processor_id and having odls_default_fork_local_proc check get_physical_processor_id for OPAL_ERROR
This commit was SVN r22948.
2010-04-09 15:10:46 +00:00
Brad Benton
58a9aeff5a Modify the OPAL_PAFFINITY_PROCESS_IS_BOUND macro to search the cpuset for
the maximum possible number of cpus rather than just the number of cpus
currently online.  This corrects a problem where mpi_paffinity_alone was
not working properly on systems in which there can be cpu namespaces with
holes, such as on ppc64 with smt off (as discussed in #2365).

This commit was SVN r22927.
2010-04-02 18:24:12 +00:00
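Why scanning only up to the online CPU count fails on a CPU namespace with holes (58a9aeff5a) can be shown with a small model. This is a hypothetical function, not the real OPAL_PAFFINITY_PROCESS_IS_BOUND macro. It assumes "bound" means the mask covers some, but not all, online CPUs, which also captures the "bound to everything is the same as unbound" rule from 41428e6b61 and 4d06125a33. With SMT off, the online CPU ids might be 0, 2, 4, 6 (mask 0x55). A process bound to CPUs 4 and 6 looks unbound if only ids 0..3 are examined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a process is bound if its mask covers some, but
 * not all, online CPUs. Ids are scanned up to max_cpus (largest
 * possible id + 1), not the online count, because online ids can have
 * holes. Limited to 64 CPUs in this sketch. */
static bool process_is_bound(uint64_t mask, uint64_t online, unsigned max_cpus)
{
    unsigned in_mask = 0, online_count = 0;
    for (unsigned cpu = 0; cpu < max_cpus && cpu < 64; ++cpu) {
        uint64_t bit = (uint64_t)1 << cpu;
        if (online & bit) {
            ++online_count;
            if (mask & bit) {
                ++in_mask;
            }
        }
    }
    return in_mask > 0 && in_mask < online_count;
}
```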
Jeff Squyres
a89dc623b0 Brice Goglin noticed that mpi_paffinity_alone didn't seem to be doing
anything for non-MPI apps.  Oops!  (But before you freak out, gentle
reader, note that mpi_paffinity_alone for MPI apps still worked fine.)
After we switched over, somewhere in the 1.3 series, to having the
orteds do processor binding, a command like:

  mpirun --mca mpi_paffinity_alone 1 hostname

should have bound hostname to processor 0.  But it didn't because of a
subtle startup ordering issue: the MCA param registration for
opal_paffinity_alone was in the paffinity base (vs. being in
opal/runtime/opal_params.c), but it didn't actually get registered
until after the global variable opal_paffinity_alone was checked to
see if we wanted old-style affinity bindings.  Oops.

However, for MPI apps, even though the orted didn't do the binding,
ompi_mpi_init() would notice that opal_paffinity_alone was set, yet
the process didn't seem to be bound.  So the MPI process would bind
itself (this was done to support the running-without-orteds
scenarios).  Hence, MPI apps still obeyed mpi_paffinity_alone
semantics.

But note that the error described above caused the new mpirun switch
--report-bindings to not work with mpi_paffinity_alone=1, meaning that
the orted would not report the bindings when mpi_paffinity_alone was
set to 1 (it ''did'' correctly report bindings if you used
--bind-to-core or one of the other binding options).

This commit separates out the paffinity base MCA param registration
into a small function that can be called at the Right place during the
startup sequence.

This commit was SVN r22602.
2010-02-10 22:32:00 +00:00
Jeff Squyres
dbb29663e8 Update the embedded PLPA version to v1.3.2. Since this is a 3rd
party/"vendor" import, the changes are actually far smaller than the
size of this changeset implies.  Here's a list of the changes:

 * Update the AMD license header in plpa_map.c to be less restrictive
   (see https://svn.open-mpi.org/trac/plpa/changeset/262 for details)
   -- '''this is the most/only important change of this update.'''  No
   code is changed by this; it only removes a clause from the license
   header in plpa_map.c.
 * Changes to the generated {{{configure}}}, {{{config.guess}}}, and
   {{{config.sub}}} scripts (which aren't used by OMPI).
 * soname version tracking changes (which also aren't used by OMPI;
   they're only used when PLPA is built/installed in "standalone"
   mode).
 * Update the "get version" m4 (which was stolen from OMPI's m4 to
   begin with, and is only used during OMPI's autogen.sh step).
 * Update various PLPA version numbers to 1.3.2.
 * Bug fix in plpa-taskset (which is not built in the OMPI PLPA build).

This commit was SVN r22367.
2010-01-06 00:44:14 +00:00
Jeff Squyres
9afe50d886 Update Cisco copyrights for consistency
This commit was SVN r22072.
2009-10-07 22:02:32 +00:00
Jeff Squyres
7900451e4e Fix CID 1326: for the (unlikely) case where
opal_paffinity_base_get_processor_info() returns failure.

This commit was SVN r22069.
2009-10-07 19:52:08 +00:00
Jeff Squyres
977574bd45 Fix a problem noted by Julian Seward: MAKE_MEM_UNDEFINED is not the
opposite of MAKE_MEM_DEFINED. Also add in a call to NOACCESS to
(mostly) reverse the effects of MAKE_MEM_DEFINED (technically, page 0
was accessible before this, even though it's a Bad Idea to access it).

This commit was SVN r22056.
2009-10-06 17:55:49 +00:00
Eugene Loh
67bac2fe31 Fix paffinity_linux_module.c. The set and get functions transferred cpu
masks between the mask argument and a local PLPA mask.  There were three
problems:
1) The "get" function computed the number of bits as sizeof(mask),
   which is the size of the pointer to the mask rather than the mask
   itself.  So, only 4 bits were copied with m32 and 8 bits with m64.
   There are actually 1024 bits.
2) The "get" and "set" functions both copied a number of bits computed
   from the sizeof() mask, but sizeof() reports the number of bytes.
   We have to multiply by 8 to get the number of bits.
3) These two functions check to make sure that the mask argument is not
   bigger than the PLPA mask.  But, the set function copies a number
   of bits in the PLPA mask, which is conceivably greater than the
   number of bits in the mask argument.  So, accesses to the mask
   argument may overrun that argument.
Problems 1 and 2 meant that one would encounter errors when the number of
cores exceeded 4 (with -m32) or 8 (with -m64).  Problem 3 probably caused
no errors.

This commit was SVN r21993.
2009-09-22 16:00:37 +00:00
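The three problems fixed in 67bac2fe31 can be reproduced with a toy mask type. Everything here is hypothetical (the 1024-bit `toy_mask_t` only mirrors the size quoted in the commit). The sketch shows `sizeof` applied to a pointer versus its pointee, the byte-to-bit conversion, and copying no more bits than the smaller of the two masks holds.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical 1024-bit mask, the size quoted for the PLPA mask. */
typedef struct {
    unsigned long bits[1024 / (sizeof(unsigned long) * CHAR_BIT)];
} toy_mask_t;

/* Problems 1 and 2: sizeof on a pointer gives the pointer's size in
 * bytes (4 with -m32, 8 with -m64), not the mask's size in bits. */
static size_t wrong_nbits(const toy_mask_t *mask)
{
    return sizeof(mask);
}

/* Fixed: size of the pointee, converted from bytes to bits. */
static size_t right_nbits(const toy_mask_t *mask)
{
    return sizeof(*mask) * CHAR_BIT;  /* 1024 */
}

/* Problem 3: never copy more bits than the smaller mask holds, so the
 * caller's mask argument cannot be overrun. */
static size_t bits_to_copy(size_t arg_bits, size_t plpa_bits)
{
    return arg_bits < plpa_bits ? arg_bits : plpa_bits;
}
```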
Ralph Castain
2028017554 Modify the paffinity system to handle binding directives that are "soft" - i.e., when someone directs that we bind if the system supports it. This allows community members to distribute OMPI with default MCA param files that direct general binding policies, without having the distributed software fail if the system cannot support those policies.
The new options work by adding an ":if-avail" qualifier to the "bind-to-socket" and "bind-to-core" MCA params. If the system does not support this capability, the job will launch anyway. Without the qualifier, the job will abort with an error message indicating that the required functionality is not supported on this system.

This commit was SVN r21975.
2009-09-18 19:48:42 +00:00
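A hedged sketch of the soft-binding rule in 2028017554. It assumes the qualifier appears as a ":if-avail" suffix on a directive string such as "bind-to-core:if-avail"; the exact MCA parameter syntax is not spelled out in the commit message. `parse_binding()` and `may_launch()` are hypothetical helpers, not Open MPI functions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parser: *policy_len receives the length of the policy
 * name and *soft is set if the ":if-avail" qualifier is present. */
static void parse_binding(const char *value, size_t *policy_len, bool *soft)
{
    const char *colon = strchr(value, ':');
    if (colon != NULL && strcmp(colon + 1, "if-avail") == 0) {
        *policy_len = (size_t)(colon - value);
        *soft = true;
    } else {
        *policy_len = strlen(value);
        *soft = false;
    }
}

/* A soft directive lets the job launch unbound when binding is not
 * supported; a hard directive makes it abort with an error. */
static bool may_launch(bool soft, bool binding_supported)
{
    return binding_supported || soft;
}
```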