1
1
Граф коммитов

218 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
2d01a67516 Remove these generates files from SVN.
This commit was SVN r23137.
2010-05-14 11:58:17 +00:00
Jeff Squyres
8c8efa9bf3 Remove debugging message.
This commit was SVN r23136.
2010-05-14 11:57:43 +00:00
Jeff Squyres
21178f9379 Remove the "linux" paffinity component (i.e., the one that was based
on the now-defunct PLPA) -- the new hwloc component supersedes it.  

So long, PLPA -- we loved ya!

This commit was SVN r23126.
2010-05-13 23:59:21 +00:00
Jeff Squyres
3129ccd9ec Make the hwloc paffinity component available for everyone. hwloc
supports a wide variety of operating systems and platforms; see the
opal/mca/paffinity/hwloc/hwloc/README file for details.

This component includes an embedded copy of hwloc, currently based on
hwloc-1.0rc6.  But note that hwloc is properly SVN imported into the
/vendor branch, so it will be easy to update when 1.0 GA is released.
Note that the hwloc tree embedded in opal/mca/paffinity/hwloc/hwloc is
identical to a hwloc distribution tarball, except that much of the
documentation was rm -rf'ed (because we don't need it for the embedded
case).

Since the paffinity framework currently does not understand hardware
threads, the hwloc component compensates for this by identifying cores
by the "first" hardware thread on that core.  Hopefully we'll update
paffinity someday to understand hardware threads.  :-)

configure grew a --with-hwloc option, analogous to what we do for many
other external libraries that OMPI supports.  However, there's a new
feature: due to the request of several distros, OMPI can be configured
to build with its internal copy of hwloc or with an external copy of
hwloc (e.g., a system-installed hwloc).

 1. If --with-hwloc is not specified, Open MPI will try to use its
    internal copy (but silently fail/ignore hwloc if that fails).
 1. If --with-hwloc=<dir> is supplied, Open MPI looks for hwloc
    support in <dir> (and --with-hwloc-libdir=<dir>, if specified).
 1. If --with-hwloc=external is supplied, Open MPI will look for hwloc
    in a compiler/linker default external location.
 1. If --with-hwloc=internal is supplied, Open MPI will use its
    internal copy of hwloc.

Some of OMPI's main configury had to be slightly re-arranged in the
bootstrapping phase to accomodate hwloc's configry needs.

This commit was SVN r23125.
2010-05-13 23:56:05 +00:00
Jeff Squyres
ca6d95a9c8 Clean up some comments; make paffinity/base/base.h comments agree with
paffinity/paffinity.h. 

This commit was SVN r23124.
2010-05-13 23:43:28 +00:00
Jeff Squyres
bf7954c1de Bump up to 1.0rc6 from the vendor branch.
This commit was SVN r23117.
2010-05-12 17:04:48 +00:00
Ralph Castain
d6a1d7a082 Little more cleanup on paffinity. Provide a specific error code for affinity not supported so we can better report the problem. Move the error reporting to orterun so we only get one error message. Update the darwin paffinity module to return the correct new error codes.
This commit was SVN r23107.
2010-05-07 14:04:55 +00:00
Ralph Castain
d4f56cff61 More cleanup on paffinity....groan
It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected.

Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages.

This commit was SVN r23106.
2010-05-06 20:57:17 +00:00
Jeff Squyres
71cbe1a69f Bump up to hwloc v1.0rc3
This commit was SVN r23070.
2010-04-29 15:59:01 +00:00
Jeff Squyres
f064056a07 We don't need all this stuff in OMPI.
This commit was SVN r23056.
2010-04-28 00:31:15 +00:00
Jeff Squyres
2fe1bc043d Bump up to hwloc 1.0rc2
This commit was SVN r23042.
2010-04-26 21:57:51 +00:00
Jeff Squyres
ea8b0ea569 Add a new function in the paffinity base:
opal_paffinity_base_cset2str().  This function basically makes a
prettyprint string out of an opal_paffinity_base_cset_t.

This commit was SVN r23017.
2010-04-21 17:26:36 +00:00
Jeff Squyres
53ab6600e6 Minor update to comments.
This commit was SVN r23013.
2010-04-20 20:59:42 +00:00
Jeff Squyres
f1d4a748eb Minor fix: pass by pointer to the new function so that the caller
can see the results.

This commit was SVN r23012.
2010-04-20 19:52:47 +00:00
Ralph Castain
7717c970a3 Ahem...it requires 2 hex chars to describe each byte of a bitmask...
This commit was SVN r23001.
2010-04-20 05:11:16 +00:00
Ralph Castain
86228aee38 Provide two new opal paffinity utilities for printing a hex representation of the cpu set and parsing that string back into a cpu set on the other end. Also add a new MCA param for passing the cpu set applied to a process during launch down to that process so it can know what we attempted to do.
All to be used in some new MPI extensions provided by Jeff so that users can easily query their binding situation.

This commit was SVN r22998.
2010-04-19 22:16:35 +00:00
Jeff Squyres
338920656f Remove the compile-time proiorities for paffinity modules (they were
done this way a long time ago for the "gee whiz!" factor -- when in
reality, they really only need one-of-many-run-time priority
selection).

Changed run-time priorities to be as follows:

 * darwin: 20
 * linux: 20
 * posix: 10
 * solaris: 30
 * test: 5
 * windows: 20

I have a very dim (possibly untrue) recollection that Solaris needs to
have a higher priority than others just to ensure that no other is
chosen under Solaris.  Make all other "native" components have a
priority of 20 (they shouldn't conflict with each other).  Make the
posix fallback component have a priority of 10.  Make the test
component priority 5, meaning someone can always select it, but you
can also make a "never select me" component that prioritizes itself
under test.

This commit was SVN r22997.
2010-04-19 22:14:06 +00:00
Jeff Squyres
9f5ddbcc6e 3rd party import hwloc 1.0rc1 into the SVN trunk
This commit was SVN r22996.
2010-04-19 19:48:58 +00:00
Jeff Squyres
8b163ccd70 Add dummy hwloc directory for staged import into svn
This commit was SVN r22994.
2010-04-19 19:43:43 +00:00
Ralph Castain
4d06125a33 Establish a method by which a process knows if it has been bound by mpirun. This helps resolve a problem where a process gets "bound" to all available resources, which looks to the opal paffinity system as "not bound". This can cause mpi_init to attempt to "bind" the process itself, causing unintended behavior.
This commit was SVN r22985.
2010-04-17 01:58:26 +00:00
Ralph Castain
41428e6b61 Issue a warning if a requested binding operation results in processes being bound to all available processes, which is the equivalent of not being bound at all.
See the following email thread for further details:

http://www.open-mpi.org/community/lists/devel/2010/04/7745.php

This commit was SVN r22984.
2010-04-17 01:02:41 +00:00
Terry Dontje
282a537cf7 This commit fixes 2370, by having the solaris paffinity module return error codes for get_physical_processor_id and having odls_default_fork_local_proc check get_physical_processor_id for OPAL_ERROR
This commit was SVN r22948.
2010-04-09 15:10:46 +00:00
Brad Benton
58a9aeff5a ================================================================================
modify the OPAL_PAFFINITY_PROCESS_IS_BOUND macro to search the cpuset for
the maximum possible number of cpus rather than just the number of cpus
currently online.  This corrects a problem where mpi_paffinity_alone was
not working properly on systems in which there can be cpu namespaces with
holes, such as on ppc64 with smt off (as discussed in #2365).

This commit was SVN r22927.
2010-04-02 18:24:12 +00:00
Jeff Squyres
a89dc623b0 Brice Goglin noticed that mpi_paffinity_alone didn't seem to be doing
anything for non-MPI apps.  Oops!  (But before you freak out, gentle
reader, note that mpi_paffinity_alone for MPI apps still worked fine)
When we made the switchover somewhere in the 1.3 series to have the
orted's do processor binding, then stuff like:

  mpirun --mca mpi_paffinity_alone 1 hostname

should have bound hostname to processor 0.  But it didn't because of a
subtle startup ordering issue: the MCA param registration for
opal_paffinity_alone was in the paffinity base (vs. being in
opal/runtime/opal_params.c), but it didn't actually get registered
until after the global variable opal_paffinity_alone was checked to
see if we wanted old-style affinity bindings.  Oops.

However, for MPI apps, even though the orted didn't do the binding,
ompi_mpi_init() would notice that opal_paffinity_alone was set, yet
the process didn't seem to be bound.  So the MPI process would bind
itself (this was done to support the running-without-orteds
scenarios).  Hence, MPI apps still obeyed mpi_paffinity_alone
semantics.

But note that the error described above caused the new mpirun switch
--report-bindings to not work with mpi_paffinity_alone=1, meaning that
the orted would not report the bindings when mpi_paffinity_alone was
set to 1 (it ''did'' correctly report bindings if you used
--bind-to-core or one of the other binding options).

This commit separates out the paffinity base MCA param registration
into a small function that can be called at the Right place during the
startup sequence.

This commit was SVN r22602.
2010-02-10 22:32:00 +00:00
Jeff Squyres
dbb29663e8 Update the embedded PLPA version to v1.3.2. Since this is a 3rd
party/"vendor" import, the changes are actually far smaller than the
size of this changeset implies.  Here's a list of the changes:

 * Update the AMD license header in plpa_map.c to be less restrictive
   (see https://svn.open-mpi.org/trac/plpa/changeset/262 for details)
   -- '''this is the most/only important change of this update.'''  No
   code is changed by this; only removing a clase from a license
   header in plpa_map.c.
 * Changes to the generated {{{configure}}}, {{{config.guess}}}, and
   {{{config.sub}}} scripts (which aren't used by OMPI).
 * soname version tracking changes (which also aren't used by OMPI;
   they're only used when PLPA is built/installed in "standalone"
   mode).
 * Update the "get version" m4 (which was stolen from OMPI's m4 to
   begin with, and is only used during OMPI's autogen.sh step).
 * Update various PLPA version numbers to 1.3.2.
 * Bug fix in plpa-taskset (which is not built in the OMPI PLPA build).

This commit was SVN r22367.
2010-01-06 00:44:14 +00:00
Jeff Squyres
9afe50d886 Update Cisco copyrights for consistency
This commit was SVN r22072.
2009-10-07 22:02:32 +00:00
Jeff Squyres
7900451e4e Fix CID 1326: for the (unlikely) case where
opal_paffinity_base_get_processor_info() returns failure.

This commit was SVN r22069.
2009-10-07 19:52:08 +00:00
Jeff Squyres
977574bd45 Fix a problem noted by Julian Seward: MAKE_MEM_UNDEFINED is not the
opposite of MAKE_MEM_DEFINED. Also add in a call to NOACCESS to
(mostly) reverse the effects of MAKE_MEM_DEFINED (technically, page 0
was accessible before this, even though it's a Bad Idea to access it).

This commit was SVN r22056.
2009-10-06 17:55:49 +00:00
Eugene Loh
67bac2fe31 Fix paffinity_linux_module.c. The set and get functions transferred cpu
masks between the mask argument and a local PLPA mask.  There were three
problems:
1) The "get" function computed the number of bits as sizeof(mask),
   which is the size of the pointer to the mask rather than the mask
   itself.  So, only 4 bits were copied with m32 and 8 bits with m64.
   There are actually 1024 bits.
2) The "get" and "set" functions both copied a number of bits computed
   from the sizeof() mask, but sizeof() reports the number of bytes.
   We have to multiply by 8 to get the number of bits.
3) These two functions check to make sure tha the mask argument is not
   bigger than the PLPA mask.  But, the set function copies a number
   of bits in the PLPA mask, which is conceivably greater than the
   number of bits in the mask argument.  So, accesses to the mask
   argument may overrun that argument.
Problems 1 and 2 meant that one would encounter errors when the number of
cores exceeded 4 (with -m32) or 8 (with -m64).  Problem 3 probably caused
no errors.

This commit was SVN r21993.
2009-09-22 16:00:37 +00:00
Ralph Castain
2028017554 Modify the paffinity system to handle binding directives that are "soft" - i.e., when someone directs that we bind if the system supports it. This allows community members to distribute OMPI with default MCA param files that direct general binding policies, without having the distributed software fail if the system cannot support those policies.
The new options work by adding an ":if-avail" qualifier to the "bind-to-socket" and "bind-to-core" MCA params. If the system does not support this capability, the job will launch anyway. Without the qualifier, the job will abort with an error message indicating that the required functionality is not supported on this system.

This commit was SVN r21975.
2009-09-18 19:48:42 +00:00
Jeff Squyres
2fa048b0e0 Make the paffinity test component only build when you --enable-debug
(or have a developer build where that's enabled for you by default).

This commit was SVN r21928.
2009-09-02 11:23:54 +00:00
Ralph Castain
388c65fd80 Add missing include file
This commit was SVN r21924.
2009-09-01 13:31:54 +00:00
Ralph Castain
888f3c3afe Extend the paffinity test module to allow users to specify the number of sockets and cores - provides an extended ability to mimic archs.
This commit was SVN r21912.
2009-08-29 03:35:39 +00:00
Ralph Castain
3c4f28b22c Modify the paffinity test module to take a param indicating whether or not to mimic being externally bound
This commit was SVN r21908.
2009-08-28 02:31:01 +00:00
Ralph Castain
2178f995b9 Add a new "test" module to the paffinity framework that mimics a system that supports affinity when running on a Mac for development purposes. Only active if specifically called out.
This commit was SVN r21881.
2009-08-26 01:55:30 +00:00
Rainer Keller
8e1b23779f - Replace combinations of
#if defined (c_plusplus)
          defined (__cplusplus)
   followed by
      extern "C" {
   and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.

   Notable exceptions are:
    - opal/include/opal_config_bottom.h:
      This is our generated code, that itself defines BEGIN_C_DECL and
      END_C_DECL
    - ompi/mpi/cxx/mpicxx.h:
      Here we do not include opal_config_bottom.h:                                 
    - Belongs to external code:                                                    
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c        
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h        
    - opal/include/opal/prefetch.h:
      Has C++ specific macros that are protected:                                  

    - Had #if ... } #endif  _and_ END_C_DECLS (aka end up with 2x
      END_C_DECLS)
      ompi/mca/btl/openib/btl_openib.h
    - opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
    - opal/win32/ompi_process.h: had extern "C"\n {...
      opal/win32/ompi_process.h: dito
    - ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
      ompi/mpi/f90/test/align_c.c: dito
    - ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
    - ompi/mpi/f90/xml/common-C.xsl: Amend

   Tested on linux using --with-openib and --with-mx

   The following do not contain either opal_config.h, orte_config.h or
   ompi_config.h
   (but possibly other header files, that include one of the above):
      ompi/mca/bml/r2/bml_r2_ft.h
      ompi/mca/btl/gm/btl_gm_endpoint.h
      ompi/mca/btl/gm/btl_gm_proc.h
      ompi/mca/btl/mx/btl_mx_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_frag.h
      ompi/mca/btl/ofud/btl_ofud_proc.h
      ompi/mca/btl/openib/btl_openib_mca.h
      ompi/mca/btl/portals/btl_portals_endpoint.h
      ompi/mca/btl/portals/btl_portals_frag.h
      ompi/mca/btl/sctp/btl_sctp_endpoint.h
      ompi/mca/btl/sctp/btl_sctp_proc.h
      ompi/mca/btl/tcp/btl_tcp_endpoint.h
      ompi/mca/btl/tcp/btl_tcp_ft.h
      ompi/mca/btl/tcp/btl_tcp_proc.h
      ompi/mca/btl/template/btl_template_endpoint.h
      ompi/mca/btl/template/btl_template_proc.h
      ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
      ompi/mca/btl/udapl/btl_udapl_endpoint.h
      ompi/mca/btl/udapl/btl_udapl_mca.h
      ompi/mca/btl/udapl/btl_udapl_proc.h
      ompi/mca/mtl/mx/mtl_mx_endpoint.h
      ompi/mca/mtl/mx/mtl_mx.h
      ompi/mca/mtl/psm/mtl_psm_endpoint.h
      ompi/mca/mtl/psm/mtl_psm.h
      ompi/mca/pml/cm/pml_cm_component.h
      ompi/mca/pml/csum/pml_csum_comm.h
      ompi/mca/pml/dr/pml_dr_comm.h
      ompi/mca/pml/dr/pml_dr_component.h
      ompi/mca/pml/dr/pml_dr_endpoint.h
      ompi/mca/pml/dr/pml_dr_recvfrag.h
      ompi/mca/pml/example/pml_example.h
      ompi/mca/pml/ob1/pml_ob1_comm.h
      ompi/mca/pml/ob1/pml_ob1_component.h
      ompi/mca/pml/ob1/pml_ob1_endpoint.h
      ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
      ompi/mca/pml/ob1/pml_ob1_recvfrag.h
      ompi/mca/pml/v/pml_v_output.h
      opal/include/opal/prefetch.h
      opal/mca/timer/aix/timer_aix.h
      opal/util/qsort.h
      test/support/components.h

This commit was SVN r21855.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2009-08-20 11:42:18 +00:00
Ralph Castain
aca3e71ccd Don't declare us "bound" if the cpu mask is completely zero
This commit was SVN r21839.
2009-08-19 18:55:06 +00:00
Ralph Castain
1dc12046f1 Modify the OMPI paffinity and mapping system to support socket-level mapping and binding. Mostly refactors existing code, with modifications to the odls_default module to support the new capabilities.
Adds several new mpirun options:

* -bysocket - assign ranks on a node by socket. Effectively load balances the procs assigned to a node across the available sockets. Note that ranks can still be bound to a specific core within the socket, or to the entire socket - the mapping is independent of the binding.

* -bind-to-socket - bind each rank to all the cores on the socket to which they are assigned.

* -bind-to-core - currently the default behavior (maintained from prior default)

* -npersocket N - launch N procs for every socket on a node. Note that this implies we know how many sockets are on a node. Mpirun will determine its local values. These can be overridden by provided values, either via MCA param or in a hostfile

Similar features/options are provided at the board level for multi-board nodes.

Documentation to follow...

This commit was SVN r21791.
2009-08-11 02:51:27 +00:00
Ralph Castain
c459615f8f When someone specifies a rank-file slot-list of N:*, stop the loop at the proper place (we were going through the loop one too many times).
Thanks to Eugene for spotting it.

This commit was SVN r21728.
2009-07-23 17:51:15 +00:00
George Bosilca
3e971e61f3 The system headers are supposed to be protected by #ifdef and not by #if.
This commit was SVN r21700.
2009-07-16 18:27:33 +00:00
Ralph Castain
d3fb39073f Initialize a variable to ensure we get the correct number of bound processors
This commit was SVN r21590.
2009-07-02 17:48:04 +00:00
Ralph Castain
c3c1ab1337 Correct a comment in paffinity.h about what paffinity_get returns - it was inaccurate.
Revamp the affinity detection/set procedure in mpi_init to correctly detect when we have already been bound to processors, given the revised understanding of paffinity_get. Add a new paffinity macro to make checking for already bound a little nicer.

This commit was SVN r21402.
2009-06-09 14:33:35 +00:00
Ralph Castain
aa25a51c92 Do not mark the mpi_paffinity_alone param as deprecated so we don't scare Jeff...er...users.
This commit was SVN r21218.
2009-05-12 15:41:11 +00:00
Ralph Castain
d396f0a6fc Per the discussion on the devel list, move the binding of processes to processors from MPI_Init to process start. This involves:
1. replacing mpi_paffinity_alone with opal_paffinity_alone - for back-compatibility, I have aliased mpi_paffinity_alone to the new param name. This caus
es a mild abstraction break in the opal/mca/paffinity framework - per the devel discussion...live with it. :-) I also moved the ompi_xxx global variable
 that tracked maffinity setup so it could be properly closed in MPI_Finalize to the opal/mca/maffinity framework to avoid an abstraction break.

2. Added code to the odls/default module to perform paffinity binding and maffinity init between process fork and exec. This has been tested on IU's odi
n cluster and works for both MPI and non-MPI apps.

3. Revise MPI_Init to detect if affinity has already been set, and to attempt to set it if not already done. I have *not* tested this as I haven't yet f
igured out a way to do so - I couldn't get slurm to perform cpu bindings, even though it supposedly does do so.

This has only been lightly tested and would definitely benefit from a wider range of evaluation...

This commit was SVN r21209.
2009-05-12 02:18:35 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Shiqing Fan
3d4e0472d6 Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Jeff Squyres
3cfa8f55c4 Gaah; I meant to include a better comment in the last commit but had
forgotten to save before the commit was sent.

This comment explains why we're doing a cache check here rather than a
real check.

This commit was SVN r20975.
2009-04-10 21:16:23 +00:00
Jeff Squyres
9fcd01035d Fix a problem reported by Steve Kagl on the user's list; the posix
component (which we probably don't test regularly because we probably
only test environments where the other paffinity components are used)
was not getting built because it had a bad configure test.

This commit was SVN r20974.
2009-04-10 21:15:20 +00:00
Terry Dontje
d521a7bb71 Add visibility feature for Sun Studio compilers.
This commit was SVN r20833.
2009-03-20 10:13:01 +00:00
Jeff Squyres
730a1b80b2 Roll in 1.3rc4, which includes a fix for a problem discovered by LANL that the test for Valgrind's macro was not strong enough and may therefore try to compile in valgrind support into PLPA even if your version of Valgrind was too old.
This commit was SVN r20819.
2009-03-17 22:18:26 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strup and friends include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Jeff Squyres
ed22f9744e Bring in PLPA v1.3rc3 (add a missing comma, which should fix compiling
issues at Sun).  

Sorry for the middle of the day configure change, but this should fix
a compile break at Sun...

This commit was SVN r20594.
2009-02-19 17:42:43 +00:00
Jeff Squyres
558fc2836d Bump PLPA version to 1.3rc2, which should fix the "make dist" error
from last night's nightly tarball.

This commit was SVN r20576.
2009-02-17 12:54:57 +00:00
Jeff Squyres
4590582807 Bump PLPA to v1.3rc1, which includes a valgrind API fix for a
known-bad memory access pattern.  Specifically, a NULL pointer is
passed in a system call as part of a probe to figure out which
affinity API this system has.  We know it's a NULL and we did it on
purpose, so don't have Valgrind yell about it.

This commit was SVN r20572.
2009-02-17 02:01:30 +00:00
Ralph Castain
883a0972bc Add missing include file
This commit was SVN r20529.
2009-02-12 16:41:48 +00:00
Ralph Castain
4cdf91a8d4 Per the RFC, extend the current use of the ompi_proc_t flags field (without changing the field itself).
The prior ompi_proc_t structure had a uint8_t flag field in it, where only one
bit was used to flag that a proc was "local". In that context, "local" was
constrained to mean "local to this node".

This commit provides a greater degree of granularity on the term "local", to include tests
to see if the proc is on the same socket, PC board, node, switch, CU (computing
unit), and cluster.

Add #define's to designate which bits stand for which local condition. This
was added to the OPAL layer to avoid conflicting with the proposed movement of
the BTLs. To make it easier to use, a set of macros have been defined - e.g.,
OPAL_PROC_ON_LOCAL_SOCKET - that test the specific bit. These can be used in
the code base to clearly indicate which sense of locality is being considered.

All locations in the code base that looked at the current proc_t field have
been changed to use the new macros.

Also modify the orte_ess modules so that each returns a uint8_t (to match the
ompi_proc_t field) that contains a complete description of the locality of this
proc. Obviously, not all environments will be capable of providing such detailed
info. Thus, getting a "false" from a test for "on_local_socket" may simply
indicate a lack of knowledge.

This commit was SVN r20496.
2009-02-10 02:20:16 +00:00
Jeff Squyres
d1c6f3f89a * Fix a truckload of Cisco copyrights to be the same as the rest of
the code base.
 * Fix a few misspellings in other copyrights.

This commit was SVN r20241.
2009-01-11 02:30:00 +00:00
Shiqing Fan
a5281f0434 - 1/4 commit for Windows Visual Studio and CCP support:
CMakeLists and .windows files.
  In contribs preconfigured and precompiled parts.

This commit was SVN r20108.
2008-12-10 20:59:20 +00:00
Jeff Squyres
7b32402959 Fixes from Brian for OS X 10.4.
This commit was SVN r19953.
2008-11-07 22:13:43 +00:00
Jeff Squyres
53967f2b4e Merge in PLPA v1.2rc2 (README fixes, new version of Autotools, and
have PLPA report its version correctly).

This commit was SVN r19590.
2008-09-19 15:05:03 +00:00
Jeff Squyres
b1ff61b19e Update to PLPA v1.2rc1
This commit was SVN r19589.
2008-09-19 14:49:53 +00:00
Lenny Verkhovsky
ca0a5ea60b Fixed the warnings on the crays.
base/paffinity_base_service.c:153: warning: 'phys_core' may be used uninitialized in this function
base/paffinity_base_service.c:153: note: 'phys_core' was declared here

This commit was SVN r19580.
2008-09-18 11:31:12 +00:00
Jeff Squyres
4b5de753d4 Bring in new PLPA v1.2b5 to fix a typo found by Lenny.
This commit was SVN r19526.
2008-09-09 12:29:31 +00:00
Ralph Castain
274d912fe1 Silence warnings in paffinity_base_service
This commit was SVN r19453.
2008-08-28 22:11:49 +00:00
Ralph Castain
4ef9d15d97 Revamp the opal mca paffinity interface. We ran into a problem when we encountered machines that had "holes" in their physical processor layout - e.g., machines that supported "hotplugging", or that had unpopulated sockets. To solve that problem, we had to clarify at the API level where we were describing physical vs logical processor info, and then translate accordingly in the underlying implementation.
See opal/mca/paffinity/paffinity.h for explanation as to the physical vs logical nature of the params used in the API.

Fixes trac:1435

This commit was SVN r19391.

The following Trac tickets were found above:
  Ticket 1435 --> https://svn.open-mpi.org/trac/ompi/ticket/1435
2008-08-21 19:21:28 +00:00
Jeff Squyres
80d11dba8f Bring in PLPA v1.2b4
This commit was SVN r19299.
2008-08-14 21:04:28 +00:00
Jeff Squyres
bb585922fd This is fixed a different way now; no need to be different than stock
PLPA.

This commit was SVN r19293.
2008-08-14 18:54:34 +00:00
Jeff Squyres
a6e0589f01 Update to PLPA v1.2b3. Sorry again for the mid-day configure change...
This commit was SVN r19292.
2008-08-14 14:26:26 +00:00
Jeff Squyres
a19cf02c2b Refs trac:1435
Bring in a new version of PLPA (v1.2b2) with some new capabilities for
offline processors and mapping of the Nth processor/socket/core to its
corresponding Linux processor/socket/core ID.

(Sorry for the configure change in the middle of the day, folks -- I
need it to be able to continue to integrate paffinity changes for
#1435...)

This commit was SVN r19282.

The following Trac tickets were found above:
  Ticket 1435 --> https://svn.open-mpi.org/trac/ompi/ticket/1435
2008-08-13 20:18:37 +00:00
George Bosilca
5a885a9150 Safety net for the sscanf function. Without the \0 at the end of the
buffer, we can read outside the allocated memory.

This commit was SVN r19258.
2008-08-12 16:59:27 +00:00
Jeff Squyres
b97a185fb1 Fix CID 859: fix minor memory leak
This commit was SVN r19241.
2008-08-11 21:35:03 +00:00
Ralph Castain
21ba1b2ec0 Modify the configure system in the paffinity framework so that only one component is built. Cleanout variable name conflicts that on some systems prevented building
This commit was SVN r19122.
2008-08-01 22:54:24 +00:00
Jeff Squyres
1a3045ff81 * Remove some extraneous AC_MSG_RESULT's
* Make the results of the top-level configure.ac test for
   _SC_NPROCESSORS_ONLN be cached so that we can check for it
   elsewhere (e.g., opal/mca/paffinity/posix/configure.m4)
 * Update top-level configure.ac test for _SC_NPROCESSORS_ONLN: stamp
   out another AC_TRY_COMPILE
 * Ensure paffinity:posix doesn't even try to compile if we don't
   have _SC_NPROCESSORS_ONLN
 * Minor style updates

This commit was SVN r19118.
2008-08-01 11:41:08 +00:00
Ralph Castain
e1501f2c9c Add darwin paffinity component to handle the difference between Tiger and Leopard. Although both are POSIX compatible, Tiger is a tad different in this regard and requires a different interface to get the #processor data.
This commit was SVN r19117.
2008-08-01 00:15:10 +00:00
Lenny Verkhovsky
90a784dfca Making paffinity_base_slot_list invisible for the user
This commit was SVN r19096.
2008-07-30 14:52:45 +00:00
Terry Dontje
0ff11f7523 Added initialization and proper increment of the value of num_processors
pointer.  This commit fixes trac:1420.

This commit was SVN r19089.

The following Trac tickets were found above:
  Ticket 1420 --> https://svn.open-mpi.org/trac/ompi/ticket/1420
2008-07-30 10:29:05 +00:00
Jeff Squyres
0af7ac53f2 Fixes trac:1392, #1400
* add "register" function to mca_base_component_t
   * converted coll:basic and paffinity:linux and paffinity:solaris to
     use this function
   * we'll convert the rest over time (I'll file a ticket once all
     this is committed)
 * add 32 bytes of "reserved" space to the end of mca_base_component_t
   and mca_base_component_data_2_0_0_t to make future upgrades
   [slightly] easier
   * new mca_base_component_t size: 196 bytes
   * new mca_base_component_data_2_0_0_t size: 36 bytes
 * MCA base version bumped to v2.0
   * '''We now refuse to load components that are not MCA v2.0.x'''
 * all MCA frameworks versions bumped to v2.0
 * be a little more explicit about version numbers in the MCA base
   * add big comment in mca.h about versioning philosophy

This commit was SVN r19073.

The following Trac tickets were found above:
  Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
2008-07-28 22:40:57 +00:00
Ralph Castain
fdb2408bf2 Rename the osx paffinity component the "posix" component since it really has nothing osx specific in it - it is just a generic posix call to determine #processors. Set the priority low so that both linux and solaris components override it if they build. It shouldn't build in Windows at all.
Modify the odls to remove a (size_t) typecast in front of the num_processors variable just in case it is returned negative. This usually is accompanied by an opal_error, so this shouldn't make any difference - but it is more technically correct.

This commit was SVN r19008.
2008-07-24 01:54:51 +00:00
Jeff Squyres
1fd5b0402a Refs trac:1250
* Fix linux paffinity component to make a "best" guess when PLPA
   can't find topology information in the Linux kernel.  That is, if
   PLPA can't tell us the max_processor_id, just assume that it's the
   same as the number of processors.  If you have a more complex
   system than that (e.g., you have holes in your available processor
   IDs), you'll likely be running a Linux kernel that supports the
   topology information, and this problem won't happen.
 * Make sure to conver the return codes from PLPA to OPAL_ERR* codes.

This commit was SVN r19001.

The following Trac tickets were found above:
  Ticket 1250 --> https://svn.open-mpi.org/trac/ompi/ticket/1250
2008-07-23 15:47:43 +00:00
Shiqing Fan
5f021e47a9 - Add support for get_processor_info in windows paffinity module.
This commit was SVN r18992.
2008-07-23 07:59:03 +00:00
Ralph Castain
f32e24ab86 Move the POSIX-specific code out of the paffinity base. Add support for OSX in its own component.
For now, hide the OSX component with .ompi_ignore so only I can see it until I can ensure that it doesn't inadvertently interfere with Linux and Solaris support.

This clears the conflict with Windows.

This commit was SVN r18989.
2008-07-23 03:29:43 +00:00
Ralph Castain
28ca14297c Add minimal support (#processors only) for OSX and other systems that don't have paffinity modules.
This commit was SVN r18959.
2008-07-21 16:54:14 +00:00
Jeff Squyres
cb36782310 Make this parameter visible to users; it was a mistake/typo to make
it hidden.

This commit was SVN r18902.
2008-07-14 11:21:52 +00:00
Lenny Verkhovsky
a812324963 Fixing "paffinity_base_slot_list" environment
This commit was SVN r18900.
2008-07-14 07:10:50 +00:00
Jeff Squyres
583bf425c0 Fixes trac:1383:
Short version: remove opal_paffinity_alone and restore
mpi_paffinity_alone.  ORTE makes various information available for the
MPI layer to decide what it wants to do in terms of processor
affinity.

Details:

 * remove opal_paffinity_alone MCA param; restore mpi_paffinity_alone
   MCA param
 * move opal_paffinity_slot_list param registration to paffinity base
 * ompi_mpi_init() calls opal_paffinity_base_slot_list_set(); if that
   succeeds use that.  If no slot list was set, see if
   mpi_paffinity_alone was set.  If so, bind this process to its Node
   Local Rank (NLR).  The NLR is the ORTE-maintained slot ID; if you
   COMM_SPAWN to a host in this ORTE universe that already has procs
   on it, the NLR for the new job will start at N (not 0).  So this is
   slightly better than mpi_paffinity_alone in the v1.2 series.
 * If a slot list is specified *and* mpi_paffinity_alone is set, we
   display an error and abort.
 * Remove calls from rmaps/rank_file component to register and lookup
   opal_paffinity mca params. 
 * Remove code in orte/odls that set affinities - instead, have them
   just pass a slot_list if it exists. 
 * Cleanup the orte/odls code that determined
   oversubscribed/want_processor as these were just opposites of each
   other.

This commit was SVN r18874.

The following Trac tickets were found above:
  Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383
2008-07-10 21:12:45 +00:00
Lenny Verkhovsky
ba1fa73881 Selectign Maffinity only if Paffinity selected fix
This commit was SVN r18797.
2008-07-03 13:39:34 +00:00
Pak Lui
188c8bce5d Fix the SEGV when module_get finds that no proc is binded. Also make no-intr available for processor binding.
This commit was SVN r18671.
2008-06-18 16:03:08 +00:00
Josh Hursey
1de50b523c Fix some Coverity 'Event set_but_not_used' highlights.
Thanks to Jeff for bringing them to my attention.

This commit was SVN r18606.
2008-06-06 14:38:41 +00:00
Jeff Squyres
12a3fe57e1 As pointed out by Ralf
W. (http://www.open-mpi.org/community/lists/devel/2008/06/4095.php),
these dependencies don't need to be here.

This commit was SVN r18603.
2008-06-06 01:20:47 +00:00
Lenny Verkhovsky
a8b5dcb204 Added more output info about socket:core pair in paffinity / rankfile components
This commit was SVN r18589.
2008-06-05 10:28:44 +00:00
Jeff Squyres
d45cb82ecc Fix two bugs in PLPA:
1. If we don't have the topology information, don't bother trying to
    create cross-referencing information
 1. Ensure to only check for valid processor ID's

This commit was SVN r18462.
2008-05-20 12:57:12 +00:00
Terry Dontje
517abf9b09 This commit fixes trac:1288.
This commit was SVN r18441.

The following Trac tickets were found above:
  Ticket 1288 --> https://svn.open-mpi.org/trac/ompi/ticket/1288
2008-05-15 17:40:08 +00:00
Josh Hursey
9971bc9d95 Merge in the mca_base_select changes per RFC:
http://www.open-mpi.org/community/lists/devel/2008/04/3779.php

{{{
svn merge -r 18276:18380 https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play .
}}}

Any components not in the trunk, but in one of the effected frameworks *must* be
updated. Contact the list, look at the RFC, or look at the diff for how to do this.

Sorry for the early commit of this, but I wanted to get it in today (per RFC) and
didn't know if I would have a chance later today.

This commit was SVN r18381.
2008-05-06 18:08:45 +00:00
George Bosilca
60111ce66d Few less warnings.
This commit was SVN r18025.
2008-03-30 19:06:49 +00:00
Lenny Verkhovsky
fa6a084d33 added opal/mca/paffinity/base/paffinity_base_service.c with paffinity functions
This commit was SVN r18020.
2008-03-30 12:01:02 +00:00
Lenny Verkhovsky
7e45d7e134 Few updates due to RMAPS rank_file component changes
1. applied prefix rule to functions and variables of RMAPS rank_file component
2. cleaned ompi_mpi_init.c from paffinity code
3. paffinity code moved to new opal/mca/paffinity/base/paffinity_base_service.c file
4. added opal_paffinity_slot_list mca parameter

This commit was SVN r18019.
2008-03-30 11:52:11 +00:00
Jeff Squyres
314ab2c6e7 Update internal libevent to upstream (v1.4.2-rc + OMPI changes).
Greatly reduce the number of "foo" -> "opal_foo" symbol renames in the
libevent source, and instead greatly expand the event_rename.h file
that uses preprocessor macros to make all public symbols be
"opal_foo".

This commit was SVN r17923.
2008-03-23 12:33:04 +00:00
Rolf vandeVaart
91af56db00 Fix a few typos so this compiles on Solaris. Remove some trailing spaces.
This commit was SVN r17746.
2008-03-05 20:16:00 +00:00
Jeff Squyres
ea5c0cb4a2 Now that the nightly tarball has safely been made, let's try this
commit again.  Remove the svn:ignore from problematic directories and
try a merge from /tmp-public/plpa-merge-area2.

This commit was SVN r17718.
2008-03-05 02:45:15 +00:00
Jeff Squyres
8189fcc7d5 Back out r17702; it went very badly.
This commit was SVN r17704.

The following SVN revision numbers were found above:
  r17702 --> open-mpi/ompi@3df754ebd7
2008-03-05 00:42:39 +00:00