openmpi

Author	SHA1	Message	Date
Ralph Castain	12590202d8	Cleanup warnings This commit was SVN r23148.	2010-05-16 20:22:00 +00:00
Ralph Castain	da170a7ab9	Turn off the blasted hwloc component as it generates a ton of garbage. Note that this means linux-based systems will -not- have paffinity for now since the good old plpa module was removed. Clean up some missing ignores This commit was SVN r23147.	2010-05-16 20:06:14 +00:00
Jeff Squyres	e2ab4f2baf	Should be working now... This commit was SVN r23143.	2010-05-14 15:20:47 +00:00
Jeff Squyres	610fc67d12	Oops -- don't convert to a processor ID here; just return the OS index of the core. This commit was SVN r23142.	2010-05-14 15:14:28 +00:00
Jeff Squyres	a27da2473a	Ensure the whole directory is built. This commit was SVN r23140.	2010-05-14 13:21:09 +00:00
Jeff Squyres	3ba4086b0f	Remove another debugging message. This commit was SVN r23139.	2010-05-14 13:20:46 +00:00
Jeff Squyres	a1848ef8d5	Arf. Ignore this component while I fix vpath builds... This commit was SVN r23138.	2010-05-14 13:03:02 +00:00
Jeff Squyres	2d01a67516	Remove these generated files from SVN. This commit was SVN r23137.	2010-05-14 11:58:17 +00:00
Jeff Squyres	8c8efa9bf3	Remove debugging message. This commit was SVN r23136.	2010-05-14 11:57:43 +00:00
Jeff Squyres	21178f9379	Remove the "linux" paffinity component (i.e., the one that was based on the now-defunct PLPA) -- the new hwloc component supersedes it. So long, PLPA -- we loved ya! This commit was SVN r23126.	2010-05-13 23:59:21 +00:00
Jeff Squyres	3129ccd9ec	Make the hwloc paffinity component available for everyone. hwloc supports a wide variety of operating systems and platforms; see the opal/mca/paffinity/hwloc/hwloc/README file for details. This component includes an embedded copy of hwloc, currently based on hwloc-1.0rc6. But note that hwloc is properly SVN imported into the /vendor branch, so it will be easy to update when 1.0 GA is released. Note that the hwloc tree embedded in opal/mca/paffinity/hwloc/hwloc is identical to a hwloc distribution tarball, except that much of the documentation was rm -rf'ed (because we don't need it for the embedded case). Since the paffinity framework currently does not understand hardware threads, the hwloc component compensates for this by identifying cores by the "first" hardware thread on that core. Hopefully we'll update paffinity someday to understand hardware threads. :-) configure grew a --with-hwloc option, analogous to what we do for many other external libraries that OMPI supports. However, there's a new feature: due to the request of several distros, OMPI can be configured to build with its internal copy of hwloc or with an external copy of hwloc (e.g., a system-installed hwloc). 1. If --with-hwloc is not specified, Open MPI will try to use its internal copy (but silently fail/ignore hwloc if that fails). 2. If --with-hwloc=<dir> is supplied, Open MPI looks for hwloc support in <dir> (and --with-hwloc-libdir=<dir>, if specified). 3. If --with-hwloc=external is supplied, Open MPI will look for hwloc in a compiler/linker default external location. 4. If --with-hwloc=internal is supplied, Open MPI will use its internal copy of hwloc. Some of OMPI's main configury had to be slightly re-arranged in the bootstrapping phase to accommodate hwloc's configury needs. This commit was SVN r23125.	2010-05-13 23:56:05 +00:00
Jeff Squyres	ca6d95a9c8	Clean up some comments; make paffinity/base/base.h comments agree with paffinity/paffinity.h. This commit was SVN r23124.	2010-05-13 23:43:28 +00:00
Jeff Squyres	bf7954c1de	Bump up to 1.0rc6 from the vendor branch. This commit was SVN r23117.	2010-05-12 17:04:48 +00:00
Ralph Castain	d6a1d7a082	Little more cleanup on paffinity. Provide a specific error code for affinity not supported so we can better report the problem. Move the error reporting to orterun so we only get one error message. Update the darwin paffinity module to return the correct new error codes. This commit was SVN r23107.	2010-05-07 14:04:55 +00:00
Ralph Castain	d4f56cff61	More cleanup on paffinity....groan It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected. Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages. This commit was SVN r23106.	2010-05-06 20:57:17 +00:00
Jeff Squyres	71cbe1a69f	Bump up to hwloc v1.0rc3 This commit was SVN r23070.	2010-04-29 15:59:01 +00:00
Jeff Squyres	f064056a07	We don't need all this stuff in OMPI. This commit was SVN r23056.	2010-04-28 00:31:15 +00:00
Jeff Squyres	2fe1bc043d	Bump up to hwloc 1.0rc2 This commit was SVN r23042.	2010-04-26 21:57:51 +00:00
Jeff Squyres	ea8b0ea569	Add a new function in the paffinity base: opal_paffinity_base_cset2str(). This function basically makes a prettyprint string out of an opal_paffinity_base_cset_t. This commit was SVN r23017.	2010-04-21 17:26:36 +00:00
Jeff Squyres	53ab6600e6	Minor update to comments. This commit was SVN r23013.	2010-04-20 20:59:42 +00:00
Jeff Squyres	f1d4a748eb	Minor fix: pass by pointer to the new function so that the caller can see the results. This commit was SVN r23012.	2010-04-20 19:52:47 +00:00
Ralph Castain	7717c970a3	Ahem...it requires 2 hex chars to describe each byte of a bitmask... This commit was SVN r23001.	2010-04-20 05:11:16 +00:00
Ralph Castain	86228aee38	Provide two new opal paffinity utilities for printing a hex representation of the cpu set and parsing that string back into a cpu set on the other end. Also add a new MCA param for passing the cpu set applied to a process during launch down to that process so it can know what we attempted to do. All to be used in some new MPI extensions provided by Jeff so that users can easily query their binding situation. This commit was SVN r22998.	2010-04-19 22:16:35 +00:00
Jeff Squyres	338920656f	Remove the compile-time priorities for paffinity modules (they were done this way a long time ago for the "gee whiz!" factor -- when in reality, they really only need one-of-many-run-time priority selection). Changed run-time priorities to be as follows: * darwin: 20 * linux: 20 * posix: 10 * solaris: 30 * test: 5 * windows: 20 I have a very dim (possibly untrue) recollection that Solaris needs to have a higher priority than others just to ensure that no other is chosen under Solaris. Make all other "native" components have a priority of 20 (they shouldn't conflict with each other). Make the posix fallback component have a priority of 10. Make the test component priority 5, meaning someone can always select it, but you can also make a "never select me" component that prioritizes itself under test. This commit was SVN r22997.	2010-04-19 22:14:06 +00:00
Jeff Squyres	9f5ddbcc6e	3rd party import hwloc 1.0rc1 into the SVN trunk This commit was SVN r22996.	2010-04-19 19:48:58 +00:00
Jeff Squyres	8b163ccd70	Add dummy hwloc directory for staged import into svn This commit was SVN r22994.	2010-04-19 19:43:43 +00:00
Ralph Castain	4d06125a33	Establish a method by which a process knows if it has been bound by mpirun. This helps resolve a problem where a process gets "bound" to all available resources, which looks to the opal paffinity system as "not bound". This can cause mpi_init to attempt to "bind" the process itself, causing unintended behavior. This commit was SVN r22985.	2010-04-17 01:58:26 +00:00
Ralph Castain	41428e6b61	Issue a warning if a requested binding operation results in processes being bound to all available processes, which is the equivalent of not being bound at all. See the following email thread for further details: http://www.open-mpi.org/community/lists/devel/2010/04/7745.php This commit was SVN r22984.	2010-04-17 01:02:41 +00:00
Terry Dontje	282a537cf7	This commit fixes 2370, by having the solaris paffinity module return error codes for get_physical_processor_id and having odls_default_fork_local_proc check get_physical_processor_id for OPAL_ERROR This commit was SVN r22948.	2010-04-09 15:10:46 +00:00
Brad Benton	58a9aeff5a	Modify the OPAL_PAFFINITY_PROCESS_IS_BOUND macro to search the cpuset for the maximum possible number of cpus rather than just the number of cpus currently online. This corrects a problem where mpi_paffinity_alone was not working properly on systems in which there can be cpu namespaces with holes, such as on ppc64 with smt off (as discussed in #2365). This commit was SVN r22927.	2010-04-02 18:24:12 +00:00
Jeff Squyres	a89dc623b0	Brice Goglin noticed that mpi_paffinity_alone didn't seem to be doing anything for non-MPI apps. Oops! (But before you freak out, gentle reader, note that mpi_paffinity_alone for MPI apps still worked fine) When we made the switchover somewhere in the 1.3 series to have the orted's do processor binding, then stuff like: mpirun --mca mpi_paffinity_alone 1 hostname should have bound hostname to processor 0. But it didn't because of a subtle startup ordering issue: the MCA param registration for opal_paffinity_alone was in the paffinity base (vs. being in opal/runtime/opal_params.c), but it didn't actually get registered until after the global variable opal_paffinity_alone was checked to see if we wanted old-style affinity bindings. Oops. However, for MPI apps, even though the orted didn't do the binding, ompi_mpi_init() would notice that opal_paffinity_alone was set, yet the process didn't seem to be bound. So the MPI process would bind itself (this was done to support the running-without-orteds scenarios). Hence, MPI apps still obeyed mpi_paffinity_alone semantics. But note that the error described above caused the new mpirun switch --report-bindings to not work with mpi_paffinity_alone=1, meaning that the orted would not report the bindings when mpi_paffinity_alone was set to 1 (it ''did'' correctly report bindings if you used --bind-to-core or one of the other binding options). This commit separates out the paffinity base MCA param registration into a small function that can be called at the Right place during the startup sequence. This commit was SVN r22602.	2010-02-10 22:32:00 +00:00
Jeff Squyres	dbb29663e8	Update the embedded PLPA version to v1.3.2. Since this is a 3rd party/"vendor" import, the changes are actually far smaller than the size of this changeset implies. Here's a list of the changes: * Update the AMD license header in plpa_map.c to be less restrictive (see https://svn.open-mpi.org/trac/plpa/changeset/262 for details) -- '''this is the most/only important change of this update.''' No code is changed by this; only removing a clause from a license header in plpa_map.c. * Changes to the generated {{{configure}}}, {{{config.guess}}}, and {{{config.sub}}} scripts (which aren't used by OMPI). * soname version tracking changes (which also aren't used by OMPI; they're only used when PLPA is built/installed in "standalone" mode). * Update the "get version" m4 (which was stolen from OMPI's m4 to begin with, and is only used during OMPI's autogen.sh step). * Update various PLPA version numbers to 1.3.2. * Bug fix in plpa-taskset (which is not built in the OMPI PLPA build). This commit was SVN r22367.	2010-01-06 00:44:14 +00:00
Jeff Squyres	9afe50d886	Update Cisco copyrights for consistency This commit was SVN r22072.	2009-10-07 22:02:32 +00:00
Jeff Squyres	7900451e4e	Fix CID 1326: for the (unlikely) case where opal_paffinity_base_get_processor_info() returns failure. This commit was SVN r22069.	2009-10-07 19:52:08 +00:00
Jeff Squyres	977574bd45	Fix a problem noted by Julian Seward: MAKE_MEM_UNDEFINED is not the opposite of MAKE_MEM_DEFINED. Also add in a call to NOACCESS to (mostly) reverse the effects of MAKE_MEM_DEFINED (technically, page 0 was accessible before this, even though it's a Bad Idea to access it). This commit was SVN r22056.	2009-10-06 17:55:49 +00:00
Eugene Loh	67bac2fe31	Fix paffinity_linux_module.c. The set and get functions transferred cpu masks between the mask argument and a local PLPA mask. There were three problems: 1) The "get" function computed the number of bits as sizeof(mask), which is the size of the pointer to the mask rather than the mask itself. So, only 4 bits were copied with m32 and 8 bits with m64. There are actually 1024 bits. 2) The "get" and "set" functions both copied a number of bits computed from the sizeof() mask, but sizeof() reports the number of bytes. We have to multiply by 8 to get the number of bits. 3) These two functions check to make sure that the mask argument is not bigger than the PLPA mask. But, the set function copies a number of bits in the PLPA mask, which is conceivably greater than the number of bits in the mask argument. So, accesses to the mask argument may overrun that argument. Problems 1 and 2 meant that one would encounter errors when the number of cores exceeded 4 (with -m32) or 8 (with -m64). Problem 3 probably caused no errors. This commit was SVN r21993.	2009-09-22 16:00:37 +00:00
Ralph Castain	2028017554	Modify the paffinity system to handle binding directives that are "soft" - i.e., when someone directs that we bind if the system supports it. This allows community members to distribute OMPI with default MCA param files that direct general binding policies, without having the distributed software fail if the system cannot support those policies. The new options work by adding an ":if-avail" qualifier to the "bind-to-socket" and "bind-to-core" MCA params. If the system does not support this capability, the job will launch anyway. Without the qualifier, the job will abort with an error message indicating that the required functionality is not supported on this system. This commit was SVN r21975.	2009-09-18 19:48:42 +00:00
Jeff Squyres	2fa048b0e0	Make the paffinity test component only build when you --enable-debug (or have a developer build where that's enabled for you by default). This commit was SVN r21928.	2009-09-02 11:23:54 +00:00
Ralph Castain	388c65fd80	Add missing include file This commit was SVN r21924.	2009-09-01 13:31:54 +00:00
Ralph Castain	888f3c3afe	Extend the paffinity test module to allow users to specify the number of sockets and cores - provides an extended ability to mimic archs. This commit was SVN r21912.	2009-08-29 03:35:39 +00:00
Ralph Castain	3c4f28b22c	Modify the paffinity test module to take a param indicating whether or not to mimic being externally bound This commit was SVN r21908.	2009-08-28 02:31:01 +00:00
Ralph Castain	2178f995b9	Add a new "test" module to the paffinity framework that mimics a system that supports affinity when running on a Mac for development purposes. Only active if specifically called out. This commit was SVN r21881.	2009-08-26 01:55:30 +00:00
Rainer Keller	8e1b23779f	- Replace combinations of #if defined (c_plusplus) defined (__cplusplus) followed by extern "C" { and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS. Notable exceptions are: - opal/include/opal_config_bottom.h: This is our generated code, that itself defines BEGIN_C_DECL and END_C_DECL - ompi/mpi/cxx/mpicxx.h: Here we do not include opal_config_bottom.h: - Belongs to external code: opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h - opal/include/opal/prefetch.h: Has C++ specific macros that are protected: - Had #if ... } #endif _and_ END_C_DECLS (aka end up with 2x END_C_DECLS) ompi/mca/btl/openib/btl_openib.h - opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS... - opal/win32/ompi_process.h: had extern "C"\n {... opal/win32/ompi_process.h: ditto - ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS ompi/mpi/f90/test/align_c.c: ditto - ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus - ompi/mpi/f90/xml/common-C.xsl: Amend Tested on linux using --with-openib and --with-mx The following do not contain either opal_config.h, orte_config.h or ompi_config.h (but possibly other header files, that include one of the above): ompi/mca/bml/r2/bml_r2_ft.h ompi/mca/btl/gm/btl_gm_endpoint.h ompi/mca/btl/gm/btl_gm_proc.h ompi/mca/btl/mx/btl_mx_endpoint.h ompi/mca/btl/ofud/btl_ofud_endpoint.h ompi/mca/btl/ofud/btl_ofud_frag.h ompi/mca/btl/ofud/btl_ofud_proc.h ompi/mca/btl/openib/btl_openib_mca.h ompi/mca/btl/portals/btl_portals_endpoint.h ompi/mca/btl/portals/btl_portals_frag.h ompi/mca/btl/sctp/btl_sctp_endpoint.h ompi/mca/btl/sctp/btl_sctp_proc.h ompi/mca/btl/tcp/btl_tcp_endpoint.h ompi/mca/btl/tcp/btl_tcp_ft.h ompi/mca/btl/tcp/btl_tcp_proc.h ompi/mca/btl/template/btl_template_endpoint.h ompi/mca/btl/template/btl_template_proc.h ompi/mca/btl/udapl/btl_udapl_eager_rdma.h ompi/mca/btl/udapl/btl_udapl_endpoint.h 
ompi/mca/btl/udapl/btl_udapl_mca.h ompi/mca/btl/udapl/btl_udapl_proc.h ompi/mca/mtl/mx/mtl_mx_endpoint.h ompi/mca/mtl/mx/mtl_mx.h ompi/mca/mtl/psm/mtl_psm_endpoint.h ompi/mca/mtl/psm/mtl_psm.h ompi/mca/pml/cm/pml_cm_component.h ompi/mca/pml/csum/pml_csum_comm.h ompi/mca/pml/dr/pml_dr_comm.h ompi/mca/pml/dr/pml_dr_component.h ompi/mca/pml/dr/pml_dr_endpoint.h ompi/mca/pml/dr/pml_dr_recvfrag.h ompi/mca/pml/example/pml_example.h ompi/mca/pml/ob1/pml_ob1_comm.h ompi/mca/pml/ob1/pml_ob1_component.h ompi/mca/pml/ob1/pml_ob1_endpoint.h ompi/mca/pml/ob1/pml_ob1_rdmafrag.h ompi/mca/pml/ob1/pml_ob1_recvfrag.h ompi/mca/pml/v/pml_v_output.h opal/include/opal/prefetch.h opal/mca/timer/aix/timer_aix.h opal/util/qsort.h test/support/components.h This commit was SVN r21855. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2009-08-20 11:42:18 +00:00
Ralph Castain	aca3e71ccd	Don't declare us "bound" if the cpu mask is completely zero This commit was SVN r21839.	2009-08-19 18:55:06 +00:00
Ralph Castain	1dc12046f1	Modify the OMPI paffinity and mapping system to support socket-level mapping and binding. Mostly refactors existing code, with modifications to the odls_default module to support the new capabilities. Adds several new mpirun options: * -bysocket - assign ranks on a node by socket. Effectively load balances the procs assigned to a node across the available sockets. Note that ranks can still be bound to a specific core within the socket, or to the entire socket - the mapping is independent of the binding. * -bind-to-socket - bind each rank to all the cores on the socket to which they are assigned. * -bind-to-core - currently the default behavior (maintained from prior default) * -npersocket N - launch N procs for every socket on a node. Note that this implies we know how many sockets are on a node. Mpirun will determine its local values. These can be overridden by provided values, either via MCA param or in a hostfile Similar features/options are provided at the board level for multi-board nodes. Documentation to follow... This commit was SVN r21791.	2009-08-11 02:51:27 +00:00
Ralph Castain	c459615f8f	When someone specifies a rank-file slot-list of N:*, stop the loop at the proper place (we were going through the loop one too many times). Thanks to Eugene for spotting it. This commit was SVN r21728.	2009-07-23 17:51:15 +00:00
George Bosilca	3e971e61f3	The system headers are supposed to be protected by #ifdef and not by #if. This commit was SVN r21700.	2009-07-16 18:27:33 +00:00
Ralph Castain	d3fb39073f	Initialize a variable to ensure we get the correct number of bound processors This commit was SVN r21590.	2009-07-02 17:48:04 +00:00
Ralph Castain	c3c1ab1337	Correct a comment in paffinity.h about what paffinity_get returns - it was inaccurate. Revamp the affinity detection/set procedure in mpi_init to correctly detect when we have already been bound to processors, given the revised understanding of paffinity_get. Add a new paffinity macro to make checking for already bound a little nicer. This commit was SVN r21402.	2009-06-09 14:33:35 +00:00
Ralph Castain	aa25a51c92	Do not mark the mpi_paffinity_alone param as deprecated so we don't scare Jeff...er...users. This commit was SVN r21218.	2009-05-12 15:41:11 +00:00
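
The three mask-copy problems described in the message for SVN r21993 (commit 67bac2fe31) can be illustrated with a small C sketch. This is not the code from paffinity_linux_module.c: the type demo_cpu_mask_t and the functions buggy_bit_count, correct_bit_count, and copy_mask_bits are hypothetical names invented for this illustration, and the 1024-bit mask width is taken from the commit message only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CPU mask type (not the real PLPA or OPAL
 * type). The 1024-bit width is the figure quoted in the commit message. */
#define DEMO_MASK_BITS 1024

typedef struct {
    unsigned char bytes[DEMO_MASK_BITS / CHAR_BIT];
} demo_cpu_mask_t;

/* Problems 1 and 2 combined: sizeof applied to a pointer yields the size
 * of the pointer in bytes (typically 4 or 8), and that byte count is then
 * treated as if it were a bit count. */
size_t buggy_bit_count(const demo_cpu_mask_t *mask)
{
    return sizeof(mask);
}

/* Take the size of the pointed-to object, then convert bytes to bits. */
size_t correct_bit_count(const demo_cpu_mask_t *mask)
{
    return sizeof(*mask) * CHAR_BIT;
}

/* Problem 3: bound the copy by the smaller of the two buffers so that
 * neither side is overrun. Returns the number of bits copied. */
size_t copy_mask_bits(unsigned char *dst, size_t dst_bytes,
                      const unsigned char *src, size_t src_bytes)
{
    size_t n = (dst_bytes < src_bytes) ? dst_bytes : src_bytes;

    assert(dst != NULL && src != NULL);
    memset(dst, 0, dst_bytes);   /* clear any bits we will not overwrite */
    memcpy(dst, src, n);
    return n * CHAR_BIT;
}
```

On a typical 32-bit build buggy_bit_count returns 4 and on a typical 64-bit build it returns 8, which matches the "4 bits with m32 and 8 bits with m64" symptom in the commit message, while correct_bit_count returns the full mask width.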


175 commits