openmpi

Author	SHA1	Message	Date
Wesley Bland	e1ba09ad51	Add resilience to ORTE. Allows the runtime to continue after a process (or ORTED) failure. Note that more work will be necessary to allow the MPI layer to take advantage of this. Per RFC: http://www.open-mpi.org/community/lists/devel/2011/06/9299.php This commit was SVN r24815.	2011-06-23 20:38:02 +00:00
Ralph Castain	1297acde13	George raised some valid concerns about the extensibility of the revised rmaps framework. Address those by: 1. removing the enum of mapper values 2. change the req_mapper and last_mapper fields to char* so they can hold the component name instead of a mapper flag 3. revise the selection logic in the mapper components to reflect the change. Components now look for their name in the req_mapper field, or to see if other criteria (e.g., npernode) are set that mandate their doing the mapping Several MCA params resided in the rmaps base for historical reasons - they have been in the base since at least the original 1.2 release (and perhaps earlier). However, George correctly pointed out that they really should reside in their respective components. Accordingly, move them to the components, but register synonyms to the old names to avoid breaking backward compatibility. These revisions retain the current functionality of allowing comm_spawn'd jobs to use different mappers than the original job, and for the errmgr to utilize the resilient mapper to recover processes regardless of how they were originally mapped. Given the large number of possible combinations, I am sure that someone will find a corner-case combination of values and selection criteria that causes either no mapper to be selected, or one other than the intended to be used. No one can test all the ways people will use this system, so I expect debugging to continue for a while. The ability of comm_spawn'd jobs to exploit this functionality relies on changes to the orte_dpm component - this will be committed separately. This commit was SVN r24520.	2011-03-12 05:30:09 +00:00
Ralph Castain	5120e6aec3	Redefine the rmaps framework to allow multiple mapper modules to be active at the same time. This allows users to map the primary job one way, and map any comm_spawn'd job in a different way. Modules are given the opportunity to map a job in priority order, with the round-robin mapper having the highest default priority. Priority of each module can be defined using mca param. When called, each mapper checks to see if it can map the job. If npernode is provided, for example, then the loadbalance mapper accepts the assignment and performs the operation - all mappers before it will "pass" as they can't map npernode requests. Also remove the stale and never completed topo mapper. This commit was SVN r24393.	2011-02-15 23:24:31 +00:00
Ralph Castain	f1f156d57b	Make rmaps base open function play nicely with ompi_info This commit was SVN r22111.	2009-10-20 07:28:23 +00:00
Ralph Castain	d8d80d6f1a	Closes trac:2054. Check if a user specifies more cpus-per-rank than there are cpus in a socket - if so, politely tell them "you are stupid" and abort. This commit was SVN r22091. The following Trac tickets were found above: Ticket 2054 --> https://svn.open-mpi.org/trac/ompi/ticket/2054	2009-10-13 04:19:07 +00:00
Ralph Castain	1475d34c13	Ensure we default to byslot mapping This commit was SVN r22090.	2009-10-11 23:50:42 +00:00
Ralph Castain	dff0d01673	Yet another paffinity cleanup...sigh. 1. ensure that orte_rmaps_base_schedule_policy does not override cmd line settings 2. when you try to bind to more cores than we have, generate a not-enough-processors error message 3. allow npersocket -bind-to-core combination - because, yes, somebody actually wants to do it. This commit was SVN r21996.	2009-09-22 18:44:53 +00:00
Ralph Castain	8da3aa8d5c	Some (hopefully final!) adjustments and corrections to the paffinity support: 1. default -npersocket to force -bind-to-socket 2. if we cannot get a value for cores/socket, try using #logical cpus. otherwise, default to 1 core 3. add missing error message for not-enough-processors 4. since we no longer loop through orte_register_params twice, put the auto-detect of topology info in the rte_init for hnp and std_orted 5. fix bind-to-core, bysocket combination This commit was SVN r21992.	2009-09-22 15:41:03 +00:00
Ralph Castain	0394a4884d	Setup cpus-per-proc and cpus-per-rank as synonyms, both in mca params and on mpirun cmd line This commit was SVN r21914.	2009-08-30 14:30:36 +00:00
Ralph Castain	2d27bc9824	Default npersocket to bind-to-socket unless otherwise directed This commit was SVN r21904.	2009-08-27 13:21:14 +00:00
Ralph Castain	5e710928a5	Revise the new binding system slightly: 1. finalize the logic for properly respecting externally assigned bindings. Thanks to Chris Samuel for his help with this. Still needs some acid testing, but appears to now work. 2. remove the double-logic of requiring opal_paffinity_alone AND bind-to-foo. If the user specifies bind-to-foo, trust her and just do it. This commit was SVN r21885.	2009-08-26 02:01:49 +00:00
Ralph Castain	0005e6e834	Correct a couple of bugs in the rank_file mapper that were incorrectly assigning vpids. Add a capability to parse the rankfile to extract node information in place of requiring both hostfile and rankfile for non-RM managed environments. The rankfile is -only- parsed for this IF the hostfile and -host options are not given. Otherwise, those are used to establish allocation info as we did before this commit. This commit was SVN r21815.	2009-08-13 16:08:43 +00:00
Ralph Castain	1dc12046f1	Modify the OMPI paffinity and mapping system to support socket-level mapping and binding. Mostly refactors existing code, with modifications to the odls_default module to support the new capabilities. Adds several new mpirun options: * -bysocket - assign ranks on a node by socket. Effectively load balances the procs assigned to a node across the available sockets. Note that ranks can still be bound to a specific core within the socket, or to the entire socket - the mapping is independent of the binding. * -bind-to-socket - bind each rank to all the cores on the socket to which they are assigned. * -bind-to-core - currently the default behavior (maintained from prior default) * -npersocket N - launch N procs for every socket on a node. Note that this implies we know how many sockets are on a node. Mpirun will determine its local values. These can be overridden by provided values, either via MCA param or in a hostfile Similar features/options are provided at the board level for multi-board nodes. Documentation to follow... This commit was SVN r21791.	2009-08-11 02:51:27 +00:00
Rainer Keller	6c1cce8761	- For the upcoming header cleanup commit, several header files (previously included by header-files) now have to be moved "upward". This is mainly system headers such as string.h, stdio.h and for networking, but also some orte headers. This commit was SVN r21095.	2009-04-29 00:49:23 +00:00
Rainer Keller	04567d3af0	- Header orte/mca/errmgr/errmgr.h is not needed. Once again compiles fine with -Wimplicit-function-declaration This commit was SVN r20640.	2009-02-26 04:05:30 +00:00
Rainer Keller	d81443cc5a	- On the way to get the BTLs split out and lessen dependency on orte: Often, orte/util/show_help.h is included, although no functionality is required -- instead, most often opal_output.h, or orte/mca/rml/rml_types.h Please see orte_show_help_replacement.sh committed next. - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration actually showed two missing #include "orte/util/show_help.h" in orte/mca/odls/base/odls_base_default_fns.c and in orte/tools/orte-top/orte-top.c Manually added these. Let's let MTT have the last word. This commit was SVN r20557.	2009-02-14 02:26:12 +00:00
Ralph Castain	89792bbc72	May as well have the other "clean" outputs use the same channel This commit was SVN r20082.	2008-12-08 19:37:22 +00:00
Ralph Castain	e64b79f30f	Modify the --display-map and --display-alloc per note on devel list to reduce info for user understanding. Add --display-devel-map and --display-devel-alloc to display all the detailed info we used to provide - it is only of use/interest to developers anyway and confuses users. This commit was SVN r19608.	2008-09-23 15:46:34 +00:00
Ralph Castain	3107545709	Ensure that ORTE processes such as mpirun and orted never inadvertently bind themselves to cores. Change the mca param name used by the rank_file mapper to get user directives on slot lists to be different from that used by MPI procs to discover their binding. Add a cmd line option to orterun to make it easier for a user to specify the slot list (basically, hide the mca param name). Discussed and reviewed with Lenny and Jeff. This commit was SVN r19062.	2008-07-28 14:18:36 +00:00
Ralph Castain	0532d799d6	Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm. Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed. This commit was SVN r18664.	2008-06-18 03:15:56 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Ralph Castain	0da811ce79	Initial work on xml support - allocation and job map outputs completed. More to come. This commit was SVN r18587.	2008-06-04 20:53:12 +00:00
Ralph Castain	c992e99035	Remove the tags from orte_output_open and the filtering operation from orte_output - this will be handled differently to improve the XML output interface This commit was SVN r18557.	2008-06-03 14:24:01 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Ralph Castain	5311b13b60	Add a loadbalancing feature to the round-robin mapper - more to be sent to devel list Fix a potential problem with RM-provided nodenames not matching returns from gethostname - ensure that the HNP's nodename gets DNS-resolved when comparing against RM-provided hostnames. Note that this may be an issue for RM-based clusters that don't have local DNS resolution, but hopefully that is more indicative of a poorly configured system. This commit was SVN r18252.	2008-04-23 14:52:09 +00:00
Ralph Castain	3a0d09300b	Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations. Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study. This commit was SVN r18115.	2008-04-09 22:10:53 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Josh Hursey	729c63cf9d	Fix invalid MCA 'base' names so they appear in ompi_info. A subset of this patch needs to be applied to v1.2 Refs trac:928 This commit was SVN r15918. The following Trac tickets were found above: Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928	2007-08-18 03:05:45 +00:00
Ralph Castain	0d98264097	Fix the nolocal option on the OMPI trunk This commit was SVN r14138.	2007-03-24 16:16:16 +00:00
Ralph Castain	455e4ada9a	Bring the modified/updated pernode and npernode behaviors over from the openrte repository. This change enables npernode to pay attention to the total #procs to be launched, and cleans up the bynode vs. byslot mapping directives when in pernode and npernode modes. This commit was SVN r13191.	2007-01-18 17:15:19 +00:00
Ralph Castain	28ce8e5e5e	Extend the mpirun options to support "--npernode N". This option tells the system to spawn N procs/node across all nodes in the allocation. If N is greater than the number of allocated slots, then the usual oversubscription logic will apply (i.e., the system will error out if oversubscription is not allowed, otherwise it will run with the sched_yield set to non-aggressive behavior). In "--npernode" operation, the "-np" command line parameter is ignored. This commit was SVN r12826.	2006-12-12 00:54:05 +00:00
Ralph Castain	f771cc4fbd	Modify the reuse daemons procedure so we only generate the add_local_procs message once. Revise the display-map-at-launch option so the RMAPS framework takes responsibility for implementation of that option. Modify the RMAPS framework so we eliminate communicating a map to a backend node when certain attributes are set. The proxy functions are now implemented in the base, and a check made for HNP/non-HNP operation made in the map_jobs function prior to execution. This commit was SVN r12619.	2006-11-17 19:06:10 +00:00
Ralph Castain	f4a458532b	This doesn't totally resolve the comm_spawn problem, but it helps a little. I'll continue working on it and hope to resolve it completely shortly. The issue primarily centers on where to start mapping the child job's processes, and how to deal with oversubscription that might result. At the moment, I am trying to resolve the first issue first (hey, that even sounds right!). This change does a couple of things: 1. Since the USE_PARENT_ALLOC attribute is a directive regarding allocation of resources to a job, it more properly should be an attribute of the RAS. Change the name to reflect that and move the attribute define to the ras_types.h file. 2. Add the attributes list to the RMAPS map_job interface. This provides us with the desired flexibility to dynamically specify directives for mapping. The system will - in the absence of any attribute-based directive - default to the values provided in the MCA parameters (either from environment or command-line interface). This commit was SVN r12164.	2006-10-18 14:01:44 +00:00
Ralph Castain	699ffcf359	Restore the "bynode" mapping functionality - accidentally deleted setting of parameter This commit was SVN r12078.	2006-10-10 19:41:22 +00:00
Ralph Castain	98dd57b70e	Add a new option to launch "pernode" - launches one process/node across all available nodes. The other options also work correctly: "-bynode" with no -np will launch on all slots, mapped on a per-node basis. This commit was SVN r12063.	2006-10-07 19:50:12 +00:00
Ralph Castain	ae79894bad	Bring the map fixes into the main trunk. This should fix several problems, including the multiple app_context issue. I have tested on rsh, slurm, bproc, and tm. Bproc continues to have a problem (will be asking for help there). Gridengine compiles but I cannot test (believe it likely will run). Poe and xgrid compile to the extent they can without the proper include files. This commit was SVN r12059.	2006-10-07 15:45:24 +00:00
Ralph Castain	cd7d87aa7b	Define the map data types for dss compatibility. Setup to debug bproc This commit was SVN r11955.	2006-10-03 17:40:00 +00:00
Ralph Castain	37dfdb76eb	Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done. This commit was SVN r11661.	2006-09-14 21:29:51 +00:00
Ralph Castain	febc143d8c	Per LANL's stated need, add functionality that runs a.out across ALL available process slots if no num_proc is specified on the command line. However, please note the following limitation: we ONLY allow ONE application to be specified on the command line when this feature is invoked. If multiple apps are specified, the user MUST also specify the number to be launched for each and every one of them. Update the help text to report errors when not following that rule. Also updated the RMAPS help text to reflect the reorganization of some of the round-robin code into the base. The new functionality has been tested under Mac OS-X and on Odin using an MPI program. Both byslot and bynode mapping have been checked and verified. Operational support for other systems needs to be verified - I respectfully request people's help in doing so. This commit was SVN r10708.	2006-07-10 21:25:33 +00:00
Ralph Castain	3d220cbd48	This patch fixes several issues relating to comm_spawn and N1GE. In particular, it does the following: 1. Modifies the RAS framework so it correctly stores and retrieves the actual slots in use, not just those that were allocated. Although the RAS node structure had storage for the number of slots in use, it turned out that the base function for storing and retrieving that information ignored what was in the field and simply set it equal to the number of slots allocated. This has now been fixed. 2. Modified the RMAPS framework so it updates the registry with the actual number of slots used by the mapping. Note that daemons are still NOT counted in this process as daemons are NOT mapped at this time. This will be fixed in 2.0, but will not be addressed in 1.x. 3. Added a new MCA parameter "rmaps_base_no_oversubscribe" that tells the system not to oversubscribe nodes even if the underlying environment permits it. The default is to oversubscribe if needed and the underlying environment permits it. I'm sure someone may argue "why would a user do that?", but it turns out that (looking ahead to dynamic resource reservations) sometimes users won't know how many nodes or slots they've been given in advance - this just allows them to say "hey, I'd rather not run if I didn't get enough". 4. Reorganizes the RMAPS framework to more easily support multiple components. A lot of the logic in the round_robin mapper was very valuable to any component - this has been moved to the base so others can take advantage of it. 5. Added a new test program "hello_nodename" - just does "hello_world" but also prints out the name of the node it is on. 6. Made the orte_ras_node_t object a full ORTE data type so it can more easily be copied, packed, etc. This proved helpful for the RMAPS code reorganization and might be of use elsewhere too. This commit was SVN r10697.	2006-07-10 14:10:21 +00:00
Jeff Squyres	538965aeb0	Final merge of stuff from /tmp/tm-stuff tree (merged through /tmp/tm-merge). Validated by RHC. Summary: - Add --nolocal (and -nolocal) options to orterun - Make some scalability improvements to the tm pls This commit was SVN r10651.	2006-07-04 20:12:35 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Jeff Squyres	0629cdc2d7	Bring back the changes from /tmp/jjhursey-rmaps. Specific merge command: svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps . (where "." is a trunk checkout) The logs from this branch are much more descriptive than I will put here (including a really long description from last night). Here's the short version: - fixed some broken implementations in ras and rmaps - "orterun --host ..." now works and has clearly defined semantics (this was the impetus for the branch and all these fixes -- LANL had a requirement for --host to work for 1.0) - there is still a little bit of cleanup left to do post-1.0 (we got correct functionality for 1.0 -- we did not fix bad implementations that still "work") - rds/hostfile and ras/hostfile handshaking - singleton node segment assignments in stage1 - remove the default hostfile (no need for it anymore with the localhost ras component) - clean up pls components to avoid duplicate ras mapping queries - [possible] -bynode/-byslot being specific to a single app context This commit was SVN r7664.	2005-10-07 22:24:52 +00:00
Jeff Squyres	cce0950df7	- change a bunch of OMPI_* constants to ORTE_* equivalents - change the framework opens to [mostly] use the new MCA param API - properly pass in framework debug output streams to the mca_base_component_open() function This commit was SVN r6888.	2005-08-15 18:25:35 +00:00
Jeff Squyres	6a9c9953bc	Remove a bunch of -I's that are no longer necessary with properly-prefixed static-component.h files. This commit was SVN r6342.	2005-07-04 18:24:58 +00:00
Brian Barrett	a13166b500	* rename ompi_output to opal_output This commit was SVN r6329.	2005-07-03 23:31:27 +00:00
Brian Barrett	761402f95f	* rename ompi_list to opal_list This commit was SVN r6322.	2005-07-03 16:22:16 +00:00
Jeff Squyres	1b18979f79	Initial population of orte tree This commit was SVN r6266.	2005-07-02 13:42:54 +00:00

49 Commits