openmpi

Author	SHA1	Message	Date
Ralph Castain	61d94fcee2	Fix the sequential mapper - it was out-of-sync with the hostfile changes, and we missed the "seq" policy when parsing the --map-by option. Thanks to Bill Chen for reporting it cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31333.	2014-04-08 03:38:25 +00:00
Jeff Squyres	82e104719a	hwloc/rmaps base: Add missing help message. Also, add missing ORTE_ERROR_LOG in the other case where this error message is used (i.e., ORTE_ERROR_LOG was used in the one place, so let's also use it in the other place). This commit was SVN r31321.	2014-04-07 15:39:54 +00:00
Ralph Castain	3fdcaeab97	Fix a problem where we need to abort due to a mapping failure, but we are in a managed environment and thus the orteds have not wired up. Thus, if we send the exit message across the routed network, the remote daemons won't have a way to relay the message along - and we won't exit. If we are aborting, then set the flags so the HNP directly sends an exit command to each daemon. Make it the halt_vm command so the remote daemon doesn't try to relay it, but instead just exits without waiting for its routed children to exit first. cmr=v1.8.1:reviewer=jsquyres:subject=fix hangs due to abort prior to daemon wireup This commit was SVN r31304.	2014-04-02 04:17:55 +00:00
Ralph Castain	390645ac2a	Per patch from Tetsuya Mishima, do a nicer job of warning the user that we need to map to a higher level to get the number of requested cpus/rank. Also, change the mapping policy to "byslot" when falling back to that option. cmr=v1.8:reviewer=rhc This commit was SVN r31196.	2014-03-24 15:47:29 +00:00
Ralph Castain	081669b440	When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings This commit was SVN r30968.	2014-03-10 15:53:07 +00:00
Ralph Castain	50c30d62ca	Repair builds without hwloc cmr=v1.7.5:reviewer=jsquyres This commit was SVN r30940.	2014-03-05 02:48:15 +00:00
Ralph Castain	0ac97761cc	Now that we are binding by default, the issue of #slots and what to do when oversubscribed has become a bit more complicated. This isn't a problem in managed environments as we are always provided an accurate assignment for the #slots, or when -host is used to define the allocation since we automatically assume one slot for every time a node is named. The problem arises when a hostfile is used, and the user provides host names without specifying the slots= parameter. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default. In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information. Also clean up a couple of issues in the mapping/binding system: * ensure we only override the binding directive if we are oversubscribed and overload is not allowed * ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch * minor cleanup to the warning message when oversubscribed and binding was requested cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system This commit was SVN r30909.	2014-03-03 16:46:37 +00:00
Ralph Castain	88b0e0cc6d	Allow the user to turn off the oversubscribed-binding warning if overload-allowed has been provided Refs trac:4317 This commit was SVN r30892. The following Trac tickets were found above: Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317	2014-02-28 17:55:53 +00:00
Ralph Castain	4a645f0342	Add detection of oversubscription with binding requested - if binding requested to core or hwt, warn and do not bind or else we will hurt performance. Also, if no binding directive was given, turn off the default binding Refs trac:4317 This commit was SVN r30888. The following Trac tickets were found above: Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317	2014-02-28 16:08:52 +00:00
Ralph Castain	61a21e4f31	Based on Tetsuya's patch, with some changes, correct the case of map-by node where multiple cpus/rank are requested and result in a non-integer match with num slots. Also correct tests for binding policy given to use the proper macro. Refs trac:4296 This commit was SVN r30857. The following Trac tickets were found above: Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296	2014-02-26 18:12:23 +00:00
Ralph Castain	c8112c1086	Loadbalancing across nodes (i.e., map-by node) wasn't working correctly - the algorithm relied on the nodes being defined in descending order of slots, or the number of slots remaining to be assigned being only one/node. Regardless, it didn't work for the case where nodes were defined in ascending order of slots. Tetsuya's proposed patch didn't solve the problem for me, but it did correct the case where cpus/proc > 1. The final patch requires that we loop over the assignment algo until all procs are assigned or all nodes are filled - any remaining procs are then handled in the cleanup loop. cmr=v1.7.5:reviewer=rhc:subject=fix map-by node for different cases This commit was SVN r30798.	2014-02-22 16:39:41 +00:00
Ralph Castain	91f90058ce	Add missing options and cleanup the code a bit. Default to by-slot ranking if a non-hardware option isn't given. Thanks to Tetsuya Mishima for the assist. cmr=v1.7.5:reviewer=ompi-gk1.7 This commit was SVN r30725.	2014-02-14 10:23:16 +00:00
Ralph Castain	fd9b301a8b	Check equality instead of bit-mask - thanks to Tetsuya Mishima for reporting it cmr=v1.7.5:reviewer=ompi-gk1.7 This commit was SVN r30722.	2014-02-14 02:34:42 +00:00
Ralph Castain	1473dde6ea	Okay, once again be caught by the blasted hwloc inability to cleanly handle caches. Protect the calls to get_depth by first checking to see if it is a "cache", then use a cache-specific function to get the stupid data. Very, very irritating. cmr=v1.7.5:reviewer=jsquyres:subject=treat caches as something different yet again This commit was SVN r30693.	2014-02-12 01:45:06 +00:00
Ralph Castain	b566cd5e30	Protect against no modifiers Refs trac:4117 This commit was SVN r30672. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-11 17:34:37 +00:00
Ralph Castain	6fa34407bf	Handle modifiers to the --map-by dist option Refs trac:4117 This commit was SVN r30671. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-11 17:19:05 +00:00
Ralph Castain	4781ea71b6	Correct the handling of various map/bind combinations when pe=N is given. Thanks to Elena Elkina for reporting it. Refs trac:4117 This commit was SVN r30663. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-11 03:05:26 +00:00
Ralph Castain	707e51d786	Check for --cpus-per-proc earlier, before the correct option can be processed. Thanks to Tetsuya Mishima for reporting it. Refs trac:4117 This commit was SVN r30662. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-11 02:53:53 +00:00
Ralph Castain	d66d2f5fb3	It is just fine to map by node or slot and bind, so ensure the switch statement includes those options. Thanks to Tetsuya Mishima for pointing it out. Refs trac:4240 This commit was SVN r30661. The following Trac tickets were found above: Ticket 4240 --> https://svn.open-mpi.org/trac/ompi/ticket/4240	2014-02-11 02:52:01 +00:00
Ralph Castain	1a12325094	Rats - need to include bydist in the mapping list Refs trac:4117 This commit was SVN r30649. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-09 16:17:05 +00:00
Ralph Castain	ca0c806662	Resolve the problem of binding in inverted topologies - check the relative depth of the map and bind objects in the topology, and let that determine whether we bind downward or upwards. cmr=v1.7.5:reviewer=jsquyres:subject=Resolve the problem of binding in inverted topologies This commit was SVN r30643.	2014-02-09 05:30:17 +00:00
Ralph Castain	bc7cc09749	After a lot of pain, I've managed to resolve the problem of conflicting mapping directives caused by mismatched MCA params - i.e., where someone has one variant of an MCA param (e.g., rmaps_base_mapping_policy) in their default MCA param file, and then specifies another variant (e.g., --npernode) on the command line. I can't fully resolve the problem as there is no way to know precisely what the user meant - we can only guess which param was really intended since the MCA param system can't apply its normal precedence rules. So...print a big "deprecated" warning for the old params and error out if a conflict is detected. I know that isn't what people really wanted, but it's the best we can do. If only the old style param is given, then process it after the warning. Extend the current map-by param to add support for ppr and cpus-per-proc, adding the latter to the list of allowed modifiers using "pe=n" for processing elements/proc. Thus, you can map-by socket:pe=2,oversubscribe to map by socket, binding 2 processing elements/process, with oversubscription allowed. Or you can map-by ppr:2:socket:pe=4 to map two processes to every socket in the allocation, binding each process to 4 processing elements. For those wondering, a processing element is defined as a hwthread if --use-hwthreads-as-cpus is given, or else as a core. Refs trac:4117 This commit was SVN r30620. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-07 21:25:40 +00:00
Ralph Castain	e43589ed84	Fix warning - thanks to Paul Hargrove for reporting it cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r30548.	2014-02-03 23:51:45 +00:00
Ralph Castain	410a3afa7b	Fix --without-hwloc operations - must default to map-by slot in that scenario cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30474.	2014-01-29 16:54:05 +00:00
Ralph Castain	42eb0bbe1b	Fix --without-hwloc builds cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30462.	2014-01-28 17:10:32 +00:00
Ralph Castain	84a0ab3a75	Ah @$#!$#% - missed one last help message that needs to be corrected. cmr=v1.7.4:reviewer=jsquyres:subject=correct help message This commit was SVN r30449.	2014-01-28 04:03:24 +00:00
Ralph Castain	941bfd4604	Final cleanup of cpus-per-proc for 1.7.4 - provide better checking for cpus-per-proc and mismatched mapping/binding directives, and provide error messages telling the user what to do to get it right. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30438.	2014-01-27 22:40:51 +00:00
Jeff Squyres	7768828d2d	Addendum to r30298: tweak the wording of the help messages a bit. Refs trac:4117. Please use this commit rather than the patch attached to the ticket; the patch had a few mistakes in the tweaked wording. This commit was SVN r30362. The following SVN revision numbers were found above: r30298 --> open-mpi/ompi@58479399c3 The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-01-22 12:17:14 +00:00
Ralph Castain	58479399c3	As per RFC and telecon, deprecate cmd line options and their corresponding MCA params for old-style mapping and binding directives cmr=v1.7.5:reviewer=jsquyres:subject=deprecate old-style mapping and binding directives This commit was SVN r30298.	2014-01-15 14:48:39 +00:00
Ralph Castain	fb9e427320	One last corner case - when encountering an overload condition (e.g., by comm_spawning more procs than we have cores) and we are using the default binding policy, do not bind the new procs to anything as this can cause major problems. Instead, let the spawn succeed since the user didn't specifically ask to be bound, and leave the new procs as unbound. Refs trac:4077 This commit was SVN r30200. The following Trac tickets were found above: Ticket 4077 --> https://svn.open-mpi.org/trac/ompi/ticket/4077	2014-01-09 22:39:34 +00:00
Ralph Castain	24e990e747	Fix comm_spawn for oversubscribed systems by correctly computing the number of available slots cmr=v1.7.4:reviewer=jsquyres:subject=Fix comm_spawn for oversubscribed systems This commit was SVN r30197.	2014-01-09 20:33:48 +00:00
Ralph Castain	9fcb46d85a	Correctly detect and handle oversubscription for comm_spawn cmr=v1.7.4:reviewer=jsquyres:subject=Correctly detect and handle oversubscription for comm_spawn This commit was SVN r30186.	2014-01-09 18:27:51 +00:00
Ralph Castain	6e5fedeb04	Oops - add verbose output to inform that cannot default bind due to no cores detected Refs trac:4074 This commit was SVN r30185. The following Trac tickets were found above: Ticket 4074 --> https://svn.open-mpi.org/trac/ompi/ticket/4074	2014-01-09 18:17:14 +00:00
Ralph Castain	7e4748a0f1	Handle the case of nodes that do not report cores, and thus our default binding policy will fail even though binding is supported by defaulting to not binding on those nodes. Thanks to Paul Hargrove for reporting the problem on NetBSD. cmr=v1.7.4:reviewer=jsquyres:subject=Handle the case of nodes that do not report cores This commit was SVN r30180.	2014-01-09 16:27:58 +00:00
Ralph Castain	bf453a2575	Reference the correct variable...sigh Refs trac:4059 This commit was SVN r30163. The following Trac tickets were found above: Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059	2014-01-08 22:36:39 +00:00
Ralph Castain	e724d0d12d	Ensure comm_spawn'd jobs get treated the same wrt setting default mapping directives Refs trac:4059 This commit was SVN r30158. The following Trac tickets were found above: Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059	2014-01-08 15:16:22 +00:00
Ralph Castain	fb650aed0c	Fix how we transfer mapping directives to the job, ensuring that directives that can be given outside of a mapping policy (e.g., oversubscribe and no-use-local) are retained. cmr=v1.7.4:reviewer=jsquyres:subject=Fix how we transfer mapping directives to the job This commit was SVN r30155.	2014-01-08 04:25:43 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Mike Dubman	40aadab85f	re-enable map-by dist after last refactoring in rmaps, map-by dist:hca was disabled. reverting it back found/fixed by Elena, reviewed by miked cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r30118.	2014-01-04 20:44:41 +00:00
Ralph Castain	d5a5caa7e0	Restore the bycore mpirun option for backward compatibility Refs trac:4044 cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30103. The following Trac tickets were found above: Ticket 4044 --> https://svn.open-mpi.org/trac/ompi/ticket/4044	2014-01-02 04:16:43 +00:00
George Bosilca	38cbaeaa82	Try to impose a little bit of consistency on how we parse lists of modules by enforcing the use of OPAL list accessors. This commit was SVN r30045.	2013-12-21 23:23:33 +00:00
Ralph Castain	31248c0985	Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match. Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node. Refs trac:4003 This commit was SVN r30033. The following Trac tickets were found above: Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003	2013-12-20 20:42:39 +00:00
Ralph Castain	55cd65b149	Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along. Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff. cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings This commit was SVN r29978.	2013-12-19 16:31:45 +00:00
Ralph Castain	c5956e7b8c	Convert debug output to opal_output_verbose Thanks to Tetsuya Mishima for reporting it cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29969.	2013-12-19 00:36:15 +00:00
Ralph Castain	ab4636c47b	Per email on devel list, change the default rank-by to slot unless map-by <obj> is specified, in which case use rank-by <obj> Refs trac:3977 This commit was SVN r29945. The following Trac tickets were found above: Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977	2013-12-18 00:48:50 +00:00
Ralph Castain	53cd00fe16	By setting a default mapping/ranking/binding policy that wasn't "none", we introduced a problem for users of the Mac and any other machine where sockets aren't defined and/or binding is not supported. Fix that by checking to see if the user specified the failing policy - if not, then fall back to the old map/rank by slot and no binding. Refs trac:3977 This commit was SVN r29933. The following Trac tickets were found above: Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977	2013-12-17 14:50:10 +00:00
Ralph Castain	8b6d117541	Per the OMPI devel conference that changed our default behaviors: * default to bind-to core * map-by slot if np=2 * map-by socket (balance across sockets on each node) if np > 2 * map-by <obj> will imply rank-by <obj> by default (leave default binding as above) Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values This commit was SVN r29919.	2013-12-15 17:25:54 +00:00
Ralph Castain	7480beb7f0	Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it. This isn't being used yet - just enabling Nathan to do what he needs. *** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *** This commit was SVN r29708.	2013-11-14 17:01:43 +00:00
Ralph Castain	e35ad23176	Correctly compute usage for dynamic spawns when binding is invoked. Ensure we correctly account for existing process usage on each node when computing bindings during dynamic spawns. cmr=v1.7.4:reviewer=hjelmn:subject=Correctly compute usage for dynamic spawns when binding is invoked This commit was SVN r29649.	2013-11-10 00:38:01 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
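
The last known limitation listed for r29058 says the RML needs per-peer sequence numbers so that the receiving side can deliver messages in order even when they travel over different transports or are resent after a NIC failure. The following is a minimal Python sketch of that idea only, not ORTE code: the class name `OrderedReceiver`, its `receive` method, and the choice to start numbering at 0 are all assumptions made for illustration.

```python
from collections import defaultdict


class OrderedReceiver:
    """Deliver messages from each peer in sequence order (hypothetical helper)."""

    def __init__(self):
        # Next sequence number expected from each peer; numbering starts at 0 (assumption).
        self._expected = defaultdict(int)
        # Messages that arrived early, keyed by peer and then by sequence number.
        self._pending = defaultdict(dict)

    def receive(self, peer, seq, payload):
        """Accept one message and return the payloads that are now deliverable, in order."""
        if seq < self._expected[peer] or seq in self._pending[peer]:
            # Already delivered or already buffered, e.g. a resend after a NIC failure.
            return []
        self._pending[peer][seq] = payload
        ready = []
        # Drain every consecutive message starting at the expected sequence number.
        while self._expected[peer] in self._pending[peer]:
            ready.append(self._pending[peer].pop(self._expected[peer]))
            self._expected[peer] += 1
        return ready
```

A message that arrives ahead of its predecessor is held back until the gap is filled, and each peer is tracked independently, so a slow transport to one peer does not stall delivery from another.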


309 commits
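
Commit bc7cc09749 (r30620) describes two forms of the extended map-by value: `socket:pe=2,oversubscribe` and `ppr:2:socket:pe=4`. As a reading aid, here is a small Python sketch of how such a string breaks down into its parts. It is not the Open MPI parser: the function name `parse_map_by` and the returned dictionary layout are hypothetical, and only the two forms quoted in that commit message are covered.

```python
def parse_map_by(spec):
    """Split a map-by value into policy, ppr count, pe count, and other modifiers.

    Illustrative only: handles "<object>[:modifiers]" and "ppr:<N>:<object>[:modifiers]",
    the two forms quoted in r30620. The result layout is an assumption.
    """
    fields = spec.split(":")
    result = {"policy": None, "ppr": None, "pe": None, "modifiers": []}
    if fields[0] == "ppr":
        # ppr:<N>:<object> places N processes on every <object> in the allocation.
        if len(fields) < 3:
            raise ValueError(f"ppr needs a count and an object: {spec!r}")
        result["ppr"] = int(fields[1])
        result["policy"] = fields[2]
        rest = fields[3:]
    else:
        result["policy"] = fields[0]
        rest = fields[1:]
    if len(rest) > 1:
        raise ValueError(f"unexpected extra field in {spec!r}")
    # Modifiers are comma-separated; pe=<n> is the processing-elements-per-process count.
    for mod in (rest[0].split(",") if rest else []):
        if mod.startswith("pe="):
            result["pe"] = int(mod[3:])
        elif mod:
            result["modifiers"].append(mod)
    return result
```

For `socket:pe=2,oversubscribe` this yields policy `socket`, `pe` of 2, and the `oversubscribe` modifier. Per the commit message, a processing element is a hwthread if --use-hwthreads-as-cpus is given, or else a core.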