openmpi

Author	SHA1	Message	Date
Jeff Squyres	398ae15533	rmaps_base_frame: remove dead code This was CID 1196641	2015-02-24 15:24:11 -05:00
Howard Pritchard	bf89131f9e	add owner files to opal/ompi/orte mca directories This commit adds an owner file in each of the component directories for each framework. This allows for a simple script to parse the contents of the files and generate, among other things, tables to be used on the project's wiki page. Currently there are two "fields" in the file, an owner and a status. A tool to parse the files and generate tables for the wiki page will be added in a subsequent commit.	2015-02-22 15:10:23 -07:00
Ralph Castain	116fcaff2c	Start adding support for cmd line options to orte-submit	2015-02-10 12:13:21 -08:00
Ralph Castain	b314bfb5e9	If someone specifies the bitmap for hwthreads and wants hwthread cpus, then don't parse the slot list as it expects cores - just copy the provided bitmap across as it already has the required info	2014-12-19 10:56:14 -08:00
Ralph Castain	0630680f36	Two cleanups required for transfer to 1.8.4: * Use %d format for the topo signature as some systems apparently have problems with %u * Use correct variable in show_help message	2014-12-12 17:23:32 -08:00
Ralph Castain	b757b3f452	Ensure that the #nodes in the job map gets properly updated when using the sequential mapper. Provide some further diagnostic info to help understand the problem when encountered.	2014-12-08 08:03:53 -08:00
Ralph Castain	cb15cc06e1	Minor changes per Jeff's request on PR for 1.8.4	2014-12-02 19:54:10 -08:00
Ralph Castain	3f9d9ae8b6	Provide tighter LSF integration by correctly handling scenarios where the user has asked LSF to assign bindings. Fix a couple of typos in lex parser definitions. Tell hostfile parser to ignore binding designations in hostfiles. Add an attribute to indicate that cpusets were provided as physical cpu ids. Once validated, a version of this will be backported to the v1.8.4 release.	2014-11-30 11:50:31 -08:00
Ralph Castain	2a90788724	Support physical processor ids in rankfile	2014-11-10 14:00:40 -08:00
Ralph Castain	ea11e63f59	Per patch from Tetsuya, allow the user to bind-to none when specifying multiple pe's/rank as requested by Reuti. This allows the user to reserve multiple "slots" in the allocation for each process while mapping, but not to bind the process to specific processing elements on the node. Reviewed by rhc, so RM-approved to go across to v1.8.3 cmr=v1.8.3:reviewer=ompi-gk1.8 This commit was SVN r32701.	2014-09-10 15:52:18 +00:00
Ralph Castain	4207b4c4ad	Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32687.	2014-09-08 21:03:51 +00:00
Ralph Castain	842aaf6167	Correctly end mapping oversubscribed nodes round-robin byslot cmr=v1.8.3:reviewer=rhc This commit was SVN r32616.	2014-08-27 16:15:18 +00:00
Ralph Castain	024572cb6c	Sigh - I promised to remove these deprecation warnings back in June. My apologies to Dave Goodell and others who requested it. cmr=v1.8.2:reviewer=dgoodell:subject=remove deprecation warnings for pernode, npernode, and npersocket This commit was SVN r32552.	2014-08-19 19:40:20 +00:00
Gilles Gouaillardet	c3c364a262	check-help-strings cleanup This commit was SVN r32494.	2014-08-11 03:22:05 +00:00
George Bosilca	daa076995a	orte_rmaps_numa_node_t -> opal_rmaps_numa_node_t This commit was SVN r32380.	2014-07-31 19:58:47 +00:00
Ralph Castain	5bb5b22573	When a user asks for cpus/rank > 1 and only has one slot, we need to ensure we always map at least one process when they don't tell us -np cmr=v1.8.2:reviewer=rhc:subject=correct num_procs in corner case This commit was SVN r32142.	2014-07-04 17:00:35 +00:00
Ralph Castain	149810f02c	Per request from Jeff, slightly modify the show_help message as the precise name of the NUMA-containing packages differs based on OS and distro cmr=v1.8.2:reviewer=jsquyres:subject=modify show_help message This commit was SVN r32122.	2014-07-02 14:46:00 +00:00
Ralph Castain	8fca77c3d3	Protect the binding policy setting so it builds when --without-hwloc Refs trac:4742 This commit was SVN r32085. The following Trac tickets were found above: Ticket 4742 --> https://svn.open-mpi.org/trac/ompi/ticket/4742	2014-06-25 18:13:54 +00:00
Ralph Castain	5f6be06b54	Per request from Gilles and discussion at devel conference, have the --oversubscribe option automatically set both oversubscribe and overload-allowed properties as this is likely what the user intended. cmr=v1.8.2:reviewer=rhc:subject=automatically set oversub/load This commit was SVN r32072.	2014-06-24 18:11:39 +00:00
Ralph Castain	645df5e823	Don't release the node_name field as it gets used in the slots parsing - will be released at newline detection This commit was SVN r32058.	2014-06-20 13:18:46 +00:00
Ralph Castain	9a47e45a09	<laugh> ensure we really compare the things we want to compare This commit was SVN r32055.	2014-06-19 20:54:25 +00:00
Ralph Castain	e65538e91b	Add some defensive programming, fix a typo This commit was SVN r32054.	2014-06-19 20:52:13 +00:00
Ralph Castain	b43f760f93	If you don't specify all the rank-file mapping for all procs, then you'll segfault - which is probably a bad idea. I can't see an easy workaround, so just error out for now and let's see if anyone really cares. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r32053.	2014-06-19 20:30:06 +00:00
Ralph Castain	65275d6326	Add a little more info to the warning message - i.e., that the likely cause of the problem is missing libnumactl and/or libnumactl-devel cmr=v1.8.2:reviewer=miked:subject=improve memory binding failure message This commit was SVN r32030.	2014-06-18 19:20:28 +00:00
Ralph Castain	3f04d50cb0	Per the ticket, resolve our handling of overload conditions to provide a more consistent response. If we are overloaded (i.e., attempting to bind more processes to a location than the number of cpus under that location), then we consider the following conditions: (a) default binding policy is in effect. In this case, we will emit a warning and default to not binding unless the user provided the "oversubscribe" or "overload" modifier to the "bind-to" option. (b) user-specified binding policy is in effect. In this case, we will error out unless the user provided the "oversubscribe" or "overload" modifier to the "bind-to" option as we cannot meet the directive. Either "bind-to" modifier (oversubscribe or overload) will be accepted for now - in 1.9, we will deprecate the "overload" term in favor of "oversubscribe". Also added the ability to accept a --bind-to modifier without specifying the binding policy itself so a user can specify overload-allowed with the default policy. Closes trac:4345 cmr=v1.8.2:reviewer=rhc:subject=resolve handling of overload conditions This commit was SVN r32005. The following Trac tickets were found above: Ticket 4345 --> https://svn.open-mpi.org/trac/ompi/ticket/4345	2014-06-14 15:38:32 +00:00
Ralph Castain	56c3575c0e	Can't emit an error for an unrecognized mapping policy modifier as the ppr policy relies on not doing so. This commit was SVN r31998.	2014-06-13 20:10:09 +00:00
Ralph Castain	3ed282bf44	Per patch from Tetsuya, correct the cpus-per-proc logic so we correctly detect when the user is attempting to bind too low for that option Refs trac:4702 This commit was SVN r31988. The following Trac tickets were found above: Ticket 4702 --> https://svn.open-mpi.org/trac/ompi/ticket/4702	2014-06-13 16:32:52 +00:00
Ralph Castain	06dbfa3098	Make the cpus-per-proc equivalent a little more intuitive: * allow users to specify just a modifier for map-by instead of requiring that they also specify a policy. Thus, we now accept --map-by :pe=3 as indicating that we should use the default mapping policy, but bind 3 cpus/proc. * if users specify a pe's/proc but no policy, default to --map-by NUMA to ensure we have access to multiple cpus for the request. This won't guarantee we have access to enough to meet the request, but gives us a chance. In addition, we know that binding a proc to multiple cpus will work best if those cpus are all in the same NUMA, so this provides some degree of optimized behavior. Per a request from Jeff, define "oversubscribe" for binding as a synonym for the "overload" modifier. cmr=v1.8.2:reviewer=rhc This commit was SVN r31967.	2014-06-08 20:26:59 +00:00
Ralph Castain	638c24f655	Correct the bind-in-place algorithm to better handle comm_spawn. If the location identified by the mapper is already occupied by procs from another job, then we need to shift either right or left until we find an unoccupied location where we can be bound. If nothing is available, then check for the overload flag (and bind us in the original location if provided), or see if this was the default binding policy instead of one specified by the user - if so, then just don't bind this process. cmr=v1.8.2:reviewer=rhc This commit was SVN r31959.	2014-06-06 12:36:14 +00:00
Ralph Castain	f1978fba7c	Cleanup a set of typos on the orte_get_attribute call This commit was SVN r31942.	2014-06-03 20:36:38 +00:00
Ralph Castain	8736a1c138	Per RFC: http://www.open-mpi.org/community/lists/devel/2014/05/14822.php Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root). This commit was SVN r31916.	2014-06-01 16:14:10 +00:00
Ralph Castain	f55c587a74	Per patch from Tetsuya Mishima, ensure the rank_file mapper accurately tracks number of nodes in the map Refs trac:4594 This commit was SVN r31725. The following Trac tickets were found above: Ticket 4594 --> https://svn.open-mpi.org/trac/ompi/ticket/4594	2014-05-13 14:36:25 +00:00
Ralph Castain	5602156a1c	Use the correct abstraction layer name for the data dirs This commit was SVN r31684.	2014-05-08 14:32:24 +00:00
Ralph Castain	11faab1091	The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees. This commit was SVN r31679.	2014-05-08 02:01:35 +00:00
Ralph Castain	6545e6e9a8	Add one more check for failed mapping that rarely occurs, but results in a hang when it does cmr=v1.8.2:reviewer=rhc This commit was SVN r31598.	2014-05-02 10:35:14 +00:00
Ralph Castain	61d94fcee2	Fix the sequential mapper - it was out-of-sync with the hostfile changes, and we missed the "seq" policy when parsing the --map-by option. Thanks to Bill Chen for reporting it cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31333.	2014-04-08 03:38:25 +00:00
Jeff Squyres	82e104719a	hwloc/rmaps base: Add missing help message. Also, add missing ORTE_ERROR_LOG in the other case where this error message is used (i.e., ORTE_ERROR_LOG was used in the one place, so let's also use it in the other place). This commit was SVN r31321.	2014-04-07 15:39:54 +00:00
Ralph Castain	3fdcaeab97	Fix a problem where we need to abort due to a mapping failure, but we are in a managed environment and thus the orteds have not wired up. Thus, if we send the exit message across the routed network, the remote daemons won't have a way to relay the message along - and we won't exit. If we are aborting, then set the flags so the HNP directly sends an exit command to each daemon. Make it the halt_vm command so the remote daemon doesn't try to relay it, but instead just exits without waiting for its routed children to exit first. cmr=v1.8.1:reviewer=jsquyres:subject=fix hangs due to abort prior to daemon wireup This commit was SVN r31304.	2014-04-02 04:17:55 +00:00
Ralph Castain	714cb8f573	Silence warnings cmr=v1.8:reviewer=rhc This commit was SVN r31248.	2014-03-27 14:16:54 +00:00
Ralph Castain	390645ac2a	Per patch from Tetsuya Mishima, do a nicer job of warning the user that we need to map to a higher level to get the number of requested cpus/rank. Also, change the mapping policy to "byslot" when falling back to that option. cmr=v1.8:reviewer=rhc This commit was SVN r31196.	2014-03-24 15:47:29 +00:00
Ralph Castain	081669b440	When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings This commit was SVN r30968.	2014-03-10 15:53:07 +00:00
Ralph Castain	fc2dd6ac48	Per Jeff's request, add a more detailed comment as to why we are turning off the warning at this time. Refs trac:4339 This commit was SVN r30948. The following Trac tickets were found above: Ticket 4339 --> https://svn.open-mpi.org/trac/ompi/ticket/4339	2014-03-06 02:17:25 +00:00
Ralph Castain	a2b539c763	Per the telecon, silence the warning for 1.7.5 to give us time to consider a better permanent solution Refs trac:4339 This commit was SVN r30941. The following Trac tickets were found above: Ticket 4339 --> https://svn.open-mpi.org/trac/ompi/ticket/4339	2014-03-05 03:02:29 +00:00
Ralph Castain	50c30d62ca	Repair builds without hwloc cmr=v1.7.5:reviewer=jsquyres This commit was SVN r30940.	2014-03-05 02:48:15 +00:00
Ralph Castain	0ac97761cc	Now that we are binding by default, the issue of #slots and what to do when oversubscribed has become a bit more complicated. This isn't a problem in managed environments as we are always provided an accurate assignment for the #slots, or when -host is used to define the allocation since we automatically assume one slot for every time a node is named. The problem arises when a hostfile is used, and the user provides host names without specifying the slots= parameter. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default. In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information. Also clean up a couple of issues in the mapping/binding system: * ensure we only override the binding directive if we are oversubscribed and overload is not allowed * ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch * minor cleanup to the warning message when oversubscribed and binding was requested cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system This commit was SVN r30909.	2014-03-03 16:46:37 +00:00
Ralph Castain	88b0e0cc6d	Allow the user to turn off the oversubscribed-binding warning if overload-allowed has been provided Refs trac:4317 This commit was SVN r30892. The following Trac tickets were found above: Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317	2014-02-28 17:55:53 +00:00
Ralph Castain	4a645f0342	Add detection of oversubscription with binding requested - if binding requested to core or hwt, warn and do not bind or else we will hurt performance. Also, if no binding directive was given, turn off the default binding Refs trac:4317 This commit was SVN r30888. The following Trac tickets were found above: Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317	2014-02-28 16:08:52 +00:00
Ralph Castain	8500247c7b	Fix the by-obj mapper in the case where slots are not specified, and so we are in a perpetual oversubscribed state cmr=v1.7.5:reviewer=rhc This commit was SVN r30887.	2014-02-28 05:21:46 +00:00
Ralph Castain	a4c3d0a5a0	Add some more debug to the by-obj mapper This commit was SVN r30884.	2014-02-28 02:52:53 +00:00
Ralph Castain	d109c523b9	Per patch from Tetsuya Mishima, complete the overhaul of the round-robin mappers Refs trac:4296 This commit was SVN r30861. The following Trac tickets were found above: Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296	2014-02-27 00:43:53 +00:00
Ralph Castain	61a21e4f31	Based on Tetsuya's patch, with some changes, correct the case of map-by node where multiple cpus/rank are requested and result in a non-integer match with num slots. Also correct tests for binding policy given to use the proper macro. Refs trac:4296 This commit was SVN r30857. The following Trac tickets were found above: Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296	2014-02-26 18:12:23 +00:00
Ralph Castain	b880aa46bd	Update the map-by obj and map-by obj:span mappers to correct for errors in computing carryover across the nodes. Be a little less complex in the algorithm so it is easier to follow and debug. Refs trac:4296 This commit was SVN r30826. The following Trac tickets were found above: Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296	2014-02-25 23:32:43 +00:00
Ralph Castain	c8112c1086	Loadbalancing across nodes (i.e., map-by node) wasn't working correctly - the algorithm relied on the nodes being defined in descending order of slots, or the number of slots remaining to be assigned being only one/node. Regardless, it didn't work for the case where nodes were defined in ascending order of slots. Tetsuya's proposed patch didn't solve the problem for me, but it did correct the case where cpus/proc > 1. The final patch requires that we loop over the assignment algo until all procs are assigned or all nodes are filled - any remaining procs are then handled in the cleanup loop. cmr=v1.7.5:reviewer=rhc:subject=fix map-by node for different cases This commit was SVN r30798.	2014-02-22 16:39:41 +00:00
Mike Dubman	8d4592a94b	rmaps/mindist: better error message better error message when there is only one socket available fixed by Elena, reviewed by Miked cmr=v1.7.5:reviewer=ompi-rm1.7 This commit was SVN r30787.	2014-02-21 11:38:35 +00:00
Ralph Castain	91f90058ce	Add missing options and cleanup the code a bit. Default to by-slot ranking if a non-hardware option isn't given. Thanks to Tetsuya Mishima for the assist. cmr=v1.7.5:reviewer=ompi-gk1.7 This commit was SVN r30725.	2014-02-14 10:23:16 +00:00
Ralph Castain	fd9b301a8b	Check equality instead of bit-mask - thanks to Tetsuya Mishima for reporting it cmr=v1.7.5:reviewer=ompi-gk1.7 This commit was SVN r30722.	2014-02-14 02:34:42 +00:00
Ralph Castain	1473dde6ea	Okay, once again be caught by the blasted hwloc inability to cleanly handle caches. Protect the calls to get_depth by first checking to see if it is a "cache", then use a cache-specific function to get the stupid data. Very, very irritating. cmr=v1.7.5:reviewer=jsquyres:subject=treat caches as something different yet again This commit was SVN r30693.	2014-02-12 01:45:06 +00:00
Ralph Castain	b566cd5e30	Protect against no modifiers Refs trac:4117 This commit was SVN r30672. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-11 17:34:37 +00:00
Ralph Castain	6fa34407bf	Handle modifiers to the --map-by dist option Refs trac:4117 This commit was SVN r30671. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-11 17:19:05 +00:00
Ralph Castain	4781ea71b6	Correct the handling of various map/bind combinations when pe=N is given. Thanks to Elena Elkina for reporting it. Refs trac:4117 This commit was SVN r30663. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-11 03:05:26 +00:00
Ralph Castain	707e51d786	Check for --cpus-per-proc earlier, before the correct option can be processed. Thanks to Tetsuya Mishima for reporting it. Refs trac:4117 This commit was SVN r30662. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-11 02:53:53 +00:00
Ralph Castain	d66d2f5fb3	It is just fine to map by node or slot and bind, so ensure the switch statement includes those options. Thanks to Tetsuya Mishima for pointing it out. Refs trac:4240 This commit was SVN r30661. The following Trac tickets were found above: Ticket 4240 --> https://svn.open-mpi.org/trac/ompi/ticket/4240	2014-02-11 02:52:01 +00:00
Ralph Castain	1a12325094	Rats - need to include bydist in the mapping list Refs trac:4117 This commit was SVN r30649. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-09 16:17:05 +00:00
Ralph Castain	ca0c806662	Resolve the problem of binding in inverted topologies - check the relative depth of the map and bind objects in the topology, and let that determine whether we bind downward or upwards. cmr=v1.7.5:reviewer=jsquyres:subject=Resolve the problem of binding in inverted topologies This commit was SVN r30643.	2014-02-09 05:30:17 +00:00
Ralph Castain	bc7cc09749	After a lot of pain, I've managed to resolve the problem of conflicting mapping directives caused by mismatched MCA params - i.e., where someone has one variant of an MCA param (e.g., rmaps_base_mapping_policy) in their default MCA param file, and then specifies another variant (e.g., --npernode) on the command line. I can't fully resolve the problem as there is no way to know precisely what the user meant - we can only guess which param was really intended since the MCA param system can't apply its normal precedence rules. So...print a big "deprecated" warning for the old params and error out if a conflict is detected. I know that isn't what people really wanted, but it's the best we can do. If only the old style param is given, then process it after the warning. Extend the current map-by param to add support for ppr and cpus-per-proc, adding the latter to the list of allowed modifiers using "pe=n" for processing elements/proc. Thus, you can map-by socket:pe=2,oversubscribe to map by socket, binding 2 processing elements/process, with oversubscription allowed. Or you can map-by ppr:2:socket:pe=4 to map two processes to every socket in the allocation, binding each process to 4 processing elements. For those wondering, a processing element is defined as a hwthread if --use-hwthreads-as-cpus is given, or else as a core. Refs trac:4117 This commit was SVN r30620. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-07 21:25:40 +00:00
Ralph Castain	e43589ed84	Fix warning - thanks to Paul Hargrove for reporting it cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r30548.	2014-02-03 23:51:45 +00:00
Ralph Castain	410a3afa7b	Fix --without-hwloc operations - must default to map-by slot in that scenario cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30474.	2014-01-29 16:54:05 +00:00
Ralph Castain	42eb0bbe1b	Fix --without-hwloc builds cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30462.	2014-01-28 17:10:32 +00:00
Ralph Castain	84a0ab3a75	Ah @$#!$#% - missed one last help message that needs to be corrected. cmr=v1.7.4:reviewer=jsquyres:subject=correct help message This commit was SVN r30449.	2014-01-28 04:03:24 +00:00
Ralph Castain	941bfd4604	Final cleanup of cpus-per-proc for 1.7.4 - provide better checking for cpus-per-proc and mismatched mapping/binding directives, and provide error messages telling the user what to do to get it right. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30438.	2014-01-27 22:40:51 +00:00
Ralph Castain	886fee9367	Properly set num_procs when np is not given, but cpus-per-proc is used. Thanks to Tetsuya Mishima for pointing it out cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30389.	2014-01-23 05:01:07 +00:00
Jeff Squyres	7768828d2d	Addendum to r30298: tweak the wording of the help messages a bit. Refs trac:4117. Please use this commit rather than the patch attached to the ticket; the patch had a few mistakes in the tweaked wording. This commit was SVN r30362. The following SVN revision numbers were found above: r30298 --> open-mpi/ompi@58479399c3 The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-01-22 12:17:14 +00:00
Ralph Castain	58479399c3	As per RFC and telecon, deprecate cmd line options and their corresponding MCA params for old-style mapping and binding directives cmr=v1.7.5:reviewer=jsquyres:subject=deprecate old-style mapping and binding directives This commit was SVN r30298.	2014-01-15 14:48:39 +00:00
Ralph Castain	fb9e427320	One last corner case - when encountering an overload condition (e.g., by comm_spawning more procs than we have cores) and we are using the default binding policy, do not bind the new procs to anything as this can cause major problems. Instead, let the spawn succeed since the user didn't specifically ask to be bound, and leave the new procs as unbound. Refs trac:4077 This commit was SVN r30200. The following Trac tickets were found above: Ticket 4077 --> https://svn.open-mpi.org/trac/ompi/ticket/4077	2014-01-09 22:39:34 +00:00
Ralph Castain	24e990e747	Fix comm_spawn for oversubscribed systems by correctly computing the number of available slots cmr=v1.7.4:reviewer=jsquyres:subject=Fix comm_spawn for oversubscribed systems This commit was SVN r30197.	2014-01-09 20:33:48 +00:00
Ralph Castain	9fcb46d85a	Correctly detect and handle oversubscription for comm_spawn cmr=v1.7.4:reviewer=jsquyres:subject=Correctly detect and handle oversubscription for comm_spawn This commit was SVN r30186.	2014-01-09 18:27:51 +00:00
Ralph Castain	6e5fedeb04	Oops - add verbose output to inform that cannot default bind due to no cores detected Refs trac:4074 This commit was SVN r30185. The following Trac tickets were found above: Ticket 4074 --> https://svn.open-mpi.org/trac/ompi/ticket/4074	2014-01-09 18:17:14 +00:00
Ralph Castain	7e4748a0f1	Handle the case of nodes that do not report cores, where our default binding policy would fail even though binding is supported, by defaulting to not binding on those nodes. Thanks to Paul Hargrove for reporting the problem on NetBSD. cmr=v1.7.4:reviewer=jsquyres:subject=Handle the case of nodes that do not report cores This commit was SVN r30180.	2014-01-09 16:27:58 +00:00
Ralph Castain	bf453a2575	Reference the correct variable...sigh Refs trac:4059 This commit was SVN r30163. The following Trac tickets were found above: Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059	2014-01-08 22:36:39 +00:00
Ralph Castain	e724d0d12d	Ensure comm_spawn'd jobs get treated the same wrt setting default mapping directives Refs trac:4059 This commit was SVN r30158. The following Trac tickets were found above: Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059	2014-01-08 15:16:22 +00:00
Ralph Castain	fb650aed0c	Fix how we transfer mapping directives to the job, ensuring that directives that can be given outside of a mapping policy (e.g., oversubscribe and no-use-local) are retained. cmr=v1.7.4:reviewer=jsquyres:subject=Fix how we transfer mapping directives to the job This commit was SVN r30155.	2014-01-08 04:25:43 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Mike Dubman	40aadab85f	re-enable map-by dist after last refactoring in rmaps, map-by dist:hca was disabled. reverting it back found/fixed by Elena, reviewed by miked cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r30118.	2014-01-04 20:44:41 +00:00
Ralph Castain	d5a5caa7e0	Restore the bycore mpirun option for backward compatibility Refs trac:4044 cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30103. The following Trac tickets were found above: Ticket 4044 --> https://svn.open-mpi.org/trac/ompi/ticket/4044	2014-01-02 04:16:43 +00:00
George Bosilca	38cbaeaa82	Try to impose a little bit of consistency on how we parse lists of modules by enforcing the use of OPAL list accessors. This commit was SVN r30045.	2013-12-21 23:23:33 +00:00
Ralph Castain	31248c0985	Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match. Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node. Refs trac:4003 This commit was SVN r30033. The following Trac tickets were found above: Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003	2013-12-20 20:42:39 +00:00
Ralph Castain	55cd65b149	Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along. Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff. cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings This commit was SVN r29978.	2013-12-19 16:31:45 +00:00
Ralph Castain	c5956e7b8c	Convert debug output to opal_output_verbose Thanks to Tetsuya Mishima for reporting it cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29969.	2013-12-19 00:36:15 +00:00
Ralph Castain	ab4636c47b	Per email on devel list, change the default rank-by to slot unless map-by <obj> is specified, in which case use rank-by <obj> Refs trac:3977 This commit was SVN r29945. The following Trac tickets were found above: Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977	2013-12-18 00:48:50 +00:00
Ralph Castain	53cd00fe16	By setting a default mapping/ranking/binding policy that wasn't "none", we introduced a problem for users of the Mac and any other machine where sockets aren't defined and/or binding is not supported. Fix that by checking to see if the user specified the failing policy - if not, then fall back to the old map/rank by slot and no binding. Refs trac:3977 This commit was SVN r29933. The following Trac tickets were found above: Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977	2013-12-17 14:50:10 +00:00
Ralph Castain	8b6d117541	Per the OMPI devel conference that changed our default behaviors: * default to bind-to core * map-by slot if np=2 * map-by socket (balance across sockets on each node) if np > 2 * map-by <obj> will imply rank-by <obj> by default (leave default binding as above) Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values This commit was SVN r29919.	2013-12-15 17:25:54 +00:00
Jeff Squyres	0ab48ad0d2	Fix some annoying flex warnings that have been there for years. Many thanks to Tom Fogal for the initial patch. cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings This commit was SVN r29904.	2013-12-14 00:36:12 +00:00
Ralph Castain	0e81959aae	Cleanup mindist error messages - already patched in 1.7 This commit was SVN r29869.	2013-12-12 15:30:29 +00:00
Mike Dubman	c208b858e7	improve error messages in mindist cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r29846.	2013-12-09 06:34:38 +00:00
Ralph Castain	f2c49c6c19	Fix the map-by object mapper to handle cpus-per-proc by accounting for the request when computing the number of procs to put on each object. This ensures that the binding routine doesn't automatically overload the cores. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29843.	2013-12-08 16:59:25 +00:00
Ralph Castain	7480beb7f0	Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it. This isn't being used yet - just enabling Nathan to do what he needs. *** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *** This commit was SVN r29708.	2013-11-14 17:01:43 +00:00
Mike Dubman	840e2cb4a2	mindist: cosmetic, use fallback to byslot if unable to read NUMA info, small fix. fixed by Elena, reviewed by Ralph/Mike cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r29679.	2013-11-13 09:26:40 +00:00
Ralph Castain	e35ad23176	Correctly compute usage for dynamic spawns when binding is invoked. Ensure we correctly account for existing process usage on each node when computing bindings during dynamic spawns. cmr=v1.7.4:reviewer=hjelmn:subject=Correctly compute usage for dynamic spawns when binding is invoked This commit was SVN r29649.	2013-11-10 00:38:01 +00:00
Joshua Ladd	d594ffbfc7	Backing out Elena's patch - abstraction violation This commit was SVN r29645.	2013-11-08 13:12:07 +00:00
Joshua Ladd	da3e272fdd	Adds a check in the mindist mapper for whether or not the user asks for a specific device. This patch was submitted by Elena Elkina and reviewed by Josh Ladd and should be added to cmr=v1.7.4:reviewer=jladd This commit was SVN r29644.	2013-11-08 04:28:53 +00:00
Ralph Castain	960a255e7f	Do some cleanup of the --without-hwloc build - no need to work on coprocessors since we can't detect them anyway, cleanup some unused variables in the ppr mapper This commit was SVN r29476.	2013-10-23 01:45:21 +00:00
Jeff Squyres	758cd25fff	Move the MCA / MPI_T level of the LAMA component down to 5 (from 9). This commit was SVN r29214.	2013-09-20 15:23:27 +00:00
Ralph Castain	d9f0505952	Fix the lama verbose outputs so they don't segfault if someone asks for verbose output, but isn't using lama cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29108.	2013-09-03 17:55:35 +00:00
Ralph Castain	2bfa99e945	If a rankfile is given and the number of procs not specified in the mpirun cmd line, then set the number of procs to the number of ranks in the rankfile cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29104.	2013-09-02 15:04:40 +00:00
Ralph Castain	7a7cfdd519	A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so cleanup the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens. cmr:v1.7.3:reviewer=jladd This commit was SVN r29089.	2013-08-29 20:01:06 +00:00
Joshua Ladd	1802aabf1a	Add support for autodetecting an MLNX HCA in the rmaps min distance feature. In this way, .ini files distributed with software stacks need not specify a particular HCA but instead may select the keyword auto, which automatically selects the discovered device. To use this feature, simply pass the keyword auto instead of a specific device name, --mca rmaps_base_dist_hca auto. If more than one card is installed, the mapper will inform the user of this, and the user will then need to specify which card via the normal route, e.g. --mca rmaps_base_dist_hca <dev_name>. This should be added to cmr=v1.7.4:reviewer=rhc:subject=Autodetect logic for min dist mapping This commit was SVN r29079.	2013-08-28 16:23:33 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Ralph Castain	b2d86e1857	Silence uninitialized var warning This commit was SVN r29034.	2013-08-16 21:35:51 +00:00
Ralph Castain	7a21661785	Silence a warning when --without-hwloc is used This commit was SVN r28783.	2013-07-13 17:17:17 +00:00
Dave Goodell	3741d62308	fix --without-hwloc build failure All builds since r28682 configured with '--without-hwloc' fail at "make" time without this fix. Reviewed by rhc@ This commit was SVN r28769. The following SVN revision numbers were found above: r28682 --> open-mpi/ompi@446e33a5d8	2013-07-12 17:21:14 +00:00
Ralph Castain	62378209f0	If we don't find the default hostfile and nothing else was provided, use all the known nodes. cmr:v1.7.3:#3653:reviewer=jsquyres cmr:v1.6.6:#3654:reviewer=jsquyres This commit was SVN r28718.	2013-07-03 22:31:32 +00:00
Ralph Castain	443a6802b9	If the default hostfile is empty, we need to pickup all the known nodes, not just the head node. cmr:v1.7.3:reviewer=jsquyres cmr:v1.6.6:reviewer=jsquyres This commit was SVN r28717.	2013-07-03 22:25:51 +00:00
Ralph Castain	446e33a5d8	There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes. To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it. This commit was SVN r28682.	2013-06-27 03:04:50 +00:00
Ralph Castain	a51a0a8c48	Fix uninitialized var This commit was SVN r28652.	2013-06-18 22:41:47 +00:00
Joshua Ladd	61ffb47573	Minor fix for the min-dist mapping algorithm: we need to call 'get_nbobjs_by_type' first, before we get the sorted list of nodes - we need to add node objects and fill them in the summary object for the current topology. This patch was submitted by Elena Elkina and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd This commit was SVN r28578.	2013-05-31 15:19:59 +00:00
Jeff Squyres	6d173af329	This commit introduces a new "mindist" ORTE RMAPS mapper, as well as some relevant updates/new functionality in the opal/mca/hwloc and orte/mca/rmaps bases. This work was mainly developed by Mellanox, with a bunch of advice from Ralph Castain, and some minor advice from Brice Goglin and Jeff Squyres. Even though this is mainly Mellanox's work, Jeff is committing only for logistical reasons (he holds the hg+svn combo tree, and can therefore commit it directly back to SVN). ----- Implemented a distance-based mapping algorithm as a new "mindist" component in the rmaps framework. It maps processes by NUMA node using PCI locality information as reported by the BIOS, from closest to the device to furthest. To use this algorithm, specify: "mpirun --map-by dist:<device_name>" where <device_name> can be mlx5_0, ib0, etc. Two modes are provided: 1. bynode: load-balance across nodes; 2. byslot: go through slots sequentially (i.e., the first nodes are more heavily loaded). These modes are selected by the optional "span" modifier; the command line looks like: "mpirun --map-by dist:<device_name>,span". For example, if there are 2 nodes, each with 8 cores, and we'd like to run 10 processes, by default the mindist algorithm places 8 processes on the first node and 2 on the second. To place 5 processes on each node instead, add the span modifier to the command line. If a node has two NUMA nodes, each with 4 cores, and we run 6 processes, the mindist algorithm tries to find the NUMA node closest to the specified device; if successful, it places 4 processes on that NUMA node and the remaining two on the next NUMA node. You can also specify the number of cpus per MPI process. In that case, we map as many processes to the closest NUMA node as we can (the number of available processors on that NUMA node divided by the number of cpus per rank) and then continue with the next closest NUMA node. 
The default binding option for this mapping is bind-to-numa, which applies when no binding policy is specified. If you specify a binding level "lower" than NUMA (i.e., hwthread, core, socket), processes are bound at the level you specify. This commit was SVN r28552.	2013-05-22 13:04:40 +00:00
Jeff Squyres	089c632cce	Remove a bunch of dead code: gcc 4.7 warns of set-but-unused variables. So get rid of them. This commit was SVN r28538.	2013-05-17 21:45:49 +00:00
Ralph Castain	e100b8d165	don't need the return value, but should check for error This commit was SVN r28534.	2013-05-16 15:15:02 +00:00
Jeff Squyres	128cc27417	Minor type fix (they're both enums/ints, so the compiler previously silently cast them). This commit was SVN r28532.	2013-05-16 00:47:37 +00:00
Ralph Castain	3a372a65b8	Mapping policies must be tested as equalities as they are values, not bitmasks This commit was SVN r28526.	2013-05-15 13:45:00 +00:00
Ralph Castain	29e4b0cc50	Cannot test equality on mapping directives as it is a bitmask This commit was SVN r28525.	2013-05-15 13:41:49 +00:00
Ralph Castain	5296099ecb	Fix the cpus-per-rank when binding to hwthreads. Add cpus-per-rank to diag printout Thanks to Elena for reporting the problem This commit was SVN r28508.	2013-05-14 20:17:50 +00:00
Ralph Castain	427b6b0b47	Fix the verbosity of yet another framework...sigh. This commit was SVN r28481.	2013-05-13 14:36:32 +00:00
Jeff Squyres	456df1c9f7	Remove redundant opal_output() messages from the module; the called functions will now show_help() their own error messages if something goes wrong (per r28470). This commit was SVN r28471. The following SVN revision numbers were found above: r28470 --> open-mpi/ompi@2ff95a7739	2013-05-10 15:12:07 +00:00
Jeff Squyres	2ff95a7739	Proper show_help error messages for LAMA. This commit was SVN r28470.	2013-05-10 15:06:25 +00:00
Ralph Castain	707d0e653a	Must use equal and not & comparison for mapping directives This commit was SVN r28451.	2013-05-06 15:07:12 +00:00
Ralph Castain	5d7a93c032	Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed. Interesting how many includes had to be fixed here and there to fill in missing dependencies :-) This commit was SVN r28411.	2013-04-29 17:02:37 +00:00
Ralph Castain	252147fba6	Cleanup error message if unknown host is given in -host and -hostfile options This commit was SVN r28262.	2013-03-28 16:52:10 +00:00
Nathan Hjelm	c041156f60	Update ORTE frameworks to use the MCA framework system. This commit was SVN r28240.	2013-03-27 21:14:43 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file cannot be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char **, int *, or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE flag enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few components actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump(). 
opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
Ralph Castain	e7ac6c9bde	Don't build rank_file if you can't use it anyway This commit was SVN r28233.	2013-03-27 15:12:40 +00:00
Ralph Castain	256414121e	Protect the cpus-per-rank MCA param registration so that --without-hwloc will build This commit was SVN r28232.	2013-03-27 14:53:30 +00:00
Ralph Castain	317915225c	Finish the binding cleanup by removing the no-longer-used binding level scheme. This proved to be fallible as there is no guarantee that the hierarchy it used matched physical reality of the machine (e.g., is L3 "above" the socket or not). Still have to complete the ppr update, but get the rest of it correct. This commit was SVN r28223.	2013-03-26 20:09:49 +00:00
Ralph Castain	6ee32767d4	Restore the cpus-per-proc option for byslot and bynode mapping. Remove the bind_idx (which recorded the index of the hwloc object where the proc was bound) as this would no longer be unique, and just use the bitmap as the standard reference for location. Update the relative locality computation to take bitmaps as its argument. This commit was SVN r28219.	2013-03-26 18:27:50 +00:00
Ralph Castain	2f43989d22	Add debug and handle the use-case where someone (a) uses a hostfile while in a managed allocation to sub-allocate runs, and (b) includes the HNP's node in one of those hostfiles. cmr:v1.7 This commit was SVN r28203.	2013-03-22 00:53:33 +00:00
Ralph Castain	cf9796accd	Remove the old configure option for disabling full rte support - we now use the OMPI rte framework for such purposes This commit was SVN r28134.	2013-02-28 01:35:55 +00:00
Ralph Castain	8d2fa3693b	First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package. This commit was SVN r28116.	2013-02-26 20:44:56 +00:00
Jeff Squyres	8e25b927ab	Clean some minor warnings: remove variables that were set but never used. This commit was SVN r27974.	2013-01-29 23:35:42 +00:00
Ralph Castain	112f8eedb1	Handle the case where rankfile is providing the allocation This commit was SVN r27971.	2013-01-29 20:37:58 +00:00
Ralph Castain	f6b4db0b79	Fix rank_file operations. We changed the syntax to use semi-colons between multiple slot assignments so that we could use the comma to separate specific cores, but somehow the flex definitions didn't get updated to accept that character. We also incorrectly zeroed the bitmap between slot assignment sections, so with multiple slot assignments only the last one in the list took effect. This commit was SVN r27908.	2013-01-25 18:33:25 +00:00
Nathan Hjelm	3e1b13b13a	Re-add support for old flex (2.5.4a and earlier) while still cleaning up properly in new flex. This commit was SVN r27657.	2012-12-07 00:12:43 +00:00
Nathan Hjelm	e0f5137e46	add prototypes for lex destroy functions This commit was SVN r27580.	2012-11-09 22:00:27 +00:00
Nathan Hjelm	8658bbc902	instead of relying on yyterminate to clean up the lex context call the destroy functions directly (after closing the file) This commit was SVN r27577.	2012-11-09 16:10:55 +00:00
Ralph Castain	9b729794f2	A prior commit apparently broke the trunk when something was inadvertently left behind - so remove a reference to a no-longer-existing function This commit was SVN r27574.	2012-11-07 11:11:05 +00:00
Nathan Hjelm	7fb5caea92	Remove the finish_parsing function from various .l files. The function is incomplete (doesn't clean up the lex state) and should be replaced by *_yylex_destroy which correctly cleans up the state. Checked with flex 2.5.35. Verified with valgrind that this fixes several "still reachable" leaks. cmr:v1.7 This commit was SVN r27571.	2012-11-06 19:26:14 +00:00
Nathan Hjelm	bdedd8b0d3	Per RFC, modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set it to -1. Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this, several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the semantics of mca_base_components_open/close as they are now symmetric in their functionality. This commit was SVN r27570.	2012-11-06 19:09:26 +00:00
Ralph Castain	094d6f3143	Add a new "distributed file system" capability to support file access operations across nodes that do not have a network file system attached to them. Add a set of URI create/parse utilities This commit was SVN r27483.	2012-10-25 17:15:17 +00:00
Ralph Castain	4028ce7a5d	Silence warnings by making types match This commit was SVN r27446.	2012-10-14 03:45:28 +00:00
Ralph Castain	285a3b168d	Add an ability to specify the max number of simultaneous procs/node for an application when operating in staged mode. Change some debug statements from OPAL_OUTPUT_VERBOSE to opal_output_verbose so they are available in optimized builds. This commit was SVN r27445.	2012-10-14 03:31:32 +00:00
Ralph Castain	54db4c35eb	Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available as they don't really depend on hwloc's presence. Tell two components not to build if hwloc is disabled: ompi/mca/sbgp/basesmsocket orte/mca/rmaps/lama Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files. This commit was SVN r27377.	2012-09-26 23:24:27 +00:00
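
The default-policy rules in commit 8b6d117541, refined by ab4636c47b (default rank-by slot unless the user gives map-by <obj>) and 53cd00fe16 (fallback when sockets are undefined or binding is unsupported), can be summarized as a small decision function. This is a minimal sketch, not ORTE's code: the enum and struct names are hypothetical, it only covers the case where the user gave no map/rank/bind options, and it assumes np < 2 is treated like np = 2, which the commit message does not state.

```c
#include <stdbool.h>

/* Hypothetical names; ORTE's real policy macros differ. */
typedef enum { BY_SLOT, BY_SOCKET } obj_policy_t;
typedef enum { BIND_NONE, BIND_CORE } bind_policy_t;

typedef struct {
    obj_policy_t  map_by;
    obj_policy_t  rank_by;
    bind_policy_t bind_to;
} job_policy_t;

/* Defaults when the user gave no map-by, rank-by, or bind-to option.
 * Assumption: np < 2 is handled like np == 2 (the commit names only
 * np=2 and np>2). */
static job_policy_t default_policy(int np, bool sockets_known,
                                   bool binding_supported)
{
    job_policy_t p;
    p.map_by  = (np > 2) ? BY_SOCKET : BY_SLOT; /* 8b6d117541 */
    p.rank_by = BY_SLOT;                        /* ab4636c47b: no map-by given */
    p.bind_to = BIND_CORE;                      /* 8b6d117541 */

    /* 53cd00fe16: the user did not request these defaults, so if the
     * machine cannot honor them, revert to slot/slot with no binding. */
    if ((p.map_by == BY_SOCKET && !sockets_known) || !binding_supported) {
        p.map_by  = BY_SLOT;
        p.rank_by = BY_SLOT;
        p.bind_to = BIND_NONE;
    }
    return p;
}
```

Keeping the fallback inside the default path mirrors the commit's point: a policy the user explicitly asked for should still fail loudly, while a default should degrade quietly.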
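
The auto keyword for --mca rmaps_base_dist_hca described in commit 1802aabf1a reduces to a small selection rule: an explicit device name is used as given; with auto, a single discovered device is accepted, and more than one is reported back so the user can choose. A hypothetical sketch: pick_dist_device and its status codes are illustrative names, device discovery is assumed to have already happened, and the "no device found" case is an assumption the commit does not describe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for the selection step. */
typedef enum { DIST_OK, DIST_NONE_FOUND, DIST_AMBIGUOUS } dist_status_t;

/* requested: the value given for rmaps_base_dist_hca.
 * found/nfound: device names discovered on this node (assumed input).
 * On DIST_OK, *chosen points at the selected name; otherwise it is NULL. */
static dist_status_t pick_dist_device(const char *requested,
                                      const char *const *found, size_t nfound,
                                      const char **chosen)
{
    *chosen = NULL;
    if (strcmp(requested, "auto") != 0) {
        *chosen = requested;            /* explicit device name: use as given */
        return DIST_OK;
    }
    if (nfound == 0) {
        return DIST_NONE_FOUND;         /* assumption: nothing to pick */
    }
    if (nfound > 1) {
        return DIST_AMBIGUOUS;          /* user must name one explicitly */
    }
    *chosen = found[0];
    return DIST_OK;
}
```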
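
The OOB rewrite in commit a200e4f865 decides reachability by checking whether a peer's contact address falls inside the address/mask range of one of a local NIC's addresses. The sketch below shows that test for IPv4 only, using POSIX inet_pton; the function and struct names are hypothetical, and it is not the OOB/TCP component's code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical: one address/prefix pair configured on a local NIC. */
typedef struct {
    const char *addr;     /* dotted-quad IPv4 address */
    int         prefix;   /* netmask length, 0..32 */
} nic_addr_t;

/* True if peer lies in the subnet covered by local/prefix (IPv4 only).
 * Unparseable input is treated as unreachable. */
static bool same_subnet_ipv4(const char *local, int prefix, const char *peer)
{
    struct in_addr l, p;
    if (prefix < 0 || prefix > 32) return false;
    if (inet_pton(AF_INET, local, &l) != 1) return false;
    if (inet_pton(AF_INET, peer, &p) != 1) return false;
    /* Shifting a 32-bit value by 32 is undefined, so /0 is special-cased. */
    uint32_t mask = (prefix == 0) ? 0u : 0xffffffffu << (32 - prefix);
    return (ntohl(l.s_addr) & mask) == (ntohl(p.s_addr) & mask);
}

/* A NIC can reach the peer if any of its address/mask pairs covers it. */
static bool nic_can_reach(const nic_addr_t *addrs, size_t n, const char *peer)
{
    for (size_t i = 0; i < n; i++) {
        if (same_subnet_ipv4(addrs[i].addr, addrs[i].prefix, peer)) return true;
    }
    return false;
}
```

As the commit's known limitations note, a pure subnet test like this cannot see peers reachable only through a gateway.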
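
Commit a200e4f865 lists, among its known limitations, that the RML needs per-peer sequence numbers so that messages arriving over different transports, or resent after a NIC failure, are still delivered in order. One common way to do that is a small reorder window per peer, sketched here; the window size, struct, and integer payloads are placeholders and are not anything the RML defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window; a power of two keeps seq % RX_WINDOW consistent
 * when the 32-bit sequence number wraps. */
#define RX_WINDOW 8u

typedef struct {
    uint32_t next_expected;        /* next sequence number to deliver */
    bool     held[RX_WINDOW];      /* slot holds an early message */
    int      payload[RX_WINDOW];   /* stand-in for a real message */
} peer_rx_t;

/* Accept message seq from one peer. Writes every message that is now
 * deliverable, in order, to out (room for RX_WINDOW entries) and returns
 * how many were written, or -1 for a duplicate or out-of-window seq. */
static int rx_accept(peer_rx_t *rx, uint32_t seq, int payload, int *out)
{
    uint32_t ahead = seq - rx->next_expected;   /* wraps for old seqs */
    if (ahead >= RX_WINDOW) return -1;
    uint32_t slot = seq % RX_WINDOW;
    if (rx->held[slot]) return -1;               /* already buffered */
    rx->held[slot] = true;
    rx->payload[slot] = payload;

    /* Drain the contiguous run starting at next_expected. */
    int n = 0;
    while (rx->held[rx->next_expected % RX_WINDOW]) {
        uint32_t s = rx->next_expected % RX_WINDOW;
        out[n++] = rx->payload[s];
        rx->held[s] = false;
        rx->next_expected++;
    }
    return n;
}
```

For example, if messages 1 and 2 arrive over a faster NIC before message 0, both are held, and the arrival of 0 releases 0, 1, 2 in that order.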
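
The placement examples in the mindist commit (6d173af329) can be reproduced with three small counting functions: byslot fills each node before moving on, span balances across nodes, and within a node each NUMA domain, ordered from closest to the device to furthest, takes as many procs as its cores allow given cpus-per-proc. This is an illustrative sketch of the arithmetic only: the function names are hypothetical, span giving leftover procs to the first nodes and ignoring per-node capacity are assumptions, and hwloc topology discovery is omitted.

```c
/* Illustrative arithmetic for the mindist examples; not ORTE's code. */

/* byslot (default): fill each node's slots in order.
 * Returns 0, or -1 if the nodes cannot hold np procs. */
static int mindist_byslot(const int *slots, int nnodes, int np, int *out)
{
    for (int i = 0; i < nnodes; i++) {
        int take = np < slots[i] ? np : slots[i];
        out[i] = take;
        np -= take;
    }
    return np == 0 ? 0 : -1;
}

/* span: balance procs evenly across nodes, extra procs going to the
 * first nodes (assumption). Per-node capacity is not checked here. */
static void mindist_span(int nnodes, int np, int *out)
{
    for (int i = 0; i < nnodes; i++) {
        out[i] = np / nnodes + (i < np % nnodes ? 1 : 0);
    }
}

/* Within one node: numa_cores[] is ordered from closest to the device to
 * furthest; each domain takes at most cores / cpus_per_proc procs.
 * Returns 0, or -1 if the procs do not fit. */
static int mindist_fill_numa(const int *numa_cores, int nnuma, int np,
                             int cpus_per_proc, int *out)
{
    for (int i = 0; i < nnuma; i++) {
        int cap = numa_cores[i] / cpus_per_proc;
        int take = np < cap ? np : cap;
        out[i] = take;
        np -= take;
    }
    return np == 0 ? 0 : -1;
}
```

With two 8-core nodes and 10 procs, byslot yields 8 and 2 while span yields 5 and 5; with two 4-core NUMA domains and 6 procs at one cpu each, the closest domain gets 4 and the next gets 2, matching the commit's examples.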
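
Commits 3a372a65b8 and 29e4b0cc50 turn on the difference between a mapping policy (one value out of several) and mapping directives (independent bit flags). A hypothetical encoding that packs both into one integer shows why each needs its own kind of test; the macro names and bit layout below are illustrative and are not copied from ORTE's headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the low byte holds exactly one policy value,
 * the high byte holds independent directive flags. */
#define MAP_POLICY_MASK  0x00ffu
#define MAP_BYSLOT       1u
#define MAP_BYNODE       2u
#define MAP_BYCORE       3u
#define MAP_DIR_NO_LOCAL 0x0100u
#define MAP_DIR_SPAN     0x0200u

/* Policies are values: mask out the directive bits, then compare. */
static bool is_policy(uint16_t m, unsigned pol)
{
    return (m & MAP_POLICY_MASK) == pol;
}

/* Directives are bits: test with &, never with ==. */
static bool has_directive(uint16_t m, unsigned dir)
{
    return (m & dir) != 0;
}

/* The kind of mistake 3a372a65b8 fixes: treating a value as a bit. */
static bool is_policy_wrong(uint16_t m, unsigned pol)
{
    return (m & pol) != 0;
}
```

With MAP_BYCORE = 3, the wrong test reports a match for MAP_BYSLOT because 3 & 1 is nonzero. Conversely, comparing m == MAP_DIR_SPAN fails whenever any other bit is set, which is why directives cannot be tested for equality.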


627 commits