openmpi

Author	SHA1	Message	Date
Ralph Castain	b314bfb5e9	If someone specifies the bitmap for hwthreads and wants hwthread cpus, then don't parse the slot list as it expects cores - just copy the provided bitmap across as it already has the required info	2014-12-19 10:56:14 -08:00
Ralph Castain	0630680f36	Two cleanups required for transfer to 1.8.4: * Use %d format for the topo signature as some systems apparently have problems with %u * Use correct variable in show_help message	2014-12-12 17:23:32 -08:00
Ralph Castain	b757b3f452	Ensure that the #nodes in the job map gets properly updated when using the sequential mapper. Provide some further diagnostic info to help understand the problem when encountered.	2014-12-08 08:03:53 -08:00
Ralph Castain	cb15cc06e1	Minor changes per Jeff's request on PR for 1.8.4	2014-12-02 19:54:10 -08:00
Ralph Castain	3f9d9ae8b6	Provide tighter LSF integration by correctly handling scenarios where the user has asked LSF to assign bindings. Fix a couple of typos in lex parser definitions. Tell hostfile parser to ignore binding designations in hostfiles. Add an attribute to indicate that cpusets were provided as physical cpu ids. Once validated, a version of this will be backported to the v1.8.4 release.	2014-11-30 11:50:31 -08:00
Ralph Castain	2a90788724	Support physical processor ids in rankfile	2014-11-10 14:00:40 -08:00
Ralph Castain	ea11e63f59	Per patch from Tetsuya, allow the user to bind-to none when specifying multiple pe's/rank as requested by Reuti. This allows the user to reserve multiple "slots" in the allocation for each process while mapping, but not to bind the process to specific processing elements on the node. Reviewed by rhc, so RM-approved to go across to v1.8.3 cmr=v1.8.3:reviewer=ompi-gk1.8 This commit was SVN r32701.	2014-09-10 15:52:18 +00:00
Ralph Castain	4207b4c4ad	Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32687.	2014-09-08 21:03:51 +00:00
Ralph Castain	842aaf6167	Correctly end mapping oversubscribed nodes round-robin byslot cmr=v1.8.3:reviewer=rhc This commit was SVN r32616.	2014-08-27 16:15:18 +00:00
Ralph Castain	024572cb6c	Sigh - I promised to remove these deprecation warnings back in June. My apologies to Dave Goodell and others who requested it. cmr=v1.8.2:reviewer=dgoodell:subject=remove deprecation warnings for pernode, npernode, and npersocket This commit was SVN r32552.	2014-08-19 19:40:20 +00:00
Gilles Gouaillardet	c3c364a262	check-help-strings cleanup This commit was SVN r32494.	2014-08-11 03:22:05 +00:00
George Bosilca	daa076995a	orte_rmaps_numa_node_t -> opal_rmaps_numa_node_t This commit was SVN r32380.	2014-07-31 19:58:47 +00:00
Ralph Castain	5bb5b22573	When a user asks for cpus/rank > 1 and only has one slot, we need to ensure we always map at least one process when they don't tell us -np cmr=v1.8.2:reviewer=rhc:subject=correct num_procs in corner case This commit was SVN r32142.	2014-07-04 17:00:35 +00:00
Ralph Castain	149810f02c	Per request from Jeff, slightly modify the show_help message as the precise name of the NUMA-containing packages differs based on OS and distro cmr=v1.8.2:reviewer=jsquyres:subject=modify show_help message This commit was SVN r32122.	2014-07-02 14:46:00 +00:00
Ralph Castain	8fca77c3d3	Protect the binding policy setting so it builds when --without-hwloc Refs trac:4742 This commit was SVN r32085. The following Trac tickets were found above: Ticket 4742 --> https://svn.open-mpi.org/trac/ompi/ticket/4742	2014-06-25 18:13:54 +00:00
Ralph Castain	5f6be06b54	Per request from Gilles and discussion at devel conference, have the --oversubscribe option automatically set both oversubscribe and overload-allowed properties as this is likely what the user intended. cmr=v1.8.2:reviewer=rhc:subject=automatically set oversub/load This commit was SVN r32072.	2014-06-24 18:11:39 +00:00
Ralph Castain	645df5e823	Don't release the node_name field as it gets used in the slots parsing - will be released at newline detection This commit was SVN r32058.	2014-06-20 13:18:46 +00:00
Ralph Castain	9a47e45a09	<laugh> ensure we really compare the things we want to compare This commit was SVN r32055.	2014-06-19 20:54:25 +00:00
Ralph Castain	e65538e91b	Add some defensive programming, fix a typo This commit was SVN r32054.	2014-06-19 20:52:13 +00:00
Ralph Castain	b43f760f93	If you don't specify all the rank-file mapping for all procs, then you'll segfault - which is probably a bad idea. I can't see an easy workaround, so just error out for now and let's see if anyone really cares. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r32053.	2014-06-19 20:30:06 +00:00
Ralph Castain	65275d6326	Add a little more info to the warning message - i.e., that the likely cause of the problem is missing libnumactl and/or libnumactl-devel cmr=v1.8.2:reviewer=miked:subject=improve memory binding failure message This commit was SVN r32030.	2014-06-18 19:20:28 +00:00
Ralph Castain	3f04d50cb0	Per the ticket, resolve our handling of overload conditions to provide a more consistent response. If we are overloaded (i.e., attempting to bind more processes to a location than the number of cpus under that location), then we consider the following conditions: (a) default binding policy is in effect. In this case, we will emit a warning and default to not binding unless the user provided the "oversubscribe" or "overload" modifier to the "bind-to" option. (b) user-specified binding policy is in effect. In this case, we will error out unless the user provided the "oversubscribe" or "overload" modifier to the "bind-to" option as we cannot meet the directive. Either "bind-to" modifier (oversubscribe or overload) will be accepted for now - in 1.9, we will deprecate the "overload" term in favor of "oversubscribe". Also added the ability to accept a --bind-to modifier without specifying the binding policy itself so a user can specify overload-allowed with the default policy. Closes trac:4345 cmr=v1.8.2:reviewer=rhc:subject=resolve handling of overload conditions This commit was SVN r32005. The following Trac tickets were found above: Ticket 4345 --> https://svn.open-mpi.org/trac/ompi/ticket/4345	2014-06-14 15:38:32 +00:00
Ralph Castain	56c3575c0e	Can't emit an error for an unrecognized mapping policy modifier as the ppr policy relies on not doing so. This commit was SVN r31998.	2014-06-13 20:10:09 +00:00
Ralph Castain	3ed282bf44	Per patch from Tetsuya, correct the cpus-per-proc logic so we correctly detect when the user is attempting to bind too low for that option Refs trac:4702 This commit was SVN r31988. The following Trac tickets were found above: Ticket 4702 --> https://svn.open-mpi.org/trac/ompi/ticket/4702	2014-06-13 16:32:52 +00:00
Ralph Castain	06dbfa3098	Make the cpus-per-proc equivalent a little more intuitive: * allow users to specify just a modifier for map-by instead of requiring that they also specify a policy. Thus, we now accept --map-by :pe=3 as indicating that we should use the default mapping policy, but bind 3 cpus/proc. * if users specify a pe's/proc but no policy, default to --map-by NUMA to ensure we have access to multiple cpus for the request. This won't guarantee we have access to enough to meet the request, but gives us a chance. In addition, we know that binding a proc to multiple cpus will work best if those cpus are all in the same NUMA, so this provides some degree of optimized behavior. Per a request from Jeff, define "oversubscribe" for binding as a synonym for the "overload" modifier. cmr=v1.8.2:reviewer=rhc This commit was SVN r31967.	2014-06-08 20:26:59 +00:00
Ralph Castain	638c24f655	Correct the bind-in-place algorithm to better handle comm_spawn. If the location identified by the mapper is already occupied by procs from another job, then we need to shift either right or left until we find an unoccupied location where we can be bound. If nothing is available, then check for the overload flag (and bind us in the original location if provided), or see if this was the default binding policy instead of one specified by the user - if so, then just don't bind this process. cmr=v1.8.2:reviewer=rhc This commit was SVN r31959.	2014-06-06 12:36:14 +00:00
Ralph Castain	f1978fba7c	Cleanup a set of typos on the orte_get_attribute call This commit was SVN r31942.	2014-06-03 20:36:38 +00:00
Ralph Castain	8736a1c138	Per RFC: http://www.open-mpi.org/community/lists/devel/2014/05/14822.php Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root). This commit was SVN r31916.	2014-06-01 16:14:10 +00:00
Ralph Castain	f55c587a74	Per patch from Tetsuya Mishima, ensure the rank_file mapper accurately tracks number of nodes in the map Refs trac:4594 This commit was SVN r31725. The following Trac tickets were found above: Ticket 4594 --> https://svn.open-mpi.org/trac/ompi/ticket/4594	2014-05-13 14:36:25 +00:00
Ralph Castain	5602156a1c	Use the correct abstraction layer name for the data dirs This commit was SVN r31684.	2014-05-08 14:32:24 +00:00
Ralph Castain	11faab1091	The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees. This commit was SVN r31679.	2014-05-08 02:01:35 +00:00
Ralph Castain	6545e6e9a8	Add one more check for failed mapping that rarely occurs, but results in a hang when it does cmr=v1.8.2:reviewer=rhc This commit was SVN r31598.	2014-05-02 10:35:14 +00:00
Ralph Castain	61d94fcee2	Fix the sequential mapper - it was out-of-sync with the hostfile changes, and we missed the "seq" policy when parsing the --map-by option. Thanks to Bill Chen for reporting it cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31333.	2014-04-08 03:38:25 +00:00
Jeff Squyres	82e104719a	hwloc/rmaps base: Add missing help message. Also, add missing ORTE_ERROR_LOG in the other case where this error message is used (i.e., ORTE_ERROR_LOG was used in the one place, so let's also use it in the other place). This commit was SVN r31321.	2014-04-07 15:39:54 +00:00
Ralph Castain	3fdcaeab97	Fix a problem where we need to abort due to a mapping failure, but we are in a managed environment and thus the orteds have not wired up. Thus, if we send the exit message across the routed network, the remote daemons won't have a way to relay the message along - and we won't exit. If we are aborting, then set the flags so the HNP directly sends an exit command to each daemon. Make it the halt_vm command so the remote daemon doesn't try to relay it, but instead just exits without waiting for its routed children to exit first. cmr=v1.8.1:reviewer=jsquyres:subject=fix hangs due to abort prior to daemon wireup This commit was SVN r31304.	2014-04-02 04:17:55 +00:00
Ralph Castain	714cb8f573	Silence warnings cmr=v1.8:reviewer=rhc This commit was SVN r31248.	2014-03-27 14:16:54 +00:00
Ralph Castain	390645ac2a	Per patch from Tetsuya Mishima, do a nicer job of warning the user that we need to map to a higher level to get the number of requested cpus/rank. Also, change the mapping policy to "byslot" when falling back to that option. cmr=v1.8:reviewer=rhc This commit was SVN r31196.	2014-03-24 15:47:29 +00:00
Ralph Castain	081669b440	When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings This commit was SVN r30968.	2014-03-10 15:53:07 +00:00
Ralph Castain	fc2dd6ac48	Per Jeff's request, add a more detailed comment as to why we are turning off the warning at this time. Refs trac:4339 This commit was SVN r30948. The following Trac tickets were found above: Ticket 4339 --> https://svn.open-mpi.org/trac/ompi/ticket/4339	2014-03-06 02:17:25 +00:00
Ralph Castain	a2b539c763	Per the telecon, silence the warning for 1.7.5 to give us time to consider a better permanent solution Refs trac:4339 This commit was SVN r30941. The following Trac tickets were found above: Ticket 4339 --> https://svn.open-mpi.org/trac/ompi/ticket/4339	2014-03-05 03:02:29 +00:00
Ralph Castain	50c30d62ca	Repair builds without hwloc cmr=v1.7.5:reviewer=jsquyres This commit was SVN r30940.	2014-03-05 02:48:15 +00:00
Ralph Castain	0ac97761cc	Now that we are binding by default, the issue of #slots and what to do when oversubscribed has become a bit more complicated. This isn't a problem in managed environments as we are always provided an accurate assignment for the #slots, or when -host is used to define the allocation since we automatically assume one slot for every time a node is named. The problem arises when a hostfile is used, and the user provides host names without specifying the slots= parameter. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default. In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information. Also clean up a couple of issues in the mapping/binding system: * ensure we only override the binding directive if we are oversubscribed and overload is not allowed * ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch * minor cleanup to the warning message when oversubscribed and binding was requested cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system This commit was SVN r30909.	2014-03-03 16:46:37 +00:00
Ralph Castain	88b0e0cc6d	Allow the user to turn off the oversubscribed-binding warning if overload-allowed has been provided Refs trac:4317 This commit was SVN r30892. The following Trac tickets were found above: Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317	2014-02-28 17:55:53 +00:00
Ralph Castain	4a645f0342	Add detection of oversubscription with binding requested - if binding requested to core or hwt, warn and do not bind or else we will hurt performance. Also, if no binding directive was given, turn off the default binding Refs trac:4317 This commit was SVN r30888. The following Trac tickets were found above: Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317	2014-02-28 16:08:52 +00:00
Ralph Castain	8500247c7b	Fix the by-obj mapper in the case where slots are not specified, and so we are in a perpetual oversubscribed state cmr=v1.7.5:reviewer=rhc This commit was SVN r30887.	2014-02-28 05:21:46 +00:00
Ralph Castain	a4c3d0a5a0	Add some more debug to the by-obj mapper This commit was SVN r30884.	2014-02-28 02:52:53 +00:00
Ralph Castain	d109c523b9	Per patch from Tetsuya Mishima, complete the overhaul of the round-robin mappers Refs trac:4296 This commit was SVN r30861. The following Trac tickets were found above: Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296	2014-02-27 00:43:53 +00:00
Ralph Castain	61a21e4f31	Based on Tetsuya's patch, with some changes, correct the case of map-by node where multiple cpus/rank are requested and result in a non-integer match with num slots. Also correct tests for binding policy given to use the proper macro. Refs trac:4296 This commit was SVN r30857. The following Trac tickets were found above: Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296	2014-02-26 18:12:23 +00:00
Ralph Castain	b880aa46bd	Update the map-by obj and map-by obj:span mappers to correct for errors in computing carryover across the nodes. Be a little less complex in the algorithm so it is easier to follow and debug. Refs trac:4296 This commit was SVN r30826. The following Trac tickets were found above: Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296	2014-02-25 23:32:43 +00:00
Ralph Castain	c8112c1086	Loadbalancing across nodes (i.e., map-by node) wasn't working correctly - the algorithm relied on the nodes being defined in descending order of slots, or the number of slots remaining to be assigned being only one/node. Regardless, it didn't work for the case where nodes were defined in ascending order of slots. Tetsuya's proposed patch didn't solve the problem for me, but it did correct the case where cpus/proc > 1. The final patch requires that we loop over the assignment algo until all procs are assigned or all nodes are filled - any remaining procs are then handled in the cleanup loop. cmr=v1.7.5:reviewer=rhc:subject=fix map-by node for different cases This commit was SVN r30798.	2014-02-22 16:39:41 +00:00

524 Commits