openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	d529951206	hwloc: correctly count cores with at least one allowed PU. When SMT is enabled, a core must be counted as long as one of its hwthreads is allowed. Thanks Ben Menadue for the report. This fixes a regression from open-mpi/ompi@6d149554a7	2016-01-29 11:54:34 +09:00
Gilles Gouaillardet	6d149554a7	hwloc: have opal_hwloc_base_get_pu search for HWLOC_OBJ_PU when mpirun is invoked with --use-hwthread-cpus Fixes open-mpi/ompi#1247	2016-01-26 18:10:33 +09:00
Tim Mattox	958de82471	hwloc_base_util.c: Remove newly unused variable 'i'.	2016-01-14 16:35:47 -05:00
Tim Mattox	f2d4a8d266	Replace a bit counting loop with a call to an efficient population count routine	2016-01-12 10:48:56 -05:00
Gilles Gouaillardet	975b6fd51b	hwloc: do not count not allowed cores in df_search_cores	2015-09-17 13:10:34 +09:00
Nathan Hjelm	899bf548a2	opal/hwloc: fix topology detection when socket is above numa The OPAL_PROC_ON_* definitions have been changed from values to flags. This should not cause any problems as these values were already used as flags throughout the code base. Note, there will be a difference between localities produced by the new code and the old. For example, if a machine does not have a level-3 cache but two cores share a level-1 or level-2 cache, the level-3 bit will not be set in the locality and OPAL_PROC_ON_LOCAL_L3CACHE will return 0. Before this change it would have returned 1. In addition the OPAL_PROC_ON_LOCAL_* macros have been simplified. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-10 14:17:45 -06:00
Ralph Castain	ed93154e43	Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line.	2015-07-07 12:52:16 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Ralph Castain	ed5d10b816	Somehow slipped by - ensure we correctly count the cores	2015-03-19 17:56:18 -07:00
Ralph Castain	43a3baad5e	Ensure we use the first compute node's topology for mapping Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes. Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset. Correctly count the number of available PUs under each object when given a cpuset Fix the default binding settings, and correctly count PUs when no cpuset is given Ensure the binding policy gets set in all cases	2015-03-19 16:30:36 -07:00
Nysal Jan K.A	881a9f3d58	Fix cache line size detection on power Due to the nature of the cache architecture on power, we don't export coherency_line_size for L2 in sysfs. If we are unable to get the L2 cache line size, try L1. See open-mpi/ompi#383 for more information.	2015-02-25 17:26:28 +05:30
Gilles Gouaillardet	8d44d7086a	hwloc/base: fix misc memory leaks as reported by Coverity with CIDs 710636 and 1270441	2015-02-23 13:55:04 +09:00
Gilles Gouaillardet	55948f2a6d	hwloc: fix misc memory leak as reported by Coverity with CID 1270441 (previous commit open-mpi/ompi@c25185f3a9 did not fully fix that one)	2015-02-17 14:06:15 +09:00
Gilles Gouaillardet	c25185f3a9	opal/hwloc: fix misc memory leaks as reported by Coverity with CIDS 710631-710638, 1196705, 1196716, 1196717, 1196752, 1196753	2015-02-16 12:23:37 +09:00
Gilles Gouaillardet	8dd77c692e	opal/hwloc: fix misc bugs as reported by Coverity with CIDs 72224, 703566, 1196821, 1196842, 1196657 and 1196658	2015-02-16 11:59:48 +09:00
Ralph Castain	0630680f36	Two cleanups required for transfer to 1.8.4: * Use %d format for the topo signature as some systems apparently have problems with %u * Use correct variable in show_help message	2014-12-12 17:23:32 -08:00
Ralph Castain	9b2f8cd840	Add the processor architecture to the topology signature	2014-12-09 01:17:00 -08:00
Ralph Castain	bb529ebd8e	Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pick up major differences (e.g., where we have different numbers of sockets or assigned core bindings). Retain the hetero-nodes flag for those cases where the user knows that there are differences and our automated system isn't good enough to see it. Will obviously require further refinement as we find out which variances it can detect, and which it cannot.	2014-12-08 15:38:14 -08:00
Ralph Castain	cb15cc06e1	Minor changes per Jeff's request on PR for 1.8.4	2014-12-02 19:54:10 -08:00
Ralph Castain	960ef34988	Ensure the LSF ras adds the hosts to the allocation. Correctly handle the semi-colon vs comma situation in hwloc slot_lists	2014-11-30 14:37:37 -08:00
Ralph Castain	3f9d9ae8b6	Provide tighter LSF integration by correctly handling scenarios where the user has asked LSF to assign bindings. Fix a couple of typos in lex parser definitions. Tell hostfile parser to ignore binding designations in hostfiles. Add an attribute to indicate that cpusets were provided as physical cpu ids. Once validated, a version of this will be backported to the v1.8.4 release.	2014-11-30 11:50:31 -08:00
Ralph Castain	d0704ef118	Restore handling of physical processors in rankfiles. Note that the prior implementation was likely incorrect as it falsely assumed that physical core indices were unique, which isn't always true. Stipulate that physical rankfiles can only include PU numbers, and bind the result to the core that contains that physical PU. Update the mpirun man page to cover the new use-case.	2014-11-10 14:00:40 -08:00
Ralph Castain	2a90788724	Support physical processor ids in rankfile	2014-11-10 14:00:40 -08:00
George Bosilca	daa076995a	orte_rmaps_numa_node_t -> opal_rmaps_numa_node_t This commit was SVN r32380.	2014-07-31 19:58:47 +00:00
Ralph Castain	9fca25a8dd	Catch one more place where we need to use the actual topology instead of opal_hwloc_topology. Thanks to Tetsuya Mishima for the patch Reviewed okay. RM-approved cmr=v1.7.5:reviewer=ompi-gk1.7 This commit was SVN r31016.	2014-03-12 00:49:54 +00:00
Ralph Castain	081669b440	When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings This commit was SVN r30968.	2014-03-10 15:53:07 +00:00
Ralph Castain	193cceb483	Okay, since a certain other RM out there made a fuss about being able to lock their daemons to specified cores, offer the same option here. The MCA param orte_daemon_cores can be used to specify which core(s) you want the orte daemons to use. This will have no bearing on the application procs - unbound will remain unbound, and binding directives will be applied to the apps. Yippee skippee... This commit was SVN r30513.	2014-01-30 23:50:14 +00:00
George Bosilca	18ae20022a	Don't forget to release the bitmaps. This commit was SVN r30428.	2014-01-26 17:24:38 +00:00
Ralph Castain	fb9e427320	One last corner case - when encountering an overload condition (e.g., by comm_spawning more procs than we have cores) and we are using the default binding policy, do not bind the new procs to anything as this can cause major problems. Instead, let the spawn succeed since the user didn't specifically ask to be bound, and leave the new procs as unbound. Refs trac:4077 This commit was SVN r30200. The following Trac tickets were found above: Ticket 4077 --> https://svn.open-mpi.org/trac/ompi/ticket/4077	2014-01-09 22:39:34 +00:00
Ralph Castain	f179f2086b	Do a better job of reporting bindings - if someone gives a spec that binds us to all processors, then we are effectively unbound and should report it clearly instead of outputting a long line of B's. cmr=v1.7.4:reviewer=jsquyres:subject=Do a better job of reporting bindings This commit was SVN r30179.	2014-01-09 16:16:16 +00:00
Jeff Squyres	abeef55a55	Fix a few compiler warnings reported by clang: * Ensure "cnt" is always initialized * Ensure we don't buffer overflow on strncat() -- need to ensure we account for the terminating \0 character * hwloc_get_type_depth() returns an int (not unsigned), and HWLOC_TYPE_DEPTH_UNKNOWN if it's unknown (which is probably <0, but still, might as well check what the official hwloc docs say to check for) cmr=v1.7.4:reviewer=rhc:subject=fix hwloc base compiler warnings This commit was SVN r29686.	2013-11-13 15:54:01 +00:00
Mike Dubman	840e2cb4a2	mindist: cosmetic, use fallback to byslot if unable to read NUMA info, small fix. fixed by Elena, reviewed by Ralph/Mike cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r29679.	2013-11-13 09:26:40 +00:00
Ralph Castain	75c306994e	Add some debug This commit was SVN r29523.	2013-10-26 02:26:21 +00:00
Ralph Castain	772a376d73	Correct location of elog file Refs trac:3847 This commit was SVN r29438. The following Trac tickets were found above: Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847	2013-10-14 19:21:45 +00:00
Ralph Castain	24c811805f	************************************************************** This change contains a non-mandatory modification of the MPI-RTE interface. Anyone wishing to support coprocessors such as the Xeon Phi may wish to add the required definition and underlying support ************************************************************** Add locality support for coprocessors such as the Intel Xeon Phi. Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host. So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following: 1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board 2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions 3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future. 4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. 
Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time. 5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored. 6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set. cmr:v1.7.4:reviewer=hjelmn This commit was SVN r29435.	2013-10-14 16:52:58 +00:00
Ralph Castain	46ed907003	Correctly handle list of cores specified in the rankfile - i.e., a rankfile entry such as: rank 0=foo slot=0:0-1;1:0,1 cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29152.	2013-09-08 02:04:29 +00:00
Ralph Castain	7a7cfdd519	A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so cleanup the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens. cmr:v1.7.3:reviewer=jladd This commit was SVN r29089.	2013-08-29 20:01:06 +00:00
Joshua Ladd	1802aabf1a	Add support for autodetecting a MLNX HCA in the rmaps min distance feature. In this way, .ini files distributed with software stacks need not specify a particular HCA but instead may select the key word auto which will automatically select the discovered device. To use this feature, simply pass the keyword auto instead of a specific device name, --mca rmaps_base_dist_hca auto. If more than one card is installed, the mapper will inform the user of this and, at this point, the user will then need to specify which card via the normal route, e.g. --mca rmaps_base_dist_hca <dev_name>. This should be added to cmr=v1.7.4:reviewer=rhc:subject=Autodetect logic for min dist mapping This commit was SVN r29079.	2013-08-28 16:23:33 +00:00
Ralph Castain	446e33a5d8	There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes. To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it. This commit was SVN r28682.	2013-06-27 03:04:50 +00:00
Joshua Ladd	46362d2761	Stomps compiler warnings in HCA min-dist calculation. This should be added to cmr:v1.7:reviewer=jladd This commit was SVN r28620.	2013-06-12 16:25:25 +00:00
Jeff Squyres	6d173af329	This commit introduces a new "mindist" ORTE RMAPS mapper, as well as some relevant updates/new functionality in the opal/mca/hwloc and orte/mca/rmaps bases. This work was mainly developed by Mellanox, with a bunch of advice from Ralph Castain, and some minor advice from Brice Goglin and Jeff Squyres. Even though this is mainly Mellanox's work, Jeff is committing only for logistical reasons (he holds the hg+svn combo tree, and can therefore commit it directly back to SVN). ----- Implemented distance-based mapping algorithm as a new "mindist" component in the rmaps framework. It allows mapping processes by NUMA due to PCI locality information as reported by the BIOS - from the closest to device to furthest. To use this algorithm, specify: {{{mpirun --map-by dist:<device_name>}}} where <device_name> can be mlx5_0, ib0, etc. There are two modes provided: 1. bynode: load-balancing across nodes 1. byslot: go through slots sequentially (i.e., the first nodes are more loaded) These options are regulated by the optional ''span'' modifier; the command line parameter looks like: {{{mpirun --map-by dist:<device_name>,span}}} So, for example, if there are 2 nodes, each with 8 cores, and we'd like to run 10 processes, the mindist algorithm will place 8 processes to the first node and 2 to the second by default. But if you want to place 5 processes to each node, you can add a span modifier in your command line to do that. If there are two NUMA nodes on the node, each with 4 cores, and we run 6 processes, the mindist algorithm will try to find the NUMA closest to the specified device, and if successful, it will place 4 processes on that NUMA but leaving the remaining two to the next NUMA node. You can also specify the number of cpus per MPI process. This option is handled so that we map as many processes to the closest NUMA as we can (number of available processors at the NUMA divided by number of cpus per rank) and then go on with the next closest NUMA. 
The default binding option for this mapping is bind-to-numa. It works if you don't specify any binding policy. But if you specified binding level that was "lower" than NUMA (i.e hwthread, core, socket) it would bind to whatever level you specify. This commit was SVN r28552.	2013-05-22 13:04:40 +00:00
Nathan Hjelm	365cf48db5	Update OPAL frameworks to use the MCA framework system. This commit was SVN r28239.	2013-03-27 21:11:47 +00:00
Ralph Castain	317915225c	Finish the binding cleanup by removing the no-longer-used binding level scheme. This proved to be fallible as there is no guarantee that the hierarchy it used matched physical reality of the machine (e.g., is L3 "above" the socket or not). Still have to complete the ppr update, but get the rest of it correct. This commit was SVN r28223.	2013-03-26 20:09:49 +00:00
Ralph Castain	6ee32767d4	Restore the cpus-per-proc option for byslot and bynode mapping. Remove the bind_idx (which recorded the index of the hwloc object where the proc was bound) as this would no longer be unique, and just use the bitmap as the standard reference for location. Update the relative locality computation to take bitmaps as its argument. This commit was SVN r28219.	2013-03-26 18:27:50 +00:00
Ralph Castain	8a79d37ac2	Fix a few bugs in the hwloc integration code. The "set binding policy" macro should flag that the policy was indeed set. Some systems don't report sockets, so the print functions need to check for that condition. cmr:v1.7 This commit was SVN r28209.	2013-03-25 17:51:45 +00:00
Ralph Castain	037918e7b4	Correctly parse the rank file slot_list when given "S:C" - the first position holds the socket, so start looking for cores at posn=1 This commit was SVN r28054.	2013-02-13 13:06:03 +00:00
Ralph Castain	f6b4db0b79	Fix rank_file operations. We changed the syntax to use semi-colons between multiple slot assignments so that we could use the comma to separate specific cores, but somehow the flex definitions didn't get updated to accept that character. We also incorrectly zero'd the bitmap between slot assignment sections, and so multiple slot assignments only wound up making the last one in the list. This commit was SVN r27908.	2013-01-25 18:33:25 +00:00
Jeff Squyres	6af6809dc2	* Fix some comments. * Use the hwloc logical index, not the os_index. Fixes problems with opal_hwloc_base_cset2str() output (e.g., --report-bindings output) on machines where the os_index is not tightly packed in the range [0, n-1] This commit was SVN r27394.	2012-10-03 09:33:40 +00:00
Ralph Castain	54db4c35eb	Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available as they don't really depend on hwloc's presence. Tell two components not to build if hwloc is disabled: ompi/mca/sbgp/basesmsocket orte/mca/rmaps/lama Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files. This commit was SVN r27377.	2012-09-26 23:24:27 +00:00
Ralph Castain	662bc05aa6	Refs trac:3322 Cannot start the data clearing at the root object level as the root object has a different struct attached to userdata. This commit was SVN r27357. The following Trac tickets were found above: Ticket 3322 --> https://svn.open-mpi.org/trac/ompi/ticket/3322	2012-09-20 23:30:32 +00:00
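
Several commits in this history (46ed907003, f6b4db0b79, 037918e7b4) deal with the rankfile slot_list syntax: semicolons separate slot assignments, commas separate specific cores, and in the "S:C" form the first position holds the socket. The following is a minimal Python sketch of reading such a string, assuming only what those messages state. The function name `parse_slot_list` is hypothetical, and this is not Open MPI's flex-based parser.

```python
def parse_slot_list(spec):
    """Parse a slot list such as "0:0-1;1:0,1" into {socket: sorted core list}.

    Illustrative only: assumes every section has the form "S:C", where C is a
    comma-separated list of core indices or inclusive ranges "a-b".
    """
    result = {}
    for section in spec.split(";"):
        socket_text, cores_text = section.split(":", 1)
        # setdefault keeps cores from earlier sections instead of resetting them
        cores = result.setdefault(int(socket_text), set())
        for item in cores_text.split(","):
            if "-" in item:
                first, last = item.split("-", 1)
                cores.update(range(int(first), int(last) + 1))
            else:
                cores.add(int(item))
    return {socket: sorted(cores) for socket, cores in result.items()}


# The rankfile entry quoted in 46ed907003: rank 0=foo slot=0:0-1;1:0,1
print(parse_slot_list("0:0-1;1:0,1"))
```

Accumulating cores across sections rather than resetting between them reflects the problem described in f6b4db0b79, where zeroing the bitmap between slot assignment sections meant only the last assignment took effect.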


74 commits
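
The mindist commit (6d173af329) gives a worked example: with 2 nodes of 8 cores each and 10 processes, the default placement is 8 and 2, while the span modifier yields 5 and 5. The sketch below reproduces only that per-node counting in Python. The function name `place_procs` and its parameters are hypothetical, and the sketch does not model how the real mapper orders NUMA nodes by distance to the device.

```python
def place_procs(slots, nprocs, span=False):
    """Return how many processes land on each location, in the given order.

    slots  -- available slots per location (node or NUMA), closest first
    span   -- False: fill each location before moving on (byslot-like)
              True:  deal processes out one location at a time (bynode-like)
    """
    if nprocs > sum(slots):
        raise ValueError("not enough slots for the requested processes")
    counts = [0] * len(slots)
    remaining = nprocs
    if not span:
        for i, capacity in enumerate(slots):
            take = min(capacity, remaining)
            counts[i] = take
            remaining -= take
        return counts
    while remaining > 0:
        for i, capacity in enumerate(slots):
            if remaining > 0 and counts[i] < capacity:
                counts[i] += 1
                remaining -= 1
    return counts


print(place_procs([8, 8], 10))             # default: fill the first node
print(place_procs([8, 8], 10, span=True))  # span: balance across nodes
print(place_procs([4, 4], 6))              # two NUMA nodes, closest first
```

The third call corresponds to the commit's NUMA example, where 6 processes on two 4-core NUMA nodes are placed as 4 on the closest NUMA node and 2 on the next.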