The following command hangs:
% mpirun --rank-by core -np 3 --report-bindings hostname
because of a loop in which i is supposed to cycle through an
array of size num_objs, but it only ever examines
node->num_procs entries.
I changed the counter so the loop stays on this node until it
makes a full cycle through the array of objects without making
any assignments; only then does it end the loop and move on to
the next node.
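Roughly, the new loop shape is as follows (hypothetical helper and variable
names; just a sketch of the logic described above, not the actual rank_by()
code):
```
/* Sketch only: keep cycling through all num_objs objects on this node and
 * leave the loop only after a full pass over the object array makes no new
 * assignments. */
int made_assignment;
int i = 0;
do {
    made_assignment = 0;
    for (int pass = 0; pass < num_objs; pass++, i = (i + 1) % num_objs) {
        if (assign_rank_to_obj(node, objs[i])) {   /* hypothetical helper */
            made_assignment = 1;
        }
    }
} while (made_assignment);   /* a pass with no assignments ends this node */
```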
Signed-off-by: Mark Allen <markalle@us.ibm.com>
The first category of issue I'm addressing is that recent code changes
seem to treat --cpu-set only as a binding option. E.g. a command like
this:
% mpirun -np 2 --report-bindings --use-hwthread-cpus \
--bind-to cpulist:ordered --map-by hwthread --cpu-set 6,7 hostname
which simply round-robins over the --cpu-set list.
Example output which seems fine to me:
> MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
It should also be possible, though, to pass a --cpu-set to most other
map/bind options and have it act as a constraint on that binding. E.g.
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by ppr:2:node,pe=2 --cpu-set 6,7,12,13 hostname
The first command above fails with
> Conflicting directives for mapping policy are causing the policy
> to be redefined:
> New policy: RANK_FILE
> Prior policy: BYHWTHREAD
The error check in orte_rmaps_rank_file_open() is likely too aggressive.
The intent seems to be that any option like "--map-by whatever" will
check to see if a rankfile is in use, and report that mapping via rmaps
and using an explicit rankfile is a conflict.
But the check has been expanded so that it not only checks
NULL != orte_rankfile
but also errors out if
(NULL != opal_hwloc_base_cpu_list &&
!OPAL_BIND_ORDERED_REQUESTED(opal_hwloc_binding_policy))
which only recognizes --cpu-set as a binding option and ignores
--cpu-set as a constraint on other binding policies.
For now I've changed the
NULL != opal_hwloc_base_cpu_list
to
OPAL_BIND_TO_CPUSET == OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy)
so it hopefully only errors out if --cpu-set is being used as the binding
policy. Whether or not I did that exactly right, it's enough to get to the
next stage of testing with the example commands above.
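In context, the revised check looks roughly like this (a sketch, not the
verbatim orte_rmaps_rank_file_open() code):
```
/* Sketch only: treat --cpu-set as a conflict with an explicit --map-by
 * only when it is actually being used as the binding policy. */
if (NULL != orte_rankfile ||
    OPAL_BIND_TO_CPUSET == OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy)) {
    /* report the mapping-policy conflict as before */
}
```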
Another place where similar logic is used is hwloc_base_frame.c, which has
/* did the user provide a slot list? */
if (NULL != opal_hwloc_base_cpu_list) {
OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_CPUSET);
}
where it used to (long ago) only do that if
!OPAL_BINDING_POLICY_IS_SET(opal_hwloc_binding_policy)
i.e., if no binding policy had already been set. I think the new code makes
it impossible to use --cpu-set as anything other than a binding policy.
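For comparison, the older guarded behavior would look roughly like this
(again just a sketch, not the verbatim hwloc_base_frame.c code):
```
/* Sketch only: turn --cpu-set into the binding policy only when the user
 * has not already chosen a binding policy. */
if (NULL != opal_hwloc_base_cpu_list &&
    !OPAL_BINDING_POLICY_IS_SET(opal_hwloc_binding_policy)) {
    OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_CPUSET);
}
```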
That brings us past the error detection and into the real functionality, some of
which has been stripped out, probably in moving to hwloc-2:
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
> MCW rank 0: [B.../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [.B../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
The rank_by() function in rmaps_base_ranking.c makes an array out of objects
returned from
opal_hwloc_base_get_obj_by_type(,,,i,)
which uses df_search(). That function changed quite a bit from hwloc-1 to 2
but it used to include a check for
available = opal_hwloc_base_get_available_cpus(topo, start)
which is where the bitmask from --cpu-set goes. And it used to skip objs that
had hwloc_bitmap_iszero(available).
So I restored that behavior in df_search() by adding a "constrained_cpuset"
to replace the start->cpuset it was otherwise processing. With that change in
place the first command works:
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
> MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
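The restored check is roughly of this shape (a sketch using the hwloc bitmap
API, not the verbatim df_search() code):
```
/* Sketch only: constrain the object's cpuset by the available mask (where
 * the --cpu-set bitmap ends up) and skip any object whose constrained
 * cpuset comes out empty. */
hwloc_cpuset_t available = opal_hwloc_base_get_available_cpus(topo, start);
hwloc_cpuset_t constrained_cpuset = hwloc_bitmap_alloc();
hwloc_bitmap_and(constrained_cpuset, start->cpuset, available);
if (hwloc_bitmap_iszero(constrained_cpuset)) {
    /* nothing from --cpu-set is under this object; skip it */
}
hwloc_bitmap_free(constrained_cpuset);
```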
The other command, though, uses a different path that still ignored the
available mask:
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
> MCW rank 0: [BB../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..BB/..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
In bind_generic() the code used to call
opal_hwloc_base_find_min_bound_target_under_obj() which used
opal_hwloc_base_get_ncpus(), and that's where it would
intersect objects with the available cpuset and skip over ones
that weren't available. To match the old behavior I added a few
lines in bind_generic() to skip over objects that don't intersect
the available mask. After that we get
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
> MCW rank 0: [..../..BB/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../..../..../BB../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
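The added lines amount to something like this (sketch only, not the verbatim
bind_generic() code; "available" stands for the --cpu-set-derived mask):
```
/* Sketch only: ignore candidate objects with no overlap with the
 * available (--cpu-set) mask. */
if (!hwloc_bitmap_intersects(obj->cpuset, available)) {
    continue;   /* this object contributes no allowed cpus */
}
```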
I think the above changes are improvements, but I don't feel like they're
comprehensive. I only traced through enough code to fix the two specific
bugs I was dealing with.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
If a user provides a list of nodes to use via -host or -hostfile, then
ensure that the ranks are placed according to that order. Also fix a bug
where the number of slots on a node was incorrectly computed for
localhost if the name given didn't exactly match the return from
get_hostname.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Update the show_help message for when there are not enough slots to
run an application.
Also, remove a bunch of copies of this message in various show_help
text files that aren't used/referred to anywhere in the code.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Correctly aggregate slots across -H entries from each app. Take into
account any -H entry when computing nprocs when no value was given.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Do not reorder the available host list, as this causes the head node's process assignment to differ from the assignments computed on the other nodes
Signed-off-by: Ralph H Castain <rhc@open-mpi.org>
Since version 2.0.0, hwloc has a new organization of NUMA nodes in the
topology tree. This commit adds detection of the local NUMA object for
hwloc >= 2.0.0, which fixes the process binding policy for the rmaps
mindist component.
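For illustration, locating the local NUMA object under hwloc >= 2.0 can be
done along these lines (a sketch, not the actual rmaps mindist code):
```
/* Sketch only: with hwloc >= 2.0 NUMA nodes are no longer in the main child
 * list, so find the NUMA object covering a given object by intersecting
 * cpusets. */
static hwloc_obj_t find_local_numa(hwloc_topology_t topo, hwloc_obj_t obj)
{
    hwloc_obj_t numa = NULL;
    while (NULL != (numa = hwloc_get_next_obj_by_type(topo, HWLOC_OBJ_NUMANODE, numa))) {
        if (hwloc_bitmap_intersects(numa->cpuset, obj->cpuset)) {
            return numa;
        }
    }
    return NULL;
}
```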
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.
Thanks to @tonyreina for the error report
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Do not have child jobs inherit launch directives unless requested to do so. This affects the map-by, rank-by, bind-to, npernode, pernode, npersocket, persocket, and cpus-per-rank directives. Values provided in the spawn call always take precedence - if a particular value isn't specified, then the ORTE defaults will be used if inheritance is not requested, and the values specified by MCA param will be used if inheritance is set.
Always inherit oversubscribe for now as otherwise MTT will break
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Allow users to request that procs be bound to a cpu in a given cpu-list based on their corresponding local rank
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
A race condition exists based on whether or not the userdata object attached to a hwloc_obj_t has been initialized. These objects are set up whenever we scan for resources under that location. You therefore must not save a pointer to the userdata object and then call a function that will initialize the data in it; set the pointer after the function call, and protect against it being NULL.
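A self-contained illustration of the ordering hazard (stand-in types only,
not the actual ORTE or hwloc code):
```
/* Stand-in types, not the real structures: shows why caching the userdata
 * pointer before the scan is wrong, and why it must be re-read afterwards
 * and checked for NULL. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { int npus; } obj_data_t;   /* stand-in for the userdata payload */
typedef struct { void *userdata; } obj_t;  /* stand-in for hwloc_obj_t */

/* Stand-in for a resource scan that lazily creates the userdata object. */
static void scan_resources(obj_t *obj) {
    if (NULL == obj->userdata) {
        obj_data_t *d = (obj_data_t *)malloc(sizeof(*d));
        d->npus = 4;
        obj->userdata = d;
    }
}

int main(void) {
    obj_t obj = { .userdata = NULL };

    /* WRONG: caching the pointer first leaves us holding a stale NULL */
    obj_data_t *data = (obj_data_t *)obj.userdata;
    scan_resources(&obj);
    /* data is still NULL here even though obj.userdata is now valid */

    /* RIGHT: read the pointer after the call and protect against NULL */
    data = (obj_data_t *)obj.userdata;
    if (NULL != data) {
        printf("npus = %d\n", data->npus);
    }
    free(obj.userdata);
    return 0;
}
```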
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
This still leaves two unresolved warnings:
base/rmaps_base_binding.c:577:22: warning: variable ‘clvm’ set but not used [-Wunused-but-set-variable]
unsigned clvl=0, clvm=0;
^~~~
base/rmaps_base_binding.c:576:27: warning: variable ‘hwm’ set but not used [-Wunused-but-set-variable]
hwloc_obj_type_t hwb, hwm;
^~~
The problem is that these values are used in the OPAL_HWLOC_MAKE_OBJ_CACHE macro to form a variable name. Thus, the compiler doesn't recognize the values as being "used". I'm not entirely sure how to resolve it cleanly.
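For reference, a tiny standalone example of the effect (hypothetical macro,
not OPAL_HWLOC_MAKE_OBJ_CACHE itself):
```
/* The argument's *name* is token-pasted into a new identifier, so its
 * *value* is never read and the compiler reports it as "set but not used". */
#include <stdio.h>

#define MAKE_OBJ_CACHE(level) cache_##level   /* forms the identifier cache_<level> */

int main(void) {
    unsigned clvm = 0;
    int cache_clvm = 42;      /* the variable the macro expansion actually names */

    clvm = 3;                 /* "set" ... */
    /* ...but only the name "clvm" is pasted below; its value is never read,
     * which triggers -Wunused-but-set-variable on clvm. */
    printf("%d\n", MAKE_OBJ_CACHE(clvm));
    return 0;
}
```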
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Don't bother doing a lookup upwards or downwards for the target object type.
Just use the target depth, iterate over the level until we find the min_bound
object that intersects the locale cpuset.
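Sketched out, the iteration looks something like this (hypothetical helper
num_bound(), not the verbatim code):
```
/* Sketch only: walk the single level at the target depth and keep the
 * least-loaded object whose cpuset overlaps the locale.  num_bound() is a
 * hypothetical stand-in for however the mapper counts procs already bound
 * under an object. */
hwloc_obj_t obj = NULL, min_bound = NULL;
while (NULL != (obj = hwloc_get_next_obj_by_depth(topo, target_depth, obj))) {
    if (!hwloc_bitmap_intersects(obj->cpuset, locale->cpuset)) {
        continue;   /* outside the locale */
    }
    if (NULL == min_bound || num_bound(obj) < num_bound(min_bound)) {
        min_bound = obj;
    }
}
```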
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
This fixes a problem reported by @bgoglin where rank-by was incorrectly generating values when ranking by a type of object (e.g., socket). It also corrects the handling of the pernode, npernode, and npersocket options - these should only set the #procs and the default mapping pattern. They specifically should not prohibit the user from requesting a different mapping.
Thus, the following should be valid:
mpirun -npernode 2 --map-by socket ...
should put 2 procs on each node, mapping them by-socket on each node.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Shorten the loops as much as possible - if someone wants to further optimize, they are welcome to do so.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
While it may be faster to reverse the order of the assignment loops, it also results in the wrong answer
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Since we now support the dynamic addition of hosts to the orte_node_pool, there is no longer any reason to require advance specification of all possible nodes. Instead, use a precedence method to initially allocate only those hosts that were specified on the cmd line:
* rankfile, if given, as that will specify the nodes
* -host, aggregated across all app_contexts
* -hostfile, aggregated across all app_contexts
* default hostfile
* assign local node
Fix slots_inuse accounting so that the nodes are correctly reset upon error termination - e.g., when oversubscribed without permission.
Ensure we accurately track the user's specified desires for oversubscribe and no-use-local when dynamically spawning jobs.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit c9b3e68ce596a68a2ed2fbf73f211b3334b0a6a8)
Fixed the desync of job nodelists between mpirun and orted
daemons. The issue was observed when using RSH launching, because the user
can provide an arbitrary order of nodes with respect to HNP placement.
The mpirun process propagates the daemon nodelist order to the nodes.
The problem was that the HNP itself assembles the nodelist based on the
user-provided order, so the rank assignment was calculated differently
on orted and mpirun.
Consider the following example:
* User launches mpirun on node cn2.
* Hostlist is cn1,cn2,cn3,cn4; ppn=1
* mpirun is passing hostlist cn[2:2,1,3-4]@0(4) to orteds
As a result, mpirun will assign rank 0 to cn1 while orted will assign
rank 0 to cn2 (because orted sees cn2 as the first element in the node
list).
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
The current error message when the number of slots is insufficient
(e.g. running mpirun -n 4 on a dual core machine) does not mention the
use of `--oversubscribe`.
In earlier versions of Open MPI, over-subscription was automatic
(albeit buggy?); but the important point was that no error message was
printed and the application ran. Mentioning the --oversubscribe flag in
the message will ease the transition to the current behaviour, where an
explicit request is required.
Also make a few other minor tweaks / cleanups to the
orte-rmaps-seq:alloc-error help message.
Signed-off-by: Yu Feng <rainwoodman@gmail.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The following issues have been fixed for `mindist`:
- computing the job map on the backend nodes
- using slots count (`-host node1:<s1>,nodeN:<sN>`)
- fixed `dist:span` job mapping method
- fixed `oversubscribe` option with `-host`
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
Debugger daemons do not count against available slots. Clean up some leftover errors from the upgrade to HWLOC 2 in the mappers. Properly flag debugger jobs that come in via PMIx.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
* Resolves #3705
* Components should link against the project level library to better
support `dlopen` with `RTLD_LOCAL`.
* Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
with the appropriate project level library:
```
MCA components in ompi/
  $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
  $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
  $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
  $(top_builddir)/oshmem/liboshmem.la
```
Note: The changes in this commit were automated by the
`libadd_mca_comp_update.py` script in the accompanying commit. Some
components were not included in this change because they are only
built statically.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2
Signed-off-by: Ralph Castain <rhc@open-mpi.org>