1
1

429 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
c86fede9df Fix typo for rmaps_base_oversubscribe
Causes the MCA param to be ignored, while the cmd line option still
works.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-11-29 07:27:46 -08:00
Jeff Squyres
e9bf318dcb orte-rmaps-base: slightly amend help message
Follow on to 430c659908: clarify the help message and fix one typo.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-11-08 14:21:47 -08:00
Jeff Squyres
430c659908 orte-rmaps-base: update out-of-slots show_help message
Update the show_help message for when there are not enough slots to
run an application.

Also, remove a bunch of copies of this message in various show_help
text files that aren't used/referred to anywhere in the code.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-11-08 15:02:57 -05:00
Aurélien Bouteiller
2820aef551
Correctly propagate the oversubscribe flag to the spawnees
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
2018-10-23 23:02:36 -04:00
Jeff Squyres
a85bad37df orte: strncpy() -> opal_string_copy()
Fairly straightforward conversion of strncpy() calls to
opal_string_copy().

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-10-14 16:10:20 -07:00
Ralph Castain
f5a6b7f1e9 Fix -H operations for multi-app case
Correctly aggregate slots across -H entries from each app. Take into
account any -H entry when computing nprocs when no value was given.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-11 09:30:01 -07:00
Ralph H Castain
fc81d0d519 Replace asprintf with opal_asprintf
Silence the flood of warnings from ORTE

Signed-off-by: Ralph H Castain <rhc@open-mpi.org>
2018-10-06 19:32:37 +00:00
Ralph H Castain
51acbf738e Fix map-by node for comm_spawn
Do not reorder the available host list as this causes the head node process assignment to differ from those computed on the other nodes

Signed-off-by: Ralph H Castain <rhc@open-mpi.org>
2018-10-06 15:58:45 +00:00
Jeff Squyres
6bb356ab87 Squash a bunch of harmless compiler warnings.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-26 12:15:21 -07:00
Ralph Castain
45f23ca5c9 Update mapping system
Correctly transfer job-level mapping directives for dynamically spawned
jobs to the mapping system.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-26 10:00:09 -07:00
Ralph Castain
bcdb1f45ac Fix the multiple pe/proc option
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.

Thanks to @tonyreina for the error report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 18:47:39 -07:00
Ralph Castain
6b6e63a346 Control inheritance of launch directives by child jobs
Do not have child jobs inherit launch directives unless requested to do so. This affects the map-by, rank-by, bind-to, npernode, pernode, npersocket, persocket, and cpus-per-rank directives. Values provided in the spawn call always take precedence - if a particular value isn't specified, then the ORTE defaults will be used if inheritance is not requested, and the values specified by MCA param will be used if inheritance is set.

Always inherit oversubscribe for now as otherwise MTT will break

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-10 15:12:05 -07:00
Ralph Castain
3b2390e5d5 Silence coverity warnings, remove/ignore build product
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-25 08:01:28 -07:00
Ralph Castain
f17d47087a Define a new binding method and qualifier
Allow users to request that procs be bound to a cpu in a given cpu-list based on their corresponding local rank

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-20 21:26:09 -07:00
Ralph Castain
98b4ed9a3a Fix the no-disconnect test
A race condition exists based on whether or not the userdata object attached to a hwloc_obj_t has been initialized. These objects are setup whenever we scan for resources under that location. You therefore must not set a variable to the pointer to the userdata object and then call a function that will initialize the data in it - you need to set the variable after the function call, and protect against a NULL pointer

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-19 13:52:34 -07:00
Ralph Castain
f0a0d606a0 Correct accounting for tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1be080f7b92bad39745f42628a8cb6afefad2d2a)
2018-06-18 13:24:25 -07:00
Ralph Castain
ea21f7175a Silence warnings and remove unused code
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-16 17:42:48 -07:00
Ralph Castain
7c0ec7e851 Cleanup warnings in binding code
This still leaves two unresolved warnings:

base/rmaps_base_binding.c:577:22: warning: variable ‘clvm’ set but not used [-Wunused-but-set-variable]
     unsigned clvl=0, clvm=0;
                      ^~~~
base/rmaps_base_binding.c:576:27: warning: variable ‘hwm’ set but not used [-Wunused-but-set-variable]
     hwloc_obj_type_t hwb, hwm;
                           ^~~

The problem is that these values are used in the OPAL_HWLOC_MAKE_OBJ_CACHE macro to form a variable name. Thus, the compiler doesn't recognize the values as being "used". I'm not entirely sure how to resolve it cleanly.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-03 11:47:14 -07:00
Brice Goglin
c4dffa1d0f rmaps: simplify the lookup for the binding object and fix for hwloc 2.0
Don't bother doing a lookup upwards or downwards for the target object type.
Just use the target depth, iterate over the level until we find the min_bound
object that intersects the locale cpuset.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Ralph Castain
1e8add52d7 Silence warning
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-04-26 11:45:25 -07:00
Ralph Castain
9ae80596f6 Fix rank-by option and improve npernode/skt
This fixes a problem reported by @bgoglin where rank-by was incorrectly generating values when ranking by a type of object (e.g., socket). It also corrects the handling of the pernode, npernode, and npersocket options - these should only set the #procs and the default mapping pattern. They specifically should not prohibit the user from requesting a different mapping.

Thus, the following should be valid:

mpirun -npernode 2 --map-by socket ...

should put 2 procs on each node, mapping them by-socket on each node.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-04-25 20:35:43 -07:00
Ralph Castain
d644f7ee26 Correctly fix the ranking policy
Shorten the loops as much as possible - if someone wants to further optimize, they are welcome to do so.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-26 16:06:46 -07:00
Ralph Castain
322f6c5056 Fix a breakage in the ranking system
While it may be faster to reverse the order of the assignment loops, it also results in the wrong answer

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-25 15:55:56 -07:00
Ralph Castain
cb221b6f6f Correct mapping errors
Since we now support the dynamic addition of hosts to the orte_node_pool, there is no longer any reason to require advanced specification of all possible nodes. Instead, use a precedence method to initially allocate only those hosts that were specified in the cmd line:

* rankfile, if given, as that will specify the nodes

* -host, aggregated across all app_contexts

* -hostfile, aggregated across all app_contexts

* default hostfile

* assign local node

Fix slots_inuse accounting so that the nodes are correctly reset upon error termination - e.g., when oversubscribed without permission.

Ensure we accurately track the user's specified desires for oversubscribe and no-use-local when dynamically spawning jobs.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit c9b3e68ce596a68a2ed2fbf73f211b3334b0a6a8)
2018-02-07 11:29:21 -08:00
Ralph Castain
73ef976ead Silence warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-03 00:29:06 -08:00
Boris Karasev
52e81ee4b1 rmaps: fixed the ordering of mpirun target nodes
Fixed the desync of job-nodelists between mpirun and orted
daemons. The issue was observed when using RSH launching because user
can provide arbitrary order of nodes regarding HNP placement.
The mpirun process propagate the daemon's nodelist order to nodes.
The problem was that HNP itself is assembling the nodelist based on
user provided order. As the result ranks assignment was calculated
differently on orted and mpirun.

Consider following example:
* User launches mpirun on node cn2.
* Hostlist is cn1,cn2,cn3,cn4; ppn=1
* mpirun is passing hostlist cn[2:2,1,3-4]@0(4) to orteds
So as result mpirun will assing rank 0 on cn1 while orted will assign
rank 0 on cn2 (because orted sees cn2 as the first element in the node
list)

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-02-01 17:16:05 +02:00
Ralph Castain
8a7a57d4e2 Remove debug from rmaps base
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-20 00:22:51 -08:00
Ralph Castain
dcf389b6fa We now add all nodes to the job data object when we map, so don't do it twice
Fixes #4449

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-05 17:17:50 -08:00
Ralph Castain
d5ce3c38e1 Begin cleaning up debugger support
Debugger daemons do not count against available slots. Clean up some leftover errors from the upgrade to HWLOC 2 in the mappers. Properly flag debugger jobs that come in via PMIx.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-27 16:18:43 -05:00
Jeff Squyres
7cccee9d92 rmaps/base: remove debugging "DONE" message
Thanks for Ben Menadue for reporting and supplying the patch.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-09-19 07:10:00 -07:00
Ralph Castain
7a83fdb9bb Update to hwloc 2.0.0a with shmem support.
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 20:26:22 -07:00
Gilles Gouaillardet
60aa9cfcb6 hwloc: add support for hwloc v2 API
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:44 +09:00
Gilles Gouaillardet
9f29f3bff4 hwloc: since WHOLE_SYSTEM is no more used, remove useless
checks related to offline and disallowed elements

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:21 +09:00
Ralph Castain
7b39f19f60 Fix the backend mapper algorithm for comm_spawn. The front and back ends need to get the nodes into the job map in the same order so that the ranking algorithms will reach the same results
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-08 08:00:52 -07:00
Ralph Castain
93cf3c7203 Update OPAL and ORTE for thread safety
(I swear, if I look this over one more time, I'll puke)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 12:30:57 -07:00
Ralph Castain
87201a80ff Silence coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-27 11:45:53 -07:00
Ralph Castain
657e701c65 Add debug verbosity to the orte data server and pmix pub/lookup functions
Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out-of-date. Bring round_robin and PPR online, and modify the mindist component (but cannot test/debug it).

Remove unneeded test

Fix memory corruption by re-initializing variable to NULL in loop

Resolve the race condition identified by @ggouaillardet by resetting the
mapped flag within the same event where it was set. There is no need to
retain the flag beyond that point as it isn't used again.

Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them.

Note that this will maintain functionality, but means that users of those three mappers will see large launch messages and less performant scaling than those using the other mappers.

Have the mindist module add procs to the job's proc array as it is a fully described module

Protect the hnp-not-in-allocation case

Per path suggested by Gilles - protect the HNP node when it gets added in the absence of any other allocation or hostfile

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-25 18:41:27 -07:00
Ralph Castain
b527c40dae Remove debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 12:41:36 -07:00
Ralph Castain
23af6c9d02 Merge pull request #3519 from rhc54/topic/nolocal
Fix --nolocal
2017-05-12 09:57:52 -07:00
Ralph Castain
45bbd598c1 Fix --nolocal
Fix the --nolocal option by ensuring we always check/remove the HNP from the list of available nodes if the flag is set
Ensure that the HNP node is included as available when nothing else is given

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 09:03:26 -07:00
Ralph Castain
29e083bffd Fix total_slots_allocated computation
On unmanaged allocations, we need to update the total_slots_allocated once the daemons have been launched and "discovered" their topology

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 08:21:52 -07:00
Ralph Castain
180809f2ef Do not pass topologies during tree spawn of daemons as there is no way the HNP can know the backend topologies at that point. Any needed topologies will be sent along with the launch_apps command
Do not pass param file MCA params if the user has requested that no param files be read - required when trying to avoid launch time penalties from large numbers of processes reading default param files. The daemon picks them up and passes them along anyway, so it isn't clear what value we gain from having them all read the defaults

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-07 21:14:43 -07:00
Ralph Castain
bb1aaa3286 Use the node index to compare to daemon vpid when identifying procs to bind
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-14 02:37:25 -07:00
Ralph Castain
0500cc1c66 Update the debugger launch code to reflect the new backend mapping method.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-12 13:31:18 -07:00
Ralph Castain
61a71e25ef Ensure the backend daemons know if we are in a managed allocation and if
the HNP was included in the allocation

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 10:06:43 -07:00
Ralph Castain
105fb152e1 Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 08:38:51 -07:00
Howard Pritchard
f8183f71f7 rmaps/base: swat compiler warning
gcc was complaining about variables possibly used uninitialized

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-09 14:30:06 -06:00
Ralph Castain
48fc339718 Create an alternative mapping method that pushes responsibility
onto the backend daemons. By default, let mpirun only pack the app_context
info and send that to the backend daemons where the mapping will
be done. This significantly reduces the computational time on mpirun as it isn't
running up/down the topology tree computing thousands of binding
locations, and it reduces the launch message to a very small number of
bytes.

When running -novm, fall back to the old way of doing things
where mpirun computes the entire map and binding, and then sends
the full info to the backend daemon.

Add a new cmd line option/mca param --fwd-mpirun-port that allows
mpirun to dynamically select a port, but then passes that back to
all the other daemons so they will use that port as a static port
for their own wireup. In this mode, we no longer "phone home" directly
to mpirun, but instead use the static port to wireup at daemon
start. We then use the routing tree to rollup the initial
launch report, and limit the number of open sockets on mpirun's node.

Update ras simulator to track the new nidmap code

Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found.

Update gadget platform file

Initialize the range count when starting a new range

Fix the no-np case in managed allocation

Ensure DVM node usage gets cleaned up after each job

Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-07 20:43:12 -08:00
Ralph Castain
0ae873de5c Fix a bug where we failed to compute #procs for nperXXX directives, thus resulting in an incorrect default binding
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 16:32:24 -08:00
Ralph Castain
223495325d Fix binding policy bug and support pe=1 modifier
Allow someone to specify the "pe=N" modifier to a mapping policy when N=1. This equates to just "bind-to core", but helps people who use a script to set the PE policy. Fix a bug where setting the binding policy left a lingering "if-supported" flag that shouldn't be there.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 14:55:17 -08:00