Mike Dubman
8d4592a94b
rmaps/mindist: better error message
...
better error message when there is only one socket available
fixed by Elena, reviewed by Miked
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r30787.
2014-02-21 11:38:35 +00:00
Ralph Castain
91f90058ce
Add missing options and cleanup the code a bit. Default to by-slot ranking if a non-hardware option isn't given. Thanks to Tetsuya Mishima for the assist.
...
cmr=v1.7.5:reviewer=ompi-gk1.7
This commit was SVN r30725.
2014-02-14 10:23:16 +00:00
Ralph Castain
fd9b301a8b
Check equality instead of bit-mask - thanks to Tetsuya Mishima for reporting it
...
cmr=v1.7.5:reviewer=ompi-gk1.7
This commit was SVN r30722.
2014-02-14 02:34:42 +00:00
Ralph Castain
1473dde6ea
Okay, once again be caught by the blasted hwloc inability to cleanly handle caches. Protect the calls to get_depth by first checking to see if it is a "cache", then use a cache-specific function to get the stupid data. Very, very irritating.
...
cmr=v1.7.5:reviewer=jsquyres:subject=treat caches as something different yet again
This commit was SVN r30693.
2014-02-12 01:45:06 +00:00
Ralph Castain
b566cd5e30
Protect against no modifiers
...
Refs trac:4117
This commit was SVN r30672.
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-11 17:34:37 +00:00
Ralph Castain
6fa34407bf
Handle modifiers to the --map-by dist option
...
Refs trac:4117
This commit was SVN r30671.
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-11 17:19:05 +00:00
Ralph Castain
4781ea71b6
Correct the handling of various map/bind combinations when pe=N is given. Thanks to Elena Elkina for reporting it.
...
Refs trac:4117
This commit was SVN r30663.
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-11 03:05:26 +00:00
Ralph Castain
707e51d786
Check for --cpus-per-proc earlier, before the correct option can be processed. Thanks to Tetsuya Mishima for reporting it.
...
Refs trac:4117
This commit was SVN r30662.
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-11 02:53:53 +00:00
Ralph Castain
d66d2f5fb3
It is just fine to map by node or slot and bind, so ensure the switch statement includes those options. Thanks to Tatsuya Mishima for point it out.
...
Refs trac:4240
This commit was SVN r30661.
The following Trac tickets were found above:
Ticket 4240 --> https://svn.open-mpi.org/trac/ompi/ticket/4240
2014-02-11 02:52:01 +00:00
Ralph Castain
1a12325094
Rats - need to include bydist in the mapping list
...
Refs trac:4117
This commit was SVN r30649.
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-09 16:17:05 +00:00
Ralph Castain
ca0c806662
Resolve the problem of binding in inverted topologies - check the relative depth of the map and bind objects in the topology, and let that determine whether we bind downward or upwards.
...
cmr=v1.7.5:reviewer=jsquyres:subject=Resolve the problem of binding in inverted topologies
This commit was SVN r30643.
2014-02-09 05:30:17 +00:00
Ralph Castain
bc7cc09749
After a lot of pain, I've managed to resolve the problem of conflicting mapping directives caused by mismatched MCA params - i.e., where someone has one variant of an MCA param (e.g., rmaps_base_mapping_policy) in their default MCA param file, and then specifies another variant (e.g., --npernode) on the command line. I can't fully resolve the problem as there is no way to know precisely what the user meant - we can only guess which param was really intended since the MCA param system
...
can't apply its normal precedence rules.
So...print a big "deprecated" warning for the old params and error out if a conflict is detected. I know that isn't what people really wanted, but it's the best we
can do. If only the old style param is given, then process it after the warning.
Extend the current map-by param to add support for ppr and cpus-per-proc, adding the latter to the list of allowed modifiers using "pe=n" for processing elements/proc. Thus, you can map-by socket:pe=2,oversubscribe to map by socket, binding 2 processing elements/process, with oversubscription allowed. Or you can map-by ppr:2:socket:pe=4 to map two processes to every socket in the allocation, binding each process to 4 processing elements.
For those wondering, a processing element is defined as a hwthread if --use-hwthreads-as-cpus is given, or else as a core.
Refs trac:4117
This commit was SVN r30620.
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-07 21:25:40 +00:00
Ralph Castain
e43589ed84
Fix warning - thanks to Paul Hargrove for reporting it
...
cmr=v1.7.4:reviewer=ompi-gk1.7
This commit was SVN r30548.
2014-02-03 23:51:45 +00:00
Ralph Castain
410a3afa7b
Fix --without-hwloc operations - must default to map-by slot in that scenario
...
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r30474.
2014-01-29 16:54:05 +00:00
Ralph Castain
42eb0bbe1b
Fix --without-hwloc builds
...
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r30462.
2014-01-28 17:10:32 +00:00
Ralph Castain
84a0ab3a75
Ah @$#!$#% - missed one last help message that needs to be corrected.
...
cmr=v1.7.4:reviewer=jsquyres:subject=correct help message
This commit was SVN r30449.
2014-01-28 04:03:24 +00:00
Ralph Castain
941bfd4604
Final cleanup of cpus-per-proc for 1.7.4 - provide better checking for cpus-per-proc and mismatched mapping/binding directives, and provide error messages telling the user what to do to get it right.
...
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r30438.
2014-01-27 22:40:51 +00:00
Ralph Castain
886fee9367
Properly set num_procs when np is not given, but cpus-per-proc is used. Thanks to Tetsuya Mishima for pointing it out
...
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r30389.
2014-01-23 05:01:07 +00:00
Jeff Squyres
7768828d2d
Addendum to r30298: tweak the wording of the help messages a bit.
...
Refs trac:4117. Please use this commit rather than the patch attached to
the ticket; the patch had a few mistakes in the tweaked wording.
This commit was SVN r30362.
The following SVN revision numbers were found above:
r30298 --> open-mpi/ompi@58479399c3
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-01-22 12:17:14 +00:00
Ralph Castain
58479399c3
As per RFC and telecon, deprecate cmd line options and their corresponding MCA params for old-style mapping and binding directives
...
cmr=v1.7.5:reviewer=jsquyres:subject=deprecate old-style mapping and binding directives
This commit was SVN r30298.
2014-01-15 14:48:39 +00:00
Ralph Castain
fb9e427320
One last corner case - when encountering an overload condition (e.g., by comm_spawning more procs than we have cores) and we are using the default binding policy, do *not* bind the new procs to anything as this can cause major problems. Instead, let the spawn succeed since the user didn't specifically ask to be bound, and leave the new procs as unbound.
...
Refs trac:4077
This commit was SVN r30200.
The following Trac tickets were found above:
Ticket 4077 --> https://svn.open-mpi.org/trac/ompi/ticket/4077
2014-01-09 22:39:34 +00:00
Ralph Castain
24e990e747
Fix comm_spawn for oversubscribed systems by correctly computing the number of available slots
...
cmr=v1.7.4:reviewer=jsquyres:subject=Fix comm_spawn for oversubscribed systems
This commit was SVN r30197.
2014-01-09 20:33:48 +00:00
Ralph Castain
9fcb46d85a
Correctly detect and handle oversubscription for comm_spawn
...
cmr=v1.7.4:reviewer=jsquyres:subject=Correctly detect and handle oversubscription for comm_spawn
This commit was SVN r30186.
2014-01-09 18:27:51 +00:00
Ralph Castain
6e5fedeb04
Oops - add verbose output to inform that cannot default bind due to no cores detected
...
Refs trac:4074
This commit was SVN r30185.
The following Trac tickets were found above:
Ticket 4074 --> https://svn.open-mpi.org/trac/ompi/ticket/4074
2014-01-09 18:17:14 +00:00
Ralph Castain
7e4748a0f1
Handle the case of nodes that do not report cores, and thus our default binding policy will fail even though binding is supported by defaulting to not binding on those nodes.
...
Thanks to Paul Hargrove for reporting the problem on NetBSD.
cmr=v1.7.4:reviewer=jsquyres:subject=Handle the case of nodes that do not report cores
This commit was SVN r30180.
2014-01-09 16:27:58 +00:00
Ralph Castain
bf453a2575
Reference the correct variable...sigh
...
Refs trac:4059
This commit was SVN r30163.
The following Trac tickets were found above:
Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059
2014-01-08 22:36:39 +00:00
Ralph Castain
e724d0d12d
Ensure comm_spawn'd jobs get treated the same wrt setting default mapping directives
...
Refs trac:4059
This commit was SVN r30158.
The following Trac tickets were found above:
Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059
2014-01-08 15:16:22 +00:00
Ralph Castain
fb650aed0c
Fix how we transfer mapping directives to the job, ensuring that directives that can be given outside of a mapping policy (e.g., oversubscribe and no-use-local) are retained.
...
cmr=v1.7.4:reviewer=jsquyres:subject=Fix how we transfer mapping directives to the job
This commit was SVN r30155.
2014-01-08 04:25:43 +00:00
Brian Barrett
8b778903d8
Fix longstanding issue with our multi-project support. Rather than using
...
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Mike Dubman
40aadab85f
re-enable map-by dist
...
after last refactoring in rmaps, map-by dist:hca was disabled.
reverting it back
found/fixed by Elena, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30118.
2014-01-04 20:44:41 +00:00
Ralph Castain
d5a5caa7e0
Restore the bycore mpirun option for backward compatibility
...
Refs trac:4044
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r30103.
The following Trac tickets were found above:
Ticket 4044 --> https://svn.open-mpi.org/trac/ompi/ticket/4044
2014-01-02 04:16:43 +00:00
George Bosilca
38cbaeaa82
Try to impose a little bit of consistency on how we parse lists of
...
modules by enforcing the use of OPAL list accessors.
This commit was SVN r30045.
2013-12-21 23:23:33 +00:00
Ralph Castain
31248c0985
Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match.
...
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.
Refs trac:4003
This commit was SVN r30033.
The following Trac tickets were found above:
Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
2013-12-20 20:42:39 +00:00
Ralph Castain
55cd65b149
Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along.
...
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.
cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings
This commit was SVN r29978.
2013-12-19 16:31:45 +00:00
Ralph Castain
c5956e7b8c
Convert debug output to opal_output_verbose
...
Thanks to Tetsuya Mishima for reporting it
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29969.
2013-12-19 00:36:15 +00:00
Ralph Castain
ab4636c47b
Per email on devel list, change the default rank-by to slot unless map-by <obj> is specified, in which case use rank-by <obj>
...
Refs trac:3977
This commit was SVN r29945.
The following Trac tickets were found above:
Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-18 00:48:50 +00:00
Ralph Castain
53cd00fe16
By setting a default mapping/ranking/binding policy that wasn't "none", we introduced a problem for users of the Mac and any other machine where sockets aren't defined and/or binding is not supported. Fix that by checking to see if the user specified the failing policy - if not, then fall back to the old map/rank by slot and no binding.
...
Refs trac:3977
This commit was SVN r29933.
The following Trac tickets were found above:
Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-17 14:50:10 +00:00
Ralph Castain
8b6d117541
Per the OMPI devel conference that changed our default behaviors:
...
* default to bind-to core
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above)
Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs
cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values
This commit was SVN r29919.
2013-12-15 17:25:54 +00:00
Jeff Squyres
0ab48ad0d2
Fix some annoying flex warnings that have been there for years.
...
Many thanks to Tom Fogal for the initial patch.
cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings
This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Ralph Castain
0e81959aae
Cleanup mindist error messages - already patched in 1.7
...
This commit was SVN r29869.
2013-12-12 15:30:29 +00:00
Mike Dubman
c208b858e7
improve error messages in mindist
...
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r29846.
2013-12-09 06:34:38 +00:00
Ralph Castain
f2c49c6c19
Fix the map-by object mapper to handle cpus-per-proc by accounting for the request when computing the number of procs to put on each object. This ensures that the binding routine doesn't automatically overload the cores.
...
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29843.
2013-12-08 16:59:25 +00:00
Ralph Castain
7480beb7f0
Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it.
...
This isn't being used yet - just enabling Nathan to do what he needs.
***** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *****
This commit was SVN r29708.
2013-11-14 17:01:43 +00:00
Mike Dubman
840e2cb4a2
mindist: cosmetic, use fallback to byslot if unable to read NUMA info, small fix.
...
fixed by Elena, reviewed by Ralph/Mike
cmr=v1.7.4:reviewer=ompi-gk1.7
This commit was SVN r29679.
2013-11-13 09:26:40 +00:00
Ralph Castain
e35ad23176
Correctly compute usage for dynamic spawns when binding is invoked. Ensure we correctly account for existing process usage on each node when computing bindings during dynamic spawns.
...
cmr=v1.7.4:reviewer=hjelmn:subject=Correctly compute usage for dynamic spawns when binding is invoked
This commit was SVN r29649.
2013-11-10 00:38:01 +00:00
Joshua Ladd
d594ffbfc7
Backing out Elena's patch - abstraction violation
...
This commit was SVN r29645.
2013-11-08 13:12:07 +00:00
Joshua Ladd
da3e272fdd
Adds a check in the mindist mapper for whether or not the user asks for a specific device. This patch was submited by Elena Elkina and reviewed by Josh Ladd and should be added to
...
cmr=v1.7.4:reviewer=jladd
This commit was SVN r29644.
2013-11-08 04:28:53 +00:00
Ralph Castain
960a255e7f
Do some cleanup of the --without-hwloc build - no need to work on coprocessors since we can't detect them anyway, cleanup some unused variables in the ppr mapper
...
This commit was SVN r29476.
2013-10-23 01:45:21 +00:00
Jeff Squyres
758cd25fff
Move the MCA / MPI_T level of the LAMA component down to 5 (from 9).
...
This commit was SVN r29214.
2013-09-20 15:23:27 +00:00
Ralph Castain
d9f0505952
Fix the lama verbose outputs so they don't segfault if someone asks for verbose output, but isn't using lama
...
cmr:v1.7.3:reviewer=jsquyres
This commit was SVN r29108.
2013-09-03 17:55:35 +00:00