Ralph Castain
5ae42c816e
Attempt to reduce the RARP traffic during definition of allocations
2015-03-16 16:26:40 -07:00
Gilles Gouaillardet
d1b2f043ff
fix misc memory leaks
...
as already reported by Coverity with CIDs
71818, 71819, 72250, 715767, 1196749 and 1274002
2015-03-05 13:58:05 +09:00
Gilles Gouaillardet
42f5a36ee3
rmaps/seq: fix misc memory leaks
...
as reported by Coverity with CIDs 1269886 and 1269887
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
0c7a2846d1
rmaps/rank_file: fix misc memory leaks
...
as reported by Coverity with CIDs 72250 and 1196774
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
c15b919635
rmaps/lama: fix misc memory leaks
...
as reported by Coverity with CIDs 719263, 719264, 1196712 and 1269842
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
456baeb71b
rmaps/base: fix misc memory leaks
...
as reported by Coverity with CIDs 1196751, 1196754, 1196755 and 1269866
2015-03-02 15:31:11 +09:00
Jeff Squyres
398ae15533
rmaps_base_frame: remove dead code
...
This was CID 1196641
2015-02-24 15:24:11 -05:00
Howard Pritchard
bf89131f9e
add owner files to opal/ompi/orte mca directories
...
This commit adds an owner file in each of the component directories
for each framework. This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page. Currently there are two
"fields" in the file, an owner and a status. A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Ralph Castain
116fcaff2c
Start adding support for cmd line options to orte-submit
2015-02-10 12:13:21 -08:00
Ralph Castain
b314bfb5e9
If someone specifies the bitmap for hwthreads and wants hwthread cpus, then don't parse the slot list as it expects cores - just copy the provided bitmap across as it already has the required info
2014-12-19 10:56:14 -08:00
Ralph Castain
0630680f36
Two cleanups required for transfer to 1.8.4:
...
* Use %d format for the topo signature as some systems apparently have problems with %u
* Use correct variable in show_help message
2014-12-12 17:23:32 -08:00
Ralph Castain
b757b3f452
Ensure that the #nodes in the job map gets properly updated when using the sequential mapper. Provide some further diagnostic info to help understand the problem when encountered.
2014-12-08 08:03:53 -08:00
Ralph Castain
cb15cc06e1
Minor changes per Jeff's request on PR for 1.8.4
2014-12-02 19:54:10 -08:00
Ralph Castain
3f9d9ae8b6
Provide tighter LSF integration by correctly handling scenarios where the user has asked LSF to assign bindings. Fix a couple of typos in lex parser definitions. Tell hostfile parser to ignore binding designations in hostfiles. Add an attribute to indicate that cpusets were provided as physical cpu ids.
...
Once validated, a version of this will be backported to the v1.8.4 release.
2014-11-30 11:50:31 -08:00
Ralph Castain
2a90788724
Support physical processor ids in rankfile
2014-11-10 14:00:40 -08:00
Ralph Castain
ea11e63f59
Per patch from Tetsuya, allow the user to bind-to none when specifying multiple pe's/rank as requested by Reuti. This allows the user to reserve multiple "slots" in the allocation for each process while mapping, but not to bind the process to specific processing elements on the node.
...
Reviewed by rhc, so RM-approved to go across to v1.8.3
cmr=v1.8.3:reviewer=ompi-gk1.8
This commit was SVN r32701.
2014-09-10 15:52:18 +00:00
Ralph Castain
4207b4c4ad
Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded
...
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32687.
2014-09-08 21:03:51 +00:00
Ralph Castain
842aaf6167
Correctly end mapping oversubscribed nodes round-robin byslot
...
cmr=v1.8.3:reviewer=rhc
This commit was SVN r32616.
2014-08-27 16:15:18 +00:00
Ralph Castain
024572cb6c
Sigh - I promised to remove these deprecation warnings back in June. My apologies to Dave Goodell and others who requested it.
...
cmr=v1.8.2:reviewer=dgoodell:subject=remove deprecation warnings for pernode, npernode, and npersocket
This commit was SVN r32552.
2014-08-19 19:40:20 +00:00
Gilles Gouaillardet
c3c364a262
check-help-strings cleanup
...
This commit was SVN r32494.
2014-08-11 03:22:05 +00:00
George Bosilca
daa076995a
orte_rmaps_numa_node_t -> opal_rmaps_numa_node_t
...
This commit was SVN r32380.
2014-07-31 19:58:47 +00:00
Ralph Castain
5bb5b22573
When a user asks for cpus/rank > 1 and only has one slot, we need to ensure we always map at least one process when they don't tell us -np
...
cmr=v1.8.2:reviewer=rhc:subject=correct num_procs in corner case
This commit was SVN r32142.
2014-07-04 17:00:35 +00:00
Ralph Castain
149810f02c
Per request from Jeff, slightly modify the show_help message as the precise name of the NUMA-containing packages differs based on OS and distro
...
cmr=v1.8.2:reviewer=jsquyres:subject=modify show_help message
This commit was SVN r32122.
2014-07-02 14:46:00 +00:00
Ralph Castain
8fca77c3d3
Protect the binding policy setting so it builds when --without-hwloc
...
Refs trac:4742
This commit was SVN r32085.
The following Trac tickets were found above:
Ticket 4742 --> https://svn.open-mpi.org/trac/ompi/ticket/4742
2014-06-25 18:13:54 +00:00
Ralph Castain
5f6be06b54
Per request from Gilles and discussion at devel conference, have the --oversubscribe option automatically set both oversubscribe and overload-allowed properties as this is likely what the user intended.
...
cmr=v1.8.2:reviewer=rhc:subject=automatically set oversub/load
This commit was SVN r32072.
2014-06-24 18:11:39 +00:00
Ralph Castain
645df5e823
Don't release the node_name field as it gets used in the slots parsing - will be released at newline detection
...
This commit was SVN r32058.
2014-06-20 13:18:46 +00:00
Ralph Castain
9a47e45a09
<laugh> ensure we really compare the things we want to compare
...
This commit was SVN r32055.
2014-06-19 20:54:25 +00:00
Ralph Castain
e65538e91b
Add some defensive programming, fix a typo
...
This commit was SVN r32054.
2014-06-19 20:52:13 +00:00
Ralph Castain
b43f760f93
If you don't specify all the rank-file mapping for all procs, then you'll segfault - which is probably a bad idea. I can't see an easy workaround, so just error out for now and let's see if anyone really cares.
...
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r32053.
2014-06-19 20:30:06 +00:00
Ralph Castain
65275d6326
Add a little more info to the warning message - i.e., that the likely cause of the problem is missing libnumactl and/or libnumactl-devel
...
cmr=v1.8.2:reviewer=miked:subject=improve memory binding failure message
This commit was SVN r32030.
2014-06-18 19:20:28 +00:00
Ralph Castain
3f04d50cb0
Per the ticket, resolve our handling of overload conditions to provide a more consistent response. If we are overloaded (i.e., attempting to bind more processes to a location than the number of cpus under that location), then we consider the following conditions:
...
(a) default binding policy is in effect. In this case, we will emit a
warning and default to not binding unless the user provided the
"oversubscribe" or "overload" modifier to the "bind-to" option.
(b) user-specified binding policy is in effect. In this case, we will
error out unless the user provided the "oversubscribe" or "overload"
modifier to the "bind-to" option as we cannot meet the directive.
Either "bind-to" modifier (oversubscribe or overload) will be accepted for
now - in 1.9, we will deprecate the "overload" term in favor of
"oversubscribe".
Also added the ability to accept a --bind-to modifier without specifying the binding policy itself so a user can specify overload-allowed with the default policy.
Closes trac:4345
cmr=v1.8.2:reviewer=rhc:subject=resolve handling of overload conditions
This commit was SVN r32005.
The following Trac tickets were found above:
Ticket 4345 --> https://svn.open-mpi.org/trac/ompi/ticket/4345
2014-06-14 15:38:32 +00:00
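The overload rules described in the commit above (SVN r32005) can be sketched as a small decision function. This is only an illustration in Python, not the ORTE implementation: the names `Action` and `resolve_binding` and the parameters are hypothetical, and "overloaded" is taken to mean more processes than cpus under the binding location, as the commit message states.

```python
from enum import Enum


class Action(Enum):
    BIND = "bind"
    WARN_NO_BIND = "warn and do not bind"
    ERROR = "error out"


def resolve_binding(num_procs, num_cpus, policy_given, overload_allowed):
    """Hypothetical sketch of the overload rules in the commit message.

    policy_given     -- True if the user specified a binding policy
    overload_allowed -- True if the "oversubscribe" or "overload"
                        modifier was given to bind-to
    """
    overloaded = num_procs > num_cpus
    if not overloaded or overload_allowed:
        return Action.BIND
    if policy_given:
        # case (b): the user's directive cannot be met
        return Action.ERROR
    # case (a): default policy, so warn and fall back to not binding
    return Action.WARN_NO_BIND
```

For example, `resolve_binding(8, 4, policy_given=True, overload_allowed=False)` falls under case (b), while the same call with `policy_given=False` falls under case (a).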
Ralph Castain
56c3575c0e
Can't emit an error for an unrecognized mapping policy modifier as the ppr policy relies on not doing so.
...
This commit was SVN r31998.
2014-06-13 20:10:09 +00:00
Ralph Castain
3ed282bf44
Per patch from Tetsuya, correct the cpus-per-proc logic so we correctly detect when the user is attempting to bind too low for that option
...
Refs trac:4702
This commit was SVN r31988.
The following Trac tickets were found above:
Ticket 4702 --> https://svn.open-mpi.org/trac/ompi/ticket/4702
2014-06-13 16:32:52 +00:00
Ralph Castain
06dbfa3098
Make the cpus-per-proc equivalent a little more intuitive:
...
* allow users to specify just a modifier for map-by instead of requiring that they also specify a policy. Thus, we now accept --map-by :pe=3 as indicating that we should use the default mapping policy, but bind 3 cpus/proc.
* if users specify a pe's/proc but no policy, default to --map-by NUMA to ensure we have access to multiple cpus for the request. This won't guarantee we have access to enough to meet the request, but gives us a chance. In addition, we know that binding a proc to multiple cpus will work best if those cpus are all in the same NUMA, so this provides some degree of optimized behavior.
Per a request from Jeff, define "oversubscribe" for binding as a synonym for the "overload" modifier.
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31967.
2014-06-08 20:26:59 +00:00
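The modifier-only form of --map-by described in the commit above (SVN r31967) can be illustrated with a short parsing sketch. This is an assumption-laden Python illustration, not the actual parser: the function name `parse_map_by`, the comma-separated modifier list, and the fallback policy name "slot" are all hypothetical. Only the ":pe=N" form and the NUMA default come from the commit message.

```python
def parse_map_by(spec, default_policy="slot"):
    """Split a --map-by value into (policy, pes_per_proc).

    default_policy is a hypothetical placeholder for whatever the
    default mapping policy would otherwise be.
    """
    policy, _, mods = spec.partition(":")
    modifiers = {}
    for mod in filter(None, mods.split(",")):
        key, _, value = mod.partition("=")
        modifiers[key.lower()] = value
    pe = int(modifiers["pe"]) if "pe" in modifiers else None
    if not policy:
        # A pe count without a policy defaults to mapping by NUMA,
        # per the commit message; otherwise keep the default policy.
        policy = "numa" if pe else default_policy
    return policy.lower(), pe
```

Under these assumptions, `parse_map_by(":pe=3")` selects NUMA mapping with 3 cpus per process, and `parse_map_by("socket:pe=2")` keeps the user's socket policy.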
Ralph Castain
638c24f655
Correct the bind-in-place algorithm to better handle comm_spawn. If the location identified by the mapper is already occupied by procs from another job, then we need to shift either right or left until we find an unoccupied location where we can be bound. If nothing is available, then check for the overload flag (and bind us in the original location if provided), or see if this was the default binding policy instead of one specified by the user - if so, then just don't bind this process.
...
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31959.
2014-06-06 12:36:14 +00:00
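The bind-in-place search described in the commit above (SVN r31959) can be sketched as follows. This Python illustration makes several assumptions that the commit message does not state: locations are modeled as a flat list of booleans, the search alternates right then left at increasing distance, and a user-specified policy with no free location and no overload flag raises an error. The name `bind_in_place` is hypothetical.

```python
def bind_in_place(occupied, target, overload_allowed, policy_given):
    """Return the index to bind to, or None to leave the process unbound.

    occupied -- list of booleans, True where another job's procs
                already hold the location (hypothetical model)
    target   -- index chosen by the mapper
    """
    n = len(occupied)
    # Shift right or left from the mapped location until a free one is found.
    for dist in range(n):
        for idx in (target + dist, target - dist):
            if 0 <= idx < n and not occupied[idx]:
                return idx
    if overload_allowed:
        # Nothing free: bind in the original location anyway.
        return target
    if not policy_given:
        # Default binding policy: just don't bind this process.
        return None
    # Assumed behavior, not stated in the commit message.
    raise RuntimeError("no unoccupied location available for binding")
```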
Ralph Castain
f1978fba7c
Clean up a set of typos on the orte_get_attribute call
...
This commit was SVN r31942.
2014-06-03 20:36:38 +00:00
Ralph Castain
8736a1c138
Per RFC:
...
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php
Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has set up the system to support it (or you run as root).
This commit was SVN r31916.
2014-06-01 16:14:10 +00:00
Ralph Castain
f55c587a74
Per patch from Tetsuya Mishima, ensure the rank_file mapper accurately tracks number of nodes in the map
...
Refs trac:4594
This commit was SVN r31725.
The following Trac tickets were found above:
Ticket 4594 --> https://svn.open-mpi.org/trac/ompi/ticket/4594
2014-05-13 14:36:25 +00:00
Ralph Castain
5602156a1c
Use the correct abstraction layer name for the data dirs
...
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Ralph Castain
11faab1091
The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
...
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
6545e6e9a8
Add one more check for failed mapping that rarely occurs, but results in a hang when it does
...
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31598.
2014-05-02 10:35:14 +00:00
Ralph Castain
61d94fcee2
Fix the sequential mapper - it was out-of-sync with the hostfile changes, and we missed the "seq" policy when parsing the --map-by option. Thanks to Bill Chen for reporting it
...
cmr=v1.8.1:reviewer=jsquyres
This commit was SVN r31333.
2014-04-08 03:38:25 +00:00
Jeff Squyres
82e104719a
hwloc/rmaps base: Add missing help message.
...
Also, add missing ORTE_ERROR_LOG in the other case where this error
message is used (i.e., ORTE_ERROR_LOG was used in the one place, so
let's also use it in the other place).
This commit was SVN r31321.
2014-04-07 15:39:54 +00:00
Ralph Castain
3fdcaeab97
Fix a problem where we need to abort due to a mapping failure, but we are in a managed environment and thus the orteds have not wired up. Thus, if we send the exit message across the routed network, the remote daemons won't have a way to relay the message along - and we won't exit.
...
If we are aborting, then set the flags so the HNP directly sends an exit command to each daemon. Make it the halt_vm command so the remote daemon doesn't try to relay it, but instead just exits without waiting for its routed children to exit first.
cmr=v1.8.1:reviewer=jsquyres:subject=fix hangs due to abort prior to daemon wireup
This commit was SVN r31304.
2014-04-02 04:17:55 +00:00
Ralph Castain
714cb8f573
Silence warnings
...
cmr=v1.8:reviewer=rhc
This commit was SVN r31248.
2014-03-27 14:16:54 +00:00
Ralph Castain
390645ac2a
Per patch from Tetsuya Mishima, do a nicer job of warning the user that we need to map to a higher level to get the number of requested cpus/rank. Also, change the mapping policy to "byslot" when falling back to that option.
...
cmr=v1.8:reviewer=rhc
This commit was SVN r31196.
2014-03-24 15:47:29 +00:00
Ralph Castain
081669b440
When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it
...
cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings
This commit was SVN r30968.
2014-03-10 15:53:07 +00:00
Ralph Castain
fc2dd6ac48
Per Jeff's request, add a more detailed comment as to why we are turning off the warning at this time.
...
Refs trac:4339
This commit was SVN r30948.
The following Trac tickets were found above:
Ticket 4339 --> https://svn.open-mpi.org/trac/ompi/ticket/4339
2014-03-06 02:17:25 +00:00
Ralph Castain
a2b539c763
Per the telecon, silence the warning for 1.7.5 to give us time to consider a better permanent solution
...
Refs trac:4339
This commit was SVN r30941.
The following Trac tickets were found above:
Ticket 4339 --> https://svn.open-mpi.org/trac/ompi/ticket/4339
2014-03-05 03:02:29 +00:00
Ralph Castain
50c30d62ca
Repair builds without hwloc
...
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r30940.
2014-03-05 02:48:15 +00:00