Ralph Castain
d97bc29102
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 16:54:40 -07:00
Ralph Castain
ed93154e43
Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line.
2015-07-07 12:52:16 -07:00
Ralph Castain
7455802a36
Add a bunch of debug, and correct an error that caused us to use the wrong mapping policy when determining the default binding policy
2015-07-07 10:13:10 -07:00
Nathan Hjelm
ee36d813dc
Merge pull request #657 from hjelmn/c99
...
more c99 updates
2015-06-25 11:21:09 -06:00
Nathan Hjelm
4d92c9989e
more c99 updates
...
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Howard Pritchard
e49a37c034
ownership: update ownership files
...
per discussions at OMPI devel workshop
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-06-25 10:04:42 -06:00
Ralph Castain
869041f770
Purge whitespace from the repo
2015-06-23 20:59:57 -07:00
Ralph Castain
869b2891c4
When doing comm-spawn, track the last object we bound to and ensure that we start the next job on the next object so we avoid overload situations when they aren't necessary
2015-06-17 09:20:08 -07:00
Gilles Gouaillardet
b72e9288bc
rmaps: fix a misc memory leak
...
as reported by Coverity with CID 1269887
2015-06-17 11:17:55 +09:00
Ralph Castain
ff92781ec4
Replace hwloc191 with hwloc1110
...
Fix hwloc compile. Ignore LAMA mapper due to deprecated hwloc functions
2015-06-13 10:11:45 -07:00
Gilles Gouaillardet
2e384a3b65
initialize common symbols from orte
...
A few uninitialized common symbols are remaining (generated by flex) :
* orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_leng
* orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_text
* orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_leng
* orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_text
2015-05-08 10:11:58 +09:00
Ralph Castain
0bb73645f0
Silence Coverity warning
2015-04-30 20:49:28 -07:00
Ralph Castain
7d1980ba83
Add the ability to specify the number of desired slots in the --host option. Just giving a host name => one slot (multiple copies of the name yield one slot per copy). Giving "foo:3" indicates you want three slots - a shorthand notation for saying "foo" three times. Giving "foo:*" indicates you want the topology to set the number of slots based on the orte_set_slots param.
2015-04-30 20:35:23 -07:00
Ralph Castain
e26e7ad736
Better support automated tests for map, rank, and bind options
2015-04-30 14:01:13 -07:00
Ralph Castain
9104e81958
When --map-by node, we should be unbound. Also remove dead code due to copy/paste error.
2015-04-23 20:35:54 -07:00
Ralph Castain
5003be5c5c
If the user specifies a --map-by <foo> option, then default to bind-to <foo> unless they specify a bind-to option. If they map-by slot/node, then use the default policy based on num_procs.
2015-04-23 13:30:21 -07:00
Nathan Hjelm
45e053dbce
orte: use C99 subobject naming for component initialization
...
This commit helps future-proof orte components by initializing each
component member by name.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
Nathan Hjelm
3436f2917d
Merge pull request #449 from hjelmn/mca_base_update
...
mca/base update
2015-04-16 08:41:48 -06:00
Ralph Castain
9c6d452d6b
If we are using HT cpus and have <= 2 procs, then map-by hwthread by default
2015-04-11 21:18:05 -07:00
Ralph Castain
033418f62a
Correct a typo that reversed the default binding pattern. Ensure we default bind to hwthread if user specified --use-hwthread-cpus if nprocs <= 2, and bind to hwthread if told to do so.
2015-04-10 15:58:35 -07:00
Elena
1e913c76c4
changed mindist mapping policy specifier from map-bt dist:device,modifiers to --map-by dist:modifiers -mca rmaps_dist_device device
2015-04-01 15:07:35 +03:00
Ralph Castain
b67b3619fc
If we are using the default bindings, and one or more nodes are not setup to support binding, then don't error out - just don't bind.
...
Thanks to Annu Desari for pointing out the problem.
2015-03-28 08:20:24 -07:00
Nathan Hjelm
b68d66bb9b
MCA: Add the project/project version to the MCA base component
...
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.
All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Ralph Castain
43a3baad5e
Ensure we use the first compute node's topology for mapping
...
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.
Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.
Correctly count the number of available PUs under each object when given a cpuset
Fix the default binding settings, and correctly count PUs when no cpuset is given
Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00
Ralph Castain
5ae42c816e
Attempt to reduce the RARP traffic during definition of allocations
2015-03-16 16:26:40 -07:00
Gilles Gouaillardet
d1b2f043ff
fix misc memory leaks
...
as already reported by Coverity with CIDs
71818, 71819, 72250, 715767, 1196749 and 1274002
2015-03-05 13:58:05 +09:00
Gilles Gouaillardet
42f5a36ee3
rmaps/seq: fix misc memory leaks
...
as reported by Coverity with CIDs 1269886 and 1269887
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
0c7a2846d1
rmaps/rank_file: fix misc memory leaks
...
as reported by Coverity with CIDs 72250 and 1196774
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
c15b919635
rmaps/lama: fix misc memory leaks
...
as reported by Coverity with CIDs 719263, 719264, 1196712 and 1269842
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
456baeb71b
rmaps/base: fix misc memory leaks
...
as reported by Coverity with CIDs 1196751, 1196754, 1196755 and 1269866
2015-03-02 15:31:11 +09:00
Jeff Squyres
398ae15533
rmaps_base_frame: remove dead code
...
This was CID 1196641
2015-02-24 15:24:11 -05:00
Howard Pritchard
bf89131f9e
add owner files to opa/ompi/orte mca directories
...
This commit adds an owner file in each of the component directories
for each framework. This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page. Currently there are two
"fields" in the file, an owner and a status. A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Ralph Castain
116fcaff2c
Start adding support for cmd line options to orte-submit
2015-02-10 12:13:21 -08:00
Ralph Castain
b314bfb5e9
If someone specifies the bitmap for hwthreads and wants hwthread cpus, then don't parse the slot list as it expects cores - just copy the provided bitmap across as it already has the required info
2014-12-19 10:56:14 -08:00
Ralph Castain
0630680f36
Two cleanups required for transfer to 1.8.4:
...
* Use %d format for the topo signature as some systems apparently have problems with %u
* Use correct variable in show_help message
2014-12-12 17:23:32 -08:00
Ralph Castain
b757b3f452
Ensure that the #nodes in the job map gets properly updated when using the sequential mapper. Provide some further diagnostic info to help understand the problem when encountered.
2014-12-08 08:03:53 -08:00
Ralph Castain
cb15cc06e1
Minor changes per Jeff's request on PR for 1.8.4
2014-12-02 19:54:10 -08:00
Ralph Castain
3f9d9ae8b6
Provide tighter LSF integration by correctly handling scenarios where the user has asked LSF to assign bindings. Fix a couple of typos in lex parser definitions. Tell hostfile parser to ignore binding designations in hostfiles. Add an attribute to indicate that cpusets were provided as physical cpu ids.
...
Once validated, a version of this will be backported to the v1.8.4 release.
2014-11-30 11:50:31 -08:00
Ralph Castain
2a90788724
Support physical processor ids in rankfile
2014-11-10 14:00:40 -08:00
Ralph Castain
ea11e63f59
Per patch from Tetsuya, allow the user to bind-to none when specifying multiple pe's/rank as requested by Reuti. This allows the user to reserve multiple "slots" in the allocation for each process while mapping, but not to bind the process to specific processing elements on the node.
...
Reviewed by rhc, so RM-approved to go across to v1.8.3
cmr=v1.8.3:reviewer=ompi-gk1.8
This commit was SVN r32701.
2014-09-10 15:52:18 +00:00
Ralph Castain
4207b4c4ad
Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded
...
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32687.
2014-09-08 21:03:51 +00:00
Ralph Castain
842aaf6167
Correctly end mapping oversubscribed nodes round-robin byslot
...
cmr=v1.8.3:reviewer=rhc
This commit was SVN r32616.
2014-08-27 16:15:18 +00:00
Ralph Castain
024572cb6c
Sigh - I promised to remove these deprecation warnings back in June. My apologies to Dave Goodell and others who requested it.
...
cmr=v1.8.2:reviewer=dgoodell:subject=remove deprecation warnings for pernode, npernode, and npersocket
This commit was SVN r32552.
2014-08-19 19:40:20 +00:00
Gilles Gouaillardet
c3c364a262
check-help-strings cleanup
...
This commit was SVN r32494.
2014-08-11 03:22:05 +00:00
George Bosilca
daa076995a
orte_rmaps_numa_node_t -> opal_rmaps_numa_node_t
...
This commit was SVN r32380.
2014-07-31 19:58:47 +00:00
Ralph Castain
5bb5b22573
When a user asks for cpus/rank > 1 and only has one slot, we need to ensure we always map at least one process when they don't tell us -np
...
cmr=v1.8.2:reviewer=rhc:subject=correct num_procs in corner case
This commit was SVN r32142.
2014-07-04 17:00:35 +00:00
Ralph Castain
149810f02c
Per request from Jeff, slightly modify the show_help message as the precise name of the NUMA-containing packages differs based on OS and distro
...
cmr=v1.8.2:reviewer=jsquyres:subject=modify show_help message
This commit was SVN r32122.
2014-07-02 14:46:00 +00:00
Ralph Castain
8fca77c3d3
Protect the binding policy setting so it builds when --without-hwloc
...
Refs trac:4742
This commit was SVN r32085.
The following Trac tickets were found above:
Ticket 4742 --> https://svn.open-mpi.org/trac/ompi/ticket/4742
2014-06-25 18:13:54 +00:00
Ralph Castain
5f6be06b54
Per request from Gilles and discussion at devel conference, have the --oversubscribe option automatically set both oversubscribe and overload-allowed properties as this is likely what the user intended.
...
cmr=v1.8.2:reviewer=rhc:subject=automatically set oversub/load
This commit was SVN r32072.
2014-06-24 18:11:39 +00:00
Ralph Castain
645df5e823
Don't release the node_name field as it gets used in the slots parsing - will be released at newline detection
...
This commit was SVN r32058.
2014-06-20 13:18:46 +00:00