Ralph Castain
7d4f9970d8
Minor cleanup
2015-04-29 17:49:35 -07:00
Jeff Squyres
8fbf34b196
oob ud: put call to ibv_fork_init() before *all* ibv calls
...
Move the call to opal_common_verbs_fork_test() up before the call
to ibv_get_device_list() (just curious -- why not use
opal_ibv_get_device_list()?). This ensures that the call to
ibv_fork_init() comes before *all* other ibv_* calls.
2015-04-24 14:19:06 -07:00
Ralph Castain
9104e81958
When --map-by node, we should be unbound. Also remove dead code due to copy/paste error.
2015-04-23 20:35:54 -07:00
Ralph Castain
5003be5c5c
If the user specifies a --map-by <foo> option, then default to bind-to <foo> unless they specify a bind-to option. If they map-by slot/node, then use the default policy based on num_procs.
2015-04-23 13:30:21 -07:00
Ralph Castain
43229d056e
Protect one more place from a NULL object
2015-04-20 18:45:57 -07:00
Jeff Squyres
11e8c2096b
plm rsh: assign some levels to the rsh PLM MCA params
2015-04-20 16:18:57 -07:00
Nathan Hjelm
359a282e7d
ess/singleton: MCA variable synonyms can not currently have NULL for both framework and component
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-20 16:50:52 -06:00
Nathan Hjelm
45e053dbce
orte: use C99 subobject naming for component initialization
...
This commit helps future-proof orte components by initializing each
component member by name.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
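As an illustration of the C99 subobject naming mentioned in 45e053dbce, here is a minimal sketch of initializing a component struct by member name. The struct, its fields, and every example_* name are hypothetical and are not the real ORTE/MCA component types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component descriptor; NOT the real ORTE/MCA struct. */
typedef int (*example_hook_fn_t)(void);

typedef struct {
    const char *name;
    int major_version;
    int minor_version;
    example_hook_fn_t open;
    example_hook_fn_t close;
} example_component_t;

static int example_open(void)
{
    return 0;
}

/* Each member is initialized by name (C99 designated initializers).
 * Members that are not listed, here `close`, are zero-initialized,
 * so adding or reordering struct members later does not silently
 * shift values into the wrong field. */
static const example_component_t example_component = {
    .name = "example",
    .major_version = 2,
    .minor_version = 1,
    .open = example_open,
};

/* Accessor for the statically initialized component. */
const example_component_t *example_get_component(void)
{
    return &example_component;
}

/* Returns 1 if the component provides a close hook, 0 otherwise. */
int example_has_close(const example_component_t *c)
{
    assert(c != NULL);
    return c->close != NULL;
}
```

With positional initialization, inserting a new member in the middle of the struct would require touching every component; with named members only the components that care about the new field need to change.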
Ralph Castain
34b53ac3dc
Silence Coverity warnings
2015-04-18 07:48:22 -07:00
Ralph Castain
12bfb27161
Redo in cleaner form: Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command
2015-04-17 16:11:37 -07:00
Nadezhda Kogteva - nadezhda.kogteva@itseez.com
c2678b0cc9
oob ud: fixes and parameter adjustment
2015-04-17 16:22:43 +03:00
Nathan Hjelm
3436f2917d
Merge pull request #449 from hjelmn/mca_base_update
...
mca/base update
2015-04-16 08:41:48 -06:00
Ralph Castain
d9c555b547
Revert "Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command"
...
This reverts commit open-mpi/ompi@278324c52a .
Revert "Add the ability to pass args to the rsh/ssh command line"
This reverts commit open-mpi/ompi@6f227f8564 .
2015-04-16 08:03:14 -06:00
Ralph Castain
278324c52a
Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command
2015-04-15 20:30:04 -06:00
Ralph Castain
6f227f8564
Add the ability to pass args to the rsh/ssh command line
2015-04-15 20:07:13 -06:00
Howard Pritchard
283ef4c05d
oob/config: if --with-verbs=no, no ud
...
The oob/ud configure logic did not honor the case
where ompi is configured with --with-verbs=no.
This fixes that problem.
Fixes #522
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-14 06:31:18 -07:00
Ralph Castain
9c6d452d6b
If we are using HT cpus and have <= 2 procs, then map-by hwthread by default
2015-04-11 21:18:05 -07:00
Ralph Castain
cd686057f6
If the HNP is on a coprocessor, record it so we don't get an error log later
2015-04-11 15:30:15 -07:00
Ralph Castain
91e1cbf284
Init variable
2015-04-11 07:44:57 -07:00
Ralph Castain
033418f62a
Correct a typo that reversed the default binding pattern. Ensure we default bind to hwthread if user specified --use-hwthread-cpus if nprocs <= 2, and bind to hwthread if told to do so.
2015-04-10 15:58:35 -07:00
Ralph Castain
3e44d3c9e3
Enable singletons to run without any active OOB module until they attempt to comm_spawn
2015-04-10 14:06:42 -07:00
Ralph Castain
e4f6f83b9d
Attempt to silence new Coverity complaint by ensuring the string read from file is NUL-terminated.
2015-04-10 07:54:37 -07:00
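The kind of fix described in e4f6f83b9d can be sketched as follows: read at most one byte less than the buffer size and always write a terminator. This is a generic sketch and not the code from the commit; example_read_terminated is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Reads up to bufsize - 1 bytes from fp into buf and always terminates
 * buf with '\0', so the result is safe to pass to string functions.
 * Returns the number of bytes read (excluding the terminator). */
size_t example_read_terminated(FILE *fp, char *buf, size_t bufsize)
{
    assert(fp != NULL && buf != NULL && bufsize > 0);

    /* Reserve the last byte for the terminator. */
    size_t n = fread(buf, 1, bufsize - 1, fp);

    /* n <= bufsize - 1, so this index is always in bounds. */
    buf[n] = '\0';
    return n;
}
```

Static analyzers tend to flag buffers filled by fread() or read() and then used as strings, because neither call adds a terminator on its own.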
Ralph Castain
396700ad8b
Protect the notifier macros against NULL job objects
2015-04-09 16:04:43 -07:00
Nathan Hjelm
c416c423bb
ess/singleton: do not put component strings into the environment
...
putenv requires that any string put into the environment is not
changed or freed. That is not the case with constant strings as they
will go away when dlclose is called on the component. Instead, just
use opal_setenv which does not have this restriction.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-09 11:00:47 -06:00
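The difference described in c416c423bb can be sketched with the POSIX setenv() call, which copies its arguments, whereas putenv() keeps a pointer to the caller's string. POSIX setenv() is used here only as a stand-in; the signature and behavior of opal_setenv are not shown, and the example_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sets name=value through setenv(), which copies both strings into the
 * environment. Returns 0 on success, -1 on failure. */
int example_set_copy(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    return setenv(name, value, 1 /* overwrite */);
}

/* Shows that the caller's buffer can be modified (or go away) after the
 * call without affecting the environment. Returns 1 if the environment
 * still holds the original value, 0 otherwise. */
int example_copy_survives(void)
{
    char buf[] = "first";

    if (example_set_copy("EXAMPLE_VAR", buf) != 0) {
        return 0;
    }

    /* Overwrite the caller's buffer; the environment has its own copy. */
    memset(buf, 'x', sizeof(buf) - 1);

    const char *seen = getenv("EXAMPLE_VAR");
    return seen != NULL && strcmp(seen, "first") == 0;
}
```

Had putenv() been given a string that lives in a component's data segment, the environment entry would dangle once the component is unloaded with dlclose(), which is the hazard the commit message describes.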
Ralph Castain
0c043dbdc9
Fix typo in var name
2015-04-02 02:32:42 -07:00
Ralph Castain
a4b466efc4
Support attempts to connect async processes by allowing the oob/tcp connection to retry the attempt to connect to a peer. Off by default, operates if someone specifies how long to wait between retry attempts.
2015-04-01 20:21:23 -07:00
Ralph Castain
9f8ae59162
Properly enclose the different && clauses
2015-04-01 18:48:25 -07:00
Ralph Castain
57c21d5209
Ensure the DVM flows thru the "daemons reported" state
2015-04-01 16:47:34 -07:00
Mike Dubman
8914a9c070
Merge pull request #494 from elenash/modifiers
...
changed mindist mapping policy specifier
2015-04-01 16:31:46 +03:00
Elena
1e913c76c4
changed mindist mapping policy specifier from --map-by dist:device,modifiers to --map-by dist:modifiers -mca rmaps_dist_device device
2015-04-01 15:07:35 +03:00
Nadezhda Kogteva
2d49d9bd45
grpcomm rcd: remove unnecessary malloc warning for case when number of daemons == 1
2015-04-01 11:07:44 +03:00
Mike Dubman
58d002098b
Merge pull request #474 from elenash/master
...
Introduce -tune command line option to set env vars and mca params from ...
2015-04-01 08:23:34 +03:00
Ralph Castain
6f9140a341
Add a little more debug to launch
2015-03-31 20:10:21 -07:00
Ralph Castain
b209c9efa5
Move the "dvm ready" message to stdout so it is easier to trap
2015-03-30 20:12:56 -07:00
Ralph Castain
6d205a3c80
Ensure that singletons pick up the oob/tcp component
2015-03-30 18:10:08 -07:00
rhc54
bc016617a0
Merge pull request #501 from rhc54/topic/sec2
...
Support authentication across security domains
2015-03-30 09:59:43 -07:00
Ralph Castain
d07dc362d5
Ensure we can authenticate when crossing security domains by including all available credentials, and letting the receiver use the highest priority one they have in common.
2015-03-28 20:34:26 -07:00
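The selection rule described in d07dc362d5 (the sender includes all of its credentials, and the receiver uses the highest-priority one they have in common) can be sketched as below. This is an assumption-laden illustration and not the ORTE security framework code: example_pick_method is a hypothetical helper, and methods are represented as plain strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Picks the receiver's highest-priority method that the sender also
 * offered. receiver_methods must be ordered highest priority first.
 * Returns the index into receiver_methods, or -1 if there is no
 * method in common. */
int example_pick_method(const char *const *offered, size_t n_offered,
                        const char *const *receiver_methods,
                        size_t n_receiver)
{
    assert(offered != NULL || n_offered == 0);
    assert(receiver_methods != NULL || n_receiver == 0);

    /* Walk the receiver's list in priority order and stop at the first
     * method that also appears in the sender's offer. */
    for (size_t r = 0; r < n_receiver; r++) {
        for (size_t o = 0; o < n_offered; o++) {
            if (strcmp(receiver_methods[r], offered[o]) == 0) {
                return (int)r;
            }
        }
    }
    return -1;
}
```

For example, if the sender offers {"basic", "munge"} and the receiver prefers {"keystone", "munge", "basic"} in that order, the function returns index 1 ("munge"). The method names are used purely for illustration.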
Ralph Castain
b67b3619fc
If we are using the default bindings, and one or more nodes are not set up to support binding, then don't error out - just don't bind.
...
Thanks to Annu Desari for pointing out the problem.
2015-03-28 08:20:24 -07:00
Ralph Castain
d2d02a1642
ckpt
2015-03-28 07:59:20 -07:00
Nathan Hjelm
b68d66bb9b
MCA: Add the project/project version to the MCA base component
...
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.
All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Elena
90f5b2bb84
Introduce -tune command line option to set env vars and mca params from file
2015-03-26 18:33:53 +02:00
rhc54
2ff7575dde
Merge pull request #497 from rhc54/topic/sec
...
Allow for different security domains.
2015-03-25 21:01:29 -07:00
Ralph Castain
6aa33deafb
Remove debug
2015-03-25 19:58:51 -07:00
Ralph Castain
10cf455080
Tools need to use the TCP OOB component
2015-03-25 19:56:49 -07:00
Ralph Castain
1b24536941
Allow for different security domains. Let the initiator of the connection determine the method to be used - if the receiver cannot support it, then that's an error that will cause the connection attempt to fail.
2015-03-25 13:22:01 -07:00
Ralph Castain
6ba76ed8d8
Per user request, we allow -host to specify a host that is not included in a hostfile (however, we reject it if we were given an allocation by a resource manager). Since we cannot know if an IP addr form references the same node that was previously given as a string name, we have no choice but to assume they are different. Get the topology from the right place in that situation so mpirun can succeed.
2015-03-25 06:16:01 -07:00
rhc54
df24816d64
Merge pull request #488 from lrrajesh/master
...
Notification msg add severity to the message header.
2015-03-20 09:45:46 -07:00
Ralph Castain
095a8fa684
We don't need to know about non-fatal errors from setting socket options
2015-03-20 07:16:31 -07:00
Howard Pritchard
990e9b47e0
Merge pull request #486 from hppritcha/topic/issue_484
...
orte/oob: implement alps oob component
2015-03-19 19:40:40 -06:00
Ralph Castain
43a3baad5e
Ensure we use the first compute node's topology for mapping
...
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.
Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.
Correctly count the number of available PUs under each object when given a cpuset
Fix the default binding settings, and correctly count PUs when no cpuset is given
Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00