Ralph Castain
0c043dbdc9
Fix typo in var name
2015-04-02 02:32:42 -07:00
Ralph Castain
a4b466efc4
Support attempts to connect async processes by allowing the oob/tcp connection to retry the attempt to connect to a peer. Off by default, operates if someone specifies how long to wait between retry attempts.
2015-04-01 20:21:23 -07:00
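A minimal sketch of the opt-in retry behavior described in a4b466efc4, written in Python rather than the ORTE C code. All names here (`connect_with_retry`, `retry_delay`, `max_retries`, the injectable `connect` callable) are hypothetical, and the cap on retries is an assumption the commit message does not state. It only illustrates "off by default, active once a delay between attempts is specified":

```python
import socket
import time


def connect_with_retry(address, retry_delay=None, max_retries=10,
                       connect=socket.create_connection):
    """Connect to a peer, optionally retrying after a failure.

    retry_delay=None (the default) disables retries, so the first
    failure is raised immediately. max_retries is an assumed cap and
    is not taken from the commit message.
    """
    retries = 0
    while True:
        try:
            return connect(address)
        except OSError:
            # Retrying is opt-in: give up at once unless a delay was given.
            if retry_delay is None or retries >= max_retries:
                raise
            retries += 1
            time.sleep(retry_delay)
```

Passing `connect` as a parameter keeps the retry policy separate from the transport, which also makes it easy to exercise without a live peer.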
Ralph Castain
9f8ae59162
Properly enclose the different && clauses
2015-04-01 18:48:25 -07:00
Ralph Castain
57c21d5209
Ensure the DVM flows thru the "daemons reported" state
2015-04-01 16:47:34 -07:00
Jeff Squyres
99754afd25
orterun.c: re-justify the output message text
...
The type-A personality / English lit major in me compels me to
re-justify the text. :-)
2015-04-01 10:57:23 -07:00
Mike Dubman
8914a9c070
Merge pull request #494 from elenash/modifiers
...
changed mindist mapping policy specifier
2015-04-01 16:31:46 +03:00
Elena
1e913c76c4
changed mindist mapping policy specifier from --map-by dist:device,modifiers to --map-by dist:modifiers -mca rmaps_dist_device device
2015-04-01 15:07:35 +03:00
Nadezhda Kogteva
2d49d9bd45
grpcomm rcd: remove unnecessary malloc warning for case when number of daemons == 1
2015-04-01 11:07:44 +03:00
Mike Dubman
58d002098b
Merge pull request #474 from elenash/master
...
Introduce -tune command line option to set env vars and mca params from ...
2015-04-01 08:23:34 +03:00
Ralph Castain
b468f6a503
Okay, Jeff - use opal_setenv
2015-03-31 20:34:02 -07:00
Ralph Castain
6f9140a341
Add a little more debug to launch
2015-03-31 20:10:21 -07:00
Ralph Castain
e5d96417e7
Update warnings for run-as-root
2015-03-31 17:55:28 -07:00
Ralph Castain
41dd65d6cd
Per Jeff's request, tone down the comments and "standardize" the warning
2015-03-31 17:54:54 -07:00
Ralph Castain
f04eb6a9c0
Extend the root-user protection to some more ORTE tools
2015-03-31 10:34:35 -07:00
Ralph Castain
f863147b05
Per the telecon and chat with Jeff, let root only do the version option without warning. Otherwise, require that the user specifically indicate allow-use-as-root
2015-03-31 10:34:35 -07:00
Ralph Castain
b209c9efa5
Move the "dvm ready" message to stdout so it is easier to trap
2015-03-30 20:12:56 -07:00
Ralph Castain
6d205a3c80
Ensure that singletons pick up the oob/tcp component
2015-03-30 18:10:08 -07:00
Ralph Castain
2fa56fb329
Ensure that orte-submit picks the correct ess module as it is -never- allowed to be used as a distributed tool
...
Thanks to Mark Santcroos for diagnosing this one.
2015-03-30 18:08:34 -07:00
rhc54
bc016617a0
Merge pull request #501 from rhc54/topic/sec2
...
Support authentication across security domains
2015-03-30 09:59:43 -07:00
Nadezhda Kogteva
a828eada98
sm dstore: set pmix segment size to proper value
2015-03-30 13:34:25 +03:00
Ralph Castain
d07dc362d5
Ensure we can authenticate when crossing security domains by including all available credentials, and letting the receiver use the highest priority one they have in common.
2015-03-28 20:34:26 -07:00
Ralph Castain
b67b3619fc
If we are using the default bindings, and one or more nodes are not set up to support binding, then don't error out - just don't bind.
...
Thanks to Annu Desari for pointing out the problem.
2015-03-28 08:20:24 -07:00
Ralph Castain
2f365720b0
Allow root to request the version and help from mpirun without having to override the run-as-root protection.
...
Thanks to Robert McLay for pointing this out
2015-03-28 08:17:44 -07:00
Ralph Castain
d2d02a1642
ckpt
2015-03-28 07:59:20 -07:00
Elena
90f5b2bb84
Introduce -tune command line option to set env vars and mca params from file
2015-03-26 18:33:53 +02:00
rhc54
2ff7575dde
Merge pull request #497 from rhc54/topic/sec
...
Allow for different security domains.
2015-03-25 21:01:29 -07:00
Ralph Castain
6aa33deafb
Remove debug
2015-03-25 19:58:51 -07:00
Ralph Castain
10cf455080
Tools need to use the TCP OOB component
2015-03-25 19:56:49 -07:00
Ralph Castain
1b24536941
Allow for different security domains. Let the initiator of the connection determine the method to be used - if the receiver cannot support it, then that's an error that will cause the connection attempt to fail.
2015-03-25 13:22:01 -07:00
Ralph Castain
6ba76ed8d8
Per user request, we allow -host to specify a host that is not included in a hostfile (however, we reject it if we were given an allocation by a resource manager). Since we cannot know if an IP addr form references the same node that was previously given as a string name, we have no choice but to assume they are different. Get the topology from the right place in that situation so mpirun can succeed.
2015-03-25 06:16:01 -07:00
rhc54
df24816d64
Merge pull request #488 from lrrajesh/master
...
Notification msg add severity to the message header.
2015-03-20 09:45:46 -07:00
Ralph Castain
095a8fa684
We don't need to know about non-fatal errors from setting socket options
2015-03-20 07:16:31 -07:00
Ralph Castain
a013f3059f
For scalability reasons, and to make life easier for the poor Cray-ites, don't bang on the system for the username - we'll just use the uid.
2015-03-19 21:24:13 -07:00
Howard Pritchard
990e9b47e0
Merge pull request #486 from hppritcha/topic/issue_484
...
orte/oob: implement alps oob component
2015-03-19 19:40:40 -06:00
Ralph Castain
43a3baad5e
Ensure we use the first compute node's topology for mapping
...
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.
Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.
Correctly count the number of available PUs under each object when given a cpuset
Fix the default binding settings, and correctly count PUs when no cpuset is given
Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00
Howard Pritchard
6054975913
oob/alps: add configure file for alps oob
...
Have to have alps rpms installed on a system
for alps component to build, even if separated
by a level of indirection.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-03-19 15:38:14 -07:00
Howard Pritchard
b1f31a4364
orte/oob: implement alps oob component
...
Implement an almost-do-nothing alps oob component.
When using aprun to launch a job on Cray system,
there is no reason to need an oob system, since ompi
relies on Cray PMI for oob communication.
Fixes #484
2015-03-19 14:11:40 -07:00
lrrajesh
4dc75687e2
Notification msg add severity to the output
2015-03-18 13:55:03 -07:00
Nadezhda Kogteva
7c25b4cea6
grpcomm: fixed brks and rcd algorithms - added enough space for masks so that they work at large scale.
2015-03-18 14:33:04 +02:00
Ralph Castain
50277fec76
Adjust MCA param
2015-03-17 19:46:31 -07:00
rhc54
b41d2ad6c4
Merge pull request #481 from rhc54/topic/slurm
...
Add new MCA parameter to support edge case with debugger at LLNL
2015-03-17 07:40:55 -07:00
Ralph Castain
b01e8c1063
Include the FQDN version and non-stripped version of the hostname in our list of aliases as these (plus localhost) are the most common aliases we see.
2015-03-17 06:26:26 -07:00
Ralph Castain
d7d8ae46ed
We no longer pass the RML URI for procs launched via mpirun as the daemon has no need for that info.
2015-03-17 06:10:20 -07:00
Ralph Castain
3e32c360c7
Add new MCA parameter to support edge case with debugger at LLNL
2015-03-16 20:04:05 -07:00
Ralph Castain
a0487e014c
Further reduce the RARP load by removing getaddrinfo for IPv6 connections. Correct typo when checking return on inet_pton. Don't consider the TCP component for apps that are launched via mpirun as it will never be used.
2015-03-16 19:42:05 -07:00
Ralph Castain
5ae42c816e
Attempt to reduce the RARP traffic during definition of allocations
2015-03-16 16:26:40 -07:00
Ralph Castain
64d11f170a
Adjust the default keepalive interval. Refactor the code when setting keepalive options
2015-03-16 12:32:58 -07:00
Ralph Castain
4ded049cbc
Modify MCA param description
2015-03-16 11:57:32 -07:00
Ralph Castain
019bba5caf
Cleanup a bit - don't need to lookup the protocol number if we just use the right define
2015-03-16 11:54:51 -07:00
Ralph Castain
69ac25bf55
Add support for TCP keepalive on inter-node sockets
2015-03-16 09:59:44 -07:00
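For readers unfamiliar with what 69ac25bf55 and the follow-up keepalive commits touch, here is a rough Python sketch of enabling TCP keepalive on a socket. It is not the oob/tcp implementation: the function name and the timer values are illustrative assumptions, and the `TCP_KEEP*` options are platform-specific, so each is applied only where the `socket` module exposes it. Failures while tuning the timers are treated as non-fatal, in the spirit of 095a8fa684.

```python
import socket


def enable_keepalive(sock, idle_s=300, interval_s=20, max_probes=9):
    """Turn on TCP keepalive and tune its timers where supported.

    The default values are placeholders, not Open MPI's defaults.
    """
    # SO_KEEPALIVE itself is portable across common platforms.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Timer tuning is optional: skip options the platform lacks and
    # ignore non-fatal errors from setting them.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", max_probes),    # failed probes before giving up
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            pass
    return sock
```

Using the `IPPROTO_TCP` constant directly, rather than looking up the protocol number at run time, mirrors the cleanup noted in 019bba5caf.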