Ralph Castain
9d556e2f17
Allow daemons to use PMI to get their name where PMI support is available while using the standard grpcomm and other capabilities. Remove the GNI code from the alps ess component as that component should only be for alps/cnos installations.
...
This commit was SVN r25737.
2012-01-18 20:56:53 +00:00
Ralph Castain
6235a355de
Correctly handle co-spawning of daemons when attaching to a running job. We cannot use the general process mappers as we only want debugger daemons spawned on nodes where application procs already exist. So custom build the map for the debugger daemon job, and have the plm just launch that job without doing its usual vm-spawn step.
...
This commit was SVN r25736.
2012-01-18 00:19:49 +00:00
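A minimal sketch of the idea in r25736: build the debugger daemon map by hand so that daemons land only on nodes that already host application procs. The `AppNode` type and the `build_debugger_map` function are hypothetical names for illustration and are not taken from the ORTE sources.

```python
from dataclasses import dataclass, field


@dataclass
class AppNode:
    """Hypothetical stand-in for a node record in the running job."""
    name: str
    app_procs: list = field(default_factory=list)  # ranks already running here


def build_debugger_map(nodes):
    """Return the node names that should receive one debugger daemon.

    Nodes with no application procs are skipped, which is the reason a
    general-purpose mapper is not used for this job.
    """
    return [node.name for node in nodes if node.app_procs]


nodes = [AppNode("n0", [0, 1]), AppNode("n1"), AppNode("n2", [2])]
print(build_debugger_map(nodes))
```

The launcher would then start exactly this job, without first growing the set of daemons.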
Ralph Castain
11a37d3978
Fix the default
...
This commit was SVN r25733.
2012-01-17 21:09:27 +00:00
Ralph Castain
12d163293b
Yeah, I know it's the middle of the afternoon. I'm bound to forget and commit this in with something else if I don't. Per request from LANL, if PMI support is requested on an ALPS machine, add a couple of libs in the right ordering so that static builds will work correctly.
...
This commit was SVN r25732.
2012-01-17 20:41:50 +00:00
Ralph Castain
fd0d9f73c6
Make preload_binaries an MCA param so it can be set in the default MCA parameters for a system
...
This commit was SVN r25728.
2012-01-17 17:16:05 +00:00
Shiqing Fan
f57f873404
Disable the debugger support for Windows.
...
This commit was SVN r25725.
2012-01-17 16:21:33 +00:00
Nathan Hjelm
a2437feba7
removed debug message
...
This commit was SVN r25722.
2012-01-12 20:23:59 +00:00
Nathan Hjelm
5ab1674138
fixed de bruijn copyrights
...
This commit was SVN r25720.
2012-01-12 17:18:08 +00:00
Nathan Hjelm
c57f18999d
added Debruijn routed component
...
This commit was SVN r25717.
2012-01-12 17:11:03 +00:00
Ralph Castain
477582abef
Grrrr....fix ALL the cases where the membind warning occurs.
...
This commit was SVN r25715.
2012-01-11 23:51:18 +00:00
Ralph Castain
ce7ddd0e10
Create the debugger attach fifo unless the user requests that we periodically poll instead.
...
This commit was SVN r25714.
2012-01-11 19:44:22 +00:00
Ralph Castain
bf103de66c
My apologies for doing this outside of the usual time restrictions, but we need to get this in so we can make progress.
...
Move the ORTE-level debugger code back into orterun and out of the ORTE library to resolve symbol conflicts.
This commit was SVN r25713.
2012-01-11 15:53:09 +00:00
Ralph Castain
167ad944c4
Surprise, surprise - hwloc treats memory binding as at the thread, not process, level. Thus, hwloc always sets the membind proc-level support flag to false, and indicates actual memory binding support via the thread-level flag. So...just to be safe, test -both- flags and issue the "no support" warning ONLY if both are false.
...
This commit was SVN r25709.
2012-01-11 01:12:57 +00:00
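The check described in r25709 can be sketched as follows: warn only when neither the process-level nor the thread-level memory binding flag is set. The `MembindSupport` record and its field names are assumptions made for this illustration; they are not the hwloc structure itself.

```python
from dataclasses import dataclass


@dataclass
class MembindSupport:
    """Hypothetical summary of the two support flags discussed in the commit."""
    proc_level: bool
    thread_level: bool


def should_warn_no_membind(support):
    """Warn only if BOTH flags are false.

    Testing the process-level flag alone would warn on every system where
    binding is reported at the thread level only.
    """
    return not support.proc_level and not support.thread_level


print(should_warn_no_membind(MembindSupport(proc_level=False, thread_level=True)))
```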
Shiqing Fan
e3dfc49ced
make correct use of the newly updated structures in the Windows module.
...
This commit was SVN r25699.
2012-01-09 11:08:34 +00:00
Ralph Castain
840841bb8f
Missed a couple
...
This commit was SVN r25686.
2011-12-29 23:30:19 +00:00
Ralph Castain
af7fb68cfb
If we forward envars in rsh, then we have to be very careful about both duplicate entries and disallowed characters on the cmd line. To aid with detecting duplicates, make all cmd line options be given in their mca variant. Check anything we might add for semi-colons and protect those values with quotes.
...
This commit was SVN r25685.
2011-12-29 23:25:25 +00:00
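A rough sketch of the approach in r25685, assuming every option has already been normalized to a (name, value) pair in its mca form. Duplicates are detected by name, and values containing a semi-colon are wrapped in quotes. The function name and the sample parameter names are hypothetical.

```python
def build_mca_args(existing, forwarded):
    """Merge forwarded params into the cmd line without duplicates.

    existing  -- list of (name, value) pairs already on the cmd line
    forwarded -- dict of name -> value taken from the environment
    """
    merged = list(existing)
    seen = {name for name, _ in existing}
    for name, value in forwarded.items():
        if name in seen:
            continue  # the cmd line entry wins; do not add a duplicate
        if ";" in value:
            value = '"' + value + '"'  # protect the semi-colon from the remote shell
        merged.append((name, value))
        seen.add(name)
    argv = []
    for name, value in merged:
        argv += ["-mca", name, value]
    return argv


print(build_mca_args([("param_a", "1")], {"param_a": "2", "param_b": "x;y"}))
```

Normalizing to a single spelling first is what makes the duplicate check a simple set lookup.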
Jeff Squyres
a4c8bb27fa
Pull in the MPIR_Breakpoint symbol via a dummy function in
...
debuggers_base_fns.c: orte_debugger_base_pull_mpir_breakpoint().
This commit was SVN r25660.
2011-12-15 18:39:34 +00:00
Ralph Castain
2dd2694f25
Fix comm_spawn in oversubscribed conditions. If oversubscription is allowed, let nodes flow into the mapper even if they are oversubscribed, constrained by the slots_max absolute ceiling. Clean up error messages when comm_spawn fails so it correctly and succinctly reports the error.
...
This commit was SVN r25659.
2011-12-15 18:04:48 +00:00
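The node filtering rule in r25659 might look like the sketch below. The `SpawnNode` type is hypothetical, and treating `slots_max == 0` as "no hard ceiling" is an assumption of this example rather than something stated in the commit message.

```python
from dataclasses import dataclass


@dataclass
class SpawnNode:
    """Hypothetical node record used only for this illustration."""
    name: str
    slots: int
    slots_inuse: int
    slots_max: int = 0  # assumption: 0 means no absolute ceiling


def eligible_nodes(nodes, oversubscribe_allowed):
    """Return names of nodes that may be passed on to the mapper."""
    keep = []
    for node in nodes:
        if node.slots_max > 0 and node.slots_inuse >= node.slots_max:
            continue  # the absolute ceiling always applies
        if node.slots_inuse >= node.slots and not oversubscribe_allowed:
            continue  # full node, and oversubscription is not permitted
        keep.append(node.name)
    return keep


pool = [
    SpawnNode("a", slots=2, slots_inuse=1),
    SpawnNode("b", slots=2, slots_inuse=2),
    SpawnNode("c", slots=2, slots_inuse=4, slots_max=4),
]
print(eligible_nodes(pool, oversubscribe_allowed=True))
```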
Ralph Castain
437c52d2bf
Routing must be enabled by default
...
This commit was SVN r25657.
2011-12-15 17:13:52 +00:00
Ralph Castain
1adefcc176
When routing is not enabled, all routes must go direct
...
This commit was SVN r25656.
2011-12-15 15:32:09 +00:00
Ralph Castain
a309c53bf2
Set the lifeline when we are tree spawning under rsh so that the orted can self-terminate when its parent dies
...
This commit was SVN r25655.
2011-12-15 15:29:53 +00:00
Nathan Hjelm
9dec101043
fix totalview launch through --debug
...
This commit was SVN r25654.
2011-12-15 15:19:13 +00:00
Ralph Castain
e683b2f9c7
Minor touchup - reset the pointer to the end of the list each time to ensure we get the nodes in correct daemon order
...
This commit was SVN r25651.
2011-12-14 22:16:52 +00:00
Ralph Castain
912abe8a6c
Catch one more use-case
...
This commit was SVN r25649.
2011-12-14 21:03:19 +00:00
Ralph Castain
f531b09a8d
Correctly handle -host and -hostfile options. Ensure the initial vm launch constrains itself to the union of specified hosts if those options are given. Get oversubscribe set correctly for that case.
...
This commit was SVN r25648.
2011-12-14 20:01:15 +00:00
George Bosilca
ac26f58bd7
I guess this wasn't yet ready for prime time.
...
This commit was SVN r25624.
2011-12-12 23:55:11 +00:00
Nathan Hjelm
885d5cbcf8
enable ptmalloc when using uGNI
...
This commit was SVN r25621.
2011-12-12 20:52:51 +00:00
Nathan Hjelm
be11acf727
bug fix. don't add node to allocated_nodes twice
...
This commit was SVN r25619.
2011-12-12 19:14:41 +00:00
Ralph Castain
3f1ae5d89b
No longer need this include
...
This commit was SVN r25606.
2011-12-09 00:40:07 +00:00
Ralph Castain
44094cd5b3
Remove compiler warning
...
This commit was SVN r25601.
2011-12-08 16:35:41 +00:00
Samuel Gutierrez
0a922dcb3e
fixes XE6 build.
...
This commit was SVN r25600.
2011-12-08 16:13:58 +00:00
Samuel Gutierrez
0588e9ba36
add Cray XK6 support to ras alps. the configuration file is a different format and is in a different place.
...
This commit was SVN r25599.
2011-12-08 14:05:02 +00:00
Ralph Castain
7180ad40ad
Fix a couple of minor buglets
...
This commit was SVN r25589.
2011-12-07 21:08:35 +00:00
Ralph Castain
3e7ab1212a
Since this has come up a number of times, have the rsh launcher add MCA params from the environment by default. If it finds that the cmd line is too long, error out with a message directing the user to set a param to ignore the environmental MCA params.
...
This commit was SVN r25581.
2011-12-07 01:24:36 +00:00
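One way to picture the behaviour described in r25581, assuming MCA params appear in the environment with the `OMPI_MCA_` prefix. The function name, the `max_len` limit, and the `ignore_env` switch are hypothetical; the commit message does not name the actual parameter.

```python
def add_env_mca_params(argv, environ, max_len, ignore_env=False):
    """Append MCA params found in the environment to a launch cmd line.

    Raises ValueError if the resulting cmd line exceeds max_len characters.
    """
    if ignore_env:
        return list(argv)
    prefix = "OMPI_MCA_"
    out = list(argv)
    for key in sorted(environ):
        if key.startswith(prefix):
            out += ["-mca", key[len(prefix):], environ[key]]
    if len(" ".join(out)) > max_len:
        raise ValueError(
            "cmd line too long; set the parameter that ignores "
            "environmental MCA params"
        )
    return out


env = {"OMPI_MCA_btl": "tcp,self", "HOME": "/home/user"}
print(add_env_mca_params(["orted"], env, max_len=100))
```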
Ralph Castain
7510339725
Remove stale orte_vm_launch param. Add a param that allows users to specify envars to forward/set so they can do it in the MCA param file instead of only via mpirun cmd line.
...
This commit was SVN r25580.
2011-12-06 21:31:22 +00:00
Ralph Castain
15facc4ba6
Fix comm_spawn yet again...add another test
...
This commit was SVN r25579.
2011-12-06 20:15:40 +00:00
Ralph Castain
90b7f2a7bf
The rest of the multi app_context fix. Remove the restriction on number of app_contexts that can have zero np specified as multiple mappers now support that use-case. Update the ranking algorithms to respect and track bookmarks. Ensure we properly set the oversubscribed flag on a per-node basis.
...
This commit was SVN r25578.
2011-12-06 17:28:29 +00:00
Ralph Castain
d9c7764e9b
Remove some debug
...
This commit was SVN r25575.
2011-12-05 22:04:50 +00:00
Ralph Castain
df2f594aa8
Some cleanup associated with multiple app_contexts. Ensure nodes only get entered once into the map. Correctly handle bookmarks. Clean up tracking of slots_inuse and correct detection of oversubscription.
...
Still need to resolve the ranking issue so it starts at the bookmark, but that will come next.
This commit was SVN r25574.
2011-12-05 22:01:08 +00:00
Abhishek Kulkarni
0b7c51fae2
Correct an invalid reference to a missing help file.
...
This commit was SVN r25573.
2011-12-05 21:29:07 +00:00
Josh Hursey
b5ac320826
* If not able to checkpoint at this time (say because we are already checkpointing or restarting) then make sure to re-set the listener so that we can checkpoint later.
...
* Work around duplicate node names in the map. It should not happen normally, but if the rmaps component gets this wrong, provide a workaround. Ralph is working on an rmaps fix for this, so we will likely remove/comment out the fix later.
This commit was SVN r25572.
2011-12-05 19:29:26 +00:00
Josh Hursey
cc57840b53
Fix ess/tool so that it does not segv when using the rsh PLM. Just have it use the base function directly to avoid similar problems with finalizing other components.
...
This commit was SVN r25571.
2011-12-05 15:40:46 +00:00
Ralph Castain
6fefe236a4
Warn users if they set opal_paffinity_alone, either to true or false, that this parameter is no longer functional - they must use the --bind-to option and its corresponding mca param.
...
This commit was SVN r25567.
2011-12-03 01:10:52 +00:00
Ralph Castain
6cbd8fa6c9
Keep everyone in sync with new job state
...
This commit was SVN r25563.
2011-12-02 14:12:40 +00:00
Ralph Castain
07655e2945
Handle the case where the allocator "fibs" to us about the node names. In some cases (ahem...you know who you are!), the allocator will tell us a node number (e.g., "16"). However, the daemon will return a node name (e.g., "nid0016") - leaving us not recognizing its location.
...
So provide a new parameter (can't have too many!) that handles this situation by stripping the prefix from the returned node name. Also do a little cleanup to ensure we cleanly exit from errors, without generating too many annoying messages.
This commit was SVN r25562.
2011-12-02 14:10:08 +00:00
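A small sketch of the prefix stripping described in r25562, using the commit's own example of "nid0016" versus "16". Dropping leading zeros after the prefix is an assumption made here so that the two forms compare equal; the function name is hypothetical.

```python
def strip_node_prefix(name):
    """Reduce a daemon-reported name such as "nid0016" to "16"."""
    i = 0
    while i < len(name) and not name[i].isdigit():
        i += 1  # skip the non-numeric prefix
    digits = name[i:]
    if not digits:
        return name  # no numeric part, leave the name untouched
    return digits.lstrip("0") or "0"  # assumption: leading zeros are padding


print(strip_node_prefix("nid0016"))
```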
Jeff Squyres
ecf6ba910c
Silence a few icc warnings about mixing enums with other types.
...
This commit was SVN r25560.
2011-12-02 13:18:54 +00:00
Ralph Castain
641e17f26c
A better way of handling fqdn allocations. Prior method was wrong as it equated "node1" with "node10", which definitely caused problems.
...
Detect the addition of fqdn nodes in the allocation. If not found, then strip all incoming hostnames from daemons of any domain info when matching those names against the names in the node pool.
Leave some protection and "live" diagnostic output in place so we can continue to detect problems across all environments.
This commit was SVN r25557.
2011-12-01 14:24:43 +00:00
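The matching rule in r25557 can be illustrated as below: compare whole names rather than prefixes, and strip domain info from the daemon's hostname only when the allocation contains no fqdn entries. The function name is hypothetical, and detecting an fqdn by the presence of a dot is a simplification that does not handle raw IP addresses.

```python
def match_node(daemon_host, pool):
    """Return the pool entry matching a daemon's hostname, or None."""
    # Simplification: any dot in a pool name marks the allocation as fqdn-based.
    has_fqdn = any("." in name for name in pool)
    candidate = daemon_host if has_fqdn else daemon_host.split(".", 1)[0]
    # Exact comparison, so "node1" can never be mistaken for "node10".
    return candidate if candidate in pool else None


print(match_node("node10.example.com", ["node1", "node10"]))
```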
Ralph Castain
512aea79bc
Print the right nodename value, fix the strange case
...
This commit was SVN r25556.
2011-12-01 02:31:56 +00:00
Ralph Castain
44394c6b34
Add a little more protection
...
This commit was SVN r25555.
2011-12-01 00:30:56 +00:00
Ralph Castain
c4ea7a252a
Add a little protection against badly formed node names so we don't segfault if they are encountered
...
This commit was SVN r25554.
2011-11-30 23:33:59 +00:00