Ralph Castain
6a7fa79a09
Clean up debug output by converting to show_help; a little more work to clean up local vs remote ops when no preload is specified
...
This commit was SVN r20506.
2009-02-10 19:11:24 +00:00
Ralph Castain
b408cbd8c1
Crummy - get the make tarball correct! Earlier commit was from an intermediate state...
...
This commit was SVN r20504.
2009-02-10 18:33:32 +00:00
Ralph Castain
7216c5b104
Add a new test to demonstrate how to use slave spawn on hybrid machines. Add some of the orte test programs to the tarball to help diagnose user problems and provide examples
...
This commit was SVN r20503.
2009-02-10 18:28:58 +00:00
Ralph Castain
bfdd066dac
Correctly set the library and binary path for prefix
...
This commit was SVN r20502.
2009-02-10 18:27:52 +00:00
Ralph Castain
d1b5afd9ea
If we don't pre-position the binaries, correctly setup the ssh command to execute the bootproxy
...
This commit was SVN r20501.
2009-02-10 18:27:10 +00:00
Shiqing Fan
2f1461419c
Add a new feature for checking mca subdirectories, i.e. detecting if there is an exclude file list which indicates the files that shouldn't be added to the source list. By default, the CMake build system will simply add all source files in the required sub folders, without knowing which files have to be excluded. The first use of it is in plm/base/.windows.
...
Also clean up the nested variable names to make them readable.
This commit was SVN r20498.
2009-02-10 17:20:13 +00:00
Ralph Castain
4cdf91a8d4
Per the RFC, extend the current use of the ompi_proc_t flags field (without changing the field itself).
...
The prior ompi_proc_t structure had a uint8_t flag field in it, where only one
bit was used to flag that a proc was "local". In that context, "local" was
constrained to mean "local to this node".
This commit provides a greater degree of granularity on the term "local", to include tests
to see if the proc is on the same socket, PC board, node, switch, CU (computing
unit), and cluster.
Add #defines to designate which bits stand for which local condition. These
were added to the OPAL layer to avoid conflicting with the proposed movement of
the BTLs. To make them easier to use, a set of macros has been defined - e.g.,
OPAL_PROC_ON_LOCAL_SOCKET - that test the specific bit. These can be used in
the code base to clearly indicate which sense of locality is being considered.
All locations in the code base that looked at the current proc_t field have
been changed to use the new macros.
Also modify the orte_ess modules so that each returns a uint8_t (to match the
ompi_proc_t field) that contains a complete description of the locality of this
proc. Obviously, not all environments will be capable of providing such detailed
info. Thus, getting a "false" from a test for "on_local_socket" may simply
indicate a lack of knowledge.
This commit was SVN r20496.
2009-02-10 02:20:16 +00:00
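The locality flags described in r20496 above can be pictured with a small C sketch. This is an illustration only, not the code from that commit: the EX_ names and the bit values are hypothetical, and only the idea of one uint8_t with one bit per locality level, tested through macros in the style of OPAL_PROC_ON_LOCAL_SOCKET, comes from the commit message. The helper also assumes that sharing a node implies sharing the switch, CU, and cluster, which the message does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, one bit per locality level.
 * The real values in the OPAL layer may differ. */
#define EX_PROC_ON_CLUSTER 0x01u
#define EX_PROC_ON_CU      0x02u
#define EX_PROC_ON_SWITCH  0x04u
#define EX_PROC_ON_NODE    0x08u
#define EX_PROC_ON_BOARD   0x10u
#define EX_PROC_ON_SOCKET  0x20u

/* Test macros in the style of OPAL_PROC_ON_LOCAL_SOCKET (names hypothetical). */
#define EX_PROC_ON_LOCAL_CLUSTER(flags) (((flags) & EX_PROC_ON_CLUSTER) != 0)
#define EX_PROC_ON_LOCAL_NODE(flags)    (((flags) & EX_PROC_ON_NODE) != 0)
#define EX_PROC_ON_LOCAL_SOCKET(flags)  (((flags) & EX_PROC_ON_SOCKET) != 0)

/* Flags for a peer known to share our node when board and socket are unknown.
 * Assumption: same node implies same switch, CU, and cluster. */
static uint8_t ex_same_node_flags(void)
{
    return (uint8_t)(EX_PROC_ON_CLUSTER | EX_PROC_ON_CU |
                     EX_PROC_ON_SWITCH | EX_PROC_ON_NODE);
}

/* Usage: a cleared bit means "not known to be local", not "known remote". */
static void ex_locality_self_check(void)
{
    uint8_t flags = ex_same_node_flags();
    assert(EX_PROC_ON_LOCAL_NODE(flags));
    assert(EX_PROC_ON_LOCAL_CLUSTER(flags));
    assert(!EX_PROC_ON_LOCAL_SOCKET(flags));
}
```

As the commit message notes, a "false" from the socket test here may only reflect that the environment could not supply that level of detail.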
Ralph Castain
42df4b2102
Enable the slurmd plm module for testing - only selected if specified
...
This commit was SVN r20495.
2009-02-09 21:16:24 +00:00
Ralph Castain
26806c3fdd
Add new slave spawn test programs
...
This commit was SVN r20493.
2009-02-09 20:45:11 +00:00
Ralph Castain
f0af389910
Enable comm_spawn of slave processes, currently only active for the rsh, slurm, and tm environments. Establish support for local rsh environments in the plm/base so that rsh of local slaves can be done by any environment that supports it. Create new orte_rsh_agent param so users can specify rsh agent from outside of rsh plm, and sym link that to the old plm_rsh_agent and pls_rsh_agent options.
...
Modify the orte-bootproxy to pass prefix for the remote slave to support hetero/hybrid scenarios
This commit was SVN r20492.
2009-02-09 20:44:44 +00:00
Ralph Castain
631d7d2a85
Handle cases where daemon uri has quote marks around it
...
This commit was SVN r20491.
2009-02-09 20:40:17 +00:00
Ralph Castain
890eb9c0ce
Init variable
...
This commit was SVN r20490.
2009-02-09 20:39:48 +00:00
Ralph Castain
4286b7adb9
Deal with unknown return address for ompi-top option
...
This commit was SVN r20489.
2009-02-09 20:39:05 +00:00
Ralph Castain
cab5095ce8
Init variable
...
This commit was SVN r20488.
2009-02-09 20:38:15 +00:00
Ralph Castain
5bfd1f3fd0
Ensure we have a correct, non-zero exit status when daemons or procs abort or fail to launch
...
This commit was SVN r20478.
2009-02-07 00:57:17 +00:00
Ralph Castain
8924e00e4c
Ensure we don't segfault if we don't know which proc failed
...
This commit was SVN r20474.
2009-02-06 20:04:36 +00:00
Ralph Castain
0750103d6c
Teach the routed modules that local slave processes are direct routes to/from their master daemon.
...
This commit was SVN r20467.
2009-02-06 15:41:53 +00:00
Ralph Castain
13749673ed
Enable spawn of local slave processes - plm module implementation to follow
...
This commit was SVN r20466.
2009-02-06 15:31:33 +00:00
Ralph Castain
f8cd188367
Make the orte_pmap_t an object so it can be properly initialized. Adjust the construct function to properly indicate invalid node/local ranks
...
This commit was SVN r20465.
2009-02-06 15:29:33 +00:00
Ralph Castain
e2a8f45fba
Update the nidmap functions to include a new lookup_jmap entry, and to initialize the nidmap and pidmap for startup.
...
Have the singleton ess module use the new capability.
Adjust a comment in ess_base_put
This commit was SVN r20464.
2009-02-06 15:28:32 +00:00
Ralph Castain
c5b637418b
Ensure that the various grpcomm modules use a common data set and packing order for modex operations so that jobs using different grpcomm modules can still perform connect/accept.
...
Have dynamic grpcomm operations update the nidmap/pidmap to support additional features.
This commit was SVN r20463.
2009-02-06 15:25:06 +00:00
Ralph Castain
a6f9c1f2b1
Allocate the slots for use in the xgrid plm
...
This commit was SVN r20460.
2009-02-06 00:55:14 +00:00
Shiqing Fan
a20254c8a5
A few type casts, making the MS compiler silent.
...
This commit was SVN r20449.
2009-02-05 16:37:44 +00:00
Jeff Squyres
eaeed0402c
Can just use the built-in _SCRIPTS suffix.
...
This commit was SVN r20440.
2009-02-05 12:21:56 +00:00
Lenny Verkhovsky
5ae9fd9865
fixing r20436
...
This commit was SVN r20439.
The following SVN revision numbers were found above:
r20436 --> open-mpi/ompi@0d447511a5
2009-02-05 09:45:34 +00:00
Ralph Castain
6292b797e9
Add a new ESS module for use by local slave processes - only active when specifically selected
...
This commit was SVN r20438.
2009-02-05 06:07:48 +00:00
Ralph Castain
b7e6bafada
Add a new routed module for local slave processes to use - only active when specifically selected
...
This commit was SVN r20437.
2009-02-05 06:07:04 +00:00
Ralph Castain
0d447511a5
Add a shellscript for daemon-less launch of local slave processes. No manpage as this is totally for internal use only.
...
This commit was SVN r20436.
2009-02-05 06:05:28 +00:00
Jeff Squyres
a58d0d1a27
Fix CID 1219: ensure that "found" is initialized.
...
This commit was SVN r20434.
2009-02-05 01:57:20 +00:00
George Bosilca
4804ee60a7
It barely compiles ...
...
This commit was SVN r20433.
2009-02-05 00:14:28 +00:00
Ralph Castain
df3446faf1
Procs don't need to check for other job families to update routes - now that the direct routing module is gone, they always route through their daemons anyway, so save a couple of unnecessary steps.
...
This commit was SVN r20429.
2009-02-04 22:49:57 +00:00
Ralph Castain
dbba261451
Commit missing change in #define so r20427 doesn't break trunk
...
This commit was SVN r20428.
The following SVN revision numbers were found above:
r20427 --> open-mpi/ompi@b100513022
2009-02-04 22:37:24 +00:00
Ralph Castain
e694c0dac6
Get the various grpcomm modules to all inter-operate cleanly with the "hier" module
...
This commit was SVN r20426.
2009-02-04 22:26:35 +00:00
George Bosilca
c359762c2d
We're supposed to read a string and not an int ...
...
This commit was SVN r20421.
2009-02-04 15:51:31 +00:00
Ralph Castain
c534757b59
Correct use of the return code from opal_pointer_array_add
...
This commit was SVN r20417.
2009-02-04 14:02:51 +00:00
Ralph Castain
f36b9332ab
Pass along the new output-filename and xterm cmd line options to the orteds - otherwise, they won't work in ssh environments.
...
Modify the rsh launcher to add -X to ssh if xterm option was selected.
This commit was SVN r20407.
2009-02-03 20:06:05 +00:00
Ralph Castain
645f4c1f20
Silence compiler warnings about variables used before init
...
This commit was SVN r20406.
2009-02-03 20:04:27 +00:00
Ralph Castain
7282be4287
Silence compiler warnings about variables used before init
...
This commit was SVN r20405.
2009-02-03 20:04:01 +00:00
Ralph Castain
aa2abc8cac
Fix xgrid plm by changing orte_pointer_array calls to opal_pointer_array
...
This commit was SVN r20404.
2009-02-03 18:43:00 +00:00
Shiqing Fan
eab19af55c
Include the missing header that is used by the fix commit r20402, and use the correct reference for the parameter of the orte_odls_base_notify_iof_complete function call. Thanks Ralph for r20402.
...
This commit was SVN r20403.
The following SVN revision numbers were found above:
r20402 --> open-mpi/ompi@f1084d6b84
2009-02-03 18:14:43 +00:00
Ralph Castain
f1084d6b84
Under Windows, tell the orted that the proc has met its IOF termination conditions when launched since Windows does its own IO forwarding.
...
This commit was SVN r20402.
2009-02-03 16:41:07 +00:00
Ralph Castain
104a0539e3
Fix a format statement to be compatible with all gcc compiler versions
...
This commit was SVN r20400.
2009-02-02 15:47:07 +00:00
Ralph Castain
9d381a4ebf
Add a '!' option to the xterm iof option to invoke the -hold feature of xterm.
...
Correct the orte-show-help file for the case where a rank is out of bounds, and perform that test at a point where a wildcard isn't incorrectly flagged as out-of-bounds.
This commit was SVN r20398.
2009-02-02 15:06:23 +00:00
Ralph Castain
0597fdd778
Ensure that orte-iof barks when given an unrecognized cmd line option
...
This commit was SVN r20397.
2009-02-02 14:10:54 +00:00
Ralph Castain
b19dc2a4fa
Update mpirun's man page for report-pid and report-uri options
...
This commit was SVN r20396.
2009-02-02 13:49:07 +00:00
Ralph Castain
d207c17adf
Fix a segv when an application isn't found - ensure we properly terminate.
...
This commit was SVN r20395.
2009-02-02 13:44:08 +00:00
Ralph Castain
c3261e1a05
Fix optimized builds
...
This commit was SVN r20394.
2009-02-01 20:58:17 +00:00
Ralph Castain
debf128e53
Ensure the static port array is correctly checked for size
...
This commit was SVN r20393.
2009-01-31 03:46:42 +00:00
Ralph Castain
2966206f58
Fix a race condition in the IOF and add some new user-requested features:
...
1. fix a race condition whereby a proc's output could trigger an event prior to the other outputs being set up, thus causing the IOF to declare the proc "terminated" too early. This was really rare, but could happen.
2. add a new "timestamp-output" option that timestamps each line of output
3. add a new "output-filename" option that redirects each proc's output to a separate rank-named file.
4. add a new "xterm" option that redirects the output of the specified ranks to a separate xterm window.
This commit was SVN r20392.
2009-01-30 22:47:30 +00:00
Rolf vandeVaart
0704b98668
Add the ability to forward SIGTSTP (converted to SIGSTOP) and
...
SIGCONT to the a.outs. By default, they are not forwarded and
the behavior remains as it has always been. However, if one
runs with --mca orte_forward_job_control 1, then mpirun will
catch those two signals and forward them to the orteds which
will deliver them to the a.outs. We have had requests for
this feature.
This commit was SVN r20391.
2009-01-30 18:50:10 +00:00
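The job-control forwarding rule described in r20391 above can be summarized in a short C sketch. It is not the code from that commit: the function name and the forward_job_control argument are hypothetical stand-ins for the orte_forward_job_control setting. Only the mapping itself (SIGTSTP is delivered as SIGSTOP, SIGCONT is passed through, and nothing is forwarded by default) is taken from the commit message.

```c
#include <assert.h>
#include <signal.h>

/* Return the signal to deliver to the application processes,
 * or 0 when the caught signal should not be forwarded.
 * forward_job_control is a hypothetical stand-in for the
 * orte_forward_job_control MCA parameter (off by default). */
static int ex_forwarded_signal(int caught, int forward_job_control)
{
    if (!forward_job_control) {
        return 0;               /* default: behavior unchanged */
    }
    switch (caught) {
    case SIGTSTP:
        return SIGSTOP;         /* SIGTSTP is converted to SIGSTOP */
    case SIGCONT:
        return SIGCONT;         /* SIGCONT is forwarded as-is */
    default:
        return 0;               /* other signals are outside this sketch */
    }
}

/* Usage: check the mapping with forwarding enabled and disabled. */
static void ex_forwarding_self_check(void)
{
    assert(ex_forwarded_signal(SIGTSTP, 1) == SIGSTOP);
    assert(ex_forwarded_signal(SIGCONT, 1) == SIGCONT);
    assert(ex_forwarded_signal(SIGTSTP, 0) == 0);
}
```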