George Bosilca
4804ee60a7
It barely compiles ...
...
This commit was SVN r20433.
2009-02-05 00:14:28 +00:00
Ralph Castain
df3446faf1
Procs don't need to check for other job families to update routes - now that the direct routing module is gone, they always route through their daemons anyway, so save a couple of unnecessary steps.
...
This commit was SVN r20429.
2009-02-04 22:49:57 +00:00
Ralph Castain
dbba261451
Commit missing change in #define so r20427 doesn't break trunk
...
This commit was SVN r20428.
The following SVN revision numbers were found above:
r20427 --> open-mpi/ompi@b100513022
2009-02-04 22:37:24 +00:00
Ralph Castain
e694c0dac6
Get the various grpcomm modules to all inter-operate cleanly with the "hier" module
...
This commit was SVN r20426.
2009-02-04 22:26:35 +00:00
George Bosilca
c359762c2d
We're supposed to read a string and not an int ...
...
This commit was SVN r20421.
2009-02-04 15:51:31 +00:00
Ralph Castain
c534757b59
Correct use of the return code from opal_pointer_array_add
...
This commit was SVN r20417.
2009-02-04 14:02:51 +00:00
Ralph Castain
f36b9332ab
Pass along the new output-filename and xterm cmd line options to the orteds - otherwise, they won't work in ssh environments.
...
Modify the rsh launcher to add -X to ssh if xterm option was selected.
This commit was SVN r20407.
2009-02-03 20:06:05 +00:00
Ralph Castain
645f4c1f20
Silence compiler warnings about variables used before init
...
This commit was SVN r20406.
2009-02-03 20:04:27 +00:00
Ralph Castain
7282be4287
Silence compiler warnings about variables used before init
...
This commit was SVN r20405.
2009-02-03 20:04:01 +00:00
Ralph Castain
aa2abc8cac
Fix xgrid plm by changing orte_pointer_array calls to opal_pointer_array
...
This commit was SVN r20404.
2009-02-03 18:43:00 +00:00
Shiqing Fan
eab19af55c
Include the missing header that is used by the fix in commit r20402, and use the correct reference for the parameter of the orte_odls_base_notify_iof_complete function call. Thanks to Ralph for r20402.
...
This commit was SVN r20403.
The following SVN revision numbers were found above:
r20402 --> open-mpi/ompi@f1084d6b84
2009-02-03 18:14:43 +00:00
Ralph Castain
f1084d6b84
Under Windows, tell the orted that the proc has met its IOF termination conditions when launched since Windows does its own IO forwarding.
...
This commit was SVN r20402.
2009-02-03 16:41:07 +00:00
Ralph Castain
104a0539e3
Fix a format statement to be compatible with all gcc compiler versions
...
This commit was SVN r20400.
2009-02-02 15:47:07 +00:00
Ralph Castain
9d381a4ebf
Add a '!' option to the xterm iof option to invoke the -hold feature of xterm.
...
Correct the orte-show-help file when a rank is out of bounds, and do that test where a wildcard doesn't get incorrectly flagged as out-of-bounds.
This commit was SVN r20398.
2009-02-02 15:06:23 +00:00
Ralph Castain
0597fdd778
Ensure that orte-iof barks when given an unrecognized cmd line option
...
This commit was SVN r20397.
2009-02-02 14:10:54 +00:00
Ralph Castain
b19dc2a4fa
Update mpirun's man page for report-pid and report-uri options
...
This commit was SVN r20396.
2009-02-02 13:49:07 +00:00
Ralph Castain
d207c17adf
Fix a segv when an application isn't found - ensure we properly terminate.
...
This commit was SVN r20395.
2009-02-02 13:44:08 +00:00
Ralph Castain
c3261e1a05
Fix optimized builds
...
This commit was SVN r20394.
2009-02-01 20:58:17 +00:00
Ralph Castain
debf128e53
Ensure the static port array is correctly checked for size
...
This commit was SVN r20393.
2009-01-31 03:46:42 +00:00
Ralph Castain
2966206f58
Fix a race condition in the IOF and add some new user-requested features:
...
1. fix a race condition whereby a proc's output could trigger an event prior to the other outputs being set up, thus causing the IOF to declare the proc "terminated" too early. This was really rare, but could happen.
2. add a new "timestamp-output" option that timestamps each line of output
3. add a new "output-filename" option that redirects each proc's output to a separate rank-named file.
4. add a new "xterm" option that redirects the output of the specified ranks to a separate xterm window.
This commit was SVN r20392.
2009-01-30 22:47:30 +00:00
Rolf vandeVaart
0704b98668
Add the ability to forward SIGTSTP (converted to SIGSTOP) and
...
SIGCONT to the a.outs. By default, they are not forwarded and
the behavior remains as it has always been. However, if one
runs with --mca orte_forward_job_control 1, then mpirun will
catch those two signals and forward them to the orteds which
will deliver them to the a.outs. We have had requests for
this feature.
This commit was SVN r20391.
2009-01-30 18:50:10 +00:00
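The signal handling described in r20391 (SIGTSTP converted to SIGSTOP, SIGCONT passed through, nothing forwarded unless the option is enabled) can be illustrated with a minimal sketch. The function name and the `forwarding_enabled` flag are hypothetical and are not taken from the Open MPI source; the flag merely stands in for the value of `orte_forward_job_control`.

```c
#include <signal.h>

/* Hypothetical helper: decide which signal, if any, should be delivered
 * to the application processes for a signal caught by the launcher.
 * Returns 0 when the signal is not forwarded. */
int job_control_signal_to_forward(int caught, int forwarding_enabled)
{
    if (!forwarding_enabled) {
        return 0;               /* default behavior: forward nothing */
    }
    switch (caught) {
    case SIGTSTP:
        return SIGSTOP;         /* SIGTSTP is converted to SIGSTOP */
    case SIGCONT:
        return SIGCONT;         /* SIGCONT is forwarded unchanged */
    default:
        return 0;               /* other signals are not handled here */
    }
}
```

One plausible reason for the conversion is that SIGSTOP cannot be caught or ignored by the receiving process, so the application is stopped regardless of its own handlers.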
Ralph Castain
5e6d3ba289
Initial implementation of static ports. Provide an mca param to specify static port ranges to the OOB - can provide any
...
combination of comma-separated values and ranges. Daemons will use the first port in the range, MPI procs will use the other ports in the range assuming that they know their node rank in time and enough ports were specified.
NOTE: this capability only works under specific conditions. I will outline more about this in a note to devel as the remainder of the implementation progresses. For now, the only environment where this works is slurm. The linear routed module has also been adjusted to work with static ports so that all messaging flows strictly through the topology, including the initial daemon callback - thus limiting the number of sockets opened by mpirun.
This commit was SVN r20390.
2009-01-30 18:31:43 +00:00
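The port assignment described in r20390 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the mapping of node rank r to the (r+1)-th port is a guess at what "the other ports in the range" means, not the confirmed OOB behavior.

```c
#include <stddef.h>

/* Hypothetical helper: pick a static port from an already-expanded list.
 * Daemons take the first port. An MPI proc with node rank r takes entry
 * r + 1 (an assumed mapping). Returns -1 when no static port applies,
 * e.g. the node rank is unknown or too few ports were specified. */
int select_static_port(const int *ports, size_t nports,
                       int is_daemon, int node_rank)
{
    if (nports == 0) {
        return -1;
    }
    if (is_daemon) {
        return ports[0];
    }
    if (node_rank < 0) {
        return -1;              /* node rank not known in time */
    }
    size_t idx = (size_t)node_rank + 1;
    return (idx < nports) ? ports[idx] : -1;
}
```

A caller receiving -1 would presumably fall back to a dynamically assigned port.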
Jeff Squyres
35c5e28a8e
Up to SVN r20383
...
This commit was SVN r20384.
The following SVN revision numbers were found above:
r20383 --> open-mpi/ompi@e0638c84c8
2009-01-29 17:59:04 +00:00
Jeff Squyres
bb3d258562
Round up a few places where PATH_MAX was used instead of
...
OMPI_PATH_MAX. Thanks to Andrea Iob for the bug report.
This commit was SVN r20360.
2009-01-27 22:57:50 +00:00
Ralph Castain
c92f906d7c
Move the daemon collectives out of the ODLS and into the GRPCOMM framework. This removes the inherent assumption that the OOB topology is a tree, thus allowing different grpcomm/routed combinations to implement collectives appropriate to their topology.
...
This commit was SVN r20357.
2009-01-27 19:13:56 +00:00
Ralph Castain
fd5e15ea58
Since parsing comma-delimited, range-capable options is being used in multiple places, create a new utility that consolidates that code.
...
Have orte-iof use it.
This commit was SVN r20346.
2009-01-25 17:16:25 +00:00
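As an illustration of the kind of utility r20346 describes, here is a minimal sketch of parsing a comma-delimited, range-capable option such as "1,3-5,7". The function name, signature, and error conventions are assumptions made for this example and are not the actual ORTE utility.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: expand a spec such as "1,3-5,7" into out[].
 * Returns the number of values written, or -1 if the spec is malformed
 * or out[] is too small to hold every value. */
int parse_range_list(const char *spec, int *out, int max_out)
{
    int count = 0;
    const char *p = spec;

    while (*p != '\0') {
        char *end;
        long lo, hi;

        if (!isdigit((unsigned char)*p)) {
            return -1;                  /* each item must start with a digit */
        }
        lo = strtol(p, &end, 10);
        hi = lo;
        p = end;

        if (*p == '-') {                /* item is a range "lo-hi" */
            p++;
            if (!isdigit((unsigned char)*p)) {
                return -1;
            }
            hi = strtol(p, &end, 10);
            if (hi < lo) {
                return -1;
            }
            p = end;
        }

        for (long v = lo; v <= hi; v++) {
            if (count >= max_out) {
                return -1;              /* caller's buffer is too small */
            }
            out[count++] = (int)v;
        }

        if (*p == ',') {
            p++;
            if (*p == '\0') {
                return -1;              /* trailing comma */
            }
        } else if (*p != '\0') {
            return -1;                  /* unexpected character */
        }
    }
    return count;
}
```

Keeping this logic in one place means every option that accepts such a list rejects malformed input in the same way.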
Ralph Castain
0435108834
Improve the efficiency of the launch system by changing the outer loop to being over app_context, and adding a flag to the app_context so the daemon can record that "this app is on my node" when decoding the launch msg.
...
If the --wdir option is given, check to see if the user provided a relative path. If so, convert it to an absolute path. This is needed to maintain consistent behavior across environments. Some environments automatically chdir to your current working directory when launching the remote orted, while others (e.g., ssh) don't. This levels the playing field and reduces user surprise.
This commit was SVN r20342.
2009-01-25 12:39:24 +00:00
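The relative-to-absolute conversion described in r20342 might look something like the sketch below. It assumes a POSIX system, and the function name and fixed buffer size are illustrative choices rather than what the ORTE code does.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write an absolute form of wdir into out.
 * An already-absolute path is copied unchanged; a relative path is
 * joined to the current working directory.
 * Returns 0 on success, -1 on error or if out is too small. */
int make_absolute_wdir(const char *wdir, char *out, size_t out_len)
{
    char cwd[4096];             /* illustrative size, not OMPI_PATH_MAX */
    int n;

    if (wdir[0] == '/') {
        n = snprintf(out, out_len, "%s", wdir);
    } else {
        if (getcwd(cwd, sizeof(cwd)) == NULL) {
            return -1;
        }
        n = snprintf(out, out_len, "%s/%s", cwd, wdir);
    }
    /* snprintf reports the length it wanted; detect truncation */
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Resolving the path once on the launching side means a remote daemon started from a different directory still ends up in the directory the user intended.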
Ralph Castain
40b6ed4a40
Take another crack at fixing the -wdir problem. Move the context checking code down into just prior to launching each child app. This is necessary so that individual app context wdir options are respected. Also, ensure that we return to our "base" directory after each app is launched so that the relative positions of the wdir options for each app_context are with respect to our base directory, instead of the last wdir option.
...
Hopefully, this will pass the "BigRed test". :-)
This commit was SVN r20341.
2009-01-24 20:59:27 +00:00
Tim Mattox
c2d105a4d9
Refs trac:1763: Fix -wdir option
...
Reverted r20306 since the fix caused 100% failures on our BigRed system.
See the comments on ticket #1763 for the details.
This commit was SVN r20339.
The following SVN revision numbers were found above:
r20306 --> open-mpi/ompi@8c87e48721
The following Trac tickets were found above:
Ticket 1763 --> https://svn.open-mpi.org/trac/ompi/ticket/1763
2009-01-24 15:04:47 +00:00
Ralph Castain
c6c5bc17a0
Add a new hierarchical collective grpcomm component that performs modex and barrier across the procs instead of the daemons. Modeled on the tuned collectives. Collective code is in grpcomm base for eventual use by the daemon-based components as well.
...
This commit was SVN r20337.
2009-01-23 21:57:51 +00:00
Ralph Castain
7154cbf2e0
Cleanup a couple of mis-labeled diagnostic outputs
...
This commit was SVN r20332.
2009-01-23 20:46:54 +00:00
Josh Hursey
04c69b8a82
Fixes for --preload-files and --preload-binary.
...
* Improved the error propagation from a backend orted
* Fixed a hang in orterun due to failed files transferred
* Fix the movement of files with relative path names
* Improved error messages when a file cannot be moved
* Move file checks to FileM instead of embedding them in the ODLS
This commit Refs trac:1770
This commit was SVN r20331.
The following Trac tickets were found above:
Ticket 1770 --> https://svn.open-mpi.org/trac/ompi/ticket/1770
2009-01-23 15:32:24 +00:00
Josh Hursey
d066c67b53
We need to update both context->app and context->argv[0] with the new path when we use --preload-binary. This keeps orte from checking the wrong path later in the odls [orte_util_check_context_app() called from odls_base_default_setup_fork()].
...
Refs trac:1770
This commit was SVN r20321.
The following Trac tickets were found above:
Ticket 1770 --> https://svn.open-mpi.org/trac/ompi/ticket/1770
2009-01-22 19:18:36 +00:00
Ralph Castain
47740d1e87
Get the inequality the correct way!
...
This commit was SVN r20319.
2009-01-22 16:33:07 +00:00
Ralph Castain
f6ba4f6f30
Per discussion with Jeff, an invalid local rank value should never occur - if it does, it could be indicative of deeper problems in the launch procedure. Thus, rather than allowing the launch to proceed, let's abort.
...
This commit was SVN r20312.
2009-01-22 00:52:46 +00:00
Jeff Squyres
90e69ac6ff
Fix some man page nits noticed by the Debian OMPI maintainers. Thanks
...
Dirk!
This commit was SVN r20307.
2009-01-21 18:38:37 +00:00
Ralph Castain
8c87e48721
Fix a user-reported bug whereby the -wdir option would only be applied from the last app_context.
...
This commit was SVN r20306.
2009-01-21 15:52:12 +00:00
Josh Hursey
abfc7c6076
Per ticket #1527 orte-restart should be using {{{--default-hostfile}}} instead of {{{--hostfile}}} with app contexts.
...
Thanks to Gregor Dschung for reporting the problem.
This commit was SVN r20305.
2009-01-21 14:08:16 +00:00
Ralph Castain
5d9de3326c
Check for valid local/node ranks before using the returned values
...
This commit was SVN r20304.
2009-01-21 00:54:50 +00:00
Ralph Castain
a6a7335694
Catch a potential bug spanning several ESS modules. The node_rank and local_rank types were changed to uint16_t, however the modules returned UINT8_MAX as an "invalid" value. To clean this up, define an INVALID value for these types, and change the various modules so they return this value to indicate an invalid response.
...
This commit was SVN r20303.
2009-01-21 00:19:37 +00:00
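The mismatch fixed in r20303 can be illustrated with a short sketch. The type and macro names are hypothetical, and the choice of UINT16_MAX as the sentinel is an assumption; the commit only says that an INVALID value was defined for these types.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t local_rank_t;            /* illustrative type name */
#define LOCAL_RANK_INVALID UINT16_MAX     /* assumed sentinel value */

/* Hypothetical lookup: return the rank stored at idx, or the sentinel
 * when idx is out of range. */
local_rank_t lookup_local_rank(const local_rank_t *table, int n, int idx)
{
    if (idx < 0 || idx >= n) {
        return LOCAL_RANK_INVALID;
    }
    return table[idx];
}

/* Callers test against the sentinel before using the value. */
bool local_rank_is_valid(local_rank_t rank)
{
    return rank != LOCAL_RANK_INVALID;
}
```

With a 16-bit type, UINT8_MAX (255) is a perfectly legitimate rank, so returning it as an error marker would make rank 255 indistinguishable from a failed lookup.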
Ralph Castain
4da9f53fa4
Implement the xml formatted output of stdout/err/diag. Force -tag-output if -xml is set.
...
This commit was SVN r20302.
2009-01-20 16:58:31 +00:00
Ralph Castain
88a0af9726
Revise the way we output resolved hostnames to make life easier for the Eclipse folks. Store aliases for individual nodes (only when requested to show resolved hostnames) and then report them out as part of the display-map option.
...
This commit was SVN r20284.
2009-01-15 18:11:50 +00:00
Ralph Castain
253a54df12
Shutdown the socket before closing for cleaner termination.
...
This commit was SVN r20283.
2009-01-15 18:06:01 +00:00
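The pattern named in r20283, shutting a socket down before closing it, can be sketched with the standard POSIX calls. The wrapper name is hypothetical, and the use of SHUT_RDWR is an assumption since the commit does not say which direction is shut down.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical wrapper: stop traffic in both directions, then release
 * the descriptor. Returns the result of close(), or -1 for a bad fd. */
int close_socket_cleanly(int sd)
{
    if (sd < 0) {
        return -1;
    }
    /* Errors from shutdown() are ignored on purpose: it can fail with
     * ENOTCONN if the connection is already gone, and the descriptor
     * still has to be closed in that case. */
    (void)shutdown(sd, SHUT_RDWR);
    return close(sd);
}
```

Calling shutdown() first tells the peer that the connection is ending even if another descriptor still refers to the same socket, which close() alone does not guarantee.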
Ralph Castain
a9af219ba7
Fix CID 723: a pointless whine about not checking a return code
...
This commit was SVN r20274.
2009-01-14 19:06:36 +00:00
Jeff Squyres
a568ba0468
Fix CID 25: it's not possible for sav to be non-NULL by the time it
...
gets here.
This commit was SVN r20273.
2009-01-14 18:57:48 +00:00
Jeff Squyres
0c8f8fe1ea
Fix CID 733: remove some dead code (proc_name was set but effectively
...
never used).
This commit was SVN r20271.
2009-01-14 18:12:06 +00:00
Josh Hursey
a9da2dada1
Remove some unused variables.
...
This commit was SVN r20270.
2009-01-14 17:28:40 +00:00
Tim Mattox
5b70160626
For two error conditions in the ras_loadleveler_module, output
...
the error code reported by loadleveler. Also, clean up a
few more internal error messages.
This commit was SVN r20255.
2009-01-13 15:44:26 +00:00
Brian Barrett
d3310a5ad1
fixes to get compiling on Red Storm again
...
This commit was SVN r20252.
2009-01-12 22:30:00 +00:00
Ralph Castain
694008e9bb
Fix a reported bug whereby keyboard entry to a remote proc was being lost after the first iteration. In other words, if an application has a proc reading stdin from the keyboard, and that proc is not co-located with mpirun, then the system would hang.
...
The problem was eventually traced to two bugs in the code:
1. the orted wasn't resetting the write event flag, thus preventing itself from turning it on again.
2. the HNP needed to check whether stdin was attached to a tty before adding the delay for fairness. If it is attached to a tty, there is no need for the delay. This prevents some strangely slow typing response.
This patch needs to move to 1.3
This commit was SVN r20246.
2009-01-12 20:12:58 +00:00
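The second fix in r20246, skipping the fairness delay when stdin is a terminal, comes down to an isatty() check. The sketch below is illustrative only: the function name and the idea of returning a delay in microseconds are assumptions, not the HNP's actual code.

```c
#include <unistd.h>

/* Hypothetical helper: return the delay (in microseconds) to apply
 * before re-arming the read event on fd. Interactive input from a tty
 * gets no delay; piped or redirected input keeps the fairness delay. */
long stdin_read_delay_usec(int fd, long fairness_delay_usec)
{
    return isatty(fd) ? 0 : fairness_delay_usec;
}
```

A person typing cannot flood the event loop the way a redirected file can, so the delay only adds latency in the interactive case.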