Ralph Castain
fb1ecb7a45
Fix orted termination so we get the #@# relay out before we exit ourselves.
...
Minor change in the way we respond to job info requests - needed for coming change.
This commit was SVN r20698.
2009-03-03 13:38:29 +00:00
Jeff Squyres
d5eddc7541
Some minor fixups / patches from Bert Wesarg.
...
This commit was SVN r20697.
2009-03-03 13:09:19 +00:00
Jeff Squyres
f81d357c53
Free a little memory. Thanks for the patch from Bert Wesarg.
...
This commit was SVN r20694.
2009-03-03 12:33:43 +00:00
Jeff Squyres
f8daa60b1b
Fix typo noted by Bert Wesarg.
...
This commit was SVN r20693.
2009-03-03 12:16:57 +00:00
George Bosilca
02de7846f8
Correctly tag the help message.
...
This commit was SVN r20683.
2009-03-02 22:10:45 +00:00
Josh Hursey
6d79a0398d
Fix a bounds check that prevented some vpid resolution in certain launch scenarios.
...
Traced back to r20629.
This commit was SVN r20675.
The following SVN revision numbers were found above:
r20629 --> open-mpi/ompi@dcff523244
2009-03-02 18:26:48 +00:00
Ralph Castain
c7fda41d2a
Only remove children from the local child list when the job completes so we update the status on all procs in the job and can properly terminate the job.
...
Correct an error in a debugging output
This commit was SVN r20669.
2009-03-01 20:12:20 +00:00
Ralph Castain
47cfccbb49
Update a couple of tests
...
This commit was SVN r20668.
2009-03-01 15:32:32 +00:00
Ralph Castain
15171e4ba8
Remove completed children from the local list of child processes so that we properly track our number of children. Otherwise, we can artificially believe we have exceeded system limits on the number of local children.
...
This commit was SVN r20667.
2009-03-01 15:31:27 +00:00
Ralph Castain
f0fcaf8b32
For some reason, the buffer gets trashed, so for now, let's process and then relay...until I can figure out the race condition that is causing the problem.
...
This commit was SVN r20665.
2009-03-01 01:24:02 +00:00
Ralph Castain
c2ff8dc5ce
Fix notifier base functions to match revised notifier.h framework APIs
...
This commit was SVN r20663.
2009-02-28 23:46:18 +00:00
Ralph Castain
11979c100a
Silence pointless compiler warning
...
This commit was SVN r20661.
2009-02-28 15:35:48 +00:00
Tim Mattox
57be80c983
First pass at integrating the CIFTS/FTB support as
...
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.
To compile with FTB support, use --with-ftb=/path/to/ftb/install
CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php
This commit was SVN r20655.
2009-02-27 22:53:43 +00:00
Ralph Castain
7e5dc8f2be
Ensure that we turn off stdin read event when ctrl-c terminating a program
...
This commit was SVN r20654.
2009-02-27 15:01:28 +00:00
Ralph Castain
b8ffa302da
Separate abnormal job termination from abnormal orted termination so we can continue to use xcast for orted cmds, but can know to turn off reading of stdin as the job is being terminated.
...
This commit was SVN r20650.
2009-02-27 10:16:25 +00:00
Ralph Castain
4f75f6e443
Fix a bug where we were not stopping the read event on stdin if the write to stdin of the target process was backing up.
...
Ensure we stop reading stdin if we are abnormally terminating - no point in continuing to read since the job is being terminated.
This commit was SVN r20649.
2009-02-27 09:31:34 +00:00
Rainer Keller
1745895d09
- Sorry to come back to this, but revert r20643...
...
Headers should be included in the .c directly.
This commit was SVN r20645.
The following SVN revision numbers were found above:
r20643 --> open-mpi/ompi@e46c512ee7
2009-02-26 22:01:01 +00:00
Josh Hursey
e46c512ee7
Fix a couple of missing headers resulting from recent cleanup
...
This commit was SVN r20643.
2009-02-26 16:56:56 +00:00
Shiqing Fan
4d3f801dbd
Try to find the installed flex on the current Windows system first; if it's not there, just use the one that comes along with the source.
...
This commit was SVN r20642.
2009-02-26 13:03:53 +00:00
Rainer Keller
4c0e8e1e69
- Header orte/mca/oob/base/base.h is probably the wrong one to include
...
anyhow -- if oob functionality is needed, then orte/mca/oob/oob.h
Nevertheless compiles fine with -Wimplicit-function-declaration
This commit was SVN r20641.
2009-02-26 04:20:03 +00:00
Rainer Keller
04567d3af0
- Header orte/mca/errmgr/errmgr.h is not needed.
...
Once again compiles fine with -Wimplicit-function-declaration
This commit was SVN r20640.
2009-02-26 04:05:30 +00:00
Rainer Keller
96e1b9b747
- Header orte/mca/rml/rml.h is not needed if no occurrence of orte_rml
...
or ORTE_RML.
As with the others, compiles fine with -Wimplicit-function-declaration
This commit was SVN r20639.
2009-02-26 03:52:31 +00:00
Rainer Keller
bcac113b13
- Header orte/mca/ess/ess.h not being used
...
This commit was SVN r20638.
2009-02-26 03:28:59 +00:00
Shiqing Fan
2326f14be5
Remove the unnecessary PROJECT command, I somehow misunderstood how it should be used on Windows....
...
This commit was SVN r20634.
2009-02-25 16:07:43 +00:00
Ralph Castain
f3ffe48edd
Remove debug output
...
This commit was SVN r20632.
2009-02-25 04:01:09 +00:00
Rainer Keller
b356e90fa1
- Get rid of include orte/util/proc_info.h, if not needed
...
Only proc_info.h-internal include file is opal/dss/dss_types.h
- In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
works fine, no errors.
Again, let's let MTT have the last word.
This commit was SVN r20631.
2009-02-25 03:38:00 +00:00
Ralph Castain
85a9a2e6d8
Ensure that signals are de-trapped before exiting to stop the $#@@#$ event library from "asserting"
...
This commit was SVN r20630.
2009-02-25 03:10:21 +00:00
Ralph Castain
dcff523244
Fix a race condition that causes corruption of a buffer in mpirun while trying to process launch_local_proc cmds.
...
Cleanup the pidmap handling by changing from value to pointer arrays.
This commit was SVN r20629.
2009-02-25 02:43:22 +00:00
Shiqing Fan
3656a38a03
Fix a few type casts for windows.
...
This commit was SVN r20622.
2009-02-23 14:09:07 +00:00
Ralph Castain
1e5aa40e3f
Ensure that this component is not selected by tools, or anything other than an MPI proc
...
This commit was SVN r20608.
2009-02-20 15:01:58 +00:00
Rainer Keller
02599446d0
- Occurrences of ORTE_PROC_MY_NAME require orte/runtime/orte_globals.h
...
This commit was SVN r20607.
2009-02-20 03:16:13 +00:00
Ralph Castain
5dc4a2b1e0
Add missing include file
...
This commit was SVN r20603.
2009-02-19 21:40:31 +00:00
Ralph Castain
ca97f315fe
Enable direct launch of applications under SLURM. Compute all required nidmap and mpidmap info based on publicly available SLURM environment variables so that no linkage to SLURM libraries is required.
...
Note: this requires that nodes not be shared by jobs/users. SLURM developers are working on an enhancement to remove this constraint.
Note 2: yes, the direct routed module returned! However, it is vastly different than the old one and has zero support for such things as comm_spawn. It is solely to support non-daemon, direct-launch environments.
This commit was SVN r20601.
2009-02-19 21:39:54 +00:00
Ralph Castain
76fc406b08
Modify envars passed to support new proc_info and hier expectations
...
This commit was SVN r20600.
2009-02-19 21:36:30 +00:00
Ralph Castain
8359477387
Modify the base collective algorithms to take an array of arbitrary vpids instead of assuming everything is ordered in a particular way. Modify the hier grpcomm module to support arbitrary mappings
...
This commit was SVN r20599.
2009-02-19 21:35:20 +00:00
Ralph Castain
6151f7b60c
Enable static ports for application procs during self-bootstrap for non-daemon environments by letting them select what port to use based on node rank and attempting to connect to the peer on that port
...
Note that this assumes non-shared nodes...but only takes effect if there is no prior knowledge of how to talk to the specified peer. Thus, all daemon-based environments are unaffected.
This commit was SVN r20598.
2009-02-19 21:33:46 +00:00
Ralph Castain
9c2c17beb0
Split out the nidmap init function that adds entries for the local node and proc so these can be separate functions
...
This commit was SVN r20597.
2009-02-19 21:28:58 +00:00
Ralph Castain
2759b8e5e5
Add a central capability to parse regular expressions for node and ppn info - constructing the regex to come soon.
...
This commit was SVN r20596.
2009-02-19 20:46:36 +00:00
Ralph Castain
6db641c86d
Pass the number of nodes in a job to the process
...
This commit was SVN r20595.
2009-02-19 20:45:07 +00:00
Rolf vandeVaart
515b99b357
Under SGE, the orted should not daemonize by default.
...
Also create mca parameter to force daemonization (previous
behavior) which might be needed on larger clusters or
to make use of the -notify flag with qsub.
This fixes trac:1783.
This commit was SVN r20582.
The following Trac tickets were found above:
Ticket 1783 --> https://svn.open-mpi.org/trac/ompi/ticket/1783
2009-02-18 18:02:38 +00:00
George Bosilca
8f1c7cf8c2
Make sure we correctly unregister all persistent events
...
and signal handlers.
This commit was SVN r20568.
2009-02-17 00:20:05 +00:00
George Bosilca
63754be94f
Allow the tools to cleanly finalize without
...
leaving the sighandler behind.
This commit was SVN r20567.
2009-02-16 20:04:55 +00:00
Shiqing Fan
3f6c64f2e3
Include a missing header, which was implicitly included and removed.
...
This commit was SVN r20563.
2009-02-16 12:38:38 +00:00
George Bosilca
4004cb11bc
Release the orte_default_hostfile.
...
This commit was SVN r20561.
2009-02-14 21:49:56 +00:00
Rainer Keller
d81443cc5a
- On the way to get the BTLs split out and lessen dependency on orte:
...
Often, orte/util/show_help.h is included, although no functionality
is required -- instead, most often opal_output.h, or
orte/mca/rml/rml_types.h
Please see orte_show_help_replacement.sh committed next.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
actually showed two *missing* #include "orte/util/show_help.h"
in orte/mca/odls/base/odls_base_default_fns.c and
in orte/tools/orte-top/orte-top.c
Manually added these.
Let's let MTT have the last word.
This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Ralph Castain
3e5ab0ac8c
Ensure proper error reporting when -wdir options fail.
...
This commit was SVN r20555.
2009-02-13 19:46:24 +00:00
George Bosilca
fa7b499519
Move a data declaration down the stack.
...
This commit was SVN r20552.
2009-02-13 16:34:51 +00:00
Jeff Squyres
91d302fd67
A bunch of minor ORTE valgrind-inspired memory leak cleanups (reviewed
...
by Ralph).
This commit was SVN r20544.
2009-02-13 04:14:10 +00:00
Rolf vandeVaart
ce97c27a53
Make sure we create a valid path argument for execve.
...
This gets SGE working in the trunk again.
This commit was SVN r20531.
2009-02-12 18:27:40 +00:00
Ralph Castain
91bc5346eb
Update cell example
...
This commit was SVN r20528.
2009-02-12 16:36:11 +00:00