2277 Commits

Author SHA1 Message Date
Ralph Castain
fb1ecb7a45 Fix orted termination so we get the #@# relay out before we exit ourselves.
Minor change in the way we respond to job info requests - needed for coming change.

This commit was SVN r20698.
2009-03-03 13:38:29 +00:00
Jeff Squyres
d5eddc7541 Some minor fixups / patches from Bert Wesarg.
This commit was SVN r20697.
2009-03-03 13:09:19 +00:00
Jeff Squyres
f81d357c53 Free a little memory. Thanks for the patch from Bert Wesarg.
This commit was SVN r20694.
2009-03-03 12:33:43 +00:00
Jeff Squyres
f8daa60b1b Fix typo noted by Bert Wesarg.
This commit was SVN r20693.
2009-03-03 12:16:57 +00:00
George Bosilca
02de7846f8 Correctly tag the help message.
This commit was SVN r20683.
2009-03-02 22:10:45 +00:00
Josh Hursey
6d79a0398d Fix a bounds check that prevented some vpid resolution in certain launch scenarios.
Traced back to r20629.

This commit was SVN r20675.

The following SVN revision numbers were found above:
  r20629 --> open-mpi/ompi@dcff523244
2009-03-02 18:26:48 +00:00
Ralph Castain
c7fda41d2a Only remove children from the local child list when the job completes so we update the status on all procs in the job and can properly terminate the job.
Correct an error in a debugging output

This commit was SVN r20669.
2009-03-01 20:12:20 +00:00
Ralph Castain
47cfccbb49 Update a couple of tests
This commit was SVN r20668.
2009-03-01 15:32:32 +00:00
Ralph Castain
15171e4ba8 Remove completed children from the local list of child processes so that we properly track our number of children. Otherwise, we can artificially believe we have exceeded system limits on the number of local children.
This commit was SVN r20667.
2009-03-01 15:31:27 +00:00
Ralph Castain
f0fcaf8b32 For some reason, the buffer gets trashed, so for now, let's process and then relay...until I can figure out the race condition that is causing the problem.
This commit was SVN r20665.
2009-03-01 01:24:02 +00:00
Ralph Castain
c2ff8dc5ce Fix notifier base functions to match revised notifier.h framework APIs
This commit was SVN r20663.
2009-02-28 23:46:18 +00:00
Ralph Castain
11979c100a Silence pointless compiler warning
This commit was SVN r20661.
2009-02-28 15:35:48 +00:00
Tim Mattox
57be80c983 First pass at integrating the CIFTS/FTB support as
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.

To compile with FTB support, use --with-ftb=/path/to/ftb/install

CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php

This commit was SVN r20655.
2009-02-27 22:53:43 +00:00
Ralph Castain
7e5dc8f2be Ensure that we turn off stdin read event when ctrl-c terminating a program
This commit was SVN r20654.
2009-02-27 15:01:28 +00:00
Ralph Castain
b8ffa302da Separate abnormal job termination from abnormal orted termination so we can continue to use xcast for orted cmds, but can know to turn off reading of stdin as the job is being terminated.
This commit was SVN r20650.
2009-02-27 10:16:25 +00:00
Ralph Castain
4f75f6e443 Fix a bug where we were not stopping the read event on stdin if the write to stdin of the target process was backing up.
Ensure we stop reading stdin if we are abnormally terminating - no point in doing so since the job is being terminated.

This commit was SVN r20649.
2009-02-27 09:31:34 +00:00
Rainer Keller
1745895d09 - Sorry to come back to this, but revert r20643...
Headers should be included in the .c directly.

This commit was SVN r20645.

The following SVN revision numbers were found above:
  r20643 --> open-mpi/ompi@e46c512ee7
2009-02-26 22:01:01 +00:00
Josh Hursey
e46c512ee7 Fix a couple of missing headers resulting from recent cleanup
This commit was SVN r20643.
2009-02-26 16:56:56 +00:00
Shiqing Fan
4d3f801dbd Try to find the installed flex on the current Windows system first; if it's not there, just use the one that comes with the source.
This commit was SVN r20642.
2009-02-26 13:03:53 +00:00
Rainer Keller
4c0e8e1e69 - Header orte/mca/oob/base/base.h is probably the wrong one to include
anyhow -- if oob functionality is needed, then orte/mca/oob/oob.h

   Nevertheless compiles fine with -Wimplicit-function-declaration

This commit was SVN r20641.
2009-02-26 04:20:03 +00:00
Rainer Keller
04567d3af0 - Header orte/mca/errmgr/errmgr.h is not needed.
Once again compiles fine with -Wimplicit-function-declaration

This commit was SVN r20640.
2009-02-26 04:05:30 +00:00
Rainer Keller
96e1b9b747 - Header orte/mca/rml/rml.h is not needed if there is no occurrence of orte_rml
or ORTE_RML.
   As with the others, compiles fine with -Wimplicit-function-declaration

This commit was SVN r20639.
2009-02-26 03:52:31 +00:00
Rainer Keller
bcac113b13 - Header orte/mca/ess/ess.h not being used
This commit was SVN r20638.
2009-02-26 03:28:59 +00:00
Shiqing Fan
2326f14be5 Remove the unnecessary PROJECT command, I somehow misunderstood how it should be used on Windows....
This commit was SVN r20634.
2009-02-25 16:07:43 +00:00
Ralph Castain
f3ffe48edd Remove debug output
This commit was SVN r20632.
2009-02-25 04:01:09 +00:00
Rainer Keller
b356e90fa1 - Get rid of include orte/util/proc_info.h, if not needed
Only proc_info.h-internal include file is opal/dss/dss_types.h
 - In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   works fine, no errors.

   Again, let's let MTT have the last word.

This commit was SVN r20631.
2009-02-25 03:38:00 +00:00
Ralph Castain
85a9a2e6d8 Ensure that signals are de-trapped before exiting to stop the $#@@#$ event library from "asserting"
This commit was SVN r20630.
2009-02-25 03:10:21 +00:00
Ralph Castain
dcff523244 Fix a race condition that causes corruption of a buffer in mpirun while trying to process launch_local_proc cmds.
Cleanup the pidmap handling by changing from value to pointer arrays.

This commit was SVN r20629.
2009-02-25 02:43:22 +00:00
Shiqing Fan
3656a38a03 Fix a few type casts for windows.
This commit was SVN r20622.
2009-02-23 14:09:07 +00:00
Ralph Castain
1e5aa40e3f Ensure that this component is not selected by tools, or anything other than an MPI proc
This commit was SVN r20608.
2009-02-20 15:01:58 +00:00
Rainer Keller
02599446d0 - Occurrences of ORTE_PROC_MY_NAME require orte/runtime/orte_globals.h
This commit was SVN r20607.
2009-02-20 03:16:13 +00:00
Ralph Castain
5dc4a2b1e0 Add missing include file
This commit was SVN r20603.
2009-02-19 21:40:31 +00:00
Ralph Castain
ca97f315fe Enable direct launch of applications under SLURM. Compute all required nidmap and mpidmap info based on publicly available SLURM environment variables so that no linkage to SLURM libraries is required.
Note: this requires that nodes not be shared by jobs/users. SLURM developers are working on an enhancement to remove this constraint.


Note 2: yes, the direct routed module returned! However, it is vastly different than the old one and has zero support for such things as comm_spawn. It is solely to support non-daemon, direct-launch environments.

This commit was SVN r20601.
2009-02-19 21:39:54 +00:00
Ralph Castain
76fc406b08 Modify envars passed to support new proc_info and hier expectations
This commit was SVN r20600.
2009-02-19 21:36:30 +00:00
Ralph Castain
8359477387 Modify the base collective algorithms to take an array of arbitrary vpids instead of assuming everything is ordered in a particular way. Modify the hier grpcomm module to support arbitrary mappings
This commit was SVN r20599.
2009-02-19 21:35:20 +00:00
Ralph Castain
6151f7b60c Enable static ports for application procs during self-bootstrap for non-daemon environments by letting them select what port to use based on node rank and attempting to connect to the peer on that port
Note that this assumes non-shared nodes...but only takes effect if there is no prior knowledge of how to talk to the specified peer. Thus, all daemon-based environments are unaffected.

This commit was SVN r20598.
2009-02-19 21:33:46 +00:00
Ralph Castain
9c2c17beb0 Split out the nidmap init function that adds entries for the local node and proc so these can be separate functions
This commit was SVN r20597.
2009-02-19 21:28:58 +00:00
Ralph Castain
2759b8e5e5 Add a central capability to parse regular expressions for node and ppn info - constructing the regex to come soon.
This commit was SVN r20596.
2009-02-19 20:46:36 +00:00
Ralph Castain
6db641c86d Pass the number of nodes in a job to the process
This commit was SVN r20595.
2009-02-19 20:45:07 +00:00
Rolf vandeVaart
515b99b357 Under SGE, the orted should not daemonize by default.
Also create mca parameter to force daemonization (previous
behavior) which might be needed on larger clusters or
to make use of the -notify flag with qsub.

This fixes trac:1783.

This commit was SVN r20582.

The following Trac tickets were found above:
  Ticket 1783 --> https://svn.open-mpi.org/trac/ompi/ticket/1783
2009-02-18 18:02:38 +00:00
George Bosilca
8f1c7cf8c2 Make sure we correctly unregister all persistent events
and signal handlers.

This commit was SVN r20568.
2009-02-17 00:20:05 +00:00
George Bosilca
63754be94f Allow the tools to finalize cleanly without
leaving the sighandler behind.

This commit was SVN r20567.
2009-02-16 20:04:55 +00:00
Shiqing Fan
3f6c64f2e3 Include a missing header, which was implicitly included and removed.
This commit was SVN r20563.
2009-02-16 12:38:38 +00:00
George Bosilca
4004cb11bc Release the orte_default_hostfile.
This commit was SVN r20561.
2009-02-14 21:49:56 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or
   orte/mca/rml/rml_types.h
   Please see orte_show_help_replacement.sh committed next.

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"
   in orte/mca/odls/base/odls_base_default_fns.c and
   in orte/tools/orte-top/orte-top.c
   Manually added these.

   Let's let MTT have the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Ralph Castain
3e5ab0ac8c Ensure proper error reporting when -wdir options fail.
This commit was SVN r20555.
2009-02-13 19:46:24 +00:00
George Bosilca
fa7b499519 Move a data declaration down the stack.
This commit was SVN r20552.
2009-02-13 16:34:51 +00:00
Jeff Squyres
91d302fd67 A bunch of minor ORTE valgrind-inspired memory leak cleanups (reviewed
by Ralph).

This commit was SVN r20544.
2009-02-13 04:14:10 +00:00
Rolf vandeVaart
ce97c27a53 Make sure we create a valid path argument for execve.
This gets SGE working in the trunk again.

This commit was SVN r20531.
2009-02-12 18:27:40 +00:00
Ralph Castain
91bc5346eb Update cell example
This commit was SVN r20528.
2009-02-12 16:36:11 +00:00