Commit graph

1609 commits

Author SHA1 Message Date
Ralph Castain
11979c100a Silence pointless compiler warning
This commit was SVN r20661.
2009-02-28 15:35:48 +00:00
Tim Mattox
57be80c983 First pass at integrating the CIFTS/FTB support as
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.

To compile with FTB support, use --with-ftb=/path/to/ftb/install

CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php

This commit was SVN r20655.
2009-02-27 22:53:43 +00:00
Ralph Castain
b8ffa302da Separate abnormal job termination from abnormal orted termination so we can continue to use xcast for orted cmds, but can know to turn off reading of stdin as the job is being terminated.
This commit was SVN r20650.
2009-02-27 10:16:25 +00:00
Ralph Castain
4f75f6e443 Fix a bug where we were not stopping the read event on stdin if the write to stdin of the target process was backing up.
Ensure we stop reading stdin if we are abnormally terminating - no point in doing so since the job is being terminated.

This commit was SVN r20649.
2009-02-27 09:31:34 +00:00
Rainer Keller
1745895d09 - Sorry to come back to this, but revert r20643...
Headers should be included in the .c directly.

This commit was SVN r20645.

The following SVN revision numbers were found above:
  r20643 --> open-mpi/ompi@e46c512ee7
2009-02-26 22:01:01 +00:00
Josh Hursey
e46c512ee7 Fix a couple of missing headers resulting from recent cleanup
This commit was SVN r20643.
2009-02-26 16:56:56 +00:00
Rainer Keller
4c0e8e1e69 - Header orte/mca/oob/base/base.h is probably the wrong one to include
anyhow -- if oob functionality is needed, then orte/mca/oob/oob.h should be included.

   Nevertheless compiles fine with -Wimplicit-function-declaration

This commit was SVN r20641.
2009-02-26 04:20:03 +00:00
Rainer Keller
04567d3af0 - Header orte/mca/errmgr/errmgr.h is not needed.
Once again compiles fine with -Wimplicit-function-declaration

This commit was SVN r20640.
2009-02-26 04:05:30 +00:00
Rainer Keller
96e1b9b747 - Header orte/mca/rml/rml.h is not needed if there is no occurrence of orte_rml
or ORTE_RML.
   Like the others, compiles fine with -Wimplicit-function-declaration

This commit was SVN r20639.
2009-02-26 03:52:31 +00:00
Rainer Keller
bcac113b13 - Header orte/mca/ess/ess.h not being used
This commit was SVN r20638.
2009-02-26 03:28:59 +00:00
Rainer Keller
b356e90fa1 - Get rid of include orte/util/proc_info.h, if not needed
The only include file internal to proc_info.h is opal/dss/dss_types.h
 - In one case (orte/util/hnp_contact.c), proc_info.h had to be added again.
 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   works fine, no errors.

   Again, let MTT have the last word.

This commit was SVN r20631.
2009-02-25 03:38:00 +00:00
Ralph Castain
dcff523244 Fix a race condition that causes corruption of a buffer in mpirun while trying to process launch_local_proc cmds.
Cleanup the pidmap handling by changing from value to pointer arrays.

This commit was SVN r20629.
2009-02-25 02:43:22 +00:00
Ralph Castain
1e5aa40e3f Ensure that this component is not selected by tools, or anything other than an MPI proc
This commit was SVN r20608.
2009-02-20 15:01:58 +00:00
Rainer Keller
02599446d0 - Occurrences of ORTE_PROC_MY_NAME require orte/runtime/orte_globals.h
This commit was SVN r20607.
2009-02-20 03:16:13 +00:00
Ralph Castain
ca97f315fe Enable direct launch of applications under SLURM. Compute all required nidmap and pidmap info based on publicly available SLURM environment variables so that no linkage to SLURM libraries is required.
Note: this requires that nodes not be shared by jobs/users. SLURM developers are working on an enhancement to remove this constraint.


Note 2: yes, the direct routed module returned! However, it is vastly different than the old one and has zero support for such things as comm_spawn. It is solely to support non-daemon, direct-launch environments.
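
As a rough illustration of deriving per-node process counts from SLURM's environment without linking against SLURM libraries, the sketch below expands a string in the format of SLURM_TASKS_PER_NODE (for example "2(x3),1"). Which variables the commit actually reads is not stated here, so the choice of this variable is an assumption, and ex_parse_tasks_per_node is a hypothetical helper, not an ORTE function.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of ORTE): expand a SLURM_TASKS_PER_NODE-style
 * string such as "2(x3),1" into one task count per node.
 * Returns the number of nodes written to counts, or -1 on malformed input
 * or if more than max_nodes entries would be needed. */
int ex_parse_tasks_per_node(const char *spec, int *counts, int max_nodes)
{
    int n = 0;
    const char *p = spec;

    while (*p != '\0') {
        char *end;
        long tasks = strtol(p, &end, 10);
        long reps = 1;

        if (end == p || tasks <= 0 || tasks > INT_MAX) {
            return -1;
        }
        p = end;

        /* Optional repetition suffix of the form "(xN)". */
        if (p[0] == '(' && p[1] == 'x') {
            reps = strtol(p + 2, &end, 10);
            if (end == p + 2 || reps <= 0 || *end != ')') {
                return -1;
            }
            p = end + 1;
        }

        if (reps > max_nodes - n) {
            return -1;
        }
        for (long i = 0; i < reps; i++) {
            counts[n++] = (int)tasks;
        }

        if (*p == ',') {
            p++;
            if (*p == '\0') {
                return -1; /* trailing comma */
            }
        } else if (*p != '\0') {
            return -1;
        }
    }
    return n;
}
```

In a direct-launched process, the input string would typically come from getenv("SLURM_TASKS_PER_NODE"), with a NULL check before parsing.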

This commit was SVN r20601.
2009-02-19 21:39:54 +00:00
Ralph Castain
76fc406b08 Modify envars passed to support new proc_info and hier expectations
This commit was SVN r20600.
2009-02-19 21:36:30 +00:00
Ralph Castain
8359477387 Modify the base collective algorithms to take an array of arbitrary vpids instead of assuming everything is ordered in a particular way. Modify the hier grpcomm module to support arbitrary mappings
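
One way to picture a collective that works over an arbitrary array of vpids: compute the tree relationship in index space, then translate indices back through the array. The sketch below does this for a simple binary tree. It is an illustration only, not the algorithm used in the grpcomm base, and ex_vpid_t and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ex_vpid_t; /* hypothetical stand-in for a vpid type */

/* Return the position of `me` in `vpids`, or -1 if it is not present. */
int ex_index_of(const ex_vpid_t *vpids, size_t n, ex_vpid_t me)
{
    for (size_t i = 0; i < n; i++) {
        if (vpids[i] == me) {
            return (int)i;
        }
    }
    return -1;
}

/* Find the binary-tree parent of `me`, where the tree is laid out over array
 * positions rather than over vpid values.
 * Returns 0 and stores the parent's vpid, 1 if `me` is the root,
 * or -1 if `me` is not in the array. */
int ex_tree_parent(const ex_vpid_t *vpids, size_t n, ex_vpid_t me,
                   ex_vpid_t *parent)
{
    int idx = ex_index_of(vpids, n, me);

    if (idx < 0) {
        return -1;
    }
    if (idx == 0) {
        return 1;
    }
    *parent = vpids[(idx - 1) / 2];
    return 0;
}
```

Because only array positions drive the tree shape, the vpids themselves need not be contiguous or sorted.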
This commit was SVN r20599.
2009-02-19 21:35:20 +00:00
Ralph Castain
6151f7b60c Enable static ports for application procs during self-bootstrap for non-daemon environments by letting them select what port to use based on node rank and attempting to connect to the peer on that port
Note that this assumes non-shared nodes...but only takes effect if there is no prior knowledge of how to talk to the specified peer. Thus, all daemon-based environments are unaffected.
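
A minimal sketch of choosing a port from the node rank, assuming a fixed base port plus the rank as an offset. The commit does not say how the port is derived, so both the base value and the function ex_static_port are hypothetical.

```c
#include <stdint.h>

/* Hypothetical base port, chosen only for illustration. */
#define EX_STATIC_PORT_BASE 50000u

/* Map a node rank to a TCP port by offsetting from the base port.
 * Returns 0 when the result would not fit in the valid port range. */
uint16_t ex_static_port(uint32_t node_rank)
{
    if (node_rank > 65535u - EX_STATIC_PORT_BASE) {
        return 0;
    }
    return (uint16_t)(EX_STATIC_PORT_BASE + node_rank);
}
```

A scheme like this only yields unique ports per node when the node is not shared with another job, which matches the non-shared-nodes assumption noted above.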

This commit was SVN r20598.
2009-02-19 21:33:46 +00:00
Ralph Castain
9c2c17beb0 Split out the nidmap init function that adds entries for the local node and proc so these can be separate functions
This commit was SVN r20597.
2009-02-19 21:28:58 +00:00
Ralph Castain
6db641c86d Pass the number of nodes in a job to the process
This commit was SVN r20595.
2009-02-19 20:45:07 +00:00
Rolf vandeVaart
515b99b357 Under SGE, the orted should not daemonize by default.
Also create mca parameter to force daemonization (previous
behavior) which might be needed on larger clusters or
to make use of the -notify flag with qsub.

This fixes trac:1783.

This commit was SVN r20582.

The following Trac tickets were found above:
  Ticket 1783 --> https://svn.open-mpi.org/trac/ompi/ticket/1783
2009-02-18 18:02:38 +00:00
George Bosilca
63754be94f Allow the tools to finalize cleanly without
leaving the sighandler behind.

This commit was SVN r20567.
2009-02-16 20:04:55 +00:00
Shiqing Fan
3f6c64f2e3 Include a missing header, which was implicitly included and removed.
This commit was SVN r20563.
2009-02-16 12:38:38 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included although none of its functionality
   is required -- instead, most often opal_output.h or
   orte/mca/rml/rml_types.h is what is needed.
   Please see orte_show_help_replacement.sh, committed next.

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"
   in orte/mca/odls/base/odls_base_default_fns.c and
   in orte/tools/orte-top/orte-top.c
   Manually added these.

   Let MTT have the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Ralph Castain
3e5ab0ac8c Ensure proper error reporting when -wdir options fail.
This commit was SVN r20555.
2009-02-13 19:46:24 +00:00
George Bosilca
fa7b499519 Move a data declaration down the stack.
This commit was SVN r20552.
2009-02-13 16:34:51 +00:00
Jeff Squyres
91d302fd67 A bunch of minor ORTE valgrind-inspired memory leak cleanups (reviewed
by Ralph).

This commit was SVN r20544.
2009-02-13 04:14:10 +00:00
Rolf vandeVaart
ce97c27a53 Make sure we create a valid path argument for execve.
This gets SGE working in the trunk again.

This commit was SVN r20531.
2009-02-12 18:27:40 +00:00
Ralph Castain
62dd763a8f Add ability for local slave spawns to pre-position supporting files. Update comm_spawn and comm_spawn_multiple man pages to cover new info_keys.
This commit was SVN r20527.
2009-02-12 15:56:45 +00:00
Ralph Castain
816ef9e0a3 Ensure that the rsh_agent_argv is properly initialized when assembling the SGE qrsh command
This commit was SVN r20518.
2009-02-11 18:48:44 +00:00
Rolf vandeVaart
74b2001d61 Fix builds on Solaris. Missing errno.h file.
This commit was SVN r20516.
2009-02-11 15:08:07 +00:00
Ralph Castain
e76b68e554 Replace a missing line so that the TM libs are included in dynamic builds
This commit was SVN r20514.
2009-02-11 14:40:11 +00:00
Ralph Castain
390ce219f8 Enable the slurmd plm to trigger an mpirun exit if no other daemons are in the system
This commit was SVN r20507.
2009-02-10 19:22:57 +00:00
Ralph Castain
6a7fa79a09 Cleanup debug by converting to show_help, little more work to cleanup local vs remote ops when no preload is specified
This commit was SVN r20506.
2009-02-10 19:11:24 +00:00
Ralph Castain
d1b5afd9ea If we don't pre-position the binaries, correctly setup the ssh command to execute the bootproxy
This commit was SVN r20501.
2009-02-10 18:27:10 +00:00
Shiqing Fan
2f1461419c Add a new feature for checking mca subdirectories, i.e. detecting if there is an exclude file list which indicates the files that shouldn't be added to the source list. By default, the CMake build system will simply add all source files in the required sub folders, without knowing which files have to be excluded. The first use of it is in plm/base/.windows.
Also clean up the nested variable names to make them readable.

This commit was SVN r20498.
2009-02-10 17:20:13 +00:00
Ralph Castain
4cdf91a8d4 Per the RFC, extend the current use of the ompi_proc_t flags field (without changing the field itself).
The prior ompi_proc_t structure had a uint8_t flag field in it, where only one
bit was used to flag that a proc was "local". In that context, "local" was
constrained to mean "local to this node".

This commit provides a greater degree of granularity on the term "local", to include tests
to see if the proc is on the same socket, PC board, node, switch, CU (computing
unit), and cluster.

Add #define's to designate which bits stand for which local condition. This
was added to the OPAL layer to avoid conflicting with the proposed movement of
the BTLs. To make it easier to use, a set of macros have been defined - e.g.,
OPAL_PROC_ON_LOCAL_SOCKET - that test the specific bit. These can be used in
the code base to clearly indicate which sense of locality is being considered.

All locations in the code base that looked at the current proc_t field have
been changed to use the new macros.

Also modify the orte_ess modules so that each returns a uint8_t (to match the
ompi_proc_t field) that contains a complete description of the locality of this
proc. Obviously, not all environments will be capable of providing such detailed
info. Thus, getting a "false" from a test for "on_local_socket" may simply
indicate a lack of knowledge.
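
The bit-flag scheme described above can be sketched as follows. The bit positions and the EX_ names are assumptions made for illustration; only the uint8_t width and the idea of a per-level test macro (such as OPAL_PROC_ON_LOCAL_SOCKET) come from the commit message.

```c
#include <stdint.h>

/* Hypothetical bit assignments, one per locality level named in the
 * commit message. The real OPAL values may differ. */
#define EX_PROC_ON_CLUSTER 0x01u
#define EX_PROC_ON_CU      0x02u
#define EX_PROC_ON_SWITCH  0x04u
#define EX_PROC_ON_NODE    0x08u
#define EX_PROC_ON_BOARD   0x10u
#define EX_PROC_ON_SOCKET  0x20u

/* Test macros in the style of OPAL_PROC_ON_LOCAL_SOCKET. */
#define EX_PROC_ON_LOCAL_SOCKET(flags) (((flags) & EX_PROC_ON_SOCKET) != 0)
#define EX_PROC_ON_LOCAL_NODE(flags)   (((flags) & EX_PROC_ON_NODE) != 0)

/* Locality for a peer known to share our node, with board and socket
 * placement unknown: only the bits we can vouch for are set. */
uint8_t ex_locality_same_node(void)
{
    return (uint8_t)(EX_PROC_ON_CLUSTER | EX_PROC_ON_CU |
                     EX_PROC_ON_SWITCH | EX_PROC_ON_NODE);
}
```

With flags built this way, a cleared socket bit means "not known to share a socket", which is why a false result may reflect missing information rather than a definite answer.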

This commit was SVN r20496.
2009-02-10 02:20:16 +00:00
Ralph Castain
42df4b2102 Enable the slurmd plm module for testing - only selected if specified
This commit was SVN r20495.
2009-02-09 21:16:24 +00:00
Ralph Castain
f0af389910 Enable comm_spawn of slave processes, currently only active for the rsh, slurm, and tm environments. Establish support for local rsh environments in the plm/base so that rsh of local slaves can be done by any environment that supports it. Create new orte_rsh_agent param so users can specify rsh agent from outside of rsh plm, and sym link that to the old plm_rsh_agent and pls_rsh_agent options.
Modify the orte-bootproxy to pass prefix for the remote slave to support hetero/hybrid scenarios

This commit was SVN r20492.
2009-02-09 20:44:44 +00:00
Ralph Castain
cab5095ce8 Init variable
This commit was SVN r20488.
2009-02-09 20:38:15 +00:00
Ralph Castain
5bfd1f3fd0 Ensure we have a correct, non-zero exit status when daemons or procs abort or fail to launch
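
A minimal sketch of the idea, assuming the fix amounts to never reporting success for an abnormal termination. The function name and the fallback value of 1 are hypothetical and are not taken from the ORTE code.

```c
#include <stdbool.h>

/* Hypothetical fallback used when no meaningful status was reported. */
#define EX_DEFAULT_ERROR_STATUS 1

/* Choose the exit status to report for a job. If the job ended abnormally
 * (abort or failed launch) but the recorded status is 0, substitute a
 * non-zero value so callers never mistake the failure for success. */
int ex_final_exit_status(int reported_status, bool abnormal)
{
    if (abnormal && reported_status == 0) {
        return EX_DEFAULT_ERROR_STATUS;
    }
    return reported_status;
}
```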
This commit was SVN r20478.
2009-02-07 00:57:17 +00:00
Ralph Castain
8924e00e4c Ensure we don't segfault if we don't know which proc failed
This commit was SVN r20474.
2009-02-06 20:04:36 +00:00
Ralph Castain
0750103d6c Teach the routed modules that local slave processes are direct routes to/from their master daemon.
This commit was SVN r20467.
2009-02-06 15:41:53 +00:00
Ralph Castain
13749673ed Enable spawn of local slave processes - plm module implementation to follow
This commit was SVN r20466.
2009-02-06 15:31:33 +00:00
Ralph Castain
e2a8f45fba Update the nidmap functions to include a new lookup_jmap entry, and to initialize the nidmap and pidmap for startup.
Have the singleton ess module use the new capability.

Adjust a comment in ess_base_put

This commit was SVN r20464.
2009-02-06 15:28:32 +00:00
Ralph Castain
c5b637418b Ensure that the various grpcomm modules use a common data set and packing order for modex operations so that jobs using different grpcomm modules can still perform connect/accept.
Have dynamic grpcomm operations update the nidmap/pidmap to support additional features.

This commit was SVN r20463.
2009-02-06 15:25:06 +00:00
Ralph Castain
a6f9c1f2b1 Allocate the slots for use in the xgrid plm
This commit was SVN r20460.
2009-02-06 00:55:14 +00:00
Shiqing Fan
a20254c8a5 A few type casts, making the MS compiler silent.
This commit was SVN r20449.
2009-02-05 16:37:44 +00:00
Ralph Castain
6292b797e9 Add a new ESS module for use by local slave processes - only active when specifically selected
This commit was SVN r20438.
2009-02-05 06:07:48 +00:00
Ralph Castain
b7e6bafada Add a new routed module for local slave processes to use - only active when specifically selected
This commit was SVN r20437.
2009-02-05 06:07:04 +00:00