
987 commits

Author SHA1 Message Date
Ralph Castain
39013e2a18 Clean up a couple of minor typos. Bring the new bproc-related RAS components online.
This commit was SVN r15328.
2007-07-10 14:11:26 +00:00
Ralph Castain
a1bf04f39e First cut at revamping bproc support to separate it out from LANL's configuration.
First cut at adding support for LSF

Lots of ompi_ignores so only Jeff and I will see this stuff

This commit was SVN r15321.
2007-07-10 12:43:05 +00:00
Brian Barrett
1d02b9e7b5 Fix a bunch of issues exposed by Ken Cain in getting Open MPI to work with
VxWorks.  Still some issues remaining, I'm sure.

Refs trac:1010

This commit was SVN r15320.

The following Trac tickets were found above:
  Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010
2007-07-10 03:46:57 +00:00
Brian Barrett
b27b9b5380 * Clean up the ompi_mca macro's support for different configuration
    types and add STOP_AT_FIRST_PRIORITY type for framework configuration,
    which allows all components at the highest priority that succeeds to
    succeed
  * Use STOP_AT_FIRST_PRIORITY type for gpr framework, so that the null
    component isn't built when the replica and proxy components are
    available.

This commit was SVN r15286.
2007-07-04 22:00:15 +00:00
Ralph Castain
684aa1bc9f Since universe size now is an orte thing, we may as well give it some direct support. Create rmgr set/get functions so it becomes more obvious where this value is being defined and how to retrieve it. Modify the bproc pls to pass it to the app procs when launched. Modify one of the test programs to verify it has been correctly set.
This commit was SVN r15266.
2007-07-02 16:45:40 +00:00
Tim Prins
c46ed1d5d4 Make it so the universe size is passed through the ODLS instead of through a gpr trigger during MPI init. This matches what is currently being done with the app number.
The default odls has been updated and works fine. The process odls has been updated, but I could not verify its operation. The bproc ODLS has not been updated yet. Ralph will look at it soon.

This commit was SVN r15257.
2007-07-02 01:33:35 +00:00
Sven Stork
086624a4fe - guess that we should retain the ep instead of releasing it
This commit was SVN r15244.
2007-06-29 11:18:37 +00:00
Brian Barrett
f8fb1e9720 Fix some compile failures on Solaris 9 because it doesn't have V6ONLY.
This commit was SVN r15237.
2007-06-28 18:52:15 +00:00
Ralph Castain
e299f7039f Allow bproc operations on the head node if it was allocated for our use
This commit was SVN r15232.
2007-06-28 14:53:17 +00:00
Josh Hursey
f88aa6c273 This commit cleans up the AMCA parameter implementation a bit.
* Remove the 'opal_mca_base_param_use_amca_sets' global variable
* Harness the fact that you can (read: should) call the cmd_line functions
  before initializing opal_init_util(). This pushes the MCA/GMCA/AMCA
  command line options into the environment before OPAL inits and starts
  to use these values. By putting the cmd_line parse before opal_init_util
  in orterun and orted we only parse the *MCA parameter files once, and 
  correctly (alleviating the need to 'recache' the files on init.)
* Small bits of cleanup.

This commit was SVN r15219.
2007-06-27 01:03:31 +00:00
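The ordering described in r15219 (parse the command line first, push the MCA options into the environment, then initialize) can be sketched as follows. This is a minimal illustration, not the actual cmd_line code: the function name `example_export_mca_args` is hypothetical, and the sketch assumes the `--mca <name> <value>` option form and the `OMPI_MCA_` environment prefix.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: scan argv for "--mca <name> <value>" (or "-mca")
 * and export each pair as OMPI_MCA_<name>=<value>, so that later
 * initialization code sees the values in the environment.
 * Returns the number of parameters exported, or -1 on error. */
int example_export_mca_args(int argc, char **argv)
{
    assert(argv != NULL);
    int exported = 0;

    /* i + 2 < argc guarantees that both the name and the value exist. */
    for (int i = 1; i + 2 < argc; ++i) {
        if (strcmp(argv[i], "--mca") != 0 && strcmp(argv[i], "-mca") != 0) {
            continue;
        }
        char name[256];
        int n = snprintf(name, sizeof(name), "OMPI_MCA_%s", argv[i + 1]);
        if (n < 0 || (size_t)n >= sizeof(name)) {
            return -1;              /* parameter name too long */
        }
        if (setenv(name, argv[i + 2], 1) != 0) {
            return -1;              /* environment update failed */
        }
        ++exported;
        i += 2;                     /* skip the name and value just consumed */
    }
    return exported;
}
```

Calling such a helper before any initialization routine means the parameter files only need to be read once, with the command-line overrides already visible.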
Sven Stork
0edcf1d47e - export required symbol
This commit was SVN r15190.
2007-06-25 14:27:04 +00:00
Josh Hursey
84f102c343 Fix/Cleanup the Checkpoint Error propagation through the Snapc Full component.
This commit was SVN r15175.
2007-06-22 16:14:25 +00:00
Jeff Squyres
bd56dc7e5d Fixes trac:1060
Per suggestion, if we don't find a valid shell via getpwuid(), also
check the $SHELL environment variable.  Also perform a few minor
cleanups along the way.

This commit was SVN r15156.

The following Trac tickets were found above:
  Ticket 1060 --> https://svn.open-mpi.org/trac/ompi/ticket/1060
2007-06-21 11:40:42 +00:00
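The fallback described in r15156 (try getpwuid() first, then the $SHELL environment variable) might look roughly like the sketch below. The function name `example_find_shell` and its buffer-based interface are assumptions made for illustration; the actual change in the commit may be structured differently.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the user's shell path into buf.
 * Looks in the password database first and falls back to $SHELL.
 * Returns 0 on success, -1 if no shell was found or buf is too small. */
int example_find_shell(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    const char *shell = NULL;

    const struct passwd *pw = getpwuid(getuid());
    if (pw != NULL && pw->pw_shell != NULL && pw->pw_shell[0] != '\0') {
        shell = pw->pw_shell;
    } else {
        /* No usable entry in the password database: try the environment. */
        const char *env = getenv("SHELL");
        if (env != NULL && env[0] != '\0') {
            shell = env;
        }
    }

    if (shell == NULL || strlen(shell) >= len) {
        return -1;
    }
    strcpy(buf, shell);   /* safe: length checked above */
    return 0;
}
```

Copying into a caller-supplied buffer avoids holding on to the static storage that getpwuid() returns.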
Josh Hursey
edb2cbd150 In r15007 the --bootproxy orted argument was removed to support daemon reuse.
The SnapC Full local Coordinator used this argument to attach to the job the
daemon would be launching. So once this option was removed C/R support broke.

This commit has the local coordinator attach to the job just before it is
launched by the ODLS module. This is a much cleaner solution, and will
eventually allow the SnapC modules to attach to multiple jobs launched 
on a single machine.

This commit fixes the C/R regression introduced in r15007.

This commit was SVN r15121.

The following SVN revision numbers were found above:
  r15007 --> open-mpi/ompi@85df3bd92f
2007-06-18 15:39:04 +00:00
Shiqing Fan
2a77d46117 Fix a small bug.
This commit was SVN r15119.
2007-06-18 12:50:29 +00:00
George Bosilca
99e701062a The Windows job scheduler PLS. Initial commit as I have to move to
another Windows cluster. Right now it's not in a usable state.

This commit was SVN r15113.
2007-06-17 04:54:07 +00:00
Ralph Castain
e653da1d11 Where oh where did that patch go??? Ah - there it went! ;-)
Fix singleton operations - allow multiple xcasts to be queued.

This commit was SVN r15097.
2007-06-15 13:45:29 +00:00
George Bosilca
35e824377e There seems to be a subtle race condition when we fail to spawn a
child. Marking the child as failed solves the issue.

This commit was SVN r15087.
2007-06-14 22:36:47 +00:00
George Bosilca
a4d99ddef6 More synchronizations for the Windows version. The problem came from
the multiple threads accessing the OOB/registry asynchronously via the
callbacks. The quickest solution (but definitely not the cleanest) is
to serialize these callbacks in such a way that at any given time
only one thread can execute a callback.

This commit was SVN r15086.
2007-06-14 22:35:38 +00:00
George Bosilca
fb9ff5cc75 Don't remove the tcp events from the list, they will remove themselves
in the destructor.

This commit was SVN r15085.
2007-06-14 22:33:09 +00:00
Josh Hursey
6cdfefad87 Fix portals BTL and cnos RML.
Both were failing due to interface changes that were never 
applied to them properly.

This commit was SVN r15082.
2007-06-14 18:49:41 +00:00
Ralph Castain
fde15ac97d Bring the TM launcher online
This commit was SVN r15076.
2007-06-14 12:33:34 +00:00
George Bosilca
95a607b945 A more Windows friendly version. As the socket event will be generated
through the win dll using multiple threads, we have to ensure that
the oob callbacks happen only in a synchronous way or really bad
things happen with the current design (blocking messages from a receive
callback).

This commit was SVN r15069.
2007-06-14 04:38:06 +00:00
George Bosilca
8dfa06a617 Only output when the user requests it.
This commit was SVN r15067.
2007-06-14 04:33:18 +00:00
George Bosilca
13a693faa0 Update the Windows process ODLS.
This commit was SVN r15066.
2007-06-14 04:32:19 +00:00
Pak Lui
de0f1eef89 No major changes here. Just updates to remove unused code and comments.
This commit was SVN r15051.
2007-06-13 17:23:03 +00:00
Pak Lui
03a93a38c5 Added an option for daemonizing orted. The existing --no-daemonize
behavior for gridengine is unchanged.

This commit was SVN r15050.
2007-06-13 17:11:37 +00:00
Ralph Castain
5adef03179 Clean up a diagnostic so it only outputs when requested
This commit was SVN r15048.
2007-06-13 15:53:10 +00:00
George Bosilca
18c2bb0ed6 Don't forget to set the name argument before spawning the daemon.
This commit was SVN r15047.
2007-06-13 15:45:34 +00:00
Pak Lui
8e7daea11f Bring more changes in line with r15007.
This commit was SVN r15044.

The following SVN revision numbers were found above:
  r15007 --> open-mpi/ompi@85df3bd92f
2007-06-13 15:30:18 +00:00
Ralph Castain
425fed95ff Bring the SGE component online
This commit was SVN r15043.
2007-06-13 15:02:47 +00:00
Rainer Keller
7e0b400f3f - Small Fix.
This commit was SVN r15037.
2007-06-13 10:43:03 +00:00
George Bosilca
278ec7fd4f I wonder how this one compiled before ... or how did I manage to
miss it ...

This commit was SVN r15032.
2007-06-12 23:24:39 +00:00
George Bosilca
9d342ccb61 Shorter warning message.
This commit was SVN r15031.
2007-06-12 23:22:09 +00:00
George Bosilca
715f6012cf The DSS pack function can use the const attribute for the src field
as it is never modified by the pack functions directly. Enforce it
all over the code base.

This commit was SVN r15026.
2007-06-12 22:47:14 +00:00
George Bosilca
649ab84654 Don't do SIGPIPE handling on Windows.
This commit was SVN r15025.
2007-06-12 22:44:39 +00:00
George Bosilca
9e89abbd57 HAVE_SYS_TYPES_H requires an ifdef.
This commit was SVN r15024.
2007-06-12 22:43:18 +00:00
George Bosilca
432185d617 Forgot to remove the MCA parameter corresponding to the 2 unused
fields in the RSH PLS component.

This commit was SVN r15023.
2007-06-12 22:41:38 +00:00
George Bosilca
49e7bf3069 Be a little bit more clear when we fail to identify the shell.
This commit was SVN r15022.
2007-06-12 22:40:44 +00:00
George Bosilca
5b7796dfcd Remove 2 unused fields.
This commit was SVN r15021.
2007-06-12 22:39:57 +00:00
George Bosilca
16c38cabe1 Update the Windows ODLS component.
This commit was SVN r15020.
2007-06-12 22:37:04 +00:00
George Bosilca
bf6f30a42c Make the Windows PLS component match the current requirements for
a PLS module.

This commit was SVN r15019.
2007-06-12 22:34:56 +00:00
Ralph Castain
af64009368 Bring the CNOS component of the PLS back online
This commit was SVN r15018.
2007-06-12 22:17:05 +00:00
Jeff Squyres
54064f6fa1 Fix a warning that Tim P. found this morning.
The warning was indicative of overly-complex code anyway.  So I
removed the "first" bool and simply use a sentinel value in seq_min to
indicate that nothing has changed.  Note that this is "correct enough"
for the moment -- more fixes will come in this area with tickets #1049
and/or #1051.

This commit was SVN r15013.
2007-06-12 17:30:54 +00:00
Brian Barrett
84d1512fba Add the potential for doing some basic error checking on mutexes during
single threaded builds.  In its default configuration, all this does
is ensure that there's at least a good chance of threads building
based on non-threaded development (since the variable names will be
checked).  There is also code to make sure that a "mutex" is never
"double locked" when using the conditional macro mutex operations.
This is off by default because there are a number of places in both
ORTE and OMPI where this alarm spews megabytes of errors on a
simple test.  So we have some work to do on our path towards
thread support.

Also removed the macro versions of the non-conditional thread locks,
as the only places they were used, the author of the code intended
to use the conditional thread locks.  So now you have upper-case
macros for conditional thread locks and lowercase functions for
non-conditional locks.  Simple, right? :).

This commit was SVN r15011.
2007-06-12 16:25:26 +00:00
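The double-lock check that r15011 describes for single-threaded builds can be pictured with the sketch below. All names here (`debug_mutex_t`, `debug_mutex_lock`, and so on) are hypothetical and are not the macros or functions from the commit; the sketch only shows the idea of tracking a "held" flag when no real locking takes place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug "mutex" for a build without threads: nothing is
 * actually locked, but misuse is detected and counted. */
typedef struct {
    bool held;            /* true between a lock and its matching unlock */
    unsigned violations;  /* double locks plus unlocks of an unheld mutex */
} debug_mutex_t;

void debug_mutex_init(debug_mutex_t *m)
{
    assert(m != NULL);
    m->held = false;
    m->violations = 0;
}

/* Returns true if the lock was taken cleanly, false on a double lock. */
bool debug_mutex_lock(debug_mutex_t *m)
{
    assert(m != NULL);
    if (m->held) {
        m->violations++;   /* would deadlock with a real non-recursive mutex */
        return false;
    }
    m->held = true;
    return true;
}

/* Returns true if the mutex was held, false on a stray unlock. */
bool debug_mutex_unlock(debug_mutex_t *m)
{
    assert(m != NULL);
    if (!m->held) {
        m->violations++;
        return false;
    }
    m->held = false;
    return true;
}
```

A real implementation would probably report the file and line of each violation, which is likely where the large volume of error output mentioned in the message comes from.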
Ralph Castain
4e8081ed1e Clean up a now unnecessary variable
This commit was SVN r15010.
2007-06-12 14:23:33 +00:00
Tim Prins
1467558157 Clean up a couple of warnings.
Update svn:ignore

This commit was SVN r15009.
2007-06-12 14:11:06 +00:00
Ralph Castain
85df3bd92f Bring in the generalized xcast communication system along with the correspondingly revised orted launch. I will send a message out to developers explaining the basic changes. In brief:
1. generalize orte_rml.xcast to become a general broadcast-like messaging system. Messages can now be sent to any tag on the daemons or processes. Note that any message sent via xcast will be delivered to ALL processes in the specified job - you don't get to pick and choose. At a later date, we will introduce an augmented capability that will use the daemons as relays, but will allow you to send to a specified array of process names.

2. extended orte_rml.xcast so it supports more scalable message routing methodologies. At the moment, we support three: (a) direct, which sends the message directly to all recipients; (b) linear, which sends the message to the local daemon on each node, which then relays it to its own local procs; and (c) binomial, which sends the message via a binomial algo across all the daemons, each of which then relays to its own local procs. The crossover points between the algos are adjustable via MCA param, or you can simply demand that a specific algo be used.

3. orteds no longer exhibit two types of behavior: bootproxy or VM. Orteds now always behave like they are part of a virtual machine - they simply launch a job if mpirun tells them to do so. This is another step towards creating an "orteboot" functionality, but also provided a clean system for supporting message relaying.

Note one major impact of this commit: multiple daemons on a node cannot be supported any longer! Only a single daemon/node is now allowed.

This commit is known to break support for the following environments: POE, Xgrid, Xcpu, Windows. It has been tested on rsh, SLURM, and Bproc. Modifications for TM support have been made but could not be verified due to machine problems at LANL. Modifications for SGE have been made but could not be verified. The developers for the non-verified environments will be separately notified along with suggestions on how to fix the problems.

This commit was SVN r15007.
2007-06-12 13:28:54 +00:00
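To make the binomial routing mode from r15007 more concrete, the sketch below computes which daemons a given daemon would relay to in a binomial tree rooted at daemon 0. This is a textbook binomial tree, offered as an assumption about the general technique; the function name `example_binomial_children` is hypothetical and the tree layout actually used by orte_rml.xcast may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: list the daemons that daemon `rank` relays to in a
 * binomial tree over daemons 0..n-1, rooted at 0. A daemon's children are
 * rank + 2^k for every power of two above the rank's highest set bit.
 * Writes at most `max` entries and returns how many were written. */
size_t example_binomial_children(unsigned rank, unsigned n,
                                 unsigned *children, size_t max)
{
    assert(children != NULL || max == 0);
    if (rank >= n) {
        return 0;
    }

    /* Find the first power of two strictly greater than rank. */
    unsigned bit = 1;
    while (bit != 0 && bit <= rank) {
        bit <<= 1;
    }

    size_t count = 0;
    /* bit < n - rank is rank + bit < n, written to avoid overflow. */
    while (bit != 0 && bit < n - rank && count < max) {
        children[count++] = rank + bit;
        bit <<= 1;
    }
    return count;
}
```

With 8 daemons, daemon 0 relays to 1, 2 and 4, daemon 1 to 3 and 5, daemon 2 to 6, and daemon 3 to 7, so every daemon is reached in three relay steps instead of seven sequential sends.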
Brian Barrett
27ad954265 Fix a couple of problems with the way we were using orte_process_name_t
structures in the system.  Instead of using memcmp, use the ns function.
This won't cause a problem as long as all three elements of the name are
ints, but if they have different sizes, alignment and padding rules
can cause memcmp() to compare padding space, which rarely holds a sane
value.

This commit was SVN r14998.
2007-06-11 19:12:11 +00:00
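The padding hazard described in r14998 can be shown with a small sketch. The struct below is a hypothetical stand-in with deliberately mixed field sizes, not the real orte_process_name_t, and `example_name_compare` is an illustrative substitute for the ns comparison function the message refers to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name type: the mixed field sizes let the compiler insert
 * padding bytes, whose contents are unspecified. */
typedef struct {
    uint16_t cellid;
    uint32_t jobid;
    uint16_t vpid;
} example_name_t;

/* Fragile: memcmp() also compares any padding bytes, so two names with
 * identical fields may be reported as different. */
int example_name_bytes_equal(const example_name_t *a, const example_name_t *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, sizeof(*a)) == 0;
}

/* Robust: compare field by field. Returns <0, 0 or >0 like memcmp(),
 * but never inspects padding. */
int example_name_compare(const example_name_t *a, const example_name_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->cellid != b->cellid) return (a->cellid < b->cellid) ? -1 : 1;
    if (a->jobid != b->jobid)   return (a->jobid < b->jobid) ? -1 : 1;
    if (a->vpid != b->vpid)     return (a->vpid < b->vpid) ? -1 : 1;
    return 0;
}
```

Whether the byte-wise version misbehaves depends on the platform's alignment rules, which is why such a bug can stay hidden as long as all fields have the same size.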
Shiqing Fan
d9fa58dc33 Add two more arguments to call. The definition of the function has been modified with 2 additional arguments.
This commit was SVN r14990.
2007-06-11 14:27:36 +00:00