1
1
Граф коммитов

103 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
5b9fa7e998 reapply r15517 and r15520, which were removed in r15527 so that I could get
the RML/OOB merge in slightly easier

This commit was SVN r15530.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
  r15520 --> open-mpi/ompi@9cbc9df1b8
  r15527 --> open-mpi/ompi@2d17dd9516
2007-07-20 02:34:29 +00:00
Brian Barrett
39a6057fc6 A number of improvements / changes to the RML/OOB layers:
* General TCP cleanup for OPAL / ORTE
  * Simplifying the OOB by moving much of the logic into the RML
  * Allowing the OOB RML component to do routing of messages
  * Adding a component framework for handling routing tables
  * Moving the xcast functionality from the OOB base to its own framework

Includes merge from tmp/bwb-oob-rml-merge revisions:

    r15506, r15507, r15508, r15510, r15511, r15512, r15513

This commit was SVN r15528.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15506
  r15507
  r15508
  r15510
  r15511
  r15512
  r15513
2007-07-20 01:34:02 +00:00
Brian Barrett
2d17dd9516 temporarily back our r15517 and 15520 so that I can get the RML / OOB changes
to cleanly apply

This commit was SVN r15527.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
2007-07-20 01:10:34 +00:00
Ralph Castain
41977fcc95 Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but...
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.

Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.

This commit was SVN r15517.
2007-07-19 20:56:46 +00:00
Sven Stork
c43335d671 - release the lock before we forward a message. In the threaded
build it's possible that we have to process an ack before this function
  returns. If we don't release the lock here we cause a deadlock later
  in ack processing function.

This commit was SVN r15441.
2007-07-16 14:43:24 +00:00
Ralph Castain
cd9213a9f0 Son of a gun - how did that fix not get into r15390?
Fix the blasted iof null component so it only is selected if/when directed.

This commit was SVN r15437.

The following SVN revision numbers were found above:
  r15390 --> open-mpi/ompi@bd65f8ba88
2007-07-16 12:38:11 +00:00
Ralph Castain
bd65f8ba88 Bring in an updated launch system for the orteds. This commit restores the ability to execute singletons and singleton comm_spawn, both in single node and multi-node environments.
Short description: major changes include -

1. singletons now fork/exec a local daemon to manage their operations.

2. the orte daemon code now resides in libopen-rte

3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided.

I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine.

This commit was SVN r15390.
2007-07-12 19:53:18 +00:00
Brian Barrett
1d02b9e7b5 Fix a bunch of issues exposed by Ken Cain in getting Open MPI to work with
VxWorks.  Still some issues remaining, I'm sure.

Refs trac:1010

This commit was SVN r15320.

The following Trac tickets were found above:
  Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010
2007-07-10 03:46:57 +00:00
Sven Stork
086624a4fe - guess that we should retain the ep instead of releasing it
This commit was SVN r15244.
2007-06-29 11:18:37 +00:00
George Bosilca
649ab84654 Don't do SIGPIPE handling on Windows.
This commit was SVN r15025.
2007-06-12 22:44:39 +00:00
Jeff Squyres
54064f6fa1 Fix a warning that Tim P. found this morning.
The warning was indicative of overly-complex code anyway.  So I
removed the "first" bool and simply use a sentinel value in seq_min to
indicate that nothing has changed.  Note that this is "correct enough"
for the moment -- more fixes will come in this area with tickets #1049
and/or #1051.

This commit was SVN r15013.
2007-06-12 17:30:54 +00:00
Jeff Squyres
b704ff9f4e Fix typo that accidentally always resulted in a "true" result.
This commit was SVN r14984.
2007-06-11 12:52:56 +00:00
Jeff Squyres
1ed906a78b * Forgot to update the iof null component when we revamped the IOF
framework.  Updated pointers to match current definitions.
 * Trimmed some dead wood while I was at it:
   * No need for component close function that does nothing
   * Use BEGIN/END_C_DECLS
   * Use recent MCA param register function
   * Ditch MCA param orte_iof_debug (it wasn't used anywhere)
   * Use MCA param orte_iof_override properly in the code (i.e., look
     up the value once and use the cached value later)

This commit was SVN r14981.
2007-06-11 00:58:21 +00:00
Jeff Squyres
4f3a11b4db Fixes trac:967.
A bunch of fixes from the /tmp/iof-fixes branch that fix up ''some''
(but not ''all'') of the problems that we have seen with iof:

 * Reading very large files via stdin redirected to orteun (Sun saw
   this)
 * Reading a little bit of a large file redirected to orterun's stdin
   and then either closing stdin or exiting the process

The Big Change was to make the proxy iof (the one running in non-HNP
orteds) send back a "I'm closing the stream" ACK back to the service
iof.  This tells the HNP that there will be nothing more coming from
that peer, and therefore the iof forward should be removed.

Many other minor cleanups/fixes, terminology changes, and
documentation additions are included in this commit as well.  However,
there are still some pretty big outstanding issues with IOF that are
not addressed either by #967 or this commit.  A few examples:

 * IOF was designed to allow multiple subscribers to a single stream.
   We're not entirely sure that this works (for one thing, there is
   nothing in the ORTE/OMPI code base that uses this functionality).
 * There are also resources leaked when processes/jobs exit (per
   Ralph's first comment on this ticket).  
 * There is no feedback to close orterun's stdin when all subscribers
   to the corresponding stream have closed stdin.

This commit was SVN r14967.

The following Trac tickets were found above:
  Ticket 967 --> https://svn.open-mpi.org/trac/ompi/ticket/967
2007-06-08 22:59:31 +00:00
Brian Barrett
21e00f6f0c Clean up a couple of configure things:
* Require Autoconf 2.60 or higher and remove some cruft
    required for AC 2.59 or the AC 2.59 / AC 2.60 mix
  * Remove a bunch of now unnecessary AC_SUBST calls
  * Use the libtool-provided variables for the -I and
    library to use when compiling against ltdl

Fixes trac:1000

This commit was SVN r14652.

The following Trac tickets were found above:
  Ticket 1000 --> https://svn.open-mpi.org/trac/ompi/ticket/1000
2007-05-15 04:23:48 +00:00
Ralph Castain
a764aa6395 Modify iof to report back more descriptive errors
This commit was SVN r14497.
2007-04-24 19:28:37 +00:00
Jeff Squyres
51f286d737 Just like r14289 on the ORTE trunk:
Per discussions with Brian and Ralph, make a slight correction in
where components are installed. Use $pkglibdir, not $libdir/openmpi,
so that when compiled in the orte trunk, components are installed to
the right directory (because the component search patch is checking
$pkglibdir).

This commit was SVN r14345.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r14289
2007-04-12 11:19:42 +00:00
George Bosilca
0d82473b9d Enable the null IOF.
This commit was SVN r14326.
2007-04-12 05:00:05 +00:00
George Bosilca
e7c4f1ca64 Remove some unused code and correct the finalize function (cancel the pending
receive request).

This commit was SVN r14324.
2007-04-12 04:58:12 +00:00
George Bosilca
4a87c782c3 Release all unselected components. This is a little bit more tricky than usual,
as the IOF components lack the required finalize function. Instead rely on the
module finalize. Read the comment or more informations.

This commit was SVN r14323.
2007-04-12 04:57:08 +00:00
Jeff Squyres
fe58753a23 Add a little documentation to iof.h.
This commit was SVN r14208.
2007-04-03 18:17:35 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Jeff Squyres
c000ee5328 Fixes trac:921
* Do not empty the list of in-flight frags during _close(); the OOB
   callback will still occur (_send_cb()) and try to remove the frag
   from the list, which will then result in an assert failure (debug
   builds).  
 * Add one more fix for a possible problem -- add an extra RETAIN /
   RELEASE pair on the endpoint to ensure that it is not actually
   freed before all in-flight frags have drained.

This commit was SVN r13953.

The following Trac tickets were found above:
  Ticket 921 --> https://svn.open-mpi.org/trac/ompi/ticket/921
2007-03-07 20:12:22 +00:00
Sven Stork
a86deb460e - export required symbols
This commit was SVN r13810.
2007-02-27 09:43:32 +00:00
Rainer Keller
3669e8921e - Fix further compiler warnings regarding initialization
and shadowing variables.

This commit was SVN r13358.
2007-01-30 06:34:38 +00:00
Brian Barrett
a34e67d743 Remove unneeded PARAM_INIT_FILE variable in configure.params files used by
components that use configure.m4 for configuration or are always built. 
The macro has not been needed since moving to configure types other than
configure.stub

Fixes trac:590

This commit was SVN r13031.

The following Trac tickets were found above:
  Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
2007-01-08 03:44:22 +00:00
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
Jeff Squyres
3cf7dddd47 Fixes trac:635.
Ralph identified the problem, I tracked down ''where'' the fd was
being closed, and Brian figured out ''why'' (and the fix).

What was happening is that a remote process was closing its
stdout/stderr and therefore sending a 0-byte IOF message to mpirun.
mpirun, in turn, closed the iof endpoint associated with that stream
(i.e., stdout/stderr).  IOF does this to handle the case where
mpirun's stdin is closed -- this therefore causes the stdin on all the
ORTE-started processes to have their stdin's closed as well.

So the workaround here is to check that if we get a 0-byte IOF message
on a sink (indicating a remote closure), and if that sink is the
special stdout or stderr stream, don't actually close anything in the
local process.

This commit was SVN r12691.

The following Trac tickets were found above:
  Ticket 635 --> https://svn.open-mpi.org/trac/ompi/ticket/635
2006-11-28 21:42:49 +00:00
Brian Barrett
0895f5e08d Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH}
because they are in ORTE, not OMPI.  Also, remove the ORTE_PROCESS_NAME macros
in iof base as they are duplicates of the ones that were in ns_types, which 
meant that bad things happened if you changed what an orte_process_name_t
looked like.

This commit was SVN r12646.
2006-11-22 03:03:21 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
George Bosilca
f52c10d18e And ORTE is ready for prime-time. All Windows tricks are in:
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unamed structures
- no implicit cast.

Plus a full implementation for the orte_wait functions.

This commit was SVN r11347.
2006-08-23 03:32:36 +00:00
George Bosilca
6afa4c6c64 Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.

This commit was SVN r11270.
2006-08-20 15:54:04 +00:00
Brian Barrett
cd7b138d74 propogate up errors when setting up standard input forwarding
This commit was SVN r11187.
2006-08-14 21:09:05 +00:00
Brian Barrett
2185c059e8 * use opal_free_list_item_t as the type of items stored in an opal_free_list_t,
rather than assuing it's an opal_list_item_t.

This commit was SVN r10860.
2006-07-17 21:51:50 +00:00
Rainer Keller
50b5791969 - Release best_item
- Reformat

This commit was SVN r10814.
2006-07-14 19:55:14 +00:00
Brian Barrett
7000cecf78 Fix for standard output / standard error truncation issue when in a shell
pipeline.  See lengthy comment in iof_base_endpoint.c for the details, but
the short version is that we shouldn't set O_NONBLOCK on standard I/O 
file descriptors, so we no longer do.

Closes ticket:9

This commit was SVN r9966.
2006-05-18 15:43:32 +00:00
George Bosilca
1ea3a39372 The condition was wrong. The fact that it accept 0 length messages
is interpreted as a shutdown of the io channel on the next iteration.
Definitively not the good approach. The correct condition is
bigger than 0.

This commit was SVN r9770.
2006-04-28 04:57:07 +00:00
Jeff Squyres
bfcf3867fc Back out George's commit from earlier today; it seems to break stdout
forwarding. 

More detailed mail coming to devel-core shortly that explains.

This commit was SVN r9769.
2006-04-28 03:32:27 +00:00
George Bosilca
bafc16f724 We don't need the len anymore as everything is not attached to the fragment.
This commit was SVN r9758.
2006-04-27 17:35:05 +00:00
Tim Woodall
7a139d6cc8 - corrections to I/O forwarding - handling of incomplete writes
THESE CHANGES SHOULD BE PROPOGATED TO BOTH 1.0 and 1.1 BRANCHES

This commit was SVN r9734.
2006-04-26 15:36:06 +00:00
Brian Barrett
4ea8790342 * Don't try to call tcgetprgp on platforms that don't have that function
* Some more stuff to ignore / do in Red Storm build

This commit was SVN r9511.
2006-04-01 05:46:15 +00:00
Tim Woodall
637511e759 correct cleanup of callbacks
This commit was SVN r9479.
2006-03-30 16:55:02 +00:00
Brian Barrett
d0f5f8a242 * enable pty support for platforms that do not have openpty(), but do
have pty support in general.

This commit was SVN r9250.
2006-03-11 02:35:40 +00:00
George Bosilca
a76213f352 No group and lossy signals on Windows.
This commit was SVN r9160.
2006-02-27 05:11:44 +00:00
George Bosilca
6a80c75110 The default value is not 0 as the variable is an enum !
This commit was SVN r9081.
2006-02-17 05:11:53 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
George Bosilca
b7fa1f4664 As signal.h to the include files to import SIGCONT.
This commit was SVN r8899.
2006-02-05 05:49:24 +00:00
Brian Barrett
03f6a8529c * Fix situation where we were unlocking a mutex we didn't own in an error
cleanup code in the signal part of the event library
* Only attempt to forward standard input if we have a controlling terminal
  (isatty() returns 1) and we are the foreground process OR we do not have
  a controlling terminal (isatty() returns 0).  If we have a controlling
  terminal, check at each SIGCONT if we should change our forwarding,
  since our foreground / background status may have changed.

  Unfortunately, there isn't a great way in the iof framework to know if
  we are capturing a starter's stdin.  Use the logic that if it's a source
  AND tagged as standard input, it's a starter's stdin.  This seems to
  work for all the common usages.

Both these need to go to the v1.0 branch.

This commit was SVN r8894.
2006-02-04 23:26:58 +00:00
Brian Barrett
7c247eea01 * Add a finalize function for iof framework and add a finalize function for
the svc component so that it can disable the rml exception callback, fixing
  a race condition in the shutdown mechanism of orte.

  This should probably go to the v1.0 branch.

This commit was SVN r8893.
2006-02-03 21:01:11 +00:00
Brian Barrett
ddda56eb0d * Don't use ptys for stdin. When a pty has close() called on it, it
discards all of the data in the pty that hasn't been read.  This was
  leading to data being discarded when files were redirected into
  mpirun and read by rank 0 of the job.  This was very "not good".

  The decision to not use ptys for stdin was made based on what Tim said
  that LA-MPI was doing.

  This needs to go to the v1.0 branch...  Tim should probably review...

This commit was SVN r8892.
2006-02-03 20:43:20 +00:00