1
1
Граф коммитов

64 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
bd65f8ba88 Bring in an updated launch system for the orteds. This commit restores the ability to execute singletons and singleton comm_spawn, both in single node and multi-node environments.
Short description: major changes include -

1. singletons now fork/exec a local daemon to manage their operations.

2. the orte daemon code now resides in libopen-rte

3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided.

I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine.

This commit was SVN r15390.
2007-07-12 19:53:18 +00:00
Brian Barrett
1d02b9e7b5 Fix a bunch of issues exposed by Ken Cain in getting Open MPI to work with
VxWorks.  Still some issues remaining, I'm sure.

Refs trac:1010

This commit was SVN r15320.

The following Trac tickets were found above:
  Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010
2007-07-10 03:46:57 +00:00
Sven Stork
086624a4fe - guess that we should retain the ep instead of releasing it
This commit was SVN r15244.
2007-06-29 11:18:37 +00:00
George Bosilca
649ab84654 Don't do SIGPIPE handling on Windows.
This commit was SVN r15025.
2007-06-12 22:44:39 +00:00
Jeff Squyres
4f3a11b4db Fixes trac:967.
A bunch of fixes from the /tmp/iof-fixes branch that fix up ''some''
(but not ''all'') of the problems that we have seen with iof:

 * Reading very large files via stdin redirected to orteun (Sun saw
   this)
 * Reading a little bit of a large file redirected to orterun's stdin
   and then either closing stdin or exiting the process

The Big Change was to make the proxy iof (the one running in non-HNP
orteds) send back a "I'm closing the stream" ACK back to the service
iof.  This tells the HNP that there will be nothing more coming from
that peer, and therefore the iof forward should be removed.

Many other minor cleanups/fixes, terminology changes, and
documentation additions are included in this commit as well.  However,
there are still some pretty big outstanding issues with IOF that are
not addressed either by #967 or this commit.  A few examples:

 * IOF was designed to allow multiple subscribers to a single stream.
   We're not entirely sure that this works (for one thing, there is
   nothing in the ORTE/OMPI code base that uses this functionality).
 * There are also resources leaked when processes/jobs exit (per
   Ralph's first comment on this ticket).  
 * There is no feedback to close orterun's stdin when all subscribers
   to the corresponding stream have closed stdin.

This commit was SVN r14967.

The following Trac tickets were found above:
  Ticket 967 --> https://svn.open-mpi.org/trac/ompi/ticket/967
2007-06-08 22:59:31 +00:00
Ralph Castain
a764aa6395 Modify iof to report back more descriptive errors
This commit was SVN r14497.
2007-04-24 19:28:37 +00:00
George Bosilca
4a87c782c3 Release all unselected components. This is a little bit more tricky than usual,
as the IOF components lack the required finalize function. Instead rely on the
module finalize. Read the comment or more informations.

This commit was SVN r14323.
2007-04-12 04:57:08 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Jeff Squyres
c000ee5328 Fixes trac:921
* Do not empty the list of in-flight frags during _close(); the OOB
   callback will still occur (_send_cb()) and try to remove the frag
   from the list, which will then result in an assert failure (debug
   builds).  
 * Add one more fix for a possible problem -- add an extra RETAIN /
   RELEASE pair on the endpoint to ensure that it is not actually
   freed before all in-flight frags have drained.

This commit was SVN r13953.

The following Trac tickets were found above:
  Ticket 921 --> https://svn.open-mpi.org/trac/ompi/ticket/921
2007-03-07 20:12:22 +00:00
Sven Stork
a86deb460e - export required symbols
This commit was SVN r13810.
2007-02-27 09:43:32 +00:00
Jeff Squyres
3cf7dddd47 Fixes trac:635.
Ralph identified the problem, I tracked down ''where'' the fd was
being closed, and Brian figured out ''why'' (and the fix).

What was happening is that a remote process was closing its
stdout/stderr and therefore sending a 0-byte IOF message to mpirun.
mpirun, in turn, closed the iof endpoint associated with that stream
(i.e., stdout/stderr).  IOF does this to handle the case where
mpirun's stdin is closed -- this therefore causes the stdin on all the
ORTE-started processes to have their stdin's closed as well.

So the workaround here is to check that if we get a 0-byte IOF message
on a sink (indicating a remote closure), and if that sink is the
special stdout or stderr stream, don't actually close anything in the
local process.

This commit was SVN r12691.

The following Trac tickets were found above:
  Ticket 635 --> https://svn.open-mpi.org/trac/ompi/ticket/635
2006-11-28 21:42:49 +00:00
Brian Barrett
0895f5e08d Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH}
because they are in ORTE, not OMPI.  Also, remove the ORTE_PROCESS_NAME macros
in iof base as they are duplicates of the ones that were in ns_types, which 
meant that bad things happened if you changed what an orte_process_name_t
looked like.

This commit was SVN r12646.
2006-11-22 03:03:21 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
George Bosilca
f52c10d18e And ORTE is ready for prime-time. All Windows tricks are in:
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unamed structures
- no implicit cast.

Plus a full implementation for the orte_wait functions.

This commit was SVN r11347.
2006-08-23 03:32:36 +00:00
George Bosilca
6afa4c6c64 Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.

This commit was SVN r11270.
2006-08-20 15:54:04 +00:00
Brian Barrett
cd7b138d74 propogate up errors when setting up standard input forwarding
This commit was SVN r11187.
2006-08-14 21:09:05 +00:00
Brian Barrett
2185c059e8 * use opal_free_list_item_t as the type of items stored in an opal_free_list_t,
rather than assuing it's an opal_list_item_t.

This commit was SVN r10860.
2006-07-17 21:51:50 +00:00
Brian Barrett
7000cecf78 Fix for standard output / standard error truncation issue when in a shell
pipeline.  See lengthy comment in iof_base_endpoint.c for the details, but
the short version is that we shouldn't set O_NONBLOCK on standard I/O 
file descriptors, so we no longer do.

Closes ticket:9

This commit was SVN r9966.
2006-05-18 15:43:32 +00:00
George Bosilca
1ea3a39372 The condition was wrong. The fact that it accept 0 length messages
is interpreted as a shutdown of the io channel on the next iteration.
Definitively not the good approach. The correct condition is
bigger than 0.

This commit was SVN r9770.
2006-04-28 04:57:07 +00:00
Jeff Squyres
bfcf3867fc Back out George's commit from earlier today; it seems to break stdout
forwarding. 

More detailed mail coming to devel-core shortly that explains.

This commit was SVN r9769.
2006-04-28 03:32:27 +00:00
George Bosilca
bafc16f724 We don't need the len anymore as everything is not attached to the fragment.
This commit was SVN r9758.
2006-04-27 17:35:05 +00:00
Tim Woodall
7a139d6cc8 - corrections to I/O forwarding - handling of incomplete writes
THESE CHANGES SHOULD BE PROPOGATED TO BOTH 1.0 and 1.1 BRANCHES

This commit was SVN r9734.
2006-04-26 15:36:06 +00:00
Brian Barrett
4ea8790342 * Don't try to call tcgetprgp on platforms that don't have that function
* Some more stuff to ignore / do in Red Storm build

This commit was SVN r9511.
2006-04-01 05:46:15 +00:00
Brian Barrett
d0f5f8a242 * enable pty support for platforms that do not have openpty(), but do
have pty support in general.

This commit was SVN r9250.
2006-03-11 02:35:40 +00:00
George Bosilca
a76213f352 No group and lossy signals on Windows.
This commit was SVN r9160.
2006-02-27 05:11:44 +00:00
George Bosilca
6a80c75110 The default value is not 0 as the variable is an enum !
This commit was SVN r9081.
2006-02-17 05:11:53 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
George Bosilca
b7fa1f4664 As signal.h to the include files to import SIGCONT.
This commit was SVN r8899.
2006-02-05 05:49:24 +00:00
Brian Barrett
03f6a8529c * Fix situation where we were unlocking a mutex we didn't own in an error
cleanup code in the signal part of the event library
* Only attempt to forward standard input if we have a controlling terminal
  (isatty() returns 1) and we are the foreground process OR we do not have
  a controlling terminal (isatty() returns 0).  If we have a controlling
  terminal, check at each SIGCONT if we should change our forwarding,
  since our foreground / background status may have changed.

  Unfortunately, there isn't a great way in the iof framework to know if
  we are capturing a starter's stdin.  Use the logic that if it's a source
  AND tagged as standard input, it's a starter's stdin.  This seems to
  work for all the common usages.

Both these need to go to the v1.0 branch.

This commit was SVN r8894.
2006-02-04 23:26:58 +00:00
Brian Barrett
7c247eea01 * Add a finalize function for iof framework and add a finalize function for
the svc component so that it can disable the rml exception callback, fixing
  a race condition in the shutdown mechanism of orte.

  This should probably go to the v1.0 branch.

This commit was SVN r8893.
2006-02-03 21:01:11 +00:00
Brian Barrett
ddda56eb0d * Don't use ptys for stdin. When a pty has close() called on it, it
discards all of the data in the pty that hasn't been read.  This was
  leading to data being discarded when files were redirected into
  mpirun and read by rank 0 of the job.  This was very "not good".

  The decision to not use ptys for stdin was made based on what Tim said
  that LA-MPI was doing.

  This needs to go to the v1.0 branch...  Tim should probably review...

This commit was SVN r8892.
2006-02-03 20:43:20 +00:00
Rainer Keller
dd13b098e1 - Simple locking fix.
This commit was SVN r8822.
2006-01-26 13:20:53 +00:00
George Bosilca
bf266c6109 Rollback the 8682 commit until we figure out the correct way to do it. It break several things
inside (like MPI_Wait* functions).

This commit was SVN r8686.
2006-01-13 22:02:40 +00:00
Rainer Keller
95f886b6ab - Protect callers of opal/ompi_condition_wait from spurious wakeups,
possible when with building with pthreads.
   Compiled on Linux ia32 with and without
   --enable-progress-threads

This commit was SVN r8682.
2006-01-12 17:13:08 +00:00
George Bosilca
505d830b3f I miss the requirement for the mca_base_component_repository.h header.
This commit was SVN r8465.
2005-12-12 21:10:30 +00:00
George Bosilca
7d8d516a4a A bunch of fixed for Windows support.
- protection with __WINDOWS__ and not WIN32 or _WIN32
 - protect all the headers

This commit was SVN r8463.
2005-12-12 20:04:00 +00:00
Jeff Squyres
bd0b5acf0b Oops -- there's a second instance of OCRNL that needed to be
protected.

This commit was SVN r8374.
2005-12-02 18:24:59 +00:00
Jeff Squyres
0c9420e204 OS X 10.3 does not have OCRNL #define'd, so we need to protect its
usage 

This commit was SVN r8371.
2005-12-02 16:57:37 +00:00
Tim Woodall
943e6f0cd5 corrections for stdin
- when eof is reached at orterun, send a 0 byte message to peer indicating eof
- on receipt of zero byte message - close corresponding file descriptor associated with the endpoint
- require setup ptys for stdin and stdout so that stdin can be closed independently of stdout

This commit was SVN r8264.
2005-11-28 14:58:53 +00:00
Brian Barrett
8faa1884f0 * The last of the build system optimizations. Combine the component and
component/base Makefile.am files, reducing the time configure spends
  stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed

This commit was SVN r8200.
2005-11-20 01:03:01 +00:00
Jeff Squyres
23ca7e1311 Ensure to return a value.
This commit was SVN r8182.
2005-11-17 14:31:42 +00:00
Brian Barrett
f464bbbcc0 fix a couple of double-lock issues in the iof code that have crept in recently.
This should go to the v1.0 branch.

This commit was SVN r8171.
2005-11-17 01:26:00 +00:00
Tim Woodall
142b7cc682 merge from release branch
This commit was SVN r8167.
2005-11-16 17:10:49 +00:00
Tim Woodall
59d8c791d9 return fragments to free list
This commit was SVN r8121.
2005-11-11 17:48:56 +00:00
Josh Hursey
5fa34df9ce Fix for orted / MPI_Abort problem reported from testers. They were seeing orteds
spining in orte_iof_base_flush() when running 
  intel_tests/src/MPI_Errhandler_fatal_c

When we close an endpoint by taking it out of the envent handler, we need to make
sure that it fits the criteria to pass through orte_iof_base_flush(), specificly
make sure we clean out the ep_frags list.
Note: This is more of a sanity check, since the endpoint should already be
      in this state at the point of closure.

Secondly in orte_iof_base_endpoint_read_handler(), if we determine that it is 
necessary to close the endpoint we have to "return" after doing so, otherwise
we add another frag to the endpoint which will cause it to hang in 
orte_iof_base_flush().

Bug go squish!

This commit was SVN r8109.
2005-11-11 00:09:07 +00:00
Tim Woodall
0b0d7f56c1 added support for callback on receipt of I/O
This commit was SVN r8084.
2005-11-10 04:49:51 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Tim Woodall
d0cd752e33 - don't track the sequence number when the endpoint is a data sink,
its not needed and there could be multiple sources each w/ their 
  own sequence.
- if a write doesn't complete, need to check for non-blocking case.. 

This commit was SVN r7795.
2005-10-18 14:26:12 +00:00
Jeff Squyres
9a25554559 Patch from Brooks Davis for some BSD compatibility issues.
This commit was SVN r7751.
2005-10-13 15:41:25 +00:00
Tim Woodall
3c900a7aa2 - fix a deadlock on threaded build
- update sequence number after a partial write completes

This commit was SVN r7654.
2005-10-06 21:50:58 +00:00