1
1
Граф коммитов

23 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
728a24c8ec After considerable patience and help with debugging/testing from Tim M and Jeff S, return a completed and pretty well tested patch of the IOF to the trunk. This commit includes the previously reverted r20074, r20068, and r20064, as well as changes to fix those commits.
Basically, the remaining problem turned out to be:

1. closing stdout/stderr during orte_finalize of mpirun

2. inadvertently setting up a write event on fd = -1

3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once

This passed prelim MTT testing by Jeff and Tim, but should soak for awhile before migrating to 1.3.

This commit was SVN r20106.

The following SVN revision numbers were found above:
  r20064 --> open-mpi/ompi@a07660aea8
  r20068 --> open-mpi/ompi@ec930d14a9
  r20074 --> open-mpi/ompi@2940309613
2008-12-10 20:40:47 +00:00
Ralph Castain
e28210d0dc Revert r20074, r20068, and r20064: remove the IOF proc completion code pending further off-trunk work.
This commit was SVN r20089.

The following SVN revision numbers were found above:
  r20064 --> open-mpi/ompi@a07660aea8
  r20068 --> open-mpi/ompi@ec930d14a9
  r20074 --> open-mpi/ompi@2940309613
2008-12-09 17:11:59 +00:00
Ralph Castain
a07660aea8 Bring over the IOF completion changes. This commit fixes the long-occurring problem whereby application procs could, under some circumstances, lose their final prints to stdout/err. The commit includes:
1. coordination of job completion notification to include a requirement for both waitpid detection AND notification that all iof pipes have been closed by the app

2. change of all IOF read and write events to be non-persistent so they can properly be shutdown and restarted only when required

3. addition of a delay (currently set to 10ms) before restarting the stdin read event. This was required to ensure that the stdout, stderr, and stddiag read events had an opportunity to be serviced in scenarios where large files are attached to stdin.

This commit was SVN r20064.
2008-12-03 17:45:42 +00:00
Ralph Castain
586334d1c8 Per discussion with Tim Mattox, reset the trunk to pre-19991 level for the iof only. I will shortly add a changeset that will repair the one known error where we were incorrectly closing the stdout/err/diag file descriptors when all we wanted to do was close stdin. I will leave out the changes associated with coordinating proc termination due to race conditions IU encounted during MTT testing. I have been unable to replicate those so far, but we hope to resolve it in the near future.
This commit was SVN r19998.
2008-11-14 20:22:36 +00:00
Ralph Castain
891630ae85 Handle a race condition between mpirun detecting stdin closed (and releasing the read event), and receiving an xon/xoff notice from a remote orted that detects proc termination and tells mpirun "don't send any more input - the proc is gone". This latter was necessary since we might have hung an infinite source of input on mpirun, while the proc terminated after some point in time.
This commit was SVN r19997.
2008-11-14 15:19:53 +00:00
Ralph Castain
702fc7154c Remove stale function definition
This commit was SVN r19992.
2008-11-13 05:07:11 +00:00
Ralph Castain
555bbf0c02 Fix the iof race conditions wrt proc termination. This is comprised of two sections:
1. modify the iof to track when a proc actually closes all of its open iof output pipes. When this occurs, notify the odls that the proc's iof is complete. This is done via a zero-time event so that we can step out of the read event before processing the notification.

2. in the odls, modify the waitpid callback so it only flags that it was called. Add a function to receive the iof-complete notification, and a function that checks for both iof complete and waitpid callback before declaring a proc fully terminated. This ensures that we read and deliver -all- of the IO prior to declaring the job complete.

Also modified the odls call to orte_iof.close (and the component's implementation) so it only closes stdin, leaving the other io channels alone. This fixes the other half of the known problem.

This should fix the ticket on this subject, but I'll wait to close it pending further testing in the trunk.

This commit was SVN r19991.
2008-11-12 23:32:01 +00:00
Ralph Castain
c56cdac379 Finish cleanup of stdin. Set non-stdio file descriptors to non-blocking (thanks to Jeff for catching that one). Handle writes that result in "would have blocked" errno.
This commit was SVN r19793.
2008-10-24 01:42:58 +00:00
Ralph Castain
6100d88ded Cleanup the new IOF:
1. remove some stale files that were overlooked in original commit

2. add a test program and data to stress iof for stdin

3. cleanup a debug statement that caused memory corruption when reading large files

4. some minor cleanups to correctly handle xon/xoff scenarios

This commit was SVN r19792.
2008-10-23 19:11:05 +00:00
Ralph Castain
ebaa2c59bb Cleanup non-debug builds
This commit was SVN r19771.
2008-10-18 13:09:47 +00:00
Jeff Squyres
d96003fec5 Fix typo.
This commit was SVN r19769.
2008-10-18 11:52:41 +00:00
Ralph Castain
6e5d844c36 Roll in the revamped IOF subsystem. Per the devel mailing list email, this is a complete rewrite of the iof framework designed to simplify the code for maintainability, and to support features we had planned to do, but were too difficult to implement in the old code. Specifically, the new code:
1. completely and cleanly separates responsibilities between the HNP, orted, and tool components.

2. removes all wireup messaging during launch and shutdown.

3. maintains flow control for stdin to avoid large-scale consumption of memory by orteds when large input files are forwarded. This is done using an xon/xoff protocol.

4. enables specification of stdin recipients on the mpirun cmd line. Allowed options include rank, "all", or "none". Default is rank 0.

5. creates a new MPI_Info key "ompi_stdin_target" that supports the above options for child jobs. Default is "none".

6. adds a new tool "orte-iof" that can connect to a running mpirun and display the output. Cmd line options allow selection of any combination of stdout, stderr, and stddiag. Default is stdout.

7. adds a new mpirun and orte-iof cmd line option "tag-output" that will tag each line of output with process name and stream ident. For example, "[1,0]<stdout>this is output"

This is not intended for the 1.3 release as it is a major change requiring considerable soak time.

This commit was SVN r19767.
2008-10-18 00:00:49 +00:00
Ralph Castain
0532d799d6 Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm.
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.

This commit was SVN r18664.
2008-06-18 03:15:56 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Brian Barrett
1d02b9e7b5 Fix a bunch of issues exposed by Ken Cain in getting Open MPI to work with
VxWorks.  Still some issues remaining, I'm sure.

Refs trac:1010

This commit was SVN r15320.

The following Trac tickets were found above:
  Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010
2007-07-10 03:46:57 +00:00
George Bosilca
f52c10d18e And ORTE is ready for prime-time. All Windows tricks are in:
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unamed structures
- no implicit cast.

Plus a full implementation for the orte_wait functions.

This commit was SVN r11347.
2006-08-23 03:32:36 +00:00
George Bosilca
6afa4c6c64 Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.

This commit was SVN r11270.
2006-08-20 15:54:04 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Brian Barrett
39dbeeedfb * rename locking code from ompi to opal
This commit was SVN r6327.
2005-07-03 22:45:48 +00:00
Brian Barrett
761402f95f * rename ompi_list to opal_list
This commit was SVN r6322.
2005-07-03 16:22:16 +00:00
Brian Barrett
8077da277b * move ompi_rb_tree from opal to ompi since it's only used in ompi, and should
have the ompi_free_list instead of the opal_free_list
* Change orte to use opal_free_list instead of ompi_free_list

This commit was SVN r6307.
2005-07-02 16:46:27 +00:00
Jeff Squyres
1b18979f79 Initial population of orte tree
This commit was SVN r6266.
2005-07-02 13:42:54 +00:00