openmpi

Author	SHA1	Message	Date
Ralph Castain	728a24c8ec	After considerable patience and help with debugging/testing from Tim M and Jeff S, return a completed and pretty well tested patch of the IOF to the trunk. This commit includes the previously reverted r20074, r20068, and r20064, as well as changes to fix those commits. Basically, the remaining problem turned out to be: 1. closing stdout/stderr during orte_finalize of mpirun 2. inadvertently setting up a write event on fd = -1 3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once This passed prelim MTT testing by Jeff and Tim, but should soak for awhile before migrating to 1.3. This commit was SVN r20106. The following SVN revision numbers were found above: r20064 --> open-mpi/ompi@a07660aea8 r20068 --> open-mpi/ompi@ec930d14a9 r20074 --> open-mpi/ompi@2940309613	2008-12-10 20:40:47 +00:00
Ralph Castain	e28210d0dc	Revert r20074, r20068, and r20064: remove the IOF proc completion code pending further off-trunk work. This commit was SVN r20089. The following SVN revision numbers were found above: r20064 --> open-mpi/ompi@a07660aea8 r20068 --> open-mpi/ompi@ec930d14a9 r20074 --> open-mpi/ompi@2940309613	2008-12-09 17:11:59 +00:00
Ralph Castain	a07660aea8	Bring over the IOF completion changes. This commit fixes the long-occurring problem whereby application procs could, under some circumstances, lose their final prints to stdout/err. The commit includes: 1. coordination of job completion notification to include a requirement for both waitpid detection AND notification that all iof pipes have been closed by the app 2. change of all IOF read and write events to be non-persistent so they can properly be shutdown and restarted only when required 3. addition of a delay (currently set to 10ms) before restarting the stdin read event. This was required to ensure that the stdout, stderr, and stddiag read events had an opportunity to be serviced in scenarios where large files are attached to stdin. This commit was SVN r20064.	2008-12-03 17:45:42 +00:00
Ralph Castain	586334d1c8	Per discussion with Tim Mattox, reset the trunk to pre-19991 level for the iof only. I will shortly add a changeset that will repair the one known error where we were incorrectly closing the stdout/err/diag file descriptors when all we wanted to do was close stdin. I will leave out the changes associated with coordinating proc termination due to race conditions IU encountered during MTT testing. I have been unable to replicate those so far, but we hope to resolve it in the near future. This commit was SVN r19998.	2008-11-14 20:22:36 +00:00
Ralph Castain	555bbf0c02	Fix the iof race conditions wrt proc termination. This is comprised of two sections: 1. modify the iof to track when a proc actually closes all of its open iof output pipes. When this occurs, notify the odls that the proc's iof is complete. This is done via a zero-time event so that we can step out of the read event before processing the notification. 2. in the odls, modify the waitpid callback so it only flags that it was called. Add a function to receive the iof-complete notification, and a function that checks for both iof complete and waitpid callback before declaring a proc fully terminated. This ensures that we read and deliver -all- of the IO prior to declaring the job complete. Also modified the odls call to orte_iof.close (and the component's implementation) so it only closes stdin, leaving the other io channels alone. This fixes the other half of the known problem. This should fix the ticket on this subject, but I'll wait to close it pending further testing in the trunk. This commit was SVN r19991.	2008-11-12 23:32:01 +00:00
Ralph Castain	48c3de1865	Fix a problem in the plm "failed to start" code observed by Jeff. When we are unable to launch to a specific node because it doesn't exist or is down, the system would hang and/or segv. The reason for the hang was that we were "firing" the orted exit trigger prior to its timer event being defined - thus "locking" that one-shot and preventing it from firing when we actually were ready to use it. The segv was caused by the fact that we don't really know which daemon failed to start (at least, in most cases), so we didn't set a pointer to the aborted proc object. All we really wanted, though, was to ensure that mpirun returned a non-zero exit status, so the fix was to simply return the default error status. This commit was SVN r19754.	2008-10-16 14:21:37 +00:00
Ralph Castain	be02211b4f	Modify the wakeup system to make it more Windows-friendly. This allows Shiqing to consolidate the Windows-specific modifications into one location, and generalizes the wakeup procedure in case we hit other system-specific requirements. This needs some soak time to ensure we haven't opened any race conditions. I tried to loop everything in the shutdown procedure through that trigger event call to ensure it all goes through the one-time locks as it did before so that someone hitting ctrl-c when we are already shutting down shouldn't cause problems. Just want to let people use it for awhile to verify. This commit was SVN r19159.	2008-08-05 15:09:29 +00:00
Thomas Herault	28dc80b67e	Deal with the SIGCHLD issue in LSF. lsb_launch tampers with SIGCHLD signal handler. We are forced to reinstall our own signal handler after a call to this function. This commit fixes trac:1356. This commit was SVN r19033. The following Trac tickets were found above: Ticket 1356 --> https://svn.open-mpi.org/trac/ompi/ticket/1356	2008-07-25 15:23:23 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Ralph Castain	7bee71aa59	Fix a potential, albeit perhaps esoteric, race condition that can occur for fast HNP's, slow orteds, and fast apps. Under those conditions, it is possible for the orted to be caught in its original send of contact info back to the HNP, and thus for the progress stack never to recover back to a high level. In those circumstances, the orted can "hang" when trying to exit. Add a new function to opal_progress that tells us our recursion depth to support that solution. Yes, I know this sounds picky, but good ol' Jeff managed to make it happen by driving his cluster near to death... Also ensure that we declare "failed" for the daemon job when daemons fail instead of the application job. This is important so that orte knows that it cannot use xcast to tell daemons to "exit", nor should it expect all daemons to respond. Otherwise, it is possible to hang. After lots of testing, decide to default (again) to slurm detecting failed orteds. This proved necessary to avoid rather annoying hangs that were difficult to recover from. There are conditions where slurm will fail to launch all daemons (slurm folks are working on it), and yet again, good ol' Jeff managed to find both of them. Thank you Jeff! :-/ This commit was SVN r18611.	2008-06-06 19:36:27 +00:00
Ralph Castain	b456fb2d42	Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. Also detect orted failed-to-start by setting timeout on launch. Currently only used in TM launcher. Neither detection is enabled by default, but are only active if heartrate is set and/or launch timeout is set. Exception for SLURM as orted failure is always detected and reported. More info to come on devel list. This commit was SVN r18555.	2008-06-02 21:46:34 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Ralph Castain	06d3145fe4	First cut at direct launch for TM. Able to launch non-ORTE procs and detect their completion for a clean shutdown. This commit was SVN r17732.	2008-03-05 13:51:32 +00:00
Ralph Castain	841d0e5208	Cleanup an attribute warning - not sure which one to set or where it should go, so I'll leave that to someone more familiar with "attributes". Ensure some debugging is only enabled when have_debug is set. This commit was SVN r17681.	2008-03-03 16:06:47 +00:00
Ralph Castain	6450962d59	Add some debugging to the message event object. Cleanup some no-longer-used values. This commit was SVN r17671.	2008-02-29 20:10:31 +00:00
Ralph Castain	5e6928d710	Cleanup recursions in ORTE caused by processing recv'd messages that can cause the system to take action resulting in receipt of another message. Basically, the method employed here is to have a recv create a zero-time timer event that causes the event library to execute a function that processes the message once the recv returns. Thus, any action taken as a result of processing the message occurs outside of a recv. Created two new macros to assist: ORTE_MESSAGE_EVENT: creates the zero-time event, passing info in a new orte_message_event_t object ORTE_PROGRESSED_WAIT: while waiting for specified conditions, just calls progress so messages can be recv'd. Also fixed the failed_launch function as we no longer block in the orted callback function. Updated the error messages to reflect revision. No change in API to this function, but PLM "owners" may want to check their internal error messages to avoid duplication and excessive output. This has been tested on Mac, TM, and SLURM. This commit was SVN r17647.	2008-02-28 19:58:32 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer. This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
George Bosilca	f52c10d18e	And ORTE is ready for prime-time. All Windows tricks are in: - use the OPAL functions for PATH and environment variables - make all headers C++ friendly - no unnamed structures - no implicit cast. Plus a full implementation for the orte_wait functions. This commit was SVN r11347.	2006-08-23 03:32:36 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Jeff Squyres	1b18979f79	Initial population of orte tree. This commit was SVN r6266.	2005-07-02 13:42:54 +00:00

21 commits
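
Commits 555bbf0c02 and a07660aea8 describe the same rule: a proc is declared fully terminated only after both the waitpid callback has fired and all of its IOF output pipes have been closed. The sketch below illustrates that two-condition gate in Python. It is not the Open MPI code (which is C); `ProcState`, `on_waitpid`, `on_pipe_closed`, and `check_complete` are hypothetical names chosen for illustration, and the pipe names are assumed from the stdout, stderr, and stddiag channels mentioned in the log.

```python
from dataclasses import dataclass, field


@dataclass
class ProcState:
    """Hypothetical per-process record; not Open MPI's actual structure."""
    # Output channels still open (names assumed from the commit messages).
    open_pipes: set = field(
        default_factory=lambda: {"stdout", "stderr", "stddiag"})
    waitpid_fired: bool = False   # set by the waitpid callback
    iof_complete: bool = False    # set once every output pipe is closed
    terminated: bool = False      # set only when both flags are true


def check_complete(proc: ProcState) -> bool:
    """Declare termination only when both conditions hold."""
    if proc.waitpid_fired and proc.iof_complete:
        proc.terminated = True
    return proc.terminated


def on_waitpid(proc: ProcState) -> bool:
    """The waitpid callback only records that it was called."""
    proc.waitpid_fired = True
    return check_complete(proc)


def on_pipe_closed(proc: ProcState, name: str) -> bool:
    """Track pipe closure; flag IOF completion when none remain open."""
    proc.open_pipes.discard(name)
    if not proc.open_pipes:
        proc.iof_complete = True
    return check_complete(proc)
```

Either event may arrive first. Because each handler only sets its own flag and then runs the shared check, the proc is reported as terminated exactly when the later of the two events arrives, so pending output is not dropped by an early exit notification.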