2005-08-31 16:15:59 +00:00
|
|
|
/*
|
2005-11-05 19:57:48 +00:00
|
|
|
* Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
|
|
|
|
* University Research and Technology
|
|
|
|
* Corporation. All rights reserved.
|
|
|
|
* Copyright (c) 2004-2005 The University of Tennessee and The University
|
|
|
|
* of Tennessee Research Foundation. All rights
|
|
|
|
* reserved.
|
2005-08-31 16:15:59 +00:00
|
|
|
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
|
|
|
|
* University of Stuttgart. All rights reserved.
|
|
|
|
* Copyright (c) 2004-2005 The Regents of the University of California.
|
|
|
|
* All rights reserved.
|
2009-01-11 02:30:00 +00:00
|
|
|
* Copyright (c) 2007 Cisco Systems, Inc. All rights reserved.
|
2005-08-31 16:15:59 +00:00
|
|
|
* $COPYRIGHT$
|
|
|
|
*
|
|
|
|
* Additional copyrights may follow
|
|
|
|
*
|
|
|
|
* $HEADER$
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef ORTERUN_ORTERUN_H
|
|
|
|
#define ORTERUN_ORTERUN_H
|
|
|
|
|
|
|
|
#include "orte_config.h"
|
2009-04-29 00:49:23 +00:00
|
|
|
#include "opal/threads/mutex.h"
|
2008-02-28 01:57:57 +00:00
|
|
|
|
2007-07-10 12:53:48 +00:00
|
|
|
BEGIN_C_DECLS
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Main body of orterun functionality
|
|
|
|
*/
|
2005-08-31 16:15:59 +00:00
|
|
|
int orterun(int argc, char *argv[]);
|
|
|
|
|
2007-07-10 12:53:48 +00:00
|
|
|
/**
|
|
|
|
* Global struct for catching orterun command line options.
|
|
|
|
*/
|
2008-03-06 19:35:57 +00:00
|
|
|
struct orterun_globals_t {
|
2007-07-10 12:53:48 +00:00
|
|
|
bool help;
|
|
|
|
bool version;
|
|
|
|
bool verbose;
|
|
|
|
bool quiet;
|
2008-12-24 15:27:46 +00:00
|
|
|
char *report_pid;
|
|
|
|
char *report_uri;
|
2007-07-10 12:53:48 +00:00
|
|
|
bool exit;
|
|
|
|
bool by_node;
|
|
|
|
bool by_slot;
|
2009-08-11 02:51:27 +00:00
|
|
|
bool by_board;
|
|
|
|
bool by_socket;
|
2009-09-18 19:48:42 +00:00
|
|
|
bool bind_to_none;
|
2009-08-11 02:51:27 +00:00
|
|
|
bool bind_to_core;
|
|
|
|
bool bind_to_board;
|
|
|
|
bool bind_to_socket;
|
2007-07-10 12:53:48 +00:00
|
|
|
bool debugger;
|
|
|
|
int num_procs;
|
|
|
|
char *env_val;
|
|
|
|
char *appfile;
|
|
|
|
char *wdir;
|
|
|
|
char *path;
|
|
|
|
bool preload_binary;
|
|
|
|
char *preload_files;
|
|
|
|
char *preload_files_dest_dir;
|
|
|
|
opal_mutex_t lock;
|
When we can detect that a daemon has failed, then we would like to terminate the system without having it lock up. The "hang" is currently caused by the system attempting to send messages to the daemons (specifically, ordering them to kill their local procs and then terminate). Unfortunately, without some idea of which daemon has died, the system hangs while attempting to send a message to someone who is no longer alive.
This commit introduces the necessary logic to avoid that conflict. If a PLS component can identify that a daemon has failed, then we will set a flag indicating that fact. The xcast system will subsequently check that flag and, if it is set, will send all messages directly to the recipient. In the case of "kill local procs" and "terminate", the messages will go directly to each orted, thus bypassing any orted that has failed.
In addition, the xcast system will -not- wait for the messages to complete, but will return immediately (i.e., operate in non-blocking mode). Orterun will wait (via an event timer) for a period of time based on the number of daemons in the system to allow the messages to attempt to be delivered - at the end of that time, orterun will simply exit, alerting the user to the problem and -strongly- recommending they run orte-clean.
I could only test this on slurm for the case where all daemons unexpectedly died - srun apparently only executes its waitpid callback when all launched functions terminate. I have asked that Jeff integrate this capability into the OOB as he is working on it so that we execute it whenever a socket to an orted is unexpectedly closed. In the meantime, the functionality will rarely get called, but at least the logic is available for anyone whose environment can support it.
This commit was SVN r16451.
2007-10-15 18:00:30 +00:00
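The failure path described in the r16451 commit message can be sketched roughly as follows. This is a minimal illustration, not the ORTE implementation: every name here (`xcast_mode_t`, `choose_xcast_mode`, `shutdown_wait_usec`) is hypothetical, and the linear wait formula is an assumption, since the message only says the wait is "based on the number of daemons".

```c
#include <stdbool.h>

/* Hypothetical names for illustration; these are not ORTE symbols. */
typedef enum {
    XCAST_ROUTED_BLOCKING,    /* assumed normal path: relayed, waits for completion */
    XCAST_DIRECT_NONBLOCKING  /* failure path: sent straight to each orted, no wait */
} xcast_mode_t;

/* Pick the delivery mode from the "a daemon has failed" flag. */
static xcast_mode_t choose_xcast_mode(bool daemon_failed)
{
    return daemon_failed ? XCAST_DIRECT_NONBLOCKING : XCAST_ROUTED_BLOCKING;
}

/*
 * How long orterun waits before exiting once the direct messages are sent.
 * ASSUMPTION: the wait scales linearly with the daemon count; the commit
 * message does not give the actual formula.
 */
static long shutdown_wait_usec(int num_daemons, long usec_per_daemon)
{
    if (num_daemons < 0) {
        num_daemons = 0;  /* treat a nonsensical count as "nothing to wait for" */
    }
    return (long)num_daemons * usec_per_daemon;
}
```

Under these assumptions, a caller would check the flag once, send "kill local procs" and "terminate" in the chosen mode, and arm an event timer with the computed interval before exiting.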
|
|
|
bool sleep;
|
2008-02-28 01:57:57 +00:00
|
|
|
char *ompi_server;
|
Per the July technical meeting:
During the discussion of MPI-2 functionality, it was pointed out by Aurelien that there was an inherent race condition between startup of ompi-server and mpirun. Specifically, if someone started ompi-server to run in the background as part of a script, and then immediately executed mpirun, it was possible that an MPI proc could attempt to contact the server (or that mpirun could try to read the server's contact file) before the server is running and ready.
At that time, we discussed creating a new tool "ompi-wait-server" that would wait for the server to be running, and/or probe to see if it is running and return true/false. However, rather than create yet another tool, it seemed just as effective to add the functionality to mpirun.
Thus, this commit creates two new mpirun cmd line flags (hey, you can never have too many!):
--wait-for-server : instructs mpirun to ping the server to see if it responds. This causes mpirun to execute an rml.ping to the server's URI with an appropriate timeout interval - if the ping isn't successful, mpirun attempts it again.
--server-wait-time xx : sets the ping timeout interval to xx seconds. Note that mpirun will attempt to ping the server twice with this timeout, so we actually wait for twice this time. Default is 10 seconds, which should be plenty of time.
This has only been lightly tested. It works if the server is present, and outputs a nice error message if it cannot be contacted. I have not tested the race condition case.
This commit was SVN r19152.
2008-08-04 20:29:50 +00:00
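The two-attempt ping described in the r19152 commit message could look something like the sketch below. The callback type `ping_fn_t`, the function names, and the test double `fake_ping` are all hypothetical stand-ins; the real code calls rml.ping, whose signature is not shown here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for rml.ping: true if the server answered in time. */
typedef bool (*ping_fn_t)(const char *uri, int timeout_sec, void *ctx);

/* The commit message says mpirun pings twice with the same timeout. */
#define SERVER_PING_ATTEMPTS 2

/* Returns true as soon as one ping succeeds, false if every attempt fails. */
static bool wait_for_server(const char *uri, int timeout_sec,
                            ping_fn_t ping, void *ctx)
{
    for (int attempt = 0; attempt < SERVER_PING_ATTEMPTS; ++attempt) {
        if (ping(uri, timeout_sec, ctx)) {
            return true;
        }
    }
    return false;
}

/* Upper bound on the total wait: twice the configured timeout. */
static int worst_case_wait_sec(int timeout_sec)
{
    return SERVER_PING_ATTEMPTS * timeout_sec;
}

/* Test double: fails until *ctx (an int counter) reaches zero, then succeeds. */
static bool fake_ping(const char *uri, int timeout_sec, void *ctx)
{
    int *remaining_failures = ctx;
    (void)uri;
    (void)timeout_sec;
    if (*remaining_failures > 0) {
        --*remaining_failures;
        return false;
    }
    return true;
}
```

With the default of 10 seconds, the worst case in this sketch is a 20 second wait before mpirun reports that the server cannot be contacted.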
|
|
|
bool wait_for_server;
|
|
|
|
int server_wait_timeout;
|
Roll in the revamped IOF subsystem. Per the devel mailing list email, this is a complete rewrite of the iof framework designed to simplify the code for maintainability, and to support features we had planned to do, but were too difficult to implement in the old code. Specifically, the new code:
1. completely and cleanly separates responsibilities between the HNP, orted, and tool components.
2. removes all wireup messaging during launch and shutdown.
3. maintains flow control for stdin to avoid large-scale consumption of memory by orteds when large input files are forwarded. This is done using an xon/xoff protocol.
4. enables specification of stdin recipients on the mpirun cmd line. Allowed options include rank, "all", or "none". Default is rank 0.
5. creates a new MPI_Info key "ompi_stdin_target" that supports the above options for child jobs. Default is "none".
6. adds a new tool "orte-iof" that can connect to a running mpirun and display the output. Cmd line options allow selection of any combination of stdout, stderr, and stddiag. Default is stdout.
7. adds a new mpirun and orte-iof cmd line option "tag-output" that will tag each line of output with process name and stream ident. For example, "[1,0]<stdout>this is output"
This is not intended for the 1.3 release as it is a major change requiring considerable soak time.
This commit was SVN r19767.
2008-10-18 00:00:49 +00:00
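As an illustration of the "tag-output" format in item 7 of the r19767 commit message, the sketch below reproduces the example line. The helper name `format_tagged_line` and its parameters are hypothetical, and reading the two numbers in "[1,0]" as a job and a rank is an assumption; the commit message only calls them the process name.

```c
#include <stdio.h>

/*
 * Hypothetical helper: prefix one line of output with a process name and
 * stream identifier, in the form "[job,rank]<stream>text".
 * Returns what snprintf returns: the length the full line needs, which
 * exceeds size - 1 if the result was truncated.
 */
static int format_tagged_line(char *buf, size_t size,
                              unsigned job, unsigned rank,
                              const char *stream, const char *text)
{
    return snprintf(buf, size, "[%u,%u]<%s>%s", job, rank, stream, text);
}
```

Calling it with job 1, rank 0, stream "stdout", and the text "this is output" yields the line quoted in the commit message.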
|
|
|
char *stdin_target;
|
2007-07-10 12:53:48 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
/**
|
2008-03-06 19:35:57 +00:00
|
|
|
* Struct holding values gleaned from the orterun command line -
|
|
|
|
* needed by debugger init
|
2007-07-10 12:53:48 +00:00
|
|
|
*/
|
2008-03-06 19:35:57 +00:00
|
|
|
ORTE_DECLSPEC extern struct orterun_globals_t orterun_globals;
|
2007-07-10 12:53:48 +00:00
|
|
|
|
|
|
|
END_C_DECLS
|
|
|
|
|
2005-08-31 16:15:59 +00:00
|
|
|
#endif /* ORTERUN_ORTERUN_H */
|