Jeff Squyres
501a86afe1
No need to include the generated files in the tarball. Thanks to Eugene for pointing this out.
This commit was SVN r26339.
2012-04-25 14:19:18 +00:00
Ralph Castain
71805bf7e4
Clear out the startup_timeout event if the job did in fact start. Have ORTE_TERMINATE use the job state macro so debug output will show where it was called.
...
This commit was SVN r26334.
2012-04-25 01:05:17 +00:00
Jeff Squyres
708b497968
Be sure to unset the iof "active" flag after the libevent read callback fires (it's already reset once we queue up the read event again). Failure to unset the active flag would cause other logic to not queue up the read event again, because it thought the read event was still active.
This commit was SVN r26311.
2012-04-23 15:58:12 +00:00
Ralph Castain
7999266f99
Silence warning by removing unused var
...
This commit was SVN r26275.
2012-04-17 22:34:48 +00:00
Ralph Castain
f68487016c
Add test code from Terry. Properly terminate if we don't abort on non-zero exit
...
This commit was SVN r26271.
2012-04-16 16:44:23 +00:00
Ralph Castain
ddfbde587f
Change the default to "abort" the job when any process exits with a non-zero status. Add the required code to ensure the orted tells the HNP about the problem.
...
This commit was SVN r26270.
2012-04-13 21:19:46 +00:00
Ralph Castain
7741ba47be
Fix comm_spawn that spans multiple nodes
...
This commit was SVN r26268.
2012-04-13 01:59:07 +00:00
Ralph Castain
4d16790836
Fix collectives for jobs running across partial allocations
...
This commit was SVN r26267.
2012-04-13 00:38:47 +00:00
Ralph Castain
5d14fa7546
Fix mpi_abort, minimize error output.
...
This commit was SVN r26266.
2012-04-11 14:37:08 +00:00
Ralph Castain
d3dfba3872
Fix the scenario where an MPI error handler causes a proc to exit after finalize, but with non-zero status to indicate an error occurred.
...
This commit was SVN r26265.
2012-04-11 02:23:46 +00:00
Ralph Castain
9cd4c06488
Get things to build and run when --disable-orte is specified
...
This commit was SVN r26263.
2012-04-10 21:50:01 +00:00
Ralph Castain
14d5525fb1
Some minor cleanups. Get singletons working. Cleanup abort handling so it gets properly identified.
...
This commit was SVN r26261.
2012-04-10 19:08:54 +00:00
Ralph Castain
53bbcf4b5b
Plug slot allocation leak
...
This commit was SVN r26260.
2012-04-10 14:56:24 +00:00
Ralph Castain
f5cd996b91
Fix the case where n=1
...
This commit was SVN r26258.
2012-04-09 22:44:56 +00:00
Ralph Castain
a34be856aa
Now that we have PMI support, this is no longer needed
...
This commit was SVN r26254.
2012-04-07 13:36:24 +00:00
Ralph Castain
71f9e69c62
Remove stale code
...
This commit was SVN r26253.
2012-04-07 13:34:12 +00:00
Ralph Castain
19630ca28d
Remove stale code
...
This commit was SVN r26252.
2012-04-07 13:33:40 +00:00
Ralph Castain
93bbeabc55
Remove stale code
...
This commit was SVN r26251.
2012-04-07 13:33:30 +00:00
Ralph Castain
b6cde9a8d1
Remove stale code
...
This commit was SVN r26250.
2012-04-07 13:33:18 +00:00
Ralph Castain
48de3a2501
Minor fixes so orte_progress_thread can work
...
This commit was SVN r26248.
2012-04-06 15:50:49 +00:00
George Bosilca
319f76d66a
Low hanging fruit. Remove a declared but not defined function.
...
This commit was SVN r26245.
2012-04-06 15:43:28 +00:00
Ralph Castain
ed197acaa2
Eliminate stale code
...
This commit was SVN r26244.
2012-04-06 15:31:13 +00:00
Ralph Castain
bd8b4f7f1e
Sorry for mid-day commit, but I had promised on the call to do this upon my return.
...
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.
Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.
This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Josh Hursey
1941f6b3b1
Cleanup some compiler warnings when doing an optimized/non-debug build.
...
This commit was SVN r26236.
2012-04-04 20:40:16 +00:00
Ralph Castain
ca3ff58c76
Ensure we get a non-zero exit status when we can't find the specified fork agent. Output a better error message, and ensure we don't multiply report the problem.
...
This commit was SVN r26191.
2012-03-24 00:49:38 +00:00
Ralph Castain
6dc44dc4b8
Look at the basename of the appname for the "java" keyword
...
This commit was SVN r26190.
2012-03-24 00:38:18 +00:00
Ralph Castain
46b040c79f
Fix typo
...
This commit was SVN r26189.
2012-03-24 00:31:05 +00:00
Ralph Castain
2bd75ec7e3
Fix Cray XE builds - the priority here needs to equal that of the HNP component so that both build. Otherwise, mpirun tries to use PMI for its basis, and that doesn't work!
...
This commit was SVN r26188.
2012-03-23 20:06:34 +00:00
Ralph Castain
811413e9bc
Correctly handle multiple cpu-set ranges. Correctly support optional binding directives combined with cpu-set.
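As an illustration of what handling multiple cpu-set ranges involves, here is a minimal sketch of a range parser. The comma-separated `lo-hi` syntax, the function name `parse_cpu_set`, and the byte-per-cpu mask are assumptions made for this example; none of them is taken from the Open MPI source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse a spec such as "0-3,8-11" into a byte mask.
 * Returns the number of distinct cpus marked, or -1 on malformed input.
 * On failure the mask may already be partially filled. */
int parse_cpu_set(const char *spec, unsigned char *mask, int ncpus)
{
    assert(spec != NULL && mask != NULL && ncpus > 0);
    int count = 0;
    const char *p = spec;

    while (*p != '\0') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p) {
            return -1;                 /* no digits where a cpu id was expected */
        }
        long hi = lo;
        if (*end == '-') {             /* a range "lo-hi" rather than a single id */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p) {
                return -1;             /* range with no upper bound */
            }
        }
        if (lo < 0 || hi < lo || hi >= ncpus) {
            return -1;                 /* out of bounds or reversed range */
        }
        for (long c = lo; c <= hi; c++) {
            if (!mask[c]) {            /* count each cpu once, even if ranges overlap */
                mask[c] = 1;
                count++;
            }
        }
        if (*end == ',') {
            end++;
            if (*end == '\0') {
                return -1;             /* trailing comma */
            }
        } else if (*end != '\0') {
            return -1;                 /* unexpected separator */
        }
        p = end;
    }
    return count;
}
```

With a zeroed 16-entry mask, the spec "0-3,8-11" marks eight cpus and leaves cpus 4 through 7 clear.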
...
This commit was SVN r26187.
2012-03-23 14:50:41 +00:00
Ralph Castain
ce0caf7567
Support -cpu-set by binding to the specified cpus in the absence of any other binding directive. Allows users to subdivide nodes for multiple parallel mpirun invocations.
...
This commit was SVN r26186.
2012-03-23 14:05:52 +00:00
Ralph Castain
33ed3cda07
Update the gridengine allocator to support data from multiple queues by checking for duplicate node entries
...
This commit was SVN r26148.
2012-03-15 17:45:50 +00:00
Josh Hursey
4dd9f89a99
Create an MCA parameter (ess_base_stream_buffering) that allows the user to override the system default for buffering of stdout/stderr streams. See 'man setvbuf' for more information.
...
Note: I am working on a system that buffered all output until the application finished, due to a default of 'fully buffered.' This makes debugging painful. This switch fixed the problem by allowing me to adjust the buffering.
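For readers unfamiliar with setvbuf, the following sketch shows how an integer setting could be mapped onto the three standard buffering modes. The function name `apply_stream_buffering` and the value mapping (0 = unbuffered, 1 = line buffered, 2 = fully buffered, anything else = leave the system default) are assumptions for illustration; the commit message does not state which values the parameter accepts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: apply a buffering level to a stream.
 * Must be called before any other operation on the stream, as setvbuf requires.
 * Returns 0 on success, nonzero if setvbuf fails. */
int apply_stream_buffering(FILE *stream, int level)
{
    assert(stream != NULL);
    switch (level) {
    case 0:
        return setvbuf(stream, NULL, _IONBF, 0);       /* unbuffered */
    case 1:
        return setvbuf(stream, NULL, _IOLBF, BUFSIZ);  /* line buffered */
    case 2:
        return setvbuf(stream, NULL, _IOFBF, BUFSIZ);  /* fully buffered */
    default:
        return 0;                                      /* keep the system default */
    }
}
```

Calling `apply_stream_buffering(stdout, 1)` at startup would make output appear line by line, even when stdout is a pipe and would otherwise default to full buffering.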
This commit was SVN r26119.
2012-03-08 22:02:28 +00:00
Josh Hursey
a595525366
Add the caller's name to the 'comm failed' error message, so we know between which two peers the communication failed.
...
This commit was SVN r26117.
2012-03-08 21:55:19 +00:00
Ralph Castain
e71e871bae
Initialize sink location when stdin is forwarded to all ranks
...
This commit was SVN r26107.
2012-03-06 15:47:04 +00:00
Ralph Castain
366f9d1518
Add some missing localities to the hwloc pretty-print, fix pmi modex
...
This commit was SVN r26105.
2012-03-06 06:21:10 +00:00
Ralph Castain
c3cf46af65
Ensure install_dirs are filled in before parsing prefix
...
This commit was SVN r26093.
2012-03-03 23:14:15 +00:00
Ralph Castain
75f738bfce
Fix one last place
...
This commit was SVN r26092.
2012-03-03 00:39:37 +00:00
Ralph Castain
834a86420b
Ensure we use the slurm module for slurm environments, and correct init order in pmi module when used by daemons
...
This commit was SVN r26089.
2012-03-02 23:10:48 +00:00
Ralph Castain
53edc28fe5
Fix memory issue - pidmap relative locality is only defined for apps.
...
This commit was SVN r26088.
2012-03-02 23:10:10 +00:00
Ralph Castain
b8f093d1a0
Switch precedence - take the --prefix value over the absolute-path-to-mpirun so the backend prefix can be different from that of mpirun on hetero machines.
...
This commit was SVN r26085.
2012-03-02 22:59:13 +00:00
Ralph Castain
6c93dd13b0
Cleanup the prefix handling by mpirun. Important note: we do NOT support per-app_context prefixes!!
...
Don't let app_files trump given prefix values. Assign according to following precedence rules:
1. absolute path to mpirun, if given
2. --prefix value, if given to mpirun
3. default prefix, if configured with --enable-orterun-prefix-default
4. prefix from first app in app_file, if given
5. no prefix
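The precedence rules above amount to "first source that is set wins." A minimal sketch follows, assuming each source is available as a string that is NULL when not given; the struct, its field names, and `select_prefix` are hypothetical and do not come from the orterun code. Note that r26085 above later swaps the first two rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the four possible prefix sources.
 * A NULL field means that source was not given. */
typedef struct {
    const char *mpirun_abs_prefix; /* 1. from an absolute path to mpirun */
    const char *cli_prefix;        /* 2. --prefix value given to mpirun */
    const char *default_prefix;    /* 3. set only with --enable-orterun-prefix-default */
    const char *first_app_prefix;  /* 4. prefix from the first app in the app_file */
} prefix_sources_t;

/* Return the highest-precedence prefix that is set, or NULL for rule 5. */
const char *select_prefix(const prefix_sources_t *src)
{
    assert(src != NULL);
    const char *candidates[] = {
        src->mpirun_abs_prefix,
        src->cli_prefix,
        src->default_prefix,
        src->first_app_prefix,
    };
    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (candidates[i] != NULL) {
            return candidates[i];
        }
    }
    return NULL; /* 5. no prefix */
}
```

Keeping the candidates in one ordered array means a precedence change is a matter of reordering the initializer.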
This commit was SVN r26081.
2012-03-02 19:48:25 +00:00
Ralph Castain
ceb34ed0c9
Fix typo
...
This commit was SVN r26079.
2012-03-02 09:58:09 +00:00
Jeff Squyres
97b3603036
A bunch of fixes and improvements to Open MPI's various command line tools.
...
* fixed some bugs where "unknown" tokens were allowed on the command
line (which should really only be used for orterun).
* if an unknown token is encountered, print a short error to stderr
and quit with a nonzero exit status
* if we don't find the right number of parameters to an option, print
a short error to stderr and quit with a nonzero exit status
* when --help is given, print the help message to stdout (not stderr)
and quit with a zero exit status
* added --showme:help option to the wrapper compilers
* updated docs in opal/util/cmd_line.h
* other small/miscellaneous CLI parsing bugs in various tools
I won't bore you with what we did before. :-) Here are some examples
of what the new behavior looks like:
{{{
% ompi_info --bogus
ompi_info: Error: unknown option "--bogus"
Type 'ompi_info --help' for usage.
% ompi_info --param bogus
ompi_info: Error: option "--param" did not have enough parameters (2)
Type 'ompi_info --help' for usage.
%
}}}
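The exit-status and stream conventions described above can be sketched as follows. This is only an illustration of the behavior, not the opal/util/cmd_line.h implementation: `check_args` is a hypothetical function, it recognizes only `--help`, and the usage text is a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical checker: returns the exit status the tool should use.
 * Help goes to `out` with status 0; errors go to `err` with status 1. */
int check_args(const char *tool, int argc, char **argv, FILE *out, FILE *err)
{
    assert(tool != NULL && out != NULL && err != NULL);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--help") == 0) {
            fprintf(out, "Usage: %s [--help]\n", tool);  /* placeholder usage text */
            return 0;
        }
        /* Anything else is an unknown token: short error, nonzero status. */
        fprintf(err, "%s: Error: unknown option \"%s\"\n", tool, argv[i]);
        fprintf(err, "Type '%s --help' for usage.\n", tool);
        return 1;
    }
    return 0;
}
```

A tool's `main` could then end with `return check_args(argv[0], argc, argv, stdout, stderr);` so that scripts can rely on the exit status.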
This commit was SVN r26072.
2012-02-29 17:52:38 +00:00
Ralph Castain
b2f1bade37
Fix the -H localhost issue
...
This commit was SVN r26071.
2012-02-29 16:56:00 +00:00
Jeff Squyres
81dc6a11ee
Fix typo in copyright notice, found by Paul Hargrove
...
This commit was SVN r26070.
2012-02-29 02:02:54 +00:00
Ralph Castain
3d718863a8
Fix typo - thanks Pascal
...
This commit was SVN r26064.
2012-02-28 14:33:55 +00:00
Ralph Castain
bc5886707f
Document the mpirun exit status behavior
...
This commit was SVN r26009.
2012-02-22 23:47:00 +00:00
Ralph Castain
a83da303c5
When using PMI, we know the ranks that share our node and their relative local/node ranks. Save that info in the pidmap array so that BTLs that require early knowledge of local ranks can access it.
...
This commit was SVN r25992.
2012-02-21 16:43:17 +00:00
Jeff Squyres
b6a90434e4
Fix some include file header ordering issues for some BSDs, suggested by Paul Hargrove.
This commit was SVN r25984.
2012-02-21 13:32:14 +00:00
Ralph Castain
47c64ec837
Roll in Java bindings per telecon discussion. Man pages still under revision
...
This commit was SVN r25973.
2012-02-20 22:12:43 +00:00