Commit graph

3576 commits

Author SHA1 Message Date
Ralph Castain
da28a4b0e6 Silence warning
This commit was SVN r26479.
2012-05-23 13:59:22 +00:00
Jeff Squyres
7969faf372 Fixes trac:3057: minor update to the man page to state that slot locations
in rankfiles use ''physical'' device indexes (vs. logical indexes).

This commit was SVN r26478.

The following Trac tickets were found above:
  Ticket 3057 --> https://svn.open-mpi.org/trac/ompi/ticket/3057
2012-05-23 11:43:33 +00:00
Nathan Hjelm
b9959a95cd ack! one more
This commit was SVN r26472.
2012-05-22 20:52:52 +00:00
Nathan Hjelm
f2d4e95429 doh! add missing include
This commit was SVN r26471.
2012-05-22 20:49:13 +00:00
Nathan Hjelm
cdc3c87ba6 move pmi init/finalize into a common component
This commit was SVN r26470.
2012-05-22 15:15:39 +00:00
Nathan Hjelm
78b8b3cf76 bug fix: actually close ess components
This commit was SVN r26469.
2012-05-22 15:09:18 +00:00
Ralph Castain
b217124bd8 Symlink instead of copy
This commit was SVN r26464.
2012-05-21 23:07:48 +00:00
Ralph Castain
da3873af6f Rename the mapreduce tool to "mr+" per the marketing types
This commit was SVN r26463.
2012-05-21 21:17:44 +00:00
Nathan Hjelm
6eeca66475 add an option to enable static ports. disabled by default
This commit was SVN r26462.
2012-05-21 19:56:15 +00:00
Ralph Castain
83d69b6c95 Enable the ORTE progress thread for apps (not needed in the tools as they already continuously loop in the event lib). This appears to be working, at least for MPI apps that only use shared memory (a simple "hello"). More testing is required to identify where problems will occur - this is only intended to allow further development.
In order to use the progress thread, you must configure with:

--enable-orte-progress-threads --enable-event-thread-support

This commit was SVN r26457.
2012-05-20 15:14:43 +00:00
Ralph Castain
c4f8043064 Per Nathan, with a little cleanup by me: update the PMI support to aggregate modex info, thus reducing the number of keys required so it fits within Cray default constraints
This commit was SVN r26456.
2012-05-19 16:12:52 +00:00
Ralph Castain
a526afae92 Ensure we always cleanup local procs, no matter how we exited.
This commit was SVN r26454.
2012-05-18 23:37:40 +00:00
Ralph Castain
12ebc0e269 Don't need this to be a bin program as the class is captured in the jar
This commit was SVN r26453.
2012-05-18 23:37:18 +00:00
Ralph Castain
b16e43f489 Silence a warning on Mac
This commit was SVN r26449.
2012-05-18 15:27:04 +00:00
Ralph Castain
ca1b325738 Tweak the java setup so it works better on Mac. Only build mapreduce and allocators if hadoop support was requested.
This commit was SVN r26448.
2012-05-18 01:02:01 +00:00
Jeff Squyres
cab31eafce Revert r26413: it was causing too much confusion. When an MPI proc
exits with status 77, the whole job will be killed, but mpirun will
still return an exit status of 77, so MTT will report it as a skip
anyway. 

This commit was SVN r26445.

The following SVN revision numbers were found above:
  r26413 --> open-mpi/ompi@02aa36f2e5
2012-05-16 14:45:58 +00:00
Jeff Squyres
dab7d36a81 Fix location of the default hostfile. Thanks to Götz Waschk for
identifying the problem.

This commit was SVN r26441.
2012-05-15 16:13:39 +00:00
Jeff Squyres
2d78728d38 Fix the macro name in the comment: it's EXTRA_DIST, not EXTRA_SOURCES.
This commit was SVN r26429.
2012-05-10 14:07:36 +00:00
Jeff Squyres
b325c17c72 It's a little weird to put in a blank _SOURCES line for the
HDFSFileFinder PROGRAM, but if we don't put in a _SOURCES line at all,
Automake will default to "HDFSFileFinder_class_SOURCES =
HDFSFileFinder.c", which clearly will cause problems.  

But we don't want to put the .java file in _SOURCES, either, because
we haven't configured Automake to handle Java (because current
versions of Automake only have GCJ, not other Java compilers).  So set
HDFSFileFinder_class_SOURCES to blank and list the .java file in
EXTRA_SOURCES (so that they get picked up for "make dist").

This commit was SVN r26424.
2012-05-10 13:54:51 +00:00
Ralph Castain
b9d560263f Ensure we properly handle systems that do not have a jdk installed
This commit was SVN r26421.
2012-05-10 12:06:59 +00:00
Ralph Castain
b143633593 Fix java config
This commit was SVN r26420.
2012-05-10 01:51:02 +00:00
Ralph Castain
640f0610aa Fix the makefile to install the perl scripts properly
This commit was SVN r26416.
2012-05-09 14:06:02 +00:00
Ralph Castain
fd796cce0a Add an allocator tool for finding HDFS file locations and obtaining allocations for those nodes (supports both Hadoop 1 and 2). Split the Java support into two parts: detection of Java support and request for Java bindings.
This commit was SVN r26414.
2012-05-09 01:13:49 +00:00
Jeff Squyres
02aa36f2e5 ORTE defaults to killing the entire job when any process exits with a
nonzero status (we polled other MPI implementations since no one in
the OMPI community had a concrete opinion on what behavior to do here
-- all other MPIs seem to adhere to this behavior, too).

This commit adds an MCA parameter that allows us to tell ORTE to
''not'' kill jobs when a process exits with a status of 77, meaning
the GNU testing standard of "this test was skipped".  In all the OMPI
tests, all procs will either return 77 or not.  So if they all return
77, mpirun won't consider it an error, but will still return an exit
status of 77 (so that MTT can know that the test was cleanly skipped).
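
The decision described above can be sketched roughly as follows. This is a minimal illustration in Python, not ORTE code: the names `should_kill_job` and `skip_aware` are hypothetical, and `skip_aware` merely stands in for the MCA parameter, whose real name is not given in this message.

```python
SKIP_STATUS = 77  # GNU testing convention: "this test was skipped"


def should_kill_job(exit_status: int, skip_aware: bool = False) -> bool:
    """Decide whether one process's exit status should kill the whole job.

    skip_aware is a hypothetical stand-in for the MCA parameter that the
    commit message describes; it is off by default, matching the default
    "kill on any nonzero status" behavior.
    """
    if exit_status == 0:
        # Clean exit: never a reason to kill the job.
        return False
    if skip_aware and exit_status == SKIP_STATUS:
        # A "skipped" test is tolerated when the parameter is set.
        return False
    # Any other nonzero status kills the job.
    return True
```

With `skip_aware` left off, a status of 77 is treated like any other nonzero status.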

This commit was SVN r26413.
2012-05-08 21:49:05 +00:00
Ralph Castain
84d031d6c1 Add daemon object to job array after creation
This commit was SVN r26406.
2012-05-08 13:39:20 +00:00
Ralph Castain
70a106fa71 Fix binding on remote nodes - need to pass the binding bitmap!
This commit was SVN r26403.
2012-05-08 03:52:39 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
 * Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Ralph Castain
44b8608f0a Convert debug to verbose
This commit was SVN r26384.
2012-05-05 17:46:10 +00:00
Ralph Castain
96bfeb591c Ensure flag is passed to remote daemons
This commit was SVN r26383.
2012-05-03 22:31:25 +00:00
Ralph Castain
45fee2b491 Resolve the case where only the HNP is in the system (i.e., single-node operation)
This commit was SVN r26382.
2012-05-03 18:00:01 +00:00
Ralph Castain
c352ca36c2 Minor cleanup
This commit was SVN r26381.
2012-05-02 21:23:37 +00:00
Ralph Castain
b2f77bf08f Extend the iof by adding two new components to support map-reduce IO chaining. Add a mapreduce tool for running such applications.
Fix the state machine to support multiple jobs being simultaneously launched as this is not only required for mapreduce, but can happen under comm-spawn applications as well.

This commit was SVN r26380.
2012-05-02 21:00:22 +00:00
Ralph Castain
40c2fc5f55 Update the tests, add a couple
This commit was SVN r26379.
2012-05-02 19:00:05 +00:00
Ralph Castain
c5da4f24d7 Fix stupid singletons - get the pidmap message correct
This commit was SVN r26378.
2012-05-02 17:48:02 +00:00
Ralph Castain
8f7bf3344a Update test
This commit was SVN r26370.
2012-05-01 18:38:44 +00:00
Ralph Castain
4542070cf2 Add event priority inversion test
This commit was SVN r26369.
2012-05-01 16:42:22 +00:00
Ralph Castain
a8db2fc95f Add procs to each node's map on the daemons
This commit was SVN r26368.
2012-05-01 16:41:35 +00:00
Ralph Castain
a927318ea1 Add -N option as synonym for "npernode"
This commit was SVN r26367.
2012-05-01 16:18:14 +00:00
Ralph Castain
9f724db182 Remove duplicate event assignment
This commit was SVN r26360.
2012-04-30 16:06:20 +00:00
Ralph Castain
289f9f41ec From long-term discussions, have the daemons use the node_t and proc_t structs and arrays instead of the pidmap and nidmap arrays. Sets the stage for future work.
This commit was SVN r26359.
2012-04-29 00:10:01 +00:00
Ralph Castain
47a5e30095 Ensure debug output levels if we are debugging
This commit was SVN r26358.
2012-04-29 00:03:28 +00:00
Ralph Castain
f3e3704c9e Per request from Brian, enable mapping of stddiag output (output from opal_output calls) to stderr of the local process. This allows you to obtain that output in a local window (for example, when using xterm for each process) instead of having it automatically forwarded to mpirun. This is turned on automatically whenever someone uses the -xterm option, and can be set manually using the orte_map_stddiag_to_stderr mca param.
This commit was SVN r26352.
2012-04-27 14:39:34 +00:00
Jeff Squyres
46f47e08b6 Remove typo/extra brackets and parens.
This commit was SVN r26351.
2012-04-27 13:48:43 +00:00
Jeff Squyres
9d0df5a9a6 Update configury in the new oob ud component: actually check to see if
it succeeds and run $1 or $2, accordingly.  This allows "make dist" to
run properly on machines that do not have OpenFabrics stuff installed
(e.g., the nightly tarball build machine).

There's still more to be done here -- it doesn't check for non-uniform
directories where the OpenFabrics headers/libraries might be
installed.  We might need to re-tool/combine
ompi/config/ompi_check_openib.m4 (which checks for way more than
oob/ud needs) and move it up to config/ompi_check_ofa.m4, or
something...?

This commit was SVN r26350.
2012-04-27 11:32:56 +00:00
Jeff Squyres
9829d2279f System-level includes should be at the top of the file, before most
OPAL/ORTE/OMPI includes.

This commit was SVN r26349.
2012-04-27 11:29:22 +00:00
Ralph Castain
38af7db183 Ensure the progress message comes out right away. Otherwise, on a large system where proc state messages are arriving frequently, the message doesn't get printed until the launch is done!
This commit was SVN r26346.
2012-04-26 23:41:03 +00:00
Nathan Hjelm
e1e0d466e5 Merge ssh://ct-fe1/usr/projects/hpctools/hjelmn/ompi-trunk-git into HEAD
This commit was SVN r26344.
2012-04-26 22:06:12 +00:00
Ralph Castain
3461809341 Fix reporting of launch progress so the numbers are correct and appear when they should
This commit was SVN r26342.
2012-04-26 00:10:09 +00:00
Ralph Castain
3b5b185c86 Don't double free timer events
This commit was SVN r26341.
2012-04-25 17:36:12 +00:00
Jeff Squyres
501a86afe1 No need to include the generated files in the tarball. Thanks to
Eugene for pointing this out.

This commit was SVN r26339.
2012-04-25 14:19:18 +00:00
Ralph Castain
71805bf7e4 Clear out the startup_timeout event if the job did in fact start. Have ORTE_TERMINATE use the job state macro so debug will show where it was called
This commit was SVN r26334.
2012-04-25 01:05:17 +00:00
Jeff Squyres
708b497968 Be sure to unset the iof "active" flag after the libevent read callback
fires (it's already reset once we queue up the read event again).  Failure
to unset the active flag would cause other logic to not queue up the
read event again, because it thought the read event was still active.

This commit was SVN r26311.
2012-04-23 15:58:12 +00:00
Ralph Castain
7999266f99 Silence warning by removing unused var
This commit was SVN r26275.
2012-04-17 22:34:48 +00:00
Ralph Castain
f68487016c Add test code from Terry. Properly terminate if we don't abort on non-zero exit
This commit was SVN r26271.
2012-04-16 16:44:23 +00:00
Ralph Castain
ddfbde587f Change the default to "abort" the job when any process exits with a non-zero status. Add the required code to ensure the orted tells the HNP about the problem.
This commit was SVN r26270.
2012-04-13 21:19:46 +00:00
Ralph Castain
7741ba47be Fix comm_spawn that spans multiple nodes
This commit was SVN r26268.
2012-04-13 01:59:07 +00:00
Ralph Castain
4d16790836 Fix collectives for jobs running across partial allocations
This commit was SVN r26267.
2012-04-13 00:38:47 +00:00
Ralph Castain
5d14fa7546 Fix mpi_abort, minimize error output.
This commit was SVN r26266.
2012-04-11 14:37:08 +00:00
Ralph Castain
d3dfba3872 Fix the scenario where an MPI error handler causes a proc to exit after finalize, but with non-zero status to indicate an error occurred.
This commit was SVN r26265.
2012-04-11 02:23:46 +00:00
Ralph Castain
9cd4c06488 Get things to build and run when --disable-orte is specified
This commit was SVN r26263.
2012-04-10 21:50:01 +00:00
Ralph Castain
14d5525fb1 Some minor cleanups. Get singletons working. Cleanup abort handling so it gets properly identified.
This commit was SVN r26261.
2012-04-10 19:08:54 +00:00
Ralph Castain
53bbcf4b5b Plug slot allocation leak
This commit was SVN r26260.
2012-04-10 14:56:24 +00:00
Ralph Castain
f5cd996b91 Fix the case where n=1
This commit was SVN r26258.
2012-04-09 22:44:56 +00:00
Ralph Castain
a34be856aa Now that we have PMI support, this is no longer needed
This commit was SVN r26254.
2012-04-07 13:36:24 +00:00
Ralph Castain
71f9e69c62 Remove stale code
This commit was SVN r26253.
2012-04-07 13:34:12 +00:00
Ralph Castain
19630ca28d Remove stale code
This commit was SVN r26252.
2012-04-07 13:33:40 +00:00
Ralph Castain
93bbeabc55 Remove stale code
This commit was SVN r26251.
2012-04-07 13:33:30 +00:00
Ralph Castain
b6cde9a8d1 Remove stale code
This commit was SVN r26250.
2012-04-07 13:33:18 +00:00
Ralph Castain
48de3a2501 Minor fixes so orte_progress_thread can work
This commit was SVN r26248.
2012-04-06 15:50:49 +00:00
George Bosilca
319f76d66a Low hanging fruit. Remove a declared but not defined function.
This commit was SVN r26245.
2012-04-06 15:43:28 +00:00
Ralph Castain
ed197acaa2 Eliminate stale code
This commit was SVN r26244.
2012-04-06 15:31:13 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Josh Hursey
1941f6b3b1 Cleanup some compiler warnings when doing an optimized/non-debug build.
This commit was SVN r26236.
2012-04-04 20:40:16 +00:00
Ralph Castain
ca3ff58c76 Ensure we get a non-zero exit status when we can't find the specified fork agent. Output a better error message, and ensure we don't multiply report the problem.
This commit was SVN r26191.
2012-03-24 00:49:38 +00:00
Ralph Castain
6dc44dc4b8 Look at the basename of the appname for the "java" keyword
This commit was SVN r26190.
2012-03-24 00:38:18 +00:00
Ralph Castain
46b040c79f Fix typo
This commit was SVN r26189.
2012-03-24 00:31:05 +00:00
Ralph Castain
2bd75ec7e3 Fix Cray XE builds - the priority here needs to equal that of the HNP component so that both build. Otherwise, mpirun tries to use PMI for its basis, and that doesn't work!
This commit was SVN r26188.
2012-03-23 20:06:34 +00:00
Ralph Castain
811413e9bc Correctly handle multiple cpu-set ranges. Correctly support optional binding directives combined with cpu-set.
This commit was SVN r26187.
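
As an illustration of what handling multiple cpu-set ranges involves, here is a small Python sketch. It is not the ORTE parser. It assumes the cpu-set value is a comma-separated list of single CPU ids and inclusive `lo-hi` ranges (for example `0-3,8,10-11`); the function name `parse_cpu_set` is hypothetical.

```python
from typing import List


def parse_cpu_set(spec: str) -> List[int]:
    """Expand a cpu-set string such as "0-3,8,10-11" into sorted CPU ids.

    Assumed syntax (not taken from the commit): comma-separated items,
    each either a single id or an inclusive "lo-hi" range.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty items such as a trailing comma
        if "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
            if lo > hi:
                raise ValueError(f"invalid range: {part!r}")
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

Collecting ids in a set means overlapping ranges are merged rather than listed twice.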
2012-03-23 14:50:41 +00:00
Ralph Castain
ce0caf7567 Support -cpu-set by binding to the specified cpus in the absence of any other binding directive. Allows users to subdivide nodes for multiple parallel mpirun invocations.
This commit was SVN r26186.
2012-03-23 14:05:52 +00:00
Ralph Castain
33ed3cda07 Update the gridengine allocator to support data from multiple queues by checking for duplicate node entries
This commit was SVN r26148.
2012-03-15 17:45:50 +00:00
Josh Hursey
4dd9f89a99 Create an MCA parameter (ess_base_stream_buffering) that allows the user to override the system default for buffering of stdout/stderr streams. See 'man setvbuf' for more information.
Note: I am working on a system that buffered all output until the application finished due to a default of 'fully buffered.' This makes debugging painful. This switch fixed the problem by allowing me to adjust the buffering.

This commit was SVN r26119.
2012-03-08 22:02:28 +00:00
Josh Hursey
a595525366 Add the caller's name to the 'comm failed' error message, so we know between which two peers the communication failed.
This commit was SVN r26117.
2012-03-08 21:55:19 +00:00
Ralph Castain
e71e871bae Initialize sink location when stdin is forwarded to all ranks
This commit was SVN r26107.
2012-03-06 15:47:04 +00:00
Ralph Castain
366f9d1518 Add some missing localities to the hwloc pretty-print, fix pmi modex
This commit was SVN r26105.
2012-03-06 06:21:10 +00:00
Ralph Castain
c3cf46af65 Ensure install_dirs are filled in before parsing prefix
This commit was SVN r26093.
2012-03-03 23:14:15 +00:00
Ralph Castain
75f738bfce Fix one last place
This commit was SVN r26092.
2012-03-03 00:39:37 +00:00
Ralph Castain
834a86420b Ensure we use the slurm module for slurm environments, and correct init order in pmi module when used by daemons
This commit was SVN r26089.
2012-03-02 23:10:48 +00:00
Ralph Castain
53edc28fe5 Fix memory issue - pidmap relative locality is only defined for apps.
This commit was SVN r26088.
2012-03-02 23:10:10 +00:00
Ralph Castain
b8f093d1a0 Switch precedence - take the --prefix value over the absolute-path-to-mpirun so the backend prefix can be different from that of mpirun on hetero machines.
This commit was SVN r26085.
2012-03-02 22:59:13 +00:00
Ralph Castain
6c93dd13b0 Cleanup the prefix handling by mpirun. Important note: we do NOT support per-app_context prefixes!!
Don't let app_files trump given prefix values. Assign according to following precedence rules:

1. absolute path to mpirun, if given
2. --prefix value, if given to mpirun
3. default prefix, if configured with --enable-orterun-prefix-default
4. prefix from first app in app_file, if given
5. no prefix
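
The precedence list above can be sketched as a simple first-match lookup. This is an illustrative Python sketch, not the mpirun source: the function and parameter names are hypothetical, and deriving the prefix from an absolute mpirun path by dropping the last two path components (`<prefix>/bin/mpirun`) is an assumption.

```python
import posixpath
from typing import Optional


def resolve_prefix(mpirun_path: Optional[str] = None,
                   prefix_opt: Optional[str] = None,
                   configured_default: Optional[str] = None,
                   app_file_prefix: Optional[str] = None) -> Optional[str]:
    """Return the prefix chosen by the five rules, or None for "no prefix"."""
    # Rule 1: absolute path to mpirun, if given.
    # Assumption: mpirun lives in <prefix>/bin, so strip two components.
    if mpirun_path and posixpath.isabs(mpirun_path):
        return posixpath.dirname(posixpath.dirname(mpirun_path))
    # Rules 2-4, in order: --prefix value, the default prefix (only present
    # when configured with --enable-orterun-prefix-default), and the prefix
    # from the first app in the app_file.
    for candidate in (prefix_opt, configured_default, app_file_prefix):
        if candidate:
            return candidate
    # Rule 5: no prefix.
    return None
```

For example, `resolve_prefix("/opt/ompi/bin/mpirun", prefix_opt="/usr/local")` picks `/opt/ompi` under these rules, because the absolute path is checked first.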

This commit was SVN r26081.
2012-03-02 19:48:25 +00:00
Ralph Castain
ceb34ed0c9 Fix typo
This commit was SVN r26079.
2012-03-02 09:58:09 +00:00
Jeff Squyres
97b3603036 A bunch of fixes and improvements to Open MPI's various command line tools.
 * fixed some bugs where "unknown" tokens were allowed on the command
   line (which should really only be used for orterun).
 * if an unknown token is encountered, print a short error to stderr
   and quit with a nonzero exit status
 * if we don't find the right number of parameters to an option, print
   a short error to stderr and quit with a nonzero exit status
 * when --help is given, print the help message to stdout (not stderr)
   and quit with a zero exit status
 * added --showme:help option to the wrapper compilers
 * updated docs in opal/util/cmd_line.h
 * other small/miscellaneous CLI parsing bugs in various tools

I won't bore you with what we did before.  :-)  Here are some examples
of what the new behavior looks like:

{{{
% ompi_info --bogus
ompi_info: Error: unknown option "--bogus"
Type 'ompi_info --help' for usage.
% ompi_info --param bogus
ompi_info: Error: option "--param" did not have enough parameters (2)
Type 'ompi_info --help' for usage.
%
}}}
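
The rules listed above (unknown token, too few parameters, `--help`) can be sketched as a tiny checker. This Python sketch is only an illustration, not the code in opal/util/cmd_line.h: `check_command_line` and the `known` table are hypothetical, the exit status 1 is an assumption (the message says only "nonzero"), and the usage text is a placeholder. The error strings follow the example output shown above.

```python
from typing import Dict, List, Tuple


def check_command_line(tool: str, argv: List[str],
                       known: Dict[str, int]) -> Tuple[int, str, str]:
    """Return (exit_status, stdout_text, stderr_text) for a command line.

    `known` maps each accepted option to the number of parameters it takes.
    """
    hint = f"Type '{tool} --help' for usage.\n"
    i = 0
    while i < len(argv):
        token = argv[i]
        if token == "--help":
            # Help goes to stdout with a zero exit status.
            return 0, f"Usage: {tool} [options]\n", ""
        if token not in known:
            # Unknown token: short error on stderr, nonzero exit status.
            return 1, "", f'{tool}: Error: unknown option "{token}"\n' + hint
        needed = known[token]
        params = argv[i + 1:i + 1 + needed]
        if len(params) < needed:
            # Too few parameters: short error on stderr, nonzero exit status.
            message = (f'{tool}: Error: option "{token}" did not have '
                       f"enough parameters ({needed})\n")
            return 1, "", message + hint
        i += 1 + needed
    return 0, "", ""
```

Returning the two output streams as strings, instead of printing, keeps the sketch easy to check without capturing stdout and stderr.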

This commit was SVN r26072.
2012-02-29 17:52:38 +00:00
Ralph Castain
b2f1bade37 Fix the -H localhost issue
This commit was SVN r26071.
2012-02-29 16:56:00 +00:00
Jeff Squyres
81dc6a11ee Fix typo in copyright notice, found by Paul Hargrove
This commit was SVN r26070.
2012-02-29 02:02:54 +00:00
Ralph Castain
3d718863a8 Fix typo - thanks Pascal
This commit was SVN r26064.
2012-02-28 14:33:55 +00:00
Ralph Castain
bc5886707f Document the mpirun exit status behavior
This commit was SVN r26009.
2012-02-22 23:47:00 +00:00
Ralph Castain
a83da303c5 When using PMI, we know the ranks that share our node and their relative local/node ranks. Save that info in the pidmap array so that BTLs that require early knowledge of local ranks can access it.
This commit was SVN r25992.
2012-02-21 16:43:17 +00:00
Jeff Squyres
b6a90434e4 Fix some include file header ordering issues for some BSDs, suggested
by Paul Hargrove.

This commit was SVN r25984.
2012-02-21 13:32:14 +00:00
Ralph Castain
47c64ec837 Roll in Java bindings per telecon discussion. Man pages still under revision
This commit was SVN r25973.
2012-02-20 22:12:43 +00:00
Jeff Squyres
b295a01d8e Fix another configury error found by Paul Hargrove. Thanks, Paul!
This commit was SVN r25971.
2012-02-20 21:38:27 +00:00