Commit graph

807 commits

Author SHA1 Message Date
Jeff Squyres
9bd4b814db Fix one more nroff macro issue
This commit was SVN r28090.
2013-02-21 17:38:06 +00:00
Jeff Squyres
76fcd42bc3 Fix minor nroff macro issues.
This commit was SVN r28088.
2013-02-21 17:35:36 +00:00
Jeff Squyres
12e047e594 Update documentation for rankfiles in orterun.1:
 * Add a little more description of what rankfiles are
 * Update that we use logical numbering for socket:core notation
 * Mention +nX notation

This commit was SVN r28067.
2013-02-16 17:52:30 +00:00
Ralph Castain
c0b670bea8 I guess some profiling tools and debuggers require that the argv[0] of each rank be unique so they can create a filename based on that value. For those obscure cases, provide an mpirun cmd line option that indexes each argv[0] by rank
This commit was SVN r28064.
2013-02-15 20:20:49 +00:00
Ralph Castain
744ed49b2d Begin cleanup of the thread_lock calls in ORTE. We'll ignore the ones in the rml/oob for now as that code block is being rewritten anyway.
This commit was SVN r28053.
2013-02-13 01:53:12 +00:00
Brian Barrett
504a6d036f * Rather than use the extra_includes directive, add the extra includes (which is really just -I${includedir}/openmpi/ for devel headers) to CPPFLAGS, since all the other necessary -Is for devel headers (like libevent and hwloc) are added to CPPFLAGS.
* Clean up ${includedir} and ${libdir} for script wrapper compilers
* Update script wrapper compilers to work like the C wrapper compilers w.r.t static and dynamic linking
* Remove the ORTE script wrapper compilers since they didn't support the ${includedir} stuff and Ralph said they weren't used anymore.

This commit was SVN r28052.
2013-02-13 00:33:05 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept separate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The set of situations in which WRAPPER CPPFLAGS can be set by MCA components was
reduced to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Brian Barrett
0e799a93c3 Automake will ship the .in file whether or not the conditional is taken,
so don't install orte_wrapper_script when it's not used

This commit was SVN r27902.
2013-01-24 21:36:25 +00:00
Ralph Castain
6e2cabb87f Remove duplicate code
This commit was SVN r27889.
2013-01-23 02:07:06 +00:00
George Bosilca
e69dc00460 Don't duplicate headers or global variables.
This commit was SVN r27864.
2013-01-18 11:51:56 +00:00
Ralph Castain
c96cc2d5a0 In order to properly connect to debuggers like STAT, we need to get the hostname in its unstripped version for the MPIR_proctab. Unfortunately, we need a stripped version for Cray's alps launcher. So when we are stripping the hostname prefix, retain alias hostnames and add the ability to specify an alias to use in the proctab.
This commit was SVN r27863.
2013-01-18 05:00:05 +00:00
Ralph Castain
5b8de0b9f4 Ouch - opal_progress calls event_loop with a NO_BLOCK flag. So when run without progress threads, the ORTE tools were not blocking in the event lib as they should be. Avoid calling opal_progress inside ORTE by directly using the event_loop call instead of ORTE_WAIT_FOR_COMPLETION as parts of the OMPI layer are using that macro.
Thanks to George for spotting the problem.

This commit was SVN r27815.
2013-01-14 23:06:42 +00:00
Ralph Castain
72bea688f1 Fix typo
This commit was SVN r27717.
2012-12-23 18:13:39 +00:00
Ralph Castain
852a709c0e Add libopen-pal to the libraries as all these tools directly reference OPAL functions, and the list of OS's that don't support indirect linking grows (Mac and Ubuntu, for now).
This commit was SVN r27716.
2012-12-23 15:54:05 +00:00
Jeff Squyres
c5b0bcd9f7 Refs trac:3422
* Add some comments in the *-wrapper-data-txt.in files just so that
   someone doesn't forget in the future why we link in what we do in
   the MPI and ORTE wrapper compilers.
 * Update ompi_wrapper_script.in to match the new behavior.
 * Update orte_wrapper_script.in to support --openmpi:linkall (which
   is a no-op in this case)

This commit was SVN r27672.

The following Trac tickets were found above:
  Ticket 3422 --> https://svn.open-mpi.org/trac/ompi/ticket/3422
2012-12-14 16:34:20 +00:00
Jeff Squyres
f779b1ded9 Put back the static-library-detection stuff from r27668, with some
additional functionality.  Rationale (refs trac:3422):

 * Normal MPI applications only ever use the MPI API. Hence, -lmpi is
   sufficient (they'll never directly call ORTE or OPAL
   functions). This is arguably the most common case.
 * That being said, we do have some test programs (e.g., those in
   orte/test/mpi) that call MPI functions but also call ORTE/OPAL
   functions. I've also written the occasional MPI test program that
   calls opal_output, for example (there even might be a few tests in
   the IBM test suite that directly call ORTE/OPAL functions).
   * Even though this is not a common case, these applications should
     also compile/link with mpicc.
   * So we should add a --openmpi:linkall option that will also link
     in whatever is necessary to call ORTE/OPAL functions
   * Yes, we could hard-code "-lopen-rte -lopen-pal" in Makefiles, but
     we do reserve the right to change those library names and/or add
     others someday, so it's better to abstract out the names and let
     the wrapper supply whatever is necessary.
 * ORTE programs, however, are different. They almost always call OPAL
   functions (e.g., if they want to send a message, they must use the
   OPAL DSS). As such, it seems like the ORTE programs should always
   link in OPAL.

Therefore:

 * Add undocumented --openmpi:linkall flag to the wrapper compilers.
   See the comment in opal_wrapper.c for an explanation of what it
   does.  This flag is only intended for Open MPI developers -- not
   end users.  That's why it's undocumented.
 * Update orte/test/mpi/Makefile.am to add --openmpi:linkall
 * Make ortecc/ortec++'s wrapper data text files always explicitly
   link in libopen-pal

This commit was SVN r27670.

The following SVN revision numbers were found above:
  r27668 --> open-mpi/ompi@cf845897aa

The following Trac tickets were found above:
  Ticket 3422 --> https://svn.open-mpi.org/trac/ompi/ticket/3422
2012-12-13 22:31:37 +00:00
Jeff Squyres
cf845897aa Temporarily revert r27662 and r27667 because something wonky is
happening on OS X.  Grumble...

This commit was SVN r27668.

The following SVN revision numbers were found above:
  r27662 --> open-mpi/ompi@97cc916007
  r27667 --> open-mpi/ompi@529f6244ca
2012-12-11 23:08:14 +00:00
Jeff Squyres
97cc916007 Per discussion at the Open MPI developer meeting last week:
 1. Restore libopen-pal.la, libopen-rte.la, and libmpi.la to be
    separate entities (i.e., don't have libopen-rte.la include
    libopen-pal.la, and don't have libmpi.la include libopen-pal.la).
    Yay!
 2. Consequently, make the wrapper compilers look for flags indicating
    that the user wants to compile statically (currently: -static,
    --static, -Bstatic, and "-Wl," in front of all of those).  If so,
    follow a 6-way matrix for determining which libraries to
    list on the underlying command line.
 3. To support that, add the name of a token static and dynamic
    library to look for in each of the wrapper compiler data files.
 4. Fix a long-standing typo in the opalcc wrapper data file.
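
The static-flag check in the list above can be sketched as follows. This is an illustration in Python, not the wrapper compiler's actual C code: the function name `wants_static` and the way the "-Wl," prefix is stripped are assumptions, and the 6-way library matrix is left out because the commit message does not spell it out.

```python
# Hypothetical sketch: decide whether the user asked for static linking.
# The flag list comes from the commit message; everything else is assumed.
STATIC_FLAGS = ("-static", "--static", "-Bstatic")


def wants_static(argv):
    """Return True if any argument is a static-link flag, bare or behind "-Wl,"."""
    for arg in argv:
        # Treat "-Wl,-Bstatic" the same way as "-Bstatic".
        candidate = arg[len("-Wl,"):] if arg.startswith("-Wl,") else arg
        if candidate in STATIC_FLAGS:
            return True
    return False
```

A wrapper would call this once on the user's command line and only then decide which libraries to list.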

This commit was SVN r27662.
2012-12-11 01:46:59 +00:00
Nathan Hjelm
a427a7e727 do not include c99 flag in compiler wrappers
This commit was SVN r27625.
2012-11-20 19:33:14 +00:00
Ralph Castain
7a5f6b584c Have orte-info show thread support as well
This commit was SVN r27624.
2012-11-18 18:15:22 +00:00
Ralph Castain
fefec03e78 Enable all ORTE tools to use progress threads if they are enabled
This commit was SVN r27593.
2012-11-12 02:54:09 +00:00
Ralph Castain
bd887f7f56 Add a new "test" component to the DFS that treats all files as remote in order to test the app-to-daemon interactions on a single machine. Set a global param to indicate we are using staged execution. Add a param to indicate it is okay for non-MPI processes to execute without finalizing. Cleanup file map load and fetch operations.
This commit was SVN r27587.
2012-11-10 14:09:12 +00:00
Ralph Castain
fd632147df Per patch from Nathan, with a few fixes, cleanup the orte-info tool
This commit was SVN r27581.
2012-11-10 04:11:40 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occurring before the command line parser changes but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Ralph Castain
a080de188f Enable orterun to directly support staged execution, treating each app as a separate job. Support transfer of file maps when support exists.
This commit was SVN r27516.
2012-10-29 23:11:30 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Ralph Castain
9daaa001d9 Remove tools that are no longer required
This commit was SVN r27383.
2012-09-29 17:33:16 +00:00
Jeff Squyres
fb2e543a57 Refs trac:3275.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't).  If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!".  Kaboom.

The only problem is that it didn't give you any indication of where
this value was being set.  Quite maddening, from a user perspective.

So we changed how ompi_info handles this case.  If any framework open
function returns OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions returns OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received an error, and then dump out the parameters
that it has received so far in the framework that had a problem.

At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).  

We updated ompi_info to check for O???_ERR_BAD_PARAM from each of
the framework opens.  Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior.  And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes.  Why?  I think we used them in exactly one
place in the code base (mca_base_components_open.c).  So we deleted
those and just used the normal OPAL_* codes instead.

While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).

This commit was SVN r27306.

The following Trac tickets were found above:
  Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
2012-09-11 20:47:24 +00:00
Jeff Squyres
a8f8064d8b Add a missing free(). Refs trac:3292.
This commit was SVN r27298.

The following Trac tickets were found above:
  Ticket 3292 --> https://svn.open-mpi.org/trac/ompi/ticket/3292
2012-09-11 17:59:40 +00:00
Ralph Castain
6d29cecce1 Fix the help message warning of multiple prefixes so it correctly prints out the info, and fix a typo.
cmr:v1.7

This commit was SVN r27241.
2012-09-05 16:28:36 +00:00
Ralph Castain
bae5dab916 If (and only if) a user requests, set the default number of slots on any node to the number of objects of the specified type. This *only* takes effect in an unmanaged environment - i.e., if an external resource manager assigns us a number of slots, then that is what we use. However, if we are using a hostfile, then the user may or may not have given us a value for the number of slots on each node.
For those nodes (and *only* those nodes) where the user does *not* specify a slot count, we will set the number of slots according to their direction: either to the number of cores, numas, sockets, or hwthreads. Otherwise, the slot count is set to 1.

Note that the default behavior remains unchanged: in the absence of any value for #slots, and in the absence of any directive to set #slots, we will set #slots=1.
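
The slot-count rule described in this message can be sketched as below. This is a Python illustration for an unmanaged (hostfile) environment only; the function name, the argument names, and the `topology` dictionary are hypothetical and do not come from the ORTE source.

```python
# Hypothetical sketch of the per-node default slot count in an unmanaged
# environment. A resource-manager-assigned slot count would bypass this.
def default_slots(user_slots, slots_from, topology):
    """
    user_slots: slot count given in the hostfile for this node, or None.
    slots_from: None, or one of "cores", "numas", "sockets", "hwthreads".
    topology:   dict mapping each object type to its count on this node.
    """
    if user_slots is not None:
        # An explicit hostfile value always wins.
        return user_slots
    if slots_from is not None:
        # The user asked us to derive slots from the node's hardware.
        return topology[slots_from]
    # Unchanged default: no value and no directive means one slot.
    return 1
```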

This commit was SVN r27236.
2012-09-04 20:58:26 +00:00
Ralph Castain
98580c117b Introduce staged execution. If you don't have adequate resources to run everything without oversubscribing, don't want to oversubscribe, and aren't using MPI, then staged execution lets you (a) run as many procs as there are available resources, and (b) start additional procs as others complete and free up resources. Adds a new mapper as well as a new state machine.
Remove some stale configure.m4's we no longer need.

Optimize the nidmaps a bit by only sending info that has changed each time, instead of sending a complete copy of everything. Makes no difference for the typical MPI job - only impacts things like staged execution where we are sending multiple (possibly many) launch messages.
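
The staged execution idea introduced by this commit (start as many procs as there are free resources, then start more as others finish) can be sketched as a small simulation. This is a Python illustration only: the function name, the use of fixed durations, and the single pool of interchangeable slots are assumptions, not the mapper or state machine that the commit adds.

```python
import heapq


# Hypothetical simulation: each proc starts as soon as a slot is free.
def staged_start_times(durations, slots):
    """Return the start time of each proc, given its run time and the slot count."""
    if slots < 1:
        raise ValueError("need at least one slot")
    running = []  # min-heap of finish times of procs currently holding a slot
    starts = []
    for duration in durations:
        if len(running) < slots:
            start = 0.0  # a slot is still free at launch
        else:
            # Wait for the earliest-finishing proc to release its slot.
            start = heapq.heappop(running)
        heapq.heappush(running, start + duration)
        starts.append(start)
    return starts
```

With two slots and run times of 3, 1, and 2, the third proc starts at time 1, when the second one completes.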

This commit was SVN r27165.
2012-08-28 21:20:17 +00:00
Ralph Castain
e0c39c94e8 Complete the cleanup of the preload files system. Remove the dest_dir option as moving things to arbitrary locations - especially absolute paths - can prove disastrous. Remove the preload_libs option as these can be treated as just files. Cleanup some of the pack/unpack code as the dss handles NULL strings just fine. Deal a little better with absolute paths, noting that tar now strips the leading '/' for us (showing my age as it didn't used to do so).
Remove the odls_base_state.c file as that code is now covered by the new broadcast form of preload_files.

This commit was SVN r27127.
2012-08-24 02:28:29 +00:00
Ralph Castain
b4a544ad2a Per discussion with Josh, use the --preload-xxx cmd line options to broadcast files to all nodes. Add --set-cwd-to-session-dir option to start procs in their session directories. Add OMPI_FILE_LOCATION envar to tell procs where their prepositioned files went.
This commit was SVN r27125.
2012-08-23 21:28:05 +00:00
Ralph Castain
a572b6fa9f Pick the right place
This commit was SVN r27085.
2012-08-17 00:28:28 +00:00
Ralph Castain
35fef87202 Make the "no virtual machine" selection more intuitive by providing a --novm option to mpirun.
This commit was SVN r27048.
2012-08-15 14:55:03 +00:00
Ralph Castain
589acf550c Improve the new MPI_INFO_ENV to better handle Java applications and to correctly report the info for singletons.
This commit was SVN r27025.
2012-08-13 22:13:49 +00:00
Jeff Squyres
3719b6c68b After some further discussion between Jeff, Ralph, and Josh, revert
r26951.  The feeling is that fixing the actual problem of the command
line parser not always identifying when invalid command line options
were specified (i.e., r26953) was a better solution.

This commit was SVN r26979.

The following SVN revision numbers were found above:
  r26951 --> open-mpi/ompi@1f8df92c3c
  r26953 --> open-mpi/ompi@0b7b3feba9
2012-08-09 20:56:01 +00:00
Ralph Castain
1f8df92c3c Remove the confusion over which options are "to" and which are "by" by creating synonyms so that either spelling works.
This commit was SVN r26951.
2012-08-05 14:40:38 +00:00
Ralph Castain
c7f9a0fa34 Check for recursive use of mpirun - issue error message and abort if detected
This commit was SVN r26903.
2012-07-28 21:50:56 +00:00
Abhishek Kulkarni
5c58a1c9c1 Fix C/R support in the trunk.
Among other things, this patch deals with the following issues:
* fix ompi-checkpoint argument parsing
* ompi-restart -showme prints an extraneous "Restarted child with PID" 
  message. Move around the debug statement to avoid this.
* fixes for the state machine changes

This commit was SVN r26770.
2012-07-09 23:34:13 +00:00
Ralph Castain
e6f3586415 Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Brian Barrett
9af72072a3 Use MKDIR_P instead of mkdir_p in Makefiles, as MKDIR_P is the only one
defined in recent versions of AC/AM.

This commit was SVN r26625.
2012-06-21 16:52:37 +00:00
Ralph Castain
0a713cd27e Add database framework to ORTE and refactor modex code to utilize it. Create the "hash" db component from the prior modex db code. Leave the other components ignored for now - will activate them later.
Modex is still a blocking operation at this point.

This commit was SVN r26618.
2012-06-19 13:38:42 +00:00
Ralph Castain
269cb2b8d9 Some cleanup to remove calls to opal_progress when running with orte progress threads, and to ensure that all orte-related events are in the orte event base.
This commit was SVN r26591.
2012-06-11 19:59:53 +00:00
Ralph Castain
d6279fc971 Fix the debugger daemon launch support to fit the new state machine. Treat debugger daemons just like any other job, except that we map them only to nodes where an app process currently exists (as opposed to every node in the system). Trigger breakpoint and rank0 release only after the debugger daemons are in position.
This commit was SVN r26556.
2012-06-06 02:01:23 +00:00
Jeff Squyres
99c5afb397 Remove clang compiler warnings.
This commit was SVN r26523.
2012-05-29 23:36:06 +00:00
Ralph Castain
be6ed9c2df Allow partial use of allocations by specifying the max number of daemons (i.e., max VM size) for the job
This commit was SVN r26499.
2012-05-27 16:48:19 +00:00
Jeff Squyres
7969faf372 Fixes trac:3057: minor update to the man page to state that slot locations
in rankfiles use ''physical'' device indexes (vs. logical indexes).

This commit was SVN r26478.

The following Trac tickets were found above:
  Ticket 3057 --> https://svn.open-mpi.org/trac/ompi/ticket/3057
2012-05-23 11:43:33 +00:00
Ralph Castain
b217124bd8 Symlink instead of copy
This commit was SVN r26464.
2012-05-21 23:07:48 +00:00
Ralph Castain
da3873af6f Rename the mapreduce tool to "mr+" per the marketing types
This commit was SVN r26463.
2012-05-21 21:17:44 +00:00
Ralph Castain
a526afae92 Ensure we always cleanup local procs, no matter how we exited.
This commit was SVN r26454.
2012-05-18 23:37:40 +00:00
Ralph Castain
12ebc0e269 Don't need this to be a bin program as the class is captured in the jar
This commit was SVN r26453.
2012-05-18 23:37:18 +00:00
Ralph Castain
b16e43f489 Silence a warning on Mac
This commit was SVN r26449.
2012-05-18 15:27:04 +00:00
Ralph Castain
ca1b325738 Tweak the java setup so it works better on Mac. Only build mapreduce and allocators if hadoop support was requested.
This commit was SVN r26448.
2012-05-18 01:02:01 +00:00
Jeff Squyres
2d78728d38 Fix the macro name in the comment: it's EXTRA_DIST, not EXTRA_SOURCES.
This commit was SVN r26429.
2012-05-10 14:07:36 +00:00
Jeff Squyres
b325c17c72 It's a little weird to put in a blank _SOURCES line for the
HDFSFileFinder PROGRAM, but if we don't put in a _SOURCES line at all,
Automake will default to "HDFSFileFinder_class_SOURCES =
HDFSFileFinder.c", which clearly will cause problems.  

But we don't want to put the .java file in _SOURCES, either, because
we haven't configured Automake to handle Java (because current
versions of Automake only have GCJ, not other Java compilers).  So set
HDFSFileFinder_class_SOURCES to blank and list the .java file in
EXTRA_SOURCES (so that they get picked up for "make dist").

This commit was SVN r26424.
2012-05-10 13:54:51 +00:00
Ralph Castain
640f0610aa Fix the makefile to install the perl scripts properly
This commit was SVN r26416.
2012-05-09 14:06:02 +00:00
Ralph Castain
fd796cce0a Add an allocator tool for finding HDFS file locations and obtaining allocations for those nodes (supports both Hadoop 1 and 2). Split the Java support into two parts: detection of Java support and request for Java bindings.
This commit was SVN r26414.
2012-05-09 01:13:49 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
* Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Ralph Castain
b2f77bf08f Extend the iof by adding two new components to support map-reduce IO chaining. Add a mapreduce tool for running such applications.
Fix the state machine to support multiple jobs being simultaneously launched as this is not only required for mapreduce, but can happen under comm-spawn applications as well.

This commit was SVN r26380.
2012-05-02 21:00:22 +00:00
Ralph Castain
a927318ea1 Add -N option as synonym for "npernode"
This commit was SVN r26367.
2012-05-01 16:18:14 +00:00
Jeff Squyres
501a86afe1 No need to include the generated files in the tarball. Thanks to
Eugene for pointing this out.

This commit was SVN r26339.
2012-04-25 14:19:18 +00:00
Ralph Castain
5d14fa7546 Fix mpi_abort, minimize error output.
This commit was SVN r26266.
2012-04-11 14:37:08 +00:00
Ralph Castain
14d5525fb1 Some minor cleanups. Get singletons working. Cleanup abort handling so it gets properly identified.
This commit was SVN r26261.
2012-04-10 19:08:54 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Ralph Castain
6dc44dc4b8 Look at the basename of the appname for the "java" keyword
This commit was SVN r26190.
2012-03-24 00:38:18 +00:00
Josh Hursey
a595525366 Add the caller's name to the 'comm failed' error message, so we know between which two peers the communication failed.
This commit was SVN r26117.
2012-03-08 21:55:19 +00:00
Ralph Castain
c3cf46af65 Ensure install_dirs are filled in before parsing prefix
This commit was SVN r26093.
2012-03-03 23:14:15 +00:00
Ralph Castain
b8f093d1a0 Switch precedence - take the --prefix value over the absolute-path-to-mpirun so the backend prefix can be different from that of mpirun on hetero machines.
This commit was SVN r26085.
2012-03-02 22:59:13 +00:00
Ralph Castain
6c93dd13b0 Cleanup the prefix handling by mpirun. Important note: we do NOT support per-app_context prefixes!!
Don't let app_files trump given prefix values. Assign according to following precedence rules:

1. absolute path to mpirun, if given
2. --prefix value, if given to mpirun
3. default prefix, if configured with --enable-orterun-prefix-default
4. prefix from first app in app_file, if given
5. no prefix
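
The precedence rules listed in this message amount to a first-match lookup, sketched below in Python. The function and argument names are hypothetical, and each argument is assumed to hold an already-derived prefix or `None`. Note that the later commit r26085 in this log swaps the first two rules, so this reflects the order as stated here only.

```python
# Hypothetical sketch of the r26081 precedence order for the mpirun prefix.
def choose_prefix(from_mpirun_path=None, from_cli=None,
                  configured_default=None, from_app_file=None):
    """Return the first prefix that is set, in r26081 precedence order."""
    candidates = (
        from_mpirun_path,    # 1. absolute path to mpirun, if given
        from_cli,            # 2. --prefix value, if given to mpirun
        configured_default,  # 3. --enable-orterun-prefix-default
        from_app_file,       # 4. prefix from first app in app_file
    )
    for prefix in candidates:
        if prefix is not None:
            return prefix
    return None              # 5. no prefix
```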

This commit was SVN r26081.
2012-03-02 19:48:25 +00:00
Jeff Squyres
97b3603036 A bunch of fixes and improvements to Open MPI's various command line tools.
* fixed some bugs where "unknown" tokens were allowed on the command
   line (which should really only be used for orterun).
 * if an unknown token is encountered, print a short error to stderr
   and quit with a nonzero exit status
 * if we don't find the right number of parameters to an option, print
   a short error to stderr and quit with a nonzero exit status
 * when --help is given, print the help message to stdout (not stderr)
   and quit with a zero exit status
 * added --showme:help option to the wrapper compilers
 * updated docs in opal/util/cmd_line.h
 * other small/miscellaneous CLI parsing bugs in various tools

I won't bore you with what we did before.  :-)  Here's some examples
of what the new behavior looks like:

{{{
% ompi_info --bogus
ompi_info: Error: unknown option "--bogus"
Type 'ompi_info --help' for usage.
% ompi_info --param bogus
ompi_info: Error: option "--param" did not have enough parameters (2)
Type 'ompi_info --help' for usage.
%
}}}
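
The behavior shown above (an unknown option gives a short error on stderr and a nonzero exit, while --help goes to stdout with a zero exit) can be sketched as below. This is a Python illustration, not the code in opal/util/cmd_line.h: the function name, the exit status of 1, and the placeholder usage text are assumptions, and only the error wording is taken from the example output.

```python
import sys


# Hypothetical sketch: map an argument list to (exit status, stream, message).
def handle_args(tool, argv, known=("--help",)):
    for arg in argv:
        if arg == "--help":
            # Help is normal output: stdout and a zero exit status.
            return 0, sys.stdout, "Usage: %s [options]" % tool
        if arg.startswith("-") and arg not in known:
            # Unknown option: short error on stderr, nonzero exit status.
            message = ('%s: Error: unknown option "%s"\n'
                       "Type '%s --help' for usage." % (tool, arg, tool))
            return 1, sys.stderr, message
    return 0, None, ""
```

A caller would print the message to the returned stream, if any, and exit with the returned status.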

This commit was SVN r26072.
2012-02-29 17:52:38 +00:00
Ralph Castain
bc5886707f Document the mpirun exit status behavior
This commit was SVN r26009.
2012-02-22 23:47:00 +00:00
Ralph Castain
47c64ec837 Roll in Java bindings per telecon discussion. Man pages still under revision
This commit was SVN r25973.
2012-02-20 22:12:43 +00:00
Ralph Castain
d7d8a8cdf7 Some cleanup of the tmpdir session directory specifications. Remove the --tmpdir option from orterun as it was confusing. Create an orte_local_tmpdir_base mca param in its place. Clarify the role of the local vs remote vs global tmpdir base params, and ensure that you don't set conflicting options.
Remove the OMPI_PREFIX_ENV environmental variable as that was totally confusing as a way of setting a tmpdir base location.

This commit was SVN r25941.
2012-02-16 16:10:01 +00:00
Jeff Squyres
54cf60eb4b $(RM) is not a standard macro. Just use "rm" -- every platform has it.
This commit was SVN r25934.
2012-02-15 19:51:59 +00:00
Jeff Squyres
ae9503db6e Remove the sentence that says that --prefix is a per-context option.
This commit was SVN r25932.
2012-02-15 18:31:27 +00:00
Ralph Castain
61ac2bb11b If no session directories are being created, then we cannot create the debugger attachment fifo - so don't complain about it.
This commit was SVN r25802.
2012-01-27 04:05:23 +00:00
Ralph Castain
6db8c56cd4 Add local and node ranks to debugger daemon procs so the odls properly launches them
This commit was SVN r25774.
2012-01-25 03:17:10 +00:00
Ralph Castain
bf09133631 Correctly track the number of debugger daemons being spawned
This commit was SVN r25741.
2012-01-19 18:17:07 +00:00
Ralph Castain
6235a355de Correctly handle co-spawning of daemons when attaching to a running job. We cannot use the general process mappers as we only want debugger daemons spawned on nodes where application procs already exist. So custom build the map for the debugger daemon job, and have the plm just launch that job without doing its usual vm-spawn step.
This commit was SVN r25736.
2012-01-18 00:19:49 +00:00
Ralph Castain
fd0d9f73c6 Make preload_binaries an MCA param so it can be set in the default MCA parameters for a system
This commit was SVN r25728.
2012-01-17 17:16:05 +00:00
Shiqing Fan
f57f873404 Disable the debugger support for Windows.
This commit was SVN r25725.
2012-01-17 16:21:33 +00:00
Ralph Castain
ce7ddd0e10 Create the debugger attach fifo unless the user requests that we periodically poll instead.
This commit was SVN r25714.
2012-01-11 19:44:22 +00:00
Ralph Castain
bf103de66c My apologies for doing this outside of the usual time restrictions, but we need to get this in so we can make progress.
Move the ORTE-level debugger code back into orterun and out of the ORTE library to resolve symbol conflicts.

This commit was SVN r25713.
2012-01-11 15:53:09 +00:00
Jeff Squyres
a4c8bb27fa Pull in the MPIR_Breakpoint symbol via a dummy function in
debuggers_base_fns.c: orte_debugger_base_pull_mpir_breakpoint().

This commit was SVN r25660.
2011-12-15 18:39:34 +00:00
Nathan Hjelm
9dec101043 fix totalview launch through --debug
This commit was SVN r25654.
2011-12-15 15:19:13 +00:00
Ralph Castain
f531b09a8d Correctly handle -host and -hostfile options. Ensure the initial vm launch constrains itself to the union of specified hosts if those options are given. Get oversubscribe set correctly for that case.
This commit was SVN r25648.
2011-12-14 20:01:15 +00:00
Ralph Castain
7510339725 Remove stale orte_vm_launch param. Add a param that allows users to specify envars to forward/set so they can do it in the MCA param file instead of only via mpirun cmd line.
This commit was SVN r25580.
2011-12-06 21:31:22 +00:00
Ralph Castain
90b7f2a7bf The rest of the multi app_context fix. Remove the restriction on number of app_contexts that can have zero np specified as multiple mappers now support that use-case. Update the ranking algorithms to respect and track bookmarks. Ensure we properly set the oversubscribed flag on a per-node basis.
This commit was SVN r25578.
2011-12-06 17:28:29 +00:00
Ralph Castain
6fefe236a4 Warn users if they set opal_paffinity_alone, either to true or false, that this parameter is no longer functional - they must use the --bind-to option and its corresponding mca param.
This commit was SVN r25567.
2011-12-03 01:10:52 +00:00
Ralph Castain
c56acf60ca Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order.
Turns out, this isn't necessarily true. The Cray, for example, launches processes in a toroidal pattern, thus causing the daemons to wind up somewhere other than what we thought. Other environments (e.g., slurm) are also capable of such behavior, depending upon the default mapping algorithm they are told to use.

Resolve this problem by making the daemon-to-node assignment in the affected environments when the daemon calls back and tells us what node it is on. Order the nodes in the mapping list so they are in daemon-vpid order as opposed to the order in which they show in the allocation. For environments that don't exhibit this mapping behavior (e.g., rsh), this won't have any impact.
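
The reordering described in this paragraph can be sketched as follows. This is a Python illustration under assumed names: each daemon callback is modeled as a `(vpid, node_name)` pair, which is not how ORTE represents it internally.

```python
# Hypothetical sketch: build the mapping list in daemon-vpid order from the
# callbacks in which each daemon reports the node it landed on.
def order_nodes_by_daemon(callbacks):
    """callbacks: iterable of (daemon_vpid, node_name) pairs in arrival order."""
    # Sort by vpid so the list no longer depends on allocation or arrival order.
    return [node for _vpid, node in sorted(callbacks, key=lambda pair: pair[0])]
```

In an environment that launches daemons in node order, such as rsh, the result matches the allocation order and nothing changes.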

Also, clean up the vm launch procedure a little bit so it more closely aligns with the state machine implementation that is coming, and remove some lingering "slave" code.

This commit was SVN r25551.
2011-11-30 19:58:24 +00:00
Ralph Castain
b475421c16 As promised, rationalize the rsh support. Remove rshbase and the base rsh support, centralizing all rsh support into the rsh component. Remove the "slave" launch support as that experiment is complete. Fix tree spawn and make that the default method for rsh launch, turning it "off" for qrsh as that system does not support tree spawn.
This commit was SVN r25507.
2011-11-26 02:33:05 +00:00
Ralph Castain
9b59d8de6f This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build.
Brian dealt with this in the past by creating platform files and using "no-build" to block the components. This was clunky, but acceptable when only one organization was using that option. However, that number has now expanded to at least two more locations.

Accordingly, make --without-rte-support actually work by adding appropriate configury to prevent components from building when they shouldn't. While doing so, remove two frameworks (db and rmcast) that are no longer used as ORCM comes to a close (besides, they belonged in ORCM now anyway). Do some minor cleanups along the way.

This commit was SVN r25497.
2011-11-22 21:24:35 +00:00
Ralph Castain
6310361532 At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here:
https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement

The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation.

In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions:

1. I have at long last bowed my head in submission to the system admins of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior.

2. Both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation.

3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so.
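
The new oversubscription default from item 1 amounts to a small decision rule, sketched below. This is a minimal illustration under stated assumptions, not the actual mapper code: the function and parameter names are hypothetical, and the override flags mentioned in item 1 are modeled as a single optional boolean.

```python
from typing import Optional


def oversubscribe_allowed(rm_allocated: bool,
                          user_override: Optional[bool] = None) -> bool:
    """Decide whether a node may run more processes than allocated slots.

    rm_allocated  -- True if the nodes came from a resource manager,
                     False if they came from a hostfile or -host
    user_override -- hypothetical stand-in for the override flags;
                     None means the user expressed no preference
    """
    # An explicit user choice always wins over the default.
    if user_override is not None:
        return user_override
    # Default: allow for hostfile/-host nodes, forbid for RM-allocated nodes.
    return not rm_allocated
```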

As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes.

This commit was SVN r25476.
2011-11-15 03:40:11 +00:00
Ralph Castain
729935dffb Minor cleanups, mirroring what Jeff did to ompi_info
This commit was SVN r25438.
2011-11-05 00:42:49 +00:00
Ralph Castain
fcee46b063 Add an option for printing a diffable process map for testing mappers
This commit was SVN r25428.
2011-11-03 14:22:07 +00:00
Ralph Castain
d28dd55d33 Minimize the amount of topology info returned by the daemons. Most clusters, especially at scale, use the same node topology on every node, so there is no reason to return the topology from every daemon. Borrow a page from the --hetero-apps page and let users indicate that the node topology differs by adding a --hetero-nodes option to mpirun. If the option is set, then every daemon returns topology info. If not set, then only daemon vpid=1 returns it.

We always want one daemon to return the topology as the head node is often different from the compute nodes. Having one daemon return the compute node topology allows us to detect any such difference. All compute nodes are then set to the same topology.
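
The reporting rule described above can be sketched as follows. This is an illustration only, not the ORTE implementation: the function names and the dict-based representation of topologies are assumptions, and only the vpid=1 rule and the hetero-nodes switch come from the commit message.

```python
def should_report_topology(vpid: int, hetero_nodes: bool) -> bool:
    """Return True if the daemon with this vpid sends its topology back."""
    # With --hetero-nodes every daemon reports; otherwise only vpid 1 does.
    return hetero_nodes or vpid == 1


def resolve_topologies(daemon_vpids, reported, hetero_nodes):
    """Map each compute daemon vpid to the topology assumed for its node.

    daemon_vpids -- vpids of the compute-node daemons (assumed to start at 1)
    reported     -- hypothetical dict {vpid: topology} of what was received
    """
    if hetero_nodes:
        # Every daemon reported, so each node keeps its own topology.
        return {vpid: reported[vpid] for vpid in daemon_vpids}
    # Homogeneous case: all compute nodes share the topology from vpid 1.
    shared = reported[1]
    return {vpid: shared for vpid in daemon_vpids}
```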

This commit was SVN r25408.
2011-11-01 18:43:10 +00:00