It turns out that we were improperly allocating an array if -np was not passed. Also, we were not really using this array for anything. So this gets rid of the array and performs some minor cleanup.
This commit was SVN r11934.
The following Trac tickets were found above:
Ticket 452 --> https://svn.open-mpi.org/trac/ompi/ticket/452
on 64 bit platforms sizeof(size_t) != sizeof(orte_std_cntr_t), and we were incorrectly
assuming this when dealing with num procs. It worked on little endian platforms, but
not big endian. So change num_procs to type int, and cast where needed.
This commit was SVN r11796.
Add --enable-orterun-prefix-by-default (and a synonym:
--enable-mpirun-prefix-by-default) to make orterun always behave as if
"--prefix $prefix" was given on the command line (where $prefix is the
value given to the --prefix option to configure). This prevents many
rsh/ssh users from needing to modify their shell startup files to set
the LD_LIBRARY_PATH for Open MPI (they will still need to set PATH or
otherwise find the OMPI executables to mpicc/mpirun/etc. their MPI
applications).
Also added --noprefix option to orterun to disable this behavior.
Finally, note that even if --enable-orterun-prefix-by-default is
specified, if the user specifies --prefix or /path/to/mpirun, these
options will override the default value of the prefix ($prefix).
This commit was SVN r11669.
The following Trac tickets were found above:
Ticket 377 --> https://svn.open-mpi.org/trac/ompi/ticket/377
Other changes:
1. Remove the old xcpu components as they are not functional.
2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.
This will require an autogen/configure, I'm afraid.
This commit was SVN r11228.
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).
Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).
I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).
In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...
Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.
This commit was SVN r11204.
Added another system-level test function for ORTE that just spins until terminated by a ctrl-c signal.
Modified orterun - added a couple of newlines to the output when abnormally terminating so the prompt always is on a new line.
This commit was SVN r10866.
- change -no_oversubscribe to -nooversubscribe (to be similar to
-nolocal)
- Added text to orterun.1 describing slots and -nooversubscribe
Still need to add text about "mpirun a.out" functionality, and RHC
wants to make some minor edits, so committing for synchronization.
This commit was SVN r10800.
Update the help text to report errors when not following that rule.
Also updated the RMAPS help text to reflect the reorganization of some of the round-robin code into the base.
The new functionality has been tested under Mac OS-X and on Odin using an MPI program. Both byslot and bynode mapping have been checked and verified. Operational support for other systems needs to be verified - I respectfully request people's help in doing so.
This commit was SVN r10708.
/tmp/tm-merge). Validated by RHC. Summary:
- Add --nolocal (and -nolocal) options to orterun
- Make some scalability improvements to the tm pls
This commit was SVN r10651.
This moves the logic to create the symbolic links for:
- mpirun
- mpiexec
- ompi-ps
- ompi-clean
and their respective man pages to the ompi level from
the orte layer.
This is a bit pedantic, but orte shouldn't be doing the
work of ompi since that is a bit of an abstraction break.
Note: need to autogen.sh to get this. Sorry :(
This commit was SVN r10602.
SIGUSR2. This can be extended later if needed to include other
signals we should forward to the user processes (TSTP and CONT,
perhaps?)
* Since the signal handlers don't actually run in signal context, we
can use malloc/fprintf/etc. So clean up some of the signal handler
code so that we don't keep message buffers around for the life of
the process
This commit was SVN r10496.
mpirun/orterun now has an option to print the version number. If -V/--version
is given, it will print the version number. If it's the only option, we
exit cleanly. Otherwise, we continue on as if --version wasn't given
(except we've printed the version number).
--This line, and th se below, will be ignored--
M orte/tools/orterun/orterun.c
M orte/tools/orterun/help-orterun.txt
This commit was SVN r10276.
1. Changed the RMGR and PLS APIs to add "signal_job" and "signal_proc" entry points. Only the "signal_job" entries are implemented - none of the components have implementations for "signal_proc" at this time. Thus, you can signal all of the procs in a job, but cannot currently signal only one specific proc.
2. Implemented those new API functions in all components except xgrid (Brian will do so very soon). Only the rsh/ssh and fork modules have been tested, however, and only under OS-X.
3. Added signal traps and callback functions for SIGUSR1/2 to orterun/mpirun that catch those signals and call the appropriate commands to propagate them out to all processes in the job.
4. Added a new test directory under the orte branch to (eventually) hold unit and system level tests for just the run-time. Since our test branch of the repository is under restricted access, people working on the RTE were continually developing their own system-level tests - thus making it hard to help diagnose problems. I have moved the more commonly-used functions here, and added one specifically for testing the SIGUSR1/2 functionality.
I will be contacting people directly to seek help with testing the changes on more environments. Other than compile issues, you should see absolutely no change in behavior on any of your systems - this additional functionality is transparent to anyone who does not issue a SIGUSR1/2 to mpirun.
Ralph
This commit was SVN r10258.
option as discussed on the devel-core mailing list. The Big
Difference is that instead of hard-coding the strings "/lib" and
"/bin" in to append to the prefix, we append the basename of the local
libdir and bindir. Hence, if your libdir is $prefix/lib64, we'll
append /lib64 to construct the remote node's LD_LIBRARY_PATH (etc.).
Also appended the orterun.1 man page to include a description of
--prefix, how it is constructed, what it handles / what it does not,
etc.
This commit was SVN r9930.
in San Jose. Allow the configure option --disable-binaries to build OMPI,
but not build or install the support binaries (so basically, just build
the libraries).
This commit was SVN r9777.
NOTE: JEFF SHOULD CHECK THIS!
I found that orterun was not tracking the index number of the app_contexts it was creating. Hence, the app_context->idx field was always sitting at zero. This index is used by the mapper to decide which app_context to use for each process - thus, with the value of each index being zero, the mapper only used the first app_context that was created. All others were ignored.
Not sure when this might have gotten changed. Could be it was a problem that always existed, but didn't get exposed until something else was changed.
Anyway, it seems to work now - could stand further testing.
This commit was SVN r9389.
(reported by Cisco)
- While in orterun, add a feature that multiple users have asked for:
if you specify an absolute pathname to orterun, such as
"/path/to/bin/orterun ...", it's equivalent to "orterun --path
/path/to ..."
This commit was SVN r9181.
- Explained <program> and made a consistancy change in the Quick Start section.
- Change references to 'app schema' to Open MPI 'app context'
- Audit the command line arguments for --foo, -foo stuff.
This commit was SVN r9097.
This needs to be gone over a few more times before it is allowed to see
daylight, but has come a long way. Some sections may be off more than a little,
but the general idea is there.
Need to audit to make sure we don't call the ORTE VHNP's daemons :)
This commit was SVN r9078.
Finished adding and pruning all the the Options.
Cleaned up a bunch of man syntax, so it should be 'more' readable (making the
assumption that man source is ever readable :p).
I am moving on to the "description" and "see also" sections next.
This commit was SVN r9077.
argv[0] and the cwd on the target node (i.e., the node where the
executable will be running in all systems except BProc, where the
searches are run on the node where orterun is invoked).
- fork pls now does cwd and argv[0] search in orted
- bproc pls does cwd and argv[0] search in orterun
- cwd behavior slightly different:
- if user specifies a -wdir to orterun, we chdir() to there; if we
can't for some reason, abort
- if user does not specify a -wdir, try to chdir() to the dir where
orterun was invoked. If we can't for some reason (e.g., it
doesn't exist on the target node), then try to chdir($HOME). If
we can't do that, then just live with whatever default directory
we were put in.
This commit was SVN r9068.
really needs to be documented (in part so that users stop asking us
how to do it!).
This is a first cut at an orterun.1 man page. It is 95% copied from
LAM's mpirun.1 lam page -- I just edited the very top and am handing
this off to Josh to finish the first cut. Then we'll add specific
docs about the behavior of some of the finer details. This is not
listed in the Makefile.am yet because it's so incomplete/incorrect
(w.r.t. OMPI), so I don't want it included in the tarball or installed
[yet].
This commit was SVN r9058.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
(windows). Instead use the LN_S variable exported by the Makefile (set to
"ln -s" on all Unixes and to "cp -p" on windows).
When we remove an executable use the correct extension for its name
(add $(EXEEXT) to the name).
This commit was SVN r8616.
debugger scheme described in
http://www.open-mpi.org/community/lists/users/2005/10/0214.php. This
makes our user-level debugger scheme much more vendor-independent
(although the "-tv" option will still work for backwards compatibility
-- it'll just be a synonum of "--debug").
This commit was SVN r8206.
command:
svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps .
(where "." is a trunk checkout)
The logs from this branch are much more descriptive than I will put
here (including a *really* long description from last night). Here's
the short version:
- fixed some broken implementations in ras and rmaps
- "orterun --host ..." now works and has clearly defined semantics
(this was the impetus for the branch and all these fixes -- LANL had
a requirement for --host to work for 1.0)
- there is still a little bit of cleanup left to do post-1.0 (we got
correct functionality for 1.0 -- we did not fix bad implementations
that still "work")
- rds/hostfile and ras/hostfile handshaking
- singleton node segment assignments in stage1
- remove the default hostfile (no need for it anymore with the
localhost ras component)
- clean up pls components to avoid duplicate ras mapping queries
- [possible] -bynode/-byslot being specific to a single app context
This commit was SVN r7664.
at the moment.
Also remove all references to --map, and (C, N) command line options in the
help file. These references will be put back in when these options are
implemented.
This commit was SVN r7574.
Have the ras_base_schedule_policy MCA parameter working once again. before it
would only do slot based allocation, even if the MCA parameter was set properly.
Currently you can specify to orterun a node allocation by either:
-mca ras_base_schedule_policy node
-bynode
and slot allocation (which is the default) by:
-mca ras_base_schedule_policy slot
-byslot
This commit was SVN r7513.
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
number to be set at autoconf time (instead of at configure time, as
it was before). Set the version number, minus the subversion r number,
at autoconf time. Override the internal variables to include the r
number (if needed) at configure time. Basically, the right thing
should always happen. The only place it might not is the version
reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
in the directory containing source files, even if the Makefile.am is
in another directory. This should start making it feasible to
reduce the number of Makefile.am files we have in the tree, which
will greatly reduce the time to run autogen and configure.
This commit was SVN r7211.
app_context:
mpirun -np 2 -prefix /path/to/ompi/on/machineA ./exec1 : \
-np 2 -prefix /path/to/ompi/on/machineB ./exec2
- Allow with -mca pls_rsh_assume_same_shell 0, the checking for the
SHELL-variable on the actual node (currently 1st node).
Sets the prefix, PATH and LD_LIBRARY_PATH for bash/ksh and
csh/tcsh.
This commit was SVN r7195.
include any optimization flags
- Use these flags to always compile ompi/debuggers/* and orterun so
that parallel debuggers (such as Totalview) can always see the
debugging symbols (see comments in ompi/debuggers/Makefile.am and
orte/tools/orterun/Makefile.am)
- Remove some obsolete LAM-named variables from configure.ac
This commit was SVN r7125.
session directory cleanup (among other things)
- When we get an abnormal exit in orterun (i.e., timeout expires and
we haven't gotten termination notices from all processes), print a
better message an exit in a better way (which includes session
directory cleanup)
- Fix tm and poe pls's to not exit() but rather propagate the error up
the stack (where relevant)
This commit was SVN r7058.
- Change orte_base_infrastructre to orte_infrastructre to conform with
ompi_info's needs
- Move MCA Param registration in ORTE to a centralized function that is
called first in orte_init_stage1
- Set the infrastructre flag as an argument to orte_init
- Adjust initalization functions to properly pass down the infrastructre
flag.
This commit was SVN r7053.
that were set on the command line. This was techinically exactly the
way the code was designed, but it certainly violated the Law of Least
Astonishment (even to its designer ;-) ). So now if you execute
something like this:
mpirun -mca pls_rsh_debug 1 -np 4 hello
You'll see debugging output from the rsh pls component, as you would
expect (this was not previously the case -- the MCA pls_rsh_debug
parame would be set to 1 in the 4 spawned hello processes, but *not*
in the orterun process).
More specifically, MCA parameters will be set in the orterun process
in the following cases:
- The new command line switch "--gmca" (or "-gmca") is used,
indicating that the MCA parameter is "global". --gmca also means
that that MCA parameter will be applied to all context app's. For
example:
mpirun -gmca foo bar -np 1 hello : -np 2 goodbye
The foo MCA param will be set in both the hello and goodbye
processes.
- If there is only one context app. For example:
mpirun -mca pls_rsh_debug 1 -np 4 hello
will set pls_rsh_debug to 1 in both the orterun process and the 4
spawned hello processes.
Also added a few more comments inside orterun to document a somewhat
confusing use of a state variable in a recursive case.
This commit was SVN r6764.
Somehow, in changing over to the new MCA interfaces, the "set" part of that logic got lost, so the singleton flag was always being set. This should repair some of the anomalous behavior seen recently where the local host was always being used for an application process.
This commit was SVN r6757.
* rename ompi_malloc to opal_malloc
* rename ompi_numtostr to opal_numtostr
* start of rename of ompi_environ to opal_environ
This commit was SVN r6332.
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few
This commit was SVN r6330.