possible things contained in the conditional like other rules are (for
example, a SOURCES rule in a conditional automatically has its files
added to the dist rules, even if that conditional isn't tru when
make dist occurs). So the man files weren't in the tarball.
Put the EXTRA_DIST with the files explicitly listed outside any conditionals
so the man pages always end up in the tarball.
This commit was SVN r12220.
they might require special tools (not sure if sed with multiple -e
arguments is totally portable)
- ignore the opalcc.1 man page. Couldn't do this in the previous
man page commit (r12192) because I was removing opalcc.1 in that
commit.
This commit was SVN r12194.
The following SVN revision numbers were found above:
r12192 --> open-mpi/ompi@581a4b0a4e
- Only install opal{cc,c++} and orte{cc,c++} if configured with
--with-devel-headers. Right now, they are always installed, but
there are no header files installed for either project, so there's
really not much way for a user to actually compile an OPAL / ORTE
application.
- Drop support for opalCC and orteCC. It's a pain to setup all the
symlinks (indeed, they are currently done wrong for opalCC) and
there's no history like there is for mpiCC.
- Change what is currently opalcc.1 to opal_wrapper.1 and add some
macros that get sed'ed so that the man pages appear to be
customized for the given command.
- Install the wrapper data files even if we compiled with
--disable-binaries. This is for the use case of doing multi-lib
builds, where one word size will only have the library built, but
we need both set of wrapper data files to piece together to
activate the multi-lib support in the wrapper compilers.
This commit was SVN r12192.
Fix the problem observed by multiple people that comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off.
To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place.
I used those capabilities in two places:
1. Added an attribute list to the rmgr.spawn interface.
2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h).
So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms.
This commit was SVN r12138.
I have tested on rsh, slurm, bproc, and tm. Bproc continues to have a problem (will be asking for help there).
Gridengine compiles but I cannot test (believe it likely will run).
Poe and xgrid compile to the extent they can without the proper include files.
This commit was SVN r12059.
It turns out that we were improperly allocating an array if -np was not passed. Also, we were not really using this array for anything. So this gets rid of the array and performs some minor cleanup.
This commit was SVN r11934.
The following Trac tickets were found above:
Ticket 452 --> https://svn.open-mpi.org/trac/ompi/ticket/452
remove requirements on .la files on wrapper scripts
Ticket: #374
extend compilers to support 32 bit and 64 bit in one version of the wrapper
Submitted by: Dan Lacher
Reviewed by: Rolf Vandevaart
This commit was SVN r11908.
install-exec-hook is not only wrong, it can cause ordering issues such
as trying to put sym links to man pages in directories that do not yet
exist.
This commit was SVN r11893.
on 64 bit platforms sizeof(size_t) != sizeof(orte_std_cntr_t), and we were incorrectly
assuming this when dealing with num procs. It worked on little endian platforms, but
not big endian. So change num_procs to type int, and cast where needed.
This commit was SVN r11796.
Add --enable-orterun-prefix-by-default (and a synonym:
--enable-mpirun-prefix-by-default) to make orterun always behave as if
"--prefix $prefix" was given on the command line (where $prefix is the
value given to the --prefix option to configure). This prevents many
rsh/ssh users from needing to modify their shell startup files to set
the LD_LIBRARY_PATH for Open MPI (they will still need to set PATH or
otherwise find the OMPI executables to mpicc/mpirun/etc. their MPI
applications).
Also added --noprefix option to orterun to disable this behavior.
Finally, note that even if --enable-orterun-prefix-by-default is
specified, if the user specifies --prefix or /path/to/mpirun, these
options will override the default value of the prefix ($prefix).
This commit was SVN r11669.
The following Trac tickets were found above:
Ticket 377 --> https://svn.open-mpi.org/trac/ompi/ticket/377
Other changes:
1. Remove the old xcpu components as they are not functional.
2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.
This will require an autogen/configure, I'm afraid.
This commit was SVN r11228.
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).
Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).
I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).
In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...
Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.
This commit was SVN r11204.
Added another system-level test function for ORTE that just spins until terminated by a ctrl-c signal.
Modified orterun - added a couple of newlines to the output when abnormally terminating so the prompt always is on a new line.
This commit was SVN r10866.
- change -no_oversubscribe to -nooversubscribe (to be similar to
-nolocal)
- Added text to orterun.1 describing slots and -nooversubscribe
Still need to add text about "mpirun a.out" functionality, and RHC
wants to make some minor edits, so committing for synchronization.
This commit was SVN r10800.
- orte-clean.c : check to see if the base session directory is empty
and delete it if it is.
- orte_universe_exists.c : Fix a down stread problem resulting from
George's r10718 commit. Don't use the 'fulldirpath' since
that is no longer guarenteed to be the absolute path
to the session directory. Construct this value outside of that
function from the prefix and frontend vars.
This commit was SVN r10741.
The following SVN revision numbers were found above:
r10718 --> open-mpi/ompi@47eef2e002
so that it does not return an error when no universe is passed to it.
Also put back in the 'Slots In Use' column as it is now working properly
per Ralphs recent ras commits. Still not sure what 'Slots Alloc' is meant
to represent, so left that as #if 0'd out for the moment.
This commit was SVN r10739.
The following SVN revision numbers were found above:
r10718 --> open-mpi/ompi@47eef2e002
Update the help text to report errors when not following that rule.
Also updated the RMAPS help text to reflect the reorganization of some of the round-robin code into the base.
The new functionality has been tested under Mac OS-X and on Odin using an MPI program. Both byslot and bynode mapping have been checked and verified. Operational support for other systems needs to be verified - I respectfully request people's help in doing so.
This commit was SVN r10708.
/tmp/tm-merge). Validated by RHC. Summary:
- Add --nolocal (and -nolocal) options to orterun
- Make some scalability improvements to the tm pls
This commit was SVN r10651.
After seeing the uglyness that is removing directories in the
codebase I decided to push down this to the OPAL by extending the
opal/os_create_dirpath.(c|h) to contain some more functionality.
In this process I renamed 'os_create_dirpath' to 'os_dirpath' since it
is a bit more general now.
Added a few functions to:
- check if an directory is empty
- check to see if the access permissions are set correctly
- destroy the directory at the end of the dirpath
- By using a caller callback function (a la Perl, I believe)
for every file, the caller can have fine grained control over
whether a specific file is deleted or not.
This simplifies things a bit for orte_session_dir_(finalize|cleanup)
as it should no longer contain any of this functionality, but uses
these functions to do the work.
From the external perspective nothing has changed, from the
developer point of view we have some cleaner, more generic code.
This commit was SVN r10640.
Since we don't properly handle connecting/disconecting from multiple
universes, only connect to the first one (or the user specified one).
This is a bug that needs to be fixed, but involves some deep magic in
ORTE.
Print the node segment upon request (-n option).
{{{
Node Name | Arch | Cell ID | State | Slots | Slots Max |
-----------------------------------------------------------
odin001 | | 0 | Unknown | 2 | 4 |
odin002 | | 0 | Unknown | 2 | 5 |
odin003 | | 0 | Unknown | 2 | 6 |
odin004 | | 0 | Unknown | 2 | 7 |
}}}
Since node_slots_alloc and node_slots_inuse are not properly updated
in the GPR don't print those values.
This commit was SVN r10633.
This moves the logic to create the symbolic links for:
- mpirun
- mpiexec
- ompi-ps
- ompi-clean
and their respective man pages to the ompi level from
the orte layer.
This is a bit pedantic, but orte shouldn't be doing the
work of ompi since that is a bit of an abstraction break.
Note: need to autogen.sh to get this. Sorry :(
This commit was SVN r10602.
per a request.
Currently it is not working well. That will soon change
as it just needs a bit of attention and testing to
make it lots-mo-betta.
This commit was SVN r10556.
per a request for its functionality into the main trunk.
This command provides basic information about a running job. It
needs a bit of attention, but works fine in its current iteration.
Please play with it, and lets try to work out all the left over bugs.
Pending action for this tool:
It has been requested that the tool be changed slightly to allow
it to be called via a function call from internal libraries
(e.g. orteconsole).
This commit was SVN r10554.
SIGUSR2. This can be extended later if needed to include other
signals we should forward to the user processes (TSTP and CONT,
perhaps?)
* Since the signal handlers don't actually run in signal context, we
can use malloc/fprintf/etc. So clean up some of the signal handler
code so that we don't keep message buffers around for the life of
the process
This commit was SVN r10496.
mpirun/orterun now has an option to print the version number. If -V/--version
is given, it will print the version number. If it's the only option, we
exit cleanly. Otherwise, we continue on as if --version wasn't given
(except we've printed the version number).
--This line, and th se below, will be ignored--
M orte/tools/orterun/orterun.c
M orte/tools/orterun/help-orterun.txt
This commit was SVN r10276.
1. Changed the RMGR and PLS APIs to add "signal_job" and "signal_proc" entry points. Only the "signal_job" entries are implemented - none of the components have implementations for "signal_proc" at this time. Thus, you can signal all of the procs in a job, but cannot currently signal only one specific proc.
2. Implemented those new API functions in all components except xgrid (Brian will do so very soon). Only the rsh/ssh and fork modules have been tested, however, and only under OS-X.
3. Added signal traps and callback functions for SIGUSR1/2 to orterun/mpirun that catch those signals and call the appropriate commands to propagate them out to all processes in the job.
4. Added a new test directory under the orte branch to (eventually) hold unit and system level tests for just the run-time. Since our test branch of the repository is under restricted access, people working on the RTE were continually developing their own system-level tests - thus making it hard to help diagnose problems. I have moved the more commonly-used functions here, and added one specifically for testing the SIGUSR1/2 functionality.
I will be contacting people directly to seek help with testing the changes on more environments. Other than compile issues, you should see absolutely no change in behavior on any of your systems - this additional functionality is transparent to anyone who does not issue a SIGUSR1/2 to mpirun.
Ralph
This commit was SVN r10258.
option as discussed on the devel-core mailing list. The Big
Difference is that instead of hard-coding the strings "/lib" and
"/bin" in to append to the prefix, we append the basename of the local
libdir and bindir. Hence, if your libdir is $prefix/lib64, we'll
append /lib64 to construct the remote node's LD_LIBRARY_PATH (etc.).
Also appended the orterun.1 man page to include a description of
--prefix, how it is constructed, what it handles / what it does not,
etc.
This commit was SVN r9930.
in San Jose. Allow the configure option --disable-binaries to build OMPI,
but not build or install the support binaries (so basically, just build
the libraries).
This commit was SVN r9777.
automagically bring in the libraries through the top-level library (so
liborte automatically brings in libopal, etc.). Otherwise, we get some
warnings on Solaris
This should go to the v1.1 branch
This commit was SVN r9666.
NOTE: JEFF SHOULD CHECK THIS!
I found that orterun was not tracking the index number of the app_contexts it was creating. Hence, the app_context->idx field was always sitting at zero. This index is used by the mapper to decide which app_context to use for each process - thus, with the value of each index being zero, the mapper only used the first app_context that was created. All others were ignored.
Not sure when this might have gotten changed. Could be it was a problem that always existed, but didn't get exposed until something else was changed.
Anyway, it seems to work now - could stand further testing.
This commit was SVN r9389.
(reported by Cisco)
- While in orterun, add a feature that multiple users have asked for:
if you specify an absolute pathname to orterun, such as
"/path/to/bin/orterun ...", it's equivalent to "orterun --path
/path/to ..."
This commit was SVN r9181.
- Explained <program> and made a consistancy change in the Quick Start section.
- Change references to 'app schema' to Open MPI 'app context'
- Audit the command line arguments for --foo, -foo stuff.
This commit was SVN r9097.
This needs to be gone over a few more times before it is allowed to see
daylight, but has come a long way. Some sections may be off more than a little,
but the general idea is there.
Need to audit to make sure we don't call the ORTE VHNP's daemons :)
This commit was SVN r9078.
Finished adding and pruning all the the Options.
Cleaned up a bunch of man syntax, so it should be 'more' readable (making the
assumption that man source is ever readable :p).
I am moving on to the "description" and "see also" sections next.
This commit was SVN r9077.