122 Commits

Author SHA1 Message Date
George Bosilca
d628a18411 Right now there is no support for TotalView on Windows. Therefore, we don't
really care how these functions and variables are declared.

This commit was SVN r11996.
2006-10-05 05:19:03 +00:00
Ralph Castain
12328395ae Missed a couple of debug statements
This commit was SVN r11935.
2006-10-02 15:46:41 +00:00
Tim Prins
53b116d309 This commit fixes trac:452.
It turns out that we were improperly allocating an array if -np was not passed. Also, we were not really using this array for anything. So this gets rid of the array and performs some minor cleanup.

This commit was SVN r11934.

The following Trac tickets were found above:
  Ticket 452 --> https://svn.open-mpi.org/trac/ompi/ticket/452
2006-10-02 15:03:43 +00:00
Ralph Castain
559b9b0ae8 Continue beating on comm_spawn. Setup to debug bproc.
This commit was SVN r11932.
2006-10-02 14:58:22 +00:00
Ralph Castain
121f834776 Continue bringing comm_spawn back online. Ensure all RM frameworks post their HNP receives. Fix the rmgr proxy component.
Still need some work on the proxy component, and on job termination for persistent daemon case.

This commit was SVN r11928.
2006-10-02 00:46:31 +00:00
Tim Prins
e4f8ad303e Fix for #397
On 64-bit platforms sizeof(size_t) != sizeof(orte_std_cntr_t), and we were incorrectly
assuming the two were the same size when dealing with num procs. It worked on little-endian
platforms, but not on big-endian ones. So change num_procs to type int, and cast where needed.

This commit was SVN r11796.
2006-09-25 19:41:54 +00:00
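
The failure mode described in e4f8ad303e can be reproduced in isolation. The sketch below is not the ORTE code: counter32_t is a hypothetical 32-bit stand-in for orte_std_cntr_t, and the function names are made up for illustration. It shows what happens when the leading bytes of a size_t are read as the narrower type, and the by-value conversion the commit switches to.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 32-bit counter type such as orte_std_cntr_t. */
typedef int32_t counter32_t;

/* Returns 1 on a little-endian host, 0 otherwise. */
int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Reads the leading bytes of a size_t as a counter32_t. This is what code
 * effectively does when it treats the two types as interchangeable. */
counter32_t read_leading_bytes(size_t value)
{
    counter32_t narrow = 0;
    size_t n = sizeof(narrow) < sizeof(value) ? sizeof(narrow) : sizeof(value);

    memcpy(&narrow, &value, n);
    return narrow;
}

/* The approach named in the commit: convert by value, then cast. */
int to_int_by_value(size_t value)
{
    return (int)value;
}
```

With an 8-byte size_t, read_leading_bytes(4) yields 4 on a little-endian host, because the low-order bytes come first, but 0 on a big-endian host, where the leading bytes are the high-order ones. to_int_by_value(4) is 4 on both.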
Ralph Castain
0ad0d84afd Add two new API functions to the RMGR, and modify the "spawn" API to support the enhanced MPI-2 functionality.
No implementation backs these new APIs - just placeholders for now.

This commit was SVN r11699.
2006-09-19 01:45:05 +00:00
George Bosilca
f8de894efe This one wasn't supposed to get into the repository.
This commit was SVN r11697.
2006-09-18 21:28:55 +00:00
George Bosilca
7ad23ff97b Be 100% TotalView friendly. Let TotalView find out the real name of our
executable and export all functions as they should be.

This commit was SVN r11694.
2006-09-18 17:55:14 +00:00
Jeff Squyres
8226dab86c Fixes trac:377
Add --enable-orterun-prefix-by-default (and a synonym:
--enable-mpirun-prefix-by-default) to make orterun always behave as if
"--prefix $prefix" was given on the command line (where $prefix is the
value given to the --prefix option to configure).  This prevents many
rsh/ssh users from needing to modify their shell startup files to set
the LD_LIBRARY_PATH for Open MPI (they will still need to set PATH or
otherwise find the OMPI executables to mpicc/mpirun/etc. their MPI
applications).

Also added --noprefix option to orterun to disable this behavior.
Finally, note that even if --enable-orterun-prefix-by-default is
specified, if the user specifies --prefix or /path/to/mpirun, these
options will override the default value of the prefix ($prefix).

This commit was SVN r11669.

The following Trac tickets were found above:
  Ticket 377 --> https://svn.open-mpi.org/trac/ompi/ticket/377
2006-09-15 02:52:08 +00:00
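
The precedence rules in 8226dab86c can be written down as a small selection function. This is a sketch under stated assumptions, not orterun's implementation: choose_prefix and all of its parameter names are hypothetical, the ordering between --prefix and /path/to/mpirun is taken from 80bc1850bf further down this log, and it assumes --noprefix suppresses only the configured default.

```c
#include <stddef.h>

/* Picks the prefix to forward to remote nodes. NULL means "not supplied".
 * Every name here is hypothetical and exists only for this sketch. */
const char *choose_prefix(const char *cli_prefix,        /* value of --prefix */
                          const char *launcher_prefix,   /* implied by /path/to/mpirun */
                          const char *configured_prefix, /* $prefix given to configure */
                          int default_enabled,           /* --enable-orterun-prefix-by-default */
                          int noprefix)                  /* --noprefix on the command line */
{
    /* An explicit --prefix wins over everything else. */
    if (cli_prefix != NULL) {
        return cli_prefix;
    }
    /* Invoking the launcher by absolute path also overrides the default. */
    if (launcher_prefix != NULL) {
        return launcher_prefix;
    }
    /* Fall back to the configured prefix only if the default behavior was
     * enabled at build time and not disabled at run time (assumption). */
    if (default_enabled && !noprefix) {
        return configured_prefix;
    }
    return NULL;
}
```

A NULL result means no prefix is forwarded, which corresponds to the behavior before this option existed: users set LD_LIBRARY_PATH themselves.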
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
Galen Shipman
b02185374f Push a generated "key" out to all the processes. This is necessary for some
interconnect wireup in which all processes must agree on a "key" to initialize
the interconnect with. 

This commit was SVN r11653.
2006-09-14 15:27:17 +00:00
George Bosilca
e04032ca2f Correct a comment and protect the usage of the environ variable against Windows.
This commit was SVN r11397.
2006-08-24 16:18:42 +00:00
George Bosilca
75fa0317da Keep environ as the preferred storage for the environment variables.
This commit was SVN r11351.
2006-08-23 06:14:24 +00:00
George Bosilca
b4732f557a Now it's time to update ORTE. Cleanup most of the ORTE tools. Force them
to use opal_basename and opal_dirname. Don't create the path manually. Use
the specialized opal functions instead.

This commit was SVN r11345.
2006-08-23 02:35:00 +00:00
George Bosilca
6ef0acf99f The names of the defines should start with OPAL as they belong to the
OPAL layer.
We now support 64-bit Windows too.

This commit was SVN r11312.
2006-08-21 21:55:41 +00:00
Ralph Castain
8c7f0ed9ae Change the SOH to the new State Monitoring and Reporting (SMR) framework. New API's will be appearing in the new framework shortly - this just gets the name change into the system.
Other changes:

1. Remove the old xcpu components as they are not functional.

2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.

This will require an autogen/configure, I'm afraid.

This commit was SVN r11228.
2006-08-16 16:35:09 +00:00
Ralph Castain
5dfd54c778 With the branch to 1.2 made....
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).

Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).

I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).

In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...

Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.

This commit was SVN r11204.
2006-08-15 19:54:10 +00:00
Ralph Castain
8bec270f90 Fix a bug noted by Jeff - we were no longer accurately recording in the registry that a process had been terminated when the user initiated the "kill" process (via ctrl-c).
Added another system-level test function for ORTE that just spins until terminated by a ctrl-c signal.

Modified orterun - added a couple of newlines to the output when abnormally terminating so the prompt always is on a new line.

This commit was SVN r10866.
2006-07-18 14:42:27 +00:00
Ralph Castain
c22b0d516e Some edits to the man page for Jeff to review
This commit was SVN r10803.
2006-07-14 14:47:06 +00:00
Jeff Squyres
e6c9c699fe Minor changes:
- change -no_oversubscribe to -nooversubscribe (to be similar to
  -nolocal)
- Added text to orterun.1 describing slots and -nooversubscribe
Still need to add text about "mpirun a.out" functionality, and RHC
wants to make some minor edits, so committing for synchronization.

This commit was SVN r10800.
2006-07-14 14:15:03 +00:00
George Bosilca
94f6cb3765 There is no SIGUSR1 and SIGUSR2 on Windows.
This commit was SVN r10715.
2006-07-11 05:24:08 +00:00
Ralph Castain
febc143d8c Per LANL's stated need, add functionality that runs a.out across ALL available process slots if no num_proc is specified on the command line. However, please note the following limitation: we ONLY allow ONE application to be specified on the command line when this feature is invoked. If multiple apps are specified, the user MUST also specify the number to be launched for each and every one of them.
Update the help text to report errors when not following that rule.

Also updated the RMAPS help text to reflect the reorganization of some of the round-robin code into the base.

The new functionality has been tested under Mac OS-X and on Odin using an MPI program. Both byslot and bynode mapping have been checked and verified. Operational support for other systems needs to be verified - I respectfully request people's help in doing so.

This commit was SVN r10708.
2006-07-10 21:25:33 +00:00
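
The rule introduced in febc143d8c (fill all available slots when no process count is given, but only for a single application) can be sketched as follows. The function, the sentinel values, and the parameter names are hypothetical; the actual check lives in orterun and the RMAPS code and is not reproduced here.

```c
#include <stddef.h>

#define NP_UNSPECIFIED (-1)  /* hypothetical marker: no count given for this app */
#define NP_ERROR       (-2)  /* hypothetical error code */

/* requested[i] holds the process count given for application i, or
 * NP_UNSPECIFIED. Returns the total number of processes to launch, or
 * NP_ERROR if the command line breaks the one-application rule. */
int resolve_num_procs(int *requested, size_t num_apps, int available_slots)
{
    int total = 0;
    size_t i;

    if (num_apps == 0) {
        return NP_ERROR;
    }
    for (i = 0; i < num_apps; ++i) {
        if (requested[i] == NP_UNSPECIFIED) {
            /* Filling every slot is only allowed for a single application. */
            if (num_apps != 1) {
                return NP_ERROR;
            }
            requested[i] = available_slots;
        }
        total += requested[i];
    }
    return total;
}
```

For one application with no count and 8 available slots the result is 8. For two applications where either count is missing the function reports an error, which is the case the updated help text covers.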
Jeff Squyres
538965aeb0 Final merge of stuff from /tmp/tm-stuff tree (merged through
/tmp/tm-merge).  Validated by RHC.  Summary:

- Add --nolocal (and -nolocal) options to orterun
- Make some scalability improvements to the tm pls

This commit was SVN r10651.
2006-07-04 20:12:35 +00:00
Josh Hursey
2edf1511fd Closes ticket #173 : Split name linking up for orte/ompi shared tools.
This moves the logic to create the symbolic links for:
 - mpirun
 - mpiexec
 - ompi-ps
 - ompi-clean
and their respective man pages to the ompi level from
the orte layer.

This is a bit pedantic, but orte shouldn't be doing the
work of ompi since that is a bit of an abstraction break.

Note: need to autogen.sh to get this. Sorry :(

This commit was SVN r10602.
2006-06-30 22:01:56 +00:00
Brian Barrett
b6663c64c7 * fix for bug #161 - add man page info for recently added features
This commit was SVN r10514.
2006-06-26 22:16:39 +00:00
Brian Barrett
86861bc1c3 * add --quiet option, and suppress a couple of the status messages in
orterun if it is actually enabled.  For ticket #129.

This commit was SVN r10497.
2006-06-26 18:21:45 +00:00
Brian Barrett
4e8abb943b * fix up signal handling code so that one function handles SIGUSR1 and
SIGUSR2.  This can be extended later if needed to include other
  signals we should forward to the user processes (TSTP and CONT,
  perhaps?)
* Since the signal handlers don't actually run in signal context, we
  can use malloc/fprintf/etc.  So clean up some of the signal handler
  code so that we don't keep message buffers around for the life of
  the process

This commit was SVN r10496.
2006-06-26 15:12:52 +00:00
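
The idea in 4e8abb943b of routing SIGUSR1 and SIGUSR2 through one function can be illustrated with plain POSIX calls. This is a minimal sketch, not the orterun code: the function names are hypothetical, and unlike the handlers the commit describes, a sigaction handler does run in signal context, so this one only records the signal number instead of calling malloc or fprintf.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Last user signal seen; 0 means none is pending. */
static volatile sig_atomic_t last_user_signal = 0;

/* One handler for both signals: it only records which one arrived. */
static void on_user_signal(int signo)
{
    last_user_signal = signo;
}

/* Installs the shared handler for SIGUSR1 and SIGUSR2.
 * Returns 0 on success, -1 if sigaction fails. */
int install_user_signal_handler(void)
{
    static const int signals[] = { SIGUSR1, SIGUSR2 };
    struct sigaction action;
    size_t i;

    action.sa_handler = on_user_signal;
    action.sa_flags = 0;
    sigemptyset(&action.sa_mask);

    /* Extending the list (for example with SIGTSTP or SIGCONT) would only
     * require adding entries to the array above. */
    for (i = 0; i < sizeof(signals) / sizeof(signals[0]); ++i) {
        if (sigaction(signals[i], &action, NULL) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Returns the recorded signal number and clears it; 0 if none arrived. */
int take_user_signal(void)
{
    int signo = last_user_signal;

    last_user_signal = 0;
    return signo;
}
```

A main loop would call take_user_signal() outside the handler and forward any non-zero result to the job's processes. The sketch is POSIX-only; as 94f6cb3765 notes, these signals do not exist on Windows.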
Brian Barrett
9766c01e50 * Per discussion at quarterly meeting and bug #91, print out the bug
contact point when printing version and help strings

This commit was SVN r10484.
2006-06-22 19:48:27 +00:00
Brian Barrett
5c89dc6946 Fix for ticket #91
mpirun/orterun now has an option to print the version number.  If -V/--version
is given, it will print the version number.  If it's the only option, we
exit cleanly.  Otherwise, we continue on as if --version wasn't given
(except we've printed the version number).
--This line, and those below, will be ignored--

M    orte/tools/orterun/orterun.c
M    orte/tools/orterun/help-orterun.txt

This commit was SVN r10276.
2006-06-09 17:21:23 +00:00
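
The -V/--version behavior described in 5c89dc6946 comes down to a three-way decision. The sketch below uses a naive scan of argv and hypothetical names; orterun itself goes through its own command-line parser, which is not shown here.

```c
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum version_action {
    VERSION_NOT_REQUESTED,      /* no -V/--version on the command line */
    VERSION_PRINT_AND_EXIT,     /* it was the only option: print, exit cleanly */
    VERSION_PRINT_AND_CONTINUE  /* other arguments present: print, keep going */
};

/* Decides what to do about -V/--version given the raw argument vector. */
enum version_action classify_version_request(int argc, char *const argv[])
{
    int seen = 0;
    int i;

    for (i = 1; i < argc; ++i) {
        if (strcmp(argv[i], "-V") == 0 || strcmp(argv[i], "--version") == 0) {
            seen = 1;
        }
    }
    if (!seen) {
        return VERSION_NOT_REQUESTED;
    }
    /* argc == 2 means the program name plus the version flag and nothing else. */
    return (argc == 2) ? VERSION_PRINT_AND_EXIT : VERSION_PRINT_AND_CONTINUE;
}
```

A caller would print the version string for both "print" results and exit with status 0 only for VERSION_PRINT_AND_EXIT.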
Ralph Castain
ee5a626d25 Add ability to trap and propagate SIGUSR1/2 to remote processes. There are a number of small changes that hit a bunch of files:
1. Changed the RMGR and PLS APIs to add "signal_job" and "signal_proc" entry points. Only the "signal_job" entries are implemented - none of the components have implementations for "signal_proc" at this time. Thus, you can signal all of the procs in a job, but cannot currently signal only one specific proc.

2. Implemented those new API functions in all components except xgrid (Brian will do so very soon). Only the rsh/ssh and fork modules have been tested, however, and only under OS-X.

3. Added signal traps and callback functions for SIGUSR1/2 to orterun/mpirun that catch those signals and call the appropriate commands to propagate them out to all processes in the job.

4. Added a new test directory under the orte branch to (eventually) hold unit and system level tests for just the run-time. Since our test branch of the repository is under restricted access, people working on the RTE were continually developing their own system-level tests - thus making it hard to help diagnose problems. I have moved the more commonly-used functions here, and added one specifically for testing the SIGUSR1/2 functionality.

I will be contacting people directly to seek help with testing the changes on more environments. Other than compile issues, you should see absolutely no change in behavior on any of your systems - this additional functionality is transparent to anyone who does not issue a SIGUSR1/2 to mpirun.

Ralph

This commit was SVN r10258.
2006-06-08 18:27:17 +00:00
Jeff Squyres
1d6902296c Additions to the tm, slurm, and rsh pls modules to handle the --prefix
option as discussed on the devel-core mailing list.  The Big
Difference is that instead of hard-coding the strings "/lib" and
"/bin" in to append to the prefix, we append the basename of the local
libdir and bindir.  Hence, if your libdir is $prefix/lib64, we'll
append /lib64 to construct the remote node's LD_LIBRARY_PATH (etc.).

Also amended the orterun.1 man page to include a description of
--prefix, how it is constructed, what it handles / what it does not,
etc.

This commit was SVN r9930.
2006-05-16 14:14:12 +00:00
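
The path construction described in 1d6902296c, appending the basename of the local libdir or bindir to the prefix instead of a hard-coded "/lib" or "/bin", can be sketched like this. The helper names are hypothetical and the basename logic is deliberately simplified (no handling of trailing slashes); the pls modules are not reproduced here.

```c
#include <stdio.h>
#include <string.h>

/* Returns a pointer to the last component of path, without allocating.
 * Simplified for the sketch: a trailing '/' is not handled. */
const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');

    return (slash != NULL) ? slash + 1 : path;
}

/* Writes "<prefix>/<basename of local_dir>" into out.
 * Returns 0 on success, -1 if the result did not fit or formatting failed. */
int remote_dir_from_prefix(const char *prefix, const char *local_dir,
                           char *out, size_t out_size)
{
    int written = snprintf(out, out_size, "%s/%s",
                           prefix, last_component(local_dir));

    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

With a prefix of /opt/ompi and a local libdir of /usr/local/lib64, the result is /opt/ompi/lib64, which is the directory that would be added to the remote node's LD_LIBRARY_PATH.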
Brian Barrett
52369307f8 Add a feature to the build system that Terry from Sun and I talked about
in San Jose.  Allow the configure option --disable-binaries to build OMPI,
but not build or install the support binaries (so basically, just build
the libraries).

This commit was SVN r9777.
2006-04-29 02:16:41 +00:00
Brian Barrett
62afa63ded Initialize length to 0 instead of -1 (size_t might be unsigned and therefore
-1 is an issue).

This should go to the v1.1 branch...

This commit was SVN r9665.
2006-04-20 15:42:36 +00:00
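
The problem that 62afa63ded avoids is easy to demonstrate: size_t is an unsigned type in standard C, so assigning -1 stores the largest representable value rather than a negative sentinel. The function names below are hypothetical and exist only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the value a size_t holds after being initialized with -1. */
size_t length_after_minus_one(void)
{
    size_t length = (size_t)-1;  /* wraps around to SIZE_MAX */

    return length;
}

/* Adds extra bytes to a running length, as code accumulating sizes would. */
size_t add_length(size_t current, size_t extra)
{
    return current + extra;
}
```

Starting from 0, add_length(0, 10) is 10 as intended. Starting from the wrapped value, add_length(length_after_minus_one(), 10) wraps again and yields 9, a plausible-looking but wrong length.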
Ralph Castain
c79c1714de Okaaayyy....let's see if this restores the "prefix" command line option. No idea what the problem was with the other option, but it isn't critical right now, so I'll figure it out later.
This commit was SVN r9542.
2006-04-06 07:53:38 +00:00
Ralph Castain
0ba8851a47 Fix the univ_exist option
This commit was SVN r9535.
2006-04-05 17:18:06 +00:00
Ralph Castain
b9bdb2125e Fix and upgrade the console to support better debugging. Activate "dump" commands to display registry content. Remove the blasted opal_output default prefix that made the dump output illegible. Properly connect to existing daemons and/or start new ones.
This commit was SVN r9528.
2006-04-04 11:05:52 +00:00
Brian Barrett
99e4c89183 * some typo fixes for orterun manpage
* Install orterun manpage as mpirun.1 and mpiexec.1 as well as orterun.1

This commit was SVN r9444.
2006-03-29 01:04:43 +00:00
Jeff Squyres
07b0e559f2 Fix copyright
This commit was SVN r9443.
2006-03-29 00:53:11 +00:00
Josh Hursey
35eb1a2970 Added a section on "Specifying Hosts" to the man page.
This commit was SVN r9432.
2006-03-27 23:46:38 +00:00
Jeff Squyres
bc96040e1c - Add Cisco copyright
- Add comment explaining why we used INT_MAX
- Update NEWS

This commit was SVN r9415.
2006-03-24 15:39:09 +00:00
Jeff Squyres
a843ce4c23 Clean up a minor memory leak
This commit was SVN r9413.
2006-03-24 15:28:42 +00:00
Ralph Castain
08db67cdf8 Fix the app_context problem for app_files too....
Again, this should be checked by Jeff.

This commit was SVN r9393.
2006-03-23 17:55:25 +00:00
Ralph Castain
2a18ebd9e1 Fix the app_context problem.
NOTE: JEFF SHOULD CHECK THIS!

I found that orterun was not tracking the index number of the app_contexts it was creating. Hence, the app_context->idx field was always sitting at zero. This index is used by the mapper to decide which app_context to use for each process - thus, with the value of each index being zero, the mapper only used the first app_context that was created. All others were ignored.

Not sure when this might have gotten changed. Could be it was a problem that always existed, but didn't get exposed until something else was changed.

Anyway, it seems to work now - could stand further testing.

This commit was SVN r9389.
2006-03-23 16:53:11 +00:00
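
The bug fixed in 2a18ebd9e1 can be modeled with a trimmed-down structure. This is an illustration only: struct app_context here is a hypothetical two-field stand-in for the real app_context, and both functions are invented for the sketch.

```c
#include <stddef.h>

/* Hypothetical, reduced app context: only the fields the sketch needs. */
struct app_context {
    size_t idx;           /* position of this context on the command line */
    const char *command;  /* executable to launch */
};

/* Fills contexts[0..count-1] from commands, giving each its own index. */
void create_app_contexts(struct app_context *contexts,
                         const char *const *commands, size_t count)
{
    size_t i;

    for (i = 0; i < count; ++i) {
        contexts[i].idx = i;  /* the bookkeeping the commit says was missing */
        contexts[i].command = commands[i];
    }
}

/* Mapper-side lookup: returns the context whose idx matches, or NULL. */
const struct app_context *find_app_context(const struct app_context *contexts,
                                           size_t count, size_t idx)
{
    size_t i;

    for (i = 0; i < count; ++i) {
        if (contexts[i].idx == idx) {
            return &contexts[i];
        }
    }
    return NULL;
}
```

If every idx were left at zero, a lookup by index would only ever match the first entry, which corresponds to the symptom in the commit message: only the first app_context was used and all others were ignored.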
Josh Hursey
22bac7ae95 a test commit. one more try
This commit was SVN r9350.
2006-03-21 00:39:29 +00:00
Josh Hursey
d64aab529f a test commit. no real changes here. Removing added char.
This commit was SVN r9349.
2006-03-21 00:37:13 +00:00
Josh Hursey
c8f9108c18 a test commit. no real changes here
This commit was SVN r9348.
2006-03-21 00:33:20 +00:00
Josh Hursey
66edc64be0 Minor comment change
This commit was SVN r9316.
2006-03-16 19:00:03 +00:00
Josh Hursey
7fcfd87cd5 Minor date change
This commit was SVN r9315.
2006-03-16 18:59:13 +00:00
Jeff Squyres
80bc1850bf Ensure that --prefix takes precedence over /path/to/orterun
This commit was SVN r9183.
2006-02-28 14:44:40 +00:00