Commit Graph

186 Commits

Author SHA1 Message Date
Ralph Castain
121f834776 Continue bringing comm_spawn back online. Ensure all RM frameworks post their HNP receives. Fix the rmgr proxy component.
Still need some work on the proxy component, and on job termination for persistent daemon case.

This commit was SVN r11928.
2006-10-02 00:46:31 +00:00
Dan Lacher
ba0389723e Ticket: #346
remove requirements on .la files on wrapper scripts

Ticket: #374
  extend compilers to support 32 bit and 64 bit in one version of the wrapper

Submitted by: Dan Lacher
Reviewed by: Rolf Vandevaart

This commit was SVN r11908.
2006-09-29 23:58:58 +00:00
Jeff Squyres
785a2e1c90 Move the man page installs to install-data-hook. Putting them in
install-exec-hook is not only wrong, it can cause ordering issues such
as trying to put sym links to man pages in directories that do not yet
exist.

This commit was SVN r11893.
2006-09-29 14:34:39 +00:00
Tim Prins
e4f8ad303e Fix for #397
on 64 bit platforms sizeof(size_t) != sizeof(orte_std_cntr_t), and we were incorrectly 
assuming this when dealing with num procs. It worked on little endian platforms, but
not big endian. So change num_procs to type int, and cast where needed. 

This commit was SVN r11796.
2006-09-25 19:41:54 +00:00
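
The size mismatch described in commit e4f8ad303e can be illustrated with a small sketch. It is not taken from the ORTE sources: `uint64_t` and `uint32_t` are stand-ins (assumptions) for a 64-bit `size_t` and a 32-bit counter type, and both function names are hypothetical. The sketch copies a 32-bit count into the first four bytes of a zeroed 64-bit variable, which is roughly what happens when code treats the two types as interchangeable.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
static int is_little_endian(void)
{
    uint16_t probe = 1;
    uint8_t first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Hypothetical illustration: store a 32-bit counter into the first four
 * bytes of a zeroed 64-bit variable and read the 64-bit variable back.
 * On little-endian hosts those bytes are the low-order half, so the value
 * survives. On big-endian hosts they are the high-order half, so the
 * value read back is count << 32. */
static uint64_t store_narrow_into_wide(uint32_t count)
{
    uint64_t wide = 0;

    memcpy(&wide, &count, sizeof(count));
    return wide;
}
```

Calling `store_narrow_into_wide(4)` yields 4 on a little-endian host and 4 shifted left by 32 bits on a big-endian host, which matches the observation in the commit message that the old code happened to work only on little-endian platforms.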
Jeff Squyres
c5cc1f0c1a Add man page for wrapper compilers.
Fixes trac:358.

This commit was SVN r11773.

The following Trac tickets were found above:
  Ticket 358 --> https://svn.open-mpi.org/trac/ompi/ticket/358
2006-09-25 14:11:21 +00:00
Ralph Castain
977e3c5ca1 Let's see if Cyrador understands this version a little better...
This commit was SVN r11709.
2006-09-19 13:05:40 +00:00
Ralph Castain
0ad0d84afd Add two new API functions to the RMGR, and modify the "spawn" API to support the enhanced MPI-2 functionality.
No implementation backs these new APIs - just placeholders for now.

This commit was SVN r11699.
2006-09-19 01:45:05 +00:00
George Bosilca
f8de894efe This one wasn't supposed to get into the repository.
This commit was SVN r11697.
2006-09-18 21:28:55 +00:00
George Bosilca
7ad23ff97b Be 100% TotalView friendly. Let tv find out the real name of our
executable and export all functions as they should be.

This commit was SVN r11694.
2006-09-18 17:55:14 +00:00
Ralph Castain
d7e61e40fc Quiet a few warnings from Cyrador
This commit was SVN r11686.
2006-09-18 12:40:42 +00:00
Jeff Squyres
8226dab86c Fixes trac:377
Add --enable-orterun-prefix-by-default (and a synonym:
--enable-mpirun-prefix-by-default) to make orterun always behave as if
"--prefix $prefix" was given on the command line (where $prefix is the
value given to the --prefix option to configure).  This prevents many
rsh/ssh users from needing to modify their shell startup files to set
the LD_LIBRARY_PATH for Open MPI (they will still need to set PATH or
otherwise find the OMPI executables to mpicc/mpirun/etc. their MPI
applications).

Also added --noprefix option to orterun to disable this behavior.
Finally, note that even if --enable-orterun-prefix-by-default is
specified, if the user specifies --prefix or /path/to/mpirun, these
options will override the default value of the prefix ($prefix).

This commit was SVN r11669.

The following Trac tickets were found above:
  Ticket 377 --> https://svn.open-mpi.org/trac/ompi/ticket/377
2006-09-15 02:52:08 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
Galen Shipman
b02185374f Push a generated "key" out to all the processes. This is necessary for some
interconnect wireup in which all processes must agree on a "key" to initialize
the interconnect with. 

This commit was SVN r11653.
2006-09-14 15:27:17 +00:00
George Bosilca
7e7bae335e Protect the environ variable on windows.
This commit was SVN r11435.
2006-08-27 04:44:17 +00:00
George Bosilca
e04032ca2f Correct a comment and protect the usage of the environ variable against Windows.
This commit was SVN r11397.
2006-08-24 16:18:42 +00:00
George Bosilca
fdfae70dbe Use environ.
This commit was SVN r11353.
2006-08-23 06:19:47 +00:00
George Bosilca
75fa0317da Keep environ as the preferred storage for the environment variables.
This commit was SVN r11351.
2006-08-23 06:14:24 +00:00
George Bosilca
b4732f557a Now it's time to update ORTE. Cleanup most of the ORTE tools. Force them
to use opal_basename and opal_dirname. Don't create the path manually. Use
the specialized opal functions instead.

This commit was SVN r11345.
2006-08-23 02:35:00 +00:00
George Bosilca
6ef0acf99f The names of the defines should start with OPAL as they belong to the
OPAL layer.
We now support 64-bit Windows too.

This commit was SVN r11312.
2006-08-21 21:55:41 +00:00
Ralph Castain
8c7f0ed9ae Change the SOH to the new State Monitoring and Reporting (SMR) framework. New APIs will be appearing in the new framework shortly - this just gets the name change into the system.
Other changes:

1. Remove the old xcpu components as they are not functional.

2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.

This will require an autogen/configure, I'm afraid.

This commit was SVN r11228.
2006-08-16 16:35:09 +00:00
Ralph Castain
5dfd54c778 With the branch to 1.2 made....
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).

Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).

I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).

In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...

Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.

This commit was SVN r11204.
2006-08-15 19:54:10 +00:00
Ralph Castain
ec3eeb819d Remove unused variable to make Cyrador happy.
This commit was SVN r11144.
2006-08-10 12:57:55 +00:00
Ralph Castain
56c15963af Finalize the session directory and runtime when the orted exits due to a failed launch.
This commit was SVN r11141.
2006-08-09 17:00:53 +00:00
Ralph Castain
8bec270f90 Fix a bug noted by Jeff - we were no longer accurately recording in the registry that a process had been terminated when the user initiated the "kill" process (via ctrl-c).
Added another system-level test function for ORTE that just spins until terminated by a ctrl-c signal.

Modified orterun - added a couple of newlines to the output when abnormally terminating so the prompt always is on a new line.

This commit was SVN r10866.
2006-07-18 14:42:27 +00:00
Jeff Squyres
416e9de22d Fix some minor problems when handling the error cases
This commit was SVN r10854.
2006-07-17 19:21:10 +00:00
Ralph Castain
c22b0d516e Some edits to the man page for Jeff to review
This commit was SVN r10803.
2006-07-14 14:47:06 +00:00
Jeff Squyres
e6c9c699fe Minor changes:
- change -no_oversubscribe to -nooversubscribe (to be similar to
  -nolocal)
- Added text to orterun.1 describing slots and -nooversubscribe
Still need to add text about "mpirun a.out" functionality, and RHC
wants to make some minor edits, so committing for synchronization.

This commit was SVN r10800.
2006-07-14 14:15:03 +00:00
Jeff Squyres
f7a71772a7 Remove long-defunct "openmpi" tool from orte. It was apparently an
early generation of the orted, and is now long-dead.

This commit was SVN r10754.
2006-07-12 03:52:17 +00:00
Josh Hursey
682a6a123e - os_dirpath.c : reset the is_dir var each time through the loop.
- orte-clean.c : check to see if the base session directory is empty 
                 and delete it if it is.

- orte_universe_exists.c : Fix a downstream problem resulting from 
      George's r10718 commit. Don't use the 'fulldirpath' since
      that is no longer guaranteed to be the absolute path
      to the session directory. Construct this value outside of that
      function from the prefix and frontend vars.

This commit was SVN r10741.

The following SVN revision numbers were found above:
  r10718 --> open-mpi/ompi@47eef2e002
2006-07-11 17:31:05 +00:00
Josh Hursey
5a812c8211 Fix orte-ps which George broke in r10718 by extending the orte_session_dir_get_name()
so that it does not return an error when no universe is passed to it.

Also put back in the 'Slots In Use' column as it is now working properly
per Ralph's recent ras commits. Still not sure what 'Slots Alloc' is meant
to represent, so left that as #if 0'd out for the moment.

This commit was SVN r10739.

The following SVN revision numbers were found above:
  r10718 --> open-mpi/ompi@47eef2e002
2006-07-11 16:54:07 +00:00
Josh Hursey
2e506591c3 more pedantic cleanup. Hopefully this will make happy.
This commit was SVN r10730.
2006-07-11 13:48:28 +00:00
Josh Hursey
6309047e63 pedantic cleanup
This commit was SVN r10728.
2006-07-11 13:43:50 +00:00
George Bosilca
b3e5c658d2 Add the correct include file.
This commit was SVN r10721.
2006-07-11 05:50:15 +00:00
George Bosilca
523b6dcbe8 Protect the header files. Remove the directory using the OPAL
function.

This commit was SVN r10716.
2006-07-11 05:25:41 +00:00
George Bosilca
94f6cb3765 There is no SIG_USR1 and SIG_USR2 on windows.
This commit was SVN r10715.
2006-07-11 05:24:08 +00:00
Ralph Castain
febc143d8c Per LANL's stated need, add functionality that runs a.out across ALL available process slots if no num_proc is specified on the command line. However, please note the following limitation: we ONLY allow ONE application to be specified on the command line when this feature is invoked. If multiple apps are specified, the user MUST also specify the number to be launched for each and every one of them.
Update the help text to report errors when not following that rule.

Also updated the RMAPS help text to reflect the reorganization of some of the round-robin code into the base.

The new functionality has been tested under Mac OS-X and on Odin using an MPI program. Both byslot and bynode mapping have been checked and verified. Operational support for other systems needs to be verified - I respectfully request people's help in doing so.

This commit was SVN r10708.
2006-07-10 21:25:33 +00:00
Josh Hursey
c38c47a4f5 Fix some unreachable statements. Caught by a nightly build.
This commit was SVN r10696.
2006-07-10 13:32:31 +00:00
Jeff Squyres
538965aeb0 Final merge of stuff from /tmp/tm-stuff tree (merged through
/tmp/tm-merge).  Validated by RHC.  Summary:

- Add --nolocal (and -nolocal) options to orterun
- Make some scalability improvements to the tm pls

This commit was SVN r10651.
2006-07-04 20:12:35 +00:00
Josh Hursey
d082a63734 Add some new OPAL functionality.
After seeing the ugliness that is removing directories in the
codebase I decided to push down this to the OPAL by extending the
opal/os_create_dirpath.(c|h) to contain some more functionality.

In this process I renamed 'os_create_dirpath' to 'os_dirpath' since it
is a bit more general now.

Added a few functions to:
 - check if a directory is empty
 - check to see if the access permissions are set correctly
 - destroy the directory at the end of the dirpath
   - By using a caller callback function (a la Perl, I believe)
     for every file, the caller can have fine grained control over
     whether a specific file is deleted or not.

This simplifies things a bit for orte_session_dir_(finalize|cleanup)
as it should no longer contain any of this functionality, but uses
these functions to do the work.

From the external perspective nothing has changed, from the 
developer point of view we have some cleaner, more generic code.

This commit was SVN r10640.
2006-07-03 22:23:07 +00:00
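
The callback-controlled directory removal described in commit d082a63734 could look roughly like the following sketch. It is not the OPAL `os_dirpath` code: the names `delete_filter_fn`, `keep_dot_keep_files`, and `remove_dir_filtered` are hypothetical, the sketch is non-recursive, and it relies only on POSIX `opendir`/`readdir`/`unlink`/`rmdir`.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical callback type: return nonzero to allow deleting `name`
 * inside `dir`, zero to keep it. */
typedef int (*delete_filter_fn)(const char *dir, const char *name);

/* Example callback (an assumption for illustration): keep any file whose
 * name ends in ".keep" and allow everything else to be deleted. */
static int keep_dot_keep_files(const char *dir, const char *name)
{
    size_t len = strlen(name);

    (void)dir; /* unused in this simple filter */
    return !(len >= 5 && strcmp(name + len - 5, ".keep") == 0);
}

/* Delete the entries of `dir` that the callback approves (all entries if
 * the callback is NULL), then remove `dir` itself if nothing was kept.
 * Returns the number of entries left behind, or -1 on error.
 * Subdirectories are not descended into; unlink() fails on them, so they
 * are counted as kept. */
static int remove_dir_filtered(const char *dir, delete_filter_fn should_delete)
{
    DIR *dp = opendir(dir);
    struct dirent *ent;
    char path[4096];
    int kept = 0;

    if (dp == NULL) {
        return -1;
    }
    while ((ent = readdir(dp)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) {
            continue;
        }
        if (should_delete != NULL && !should_delete(dir, ent->d_name)) {
            kept++;
            continue;
        }
        snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
        if (unlink(path) != 0) {
            kept++;
        }
    }
    closedir(dp);
    if (kept == 0 && rmdir(dir) != 0) {
        return -1;
    }
    return kept;
}
```

A caller that wants to clear a session directory but preserve selected files would pass its own filter, for example `remove_dir_filtered(path, keep_dot_keep_files)`. Passing `NULL` removes every plain file and then the directory itself.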
Josh Hursey
38df31e488 A bit of cleanup in the pretty_printing, making it a bit more sane.
Since we don't properly handle connecting/disconnecting from multiple
universes, only connect to the first one (or the user specified one).
This is a bug that needs to be fixed, but involves some deep magic in
ORTE.

Print the node segment upon request (-n option). 
{{{
Node Name | Arch | Cell ID |   State | Slots | Slots Max | 
-----------------------------------------------------------
  odin001 |      |       0 | Unknown |     2 |         4 | 
  odin002 |      |       0 | Unknown |     2 |         5 | 
  odin003 |      |       0 | Unknown |     2 |         6 | 
  odin004 |      |       0 | Unknown |     2 |         7 | 
}}}

Since node_slots_alloc and node_slots_inuse are not properly updated
in the GPR don't print those values.

This commit was SVN r10633.
2006-07-03 17:11:02 +00:00
Josh Hursey
fc72eb4a01 remove a residual warning
This commit was SVN r10628.
2006-07-03 15:16:15 +00:00
Josh Hursey
2edf1511fd Closes ticket #173 : Split name linking up for orte/ompi shared tools.
This moves the logic to create the symbolic links for:
 - mpirun
 - mpiexec
 - ompi-ps
 - ompi-clean
and their respective man pages to the ompi level from
the orte layer.

This is a bit pedantic, but orte shouldn't be doing the
work of ompi since that is a bit of an abstraction break.

Note: need to autogen.sh to get this. Sorry :(

This commit was SVN r10602.
2006-06-30 22:01:56 +00:00
Josh Hursey
c356f4e948 forgot to init a var. Thanks Jeff for catching this
This commit was SVN r10583.
2006-06-30 14:22:58 +00:00
Ralph Castain
a90f8feb35 Need to initialize the buffer in the contact_info command.
This commit was SVN r10563.
2006-06-29 14:57:10 +00:00
Josh Hursey
793bbc667a bringing over orte-clean from tmp/jjhursey-ft-cr branch
per a request.

Currently it is not working well. That will soon change
as it just needs a bit of attention and testing to
make it lots-mo-betta.

This commit was SVN r10556.
2006-06-28 22:33:54 +00:00
Josh Hursey
9c0a279522 Moved the 'orte-ps' command from the tmp/jjhursey-ft-cr branch
per a request for its functionality into the main trunk.

This command provides basic information about a running job. It
needs a bit of attention, but works fine in its current iteration.

Please play with it, and let's try to work out all the leftover bugs.

Pending action for this tool:
It has been requested that the tool be changed slightly to allow
it to be called via a function call from internal libraries
(e.g. orteconsole).

This commit was SVN r10554.
2006-06-28 22:06:13 +00:00
Brian Barrett
b6663c64c7 * fix for bug #161 - add man page info for recently added features
This commit was SVN r10514.
2006-06-26 22:16:39 +00:00
Brian Barrett
86861bc1c3 * add --quiet option, and suppress a couple of the status messages in
orterun if it is actually enabled.  For ticket #129.

This commit was SVN r10497.
2006-06-26 18:21:45 +00:00
Brian Barrett
4e8abb943b * fix up signal handling code so that one function handles SIGUSR1 and
SIGUSR2.  This can be extended later if needed to include other
  signals we should forward to the user processes (TSTP and CONT,
  perhaps?)
* Since the signal handlers don't actually run in signal context, we
  can use malloc/fprintf/etc.  So clean up some of the signal handler
  code so that we don't keep message buffers around for the life of
  the process

This commit was SVN r10496.
2006-06-26 15:12:52 +00:00
Brian Barrett
9766c01e50 * Per discussion at quarterly meeting and bug #91, print out the bug
contact point when printing version and help strings

This commit was SVN r10484.
2006-06-22 19:48:27 +00:00