1
1
Граф коммитов

415 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
8636ac6a4d Fix ticket 353 - print out a nice message that the combination of debug-daemons and num_concurrent in the pls rsh launcher will cause deadlock and exit
This commit was SVN r12279.
2006-10-24 15:59:02 +00:00
Tim Prins
cb622db7c9 Fixes trac:352
Only close off stdout/stderr from the daemons if we are not debugging the slurm pls and --debug-daemons was not passed.

This commit was SVN r12276.

The following Trac tickets were found above:
  Ticket 352 --> https://svn.open-mpi.org/trac/ompi/ticket/352
2006-10-24 13:05:13 +00:00
Tim Prins
93d61d01fb Fix for a problem on SLURM we have neen having since r12243 where mpirun would hang after the process had finished. It turns out that we were always reporting the name of the daemon wrong, but we simply never noticed as we never used it, until r12243. This makes it so we report the name of the daemon correctly.
This commit was SVN r12274.

The following SVN revision numbers were found above:
  r12243 --> open-mpi/ompi@153e38ffc9
2006-10-24 01:41:28 +00:00
George Bosilca
2a863df0a5 Newline is required by some compilers at the end of a file.
This commit was SVN r12244.
2006-10-21 05:56:04 +00:00
Ralph Castain
153e38ffc9 Lesson to be learned: if you send an ack to a recv'd command, better not send it to the same tag it came from - at least, not if there is a persistent recv on that tag!
Fix the persistent daemon problem where it was exiting when a job completed. Problem was that the persistent daemon would order the job daemons to exit. They would then send an 'ack' back to the persistent daemon - but the ack consisted of an echo of the "exit" command, which was recv'd by the wrong listener who treated it as a properly sent cmd....and exited.

This commit was SVN r12243.
2006-10-21 02:53:19 +00:00
Ralph Castain
c07d4e2510 Cleaner rendition now extended to other environments. Remove MCA params for backend procs that can cause trouble. Specifically, any directives on the selection of components for RDS, RAS, RMAPS, PLS, and RMGR can be bad mojo on the backend.
This patch will cause a problem for cnos, however, as there we want to specifically tell the backends to be "null". I'm working on that issue.

This commit was SVN r12225.
2006-10-20 16:50:13 +00:00
Ralph Castain
02efd07b60 Fix the MCA param passing issue, at least for rsh at the moment. I will clean this up and move it to the other environments once I shift back to a local computer.
This commit was SVN r12224.
2006-10-20 15:27:29 +00:00
Ralph Castain
b07a6b1d7a Fix a major typo that caused remote launch to crash - had something inside the wrong brace
This commit was SVN r12221.
2006-10-20 14:30:23 +00:00
George Bosilca
2aa3e51223 Nothing relevant. Only a set of castings to have a clean compile on
Windows. The cl.exe compiler is pretty good at complaining about
any kind of non explicit cast.

This commit was SVN r12207.
2006-10-20 02:25:50 +00:00
Ralph Castain
3f55d6897a Remove the memory debugging options. Fix what appears to be a typo in a help file.
This commit was SVN r12107.
2006-10-12 00:44:48 +00:00
Brian Barrett
29c91cf2f3 * Fix issue in odls_bproc where we were using vpid instead of the number of
processes launched locally for the stdio file names.  This was causing
    the expected files to not exist and bproc_vexecmove_io to fail.
  * Clean up a bunch of debugging output in the bproc pls

This commit was SVN r12102.
2006-10-11 20:34:12 +00:00
Ralph Castain
f91a95b3fe Fix the bug that caused mpirun to hang when a remote executable wasn't found using the rsh launcher. Will now test on a remote node
This commit was SVN r12095.
2006-10-11 18:43:13 +00:00
Ralph Castain
2da8245be0 Correctly propagate no-daemonize
This commit was SVN r12093.
2006-10-11 17:53:17 +00:00
Ralph Castain
e5877cc459 Add the proper valgrind params
This commit was SVN r12092.
2006-10-11 17:48:41 +00:00
George Bosilca
b56636c855 orte_pls belong to the PLS framework, therefore it should only be defined
in pls.h.

This commit was SVN r12089.
2006-10-11 17:12:22 +00:00
Ralph Castain
27e305347c Add a couple of options to orterun that support debugging of daemons for memory corruption.
Ensure that the environment provided to local application processes isn't "polluted" by the orteds

This commit was SVN r12087.
2006-10-11 15:18:57 +00:00
Brian Barrett
f5b8f1f2f0 Work around Automake not knowing how to properly configure libtool to build
Objective C libraries

Refs trac:483

This commit was SVN r12080.

The following Trac tickets were found above:
  Ticket 483 --> https://svn.open-mpi.org/trac/ompi/ticket/483
2006-10-10 20:14:26 +00:00
Ralph Castain
0e9dc590b7 Fix typo that didn't make it over from testing on vogon
This commit was SVN r12068.
2006-10-09 20:37:39 +00:00
Ralph Castain
cebdc51762 Remove a debugging output
This commit was SVN r12066.
2006-10-09 02:10:52 +00:00
Ralph Castain
2e09128337 Many thanks to Jeff for tracking down the typo causing the orte_job_map_t destuctor to fail!!
Restore the OBJ_RELEASE calls to cleanup map objects.

This commit was SVN r12064.
2006-10-07 22:44:00 +00:00
Jeff Squyres
efe28d62e9 Fix some compiler errors. I have *not* checked this for correctness;
but it does now compile.

This commit was SVN r12062.
2006-10-07 19:10:56 +00:00
Ralph Castain
ae79894bad Bring the map fixes into the main trunk. This should fix several problems, including the multiple app_context issue.
I have tested on rsh, slurm, bproc, and tm. Bproc continues to have a problem (will be asking for help there).

Gridengine compiles but I cannot test (believe it likely will run).

Poe and xgrid compile to the extent they can without the proper include files.

This commit was SVN r12059.
2006-10-07 15:45:24 +00:00
George Bosilca
cda46efd2a Some missing DECLSPEC
This commit was SVN r12047.
2006-10-06 15:21:52 +00:00
George Bosilca
c79c436c8d Cleanups. Remove all __WINDOWS__ checks as this module will never
get compiled on Windows.

This commit was SVN r12011.
2006-10-05 06:17:30 +00:00
George Bosilca
dbe7f8ac32 Always return bool.
This commit was SVN r12009.
2006-10-05 05:45:18 +00:00
George Bosilca
03083cc1f6 Don't release the values[0] before it get initialized.
This commit was SVN r11999.
2006-10-05 05:25:18 +00:00
Ralph Castain
cd7d87aa7b Define the map data types for dss compatibility. Setup to debug bproc
This commit was SVN r11955.
2006-10-03 17:40:00 +00:00
Ralph Castain
9eb14425b7 The last of the debug messages that keep hiding. My apologies.
This commit was SVN r11937.
2006-10-02 18:43:32 +00:00
Ralph Castain
3fd67a038f Bring comm_spawn and persistent operations online. Still some minor problems, though - so don't use them yet, please.
This won't affect anything except those two scenarios.

This commit was SVN r11936.
2006-10-02 18:29:15 +00:00
Ralph Castain
12328395ae Missed a couple of debug statements
This commit was SVN r11935.
2006-10-02 15:46:41 +00:00
Ralph Castain
7494a7a83f Clean out some debugging statements that were inadvertently left in the commit
This commit was SVN r11933.
2006-10-02 15:03:18 +00:00
Ralph Castain
559b9b0ae8 Continue beating on comm_spawn. Setup to debug bproc.
This commit was SVN r11932.
2006-10-02 14:58:22 +00:00
Ralph Castain
65593cd67e Fix a few Cyrador warnings
This commit was SVN r11930.
2006-10-02 13:00:32 +00:00
Ralph Castain
121f834776 Continue bringing comm_spawn back online. Ensure all RM frameworks post their HNP receives. Fix the rmgr proxy component.
Still need some work on the proxy component, and on job termination for persistent daemon case.

This commit was SVN r11928.
2006-10-02 00:46:31 +00:00
Brian Barrett
e464adcd51 Need to tell the daemon how many procs it will start (always 1, because of
the way we do the fake mapping thing...

This commit was SVN r11924.
2006-10-01 23:25:22 +00:00
Brian Barrett
95ba51fbd4 * Clean up debugging output so that it's useful
* Error message in an NSError object is localizedDescription, not
    localizedErrorReason.  The latter is a decription of how the error
    can occur, which is usually nothing in XGrid frameworks.
  * Clean up silly error in finding the Kerberos Service Principal
    when using Kerberos authenticaion
  * Print useful error message when a connection unexpectedly closes, 
    as this is usually authentication related...

This commit was SVN r11923.
2006-10-01 22:43:17 +00:00
Brian Barrett
9807a38458 Always initialize the base output stream, but only set verbose if requested.
Otherwise, the PLS components have pay more attention to debugging streams
than the rest of the OMPI source code

This commit was SVN r11922.
2006-10-01 22:37:30 +00:00
Ralph Castain
db6a93fa63 Fix a couple of reported issues:
1. PLS finalize was not being called. Now ensure that happens during orte_finalize.

2. Errmgr proxies were sending their messages to the wrong tag - typical cut/paste error.

This commit was SVN r11891.
2006-09-29 12:45:50 +00:00
Brian Barrett
8943f583bf quiet some debugging output
This commit was SVN r11813.
2006-09-26 04:10:07 +00:00
Brian Barrett
d8d55a760f First attempt at Kerberos support for the XGrid process starter
refs trac:345

This commit was SVN r11812.

The following Trac tickets were found above:
  Ticket 345 --> https://svn.open-mpi.org/trac/ompi/ticket/345
2006-09-26 03:54:38 +00:00
Brian Barrett
9733c8e3bd Update XGrid RAS and PLS to the new infrastructure. Not yet super well
tested, but starting to get there...

This commit was SVN r11810.
2006-09-26 03:26:45 +00:00
Ralph Castain
f906af983a Forgot to change the silly Makefile.am names - sorry Cyrador!
This commit was SVN r11670.
2006-09-15 04:52:20 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
George Bosilca
17afe7dc9f Do it on the correct way as this is normally compiled as a module.
This commit was SVN r11660.
2006-09-14 21:22:41 +00:00
George Bosilca
01c5a115b2 Don't export the POE module. Only the component have to be exported (visible).
This commit was SVN r11659.
2006-09-14 21:20:31 +00:00
Josh Hursey
908f31fe9f Fix a code clarity issue in the POE PLS.
Allow the POE RAS to be compled for linux as well as AIX.
The POE RAS is really a Loadleveler RAS, and IU now has
a cluster that uses Loadleveler in a Linux environment (BigRed).

This seems to be the only thing we need to do so far to run 
Open MPI on BigRed. Yay :)

This commit was SVN r11600.
2006-09-09 05:13:15 +00:00
Josh Hursey
160120b4c5 Fix a cut-n-paste error that causes the 'num_concurrent' to be
set to 1 or 0 instead of the user defined number or default (128).

This caused the PLS to deadlock when using '--debug-daemons' with
more than 2 processes. :(

svn blame says that it was broken in r11347

It is *not* a problem on v1.1 or v1.2 branches.

Bug spotted by Tim Mattox and myself.

This commit was SVN r11575.

The following SVN revision numbers were found above:
  r11347 --> open-mpi/ompi@f52c10d18e
2006-09-08 15:17:17 +00:00
Jeff Squyres
0f11584a6c * Update svn:ignore
* Remove svn:executable from non-executable files

This commit was SVN r11555.
2006-09-07 17:17:40 +00:00
George Bosilca
ba1514f2e7 A slightly more Windows friendly version. Unfortunately there
is no support for SGE on Windows.

This commit was SVN r11436.
2006-08-27 04:46:43 +00:00
Pak Lui
65a524dd0d - need to provide option for showing the grid engine's JOB_ID in case the grid engine job needs to be killed
- clean up the orted_path and debug message

This commit was SVN r11413.
2006-08-24 20:27:19 +00:00
Pak Lui
4f75dfd353 - missed the opal_os_path() for LD_LIBRARY_PATH
This commit was SVN r11410.
2006-08-24 18:58:50 +00:00
George Bosilca
9110ea2b80 Add the Windows fork component. As fork is not available on Windows, I
create a process component which use CreateProcess to spawn the child.
Special care should be taken in order to correctly redirect the stdin,
stdout and stderr of the child process.

This commit was SVN r11405.
2006-08-24 17:51:20 +00:00
George Bosilca
0d607c1346 Use opal_os_path and OPAL_PATH_SEP to build the file path. I don't have any
machine to test, so I hope I get it right.

This commit was SVN r11398.
2006-08-24 16:20:32 +00:00
Pak Lui
5220c1ca42 - converted some tabs into spaces
This commit was SVN r11384.
2006-08-23 23:21:08 +00:00
Pak Lui
9dda057f05 - Do the changes as in r11347 for gridengine to use opal_os_path().
- Remove extra NULL argument from rsh module.

This commit was SVN r11377.

The following SVN revision numbers were found above:
  r11347 --> open-mpi/ompi@f52c10d18e
2006-08-23 20:40:01 +00:00
Jeff Squyres
715bae369c Remove extra argument - now obsoleted by the use of opal_os_path().
This commit was SVN r11366.
2006-08-23 14:32:06 +00:00
George Bosilca
f52c10d18e And ORTE is ready for prime-time. All Windows tricks are in:
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unamed structures
- no implicit cast.

Plus a full implementation for the orte_wait functions.

This commit was SVN r11347.
2006-08-23 03:32:36 +00:00
George Bosilca
6afa4c6c64 Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.

This commit was SVN r11270.
2006-08-20 15:54:04 +00:00
Ralph Castain
ee04e04dd0 Attempt to cleanup the xgrid pls module
This commit was SVN r11261.
2006-08-18 21:21:31 +00:00
Ralph Castain
8c7f0ed9ae Change the SOH to the new State Monitoring and Reporting (SMR) framework. New API's will be appearing in the new framework shortly - this just gets the name change into the system.
Other changes:

1. Remove the old xcpu components as they are not functional.

2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.

This will require an autogen/configure, I'm afraid.

This commit was SVN r11228.
2006-08-16 16:35:09 +00:00
Ralph Castain
5dfd54c778 With the branch to 1.2 made....
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).

Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).

I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).

In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...

Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.

This commit was SVN r11204.
2006-08-15 19:54:10 +00:00
Ralph Castain
663e25f7cb Finalize the Bproc vpid algorithm.
Bproc is now fully operational and supports oversubscribed conditions for both bynode and byslot mapping procedures.

This commit was SVN r11180.
2006-08-14 19:16:11 +00:00
Ralph Castain
285aea1c0c Update to bproc algorithm to support oversubscription - committing to move to another test environment.
Note that this may break bproc for the moment.

This commit was SVN r11178.
2006-08-14 18:34:13 +00:00
Ralph Castain
de9156552b I have confirmed that the later version of the bproc launcher does support Bproc 3, so it appears that the outdated bproc_seed launcher truly is no longer required.
This commit was SVN r11164.
2006-08-12 07:47:21 +00:00
Ralph Castain
0ccc910485 Fix the Bproc vpid computation so that, when we map by slot, adjacent processes have vpids differing by only one.
I will ammend the documentation in the files shortly to explain why this was previously broken.

This commit was SVN r11162.
2006-08-11 19:41:33 +00:00
Ralph Castain
59d6f1e2eb Remove ompi_ignores on gridengine components as this seems resolved - thanks Pak for quick response!
Fixed a few very minor compiler complaints in the pls_gridengine_module.c file. ISO C is less forgiving about where variables get declared.

This commit was SVN r11156.
2006-08-11 15:32:17 +00:00
Ralph Castain
5fd6306c2f Add ompi_ignores until the configuration can be fixed
This commit was SVN r11154.
2006-08-11 14:11:41 +00:00
Pak Lui
08352878cc * Added in new ras and pls components to support Sun N1 Grid Engine (N1GE)
6 and its open source version as the job launchers for ORTE.

This commit was SVN r11153.
2006-08-10 21:46:52 +00:00
Ralph Castain
8496b6aff4 When a "fork" launch cannot find the executable, the system used to just return an error. This meant that the state of that process was never updated in the registry, leaving the counters at the incorrect levels. As a result, the triggers would never fire to indicate that the job had been aborted. This left orterun and other orteds/processes hanging.
This fix should fix the problem. I will test it on a broader range of systems forsooth...

This commit was SVN r11140.
2006-08-09 15:29:08 +00:00
Brian Barrett
16186978bb - Fix some compile issues in r11109
- indent / whitespace cleanup
- don't set --daemon-debug when pls debug is given, as it seems to make
  the daemons abort.

This commit was SVN r11113.

The following SVN revision numbers were found above:
  r11109 --> open-mpi/ompi@da7df6d257
2006-08-03 18:51:42 +00:00
Galen Shipman
da7df6d257 monitor bproc node state and terminate the job if a node in our job goes
down.. 

This commit was SVN r11109.
2006-08-03 05:29:49 +00:00
Ralph Castain
8bec270f90 Fix a bug noted by Jeff - we were no longer accurately recording in the registry that a process had been terminated when the user initiated the "kill" process (via cntrl-c).
Added another system-level test function for ORTE that just spins until terminated by a ctrl-c signal.

Modified orterun - added a couple of newlines to the output when abnormally terminating so the prompt always is on a new line.

This commit was SVN r10866.
2006-07-18 14:42:27 +00:00
Jeff Squyres
ffddfc5629 Turns out that it's a really Bad Idea(tm) to tm_spawn() and then not
keep the resulting tm_event_t that is generated because the back-end
TM library actually caches a bunch of stuff on it for internal
processing, and doesn't let go of it until tm_poll().

tm_event_t's are similar to (but slightly different than)
MPI_Requests: you can't do a million MPI_Isend()'s on a single
MPI_Request -- a) you need an array of MPI_Request's to fill and b)
you need to keep them around until all the requests have completed.

This commit was SVN r10820.
2006-07-14 22:04:41 +00:00
Jeff Squyres
ef8433a60b After more discussion on the phone, it seems easier to not muck around
in special components but rather go down to a /tmp branch.  So
removing these components and I'll branch next.

This commit was SVN r10771.
2006-07-12 22:12:29 +00:00
Jeff Squyres
62c189ea1c Fix a few blanket search/replaces
This commit was SVN r10768.
2006-07-12 21:54:05 +00:00
Jeff Squyres
36ca7497d1 Update m4 and configure files
This commit was SVN r10764.
2006-07-12 20:55:39 +00:00
Ralph Castain
a84898316c Create new components to support Thunderbird scalability development
This commit was SVN r10762.
2006-07-12 20:28:23 +00:00
Brian Barrett
41e144c879 Fix for ticket #92, bproc stdin being borked. The problem was that we were
using a pty for everything, which drops all buffered data on the floor when
close() is called on the daemon side, meaning EOF has some issues.  Instead,
do the same thing we do for other starters that use the fork() pls -- use
a pipe/fifo for stdin and stderr and a pty for stdout.  This is good enough
for what we need and avoids most of the issues with ptys.

This commit was SVN r10692.
2006-07-08 21:18:24 +00:00
Jeff Squyres
3d5d0959fa Remove unused variable, and therefore silence a compiler warning.
This commit was SVN r10673.
2006-07-06 10:44:04 +00:00
Josh Hursey
b1da6f8bc4 A bit more cleanup for that last patch.
* num_children should really be an int instead of size_t
  since 'size_t' is not signed and num_children can (in rare cases)
  drop below 0, and don't want it to roll around to MAX_INT or some
  such.

 * I figured out that this problem only happened to me because I use
  the pls_fork_reap_timeout MCA parameter and thus the only time that
  the code in pls_fork_module.c to waitpid is executed is if this is
  not set to 0 (I had it set to 1 to give my procs time to exit). I
  adjusted the loop from while{...} to do{...}while; so that it is
  executed at least once for consistency.

 * de-register the SIGCHILD callback for the pid before we attempt
  to kill it, so that we don't leave the door open for both the
  waitpids (the one in the callback, and the one in this function)
  to race to see who can wait on the child.

 * Move the 'thread release' to outside the for loop for a bit of an
  optimization, and always set the value to 0 since we want to 
  finish after this function.

 * Added a help message for the case when we can't send a kill()
   signal to the process. Should never happen, but all is possible
   in the wild wild west of HPC.

This commit was SVN r10666.
2006-07-05 21:38:23 +00:00
Josh Hursey
696bb4a0c0 A partial fix for the hanging orted bugs (Ticket #177)
When we force an application to terminate (via CTRL-C to mpirun)
we send an out-of-band message to the orted to reap its children.
the fork PLS was doing an internal waitpid but never releasing or
updating the information and signaling the condition variable. So
the fork PLS callback for SIGCHLD registered with the event library
and this waitpid are in a bit of a race to 'waitpid' for the children.
Since the PLS callback was the only one that handled the signal properly
when it 'won' then things were great -- as in the normal termination case.
But when it 'lost' -- as in the abnormal termination case -- the orted
never received the proper signal that its children had gone away.

We want to preserve the internal fork PLS callback since it allows
for a timeout while waiting for the child, which the event library
won't do.

This allows both to exist, and behave properly.

This was introduced in r9068.

The ticket is still open since the orted's hang in other situations
still. This is a fix for one of the causes.

This commit was SVN r10662.

The following SVN revision numbers were found above:
  r9068 --> open-mpi/ompi@c2c2daa966
2006-07-05 19:37:29 +00:00
Jeff Squyres
538965aeb0 Final merge of stuff from /tmp/tm-stuff tree (merged through
/tmp/tm-merge).  Validated by RHC.  Summary:

- Add --nolocal (and -nolocal) options to orterun
- Make some scalability improvements to the tm pls

This commit was SVN r10651.
2006-07-04 20:12:35 +00:00
Josh Hursey
5c5ce7e051 When 'mca_oob_send_callback' accesses the callback 'orte_pls_rsh_terminate_job_cb'
with an error status (< 0) then the req buffer is NULL. Put checks around the
OBJ_RELEASE(req) calls so that we don't try to release NULL :/

This commit was SVN r10641.
2006-07-03 22:44:54 +00:00
Josh Hursey
d082a63734 Add some new OPAL functionality.
After seeing the uglyness that is removing directories in the
codebase I decided to push down this to the OPAL by extending the
opal/os_create_dirpath.(c|h) to contain some more functionality.

In this process I renamed 'os_create_dirpath' to 'os_dirpath' since it
is a bit more general now.

Added a few functions to:
 - check if an directory is empty
 - check to see if the access permissions are set correctly
 - destroy the directory at the end of the dirpath
   - By using a caller callback function (a la Perl, I believe)
     for every file, the caller can have fine grained control over
     whether a specific file is deleted or not.

This simplifies things a bit for orte_session_dir_(finalize|cleanup)
as it should no longer contain any of this functionality, but uses
these functions to do the work.

From the external perspective nothing has changed, from the 
developer point of view we have some cleaner, more generic code.

This commit was SVN r10640.
2006-07-03 22:23:07 +00:00
Brian Barrett
0bd5acc51f * Fix for bus error in XGrid starter
This commit was SVN r10615.
2006-07-01 16:16:46 +00:00
Brian Barrett
2cf73912e2 * fix for signal forwarding additions in bproc_orted code
This commit was SVN r10529.
2006-06-27 19:59:07 +00:00
Sushant Sharma
76926756d0 variable ntid not being assigned any value was resulting in errors
This commit was SVN r10480.
2006-06-22 18:00:54 +00:00
Sushant Sharma
b5a16b6515 Updated xcpu launcher. open-mpi no longer needs xcpu library. Launcher code is now moved within xcpu.
This commit was SVN r10342.
2006-06-13 23:21:56 +00:00
Brian Barrett
17a8ccef89 * update XGrid API to match recent signal changes
This commit was SVN r10262.
2006-06-08 21:15:35 +00:00
Ralph Castain
ee5a626d25 Add ability to trap and propagate SIGUSR1/2 to remote processes. There are a number of small changes that hit a bunch of files:
1. Changed the RMGR and PLS APIs to add "signal_job" and "signal_proc" entry points. Only the "signal_job" entries are implemented - none of the components have implementations for "signal_proc" at this time. Thus, you can signal all of the procs in a job, but cannot currently signal only one specific proc.

2. Implemented those new API functions in all components except xgrid (Brian will do so very soon). Only the rsh/ssh and fork modules have been tested, however, and only under OS-X.

3. Added signal traps and callback functions for SIGUSR1/2 to orterun/mpirun that catch those signals and call the appropriate commands to propagate them out to all processes in the job.

4. Added a new test directory under the orte branch to (eventually) hold unit and system level tests for just the run-time. Since our test branch of the repository is under restricted access, people working on the RTE were continually developing their own system-level tests - thus making it hard to help diagnose problems. I have moved the more commonly-used functions here, and added one specifically for testing the SIGUSR1/2 functionality.

I will be contacting people directly to seek help with testing the changes on more environments. Other than compile issues, you should see absolutely no change in behavior on any of your systems - this additional functionality is transparent to anyone who does not issue a SIGUSR1/2 to mpirun.

Ralph

This commit was SVN r10258.
2006-06-08 18:27:17 +00:00
Jeff Squyres
4882dc0e2c Addendum to r9930: missed a chunk of the rsh pls to use the basename
of $libdir and $bindir (i.e., was correctly doing local launches, but
was still using $prefix/lib and $prefix/bin for remote launches).

[Re-]Fixes OFED bug 59.

This commit was SVN r10207.

The following SVN revision numbers were found above:
  r9930 --> open-mpi/ompi@1d6902296c
2006-06-05 21:12:36 +00:00
Jeff Squyres
1d6902296c Additions to the tm, slurm, and rsh pls modules to handle the --prefix
option as discussed on the devel-core mailing list.  The Big
Difference is that instead of hard-coding the strings "/lib" and
"/bin" in to append to the prefix, we append the basename of the local
libdir and bindir.  Hence, if your libdir is $prefix/lib64, we'll
append /lib64 to construct the remote node's LD_LIBRARY_PATH (etc.).

Also appended the orterun.1 man page to include a description of
--prefix, how it is constructed, what it handles / what it does not,
etc.

This commit was SVN r9930.
2006-05-16 14:14:12 +00:00
Gleb Natapov
80dfe7e39b remove newline from environment
This commit was SVN r9892.
2006-05-11 13:15:48 +00:00
Sushant Sharma
7a6e0c9ebf Fixed remote environment setup. Submitted by: Tim Woodall
This commit was SVN r9759.
2006-04-27 20:07:56 +00:00
Ralph Castain
95c4795157 Try a different tack...
This commit was SVN r9658.
2006-04-19 15:33:34 +00:00
Ralph Castain
93115fdaea Try again with passing the right enviro variables.
This commit was SVN r9629.
2006-04-13 18:07:22 +00:00
Ralph Castain
480af1c150 Add the missing enviro variables
This commit was SVN r9627.
2006-04-13 16:41:47 +00:00
Sushant Sharma
642e33fb3e xcpu launcher updated to setup the environment on remote nodes before launching jobs.
This commit was SVN r9622.
2006-04-12 22:42:41 +00:00
Ralph Castain
424900068f Update the xcpu launcher to setup the environment
This commit was SVN r9620.
2006-04-12 15:41:54 +00:00
Sushant Sharma
9fe5870862 xcpu pls component fixed so that it will compile correctly.
This commit was SVN r9617.
2006-04-11 20:27:13 +00:00
Ralph Castain
9adc16130e Proposed revision of the xcpu launcher to correctly incorporate the OpenRTE and Open MPI environment
This commit was SVN r9612.
2006-04-11 14:33:17 +00:00
Sushant Sharma
8d5289b2b8 Corrected Makefile.am files for pls and soh xcpu-components as per Brian's suggestion.
This commit was SVN r9519.
2006-04-03 17:14:47 +00:00
Jeff Squyres
858612fd06 Face the possibilty that the child may have already died.
This commit was SVN r9508.
2006-04-01 02:23:10 +00:00
Sushant Sharma
46f84b1e8e Added xcpu component in pls and soh.
This commit was SVN r9491.
2006-03-31 02:19:52 +00:00
Brian Barrett
23f0aef07c * fix casting warning
This commit was SVN r9407.
2006-03-24 02:52:34 +00:00
Tim Woodall
564c177922 corrections for threading support
This commit was SVN r9292.
2006-03-16 00:06:48 +00:00
Tim Woodall
648ca32742 correct include
This commit was SVN r9276.
2006-03-14 14:40:52 +00:00
Brian Barrett
c42da09796 * Fix a small bug George noticed - if you change the prefix (or any of the
installation directories) in configure, the files that depend on this
  information are not properly rebuilt.  If you need this information,
  don't setup a -D in the Makefile.am - instead, include 
  opal/install_dirs.h.
* Use the : option in AC_CONFIG_FILES to avoid needing to expose that
  we are playing around with temporary files with our headers to avoid
  rebuilding
* Clean up the version file information a bit, and like the install 
  directory stuff, make sure that there is a dependency so that 
  ompi_info gets rebuilt properly when a version number changes.

This commit was SVN r9256.
2006-03-12 04:35:01 +00:00
Brian Barrett
d0f5f8a242 * enable pty support for platforms that do not have openpty(), but do
have pty support in general.

This commit was SVN r9250.
2006-03-11 02:35:40 +00:00
Jeff Squyres
28a1610453 - Add mssing <netdb.h>
- Change #if to #ifdef

This commit was SVN r9146.
2006-02-26 16:06:58 +00:00
Jeff Squyres
b4765b6db6 Add <netdb.h>
This commit was SVN r9142.
2006-02-25 19:04:26 +00:00
Jeff Squyres
22da6ef4e4 This bit accidentally got lost in the testing of the bproc/fork path
and cwd update functionality.  For bproc, we *do* need to change
directories while checking the cwd because argv[0] may be expressed as
a relative path, and therefore needs to be checked from the cwd
expressed in the app context.

This commit was SVN r9084.
2006-02-17 16:15:21 +00:00
Jeff Squyres
2c91ac861a When not in debug mode, tie stdout/stderr to dev null so we don't see
messages from orted (i.e., from the srun command).

This commit was SVN r9083.
2006-02-17 15:06:08 +00:00
Jeff Squyres
81e0bd444b - Remove extraneous chdir("/tmp")
- For a local orted launch, chdir($HOME) to be consistent with what
  [we assume] will happen on the remote nodes

This commit was SVN r9073.
2006-02-16 22:14:05 +00:00
George Bosilca
ab59741df6 Access to environ is not granted for free, we have to declare it on MAC OS X
(at least with gcc 4.2).

This commit was SVN r9071.
2006-02-16 21:26:25 +00:00
Jeff Squyres
c2c2daa966 Change the behavior of orterun (mpirun, mpirexec) to search for
argv[0] and the cwd on the target node (i.e., the node where the
executable will be running in all systems except BProc, where the
searches are run on the node where orterun is invoked).
- fork pls now does cwd and argv[0] search in orted
- bproc pls does cwd and argv[0] search in orterun
- cwd behavior slightly different:
  - if user specifies a -wdir to orterun, we chdir() to there; if we
    can't for some reason, abort
  - if user does not specify a -wdir, try to chdir() to the dir where
    orterun was invoked.  If we can't for some reason (e.g., it
    doesn't exist on the target node), then try to chdir($HOME).  If
    we can't do that, then just live with whatever default directory
    we were put in.

This commit was SVN r9068.
2006-02-16 20:40:23 +00:00
Tim Woodall
039fe0ad29 change process group only in bproc case, as this is really
a workaround for a bproc4 bug

This commit was SVN r9064.
2006-02-16 16:19:37 +00:00
Tim Woodall
c07e84cf6d correct return values
This commit was SVN r9063.
2006-02-16 16:18:46 +00:00
Tim Woodall
fc751171cd bproc cleanup from release branch
This commit was SVN r9054.
2006-02-16 00:16:22 +00:00
David Daniel
aa5c5772c2 Fixing a wayward OMPI_ERROR.
Fixing logic of a couple of error logging statements (compiler was complaining)

This commit was SVN r9042.
2006-02-15 00:09:33 +00:00
Ralph Castain
f5d17148c1 Clean up the references to num_env, which has been removed from app_context.
This commit was SVN r9014.
2006-02-13 21:08:35 +00:00
Ralph Castain
bc6a82839d Update these components to new dss
This commit was SVN r9004.
2006-02-13 15:28:29 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Ralph Castain
1abe8ef368 Well, it certainly helps triggers to fire if the respective responsible routines adjust the counters!
The INIT counter is supposed to be adjusted when the processes are mapped - this is now done correctly.

The LAUNCHED counter is supposed to be adjusted when the pls sets the process pid info into the registry and changes the state to LAUNCHED. This could probably be changed to have that function use the set_proc_soh API, but this fixes the problem for now.

Thanks to Brian for finding that the triggers were not being fired.

This commit was SVN r8948.
2006-02-09 15:39:06 +00:00
Brian Barrett
72de49f0ad * make the xgrid component compile again. still need to test tomorrow...
This commit was SVN r8913.
2006-02-07 04:46:00 +00:00
Ralph Castain
4b9f015c0b Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list.
This commit was SVN r8912.
2006-02-07 03:32:36 +00:00
Jeff Squyres
abc67a257f This approach is cleaner than the previous one -- use a temporary
shell variable to avoid setting the OMPI $libpath twice in
$LD_LIBRARY_PATH.  Many thanks to Glenn Morris.

This commit was SVN r8883.
2006-02-02 11:58:40 +00:00
Jeff Squyres
cc1ee11eeb Fix issues with tcsh and LD_LIBRARY_PATH when using --prefix. See
lengthy comment inside for details.  Thanks to Glen Morris for finding
this issue and suggesting the fix.

This commit was SVN r8880.
2006-02-02 06:26:55 +00:00
Jeff Squyres
f7097d34c8 Remove some \n typos. Thanks to Glenn Morris for finding these.
This commit was SVN r8878.
2006-02-02 05:50:15 +00:00
George Bosilca
e6e28460f1 Remove all windows code as fork is not available on windows. Instead a shinny new pls
will join the fun (handling process creation on windows).

This commit was SVN r8745.
2006-01-19 07:01:51 +00:00
Jeff Squyres
e6bd80b424 Per the commit message of r8514, change the search order to be "ssh :
rsh", *not* "rsh : ssh".

This commit was SVN r8736.

The following SVN revision numbers were found above:
  r8514 --> open-mpi/ompi@9c25bdc5ac
2006-01-18 22:00:34 +00:00
George Bosilca
bf266c6109 Rollback the 8682 commit until we figure out the correct way to do it. It break several things
inside (like MPI_Wait* functions).

This commit was SVN r8686.
2006-01-13 22:02:40 +00:00
Rainer Keller
95f886b6ab - Protect callers of opal/ompi_condition_wait from spurious wakeups,
possible when with building with pthreads.
   Compiled on Linux ia32 with and without
   --enable-progress-threads

This commit was SVN r8682.
2006-01-12 17:13:08 +00:00
David Daniel
d272e02338 Need to include fcntl.h on linux -- protected for windows.
This commit was SVN r8630.
2006-01-04 00:54:16 +00:00
George Bosilca
7a88e72c1b Add more protections around the headers.
This commit was SVN r8617.
2005-12-31 12:35:24 +00:00
Jeff Squyres
1e93c78e2e - Rename rsh component members: argv->agent_argv, argc->agent_argc,
and path->agent_path so that it's totally clear what these are for
- make a new rsh component param for agent_param (the value from the
  MCA param)
- delay the path check for the agent until the component init -- don't
  make it fail during open, because the MCA base will print a warning
  if a component fails open() (e.g., on clusters without rsh/ssh (!),
  this component was failing noisly even though it was
  normal/expected)

This commit was SVN r8596.
2005-12-22 14:37:19 +00:00
Jeff Squyres
8fb5e506aa Arrgh -- should have included this in last commit: need to set a
variable before we += it in a Makefile.am.

This commit was SVN r8595.
2005-12-22 14:30:27 +00:00
Jeff Squyres
93b4d12d14 Add a friendly help message if no pls components are found to be
available.

This commit was SVN r8594.
2005-12-22 14:29:45 +00:00
Rainer Keller
b06d79d4fe - Seems with change r7664, the mapping has slightly changed.
In case of checking for Shell with --mca pls_rsh_assume_same_shell 0
   have the node point to sensible values.

This commit was SVN r8563.

The following SVN revision numbers were found above:
  r7664 --> open-mpi/ompi@0629cdc2d7
2005-12-20 15:59:17 +00:00
Brian Barrett
456ba1c11f * need to declare environ on OS X
this should go to the 1.0 branch

This commit was SVN r8527.
2005-12-16 19:20:33 +00:00
Jeff Squyres
fa097c9874 Remove two components that were templated out quite a while ago and
aren't currently in use (i.e., they were never finished).  If needed,
they can be pulled out of SVN history.

This commit was SVN r8524.
2005-12-16 17:40:51 +00:00
Jeff Squyres
25b2730a34 Only allow the fork component to run when we're in an orted.
This commit was SVN r8515.
2005-12-15 21:05:26 +00:00
Jeff Squyres
9c25bdc5ac Change to the rsh pls component to have the pls_rsh_agent MCA param
now take a colon-delimited list of agents (and associated argv).  Also
change the default value to "ssh : rsh".  Hence, if we run on a
cluster that does not have ssh, we'll fall back to rsh.  If we can't
find rsh, then the rsh component will disqualify itself from
selection.

This commit was SVN r8514.
2005-12-15 20:54:24 +00:00
George Bosilca
7d8d516a4a A bunch of fixed for Windows support.
- protection with __WINDOWS__ and not WIN32 or _WIN32
 - protect all the headers

This commit was SVN r8463.
2005-12-12 20:04:00 +00:00
Jeff Squyres
6fbd321442 Fix a bunch of install locations for header files
This commit was SVN r8406.
2005-12-08 00:54:44 +00:00
Galen Shipman
6e64e8a144 bproc fixes, these exist in the release 1.0 branch.
This commit was SVN r8292.
2005-11-28 21:10:02 +00:00
Brian Barrett
20cea60b82 * fix "make distclean" error in PML
* turns out (duh!) that there was a reason that the <projectdir>dir
  variable was set in the AM conditional.  If not, stupid directories
  are created and not needed...  duh.

This commit was SVN r8205.
2005-11-20 07:41:09 +00:00
Brian Barrett
8faa1884f0 * The last of the build system optimizations. Combine the component and
component/base Makefile.am files, reducing the time configure spends
  stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed

This commit was SVN r8200.
2005-11-20 01:03:01 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Tim Woodall
aa5b61e4f1 corrections for multiple app contexts
This commit was SVN r7939.
2005-10-31 20:37:44 +00:00
Tim Woodall
60754acae8 - modified rmaps data structures to point directly to ras node
- modified rsh to NOT query for each nodes mapping, as all data is
  already available in the rmaps structures

This commit was SVN r7894.
2005-10-27 17:04:10 +00:00
Brian Barrett
1302cb4072 The next in a long line of crazed build system changes from Brian. This was
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install).  Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am).  Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed.  This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)

This commit was SVN r7777.
2005-10-17 00:21:10 +00:00
Thara Angskun
73fff4ea2c - change from mca_base_param_register_* to mca_base_param_reg_*
- update include files / fix minor bugs

This commit was SVN r7746.
2005-10-13 12:58:31 +00:00
Josh Hursey
92429dc90f Fix for a problem Edgar and Jeff identified WRT PLS determining if we are
oversubscribed on a node. And thus whether to call sched_yield or not.

The value of node->node_slots_inuse does not currently represent the number of
slots actually in use, at the moment. This is actually a bug in the RAS/RMAPS
base components, but the fix for that specific bug is bigger than we want to 
address at the moment (but will certianly do so in the near future).

Since we cannot trust this value, use the total number of mapped processes
(which was properly set by the RMAPS component upon mapping -- Just not 
properly propagated back to the registry's node segment) from the process 
mapping.

In addition to this change I cleaned up a couple of the debug messages. It
seems that TM and RSH are the only two directly effected by this. SLURM
would be if that section of code wasn't currently inactive, but put the fix
in for prosparity.

This commit was SVN r7743.
2005-10-13 03:26:48 +00:00
Brian Barrett
128389758f * fix compile error in XGrid PLS that got introduced sometime in the not
too distant past
* work around apparently broken handling of max_slots somewhere along
  the line by just setting it to 0

Both changes should go to the trunk.

This commit was SVN r7710.
2005-10-12 00:41:14 +00:00
Jeff Squyres
0629cdc2d7 Bring back the changes from /tmp/jjhursey-rmaps. Specific merge
command:

svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps .

(where "." is a trunk checkout)

The logs from this branch are much more descriptive than I will put
here (including a *really* long description from last night).  Here's
the short version:

- fixed some broken implementations in ras and rmaps
- "orterun --host ..." now works and has clearly defined semantics
  (this was the impetus for the branch and all these fixes -- LANL had
  a requirement for --host to work for 1.0)
- there is still a little bit of cleanup left to do post-1.0 (we got
  correct functionality for 1.0 -- we did not fix bad implementations
  that still "work")
  - rds/hostfile and ras/hostfile handshaking
  - singleton node segment assignments in stage1
  - remove the default hostfile (no need for it anymore with the
    localhost ras component)
  - clean up pls components to avoid duplicate ras mapping queries
  - [possible] -bynode/-byslot being specific to a single app context 

This commit was SVN r7664.
2005-10-07 22:24:52 +00:00
Tim Woodall
2ea71064ad close all file descriptors w/ the exception of stdin/stdout/stderr
otherwise, parent's file descriptors are inherited and held open by
the child even if the parent dies

This commit was SVN r7652.
2005-10-06 21:22:36 +00:00
Jeff Squyres
65698bc6be Remove compiler warning
This commit was SVN r7635.
2005-10-05 10:23:02 +00:00
Jeff Squyres
0f100d8577 - Don't overwrite rc with the return value from pls_tm_disconnect --
it's always ORTE_SUCCESS and sometimes masks real !=ORTE_SUCCESS rc
  values. 
- Add MCA param pls_tm_want_path_check.  If nonzero (the default),
  check for the orted in the PATH before each tm_spawn()'ing (doing a
  little caching so that we don't hammer on the filesystem -- remember
  all the PATH's where we successfully found the orted so that we
  don't have to query the filesystem multiple times for a PATH where
  we previously found the orted)
- Be sure to opal_argv_split() the pls_tm_orted MCA param

This commit was SVN r7625.
2005-10-04 19:38:51 +00:00
Jeff Squyres
80399aff17 Add some README's to describe what these components are fore.
This commit was SVN r7618.
2005-10-04 15:14:23 +00:00
Jeff Squyres
3df0828921 Restore this PLS -- LANL needs this for some of its older clusters.
This commit was SVN r7617.
2005-10-04 15:09:38 +00:00
Jeff Squyres
7645a0fa23 This is the old bproc launcher that is ok to remove.
This commit was SVN r7583.
2005-10-02 14:58:52 +00:00
Jeff Squyres
a9f24c27bd Restore bproc -- this was *not* the old one (didn't read Tim Prins'
mail carefully -- doh!)

This commit was SVN r7582.
2005-10-02 14:57:44 +00:00
Jeff Squyres
d44fc0fa2a - Clarify the help file text a little
- Remove an extraneous \n in opal_output() output

This commit was SVN r7581.
2005-10-02 11:58:51 +00:00
Jeff Squyres
91ed790715 Add --prefix processing for the tm pls
This commit was SVN r7580.
2005-10-02 11:58:18 +00:00
Jeff Squyres
da1c096883 Remove old, outdated bproc launcher.
This commit was SVN r7579.
2005-10-02 10:45:00 +00:00
Jeff Squyres
0459678f82 Fixes to make the SLURM pls handle --prefix properly
This commit was SVN r7569.
2005-09-30 21:44:05 +00:00
Jeff Squyres
e9ec846c68 Minor change to only display the prefix debug message at most once
This commit was SVN r7568.
2005-09-30 21:43:32 +00:00
Jeff Squyres
d172088dd3 Leave it up to users to do something that we hadn't planned on. :-)
If you use --prefix and then "-x LD_LIBRARY_PATH", the rsh pls would
take great pains to ensure that PATH and LD_LIBRARY_PATH were setup
correctly on the local and remote nodes, but then the fork pls would
blitely overwrite LD_LIBRARY_PATH with what the user exported (i.e.,
most likely without our prefix).  This patch takes care of that -- the
fork pls examines the incoming environment, and if it sees PATH or
LD_LIBRARY_PATH, it re-prefixes those variables.

This commit was SVN r7566.
2005-09-30 19:14:31 +00:00
Andrew Friedley
82ee2933a5 - Add an opal_show_help() to the pls fork module to explain what went wrong when the execv to start the application fails.
- Add a couple opal_show_help()'s to indicate when not enough slots/nodes are available to satisfy a request.

This commit was SVN r7555.
2005-09-30 14:30:21 +00:00
Jeff Squyres
fcef1774d5 Per advice from Ralf W., change the pkgdata declarations in
Makefile.am's to be a *slightly* more correct (and, more importantly,
less error-prone) construct.

This commit was SVN r7554.
2005-09-30 13:32:39 +00:00
Jeff Squyres
de1c8fb125 - Make debug output a bit more accurate and readable
- Fix bug identified by users: --prefix may also apply on the local
  node; we need to prefix the PATH and LD_LIBRARY_PATH environment
  variables before invoking execve()

This commit was SVN r7541.
2005-09-29 12:35:43 +00:00
Andrew Friedley
cfa09dc0e7 Fix two more missing escapes.
Sorry about breaking the tree with typos, I think this should fix all of them.

This commit was SVN r7482.
2005-09-22 16:04:46 +00:00
Andrew Friedley
555ae37255 Add lib{opal,orte,mpi}.la to appropriate LIBADD's, some whitespace cleanup as well.
This commit was SVN r7477.
2005-09-22 12:28:54 +00:00
Tim Woodall
9c334800ad merge in environ from front-end node - giving precedence
to any user supplied values. otherwise, some c library
routines behave badly (getpwuid...)

This commit was SVN r7434.
2005-09-19 21:06:05 +00:00
George Bosilca
703e874468 Remove a race condition. If this functions is called by the progress thread then it does not have to
add an event, it can call the spawn function directly. This will avoid it standing on the condition who 
will never get released.

This commit was SVN r7428.
2005-09-19 15:54:53 +00:00
Tim Woodall
f71abbf856 cleanup
This commit was SVN r7414.
2005-09-16 20:59:53 +00:00
Brian Barrett
2787d993a9 * Add checks for fork/execve/setpgid for slurm components so that they
automagically don't build on platforms without such things
* Fix for mistaken use of cache variable in assembly setup
* one more cached test hits the books

This commit was SVN r7404.
2005-09-16 04:51:09 +00:00
Tim Woodall
5d6899258f correct typo
This commit was SVN r7396.
2005-09-15 20:54:40 +00:00
Tim Woodall
247e8044f2 don't include internal variables when building environment
This commit was SVN r7395.
2005-09-15 20:49:54 +00:00
Tim Woodall
e64f7cec70 remove warning regarding pty unless debug is enabled
This commit was SVN r7394.
2005-09-15 20:49:02 +00:00
Jeff Squyres
31ea80cd0b Simplify the poe configure.m4 -- if we're on AIX, always build it. If
we're not on AIX, don't build it.

This commit was SVN r7291.
2005-09-10 10:40:26 +00:00
Jeff Squyres
7e4c23d88c Add missing header file
This commit was SVN r7290.
2005-09-10 10:39:15 +00:00
Rainer Keller
3c639efa38 - Silly cleanup
This commit was SVN r7289.
2005-09-10 08:01:47 +00:00
Rainer Keller
5fed46e072 - Allow usernames to be specified in the hostfile.
The following formats are parsed:
    user@IPv4
    user@fqdn
    IPv4 or fqdn [username|user-name|user_name]=user
- Try a better error-detection when parsing (recognize wrong
  IPs, fqdns...)

This commit was SVN r7288.
2005-09-10 07:57:50 +00:00
Thara Angskun
e3feddfdd4 Move from stone aged configure.stub to configure.m4
This commit was SVN r7231.
2005-09-08 09:49:21 +00:00
Brian Barrett
ed56e743b7 * update configure.ac to use the modern version of AC_INIT and
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
  number to be set at autoconf time (instead of at configure time, as
  it was before).  Set the version number, minus the subversion r number,
  at autoconf time.  Override the internal variables to include the r
  number (if needed) at configure time.  Basically, the right thing
  should always happen.  The only place it might not is the version
  reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
  them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
  in the directory containing source files, even if the Makefile.am is
  in another directory.  This should start making it feasible to
  reduce the number of Makefile.am files we have in the tree, which
  will greatly reduce the time to run autogen and configure.

This commit was SVN r7211.
2005-09-07 05:54:53 +00:00
Josh Hursey
a5e5924217 Added a custom arguments MCA param for Slurm PLS.
This allows the user to specify certain options to srun when an application
is launched with this PLS.

A useful example is the need to set the time to wait from when the first
process completes and when slurm kills remaining processes:

  pls_slurm_args=--wait=1200

This commit was SVN r7206.
2005-09-06 21:52:28 +00:00
Rainer Keller
a36347d728 - Support -prefix specification on mpirun/orterun cmd-line per
app_context:
  mpirun -np 2 -prefix /path/to/ompi/on/machineA ./exec1 : \
         -np 2 -prefix /path/to/ompi/on/machineB ./exec2

- Allow with -mca pls_rsh_assume_same_shell 0, the checking for the
  SHELL-variable on the actual node (currently 1st node).
  Sets the prefix, PATH and LD_LIBRARY_PATH for bash/ksh and 
  csh/tcsh.

This commit was SVN r7195.
2005-09-06 16:10:05 +00:00
Rainer Keller
588a62cb90 - Missed file in last commit
This commit was SVN r7179.
2005-09-04 20:55:27 +00:00
Ralph Castain
4b5b3b4164 Properly handle the argv array and clean it up when done.
This commit was SVN r7166.
2005-09-03 00:15:21 +00:00
Ralph Castain
4bd25e0292 Few minor memory leak cleanups
This commit was SVN r7156.
2005-09-02 18:50:01 +00:00
Jeff Squyres
3962c53e2e - Add to AM_CPPFLAGS $(OPAL_LTDL_CPPFLAGS) where necessary in order to
add a -I to find the included ltdl.h (vs. a system-installed ltdl.h)
- Clean up kruft in a bunch of Makefile.am's to remove now-unnecessary
  AM_CPPFLAGS settings to get static-components.h for each framework
- Move the component_repository API functions out of opal/mca/base/base.h
  and into opal/mca/base/mca_base_component_repository.h in order to
  decrease unnecessary dependencies (e.g., before this, almost
  everything in the tree depended on ltdl.h, which is unnecessary --
  only a small number of files really need ltdl.h)

This commit was SVN r7127.
2005-09-01 12:16:36 +00:00
George Bosilca
53ccf0e58c POE is working. It can spawn jobs, redirect the output and is able to kill the job (with or without CTRL_C).
This commit was SVN r7093.
2005-08-30 16:13:55 +00:00
Rainer Keller
d7901c97a5 - Del whitespaces, to make coming patch smaller.
This commit was SVN r7089.
2005-08-30 06:58:37 +00:00
Brian Barrett
bf8a3632bb * bunch more memory leak / block in use fixes
This commit was SVN r7085.
2005-08-29 21:35:01 +00:00
Brian Barrett
fc71fd5744 * fix place where Jeff changed an exit to a return and we really wanted
it to be an exit.
* Put the srun process (or what is about to become the srun process) in
  it's own process group so that group-wide signals (such as the 
  SIGINT sent by hitting cntl-c in a shell) are not sent to the srun
  process. 

This commit was SVN r7068.
2005-08-27 17:08:48 +00:00
Jeff Squyres
27554c19d7 Add missing .h file
This commit was SVN r7062.
2005-08-27 11:01:44 +00:00
Jeff Squyres
c9cdb36b0b Finally get this right: move orte_sys_info.[ch] back into the orte
tree.
- fix up #include's throughout the tree (yay contrib/search_replace.pl!)
- remove a few extraneous #include's
- remove orte_sys_info*() from opal_init()/opal_finalize() (it's
  already in orte_init_stage1() and orte_system_finalize())
- remove dependencies in opal on orte_system_info -- util/os_path.c
  and util/os_create_dirpath.c (they only used path_sep, anyway --
  easily changed to #defines)

This commit was SVN r7059.
2005-08-26 21:03:41 +00:00
Jeff Squyres
b3bd549331 - Change a few calls from exit() to orte_abort() so that we get
session directory cleanup (among other things)
- When we get an abnormal exit in orterun (i.e., timeout expires and
  we haven't gotten termination notices from all processes), print a
  better message an exit in a better way (which includes session
  directory cleanup)
- Fix tm and poe pls's to not exit() but rather propagate the error up
  the stack (where relevant)

This commit was SVN r7058.
2005-08-26 20:36:11 +00:00