1
1
Граф коммитов

602 Коммитов

Автор SHA1 Сообщение Дата
Shiqing Fan
358b4a5cba Add an option to enable the debug postfix for executables.
This commit was SVN r24070.
2010-11-19 15:54:13 +00:00
Shiqing Fan
c03ea1a5f3 A more clean way to build on Windows.
It's not possible to combine two shared libraries on Windows, so we have to do it a bit different. First generate a small event static library by just linking the object files, and link it into other libraries that needs the libevent API.

This commit was SVN r24039.
2010-11-11 12:02:54 +00:00
Shiqing Fan
7bac326920 Fix Windows build, add custom command to generate static libraries (opal and orte) for shared build.
This commit was SVN r24012.
2010-11-09 08:32:45 +00:00
Ralph Castain
9ea2b196ce Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.

Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.

This commit was SVN r23966.
2010-10-28 15:22:46 +00:00
Ralph Castain
c13b0bb668 Update some debugger attachment code per LLNL request
This commit was SVN r23965.
2010-10-28 03:06:20 +00:00
Ralph Castain
86c7365e8e Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.

This commit was SVN r23943.
2010-10-26 02:41:42 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Ralph Castain
2c1a658232 Fix debugger attach
This commit was SVN r23921.
2010-10-22 20:07:24 +00:00
Ralph Castain
871b685e89 Ensure all debugger interface symbols are present in orterun
This commit was SVN r23823.
2010-09-30 21:36:00 +00:00
Ralph Castain
40a2bfa238 WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.

Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.

This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Rainer Keller
97511912ec - Fixup several functions, that cannot return
- Add one instance where we do not use a parameter in a function
 - Fix a buglet in commit r23689, where the attribute-for-function ptrs
   was applied.

This commit was SVN r23690.

The following SVN revision numbers were found above:
  r23689 --> open-mpi/ompi@5eb571c458
2010-08-31 12:21:13 +00:00
Josh Hursey
e12ca48cd9 A number of C/R enhancements per RFC below:
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php

Documentation:
  http://osl.iu.edu/research/ft/

Major Changes: 
-------------- 
 * Added C/R-enabled Debugging support. 
   Enabled with the --enable-crdebug flag. See the following website for more information: 
   http://osl.iu.edu/research/ft/crdebug/ 
 * Added Stable Storage (SStore) framework for checkpoint storage 
   * 'central' component does a direct to central storage save 
   * 'stage' component stages checkpoints to central storage while the application continues execution. 
     * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress) 
     * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching) 
 * Added Compression (compress) framework to support 
 * Add two new ErrMgr recovery policies 
   * {{{crmig}}} C/R Process Migration 
   * {{{autor}}} C/R Automatic Recovery 
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component 
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option) 
   * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342) 
   * {{{OMPI_CR_Restart}}} 
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules) 
   * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192) 
   * {{{OMPI_CR_Quiesce_start}}} 
   * {{{OMPI_CR_Quiesce_checkpoint}}} 
   * {{{OMPI_CR_Quiesce_end}}} 
   * {{{OMPI_CR_self_register_checkpoint_callback}}} 
   * {{{OMPI_CR_self_register_restart_callback}}} 
   * {{{OMPI_CR_self_register_continue_callback}}} 
 * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future. 
 * Add a progress meter to: 
   * FileM rsh (filem_rsh_process_meter) 
   * SnapC full (snapc_full_progress_meter) 
   * SStore stage (sstore_stage_progress_meter) 
 * Added 2 new command line options to ompi-restart 
   * --showme : Display the full command line that would have been exec'ed. 
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413) 
 * Deprecated some MCA params: 
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir 
   * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir 
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared 
   * snapc_base_store_in_place deprecated, replaced with different components of SStore 
   * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref 
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported 
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem 

Minor Changes: 
-------------- 
 * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing. 
 * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components 
 * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it. 
 * Fixes trac:2208 : Honor various TMPDIR varaibles instead of forcing {{{/tmp}}} 
 * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set. 
 * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality. 
 * Cleanup the CRS framework and components to work with the SStore framework. 
 * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably). 
 * Add 'quiesce' hook to CRCP for a future enhancement. 
 * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}. 
 * Add optional application level INC callbacks (registered through the CR MPI Ext interface). 
 * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive. 
 * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked. 
 * {{{opal-restart}}} also support local decompression before restarting 
 * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata 
 * {{{orte-restart}}} now uses the SStore framework to work with the metadata 
 * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality. 
 * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}. 
 * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped. 
 * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away. 
 * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options. 
 * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities. 
 * Improve the checks for 'already checkpointing' error path. 
 * A a recovery output timer, to show how long it takes to restart a job 
 * Do a better job of cleaning up the old session directory on restart. 
 * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment) 
 * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize. 

This commit was SVN r23587.

The following Trac tickets were found above:
  Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
  Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
  Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
  Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
  Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
  Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
  Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
2010-08-10 20:51:11 +00:00
Eugene Loh
5e1f40ba12 Clean up English and clarify a point regarding process binding
in the mpirun man page.

This commit was SVN r23558.
2010-08-05 23:10:44 +00:00
Eugene Loh
1c12192c93 Fix grammatical typo in orterun man page.
This commit was SVN r23555.
2010-08-04 20:37:28 +00:00
Ralph Castain
8ccd14d508 Correct the -h output to reflect that --bind-to-none is the default behavior
This commit was SVN r23537.
2010-08-01 22:31:34 +00:00
Jeff Squyres
64cb8f5d7f Another round of man page cleanups from Debian mantainer Manuel
Prinz.  Many thanks!

This commit was SVN r23445.
2010-07-20 14:07:18 +00:00
Ralph Castain
12cd07c9a9 Start reducing our dependency on the event library by removing at least one instance where we use it to redirect the program counter. Rolf reported occasional hangs of mpirun in very specific circumstances after all daemons were done. A review of MTT results indicates this may have been happening more generally in a small fraction of cases.
The problem was tracked to use of the grpcomm.onesided_barrier to control daemon/mpirun termination. This relied on messaging -and- required that the program counter jump from the errmgr back to grpcomm. On rare occasions, this jump did not occur, causing mpirun to hang.

This patch looks more invasive than it is - most of the affected files simply had one or two lines removed. The essence of the change is:

* pulled the job_complete and quit routines out of orterun and orted_main and put them in a common place

* modified the errmgr to directly call the new routines when termination is detected

* removed the grpcomm.onesided_barrier and its associated RML tag

* add a new "num_routes" API to the routed framework that reports back the number of dependent routes. When route_lost is called, the daemon's list of "children" is checked and adjusted if that route went to a "leaf" in the routing tree

* use connection termination between daemons to track rollup of the daemon tree. Daemons and HNP now terminate once num_routes returns zero

Also picked up in this commit is the addition of a new bool flag to the app_context struct, and increasing the job_control field from 8 to 16 bits. Both trivial.

This commit was SVN r23429.
2010-07-17 21:03:27 +00:00
Ralph Castain
23b166cbd4 Initialize pointers to NULL
This commit was SVN r23422.
2010-07-15 16:33:57 +00:00
Ralph Castain
2193ace464 Ensure the debugger symbols are loaded into orterun prior to orte_init so they are found by attaching debuggers.
This commit was SVN r23368.
2010-07-09 17:51:00 +00:00
Ralph Castain
31295e8dc2 As discussed on today's telecon, reorganize the debugger attachment code in orte to better support efforts within the tool community aimed at exploring alternative methods. Move the debugger attachment code from the orterun directory to a new debugger framework. Organize the existing standard support code into an "mpir" component. Organize the current extensions for co-spawning debugger daemons into a separate "mpirx" component.
Since the MPIR symbols are now included in the ORTE library, remove duplicate declarations in OMPI and replace them with extern references to their ORTE instantiations.

This commit was SVN r23360.
2010-07-06 23:35:42 +00:00
Jeff Squyres
8fef296b8a Updates about thread support levels.
This commit was SVN r23341.
2010-07-02 13:14:09 +00:00
Ralph Castain
628936a99f Provide a convenience option to disable fault recovery (as opposed to setting three separate, long-named mca params)
This commit was SVN r23327.
2010-07-01 19:31:11 +00:00
Ralph Castain
099c3aad97 Fix a major foopah that broke debugger attach. With the revisions in updating proc state, we dropped the recording of each proc's pid. Thus, attaching debuggers would find a proctable whose pids all equal 0.
This required modification of the errmgr.update_state API so the pid could be passed in to the function that could update the proper data record(s). All calls to that API have been updated as well, but I obviously couldn't test them all.

Thanks to Dong Ahn (LLNL) for catching this problem!

Also fixed debugger daemon cospawn, both for initial launch and attach-while-running modes. Tested and verified on rsh and slurm.

This commit was SVN r23300.
2010-06-24 05:13:53 +00:00
Ralph Castain
ae746a390f Debugger daemons spawned upon attachment to a running job need to be treated just like a regular job - they are not "piggybacking" onto an existing launch, and so the orte daemons need to report them just like a regular job launch in order to release from spawn.
Modify the debugger control flag to include "do not monitor" so orterun will not take debugger daemon termination into account when deciding that all jobs are done.

This commit was SVN r23282.
2010-06-19 15:22:36 +00:00
Jeff Squyres
f1a7b5cc33 Make "processor affinity not supported" error message a little better:
* Remove OPAL_ERR_PAFFINITY_NOT_SUPPORTED; fit it into the generic
   OPAL_ERR_NOT_SUPPORTED case.
 * When odls_default detects that processor affinity is not supported,
   it prints a specific message about it, and then it suppressed a
   generic HNP help message that would normally follow it (i.e., it's
   easier to have the "processor affinity is not supported" show_help
   message last).
 * Use some symbolic names in odls_default instead of fixed int's,
   just for slight readability improvements in the code.
 * Introduce orte_show_help_suppress(), which gives the ability to
   suppress any future showings of any arbitrary show_help() message.
   This is useful if you display message X and want to suppress
   message Y.  This suppression *only* works in environments where
   orte_show_help() does coalescing.

This commit was SVN r23249.
2010-06-08 20:16:07 +00:00
Jeff Squyres
e734939ddf Minor wrod wrapping to make the text fit within the show_help lines.
This commit was SVN r23219.
2010-05-28 20:25:54 +00:00
Jeff Squyres
76d1bb3c96 Per some off-list discussion with Ralph and George, MPIR_being_debugged
really should be volatile.

This commit was SVN r23166.
2010-05-18 11:42:41 +00:00
Abhishek Kulkarni
260eaa24dd Add a missing header. Bleh.
This commit was SVN r23165.
2010-05-17 23:55:53 +00:00
Abhishek Kulkarni
8335ff3893 Fixing a branch merge issue. Looks like we picked the wrong branch when resolving
a conflict.

This commit was SVN r23164.
2010-05-17 23:49:46 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
Ralph Castain
88f5217a12 Cleanup the debugger daemon co-launch code and add an ability to test it. Implement ability to co-launch debugger daemons upon attach to a running job for jobs launched under rsh, slurm, and tm environments (others can easily be added if desired).
Add new mca params to test:

orte_debugger_test_daemon: Name of the executable to be used to simulate a debugger colaunch
orte_debugger_test_attach: Test debugger colaunch after debugger attachment

To test co-launch at job start, just set the orte_debugger_test_daemon param.

To test co-launch upon attach:
set orte_debugger_test_daemon
set orte_debugger_test_attach=1
set orte_enable_debug_cospawn_while_running=1
set orte_debugger_check_rate=<N> - defines the number of seconds to wait before "checking" for a debugger attaching

Added a "debugger" program to orte/test/mpi that just spins to simulate a debugger daemon.

This commit was SVN r23144.
2010-05-14 18:44:49 +00:00
Ralph Castain
7ce34223f1 Per off-list discussion, implement the new OMPI exit status policy (soon to be on wiki) and further cleanup error reporting to cover new cases.
Implement process migration when failed nodes are detected. Some testing still required

This commit was SVN r23121.
2010-05-12 18:11:58 +00:00
Ralph Castain
4bd25f587c Begin handling the case of lost connections by having the OOB report it to the errmgr instead of the routed framework. Add an "app" component to t
he errmgr framework so that it can decide how to respond - which for now at least is just to check for lifeline and abort if so.

Add a new error constant to indicate that the error is "unrecoverable" so the oob can know it needs to abort.

This commit was SVN r23112.
2010-05-11 00:34:12 +00:00
Ralph Castain
d6a1d7a082 Little more cleanup on paffinity. Provide a specific error code for affinity not supported so we can better report the problem. Move the error reporting to orterun so we only get one error message. Update the darwin paffinity module to return the correct new error codes.
This commit was SVN r23107.
2010-05-07 14:04:55 +00:00
Ralph Castain
319758e3e0 Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params:
1. orte_enable_recovery - default recovery policy, can be overridden on a per-job basis

2. orte_max_local_restarts - default max number of local restarts, can be overridden

3. orte_max_global_restarts - default max number of relocates, can be overridden

Implement the restart_proc API for the ODLS framework, reorganize the default fns a little to avoid copying code.

This commit was SVN r23057.
2010-04-28 04:06:57 +00:00
Ralph Castain
b9893aacc5 Add a sensor framework to ORTE that monitors applications and notifies the errmgr when they exceed specified boundaries. Two modules are included here:
1. file activity - can monitor file size, access and modification times. If these fail to change over a specified number of sampling iterations (rate is an mca param), then the errmgr is notified.

2. memory usage - checks amount of memory used by a process. Limit and sampling rate can be set.

This support must be enabled by configuring --enable-sensors.

ompi_info and orte-info have been updated to include the new framework.

Also includes some initial steps toward restoring the recovery capability. Most notably, the ODLS API has been extended to include a "restart_proc" entry for restarting a local process, and organizes the various ERRMGR framework globals into a single struct as we do in the other ORTE frameworks. Fix an oversight in the ERRMGR framework where a pointer array was constructed, but not initialized.

Implementation continues.

This commit was SVN r23043.
2010-04-26 22:15:57 +00:00
Ralph Castain
efbb5c9b7c Revamp the errmgr framework to provide a greater range of optional behaviors, including different behaviors for daemons, and remove several looping messages across the code base:
* add hnp and orted modules to the errmgr framework. The HNP module contains much of the code that was in the errmgr base since that code could only be executed by the HNP anyway.

* update the odls to report process states directly into the active errmgr module, thus removing the need to send messages looped back into the odls cmd processor. Let the active errmgr module decide what to do at various states.

* remove the code to track application state progress from the plm_base_launch_support.c code. Update the plm modules to call the errmgr directly when a launch fails.

* update the plm_base_receive.c code to call the errmgr with state updates from remote daemons

* update the routed modules to reflect that process state is updated in the errmgr

* ensure that the orted's open the errmgr and select their appropriate module

* add new pretty-print utilities to print process and job state. Move the pretty-print of time info to a globally-accessible place

* define a global orte_comm function to send messages from orted's to the HNP so that others can overlay the standard RML methods, if desired.

* update the orterun help output to reflect that the "term w/o sync" error message can result from three, not two, scenarios

This commit was SVN r23023.
2010-04-23 04:44:41 +00:00
Shiqing Fan
d1e66bdd01 Use variables instead of hard-coded compiler flags, in order to support various C/C++ compilers on Windows.
This commit was SVN r23016.
2010-04-21 12:45:00 +00:00
Ralph Castain
4308922f59 Ensure that any application-specific selection of ess module doesn't get overridden by what is given to the orted or orterun
Cleanup tool name determination for CM

This commit was SVN r22980.
2010-04-15 18:10:50 +00:00
Ralph Castain
c32f046d7c Tiny cleanup - when the user kills us with a ctrl-c, there really isn't a need to tell him "your procs died and we don't know why". Just shaddup and die.
This commit was SVN r22949.
2010-04-09 18:47:35 +00:00
Ralph Castain
de6679dbd3 Truly respect the -quiet option. Make it an mca param so someone doesn't have to put it solely on the cmd line. Tell show_help to shaddup as well.
This commit was SVN r22926.
2010-04-02 14:19:38 +00:00
Ralph Castain
25a9a195b0 When the user requests "quiet", mpirun isn't supposed to output "helpful error messages" - so shaddup!
This commit was SVN r22925.
2010-04-02 07:13:11 +00:00
Jeff Squyres
c57a8fba5a Improve the help message when mpirun cannot find an executable.
Refs trac:2035

This commit was SVN r22922.

The following Trac tickets were found above:
  Ticket 2035 --> https://svn.open-mpi.org/trac/ompi/ticket/2035
2010-04-01 13:26:29 +00:00
Josh Hursey
e4f2d03d28 ErrMgr Framework redesign to better support fault tolerance development activities.
Explained in more detail in the following RFC:
  http://www.open-mpi.org/community/lists/devel/2010/03/7589.php

This commit was SVN r22872.
2010-03-23 21:28:02 +00:00
Josh Hursey
9e967a3a9b Revert this change, since even in the CR case we want to reset this var to NULL.
Thanks to Jeff for the catch.

This commit was SVN r22868.
2010-03-23 19:55:21 +00:00
Shiqing Fan
9591680ec0 One of the binaries was generated from a wrong source.
This commit was SVN r22865.
2010-03-23 09:56:11 +00:00
Josh Hursey
e9b5162d79 Fix the configure logic for --with-ft so that it properly takes a comma separated list.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.

The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.

This commit was SVN r22824.
2010-03-12 23:57:50 +00:00
Ralph Castain
f2c65dc70f Ensure that the errmgr does not take action if the process was terminated by a "kill_procs" command as this can lead to circular logic.
Cleanup the kill_procs command by removing a no-longer-used param. We update the process state when the proc actually exits.

This commit was SVN r22783.
2010-03-05 13:22:12 +00:00
Ralph Castain
2541aa98ab Change the app_idx type to uint32_t to support users who use large numbers of app_contexts. Set it up as a new typedef so we can change it later without as much effort.
This commit was SVN r22727.
2010-02-27 17:37:34 +00:00
Ralph Castain
6c0d7940c7 Add a new MCA param (and corresponding mpirun cmd line option) to output the debugger proctable info after launch. The output is just the job map with the process pid included, so you get a node-by-node list of the process ranks on that node and thier pids.
Works for initial launch and comm_spawn. xml and non-xml output is available

This commit was SVN r22725.
2010-02-27 08:32:25 +00:00
Eugene Loh
924bc10ceb Add description of the sequential mapper (-mca rmaps seq) to the
mpirun man page.

This commit was SVN r22567.
2010-02-06 06:22:29 +00:00
Ralph Castain
00b493e10d Make the sigpipe message be a verbose output
This commit was SVN r22464.
2010-01-22 03:46:11 +00:00
Ralph Castain
2517799102 Report, but ignore, SIGPIPE events. The odls already resets this signal handler when spawning local procs.
This commit was SVN r22463.
2010-01-21 05:01:06 +00:00
Shiqing Fan
ad763c327d Restore several linked libraries that were deleted by mistake in r22405.
This commit was SVN r22415.

The following SVN revision numbers were found above:
  r22405 --> open-mpi/ompi@872a4047ba
2010-01-14 21:50:42 +00:00
Shiqing Fan
872a4047ba Fix the bug that caused by ADD_DEPENDENCIES() from different version of CMake.
In CMake 2.6 and earlier, this function add dependencies for targets and also link the target libraries automatically, but in CMake 2.8,this behavior has been changed, i.e. it will only add the dependencies but no link, which will cause linking errors at compilation time.

This commit was SVN r22405.
2010-01-14 18:10:20 +00:00
Ralph Castain
d6003e3369 Update the mpirun man page to include info on the "!" option to the xterm cmd line option that holds the xterm window open after process termination so the user can see the process' output.
This commit was SVN r22341.
2009-12-26 16:01:03 +00:00
Ralph Castain
8ab962411c Detect the scenario where one or more procs fail to call orte/ompi_init while others in the job do. This scenario can cause the job to hang as MPI_Init contains a barrier operation that will not complete. Although ORTE does not contain such a barrier, it still will be considered as an error scenario so that we can detect the MPI case - otherwise, ORTE has no knowledge of OMPI and wouldn't know how to differentiate the use-cases.
Take advantage of the changes to update the routed_base_receive code to avoid message overlap.

This commit was SVN r22329.
2009-12-17 19:39:53 +00:00
Jeff Squyres
9fffd30660 Fix a typo in the man page. Thanks to Jeremiah Willcock for pointing
it out.

This commit was SVN r22268.
2009-12-04 21:10:50 +00:00
Jeff Squyres
16b100219d A patch from UTK to allow orte_init(), opal_init(), and associated
friends also receive &argc and &argv (George asked Jeff to Ralph to
review before committing).  The thought is that passing argv and argc
to opal/orte_init be useful to other projects outside of OMPI that are
using OPAL and/or ORTE (especially in conjunction with some other
bootstrapping code where it is helpful to modify argv).  It's such a
small thing that it's easy to apply here to make others' lives a
little easier.

Ask George for more details; I'm just the messenger.  :-)

Judging by the copyrights on this patch, it's been around for a
while.  :-)

This commit was SVN r22260.
2009-12-04 00:51:15 +00:00
Jeff Squyres
02db4f5146 Terry pointed out that ORTE also needs the "totalview" flags, and
therefore the m4 test really belongs on orte/config.  Thank Terry!

Additionally, I took the opprotunity to rename the variable so that
"TOTALVIEW" is not in the name anymore (because it applies to all
variables, not just Totalview).

This commit was SVN r22134.
2009-10-23 13:00:59 +00:00
Terry Dontje
13907781b2 missed adding report-uri option
This commit was SVN r22106.
2009-10-15 18:05:24 +00:00
Terry Dontje
c96af5654c correct options and wording that were dropped in the last change due to committing v1.3 manpage to the trunk
This commit was SVN r22104.
2009-10-15 15:03:21 +00:00
Terry Dontje
0a8645a411 This commit fixes trac:2017
This commit was SVN r22098.

The following Trac tickets were found above:
  Ticket 2017 --> https://svn.open-mpi.org/trac/ompi/ticket/2017
2009-10-14 11:40:47 +00:00
Ralph Castain
67625d5df6 Esure that mca params have a chance to be used in the standard hierarchy - in this case, allowing a choice of mapping policy to be made at the rmaps mca level.
This commit was SVN r22089.
2009-10-11 03:44:39 +00:00
Ralph Castain
19a13039aa Strange - remove duplicated comment
This commit was SVN r22035.
2009-09-30 14:38:13 +00:00
George Bosilca
a6a6df0c48 MPIR_Breakpoint should be externally visible in order to allow the
debugger to find the symbol in our libs.

This commit was SVN r22032.
2009-09-30 04:07:55 +00:00
Ralph Castain
c749fefbd0 Instead of an odls-base mca param, make report_bindings a global param so that we can (a) detect it was set in the plm, and then (b) ensure it gets passed along to remote orteds so they will comply with the request.
This commit was SVN r22021.
2009-09-28 03:17:15 +00:00
Ralph Castain
6fa2e81491 Correct handling of pointer array indexing
This commit was SVN r22017.
2009-09-26 03:33:26 +00:00
Ralph Castain
dff0d01673 Yet another paffinity cleanup...sigh.
1. ensure that orte_rmaps_base_schedule_policy does not override cmd line settings

2. when you try to bind to more cores than we have, generate a not-enough-processors error message

3. allow npersocket -bind-to-core combination - because, yes, somebody actually wants to do it.

This commit was SVN r21996.
2009-09-22 18:44:53 +00:00
Terry Dontje
0ccf2d87b6 rename do-not-bind to bind-to-none and clean up an error message
This commit was SVN r21980.
2009-09-21 17:00:02 +00:00
Ralph Castain
2028017554 Modify the paffinity system to handle binding directives that are "soft" - i.e., when someone directs that we bind if the system supports it. This allows community members to distribute OMPI with default MCA param files that direct general binding policies, without having the distributed software fail if the system cannot support those policies.
The new options work by adding an ":if-avail" qualifier to the "bind-to-socket" and "bind-to-core" MCA params. If the system does not support this capability, the job will launch anyway. Without the qualifier, the job will abort with an error message indicating that the required functionality is not supported on this system.

This commit was SVN r21975.
2009-09-18 19:48:42 +00:00
Ralph Castain
8ae4b55d16 Enable a new command line option to --report-events that instructs mpirun to RML-report specific events during job life to the requestor.
This commit was SVN r21954.
2009-09-09 05:28:45 +00:00
Ralph Castain
ca09e8f604 Minor modification required to allow opal_paffinity_alone to default to bind-to-core
This commit was SVN r21948.
2009-09-05 15:24:26 +00:00
Ralph Castain
0421a49844 Update the xml support to allow -xml-file foo whereby we redirect all xml formatted output (and ONLY xml formatted output) to a specified file
This commit was SVN r21930.
2009-09-02 18:03:10 +00:00
Ralph Castain
0394a4884d Setup cpus-per-proc and cpus-per-rank as synonyms, both in mca params and on mpirun cmd line
This commit was SVN r21914.
2009-08-30 14:30:36 +00:00
Ralph Castain
3ef028ca23 Trap mpirun error messages in xml format
This commit was SVN r21910.
2009-08-28 02:46:15 +00:00
Shiqing Fan
4119497c5a Use select() for Windows events by default.
For some historical reasons, we didn't use select() for the Windows events. Now it could be merged back to have a better performance on Windows.

This commit was SVN r21899.
2009-08-27 08:11:56 +00:00
Ralph Castain
5e710928a5 Revise the new binding system slightly:
1. finalize the logic for properly respecting externally assigned bindings. Thanks to Chris Samuel for his help with this. Still needs some acid testing, but appears to now work.

2. remove the double-logic of requiring opal_paffinity_alone AND bind-to-foo. If the user specifies bind-to-foo, trust her and just do it.

This commit was SVN r21885.
2009-08-26 02:01:49 +00:00
Ralph Castain
509cc0553c When directly launched by an RM, flag that a process is operating without daemons - i.e., standalone. Provide an error string for the new socket_not_available error. Use errmgr.abort to exit when we cannot get a socket, and ensure that the slurmd module returns the proper exit status for slurm 2.0
This commit was SVN r21868.
2009-08-22 02:58:20 +00:00
Ralph Castain
d3313f732a Flush the mpirun xml tag output to maintain ordering
This commit was SVN r21836.
2009-08-18 21:20:55 +00:00
Ralph Castain
3796cc568a Add a --report-bindings cmd line option to mpirun
This commit was SVN r21829.
2009-08-18 17:10:23 +00:00
Ralph Castain
a489f3fc6c Surround the entire xml output from an mpirun with <mpirun> tags.
This commit was SVN r21826.
2009-08-18 03:15:29 +00:00
Ralph Castain
0005e6e834 Correct a couple of bugs in the rank_file mapper that were incorrectly assigning vpids.
Add a capability to parse the rankfile to extract node information in place of requiring both hostfile and rankfile for non-RM managed environments. The rankfile is -only- parsed for this IF the hostfile and -host options are not given. Otherwise, those are used to establish allocation info as we did before this commit.

This commit was SVN r21815.
2009-08-13 16:08:43 +00:00
Ralph Castain
1dc12046f1 Modify the OMPI paffinity and mapping system to support socket-level mapping and binding. Mostly refactors existing code, with modifications to the odls_default module to support the new capabilities.
Adds several new mpirun options:

* -bysocket - assign ranks on a node by socket. Effectively load balances the procs assigned to a node across the available sockets. Note that ranks can still be bound to a specific core within the socket, or to the entire socket - the mapping is independent of the binding.

* -bind-to-socket - bind each rank to all the cores on the socket to which they are assigned.

* -bind-to-core - currently the default behavior (maintained from prior default)

* -npersocket N - launch N procs for every socket on a node. Note that this implies we know how many sockets are on a node. Mpirun will determine its local values. These can be overridden by provided values, either via MCA param or in a hostfile

Similar features/options are provided at the board level for multi-board nodes.

Documentation to follow...

This commit was SVN r21791.
2009-08-11 02:51:27 +00:00
George Bosilca
a934b9d975 Add the Open MPI specific part based on a patch from Manuel. Add the
sparc and alpha. A manpage patch is also included. This partially fixes
ticket #1973.

This commit was SVN r21654.
2009-07-13 20:01:12 +00:00
Ralph Castain
b97f885c00 Restore the original API to terminate individual processes instead of the entire job. This was originally removed as we didn't at that time know how to take advantage of it. Some of us are now working on proactive resilience methods that move procs prior to node failure, so this is now a required API. Modify the odls, plm, and orted functions to support this new functionality.
Continue work on the resilient mapper, completing support for fault groups.

This commit was SVN r21639.
2009-07-13 02:29:17 +00:00
Ralph Castain
00fb79567f Shift some setup items from orterun to the ess/hnp module so that any HNP will perform them.
This commit was SVN r21520.
2009-06-25 03:00:53 +00:00
Ralph Castain
0ba845fed2 Continue development of regular expression support by implementing it for slurm launches. Works for both initial (cmd line and non-cmd line) and comm_spawn launch.
Additional work required to fully enable static port support when using cmd line regular expression launch system.

This commit was SVN r21502.
2009-06-23 20:25:38 +00:00
Ralph Castain
0336460b0a Continue implementation of resilient operations by supporting reuse of jobids for restarted procs. Ensure that restarted processes have valid node and local ranks, and that node rank values are passed to direct-launched processes.
This commit was SVN r21385.
2009-06-06 01:08:47 +00:00
Ralph Castain
882b40182b Add --show-progress option to mpirun
This commit was SVN r21360.
2009-06-02 23:52:59 +00:00
Iain Bason
e7ff2368d6 This fixes trac:1930.
Emit a more informative error message when the file descriptor limit is
reached during an accept() call.  Also, abort when the accept fails to
avoid an infinite loop.

Emit a more informative error message when the help file can't be opened.

This commit was SVN r21271.

The following Trac tickets were found above:
  Ticket 1930 --> https://svn.open-mpi.org/trac/ompi/ticket/1930
2009-05-26 20:03:21 +00:00
Rainer Keller
9ad9b20847 - Fix Coverity CID #15.
This commit was SVN r21229.
2009-05-14 00:25:02 +00:00
Ralph Castain
c45ff0d59f Take the next step towards fully utilizing static ports for the daemons to eliminate the initial "phone home" to mpirun by modifying the orted termination procedure to eliminate the need for a full barrier-like operation. Instead, we add a "onesided" barrier to the grpcomm framework API that releases the orted once it has completed its own contribution to the barrier - i.e., the orteds now exit as the "ack" message rolls up towards mpirun instead of sending the "ack" directly to mpirun.
This causes the orteds in the routing tree to remain alive until all termination "acks" from orteds below them have passed through. Thus, if we use static ports, we no longer require a direct orted-to-mpirun connection.

Also modify the binomial routed module so it conforms to what all the other routed modules do and have all messages pass along the routing tree instead of short-circuiting between orteds. This further reduces the number of ports being opened on backend nodes.

This commit was SVN r21203.
2009-05-11 14:11:44 +00:00
Shiqing Fan
537b8cd8b1 Get rid of improper use of SET_SOURCE_FILES_PROPERTIES. When using the latest CMake (2.6 patch 4), we get many errors, which didn't show in previous version.
This commit was SVN r21188.
2009-05-07 17:41:05 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Ralph Castain
4be24521aa Modify the orte_process_info structure to handle a broader range of process types by replacing the individual booleans with a 32-bit bitmap. Use a set of #define's to define the individual bits, and a set of matching macros to test for them. Update the orte code base to use the macros instead of the booleans.
Minor mod to the ompi layer to use the new #define's - just one-line name replacements.

This commit was SVN r21144.
2009-05-04 11:07:40 +00:00
Ralph Castain
a74b74a68c Now that we understand why it failed before (thanks George!), go back to relaying cmds before processing them on the orteds so we pickup some speed by executing cmds in parallel. Modify the termination system accordingly.
Cleanup the termination in orterun when abnormally commanded via ctrl-c. We can just directly terminate_orteds as the orteds always kill any lingering local procs before exiting - no need to do the two-step cha-cha.

This commit was SVN r21123.
2009-04-30 15:08:02 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Rainer Keller
6c1cce8761 - For the upcoming header cleanup commit,
several header files (previously included by header-files)
   now have to be moved "upward".
   This is mainly system headers such as string.h, stdio.h and for
   networking, but also some orte headers.

This commit was SVN r21095.
2009-04-29 00:49:23 +00:00
Shiqing Fan
3d4e0472d6 Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Ralph Castain
9fd834268c Ensure we exit with a non-zero status if terminated by user signal
This commit was SVN r20991.
2009-04-14 15:58:54 +00:00
Terry Dontje
4b43911c6a Remove superfluous spaces in manpages that were causing catman to
generate mangled windex files.  Made ompi-top.1 and ompi-iof.1 build
by default.  Also, added the orte-top synonym to the ompi-top manpage.

This commit was SVN r20915.
2009-04-01 14:40:27 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strup and friends include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Terry Dontje
d97e398b49 Clarify the error message for out of pipes condition.
This commit was SVN r20762.
2009-03-11 17:48:46 +00:00
Rolf vandeVaart
4d7071dadb Add some explanation of the -x flag as we have
gotten questions on its use.

This commit was SVN r20760.
2009-03-11 13:25:03 +00:00
Rainer Keller
ec0ed48718 - Revert r20739
This commit was SVN r20742.

The following SVN revision numbers were found above:
  r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
a94438343b - Revert r20740
This commit was SVN r20741.

The following SVN revision numbers were found above:
  r20740 --> open-mpi/ompi@2a70618a77
2009-03-05 21:50:47 +00:00
Rainer Keller
2a70618a77 - Second patch, as discussed in Louisville.
Replace short macros in orte/util/name_fns.h
   to the actual fct. call.

 - Compiles on linux/x86-64

This commit was SVN r20740.
2009-03-05 21:14:18 +00:00
Rainer Keller
781caee0b6 - First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Ralph Castain
f11931306a Modify the accounting system to recycle jobids. Properly recover resources from nodes and jobs upon completion. Adjustments in several places were required to deal with sparsely populated job, node, and proc arrays as a result of this change.
Correct an error wrt how jobids were being computed. Needed to ensure that the job family field was not overrun as we increment jobids for comm_spawn.

Update the slurm plm module so it uses the new slurm termination procedure (brings trunk back into alignment with 1.3 branch).

Update the slurmd ess component so it doesn't get selected if we are running a singleton inside of a slurm allocation.

Cleanup HNP init by moving some code that had been in orte_globals.c for historical reasons into the ess hnp module, and removing the call to that code from the ess_base_std_prolog


NOTE: this change allows orte to support an infinite aggregate number of comm_spawn's, with up to 64k being alive at any one instant. HOWEVER, the MPI layer currently does -not- support re-use of jobids. I did some prototype coding to revise the ompi_proc_t structures, but the BTLs are caching their own data, and there was no readily apparent way to update it. Thus, attempts to spawn more than the 64k limit will abort to avoid causing the MPI layer to hang.

This commit was SVN r20700.
2009-03-03 16:39:13 +00:00
Ralph Castain
7e5dc8f2be Ensure that we turn off stdin read event when ctrl-c terminating a program
This commit was SVN r20654.
2009-02-27 15:01:28 +00:00
Shiqing Fan
2326f14be5 Remove the unnecessary PROJECT command, I somehow misunderstood how it should be used on Windows....
This commit was SVN r20634.
2009-02-25 16:07:43 +00:00
Ralph Castain
85a9a2e6d8 Ensure that signals are de-trapped before exiting to stop the $#@@#$ event library from "asserting"
This commit was SVN r20630.
2009-02-25 03:10:21 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or               
   orte/mca/rml/rml_types.h                                           
   Please see orte_show_help_replacement.sh commited next.            

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"     
   in orte/mca/odls/base/odls_base_default_fns.c and                  
   in orte/tools/orte-top/orte-top.c                                  
   Manually added these.                                              

   Let's have MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Ralph Castain
b19dc2a4fa Update mpirun's man page for report-pid and report-uri options
This commit was SVN r20396.
2009-02-02 13:49:07 +00:00
Ralph Castain
2966206f58 Fix a race condition in the IOF and add some new user-requested features:
1. fix a race condition whereby a proc's output could trigger an event prior to the other outputs being setup, thus c ausing the IOF to declare the proc "terminated" too early. This was really rare, but could happen.

2. add a new "timestamp-output" option that timestamp's each line of output

3. add a new "output-filename" option that redirects each proc's output to a separate rank-named file.

4. add a new "xterm" option that redirects the output of the specified ranks to a separate xterm window.

This commit was SVN r20392.
2009-01-30 22:47:30 +00:00
Rolf vandeVaart
0704b98668 Add the ability to forward SIGTSTP (converted to SIGSTOP) and
SIGCONT to the a.outs.  By default, they are not forwarded and
the behavior remains as it has always been.  However, if one
runs with --mca orte_forward_job_control 1, then mpirun will
catch those two signals and forward them to the orteds which
will deliver them to the a.outs.  We have had requests for
this feature.

This commit was SVN r20391.
2009-01-30 18:50:10 +00:00
Jeff Squyres
bb3d258562 Round up a few places where PATH_MAX was used instead of
OMPI_PATH_MAX.  Thanks to Andrea Iob for the bug report.

This commit was SVN r20360.
2009-01-27 22:57:50 +00:00
Ralph Castain
0435108834 Improve the efficiency of the launch system by changing the outer loop to being over app_context, and adding a flag to the app_context so the daemon can record that "this app is on my node" when decoding the launch msg.
If the --wdir option is given, check to see if the user provided a relative path. If so, convert it to an absolute path. This is needed to maintain consistent behavior across environements. Some environments automatically chdir to your current working directory when launching the remote orted, while others (e.g., ssh) don't. This levels the playing field and reduces user surprise.

This commit was SVN r20342.
2009-01-25 12:39:24 +00:00
Jeff Squyres
90e69ac6ff Fix some man page nits noticed by the Debain OMPI maintainers. Thanks
Dirk!

This commit was SVN r20307.
2009-01-21 18:38:37 +00:00
Jeff Squyres
d1c6f3f89a * Fix a truckload of Cisco copyrights to be the same as the rest of
the code base.
 * Fix a few misspellings in other copyrights.

This commit was SVN r20241.
2009-01-11 02:30:00 +00:00
Ralph Castain
bb96474d6e Per request from Aurelien, make orterun report-pid and report-uri functions work the same as that of ompi-server. Since these are used for ompi-server-like functionality, it makes sense that the report options work the same. Make orte-top take the corresponding input the same way too for consistency.
The modified cmd line options are:

--report-uri x where x is either '-' for stdout, '+' for stderr, or a filename
--report-pid x where x is the same as above

For orte-top, you can now provide either a pid or a uri (which allows connection to remote mpiruns), specified either directly or with a "file:x" option as per mpirun's ompi-server option.

Note: I did not add a report-pid option to ompi-server as it probably wouldn't be useful - the report-uri option works as well, and allows remote access (which is likely the normal way it would be used).

This commit was SVN r20168.
2008-12-24 15:27:46 +00:00
Shiqing Fan
20cea164db - 3/4 commit for Windows Visual Studio and CCP support:
corrections to non-windows files (but within ifdef __WINDOWS__)
  type casts, event library for windows use win32. 
  in orte runtime, add windows sockets handling and object construction.

This commit was SVN r20110.
2008-12-10 21:13:10 +00:00
Shiqing Fan
a5281f0434 - 1/4 commit for Windows Visual Studio and CCP support:
CMakeLists and .windows files.
  In contribs preconfigured and precompiled parts.

This commit was SVN r20108.
2008-12-10 20:59:20 +00:00
Ralph Castain
728a24c8ec After considerable patience and help with debugging/testing from Tim M and Jeff S, return a completed and pretty well tested patch of the IOF to the trunk. This commit includes the previously reverted r20074, r20068, and r20064, as well as changes to fix those commits.
Basically, the remaining problem turned out to be:

1. closing stdout/stderr during orte_finalize of mpirun

2. inadvertently setting up a write event on fd = -1

3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once

This passed prelim MTT testing by Jeff and Tim, but should soak for awhile before migrating to 1.3.

This commit was SVN r20106.

The following SVN revision numbers were found above:
  r20064 --> open-mpi/ompi@a07660aea8
  r20068 --> open-mpi/ompi@ec930d14a9
  r20074 --> open-mpi/ompi@2940309613
2008-12-10 20:40:47 +00:00
Ralph Castain
7e3ddb09d3 As requested by Aurelien at the July design meeting - long time coming, but finally got around to it.
Enable one mpirun to act as the server for another mpirun when doing MPI_Publish_name and its associated operations. The user is responsible, of course, for ensuring that the mpirun acting as a server outlives any mpiruns using it in that capacity.

Add a cmd line option to mpirun --report-pid that prints out mpirun's pid. Allow the --ompi-server option to now take pid:# (or PID:#) of the mpirun to be used as the server, and then look that pid up by searching the local mpirun contact infos for it.

This commit was SVN r20102.
2008-12-10 17:10:39 +00:00
Ralph Castain
1ace83c470 Enable modex-less launch. Consists of:
1. minor modification to include two new opal MCA params:
   (a) opal_profile: outputs what components were selected by each framework
       currently enabled for most, but not all, frameworks
   (b) opal_profile_file: name of file that contains profile info required
       for modex

2. introduction of two new tools:
   (a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with
       opal_profile set. Also reports back the rml IP address for all
       interfaces on the node
   (b) ompi-profiler: uses ompi-probe to create the profile_file, also
       reports out a summary of what framework components are actually
       being used to help with configuration options

3. modification of the grpcomm basic component to utilize the
   profile file in place of the modex where possible

4. modification of orterun so it properly sees opal mca params and
   handles opal_profile correctly to ensure we don't get its profile

5. similar mod to orted as for orterun

6. addition of new test that calls orte_init followed by calls to
   grpcomm.barrier

This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. Minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTL's for modex-less operation should let me know and I'll give you the magic details.

This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint.

This commit was SVN r20098.
2008-12-09 23:49:02 +00:00
Ralph Castain
e28210d0dc Revert r20074, r20068, and r20064: remove the IOF proc completion code pending further off-trunk work.
This commit was SVN r20089.

The following SVN revision numbers were found above:
  r20064 --> open-mpi/ompi@a07660aea8
  r20068 --> open-mpi/ompi@ec930d14a9
  r20074 --> open-mpi/ompi@2940309613
2008-12-09 17:11:59 +00:00
Ralph Castain
a07660aea8 Bring over the IOF completion changes. This commit fixes the long-occurring problem whereby application procs could, under some circumstances, lose their final prints to stdout/err. The commit includes:
1. coordination of job completion notification to include a requirement for both waitpid detection AND notification that all iof pipes have been closed by the app

2. change of all IOF read and write events to be non-persistent so they can properly be shutdown and restarted only when required

3. addition of a delay (currently set to 10ms) before restarting the stdin read event. This was required to ensure that the stdout, stderr, and stddiag read events had an opportunity to be serviced in scenarios where large files are attached to stdin.

This commit was SVN r20064.
2008-12-03 17:45:42 +00:00
Ralph Castain
b700fff1d1 Update the orterun man page - it was missing numerous options and had several out-of-date ones in it
This commit was SVN r19888.
2008-11-03 16:44:27 +00:00
Jeff Squyres
b11d13cc05 Silence a trivial compiler warnings (pgcc).
This commit was SVN r19810.
2008-10-27 14:23:02 +00:00
Ralph Castain
c56cdac379 Finish cleanup of stdin. Set non-stdio file descriptors to non-blocking (thanks to Jeff for catching that one). Handle writes that result in "would have blocked" errno.
This commit was SVN r19793.
2008-10-24 01:42:58 +00:00
Ralph Castain
6e5d844c36 Roll in the revamped IOF subsystem. Per the devel mailing list email, this is a complete rewrite of the iof framework designed to simplify the code for maintainability, and to support features we had planned to do, but were too difficult to implement in the old code. Specifically, the new code:
1. completely and cleanly separates responsibilities between the HNP, orted, and tool components.

2. removes all wireup messaging during launch and shutdown.

3. maintains flow control for stdin to avoid large-scale consumption of memory by orteds when large input files are forwarded. This is done using an xon/xoff protocol.

4. enables specification of stdin recipients on the mpirun cmd line. Allowed options include rank, "all", or "none". Default is rank 0.

5. creates a new MPI_Info key "ompi_stdin_target" that supports the above options for child jobs. Default is "none".

6. adds a new tool "orte-iof" that can connect to a running mpirun and display the output. Cmd line options allow selection of any combination of stdout, stderr, and stddiag. Default is stdout.

7. adds a new mpirun and orte-iof cmd line option "tag-output" that will tag each line of output with process name and stream ident. For example, "[1,0]<stdout>this is output"

This is not intended for the 1.3 release as it is a major change requiring considerable soak time.

This commit was SVN r19767.
2008-10-18 00:00:49 +00:00
Ralph Castain
b46d3e766e Cleanup the plm failed-to-start problem a little - ensure that the event is always defined so we don't have to check when trying to trigger it, thus avoiding potential race conditions.
This commit was SVN r19755.
2008-10-16 14:58:32 +00:00
Ralph Castain
48c3de1865 Fix a problem in the plm "failed to start" code observed by Jeff. When we are unable to launch to a specific node because it doesn't exist or is down, the system would hang and/or segv. The reason for the hang was that we were "firing" the orted exit trigger prior to its timer event being defined - thus "locking" that one-shot and preventing it from firing when we actually were ready to use it.
The segv was caused by the fact that we don't really know which daemon failed to start (at least, in most cases), so we didn't set a pointer to the aborted proc object. All we really wanted, though, was to ensure that mpirun returned a non-zero exit status, so the fix was to simply return the default error status.

This commit was SVN r19754.
2008-10-16 14:21:37 +00:00
Ralph Castain
802d14b130 Default job controls to forward IO. Adjust debugger code to not forward IO unless requested.
This commit was SVN r19690.
2008-10-06 17:56:23 +00:00
Ralph Castain
8d1ecdb361 Correct the creation of MPIR_Proctable so that the structs in the array correspond to the order of the ranks.
This commit was SVN r19624.
2008-09-24 14:55:46 +00:00
Ralph Castain
e64b79f30f Modify the --display-map and --display-alloc per note on devel list to reduce info for user understanding.
Add --display-devel-map and --display-devel-alloc to display all the detailed info we used to provide - it is only of use/interest to developers anyway and confuses users.

This commit was SVN r19608.
2008-09-23 15:46:34 +00:00
Ralph Castain
4e0f34a062 When we hit an error prior to actually launching daemons, it would be nice if orterun didn't bark about daemons failing to launch, mpirun detecting a job failed, etc.
Add a new job state to indicate that we never attempted to launch. Flag such a scenario and avoid hitting all the other error messages.

This commit was SVN r19366.
2008-08-19 15:19:30 +00:00
Ralph Castain
49745c5f40 Provide a new option that allows a user to leave an ssh session open without getting deluged by ORTE debug output. The new option is --leave-session-attached, with a corresponding MCA param of orte_leave_session_attached.
Theoretically, any PLM could use this - but in reality, all of them except rsh/ssh already leave the session attached anyway.

This fixes trac:656 - a REALLY old ticket

This commit was SVN r19294.

The following Trac tickets were found above:
  Ticket 656 --> https://svn.open-mpi.org/trac/ompi/ticket/656
2008-08-14 18:59:01 +00:00
Ralph Castain
30f37f762d Enable co-location of debugger daemons during initial launch and when debugging a running job.
Provide support for four MPIR extensions that allow specification of debugger daemon executable, argv for the debugger daemon, whether or not to forward debugger daemon IO, and whether or not debugger daemon will piggy-back on ORTE OOB network. Last is not yet implemented.

No change in behavior or operation occurs unless (a) the debugger specifically utilizes the extensions and, for co-locate while running, the user specifically enables the capability via an MCA param. Two of the MPIR extensions supported here are used in a widely-used debugger for a large-scale installation. The other two extensions are new and being utilized in prototype work by several debuggers for possible future release.

This commit was SVN r19275.
2008-08-13 17:47:24 +00:00
Terry Dontje
f0eec291d0 Added a couple examples and spelling corrections.
This commit was SVN r19234.
2008-08-11 12:34:02 +00:00
Jeff Squyres
797ec531aa Some more work on the man pages:
* Make the creation of the build dir for the man pages a bit more
   robust (thanks to suggestions from Ralf W.).
 * Only distribute the .Xin files, not the .X man pages themselves.
 * Make the .X files depend on opal_config.h so that if you re-run
   configure and change opal_config.h (e.g., a new version), the man
   pages should get rebuilt.
 * Man pages are now cleaned with "distclean", not "maintainer-clean".
 * Fix a typo in opal_crs.7in.
 * Udpate make_dist_tarball to update "date" in the VERSION file.
 * Make make_dist_tarball a bit friendlier to hg checkouts.

This commit was SVN r19219.
2008-08-07 19:20:40 +00:00
Ralph Castain
d7da6b3226 Just a minor cleanup of race conditions on trigger events for exit. Close the trigger pipe upon use since it is only a one-shut anyway. This removes the need to destruct the object, leaving the lock available to protect one-time termination routines throughout the life of the program.
This commit was SVN r19208.
2008-08-06 21:53:35 +00:00
Ralph Castain
63c33a9c32 Some minor updates to the locking system changes. Remove obsolete locks. Ensure the trigger event objects do not get deconstructed until the very end to avoid possible problems due to race conditions. Route all orted abnormal term tests through the trigger.
This commit was SVN r19172.
2008-08-06 11:31:06 +00:00
Ralph Castain
be02211b4f Modify the wakeup system to make it more Windows-friendly. This allows Shiqing to consolidate the Windows-specific modifications into one location, and generalizes the wakeup procedure in case we hit other system-specific requirements.
This needs some soak time to ensure we haven't opened any race conditions. I tried to loop everything in the shutdown procedure through that trigger event call to ensure it all goes through the one-time locks as it did before so that someone hitting ctrl-c when we are already shutting down shouldn't cause problems. Just want to let people use it for awhile to verify.

This commit was SVN r19159.
2008-08-05 15:09:29 +00:00
Ralph Castain
7342a6f1da Per the July technical meeting:
During the discussion of MPI-2 functionality, it was pointed out by Aurelien that there was an inherent race condition between startup of ompi-server and mpirun. Specifically, if someone started ompi-server to run in the background as part of a script, and then immediately executed mpirun, it was possible that an MPI proc could attempt to contact the server (or that mpirun could try to read the server's contact file before the server is running and ready.

At that time, we discussed createing a new tool "ompi-wait-server" that would wait for the server to be running, and/or probe to see if it is running and return true/false. However, rather than create yet another tool, it seemed just as effective to add the functionality to mpirun.

Thus, this commit creates two new mpirun cmd line flags (hey, you can never have too many!):

--wait-for-server : instructs mpirun to ping the server to see if it responds. This causes mpirun to execute an rml.ping to the server's URI with an appropriate timeout interval - if the ping isn't successful, mpirun attempts it again.

--server-wait-time xx : sets the ping timeout interval to xx seconds. Note that mpirun will attempt to ping the server twice with this timeout, so we actually wait for twice this time. Default is 10 seconds, which should be plenty of time.

This has only lightly been tested. It works if the server is present, and outputs a nice error message if it cannot be contacted. I have not tested the race condition case.

This commit was SVN r19152.
2008-08-04 20:29:50 +00:00
Ralph Castain
381d10833a Remove comments in documentation about the "-nw" option to mpirun as this option doesn't exist, and probably never will
This commit was SVN r19137.
2008-08-04 14:38:37 +00:00
Dan Lacher
9175da1e02 Putback for all changes to automate man page updates to strings of
versions, dates and build names.

Fixes trac:1387

Big thanks to Jeff and Brian for help and oversight.

This commit was SVN r19120.

The following Trac tickets were found above:
  Ticket 1387 --> https://svn.open-mpi.org/trac/ompi/ticket/1387
2008-08-01 21:14:37 +00:00
Jeff Squyres
4bdc093746 Fixes trac:1361: mainly add new internal MCA parameter that orterun will
set when it launches under debuggers using the --debug option.

This commit was SVN r19116.

The following Trac tickets were found above:
  Ticket 1361 --> https://svn.open-mpi.org/trac/ompi/ticket/1361
2008-07-31 22:11:46 +00:00
Ralph Castain
a62b2a0150 Per the July technical meeting:
Standardize the handling of the orte launch agent option across PLMs. This has been a consistent complaint I have received - each PLM would register its own MCA param to get input on the launch agent for remote nodes (in fact, one or two didn't, but most did). This would then get handled in various and contradictory ways.

Some PLMs would accept only a one-word input. Others accepted multi-word args such as "valgrind orted", but then some would error by putting any prefix specified on the cmd line in front of the incorrect argument.

For example, while using the rsh launcher, if you specified "valgrind orted" as your launch agent and had "--prefix foo" on you cmd line, you would attempt to execute "ssh foo/valgrind orted" - which obviously wouldn't work.

This was all -very- confusing to users, who had to know which PLM was being used so they could even set the right mca param in the first place! And since we don't warn about non-recognized or non-used mca params, half of the time they would wind up not doing what they thought they were telling us to do.

To solve this problem, we did the following:

1. removed all mca params from the individual plms for the launch agent

2. added a new mca param "orte_launch_agent" for this purpose. To further simplify for users, this comes with a new cmd line option "--launch-agent" that can take a multi-word string argument. The value of the param defaults to "orted".

3. added a PLM base function that processes the orte_launch_agent value and adds the contents to a provided argv array. This can subsequently be harvested at-will to handle multi-word values

4. modified the PLMs to use this new function. All the PLMs except for the rsh PLM required very minor change - just called the function and moved on. The rsh PLM required much larger changes as - because of the rsh/ssh cmd line limitations - we had to correctly prepend any provided prefix to the correct argv entry.

5. added a new opal_argv_join_range function that allows the caller to "join" argv entries between two specified indices

Please let me know of any problems. I tried to make this as clean as possible, but cannot compile all PLMs to ensure all is correct.

This commit was SVN r19097.
2008-07-30 18:26:24 +00:00
Ralph Castain
d45d728e8e Allow debuggers to attach to a running mpirun by -always- setting up the MPIR_Proctable. Only wait for MPIR_Breakpoint and hold MPI proc
s if we are launching under a debugger.

This commit was SVN r19079.
2008-07-29 17:39:16 +00:00
Ralph Castain
3107545709 Ensure that ORTE processes such as mpirun and orted never inadvertently bind themselves to cores. Change the mca param name used by the rank_file mapper to get user directives on slot lists to be different from that used by MPI procs to discover their binding. Add a cmd line option to orterun to make it easier for a user to specify the slot list (basically, hide the mca param name).
Discussed and reviewed with Lenny and Jeff.

This commit was SVN r19062.
2008-07-28 14:18:36 +00:00
Rolf vandeVaart
84ede78e40 Clarify use of --path option to mpirun.
This commit fixes trac:1231.

This commit was SVN r18909.

The following Trac tickets were found above:
  Ticket 1231 --> https://svn.open-mpi.org/trac/ompi/ticket/1231
2008-07-14 21:27:23 +00:00
Ralph Castain
1edcfa970e Remove debug
This commit was SVN r18845.
2008-07-09 00:12:11 +00:00
Ralph Castain
f1114b4144 Upgrade the ability of orterun to deal with cmd line MCA params that are passed to the orteds. Help reduce the size of the cmd line by eliminating duplicates where possible, and alert to duplicate entries that can cause problems.
Add comments to both orterun and orted code explaining why we take a snapshot of the local environment and apply it to the local procs when they are spawned.

This commit was SVN r18842.
2008-07-08 22:36:39 +00:00
Ralph Castain
51da9f2980 Properly ensure that cmd line MCA params override environmental MCA params for procs local to mpirun.
Actually, the problem was that we were simply -adding- any enviro MCA params to whatever had been found on the cmd line. Thus, duplicate MCA param directives were winding up duplicated in the environment. Some shells took the first one in the environ array - others took the last! So we could get completely different behavior based on the whims of the shell.

This commit fixes trac:1373

This commit was SVN r18836.

The following Trac tickets were found above:
  Ticket 1373 --> https://svn.open-mpi.org/trac/ompi/ticket/1373
2008-07-08 13:48:47 +00:00
Lenny Verkhovsky
23da11fdcc add some documentation for mpirun's new --loadbalance option closing #1277
This commit was SVN r18832.
2008-07-08 07:56:54 +00:00
Lenny Verkhovsky
489c22b6b1 Added -rf|--rankfile <arg0> oprion to mpirun -h + man info
This commit was SVN r18812.
2008-07-07 13:46:22 +00:00
Lenny Verkhovsky
2fae770a86 information about carto framework in mpirun man page, ticket #1372
This commit was SVN r18809.
2008-07-07 12:02:10 +00:00
Ralph Castain
ba5498cdc6 Repair the MPI-2 dynamic operations. This includes:
1. repair of the linear and direct routed modules

2. repair of the ompi/pubsub/orte module to correctly init routes to the ompi-server, and correctly handle failure to correctly parse the provided ompi-server URI

3. modification of orterun to accept both "file" and "FILE" for designating where the ompi-server URI is to be found - purely a convenience feature

4. resolution of a message ordering problem during the connect/accept handshake that allowed the "send-first" proc to attempt to send to the "recv-first" proc before the HNP had actually updated its routes.

Let this be a further reminder to all - message ordering is NOT guaranteed in the OOB

5. Repair the ompi/dpm/orte module to correctly init routes during connect/accept.

Reminder to all: messages sent to procs in another job family (i.e., started by a different mpirun) are ALWAYS routed through the respective HNPs. As per the comments in orte/routed, this is REQUIRED to maintain connect/accept (where only the root proc on each side is capable of init'ing the routes), allow communication between mpirun's using different routing modules, and to minimize connections on tools such as ompi-server. It is all taken care of "under the covers" by the OOB to ensure that a route back to the sender is maintained, even when the different mpirun's are using different routed modules.

6. corrections in the orte/odls to ensure proper identification of daemons participating in a dynamic launch

7. corrections in build/nidmap to support update of an existing nidmap during dynamic launch

8. corrected implementation of the update_arch function in the ESS, along with consolidation of a number of ESS operations into base functions for easier maintenance. The ability to support info from multiple jobs was added, although we don't currently do so - this will come later to support further fault recovery strategies

9. minor updates to several functions to remove unnecessary and/or no longer used variables and envar's, add some debugging output, etc.

10. addition of a new macro ORTE_PROC_IS_DAEMON that resolves to true if the provided proc is a daemon

There is still more cleanup to be done for efficiency, but this at least works.

Tested on single-node Mac, multi-node SLURM via odin. Tests included connect/accept, publish/lookup/unpublish, comm_spawn, comm_spawn_multiple, and singleton comm_spawn.

Fixes ticket #1256

This commit was SVN r18804.
2008-07-03 17:53:37 +00:00
Ralph Castain
b118779c08 It is okay for us to init the ORTE mca params multiple times. Indeed, it is absolutely required by orterun as the first time has to be done prior to parsing the command line, which means that the mca values haven't been parsed yet!
Add ability for sys admins to prohibit putting session directories under specified locations. Thus, they can now protect parallel file systems from foolish user mistakes.

This commit was SVN r18721.
2008-06-24 17:50:56 +00:00
Jeff Squyres
e4172a3c44 Shift the AM "if" logic down from orte/tools/Makefile.am down to the
individual orte/tools/*/Makefile.am files.  This causes "make" to
travese into every directory, even if it's not going to build anything
in that directory (which is a good thing).  It also helps cleanup and
dist issues.

This also affects orte-checkpoint and orte-restart, but I couldn't get
--with-ft to compile properly; I'll pass along a heads-up to Josh to
ensure that I didn't break anything.

This commit was SVN r18680.
2008-06-19 14:46:10 +00:00
Ralph Castain
571f483c39 Ensure that we don't breakpoint the debugger until -after- all procs have reported their contact info so we can successfully send the release message
This commit was SVN r18678.
2008-06-19 14:37:46 +00:00
Jeff Squyres
7e45b24001 MPIR_being_debugged is an int, not a bool.
This commit was SVN r18676.
2008-06-19 13:31:34 +00:00
Ralph Castain
b56f8ced4f Ensure params are registered prior to parsing global cmd line options in orterun so that debugger options are properly captured and acted upon.
Ensure that routes to remote procs are set on the HNP before completing launch so that the debugger message can be sent. Solves a race condition that can exist in those environments where the HNP does not have local procs.

This commit was SVN r18674.
2008-06-19 02:58:14 +00:00
Ralph Castain
282a220e7e Update the debugger interface per email thread with Jeff and Brian. Handoff to them for final test and validation
This commit was SVN r18670.
2008-06-18 15:28:46 +00:00
Ralph Castain
1f41069ac9 Fix CID 752 - if we can't find the daemon job object, we have to ensure we exit without attempting to dereference it
This commit was SVN r18647.
2008-06-11 14:49:58 +00:00
Ralph Castain
13ea4e4673 Be consistent - since we don't strdup the other values for param, don't strdup this one.
This allows r18645 to fix the memory corruption issue, but also allows us to resolve the memory leaks cited by CID 1039

This commit was SVN r18646.

The following SVN revision numbers were found above:
  r18645 --> open-mpi/ompi@53d83ba1c5
2008-06-11 14:42:47 +00:00
Pak Lui
53d83ba1c5 Take out a couple of free's.
This commit fixes trac:1343

This commit was SVN r18645.

The following Trac tickets were found above:
  Ticket 1343 --> https://svn.open-mpi.org/trac/ompi/ticket/1343
2008-06-11 14:02:49 +00:00
Ralph Castain
1a422995ae Fix two Coverity complaints CID 813 (value defined and not used) and 1039 (resource leak). While doing so, found and fixed another less obvious memory leak.
This commit was SVN r18641.
2008-06-10 17:53:28 +00:00
Ralph Castain
c13cadc3c7 Refs trac:1255
This commit repairs the debugger initialization procedure. I am not closing the ticket, however, pending Jeff's review of how it interfaces to the ompi_debugger code he implemented. There were duplicate symbols being created in that code, but not used anywhere. I replaced them with the ORTE-created symbols instead. However, since they aren't used anywhere, I have no way of checking to ensure I didn't break something.

So the ticket can be checked by Jeff when he returns from vacation... :-)

This commit was SVN r18625.

The following Trac tickets were found above:
  Ticket 1255 --> https://svn.open-mpi.org/trac/ompi/ticket/1255
2008-06-09 20:34:14 +00:00
Ralph Castain
2cc8b2c51f Add yet another test, this one for proper error behavior when someone call an MPI function after calling MPI_Finalize.
Add a minor debug that outputs the orterun exit status to stderr when orte_debug is set.

This commit was SVN r18622.
2008-06-09 19:21:20 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Ralph Castain
83dd3d8c6f Restore the ability to forcibly terminate by providing multiple ctrl-c's
This commit was SVN r18618.
2008-06-09 13:08:54 +00:00
Ralph Castain
0da811ce79 Initial work on xml support - allocation and job map outputs completed. More to come.
This commit was SVN r18587.
2008-06-04 20:53:12 +00:00
Jeff Squyres
a4db97c213 More man pages fixes from our Debian Open MPI package maintainer
friends.  Woo hoo!

This commit was SVN r18559.
2008-06-03 16:44:40 +00:00
Ralph Castain
b456fb2d42 Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time.
Also detect orted failed-to-start by setting timeout on launch. Currently only used in TM launcher.

Neither detection is enabled by default, but are only active if heartrate is set and/or launch timeout is set. Exception for SLURM as orted failure is always detected and reported.

More info to come on devel list.

This commit was SVN r18555.
2008-06-02 21:46:34 +00:00
Ralph Castain
2772221ae0 Add the -xml option to mpirun to indicate that xml output is desired
This commit was SVN r18539.
2008-05-29 14:11:31 +00:00
Ralph Castain
72530f8fed Cleanly handle the failed start of an orted, or its unexpected failure after start. This commit will allow mpirun to exit cleanly when this occurs, and does a best-effort attempt to cleanup the mess. However, it still has two unresolved issues that need to be eventually addressed:
1. it depends upon the ability of the native environment to alert us that the orted has died/failed to start. I have included that support for SLURM, but other environments need to be done.

2. for some yet-to-be-determined reason, the message that tells the remaining daemons to "die" isn't getting out of the RML, even though no obvious blockage is standing in the way. Work will continue on resolving that problem. For now, the orteds appear to be exiting on their own quite nicely when they see their HNP "lifeline" disappear.

This represents the best-available fix for ticket #221 so I am closing that ticket at this time.

This commit was SVN r18536.
2008-05-29 13:38:27 +00:00
Jeff Squyres
af1a9290cb Ensure to check that opal_init_util() didn't fail.
This commit was SVN r18453.
2008-05-19 11:58:48 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so you if explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
Ralph Castain
5311b13b60 Add a loadbalancing feature to the round-robin mapper - more to be sent to devel list
Fix a potential problem with RM-provided nodenames not matching returns from gethostname - ensure that the HNP's nodename gets DNS-resolved when comparing against RM-provided hostnames. Note that this may be an issue for RM-based clusters that don't have local DNS resolution, but hopefully that is more indicative of a poorly configured system.

This commit was SVN r18252.
2008-04-23 14:52:09 +00:00
Josh Hursey
cc83d41ad9 Merge in tmp/jjh-scratch
{{{
 svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}

Contains:
 * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
 * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
 * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
 * Cleanup mpi interface with missplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
 * Some other sundry cleanup items all dealing with C/R functionality in the trunk.

This commit was SVN r18241.
2008-04-23 00:17:12 +00:00
Ralph Castain
16c9100633 Add --display-allocation option to orterun that will display the node-by-node information regarding your allocation.
This commit was SVN r18216.
2008-04-20 02:25:45 +00:00
Ralph Castain
e7487ad533 Implement the seq rmaps module that sequentially maps process ranks to a list hosts in a hostfile.
Restore the "do-not-launch" functionality so users can test a mapping without launching it.

Add a "do-not-resolve" cmd line flag to mpirun so the opal/util/if.c code does not attempt to resolve network addresses, thus enabling a user to test a hostfile mapping without hanging on network resolve requests.

Add a function to hostfile to generate an ordered list of host names from a hostfile

This commit was SVN r18190.
2008-04-17 13:50:59 +00:00
Ralph Castain
7b91f8baff Cleanup and fix bugs in the MPI dynamics section. Modify the dpm API so it properly takes ports instead of process names (as correctly identified by Aurelien). Fix race conditions in the use of ompi-server. Fix incompatibilities between the mpi bindings and the dpm implemenation that could cause segfaults due to uninitialized memory.
Fix the ompi-server -h cmd line option so it actually tells you something!

Add two new testing codes to the orte/test/mpi area: accept and connect.

This commit was SVN r18176.
2008-04-16 14:27:42 +00:00
Ralph Castain
5e6dc24e62 Fix ompi-server so it works with unity routed module - still not working with tree routing.
Cleanup debug flag so it activates debugging on the data server code itself

This commit was SVN r18080.
2008-04-04 19:17:28 +00:00
Ralph Castain
dc7f45dafd Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.

This commit was SVN r17926.
2008-03-23 23:10:15 +00:00
Jeff Squyres
dee561d29e Per recent off-list discussions about the build system, I have done
some cleanups and standardizations in the various */tools/*/ 
Makefile.am files.  This commit:

 * Somewhat simplify the tool Makefile.am's 
 * Makes the tool Makefile.am's consistent with each other (do similar
   actions in similar ways)
 * Update the tool Makefile.am's to remove old kruft that was required
   by older versions of AM (trunk requires AM >=1.10)

This commit was SVN r17921.
2008-03-22 02:04:05 +00:00
Ralph Castain
6bb139e4f2 One more correction to mpirun exit codes - cleanup the application proc's exit codes in the orted so that non-zero exit codes generated by mpirun itself don't get "munged".
Modify the multi_abort function so they all return different exit codes - allows us to tell which one was being reported.

This commit was SVN r17895.
2008-03-20 13:54:11 +00:00
Ralph Castain
2ed0e60321 Bring some sanity to the exit code returned by mpirun. Ensure that we provide a non-zero code if something goes wrong, including someone exiting after calling mpi_init without calling mpi_finalize.
Jeff is preparing an (undoubtedly lengthy) explanation/matrix of how these codes are determined for the OMPI FAQ.

This commit was SVN r17879.
2008-03-19 19:00:51 +00:00
Ralph Castain
629b95a2fe Afraid this has a couple of things mixed into the commit. Couldn't be helped - had missed one commit prior to running out the door on vacation.
Fix race conditions in abnormal terminations. We had done a first-cut at this in a prior commit. However, the window remained partially open due to the fact that the HNP has multiple paths leading to orte_finalize. Most of our frameworks don't care if they are finalized more than once, but one of them does, which meant we segfaulted if orte_finalize got called more than once. Besides, we really shouldn't be doing that anyway.

So we now introduce a set of atomic locks that prevent us from multiply calling abort, attempting to call orte_finalize, etc. My initial tests indicate this is working cleanly, but since it is a race condition issue, more testing will have to be done before we know for sure that this problem has been licked.

Also, some updates relevant to the tool comm library snuck in here. Since those also touched the orted code (as did the prior changes), I didn't want to attempt to separate them out - besides, they are coming in soon anyway. More on them later as that functionality approaches completion.

This commit was SVN r17843.
2008-03-17 17:58:59 +00:00
Ralph Castain
57a72c412a Utilize Tim M's suggestion and use atomics to do the locking.
This commit was SVN r17767.
2008-03-06 21:36:32 +00:00
Ralph Castain
097cc83be2 Fix a race condition - ensure we don't call terminate in orterun more than once, even if the timeout fires while we are doing so
This commit was SVN r17766.
2008-03-06 19:35:57 +00:00
Tim Prins
f9916811ae Make it so we do not mangle the options the user passes to their executeable. Fixes trac:1124
The change also:
 - cleans up and simplifies the command line processing code
 - adds an error output if more than one hostfile passed for a single app context
 - gets rid of the superfluous orte_app_context_map_t type, and instead use a simple argv of -host options

This commit was SVN r17750.

The following Trac tickets were found above:
  Ticket 1124 --> https://svn.open-mpi.org/trac/ompi/ticket/1124
2008-03-05 22:12:27 +00:00
Rolf vandeVaart
03fdd57d5a Fix the use of --path and -x PATH so that things work properly.
Note that --path specifies extra directories where the executable
is searched for, but does not affect the PATH settings.

This commit fixes trac:1221.

This commit was SVN r17748.

The following Trac tickets were found above:
  Ticket 1221 --> https://svn.open-mpi.org/trac/ompi/ticket/1221
2008-03-05 21:07:43 +00:00
Ralph Castain
06d3145fe4 First cut at direct launch for TM. Able to launch non-ORTE procs and detect their completion for a clean shutdown.
This commit was SVN r17732.
2008-03-05 13:51:32 +00:00
Ralph Castain
edb8e32a7a Add default hostfile parameter plus --default-hostfile command line option.
Fix error message when job setup failed

This commit was SVN r17724.
2008-03-05 04:54:57 +00:00
Ralph Castain
9413d6cf5d Define a default exit code for when things fail prior to a job launch - still needs work, but a start.
Fix a deadlock loop when things really, really go bad. If we timeout trying to kill the job, then it's time to bail as cleanly as possible, not go back and keep trying.

This commit was SVN r17715.
2008-03-05 01:46:30 +00:00
George Bosilca
7879e0b9c2 Be nice with parallel debugger, export this required symbol.
This commit was SVN r17637.
2008-02-28 05:59:07 +00:00
George Bosilca
9d421bea2a Replace all occurences of orte_pointer_array by opal_pointer_array. Remove the
implementation of orte_pointer_array.

This commit was SVN r17636.
2008-02-28 05:32:23 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Sharon Melamed
025b68becf Move the carto framework to the trunk.
This commit was SVN r17177.
2008-01-23 09:20:34 +00:00
Jeff Squyres
b6e9c99f7d Formatting fixes from Peter Breitenlohner.
This commit was SVN r17163.
2008-01-18 23:21:31 +00:00
Josh Hursey
0bf61a1b84 Move in some accumulated small features and minor bug fixes for C/R support.
{{{
svn merge -r 16447:16475 https://svn.open-mpi.org/svn/ompi/tmp/jjh-fgs .
}}}

This commit was SVN r16478.
2007-10-17 13:47:36 +00:00
Ralph Castain
b6196e8a39 When we can detect that a daemon has failed, then we would like to terminate the system without having it lock up. The "hang" is currently caused by the system attempting to send messages to the daemons (specifically, ordering them to kill their local procs and then terminate). Unfortunately, without some idea of which daemon has died, the system hangs while attempting to send a message to someone who is no longer alive.
This commit introduces the necessary logic to avoid that conflict. If a PLS component can identify that a daemon has failed, then we will set a flag indicating that fact. The xcast system will subsequently check that flag and, if it is set, will send all messages direct to the recipient. In the case of "kill local procs" and "terminate", the messages will go directly to each orted, thus bypassing any orted that has failed.

In addition, the xcast system will -not- wait for the messages to complete, but will return immediately (i.e., operate in non-blocking mode). Orterun will wait (via an event timer) for a period of time based on the number of daemons in the system to allow the messages to attempt to be delivered - at the end of that time, orterun will simply exit, alerting the user to the problem and -strongly- recommending they run orte-clean.

I could only test this on slurm for the case where all daemons unexpectedly died - srun apparently only executes its waitpid callback when all launched functions terminate. I have asked that Jeff integrate this capability into the OOB as he is working on it so that we execute it whenever a socket to an orted is unexpectedly closed. Meantime, the functionality will rarely get called, but at least the logic is available for anyone whose environment can support it.

This commit was SVN r16451.
2007-10-15 18:00:30 +00:00
Josh Hursey
31e9369e8b Fix orterun so it does not get influenced by an application's argv set.
For example, if I have an application that, internal to the application, takes
the argument '-mca foo bar' we do not want orterun to pick up this argument and
pass it through the system.

So the following
{{{
 shell$ mpirun -np 2 -mca btl tcp,self ./myapp -mca foo bar
}}}

orterun should pick up {{{-mca btl tcp,self}}} but not {{{-mca foo bar}}} which
it was previous to this commit.

I tested command line runs and runs with app files to confirm this patch works.

This commit was SVN r16431.
2007-10-11 18:33:40 +00:00
Josh Hursey
7437f37e96 This commit contains the following:
* Fix some missing includes in a few places.
 * Add the cr_request() functionality to the BLCR CRS component.
   We are now dependent upon the 0.6.* series of BLCR.
 * Made the CR notification mechanism a registered function.
   This way we can have an OPAL-only version and it can be replaced at
   runtime with the ORTE version.
 * Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
   CR functionality when the user wants it. Default: Disabled.
 * Fix the placement of a checkpoint request check in MPI_Init
 * Pull the OPAL notification mechanism into the SnapC framework.
   * We no longer fork/exec the 'opal-checkpoint' command for local
   checkpointing, the Local coordinator in the orted does this directly.
   * The Local and Application coordinator talk together bypassing the OPAL
   notifiation mechanism.
   * Optimized the Local <-> App Coordinator communication.
   * Improved the structure used to track vpid_snapshots in the local coord.
 * Fix a race condition in which an application under heavy communication load
   may produce an inconsistent global checkpoint.

This commit was SVN r16389.
2007-10-08 20:53:02 +00:00
Ralph Castain
54b2cf747e These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC.
The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is know to be needed in the odls process component.

This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done:

As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in.

In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in.

The incoming changes revamp these procedures in three ways:

1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step.

The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic.

Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure.


2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed.

The size of this data has been reduced in three ways:

(a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes.

To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose.

(b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction.

(c) when the mca param requesting that nodenames be sent to support pretty error messages, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using.

While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly.


3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup.

It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging.

Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future.


There are a few minor additional changes in the commit that I'll just note in passing:

* propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details.

* requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details.

* cleanup of some stale header files

This commit was SVN r16364.
2007-10-05 19:48:23 +00:00
George Bosilca
4e66376e66 Fix memory leak (Coverty 702).
This commit was SVN r16122.
2007-09-13 20:11:38 +00:00
Shiqing Fan
dcee7e4229 - Should not use ORTE_DECLSPEC with initialization.
This commit was SVN r16086.
2007-09-11 10:13:53 +00:00
George Bosilca
d658a477af Update the help file to match the real name of the required argument.
This commit was SVN r15762.
2007-08-04 00:35:55 +00:00
Shiqing Fan
0f468f3668 - Remove the solution and project files, will commit them later.
This commit was SVN r15705.
2007-07-31 17:07:02 +00:00
Shiqing Fan
4d7b349cdb - Add VC8 solution and project files.
- If one wants to use this solution, remember to unload the project 'orte-restart' which is currently not working for Windows.

This commit was SVN r15680.
2007-07-30 11:05:34 +00:00
Ralph Castain
d99c764e75 Resolve a problem where the orte daemon comm functions were being accessed by mpirun while still retaining occasional reference to the orted_globals. Remove all dependence on orted_globals from the comm functions. Move those functions back into their own file to make it easier to maintain the separation. Ensure that mpirun ignores any "exit" commands being sent to daemons as it will exit on its own.
This commit was SVN r15562.
2007-07-23 18:36:33 +00:00
Ralph Castain
2110064a9a Ensure that the LD_LIBRARY_PATH and PATH get properly set for procs locally spawned by mpirun.
This commit was SVN r15516.
2007-07-19 19:00:06 +00:00
Josh Hursey
eeba2cb871 Add a comment to clarify the relationship between
mca_base_cmd_line_process_args() and opal_init_util() so
we do not forget their ordering needs, and subtle relationship.

This commit was SVN r15412.
2007-07-13 19:08:05 +00:00
Ralph Castain
2bded34a1d Fix a problem observed by Brian where processes launched local to mpirun lost their environment except for MCA params.
The problem stemmed from no longer launching a local orted on the same node as mpirun. The orted would save and reuse the base environment. Mpirun didn't do that, and the odls was using the orted's globally saved environment (which wasn't being set).

This fix establishes a globally accessible base launch environment that both the orted and mpirun can utilize. Since we now use that, we don't need to pass it to the odls_launch_proc function, so remove that param from the API (and modify all components to handle the change).

This commit was SVN r15405.
2007-07-13 15:47:57 +00:00
Ralph Castain
bd65f8ba88 Bring in an updated launch system for the orteds. This commit restores the ability to execute singletons and singleton comm_spawn, both in single node and multi-node environments.
Short description: major changes include -

1. singletons now fork/exec a local daemon to manage their operations.

2. the orte daemon code now resides in libopen-rte

3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided.

I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine.

This commit was SVN r15390.
2007-07-12 19:53:18 +00:00
Jeff Squyres
64083570f5 Add support for DDT parallel debugger, which required several things:
* Making some symbols and types be global (vs. static) in orterun
 * Adding a "ddt" entry in the MCA parameter orte_base_user_debugger
   default value
 * Add support for @executable@, @executable_argv@, and @single_app@
   tokens in the orte_base_user_debugger MCA parameter.
 * Added various error checks and corresponding help messages after
   finding a debugger in the PATH

Fixes trac:1081

This commit was SVN r15323.

The following Trac tickets were found above:
  Ticket 1081 --> https://svn.open-mpi.org/trac/ompi/ticket/1081
2007-07-10 12:53:48 +00:00
Jeff Squyres
892bc38ad0 Protect against a bad free: full_line points to the full buffer. But
line may point to a few characters beyond the beginning of the buffer
(if the buffer had some extra white space padding at the beginning).
So if we want to free the buffer, free full_line, not line.

This commit was SVN r15315.
2007-07-09 19:56:16 +00:00
Josh Hursey
f88aa6c273 This commit cleans up the AMCA parameter implementation a bit.
* Remove the 'opal_mca_base_param_use_amca_sets' global variable
* Harness the fact that you can (read should) call the cmd_line functions
  before initializing opal_init_util(). This pushes the MCA/GMCA/AMCA
  command line options into the environment before OPAL inits and starts
  to use these values. By putting the cmd_line parse before opal_init_util
  in orterun and orted we only parse the *MCA parameter files once, and 
  correctly (alleviating the need to 'recache' the files on init.)
* Small bits of cleanup.

This commit was SVN r15219.
2007-06-27 01:03:31 +00:00
Rainer Keller
15c03e8acc - Apply patch 31_manpages_lintian.dpatch
Thanks to Dirk Eddelbuettel <edd@debian.org>

This commit was SVN r15215.
2007-06-26 21:13:10 +00:00
Josh Hursey
dd021e7121 Remove some leftover debugging that must have been accidentally left in
r15142.

This commit was SVN r15145.

The following SVN revision numbers were found above:
  r15142 --> open-mpi/ompi@a3998a1676
2007-06-20 14:06:13 +00:00
George Bosilca
a3998a1676 Allow the symbols required by TotalView to be exported even when
the visibility feature is on.

This commit was SVN r15142.
2007-06-19 22:35:23 +00:00
Ralph Castain
85df3bd92f Bring in the generalized xcast communication system along with the correspondingly revised orted launch. I will send a message out to developers explaining the basic changes. In brief:
1. generalize orte_rml.xcast to become a general broadcast-like messaging system. Messages can now be sent to any tag on the daemons or processes. Note that any message sent via xcast will be delivered to ALL processes in the specified job - you don't get to pick and choose. At a later date, we will introduce an augmented capability that will use the daemons as relays, but will allow you to send to a specified array of process names.

2. extended orte_rml.xcast so it supports more scalable message routing methodologies. At the moment, we support three: (a) direct, which sends the message directly to all recipients; (b) linear, which sends the message to the local daemon on each node, which then relays it to its own local procs; and (b) binomial, which sends the message via a binomial algo across all the daemons, each of which then relays to its own local procs. The crossover points between the algos are adjustable via MCA param, or you can simply demand that a specific algo be used.

3. orteds no longer exhibit two types of behavior: bootproxy or VM. Orteds now always behave like they are part of a virtual machine - they simply launch a job if mpirun tells them to do so. This is another step towards creating an "orteboot" functionality, but also provided a clean system for supporting message relaying.

Note one major impact of this commit: multiple daemons on a node cannot be supported any longer! Only a single daemon/node is now allowed.

This commit is known to break support for the following environments: POE, Xgrid, Xcpu, Windows. It has been tested on rsh, SLURM, and Bproc. Modifications for TM support have been made but could not be verified due to machine problems at LANL. Modifications for SGE have been made but could not be verified. The developers for the non-verified environments will be separately notified along with suggestions on how to fix the problems.

This commit was SVN r15007.
2007-06-12 13:28:54 +00:00
Brian Barrett
508da4e959 OS X apparently really doesn't like shared libraries with unresolvable
symbols in them and environ is defined only in the final application
(probably in crt1.o).  Apple provides a function for getting at the
environment, so use that instead if it's available.

This commit was SVN r14857.
2007-06-05 03:03:59 +00:00
Ralph Castain
a2964f429e Fix a compiler warning - strncmp returns an int, so you have to compare to 0 instead of NULL.
This commit was SVN r14790.
2007-05-29 18:02:10 +00:00
Anya Tatashina
de676d717b Ref Trac #1032; added suport for full path launching with TotalView
This commit was SVN r14789.
2007-05-29 17:39:11 +00:00
Ralph Castain
4fff584a68 Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that *did* start.
The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system.

Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed.

Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief.

With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn.

Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put".

This commit was SVN r14711.
2007-05-21 18:31:28 +00:00
Ralph Castain
d9acc93efa Compute and pass the local_rank and local number of procs (in that proc's job) on the node.
To be precise, given this hypothetical launching pattern:

host1: vpids 0, 2, 4, 6
host2: vpids 1, 3, 5, 7

The local_rank for these procs would be:

host1: vpids 0->local_rank 0, v2->lr1, v4->lr2, v6->lr3
host2: vpids 1->local_rank 0, v3->lr1, v5->lr2, v7->lr3

and the number of local procs on each node would be four. If vpid=0 then does a comm_spawn of one process on host1, the values of the parent job would remain unchanged. The local_rank of the child process would be 0 and its num_local_procs would be 1 since it is in a separate jobid.

I have verified this functionality for the rsh case - need to verify that slurm and other cases also get the right values. Some consolidation of common code is probably going to occur in the SDS components to make this simpler and more maintainable in the future.

This commit was SVN r14706.
2007-05-21 14:30:10 +00:00
Ralph Castain
75d51812a3 Fix the app-failed-to-start capability that was broken by r14554 (holding the caller in rmgr.spawn until the application - as opposed to just the orteds - have started). Allow the rmgr.spawn function to return if the app terminates, correctly handling its return status code to show abnormal termination. Modify orterun to correctly handle the returned status code so it doesn't enter a conditioned wait if the app fails to start since it will never wakeup if it does.
This commit was SVN r14693.

The following SVN revision numbers were found above:
  r14554 --> open-mpi/ompi@4510b42638
2007-05-18 13:29:11 +00:00
Jeff Squyres
51ff779a5d Minor gramatical nit found by Karen/Sun.
This commit was SVN r14622.
2007-05-08 21:24:44 +00:00
Jeff Squyres
395d05b6bc Update the man page to describe both -wdir and -wd. -wdir is consider
the "primary" option and -wd is the synonym.  Regardless, either of
them function exactly like the other.

This commit was SVN r14618.
2007-05-08 20:27:20 +00:00
Jeff Squyres
8a68b2dba7 Add -wdir option as a synonym for -wd (to make us match the man page).
This commit was SVN r14614.
2007-05-08 19:09:32 +00:00
Sven Stork
a04c8eb39a - Bring over the visibility feature, for a finer symbol export control
via the visibility feature that is provided by some compilers.

  Per default this feature is disabled, to enable it you need to
  configure with --enable-visibility and obviously you need a compiler
  with visibility support. Please refer to the wiki for more information.
  https://svn.open-mpi.org/trac/ompi/wiki/Visibility

This commit was SVN r14582.
2007-05-04 09:03:37 +00:00
Ralph Castain
c774f641fb Modify orterun to provide more user-friendly reporting on jobs that fail to start
This commit was SVN r14496.
2007-04-24 19:19:14 +00:00
Ralph Castain
18b2dca51c Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly.
There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd *really* have to try).

This also involved a slight change to the oob.xcast API, so propagated that as required.

Note: this has *only* been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-)

Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately.

This commit was SVN r14475.
2007-04-23 18:41:04 +00:00
Jeff Squyres
0ba47105ed Merge the /tmp/jms-installdirs-trunk branch into the trunk. This
finally brings in functionality that is already on the 1.2 branch, and
was developed and tested in the v1.2ofed branch (and other places).

Short version of new features:

 * Support for ibv_fork_init() 
 * Automatically fill in the openib BTL bandwidth value by 
   querying the HCA port 
 * Installdirs functionality 
 * Fixes to always use -I in the Fortran wrapper compilers (#924) 
 * Gleb's mpool updates 
 * Remove some kruft in btl/openib/configure.m4, therefore 
   fixing the harmless warnings noted in #665 
 * Bunches of updates to the Linux RPM spec file 

I.e., effectively the same thing that r14411 brought to the v1.2
branch.

Also effectively brought in r14432 and r14433 (some fixes on top of
the original r14411 commit to v1.2).  Still need to bring in the moral
equivalent of r14445 after this commit (fixes to installdirs).

This commit was SVN r14449.

The following SVN revision numbers were found above:
  r14411 --> open-mpi/ompi@83b31314ae
  r14432 --> open-mpi/ompi@a48f160595
  r14433 --> open-mpi/ompi@68f346d2bc
  r14445 --> open-mpi/ompi@13d366b827
2007-04-21 00:15:05 +00:00
Josh Hursey
8fd6d4ba09 add a newline so output is cleaner/clearer
This commit was SVN r14229.
2007-04-05 17:45:03 +00:00
George Bosilca
f2a6b9394f Deal with the include spree. Protect "environ" on Windows.
Some others minors modifications in order to make it
compile [again] on Windows.

This commit was SVN r14188.
2007-04-01 16:16:54 +00:00
Ralph Castain
0d98264097 Fix the nolocal option on the OMPI trunk
This commit was SVN r14138.
2007-03-24 16:16:16 +00:00
Jeff Squyres
bcdfbacaa4 Oops -- typo from previous commit. :-(
This commit was SVN r14130.
2007-03-23 00:51:50 +00:00
Jeff Squyres
a3dd0f2e08 Connect --nolocal up to the MCA param rmaps_base_schedule_local, as it
should be (it's a mistake that it got left out).

This commit was SVN r14127.
2007-03-22 19:29:47 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Josh Hursey
0404444dbe * Added 2 new MCA parameters
- mca_base_param_file_prefix
     (Default: NULL)
     This is the fullname of the "-am" mpirun option. Used to specify a ':'
     separated list of AMCA parameter set files.
  - mca_base_param_file_path
     (Default: $SYSCONFDIR/amca-param-sets/:$CWD)
     The path to search for AMCA files with relative paths. A warning will be
     printed if the AMCA file cannot be found.

* Added a new function "mca_base_param_recache_files" the re-reads the file
configurations. This is used internally to help bootstrap the MCA system.

* Added a new orterun/mpirun command line option '-am' that aliases for the
mca_base_param_file_prefix MCA parameter

* Exposed the opal_path_access function as it is generally useful in other
places in the code.

* New function "opal_cmd_line_make_opt_mca" which will allow you to append a
new command line option with MCA parameter identifiers to set at the same
time. Previously this could only be done at command line declaration time.

* Added a new directory under the $pkgdatadir named "amca-param-sets" where all
the 'shipped with' Open MPI AMCA parameter sets are placed. This is the first
place to search for AMCA sets with relative paths.

* An example.conf AMCA parameter set file is located in
contrib/amca-param-sets/.

* Jeff Squyres contributed an OpenIB AMCA set for benchmarking.

Note: You will need to autogen with this commit as it adds a configure param.
  Sorry :(

This commit was SVN r13867.
2007-03-01 13:39:20 +00:00
Pak Lui
2d6b3776bf * fix the SEGV described in trac #892 that the exit_status in the 200 range
causes a strsignal to show NULL as a result. Still trying to determine
  why exit_status is in that range.

This commit was SVN r13583.
2007-02-09 16:39:30 +00:00
Pak Lui
ccff0a6e65 * minor fix to correct the pid that always shows up as 0 in the abort
error message. e.g: 

  mpirun noticed that job rank 2 with PID 0 on node burl-ct-v440-4
  exited on signal 15 (Terminated).

This commit was SVN r13537.
2007-02-07 17:46:19 +00:00
George Bosilca
9f73335bdb Silence the compiler.
This commit was SVN r13381.
2007-01-31 04:24:56 +00:00