1
1
Граф коммитов

411 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Ralph Castain
2c1a658232 Fix debugger attach
This commit was SVN r23921.
2010-10-22 20:07:24 +00:00
Ralph Castain
1e93437cd4 To help with debugging, add a new mca param that instructs ORTE_ERROR_LOG to output "silent" errors. Helps to track down silent errors that don't have an associated error message (e.g., via show_help).
This commit was SVN r23893.
2010-10-16 03:29:47 +00:00
Brian Barrett
9febaa475e * Add shell of functionality required for supporting Portals4
* Update places where orte-free builds have failed

This commit was SVN r23891.
2010-10-14 22:49:09 +00:00
Ethan Mallove
8052c9dd46 Prevent SEGV when using odls_base_verbose
This commit was SVN r23758.
2010-09-15 20:17:50 +00:00
Ralph Castain
554aede041 Fix a situation where we were unlocking a thread that isn't locked for the main launch - it is only used for dynamic spawns.
This commit was SVN r23682.
2010-08-28 14:03:17 +00:00
Josh Hursey
77792c937d When we checkpoint with the --stop option, be sure to write out all the metadata before clearing the storage handle.
Here we tried to write out the session directory marker after we sync the directory, which happens early in the case of --stop.

Thanks to Ananda Mudar for noticing the bug.

This commit was SVN r23627.
2010-08-18 20:44:03 +00:00
Brian Barrett
13c827dda8 Make trunk compile on Red Storm again
This commit was SVN r23622.
2010-08-17 21:51:38 +00:00
Josh Hursey
e12ca48cd9 A number of C/R enhancements per RFC below:
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php

Documentation:
  http://osl.iu.edu/research/ft/

Major Changes: 
-------------- 
 * Added C/R-enabled Debugging support. 
   Enabled with the --enable-crdebug flag. See the following website for more information: 
   http://osl.iu.edu/research/ft/crdebug/ 
 * Added Stable Storage (SStore) framework for checkpoint storage 
   * 'central' component does a direct to central storage save 
   * 'stage' component stages checkpoints to central storage while the application continues execution. 
     * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress) 
     * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching) 
 * Added Compression (compress) framework to support 
 * Add two new ErrMgr recovery policies 
   * {{{crmig}}} C/R Process Migration 
   * {{{autor}}} C/R Automatic Recovery 
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component 
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option) 
   * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342) 
   * {{{OMPI_CR_Restart}}} 
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules) 
   * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192) 
   * {{{OMPI_CR_Quiesce_start}}} 
   * {{{OMPI_CR_Quiesce_checkpoint}}} 
   * {{{OMPI_CR_Quiesce_end}}} 
   * {{{OMPI_CR_self_register_checkpoint_callback}}} 
   * {{{OMPI_CR_self_register_restart_callback}}} 
   * {{{OMPI_CR_self_register_continue_callback}}} 
 * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future. 
 * Add a progress meter to: 
   * FileM rsh (filem_rsh_process_meter) 
   * SnapC full (snapc_full_progress_meter) 
   * SStore stage (sstore_stage_progress_meter) 
 * Added 2 new command line options to ompi-restart 
   * --showme : Display the full command line that would have been exec'ed. 
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413) 
 * Deprecated some MCA params: 
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir 
   * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir 
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared 
   * snapc_base_store_in_place deprecated, replaced with different components of SStore 
   * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref 
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported 
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem 

Minor Changes: 
-------------- 
 * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing. 
 * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components 
 * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it. 
 * Fixes trac:2208 : Honor various TMPDIR varaibles instead of forcing {{{/tmp}}} 
 * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set. 
 * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality. 
 * Cleanup the CRS framework and components to work with the SStore framework. 
 * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably). 
 * Add 'quiesce' hook to CRCP for a future enhancement. 
 * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}. 
 * Add optional application level INC callbacks (registered through the CR MPI Ext interface). 
 * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive. 
 * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked. 
 * {{{opal-restart}}} also support local decompression before restarting 
 * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata 
 * {{{orte-restart}}} now uses the SStore framework to work with the metadata 
 * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality. 
 * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}. 
 * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped. 
 * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away. 
 * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options. 
 * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities. 
 * Improve the checks for 'already checkpointing' error path. 
 * A a recovery output timer, to show how long it takes to restart a job 
 * Do a better job of cleaning up the old session directory on restart. 
 * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment) 
 * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize. 

This commit was SVN r23587.

The following Trac tickets were found above:
  Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
  Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
  Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
  Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
  Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
  Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
  Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
2010-08-10 20:51:11 +00:00
Terry Dontje
b74ef351b7 Added new solaris sysinfo module. Also added code to assign
orte_local_chip_type and orte_local_chip_model in MPI processes it the
appropriate sysinfo module found the values on the machine.

This commit was SVN r23581.
2010-08-09 19:28:56 +00:00
Josh Hursey
ba7e94dd89 Some relatively minor C/R related cleanup
* Fix a configure warning for checking --enable-ft-thread
 * In hnp and orted ErrMgr components check to see if other components have already recovered this process before trying to recover it again.
 * Fix 'npernode' for restarting using the resilient rmaps component
 * export ompi_info_set, so that internal functionality can use it.

This commit was SVN r23535.
2010-07-30 18:59:34 +00:00
Ralph Castain
e858b029aa Re-establish support for IBM's POE environment by resurrecting the poe launcher from the v1.2 series, updated for the current trunk. Modify the hnp errmgr module to support batch-operated jobs. Update the new ess generic module to support a wider range of mapping options.
This commit was SVN r23493.
2010-07-24 23:35:20 +00:00
Ralph Castain
f3e13b9766 Improve the efficiency by making the check for uniqueness of incoming hnp contact info much faster by including the hnp_uri in the job_family tracker object. Replace the global buffer storage with a quick routine to build the buffer from the jobfams array
This commit was SVN r23443.
2010-07-20 08:30:47 +00:00
Ralph Castain
248320b91a Enable connect_accept between multiple singleton jobs without the presence of an external rendezvous agent (e.g., ompi-server). This also enables connect_accept between processes in more than two jobs regardless of how they were started.
Create an ability to store the contact info for multiple HNPs being used to route between different job families. Modify the dpm orte module to pass the resulting store during the connect_accept procedure so that all jobs involved in the resulting communicator know how to route OOB messages between them.

Add a test provided by Philippe that tests this ability.

This commit was SVN r23438.
2010-07-20 04:22:45 +00:00
Ralph Castain
12cd07c9a9 Start reducing our dependency on the event library by removing at least one instance where we use it to redirect the program counter. Rolf reported occasional hangs of mpirun in very specific circumstances after all daemons were done. A review of MTT results indicates this may have been happening more generally in a small fraction of cases.
The problem was tracked to use of the grpcomm.onesided_barrier to control daemon/mpirun termination. This relied on messaging -and- required that the program counter jump from the errmgr back to grpcomm. On rare occasions, this jump did not occur, causing mpirun to hang.

This patch looks more invasive than it is - most of the affected files simply had one or two lines removed. The essence of the change is:

* pulled the job_complete and quit routines out of orterun and orted_main and put them in a common place

* modified the errmgr to directly call the new routines when termination is detected

* removed the grpcomm.onesided_barrier and its associated RML tag

* add a new "num_routes" API to the routed framework that reports back the number of dependent routes. When route_lost is called, the daemon's list of "children" is checked and adjusted if that route went to a "leaf" in the routing tree

* use connection termination between daemons to track rollup of the daemon tree. Daemons and HNP now terminate once num_routes returns zero

Also picked up in this commit is the addition of a new bool flag to the app_context struct, and increasing the job_control field from 8 to 16 bits. Both trivial.

This commit was SVN r23429.
2010-07-17 21:03:27 +00:00
Ralph Castain
4af771a681 Update the job object pack-unpack fns
This commit was SVN r23392.
2010-07-13 21:23:14 +00:00
Ralph Castain
eee7541ae7 Don't overwrite a prior setting for create_session_dirs
This commit was SVN r23383.
2010-07-13 06:30:09 +00:00
Ralph Castain
2babebf9c3 Revert some of a prior commit - there is no need for another flag to indicate one-sided termination of the orteds.
This commit was SVN r23372.
2010-07-10 03:33:01 +00:00
Ralph Castain
da61b69b15 Ensure we don't incorrectly return a non-zero exit code when normally terminating a slurm job.
Slurm, of course, must always be different...

This commit was SVN r23371.
2010-07-09 19:14:10 +00:00
Ralph Castain
31295e8dc2 As discussed on today's telecon, reorganize the debugger attachment code in orte to better support efforts within the tool community aimed at exploring alternative methods. Move the debugger attachment code from the orterun directory to a new debugger framework. Organize the existing standard support code into an "mpir" component. Organize the current extensions for co-spawning debugger daemons into a separate "mpirx" component.
Since the MPIR symbols are now included in the ORTE library, remove duplicate declarations in OMPI and replace them with extern references to their ORTE instantiations.

This commit was SVN r23360.
2010-07-06 23:35:42 +00:00
Ralph Castain
e9f4c84d7e Add another name field to the job object
This commit was SVN r23299.
2010-06-24 01:57:27 +00:00
Ralph Castain
ae746a390f Debugger daemons spawned upon attachment to a running job need to be treated just like a regular job - they are not "piggybacking" onto an existing launch, and so the orte daemons need to report them just like a regular job launch in order to release from spawn.
Modify the debugger control flag to include "do not monitor" so orterun will not take debugger daemon termination into account when deciding that all jobs are done.

This commit was SVN r23282.
2010-06-19 15:22:36 +00:00
Ralph Castain
e52a54183f Let max restarts be associated with an app_context instead of a job so that individual apps can have different values. Default to a single job-level value
This commit was SVN r23248.
2010-06-07 14:21:08 +00:00
Ralph Castain
73ebb748bb Ignore comm failures when shutting down orteds
This commit was SVN r23201.
2010-05-23 02:57:03 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
Ralph Castain
88f5217a12 Cleanup the debugger daemon co-launch code and add an ability to test it. Implement ability to co-launch debugger daemons upon attach to a running job for jobs launched under rsh, slurm, and tm environments (others can easily be added if desired).
Add new mca params to test:

orte_debugger_test_daemon: Name of the executable to be used to simulate a debugger colaunch
orte_debugger_test_attach: Test debugger colaunch after debugger attachment

To test co-launch at job start, just set the orte_debugger_test_daemon param.

To test co-launch upon attach:
set orte_debugger_test_daemon
set orte_debugger_test_attach=1
set orte_enable_debug_cospawn_while_running=1
set orte_debugger_check_rate=<N> - defines the number of seconds to wait before "checking" for a debugger attaching

Added a "debugger" program to orte/test/mpi that just spins to simulate a debugger daemon.

This commit was SVN r23144.
2010-05-14 18:44:49 +00:00
Ralph Castain
7ce34223f1 Per off-list discussion, implement the new OMPI exit status policy (soon to be on wiki) and further cleanup error reporting to cover new cases.
Implement process migration when failed nodes are detected. Some testing still required

This commit was SVN r23121.
2010-05-12 18:11:58 +00:00
Ralph Castain
2ff1ae13e1 Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set.
This commit was SVN r23102.
2010-05-05 00:48:43 +00:00
Ralph Castain
319758e3e0 Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params:
1. orte_enable_recovery - default recovery policy, can be overridden on a per-job basis

2. orte_max_local_restarts - default max number of local restarts, can be overridden

3. orte_max_global_restarts - default max number of relocates, can be overridden

Implement the restart_proc API for the ODLS framework, reorganize the default fns a little to avoid copying code.

This commit was SVN r23057.
2010-04-28 04:06:57 +00:00
Ralph Castain
67568e7dec Fix a warning
This commit was SVN r23049.
2010-04-27 15:59:24 +00:00
Ralph Castain
b9893aacc5 Add a sensor framework to ORTE that monitors applications and notifies the errmgr when they exceed specified boundaries. Two modules are included here:
1. file activity - can monitor file size, access and modification times. If these fail to change over a specified number of sampling iterations (rate is an mca param), then the errmgr is notified.

2. memory usage - checks amount of memory used by a process. Limit and sampling rate can be set.

This support must be enabled by configuring --enable-sensors.

ompi_info and orte-info have been updated to include the new framework.

Also includes some initial steps toward restoring the recovery capability. Most notably, the ODLS API has been extended to include a "restart_proc" entry for restarting a local process, and organizes the various ERRMGR framework globals into a single struct as we do in the other ORTE frameworks. Fix an oversight in the ERRMGR framework where a pointer array was constructed, but not initialized.

Implementation continues.

This commit was SVN r23043.
2010-04-26 22:15:57 +00:00
Ralph Castain
efbb5c9b7c Revamp the errmgr framework to provide a greater range of optional behaviors, including different behaviors for daemons, and remove several looping messages across the code base:
* add hnp and orted modules to the errmgr framework. The HNP module contains much of the code that was in the errmgr base since that code could only be executed by the HNP anyway.

* update the odls to report process states directly into the active errmgr module, thus removing the need to send messages looped back into the odls cmd processor. Let the active errmgr module decide what to do at various states.

* remove the code to track application state progress from the plm_base_launch_support.c code. Update the plm modules to call the errmgr directly when a launch fails.

* update the plm_base_receive.c code to call the errmgr with state updates from remote daemons

* update the routed modules to reflect that process state is updated in the errmgr

* ensure that the orted's open the errmgr and select their appropriate module

* add new pretty-print utilities to print process and job state. Move the pretty-print of time info to a globally-accessible place

* define a global orte_comm function to send messages from orted's to the HNP so that others can overlay the standard RML methods, if desired.

* update the orterun help output to reflect that the "term w/o sync" error message can result from three, not two, scenarios

This commit was SVN r23023.
2010-04-23 04:44:41 +00:00
Ralph Castain
d3ed4e68b7 Utilize a non-used mapping policy bit to define a policy that uses only existing alive daemons to support virtual machines and restarting processes on already-active nodes
This commit was SVN r22951.
2010-04-10 05:02:47 +00:00
Ralph Castain
de6679dbd3 Truly respect the -quiet option. Make it an mca param so someone doesn't have to put it solely on the cmd line. Tell show_help to shaddup as well.
This commit was SVN r22926.
2010-04-02 14:19:38 +00:00
Ralph Castain
871a9e0df4 Track process heartbeats with time_t, be a little less restrictive on who can retrieve an orte_job_t object
This commit was SVN r22921.
2010-03-31 19:20:06 +00:00
Josh Hursey
e4f2d03d28 ErrMgr Framework redesign to better support fault tolerance development activities.
Explained in more detail in the following RFC:
  http://www.open-mpi.org/community/lists/devel/2010/03/7589.php

This commit was SVN r22872.
2010-03-23 21:28:02 +00:00
Ralph Castain
0b9552cd4e Expand the ESS framework's API to include a new function "query_sys_info" that allows the caller to retrieve key-value pairs of info on the local system capabilities (e.g., cpu type/model). Have each daemon and the HNP "sense" that information and provide it to their local procs to avoid having every proc querying the system directly.
This commit was SVN r22870.
2010-03-23 20:47:41 +00:00
Josh Hursey
e9b5162d79 Fix the configure logic for --with-ft so that it properly takes a comma separated list.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.

The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.

This commit was SVN r22824.
2010-03-12 23:57:50 +00:00
Ralph Castain
c88fe1ea54 Create a new mca parameter to control creation of session directories. Defaults to true so that the current behavior of always creating them is preserved. If set to false (0), then don't create session directories. Helps in those environments where session directories are a problem.
Tell the sm btl that it cannot run if no session directories were created.

This commit was SVN r22756.
2010-03-02 15:18:33 +00:00
Ralph Castain
2541aa98ab Change the app_idx type to uint32_t to support users who use large numbers of app_contexts. Set it up as a new typedef so we can change it later without as much effort.
This commit was SVN r22727.
2010-02-27 17:37:34 +00:00
Ralph Castain
6c0d7940c7 Add a new MCA param (and corresponding mpirun cmd line option) to output the debugger proctable info after launch. The output is just the job map with the process pid included, so you get a node-by-node list of the process ranks on that node and thier pids.
Works for initial launch and comm_spawn. xml and non-xml output is available

This commit was SVN r22725.
2010-02-27 08:32:25 +00:00
Shiqing Fan
44fe33452c Use this option only on Windows, so protect it with #ifdef __WINDOWS__.
This commit was SVN r22701.
2010-02-24 08:50:03 +00:00
Shiqing Fan
7a5a5ce024 Add a global option for inputing the head node name for Windows CCP trough command line.
This commit was SVN r22688.
2010-02-23 19:42:51 +00:00
Ralph Castain
f66b6cae23 Enable the boot of an orted "virtual machine". Modify the mapper framework to allow mapping of only daemons. Remove the cm ras module as no longer required. Modify the orted code to always send back node arch info. Remove the "--enable-bootstrap" configure option as this feature will now always be available.
This commit was SVN r22480.
2010-01-25 22:25:13 +00:00
Ralph Castain
cec840f6b9 The ability to add procs to a running job was unfortunately borked when we added the detection of a proc exiting before calling init. Re-enable it here, ensuring that procs that are being restarted and/or added to a job do -not- call barrier during orte_init.
This commit was SVN r22404.
2010-01-14 17:59:42 +00:00
Brian Barrett
86d8356b13 Updates to allow OMPI to build on Cray XT platforms running Catamount
This commit was SVN r22381.
2010-01-07 18:14:03 +00:00
Ralph Castain
ef1bfaa823 Add the ability to track how many times a process has been restarted, and to communicate that value to a process when it is restarted in case it needs to take action when it is restarted as opposed to being started for the first time.
This commit was SVN r22377.
2010-01-07 01:19:44 +00:00
Jeff Squyres
16b100219d A patch from UTK to allow orte_init(), opal_init(), and associated
friends also receive &argc and &argv (George asked Jeff to Ralph to
review before committing).  The thought is that passing argv and argc
to opal/orte_init be useful to other projects outside of OMPI that are
using OPAL and/or ORTE (especially in conjunction with some other
bootstrapping code where it is helpful to modify argv).  It's such a
small thing that it's easy to apply here to make others' lives a
little easier.

Ask George for more details; I'm just the messenger.  :-)

Judging by the copyrights on this patch, it's been around for a
while.  :-)

This commit was SVN r22260.
2009-12-04 00:51:15 +00:00
Ralph Castain
a0d5c80ce0 Add a new framework for discovering local resource information such as cpu type/model, #cpus, available physical memory, etc. Two initial components (darwin and linux) are provided. This is needed to support bootstrap operations where daemons are started at node boot, and applications where initial knowledge of cpu identification is needed to guide framework component selection.
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.

If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.

Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).

Adjust some platform files to enable these capabilities.

This commit was SVN r22244.
2009-11-30 23:11:25 +00:00
Ralph Castain
e38a0eab9f Remove the fddp and sensor frameworks - relocated to new cluster mgr project
This commit was SVN r22240.
2009-11-27 22:14:47 +00:00
Ralph Castain
2f91a4833b Have the trigger event return the event itself in the callback function so it can be reset, if desired
This commit was SVN r22101.
2009-10-15 02:35:53 +00:00
Ralph Castain
c749fefbd0 Instead of an odls-base mca param, make report_bindings a global param so that we can (a) detect it was set in the plm, and then (b) ensure it gets passed along to remote orteds so they will comply with the request.
This commit was SVN r22021.
2009-09-28 03:17:15 +00:00
Ralph Castain
dff0d01673 Yet another paffinity cleanup...sigh.
1. ensure that orte_rmaps_base_schedule_policy does not override cmd line settings

2. when you try to bind to more cores than we have, generate a not-enough-processors error message

3. allow npersocket -bind-to-core combination - because, yes, somebody actually wants to do it.

This commit was SVN r21996.
2009-09-22 18:44:53 +00:00
Ralph Castain
8da3aa8d5c Some (hopefully final!) adjustments and corrections to the paffinity support:
1. default -npersocket to force -bind-to-socket

2. if we cannot get a value for cores/socket, try using #logical cpus. otherwise, default to 1 core

3. add missing error message for not-enough-processors

4. since we no longer loop through orte_register_params twice, put the auto-detect of
   topology info in the rte_init for hnp and std_orted

5. fix bind-to-core, bysocket combination

This commit was SVN r21992.
2009-09-22 15:41:03 +00:00
Ralph Castain
2210989e2d Update the cm ess module to support orted bootstrap. Continue work towards bootstrap capability.
This commit was SVN r21989.
2009-09-22 02:16:40 +00:00
Ralph Castain
7138fd131f Final cleanup on new paffinity "if-avail" messages, plus fix one bug reported by Terry
This commit was SVN r21978.
2009-09-19 17:43:21 +00:00
Ralph Castain
2028017554 Modify the paffinity system to handle binding directives that are "soft" - i.e., when someone directs that we bind if the system supports it. This allows community members to distribute OMPI with default MCA param files that direct general binding policies, without having the distributed software fail if the system cannot support those policies.
The new options work by adding an ":if-avail" qualifier to the "bind-to-socket" and "bind-to-core" MCA params. If the system does not support this capability, the job will launch anyway. Without the qualifier, the job will abort with an error message indicating that the required functionality is not supported on this system.

This commit was SVN r21975.
2009-09-18 19:48:42 +00:00
Ralph Castain
8ae4b55d16 Enable a new command line option to --report-events that instructs mpirun to RML-report specific events during job life to the requestor.
This commit was SVN r21954.
2009-09-09 05:28:45 +00:00
Ralph Castain
17444243f7 Correct the bit mask to properly set the binding policy
This commit was SVN r21934.
2009-09-03 17:58:23 +00:00
Ralph Castain
0421a49844 Update the xml support to allow -xml-file foo whereby we redirect all xml formatted output (and ONLY xml formatted output) to a specified file
This commit was SVN r21930.
2009-09-02 18:03:10 +00:00
Ralph Castain
12d6595c41 Do not deprecate the rankfile mca param.
This commit was SVN r21900.
2009-08-27 10:40:51 +00:00
Ralph Castain
5e710928a5 Revise the new binding system slightly:
1. finalize the logic for properly respecting externally assigned bindings. Thanks to Chris Samuel for his help with this. Still needs some acid testing, but appears to now work.

2. remove the double-logic of requiring opal_paffinity_alone AND bind-to-foo. If the user specifies bind-to-foo, trust her and just do it.

This commit was SVN r21885.
2009-08-26 02:01:49 +00:00
Ralph Castain
509cc0553c When directly launched by an RM, flag that a process is operating without daemons - i.e., standalone. Provide an error string for the new socket_not_available error. Use errmgr.abort to exit when we cannot get a socket, and ensure that the slurmd module returns the proper exit status for slurm 2.0
This commit was SVN r21868.
2009-08-22 02:58:20 +00:00
Ralph Castain
7183179f56 Provide native integration with SLURM 2.0's OMPI support
This commit was SVN r21865.
2009-08-21 18:03:34 +00:00
Rainer Keller
9a0b6ef71e - For now fixup our headers for known issue of Sun Studio 12.1 compiler
"...", line XXX: warning: linker scope was specified more than once:

This commit was SVN r21853.
2009-08-20 11:12:45 +00:00
Ralph Castain
c3c642aa0d Add two new frameworks for sensing and predicting faults. This is just the bare-bones plumbing for now - will instantiate soon.
No ess modules reference these frameworks yet, so they are completely inactive.

This commit was SVN r21847.
2009-08-20 04:27:16 +00:00
Ralph Castain
0005e6e834 Correct a couple of bugs in the rank_file mapper that were incorrectly assigning vpids.
Add a capability to parse the rankfile to extract node information in place of requiring both hostfile and rankfile for non-RM managed environments. The rankfile is -only- parsed for this IF the hostfile and -host options are not given. Otherwise, those are used to establish allocation info as we did before this commit.

This commit was SVN r21815.
2009-08-13 16:08:43 +00:00
Ralph Castain
1dc12046f1 Modify the OMPI paffinity and mapping system to support socket-level mapping and binding. Mostly refactors existing code, with modifications to the odls_default module to support the new capabilities.
Adds several new mpirun options:

* -bysocket - assign ranks on a node by socket. Effectively load balances the procs assigned to a node across the available sockets. Note that ranks can still be bound to a specific core within the socket, or to the entire socket - the mapping is independent of the binding.

* -bind-to-socket - bind each rank to all the cores on the socket to which they are assigned.

* -bind-to-core - currently the default behavior (maintained from prior default)

* -npersocket N - launch N procs for every socket on a node. Note that this implies we know how many sockets are on a node. Mpirun will determine its local values. These can be overridden by provided values, either via MCA param or in a hostfile

Similar features/options are provided at the board level for multi-board nodes.

Documentation to follow...

This commit was SVN r21791.
2009-08-11 02:51:27 +00:00
Ralph Castain
e75d9b8296 Use orte_notifier to alert sys admins to checksum violations in the csum pml.
Add ability to store the RM's jobid string to tag the notifier message so that the sys admin knows what job had the problem.

This commit was SVN r21687.
2009-07-15 19:43:26 +00:00
George Bosilca
d66632fdc9 Reorder the nidmap encoding function. Add a check to make sure we don't write
outside the boundaries of the allocated array.

However, the problem is still there. If we have rmaps file containing only
partial information the num_procs get set to the wrong value (the number of
hosts in the rmaps file instead of the number of processes requested on the
command line).

This commit was SVN r21686.
2009-07-15 19:36:53 +00:00
Ralph Castain
90a2db25e9 Modify the errmgr callback function so it passes the proc that failed instead of only the jobid.
Update the cm routed module to detect and pass orted failures.

This commit was SVN r21682.
2009-07-15 11:43:33 +00:00
Josh Hursey
dcedc99a6a It seems that we do no really want to set this particular variable to NULL. (causes badness in ess)
This commit was SVN r21673.
2009-07-14 19:01:43 +00:00
Josh Hursey
b03923cd72 Make sure to set these values to NULL, so that if we release an object (knowingly or not) twice then we do not fault by referencing unallocated memory.
This commit was SVN r21672.
2009-07-14 18:56:49 +00:00
Ralph Castain
dbac602be5 Add support for the add-host and add-hostfile MPI Info keys to allow Comm_spawn users to add new hosts to those already known by mpirun.
Requires full testing once comm_spawn is fixed (Edgar is working that now).

This commit was SVN r21664.
2009-07-14 14:34:11 +00:00
Ralph Castain
60edbc7220 Fix hetero operations and comm_spawn (to a point).
Remove all architecture references from ORTE and put them back in the modex using modex_send/recv calls.

Hetero operations are now fully supported again. Comm_spawn now works up to the point where it segfaults due to an error in the CID code - which now allows Edgar to dig further! :-)

This commit was SVN r21655.
2009-07-13 20:03:41 +00:00
Ralph Castain
235db33e83 Some more pointer array addressing cleanup
This commit was SVN r21644.
2009-07-13 14:49:20 +00:00
Ralph Castain
b96a71b62e Enable restart of individual processes upon command via the errmgr callback function. It needs an external application to drive this capability, so normal operations shouldn't be affected.
Does not support MPI applications. More work coming to update daemon accounting on movement of procs across nodes.

This commit was SVN r21545.
2009-06-26 20:54:58 +00:00
Ralph Castain
0ba845fed2 Continue development of regular expression support by implementing it for slurm launches. Works for both initial (cmd line and non-cmd line) and comm_spawn launch.
Additional work required to fully enable static port support when using cmd line regular expression launch system.

This commit was SVN r21502.
2009-06-23 20:25:38 +00:00
Ralph Castain
771ce035a5 Complete implementation of regular expression generator and parser - now handles leading zero's and suffix in node names.
This commit was SVN r21468.
2009-06-18 04:36:00 +00:00
Ralph Castain
8db7a9f9a7 Add regular expression generator to encode complete nid/pid maps - decoder to come.
This commit was SVN r21455.
2009-06-17 02:54:20 +00:00
Ralph Castain
44bb265a52 Add a new MPI_Info key to preposition OMPI libraries - implementation underway, but this just defines and passes the new key
This commit was SVN r21425.
2009-06-12 17:53:13 +00:00
Ralph Castain
70b8c89b44 Fix slave spawn, which was hanging because the local daemon never saw the slave job report - it doesn't do it in the normal way, and so the slave launch system itself has to "fake it".
Also complete implementation to printout app_context objects so we see all the fields.

This commit was SVN r21408.
2009-06-10 19:01:08 +00:00
Ralph Castain
0336460b0a Continue implementation of resilient operations by supporting reuse of jobids for restarted procs. Ensure that restarted processes have valid node and local ranks, and that node rank values are passed to direct-launched processes.
This commit was SVN r21385.
2009-06-06 01:08:47 +00:00
Ralph Castain
30a357bd8d Provide a "progress meter" for launch that outputs progress as we are launching, especially on large jobs. Also, provide a timeout mechanism so that we cleanly abort if we don't get a response from the next daemon in a specified time.
This commit was SVN r21359.
2009-06-02 23:52:02 +00:00
Ralph Castain
4e0223a638 Add the ability to directly launch procs via rsh/ssh. Collect common functions in plm/base. Create a new global param to set assume_same_shell, alias'd back to plm_rsh_assume_same_shell (not deprecated).
This commit was SVN r21328.
2009-05-30 01:10:25 +00:00
Jeff Squyres
1834fc4ac6 Nysal noticed some repeated header files; removed.
This commit was SVN r21310.
2009-05-28 12:05:42 +00:00
Ralph Castain
f139cfd28a Fully enable the use of static ports to minimize connections on mpirun. When static ports are provided, daemons will automatically use routes defined by the selected routed module to callback to mpirun during startup, thus elimating the dedicated daemon-to-mpirun connection. Therefore, the total number of connections on mpirun will equal the fanout of the routed module (instead of #nodes in job).
Add a new tm ess module that exploits this capability.

Update the various plm modules to enable it - just a minor change reflecting an added param to a plm base function.

Additional fixes included:

1. remove an erroneous cleanup of session directories in the tool finalize procedure - tools don't create session directories to begin with!

2. fix a duplicate free when attempting to execute a non-existent app

3. cleanup an typo in the comm utilities 

4. fix comm_spawn - was perturbed by the changes in pack/unpack of orte_job_t to properly support orte-ps

Been tested on slurm and tm machines, using all tests in orte/test/mpi. May run into issue with command line length on large jobs due to inclusion of node info to support static ports - will fix this next with addition of regexp generator to compress that info.

This commit was SVN r21248.
2009-05-16 04:15:55 +00:00
Ralph Castain
484a6f58f2 Repair orte-ps by updating some of the interface code. Add ability to recover from attempting to contact non-responsive HNPs due to stale session directories. Implement the -j option. Turn "off" the -p option as it doesn't work and will take a little while to actually implement it (if anyone really cares).
This commit was SVN r21245.
2009-05-15 13:21:18 +00:00
Josh Hursey
76ecb7d2fe Make sure to initialize orte_process_info.proc_type properly on restart. Otherwise the application will have type 'NONE' instead of 'APP'.
This commit was SVN r21217.
2009-05-12 14:14:05 +00:00
Ralph Castain
c6f0499720 Some cleanups required from last night's commits to resolve some race conditions and ensure we cleanup properly. Also, remove some debug output that was unintentionally left "on" by default.
This commit was SVN r21202.
2009-05-11 14:03:07 +00:00
Ralph Castain
10a694ea43 The current errmgr.register_callback API takes a jobid as one of its argument. The intent was to have the errmgr check the jobid of the job being reported to it and, if it matches the jobid that was registered, call the specified callback function.
Unfortunately, we assign the jobid during the plm.spawn procedure - which means it happens -after- control of the job has passed out of the range of mpirun (or whatever program is spawning the job), so it is too late for that main program to register a callback function. If the main program registers tha callback -after- we return from plm.spawn, then it (a) cannot get a callback for failed-to-start, and (b) will miss the callback if a proc aborts in the time between job launch and the call to errmgr.register_callback.

This commit fixes the problem by adding callback-related fields to the orte_job_t object. Thus, the main program can specify what job states should initiate a callback, what function is to be called, and what data is to be passed back by simply filling in the orte_job_t fields prior to calling plm.spawn.

Also, fully implement the "copy" function for the orte_job_t object.

NOTE: as a result of this change, the errmgr.register_callback API may no longer be of any value.

This commit was SVN r21200.
2009-05-11 03:38:15 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Ralph Castain
4be24521aa Modify the orte_process_info structure to handle a broader range of process types by replacing the individual booleans with a 32-bit bitmap. Use a set of #define's to define the individual bits, and a set of matching macros to test for them. Update the orte code base to use the macros instead of the booleans.
Minor mod to the ompi layer to use the new #define's - just one-line name replacements.

This commit was SVN r21144.
2009-05-04 11:07:40 +00:00
Ralph Castain
5fa3b38d3c Revert r21097 as this results in multiple instantiations of global variables. Instead, fix the problem by including orte_globals.h in the orte_init.c.
Since I already had some changes in there, add in the rmaps rank_file changes - should work okay, but not fully tested.

This commit was SVN r21099.

The following SVN revision numbers were found above:
  r21097 --> open-mpi/ompi@88ae934c26
2009-04-29 02:13:14 +00:00
Rainer Keller
88ae934c26 - So, this ain't right, but somewhere the orte-tools
do not detect the orte_name_wildcard and orte_name_invalid.

This commit was SVN r21097.
2009-04-29 01:44:43 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Rainer Keller
6c1cce8761 - For the upcoming header cleanup commit,
several header files (previously included by header-files)
   now have to be moved "upward".
   This is mainly system headers such as string.h, stdio.h and for
   networking, but also some orte headers.

This commit was SVN r21095.
2009-04-29 00:49:23 +00:00
Ralph Castain
9f7c605166 More cleanup of pointer array usage
This commit was SVN r20981.
2009-04-13 19:06:54 +00:00
Brian Barrett
7d0c6b68dc allow trunk to compile on red storm
This commit was SVN r20960.
2009-04-08 20:53:54 +00:00
Ralph Castain
d0b50a2b9b Eliminate an annoying "not found" message when a job abnormally terminates. In this case, we can get a race condition where the job object has been removed, but updates are continuing to flow into the system.
This commit was SVN r20857.
2009-03-24 18:06:49 +00:00