b9893aacc5
1. File activity - monitors file size, access time, and modification time. If these fail to change over a specified number of sampling iterations (the rate is an MCA parameter), the errmgr is notified. 2. Memory usage - checks the amount of memory used by a process. The limit and sampling rate can be set. This support must be enabled by configuring with --enable-sensors. ompi_info and orte-info have been updated to include the new framework. Also includes some initial steps toward restoring the recovery capability. Most notably, the ODLS API has been extended with a "restart_proc" entry for restarting a local process, and the various ERRMGR framework globals are now organized into a single struct, as in the other ORTE frameworks. Also fixes an oversight in the ERRMGR framework where a pointer array was constructed but not initialized. Implementation continues. This commit was SVN r23043. |
||
---|---|---|
.. | ||
config_files.m4 | ||
orte_check_alps.m4 | ||
orte_check_bproc.m4 | ||
orte_check_loadleveler.m4 | ||
orte_check_lsf.m4 | ||
orte_check_sge.m4 | ||
orte_check_slurm.m4 | ||
orte_check_tm.m4 | ||
orte_check_xgrid.m4 | ||
orte_configure_options.m4 | ||
orte_setup_debugger_flags.m4 | ||
orte_setup_wrappers.m4 |
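
The file-activity check described in the commit message can be illustrated with a small standalone sketch. This is not the ORTE sensor code (which is written in C and is not shown on this page). The class name `FileActivitySensor`, the `limit` field, and the `sample()` method are hypothetical names chosen for illustration. The sketch assumes "no activity" means the selected `stat` fields are identical across `limit` consecutive samples, at which point the caller would notify its error manager.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileActivitySensor:
    """Illustrative only: flags a file whose stat fields stop changing."""

    path: str
    limit: int                 # hypothetical: unchanged samples tolerated
    check_size: bool = True
    check_access: bool = True
    check_mod: bool = True
    _last: Optional[tuple] = None
    _unchanged: int = 0

    def _signature(self) -> tuple:
        # Collect only the fields the caller asked to monitor.
        st = os.stat(self.path)
        return (
            st.st_size if self.check_size else None,
            st.st_atime_ns if self.check_access else None,
            st.st_mtime_ns if self.check_mod else None,
        )

    def sample(self) -> bool:
        """Take one sample; return True once the file looks stalled."""
        sig = self._signature()
        if sig == self._last:
            self._unchanged += 1
        else:
            # Any change in a monitored field resets the counter.
            self._unchanged = 0
            self._last = sig
        return self._unchanged >= self.limit
```

A caller would invoke `sample()` once per sampling interval and treat a `True` result as the point at which to raise the notification. Keeping the counter inside the sensor means the sampling loop itself stays stateless.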