1
1
Граф коммитов

9 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
9febaa475e * Add shell of functionality required for supporting Portals4
* Update places where orte-free builds have failed

This commit was SVN r23891.
2010-10-14 22:49:09 +00:00
Ralph Castain
2ff1ae13e1 Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set.
This commit was SVN r23102.
2010-05-05 00:48:43 +00:00
Ralph Castain
b9893aacc5 Add a sensor framework to ORTE that monitors applications and notifies the errmgr when they exceed specified boundaries. Two modules are included here:
1. file activity - can monitor file size, access and modification times. If these fail to change over a specified number of sampling iterations (rate is an mca param), then the errmgr is notified.

2. memory usage - checks amount of memory used by a process. Limit and sampling rate can be set.

This support must be enabled by configuring --enable-sensors.

ompi_info and orte-info have been updated to include the new framework.

Also includes some initial steps toward restoring the recovery capability. Most notably, the ODLS API has been extended to include a "restart_proc" entry for restarting a local process, and organizes the various ERRMGR framework globals into a single struct as we do in the other ORTE frameworks. Fix an oversight in the ERRMGR framework where a pointer array was constructed, but not initialized.

Implementation continues.

This commit was SVN r23043.
2010-04-26 22:15:57 +00:00
Ralph Castain
18c7aaff08 Update the grpcomm framework to be more thread-friendly.
Modify the orte configure options to specify --enable-multicast such that it directs components to build or not instead of littering the code base with #if's. Remove those #if's where they used to occur.

Add a new grpcomm "mcast" module to support multicast operations. Still some work required to properly perform daemon collectives for comm_spawn operations. New module only builds when --enable-multicast is provided, and when specifically selected.

This commit was SVN r22709.
2010-02-25 01:11:29 +00:00
Ralph Castain
f66b6cae23 Enable the boot of an orted "virtual machine". Modify the mapper framework to allow mapping of only daemons. Remove the cm ras module as no longer required. Modify the orted code to always send back node arch info. Remove the "--enable-bootstrap" configure option as this feature will now always be available.
This commit was SVN r22480.
2010-01-25 22:25:13 +00:00
Brian Barrett
86d8356b13 Updates to allow OMPI to build on Cray XT platforms running Catamount
This commit was SVN r22381.
2010-01-07 18:14:03 +00:00
Ralph Castain
a0d5c80ce0 Add a new framework for discovering local resource information such as cpu type/model, #cpus, available physical memory, etc. Two initial components (darwin and linux) are provided. This is needed to support bootstrap operations where daemons are started at node boot, and applications where initial knowledge of cpu identification is needed to guide framework component selection.
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.

If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.

Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).

Adjust some platform files to enable these capabilities.

This commit was SVN r22244.
2009-11-30 23:11:25 +00:00
Ralph Castain
e38a0eab9f Remove the fddp and sensor frameworks - relocated to new cluster mgr project
This commit was SVN r22240.
2009-11-27 22:14:47 +00:00
Ralph Castain
214e26b539 Per Jeff (this work was done on a branch of mine, so I will do the commit):
Re-enable "./autogen.sh -no-ompi" again. If you -no-ompi, the entire OMPI
configury is skipped and the entire ompi/ subtree is not built. There's
some simple m4-isms that prune out the relevant parts.

I added ompi/config/, orte/config/, and opal/config/ directories. I moved a
bunch of m4 files from the top-level config/ dir into ompi/config/, and a few
into orte/config/.

Note that all 3 <project>/config directories have a config_files.m4 file. This
file contains the AC_CONFIG_FILES list for that project. The AC_CONFIG_FILES
call cannot be in an AC_DEFUN macro and conditionally called -- if it is
included at all, Autoconf will process it. Hence, these config_files.m4 files
don't AC_DEFUN -- they just have AC_CONFIG_FILES. m4_ifdef() is used to
conditionally include the files or not.

I moved a bunch of obvious OMPI-only m4 files from config/ to ompi/config/,
but I'm sure that there's more that could go. A ticket will be filed with
thoughts on future work in this area.

This commit was SVN r22113.
2009-10-20 23:44:20 +00:00