(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
back the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
This commit was SVN r23162.
1. file activity - can monitor file size, access and modification times. If these fail to change over a specified number of sampling iterations (rate is an mca param), then the errmgr is notified.
2. memory usage - checks amount of memory used by a process. Limit and sampling rate can be set.
This support must be enabled by configuring --enable-sensors.
ompi_info and orte-info have been updated to include the new framework.
Also includes some initial steps toward restoring the recovery capability. Most notably, the ODLS API has been extended to include a "restart_proc" entry for restarting a local process, and organizes the various ERRMGR framework globals into a single struct as we do in the other ORTE frameworks. Fix an oversight in the ERRMGR framework where a pointer array was constructed, but not initialized.
Implementation continues.
This commit was SVN r23043.
Update ompi_info and orte-info to include the new framework.
Fix some selection logic and a typo'd variable name
Still remains ompi_ignored until we complete testing
This commit was SVN r22848.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.
The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.
This commit was SVN r22824.
Modify the orte configure options to specify --enable-multicast such that it directs components to build or not instead of littering the code base with #if's. Remove those #if's where they used to occur.
Add a new grpcomm "mcast" module to support multicast operations. Still some work required to properly perform daemon collectives for comm_spawn operations. New module only builds when --enable-multicast is provided, and when specifically selected.
This commit was SVN r22709.
as simple as I or Ralph had hoped. This should be the real fix,
or very close to it. I can now see both the sensor and rmcast
information from ompi_info when configured
with --enable-monitoring --enable_multicast
This commit was SVN r22131.
The following SVN revision numbers were found above:
r22129 --> open-mpi/ompi@02ff00dfb5