* Fix some missing includes in a few places.
* Add the cr_request() functionality to the BLCR CRS component.
We are now dependent upon the 0.6.* series of BLCR.
* Made the CR notification mechanism a registered function.
This way we can have an OPAL-only version and it can be replaced at
runtime with the ORTE version.
* Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
CR functionality when the user wants it. Default: Disabled.
* Fix the placement of a checkpoint request check in MPI_Init
* Pull the OPAL notification mechanism into the SnapC framework.
* We no longer fork/exec the 'opal-checkpoint' command for local
checkpointing, the Local coordinator in the orted does this directly.
* The Local and Application coordinator talk together bypassing the OPAL
notifiation mechanism.
* Optimized the Local <-> App Coordinator communication.
* Improved the structure used to track vpid_snapshots in the local coord.
* Fix a race condition in which an application under heavy communication load
may produce an inconsistent global checkpoint.
This commit was SVN r16389.
and implementation. This has shown drastic performance benefit when
transferring Many files at roughly the same time.
I tested this for many different filem operations and everything was working
fine. Let me know if you have any problems with this functionality.
Some Notes:
- opal-checkpoint now has a 'quiet' flag to keep it from being too verbose.
- FileM RSH component is fully non-blocking.
- FileM RSH component has incomming connection throttling since by default
ssh only allows 10 concurrent scp connections to any single host. This
default can be adjusted via an MCA parameter.
{{{-mca filem_rsh_max_incomming 10}}}
- There is an MCA parameter for max outgoing connections, but it is currently
not implemented. If someone needs it then it should not be hard to implement.
{{{-mca filem_rsh_max_outgoing 10}}}
- Changed the FileM request structure so that it is a bit more explicit and
flexible.
- Moved the 'preload-binary' and 'preload-files' functionality into odls/base
allowing for code reuse in the 'process' and 'default' ODLS components.
- Fixed a bug in the process name resolution which broke the 'preload-*'
functionality due to GPR table structure changes.
- The FileM RSH component might be able to see even more speedup from using a
thread pool to operate on the work_pool structures, but that is for future
work.
- Added a 'opal-show-help' file to ODLS Base
This commit was SVN r16252.
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.
Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.
This commit was SVN r15517.
The SnapC Full local Coordinator used this argument to attach to the job the
daemon would be launching. So once this option was removed C/R support broke.
This commit has the local coordinator attach to the job just before it is
launched by the ODLS module. This is a much cleaner solution, and will
eventually allow the SnapC modules to attach to multiple jobs launched
on a single machine.
This commit fixes the C/R regression introduced in r15007.
This commit was SVN r15121.
The following SVN revision numbers were found above:
r15007 --> open-mpi/ompi@85df3bd92f
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158