This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
builds, so disable it there
* On 10.4.8 (and possibly others), siginfo is NULL in the signal
callback on 64-bit Intel builds, so account for that in the signal
callback.
This commit was SVN r14045.
In that case, sendcount and sendtype are not valid and we need to use
recvcount and recvtype.
This commit fixes trac:943. Reviewed by Jelena Pjesivac-Grbovic.
This commit was SVN r14022.
The following Trac tickets were found above:
Ticket 943 --> https://svn.open-mpi.org/trac/ompi/ticket/943
Queue_empty is determined by the reader and reflects only its local view.
However, the writer may continue writing to this queue. The decision
to go on to the next cb_fifo is made in an atomic region, checking the
writer's view. The writer also "changes its view" in an atomic
region protected by the same lock.
This commit was SVN r13968.
test, Sun's darray test, and an internal LANL test code. I would not
assume it will work properly on other codes, as I'm still not sure I
completely understand what the standard says this function is supposed to
do.
Refs trac:65
This commit was SVN r13967.
The following Trac tickets were found above:
Ticket 65 --> https://svn.open-mpi.org/trac/ompi/ticket/65
- fixing line lengths and some of the comments
- possible bug fix (but I do not think we exposed it in any tests so far):
temporary buffers were allocated as multiples of extent instead of
true_extent + (count - 1) * extent.
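
To illustrate the buffer-size fix, here is a minimal sketch of the two
computations. The helper names are hypothetical and are not taken from the
Open MPI sources; the sketch assumes non-negative extents and ignores the
true lower bound offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the old sizing, a plain multiple of the extent. */
static size_t naive_temp_buffer_size(size_t count, ptrdiff_t extent)
{
    assert(extent >= 0);
    return count * (size_t)extent;
}

/* Hypothetical helper: the corrected sizing. The first (count - 1)
 * elements each advance by one extent; the last element needs its
 * full true extent, which may be larger than the extent. */
static size_t temp_buffer_size(size_t count, ptrdiff_t extent,
                               ptrdiff_t true_extent)
{
    assert(extent >= 0 && true_extent >= 0);
    if (count == 0) {
        return 0;
    }
    return (size_t)true_extent + (count - 1) * (size_t)extent;
}
```

For a type whose true extent (12 bytes) exceeds its extent (8 bytes), three
elements need 12 + 2 * 8 = 28 bytes, while the naive computation yields only
24.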
Everything is still passing Intel tests over tcp and btl mx up to 64 nodes.
This commit was SVN r13956.
* Do not empty the list of in-flight frags during _close(); the OOB
callback will still occur (_send_cb()) and try to remove the frag
from the list, which will then result in an assert failure (debug
builds).
* Add one more fix for a possible problem -- add an extra RETAIN /
RELEASE pair on the endpoint to ensure that it is not actually
freed before all in-flight frags have drained.
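
The interaction between _close(), the send callback, and the endpoint
lifetime can be sketched as below. This is only an illustration under
assumptions: the struct, its fields, and the function names are
hypothetical, and the sketch takes one reference per in-flight frag, which
is one possible way to place the RETAIN / RELEASE pair, not necessarily
where the actual code places it.

```c
#include <assert.h>

/* Hypothetical endpoint model: a reference count plus a counter that
 * stands in for the list of in-flight frags. */
typedef struct {
    int refcount;   /* starts at 1 for the owner's reference */
    int in_flight;  /* number of frags whose callback has not fired */
    int freed;      /* set when the last reference is dropped */
} endpoint_t;

static void endpoint_retain(endpoint_t *ep)
{
    ep->refcount++;
}

static void endpoint_release(endpoint_t *ep)
{
    assert(ep->refcount > 0);
    if (--ep->refcount == 0) {
        ep->freed = 1;  /* a real implementation would free here */
    }
}

/* Starting a send pins the endpoint until the callback has run. */
static void endpoint_send(endpoint_t *ep)
{
    endpoint_retain(ep);
    ep->in_flight++;
}

/* The send callback removes the frag and drops the matching reference. */
static void endpoint_send_cb(endpoint_t *ep)
{
    assert(ep->in_flight > 0);
    ep->in_flight--;
    endpoint_release(ep);
}

/* Close drops only the owner's reference; the in-flight frags are left
 * alone so that their callbacks can still find and remove them. */
static void endpoint_close(endpoint_t *ep)
{
    endpoint_release(ep);
}
```

With two sends outstanding, calling endpoint_close() leaves the endpoint
alive; it is marked freed only after the second callback has run.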
This commit was SVN r13953.
The following Trac tickets were found above:
Ticket 921 --> https://svn.open-mpi.org/trac/ompi/ticket/921
The original code was not compensating for the space used by the header.
When memory got tight, the allocator would return a pointer to memory that
did not exist, resulting in a SEGV for the application. This is a partial
fix for ticket #929.
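
As an illustration of the class of bug described above, here is a minimal
sketch of a bump allocator that accounts for the per-chunk header when
checking the remaining space. The pool layout and all names are hypothetical
and are not the actual allocator; alignment is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-chunk header stored in front of every payload. */
typedef struct {
    size_t payload_size;
} chunk_header_t;

/* Hypothetical fixed-size pool handed out front to back. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} pool_t;

static void *pool_alloc(pool_t *pool, size_t size)
{
    assert(pool != NULL && pool->used <= pool->capacity);

    size_t remaining = pool->capacity - pool->used;

    /* The request must fit together with its header. Comparing only
     * `size` against `remaining` is the kind of check that hands out
     * memory past the end of the pool when space gets tight. */
    if (remaining < sizeof(chunk_header_t) ||
        size > remaining - sizeof(chunk_header_t)) {
        return NULL;
    }

    chunk_header_t hdr = { size };
    unsigned char *chunk = pool->base + pool->used;
    memcpy(chunk, &hdr, sizeof hdr);  /* memcpy avoids alignment issues */
    pool->used += sizeof hdr + size;
    return chunk + sizeof hdr;
}
```

With 8 bytes left in the pool, a request for 8 bytes is refused because the
header would not fit, whereas a check that ignores the header would accept
it.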
Reviewed by Rich Graham.
This commit was SVN r13950.
create a duplicate type, because any duplicate type loses the PREDEFINED flag.
An MPI_LB (respectively MPI_UB) without the PREDEFINED flag is useless, as it is
no longer a marker. The solution is to return the same pointer, but only after
the reference count has been increased. In order for this to work, I allowed
the destruction to check the reference count of an object before complaining
about destroying a predefined type.
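
A minimal sketch of this approach, assuming a simplified datatype struct.
The names, the flag value, and the return codes are hypothetical and do not
mirror the real datatype engine.

```c
#include <assert.h>
#include <stddef.h>

#define DT_FLAG_PREDEFINED 0x1u  /* hypothetical flag value */

enum { DT_OK = 0, DT_ERR_PREDEFINED = -1 };

/* Hypothetical, stripped-down datatype descriptor. */
typedef struct {
    int refcount;
    unsigned flags;
} dtype_t;

/* "Duplicating" a marker returns the same object with one more
 * reference, so the PREDEFINED flag is preserved. */
static dtype_t *dtype_dup_marker(dtype_t *marker)
{
    assert(marker != NULL && (marker->flags & DT_FLAG_PREDEFINED));
    marker->refcount++;
    return marker;
}

/* Destruction looks at the reference count first and only complains
 * when the last reference to a predefined type would be dropped. */
static int dtype_destroy(dtype_t *type)
{
    assert(type != NULL && type->refcount > 0);
    if ((type->flags & DT_FLAG_PREDEFINED) && type->refcount == 1) {
        return DT_ERR_PREDEFINED;
    }
    type->refcount--;
    return DT_OK;
}
```

Destroying the "duplicate" simply drops the extra reference; a further
destroy on the marker itself is still rejected.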
This fixed ticket #317.
This commit was SVN r13942.
This saves some memory for the constructor and destructor arrays of a
class by counting the constructors and destructors while we are counting
the cls_depth. The reversal of the constructor array can now be done
without an extra loop.
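
The idea can be sketched as follows. This is an illustration under
assumptions, not the actual class system code: the struct layout, field
names, and example hooks are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*hook_fn)(void *);

/* Hypothetical class descriptor with optional constructor/destructor. */
typedef struct class_desc {
    struct class_desc *parent;
    hook_fn construct;          /* may be NULL */
    hook_fn destruct;           /* may be NULL */
    int depth;
    hook_fn *construct_array;   /* NULL-terminated, base class first */
    hook_fn *destruct_array;    /* NULL-terminated, derived class first */
} class_desc_t;

/* Example hooks, only here so the arrays have something to hold. */
static void base_construct(void *obj) { (void)obj; }
static void derived_construct(void *obj) { (void)obj; }
static void derived_destruct(void *obj) { (void)obj; }

static int class_initialize(class_desc_t *cls)
{
    int n_ctor = 0, n_dtor = 0;

    /* First walk: count the depth and the hooks that actually exist. */
    cls->depth = 0;
    for (class_desc_t *c = cls; c != NULL; c = c->parent) {
        if (c->construct != NULL) n_ctor++;
        if (c->destruct != NULL) n_dtor++;
        cls->depth++;
    }

    /* One allocation sized by the counts, plus two terminators. */
    hook_fn *arr = malloc((size_t)(n_ctor + n_dtor + 2) * sizeof *arr);
    if (arr == NULL) {
        return -1;
    }
    cls->construct_array = arr;
    cls->destruct_array = arr + n_ctor + 1;

    /* Second walk: constructors are written back to front, so they end
     * up base-first without a separate reversal pass. */
    hook_fn *ctor_slot = arr + n_ctor;
    hook_fn *dtor_slot = cls->destruct_array;
    *ctor_slot = NULL;
    for (class_desc_t *c = cls; c != NULL; c = c->parent) {
        if (c->construct != NULL) *--ctor_slot = c->construct;
        if (c->destruct != NULL) *dtor_slot++ = c->destruct;
    }
    *dtor_slot = NULL;
    return 0;
}
```

Sizing the arrays by the number of hooks rather than by the class depth is
where the memory saving comes from: a class without a destructor takes no
slot in the destructor array.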
This commit was SVN r13939.
Currently 3 algorithms are available:
- non-overlapping, reduce + scatterv, (works for non-commutative operations)
- recursive halving algorithm (copied from basic module)
- ring algorithm (similar to allreduce ring, for large messages)
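
For reference, the semantics of the first variant (reduce followed by
scatterv) can be modelled serially as below. This is a single-process sketch
with an integer sum, not MPI code and not the tuned implementation; the
function name and the flat buffer layout are assumptions made for the
example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Serial model of reduce + scatterv. sendbufs holds nranks rows of
 * `total` ints (row r is rank r's send buffer); recvbufs[r] receives
 * recvcounts[r] reduced elements. Returns 0 on success, -1 on error. */
static int reduce_scatter_model(int nranks, int total, const int *sendbufs,
                                const int *recvcounts, int *const *recvbufs)
{
    int sum = 0;
    if (nranks <= 0 || total < 0) {
        return -1;
    }
    for (int r = 0; r < nranks; r++) {
        if (recvcounts[r] < 0) return -1;
        sum += recvcounts[r];
    }
    if (sum != total) {
        return -1;
    }
    if (total == 0) {
        return 0;
    }

    /* Step 1: reduce every rank's full vector into one buffer. The
     * operands are combined in rank order, which is what makes this
     * scheme usable for non-commutative operations as well. */
    int *reduced = calloc((size_t)total, sizeof *reduced);
    if (reduced == NULL) {
        return -1;
    }
    for (int r = 0; r < nranks; r++) {
        for (int i = 0; i < total; i++) {
            reduced[i] += sendbufs[(size_t)r * (size_t)total + (size_t)i];
        }
    }

    /* Step 2: scatterv, i.e. hand each rank its contiguous segment. */
    int offset = 0;
    for (int r = 0; r < nranks; r++) {
        memcpy(recvbufs[r], reduced + offset,
               (size_t)recvcounts[r] * sizeof *reduced);
        offset += recvcounts[r];
    }

    free(reduced);
    return 0;
}
```

With two ranks sending {1, 2, 3} and {10, 20, 30} and receive counts {1, 2},
rank 0 ends up with {11} and rank 1 with {22, 33}.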
This commit was SVN r13929.