There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd *really* have to try).
This also involved a slight change to the oob.xcast API, so propagated that as required.
Note: this has *only* been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-)
Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately.
This commit was SVN r14475.
- Move the PML Modex stuff out of the BML -- Abstraction violation.
- Also fix the location of the add_procs with respect to the stage gates.
This commit was SVN r14422.
Fix for memory corruption in the restarted process stack. This stemed from
the brute force method we were previously using. This commit fixes this by
using a lighter weight solution focused in the r2 BML instead of above the PML.
This is a more efficient and flexible solution, and it solves the original
problem.
In the process I pulled out the ft_event function in the tcp BTL and r2 BML
into a set of *_ft.[c|h] files just to keep any updates to these code paths
as isolated as possible to make merging easier on everyone.
This commit was SVN r14371.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855
The following Trac tickets were found above:
Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977
Remove a redundant statement in the r2 BML.
This commit was SVN r14228.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855
- Remove an old comment from crcp_base_fns.c
- Let ob1 have its very own ft_event function (which I'll fill in shortly)
- Make sure ob1 finalizes the bsend stuff so we don't leave a bunch of memory sitting around
- PML base - destruct the array upon finalize. Shrink the include search so it stops after finding a match
This commit was SVN r14222.
if less than or equal pml_ob1_unexpected_limit just buffer in the PML level recv
fragment else allocate a buffer via the bucket allocator
This commit was SVN r14117.
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
adding new procs that the remote proc's pml is the same as our local pml.
Turns the hangs from mismatched PMLs into an abort, which is better,
I think.
This commit was SVN r13582.
* Make sure that the pval always writes to the correct portion of the
lval. This only matters on 32 bit big endian machines.
* On 32 bit machines when assigning to pval, the other 4 bytes of lval
weren't being written, which could lead to bogus data
We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes. Refs trac:587
This commit was SVN r12974.
The following Trac tickets were found above:
Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
r12714) for supporting compilers / architectures with different
padding rules.
This commit was SVN r12749.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r12491
r12714
hits the buffer on the other side. For this kind of BTLs we need to send
FIN through the same BTL, PUT was performed with so network will handle
ordering for us. If we will use another BTL, receiver can get FIN before
data will hit the buffer and complete request prematurely. We mark such
problematic BTLs with MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA
is really fake, because the real one guaranties that sender will see the
completion only after receiver's NIC confirmed that all the data was
received).
This commit was SVN r12732.
parameter. For optimisation purpose only this BTL is used to send packet
through instead of trying to send packets through all BTLs. But actually the
code was wrong. It simply used provided bml_btl and it may represent different
endpoint from packet's destination. The fixed code checks if packet's
destination is reachable through the BTL, finds appropriate bml_btl and only
then tries to send it through correct bml_btl.
This commit was SVN r12319.
all platforms. The only exceptions (and I will not deal with them
anytime soon) are on Windows:
- the write functions which require the length to be an int when it's
a size_t on all UNIX variants.
- all iovec manipulation functions where the iov_len is again an int
when it's a size_t on most of the UNIXes.
As these only happens on Windows, so I think we're set for now :)
This commit was SVN r12215.
long ago) supposed to be used as a cache for accessing the PML procs. But in
all of the PMLs the PML proc contain only one field i.e. a pointer to the ompi_proc.
This pointer can be accessed using the c_remote_group easily. Therefore, there is no
meaning of keeping the PML procs around. Slim fast commit ...
This commit was SVN r11730.
interconnects that provide matching logic in the library.
Currently includes support for MX and some support for
Portals
* Fix overuse of proc_pml pointer on the ompi_proc structuer,
splitting into proc_pml for pml data and proc_bml for
the BML endpoint data
* bug fixes in bsend init code, which wasn't being used by
the OB1 or DR PMLs...
This commit was SVN r10642.
flag, new flags to be included when convertor is initialized
- modified pml/btl module defs and added stub functions for diagnostic
output routines to dump state of queues / endpoints
- updates to data reliability pml
This commit was SVN r9329.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
the values to the PML structure. This will allow PMLs that want to do
hardware matching at the cost of a smaller range of valid tags and cids.
Updated all the places that used the MPI_TAG_UB_VALUE constant to instead
look at the pml struct.
This commit was SVN r6778.
Change all the places where they are used to fit the new name.
Remove the code to check the remote arch from the PML. We will have a GPR mechanism
in ompi_mpi_initialize to do that.
This commit was SVN r6750.
support in OMPI. Currently only enables/disables the architecture
sharing modex in ob1 pml.
* Add sds framework to ompi_info
* Figure out table ids to use for Portals BTL at configure time, since
we should use 30 & 31 on Red Storm, but the reference implementation
only supports 0-8.
* Some bug fixes in Portals UTCP sds
This commit was SVN r6650.
- only call sched_yield if it exists
- don't fail out if modex doens't work in ob1
- bunch of fixes for Portals BTL
- add cnos rml component
- add NULL gpr component (should only be used if replica AND proxy
fail to load)
This commit was SVN r6629.
is a lot more difficult than a PTL, and it can adapt it's behavior to the level of threading required
by the user. In this case the behavior is the priorit of the PML. Therefore this information is never
availale before the init function (of the PML) is called. So I try to keep nearly the same structure
as it was before, with one change. When a PML get initialized it does not necessarily means it has been
selected, so it does not means it has to create all it's internal structures (and select the PTL and
all this stuff). They can all be done later, when a PML knows that it definitively get selected
(when the enable function is called with the argument set to true). Thus, in the case of a PML close
one have to check if the PML has been selected or not before trying to clean up the internals.
I had to change the MPI_Init function to allow the PML to be enabled before we start adding procs inside.
This commit was SVN r6434.