I'm unable to split it in two parts, my patch and Edgar's one. So I just update
copyright information for both of us.
What this patch do:
- it use the unexpected queue create by commit r19562 to dispatch the
unexpected message to the right communicator (once this communicator
is created and initialized).
- delay the PML comm_add until we have the context_id for the new communicator.
- only do the PML comm_add on processes that really belong to the new
communicator. Please read the lengthy comment in the source code for the
reason behind this.
This commit was SVN r19929.
The following SVN revision numbers were found above:
r19562 --> open-mpi/ompi@acd3406aa7
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.
This commit was SVN r18664.
(sometimes after the merge with the ORTE branch), the opal_pointer_array
will became the only pointer_array implementation (the orte_pointer_array
will be removed).
This commit was SVN r17007.
simultaneously, but is doing it incorrectly. If the function is running already
for one communicator and it is called from another thread for other communicator
with lower cid the check comm->c_contextid != ompi_comm_lowest_cid()
will fail and the function will be executed for two different communicators by
two threads simultaneously. There is nothing in the algorithm that prevent it
from been running simultaneously for different communicators as far as I can see,
but ompi_comm_unregister_cid() assumes that it is always called for a communicator
with the lowest cid and this is not always the case. This patch removes bogus
lowest cid check and fix ompi_comm_register_cid() to properly remove cid from
the list.
This commit was SVN r16088.
used at nce (up to one unique collective module per collective function).
Matches r15795:15921 of the tmp/bwb-coll-select branch
This commit was SVN r15924.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15795
r15921
fix some of the multi-threading problems for the cid allocation. Two bugs
specifically:
- since we do not have a queue for incoming fragments of unknown cid, we need
to synchronize all processes before exiting the communicator creation. This
synchronization was/is located in comm_activate, which was however too late
for the multi-threaded case. Thus, for multi-threaded scenarios we are now
synchronizing 'before' we allow another thread to enter the cid-allocation
loop.
- for synchronization, we used for the sake of simplicity allreduce
operations. It turns out, that these operations interefered with the
allreductions in the cid-allocation routine, which lead to non-sense results
in the cid-allocation and potentially to endless loops.
Multi-threaded communicator creation seems to work now, is however still 'very
very' slow. I think, the busy wait of threads is killing the performance of
the active threads in the cid allocation. But this is another topic.
This commit was SVN r15910.
* General TCP cleanup for OPAL / ORTE
* Simplifying the OOB by moving much of the logic into the RML
* Allowing the OOB RML component to do routing of messages
* Adding a component framework for handling routing tables
* Moving the xcast functionality from the OOB base to its own framework
Includes merge from tmp/bwb-oob-rml-merge revisions:
r15506, r15507, r15508, r15510, r15511, r15512, r15513
This commit was SVN r15528.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15506
r15507
r15508
r15510
r15511
r15512
r15513
threads to create new communicators. There is nothing in the standard about threading
and communicaotr functions, but as they include collective communications I expect the
same rules have to be applied. As such, on an incorrect MPI program we deadlock (!).
This commit was SVN r15456.
single threaded builds. In its default configuration, all this does
is ensure that there's at least a good chance of threads building
based on non-threaded development (since the variable names will be
checked). There is also code to make sure that a "mutex" is never
"double locked" when using the conditional macro mutex operations.
This is off by default because there are a number of places in both
ORTE and OMPI where this alarm spews mega bytes of errors on a
simple test. So we have some work to do on our path towards
thread support.
Also removed the macro versions of the non-conditional thread locks,
as the only places they were used, the author of the code intended
to use the conditional thread locks. So now you have upper-case
macros for conditional thread locks and lowercase functions for
non-conditional locks. Simple, right? :).
This commit was SVN r15011.
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).
Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).
I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).
In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...
Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.
This commit was SVN r11204.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
discussed and cleared with Edgar.
Ensure that only processes who will be in the new communicator call
the coll selection function. It is pointless (and Bad in some cases)
for processes who are not in the new communicator to try to select a
coll module for the new communicator.
This commit was SVN r7573.
* don't know what I was thinking, but can't use the MCA_PML_CALL macro on
the two data values, as they don't have things that the macro can
expand into
This commit was SVN r6868.
the values to the PML structure. This will allow PMLs that want to do
hardware matching at the cost of a smaller range of valid tags and cids.
Updated all the places that used the MPI_TAG_UB_VALUE constant to instead
look at the pml struct.
This commit was SVN r6778.
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few
This commit was SVN r6330.