The problem was that we decremented the active count twice on blocking receives that we stall on until they complete. This drove the active count negative. With a negative 'active' count, a message that should have been accounted for would be overlooked. This in turn caused the bookmark exchange to post a drain for a message that was never posted, thus locking the protocol. By eliminating the decrement of the 'active' count when we attempt to post the drain message, we only decrement this counter when the outstanding blocking recv completes during the stall operation.
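A minimal sketch of the accounting error described above. The struct and function names are hypothetical stand-ins for illustration only; they are not the real bookmark component code.

```c
#include <assert.h>

/* Hypothetical per-peer bookkeeping; the real structure is not shown in this log. */
typedef struct {
    int active;   /* receives posted but not yet completed */
} peer_counts_t;

/* A blocking receive is posted. */
void recv_posted(peer_counts_t *p) { p->active++; }

/* Buggy path: attempting to post the drain also decremented 'active'. */
void post_drain_buggy(peer_counts_t *p) { p->active--; }

/* Fixed path: attempting to post the drain leaves 'active' untouched. */
void post_drain_fixed(peer_counts_t *p) { (void)p; }

/* The blocking receive completes during the stall: the one legitimate decrement. */
void recv_completed_in_stall(peer_counts_t *p) { p->active--; }

/* Returns the final 'active' count for one posted blocking receive. */
int run_scenario(int use_fix)
{
    peer_counts_t p = { 0 };

    recv_posted(&p);
    if (use_fix) {
        post_drain_fixed(&p);
    } else {
        post_drain_buggy(&p);
    }
    recv_completed_in_stall(&p);

    assert(!use_fix || p.active >= 0);  /* the fixed path never goes negative */
    return p.active;
}
```

With the buggy path the count ends below zero, which is the state in which a later message could be overlooked; with the fixed path it returns to zero.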
Refs trac:1619
Does not close this ticket since there is an outstanding potential problem with ANY_SOURCE and ANY_TAG, as referenced in the ticket.
This should be moved to v1.3
This commit was SVN r20147.
The following Trac tickets were found above:
Ticket 1619 --> https://svn.open-mpi.org/trac/ompi/ticket/1619
architectures (read SPARC64) require aligned accesses, we increase the storage space
when we pack a datatype description to keep the fields aligned. This has to be done
on both sides in order to be consistent.
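The padding idea can be sketched as follows. The field layout (a 32-bit count followed by a 64-bit displacement) and all function names are assumptions made for illustration; the actual packed datatype description format is not given in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round 'offset' up to the next multiple of 'alignment' (must be a power of two). */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Pack a hypothetical description: pad after the 32-bit field so that the
 * 64-bit field starts on an 8-byte boundary. Returns the bytes used. */
size_t pack_description(unsigned char *buf, uint32_t count, uint64_t disp)
{
    size_t pos = 0;
    size_t aligned;

    memcpy(buf + pos, &count, sizeof(count));
    pos += sizeof(count);

    aligned = align_up(pos, sizeof(uint64_t));
    memset(buf + pos, 0, aligned - pos);   /* zero the padding bytes */
    pos = aligned;

    memcpy(buf + pos, &disp, sizeof(disp));
    pos += sizeof(disp);
    return pos;
}

/* The unpack side must apply exactly the same padding rule. */
size_t unpack_description(const unsigned char *buf, uint32_t *count, uint64_t *disp)
{
    size_t pos = 0;

    memcpy(count, buf + pos, sizeof(*count));
    pos += sizeof(*count);

    pos = align_up(pos, sizeof(uint64_t));

    memcpy(disp, buf + pos, sizeof(*disp));
    pos += sizeof(*disp);
    return pos;
}
```

The sketch uses memcpy, which tolerates any alignment; the padding matters when the fields are later read through typed pointers on a strict-alignment architecture.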
This commit was SVN r20133.
problem seems to come from the free list, but due to lack of time to
understand it completely, I provide this fix. Basically, there is no
waiting in the MX BTL anymore, if we cannot allocate a fragment we
rely on the PML to take the corrective actions.
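A rough sketch of the "no waiting" behavior, assuming a simple singly linked free list. The types, function names, and error codes are hypothetical and do not correspond to the real MX BTL or free list API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_OUT_OF_RESOURCE = -2 };

typedef struct frag {
    struct frag *next;
} frag_t;

typedef struct {
    frag_t *head;
} free_list_t;

/* Non-blocking get: returns NULL immediately when the list is empty. */
frag_t *free_list_try_get(free_list_t *fl)
{
    frag_t *f;

    assert(fl != NULL);
    f = fl->head;
    if (f != NULL) {
        fl->head = f->next;
        f->next = NULL;
    }
    return f;
}

/* Lower layer: never waits for a fragment. On failure it reports the
 * condition and leaves the corrective action to the caller (the upper layer). */
int btl_alloc_sketch(free_list_t *fl, frag_t **out)
{
    frag_t *f = free_list_try_get(fl);

    if (NULL == f) {
        *out = NULL;
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    *out = f;
    return SKETCH_SUCCESS;
}
```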
This commit was SVN r20124.
The solution is not to compute the OVERLAP flag, as the best we can do
is an approximate answer. Without this flag the unpack can lead to
unexpected results if the datatype contains any overlapping regions.
As such datatypes are illegal in MPI, this becomes the user's responsibility.
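To make "overlapping regions" concrete, here is a brute-force check over an explicit list of byte regions. This is only an illustration of the property; the region type and function are hypothetical, and this is not how the datatype engine represents or inspects datatypes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened view of a datatype: a list of byte ranges. */
typedef struct {
    long offset;   /* start of the region, in bytes */
    long length;   /* size of the region, in bytes */
} region_t;

/* Returns 1 if any two non-empty regions share at least one byte, else 0.
 * Quadratic in the number of regions; fine for a small illustration. */
int regions_overlap(const region_t *r, size_t n)
{
    size_t i, j;

    assert(r != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (r[i].length <= 0) {
            continue;   /* empty regions cannot overlap anything */
        }
        for (j = i + 1; j < n; j++) {
            if (r[j].length <= 0) {
                continue;
            }
            if (r[i].offset < r[j].offset + r[j].length &&
                r[j].offset < r[i].offset + r[i].length) {
                return 1;
            }
        }
    }
    return 0;
}
```

Unpacking into a layout for which this check returns 1 writes some bytes more than once, so the final contents depend on the order of the writes.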
This commit was SVN r20120.
corrections to non-windows files (but within ifdef __WINDOWS__)
type casts, event library for windows use win32.
in orte runtime, add windows sockets handling and object construction.
This commit was SVN r20110.
1. minor modification to include two new opal MCA params:
(a) opal_profile: outputs what components were selected by each framework
currently enabled for most, but not all, frameworks
(b) opal_profile_file: name of file that contains profile info required
for modex
2. introduction of two new tools:
(a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with
opal_profile set. Also reports back the rml IP address for all
interfaces on the node
(b) ompi-profiler: uses ompi-probe to create the profile_file, also
reports out a summary of what framework components are actually
being used to help with configuration options
3. modification of the grpcomm basic component to utilize the
profile file in place of the modex where possible
4. modification of orterun so it properly sees opal mca params and
handles opal_profile correctly to ensure we don't get its profile
5. similar mod to orted as for orterun
6. addition of new test that calls orte_init followed by calls to
grpcomm.barrier
This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. A minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTLs for modex-less operation should let me know and I'll give you the magic details.
This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint.
This commit was SVN r20098.
pondering about this problem, we came to the conclusion that the best approach
is to keep what we had before (i.e. the original approach).
The main reason for this is being nice with tool developers. In the current
incarnation, they can either catch the Fortran calls or the C calls. If they
provide both, then they will have to figure out how to cope with the double
calls (as your example highlights).
Here is the behavior Open MPI will stick to:
Fortran MPI -> C MPI
Fortran PMPI -> C MPI
However, there is another possible approach. This might avoid the double calls
while remaining friendly to tool writers. This possible approach would do:
Fortran MPI -> C MPI
Fortran PMPI -> C PMPI
                  ^
Unfortunately, we will have to heavily modify all files in the Fortran
interface layer in order to support this approach.
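The double call can be illustrated with a toy model of a tool that intercepts both the Fortran and the C entry points. Every function name below is a hypothetical stand-in; none of them is a real MPI or Open MPI symbol.

```c
#include <assert.h>

/* Number of events the hypothetical tool has recorded. */
int tool_events = 0;

/* Library: the C PMPI entry point does the real work. */
int c_pmpi_send(void) { return 0; }

/* Tool: wrapper that intercepts the C MPI entry point. */
int c_mpi_send(void)
{
    tool_events++;
    return c_pmpi_send();
}

/* Library, current mapping: Fortran PMPI -> C MPI. */
int f_pmpi_send_current(void) { return c_mpi_send(); }

/* Library, alternative mapping: Fortran PMPI -> C PMPI. */
int f_pmpi_send_alternative(void) { return c_pmpi_send(); }

/* Tool: wrapper that intercepts the Fortran MPI entry point
 * (shown once per mapping so both can be exercised). */
int f_mpi_send_current(void)
{
    tool_events++;
    return f_pmpi_send_current();
}

int f_mpi_send_alternative(void)
{
    tool_events++;
    return f_pmpi_send_alternative();
}

/* Returns how many events the tool records for a single Fortran send. */
int events_for_one_fortran_send(int use_alternative)
{
    tool_events = 0;
    if (use_alternative) {
        (void)f_mpi_send_alternative();
    } else {
        (void)f_mpi_send_current();
    }
    assert(tool_events >= 1);
    return tool_events;
}
```

In this model a tool that wraps both layers sees one Fortran send twice under the current mapping and once under the alternative; a tool that wraps only one layer sees it once either way.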
This commit was SVN r20079.
of more than one of the btl_openib_if_include, btl_openib_if_exclude,
btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude MCA parameters.
This commit was SVN r20053.
determining which IP address to use when transmitting data. Also it adds logic
to prevent usage of more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude
MCA parameters.
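The mutual-exclusion check can be sketched as below. The function names are hypothetical and the parameter values are passed in as plain strings; how the real component reads its MCA parameters and reports the error is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* A parameter counts as "set" when it is a non-empty string. */
int param_is_set(const char *value)
{
    return (value != NULL && value[0] != '\0') ? 1 : 0;
}

/* Returns 1 when at most one of the four include/exclude parameters is set,
 * 0 when two or more are set (the combination this commit rejects). */
int if_params_valid(const char *if_include,
                    const char *if_exclude,
                    const char *ipaddr_include,
                    const char *ipaddr_exclude)
{
    int set_count = param_is_set(if_include)
                  + param_is_set(if_exclude)
                  + param_is_set(ipaddr_include)
                  + param_is_set(ipaddr_exclude);

    assert(set_count >= 0 && set_count <= 4);
    return set_count <= 1;
}
```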
This should complete the code modifications needed for ticket 1665.
This commit was SVN r20052.
1. fix a bug in pml_ob1_recvreq/sendreq.c: the buffer was being marked as defined after the request had already been released.
2. complete memchecker support for collective functions.
3. rename the misspelled memchecker functions: '*_isaddressible' becomes '*_isaddressable'
This commit was SVN r20043.
determination of the IP subnet. The netmask was being used improperly when
determining which subnet each connection is on. Part two is the ability to
include/exclude specific subnets.
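For reference, a minimal sketch of deciding whether two IPv4 addresses are on the same subnet by applying the netmask to both. Addresses are assumed to be in host byte order, and the helper names are hypothetical rather than taken from the openib BTL.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a prefix length (0..32) into a netmask in host byte order. */
uint32_t prefix_to_mask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    if (prefix_len == 0) {
        return 0;              /* avoid an undefined 32-bit shift */
    }
    if (prefix_len == 32) {
        return 0xFFFFFFFFu;
    }
    return ~((UINT32_C(1) << (32 - prefix_len)) - 1);
}

/* The subnet is the address with the host bits masked off. */
uint32_t subnet_of(uint32_t addr, uint32_t mask)
{
    return addr & mask;
}

/* Two addresses are on the same subnet when their masked values match. */
int same_subnet(uint32_t a, uint32_t b, uint32_t mask)
{
    return subnet_of(a, mask) == subnet_of(b, mask);
}
```

For example, 192.168.1.10 and 192.168.1.200 share a /24 subnet, while 192.168.2.10 does not.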
This patch fixes ticket #1665
This commit was SVN r20016.
memory hooks (munmap) and initialize the mallopt component, and
nothing else.
Use this mpool in the MX common initialization, supporting both BTL
and MTL. Automatically set the MX_RCACHE environment variable to
enable registration cache in MX.
Tested successfully with munmap() and large free().
This commit was SVN r20003.
* If max_inline_data == -1, perform runtime detection
* If max_inline_data >= 0, use the value provided
* If the user does not explicitly set this via the command line, use the value from the INI file
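The three rules above can be sketched as a small resolution function. The struct, the detection callback, and the stub value of 128 are hypothetical. The sketch also assumes that an INI value of -1 triggers runtime detection, which the commit message does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inputs to the decision. */
typedef struct {
    int user_set;     /* non-zero if set explicitly on the command line */
    int user_value;   /* value given by the user (only valid if user_set) */
    int ini_value;    /* value from the INI file */
} inline_cfg_t;

/* Stand-in for runtime detection; the returned value is a placeholder. */
int detect_stub(void) { return 128; }

/* Resolve the effective max_inline_data value. */
int resolve_max_inline_data(const inline_cfg_t *cfg, int (*detect)(void))
{
    int value;

    assert(cfg != NULL && detect != NULL);

    /* Command line wins; otherwise fall back to the INI file. */
    value = cfg->user_set ? cfg->user_value : cfg->ini_value;

    /* -1 requests runtime detection. */
    if (value == -1) {
        return detect();
    }

    /* Values >= 0 are used as given. Values below -1 are not covered by
     * the commit message; this sketch passes them through unchanged. */
    return value;
}
```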
This commit fixes trac:1662
This commit was SVN r19995.
The following Trac tickets were found above:
Ticket 1662 --> https://svn.open-mpi.org/trac/ompi/ticket/1662
This will sit in trunk for a few days - would like to actually see some errors reported to syslog before moving the code to 1.3
This commit was SVN r19986.
another bug. This also causes 0-byte requests to be treated as a buffer
error, causing the base request to be requeued. On Cray XT, it may be
temporarily impossible to make allocations for buffer requests, as the
default stack size is small (8 MB) and there is no true swap device.
Even with the stack size increased, there will be cases in which this
condition recurs.
One possibility is to allocate the buffers from the heap, but
this does not change the fact that eventually an out-of-memory condition
will occur and we need to support multiple receives in transit, a
condition for which the available buffer space may change. On the other
hand, if we switch to allocating the buffer space from the heap, we will
need to return an error when the allocation fails and there are no other
buffers in transit.
This commit was SVN r19981.
It highlighted a bug in the bookmark component where for persistent sends we were not copying the context, but just moving it. This caused us to lose track of the message if it is started/completed multiple times.
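The difference between moving and copying the context can be sketched with a toy model of a persistent request that is started several times. The context type and helper names are hypothetical; the real bookmark structures are not shown in this log.

```c
#include <assert.h>

/* Hypothetical message context held by a persistent request. */
typedef struct {
    int tag;
    int valid;   /* 0 once the contents have been moved away */
} msg_ctx_t;

/* Move: the destination takes the contents and the source is invalidated. */
void ctx_move(msg_ctx_t *dst, msg_ctx_t *src)
{
    *dst = *src;
    src->valid = 0;
}

/* Copy: the source stays usable for later starts. */
void ctx_copy(msg_ctx_t *dst, const msg_ctx_t *src)
{
    *dst = *src;
}

/* Start the same persistent request n_starts times and count how many of
 * those starts end up with a valid tracked context. */
int count_tracked_starts(int n_starts, int use_copy)
{
    msg_ctx_t persistent = { 42, 1 };
    int tracked = 0;
    int i;

    assert(n_starts >= 0);
    for (i = 0; i < n_starts; i++) {
        msg_ctx_t bookmark;

        if (use_copy) {
            ctx_copy(&bookmark, &persistent);
        } else {
            ctx_move(&bookmark, &persistent);
        }
        if (bookmark.valid) {
            tracked++;
        }
    }
    return tracked;
}
```

In this model, moving tracks only the first start, while copying tracks every start.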
This will need to be brought over to the v1.3 branch, but it should soak overnight to get a round of testing first.
This commit was SVN r19962.