* Don't overwrite the des_flags field, removing the
all important always callback field
* Fix up return status of bml_base_send, since
the rest of the code expects OMPI_SUCCESS or
an error code
This commit was SVN r20178.
* Update to 4 space tabs where relevant (and some irrelevant white
space changes)
* Move a few constants to the left of !=/==
* Add a few {}'s around one-line blocks
* Use BEGIN/END_C_DECLS
* Change /**< to /** in a few places
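A minimal sketch of the "constant on the left" comparison style mentioned in the list above. The status codes and the function name are hypothetical and exist only for illustration; they are not taken from the code touched by this commit.

```c
/* Hypothetical status codes, for illustration only. */
enum { EXAMPLE_SUCCESS = 0, EXAMPLE_ERROR = -1 };

/* Returns 1 when rc signals success, 0 otherwise.
 * With the constant on the left, mistyping "==" as "=" would read
 * "EXAMPLE_SUCCESS = rc", which fails to compile instead of silently
 * assigning to rc. */
int example_succeeded(int rc)
{
    if (EXAMPLE_SUCCESS == rc) {
        return 1;
    }
    return 0;
}
```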
This commit was SVN r20177.
components to use. This code was rendered obsolete (albeit harmless)
by the MCA base improvements that only open the components that were
specified by each framework's MCA parameter.
This commit was SVN r20176.
Also, per chat with Jeff, modified the Makefile.am's of a few orte tools so that they were consistent in the way we generate the ompi-equivalent cmds.
This commit was SVN r20165.
The problem was that we doubly decremented the active count on blocking receives that we stall to complete. This moved the active count into the negative. With a negative 'active' count, a message that should have been accounted for would be overlooked. This then causes the bookmark exchange to post a drain for a message that was never posted, thus locking the protocol. By eliminating the decrement of the 'active' count when we attempt to post the drain message, we only decrement this counter when the outstanding blocking recv completes during the stall operation.
Refs trac:1619
Does not close this ticket since there is an outstanding potential problem with ANY_SOURCE and ANY_TAG, as referenced in the ticket.
This should be moved to v1.3
This commit was SVN r20147.
The following Trac tickets were found above:
Ticket 1619 --> https://svn.open-mpi.org/trac/ompi/ticket/1619
architectures (read SPARC64) require aligned accesses, we increase the storage space
when we pack a datatype description to keep the fields aligned. This has to be done
on both sides in order to be consistent.
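A minimal sketch of the padding idea, assuming a hypothetical 4-byte header followed by 8-byte fields. The sizes and function names are illustrative assumptions, not the actual layout of the packed datatype description.

```c
#include <stddef.h>

/* Round offset up to the next multiple of alignment.
 * Assumes alignment is a nonzero power of two. */
size_t example_align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Size of a packed buffer holding a hypothetical 4-byte header
 * followed by count 8-byte fields.  Padding is inserted after the
 * header so that every field starts on an 8-byte boundary. */
size_t example_packed_size(size_t count)
{
    size_t offset = 4;                     /* hypothetical header size */
    offset = example_align_up(offset, 8);  /* pad up to 8-byte boundary */
    return offset + count * 8;
}
```

Both the packing and the unpacking side have to apply the same rounding, otherwise they disagree on where each field starts.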
This commit was SVN r20133.
problem seems to come from the free list, but due to lack of time to
understand it completely, I provide this fix. Basically, there is no
waiting in the MX BTL anymore, if we cannot allocate a fragment we
rely on the PML to take the corrective actions.
This commit was SVN r20124.
The solution is not to compute the OVERLAP flag, as the best we can do
is an approximate answer. Without this flag the unpack can lead to
unexpected results if the data-type contains any overlapping regions.
As such datatypes are illegal in MPI, this becomes the user's responsibility.
This commit was SVN r20120.
corrections to non-windows files (but within ifdef __WINDOWS__)
type casts, event library for windows use win32.
in orte runtime, add windows sockets handling and object construction.
This commit was SVN r20110.
1. minor modification to include two new opal MCA params:
(a) opal_profile: outputs what components were selected by each framework
currently enabled for most, but not all, frameworks
(b) opal_profile_file: name of file that contains profile info required
for modex
2. introduction of two new tools:
(a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with
opal_profile set. Also reports back the rml IP address for all
interfaces on the node
(b) ompi-profiler: uses ompi-probe to create the profile_file, also
reports out a summary of what framework components are actually
being used to help with configuration options
3. modification of the grpcomm basic component to utilize the
profile file in place of the modex where possible
4. modification of orterun so it properly sees opal mca params and
handles opal_profile correctly to ensure we don't get its profile
5. similar mod to orted as for orterun
6. addition of new test that calls orte_init followed by calls to
grpcomm.barrier
This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. Minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTL's for modex-less operation should let me know and I'll give you the magic details.
This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint.
This commit was SVN r20098.
pondering about this problem, we came to the conclusion that the best approach
is to keep what we had before (i.e. the original approach).
The main reason for this is being nice with tool developers. In the current
incarnation, they can either catch the Fortran calls or the C calls. If they
provide both, then they will have to figure out how to cope with the double
calls (as your example highlights).
Here is the behavior Open MPI will stick to:
Fortran MPI -> C MPI
Fortran PMPI -> C MPI
However, there is another possible approach. This might avoid the double calls
while remaining friendly to tool writers. This possible approach will do:
Fortran MPI -> C MPI
Fortran PMPI -> C PMPI
^
Unfortunately, we will have to heavily modify all files in the Fortran
interface layer in order to support this approach.
This commit was SVN r20079.
of more than one of the btl_openib_if_include, btl_openib_if_exclude,
btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude MCA parameters.
This commit was SVN r20053.
determining which IP address to use when transmitting data. Also it adds logic
to prevent usage of more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude
MCA parameters.
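A minimal sketch of such a mutual-exclusion check, assuming each parameter arrives as a string that is NULL or empty when unset. The function names are hypothetical; this is not the openib BTL's actual code.

```c
#include <stddef.h>

/* Count how many of the given parameter strings are set
 * (non-NULL and non-empty). */
int example_count_set(const char *const values[], size_t n)
{
    int count = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        if (NULL != values[i] && '\0' != values[i][0]) {
            ++count;
        }
    }
    return count;
}

/* Returns 1 if at most one of the four filters is set, 0 otherwise. */
int example_filters_valid(const char *if_include,
                          const char *if_exclude,
                          const char *ipaddr_include,
                          const char *ipaddr_exclude)
{
    const char *values[4];
    values[0] = if_include;
    values[1] = if_exclude;
    values[2] = ipaddr_include;
    values[3] = ipaddr_exclude;
    return example_count_set(values, 4) <= 1;
}
```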
This should complete the code modifications needed for ticket 1665.
This commit was SVN r20052.
1. fix a bug in pml_ob1_recvreq/sendreq.c: the buffer was made defined at a point where the request had already been released.
2. complete memchecker support for collective functions.
3. change the wrongly spelled function name of memchecker, i.e. '*_isaddressible' should be '*_isaddressable'
This commit was SVN r20043.
determination of the IP subnet. The netmask was being used improperly when
determining which subnet each connection is on. Part two is the ability to
include/exclude specific subnets.
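A minimal sketch of applying a netmask to decide which subnet an address is on, assuming IPv4 addresses in host byte order and a CIDR prefix length. The helper names are hypothetical and this is not the code changed by this patch.

```c
#include <stdint.h>

/* Build an IPv4 netmask (host byte order) from a prefix length 0..32. */
uint32_t example_netmask(unsigned prefix_len)
{
    if (0 == prefix_len) {
        return 0;                 /* avoid an undefined shift by 32 */
    }
    if (prefix_len >= 32) {
        return UINT32_MAX;
    }
    return (uint32_t) (UINT32_MAX << (32 - prefix_len));
}

/* The subnet is the address with all host bits masked off. */
uint32_t example_subnet(uint32_t addr, unsigned prefix_len)
{
    return addr & example_netmask(prefix_len);
}

/* Two addresses are on the same subnet when their masked values match. */
int example_same_subnet(uint32_t a, uint32_t b, unsigned prefix_len)
{
    return example_subnet(a, prefix_len) == example_subnet(b, prefix_len);
}
```

For example, 192.168.1.5 and 192.168.2.5 share a subnet under a /16 prefix but not under a /24 prefix.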
This patch fixes ticket #1665
This commit was SVN r20016.
memory hooks (munmap) and initialize the mallopt component, and
nothing else.
Use this mpool in the MX common initialization, supporting both BTL
and MTL. Automatically set the MX_RCACHE environment variable to
enable registration cache in MX.
Tested with success for munmap() and large free().
This commit was SVN r20003.
* If max_inline_data == -1 perform runtime detection
* If max_inline_data >=0 use the value provided
* If the user does not explicitly set this via command line, use the value from INI file
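The three rules above can be sketched as a single resolution function. This is a hedged illustration only: the function and argument names are hypothetical, and treating an INI value of -1 as "detect at runtime" is an assumption of this sketch rather than something the commit states.

```c
/* Resolve the effective max_inline_data value.
 *   user_was_set   - nonzero if the user set the value explicitly
 *   user_value     - the user's value (only meaningful if user_was_set)
 *   ini_value      - the value from the INI file
 *   detected_value - stand-in for the result of runtime detection */
int example_resolve_max_inline_data(int user_was_set, int user_value,
                                    int ini_value, int detected_value)
{
    /* No explicit user setting: fall back to the INI file value. */
    int value = user_was_set ? user_value : ini_value;

    /* -1 requests runtime detection (assumed to apply to the INI
     * value as well). */
    if (-1 == value) {
        return detected_value;
    }

    /* Any value >= 0 is used as provided. */
    return value;
}
```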
This commit fixes trac:1662
This commit was SVN r19995.
The following Trac tickets were found above:
Ticket 1662 --> https://svn.open-mpi.org/trac/ompi/ticket/1662
This will sit in trunk for a few days - would like to actually see some errors reported to syslog before moving the code to 1.3
This commit was SVN r19986.
another bug. This also causes 0-byte requests to be treated as a buffer
error, causing the base request to be requeued. On Cray XT, it may be
temporarily impossible to make allocations for buffer requests, as the
default stack size is small (8 MB) and there is no true swap device.
Even with the stack size increased, there will be cases in which this
condition recurs.
One possibility is to make the buffer allocations off of the heap; but,
this does not change the fact that eventually an out-of-memory condition
will occur and we need to support multiple receives in transit, a
condition for which the available buffer space may change. On the other
hand, if we switch to allocating the buffer space from the heap, we will
need to return an error when the allocation fails and there are no other
buffers in transit.
This commit was SVN r19981.
It highlighted a bug in the bookmark component where for persistent sends we were not copying the context, but just moving it. This caused us to lose track of the message if it is started/completed multiple times.
This will need to be brought over to the v1.3 branch, but it should soak overnight to get a round of testing first.
This commit was SVN r19962.
* Add OMPI_F77_CHECK_REAL16_C_EQUV to test whether REAL*16 is bit
equivalent to long double. AC_DEFINE OMPI_REAL16_MATCHES_C with
result (0 or 1).
* Update ompi_info to only show real16 support if
OMPI_REAL16_MATCHES_C is 1.
* Update DDT to only support REAL16 and COMPLEX32 if
1==OMPI_REAL16_MATCHES_C.
* MPI Op function pointer tables will have NULL for the REAL16 and
COMPLEX32 entries if 0==OMPI_REAL16_MATCHES_C.
* Slightly cleaned up OMPI_F77_GET_ALIGNMENT and OMPI_F77_CHECK m4
tests (use OMPI_VAR_SCOPE_PUSH/POP).
This commit was SVN r19948.
The following Trac tickets were found above:
Ticket 1603 --> https://svn.open-mpi.org/trac/ompi/ticket/1603
is >= 1. The default value of the MCA param is now -1, which means
"let someone else turn it on if they want to." So we should default
to ''off'' (false), and let the openib BTL (etc.) turn it on if it
can/wants to.
Failure to do this will default _pipeline to true because
-1(int)==true(bool). This causes a problem if the user tries to set
mpi_leave_pinned_pipeline to 1: they'll get a warning that you can't
set both _pinned and _pinned_pipeline to 1. This happens because
_pinned will get the bool-ified value of the MCA parameter (-1),
and then the user sets the value of _pinned_pipeline to 1/true.
Hence, both of them are set to true. Bzzt!
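A minimal sketch of the int-to-bool pitfall described above, assuming C99 <stdbool.h>. The function names are hypothetical, and the ">= 1" test mirrors the condition mentioned in this message rather than quoting the actual code.

```c
#include <stdbool.h>

/* Naive conversion: any nonzero int, including -1, becomes true. */
bool example_naive_flag(int param)
{
    return (bool) param;
}

/* Conversion that treats only values >= 1 as "on", so the default
 * of -1 ("let someone else turn it on") maps to false. */
bool example_fixed_flag(int param)
{
    return param >= 1;
}
```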
This commit was SVN r19942.
1. register "mpi_preconnect_all" as a deprecated synonym for "mpi_preconnect_mpi"
2. remove "mpi_preconnect_oob" and "mpi_preconnect_oob_simultaneous" as these are no longer valid.
3. remove the routed framework's "warmup_routes" API. With the removal of the direct routed component, this function at best only wasted communications. The daemon routes are completely "warmed up" during launch, so having MPI procs order the sending of additional messages is simply wasteful.
4. remove the call to orte_routed.warmup_routes from MPI_Init. This was the only place it was used anyway.
The FAQs will be updated to reflect this changed situation, and a CMR filed to move this to the 1.3 branch.
This commit was SVN r19933.
parameters on the connecting side also. Also move define of IF_NAMESIZE
into if.h file. And lastly, add one verbose debug message which may be
useful if we run into other issues like this.
This commit fixes trac:1573.
This commit was SVN r19932.
The following Trac tickets were found above:
Ticket 1573 --> https://svn.open-mpi.org/trac/ompi/ticket/1573
I'm unable to split it in two parts, my patch and Edgar's one. So I just update
copyright information for both of us.
What this patch does:
- it uses the unexpected queue created by commit r19562 to dispatch the
unexpected message to the right communicator (once this communicator
is created and initialized).
- delay the PML comm_add until we have the context_id for the new communicator.
- only do the PML comm_add on processes that really belong to the new
communicator. Please read the lengthy comment in the source code for the
reason behind this.
This commit was SVN r19929.
The following SVN revision numbers were found above:
r19562 --> open-mpi/ompi@acd3406aa7
in case of failure more user friendly. In addition we now fully integrate
with the MX_BONDING features (i.e., the multi-rail bonding can be done either
at the PML OB1 level, or deep down in the MX library). This feature is driven
by the btl_mx_bonding MCA parameter.
Plus some extra features:
- Correctly compute the bandwidth in case of bonding.
- Nicely handle any mapper access errors.
This commit was SVN r19874.
<infiniband/driver.h> cannot be included because it will fail to
compile. So per advice from Roland, in that case, just manually
include the one ibv_*() prototype that we need. Use an undocumented
feature of AC_CHECK_HEADERS to check for <infiniband/driver.h> that I
was told on the autoconf mailing list (see the comment for more
details).
This commit was SVN r19857.
We were only allocating one whereas we would typically need
one per interface. Most of the time we got lucky and we
happily trundled over unallocated memory. But with enough
interfaces, we would get a SEGV.
Also add include to fix compilation with static libs.
This commit was SVN r19845.
THREAD_MULTIPLE. There's a new (hidden) MCA parameter to re-enable
these BTLs in the presence of THREAD_MULTIPLE:
btl_base_thread_multiple_override. This MCA parameter should ''only''
be used by developers who are working on making their BTLs thread safe;
it should ''not'' be used by end-users!
This commit was SVN r19826.
The following Trac tickets were found above:
Ticket 1588 --> https://svn.open-mpi.org/trac/ompi/ticket/1588
(this was missed in #1585). Also, fix a long-standing problem that
the F90 wrapper compilers were using the F77 wrapper compiler flags.
This commit was SVN r19819.
We had a public symbol named "already_opened". This commit changes
the name to mca_btl_base_already_opened.
I guess it's good we have the visibility stuff enabled by default!
:-)
This commit was SVN r19791.
1. Allow MX bonding via btl_mx_bonding MCA parameter. With this on, Open MPI
assumes that lib MX will do the bonding, and we will only return one BTL.
Otherwise, we return as many BTLs as there are devices.
2. Decrease the memory footprint, by cleaning up what we store about the
peers and how we store it.
3. Allow multiple MX routes that share the same mapper. In this particular
case we will link by their nic_id.
4. Allow multiple MX routes with multiple mappers. In this case we match
the NICs based on the last 6 digits of the mapper MAC.
5. Increase the size of the eager and rendez-vous eager limits in the
case where we are unable to register an unexpected callback with MX.
6. Increase the default max number of MX fragments.
7. Increase the max number of MX BTLs.
8. Only allow mx_if_include and mx_if_exclude if we have access to the
mapper.
This commit was SVN r19788.
1. completely and cleanly separates responsibilities between the HNP, orted, and tool components.
2. removes all wireup messaging during launch and shutdown.
3. maintains flow control for stdin to avoid large-scale consumption of memory by orteds when large input files are forwarded. This is done using an xon/xoff protocol.
4. enables specification of stdin recipients on the mpirun cmd line. Allowed options include rank, "all", or "none". Default is rank 0.
5. creates a new MPI_Info key "ompi_stdin_target" that supports the above options for child jobs. Default is "none".
6. adds a new tool "orte-iof" that can connect to a running mpirun and display the output. Cmd line options allow selection of any combination of stdout, stderr, and stddiag. Default is stdout.
7. adds a new mpirun and orte-iof cmd line option "tag-output" that will tag each line of output with process name and stream ident. For example, "[1,0]<stdout>this is output"
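A minimal sketch of the tagged output format from item 7, assuming the two numbers in "[1,0]" are a job identifier and a rank. That reading, and the function name, are assumptions made for illustration; this is not the IOF implementation.

```c
#include <stddef.h>
#include <stdio.h>

/* Format one line of output as "[job,rank]<stream>line".
 * Returns the number of characters that would have been written,
 * as snprintf does; output is truncated if buf is too small. */
int example_tag_line(char *buf, size_t len, int job, int rank,
                     const char *stream, const char *line)
{
    return snprintf(buf, len, "[%d,%d]<%s>%s", job, rank, stream, line);
}
```

Calling it with job 1, rank 0, stream "stdout" and the line "this is output" yields the string shown in item 7.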
This is not intended for the 1.3 release as it is a major change requiring considerable soak time.
This commit was SVN r19767.
There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539
Additionally:
* Fix a file cleanup bug in CRS Base.
* Fix a possible deadlock in the TCP ft_event function
* Add a mca_base_param_deregister() function to MCA base
* Add whole process checkpoint timers
* Add support for BTL: OpenIB, MX, Shared Memory
* Add support for Mpool: rdma, sm
* Sundry bounds checking and cleanup in some scattered functions
This commit was SVN r19756.
I ran IMB exchange on two QS22 machines with r19674 and it got stuck after 256 or 512 bytes every time.
After applying r19717 the test passed, so I guess this is an essential patch.
This commit was SVN r19752.
The following SVN revision numbers were found above:
r19674 --> open-mpi/ompi@15c47a2473
r19717 --> open-mpi/ompi@0a765cd788
unconditionally, which can result in a flood of messages to the user
if all MPI processes invoke abort. Additionally, some users were
confused because they saw the MPI_ABORT opal_output() messages from
''some'' MPI processes, but not ''all'' of them (despite the fact that
every MPI process supposedly invoked MPI_ABORT). The reason is that
calling MPI_ABORT triggers ORTE to kill all MPI processes, so it's a
race condition as to whether a) all MPI processes actually invoke
MPI_ABORT, and/or b) whether every process is able to opal_output()
before they are killed.
This commit does two simple things:
* Now use orte_show_help() for the MPI_ABORT message, so they are
aggregated.
* Add a note in the message that calling MPI_ABORT kills all
processes, so you might not see all output, yadda yadda yadda.
This commit was SVN r19735.
We still do an interreduce but it is now followed by an intrascatterv.
This fixes trac:1554.
This commit was SVN r19723.
The following Trac tickets were found above:
Ticket 1554 --> https://svn.open-mpi.org/trac/ompi/ticket/1554
following names are all new for v1.3, and therefore haven't been
officially released yet:
* btl_openib_of_cq_size
* btl_openib_of_max_inline_data
* btl_openib_of_pkey
* btl_openib_of_psn
* btl_openib_of_mtu
The "_of_" (for OpenFabrics) in there is redundant. It used to be
"_ib_", indicating that these values are pretty much passed directly
to the verbs stack. But I think the "openib" in the name implies this
already; having "_of_" in there just seems redundant, makes the name
longer, and seems redundant. It's also redundant.
So I took those "_of_"'s out of the MCA names. The old (v1.2) names
are still valid (but deprecated), such as btl_openib_ib_cq_size.
This commit was SVN r19718.
* Change name: mca_btl_openib_of_pkey_value -> mca_btl_openib_of_pkey
(since now there's no index, the "_value" suffix is somewhat
superfluous)
* Put in a better help message for the _pkey MCA param (to agree with
the new help message in v1.2.8)
This commit was SVN r19716.
Commit from a long-standing Mercurial tree that ended up incorporating a lot of things:
* A few fixes for CPC interface changes in all the CPCs
* Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC
* #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC)
* Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag)
* CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers.
* CPCs allocate a CTS buffer only if they're about to make a connection
* RDMA CM improvements:
* Use threaded mode openib fd monitoring to wait for RDMA CM events
* Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions
* Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors
* Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type
* Renamed many variables to be internally consistent
* Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator
* Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them)
* Use rdma_create_qp() instead of ibv_create_qp()
* openib fd monitoring improvements:
* Renamed a bunch of functions and variables to be a little more obvious as to their true function
* Use pipes to communicate between main thread and service thread
* Add ability for main thread to invoke a function back on the service thread
* Ensure to set initiator_depth and responder_resources properly, by putting max_qp_rd_atom and max_qp_init_rd_atom in the modex (see rdma_connect(3))
* Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port
* Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout
* Other improvements:
* btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp")
* Somewhat improved error handling
* Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements
* Oodles of little coding style fixes
* Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction
* Added some more show_help error messages
* Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread)
This commit was SVN r19686.
The following Trac tickets were found above:
Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210