* s/port/tcp_port/g where relevant to disambiguate TCP port from
device port
* Rework ipaddrcheck to make it work in the LMC>0 case
This commit was SVN r18482.
The following Trac tickets were found above:
Ticket 1281 --> https://svn.open-mpi.org/trac/ompi/ticket/1281
* Ensure _iwarp.h is always included, or you'll get warnings on
platforms that don't have the RDMACM
* Add skeleton for function descriptions in comments in iwarp.h
This commit was SVN r18477.
opal_ifnext() return -1 upon completion); don't check it against
opal_ifcount() -- the interface indexes aren't necessarily related to
how many interfaces were found.
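The iteration rule in this message can be illustrated with a small model. This is a Python sketch, not the OPAL code: `next_index` and `walk` are hypothetical stand-ins for an interface iterator that returns -1 when it runs out, and the index values are made up.

```python
from typing import List


def next_index(indexes: List[int], current: int) -> int:
    """Return the next interface index after `current`, or -1 when done."""
    later = [i for i in sorted(indexes) if i > current]
    return later[0] if later else -1


def walk(indexes: List[int]) -> List[int]:
    """Visit every interface by following the iterator until it returns -1."""
    i = min(indexes) if indexes else -1   # hypothetical "first interface" step
    visited = []
    while i != -1:                        # stop on -1, not on a count
        visited.append(i)
        i = next_index(indexes, i)
    return visited
```

With indexes such as 2, 5 and 9 there are three interfaces, yet no index is smaller than the count, so a loop bounded by the interface count would visit none of them.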
This commit was SVN r18476.
This commit has the same commit message as r18450, but without the
extra bonus memory corruption that was introduced.
This commit was SVN r18467.
The following SVN revision numbers were found above:
r18450 --> open-mpi/ompi@5295902ebe
The following Trac tickets were found above:
Ticket 1285 --> https://svn.open-mpi.org/trac/ompi/ticket/1285
1. We can't use orte_output in the CPC service thread because orte is
not thread safe
1. Use the macro version so that they're compiled out of production
builds
This commit was SVN r18455.
* allow receive_queues to be specified in the INI file
* detect when multiple different receive_queues are specified and
gracefully abort
However, accomplishing these goals ran into multiple difficulties. By
putting receive_queues in the INI file:
1. we may not find the value until we've already traversed multiple HCAs
1. we may find multiple different receive_queues values
But since the openib btl initializes as it discovers each HCA/port/LID
(including the BSRQ data), if we find a new receive_queues value late
in the discovery process, then all the BSRQ data that was previously
initialized will likely be invalid. So I had to pull all the BSRQ
initialization out until after the rest of the discovery /
initialization process.
Additionally, note that if the user specifies the MCA parameter
btl_openib_receive_queues, it trumps whatever was in the INI file. So
in this case, there can never be a receive_queues conflict. This
commit does the following (Jon wrote part of this, too):
* adapt _ini.c to accept the "receive_queues" field in the file
* move 90% of _setup_qps() from _ini.c to _component.c
* move what was left of _setup_qps() into the main
_register_mca_params() function
* adapt init_one_hca() to detect conflicting receive_queues values
from the INI file
* after the _component.c loop calling init_one_hca():
* call setup_qps() to parse the final receive_queues string value
* traverse all resulting btls and initialize their HCAs (if they
weren't already): setup some lists and call prepare_hca_for_use()
I tested this code on a dual-HCA system where I artificially put in
differing receive_queues values in the INI file for the two different
types of HCAs that I have and it all seemed to work.
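The precedence and conflict rules in this commit message can be sketched as follows. This is an illustrative Python model, not the openib BTL code: the function name `resolve_receive_queues`, the exception type, and the placeholder strings used for queue specifications are all hypothetical.

```python
from typing import Iterable, Optional


class ReceiveQueuesConflict(Exception):
    """Raised when HCAs disagree on receive_queues (hypothetical error type)."""


def resolve_receive_queues(mca_value: Optional[str],
                           ini_values: Iterable[Optional[str]],
                           default: str) -> str:
    """Pick the single receive_queues string to parse after HCA discovery."""
    # A user-supplied MCA parameter trumps the INI file, so no conflict
    # is possible in that case.
    if mca_value is not None:
        return mca_value
    # Collect the distinct values found in the INI file across all HCAs.
    distinct = {v for v in ini_values if v is not None}
    if len(distinct) > 1:
        raise ReceiveQueuesConflict(sorted(distinct))
    # Zero or one INI value: use it, or fall back to the default.
    return distinct.pop() if distinct else default
```

Because the final value is only known once every HCA has been examined, the sketch takes all INI values at once, which mirrors why the BSRQ setup had to move to after the discovery loop.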
This commit was SVN r18450.
The following Trac tickets were found above:
Ticket 1285 --> https://svn.open-mpi.org/trac/ompi/ticket/1285
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already made the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not the opal_* functions.
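The duplicate-suppression behavior described for orte_show_help() can be modeled roughly as below. This is a Python sketch under stated assumptions, not the ORTE implementation: the `HelpAggregator` class, keying messages by (file, topic), and passing the clock in as `now` are all choices made for illustration.

```python
from typing import Dict, List, Tuple


class HelpAggregator:
    """Toy model of duplicate help-message suppression on the HNP."""

    def __init__(self, interval: float = 5.0) -> None:
        self.interval = interval      # seconds between summary lines
        # (file, topic) -> duplicates seen since the last summary
        self.pending: Dict[Tuple[str, str], int] = {}
        self.last_summary = 0.0

    def receive(self, filename: str, topic: str, text: str,
                now: float) -> List[str]:
        """Return the lines to display for one incoming help message."""
        out: List[str] = []
        key = (filename, topic)
        if key not in self.pending:
            self.pending[key] = 0
            out.append(text)          # first instance: show it in full
        else:
            self.pending[key] += 1    # duplicate: only count it
        out.extend(self.flush(now))
        return out

    def flush(self, now: float) -> List[str]:
        """Emit one summary line per message with new duplicates."""
        if now - self.last_summary < self.interval:
            return []
        self.last_summary = now
        out = []
        for (filename, topic), count in self.pending.items():
            if count:
                out.append(f"{count} more copies of help message "
                           f"{filename} / {topic}")
                self.pending[(filename, topic)] = 0
        return out
```

Setting `orte_base_help_aggregate` to 0 would correspond to skipping this class entirely and displaying every message as it arrives.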
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so if you explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent via the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing of messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to have configure.m4 search for an appropriate XML
library, link it in, and use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
The iWARP subnet ID determination should not be in the RDMACM cpc, as
it was in the previous version, as this violates the cpc abstraction
that is present throughout the code. Also, this patch uses the
opal_list_t data struct instead of using its own linked lists.
This attempt includes *iwarp.c and *iwarp.h
This commit was SVN r18414.
the btl_openib_iwarp.c and btl_openib_iwarp.h files.
This commit was SVN r18410.
The following SVN revision numbers were found above:
r18409 --> open-mpi/ompi@056bbb68c8
The iWARP subnet ID determination should not be in the RDMACM cpc, as
it was in the previous version, as this violates the cpc abstraction
that is present throughout the code. Also, this patch uses the
opal_list_t data struct instead of using its own linked lists.
This commit was SVN r18409.
This enables subnet differentiation for iWARP devices, and rearranges
initialization so that the services are available when they are needed.
This commit was SVN r18393.
If there is no IP Address, have rdmacm log the correct error and let
another cpc have a go at it. This is being done by splitting off the
IP address checking logic for the modex message creation, and having
it log the correct error in the error case.
This commit was SVN r18392.
For iWARP, the TCP connection is tied to the QP once the QP is in RTS.
And destroying the QP is thus tied to connection teardown for iWARP.
This is a key distinction from IB, I think. Anyway, to destroy the
connection in iWARP you must move the QP out of RTS, either into CLOSING
for a nice graceful close, or to ERROR if you want to be rude. In both
cases, all pending non-completed SQ and RQ WRs must be flushed.
This patch ignores all flush errors reaped by the cq and removes an
earlier attempt to work around this in the rdmacm cpc.
This commit was SVN r18388.
If there are multiple QP's, RDMACM will not send a message if the
qpnum != 0. In doing so, it will log an error unnecessarily. This
removes that.
This commit was SVN r18363.
Add the logic to support using port numbers, instead of simply using
the IP address of the sending node to determine which endpoint to
connect. Since each process calls the cpc query function, it will
generate its own port to listen on, thus enabling this to work.
This commit was SVN r18362.
The endpoint may be appended to list during XOOB connection bring up.
This commit was SVN r18328.
The following SVN revision numbers were found above:
r17940 --> open-mpi/ompi@ebfdd133f5
Rationale (taken from the code):
/* This is PITA. We never know which source address an
* incoming/outgoing packet will have, so even with
* btl_tcp_if_include/exclude on the remote end, we
* might get a different source address.
*
* If this address isn't included in btl_proc->proc_addrs,
* we would erroneously drop the connection
*/
merge -r18165:18167 to the trunk.
This commit was SVN r18169.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r18165
r18167
mca_btl_openib_endpoint_post_rr_nolock is freeing the endpoint lock on
the error case, but most/all of the functions calling this free the lock
regardless of its error case. This results in a double free of the
lock.
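The double release described here can be reproduced in miniature. The following is a Python sketch, not the openib endpoint code: `post_rr_nolock` and `caller` are hypothetical stand-ins, and the `release_on_error` flag exists only to switch between the buggy and the fixed behavior.

```python
import threading


def post_rr_nolock(lock: threading.Lock, fail: bool,
                   release_on_error: bool) -> bool:
    """Stand-in for a helper that is called with the lock already held."""
    if fail:
        if release_on_error:
            lock.release()    # the bug: the caller still owns the lock
        return False
    return True


def caller(fail: bool, release_on_error: bool) -> bool:
    """Stand-in for a caller that releases the lock regardless of errors."""
    lock = threading.Lock()
    lock.acquire()
    ok = post_rr_nolock(lock, fail, release_on_error)
    lock.release()            # second release when the helper already did it
    return ok
```

In Python the second release raises `RuntimeError`, which makes the ownership mistake visible; leaving the release to the caller alone removes it.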
This commit was SVN r18131.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.
This commit was SVN r17926.
portals btl has ownership and therefore didn't free the frag as it should) this
causes leakage and hangs in MPI_Finalize.
Also added a bit more debugging.
This commit was SVN r17900.
Also, update some properties (source files should not be executable...), and remove a couple unneeded inclusions of orte_proc_table.h
This commit was SVN r17655.
return OMPI_ERR_UNREACH if the port returns an invalid speed or
width. OMPI_ERR_VALUE_OUT_OF_BOUNDS is reserved for when we exceed
the number of allowable BTLs.
This commit was SVN r17500.
We loop over all peer addresses and accept when one of them matches.
Note that this might break functionality: mca_btl_tcp_proc_insert now
always inserts the same endpoint. (is the lack of endpoints the problem?
should there be one for every remote address?)
Re #1206
This commit was SVN r17307.
- the registration array is now global instead of one per BTL.
- each framework has to declare the entries it reserves in the registration array. Then
it has to define the internal way of sharing (or not) these entries between all
components. As an example, the PML will not share as there is only one active PML
at any moment, while the BTLs will have to. The tag is 8 bits long; the first 3
are reserved for the framework while the remaining 5 are used internally by each
framework.
- The registration function is optional. If a BTL does not provide such a function,
nothing happens. However, in the case where such a function is provided in the BTL
structure, it will be called by the BML when a tag is registered.
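The 8-bit tag layout in the list above can be sketched as simple bit packing. This is a Python illustration, not the BTL code; it assumes that "the first 3" bits are the most significant ones, and the helper names `make_tag` and `split_tag` are hypothetical.

```python
from typing import Tuple

FRAMEWORK_BITS = 3                          # reserved for the framework
INTERNAL_BITS = 5                           # used inside each framework
INTERNAL_MASK = (1 << INTERNAL_BITS) - 1    # 0x1f


def make_tag(framework_id: int, internal_id: int) -> int:
    """Pack a framework id and an internal id into one 8-bit tag."""
    if not 0 <= framework_id < (1 << FRAMEWORK_BITS):
        raise ValueError("framework id must fit in 3 bits")
    if not 0 <= internal_id <= INTERNAL_MASK:
        raise ValueError("internal id must fit in 5 bits")
    # Assumption: the framework occupies the high 3 bits.
    return (framework_id << INTERNAL_BITS) | internal_id


def split_tag(tag: int) -> Tuple[int, int]:
    """Recover (framework_id, internal_id) from an 8-bit tag."""
    return tag >> INTERNAL_BITS, tag & INTERNAL_MASK
```

Under this layout there is room for 8 frameworks with 32 internal tags each.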
Now, it's time for the second step... Converting OB1 from a switch based PML to an
active message one.
This commit was SVN r17140.
for dynamic selection of cpc methods based on what is available. It
also allows for inclusion/exclusions of methods. It even further allows
for modifying the priorities of certain cpc methods to better determine
the optimal cpc method.
This patch also contains XRC compile time disablement (per Jeff's
patch).
At a high level, the cpc selection works by walking through each cpc
and allowing it to test to see if it is permissible to run on this
mpirun. It returns a priority if it is permissible or a -1 if not. All
of the cpc names and priorities are rolled into a string. This string
is then encapsulated in a message and passed around all the ompi
processes. Once received and unpacked, the list received is compared
to a local copy of the list. The connection method is chosen by
comparing the lists passed around to all nodes via modex with the list
generated locally. Any non-negative number is a potentially valid
connection method. The method below of determining the optimal
connection method is to take the cross-section of the two lists. The
highest single value (and the other side being non-negative) is selected
as the cpc method.
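The selection rule described above can be sketched as follows. This is a Python model rather than the openib code: the function name `select_cpc` is hypothetical, the priority values in the usage note are invented, and ties are broken by the order of the local list, which the commit message does not specify.

```python
from typing import Dict, Optional


def select_cpc(local: Dict[str, int], remote: Dict[str, int]) -> Optional[str]:
    """Pick a cpc from the intersection of the local and remote lists.

    Each dict maps a cpc name to its priority; -1 means "cannot run".
    """
    best_name: Optional[str] = None
    best_prio = -1
    for name, local_prio in local.items():
        remote_prio = remote.get(name, -1)
        # Both sides must report a non-negative priority.
        if local_prio < 0 or remote_prio < 0:
            continue
        # The highest single value on either side wins.
        prio = max(local_prio, remote_prio)
        if prio > best_prio:
            best_name, best_prio = name, prio
    return best_name
```

For example, with a local list of `{"oob": 50, "xoob": -1, "rdma_cm": 30}` and a remote list of `{"oob": 10, "xoob": 60, "rdma_cm": 40}`, xoob is excluded because the local side cannot run it, and oob is chosen because 50 is the highest remaining value.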
svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ .
This commit was SVN r17138.
header as double-word aligned and prevents bus errors on SPARC
based servers. This is part of fix for #1148.
Refs trac:1148
This commit was SVN r17090.
The following Trac tickets were found above:
Ticket 1148 --> https://svn.open-mpi.org/trac/ompi/ticket/1148
high prio QPs and low prio QPs) and because not all of them are polled each time
progress() is called (to save on latency) starvation is possible. The commit
fixes this. Now each channel is polled, but higher priority channels are polled
more often. Three new parameters are introduced that control polling ratios
between different channels.
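One way to picture ratio-based polling is the sketch below. It is a Python illustration only: the commit does not name its three parameters or say how the ratios are applied, so the `ratios` list and the modulo scheme are assumptions.

```python
from typing import List


def channels_to_poll(call_number: int, ratios: List[int]) -> List[int]:
    """Return the indexes of the channels polled on this progress call.

    ratios[i] == n means channel i is polled on every n-th call, so a
    ratio of 1 marks the highest-priority channel.
    """
    return [i for i, n in enumerate(ratios) if call_number % n == 0]
```

With ratios of `[1, 4, 8]`, channel 0 is polled on every call, channel 1 on every fourth call and channel 2 on every eighth, so no channel is starved while the high-priority channel still gets most of the attention.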
This commit was SVN r17024.
(sometimes after the merge with the ORTE branch), the opal_pointer_array
will become the only pointer_array implementation (the orte_pointer_array
will be removed).
This commit was SVN r17007.
so that it is higher than the new TCP BTL exclusivity as of r16942.
The portals BTL maintainer may want to do the same...
This commit was SVN r16995.
The following SVN revision numbers were found above:
r16942 --> open-mpi/ompi@80e9730100
about linkers, have all OPAL, ORTE, and OMPI components '''not''' link
against the OPAL, ORTE, or OMPI libraries.
See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for
details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a
better-formatted version of the same info).
This commit was SVN r16968.
should be passed via commandline. However, there is a slight coding
bug in the openib connect code. When registering the name of the
option, mca_base_param_reg_string will prepend the relevant info
("btl_openib_" in this case). The existing code will require
"btl_openib_btl_openib_connect" instead of "btl_openib_connect".
This patch corrects this.
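The double-prefix effect is easy to see in a toy model. This Python sketch does not reproduce the real signature of mca_base_param_reg_string; `reg_string` is a hypothetical stand-in that only mimics the prefixing behavior described above.

```python
def reg_string(framework: str, component: str, name: str) -> str:
    """Toy registration call that prepends the framework/component prefix."""
    return f"{framework}_{component}_{name}"


# Bug: the caller already included the prefix in the name it registered.
buggy = reg_string("btl", "openib", "btl_openib_connect")

# Fix: register only the short name and let the call add the prefix.
fixed = reg_string("btl", "openib", "connect")
```

Here `buggy` comes out as `btl_openib_btl_openib_connect`, while `fixed` is the intended `btl_openib_connect`.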
This commit was SVN r16937.
smaller than allocated size.
2. If reserve is zero, don't allocate a coalesced frag since it will be RDMAed, not
sent. The logic was the other way around.
This commit was SVN r16928.
mca_btl_openib_mca_setup_qps(). It looks like someone just forgot to
clean-up the previous call when they added the check for the return
code.
I ran a quick IMB test over IB to verify everything is still working.
This commit was SVN r16870.
parameters don't make any sense. Credits are never piggybacked. Also make
default queue sizes to be calculated from eager_limit and max_send_size values.
This commit was SVN r16816.
needed instead of creating it and then canceling if it is not needed. Change
error handling during finalize so that it will not skip async thread
destruction. Otherwise async thread may segfault during openib module unloading.
This commit was SVN r16782.
to a pending queue of eager rdma QP instead of correct pending list. This patch
fixes this by getting rid of the "eager rdma qp" notion. A packet is always sent
over its order QP. The patch also adds two pending queues for high and low prio
packets. Only high prio packets are sent over eager RDMA channel.
This commit was SVN r16780.
main idea (apart from cleanup) is to save on initialisation of unneeded fields
and to use the C type checking system to catch obvious errors.
This commit was SVN r16779.
has its own range which is defined by a min value and a range. By default
there is no limitation on the port range, which is exactly the same
behavior as before.
This commit was SVN r16584.
the use of the --mca btl_base_verbose flag. The
btl framework now matches all the other frameworks.
Slightly modify error messages for clarity.
This commit was SVN r16443.
* Fix some missing includes in a few places.
* Add the cr_request() functionality to the BLCR CRS component.
We are now dependent upon the 0.6.* series of BLCR.
* Made the CR notification mechanism a registered function.
This way we can have an OPAL-only version and it can be replaced at
runtime with the ORTE version.
* Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
CR functionality when the user wants it. Default: Disabled.
* Fix the placement of a checkpoint request check in MPI_Init
* Pull the OPAL notification mechanism into the SnapC framework.
* We no longer fork/exec the 'opal-checkpoint' command for local
checkpointing, the Local coordinator in the orted does this directly.
* The Local and Application coordinator talk together bypassing the OPAL
notification mechanism.
* Optimized the Local <-> App Coordinator communication.
* Improved the structure used to track vpid_snapshots in the local coord.
* Fix a race condition in which an application under heavy communication load
may produce an inconsistent global checkpoint.
This commit was SVN r16389.
Check if an exclusion string (i.e. '-mca btl ^sm') was provided; if so OFUD just disables itself.
This commit was SVN r16307.
The following Trac tickets were found above:
Ticket 1154 --> https://svn.open-mpi.org/trac/ompi/ticket/1154
Each one of them has a field to store QP type, but this is redundant.
Store qp type only in one structure (the component one).
This commit was SVN r16272.
meaning "infinite") is no longer larger than the minimum required
size. So put in an appropriate test to ensure that "infinite" was not
requested.
This commit was SVN r16142.
you'll get a helpful error message and the openib BTL will deactivate
itself.
This commit was SVN r16133.
The following Trac tickets were found above:
Ticket 1133 --> https://svn.open-mpi.org/trac/ompi/ticket/1133
Basically revert this part of r16015.
This commit was SVN r16029.
The following SVN revision numbers were found above:
r16015 --> open-mpi/ompi@435e7d80e9
the ompi_convertor_need_buffers function to only return 0 if the convertor
is homogeneous (which it never does on the trunk, but does on v1.2, but
that's a different issue). Only enable the heterogeneous rdma code for
a btl if it supports it (via a flag), as some btls need some work for this
to work properly. Currently only TCP and OpenIB are extensively tested.
This commit was SVN r15990.
to always first check for a NULL frag pointer before trying to send the
fragment. This avoids an issue in multi-threaded execution in which
multiple threads working on the same endpoint can result in a thread
finding itself here with nothing to send.
This commit was SVN r15963.
one HCA. Multiple ports, LMC, multiple BTLs per one LID. Having only one CQ for
all of them substantially reduces polling time.
This commit was SVN r15933.
semicolons but the new specification string used colons. The text
parser now looks for colons.
* Changed all opal_output() error messages to
much-more-helpful/descriptive opal_show_help() messages.
* A few minor style/indenting fixes
This commit was SVN r15850.
The following SVN revision numbers were found above:
r15848 --> open-mpi/ompi@dd30597f39
/tmp/jms-modular-wireup branch):
* This commit moves all the openib BTL connection code out of
btl_openib_endpoint.c and into a connect "pseudo-component" area,
meaning that different schemes for making OFA connections can
be chosen via function pointer (i.e., MCA parameter) at run-time.
* The connect/connect.h file includes comments describing the
specific interface for the connect pseudo-component.
* Two pseudo-components are in this commit (more can certainly be
added).
* oob: use the same old oob/rml scheme for creating OFA connections
that we've had forever; this now just puts the logic into this
self-contained pseudo-component.
* rdma_cm: a currently-empty set of functions (that currently
return NOT_IMPLEMENTED) that will someday use the RDMA connection
manager to make OFA connections.
This commit was SVN r15786.
This mpool will have no btl module owner because there was no btl created for
the HCA with no ports, but it will still be tracked in the mpool
framework (i.e., it's available).
If MPI_ALLOC_MEM is called by the app, one of two things will happen:
1. if there's an HCA on the host with some active ports, the openib
btl component will still be in the process space, and therefore
the "mpool with no btl" (MWNB) module will still be able to call
the reg/dereg functions, and all will be fine. However, if
MPI_FREE_MEM is never invoked to free the memory, bad things will
happen during MPI_FINALIZE. The pml is finalized, which finalizes
all the btls. The btls finalize all their mpools and all is fine.
But later we close down the mpool framework which then finalizes
any left over mpool modules, such as MWNB. However, the openib
BTL module functions that the MWNB was registered with are no
longer in the process space, and it segv's while trying to deregister
the memory.
2. if there are *no* HCA's on the host with active ports, then the
openib btl will have been unloaded, and when the MWNB tries to
register the memory, the functions it tries to call (in the openib
btl) are no longer there, and we segv.
This commit was SVN r15735.