allows the BTL to specify a specific ompi_proc_t that had an
error. Also add an optional descriptive string. Currently, arguments
are not used but will be by future failover PML.
Changes based on RFC. Reviewed by George Bosilca.
This commit was SVN r23174.
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
back the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
This commit was SVN r23162.
http://marc.info/?l=linux-mm-commits&m=127352503417787&w=2 for more
details.
* Remove the ptmalloc memory component; replace it with a new "linux"
memory component.
* The linux memory component will conditionally compile in support
for ummunotify. At run-time, if it has ummunotify support and
finds run-time support for ummunotify (i.e., /dev/ummunotify), it
uses it. If not, it tries to use ptmalloc via the glibc memory
hooks.
* Add some more API functions to the memory framework to accomodate
the ummunotify model (i.e., poll to see if memory has "changed").
* Add appropriate calls in the rcache to the new memory APIs to see
if memory has changed, and to react accordingly.
* Add a few comments in the openib BTL to indicate why we don't need
to notify the OPAL memory framework about specific instances of
registered memory.
* Add dummy API calls in the solaris malloc component (since it
doesn't have polling/"did memory change" support).
This commit was SVN r23113.
btl_openib_ip.*. The routines in these files are not specific to
iwarp -- they are specific to IP interfaces used with IBV devices
(even IB or IBoE/RoCEE/whatever devices).
This commit was SVN r22718.
issues with iwarp.c. These fixes are needed for IBoE / ROCEE /
whateveritscalledtoday. I added a few minor changes to his base
patch.
This commit was SVN r22717.
long-standing bugs (see trac ticket list below). They're currently
somewhat obscure bugs, but are becoming much more relevant in a world
where OpenFabrics devices fail and you replace them with a newer model
(i.e., the cluster is homogeneous... ''except'' for where you had to
replace one or two OpenFabrics devices, and the same model is no
longer available).
This commit includes a '''lengthy''' comment (that we spent a lot of
time writing!) about what exactly it does and does not do. The
previous code was rather short and '''incredibly''' subtle. The new
code is slightly longer, but is both much more explicit and much more
painstakingly documented.
This commit fixes multiple trac tickets. The real one that we fix is
#1707; the others are fixed as a side-effect. In short: fixing #1707
prevents Bad Things from happening later in the startup sequence.
Fixes trac:1707, #2164, #1574.
cmr:v1.4.2:reviewer=pasha
cmr:v1.5:reviewer=pasha
This commit was SVN r22592.
The following Trac tickets were found above:
Ticket 1707 --> https://svn.open-mpi.org/trac/ompi/ticket/1707
(now works on both big and little endian machines)
* Be a little more flexible when looking for active devices in
btl_openib_component.c
* Add device name and port number to lots of verbose and help
messages
* Add a bunch of verbose messages to give insight into what is
occurring during all the CPC wireups
This commit was SVN r21418.
I just found that we have 2 place where we call for XRC domain
creation. First one in init_one_device() and second one prepare_device_for_use().
They have absolutely identical code, but the call in init_one_device() is useless
because on this stage we don't know about QP configuration and we don't know if we need
XRC at all. So I removing the duplicated code from init_one_device().
This commit was SVN r21235.
OMPI_* to OPAL_*. This allows opal layer to be used more independent
from the whole of ompi.
NOTE: 9 "svn mv" operations immediately follow this commit.
This commit was SVN r21180.
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh after applying
patches, that include headers, being "lost" due to
inclusion in one of the now deleted headers...
In total 817 files are touched.
In ompi/mpi/c/ header files are moved up into the actual c-file,
where necessary (these are the only additional #include),
otherwise it is only deletions of #include (apart from the above
additions required due to notifier...)
- To get different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
and mo'bettah. Put in lengthy comments explaining what's going on.
We might still want to tweak this some more, but we can no longer get
IMB-EXT to hang with this new code anymore (e.g., even without eager
RDMA -- we discovered after the fact that the code in the v1.3.2
release will hang if eager RDMA is disabled).
Fixes trac:1890. Really.
This commit was SVN r21061.
The following Trac tickets were found above:
Ticket 1890 --> https://svn.open-mpi.org/trac/ompi/ticket/1890
_endpoint_post_send(), which could result in an infinite loop (see the
comment in the code).
This is part one of a proper fix; it's suitable for the v1.3 tree and
for an immediate release. Pasha and I plan to spend a little more
time and clean up this stuff properly, but it does not need to be
included in v1.3.2.
This commit was SVN r21047.
The following Trac tickets were found above:
Ticket 1890 --> https://svn.open-mpi.org/trac/ompi/ticket/1890
and https://svn.open-mpi.org/trac/ompi/ticket/1853, mallopt() hints do
not always work -- it is possible for memory to be returned to the OS
and therefore OMPI's registration cache becomes invalid.
This commit removes all use of mallopt() and uses a different way to
integrate ptmalloc2 than we have done in the past. In particular, we
use almost exactly the same technique as MX:
* Remove all uses of mallopt, to include the opal/memory mallopt
component.
* Name-shift all of OMPI's internal ptmalloc2 public symbols (e.g.,
malloc -> opal_memory_ptmalloc2_malloc).
* At run-time, use the existing glibc allocator malloc hook function
pointers to fully hijack the glibc allocator with our own
name-shifted ptmalloc2.
* Make the decision whether to hijack the glibc allocator ''at run
time'' (vs. at link time, as previous ptmalloc2 integration
attempts have done). Look at the OMPI_MCA_mpi_leave_pinned
and OMPI_MCA_mpi_leave_pinned_pipeline environment variables and
the existence of /sys/class/infiniband to determine if we should
install the hooks or not.
* As an added bonus, we can now tell if libopen-pal is linked
statically or dynamically, and if we're linked statically, we
assume that munmap intercept support doesn't work.
See the opal/mca/memory/ptmalloc2/README-open-mpi.txt file for all the
gory details about the implementation.
Fixes trac:1853.
This commit was SVN r20921.
The following Trac tickets were found above:
Ticket 1853 --> https://svn.open-mpi.org/trac/ompi/ticket/1853
Adapt orte_process_info to orte_proc_info, and
change orte_proc_info() to orte_proc_info_init().
- Compiled on linux-x86-64
- Discussed with Ralph
This commit was SVN r20739.
Anyway, this is blocking the move: do not include pml.h
if not really needed, aka none of the following used:
mca_pml
MCA_PML_CALL
OMPI_ANY_TAG
OMPI_ANY_SOURCE
OMPI_PROC_NULL
- Notable exceptions (deleting in one header->adding):
- ompi/mca/mtl/psm/
- ompi/mca/osc/rdma/
- ompi/mca/btl/openib/btl_openib_endpoint.c depended on
pml_base_sendreq.h
- Tested on Linux/x86-64, this time including make check
(thanks Jeff and Ralph)
This commit was SVN r20725.
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.
To compile with FTB support, use --with-ftb=/path/to/ftb/install
CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php
This commit was SVN r20655.
Often, orte/util/show_help.h is included, although no functionality
is required -- instead, most often opal_output.h, or
orte/mca/rml/rml_types.h
Please see orte_show_help_replacement.sh commited next.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
actually showed two *missing* #include "orte/util/show_help.h"
in orte/mca/odls/base/odls_base_default_fns.c and
in orte/tools/orte-top/orte-top.c
Manually added these.
Let's have MTT the last word.
This commit was SVN r20557.
of more than one of the btl_openib_if_include, btl_openib_if_exclude,
btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude MCA parameters.
This commit was SVN r20053.
determining which IP address to use when transmitting data. Also it adds logic
to prevent usage of more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude
MCA parameters.
This should complete the code modifications needed for ticket 1665.
This commit was SVN r20052.
* If max_inline_data == -1 perform runtime detection
* If max_inline_data >=0 use the value provided
* If the user does not explicitly set this via command line, use the value from INI file
This commit fixes trac:1662
This commit was SVN r19995.
The following Trac tickets were found above:
Ticket 1662 --> https://svn.open-mpi.org/trac/ompi/ticket/1662
This will sit in trunk for a few days - would like to actually see some errors reported to syslog before moving the code to 1.3
This commit was SVN r19986.
<infiniband/driver.h> cannot be included because it will fail to
compile. So per advice from Roland, in that case, just put manually
include the one ibv_*() prototype that we need. Use an undocumented
feature of AC_CHECK_HEADERS to check for <infiniband/driver.h> that I
was told on the autoconf mailing list (see the comment for more
details).
This commit was SVN r19857.
THREAD_MULTIPLE. There's a new (hidden) MCA parameter to re-enable
these BTLs in the presence of THREAD_MULTIPLE:
btl_base_thread_multiple_override. This MCA parameter should ''only''
be used by developers who are working on make their BTLs thread safe;
it should ''not'' be used by end-users!
This commit was SVN r19826.
The following Trac tickets were found above:
Ticket 1588 --> https://svn.open-mpi.org/trac/ompi/ticket/1588
There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539
Additionally:
* Fix a file cleanup bug in CRS Base.
* Fix a possible deadlock in the TCP ft_event function
* Add a mca_base_param_deregister() function to MCA base
* Add whole process checkpoint timers
* Add support for BTL: OpenIB, MX, Shared Memory
* Add support Mpool: rdma, sm
* Sundry bounds checking an cleanup in some scattered functions
This commit was SVN r19756.
I run IMB exchange on two QS22 machines with r19674 and it got stucked after 256 or 512 bytes every time.
After applying r19717 the test passed, so I guess this is a essential patch.
This commit was SVN r19752.
The following SVN revision numbers were found above:
r19674 --> open-mpi/ompi@15c47a2473
r19717 --> open-mpi/ompi@0a765cd788