1
1

289 Коммитов

Автор SHA1 Сообщение Дата
Vasily Filipov
c036c6ef95 Adding support for on-demand SRQ pre-post (receive wqe allocation)
This commit was SVN r22313.
2009-12-15 15:52:10 +00:00
Vasily Filipov
354bfe527f Improving support for non homogeneous OpenFabrics network configurations
This commit was SVN r22312.
2009-12-15 14:25:07 +00:00
Lenny Verkhovsky
796b765952 fixed finding minimum distance to ibv_device,
thanks to Pasha .

This commit was SVN r21916.
2009-08-31 07:54:22 +00:00
Rainer Keller
6050020c54 - Use OMPI_SUCCESS.
Fails to compile in environments with --disable-mpi

This commit was SVN r21785.
2009-08-10 17:46:25 +00:00
Rolf vandeVaart
c82e468ede Undo revision r21767 - sorry folks
This commit was SVN r21769.

The following SVN revision numbers were found above:
  r21767 --> open-mpi/ompi@41f38110ff
2009-08-05 22:23:26 +00:00
Rolf vandeVaart
41f38110ff HCA failover support in openib BTL
This commit was SVN r21767.
2009-08-05 21:53:02 +00:00
Nysal Jan
938599cb2d Fix build failure with latest IBM XL C/C++ v10.1 compiler. Also this seems like cleaner code.
This commit was SVN r21497.
2009-06-23 14:08:04 +00:00
Jeff Squyres
c39998db17 Also show the "you might not have enough registered memory" warning
message earlier in the openib BTL startup sequence

This commit was SVN r21469.
2009-06-18 12:24:39 +00:00
Jeff Squyres
814a8f5e0f * Fix #1916: endian problems in iwarp wireup on big endian machines
(now works on both big and little endian machines)
 * Be a little more flexible when looking for active devices in
   btl_openib_component.c
 * Add device name and port number to lots of verbose and help
   messages
 * Add a bunch of verbose messages to give insight into what is
   occurring during all the CPC wireups

This commit was SVN r21418.
2009-06-11 17:30:30 +00:00
Jeff Squyres
efd229b56b Clean up bunches of compiler warnings
This commit was SVN r21242.
2009-05-14 15:39:53 +00:00
Pavel Shamis
14eb11c18a Removing unused and duplicated XRC code.
I just found that we have 2 place where we call for XRC domain
creation. First one in init_one_device() and second one prepare_device_for_use().
They have absolutely identical code, but the call in init_one_device() is useless
because on this stage we don't know about QP configuration and we don't know if we need 
XRC at all. So I removing the duplicated code from init_one_device().

This commit was SVN r21235.
2009-05-14 13:15:00 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Jeff Squyres
8704d33f4e Revamp one of the credits system checks and make it smaller, simpler,
and mo'bettah.  Put in lengthy comments explaining what's going on.
We might still want to tweak this some more, but we can no longer get
IMB-EXT to hang with this new code anymore (e.g., even without eager
RDMA -- we discovered after the fact that the code in the v1.3.2
release will hang if eager RDMA is disabled).

Fixes trac:1890.  Really.

This commit was SVN r21061.

The following Trac tickets were found above:
  Ticket 1890 --> https://svn.open-mpi.org/trac/ompi/ticket/1890
2009-04-23 20:04:39 +00:00
Jeff Squyres
9d94fa2308 Fixes trac:1890: we weren't checking the return status of
_endpoint_post_send(), which could result in an infinite loop (see the
comment in the code).

This is part one of a proper fix; it's suitable for the v1.3 tree and
for an immediate release.  Pasha and I plan to spend a little more
time and clean up this stuff properly, but it does not need to be
included in v1.3.2.

This commit was SVN r21047.

The following Trac tickets were found above:
  Ticket 1890 --> https://svn.open-mpi.org/trac/ompi/ticket/1890
2009-04-21 16:57:17 +00:00
Jeff Squyres
0d52271cd6 Per http://www.open-mpi.org/community/lists/announce/2009/03/0029.php
and https://svn.open-mpi.org/trac/ompi/ticket/1853, mallopt() hints do
not always work -- it is possible for memory to be returned to the OS
and therefore OMPI's registration cache becomes invalid.

This commit removes all use of mallopt() and uses a different way to
integrate ptmalloc2 than we have done in the past.  In particular, we
use almost exactly the same technique as MX:

 * Remove all uses of mallopt, to include the opal/memory mallopt
   component.
 * Name-shift all of OMPI's internal ptmalloc2 public symbols (e.g.,
   malloc -> opal_memory_ptmalloc2_malloc).
 * At run-time, use the existing glibc allocator malloc hook function
   pointers to fully hijack the glibc allocator with our own
   name-shifted ptmalloc2.
 * Make the decision whether to hijack the glibc allocator ''at run
   time'' (vs. at link time, as previous ptmalloc2 integration
   attempts have done).  Look at the OMPI_MCA_mpi_leave_pinned
   and OMPI_MCA_mpi_leave_pinned_pipeline environment variables and
   the existence of /sys/class/infiniband to determine if we should
   install the hooks or not.
 * As an added bonus, we can now tell if libopen-pal is linked
   statically or dynamically, and if we're linked statically, we
   assume that munmap intercept support doesn't work.

See the opal/mca/memory/ptmalloc2/README-open-mpi.txt file for all the
gory details about the implementation.

Fixes trac:1853.

This commit was SVN r20921.

The following Trac tickets were found above:
  Ticket 1853 --> https://svn.open-mpi.org/trac/ompi/ticket/1853
2009-04-01 17:52:16 +00:00
Donald Kerr
27ed29a0a1 wrap linux specific steps in __linux__ define
This commit was SVN r20888.
2009-03-26 15:09:01 +00:00
Pavel Shamis
d25b7203a2 Adding send_immediate (sendi) implementation to openib btl.
This commit was SVN r20881.
2009-03-25 16:53:26 +00:00
Pavel Shamis
8888c9831c Prevent segfault for case when we release SRQ before srq_create.
This commit was SVN r20872.
2009-03-25 14:18:05 +00:00
Pavel Shamis
c6d038a8e8 Adding vendor_error code to error report.
This commit was SVN r20803.
2009-03-17 15:47:34 +00:00
Rainer Keller
296a6fb275 - So much fun along the way:
we normally don't do opal/include/opal/...
   Just use the std. opal/...

This commit was SVN r20766.
2009-03-12 19:21:11 +00:00
Rainer Keller
ec0ed48718 - Revert r20739
This commit was SVN r20742.

The following SVN revision numbers were found above:
  r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
781caee0b6 - First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Rainer Keller
9dea63d63a - Last of intrusive commits (promised)... err for now.
Anyway, this is blocking the move: do not include pml.h
   if not really needed, aka none of the following used:
     mca_pml
     MCA_PML_CALL
     OMPI_ANY_TAG
     OMPI_ANY_SOURCE
     OMPI_PROC_NULL

 - Notable exceptions (deleting in one header->adding):
   - ompi/mca/mtl/psm/
   - ompi/mca/osc/rdma/
   - ompi/mca/btl/openib/btl_openib_endpoint.c depended on
     pml_base_sendreq.h

 - Tested on Linux/x86-64, this time including make check
   (thanks Jeff and Ralph)

This commit was SVN r20725.
2009-03-04 17:06:51 +00:00
Tim Mattox
57be80c983 First pass at integrating the CIFTS/FTB support as
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.

To compile with FTB support, use --with-ftb=/path/to/ftb/install

CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php

This commit was SVN r20655.
2009-02-27 22:53:43 +00:00
Rainer Keller
04567d3af0 - Header orte/mca/errmgr/errmgr.h is not needed.
Once again compiles fine with -Wimplicit-function-declaration   

This commit was SVN r20640.
2009-02-26 04:05:30 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or               
   orte/mca/rml/rml_types.h                                           
   Please see orte_show_help_replacement.sh commited next.            

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"     
   in orte/mca/odls/base/odls_base_default_fns.c and                  
   in orte/tools/orte-top/orte-top.c                                  
   Manually added these.                                              

   Let's have MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Pavel Shamis
391b101439 Renaming pending_frags to no_credits_pending_frags.
(this commit is part of bug fix for ticket #1693)

This commit was SVN r20217.
2009-01-07 14:41:20 +00:00
Pavel Shamis
2f7b66160b Adding real fix for ticket #1693 - XRC + coalescing segfault.
This commit was SVN r20214.
2009-01-07 14:10:58 +00:00
Avneesh Pant
c1e508750b Check the active port MTU against the MTU statically configured for the HCA. QLogic HCA's capable of MTU had an issue when connected to switches running at 2K.
This commit was SVN r20131.
2008-12-15 21:17:58 +00:00
Nysal Jan
6a5454b76a Fixes crash in openib BTL on a heterogeneous cluster Refs trac:1700
This commit was SVN r20113.

The following Trac tickets were found above:
  Ticket 1700 --> https://svn.open-mpi.org/trac/ompi/ticket/1700
2008-12-10 22:07:48 +00:00
Jon Mason
54b4e9901c Gracefully handle NULL strings when calling orte_show_help for preventing usage
of more than one of the btl_openib_if_include, btl_openib_if_exclude,
btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude MCA parameters.

This commit was SVN r20053.
2008-12-02 23:17:46 +00:00
Jon Mason
91b26eba67 This commit adds comments regarding IP Aliases and the default behavior when
determining which IP address to use when transmitting data.  Also it adds logic
to prevent usage of more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude
MCA parameters.

This should complete the code modifications needed for ticket 1665.

This commit was SVN r20052.
2008-12-02 22:42:01 +00:00
Nysal Jan
e4bdaac6d8 Fixed the case where a device does not support inline data. Redefined the interpretation of max_inline_data MCA parameter.
* If max_inline_data == -1 perform runtime detection 
* If max_inline_data >=0 use the value provided 
* If the user does not explicitly set this via command line, use the value from INI file

This commit fixes trac:1662

This commit was SVN r19995.

The following Trac tickets were found above:
  Ticket 1662 --> https://svn.open-mpi.org/trac/ompi/ticket/1662
2008-11-14 12:15:35 +00:00
Ralph Castain
ce26e3a2fb Update the notifier framework in prep for move to v1.3. Add an API to handle the case where error messages have been expressed via "show_help" so they can look similar to what was presented to users. Add three key calls in the openib btl to drop messages into syslog.
This will sit in trunk for a few days - would like to actually see some errors reported to syslog before moving the code to 1.3

This commit was SVN r19986.
2008-11-12 18:03:51 +00:00
Jeff Squyres
3b1a1d75f8 Fix problem on RHEL4U3 when compiling with the PGI 32 bit compiler:
<infiniband/driver.h> cannot be included because it will fail to
compile.  So per advice from Roland, in that case, just put manually
include the one ibv_*() prototype that we need.  Use an undocumented
feature of AC_CHECK_HEADERS to check for <infiniband/driver.h> that I
was told on the autoconf mailing list (see the comment for more
details).

This commit was SVN r19857.
2008-10-29 21:45:06 +00:00
Jeff Squyres
6b6c08ef67 Fixes trac:1588: have several BTLs disable themselves in the presence of
THREAD_MULTIPLE.  There's a new (hidden) MCA parameter to re-enable
these BTLs in the presence of THREAD_MULTIPLE:
btl_base_thread_multiple_override.  This MCA parameter should ''only''
be used by developers who are working on make their BTLs thread safe;
it should ''not'' be used by end-users!

This commit was SVN r19826.

The following Trac tickets were found above:
  Ticket 1588 --> https://svn.open-mpi.org/trac/ompi/ticket/1588
2008-10-28 18:29:57 +00:00
Jeff Squyres
ae34fd150a It always helps to initialize a variable before you try to use it
(vs. only initializing it in some cases).

This commit was SVN r19796.
2008-10-24 14:20:07 +00:00
Jeff Squyres
307d52aedc Ensure to check for internal ptmalloc2 support properly.
This commit was SVN r19760.
2008-10-17 16:27:20 +00:00
Josh Hursey
88aa45dd52 Commit to bring online OpenIB, MX, and shared memory support for Open MPI's checkpoint/restart functionality. Some tuning is still needed, but basic functionality is in place.
There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539

Additionally:
* Fix a file cleanup bug in CRS Base.
* Fix a possible deadlock in the TCP ft_event function
* Add a mca_base_param_deregister() function to MCA base
* Add whole process checkpoint timers
* Add support for BTL: OpenIB, MX,  Shared Memory
* Add support Mpool: rdma, sm
* Sundry bounds checking an cleanup in some scattered functions

This commit was SVN r19756.
2008-10-16 15:09:00 +00:00
Lenny Verkhovsky
7ab9a72d0f merge with r19717, memory barrier on PPC
I run IMB exchange on two QS22 machines with r19674 and it got stucked after 256 or 512 bytes every time.
After applying r19717 the test passed, so I guess this is a essential patch.

This commit was SVN r19752.

The following SVN revision numbers were found above:
  r19674 --> open-mpi/ompi@15c47a2473
  r19717 --> open-mpi/ompi@0a765cd788
2008-10-15 16:56:42 +00:00
Jeff Squyres
b8b7619312 * Remove pkey index as an MCA param
* Change name: mca_btl_openib_of_pkey_value -> mca_btl_openib_of_pkey
   (since now there's no index, the "_value" suffix is somewhat
   superfluous)
 * Put in a better help message for the _pkey MCA param (to agree with
   the new help message in v1.2.8)

This commit was SVN r19716.
2008-10-08 20:55:40 +00:00
Pavel Shamis
eefb66a133 Fixing openib partition support.
This commit was SVN r19705.
2008-10-08 09:56:43 +00:00
Jeff Squyres
c42ab8ea37 Fixes trac:1210, #1319
Commit from a long-standing Mercurial tree that ended up incorporating a lot of things:

 * A few fixes for CPC interface changes in all the CPCs
 * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC
 * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC)
   * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag)
   * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. 
   * CPCs allocate a CTS buffer only if they're about to make a connection
 * RDMA CM improvements:
   * Use threaded mode openib fd monitoring to wait for for RDMA CM events
   * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions
   * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors
   * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type
   * Renamed many variables to be internally consistent
   * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator
   * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) 
   * Use rdma_create_qp() instead of ibv_create_qp()
 * openib fd monitoring improvements:
   * Renamed a bunch of functions and variables to be a little more obvious as to their true function
   * Use pipes to communicate between main thread and service thread
   * Add ability for main thread to invoke a function back on the service thread 
   * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3))
   * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port
   * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout
 * Other improvements:
   * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp")
   * Somewhat improved error handling
   * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements
   * Oodles of little coding style fixes
   * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction
   * Added some more show_help error messages
   * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) 

This commit was SVN r19686.

The following Trac tickets were found above:
  Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 00:46:02 +00:00
Jeff Squyres
8b786cac04 The configure test we had for checking whether openib could build
(related to the presence of posix threads and ptmalloc2) is now a
little outdated: since we don't build ptmalloc2 as part of libopal
anymore, the openib BTL's requirements are not directly tied to
ptmalloc2's anymore.  Specifically, I altered the test to:

 1. At compile time, if no threads are found, the ptmalloc2 component
    is going to be built, '''and the ptmalloc2 component is going to be
    inside libopal,''' then refuse to build the openib BTL.
 1. At run time, if no threads were available at compile time and the
    ptmalloc2 component is part of the process, then refuse to use the
    openib BTL.

Fixes trac:1537.

This commit was SVN r19652.

The following Trac tickets were found above:
  Ticket 1537 --> https://svn.open-mpi.org/trac/ompi/ticket/1537
2008-09-27 11:19:21 +00:00
Jeff Squyres
d2d06008a0 Change the default value of mpi_leave_pinned to -1, meaning that we'll
figure it out at runtime (really meaning: we'll still default to "0"
unless something explicitly overrides to 1, such as the openib BTL).
This way, ompi_info doesn't confusingly report mpi_leave_pinned==0 for
mpi_leave_pinned, but we end up running with mpi_leave_pinned==1.

Fixes trac:1502.

This commit was SVN r19571.

The following Trac tickets were found above:
  Ticket 1502 --> https://svn.open-mpi.org/trac/ompi/ticket/1502
2008-09-16 22:06:14 +00:00
Jeff Squyres
7b05a14d9a Back up r19489, which was the result of a "svn ci -m ..." instead of
an "hg ci -m ...".  Oops.

This commit was SVN r19490.

The following SVN revision numbers were found above:
  r19489 --> open-mpi/ompi@ea866f9e26
2008-09-03 08:45:33 +00:00
Jeff Squyres
ea866f9e26 Up to SVN r19488
This commit was SVN r19489.

The following SVN revision numbers were found above:
  r19488 --> open-mpi/ompi@c0b8a4a9b5
2008-09-03 08:35:59 +00:00
Ralph Castain
4ef9d15d97 Revamp the opal mca paffinity interface. We ran into a problem when we encountered machines that had "holes" in their physical processor layout - e.g., machines that supported "hotplugging", or that had unpopulated sockets. To solve that problem, we had to clarify at the API level where we were describing physical vs logical processor info, and then translate accordingly in the underlying implementation.
See opal/mca/paffinity/paffinity.h for explanation as to the physical vs logical nature of the params used in the API.

Fixes trac:1435

This commit was SVN r19391.

The following Trac tickets were found above:
  Ticket 1435 --> https://svn.open-mpi.org/trac/ompi/ticket/1435
2008-08-21 19:21:28 +00:00
Jeff Squyres
4aff8038f9 A compromise -- move the shutting down of the async thread to the
component_close, when we know all the btl modules are gone and there
will be no more of them created.

This commit was SVN r19214.
2008-08-07 13:48:38 +00:00