1
1
Граф коммитов

4102 Коммитов

Автор SHA1 Сообщение Дата
Terry Dontje
b5a30af20e Correct double free of event on tcp_events list. Refs trac:1629.
This commit was SVN r19890.

The following Trac tickets were found above:
  Ticket 1629 --> https://svn.open-mpi.org/trac/ompi/ticket/1629
2008-11-03 19:03:57 +00:00
Matthias Jurenz
c4f0eb276a Added configure check for RTLD_NEXT, which necessary for LIBC's I/O tracing
This commit was SVN r19883.
2008-11-03 10:19:40 +00:00
George Bosilca
d06091cf84 Completely refactor the connection procedure for MX. It is now cleaner, and
in case of failure more user friendly. In addition we now fully integrate
with the MX_BONDING features (i.e the multi-rails bonding can be done either
at the PML OB1 level, or deep down in the MX library). This feature is driven
by the btl_mx_bonding MCA parameter.
Plus some extra features:
- Correctly compute the bandwidth in case of bonding.
- Nicely handle any mapper access errors.

This commit was SVN r19874.
2008-11-01 17:05:48 +00:00
Jeff Squyres
ae86c0eeca Ensure that the btl variable is always initialized.
This commit was SVN r19873.
2008-11-01 11:24:11 +00:00
George Bosilca
ef3d8cd2a1 Release the memory if we fail to open an MX endpoint.
This commit was SVN r19871.
2008-10-31 23:23:14 +00:00
Jeff Squyres
57c52d373b Minor nit: use the real filename. opal_show_help() will ultimately
find the file even if we don't specify the ".txt" extension here, but
we might as well do it Right in the first place.

This commit was SVN r19867.
2008-10-31 21:48:18 +00:00
Jeff Squyres
3b1a1d75f8 Fix problem on RHEL4U3 when compiling with the PGI 32 bit compiler:
<infiniband/driver.h> cannot be included because it will fail to
compile.  So per advice from Roland, in that case, just put manually
include the one ibv_*() prototype that we need.  Use an undocumented
feature of AC_CHECK_HEADERS to check for <infiniband/driver.h> that I
was told on the autoconf mailing list (see the comment for more
details).

This commit was SVN r19857.
2008-10-29 21:45:06 +00:00
Matthias Jurenz
83005d42b9 Updated Copyright information
This commit was SVN r19849.
2008-10-29 16:12:53 +00:00
Rolf vandeVaart
95bf870055 Fix the allocation for the pointers to the btl structures.
We were only allocating one whereas we would typically need
one per interface.  Most of the time we got lucky and we 
happily trundled over unallocated memory.  But with enough
interfaces, we would get a SEGV.

Also add include to fix compilation with static libs.

This commit was SVN r19845.
2008-10-29 15:15:37 +00:00
Matthias Jurenz
e119e42c08 Fixed memory leak in 'OTF_vsnprintf()'
This commit was SVN r19842.
2008-10-29 12:38:12 +00:00
Matthias Jurenz
93fcd24832 Added using of a portable implementation of 'snprintf()', that comes also with the VT-package
This commit was SVN r19836.
2008-10-29 10:03:23 +00:00
Jeff Squyres
b50a4f126a Gahh! Today is the day for accidental commits. Back out the op
portion of r19826.

This commit was SVN r19827.

The following SVN revision numbers were found above:
  r19826 --> open-mpi/ompi@6b6c08ef67
2008-10-28 18:56:06 +00:00
Jeff Squyres
6b6c08ef67 Fixes trac:1588: have several BTLs disable themselves in the presence of
THREAD_MULTIPLE.  There's a new (hidden) MCA parameter to re-enable
these BTLs in the presence of THREAD_MULTIPLE:
btl_base_thread_multiple_override.  This MCA parameter should ''only''
be used by developers who are working on make their BTLs thread safe;
it should ''not'' be used by end-users!

This commit was SVN r19826.

The following Trac tickets were found above:
  Ticket 1588 --> https://svn.open-mpi.org/trac/ompi/ticket/1588
2008-10-28 18:29:57 +00:00
Jeff Squyres
96f7d971d3 Refs trac:1589: remove the OMPI_DECLSPEC because it doesn't need to be
there.

This commit was SVN r19825.

The following Trac tickets were found above:
  Ticket 1589 --> https://svn.open-mpi.org/trac/ompi/ticket/1589
2008-10-28 17:24:31 +00:00
Jeff Squyres
fdbb2d01aa Fix the wrapper compiler flags to get the right C++ exceptions flags
(this was missed in #1585).  Also, fix a long-standing problem that
the F90 wrapper compilers were using the F77 wrapper compiler flags.

This commit was SVN r19819.
2008-10-28 14:26:47 +00:00
Jeff Squyres
be21ea2391 Back out r19805 and replace it with something a bit better, based on a
suggestion from George during review.

Refs trac:1595.

This commit was SVN r19818.

The following SVN revision numbers were found above:
  r19805 --> open-mpi/ompi@bc710baa75

The following Trac tickets were found above:
  Ticket 1595 --> https://svn.open-mpi.org/trac/ompi/ticket/1595
2008-10-28 12:25:08 +00:00
Matthias Jurenz
bd802e06c8 Undo last *faulty* checkin
This commit was SVN r19816.
2008-10-27 19:14:32 +00:00
Jeff Squyres
200a8552fe Fixes trac:1598: it's not an error if a CPC returns UNREACH.
This commit was SVN r19815.

The following Trac tickets were found above:
  Ticket 1598 --> https://svn.open-mpi.org/trac/ompi/ticket/1598
2008-10-27 18:23:56 +00:00
Matthias Jurenz
286321210a Minor bugfix in VTGen_write_COMMENT():
Call VTGEN_ALLOC_EVENT instead of VTGEN_ALLOC_DEF

This commit was SVN r19808.
2008-10-26 12:06:13 +00:00
Matthias Jurenz
1dc0348665 Fixed compiler error:
replaced 'using std::sprintf' by' using std::snprintf'

This commit was SVN r19807.
2008-10-26 12:05:56 +00:00
Matthias Jurenz
894df5072b Fixed minor portability issues under Windows
This commit was SVN r19806.
2008-10-26 12:05:27 +00:00
Jeff Squyres
bc710baa75 Remove a compiler warning from ompi_info by using 2 macros to record
whether MPI API parameters should be checked never, always, or
determined at run-time.

This commit was SVN r19805.
2008-10-25 20:54:02 +00:00
Jeff Squyres
b5123cb79f Don't destroy the event channel until after everything else has been
torn down.  Fixes trac:1582.

This commit was SVN r19800.

The following Trac tickets were found above:
  Ticket 1582 --> https://svn.open-mpi.org/trac/ompi/ticket/1582
2008-10-24 15:04:54 +00:00
Jeff Squyres
ae34fd150a It always helps to initialize a variable before you try to use it
(vs. only initializing it in some cases).

This commit was SVN r19796.
2008-10-24 14:20:07 +00:00
Jeff Squyres
238a6f851f Wow. And I won't name who it was who did this. You know who you are! :-)
We had a public symbol named "already_opened".  This commit changes
the name to mca_btl_base_already_opened.

I guess it's good we have the visibility stuff enabled by default!
:-)

This commit was SVN r19791.
2008-10-23 18:56:10 +00:00
George Bosilca
9260f6b157 There is no reason to ask for an ACK from the BTL here.
This commit was SVN r19789.
2008-10-22 20:13:33 +00:00
George Bosilca
7dfdf3e907 A lot of MX fixes.
1. Allow MX bonding via btl_mx_bonding MCA parameter. With this on, Open MPI
  suppose that lib MX will do the bonding, and we will only return one BTL.
  Otherwise, we return as many as devices.
2. Decrease the memory footprint, by cleaning up what we store about the
  peers and how we store it.
3. Allow multiple MX routes that share the same mapper. In this particular
  case we will link by their nic_id.
4. Allow multiple MX routes with multiple mappers. In this case we match
  the NICs based on the last 6 digits of the mapper MAC.
5. Increase the size of the eager and rendez-vous eager limits in the
  case where we are unable to register an unexpected callback with MX.
6. Increase the default max number of MX fragments.
7. Increase the max number of MX BTLs.
8. Only allow mx_if_include and mx_if_exclude if we have acess to the
  mapper.

This commit was SVN r19788.
2008-10-22 20:12:30 +00:00
Ralph Castain
6e5d844c36 Roll in the revamped IOF subsystem. Per the devel mailing list email, this is a complete rewrite of the iof framework designed to simplify the code for maintainability, and to support features we had planned to do, but were too difficult to implement in the old code. Specifically, the new code:
1. completely and cleanly separates responsibilities between the HNP, orted, and tool components.

2. removes all wireup messaging during launch and shutdown.

3. maintains flow control for stdin to avoid large-scale consumption of memory by orteds when large input files are forwarded. This is done using an xon/xoff protocol.

4. enables specification of stdin recipients on the mpirun cmd line. Allowed options include rank, "all", or "none". Default is rank 0.

5. creates a new MPI_Info key "ompi_stdin_target" that supports the above options for child jobs. Default is "none".

6. adds a new tool "orte-iof" that can connect to a running mpirun and display the output. Cmd line options allow selection of any combination of stdout, stderr, and stddiag. Default is stdout.

7. adds a new mpirun and orte-iof cmd line option "tag-output" that will tag each line of output with process name and stream ident. For example, "[1,0]<stdout>this is output"

This is not intended for the 1.3 release as it is a major change requiring considerable soak time.

This commit was SVN r19767.
2008-10-18 00:00:49 +00:00
Jeff Squyres
307d52aedc Ensure to check for internal ptmalloc2 support properly.
This commit was SVN r19760.
2008-10-17 16:27:20 +00:00
Rich Graham
9d59c0fbd6 add comment on what ompi_convertor_need_buffers() does.
This commit was SVN r19757.
2008-10-16 15:40:21 +00:00
Josh Hursey
88aa45dd52 Commit to bring online OpenIB, MX, and shared memory support for Open MPI's checkpoint/restart functionality. Some tuning is still needed, but basic functionality is in place.
There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539

Additionally:
* Fix a file cleanup bug in CRS Base.
* Fix a possible deadlock in the TCP ft_event function
* Add a mca_base_param_deregister() function to MCA base
* Add whole process checkpoint timers
* Add support for BTL: OpenIB, MX,  Shared Memory
* Add support Mpool: rdma, sm
* Sundry bounds checking an cleanup in some scattered functions

This commit was SVN r19756.
2008-10-16 15:09:00 +00:00
Lenny Verkhovsky
7ab9a72d0f merge with r19717, memory barrier on PPC
I run IMB exchange on two QS22 machines with r19674 and it got stucked after 256 or 512 bytes every time.
After applying r19717 the test passed, so I guess this is a essential patch.

This commit was SVN r19752.

The following SVN revision numbers were found above:
  r19674 --> open-mpi/ompi@15c47a2473
  r19717 --> open-mpi/ompi@0a765cd788
2008-10-15 16:56:42 +00:00
Jeff Squyres
1e14bb305f Don't increase num_peers if the operation fails.
This commit was SVN r19748.
2008-10-15 10:37:20 +00:00
Jeff Squyres
57a3dce9ba LANL noticed that calling MPI_ABORT invokes opal_output(0, ...)
unconditionally, which can result in a flood of messages to the user
if all MPI processes invoke abort.  Additionally, some users were
confused because they saw the MPI_ABORT opal_output() messages from
''some'' MPI processes, but not ''all'' of them (despite the fact that
every MPI process supposedly invoked MPI_ABORT).  The reason is that
calling MPI_ABORT triggers ORTE to kill all MPI processes, so it's a
race condition as to whether a) all MPI processes actually invoke
MPI_ABORT, and/or b) whether every process is able to opal_output()
before they are killed.

This commit does two simple things:
 * Now use orte_show_help() for the MPI_ABORT message, so they are
   aggregated. 
 * Add a note in the message that calling MPI_ABORT kills all
   processes, so you might not see all output, yadda yadda yadda.

This commit was SVN r19735.
2008-10-14 19:23:03 +00:00
Rainer Keller
856eaf0d42 - Destruct the file->f_io_requests_lock as well.
This commit was SVN r19730.
2008-10-14 15:23:45 +00:00
Rolf vandeVaart
137729d2f9 Fix warnings (thanks Jeff) from previous fix. This is extra
fix for ticket #1554.

This commit was SVN r19728.
2008-10-10 14:35:52 +00:00
Jeff Squyres
8b66171086 Only fail the CPC when the first QP is not PP *if* the CPC uses the
CTS protocol.

This commit was SVN r19726.
2008-10-09 17:31:36 +00:00
Tim Mattox
de623ea161 Remove a redundant if & goto.
This commit was SVN r19724.
2008-10-09 15:07:56 +00:00
Rolf vandeVaart
aad4427caa Fix the implementation of MPI_Reduce_scatter on intercommunicators.
We still do an interreduce but it is now followed by an intrascatterv.

This fixes trac:1554.

This commit was SVN r19723.

The following Trac tickets were found above:
  Ticket 1554 --> https://svn.open-mpi.org/trac/ompi/ticket/1554
2008-10-09 14:35:20 +00:00
Jeff Squyres
f7a94f17b9 Since we now & in the mask, the value can never be higher than the
mask value.  Also, the value is unsigned, so it can never be less than
0.

This commit was SVN r19719.
2008-10-09 13:12:49 +00:00
Jeff Squyres
46d7ffd298 Remove some redundancy from redundant MCA redundant param names. The
following names are all new for v1.3, and therefore haven't been
officially released yet:

 * btl_openib_of_cq_size
 * btl_openib_of_max_inline_data
 * btl_openib_of_pkey
 * btl_openib_of_psn
 * btl_openib_of_mtu

The "_of_" (for OpenFabrics) in there is redundant.  It used to be
"_ib_", indicating that these values are pretty much passed directly
to the verbs stack.  But I think the "openib" in the name implies this
already; having "_of_" in there just seems redundant, makes the name
longer, and seems redundant.  It's also redundant.

So I took those "_of_"'s out of the MCA names.  The old (v1.2) names
are still valid (but deprecated), such ash btl_openib_ib_cq_size.

This commit was SVN r19718.
2008-10-08 21:34:05 +00:00
Jeff Squyres
b8b7619312 * Remove pkey index as an MCA param
* Change name: mca_btl_openib_of_pkey_value -> mca_btl_openib_of_pkey
   (since now there's no index, the "_value" suffix is somewhat
   superfluous)
 * Put in a better help message for the _pkey MCA param (to agree with
   the new help message in v1.2.8)

This commit was SVN r19716.
2008-10-08 20:55:40 +00:00
Josh Hursey
6e34994ee1 Fix a problem with persistent sends found in MTT regression testing on Odin.
This commit was SVN r19707.
2008-10-08 14:11:55 +00:00
Pavel Shamis
d6eb6b3a34 replacing maximum pkey value with mask
This commit was SVN r19706.
2008-10-08 10:39:57 +00:00
Pavel Shamis
eefb66a133 Fixing openib partition support.
This commit was SVN r19705.
2008-10-08 09:56:43 +00:00
Pavel Shamis
ceb3e04d0d Fixing compilation failure for case when OMPI_HAVE_RDMACM is not defined.
This commit was SVN r19704.
2008-10-08 09:50:54 +00:00
Rolf vandeVaart
13e8975f83 In the case where we detect a value of 0 in the recvcount
array, fall back to the simpler algorithms.  This is not
the optimal solution, but it works.  

This commit was SVN r19702.
2008-10-07 19:44:51 +00:00
Josh Hursey
b7560c52be Make sure to check the bounds of the communicator array before referencing it. Fixes some of the FT related Intel bugs seen in Odin testing last night.
This commit was SVN r19701.
2008-10-07 19:15:12 +00:00
Jeff Squyres
942b59d572 The endpoint is actually a list_item_t -- not a base object. But only
XOOB uses it, so we didn't notice util recently.

This commit was SVN r19689.
2008-10-06 17:38:25 +00:00
Jeff Squyres
c42ab8ea37 Fixes trac:1210, #1319
Commit from a long-standing Mercurial tree that ended up incorporating a lot of things:

 * A few fixes for CPC interface changes in all the CPCs
 * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC
 * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC)
   * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag)
   * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. 
   * CPCs allocate a CTS buffer only if they're about to make a connection
 * RDMA CM improvements:
   * Use threaded mode openib fd monitoring to wait for for RDMA CM events
   * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions
   * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors
   * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type
   * Renamed many variables to be internally consistent
   * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator
   * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) 
   * Use rdma_create_qp() instead of ibv_create_qp()
 * openib fd monitoring improvements:
   * Renamed a bunch of functions and variables to be a little more obvious as to their true function
   * Use pipes to communicate between main thread and service thread
   * Add ability for main thread to invoke a function back on the service thread 
   * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3))
   * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port
   * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout
 * Other improvements:
   * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp")
   * Somewhat improved error handling
   * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements
   * Oodles of little coding style fixes
   * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction
   * Added some more show_help error messages
   * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) 

This commit was SVN r19686.

The following Trac tickets were found above:
  Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 00:46:02 +00:00