
5050 Commits

Author SHA1 Message Date
Rainer Keller
0feb158aaf - Since r22727, orte_app_idx_t is a uint32_t (it was
previously an orte_std_cntr_t, which is an int32_t).
   Comparisons with < 0 don't make any sense here.

This commit was SVN r22799.

The following SVN revision numbers were found above:
  r22727 --> open-mpi/ompi@2541aa98ab
2010-03-08 22:56:33 +00:00
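Commit 0feb158aaf rests on a simple C rule: an unsigned value is never less than zero. Once the index type became a uint32_t, every `< 0` test turned into dead code. The minimal sketch below shows this. `app_idx_t` and `APP_IDX_INVALID` are hypothetical stand-ins, not ORTE's actual definitions, and checking against an explicit sentinel is only one possible replacement for such tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an unsigned index type like orte_app_idx_t. */
typedef uint32_t app_idx_t;

/* Hypothetical sentinel: what -1 becomes after conversion to uint32_t. */
#define APP_IDX_INVALID ((app_idx_t)UINT32_MAX)

/* Always false: an unsigned value can never be negative, so a
   "negative means error" convention silently stops working. */
static bool idx_is_negative(app_idx_t idx)
{
    return idx < 0;
}

/* A meaningful validity check compares against the sentinel instead. */
static bool idx_is_valid(app_idx_t idx)
{
    return idx != APP_IDX_INVALID;
}
```

GCC flags the comparison in `idx_is_negative()` under `-Wtype-limits`, which `-Wextra` enables.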
Rainer Keller
06f5ba1c19 - Reverse the logic (OPAL_LIKELY -> OPAL_UNLIKELY)
This commit was SVN r22796.
2010-03-08 14:00:59 +00:00
Jeff Squyres
95d7e08a66 Much more discussion and testing has occurred off-ticket.
Short version: there is a bug in OS X/Snow Leopard, but there is also
a bug in Open MPI.  Fixing the bug in Open MPI is both trivial (a
1-line change) and avoids the bug in OS X.  We'll file an OS X bug
report upstream with Apple, but it should no longer affect us here in
OMPI.

Fixes trac:2039.

More details:

Some background first: 

 1. IPv4 sockets can only accept incoming IPv4 connections.  However,
    IPv6 sockets can be configured to accept ''only'' incoming IPv6
    connections, or ''both'' incoming IPv4 and IPv6 connections.  An
    IPv6 socket attribute sets which listening behavior is used.
 1. IPv4 and IPv6 have different port namespaces.  Hence, it is
    permissible to bind a v4 socket to port X ''and'' also bind a v6
    socket to that same port X on the same interface (assuming that
    the v6 socket is only accepting incoming v6 connections).
    Incoming v4 connections to port X on the interface should get
    matched to the listening v4 socket; incoming v6 connections should
    get matched to the listening v6 socket.
 1. When a v6 socket accepts ''both'' incoming v4 and v6 connections,
    it should claim port X in both namespaces.
 1. Linux's default behavior is to only allow one listening socket to
    be bound to a given port (i.e., ''either'' a v6 or v4 socket to be
    bound to a single port X -- not both).  A v6 socket can listen for
    both v4 and v6 incoming connections on that port, but still --
    only one socket will be bound to that port.
 1. Snow Leopard's default behavior is to share ports -- i.e., let
    both a v4 and a v6 listening socket be bound to port X
    (assuming that the v6 socket is only accepting incoming v6
    connections).

The TCP BTL creates a listening socket for each address family.
Hence, it creates a v4 listening socket on INADDR_ANY and a v6
listening socket on the v6 equivalent of INADDR_ANY.  OMPI then
iteratively tries to find ports to listen on within the range of
[mca_btl_tcp_port_min, mca_btl_tcp_port_min + mca_btl_tcp_port_range).

On Linux, the v4 socket will be bound to port X and the v6 socket will
likely be bound to port Y (where X != Y).  On Snow Leopard, the v4
socket will be bound to port X and the v6 socket may ''also'' be bound
to port X.  Since the namespaces are separate, this shouldn't be a
problem.

However, Open MPI was accidentally setting the v6 listening behavior
to accept ''both'' v4 and v6 incoming connections.  This is a trivial
thing to fix -- change a 0 to a 1 in the code.  On Linux, this issue
didn't matter because the v4 and v6 sockets were on different ports.
So even though the v6 socket ''would'' have accepted incoming v4
connections, that never happened because OMPI would direct v4
connections to the v4 port.

But on Snow Leopard, the v4 and v6 listening ports could end up
sharing the same port number.  As mentioned above, this ''shouldn't''
have been a problem, but it looks like Snow Leopard has the following
bugs:

 * If a v4 socket is already bound to port X, we're pretty sure that a
   v6 socket with the "accept both v4 and v6 incoming connections"
   listening behavior should not be able to claim port X (because
   there's already a v4 socket listening on X).  However, Snow Leopard
   would allow binding a v4 socket to port X, and then allow a v6
   socket configured to allow incoming v4 and v6 connections to
   ''also'' be bound to port X.
 * After binding the v6 socket to port X, Snow Leopard then lets
   ''another'' v4 socket ''also'' get bound to port X.  Hence, there's
   now '''three''' sockets all listening on port X.

This obviously led to mismatched TCP connections, and things went
downhill from there.

That being said, Snow Leopard doesn't exhibit this behavior if v6
sockets only allow incoming v6 connections.  And technically, that is
exactly the behavior we want (we want v6 sockets to only accept
incoming v6 connections).  So if we just change the flag to make our
v6 listening socket use this behavior, the problem on OS X goes away.

That's what this commit does -- it changes a 0 to a 1, indicating
"only let this v6 socket allow incoming v6 connections."

That was simple, wasn't it?

This commit was SVN r22788.

The following Trac tickets were found above:
  Ticket 2039 --> https://svn.open-mpi.org/trac/ompi/ticket/2039
2010-03-05 17:37:57 +00:00
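The fix in 95d7e08a66 comes down to setting the `IPV6_V6ONLY` socket option to 1 before binding the v6 listening socket. Below is a minimal POSIX sketch of such a listener. The function names are hypothetical and this is not the TCP BTL's actual code. For brevity it binds to the v6 wildcard address on an ephemeral port instead of scanning the `mca_btl_tcp_port_min` range.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket on the IPv6 wildcard address that accepts
   only IPv6 connections. Returns the descriptor, or -1 on failure
   (for example, when the host has no IPv6 support). */
static int open_v6only_listener(void)
{
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    /* 1 = accept only IPv6 connections; 0 would also accept IPv4
       connections (as IPv4-mapped addresses). Must be set before bind(). */
    int v6only = 1;
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &v6only, sizeof(v6only)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in6 addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin6_family = AF_INET6;
    addr.sin6_addr = in6addr_any;  /* the v6 equivalent of INADDR_ANY */
    addr.sin6_port = 0;            /* 0 = let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the option back: 1 if v6-only, 0 if dual-stack, -1 on error. */
static int socket_is_v6only(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &value, &len) < 0) {
        return -1;
    }
    return value != 0;
}
```

The option is set before `bind()`, so this socket claims its port only in the v6 namespace. As the background above describes, that leaves the same v4 port number free for a separate v4 listener.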
Matthias Jurenz
75d71239d1 Fixed bug in parsing nm-file:
Do not trigger a parse error if address is out of range. Ignore symbol instead.

This commit was SVN r22778.
2010-03-04 16:03:53 +00:00
Shiqing Fan
4c1fc87502 Set the compile flags for F77 on Windows more correctly.
This commit was SVN r22774.
2010-03-04 11:41:42 +00:00
Matthias Jurenz
5b9515225d - fixed stack shutdown if maximum number of buffer flushes was reached
- fixed potential stack underflow in vtfilter which might cause a segmentation fault

This commit was SVN r22773.
2010-03-04 08:08:20 +00:00
Iain Bason
18d9e96301 Fixed two problems:
1. The code that looks at btl_tcp_if_exclude before doing a
   modex_send uses strcmp rather than strncmp. That means that
   "lo0" gets sent even though "lo" is excluded.

2. The code that determines whether a particular local TCP
   interface can connect to a particular remote interface doesn't
   check for loopback interfaces. With this fix, users can now
   enable "lo" and be assured that it will only be used for intra-
   node communication.

This commit was SVN r22762.
2010-03-03 15:51:15 +00:00
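Both fixes in 18d9e96301 can be sketched in C. The functions below are hypothetical illustrations, not the TCP BTL's real code. IPv4 addresses are assumed to be in host byte order, and the `same_node` flag stands in for however the caller determines that the peer is local.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Prefix match: an exclude entry "lo" also matches "lo0", "lo:1", etc.
   With strcmp(), only the exact name "lo" would match. */
static bool if_name_excluded(const char *ifname, const char *exclude)
{
    return strncmp(ifname, exclude, strlen(exclude)) == 0;
}

/* IPv4 loopback is 127.0.0.0/8. */
static bool ipv4_is_loopback(uint32_t addr)
{
    return (addr >> 24) == 127;
}

/* Decide whether a local interface may talk to a remote interface. */
static bool can_pair(uint32_t local, uint32_t remote, bool same_node)
{
    bool local_lo = ipv4_is_loopback(local);
    bool remote_lo = ipv4_is_loopback(remote);
    if (local_lo || remote_lo) {
        /* Loopback only works between two loopback endpoints on one node. */
        return local_lo && remote_lo && same_node;
    }
    return true;  /* other reachability checks are omitted in this sketch */
}
```

Prefix matching has a side effect worth knowing: excluding `eth1` this way would also exclude `eth10`.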
George Bosilca
ec7fcf3f91 While building the profiling interface, ignore the
I/O functions if support for I/O is not requested.

This commit was SVN r22761.
2010-03-02 21:05:04 +00:00
Ralph Castain
c88fe1ea54 Create a new mca parameter to control creation of session directories. Defaults to true so that the current behavior of always creating them is preserved. If set to false (0), then don't create session directories. Helps in those environments where session directories are a problem.
Tell the sm btl that it cannot run if no session directories were created.

This commit was SVN r22756.
2010-03-02 15:18:33 +00:00
Matthias Jurenz
5f368a094f Restored support for Automake's silent rules
This commit was SVN r22741.
2010-03-01 13:10:27 +00:00
Matthias Jurenz
157942809c Use the more portable 'nm' command instead of the BFD library to collect symbol information for instrumentation with the GNU, Intel, and PathScale compilers
This commit was SVN r22737.
2010-03-01 12:20:41 +00:00
Ralph Castain
f4c3cceb5e Get the function prototypes to match so we eliminate an annoying warning
This commit was SVN r22726.
2010-02-27 16:41:16 +00:00
Jeff Squyres
b0eaebf46f Add Intel's OUI.
This commit was SVN r22723.
2010-02-26 19:54:16 +00:00
Rolf vandeVaart
2715141f6d Fix minor bug in the way we handle btl_tcp_if_include list.
This commit was SVN r22722.
2010-02-26 18:08:04 +00:00
Shiqing Fan
e1c009932b Add a few more fortran compile flags, and enable dynamic build for f77 library now.
This commit was SVN r22720.
2010-02-26 07:55:32 +00:00
Jeff Squyres
2e91de0bdd This has bugged me for a long, long time: rename btl_openib_iwarp.* ->
btl_openib_ip.*.  The routines in these files are not specific to
iwarp -- they are specific to IP interfaces used with IBV devices
(even IB or IBoE/RoCEE/whatever devices).

This commit was SVN r22718.
2010-02-25 21:04:09 +00:00
Jeff Squyres
a4a81698c2 Mostly a patch from Vasily/Mellanox to fix multi-port and 32/64 bit
issues with iwarp.c.  These fixes are needed for IBoE / ROCEE /
whateveritscalledtoday.  I added a few minor changes to his base
patch.

This commit was SVN r22717.
2010-02-25 20:57:05 +00:00
Eugene Loh
316892b49f Fix spelling of "degradation".
This commit was SVN r22714.
2010-02-25 19:41:59 +00:00
Pavel Shamis
9fbfe6b1c0 This fix resolves bug #2307. QP creation may fail because the calculation for _reserved_ does not check the QP type. As a result, max_recv_wr may get a wrong value. Needs to go to both cmr:v1.4.2 and cmr:v1.5.0
This commit was SVN r22713.
2010-02-25 11:15:20 +00:00
Ralph Castain
18c7aaff08 Update the grpcomm framework to be more thread-friendly.
Modify the orte configure options to specify --enable-multicast such that it directs components to build or not instead of littering the code base with #if's. Remove those #if's where they used to occur.

Add a new grpcomm "mcast" module to support multicast operations. Still some work required to properly perform daemon collectives for comm_spawn operations. New module only builds when --enable-multicast is provided, and when specifically selected.

This commit was SVN r22709.
2010-02-25 01:11:29 +00:00
Jeff Squyres
dd4945c194 New part ID's from Chelsio and Intel. May still get more from
Chelsio. 

This commit was SVN r22708.
2010-02-24 20:39:40 +00:00
Jeff Squyres
af6f1f4b00 Add pkg-config(1) config files to Open MPI. Additionally, fix a minor
bug: libmpi_f90 had libmpi.la in its LIBADD instead of libmpi_f77.la.

Fixes trac:2244.

This commit was SVN r22704.

The following Trac tickets were found above:
  Ticket 2244 --> https://svn.open-mpi.org/trac/ompi/ticket/2244
2010-02-24 18:46:06 +00:00
Pavel Shamis
99ee62771d This fix resolves bug #2292. We may call prepare_device_for_use() only after adding the btl to mca_btl_openib_component.openib_btls. Needs to go to both cmr:v1.4.2 and cmr:v1.5.0
This commit was SVN r22702.
2010-02-24 10:13:06 +00:00
Christopher Yeoh
774a7a58b0 Fixes case where there is unprotected access to
mca_osc_rdma_component.c_modules in ompi_osc_rdma_windx_to_module

This commit was SVN r22700.
2010-02-24 01:28:37 +00:00
Jeff Squyres
d9b6b5af0c This commit converts us to the "one big libmpi" scheme that has been
discussed extensively.  See
https://svn.open-mpi.org/trac/ompi/ticket/2092 and the RFC thread
http://www.open-mpi.org/community/lists/devel/2010/02/7447.php.

Specifically:

 * Create LT convenience libraries for OPAL and ORTE if the layer
   above them is being created (use the already-defined
   AM_CONDITIONALs to know if the project above us is being built).
 * ORTE slurps in the LT convenience library for OPAL; OMPI slurps in
   the LT convenience library for ORTE.
 * Wrapper compilers now only -l one library (e.g., ortecc only does
   -lopen-rte, and mpicc only does -lmpi).

This commit was SVN r22691.
2010-02-23 22:20:01 +00:00
Jeff Squyres
5ec2d8764b Amendment to r22671: change the name of the new communicator flag from
INTERNAL to EXTRA_RETAIN, because not all "internal" communicators
have this flag set (only internal communicators with CIDs less than
their parent).  Hence, what this flag ''really'' means is that there
was an extra RETAIN performed on it.  So name the flag just that --
EXTRA_RETAIN -- indicating that an extra RETAIN has occurred.

This commit was SVN r22690.

The following SVN revision numbers were found above:
  r22671 --> open-mpi/ompi@61dee816db
2010-02-23 21:24:07 +00:00
Jeff Squyres
583394e30b This help message got a little jumbled.
This commit was SVN r22689.
2010-02-23 21:09:16 +00:00
Christopher Yeoh
f79263550c This fixes trac:2265 removing a race in the openib btl endpoint when
increasing sequence numbers. cmr:v1.4

This commit was SVN r22684.

The following Trac tickets were found above:
  Ticket 2265 --> https://svn.open-mpi.org/trac/ompi/ticket/2265
2010-02-23 12:46:06 +00:00
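The message for f79263550c does not say how the race was removed. A lock or an atomic operation would both work. As one illustration, here is a C11 sketch that turns the sequence-number increment into a single atomic read-modify-write. `struct endpoint` and `endpoint_take_seq` are hypothetical names, not openib BTL code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical endpoint with a per-endpoint send sequence counter. */
struct endpoint {
    _Atomic uint32_t next_seq;
};

/* On a plain integer, "ep->next_seq++" is a separate load, add, and store.
   Two threads can read the same value and hand out duplicate sequence
   numbers. atomic_fetch_add() performs the whole read-modify-write
   indivisibly and returns the value from before the add. */
static uint32_t endpoint_take_seq(struct endpoint *ep)
{
    return atomic_fetch_add(&ep->next_seq, 1);
}
```

Concurrent callers of `endpoint_take_seq()` each receive a distinct number. A mutex would give the same guarantee at a higher cost per call.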
Christopher Yeoh
c1dcf1c164 The release of memory used by registration lists in rcaches must be delayed until the rcache lock is no longer held, or deadlock
can occur (fixes trac:2111).
Memory should not be deregistered with the rcache lock held; otherwise a deadlock can occur, as the lower-level
InfiniBand libraries can free memory (fixes trac:2110).

cmr:v1.4

This commit was SVN r22683.

The following Trac tickets were found above:
  Ticket 2110 --> https://svn.open-mpi.org/trac/ompi/ticket/2110
  Ticket 2111 --> https://svn.open-mpi.org/trac/ompi/ticket/2111
2010-02-23 11:31:58 +00:00
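Both deadlocks fixed in c1dcf1c164 come from freeing memory while holding a lock that the free path itself may need. A common remedy is to detach the items under the lock, release the lock, and only then free them. The pthreads sketch below assumes a hypothetical linked-list cache and a release hook that takes the cache lock, standing in for the lower-level InfiniBand memory hooks. None of these names come from Open MPI.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical registration-cache entry, not Open MPI's actual type. */
struct reg_entry {
    struct reg_entry *next;
};

static pthread_mutex_t rcache_lock = PTHREAD_MUTEX_INITIALIZER;
static struct reg_entry *rcache_head = NULL;
static int hook_calls = 0;

/* Stand-in for a low-level free path that needs the rcache lock.
   Re-locking a non-recursive mutex already held by the same thread
   would typically deadlock. */
static void memory_release_hook(struct reg_entry *e)
{
    pthread_mutex_lock(&rcache_lock);
    hook_calls++;
    pthread_mutex_unlock(&rcache_lock);
    free(e);
}

/* Add an entry to the cache; returns 0 on success, -1 on allocation failure. */
static int rcache_add(void)
{
    struct reg_entry *e = malloc(sizeof *e);
    if (e == NULL) {
        return -1;
    }
    pthread_mutex_lock(&rcache_lock);
    e->next = rcache_head;
    rcache_head = e;
    pthread_mutex_unlock(&rcache_lock);
    return 0;
}

/* Detach every entry while holding the lock, then release them after
   unlocking. Returns the number of entries released. */
static int rcache_clear(void)
{
    pthread_mutex_lock(&rcache_lock);
    struct reg_entry *garbage = rcache_head;
    rcache_head = NULL;
    pthread_mutex_unlock(&rcache_lock);

    int released = 0;
    while (garbage != NULL) {
        struct reg_entry *next = garbage->next;
        memory_release_hook(garbage);  /* safe: the lock is no longer held */
        garbage = next;
        released++;
    }
    return released;
}
```

Suppose `rcache_clear()` called `memory_release_hook()` before unlocking. The hook would then try to re-acquire a default mutex that the same thread already holds. That is undefined behavior and in practice usually a deadlock.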
Christopher Yeoh
322e73d8c4 The ib_procs list in the openib btl is accessed without the ib lock in some cases. This causes races when running multithreaded. This patch adds protection of the ib_procs list with the ib_lock.
fixes trac:2149 cmr:v1.4

This commit was SVN r22682.

The following Trac tickets were found above:
  Ticket 2149 --> https://svn.open-mpi.org/trac/ompi/ticket/2149
2010-02-23 05:19:03 +00:00
Christopher Yeoh
a0b8f061a6 Stop destroying an rcache vma while the rcache lock is held,
as this can result in a low-level free of memory, which
can require the rcache lock and thus result in a deadlock.

This fixes trac:2107 
cmr:v1.4

This commit was SVN r22679.

The following Trac tickets were found above:
  Ticket 2107 --> https://svn.open-mpi.org/trac/ompi/ticket/2107
2010-02-22 11:19:15 +00:00
Christopher Yeoh
11500e3267 Fixes bug where the wrong lock is taken in mca_btl_openib_alloc
when protecting the no_wqe_pending_frags list.

fixes trac:2118 add cmr:v1.4

This commit was SVN r22678.

The following Trac tickets were found above:
  Ticket 2118 --> https://svn.open-mpi.org/trac/ompi/ticket/2118
2010-02-22 08:14:45 +00:00
Christopher Yeoh
a14a5dc3c6 This fixes a bug where sometimes the rcache lock would be dropped when it wasn't actually held.
Also includes some minor copyright header additions that were missed in previous checkins
fixes trac:2101 added cmr:v1.4

This commit was SVN r22676.

The following Trac tickets were found above:
  Ticket 2101 --> https://svn.open-mpi.org/trac/ompi/ticket/2101
2010-02-22 07:40:42 +00:00
Shiqing Fan
fa6a050b80 Set the correct install source path.
This commit was SVN r22673.
2010-02-20 13:40:15 +00:00
Shiqing Fan
e0bfd9f836 A type cast.
This commit was SVN r22672.
2010-02-20 10:47:37 +00:00
Edgar Gabriel
61dee816db This commit fixes a bug in how we deal with the possibility that a
'dependent' communicator that we created has a lower CID than the parent comm.
This can happen when using the hierarch collective communication module or for
inter-communicators (since we make a duplicate of the original communicator).
This is not a problem as long as the user calls MPI_Comm_free on the parent
communicator.  However, if the communicators are not freed by the user but
released by Open MPI in MPI_Finalize, we walk through the list of still
available communicators and free them one by one. Thus, local_comm is freed
before the actual inter-communicator. The local_comm pointer in the
inter-communicator, however, will still contain the 'previous' address of
local_comm, and this leads to a segmentation violation. To prevent that from
happening, we increase the reference counter of local_comm by one if its CID
is lower than the parent's. We cannot, however, increase its reference counter
if the CID of local_comm is larger than the CID of the inter-communicator,
since in that case a regular MPI_Comm_free would leave local_comm hanging
around, and thus we would not recycle CIDs properly, which was the cause of
this trouble.

This commit fixes tickets 2094 and 2166. Note, however, that I want to close
them manually, since a slightly different patch is required for the 1.4
series. This commit will have to be applied for the 1.5 series. And I will
need a volunteer to review it.

This commit was SVN r22671.
2010-02-19 23:45:30 +00:00
Rainer Keller
548d6f7c61 - Incorporated a rewording proposal by Jeff.
This commit was SVN r22670.
2010-02-19 14:37:09 +00:00
George Bosilca
7eff2cdf85 Unrestricted number of interfaces.
This commit was SVN r22669.
2010-02-19 07:10:32 +00:00
Matthias Jurenz
111a424dac - removed hard-coded directory paths in vt_dyninst.c
- temporarily disabled wrapper for 'fcntl' in vt_iowrap.c, due to curious behaviour on some platforms (e.g. segfault)

This commit was SVN r22659.
2010-02-18 10:36:20 +00:00
Pavel Shamis
a124f6b10b Adding a hash table for managing dependencies between SRQs and their BTL modules.
This commit was SVN r22653.
2010-02-18 09:48:16 +00:00
Ralph Castain
40be3d896c Ensure we set an error code when leaving, correctly check for slot_list_set return status
This commit was SVN r22643.
2010-02-17 22:59:19 +00:00
Jeff Squyres
c23e6f3d56 Add an opal_attribute_unused in here since we're no longer using this
parameter (I just discovered while researching for v1.4 that v1.4 has
effectively this same function definition: it just always returns
true!).

This commit was SVN r22642.
2010-02-17 21:12:49 +00:00
Jeff Squyres
898eedd78f Fixes trac:2233.
This commit adds a lengthy comment in ompi_datatype.h that explains
why a one-sided datatype check was removed.  The short version is that
we do have to allow some datatypes that may be unwise to use (e.g.,
"h" types of datatypes that have offsets in bytes -- MPI says it's ok
to use these), and our DDT engine can't currently detect datatypes
with absolute offsets, which MPI says it's ''not'' ok to use with
one-sided operations.  Hence, we don't check for some datatypes that
are invalid to use with one-sided operations, and erroneous programs
may crash and burn.  Life is hard.

The main point of this commit is that we now do allow datatypes for
one-sided operations that are supposed to be allowed.

This commit was SVN r22641.

The following Trac tickets were found above:
  Ticket 2233 --> https://svn.open-mpi.org/trac/ompi/ticket/2233
2010-02-17 20:16:55 +00:00
George Bosilca
3bceb20b1c Only get the receive datatype extent on the root process, as every
other process should ignore this value. Thanks to Michael Hofmann
for investigating this issue.

This commit closes trac:2268.

This commit was SVN r22639.

The following Trac tickets were found above:
  Ticket 2268 --> https://svn.open-mpi.org/trac/ompi/ticket/2268
2010-02-17 16:01:50 +00:00
Matthias Jurenz
1ce37bc5ce VT general:
- Updated date in copyright header of each source file
VT configure fixes:
- fixed configure's version detection for PAPI to support version 4.x
- added configure tests to detect Bull MPICH2
VT new features:
- added support for relocating an existing VampirTrace installation without rebuilding it from source (fixes OMPI's ticket #1990)
- added support for tracing functions in shared libraries instrumented by the GNU, Intel, PathScale, or PGI 9 compiler
- added support for PAPI-C counters which belong to different components
- extended usability of environment variable VT_METRICS for PAPI counters to specify whether a counter provides increasing or absolute values

This commit was SVN r22637.
2010-02-17 14:38:11 +00:00
Shiqing Fan
3a3018deef Convert the line endings for the added header files. They were changed automatically by Windows when adding new files.
This commit was SVN r22634.
2010-02-16 17:24:44 +00:00
Shiqing Fan
08ffdbe987 Changes for portable platform headers. Commit it on behalf of Ralph.
This commit was SVN r22619.
2010-02-15 22:14:59 +00:00
Pavel Shamis
9d0ae097c1 Updating vendor part IDs for some Mellanox devices
This commit was SVN r22617.
2010-02-15 09:45:34 +00:00
Jeff Squyres
dafc0c914b Restoring the build for now.
This commit was SVN r22611.
2010-02-12 12:03:17 +00:00
Rainer Keller
48254c78c9 - When svn version string becomes too long (>72 columns) some Fortran
compilers get confused. Continue on the next line.
   Thanks to Richard Tran Mills for noticing that.

This commit was SVN r22609.
2010-02-11 23:23:36 +00:00