3186 commits

Author SHA1 Message Date
Jeff Squyres
dd4945c194 New part ID's from Chelsio and Intel. May still get more from
Chelsio. 

This commit was SVN r22708.
2010-02-24 20:39:40 +00:00
Pavel Shamis
99ee62771d The fix resolves bug #2292. prepare_device_for_use() may be called only after the btl has been added to mca_btl_openib_component.openib_btls. Needs to go to both cmr:v1.4.2 and cmr:v1.5.0
This commit was SVN r22702.
2010-02-24 10:13:06 +00:00
Christopher Yeoh
774a7a58b0 Fixes case where there is unprotected access to
mca_osc_rdma_component.c_modules in ompi_osc_rdma_windx_to_module

This commit was SVN r22700.
2010-02-24 01:28:37 +00:00
Jeff Squyres
5ec2d8764b Amendment to r22671: change the name of the new communicator flag from
INTERNAL to EXTRA_RETAIN, because not all "internal" communicators
have this flag set (only internal communicators with CIDs less than
their parent).  Hence, what this flag ''really'' means is that there
was an extra RETAIN performed on it.  So name the flag just that --
EXTRA_RETAIN -- indicating that an extra RETAIN has occurred.

This commit was SVN r22690.

The following SVN revision numbers were found above:
  r22671 --> open-mpi/ompi@61dee816db
2010-02-23 21:24:07 +00:00
Jeff Squyres
583394e30b This help message got a little jumbled.
This commit was SVN r22689.
2010-02-23 21:09:16 +00:00
Christopher Yeoh
f79263550c This fixes trac:2265 removing a race in the openib btl endpoint when
increasing sequence numbers. cmr:v1.4

This commit was SVN r22684.

The following Trac tickets were found above:
  Ticket 2265 --> https://svn.open-mpi.org/trac/ompi/ticket/2265
2010-02-23 12:46:06 +00:00
Christopher Yeoh
c1dcf1c164 The release of memory used by registration lists in rcaches must be delayed until the rcache lock is not held or deadlock
can occur ( fixes trac:2111 ).
Should not deregister memory with the rcache lock held otherwise a deadlock can occur as the lower
level infiniband libraries can free memory ( fixes trac:2110 )

cmr:v1.4

This commit was SVN r22683.

The following Trac tickets were found above:
  Ticket 2110 --> https://svn.open-mpi.org/trac/ompi/ticket/2110
  Ticket 2111 --> https://svn.open-mpi.org/trac/ompi/ticket/2111
2010-02-23 11:31:58 +00:00
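The deferred-release idea in c1dcf1c164 can be illustrated with a minimal sketch. This is not the Open MPI rcache code: `struct reg`, `struct cache`, and the three helper functions are hypothetical names chosen for illustration. The point is only the ordering: entries are detached while the lock is held, and the actual `free()` calls happen after the lock is dropped, so a low-level free that re-enters the cache cannot deadlock on the same lock.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical registration record; NOT the Open MPI rcache type. */
struct reg {
    struct reg *next;
    void *buf;
};

/* Hypothetical cache: a lock plus a singly linked list of registrations. */
struct cache {
    pthread_mutex_t lock;
    struct reg *active;
};

/* Add one registration of len bytes. Returns 0 on success, -1 on failure. */
static int cache_insert(struct cache *c, size_t len)
{
    struct reg *r = malloc(sizeof(*r));
    if (r == NULL) {
        return -1;
    }
    r->buf = malloc(len);
    if (r->buf == NULL) {
        free(r);
        return -1;
    }
    pthread_mutex_lock(&c->lock);
    r->next = c->active;
    c->active = r;
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/* Detach every registration while holding the lock, but free nothing yet.
 * The caller receives the detached list. */
static struct reg *cache_detach_all(struct cache *c)
{
    pthread_mutex_lock(&c->lock);
    struct reg *garbage = c->active;
    c->active = NULL;
    pthread_mutex_unlock(&c->lock);
    return garbage;
}

/* Release a detached list with the lock NOT held.
 * Returns the number of registrations released. */
static int cache_release(struct reg *garbage)
{
    int released = 0;
    while (garbage != NULL) {
        struct reg *next = garbage->next;
        free(garbage->buf);
        free(garbage);
        garbage = next;
        released++;
    }
    return released;
}
```

A caller would write `cache_release(cache_detach_all(&c))`. The lock is held only for the pointer swap, never across `free()`.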
Christopher Yeoh
322e73d8c4 The ib_procs list in the openib btl is accessed without the ib lock in some cases. This causes races when running multithreaded. This patch adds protection of the ib_procs list with the ib_lock.
fixes trac:2149 cmr:v1.4

This commit was SVN r22682.

The following Trac tickets were found above:
  Ticket 2149 --> https://svn.open-mpi.org/trac/ompi/ticket/2149
2010-02-23 05:19:03 +00:00
Christopher Yeoh
a0b8f061a6 Avoid destroying an rcache vma while the rcache lock is held,
as this can result in a low-level free of memory, which
can require the rcache lock, resulting in a deadlock.

This fixes trac:2107 
cmr:v1.4

This commit was SVN r22679.

The following Trac tickets were found above:
  Ticket 2107 --> https://svn.open-mpi.org/trac/ompi/ticket/2107
2010-02-22 11:19:15 +00:00
Christopher Yeoh
11500e3267 Fixes bug where the wrong lock is taken in mca_btl_openib_alloc
when protecting the no_wqe_pending_frags list.

fixes trac:2118 add cmr:v1.4

This commit was SVN r22678.

The following Trac tickets were found above:
  Ticket 2118 --> https://svn.open-mpi.org/trac/ompi/ticket/2118
2010-02-22 08:14:45 +00:00
Christopher Yeoh
a14a5dc3c6 This fixes a bug where sometimes the rcache lock would be dropped when it wasn't actually held.
Also includes some minor copyright header additions that were missed in previous checkins.
fixes trac:2101 added cmr:v1.4

This commit was SVN r22676.

The following Trac tickets were found above:
  Ticket 2101 --> https://svn.open-mpi.org/trac/ompi/ticket/2101
2010-02-22 07:40:42 +00:00
Edgar Gabriel
61dee816db This commit fixes a bug in how we handle the case where a 'dependent'
communicator that we created has a lower CID than the parent comm. This can
happen when using the hierarch collective communication module or for
inter-communicators (since we make a duplicate of the original communicator).
This is not a problem as long as the user calls MPI_Comm_free on the parent 
communicator.  However, if the communicators are not freed by the user but
released by Open MPI in MPI_Finalize, we walk through the list of still
available communicators and free them one by one. Thus, local_comm is freed
before the actual inter-communicator. However, the local_comm pointer in the
inter communicator will still contain the 'previous' address of the local_comm
and thus this will lead to a segmentation violation. In order to prevent that
from happening, we increase the reference counter of local_comm by one if its CID
is lower than the parent's. However, we cannot increase its reference counter if
the CID of local_comm is larger than the CID of the inter-communicator, since
a regular MPI_Comm_free would in that case leave the local_comm hanging
around and thus we would not recycle CIDs properly, which was the original
cause of this trouble.

This commit fixes tickets 2094 and 2166. Note however, that I want to close
them manually, since a slightly different patch is required for the 1.4
series. This commit will have to be applied for the 1.5 series. And I will
need a volunteer to review it.

This commit was SVN r22671.
2010-02-19 23:45:30 +00:00
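The reference-counting rule described in 61dee816db can be modeled with a toy sketch. It assumes a simplified communicator with only a CID, a reference count, and a pointer to its dependent communicator. `struct comm`, `comm_retain`, `comm_release`, and `comm_attach_local` are hypothetical names for illustration, not Open MPI's actual types or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy communicator; NOT the Open MPI ompi_communicator_t. */
struct comm {
    int cid;                  /* communicator ID */
    int refcount;             /* starts at 1 for the creator's reference */
    bool extra_retain;        /* records that an extra retain was performed */
    struct comm *local_comm;  /* dependent communicator, if any */
};

static void comm_retain(struct comm *c)
{
    c->refcount++;
}

/* Drop one reference. Returns true when the count reaches zero,
 * i.e. when the object would actually be destroyed. */
static bool comm_release(struct comm *c)
{
    c->refcount--;
    return c->refcount == 0;
}

/* Attach a dependent communicator to its parent. Only when the dependent
 * has a LOWER CID than the parent does it get an extra retain, so that a
 * sweep freeing communicators in increasing CID order does not destroy it
 * while the parent still points at it. */
static void comm_attach_local(struct comm *parent, struct comm *local)
{
    parent->local_comm = local;
    if (local->cid < parent->cid) {
        comm_retain(local);
        local->extra_retain = true;
    }
}
```

In this model, a finalize-time sweep that reaches the lower-CID dependent first only drops its count from 2 to 1. The final release happens when the parent is torn down and lets go of `local_comm`. A dependent with a higher CID gets no extra retain, so an ordinary free of the parent still lets its CID be recycled.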
Rainer Keller
548d6f7c61 - Incorporated a rewording proposal by Jeff.
This commit was SVN r22670.
2010-02-19 14:37:09 +00:00
George Bosilca
7eff2cdf85 Unrestricted number of interfaces.
This commit was SVN r22669.
2010-02-19 07:10:32 +00:00
Pavel Shamis
a124f6b10b Adding a hash table for managing dependencies between SRQs and their BTL modules.
This commit was SVN r22653.
2010-02-18 09:48:16 +00:00
George Bosilca
3bceb20b1c Only get the receive datatype extent on the root process, as every
other process should ignore this value. Thanks to Michael Hofmann
for investigating this issue.

This commit closes trac:2268.

This commit was SVN r22639.

The following Trac tickets were found above:
  Ticket 2268 --> https://svn.open-mpi.org/trac/ompi/ticket/2268
2010-02-17 16:01:50 +00:00
Pavel Shamis
9d0ae097c1 Updating vendor part ids for some mellanox devices
This commit was SVN r22617.
2010-02-15 09:45:34 +00:00
Jeff Squyres
6c5f666890 Add a comment to the loopback check to explain why it is there. Also
slightly correct one other comment.

This commit was SVN r22606.
2010-02-11 14:59:04 +00:00
Rainer Keller
ea4de16561 - Check whether file is opened on network file-system.
If file does not exist, check the directory it lives in...
   Maybe used by caller, trying to open mmap() on NFS, Lustre or
   Panasas (thanks Sam).
   For now, this is used to warn about the usage of mmap on such FS.

   Please note, that Ralph mentioned the orte_no_session_dir parameter.
   The help message includes a reference to this.

   Tested on NFS and Lustre on Linux on
     smoky: mpirun --mca orte_tmpdir_base $HOME/tmp -np 2 ./mpi_stub
     jaguar: mpirun ... --mca orte_tmpdir_base /tmp/work/$USER ...

   Fixes trac:1354

   This should   cmr:v1.5   once it has soaked and is shown to work on
   Solaris

This commit was SVN r22604.

The following Trac tickets were found above:
  Ticket 1354 --> https://svn.open-mpi.org/trac/ompi/ticket/1354
2010-02-10 23:18:29 +00:00
Jeff Squyres
8f7edf6e3e After a '''lot''' of discussion and testing, this commit fixes some
long-standing bugs (see trac ticket list below).  They're currently
somewhat obscure bugs, but are becoming much more relevant in a world
where OpenFabrics devices fail and you replace them with a newer model
(i.e., the cluster is homogeneous... ''except'' for where you had to
replace one or two OpenFabrics devices, and the same model is no
longer available).

This commit includes a '''lengthy''' comment (that we spent a lot of
time writing!) about what exactly it does and does not do.  The
previous code was rather short and '''incredibly''' subtle.  The new
code is slightly longer, but is both much more explicit and much more
painstakingly documented.

This commit fixes multiple trac tickets.  The real one that we fix is
#1707; the others are fixed as a side-effect.  In short: fixing #1707
prevents Bad Things from happening later in the startup sequence.

Fixes trac:1707, #2164, #1574.

cmr:v1.4.2:reviewer=pasha
cmr:v1.5:reviewer=pasha

This commit was SVN r22592.

The following Trac tickets were found above:
  Ticket 1707 --> https://svn.open-mpi.org/trac/ompi/ticket/1707
2010-02-10 16:53:26 +00:00
Nysal Jan
97d66bce78 This fixes trac:2154 - CSUM PML false positive. Needs to go to both cmr:v1.4.2 and cmr:v1.5
This commit was SVN r22590.

The following Trac tickets were found above:
  Ticket 2154 --> https://svn.open-mpi.org/trac/ompi/ticket/2154
2010-02-10 10:24:16 +00:00
Steve Wise
d40d2165c0 Never advertise a loopback address (127/8) to your peers.
This commit was SVN r22589.
2010-02-09 19:07:33 +00:00
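The loopback rule in d40d2165c0 (never advertise an address in 127/8) amounts to a check on the top octet of an IPv4 address. The sketch below is an illustration under that reading, not the code from the commit. `is_ipv4_loopback` and `may_advertise` are hypothetical helper names, and treating unparseable input as "do not advertise" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>  /* AF_INET */
#include <netinet/in.h>  /* struct in_addr */
#include <arpa/inet.h>   /* inet_pton, ntohl */

/* Hypothetical helper: true if the IPv4 address lies in 127.0.0.0/8. */
static bool is_ipv4_loopback(struct in_addr addr)
{
    uint32_t host_order = ntohl(addr.s_addr);
    return (host_order >> 24) == 127;
}

/* Hypothetical helper: parse a dotted-quad string and decide whether the
 * address may be advertised to peers. Unparseable input is refused. */
static bool may_advertise(const char *text)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, text, &addr) != 1) {
        return false;
    }
    return !is_ipv4_loopback(addr);
}
```

Checking the whole /8 rather than only 127.0.0.1 matters because every address in that block is routed back to the local host.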
George Bosilca
144143a3ff Remove an unused local variable.
This commit was SVN r22566.
2010-02-05 22:27:24 +00:00
Josh Hursey
a3583b8f57 Fix --bynode option to remember for subsequent jobs where it left off last time.
Add a ''map_bynode'' info key to determine if the job to be started by comm_spawn* should be mapped by node or by slot. Default is to map according to the default policy set when the parent job was started.

cmr:v1.5.1

This commit was SVN r22564.
2010-02-05 15:37:49 +00:00
Brian Barrett
50e3a5c349 AC_CHECK_FUNCS. Removes an annoying warning during application link on
Catamount.

Should go to both cmr:v1.4:reviewer=jsquyres and cmr:v1.5:reviewer=jsquyres

This commit was SVN r22547.
2010-02-04 04:42:36 +00:00
George Bosilca
bc7ceb3587 We enable the dynamic decision if the user forces it via an MCA argument or sets it in the
decision file. In addition, do a fine-grained activation, i.e. per collective function.

This commit was SVN r22510.
2010-01-29 09:03:59 +00:00
Ralph Castain
b3dd63fd81 Remove a stale pcie btl that never got completed
This commit was SVN r22498.
2010-01-27 01:16:01 +00:00
Jeff Squyres
1a7b7f7180 Make PCIE BTL compile/distribute .l files like everywhere else in the tree.
This commit was SVN r22467.
2010-01-22 15:39:42 +00:00
Jeff Squyres
fa38b97249 Generated files should not be in SVN.
cmr:v1.5

This commit was SVN r22465.
2010-01-22 14:01:02 +00:00
Ralph Castain
31cdbcfa5f Set the nameisset flag during dyn_init. Thanks to Guillaume Thouvenin for spotting the problem.
This commit was SVN r22460.
2010-01-20 15:35:23 +00:00
Shiqing Fan
ad763c327d Restore several linked libraries that were deleted by mistake in r22405.
This commit was SVN r22415.

The following SVN revision numbers were found above:
  r22405 --> open-mpi/ompi@872a4047ba
2010-01-14 21:50:42 +00:00
Edgar Gabriel
99e4ef3c86 Patch to make ROMIO compile over PVFS2 version > 2.7
Taken from the MPICH version of ROMIO.

This commit was SVN r22413.
2010-01-14 21:25:53 +00:00
Avneesh Pant
8bdd334d95 Allow the PSM component to return ERR_NOT_AVAIL so it can be unloaded silently if executed on a node with no QLogic IB hardware. Also minor modifications to have the CM PML allow itself to be unloaded if no MTL components are available. The component selection logic can then continue to use other PMLs.
This commit was SVN r22410.
2010-01-14 19:39:35 +00:00
Shiqing Fan
872a4047ba Fix the bug caused by ADD_DEPENDENCIES() behaving differently across CMake versions.
In CMake 2.6 and earlier, this function adds dependencies for targets and also links the target libraries automatically. In CMake 2.8 this behavior has changed: it only adds the dependencies and does not link, which causes linking errors at build time.

This commit was SVN r22405.
2010-01-14 18:10:20 +00:00
Vasily Filipov
370b1c75c4 Added an additional condition for create_srq
This commit was SVN r22403.
2010-01-14 16:09:10 +00:00
Jeff Squyres
b46628bf8d Reformat for 80-char width.
This commit was SVN r22402.
2010-01-14 13:31:11 +00:00
Avneesh Pant
774b965784 Add in support to specify IB path record query mechanism and IB Application/Service ID for PSM MTL. Also fix a minor bug in calculating the minimum connection timeout.
This commit was SVN r22397.
2010-01-13 18:58:00 +00:00
Jeff Squyres
2bdcb2a979 Move CM's MCA params into their own function (component.register).
This commit was SVN r22392.
2010-01-12 20:11:47 +00:00
Jeff Squyres
e96032dec9 Fix a type (otherwise we get a compiler warning).
This commit was SVN r22380.
2010-01-07 17:39:18 +00:00
Shiqing Fan
c37308b8eb Remove the deleted windows file from the tarball.
This commit was SVN r22347.
2009-12-29 16:11:32 +00:00
Shiqing Fan
b8555448b5 Remove the unnecessary/duplicated unistd.h.
This commit was SVN r22346.
2009-12-28 16:22:16 +00:00
Shiqing Fan
d0f85beaf3 Correctly include those header files.
This commit was SVN r22344.
2009-12-28 16:13:06 +00:00
Shiqing Fan
90e3092ce5 Fix a type cast.
This commit was SVN r22343.
2009-12-28 16:12:46 +00:00
Shiqing Fan
a2d00d4ab8 Exclude a pml component that is not necessary for Windows.
This commit was SVN r22342.
2009-12-28 16:12:28 +00:00
George Bosilca
e127b20038 Correct a typo in the name of the help string.
This commit was SVN r22336.
2009-12-21 19:13:25 +00:00
Vasily Filipov
897b7c0aa8 Fix orte_show_help message type error.
This commit was SVN r22321.
2009-12-16 14:11:43 +00:00
Vasily Filipov
e73274f9a9 Disabling SRQ limit event for devices that don't support this feature.
This commit was SVN r22320.
2009-12-16 14:05:35 +00:00
Vasily Filipov
87e71b26fe Jeff Squyres fixes
This commit was SVN r22319.
2009-12-16 10:23:58 +00:00
George Bosilca
b3d3a8e7b3 Remove useless lines.
This commit was SVN r22316.
2009-12-15 23:55:14 +00:00
George Bosilca
b85c3ca081 Enable support for the INRIA
knem (http://runtime.bordeaux.inria.fr/knem/) kernel device. This
is part of Ma Teng's work on Open MPI.

This commit was SVN r22315.
2009-12-15 23:34:09 +00:00