1
1

953 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
8a2a0293fd fix sort_devs_by_distance in btl/openib
no need to #include <math.h> ...

cmr=v1.8.2:reviewer=miked:ticket=4759

This commit was SVN r32121.

The following Trac tickets were found above:
  Ticket 4759 --> https://svn.open-mpi.org/trac/ompi/ticket/4759
2014-07-02 08:08:10 +00:00
Gilles Gouaillardet
134eee1c4f fix sort_devs_by_distance in btl/openib
The distances as returned by hwloc_get_whole_distance_matrix_by_type are typ float.
This patch handle all distances as float.

cmr=v1.8.2:reviewer=miked

This commit was SVN r32120.
2014-07-02 07:56:40 +00:00
Ralph Castain
f3cb124e50 Revert r32082 and r32070 - the developer's conference has decided to go a different direction on the threaded progress effort. This will involve some degree of prototyping to understand the tradeoffs prior to making a final design decision, and so we'll hold off on the final change until that is completed.
This commit was SVN r32089.

The following SVN revision numbers were found above:
  r32070 --> open-mpi/ompi@12d92d0c22
  r32082 --> open-mpi/ompi@aa6438ef7a
2014-06-25 20:43:28 +00:00
Jeff Squyres
69fa331cc2 openib/ugni: output verbose message when a BTL is ignored due to THREAD_MULTIPLE
usnic and portals4 already do this.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32074.
2014-06-24 21:13:17 +00:00
Ralph Castain
12d92d0c22 Per the OMPI developer conference, remove the last vestiges of OMPI_USE_PROGRESS_THREADS
This commit was SVN r32070.
2014-06-24 17:05:11 +00:00
Gilles Gouaillardet
d1bcd103ac btl/openib : correctly handle eager rdma buffers in mca_btl_openib_del_procs
if eager rdma is used, endpoint reference_count is greater than one.
this commit is a temporary fix that OBJ_RELEASE the endpoint as much as needed.
thought this is likely correct, it can be suboptimal and hence needs to be reviewed

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31922.
2014-06-02 02:23:52 +00:00
Ralph Castain
76f5991ab2 Couple of minor fixes
This commit was SVN r31680.
2014-05-08 02:26:45 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
George Bosilca
024221f469 Initialize some fields (prevent valgrind complaints).
This commit was SVN r31503.
2014-04-23 13:38:30 +00:00
Rolf vandeVaart
1fab9bb37f Fixes per review by jsquyres. Piggy back on btl_base_verbose rather than using my own special MCA var.
This commit was SVN r31427.
2014-04-18 18:09:09 +00:00
Nathan Hjelm
ccb33ff811 btl: Use C99 sub-object naming when initializing BTL components
Two things to note:

 - This change will allow us to expand the BTL interface without
   having to worry about modifying BTLs that will not support the new
   interfaces. More on this will come later this year as part of the
   1.9 series.

 - C99 guarantees that uninitialed members of structs declared outside
   of functions (DATA binary section) will be initialized with
   0's. This allows us to drop stuff like .btl_flags = 0, or .btl_get
   = NULL.

This commit was SVN r31388.
2014-04-14 19:29:26 +00:00
Nathan Hjelm
9112977d86 btl/openib/udcm: fix two race conditions
This commit fixes two nasty races:

 - One can occur if the connection request message and connection completion
   message arrive out of order. This can happen normally when adaptive routing
   is used and also in a timeout situation where a UD message is lost.

 - One occurs when handling an ack at the same time as we are handling the
   message timeout. In this case we can not free the message or the timeout
   will be operating on invalid data. This fix is a band-aid until I can come
   up with a better approach. Instead of freeing the message it is marked
   as inactive and the event callback is triggered immediately (this has no
   affect if the callback is already active). The callback then frees the
   message if it is inactive.

cmr=v1.8.1:reviewer=pasha

This commit was SVN r31305.
2014-04-02 15:09:50 +00:00
Nathan Hjelm
b3bb90cf2d Do not include inttypes.h directly in Open MPI. Use opal_stdint.h instead.
This commit should finish the work started for #869. Closing that ticket
with this commit.

Closes trac:869

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31257.

The following Trac tickets were found above:
  Ticket 869 --> https://svn.open-mpi.org/trac/ompi/ticket/869
2014-03-27 17:56:00 +00:00
Vasily Filipov
8ef2e746e6 BTL/OPENIB: fix for rdma cm AF_IB case - user private data pointer points to a lib RDMA CM header and not to a "Consumer Private Data".
This commit was SVN r31247.
2014-03-27 14:04:02 +00:00
Mike Dubman
b8dddabcfb add config section for upcoming ConnectiX4 card
cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31199.
2014-03-25 14:27:09 +00:00
Vasily Filipov
c424ad94f3 BTL/OPENIB: remove AC_RUN_IFELSE from configure and check AF_IB support by lib rdmacm during component_init.
This commit was SVN r31194.
2014-03-24 13:36:04 +00:00
Vasily Filipov
5acb96d11b BTL/OPENIB: always define BTL_OPENIB_RDMACM_IB_ADDR to 0 or 1.
This commit was SVN r30923.
2014-03-04 08:54:17 +00:00
Vasily Filipov
f36d50d494 OPENIB BTL/CONNECT: warning fixes caused by r30875.
This commit was SVN r30905.

The following SVN revision numbers were found above:
  r30875 --> open-mpi/ompi@f2014b96e7
2014-03-03 06:41:46 +00:00
Vasily Filipov
d702307521 OPENIB BTL/CONNECT: replace wrong rdma_freeaddrinfo call in rdmacm_component_query func.
This commit was SVN r30876.
2014-02-27 11:52:10 +00:00
Vasily Filipov
f2014b96e7 OPENIB BTL/CONNECT: Add support for AF_IB addressing in rdmacm.
This commit was SVN r30875.
2014-02-27 11:29:47 +00:00
Nathan Hjelm
dfe4a504e4 udcm: fix race between ack arrival and message send and potential hang in udcm
finalize.

Closes trac:4290

cmr=v1.7.5:reviewer=miked

This commit was SVN r30854.

The following Trac tickets were found above:
  Ticket 4290 --> https://svn.open-mpi.org/trac/ompi/ticket/4290
2014-02-26 15:33:27 +00:00
Nathan Hjelm
30b61a3333 Fix a number of issues in the new one sided code.
- Fix several typos is osc/rdma.

 - Fix a locking issue in osc/sm that was caused by an incorrect
   assumption about the semantics of opal_atomic_add_32.

 - Always unlock the accumulation lock in osc/sm.

 - The base of a processes shared memory window should be NULL if
   the size is zero. Fixed.

cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30853.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-26 15:33:18 +00:00
Joshua Ladd
9ea9bec4ad Addressing Jeff's comments:
1. Changed rng_buff_t --> opal_rng_buff_t
2. All global variables obey the prefix rule
3. Old code has been removed 
4. Found a couple of unnecessary includes

Refs trac:4298

This commit was SVN r30807.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 23:18:35 +00:00
Joshua Ladd
e39d9f4080 Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example how to seed with opal_srand and invoke the generator with opal_rand. This should be added to
cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG

This commit was SVN r30801.
2014-02-23 21:41:38 +00:00
Alex Margolin
636493393c OPENIB: Fixed error from writing to an uninitialized pipe.
The error was caused by leaving the pipe to the async thread uninitialized, then writing to it regardless of this. 
Fix is to check the existance of the async thread and the pipe to it.

reviewd by miked

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30644.
2014-02-09 14:07:14 +00:00
Joshua Ladd
1dbd8688db This fixes a long standing bug in the OpenIB BTL's MCA param intialization.Only caught if BTL_OPENIB_FAILOVER_ENABLED. Thanks to Jeff for spotting. This should be added to:
cmr=v1.7.4:reviewer=jsquyres
cmr=v1.6.6

This commit was SVN r30558.
2014-02-04 20:01:39 +00:00
Ralph Castain
2cf4862b49 Cleanup warnings for use of void* - requires intermediate cast to uintptr_t. Thanks to Paul Hargrove for reporting it
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30333.
2014-01-20 15:44:45 +00:00
Rolf vandeVaart
e75afb2b82 Fix bug in distance computation code when deciding which devices to use on a NUMA node.
Also add a verbose flag so one can see what devices are selected as well as another flag to override
locality information and use all devices on the node.  

This commit was SVN r30287.
2014-01-14 15:41:56 +00:00
Jeff Squyres
13b29cff2c This commit compliements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configry macro
names:

 * pkgdatdir ->	ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Jeff Squyres
50d20ade82 Fix compiler warnings: remove unused variables
This commit was SVN r30143.
2014-01-07 23:21:47 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Nathan Hjelm
5c8ea3a251 btl/openib: Move free list memory allocation to add_procs
Per RFC which expired two weeks ago:

We are planning to make a change to Open MPI to always set up the btls. This
means the btl init will be called even if add_procs is never called for that
btl. In the openib btl free lists fragments are currently allocated in btl_init.
To avoid wasting that memory this commit moves that final device setup to
the add_procs function. This included allocating free lists, and starting the
async event thread.

At this time this change is safe since we have a barrier after add_procs in
MPI_Init. If this changes we will need to re-think some of the initialization
since we might have the possibility of a connection request before add_procs
is called.

Tested with Mellanox ConnectX2 and QLogic HCAs.

Commit also cleans up tabs in btl_openib_async.c.

cmr=v1.7.5:reviewer=miked

This commit was SVN r30122.
2014-01-06 19:51:30 +00:00
Rolf vandeVaart
c47e06463d Adjust CUDA related crossover value.
This commit was SVN r30100.
2013-12-30 18:39:11 +00:00
Ralph Castain
652f7a120f Add Mellanox device IDs that were included in prior releases, but somehow missing again here
cmr=v1.7.4:reviewer=miked

This commit was SVN r30095.
2013-12-26 17:47:05 +00:00
Nathan Hjelm
3be4536d9b Cleanup various leaks in ompi_info reported by valgrind.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30058.
2013-12-23 17:47:43 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Rolf vandeVaart
b955dbd6d9 Fix various items discovered by review of ticket #3951.
This commit was SVN r29900.
2013-12-13 21:25:07 +00:00
Jeff Squyres
f4afa4fd1f Add missing include, exposed in "external libevent" work.
Refs trac:3694

This commit was SVN r29898.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:21:30 +00:00
Jeff Squyres
bcfe2156d5 Bring over m4 quoting fix from v1.7 branch (in r29894) that was
discovered when removing some components.

This commit was SVN r29895.

The following SVN revision numbers were found above:
  r29894 --> open-mpi/ompi@58ed00296c
2013-12-13 20:27:33 +00:00
Nathan Hjelm
3262080391 Cleanup udcm structures to avoid issues with nesting structures with
flexible members.

UDCM is ready to go for 1.7.4 with this patch.

cmr=v1.7.4:ticket=3940

This commit was SVN r29861.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-12 05:24:37 +00:00
Nathan Hjelm
e0e94a6029 Fix warning caused by typo in r29815
This commit was SVN r29860.

The following SVN revision numbers were found above:
  r29815 --> open-mpi/ompi@d556b60b21
2013-12-11 21:45:39 +00:00
Nathan Hjelm
6ab69c758b Fix warnings in udcm.
cmr=v1.7.4:reviewer=rhc:ticket=3940

This commit was SVN r29859.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-11 21:40:06 +00:00
Rolf vandeVaart
3ae88f8a24 Ensure no fork support with GDR. CUDA-aware code only.
This commit was SVN r29854.
2013-12-10 18:08:53 +00:00
Rolf vandeVaart
1cc55f305f Add extra check for GDR. Adjust some names and replace opal_output with opal_show_help.
This commit was SVN r29853.
2013-12-10 16:04:08 +00:00
Jeff Squyres
3bd9c603ff Clean up variables used in configure with OPAL_VAR_SCOPE.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).

Refs trac:3694

This commit was SVN r29830.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-06 23:40:34 +00:00
Rolf vandeVaart
d556b60b21 Chnage some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Jeff Squyres
c74c1e86d3 Per suggestion from Paul Kapinos, report in BTL verbosity if a device
is skipped because it is too far away.

(see thread starting here:
http://www.open-mpi.org/community/lists/devel/2013/06/12470.php)

This commit was SVN r29790.
2013-12-03 22:44:11 +00:00
Rolf vandeVaart
ab77435d9b Fix the CUDA-aware case where we are not sending any GPU data.
This commit was SVN r29788.
2013-12-03 20:25:58 +00:00
Nathan Hjelm
fe327d9859 udcm: cleanup code and improve the ack handling
Originally udcm acks used the immediate data to indicate which message was
being acknowleged. This data was (mysteriously) junk when using QLogic HCAs so I
updated udcm to use the source info (slid, qp, etc) to determine which message was being
acked. This works as long as we don't have two messages simultaneously in flight
to a particular peer and then loose the first of the two messages. The chances of this
happening are tiny. To fix this case I updated the udcm message header to include
a pointer to the in flight message. This pointer is then sent back to the sending
process to ack receipt.

cmr=v1.7.4:ticket=trac:3940

This commit was SVN r29775.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-02 20:18:46 +00:00
Nathan Hjelm
fb0b0442c4 openib/connect: re-enable xrc support in the openib btl
This commit updates the udcm cpc to support xrc. The steps followed by udcm
mimic those in the removed xoob cpc. This update has been tested with both XRC
and RC.

Mellanox, this is intended to go into 1.7.4. Please review carefully and let
me know if there are any issues.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29767.
2013-11-27 22:28:04 +00:00