1
1

4966 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
c4c9bc1573 As per the RFC:
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php

Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM

This commit was SVN r31557.
2014-04-29 21:49:23 +00:00
Rolf vandeVaart
fc0a75da91 Fix help message errors as reported by check-help-strings.pl script.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31555.
2014-04-29 20:29:18 +00:00
Nathan Hjelm
2f5b1ca4cf osc/rdma: do not leak the receive request
This commit fixes a bug that can cause request and communicator leaks
when cleaning up an OSC window. The should prevent a hang seen with
IMB-EXT.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31539.
2014-04-28 19:55:18 +00:00
Nathan Hjelm
626b521e9c pml/ob1: fix heterogeneous support when using the send_inline optimization
We will track #4568 from the 1.8 CMR.

Closes trac:4568

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31535.

The following Trac tickets were found above:
  Ticket 4568 --> https://svn.open-mpi.org/trac/ompi/ticket/4568
2014-04-28 17:36:26 +00:00
Jeff Squyres
64c1228b55 Roll back r31519 and r31521: George convinced us that these approaches
weren't right.

This commit was SVN r31528.

The following SVN revision numbers were found above:
  r31519 --> open-mpi/ompi@b449c750b7
  r31521 --> open-mpi/ompi@e243805ed8
2014-04-24 20:27:03 +00:00
Nathan Hjelm
c9a257f1a0 btl/ugni: always buffer sendi fragments
This commit will improve the message rate when using the sendi function
by not waiting for the send to get to the remote process.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31526.
2014-04-24 18:50:29 +00:00
Nathan Hjelm
0849d61e38 btl/vader: improve performance under heavy load and eliminate a racy
feature

This commit should fix a hang seen when running some of the one-sided
tests. The downside of this fix is it reduces the maximum size of the
messages that use the fast boxes. I will fix this in a later commit.

To improve performance under a heavy load I introduced sequencing to
ensure messages are given to the pml in order. I have seen little-no
impact on the message rate or latency with this change and there is a
clear improvement to the heavy message rate case.

Lets let this sit in the trunk for a couple of days to ensure that
everything is working correctly.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31522.
2014-04-24 17:36:03 +00:00
Jeff Squyres
e243805ed8 coll tuned alltoallv: correctly handle 0-sized messages with MPI_IN_PLACE
Patch from Gilles Gouaillardet on #4517 to fix handling 0-sized
messages in coll tuned with MPI_ALLTOALLV and MPI_IN_PLACE.

Reviewed by Jeff Squyres.

Fixes trac:4517

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31521.

The following Trac tickets were found above:
  Ticket 4517 --> https://svn.open-mpi.org/trac/ompi/ticket/4517
2014-04-24 16:55:53 +00:00
Jeff Squyres
b449c750b7 coll basic: correctly handle alltoall[vw] 0-sized messages
Patch from Gilles Gouaillardet on #4506 to correctly handle 0-sized
messages in coll/basic MPI_Alltoallv and MPI_Alltoallw.

Reviewed by Jeff Squyres.

Fixes trac:4506.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31519.

The following Trac tickets were found above:
  Ticket 4506 --> https://svn.open-mpi.org/trac/ompi/ticket/4506
2014-04-24 16:25:43 +00:00
Jeff Squyres
e9b694f1d8 coll_base_comm_unselect.c: fix memory leaks
Ensure to also OBJ_RELEASE the neightbor and ineighbor modules.

Fixes trac:4444 (this patch is from that ticket).

This commit was SVN r31516.

The following Trac tickets were found above:
  Ticket 4444 --> https://svn.open-mpi.org/trac/ompi/ticket/4444
2014-04-24 15:53:06 +00:00
George Bosilca
024221f469 Initialize some fields (prevent valgrind complaints).
This commit was SVN r31503.
2014-04-23 13:38:30 +00:00
Jeff Squyres
5d17628823 Add in an opal_output_verbose() so that we'll see the case where there
are no usNICs found.

Refs trac:4549

This commit was SVN r31489.

The following Trac tickets were found above:
  Ticket 4549 --> https://svn.open-mpi.org/trac/ompi/ticket/4549
2014-04-22 18:59:10 +00:00
Mike Dubman
a4990de055 mca: track external lib version (runtime/compiletime) for mca component
based on thread: http://www.open-mpi.org/community/lists/devel/2014/04/14505.php

Create mca parameter to track runtime/compiletime ext lib version for component.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31487.
2014-04-22 18:02:26 +00:00
George Bosilca
75cf79c783 Ahem ... Correctly implement most of the 3 arguments
operator in Open MPI. Creepy that it was not discovered earlier.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31473.
2014-04-21 23:31:23 +00:00
George Bosilca
6a65d27bcc Print the 3rd buffer for the MPI_Op.
This commit was SVN r31471.
2014-04-21 23:29:30 +00:00
Jeff Squyres
a3acc49688 usnic_component.c: don't complain if there are no usNIC devices
cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r31468.
2014-04-21 19:28:48 +00:00
Mike Dubman
6f057e57ba MXM: enable on demand mapping for only MPI mxm context
fixed by Devender, reviewed by Yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31463.
2014-04-20 09:15:37 +00:00
Jeff Squyres
a28d7af262 Remove set-but-unused variable.
This commit was SVN r31457.
2014-04-19 12:51:51 +00:00
Rolf vandeVaart
1fab9bb37f Fixes per review by jsquyres. Piggy back on btl_base_verbose rather than using my own special MCA var.
This commit was SVN r31427.
2014-04-18 18:09:09 +00:00
Rolf vandeVaart
a6a245b5b5 More efficient way of waiting for asynchronous copy to complete.
This commit was SVN r31420.
2014-04-17 15:18:50 +00:00
Nathan Hjelm
a03b11c20e bcol/basesmuma: fix broken allgather algorithm
The algorithm was failing ibm/collective/allgather and iallgather. I
cleaned up the code to eliminate duplicate code paths and tracked the
issue down to an error in the way extra nodes in the knomial exchange
are handled. The new code is more compact and has been tested with up
to 64 ranks with the ibm test suite.

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31419.
2014-04-16 22:43:52 +00:00
Nathan Hjelm
e125bbe347 coll/ml: clean out apparently stale code
The file coll_ml_ibarrier.c wasn't included in coll/ml's Makefile.am
and the setup code from coll_ml_hier_algorithms_ibarrier.c was not
being called. It looks like this code is stale and has long since been
replaced by the code in coll_ml_barrier.c

Once all these little CMRs are approved I may make it into one roll-up
CMR to make it easier on the RM.

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31418.
2014-04-16 22:43:43 +00:00
Nathan Hjelm
484a3f6147 coll/ml: fix issues identified by the clang static analyser and fix
a segmentation fault in the reduce cleanup

Some of the changes address false warnings produced by scan-build. I
added asserts and changed some malloc calls to calloc to silence these
warnings.

The was one issue in cleanup for reduce since the component_functions
member is changed by the allreduce call. There may be other issues
with how this code works but releasing the allocated
component_functions after setting up the static functions addresses
the primary issue (SIGSEGV).

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31417.
2014-04-16 22:43:35 +00:00
Rolf vandeVaart
8897e2f5bb Fix typo error in commit r31388.
This commit was SVN r31398.

The following SVN revision numbers were found above:
  r31388 --> open-mpi/ompi@ccb33ff811
2014-04-15 19:50:54 +00:00
Nathan Hjelm
ccb33ff811 btl: Use C99 sub-object naming when initializing BTL components
Two things to note:

 - This change will allow us to expand the BTL interface without
   having to worry about modifying BTLs that will not support the new
   interfaces. More on this will come later this year as part of the
   1.9 series.

 - C99 guarantees that uninitialed members of structs declared outside
   of functions (DATA binary section) will be initialized with
   0's. This allows us to drop stuff like .btl_flags = 0, or .btl_get
   = NULL.

This commit was SVN r31388.
2014-04-14 19:29:26 +00:00
Yossi Etigin
7efb724d7b osc/rdma: fix deadlock with put_long protocol.
When sending PUT_LONG, the data is sent before headers, and sometimes 
the header is not flushed immediately. This creates a lot of unexpected 
receives in the peer, since it would posts a receive only when gets the 
header, which makes it run out of receive buffers. When the sender 
eventually flushes the window, the receiver already has no buffers to 
receive the header, which causes a deadlock.

The fix is to always flush the headers when doing put_long.

cmr=v1.8.1:reviewer=hjelmn

This commit was SVN r31378.
2014-04-13 16:24:56 +00:00
Jeff Squyres
6521dcc4f1 Trivial defensive programming/style update: use {}, even for 1-line blocks.
This commit was SVN r31361.
2014-04-09 16:28:31 +00:00
Nathan Hjelm
7aece0a7fd osc/sm: fix bugs in both the passive and active target paths
While testing one-sided on LANL systems I found a couple more OSC
bugs that were not caught during the initial testing:

 - In the passive target code we read the read lock count as a
   char instead of the intended uint32_t. This causes lock to
   lockup when using shared locks after 127 iterations.

 - The post code used the wrong group when trying to increment post
   counters. This causes a segmentation fault.

 - Both the post and wait code used the wrong check in the inner
   loop leading to an infinite loop.

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31354.
2014-04-08 21:55:00 +00:00
Nathan Hjelm
a31bfbeb2c osc/rdma: fix typo in get accumulate path
There was a typo in the ompi_osc_gacc_long_start that was causing a
segmentation fault when executing long get accumulate operations.

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31353.
2014-04-08 21:54:52 +00:00
Ryan Grant
ca0a7b1a9a Correct typo in r31332, mtl_portals_enpoint.h -> mtl_portals_endpoint.h
This commit was SVN r31338.

The following SVN revision numbers were found above:
  r31332 --> open-mpi/ompi@b12ee27b3d
2014-04-08 14:41:51 +00:00
Ralph Castain
b12ee27b3d Add missing files - thanks to Mr. Anonymous for reporting them as missing from the 1.8 tarball
cmr=v1.8.1:reviewer=jsquyres:subject=add missing portals4 files

This commit was SVN r31332.
2014-04-08 02:55:14 +00:00
Jeff Squyres
16f90acbaf btl usnic: Add some SHOW_HELP: tokens and remove 2 unused help messages
This commit was SVN r31322.
2014-04-07 15:40:19 +00:00
George Bosilca
95a4f219ea This commit fixes some of the Coverity reported warnings. I addressed
some of the collective modules, the shared memory and the profiling
interface. I left out VT, dynamic fcoll and seq rmaps.

cmr=v1.8.1:reviewer=jsquyres:subject=silence Coverity reported warnings

This commit was SVN r31309.
2014-04-06 18:23:49 +00:00
Nathan Hjelm
9112977d86 btl/openib/udcm: fix two race conditions
This commit fixes two nasty races:

 - One can occur if the connection request message and connection completion
   message arrive out of order. This can happen normally when adaptive routing
   is used and also in a timeout situation where a UD message is lost.

 - One occurs when handling an ack at the same time as we are handling the
   message timeout. In this case we can not free the message or the timeout
   will be operating on invalid data. This fix is a band-aid until I can come
   up with a better approach. Instead of freeing the message it is marked
   as inactive and the event callback is triggered immediately (this has no
   affect if the callback is already active). The callback then frees the
   message if it is inactive.

cmr=v1.8.1:reviewer=pasha

This commit was SVN r31305.
2014-04-02 15:09:50 +00:00
Nathan Hjelm
71bdb8c439 coll/ml: fix some warnings identified by clang
cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31285.
2014-03-28 22:31:41 +00:00
Nathan Hjelm
fdf4c3b900 osc/rdma: really fix active message support
The last fix prevented a hang but had some cases where the results were
wrong. Fixed. Tested with armci, openmpi/ibm, openmpi/onesided.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31284.
2014-03-28 22:06:16 +00:00
Nathan Hjelm
6913a0f3cf osc/base: defensive programming. handle one more possible datatype case
It might be possible (don't know) for a datatype to made of a contiguous block
of a primitive datatype and have an lb. If this is ever the case the code
would have done the wrong thing. Add the lb in to be safe.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31283.
2014-03-28 22:06:05 +00:00
Nathan Hjelm
459431622b Revert "coll/ml: there is no reason not to enable coll/ml when a process in not"
Discussed this with Manju and we decided to back this one out until a later time.

This reverts commit r31188 and closes trac:4435

This commit was SVN r31282.

The following SVN revision numbers were found above:
  r31188 --> open-mpi/ompi@f1dd589092

The following Trac tickets were found above:
  Ticket 4435 --> https://svn.open-mpi.org/trac/ompi/ticket/4435
2014-03-28 21:16:34 +00:00
Nathan Hjelm
ee7a1478ee osc/rdma: fix test/wait hang
There are differences between how active and passive messages are
accounted for in this component. Active message counts on the sender
side are set to zero before the control message is sent so we do not
have to add one to the expected number of messages or we end up
double counting the control message. This commit should fix that error.

Fixes regression in one-sided/test_rma1

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31281.
2014-03-28 20:49:20 +00:00
Manjunath Gorentla Venkata
28609d3ac2 Clean wanring in sbgp and coll ml
This commit was SVN r31280.
2014-03-28 19:53:36 +00:00
Manjunath Gorentla Venkata
8c849ee991 coll/ml : Replace longer error message with opal_show_help; thanks Jeff for identifying those
This commit was SVN r31279.
2014-03-28 19:25:54 +00:00
Nathan Hjelm
a9fb4976d5 coll/ml: more fixes
There were a couple of issues with the memory leak fixes and several more verbose
issues. This fixes those issues.

cmr=v1.8.1:ticket=trac:4473

This commit was SVN r31273.

The following Trac tickets were found above:
  Ticket 4473 --> https://svn.open-mpi.org/trac/ompi/ticket/4473
2014-03-28 18:31:28 +00:00
Nathan Hjelm
efa37c17c8 osc/base: fix one more case in ompi_osc_base_sndrcv_op
This fixes more issues identified by armci. More issues still remain and fixes are
coming for those as well.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31272.
2014-03-28 18:31:10 +00:00
Jeff Squyres
173c046617 build: add Automake-like silent/verbose macros for "ln -s ..." operations
Also, since I put some of the macros for these silent/verbose rules up
in the top-level Makefile.man-page-rules file, I renamed it to
Makefile.ompi-rules.

I've had this sitting around for a while; now seems like as good a
time as any to commit it.

This commit was SVN r31271.
2014-03-28 18:24:32 +00:00
Nathan Hjelm
ecce211403 btl/vader: create the shared memory backing file in the proc's session
directory not the job's

This bug didn't affect the correctness of the vader results just the
cleanup. This commit removes an error message about removing a non-existent
file.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31265.
2014-03-28 00:38:19 +00:00
Nathan Hjelm
bd3b550c6d coll/ml: fix leaks
Thanks to ggouaillardet for finding and fixing these issues.

Closes trac:4460

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31264.

The following Trac tickets were found above:
  Ticket 4460 --> https://svn.open-mpi.org/trac/ompi/ticket/4460
2014-03-27 23:25:31 +00:00
Nathan Hjelm
545d5daced osc: add missing MPI_ERR_RMA_SHARED error code and internal equivalent
cmr=v1.8:reviewer=jsquyres

This commit was SVN r31259.
2014-03-27 20:06:43 +00:00
Jeff Squyres
cdb396697c usnic: do not disqualify if a peer does not put usnic modex info
If ompi_modex_recv() fails with OPAL_ERR_DATA_VALUE_NOT_FOUND, it
simply means that the peer process did not put any usnic BTL modex
info -- it is not an error.  So have the usnic BTL simply ignore that
peer (vs. disqualifying itself / treating this like a real error).

Refs trac:4442.

This commit was SVN r31258.

The following Trac tickets were found above:
  Ticket 4442 --> https://svn.open-mpi.org/trac/ompi/ticket/4442
2014-03-27 19:37:07 +00:00
Nathan Hjelm
b3bb90cf2d Do not include inttypes.h directly in Open MPI. Use opal_stdint.h instead.
This commit should finish the work started for #869. Closing that ticket
with this commit.

Closes trac:869

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31257.

The following Trac tickets were found above:
  Ticket 869 --> https://svn.open-mpi.org/trac/ompi/ticket/869
2014-03-27 17:56:00 +00:00
Vasily Filipov
8ef2e746e6 BTL/OPENIB: fix for rdma cm AF_IB case - user private data pointer points to a lib RDMA CM header and not to a "Consumer Private Data".
This commit was SVN r31247.
2014-03-27 14:04:02 +00:00