Commit graph

5171 commits

Author SHA1 Message Date
Jeff Squyres
c4d85ec6ca btl_usnic_cclient.c: update to use the new opal dstore
Use the new opal dstore API (vs. the old RTE DB API).

(dstore is not going to the v1.8 series, so there's no need to CMR
this to v1.8)

This commit was SVN r31580.
2014-04-30 22:32:47 +00:00
Nathan Hjelm
e963869fdf bcol/basesmuma: close mmapped file descriptor
Not closing this file descriptor will cause us to leak file
descriptors. It is safe to close the file after it has been mmapped.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31579.
2014-04-30 22:28:08 +00:00
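The bcol/basesmuma entry above relies on the POSIX rule that closing a descriptor does not remove an existing mapping. A minimal sketch of that pattern, assuming a POSIX system; `map_file_and_close` is a hypothetical helper written for illustration and is not the code from the commit:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the commit): create/open `path`, size it
 * to `len` bytes, map it read/write, and close the descriptor before
 * returning. Returns NULL on any failure. */
static void *map_file_and_close(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t) len) != 0) {
        close(fd);
        return NULL;
    }
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    /* Closing here avoids the descriptor leak; the mapping stays valid
     * until munmap() is called on it. */
    close(fd);
    return (MAP_FAILED == addr) ? NULL : addr;
}
```

The caller remains responsible for calling `munmap()` on the returned region and for unlinking the backing file when it is no longer needed.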
Jeff Squyres
d40112a012 rte_base_frame.c: add sanity check to ensure proper sizes
There's a requirement in several places (e.g., opal dstore) that
sizeof(ompi_process_name_t) -- which comes from the compile-time
selected ompi/mca/rte component -- is equal to sizeof(uint64_t).  If
it's not, Bad Things will happen.

So put an assert here to catch that case.

This commit was SVN r31577.
2014-04-30 22:12:54 +00:00
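The size requirement described in the rte_base_frame.c entry above can be illustrated with a short sketch. The type `example_process_name_t` and both functions are assumptions made for this example; the real `ompi_process_name_t` is supplied by the selected ompi/mca/rte component and may look different:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a process-name type (assumption). */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_process_name_t;

/* Returns 1 when a name occupies exactly as many bytes as a uint64_t,
 * 0 otherwise. */
static int name_fits_uint64(void)
{
    return sizeof(example_process_name_t) == sizeof(uint64_t);
}

/* Debug-build guard in the spirit of the assert the commit mentions. */
static void check_name_size(void)
{
    assert(name_fits_uint64());
}
```

A compile-time check would catch the mismatch earlier, but the commit message describes an assert, so the sketch keeps a run-time check.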
Nathan Hjelm
d80f14eb0f sbgp/ptp: fix obvious typo
cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31575.
2014-04-30 22:10:22 +00:00
Nathan Hjelm
3e5388eaa6 mtl/psm: do not limit PSM to 8191 context ids
The old default context id maximum was committed to the trunk in
2006. After some discussion with Intel it appears this is restricting
the mtl to an arbitrarily small number of communicators. Increasing the
default to allow up to 2^16 - 1 context ids.

Refs trac:4574

cmr=v1.8.2

This commit was SVN r31574.

The following Trac tickets were found above:
  Ticket 4574 --> https://svn.open-mpi.org/trac/ompi/ticket/4574
2014-04-30 22:10:15 +00:00
Ralph Castain
c4c9bc1573 As per the RFC:
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php

Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM

This commit was SVN r31557.
2014-04-29 21:49:23 +00:00
Rolf vandeVaart
fc0a75da91 Fix help message errors as reported by check-help-strings.pl script.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31555.
2014-04-29 20:29:18 +00:00
Nathan Hjelm
2f5b1ca4cf osc/rdma: do not leak the receive request
This commit fixes a bug that can cause request and communicator leaks
when cleaning up an OSC window. This should prevent a hang seen with
IMB-EXT.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31539.
2014-04-28 19:55:18 +00:00
Nathan Hjelm
626b521e9c pml/ob1: fix heterogeneous support when using the send_inline optimization
We will track #4568 from the 1.8 CMR.

Closes trac:4568

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31535.

The following Trac tickets were found above:
  Ticket 4568 --> https://svn.open-mpi.org/trac/ompi/ticket/4568
2014-04-28 17:36:26 +00:00
Jeff Squyres
64c1228b55 Roll back r31519 and r31521: George convinced us that these approaches
weren't right.

This commit was SVN r31528.

The following SVN revision numbers were found above:
  r31519 --> open-mpi/ompi@b449c750b7
  r31521 --> open-mpi/ompi@e243805ed8
2014-04-24 20:27:03 +00:00
Nathan Hjelm
c9a257f1a0 btl/ugni: always buffer sendi fragments
This commit will improve the message rate when using the sendi function
by not waiting for the send to get to the remote process.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31526.
2014-04-24 18:50:29 +00:00
Nathan Hjelm
0849d61e38 btl/vader: improve performance under heavy load and eliminate a racy
feature

This commit should fix a hang seen when running some of the one-sided
tests. The downside of this fix is it reduces the maximum size of the
messages that use the fast boxes. I will fix this in a later commit.

To improve performance under a heavy load I introduced sequencing to
ensure messages are given to the pml in order. I have seen little to no
impact on the message rate or latency with this change and there is a
clear improvement to the heavy message rate case.

Let's let this sit in the trunk for a couple of days to ensure that
everything is working correctly.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31522.
2014-04-24 17:36:03 +00:00
Jeff Squyres
e243805ed8 coll tuned alltoallv: correctly handle 0-sized messages with MPI_IN_PLACE
Patch from Gilles Gouaillardet on #4517 to fix handling 0-sized
messages in coll tuned with MPI_ALLTOALLV and MPI_IN_PLACE.

Reviewed by Jeff Squyres.

Fixes trac:4517

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31521.

The following Trac tickets were found above:
  Ticket 4517 --> https://svn.open-mpi.org/trac/ompi/ticket/4517
2014-04-24 16:55:53 +00:00
Jeff Squyres
b449c750b7 coll basic: correctly handle alltoall[vw] 0-sized messages
Patch from Gilles Gouaillardet on #4506 to correctly handle 0-sized
messages in coll/basic MPI_Alltoallv and MPI_Alltoallw.

Reviewed by Jeff Squyres.

Fixes trac:4506.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31519.

The following Trac tickets were found above:
  Ticket 4506 --> https://svn.open-mpi.org/trac/ompi/ticket/4506
2014-04-24 16:25:43 +00:00
Jeff Squyres
e9b694f1d8 coll_base_comm_unselect.c: fix memory leaks
Make sure to also OBJ_RELEASE the neighbor and ineighbor modules.

Fixes trac:4444 (this patch is from that ticket).

This commit was SVN r31516.

The following Trac tickets were found above:
  Ticket 4444 --> https://svn.open-mpi.org/trac/ompi/ticket/4444
2014-04-24 15:53:06 +00:00
George Bosilca
024221f469 Initialize some fields (prevent valgrind complaints).
This commit was SVN r31503.
2014-04-23 13:38:30 +00:00
Jeff Squyres
5d17628823 Add in an opal_output_verbose() so that we'll see the case where there
are no usNICs found.

Refs trac:4549

This commit was SVN r31489.

The following Trac tickets were found above:
  Ticket 4549 --> https://svn.open-mpi.org/trac/ompi/ticket/4549
2014-04-22 18:59:10 +00:00
Mike Dubman
a4990de055 mca: track external lib version (runtime/compiletime) for mca component
based on thread: http://www.open-mpi.org/community/lists/devel/2014/04/14505.php

Create mca parameter to track runtime/compiletime ext lib version for component.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31487.
2014-04-22 18:02:26 +00:00
George Bosilca
75cf79c783 Ahem ... Correctly implement most of the 3 arguments
operator in Open MPI. Creepy that it was not discovered earlier.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31473.
2014-04-21 23:31:23 +00:00
George Bosilca
6a65d27bcc Print the 3rd buffer for the MPI_Op.
This commit was SVN r31471.
2014-04-21 23:29:30 +00:00
Jeff Squyres
a3acc49688 usnic_component.c: don't complain if there are no usNIC devices
cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r31468.
2014-04-21 19:28:48 +00:00
Mike Dubman
6f057e57ba MXM: enable on demand mapping for only MPI mxm context
fixed by Devender, reviewed by Yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31463.
2014-04-20 09:15:37 +00:00
Jeff Squyres
a28d7af262 Remove set-but-unused variable.
This commit was SVN r31457.
2014-04-19 12:51:51 +00:00
Rolf vandeVaart
1fab9bb37f Fixes per review by jsquyres. Piggy back on btl_base_verbose rather than using my own special MCA var.
This commit was SVN r31427.
2014-04-18 18:09:09 +00:00
Rolf vandeVaart
a6a245b5b5 More efficient way of waiting for asynchronous copy to complete.
This commit was SVN r31420.
2014-04-17 15:18:50 +00:00
Nathan Hjelm
a03b11c20e bcol/basesmuma: fix broken allgather algorithm
The algorithm was failing ibm/collective/allgather and iallgather. I
cleaned up the code to eliminate duplicate code paths and tracked the
issue down to an error in the way extra nodes in the knomial exchange
are handled. The new code is more compact and has been tested with up
to 64 ranks with the ibm test suite.

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31419.
2014-04-16 22:43:52 +00:00
Nathan Hjelm
e125bbe347 coll/ml: clean out apparently stale code
The file coll_ml_ibarrier.c wasn't included in coll/ml's Makefile.am
and the setup code from coll_ml_hier_algorithms_ibarrier.c was not
being called. It looks like this code is stale and has long since been
replaced by the code in coll_ml_barrier.c

Once all these little CMRs are approved I may make it into one roll-up
CMR to make it easier on the RM.

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31418.
2014-04-16 22:43:43 +00:00
Nathan Hjelm
484a3f6147 coll/ml: fix issues identified by the clang static analyser and fix
a segmentation fault in the reduce cleanup

Some of the changes address false warnings produced by scan-build. I
added asserts and changed some malloc calls to calloc to silence these
warnings.

There was one issue in cleanup for reduce since the component_functions
member is changed by the allreduce call. There may be other issues
with how this code works but releasing the allocated
component_functions after setting up the static functions addresses
the primary issue (SIGSEGV).

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31417.
2014-04-16 22:43:35 +00:00
Rolf vandeVaart
8897e2f5bb Fix typo error in commit r31388.
This commit was SVN r31398.

The following SVN revision numbers were found above:
  r31388 --> open-mpi/ompi@ccb33ff811
2014-04-15 19:50:54 +00:00
Nathan Hjelm
ccb33ff811 btl: Use C99 sub-object naming when initializing BTL components
Two things to note:

 - This change will allow us to expand the BTL interface without
   having to worry about modifying BTLs that will not support the new
   interfaces. More on this will come later this year as part of the
   1.9 series.

 - C99 guarantees that uninitialized members of structs declared outside
   of functions (DATA binary section) will be initialized with
   0's. This allows us to drop stuff like .btl_flags = 0, or .btl_get
   = NULL.

This commit was SVN r31388.
2014-04-14 19:29:26 +00:00
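The two points in the btl entry above can be seen in a small sketch of C99 designated initializers. The struct below is a hypothetical, heavily reduced stand-in; only the member names `btl_flags` and `btl_get` come from the commit message, everything else is an assumption for illustration:

```c
#include <stddef.h>

/* Hypothetical function-pointer type and module struct (assumptions). */
typedef int (*example_send_fn_t)(const void *buf, size_t len);

struct example_btl_module {
    unsigned int      btl_flags;
    size_t            btl_eager_limit;
    example_send_fn_t btl_send;
    example_send_fn_t btl_get;
};

/* Trivial send routine so the initializer has something to point at. */
static int example_send(const void *buf, size_t len)
{
    (void) buf;
    return (int) len;
}

/* Only the members that matter are named. The omitted members
 * (btl_flags, btl_get) are initialized to 0 and NULL respectively. */
static struct example_btl_module example_module = {
    .btl_eager_limit = 4096,
    .btl_send        = example_send,
};
```

Adding a new member to the struct later does not require touching this initializer, which is the interface-growth benefit the commit message refers to.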
Yossi Etigin
7efb724d7b osc/rdma: fix deadlock with put_long protocol.
When sending PUT_LONG, the data is sent before headers, and sometimes
the header is not flushed immediately. This creates a lot of unexpected
receives in the peer, since it posts a receive only when it gets the
header, which makes it run out of receive buffers. When the sender
eventually flushes the window, the receiver already has no buffers to
receive the header, which causes a deadlock.

The fix is to always flush the headers when doing put_long.

cmr=v1.8.1:reviewer=hjelmn

This commit was SVN r31378.
2014-04-13 16:24:56 +00:00
Jeff Squyres
6521dcc4f1 Trivial defensive programming/style update: use {}, even for 1-line blocks.
This commit was SVN r31361.
2014-04-09 16:28:31 +00:00
Nathan Hjelm
7aece0a7fd osc/sm: fix bugs in both the passive and active target paths
While testing one-sided on LANL systems I found a couple more OSC
bugs that were not caught during the initial testing:

 - In the passive target code we read the read lock count as a
   char instead of the intended uint32_t. This causes lock to
   lock up when using shared locks after 127 iterations.

 - The post code used the wrong group when trying to increment post
   counters. This causes a segmentation fault.

 - Both the post and wait code used the wrong check in the inner
   loop leading to an infinite loop.

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31354.
2014-04-08 21:55:00 +00:00
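The first bug in the osc/sm entry above (a 32-bit counter read through a char) can be reproduced in isolation. This is a sketch of the general pitfall only, with hypothetical function names; it is not the osc/sm code:

```c
#include <stdint.h>

/* Buggy pattern: the 32-bit counter is narrowed to a signed 8-bit
 * value, which cannot represent anything above 127. */
static int read_count_narrow(const uint32_t *counter)
{
    signed char c = (signed char) *counter;
    return c;
}

/* Intended pattern: keep the full 32-bit width. */
static uint32_t read_count_wide(const uint32_t *counter)
{
    return *counter;
}
```

Up to 127 both readers agree. From 128 onward the narrow reader can no longer return the stored value, which is consistent with the failure after 127 iterations that the commit describes.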
Nathan Hjelm
a31bfbeb2c osc/rdma: fix typo in get accumulate path
There was a typo in the ompi_osc_gacc_long_start that was causing a
segmentation fault when executing long get accumulate operations.

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31353.
2014-04-08 21:54:52 +00:00
Ryan Grant
ca0a7b1a9a Correct typo in r31332, mtl_portals_enpoint.h -> mtl_portals_endpoint.h
This commit was SVN r31338.

The following SVN revision numbers were found above:
  r31332 --> open-mpi/ompi@b12ee27b3d
2014-04-08 14:41:51 +00:00
Ralph Castain
b12ee27b3d Add missing files - thanks to Mr. Anonymous for reporting them as missing from the 1.8 tarball
cmr=v1.8.1:reviewer=jsquyres:subject=add missing portals4 files

This commit was SVN r31332.
2014-04-08 02:55:14 +00:00
Jeff Squyres
16f90acbaf btl usnic: Add some SHOW_HELP: tokens and remove 2 unused help messages
This commit was SVN r31322.
2014-04-07 15:40:19 +00:00
George Bosilca
95a4f219ea This commit fixes some of the Coverity reported warnings. I addressed
some of the collective modules, the shared memory and the profiling
interface. I left out VT, dynamic fcoll and seq rmaps.

cmr=v1.8.1:reviewer=jsquyres:subject=silence Coverity reported warnings

This commit was SVN r31309.
2014-04-06 18:23:49 +00:00
Nathan Hjelm
9112977d86 btl/openib/udcm: fix two race conditions
This commit fixes two nasty races:

 - One can occur if the connection request message and connection completion
   message arrive out of order. This can happen normally when adaptive routing
   is used and also in a timeout situation where a UD message is lost.

 - One occurs when handling an ack at the same time as we are handling the
   message timeout. In this case we can not free the message or the timeout
   will be operating on invalid data. This fix is a band-aid until I can come
   up with a better approach. Instead of freeing the message it is marked
   as inactive and the event callback is triggered immediately (this has no
   effect if the callback is already active). The callback then frees the
   message if it is inactive.

cmr=v1.8.1:reviewer=pasha

This commit was SVN r31305.
2014-04-02 15:09:50 +00:00
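The second fix in the udcm entry above (mark the message inactive and let the timeout callback free it) follows a common deferred-free pattern. A minimal single-threaded sketch, with all names hypothetical and the event machinery replaced by direct calls:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical message record (names are illustrative only). */
struct example_msg {
    bool active;   /* false once the ack has been processed */
    int  retries;  /* retransmissions triggered by the timeout */
};

static struct example_msg *example_msg_new(void)
{
    struct example_msg *msg = calloc(1, sizeof(*msg));
    if (NULL != msg) {
        msg->active = true;
    }
    return msg;
}

/* Ack path: never frees, only marks the message as inactive. In the
 * commit, the event callback is additionally triggered right away. */
static void example_handle_ack(struct example_msg *msg)
{
    msg->active = false;
}

/* Timeout callback: the only place that frees. Returns true when the
 * message was released, false when it is still waiting for an ack. */
static bool example_timeout_cb(struct example_msg *msg)
{
    if (!msg->active) {
        free(msg);
        return true;
    }
    msg->retries++;
    return false;
}
```

Because only one code path ever frees the message, the timeout handler cannot run on memory the ack handler has already released. The real component also has to deal with concurrency, which this sketch leaves out.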
Nathan Hjelm
71bdb8c439 coll/ml: fix some warnings identified by clang
cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31285.
2014-03-28 22:31:41 +00:00
Nathan Hjelm
fdf4c3b900 osc/rdma: really fix active message support
The last fix prevented a hang but had some cases where the results were
wrong. Fixed. Tested with armci, openmpi/ibm, openmpi/onesided.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31284.
2014-03-28 22:06:16 +00:00
Nathan Hjelm
6913a0f3cf osc/base: defensive programming. handle one more possible datatype case
It might be possible (don't know) for a datatype to be made of a contiguous block
of a primitive datatype and have an lb. If this is ever the case the code
would have done the wrong thing. Add the lb in to be safe.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31283.
2014-03-28 22:06:05 +00:00
Nathan Hjelm
459431622b Revert "coll/ml: there is no reason not to enable coll/ml when a process is not"
Discussed this with Manju and we decided to back this one out until a later time.

This reverts commit r31188 and closes trac:4435

This commit was SVN r31282.

The following SVN revision numbers were found above:
  r31188 --> open-mpi/ompi@f1dd589092

The following Trac tickets were found above:
  Ticket 4435 --> https://svn.open-mpi.org/trac/ompi/ticket/4435
2014-03-28 21:16:34 +00:00
Nathan Hjelm
ee7a1478ee osc/rdma: fix test/wait hang
There are differences between how active and passive messages are
accounted for in this component. Active message counts on the sender
side are set to zero before the control message is sent so we do not
have to add one to the expected number of messages or we end up
double counting the control message. This commit should fix that error.

Fixes regression in one-sided/test_rma1

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31281.
2014-03-28 20:49:20 +00:00
Manjunath Gorentla Venkata
28609d3ac2 Clean warning in sbgp and coll ml
This commit was SVN r31280.
2014-03-28 19:53:36 +00:00
Manjunath Gorentla Venkata
8c849ee991 coll/ml : Replace longer error message with opal_show_help; thanks Jeff for identifying those
This commit was SVN r31279.
2014-03-28 19:25:54 +00:00
Nathan Hjelm
a9fb4976d5 coll/ml: more fixes
There were a couple of issues with the memory leak fixes and several more verbose
issues. This fixes those issues.

cmr=v1.8.1:ticket=trac:4473

This commit was SVN r31273.

The following Trac tickets were found above:
  Ticket 4473 --> https://svn.open-mpi.org/trac/ompi/ticket/4473
2014-03-28 18:31:28 +00:00
Nathan Hjelm
efa37c17c8 osc/base: fix one more case in ompi_osc_base_sndrcv_op
This fixes more issues identified by armci. More issues still remain and fixes are
coming for those as well.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31272.
2014-03-28 18:31:10 +00:00
Jeff Squyres
173c046617 build: add Automake-like silent/verbose macros for "ln -s ..." operations
Also, since I put some of the macros for these silent/verbose rules up
in the top-level Makefile.man-page-rules file, I renamed it to
Makefile.ompi-rules.

I've had this sitting around for a while; now seems like as good a
time as any to commit it.

This commit was SVN r31271.
2014-03-28 18:24:32 +00:00
Nathan Hjelm
ecce211403 btl/vader: create the shared memory backing file in the proc's session
directory not the job's

This bug didn't affect the correctness of the vader results just the
cleanup. This commit removes an error message about removing a non-existent
file.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31265.
2014-03-28 00:38:19 +00:00
Nathan Hjelm
bd3b550c6d coll/ml: fix leaks
Thanks to ggouaillardet for finding and fixing these issues.

Closes trac:4460

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31264.

The following Trac tickets were found above:
  Ticket 4460 --> https://svn.open-mpi.org/trac/ompi/ticket/4460
2014-03-27 23:25:31 +00:00
Nathan Hjelm
545d5daced osc: add missing MPI_ERR_RMA_SHARED error code and internal equivalent
cmr=v1.8:reviewer=jsquyres

This commit was SVN r31259.
2014-03-27 20:06:43 +00:00
Jeff Squyres
cdb396697c usnic: do not disqualify if a peer does not put usnic modex info
If ompi_modex_recv() fails with OPAL_ERR_DATA_VALUE_NOT_FOUND, it
simply means that the peer process did not put any usnic BTL modex
info -- it is not an error.  So have the usnic BTL simply ignore that
peer (vs. disqualifying itself / treating this like a real error).

Refs trac:4442.

This commit was SVN r31258.

The following Trac tickets were found above:
  Ticket 4442 --> https://svn.open-mpi.org/trac/ompi/ticket/4442
2014-03-27 19:37:07 +00:00
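The usnic entry above distinguishes "the peer published no data" from a real failure. A minimal sketch of that classification; the enum values stand in for `OPAL_SUCCESS` and `OPAL_ERR_DATA_VALUE_NOT_FOUND`, their numeric values are made up, and `example_classify_peer` is a hypothetical function:

```c
#include <stdbool.h>

/* Hypothetical status codes (assumptions, not the OPAL values). */
enum example_status {
    EXAMPLE_SUCCESS       = 0,
    EXAMPLE_ERR_NOT_FOUND = -1,
    EXAMPLE_ERR_OTHER     = -2
};

/* Decide what to do with a peer given the modex lookup result.
 * Sets *use_peer and returns EXAMPLE_SUCCESS unless the lookup failed
 * for a reason other than "no data published". */
static int example_classify_peer(int modex_rc, bool *use_peer)
{
    switch (modex_rc) {
    case EXAMPLE_SUCCESS:
        *use_peer = true;
        return EXAMPLE_SUCCESS;
    case EXAMPLE_ERR_NOT_FOUND:
        *use_peer = false;   /* peer has no usnic info: skip it */
        return EXAMPLE_SUCCESS;
    default:
        *use_peer = false;
        return modex_rc;     /* genuine error: propagate */
    }
}
```

With this shape the component keeps running and simply ignores peers without modex data, instead of disqualifying itself.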
Nathan Hjelm
b3bb90cf2d Do not include inttypes.h directly in Open MPI. Use opal_stdint.h instead.
This commit should finish the work started for #869. Closing that ticket
with this commit.

Closes trac:869

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31257.

The following Trac tickets were found above:
  Ticket 869 --> https://svn.open-mpi.org/trac/ompi/ticket/869
2014-03-27 17:56:00 +00:00
Vasily Filipov
8ef2e746e6 BTL/OPENIB: fix for rdma cm AF_IB case - user private data pointer points to a lib RDMA CM header and not to a "Consumer Private Data".
This commit was SVN r31247.
2014-03-27 14:04:02 +00:00
Alina Sklarevich
5cbf085dc2 mtl mxm: silence a warning.
in ompi_mtl_mxm_add_procs, define the ep_index variable only
for an older version of mxm.

submitted by Alina, reviewed by Mike.
cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31245.
2014-03-27 08:39:51 +00:00
Nathan Hjelm
0cccb2fb59 coll/ml: reduce noise from coll/ml error messages
The error doesn't prevent the user from running so there is no reason
to display it unless the user requested it (through coll_ml_verbose).

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31242.
2014-03-26 22:50:06 +00:00
Nathan Hjelm
b9da3ef462 btl/vader: actually set the correct send size in all cases
Fix a one line bug when dealing with non-contiguous sends in prepare_src. Bug was
identified by the intel test suite.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31232.
2014-03-26 21:50:07 +00:00
Nathan Hjelm
fc941edaf8 osc/base: adjust the logic in ompi_osc_base_sndrcv_op to adjust for
the case fix in ompi_osc_base_process_op in r31204.

There are two cases that needed to be handled:

 - The target is a simple datatype (contiguous block of a primitive
   type) but the origin is not. In this case we still need to pack
   the origin data but we can not rely on the convertor to do the
   unpack (see r31204).

 - Both the origin and target datatypes are simple datatypes. In this
   case we can use ompi_op_reduce to do the accumulation without having
   to pack the origin data.

cmr=v1.8:ticket=trac:4449

This commit was SVN r31231.

The following SVN revision numbers were found above:
  r31204 --> open-mpi/ompi@949abe45cd

The following Trac tickets were found above:
  Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
2014-03-26 17:07:29 +00:00
Nathan Hjelm
5400e21688 btl/vader: unlink the shared memory segment when finished
cmr=v1.8:reviewer=jsquyres

This commit was SVN r31230.
2014-03-26 16:25:02 +00:00
Nathan Hjelm
925af4706c osc/sm: fix bugs in window initialization and finalization
Fixed two bugs:

 - Use module->comm NOT comm to get the CID for the shared memory backing
   file. This fixes the case where there are multiple shared memory windows
   at the same time.

 - Remember to unlink the shared memory backing file.

Refs trac:4438

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31227.

The following Trac tickets were found above:
  Ticket 4438 --> https://svn.open-mpi.org/trac/ompi/ticket/4438
2014-03-26 15:52:51 +00:00
Nathan Hjelm
020f011552 osc/rdma: fix bugs in lock_all and flush_all
This commit fixes two bugs:

 - We were not correctly setting the lock type in the outstanding lock
   for lock_all. This caused undefined behavior.

 - flush_all was incorrectly checking for comm size - 1 lock acks but
   comm size flush acks. This is the reverse of what was intended.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31226.
2014-03-25 23:39:43 +00:00
Jeff Squyres
5a09ee5d8c usnic: drop unknown connection checker packets without erroring out
In most cases, bad messages received by the connectivity checker are
just dropped.  However, in one specific code path, a bad packet caused
an abort.  Doh!

This commit does two things:

1. Improve verbose messages for all these cases
2. Simply drop incoming messages that cannot be identified as ACKs or PINGs

Submitted by Jeff Squyres, reviewed by Dave Goodell.

cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31225.
2014-03-25 21:05:20 +00:00
Nathan Hjelm
0d703759f6 osc/rdma: fix possible error when encountering accumulate lock contention
It is possible to get into a situation where a small accumulate operation
can not be completed because a large accumulate operation holds the lock.
In this case we may return from wait/flush/etc before the operation is
complete. To handle this case increment the expected incoming fragment
count when queuing an accumulate operation and increment the incoming
fragment count after processing the accumulate operation.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31224.
2014-03-25 21:00:43 +00:00
Nathan Hjelm
3df85b47e9 osc/rdma: quiet warning in r31197
cmr=v1.8:ticket=trac:4441

This commit was SVN r31223.

The following SVN revision numbers were found above:
  r31197 --> open-mpi/ompi@0ed44f2fdb

The following Trac tickets were found above:
  Ticket 4441 --> https://svn.open-mpi.org/trac/ompi/ticket/4441
2014-03-25 21:00:36 +00:00
Nathan Hjelm
20af8339e6 osc/base: add support for datatypes that are a contiguous combination
of the primitive datatype

In this case we can not use the convertor to run the accumulate operation
since the datatype is more or less a primitive type.

cmr=v1.8:ticket=trac:4449

This commit was SVN r31222.

The following Trac tickets were found above:
  Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
2014-03-25 21:00:26 +00:00
Nathan Hjelm
d681eb4655 osc/rdma: fix warnings introduced by r31204
cmr=v1.8:ticket=trac:4449

This commit was SVN r31221.

The following SVN revision numbers were found above:
  r31204 --> open-mpi/ompi@949abe45cd

The following Trac tickets were found above:
  Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
2014-03-25 21:00:19 +00:00
Nathan Hjelm
949abe45cd osc: fix datatype related issues in the one-sided code
This commit fixes two issues:

 - osc/rdma: The target side of an accumulate was using the target datatype
   in the receive to the packed buffer. This was conflicting with the way
   the reduction is done into the target buffer. Changed the receive to use
   the primitive datatype.

 - osc/base: The copy table was completely wrong. Fixed the table to match
   the underlying datatypes (which are opal not ompi datatypes).

 - osc/base: There is a problem using the optimized description. Fall back
   on using the non-optimized description until we can understand what is
   going wrong.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31204.
2014-03-25 15:28:48 +00:00
Nathan Hjelm
bc55276844 osc/rdma: fix bug in the active message code that could cause erroneous
results

The code to handle completion messages did not correctly increment the
number of expected messages. This could cause wait to return before all
incoming messages are complete.

I also added a check to ensure that start returns an error if we are in
a passive access epoch.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31203.
2014-03-25 15:28:36 +00:00
Jeff Squyres
8c2b9658ce Commit upstream ROMIO fix: dbad7873926a75adbff0fd0140ae321412f70d66
ROMIO code assumes all processes will use the same ROMIO driver. We
were not reaching the "find a common file system" logic when NFS was
enabled: everyone stat-ed the file system without errors, but some
processes found a different file system (like if some processes are
writing to NFS and others to UFS).

See discussion beginning here:
http://lists.mpich.org/pipermail/discuss/2014-March/002403.html

Tested-by: Jeff Squyres <jsquyres@cisco.com>

Submitted by Rob Latham, reviewed by Jeff Squyres

cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31201.
2014-03-25 14:50:07 +00:00
Alina Sklarevich
947233f539 common/verbs: added a call to ompi_ibv_free_device_list.
the ompi_common_verbs_find_ports function had a call to
ompi_ibv_get_device_list, but not to ompi_ibv_free_device_list.

fixed by Alina, reviewed by Vasily/Mike.
cmr=v1.8:reviewer=ompi-rm1.8 

This commit was SVN r31200.
2014-03-25 14:41:09 +00:00
Mike Dubman
b8dddabcfb add config section for upcoming ConnectX-4 card
cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31199.
2014-03-25 14:27:09 +00:00
Nathan Hjelm
0ed44f2fdb osc/rdma: add support for datatypes with large descriptions
This commit adds large datatype description support to the osc/rdma
component. Support is provided by an additional send/recv of the datatype
description if the description does not fit in an eager buffer. The
code is designed to require minimal new code and not for speed. We
consider this code path to be a slow path.

Refs trac:1905

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31197.

The following Trac tickets were found above:
  Ticket 1905 --> https://svn.open-mpi.org/trac/ompi/ticket/1905
2014-03-24 18:57:29 +00:00
Vasily Filipov
c424ad94f3 BTL/OPENIB: remove AC_RUN_IFELSE from configure and check AF_IB support by lib rdmacm during component_init.
This commit was SVN r31194.
2014-03-24 13:36:04 +00:00
Nathan Hjelm
15a8c9d7b8 coll/ml: addendum to r31189. increment the bcol_index
cmr=v1.8:ticket=trac:4436

This commit was SVN r31193.

The following SVN revision numbers were found above:
  r31189 --> open-mpi/ompi@c7d830f4b9

The following Trac tickets were found above:
  Ticket 4436 --> https://svn.open-mpi.org/trac/ompi/ticket/4436
2014-03-21 22:03:56 +00:00
Nathan Hjelm
128cfe0a39 coll/ml: cleanup tabs, indentation, and trailing whitespace in
bcol_basesmuma_bcast.c

This commit was SVN r31192.
2014-03-21 21:54:48 +00:00
Nathan Hjelm
d241f95af1 squash into previous. fix coll ml bcast
This commit was SVN r31191.
2014-03-21 21:54:41 +00:00
Nathan Hjelm
6740813c27 bcol/basesmuma: fix selection of coll/ml when only using local procs
When we are only using local ranks basesmuma needs to provide an allreduce
function for both large and small messages or else the coll/ml selection
logic will fail. In the future this logic should probably be updated to
just disable allreduce in coll/ml instead of disabling coll/ml. For now
it should be correct to say the basesmuma allgather works for larger
messages.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31190.
2014-03-21 21:54:35 +00:00
Nathan Hjelm
c7d830f4b9 coll/ml: improve the buffer size calculation and ensure the bcol_index in
a hierarchy actually matches a bcol that is in use.

There was a bug in one of the paths to calculate the ml buffer size. I fixed
the bug and squashed all the paths together to avoid further issues (the
result was correct in another path that calculated the same value).

Additionally, the i_hier was being used as the bcol_index. This is not
correct in a couple of cases so I added a variable to keep track of the
real bcol_index.

cmr=v1.8:reviewer=pasha

This commit was SVN r31189.
2014-03-21 21:54:28 +00:00
Nathan Hjelm
f1dd589092 coll/ml: there is no reason not to enable coll/ml when a process is not
bound.

This case is correctly handled by coll/ml so remove the check that disables
coll/ml in the not bound case.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31188.
2014-03-21 21:54:21 +00:00
Nathan Hjelm
08bbdcbf61 coll/ml: fix leaks in coll/ml resources
This patch fixes two leaks:

 - Fix typo in fallback collective code that caused coll/ml to retain
   the ibcast module twice but only release it once. One of those ibcast
   saves was supposed to be bcast.

 - Do not check for module initialization in the module destructor. It
   is possible to destruct a module that is partially setup.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31187.
2014-03-21 21:54:14 +00:00
Nathan Hjelm
20fe3804b0 Fix comment in r31146
cmr=v1.7.5:ticket=trac:4425

This commit was SVN r31148.

The following SVN revision numbers were found above:
  r31146 --> open-mpi/ompi@dca2f0027e

The following Trac tickets were found above:
  Ticket 4425 --> https://svn.open-mpi.org/trac/ompi/ticket/4425
2014-03-19 16:09:20 +00:00
Nathan Hjelm
dca2f0027e Protect against 0-byte allocations in cart_create and cart_sub.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31146.
2014-03-19 15:38:12 +00:00
Jeff Squyres
7adb137409 Fix segv in MPI_Graph_create_undef_c Intel test.
When you call MPI_Graph_create with an old_comm of size N, and pass
nnodes=(N-1), then the Nth proc is supposed to get MPI_COMM_NULL out.
The code in this base function didn't properly handle the proc(s) that
are supposed to get MPI_COMM_NULL out.

cmr=v1.7.5:reviewer=hjelmn

This commit was SVN r31145.
2014-03-19 15:16:28 +00:00
Jeff Squyres
c6994adf66 Add missing show_help message.
Found via Cisco MTT (i.e., it complained of not being able to find
this show_help message).

cmr=v1.8:reviewer=dgoodell

This commit was SVN r31144.
2014-03-19 14:09:19 +00:00
Nathan Hjelm
e764d3bebc coll/ml: really remove the asserts in the barrier setup
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31136.
2014-03-18 22:04:50 +00:00
Nathan Hjelm
e030443d45 coll/ml: further improve the hierarchy discovery to handle the case where a
sbgp module fails to group any processes on any nodes.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31131.
2014-03-18 21:26:24 +00:00
Nathan Hjelm
8b2d723fd4 coll/ml: fix valgrind warning about reading uninitialized value
This isn't causing any errors that I know about but it does fix an
annoying valgrind warning. Simple fix, no review required.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31130.
2014-03-18 21:26:17 +00:00
Nathan Hjelm
d9c8bf3785 coll/ml: move error messages to verbose output
There are situations where coll/ml does not initialize properly. These will
eventually need to be fixed but in the meantime it is better to not always
print an error message because the collective framework can still fall back
on another collective module. This commit reduces the verbose output.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31129.
2014-03-18 21:26:10 +00:00
Nathan Hjelm
97d7315dd2 coll/ml: do not assert if a barrier algorithm is not available
It is usually not a good idea to assert when something is not implemented
or something goes wrong. Replace asserts with debug output and return.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31128.
2014-03-18 21:26:04 +00:00
Nathan Hjelm
bddd6542b7 sbgp/basesmsocket: do not recalculate process locality
The necessary information is stored in the proc object. There is no need
to allgather the local process data to determine if another rank is on
the same socket.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31127.
2014-03-18 21:25:57 +00:00
Nathan Hjelm
22f64bb62b Addendum to r31096. Up basesmuma algorithm limits to 1M.
After discussion with Manju we decided to update the process count
limits of the shared memory collectives to an arbitrarily large number.

cmr=v1.7.5:ticket=trac:4405

This commit was SVN r31126.

The following SVN revision numbers were found above:
  r31096 --> open-mpi/ompi@3f469d08e7

The following Trac tickets were found above:
  Ticket 4405 --> https://svn.open-mpi.org/trac/ompi/ticket/4405
2014-03-18 21:25:49 +00:00
Ralph Castain
543271b9de Set the locality prior to calling add_procs so bozos like Jeff get it at the right time
Refs trac:4411

This commit was SVN r31119.

The following Trac tickets were found above:
  Ticket 4411 --> https://svn.open-mpi.org/trac/ompi/ticket/4411
2014-03-18 17:57:27 +00:00
Jeff Squyres
7933de4928 Fix segv when ibv_create_ah fails.
* Ensure that all endpoints[x] values are initialized to NULL
* If ibv_create_ah fails, remove each endpoint from the
  module->all_endpoints list so that the endpoint can be destructed
  properly.

Submitted by Jeff Squyres, reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31111.
2014-03-18 15:52:55 +00:00
Ralph Castain
554da83865 Set the locality for remote procs even after a comm_spawn. Ensure we store our own local cpuset upon launch so it will be shared during comm_join.
This provides full locality - i.e., not just node-level, but all the way down to whatever common binding level exists between the procs.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31106.
2014-03-18 14:51:07 +00:00
Jeff Squyres
5efd961149 Remove unnecessary \n's in ML_VERBOSE and ML_ERROR.
Also fixed spelling: IS_NOT_RECHABLE -> IS_NOT_REACHABLE.

Also mark a few places where opal_show_help() should have been used;
Manju will take care of these.

This commit was SVN r31104.
2014-03-18 12:24:32 +00:00
Nathan Hjelm
3f469d08e7 coll/ml: increase the number of allowed processes in a local reduce and
add checks to see if the bcol module can support allreduce.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31096.
2014-03-17 23:10:19 +00:00
Pavel Shamis
fba1edbf14 Removing ml include from bcol_ptpcoll.h.
It is not really required.

This commit was SVN r31095.
2014-03-17 22:58:40 +00:00
Nathan Hjelm
f92579dce5 coll/ml: fix a case not correctly handled by r31071
In r31071 I modified the logic to not increment the hierarchy level if
no processes were selected by that sbgp. That fixed a problem seen on
systems where we don't support process binding. The problem is there
is a case where we actually did select processes yet the number of
selected processes is 0. We need to increment the hierarchy in this case
as well.

This should fix the segmentation fault found by recent MTT runs. Once
this is committed to 1.7.5 remove the .ompi_ignore's from coll/ml and
bcol/ptpcoll. Tested with ompi-tests/ibm.

cmr=v1.7.5:reviewer=rhc

This commit was SVN r31081.

The following SVN revision numbers were found above:
  r31071 --> open-mpi/ompi@1911d97044
2014-03-15 22:37:28 +00:00
Jeff Squyres
34d92315ae Remove extraneous "while(0)".
Oops.

cmr=v1.7.5:ticket=trac:4395

This commit was SVN r31075.

The following Trac tickets were found above:
  Ticket 4395 --> https://svn.open-mpi.org/trac/ompi/ticket/4395
2014-03-14 20:41:54 +00:00
Jeff Squyres
06a58affca Fix minor hwloc memory leak in sbgp/basesmsocket
cmr=v1.8:reviewer=hjelmn

This commit was SVN r31074.
2014-03-14 20:40:12 +00:00
Jeff Squyres
036db91f3d For the love of all that is holy, do not put 1MB arrays on the stack.
This was causing JVMs to run out of stack space, and all manner of
badness ensued.

Instead, use the heap -- that's what it's there for.

cmr=v1.7.5:reviewer=rhc:subject=make coll/ml use the heap for large debug array

This commit was SVN r31073.
2014-03-14 20:39:39 +00:00
Rolf vandeVaart
ce5274652f Add some additional verbose output per this RFC
http://www.open-mpi.org/community/lists/devel/2014/03/14282.php
Reviewed by Jeff Squyres

This commit was SVN r31072.
2014-03-14 20:17:47 +00:00
Nathan Hjelm
1911d97044 coll/ml: fix assertion failure that occurs when level 0 of the hierarchy
fails to select any processes on any nodes.

Also modified basesmsocket to only print debugging info to the framework
output.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31071.
2014-03-14 19:39:00 +00:00
Ralph Castain
cd72aa9b66 Per Dave's comment, bzero has portability issues and little advantage over a simple memset. So let's use the safer solution.
cmr=v1.7.5:reviewer=dgoodell:subject=replace bzero with memset

This commit was SVN r31055.
2014-03-12 22:55:47 +00:00
Nathan Hjelm
e70809e169 osc/rdma: fix the spelling of incoming
cmr=v1.7.5:ticket=trac:4379

This commit was SVN r31050.

The following Trac tickets were found above:
  Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
2014-03-12 21:43:23 +00:00
Nathan Hjelm
d0009938a6 osc/rdma: tighten semantics a bit more
It is not valid to call flush outside a passive target epoch nor is
it valid to call lock/lock_all when no_locks is set. In the former
case we were just semantically incorrect and the latter would crash
and burn.

cmr=v1.7.5:ticket=trac:4382

This commit was SVN r31046.

The following Trac tickets were found above:
  Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
2014-03-12 18:53:47 +00:00
Nathan Hjelm
1fc9a55d08 osc/rdma: do not use MPI_SOURCE to determine the peer in a send operation.
This fixes a bug in r31029 which removes the use of the pml base request
(also not a good way since cm doesn't use the base request). We now allocate
a data structure (ugh) to determine the needed information. Tested with
mtt/onesided.

cmr=v1.7.5:ticket=trac:4379

This commit was SVN r31044.

The following SVN revision numbers were found above:
  r31029 --> open-mpi/ompi@29e00f9161

The following Trac tickets were found above:
  Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
2014-03-12 17:14:11 +00:00
Nathan Hjelm
6648a46963 rma: fix semantic errors in osc/rdma and MPI_Win_fence
- Return an error if the caller specified both MPI_MODE_NOPRECEDE and
   MPI_MODE_NOSUCCEED to MPI_Win_fence.

 - Return an error if the caller attempts to enter an active target
   epoch while already in a passive target epoch.

 - End an active target epoch if MPI_Win_fence is called with
   MPI_MODE_NOSUCCEED.

cmr=v1.7.5:ticket=trac:4382

This commit was SVN r31043.

The following Trac tickets were found above:
  Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
2014-03-12 17:14:03 +00:00
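The three checks listed in 6648a46963 can be modelled as a small state machine. This is a sketch under stated assumptions, not the osc/rdma implementation: the flag values, the `WindowState` class, and the `OK`/`ERROR` return codes are all hypothetical and were chosen only for illustration.

```python
# Hypothetical flag values and return codes, used only in this sketch.
MODE_NOPRECEDE = 0x1
MODE_NOSUCCEED = 0x2
OK, ERROR = "ok", "error"


class WindowState:
    """Tracks which kind of epoch (if any) is open on a window."""

    def __init__(self):
        self.active_epoch = False
        self.passive_epoch = False

    def fence(self, assertion=0):
        both = MODE_NOPRECEDE | MODE_NOSUCCEED
        # Check 1: the two assertions may not be combined.
        if assertion & both == both:
            return ERROR
        # Check 2: no active target epoch inside a passive target epoch.
        if self.passive_epoch:
            return ERROR
        # Check 3: NOSUCCEED ends the active epoch; otherwise fence opens one.
        self.active_epoch = not (assertion & MODE_NOSUCCEED)
        return OK


win = WindowState()
first = win.fence()                  # opens an active target epoch
opened = win.active_epoch
second = win.fence(MODE_NOSUCCEED)   # closes it again
closed = not win.active_epoch
```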
Nathan Hjelm
51916c5b41 osc/rdma: now that the access epoch is not open after MPI_Win_create* we
need to enable the access epoch in MPI_Win_fence.

I missed this change when I fixed the semantics of MPI_Win_create. With
this commit our one-sided MTT runs are now running clean.

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r31041.
2014-03-12 16:11:15 +00:00
Nathan Hjelm
61f30d992a coll/ml: reduce has some issues when using non-contiguous datatypes. until
these issues are resolved disable coll/ml reduce.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31030.
2014-03-12 14:39:16 +00:00
Nathan Hjelm
29e00f9161 osc/rdma: fix issues with mpi_leave_pinned when using rdma capable btls
It seems we can't release accumulate buffers in completion callbacks
because the btls don't release registration resources until after the
callback has fired. The fix is to keep track of the unused buffers and
free them later. This should resolve issues when running IMB-EXT and
IMB-RMA.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31029.
2014-03-12 14:39:03 +00:00
Jeff Squyres
da87b506bd Remove warnings identified by clang 3.4
* Remove unused static functions
 * Remove unused static variables

cmr=v1.8:reviewer=hjelmn

This commit was SVN r31023.
2014-03-12 13:17:54 +00:00
Nathan Hjelm
d5d2d5c4d8 Add an internal ompi error code for RMA sync errors.
Dave Goodell correctly pointed out that it is unusual to return MPI
error classes from internal ompi functions. Correct this in the RMA
case by adding an internal error code to match MPI_ERR_RMA_SYNC.

This does change OMPI_ERR_MAX. I don't think this will cause any
problems with ABI.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31012.
2014-03-11 23:45:23 +00:00
Nathan Hjelm
b6a30e293a osc/rdma: check for incorrect use of the active target interface
This commit resolves a number of crashes discovered by the onesided
tests in MTT. The functions in question were operating on the assumption
the user was calling RMA functions correctly.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31008.
2014-03-11 23:01:51 +00:00
Nathan Hjelm
e9d60b9e2f osc/rdma: restrict local optimizations to occur only during an access epoch.
cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r31007.
2014-03-11 23:01:42 +00:00
Ralph Castain
9c66c4f439 Correctly implement --disable-oshmem and --without-orte so we don't build the disabled section of code. Fix a bunch of code rot in the PMI rte component, and add several missing headers when building --without-orte.
NOTE: I transferred the oshmem-disabled-by-default from the 1.7 branch to the trunk to minimize future disruption if/when we change that option.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31006.
2014-03-11 22:02:40 +00:00
Ralph Castain
ebd8e545c0 Silence warning
cmr=v1.8:reviewer=hjelmn

This commit was SVN r31005.
2014-03-11 21:59:17 +00:00
Mike Dubman
a14dda491e OSHMEM: various fixes
- -check-shmem-params is OFF by default. It checks OSHMEM API params and will abort on bad input
- hcoll does not save fallback coll pointers for unsupported collectives.

fixed by Val, Roman, reviewed by Miked/Igor

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30995.
2014-03-11 17:27:33 +00:00
Ralph Castain
e4efd5675f Per telecon, add comment indicating this needs to be fixed
Refs trac:4354

This commit was SVN r30991.

The following Trac tickets were found above:
  Ticket 4354 --> https://svn.open-mpi.org/trac/ompi/ticket/4354
2014-03-11 15:57:11 +00:00
Nathan Hjelm
51c5daf1b4 bcol/basesmuma: initialize module with all 0's to fix segmentation faults
in module destructor.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30978.
2014-03-10 20:42:47 +00:00
Nathan Hjelm
cbb531ed13 osc/rdma: use OPAL_ALIGN macro
cmr=v1.7.5:ticket=trac:4357

This commit was SVN r30975.

The following Trac tickets were found above:
  Ticket 4357 --> https://svn.open-mpi.org/trac/ompi/ticket/4357
2014-03-10 18:57:20 +00:00
Nathan Hjelm
5df8cd75a9 osc/rdma: ensure fragment headers and the packed datatype are 8-byte aligned.
The datatype unpacking code assumes that the packed datatype buffer has the
same alignment as an OPAL_PTRDIFF_TYPE. This was not enforced by the rdma
one-sided component. I changed the ordering and sizes of various osc/rdma
headers to ensure their sizes are a multiple of 8-bytes and modified the
fragment allocation call to ensure all headers are 8-byte aligned. While
not the cleanest way to handle this situation it should resolve the issue.

Fixes trac:4315

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30974.

The following Trac tickets were found above:
  Ticket 4315 --> https://svn.open-mpi.org/trac/ompi/ticket/4315
2014-03-10 18:11:22 +00:00
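The 8-byte alignment described in 5df8cd75a9 relies on the usual round-up-to-a-multiple calculation. The sketch below illustrates that idea only: `align_up` and `layout_fragment` are hypothetical helpers and the header sizes are made up. It is not the definition of OPAL_ALIGN or the actual fragment allocator.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


def layout_fragment(header_sizes, alignment=8):
    """Return the aligned start offset of each header and the padded total."""
    offsets, cursor = [], 0
    for size in header_sizes:
        cursor = align_up(cursor, alignment)  # pad so the header starts aligned
        offsets.append(cursor)
        cursor += size
    return offsets, align_up(cursor, alignment)


# Three headers of 12, 8 and 5 bytes (made-up sizes).
offsets, total = layout_fragment([12, 8, 5])
```

Making every header size a multiple of 8, as the commit does, removes the need for padding between headers because each one then ends on an aligned boundary.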
Nathan Hjelm
85515f2587 osc/rdma: silence warning
cmr=v1.7.5:ticket=trac:4355

This commit was SVN r30970.

The following Trac tickets were found above:
  Ticket 4355 --> https://svn.open-mpi.org/trac/ompi/ticket/4355
2014-03-10 16:11:25 +00:00
Yossi Etigin
b04a2339c5 Fix segmentation fault when osc_rdma is used with pml_cm: osc_rdma
assumes the send request is derived from mca_pml_base_send_request_t,
but this is not true for pml cm, so we end up freeing an invalid pointer.
We cannot take the data pointer from the pml send request, so we pass
the allocated buffer pointer in req_complete_cb_data, and put the
osc_rdma_module pointer in that buffer as well.
Previously, osc_pt2pt was used with pml_cm which didn't have this
problem.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30967.
2014-03-10 15:21:37 +00:00
Yossi Etigin
280e96c99a In mtl_mxm, don't disconnect from a proc with refcount > 1.
This will keep the connection until mxm endpoint is destroyed.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30966.
2014-03-09 08:35:44 +00:00
Nathan Hjelm
579a2d10cc bcol/ptpcoll: initialize all pointers in the module to NULL to avoid possible
problems when the module is being destructed.

References #4331

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30964.
2014-03-07 21:16:20 +00:00
Nathan Hjelm
da2a68f669 coll/ml: fix bcast buffer size calculation
cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30963.
2014-03-07 21:00:08 +00:00
Nathan Hjelm
0af741810c coll/ml: do not access group proc pointers directly. use ompi_comm_peer_lookup instead.
Resolves an issue seen with --enable-sparse-groups.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30945.
2014-03-05 22:57:21 +00:00
Jeff Squyres
554303af08 Two minor usnic fixes
* clang warning stomp
* memory barrier for volatile variable use

These can go to 1.7.5 or can slip to v1.8 -- RM decision.

Submitted by Jeff Squyres, reviewed by Dave Goodell

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30944.
2014-03-05 21:21:53 +00:00
Jeff Squyres
fbde50e7cd Fix ibv_port_query() usnic extension usage
* Older versions of libusnic_verbs actually return 0 when querying for
  an unknown port.  So also check for a magic ID in the returned data
  to *really* know if the usnic extensions are supported.
* Use a union (in the common_verbs area) and memcpy (in the btl) to
  avoid undefined C type aliasing behavior.
* Ensure to memset the function table to 0 if the usnic extensions
  are not supported.

Submitted by Jeff Squyres, reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30935.
2014-03-04 22:11:47 +00:00
Ralph Castain
0461e54875 Add missing headers to tarball
This commit was SVN r30932.
2014-03-04 16:59:15 +00:00
Nathan Hjelm
5a4037df4f osc/rdma: fix typo in rdma osc component.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30931.
2014-03-04 16:57:56 +00:00
Nathan Hjelm
cb6670b340 bcol/basesmuma: use framework output for error message and fix a rounding error.
cmr=v1.7.5:reviewer=pasha

This commit was SVN r30929.
2014-03-04 16:55:57 +00:00
Adrian Reber
e5bef82ee1 OPAL_ENABLE_FT_CR: remove compiler warnings
When compiling --with-ft there are a few compiler warnings about
unused variables. This patch fixes those compiler warnings.

This commit was SVN r30927.
2014-03-04 15:28:07 +00:00
Rolf vandeVaart
c2ae29d860 Adjust priorities in smcuda BTL so it is used when CUDA-aware compiled in.
cmr=v1.7.5:reviewer=hjelmn

This commit was SVN r30925.
2014-03-04 14:44:44 +00:00
Vasily Filipov
5acb96d11b BTL/OPENIB: always define BTL_OPENIB_RDMACM_IB_ADDR to 0 or 1.
This commit was SVN r30923.
2014-03-04 08:54:17 +00:00
Jeff Squyres
6710c2ef3f usnic: remove unnecessary header union
Realistically, the usnic BTL doesn't need to know anything about the
underlying transport except for its header length (so that it knows
where the payload begins in a received buffer).  So remove the use of
the specific transport prefix union and just rely on the usnic verbs
extension to tell us what the header length is if we're using the
usNIC/UDP transport, or sizeof(struct ibv_grh) if we're using usNIC/L2
transport.

This commit was SVN r30914.
2014-03-03 21:33:12 +00:00
Jeff Squyres
05af83d5d8 common_verbs: Remove usnic magic probe test
Check the IBV_TRANSPORT_* values.  In the case of IBV_TRANSPORT_IWARP,
there's an ambiguity and we need to also check to see whether the
usnic verbs extension probe exists.

This commit was SVN r30913.
2014-03-03 21:32:44 +00:00
Jeff Squyres
d61765cb2a usnic: use the new usnic verbs extensions
If they exist, call the usnic verbs extensions to both enable UDP
support and get the UD receiver header length that should be used
(rather than assume 40/struct GRH).

This commit was SVN r30912.
2014-03-03 21:31:42 +00:00
Nathan Hjelm
9e92c5be53 osc/sm: check for pthread_condattr_setpshared and pthread_mutexattr_setpshared. fall
back on barrier if either function doesn't exist.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30911.
2014-03-03 17:09:09 +00:00
Nathan Hjelm
dc3d4ffbf3 osc/sm: do not use gcc specific calls
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30910.
2014-03-03 16:47:29 +00:00
Mike Dubman
05ee929832 OMPI-MXM: handle multiple calls to add_procs() in MXM
- now add_procs can be called more than once (during MPI_INIT and Inter_Comm_Create)
- adjust MXM to this reality

fixed by Alina, reviewed by Yossi/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30907.
2014-03-03 13:50:37 +00:00
Vasily Filipov
f36d50d494 OPENIB BTL/CONNECT: warning fixes caused by r30875.
This commit was SVN r30905.

The following SVN revision numbers were found above:
  r30875 --> open-mpi/ompi@f2014b96e7
2014-03-03 06:41:46 +00:00
Jeff Squyres
1bda304a35 Mark the ompi_rte_abort() function as "no return"
This allows compilers to know that the code path(s) where
ompi_rte_abort() is invoked won't return (and therefore won't warn in
certain cases).

cmr=v1.8:reviewer=rhc

This commit was SVN r30891.
2014-02-28 17:45:36 +00:00
Jeff Squyres
f8dbba78a7 Send the BTL-passed message to ompi_rte_abort.
cmr=v1.8:reviewer=rolfv

This commit was SVN r30889.
2014-02-28 16:20:54 +00:00
Tom Naughton
8793560bde + fix abstraction violation (ORTE_process_info => OMPI_process_info)
This commit was SVN r30883.
2014-02-27 23:59:46 +00:00
Jeff Squyres
e819b5a34a Remove the vendor_ids parsing.
We don't use this functionality any more; we use the transport_type
and device name to identify usnic devices.  It's slightly easier
because we can get transport_type+name from ibv_device_open() and don't
have to do an additional ibv_query_device() to get its attributes.

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30882.
2014-02-27 21:47:01 +00:00
Jeff Squyres
3cbdf33b88 This is what r30852 should have been: Consolidate into a single, outer loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30879.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442
  r30852 --> open-mpi/ompi@4e282a3295

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-27 17:19:50 +00:00
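The single outer loop described in 3cbdf33b88 can be sketched as follows. This is a simulation, not usnic BTL code: `FakeDriver`, `create_all_ahs`, and the addresses are hypothetical, and the fake driver simply needs a fixed number of polls per address before its simulated ARP resolution completes. Because every pending endpoint is polled once per round, the resolutions progress side by side instead of one after another.

```python
import errno


class FakeDriver:
    """Hypothetical stand-in for the driver's non-blocking create_ah call."""

    def __init__(self, polls_needed):
        self.remaining = dict(polls_needed)
        self.errno = 0

    def create_ah(self, addr):
        if self.remaining[addr] > 0:
            self.remaining[addr] -= 1
            self.errno = errno.EAGAIN  # resolution still in progress
            return None
        self.errno = 0
        return "ah:" + addr


def create_all_ahs(driver, addrs, max_rounds=1000):
    """Poll all pending endpoints each round until every handle exists."""
    handles, pending, rounds = {}, list(addrs), 0
    while pending:
        rounds += 1
        if rounds > max_rounds:
            raise TimeoutError("address handles still pending")
        still_pending = []
        for addr in pending:
            ah = driver.create_ah(addr)
            if ah is not None:
                handles[addr] = ah
            elif driver.errno == errno.EAGAIN:
                still_pending.append(addr)  # try again in the next round
            else:
                raise OSError(driver.errno, "create_ah failed for " + addr)
        pending = still_pending
    return handles, rounds


addrs = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
driver = FakeDriver({"10.0.0.1": 2, "10.0.0.2": 0, "10.0.0.3": 3})
handles, rounds = create_all_ahs(driver, addrs)
```

In this simulation the loop finishes as soon as the slowest endpoint resolves, rather than after the sum of all individual waits.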
Jeff Squyres
45810f0efb Revert r30852: the wrong version of this patch got committed to SVN.
This commit was SVN r30878.

The following SVN revision numbers were found above:
  r30852 --> open-mpi/ompi@4e282a3295
2014-02-27 15:02:15 +00:00
Vasily Filipov
d702307521 OPENIB BTL/CONNECT: replace wrong rdma_freeaddrinfo call in rdmacm_component_query func.
This commit was SVN r30876.
2014-02-27 11:52:10 +00:00
Vasily Filipov
f2014b96e7 OPENIB BTL/CONNECT: Add support for AF_IB addressing in rdmacm.
This commit was SVN r30875.
2014-02-27 11:29:47 +00:00
Ralph Castain
ce26b096b4 Prevent failover to direct_modex if key isn't found unless direct_modex was enabled
Refs trac:4258

This commit was SVN r30865.

The following Trac tickets were found above:
  Ticket 4258 --> https://svn.open-mpi.org/trac/ompi/ticket/4258
2014-02-27 02:04:56 +00:00
Jeff Squyres
7440f21b75 Add usnic connectivity-checking agent service.
Basically: since usnic is a connectionless transport, we do not get
OS-provided services "for free" that connection-oriented transports
get, namely: "hey, I wasn't able to make a connection to peer X", and
"hey, your connection to peer X has died."
    
This connectivity-checker runs in a separate progress thread in the
usnic BTL in local rank 0 on each server.  Upon first send in any
process, the connectivity-checker agent will send some UDP pings to the
peer to ensure that we can reach it.  If we can't, we'll abort the job
with a nice show_help message.
    
A lengthy comment in btl_usnic_connectivity.h explains the
scheme and how it works.

Reviewed by Dave Goodell.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30860.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 22:21:25 +00:00
Nathan Hjelm
dfe4a504e4 udcm: fix race between ack arrival and message send and potential hang in udcm
finalize.

Closes trac:4290

cmr=v1.7.5:reviewer=miked

This commit was SVN r30854.

The following Trac tickets were found above:
  Ticket 4290 --> https://svn.open-mpi.org/trac/ompi/ticket/4290
2014-02-26 15:33:27 +00:00
Nathan Hjelm
30b61a3333 Fix a number of issues in the new one sided code.
 - Fix several typos in osc/rdma.

 - Fix a locking issue in osc/sm that was caused by an incorrect
   assumption about the semantics of opal_atomic_add_32.

 - Always unlock the accumulation lock in osc/sm.

 - The base of a process's shared memory window should be NULL if
   the size is zero. Fixed.

cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30853.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-26 15:33:18 +00:00
Jeff Squyres
4e282a3295 Consolidate into a single, outer loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30852.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 11:02:12 +00:00
Dave Goodell
3641500442 usnic: Loop on the ibv_create_ah call.
ibv_create_ah() may need to effect an ARP resolution, which may take
some time.  Rather than blocking in ibv_create_ah(), the usNIC driver
may return NULL and set errno to EAGAIN indicating that we should try
again (i.e., the ARP resolution is proceeding under the covers).

So add a simple loop here to loop over ibv_create_ah() until it
returns non-(NULL+EAGAIN).  A future commit will make this a bit more
efficient.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30850.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:38 +00:00
Dave Goodell
c40f8879c8 usnic: improve interface matching (esp. for UDP)
Prior to this commit we matched local interfaces to remote interfaces in
order to create endpoints in a simplistic way.  If any remote interfaces
were on the same subnet as any of our local interfaces then only local
interfaces would be paired (IP-routed remote interfaces would be
ignored).

This commit introduces a more general scheme which attempts to make the
"best" pairing of local interfaces to remote interfaces.  We now cast
the problem as a graph theory problem known as the "Assignment Problem",
or finding a maximum-cardinality, minimum-weight bipartite matching.  We
solve this problem by reducing the bipartite graph of interface
connectivity to a flow network and then solving for a minimum cost flow.
This is then easily converted back into a matching on the original
bipartite graph.

In the new scheme, interfaces on the same subnet are preferred over
interfaces requiring intermediate routing hops and higher bandwidth
links are preferred over lower bandwidth links.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30849.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:26 +00:00
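The reduction described in c40f8879c8 (bipartite graph to flow network to minimum cost flow, then back to a matching) can be sketched as below. This is a minimal illustration using successive shortest paths with Bellman-Ford, which is one standard way to solve a minimum cost flow. The commit does not say which solver the usnic BTL uses, and the function name, node numbering, and edge weights here are hypothetical.

```python
def solve_bipartite_assignment(n_left, n_right, edges):
    """Maximum-cardinality, minimum-weight matching via minimum cost flow.

    edges: list of (left, right, cost). Returns (matching, total_cost).
    """
    source, sink = n_left + n_right, n_left + n_right + 1
    n = sink + 1
    graph = [[] for _ in range(n)]  # arc = [to, capacity, cost, reverse_index]

    def add_arc(u, v, cost):
        graph[u].append([v, 1, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for left in range(n_left):
        add_arc(source, left, 0)
    for right in range(n_right):
        add_arc(n_left + right, sink, 0)
    for left, right, cost in edges:
        add_arc(left, n_left + right, cost)

    total = 0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual network.
        dist = [float("inf")] * n
        prev = [None] * n
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for i, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v], changed = dist[u] + cost, (u, i), True
            if not changed:
                break
        if prev[sink] is None:
            break  # no augmenting path left: the flow is maximum
        v = sink
        while v != source:  # push one unit of flow along the path
            u, i = prev[v]
            graph[u][i][1] -= 1
            graph[v][graph[u][i][3]][1] += 1
            v = u
        total += dist[sink]

    # A saturated left->right arc means that pair was selected.
    matching = sorted((left, arc[0] - n_left)
                      for left in range(n_left)
                      for arc in graph[left]
                      if arc[0] != source and arc[1] == 0)
    return matching, total


# Local interface 1 can only reach remote 0, so a greedy "cheapest first"
# choice of (0, 0) would leave it unpaired. Weights are made up.
matching, cost = solve_bipartite_assignment(
    2, 2, [(0, 0, 1), (0, 1, 2), (1, 0, 1)])
```

In the scheme the commit describes, the edge weights would encode the stated preferences, for example a lower cost for same-subnet pairs and for higher-bandwidth links.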
Dave Goodell
47148ab3cb usnic: helper routines for rtnetlink route lookups
Querying the OS routing table is important for making decisions about
which local and remote interfaces should be paired into reliable
communication channels.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30848.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:10 +00:00
Dave Goodell
db14c706ce usnic: add graph utility code
This code is intended to support usNIC interface matching functionality.
We currently view that problem as essentially the "Assignment Problem"
(http://en.wikipedia.org/wiki/Assignment_problem), for which there are
many possible solution approaches, including flow-network analysis.  In
the future, we might transition to a more nuanced view of the problem
which would likely also be flow-network based.

To this end, the current code focuses on providing one major algorithm
to the core usnic BTL: `ompi_btl_usnic_solve_bipartite_assignment`.  It
also exposes several typical and necessary functions for constructing,
manipulating, and querying weighted, directed graphs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30847.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:49:54 +00:00
Dave Goodell
5bf969e63b usnic: unit test parse_ifex_str
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30846.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:48:05 +00:00
Dave Goodell
921a29e41f usnic: add simple unit testing infrastructure
This commit adds mechanisms for writing and running unit tests in the
usnic BTL.  The short version of how to run the tests is:

1. Configure with `--enable-ompi-btl-usnic-unit-tests`.  This will cause
   the unit testing code and test runner utility to be built.

2. Run the tests by running `ompi_btl_usnic_run_tests`.

See `README.test` for full details.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30845.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:50 +00:00
Dave Goodell
044a190cac usnic: consolidate orte includes into compat.h
These includes only exist in the Cisco-internal usnic-v1.6 code base,
but they should not exist anywhere except btl_usnic_compat.h in order to
minimize source differences between usnic-v1.6 and v1.7/trunk.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30844.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:33 +00:00
Dave Goodell
62dc42f628 usnic: check packet/segment lengths
Lower layer (hardware or software) bugs can result in a mismatch between
our BTL-layer payload size and the actual packet length.  We now check
that in order to catch these cases, which otherwise can result in
MPI-layer message corruption.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30843.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:19 +00:00
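The kind of check added in 62dc42f628 can be sketched as follows. This is an illustration under stated assumptions, not the usnic wire format: the one-field header (`HEADER`, a 16-bit payload length) and `parse_segment` are hypothetical. The point is that the length the header claims is compared with the number of bytes that actually arrived before the payload is handed upward.

```python
import struct

# Hypothetical header: a single big-endian 16-bit payload length.
HEADER = struct.Struct("!H")


def parse_segment(packet):
    """Return the payload, or raise if the lengths disagree."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than header")
    (claimed,) = HEADER.unpack_from(packet)
    actual = len(packet) - HEADER.size
    if claimed != actual:
        raise ValueError(
            f"length mismatch: header says {claimed}, packet carries {actual}")
    return packet[HEADER.size:]


good = HEADER.pack(5) + b"hello"
truncated = HEADER.pack(5) + b"hel"
payload = parse_segment(good)
```

Rejecting `truncated` at this layer keeps a lower-layer bug from turning into silent corruption of an MPI message.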
Dave Goodell
3b5b87c325 usnic: add missing MSGDEBUG in recv path
We were missing a debug message for a very common recv case, making it a
bit harder to follow a debug log.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30842.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:05 +00:00
Dave Goodell
f6036d11c8 usnic: fix sender hash comparisons for UDP
There was a duplicated subnet check in the sender hash lookup routine.
This caused receivers to always fail the sender hash lookup if the
sender was in a different subnet, so the receiver would discard the
packet as though it were coming from a different job.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30841.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:46:50 +00:00
Dave Goodell
90d68730f1 usnic: fix SEGV when ibv_create_ah fails
If ibv_create_ah fails, we will not initialize the `endpoint->proc`.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30840.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:46:37 +00:00
Dave Goodell
a54f53f242 usnic: also match interfaces in different subnets
This functionality is required for routable UDP/IP usnic traffic.

Previously we would only setup endpoints for remote interfaces on the
same subnet as the current module's local interface.  This behavior
still holds if two processes share any common subnets.  However, if the
two processes have no subnets in common then we assume that all
interfaces are reachable from all other interfaces and wire them up in a
1-1, randomly-matched order somewhat similarly to the "tcp" BTL's
behavior.

Only match in different subnets if we detect UDP support in the lower
layer.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30839.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:49 +00:00
Dave Goodell
4875f48eaa usnic: enable UDP support
This commit decouples OMPI deployment from the version(s) of the lower
layers of the stack by probing for UDP support.

Verbs applications assume a 40-byte header (there is no current
mechanism for querying payload offset).  So to support a 42-byte UDP
header without causing existing applications like ibv_ud_pingpong or
older versions of OMPI to crash, we must inform libusnic_verbs that we
are aware of the nonstandard payload offset.  We do this by overriding
the `transport_type` field of the device to be 42 before calling
`ibv_open_device`.  If the library resets it to something else, then we
know the lower layers are UDP capable.  Otherwise we use the older
custom-L2 format.

This necessitated some minor ugliness in common_verbs, but it's as tidy
as Jeff and I know how to make it right now.

This commit only adds support for UDP headers and connectivity over the
same L2 network, it does not touch routing or interface pairing.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30838.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:35 +00:00
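The probe described in 4875f48eaa works like a handshake through a sentinel value. The sketch below models only that idea: `FakeDevice`, the two `open_device_*` functions, and `probe_udp_support` are hypothetical stand-ins, and restoring the saved field on the non-UDP path is an assumption of this sketch rather than something the commit states. Only the value 42 comes from the commit message.

```python
PROBE_MAGIC = 42  # sentinel written into transport_type, per the commit


class FakeDevice:
    """Hypothetical stand-in for a verbs device structure."""

    def __init__(self, transport_type=0):
        self.transport_type = transport_type


def open_device_udp_capable(device):
    """Hypothetical newer library: sees the sentinel and resets the field."""
    if device.transport_type == PROBE_MAGIC:
        device.transport_type = 0
    return "context"


def open_device_legacy(device):
    """Hypothetical older library: never touches the field."""
    return "context"


def probe_udp_support(device, open_device):
    """Return True if the library acknowledged the sentinel."""
    saved = device.transport_type
    device.transport_type = PROBE_MAGIC
    open_device(device)
    udp_capable = device.transport_type != PROBE_MAGIC
    if not udp_capable:
        device.transport_type = saved  # assumption: undo the override
    return udp_capable


new_stack = probe_udp_support(FakeDevice(), open_device_udp_capable)
old_stack = probe_udp_support(FakeDevice(), open_device_legacy)
```

A library that ignores the sentinel keeps the 40-byte assumption that existing verbs applications rely on, so nothing changes for them.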
Dave Goodell
e10ad5763f usnic: rearrange component struct field order
Just trying to be deliberate about keeping fastpath-accessed fields
grouped together to fit into the same 64-byte cache line.
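
One way to check such a layout is to compare field offsets against the 64-byte line size. The sketch below uses Python's `ctypes` for illustration; the `Component` structure and its field names are hypothetical and are not the real usnic component struct. It also assumes the structure itself starts on a cache-line boundary.

```python
import ctypes

CACHE_LINE = 64  # bytes, as stated in the commit message


class Component(ctypes.Structure):
    """Hypothetical layout: fast-path fields first, setup-only fields after."""

    _fields_ = [
        ("num_modules", ctypes.c_int32),       # fast path
        ("max_tiny_payload", ctypes.c_int32),  # fast path
        ("ticks", ctypes.c_uint64),            # fast path
        ("if_include", ctypes.c_char * 256),   # setup only
        ("verbose", ctypes.c_int32),           # setup only
    ]


def cache_line_of(struct_type, field_name):
    """Index of the cache line holding the first byte of a field."""
    return getattr(struct_type, field_name).offset // CACHE_LINE


def share_cache_line(struct_type, field_names):
    """True if every named field starts in the same cache line."""
    return len({cache_line_of(struct_type, n) for n in field_names}) == 1
```

A check such as `share_cache_line(Component, ["num_modules", "ticks"])` could run in a unit test so that a later field reordering does not silently split the hot fields.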

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30837.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:53 +00:00
Dave Goodell
5d7eabbcd1 usnic: Change tiny_mtu to a size_t (it's compared against an unsigned value)
Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30836.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:37 +00:00
Dave Goodell
fef38d7e42 usnic: Fix a few compiler warnings about types of printed variables
Authored-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30835.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:23 +00:00
Dave Goodell
cadaa1c424 usnic: Shrink sequence numbers to 16 bits
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30834.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:10 +00:00
Dave Goodell
707e594d13 usnic: Use INLINE flag more often, saving the DMA is useful.
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30833.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:53 +00:00
Dave Goodell
dbbe6a8254 usnic: fix proc structure memory leak
Valgrind showed this one, just a bit of sloppiness with the reference
counting.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30832.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:34 +00:00
Dave Goodell
4af332bd4e Fix the logic in ompi_common_verbs_find_ports().
The logic did not correctly perform the OR behavior that is described
in the doxy docs for this function.  This commit fixes the logic so
that a port will be included if it supports any of the
capabilities indicated by the passed-in flags.
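
The OR behavior can be illustrated with a small bitmask filter. This is only a sketch: the capability names and `find_ports` below are hypothetical and do not mirror the signature of `ompi_common_verbs_find_ports()`.

```python
# Hypothetical capability bits; the real flag names are not given in this log.
CAP_RC = 0x1
CAP_UD = 0x2
CAP_USNIC = 0x4


def port_matches(port_caps, wanted_flags):
    """OR semantics: keep the port if it has at least one wanted capability."""
    return (port_caps & wanted_flags) != 0


def find_ports(ports, wanted_flags):
    """Filter a {name: capability_bits} mapping with OR semantics."""
    return [name for name, caps in ports.items() if port_matches(caps, wanted_flags)]
```

An AND-style test would instead be `(port_caps & wanted_flags) == wanted_flags`, which rejects a port that has only some of the requested capabilities.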

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30831.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:21 +00:00
Nathan Hjelm
acbd6032f9 Helps to include the correct header.
cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30821.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-25 19:14:48 +00:00
Nathan Hjelm
5edacac301 osc/rdma: add missing include
cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30820.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-25 19:11:19 +00:00
Ralph Castain
49d938de29 Merge one-sided updates to the trunk - written by Brian Barrett and Nathan Hjelm
cmr=v1.7.5:reviewer=hjelmn:subject=Update one-sided to MPI-3

This commit was SVN r30816.
2014-02-25 17:36:43 +00:00
Joshua Ladd
9ea9bec4ad Addressing Jeff's comments:
1. Changed rng_buff_t --> opal_rng_buff_t
2. All global variables obey the prefix rule
3. Old code has been removed 
4. Found a couple of unnecessary includes

Refs trac:4298

This commit was SVN r30807.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 23:18:35 +00:00
Jeff Squyres
d07d1864ae Revert r30804.
We're going to be bringing a bunch of usnic code to the SVN trunk
soon, and I basically brought this commit over out of order.  So I'm
reverting it for now; the same functionality will come back shortly.

This commit was SVN r30805.

The following SVN revision numbers were found above:
  r30804 --> open-mpi/ompi@5bedcc15bf
2014-02-24 19:12:49 +00:00
Jeff Squyres
5bedcc15bf Support the IBV_*_USNIC_* verbs constants.
These constants are now upstream (see
https://git.kernel.org/cgit/libs/infiniband/libibverbs.git/commit/?id=f57a9c67eabb9e7f19c624ac3c8c27b7be55796c),
so let's support them properly in Open MPI.

Added bonus: consolidating these checks up in
ompi_check_openfabrics.m4 allowed removing some custom checks and
AC_DEFINE's from the usnic configure.m4 script.

Also change the usnic/configure.m4 check for IBV_EVENT_GID_CHANGE to
use AC_CHECK_DECLS (vs. AC_CHECK_DECL).

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30804.
2014-02-24 18:57:04 +00:00
Jeff Squyres
1b855eca8e A few fixes after r30801:
* Use the prefix rule for global variables
* Eliminate seed_prng() since it isn't necessary any more

These files will need to get edited again when the RNG type obeys the
prefix rule.

Refs trac:4298

This commit was SVN r30803.

The following SVN revision numbers were found above:
  r30801 --> open-mpi/ompi@e39d9f4080

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 17:47:52 +00:00
Joshua Ladd
e39d9f4080 Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. To use it, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example of how to seed with opal_srand and invoke the generator with opal_rand. This should be added to
cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG
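
For readers unfamiliar with the technique, an additive lagged Fibonacci generator computes x[n] = (x[n-j] + x[n-k]) mod 2^m over a table of the last k values. The sketch below is a single-stream Python illustration, not the OPAL implementation: the lags (24, 55), the 32-bit modulus, the LCG-based seeding, and the class name are all assumptions, and it does not model the parallel aspect or the opal_srand/opal_rand interface.

```python
class LaggedFibonacci:
    """Additive lagged Fibonacci: x[n] = (x[n-j] + x[n-k]) mod 2**32."""

    SHORT_LAG = 24  # assumed lags (a classic pair), not taken from OPAL
    LONG_LAG = 55
    MASK = 0xFFFFFFFF

    def __init__(self, seed):
        # Fill the lag table with a simple LCG; this seeding scheme is an assumption.
        state = seed & self.MASK
        self.table = []
        for _ in range(self.LONG_LAG):
            state = (1664525 * state + 1013904223) & self.MASK
            self.table.append(state)
        self.table[0] |= 1  # at least one odd entry is needed for a long period
        self.pos = 0        # index of the oldest value, x[n-k]

    def next(self):
        k = self.LONG_LAG
        j = self.SHORT_LAG
        # table[pos] holds x[n-k]; x[n-j] sits (k - j) slots ahead of it.
        value = (self.table[self.pos] + self.table[(self.pos + k - j) % k]) & self.MASK
        self.table[self.pos] = value  # overwrite the oldest entry with x[n]
        self.pos = (self.pos + 1) % k
        return value
```

Two generators built from the same seed produce the same sequence, which is the property a caller relies on when seeding deterministically.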

This commit was SVN r30801.
2014-02-23 21:41:38 +00:00
Nathan Hjelm
bd275e642e btl/ugni: fix typo
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30795.

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-02-21 17:46:22 +00:00
Ralph Castain
29a7eda280 Remove executable property
This commit was SVN r30791.
2014-02-21 17:27:47 +00:00
Manjunath Gorentla Venkata
38e5a753dd basesmuma bcol: fixing warnings
This commit was SVN r30784.
2014-02-20 18:30:53 +00:00
Mike Dubman
49ee63f4b8 MXM: do not enforce version check
- MXM uses the libtool versioning scheme, which is enough; no additional check is needed in OMPI

reviewed by yossi

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30768.
2014-02-18 19:44:37 +00:00
Rolf vandeVaart
d4f12148c4 Fix several issues reported in ticket #4245.
This commit was SVN r30767.
2014-02-18 17:44:08 +00:00
Jeff Squyres
a80a24029d Rename poorly-named global: usnic_ticks -> ompi_btl_usnic_ticks
cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30752.
2014-02-17 21:37:13 +00:00
Jeff Squyres
bb4ba6511d Remove an unused RML tag (it isn't even used in the oshmem layer).
This commit was SVN r30749.
2014-02-17 18:35:43 +00:00
Ralph Castain
c3df744a3b Shift the orte_db_localrank key to the opal level. Add the job and proc-level session directory names to the database using opal_db keys.
This commit was SVN r30746.
2014-02-17 01:40:56 +00:00
Ralph Castain
445c9f3384 Ensure we only post one receive for direct modex replies, and that we properly handle thread-transfer issues between the ORTE callback and the MPI layer. Account for potential threaded operations at the MPI level.
Refs trac:4258

This commit was SVN r30730.

The following Trac tickets were found above:
  Ticket 4258 --> https://svn.open-mpi.org/trac/ompi/ticket/4258
2014-02-14 20:37:17 +00:00
Ralph Castain
bdff767dce Ick - wonder how this ever built static? There is no "select" function anywhere in the system.
cmr=v1.7.5:reviewer=jsquyres:subject=remove bad function declaration

This commit was SVN r30729.
2014-02-14 20:34:21 +00:00
Mike Dubman
608269ed72 fca: support relocation of fca packages to opal_prefix/../fca
reviewed by AlexM
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30728.
2014-02-14 14:49:41 +00:00
Ralph Castain
3e12466f60 Ouch - fix bad race condition in direct modex
cmr=v1.7.5:reviewer=hjelmn:subject=fix bad race condition in direct modex

This commit was SVN r30691.
2014-02-11 23:21:27 +00:00
Dave Goodell
72c0b89e8f usnic: handle missing ibv_event_type_str
Some older versions of libibverbs do not have `ibv_event_type_str`,
leading to compilation failures on older machines, irrespective of
whether they could ever support usNIC anyway.  If we encounter any other
build issues related to "old verbs" then we should just cause the usnic
BTL to disqualify itself when it encounters "old" traits.

Thanks to Paul Hargrove for reporting the issue:
http://www.open-mpi.org/community/lists/devel/2014/02/14056.php

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30674.
2014-02-11 19:18:29 +00:00
Nathan Hjelm
6194bb502a vader: attempt to work around SGI UV issues by creating a segment that
only goes up to VADER_MAX_ADDRESS instead of 0xfffffffffffffffful.

cmr=v1.7.5:ticket=trac:4216

This commit was SVN r30669.

The following Trac tickets were found above:
  Ticket 4216 --> https://svn.open-mpi.org/trac/ompi/ticket/4216
2014-02-11 16:28:25 +00:00
Nathan Hjelm
f2f6a7fe81 vader: don't finalize an endpoint that is already finalized
Fixes trac:4252

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30668.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
  Ticket 4252 --> https://svn.open-mpi.org/trac/ompi/ticket/4252
2014-02-11 16:15:29 +00:00
Nathan Hjelm
f45364746e vader: fix typos in r30626
cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30652.

The following SVN revision numbers were found above:
  r30626 --> open-mpi/ompi@a8867a9ca4

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
2014-02-10 16:15:43 +00:00
Nathan Hjelm
6dd29a05f1 basesmuma: Fix typos in r30627
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30651.

The following SVN revision numbers were found above:
  r30627 --> open-mpi/ompi@98ad6b3d1e

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-10 16:15:37 +00:00
Alex Margolin
636493393c OPENIB: Fixed error from writing to an uninitialized pipe.
The error was caused by leaving the pipe to the async thread uninitialized, then writing to it regardless of this.
The fix is to check the existence of the async thread and the pipe to it.

reviewed by miked

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30644.
2014-02-09 14:07:14 +00:00
Nathan Hjelm
98ad6b3d1e bcol/basesmuma: fix initialization on 32-bit platforms
The initialization code did several allgathers on void *'s using
MPI_LONG_LONG_INT. This will produce the wrong result on 32-bit
platforms. Instead use MPI_BYTE with count = sizeof (void *).
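
The underlying issue is that MPI_LONG_LONG_INT elements are 8 bytes while a `void *` is 4 bytes on a 32-bit platform, so the element size no longer matches the data. The Python sketch below illustrates the byte-oriented approach without MPI: `fake_allgather` is a hypothetical stand-in that simply concatenates each rank's buffer, and `struct`'s `"P"` format supplies the native pointer size.

```python
import struct

POINTER_SIZE = struct.calcsize("P")  # sizeof(void *) on the running platform


def pack_pointer(address):
    """Serialize one address as exactly POINTER_SIZE raw bytes."""
    return struct.pack("P", address)


def fake_allgather(contributions):
    """Stand-in for an allgather: concatenate every rank's byte buffer."""
    return b"".join(contributions)


def unpack_pointers(buffer):
    """Split a gathered buffer back into one address per rank."""
    count = len(buffer) // POINTER_SIZE
    return list(struct.unpack(f"{count}P", buffer))
```

Because every rank contributes `POINTER_SIZE` bytes, the same code gathers 4-byte or 8-byte addresses correctly without hard-coding either width.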

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30627.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-08 00:00:30 +00:00
Nathan Hjelm
a8867a9ca4 btl/vader: fix 32-bit support
cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30626.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
2014-02-07 23:57:36 +00:00
Nathan Hjelm
77869c3232 bcol/basesmuma: fix several bugs in the basesmuma code
Found two bugs in basesmuma:

 - Release all resources when tearing down the bcol module.

 - Always call the allreduce in the smcm code. We do not know
   beforehand whether all procs have all the files mapped.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30623.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-07 21:39:24 +00:00
Pavel Shamis
3a683419c5 Fixing broken dependency between ML/BCOLS
This is a hot-fix patch for the issue reported by Ralph.
In the future we plan to restructure the ml data structure layout.

Tested by Nathan.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30619.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-07 19:15:45 +00:00
Ralph Castain
74d3393a4f Revert r30600, r30602-30604 as the first one broke the tarball and the others couldn't fix it
This commit was SVN r30605.

The following SVN revision numbers were found above:
  r30600 --> open-mpi/ompi@7d2c4cb468
  r30602 --> open-mpi/ompi@9e751a0302
  r30604 --> open-mpi/ompi@3012c280cf

Revision number ranges (suitable for "git log"):
  r30602-30604 --> open-mpi/ompi@9e751a03^..3012c280
2014-02-07 04:38:06 +00:00
Ralph Castain
3012c280cf I surrender - this code is just too interbred with other components for me to clean up, so turn it off for now
This commit was SVN r30604.
2014-02-07 04:16:21 +00:00
Ralph Castain
3954311bac We have rules about not cross-integrating components, even across frameworks - please follow them.
This commit was SVN r30603.
2014-02-07 03:46:45 +00:00
Ralph Castain
9e751a0302 You absolutely, positively *cannot* include a header file from a component in the base functions!
This commit was SVN r30602.
2014-02-07 03:27:06 +00:00
Nathan Hjelm
a06e491c2c ob1: large buffered sends were broken by the ob1 optimizations. fix them
The problem was caused by the static request optimization. The buffered send case
is much like the isend case in that the request structure may be needed after
MPI_Bsend completes. Fix this case by calling isend and freeing the resulting
request.

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30601.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
2014-02-07 00:12:36 +00:00
Jeff Squyres
7d2c4cb468 There's a few ml-related bugs outstanding, and Nathan is looking into
them, but it's going to take a little time (at least one day).  So
Nathan says it's ok to .ompi_ignore coll ml until he's able to fix it.

This commit was SVN r30600.
2014-02-06 23:51:03 +00:00
Nathan Hjelm
3902cf66f1 ob1: OBJ_CONSTRUCT the convertor in the send_inline optimization.
This change does not appear to increase the small message latency of ping-pong
benchmarks and fixes an issue found by our ibm datatype tests.

Fixes trac:4232

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30598.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
  Ticket 4232 --> https://svn.open-mpi.org/trac/ompi/ticket/4232
2014-02-06 21:27:42 +00:00
Nathan Hjelm
a41cb1f086 Remove duplicate definition of xpmem_apid_t
cmr=v1.7.5:ticket=trac:4216

This commit was SVN r30589.

The following Trac tickets were found above:
  Ticket 4216 --> https://svn.open-mpi.org/trac/ompi/ticket/4216
2014-02-06 20:38:20 +00:00
George Bosilca
6ee06b7fda No exit down into a BTL.
This commit was SVN r30566.
2014-02-05 15:04:01 +00:00
Ralph Castain
1326ed704f Per the RFC discussed here:
http://www.open-mpi.org/community/lists/devel/2014/01/13789.php

add support for async modex when requested.

cmr=v1.7.5:reviewer=jsquyres:subject=Add async modex support

This commit was SVN r30565.
2014-02-05 14:39:27 +00:00
Joshua Ladd
1dbd8688db This fixes a long-standing bug in the OpenIB BTL's MCA param initialization. Only caught if BTL_OPENIB_FAILOVER_ENABLED. Thanks to Jeff for spotting. This should be added to:
cmr=v1.7.4:reviewer=jsquyres
cmr=v1.6.6

This commit was SVN r30558.
2014-02-04 20:01:39 +00:00
Nathan Hjelm
12f0bf9488 basesmuma: missed a couple of MB references
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30538.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 18:19:53 +00:00
Nathan Hjelm
84320f3815 btl/vader: fix compilation with SGI xpmem and add some debugging to
component_init.

cmr=v1.7.5:ticker=#4053

This commit was SVN r30535.
2014-02-03 17:42:40 +00:00
Nathan Hjelm
64321acc22 basesmuma: do not call MB directly
opal does not always define MB. It is recommended that opal_atomic_[rw]mb is
called instead. We will need to address the cases where these functions are
no-ops on weak-memory ordered cpus.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30534.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 17:01:57 +00:00
Nathan Hjelm
c2b061cc84 basesmuma: clean up code
Several changes are contained in this commit:

 - Clean up tabs and trailing whitespaces

 - Use consistent indentation in changed files

 - Remove unused code. None of the removed code will ever have been
   used in a trunk build.

 - Clean up the smcm code quite a bit

 - Do not fflush stderr and use opal_output instead of fprintf.

These changes have been tested on Cray XE-6 and PSM systems.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30533.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 17:01:46 +00:00
Christoph Niethammer
4f23d8214c Fixed incorrect calculation of reallocated memory in mca_bml_r2_del_btl.
This commit was SVN r30529.
2014-02-03 08:43:59 +00:00
Nathan Hjelm
1ae39753dc bcol/basesmuma: check the return code of bcol_basesmuma_smcm_allgather_connection.
Fixes a segmentation fault found by the bogus intercomm_create test.

cmr=v1.7.4:review=manjugv

This commit was SVN r30527.
2014-01-31 22:20:25 +00:00
Adrian Reber
7de34ea201 SNAPC/CRCP/SSTORE: remove compiler warnings
This commit was SVN r30488.
2014-01-29 20:52:00 +00:00
Adrian Reber
fa1036f38c SSTORE/CRCP: use ORTE_WAIT_FOR_COMPLETION with non-blocking receives
During the commits to make the C/R code compile again the
blocking receive calls were replaced by non-blocking
which broke the code. This patch uses ORTE_WAIT_FOR_COMPLETION()
to wait until the non-blocking calls have finished.

This commit was SVN r30486.
2014-01-29 20:30:35 +00:00
Hadi Montakhabi
7bf4c425ff Fix: making sure the file type is not overwritten by the last queried component
This commit was SVN r30478.
2014-01-29 19:21:03 +00:00
Nathan Hjelm
afae924e29 coll/ml: fix some warnings and the spelling of indices
This commit fixes one warning that should have caused coll/ml to segfault
on reduce. The fix should be correct but we will continue to investigate.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30477.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-29 18:44:21 +00:00
Nathan Hjelm
700e97cf6a btl/vader: add support for SGI's implementation of xpmem and add support
for 32-bit architectures.

This commit also modifies _OMPI_CHECK_HEADER to use AC_CHECK_HEADERS instead
of AC_CHECK_HEADER. This allows components to check for multiple headers
instead of just one. The new semantics of the header check in OMPI_CHECK_PACKAGE
are to return success if at least one of the specified headers exists. The new
semantics will not break current usage.

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30476.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
2014-01-29 18:35:47 +00:00
Jeff Squyres
3fa9d36aba Per http://www.open-mpi.org/community/lists/devel/2014/01/13938.php,
Orion Poplawski noticed that we should not be installing mpio.h.

cmr=v1.7.4:reviewer=hjelmn:subject=do not install mpio.h

This commit was SVN r30465.
2014-01-28 21:46:26 +00:00
George Bosilca
bde9619386 Various minor cleanups.
This commit was SVN r30431.
2014-01-26 17:27:12 +00:00
George Bosilca
d265981c55 Don't always retain the proc, do it only for new procs. This enforces a strict policy in the BML: it has one and only one ref on each proc.
This commit was SVN r30429.
2014-01-26 17:26:04 +00:00
Ralph Castain
b32556e6dc Fixes trac:4143
After IM with Nathan, apply patch from ticket after verification by Paul Hargrove that it fixes the problem on non-x86 32-bit platforms

Verified by Paul, RM-approved

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30411.

The following Trac tickets were found above:
  Ticket 4143 --> https://svn.open-mpi.org/trac/ompi/ticket/4143
2014-01-24 17:56:52 +00:00
Nathan Hjelm
2435057a57 ignore the iboffload component for now.
This commit was SVN r30398.
2014-01-23 16:06:21 +00:00
Rolf vandeVaart
9f3bf4747d Provide option to have synchronous copy be asynchronous with a wait. For now,
this has to be selected at runtime.  Also fix up some error messages to have
node name in them.

This commit was SVN r30396.
2014-01-23 15:47:20 +00:00
Jeff Squyres
9fee7c2b4d According to a report from Adam Moody, there is a compile error with
ROMIO and Lustre 2.4.0.  It has been solved upstream already; here's
the ticket:

    http://trac.mpich.org/projects/mpich/ticket/1973

And here's the commit that fixed it:

    a0c4278f14

OMPI does not have the other code referred to in that git commit (in
ad_lustre_hints.c).

Thanks to Adam Moody for reporting the issue.

cmr=v1.7.4:reviewer=hjelmn:subject=Fix ROMIO compile error w/ Lustre 2.4

This commit was SVN r30393.
2014-01-23 14:15:35 +00:00
Christoph Niethammer
86776daf75 Fixed typo in opal output message.
This commit was SVN r30392.
2014-01-23 08:37:40 +00:00
Mike Dubman
071838bb0a HCOLL: call hcoll_finalize and hcoll progress unregister in case of hcoll module query failures
fixed by Elena, reviewed by Val/Miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30390.
2014-01-23 07:29:23 +00:00
Ralph Castain
06e6a06f3e Cleanup a couple of abstraction breaks found by Thomas Naughton
This commit was SVN r30371.
2014-01-22 21:36:24 +00:00
Hadi Montakhabi
8af6b8b4e4 add support for PLFS filesystem
This commit was SVN r30370.
2014-01-22 21:16:15 +00:00
Nathan Hjelm
7ba8bd81fa coll/ml: remove debug fprintfs
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30367.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-22 17:21:05 +00:00
Nathan Hjelm
82d996fb76 coll/ml: cleanup some merge related errors
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30366.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-22 16:48:09 +00:00
Nathan Hjelm
ff4c9c808a btl/ugni: fix leak in new sendi function.
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30365.

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-01-22 16:32:07 +00:00
Nathan Hjelm
66b69da394 Fix a bug in the ob1 optimizations that can cause a segfault.
btl sendi functions currently can not handle the descriptor being NULL. The
send inline optimization was assuming (incorrectly) that NULL was ok.

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30364.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
2014-01-22 16:31:58 +00:00
Nathan Hjelm
1a021b8f2d coll/ml: add support for blocking and non-blocking allreduce, reduce, and
allgather.

The new collectives provide a significant performance increase over tuned for
small and medium messages. We are initially setting the priority lower than
tuned until this has had some time to soak in the trunk. Please set
coll_ml_priority to 90 for MTT runs.

Credit for this work goes to Manjunath Gorentla Venkata (ORNL), Pavel Shamis (ORNL),
and Nathan Hjelm (LANL).

Commit details (for reference):

Import ORNL's collectives for MPI_Allreduce, MPI_Reduce, and MPI_Allgather.

We need to take the basesmuma header into account when calculating the
ptpcoll small message thresholds. Add a define to bcol.h indicating the
maximum header size so we can take the header into account while not
making ptpcoll dependent on information from basesmuma.

This resolves an issue with allreduce where ptpcoll overwrites the
header of the next buffer in the basesmuma bank.

Fix reduce and make a sequential collective launcher in coll_ml_inlines.h

The root calculation for reduce was wrong for any root != 0. There are
four possibilities for the root:

 - The root is not the current process but is in the current hierarchy. In
   this case the root is the index of the global root as specified in the
   root vector.

 - The root is not the current process and is not in the next level of the
   hierarchy. In this case 0 must be the local root since this process will
   never communicate with the real root.

 - The root is not the current process but will be in the next level of the
   hierarchy. In this case the current process must be the root.

 - I am the root. The root is my index.

Tested with IMB which rotates the root on every call to MPI_Reduce. Consider
IMB the reproducer for the issue this commit solves.

Make the bcast algorithm decision an enumerated variable

Resolve various assert failures when destructing coll ml requests.

Two issues:

 - Always reset the request to be invalid before returning it to the
   free list. This will avoid an assert in ompi_request_t's destructor.
   OMPI_REQUEST_FINI does this (and also releases the fortran handle
   index).

 - Never explicitly construct or destruct the superclass of an opal
   object. This screws up the class function tables and will cause
   either an assert failure or a segmentation fault when destructing
   coll ml requests.

Cleanup allgather.

I removed the duplicate non-blocking and blocking functions and modeled
the cleanup after what I found in allreduce. Also cleaned up the code
somewhat.

Don't bother copying from the send to the receive buffer in
bcol_basesmuma_allreduce_intra_fanin_fanout if the pointers are the
same.

This eliminates a warning about memcpy and aliasing and avoids an
unnecessary call to memcpy.

Always call CHECK_AND_RELEASE on memsync collectives.

There was a call to OBJ_RELEASE on the collective communicator but
because CHECK_AND_RECYCLE was never called there was no matching call
to OBJ_RELEASE. This caused coll ml to leak communicators.

Make allreduce use the sequential collective launcher in coll_ml_inlines.h

Just launch the next collective in the component progress.

I am a little unsure about this patch. There appears to be some sort
of race between collectives that causes buffer exhaustion in some cases
(IMB Allreduce is a reproducer). Changing progress to only launch the
next bcol seems to resolve the issue but might not be the best fix.

Note that I see little to no performance penalty for this change.

Fix allreduce when there are extra sources.

There was an issue with the buffer offset calculation when there are
extra sources. In the case of extra sources == 1 the offset was set
to buffer_size (just past the header of the next buffer). I adjusted
the buffer size to take into account the maximum header size (see the
earlier commit that added this) and simplified the offset calculation.

Make reduce/allreduce non-blocking. This is required for MPI_Comm_idup
to work correctly.

This has been tested with various layouts using the ibm testsuite and
imb and appears to have the same performance as the old blocking version.

Fix allgather for non-contiguous layouts and simplify parsing the
topology.

Some things in this patch:

 - There were several comments to the effect that level 0 of the
   hierarchy MUST contain all of the ranks. At least one function
   made this assumption but it was not true. I changed the sbgp
   components and the coll ml initialization code to enforce this
   requirement.

 - Ensure that hierarchy level 0 has the ranks in the correct
   scatter gather order. This removes the need for a separate
   sort list and fixes the offset calculation for allgather.

 - There were several passes over the hierarchy to determine
   properties of the hierarchy. I eliminated these extra passes
   and the memory allocation associated with them and calculate the
   tree properties on the fly. The same DFS recursion also handles
   the re-order of level 0.

All these changes have been verified with MPI_Allreduce, MPI_Reduce, and
MPI_Allgather. All functions now pass all IBM/Open MPI and IMB tests.

coll/ml: correct pointer usage for MPI_BOTTOM

Since contiguous datatypes are copied via memcpy (bypassing the convertor) we
need to adjust for the lb of the datatype. This corrects problems found testing
code that uses MPI_BOTTOM (NULL) as the send pointer.

Add fallback collectives for allreduce and reduce.

cmr=v1.7.5:reviewer=pasha

This commit was SVN r30363.
2014-01-22 15:39:19 +00:00
Nathan Hjelm
c9c335544e btl/ugni: fix a typo in r30353
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30354.

The following SVN revision numbers were found above:
  r30353 --> open-mpi/ompi@aa3fea55b2

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-01-21 21:02:28 +00:00
Nathan Hjelm
aa3fea55b2 btl/ugni: re-add a sendi function to exploit the new optimization in
ob1.

Also update LANL platform files to use the latest version of ugni.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30353.
2014-01-21 20:53:35 +00:00
Nathan Hjelm
2b57f4227e ob1: optimize blocking send and receive paths
Per RFC. There are two optimizations in this commit:

 - Allocate requests for blocking sends and receives on the stack. This
   bypasses the request free list and saves two atomics on the critical path.
   This change improves the small message ping-pong by 50-200ns on both AMD
   and Intel CPUs.

 - For small messages try to use the btl sendi function before initializing a
   send request. If the sendi fails or the btl does not have a sendi function
   silently fall back on the standard send path.
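
The second optimization follows a try-fast-path-then-fall-back pattern, sketched below in Python. This is an illustration only: the `Btl` class, `send_small`, the return strings, and the 64-byte `inline_limit` are hypothetical and do not reflect the ob1 or BTL interfaces.

```python
class Btl:
    """Hypothetical transport with an optional immediate-send hook."""

    def __init__(self, sendi=None):
        self.sendi = sendi  # callable(payload) -> bool, or None if unsupported
        self.queued = []    # payloads that took the standard path

    def standard_send(self, payload):
        # Stand-in for initializing and starting a full send request.
        self.queued.append(payload)
        return "standard"


def send_small(btl, payload, inline_limit=64):
    """Try the immediate path first, then fall back silently."""
    if len(payload) <= inline_limit and btl.sendi is not None:
        if btl.sendi(payload):
            return "inline"
    return btl.standard_send(payload)
```

The caller never sees whether the immediate send was attempted; a missing hook, a failed attempt, and an oversized message all end up on the same standard path.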

cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30343.
2014-01-21 15:16:21 +00:00
Mike Dubman
b8550a55a7 HCOLL: many fixes
- Adds the coll_hcoll_np MCA parameter, similar to that of the fca component (defaults to 32). Those who use hcoll should be aware that from now on communicators with fewer than 32 procs will run without hcoll by default.
- Resolves a fallback issue in case libhcoll runs out of allowed contexts. The solution is to move hcoll_context_create from comm_enable to comm_query. In short, comm_enable should never return OMPI_ERROR in the coll component with the highest priority (hcoll). Otherwise the ompi coll_base_select will unselect the coll function pointers and module references, leaving the communicator without a coll pointer, which causes the failure. The same behavior can be reproduced even with tuned if one hardcodes a "return OMPI_ERROR" into its module_enable function.
- Additionally, removed all the dead code under #if 0; removed unused variables (path for library, active_modules list) and classes (module list wrapper).

Fixed by Val, Reviewed by Devendar/Josh/Miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30341.
2014-01-21 12:19:47 +00:00
Ralph Castain
2cf4862b49 Cleanup warnings for use of void* - requires intermediate cast to uintptr_t. Thanks to Paul Hargrove for reporting it
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30333.
2014-01-20 15:44:45 +00:00
Edgar Gabriel
be5d5834c5 fix the problem identified by a user on the mailing list with MPI_MODE_EXCL
cmr=v1.7.4:reviewer=vvenkatesan:subject=fix a problem when opening a file with MODE_EXCL

This commit was SVN r30324.
2014-01-18 16:06:27 +00:00
Nathan Hjelm
c88626510c Fix a merge issue with new ROMIO and fix obvious ROMIO bug.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30319.
2014-01-18 00:29:16 +00:00
Hadi Montakhabi
8c14411289 f_cc_size is contiguous chunk size, not the stripe width. There is no stripe_width in the file handle structure.
This commit was SVN r30314.
2014-01-17 18:35:55 +00:00
Nathan Hjelm
f2a73fcdbd udreg: free huge page allocations correctly
This commit fixes an error path that occurs when huge page allocations are
enabled. In this case we allocate a huge page and try to register it but fail.
We then were calling free on the opal object. Fix this by calling the proper free
function.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30289.
2014-01-14 16:26:06 +00:00
Nathan Hjelm
f9d2032705 vader: ensure fast box data is aligned on 4-byte boundaries
This commit fixes a bus error on Solaris/Sparc.

Closes trac:4111

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30288.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
  Ticket 4111 --> https://svn.open-mpi.org/trac/ompi/ticket/4111
2014-01-14 16:04:52 +00:00
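The message for r30288 does not say how the 4-byte alignment is enforced. One common approach, sketched here under that assumption, is to round sizes up to a multiple of 4 so the next item starts on an aligned boundary. The names `align4` and `is_aligned4` are hypothetical and do not come from the vader BTL.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a size up to the next multiple of 4, so that whatever is
 * written after it starts on a 4-byte boundary. */
static inline size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Return nonzero if the address is 4-byte aligned. */
static inline int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}
```

On strict-alignment CPUs such as SPARC, a 32-bit load or store through a pointer for which `is_aligned4` is false raises a bus error, which matches the symptom described in the commit.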
Rolf vandeVaart
e75afb2b82 Fix bug in distance computation code when deciding which devices to use on a NUMA node.
Also add a verbose flag so one can see what devices are selected as well as another flag to override
locality information and use all devices on the node.  

This commit was SVN r30287.
2014-01-14 15:41:56 +00:00
Nathan Hjelm
da1316ca6e vader: don't OBJ_RELEASE endpoint rcaches.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30284.
2014-01-13 23:44:34 +00:00
Jeff Squyres
20d6391734 Patch submitted by Paul Hargrove to fix NetBSD compile with -laio.
NetBSD puts the AIO functions in -lrt, vs. the usual libc.  So we
need the fbtl/posix configure.m4 to test for -lrt properly.

Reviewed by Jeff Squyres.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Fix NetBSD use of -laio

This commit was SVN r30274.
2014-01-13 18:49:39 +00:00
Yossi Etigin
7564e2c13f Fix a recursion in mxm send flow which happens when mpi starts a new send from the context of send completion callback.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30265.
2014-01-12 17:47:03 +00:00
Yossi Etigin
9504969f7d fix communicator double-free from pt2pt component, caused by r29938.
cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30264.

The following SVN revision numbers were found above:
  r29938 --> open-mpi/ompi@ecfb122c97
2014-01-12 17:38:14 +00:00
Ralph Castain
286ff6d552 For large scale systems, we would like to avoid doing a full modex during MPI_Init so that launch will scale a little better. At the moment, our options are somewhat limited as only a few BTLs don't immediately call modex_recv on all procs during startup. However, for those situations where someone can take advantage of it, add the ability to do a "modex on demand" retrieval of data from remote procs when we launch via mpirun.
NOTE: launch performance will be absolutely awful if you do this with BTLs that aren't configured to modex_recv on first message!

Even with "modex on demand", we still have to do a barrier in place of the modex - we simply don't move any data around, which does reduce the time impact. The barrier is required to ensure that the other proc has in fact registered all its BTL info and therefore is prepared to hand over a complete data package. Otherwise, you may not get the info you need. In addition, the shared memory BTL can fail to properly rendezvous as it expects the barrier to be in place.

This behavior will *only* take effect under the following conditions:

1. launched via mpirun

2. #procs is greater than ompi_hostname_cutoff, which defaults to UINT32_MAX

3. mca param rte_orte_direct_modex is set to 1. At the moment, we are having problems getting this param to register properly, so only the first two conditions are in effect. Still, the bottom line is you have to *want* this behavior to get it.

The planned next evolution of this will be to make the direct modex be non-blocking - this will require two fixes:

1. if the remote proc doesn't have the required info, then let it delay its response until it does. This means we need a way for the MPI layer to tell the RTE "I am done entering modex data".

2. adjust the SM rendezvous logic to loop until the required file has been created

Creating a placeholder to bring this over to 1.7.5 when ready.

cmr=v1.7.5:reviewer=hjelmn:subject=Enable direct modex at scale

This commit was SVN r30259.
2014-01-11 17:36:06 +00:00
Jeff Squyres
34ae50a0ed Fix int <--> pointer casting by adding intermediate cast through (intptr_t)
Reviewed by Dave Goodell

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Add intptr_t casting in usnic btl

This commit was SVN r30243.
2014-01-10 20:42:53 +00:00
Nathan Hjelm
5259ab213f Fix one more error path in udreg. In this case we hit the maximum size
of the udreg cache and get a different error code back.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30242.
2014-01-10 19:27:32 +00:00
Ralph Castain
9566650458 Per Marco, don't define a "min" function if one is already defined to avoid conflict with cygwin reserved word
This commit was SVN r30241.
2014-01-10 18:03:25 +00:00
Ralph Castain
880943dc10 Per Marco, rename "interface" to "tcp_interface" to avoid cygwin reserved word
This commit was SVN r30240.
2014-01-10 18:02:22 +00:00
Ralph Castain
c7a94a57d7 Per Marco, rename ERROR tags to exit_ERROR to avoid cygwin reserved name issues.
Refs trac:4085

This commit was SVN r30239.

The following Trac tickets were found above:
  Ticket 4085 --> https://svn.open-mpi.org/trac/ompi/ticket/4085
2014-01-10 18:00:49 +00:00
Jeff Squyres
350d989c00 Fix OpenBSD warnings where <malloc.h> is available and usable, but not
intended to be used and emits a compile-time warning.

Thanks to Paul Hargrove for identifying the issue.

cmr=v1.7.4:reviewer=hjelmn:subject=remove/replace malloc.h

This commit was SVN r30231.
2014-01-10 17:20:49 +00:00
Jeff Squyres
53a3defde9 s/CACHE_LINE_SIZE/BASESMUMA_CACHE_LINE_SIZE/g to avoid a system macro
name clash on some BSDs.

cmr=v1.7.4:reviewer=pasha

This commit was SVN r30230.
2014-01-10 16:48:43 +00:00
Edgar Gabriel
217e61e345 add proper typecasts to intptr_t to avoid warnings on 32bit systems.
This commit was SVN r30229.
2014-01-10 16:19:04 +00:00
Jeff Squyres
212e07a1e9 Don't instantiate+init variables in a switch block.
Avoid compiler warning about (unnecessarily) initializing 2 variables
during instantiation at the top of a switch block (but outside of any
case statements): just declare the variables at the top of the outer
block.  They're already safely initialized, so don't worry about
initializing them in the instantiation.

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Don't instantiate+init variables in a switch block

This commit was SVN r30228.
2014-01-10 15:39:16 +00:00
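The pattern described in r30228 can be illustrated with a small sketch. The function `classify` and its values are hypothetical; the point is only where the declarations live.

```c
/* Hypothetical example. Initializers written between "switch (tag) {"
 * and the first case label would be jumped over and never run, so the
 * variables are declared in the enclosing block and assigned inside
 * each case before they are read. */
static int classify(int tag)
{
    int base;
    int result;

    switch (tag) {
    case 0:
        base = 10;
        result = base + 1;
        break;
    case 1:
        base = 20;
        result = base + 2;
        break;
    default:
        result = -1;
        break;
    }
    return result;
}
```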
Mike Dubman
110c99af4f sharing negative tag space between libNBC and HCOLL
fixed by devendar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30224.
2014-01-10 12:51:34 +00:00
Nathan Hjelm
52c231df3e ob1 does not check the return code of mpool_register. This can cause the
ob1 dummy registration to actually be used when using udreg. Fix this by
always setting reg to NULL when mpool/udreg's register function fails.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30214.
2014-01-10 00:46:16 +00:00
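The fix in r30214 makes the register function safe for callers that only look at the output handle. A minimal sketch of that idea follows; `registration_t`, `register_mem`, and `has_registration` are hypothetical stand-ins and not the real mpool or ob1 signatures.

```c
#include <stddef.h>

/* Hypothetical registration handle. */
typedef struct { int id; } registration_t;

static registration_t good_reg = { 42 };

/* Returns 0 on success and stores a handle in *reg. On failure it
 * returns nonzero and forces *reg to NULL, so a stale value left in
 * *reg by the caller can never be mistaken for a valid handle. */
static int register_mem(void *base, size_t len, registration_t **reg)
{
    if (NULL == base || 0 == len) {
        *reg = NULL;
        return -1;
    }
    *reg = &good_reg;
    return 0;
}

/* A caller that ignores the return code and checks only the handle. */
static int has_registration(void *base, size_t len)
{
    registration_t dummy = { -1 };
    registration_t *reg = &dummy;   /* pre-existing "dummy" value */

    (void)register_mem(base, len, &reg);
    return reg != NULL;
}
```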
Jeff Squyres
115025b8dd Ensure that the usnic BTL is only built on 64 bit Linux platforms.
Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Ensure the usnic BTL only builds on 64 bit Linux

This commit was SVN r30199.
2014-01-09 22:17:01 +00:00
Brian Barrett
013e0ec771 * Add multi-device support to the Portals 4 btl.
* Remove use of the Portals 4 proc tag for the btl, as it's causing more
problems than it's worth.

This commit was SVN r30191.
2014-01-09 20:01:42 +00:00
Nathan Hjelm
bb01fc2938 Add missing MCA variable enumerator sentinel.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30178.
2014-01-09 15:28:42 +00:00
Alina Sklarevich
2869ff1782 mxm: fixes for compilation warnings.
removed set but not used variables and a variable that is unused.

reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30176.
2014-01-09 15:15:14 +00:00
Mike Dubman
0fae2caef3 Create a comm keyval for hcoll component with delete callback function.
Set comm attribute with keyval.
Wait for pending hcoll module tasks in the comm delete callback, where the
PML is still valid on the communicator. Safely destroy the hcoll context
during the hcoll module destructor.

Author: Devendar Bureddy 
reviewed by miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30175.
2014-01-09 11:27:24 +00:00
Nathan Hjelm
10ecd80c8c Fix typo in udreg mpool that could cause us to try to use an invalid
registration. This was causing transaction errors on Aries systems.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30174.
2014-01-09 05:56:29 +00:00
Ralph Castain
2453843972 Add missing include - thanks to Paul Hargrove for spotting it
cmr=v1.7.4:reviewer=jsquyres:subject=add missing include in bcol

This commit was SVN r30171.
2014-01-09 03:57:55 +00:00
Jeff Squyres
9d41632eba Change the MCA level to 2 (from 5) on the rationale that it may be
needed for correctness.  The if_include/if_exclude are level 1, and
the TCP port range params are level 2; this parameter seems to be on
par with the TCP port range params.

Refs trac:4019

This commit was SVN r30161.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 19:04:26 +00:00
Jeff Squyres
8c871c2db6 Fix some compiler warnings:
* Remove some set-but-not-used variables
 * Make a convenience function return void (we weren't using the
   return code, anyway)
 * Mark a function as inline (it was supposed to be inline anyway)

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7:subject=Fix usnic BTL compiler warnings

This commit was SVN r30160.
2014-01-08 16:57:14 +00:00
Ralph Castain
cb31187bbe Correct tcp_not_use_nodelay option processing - change in mca param system incorrectly reversed the original parameter
Thanks to Tetsuya Mishima for detecting it!

cmr=v1.7.4:reviewer=jsquyres:subject=Correct tcp_not_use_nodelay option processing

This commit was SVN r30157.
2014-01-08 15:12:50 +00:00
Mike Dubman
43d6a30693 Fix problems of:
- HCOLL close without init
- Call hcoll progress after comm finalize
- mpirun default for coll_hcoll_enable is 1

fixed by Igor, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30156.
2014-01-08 10:55:25 +00:00
Ralph Castain
e2ca265f40 Per 1/7/2014 telecon: Add an MCA param to turn on all warnings for missing excluded interfaces.
Refs trac:4019

This commit was SVN r30146.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 00:21:25 +00:00
Jeff Squyres
13b29cff2c This commit complements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configury macro
names:

 * pkgdatadir -> ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Brian Barrett
7d472ad5a5 Improve some comments
This commit was SVN r30144.
2014-01-07 23:35:04 +00:00
Jeff Squyres
50d20ade82 Fix compiler warnings: remove unused variables
This commit was SVN r30143.
2014-01-07 23:21:47 +00:00
Jeff Squyres
8349e122e8 Fix compiler warning (signed/unsigned comparison)
This commit was SVN r30142.
2014-01-07 23:18:55 +00:00
Brian Barrett
afde8370b3 Pull both calls to get into one function, and wrap with the appropriate
reference count if flow control is enabled.

This commit was SVN r30141.
2014-01-07 23:15:09 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Tom Naughton
c01db6faca fix typo in btl:vader for OMPI_LOCAL_RANK_INVALID
This commit was SVN r30139.
2014-01-07 21:42:51 +00:00
Brian Barrett
dbcc53bc6f Fix a threading issue
Remove some unneeded UNLIKELYs

This commit was SVN r30138.
2014-01-07 19:41:39 +00:00
Rolf vandeVaart
b3edca19df Add braces per coding convention and design review.
This commit was SVN r30137.
2014-01-07 17:30:37 +00:00
Jeff Squyres
8bf4ad9030 Refs trac:4301
Complements r30073: tighten up the string parsing of the vendor parts
ID MCA param a bit.  Also fix a small memory leak: make sure to free the
array of uint32_t's parsed out of the MCA param.

This commit was SVN r30128.

The following SVN revision numbers were found above:
  r30073 --> open-mpi/ompi@6003702a51

The following Trac tickets were found above:
  Ticket 4301 --> https://svn.open-mpi.org/trac/ompi/ticket/4301
2014-01-06 22:16:04 +00:00
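Strict parsing of a numeric list, including the end-of-string check added in r30074, might look like the sketch below. It assumes a comma-separated parameter format, which the commit messages do not spell out, and `parse_u32_list` is a hypothetical helper rather than the usnic BTL's actual function.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "a,b,c" into a malloc'ed array of uint32_t.
 * Returns 0 on success; the caller must free(*out). Returns -1 on any
 * malformed input, in which case nothing is left for the caller to free. */
static int parse_u32_list(const char *str, uint32_t **out, size_t *count)
{
    size_t n = 0, cap = 1;
    const char *p;
    uint32_t *vals;

    *out = NULL;
    *count = 0;
    if (NULL == str || '\0' == *str) {
        return -1;
    }
    /* One more token than there are commas. */
    for (p = str; *p != '\0'; ++p) {
        if (',' == *p) ++cap;
    }
    vals = malloc(cap * sizeof(*vals));
    if (NULL == vals) return -1;

    p = str;
    while (1) {
        char *end;
        unsigned long v;

        /* Reject empty tokens, signs, and leading whitespace. */
        if (*p < '0' || *p > '9') goto fail;
        errno = 0;
        v = strtoul(p, &end, 0);
        if (end == p || errno != 0 || v > UINT32_MAX) goto fail;
        vals[n++] = (uint32_t)v;
        if ('\0' == *end) break;      /* valid end of string */
        if (',' != *end) goto fail;   /* trailing junk after a number */
        p = end + 1;
    }
    *out = vals;
    *count = n;
    return 0;

fail:
    free(vals);
    return -1;
}
```

The caller owns the returned array and frees it when done, which is the kind of cleanup the leak fix in r30128 adds.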
Nathan Hjelm
e627c91227 btl/vader: add support for traditional shared memory.
This commit adds support for placing the send memory segment in a
traditional shared memory segment when XPMEM is not available. The
current default is to reserve 4MB for shared memory on each process.
The latest benchmarks show vader performing better than sm on both
Intel and AMD CPUs.

For large messages vader will now use CMA if it is available (and
XPMEM is not).

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30123.
2014-01-06 19:51:44 +00:00
Nathan Hjelm
5c8ea3a251 btl/openib: Move free list memory allocation to add_procs
Per RFC which expired two weeks ago:

We are planning to make a change to Open MPI to always set up the btls. This
means the btl init will be called even if add_procs is never called for that
btl. In the openib btl, free list fragments are currently allocated in btl_init.
To avoid wasting that memory this commit moves that final device setup to
the add_procs function. This includes allocating free lists and starting the
async event thread.

At this time this change is safe since we have a barrier after add_procs in
MPI_Init. If this changes we will need to re-think some of the initialization
since we might have the possibility of a connection request before add_procs
is called.

Tested with Mellanox ConnectX2 and QLogic HCAs.

Commit also cleans up tabs in btl_openib_async.c.

cmr=v1.7.5:reviewer=miked

This commit was SVN r30122.
2014-01-06 19:51:30 +00:00
Brian Barrett
d4bb1cbbad * Start working on thread safety of Portals 4 MTL
* Only call flowctl_add_procs if there's a new proc in the add_procs call

This commit was SVN r30110.
2014-01-02 22:37:01 +00:00
Brian Barrett
e811a8a9cb Make the Portals 4 collective component disable itself when there's not a
Portals 4 point-to-point (MTL or BTL) component in use

This commit was SVN r30109.
2014-01-02 22:35:37 +00:00
Ralph Castain
871f4e519c Silence warning
Refs trac:4040

This commit was SVN r30105.

The following Trac tickets were found above:
  Ticket 4040 --> https://svn.open-mpi.org/trac/ompi/ticket/4040
2014-01-02 16:05:54 +00:00
Rolf vandeVaart
c47e06463d Adjust CUDA related crossover value.
This commit was SVN r30100.
2013-12-30 18:39:11 +00:00
Rolf vandeVaart
e7f430d9ac Add empty line that was inadvertently removed in message.
This commit was SVN r30099.
2013-12-30 18:38:07 +00:00
George Bosilca
947c180d7f Create a finalize function to provide an opportunity to the mpool
base to release the internal structures.

This commit was SVN r30098.
2013-12-29 11:45:46 +00:00
Ralph Castain
652f7a120f Add Mellanox device IDs that were included in prior releases, but somehow missing again here
cmr=v1.7.4:reviewer=miked

This commit was SVN r30095.
2013-12-26 17:47:05 +00:00
Ralph Castain
62378a64c8 As Jeff pointed out, the reqd flag should only turn off the show_help - still enter the rest of the code block
Refs trac:4019

This commit was SVN r30091.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2013-12-26 15:02:41 +00:00
Ralph Castain
a8a91b374e Update component-level selection comments to match latest revisions
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30087.
2013-12-25 19:12:43 +00:00
Jeff Squyres
12d23e9c92 Left out valid end-of-string comparison in r30073.
Refs trac:4031

This commit was SVN r30074.

The following SVN revision numbers were found above:
  r30073 --> open-mpi/ompi@6003702a51

The following Trac tickets were found above:
  Ticket 4031 --> https://svn.open-mpi.org/trac/ompi/ticket/4031
2013-12-24 12:07:56 +00:00
Jeff Squyres
6003702a51 Minor improvements to the usnic BTL:
 1. Fix ompi_info memory leak in usnic BTL: do not allocate memory in
    the component register function, because ompi_info only calls the
    component register function and then dlclose's the component -- it
    does not call component finalize.  Instead, defer parsing the MCA
    param (and alloc'ing memory) until the component init function so
    that any allocated memory can be freed in the component close
    function.
 2. Also add a new check to ensure that we actually have some part
    numbers to check.  Add a show_help message if we don't find any
    vendor part IDs to check.
 3. Add a verbose output if usnic disqualifies itself from selection
    because THREAD_MULTIPLE was specified.

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30073.
2013-12-24 11:57:35 +00:00
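The lifecycle rule from the first item of r30073 (no allocation in register, allocate in init, free in close) can be sketched as below. The functions `demo_register`, `demo_init`, and `demo_close` and the state variables are hypothetical; they only mirror the ordering described in the message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical component state. */
static const char *param_string = NULL;  /* set by register, not owned */
static char *parsed_copy = NULL;         /* owned, allocated in init   */

/* register: record the parameter only and allocate nothing. A tool that
 * calls register and then unloads the component leaks nothing. */
static int demo_register(const char *value)
{
    param_string = value;
    return 0;
}

/* init: the first point at which heap memory is allocated. */
static int demo_init(void)
{
    size_t len;

    if (NULL == param_string) return -1;
    len = strlen(param_string) + 1;
    parsed_copy = malloc(len);
    if (NULL == parsed_copy) return -1;
    memcpy(parsed_copy, param_string, len);
    return 0;
}

/* close: release whatever init allocated; safe if init never ran. */
static int demo_close(void)
{
    free(parsed_copy);
    parsed_copy = NULL;
    return 0;
}
```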
Ralph Castain
9eebb79d54 Cleanup a loop that couldn't possibly execute as the outer loop index was being reused by the inner loops, leaving the index at the cutoff point after the first iteration
cmr=v1.7.4:reviewer=edgar:subject=Cleanup loop in sharedfp

This commit was SVN r30059.
2013-12-23 18:34:34 +00:00
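The kind of bug cleaned up in r30059 is easy to reproduce in isolation. The sketch below is a generic illustration with hypothetical function names, not the sharedfp code itself.

```c
/* Correct version: distinct indices for the outer and inner loops. */
static int count_iterations(int rows, int cols)
{
    int i, j, count = 0;

    for (i = 0; i < rows; ++i) {
        for (j = 0; j < cols; ++j) {
            ++count;
        }
    }
    return count;
}

/* Buggy variant, shown for contrast: the inner loop reuses i. After
 * the first inner pass i equals n, so the outer condition fails and
 * the outer loop body runs only once. */
static int count_iterations_buggy(int n)
{
    int i, count = 0;

    for (i = 0; i < n; ++i) {
        for (i = 0; i < n; ++i) {
            ++count;
        }
    }
    return count;
}
```

With `n = 3` the correct version counts 9 iterations while the buggy one stops at 3.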
Nathan Hjelm
3be4536d9b Cleanup various leaks in ompi_info reported by valgrind.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30058.
2013-12-23 17:47:43 +00:00
Mike Dubman
80f4e02e0a Several changes:
- Modifications to coll/hcoll component related to the changes in the libhcoll API. 
  Now, hcoll_destroy_context accepts one more parameter that indicates if the context was
  really destroyed as a result of the call. 
  This new "non-blocking" context destruction fixes hang discovered in IMB with mcast enabled. 
- Clean up all the left contexts (if any) on the comm_world destruction. 

fixed by Val, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30055.
2013-12-23 06:57:12 +00:00
Jeff Squyres
71ec6c1617 Remove unnecessary "mpi.h"; move opal headers to the top.
This commit was SVN r30053.
2013-12-22 20:38:43 +00:00
George Bosilca
38cbaeaa82 Try to impose a little bit of consistency on how we parse lists of
modules by enforcing the use of OPAL list accessors.

This commit was SVN r30045.
2013-12-21 23:23:33 +00:00
Ralph Castain
042ed95e4e Remove an annoying warning. If the user excludes a non-existent interface, there is no reason to warn - the interface may simply not exist on that node.
cmr=v1.7.4:reviewer=jsquyres:subject=Remove an annoying warning

This commit was SVN r30042.
2013-12-21 01:51:11 +00:00
Adrian Reber
53a70fe87f Trying to get the C/R code to compile again. (send_*_nb)
This patch changes all send/send_buffer occurrences in the C/R code
to send_nb/send_buffer_nb.
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* just replace the blocking calls with the non-blocking calls
* all #ifdef's introduced in V1 are gone
* send_* returns error code or ORTE_SUCCESS (not the number of bytes)

This commit was SVN r30036.
2013-12-20 21:58:28 +00:00
Adrian Reber
a3813d37c7 Trying to get the C/R code to compile again. (recv_*_nb)
This patch changes all recv/recv_buffer occurrences in the C/R code
to recv_nb/recv_buffer_nb.
The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED).
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* only #ifdef out the code where the behaviour is changed
  (used to be blocking; now non-blocking)

This commit was SVN r30035.
2013-12-20 21:05:40 +00:00
Rolf vandeVaart
695d854cd8 Fix return value.
This commit was SVN r30034.
2013-12-20 20:57:04 +00:00
Ralph Castain
31248c0985 Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.

Refs trac:4003

This commit was SVN r30033.

The following Trac tickets were found above:
  Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
2013-12-20 20:42:39 +00:00
Rolf vandeVaart
4cd1958deb Fix so we do not get warnings when running on system without CUDA software installed and CUDA-aware compiled in.
This commit was SVN r30032.
2013-12-20 20:39:25 +00:00
Dave Goodell
bd901a68ed usnic: fix 'fls' warnings+errors
The old version caused compilation errors on Solaris.  Thanks to Paul
Hargrove for testing and reporting the bug:

  http://www.open-mpi.org/community/lists/devel/2013/12/13520.php

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30025.
2013-12-20 17:37:22 +00:00
Ralph Castain
6959ba5577 Add missing include file.
Thanks to Paul Hargrove for spotting it.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29998.
2013-12-19 23:39:21 +00:00
Dave Goodell
0c6b292442 romio: pick "infinitely stale" fix from upstream
Some NFS scenarios can result in an infinite ESTALE return, which will
hang ROMIO.  This commit causes ROMIO to error out after a large number
of retries instead of spinning forever.

This is MPICH commit b250d338:

http://git.mpich.org/mpich.git/commit/b250d338e66667a8a1071a5f73a4151fd59f83b2

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r29993.
2013-12-19 22:55:26 +00:00
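The bounded-retry idea behind r29993 can be sketched as follows. The retry cap and the names `retry_on_estale` and `always_stale` are assumptions made for illustration; the commit message does not state the actual limit or show the upstream ROMIO code.

```c
#include <errno.h>

/* Hypothetical retry cap. */
#define MAX_STALE_RETRIES 10000

/* Call op(arg) until it succeeds, fails with an error other than
 * ESTALE, or the retry budget is used up. Returns op's last result,
 * with errno left as op set it. */
static int retry_on_estale(int (*op)(void *), void *arg)
{
    int rc;
    int tries = 0;

    do {
        rc = op(arg);
    } while (rc < 0 && ESTALE == errno && ++tries < MAX_STALE_RETRIES);

    return rc;
}

/* Example operation that always reports a stale handle and counts
 * how often it was called. */
static int always_stale(void *arg)
{
    ++*(int *)arg;
    errno = ESTALE;
    return -1;
}
```

Passing `always_stale` to `retry_on_estale` returns an error after `MAX_STALE_RETRIES` calls instead of spinning forever.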
Ralph Castain
b745078535 Support user-provided envars for comm_spawn using info key "env"
Thanks to Tom Fogal for the request

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29990.
2013-12-19 20:59:50 +00:00
Jeff Squyres
3a14adef63 Remove the comments around these assignments; otherwise, we won't get
function pointers set to the _map functions, and we get segv's in MTT
testing (e.g., the C++ suite, which actually calls MPI_Cart_map and
MPI_Graph_map).

cmr=v1.7.4:reviewer=bosilca:subject=Fix topo _map function pointer assignments

This commit was SVN r29988.
2013-12-19 20:41:32 +00:00
Yossi Etigin
6ab4aba9e6 Fix missing include of show_help.h in mtl mxm.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29987.
2013-12-19 19:37:21 +00:00
Tom Naughton
3aefca32b0 + update rte db_fetch comments with change from r29931
This commit was SVN r29971.

The following SVN revision numbers were found above:
  r29931 --> open-mpi/ompi@0995a6f3b9
2013-12-19 01:16:58 +00:00
Jeff Squyres
bb59b07321 Remove CFLAGS setting that was really only intended for the v1.6
branch (it's not necessary on trunk/v1.7 because they require C99,
which allows variadic macros).

Also fix another compiler warning (using %p to print a (void*)).

Submitted by Jeff, reviewed by Dave.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=two usnic BTL fixes

This commit was SVN r29966.
2013-12-19 00:19:05 +00:00
Nathan Hjelm
b9765a380f Update NEWS with new MPI-3 features and a note about the new ROMIO
version.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29965.
2013-12-19 00:16:07 +00:00
Jeff Squyres
515fd00411 CSCul95082: DMAR faults during mtt testing
usnic_channel_finalize() was deregistering recv buffers before
destroying the QP to which they were posted. The QP needs to be
destroyed first so that the NIC does not attempt to write to
deregistered memory, causing the DMAR messages.

Submitted by Reese, reviewed by Jeff.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29963.
2013-12-19 00:01:35 +00:00
Yossi Etigin
ecfb122c97 Fix segfault in osc pt2pt completion handler, when the request is canceled during finalization.
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29938.
2013-12-17 17:30:14 +00:00
Ralph Castain
0995a6f3b9 Revert r29917 and replace it with a fix that resolves the thread deadlock while retaining the desired debug info. In an earlier commit, we had changed the modex accordingly:
* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This would provide the hostname for debugging purposes as we only report errors on messages, and so we must have called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first message - i.e., not during add_procs so we don't call it for every process in the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. Meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse than the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

This commit was SVN r29931.

The following SVN revision numbers were found above:
  r29917 --> open-mpi/ompi@1a972e2c9d
2013-12-17 03:26:00 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
George Bosilca
1a972e2c9d Don't be greedy, just do what we asked for.
This commit was SVN r29917.
2013-12-15 16:54:01 +00:00
George Bosilca
430a13719f Only if OMPI_BTL_SM_HAVE_CMA is set to 1.
This commit was SVN r29916.
2013-12-15 16:49:27 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Rolf vandeVaart
b955dbd6d9 Fix various items discovered by review of ticket #3951.
This commit was SVN r29900.
2013-12-13 21:25:07 +00:00
Jeff Squyres
f4afa4fd1f Add missing include, exposed in "external libevent" work.
Refs trac:3694

This commit was SVN r29898.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:21:30 +00:00
Jeff Squyres
bcfe2156d5 Bring over m4 quoting fix from v1.7 branch (in r29894) that was
discovered when removing some components.

This commit was SVN r29895.

The following SVN revision numbers were found above:
  r29894 --> open-mpi/ompi@58ed00296c
2013-12-13 20:27:33 +00:00
Ralph Castain
f763be26c4 Closes trac:2433. Check for hetero architecture and disqualify sm connections if that is found as the sm btl currently doesn't support hetero operations.
cmr=v1.7.4:reviewer=brbarret:subject=Disqualify sm btl for hetero procs

This commit was SVN r29882.

The following Trac tickets were found above:
  Ticket 2433 --> https://svn.open-mpi.org/trac/ompi/ticket/2433
2013-12-13 15:23:33 +00:00
Mike Dubman
fb3f94a16e remove debug print
Refs trac:3969

This commit was SVN r29876.

The following Trac tickets were found above:
  Ticket 3969 --> https://svn.open-mpi.org/trac/ompi/ticket/3969
2013-12-13 06:08:44 +00:00
Mike Dubman
21be95c9b5 Initialize sm global variables in mca_btl_sm_component_open(), because they are destructed in mca_btl_sm_component_close(), and init() function might not be called or fail.
For example, mca_btl_sm.knem_fd remained 0, and mca_btl_sm_component_close() ended up closing fd 0, which belongs to someone else.

fixed by Yossi, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29875.
2013-12-13 06:01:24 +00:00
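The hazard fixed in r29875 comes from zero being a valid file descriptor. A minimal sketch of the open/close pairing, using hypothetical names (`sm_demo`, `sm_demo_open`, `sm_demo_close`) rather than the real sm BTL symbols:

```c
#include <unistd.h>

/* Hypothetical component state. Static storage is zero-initialized,
 * and 0 is a valid descriptor number (normally stdin). */
static struct {
    int knem_fd;
} sm_demo;

/* open: set the field to a "not in use" sentinel before any later
 * step has a chance to fail or be skipped. */
static void sm_demo_open(void)
{
    sm_demo.knem_fd = -1;
}

/* close: only close a descriptor that was actually opened.
 * Returns 1 if a descriptor was closed, 0 otherwise. */
static int sm_demo_close(void)
{
    if (sm_demo.knem_fd >= 0) {
        close(sm_demo.knem_fd);
        sm_demo.knem_fd = -1;
        return 1;
    }
    return 0;
}
```

With the sentinel in place, calling close without a successful init is a no-op rather than a `close(0)`.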
Jeff Squyres
bac67e0d81 Per discussion @Chicago OMPI dev meeting Dec 2013: remove all MX support.
This commit was SVN r29873.
2013-12-12 18:54:47 +00:00
Nathan Hjelm
3262080391 Cleanup udcm structures to avoid issues with nesting structures with
flexible members.

UDCM is ready to go for 1.7.4 with this patch.

cmr=v1.7.4:ticket=3940

This commit was SVN r29861.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-12 05:24:37 +00:00
Nathan Hjelm
e0e94a6029 Fix warning caused by typo in r29815
This commit was SVN r29860.

The following SVN revision numbers were found above:
  r29815 --> open-mpi/ompi@d556b60b21
2013-12-11 21:45:39 +00:00
Nathan Hjelm
6ab69c758b Fix warnings in udcm.
cmr=v1.7.4:reviewer=rhc:ticket=3940

This commit was SVN r29859.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-11 21:40:06 +00:00
Rolf vandeVaart
3ae88f8a24 Ensure no fork support with GDR. CUDA-aware code only.
This commit was SVN r29854.
2013-12-10 18:08:53 +00:00
Rolf vandeVaart
1cc55f305f Add extra check for GDR. Adjust some names and replace opal_output with opal_show_help.
This commit was SVN r29853.
2013-12-10 16:04:08 +00:00
Jeff Squyres
0f61bb651e Technically, the PORT_ACTIVE is not a "bad" event.
Note that this event should never happen within a single OMPI job,
because OMPI will ignore usnic ports that are down.  The PORT_ACTIVE
event should only occur if a port ''was'' down and is now ''up''.  But
what the heck -- if we ever do get this event, it is harmless -- just
ignore it.

This commit was SVN r29852.
2013-12-09 20:45:55 +00:00
Edgar Gabriel
c253c2eec6 fix the condition for the lazy open of shared filepointers.
This commit was SVN r29850.
2013-12-09 19:37:21 +00:00
Mike Dubman
9a65e0d8c6 cosmetic fixes for hcoll autotools
Refs: #3694

This commit was SVN r29841.
2013-12-08 09:45:13 +00:00
Mike Dubman
2e124454b4 cosmetic fix to remove redundant -lfca
use CPP extra flags var which is propagated to coll/fca and scoll/fca
Refs: #3694

This commit was SVN r29832.
2013-12-07 15:00:54 +00:00
Jeff Squyres
3bd9c603ff Clean up variables used in configure with OPAL_VAR_SCOPE.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).

Refs trac:3694

This commit was SVN r29830.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-06 23:40:34 +00:00
Rolf vandeVaart
d556b60b21 Change some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Nathan Hjelm
231ebb09c9 Update romio configury to remove a warning message.
cmr=v1.7.4:ticket=3158

This commit was SVN r29811.

The following Trac tickets were found above:
  Ticket 3158 --> https://svn.open-mpi.org/trac/ompi/ticket/3158
2013-12-06 00:12:35 +00:00
Dave Goodell
da26226e3c usnic: add some extra debug-build sanity checks
On the off chance that the PML is twiddling fields that it really
shouldn't be...

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29804.
2013-12-05 00:28:11 +00:00
Jeff Squyres
ba018b3603 Protect the container_of #define.
MOFED apparently has a /usr/include/infiniband/verbs.h that also
defines a (slightly different but fully compatible) container_of
macro.  So put proper #ifndef protection around our definition of
container_of.
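
The guard described above can be sketched as follows. This is a minimal illustration, not the committed code: the macro body is one common offsetof-based form, and the struct and function names (example_frag, list_link, frag_from_link) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Define container_of only if no other header (for example a verbs.h
 * that ships its own compatible version) has already provided it. */
#ifndef container_of
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))
#endif

/* Hypothetical types, for illustration only. */
struct list_link {
    struct list_link *next;
};

struct example_frag {
    int id;
    struct list_link link;   /* embedded member we will hold a pointer to */
};

/* Recover the enclosing frag from a pointer to its embedded link. */
static struct example_frag *frag_from_link(struct list_link *l)
{
    assert(NULL != l);
    return container_of(l, struct example_frag, link);
}
```

With the #ifndef in place, whichever definition is seen first wins, so including both headers in either order no longer triggers a macro redefinition warning.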

Thanks to Rolf vandeVaart for pointing out the issue.

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29799.
2013-12-04 14:24:56 +00:00
Yossi Etigin
a913b00f89 mtl mxm: update configuration parsing api to mxm 2.1, drop
older version support (1.0 and 1.1), and cleanup the code.

reviewed by miked.

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29797.
2013-12-04 09:11:55 +00:00
Jeff Squyres
c74c1e86d3 Per suggestion from Paul Kapinos, report in BTL verbosity if a device
is skipped because it is too far away.

(see thread starting here:
http://www.open-mpi.org/community/lists/devel/2013/06/12470.php)

This commit was SVN r29790.
2013-12-03 22:44:11 +00:00
Rolf vandeVaart
218c05a4d1 Make sure synchronous copies are complete before moving the data.
This commit was SVN r29789.
2013-12-03 21:20:14 +00:00
Rolf vandeVaart
ab77435d9b Fix the CUDA-aware case where we are not sending any GPU data.
This commit was SVN r29788.
2013-12-03 20:25:58 +00:00
Devendar Bureddy
4554770ee4 hcol fixes
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29787.
2013-12-03 20:21:40 +00:00
Nathan Hjelm
fe327d9859 udcm: cleanup code and improve the ack handling
Originally udcm acks used the immediate data to indicate which message was
being acknowledged. This data was (mysteriously) junk when using QLogic HCAs so I
updated udcm to use the source info (slid, qp, etc) to determine which message was being
acked. This works as long as we don't have two messages simultaneously in flight
to a particular peer and then lose the first of the two messages. The chances of this
happening are tiny. To fix this case I updated the udcm message header to include
a pointer to the in flight message. This pointer is then sent back to the sending
process to ack receipt.

cmr=v1.7.4:ticket=trac:3940

This commit was SVN r29775.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-02 20:18:46 +00:00
Jeff Squyres
3a7af4ab40 Fix another clang warning: sendreq is undefined if proc==NULL.
cmr=v1.7.4:reviewer=hjelmn:subject=fix ob1 undefined sendreq value

This commit was SVN r29774.
2013-12-02 19:44:42 +00:00
Nathan Hjelm
fb0b0442c4 openib/connect: re-enable xrc support in the openib btl
This commit updates the udcm cpc to support xrc. The steps followed by udcm
mimic those in the removed xoob cpc. This update has been tested with both XRC
and RC.

Mellanox, this is intended to go into 1.7.4. Please review carefully and let
me know if there are any issues.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29767.
2013-11-27 22:28:04 +00:00
George Bosilca
cb24277737 Restrict the usage of MPI_Type_extent only to receiving processes
(aka the root). This commit is based on a patch provided by Pierre 
Jolivet.
Fix all the output to match the failing MPI call.

This commit was SVN r29761.
2013-11-27 12:09:31 +00:00
Rolf vandeVaart
aa98b0333b Call function from function table. Discovered during static build.
This commit was SVN r29755.
2013-11-25 22:46:07 +00:00
Ralph Castain
ac9820c46f Link against common cuda library
Thanks to Jorg Bornschein for pointing it out

cmr=v1.7.4:reviewer=rolfv

This commit was SVN r29750.
2013-11-24 17:06:51 +00:00
George Bosilca
68268377af Fix an error message for the igather and the usage of the extent on
non-root processes for the iscatter. Thanks to Pierre
Jolivet for the bug report and the patch.

This commit was SVN r29736.
2013-11-23 00:59:22 +00:00
Edgar Gabriel
4f425872be fix the streams used in opal_output in the sharedfp components.
This commit was SVN r29726.
2013-11-21 16:11:49 +00:00
Devendar Bureddy
4a311ae9fd continue searching the sorted openib device list if no btls are found with the nearest HCA.
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29725.
2013-11-20 22:23:12 +00:00
Nathan Hjelm
24a7e7aa34 Add support for the udreg registration cache and dynamics on XE/XK/XC.
To support the new mpool two changes were made to the mpool infrastructure:

 1) Added an mpool flag to indicate that an mpool does not need the memory
    hooks to use the leave pinned protocols. This flag is checked in the
    mpool lookup.

 2) Added an mpool context to the base registration. This new member is used
    by the udreg mpool to store the udreg context associated with the
    particular registration. The new member will not break the ABI
    compatibility as the new member is only currently used by the udreg
    mpool.

Dynamics support for Cray systems makes use of the global rank provided by
orte to give the ugni library a unique rank for each process. Dynamics
support is not available under direct-launch (srun.)

cmr=v1.7.4

This commit was SVN r29719.
2013-11-18 04:58:37 +00:00
Jeff Squyres
5206e877be Help decrease conflicts between SVN trunk and Cisco git branch of OMPI v1.6 branch
This commit was SVN r29715.
2013-11-15 21:35:56 +00:00
Jeff Squyres
e6ed7c9f4d Avoid trivial "don't mix declarations and code" compiler warning
This commit was SVN r29714.
2013-11-15 21:31:10 +00:00
Rolf vandeVaart
92e6aaa808 Adjust a default value. Adjust some levels of verbosity and one more debug message.
This commit was SVN r29712.
2013-11-14 21:47:27 +00:00
Ralph Castain
7480beb7f0 Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it.
This isn't being used yet - just enabling Nathan to do what he needs.

***** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTEs will define this key. *****
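
A minimal sketch of the #ifdef protection requested in the note. The helper example_get_global_rank is hypothetical, and computing the global rank as job offset plus local rank is an assumption based on the description above; the real code would look the value up in the database.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fills in a rank and returns 1 when the RTE defines
 * the global-rank key, or falls back to the local rank and returns 0. */
static int example_get_global_rank(int local_rank, int job_offset,
                                   int *global_rank)
{
    assert(NULL != global_rank);
#ifdef OMPI_DB_GLOBAL_RANK
    /* Assumed meaning: the per-job offset makes ranks unique across jobs. */
    *global_rank = job_offset + local_rank;
    return 1;
#else
    /* Key not defined by this RTE: do not reference it at all. */
    (void) job_offset;
    *global_rank = local_rank;
    return 0;
#endif
}
```

Callers can use the return value to decide whether the result really spans multiple jobs.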

This commit was SVN r29708.
2013-11-14 17:01:43 +00:00
Ralph Castain
22e30a680d Given that the oob and xoob cpc's are no longer operable and haven't been since the OOB update, remove them to avoid confusion
cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib

This commit was SVN r29703.
2013-11-14 04:16:53 +00:00
Nathan Hjelm
6b3cf0c1ba Merge branch 'romio_refresh'
This commit was SVN r29695.
2013-11-13 21:02:55 +00:00
Rolf vandeVaart
4964a5e98b Per this RFC from October 8, 2013 and as discussed in the telecon.
http://www.open-mpi.org/community/lists/devel/2013/10/13072.php

Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers. 
Note that none of this is compiled in unless CUDA-aware support is requested.

This commit was SVN r29680.
2013-11-13 13:22:39 +00:00
Jeff Squyres
98ff91cfeb Refs trac:3091
Gah!  The "device" variable isn't used at all in this loop (my eye
glossed over the next line and thought that "device" was used in the
free() statement, but it's actually "devices" -- not "device").

This commit was SVN r29665.

The following Trac tickets were found above:
  Ticket 3091 --> https://svn.open-mpi.org/trac/ompi/ticket/3091
2013-11-12 23:01:04 +00:00
Jeff Squyres
7cb31111a6 Refs trac:3901
Feedback from Dave's review.

This commit was SVN r29664.

The following Trac tickets were found above:
  Ticket 3901 --> https://svn.open-mpi.org/trac/ompi/ticket/3901
2013-11-12 22:51:20 +00:00
Ralph Castain
762400d559 Silence warning
Refs trac:3898

This commit was SVN r29659.

The following Trac tickets were found above:
  Ticket 3898 --> https://svn.open-mpi.org/trac/ompi/ticket/3898
2013-11-11 22:53:09 +00:00
Jeff Squyres
5a940f5ee7 Arrgh -- remove debugging printf.
This commit was SVN r29657.
2013-11-11 22:44:28 +00:00
Jeff Squyres
e20217eccc Expand the "btl_usnic" MPI_T enumeration to have strings of the form:
<usnic device name>,<eth device>,<ip address>/<CIDR prefix>

For example:

   usnic_0,eth4,10.1.0.15/16

This is just handy for mapping the usnic_X device back to the IP
network to which it corresponds.
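
As an illustration of the string layout above, the following sketch builds one such entry. The function example_format_usnic_enum and its parameters are hypothetical; the actual BTL gathers these values from the device itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter for "<usnic device>,<eth device>,<ip>/<prefix>".
 * Returns 0 on success, -1 if the output buffer is too small. */
static int example_format_usnic_enum(char *out, size_t len,
                                     const char *usnic_dev,
                                     const char *eth_dev,
                                     const char *ip_addr,
                                     unsigned cidr_prefix)
{
    int n;

    assert(NULL != out && len > 0);
    n = snprintf(out, len, "%s,%s,%s/%u",
                 usnic_dev, eth_dev, ip_addr, cidr_prefix);
    /* snprintf reports the length it needed; reject truncated results. */
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

Calling it with "usnic_0", "eth4", "10.1.0.15" and 16 yields the example string shown above.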

This commit was SVN r29656.
2013-11-11 22:25:30 +00:00
Nathan Hjelm
6a331275d8 Set transfers as active before starting them.
cmr=v1.7.4:ticket=trac:3898

This commit was SVN r29654.

The following Trac tickets were found above:
  Ticket 3898 --> https://svn.open-mpi.org/trac/ompi/ticket/3898
2013-11-11 21:50:54 +00:00
Nathan Hjelm
3d3c29ae96 btl/scif: do not return resource busy if we started a connection attempt.
Resolves a hang when using scif for shared memory transfers. This is a
simple change and doesn't require a review.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29653.
2013-11-11 19:36:34 +00:00
Nathan Hjelm
b5ce72cc15 Set the modex as active before starting it. This resolves a hang in
MPI_Init() on comm-spawned processes.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29652.
2013-11-11 19:33:32 +00:00
Rolf vandeVaart
a6df7bc33a Fix issues reported in ticket #3877. Also added additional comments.
This commit was SVN r29641.
2013-11-07 20:44:47 +00:00
Rolf vandeVaart
2cf7c40ee5 Minor adjustments to error messages due to review of #3880.
This commit was SVN r29640.
2013-11-07 20:21:21 +00:00
Rolf vandeVaart
3290cde630 Various minor changes to bring smcuda up to date with sm.
This commit was SVN r29639.
2013-11-07 19:45:56 +00:00
Dave Goodell
82db913490 usnic: fix module_recv_buffers perf regression
Cisco v1.6 git commit 913ec6c and upstream trunk r29593 (segfault fix)
introduced a performance regression by inadvertently disabling the
`module_recv_buffers` functionality.  With those changes in place, the
`btl_usnic_recv.c` logic would end up mallocing a buffer that should
have otherwise come from a `module_recv_buffers` pool.  It also resulted
in a small, bounded memory leak (128 buffers at each power-of-two size
interval).

The new version just places the buffer after the free list item with a
flexible array member.  I bumped the pool to allocate all 128 elements
up front because the deferred allocation was modestly impacting IMB
Sendrecv performance at a few sizes.
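
The "buffer after the free list item" layout can be sketched with a C99 flexible array member. The names here (example_item, example_item_alloc) are hypothetical stand-ins and not the usnic BTL's own types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a free list item that owns its buffer. */
struct example_item {
    struct example_item *next;   /* free list linkage */
    size_t buf_len;              /* usable bytes in buf */
    char buf[];                  /* flexible array member: payload follows header */
};

/* Allocate the header and the payload in one block. Returns NULL on
 * allocation failure; release the result with free(). */
static struct example_item *example_item_alloc(size_t buf_len)
{
    struct example_item *item = malloc(sizeof(*item) + buf_len);

    if (NULL == item) {
        return NULL;
    }
    item->next = NULL;
    item->buf_len = buf_len;
    return item;
}
```

Because the payload lives in the same allocation as the item, returning the item to its pool also returns the buffer, so nothing has to be malloc'ed separately on the receive path.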

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29631.

The following SVN revision numbers were found above:
  r29593 --> open-mpi/ompi@1ed9b8ff43
2013-11-07 01:27:31 +00:00
Vishwanath Venkatesan
d37a5faa20 Need not do aggregator selection for one process case
So adding a check for this corner case!

This commit was SVN r29622.
2013-11-06 21:05:26 +00:00
Brian Barrett
cf8de1ef0f Minor indent cleanup in init_query()
Only use Portals on communicators with more than one rank
Fix computation of number of children when using the hypercube tree

This commit was SVN r29616.
2013-11-06 15:21:09 +00:00
Jeff Squyres
e28261898d Per discussion on the devel list, rename the btl_usnic_devices MPI_T
state pvar to be btl_usnic (i.e., the best suggestion so far).

See http://www.open-mpi.org/community/lists/devel/2013/11/13188.php
for more detail.

This commit was SVN r29614.
2013-11-06 06:19:03 +00:00
Rolf vandeVaart
e46c0bb952 Fix one more space for consistent defines.
This commit was SVN r29607.
2013-11-05 15:31:49 +00:00
Rolf vandeVaart
64b3a24fec Fix CUDA-aware compile issues.
This commit was SVN r29606.
2013-11-05 14:46:58 +00:00
Rolf vandeVaart
e57795f097 Revert r29594. That was just plain wrong. Sorry about workday configure change.
This commit was SVN r29605.

The following SVN revision numbers were found above:
  r29594 --> open-mpi/ompi@ed7ddcd9c7
2013-11-05 14:45:56 +00:00
Rolf vandeVaart
ed7ddcd9c7 Fix CUDA-aware compile error introduced with r29581.
This commit was SVN r29594.

The following SVN revision numbers were found above:
  r29581 --> open-mpi/ompi@ee7510b025
2013-11-05 00:08:33 +00:00
Dave Goodell
1ed9b8ff43 usnic: fix segfault at finalize time
Without this commit, if you run IMB pingpong between two nodes with only
one usnic selected (e.g., via `--mca btl_usnic_if_include usnic_0`) then
the run will seem fine but will segfault at MPI_Finalize time.

This behavior has happened since Cisco v1.6 git commit ec7ddf8, upstream
trunk r29484, and upstream v1.7 r29507.

Root cause was that the free list element was being used as the recv
buffer instead of the data buffer associated with the element.  So the
reassembly code would stomp all over the free list element, which would
cause the destructor to explode when the free list attempted to clean up
all of its elements.  This surprisingly did not cause any other problems
until now.

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29593.

The following SVN revision numbers were found above:
  r29484 --> open-mpi/ompi@a6ed232a10
  r29507 --> open-mpi/ompi@790d269ce8
2013-11-04 22:52:14 +00:00
Dave Goodell
73a943492c usnic: pack via convertor on the fly
If we need to use a convertor, go back to stashing that convertor in the
frag and populating segments "on the fly" (in
ompi_btl_usnic_module_progress_sends).  Previously we would pack into a
chain of chunk segments at prepare_src time, unnecessarily consuming
additional memory.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29592.
2013-11-04 22:52:03 +00:00
Dave Goodell
71d0d73575 usnic: refactor callback invocation
This makes it a little easier to see what's happening with callbacks to
the PML.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29591.
2013-11-04 22:51:48 +00:00
Dave Goodell
4c791e21d2 usnic: add MSGDEBUG1_OUT/MSGDEBUG2_OUT macros
This includes suppressing picky-mode warnings about __VA_ARGS__, which
we know are supported by any compilers we care about.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29590.
2013-11-04 22:51:35 +00:00