1
1
Граф коммитов

7294 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
a9fb4976d5 coll/ml: more fixes
There were a couple of issues with the memory leak fixes and several more verbose
issues. This fixes those issues.

cmr=v1.8.1:ticket=trac:4473

This commit was SVN r31273.

The following Trac tickets were found above:
  Ticket 4473 --> https://svn.open-mpi.org/trac/ompi/ticket/4473
2014-03-28 18:31:28 +00:00
Nathan Hjelm
efa37c17c8 osc/base: fix one more case in ompi_osc_base_sndrcv_op
This fixes more issues identified by armci. More issues still remain and fixes are
coming for those as well.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31272.
2014-03-28 18:31:10 +00:00
Jeff Squyres
173c046617 build: add Automake-like silent/verbose macros for "ln -s ..." operations
Also, since I put some of the macros for these silent/verbose rules up
in the top-level Makefile.man-page-rules file, I renamed it to
Makefile.ompi-rules.

I've had this sitting around for a while; now seems like as good a
time as any to commit it.

This commit was SVN r31271.
2014-03-28 18:24:32 +00:00
Nathan Hjelm
ecce211403 btl/vader: create the shared memory backing file in the proc's session
directory not the job's

This bug didn't affect the correctness of the vader results just the
cleanup. This commit removes an error message about removing a non-existent
file.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31265.
2014-03-28 00:38:19 +00:00
Nathan Hjelm
bd3b550c6d coll/ml: fix leaks
Thanks to ggouaillardet for finding and fixing these issues.

Closes trac:4460

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31264.

The following Trac tickets were found above:
  Ticket 4460 --> https://svn.open-mpi.org/trac/ompi/ticket/4460
2014-03-27 23:25:31 +00:00
Nathan Hjelm
595a6e94e6 Fix typos in r31260
Also added some missing values and sentinels.

cmr=v1.8:ticket=trac:4470

This commit was SVN r31263.

The following SVN revision numbers were found above:
  r31260 --> open-mpi/ompi@69036437b7

The following Trac tickets were found above:
  Ticket 4470 --> https://svn.open-mpi.org/trac/ompi/ticket/4470
2014-03-27 22:34:28 +00:00
Jeff Squyres
24f7bd327e MPI-3: Add missing MPI_Comm_get|set_info functions
Thanks to Lisandro Dalcin for pointing out the issue.

cmr=v1.8:reviewer=hjelmn

This commit was SVN r31262.
2014-03-27 21:41:59 +00:00
Nathan Hjelm
93238b2c58 Fix typo in r31260
cmr=v1.8:ticket=trac:4470

This commit was SVN r31261.

The following SVN revision numbers were found above:
  r31260 --> open-mpi/ompi@69036437b7

The following Trac tickets were found above:
  Ticket 4470 --> https://svn.open-mpi.org/trac/ompi/ticket/4470
2014-03-27 21:04:56 +00:00
Nathan Hjelm
69036437b7 Add missing MPI_WEIGHTS_EMPTY constant
cmr=v1.8:reviewer=jsquyres

This commit was SVN r31260.
2014-03-27 20:59:52 +00:00
Nathan Hjelm
545d5daced osc: add missing MPI_ERR_RMA_SHARED error code and internal equivalent
cmr=v1.8:reviewer=jsquyres

This commit was SVN r31259.
2014-03-27 20:06:43 +00:00
Jeff Squyres
cdb396697c usnic: do not disqualify if a peer does not put usnic modex info
If ompi_modex_recv() fails with OPAL_ERR_DATA_VALUE_NOT_FOUND, it
simply means that the peer process did not put any usnic BTL modex
info -- it is not an error.  So have the usnic BTL simply ignore that
peer (vs. disqualifying itself / treating this like a real error).

Refs trac:4442.

This commit was SVN r31258.

The following Trac tickets were found above:
  Ticket 4442 --> https://svn.open-mpi.org/trac/ompi/ticket/4442
2014-03-27 19:37:07 +00:00
Nathan Hjelm
b3bb90cf2d Do not include inttypes.h directly in Open MPI. Use opal_stdint.h instead.
This commit should finish the work started for #869. Closing that ticket
with this commit.

Closes trac:869

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31257.

The following Trac tickets were found above:
  Ticket 869 --> https://svn.open-mpi.org/trac/ompi/ticket/869
2014-03-27 17:56:00 +00:00
Vasily Filipov
8ef2e746e6 BTL/OPENIB: fix for rdma cm AF_IB case - user private data pointer points to a lib RDMA CM header and not to a "Consumer Private Data".
This commit was SVN r31247.
2014-03-27 14:04:02 +00:00
Jeff Squyres
224842e4c9 ompi_info.1: Include much more info about the --level CLI option
Add a lot more information about the --level CLI option, and the nine
levels.

Also remove some now-erroneous examples regarding --version.

cmr=v1.8:reviewer=rhc

This commit was SVN r31246.
2014-03-27 12:23:21 +00:00
Alina Sklarevich
5cbf085dc2 mtl mxm: silent a warning.
in ompi_mtl_mxm_add_procs, define the ep_index variable only
for an older version of mxm.

submitted by Alina, reviewed by Mike.
cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31245.
2014-03-27 08:39:51 +00:00
Nathan Hjelm
0cccb2fb59 coll/ml: reduce noise from coll/ml error messages
The error doesn't prevent the user from running so there is no reason
to display it unless the user requested it (through coll_ml_verbose).

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31242.
2014-03-26 22:50:06 +00:00
Nathan Hjelm
b9da3ef462 btl/vader: actually set the correct send size in all cases
Fix a one line bug when dealing with non-contiguous sends in prepare_src. Bug was
identified by the intel test suite.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31232.
2014-03-26 21:50:07 +00:00
Nathan Hjelm
fc941edaf8 osc/base: adjust the logic in ompi_osc_base_sndrcv_op to adjust for
the case fix in ompi_osc_base_process_op in r31204.

There are two cases that needed to be handled:

 - The target is a simple datatype (contiguous block of a primitive
   type) but the origin is not. In this case we still need to pack
   the origin data but we can not rely on the convertor to do the
   unpack (see r31204).

 - Both the origin and target datatypes are simple datatypes. In this
   case we can use ompi_op_reduce to do the accumulation without having
   to pack the origin data.

cmr=v1.8:ticket=trac:4449

This commit was SVN r31231.

The following SVN revision numbers were found above:
  r31204 --> open-mpi/ompi@949abe45cd

The following Trac tickets were found above:
  Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
2014-03-26 17:07:29 +00:00
Nathan Hjelm
5400e21688 btl/vader: unlink the shared memory segment when finished
cmr=v1.8:reviewer=jsquyres

This commit was SVN r31230.
2014-03-26 16:25:02 +00:00
Nathan Hjelm
925af4706c osc/sm: fix bugs in window initialization and finalization
Fixed two bugs:

 - Use module->comm NOT comm to get the CID for the shared memory backing
   file. This fixes the case where there are multiple shared memory windows
   at the same time.

 - Remember to unlink the shared memory backing file.

Refs trac:4438

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31227.

The following Trac tickets were found above:
  Ticket 4438 --> https://svn.open-mpi.org/trac/ompi/ticket/4438
2014-03-26 15:52:51 +00:00
Nathan Hjelm
020f011552 osc/rdma: fix bugs in lock_all and flush_all
This commit fixes two bugs:

 - We were not correctly setting the lock type in the outstanding lock
   for lock_all. This caused undefined behavior.

 - flush_all was incorrectly checking for comm size - 1 lock acks but
   comm size flush acks. This is the reverse of what was intended.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31226.
2014-03-25 23:39:43 +00:00
Jeff Squyres
5a09ee5d8c usnic: drop unknown connection checker packets without erroring out
In most cases, bad messages received by the connectivty checker are
just dropped.  However, in one specific code path, a bad packet caused
an abort.  Doh!

This commit does two things:

1. Improve verbose messages for all these cases
1. Simply drop incoming messages that cannot be identified as ACKs or PINGs

Submitted by Jeff Squyres, reviewed by Dave Goodell.

cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31225.
2014-03-25 21:05:20 +00:00
Nathan Hjelm
0d703759f6 osc/rdma: fix possible error when encountering accumulate lock contention
It is possible to get into a situation where a small accumulate operation
can not be completed because a large accumulate operation holds the lock.
In this case we may return from wait/flush/etc before the operation is
complete. To handle this case increment the expected incoming fragment
count when queuing an accumulate operation and increment the incoming
fragment count after processing the accumulate operation.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31224.
2014-03-25 21:00:43 +00:00
Nathan Hjelm
3df85b47e9 osc/rdma: quiet warning in r31197
cmr=v1.8:ticket=trac:4441

This commit was SVN r31223.

The following SVN revision numbers were found above:
  r31197 --> open-mpi/ompi@0ed44f2fdb

The following Trac tickets were found above:
  Ticket 4441 --> https://svn.open-mpi.org/trac/ompi/ticket/4441
2014-03-25 21:00:36 +00:00
Nathan Hjelm
20af8339e6 osc/base: add support for datatypes that are a contiguous combination
of the primitive datatype

In this case we can not use the convertor to run the accumulate operation
since the datatype is a more or less a primitive type.

cmr=v1.8:ticket=trac:4449

This commit was SVN r31222.

The following Trac tickets were found above:
  Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
2014-03-25 21:00:26 +00:00
Nathan Hjelm
d681eb4655 osc/rdma: fix warnings introduced by r31204
cmr=v1.8:ticket=trac:4449

This commit was SVN r31221.

The following SVN revision numbers were found above:
  r31204 --> open-mpi/ompi@949abe45cd

The following Trac tickets were found above:
  Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
2014-03-25 21:00:19 +00:00
Nathan Hjelm
949abe45cd osc: fix datatype related issues in the one-sided code
This commit fixes two issues:

 - osc/rdma: The target side of an accumulate was using the target datatype
   in the receive to the packed buffer. This was conflicting with the way
   the reduction is done into the target buffer. Changed the receive to use
   the primitive datatype.

 - osc/base: The copy table was completely wrong. Fixed the table to match
   the underlying datatypes (which are opal not ompi datatypes).

 - osc/base: There is a problem using the optimized description. Fall back
   on using the non-optimized description until we can understand what is
   going wrong.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31204.
2014-03-25 15:28:48 +00:00
Nathan Hjelm
bc55276844 osc/rdma: fix bug in the active message code that could cause erroneous
results

The code to handle completion messages did not correctly increment the
number of expected messages. This could cause wait to return before all
incoming messages are complete.

I also added a check to ensure that start returns an error if we are in
a passive access epoch.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31203.
2014-03-25 15:28:36 +00:00
Jeff Squyres
8c2b9658ce Commit upstream ROMIO fix: dbad7873926a75adbff0fd0140ae321412f70d66
ROMIO code assumes all processes will use the same ROMIO driver.  we
were not reaching the "find a common file system" logic when NFS was
enabled, everyone stat-ed the file system without errors, but some
processees found a different file system (like if some processes are
writing to NFS and others to UFS)

See discussion beginning here:
http://lists.mpich.org/pipermail/discuss/2014-March/002403.html

Tested-by: Jeff Squyres <jsquyres@cisco.com>

Submitted by Rob Lathan, reviewed by Jeff Squyres

cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31201.
2014-03-25 14:50:07 +00:00
Alina Sklarevich
947233f539 common/verbs: added a call to ompi_ibv_free_device_list.
the ompi_common_verbs_find_ports function had a call to
ompi_ibv_get_device_list, but not to ompi_ibv_free_device_list.

fixed by Alina, reviewed by Vasily/Mike.
cmr=v1.8:reviewer=ompi-rm1.8 

This commit was SVN r31200.
2014-03-25 14:41:09 +00:00
Mike Dubman
b8dddabcfb add config section for upcoming ConnectiX4 card
cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31199.
2014-03-25 14:27:09 +00:00
Oscar Vega-Gisbert
cc511d0efc Avoid use Status member in Comm, Message and File.
This commit was SVN r31198.
2014-03-24 22:28:30 +00:00
Nathan Hjelm
0ed44f2fdb osc/rdma: add support for datatypes with large descriptions
This commit adds large datatype description support to the osc/rdma
component. Support is provided by an additional send/recv of the datatype
description if the description does not fit in an eager buffer. The
code is designed to require minimal new code and not for speed. We
consider this code path to be a slow path.

Refs trac:1905

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31197.

The following Trac tickets were found above:
  Ticket 1905 --> https://svn.open-mpi.org/trac/ompi/ticket/1905
2014-03-24 18:57:29 +00:00
Vasily Filipov
c424ad94f3 BTL/OPENIB: remove AC_RUN_IFELSE from configure and check AF_IB support by lib rdmacm during component_init.
This commit was SVN r31194.
2014-03-24 13:36:04 +00:00
Nathan Hjelm
15a8c9d7b8 coll/ml: addendum to r31189. increment the bcol_index
cmr=v1.8:ticket=trac:4436

This commit was SVN r31193.

The following SVN revision numbers were found above:
  r31189 --> open-mpi/ompi@c7d830f4b9

The following Trac tickets were found above:
  Ticket 4436 --> https://svn.open-mpi.org/trac/ompi/ticket/4436
2014-03-21 22:03:56 +00:00
Nathan Hjelm
128cfe0a39 coll/ml: cleanup tabs, indentation, and trailing whitespace in
bcol_basesmuma_bcast.c

This commit was SVN r31192.
2014-03-21 21:54:48 +00:00
Nathan Hjelm
d241f95af1 squash into previous. fix coll ml bcast
This commit was SVN r31191.
2014-03-21 21:54:41 +00:00
Nathan Hjelm
6740813c27 bcol/basesmuma: fix selection of coll/ml when only using local procs
When we are only using local ranks basesmuma needs to provide an allreduce
function for both large and small message or else the coll/ml selection
logic will fail. In the future this logic should probably be updated to
just disable allreduce in coll/ml instead of disabling coll/ml. For now
it should be correct to say the basesmuma allgather works for larger
messages.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31190.
2014-03-21 21:54:35 +00:00
Nathan Hjelm
c7d830f4b9 coll/ml: improve the buffer size calculation and ensure the bcol_index in
a hierarchy actually matches a bcol that is in use.

There was a bug in one of the paths to calculate the ml buffer size. I fixed
the bug and squashed all the paths together to avoid further issues (the
result was correct in another path that calculated the same value).

Additionally, the i_hier was being used as the bcol_index. This is not
correct in a couple of cases so I added a variable to keep track of the
real bcol_index.

cmr=v1.8:reviewer=pasha

This commit was SVN r31189.
2014-03-21 21:54:28 +00:00
Nathan Hjelm
f1dd589092 coll/ml: there is no reason not to enable coll/ml when a process in not
bound.

This case is correctly handled by coll/ml so remove the check that diables
coll/ml in the not bound case.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31188.
2014-03-21 21:54:21 +00:00
Nathan Hjelm
08bbdcbf61 coll/ml: fix leaks in coll/ml resources
This patch fixes two leaks:

 - Fix typo in fallback collective code that caused coll/ml to retain
   the ibcast module twice but only release it once. One of those ibcast
   saves was supposed to be bcast.

 - Do not check for module initialization in the module destructor. It
   is possible to destruct a module that is partially setup.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31187.
2014-03-21 21:54:14 +00:00
Matthias Jurenz
c49a5d1e12 Changes to VT:
Disabled support for CUPTI API version > 4 (CUDA 6) due to API mismatch

This commit was SVN r31186.
2014-03-21 09:16:46 +00:00
Nathan Hjelm
20fe3804b0 Fix comment in r31146
cmr=v1.7.5:ticket=trac:4425

This commit was SVN r31148.

The following SVN revision numbers were found above:
  r31146 --> open-mpi/ompi@dca2f0027e

The following Trac tickets were found above:
  Ticket 4425 --> https://svn.open-mpi.org/trac/ompi/ticket/4425
2014-03-19 16:09:20 +00:00
Jeff Squyres
22e6417d9e Return non-SUCCESS error codes from attribute copy functions.
Without this, an attribute copy function could return non-success, but
it would not be propagated upwards.  This caused the intel
MPI_Keyval3_* tests to fail.

cmr=v1.8:reviewer=hjelmn

This commit was SVN r31147.
2014-03-19 15:45:38 +00:00
Nathan Hjelm
dca2f0027e Protect against 0-byte allocations in carte_create and cart_sub.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31146.
2014-03-19 15:38:12 +00:00
Jeff Squyres
7adb137409 Fix segv in MPI_Graph_create_undef_c Intel test.
When you call MPI_Graph_create with a old_comm of size N, and pass
nnodes=(N=1), then the Nth proc is supposed to get MPI_COMM_NULL out.
The code in this base function didn't properly handle the proc(s) that
are supposed to get MPI_COMM_NULL out.

cmr=v1.7.5:reviewer=hjelmn

This commit was SVN r31145.
2014-03-19 15:16:28 +00:00
Jeff Squyres
c6994adf66 Add missing show_help message.
Found via Cisco MTT (i.e., it complained of not being able to find
this show_help message).

cmr=v1.8:reviewer=dgoodell

This commit was SVN r31144.
2014-03-19 14:09:19 +00:00
Matthias Jurenz
cc3dd86121 Changes to VT:
Fixed compiler warning with the Clang compiler (no previous prototype for function '__fprintf_chk')

This commit was SVN r31143.
2014-03-19 13:39:26 +00:00
Nathan Hjelm
e764d3bebc coll/ml: really remove the asserts in the barrier setup
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31136.
2014-03-18 22:04:50 +00:00
Nathan Hjelm
e030443d45 coll/ml: further improve the hierarchy discovery to handle the case where a
sbgp module fails to group any processes on any nodes.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31131.
2014-03-18 21:26:24 +00:00
Nathan Hjelm
8b2d723fd4 coll/ml: fix valgrind warning about reading uninitialed value
This isn't causing any errors that I know about but it does fix an
annoying valgrind warning. Simple fix, no review required.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31130.
2014-03-18 21:26:17 +00:00
Nathan Hjelm
d9c8bf3785 coll/ml: move error messages to verbose output
There are situations where coll/ml does not initialize properly. These will
eventually need to be fixed but in the meantime it is better to not always
print an error message because the collective framework can still fall back
on another collective module. This commit reduces the verbose output.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31129.
2014-03-18 21:26:10 +00:00
Nathan Hjelm
97d7315dd2 coll/ml: do not assert if a barrier algorithm is not available
It is usually not a good idea to assert when something is not implemented
or something goes wrong. Replace asserts with debug output and return.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31128.
2014-03-18 21:26:04 +00:00
Nathan Hjelm
bddd6542b7 sbgp/basesmsocket: do not recalculate process locality
The necessary information is stored in the proc object. There is no need
to allgather the local process data to determine if another rank is on
the same socket.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31127.
2014-03-18 21:25:57 +00:00
Nathan Hjelm
22f64bb62b Addendum to r31096. Up basesmuma algorithm limits to 1M.
After discussion with Manju we decided to update these the process count
limits of the shared memory collectives to an arbitrarily large number.

cmr=v1.7.5:ticket=trac:4405

This commit was SVN r31126.

The following SVN revision numbers were found above:
  r31096 --> open-mpi/ompi@3f469d08e7

The following Trac tickets were found above:
  Ticket 4405 --> https://svn.open-mpi.org/trac/ompi/ticket/4405
2014-03-18 21:25:49 +00:00
Ralph Castain
543271b9de Set the locality prior to calling add_procs so bozos like Jeff get it at the right time
Refs trac:4411

This commit was SVN r31119.

The following Trac tickets were found above:
  Ticket 4411 --> https://svn.open-mpi.org/trac/ompi/ticket/4411
2014-03-18 17:57:27 +00:00
Ralph Castain
3323c47ab4 Ensure all procs set locality for all remote procs in the multi-way intercomm_create problem
Refs trac:4411

This commit was SVN r31118.

The following Trac tickets were found above:
  Ticket 4411 --> https://svn.open-mpi.org/trac/ompi/ticket/4411
2014-03-18 16:55:15 +00:00
Jeff Squyres
7933de4928 Fix segv when ibv_create_ah fails.
* Ensure that all endpoints[x] values are initialized to NULL
* If ibv_create_ah fails, remove each endpoint from the
  module->all_endpoints list so that the endpoint can be destructed
  properly.

Submitted by Jeff Squyres, reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31111.
2014-03-18 15:52:55 +00:00
Ralph Castain
554da83865 Set the locality for remote procs even after a comm_spawn. Ensure we store our own local cpuset upon launch so it will be shared during comm_join.
This provides full locality - i.e., not just node-level, but all the way down to whatever common binding level exists between the procs.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31106.
2014-03-18 14:51:07 +00:00
Jeff Squyres
5efd961149 Remove unnecessary \n's in ML_VERBOSE and ML_ERROR.
Also fixed spelling: IS_NOT_RECHABLE -> IS_NOT_REACHABLE.

Also mark a few places where opal_show_help() should have been used;
Manju will take care of these.

This commit was SVN r31104.
2014-03-18 12:24:32 +00:00
Nathan Hjelm
3f469d08e7 coll/ml: increase the number of allowed processes in a local reduce and
add checks to see if the bcol module can support allreduce.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31096.
2014-03-17 23:10:19 +00:00
Pavel Shamis
fba1edbf14 Removing ml include from bcol_ptpcoll.h.
It is not really required.

This commit was SVN r31095.
2014-03-17 22:58:40 +00:00
Nathan Hjelm
f92579dce5 coll/ml: fix a case not correctly handled by r31071
In r31071 I modified the logic to not increment the hierarchy level if
no processes were selected by that sbgp. That fixed a problem seen on
systems where we don't support process binding. The problem is there
is a case where we actually did select processes yet the number of
selected processes is 0. We need to increment the hierarchy in this case
as well.

This should fix the segmentation fault found by recent MTT runs. Once
this is committed to 1.7.5 remove the .ompi_ignore's from coll/ml and
bcol/ptpcoll. Tested with ompi-tests/ibm.

cmr=v1.7.5:reviewer=rhc

This commit was SVN r31081.

The following SVN revision numbers were found above:
  r31071 --> open-mpi/ompi@1911d97044
2014-03-15 22:37:28 +00:00
Jeff Squyres
34d92315ae Remove extraneous "while(0)".
Oops.

cmr=v1.7.5:ticket=trac:4395

This commit was SVN r31075.

The following Trac tickets were found above:
  Ticket 4395 --> https://svn.open-mpi.org/trac/ompi/ticket/4395
2014-03-14 20:41:54 +00:00
Jeff Squyres
06a58affca Fix minor hwloc memory leak in sbgp/basesmsocket
cmr=v1.8:reviewer=hjelmn

This commit was SVN r31074.
2014-03-14 20:40:12 +00:00
Jeff Squyres
036db91f3d For the love of all that is holy, do not put 1MB arrays on the stack.
This was causing JVMs to run out of stack space, and all manner of
badness ensued.

Instead, use the heap -- that's what it's there for.

cmr=v1.7.5:reviewer=rhc:subject=make coll/ml use the heap for large debug array

This commit was SVN r31073.
2014-03-14 20:39:39 +00:00
Rolf vandeVaart
ce5274652f Add some additional verbose output per this RFC
http://www.open-mpi.org/community/lists/devel/2014/03/14282.php
Reviewed by Jeff Squyres

This commit was SVN r31072.
2014-03-14 20:17:47 +00:00
Nathan Hjelm
1911d97044 coll/ml: fix assertion failure that occurs when level 0 of the hierarchy
fails to select any processes on any nodes.

Also modified basesmsocket to only print debugging info to the framework
output.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31071.
2014-03-14 19:39:00 +00:00
Jeff Squyres
24020ef1e3 Refs trac:4372: 3rd and hopefully final addendum to Fortran API fixes for the RMA functions
These parameters should not be marked as INTENT(OUT) (they aren't in
the MPI-3 standard).

This commit was SVN r31056.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 22:55:57 +00:00
Ralph Castain
cd72aa9b66 Per Dave's comment, bzero has portability issues and little advantage over a simple memset. So let's use the safer solution.
cmr=v1.7.5:reviewer=dgoodell:subject=replace bzero with memset

This commit was SVN r31055.
2014-03-12 22:55:47 +00:00
Nathan Hjelm
e70809e169 osc/rdma: fix the spelling of incoming
cmr=v1.7.5:ticket=trac:4379

This commit was SVN r31050.

The following Trac tickets were found above:
  Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
2014-03-12 21:43:23 +00:00
Jeff Squyres
ccff41383c Refs trac:4372: Another addendum to Fortran API fixes for the RMA functions
* Several parameters should not be marked as INTENT(OUT) (they aren't in
  the MPI-3 standard).
* Added missing PMPI F08 OMPI interfaces

This commit was SVN r31049.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 20:22:15 +00:00
Jeff Squyres
8a5a832085 Refs trac:4372: Addendum to Fortran API fixes for the RMA functions
These parameters should not be marked as INTENT(OUT) (they aren't in
the MPI-3 standard).

This commit was SVN r31048.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 19:59:04 +00:00
Nathan Hjelm
d0009938a6 osc/rdma: tighten semantics a bit more
It is not valid to call flush outside a passive target epoch nor is
it valid to call lock/lock_all when no_locks is set. In the former
we were just semantically incorrect and the later would crash and
burn.

cmr=v1.7.5:ticket=trac:4382

This commit was SVN r31046.

The following Trac tickets were found above:
  Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
2014-03-12 18:53:47 +00:00
Nathan Hjelm
1fc9a55d08 osc/rdma: do not use MPI_SOURCE to determine the peer in an send operation.
This fixes a bug in r31029 which removes the use of the pml base request
(also not a good way since cm doesn't use the base request). We now allocate
a data structure (ugh) to determine the needed information. Tested with
mtt/onesided.

cmr=v1.7.5:ticket=trac:4379

This commit was SVN r31044.

The following SVN revision numbers were found above:
  r31029 --> open-mpi/ompi@29e00f9161

The following Trac tickets were found above:
  Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
2014-03-12 17:14:11 +00:00
Nathan Hjelm
6648a46963 rma: fix semantic errors in osc/rdma and MPI_Win_fence
- Return an error if the caller specified both MPI_MODE_NOPRECEDE and
   MPI_MODE_NOSUCCEED to MPI_Win_fence.

 - Return an error if the caller attempts to enter an active target
   epoch while already in a passive target epoch.

 - End an active target epoch if MPI_Win_fence is called with
   MPI_MODE_NOSUCCEED.

cmr=v1.7.5:ticket=trac:4382

This commit was SVN r31043.

The following Trac tickets were found above:
  Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
2014-03-12 17:14:03 +00:00
Nathan Hjelm
51916c5b41 osc/rdma: now that the access epoch is not open after MPI_Win_create* we
need to enable the access epoch in MPI_Win_fence.

I missed this change when I fixed the semantics of MPI_Win_create. With
this commit our one-sided MTT runs are now running clean.

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r31041.
2014-03-12 16:11:15 +00:00
Jeff Squyres
3120ec2b96 Also bump the Fortran MPI version constants to 3.0
cmr=v1.7.5:ticket=trac:4371

This commit was SVN r31038.

The following Trac tickets were found above:
  Ticket 4371 --> https://svn.open-mpi.org/trac/ompi/ticket/4371
2014-03-12 16:06:15 +00:00
Nathan Hjelm
c173344141 Add new MPI-3.1 tools interface functions.
See https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/377

This ticket adds the following functions to the standard:

 - MPI_T_cvar_get_index, MPI_T_pvar_get_index, and MPI_T_category_get_index

The ticket has passed and the functions are part of MPI-3.1 that will
be released sometime later this year. In Open MPI the functions expose
existing internal functionality so they are low-risk to add to 1.8.0. I
will leave it up to Ralph whether he wants to accept these into 1.8.

cmr=v1.8:reviewer=rhc

This commit was SVN r31037.
2014-03-12 16:03:39 +00:00
Jeff Squyres
c6fb1b51b1 Remove "medium" RMA interfaces
We no longer specify interfaces with choice buffers in the TKR "mpi"
module implementation -- MPI-3 prohibits it (see r30169 and r30170 for
more details).

cmr=v1.7.5:ticket=trac:4372

This commit was SVN r31033.

The following SVN revision numbers were found above:
  r30169 --> open-mpi/ompi@759ee33fd4
  r30170 --> open-mpi/ompi@776f6144af

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 15:51:42 +00:00
Jeff Squyres
53248a90f3 Add missing files in Makefile.am listing
cmr=v1.7.5:ticket=trac:4372

This commit was SVN r31032.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 15:48:55 +00:00
Nathan Hjelm
61f30d992a coll/ml: reduce has some issues when using non-contiguous datatypes. until
these issues are resolved disable coll/ml reduce.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31030.
2014-03-12 14:39:16 +00:00
Nathan Hjelm
29e00f9161 osc/rdma: fix issues with mpi_leave_pinned when using rdma capable btls
It seems we can't release accumulate buffers in completion callbacks
because the btls don't release registration resources until after the
callback has fired. The fix is to keep track of the unused buffers and
free them later. This should resolve issues when running IMB-EXT and
IMB-RMA.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31029.
2014-03-12 14:39:03 +00:00
Jeff Squyres
b8cd2878e7 Fix more pragma argument names to match dummy argument names.
Issue found by Absoft MTT runs.

cmr=v1.7.5:ticket=trac:4372

This commit was SVN r31028.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 14:30:51 +00:00
Jeff Squyres
f3b5670dce Fix pragma argument names to match dummy argument names.
Issue found by Absoft MTT runs.

cmr=v1.7.5:ticket=trac:4372

This commit was SVN r31027.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 14:27:43 +00:00
Jeff Squyres
3523e315b8 Fix ompi function in Fortran RMA bindings
cmr=v1.7.5:ticket=trac:4372

This commit was SVN r31026.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 14:12:10 +00:00
Jeff Squyres
da87b506bd Remove warnings identified by clang 3.4
* Remove unused static functions
 * Remove unused static variables

cmr=v1.8:reviewer=hjelmn

This commit was SVN r31023.
2014-03-12 13:17:54 +00:00
Nathan Hjelm
0c493b4372 Fix C&P errors in r31012
cmr=v1.7.5:ticket=trac:4373

This commit was SVN r31014.

The following SVN revision numbers were found above:
  r31012 --> open-mpi/ompi@d5d2d5c4d8

The following Trac tickets were found above:
  Ticket 4373 --> https://svn.open-mpi.org/trac/ompi/ticket/4373
2014-03-12 00:21:23 +00:00
Ralph Castain
647fdaaa22 Fix an MPI-3 compliance buglet - MPI_Info_set should now use "const" qualifiers on the key and value parameters
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31013.
2014-03-11 23:50:09 +00:00
Nathan Hjelm
d5d2d5c4d8 Add an internal ompi error code for RMA sync errors.
Dave Goodell correctly pointed out that it is unusual to return MPI
error classes from internal ompi functions. Correct this in the RMA
case by adding an internal error code to match MPI_ERR_RMA_SYNC.

This does change OMPI_ERR_MAX. I don't think this will cause any
problems with ABI.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31012.
2014-03-11 23:45:23 +00:00
Nathan Hjelm
c4de1aa1ce osc: add fortran bindings for new RMA function
I have only checked that these bindings compile without warnings. They
appear to work with both intel's compilers and gfortran.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31010.
2014-03-11 23:02:27 +00:00
Nathan Hjelm
5be13abcc0 Open MPI is MPI-3 conformant as of 1.7.5. Update mpi.h to reflect this.
cmr=v1.7.5:reviewer=rhc

This commit was SVN r31009.
2014-03-11 23:01:58 +00:00
Nathan Hjelm
b6a30e293a osc/rdma: check for incorrect use of the active target interface
This commit resolves a number of crashed discovered my the onesided
tests in MTT. The functions in question were operating on the assumption
the user was calling RMA functions correctly.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31008.
2014-03-11 23:01:51 +00:00
Nathan Hjelm
e9d60b9e2f osc/rdma: restrict local optimizations to occur only during an access epoch.
cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r31007.
2014-03-11 23:01:42 +00:00
Ralph Castain
9c66c4f439 Correctly implement --disable-oshmem and --without-orte so we don't build the disabled section of code. Fix a bunch of code rot in the PMI rte component, and add several missing headers when building --without-orte.
NOTE: I transferred the oshmem-disabled-by-default from the 1.7 branch to the trunk to minimize future disruption if/when we change that option.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31006.
2014-03-11 22:02:40 +00:00
Ralph Castain
ebd8e545c0 Silence warning
cmr=v1.8:reviewer=hjelmn

This commit was SVN r31005.
2014-03-11 21:59:17 +00:00
Mike Dubman
a14dda491e OSHMEM: various fixes
- -check-shmem-params is OFF by default. It checks OSHMEM API params and will abort on bad input
- hcoll do not save fallback coll pointers for unsupported collectives.

fixed by Val, Roman, reviewed by Miked/Igor

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30995.
2014-03-11 17:27:33 +00:00
Ralph Castain
e4efd5675f Per telecon, add comment indicating this needs to be fixed
Refs trac:4354

This commit was SVN r30991.

The following Trac tickets were found above:
  Ticket 4354 --> https://svn.open-mpi.org/trac/ompi/ticket/4354
2014-03-11 15:57:11 +00:00
Nathan Hjelm
51c5daf1b4 bcol/basesmuma: initialize module with all 0's to fix segmentation faults
in module destructor.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30978.
2014-03-10 20:42:47 +00:00
Nathan Hjelm
cbb531ed13 osc/rdma: use OPAL_ALIGN macro
cmr=v1.7.5:ticket=trac:4357

This commit was SVN r30975.

The following Trac tickets were found above:
  Ticket 4357 --> https://svn.open-mpi.org/trac/ompi/ticket/4357
2014-03-10 18:57:20 +00:00
Nathan Hjelm
5df8cd75a9 osc/rdma: ensure fragment headers and the packed datatype are 8-byte aligned.
The datatype unpacking code assumes that the packed datatype buffer has the
same alignment as an OPAL_PTRDIFF_TYPE. This was not enforced by the rdma
one-sided component. I changed the ordering and sized of various osc/rdma
headers to ensure their sizes are a multiple of 8-bytes and modified the
fragment allocation call to ensure all headers are 8-byte aligned. While
not the cleanest way to handle this situation it should resolve the issue.

Fixes trac:4315

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30974.

The following Trac tickets were found above:
  Ticket 4315 --> https://svn.open-mpi.org/trac/ompi/ticket/4315
2014-03-10 18:11:22 +00:00
Nathan Hjelm
85515f2587 osc/rdma: silence warning
cmr=v1.7.5:ticket=trac:4355

This commit was SVN r30970.

The following Trac tickets were found above:
  Ticket 4355 --> https://svn.open-mpi.org/trac/ompi/ticket/4355
2014-03-10 16:11:25 +00:00
Ralph Castain
081669b440 When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it
cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings

This commit was SVN r30968.
2014-03-10 15:53:07 +00:00
Yossi Etigin
b04a2339c5 Fix segmentation fault when osc_rdma is used with pml_cm: osc_rdma
assumes the send request is derived from mca_pml_base_send_request_t,
but this is not true for pml cm, so we end up freeing invalid pointer.
 We cannot take the data pointer from the pml send request, so we pass 
the allocated buffer pointer in req_complete_cb_data, and put the 
osc_rdma_module pointer in that buffer as well.
 Previously, osc_pt2pt was used with pml_cm which didn't have this 
problem.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30967.
2014-03-10 15:21:37 +00:00
Yossi Etigin
280e96c99a In mtl_mxm, don't disconnect from a proc with refcount > 1.
This will keep the connection until mxm endpoint is destroyed.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30966.
2014-03-09 08:35:44 +00:00
Oscar Vega-Gisbert
4d7be5b4a4 Java: send MPI_Op handle as parameter to C side
This commit was SVN r30965.
2014-03-08 20:29:50 +00:00
Nathan Hjelm
579a2d10cc bcol/ptpcoll: initialize all pointers in the module to NULL to avoid possible
problems when the module is being destructed.

References #4331

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30964.
2014-03-07 21:16:20 +00:00
Nathan Hjelm
da2a68f669 coll/ml: fix bcast buffer size calculation
cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30963.
2014-03-07 21:00:08 +00:00
Nathan Hjelm
0af741810c coll/ml: do not access group proc pointers directly. use ompi_comm_peer_lookup instead.
Resolves an issue seen with --enable-sparse-groups.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30945.
2014-03-05 22:57:21 +00:00
Jeff Squyres
554303af08 Two minor usnic fixes
* clang warning stomp
* memory barrier for volatile variable use

These can go to 1.7.5 or can slip to v1.8 -- RM decision.

Submitted by Jeff Squyres, reviewed by Dave Goodell

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30944.
2014-03-05 21:21:53 +00:00
Jeff Squyres
fbde50e7cd Fix ibv_port_query() usnic extension usage
* Older versions of libusnic_verbs actually return 0 when querying for
  an unknown port.  So also check for a magic ID in the returned data
  to *really* know if the usnic extensions are supported.
* Use a union (in the common_verbs area) and memcpy (in the btl) to
  avoid undefined C type aliasing behavior.
* Ensure to memset the function table to 0 if the usnic extensions
  are not supported.

Submitted by Jeff Squyres, reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30935.
2014-03-04 22:11:47 +00:00
Ralph Castain
0461e54875 Add missing headers to tarball
This commit was SVN r30932.
2014-03-04 16:59:15 +00:00
Nathan Hjelm
5a4037df4f osc/rdma: fix typo in rdma osc component.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30931.
2014-03-04 16:57:56 +00:00
Nathan Hjelm
cb6670b340 bcol/basesmuma: use framework output for error message and fix a rounding error.
cmr=v1.7.5:reviewer=pasha

This commit was SVN r30929.
2014-03-04 16:55:57 +00:00
Adrian Reber
e5bef82ee1 OPAL_ENABLE_FT_CR: remove compiler warnings
When compiling --with-ft there are a few compiler warnings about
unused variables. This patch fixes those compiler warnings.

This commit was SVN r30927.
2014-03-04 15:28:07 +00:00
Rolf vandeVaart
c2ae29d860 Adjust priorities in smcuda BTL so it is used when CUDA-aware compiled in.
cmr=v1.7.5:reviewer=hjelmn

This commit was SVN r30925.
2014-03-04 14:44:44 +00:00
Vasily Filipov
5acb96d11b BTL/OPENIB: always define BTL_OPENIB_RDMACM_IB_ADDR to 0 or 1.
This commit was SVN r30923.
2014-03-04 08:54:17 +00:00
Jeff Squyres
6710c2ef3f usnic: remove unnecessary header union
Realistically, the usnic BTL doesn't need to know anything about the
underlying transport except for its header length (so that it knows
where the payload begins in a received buffer).  So remove the use of
the specific transport prefix union and just rely on the usnic verbs
extension to tell us what the header length is if we're using the
usNIC/UDP transport, or sizeof(struct ibv_grh) if we're using usNIC/L2
transport.

This commit was SVN r30914.
2014-03-03 21:33:12 +00:00
Jeff Squyres
05af83d5d8 common_verbs: Remove usnic magic probe test
Check the IBV_TRANSPORT_* values.  In the case of IBV_TRANSPORT_IWARP,
there's an ambiguity and we need to also check to see whether the
usnic verbs externsion probe exists.

This commit was SVN r30913.
2014-03-03 21:32:44 +00:00
Jeff Squyres
d61765cb2a usnic: use the new usnic verbs extensions
If they exist, call the usnic verbs extensions to both enable UDP
support and get the UD receiver header length that should be used
(rather than assume 40/struct GRH).

This commit was SVN r30912.
2014-03-03 21:31:42 +00:00
Nathan Hjelm
9e92c5be53 osc/sm: check for pthread_condattr_setpshared and pthread_mutexattr_setpshared. fall
back on barrier if either function doesn't exist.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30911.
2014-03-03 17:09:09 +00:00
Nathan Hjelm
dc3d4ffbf3 osc/sm: do not use gcc specific calls
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30910.
2014-03-03 16:47:29 +00:00
Jeff Squyres
baf21ab446 Fix DESTDIR installs for javadocs
Thanks to Orion Poplawski for noticing the issue
(http://www.open-mpi.org/community/lists/devel/2014/03/14263.php).

This commit was SVN r30908.
2014-03-03 16:26:18 +00:00
Mike Dubman
05ee929832 OMPI-MXM: handle multiple calls to add_procs() in MXM
- now add_procs can be called more than once (during MPI_INIT and Inter_Comm_Create)
- adjust MXM to this reality

fixed by Alina, reviewed by Yossi/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30907.
2014-03-03 13:50:37 +00:00
Vasily Filipov
f36d50d494 OPENIB BTL/CONNECT: warning fixes caused by r30875.
This commit was SVN r30905.

The following SVN revision numbers were found above:
  r30875 --> open-mpi/ompi@f2014b96e7
2014-03-03 06:41:46 +00:00
Ralph Castain
85770ff782 Remove stale file reference
Refs trac:4322

This commit was SVN r30904.

The following Trac tickets were found above:
  Ticket 4322 --> https://svn.open-mpi.org/trac/ompi/ticket/4322
2014-03-02 03:48:56 +00:00
Oscar Vega-Gisbert
3a31869a58 Authorship of Java bindings
This commit was SVN r30901.
2014-03-01 20:18:05 +00:00
Oscar Vega-Gisbert
4def6c5ae2 Java: avoid setting Status attributes in the C side.
Make File implementation consistent with the rest of the API.

This commit was SVN r30900.
2014-03-01 19:54:36 +00:00
Jeff Squyres
1bda304a35 Mark the ompi_rte_abort() function as "no return"
This allows compilers to know that the code path(s) where
ompi_rte_abort() is invoked won't return (and therefore won't warn in
certain cases).

cmr=v1.8:reviewer=rhc

This commit was SVN r30891.
2014-02-28 17:45:36 +00:00
Jeff Squyres
f8dbba78a7 Send the BTL-passed message to ompi_rte_abort.
cmr=v1.8:reviewer=rolfv

This commit was SVN r30889.
2014-02-28 16:20:54 +00:00
Tom Naughton
8793560bde + fix abstraction violation (ORTE_process_info => OMPI_process_info)
This commit was SVN r30883.
2014-02-27 23:59:46 +00:00
Jeff Squyres
e819b5a34a Remove the vendor_ids parsing.
We don't use this functionality any more; we use the transport_type
and device name to identify usnic devices.  It's slightly easier
because we can transport_type+name from ibv_device_open() and don't
have to do an additional ibv_query_device() to get its attributes.

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30882.
2014-02-27 21:47:01 +00:00
Jeff Squyres
3cbdf33b88 This is what r30852 should have been: Consolidate into a single, outter loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30879.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442
  r30852 --> open-mpi/ompi@4e282a3295

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-27 17:19:50 +00:00
Jeff Squyres
45810f0efb Revert r30852: the wrong version of this patch got committed to SVN.
This commit was SVN r30878.

The following SVN revision numbers were found above:
  r30852 --> open-mpi/ompi@4e282a3295
2014-02-27 15:02:15 +00:00
Vasily Filipov
d702307521 OPENIB BTL/CONNECT: replace wrong rdma_freeaddrinfo call in rdmacm_component_query func.
This commit was SVN r30876.
2014-02-27 11:52:10 +00:00
Vasily Filipov
f2014b96e7 OPENIB BTL/CONNECT: Add support for AF_IB addressing in rdmacm.
This commit was SVN r30875.
2014-02-27 11:29:47 +00:00
Ralph Castain
ce26b096b4 Prevent failover to direct_modex if key isn't found unless direct_modex was enabled
Refs trac:4258

This commit was SVN r30865.

The following Trac tickets were found above:
  Ticket 4258 --> https://svn.open-mpi.org/trac/ompi/ticket/4258
2014-02-27 02:04:56 +00:00
Jeff Squyres
7440f21b75 Add usnic connectivity-checking agent service.
Basically: since usnic is a connectionless transport, we do not get
OS-provided services "for free" that connection-oriented transports
get, namely: "hey, I wasn't able to make a connection to peer X", and
"hey, your connection to peer X has died."
    
This connectivity-checker runs in a separate progress thread in the
usnic BTL in local rank 0 on each server.  Upon first send in any
process, the connectivty-checker agent will send some UDP pings to the
peer to ensure that we can reach it.  If we can't, we'll abort the job
with a nice show_help message.
    
There's a lengthy comment in btl_usnic_connectivity.h explains the
scheme and how it works.

Reviewed by Dave Goodell.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30860.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 22:21:25 +00:00
Oscar Vega-Gisbert
f2043776f6 mpi.Comm: some methods must be private.
They were protected because the old Prequest implementation used them.

This commit was SVN r30859.
2014-02-26 21:17:52 +00:00
Nathan Hjelm
dfe4a504e4 udcm: fix race between ack arrival and message send and potential hang in udcm
finalize.

Closes trac:4290

cmr=v1.7.5:reviewer=miked

This commit was SVN r30854.

The following Trac tickets were found above:
  Ticket 4290 --> https://svn.open-mpi.org/trac/ompi/ticket/4290
2014-02-26 15:33:27 +00:00
Nathan Hjelm
30b61a3333 Fix a number of issues in the new one sided code.
- Fix several typos is osc/rdma.

 - Fix a locking issue in osc/sm that was caused by an incorrect
   assumption about the semantics of opal_atomic_add_32.

 - Always unlock the accumulation lock in osc/sm.

 - The base of a processes shared memory window should be NULL if
   the size is zero. Fixed.

cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30853.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-26 15:33:18 +00:00
Jeff Squyres
4e282a3295 Consolidate into a single, outter loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30852.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 11:02:12 +00:00
Dave Goodell
3641500442 usnic: Loop on the ibv_create_ah call.
ibv_create_ah() may need to effect an ARP resolution, which may take
some time.  Rather than blocking in ibv_create_ah(), the usNIC driver
may return NULL and set errno to EAGAIN indicating that we should try
again (i.e., the ARP resolution is proceeding under the covers).

So add a simple loop here to loop over ibv_create_ah() until it
returns non-(NULL+EAGAIN).  A future commit will make this a bit more
efficient.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30850.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:38 +00:00
Dave Goodell
c40f8879c8 usnic: improve interface matching (esp. for UDP)
Prior to this commit we matched local interfaces to remote interfaces in
order to create endpoints in a simplistic way.  If any remote interfaces
were on the same subnet as any of our local interfaces then only local
interfaces would be paired (IP-routed remote interfaces would be
ignored).

This commit introduces a more general scheme which attempts to make the
"best" pairing of local interfaces to remote interfaces.  We now cast
the problem as a graph theory problem known as the "Assignment Problem",
or finding a maximum-cardinality, minimum-weight bipartite matching.  We
solve this problem by reducing the bipartite graph of interface
connectivity to a flow network and then solving for a minimum cost flow.
This is then easily converted into back into a matching on the original
bipartite graph.

In the new scheme, interfaces on the same subnet are preferred over
interfaces requiring intermediate routing hops and higher bandwidth
links are preferred over lower bandwidth links.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30849.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:26 +00:00
Dave Goodell
47148ab3cb usnic: helper routines for rtnetlink route lookups
Querying the OS routing table is important for making decisions about
which local and remote interfaces should be paired into reliable
communication channels.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30848.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:10 +00:00
Dave Goodell
db14c706ce usnic: add graph utility code
This code is intended to support usNIC interface matching functionality.
We currently view that problem as essentially the "Assignment Problem"
(http://en.wikipedia.org/wiki/Assignment_problem), for which there are
many possible solution approaches, including flow-network analysis.  In
the future, we might transition to a more nuanced view of the problem
which would likely also be flow-network based.

To this end, the current code focuses on providing one major algorithm
to the core usnic BTL: `ompi_btl_usnic_solve_bipartite_assignment`.  It
also exposes several typical and necessary functions for constructing,
manipulating, and querying weighted, directed graphs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30847.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:49:54 +00:00
Dave Goodell
5bf969e63b usnic: unit test parse_ifex_str
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30846.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:48:05 +00:00
Dave Goodell
921a29e41f usnic: add simple unit testing infrastructure
This commit adds mechanisms for writing and running unit tests in the
usnic BTL.  The short version of how to run the tests is:

1. Configure with `--enable-ompi-btl-usnic-unit-tests`.  This will cause
   the unit testing code and test runner utility to be built.

2. Run the tests by running `ompi_btl_usnic_run_tests`.

See `README.test` for full details.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30845.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:50 +00:00
Dave Goodell
044a190cac usnic: consolidate orte includes into compat.h
These includes only exist in the Cisco-internal usnic-v1.6 code base,
but they should not exist anywhere except btl_usnic_compat.h in order to
minimize source differences between usnic-v1.6 and v1.7/trunk.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30844.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:33 +00:00
Dave Goodell
62dc42f628 usnic: check packet/segment lengths
Lower layer (hardware or software) bugs can result in a mismatch between
our BTL-layer payload size and the actual packet length.  We now check
that in order to catch these cases, which otherwise can result in
MPI-layer message corruption.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30843.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:19 +00:00