Commit graph

7363 commits

Author SHA1 Message Date
Jeff Squyres
e819b5a34a Remove the vendor_ids parsing.
We don't use this functionality any more; we use the transport_type
and device name to identify usnic devices.  It's slightly easier
because we can get transport_type+name from ibv_device_open() and don't
have to do an additional ibv_query_device() to get its attributes.

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30882.
2014-02-27 21:47:01 +00:00
Jeff Squyres
3cbdf33b88 This is what r30852 should have been: Consolidate into a single, outer loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30879.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442
  r30852 --> open-mpi/ompi@4e282a3295

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-27 17:19:50 +00:00
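The consolidated retry loop described in this commit message can be sketched as follows. This is a Python simulation under stated assumptions, not the usnic BTL code: `create_ah` is a hypothetical stand-in for ibv_create_ah() that returns `(None, errno.EAGAIN)` while ARP resolution is still pending, and `max_passes` is an invented safety limit.

```python
import errno


def create_all_handles(endpoints, create_ah, max_passes=1000):
    """Call create_ah once per pending endpoint per pass, MPI_WAITALL-style.

    create_ah(endpoint) is a hypothetical stand-in for ibv_create_ah():
    it returns (handle, 0) on success or (None, err) on failure.
    """
    handles = {}
    pending = list(endpoints)
    for _ in range(max_passes):
        if not pending:
            break
        still_pending = []
        for ep in pending:
            handle, err = create_ah(ep)
            if handle is not None:
                handles[ep] = handle
            elif err == errno.EAGAIN:
                # ARP is still in progress for this peer; retry on the next pass.
                still_pending.append(ep)
            else:
                raise OSError(err, f"address handle creation failed for {ep}")
        pending = still_pending
    if pending:
        raise TimeoutError(f"ARP did not resolve for {pending}")
    return handles


def make_fake_create_ah(delays):
    """Build a fake create_ah that returns EAGAIN delays[ep] times, then succeeds."""
    remaining = dict(delays)

    def create_ah(ep):
        if remaining[ep] > 0:
            remaining[ep] -= 1
            return None, errno.EAGAIN
        return f"ah-{ep}", 0

    return create_ah
```

Because every pending endpoint is polled once per pass, slow resolutions overlap instead of being waited on one after another, which is the parallelism the commit message refers to.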
Jeff Squyres
45810f0efb Revert r30852: the wrong version of this patch got committed to SVN.
This commit was SVN r30878.

The following SVN revision numbers were found above:
  r30852 --> open-mpi/ompi@4e282a3295
2014-02-27 15:02:15 +00:00
Vasily Filipov
d702307521 OPENIB BTL/CONNECT: replace wrong rdma_freeaddrinfo call in rdmacm_component_query func.
This commit was SVN r30876.
2014-02-27 11:52:10 +00:00
Vasily Filipov
f2014b96e7 OPENIB BTL/CONNECT: Add support for AF_IB addressing in rdmacm.
This commit was SVN r30875.
2014-02-27 11:29:47 +00:00
Ralph Castain
ce26b096b4 Prevent failover to direct_modex if key isn't found unless direct_modex was enabled
Refs trac:4258

This commit was SVN r30865.

The following Trac tickets were found above:
  Ticket 4258 --> https://svn.open-mpi.org/trac/ompi/ticket/4258
2014-02-27 02:04:56 +00:00
Jeff Squyres
7440f21b75 Add usnic connectivity-checking agent service.
Basically: since usnic is a connectionless transport, we do not get
OS-provided services "for free" that connection-oriented transports
get, namely: "hey, I wasn't able to make a connection to peer X", and
"hey, your connection to peer X has died."
    
This connectivity-checker runs in a separate progress thread in the
usnic BTL in local rank 0 on each server.  Upon first send in any
process, the connectivity-checker agent will send some UDP pings to the
peer to ensure that we can reach it.  If we can't, we'll abort the job
with a nice show_help message.
    
A lengthy comment in btl_usnic_connectivity.h explains the
scheme and how it works.

Reviewed by Dave Goodell.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30860.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 22:21:25 +00:00
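The "ping on first send" behavior described in this commit message can be sketched as below. This is a simplified illustration, not the agent in btl_usnic_connectivity.h: the `ping` callback, the retry count, and the class name are all hypothetical, and no real UDP traffic or progress thread is involved.

```python
class ConnectivityChecker:
    """Verify each peer once, before the first send to it."""

    def __init__(self, ping, retries=3):
        # ping(peer) is a hypothetical callback: it sends one UDP ping and
        # returns True if an acknowledgement arrived before its timeout.
        self._ping = ping
        self._retries = retries
        self._verified = set()

    def ensure_reachable(self, peer):
        if peer in self._verified:
            return  # already checked on an earlier send
        for _ in range(self._retries):
            if self._ping(peer):
                self._verified.add(peer)
                return
        # A connectionless transport gives no OS-level error, so the
        # checker has to report the failure itself.
        raise ConnectionError(
            f"peer {peer} did not answer {self._retries} pings")
```

A caller would invoke `ensure_reachable(peer)` before the first send to each peer and treat the exception as the point where the job is aborted with a diagnostic.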
Oscar Vega-Gisbert
f2043776f6 mpi.Comm: some methods must be private.
They were protected because the old Prequest implementation used them.

This commit was SVN r30859.
2014-02-26 21:17:52 +00:00
Nathan Hjelm
dfe4a504e4 udcm: fix race between ack arrival and message send and potential hang in udcm
finalize.

Closes trac:4290

cmr=v1.7.5:reviewer=miked

This commit was SVN r30854.

The following Trac tickets were found above:
  Ticket 4290 --> https://svn.open-mpi.org/trac/ompi/ticket/4290
2014-02-26 15:33:27 +00:00
Nathan Hjelm
30b61a3333 Fix a number of issues in the new one sided code.
 - Fix several typos in osc/rdma.

 - Fix a locking issue in osc/sm that was caused by an incorrect
   assumption about the semantics of opal_atomic_add_32.

 - Always unlock the accumulation lock in osc/sm.

 - The base of a process's shared memory window should be NULL if
   the size is zero. Fixed.

cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30853.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-26 15:33:18 +00:00
Jeff Squyres
4e282a3295 Consolidate into a single, outer loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30852.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 11:02:12 +00:00
Dave Goodell
3641500442 usnic: Loop on the ibv_create_ah call.
ibv_create_ah() may need to effect an ARP resolution, which may take
some time.  Rather than blocking in ibv_create_ah(), the usNIC driver
may return NULL and set errno to EAGAIN indicating that we should try
again (i.e., the ARP resolution is proceeding under the covers).

So add a simple loop here to loop over ibv_create_ah() until it
returns non-(NULL+EAGAIN).  A future commit will make this a bit more
efficient.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30850.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:38 +00:00
Dave Goodell
c40f8879c8 usnic: improve interface matching (esp. for UDP)
Prior to this commit we matched local interfaces to remote interfaces in
order to create endpoints in a simplistic way.  If any remote interfaces
were on the same subnet as any of our local interfaces then only local
interfaces would be paired (IP-routed remote interfaces would be
ignored).

This commit introduces a more general scheme which attempts to make the
"best" pairing of local interfaces to remote interfaces.  We now cast
the problem as a graph theory problem known as the "Assignment Problem",
or finding a maximum-cardinality, minimum-weight bipartite matching.  We
solve this problem by reducing the bipartite graph of interface
connectivity to a flow network and then solving for a minimum cost flow.
This is then easily converted back into a matching on the original
bipartite graph.

In the new scheme, interfaces on the same subnet are preferred over
interfaces requiring intermediate routing hops and higher bandwidth
links are preferred over lower bandwidth links.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30849.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:26 +00:00
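The reduction described in this commit message (bipartite matching, then flow network, then minimum-cost flow) can be sketched as follows. This is a minimal illustration using successive shortest paths with Bellman-Ford, not the code in the usnic BTL; the function name, the cost-matrix input, and the cost values in the usage note are assumptions.

```python
def solve_assignment(costs):
    """Return a maximum-cardinality, minimum-cost matching as (local, remote) pairs.

    costs[i][j] is the cost of pairing local interface i with remote
    interface j, or None if the two cannot be paired at all.
    """
    n_left = len(costs)
    n_right = len(costs[0]) if costs else 0
    source, sink = 0, n_left + n_right + 1
    n = sink + 1
    # Each edge is [to, capacity, cost, index of the reverse edge].
    graph = [[] for _ in range(n)]

    def add_edge(u, v, cost):
        graph[u].append([v, 1, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(n_left):
        add_edge(source, 1 + i, 0)
    for j in range(n_right):
        add_edge(1 + n_left + j, sink, 0)
    for i in range(n_left):
        for j in range(n_right):
            if costs[i][j] is not None:
                add_edge(1 + i, 1 + n_left + j, costs[i][j])

    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float("inf")] * n
        prev = [None] * n
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for idx, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, idx)
                        changed = True
            if not changed:
                break
        if prev[sink] is None:
            break  # no augmenting path left: the flow is maximal
        v = sink
        while v != source:  # push one unit of flow along the path
            u, idx = prev[v]
            edge = graph[u][idx]
            edge[1] -= 1
            graph[v][edge[3]][1] += 1
            v = u

    # A saturated local->remote edge is a matched pair.
    return [(i, v - 1 - n_left)
            for i in range(n_left)
            for v, cap, _cost, _rev in graph[1 + i]
            if v > n_left and cap == 0]
```

With costs such as 0 for a same-subnet pair and a larger value for a routed pair (an assumed encoding), `solve_assignment([[1, 10], [1, None]])` pairs local 0 with remote 1 and local 1 with remote 0: a greedy choice of the cheapest edge for local 0 would leave local 1 unmatched, whereas the flow formulation keeps cardinality maximal first.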
Dave Goodell
47148ab3cb usnic: helper routines for rtnetlink route lookups
Querying the OS routing table is important for making decisions about
which local and remote interfaces should be paired into reliable
communication channels.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30848.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:10 +00:00
Dave Goodell
db14c706ce usnic: add graph utility code
This code is intended to support usNIC interface matching functionality.
We currently view that problem as essentially the "Assignment Problem"
(http://en.wikipedia.org/wiki/Assignment_problem), for which there are
many possible solution approaches, including flow-network analysis.  In
the future, we might transition to a more nuanced view of the problem
which would likely also be flow-network based.

To this end, the current code focuses on providing one major algorithm
to the core usnic BTL: `ompi_btl_usnic_solve_bipartite_assignment`.  It
also exposes several typical and necessary functions for constructing,
manipulating, and querying weighted, directed graphs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30847.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:49:54 +00:00
Dave Goodell
5bf969e63b usnic: unit test parse_ifex_str
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30846.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:48:05 +00:00
Dave Goodell
921a29e41f usnic: add simple unit testing infrastructure
This commit adds mechanisms for writing and running unit tests in the
usnic BTL.  The short version of how to run the tests is:

1. Configure with `--enable-ompi-btl-usnic-unit-tests`.  This will cause
   the unit testing code and test runner utility to be built.

2. Run the tests by running `ompi_btl_usnic_run_tests`.

See `README.test` for full details.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30845.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:50 +00:00
Dave Goodell
044a190cac usnic: consolidate orte includes into compat.h
These includes only exist in the Cisco-internal usnic-v1.6 code base,
but they should not exist anywhere except btl_usnic_compat.h in order to
minimize source differences between usnic-v1.6 and v1.7/trunk.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30844.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:33 +00:00
Dave Goodell
62dc42f628 usnic: check packet/segment lengths
Lower layer (hardware or software) bugs can result in a mismatch between
our BTL-layer payload size and the actual packet length.  We now check
that in order to catch these cases, which otherwise can result in
MPI-layer message corruption.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30843.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:19 +00:00
Dave Goodell
3b5b87c325 usnic: add missing MSGDEBUG in recv path
We were missing a debug message for a very common recv case, making it a
bit harder to follow a debug log.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30842.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:05 +00:00
Dave Goodell
f6036d11c8 usnic: fix sender hash comparisons for UDP
There was a duplicated subnet check in the sender hash lookup routine.
This caused receivers to always fail the sender hash lookup if the
sender was in a different subnet, so the receiver would discard the
packet as though it were coming from a different job.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30841.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:46:50 +00:00
Dave Goodell
90d68730f1 usnic: fix SEGV when ibv_create_ah fails
If ibv_create_ah fails, we will not initialize the `endpoint->proc`.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30840.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:46:37 +00:00
Dave Goodell
a54f53f242 usnic: also match interfaces in different subnets
This functionality is required for routable UDP/IP usnic traffic.

Previously we would only setup endpoints for remote interfaces on the
same subnet as the current module's local interface.  This behavior
still holds if two processes share any common subnets.  However, if the
two processes have no subnets in common then we assume that all
interfaces are reachable from all other interfaces and wire them up in a
1-1, randomly-matched order somewhat similarly to the "tcp" BTL's
behavior.

Only match in different subnets if we detect UDP support in the lower
layer.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30839.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:49 +00:00
Dave Goodell
4875f48eaa usnic: enable UDP support
This commit decouples OMPI deployment from the version(s) of the lower
layers of the stack by probing for UDP support.

Verbs applications assume a 40-byte header (there is no current
mechanism for querying payload offset).  So to support a 42-byte UDP
header without causing existing applications like ibv_ud_pingpong or
older versions of OMPI to crash, we must inform libusnic_verbs that we
are aware of the nonstandard payload offset.  We do this by overriding
the `transport_type` field of the device to be 42 before calling
`ibv_open_device`.  If the library resets it to something else, then we
know the lower layers are UDP capable.  Otherwise we use the older
custom-L2 format.

This necessitated some minor ugliness in common_verbs, but it's as tidy
as Jeff and I know how to make it right now.

This commit only adds support for UDP headers and connectivity over the
same L2 network, it does not touch routing or interface pairing.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30838.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:35 +00:00
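The probing handshake described in this commit message can be sketched as below. Everything here is a simulation: `FakeDevice`, the two `open_device` callbacks, and the value the "new" library writes back are hypothetical, and restoring the original field after a failed probe is an assumption rather than something the commit message states. Only the sentinel value 42 comes from the text.

```python
UDP_PROBE_VALUE = 42  # sentinel written to transport_type, per the commit message


class FakeDevice:
    """Hypothetical stand-in for a verbs device structure."""

    def __init__(self, transport_type):
        self.transport_type = transport_type


def probe_udp_support(device, open_device):
    """Return True if the lower layer reacts to the UDP probe."""
    original = device.transport_type
    device.transport_type = UDP_PROBE_VALUE
    open_device(device)
    if device.transport_type != UDP_PROBE_VALUE:
        return True  # the library reset the field: it understood the probe
    # An older library ignored the probe; restore the field (assumption)
    # and fall back to the custom-L2 format.
    device.transport_type = original
    return False


def udp_aware_open(device):
    """Fake library that recognizes the probe and resets the field."""
    if device.transport_type == UDP_PROBE_VALUE:
        device.transport_type = 0  # hypothetical "real" transport value


def legacy_open(device):
    """Fake older library that never looks at the field."""
```

The point of the handshake is that an application unaware of the 42-byte header never writes the sentinel, so an upgraded library can keep serving it the 40-byte layout.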
Dave Goodell
e10ad5763f usnic: rearrange component struct field order
Just trying to be deliberate about keeping fastpath-accessed fields
grouped together to fit into the same 64-byte cache line.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30837.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:53 +00:00
Dave Goodell
5d7eabbcd1 usnic: Change tiny_mtu to a size_t (it's compared against an unsigned value)
Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30836.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:37 +00:00
Dave Goodell
fef38d7e42 usnic: Fix a few compiler warnings about types of printed variables
Authored-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30835.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:23 +00:00
Dave Goodell
cadaa1c424 usnic: Shrink sequence numbers to 16 bits
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30834.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:10 +00:00
Dave Goodell
707e594d13 usnic: Use INLINE flag more often, saving the DMA is useful.
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30833.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:53 +00:00
Dave Goodell
dbbe6a8254 usnic: fix proc structure memory leak
Valgrind showed this one, just a bit of sloppiness with the reference
counting.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30832.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:34 +00:00
Dave Goodell
4af332bd4e Fix the logic in ompi_common_verbs_find_ports().
The logic did not correctly perform the OR behavior that is described
in the doxy docs for this function.  This commit fixes the logic so
that a port will be included if it supports any of the
capabilities indicated by the passed-in flags.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30831.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:21 +00:00
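The OR semantics described in this commit message, where a port qualifies if it supports any requested capability, can be illustrated with a bitmask test. The flag names and port tuples below are hypothetical and are not the constants used by ompi_common_verbs_find_ports(); the "all" variant is shown only for contrast and is not a claim about what the old code did.

```python
from enum import IntFlag


class PortCap(IntFlag):
    """Hypothetical capability flags for the illustration."""
    RC = 1
    UD = 2
    IWARP = 4


def find_ports_any(ports, wanted):
    """Keep ports that support at least one capability in `wanted` (OR)."""
    return [name for name, caps in ports if caps & wanted]


def find_ports_all(ports, wanted):
    """Keep ports that support every capability in `wanted` (AND)."""
    return [name for name, caps in ports if (caps & wanted) == wanted]


PORTS = [
    ("port0", PortCap.RC | PortCap.UD),
    ("port1", PortCap.UD),
    ("port2", PortCap.IWARP),
]
```

With `wanted = PortCap.RC | PortCap.UD`, the OR form returns `port0` and `port1`, while the AND form returns only `port0`.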
Nathan Hjelm
acbd6032f9 Helps to include the correct header.
cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30821.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-25 19:14:48 +00:00
Nathan Hjelm
5edacac301 osc/rdma: add missing include
cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30820.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-25 19:11:19 +00:00
Ralph Castain
49d938de29 Merge one-sided updates to the trunk - written by Brian Barrett and Nathan Hjelm
cmr=v1.7.5:reviewer=hjelmn:subject=Update one-sided to MPI-3

This commit was SVN r30816.
2014-02-25 17:36:43 +00:00
Joshua Ladd
9ea9bec4ad Addressing Jeff's comments:
1. Changed rng_buff_t --> opal_rng_buff_t
2. All global variables obey the prefix rule
3. Old code has been removed 
4. Found a couple of unnecessary includes

Refs trac:4298

This commit was SVN r30807.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 23:18:35 +00:00
Oscar Vega-Gisbert
90625573ff mpi.Request: avoid calling JNI free if it is null
This commit was SVN r30806.
2014-02-24 22:28:59 +00:00
Jeff Squyres
d07d1864ae Revert r30804.
We're going to be bringing a bunch of usnic code to the SVN trunk
soon, and I basically brought this commit over out of order.  So I'm
reverting it for now; the same functionality will come back shortly.

This commit was SVN r30805.

The following SVN revision numbers were found above:
  r30804 --> open-mpi/ompi@5bedcc15bf
2014-02-24 19:12:49 +00:00
Jeff Squyres
5bedcc15bf Support the IBV_*_USNIC_* verbs constants.
These constants are now upstream (see
https://git.kernel.org/cgit/libs/infiniband/libibverbs.git/commit/?id=f57a9c67eabb9e7f19c624ac3c8c27b7be55796c),
so let's support them properly in Open MPI.

Added bonus: consolidating these checks up in
ompi_check_openfabrics.m4 allowed removing some custom checks and
AC_DEFINE's from the usnic configure.m4 script.

Also change the usnic/configure.m4 check for IBV_EVENT_GID_CHANGE to
use AC_CHECK_DECLS (vs. AC_CHECK_DECL).

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30804.
2014-02-24 18:57:04 +00:00
Jeff Squyres
1b855eca8e A few fixes after r30801:
* Use the prefix rule for global variables
* Eliminate seed_prng() since it isn't necessary any more

These files will need to get edited again when the RNG type obeys the
prefix rule.

Refs trac:4298

This commit was SVN r30803.

The following SVN revision numbers were found above:
  r30801 --> open-mpi/ompi@e39d9f4080

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 17:47:52 +00:00
Joshua Ladd
e39d9f4080 Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example how to seed with opal_srand and invoke the generator with opal_rand. This should be added to
cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG

This commit was SVN r30801.
2014-02-23 21:41:38 +00:00
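For readers unfamiliar with the generator family named in this commit, an additive lagged Fibonacci generator computes x[n] = (x[n-j] + x[n-k]) mod 2^m. The sketch below is a generic illustration and does not reproduce opal/util/alfg.h: the lags (5, 17), the 32-bit word size, and the seeding recurrence are assumptions chosen for brevity.

```python
class LaggedFibonacci:
    """Additive lagged Fibonacci generator: x[n] = x[n-j] + x[n-k] mod 2**bits."""

    def __init__(self, seed, short_lag=5, long_lag=17, bits=32):
        self._mask = (1 << bits) - 1
        self._short = short_lag
        # Fill the initial window with a simple linear congruential
        # sequence (an assumption; any reasonable seeding scheme works).
        state = []
        x = seed & self._mask
        for _ in range(long_lag):
            x = (1664525 * x + 1013904223) & self._mask
            state.append(x)
        state[0] |= 1  # at least one odd word is needed for a long period
        self._state = state  # oldest value first

    def next(self):
        # state[0] is x[n-k]; state[-short_lag] is x[n-j].
        value = (self._state[-self._short] + self._state[0]) & self._mask
        self._state.pop(0)
        self._state.append(value)
        return value
```

Two generators built from the same seed produce identical streams, which is what makes a seeded generator of this kind reproducible across runs.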
Oscar Vega-Gisbert
75d40c0ae7 Errhandler doesn't need an init method.
This commit was SVN r30800.
2014-02-23 20:10:54 +00:00
Oscar Vega-Gisbert
ad8f57396a Avoid creating status objects in the C side.
It is necessary because calling java methods from C is very slow.

This commit was SVN r30799.
2014-02-23 20:08:53 +00:00
Nathan Hjelm
bd275e642e btl/ugni: fix typo
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30795.

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-02-21 17:46:22 +00:00
Ralph Castain
29a7eda280 Remove executable property
This commit was SVN r30791.
2014-02-21 17:27:47 +00:00
Oscar Vega-Gisbert
1dd20f397d Send JNI arguments of type request array as long array.
This commit was SVN r30786.
2014-02-21 09:23:07 +00:00
Manjunath Gorentla Venkata
38e5a753dd basesmuma bcol: fixing warnings
This commit was SVN r30784.
2014-02-20 18:30:53 +00:00
Jeff Squyres
bda840df49 Fixes trac:4205: ensure sizeof(MPI_Count) <= sizeof(size_t)
- Move the ptrdiff_t tests up higher in configure.ac to be with the
  rest of the type tests.
- Create new OMPI_FIND_MPI_AINT_COUNT_OFFSET for finding the
  corresponding types of MPI_Aint, MPI_Count, and MPI_Offset.
  Consolidate all the old C and Fortran tests into this new macro (and
  .m4 file).
- Fix Fortran MPI_*_KIND tests that incorrectly keyed off assumed
  types (e.g., int64_t) rather than whatever the corresponding C
  MPI_Aint, MPI_Count, MPI_Offset types turned out to be.
- Add new logic to ensure that sizeof(MPI_Count) <= sizeof(size_t),
  because our entire PML, BTL, and convertor infrastructure requires
  this.  As a side effect, just make MPI_Offset the same type as
  MPI_Count (because MPI_Count has to be able to hold an MPI_Offset,
  so we can't let MPI_Offset be larger than an MPI_Count).

This commit was SVN r30776.

The following Trac tickets were found above:
  Ticket 4205 --> https://svn.open-mpi.org/trac/ompi/ticket/4205
2014-02-19 23:04:34 +00:00
Oscar Vega-Gisbert
5e9fbdde9b Comments about 'db' arguments.
This commit was SVN r30775.
2014-02-19 22:31:03 +00:00
Oscar Vega-Gisbert
04172e47c3 mpi.Prequest: improve start and startAll
This commit was SVN r30769.
2014-02-18 22:33:15 +00:00
Mike Dubman
49ee63f4b8 MXM: do not enforce version check
- MXM uses the libtool versioning scheme, which is enough; no additional check is needed in OMPI

reviewed by yossi

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30768.
2014-02-18 19:44:37 +00:00
Rolf vandeVaart
d4f12148c4 Fix several issues reported in ticket #4245.
This commit was SVN r30767.
2014-02-18 17:44:08 +00:00
Oscar Vega-Gisbert
98530055d1 mpi.Message: java object members as parameters
This commit was SVN r30755.
2014-02-17 22:26:30 +00:00
Jeff Squyres
a80a24029d Rename poorly-named global: usnic_ticks -> ompi_btl_usnic_ticks
cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30752.
2014-02-17 21:37:13 +00:00
Jeff Squyres
bb4ba6511d Remove an unused RML tag (it isn't even used in the oshmem layer).
This commit was SVN r30749.
2014-02-17 18:35:43 +00:00
Ralph Castain
c3df744a3b Shift the orte_db_localrank key to the opal level. Add the job and proc-level session directory names to the database using opal_db keys.
This commit was SVN r30746.
2014-02-17 01:40:56 +00:00
Oscar Vega-Gisbert
86e40c568a Improve access to buffers.
This commit was SVN r30745.
2014-02-16 22:58:01 +00:00
Oscar Vega-Gisbert
ecfca4c5f9 mpi.Comm: java object members as parameters
This commit was SVN r30741.
2014-02-16 18:51:14 +00:00
Oscar Vega-Gisbert
7c1802e933 mpi.Comm: java object members as parameters
This commit was SVN r30738.
2014-02-15 19:22:55 +00:00
Oscar Vega-Gisbert
d06e5ab42e Improve exception check.
This commit was SVN r30737.
2014-02-15 16:38:29 +00:00
Ralph Castain
445c9f3384 Ensure we only post one receive for direct modex replies, and that we properly handle thread-transfer issues between the ORTE callback and the MPI layer. Account for potential threaded operations at the MPI level.
Refs trac:4258

This commit was SVN r30730.

The following Trac tickets were found above:
  Ticket 4258 --> https://svn.open-mpi.org/trac/ompi/ticket/4258
2014-02-14 20:37:17 +00:00
Ralph Castain
bdff767dce Ick - wonder how this ever built static? There is no "select" function anywhere in the system.
cmr=v1.7.5:reviewer=jsquyres:subject=remove bad function declaration

This commit was SVN r30729.
2014-02-14 20:34:21 +00:00
Mike Dubman
608269ed72 fca: support relocation of fca packages to opal_prefix/../fca
reviewed by AlexM
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30728.
2014-02-14 14:49:41 +00:00
Oscar Vega-Gisbert
66e2e337f3 Fix mpijavac: -cp classpath
This commit was SVN r30724.
2014-02-14 08:46:23 +00:00
Ralph Castain
b787e2054f Silence warning
cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r30721.
2014-02-14 00:00:23 +00:00
Christoph Niethammer
010a806a58 Omit usage of precalculated prime numbers and factorize directly.
Optimization of the MPI_Dims_create function that omits the usage of
precalculated prime numbers and factorizes directly, as discussed on the
developer list.

cmr=v1.7.5:ticket=4217:reviewer=jsquyres

This commit was SVN r30695.

The following Trac tickets were found above:
  Ticket 4217 --> https://svn.open-mpi.org/trac/ompi/ticket/4217
2014-02-12 08:47:33 +00:00
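Direct factorization, as opposed to a table of precalculated primes, can be sketched with trial division. The code below is an illustration only and is not the MPI_Dims_create implementation from this commit; the balancing heuristic (largest factor onto the currently smallest dimension) and both function names are assumptions.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order, by trial division."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        # Whatever remains is a single prime larger than the square root
        # of the value it was divided out of; dropping it loses a factor.
        factors.append(n)
    return factors


def balanced_dims(nnodes, ndims):
    """Spread the prime factors of nnodes over ndims dimensions."""
    dims = [1] * ndims
    for f in sorted(prime_factors(nnodes), reverse=True):
        smallest = dims.index(min(dims))
        dims[smallest] *= f
    return sorted(dims, reverse=True)
```

For example, `balanced_dims(12, 2)` yields `[4, 3]` and `balanced_dims(7, 2)` yields `[7, 1]`. The trailing `if n > 1` step is what keeps a large prime such as 7 in 14 = 2 x 7 from being lost when the loop stops at the square root.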
Christoph Niethammer
85dce869c8 Move parameter check into the appropriate code section at the beginning.
The freeprocs variable was obtained from nnodes, so check the value of nnodes
at the beginning, in the MPI_PARAM_CHECK code section, as discussed on the
developer list.

cmr=v1.7.5:reviewer=jsquyres:subject=move parameter check to begin


jsquyres, please review this CMR. Thanks.

This commit was SVN r30694.
2014-02-12 08:30:13 +00:00
Ralph Castain
3e12466f60 Ouch - fix bad race condition in direct modex
cmr=v1.7.5:reviewer=hjelmn:subject=fix bad race condition in direct modex

This commit was SVN r30691.
2014-02-11 23:21:27 +00:00
Dave Goodell
72c0b89e8f usnic: handle missing ibv_event_type_str
Some older versions of libibverbs do not have `ibv_event_type_str`,
leading to compilation failures on older machines, irrespective of
whether they could ever support usNIC anyway.  If we encounter any other
build issues related to "old verbs" then we should just cause the usnic
BTL to disqualify itself when it encounters "old" traits.

Thanks to Paul Hargrove for reporting the issue:
http://www.open-mpi.org/community/lists/devel/2014/02/14056.php

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30674.
2014-02-11 19:18:29 +00:00
Nathan Hjelm
6194bb502a vader: attempt to work around SGI UV issues by creating a segment that
only goes up to VADER_MAX_ADDRESS instead of 0xfffffffffffffffful.

cmr=v1.7.5:ticket=trac:4216

This commit was SVN r30669.

The following Trac tickets were found above:
  Ticket 4216 --> https://svn.open-mpi.org/trac/ompi/ticket/4216
2014-02-11 16:28:25 +00:00
Nathan Hjelm
f2f6a7fe81 vader: don't finalize an endpoint that is already finalized
Fixes trac:4252

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30668.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
  Ticket 4252 --> https://svn.open-mpi.org/trac/ompi/ticket/4252
2014-02-11 16:15:29 +00:00
Jeff Squyres
db41d749c1 Remove ASYNCHRONOUS from the ignore TKR mpi_f08 module.
It turns out that ASYNCHRONOUS should not be used with ignore TKR
dummy parameters (some compilers will [correctly] warn about this).

Many thanks to Rolf Rabenseifner and Christoph Niethammer, who noticed
the problem.

Submitted by Rolf Rabenseifner, reviewed by Jeff.

cmr=v1.7.5:reviewer=ompi-rm1.7:subject=Remove ASYNCHRONOUS from the ignore TKR mpi_f08 module.

This commit was SVN r30665.
2014-02-11 13:19:30 +00:00
Nathan Hjelm
f45364746e vader: fix typos in r30626
cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30652.

The following SVN revision numbers were found above:
  r30626 --> open-mpi/ompi@a8867a9ca4

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
2014-02-10 16:15:43 +00:00
Nathan Hjelm
6dd29a05f1 basesmuma: Fix typos in r30627
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30651.

The following SVN revision numbers were found above:
  r30627 --> open-mpi/ompi@98ad6b3d1e

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-10 16:15:37 +00:00
Alex Margolin
636493393c OPENIB: Fixed error from writing to an uninitialized pipe.
The error was caused by leaving the pipe to the async thread uninitialized and then writing to it anyway.
The fix is to check for the existence of the async thread and its pipe before writing.

Reviewed by miked

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30644.
2014-02-09 14:07:14 +00:00
Nathan Hjelm
98ad6b3d1e bcol/basesmuma: fix initialization on 32-bit platforms
The initialization code did several allgathers on void *'s using
MPI_LONG_LONG_INT. This will produce the wrong result on 32-bit
platforms. Instead use MPI_BYTE with count = sizeof (void *).

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30627.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-08 00:00:30 +00:00
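The size mismatch behind this fix can be shown with a small simulation. It is not MPI code: `gather` is a hypothetical stand-in for an allgather that copies a fixed number of bytes per rank, and the 4-byte pointer format models a 32-bit platform.

```python
import struct

PTR32 = "<I"       # 4-byte pointer, as on a 32-bit platform (assumed for the demo)
LONG_LONG = "<q"   # 8-byte element, the size of MPI_LONG_LONG_INT


def gather(send_buffers, elem_size):
    """Simulated allgather: copy exactly elem_size bytes from every rank."""
    out = bytearray()
    for buf in send_buffers:
        if len(buf) < elem_size:
            # In C this would silently read past the end of the pointer.
            raise ValueError(
                f"rank buffer holds {len(buf)} bytes but {elem_size} were requested")
        out += buf[:elem_size]
    return bytes(out)


def gather_pointers(addresses, elem_size):
    """Pack each address as a 32-bit pointer, gather, and unpack the result."""
    send_buffers = [struct.pack(PTR32, a) for a in addresses]
    data = gather(send_buffers, elem_size)
    return [value for (value,) in struct.iter_unpack(PTR32, data)]
```

Gathering with `elem_size = struct.calcsize(PTR32)`, the analogue of MPI_BYTE with count = sizeof (void *), round-trips the addresses, whereas requesting `struct.calcsize(LONG_LONG)` bytes per rank asks for more data than a 4-byte pointer holds.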
Nathan Hjelm
a8867a9ca4 btl/vader: fix 32-bit support
cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30626.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
2014-02-07 23:57:36 +00:00
Nathan Hjelm
77869c3232 bcol/basesmuma: fix several bugs in the basesmuma code
Found two bugs in basesmuma:

 - Release all resources when tearing down the bcol module.

 - Always call the allreduce in the smcm code. We do not know
   beforehand whether all procs have all the files mapped.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30623.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-07 21:39:24 +00:00
Pavel Shamis
3a683419c5 Fixing broken dependency between ML/BCOLS
This is a hot-fix patch for the issue reported by Ralph.
In the future we plan to restructure the ml data structure layout.

Tested by Nathan.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30619.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-07 19:15:45 +00:00
Jeff Squyres
6f8e76df7e Revert r30539 and r30540; using the sqrt() to limit the computation is
just plain wrong (i.e., it gives wrong answers).  

When time permits, perhaps we can put in a better algorithm for
MPI_DIMS_CREATE (Andreas Schäfer mentioned that nnodes can now be on
the order of millions, and the current algorithm is... inefficient, at
best).

This commit was SVN r30606.

The following SVN revision numbers were found above:
  r30539 --> open-mpi/ompi@fb67d98867
  r30540 --> open-mpi/ompi@4417ed2133
2014-02-07 13:46:48 +00:00
Ralph Castain
74d3393a4f Revert r30600, r30602-30604 as the first one broke the tarball and the others couldn't fix it
This commit was SVN r30605.

The following SVN revision numbers were found above:
  r30600 --> open-mpi/ompi@7d2c4cb468
  r30602 --> open-mpi/ompi@9e751a0302
  r30604 --> open-mpi/ompi@3012c280cf

Revision number ranges (suitable for "git log"):
  r30602-30604 --> open-mpi/ompi@9e751a03^..3012c280
2014-02-07 04:38:06 +00:00
Ralph Castain
3012c280cf I surrender - this code is just too interbred with other components for me to clean up, so turn it off for now
This commit was SVN r30604.
2014-02-07 04:16:21 +00:00
Ralph Castain
3954311bac We have rules about not cross-integrating components, even across frameworks - please follow them.
This commit was SVN r30603.
2014-02-07 03:46:45 +00:00
Ralph Castain
9e751a0302 You absolutely, positively *cannot* include a header file from a component in the base functions!
This commit was SVN r30602.
2014-02-07 03:27:06 +00:00
Nathan Hjelm
a06e491c2c ob1: large buffered sends were broken by the ob1 optimizations. fix them
The problem was caused by the static request optimization. The buffered send case
is much like the isend case in that the request structure may be needed after
MPI_Bsend completes. Fix this case by calling isend and freeing the resulting
request.

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30601.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
2014-02-07 00:12:36 +00:00
Jeff Squyres
7d2c4cb468 There are a few ml-related bugs outstanding, and Nathan is looking into
them, but it's going to take a little time (at least one day).  So
Nathan says it's ok to .ompi_ignore coll ml until he's able to fix it.

This commit was SVN r30600.
2014-02-06 23:51:03 +00:00
Nathan Hjelm
3902cf66f1 ob1: OBJ_CONSTRUCT the convertor in the send_inline optimization.
This change does not appear to increase the small message latency of ping-pong
benchmarks and fixes an issue found by our ibm datatype tests.

Fixes trac:4232

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30598.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
  Ticket 4232 --> https://svn.open-mpi.org/trac/ompi/ticket/4232
2014-02-06 21:27:42 +00:00
Nathan Hjelm
a41cb1f086 Remove duplicate definition of xpmem_apid_t
cmr=v1.7.5:ticket=trac:4216

This commit was SVN r30589.

The following Trac tickets were found above:
  Ticket 4216 --> https://svn.open-mpi.org/trac/ompi/ticket/4216
2014-02-06 20:38:20 +00:00
Jeff Squyres
12a4d1a27f Minor update to r30430: put the variables at the top of the function
instead of making an inner block.

Refs trac:4185

This commit was SVN r30588.

The following SVN revision numbers were found above:
  r30430 --> open-mpi/ompi@ea3cb1e110

The following Trac tickets were found above:
  Ticket 4185 --> https://svn.open-mpi.org/trac/ompi/ticket/4185
2014-02-06 18:37:19 +00:00
Jeff Squyres
fad3cbf639 Revert r30571.
This commit was SVN r30587.

The following SVN revision numbers were found above:
  r30571 --> open-mpi/ompi@081b679881
2014-02-06 18:35:30 +00:00
Mike Dubman
081b679881 OMPI: add call to del_procs
fixed by AlexM, reviewed by miked
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30571.
2014-02-06 08:38:32 +00:00
George Bosilca
6ee06b7fda No exit down into a BTL.
This commit was SVN r30566.
2014-02-05 15:04:01 +00:00
Ralph Castain
1326ed704f Per the RFC discussed here:
http://www.open-mpi.org/community/lists/devel/2014/01/13789.php

add support for async modex when requested.

cmr=v1.7.5:reviewer=jsquyres:subject=Add async modex support

This commit was SVN r30565.
2014-02-05 14:39:27 +00:00
Joshua Ladd
1dbd8688db This fixes a long-standing bug in the OpenIB BTL's MCA param initialization. Only caught if BTL_OPENIB_FAILOVER_ENABLED. Thanks to Jeff for spotting. This should be added to:
cmr=v1.7.4:reviewer=jsquyres
cmr=v1.6.6

This commit was SVN r30558.
2014-02-04 20:01:39 +00:00
Jeff Squyres
d9786c42f7 Addendum to r30531:
 * Fix some comments
 * Fix some spacing in the non-verbose "make" output
 * Make javadoc non-verbose output like other non-verbose output
 * Remove the use of JAVA_CLASS_FILES; it wasn't correct any way (it
   both derived names from JAVA_SRC_FILES ''and'' used mpi/*.class, so
   many files were listed twice)
 * Move the generation of javadoc files to "make" time (vs. "make
   install" time) by putting the "doc" subdirectory in BUILT_SOURCES
 * Make doc dependent upon mpi/MPI.class, not mpi.jar -- we only need
   the classes to exist, not the final jarfile.
 * Make jdoc-install dependent upon a real build artifact (the doc
   dir), not an artificial name that will never exist (jdoc)
 * Separate the removal of the doc (and mpi) subdirectories during
   "make clean" off into the clean-local target, because CLEANFILES
   can really only have ''files'' added to it.

These changes also fix parallel builds.

cmr=v1.7.5:ticket=trac:4214

This commit was SVN r30547.

The following SVN revision numbers were found above:
  r30531 --> open-mpi/ompi@6ca8e68e4b

The following Trac tickets were found above:
  Ticket 4214 --> https://svn.open-mpi.org/trac/ompi/ticket/4214
2014-02-03 22:32:45 +00:00
Jeff Squyres
fa02bba7c5 Remove a bunch of extra whitespace.
Thanks to Andreas Schäfer for the original patch.

This commit was SVN r30541.
2014-02-03 19:30:43 +00:00
Jeff Squyres
4417ed2133 Gah; I missed the #include in r30539.
cmr=v1.7.5:ticket=trac:4217

This commit was SVN r30540.

The following SVN revision numbers were found above:
  r30539 --> open-mpi/ompi@fb67d98867

The following Trac tickets were found above:
  Ticket 4217 --> https://svn.open-mpi.org/trac/ompi/ticket/4217
2014-02-03 19:28:07 +00:00
Jeff Squyres
fb67d98867 Suggestion from Andreas Schäfer: we really only need sqrt(freeprocs)
primes.  This considerably reduces the computational load when
freeprocs is large.

cmr=v1.7.5:reviewer=hjelmn:subject=MPI_Dims_create optimization

This commit was SVN r30539.
2014-02-03 19:21:04 +00:00
Nathan Hjelm
12f0bf9488 basesmuma: missed a couple of MB references
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30538.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 18:19:53 +00:00
Nathan Hjelm
84320f3815 btl/vader: fix compilation with SGI xpmem and add some debugging to
component_init.

cmr=v1.7.5:ticker=#4053

This commit was SVN r30535.
2014-02-03 17:42:40 +00:00
Nathan Hjelm
64321acc22 basesmuma: do not call MB directly
opal does not always define MB. It is recommended that opal_atomic_[rw]mb is
called instead. We will need to address the cases where these functions are
no-ops on weak-memory ordered cpus.
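
As a rough illustration of what a write/read barrier pair provides, here is a sketch that uses C11 atomic_thread_fence as a stand-in. Treating these fences as equivalents of opal_atomic_wmb and opal_atomic_rmb is an assumption, and publish/consume are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;            /* plain data written before the flag */
static atomic_bool ready;      /* flag polled by the reader */

/* Writer: make the payload visible before the flag is set. */
static void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);   /* write barrier */
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: returns true and fills *out only once the flag is seen. */
static bool consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        return false;
    }
    atomic_thread_fence(memory_order_acquire);   /* read barrier */
    *out = payload;
    return true;
}
```

If either fence compiled to a no-op on a weakly ordered CPU, the reader could observe the flag before the payload, which is the concern raised above.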

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30534.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 17:01:57 +00:00
Nathan Hjelm
c2b061cc84 basesmuma: clean up code
Several changes are contained in this commit:

 - Clean up tabs and trailing whitespaces

 - Use consistent indentation in changed files

 - Remove unused code. None of the removed code will ever have been
   used in a trunk build.

 - Clean up the smcm code quite a bit

 - Do not fflush stderr and use opal_output instead of fprintf.

These changes have been tested on Cray XE-6 and PSM systems.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30533.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 17:01:46 +00:00
Jeff Squyres
1e952808ef r30519 (and the associated CMR #4209) left out fixing
MPI_SUBARRAYS_SUPPORTED and MPI_ASYNC_PROTECTS_NONBLOCKING in the F08
descriptor prototype.

This commit fixes the F08 descriptor prototype in the same way as
r30519 did for the non-F08-descriptor implementation.

Thanks to Mike Dubman for finding the issue.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30532.

The following SVN revision numbers were found above:
  r30519 --> open-mpi/ompi@caaab7e8a3
2014-02-03 16:55:33 +00:00
Jeff Squyres
6ca8e68e4b Install Java API docs into $(docdir)
Fixes trac:4054

cmr=v1.7.5:reviewer=osvegis

This commit was SVN r30531.

The following Trac tickets were found above:
  Ticket 4054 --> https://svn.open-mpi.org/trac/ompi/ticket/4054
2014-02-03 16:10:08 +00:00
Christoph Niethammer
4f23d8214c Fixed incorrect calculation of reallocated memory in mca_bml_r2_del_btl.
This commit was SVN r30529.
2014-02-03 08:43:59 +00:00
Nathan Hjelm
1ae39753dc bcol/basesmuma: check the return code of bcol_basesmuma_smcm_allgather_connection.
Fixes a segmentation fault found by the bogus intercomm_create test.

cmr=v1.7.4:review=manjugv

This commit was SVN r30527.
2014-01-31 22:20:25 +00:00
Jeff Squyres
1a9cdcc8ff Restore version numbers to "ompi_info --all" output.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30523.
2014-01-31 16:20:46 +00:00
Jeff Squyres
caaab7e8a3 Fix Fortran declarations of MPI_SUBARRAYS_SUPPORTED and MPI_ASYNC_PROTECTS_NONBLOCKING
Ensure that these two flags are in all of mpif.h, the mpi module, and
the mpi_f08 module.  Thanks to Rolf Rabenseifner for pointing out the
issue.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30519.
2014-01-31 15:22:12 +00:00
Adrian Reber
7de34ea201 SNAPC/CRCP/SSTORE: remove compiler warnings
This commit was SVN r30488.
2014-01-29 20:52:00 +00:00
Adrian Reber
fa1036f38c SSTORE/CRCP: use ORTE_WAIT_FOR_COMPLETION with non-blocking receives
During the commits to make the C/R code compile again the
blocking receive calls were replaced by non-blocking
which broke the code. This patch uses ORTE_WAIT_FOR_COMPLETION()
to wait until the non-blocking calls have finished.

This commit was SVN r30486.
2014-01-29 20:30:35 +00:00
Hadi Montakhabi
7bf4c425ff Fix: making sure the file type is not overwritten by the last queried component
This commit was SVN r30478.
2014-01-29 19:21:03 +00:00
Nathan Hjelm
afae924e29 coll/ml: fix some warnings and the spelling of indices
This commit fixes one warning that should have caused coll/ml to segfault
on reduce. The fix should be correct but we will continue to investigate.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30477.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-29 18:44:21 +00:00
Nathan Hjelm
700e97cf6a btl/vader: add support for SGI's implementation of xpmem and add support
for 32-bit architectures.

This commit also modifies _OMPI_CHECK_HEADER to use AC_CHECK_HEADERS instead
of AC_CHECK_HEADER. This allows components to check for multiple headers
instead of just one. The new semantics of the header check in OMPI_CHECK_PACKAGE
are to return success if at least one of the specified headers exists. The new
semantics will not break current usage.

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30476.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
2014-01-29 18:35:47 +00:00
Jeff Squyres
3fa9d36aba Per http://www.open-mpi.org/community/lists/devel/2014/01/13938.php,
Orion Poplawski noticed that we should not be installing mpio.h.

cmr=v1.7.4:reviewer=hjelmn:subject=do not install mpio.h

This commit was SVN r30465.
2014-01-28 21:46:26 +00:00
George Bosilca
bde9619386 Various minor cleanups.
This commit was SVN r30431.
2014-01-26 17:27:12 +00:00
George Bosilca
ea3cb1e110 Don't forget to call del_procs.
This commit was SVN r30430.
2014-01-26 17:26:40 +00:00
George Bosilca
d265981c55 Don't always retain the proc, do it only for new procs. This enforces a strict policy in the BML: it has one and only one ref on each proc.
This commit was SVN r30429.
2014-01-26 17:26:04 +00:00
Ralph Castain
b32556e6dc Fixes trac:4143
After IM with Nathan, apply patch from ticket after verification by Paul Hargrove that it fixes the problem on non-x86 32-bit platforms

Verified by Paul, RM-approved

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30411.

The following Trac tickets were found above:
  Ticket 4143 --> https://svn.open-mpi.org/trac/ompi/ticket/4143
2014-01-24 17:56:52 +00:00
Nathan Hjelm
2435057a57 ignore the iboffload component for now.
This commit was SVN r30398.
2014-01-23 16:06:21 +00:00
Rolf vandeVaart
9f3bf4747d Provide option to have synchronous copy be asynchronous with a wait. For now,
this has to be selected at runtime.  Also fix up some error messages to have
node name in them.

This commit was SVN r30396.
2014-01-23 15:47:20 +00:00
Jeff Squyres
9fee7c2b4d According to a report from Adam Moody, there is a compile error with
ROMIO and Lustre 2.4.0.  It has been solved upstream already; here's
the ticket:

    http://trac.mpich.org/projects/mpich/ticket/1973

And here's the commit that fixed it:

    a0c4278f14

OMPI does not have the other code referred to in that git commit (in
ad_lustre_hints.c).

Thanks to Adam Moody for reporting the issue.

cmr=v1.7.4:reviewer=hjelmn:subject=Fix ROMIO compile error w/ Lustre 2.4

This commit was SVN r30393.
2014-01-23 14:15:35 +00:00
Christoph Niethammer
86776daf75 Fixed typo in opal output message.
This commit was SVN r30392.
2014-01-23 08:37:40 +00:00
Mike Dubman
071838bb0a HCOLL: call hcoll_finalize and hcoll progress unregister in case of hcoll module query failures
fixed by Elena, reviewed by Val/Miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30390.
2014-01-23 07:29:23 +00:00
Jeff Squyres
772afc760e Shift .h files from one Makefile.am to another to enable "make dist"
cmr=v1.7.4:ticket=4162

This commit was SVN r30384.

The following Trac tickets were found above:
  Ticket 4162 --> https://svn.open-mpi.org/trac/ompi/ticket/4162
2014-01-23 02:00:05 +00:00
Jeff Squyres
2281a682ba Remove old cruft from the Makefile.am
The dist graph functions are on the trunk and have long-since been
added to the relevant lists.

cmr=v1.7.5:ticket=4163

This commit was SVN r30382.

The following Trac tickets were found above:
  Ticket 4163 --> https://svn.open-mpi.org/trac/ompi/ticket/4163
2014-01-23 01:33:44 +00:00
Jeff Squyres
7515d2caa9 Add Emacs mode at the top of the file
cmr=v1.7.5:ticket=4163

This commit was SVN r30381.

The following Trac tickets were found above:
  Ticket 4163 --> https://svn.open-mpi.org/trac/ompi/ticket/4163
2014-01-23 01:32:26 +00:00
Jeff Squyres
d910522ff6 Remove placeholder text file.
cmr=v1.7.5:subject=Rollup of Fortran fixes for 1.7.5

This commit was SVN r30380.
2014-01-23 01:30:59 +00:00
Jeff Squyres
aa0ceaa78b Move common code to ompi/mpi/fortran/base.
The attribute and conversion callback subroutine interfaces
are used by all 3 modules, and belong in the fortran/base directory,
not the directory of a specific module.

Also clean up some comments.

cmr=v1.7.4:ticket=4162

This commit was SVN r30378.

The following Trac tickets were found above:
  Ticket 4162 --> https://svn.open-mpi.org/trac/ompi/ticket/4162
2014-01-23 01:28:04 +00:00
Jeff Squyres
19617394f0 Add profiling versions of dist_graph functions into the library
Also fix the interfaces that have logical parameters (the
non-profiling versions were added/fixed a long time ago; it looks like
the profiling versions were inadvertently skipped).

cmr=v1.7.4:ticket=4162

This commit was SVN r30377.

The following Trac tickets were found above:
  Ticket 4162 --> https://svn.open-mpi.org/trac/ompi/ticket/4162
2014-01-23 01:24:54 +00:00
Jeff Squyres
5aa75d0ed9 Add missing pmpi interfaces for neighbor routines
Somehow these interfaces were missed when the neighbor routines were added.

cmr=v1.7.4:ticket=4162

This commit was SVN r30376.

The following Trac tickets were found above:
  Ticket 4162 --> https://svn.open-mpi.org/trac/ompi/ticket/4162
2014-01-23 01:23:31 +00:00
Jeff Squyres
fe76eac8ab Revert part of SVN r30273: remove "protected" from special Fortran sentinels
r30273 made the use of the Fortran "protected" keyword be
compiler-specific (i.e., configure/macro-ized it).  But it
inadvertently added the use of "protected" to some sentinel constants
that should not be protected (e.g., MPI_STATUS_IGNORE).

This commit reverts the addition of "protected" to the constants that
should not be protected.

cmr=v1.7.4:subject=Rollup of Fortran fixes for v1.7.4

This commit was SVN r30375.

The following SVN revision numbers were found above:
  r30273 --> open-mpi/ompi@5f17bc3c2c
2014-01-23 01:21:42 +00:00
Ralph Castain
06e6a06f3e Cleanup a couple of abstraction breaks found by Thomas Naughton
This commit was SVN r30371.
2014-01-22 21:36:24 +00:00
Hadi Montakhabi
8af6b8b4e4 add support for PLFS filesystem
This commit was SVN r30370.
2014-01-22 21:16:15 +00:00
Nathan Hjelm
7ba8bd81fa coll/ml: remove debug fprintfs
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30367.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-22 17:21:05 +00:00
Nathan Hjelm
82d996fb76 coll/ml: cleanup some merge related errors
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30366.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-22 16:48:09 +00:00
Nathan Hjelm
ff4c9c808a btl/ugni: fix leak in new sendi function.
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30365.

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-01-22 16:32:07 +00:00
Nathan Hjelm
66b69da394 Fix a bug in the ob1 optimizations that can cause a segfault.
btl sendi functions currently can not handle the descriptor being NULL. The
send inline optimization was assuming (incorrectly) that NULL was ok.

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30364.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
2014-01-22 16:31:58 +00:00
Nathan Hjelm
1a021b8f2d coll/ml: add support for blocking and non-blocking allreduce, reduce, and
allgather.

The new collectives provide a significant performance increase over tuned for
small and medium messages. We are initially setting the priority lower than
tuned until this has had some time to soak in the trunk. Please set
coll_ml_priority to 90 for MTT runs.

Credit for this work goes to Manjunath Gorentla Venkata (ORNL), Pavel Shamis (ORNL),
and Nathan Hjelm (LANL).

Commit details (for reference):

Import ORNL's collectives for MPI_Allreduce, MPI_Reduce, and MPI_Allgather.

We need to take the basesmuma header into account when calculating the
ptpcoll small message thresholds. Add a define to bcol.h indicating the
maximum header size so we can take the header into account while not
making ptpcoll dependent on information from basesmuma.

This resolves an issue with allreduce where ptpcoll overwrites the
header of the next buffer in the basesmuma bank.

Fix reduce and make a sequential collective launcher in coll_ml_inlines.h

The root calculation for reduce was wrong for any root != 0. There are
four possibilities for the root:

 - The root is not the current process but is in the current hierarchy. In
   this case the root is the index of the global root as specified in the
   root vector.

 - The root is not the current process and is not in the next level of the
   hierarchy. In this case 0 must be the local root since this process will
   never communicate with the real root.

 - The root is not the current process but will be in next level of the
   hierarchy. In this case the current process must be the root.

 - I am the root. The root is my index.

Tested with IMB which rotates the root on every call to MPI_Reduce. Consider
IMB the reproducer for the issue this commit solves.

Make the bcast algorithm decision an enumerated variable

Resolve various assert failures when destructing coll ml requests.

Two issues:

 - Always reset the request to be invalid before returning it to the
   free list. This will avoid an assert in ompi_request_t's destructor.
   OMPI_REQUEST_FINI does this (and also releases the fortran handle
   index).

 - Never explicitly construct or destruct the superclass of an opal
   object. This screws up the class function tables and will cause
   either an assert failure or a segmentation fault when destructing
   coll ml requests.

Cleanup allgather.

I removed the duplicate non-blocking and blocking functions and modeled
the cleanup after what I found in allreduce. Also cleaned up the code
somewhat.

Don't bother copying from the send to the receive buffer in
bcol_basesmuma_allreduce_intra_fanin_fanout if the pointers are the
same.

This eliminates a warning about memcpy and aliasing and avoids an
unnecessary call to memcpy.
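
A minimal sketch of that guard, using a hypothetical helper name (copy_unless_same) rather than the actual bcol_basesmuma code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src to dst unless both
 * point at the same buffer, in which case there is nothing to do.
 * Returns 1 if a copy was performed, 0 if it was skipped.
 * Only identical pointers are handled; partially overlapping
 * buffers would still need memmove. */
static int copy_unless_same(void *dst, const void *src, size_t len)
{
    if (dst == src) {
        return 0;
    }
    memcpy(dst, src, len);
    return 1;
}
```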

Always call CHECK_AND_RELEASE on memsync collectives.

There was a call to OBJ_RELEASE on the collective communicator but
because CHECK_AND_RECYCLE was never called there was no matching call
to OBJ_RELEASE. This caused coll ml to leak communicators.

Make allreduce use the sequential collective launcher in coll_ml_inlines.h

Just launch the next collective in the component progress.

I am a little unsure about this patch. There appears to be some sort
of race between collectives that causes buffer exhaustion in some cases
(IMB Allreduce is a reproducer). Changing progress to only launch the
next bcol seems to resolve the issue but might not be the best fix.

Note that I see little to no performance penalty for this change.

Fix allreduce when there are extra sources.

There was an issue with the buffer offset calculation when there are
extra sources. In the case of extra sources == 1 the offset was set
to buffer_size (just past the header of the next buffer). I adjusted
the buffer size to take into account the maximum header size (see the
earlier commit that added this) and simplified the offset calculation.

Make reduce/allreduce non-blocking. This is required for MPI_Comm_idup
to work correctly.

This has been tested with various layouts using the ibm testsuite and
imb and appears to have the same performance as the old blocking version.

Fix allgather for non-contiguous layouts and simplify parsing the
topology.

Some things in this patch:

 - There were several comments to the effect that level 0 of the
   hierarchy MUST contain all of the ranks. At least one function
   made this assumption but it was not true. I changed the sbgp
   components and the coll ml initialization code to enforce this
   requirement.

 - Ensure that hierarchy level 0 has the ranks in the correct
   scatter gather order. This removes the need for a separate
   sort list and fixes the offset calculation for allgather.

 - There were several passes over the hierarchy to determine
   properties of the hierarchy. I eliminated these extra passes
   and the memory allocation associated with them and calculate the
   tree properties on the fly. The same DFS recursion also handles
   the re-order of level 0.

All these changes have been verified with MPI_Allreduce, MPI_Reduce, and
MPI_Allgather. All functions now pass all IBM/Open MPI, and IMB tests.

coll/ml: correct pointer usage for MPI_BOTTOM

Since contiguous datatypes are copied via memcpy (bypassing the convertor) we
need to adjust for the lb of the datatype. This corrects problems found testing
code that uses MPI_BOTTOM (NULL) as the send pointer.

Add fallback collectives for allreduce and reduce.

cmr=v1.7.5:reviewer=pasha

This commit was SVN r30363.
2014-01-22 15:39:19 +00:00
Jeff Squyres
be0e557d3c Revert r30164: it was just the wrong thing to do.
Fixes trac:4155.

This commit was SVN r30360.

The following SVN revision numbers were found above:
  r30164 --> open-mpi/ompi@ca84ffdbd4

The following Trac tickets were found above:
  Ticket 4155 --> https://svn.open-mpi.org/trac/ompi/ticket/4155
2014-01-22 00:51:03 +00:00
Nathan Hjelm
c9c335544e btl/ugni: fix a typo in r30353
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30354.

The following SVN revision numbers were found above:
  r30353 --> open-mpi/ompi@aa3fea55b2

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-01-21 21:02:28 +00:00
Nathan Hjelm
aa3fea55b2 btl/ugni: re-add a sendi function to exploit the new optimization in
ob1.

Also update LANL platform files to use the latest version of ugni.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30353.
2014-01-21 20:53:35 +00:00
Nathan Hjelm
2b57f4227e ob1: optimize blocking send and receive paths
Per RFC. There are two optimizations in this commit:

 - Allocate requests for blocking sends and receives on the stack. This
   bypasses the request free list and saves two atomics on the critical path.
   This change improves the small message ping-pong by 50-200ns on both AMD
   and Intel CPUs.

 - For small messages try to use the btl sendi function before initializing a
   send request. If the sendi fails or the btl does not have a sendi function
   silently fall back on the standard send path.
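
The second optimization can be sketched roughly as follows. The transport_t type, the SMALL_MSG_LIMIT threshold and the stub functions are hypothetical and exist only for illustration; they are not the ob1 or BTL interfaces.

```c
#include <stddef.h>

/* Hypothetical transport descriptor: sendi may be NULL when the
 * transport has no inline-send path. */
typedef struct {
    int (*sendi)(const void *buf, size_t len);   /* 0 on success */
    int (*send)(const void *buf, size_t len);    /* standard path */
} transport_t;

#define SMALL_MSG_LIMIT 64   /* assumed threshold, for illustration */

static int send_message(const transport_t *t, const void *buf, size_t len)
{
    if (len <= SMALL_MSG_LIMIT && t->sendi != NULL) {
        if (t->sendi(buf, len) == 0) {
            return 0;            /* fast path succeeded */
        }
        /* fall through silently when the inline send fails */
    }
    return t->send(buf, len);
}

/* Demo stubs so the sketch can be exercised. */
static int standard_sends = 0;

static int sendi_fails(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return -1;
}

static int send_standard(const void *buf, size_t len)
{
    (void)buf; (void)len;
    ++standard_sends;
    return 0;
}
```

The caller never sees the failed inline attempt; it only observes the result of whichever path completed the send.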

cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30343.
2014-01-21 15:16:21 +00:00
George Bosilca
7e1593ef80 Prevent integer overflow in datatype creation. Patch based on
Gilles Gouaillardet solution attached to ticket #4145.

Closes trac:4145.
cmr=v1.7.4:reviewer=ompi-rm1.7
cmr=v1.6.6:reviewer=ompi-rm1.6

This commit was SVN r30342.

The following Trac tickets were found above:
  Ticket 4145 --> https://svn.open-mpi.org/trac/ompi/ticket/4145
2014-01-21 14:44:00 +00:00
Mike Dubman
b8550a55a7 HCOLL: many fixes
 - Add the coll_hcoll_np mca parameter, similar to that of the fca
   component (defaults to 32). Those who use hcoll should be aware that
   from now on communicators with fewer than 32 procs will run w/o hcoll
   by default.

 - Resolve a fallback issue that occurs when libhcoll runs out of allowed
   contexts. The solution is to move hcoll_context_create from
   comm_enable to comm_query. In short, comm_enable should never return
   OMPI_ERROR in the coll component with the highest priority (hcoll).
   Otherwise the ompi coll_base_select will unselect the coll function
   pointers and module references, leaving the communicator w/o a coll
   pointer, which causes the failure. The same behavior can be reproduced
   even with tuned by hardcoding a "return OMPI_ERROR" into its
   module_enable function.

 - Remove all the dead code under #if 0, and remove unused variables
   (path for library, active_modules list) and classes (module list
   wrapper).

Fixed by Val, Reviewed by Devendar/Josh/Miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30341.
2014-01-21 12:19:47 +00:00
Ralph Castain
2cf4862b49 Cleanup warnings for use of void* - requires intermediate cast to uintptr_t. Thanks to Paul Hargrove for reporting it
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30333.
2014-01-20 15:44:45 +00:00
Edgar Gabriel
be5d5834c5 fix the problem identified by a user on the mailing list with MPI_MODE_EXCL
cmr=v1.7.4:reviewer=vvenkatesan:subject=fix a problem when opening a file with MODE_EXCL

This commit was SVN r30324.
2014-01-18 16:06:27 +00:00
Nathan Hjelm
c88626510c Fix a merge issue with new ROMIO and fix an obvious ROMIO bug.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30319.
2014-01-18 00:29:16 +00:00
Hadi Montakhabi
8c14411289 f_cc_size is contiguous chunk size, not the stripe width. There is no stripe_width in the file handle structure.
This commit was SVN r30314.
2014-01-17 18:35:55 +00:00
Mike Dubman
2af0f878bc remove bml_init call, called from btl add_proc.
Refs trac:3763

This commit was SVN r30310.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2014-01-17 16:52:20 +00:00
Mike Dubman
b7750ccbf4 OSHMEM: bml initialization is moved into ompi_init
This fixes a race causing an mca_var segfault during shmem finalization.

based on this thread:
http://www.open-mpi.org/community/lists/devel/2014/01/13778.php

Refs trac:3763

fixed by Igor, reviewed by Brian

This commit was SVN r30304.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2014-01-17 06:09:29 +00:00
Nathan Hjelm
f2a73fcdbd udreg: free huge page allocations correctly
This commit fixes an error path that occurs when huge page allocations are
enabled. In this case we allocate a huge page and try to register it but fail.
We then were calling free on the opal object. Fix this by calling the proper free
function.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30289.
2014-01-14 16:26:06 +00:00
Nathan Hjelm
f9d2032705 vader: ensure fast box data is aligned on 4-byte boundaries
This commit fixes a bus error on Solaris/Sparc.
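
SPARC requires naturally aligned memory accesses, so a 4-byte load from a misaligned address raises a bus error. A minimal sketch of rounding a size or offset up to a 4-byte boundary, using a hypothetical helper (align_up_4) and not the vader code itself:

```c
#include <stddef.h>

/* Hypothetical helper: round size up to the next multiple of 4. */
static size_t align_up_4(size_t size)
{
    return (size + 3u) & ~(size_t)3u;
}
```

For example, a 5-byte payload would occupy 8 bytes so that the next entry starts on a 4-byte boundary.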

Closes trac:4111

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30288.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
  Ticket 4111 --> https://svn.open-mpi.org/trac/ompi/ticket/4111
2014-01-14 16:04:52 +00:00
Rolf vandeVaart
e75afb2b82 Fix bug in distance computation code when deciding which devices to use on a NUMA node.
Also add a verbose flag so one can see what devices are selected as well as another flag to override
locality information and use all devices on the node.  

This commit was SVN r30287.
2014-01-14 15:41:56 +00:00
Nathan Hjelm
da1316ca6e vader: don't OBJ_RELEASE endpoint rcaches.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30284.
2014-01-13 23:44:34 +00:00
Jeff Squyres
fbd70d7798 George correctly pointed out that there's no need for this test: it
effectively exists elsewhere in the code already.

This commit was SVN r30277.
2014-01-13 22:26:22 +00:00
Jeff Squyres
20d6391734 Patch submitted by Paul Hargrove to fix NetBSD compile with -laio.
NetBSD puts the AIO functions in -lrt, vs. the usual libc.  So we
need the fbtl/posix configure.m4 to test for -lrt properly.

Reviewed by Jeff Squyres.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Fix NetBSD use of -laio

This commit was SVN r30274.
2014-01-13 18:49:39 +00:00
Jeff Squyres
5f17bc3c2c Make the use of PROTECTED in the mpi_f08 module be optional.
Add a configure test to see if the Fortran compiler supports the
PROTECTED keyword.  If it does, use in mpi-f08-types.F90 (via a macro
defined in configure-fortran-output-bottom.h).

This is needed to support the PGI 9 Fortran compiler, which does not
support the PROTECTED keyword.

Note that regardless of whether we want to support the PGI 9 Fortran
compiler + mpi_f08, we need to correctly detect whether PROTECTED
works or not, and then use that determination as a criterion for
building the mpi_f08 module.  Previously, mpi-f08-types.F90 used
PROTECTED unconditionally, and we didn't test for it in configure.  So
if a compiler (e.g., PGI 9) supported everything else but didn't
support PROTECTED, it would try to compile the mpi_f08 stuff and choke
on the use of PROTECTED.

Refs trac:4093

This commit was SVN r30273.

The following Trac tickets were found above:
  Ticket 4093 --> https://svn.open-mpi.org/trac/ompi/ticket/4093
2014-01-13 18:35:42 +00:00
Ralph Castain
e7710873a1 Open/close the RTE framework
cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r30270.
2014-01-13 17:43:24 +00:00
Jeff Squyres
40939df16c Add two predefined MPI object padding tests:
 1. Canary compile-time test: this is compiled whenever you compile
    the entire OMPI tree.  It's a noinst standalone library comprised
    of a single .c file, so no one will notice its addition, and it
    doesn't get linked/installed to any real build products.  If we
    are out of padding space on any predefined MPI object type, it
    will fail to compile.  This will alert/annoy a human, who will be
    able to fix the real problem.
 2. Added a "make check" test that will print out the amount of
    predefined padding left on all the MPI object types.
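
One way such a canary could be written is with a C11 static assertion. This is a sketch under assumptions: the commit does not say which mechanism is used, and PREDEFINED_PAD, struct demo_object and padding_left are hypothetical names.

```c
#include <stddef.h>

#define PREDEFINED_PAD 128   /* assumed padded size, for illustration */

/* Hypothetical object type standing in for a predefined MPI object. */
struct demo_object {
    int    refcount;
    void  *data;
    char   name[64];
};

/* Compilation fails here if the object outgrows its padding. */
_Static_assert(sizeof(struct demo_object) <= PREDEFINED_PAD,
               "demo_object no longer fits in its predefined padding");

/* Bytes of padding still available, as a "make check" test could print. */
static size_t padding_left(void)
{
    return PREDEFINED_PAD - sizeof(struct demo_object);
}
```

The static assertion plays the role of the compile-time canary, while padding_left corresponds to the number a "make check" test would report.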

This commit was SVN r30268.
2014-01-13 16:39:39 +00:00
Yossi Etigin
7564e2c13f Fix a recursion in mxm send flow which happens when mpi starts a new send from the context of send completion callback.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30265.
2014-01-12 17:47:03 +00:00
Yossi Etigin
9504969f7d fix communicator double-free from pt2pt component, caused by r29938.
cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30264.

The following SVN revision numbers were found above:
  r29938 --> open-mpi/ompi@ecfb122c97
2014-01-12 17:38:14 +00:00
Ralph Castain
286ff6d552 For large scale systems, we would like to avoid doing a full modex during MPI_Init so that launch will scale a little better. At the moment, our options are somewhat limited as only a few BTLs don't immediately call modex_recv on all procs during startup. However, for those situations where someone can take advantage of it, add the ability to do a "modex on demand" retrieval of data from remote procs when we launch via mpirun.
NOTE: launch performance will be absolutely awful if you do this with BTLs that aren't configured to modex_recv on first message!

Even with "modex on demand", we still have to do a barrier in place of the modex - we simply don't move any data around, which does reduce the time impact. The barrier is required to ensure that the other proc has in fact registered all its BTL info and therefore is prepared to hand over a complete data package. Otherwise, you may not get the info you need. In addition, the shared memory BTL can fail to properly rendezvous as it expects the barrier to be in place.

This behavior will *only* take effect under the following conditions:

1. launched via mpirun

2. #procs is greater than ompi_hostname_cutoff, which defaults to UINT32_MAX

3. mca param rte_orte_direct_modex is set to 1. At the moment, we are having problems getting this param to register properly, so only the first two conditions are in effect. Still, the bottom line is you have to *want* this behavior to get it.
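
The three conditions above can be read as a single predicate. This is only an illustrative sketch: `example_use_direct_modex` and its parameters are hypothetical names, and it models the intended behavior with all three conditions in effect, not the actual ORTE code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate combining the three listed conditions. */
bool example_use_direct_modex(bool     launched_by_mpirun,
                              uint32_t nprocs,
                              uint32_t hostname_cutoff,
                              bool     direct_modex_param)
{
    return launched_by_mpirun          /* condition 1 */
        && nprocs > hostname_cutoff    /* condition 2 */
        && direct_modex_param;         /* condition 3 */
}
```

With the default cutoff of UINT32_MAX, the second condition can never hold for a 32-bit process count, which is why the behavior stays off unless the user lowers the cutoff.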

The planned next evolution of this will be to make the direct modex be non-blocking - this will require two fixes:

1. if the remote proc doesn't have the required info, then let it delay its response until it does. This means we need a way for the MPI layer to tell the RTE "I am done entering modex data".

2. adjust the SM rendezvous logic to loop until the required file has been created

Creating a placeholder to bring this over to 1.7.5 when ready.

cmr=v1.7.5:reviewer=hjelmn:subject=Enable direct modex at scale

This commit was SVN r30259.
2014-01-11 17:36:06 +00:00
Jeff Squyres
69ecf1670c Remove even more dead Fortran configury.
This configure option was only relevant when we were generating TKR
"use mpi" interfaces for MPI subroutines with choice buffers.  Now
that we aren't, the only interface that needs to accept a choice
buffer is MPI_SIZEOF (which we have to provide).  

And since there are now only several dozen interfaces in the "mpi" TKR
module, there's no reason to not generate ''all'' possible array rank
values (when there were thousands of interfaces, generating 4-vs-7
array ranks per interface per type was a big deal).  The default used
to be 4; now we can just hard-code it to 7, the max possible value for
Fortran 2003 (the max was raised to 15 in F2008, but let's
not go there for now).

cmr=v1.7.5:reviewer=dgoodell:subject=Remove even more dead Fortran configury

This commit was SVN r30257.
2014-01-11 14:06:59 +00:00
Jeff Squyres
b0ffdb3ae5 As noted by Paul Hargrove, older PGI compilers support ''some'' of
BIND(C), but not ''all'' of it.  So expand our configure checks to
look for multiple different forms of BIND(C):

 * ISO_C_BINDING
 * SUBROUTINE ... BIND(C)
 * TYPE, BIND(C)
 * TYPE(foo), BIND(C, name="bar")

If the compiler supports all of these, then declare that we support
BIND(C), and the rest of the mpi_f08 checks can continue.  If we miss
any one of those, don't bother continuing -- we won't build the
mpi_f08 module.

Also push the results of all of these tests down to ompi_info so that
they can be reported easily (e.g., "Hey, why doesn't my OMPI
installation have the mpi_f08 module?").

cmr=v1.7.4:reviewer=jsquyres:subject=Expand Fortran BIND(C) configure checks

This commit was SVN r30247.
2014-01-10 23:44:55 +00:00
Jeff Squyres
751aa195e9 Similar to r30244, make the libmpi_usempif08 Fortran library also
LIBADD libmpi.la

cmr=v1.7.4:reviewer=brbarret:subject=Add libmpi to libmpi_usempif08_LIBADD

This commit was SVN r30245.

The following SVN revision numbers were found above:
  r30244 --> open-mpi/ompi@7015343951
2014-01-10 21:33:10 +00:00
Jeff Squyres
7015343951 Make the Fortran libraries also LIBADD libmpi.la (libmpi_usempi for
TKR LIBADDs libmpi_mpifh; there is no library for libmpi_usempi ignore
TKR).

Refs trac:4085

This commit was SVN r30244.

The following Trac tickets were found above:
  Ticket 4085 --> https://svn.open-mpi.org/trac/ompi/ticket/4085
2014-01-10 21:30:58 +00:00
Jeff Squyres
34ae50a0ed Fix int <--> pointer casting by adding intermediate cast through (intptr_t)
Reviewed by Dave Goodell
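
A minimal sketch of the intermediate cast described above. The helper names `example_int_to_ptr` and `example_ptr_to_int` are hypothetical and do not appear in the usnic BTL.

```c
#include <stdint.h>

/* Widen the int to a pointer-sized integer first, so the compiler does
   not warn about converting between integers and pointers of different
   sizes (for example, a 32-bit int and a 64-bit pointer). */
void *example_int_to_ptr(int value)
{
    return (void *)(intptr_t)value;
}

/* The reverse direction: narrow through intptr_t before going to int. */
int example_ptr_to_int(void *ptr)
{
    return (int)(intptr_t)ptr;
}
```

The round trip is only meaningful for values that started life as an `int`; an arbitrary pointer would be truncated on 64-bit platforms.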

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Add intptr_t casting in usnic btl

This commit was SVN r30243.
2014-01-10 20:42:53 +00:00
Nathan Hjelm
5259ab213f Fix one more error path in udreg. In this case we hit the maximum size
of the udreg cache and get a different error code back.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30242.
2014-01-10 19:27:32 +00:00
Ralph Castain
9566650458 Per Marco, don't define a "min" function if one is already defined to avoid conflict with cygwin reserved word
This commit was SVN r30241.
2014-01-10 18:03:25 +00:00
Ralph Castain
880943dc10 Per Marco, rename "interface" to "tcp_interface" to avoid cygwin reserved word
This commit was SVN r30240.
2014-01-10 18:02:22 +00:00
Ralph Castain
c7a94a57d7 Per Marco, rename ERROR tags to exit_ERROR to avoid cygwin reserved name issues.
Refs trac:4085

This commit was SVN r30239.

The following Trac tickets were found above:
  Ticket 4085 --> https://svn.open-mpi.org/trac/ompi/ticket/4085
2014-01-10 18:00:49 +00:00
Jeff Squyres
350d989c00 Fix OpenBSD warnings where <malloc.h> is available and usable, but not
intended to be used and emits a compile-time warning.

Thanks to Paul Hargrove for identifying the issue.

cmr=v1.7.4:reviewer=hjelmn:subject=remove/replace malloc.h

This commit was SVN r30231.
2014-01-10 17:20:49 +00:00
Jeff Squyres
53a3defde9 s/CACHE_LINE_SIZE/BASESMUMA_CACHE_LINE_SIZE/g to avoid a system macro
name clash on some BSDs.

cmr=v1.7.4:reviewer=pasha

This commit was SVN r30230.
2014-01-10 16:48:43 +00:00
Edgar Gabriel
217e61e345 Add proper typecasts to intptr_t to avoid warnings on 32-bit systems.
This commit was SVN r30229.
2014-01-10 16:19:04 +00:00
Jeff Squyres
212e07a1e9 Don't instantiate+init variables in a switch block.
Avoid compiler warning about (unnecessarily) initializing 2 variables
during instantiation at the top of a switch block (but outside of any
case statements): just declare the variables at the top of the outer
block.  They're already safely initialized, so don't worry about
initializing them in the instantiation.
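
A sketch of the pattern described above, using a hypothetical function `example_payload_len`. The two variables are declared in the enclosing block and assigned inside each case.

```c
#include <stddef.h>

size_t example_payload_len(int kind, size_t base)
{
    /* Declared before the switch.  A declaration with an initializer
       placed between "switch (...) {" and the first case label would be
       jumped over, so its initializer would never run and the compiler
       warns about it. */
    size_t header;
    size_t trailer;

    switch (kind) {
    case 0:
        header  = 4;
        trailer = 0;
        break;
    case 1:
        header  = 8;
        trailer = 2;
        break;
    default:
        header  = 0;
        trailer = 0;
        break;
    }

    return base + header + trailer;
}
```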

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Don't instantiate+init variables in a switch block

This commit was SVN r30228.
2014-01-10 15:39:16 +00:00
Mike Dubman
110c99af4f sharing negative tag space between libNBC and HCOLL
fixed by devendar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30224.
2014-01-10 12:51:34 +00:00
Nathan Hjelm
52c231df3e ob1 does not check the return code of mpool_register. This can cause the
ob1 dummy registration to actually be used when using udreg. Fix this by
always setting reg to NULL when mpool/udreg's register function fails.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30214.
2014-01-10 00:46:16 +00:00
Jeff Squyres
b0b17c62aa Protect against orte_proc_applied_binding being NULL.
It is now possible for orte_proc_applied_binding to be NULL (e.g., if
you mpirun --bind-to none), so we need to ensure we don't pass it down
to opal_hwloc_base_cset2*str().
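
A sketch of the kind of guard described above. `example_binding_to_str` is a hypothetical helper that takes a plain string in place of the real binding object; it is not the OPAL hwloc API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: handle a NULL binding up front instead of
   passing it down to code that would dereference it. */
const char *example_binding_to_str(const char *applied_binding,
                                   char *buf, size_t len)
{
    if (NULL == applied_binding) {
        /* E.g., the job was run with no binding at all. */
        snprintf(buf, len, "%s", "not bound");
        return buf;
    }
    snprintf(buf, len, "bound to %s", applied_binding);
    return buf;
}
```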

Also, take the opportunity to de-duplicate some strings that are used
in multiple places.

Refs trac:4073

This commit was SVN r30204.

The following Trac tickets were found above:
  Ticket 4073 --> https://svn.open-mpi.org/trac/ompi/ticket/4073
2014-01-09 23:38:34 +00:00
Jeff Squyres
115025b8dd Ensure that the usnic BTL is only built on 64 bit Linux platforms.
Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Ensure the usnic BTL only builds on 64 bit Linux

This commit was SVN r30199.
2014-01-09 22:17:01 +00:00
Brian Barrett
013e0ec771 * Add multi-device support to the Portals 4 btl.
* Remove use of the Portals 4 proc tag for the btl, as it's causing more
problems than it's worth.

This commit was SVN r30191.
2014-01-09 20:01:42 +00:00
Ralph Castain
f179f2086b Do a better job of reporting bindings - if someone gives a spec that binds us to all processors, then we are effectively unbound and should report it clearly instead of outputting a long line of B's.
cmr=v1.7.4:reviewer=jsquyres:subject=Do a better job of reporting bindings

This commit was SVN r30179.
2014-01-09 16:16:16 +00:00
Nathan Hjelm
bb01fc2938 Add missing MCA variable enumerator sentinel.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30178.
2014-01-09 15:28:42 +00:00
Alina Sklarevich
2869ff1782 mxm: fixes for compilation warnings.
Removed set-but-not-used variables and an unused variable.

reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30176.
2014-01-09 15:15:14 +00:00
Mike Dubman
0fae2caef3 Create a comm keyval for hcoll component with delete callback function.
Set comm attribute with keyval.
Wait for pending hcoll module tasks in the comm delete callback, where the PML
is still valid on the communicator. Safely destroy the hcoll context during
the hcoll module destructor.

Author: Devendar Bureddy 
reviewed by miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30175.
2014-01-09 11:27:24 +00:00
Nathan Hjelm
10ecd80c8c Fix typo in udreg mpool that could cause us to try to use an invalid
registration. This was causing transaction errors on Aries systems.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30174.
2014-01-09 05:56:29 +00:00
Ralph Castain
2453843972 Add missing include - thanks to Paul Hargrove for spotting it
cmr=v1.7.4:reviewer=jsquyres:subject=add missing include in bcol

This commit was SVN r30171.
2014-01-09 03:57:55 +00:00
Jeff Squyres
776f6144af Part 2/companion to r30169: remove Fortran TKR interfaces for MPI
subroutines with choice buffers.

Refs trac:4065

This commit was SVN r30170.

The following SVN revision numbers were found above:
  r30169 --> open-mpi/ompi@759ee33fd4

The following Trac tickets were found above:
  Ticket 4065 --> https://svn.open-mpi.org/trac/ompi/ticket/4065
2014-01-09 02:23:20 +00:00
Jeff Squyres
759ee33fd4 Per thread starting here:
http://www.open-mpi.org/community/lists/users/2014/01/23327.php

Revert the Fortran mpi module default size to "small", meaning that we
won't provide interfaces for MPI subroutines that take a choice buffer
any more.  The short version is that MPI-3 p610:34-41 disallows it.

This commit simply removes all these subroutines from the build
process (i.e., remove them from nodist_libmpi_usempi_la_SOURCES).
Since MPI-3 actually forbids providing these interfaces, I'll do a
second commit to actually remove all the scripts and associated
Makefile.am junk.

cmr=v1.7.4:reviewer=dgoodell:subject=Remove choice buffer interfaces from Fortran mpi module

This commit was SVN r30169.
2014-01-09 01:33:13 +00:00
Jeff Squyres
ca84ffdbd4 Need to have BIND(C) on the callback interfaces. Reviewed/confirmed
by Tobias Burnus.

Refs trac:4058.

This commit was SVN r30164.

The following Trac tickets were found above:
  Ticket 4058 --> https://svn.open-mpi.org/trac/ompi/ticket/4058
2014-01-08 23:12:41 +00:00
Jeff Squyres
9d41632eba Change the MCA level to 2 (from 5) on the rationale that it may be
needed for correctness.  The if_include/if_exclude are level 1, and
the TCP port range params are level 2; this parameter seems to be on
par with the TCP port range params.

Refs trac:4019

This commit was SVN r30161.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 19:04:26 +00:00
Jeff Squyres
8c871c2db6 Fix some compiler warnings:
 * Remove some set-but-not-used variables
 * Make a convenience function return void (we weren't using the
   return code, anyway)
 * Mark a function as inline (it was supposed to be inline anyway)

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7:subject=Fix usnic BTL compiler warnings

This commit was SVN r30160.
2014-01-08 16:57:14 +00:00
Ralph Castain
cb31187bbe Correct tcp_not_use_nodelay option processing - change in mca param system incorrectly reversed the original parameter
Thanks to Tetsuya Mishima for detecting it!

cmr=v1.7.4:reviewer=jsquyres:subject=Correct tcp_not_use_nodelay option processing

This commit was SVN r30157.
2014-01-08 15:12:50 +00:00
Mike Dubman
43d6a30693 Fix problems of:
- HCOLL close without init
- Call hcoll progress after comm finalize
- mpirun default for coll_hcoll_enable is 1

fixed by Igor, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30156.
2014-01-08 10:55:25 +00:00
Jeff Squyres
36cca10042 Thanks to a reminder from Tobias Burnus, commit support for the
upcoming GCC/gfortran 4.9's ignore TKR interface.

This was originally committed in a side mercurial repo, but I sadly
completely forgot about it until Tobias reminded me.

cmr=v1.7.5:reviewer=dgoodell:subject=Add support for gfortran 4.9 Fortran ignore TKR

This commit was SVN r30152.
2014-01-08 03:46:27 +00:00
Ralph Castain
e2ca265f40 Per 1/7/2014 telecon: Add an MCA param to turn on all warnings for missing excluded interfaces.
Refs trac:4019

This commit was SVN r30146.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 00:21:25 +00:00
Jeff Squyres
13b29cff2c This commit complements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configury macro
names:

 * pkgdatadir -> ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Brian Barrett
7d472ad5a5 Improve some comments
This commit was SVN r30144.
2014-01-07 23:35:04 +00:00
Jeff Squyres
50d20ade82 Fix compiler warnings: remove unused variables
This commit was SVN r30143.
2014-01-07 23:21:47 +00:00
Jeff Squyres
8349e122e8 Fix compiler warning (signed/unsigned comparison)
This commit was SVN r30142.
2014-01-07 23:18:55 +00:00
Brian Barrett
afde8370b3 Pull both calls to get into one function, and wrap with the appropriate
reference count if flow control is enabled.

This commit was SVN r30141.
2014-01-07 23:15:09 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Tom Naughton
c01db6faca fix typo in btl:vader for OMPI_LOCAL_RANK_INVALID
This commit was SVN r30139.
2014-01-07 21:42:51 +00:00
Brian Barrett
dbcc53bc6f Fix a threading issue
Remove some unneeded UNLIKELYs

This commit was SVN r30138.
2014-01-07 19:41:39 +00:00
Rolf vandeVaart
b3edca19df Add braces per coding convention and design review.
This commit was SVN r30137.
2014-01-07 17:30:37 +00:00
Jeff Squyres
8bf4ad9030 Refs trac:4301
Complements r30073: tighten up the string parsing of the vendor parts
ID MCA param a bit.  Also fix a small memory leak: be sure to free the
array of uint32_t's parsed out of the MCA param.

This commit was SVN r30128.

The following SVN revision numbers were found above:
  r30073 --> open-mpi/ompi@6003702a51

The following Trac tickets were found above:
  Ticket 4301 --> https://svn.open-mpi.org/trac/ompi/ticket/4301
2014-01-06 22:16:04 +00:00
Nathan Hjelm
e627c91227 btl/vader: add support for traditional shared memory.
This commit adds support for placing the send memory segment in a
traditional shared memory segment when XPMEM is not available. The
current default is to reserve 4MB for shared memory on each process.
The latest benchmarks show vader performing better than sm on both
Intel and AMD CPUs.

For large messages vader will now use CMA if it is available (and
XPMEM is not).

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30123.
2014-01-06 19:51:44 +00:00
Nathan Hjelm
5c8ea3a251 btl/openib: Move free list memory allocation to add_procs
Per RFC which expired two weeks ago:

We are planning to make a change to Open MPI to always set up the btls. This
means the btl init will be called even if add_procs is never called for that
btl. In the openib btl, free list fragments are currently allocated in btl_init.
To avoid wasting that memory, this commit moves that final device setup to
the add_procs function. This includes allocating free lists and starting the
async event thread.

At this time this change is safe since we have a barrier after add_procs in
MPI_Init. If this changes we will need to re-think some of the initialization
since we might have the possibility of a connection request before add_procs
is called.

Tested with Mellanox ConnectX2 and QLogic HCAs.

Commit also cleans up tabs in btl_openib_async.c.

cmr=v1.7.5:reviewer=miked

This commit was SVN r30122.
2014-01-06 19:51:30 +00:00
Matthias Jurenz
03c5791104 Changes to VT/OTF:
Fixed compiler warnings seen with the Clang compiler.

This commit was SVN r30121.
2014-01-06 14:03:01 +00:00
Brian Barrett
d4bb1cbbad * Start working on thread safety of Portals 4 MTL
* Only call flowctl_add_procs if there's a new proc in the add_procs call

This commit was SVN r30110.
2014-01-02 22:37:01 +00:00
Brian Barrett
e811a8a9cb Make the Portals 4 collective component disable itself when there's not a
Portals 4 point-to-point (MTL or BTL) component in use

This commit was SVN r30109.
2014-01-02 22:35:37 +00:00
Oscar Vega-Gisbert
c9b7ea6d1a Comm.reduceLocal: add missing offset artifact
This commit was SVN r30108.
2014-01-02 21:57:48 +00:00
George Bosilca
fb0f7d7fa5 Fix the issue with the topologies attached to a communicator.
This commit was SVN r30107.
2014-01-02 17:38:09 +00:00
Ralph Castain
871f4e519c Silence warning
Refs trac:4040

This commit was SVN r30105.

The following Trac tickets were found above:
  Ticket 4040 --> https://svn.open-mpi.org/trac/ompi/ticket/4040
2014-01-02 16:05:54 +00:00
Oscar Vega-Gisbert
795131fc59 javadoc: remove old references to offset
This commit was SVN r30102.
2014-01-01 21:40:27 +00:00
Rolf vandeVaart
c47e06463d Adjust CUDA related crossover value.
This commit was SVN r30100.
2013-12-30 18:39:11 +00:00
Rolf vandeVaart
e7f430d9ac Add empty line that was inadvertently removed in message.
This commit was SVN r30099.
2013-12-30 18:38:07 +00:00
George Bosilca
947c180d7f Create a finalize function to provide an opportunity to the mpool
base to release the internal structures.

This commit was SVN r30098.
2013-12-29 11:45:46 +00:00
Ralph Castain
652f7a120f Add Mellanox device IDs that were included in prior releases, but somehow missing again here
cmr=v1.7.4:reviewer=miked

This commit was SVN r30095.
2013-12-26 17:47:05 +00:00
Ralph Castain
62378a64c8 As Jeff pointed out, the reqd flag should only turn off the show_help - still enter the rest of the code block
Refs trac:4019

This commit was SVN r30091.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2013-12-26 15:02:41 +00:00
Ralph Castain
a8a91b374e Update component-level selection comments to match latest revisions
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30087.
2013-12-25 19:12:43 +00:00
Jeff Squyres
12d23e9c92 Left out valid end-of-string comparison in r30073.
Refs trac:4031

This commit was SVN r30074.

The following SVN revision numbers were found above:
  r30073 --> open-mpi/ompi@6003702a51

The following Trac tickets were found above:
  Ticket 4031 --> https://svn.open-mpi.org/trac/ompi/ticket/4031
2013-12-24 12:07:56 +00:00
Jeff Squyres
6003702a51 Minor improvements to the usnic BTL:
1. Fix ompi_info memory leak in usnic BTL: do not allocate memory in
    the component register function, because ompi_info only calls the
    component register function and then dlclose's the component -- it
    does not call component finalize.  Instead, defer parsing the MCA
    param (and alloc'ing memory) until the component init function so
    that any allocated memory can be freed in the component close
    function.
 2. Also add a new check to ensure that we actually have some part
    numbers to check.  Add a show_help message if we don't find any
    vendor part IDs to check.
 3. Add a verbose output if usnic disqualifies itself from selection
    because THREAD_MULTIPLE was specified.
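
The first item above (record the parameter at register time, allocate only at init, free at close) can be sketched as follows. Every name here (`example_register`, `example_init`, `example_close`, `example_ids`) and the sample ID string are hypothetical; the real MCA component functions have different signatures.

```c
#include <stdint.h>
#include <stdlib.h>

static const char *example_param = "";    /* raw parameter string */
static uint32_t   *example_ids   = NULL;  /* parsed IDs, owned by init/close */
static size_t      example_count = 0;

/* Register: only remember the string.  Nothing is allocated here, so a
   tool that calls register and then unloads the component leaks nothing. */
int example_register(const char *value)
{
    example_param = value;
    return 0;
}

/* Init: parse the comma-separated list and allocate the array. */
int example_init(void)
{
    size_t cap = 1;
    for (const char *c = example_param; *c != '\0'; c++) {
        if (*c == ',') cap++;
    }

    example_ids = malloc(cap * sizeof(*example_ids));
    if (NULL == example_ids) return -1;

    const char *p = example_param;
    example_count = 0;
    while (*p != '\0' && example_count < cap) {
        char *end;
        unsigned long v = strtoul(p, &end, 10);
        if (end == p) break;                 /* no digits: stop parsing */
        example_ids[example_count++] = (uint32_t)v;
        p = (*end == ',') ? end + 1 : end;   /* step over the separator */
    }

    if (0 == example_count) {                /* nothing to check against */
        free(example_ids);
        example_ids = NULL;
        return -1;
    }
    return 0;
}

/* Close: release whatever init allocated. */
void example_close(void)
{
    free(example_ids);
    example_ids   = NULL;
    example_count = 0;
}
```

Returning an error when no IDs were parsed mirrors the second item in the list, where the real component emits a show_help message.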

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30073.
2013-12-24 11:57:35 +00:00
Jeff Squyres
365ce2cd03 Fix minor MPI thread memory leak / fix valgrind still-reachable warning.
cmr=v1.7.5:reviewer=brbarret:subject=Fix minor MPI thread memory leak

This commit was SVN r30072.
2013-12-24 11:05:51 +00:00
Jeff Squyres
bceaa347b1 Label what the GAP_TEST macro does. Print more meaningful output as
to what the test is doing (i.e., checking for gaps between struct fields).

This commit was SVN r30070.
2013-12-24 11:03:24 +00:00
Ralph Castain
6a432ca092 Per patch from Ashley Pittman, correct the name of the struct within which the code is looking for "mtc".
cmr=v1.7.4:reviewer=bosilca:subject=Correct name of struct

This commit was SVN r30061.
2013-12-23 21:32:16 +00:00
Ralph Castain
9eebb79d54 Cleanup a loop that couldn't possibly execute as the outer loop index was being reused by the inner loops, leaving the index at the cutoff point after the first iteration
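
An illustration of the bug class described above, not the sharedfp code itself. `example_count_pairs` is a hypothetical function showing the corrected shape, with a separate index for the inner loop.

```c
/* Visits every (row, col) pair once and returns how many were visited. */
int example_count_pairs(int rows, int cols)
{
    int count = 0;

    for (int i = 0; i < rows; i++) {
        /* The inner loop has its own index.  Reusing "i" here would
           leave it at "cols" when the inner loop finishes, so the outer
           loop would terminate (or skip ahead) after one iteration. */
        for (int j = 0; j < cols; j++) {
            count++;
        }
    }
    return count;
}
```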
cmr=v1.7.4:reviewer=edgar:subject=Cleanup loop in sharedfp

This commit was SVN r30059.
2013-12-23 18:34:34 +00:00
Nathan Hjelm
3be4536d9b Cleanup various leaks in ompi_info reported by valgrind.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30058.
2013-12-23 17:47:43 +00:00
Mike Dubman
80f4e02e0a Several changes:
- Modifications to coll/hcoll component related to the changes in the libhcoll API.
  Now, hcoll_destroy_context accepts one more parameter that indicates if the context was
  really destroyed as a result of the call.
  This new "non-blocking" context destruction fixes a hang discovered in IMB with mcast enabled.
- Clean up all the leftover contexts (if any) on the comm_world destruction.

fixed by Val, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30055.
2013-12-23 06:57:12 +00:00
Jeff Squyres
1448522d15 In an MPI_IBCAST, we cannot shortcut if there's only 1 process.
cmr=v1.7.4:reviewer=brbarret:subject=Fix IBCAST for COMM_SELF

This commit was SVN r30054.
2013-12-22 22:55:58 +00:00
Jeff Squyres
71ec6c1617 Remove unnecessary "mpi.h"; move opal headers to the top.
This commit was SVN r30053.
2013-12-22 20:38:43 +00:00
George Bosilca
b324884375 This might explain the current difficulties with the mapping...
This commit was SVN r30047.
2013-12-21 23:26:13 +00:00
George Bosilca
38cbaeaa82 Try to impose a little bit of consistency on how we parse lists of
modules by enforcing the use of OPAL list accessors.

This commit was SVN r30045.
2013-12-21 23:23:33 +00:00
Ralph Castain
042ed95e4e Remove an annoying warning. If the user excludes a non-existent interface, there is no reason to warn - the interface may simply not exist on that node.
cmr=v1.7.4:reviewer=jsquyres:subject=Remove an annoying warning

This commit was SVN r30042.
2013-12-21 01:51:11 +00:00
Adrian Reber
53a70fe87f Trying to get the C/R code to compile again. (send_*_nb)
This patch changes all send/send_buffer occurrences in the C/R code
to send_nb/send_buffer_nb.
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* just replace the blocking calls with the non-blocking calls
* all #ifdef's introduced in V1 are gone
* send_* returns error code or ORTE_SUCCESS (not the number of bytes)

This commit was SVN r30036.
2013-12-20 21:58:28 +00:00
Adrian Reber
a3813d37c7 Trying to get the C/R code to compile again. (recv_*_nb)
This patch changes all recv/recv_buffer occurrences in the C/R code
to recv_nb/recv_buffer_nb.
The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED).
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* only #ifdef out the code where the behaviour is changed
  (used to be blocking; now non-blocking)

This commit was SVN r30035.
2013-12-20 21:05:40 +00:00
Rolf vandeVaart
695d854cd8 Fix return value.
This commit was SVN r30034.
2013-12-20 20:57:04 +00:00
Ralph Castain
31248c0985 Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.

Refs trac:4003

This commit was SVN r30033.

The following Trac tickets were found above:
  Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
2013-12-20 20:42:39 +00:00
Rolf vandeVaart
4cd1958deb Fix so we do not get warnings when running on system without CUDA software installed and CUDA-aware compiled in.
This commit was SVN r30032.
2013-12-20 20:39:25 +00:00
Ralph Castain
0098f9f51a Remove remaining stale references
Refs trac:4006

This commit was SVN r30027.

The following Trac tickets were found above:
  Ticket 4006 --> https://svn.open-mpi.org/trac/ompi/ticket/4006
2013-12-20 17:48:28 +00:00
Dave Goodell
bd901a68ed usnic: fix 'fls' warnings+errors
The old version caused compilation errors on Solaris.  Thanks to Paul
Hargrove for testing and reporting the bug:

  http://www.open-mpi.org/community/lists/devel/2013/12/13520.php

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30025.
2013-12-20 17:37:22 +00:00
George Bosilca
7178492dd5 Correctly initialize and finalize all the datatype classes. No memory leaks
remain in the datatype engine.

This commit was SVN r30019.
2013-12-20 15:57:10 +00:00
Jeff Squyres
4739850931 As reported by Paul Hargrove
(http://www.open-mpi.org/community/lists/devel/2013/12/13521.php),
OpenBSD-5 #define's MIN and MAX, so we need to #undef them.
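
A sketch of the #undef approach described above. The first two definitions stand in for a system header that has already defined the macros; `example_clamp` is a hypothetical user of the result.

```c
/* Stand-in for a system header that already defines MIN and MAX
   by the time our own header is processed. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Drop any existing definitions so ours do not trigger
   macro-redefinition warnings or errors. */
#ifdef MIN
#undef MIN
#endif
#ifdef MAX
#undef MAX
#endif

#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Clamp value into the range [lo, hi] using the macros above. */
int example_clamp(int value, int lo, int hi)
{
    return MAX(lo, MIN(value, hi));
}
```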

cmr=v1.7.4:reviewer=rhc:subject=undef MIN and MAX for OpenBSD-5

This commit was SVN r30007.
2013-12-20 11:40:59 +00:00
Ralph Castain
6959ba5577 Add missing include file.
Thanks to Paul Hargrove for spotting it.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29998.
2013-12-19 23:39:21 +00:00
Dave Goodell
0c6b292442 romio: pick "infinitely stale" fix from upstream
Some NFS scenarios can result in an infinite ESTALE return, which will
hang ROMIO.  This commit causes ROMIO to error out after a large number
of retries instead of spinning forever.
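
A sketch of a bounded retry on ESTALE, in the spirit of the fix described above. It is not the ROMIO code: `example_retry_on_estale`, `example_always_stale` and the retry limit `EXAMPLE_MAX_RETRIES` are all hypothetical.

```c
#include <errno.h>

/* Hypothetical upper bound on attempts; the real limit may differ. */
#define EXAMPLE_MAX_RETRIES 1000

typedef int (*example_op_fn)(void *ctx);

/* Call op until it succeeds, fails with something other than ESTALE,
   or has been attempted EXAMPLE_MAX_RETRIES times.  Returns op's last
   return value; errno is left as op set it. */
int example_retry_on_estale(example_op_fn op, void *ctx)
{
    int rc;
    int tries = 0;

    do {
        errno = 0;
        rc = op(ctx);
    } while (rc == -1 && errno == ESTALE && ++tries < EXAMPLE_MAX_RETRIES);

    return rc;
}

/* Test helper: an operation that always reports a stale handle and
   counts how often it was called (ctx points to an int counter). */
int example_always_stale(void *ctx)
{
    ++*(int *)ctx;
    errno = ESTALE;
    return -1;
}
```

With an operation that never recovers, the caller gets an error after the last attempt instead of hanging.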

This is MPICH commit b250d338:

http://git.mpich.org/mpich.git/commit/b250d338e66667a8a1071a5f73a4151fd59f83b2

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r29993.
2013-12-19 22:55:26 +00:00
Ralph Castain
b745078535 Support user-provided envars for comm_spawn using info key "env"
Thanks to Tom Fogal for the request

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29990.
2013-12-19 20:59:50 +00:00
Jeff Squyres
3a14adef63 Remove the comments around these assignments; otherwise, we won't get
function pointers set to the _map functions, and we get segv's in MTT
testing (e.g., the C++ suite, which actually calls MPI_Cart_map and
MPI_Graph_map).

cmr=v1.7.4:reviewer=bosilca:subject=Fix topo _map function pointer assignments

This commit was SVN r29988.
2013-12-19 20:41:32 +00:00
Yossi Etigin
6ab4aba9e6 Fix missing include of show_help.h in mtl mxm.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29987.
2013-12-19 19:37:21 +00:00
Tom Naughton
3aefca32b0 + update rte db_fetch comments with change from r29931
This commit was SVN r29971.

The following SVN revision numbers were found above:
  r29931 --> open-mpi/ompi@0995a6f3b9
2013-12-19 01:16:58 +00:00
Jeff Squyres
bb59b07321 Remove CFLAGS setting that was really only intended for the v1.6
branch (it's not necessary on trunk/v1.7 because they require C99,
which allows variadic macros).

Also fix another compiler warning (using %p to print a (void*)).

Submitted by Jeff, reviewed by Dave.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=two usnic BTL fixes

This commit was SVN r29966.
2013-12-19 00:19:05 +00:00
Nathan Hjelm
b9765a380f Update NEWS with new MPI-3 features and a note about the new ROMIO
version.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29965.
2013-12-19 00:16:07 +00:00
Jeff Squyres
515fd00411 CSCul95082: DMAR faults during mtt testing
usnic_channel_finalize() was deregistering recv buffers before
destroying the QP to which they were posted. The QP needs to be
destroyed first so that the NIC does not attempt to write to
deregistered memory, causing the DMAR messages.

Submitted by Reese, reviewed by Jeff.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29963.
2013-12-19 00:01:35 +00:00
Ralph Castain
77553f72be Per this email thread:
http://www.open-mpi.org/community/lists/devel/2013/12/13412.php

fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch

This commit was SVN r29955.
2013-12-18 17:57:37 +00:00
Jeff Squyres
2665c91b2a Fixes trac:3958: use the right type name (mca_topo_base_module_t) in the
debugger code (not mca_topo_base_module_2_1_0_t).

I checked: we do a similar thing for coll in the communicator struct
(i.e., leave the version number off the module struct).  I confess to
not remembering ''why'' we leave the version number off, but it seems
to be consistent this way...

cmr=v1.7.4:reviewer=bosilca:subject=fix debugger type symbol lookup for mca_topo_base_module_t

This commit was SVN r29953.

The following Trac tickets were found above:
  Ticket 3958 --> https://svn.open-mpi.org/trac/ompi/ticket/3958
2013-12-18 15:17:15 +00:00
Yossi Etigin
ecfb122c97 Fix segfault in osc pt2pt completion handler, when the request is canceled during finalization.
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29938.
2013-12-17 17:30:14 +00:00
Ralph Castain
0995a6f3b9 Revert r29917 and replace it with a fix that resolves the thread deadlock while retaining the desired debug info. In an earlier commit, we had changed the modex accordingly:
* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This would provide the hostname for debugging purposes as we only report errors on messages, and so we must have called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first message - i.e., not during add_procs so we don't call it for every process in the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. Meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse than the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

This commit was SVN r29931.

The following SVN revision numbers were found above:
  r29917 --> open-mpi/ompi@1a972e2c9d
2013-12-17 03:26:00 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
George Bosilca
efb32da1e0 There is no need for this include.
This commit was SVN r29918.
2013-12-15 17:04:45 +00:00
George Bosilca
1a972e2c9d Don't be greedy, just do what we asked for.
This commit was SVN r29917.
2013-12-15 16:54:01 +00:00
George Bosilca
430a13719f Only if OMPI_BTL_SM_HAVE_CMA is set to 1.
This commit was SVN r29916.
2013-12-15 16:49:27 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Rolf vandeVaart
b955dbd6d9 Fix various items discovered by review of ticket #3951.
This commit was SVN r29900.
2013-12-13 21:25:07 +00:00
Jeff Squyres
f4afa4fd1f Add missing include, exposed in "external libevent" work.
Refs trac:3694

This commit was SVN r29898.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:21:30 +00:00
Jeff Squyres
bcfe2156d5 Bring over m4 quoting fix from v1.7 branch (in r29894) that was
discovered when removing some components.

This commit was SVN r29895.

The following SVN revision numbers were found above:
  r29894 --> open-mpi/ompi@58ed00296c
2013-12-13 20:27:33 +00:00
Brian Barrett
121ca26c59 Per discussion at Developer's Meeting, remove Solaris threads support. Solaris
will just fall back to pthreads, which should be no problem.

This commit was SVN r29893.
2013-12-13 20:07:11 +00:00
Ralph Castain
f763be26c4 Closes trac:2433. Check for hetero architecture and disqualify sm connections if that is found as the sm btl currently doesn't support hetero operations.
cmr=v1.7.4:reviewer=brbarret:subject=Disqualify sm btl for hetero procs

This commit was SVN r29882.

The following Trac tickets were found above:
  Ticket 2433 --> https://svn.open-mpi.org/trac/ompi/ticket/2433
2013-12-13 15:23:33 +00:00
Mike Dubman
fb3f94a16e remove debug print
Refs trac:3969

This commit was SVN r29876.

The following Trac tickets were found above:
  Ticket 3969 --> https://svn.open-mpi.org/trac/ompi/ticket/3969
2013-12-13 06:08:44 +00:00
Mike Dubman
21be95c9b5 Initialize sm global variables in mca_btl_sm_component_open(), because they are destructed in mca_btl_sm_component_close(), and the init() function might not be called or might fail.
For example, mca_btl_sm.knem_fd remained 0, and mca_btl_sm_component_close() ended up closing fd 0, which belongs to someone else.
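
The failure mode described above can be sketched in isolation. This is a minimal illustration, not the actual sm BTL code: `demo_component_t`, `demo_component_open()` and `demo_component_close()` are hypothetical stand-ins, and the only detail taken from the commit message is that a descriptor left at 0 ends up being closed by mistake.

```c
#include <unistd.h>

/* Hypothetical stand-in for a component's global state. */
typedef struct {
    int knem_fd;   /* descriptor owned by the component, or -1 if none */
} demo_component_t;

/* Static storage is zero-initialized, so knem_fd starts out as 0,
 * which is a valid descriptor (stdin) that we do not own. */
static demo_component_t demo_component;

/* Always called before any init(): mark the descriptor as invalid. */
static void demo_component_open(void)
{
    demo_component.knem_fd = -1;
}

/* Always called at shutdown, even if init() never ran or failed.
 * Returns 1 if a descriptor was closed, 0 if there was nothing to close. */
static int demo_component_close(void)
{
    if (demo_component.knem_fd < 0) {
        return 0;
    }
    close(demo_component.knem_fd);
    demo_component.knem_fd = -1;
    return 1;
}
```

With the sentinel set in the open step, the close step can run safely regardless of whether any later initialization succeeded.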

fixed by Yossi, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29875.
2013-12-13 06:01:24 +00:00
Jeff Squyres
bac67e0d81 Per discussion @Chicago OMPI dev meeting Dec 2013: remove all MX support.
This commit was SVN r29873.
2013-12-12 18:54:47 +00:00
Nathan Hjelm
3262080391 Cleanup udcm structures to avoid issues with nesting structures with
flexible members.

UDCM is ready to go for 1.7.4 with this patch.

cmr=v1.7.4:ticket=3940

This commit was SVN r29861.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-12 05:24:37 +00:00
Nathan Hjelm
e0e94a6029 Fix warning caused by typo in r29815
This commit was SVN r29860.

The following SVN revision numbers were found above:
  r29815 --> open-mpi/ompi@d556b60b21
2013-12-11 21:45:39 +00:00
Nathan Hjelm
6ab69c758b Fix warnings in udcm.
cmr=v1.7.4:reviewer=rhc:ticket=3940

This commit was SVN r29859.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-11 21:40:06 +00:00
Rolf vandeVaart
3ae88f8a24 Ensure no fork support with GDR. CUDA-aware code only.
This commit was SVN r29854.
2013-12-10 18:08:53 +00:00
Rolf vandeVaart
1cc55f305f Add extra check for GDR. Adjust some names and replace opal_output with opal_show_help.
This commit was SVN r29853.
2013-12-10 16:04:08 +00:00
Jeff Squyres
0f61bb651e Technically, the PORT_ACTIVE is not a "bad" event.
Note that this event should never happen within a single OMPI job,
because OMPI will ignore usnic ports that are down.  The PORT_ACTIVE
event should only occur if a port ''was'' down and is now ''up''.  But
what the heck -- if we ever do get this event, it is harmless -- just
ignore it.

This commit was SVN r29852.
2013-12-09 20:45:55 +00:00
Edgar Gabriel
c253c2eec6 fix the condition for the lazy open of shared filepointers.
This commit was SVN r29850.
2013-12-09 19:37:21 +00:00
Mike Dubman
9a65e0d8c6 cosmetic fixes for hcol autotools
Refs: #3694

This commit was SVN r29841.
2013-12-08 09:45:13 +00:00
Mike Dubman
2e124454b4 cosmetic fix to remove redundant -lfca
use CPP extra flags var which is propagated to coll/fca and scoll/fca
Refs: #3694

This commit was SVN r29832.
2013-12-07 15:00:54 +00:00
Jeff Squyres
3bd9c603ff Clean up variables used in configure with OPAL_VAR_SCOPE.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).

Refs trac:3694

This commit was SVN r29830.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-06 23:40:34 +00:00
Rolf vandeVaart
d556b60b21 Change some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Nathan Hjelm
231ebb09c9 Update romio configury to remove a warning message.
cmr=v1.7.4:ticket=3158

This commit was SVN r29811.

The following Trac tickets were found above:
  Ticket 3158 --> https://svn.open-mpi.org/trac/ompi/ticket/3158
2013-12-06 00:12:35 +00:00
Dave Goodell
da26226e3c usnic: add some extra debug-build sanity checks
On the off chance that the PML is twiddling fields that it really
shouldn't be...

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29804.
2013-12-05 00:28:11 +00:00
Oscar Vega-Gisbert
97fc83e29e Remove references to pinning support
This commit was SVN r29802.
2013-12-04 22:40:26 +00:00
Jeff Squyres
ba018b3603 Protect the container_of #define.
MOFED apparently has a /usr/include/infiniband/verbs.h that also
defines a (slightly different but fully compatible) container_of
macro.  So put proper #ifndef protection around our definition of
container_of.
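
As an illustration of the guard described above, here is a minimal sketch. The macro body is a common `offsetof`-based form and is an assumption, not necessarily the exact definition used in Open MPI or in MOFED's verbs.h; `demo_item_t`, `demo_frag_t` and `demo_frag_from_item()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Only define container_of if no other header has already done so. */
#ifndef container_of
#define container_of(ptr, type, member) \
    ((type *) (((char *) (ptr)) - offsetof(type, member)))
#endif

/* Hypothetical types, for illustration only. */
typedef struct {
    int id;
} demo_item_t;

typedef struct {
    int tag;
    demo_item_t item;   /* embedded member we hold a pointer to */
} demo_frag_t;

/* Recover the enclosing demo_frag_t from a pointer to its item member. */
static demo_frag_t *demo_frag_from_item(demo_item_t *item)
{
    assert(NULL != item);
    return container_of(item, demo_frag_t, item);
}
```

Because of the `#ifndef`, whichever definition is seen first wins, which is only safe when the competing definitions are compatible, as the commit message says they are.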

Thanks to Rolf vandeVaart for pointing out the issue.

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29799.
2013-12-04 14:24:56 +00:00
Yossi Etigin
a913b00f89 mtl mxm: update configuration parsing api to mxm 2.1, drop
older version support (1.0 and 1.1), and cleanup the code.

reviewed by miked.

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29797.
2013-12-04 09:11:55 +00:00
Jeff Squyres
c74c1e86d3 Per suggestion from Paul Kapinos, report in BTL verbosity if a device
is skipped because it is too far away.

(see thread starting here:
http://www.open-mpi.org/community/lists/devel/2013/06/12470.php)

This commit was SVN r29790.
2013-12-03 22:44:11 +00:00
Rolf vandeVaart
218c05a4d1 Make sure synchronous copies are complete before moving the data.
This commit was SVN r29789.
2013-12-03 21:20:14 +00:00
Rolf vandeVaart
ab77435d9b Fix the CUDA-aware case where we are not sending any GPU data.
This commit was SVN r29788.
2013-12-03 20:25:58 +00:00
Devendar Bureddy
4554770ee4 hcol fixes
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29787.
2013-12-03 20:21:40 +00:00
Nathan Hjelm
fe327d9859 udcm: cleanup code and improve the ack handling
Originally udcm acks used the immediate data to indicate which message was
being acknowledged. This data was (mysteriously) junk when using QLogic HCAs so I
updated udcm to use the source info (slid, qp, etc) to determine which message was being
acked. This works as long as we don't have two messages simultaneously in flight
to a particular peer and then lose the first of the two messages. The chances of this
happening are tiny. To fix this case I updated the udcm message header to include
a pointer to the in flight message. This pointer is then sent back to the sending
process to ack receipt.
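
The pointer-echo scheme described above can be sketched as follows. This is not the udcm code: `demo_msg_t`, `demo_hdr_t` and the three helper functions are hypothetical, and the sketch assumes the receiver treats the value as an opaque cookie that it never dereferences.

```c
#include <stdint.h>

/* Hypothetical in-flight message record kept by the sender. */
typedef struct {
    int acked;   /* set to 1 once the peer acknowledges this message */
} demo_msg_t;

/* Hypothetical wire header carrying an opaque sender-side cookie. */
typedef struct {
    uint64_t sender_ctx;   /* sender's pointer to its in-flight message */
} demo_hdr_t;

/* Sender: stamp the outgoing header with the address of the message. */
static void demo_fill_hdr(demo_hdr_t *hdr, demo_msg_t *msg)
{
    hdr->sender_ctx = (uint64_t) (uintptr_t) msg;
}

/* Receiver: build an ack that echoes the cookie back unchanged. */
static demo_hdr_t demo_make_ack(const demo_hdr_t *received)
{
    demo_hdr_t ack = { received->sender_ctx };
    return ack;
}

/* Sender: recover the exact message that is being acknowledged. */
static void demo_handle_ack(const demo_hdr_t *ack)
{
    demo_msg_t *msg = (demo_msg_t *) (uintptr_t) ack->sender_ctx;
    msg->acked = 1;
}
```

With two messages in flight to the same peer, an ack for the second one marks only the second one, even if the first was lost, which is the case that matching on source info alone could not distinguish.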

cmr=v1.7.4:ticket=trac:3940

This commit was SVN r29775.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-02 20:18:46 +00:00
Jeff Squyres
3a7af4ab40 Fix another clang warning: sendreq is undefined if proc==NULL.
cmr=v1.7.4:reviewer=hjelmn:subject=fix ob1 undefined sendreq value

This commit was SVN r29774.
2013-12-02 19:44:42 +00:00
Jeff Squyres
16c63c5bbe Fix conditional: don't just check the constant (thanks to clang for an
excellent warning message!)

cmr=v1.7.4:reviewer=hjelmn:subject=fix MCA_BASE_VAR_SOURCE_OVERRIDE test

This commit was SVN r29773.
2013-12-02 19:41:59 +00:00
Nathan Hjelm
fb0b0442c4 openib/connect: re-enable xrc support in the openib btl
This commit updates the udcm cpc to support xrc. The steps followed by udcm
mimic those in the removed xoob cpc. This update has been tested with both XRC
and RC.

Mellanox, this is intended to go into 1.7.4. Please review carefully and let
me know if there are any issues.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29767.
2013-11-27 22:28:04 +00:00
George Bosilca
cb24277737 Restrict the usage of MPI_Type_extent only to receiving processes
(aka the root). This commit is based on a patch provided by Pierre 
Jolivet.
Fix all the output to match the failing MPI call.

This commit was SVN r29761.
2013-11-27 12:09:31 +00:00
Rolf vandeVaart
aa98b0333b Call function from function table. Discovered during static build.
This commit was SVN r29755.
2013-11-25 22:46:07 +00:00
Matthias Jurenz
90ebdd920f Changes to VT:
- added preprocessor conditional for vt_cupti_events_enabled
	  (fixes compile error when CUDA-RT wrappers are enabled and CUPTI is disabled (as reported at: https://svn.open-mpi.org/trac/ompi/changeset/29752 by Jörg Bornschein))

This commit was SVN r29754.
2013-11-25 12:58:43 +00:00
Ralph Castain
ac9820c46f Link against common cuda library
Thanks to Jorg Bornschein for pointing it out

cmr=v1.7.4:reviewer=rolfv

This commit was SVN r29750.
2013-11-24 17:06:51 +00:00
George Bosilca
68268377af Fix an error message for the igather and the usage of the extent on
non-root processes for the iscatter. Thanks to Pierre
Jolivet for the bug report and the patch.

This commit was SVN r29736.
2013-11-23 00:59:22 +00:00
Matthias Jurenz
3923ee89ec Changes to VT/OTF:
Fixed warnings about the need of the 'subdir-objects' option when using Automake v1.14.
Due to a bug in Automake (see http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13928) the 'subdir-objects' option cannot be enabled.
To get around this problem external source files are symlinked in the current build directory (as done in ompi/mpi/c/profile) to lead Automake to believe that all source files are in the same directory.

This commit was SVN r29732.
2013-11-22 12:37:31 +00:00
Edgar Gabriel
4f425872be fix the streams used in opal_output in the sharedfp components.
This commit was SVN r29726.
2013-11-21 16:11:49 +00:00
Devendar Bureddy
4a311ae9fd continue search sorted openib device list if no btls found with nearest HCA.
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29725.
2013-11-20 22:23:12 +00:00
Nathan Hjelm
24a7e7aa34 Add support for the udreg registration cache and dynamics on XE/XK/XC.
To support the new mpool two changes were made to the mpool infrastructure:

 1) Added an mpool flag to indicate that an mpool does not need the memory
    hooks to use the leave pinned protocols. This flag is checked in the
    mpool lookup.

 2) Add a mpool context to the base registration. This new member is used
    by the udreg mpool to store the udreg context associated with the
    particular registration. The new member will not break the ABI
    compatibility as the new member is only currently used by the udreg
    mpool.

Dynamics support for Cray systems makes use of the global rank provided by
orte to give the ugni library a unique rank for each process. Dynamics
support is not available under direct-launch (srun.)

cmr=v1.7.4

This commit was SVN r29719.
2013-11-18 04:58:37 +00:00
Jeff Squyres
5206e877be Help decrease conflicts between SVN trunk and Cisco git branch of OMPI v1.6 branch
This commit was SVN r29715.
2013-11-15 21:35:56 +00:00
Jeff Squyres
e6ed7c9f4d Avoid trivial "don't mix declarations and code" compiler warning
This commit was SVN r29714.
2013-11-15 21:31:10 +00:00
Oscar Vega-Gisbert
da84609091 Update Javadoc
This commit was SVN r29713.
2013-11-14 22:18:45 +00:00
Rolf vandeVaart
92e6aaa808 Adjust a default value. Adjust some levels of verbosity and one more debug message.
This commit was SVN r29712.
2013-11-14 21:47:27 +00:00
Oscar Vega-Gisbert
73fc5d2b3a Update Javadoc
This commit was SVN r29710.
2013-11-14 21:19:16 +00:00
Ralph Castain
7480beb7f0 Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it.
This isn't being used yet - just enabling Nathan to do what he needs.

***** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *****
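
A minimal sketch of the guard the note asks for. The function `demo_have_global_rank_key()` is hypothetical; a real caller would perform its database lookup inside the `#ifdef` branch and take some fallback path otherwise.

```c
/* Returns 1 if this build's RTE defines the key, 0 otherwise. */
static int demo_have_global_rank_key(void)
{
#ifdef OMPI_DB_GLOBAL_RANK
    /* Safe to reference OMPI_DB_GLOBAL_RANK in this branch. */
    return 1;
#else
    /* Key is not defined by this RTE: callers must use a fallback. */
    return 0;
#endif
}
```

Compiled on its own, without an RTE header that defines the key, the function takes the fallback branch.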

This commit was SVN r29708.
2013-11-14 17:01:43 +00:00
Ralph Castain
22e30a680d Given that the oob and xoob cpc's are no longer operable and haven't been since the OOB update, remove them to avoid confusion
cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib

This commit was SVN r29703.
2013-11-14 04:16:53 +00:00
Nathan Hjelm
6b3cf0c1ba Merge branch 'romio_refresh'
This commit was SVN r29695.
2013-11-13 21:02:55 +00:00
Rolf vandeVaart
4964a5e98b Per this RFC from October 8, 2013 and as discussed in telecon.
http://www.open-mpi.org/community/lists/devel/2013/10/13072.php

Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers. 
Note that none of this is compiled in unless CUDA-aware support is requested.

This commit was SVN r29680.
2013-11-13 13:22:39 +00:00
Jeff Squyres
684dc2f849 Don't use the hard-coded name libmpi.so -- instead, use
libmpi.<OPAL_DYN_LIB_SUFFIX>, where OPAL_DYN_LIB_SUFFIX was determined
by configure.

Thanks to Ömer Demirel for reporting the issue.

Refs trac:3905.

This commit was SVN r29676.

The following Trac tickets were found above:
  Ticket 3905 --> https://svn.open-mpi.org/trac/ompi/ticket/3905
2013-11-13 03:25:18 +00:00
Jeff Squyres
98ff91cfeb Refs trac:3091
Gah!  The "device" variable isn't used at all in this loop (my eye
glossed over the next line and thought that "device" was used in the
free() statement, but it's actually "devices" -- not "device").

This commit was SVN r29665.

The following Trac tickets were found above:
  Ticket 3091 --> https://svn.open-mpi.org/trac/ompi/ticket/3091
2013-11-12 23:01:04 +00:00
Jeff Squyres
7cb31111a6 Refs trac:3901
Feedback from Dave's review.

This commit was SVN r29664.

The following Trac tickets were found above:
  Ticket 3901 --> https://svn.open-mpi.org/trac/ompi/ticket/3901
2013-11-12 22:51:20 +00:00
Ralph Castain
762400d559 Silence warning
Refs trac:3898

This commit was SVN r29659.

The following Trac tickets were found above:
  Ticket 3898 --> https://svn.open-mpi.org/trac/ompi/ticket/3898
2013-11-11 22:53:09 +00:00
Jeff Squyres
5a940f5ee7 Arrgh -- remove debugging printf.
This commit was SVN r29657.
2013-11-11 22:44:28 +00:00
Jeff Squyres
e20217eccc Expand the "btl_usnic" MPI_T enumeration to have strings of the form:
<usnic device name>,<eth device>,<ip address>/<CIDR prefix>

For example:

   usnic_0,eth4,10.1.0.15/16

This is just handy for mapping the usnic_X device back to the IP
network to which it corresponds.
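
For illustration, a string of that form could be built and picked apart as sketched below. `demo_format_usnic_enum()` and `demo_usnic_name_len()` are hypothetical helpers, not functions from the usnic BTL; only the string layout comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<usnic device name>,<eth device>,<ip address>/<CIDR prefix>".
 * Returns 0 on success, -1 if the result did not fit in buf. */
static int demo_format_usnic_enum(char *buf, size_t len,
                                  const char *usnic_name,
                                  const char *eth_name,
                                  const char *ip_addr,
                                  unsigned cidr_prefix)
{
    int n;

    assert(NULL != buf && len > 0);
    n = snprintf(buf, len, "%s,%s,%s/%u",
                 usnic_name, eth_name, ip_addr, cidr_prefix);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Return the length of the leading usnic device name (the text before
 * the first comma), or 0 if the string contains no comma. */
static size_t demo_usnic_name_len(const char *enum_str)
{
    const char *comma = strchr(enum_str, ',');
    return (NULL == comma) ? 0 : (size_t) (comma - enum_str);
}
```

Calling `demo_format_usnic_enum(buf, sizeof(buf), "usnic_0", "eth4", "10.1.0.15", 16)` yields the example string shown above.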

This commit was SVN r29656.
2013-11-11 22:25:30 +00:00
Nathan Hjelm
6a331275d8 Set transfers as active before starting them.
cmr=v1.7.4:ticket=trac:3898

This commit was SVN r29654.

The following Trac tickets were found above:
  Ticket 3898 --> https://svn.open-mpi.org/trac/ompi/ticket/3898
2013-11-11 21:50:54 +00:00
Nathan Hjelm
3d3c29ae96 btl/scif: do not return resource busy if we started a connection attempt.
Resolves a hang when using scif for shared memory transfers. This is a
simple change and doesn't require a review.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29653.
2013-11-11 19:36:34 +00:00
Nathan Hjelm
b5ce72cc15 Set the modex as active before starting it. This resolves a hang in
MPI_Init() on comm-spawned processes.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29652.
2013-11-11 19:33:32 +00:00
Rolf vandeVaart
a6df7bc33a Fix issues reported in ticket #3877. Also added additional comments.
This commit was SVN r29641.
2013-11-07 20:44:47 +00:00
Rolf vandeVaart
2cf7c40ee5 Minor adjustments to error messages due to review of #3880.
This commit was SVN r29640.
2013-11-07 20:21:21 +00:00
Rolf vandeVaart
3290cde630 Various minor changes to bring smcuda up to date with sm.
This commit was SVN r29639.
2013-11-07 19:45:56 +00:00
Dave Goodell
82db913490 usnic: fix module_recv_buffers perf regression
Cisco v1.6 git commit 913ec6c and upstream trunk r29593 (segfault fix)
introduced a performance regression by inadvertently disabling the
`module_recv_buffers` functionality.  With those changes in place, the
`btl_usnic_recv.c` logic would end up mallocing a buffer that should
have otherwise come from a `module_recv_buffers` pool.  It also resulted
in a small, bounded memory leak (128 buffers at each power-of-two size
interval).

The new version just places the buffer after the free list item with a
flexible array member.  I bumped the pool to allocate all 128 elements
up front because the deferred allocation was modestly impacting IMB
Sendrecv performance at a few sizes.
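
The layout described above, a buffer placed directly after its bookkeeping item via a flexible array member, can be sketched like this. `demo_rx_buf_t` and `demo_rx_buf_alloc()` are hypothetical and do not reflect the actual usnic free list types.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool element: a bookkeeping header followed directly by
 * its payload, both allocated as a single block. */
typedef struct demo_rx_buf {
    struct demo_rx_buf *next;   /* free-list linkage */
    size_t size;                /* number of payload bytes in buf[] */
    char buf[];                 /* flexible array member (C99) */
} demo_rx_buf_t;

/* Allocate one element with room for `size` payload bytes.
 * Returns NULL on allocation failure. */
static demo_rx_buf_t *demo_rx_buf_alloc(size_t size)
{
    demo_rx_buf_t *b = malloc(sizeof(*b) + size);
    if (NULL == b) {
        return NULL;
    }
    b->next = NULL;
    b->size = size;
    memset(b->buf, 0, size);
    return b;
}
```

A single `free()` of the element releases the header and the payload together, so there is no separately malloc'ed buffer to lose track of.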

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29631.

The following SVN revision numbers were found above:
  r29593 --> open-mpi/ompi@1ed9b8ff43
2013-11-07 01:27:31 +00:00
Vishwanath Venkatesan
d37a5faa20 Need not do aggregator selection for one process case
So adding a check for this corner case!

This commit was SVN r29622.
2013-11-06 21:05:26 +00:00
Brian Barrett
6d7a1fbb82 Move opal_portable_platform.h to opal/include/opal, which is where it really
should have been all along and fix one place that uses the file

Update opal_portable_platform.h with changes to mpi_portable_platform.h made 
in r29608.

Make mpi_portable_platform.h a symlink to opal_portable_platform.h, so that
they won't get out of sync.  I'd like to remove mpi_portable_platform.h, but
we don't automatically add -I${includedir}/openmpi/ to make that sane from
a header include point of view, so that's future work.

This commit was SVN r29618.

The following SVN revision numbers were found above:
  r29608 --> open-mpi/ompi@b71bd51cdd
2013-11-06 17:12:26 +00:00
Brian Barrett
cf8de1ef0f Minor indent cleanup in init_query()
Only use Portals on communicators with more than one rank
Fix computation of number of children when using the hypercube tree

This commit was SVN r29616.
2013-11-06 15:21:09 +00:00
Jeff Squyres
e28261898d Per discussion on the devel list, rename the btl_usnic_devices MPI_T
state pvar to be btl_usnic (i.e., the best suggestion so far).

See http://www.open-mpi.org/community/lists/devel/2013/11/13188.php
for more detail.

This commit was SVN r29614.
2013-11-06 06:19:03 +00:00
Brian Barrett
9780043456 re-apply r29608 and fix the broken configure test that broke worse with the
patch.  See ticket #3885, comment 10 for an explanation of why calling
_STRINGIFY on something that's not a numerical constant is always a bad idea.

This commit was SVN r29613.

The following SVN revision numbers were found above:
  r29608 --> open-mpi/ompi@b71bd51cdd
2013-11-05 22:41:10 +00:00
Jeff Squyres
36e9987ce6 Reverting r29608 (C++11 fix); it's causing some problems.
This commit was SVN r29611.

The following SVN revision numbers were found above:
  r29608 --> open-mpi/ompi@b71bd51cdd
2013-11-05 22:13:11 +00:00
Jeff Squyres
b71bd51cdd Fix C++11 issue identified by Jeremiah Willcock
(http://www.open-mpi.org/community/lists/users/2013/11/22912.php).

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r29608.
2013-11-05 19:48:03 +00:00
Rolf vandeVaart
e46c0bb952 Fix one more space for consistent defines.
This commit was SVN r29607.
2013-11-05 15:31:49 +00:00
Rolf vandeVaart
64b3a24fec Fix CUDA-aware compile issues.
This commit was SVN r29606.
2013-11-05 14:46:58 +00:00
Rolf vandeVaart
e57795f097 Revert r29594. That was just plain wrong. Sorry about workday configure change.
This commit was SVN r29605.

The following SVN revision numbers were found above:
  r29594 --> open-mpi/ompi@ed7ddcd9c7
2013-11-05 14:45:56 +00:00
Jeff Squyres
8aaa15c0ef Remove a line that looks like a copy-n-paste error.
This line results in a compile error when you configure thusly:

  ./configure CC=icc CXX=icpc FC=ifort FCFLAGS=-i8

cmr=v1.7.4:reviewer=hjelmn:subject=fix Fortran compile with -i8

This commit was SVN r29602.
2013-11-05 04:17:29 +00:00
Jeff Squyres
4160c37a4b Fix an ORTE -> OPAL name change that was missed in r29530.
This commit was SVN r29595.

The following SVN revision numbers were found above:
  r29530 --> open-mpi/ompi@588e7ce974
2013-11-05 03:29:26 +00:00
Rolf vandeVaart
ed7ddcd9c7 Fix CUDA-aware compile error introduced with r29581.
This commit was SVN r29594.

The following SVN revision numbers were found above:
  r29581 --> open-mpi/ompi@ee7510b025
2013-11-05 00:08:33 +00:00
Dave Goodell
1ed9b8ff43 usnic: fix segfault at finalize time
Without this commit, if you run IMB pingpong between two nodes with only
one usnic selected (e.g., via `--mca btl_usnic_if_include usnic_0`) then
the run will seem fine but will segfault at MPI_Finalize time.

This behavior has happened since Cisco v1.6 git commit ec7ddf8, upstream
trunk r29484, and upstream v1.7 r29507.

Root cause was that the free list element was being used as the recv
buffer instead of the data buffer associated with the element.  So the
reassembly code would stomp all over the free list element, which would
cause the destructor to explode when the free list attempted to clean up
all of its elements.  This surprisingly did not cause any other problems
until now.

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29593.

The following SVN revision numbers were found above:
  r29484 --> open-mpi/ompi@a6ed232a10
  r29507 --> open-mpi/ompi@790d269ce8
2013-11-04 22:52:14 +00:00
Dave Goodell
73a943492c usnic: pack via convertor on the fly
If we need to use a convertor, go back to stashing that convertor in the
frag and populating segments "on the fly" (in
ompi_btl_usnic_module_progress_sends).  Previously we would pack into a
chain of chunk segments at prepare_src time, unnecessarily consuming
additional memory.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29592.
2013-11-04 22:52:03 +00:00
Dave Goodell
71d0d73575 usnic: refactor callback invocation
This makes it a little easier to see what's happening with callbacks to
the PML.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29591.
2013-11-04 22:51:48 +00:00
Dave Goodell
4c791e21d2 usnic: add MSGDEBUG1_OUT/MSGDEBUG2_OUT macros
This includes suppressing picky-mode warnings about __VA_ARGS__, which
we know are supported by any compilers we care about.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29590.
2013-11-04 22:51:35 +00:00
Dave Goodell
825686a205 usnic: certain send frag members are immutable
Ensure that they never are touched by checking in their destructors.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29589.
2013-11-04 22:51:24 +00:00
Ralph Castain
eee7e49a4b Ensure the Java wrapper compiler files are in the tarball
This commit was SVN r29584.
2013-11-01 20:08:45 +00:00
Nathan Hjelm
c71125acfd Using MPI_* functions in iallreduce can cause comm-spawned processes to
crash. Update libnbc's iallreduce function to use ompi_* functions
instead.

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r29582.
2013-11-01 16:45:54 +00:00
Rolf vandeVaart
ee7510b025 Remove redundant macro. This was from review of an earlier ticket.
Fixes trac:3878.  Reviewed by jsquyres.

This commit was SVN r29581.

The following Trac tickets were found above:
  Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878
2013-11-01 12:19:40 +00:00
Rolf vandeVaart
99f9fdee01 Fix corner case involving threads and CUDA-aware support.
This commit was SVN r29579.
2013-10-31 20:53:46 +00:00
Nathan Hjelm
fd25b7af01 Fix common ugni Makefile.am for non-DSO builds.
This commit was SVN r29571.
2013-10-30 19:37:14 +00:00
Nathan Hjelm
a31e617d17 Remove outdated comments in coll_basic_reduce_scatter.c.
Refs trac:1559

This commit was SVN r29566.

The following Trac tickets were found above:
  Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
2013-10-30 16:20:20 +00:00
Mike Dubman
b0e64427a9 ompi/mca/btl/openib: Fix memory leak and accessing free'd memory issues
Let's imagine that we have two btls in btl_openib_component_init() that both point to the same openib_btl->device and as a result have the same openib_btl->device->endpoints array.

The finalization phase calls mca_btl_openib_finalize()->mca_btl_openib_finalize_resources() twice.
mca_btl_openib_finalize_resources() frees the endpoints related to the btl, but the second call of mca_btl_openib_finalize_resources() checks an endpoint that was already released by the previous call.

fixed by Igor, reviewed by miked/vasily
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29563.
2013-10-30 11:47:49 +00:00
Nathan Hjelm
167d5613db Do not do arithmetic with void * in basic neighborhood alltoall[vw].
cmr=v1.7.4:reviewer=jsquyres
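
A minimal sketch of the portable alternative: cast to `char *` before offsetting. Arithmetic directly on `void *` is accepted by some compilers as an extension but is not valid ISO C. The helper name `demo_buf_offset()` is hypothetical and is not taken from the coll/basic code.

```c
#include <stddef.h>

/* Advance a generic buffer pointer by `index` elements of `extent`
 * bytes each. The cast to char * makes the byte arithmetic well defined. */
static void *demo_buf_offset(void *base, size_t index, size_t extent)
{
    return (char *) base + index * extent;
}
```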

This commit was SVN r29558.
2013-10-29 20:02:13 +00:00
Jeff Squyres
6569019b06 Move all usNIC stats to _stats.c|h and export them as MPI_T pvars.
This commit moves all the module stats into their own struct so that
the stats only need to appear as a single line in the module_t
definition, and then moves all the logic for reporting the stats into
btl_usnic_stats.c|h.

Further, the stats are now exported as MPI_T_BIND_NO_OBJECT entities
(i.e., not bound to any particular MPI handle), and are marked as
READONLY and CONTINUOUS.  They currently all default to verbose level
5 ("Application tuner / detailed", according to
https://svn.open-mpi.org/trac/ompi/wiki/MCAParamLevels).

Most of the statistics are counters, but a small number are high
watermark values.  Due to how counters are reported via MPI_T, none of
the counters are exported through MPI_T if the MCA param
btl_usnic_stats_relative=1 (i.e., the module resets the stats back to
zero at a given frequency).

When MPI_T_pvar_handle_alloc() is invoked on any of these pvars, it
will return a count that is equal to the number of active usnic BTL
modules.  The values returned for any given pvar (e.g.,
num_total_sends) are an array containing one value for each active
usnic BTL module.  The ordering of values in the array is both
consistent across all usnic pvars and stable throughout a single job:
array slot 0 corresponds to module X, array slot 1 corresponds to
module Y, etc.

Mapping which array slot corresponds to which underlying Linux usnic_X
device works as follows:

 * The btl_usnic_devices MPI_T state pvar is associated with a
   btl_usnic_device MPI_T enum, and can be obtained via
   MPI_T_pvar_get_info().
 * If all usNIC pvars are of length N, the values [0,N) in the
   btl_usnic_device enum are associated with strings of the
   corresponding underlying Linux device.

For example, to look up which Linux device is reported in all usNIC
pvars' array slot 1, look up the int value 1 in the btl_usnic_devices
enum.  Its corresponding string value is the underlying Linux device name
(e.g., "usnic_1").

cmr=v1.7.4:subject="usnic BTL MPI_T pvars"

This commit was SVN r29545.
2013-10-28 22:23:08 +00:00
Nathan Hjelm
404cceb9c4 Always check the return of [mc]alloc and fix a warning introduced by
r29479.

This fixes some issues reported awhile ago in the openib btl. There
are a couple more unchecked mallocs but they are a bit more difficult
to fix since they are in void functions (btl_openib_endpoint.c).

Refs trac:2401.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29543.

The following SVN revision numbers were found above:
  r29479 --> open-mpi/ompi@d6ead2a3a5

The following Trac tickets were found above:
  Ticket 2401 --> https://svn.open-mpi.org/trac/ompi/ticket/2401
2013-10-28 20:04:49 +00:00