1
1

5227 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
b565e69b86 check-help-strings cleanup
This commit was SVN r32491.
2014-08-11 03:19:57 +00:00
Mike Dubman
3c8a4d7d2d mxm: opal refactoring voices
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32486.
2014-08-10 04:35:56 +00:00
Mike Dubman
0f60c34a9f fca: adopt opal API refactoring, fix warning.
based on http://www.open-mpi.org/community/lists/devel/2014/08/15558.php

This commit was SVN r32484.
2014-08-09 15:50:51 +00:00
Ralph Castain
e95187514c Update the proc structure dereference to reflect the new opal_proc_t super field
This commit was SVN r32462.
2014-08-08 16:12:49 +00:00
Jeff Squyres
132375f07f helpfiles: fix filenames referenced by calls to show_help()
This commit was SVN r32453.
2014-08-08 13:34:15 +00:00
Jeff Squyres
80a7309462 helpfiles: remove empty helpfiles
This commit was SVN r32452.
2014-08-08 13:33:47 +00:00
Jeff Squyres
c6d9bf906e configury: ensure wrapper static LIBS is filled properly
In core library portions of the configury (e.g., top-level
configure.ac itself), we were calling AC_CHECK_LIB and
OPAL_CHECK_FUNC_LIB to check for various libraries.

'''SIDENOTE:''' It turns out that modern Autoconf has AC_SEARCH_LIBS,
which does just about exactly what OPAL_CHECK_FUNC_LIB does.  So this
commit effectively replaces OPAL_CHECK_FUNC_LIB with AC_SEARCH_LIBS.

However, we never bothered to add these found libraries to the wrapper
compiler list of libraries used for static linking (doh!).  We've been
getting lucky for quite a while that components were adding the same
libraries to their wrapper compiler LIBS list.  

This is problematic, however, if we don't build some of these
components.  For example, Paul Hargrove noticed that if he configured
with --disable-shared --enable-static --disable-io-romio, ROMIO was no
longer adding some libraries to the wrapper LIBS list -- libraries
that just happened to also be needed by core OPAL/ORTE/OMPI layers.

The solution is not to use AC_CHECK_LIB or OPAL_CHECK_FUNC_LIB, but
use a pair of new macros:

 * OPAL_SEARCH_LIBS_CORE: a wrapper around AC_SEARCH_LIBS.  If we add
   something to $LIBS, then also add it to the wrapper list of static
   libraries.  This is the main piece of functionality that was
   wrong/missing.
 * OPAL_SEARCH_LIBS_COMPONENT: similar to OPAL_SEARCH_LIBS_CORE, but
   instead of directly adding it to the wrapper list of static
   libaries, add it to <framework>_<component>_LIBS (which eventually
   gets slurped up into the wrapper list of static libraries.  See the
   lengthy comment in config/opal_setup_wrappers.m4 near the beginning
   of OPAL_SETUP_WRAPPER_INIT() for a more detailed explanation).
   Most components did this correctly already, but one or two weren't
   right, so I implemented this second macro quite similar to the
   first and put it everywhere we already used AC_SEARCH_LIBS or
   OPAL_CHECK_FUNC_LIB.

This needs to soak for a day or two on the trunk before moving to the
v1.8 branch.

Refs trac:4834

cmr=v1.8.2:reviewer=ggouaillardet

This commit was SVN r32447.

The following Trac tickets were found above:
  Ticket 4834 --> https://svn.open-mpi.org/trac/ompi/ticket/4834
2014-08-07 23:54:45 +00:00
Ralph Castain
1e93e85403 Cleanup some autoconf messages - thanks to Paul Hargrove for noting them
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32429.
2014-08-05 14:48:42 +00:00
Ralph Castain
5f244e8b19 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32422.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-04 20:23:00 +00:00
Ralph Castain
2ceaa8a1dd Per patch from Thomas, remove orte abstraction breaks
This commit was SVN r32413.
2014-08-04 17:07:50 +00:00
Gilles Gouaillardet
5f1e0f284a Fix compilation when --enable-hetorogeneous
This commit was SVN r32410.
2014-08-04 10:35:08 +00:00
Gilles Gouaillardet
5b1ae87c76 coll/ml: fix ML_ERROR/printf parameters
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32409.
2014-08-04 04:05:59 +00:00
Gilles Gouaillardet
f7b13d1126 Fix missing ampersand.
also replase the OMPI_CAST_RTE_NAME macro with
an inline function if OPAL_ENABLE_DEBUG, so we can
get warnings from the compiler if ampersand is missing.

Thanks to Paul Hargrove for reporting the bugs

This commit was SVN r32408.
2014-08-04 02:52:56 +00:00
Ralph Castain
61bf7af9d2 Per Paul Hargrove's suggestion, create an opal_pagesize function to abstract the various ways of obtaining that value. Rather than creating a separate file for only that one function, put it in a convenient place that is at least somewhat related.
Refs trac:4826

This commit was SVN r32407.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-02 18:38:16 +00:00
Ralph Castain
daeb9b6c4f Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain.
Thanks to Gilles for pointing out some of the discrepancies.

This commit was SVN r32398.
2014-08-01 14:44:11 +00:00
Gilles Gouaillardet
cd8fa75f87 coll/ml: align on page size as returned by sysconf
Thanks to Paul Hargrove for pointing into the right direction

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32393.
2014-08-01 08:10:12 +00:00
George Bosilca
cee2a4e5c8 Missing alloca.h. Thanks Paul for catching this.
This commit was SVN r32388.
2014-08-01 03:28:23 +00:00
Ralph Castain
309d75dadc Add missing ampersand - function call required a pointer, not the name itself
This commit was SVN r32357.
2014-07-30 14:48:20 +00:00
Gilles Gouaillardet
b95537376f bcol/basesmuma: fix parameter order
Ref: #4815

This commit was SVN r32353.
2014-07-30 05:38:53 +00:00
Nathan Hjelm
0a32ea87e7 Remove unneeded EXTRA_DIST
This commit was SVN r32343.
2014-07-29 16:54:11 +00:00
George Bosilca
815d7bc846 Fix the example to use the new 2_2 topo component.
This commit was SVN r32338.
2014-07-29 07:00:01 +00:00
George Bosilca
78eae6108a Add an example on how to handle topologies.
This commit was SVN r32337.
2014-07-29 05:05:59 +00:00
Nathan Hjelm
1407c1f501 Remove RML code from common/sm
The only user of this code was coll/sm. I implemented a basic replacement
for the removed code. This gets the trunk compiling again with
--disable-dlopen.

This commit was SVN r32333.
2014-07-28 22:00:12 +00:00
Nathan Hjelm
0f15afa4d9 Fix typo in psm mtl
This commit was SVN r32332.
2014-07-28 22:00:03 +00:00
Nathan Hjelm
603ba71b0d Ignore components broken by the BTL move.
common/ofacm is only used by the iboffload code in ompi. This code does
not currently work so it is safe to ignore these components until it is
fixed.

This commit was SVN r32331.
2014-07-28 21:24:18 +00:00
Ryan Grant
caa10a5faf Portals fixes after latest move
This commit was SVN r32330.
2014-07-28 19:25:03 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Jeff Squyres
0ab4eaa7d3 usnic: revert r32315 because the BTL move to opal is ongoing
Let's not make the move to OPAL any harder than it has to be; this
commit can wait until after the BTL move.

This commit was SVN r32316.

The following SVN revision numbers were found above:
  r32315 --> open-mpi/ompi@7b7ed8ed97
2014-07-25 14:17:20 +00:00
Jeff Squyres
7b7ed8ed97 usnic: minor cleanup / consolidation
CMR'ing just to (try to) keep the differences between trunk and v1.8
branch (somewhat) small.

Reviewed by Dave Goodell

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32315.
2014-07-25 14:11:54 +00:00
Jeff Squyres
6ae45b34fc usnic: check connectivity on first communication to a peer
Previously, we were only checking connectivity upon first ''send'' to
a peer.  But this ignores the case where the first communication to a
peer is actually an ACK -- i.e., we successfully received something
from the peer and we need to send an ACK back.  So we need to verify
that the ACK will actually get there.

Specifically, certain asymmetric routing cases can lead to a hang if
we don't check the connectivity in both directions.  E.g., if the
sender is able to get traffic to the receiver, but the receiver is
unable to get traffic back to the sender because it made a different
routing decision than the sender.

In this case, the connectivity checker from the sender could succeed
(because the connectivity checker will ACK along the same path in
which the ping was received), but sending a BTL ACK could fail
(because the BTL ACK will be sent back along the path chosen by the
graph algorithm, which, in an erroneous asymmetric routing scenario,
may be different/wrong).

Hence, we want to trigger the connectivity checker at the first
communication from A->B, which may either be a BTL send or an ACK.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32309.
2014-07-24 21:32:56 +00:00
Rolf vandeVaart
9bc8fbaefd Create new error message so we can better pinpoint where an error occurs.
This commit was SVN r32303.
2014-07-24 15:18:55 +00:00
Rolf vandeVaart
3f703afb97 Fix CUDA registration where we run out of memory being allocated.
This commit was SVN r32297.
2014-07-23 21:10:17 +00:00
Todd Kordenbrock
42a871efd4 This commit fixes trac:4662 - "Portals4/MTL hangs in c_get_accumulate test".
- Portals4/OSC was unable to acquire an exclusive lock due to an invalid
local address in the atomic operation.  This caused the reported hang.
- After fixing the hang, the test continued to fail because
ompi_datatype_is_contiguous_memory_layout() reports that MPI_EMPTY (the
origin datatype) is noncontiguous and Portals4/OSC does not support
noncontiguous datatypes at this time.  However, in this case the origin
count is zero so contiguous/noncontiguous is irrelevant.  Now we skip
the contiguous check if the count is zero.

cmr=v1.8.3:reviewer=regrant:subject=Fix for "Portals4/MTL hangs in c_get_accumulate test"

This commit was SVN r32295.

The following Trac tickets were found above:
  Ticket 4662 --> https://svn.open-mpi.org/trac/ompi/ticket/4662
2014-07-23 19:13:07 +00:00
Edgar Gabriel
d4f83ab929 clean up of the MCA parameters of the fcoll framework. Most parameters are now
set/retrieved in ompio instead of the fcoll components.

This commit was SVN r32294.
2014-07-23 19:03:14 +00:00
Jeff Squyres
ac2621debf usnic: show_help if we can't create the connectivity map file
QA ran across the case where the user can't write to the target
directory for the connectivity map file.  In this case, we silently
continued.  They requested that we at least warn in this case.

Fixes Cisco bug CSCup62821

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32283.
2014-07-22 20:50:59 +00:00
Devendar Bureddy
74852b4d21 HCOLL: fix misplaced hcoll_init return value check.
cmr=v1.8.2:reviewer=jladd

This commit was SVN r32282.
2014-07-22 18:47:34 +00:00
Rolf vandeVaart
63d6a08283 Fix set-but-unused-warning noticed by jsquyres.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32281.
2014-07-22 18:37:40 +00:00
Howard Pritchard
828a4a29b7 Subject: fix regression in ugni btl eager get path
Description:
This mod fixes a regression in the ugni btl eager get
path introduced in changeset 32196.
References:4800
Closes:4800

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32264.
2014-07-22 15:42:11 +00:00
Rolf vandeVaart
8778418da2 Remove some debug #ifdefs (oops). Other lock support.
This commit was SVN r32263.
2014-07-22 02:09:06 +00:00
Rolf vandeVaart
1a61dd3078 Add some more locks where needed.
This commit was SVN r32262.
2014-07-22 00:29:57 +00:00
Rolf vandeVaart
7897d2a828 Improve verbose message which says which device:ports are being used. Also move where message is generated.
This commit was SVN r32261.
2014-07-21 20:38:52 +00:00
Jeff Squyres
da18eb1b8b common/verbs: fix usnic detection
The logic was mishandling the case of a newer kernel and an older
libusnic_verbs.  Simplify usnic_transport() to return constants in the
2 known cases (not a usNIC device and the TRANSPORT_USNIC_UDP case),
and call the magic probe in all other cases.

Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32260.
2014-07-21 19:52:29 +00:00
Jeff Squyres
b6075ea775 usnic: explicitly handle case when both endpoints are NULL
If we don't explicitly declare that (a == NULL && b == NULL) is
equivalent to qsort, we could end up with wonky sorting order.  I.e.,
it's *possible* that some NULLs could end up in the middle of the
array.

Regardless of whether it will ever happen in practice, it makes the
code more clear to also handle the "both are NULL" case.

Also fix the 2-spacing indents.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32259.
2014-07-21 16:22:48 +00:00
Rolf vandeVaart
947a4e14b4 Add a lock and clean up handling of some error conditions,
This commit was SVN r32258.
2014-07-17 19:33:10 +00:00
Mike Dubman
da8df859b3 MXM: use builk connection establishment API
fixed by Vasily, reviewed by Yossi/Miked

cmr=v1.8.2:reviwer=ompi-rm1.8

This commit was SVN r32256.
2014-07-17 08:35:55 +00:00
Rolf vandeVaart
26e3282a18 One more minor movement for easier reading. No functional change.
This commit was SVN r32252.
2014-07-16 20:59:07 +00:00
Rolf vandeVaart
c332ca75ff Change function name for clarity.
This commit was SVN r32251.
2014-07-16 20:46:10 +00:00
Rolf vandeVaart
a2dd4ca226 Remove hack that is no longer needed.
This commit was SVN r32250.
2014-07-16 14:00:17 +00:00
Rolf vandeVaart
61821adf2f Fix self deadlock bugs.
This commit was SVN r32249.
2014-07-15 20:50:41 +00:00
Ralph Castain
6c5e592785 Revert r32222, r32210, and r32203 as they created a problem when daemon collectives did not involve app procs on every node. Instead, modify the ompi/mca/rte/orte/rte_orte.h to add a new function that allows apps to request new daemon collective ids for use in barrier and modex operations. This will only appear in ORTE-based installations, but it is only being used by a couple of researchers at the moment.
Update the orte/test/mpi/coll_test.c test to show the revised example.

This commit was SVN r32234.

The following SVN revision numbers were found above:
  r32203 --> open-mpi/ompi@a523dba41d
  r32210 --> open-mpi/ompi@2ce11ed5c4
  r32222 --> open-mpi/ompi@d55f16db50
2014-07-15 03:48:00 +00:00