Commit Graph

25908 Commits

Author SHA1 Message Date
rhc54
ad156e3e91 Merge pull request #2207 from rhc54/topic/pmixupdate
Update PMIx support to latest PMIx master
2016-10-11 18:57:11 -05:00
rhc54
ee9f33f08c Merge pull request #2146 from rhc54/topic/rml2
Bring the RML modifications across
2016-10-11 18:54:59 -05:00
Jeff Squyres
bcbf0bc4f9 usnic: s/OMPI/OPAL/
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-11 16:43:35 -07:00
Ralph Castain
a2919174d0 Bring the RML modifications across.
This is the first step in a revamp of the ORTE messaging subsystem to support fabric-based communications during launch and wireup phases. When completed, the grpcomm and plm frameworks will each have their own "conduit" for communication - each conduit corresponds to a particular RML messaging transport. This can be the active OOB-based component, or a provider from within the RML/OFI component. Messages sent down the conduit will flow across the associated transport.

Multiple conduits can exist at the same time, and can even point to the same base transport. Each conduit can have its own characteristics (e.g., flow control) based on the info keys provided to the "open_conduit" call. For ease during the transition period, the "legacy" RML interfaces remain as wrappers over the new conduit-based APIs using a default conduit opened during orte_init - this default conduit is tied to the OOB framework so that current behaviors are preserved. Once the transition has been completed, a one-time cleanup will be done to update all RML calls to the new APIs and the "legacy" interfaces will be deleted.

While we are at it: Remove oob/usock component to eliminate the TMPDIR length problem - get all working, including oob_stress
2016-10-11 16:01:02 -07:00
Ralph Castain
6ce4b6d098 Eliminate -Wall from being hardcoded
2016-10-11 12:50:31 -07:00
Ralph Castain
1859b03416 Enable PMIx shared memory support by default
2016-10-11 12:18:01 -07:00
Ralph Castain
1d7d7c201b Update PMIx support to latest PMIx master
2016-10-11 10:17:23 -07:00
Nathan Hjelm
432d79046b Merge pull request #2197 from tkordenbrock/topic/master/osc-rdma.put.use.true_extent
osc-rdma: fix datatype lower bound errors in ompi_osc_rdma_master()
2016-10-11 10:42:02 -06:00
Ryan Grant
fd55204791 Merge pull request #2196 from tkordenbrock/topic/master/osc-portals4.put.use.true_extent
osc-portals4: fix datatype errors in put()
2016-10-10 08:57:12 -06:00
Todd Kordenbrock
05f86b5df7 osc-rdma: fix datatype lower bound errors in ompi_osc_rdma_master()
Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent()
to get the local and remote lower bound.  For derived types like
subarray, true_lb is the correct offset for RDMA operations.
2016-10-10 06:45:28 -05:00
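The difference between extent and true extent described in the commit above can be illustrated with a small model. This is only a sketch in plain C, not Open MPI code: the `subarray1d_t` struct and the `model_get_extent` / `model_get_true_extent` helpers are hypothetical stand-ins for a one-dimensional subarray datatype. It assumes the MPI convention that a subarray type's extent spans the full array (lower bound 0), while the true lower bound points at the first byte that actually holds data.

```c
#include <stddef.h>

/* Hypothetical model of a 1-D subarray datatype (not an Open MPI type). */
typedef struct {
    size_t size;       /* elements in the full array */
    size_t subsize;    /* elements selected by the subarray */
    size_t start;      /* index of the first selected element */
    size_t elem_bytes; /* bytes per element */
} subarray1d_t;

/* Extent: spans the full array, so the lower bound is always 0. */
static void model_get_extent(const subarray1d_t *t,
                             ptrdiff_t *lb, ptrdiff_t *extent)
{
    *lb = 0;
    *extent = (ptrdiff_t)(t->size * t->elem_bytes);
}

/* True extent: covers only the bytes the subarray actually selects. */
static void model_get_true_extent(const subarray1d_t *t,
                                  ptrdiff_t *true_lb, ptrdiff_t *true_extent)
{
    *true_lb = (ptrdiff_t)(t->start * t->elem_bytes);
    *true_extent = (ptrdiff_t)(t->subsize * t->elem_bytes);
}
```

For a subarray of 4 elements starting at index 3 in a 10-element array of 8-byte values, the model gives lb = 0 and true_lb = 24. An RDMA operation that used lb as its offset would start 24 bytes too early, which is the class of error the commit message describes.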
Todd Kordenbrock
cc863ff9fb osc-portals4: fix datatype errors in put()
Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent()
to get the origin and target lower bound.  For derived types like
subarray, true_lb is the correct offset for RDMA operations.  Also,
instead of the extent use the size of the datatype.
2016-10-10 06:45:14 -05:00
rhc54
b703f2e167 Merge pull request #2195 from rhc54/topic/notify
Implement the backend support for process-generated event notification
2016-10-08 13:55:19 -05:00
Ralph Castain
5b1484a836 Implement the backend support for process-generated event notification
2016-10-08 09:24:28 -07:00
Gilles Gouaillardet
315a622723 ompi: invoke opal_cleanup() in ompi_mpi_finalize() when possible
As long as it is illegal to call MPI_T_init_thread() after MPI_Finalize(),
be gentle and release as much memory as possible in MPI_Finalize().
opal_cleanup() will be invoked again by the OPAL destructor, but will
do nothing since classes was set to NULL
2016-10-08 16:58:20 +09:00
Gilles Gouaillardet
0d24fad307 opal: always run opal_class_finalize in the opal_cleanup destructor
If MPI_Init[_thread]/MPI_Finalize and MPI_T_init_thread/MPI_T_finalize
are balanced, opal_initialized is zero, and hence the opal_cleanup destructor
never invokes opal_class_finalize.
If neither MPI_Init[_thread] nor MPI_T_init_thread has been called, classes is NULL,
so opal_class_finalize does nothing.
2016-10-08 16:58:20 +09:00
Gilles Gouaillardet
b55dd2442a libevent2022: rename _event_strlcpy
2016-10-08 16:58:20 +09:00
Gilles Gouaillardet
c92e9a5406 use the new OPAL_HASH_TABLE_FOREACH convenience macro
2016-10-08 16:58:20 +09:00
Gilles Gouaillardet
23a8f764bd opal: add the OPAL_HASH_TABLE_FOREACH macro
this is a convenience macro similar to the OPAL_LIST_FOREACH macro,
that can be used to iterate on all the key/value pairs of an opal_hash_table_t
2016-10-08 16:58:20 +09:00
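The idea of a FOREACH convenience macro over a hash table, as described in the commit above, can be sketched as follows. This is not the OPAL macro and does not reproduce its signature: `demo_table_t`, `demo_table_set`, `demo_table_next`, `demo_sum_keys` and `DEMO_TABLE_FOREACH` are all hypothetical names, and the table is a tiny fixed-size open-addressing table chosen only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 8

/* Hypothetical fixed-size hash table with linear probing. */
typedef struct { int used; uint32_t key; void *value; } demo_slot_t;
typedef struct { demo_slot_t slots[DEMO_SLOTS]; } demo_table_t;

/* Insert or overwrite a key; returns 0 on success, -1 if the table is full. */
static int demo_table_set(demo_table_t *t, uint32_t key, void *value)
{
    for (size_t i = 0; i < DEMO_SLOTS; ++i) {
        demo_slot_t *s = &t->slots[(key + i) % DEMO_SLOTS];
        if (!s->used || s->key == key) {
            s->used = 1;
            s->key = key;
            s->value = value;
            return 0;
        }
    }
    return -1;
}

/* Index of the first used slot at or after i, or DEMO_SLOTS if none. */
static size_t demo_table_next(const demo_table_t *t, size_t i)
{
    while (i < DEMO_SLOTS && !t->slots[i].used) {
        ++i;
    }
    return i;
}

/* Iterate over all key/value pairs, assigning each pair to k and v. */
#define DEMO_TABLE_FOREACH(k, v, t)                                   \
    for (size_t demo_i_ = demo_table_next((t), 0);                    \
         demo_i_ < DEMO_SLOTS &&                                      \
         ((k) = (t)->slots[demo_i_].key,                              \
          (v) = (t)->slots[demo_i_].value, 1);                        \
         demo_i_ = demo_table_next((t), demo_i_ + 1))

/* Example use of the macro: add up every key stored in the table. */
static uint32_t demo_sum_keys(demo_table_t *t)
{
    uint32_t key, sum = 0;
    void *value;
    DEMO_TABLE_FOREACH(key, value, t) {
        (void)value;
        sum += key;
    }
    return sum;
}
```

The macro hides the first/next bookkeeping so that callers write a single loop header, which is the same convenience a list FOREACH macro offers.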
Gilles Gouaillardet
014f917462 opal: fix comment in OPAL_LIST_FOREACH macro. no code change.
2016-10-08 16:58:19 +09:00
Gilles Gouaillardet
13d49c135f Merge pull request #2193 from ggouaillardet/topic/pmix_misc_plugs_and_fixes
pmix3x: plugs misc memory leaks and misc fixes
2016-10-08 16:57:06 +09:00
Gilles Gouaillardet
f1f1fb15eb pmix3x: configury: output major, minor and release version after checking them
and hence fix the configure output

(back-ported from upstream commit pmix/master@7b7cdda2de)
2016-10-08 13:01:28 +09:00
Gilles Gouaillardet
f3af799608 pmix3x: misc fixes to get pmix to build on Solaris
- replace MAXHOSTNAMELEN with hardcoded 1024.
  unlike Linux, Solaris #defines MAXHOSTNAMELEN in <netdb.h>,
  so use a hard coded value to keep the test simple
- stdout cannot be assigned on Solaris, so use freopen instead

(back-ported from upstream commit pmix/master@a63f6e53f4)
2016-10-08 13:01:28 +09:00
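The freopen point in the commit above can be sketched in standard C. ISO C does not require `stdout` to be a modifiable lvalue, so an assignment such as `stdout = fopen(...)` is not portable, whereas `freopen()` re-targets an existing stream. The helper name `reopen_stream` is hypothetical and is not taken from the PMIx sources.

```c
#include <stdio.h>

/* Re-target an already open stream (for example stdout) to the file at
 * `path`, truncating it. Returns 0 on success, -1 on failure.
 * Portable alternative to assigning to the stream variable itself. */
static int reopen_stream(FILE *stream, const char *path)
{
    return (freopen(path, "w", stream) == NULL) ? -1 : 0;
}
```

A call such as `reopen_stream(stdout, "out.log")` would then send later `printf()` output to `out.log` (the file name here is only an example).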
Gilles Gouaillardet
5cbfddb8f1 pmix3x: fix misc memory leaks
(back-ported from upstream commit pmix/master@1eff526929)
2016-10-08 13:01:28 +09:00
Gilles Gouaillardet
b4e4e4a5f1 pmix3x: enhance pmix_nspace_t destructor
PMIX_RELEASE all elements stored in the internal and modex hash tables

(back-ported from upstream commit pmix/master@b90674fc52)
2016-10-08 13:01:27 +09:00
Gilles Gouaillardet
f1dc033767 pmix3x: add the PMIX_HASH_TABLE_FOREACH macro
this is a convenience macro similar to the PMIX_LIST_FOREACH macro,
that can be used to iterate on all the key/value pairs of a pmix_hash_table_t

(back-ported from upstream commit pmix/master@349971c68c)
2016-10-08 13:01:27 +09:00
rhc54
73298ad4e2 Merge pull request #2192 from rhc54/topic/showhelp
Send show_help out thru stderr
2016-10-07 22:43:44 -05:00
Ralph Castain
51b2bb1d41 Send show_help out thru stderr
2016-10-07 19:23:52 -07:00
Joshua Ladd
4f1b63d9a2 Merge pull request #2188 from jladd-mlnx/topic/oshmem-bump-to-v1.3
OSHMEM Specification version: Bump to v1.3.
2016-10-07 07:15:43 -04:00
Gilles Gouaillardet
1ef2ad029f fs: do not build the fs components configured with --disable-io-ompio
2016-10-07 13:15:04 +09:00
Joshua Ladd
fe2b8b7e06 OSHMEM Specification version: Bump to v1.3.
2016-10-06 22:12:07 +03:00
Gilles Gouaillardet
6c6e35bb40 ompi/communicator: silence warnings
2016-10-06 15:03:06 +09:00
Gilles Gouaillardet
b95e243f83 ompi/errhandler: silence warnings
ISO C forbids mixing object pointer and function pointer
2016-10-06 13:20:51 +09:00
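The warning named in the commit above arises because ISO C defines no conversion between object pointers (such as `void *`) and function pointers. One portable way to store a callback generically, shown here purely as an illustration and not as what the commit itself changed, is to keep it in a function-pointer type and cast back before calling. All names below (`generic_fn_t`, `demo_errhandler_t`, and the helpers) are hypothetical.

```c
/* Generic function-pointer type: ISO C guarantees that a function pointer
 * converted to another function-pointer type and back compares equal. */
typedef void (*generic_fn_t)(void);

/* Hypothetical handler signature used for this example. */
typedef void (*demo_errhandler_fn_t)(int *code);

/* Store the callback as a function pointer, never as void *. */
typedef struct { generic_fn_t fn; } demo_errhandler_t;

static void demo_errhandler_set(demo_errhandler_t *h, demo_errhandler_fn_t fn)
{
    h->fn = (generic_fn_t)fn;
}

/* Cast back to the original type before the call. */
static void demo_errhandler_invoke(const demo_errhandler_t *h, int *code)
{
    ((demo_errhandler_fn_t)h->fn)(code);
}

/* Sample handler: marks the error as handled by resetting the code. */
static void demo_mark_handled(int *code)
{
    *code = 0;
}
```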
Gilles Gouaillardet
95e63d7803 cxx bindings: fix support for --disable-mpi-io configure option
Fixes open-mpi/ompi#2179
2016-10-06 09:53:59 +09:00
Jeff Squyres
67684be7c9 usnic: fix one last stray fabric_attr->name --> linux_device_name
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-04 18:17:38 -07:00
Todd Kordenbrock
54c46ca14e Merge pull request #2156 from tkordenbrock/topic/raccumulate.offset.fix
osc-portals4: fix offset bug in raccumulate()
2016-10-04 20:11:54 -05:00
Gilles Gouaillardet
ebb43c59d9 travis: cope with brew upgrade failure
also try to brew install gcc before brew upgrade gcc
2016-10-05 09:50:34 +09:00
Gilles Gouaillardet
bc6724567f README: document --disable-io-romio and --disable-io-ompio configure options
2016-10-05 09:14:32 +09:00
Nathan Hjelm
6cdbdceee6 Merge pull request #2064 from hjelmn/cxx_isolation
mpi/cxx: isolate internal headers from C++ bindings
2016-10-04 12:54:11 -06:00
Nathan Hjelm
c6464cae37 mpi/cxx: isolate internal headers from C++ bindings
This commit adds some glue code to support the C++ bindings and
updates the bindings to use the new glue code. This protects our
internal headers (which are C99) from C++. This is done as a quick
workaround to compilation errors when the legacy C++ bindings are
requested.

Fixes open-mpi/ompi#2055

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-10-04 11:13:25 -06:00
Todd Kordenbrock
c536e11cf3 osc-portals4: fix offset bug in raccumulate()
This commit fixes a bug where the remote offset was used as both
the local and remote offset.

Thanks to @PDeveze for the patch.
2016-10-04 09:09:17 -05:00
Jeff Squyres
f3144c7a55 Merge pull request #2152 from jsquyres/pr/usnic-improvements
usNIC BTL improvements w.r.t. libfabric bootstrapping
2016-10-03 16:46:41 -04:00
Jeff Squyres
8b77359cac usnic: remove some legacy libfabric 1.0/1.1 code
We only support running with libfabric v1.3 or greater.  So it's safe
to remove the legacy/adaptive cq_readerr() behavior.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-03 11:59:41 -07:00
Jeff Squyres
345c07a252 usnic: require libfabric >= v1.3 at run time
There are critical usnic libfabric AV insert bugs before v1.3, so
don't allow any version prior to v1.3 at run time (still allow
*compiling* with earlier versions, though, since the ABI guarantees
allow us to compile with an earlier libfabric and run with a later
libfabric).

Switch to using fi_version() to check the version (instead of calling
fi_getinfo()) as a potentially lighter-weight / simpler solution.
This allows us to only call fi_getinfo() once.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-03 11:59:41 -07:00
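The run-time version gate described in the commit above can be sketched without linking against libfabric. The `DEMO_VERSION` packing (major in the upper 16 bits, minor in the lower 16) is assumed to mirror libfabric's `FI_VERSION()` macro, and `version_at_least` is a hypothetical helper. In real code the `runtime` value would come from `fi_version()`, the call the commit names.

```c
#include <stdint.h>

/* Assumed to mirror libfabric's FI_VERSION() packing: major << 16 | minor. */
#define DEMO_VERSION(major, minor) \
    (((uint32_t)(major) << 16) | (uint32_t)(minor))
#define DEMO_MAJOR(v) ((uint32_t)(v) >> 16)
#define DEMO_MINOR(v) ((uint32_t)(v) & 0xFFFFu)

/* Returns nonzero if the packed run-time version is >= major.minor.
 * With this packing, a plain integer comparison orders versions correctly. */
static int version_at_least(uint32_t runtime, uint32_t major, uint32_t minor)
{
    return runtime >= DEMO_VERSION(major, minor);
}
```

A component following the commit's policy would refuse to run when `version_at_least(runtime, 1, 3)` is false, while still being allowed to compile against an older header.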
Jeff Squyres
b13813810f usnic: print a helpful message when invoking the PML error callback
The previous message was unhelpful / confusing.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-03 11:59:41 -07:00
Gilles Gouaillardet
7601e783cc pmix3x: sec/munge: add a missing include file
(cherry picked from upstream pmix/master@f7cfb11f6b)
2016-10-03 16:09:10 +09:00
rhc54
507ef670e5 Merge pull request #2148 from rhc54/topic/showhelp
Put show_help thru the PMIx "log" API.
2016-10-02 19:04:56 -05:00
Ralph Castain
e773c17cf3 Put show_help thru the PMIx "log" API.
This pushes the show_help output from apps into the pmix thread, thus avoiding conflicts in the RML thread, which should help with thread lock situations.
2016-10-02 16:02:23 -07:00
Jeff Squyres
2975c7f5b8 Merge pull request #2143 from jsquyres/pr/usnic-fix-cagent-max-msg-size
usnic cagent: correctly compute the "large" ping message size
2016-09-30 20:57:21 -04:00
Jeff Squyres
545d8f2e66 usnic cagent: correctly compute the "large" ping message size
The (effective) "+42" computation was, in fact, the incorrect answer
in this case (gasp!).

We should just take the max_msg_size from the command (which came from
the libfabric endpoint max_msg_size attribute in the client) and
subtract off the max header size: 68 (which is explained in the
comment).  This will result in a "large" message size which is likely
slightly smaller than the MTU, but still right up near the MTU, and
therefore good enough.

Note: the old computation (i.e., -(68-42)) worked fine when we asked
for Libfabric API v1.1 because the usnic provider would return a
max_msg_size that was already less than the MTU due to FI_PREFIX
behavior shenanigans.  Once we started asking for Libfabric API v1.4,
the usnic Libfabric provider started returning (MTU + prefix_size),
and the -(68-42) computation started giving a value that was over the
MTU.  This caused sendto() on the connectivity checker UDP socket
to fail.

This commit also removes an old/misleading comment.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-09-30 17:01:05 -07:00
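The arithmetic in the commit above can be worked through in a few lines. This is a sketch, not the usnic source: the function names are hypothetical, the constants 68 and 42 come from the commit message, and the guard against underflow in the new computation is an addition made for the example.

```c
#include <stddef.h>

#define DEMO_MAX_HEADER_BYTES 68 /* max header size, per the commit message */
#define DEMO_OLD_ADJUST 42       /* the "+42" term of the old computation */

/* New computation: max_msg_size minus the maximum header size.
 * Returns 0 instead of wrapping if max_msg_size is too small. */
static size_t large_ping_size(size_t max_msg_size)
{
    return (max_msg_size > DEMO_MAX_HEADER_BYTES)
               ? max_msg_size - DEMO_MAX_HEADER_BYTES
               : 0;
}

/* Old computation: max_msg_size - (68 - 42), i.e. only 26 bytes removed. */
static size_t old_large_ping_size(size_t max_msg_size)
{
    return max_msg_size - (DEMO_MAX_HEADER_BYTES - DEMO_OLD_ADJUST);
}
```

As an illustration with assumed numbers (an MTU of 9000 bytes and a prefix_size of 42 bytes, neither of which is stated in the commit), a provider returning MTU + prefix_size reports max_msg_size = 9042. The old formula yields 9016, which exceeds the MTU, while the new formula yields 8974, slightly below it.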
Joshua Hursey
fc3cf994db build: Custom libmpi_FOO name fix for wrapper compilers
* In open-mpi/ompi@f6f24a4f67 I missed updating the library references
  for the wrapper compilers.
* Fixes the CXX wrapper compiler and the CXX library is renamed as needed.
* Fixes the Java wrapper compiler and the Java library is renamed as needed.
2016-09-30 16:40:56 -05:00