1
1
Граф коммитов

4421 Коммитов

Автор SHA1 Сообщение Дата
rhc54
698dac108b Merge pull request #2309 from rhc54/topic/pmix
Update to latest PMIx master - mostly updates example codes, but incl…
2016-10-27 14:21:46 -07:00
Ralph Castain
f4a55118e6 Update to latest PMIx master - mostly updates example codes, but includes one critical cleanup during finalize.
NOTE: set dstore shared memory storage option to "on" by default

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-10-27 11:16:03 -07:00
Nathan Hjelm
9d92075e60 btl/self: rewrite to decrease memory usage (#2307)
This commit rewrites much of the btl/self component to fix a long
standing memory usage bug. Before this commit the prepare_src path
would always allocate a max send fragment (256kB). This caused the
rank to allocate 32 * 256k useless buffers from one send. This commit
makes the following changes:

 - Add the MCA_BTL_FLAGS_GET flag by default. No reason not to set it.

 - Reduce the eager limit, max send size, buffers per allocation, and
   maximum buffer count per fragment size. These changes should have
   no noticible affect on performance but should greatly reduce the
   memory usage of the component.

 - Implement the sendi function. This should reduce self send latency
   somewhat.

 - Rewrite prepare_src to never allocate a eager or max send fragment
   for contiguous data.

 - add_procs needs to return something in the peer array for the proc
   self not just set the reachability bit. Now stores (void *) 1.

 - Various cleanups. Removed and unused file.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-10-27 12:34:54 -04:00
Ralph Castain
f298f294e1 Update PMIx to latest master tarball. Ensure we set the HNP name for orted's so that PMIx_Lookup can find the server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-10-26 15:48:56 -07:00
Gilles Gouaillardet
8cc3f288c9 opal: fix opal_class_finalize() usage
the class system can be initialized/finalized as many times as we like,
so there is no more need to have opal_class_finalize() invoked in a destructor

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-26 15:15:54 +09:00
Gilles Gouaillardet
f2a80dc09f configury: check libnl version and abort in case of conflict
libnl and libnl-3 are known to conflict with each other, so detect
and abort if these two libs are both used directly (e.g. Open MPI
uses libnl-3) or indirectly (e.g. libibverbs.so might depend on libnl)
2016-10-25 09:23:59 +09:00
Ralph Castain
649301a3a2 Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier).
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
2016-10-23 21:52:39 -07:00
rhc54
900ae15d49 Merge pull request #2221 from bharatpotnuri/master
btl/openib: remove unwanted ompi header inclusion in opal code.
2016-10-21 14:05:55 -05:00
Ralph Castain
9131eca9c6 Update to latest PMIx master 2016-10-20 21:13:40 -07:00
Ralph Castain
be3197fe27 Ensure that the libevent headers are installed for external libevent when --with-devel-headers is given. Correct the path for opal_config.h in the external hwloc header 2016-10-20 20:57:50 -07:00
Ralph Castain
2f966bf3bf Cleanup external PMIx v3 component for copy/paste errors - component and module require unique names 2016-10-20 09:11:46 -07:00
Ralph Castain
8113a8d1b0 Now that we are hiding symbols in the internal PMIx component, we cannot reuse that component for integration to the external PMIx master as the symbols don't match. So create a new "ext3x" component and copy the PMIx v3 integration over there.
Also, remove a couple of build-product files from the pmix3x component.
2016-10-18 13:15:32 -07:00
Ralph Castain
50c9f3de55 Ensure the PMIx progress thread is stopped prior to tearing anything down. Thanks to Gilles for spotting this error! 2016-10-18 00:27:52 -07:00
Gilles Gouaillardet
4e19cd51b1 hwloc/external: add a missing include file 2016-10-14 09:27:33 +09:00
Ralph Castain
6f65d0a173 Repair event notification support. Cleanup the long-suffering "epoll: warning" coming out of libevent whenever a process abnormally terminated.
Add changes to test program

Sync to PMIx master
2016-10-13 16:27:39 -07:00
Ralph Castain
6417f217e1 Turn PMIx dstore off by default as MTT was effectively broken 2016-10-13 08:14:51 -07:00
Potnuri Bharat Teja
29f1aa836f btl/openib: remove unwanted ompi header inclusion in opal code.
OMPI header cannot be included in OPAL source code, hence removed it.
Fixes: (740b636db) btl/openib: Disqualify rdmacm CPC if
MPI_THREAD_MULTIPLE.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
2016-10-13 16:21:36 +05:30
Nathan Hjelm
5b40fd267f Merge pull request #2204 from hjelmn/arm64
asm/arm64: ensure instruction ordering on timer
2016-10-12 11:22:28 -06:00
Nathan Hjelm
9a50ce6364 asm/arm64: ensure instruction ordering on timer
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-10-12 09:25:21 -06:00
Ralph Castain
8f05beb1ec Sync pmix/master@cb53105 2016-10-11 20:54:59 -07:00
rhc54
ad156e3e91 Merge pull request #2207 from rhc54/topic/pmixupdate
Update PMIx support to latest PMIx master
2016-10-11 18:57:11 -05:00
Jeff Squyres
bcbf0bc4f9 usnic: s/OMPI/OPAL/
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-11 16:43:35 -07:00
Ralph Castain
6ce4b6d098 Eliminate -Wall from being hardcoded 2016-10-11 12:50:31 -07:00
Ralph Castain
1859b03416 Enable PMIx shared memory support by default 2016-10-11 12:18:01 -07:00
Ralph Castain
1d7d7c201b Update PMIx support to latest PMIx master 2016-10-11 10:17:23 -07:00
Ralph Castain
5b1484a836 Implement the backend support for process-generated event notification 2016-10-08 09:24:28 -07:00
Gilles Gouaillardet
0d24fad307 opal: always run opal_class_finalize in the opal_cleanup destructor
if MPI_Init[_thread]/MPI_Finalize and MPI_T_init_thread/MPI_T_finalize
are balanced, opal_initialized is zero, and hence opal_cleanup destructor
never invokes opal_class_finalize.
if MPI_Init[_thread] nor MPI_T_init_thread have been called, classes is NULL,
so opal_class_finalize does nothing
2016-10-08 16:58:20 +09:00
Gilles Gouaillardet
b55dd2442a libevent2022: rename _event_strlcpy 2016-10-08 16:58:20 +09:00
Gilles Gouaillardet
c92e9a5406 use the new OPAL_HASH_TABLE_FOREACH convenience macro 2016-10-08 16:58:20 +09:00
Gilles Gouaillardet
23a8f764bd opal: add the OPAL_HASH_TABLE_FOREACH macro
this is a convenience macro similar to the OPAL_LIST_FOREACH macro,
that can be used to iterate on all the key/value pairs of an opal_hash_table_t
2016-10-08 16:58:20 +09:00
Gilles Gouaillardet
014f917462 opal: fix comment in OPAL_LIST_FOREACH macro. no code change. 2016-10-08 16:58:19 +09:00
Gilles Gouaillardet
f1f1fb15eb pmix3x: configury: output major, minor and release version after checking them
and hence fix the configure output

(back-ported from upstream commit pmix/master@7b7cdda2de)
2016-10-08 13:01:28 +09:00
Gilles Gouaillardet
f3af799608 pmix3x: misc fixes to get pmix build on Solaris
- replace MAXHOSTNAMELEN with hardcoded 1024.
  unlike Linux, Solaris #define MAXHOSTNAMELEN in <netdb.h>,
  so use a hard coded value to keep the test simpl
- stdout cannot be assigned on Solaris, so use freopen instead

(back-ported from upstream commit pmix/master@a63f6e53f4)
2016-10-08 13:01:28 +09:00
Gilles Gouaillardet
5cbfddb8f1 pmix3x: fix misc memory leaks
(back-ported from upstream commit pmix/master@1eff526929)
2016-10-08 13:01:28 +09:00
Gilles Gouaillardet
b4e4e4a5f1 pmix3x: enhance pmix_nspace_t destructor
PMIX_RELEASE all elements stored in the internal and modex hash tables

(back-ported from upstream commit pmix/master@b90674fc52)
2016-10-08 13:01:27 +09:00
Gilles Gouaillardet
f1dc033767 pmix3x: add the PMIX_HASH_TABLE_FOREACH macro
this is a convenience macro similar to the PMIX_LIST_FOREACH macro,
that can be used to iterate on all the key/value pairs of a pmix_hash_table_t

(back-ported from upstream commit pmix/master@349971c68c)
2016-10-08 13:01:27 +09:00
Jeff Squyres
67684be7c9 usnic: fix one last stray fabric_attr->name --> linux_device_name
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-04 18:17:38 -07:00
Jeff Squyres
8b77359cac usnic: remove some legacy libfabric 1.0/1.1 code
We only support running with libfabric v1.3 or greater.  So it's safe
to remove the legacy/adaptive cq_readerr() behavior.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-03 11:59:41 -07:00
Jeff Squyres
345c07a252 usnic: require libfabric >= v1.3 at run time
There are critical usnic libfabric AV insert bugs before v1.3, so
don't allow any version prior to v1.3 at run time (still allow
*compiling* with earlier versions, though, since the ABI guarantees
allow us to compile with an earlier libfabric and run with a later
libfabric).

Switch to using fi_version() to check the version (instead of calling
fi_getinfo()) as a potentially lighter-weight / simpler solution.
This allows us to only call fi_getinfo() once.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-03 11:59:41 -07:00
Jeff Squyres
b13813810f usnic: print a helpful message invoke PML error callback
The previous message was unhelpful / confusing.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-10-03 11:59:41 -07:00
Gilles Gouaillardet
7601e783cc pmix3x: sec/munge: add a missing include file
(cherry picked from upstream pmix/master@f7cfb11f6b)
2016-10-03 16:09:10 +09:00
Ralph Castain
e773c17cf3 Put show_help thru the PMIx "log" API. This pushes the show_help output from apps into the pmix thread, thus avoiding conflicts in the RML thread, which should help with thread lock situations. 2016-10-02 16:02:23 -07:00
Jeff Squyres
545d8f2e66 usnic cagent: correctly compute the "large" ping message size
The (effective) "+42" computation was, in fact, the incorrect answer
in this case (gasp!).

We should just take the max_msg_size from the command (which came from
the libfabric endpoint max_msg_size attribute in the client) and
subtract off the max header size: 68 (which is explained in the
comment).  This will result in a "large" message size which is likely
slightly smaller than the MTU, but still right up near the MTU, and
therefore good enough.

Note: the old computation (i.e., -(68-42)) worked fine when we asked
for Libfabric API v1.1 because the usnic provider would return a
max_msg_size that was already less than the MTU due to FI_PREFIX
behavior shenanigans.  Once we started asking for Libfabric API v1.4,
the usnic Libfabric provider started returning (MTU + prefix_size),
and the -(68-42) computation started giving a value that was over the
MTU.  This caused sendto() on the connectivity checker UDP socket
to fail.

This commit also removes an old/misleading comment.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-09-30 17:01:05 -07:00
Joshua Hursey
f6f24a4f67 build: Custom libmpi(_FOO) name option in configure
* Add a configure time option to rename libmpi(_FOO).*
   - `--with-libmpi-name=STRING`
 * This commit only impacts the installed libraries.
   Internal, temporary libraries have not been renamed to limit the
   scope of the patch to only what is needed.

For example:
```shell
shell$ ./configure --with-libmpi-name=wookie
...
shell$ find . -name "libmpi*"
shell$ find . -name "libwookie*"
./lib/libwookie.so.0.0.0
./lib/libwookie.so.0
./lib/libwookie.so
./lib/libwookie.la
./lib/libwookie_mpifh.so.0.0.0
./lib/libwookie_mpifh.so.0
./lib/libwookie_mpifh.so
./lib/libwookie_mpifh.la
./lib/libwookie_usempi.so.0.0.0
./lib/libwookie_usempi.so.0
./lib/libwookie_usempi.so
./lib/libwookie_usempi.la
shell$
```
2016-09-29 21:47:24 -05:00
Gilles Gouaillardet
871ade9231 pmix/{cray,s1,s2}: make pmi_opcaddy_t class static
theses three pmix components use the same class name,
declare it as static so Open MPI can be built with --disable-dlopen

Thanks Limin Gu for the report
2016-09-28 09:18:36 +09:00
Jeff Squyres
1a5a5fb400 Merge pull request #1861 from bharatpotnuri/master
btl/openib: Disqualify rdmacm CPC if MPI_THREAD_MULTIPLE
2016-09-27 13:03:35 -04:00
Potnuri Bharat Teja
740b636dbe btl/openib: Disqualify rdmacm CPC if MPI_THREAD_MULTIPLE
The rdmacm CPC in the openib BTL is not thread safe. The rdmacm CPC
should disqualify itself (instead of failing in random ways) if
MPI_THREAD_MULTIPLE is the thread level.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
2016-09-27 14:20:59 +05:30
Gilles Gouaillardet
1fbc9a5431 pmix3x: dstore/pmix: flock portability
Using the fcntl-locking instead of the flock

(back-ported from upstream pmix/master@3030a0cca1)
2016-09-27 13:21:03 +09:00
George Bosilca
066370202d Support non-monotonic assembly timers.
If monotonic support has been required by the runtime and the
assembly timers are unable to provide it, fall back to clock_gettime.
2016-09-23 21:51:34 -04:00
George Bosilca
45dcf1f5d7 Always use the best timer available
If we have better timer than clock_gettime use it, even if it an
assembly timer.
2016-09-23 19:32:58 -04:00