openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	7601e783cc	pmix3x: sec/munge: add a missing include file (cherry picked from upstream pmix/master@f7cfb11f6b)	2016-10-03 16:09:10 +09:00
Ralph Castain	e773c17cf3	Put show_help thru the PMIx "log" API. This pushes the show_help output from apps into the pmix thread, thus avoiding conflicts in the RML thread, which should help with thread lock situations.	2016-10-02 16:02:23 -07:00
Jeff Squyres	545d8f2e66	usnic cagent: correctly compute the "large" ping message size The (effective) "+42" computation was, in fact, the incorrect answer in this case (gasp!). We should just take the max_msg_size from the command (which came from the libfabric endpoint max_msg_size attribute in the client) and subtract off the max header size: 68 (which is explained in the comment). This will result in a "large" message size which is likely slightly smaller than the MTU, but still right up near the MTU, and therefore good enough. Note: the old computation (i.e., -(68-42)) worked fine when we asked for Libfabric API v1.1 because the usnic provider would return a max_msg_size that was already less than the MTU due to FI_PREFIX behavior shenanigans. Once we started asking for Libfabric API v1.4, the usnic Libfabric provider started returning (MTU + prefix_size), and the -(68-42) computation started giving a value that was over the MTU. This caused sendto() on the connectivity checker UDP socket to fail. This commit also removes an old/misleading comment. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-09-30 17:01:05 -07:00
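The size rule in the usnic commit above is plain arithmetic, so a small C sketch may make the difference between the two computations concrete. The function names, the macro names, and the example numbers (a 9000-byte MTU and a 42-byte prefix) are hypothetical illustrations, not values taken from the usnic code; only the 68-byte header size and the old -(68-42) offset come from the commit message.

```c
#include <stddef.h>

/* Maximum header size (bytes) subtracted from max_msg_size, per the
 * commit message; the value is explained in a comment in the source. */
#define CAGENT_MAX_HEADER_SIZE 68

/* Offset effectively used by the old computation: -(68 - 42). */
#define OLD_OFFSET (CAGENT_MAX_HEADER_SIZE - 42)

/* Corrected rule (hypothetical helper): provider max_msg_size minus the
 * maximum header size. Returns 0 if max_msg_size cannot hold a header. */
size_t large_ping_size(size_t max_msg_size)
{
    if (max_msg_size < CAGENT_MAX_HEADER_SIZE) {
        return 0;
    }
    return max_msg_size - CAGENT_MAX_HEADER_SIZE;
}

/* Old rule (hypothetical helper), kept only for comparison. */
size_t old_large_ping_size(size_t max_msg_size)
{
    if (max_msg_size < OLD_OFFSET) {
        return 0;
    }
    return max_msg_size - OLD_OFFSET;
}
```

With those hypothetical numbers, a provider following the Libfabric API v1.4 behavior reports max_msg_size = 9000 + 42 = 9042. The old rule yields 9042 - 26 = 9016 bytes, which exceeds the 9000-byte MTU (the sendto() failure described in the commit), while the corrected rule yields 9042 - 68 = 8974 bytes, just under the MTU.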
Joshua Hursey	f6f24a4f67	build: Custom libmpi(_FOO) name option in configure * Add a configure time option to rename libmpi(_FOO).* - `--with-libmpi-name=STRING` * This commit only impacts the installed libraries. Internal, temporary libraries have not been renamed to limit the scope of the patch to only what is needed. For example: ```shell shell$ ./configure --with-libmpi-name=wookie ... shell$ find . -name "libmpi" shell$ find . -name "libwookie" ./lib/libwookie.so.0.0.0 ./lib/libwookie.so.0 ./lib/libwookie.so ./lib/libwookie.la ./lib/libwookie_mpifh.so.0.0.0 ./lib/libwookie_mpifh.so.0 ./lib/libwookie_mpifh.so ./lib/libwookie_mpifh.la ./lib/libwookie_usempi.so.0.0.0 ./lib/libwookie_usempi.so.0 ./lib/libwookie_usempi.so ./lib/libwookie_usempi.la shell$ ```	2016-09-29 21:47:24 -05:00
Gilles Gouaillardet	871ade9231	pmix/{cray,s1,s2}: make pmi_opcaddy_t class static These three pmix components use the same class name; declare it static so that Open MPI can be built with --disable-dlopen. Thanks to Limin Gu for the report.	2016-09-28 09:18:36 +09:00
Jeff Squyres	1a5a5fb400	Merge pull request #1861 from bharatpotnuri/master btl/openib: Disqualify rdmacm CPC if MPI_THREAD_MULTIPLE	2016-09-27 13:03:35 -04:00
Potnuri Bharat Teja	740b636dbe	btl/openib: Disqualify rdmacm CPC if MPI_THREAD_MULTIPLE The rdmacm CPC in the openib BTL is not thread safe. The rdmacm CPC should disqualify itself (instead of failing in random ways) if MPI_THREAD_MULTIPLE is the thread level. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>	2016-09-27 14:20:59 +05:30
Gilles Gouaillardet	1fbc9a5431	pmix3x: dstore/pmix: flock portability Use fcntl() locking instead of flock() (back-ported from upstream pmix/master@3030a0cca1)	2016-09-27 13:21:03 +09:00
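For readers unfamiliar with the swap, here is a minimal sketch of whole-file locking with POSIX fcntl() record locks, the general technique the commit names. It is not the pmix dstore code, and the helper name set_file_lock is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: acquire or release a lock covering the whole
 * file, blocking until it is available (F_SETLKW).
 * type is F_RDLCK, F_WRLCK, or F_UNLCK.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_file_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 means "through end of file", i.e. the whole file */
    return fcntl(fd, F_SETLKW, &fl);
}
```

One design point worth knowing: flock() is a BSD interface that POSIX does not specify, whereas fcntl() record locks are part of POSIX. fcntl() locks are also owned by the process and are released when the process closes any descriptor for that file, which differs from flock() semantics. Note that a read lock requires a descriptor open for reading and a write lock one open for writing.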
George Bosilca	066370202d	Support non-monotonic assembly timers. If monotonic support has been required by the runtime and the assembly timers are unable to provide it, fall back to clock_gettime.	2016-09-23 21:51:34 -04:00
George Bosilca	45dcf1f5d7	Always use the best timer available If we have a better timer than clock_gettime, use it, even if it is an assembly timer.	2016-09-23 19:32:58 -04:00
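The two timer commits above (prefer the best available timer, but fall back to clock_gettime when monotonic time is required and the assembly timer cannot guarantee it) can be sketched as a small selection function. The asm_timer_t descriptor, select_timer, and the assumption that "best" simply means "the assembly timer, when present" are hypothetical; Open MPI's actual timer framework may rank timers differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*timer_fn)(void);

/* Hypothetical description of an architecture-specific (assembly) timer. */
typedef struct {
    bool available;     /* does this platform provide one? */
    bool monotonic;     /* is it guaranteed never to go backwards? */
    timer_fn read_usec; /* reads the timer in microseconds */
} asm_timer_t;

/* Portable fallback: CLOCK_MONOTONIC via clock_gettime, in microseconds. */
uint64_t clock_gettime_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

timer_fn select_timer(const asm_timer_t *t, bool need_monotonic)
{
    /* Prefer the assembly timer whenever it exists... */
    if (t->available && (t->monotonic || !need_monotonic)) {
        return t->read_usec;
    }
    /* ...unless monotonic time is required and it cannot provide it. */
    return clock_gettime_usec;
}
```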
George Bosilca	93fa94f96f	Re-enable support for local addresses. This patch is based on the "RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)". It removes the hardcoded exception for the local devices that has been enforced by the TCP BTL. Instead, we exclude the local interface only via the exclude MCA (both IPv4 and IPv6 local addresses are already in the default if_exclude), which is also the behavior currently described in our README file.	2016-09-23 13:04:33 -04:00
Gilles Gouaillardet	362a5886de	pmix3x: client: fix PMIx_Finalize() sequence pmix_progress_thread_finalize() invokes libevent's event_base_free(), so no libevent objects can be used afterwards. Hence, pmix_client_globals.myserver must be PMIX_DESTRUCT'ed before invoking pmix_progress_thread_finalize().	2016-09-24 00:01:23 +09:00
Gilles Gouaillardet	5479c6cca7	pmix3x: add missing #include so that Open MPI builds on OpenBSD 6.0	2016-09-23 11:23:18 +09:00
Gilles Gouaillardet	eaee1332e1	opal/util/ethtool: add missing headers so that Open MPI builds on OpenBSD 6.0	2016-09-23 11:22:19 +09:00
Ralph Castain	a14ec3bdbc	Mucho thanks to Gilles - his patch to reorder the CPPFLAGS solves the problem of inadvertently picking up hwloc and libevent headers from locations in CPPFLAGS while continuing to build the embedded versions. Also silence a minor warning about an uninitialized var.	2016-09-22 07:39:22 -07:00
George Bosilca	131fe42db8	Fix MT wait-sync. Prevent a race condition between a thread checking count and then going into cond_wait, and another thread setting the count to 0 and signaling the condition. Thanks to Pascal Deveze for catching the bug and for the initial patch.	2016-09-21 07:42:48 -04:00
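The race fixed here is the classic lost-wakeup problem: if a waiter checks the count without holding the mutex, a signaller can set the count to 0 and signal before the waiter reaches pthread_cond_wait(), and the waiter then sleeps forever. A minimal sketch of the standard pattern that avoids it follows; the sync_t type and its functions are hypothetical and are not the opal wait-sync implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

/* Hypothetical synchronization object. */
typedef struct {
    int count;             /* completions still outstanding */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} sync_t;

void sync_init(sync_t *s, int count)
{
    s->count = count;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
}

void sync_wait(sync_t *s)
{
    pthread_mutex_lock(&s->lock);
    /* The test of count and the sleep happen under the same lock, so a
     * signaller cannot slip in between them; the loop also absorbs
     * spurious wakeups. */
    while (s->count > 0) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
}

void sync_update(sync_t *s, int completed)
{
    pthread_mutex_lock(&s->lock);
    s->count -= completed;
    if (s->count <= 0) {
        s->count = 0;
        pthread_cond_broadcast(&s->cond); /* wake every waiter */
    }
    pthread_mutex_unlock(&s->lock);
}

void sync_destroy(sync_t *s)
{
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->lock);
}
```

Because the count is updated and tested only while the mutex is held, a signal can never fall into the window between "count is nonzero" and "start waiting".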
Gilles Gouaillardet	fbf03299c3	Merge pull request #2079 from ggouaillardet/topic/pmix_configury_dlopen pmix3x: configury: correctly handle --disable-dlopen	2016-09-21 10:59:33 +09:00
Gilles Gouaillardet	6c1e25b76e	pmix/ext11: fix pmix1_value_unload() prototype and call A "key" argument was added to pmix1_value_unload() but is unused, and pmix1_value_unload() was sometimes invoked with two arguments instead of three. Since the "key" argument is unused, simply remove it from the subroutine prototype and calls.	2016-09-20 14:34:41 +09:00
Gilles Gouaillardet	e6f7facd7d	opal/util: improve error message in opal_os_dirpath_create()	2016-09-18 17:10:47 +09:00
Gilles Gouaillardet	4b47daeeb0	opal/util: improve return status of opal_os_dirpath_create()	2016-09-18 12:32:42 +09:00
George Bosilca	295eec7059	Small fix for persistent receives. A minor optimization, a few typos and extra comments	2016-09-16 10:27:32 -04:00
Nathan Hjelm	2edc77b27b	asm/ppc: work around apparent PGI 16.9 bug The add_64, sub_64, and cmpset_64 atomics used "+m" (*addr) to indicate the asm also writes the memory location. This is better than using a memory clobber. PGI 16.9 introduced a bug that causes a compiler failure on the "+m" constraint (input/output). It seems to work with "=m" (output) which matches the 32-bit atomics. Fixes open-mpi/ompi#2086 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-15 12:43:31 -06:00
Gilles Gouaillardet	041a431966	pmix3x: configury: correctly handle --disable-dlopen The LT_* macros overwrite the enable_dlopen variable, so it must be tested and saved before invoking LT_INIT. Delay the invocation of the LT_* macros and use the PMIX_ENABLE_DLOPEN_SUPPORT variable to figure out whether --disable-dlopen was invoked.	2016-09-15 13:26:20 +09:00
Nathan Hjelm	4c9e38e8e0	Merge pull request #2077 from hjelmn/tcp_fix btl/tcp: fix double list remove	2016-09-13 12:21:52 -06:00
Nathan Hjelm	a681837ba8	btl/tcp: fix double list remove This commit fixes an abort during finalize because pending events were removed from the list twice. References #2030 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-13 09:23:12 -06:00
Gilles Gouaillardet	628c730196	pkgconfig: define the pkgincludedir variable in *.pc files This was made necessary by open-mpi/ompi@12e796dcaf Refs open-mpi/ompi#2069	2016-09-13 09:50:14 +09:00
Artem Polyakov	9eba1b0b75	Merge pull request #2042 from artpol84/pmix_sdirs Several fixes related to session directories:	2016-09-07 14:15:47 +07:00
Gilles Gouaillardet	cd2b5a82ed	hwloc: plug memory leak as reported by Coverity with CID 1270441	2016-09-07 10:08:44 +09:00
Gilles Gouaillardet	44a66e208c	threads: fix WAIT_SYNC_INIT with a zero count WAIT_SYNC_INIT(sync,0); WAIT_SYNC_RELEASE(sync); hung because sync->signaled was initialized to true, and there is no reason to invoke WAIT_SYNC_SIGNALED(sync) before WAIT_SYNC_RELEASE(sync). This commit initializes sync->signaled to true unless the count is zero. Thanks George for the review and guidance.	2016-09-07 10:03:40 +09:00
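A simplified, hypothetical model of the zero-count hang: the flag name follows the commit message, while the struct, the function names, and the use of C11 atomics are illustrative assumptions rather than the actual WAIT_SYNC_* macros.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wait-sync object. 'signaled' is true while a signal is
 * still expected, mirroring the commit message's description. */
typedef struct {
    int count;
    atomic_bool signaled;
} wait_sync_t;

void wait_sync_init(wait_sync_t *s, int count)
{
    s->count = count;
    /* The fix: with count == 0 nothing will ever signal, so do not mark
     * a signal as pending. The old code stored true unconditionally. */
    atomic_init(&s->signaled, count != 0);
}

/* Called by the signalling side once the count reaches zero. */
void wait_sync_signal(wait_sync_t *s)
{
    atomic_store(&s->signaled, false);
}

/* Spins until no signal is pending; with the old initialization and
 * count == 0, this loop would never exit. */
void wait_sync_release(wait_sync_t *s)
{
    while (atomic_load(&s->signaled)) {
        /* busy-wait */
    }
}
```

With this initialization, wait_sync_init(&s, 0) followed directly by wait_sync_release(&s) returns immediately, which is exactly the sequence the commit message reports as hanging.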
Nathan Hjelm	27a2509fec	Merge pull request #2051 from hjelmn/ppc_asm opal/asm: updates to powerpc assembly	2016-09-06 15:13:28 -06:00
Jeff Squyres	527efec4fb	Merge pull request #2050 from jsquyres/pr/btl-tcp-help-messages Add a show_help message to TCP BTL when peer unexpectedly disconnects	2016-09-06 09:40:31 -04:00
Jeff Squyres	1953e3406f	btl/tcp: add show_help message when peer hangs up We commonly see messages on the users list where a peer has hung up because it has crashed. Instead of having just a BTL_ERROR message, make this a real opal_show_help() message that tells the user that the peer unexpectedly hung up, and they should look into why that peer hung up. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-09-06 09:40:03 -04:00
Gilles Gouaillardet	894be7860a	gcc_builtin/atomic: Silence numerous warnings from Studio compilers This commit adds selective use of a compiler-specific pragma to silence the numerous warnings the Sun/Oracle/Studio compilers emit for the GNU-style inline asm used in atomic.h. Thanks Paul Hargrove for the initial patch and the guidance.	2016-09-06 09:07:16 +09:00
Gilles Gouaillardet	4b208e4463	btl/tcp: make mca_btl_tcp_proc_insert re-entrant otherwise bad things happen with --mca btl_tcp_progress_thread 1 (non default) and --mca mpi_add_procs_cutoff 0 (default)	2016-09-05 15:57:34 +09:00
Artem Polyakov	dc0ab674de	Add PMIx key to provide the RM with the ability to indicate that it will clean up session directories provided through OPAL_PMIX_TMPDIR, OPAL_PMIX_NSDIR, OPAL_PMIX_PROCDIR	2016-09-05 07:48:44 +03:00
Nathan Hjelm	a36bdfe69f	opal/asm: updates to powerpc assembly This commit contains the following changes: - There is a bug in the PGI 16.x betas for ppc64 that causes them to emit the incorrect instruction for loading 64-bit operands. If not cast to void * the operands are loaded with lwz (load word and zero) instead of ld. This does not affect optimized mode. The work around is to cast to void * and was implemented similar to a work-around for a xlc bug. - Actually implement 64-bit add/sub. These functions were missing and fell back to the less efficient compare-and-swap implementations. Thanks to @PHHargrove for helping to track this down. With this update the GCC inline assembly works as expected with pgi and ppc64. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-02 23:47:47 -06:00
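The second change above concerns 64-bit add/sub falling back to compare-and-swap. The sketch contrasts the two approaches using portable C11 <stdatomic.h> operations instead of the PowerPC inline assembly the commit actually touches; the function names are hypothetical, and both return the updated value.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Fallback: emulate a 64-bit atomic add with a compare-and-swap loop.
 * On failure, atomic_compare_exchange_weak stores the current value
 * into 'old', and the loop retries with the fresh value. */
int64_t add_64_via_cas(_Atomic int64_t *addr, int64_t delta)
{
    int64_t old = atomic_load(addr);
    while (!atomic_compare_exchange_weak(addr, &old, old + delta)) {
        /* another thread changed *addr; 'old' is refreshed, retry */
    }
    return old + delta;
}

/* Direct add: a single atomic fetch-and-add. */
int64_t add_64_direct(_Atomic int64_t *addr, int64_t delta)
{
    return atomic_fetch_add(addr, delta) + delta;
}
```

The CAS version must re-read and retry whenever another thread updates the value between the load and the exchange, which is why a native add is preferable when the architecture provides one.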
Jeff Squyres	95c6f6cfc0	btl/tcp: fix help message It looks like one help message was accidentally pasted in the middle of another. Disentangle the two messages from each other, and slightly tweak the one message to say that the job may also crash (in addition to hanging). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-09-02 17:14:22 -04:00
Nathan Hjelm	f93c1f2106	btl/ugni: fix erroneous warning message This commit prevents the connection code from trying to connect an endpoint if the directed datagram has been posted but not received. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-02 09:17:44 -06:00
Ralph Castain	34f04a7924	Remove spurious Makefile.am line	2016-09-01 15:31:09 -07:00
Ralph Castain	0ea1cff733	Implement notification of completion on comm_spawn'd child jobs. Add a configure flag to enable PMIx 3's shared memory datastore, and set it disabled by default so that comm_spawn functions again. Will reverse the default once that feature is fully functional	2016-09-01 13:10:10 -07:00
rhc54	39d086e000	Merge pull request #2035 from rhc54/topic/memprofile Provide a mechanism for obtaining memory profiles of daemons and application profiles for use in studying our memory footprint	2016-08-31 14:06:48 -05:00
Ralph Castain	39992d1ad7	Silence trivial Coverity warnings	2016-08-31 09:42:33 -07:00
Ralph Castain	c1050bc01e	Provide a mechanism for obtaining memory profiles of daemons and application profiles for use in studying our memory footprint. Setting OMPI_MEMPROFILE=N causes mpirun to set a timer for N seconds. When the timer fires, mpirun will query each daemon in the job to report its own memory usage plus the average memory usage of its child processes. The Proportional Set Size (PSS) is used for this purpose.	2016-08-31 09:32:07 -07:00
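As a rough illustration of the metric involved, the following sketch computes a process's Proportional Set Size on Linux by summing the "Pss:" fields of /proc/<pid>/smaps. Whether mpirun's daemons gather PSS this way is an assumption, and the function read_pss_kb is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: sum every "Pss:" entry in an smaps file
 * (Linux-specific) and return the total in kB, or -1 if the file
 * cannot be opened. Lines such as "Pss_Anon:" or "SwapPss:" do not
 * match the "Pss:" prefix and are skipped. */
long read_pss_kb(const char *smaps_path)
{
    FILE *fp = fopen(smaps_path, "r");
    if (fp == NULL) {
        return -1;
    }
    char line[512];
    long total = 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        long kb;
        if (sscanf(line, "Pss: %ld kB", &kb) == 1) {
            total += kb;
        }
    }
    fclose(fp);
    return total;
}
```

A daemon could, for example, call read_pss_kb("/proc/self/smaps") for itself and build the corresponding path for each child PID before averaging. Newer Linux kernels (4.14 and later) also provide /proc/<pid>/smaps_rollup, which reports pre-summed totals.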
Ralph Castain	cfa784c9a6	Since we changed storage to pointers in pmix_value_t, we need to allocate space for those values when unpacking	2016-08-29 20:22:24 -07:00
George Bosilca	a6d515ba9e	Fixes opal_atomic_ll_64. Thanks to Paul Hargrove for the report and his patch. This is an addition to #1140 and should go in 2.x	2016-08-27 12:43:48 -04:00
Nathan Hjelm	d33204b0dc	Merge pull request #2021 from hjelmn/xlc_fix opal/patcher: fix xlc support	2016-08-26 18:15:41 -06:00
rhc54	b90a64e734	Merge pull request #2022 from rhc54/topic/nnodes Provide the number of nodes in the job	2016-08-26 18:15:24 -05:00
Ralph Castain	2f6e0fec90	Provide the number of nodes in the job	2016-08-26 14:50:41 -07:00
Jeff Squyres	09ad7e81eb	Merge pull request #2007 from jsquyres/pr/usnic-show-local-udp-ports usnic: show the local UDP ports	2016-08-26 17:03:16 -04:00
Nathan Hjelm	a9bc692d99	opal/patcher: fix xlc support The xlc compiler seems to behave differently than gcc when it comes to inline asm. There were two problems with the code with xlc: - The TOC read in mca_patcher_base_patch_hook used the syntax register unsigned long toc asm("r2") to read $r2 (the TOC pointer). With gcc this seems to behave as expected but with xlc the result in toc is not the same as $r2. I updated the code to use asm volatile ("std 2, %0" : "=m" (toc)) to load the TOC pointer. - The OPAL_PATCHER_BEGIN macro is meant to be the first thing in a hook. On PPC64 it loads the correct TOC pointer (thanks to mca_patcher_base_patch_hook) and saves the old one. The OPAL_PATCHER_END macro restores the TOC pointer. Because we need the TOC to be correct before it is accessed in the hook the OPAL_PATCHER_BEGIN macro MUST come first. We did this and all was well with gcc. With xlc on the other hand there was a TOC access before the assembly inserted by OPAL_PATCHER_BEGIN. To fix this quickly I broke each hook into a pair of functions with the OPAL_PATCHER_* macros on the top level functions. This works around the issue but is not a clean way to fix this. In the future we should 1) either update overwrite to not need this, or 2) figure out why xlc is not inserting the asm before the first TOC read. This fixes open-mpi/ompi#1854 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-26 14:43:03 -06:00

4581 Commits