openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	44a66e208c	threads: fix WAIT_SYNC_INIT with a zero count WAIT_SYNC_INIT(sync,0); WAIT_SYNC_RELEASE(sync); hung because sync->signaled was initialised to true, and there is no reason to invoke WAIT_SYNC_SIGNALED(sync) before WAIT_SYNC_RELEASE(sync). This commit initializes sync->signaled to true unless the count is zero. Thanks George for the review and guidance.	2016-09-07 10:03:40 +09:00
Nathan Hjelm	27a2509fec	Merge pull request #2051 from hjelmn/ppc_asm opal/asm: updates to powerpc assembly	2016-09-06 15:13:28 -06:00
Jeff Squyres	527efec4fb	Merge pull request #2050 from jsquyres/pr/btl-tcp-help-messages Add a show_help message to TCP BTL when peer unexpectedly disconnects	2016-09-06 09:40:31 -04:00
Jeff Squyres	1953e3406f	btl/tcp: add show_help message when peer hangs up We commonly see messages on the users list where a peer has hung up because it has crashed. Instead of having just a BTL_ERROR message, make this a real opal_show_help() message that tells the user that the peer unexpectedly hung up, and they should look into why that peer hung up. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-09-06 09:40:03 -04:00
Gilles Gouaillardet	894be7860a	gcc_builtin/atomic: Silence numerous warnings from Studio compilers This commit adds selective use of a compiler-specific pragma to silence the numerous warnings the Sun/Oracle/Studio compilers emit for the GNU-style inline asm used in atomic.h. Thanks Paul Hargrove for the initial patch and the guidance.	2016-09-06 09:07:16 +09:00
Gilles Gouaillardet	4b208e4463	btl/tcp: make mca_btl_tcp_proc_insert re-entrant otherwise bad things happen with --mca btl_tcp_progress_thread 1 (non default) and --mca mpi_add_procs_cutoff 0 (default)	2016-09-05 15:57:34 +09:00
Artem Polyakov	dc0ab674de	Add PMIx key to provide the RM with the ability to indicate that it will clean up session directories provided through OPAL_PMIX_TMPDIR, OPAL_PMIX_NSDIR, OPAL_PMIX_PROCDIR	2016-09-05 07:48:44 +03:00
Nathan Hjelm	a36bdfe69f	opal/asm: updates to powerpc assembly This commit contains the following changes: - There is a bug in the PGI 16.x betas for ppc64 that causes them to emit the incorrect instruction for loading 64-bit operands. If not cast to void * the operands are loaded with lwz (load word and zero) instead of ld. This does not affect optimized mode. The workaround is to cast to void * and was implemented similarly to a workaround for an xlc bug. - Actually implement 64-bit add/sub. These functions were missing and fell back to the less efficient compare-and-swap implementations. Thanks to @PHHargrove for helping to track this down. With this update the GCC inline assembly works as expected with pgi and ppc64. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-02 23:47:47 -06:00
Jeff Squyres	95c6f6cfc0	btl/tcp: fix help message It looks like one help message was accidentally pasted in the middle of another. Disentangle the two messages from each other, and slightly tweak the one message to say that the job may also crash (in addition to hanging). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-09-02 17:14:22 -04:00
Nathan Hjelm	f93c1f2106	btl/ugni: fix erroneous warning message This commit prevents the connection code from trying to connect an endpoint if the directed datagram has been posted but not received. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-02 09:17:44 -06:00
Ralph Castain	34f04a7924	Remove spurious Makefile.am line	2016-09-01 15:31:09 -07:00
Ralph Castain	0ea1cff733	Implement notification of completion on comm_spawn'd child jobs. Add a configure flag to enable PMIx 3's shared memory datastore, and set it disabled by default so that comm_spawn functions again. Will reverse the default once that feature is fully functional	2016-09-01 13:10:10 -07:00
rhc54	39d086e000	Merge pull request #2035 from rhc54/topic/memprofile Provide a mechanism for obtaining memory profiles of daemons and application profiles for use in studying our memory footprint	2016-08-31 14:06:48 -05:00
Ralph Castain	39992d1ad7	Silence trivial Coverity warnings	2016-08-31 09:42:33 -07:00
Ralph Castain	c1050bc01e	Provide a mechanism for obtaining memory profiles of daemons and application profiles for use in studying our memory footprint. Setting OMPI_MEMPROFILE=N causes mpirun to set a timer for N seconds. When the timer fires, mpirun will query each daemon in the job to report its own memory usage plus the average memory usage of its child processes. The Proportional Set Size (PSS) is used for this purpose.	2016-08-31 09:32:07 -07:00
Ralph Castain	cfa784c9a6	Since we changed storage to pointers in pmix_value_t, we need to allocate space for those values when unpacking	2016-08-29 20:22:24 -07:00
George Bosilca	a6d515ba9e	Fixes opal_atomic_ll_64. Thanks to Paul Hargrove for the report and his patch. This is an addition to #1140 and should go in 2.x	2016-08-27 12:43:48 -04:00
Nathan Hjelm	d33204b0dc	Merge pull request #2021 from hjelmn/xlc_fix opal/patcher: fix xlc support	2016-08-26 18:15:41 -06:00
rhc54	b90a64e734	Merge pull request #2022 from rhc54/topic/nnodes Provide the number of nodes in the job	2016-08-26 18:15:24 -05:00
Ralph Castain	2f6e0fec90	Provide the number of nodes in the job	2016-08-26 14:50:41 -07:00
Jeff Squyres	09ad7e81eb	Merge pull request #2007 from jsquyres/pr/usnic-show-local-udp-ports usnic: show the local UDP ports	2016-08-26 17:03:16 -04:00
Nathan Hjelm	a9bc692d99	opal/patcher: fix xlc support The xlc compiler seems to behave differently than gcc when it comes to inline asm. There were two problems with the code with xlc: - The TOC read in mca_patcher_base_patch_hook used the syntax register unsigned long toc asm("r2") to read $r2 (the TOC pointer). With gcc this seems to behave as expected but with xlc the result in toc is not the same as $r2. I updated the code to use asm volatile ("std 2, %0" : "=m" (toc)) to load the TOC pointer. - The OPAL_PATCHER_BEGIN macro is meant to be the first thing in a hook. On PPC64 it loads the correct TOC pointer (thanks to mca_patcher_base_patch_hook) and saves the old one. The OPAL_PATCHER_END macro restores the TOC pointer. Because we need the TOC to be correct before it is accessed in the hook the OPAL_PATCHER_BEGIN macro MUST come first. We did this and all was well with gcc. With xlc on the other hand there was a TOC access before the assembly inserted by OPAL_PATCHER_BEGIN. To fix this quickly I broke each hook into a pair of functions with the OPAL_PATCHER_* macros on the top level functions. This works around the issue but is not a clean way to fix this. In the future we should 1) either update overwrite to not need this, or 2) figure out why xlc is not inserting the asm before the first TOC read. This fixes open-mpi/ompi#1854 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-26 14:43:03 -06:00
Jeff Squyres	87a5ccc060	usnic: show the local UDP ports Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-26 12:25:18 -07:00
Jeff Squyres	e03a40a0e9	pmix3x: remove generated file Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-26 10:30:47 -07:00
Jeff Squyres	9ae51a09f2	Merge pull request #1989 from jsquyres/pr/update-usnic-to-libfabric-v1.4 Update usnic BTL to libfabric v1.4	2016-08-26 09:53:07 -04:00
Gilles Gouaillardet	e4bf915e75	pmix3x: remove auto-generated file remove opal/mca/pmix/pmix3x/pmix/src/include/pmix_config.h.in .gitignore is correct, so it seems this file was added before .gitignore was updated	2016-08-26 15:00:18 +09:00
Ralph Castain	af67f16422	Update configury to support multiple PMIx versions, rename pmix2x component to pmix3x for support of PMIx master Update support for external v1.1.x and v2.x libraries. Minor corrections to the v3.x component	2016-08-25 18:19:05 -07:00
Gilles Gouaillardet	277c319389	opal/util: fix (again and again) incorrect type casting in opal_path_df and silence CID 1371767 this fixes previous commits : - open-mpi/ompi@2eec8970ff - open-mpi/ompi@a439afce5b	2016-08-26 09:42:45 +09:00
Nathan Hjelm	de32c779e2	opal/wait_sync: add #if protection on header Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-25 14:31:52 -06:00
Jeff Squyres	f56b16f079	usnic: remove unused variable Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 03:53:18 -07:00
Jeff Squyres	9717bcb7e6	btl/usnic: remove stale comment Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 03:53:18 -07:00
Jeff Squyres	6f5e377fe0	btl/usnic: update for libfabric v1.4 With libfabric v1.4, the usnic provider changed the values of its fabric and domain name strings (compared to libfabric <v1.4). Update the Open MPI usNIC BTL to handle both pre-v1.4 and v1.4 fabric/domain names. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 03:53:17 -07:00
George Bosilca	3adff9d323	Fixes #1793. Reshape the teardown process (connection close) to prevent race conditions between the main thread and the progress thread. Minor cleanups.	2016-08-24 22:45:19 -04:00
Nathan Hjelm	83062db7cb	btl/ugni: actually make the endpoint lock recursive Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-24 10:36:08 -06:00
Gilles Gouaillardet	2eec8970ff	opal/util: fix (again) incorrect type casting in opal_path_df this fixes previous commit open-mpi/ompi@a439afce5b	2016-08-24 12:50:15 +09:00
Gilles Gouaillardet	02847d9e7b	pmix2x: dstore: add missing <fcntl.h> include file in pmix_esh.c (back-ported from upstream pmix/master@5c66ffe0f0)	2016-08-24 11:18:46 +09:00
Gilles Gouaillardet	c11e8163f8	pmix2x: sec/native: fix the pmix_native module under solaris by using getpeerucred() and fail with a user friendly message if no method is available: "sec: native cannot validate_cred on this system" (back-ported from upstream pmix/master@c474a1fc60)	2016-08-24 11:18:40 +09:00
Gilles Gouaillardet	e91292aa41	pmix2x: configury: add missing check for <netdb.h> header file (back-ported from upstream pmix/master@e54ce6d423)	2016-08-24 11:18:32 +09:00
Gilles Gouaillardet	a439afce5b	opal/util: fix incorrect type casting in opal_path_df	2016-08-24 10:26:13 +09:00
Potnuri Bharat Teja	9b7f9ece20	Add Chelsio T6 adapter device parameters. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>	2016-08-23 10:38:13 +05:30
Ralph Castain	639dbdb7ea	For maintainability, fold the external PMIx 2.x integration into the internal PMIx 2.x library component. This ensures that the two always stay in sync, as that is becoming a problem.	2016-08-22 13:28:55 -07:00
George Bosilca	fd57f5bccd	Remove some of the clang warnings.	2016-08-20 14:21:42 -04:00
Ralph Castain	61ffba668b	Roll in the latest PMIx version - includes shared memory datastore and reduced memory footprint	2016-08-20 07:53:06 -07:00
Artem Polyakov	6ea8cccdab	Merge pull request #1969 from artpol84/pmix_jobid_fix Pmix jobid fix	2016-08-18 17:24:58 +07:00
Ralph Castain	7da9793fef	Support the PMIX_TIMEOUT key at the PMIx server when timeout=0 - this indicates that the user doesn't want a lookup of any data from the host RM.	2016-08-17 16:26:58 -05:00
Gilles Gouaillardet	3126ff77e2	pmix2x: common syms: whitelist bison-generated common symbols Bison generates some common symbols that we can't do anything about, so whitelist them.	2016-08-16 11:29:06 +09:00
Artem Polyakov	c5a91c5c9d	opal/pmix: fix pmix jobid calculation if external PMIx server is used.	2016-08-15 21:13:51 +03:00
Ralph Castain	ecbedee8bb	Fix typo	2016-08-15 07:32:00 -07:00
Artem Polyakov	f3c816b52e	opal/pmix: fix indentation in some files.	2016-08-15 18:21:50 +07:00
Gilles Gouaillardet	483685eb6a	update .gitignore remove autogenerated opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h.in	2016-08-15 17:00:20 +09:00
Ralph Castain	be8424b691	Provide backward compatible keys so that the non-PMIx components in the opal/pmix framework don't have to adjust as we continue to work on finalizing the PMIx reference scheme. Activate and utilize the new PMIx show_help capability to provide more meaningful error output when the server cannot start. Add a contrib script to clean up permissions incorrectly modified due to things like smb mounts	2016-08-13 12:13:04 -07:00
rhc54	ddde154d28	Merge pull request #1962 from rhc54/topic/notify Ensure we properly convert pmix status to ORTE state before activatin…	2016-08-13 06:59:50 -07:00
Ralph Castain	48d35a9627	Ensure we properly convert pmix status to ORTE state before activating an error state upon notification. Cleanup some conversion issues on notification info. Add a new orte_notify.c test program	2016-08-12 21:14:29 -07:00
Ralph Castain	4a4c9703a9	Setup the job list in the PMIx integration so that static ports can run	2016-08-12 13:27:10 -07:00
Ralph Castain	0e58609327	Fix a bug where we were requiring that all paths in $PATH be absolute. Some users provide relative paths in their environment, and we should respect those.	2016-08-12 11:28:57 -07:00
Ralph Castain	1d44f0c0e2	Silence Coverity warnings	2016-08-11 21:22:01 -07:00
Ralph Castain	73544d2e00	Rename symbol	2016-08-11 13:06:46 -07:00
Ralph Castain	b0cc9b0bc8	Update to latest PMIx toolext branch Fix indentations Update the ext20 component to match latest PMIx master. Cleanup name conflicts and uninit vars	2016-08-11 12:29:48 -07:00
Gilles Gouaillardet	dfbf2b7be4	opal/threads: add OPAL_THREAD_SUB_SIZE_T macro -1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1), simply use OPAL_THREAD_SUB_SIZE_T(..., 1) and keep picky compilers happy	2016-08-10 13:37:36 +09:00
rhc54	60f789dca1	Merge pull request #1948 from rhc54/topic/pmixtool Update to include extended tool support, new datatypes	2016-08-09 16:17:28 -07:00
Nathan Hjelm	19be439998	Merge pull request #1949 from hjelmn/ugni_fix btl/ugni: fix another connection race	2016-08-09 08:32:40 -06:00
Nathan Hjelm	38f18eed22	Merge pull request #1941 from ggouaillardet/topic/memory_patcher_configury configury: make memory/patcher symbol detection more robust	2016-08-09 07:06:38 -06:00
Gilles Gouaillardet	13009aa290	opal/alfg: have opal_random() wrapper always return a positive int	2016-08-09 17:12:30 +09:00
Gilles Gouaillardet	6f6b3ac68a	configury: standardize memory/patcher symbol detection and make it more robust By default, Sun compilers optimize out the original test, and hence fail to detect that a symbol is missing.	2016-08-09 09:35:52 +09:00
Nathan Hjelm	adb668209b	btl/ugni: fix another connection race This commit fixes a race that can occur when two threads are in the ugni progress function at the same time. This race occurs when one thread calls GNI_PostDataProbeById then goes to sleep then another thread calls GNI_PostDataProbeById then GNI_EpPostDataWaitById before the other thread wakes up. If this happens the first thread will print a warning on GNI_EpPostDataWaitById about no matching post. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-08 15:38:11 -06:00
Ralph Castain	527b5c692a	Update to include extended tool support, new datatypes	2016-08-08 13:39:46 -07:00
Todd Kordenbrock	b90da992c8	Merge pull request #1895 from PDeveze/Patchs-on-btl-portals4 btl/portals4: Take into account the limitation of portals4 (max_msg_s…	2016-08-08 15:12:50 -05:00
Nathan Hjelm	5ced037488	Merge pull request #1939 from hjelmn/ugni_fix btl/ugni: protect against re-entry and races in connections	2016-08-08 08:55:30 -06:00
Artem Polyakov	b24ec3e3b9	pmix/s2: fix indentation (only)	2016-08-06 16:31:19 +06:00
Artem Polyakov	2cb923a413	pmix/s1: fix indentation (only)	2016-08-06 16:30:45 +06:00
Artem Polyakov	8aa3ef7799	pmix/s2: fix s2 component data placement Use wildcard for the information related to the job-level data. Fixes s2 component with regard to PR https://github.com/open-mpi/ompi/pull/1897.	2016-08-06 15:49:16 +06:00
Artem Polyakov	81063f1717	pmix/s1: fix s1 component data placement Use wildcard for the information related to the job-level data. Fixes s1 component with regard to PR https://github.com/open-mpi/ompi/pull/1897.	2016-08-06 15:45:46 +06:00
Nathan Hjelm	14b36d4503	btl/ugni: protect against re-entry and races in connections This commit fixes two issues that can occur during a connection: - Re-entry to connection progress from modex lookup. Added an additional endpoint state that will keep the code from re-entering the common endpoint create. - Fixed a race between a process posting a directed datagram through a send and a connection being progressed through opal_progress(). The progress code was not obtaining the endpoint lock before attempting to update the endpoint. To limit the amount of code changed for 2.0.1 this commit makes the endpoint lock recursive. In a future update this may be changed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-04 16:08:01 -06:00
Jeff Squyres	c42d8867e6	Merge pull request #1925 from jsquyres/pr/warnings-fixes hwloc: fix Valgrind warning	2016-08-04 08:48:50 -07:00
Jeff Squyres	36555b7a1d	Merge pull request #1933 from thananon/fix_random Make libevent use internal random	2016-08-04 08:27:56 -07:00
Boris Karasev	9d6a4b3b2d	configury/libevent: fix incorrect drop of OPAL_HAVE_WORKING_EVENTOPS Fixes PR https://github.com/open-mpi/ompi/pull/1687 The code that sets OPAL_HAVE_WORKING_EVENTOPS for internal libevent was executed even if the external libevent component was configured. As a result, libevent progress wasn't called in opal_progress, which for example caused ring_c to hang when pml/ob1 was used.	2016-08-04 16:37:37 +06:00
Gilles Gouaillardet	30f98cd9d0	pmix: redefine OPAL_PMIX_ARCH macro Architecture is set by the ompi layer after job startup, so since the optimizations in open-mpi/ompi@01a653d50a the key cannot have the "pmix" prefix; otherwise the architecture cannot be retrieved	2016-08-04 13:31:28 +09:00
Thananon Patinyasakdikul	b3e9dadff2	libevent: use opal_random() instead of rand(3) This commit changes rand(3) and family in libevent to use the internal random function provided in opal, to avoid perturbing the user's random seed. Fixes open-mpi/ompi#1877	2016-08-03 09:18:12 -07:00
Howard Pritchard	08266a1a56	mpool/hugepage mntent intro fallout On Cray, PR #1846 introduced a double free situation which led to all kinds of random memory corruption problems. This commit fixes this problem. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-02 05:52:31 -05:00
Jeff Squyres	7bea563e02	hwloc: fix Valgrind warning Cherry picked from open-mpi/hwloc@d4565c351e Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-01 18:50:40 -07:00
Gilles Gouaillardet	21e7f31dbe	pmix2x: fix unpack sequence in PMIx_Get callback first unpack the nspace (PMIX_STRING) before unpacking the various keys (PMIX_KVAL)	2016-08-01 14:21:22 +09:00
Howard Pritchard	477f6cb6a8	Merge pull request #1846 from ggouaillardet/topic/mntent mpool/hugepage: use mntent API instead of manually parsing /proc/mounts	2016-07-31 20:17:37 -06:00
Gilles Gouaillardet	1778e5b586	atomic/sparcv9: fix a typo in the comment, no code change	2016-08-01 10:34:02 +09:00
Ralph Castain	16fccd4964	Establish a way for ORTE to tell PMIx the base tmpdir to use, and update PMIx to understand such directives	2016-07-29 09:52:36 -07:00
Nathan Hjelm	325c9ba4cc	opal/thread: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-29 07:04:19 -06:00
Nathan Hjelm	1da558407c	Merge pull request #1911 from hjelmn/threads opal/thread: clean up and add additional OPAL_THREAD macros	2016-07-29 06:44:11 -06:00
Gilles Gouaillardet	273e56096b	configury: capture configury command line configury command line is quoted and made available via the OPAL_CONFIGURE_CLI macro. it can be retrieved via {orte-info,ompi_info,oshmem_info} -c, or {orte-info,ompi_info,oshmem_info} --all --parseable | grep ^config:cli:	2016-07-29 09:14:09 +09:00
Ralph Castain	cacb582ecd	Support timeout values when performing connect/accept operations. Bump default timeout to 10 minutes so folks have time to start the partnering application	2016-07-28 14:09:06 -07:00
Nathan Hjelm	c281bd3c7f	Merge pull request #1908 from hjelmn/udreg_fix rcache/udreg: make reference count thread safe	2016-07-28 09:27:16 -06:00
Nathan Hjelm	aac611237b	opal/thread: clean up and add additional OPAL_THREAD macros This commit expands the OPAL_THREAD macros to include 32- and 64-bit atomic swap. Additionally, macro declarations have been updated to include both OPAL_THREAD_* and OPAL_ATOMIC_*. Before this commit the former was used with add and the latter with cmpset. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-07-28 09:23:14 -06:00
Nathan Hjelm	a8c3699484	Fix performance regression caused by enabling opal thread support This commit adds opal_using_threads() protection around the atomic operation in OBJ_RETAIN/OBJ_RELEASE. This resolves the performance issues seen when running psm with MPI_THREAD_SINGLE. To avoid issues with header dependencies opal_using_threads() has been moved to a new header (thread_usage.h). The OPAL_THREAD_ADD* and OPAL_THREAD_CMPSET* macros have also been relocated to this header. This commit is cherry-picked off a fix that was submitted for the v1.8 release series but never applied to master. This fixes part of the problem reported by @nysal in #1902. (cherry picked from commit open-mpi/ompi-release@ce91307918) Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-07-28 07:01:27 -06:00
Nathan Hjelm	4658b761e4	rcache/udreg: make reference count thread safe Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-27 13:40:35 -06:00
Nathan Hjelm	1eb4ef438e	Merge pull request #1903 from hjelmn/openib_fixes btl/openib: set send flags only after endpoint is connected	2016-07-27 09:01:49 -06:00
Howard Pritchard	1dc7e9ed8f	Merge pull request #1904 from hppritcha/topic/fix_cray_srun_native_launch pmix/cray: switch to using wildcards for some	2016-07-27 07:12:02 -06:00
Howard Pritchard	b65bbe017f	pmix/cray: switch to using wildcards for some items so that at least srun native launch on cray works again. More issues to fix when using alps. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-07-26 17:07:58 -05:00
Nathan Hjelm	5e13e1ab7d	btl/openib: set send flags only after endpoint is connected The max inline send size on a queue pair is not available until after the endpoint is connected. Before this commit the send flags (including the inline flag) were set before this value was initialized. This commit moves setting the send_flags down to mca_btl_openib_put_internal which is only called after the endpoint is connected. This fixes a bug when using osc/rdma. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-26 16:01:11 -06:00
Gilles Gouaillardet	91ccec342c	btl/openib: remove some dead code remove useless call to opal_mem_hooks_support_level() and the value local variable.	2016-07-22 09:26:33 +09:00
Gilles Gouaillardet	1b3be0ac8c	configury + btl/openib: fix a typo test for existence of struct ibv_exp_device_attr.exp_atomic_cap. That was previously mistyped struct ibv_exp_device_attr.ext_atomic_cap	2016-07-22 09:26:33 +09:00
Ralph Castain	71de03fc67	Cleanup the new naming requirements to ensure that info is correctly retrieved Cleanup permissions Restore singleton operations	2016-07-21 09:46:03 -07:00
Ralph Castain	2b55ee8118	Cleanup Coverity warnings	2016-07-20 20:31:58 -07:00
Ralph Castain	01a653d50a	Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic for higher performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang, or result in "not found". Remove stale file reference Restore autogen pass thru pmix Remove generated file	2016-07-20 00:58:19 -07:00
Pascal Deveze	6d6ec66705	btl/portals4: Take into account the limitation of portals4 (max_msg_size)	2016-07-19 15:19:29 +02:00
Nathan Hjelm	03bce91de8	pmix/pmix2x: add missing increment in loop This commit fixes a bug in the pmix2x client code where a loop variable is not correctly incremented. This was leading to hangs and crashes when creating intercommunicators. Also fixed two double increments in other loops. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-18 10:35:05 -06:00
Jeff Squyres	72f41d4490	pmix: replace all tabs with spaces No code or logic changes Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-07-17 15:08:33 -04:00
Jeff Squyres	1c32742c66	pmix_ext20: fix syntax error Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-07-17 15:04:12 -04:00
Ralph Castain	99f7096031	Fix permissions	2016-07-16 21:03:55 -07:00
Ralph Castain	d4071fbd1c	Fix dynamic operations by ensuring that we only fire the debugger release if the debugger is attached, and that the OPAL pmix key for directing events to non-default handlers matches the PMIx spelling	2016-07-16 13:20:41 -07:00
Ralph Castain	1ceb35ba5c	Fix singletons - do not include the PMIx tool URI in the environment provided to child processes	2016-07-13 17:33:34 -07:00
Ralph Castain	20a91c2baf	Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program. Cleanup debug message	2016-07-13 15:28:33 -07:00
Artem Polyakov	72585a905f	opal/pmix: add blocking Fence to SLURM components. Blocking fence is used in yalla del proc. Native pmix exposes this functionality. We need to expose it for SLURM's s1/s2 components as well. Also this commit fixes uninitialized `rc` in fencenb's of both components.	2016-07-11 09:43:15 +03:00
Artem Polyakov	8e16f47492	Merge pull request #1688 from artpol84/fix_base64 Fix base64 implementation in pmix framework.	2016-07-07 10:47:50 +06:00
Gilles Gouaillardet	1ba7e2b20b	mpool/hugepage: use mntent API instead of manually parsing /proc/mounts Refs open-mpi/ompi#1822	2016-07-06 15:00:19 +09:00
Gilles Gouaillardet	acda07472a	configury: revamp and re-indent sub configure.m4 after open-mpi/ompi@846360fd4c	2016-07-06 11:59:51 +09:00
Gilles Gouaillardet	846360fd4c	configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components Thanks Jeff for the guidance Fixes open-mpi/ompi#1683 note: in order to keep this commit easy to review, some AS_IF([...]) were replaced with AS_IF([false], ...) or AS_IF_([true], ...) these will be removed and re-indented in a subsequent commit	2016-07-06 11:57:24 +09:00
Ralph Castain	ee56d9dc1a	Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field	2016-07-05 14:59:50 -07:00
Ralph Castain	7e0af3f4f0	Update pmix2x to track upstream changes	2016-07-05 11:54:22 -07:00
Gilles Gouaillardet	267821f0dd	pmix2x/pmix: fix a typo in PMIx_tool_init() and remove now useless local variable i	2016-07-05 13:47:50 +09:00
Gilles Gouaillardet	efce8cc734	pmix2x/pmix: add missing include files pmix cannot be built on alpine linux because of some missing includes. uid_t and gid_t are defined in unistd.h or sys/types.h, and unistd.h is not indirectly pulled in under alpine linux, so include it manually. Thanks N.L.K Nguyen for the report (back-ported from upstream pmix/master@c8d55350a9)	2016-07-05 09:03:14 +09:00
Ralph Castain	c9ada8e095	Silence Coverity warnings	2016-07-03 20:45:08 -07:00
Ralph Castain	673f82e2b6	Update the PMIx listener to avoid leaking sockets into children, and better handle race condition errors	2016-07-03 08:23:33 -07:00
Nathan Hjelm	01d6da31af	btl/openib: fix rdmacm locking bug This commit fixes a long standing bug in rdmacm. It is required that the thread that calls mca_btl_openib_endpoint_cpc_complete holds the endpoint lock. This was not the case for rdmacm. This causes debug builds to abort. This change also required changing mca_btl_openib_endpoint_send_cts to require the endpoint lock to be held when calling. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-30 15:50:07 -06:00
Nathan Hjelm	cc2b3e0c3f	Merge pull request #1830 from hjelmn/rdmacm_test Test for rdmacm hang fix	2016-06-30 10:41:46 -06:00
Nathan Hjelm	960fcd292c	btl/openib: fix rdma hang This commit is an attempt to fix a hang in finalize of rdmacm. This fixes a path where no rdmacm client is found for an endpoint. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-29 20:31:26 -06:00
Ralph Castain	6e434d6785	Add support for PMIx tool connections and queries. Initially only support a request to list all known namespaces (jobids) from ORTE, but other folks will extend that support to include additional information Update to match PMIx RFC Fix configury to point to correct libevent and hwloc locations	2016-06-29 19:19:19 -07:00
Jeff Squyres	f18d6606da	Merge pull request #1824 from hjelmn/rdmacm_fix btl/openib: fix segmentation fault	2016-06-28 18:10:35 -04:00
Nathan Hjelm	8128c8eb29	btl/openib: fix segmentation fault This commit fixes a segmentation fault that occurs if a device can be initialized but not used. In this case the devices_count is not equal to the number of usable devices in the devices pointer array. Thanks to @artpol84 for tracking this down. Fixes open-mpi/ompi#1823 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-28 10:31:32 -06:00
Nathan Hjelm	955269b4f1	Merge pull request #1816 from hjelmn/request_perfm_regression opal/sync: fix race condition	2016-06-28 09:12:00 -06:00
Artem Polyakov	541715572f	Fix MPI_Waitany and MPI_Waitsome (request handling related)	2016-06-28 16:40:00 +03:00
Artem Polyakov	8d011ea403	Fix Mellanox copyright.	2016-06-26 21:01:19 -06:00
Nathan Hjelm	fb455f0802	opal/sync: fix race condition This commit fixes a race condition discovered by @artpol84. The race happens when a signalling thread decrements the sync count to 0 then goes to sleep. If the waiting thread runs and detects the count == 0 before going to sleep on the condition variable it will destroy the condition variable while the signalling thread is potentially still processing the completion. The fix is to add a non-atomic member to the sync structure that indicates another process is handling completion. Since the member will only be set to false by the initiating thread and the completing thread the variable does not need to be protected. When destroying a condition variable the waiting thread needs to wait until the signalling thread is finished. Thanks to @artpol84 for tracking this down. Fixes #1813 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-26 20:14:01 -06:00
Nathan Hjelm	dac9201f3b	Merge pull request #1770 from hjelmn/rdma_wth btl/openib: fix rdmacm	2016-06-24 22:46:53 -06:00
Nathan Hjelm	2992d6d238	Merge pull request #1808 from abjoshi-brcm/timer_arm64 arm64: add timer support	2016-06-23 07:10:56 -06:00
Abhishek Joshi	f06f7eb3e6	arm64: add timer support Signed-off-by: Sreenidhi Bharathkar Ramesh <sreenidhi-bharathkar.ramesh@broadcom.com>	2016-06-23 11:01:00 +00:00
Ralph Castain	08b1438f15	Add missing PMIx range value so OPAL and PMIx align again	2016-06-22 22:03:25 -07:00
Nathan Hjelm	55d1933a89	opal/sync: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 15:03:21 -06:00
Nathan Hjelm	e4f920f6f9	opal/progress: improve performance when there are no LP callbacks This commit adds another check to the low-priority callback conditional that short-circuits the atomic-add if there are no low-priority callbacks. This should improve performance in the common case. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 09:52:37 -06:00
Nathan Hjelm	143a93f379	opal/sync: remove usage of OPAL_ENABLE_MULTI_THREADS The OPAL_ENABLE_MULTI_THREADS macro is always defined as 1. This was causing us to always use the multi-thread path for synchronization objects. The code has been updated to use the opal_using_threads() function. When MPI_THREAD_MULTIPLE support is disabled at build time (2.x only) this function is a macro evaluating to false so the compiler will optimize out the MT-path in this case. The OPAL_ATOMIC_ADD_32 macro has been removed and replaced by the existing OPAL_THREAD_ADD32 macro. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 09:52:37 -06:00
Gilles Gouaillardet	bf133c401e	pmix2x: fix a typo in dereg_event_hdlr() This bug was fixed when open-mpi/ompi@dde69e1be2 was backported into upstream pmix in pmix/master@5e5577778c, but it was not fixed in open-mpi/ompi	2016-06-22 13:45:29 +09:00
Jeff Squyres	af614afedf	Merge pull request #1800 from thananon/common_sym_fix Fixed common symbol error in btl/usnic.	2016-06-21 20:11:52 -04:00
Ralph Castain	441739b5a4	Cleanup a lagging message that generates an annoying (but seemingly harmless) warning	2016-06-20 12:23:27 -07:00
Thananon Patinyasakdikul	afe07cd5d5	Fixed common symbol in btl/usnic - This commit fixes the accidental common symbol btl_usnic_lock - It also moves the btl_usnic_lock declaration to btl_usnic.h	2016-06-20 10:05:44 -07:00
Howard Pritchard	1bed9fdb59	Merge pull request #1799 from hppritcha/topic/help_aries_with_knl common/ugni: help out knl with aries	2016-06-20 08:09:24 -06:00
Ralph Castain	0ba02821e6	Add requested key and job-level info	2016-06-19 18:22:31 -07:00
Ralph Castain	0a29f5cb77	Sigh - missed two typos	2016-06-18 20:57:53 -07:00
Ralph Castain	dd38cf1fed	Fix typo	2016-06-18 20:56:43 -07:00
Howard Pritchard	8b53487977	common/ugni: help out knl with aries The way the gni btl is currently coded, it will run completely out of gas on KNL at 123 processes/node. Since there are bound to be those who try to run an MPI process per hyperthread on KNL nodes, the FMA sharing mode needs to be requested. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-06-18 15:09:05 -05:00
Ralph Castain	dde69e1be2	Cleanup CIDs 1362763, 1362762, 1362760, 1362759, 1362758, 1362757, 1362756, 1362755, 1362754. Unsure how to resolve 1362761. Fixes #1792	2016-06-18 12:28:46 -07:00
Jeff Squyres	7a8d7fb948	openib: fix compiler warnings Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-18 07:15:11 -07:00
Jeff Squyres	c332ee5884	Merge pull request #1784 from thananon/fix_usnic_thread Fix btl/usnic deadlock when the connectivity check is turned off.	2016-06-17 11:15:14 -04:00
Nathan Hjelm	f59c2fce6b	Merge pull request #1786 from hjelmn/32_fix opal/progress: use 32-bit atomics for call counter	2016-06-17 08:54:41 -06:00
Nathan Hjelm	2e4141f20a	Merge pull request #1787 from hjelmn/asm_fix opal/asm: fix syntax of timer code for ia32	2016-06-17 08:50:57 -06:00
Ralph Castain	044c561cba	Roll to latest PMIx master	2016-06-16 17:30:30 -07:00
Nathan Hjelm	9c709966f7	opal/asm: fix syntax of timer code for ia32 Thanks to Paul Hargrove for pointing this out. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-16 16:55:01 -06:00
rhc54	702a982271	Merge pull request #1767 from rhc54/topic/pmix2 Enable the PMIx event notification capability	2016-06-16 15:27:43 -07:00
Nathan Hjelm	7349ddc937	patcher/overwrite: use OPAL_ASSEMBLY_ARCH to determine architecture Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-16 10:00:00 -06:00
Nathan Hjelm	dbd8369485	opal/progress: use 32-bit atomics for call counter This commit fixes a compile error on 32-bit platforms. The low-priority call counter was always using 64-bit atomics which will not work if 64-bit atomic math is not available. Updated to use 32-bit instead. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-16 09:01:19 -06:00
Thananon Patinyasakdikul	7bd18214a7	Fix btl/usnic deadlock when the connectivity check is turned off.	2016-06-15 07:42:55 -07:00
Jeff Squyres	b7e937fea5	Merge pull request #1778 from thananon/usnic_thread_safe Added MPI_THREAD_MULTIPLE support for btl/usnic.	2016-06-14 18:43:04 -04:00
Ralph Castain	5d330d5220	Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fall back to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler. Add PMIx 2.0 Remove PMIx 1.1.4 Cleanup copying of component Add missing file Touchup a typo in the Makefile.am Update the pmix ext114 component Minor cleanups and resync to master Update to latest PMIx 2.x Update to the PMIx event notification branch latest changes	2016-06-14 13:08:41 -07:00
Jeff Squyres	5071602c59	PSM/PSM2: Disable signal handler hijacking by default Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some subsequent phone calls and off-issue email discussions), the PSM library is hijacking signal handlers by default. Specifically: unless the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel TrueScale) is set, the library constructor for this library will hijack various signal handlers for the purpose of invoking its own error reporting mechanisms. This may be a bit surprising, but is not a problem, per se. The real problem is that older versions of at least the PSM library do not unregister these signal handlers upon being unloaded from memory. Hence, a segv can actually result in a double segv (i.e., the original segv and then another segv when the now-non-existent signal handler is invoked). This PSM signal hijacking subverts Open MPI's own signal reporting mechanism, which may be a bit surprising for some users (particularly those who do not have Intel TrueScale). As such, we disable it by default so that Open MPI's own error-reporting mechanisms are used. Additionally, there is a typo in the library destructor for the PSM2 library that may cause problems in the unloading of its signal handlers. This problem can be avoided by setting `HFI_NO_BACKTRACE=1` (for PSM2 / Intel OmniPath). This is further compounded by the fact that the PSM / PSM2 libraries can be loaded by the OFI MTL and the usNIC BTL (because they are loaded by libfabric), even when there is no Intel networking hardware present. Having the PSM/PSM2 libraries behave this way when no Intel hardware is present is clearly undesirable (and is likely to be fixed in future releases of the PSM/PSM2 libraries).
This commit sets the following two environment variables to disable this behavior from the PSM/PSM2 libraries (if they are not already set): * IPATH_NO_BACKTRACE=1 * HFI_NO_BACKTRACE=1 If the user has set these variables before invoking Open MPI, we will not override their values (i.e., their preferences will be honored). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-14 11:45:23 -07:00
Thananon Patinyasakdikul	ee85204c12	Added MPI_THREAD_MULTIPLE support for btl/usnic.	2016-06-13 13:47:06 -07:00
Nathan Hjelm	253c91972e	arm64: add atomic swap function This commit adds the opal_atomic_swap_32 and opal_atomic_swap_64 functions. This should improve the performance of btl/vader. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-11 09:46:29 -06:00
Nathan Hjelm	109389dce2	Merge pull request #1634 from hjelmn/cma cma: add support for MIPS and ARM	2016-06-11 09:20:28 -06:00
Ralph Castain	d58da99dbc	Shift to memcpy to avoid Solaris issues	2016-06-09 12:07:17 -07:00
Gilles Gouaillardet	1f651d17c1	opal/util/ethtool: fix (infamous) strncpy usage The infamous strncpy does not NUL-terminate the destination when the source is truncated, so do it ourselves! Fixes CID 1362576	2016-06-09 09:54:50 +09:00
Ralph Castain	8fa935534b	Abstract the strnlen function for environments that do not have it (e.g., Solaris 10)	2016-06-08 10:12:43 -07:00
Nathan Hjelm	f8957f24af	Merge pull request #1768 from hjelmn/cq_fix btl/openib: fix cq resize calculation	2016-06-07 21:34:36 -06:00
Nathan Hjelm	17ae1aceeb	btl/openib: fix rdmacm The rdma_disconnect function specifies that both the server and client should call rdma_disconnect. The code was not calling rdma_disconnect on an endpoint if the event came before the endpoint finalization. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-07 17:53:58 -06:00
Nathan Hjelm	dd519c55b1	btl/openib: fix cq resize calculation Before dynamic add_procs the openib_btl_size_queues was called exactly once for non-dynamic jobs. Now the function is called on each new connection so the calculation was wrong. Rewrote the function to correctly calculate the CQ size and only attempt to adjust the CQ if the requested size has changed. This fixes a bug when using the openib btl on psm2 hardware that is caused by the time needed to resize a CQ. The overhead was causing udcm to timeout and fail. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-07 16:05:56 -06:00
Nathan Hjelm	e082ed752a	opal/progress: fix warnings This commit fixes several warnings introduced by open-mpi/ompi@fc26d9c69f . Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-06 22:18:24 -06:00
Nathan Hjelm	4a2bd83302	opal/cma: improve Linux CMA detection This commit improves the CMA detection when the installed glibc doesn't have support for CMA. In this case we need to verify that the syscall numbers in opal/include/opal/sys/cma.h are valid for the architecture. This verification is done by attempting to use CMA while including the internal header. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-05 22:29:07 -06:00
Gilles Gouaillardet	b707d138fe	pmix114/pmix1_client: fix misc memory leaks Fixes CID 1325146-1325149	2016-06-06 09:33:35 +09:00
Nathan Hjelm	0084ad0d1b	opal: add armv8 support This commit adds assembly support for aarch64. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-03 10:32:21 -06:00
Nathan Hjelm	6169d03ea3	btl: adjust values of new atomic flags Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 19:21:34 -06:00
Nathan Hjelm	9f43b23725	Merge pull request #1710 from hjelmn/ugni_atomics Additional ugni atomics	2016-06-02 18:25:49 -06:00
George Bosilca	e1c6b0e4a7	Some compilers are more than picky.	2016-06-03 09:04:34 +09:00
Nathan Hjelm	d9fc855955	Merge pull request #1743 from hjelmn/gcc_atomics_fix atomic/gcc: add check for 128-bit CAS being lock-free	2016-06-02 16:55:31 -06:00
Nathan Hjelm	d86e41ea13	atomic/gcc: add check for 128-bit CAS being lock-free Compiler implementations are free to include support for atomics that use locks. Unfortunately lock-free and lock atomics do not mix. Older versions of llvm on OS X use locks to provide __atomic_compare_exchange on 128-bit values but are lock-free on 64-bit values. This screws up our lifo implementation which mixes 64-bit and 128-bit atomics on the same values to improve performance. This commit adds a configure-time check if 128-bit atomics are lock free. If they are not then the 128-bit __atomic CAS is disabled and we check for the __sync version as a fallback. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-02 15:59:05 -06:00
Nathan Hjelm	5aab4b2d51	Merge pull request #1662 from ggouaillardet/topic/amd64_atomic amd64/atomic: silence warnings	2016-06-02 14:10:20 -06:00
George Bosilca	87b1d17e7e	Remove warnings. clang 7.0 with the picky option on is extremely verbose, and complains about almost everything. Trying to make him happy, at least regarding the datatype engine.	2016-06-03 00:56:24 +09:00
rhc54	483b9c370a	Merge pull request #1741 from rhc54/topic/pmix114 Update to 1.1.4rc3	2016-06-02 06:57:37 -07:00
Nathan Hjelm	fc26d9c69f	Merge pull request #1734 from hjelmn/progress_threading opal/progress: make progress function registration mt safe	2016-06-02 06:35:59 -06:00
Ralph Castain	ecea1e3bb5	Update to 1.1.4rc3	2016-06-01 20:56:07 -07:00
Nathan Hjelm	2fad3b9bc6	opal/progress: make progress function registration mt safe This commit fixes a bug in opal progress registration that can cause crashes when a progress function is registered while another thread is in opal_progress(). Before this commit realloc is used to allocate more space for progress functions but it is possible for a thread in opal_progress() to try to read from the array that is freed by realloc before the array is re-assigned when realloc returns. To prevent this race use malloc + memcpy to fill the new array and atomically swap out the old and new array pointers. Per suggestion we now allocate a default of 8 slots for callbacks and double the current number when we run out of space. This commit also fixes leaking the callbacks_lp array. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 20:57:19 -06:00
George Bosilca	d9fb59bea5	Update the synchronization primitive Add comments and make sure we correctly return the status of the synchronization primitive, especially if it was completed with error.	2016-06-02 11:53:56 +09:00
Nathan Hjelm	f33bbfd381	atomic: add support for __atomic builtins (#1735 ) * atomic: add support for __atomic builtins This commit adds support for the gcc __atomic builtins. The __sync builtins are deprecated and have been replaced by these atomics. In addition, the new atomics support atomic exchange which was not supported by __sync. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * atomic: add support for transactional memory This commit adds support for using transactional memory when using opal atomic locks. This feature is enabled if the __HLE__ feature is available and the gcc builtin atomics are in use. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 21:23:47 -04:00
rhc54	b85a5e62ab	Merge pull request #1739 from rhc54/topic/pmix Split the pmix external component into one for the 1.1.4 release, and…	2016-06-01 16:24:44 -07:00
Ralph Castain	12ecf972af	Split the pmix external component into one for the 1.1.4 release, and another for the upcoming 2.0 release. Clean up the configury so the components look for a series-specific function instead of running a program. NOTE: the changes for the 2.0 series are not yet in the PMIx master.	2016-06-01 14:15:24 -07:00
Nathan Hjelm	ceb2912838	Merge pull request #1736 from hjelmn/ugni_fixes ugni BTL fixes	2016-06-01 14:59:55 -06:00
Jeff Squyres	d175fd692d	README.ompi: track patches added to hwloc Track post-v1.11.3-release patches applied to the hwloc copy embedded in Open MPI. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-01 07:17:05 -07:00
Jeff Squyres	3867bd3640	hwloc.m4: only check for valgrind in non-embedded mode This fixes https://github.com/open-mpi/ompi/issues/1732: i.e., the case where the outer project has its own check for <valgrind/valgrind.h>, but also supplements CPPFLAGS (to find Valgrind's header files) before doing that check. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Ideally, we would tell OMPI to disable autoconf's caching of our valgrind check result so that its check gets the right result after adding CPPFLAGS. Not sure if we can do that. For now, just disable our Valgrind code in embedded mode. This will keep the x86 backend enabled under Valgrind but it will auto-disable itself when finding identical APIC ids anyway (because CPUID returns same outputs for all PUs). Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Fixes open-mpi/ompi#1732 (cherry picked from commit open-mpi/hwloc@8b44fb1c81)	2016-06-01 06:58:53 -07:00
Gilles Gouaillardet	57978a75d0	Merge pull request #1717 from ggouaillardet/topic/lex_cleanup configury: clean the flex generated .c files	2016-06-01 13:06:21 +09:00
Nathan Hjelm	5d4bcce042	Merge pull request #1700 from shamisp/topic/cma_config CMA: Fixing logic for CMA system call detection	2016-05-31 20:33:48 -06:00
Nathan Hjelm	340152a635	Merge pull request #1720 from shamisp/topic/vader/max_addr VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms.	2016-05-31 20:33:28 -06:00
Gilles Gouaillardet	5f565dfec3	configury: clean the flex generated .c files	2016-06-01 11:13:31 +09:00
Nathan Hjelm	bf10d79914	btl/ugni: remove erroneous unlock The endpoint lock was being released twice in mca_btl_ugni_get_ep. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:53 -06:00
Nathan Hjelm	cc96097873	btl/ugni: fix bug when attempting unaligned get on aries This commit fixes a programming error when using an aries nic. The documentation of ugni shows that only the local alignment restriction for get was lifted on aries. There is still a remote address alignment restriction. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:09 -06:00
Jeff Squyres	5cfee95ea4	hwloc1113: add missing file to Makefile.am Lack of this file causes a failure when you run autogen.pl on a distribution tarball. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-31 09:57:50 -07:00
Nathan Hjelm	60519c2b4e	cma: add support for MIPS and ARM Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-05-30 12:13:20 -06:00
George Bosilca	d2abff583e	Fix race condition during BTL TCP tear-down. bot:label:bug bot:assign:@hjelmn	2016-05-30 10:47:14 -05:00
Jeff Squyres	e126d2cd18	Merge pull request #1584 from bgoglin/master Update hwloc to v1.11.3	2016-05-28 11:01:54 -04:00
Ralph Castain	55923eacd3	Stealing some pieces of Josh Hursey's PR #1583 and modifying a bit, allow the opal/pmix external component to handle both PMIx 1.1.4 and PMIx 2.0 versions. Automatically detect the version of the target external library and adjust the only two APIs that changed (PMIx_Init and PMIx_Finalize) Rename temp vars in .m4 to avoid conflict with Travis	2016-05-27 08:06:31 -07:00
Nathan Hjelm	28dfa36a3f	btl/ugni: fix bug when attempting unaligned get on aries This commit fixes a programming error when using an aries nic. The documentation of ugni shows that only the local alignment restriction for get was lifted on aries. There is still a remote address alignment restriction. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	c19426ac1b	btl/ugni: add support for additional atomic operations This commit adds support for Cray Aries atomic operations. This includes 32-bit and floating point support. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	23fe19a956	btl: add support for more atomics This commit adds support for more atomic operations and types. The operations added are logical and, logical or, logical xor, swap, min, and max. New types are 32-bit int by using the MCA_BTL_ATOMIC_FLAG_32BIT flag, 64-bit float by using the MCA_BTL_ATOMIC_FLAG_FLOAT flag, and 32-bit float by using both flags. Floating point numbers are supported by packing the number in as an int64_t or int32_t. We will update the btl interface in the future to make this less confusing. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	d25b846c01	Merge pull request #1704 from hpcraink/pr/configure_framework Fix configure for FreePGI on OSX	2016-05-26 17:01:08 -06:00
Nathan Hjelm	8c9292d5d1	Merge pull request #1721 from hjelmn/xrc_fix btl/openib: fix XRC WQE calculation	2016-05-26 17:00:31 -06:00
Nathan Hjelm	56bdcd0888	btl/openib: fix XRC WQE calculation Before dynamic add_procs support was committed to master we called add_procs with every proc in the job. The XRC code in the openib btl was taking advantage of this and setting the number of work queue entries (WQE) based on all the procs on a remote node. Since that is no longer the case we can not simply increment the sd_wqe field on the queue pair. To fix the issue a new field has been added to the xrc queue pair structure to keep track of how many wqes there are total on the queue pair. If a new endpoint is added that increases the number of wqes and the xrc queue pair is already connected the code will attempt to modify the number of wqes on the queue pair. A failure is ignored because all that will happen is the number of active send work requests on an XRC queue pair will be more limited. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-26 15:58:31 -06:00
Aurelien Bouteiller	49bd28d0ac	Merge pull request #1714 from hjelmn/scif_exclusivity btl/scif: reduce default exclusivity	2016-05-26 17:53:11 -04:00
Pavel Shamis (Pasha)	60fd25f3fb	VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms. The original VADER_MAX_ADDRESS was tuned for x86_64 platforms only. For non x86_64 platforms we can use XPMEM_MAXADDR_SIZE. Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>	2016-05-26 16:38:04 -05:00
Nathan Hjelm	99627319f0	btl/ugni: reduce overhead of progress function This commit reduces the overhead of calling the ugni progress function. It does the following: - Check for new connections once every eight calls. - Do not call remote smsg progress unless we are connected to at least one remote peer. - Do not call rdma progress unless at least one rdma fragment is outstanding. - Check endpoint wait list size before obtaining a lock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 14:27:34 -06:00
Nathan Hjelm	5caf12cd9b	btl/scif: reduce default exclusivity This commit reduces the default exclusivity so that btl/scif is not used for send/recv over other shared memory transports. Fixes open-mpi/ompi#1712 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 14:25:07 -06:00
Rainer Keller	3727cba9bb	Fix compilation for FreePGI on OSX Our checks and libevent's are somewhat flawed. When multiple "-framework" flags are added to CXXFLAGS or CFLAGS, we strip the keyword from the command line, which is not good. libevent, however, assumes plain gcc without properly testing that the compiler supports -Wno-deprecated-declarations.	2016-05-25 09:12:39 +02:00
Nathan Hjelm	461ca1203b	Merge pull request #1703 from hjelmn/grdma_cuda_fix rcache/grdma: fix typo in cuda code	2016-05-24 18:51:22 -06:00
bosilca	b90c83840f	Refactor the request completion (#1422 ) * Remodel the request. Added the wait sync primitive and integrate it into the PML and MTL infrastructure. The multi-threaded requests are now significantly less heavy and less noisy (only the threads associated with completed requests are signaled). * Fix the condition to release the request.	2016-05-24 18:20:51 -05:00
Nathan Hjelm	af52dad8f8	rcache/grdma: fix typo in cuda code Fixes open-mpi/ompi#1702 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-24 15:56:39 -06:00
Pavel Shamis (Pasha)	d984b4b3f9	CMA: Fixing logic for CMA system call detection The OPAL_CMA_NEED_SYSCALL_DEFS is always defined/set to 0 or 1. Therefore instead of checking if the macro is defined, we have to look at the value itself. Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>	2016-05-24 14:53:25 -05:00
Ralph Castain	80f4e3b872	Fix the --tune problem by searching the argv for MCA params in advance of opal_init_util. Only search the first app_context as we historically have done - we can debate whether or not to search all app_contexts	2016-05-23 21:09:44 -07:00
Nathan Hjelm	37e9e2c660	mca/base: fix typo in flag enumeration This commit fixes a typo in flag enumeration that can cause the parser to miss valid flags or crash. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-23 12:21:34 -06:00
Artem Polyakov	725eea2819	Fix base64 implementation in pmix framework. In the commit `80f07b65f1` setting of '-' marker used as the string termination sign was moved from base64 code: from: `80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67L491)` to: `80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67R189)` However, the decoding function wasn't fixed and still expects an extra byte at the end of the encoded string, which leads to data truncation during extraction (this was noticed in standalone code that was using base64 from OMPI).	2016-05-23 23:30:31 +06:00
Gilles Gouaillardet	d5a2ac6f2f	btl/openib: fix #if vs #ifdef	2016-05-23 14:27:33 +09:00
Gilles Gouaillardet	5a8cbe5a8f	btl/openib: remove obsolete reference to MEMORY_LINUX_MALLOC_ALIGN_ENABLED macro	2016-05-23 14:12:21 +09:00
Gilles Gouaillardet	8466a3daf3	pmix: update .gitignore git ignore opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in git rm opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in git ignore opal/mca/pmix/pmix*/...	2016-05-23 11:58:07 +09:00
Nathan Hjelm	31bfeede82	bml/r2: always add btl progress function This commit changes the behavior of bml/r2 from conditionally registering btl progress functions to always registering progress functions. Any progress function belonging to a btl that is not yet in use is registered as low-priority. As soon as a proc is added that will make use of the btl, it is re-registered normally. This works around an issue with some btls. In order to progress a first message from an unknown peer both ugni and openib need to have their progress functions called. If either btl was not in use after the first call to add_procs, the callback never happened. This commit ensures the btl progress function is called at some point but the number of progress callbacks is reduced from normal to ensure lower overhead when a btl is not used. The current ratio is 1 low priority progress callback for every 8 calls to opal_progress(). Fixes open-mpi/ompi#1676 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-21 15:54:04 -04:00
Ralph Castain	4e0749f03d	Remove verbose error messages	2016-05-20 10:04:26 -07:00
Ralph Castain	42ecffb6d0	Move the registration of MCA params out of the init of the var system - put them in with the rest of the OPAL MCA param registrations Take another shot at untangling the spaghetti orterun: fix for command line parsing orte-submit calls opal_init_util () before parsing out MCA command line options (-mca, -am, etc). This prevents mpirun from setting opal MCA variables for some frameworks as well as the MCA base. This is because when a framework is opened all of its variables are set to read-only. Eventually we want to lift this restriction on some MCA variables but since -mca is affected we must parse out the MCA command line options before opal_init_util(). This commit fixes the bug by adding a new option to opal_cmd_line_parse (ignore unknown option) so orte-submit can pre-parse the command line for MCA options. Signed-off-by: Nathan Hjelm <hjelmn@me.com> Minor cleanups to avoid releasing/recreating the cmd line	2016-05-20 09:59:50 -07:00
Brice Goglin	ca621330a6	Update hwloc to v1.11.3 Remove contrib/windows/ Merge hwlocXYZ/hwloc/README-ompi.txt back into hwlocXYZ/README-ompi.txt instead of having both. Add README.txt in new automake-required directory contrib/systemd/ Keep the following patches applied since they are not in 1.11.3 linux: actually enable libudev based on the result of AC_CHECK_LIB (cherry picked from open-mpi/hwloc@9549fd59af) configure: check the actual may_alias syntax that we use (cherry picked from open-mpi/hwloc@0ab7af5e90)	2016-05-20 07:20:16 +02:00
Gilles Gouaillardet	5ec1eedbae	Merge pull request #1682 from ggouaillardet/topic/fix-ethtool-again opal/util/ethtool: use system ethtool_cmd_speed when available	2016-05-20 10:30:43 +09:00
Gilles Gouaillardet	cbbdce05b1	pmix/pmix114: silence a warning	2016-05-20 09:35:26 +09:00
Gilles Gouaillardet	ed3fd1775f	rcache/grdma: silence a warning	2016-05-20 09:30:29 +09:00
Gilles Gouaillardet	a01a5487a8	opal/util/ethtool: use system ethtool_cmd_speed when available Refs: open-mpi/ompi#1679	2016-05-20 09:05:09 +09:00
rhc54	99d3c283f5	Merge pull request #1681 from rhc54/topic/pmixupdate Update PMIx 114 to current release candidate	2016-05-19 13:50:16 -07:00
Ralph Castain	6f743f81b6	Update PMIx 114 to current release candidate	2016-05-19 12:55:05 -07:00
Jeff Squyres	87233aae49	ethtool: better handle portability Be sure to handle the case where we don't have ethtool support at all. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-19 10:57:14 -07:00
Gilles Gouaillardet	fd93d236b1	opal/util/ethtool: fix compilation on older Linux when struct ethtool_cmd has no speed_hi field Refs: open-mpi/ompi#1628	2016-05-19 11:58:04 +09:00
Jeff Squyres	66f53ec29a	Merge pull request #1628 from kmroz/wip-btl-tcp-ethtool-speed btl/tcp: autodetect bandwidth and latency if unset by the user	2016-05-18 12:12:55 -04:00
Nathan Hjelm	9371a6a52d	Merge pull request #1673 from hjelmn/fix_rcache_deadlock rcache: fix deadlock in multi-threaded environments	2016-05-18 08:32:21 -07:00
Karol Mroz	ca6ddf3270	btl/tcp: autodetect bandwidth and latency if unset Fixes open-mpi/ompi#120 Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-05-18 16:25:52 +02:00
Karol Mroz	b9c6c43c6b	btl/tcp: add default defines for bandwidth and latency Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-05-18 16:25:52 +02:00
Karol Mroz	31e33a64f9	opal/util: add function to obtain interface speed If kernel ethtool_cmd_speed() is not available, use copies if possible. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-05-18 16:25:51 +02:00
Nathan Hjelm	ab8ed177f5	rcache: fix deadlock in multi-threaded environments This commit fixes several bugs in the registration cache code: - Fix a programming error in the grdma invalidation function that can cause an infinite loop if more than 100 registrations are associated with a munmapped region. This happens because the mca_rcache_base_vma_find_all function returns the same 100 registrations on each call. This has been fixed by adding an iterate function to the vma tree interface. - Always obtain the vma lock when needed. This is required because there may be other threads in the system even if opal_using_threads() is false. Additionally, since it is safe to do so (the vma lock is recursive) the vma interface has been made thread safe. - Avoid calling free() while holding a lock. This avoids race conditions with locks held outside the Open MPI code. Fixes open-mpi/ompi#1654. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-17 09:02:40 -06:00
Nathan Hjelm	f6938868bd	Merge pull request #1659 from hjelmn/sync_64 sync_builtin: check for 64-bit atomic support	2016-05-17 05:40:04 -07:00
rhc54	8b534e9897	Merge pull request #1668 from rhc54/topic/slurm When direct launching applications, we must allow the MPI layer to pr…	2016-05-16 12:23:19 -07:00
Howard Pritchard	1a676e5b35	pmix/cray: fix some breakage Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-05-16 12:45:05 -05:00
Gilles Gouaillardet	4e21933a74	memory/patcher: declare __curbrk as extern in order not to generate an (uninitialized) common symbol	2016-05-16 09:30:11 +09:00
Ralph Castain	01ba861f2a	When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provides non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread spin in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize cpu utilization. Update external as well Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro	2016-05-14 16:37:00 -07:00
Gilles Gouaillardet	456b73da69	btl/openib: fix error path in init_one_device() do not explicitly release ib verbs components since they will be released in the object destructor Thanks Durga for the report	2016-05-13 09:03:48 +09:00
Gilles Gouaillardet	5dae7a47ff	amd64/atomic: silence warnings Solaris Studio compilers issue (tons of) warnings because one argument of several __asm__ __volatile__ sections is not needed	2016-05-11 11:26:50 +09:00
Jeff Squyres	30f913f217	Merge pull request #1652 from jsquyres/pr/remove-aix-timer timer/aix: remove stale code	2016-05-10 15:47:02 -04:00
Jeff Squyres	eccf0ff4cd	hwloc/external: set WRAPPER_EXTRA_* vars in proper location WRAPPER_EXTRA flags are checked before the POST_CONFIG macro is invoked. So set them in the main CONFIG macro. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-10 07:34:56 -07:00
Josh Hursey	44d95cb610	Merge pull request #1657 from bgoglin/hwloc-for-2.0 configure: check the actual may_alias syntax that we use	2016-05-09 13:37:08 -05:00
Ralph Castain	7767882346	Per user request, add some missing data and definitions: OPAL_PMIX_UNIV_RANK - synonym for OPAL_PMIX_GLOBAL_RANK OPAL_PMIX_APP_SIZE - #ranks in the application of this proc	2016-05-09 08:39:01 -07:00
Nathan Hjelm	d99a9786b6	sync_builtin: check for 64-bit atomic support This commit adds an additional check for 64-bit atomic support for __sync builtins. If 64-bit support is not available the opal_atomic_*_64 atomics are disabled. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-05-09 03:17:51 -06:00
Brice Goglin	6839d928c2	configure: check the actual may_alias syntax that we use xlc 13.1.0 crashes because of our may_alias attributes in nolibxml.c on Power7. libxml.c and nolibxml.c are the only may_alias users for now, so change our configure check to match the actual code using it. Thanks to Paul Hargrove for reporting and debugging the issue, and providing the patch. https://www.open-mpi.org/community/lists/devel/2016/05/18918.php (cherry picked from open-mpi/hwloc@0ab7af5e90)	2016-05-08 22:22:30 +02:00
Ralph Castain	7594b95e4b	Ensure the hwloc external header is included when --with-devel-headers is given	2016-05-08 10:18:14 -07:00
Jeff Squyres	acbd2c608d	memory/patcher: check for <sys/syscall.h> Thanks to Paul Hargrove for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-07 09:48:14 -07:00
Jeff Squyres	b4982d7725	timer/aix: remove stale code Per discussion on the mailing list and with IBM, remove the AIX timer code (since AIX is no longer supported). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-07 09:31:34 -07:00
Ralph Castain	7e5ef6a240	Fix the env_list support - the MCA param was being set way too early, so provide a "backdoor" way of providing the value	2016-05-06 15:38:39 -07:00
Ralph Castain	58dd41facf	Repair the processing of cmd line options that mapped to MCA params. This was responsible for breaking things like map-by <foo>. Remove debug, let orterun send terminate cmd to DVM Recover the DVM support	2016-05-06 13:14:03 -07:00
Josh Hursey	35ae7e33d7	Merge pull request #1639 from jjhursey/topic/dl-open-null-fname dl/dlopen/libltdl: Allow opal_dl_open to take a NULL filename.	2016-05-05 22:15:46 -05:00
Ralph Castain	8ec1891d11	Silence warning	2016-05-05 20:04:10 -07:00
Ralph Castain	08022d7af1	Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required.	2016-05-05 15:28:13 -07:00
Joshua Hursey	677178f206	dl/dlopen/libltdl: Allow opal_dl_open to take a NULL filename.	2016-05-05 17:07:26 -04:00
Nathan Hjelm	80f45925bc	Merge pull request #1629 from hjelmn/new_hooks_update New hooks update	2016-05-04 18:53:25 -06:00
Joshua Hursey	788cf1a9fe	asm/powerpc: Fix empty colon list in asm for XL compiler on power Thanks to Paul Hargrove for reporting the problem, and submitting patch. * https://www.open-mpi.org/community/lists/devel/2016/05/18886.php	2016-05-04 14:14:33 -05:00
Nathan Hjelm	ff2a54bd37	patcher/linux: code cleanup Update based on cleanup made to the upstream version on OpenUCX. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-04 12:53:45 -06:00
Nathan Hjelm	6c9a0e1c55	patcher/overwrite: disable ia64 support for now Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-04 12:53:24 -06:00
Nathan Hjelm	6ad68da407	patcher/linux: disable the linux patcher component This commit disables the linux patcher component due to a limitation in loader patching. While this component is effective in patching calls made within Open MPI and by the application it fails to hook calls made within glibc. This means the munmap call made by free is not correctly hooked. Until this problem can be resolved this component will remain disabled. If it can't be resolved this component should probably be removed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-04 12:50:51 -06:00
Nathan Hjelm	71be36d380	patcher: fix ppc32 support The table of contents (TOC) code appears to apply only to ppc64. The code was incorrectly assuming the existence of the TOC on ppc32. This commit updates the necessary code to only apply to ppc64. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-04 12:50:32 -06:00
Nathan Hjelm	41f00b7465	memory/patcher: initialize patcher framework when needed This commit moves the patcher framework initialization to the memory/patcher component. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-04 12:46:42 -06:00
Nathan Hjelm	0f54a95408	Merge pull request #1626 from hjelmn/vader_32 btl/vader: fix compilation on 32-bit systems	2016-05-03 16:39:46 -06:00
Nathan Hjelm	4a740e9f27	Merge pull request #1619 from hjelmn/ext_verbs_fix btl/openib: fix check for exp verbs struct members	2016-05-03 14:16:17 -06:00
Nathan Hjelm	e7ccbdee27	btl/vader: fix compilation on 32-bit systems This commit fixes a compile/link issue caused by vader. The vader btl was using OPAL_THREAD_ADD64 to increment a counter which may not be available on 32-bit systems. Changed to use OPAL_THREAD_ADD_SIZE_T which will be 64-bit or 32-bit depending on the system. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-03 10:14:44 -06:00
Nathan Hjelm	2d0e2b6233	patcher: do not clobber ebx ebx cannot be clobbered when using -fPIC, so save and restore the register instead of allowing it to be clobbered. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-05-03 08:24:33 -06:00
Brice Goglin	a2a721f961	linux: actually enable libudev based on the result of AC_CHECK_LIB instead of doing AC_CHECK_HEADERS+AC_CHECK_LIB and only using the result of the former. Thanks to Paul Hargrove for reporting the issue (OMPI build with -m32). (cherry picked from open-mpi/hwloc@9549fd59af)	2016-05-03 10:00:40 +02:00
Nathan Hjelm	da695a6ce6	Merge pull request #1618 from hjelmn/new_hooks_update More hook updates	2016-05-02 18:12:50 -06:00
Nathan Hjelm	a65af6d079	btl/openib: fix check for exp verbs struct members This commit fixes a compilation issue with some versions of exp verbs. In some cases struct ibv_exp_device_attr does not have either the exp_atom or exp_atomic_cap fields. It is fine to drop one check and fall back to the non-exp attribute check on the other. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 17:13:33 -06:00
Nathan Hjelm	1ff79656dd	patcher: remove debug fprintf Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 17:11:00 -06:00
Nathan Hjelm	581e47c271	patcher: check for clflush Add a feature check for clflush before trying to use the clflush instruction. As far as I can tell there is no equivalent before the SSE2 instruction set. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 17:10:42 -06:00
Nathan Hjelm	67fd6fa6eb	Merge pull request #1615 from hjelmn/new_hooks_update memory/patcher: add #if check for MREMAP_FIXED	2016-05-02 16:26:58 -06:00
Nathan Hjelm	eb14b34f04	memory/patcher: fix compilation on BSDs The function signature of mremap on BSD (NetBSD, FreeBSD) differs from the linux version. Added support for the BSD style of mremap. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 14:54:08 -06:00
Nathan Hjelm	52edb43bdc	memory/patcher: check for linux/mman.h Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 14:29:46 -06:00
Nathan Hjelm	f8b3be6236	patcher/overwrite: fix ia64 compilation Fixed a couple of typos in ia64 code. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 14:10:34 -06:00
Nathan Hjelm	14c34ae9f0	memory/patcher: add #if check for MREMAP_FIXED This commit fixes a compile error when the system has mremap but not MREMAP_FIXED. In this case we do not care about the value of new_address as the argument does not exist. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 13:58:51 -06:00
rhc54	648043597a	Merge pull request #1612 from ggouaillardet/poc/pmix_external_configury pmix/external: revamp external pmix package detection	2016-05-02 09:46:05 -07:00
Jeff Squyres	265e5b9795	Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1 ompi/opal/orte/oshmem/test: max hostname length cleanup	2016-05-02 09:44:18 -04:00
Gilles Gouaillardet	45f9a47d77	pmix/external: fix typo and silence a warning	2016-05-02 17:15:52 +09:00
Gilles Gouaillardet	08d91b9a03	pmix/external: revamp external pmix package detection	2016-05-02 16:23:31 +09:00
Ralph Castain	6ac7929bd0	Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need. Cleanups per @jjhursey review	2016-05-01 11:30:25 -07:00
George Bosilca	3445577f4c	Avoid race conditions during BTL TCP handshake. In some rare cases when a process receives the connect ack while locally updating the peer endpoint structure, we could drop the incoming connect ack due to the fact that the send handler is protected with a try lock (on the endpoint) and our initial send event was not persistent. Making the send event persistent solves all issues.	2016-05-01 14:19:29 -04:00
George Bosilca	702f80ad7e	Remove "signed vs. unsigned" warnings.	2016-05-01 11:45:48 -04:00
Ralph Castain	42d9d861fc	Fix minor typo in PMIx packing of pmix_app_t - thanks to Gilles for pointing it out	2016-04-29 08:55:46 -07:00
Howard Pritchard	f52dd511d4	Merge pull request #1600 from hppritcha/topic/pmix_fix_for_finalize pmix/cray: set fence_nb to NULL	2016-04-28 13:50:15 -06:00
hppritcha	aa1d7b9c50	pmix/cray: set fence_nb to NULL Rather than have a stub function for the pmix fence_nb operation, just set to NULL. Causes fewer problems. Fixes #1597 Fixes #1527 Signed-off-by: hppritcha <howardp@lanl.gov>	2016-04-28 13:48:54 -05:00
Nysal Jan K.A	18cf65dc24	Remove a stray print statement	2016-04-28 18:00:52 +05:30
Nathan Hjelm	03f4a854cb	btl/tcp: fix add_procs race condition This commit fixes a race between a thread calling the tcp btl's add_procs and a thread processing an incoming connection. The race occurred because the add_procs thread adds a newly created proc object to the hash table before the object is fully initialized. The connection thread then attempts to use the object before the endpoints array on the object has been allocated. The fix is to only add the proc to the hash table after it has been completely initialized. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-27 10:24:39 -06:00
Nathan Hjelm	8f93b15e90	Merge pull request #1580 from hjelmn/new_hooks_update memory/patcher: cast away const in shmdt hook	2016-04-26 17:48:01 -06:00
Nathan Hjelm	df194087c7	Merge pull request #1591 from hjelmn/rcache_update rcache: fix leave_pinned failure path	2016-04-26 16:50:06 -06:00
Nathan Hjelm	25a97af695	rcache: fix leave_pinned failure path This commit fixes an error in the failure path of leave_pinned. When the rcache tries to enable leave_pinned but leave_pinned was not specifically requested (opal_leave_pinned == -1) the code was erroneously printing an error and returning NULL. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-26 14:39:23 -06:00
Ralph Castain	02876564d4	Silence warning of zero-byte malloc	2016-04-26 11:55:59 -07:00
Nathan Hjelm	5612998d21	memory/patcher: cast away const in shmdt hook The opal_mem_hooks_release_hook does not have const on the pointer (though it probably should). This commit eliminates a warning by casting away the const until opal_mem_hooks_release_hook is updated to use const. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-25 15:32:11 -06:00
Jeff Squyres	78b367eb0d	memory patcher: add some clarifying comments This is complicated stuff: add some comments so that future maintainers have some rationale to understand the way things have been done. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-04-25 13:12:02 -07:00
Geoffrey Paulsen	55a15fb1d0	Missed one IBM Copyright message for contributions in memory patcher component	2016-04-25 15:47:15 -04:00
Geoffrey Paulsen	ed6f508735	Updated IBM Copyright message for contributions in memory patcher component.	2016-04-25 15:13:38 -04:00
Karol Mroz	e1c64e6e59	opal: standardize on max hostname length Define OPAL_MAXHOSTNAMELEN to be either: (MAXHOSTNAMELEN + 1) or (limits.h:HOST_NAME_MAX + 1) or (255 + 1) For pmix code, define above using PMIX_MAXHOSTNAMELEN. Fixup opal layer to use the new max. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-24 08:19:47 +02:00
Nathan Hjelm	ae0ffbb67f	Merge pull request #1397 from hjelmn/enable_thread_multiple ompi: always enable MPI_THREAD_MULTIPLE support	2016-04-23 08:40:22 -06:00
Jeff Squyres	dc18c32437	usnic: fix resource check The math for checking the number of QPs and CQs per usNIC/VF was incorrect, allowing you to run MPI processes even when usNICs (i.e., VIC VFs) had fewer QPs and CQs than were necessary. This led to a confusing error later when fi_enable(3) failed (because we lazily create QPs). Fixing the math here ensures that we actually print a helpful error message telling the user specifically what is wrong. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-04-22 15:58:27 -07:00
George Bosilca	d379e23bf7	One less warning. The heterogeneous code needs to gracefully handle the contiguous datatype loops in order to have the "#if 0" code path enabled again. This is a performance issue (the correctness is guaranteed by the current code).	2016-04-21 18:11:29 -04:00
Jeff Squyres	68c1a5eb6c	Merge pull request #1567 from jsquyres/pr/fix-ompi-to-opal-name-conversion m4: rename OMPI_SUMMARY_* macros to OPAL_SUMMARY_*	2016-04-20 13:10:06 -04:00
Jeff Squyres	6800ef9ec0	m4: rename OMPI_SUMMARY_* macros to OPAL_SUMMARY_* These macros should really be named OPAL_SUMMARY_*; they're used in all projects, and therefore should be in the lowest layer project (OPAL). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-04-20 08:40:00 -07:00
Nathan Hjelm	db854c368a	memory/patcher: do not hook madvise if the syscall doesn't exist Fixes open-mpi/ompi#1565 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-20 09:18:31 -06:00
Nathan Hjelm	fdd1ff7c29	Merge pull request #1562 from hjelmn/opal_coverity mca/base: fix coverity issue	2016-04-19 14:29:05 -06:00
Nathan Hjelm	3f15d442de	Merge pull request #1561 from hjelmn/mpool_rewrite rcache/base: add missing file to tarball	2016-04-19 12:00:06 -06:00
Nathan Hjelm	16c28399cd	rcache/base: add missing file to tarball Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 11:03:38 -06:00
Nathan Hjelm	d981a9fc7d	patcher/overwrite: fix compile error on x86 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 10:16:39 -06:00
Nathan Hjelm	5bc9d9d1f8	patcher/linux: fix compiler warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 10:16:20 -06:00
Nathan Hjelm	1147fb3dd1	patcher/linux: ensure component is only enabled on Linux Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 10:16:00 -06:00
Nathan Hjelm	a8e90e8796	memory/patcher: munmap hook could be called from within a malloc() implementation Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-18 15:00:07 -06:00
Nathan Hjelm	7f271171f6	memory/patcher: fix coverity warning Fix CID 1358512: Error handling issues (NEGATIVE_RETURNS): C libraries usually handle read (-1, ...) fine but it is safer to avoid calling read with a negative handle. Added negative file descriptor check. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-18 11:29:20 -06:00
Gilles Gouaillardet	d96919638f	pmix: remove autogenerated file and update .gitignore removed: opal/mca/pmix/pmix114/pmix/src/include/private/autogen/config.h.in	2016-04-18 12:57:41 +09:00
Ralph Castain	b009e58d25	Roll to PMIx 1.1.4rc2 - replaces some code that was incorrectly removed in prior update	2016-04-16 18:24:24 -07:00
Ralph Castain	8ff114e668	Update to official PMIx 1.1.4rc1	2016-04-15 21:47:46 -07:00
Ralph Castain	449ec41532	Roll to PMIx 1.1.4rc1 and remove the PMIx 1.2.0 directory as the community has decided to not do that release version. This incorporates a number of bug fixes that have been identified and repaired in the PMIx and OMPI code bases. Also includes several minor corrections to the PMIx code so it now supports run-thru without hanging on collectives involving a process that exits	2016-04-15 10:11:11 -07:00
Nathan Hjelm	4d9c047f04	Merge pull request #1546 from hjelmn/mpool_rewrite rcache: add missing file	2016-04-14 10:22:09 -06:00
Nathan Hjelm	9046424be5	rcache: add missing file Fixes #1545 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-14 09:21:09 -06:00
Ralph Castain	4b3995dd27	Trivial change to silence warning	2016-04-14 05:54:02 -07:00
Nathan Hjelm	1e6b4f2f55	Merge pull request #1495 from hjelmn/new_hooks Add new patcher memory hooks	2016-04-13 18:19:23 -06:00
Nathan Hjelm	c2b6fbb124	opal/memory: move initialization to first rcache creation Because of the removal of the linux memory component it is no longer necessary to initialize the memory component in opal_init(). This commit moves the initialization to the creation of the first rcache component. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:21:46 -06:00
Nathan Hjelm	80ec79cfc8	memory/patcher: updates to memory hooks This commit fixes bugs that can cause crashes and memory corruption when the mremap hook is called. The problem occurs because of the ellipses (...) in the mremap intercept function. The ellipses cover the optional new_addr argument on Linux. This commit removes the ellipses and adds an explicit 5th argument. This commit also adds a hook for shmdt. The code only works on Linux at the moment as it needs to read /proc/self/maps to determine the size of the shared memory segment. Additionally, this commit removes the mmap hook. There is no apparent benefit for detecting mmap(..., PROT_NONE, ...) and it seems to cause problems when threads are in use. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:20:24 -06:00
Nathan Hjelm	91bcab93cb	opal/memory: remove ptmalloc2 This commit removes the ptmalloc2 memory hooks. This is necessary in order to support lazy registration of memory hooks. A feature that is not supported by the ptmalloc hooks but is supported by the new patcher hooks. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:18:15 -06:00
Nathan Hjelm	27f8a4e806	opal: add code patcher framework This commit adds a framework to abstract runtime code patching. Components in the new framework can provide functions for either patching a named function or a function pointer. The latter functionality is not being used but may provide a way to allow memory hooks when dlopen functionality is disabled. This commit adds two different flavors of code patching. The first is provided by the overwrite component. This component overwrites the first several instructions of the target function with code to jump to the provided hook function. The hook is expected to provide the full functionality of the hooked function. The linux patcher component is based on the memory hooks in ucx. It only works on linux and operates by overwriting function pointers in the symbol table. In this case the hook is free to call the original function using the function pointer returned by dlsym. Both components restore the original functions when the patcher framework closes. Changes had to be made to support Power/PowerPC with the Linux dynamic loader patcher. Some of the changes: - Move code necessary for powerpc/power support to the patcher base. The code is needed by both the overwrite and linux components. - Move patch structure down to base and move the patch list to mca_patcher_base_module_t. The structure has been modified to include a function pointer to the function that will unapply the patch. This allows the mixing of multiple different types of patches in the patch_list. - Update linux patching code to keep track of the matching between GOT entry and original (unpatched) address. This allows us to completely clean up the patch on finalize. All patchers keep track of the changes they made so that they can be reversed when the patcher framework is closed. At this time there are bugs in the Linux dynamic loader patcher so its priority is lower than the overwrite patcher. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:16:13 -06:00
Nathan Hjelm	4cac623aeb	opal/patch: add call to check if binary patching is supported Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:16:12 -06:00
Nathan Hjelm	11e2d7886e	opal/memory: update component structure This commit makes it possible to set relative priorities for components. Before the addition of the patcher component there was only one component that would run on any system but that is no longer the case. When determining which component to open each component's query function is called and the one that returns the highest priority is opened. The default priority of the patcher component is set slightly higher than the old ptmalloc2/ummunotify component. This commit fixes a long-standing break in the abstraction of the memory components. ompi_mpi_init.c was referencing the linux malloc hook initialize function to ensure the hooks are initialized for libmpi.so. The abstraction break has been fixed by adding a memory base function that calls the open memory component's malloc hook init function if it has one. The code is not yet complete but is intended to support ptmalloc in 2.0.0. In that case the base function will always call the ptmalloc hook init if it exists. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:14:51 -06:00
Nathan Hjelm	7aa03d66b3	opal/memory: add support for patch based memory hooks This commit adds support for runtime binary patching. The support is broken down into two parts: util/opal_patcher.[ch] which contains the functionality for runtime patching of symbols, and mca/memory/patcher which patches the various symbols needed to provide support for memory hooks. This work is preliminary and is based off work donated by IBM. The patcher code is disabled if dlopen is disabled. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:14:31 -06:00
Gilles Gouaillardet	c72688e8cf	Merge pull request #1362 from ggouaillardet/topic/openib_warn_default_gid_prefix btl/openib: correctly issue a warning when two btls or more are in th…	2016-04-11 13:22:48 +09:00
Gilles Gouaillardet	4ab6c8ad56	mpool/hugepage: use statvfs() instead of statfs() when needed. Thanks Siegmar Gross for the report.	2016-04-11 11:13:29 +09:00
Ralph Castain	2432daf065	Some minor cleanups of a memory leak and error output	2016-04-08 07:46:18 -07:00
Rainer Keller	52080a5736	As per the pull request to pmix/master: https://github.com/pmix/master/pull/71 Have OMPI's current version of pmix120 nicely fail in case of too long sun_path (longer than 108 or in case of OSX 103 chars). And have OMPI return proper error messages with hints how to amend.	2016-04-07 22:12:53 +02:00
Thananon Patinyasakdikul	92290b94e0	Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN)	2016-04-07 12:52:17 -04:00
rhc54	f858647779	Merge pull request #1522 from kmroz/wip-ompi-info-params-fix opal_info_support: fix memory leak and refactor for pvars	2016-04-05 10:02:42 -07:00
Nathan Hjelm	444190093a	Merge pull request #1516 from kmroz/wip-ompi-info-cleanup opal_info_support: fix api comments	2016-04-05 07:31:33 -06:00
Nathan Hjelm	9efd465539	Merge pull request #1517 from hjelmn/ugni_fixes Gemini/Aries bug fixes	2016-04-05 07:23:18 -06:00
George Bosilca	26fc8533f8	Remove compiler warnings.	2016-04-04 16:34:23 -04:00
Karol Mroz	13979f559f	opal_info_support: refactor component output separation Adding component name to the pvar pretty output as well. Further, I think keeping the asprintf()/opal_info_out(msg,msg,------) within the loops is needed to avoid printing any component information (independently for group_vars and group_pvars) in case the upcoming parameters are internal and not to be displayed. Lastly, unnecessarily duplicating the dashed output should not happen as each invocation of opal_info_show_mca_group_params() passes a new group structure which we check. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-04 19:49:57 +02:00
Karol Mroz	43254391a8	opal_info_support: fix memory leak Fixing a memory leak I introduced with `a3229c3a1f`. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-04 18:11:11 +02:00
Gilles Gouaillardet	6f450630d8	pmix/external: fix misc missing conversion and type issues	2016-04-04 10:12:34 +09:00
Gilles Gouaillardet	2ede47c462	pmix: fix misc missing conversion and type issues	2016-04-04 10:12:34 +09:00
rhc54	a548232c6b	Merge pull request #1518 from kmroz/wip-ompi-info-param-output-1 opal_info_support: add component to param pretty output	2016-04-03 06:39:00 -07:00
rhc54	d724d8a673	Merge pull request #1515 from kmroz/wip-ompi-info-all-2 opal_info_support: output component versions	2016-04-03 06:36:50 -07:00
Karol Mroz	e1eb23e7eb	opal_info_support: separate parameter groups in pretty output Use a dashed line to separate parameters based on component when pretty printing. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-03 10:56:39 +02:00
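Commit 03f4a854cb above describes a race in the TCP BTL's add_procs: a proc object became visible in the hash table before its endpoints array was allocated, and the fix was to insert the proc only after it is completely initialized. The following is a minimal C sketch of that "initialize, then publish" pattern, not the Open MPI code itself: `proc_t`, `proc_table`, `proc_lookup()` and `proc_create_and_publish()` are hypothetical names, and a mutex-protected fixed-size array stands in for the real hash table.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define TABLE_SIZE 64u  /* hypothetical fixed-size stand-in for a hash table */

/* Hypothetical stand-in for a TCP BTL proc object. */
typedef struct {
    unsigned int id;
    size_t       num_endpoints;
    void       **endpoints;   /* must be allocated before the proc is published */
} proc_t;

static proc_t         *proc_table[TABLE_SIZE];
static pthread_mutex_t proc_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Connection-thread side: look up a proc by id. */
proc_t *proc_lookup(unsigned int id)
{
    pthread_mutex_lock(&proc_table_lock);
    proc_t *p = proc_table[id % TABLE_SIZE];
    if (NULL != p && p->id != id) {
        p = NULL;               /* slot holds a different proc */
    }
    pthread_mutex_unlock(&proc_table_lock);
    /* Invariant the pattern guarantees: a visible proc is fully built. */
    assert(NULL == p || NULL != p->endpoints);
    return p;
}

/* add_procs side: build the object completely, then publish it. */
proc_t *proc_create_and_publish(unsigned int id, size_t num_endpoints)
{
    proc_t *p = calloc(1, sizeof(*p));
    if (NULL == p) {
        return NULL;
    }
    p->id = id;
    p->num_endpoints = num_endpoints;
    p->endpoints = calloc(num_endpoints ? num_endpoints : 1, sizeof(void *));
    if (NULL == p->endpoints) {
        free(p);
        return NULL;
    }

    /* Publishing is the last step; no collision handling in this sketch. */
    pthread_mutex_lock(&proc_table_lock);
    proc_table[id % TABLE_SIZE] = p;
    pthread_mutex_unlock(&proc_table_lock);
    return p;
}
```

Because the insert is the final step and happens under the same lock that the lookup takes, a thread that finds the proc also observes its allocated endpoints array. Slot collisions simply overwrite in this sketch; a real table would chain or probe.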


4653 Commits
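Commit e1c64e6e59 standardizes the maximum hostname length as (MAXHOSTNAMELEN + 1), else (limits.h:HOST_NAME_MAX + 1), else (255 + 1). The preprocessor chain below is a hedged sketch of that fallback order; `MY_MAXHOSTNAMELEN` and `get_local_hostname()` are hypothetical names, and the actual OPAL_MAXHOSTNAMELEN definition in the Open MPI tree may differ in detail.

```c
#define _DEFAULT_SOURCE     /* expose gethostname() under strict glibc modes */
#include <assert.h>
#include <limits.h>         /* HOST_NAME_MAX, where available */
#include <string.h>
#include <sys/param.h>      /* MAXHOSTNAMELEN on Linux and the BSDs */
#include <unistd.h>         /* gethostname() */

/* Hypothetical restatement of the fallback order described in the commit. */
#if defined(MAXHOSTNAMELEN)
#  define MY_MAXHOSTNAMELEN (MAXHOSTNAMELEN + 1)
#elif defined(HOST_NAME_MAX)
#  define MY_MAXHOSTNAMELEN (HOST_NAME_MAX + 1)
#else
#  define MY_MAXHOSTNAMELEN (255 + 1)
#endif

/* Copy the local hostname into buf (which must hold MY_MAXHOSTNAMELEN bytes)
   and guarantee NUL termination even if gethostname() truncated the name. */
int get_local_hostname(char buf[MY_MAXHOSTNAMELEN])
{
    if (0 != gethostname(buf, MY_MAXHOSTNAMELEN - 1)) {
        return -1;
    }
    buf[MY_MAXHOSTNAMELEN - 1] = '\0';
    assert(strlen(buf) < MY_MAXHOSTNAMELEN);
    return 0;
}
```

The extra "+ 1" reserves room for the terminating NUL, which is why the sketch passes one byte less than the buffer size to gethostname() and then terminates the last byte explicitly.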