openmpi

Author	SHA1	Message	Date
Nathan Hjelm	e968ddfe64	start bug fixes (#1729 ) * mpi/start: fix bugs in cm and ob1 start functions There were several problems with the implementation of start in Open MPI: - There are no checks whatsoever on the state of the request(s) provided to MPI_Start/MPI_Start_all. It is erroneous to provide an active request to either of these calls. Since we are already looping over the provided requests there is little overhead in verifying that the request can be started. - Both ob1 and cm were always throwing away the request on the initial call to start and start_all with a particular request. Subsequent calls would see that the request was pml_complete and reuse it. This introduced a leak as the initial request was never freed. Since the only pml request that can be mpi complete but not pml complete is a buffered send the code to reallocate the request has been moved. To detect that a request is indeed mpi complete but not pml complete isend_init in both cm and ob1 now marks the new request as pml complete. - If a new request was needed the callbacks on the original request were not copied over to the new request. This can cause osc/pt2pt to hang as the incoming message callback is never called. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * osc/pt2pt: add request for gc after starting a new request Starting a new receive may cause a recursive call into the pt2pt frag receive function. If this happens and the prior request is on the garbage collection list it could cause problems. This commit moves the gc insert until after the new request has been posted. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 20:22:40 -04:00
George Bosilca	e1c6b0e4a7	Some compilers are more than picky.	2016-06-03 09:04:34 +09:00
Matias Cabral	a15b648e65	Merge pull request #1747 from matcabral/master Adding owner file for PSM2 MTL.	2016-06-02 16:32:26 -07:00
Matias A Cabral	29ab28f4f6	Adding owner.txt file for PSM2 MTL.	2016-06-02 16:26:16 -07:00
Nathan Hjelm	d9fc855955	Merge pull request #1743 from hjelmn/gcc_atomics_fix atomic/gcc: add check for 128-bit CAS being lock-free	2016-06-02 16:55:31 -06:00
Nathan Hjelm	d86e41ea13	atomic/gcc: add check for 128-bit CAS being lock-free Compiler implementations are free to include support for atomics that use locks. Unfortunately lock-free and lock atomics do not mix. Older versions of llvm on OS X use locks to provide __atomic_compare_exchange on 128-bit values but are lock-free on 64-bit values. This screws up our lifo implementation which mixes 64-bit and 128-bit atomics on the same values to improve performance. This commit adds a configure-time check if 128-bit atomics are lock free. If they are not then the 128-bit __atomic CAS is disabled and we check for the __sync version as a fallback. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-02 15:59:05 -06:00
Nathan Hjelm	5aab4b2d51	Merge pull request #1662 from ggouaillardet/topic/amd64_atomic amd64/atomic: silence warnings	2016-06-02 14:10:20 -06:00
George Bosilca	d577e12dd0	Fix comment.	2016-06-03 00:57:31 +09:00
George Bosilca	87b1d17e7e	Remove warnings. clang 7.0 with the picky option on is extremely verbose, and complains about almost everything. Trying to make him happy, at least regarding the datatype engine.	2016-06-03 00:56:24 +09:00
George Bosilca	fc5d458249	Consistency in handling OPAL_ENABLE_FT_CR. I am not sure if we should continue to maintain the request support for FT_CR, but I tried here to simplify the code while maintaining the same meaning.	2016-06-03 00:54:24 +09:00
rhc54	483b9c370a	Merge pull request #1741 from rhc54/topic/pmix114 Update to 1.1.4rc3	2016-06-02 06:57:37 -07:00
Nathan Hjelm	fc26d9c69f	Merge pull request #1734 from hjelmn/progress_threading opal/progress: make progress function registration mt safe	2016-06-02 06:35:59 -06:00
Nathan Hjelm	b001184e63	request: fix warnings (#1742 ) Fix warnings introduced by request rework. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 04:53:16 -04:00
Ralph Castain	ecea1e3bb5	Update to 1.1.4rc3	2016-06-01 20:56:07 -07:00
George Bosilca	bfcf145613	Refactor the request test and wait functions.	2016-06-02 11:58:25 +09:00
Nathan Hjelm	2fad3b9bc6	opal/progress: make progress function registration mt safe This commit fixes a bug in opal progress registration that can cause crashes when a progress function is registered while another thread is in opal_progress(). Before this commit realloc is used to allocate more space for progress functions but it is possible for a thread in opal_progress() to try to read from the array that is freed by realloc before the array is re-assigned when realloc returns. To prevent this race use malloc + memcpy to fill the new array and atomically swap out the old and new array pointers. Per suggestion we now allocate a default of 8 slots for callbacks and double the current number when we run out of space. This commit also fixes leaking the callbacks_lp array. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 20:57:19 -06:00
George Bosilca	d9fb59bea5	Update the synchronization primitive Add comments and make sure we correctly return the status of the synchronization primitive, especially if it was completed with error.	2016-06-02 11:53:56 +09:00
George Bosilca	2e1b1d34c6	Safety first !	2016-06-02 11:52:43 +09:00
George Bosilca	50cec456fb	ompi_request_complete with signal Rewrite the ompi_request_complete function to take in account the with_signal argument. Change the comment to explain the expected behavior. Alter all the ompi_request_complete uses to make sure the status of the request is set before calling ompi_request_complete. bot:label:enhancement	2016-06-02 11:49:12 +09:00
George Bosilca	223d75595d	Give a boost to MPI_Barrier. Based on current implementation it is faster to use a blocking send than the non-blocking version. Switch the exchange function used in the barrier to use the blocking version combined with the non-blocking version of the receive.	2016-06-02 11:45:25 +09:00
rhc54	3b68c1f8db	Merge pull request #1740 from rhc54/topic/async Add an experimental ability to skip the RTE barriers at the end of MPI_Init and the beginning of MPI_Finalize	2016-06-01 18:31:35 -07:00
Nathan Hjelm	f33bbfd381	atomic: add support for __atomic builtins (#1735 ) * atomic: add support for __atomic builtins This commit adds support for the gcc __atomic builtins. The __sync builtins are deprecated and have been replaced by these atomics. In addition, the new atomics support atomic exchange which was not supported by __sync. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * atomic: add support for transactional memory This commit adds support for using transactional memory when using opal atomic locks. This feature is enabled if the __HLE__ feature is available and the gcc builtin atomics are in use. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 21:23:47 -04:00
Ralph Castain	2c086e56be	Add an experimental ability to skip the RTE barriers at the end of MPI_Init and the beginning of MPI_Finalize	2016-06-01 17:01:15 -07:00
rhc54	b85a5e62ab	Merge pull request #1739 from rhc54/topic/pmix Split the pmix external component into one for the 1.1.4 release, and…	2016-06-01 16:24:44 -07:00
Nathan Hjelm	d844442683	Merge pull request #1738 from hjelmn/ob1_req_fix pml/ob1: fix race on pml completion of send requests	2016-06-01 15:21:52 -06:00
Ralph Castain	12ecf972af	Split the pmix external component into one for the 1.1.4 release, and another for the upcoming 2.0 release. Clean up the configury so the components look for a series-specific function instead of running a program. NOTE: the changes for the 2.0 series are not yet in the PMIx master.	2016-06-01 14:15:24 -07:00
Jeff Squyres	873cebb4c0	Merge pull request #1727 from jsquyres/pr/mpirun-timeout-and-friends mpirun.1in: add descriptions of new options	2016-06-01 17:11:44 -04:00
Nathan Hjelm	ceb2912838	Merge pull request #1736 from hjelmn/ugni_fixes ugni BTL fixes	2016-06-01 14:59:55 -06:00
Nathan Hjelm	086ffc1838	pml/ob1: fix race on pml completion of send requests The request code was setting the request as pml_complete before calling MCA_PML_OB1_SEND_REQUEST_MPI_COMPLETE. This was causing MCA_PML_OB1_SEND_REQUEST_RETURN to be called twice in some cases. The code now mirrors the recvreq code and only sets the request as pml complete if the request has not already been freed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 13:36:06 -06:00
Jeff Squyres	2c3d522147	Merge pull request #1737 from jsquyres/pr/fix-hwloc-valgrind-check fix hwloc valgrind check	2016-06-01 11:14:02 -04:00
Jeff Squyres	d175fd692d	README.ompi: track patches added to hwloc Track post-v1.11.3-release patches applied to the hwloc copy embedded in Open MPI. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-01 07:17:05 -07:00
Jeff Squyres	3867bd3640	hwloc.m4: only check for valgrind in non-embedded mode This fixes https://github.com/open-mpi/ompi/issues/1732: i.e., the case where the outer project has its own check for <valgrind/valgrind.h>, but also supplements CPPFLAGS (to find Valgrind's header files) before doing that check. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Ideally, we would tell OMPI to disable autoconf's caching of our valgrind check result so that its check gets the right result after adding CPPFLAGS. Not sure if we can do that. For now, just disable our Valgrind code in embedded mode. This will keep the x86 backend enabled under Valgrind but it will auto-disable itself when finding identical APIC ids anyway (because CPUID returns same outputs for all PUs). Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Fixes open-mpi/ompi#1732 (cherry picked from commit open-mpi/hwloc@8b44fb1c81)	2016-06-01 06:58:53 -07:00
Gilles Gouaillardet	57978a75d0	Merge pull request #1717 from ggouaillardet/topic/lex_cleanup configury: clean the flex generated .c files	2016-06-01 13:06:21 +09:00
Nathan Hjelm	5d4bcce042	Merge pull request #1700 from shamisp/topic/cma_config CMA: Fixing logic for CMA system call detection	2016-05-31 20:33:48 -06:00
Nathan Hjelm	340152a635	Merge pull request #1720 from shamisp/topic/vader/max_addr VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms.	2016-05-31 20:33:28 -06:00
Gilles Gouaillardet	5f565dfec3	configury: clean the flex generated .c files	2016-06-01 11:13:31 +09:00
Jeff Squyres	cf27ec36b3	mpirun.zsh: add options to zsh shell completion Add the following to zsh shell completion: * --get-stack-traces * --report-state-upon-timeout * --timeout Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-31 16:33:46 -07:00
Jeff Squyres	e9ce11c6a7	help-orterun.txt: minor word smything Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-31 16:33:46 -07:00
Jeff Squyres	347497cc7e	mpirun.1in: add descriptions of new options Add descriptions for the new --report-state-on-timeout and --get-stack-traces options. Also add --timeout, and cross-reference MPIEXEC_TIMEOUT with it. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-31 16:33:46 -07:00
Nathan Hjelm	bf10d79914	btl/ugni: remove erroneous unlock The endpoint lock was being released twice in mca_btl_ugni_get_ep. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:53 -06:00
Nathan Hjelm	cc96097873	btl/ugni: fix bug when attempting unaligned get on aries This commit fixes a programming error when using an aries nic. The documentation of ugni shows that only the local alignment restriction for get was lifted on aries. There is still a remote address alignment restriction. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:09 -06:00
Jeff Squyres	17202e5177	Merge pull request #1733 from jsquyres/pr/hwloc1113-fix hwloc1113: add missing file to Makefile.am	2016-05-31 13:59:08 -04:00
Jeff Squyres	5cfee95ea4	hwloc1113: add missing file to Makefile.am Lack of this file causes a failure when you run autogen.pl on a distribution tarball. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-31 09:57:50 -07:00
rhc54	93ff4ce36d	Merge pull request #1731 from rhc54/topic/timeout Provide ETIMEDOUT as the mpirun exit code if the timeout limit was hit	2016-05-31 08:41:21 -07:00
Ralph Castain	0cd0ccb7fd	Provide ETIMEDOUT as the mpirun exit code if the timeout limit was hit	2016-05-31 07:45:31 -07:00
Gilles Gouaillardet	1bbc5fadee	ompi/win: silence an other warning	2016-05-31 13:18:39 +09:00
Gilles Gouaillardet	c41321b9e5	ompi/win: silence warning	2016-05-31 13:03:20 +09:00
rhc54	0965cb3d41	Merge pull request #1730 from rhc54/topic/pmixext Patch from Gilles - modify detection of PMIx version for external libraries	2016-05-30 18:50:12 -07:00
Ralph Castain	7b115a9e0b	Patch from Gilles - modify detection of PMIx version for external libraries	2016-05-30 14:30:10 -07:00
George Bosilca	d2abff583e	Fix race condition during BTL TCP tear-down. bot:label:bug bot:assign:@hjelmn	2016-05-30 10:47:14 -05:00


25191 Commits
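
Commit 2fad3b9bc6 above describes replacing realloc with an allocate, copy, and atomically swap pattern, starting at 8 slots and doubling when full. The following is a minimal sketch of that idea only, not the Open MPI code: every name here (cb_table_t, register_progress, run_progress, example_cb) is hypothetical, registrations are assumed to be serialized by the caller, and retired tables are simply kept on a list rather than freed, which is one conservative way to keep them valid for readers that still hold the old pointer.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*progress_cb_t)(void);

/* Filler for unused slots so readers never call a NULL pointer. */
static int noop_cb(void) { return 0; }

/* Sample callback used for illustration: reports one event per call. */
static int example_cb(void) { return 1; }

/* Hypothetical table type: capacity and slots travel together, so a
 * reader that loads the table pointer sees a consistent pair. */
typedef struct cb_table {
    struct cb_table *retired_next;   /* older table, kept alive for readers */
    size_t size;                     /* allocated slots */
    _Atomic(progress_cb_t) slots[];  /* unused slots hold noop_cb */
} cb_table_t;

static _Atomic(cb_table_t *) current_table = NULL;
static size_t used_slots = 0;        /* touched only by the registering thread */

/* Assumption: callers serialize registrations (for example with a mutex). */
int register_progress(progress_cb_t cb)
{
    cb_table_t *old = atomic_load(&current_table);

    if (NULL == old || used_slots == old->size) {
        /* Out of space: build a new table instead of realloc'ing in place. */
        size_t new_size = (NULL == old) ? 8 : 2 * old->size;
        cb_table_t *tbl = malloc(sizeof(*tbl) + new_size * sizeof(tbl->slots[0]));
        if (NULL == tbl) {
            return -1;
        }
        tbl->retired_next = old;
        tbl->size = new_size;
        for (size_t i = 0; i < new_size; ++i) {
            progress_cb_t f = noop_cb;
            if (i < used_slots) {
                f = atomic_load(&old->slots[i]);   /* copy existing entries */
            } else if (i == used_slots) {
                f = cb;                            /* the new registration */
            }
            atomic_init(&tbl->slots[i], f);
        }
        /* Publish the fully built table in one atomic step. */
        atomic_store(&current_table, tbl);
    } else {
        /* Room left: overwrite a noop slot atomically. */
        atomic_store(&old->slots[used_slots], cb);
    }
    ++used_slots;
    return 0;
}

/* Reader side: works on whichever table it loaded, old or new. */
int run_progress(void)
{
    cb_table_t *tbl = atomic_load(&current_table);
    int events = 0;

    if (NULL == tbl) {
        return 0;
    }
    for (size_t i = 0; i < tbl->size; ++i) {
        progress_cb_t f = atomic_load(&tbl->slots[i]);
        events += f();
    }
    return events;
}
```

Registering example_cb nine times forces one growth step (8 slots to 16), after which run_progress() calls all nine registered callbacks plus the no-op fillers.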