openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	ead7efef3f	coll/basic: silence CID 1362614 in mca_coll_basic_allreduce_inter()	2016-06-09 09:40:19 +09:00
Gilles Gouaillardet	ad2e1a5ae9	coll/base: silence CID 1362613 in ompi_coll_base_alltoall_intra_basic_linear()	2016-06-09 09:40:05 +09:00
Gilles Gouaillardet	80b267af1c	coll/base: silence CID 1362601 in ompi_coll_base_sendrecv_zero()	2016-06-09 09:37:31 +09:00
Gilles Gouaillardet	0e393195d9	coll/base: fix [all]reduce with non zero lower bound datatypes Offset temporary buffer when a non zero lower bound datatype is used. Thanks Hristo Iliev for the report	2016-06-08 16:48:00 +09:00
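A minimal sketch of the buffer offset described in the commit above. It assumes a datatype summarized by a hypothetical `layout_t` (true lower bound, true extent, extent); none of these names are Open MPI APIs. Datatype code touches memory starting at `buf + true_lb`, so scratch memory allocated at `raw` has to be handed over as `raw - true_lb`. Integer (`uintptr_t`) arithmetic keeps that shift well defined even though the shifted address lies outside the allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary of a datatype's memory footprint (not an Open MPI type). */
typedef struct {
    ptrdiff_t true_lb;     /* offset of the first byte touched, relative to buf */
    ptrdiff_t true_extent; /* bytes actually touched by one element */
    ptrdiff_t extent;      /* stride between consecutive elements */
} layout_t;

/* Bytes a temporary buffer needs in order to hold count elements. */
static size_t span_bytes(const layout_t *t, size_t count)
{
    if (0 == count) {
        return 0;
    }
    return (size_t)(t->true_extent + (ptrdiff_t)(count - 1) * t->extent);
}

/* Base address to pass to datatype-driven code for scratch memory at raw:
   that code adds true_lb, which lands exactly on raw. Unsigned wraparound
   makes this correct for negative lower bounds too. */
static uintptr_t shifted_base(const void *raw, const layout_t *t)
{
    return (uintptr_t)raw - (uintptr_t)t->true_lb;
}
```

Only `raw` may be passed to `free()`; the shifted base is what the reduction operates on.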
Nathan Hjelm	3ddf3ccbf3	Merge pull request #1758 from hjelmn/ob1_fixes pml/ob1: bug fixes	2016-06-07 11:18:55 -06:00
Todd Kordenbrock	9671d6af47	Merge pull request #1689 from francois-wellenreiter/remove_trig_rdv_portals4 MTL portals4 : remove the triggered rendez-vous protocol	2016-06-06 21:55:01 -05:00
Nathan Hjelm	5d0b4679ea	pml/ob1: bug fixes This commit fixes two bugs in pml/ob1: - Do not call MCA_PML_OB1_PROGRESS_PENDING from mca_pml_ob1_send_request_start_copy as this may lead to a recursive call to mca_pml_ob1_send_request_process_pending. - In mca_pml_ob1_send_request_start_rdma return the rdma frag object if a btl fragment can not be allocated. This fixes a leak identified by @abouteiller and @bosilca. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-06 17:54:55 -06:00
Gilles Gouaillardet	c976559877	coll/basic: fix log basic bcast The log basic bcast was completely broken. The rank 0 gets the hibit set to -1, so it always returned an error.	2016-06-06 11:01:51 +09:00
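The "log basic" bcast and barrier fixes both hinge on how a binomial tree is derived from the highest set bit of a rank. The sketch below uses a hypothetical `hibit()` that returns -1 when no bit below `start` is set. For rank 0 that -1 is a legitimate answer (the root has no parent), so treating it as an error breaks the collective. Ranks are assumed to be already relative to the root, and `dim` is assumed to be ceil(log2(size)).

```c
/* Highest set bit of value strictly below bit position start, or -1 if none.
   Hypothetical helper modelled on the behavior described in the commit. */
static int hibit(int value, int start)
{
    for (int i = start - 1; i >= 0; --i) {
        if (value & (1 << i)) {
            return i;
        }
    }
    return -1;
}

/* Store the binomial-tree children of vrank (tree of size ranks rooted at 0)
   in children[] and return how many there are. */
static int binomial_children(int vrank, int size, int dim, int *children)
{
    int n = 0;
    int hb = hibit(vrank, dim);   /* -1 for the root: valid, not an error */
    for (int i = hb + 1; i < dim; ++i) {
        int peer = vrank | (1 << i);
        if (peer < size) {
            children[n++] = peer;
        }
    }
    return n;
}

/* Parent of vrank: clear its highest set bit; the root has no parent (-1). */
static int binomial_parent(int vrank, int dim)
{
    int hb = hibit(vrank, dim);
    return hb < 0 ? -1 : (vrank & ~(1 << hb));
}
```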
Gilles Gouaillardet	99fedcb7a3	fs/base: silence a memory leak in mca_fs_base_get_fstype() Fixes CID 1351211	2016-06-06 09:20:14 +09:00
George Bosilca	9376b0340b	Fix the basic barrier. The log basic barrier was completely broken. The rank 0 gets the hibit set to 0, so it always returned an error.	2016-06-03 23:46:25 -04:00
Edgar Gabriel	d6af5444a6	fix the get_byte_offset code	2016-06-03 11:36:53 -05:00
Nathan Hjelm	e968ddfe64	start bug fixes (#1729 ) * mpi/start: fix bugs in cm and ob1 start functions There were several problems with the implementation of start in Open MPI: - There are no checks whatsoever on the state of the request(s) provided to MPI_Start/MPI_Start_all. It is erroneous to provide an active request to either of these calls. Since we are already looping over the provided requests there is little overhead in verifying that the request can be started. - Both ob1 and cm were always throwing away the request on the initial call to start and start_all with a particular request. Subsequent calls would see that the request was pml_complete and reuse it. This introduced a leak as the initial request was never freed. Since the only pml request that can be mpi complete but not pml complete is a buffered send the code to reallocate the request has been moved. To detect that a request is indeed mpi complete but not pml complete isend_init in both cm and ob1 now marks the new request as pml complete. - If a new request was needed the callbacks on the original request were not copied over to the new request. This can cause osc/pt2pt to hang as the incoming message callback is never called. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * osc/pt2pt: add request for gc after starting a new request Starting a new receive may cause a recursive call into the pt2pt frag receive function. If this happens and the prior request is on the garbage collection list it could cause problems. This commit moves the gc insert until after the new request has been posted. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 20:22:40 -04:00
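A sketch of the state check described in the first bullet, using a hypothetical two-state request (`REQ_INACTIVE`/`REQ_ACTIVE`) and hypothetical error codes rather than Open MPI's request structures. Validating the whole array before activating anything is a design choice assumed here, so that an erroneous call leaves every request untouched.

```c
#include <stddef.h>

/* Hypothetical, simplified persistent-request model (not ompi_request_t). */
typedef enum { REQ_INACTIVE, REQ_ACTIVE } req_state_t;

typedef struct {
    int persistent;      /* nonzero for requests created by *_init calls */
    req_state_t state;
} req_t;

#define START_OK 0
#define START_ERR_REQUEST (-1)

/* Reject non-persistent or already active requests, then start them all. */
static int start_all(size_t count, req_t *reqs)
{
    for (size_t i = 0; i < count; ++i) {
        if (!reqs[i].persistent || REQ_INACTIVE != reqs[i].state) {
            return START_ERR_REQUEST;
        }
    }
    for (size_t i = 0; i < count; ++i) {
        reqs[i].state = REQ_ACTIVE;
    }
    return START_OK;
}
```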
Matias A Cabral	29ab28f4f6	Adding owner.txt file for PSM2 MTL.	2016-06-02 16:26:16 -07:00
George Bosilca	d577e12dd0	Fix comment.	2016-06-03 00:57:31 +09:00
George Bosilca	2e1b1d34c6	Safety first !	2016-06-02 11:52:43 +09:00
George Bosilca	50cec456fb	ompi_request_complete with signal Rewrite the ompi_request_complete function to take into account the with_signal argument. Change the comment to explain the expected behavior. Alter all the ompi_request_complete uses to make sure the status of the request is set before calling ompi_request_complete. bot:label:enhancement	2016-06-02 11:49:12 +09:00
George Bosilca	223d75595d	Give a boost to MPI_Barrier. Based on current implementation it is faster to use a blocking send than the non-blocking version. Switch the exchange function used in the barrier to use the blocking version combined with the non-blocking version of the receive.	2016-06-02 11:45:25 +09:00
Nathan Hjelm	086ffc1838	pml/ob1: fix race on pml completion of send requests The request code was setting the request as pml_complete before calling MCA_PML_OB1_SEND_REQUEST_MPI_COMPLETE. This was causing MCA_PML_OB1_SEND_REQUEST_RETURN to be called twice in some cases. The code now mirrors the recvreq code and only sets the request as pml complete if the request has not already been freed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 13:36:06 -06:00
Gilles Gouaillardet	5f565dfec3	configury: clean the flex generated .c files	2016-06-01 11:13:31 +09:00
Thananon Patinyasakdikul	60d0fbf683	Removal of ompi_request_lock from pml/ucx.	2016-05-26 12:36:58 -04:00
George Bosilca	90f294096e	Remove more references to the request mutex. Regarding BFO it should be mentioned that this component is currently unmaintained, and that despite my efforts I could not make it compile (it would not compile before this patch either).	2016-05-25 23:27:06 -04:00
Nathan Hjelm	9d439664f0	pml/yalla: update for request changes This commit brings the pml/yalla component up to date with the request rework changes. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 15:42:53 -06:00
Nathan Hjelm	8445c885ce	pml/cm: update for request changes This fixes a hang caused by the request refactor work. The cm pml was not updated and was hanging in most cases. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 15:35:32 -06:00
Valentin Petrov	5ff6372886	coll/hcoll: bugfix: initialize req_type field If left uninitialized then segfault is possible in MPI_Waitall in the case the field by chance equals OMPI_REQUEST_GEN.	2016-05-25 15:38:01 +03:00
bosilca	b90c83840f	Refactor the request completion (#1422 ) * Remodel the request. Added the wait sync primitive and integrate it into the PML and MTL infrastructure. The multi-threaded requests are now significantly less heavy and less noisy (only the threads associated with completed requests are signaled). * Fix the condition to release the request.	2016-05-24 18:20:51 -05:00
Jeff Squyres	e7d46b96a3	Merge pull request #1680 from yburette/topic/fix_provider_selection mtl/ofi: Change default provider selection behavior.	2016-05-23 15:06:02 -04:00
Francois WELLENREITER	b2b0fc63e2	MTL portals4 : remove the triggered rendez-vous protocol	2016-05-23 15:50:00 +02:00
Gilles Gouaillardet	bca44592af	Merge pull request #1643 from ggouaillardet/topic/romio_openbsd57 io/romio: fix filesystem type check on OpenBSD	2016-05-23 16:33:56 +09:00
Nathan Hjelm	31bfeede82	bml/r2: always add btl progress function This commit changes the behavior of bml/r2 from conditionally registering btl progress functions to always registering progress functions. Any progress function belonging to a btl that is not yet in use is registered as low-priority. As soon as a proc is added that will make use of the btl, it is re-registered normally. This works around an issue with some btls. In order to progress a first message from an unknown peer both ugni and openib need to have their progress functions called. If either btl is not in use after the first call to add_procs the callback was never happening. This commit ensures the btl progress function is called at some point but the number of progress callbacks is reduced from normal to ensure lower overhead when a btl is not used. The current ratio is 1 low priority progress callback for every 8 calls to opal_progress(). Fixes open-mpi/ompi#1676 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-21 15:54:04 -04:00
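The low-priority scheme can be pictured with a small hypothetical registry. Normal callbacks run on every call to `progress()`, low-priority ones only on every 8th call (the 1:8 ratio quoted above), and `promote_cb()` plays the role of re-registering a BTL's function once a proc uses it. None of these names are the real opal_progress API, and `example_btl_progress` is a stand-in callback.

```c
#include <stddef.h>

typedef int (*progress_fn_t)(void);

#define LP_RATIO 8   /* one low-priority pass per 8 progress calls */
#define MAX_CB 16

static progress_fn_t normal_cbs[MAX_CB], lp_cbs[MAX_CB];
static size_t n_normal, n_lp;
static unsigned long call_count;

static int register_cb(progress_fn_t fn, int low_priority)
{
    if (low_priority) {
        if (MAX_CB == n_lp) return -1;
        lp_cbs[n_lp++] = fn;
    } else {
        if (MAX_CB == n_normal) return -1;
        normal_cbs[n_normal++] = fn;
    }
    return 0;
}

/* Move fn from the low-priority list to the normal list. */
static int promote_cb(progress_fn_t fn)
{
    for (size_t i = 0; i < n_lp; ++i) {
        if (lp_cbs[i] == fn) {
            lp_cbs[i] = lp_cbs[--n_lp];
            return register_cb(fn, 0);
        }
    }
    return -1;
}

static int progress(void)
{
    int events = 0;
    for (size_t i = 0; i < n_normal; ++i) events += normal_cbs[i]();
    if (0 == (++call_count % LP_RATIO)) {
        for (size_t i = 0; i < n_lp; ++i) events += lp_cbs[i]();
    }
    return events;
}

/* Stand-in for an unused BTL's progress function; counts its invocations. */
static int example_calls;
static int example_btl_progress(void) { ++example_calls; return 0; }
```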
yohann	2f0cde791a	mtl/ofi: Change default provider selection behavior. As more providers get added to libfabric, the default exclude list would need to be updated. Instead, we choose to include only the providers known to work by default. New default: - include: psm,psm2,gni - exclude: none	2016-05-19 10:59:25 -07:00
Ralph Castain	a35bb8453a	Unlock the mutex prior to destructing it. Thanks to Nicolas Joly for the report	2016-05-19 10:36:58 -07:00
rhc54	8b534e9897	Merge pull request #1668 from rhc54/topic/slurm When direct launching applications, we must allow the MPI layer to pr…	2016-05-16 12:23:19 -07:00
Jeff Squyres	5275e5e2a1	bml_r2: use __func__ to identify function names There were some old/stale function names in some debugging/verbose opal_output calls. Use __func__ instead, so that they won't become stale in the future. Thanks to Durga Choudhury for pointing out the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-16 11:06:47 -04:00
Ralph Castain	01ba861f2a	When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provide non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread spin in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize cpu utilization. Update external as well Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro	2016-05-14 16:37:00 -07:00
Aurélien Bouteiller	7f65c2b18e	forgot to update copyright in commits `627a89b` `4899c89`	2016-05-13 11:34:59 -04:00
George Bosilca	37e03e3e5b	Don't update req_bytes_received if no bytes were received.	2016-05-12 23:39:32 -04:00
Matias A Cabral	528abff6ae	Merge remote-tracking branch 'upstream/master'	2016-05-10 15:42:08 -07:00
Matias A Cabral	d28ee62a96	Update in PSM and PSM2 MTLs to detect entries created by drivers for Intel TrueScale and Intel OmniPath, and detect a link in ACTIVE state. This fix addresses the scenario reported in the below OMPI users email, including what was formerly QLogic IB, now Intel TrueScale. Given the nature of the PSM/PSM2 mtls this fix applies to OmniPath: https://www.open-mpi.org/community/lists/users/2016/04/29018.php	2016-05-09 12:08:44 -07:00
Gilles Gouaillardet	0a19337371	coll/base: return MPI_ERR_UNSUPPORTED_OPERATION when coll_base_*_two_procs algo is used on a communicator that has no two tasks Thanks Dave Love for the report	2016-05-09 14:18:40 +09:00
Gilles Gouaillardet	b159587325	io/romio: fix filesystem type check on OpenBSD 5.7 check the existence of the f_type field in struct statfs Thanks Paul Hargrove for the report	2016-05-09 13:54:46 +09:00
Ralph Castain	6b24e2779b	Remove stale component - I'm not going to get to it	2016-05-07 04:13:34 -07:00
Edgar Gabriel	def1b95fd7	Merge pull request #1646 from edgargabriel/getview-preallocate-fixes io/ompio: file_getview and file_preallocate fixes	2016-05-06 11:46:00 -05:00
Edgar Gabriel	e65e189671	io/ompio: fix file size after file_preallocate Thanks to @dalcini for reporting Fixes open-mpi/ompi#1633	2016-05-06 08:20:59 -05:00
Edgar Gabriel	d358965134	io/ompio: fix envelope of datatype returned by getview Thanks to @dalcini for reporting Fixes open-mpi/ompi#1632	2016-05-06 08:19:48 -05:00
Edgar Gabriel	7c92acaa78	Merge pull request #1637 from edgargabriel/pr/netbsd-compilation-problems fs/lustre and fs/pvfs2: fix netbsd compilation problems	2016-05-06 08:05:36 -05:00
Gilles Gouaillardet	6c9d65c0ca	coll/libnbc: fix MPI_Ireduce_scatter_block for one task communicator Thanks Lisandro Dalcin for the report Fixes open-mpi/ompi#248	2016-05-06 09:43:29 +09:00
Ralph Castain	08022d7af1	Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required.	2016-05-05 15:28:13 -07:00
Jeff Squyres	f167be1c91	ompio: always return valid info from FILE_GET_INFO MPI-3.1 says that even if no info keys are set on the file, we need to return a new, empty info. Thanks to Lisandro Dalcin for identifying the issue. Fixes open-mpi/ompi#1630 Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-05 12:03:29 -07:00
Aurélien Bouteiller	4899c89731	Fix a race condition when multiple threads try to create a bml endpoint simultaneously.	2016-05-05 10:49:30 -04:00
Aurélien Bouteiller	627a89bf71	Fix a race condition when multiple threads do the "first send" to an endpoint simultaneously.	2016-05-05 09:04:10 -04:00
Joshua Ladd	4771c9ece6	Merge pull request #1617 from jladd-mlnx/topic/disable-hcoll-barrier-in-finalize-ompi-trunk HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla	2016-05-04 10:12:34 -04:00
Edgar Gabriel	78fa8bb2c4	remove some unused variables that can cause compilation problems on netbsd	2016-05-03 10:25:15 -05:00
Todd Kordenbrock	3498bed650	Merge pull request #1555 from shawone/check_reduce_ret coll-portals4: check return value from reduce kary tree functions	2016-05-03 10:17:23 -05:00
Jeff Squyres	33dd8ca81e	osc_rdma_peer: properly include ompi_config.h Thanks to Paul Hargrove for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-03 07:39:55 -07:00
Devendar Bureddy	cafd55f18c	HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla tear down HCOLL barrier may not complete if HCOLL progress is not called periodically, which is the case in HCOLL teardown progress in the finalize. (cherry picked from commit 793244d75dd94d1d5e0243bcccf6d04318750f3f)	2016-05-03 00:49:57 +03:00
Nathan Hjelm	d3d779f6d9	osc/rdma: clear all_sync object when obtaining a lock This commit fixes a bad synchronization detection bug that occurs when mixing MPI_Win_fence() and MPI_Win_lock(). If no communication has occurred in the fence epoch it is safe to just clear the all_sync object (it was set up by fence). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 15:28:47 -06:00
Jeff Squyres	265e5b9795	Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1 ompi/opal/orte/oshmem/test: max hostname length cleanup	2016-05-02 09:44:18 -04:00
Ralph Castain	6ac7929bd0	Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need. Cleanups per @jjhursey review	2016-05-01 11:30:25 -07:00
Nathan Hjelm	7bda3eb2dc	osc/rdma: fix global index array calculation This commit fixes a bug that occurs when ranks are either not mapped evenly or by something other than core. Fixes open-mpi/ompi#1599 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-28 19:11:11 -06:00
Nathan Hjelm	f0f3383006	Merge pull request #1590 from hjelmn/thread_multiple osc/pt2pt: do not drop/reacquire the ompi_request_lock	2016-04-26 16:48:37 -06:00
Nathan Hjelm	34ff6293bd	osc/pt2pt: do not drop/reacquire the ompi_request_lock This lock is now recursive so it is safe to call into the pml without dropping the lock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-26 14:19:38 -06:00
George Bosilca	bf190671e9	Make the request lock recursive. If during the request completion callback we post another request that completes right away (such as a small send or a match for an unexpected short message) we will try to complete the second request while holding the lock for the completion of the first. For performance reasons (mainly to avoid unlocking and locking the request mutex several times) we have made the request lock recursive.	2016-04-26 16:16:07 -04:00
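The re-entrancy problem can be reproduced with POSIX threads. A completion routine that re-enters the same lock would block forever on a default mutex, but succeeds on one created with `PTHREAD_MUTEX_RECURSIVE`. `complete_request()` is a hypothetical stand-in for the request-completion callback chain, not Open MPI code.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

static pthread_mutex_t req_lock;

/* Initialize m as a recursive mutex; returns 0 or a pthread error code. */
static int init_recursive_lock(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (0 != rc) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (0 == rc) {
        rc = pthread_mutex_init(m, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}

static int completions;

/* Completing one request posts another that completes immediately,
   re-entering req_lock; with a non-recursive mutex this would self-deadlock. */
static void complete_request(int chained)
{
    pthread_mutex_lock(&req_lock);
    ++completions;
    if (chained > 0) {
        complete_request(chained - 1);
    }
    pthread_mutex_unlock(&req_lock);
}
```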
Nathan Hjelm	c16e639b2f	Merge pull request #1563 from hjelmn/ompi_coverity ompi coverity fixes	2016-04-26 09:17:48 -06:00
Karol Mroz	3322347da9	ompi: fixup hostname max length usage Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-25 07:08:23 +02:00
Nathan Hjelm	ae0ffbb67f	Merge pull request #1397 from hjelmn/enable_thread_multiple ompi: always enable MPI_THREAD_MULTIPLE support	2016-04-23 08:40:22 -06:00
Nathan Hjelm	1ff3d3b16b	pml/ob1: fix coverity issue Fix CID 1357978 (1 of 1): Logically dead code (DEADCODE): Remove duplicate check for NULL == endpoint. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:48:13 -06:00
Nathan Hjelm	70533e6d50	fcoll/static: fix coverity issues Fix CID 72362: Explicit null dereferenced (FORWARD_NULL) From what I can tell the code @ fcoll_static_file_read_all.c:649 should be setting bytes_per_process[i] to 0 not bytes_per_process. Fix CID 72361: Explicit null dereferenced (FORWARD_NULL) Modified check to check for blocklen_per_process non-NULL before trying to free blocklen_per_process[l]. This is sufficient because free (NULL) is safe. Also cleaned up the initialization of this and a couple of other arrays. They were allocated with malloc() then initialized to 0. Changed to use calloc(). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:48:13 -06:00
Nathan Hjelm	8871bdb2f8	fcoll/two_phase: fix coverity issues Fix CID 72296: Resource leak (RESOURCE_LEAK): Changed code to goto exit instead of returning to ensure memory is freed. Fix CID 712589: Out-of-bounds read (OVERRUN): In this loop i and j are identical and always less than iov_count. The CID was triggered because i was incremented if i was < iov_count. This meant that if the loop did go on, the next iteration would access an invalid index. Fix CID 741363: Uninitialized scalar variable (UNINIT): Allocate tmp_len with calloc to ensure every index is initialized. Fix CID 741364: Uninitialized pointer read (UNINIT): Allocate recv_types with calloc to ensure all indices are always initialized. Also added a check to not loop and destroy if recv_types is NULL. Also added a NULL check on the allocation of decoded iov. This is not the cause of CID 126784 but should be fixed. Fix CID 712588: Out-of-bounds read (OVERRUN): Similar to CID 712589. Should silence the issue. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:47:41 -06:00
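The recurring fixes in these Coverity commits (allocate with `calloc()`, jump to a single exit label instead of returning early, rely on `free(NULL)` being a no-op) combine into the pattern sketched below. `make_pair()` is a hypothetical routine used only to illustrate it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate two zero-initialized arrays of n entries; on any failure free
   whatever was already allocated and leave the outputs untouched. */
static int make_pair(size_t n, int **ranks_out, double **weights_out)
{
    int rc = -1;
    int *ranks = NULL;
    double *weights = NULL;

    if (0 == n) goto exit;                  /* invalid input: nothing allocated yet */
    ranks = calloc(n, sizeof(*ranks));      /* calloc: every index starts at 0 */
    if (NULL == ranks) goto exit;
    weights = calloc(n, sizeof(*weights));
    if (NULL == weights) goto exit;         /* a bare return here would leak ranks */

    for (size_t i = 0; i < n; ++i) {
        ranks[i] = (int)i;
    }
    *ranks_out = ranks;
    *weights_out = weights;
    ranks = NULL;                           /* ownership moved to the caller */
    weights = NULL;
    rc = 0;

exit:
    free(ranks);                            /* free(NULL) is a no-op */
    free(weights);
    return rc;
}
```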
Valentin Petrov	21f1c572c0	Adds mapping to hcoll complex dte	2016-04-19 14:14:28 +03:00
Nicolas Chevalier	c86d4035d2	coll-portals4: check return value from reduce kary tree functions	2016-04-18 12:02:30 +00:00
Nathan Hjelm	3245428e82	Merge pull request #1535 from kawashima-fj/pr/osc-pt2pt-header-fix osc/pt2pt: Fix a struct name typo	2016-04-14 15:55:25 -06:00
Nathan Hjelm	330302c4b4	Merge pull request #1534 from kawashima-fj/pr/parallel-rma-fix osc/pt2pt: Fix tag conflicts on parallel RMA communications	2016-04-14 15:13:32 -06:00
Jeff Squyres	fdf33674b3	Merge pull request #1532 from kmroz/wip-hindexed-cleanup-1 romio,java: cleanup deprecated hindexed call	2016-04-14 17:07:31 -04:00
KAWASHIMA Takahiro	35ea9e5c3c	Add FUJITSU copyright	2016-04-12 13:47:53 +09:00
KAWASHIMA Takahiro	39bcbe439a	osc/pt2pt: Fix a struct name typo Fortunately the sizes of `ompi_osc_pt2pt_header_put_t` and `ompi_osc_pt2pt_header_get_t` are same. So this doesn't affect the behavior.	2016-04-11 20:55:22 +09:00
KAWASHIMA Takahiro	28a0577364	osc/pt2pt: Insert breaks in long lines	2016-04-11 19:06:01 +09:00
KAWASHIMA Takahiro	5ac95df9dc	osc/pt2pt: use two distinct "namespaces" for tags - revised Before this commit, the same PML tag may be used for distinct communications for long messages. For example, consider a condition where rank A calls ```MPI_PUT``` targeting rank B and rank B calls ```MPI_GET``` targeting rank A simultaneously. A PML tag for the ```MPI_PUT``` is acquired on rank A and is used for the long-message communication from rank A to rank B. A PML tag for the ```MPI_GET``` is acquired on rank B and is used for the long-message communication from rank A to rank B. These two tags may have the same value because they are managed independently on each rank. This will cause data corruption. This commit splits the tag used in a single RMA communication call into two: one for communication from an origin to a target, and one for communication from a target to an origin. A "base" tag is acquired using the ```get_tag``` function and the PML tags are calculated from the base tag by the ```tag_to_target``` and ```tag_to_origin``` functions.	2016-04-11 19:05:20 +09:00
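A sketch of the two-namespace idea, under the assumption (not taken from the Open MPI sources) that base tags are even and each RMA call owns the pair {base, base + 1}. Even if two ranks' independent counters hand out the same base, origin-to-target traffic always uses the even tag and target-to-origin traffic the odd one, so the PUT and GET in the example above can no longer collide. `module_t`, `TAG_MASK` and the bodies of `get_tag`, `tag_to_target` and `tag_to_origin` are illustrative only.

```c
#include <stdint.h>

/* Keep tags non-negative and even (the limit is an assumption). */
#define TAG_MASK 0x7ffe

typedef struct { uint32_t tag_counter; } module_t;   /* hypothetical per-window state */

/* Hand out a fresh even base tag for one RMA call. */
static int get_tag(module_t *m)
{
    m->tag_counter += 2;
    return (int)(m->tag_counter & TAG_MASK);
}

static int tag_to_target(int base) { return base; }      /* origin -> target traffic */
static int tag_to_origin(int base) { return base + 1; }  /* target -> origin traffic */
```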
KAWASHIMA Takahiro	3576ecafa7	Revert "osc/pt2pt: use two distinct "namespaces" for tags" This reverts commit `06ecdb6aa7` to reimplement the fix completely.	2016-04-11 19:04:11 +09:00
Karol Mroz	5c54184986	romio: replace deprecated hindexed call Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-10 19:56:22 +02:00
Nathan Hjelm	c6b19818be	bml: always enable the bml This commit ensures the bml is always enabled whether or not it will be used. This ensures that any available btls communicate their modex so that they can be used for one-sided communication. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-08 21:14:17 -06:00
George Bosilca	896f857fc4	Thanks @hjelmn for catching the typo.	2016-04-07 13:56:26 -04:00
Thananon Patinyasakdikul	92290b94e0	Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN)	2016-04-07 12:52:17 -04:00
Ryan Grant	7cdf50533c	Merge pull request #1314 from francois-wellenreiter/osc_disable_portals4_evt_send OSC portals4 : do not generate an EVENT_SEND to avoid to filter it	2016-04-07 10:04:27 -06:00
George Bosilca	004c0cc05b	Fix issues identified by @derbeyn.	2016-03-29 15:50:32 -04:00
Jeff Squyres	91c54d7a07	Merge pull request #1491 from ICLDisco/progress_thread BTL TCP async progress	2016-03-29 06:26:10 -04:00
George Bosilca	f69eba1bc4	Update the copyright and cleanup the code. Per @jsquyres suggestion remove all trailing spaces. Credit to `sed -i.bak 's/ $//' /[ch]`.	2016-03-28 14:41:01 -04:00
Thananon Patinyasakdikul	92062492b9	Enable Threading in the BTL TCP Added mca parameter to turn progress thread on/off Add a flag to check if we have btl progress thread. Added macro for ob1 matching lock. Update the AUTHORS file.	2016-03-28 14:41:01 -04:00
Nathan Hjelm	9d5eeecb8a	pml/ob1: detect unreachable errors This commit adds code to detect when procs are unreachable when using the dynamic add_procs functionality. Fixes #1501 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-28 10:52:40 -06:00
George Bosilca	57eadb0dd6	Fix for Coverity CID 1357152. Or at least that was the origin of the issue. It turns out we were freeing the wrong buffer (but as it only happens in the case of an error we never noticed).	2016-03-24 00:53:30 -04:00
George Bosilca	4b38b6bd0c	Fix multiple issues with the collective requests. This patch addresses most (if not all) of @derbeyn's concerns expressed on #1015. I added checks for the requests allocation in all functions, ompi_coll_base_free_reqs is called with the right number of requests, I removed the unnecessary basic_module_comm_t and use the base_module_comm_t instead, I removed all uses of the COLL_BASE_BCAST_USE_BLOCKING define, and other minor fixes.	2016-03-23 18:35:41 -04:00
Todd Kordenbrock	2122a15217	Merge pull request #1443 from francois-wellenreiter/fix_trig_rndv MTL portals4 : fix around triggered rndv operations	2016-03-21 08:16:33 -05:00
Ralph Castain	c146c4969b	Revert part of open-mpi/ompi@c1bbbb5e2f to restore the usock component, thus fixing show_help aggregation. Fixes #1467 Restore debugger attach operations Fixes #1225	2016-03-18 21:49:04 -07:00
Nathan Hjelm	075dfa4121	topo/treematch: fix component coverity issues Fix CID 1315298: Resource leak (RESOURCE_LEAK) : Fix CID 1315300: Resource leak (RESOURCE_LEAK): Fix CID 1315299: Resource leak (RESOURCE_LEAK): Fix CID 1315297 (#1 of 1): Resource leak (RESOURCE_LEAK): Confirmed leaks in error paths. Added the leaked arrays to the ERR_EXIT macro to ensure they are freed. Fix CID 1315296 (#1 of 1): Resource leak (RESOURCE_LEAK): Confirmed leak in error paths. Both the oversub and reqs arrays are leaked. Free these arrays on error. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-18 11:31:11 -06:00
Nathan Hjelm	3540b65f7d	bcol: fix coverity issues Fix CID 1269976 (#1 of 1): Unused value (UNUSED_VALUE): Fix CID 1269979 (#1 of 1): Unused value (UNUSED_VALUE): Removed unused variables k_temp1 and k_temp2. Fix CID 1269981 (#1 of 1): Unused value (UNUSED_VALUE): Fix CID 1269974 (#1 of 1): Unused value (UNUSED_VALUE): Removed gotos and use the matched flags to decide whether to return. Fix CID 715755 (#1 of 1): Dereference null return value (NULL_RETURNS): This was also a leak. The items on cs->ctl_structures are allocated using OBJ_NEW so they must be released using OBJ_RELEASE not OBJ_DESTRUCT. Replaced the loop with OPAL_LIST_DESTRUCT(). Fix CID 715776 (#1 of 1): Dereference before null check (REVERSE_INULL): Rework error path to remove REVERSE_INULL. Also added a free to an error path where it was missing. Fix CID 1196603 (#1 of 1): Bad bit shift operation (BAD_SHIFT): Fix CID 1196601 (#1 of 1): Bad bit shift operation (BAD_SHIFT): Both of these are false positives but it is still worthwhile to fix so they no longer appear. The loop conditional has been updated to use radix_mask_pow instead of radix_mask to quiet these issues. Fix CID 1269804 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS): In general close (-1) is safe but coverity doesn’t like it. Reworked the error path for open to not try to close (-1). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-18 10:59:46 -06:00
Nathan Hjelm	c8b077f232	coll/ml: fix coverity issues Fix CID 715744 (#1 of 1): Logically dead code (DEADCODE): Fix CID 715745 (#1 of 1): Logically dead code (DEADCODE): The free of scratch_num in either place is defensive programming. Instead of removing the free the conditional around the free has been removed to quiet the warning. Fix CID 715753 (#1 of 1): Dereference after null check (FORWARD_NULL): Fix CID 715778 (#1 of 1): Dereference before null check (REVERSE_INULL): Fixed the conditional to check for collective_alg != NULL instead of collective_alg->functions != NULL. Fix CID 715749 (#1 of 4): Explicit null dereferenced (FORWARD_NULL): Updated code to ensure that none of the parse functions are reached with a non-NULL value. Fix CID 715746 (#1 of 1): Logically dead code (DEADCODE): Removed dead code. Fix CID 715768 (#1 of 1): Resource leak (RESOURCE_LEAK): Fix CID 715769 (#2 of 2): Resource leak (RESOURCE_LEAK): Fix CID 715772 (#1 of 1): Resource leak (RESOURCE_LEAK): Move free calls to before error checks to cleanup leak in error paths. Fix CID 741334 (#1 of 1): Explicit null dereferenced (FORWARD_NULL): Added a check to ensure temp is not dereferenced if it is NULL. Fix CID 1196605 (#1 of 1): Bad bit shift operation (BAD_SHIFT): Fixed overflow in calculation by replacing int mask with 1ul. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-18 10:11:16 -06:00
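The last fix in the commit above (CID 1196605, BAD_SHIFT) is the classic shift-width problem: a left shift that overflows a signed `int` is undefined behavior, while shifting `1ul` stays defined for any count below the width of `unsigned long`. A hypothetical helper that rounds up to a power of two shows the safe form.

```c
#include <limits.h>

/* Smallest power of two >= n, saturating at the highest representable
   power of two. The mask is built from 1ul; the same loop on an int mask
   would overflow (undefined behavior) once n exceeds INT_MAX / 2 + 1. */
static unsigned long next_pow2(unsigned long n)
{
    unsigned long mask = 1ul;
    while (mask < n && mask <= ULONG_MAX / 2) {
        mask <<= 1;
    }
    return mask;
}
```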
Nathan Hjelm	2f4e5325aa	coll/base: fix coverity issues Fix CID 1325868 (#1 of 1): Dereference after null check (FORWARD_NULL): Fix CID 1325869 (#1-2 of 2): Dereference after null check (FORWARD_NULL): Here reqs can indeed be NULL. Added a check to ompi_coll_base_free_reqs to prevent dereferencing NULL pointer. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-18 09:31:43 -06:00
Nathan Hjelm	2ed4501490	osc: fix coverity issues Fix CID 1324726 (#1 of 1): Free of address-of expression (BAD_FREE): Indeed, if a lock conflicts with the lock_all we will end up trying to free an invalid pointer. Fix CID 1328826 (#1 of 1): Dereference after null check (FORWARD_NULL): This was intentional but it would be a good idea to check for module->comm being non_NULL to be safe. Also cleaned out some checks for NULL before free(). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-18 09:11:48 -06:00
Nathan Hjelm	ec9712050b	Merge pull request #1118 from hjelmn/mpool_rewrite mpool/rcache rewrite	2016-03-15 10:46:24 -06:00
Nathan Hjelm	deae9e52bf	Merge pull request #1259 from kawashima-fj/pr/osc-sm-align osc/sm: Fix a bus error on MPI_WIN_{POST,START}.	2016-03-15 09:13:38 -06:00
Francois WELLENREITER	2bc432d95f	MTL portals4 : fix around triggered rndv operations	2016-03-15 15:31:04 +01:00
Nathan Hjelm	d4afb16f5a	opal: rework mpool and rcache frameworks This commit rewrites both the mpool and rcache frameworks. Summary of changes: - Before this change a significant portion of the rcache functionality lived in mpool components. This meant that it was impossible to add a new memory pool to use with rdma networks (ugni, openib, etc) without duplicating the functionality of an existing mpool component. All the registration functionality has been removed from the mpool and placed in the rcache framework. - All registration cache mpool components (udreg, grdma, gpusm, rgpusm) have been changed to rcache components. rcaches are allocated and released in the same way mpool components were. - It is now valid to pass NULL as the resources argument when creating an rcache. At this time the gpusm and rgpusm components support this. All other rcache components require non-NULL resources. - A new mpool component has been added: hugepage. This component supports huge page allocations on linux. - Memory pools are now allocated using "hints". Each mpool component is queried with the hints and returns a priority. The current hints supported are NULL (uses posix_memalign/malloc), page_size=x (huge page mpool), and mpool=x. - The sm mpool has been moved to common/sm. This reflects that the sm mpool is specialized and not meant for any general allocations. This mpool may be moved back into the mpool framework if there is any objection. - The opal_free_list_init arguments have been updated. The unused0 argument is now used to pass in the registration cache module. The mpool registration flags are now rcache registration flags. - All components have been updated to make use of the new framework interfaces. As this commit makes significant changes to both the mpool and rcache frameworks both versions have been bumped to 3.0.0. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-14 10:50:41 -06:00
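A sketch of hint-based selection as described in the "hints" bullet, assuming a comma-separated `key=value` hint string. The parsing, the priority values (100/50/0) and `hugepage_query()` itself are hypothetical and are not the mpool framework's interface; the point is only that each component inspects the hints and answers with a priority.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer to the value of key in a comma-separated key=value list,
   or NULL if the key is absent. */
static const char *hint_value(const char *hints, const char *key)
{
    size_t klen = strlen(key);
    for (const char *p = hints; p && *p; ) {
        if (0 == strncmp(p, key, klen) && '=' == p[klen]) {
            return p + klen + 1;
        }
        p = strchr(p, ',');
        if (p) ++p;
    }
    return NULL;
}

/* Hypothetical priority query for a huge-page pool: high when requested by
   name, medium when the requested page size exceeds the base page size. */
static int hugepage_query(const char *hints, unsigned long base_page)
{
    const char *v;
    if (NULL == hints) {
        return 0;   /* NULL hints: leave it to the posix_memalign/malloc pool */
    }
    if (NULL != (v = hint_value(hints, "mpool"))) {
        return (0 == strncmp(v, "hugepage", 8) && ('\0' == v[8] || ',' == v[8])) ? 100 : 0;
    }
    if (NULL != (v = hint_value(hints, "page_size"))) {
        return strtoul(v, NULL, 0) > base_page ? 50 : 0;
    }
    return 0;
}
```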
Gilles Gouaillardet	fbed6df4a3	coll/base: fix a typo typo was introduced in open-mpi/ompi@c98e97a46e	2016-03-11 14:18:03 +09:00
Aurélien Bouteiller	c98e97a46e	Do not return MPI_ERR_PENDING from collectives.	2016-03-09 16:13:34 -05:00
Joshua Ladd	4dffae2f88	Fixing MXM Yalla and MTL add procs behavior. MXM cannot support dynamic add procs, so propagate this info to the MTL and PML layers.	2016-03-08 01:46:24 +02:00
Aurélien Bouteiller	892e1ed57e	Fix a potential race condition in which a progress matching thread could match a request while we are cancelling it.	2016-03-01 16:43:45 -05:00
Gilles Gouaillardet	8aff67c399	topo/base: correctly support MPI_UNWEIGHTED in mca_topo_base_dist_graph_neighbors() Thanks Jun Kudo for the bug report.	2016-03-01 10:28:28 +09:00
George Bosilca	dbe93b0b19	Use mca_bml_base_get_endpoint Correctly use mca_bml_base_get_endpoint instead of accessing the endpoint directly.	2016-02-25 11:00:30 -06:00
Sylvain Jeaugey	5f32f49eb8	pml/ob1: Fix segmentation fault on CUDA path. Fix segfault due to mca_pml_ob1_cuda_need_buffers not handling the case of the endpoint not being there. Calling mca_bml_get_endpoint() seems to fix the problem. Fixes open-mpi/ompi#1402	2016-02-24 21:32:25 -08:00
Nathan Hjelm	230d04327e	ompi: always enable MPI_THREAD_MULTIPLE support This commit removes the --with-mpi-thread-multiple option and forces MPI_THREAD_MULTIPLE support. This cleans up an abstraction violation in opal where OMPI_ENABLE_THREAD_MULTIPLE determines whether the opal_using_threads is meaningful. To reduce the performance hit on MPI_THREAD_SINGLE programs an OPAL_UNLIKELY is used for the check on opal_using_threads in OPAL_THREAD_* macros. This commit does not clean up the arguments to the various functions that take a flag indicating whether multi-threading support is enabled. That should be done at a later time. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-23 10:02:14 -07:00
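The `OPAL_UNLIKELY` approach amounts to a branch-prediction hint around the "are threads in use?" test, so single-threaded runs pay only a well-predicted branch instead of a lock. The `MY_*` macros and the `using_threads` flag below are hypothetical stand-ins for `OPAL_UNLIKELY`, `OPAL_THREAD_LOCK` and `opal_using_threads`; `__builtin_expect` is the GCC/Clang builtin, with a plain fallback for other compilers.

```c
#include <stdbool.h>
#include <pthread.h>

#if defined(__GNUC__)
#define MY_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define MY_UNLIKELY(x) (x)
#endif

/* Set to true only when the application runs with MPI_THREAD_MULTIPLE. */
static bool using_threads = false;
static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lock only when threads are in use; the hint marks that path as rare. */
#define MY_THREAD_LOCK(m) \
    do { if (MY_UNLIKELY(using_threads)) pthread_mutex_lock(m); } while (0)
#define MY_THREAD_UNLOCK(m) \
    do { if (MY_UNLIKELY(using_threads)) pthread_mutex_unlock(m); } while (0)

static int counter;

static void increment(void)
{
    MY_THREAD_LOCK(&obj_lock);
    ++counter;
    MY_THREAD_UNLOCK(&obj_lock);
}
```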
Edgar Gabriel	45003ef78d	fix the data size counter for large ops for the static fcoll component	2016-02-23 08:33:50 -06:00
yohann	59b6d041f8	mtl/ofi: Check allocated pointer.	2016-02-19 16:59:47 -08:00
yohann	bd47062764	mtl/ofi: Fix error handling.	2016-02-19 16:58:41 -08:00
yohann	404987e9b3	mtl/ofi: Fix mismatching types.	2016-02-19 16:57:26 -08:00
yohann	3ad59435ce	mtl/ofi: Prevent possible memory leak.	2016-02-19 16:57:02 -08:00
Edgar Gabriel	92d1b99468	optimize the shuffle step: 1. use communicator collectives if possible, for performance reasons; 2. combine multiple allgathers into a single one	2016-02-19 11:04:04 -06:00
Edgar Gabriel	e63836c653	clean up the mca parameter handling of the component. Add new parameters for number of sub groups and write chunk size. This will allow to perform a systematic parameter study.	2016-02-19 10:15:28 -06:00
Edgar Gabriel	4f400314e0	add the dynamic_gen2 component into the fcoll selection table.	2016-02-19 09:32:54 -06:00
Edgar Gabriel	268d525053	change the tag to be a positive value. handle 0-byte situations correctly.	2016-02-19 08:28:50 -06:00
Edgar Gabriel	ad79012059	first cut on the version which overlaps the communication/computation of 2 iterations.	2016-02-19 08:28:50 -06:00
Ralph Castain	60a7bc2e50	Enable the PMIx notification callback system. This currently is only supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion. Fixes #1225	2016-02-18 09:29:12 -08:00
yohann	7fe395c82a	mtl/ofi: cleanup	2016-02-16 09:57:57 -08:00
yohann	22eddfee10	mtl/ofi: update copyright dates.	2016-02-16 09:56:09 -08:00
George Bosilca	68c36ea9dc	Fix two annoying warnings in our UCX support.	2016-02-14 00:02:16 -05:00
yohann	67ce4a080a	mtl/ofi: FI_AV_MAP support only.	2016-02-12 10:06:52 -08:00
yohann	b3d8ead76e	mtl/ofi: Fix dynamic add_procs.	2016-02-12 10:05:52 -08:00
Gilles Gouaillardet	b55b9e6aee	sentinel: fix sentinel to proc_name conversion. Converting an opal_process_name_t means losing one bit, so it was decided to restrict the local job id to 15 bits; the useful information of an opal_process_name_t then fits in 63 bits.	2016-02-10 15:44:07 +09:00
Gilles Gouaillardet	030a5f2054	sentinel: use type uintptr_t for sentinel. The MSB is now automatically cleared when right shifting. Thanks George for pointing this out.	2016-02-10 11:28:56 +09:00
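The two sentinel commits above describe squeezing a process name into a pointer-sized sentinel by dropping one bit. A minimal sketch of that idea follows, assuming a 64-bit word, bit 0 as the sentinel tag, and a 16-bit job family in the upper half of the job id; the layout and all `ex_*` names are hypothetical and are not Open MPI's actual encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical process name: a 32-bit job id (16-bit job family in the upper
 * half, local job id in the lower half) and a 32-bit rank (vpid). */
typedef struct { uint32_t jobid; uint32_t vpid; } ex_name_t;

#define EX_LOCAL_JOBID_BITS 15
#define EX_LOCAL_JOBID_MASK ((UINT32_C(1) << EX_LOCAL_JOBID_BITS) - 1)

/* 16 + 15 + 32 = 63 payload bits; shift left once and set bit 0 so the value
 * can never be mistaken for an (aligned) pointer. */
static uint64_t ex_name_to_sentinel(ex_name_t name) {
    uint64_t family = name.jobid >> 16;
    uint64_t local  = name.jobid & 0xffffu;
    assert(local <= EX_LOCAL_JOBID_MASK);  /* local job id must fit in 15 bits */
    uint64_t packed = (family << 47) | (local << 32) | name.vpid;
    return (packed << 1) | 1u;
}

static int ex_is_sentinel(uint64_t value) { return (int)(value & 1u); }

/* The value is unsigned, so >> shifts in a zero and the MSB is cleared
 * automatically; a signed right shift could sign-extend instead. */
static ex_name_t ex_sentinel_to_name(uint64_t sentinel) {
    uint64_t packed = sentinel >> 1;
    ex_name_t name;
    name.vpid  = (uint32_t)(packed & 0xffffffffu);
    name.jobid = (uint32_t)((((packed >> 47) & 0xffffu) << 16) |
                            ((packed >> 32) & EX_LOCAL_JOBID_MASK));
    return name;
}
```

Restricting the local job id to 15 bits is what makes the round trip lossless: the largest representable name encodes to an all-ones word and decodes back unchanged.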
George Bosilca	7c574a3530	Typo.	2016-02-07 07:22:22 +02:00
Nathan Hjelm	5b9c82a964	osc/pt2pt: bug fixes. This commit fixes several bugs identified by @ggouaillardet and MTT: - Fix SEGV in long send completion caused by a missing update to the request callback data. - Add an MPI_Barrier to the fence short-cut. This fixes potential semantic issues where messages may be received before fence is reached. - Ensure fragments are flushed when using request-based RMA. This allows MPI_Test/MPI_Wait/etc to work as expected. - Restore the tag space to 16 bits. It was intended that the space be expanded to 32 bits but the required change to the fragment headers was not committed. The tag space may be expanded in a later commit. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-04 16:59:39 -07:00
Gilles Gouaillardet	6eac6a8b00	osc/sm: create datafile in the per-proc directory in order to make it unique per communicator. Thanks Peter Wind for the report	2016-02-03 10:12:37 +09:00
Nathan Hjelm	519fffb65e	osc/pt2pt: eager sends are always active if MPI_MODE_NOCHECK is used. This commit fixes open-mpi/ompi#1299. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:44:17 -07:00
Nathan Hjelm	d7264aa613	osc/pt2pt: various threading fixes. This commit fixes several bugs identified by a new multi-threaded RMA benchmarking suite. The following bugs have been identified and fixed: - The code that signaled the actual start of an access epoch changed the eager_send_active flag on a synchronization object without holding the object's lock. This could cause another thread waiting on eager sends to block indefinitely because the entirety of ompi_osc_pt2pt_sync_expected could execute between the check of eager_send_active and the condition wait of ompi_osc_pt2pt_sync_wait. - The bookkeeping of fragments could become inconsistent when performing long put/accumulate operations from different threads. This was caused by the fragment flush code at the end of both put and accumulate. This code was put in place to avoid sending a large number of unexpected messages to a peer. To fix the bookkeeping issue we now 1) wait for eager sends to be active before starting any large isends, and 2) keep track of the number of large isends associated with a fragment. If the number of large isends reaches 32, the active fragment is flushed. - Use atomics to update the large receive/send tag counters. This prevents duplicate tags from being used. The tag space has also been updated to use the entire 16 bits of the tag space. These changes should also fix open-mpi/ompi#1299. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:33:33 -07:00
Edgar Gabriel	3f7fff5780	Merge pull request #1331 from edgargabriel/solaris-statfs-fix Solaris statfs fix	2016-01-28 20:16:33 -06:00
Nathan Hjelm	a19c265ab5	osc/rdma: fix typo in ompi_osc_rdma_complete_atomic. The typo caused SEGVs on systems with only fetching atomic support. Fixes open-mpi/ompi#1329. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-01-26 15:44:07 -07:00
Edgar Gabriel	b4a725c26a	need to check for the parent dir as well, since the file might not exist yet.	2016-01-26 13:49:21 -06:00
Edgar Gabriel	722aab92e6	- extend opal_path_nfs to retrieve the file system type - use opal_path_nfs in the fs_base function to avoid code duplication.	2016-01-26 13:36:21 -06:00
Joshua Ladd	69e3c6f289	Merge pull request #1321 from jladd-mlnx/topic/add-allgatherv-reduce Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce.	2016-01-25 20:46:52 -05:00
Nathan Hjelm	500e90422d	Merge pull request #1320 from hjelmn/osc_rdma_fix osc/rdma: fix hang when performing large unaligned gets	2016-01-25 09:36:13 -07:00
Nathan Hjelm	45da311473	osc/rdma: fix hang when performing large unaligned gets. This commit adds code to handle large unaligned gets. There are two possible code paths for these transactions: 1) The remote region and local region have the same alignment. In this case the get will be broken down into at most three get transactions: one transaction to get the unaligned start of the region (buffered), one transaction to get the aligned portion of the region, and one transaction to get the end of the region. 2) The remote and local regions do not have the same alignment. This should be an uncommon case and is not optimized. In this case a buffer is allocated and registered locally to hold the aligned data from the remote region. There may be cases where this fails (low memory, can't register memory). Those conditions are unlikely and will be handled later. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-01-22 21:06:46 -07:00
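Case 1 of the commit above (matching alignment) amounts to splitting the byte range at alignment boundaries into an unaligned head, an aligned body and a tail. A minimal sketch, assuming a power-of-two alignment requirement; `ex_span_t` and `ex_split_get` are hypothetical names, not osc/rdma functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t addr; size_t len; } ex_span_t;

/* Split [addr, addr + len) into at most three spans: an unaligned head, an
 * aligned body and an unaligned tail. `align` must be a power of two.
 * Returns the number of non-empty spans written to out[]. */
static int ex_split_get(uint64_t addr, size_t len, uint64_t align, ex_span_t out[3]) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uint64_t end = addr + len;
    uint64_t body_start = (addr + align - 1) & ~(align - 1);  /* round up */
    uint64_t body_end = end & ~(align - 1);                    /* round down */
    int n = 0;

    if (body_start >= body_end) {
        /* No aligned interior: treat the whole range as one buffered get. */
        if (len) out[n++] = (ex_span_t){ addr, len };
        return n;
    }
    if (body_start > addr) out[n++] = (ex_span_t){ addr, (size_t)(body_start - addr) };
    out[n++] = (ex_span_t){ body_start, (size_t)(body_end - body_start) };
    if (end > body_end) out[n++] = (ex_span_t){ body_end, (size_t)(end - body_end) };
    return n;
}
```

For example, a 20-byte get starting at address 3 with 8-byte alignment becomes a 5-byte head at 3, an 8-byte body at 8 and a 7-byte tail at 16.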
Valentin Petrov	5e2a2c0755	BugFix for coll/hcoll: coll_request must be set to ACTIVE when allocated. If the state of the request is not set to OMPI_REQUEST_ACTIVE, then MPI_Test would immediately report such a request as completed while hcoll may still be working on it. Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>	2016-01-23 03:23:59 +02:00
Joshua Ladd	e398bf6f3a	Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce. Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>	2016-01-23 03:09:29 +02:00
Nathan Hjelm	49d2f44b97	osc/rdma: use correct endpoint for local state. If atomics are not globally visible (CPU and NIC atomics do not mix), then a btl endpoint must be used to access local ranks. To avoid issues caused by having the same region registered with multiple handles, osc/rdma was updated to always use the handle for rank 0. There was a bug in the update that caused osc/rdma to continue using the local endpoint for accessing the state even though the pointer/handle are not valid for that endpoint. This commit fixes the bug. Fixes open-mpi/ompi#1241. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-01-22 10:41:27 -07:00
Nathan Hjelm	6180386bea	osc/rdma: disable put aggregation when using threads. Optimizing put aggregation in the presence of threads will require a redesign of the code. For now, just ensure that put aggregation is turned off when MPI_THREAD_MULTIPLE is enabled. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-01-21 15:50:35 -07:00
Edgar Gabriel	b253d4e887	fix CID 1349739, CID 1349738, CID 1349736 and (probably) CID 1349740 (not entirely sure about the last one, since I don't understand why block[i] is a problem but max_len[i], allocated and treated exactly the same way one line later, is not).	2016-01-21 08:32:23 -06:00
Edgar Gabriel	9b8d769e41	will revisit the addproc component later in the spring; right now it is constantly getting in the way of my tests.	2016-01-20 15:05:51 -06:00
Francois WELLENREITER	411b7301c3	OSC portals4: do not generate an EVENT_SEND, to avoid having to filter it	2016-01-20 11:47:46 +01:00
Edgar Gabriel	a9ca37059a	improve the communication abstraction. This commit also allows all aggregators to work simultaneously, instead of in the slightly staggered way of the previous version.	2016-01-17 09:48:49 -06:00
Edgar Gabriel	56e11bfc97	initialize the stripe_size variable as well.	2016-01-17 09:48:49 -06:00
Edgar Gabriel	26c57ef374	separate the size of the buffer used for the shuffle step and the size of the buffer used for a pwritev operation.	2016-01-17 09:48:49 -06:00
Edgar Gabriel	39d5c8c281	further bug fixes: silencing a compiler warning and fixing a memory overrun	2016-01-17 09:48:49 -06:00
Edgar Gabriel	2bcae84e11	further debugging	2016-01-17 09:48:49 -06:00
Edgar Gabriel	2bdd6ba17a	correctly free some buffers, and ensure that lustre_stripe_size and stripe_count are always read from the file system.	2016-01-17 09:48:49 -06:00
Edgar Gabriel	4bbb22bd0b	add a new field to the ompio data structure (stripe_count) and set it correctly on pvfs2 and lustre.	2016-01-17 09:48:49 -06:00
Edgar Gabriel	d282e94b67	add the new dynamic_gen2 component, designed to coexist for now with the original dynamic component	2016-01-17 09:48:49 -06:00
Jeff Squyres	60ffe713b8	common syms: whitelist bison-generated common symbols. Bison generates some common symbols that we can't do anything about, so whitelist them.	2016-01-16 03:53:14 -08:00
Joshua Ladd	18c5a21562	Fix typo in error handling flow.	2016-01-14 22:28:54 +02:00
Joshua Ladd	afa62d8ca1	Addressing reviewers' comments for https://github.com/open-mpi/ompi-release/pull/891	2016-01-14 19:22:27 +02:00
Tomislav Janjusic	3858bc8e62	Adding support for dynamic endpoint creation. Signed-off-by: Tomislav Janjusic <tomislavj@mngx-apl-01.mtl.labs.mlnx> Signed-off-by: Tomislavj Janjusic <tomislavj@mellanox.com> Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>	2016-01-12 22:17:03 +02:00
Nathan Hjelm	dd4d49cbbb	Merge pull request #1278 from ggouaillardet/poc/osc_pt2pt osc/pt2pt: use two distinct "namespaces" for tags	2016-01-12 09:49:31 -07:00
Edgar Gabriel	0a1b735eed	use the actual preadv and pwritev functions if available. That's what the fbtl interfaces have been designed for.	2016-01-07 08:29:17 -06:00
Edgar Gabriel	1b0b849994	remove the MCA parameter setting the number of hosts in PLFS, since the plfs_setxattr function used is causing linking problems with PLFS 2.5. Remove unused variables.	2016-01-05 11:13:23 -06:00
Edgar Gabriel	7861a8c357	revise the logic in the fbtl plfs avoiding the memcpy operation	2016-01-05 10:04:46 -06:00
Edgar Gabriel	da309ac962	- use a unique pid for each process as requested by the API - sync the file before closing it - use plfs_access() instead of access() before closing the file	2016-01-05 10:04:12 -06:00
KAWASHIMA Takahiro	ad26899110	osc/sm: Fix a bus error on MPI_WIN_{POST,START}. A bus error occurs in sm OSC under the following conditions: - sparc64 or any other architecture that needs strict alignment. - `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created by sm OSC. - The communicator size is odd and greater than 3. Lines 283-285 of the current `ompi/mca/osc/sm/osc_sm_component.c` contain the following code: `module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);` `module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);` `module->posts[0] = (uint64_t *) (module->node_states + comm_size);` The size of `ompi_osc_sm_node_state_t` is a multiple of 4 but not a multiple of 8, so if `comm_size` is odd, `module->posts[0]` is not aligned to 8 bytes. This causes a bus error when accessing `module->posts[i][j]`. This patch fixes the alignment of `module->posts[0]` by setting `module->posts[0]` first.	2016-01-05 19:04:53 +09:00
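The arithmetic behind this bus error can be checked with byte offsets alone. A minimal sketch, assuming hypothetical 8-byte global-state and 12-byte node-state structs (the commit only states that the node state size is a multiple of 4 but not of 8) and reading "setting `module->posts[0]` first" as placing the `uint64_t` array at the start of the segment; it is an illustration, not the actual layout in `osc_sm_component.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: only their sizes matter here. */
typedef struct { uint64_t word; } ex_global_state_t;   /* 8 bytes */
typedef struct { int32_t a, b, c; } ex_node_state_t;   /* 12 bytes: a multiple of 4, not of 8 */

/* Original order: global state, comm_size node states, then the uint64_t posts. */
static size_t ex_posts_offset_old(size_t comm_size) {
    return sizeof(ex_global_state_t) + comm_size * sizeof(ex_node_state_t);
}

/* Reordered: posts first at the segment base (assumed 8-byte aligned, e.g.
 * page aligned), then the global state, then the node states. */
typedef struct { size_t posts, global_state, node_states, total; } ex_layout_t;

static ex_layout_t ex_layout_new(size_t comm_size, size_t posts_bytes) {
    assert(posts_bytes % sizeof(uint64_t) == 0);
    ex_layout_t l;
    l.posts = 0;
    l.global_state = posts_bytes;
    l.node_states = l.global_state + sizeof(ex_global_state_t);
    l.total = l.node_states + comm_size * sizeof(ex_node_state_t);
    return l;
}
```

With `comm_size` = 5, the old order puts the posts at offset 8 + 5 * 12 = 68, which is 4 bytes past an 8-byte boundary; with the posts first, every 8-byte field starts on an 8-byte boundary and the 4-byte-aligned node states still start on a 4-byte boundary.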
Gilles Gouaillardet	06ecdb6aa7	osc/pt2pt: use two distinct "namespaces" for tags	2016-01-05 16:57:37 +09:00
Gilles Gouaillardet	14fdf75944	fs/pvfs2: fix typo. Thanks Dave Love for reporting this issue. Fixes #1272	2016-01-03 23:28:35 +09:00
Artem Polyakov	2abb2972ac	Fix Mellanox copyrights with respect to the following PRs: * https://github.com/open-mpi/ompi/pull/1184 * https://github.com/open-mpi/ompi/pull/1188 * https://github.com/open-mpi/ompi/pull/1197 * https://github.com/open-mpi/ompi/pull/1202 * https://github.com/open-mpi/ompi/pull/1210 * https://github.com/open-mpi/ompi/pull/1216 * https://github.com/open-mpi/ompi/pull/1236 * https://github.com/open-mpi/ompi/pull/1237 * https://github.com/open-mpi/ompi/pull/1248 * https://github.com/open-mpi/ompi/pull/1260 * https://github.com/open-mpi/ompi/pull/1264	2015-12-30 00:12:19 +06:00
Ralph Castain	810f2446b7	Add pmix120 component, update the error handling functions in the PMIx API. Update the configure logic for the new pmix120 component. ckpt. Get the pmix120 component to work - still not really registering or handling notifications, but infrastructure now operates. Clean up some of the symbol scopes, and provide a more comprehensive rename.h file. Will pretty it up later - let's see how this works. Clean up the rename files to use the pretty macros	2015-12-28 23:15:44 +09:00
Gilles Gouaillardet	fec973efda	configury: test portability. Replace `test ... -o ...` with `test ... || test ...` and `test ... -a ...` with `test ... && test ...`	2015-12-28 13:58:45 +09:00
Gilles Gouaillardet	ccc96ad204	fbtl/base: add missing #include "opal/util/output.h". Thanks Marco Atzeri for contributing the original patch	2015-12-24 14:41:26 +09:00
Gilles Gouaillardet	cebde2a753	coll/tuned: add missing #include "opal/util/output.h". Thanks Marco Atzeri for contributing the original patch	2015-12-24 14:41:17 +09:00
Gilles Gouaillardet	ad9693c604	pml/yalla: add missing #include <alloca.h>	2015-12-24 14:33:58 +09:00
Gilles Gouaillardet	b38c17dbcb	pml/cm: add missing #include <alloca.h>. Thanks Paul Hargrove for reporting this issue	2015-12-24 14:33:58 +09:00
Gilles Gouaillardet	071ae39a44	osc/rdma: add missing #include <alloca.h>	2015-12-24 14:33:58 +09:00
Gilles Gouaillardet	77f199d1d7	coll/fca: add missing #include <alloca.h>	2015-12-24 14:33:58 +09:00
Todd Kordenbrock	8a3660138e	mtl-portals4: initialize endpoint nid/pid when using logical mapping. When mtl-portals4 is configured for logical mapping, coll-portals4 must disqualify because it does not yet support logical mapping. coll-portals4 looks for the endpoint pid to be zero, which tells it that mtl-portals4 is configured for logical mapping. This commit initializes the endpoint nid/pid to zero for logical mapping.	2015-12-22 11:20:18 -06:00
rhc54	aa17bdf6e8	Merge pull request #1239 from rhc54/topic/cleanup Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX	2015-12-21 07:23:31 -08:00
Edgar Gabriel	46c20a1246	correctly set all variables storing information on the file pointer position to zero when setting the file view	2015-12-21 09:41:39 +09:00
Ralph Castain	ac6289dca6	Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX. Cleanup per George's comments	2015-12-17 17:39:15 -08:00
igor.ivanov@itseez.com	041a6a9f53	ompi/pml: Fix warnings in yalla component	2015-12-16 16:22:30 +02:00
igor.ivanov@itseez.com	38c253c74c	ompi/mtl: Fix warnings in mxm component	2015-12-16 16:22:29 +02:00
igor.ivanov@itseez.com	0a9956927a	ompi/coll: Fix warnings in fca components (warning: assignment from incompatible pointer type)	2015-12-16 16:22:16 +02:00
igor.ivanov@itseez.com	8f45d83d46	ompi/coll: Fix warnings in hcoll component (warning: assignment from incompatible pointer type)	2015-12-16 14:52:29 +02:00
Ralph Castain	3a56f0d34b	Create the pmix external component. Fix a few places where opal/util/argv.h was required when building with an external pmix (go figure). NOTE: Building with external pmix requires that you also build with external libevent and hwloc libraries. Detect this at configure and error out with a large message if this requirement is violated. Closes #1204 (replaces it). Fixes #1064	2015-12-15 15:26:13 -08:00
Nathan Hjelm	0de9445fc7	osc/rdma: fix bugs when running more than one process per node. A previous commit updated the one-sided code to register the state region only once. This created an issue when using the scratch lock with fetching atomics. In this case, on any rank that is not local rank 0, module->state_handle is NULL. This commit fixes the issue by removing the scratch lock and using a fragment pointer instead. Fixes open-mpi/ompi#1290. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-12-15 11:25:25 -07:00
Nathan Hjelm	b7ba301310	Merge pull request #1165 from hjelmn/add_procs_group ompi/group: release ompi_proc_t's at group destruction	2015-12-14 13:53:42 -08:00
Nathan Hjelm	9d659465b7	Merge pull request #1210 from artpol84/icbarrier_fix Fix NBC iBarrier for inter-communicators.	2015-12-14 13:52:38 -08:00
Nathan Hjelm	4b3dac5933	Merge pull request #1216 from artpol84/icgatherv_fix Fix NBC iGatherv for inter-communicators.	2015-12-14 13:51:58 -08:00
Matias Cabral	7cfd7d50b9	Merge pull request #1219 from matcabral/PSM2_tag_hashing Support for PSM2 hashing lookup in message queue.	2015-12-14 12:01:55 -08:00
matcabral	9a1f9be146	A new internal feature in PSM2 will use hash tables to accelerate message queue lookups if the lookups have the proper tag&mask layout. OpenMPI should follow PSM2's preferred tag&mask spec, so that PSM2 can provide a performance benefit.	2015-12-14 10:13:39 -08:00
Artem Polyakov	2d0919dbdc	Fix NBC iGatherv for inter-communicators. We need to use remote size to form a schedule.	2015-12-14 12:19:10 +06:00
Artem Polyakov	fc17deca43	Fix NBC iBarrier for inter-communicators. Remove the send of the extra message. This bug was triggered by the MPICH/coll/nbicbarrier test. In this test a series of communicators is created. The extra message was received after the original communicator was destroyed and was queued into non_existing_communicator_pending. When a new, completely unrelated communicator with the same id as the original was created, this message was pushed into the frags_cant_match queue and caused a sequence number skew and a hang.	2015-12-12 13:27:31 +06:00
Gilles Gouaillardet	3a3b13ea12	coll/base: fix an integer overflow in ompi_coll_base_reduce_generic. Refs open-mpi/ompi#1198	2015-12-11 13:55:59 +09:00
Alina Sklarevich	3ffd8dcd20	PML UCX: fix typo (following `7becc54d`).	2015-12-10 13:51:10 +02:00
Nathan Hjelm	dae3746d2f	Merge pull request #1190 from kawashima-fj/pr/sm-win-test-fix osc/sm: Fix a bug that `MPI_WIN_TEST` does not update `flag` to 0	2015-12-08 06:39:16 -07:00
KAWASHIMA Takahiro	9c7b6a4352	osc/sm: Fix a bug that `MPI_WIN_TEST` does not update `flag` to 0. `MPI_WIN_TEST` must update the `flag` parameter to 0 when not all origin processes have called `MPI_WIN_COMPLETE`, but sm OSC doesn't. If the caller initializes the `flag` argument to a non-0 value, the caller will receive that non-0 `flag` value.	2015-12-08 19:23:21 +09:00
Gilles Gouaillardet	59a361b781	ompio: correctly handle zero f_cc_size in mca_io_ompio_simple_grouping	2015-12-08 17:00:11 +09:00
Nathan Hjelm	f68c315188	pml/ob1: add missing ompi_request_wait_completion for buffered sends. This commit adds a call to ompi_request_wait_completion for buffered sends. Without this line it is possible to get into a state where the data is never sent. Fixes open-mpi/ompi#1185. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-12-07 22:28:07 -07:00
Gilles Gouaillardet	bfe8e03d9d	fcoll/two_phase: use ompi_mpi_abort instead of PMPI_Abort. Thanks Jeff for the review	2015-12-07 11:34:36 +09:00
Gilles Gouaillardet	37c978f5e9	coll/libnbc: correctly handle changed types. This fixes open-mpi/ompi@d816d1c194. Thanks Jeff for the review	2015-12-07 10:13:43 +09:00

6104 commits