One should use the correct module object when calling
c_coll.coll_allgather. Otherwise a segfault can occur, for example
when hcoll is in use: in that case c_coll.coll_allgather is
mca_coll_hcoll_allgather while c_coll.coll_gather_module points to the
tuned component.
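For illustration, a minimal fragment of the internal calling convention
(the buffer/count/datatype arguments are placeholders, not part of the
original change):

    /* The module argument must be the one registered for the collective
     * being invoked.  Passing another collective's module (for example
     * coll_gather_module) can crash inside the hcoll allgather. */
    comm->c_coll.coll_allgather(sbuf, scount, sdtype,
                                rbuf, rcount, rdtype, comm,
                                comm->c_coll.coll_allgather_module); /* correct */
    /* wrong: comm->c_coll.coll_gather_module */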
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
As we changed the ABI (forcing a major release), we can limit
the size of the predefined communicators by moving the collective
structure outside the communicator. This might have a minimal, and
likely unnoticeable, impact on performance. This approach was discussed
during the January 2017 devel meeting.
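Presumably the shape of the change is along these lines (a sketch, not
the actual diff):

    /* Before: the collective table was embedded in the communicator,
     * inflating every predefined communicator:
     *     mca_coll_base_comm_coll_t  c_coll;
     * After: the table is allocated separately and referenced by
     * pointer, so call sites become comm->c_coll->coll_barrier(...)
     * instead of comm->c_coll.coll_barrier(...). */
    mca_coll_base_comm_coll_t *c_coll;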
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Under heavy load the locking code could fail if the underlying btl
module started to return OPAL_ERR_OUT_OF_RESOURCE on atomic
operations. This commit updates the code to gracefully handle btl
errors.
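A sketch of the graceful-handling pattern, with a hypothetical helper
(try_issue_atomic) standing in for the real osc/pt2pt code:

    /* Treat a transient resource shortage as "try again later" rather
     * than as a fatal error: progress the library and re-issue the
     * atomic instead of failing the lock operation. */
    int ret;
    do {
        ret = try_issue_atomic (module, peer);   /* hypothetical helper */
        if (OPAL_ERR_OUT_OF_RESOURCE == ret) {
            opal_progress ();                    /* let the btl drain */
        }
    } while (OPAL_ERR_OUT_OF_RESOURCE == ret);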
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
In this context, AMD64 really means amd64 or em64t, so rename it to
X86_64 in order to avoid any confusion.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This commit implements onesided operations for noncontiguous
datatypes using two different algorithms.
* If the result and/or origin datatype is noncontiguous and the
target datatype is contiguous, then an iovec MD is created for
the result and origin. The operation is performed using a
single Portals4 call (unless it exceeds the max message size).
* If the target datatype is noncontiguous, then an algorithm
similar to the one in osc/rdma is used to loop over the
contiguous blocks of each datatype (see the sketch below). The
operation is performed using multiple Portals4 calls.
This commit ensures that individual operations do not exceed the
max atomic size or the max message size supported by the device.
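For the noncontiguous case, the loop over contiguous blocks can be
driven by the datatype engine. A rough sketch, using opal_convertor_raw
as I understand it (not the actual osc-portals4 code; buf, count and
datatype are placeholders for one side of the operation):

    opal_convertor_t convertor;
    struct iovec iov[32];
    uint32_t iov_count;
    size_t length;
    int done;

    OBJ_CONSTRUCT(&convertor, opal_convertor_t);
    opal_convertor_copy_and_prepare_for_send (ompi_mpi_local_convertor,
                                              &datatype->super, count,
                                              buf, 0, &convertor);
    do {
        iov_count = 32;
        /* Each iovec entry is one contiguous block of the datatype;
         * issue one Portals4 call per block, further splitting blocks
         * that exceed the max message/atomic size. */
        done = opal_convertor_raw (&convertor, iov, &iov_count, &length);
        /* ... one PtlPut/PtlGet/PtlAtomic per iov[i] ... */
    } while (!done);
    OBJ_DESTRUCT(&convertor);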
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
Add padding so the memory allocated by MPI_Win_allocate_shared()
is 64-byte aligned.
Thanks to Joseph Schuchart for the bug report.
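The padding itself is simple arithmetic; a minimal sketch, assuming the
start of the shared segment is itself 64-byte aligned:

    /* Round each per-process segment up to a multiple of 64 so that
     * every base pointer handed back by MPI_Win_allocate_shared()
     * stays 64-byte aligned, regardless of the sizes requested by the
     * previous ranks. */
    size_t padded_size = (size + 63) & ~(size_t) 63;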
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This commit fixes a number of threading issues discovered in
osc/pt2pt. This includes:
- Lock the synchronization object, not the module, in osc_pt2pt_start.
This fixes a race between the start function and the processing of
post messages.
- Always lock before calling cond_broadcast. This fixes a race between
the waiting thread and the signaling thread (see the sketch after
this list).
- Make all atomically updated values volatile.
- Make the module lock recursive to protect against some deadlock
conditions. This will be rolled back once the locks have been
redesigned.
- Mark incoming complete *after* completing an accumulate, not
before. The old ordering was producing incorrect answers under
certain conditions.
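The lock-before-broadcast item, as a generic sketch (plain pthreads
purely for illustration, not the actual osc/pt2pt locking primitives):

    /* Signaling thread: take the lock before changing the predicate
     * and broadcasting, so the waiter cannot miss the wakeup between
     * its predicate check and its cond_wait. */
    pthread_mutex_lock (&lock);
    done = true;
    pthread_cond_broadcast (&cond);
    pthread_mutex_unlock (&lock);

    /* Waiting thread */
    pthread_mutex_lock (&lock);
    while (!done) {
        pthread_cond_wait (&cond, &lock);
    }
    pthread_mutex_unlock (&lock);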
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Using MPI_MINLOC or MPI_MAXLOC with the following data types
leads to data corruption:
* MPI_DOUBLE_INT
* MPI_LONG_INT
* MPI_SHORT_INT
* MPI_LONG_DOUBLE_INT
Detect this case, print an error message, and abort.
This workaround should be removed once the following issue is resolved:
* https://github.com/open-mpi/ompi/issues/1666
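The shape of the workaround, sketched as a fragment with the MPI-level
handles (the actual check operates on the internal op and datatype
objects):

    /* Workaround: refuse the known-broken combinations rather than
     * silently corrupting data. */
    if ((MPI_MINLOC == op || MPI_MAXLOC == op) &&
        (MPI_DOUBLE_INT == dtype || MPI_LONG_INT == dtype ||
         MPI_SHORT_INT == dtype || MPI_LONG_DOUBLE_INT == dtype)) {
        fprintf (stderr, "MPI_MINLOC/MPI_MAXLOC is not supported with "
                 "this datatype here; see open-mpi/ompi#1666\n");
        MPI_Abort (MPI_COMM_WORLD, 1);
    }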
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* When using `MPI_Put` with `MPI_Win_lock_all`, a hang is possible since
the `put` is waiting on `eager_send_active` to become `true`, but that
variable might not be reset in the case of `MPI_Win_lock_all`,
depending on other incoming events (e.g., `post` messages or ACKs of
lock requests). See the reproducer sketch below.
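A hypothetical reproducer for this class of hang (a minimal sketch, not
the actual test case; whether it hangs depends on the timing of
incoming events):

    #include <mpi.h>

    int main (int argc, char **argv)
    {
        int rank, size, value = 42, *base;
        MPI_Win win;

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        MPI_Comm_size (MPI_COMM_WORLD, &size);
        MPI_Win_allocate (sizeof (int), sizeof (int), MPI_INFO_NULL,
                          MPI_COMM_WORLD, &base, &win);

        MPI_Win_lock_all (0, win);
        if (0 == rank && size > 1) {
            /* Before the fix this put could spin waiting for
             * eager_send_active, which MPI_Win_lock_all did not
             * always reset. */
            MPI_Put (&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        }
        MPI_Win_unlock_all (win);

        MPI_Win_free (&win);
        MPI_Finalize ();
        return 0;
    }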
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* When using `MPI_Win_lock`/`MPI_Win_unlock` with `MPI_Get` and
non-contiguous datatypes, it is possible that the unlock finishes too
early, before the data is actually present in the receive buffer (see
the reproducer sketch below).
* We need to wait for the irecv to complete before unlocking the target.
This commit waits for the outgoing fragment counts to become equal
before unlocking.
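A hypothetical reproducer sketch for this case: the origin reads the
receive buffer right after the unlock, which is only valid if the irecv
really completed before `MPI_Win_unlock` returned.

    #include <mpi.h>
    #include <stdio.h>

    int main (int argc, char **argv)
    {
        int rank, size, *base, buf[8] = {0};
        MPI_Datatype vec;
        MPI_Win win;

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        MPI_Comm_size (MPI_COMM_WORLD, &size);
        MPI_Win_allocate (8 * sizeof (int), sizeof (int), MPI_INFO_NULL,
                          MPI_COMM_WORLD, &base, &win);
        for (int i = 0; i < 8; ++i) base[i] = rank * 100 + i;
        MPI_Barrier (MPI_COMM_WORLD);

        /* Non-contiguous datatype: every other int. */
        MPI_Type_vector (4, 1, 2, MPI_INT, &vec);
        MPI_Type_commit (&vec);

        if (0 == rank && size > 1) {
            MPI_Win_lock (MPI_LOCK_SHARED, 1, 0, win);
            MPI_Get (buf, 1, vec, 1, 0, 1, vec, win);
            MPI_Win_unlock (1, win);
            /* The data must be in buf now; before the fix it could
             * still be in flight at this point. */
            printf ("buf[0] = %d (expect 100)\n", buf[0]);
        }

        MPI_Type_free (&vec);
        MPI_Win_free (&win);
        MPI_Finalize ();
        return 0;
    }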
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* If the user uses PSCW synchronization after a fence, then the previous
epoch is not reset, which can cause the PSCW epoch to transfer data
before it is ready, leading to wrong answers (see the sketch below).
* This commit resets the `eager_send_active` in the start call.
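The problematic pattern, as a fragment (win, buf, peer, and
target_group are placeholders):

    /* Fence epoch first ... */
    MPI_Win_fence (0, win);
    MPI_Put (buf, 1, MPI_INT, peer, 0, 1, MPI_INT, win);
    MPI_Win_fence (0, win);

    /* ... then PSCW on the same window.  Before this fix the start
     * call did not reset eager_send_active, so data from the new
     * access epoch could be pushed before the matching post had
     * arrived. */
    MPI_Win_start (target_group, 0, win);
    MPI_Put (buf, 1, MPI_INT, peer, 0, 1, MPI_INT, win);
    MPI_Win_complete (win);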
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent()
to get the local and remote lower bound. For derived types like
subarray, true_lb is the correct offset for RDMA operations.
Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent()
to get the origin and target lower bound. For derived types like
subarray, true_lb is the correct offset for RDMA operations. Also, use
the size of the datatype instead of the extent.
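Roughly, the change amounts to the following (a sketch of the OMPI
datatype helpers as I understand them, not the actual diff; datatype is
a placeholder):

    ptrdiff_t lb, extent, true_lb, true_extent;
    size_t dt_size;

    /* Old: lb/extent, which include artificial bounds for types such
     * as subarrays. */
    ompi_datatype_get_extent (datatype, &lb, &extent);

    /* New: true_lb is the offset of the first real byte, which is the
     * right base offset for RDMA, and the datatype size is the number
     * of bytes actually transferred. */
    ompi_datatype_get_true_extent (datatype, &true_lb, &true_extent);
    ompi_datatype_type_size (datatype, &dt_size);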
This commit fixes a typo in compare-and-swap when retrieving the
memory region associated with a displacement. It was erroneously using
8 bytes instead of the datatype size. This could cause an incorrect RMA
range error when the compare-and-swap target is less than 4 bytes from
the end of the region.
Fixes open-mpi/ompi#2080
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds support for using network AMOs for MPI_Accumulate,
MPI_Fetch_and_op, and MPI_Compare_and_swap. This support is only
enabled if the ompi_single_intrinsic info key is specified or the
acc_single_intrinsic MCA variable is set. This configuration
indicates to this implementation that no long accumulates will be
performed since these do not currently mix with the AMO
implementation.
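From the user side, the info key would presumably be passed at window
creation, along these lines (the "true" value string and the
bytes/base/win variables are assumptions):

    MPI_Info info;
    MPI_Info_create (&info);
    /* Promise that only single-element (intrinsic) accumulates will be
     * performed on this window, so the osc component may use network
     * AMOs directly. */
    MPI_Info_set (info, "ompi_single_intrinsic", "true");
    MPI_Win_allocate (bytes, 1, info, MPI_COMM_WORLD, &base, &win);
    MPI_Info_free (&info);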
This commit also cleans up the code somewhat. This includes removing
unnecessary struct keywords where the type is also typedef'd.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit cleans up some code in the passive target path. The code
used the buffered frag control send path but it is more appropriate to
use the unbuffered one. This avoids checking structures that should
not be in use in this path.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes an ordering bug in the code that keeps track of all
attached memory windows. The code is intended to keep the memory
regions sorted but was often inserting at the wrong index. Thanks to
Christoph Niethammer for reporting the issue. The reproducer will be
added to nightly MTT testing.
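For illustration, a correct sorted insert by base address looks roughly
like this (illustrative array and field names, not the actual osc
code):

    /* Find the first attached region whose base is larger, shift the
     * tail up one slot, and insert so the array stays sorted. */
    int i = 0;
    while (i < region_count && regions[i].base < new_region.base) {
        ++i;
    }
    memmove (regions + i + 1, regions + i,
             (region_count - i) * sizeof (*regions));
    regions[i] = new_region;
    ++region_count;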
Fixes open-mpi/ompi#2012
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
It is possible for another thread to process a lock ack before the
peer is set as locked. In this case, setting either the locked flag or
the eager active flag might clobber the other thread's update. To
address this, the flags have been made volatile and are set atomically.
Since there is no opal_atomic_or or opal_atomic_and function, just use
cmpset for now.
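The cmpset-based update has the usual shape; a generic sketch (using
the GCC builtin here purely for illustration, while the real code uses
OPAL's cmpset):

    #include <stdint.h>

    /* Emulate an atomic OR with a compare-and-swap retry loop. */
    static void atomic_or32 (volatile int32_t *addr, int32_t bits)
    {
        int32_t old;
        do {
            old = *addr;
        } while (!__sync_bool_compare_and_swap (addr, old, old | bits));
    }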
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes some bugs uncovered during thread testing of
2.0.1rc1. With these fixes the component is running cleanly with
threads.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit changes the semantics of ompi request callbacks. If a
request's callback has freed or re-posted (using start) a request
the callback must return 1 instead of OMPI_SUCCESS. This indicates
to ompi_request_complete that the request should not be modified
further. This fixes a race condition in osc/pt2pt that could lead
to the req_state being inconsistent if a request is freed between
the callback and setting the request as complete.
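Under the new contract, a callback that disposes of its request would
look roughly like this (illustrative function name, not actual osc/pt2pt
code):

    static int frag_complete_cb (ompi_request_t *request)
    {
        /* ... consume the completion ... */

        ompi_request_free (&request);

        /* Returning 1 instead of OMPI_SUCCESS tells
         * ompi_request_complete() that the request has been freed (or
         * re-posted with start) and must not be touched again. */
        return 1;
    }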
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The original lock_all algorithm in osc/pt2pt sent a lock message to
each peer in the communicator even if the peer is never the target of
an operation. Since this scales very poorly, the implementation has
been replaced by one that locks the remote peer on first communication
after a call to MPI_Win_lock_all.
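The lazy scheme, sketched with hypothetical helper and field names (not
the actual osc/pt2pt symbols):

    /* Before the first operation that targets a peer inside a lock_all
     * epoch, acquire the remote lock on demand instead of having
     * locked every rank up front in MPI_Win_lock_all. */
    if (!peer->lock_acquired) {
        send_lock_request (module, peer);   /* hypothetical helper */
        wait_for_lock_ack (module, peer);   /* hypothetical helper */
        peer->lock_acquired = true;
    }
    /* ... now issue the put/get/accumulate to this peer ... */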
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>