openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	0d560ddf77	osc: fix typo. This typo caused a build failure when configured with --enable-memchecker; see http://mtt.open-mpi.org/index.php?do_redir=2234	2015-02-16 10:09:08 +09:00
George Bosilca	a7a4d6335e	Various cleanups.	2015-02-15 11:39:09 -05:00
Nathan Hjelm	0e822e03f7	osc/sm: always release the lock on MPI_Unlock. When a lock was obtained with MPI_MODE_NOCHECK it was not correctly released on unlock. This is an error. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-12 18:54:22 -07:00
Nathan Hjelm	1a3597aa93	osc/base: fix accumulate on derived datatypes. With certain datatypes the opal_datatype_unpack method for performing the accumulate operation does not work. This commit modifies the accumulate code in the osc base to use opal_convertor_raw instead. Fixes #385	2015-02-11 12:36:30 -07:00
Nathan Hjelm	a2bdfd99a2	osc/pt2pt: do not set active_incoming_frag_signal_count to 0 on fence completion	2015-02-11 12:34:04 -07:00
Todd Kordenbrock	b5a0f3d347	osc-portals4: rename OPAL_ASSEMBLY_ARCH values from OMPI_* to OPAL_*	2015-02-04 16:08:55 -06:00
Gilles Gouaillardet	9be4dfb152	osc/pt2pt: invoke ompi_osc_signal_outgoing only once per fragment	2015-01-22 13:43:44 +09:00
Gilles Gouaillardet	661c35ca67	clean up dead code left over from the removal of the --with-threads configure option	2015-01-16 19:13:59 +09:00
Ralph Castain	4e592ac434	Fix the tarball by providing the correct list of headers in the Makefile.am	2015-01-07 18:37:26 -08:00
Nathan Hjelm	e68ed2876c	osc/pt2pt: threading fixes and code cleanup	2015-01-06 13:39:16 -07:00
Nathan Hjelm	9eba7b9d35	Rename the OSC "rdma" component to pt2pt to better reflect that it does not actually use btl rdma	2015-01-06 13:38:55 -07:00
Nathan Hjelm	1b564f62bd	Revert "Merge pull request #275 from hjelmn/btlmod". This reverts commit ccaecf0fd6c862877e6a1e2643f95fa956c87769, reversing changes made to 6a19bf85dde5306f559f09952cf3919d97f52502.	2014-11-19 23:22:43 -07:00
Nathan Hjelm	0d413fb73f	Revert "Remove stale file reference". This reverts commit 4c8fa17234bcaab07460fb12fbf826ceee4f8df2.	2014-11-19 23:16:16 -07:00
Ralph Castain	4c8fa17234	Remove stale file reference	2014-11-19 18:32:19 -08:00
Nathan Hjelm	5a0a48c3c4	osc: remove lingering rdma component files	2014-11-19 12:11:54 -07:00
Nathan Hjelm	22625b005b	osc/pt2pt: threading fixes and code cleanup	2014-11-19 11:33:04 -07:00
Nathan Hjelm	45d1fac8af	ugni thread safety fixes	2014-11-19 11:33:03 -07:00
Nathan Hjelm	29e4e1c90a	Rename the OSC "rdma" component to pt2pt to better reflect that it does not actually use btl rdma	2014-11-19 11:33:03 -07:00
Ralph Castain	616f0894ce	Add missing parens on values being passed to OPAL_THREAD_ADD32	2014-10-31 19:11:48 -07:00
Nathan Hjelm	672d96704c	osc/rdma: fix regression introduced by eed7b45db59b356f6257f3b019e77643f14f84ce. The call to ompi_osc_signal_outgoing was moved from ompi_osc_rdma_frag_start to frag_send, which gave correct results for the bug reproducer but hung with simple OSC tests. Moved the ompi_osc_signal_outgoing call back; it now passes all tests. Closes #256	2014-10-30 23:16:11 -06:00
Nathan Hjelm	23dd3af946	osc/rdma: use unsigned types for all counters. Some of the counters used by the "rdma" one-sided component are intended to overflow. Since overflow behavior is undefined for signed integers in C, it is safer to use unsigned integers here.	2014-10-22 15:36:15 -06:00
George Bosilca	7541c03b4c	Mark all instances where atomic operations are used but their return value is unnecessary	2014-10-15 21:47:32 -04:00
Nathan Hjelm	eed7b45db5	osc/rdma: fix issue identified by Berk Hess. osc/rdma uses counters to determine whether all messages have been received before exiting synchronization calls. The problem is that the active target counter is always increasing (never zeroed). If more than 2^31-1 messages are sent, the counter overflows (which in itself isn't an error), and this causes test/wait to return before the communication is complete. There is an additional error in the use of the fragment flush function: if PSCW synchronization is in use, this function CANNOT be called unless a post message has arrived. Relevant mailing list thread: http://www.open-mpi.org/community/lists/devel/2014/10/16016.php This commit fixes both issues. Tested against MTT and the issue reproducer. Closes #224.	2014-10-07 11:45:22 -06:00
Ralph Castain	eb95d6f892	ompi_info_get_bool returns "success" if the value isn't found, setting "flag" to false, but doesn't set the value of the parameter itself. So if "blocking_fence" was not specified in the MPI_Info, the blocking_fence flag was never set. Initialize the blocking_fence flag to false, as the code logic indicates that it should only be set if someone provides that flag. Thanks to Lisandro Dalcin for reporting it. cmr=v1.8.4:reviewer=hjelmn This commit was SVN r32812.	2014-09-29 17:21:28 +00:00
Gilles Gouaillardet	5f1e0f284a	Fix compilation when configured with --enable-heterogeneous. This commit was SVN r32410.	2014-08-04 10:35:08 +00:00
Ralph Castain	61bf7af9d2	Per Paul Hargrove's suggestion, create an opal_pagesize function to abstract the various ways of obtaining that value. Rather than creating a separate file for only that one function, put it in a convenient place that is at least somewhat related. Refs trac:4826 This commit was SVN r32407. The following Trac tickets were found above: Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826	2014-08-02 18:38:16 +00:00
Ryan Grant	caa10a5faf	Portals fixes after latest move This commit was SVN r32330.	2014-07-28 19:25:03 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down into OPAL. All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with a few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
Todd Kordenbrock	42a871efd4	This commit fixes trac:4662 - "Portals4/MTL hangs in c_get_accumulate test". - Portals4/OSC was unable to acquire an exclusive lock due to an invalid local address in the atomic operation. This caused the reported hang. - After fixing the hang, the test continued to fail because ompi_datatype_is_contiguous_memory_layout() reports that MPI_EMPTY (the origin datatype) is noncontiguous and Portals4/OSC does not support noncontiguous datatypes at this time. However, in this case the origin count is zero so contiguous/noncontiguous is irrelevant. Now we skip the contiguous check if the count is zero. cmr=v1.8.3:reviewer=regrant:subject=Fix for "Portals4/MTL hangs in c_get_accumulate test" This commit was SVN r32295. The following Trac tickets were found above: Ticket 4662 --> https://svn.open-mpi.org/trac/ompi/ticket/4662	2014-07-23 19:13:07 +00:00
Ralph Castain	3d1b32a2c6	Silence warning cmr=v1.8.2:reviewer=hjelmn This commit was SVN r32231.	2014-07-14 19:27:30 +00:00
Nathan Hjelm	32ab6f850e	osc/rdma: fix warning cmr=v1.8.2:reviewer=rhc This commit was SVN r32201.	2014-07-10 18:42:55 +00:00
Nathan Hjelm	b6abe68972	osc/rdma: check for more types of window access violations. This commit adds a check to see if the target is in an access epoch. If not, we return OMPI_ERR_RMA_SYNC. This fixes test_start3 in the onesided test suite. The cost of this extra check is 1 byte/peer for the boolean flag indicating that the peer is in an access epoch. I also fixed a problem where multiple unexpected post messages were not correctly handled. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r32160.	2014-07-08 21:11:12 +00:00
Ryan Grant	a1d312343b	This commit fixes trac:4681 - ibm c_fence_lock hangs cmr=v1.8.2:reviewer=tkordenbrock:subject=Portals4/MTL hanging fix This commit was SVN r32113. The following Trac tickets were found above: Ticket 4681 --> https://svn.open-mpi.org/trac/ompi/ticket/4681	2014-07-01 17:03:03 +00:00
Ryan Grant	5cb8cc856c	Refs trac:4682 - This commit fixes c_flush test failure in the ibm test suite for Portals 4 OSC cmr=v1.8.2:reviewer=tkordenbrock:subject=Move r32112 to v1.8.2 branch This commit was SVN r32112. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r32112 The following Trac tickets were found above: Ticket 4682 --> https://svn.open-mpi.org/trac/ompi/ticket/4682	2014-07-01 16:26:16 +00:00
Ralph Castain	f3cb124e50	Revert r32082 and r32070 - the developers' conference has decided to go a different direction on the threaded progress effort. This will involve some degree of prototyping to understand the tradeoffs prior to making a final design decision, so we'll hold off on the final change until that is completed. This commit was SVN r32089. The following SVN revision numbers were found above: r32070 --> open-mpi/ompi@12d92d0c22 r32082 --> open-mpi/ompi@aa6438ef7a	2014-06-25 20:43:28 +00:00
Ralph Castain	12d92d0c22	Per the OMPI developer conference, remove the last vestiges of OMPI_USE_PROGRESS_THREADS This commit was SVN r32070.	2014-06-24 17:05:11 +00:00
Nathan Hjelm	fd21b244ce	osc/rdma: better name for lookup function cmr=v1.8.2:ticket=trac:4732:reviewer=ompi-rm1.8 This commit was SVN r32021. The following Trac tickets were found above: Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732	2014-06-17 19:49:17 +00:00
Nathan Hjelm	390f8f52b4	osc/rdma: clean up group process matching a bit cmr=v1.8.2:ticket=trac:4732:reviewer=dgoodell This commit was SVN r32018. The following Trac tickets were found above: Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732	2014-06-17 17:48:30 +00:00
Nathan Hjelm	7f20868179	osc/rdma: ensure matching of post/start calls. The post and start window calls are supposed to match. The code did not check that an incoming post matched the start call. This commit fixes the bug by placing the post on a pending list that will be checked by the next call to start. cmr=v1.8.2:reviewer=dgoodell This commit was SVN r32017.	2014-06-17 15:23:06 +00:00
Nathan Hjelm	927098d567	osc/rdma: fix hang when accumulating with MPI_REPLACE. The replace callback did not increment the incoming frag counter, which leads to a hang during synchronization. This commit adds the increment and also puts the request on the garbage collection list to fix a leak. This fixes a hang found when running the mpich test suite. cmr=v1.8.2:reviewer=bbenton This commit was SVN r32016.	2014-06-17 14:53:29 +00:00
Nathan Hjelm	7f6de57653	osc/rdma: fix accumulate fragment size calculation. The wrong type was used when calculating the amount of space needed for an accumulate fragment. Fixed the calculation and took the opportunity to eliminate the get_acc header, as it is identical to the acc header. This fixes trac:4719 and #4718. Tracking these fixes for 1.8.2 in this CMR. Throwing this to Brad for review, as he is the one who ran into the issue. cmr=v1.8.2:reviewer=bbenton This commit was SVN r32015. The following Trac tickets were found above: Ticket 4719 --> https://svn.open-mpi.org/trac/ompi/ticket/4719	2014-06-17 14:53:24 +00:00
Nathan Hjelm	2f96f16416	osc/rdma: ensure eager sends are active before checking for sync errors in self optimization. This addresses an issue found with the MPICH pscw_ordering test. Eager sends were not yet active, which is OK for the standard path but not for the self optimization. Fixed by waiting for all post messages before checking the sync state. Fixes trac:4724. Tracking the 1.8.2 issue in this CMR. cmr=v1.8.2:reviewer=bbenton This commit was SVN r32012. The following Trac tickets were found above: Ticket 4724 --> https://svn.open-mpi.org/trac/ompi/ticket/4724	2014-06-17 04:53:47 +00:00
Nathan Hjelm	41f0059f1e	osc/sm: use an unsigned long when calculating the total segment size. Brad correctly pointed out that the total window size should not be an int. Changed it to an unsigned long. cmr=v1.8.2:reviewer=bbenton This commit was SVN r32010.	2014-06-17 04:33:43 +00:00
Nathan Hjelm	6ec9c6c422	osc/sm: return ompi_request_empty for all request ops. Only one field is valid for RMA requests: MPI_ERROR. This field is set to the correct value in ompi_request_empty, so there is no reason to allocate and keep track of osc/sm requests because they are always complete on return. Since the osc/sm request structure and free list are no longer used, they have been removed. Closes trac:4723. Tracking this issue with the CMR. Brad, can you verify that the issue is indeed fixed? cmr=v1.8.2:reviewer=bbenton This commit was SVN r32009. The following Trac tickets were found above: Ticket 4723 --> https://svn.open-mpi.org/trac/ompi/ticket/4723	2014-06-17 04:27:02 +00:00
Gilles Gouaillardet	96ae38823d	Fix a memory leak in ompi_osc_base_finalize() cmr=v1.8.2:reviewer=rhc This commit was SVN r31788.	2014-05-16 05:25:24 +00:00
Nathan Hjelm	2f5b1ca4cf	osc/rdma: do not leak the receive request. This commit fixes a bug that can cause request and communicator leaks when cleaning up an OSC window. This should prevent a hang seen with IMB-EXT. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31539.	2014-04-28 19:55:18 +00:00
Yossi Etigin	7efb724d7b	osc/rdma: fix deadlock with put_long protocol. When sending PUT_LONG, the data is sent before the headers, and sometimes the header is not flushed immediately. This creates a lot of unexpected receives at the peer, since the peer posts a receive only when it gets the header, which makes it run out of receive buffers. When the sender eventually flushes the window, the receiver has no buffers left to receive the header, which causes a deadlock. The fix is to always flush the headers when doing put_long. cmr=v1.8.1:reviewer=hjelmn This commit was SVN r31378.	2014-04-13 16:24:56 +00:00
Nathan Hjelm	7aece0a7fd	osc/sm: fix bugs in both the passive and active target paths. While testing one-sided on LANL systems I found a couple more OSC bugs that were not caught during the initial testing: - In the passive target code we read the read lock count as a char instead of the intended uint32_t. This causes lock to hang after 127 iterations when using shared locks. - The post code used the wrong group when trying to increment post counters. This causes a segmentation fault. - Both the post and wait code used the wrong check in the inner loop, leading to an infinite loop. cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31354.	2014-04-08 21:55:00 +00:00
Nathan Hjelm	a31bfbeb2c	osc/rdma: fix typo in get accumulate path. There was a typo in ompi_osc_gacc_long_start that was causing a segmentation fault when executing long get accumulate operations. cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31353.	2014-04-08 21:54:52 +00:00
Ralph Castain	b12ee27b3d	Add missing files - thanks to Mr. Anonymous for reporting them as missing from the 1.8 tarball cmr=v1.8.1:reviewer=jsquyres:subject=add missing portals4 files This commit was SVN r31332.	2014-04-08 02:55:14 +00:00

346 commits