This commit updates the btl interface to change the parameters
passed to receive callbacks. The interface used to pass the tag,
a btl base descriptor, and the callback context. Most of the
values in the btl base descriptor were unused and only helped
simplify the callbacks from the self btl. All of the arguments
have now been replaced with a single receive callback descriptor.
This descriptor contains the incoming endpoint, data segment(s),
tag, and callback context. All btls have been updated to use
the new callback and the btl interface version has been bumped
to v3.2.0.
As part of this change the descriptor argument (and the segments
contained within it) has been marked as const. They were treated
as const before, but this change may allow the compiler to make
better optimization decisions and enforces that the callback
does not attempt to modify the data in the descriptor.
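A minimal sketch of the new shape (the type and field names below are illustrative assumptions, not the verbatim btl.h declarations):

    /* Sketch only: everything a receive callback needs now arrives in a
     * single const descriptor instead of separate arguments. */
    struct mca_btl_base_receive_descriptor_t {
        struct mca_btl_base_endpoint_t *endpoint;    /* peer the data arrived from */
        const mca_btl_base_segment_t *des_segments;  /* received data segment(s) */
        size_t des_segment_count;
        mca_btl_base_tag_t tag;                      /* active-message tag */
        void *cbdata;                                /* context registered with the callback */
    };

    /* Old style: (btl, tag, base descriptor, cbdata) passed separately.
     * New style: a single const receive descriptor. */
    typedef void (*mca_btl_base_module_recv_cb_fn_t)(
        struct mca_btl_base_module_t *btl,
        const struct mca_btl_base_receive_descriptor_t *descriptor);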
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Previously we used a fairly simple algorithm in
mca_btl_tcp_proc_insert() to pair local and remote modules. This was a
point-in-time solution rather than a global optimization (where global
means considering all modules between two peers). The selection logic
would often fail because it paired interfaces that were not routable
for traffic. The complexity of the selection logic was Θ(n^n), which
was expensive.
Due to poor scalability, this logic was only used when the number of
interfaces was less than MAX_PERMUTATION_INTERFACES (default 8). More
details can be found in this ticket:
https://svn.open-mpi.org/trac/ompi/ticket/2031 (The complexity estimates
in the ticket do not match what I calculated from the function)
As a fallback, when the interface count surpassed this threshold, a
brute-force O(n^2) double loop was used to match interfaces.
This commit solves two problems. First, the point-in-time solution is
turned into a global optimization solution. Second, the reachability
framework was used to create a more realistic reachability map. We
switched from using IP/netmask to using the reachability framework,
which supports route lookup. This will help many corner cases as well as
utilize any future development of the reachability framework.
The solution implemented in this commit has a complexity mainly derived
from the bipartite assignment solver. If the local and remote peer both
have the same number of interfaces (n), the complexity of matching will
be O(n^5).
With the decrease in complexity to O(n^5), I calculated and tested
that initialization costs would be 5000 microseconds with 30 interfaces
per node (likely close to the maximum realistic number of interfaces we
will encounter). For additional data points, configurations with up to
300 interfaces (a very unrealistic number) were simulated. Up to 150
interfaces, the matching cost stays below 1 second, climbing to about
10 seconds with 300 interfaces. Reflecting on these results, I removed
the suboptimal O(n^2) fallback logic, as it no longer seems necessary.
Data was gathered comparing how initialization costs scale with the
number of ranks. For a low number of interfaces, the impact of
initialization is negligible. At an interface count of 7-8, the new
code has slightly faster initialization costs; at an interface count of
15, it has slower initialization costs. In all cases, initialization
costs scale linearly with the number of ranks.
In order to use the reachable function, we populate local and remote
lists of interfaces. We then convert the interface matching problem
into a graph problem. We create a bipartite graph with the local and
remote interfaces as vertices and use negative reachability weights as
costs. Using the bipartite assignment solver, we generate the matches
for the graph. To ensure that the local and remote processes compute
the same output, we mirror their respective inputs to the graph.
Finally, we store the endpoint matches in a hash table, with the
btl_index as the key and a struct mca_btl_tcp_addr_t* as the value.
This entry is then retrieved at insertion time to set the endpoint
address.
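As an illustration of the reduction only (not the actual reachability framework or assignment solver), the toy program below brute-forces the same assignment problem over a small reachability-weight matrix; both peers running it over mirrored inputs select the same pairs:

    /* Toy, self-contained illustration: edge costs are the negated
     * reachability weights, and the minimum-cost complete assignment
     * gives the interface pairing. */
    #include <stdio.h>
    #include <limits.h>

    #define N 3   /* number of local == remote interfaces in this example */

    /* weight[i][j]: reachability of local interface i to remote interface j
     * (0 means not reachable) */
    static const int weight[N][N] = {
        { 10, 0, 3 },
        {  0, 8, 0 },
        {  4, 0, 9 },
    };

    static int best_cost = INT_MAX;
    static int best_match[N], cur_match[N], used[N];

    static void solve(int i, int cost)
    {
        if (i == N) {                        /* complete assignment found */
            if (cost < best_cost) {
                best_cost = cost;
                for (int k = 0; k < N; ++k) best_match[k] = cur_match[k];
            }
            return;
        }
        for (int j = 0; j < N; ++j) {
            if (!used[j] && weight[i][j] > 0) {
                used[j] = 1;
                cur_match[i] = j;
                solve(i + 1, cost - weight[i][j]);   /* negative weight as cost */
                used[j] = 0;
            }
        }
    }

    int main(void)
    {
        solve(0, 0);
        for (int i = 0; i < N; ++i) {
            printf("local interface %d <-> remote interface %d\n", i, best_match[i]);
        }
        return 0;
    }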
Signed-off-by: William Zhang <wilzhang@amazon.com>
Today, a btl tcp module is associated with exactly one IP
address (IPv4 or IPv6). There's no need to reserve space
for both an IPv4 and IPv6 address in the module structure,
since the module will only be associated with one or the
other.
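A sketch of the resulting layout, assuming an address-family field plus a union (the actual structure members may differ):

    /* Sketch (not the actual module struct): one address family plus a
     * union instead of reserving space for both IPv4 and IPv6. */
    #include <stdint.h>
    #include <netinet/in.h>

    struct tcp_module_addr_sketch {
        uint16_t af_family;           /* AF_INET or AF_INET6 */
        union {
            struct in_addr  ipv4;     /* valid when af_family == AF_INET  */
            struct in6_addr ipv6;     /* valid when af_family == AF_INET6 */
        } addr;
    };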
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Fix case where the btl_tcp_links MCA parameter is used to create multiple TCP connections between peers.
Three issues were resulting in hangs during large message transfer:
* The 2nd..btl_tcp_link connections were dropped during establishment because the per-process
address check was binary, rather than a count
* The accept handler would not skip a btl module that was already in use, resulting in all
connections for a given address being vectored to a single btl
* Multiple addresses in the same subnet caused connections to be
stalled, as the receiver would always use the same (first) address
found. Binding the outgoing connection solves this issue (see the
sketch below)
* Lastly, fix a race condition created by connections being started at exactly the same time,
by accepting connections that are not in the closed state and allowing endpoint_accept to
resolve the dispute
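A hedged sketch of the bind-before-connect fix referenced above, using plain POSIX sockets (variable and function names are illustrative, not the actual btl_tcp code):

    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    static int connect_from(struct in_addr local_addr,
                            const struct sockaddr_in *peer)
    {
        int sd = socket(AF_INET, SOCK_STREAM, 0);
        if (sd < 0) return -1;

        struct sockaddr_in src;
        memset(&src, 0, sizeof(src));
        src.sin_family = AF_INET;
        src.sin_addr   = local_addr;   /* local interface chosen by the matching logic */
        src.sin_port   = 0;            /* any ephemeral source port */

        /* Without this bind() the kernel picks the source address from the
         * routing table, which can collapse multiple same-subnet links onto
         * the same (first) address on the receiver side. */
        if (bind(sd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
            connect(sd, (const struct sockaddr *)peer, sizeof(*peer)) < 0) {
            close(sd);
            return -1;
        }
        return sd;
    }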
Signed-off-by: Jordan Cherry <cherryj@amazon.com>
This commit has two changes:
1. Adding a magic string during the handshake can cause issues when
used with older versions of MPI. Hence, set the RCVTIMEO parameter to
2 seconds (see the sketch below).
2. Use a single call during the handshake instead of two calls.
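A minimal sketch of change 1, assuming the timeout is applied with setsockopt(SO_RCVTIMEO) on the handshake socket:

    #include <sys/socket.h>
    #include <sys/time.h>

    static int set_handshake_timeout(int sd)
    {
        /* a peer that never sends the expected handshake bytes (e.g. an
         * older MPI version) can then only stall the receive for 2 seconds */
        struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
        return setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    }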
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
Moving the non-blocking send/receive functions to btl_tcp
will help reuse these functions wherever needed.
In this case we plan to reuse the receive function to
retrieve the magic string and validate that an established
connection is from an MPI process.
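A hedged sketch of the kind of reusable receive helper being moved (the name and retry policy are illustrative):

    #include <errno.h>
    #include <stddef.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* Read until the requested number of bytes has arrived, tolerating
     * EINTR/EAGAIN on a non-blocking socket. */
    static ssize_t tcp_recv_exact(int sd, void *buf, size_t len)
    {
        size_t done = 0;
        while (done < len) {
            ssize_t n = recv(sd, (char *)buf + done, len - done, 0);
            if (n > 0) {
                done += (size_t)n;
            } else if (0 == n) {
                return (ssize_t)done;            /* peer closed the connection */
            } else if (EINTR != errno && EAGAIN != errno && EWOULDBLOCK != errno) {
                return -1;                       /* hard error */
            }
            /* on EAGAIN/EWOULDBLOCK the real code would wait for the socket
             * to become readable again (via the event loop) before retrying */
        }
        return (ssize_t)done;
    }

Such a helper can then be used to read the magic string and check that the peer is an MPI process before the connection is accepted.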
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
Since pthreads are now mandatory, the MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD
macro is always true and hence can be safely removed.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Added an MCA parameter to turn the progress thread on/off.
Added a flag to check whether we have a BTL progress thread.
Added a macro for the ob1 matching lock.
Updated the AUTHORS file.
All BTL-only operations (basically all data movements
with the exception of the matching operation) can now
be handled for the TCP BTL by a progress thread.
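A hypothetical sketch of how the on/off parameter can gate the thread; the structure and field names are assumptions, not the actual btl_tcp identifiers:

    #include <pthread.h>
    #include <stdbool.h>

    struct tcp_component_sketch {
        bool      progress_thread_enabled;   /* set from the MCA parameter */
        pthread_t progress_thread;
    };

    static void *progress_loop(void *arg)
    {
        (void)arg;   /* drive BTL-only operations (all data movement except matching) */
        return NULL;
    }

    static int maybe_start_progress_thread(struct tcp_component_sketch *c)
    {
        if (!c->progress_thread_enabled) {
            return 0;   /* everything stays on the main progress path */
        }
        return pthread_create(&c->progress_thread, NULL, progress_loop, c);
    }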
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.
This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.
Notes:
OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())
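A hedged before/after sketch of the migration; exact macro and function spellings should be checked against opal/class/opal_free_list.h:

    /* Before (OMPI classes, explicit *_MT macros):
     *     ompi_free_list_item_t *item;
     *     OMPI_FREE_LIST_GET_MT(&my_list, item);
     *     ...
     *     OMPI_FREE_LIST_RETURN_MT(&my_list, item);
     */
    #include "opal/class/opal_free_list.h"

    /* After: plain functions; thread safety follows opal_using_threads()
     * internally instead of a *_MT suffix. */
    static void use_free_list(opal_free_list_t *list)
    {
        opal_free_list_item_t *item = opal_free_list_get (list);
        if (NULL != item) {
            /* ... use the item ... */
            opal_free_list_return (list, item);
        }
    }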
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This fixes a problem that occurred while cleaning up after receiving
zero bytes on the connect socket (a locally started connection) while
another thread was trying to accept a new connection from the same
peer. Create a zero-timed event and delocalize the accept into a
timer_event.
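The zero-timed-event pattern, shown with the raw libevent API rather than the actual opal event wrappers: instead of handling the accept inline during cleanup, schedule a timer that fires immediately so the accept runs from the event loop.

    #include <sys/time.h>
    #include <event2/event.h>

    static void deferred_accept_cb(evutil_socket_t fd, short what, void *arg)
    {
        (void)fd; (void)what; (void)arg;
        /* perform the accept handling here, after the cleanup path has
         * unwound; freeing the event is omitted for brevity */
    }

    static void schedule_deferred_accept(struct event_base *base, void *ctx)
    {
        struct timeval zero = { 0, 0 };
        struct event *ev = evtimer_new(base, deferred_accept_cb, ctx);
        evtimer_add(ev, &zero);   /* fires on the next event-loop iteration */
    }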
Add support for registering an error callback that can be used when a
connection is discovered to have failed during the initialization
process.
The deadlock was quite subtle, and only happened on the process with
the smallest guid (as this process will tear down the connection
created locally and replace it with the result of accept). If multiple
threads are active in the system, the deadlock occurs during the recv
event deletion: one thread holds the recv event lock of the endpoint
while trying to acquire the TCP event base lock, while the other thread
holds the TCP event base lock while trying to acquire the recv event
lock (in case data is available on the socket).
The proposed solution lets the event callback fail to process the
data, preventing the deadlock and allowing the other thread to always
complete its job. As the event is not executed, the same trigger will
fire again at the next opportunity, so this solution introduces only a
minimal delay in the connection establishment.
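One way to implement "fail and let the event fire again" is a trylock on the contended lock; this is a hedged sketch of the pattern, not necessarily the exact btl_tcp code:

    #include <pthread.h>

    static int try_handle_recv(pthread_mutex_t *recv_lock,
                               void (*handle)(void *), void *ctx)
    {
        if (0 != pthread_mutex_trylock(recv_lock)) {
            return 0;    /* lock held by the other thread: skip processing;
                          * the event will trigger again on the next pass */
        }
        handle(ctx);     /* safe to process the data now */
        pthread_mutex_unlock(recv_lock);
        return 1;
    }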
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL
All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes.
UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.
This commit was SVN r32317.