openmpi

Автор	SHA1	Сообщение	Дата
Ralph Castain	a210f8046f	Cleanup ompi/dpm operations Do some code cleanup in the connect/accept code. Ensure that the OMPI layer has access to the PMIx identifier for the process. Add macros for converting PMIx names to/from strings. Cleanup a few of the simple test programs. Add a little more info to a btl/tcp error message. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-04-08 08:37:25 -07:00
Jeff Squyres	9687d5e867	Upgrade all www.open-mpi.org URLs to https Found a handful of other URLs that weren't https-ized, so I updated them, too (after verifying that they support https, of course). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-04-02 10:43:50 -04:00
Nathan Hjelm	160ff188b8	Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf configure: use -iquote for non-system include paths	2020-03-30 09:22:54 -07:00
Ralph Castain	95db66d0c8	Fix typo in usnic btl Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-03-27 20:27:45 -07:00
Howard Pritchard	f136a20cae	Merge pull request #6578 from hppritcha/topic/thread_framework2 Implement a MCA framework for threads	2020-03-27 15:55:48 -06:00
Noah Evans	ee3517427e	Add threads framework Add a framework to support different types of threading models including user space thread packages such as Qthreads and argobot: https://github.com/pmodels/argobots https://github.com/Qthreads/qthreads The default threading model is pthreads. Alternate thread models are specificed at configure time using the --with-threads=X option. The framework is static. The theading model to use is selected at Open MPI configure/build time. mca/threads: implement Argobots threading layer config: fix thread configury - Add double quotations - Change Argobot to Argobots config: implement Argobots check If the poll time is too long, MPI hangs. This quick fix just sets it to 0, but it is not good for the Pthreads version. Need to find a good way to abstract it. Note that even 1 (= 1 millisecond) causes disastrous performance degradation. rework threads MCA framework configury It now works more like the ompi/mca/rte configury, modulo some edge items that are special for threading package linking, etc. qthreads module some argobots cleanup Signed-off-by: Noah Evans <noah.evans@gmail.com> Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov> Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2020-03-27 10:15:45 -06:00
Ralph Castain	33ab928e1b	ompi_proc_t size reduction: part 1 We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we had to therefore store it somewhere proc-local. Obviously, this ccarried a memory penalty for storing all those strings, and so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs. Unfortunately, this still results in an 8-byte/proc memory cost as we have a char* pointer in the opal_proc_t that is contained in the ompi_proc_t so that we can store the hostname of the other procs if we fall below the cutoff. At scale, this can consume a fair amount of memory. With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication. All RM's are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving us 8-bytes/proc. Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd version of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and free'd wherever used. This shouldn't be a burden as the hostname is only used in one of two circumstances: (a) in an error message (b) in a verbose output for debugging purposes Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-03-23 12:49:44 -07:00
Austen Lauria	b560fc5fae	Merge pull request #7505 from hkuno/john.l.byrne/btl_ofi Fix btl ofi clean-up logic	2020-03-23 10:10:33 -04:00
Jeff Squyres	1870b04017	usnic: remove typo Remove an amusing -- but harmless -- typo. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-03-20 11:16:52 -07:00
Ralph Castain	6b4fb509e9	Cleanup singleton detection and data retrieval Extend the PMIx modex recv macros to cover the full set of immediate/optional combinations. If PMIx_Init cannot reach a server, then declare the MPI proc to be a singleton. Provide full support for info values via PMIx Catch all the values used in the "info" area of OMPI using data available from PMIx instead of via envars. Update PMIx and PRRTE to sync with their capabilities. PMIx - ensure cleanup of fork/exec children - fix bug in gds/hash that left app info off of list PRRTE - fix multi-app bugs - port setup_child logic from orte - OMPI env changes - set app->first_rank - ensure common hostname across prun, prte, and pmix - Fix "nolocal" support Silence a warning from btl/vader Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-03-16 12:25:28 -07:00
Harumi Kuno	ab4875ddc2	set ep to NULL to avoid double close Per suggestion of @awlauria Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>	2020-03-10 17:39:59 -06:00
Gilles Gouaillardet	69bc2e8372	misc: fix <> vs "" includes throught the ompi codebase This commit fixes an issue with the include usage in some ompi source files. These source files are using the <> form of include when the "" form is correct (as these are internal, not system headers). Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Nathan Hjelm <hjelmn@google.com>	2020-03-09 21:13:49 -04:00
Harumi Kuno	1bc3dab118	Add comments about order of close ops Per suggestion of @awlauria, added some comments about the need to free ep before resources it points to. Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>	2020-03-07 14:08:39 -07:00
harumi kuno	3095fabf94	Fix mca_btl_ofi_finalize clean-up logic This fix is from John L. Byrne (john.l.byrne@hpe.com). When OFI Libfabric binds objects to endpoints, before the object can be successfully closed, the endpoint must first be freed. For scalable endpoints, objects can also be bound to transmit and receive contexts, and for objects that are bound to contexts, we need to first free the contexts before freeing the endpoint. We also need to clear the memory registration cache. If we don't clean up properly, then fi\_close may not be able to close the domain because the dom will have a non-zero ref count. Signed-off-by: harumi kuno <harumi.kuno@hpe.com>	2020-03-04 17:51:08 -07:00
Austen Lauria	f69c8d6819	Fix segv in btl/vader. Keep track of the connected procs in vader_add_procs(). Otherwise, the same rank will reconnect the same shmem segment (rank 0+...) multiple times instead of the next one as intended. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2020-03-04 09:32:58 -05:00
bosilca	c4d36859ec	Merge pull request #7228 from devreal/progress-returns Harmonize return values of progress callbacks	2020-02-28 20:15:37 -05:00
Ralph Castain	9e2db26732	Fix vader local modex Restrict the search to the "immediate" range so at worst we check with our local server and don't go up to the host daemon. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-02-27 18:15:34 -08:00
Jeff Squyres	cdf478e963	btl/sm: remove the deprecation-notice shell The SM BTL was effectively removed a long time ago. All that was left was a shell that warned people if they tried to use the SM BTL. For v5.0, we plan to finally remove this ancient shell (and possibly replace it with vader). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-02-25 11:48:42 -05:00
Gilles Gouaillardet	174e967dbc	Remove ORTE project Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build without reference to ORTE. Setup opal/pmix framework to be static. Remove support for all PMI-1 and PMI-2 libraries. Add support for "external" pmix component as well as internal v4 one. remove orte: misc fixes - UCX fixes - VPATH issue - oshmem fixes - remove useless definition - Add PRRTE submodule - Get autogen.pl to traverse PRRTE submodule - Remove stale orcm reference - Configure embedded PRRTE - Correctly pass the prefix to PRRTE - Correctly set the OMPI_WANT_PRRTE am_conditional - Move prrte configuration to the end of OMPI's configure.ac - Make mpirun a symlink to prun, when available - Fix makedist with --no-orte/--no-prrte option - Add a `--no-prrte` option which is the same as the legacy `--no-orte` option. - Remove embedded PMIx tarball. Replace it with new submodule pointing to OpenPMIx master repo's master branch - Some cleanup in PRRTE integration and add config summary entry - Correctly set the hostname - Fix locality - Fix singleton operations - Fix support for "tune" and "am" options Signed-off-by: Ralph Castain <rhc@pmix.org> Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2020-02-07 18:20:06 -08:00
Austen Lauria	824dbcbcf3	Protect use of _Static_assert(). Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2020-02-04 13:46:58 -05:00
Joseph Schuchart	2c97187ee0	Harmonize return values of progress callbacks Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-01-28 20:15:03 +01:00
Aurelien Bouteiller	9f4365fef6	Merge pull request #6783 from abouteiller/export/macos-epipe Prevent EPIPE on OSX.	2020-01-28 11:18:46 -05:00
Brian Barrett	fc8c7a5869	Merge pull request #7134 from wckzhang/btl_tcp_interface_match btl tcp: Use reachability and graph solving for global interface matching	2020-01-27 15:38:49 -08:00
Aurelien Bouteiller	76021e35ee	Adding a description of the FIN message for future reference. Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-01-27 13:32:34 -05:00
Aurelien Bouteiller	93846fd0ee	Remove the pending event when socket is TCP_FAILED Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-01-27 13:32:34 -05:00
Aurelien Bouteiller	6b3be224d4	Adding a FIN message to differentiate normal TCP closing from failures Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-01-27 13:32:34 -05:00
Aurelien Bouteiller	b7be64482a	Revert "Revert "Handle error cases in TCP BTL"" This reverts commit `5162011428`. Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-01-27 13:31:06 -05:00
William Zhang	e958f3cf22	btl tcp: Use reachability and graph solving for global interface matching Previously we used a fairly simple algorithm in mca_btl_tcp_proc_insert() to pair local and remote modules. This was a point in time solution rather than a global optimization problem (where global means all modules between two peers). The selection logic would often fail due to pairing interfaces that are not routable for traffic. The complexity of the selection logic was Θ(n^n), which was expensive. Due to poor scalability, this logic was only used when the number of interfaces was less than MAX_PERMUTATION_INTERFACES (default 8). More details can be found in this ticket: https://svn.open-mpi.org/trac/ompi/ticket/2031 (The complexity estimates in the ticket do not match what I calculated from the function) As a fallback, when interfaces surpassed this threshold, a brute force O(n^2) double for loop was used to match interfaces. This commit solves two problems. First, the point-in-time solution is turned into a global optimization solution. Second, the reachability framework was used to create a more realistic reachability map. We switched from using IP/netmask to using the reachability framework, which supports route lookup. This will help many corner cases as well as utilize any future development of the reachability framework. The solution implemented in this commit has a complexity mainly derived from the bipartite assignment solver. If the local and remote peer both have the same number of interfaces (n), the complexity of matching will be O(n^5). With the decrease in complexity to O(n^5), I calculated and tested that initialization costs would be 5000 microseconds with 30 interfaces per node (Likely close to the maximum realistic number of interfaces we will encounter). For additional datapoints, data up to 300 (a very unrealistic number) of interfaces was simulated. Up until 150 interfaces, the matching costs will be less than 1 second, climbing to 10 seconds with 300 interfaces. Reflecting on these results, I removed the suboptimal O(n^2) fallback logic, as it no longer seems necessary. Data was gathered comparing the scaling of initialization costs with ranks. For low number of interfaces, the impact of initialization is negligible. At an interface count of 7-8, the new code has slightly faster initialization costs. At an interface count of 15, the new code has slower initialization costs. However, all initialization costs scale linearly with the number of ranks. In order to use the reachable function, we populate local and remote lists of interfaces. We then convert the interface matching problem into a graph problem. We create a bipartite graph with the local and remote interfaces as vertices and use negative reachability weights as costs. Using the bipartite assignment solver, we generate the matches for the graph. To ensure that both the local and remote process have the same output, we ensure we mirror their respective inputs for the graphs. Finally, we store the endpoint matches that we created earlier in a hash table. This is stored with the btl_index as the key and a struct mca_btl_tcp_addr_t* as the value. This is then retrieved during insertion time to set the endpoint address. Signed-off-by: William Zhang <wilzhang@amazon.com>	2020-01-21 18:24:08 +00:00
Nathan Hjelm	037b0bd9ee	Merge pull request #7304 from hjelmn/btl_vader_fix_max_address_on_aarch64 btl/vader: modify how the max attachment address is determined	2020-01-15 17:02:33 -07:00
Nathan Hjelm	728d51f9f3	btl/vader: modify how the max attachment address is determined This PR removes the constant defining the max attachment address and replaces it with the largest address that shows up in /proc/self/maps. This should address issues found on AARCH64 where the max address may differ based on the configuration. Since the calculated max address may differ between processes the max address is sent as part of the modex and stored in the endpoint data. Signed-off-by: Nathan Hjelm <hjelmn@google.com>	2020-01-14 15:15:36 -08:00
Nathan Hjelm	61f96b3d6d	Merge pull request #7283 from hjelmn/fix_issues_in_both_vader_and_opal_interval_tree_t_that_were_causing_issue_6524 Fix issues in both vader and opal interval tree t that were causing issue 6524	2020-01-14 14:54:51 -07:00
Jeff Squyres	21bc9042e1	mtl/ofi: check for FI_LOCAL_COMM+FI_REMOTE_COMM Make sure to get an RDM provider that can provide both local and remote communication. We need this check because some providers could be selected via RXD or RXM, but can't provide local communication, for example. Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric we're building against is >= a target version. Use this check in two places: 1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM / FI_REMOTE_COMM constants were introduced in Libfabric API v1.5. 2. BTL/usnic: It already had similar configury to check for Libfabric >= v1.1, but the usnic component was checking for >= v1.3. So update the btl/usnic configury to use the new macro and check for >= v1.3. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-01-13 08:19:53 -08:00
Nathan Hjelm	f86f805be1	btl/vader: fix issues with xpmem registration invalidation This commit fixes an issue discovered in the XPMEM registration cache. It was possible for a registration to be invalidated by multiple threads leading to a double-free situation or re-use of an invalidated registration. This commit fixes the issue by setting the INVALID flag on a registation when it will be deleted. The flag is set while iterating over the tree to take advantage of the fact that a registration can not be removed from the VMA tree by a thread while another thread is traversing the VMA tree. References #6524 References #7030 Closes #6534 Signed-off-by: Nathan Hjelm <hjelmn@google.com>	2020-01-07 22:50:52 -07:00
Todd Kordenbrock	1af6dbe277	Merge pull request #7066 from tkordenbrock/topic/master/portals4.fix.flowcontrol.bugs portals4: fix flow control bugs	2019-12-11 06:31:26 -06:00
Nathan Hjelm	09dd383f8b	Merge pull request #7108 from devreal/btl-ugni-deadlock uGNI: Fix potential deadlock when processing outstanding transfers	2019-11-11 10:56:56 -08:00
Howard Pritchard	9d345d9aa0	btl/uct: add UCT API version check to configury related to #7128 The UCX crew is no longer guaranteeing that the UCT API is going to be frozen, so this is kind of a whack-a-mole problem trying to keep the BTL UCT working with various changing UCT APIs. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2019-11-06 14:27:58 -07:00
Nathan Hjelm	a3026c016a	btl/uct: fix compilation for UCX 1.7.0 Ref #7128 Signed-off-by: Nathan Hjelm <hjelmn@google.com>	2019-11-05 12:53:26 -08:00
George Bosilca	476562752f	Correctly report TCP connect errors. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-10-31 18:33:15 -04:00
Joseph Schuchart	c09ca039b4	uGNI: Fix potential deadlock when processing outstanding transfers Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2019-10-26 12:21:17 +02:00
Nathan Hjelm	b1ef5a40fa	Merge pull request #7016 from hjelmn/fix_btl_uct_from_yet_another_unannounced_api_break_in_the_openucx_uct_layer btl/uct: add support for OpenUCX v1.8 API changes	2019-10-17 06:27:18 -07:00
Jeff Squyres	b6c4d5c118	Merge pull request #7060 from jsquyres/pr/usnic-mca-updates BTL usnic MCA updates	2019-10-15 10:48:10 -04:00
Stanislav Kirillov	0e0763e006	fix ipv6 btl connection bug Signed-off-by: Stanislav Kirillov <staskirillof@yandex.ru>	2019-10-10 11:20:37 +00:00
Todd Kordenbrock	f7e74b6a3d	btl-portals4: fix a flow control configure bug This commit fixes a configure bug that caused flow control to be disabled regardless of the configure options used. Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>	2019-10-09 17:12:56 -05:00
Geoff Paulsen	4e1e6f8972	Merge pull request #6993 from awlauria/fix_warnings_master Fix miscellaneous compiler warnings.	2019-10-09 09:17:02 -05:00
Jeff Squyres	3080033a8c	btl/usnic: set retrans_timeout back down to 5ms Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-10-08 11:17:54 -07:00
Jeff Squyres	132e4cab3b	btl/usnic: set ack_iteration_delay default to 4 It was previously accidentally set to 0. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-10-08 11:17:30 -07:00
Jeff Squyres	fe7f772f21	btl/usnic: properly size freelist items Move the prefix area from the head to the body in relevant size computations. This fixes a problem in high traffic situations where usNIC may have sent from unregistered memory. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-10-04 14:40:56 -07:00
Jeff Squyres	27e3040dfe	btl/usnic: cap the number of resends per progress iteration New MCA param: btl_usnic_max_resends_per_iteration. This is the max number of resends we'll do in a single pass through usNIC component progress. This prevents progress from getting stuck in an endless loop of retransmissions (i.e., if more retransmissions are triggered during the sending of retransmissions). Specifically: we need to leave the resend loop to allow receives to happen (which may ACK messages we have sent previously, and therefore cause pending resends to be moot). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-10-04 13:05:51 -07:00
Jeff Squyres	3cc95d86b2	btl/usnic: increase default retrans_timeout Significantly increase the default retrans timeout. If the retrans timeout is too soon, we can end up in a retransmission storm where the logic will continually re-transmit the same frames during a single run through the usNIC progress function (because the timer for a single frame expires before we have run through re-transmitting all the frames pending re-transmission). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-10-04 13:05:51 -07:00
Jeff Squyres	968b1a51b5	btl/usnic: clarifications and fixes regarding ACKs New MCA parameter: btl_usnic_ack_iteration_delay. Set this to the number of times through the usNIC component progress function before sending a standalone ACK (vs. piggy-backing the ACK on any other send going to the target peer). Use "ticks" language to clarify that we're really counting the number of times through the usNIC component DATA_CHANNEL completion check (to check for incoming messages) -- it has no relation to wall clock time whatsoever. Also slightly change the channel-checking scheme in usNIC component progress: only check the PRIORITY channel once (vs. checking it once, not finding anything, and then falling through the progress_2() where we check PRIORITY again and then check the DATA channel). As before, if our "progress" libevent fires, increment the tick counter enough to guarantee that all endpoints that need an ACK will get triggered to send standalone ACKs the next time through progress, if necessary. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-10-04 13:05:51 -07:00

1 2 3 4 5 ...

961 Коммитов