openmpi

Author	SHA1	Message	Date
Aurelien Bouteiller	bec7dfc1b1	Errors in non-api calls remain fatal Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-31 17:49:35 -04:00
Aurelien Bouteiller	e0df0f4bd9	Make errors_mpi3 compat a global mpi-3 compatibility flag Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-31 17:48:47 -04:00
Aurelien Bouteiller	7dfe6c1adc	Thread-shift errors reported by PMIx to the main MPI progress engine Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> make things happen before the terminal call Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-31 17:48:44 -04:00
William Zhang	9b8f463a76	btl/ofi: Use common provider include/exclude list The btl/ofi does not currently utilize the common ofi include/exclude list. Added verification code similar to the mtl/ofi that will check if the info object is in the include or exclude list. If it isn't in the include list or is in the exclude list, validate_info will return OPAL_ERROR. The btl/ofi will no longer pass a provider name as a hint when calling getinfo, instead filtering the provider during validate_info. This patch also moves the is_in_list MTL function into common code and adds additional debugging output to the BTL to match the MTL standard. Signed-off-by: William Zhang <wilzhang@amazon.com>	2020-07-31 12:13:00 -07:00
Artem Polyakov	dfb0ae748f	Merge pull request #7681 from janjust/master-tls-refactor_v3 ompi/osc/ucx: remove global TLS tables	2020-07-31 10:39:53 -07:00
William Zhang	a7dcfd9874	btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0 EFA incorrectly implements FI_DELIVERY_COMPLETE in earlier libfabric versions. While FI_DELIVERY_COMPLETE would be advertised by the provider, completions would return too early by not accounting for bounce buffers on the receive side. This would cause the BTL to receive early completions that lead to correctness issues. This is not an issue in the mtl/ofi as it does not require FI_DELIVERY_COMPLETE. Signed-off-by: William Zhang <wilzhang@amazon.com>	2020-07-30 13:53:16 -07:00
Aurelien Bouteiller	8e0cb1d49d	des->tag = hdr->frag, should be hdr->tag Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-30 14:02:22 -04:00
Tommy Janjusic	2c8da2c0a9	Further code reduction and simplifications. Co-authored-by: Artem Polyakov <artpol84@gmail.com> Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>	2020-07-30 20:00:22 +03:00
Tomislav Janjusic	cbfc9a3263	opal/mca/common/ucx: Use new TSD api Co-authored-by: Artem Y. Polyakov <artemp@mellanox.com> Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>	2020-07-30 00:21:26 +03:00
Tomislav Janjusic	72296e12f4	opal/common/ucx: -mutex lock/unlock suggestions -common destructor/cleanup Co-authored-with: Artem Y. Polyakov <artemp@mellanox.com> Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>	2020-07-30 00:21:26 +03:00
Tomislav Janjusic	27ba4b612f	ompi/osc/ucx: Remove workerpool's global thread storage tables. Co-authored-by: Artem Y. Polyakov <artemp@mellanox.com> Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>	2020-07-30 00:21:26 +03:00
Brian Barrett	41df122083	Merge pull request #7730 from wckzhang/newdefaults coll/tuned: Change the default collective algorithm selection	2020-07-28 15:27:46 -07:00
William Zhang	ce40cfbaa5	coll/tuned: Change the default collective algorithm selection The default algorithm selections were out of date and not performing well. After gathering data from OMPI developers, new default algorithm decisions were selected for: allgather allgatherv allreduce alltoall alltoallv barrier bcast gather reduce reduce_scatter_block reduce_scatter scatter These results were gathered using the ompi-collectives-tuning package and then averaged amongst the results gathered from multiple OMPI developers on their clusters. You can access the graphs and averaged data here: https://drive.google.com/drive/folders/1MV5E9gN-5tootoWoh62aoXmN0jiWiqh3 Signed-off-by: William Zhang <wilzhang@amazon.com>	2020-07-28 10:41:48 -07:00
Austen Lauria	d0152eb51e	Merge pull request #7940 from awlauria/revert_libevent_commit Revert "Address a race condition in libevent select."	2020-07-28 11:34:59 -04:00
Jeff Squyres	c07d77fbf2	Merge pull request #7957 from bosilca/fix/avx_alignment Use the unaligned SSE memory access primitive.	2020-07-27 15:50:40 -04:00
Artem Polyakov	e5ef80fe8c	Merge pull request #7936 from janjust/master-new-tsd-thread-api Master: new thread-specific-data (tsd) api	2020-07-24 14:58:03 -07:00
Ralph Castain	863a058f8d	Merge pull request #7964 from rhc54/topic/sync Sync to PRRTE master	2020-07-24 14:57:32 -07:00
Ralph Castain	8c0269cd4f	Sync to PRRTE master Pickup the FT and libev cleanups Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-07-24 14:11:34 -07:00
Tomislav Janjusic	d809f6ba27	New TSD API interface fix for various components Co-authored by: Artem Polyakov <artemp@mellanox.com> Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>	2020-07-24 18:29:40 +03:00
Tomislav Janjusic	cba5a0e117	Rename tsd interface function calls Co-authored by: Artem Polyakov <artemp@mellanox.com> Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>	2020-07-24 18:29:07 +03:00
Tomislav Janjusic	cb1955bb53	Fix renamed interface functions for argo, q, and pthreads Co-authored by: Artem Polyakov <artemp@mellanox.com> Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>	2020-07-24 18:29:07 +03:00
Tomislav Janjusic	07dc86eb3a	opal/thread: New TSD API Co-authored-by: Artem Polyakov <artemp@mellanox.com> Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>	2020-07-24 18:29:07 +03:00
Ralph Castain	06c585c316	Merge pull request #7962 from rhc54/topic/sync Sync to PMIx and PRRTE master	2020-07-23 16:22:32 -07:00
Ralph Castain	c0bc89dc50	Sync to PMIx and PRRTE master Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-07-23 12:35:17 -07:00
Aurelien Bouteiller	06c563625a	Add a test for mpi_errors_mpi3 behavior and non-catastrophic errors Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-23 05:09:29 -04:00
Aurélien Bouteiller	b37202c74e	Add compliance mode with MPI-4 routing of errors to MPI_COMM_SELF by default And other streamlining of aborting behavior. Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu> Remove OMPI_COMM_ERRORS and use NOHANDLE macros instead. Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> route unbound errors to self error handler Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> Do not raise the error handler from within components Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-23 05:09:29 -04:00
George Bosilca	c4e88a43a3	Check unaligned ops for correctness. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-07-22 11:26:07 -04:00
Joshua Ladd	366e92ce54	Merge pull request #7860 from vspetrov/hcoll_reduce_scatter Coll/Hcoll: reduce_scatter(block) interface	2020-07-22 09:45:34 -04:00
George Bosilca	b6d71aa893	Use the unaligned SSE memory access primitive. Alter the test to validate misaligned data. Fixes #7954. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-07-22 01:19:12 -04:00
Jeff Squyres	30ba603c2c	Merge pull request #7953 from cniethammer/configure-leak-fix Fix memory leak in configure, which prevents leak sanitizer usage	2020-07-21 16:34:27 -04:00
Christoph Niethammer	6564c1b942	Fix memory leak in configure, which prevents leak sanitizer usage If building Open MPI with sanitizers, e.g. $ configure CC=clang CFLAGS=-fsanitize=address .... configure test programs are also built with the sanitizers and will report errors, causing configure to fail. Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>	2020-07-21 21:28:29 +02:00
Aurelien Bouteiller	816acbdfb1	Merge pull request #7840 from abouteiller/mpi-next/init-errh MPI-4: Initial error handler	2020-07-21 11:55:14 -04:00
Joseph Schuchart	60aa97b301	Merge pull request #7948 from devreal/osc-rdma-check-endpoints osc/rdma: fail query_btls if no endpoint for non-local peer is found	2020-07-20 15:14:25 +02:00
bosilca	1139d9ecae	Merge pull request #7931 from bosilca/fix/7928 Fix the BTL API conversion for the SMCUDA BTL	2020-07-18 17:35:39 -04:00
Joseph Schuchart	eebc451ec8	osc/rdma: fail query_btls if no endpoint for non-local peer is found Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-07-16 17:06:35 +02:00
Aurelien Bouteiller	7118755ae8	Add a tester for the initial error handler Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-16 03:10:32 -04:00
Aurelien Bouteiller	5f1f7fe313	route errors to self/initial error handler depending upon the state of MPI initialization Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-16 03:10:32 -04:00
Aurélien Bouteiller	bed909c3ba	Read the info key mpi_initial_errhandler from spawn/spawn_multiple Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu> Use the same env to transmit the initial error handler to spawnees Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-16 03:10:32 -04:00
Aurélien Bouteiller	83d0f92152	Set the initial error handler onto predefined communicators Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu> update to the predefined initial error handler selection Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-16 03:10:32 -04:00
Aurélien Bouteiller	3cd85a9ec5	Add the initial_errhandler info key to MPI_INFO_ENV and populate the value from prun populated parameters Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu> Allow errhandlers to invoke the initial error handler before MPI_INIT Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> Indentation Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-16 03:10:32 -04:00
Aurélien Bouteiller	703b8c356f	Make error_class and error_string callable before/after MPI_INIT/FINALIZE Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu> make lazy initialization opal unlikely Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2020-07-16 03:10:32 -04:00
Ralph Castain	7702dfcdd2	Merge pull request #7942 from rhc54/topic/init Ensure we init and protect values	2020-07-15 07:59:01 -07:00
George Bosilca	8bc1f3d8fb	Don't allow any asynchronous CUDA operations. There are 2 reasons for this: - pending CUDA events are not progressed by this BTL, so anything that becomes asynchronous will never be completed. - we use the packed data on the shared memory backing file, and this will be returned to the peer process upon return (thus if we copy asynchronously we might not copy the right data). Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-07-15 01:37:09 -04:00
George Bosilca	0e32b0acef	Avoid a lock if no CUDA IPC operations are pending. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-07-15 01:35:34 -04:00
Ralph Castain	a574addce9	Ensure we init and protect values Scrub the entire ompi_rte.c file to initialize and protect values received from PMIx. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-07-14 15:25:14 -07:00
Austen Lauria	67d90166cf	Revert "Address a race condition in libevent select." We do not want to be patching upstream components anymore. The proper method is to get this merged upstream, then pull it in the next upstream release. This reverts commit c39fb5758a772c062e20db9b42f2b06805884802. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2020-07-14 16:23:21 -04:00
George Bosilca	fd4ca394e2	Make the smcuda BTL great again. It has been broken for months because of the lack of initialization of the HWLOC library. The smcuda process creating the backing file (local rank 0) uses opal_cache_line_size to align the objects in the backing file, and the opal_cache_line_size is initialized by default to 128. Later on, when the rest of the processes attach the same backing file, HWLOC has been called and the cache size has now been updated to the correct value. If this value is different from the default one (and it usually is, as most cache line sizes are 64 bytes right now), the objects in the backing file will be misaligned. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-07-14 01:48:08 -04:00
George Bosilca	96e8cbe25f	First step on fixing the BTL API conversion for the SMCUDA BTL Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-07-13 14:46:10 -04:00
Joshua Ladd	aa8f7f4ede	Merge pull request #7893 from bureddy/cuda-ucx UCX: initialize cuda from ucx pml component	2020-07-13 14:18:48 -04:00
bosilca	1f237f5fc9	Merge pull request #7419 from bosilca/topic/avx512 Add support for AVX512/AVX2/SSE/MMX	2020-07-13 11:56:50 -04:00

31074 commits