openmpi

Автор	SHA1	Сообщение	Дата
Michael Heinz	e21c31f54c	Merge pull request #7722 from mwheinz/mwheinz-7721 Add check for PSM2 reference counting to PSM2 MTL #7721	2020-05-19 08:06:41 -04:00
Michael Heinz	f10305a49f	Add check for PSM2 reference counting to PSM2 MTL #7721 As discussed, a feature is being added to libpsm2 to correctly handle the case where the library is opened by multiple OMPI transports in the same process. (For example, the OFI BTL and the PSM2 MTL). * Improved error message to indicate required libpsm2 version. * Adds a test at autogen/configure time for the existence of PSM2_LIB_REFCOUNT_CAP. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>	2020-05-18 15:25:22 -04:00
Ralph Castain	4468691eeb	Sync up with PRRTE and cleanup stale code Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-05-16 14:48:31 -07:00
Howard Pritchard	f744668f5f	Merge pull request #7646 from hppritcha/topic/ofi_common_wl add a common ofi whitelist/blacklist	2020-05-13 06:44:05 -06:00
Michael Heinz	4a5622a436	Merge pull request #7713 from mwheinz/master-7699 PSM2: Call add_procs through PML	2020-05-13 07:59:43 -04:00
Michael Heinz	548060e43f	PSM2: Call add_procs through PML Change ompi_mtl_ofi_get_endpoint() to call the active PML's add_procs() rather than the OFI MTL add_procs() directly when discovering a new process during operation. Functionally, this has no impact in correct operation. However, the current behavior means that the heterogenous and active PML checks are not being executed in the dynamic discovery case. Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>	2020-05-12 12:35:39 -04:00
Howard Pritchard	9f1081a07a	add a common ofi whitelist/blacklist also add common verbose variable. Note the verbosity thing is a little tricky owing to the way the MCA frameworks and components are registered and and initialized. The BTL's are registered/initialized prior to the MTL components even getting registered. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2020-05-09 14:50:31 -06:00
Michael Heinz	dbbdb8f2e2	Merge pull request #7621 from jsquyres/pr/remove-osc-pt2pt Remove OSC pt2pt component	2020-05-08 12:43:57 -04:00
Brian Barrett	0dc2325297	Merge pull request #7641 from dancejic/multi-NIC Added multi-NIC support to provider selection	2020-05-07 15:24:41 -07:00
Jeff Squyres	9afe58643e	Merge pull request #7600 from jsquyres/pr/mpit-general-docs MPI_T general docs	2020-05-07 10:11:40 -04:00
Ralph Castain	42b3541242	Update mtl_psm2.c Track change in PMIx Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-05-06 17:50:45 -07:00
Michael Heinz	c55c9e67f4	PSM2 update to use PRRTE instead of ORTE Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>	2020-05-06 16:16:27 -04:00
Jeff Squyres	70993e1670	Move "MPI" and "OpenMPI" man pages to section 5 Make the main man page be Open-MPI(5), and set nroff-native aliases for MPI(5) and OpenMPI(5). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-05-02 12:45:32 -07:00
Jeff Squyres	7ace873b50	Add MPI_T.5 man page for Open MPI-specific info Also added infrastructure to have developers write man pages in Markdown (vs. nroff). Pandoc >=v1.12 is used to convert those Markdown files into actual nroff man pages. Dist tarballs will contain generated nroff man pages; we don't want to require users to have Pandoc installed. Anyone who builds Open MPI from a git clone will need to have Pandoc installed (similar to how we treat Flex). You can opt out of Open MPI's Pandoc-generated man pages by configuring Open MPI with --disable-man-pages. This will also disable "make dist" (i.e., "make dist" will error if you configured with --disable-man-pages). Also removed the stuff to re-generate man pages. This commit also: 1. Includes a new man page, written in Markdown (ompi/mpi/man/man5/MPI_T.5.md) that contains Open MPI-specific information about MPI_T. 2. Includes a converted ompi/mpi/man/man3/MPI_T_init_thread.3.md (from MPI_T_init_thread.3in -- i.e., nroff) just to show that Markdown can be used throughout the Open MPI code base for man pages. 3. Made the Makefiles in ompi/mpi/man/man?/ be full-fledged Makefile.am's (vs. Makefile.extras that are designed to be included in ompi/Makefile.am). It is more convenient to test generation / installation of man pages when you can "make" and "make install" in their respective directories (vs. doing a build / install for the entire ompi project). 4. Removed logic from ompi/Makefile.am that re-generated man pages if opal_config.h changes. Other man pages -- hopefully all of them! -- will be converted to Markdown over time. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-05-02 12:45:31 -07:00
Yossi Itigin	b61bf9a00a	Merge pull request #7349 from hoopoepg/topic/ucx-new-api-nbx OPAL/UCX: enabling new API provided by UCX	2020-05-02 14:30:44 +03:00
Ralph Castain	f608575eec	Remove references to numa_rank Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-05-01 13:32:29 -07:00
Ralph Castain	86709b1c80	Fix PMIx_Fence call signature Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-05-01 12:27:42 -07:00
Sergey Oblomov	75bda25ddb	OPAL/UCX: enabling new API provided by UCX - added detection of new API into configuration - added tag_send call implemented using new API - added MPI_Send/MPI_Isend/MPI_Recv/MPI_Irecv implementations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2020-05-01 17:58:29 +03:00
Nikola Dancejic	167d75b42a	common/ofi: Added multi-NIC support to provider selection Adds the capability to select a NIC based on hardware locality. Creates a list of NICs that share the same cpuset as the process, then selects the NIC based on the (local rank) % (number of NICs). If no NICs are available that share the same cpuset, the selection process will create a list of all available NICs and make a selection based on (local rank) % (number of NICs) Signed-off-by: Nikola Dancejic <dancejic@amazon.com>	2020-05-01 01:05:13 +00:00
Ralph Castain	bd29ab0ae9	Update dpm to handle deprecation of MPI_Info keys Deprecate the current OMPI-specific MPI_Info key definitions for MPI_Comm_spawn and replace them with their PMIx equivalents. Issue a deprecation/conversion warning as this is done. Also issue deprecation warnings for options such as "ompi_non_mpi" that are no longer used. Handle both cases where the user might pass either the PMIx attribute name itself (e.g., "PMIX_MAPBY") or the string value of the attribute (e.g., PMIX_MAPBY, which translates to "pmix.mapby"). This can only be done for PMIx v4 and above, so protect that code. Silence a couple of Coverity warnings and add a test along the way. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-04-29 14:56:38 -07:00
Brian Barrett	4f03f44ced	Merge pull request #7582 from dipti-kothari/pml_check mca/pml: PML check for direct modex	2020-04-27 12:29:11 -07:00
Austen Lauria	2e22a247bb	Merge pull request #7650 from devreal/fix-7617-oscpt2pt-leak PT2PT osc: don't extra retain datatype	2020-04-24 08:55:28 -04:00
Austen Lauria	9f2f98e3ec	Merge pull request #7651 from devreal/fix-7617-oscrdma-complete_atomic RDMA osc: remove extra retain on pending_op	2020-04-24 08:55:08 -04:00
Ralph Castain	91be01beb2	Merge pull request #7652 from rhc54/topic/het Cleanup heterogeneous builds	2020-04-22 16:20:06 -07:00
Ralph Castain	6d29bbfde8	Cleanup heterogeneous builds Consolidate the ompi_process_info and opal_process_info structs to remove duplicate storage and conversion issues. Unwind some interweaving of include files using opal.h. Silence a couple of warnings. For now, set the arch to local if PMIX_ARCH is not found. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-04-22 12:46:27 -07:00
Dipti Kothari	5418cc56dd	mca/pml: PML check for direct modex For direct modex, all procs publish the selected pml module and then at add_procs pml module for each proc is checked against every other proc in the add_proc call. For full modex, there is no change in functionality. Only Rank0 publishes its selected pml, all other procs in the add_proc call check their selected pml against Rank0. If pml's do not match, throw error and exit. Signed-off-by: Dipti Kothari <dkothar@amazon.com>	2020-04-22 16:25:01 +00:00
Joseph Schuchart	de67ada442	RDMA osc: remove extra retain on pending_op Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-04-21 22:49:48 +02:00
Joseph Schuchart	07d1011afe	OSC base: fix typos in documentation Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-04-21 21:53:36 +02:00
Joseph Schuchart	154cf571b6	OSC base: do not retain datatype by default Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-04-21 21:53:10 +02:00
William Zhang	771f9c011d	coll/tuned: Add NULL check to prevent segfault Signed-off-by: William Zhang <wilzhang@amazon.com> cr https://code.amazon.com/reviews/CR-23837553	2020-04-21 17:53:46 +00:00
William Zhang	50640402ab	coll/tuned: Fix typos Signed-off-by: William Zhang <wilzhang@amazon.com>	2020-04-21 17:39:37 +00:00
Nikola Dancejic	3637443454	adding NUMA_RANK to process metadata adding PMIX_NUMA_RANK info to process metadata so that the local NUMA rank can be accessed through the opal_process_info object. Signed-off-by: Nikola Dancejic <dancejic@amazon.com>	2020-04-20 22:02:47 +00:00
Ralph Castain	6635795911	Fix intercomm operations The locality for remote procs is not provided as it is only a local concept. Thus, you must _always_ use modex_recv_optional to ensure you don't hang waiting for a response until dmodex times out. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-04-15 17:04:22 -07:00
Ralph Castain	de2d69ca24	Fix hetero builds Add missing variable declaration Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-04-15 08:46:21 -07:00
Jeff Squyres	8999bae25e	Remove OSC pt2pt component Per https://github.com/open-mpi/ompi/wiki/5.0.x-FeatureList, remove the OSC pt2pt component. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-04-13 12:29:54 -07:00
Jeff Squyres	b2e0957d6f	Merge pull request #7610 from bosilca/topic/fix_MPI_T Follow the MPI_T guidelines on return errors.	2020-04-12 14:12:32 -04:00
Ralph Castain	a210f8046f	Cleanup ompi/dpm operations Do some code cleanup in the connect/accept code. Ensure that the OMPI layer has access to the PMIx identifier for the process. Add macros for converting PMIx names to/from strings. Cleanup a few of the simple test programs. Add a little more info to a btl/tcp error message. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-04-08 08:37:25 -07:00
George Bosilca	f4af1848c9	Follow the MPI_T guidelines on return errors. As indicated in the MPI3.2 document 14.3.10 page 599 line 1, the only MPI error code possible is MPI_SUCCESS. All other errors must be in the error class MPI_T_ERR*. Fix the return of few pvar/cvar function that failed to correctly convert to an MPI error code. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-04-08 00:02:45 -04:00
Jeff Squyres	9687d5e867	Upgrade all www.open-mpi.org URLs to https Found a handful of other URLs that weren't https-ized, so I updated them, too (after verifying that they support https, of course). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-04-02 10:43:50 -04:00
Yossi Itigin	5dcd1f4e6c	Merge pull request #7575 from yosefe/topic/pml-ucx-fix-usage-of-mca-pml pml/ucx: Fix usage of mca_pml_base_pml_check_selected()	2020-03-30 20:06:12 +03:00
Nathan Hjelm	160ff188b8	Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf configure: use -iquote for non-system include paths	2020-03-30 09:22:54 -07:00
Yossi Itigin	124f0c0d1f	pml/ucx: Fix usage of mca_pml_base_pml_check_selected() Pass the correct ompi_proc_t and array length to mca_pml_base_pml_check_selected() during dynamic modex. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2020-03-29 17:46:45 +03:00
Howard Pritchard	f136a20cae	Merge pull request #6578 from hppritcha/topic/thread_framework2 Implement a MCA framework for threads	2020-03-27 15:55:48 -06:00
Austen Lauria	8a624ab613	Merge pull request #7523 from mkurnosov/fix-bcast-scatterallgather Fix Bcast scatter_allgather	2020-03-27 14:17:53 -04:00
Shintaro Iwasaki	a7ea0d9bd7	ompi/request: move REQUEST constants from mca/threads to ompi/request Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>	2020-03-27 10:16:04 -06:00
Noah Evans	ee3517427e	Add threads framework Add a framework to support different types of threading models including user space thread packages such as Qthreads and argobot: https://github.com/pmodels/argobots https://github.com/Qthreads/qthreads The default threading model is pthreads. Alternate thread models are specificed at configure time using the --with-threads=X option. The framework is static. The theading model to use is selected at Open MPI configure/build time. mca/threads: implement Argobots threading layer config: fix thread configury - Add double quotations - Change Argobot to Argobots config: implement Argobots check If the poll time is too long, MPI hangs. This quick fix just sets it to 0, but it is not good for the Pthreads version. Need to find a good way to abstract it. Note that even 1 (= 1 millisecond) causes disastrous performance degradation. rework threads MCA framework configury It now works more like the ompi/mca/rte configury, modulo some edge items that are special for threading package linking, etc. qthreads module some argobots cleanup Signed-off-by: Noah Evans <noah.evans@gmail.com> Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov> Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2020-03-27 10:15:45 -06:00
Brian Barrett	64d70b3076	ofi: Call add_procs through PML Change ompi_mtl_ofi_get_endpoint() to call the active PML's add_procs() rather than the OFI MTL add_procs() directly when discovering a new process during operation. Functionally, this has no impact in correct operation. However, the current behavior means that the heterogenous and active PML checks are not being executed in the dynamic discovery case. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2020-03-27 06:06:42 -07:00
Ralph Castain	c704ed4cc5	Merge pull request #7554 from rhc54/topic/proc1 ompi_proc_t size reduction: part 1	2020-03-26 13:23:06 -07:00
Ralph Castain	33ab928e1b	ompi_proc_t size reduction: part 1 We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we had to therefore store it somewhere proc-local. Obviously, this ccarried a memory penalty for storing all those strings, and so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs. Unfortunately, this still results in an 8-byte/proc memory cost as we have a char* pointer in the opal_proc_t that is contained in the ompi_proc_t so that we can store the hostname of the other procs if we fall below the cutoff. At scale, this can consume a fair amount of memory. With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication. All RM's are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving us 8-bytes/proc. Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd version of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and free'd wherever used. This shouldn't be a burden as the hostname is only used in one of two circumstances: (a) in an error message (b) in a verbose output for debugging purposes Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-03-23 12:49:44 -07:00
Ralph Castain	9bb06d0077	Merge pull request #7559 from rhc54/topic/fixes Bunch of fixes plus PMIx/PRRTE updates	2020-03-23 12:49:18 -07:00

1 2 3 4 5 ...

10715 Коммитов