openmpi

Author	SHA1	Message	Date
Ralph Castain	de2d69ca24	Fix hetero builds Add missing variable declaration Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-04-15 08:46:21 -07:00
Howard Pritchard	f136a20cae	Merge pull request #6578 from hppritcha/topic/thread_framework2 Implement a MCA framework for threads	2020-03-27 15:55:48 -06:00
Noah Evans	ee3517427e	Add threads framework Add a framework to support different types of threading models including user space thread packages such as Qthreads and Argobots: https://github.com/pmodels/argobots https://github.com/Qthreads/qthreads The default threading model is pthreads. Alternate thread models are specified at configure time using the --with-threads=X option. The framework is static. The threading model to use is selected at Open MPI configure/build time. mca/threads: implement Argobots threading layer config: fix thread configury - Add double quotations - Change Argobot to Argobots config: implement Argobots check If the poll time is too long, MPI hangs. This quick fix just sets it to 0, but it is not good for the Pthreads version. Need to find a good way to abstract it. Note that even 1 (= 1 millisecond) causes disastrous performance degradation. rework threads MCA framework configury It now works more like the ompi/mca/rte configury, modulo some edge items that are special for threading package linking, etc. qthreads module some argobots cleanup Signed-off-by: Noah Evans <noah.evans@gmail.com> Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov> Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2020-03-27 10:15:45 -06:00
Ralph Castain	33ab928e1b	ompi_proc_t size reduction: part 1 We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we had to therefore store it somewhere proc-local. Obviously, this carried a memory penalty for storing all those strings, and so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs. Unfortunately, this still results in an 8-byte/proc memory cost as we have a char* pointer in the opal_proc_t that is contained in the ompi_proc_t so that we can store the hostname of the other procs if we fall below the cutoff. At scale, this can consume a fair amount of memory. With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication. All RMs are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving us 8 bytes/proc. Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd version of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and free'd wherever used. This shouldn't be a burden as the hostname is only used in one of two circumstances: (a) in an error message (b) in a verbose output for debugging purposes Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-03-23 12:49:44 -07:00
Gilles Gouaillardet	174e967dbc	Remove ORTE project Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build without reference to ORTE. Setup opal/pmix framework to be static. Remove support for all PMI-1 and PMI-2 libraries. Add support for "external" pmix component as well as internal v4 one. remove orte: misc fixes - UCX fixes - VPATH issue - oshmem fixes - remove useless definition - Add PRRTE submodule - Get autogen.pl to traverse PRRTE submodule - Remove stale orcm reference - Configure embedded PRRTE - Correctly pass the prefix to PRRTE - Correctly set the OMPI_WANT_PRRTE am_conditional - Move prrte configuration to the end of OMPI's configure.ac - Make mpirun a symlink to prun, when available - Fix makedist with --no-orte/--no-prrte option - Add a `--no-prrte` option which is the same as the legacy `--no-orte` option. - Remove embedded PMIx tarball. Replace it with new submodule pointing to OpenPMIx master repo's master branch - Some cleanup in PRRTE integration and add config summary entry - Correctly set the hostname - Fix locality - Fix singleton operations - Fix support for "tune" and "am" options Signed-off-by: Ralph Castain <rhc@pmix.org> Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2020-02-07 18:20:06 -08:00
Artem Polyakov	1f7a3a2d54	ompi: Avoid unnecessary PMIx lookups when adding procs (step 2). Follow-up for 717f3fef62b193845e9add5aaaae3543c2f2ebfb. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-03-16 07:47:27 +07:00
Artem Polyakov	717f3fef62	ompi: Avoid unnecessary PMIx lookups when adding procs. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-02-22 16:09:30 +07:00
Gilles Gouaillardet	4932391002	ompi/proc: fix ompi_proc_finalize() revert bits of open-mpi/ompi@cf534d0c95 we cannot del_procs here since the pml framework has already been closed Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-12 11:41:35 +09:00
Gilles Gouaillardet	cf534d0c95	ompi/proc: plug a memory leak in ompi_proc_finalize() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 13:46:35 +09:00
Gilles Gouaillardet	b2aca6c753	ompi/proc: plug a memory leak in ompi_proc_unpack() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.	2016-11-22 15:03:20 -08:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Joshua Hursey	f6f24a4f67	build: Custom libmpi(_FOO) name option in configure * Add a configure time option to rename libmpi(_FOO).* - `--with-libmpi-name=STRING` * This commit only impacts the installed libraries. Internal, temporary libraries have not been renamed to limit the scope of the patch to only what is needed. For example: ```shell shell$ ./configure --with-libmpi-name=wookie ... shell$ find . -name "libmpi*" shell$ find . -name "libwookie*" ./lib/libwookie.so.0.0.0 ./lib/libwookie.so.0 ./lib/libwookie.so ./lib/libwookie.la ./lib/libwookie_mpifh.so.0.0.0 ./lib/libwookie_mpifh.so.0 ./lib/libwookie_mpifh.so ./lib/libwookie_mpifh.la ./lib/libwookie_usempi.so.0.0.0 ./lib/libwookie_usempi.so.0 ./lib/libwookie_usempi.so ./lib/libwookie_usempi.la shell$ ```	2016-09-29 21:47:24 -05:00
Gilles Gouaillardet	0a25420dac	oshmem: get rid of oshmem_proc_t and use ompi_proc_t instead store oshmem related per proc data in an oshmem_proc_data_t struct, that is stored in the padding section of an ompi_proc_t this data can be accessed via the OSHMEM_PROC_DATA(proc) macro Fixes open-mpi/ompi#2023	2016-09-01 14:20:14 +09:00
Gilles Gouaillardet	a4aa4c9571	ompi_proc_complete_init_single: make the subroutine public and accept a proc from a different job	2016-02-22 11:01:06 +09:00
Gilles Gouaillardet	96310f439b	sentinel: fix 32 bits arch since a sentinel is only made from the current job, only store the first 31 bits of the vpid into the sentinel.	2016-02-10 15:44:07 +09:00
Gilles Gouaillardet	b55b9e6aee	sentinel: fix sentinel to proc_name conversion converting an opal_process_name_t means the loss of one bit, it was decided to restrict the local job id to 15 bits, so the useful information of an opal_process_name_t can fit in 63 bits.	2016-02-10 15:44:07 +09:00
Gilles Gouaillardet	030a5f2054	sentinel: use type uintptr_t for sentinel MSB is now automatically cleared when right shifting Thanks George for pointing this out	2016-02-10 11:28:56 +09:00
Artem Polyakov	2abb2972ac	Fix Mellanox copyrights with respect to the following PRs: * https://github.com/open-mpi/ompi/pull/1184 * https://github.com/open-mpi/ompi/pull/1188 * https://github.com/open-mpi/ompi/pull/1197 * https://github.com/open-mpi/ompi/pull/1202 * https://github.com/open-mpi/ompi/pull/1210 * https://github.com/open-mpi/ompi/pull/1216 * https://github.com/open-mpi/ompi/pull/1236 * https://github.com/open-mpi/ompi/pull/1237 * https://github.com/open-mpi/ompi/pull/1248 * https://github.com/open-mpi/ompi/pull/1260 * https://github.com/open-mpi/ompi/pull/1264	2015-12-30 00:12:19 +06:00
Artem Polyakov	6a791c3026	Fix add_proc deadlock.	2015-12-17 21:18:33 +06:00
Nathan Hjelm	b7ba301310	Merge pull request #1165 from hjelmn/add_procs_group ompi/group: release ompi_proc_t's at group destruction	2015-12-14 13:53:42 -08:00
Nathan Hjelm	f317ba5262	Merge pull request #1163 from hjelmn/ompi_proc_threads ompi/proc: make proc system always thread safe	2015-12-08 10:36:55 -07:00
Nathan Hjelm	eb830b9501	ompi_proc_pack: correctly handle proc sentinels Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-12-07 17:27:38 -07:00
Gilles Gouaillardet	351bd03249	ompi_proc_sentinel_to_name: clear the top left bit	2015-12-02 17:18:56 +09:00
Nathan Hjelm	22af95b266	ompi/proc: make proc system always thread safe This commit changes the OPAL_THREAD_LOCK/OPAL_THREAD_UNLOCK calls in ompi/proc to opal_mutex_lock/opal_mutex_unlock. This will give multi-threaded BTLs the ability to create ompi_proc_t's without having to set opal_using_threads. There should be no performance hits as none of the lock points are in the critical path. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-11-30 16:37:09 -07:00
Gilles Gouaillardet	8227bc6320	ompi_proc_find_and_add: use ompi_proc_allocate in order to update both ompi_proc_list and ompi_proc_hash	2015-11-30 14:00:59 +09:00
Ralph Castain	bfdf08ae86	Fix intercomm_create by ensuring that both sides know how to translate jobid to/from nspace Return something just to ensure that pack is happy	2015-11-06 02:19:45 -08:00
Ralph Castain	4c12022a50	Silence a couple of warnings from valgrind and compilers. Since some pmix components may return success with a NULL value from a "get", check for that situation before attempting to unload the data. Preset the hostname before calling modex_recv to get it so unload properly checks for NULL. Cast a returned value to the correct ompi_proc_t pointer	2015-10-22 20:56:02 -07:00
Jeff Squyres	9045d6de00	proc.c: fix some compiler warnings Eliminate unused variables and fix a signed/unsigned comparison issue.	2015-10-13 09:34:18 -04:00
Gilles Gouaillardet	57ecce4e0f	ompi_proc_complete_init: always reset u16ptr if a key is not found, u16ptr is set to NULL and following opal_value_unload calls might fail	2015-09-29 11:41:51 +09:00
Nathan Hjelm	12bd300c40	Merge pull request #929 from hjelmn/add_procs Update add_procs support	2015-09-28 17:29:13 -06:00
Gilles Gouaillardet	f241475db9	ompi: initialize ompi_proc_list common symbol	2015-09-28 10:09:27 +09:00
Nathan Hjelm	2c89c7f47d	ompi/proc: add function to get all allocated procs This commit adds two new functions: - ompi_proc_get_allocated - Returns all procs in the current job that have already been allocated. This is used in init/finalize to determine which procs to pass to add_procs/del_procs. - ompi_proc_world_size - returns the number of processes in MPI_COMM_WORLD. This may be removed in favor of callers just looking at ompi_process_info. The behavior of ompi_proc_world has been restored to return ompi_proc_t's for all processes in the current job. The use of this function is discouraged. Code that was using ompi_proc_world() has been updated to make use of the new functions to avoid the memory overhead of ompi_comm_world (). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-23 16:22:05 -06:00
Igor Ivanov	4b8d9b8eff	oshmem/proc: Refactor proc component Most functionality of oshmem_proc duplicates ompi_proc. In addition to that, the current logic does not allow oshmem initialization without ompi startup. So this refactoring avoids code duplication, decreases used memory, and makes oshmem support easier. Now oshmem_proc is a transparent ompi_proc structure that can be extended by oshmem-specific data. Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>	2015-09-17 18:49:00 +03:00
Igor Ivanov	11f61790ee	ompi/proc: Extend ompi_proc_t structure with padding to support oshmem data Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>	2015-09-17 18:48:59 +03:00
Nathan Hjelm	f29b65aa14	ompi/proc: fix typos CID 1323840 Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-09-11 21:02:30 -06:00
Ralph Castain	b60b03d613	It is okay not to get the hostname - we don't require that it be provided	2015-09-11 13:01:20 -07:00
Nathan Hjelm	1868b5937c	Merge pull request #889 from hjelmn/sentinel_update Use the low instead of the high bit to indicate a proc is a sentinel	2015-09-11 12:30:27 -06:00
Nathan Hjelm	64c8f124fc	Use the low instead of the high bit to indicate a proc is a sentinel The assumption that the high bit is not in use in pointers on any of our supported platforms was incorrect. A better assumption is that all ompi_proc_t pointers will be at least 2-byte aligned. This allows us to use the low bit. To do this we drop the highest bit of the opal_process_name_t jobid (hope this is ok) and use the low bit to indicate the proc is really a sentinel. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-09-11 09:32:02 -06:00
Ralph Castain	dc5796b8a1	Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"" Fix the locality computation by correctly computing the vpid of the local peer This reverts commit open-mpi/ompi@6a8fad49e5.	2015-09-11 08:29:51 -07:00
Ralph Castain	6a8fad49e5	Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local" This reverts commit f94f3cda214ab937c46802896fb53b84bec6cc3a.	2015-09-11 02:01:25 -07:00
Gilles Gouaillardet	638a59adf3	fix compilation in heterogeneous mode use OPAL_PMIX_GLOBAL instead of PMIX_GLOBAL	2015-09-11 09:23:21 +09:00
Ralph Castain	f94f3cda21	Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local	2015-09-10 10:25:30 -07:00
Nathan Hjelm	5b7943db78	ompi/group: do not allocate ompi_proc_t's on group union/difference This commit modifies the ompi_group_t union/difference code to compare/copy the raw group values. This will either be a ompi_proc_t or a sentinel value. This commit also adds helper functions to convert between opal process names and sentinel values. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-09-10 08:55:55 -06:00
Nathan Hjelm	408da16d50	ompi/proc: add proc hash table for ompi_proc_t objects This commit adds an opal hash table to keep track of mapping between process identifiers and ompi_proc_t's. This hash table is used by the ompi_proc_by_name() function to look up (in O(1) time) a given process. This can be used by a BTL or other component to get an ompi_proc_t when handling an incoming message from an as yet unknown peer. Additionally, this commit adds a new MCA variable to control the new add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in the process falls below the threshold, an ompi_proc_t is created for every process. If the number of ranks is above the threshold, then an ompi_proc_t is only created for the local rank. The code needed to generate additional ompi_proc_t's for a communicator is not yet complete. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-10 08:55:54 -06:00
Ralph Castain	37c3ed68e7	Cleanup connect/disconnect and bring comm_spawn back online!	2015-09-06 10:27:39 -07:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Gilles Gouaillardet	a9044945fe	ompi/proc: correctly handle cutoff modex case as reported by Coverity with CID 1196664	2015-03-09 14:34:28 +09:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00


178 commits