openmpi

Author	SHA1	Message	Date
Artem Polyakov	1f7a3a2d54	ompi: Avoid unnecessary PMIx lookups when adding procs (step 2). Follow-up for `717f3fef62`. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-03-16 07:47:27 +07:00
Artem Polyakov	717f3fef62	ompi: Avoid unnecessary PMIx lookups when adding procs. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-02-22 16:09:30 +07:00
Gilles Gouaillardet	4932391002	ompi/proc: fix ompi_proc_finalize() revert bits of open-mpi/ompi@cf534d0c95 we cannot del_procs here since the pml framework has already been closed Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-12 11:41:35 +09:00
Gilles Gouaillardet	cf534d0c95	ompi/proc: plug a memory leak in ompi_proc_finalize() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 13:46:35 +09:00
Gilles Gouaillardet	b2aca6c753	ompi/proc: plug a memory leak in ompi_proc_unpack() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit `cb55c88a8b`.	2016-11-22 15:03:20 -08:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Joshua Hursey	f6f24a4f67	build: Custom libmpi(_FOO) name option in configure * Add a configure time option to rename libmpi(_FOO).* - `--with-libmpi-name=STRING` * This commit only impacts the installed libraries. Internal, temporary libraries have not been renamed to limit the scope of the patch to only what is needed. For example: ```shell shell$ ./configure --with-libmpi-name=wookie ... shell$ find . -name "libmpi" shell$ find . -name "libwookie" ./lib/libwookie.so.0.0.0 ./lib/libwookie.so.0 ./lib/libwookie.so ./lib/libwookie.la ./lib/libwookie_mpifh.so.0.0.0 ./lib/libwookie_mpifh.so.0 ./lib/libwookie_mpifh.so ./lib/libwookie_mpifh.la ./lib/libwookie_usempi.so.0.0.0 ./lib/libwookie_usempi.so.0 ./lib/libwookie_usempi.so ./lib/libwookie_usempi.la shell$ ```	2016-09-29 21:47:24 -05:00
Gilles Gouaillardet	0a25420dac	oshmem: get rid of oshmem_proc_t and use ompi_proc_t instead store oshmem related per proc data in an oshmem_proc_data_t struct, that is stored in the padding section of an ompi_proc_t this data can be accessed via the OSHMEM_PROC_DATA(proc) macro Fixes open-mpi/ompi#2023	2016-09-01 14:20:14 +09:00
Gilles Gouaillardet	a4aa4c9571	ompi_proc_complete_init_single: make the subroutine public and accept a proc from a different job	2016-02-22 11:01:06 +09:00
Gilles Gouaillardet	96310f439b	sentinel: fix 32 bits arch since a sentinel is only made from the current job, only store the first 31 bits of the vpid into the sentinel.	2016-02-10 15:44:07 +09:00
Gilles Gouaillardet	b55b9e6aee	sentinel: fix sentinel to proc_name conversion converting an opal_process_name_t means the loss of one bit, it was decided to restrict the local job id to 15 bits, so the useful information of an opal_process_name_t can fit in 63 bits.	2016-02-10 15:44:07 +09:00
Gilles Gouaillardet	030a5f2054	sentinel: use type uintptr_t for sentinel MSB is now automatically cleared when right shifting Thanks George for pointing this	2016-02-10 11:28:56 +09:00
Artem Polyakov	2abb2972ac	Fix Mellanox copyrights with respect to the following PRs: * https://github.com/open-mpi/ompi/pull/1184 * https://github.com/open-mpi/ompi/pull/1188 * https://github.com/open-mpi/ompi/pull/1197 * https://github.com/open-mpi/ompi/pull/1202 * https://github.com/open-mpi/ompi/pull/1210 * https://github.com/open-mpi/ompi/pull/1216 * https://github.com/open-mpi/ompi/pull/1236 * https://github.com/open-mpi/ompi/pull/1237 * https://github.com/open-mpi/ompi/pull/1248 * https://github.com/open-mpi/ompi/pull/1260 * https://github.com/open-mpi/ompi/pull/1264	2015-12-30 00:12:19 +06:00
Artem Polyakov	6a791c3026	Fix add_proc deadlock.	2015-12-17 21:18:33 +06:00
Nathan Hjelm	b7ba301310	Merge pull request #1165 from hjelmn/add_procs_group ompi/group: release ompi_proc_t's at group destruction	2015-12-14 13:53:42 -08:00
Nathan Hjelm	f317ba5262	Merge pull request #1163 from hjelmn/ompi_proc_threads ompi/proc: make proc system always thread safe	2015-12-08 10:36:55 -07:00
Nathan Hjelm	eb830b9501	ompi_proc_pack: correctly handle proc sentinels Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-12-07 17:27:38 -07:00
Gilles Gouaillardet	351bd03249	ompi_proc_sentinel_to_name: clear the top left bit	2015-12-02 17:18:56 +09:00
Nathan Hjelm	22af95b266	ompi/proc: make proc system always thread safe This commit changes the OPAL_THREAD_LOCK/OPAL_THREAD_UNLOCK calls in ompi/proc to opal_mutex_lock/opal_mutex_unlock. This will allow multi-threaded BTLs the ability to create ompi_proc_t's without having to set opal_using_threads. There should be no performance hits as none of the lock points are in the critical path. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-11-30 16:37:09 -07:00
Gilles Gouaillardet	8227bc6320	ompi_proc_find_and_add: use ompi_proc_allocate in order to update both ompi_proc_list and ompi_proc_hash	2015-11-30 14:00:59 +09:00
Ralph Castain	bfdf08ae86	Fix intercomm_create by ensuring that both sides know how to translate jobid to/from nspace Return something just to ensure that pack is happy	2015-11-06 02:19:45 -08:00
Ralph Castain	4c12022a50	Silence a couple of warnings from valgrind and compilers. Since some pmix components may return success with a NULL value from a "get", check for that situation before attempting to unload the data. Preset the hostname before calling modex_recv to get it so unload properly checks for NULL. Cast a returned value to the correct ompi_proc_t pointer	2015-10-22 20:56:02 -07:00
Jeff Squyres	9045d6de00	proc.c: fix some compiler warnings Eliminate unused variables and fix a signed/unsigned comparison issue.	2015-10-13 09:34:18 -04:00
Gilles Gouaillardet	57ecce4e0f	ompi_proc_complete_init: always reset u16ptr if a key is not found, u16ptr is set to NULL and following opal_value_unload calls might fail	2015-09-29 11:41:51 +09:00
Nathan Hjelm	12bd300c40	Merge pull request #929 from hjelmn/add_procs Update add_procs support	2015-09-28 17:29:13 -06:00
Gilles Gouaillardet	f241475db9	ompi: initialize ompi_proc_list common symbol	2015-09-28 10:09:27 +09:00
Nathan Hjelm	2c89c7f47d	ompi/proc: add function to get all allocated procs This commit adds two new functions: - ompi_proc_get_allocated - Returns all procs in the current job that have already been allocated. This is used in init/finalize to determine which procs to pass to add_procs/del_procs. - ompi_proc_world_size - returns the number of processes in MPI_COMM_WORLD. This may be removed in favor of callers just looking at ompi_process_info. The behavior of ompi_proc_world has been restored to return ompi_proc_t's for all processes in the current job. The use of this function is discouraged. Code that was using ompi_proc_world() has been updated to make use of the new functions to avoid the memory overhead of ompi_comm_world (). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-23 16:22:05 -06:00
Igor Ivanov	4b8d9b8eff	oshmem/proc: Refactor proc component Most functionality of oshmem_proc duplicates ompi_proc. In addition, the current logic does not allow oshmem initialization without ompi startup. This refactoring avoids code duplication, reduces memory use, and makes oshmem support easier. oshmem_proc is now a transparent ompi_proc structure that can be extended with oshmem-specific data. Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>	2015-09-17 18:49:00 +03:00
Igor Ivanov	11f61790ee	ompi/proc: Extend ompi_proc_t structure with padding to support oshmem data Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>	2015-09-17 18:48:59 +03:00
Nathan Hjelm	f29b65aa14	ompi/proc: fix typos CID 1323840 Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-09-11 21:02:30 -06:00
Ralph Castain	b60b03d613	It is okay not to get the hostname - we don't require that it be provided	2015-09-11 13:01:20 -07:00
Nathan Hjelm	1868b5937c	Merge pull request #889 from hjelmn/sentinel_update Use the low instead of the high bit to indicate a proc is a sentinel	2015-09-11 12:30:27 -06:00
Nathan Hjelm	64c8f124fc	Use the low instead of the high bit to indicate a proc is a sentinel The assumption that the high bit is not in use in pointers on any of our supported platforms was incorrect. A better assumption is that all ompi_proc_t pointers will be at least 2-byte aligned. This allows us to use the low bit. To do this we drop the highest bit of the opal_process_name_t jobid (hope this is ok) and use the low bit to indicate the proc is really a sentinel. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-09-11 09:32:02 -06:00
Ralph Castain	dc5796b8a1	Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"" Fix the locality computation by correctly computing the vpid of the local peer This reverts commit open-mpi/ompi@6a8fad49e5.	2015-09-11 08:29:51 -07:00
Ralph Castain	6a8fad49e5	Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local" This reverts commit `f94f3cda21`.	2015-09-11 02:01:25 -07:00
Gilles Gouaillardet	638a59adf3	fix compilation in heterogeneous mode use OPAL_PMIX_GLOBAL instead of PMIX_GLOBAL	2015-09-11 09:23:21 +09:00
Ralph Castain	f94f3cda21	Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local	2015-09-10 10:25:30 -07:00
Nathan Hjelm	5b7943db78	ompi/group: do not allocate ompi_proc_t's on group union/difference This commit modifies the ompi_group_t union/difference code to compare/copy the raw group values. This will either be a ompi_proc_t or a sentinel value. This commit also adds helper functions to convert between opal process names and sentinel values. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-09-10 08:55:55 -06:00
Nathan Hjelm	408da16d50	ompi/proc: add proc hash table for ompi_proc_t objects This commit adds an opal hash table to keep track of mapping between process identifiers and ompi_proc_t's. This hash table is used by the ompi_proc_by_name() function to lookup (in O(1) time) a given process. This can be used by a BTL or other component to get a ompi_proc_t when handling an incoming message from an as yet unknown peer. Additionally, this commit adds a new MCA variable to control the new add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in the process falls below the threshold a ompi_proc_t is created for every process. If the number of ranks is above the threshold then a ompi_proc_t is only created for the local rank. The code needed to generate additional ompi_proc_t's for a communicator is not yet complete. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-10 08:55:54 -06:00
Ralph Castain	37c3ed68e7	Cleanup connect/disconnect and bring comm_spawn back online!	2015-09-06 10:27:39 -07:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Gilles Gouaillardet	a9044945fe	ompi/proc: correctly handle cutoff modex case as reported by Coverity with CID 1196664	2015-03-09 14:34:28 +09:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00
Gilles Gouaillardet	b5aea782ce	Revert "Fix heterogeneous support" Per the discussion at http://www.open-mpi.org/community/lists/devel/2014/10/16050.php This reverts commit `c9c5d4011b`.	2014-10-16 12:24:38 +09:00
Gilles Gouaillardet	c9c5d4011b	Fix heterogeneous support * redefine orte_process_name_t so it can be converted between host and network format as an opal_identifier_t aka uint64_t by the OPAL layer. * correctly send OPAL_DSTORE_ARCH key	2014-10-15 17:19:13 +09:00
Elena	c905fe9b78	pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff	2014-10-09 06:15:31 +02:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Jeff Squyres	132375f07f	helpfiles: fix filenames referenced by calls to show_help() This commit was SVN r32453.	2014-08-08 13:34:15 +00:00

173 commits