openmpi

Автор	SHA1	Сообщение	Дата
Ralph Castain	f608575eec	Remove references to numa_rank Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-05-01 13:32:29 -07:00
Ralph Castain	6d29bbfde8	Cleanup heterogeneous builds Consolidate the ompi_process_info and opal_process_info structs to remove duplicate storage and conversion issues. Unwind some interweaving of include files using opal.h. Silence a couple of warnings. For now, set the arch to local if PMIX_ARCH is not found. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-04-22 12:46:27 -07:00
Nikola Dancejic	3637443454	adding NUMA_RANK to process metadata adding PMIX_NUMA_RANK info to process metadata so that the local NUMA rank can be accessed through the opal_process_info object. Signed-off-by: Nikola Dancejic <dancejic@amazon.com>	2020-04-20 22:02:47 +00:00
Ralph Castain	33ab928e1b	ompi_proc_t size reduction: part 1 We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we had to therefore store it somewhere proc-local. Obviously, this ccarried a memory penalty for storing all those strings, and so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs. Unfortunately, this still results in an 8-byte/proc memory cost as we have a char* pointer in the opal_proc_t that is contained in the ompi_proc_t so that we can store the hostname of the other procs if we fall below the cutoff. At scale, this can consume a fair amount of memory. With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication. All RM's are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving us 8-bytes/proc. Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd version of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and free'd wherever used. This shouldn't be a burden as the hostname is only used in one of two circumstances: (a) in an error message (b) in a verbose output for debugging purposes Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today. Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-03-23 12:49:44 -07:00
Ralph Castain	829fd478b3	Create a hack to protect against non-integer jobids If someone gives us a namespace that doesn't easily translate to an integer, we have to create a mechanism for working around the disconnect. PRRTE has been updated to give us a flag so we know we were "natively" launched. If we don't see it, then fall back to generating a hash of the nspace as our jobid. We then have to translate back/forth between nspace and jobid using a lookup table. Probably not the right long-term solution, but hopefully helps get us thru for a bit. Includes update of PRRTE pointer Signed-off-by: Ralph Castain <rhc@pmix.org>	2020-02-21 06:04:55 -08:00
Gilles Gouaillardet	174e967dbc	Remove ORTE project Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build without reference to ORTE. Setup opal/pmix framework to be static. Remove support for all PMI-1 and PMI-2 libraries. Add support for "external" pmix component as well as internal v4 one. remove orte: misc fixes - UCX fixes - VPATH issue - oshmem fixes - remove useless definition - Add PRRTE submodule - Get autogen.pl to traverse PRRTE submodule - Remove stale orcm reference - Configure embedded PRRTE - Correctly pass the prefix to PRRTE - Correctly set the OMPI_WANT_PRRTE am_conditional - Move prrte configuration to the end of OMPI's configure.ac - Make mpirun a symlink to prun, when available - Fix makedist with --no-orte/--no-prrte option - Add a `--no-prrte` option which is the same as the legacy `--no-orte` option. - Remove embedded PMIx tarball. Replace it with new submodule pointing to OpenPMIx master repo's master branch - Some cleanup in PRRTE integration and add config summary entry - Correctly set the hostname - Fix locality - Fix singleton operations - Fix support for "tune" and "am" options Signed-off-by: Ralph Castain <rhc@pmix.org> Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2020-02-07 18:20:06 -08:00
Jeff Squyres	28e51ad4d4	opal: convert from strncpy() -> opal_string_copy() In many cases, this was a simple string replace. In a few places, it entailed: 1. Updating some comments and removing now-redundant foo[size-1]='\0' statements. 2. Updating passing (size-1) to (size) (because opal_string_copy() wants the entire destination buffer length). This commit actually fixes a bunch of potential (yet quite unlikely) bugs where we could have ended up with non-null-terminated strings. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-27 11:56:18 -07:00
Gilles Gouaillardet	b3558f261b	opal/util: initialize proc_hostname in the opal_proc_t constructor Refs open-mpi/ompi#4264 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-09-26 10:47:26 +09:00
Ralph Castain	9eab9a1ed3	Remove stale global variables Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers. Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation). Begin first cut at memory profiler Some minor cleanups of memprobe Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-02 14:04:24 -08:00
Ralph Castain	4dad5de8ff	Silence a couple of warnings - strncpy returns a char*, not an int	2016-01-16 09:44:52 -08:00
Gilles Gouaillardet	1d38430e43	opal: replace opal_convert_jobid_to_string with opal_snprintf_jobid	2016-01-14 10:39:03 +09:00
Nathan Hjelm	408da16d50	ompi/proc: add proc hash table for ompi_proc_t objects This commit adds an opal hash table to keep track of mapping between process identifiers and ompi_proc_t's. This hash table is used by the ompi_proc_by_name() function to lookup (in O(1) time) a given process. This can be used by a BTL or other component to get a ompi_proc_t when handling an incoming message from an as yet unknown peer. Additionally, this commit adds a new MCA variable to control the new add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in the process falls below the threshold a ompi_proc_t is created for every process. If the number of ranks is above the threshold then a ompi_proc_t is only created for the local rank. The code needed to generate additional ompi_proc_t's for a communicator is not yet complete. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-10 08:55:54 -06:00
Ralph Castain	d97bc29102	Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given	2015-09-04 16:54:40 -07:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00
Gilles Gouaillardet	62bde1fcb5	opal/util/proc.c: handle unaligned opal_process_name_t parameters	2014-10-27 14:40:10 +09:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Ralph Castain	8f1b9b463e	Fix shared memory operations - need to pass the local topology and cpusets of all local peers so we can properly compute relative locality for them. Also need to set default locality to "on node" in case where cpusets are not passed because procs are not bound. This commit was SVN r32577.	2014-08-22 05:17:51 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “lmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
George Bosilca	1e37b67e5d	No more assert in the proc destructor. This commit was SVN r32401.	2014-08-01 16:36:23 +00:00
Ralph Castain	daeb9b6c4f	Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain. Thanks to Gilles for pointing out some of the discrepancies. This commit was SVN r32398.	2014-08-01 14:44:11 +00:00
George Bosilca	f39abb9e69	Reverting r32355: a number of processes is not a notion that a low level communication library should use to initialize itself. Ralph will champion this change back with an RFC if there is a realistic need/use case from the community. This commit was SVN r32361. The following SVN revision numbers were found above: r32355 --> open-mpi/ompi@c903917f47	2014-07-30 20:11:35 +00:00
Ralph Castain	c903917f47	Expose the num_procs information to the opal layer as the info is needed in several BTLs This commit was SVN r32355.	2014-07-30 09:33:41 +00:00
George Bosilca	a3feb627cf	Move some of the ompi_process_info down in OPAL. This commit was SVN r32324.	2014-07-26 21:43:34 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00

26 Коммитов