openmpi

Автор	SHA1	Сообщение	Дата
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Ralph Castain	2581b41d08	Continue refactoring code by splitting the msg processing from the sendrecv code	2014-12-17 19:57:14 -08:00
Ralph Castain	f489e871c2	Take first step towards refactoring the PMIx server code by splitting out the proc_map function into its own file. Update ignore to include .DS_Store from the Mac	2014-12-17 19:08:52 -08:00
Jeff Squyres	c22e1ae33b	configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros These two macros set the prefix for the OPAL and ORTE libraries, respectively. Specifically, the OPAL library will be named libPREFIXopen-pal.la and the ORTE library will be named libPREFIXopen-rte.la. These macros must be called, even if the prefix argument is empty. The intent is that Open MPI will call these macros with an empty prefix, but other projects (such as ORCM) will call these macros with a non-empty prefix. For example, ORCM libraries can be named liborcm-open-pal.la and liborcm-open-rte.la. This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running MPI applications under ORTE, if the ORTE and OPAL libraries between OMPI and ORCM are not identical (which, because they are released at different times, are likely to be different), we need to ensure that the OMPI applications link against their ORTE and OPAL libraries, but the ORCM executables link against their ORTE and OPAL libraries.	2014-10-22 10:32:19 -07:00
Jeff Squyres	01fd96bfa5	Revert "Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build." This reverts commit `63f619f871`.	2014-10-22 10:32:11 -07:00
Ralph Castain	63f619f871	Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build.	2014-10-10 11:39:08 -07:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “lmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00

8 Коммитов