Commit Graph

32 Commits

Author SHA1 Message Date
Howard Pritchard
bf89131f9e add owner files to opa/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
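
The commit does not spell out the file syntax, so the following is only a plausible sketch of one such owner file (the field values and file location are hypothetical):

    # owner file in a component directory, e.g. opal/mca/btl/tcp/
    # two "fields" for now: the owning organization, and the status
    owner: SOME-ORGANIZATION
    status: maintained
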
2015-02-22 15:10:23 -07:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “pmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint.

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs (see the sketch after this list). The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand
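
As a rough illustration of the async-modex decision described above (a sketch only - the flag and function names are assumptions, not the committed API):

    /* Illustrative control flow only -- the names below are assumptions,
     * not the committed API. */
    if (btl_supports_async_modex && pmix_has_nonblocking_fence) {
        /* non-blocking "fence": returns at once, the callback fires
         * when the exchanged data is available */
        rc = opal_pmix.fence_nb(NULL, collect_data, fence_complete_cb, NULL);
    } else {
        /* default: full blocking modex exchange, as performed today */
        rc = opal_pmix.fence(NULL, collect_data);
    }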

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Ralph Castain
1107f9099e Per the RFC issued here:
http://www.open-mpi.org/community/lists/devel/2014/05/14827.php

Refactor PMI support

This commit was SVN r31907.
2014-06-01 04:28:17 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similarly to what we did for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Ralph Castain
a8a91b374e Update component-level selection comments to match latest revisions
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30087.
2013-12-25 19:12:43 +00:00
Ralph Castain
bc92c260ca Add missing library dependency
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29273.
2013-09-27 00:08:43 +00:00
Nathan Hjelm
0b8fc13299 MPI-3.0: update C bindings with const and consistent use of [] for
arrays.

The MPI 3.0 standard added const to all in buffers in the C bindings. This
commit adds the const keyword and in most cases casts const away. We should
eventually go through and update the various interfaces (coll, pml, io, etc.)
to take the const keyword. The group, comm, win, and datatype interfaces have
been updated with const.
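
A condensed sketch of what this looks like in a binding (error checking omitted; MCA_PML_CALL is the existing pml dispatch macro, and the cast is the "const cast away" described above):

    /* MPI-3.0 binding: "buf" is now const-qualified */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        /* the internal interfaces (pml, coll, io, ...) do not yet take
         * const, so the binding casts it away for now */
        return MCA_PML_CALL(send((void *) buf, count, datatype, dest, tag,
                                 MCA_PML_BASE_SEND_STANDARD, comm));
    }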

cmr=v1.7.4:ticket=trac:3785:reviewer=jsquyres

This commit was SVN r29266.

The following Trac tickets were found above:
  Ticket 3785 --> https://svn.open-mpi.org/trac/ompi/ticket/3785
2013-09-26 21:56:20 +00:00
Ralph Castain
db214a2321 Refs trac:3697 - use the opal_pmi_error function instead of ompi_error as the returned error codes are from PMI
This commit was SVN r28941.

The following Trac tickets were found above:
  Ticket 3697 --> https://svn.open-mpi.org/trac/ompi/ticket/3697
2013-07-24 04:05:41 +00:00
Joshua Ladd
0b5c1f2ea8 Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems). If your resource manager (e.g., SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fall back to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file cannot be overridden by any file or environment
   value (see the sketch after this list).
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).
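
A plausible override-file sketch (the parameter names below are just examples; the name = value format follows the existing MCA parameter files):

    # openmpi-mca-param-override.conf
    # values set here win over any environment variable or other param file
    mpi_show_handle_leaks = 1
    btl = self,sm,tcp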

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable (see the registration
   sketch after these notes).
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few components actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().
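
A minimal registration sketch against the new API (the component and variable names are hypothetical, and the exact argument list is an assumption based on this description):

    /* backing store: must outlive the variable itself */
    static int example_priority = 10;

    /* typically called from the component's register function */
    (void) mca_base_component_var_register(&mca_example_component.super.version,
                                           "priority",
                                           "Selection priority of the example component",
                                           MCA_BASE_VAR_TYPE_INT, NULL, 0,
                                           MCA_BASE_VAR_FLAG_SETTABLE,
                                           OPAL_INFO_LVL_9,
                                           MCA_BASE_VAR_SCOPE_LOCAL,
                                           &example_priority);

    /* because the variable is SETTABLE, its value may later be changed
     * through mca_base_var_set_value() */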

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
bd9265c560 Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data.
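
In essence (the typedef is as described; the overlay shown is only an illustrative assumption, not the conversion code ORTE actually uses):

    #include <stdint.h>
    #include <string.h>

    /* opal/mca/db/db_types.h: an opaque 64-bit process identifier;
     * OPAL assigns no meaning to any part of it */
    typedef uint64_t opal_identifier_t;

    /* illustrative stand-in: ORTE's name is two 32-bit fields */
    typedef struct { uint32_t jobid; uint32_t vpid; } orte_process_name_t;

    static opal_identifier_t to_opal_id(const orte_process_name_t *name)
    {
        opal_identifier_t id;
        memcpy(&id, name, sizeof(id));  /* OPAL sees just the 64 bits */
        return id;
    }
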
A few changes were required to support this move:

1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution).

2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time.

3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you separately specify the "store" and "fetch" ordering.

4. the ORTE grpcomm and ess/pmi components, and the nidmap code,  were updated to work with the new db framework and to specify internal/global storage options.

No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework.

This commit was SVN r28112.
2013-02-26 17:50:04 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OS X that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
32050f026f protect the ORTE_CHECK_PMI define in the OMPI layer for --no-orte builds
This commit was SVN r26674.
2012-06-27 00:28:37 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Brian Barrett
7406ef1241 Make all the PMI components depend on the common pmi library and properly
install the common pmi library

This commit was SVN r26588.
2012-06-11 15:58:09 +00:00
Nathan Hjelm
cdc3c87ba6 move pmi init/finalize into a common component
This commit was SVN r26470.
2012-05-22 15:15:39 +00:00
Ralph Castain
a0edae52f2 Ensure the wrapper flags get entered in the right order, with -lpmi coming before the alps util libs
This commit was SVN r25809.
2012-01-27 20:56:21 +00:00
Ralph Castain
be3dfb6a1a Ensure that we only add -lpmi once to the wrapper compilers, no matter how many components might use it.
This commit was SVN r25753.
2012-01-20 04:56:38 +00:00
Ralph Castain
07655e2945 Handle the case where the allocator "fibs" to us about the node names. In some cases (ahem...you know who you are!), the allocator will tell us a node number (e.g., "16"). However, the daemon will return a node name (e.g., "nid0016") - leaving us not recognizing its location.
So provide a new parameter (can't have too many!) that handles this situation by stripping the prefix from the returned node name. Also do a little cleanup to ensure we cleanly exit from errors, without generating too many annoying messages.
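
A sketch of the matching logic described above (the function and parameter names are hypothetical):

    #include <stdlib.h>
    #include <string.h>

    /* Match an allocator-provided name ("16") against a daemon-reported
     * node name ("nid0016") by stripping a configured prefix. */
    static int node_names_match(const char *reported, const char *alloc_name,
                                const char *strip_prefix)
    {
        size_t plen = strlen(strip_prefix);
        if (0 == strncmp(reported, strip_prefix, plen)) {
            reported += plen;   /* "nid0016" -> "0016" */
        }
        /* compare numerically so leading zeros don't matter */
        return atoi(reported) == atoi(alloc_name);
    }
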

This commit was SVN r25562.
2011-12-02 14:10:08 +00:00
Ralph Castain
357ac14530 Can't return a numerical value here
This commit was SVN r25559.
2011-12-02 10:36:57 +00:00
Samuel Gutierrez
e03bc93fb7 only use pmi grpcomm and pubsub during the direct launch case. use PMI environment variable to setup vpid in ess alps on cray xe systems. add pmi test code.
This commit was SVN r25447.
2011-11-06 17:28:40 +00:00
Ralph Castain
14966e0f8f Cleanup PMI startup - if a component isn't selected, it should finalize PMI IFF it started it. Otherwise, components that aren't selected can finalize PMI when it is in use by other parts of the system.
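
The rule amounts to remembering whether this component called PMI_Init() itself (a sketch using the standard PMI-1 calls; the flag and function names are hypothetical):

    #include <stdbool.h>
    #include <pmi.h>   /* PMI-1 header */

    static bool pmi_started_by_me = false;

    static void component_init(void)
    {
        PMI_BOOL initialized;
        int spawned;
        if (PMI_SUCCESS == PMI_Initialized(&initialized) &&
            PMI_FALSE == initialized &&
            PMI_SUCCESS == PMI_Init(&spawned)) {
            pmi_started_by_me = true;  /* we started it, so we may end it */
        }
    }

    static void component_finalize(void)
    {
        if (pmi_started_by_me) {       /* finalize PMI IFF we started it */
            PMI_Finalize();
        }
    }
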
This commit was SVN r25407.
2011-11-01 16:25:12 +00:00
Samuel Gutierrez
0ba13e2f8e fix typo. use PMI_Initialized for init status instead of PMI_Init.
This commit was SVN r25378.
2011-10-27 22:41:50 +00:00
Ralph Castain
955d8e7d46 Allow apps to use pmi when launched by mpirun, if desired, without affecting daemons
This commit was SVN r25359.
2011-10-23 15:57:13 +00:00
Nathan Hjelm
fb19f56965 Cray doesn't define PMI2_SUCCESS
This commit was SVN r25354.
2011-10-21 16:34:22 +00:00
Ralph Castain
3e72fccacf Cray's PMI implementation is quite different from slurm's - they extended PMI-1 by adding some, but not all, of the PMI-2 APIs. So you can't just switch to using PMI-2 functions as it isn't a complete implementation. Instead, you have to selectively figure out which ones they have in PMI-2, and use any missing ones from PMI-1. What fun.
Modify the configure logic and the PMI components to accommodate Cray's approach. Refactor the PMI error reporting code so it resides in only one place. Cray actually decided -not- to define the PMI-2 error codes, so we have to use the PMI-1 codes instead. More fun.
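
The resulting pattern looks roughly like this (a sketch; the configure-time guard macro is an assumption, while the PMI calls themselves are standard PMI-1/PMI-2):

    #include <pmi.h>
    #if WANT_CRAY_PMI2_EXT       /* hypothetical configure-time guard */
    #include <pmi2.h>
    #endif

    static int exchange_fence(void)
    {
    #if WANT_CRAY_PMI2_EXT
        /* Cray does provide this PMI-2 call, so prefer it... */
        return PMI2_KVS_Fence();
    #else
        /* ...but fall back to PMI-1 where PMI-2 is absent; and since Cray
         * does not define the PMI-2 error codes, PMI-1 codes are used
         * throughout, per the refactored error reporting */
        return PMI_Barrier();
    #endif
    }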

This commit was SVN r25348.
2011-10-21 04:54:38 +00:00
Nathan Hjelm
586403f052 more pmi return code wtf
This commit was SVN r25337.
2011-10-20 17:53:04 +00:00
Ralph Castain
72a4b0bd8a Fix constants
This commit was SVN r25325.
2011-10-19 14:14:58 +00:00
Ralph Castain
0bf4f48aa3 Don't need priority in this framework
This commit was SVN r25308.
2011-10-17 22:39:15 +00:00
Ralph Castain
8f0ef54130 Complete implementation of pmi support. Ensure we support both mpirun and direct launch within same configuration to avoid requiring separate builds. Add support for generic pmi, not just under slurm. Add publish/subscribe support, although slurm's pmi implementation will just return an error as it hasn't been done yet.
This commit was SVN r25303.
2011-10-17 20:51:22 +00:00