1
1
Граф коммитов

25 Коммитов

Автор SHA1 Сообщение Дата
Nysal Jan K.A
2d2ea63231 Fix PMI and PMI2 builds 2015-08-11 21:13:45 +05:30
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Jeff Squyres
0166318966 opal_check_pmi: protect un-prefixed shell variables
Since there's unfortunately only a global namespace for shell
variables, we need to protect un-prefixed shell variables with
OPAL_VAR_SCOPE_PUSH/POP.
2015-03-13 04:48:31 -07:00
Ralph Castain
1c1d9f69f6 Update the PMI configury to support addition of --with-pmi-libdir option for environments that install the PMI libs in a non-default location 2015-02-27 10:48:28 -08:00
Howard Pritchard
67b0f8de7f separate out cray pmi config
The mixing of the Slurm PMI and Cray PMI configure was getting
messy and dangerous - developers working on Slurm PMI often don't
have access to Cray PMI, etc.

This mod pulls out the Cray PMI configure into a separate m4 file.
Cray pmi is now configured as follows:

1) on Cray CLE 5 and higher, Cray PMI is auto detected.  pkg-config
   is used to resolve the necessary CPP flags, link flags and libs,
   etc.  Nothing needs to be added to the configure line to pick up
   Cray PMI.

2) on legacy Cray CLE 4 systems with PMI 4.X, Cray PMI is also
   auto detected.

3) on legacy Cray CLE 4 systems with PMI 5.X Cray PMI can't be auto-detected
   owing to changes in the PMI pkg-config file which result in pkg-config
   returning an error owing to a dependency of PMI on newer versions of ALPS
   installs that are not present on CLE 4. So, for those falling in to this
   situation, the --with-cray-pmi=(DIR) method needs to be used.

   DIR specifies the Cray PMI install directory.  The configure file looks
   for required alps libraries first in /usr/lib/alps, then in
   /opt/cray/xe-sysroot/default/usr/lib/alps.
2014-11-04 15:25:25 -07:00
Jeff Squyres
81dafbdba1 opal_check_pmi.m4: trivial whitespace cleanup; no code changes 2014-11-01 04:02:23 -07:00
Ralph Castain
4f0c1ae8d9 Continue cleanup of the PMI config code. Eliminate the multiple calls to check for pmi1 and pmi2 - we must check it only once to get the pmix components to build only in the correct situations. Ensure we set the wrapper flags so we handle static builds correctly. 2014-10-27 20:37:33 -07:00
Gilles Gouaillardet
8da6c14e06 configury: fix a typo in config/opal_check_pmi.m4 2014-10-24 13:31:36 +09:00
Ralph Castain
75d8a7f25b Refactor the PMI configure check logic to be a lot cleaner and simpler. 2014-10-23 18:35:18 -07:00
Nathan Hjelm
083a659217 Correct some typos in Cray PMI detection 2014-10-14 10:28:36 -06:00
Nathan Hjelm
169a1866b8 Modify Cray PMI check to detect PMI on older systems 2014-10-09 17:01:31 -06:00
Ralph Castain
b1a58726ac Cleanup the PMI m4 syntax with respect to -a, and look for libpmi* so we can pickup both .a, .la, and whatever other extensions that particular system might use. 2014-10-09 14:04:43 -07:00
Nadezhda Kogteva
ffa8674e01 Fix bugs in PMI configure: set correct include path, fix test command with multiple conditions. 2014-10-09 17:23:56 +03:00
Ralph Castain
9c027e6def Update the PMI configure logic to handle the oddball case where both lib and lib64 may exist, and the required files may be in one or the other of them. 2014-10-07 10:20:46 -07:00
Ralph Castain
9e35f80ab6 Don't multiply define WANT_PMI_SUPPORT and friends. Turns out they weren't being used anywhere anyway, so no point in defining them at all
This commit was SVN r32822.
2014-09-30 20:43:25 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Ralph Castain
3b64c603b4 First stage of RFC to rename OMPI_foo build system support: change OMPI_CHECK_PACKAGE -> OPAL_CHECK_PACKAGE
This commit was SVN r31582.
2014-05-01 14:24:56 +00:00
Ralph Castain
f259d50ed7 Fully fix the PMI2 warning - turned out to be larger than originally thought due to the way the function was being handled across multiple files. Properly resolve the problem by not compiling the file if PMI2 is not desired, and then appropriately setting the visibility of the function within the module
Refs trac:4400

This commit was SVN r31084.

The following Trac tickets were found above:
  Ticket 4400 --> https://svn.open-mpi.org/trac/ompi/ticket/4400
2014-03-17 17:36:37 +00:00
Ralph Castain
af4a9a0688 Make clear that --with-pmi can/should be used to specify the path to the pmi installation since at least one person didn't realize it.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30439.
2014-01-27 22:50:37 +00:00
Ralph Castain
e6199da2e7 Fixes trac:3486 - prevent opal_check_pmi from bleeding CPPFLAGS
This commit was SVN r28940.

The following Trac tickets were found above:
  Ticket 3486 --> https://svn.open-mpi.org/trac/ompi/ticket/3486
2013-07-24 03:53:23 +00:00
Ralph Castain
5f520e241b Ensure we get both -lpmi and -lpmi2 when the libs are separate
This commit was SVN r28795.
2013-07-16 14:57:18 +00:00
Ralph Castain
e8340b6339 There is no convention out there as to how OEMs handle PMI2 functions. Some put them in their own -lpmi2 library, and some don't. Some have split the PMI2 definitions into a pmi2.h and keep the PMI-1 definitions in a separate pmi.h, and some don't.
Try to handle cases more generally so at least Slurm and Cray can co-exist in peace.

This commit was SVN r28672.
2013-06-26 00:43:26 +00:00
Ralph Castain
fa943dc6ff Cleanup a few things in the revised PMI configury - we know slurm has both pmi and pmi2 libs, so just auto-detect the presence of them if the user directed us to build with pmi support.
Also cleanup some changed names in the alps code

This commit was SVN r28670.
2013-06-24 02:41:40 +00:00
Joshua Ladd
0b5c1f2ea8 Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fallback to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
Ralph Castain
bd9265c560 Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data.
A few changes were required to support this move:

1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution).

2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time.

3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you seperately specify the "store" and "fetch" ordering.

4. the ORTE grpcomm and ess/pmi components, and the nidmap code,  were updated to work with the new db framework and to specify internal/global storage options.

No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework.

This commit was SVN r28112.
2013-02-26 17:50:04 +00:00