1
1
Граф коммитов

200 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
cf9796accd Remove the old configure option for disabling full rte support - we now use the OMPI rte framework for such purposes
This commit was SVN r28134.
2013-02-28 01:35:55 +00:00
Ralph Castain
8d2fa3693b First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package.
This commit was SVN r28116.
2013-02-26 20:44:56 +00:00
Ralph Castain
bd9265c560 Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data.
A few changes were required to support this move:

1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution).

2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time.

3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you seperately specify the "store" and "fetch" ordering.

4. the ORTE grpcomm and ess/pmi components, and the nidmap code,  were updated to work with the new db framework and to specify internal/global storage options.

No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework.

This commit was SVN r28112.
2013-02-26 17:50:04 +00:00
Ralph Castain
744ed49b2d Begin cleanup of the thread_lock calls in ORTE. We'll ignore the ones in the rml/oob for now as that code block is being rewritten anyway.
This commit was SVN r28053.
2013-02-13 01:53:12 +00:00
Ralph Castain
cfaefb3286 Remove the only place where PMI was used outside a component, and relocate that code to common/pmi.
This commit was SVN r27944.
2013-01-28 20:14:51 +00:00
Ralph Castain
c65de32218 Cleanup the PMI subsystems to support Sam's "rml-less" shared memory wireup. Only retrieve keys that are specifically requested, and only when they are requested. Let string values be segmented across multiple keys, but don't do it for anything else.
This commit was SVN r27737.
2013-01-03 02:16:10 +00:00
Ralph Castain
c26ed7dcdd Fix comm_spawn when ORTE progress thread is enabled by ensuring that all operations on the global list of active collectives are done in events to avoid conflicts.
This commit was SVN r27658.
2012-12-09 02:53:20 +00:00
Ralph Castain
26f1cd0909 Fix compiler warnings
This commit was SVN r27588.
2012-11-12 02:50:45 +00:00
Nathan Hjelm
bdedd8b0d3 Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality.

This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Brian Barrett
e61c00212d Add files found in svn but not tarball
This commit was SVN r27549.
2012-11-01 02:27:03 +00:00
Ralph Castain
c4fd3df2df Remove unused variables
This commit was SVN r27319.
2012-09-12 12:03:24 +00:00
Ralph Castain
c82cfecc1c Cleanup comm_spawn for the multi-node case where at least one new process isn't spawned on every node. Avoid the complexities of trying to execute a daemon collective across the dynamic spawn as it becomes too hard to ensure that all daemons participate or are accounted for - instead, use a less scalable but workable solution of sending the data directly between the participating procs. Ensure that singletons get their collectives properly defined at startup so the spawned "HNP" is ready for them.
As a secondary cleanup, the HNP doesn't need to update its nidmap during an xcast as it already has an up-to-date picture of the situation. So just dump that data and move along.

This commit was SVN r27318.
2012-09-12 11:31:36 +00:00
Ralph Castain
efc4a40c8a It is okay for a key not to be found
This commit was SVN r27187.
2012-08-30 15:12:23 +00:00
Ralph Castain
6dbb7a8493 Just because a peer didn't post a particular pmi key, that doesn't mean it is an immediate irrecoverable error - could be they just don't have a matching interface. Let the upper layer decide what to do about it.
This commit was SVN r27186.
2012-08-30 14:12:09 +00:00
Ralph Castain
a3b08f5800 Fix a few things relating to comm_spawn that causes new daemons to be launched. Ensure that all new daemons receive a full pidmap. Properly mark the daemon job as "updated" when daemons are added
This commit was SVN r27177.
2012-08-29 03:11:37 +00:00
Ralph Castain
98580c117b Introduce staged execution. If you don't have adequate resources to run everything without oversubscribing, don't want to oversubscribe, and aren't using MPI, then staged execution lets you (a) run as many procs as there are available resources, and (b) start additional procs as others complete and free up resources. Adds a new mapper as well as a new state machine.
Remove some stale configure.m4's we no longer need.

Optimize the nidmaps a bit by only sending info that has changed each time, instead of sending a complete copy of everything. Makes no difference for the typical MPI job - only impacts things like staged execution where we are sending multiple (possibly many) launch messages.

This commit was SVN r27165.
2012-08-28 21:20:17 +00:00
Ralph Castain
d310dd8c58 Fix a strange race condition by creating a separate buffer for each send - apparently, just a retain isn't enough protection on some systems
This commit was SVN r27161.
2012-08-28 17:17:34 +00:00
Ralph Castain
11c68e2299 Correct the count in the pmi key
This commit was SVN r27156.
2012-08-28 15:05:02 +00:00
Ralph Castain
64cf75cec5 Add some debug
This commit was SVN r27087.
2012-08-17 02:19:26 +00:00
Ralph Castain
c4ee297a60 Cleanup the pmi grpcomm module so it passes non-btl modex data correctly.
This commit was SVN r26992.
2012-08-10 20:35:50 +00:00
Ralph Castain
0dfe29b1a6 Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required.
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.

Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.

This commit was SVN r26678.
2012-06-27 14:53:55 +00:00
Brian Barrett
b22faedd9d Remove the Portals4 SHMEM reference implementation runtime support, as we're
no longer using the runtime provided by the reference implementation.

Remove the Catamount support from ORTE, since we're no longer supporting
Catamount.  Left the Catamount timer component, because I'm not sure whether
it's used on the XTs running CNL.

This commit was SVN r26677.
2012-06-27 14:17:43 +00:00
Shiqing Fan
6f746cdb33 remove a unused file.
This commit was SVN r26645.
2012-06-25 10:17:21 +00:00
Ralph Castain
0a713cd27e Add database framework to ORTE and refactor modex code to utilize it. Create the "hash" db component from the prior modex db code. Leave the other components ignored for now - will activate them later.
Modex is still a blocking operation at this point.

This commit was SVN r26618.
2012-06-19 13:38:42 +00:00
Ralph Castain
9e0bb6ae28 Revert r26600 and r26601 for a couple of reasons:
1. they modified the OMPI-ORTE interface, which is something I promised to avoid doing unless absolutely necessary, and

2. the framework ident is already in the component name key provided to the modex db. What is missing is the project ident, but as Jeff and I discussed last week, we really need to add that field to the component struct anyway to avoid multi-project collisions on framework names. That will be done over the next couple of weeks as a separate effort.

This commit was SVN r26613.

The following SVN revision numbers were found above:
  r26600 --> open-mpi/ompi@5ba4deff07
  r26601 --> open-mpi/ompi@0e3094c318
2012-06-16 09:11:03 +00:00
Ralph Castain
96c778656a Improve launch performance on clusters that use dedicated nodes by instructing the orteds to use the same port as the HNP, thus allowing them to "rollup" their initial callback via the routed network. This substantially reduces the HNP bottleneck and the number of ports opened by the HNP.
Restore enable-static-ports option by default - the Cray will have to disable it to get around their library issues, but that's just a warning problem as opposed to blocking the build.

This commit was SVN r26606.
2012-06-15 10:15:07 +00:00
Ralph Castain
0e3094c318 Update the other grpcomm modules to new API
This commit was SVN r26601.
2012-06-14 03:28:48 +00:00
Ralph Castain
5ba4deff07 Extend the modex database to support multiple projects and frameworks that might have duplicate component names. No visible API change in the BTL's as it was executed solely in the ompi modex code.
This commit was SVN r26600.
2012-06-14 02:55:06 +00:00
Brian Barrett
7406ef1241 Make all the PMI components depend on the common pmi library and properly
install the common pmi library

This commit was SVN r26588.
2012-06-11 15:58:09 +00:00
Shiqing Fan
2abf783fa0 Remove a unnecessary definition before the real one.
This commit was SVN r26562.
2012-06-06 14:15:39 +00:00
Nathan Hjelm
b9959a95cd ack! one more
This commit was SVN r26472.
2012-05-22 20:52:52 +00:00
Nathan Hjelm
cdc3c87ba6 move pmi init/finalize into a common component
This commit was SVN r26470.
2012-05-22 15:15:39 +00:00
Ralph Castain
c4f8043064 Per Nathan, with a little cleanup by me: update the PMI support to aggregate modex info, thus reducing the number of keys required so it fits within Cray default constraints
This commit was SVN r26456.
2012-05-19 16:12:52 +00:00
Ralph Castain
b2f77bf08f Extend the iof by adding two new components to support map-reduce IO chaining. Add a mapreduce tool for running such applications.
Fix the state machine to support multiple jobs being simultaneously launched as this is not only required for mapreduce, but can happen under comm-spawn applications as well.

This commit was SVN r26380.
2012-05-02 21:00:22 +00:00
Ralph Castain
7741ba47be Fix comm_spawn that spans multiple nodes
This commit was SVN r26268.
2012-04-13 01:59:07 +00:00
Ralph Castain
4d16790836 Fix collectives for jobs running across partial allocations
This commit was SVN r26267.
2012-04-13 00:38:47 +00:00
Ralph Castain
9cd4c06488 Get things to build and run when --disable-orte is specified
This commit was SVN r26263.
2012-04-10 21:50:01 +00:00
Ralph Castain
f5cd996b91 Fix the case where n=1
This commit was SVN r26258.
2012-04-09 22:44:56 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Ralph Castain
366f9d1518 Add some missing localities to the hwloc pretty-print, fix pmi modex
This commit was SVN r26105.
2012-03-06 06:21:10 +00:00
Ralph Castain
a0edae52f2 Ensure the wrapper flags get entered in the right order, with -lpmi coming before the alps util libs
This commit was SVN r25809.
2012-01-27 20:56:21 +00:00
Ralph Castain
be3dfb6a1a Ensure that we only add -lpmi once to the wrapper compilers, no matter how many components might use it.
This commit was SVN r25753.
2012-01-20 04:56:38 +00:00
Ralph Castain
9b59d8de6f This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build.
Brian dealt with this in the past by creating platform files and using "no-build" to block the components. This was clunky, but acceptable when only one organization was using that option. However, that number has now expanded to at least two more locations.

Accordingly, make --without-rte-support actually work by adding appropriate configury to prevent components from building when they shouldn't. While doing so, remove two frameworks (db and rmcast) that are no longer used as ORCM comes to a close (besides, they belonged in ORCM now anyway). Do some minor cleanups along the way.

This commit was SVN r25497.
2011-11-22 21:24:35 +00:00
Ralph Castain
6310361532 At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here:
https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement

The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation.

In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions:

1. I have at long last bowed my head in submission to the system admin's of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior.

2. both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation.

3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so.

As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes.

This commit was SVN r25476.
2011-11-15 03:40:11 +00:00
Ralph Castain
c8e105bd8c Remove stale code
This commit was SVN r25475.
2011-11-14 23:39:23 +00:00
Ralph Castain
d008aeb531 Silence debug
This commit was SVN r25471.
2011-11-11 16:42:45 +00:00
Ralph Castain
a3ce355a60 Revert r25453 and r25450 until we can fix the libevent2013 configure code - still not getting the includedir to eval correctly.
This commit was SVN r25454.

The following SVN revision numbers were found above:
  r25450 --> open-mpi/ompi@7f7d5c4f1f
  r25453 --> open-mpi/ompi@c9fe8c32e2
2011-11-07 16:23:44 +00:00
Samuel Gutierrez
e03bc93fb7 only use pmi grpcomm and pubsub during the direct launch case. use PMI environment variable to setup vpid in ess alps on cray xe systems. add pmi test code.
This commit was SVN r25447.
2011-11-06 17:28:40 +00:00
Ralph Castain
3e4165fd8d Cleanup includes
This commit was SVN r25418.
2011-11-02 18:28:28 +00:00
Ralph Castain
55b996678e Minor indentation changes
This commit was SVN r25414.
2011-11-02 15:56:56 +00:00