Commit Graph

102 Commits

Author SHA1 Message Date
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
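
The two-field name described above can be pictured with a minimal C sketch; the field names below are illustrative assumptions, since the commit only specifies a struct of two uint32_t's.

    #include <stdint.h>

    /* Hedged sketch of the process name pushed down to OPAL: two uint32_t
     * fields, so it can be copied and compared directly at the OPAL layer
     * without memcpy'ing between OMPI/ORTE and OPAL representations.
     * Field names are illustrative assumptions. */
    typedef struct opal_process_name_t {
        uint32_t jobid;  /* job/namespace the process belongs to */
        uint32_t vpid;   /* rank of the process within that job */
    } opal_process_name_t;
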
Nadezhda Kogteva
2bce929330 MTL MXM cleanup: unnecessary OMPI_MTL_MXM_CONNECT_ON_FIRST_COMM variable removed 2014-10-20 10:29:47 +03:00
Vasily Filipov
a215a4831d MTL/MXM: disable "bulk_connect" by default. 2014-10-13 09:47:56 +03:00
Ralph Castain
41c6058153 Bring over changes to MXM from pmix branch:
MTL MXM: establish endpoint connection on the first communication when direct_modex used

This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “pmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs (sketched after this entry). This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint.

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
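
As a rough illustration of the signature-based collectives described above, the collective key can be modeled as the sorted set of participating process names rather than a single global id (a minimal sketch; the struct and field names are assumptions, not taken from the commit):

    #include <stddef.h>
    #include <stdint.h>

    /* Hedged sketch: a collective is identified by the set of participants,
     * so any combination of procs may run its own collective, but only one
     * collective per unique combination can be active at a time. */
    typedef struct {
        uint32_t jobid;          /* illustrative two-field process name */
        uint32_t vpid;
    } proc_name_t;

    typedef struct {
        proc_name_t *signature;  /* sorted array of participating proc names */
        size_t       sz;         /* number of participants */
    } coll_signature_t;
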
Alina Sklarevich
a914c68356 MTL MXM: fix check-help-string.pl errors and warnings.
This commit was SVN r32533.
2014-08-14 07:46:56 +00:00
Vasily Filipov
5ca2fffa44 MTL/MXM: call ompi_proc_world instead of ompi_comm_size during del_procs.
This commit was SVN r32504.
2014-08-11 11:52:23 +00:00
Mike Dubman
3c8a4d7d2d mxm: opal refactoring voices
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32486.
2014-08-10 04:35:56 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communication are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes.

UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with a few exceptions (mainly BTLs that I have no way of compiling/testing). Thus, the completion of this RFC is tied to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Mike Dubman
da8df859b3 MXM: use bulk connection establishment API
fixed by Vasily, reviewed by Yossi/Miked

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32256.
2014-07-17 08:35:55 +00:00
Mike Dubman
e342a11c2e opal envlist mca: implement Jeff's quibbles
fixed by Elena, reviewed by Miked

This commit was SVN r32216.
2014-07-11 07:23:20 +00:00
Joshua Ladd
057370364d Opal: Add a new MCA variable type "version_string". Also add a
new flag to ompi_info that allows a user to print all MCA variables of a specific type.  

 --type version_string

This command will print all MCA variables of type version_string.

This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.

This commit was SVN r32166.
2014-07-09 01:37:23 +00:00
Mike Dubman
b51a42aeca MXM: fix mxm cleanup, should be called for any compat API
fixed by miked, reviewed by yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31984.
2014-06-12 15:46:38 +00:00
Alina Sklarevich
7b8ad47e93 MXM: fix the env variable name used to hint at thread usage in mxm
reviewed by MikeD
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31968.
2014-06-09 06:40:32 +00:00
Alina Sklarevich
f8a664f5ec MXM: generate the jobid only for MXM versions under v2.0.
reviewed by miked
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31910.
2014-06-01 13:29:24 +00:00
Mike Dubman
fad1063980 MXM: fix warning
reviewed by Yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31855.
2014-05-21 07:50:05 +00:00
Yossi Etigin
6aa5680059 Revert r30966.
cmr=v1.8.1:reviewer=ompi-gk1.8

This commit was SVN r31593.

The following SVN revision numbers were found above:
  r30966 --> open-mpi/ompi@280e96c99a
2014-05-01 22:17:09 +00:00
Mike Dubman
a4990de055 mca: track external lib version (runtime/compiletime) for mca component
based on thread: http://www.open-mpi.org/community/lists/devel/2014/04/14505.php

Create mca parameter to track runtime/compiletime ext lib version for component.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31487.
2014-04-22 18:02:26 +00:00
Mike Dubman
6f057e57ba MXM: enable on-demand mapping only for the MPI mxm context
fixed by Devender, reviewed by Yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31463.
2014-04-20 09:15:37 +00:00
Alina Sklarevich
5cbf085dc2 mtl mxm: silence a warning.
in ompi_mtl_mxm_add_procs, define the ep_index variable only
for an older version of mxm.

submitted by Alina, reviewed by Mike.
cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31245.
2014-03-27 08:39:51 +00:00
Ralph Castain
e4efd5675f Per telecon, add comment indicating this needs to be fixed
Refs trac:4354

This commit was SVN r30991.

The following Trac tickets were found above:
  Ticket 4354 --> https://svn.open-mpi.org/trac/ompi/ticket/4354
2014-03-11 15:57:11 +00:00
Yossi Etigin
280e96c99a In mtl_mxm, don't disconnect from a proc with refcount > 1.
This will keep the connection until mxm endpoint is destroyed.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30966.
2014-03-09 08:35:44 +00:00
Mike Dubman
05ee929832 OMPI-MXM: handle multiple calls to add_procs() in MXM
- now add_procs can be called more than once (during MPI_INIT and Inter_Comm_Create)
- adjust MXM to this reality

fixed by Alina, reviewed by Yossi/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30907.
2014-03-03 13:50:37 +00:00
Mike Dubman
49ee63f4b8 MXM: do not enforce version check
- MXM uses the libtool versioning scheme, which is sufficient; no additional check is needed in OMPI

reviewed by yossi

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30768.
2014-02-18 19:44:37 +00:00
Yossi Etigin
7564e2c13f Fix a recursion in the mxm send flow which happens when MPI starts a new send from the context of a send completion callback.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30265.
2014-01-12 17:47:03 +00:00
Alina Sklarevich
2869ff1782 mxm: fixes for compilation warnings.
removed set-but-unused variables and one entirely unused variable.

reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30176.
2014-01-09 15:15:14 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Yossi Etigin
6ab4aba9e6 Fix missing include of show_help.h in mtl mxm.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29987.
2013-12-19 19:37:21 +00:00
Yossi Etigin
a913b00f89 mtl mxm: update configuration parsing api to mxm 2.1, drop
older version support (1.0 and 1.1), and cleanup the code.

reviewed by miked.

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29797.
2013-12-04 09:11:55 +00:00
Mike Dubman
432c10750a enable mxm2 from np>0
reviewed by yossi
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29178.
2013-09-16 12:36:28 +00:00
Mike Dubman
44bfa95553 enable mxm2 by default on np>=0
reviewed by yossi
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29177.
2013-09-16 12:32:29 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Ralph Castain
45e695928f As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time:
* add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit.

* remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" checks (see the fragment after this entry)

* modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded

* removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base

* added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames

This commit was SVN r29052.
2013-08-20 18:59:36 +00:00
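
A minimal fragment of the "if NULL" protection referenced above (illustrative only; the surrounding code and the message text are assumed):

    /* proc_hostname may legitimately be NULL when hostnames were not
     * exchanged (allocation at or above orte_hostname_cutoff), so every
     * direct use is guarded. */
    const char *host = (NULL != proc->proc_hostname) ? proc->proc_hostname
                                                     : "<unknown>";
    opal_output(0, "peer is on host %s", host);
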
Ralph Castain
611d7f9f6b When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times.

Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:

* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally

* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test

* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).

Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it

Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.

This commit was SVN r29040.
2013-08-17 00:49:18 +00:00
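
A hypothetical sketch of the fetch-once, decode-everything behaviour described in the first bullet above; both helper names are invented for illustration and are not OMPI or PMI APIs:

    #include <stdlib.h>

    /* Hypothetical helpers: one fetch returns the remote proc's entire
     * encoded blob; every decoded key/value pair is pushed into the local
     * db hash store so later lookups for this proc never hit PMI again. */
    extern char *pmi_fetch_blob(const char *proc_key);              /* invented */
    extern void  decode_and_store_all(const char *key, char *blob); /* invented */

    static int fetch_and_cache_proc(const char *proc_key)
    {
        char *blob = pmi_fetch_blob(proc_key);
        if (NULL == blob) {
            return -1;                 /* nothing published for this proc */
        }
        decode_and_store_all(proc_key, blob);
        free(blob);
        return 0;
    }
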
Aurelien Bouteiller
e1066143a4 rename ompi_free_list operations to _mt, as per discussions at last face to face meeting
This commit was SVN r28734.
2013-07-08 22:07:52 +00:00
George Bosilca
c9e5ab9ed1 Our macros for the OMPI-level free list had one extra argument, a possible return
value to signal that the operation of retrieving the element from the free list
failed. However, in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
picky mode is on.

The attached patch removes the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and changes the callers to check whether the item is
NULL instead of using the return code.

This commit was SVN r28722.
2013-07-04 08:34:37 +00:00
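
The change reads as a small before/after at a typical call site (an illustrative fragment, not a standalone program; the surrounding declarations are omitted):

    /* before: the macro took a redundant return-code argument */
    OMPI_FREE_LIST_GET(&flist, item, rc);
    if (OMPI_SUCCESS != rc) {
        return OMPI_ERR_OUT_OF_RESOURCE;
    }

    /* after: the rc argument is gone; callers test the returned item for NULL */
    OMPI_FREE_LIST_GET(&flist, item);
    if (NULL == item) {
        return OMPI_ERR_OUT_OF_RESOURCE;
    }
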
Mike Dubman
d1c82994be fix: detect the threading model to take the appropriate flow in mxm
This commit was SVN r28648.
2013-06-16 08:40:06 +00:00
Yossi Etigin
64d98e0438 Fix data corruption in MXM by registering to OPAL memory release hooks and removing any mappings created by mxm
This commit was SVN r28489.
2013-05-14 12:27:44 +00:00
Alex Margolin
aebd794bf6 Fixed macro definition order in MXM component headers
This commit was SVN r28378.
2013-04-24 16:51:43 +00:00
Alex Margolin
0ab7675019 Fix MXM connection establishment flow
This commit was SVN r28329.
2013-04-12 16:37:42 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file cannot be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few components actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
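
A hedged sketch of registering a component-level variable under the new system, combining the backing-store and MCA_BASE_VAR_FLAG_SETTABLE notes above (an illustrative fragment; the helper function and variable names are assumptions):

    #include "opal/mca/base/mca_base_var.h"

    /* The backing store must live at least as long as the variable. */
    static int example_priority = 10;

    static void register_example_vars(const mca_base_component_t *component)
    {
        /* SETTABLE allows later changes through mca_base_var_set_value(). */
        (void) mca_base_component_var_register(component, "priority",
                                               "Priority of the example component",
                                               MCA_BASE_VAR_TYPE_INT,
                                               NULL, 0,
                                               MCA_BASE_VAR_FLAG_SETTABLE,
                                               OPAL_INFO_LVL_9,
                                               MCA_BASE_VAR_SCOPE_LOCAL,
                                               &example_priority);
    }
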
Vasily Filipov
f897c8a1e0 MTL MXM: STREAM support for isend and irecv.
This commit was SVN r28122.
2013-02-27 13:21:30 +00:00
Vasily Filipov
52a9241859 MTL MXM: adapt to mxm 2.0 api changes - flags are only for send requests, and SYNC is part of the opcode.
This commit was SVN r28069.
2013-02-17 10:04:19 +00:00
Vasily Filipov
8270d8f52a MTL MXM: add #include "opal/util/show_help.h".
This commit was SVN r28068.
2013-02-17 09:51:03 +00:00
Brian Barrett
312f37706e In talking about this with Jeff and Ralph, we don't actually need
ompi_show_help, because opal_show_help is replaced with an 
aggregating version when using ORTE, so there's no reason to
directly call orte_show_help.

This commit was SVN r28051.
2013-02-12 21:10:11 +00:00
Vasily Filipov
21b170b43b MTL MXM: push commit r27987 back, now with the right user.
r27987 - MTL MXM: ver. 2.0 interface changes.

This commit was SVN r28026.

The following SVN revision numbers were found above:
  r27987 --> open-mpi/ompi@2735658d81
2013-02-04 06:59:24 +00:00
Vasily Filipov
aa5e436479 Revert revision r27986; it was submitted with the wrong user name.
This commit was SVN r28025.

The following SVN revision numbers were found above:
  r27986 --> open-mpi/ompi@729caaf0cd
2013-02-04 06:54:24 +00:00
Pavel Shamis
2735658d81 MTL MXM: ver. 2.0 interface changes.
This commit was SVN r27987.
2013-01-31 08:38:08 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept separate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components were
narrowed to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00