openmpi

Author	SHA1	Message	Date
Yossi Etigin	280e96c99a	In mtl_mxm, don't disconnect from a proc with refcount > 1. This will keep the connection until mxm endpoint is destroyed. cmr=v1.7.5:reviewer=ompi-rm1.7 This commit was SVN r30966.	2014-03-09 08:35:44 +00:00
Mike Dubman	05ee929832	OMPI-MXM: handle multiple calls to add_procs() in MXM - now add_procs can be called more than once (during MPI_INIT and Inter_Comm_Create) - adjust MXM to this reality fixed by Alina, reviewed by Yossi/Mike cmr=v1.7.5:reviewer=ompi-rm1.7 This commit was SVN r30907.	2014-03-03 13:50:37 +00:00
Mike Dubman	49ee63f4b8	MXM: do not enforce version check - MXM uses libtool versioning scheme which is enough, no need additional in OMPI reviewed by yossi cmr=v1.7.5:reviewer=ompi-rm1.7 This commit was SVN r30768.	2014-02-18 19:44:37 +00:00
Yossi Etigin	7564e2c13f	Fix a recursion in mxm send flow which happens when mpi starts a new send from the context of send completion callback. cmr=v1.7.5:reviewer=jsquyres This commit was SVN r30265.	2014-01-12 17:47:03 +00:00
Alina Sklarevich	2869ff1782	mxm: fixes for compilation warnings. removed set but not used variables and a variable that is unused. reviewed by miked cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r30176.	2014-01-09 15:15:14 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Yossi Etigin	6ab4aba9e6	Fix missing include of show_help.h in mtl mxm. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29987.	2013-12-19 19:37:21 +00:00
Yossi Etigin	a913b00f89	mtl mxm: update configuration parsing api to mxm 2.1, drop older version support (1.0 and 1.1), and cleanup the code. reviewed by miked. cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r29797.	2013-12-04 09:11:55 +00:00
Mike Dubman	432c10750a	enable mxm2 from np>0 reviewed by yossi cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29178.	2013-09-16 12:36:28 +00:00
Mike Dubman	44bfa95553	enable mxm2 by default on np>=0 reviewed by yossi cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29177.	2013-09-16 12:32:29 +00:00
Brian Barrett	16a1166884	Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a configure-time dynamic allocation of flags. The net result for platforms which only support BTL-based communication is a reduction of 8*nprocs bytes per process. Platforms which support both MTLs and BTLs will not see a space reduction, but will now be able to safely run both the MTL and BTL side-by-side, which will prove useful. This commit was SVN r29100.	2013-08-30 16:54:55 +00:00
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. 
Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Aurelien Bouteiller	e1066143a4	rename ompi_free_list operations to _mt, as per discussions at last face to face meeting This commit was SVN r28734.	2013-07-08 22:07:52 +00:00
George Bosilca	c9e5ab9ed1	Our macros for the OMPI-level free list had one extra argument, a possible return value to signal that the operation of retrieving the element from the free list failed. However in this case the returned pointer was set to NULL as well, so the error code was redundant. Moreover, this was a continuous source of warnings when the picky mode is on. The attached patch removes the rc argument from the OMPI_FREE_LIST_GET and OMPI_FREE_LIST_WAIT macros, and changes the code to check whether the item is NULL instead of using the return code. This commit was SVN r28722.	2013-07-04 08:34:37 +00:00
Mike Dubman	d1c82994be	fix: detect threading model to take appropriate flow in mxm This commit was SVN r28648.	2013-06-16 08:40:06 +00:00
Yossi Etigin	64d98e0438	Fix data corruption in MXM by registering to OPAL memory release hooks and removing any mappings created by mxm This commit was SVN r28489.	2013-05-14 12:27:44 +00:00
Alex Margolin	aebd794bf6	Fixed macro definition order in MXM component headers This commit was SVN r28378.	2013-04-24 16:51:43 +00:00
Alex Margolin	0ab7675019	Fix MXM connection establishment flow This commit was SVN r28329.	2013-04-12 16:37:42 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file can not be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char *, int , or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few component actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump(). 
opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
Vasily Filipov	f897c8a1e0	MTL MXM: STREAM supporting for isend and irecv. This commit was SVN r28122.	2013-02-27 13:21:30 +00:00
Vasily Filipov	52a9241859	MTL MXM: adapt to mxm 2.0 api changes - flags are only for send requests, and SYNC is part of the opcode. This commit was SVN r28069.	2013-02-17 10:04:19 +00:00
Vasily Filipov	8270d8f52a	MTL MXM: add #include "opal/util/show_help.h". This commit was SVN r28068.	2013-02-17 09:51:03 +00:00
Brian Barrett	312f37706e	In talking about this with Jeff and Ralph, we don't actually need ompi_show_help, because opal_show_help is replaced with an aggregating version when using ORTE, so there's no reason to directly call orte_show_help. This commit was SVN r28051.	2013-02-12 21:10:11 +00:00
Vasily Filipov	21b170b43b	MTL MXM: push commit r27987 back, now with right user. r27987 - MTL MXM: ver. 2.0 interface changes. This commit was SVN r28026. The following SVN revision numbers were found above: r27987 --> open-mpi/ompi@2735658d81	2013-02-04 06:59:24 +00:00
Vasily Filipov	aa5e436479	Revert revision r27986 because it was submitted with the wrong user name. This commit was SVN r28025. The following SVN revision numbers were found above: r27986 --> open-mpi/ompi@729caaf0cd	2013-02-04 06:54:24 +00:00
Pavel Shamis	2735658d81	MTL MXM: ver. 2.0 interface changes. This commit was SVN r27987.	2013-01-31 08:38:08 +00:00
Brian Barrett	b8442ba505	Revamp the handling of wrapper compiler flags. The user flags, main configure flags, and mca flags are kept separate until the very end. The main configure wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD macro. MCA components should either let <framework>_<component>_{LIBS,LDFLAGS} be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}. The situations in which WRAPPER CPPFLAGS can be set by MCA components were made very small to match the one use case where it makes sense. This commit was SVN r27950.	2013-01-29 00:00:43 +00:00
Brian Barrett	f42783ae1a	Move the RTE framework change into the trunk. With this change, all non-CR runtime code goes through one of the rte, dpm, or pubsub frameworks. This commit was SVN r27934.	2013-01-27 23:25:10 +00:00
Mike Dubman	a454341e2b	add support for mxm 2.0 This commit was SVN r27661.	2012-12-09 22:58:37 +00:00
Aleksey Senin	ae92f64842	Check that MXM runtime version match compiled. Reviewed by Mike Dubman. This commit was SVN r27575.	2012-11-07 14:44:33 +00:00
Aleksey Senin	33ae1fe6c7	Fix uninitialized return code in ompi_mtl_mxm_add_procs function. This commit was SVN r27216.	2012-09-02 13:17:49 +00:00
Aleksey Senin	68e0894a58	MXM send/recv request changes. Adapt OMPI to the latest MXM changes in send/recv request. Use memory handle structure instead of memory key. This commit was SVN r27155.	2012-08-28 05:57:36 +00:00
Yael Dayan	79e6b9c91d	Adapt OMPI to use newer version of MXM. This commit was SVN r26974.	2012-08-08 15:29:38 +00:00
Yael Dayan	954bcdc0a5	adapt the way to find amount of local processes to OMPI trunk. This commit was SVN r26973.	2012-08-08 15:26:28 +00:00
Vasily Filipov	fc712182db	MTL MXM: make MXM use MXM_VERSION macro for MXM version checking. This commit was SVN r26952.	2012-08-06 06:35:57 +00:00
Vasily Filipov	c386847d9a	MTL MXM: Adding MXM version protect for Mprobe, Mrecv resources. This commit was SVN r26922.	2012-07-31 07:57:25 +00:00
Vasily Filipov	4e66ff030b	MTL MXM Mrecv: adding missed return message to a free list. This commit was SVN r26870.	2012-07-26 11:22:22 +00:00
Vasily Filipov	ef9bd8e4cb	MTL MXM: MPI_Mprobe, MPI_Mrecv implementation for MXM adding. This commit was SVN r26866.	2012-07-25 13:26:40 +00:00
Mike Dubman	4784253f5c	revert commit, breaks backwards compatibility, will be revised This commit was SVN r26852.	2012-07-24 11:48:18 +00:00
Vasily Filipov	99bd5977bd	MTL MXM: small fix in the mxm_req_probe func interface. This commit was SVN r26850.	2012-07-24 08:46:38 +00:00
Vasily Filipov	597a422272	MTL: make MXM work with read (in blocking send case) call-backs. This commit was SVN r26807.	2012-07-19 13:28:06 +00:00
Yevgeny Kliteynik	0e28fa984b	Remove dead code that was related to ticket #2971 This commit was SVN r26701.	2012-07-02 11:19:09 +00:00
Ralph Castain	0dfe29b1a6	Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required. Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework. Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one. This commit was SVN r26678.	2012-06-27 14:53:55 +00:00
Josh Hursey	28681deffa	Backout the ORCA commit. :( There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk. This commit was SVN r26676.	2012-06-27 01:28:28 +00:00
Josh Hursey	542330e3a7	Commit of ORCA: Open MPI Runtime Collaborative Abstraction This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI. The project is described on the wiki: https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition And on this email thread: http://www.open-mpi.org/community/lists/devel/2012/06/11109.php This commit was SVN r26670.	2012-06-26 21:42:16 +00:00
Mike Dubman	10831e111a	detect num of local procs This commit was SVN r26555.	2012-06-05 09:13:16 +00:00
Yevgeny Kliteynik	1cbce83ece	Fixed wording of MXM parameters as suggested By Jeff. This commit was SVN r26545.	2012-06-03 21:48:42 +00:00
Yevgeny Kliteynik	f02bf707a4	Added MXM parameter "np" that controls the minimal number of processes that allow MXM to run Default: 128 MXM advantages kick in with large number of processes. This commit was SVN r26544.	2012-06-02 11:07:20 +00:00
Mike Dubman	34acf769d4	mtl_mxm: support canceling messages This commit was SVN r26256.	2012-04-09 16:02:05 +00:00


81 commits