openmpi

Author	SHA1	Message	Date
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex, since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This scales very poorly. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RMs like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a bit of advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * Upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage, and all subsequent retrievals are then fulfilled locally. * Reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test. * Reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri, as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch: all the info is provided when launched via mpirun, as there is no added cost to getting it. Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Ralph Castain	991e59a58a	Update MCA param in platform file This commit was SVN r29039.	2013-08-16 22:18:22 +00:00
Ralph Castain	11a3743b21	Cleanup uninitialized var warnings This commit was SVN r29038.	2013-08-16 21:49:17 +00:00
Ralph Castain	90cfd139cf	Cleanup error - need an "and" instead of an "or" This commit was SVN r29037.	2013-08-16 21:41:59 +00:00
Ralph Castain	f8a72feb25	Silence uninitialized var warning This commit was SVN r29036.	2013-08-16 21:39:28 +00:00
Ralph Castain	c5f395d36a	Silence uninitialized var warnings This commit was SVN r29035.	2013-08-16 21:37:35 +00:00
Ralph Castain	b2d86e1857	Silence uninitialized var warning This commit was SVN r29034.	2013-08-16 21:35:51 +00:00
Ralph Castain	c74c54e18d	Cleanup uninitialized warnings This commit was SVN r29033.	2013-08-16 21:23:09 +00:00
Ralph Castain	b34bff8792	Cleanup warning This commit was SVN r29032.	2013-08-16 21:14:35 +00:00
Ralph Castain	7947cec8fa	Cleanup warning This commit was SVN r29031.	2013-08-16 21:13:40 +00:00
Ralph Castain	33beab5918	Avoid segfault due to uninitialized variable This commit was SVN r29030.	2013-08-16 21:10:38 +00:00
Ralph Castain	7d2e3028d6	Add unique info_key to documentation This commit was SVN r29029.	2013-08-14 04:24:17 +00:00
Ralph Castain	bebe852057	Add new info key for publish that allows user to designate that the port is to be unique - i.e., to return an error if that service has already been published. Default is to overwrite This commit was SVN r29028.	2013-08-14 04:21:17 +00:00
Ralph Castain	72b5e867ab	Correct shutdown ordering - rml must go last This commit was SVN r29027.	2013-08-14 04:20:17 +00:00
Ralph Castain	8a4c5f4957	Attempt to plug a few memory leaks by ensuring we finalize all things opened during init. However, we are still leaking memory like a sieve in param registration and hwloc. This commit was SVN r29026.	2013-08-14 02:03:00 +00:00
Ralph Castain	318467c04f	If we only have global scope, then don't fall back to looking at local scope if the lookup target wasn't found; otherwise we will hang This commit was SVN r29025.	2013-08-13 04:45:33 +00:00
Nathan Hjelm	6c75699068	coll/ml: fix typo in assert that could cause an abort in debug builds. cmr=v1.7.3:reviewer=manjugv This commit was SVN r29024.	2013-08-12 14:31:44 +00:00
Ralph Castain	2c286bccca	Fix typo - thanks to Michael Schlottke for pointing it out cmr:v1.7.3:reviewer=brbarret This commit was SVN r29015.	2013-08-11 18:16:21 +00:00
Jeff Squyres	c09ec204ad	Change usNIC BTL to always use small fragments when there is a non-contiguous converter. We can't "convert on the fly" because the # of bytes requested may not divide evenly into the convertor data type. This commit was SVN r29014.	2013-08-11 17:04:13 +00:00
Nathan Hjelm	b2e773ece3	Fix debugger support for direct-launched jobs. The orte rte component checks the orte_standalone_operation to decide if it should wait for a message from the hnp or wait on the debugger. This variable needed to be set to true in ess/pmi to enable the correct path when direct launching. cmr=v1.7.3:reviewer=rhc cmr=v1.6.6:reviewer=rhc This commit was SVN r29013.	2013-08-09 22:39:41 +00:00
Nathan Hjelm	524e9b148b	MCA/base: add a function to unload a component without closing it for components that have been registered but not opened This commit was SVN r29012.	2013-08-09 20:16:08 +00:00
Nathan Hjelm	841ed962f6	fix MCA variable and component system leaks cmr=v1.7.3:reviewer=rhc This commit was SVN r29011.	2013-08-09 19:50:28 +00:00
Nathan Hjelm	47320713bb	coll/ml: do not register variables in open and fix a bug in the coll/ml parser cmr=v1.7.3:reviewer=pasha This commit was SVN r29010.	2013-08-09 17:55:30 +00:00
Rolf vandeVaart	cd72024a3c	Refactor some of the initialization code. This commit was SVN r29009.	2013-08-09 14:54:17 +00:00
Edgar Gabriel	f7391eca23	Lazy open does not work for the addproc sharedfp component since it starts by spawning a process using MPI_Comm_spawn. For this, the first operation has to be collective, which we cannot guarantee outside of the MPI_File_open operation. This commit was SVN r29008.	2013-08-06 20:48:20 +00:00
Edgar Gabriel	e348f5567f	add unignore for me. This commit was SVN r29007.	2013-08-06 20:47:08 +00:00
Jeff Squyres	d5e6b50d83	Add bullet about MPI_Get_address in the "mpi" module This commit was SVN r29006.	2013-08-06 15:23:36 +00:00
Jeff Squyres	ed130dcef0	Add missing Fortran mpi module TKR implementation for MPI_Get_address This commit was SVN r29005.	2013-08-06 15:08:00 +00:00
George Bosilca	837b3363fe	Silence a few warnings. This commit was SVN r29004.	2013-08-06 09:38:30 +00:00
George Bosilca	710d3836d5	Use a recv convertor for the pack external case. This commit was SVN r29003.	2013-08-06 09:09:42 +00:00
George Bosilca	30b910b54d	More info in the debug mode. This commit was SVN r29002.	2013-08-06 09:08:43 +00:00
Nathan Hjelm	be1bd4661c	db/pmi: speed up modex by caching pmi data internally This commit was SVN r29001.	2013-08-05 22:31:50 +00:00
George Bosilca	4adaaa0b2b	Fix the profiling prototypes and the copyright. This commit was SVN r29000.	2013-08-05 21:07:32 +00:00
George Bosilca	a938f8fcc5	Add all missing prototypes for the _x functions. This commit was SVN r28999.	2013-08-05 20:49:31 +00:00
George Bosilca	47b1128993	It must be an MPI_Count. This commit was SVN r28998.	2013-08-05 20:49:00 +00:00
Brian Barrett	2cc947513b	* Fix some compile errors * Need to subtract 1 off the size so that we stay in the bit length requirements This commit was SVN r28997.	2013-08-05 18:49:48 +00:00
Ralph Castain	354f407fae	Update ignore This commit was SVN r28996.	2013-08-05 02:47:39 +00:00
Ralph Castain	b0a98b2b16	Update platform files This commit was SVN r28994.	2013-08-03 11:23:44 +00:00
Nathan Hjelm	88cadc552d	Make opal/db/pmi use as few PMI keys as possible. This commit reintroduces key compression into the pmi db. This feature compresses the keys stored into the component into a small number of PMI keys by serializing the data and base64 encoding the result. This will avoid issues with Cray PMI which restricts us to ~ 3 PMI keys per rank. This commit was SVN r28993.	2013-08-03 01:06:59 +00:00
Ralph Castain	72dc8f1f6e	Blasted typo This commit was SVN r28991.	2013-08-02 19:18:33 +00:00
Ralph Castain	f81cbad3e3	Fix platform files so trunk tarball can build This commit was SVN r28989.	2013-08-02 16:22:51 +00:00
Jeff Squyres	e7a87d3170	Update 1.6.6 NEWS bullets to match the 1.6 branch This commit was SVN r28988.	2013-08-02 14:37:06 +00:00
Ralph Castain	285429a1c6	Remove release of buffer - non-blocking send callback will do it This commit was SVN r28985.	2013-08-02 03:49:17 +00:00
Nathan Hjelm	ba8bfeded0	lanl: clean up tlcc platform files No review necessary. cmr=v1.7.3:reviewer=ompi-gk1.7 This commit was SVN r28976.	2013-08-01 19:54:29 +00:00
Jeff Squyres	87910daf51	Fix a collection of bugs found by QA and Coverity, and make some minor improvements: * Fix minor memory leaks during component_init * Ensure that an initialization loop does not underflow an unsigned int * Improve mlock limit checking * Fix set of BTL modules created during component_init when failing to get QP resources or otherwise excluding some (but not all) usnic verbs devices * Fix/improve error messages to be consistent with other Cisco documentation * Randomize the initial sliding window sequence number so that we silently drop incoming frames from previous jobs that still have existent processes in the middle of dying (and are still transmitting) * Ensure we don't break out of add_procs too soon and create an asymmetrical view of what interfaces are available This commit was SVN r28975.	2013-08-01 16:56:15 +00:00
Ralph Castain	37db1727a2	Refs trac:3710 Simplify the whole stripping of prefix method by consolidating it into a single MCA param. Allow for multiple prefixes to be stripped, each separated in the param by a comma. If no prefix is given, or the specified prefix isn't in the nodename, then just use the hostname itself. This commit was SVN r28974. The following Trac tickets were found above: Ticket 3710 --> https://svn.open-mpi.org/trac/ompi/ticket/3710	2013-08-01 00:32:10 +00:00
Nathan Hjelm	8429485a39	mpool/grdma: use the rcache even if not using mpi_leave_pinned or mpi_leave_pinned_pipeline This change should improve performance in the non-pinned case where the same memory region is involved in multiple simultaneous transfers. cmr=v1.7.3:reviewer=brbarret This commit was SVN r28973.	2013-07-31 23:50:41 +00:00
Nathan Hjelm	83a3fc2fd2	Add an option to control which hostnames orte_strip_prefix_from_node_names works on. This corrects a problem with Cray systems where the login node's hostname was being stripped causing the login node to be used as a compute node by mpirun. cmr=v1.7.3:reviewer=rhc This commit was SVN r28970.	2013-07-31 18:42:02 +00:00
Nathan Hjelm	278522d8e8	Update LANL platform files for changes in linux memory hook configuration. No review necessary cmr=v1.7.3:reviewer=ompi-gk1.7 This commit was SVN r28969.	2013-07-31 17:56:22 +00:00
Matthias Jurenz	5c43ae156c	Fixed # 3704 This commit was SVN r28967.	2013-07-31 07:38:24 +00:00
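
The caching change described in r29040 (fetch and decode everything a remote proc published on the first request, then answer later lookups locally) can be illustrated with a small sketch. This is not the OPAL db/pmi code: the class name, the `fetch_blob` callback standing in for a PMI key-value lookup, and the toy `key=value;key=value` blob format are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a PMI key-value lookup; it returns one encoded
# blob holding every value that a given remote proc published.
FetchBlob = Callable[[int], str]


def decode_blob(blob: str) -> Dict[str, str]:
    """Decode a toy 'key=value;key=value' blob (assumed format, not OMPI's)."""
    return dict(item.split("=", 1) for item in blob.split(";") if item)


class CachingModexStore:
    """Fetch and decode a peer's data once, then serve all lookups locally."""

    def __init__(self, fetch_blob: FetchBlob) -> None:
        self._fetch_blob = fetch_blob
        self._cache: Dict[int, Dict[str, str]] = {}
        self.remote_fetches = 0  # counts round trips to the key-value store

    def get(self, rank: int, key: str) -> str:
        if rank not in self._cache:
            # First request for this proc: pull everything and decode it all.
            self.remote_fetches += 1
            self._cache[rank] = decode_blob(self._fetch_blob(rank))
        # Every later request for this proc is answered from local storage.
        return self._cache[rank][key]


def fake_kvs(rank: int) -> str:
    """Made-up remote store used only to exercise the sketch."""
    return f"hostname=node{rank};rml_uri=tcp://10.0.0.{rank}:5000"


store = CachingModexStore(fake_kvs)
```

Asking `store` for both `hostname` and `rml_uri` of the same rank triggers only one remote fetch, which is the effect the commit is after: the number of "get" calls grows with the number of peers rather than with peers times values.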


18499 Commits
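
The key compression reintroduced in r28993 (serialize many stored keys, base64 encode the result, and publish it under a small number of PMI keys) might look roughly like the sketch below. JSON is swapped in for whatever serialization the component actually uses, and `pack_keys`, `unpack_keys`, and the `max_value_len` limit are hypothetical names chosen for this example.

```python
import base64
import json
from typing import Dict, List


def pack_keys(values: Dict[str, str], max_value_len: int) -> List[str]:
    """Serialize all values, base64 encode them, and split into few chunks.

    Each chunk would be published under one key of the underlying store;
    max_value_len is an assumed per-value size limit (must be positive).
    """
    raw = json.dumps(values, sort_keys=True).encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return [encoded[i:i + max_value_len]
            for i in range(0, len(encoded), max_value_len)]


def unpack_keys(chunks: List[str]) -> Dict[str, str]:
    """Reverse pack_keys: join the chunks, base64 decode, deserialize."""
    encoded = "".join(chunks)
    return json.loads(base64.b64decode(encoded).decode("utf-8"))
```

With a generous `max_value_len`, every value a rank publishes fits in a single chunk, which is how a scheme like this stays within a tight per-rank key budget such as the one the commit mentions for Cray PMI.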