openmpi

Author	SHA1	Message	Date
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00
Jeff Squyres	b30ad28276	Remove some unused variables and an unused goto label. This commit was SVN r29044.	2013-08-19 16:18:35 +00:00
Ralph Castain	e0cfcf376f	Okay, fix it so it works both --disable-mpi-profile and --enable-mpi-profile. I'm not sure why mpit's library has to be treated differently, but it seems that it needs some special care to work in both scenarios Refs trac:3725 This commit was SVN r29043. The following Trac tickets were found above: Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725	2013-08-19 14:48:23 +00:00
Ralph Castain	b730c9540e	Fix --disable-mpi-profile option so it can build cmr:v1.7.3:reviewer=hjelmn This commit was SVN r29041.	2013-08-18 18:22:34 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Ralph Castain	90cfd139cf	Cleanup error - need an "and" instead of an "or" This commit was SVN r29037.	2013-08-16 21:41:59 +00:00
Ralph Castain	f8a72feb25	Silence uninitialized var warning This commit was SVN r29036.	2013-08-16 21:39:28 +00:00
Ralph Castain	c5f395d36a	Silence uninitialized var warnings This commit was SVN r29035.	2013-08-16 21:37:35 +00:00
Ralph Castain	c74c54e18d	Cleanup uninitialized warnings This commit was SVN r29033.	2013-08-16 21:23:09 +00:00
Ralph Castain	33beab5918	Avoid segfault due to uninitialized variable This commit was SVN r29030.	2013-08-16 21:10:38 +00:00
Ralph Castain	7d2e3028d6	Add unique info_key to documentation This commit was SVN r29029.	2013-08-14 04:24:17 +00:00
Ralph Castain	bebe852057	Add new info key for publish that allows user to designate that the port is to be unique - i.e., to return an error if that service has already been published. Default is to overwrite This commit was SVN r29028.	2013-08-14 04:21:17 +00:00
Ralph Castain	318467c04f	If we only have global scope, then don't fall back to looking at local scope if the lookup target wasn't found else we will hang This commit was SVN r29025.	2013-08-13 04:45:33 +00:00
Nathan Hjelm	6c75699068	coll/ml: fix typo in assert that could cause an abort in debug builds. cmr=v1.7.3:reviewer=manjugv This commit was SVN r29024.	2013-08-12 14:31:44 +00:00
Jeff Squyres	c09ec204ad	Change usNIC BTL to always use small fragments when there is a non-contiguous converter. We can't "convert on the fly" because the # of bytes requested may not divide evenly into the convertor data type. This commit was SVN r29014.	2013-08-11 17:04:13 +00:00
Nathan Hjelm	47320713bb	coll/ml: do not register variables in open and fix a bug in the coll/ml parser cmr=v1.7.3:reviewer=pasha This commit was SVN r29010.	2013-08-09 17:55:30 +00:00
Rolf vandeVaart	cd72024a3c	Refactor some of the initialization code. This commit was SVN r29009.	2013-08-09 14:54:17 +00:00
Edgar Gabriel	f7391eca23	Lazy open does not work for the addproc sharedfp component since it starts by spawning a process using MPI_Comm_spawn. For this, the first operation has to be collective which we cannot guarantee outside of the MPI_File_open operation. This commit was SVN r29008.	2013-08-06 20:48:20 +00:00
Edgar Gabriel	e348f5567f	add unignore for me. This commit was SVN r29007.	2013-08-06 20:47:08 +00:00
Jeff Squyres	ed130dcef0	Add missing Fortran mpi module TKR implementation for MPI_Get_address This commit was SVN r29005.	2013-08-06 15:08:00 +00:00
George Bosilca	837b3363fe	Silence few warnings. This commit was SVN r29004.	2013-08-06 09:38:30 +00:00
George Bosilca	710d3836d5	Use a recv convertor for the pack external case. This commit was SVN r29003.	2013-08-06 09:09:42 +00:00
George Bosilca	4adaaa0b2b	Fix the profiling prototypes and the copyright. This commit was SVN r29000.	2013-08-05 21:07:32 +00:00
George Bosilca	a938f8fcc5	Add all missing prototypes for the _x functions. This commit was SVN r28999.	2013-08-05 20:49:31 +00:00
George Bosilca	47b1128993	It must be an MPI_Count. This commit was SVN r28998.	2013-08-05 20:49:00 +00:00
Brian Barrett	2cc947513b	* Fix some compile errors * Need to subtract 1 off the size so that we stay in the bit length requirements This commit was SVN r28997.	2013-08-05 18:49:48 +00:00
Jeff Squyres	87910daf51	Fix a collection of bugs found by QA and Coverity, and make some minor improvements: * Fix minor memory leaks during component_init * Ensure that an initialization loop does not underflow an unsigned int * Improve mlock limit checking * Fix set of BTL modules created during component_init when failing to get QP resources or otherwise excluding some (but not all) usnic verbs devices * Fix/improve error messages to be consistent with other Cisco documentation * Randomize the initial sliding window sequence number so that we silently drop incoming frames from previous jobs that still have existent processes in the middle of dying (and are still transmitting) * Ensure we don't break out of add_procs too soon and create an asymmetrical view of what interfaces are available This commit was SVN r28975.	2013-08-01 16:56:15 +00:00
Nathan Hjelm	8429485a39	mpool/grdma: use the rcache even if not using mpi_leave_pinned or mpi_leave_pinned_pipeline This change should improve performance in the non-pinned case where the same memory region is involved in multiple simultaneous transfers. cmr=v1.7.3:reviewer=brbarret This commit was SVN r28973.	2013-07-31 23:50:41 +00:00
Matthias Jurenz	5c43ae156c	Fixed # 3704 This commit was SVN r28967.	2013-07-31 07:38:24 +00:00
Jeff Squyres	c7ff45d046	This define is not needed in mpi.h (it's private to the implementation, and is already included in opal_config.h) This commit was SVN r28964.	2013-07-30 21:33:19 +00:00
Nathan Hjelm	befcd8b63e	Disable mpi_param_check by default if parameter checking is compiled out. cmr=v1.7.3:reviewer=rhc This commit was SVN r28960.	2013-07-26 17:44:10 +00:00
Nathan Hjelm	1382d4fb53	Fix typos in _Complex ops This commit was SVN r28959.	2013-07-26 17:02:45 +00:00
Nathan Hjelm	9c519c5ce8	Bump MPI version to 2.2 now that in place alltoall and reductions on MPI_C_COMPLEX and MPI_CXX_COMPLEX are supported This commit was SVN r28958.	2013-07-26 15:51:38 +00:00
Jeff Squyres	c59770651f	Propagate the use of MPI_COUNT_KIND everywhere. This commit was SVN r28956.	2013-07-25 22:41:48 +00:00
Jeff Squyres	a001f01f05	Remove the other bogus KIND from here; it looks like old kruft This commit was SVN r28955.	2013-07-25 22:41:12 +00:00
Nathan Hjelm	7097ba3992	MPI_COUNT_KIND is defined in mpif-config.h This commit was SVN r28954.	2013-07-25 21:42:27 +00:00
Nathan Hjelm	e4f105ffb3	revert change that shouldn't have been part of r28952 This commit was SVN r28953. The following SVN revision numbers were found above: r28952 --> open-mpi/ompi@cb90a4a7fc	2013-07-25 20:23:55 +00:00
Nathan Hjelm	cb90a4a7fc	Add simple algorithms to support MPI_IN_PLACE for MPI_Alltoall, MPI_Alltoallv, and MPI_Alltoallw. Working on faster algorithms for tuned that will come at a later time. cmr=v1.7.3:ticket=trac:2965 This commit was SVN r28952. The following Trac tickets were found above: Ticket 2965 --> https://svn.open-mpi.org/trac/ompi/ticket/2965	2013-07-25 19:19:41 +00:00
Nathan Hjelm	99adeb7f6e	Fix support for complex datatypes when fortran is not available but _Complex is This commit was SVN r28951.	2013-07-25 19:08:21 +00:00
Edgar Gabriel	012f99c3b6	ompi_ignore this component until we find a solution for the dependence on libmpi for the external process that is being spawned by this component. This commit was SVN r28945.	2013-07-24 20:02:47 +00:00
Nathan Hjelm	22868b9f68	MPI_T: add man pages for MPI_T_* functions and fix typos in tool file names This commit was SVN r28943.	2013-07-24 18:19:40 +00:00
Jeff Squyres	f7337b8f77	Correct faulty max payload and MTU computations (and update some debugging that helped us find those). This commit was SVN r28942.	2013-07-24 16:06:28 +00:00
Ralph Castain	db214a2321	Refs trac:3697 - use the opal_pmi_error function instead of ompi_error as the returned error codes are from PMI This commit was SVN r28941. The following Trac tickets were found above: Ticket 3697 --> https://svn.open-mpi.org/trac/ompi/ticket/3697	2013-07-24 04:05:41 +00:00
Jeff Squyres	5323051047	Use sysfs to check MPI has enough VFs, QPs, and CQs Use the new sysfs files to check that there are enough VFs, QPs, and CQs for all the MPI processes on this server. Move the checking code into its own subroutine to make it smaller and easier to read/grok. This commit was SVN r28937.	2013-07-24 00:38:32 +00:00
Nathan Hjelm	90f5bd9424	Add missing f08 binding declarations for MPI_Count functions This commit was SVN r28936.	2013-07-23 19:00:06 +00:00
Rolf vandeVaart	a3995f73d3	Update trunk to match OMPI 1.7.3 due to code reviews. This commit was SVN r28934.	2013-07-23 17:58:21 +00:00
Nathan Hjelm	c6e586a81d	MPI-3: fortran support for large counts using derived datatypes Jeff: - Make sure not to go over 72 characters. Love Fortran! - Ensure to include 'mpif-config.h' in Type_size_x. This commit was SVN r28933.	2013-07-23 15:36:03 +00:00
Nathan Hjelm	c4c69b4ddf	MPI-3: add support for large counts using derived datatypes Add support for MPI_Count type and MPI_COUNT datatype and add the required MPI-3 functions MPI_Get_elements_x, MPI_Status_set_elements_x, MPI_Type_get_extent_x, MPI_Type_get_true_extent_x, and MPI_Type_size_x. This commit adds only the C bindings. Fortran bindings will be added in another commit. For now the MPI_Count type is defined to have the same size as MPI_Offset. The type is required to be at least as large as MPI_Offset and MPI_Aint. The type was initially intended to be a ssize_t (if it was the same size as a long long) but there were issues compiling romio with that definition (despite the inclusion of stddef.h). I updated the datatype engine to use size_t instead of uint32_t to support large datatypes. This will require some review to make sure that 1) the changes are beneficial, 2) nothing was broken by the change (I doubt anything was), and 3) there are no performance regressions due to this change. Increase the maximum number of predefined datatypes to support MPI_Count Put common get_elements code to ompi/datatype/ompi_datatype_get_elements.c Update MPI_Get_count to reflect changes in MPI-3 (return MPI_UNDEFINED when the count is too large for an int) This commit was SVN r28932.	2013-07-23 15:35:14 +00:00
Matthias Jurenz	c4a7dded5f	Changes to VT: - configure: Removed double slashes in path names which make trouble when building RPMs on Fedora (see #3688) This commit was SVN r28924.	2013-07-23 08:12:18 +00:00
Nathan Hjelm	1349b825c2	MPI-2.2: Add C++ datatypes to mpi.h and fix support for MPI_C_*COMPLEX This commit was SVN r28919.	2013-07-22 23:45:45 +00:00
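
Note on 611d7f9f6b (r29040): the "fetch and decode everything on first request, then serve locally" change described in that message can be illustrated with a small sketch. This is not the OPAL db pmi component's code. `FakeRemoteStore`, `ModexCache`, the `key=value;key=value` blob layout, and the sample keys are all hypothetical, chosen only to show how caching the decoded blob reduces the number of remote "get" calls to one per peer.

```python
class FakeRemoteStore:
    """Hypothetical stand-in for a remote key-value space; counts round trips."""

    def __init__(self, blobs):
        self._blobs = blobs      # peer id -> encoded blob (assumed layout)
        self.get_calls = 0       # number of simulated remote "get" calls

    def get(self, proc):
        self.get_calls += 1
        return self._blobs[proc]


class ModexCache:
    """Decode a peer's whole blob on first request; answer later requests locally."""

    def __init__(self, store):
        self._store = store
        self._local = {}         # peer id -> dict of decoded key/value pairs

    def recv(self, proc, key):
        if proc not in self._local:
            # First request for this peer: one remote fetch, decode every item.
            blob = self._store.get(proc)
            self._local[proc] = dict(item.split("=", 1) for item in blob.split(";"))
        # All subsequent lookups for this peer are served from local storage.
        return self._local[proc][key]


store = FakeRemoteStore({
    0: "btl.tcp=10.0.0.1:5000;hostname=node0",
    1: "btl.tcp=10.0.0.2:5000;hostname=node1",
})
cache = ModexCache(store)
print(cache.recv(1, "btl.tcp"))   # triggers the single remote fetch for peer 1
print(cache.recv(1, "hostname"))  # served from the local cache
print(store.get_calls)            # remote fetches performed so far
```

In this sketch the second lookup for peer 1 does not touch the remote store, which is the effect the commit message attributes to pushing the decoded info into local hash storage.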

6534 commits