openmpi

Author	SHA1	Message	Date
Gleb Natapov	0a1fa2cb56	req_match_received is set inside MCA_PML_OB1_RECV_REQUEST_MATCHE(). This commit was SVN r17442.	2008-02-13 08:34:39 +00:00
Gleb Natapov	876f49f1a7	Remove unnecessary assignment. It is done later in the same function. This commit was SVN r17441.	2008-02-13 08:28:25 +00:00
Jeff Squyres	17ede97ef8	Two fixes to revert some long-ago decisions that seemed like a good idea at the time, but led to logistical difficulties in importing new versions of ROMIO: * We are effectively eliminating the ROMIO file prefix rule hacks in the ROMIO component, which create symlinks from foo.c to io_romio_foo.c. In reality, the file name conflict potential will be small. * Additionally, we are effectively eliminating the ROMIO function prefix rule in the ROMIO component. This is another place where there are generally problems with merging in new versions of ROMIO and/or patches from the user community (for their own local builds). In reality, since other major MPI implementations provide the exact same symbols, it won't cause any practical problems for users. In return, we make it much simpler to apply ROMIO patches to Open MPI. The problem right now is that any patch will have filenames such as ad_panfs.c, but Open MPI will only have io_romio_ad_panfs.c, making things extremely difficult for users. I believe, for example, that this would make it possible for LANL to have applied their patches without too much hassle on either their part or our part. It will also make things easier for OMPI when we/they want to do the next ROMIO upgrade (this was one of the sources of problems on each upgrade). This commit was SVN r17436.	2008-02-12 18:55:17 +00:00
Shiqing Fan	54c7b71cfd	Use the correct way of including memchecker.h, which will work with '--with-devel-headers'. This commit was SVN r17435.	2008-02-12 18:01:17 +00:00
Rainer Keller	7621800477	- Fix and add comments -- output full name for pd - Protect argument in macro... This commit was SVN r17434.	2008-02-12 16:59:59 +00:00
Jeff Squyres	6adc5015f9	This file was accidentally re-introduced in r17409. This commit was SVN r17428. The following SVN revision numbers were found above: r17409 --> open-mpi/ompi@98f70d6318	2008-02-12 13:07:44 +00:00
Shiqing Fan	f5792bbda5	merging the memchecker into trunk. This commit was SVN r17424.	2008-02-12 08:46:27 +00:00
Gleb Natapov	cf801edfe5	Use carto topology framework to choose which HCAs to use. This commit was SVN r17414.	2008-02-11 10:34:11 +00:00
George Bosilca	ee321748a6	The lost space. This commit was SVN r17413.	2008-02-10 22:08:49 +00:00
George Bosilca	55179b833c	Unexpected ... Removing unistd.h from datatype.h breaks the compilation of the pml_base_bsend ... This commit was SVN r17412.	2008-02-10 21:49:19 +00:00
Tim Prins	b88a3f7a94	Update onesided components to fix the case (on 64 bit machines) where the total offset is greater than 2^31-1 bytes. See: http://www.open-mpi.org/community/lists/users/2008/01/4880.php This commit was SVN r17400.	2008-02-07 18:45:35 +00:00
Pavel Shamis	df787bbeab	Fixing compilation issue on machines with ofed under 1.3. Also fix in apm migration flow. This commit was SVN r17383.	2008-02-06 13:54:58 +00:00
Pavel Shamis	3ba3f70624	Adding apm support for xrc. This commit was SVN r17382.	2008-02-06 10:19:51 +00:00
Gleb Natapov	03c80bdfe3	Fix old libibverbs case. This commit was SVN r17370.	2008-02-04 14:05:01 +00:00
Pavel Shamis	f0c478e7e0	XRC - replacing the new old API with new one. This commit was SVN r17369.	2008-02-04 14:03:38 +00:00
Gleb Natapov	67f752dd50	Add compatibility function between old libibverbs and current libibverbs way of detecting HCAs. This commit was SVN r17365.	2008-02-03 15:16:24 +00:00
George Bosilca	3a6d2e3894	The latest and greatest Elan improvements. This commit was SVN r17361.	2008-02-01 21:29:57 +00:00
Edgar Gabriel	77057a50a3	- adding the two-level hierarchy detection algorithm - minor fix in the temporary collectives - removing the symmetric parameter, since it didn't really make sense. This commit was SVN r17359.	2008-02-01 17:11:36 +00:00
Rich Graham	fda485ff9c	backing file is allocated and deallocated. This commit was SVN r17358.	2008-02-01 15:26:20 +00:00
Gleb Natapov	f73adf69c0	Fix compiler warnings on 32bit systems. This commit was SVN r17346.	2008-01-31 09:05:25 +00:00
Adrian Knoth	8ae4a10b4c	Reverted r17331, r17332. Still broken. I'm in a bad hurry. :-( Re #1206 This commit was SVN r17333. The following SVN revision numbers were found above: r17331 --> open-mpi/ompi@3846e2a797 r17332 --> open-mpi/ompi@c03de08c55	2008-01-30 16:51:55 +00:00
Adrian Knoth	c03de08c55	Logic is wrong. I'm going to revert it again. Re #1206 This commit was SVN r17332.	2008-01-30 16:48:50 +00:00
Adrian Knoth	3846e2a797	When checking incoming connections, also care about aliased interfaces. Re #1206 This commit was SVN r17331.	2008-01-30 16:45:41 +00:00
Adrian Knoth	7f79c68930	Reverted r17307 and r17308. It broke parallel TCP connections. Re #1206 This commit was SVN r17329. The following SVN revision numbers were found above: r17307 --> open-mpi/ompi@7a59b3f58c r17308 --> open-mpi/ompi@72b29bc21f	2008-01-30 14:31:47 +00:00
Aurelien Bouteiller	4da1258d60	Quick fix for static builds (mca_component_retain always returns failure in static build mode, so just blatantly ignore the failure). Though, this may crash severely sometime later if the failure occurs while in dso mode. This commit was SVN r17328.	2008-01-30 10:41:49 +00:00
George Bosilca	4e703741b7	Move the PML tags into the legal range. This commit was SVN r17326.	2008-01-30 00:09:45 +00:00
Adrian Knoth	72b29bc21f	Cosmetic patch. Use IN6_ARE_ADDR_EQUAL instead of memcmp(). Re #1206 . This commit was SVN r17308.	2008-01-29 16:02:24 +00:00
Adrian Knoth	7a59b3f58c	accept incoming connections from hosts with multiple addresses. We loop over all peer addresses and accept when one of them matches. Note that this might break functionality: mca_btl_tcp_proc_insert now always inserts the same endpoint. (is the lack of endpoints the problem? should there be one for every remote address?) Re #1206 This commit was SVN r17307.	2008-01-29 15:55:56 +00:00
Pavel Shamis	7b59f8ae0b	Fixing warning in apm code. This commit was SVN r17306.	2008-01-29 15:45:18 +00:00
Gleb Natapov	bb03e07ec4	Move eager RDMA channels accounting into completion callback. Otherwise it can go wrong with XRC as endpoint may be not yet connected at the time eager rdma channel is created. This commit was SVN r17302.	2008-01-29 14:35:33 +00:00
Pavel Shamis	92ef832472	Making sure that XRC will not overrun ib_dev_attr.max_qp_wr This commit was SVN r17300.	2008-01-29 13:15:21 +00:00
Aurelien Bouteiller	2fd8230025	Windows might not be the only one... This commit was SVN r17296.	2008-01-29 07:44:33 +00:00
Aurelien Bouteiller	bd10a0231f	Replaced the explicit include of inttypes.h by the opal replacement. This commit was SVN r17295.	2008-01-29 07:35:14 +00:00
Aurelien Bouteiller	e261861f4a	Major build system modification. Removed symlinks (problem with make dist), solved issues with static builds and can accept most compile options. The only unsupported compile option for now is --enable-mca-no-build=pml-v. Still investigating this... This commit was SVN r17294.	2008-01-29 06:07:57 +00:00
George Bosilca	fad6136794	To be or not to be ! As DR require 64 bits atomics, only allow it to build when thread support is disabled or we have 64 bits atomics support. This commit was SVN r17293.	2008-01-29 05:24:56 +00:00
Rich Graham	165fc3f8cc	memory allocation implemented and debugged. Still need to finish file allocation/deallocation and control information initialization. This commit was SVN r17291.	2008-01-29 03:09:12 +00:00
Pavel Shamis	7d83f34eb0	Protecting the apm code with OMPI_HAVE_THREADS. This commit was SVN r17284.	2008-01-28 16:10:18 +00:00
Jeff Squyres	6a49c97368	Remove erroneous #if This commit was SVN r17282.	2008-01-28 14:38:03 +00:00
Pavel Shamis	28a3917306	Adding APM support (over different lids). This commit was SVN r17280.	2008-01-28 10:38:08 +00:00
George Bosilca	c5d5fcf50a	Protect the standard header file, and allow the PML V to compile on Windows. This commit was SVN r17250.	2008-01-26 18:43:06 +00:00
Aurelien Bouteiller	ca8eb1fb30	There should be no leftovers of configuration phase after distclean This commit was SVN r17249.	2008-01-26 09:56:02 +00:00
Aurelien Bouteiller	b5d44261a0	Fix one warning about extremely long lines (due to macro expansion) This commit was SVN r17247.	2008-01-26 00:38:33 +00:00
Aurelien Bouteiller	48cabdc40b	Changed build system. Should be more distcheck, VPATH, static and other compilation mode friendly. This commit was SVN r17245.	2008-01-25 23:57:01 +00:00
Rich Graham	e24c2ebbc0	have a working skeleton for the SM-V2 component. It does nothing at this stage. This commit was SVN r17241.	2008-01-25 21:16:36 +00:00
Rich Graham	1d0334f4f2	skeleton for new shared memory collective component. This commit was SVN r17235.	2008-01-25 19:35:26 +00:00
Rainer Keller	f7e586fc01	- allow --enable-mca-direct=pml-ob1 This commit was SVN r17227.	2008-01-25 09:56:45 +00:00
Rich Graham	432ba0cecd	add comments about the life-cycle of a collective module. This commit was SVN r17223.	2008-01-25 03:46:31 +00:00
George Bosilca	3418485085	Replace the tport by a queue. This commit was SVN r17221.	2008-01-25 01:15:18 +00:00
Aurelien Bouteiller	e471abb55e	put back ompi ignore until long filenames and other dist issues are fixed This commit was SVN r17219.	2008-01-25 00:28:30 +00:00
Donald Kerr	66acac8ff3	the value for invalid idx was just plain wrong, a more appropriate value is now used This commit was SVN r17201.	2008-01-24 15:01:26 +00:00
Jeff Squyres	2227d5ec4a	Add configure check for struct ibv_device.transport type, which was added in OFED v1.2. Still need to fix up oob and rdma_cm cpc's to do something better with this information... This commit was SVN r17198.	2008-01-24 12:14:21 +00:00
Aurelien Bouteiller	11815d9773	Fixed two warnings (especially the one that gets repeated a large number of times in 64bit builds) This commit was SVN r17197.	2008-01-24 04:59:31 +00:00
Aurelien Bouteiller	a9045402c4	remove a pedantic warning This commit was SVN r17196.	2008-01-24 02:29:07 +00:00
Aurelien Bouteiller	76b13f91b9	fixed link error in static mode This commit was SVN r17194.	2008-01-23 23:54:02 +00:00
Aurelien Bouteiller	f29ed2ed53	fixed missing errno.h on some architectures This commit was SVN r17186.	2008-01-23 20:24:54 +00:00
Aurelien Bouteiller	6fe17aff4a	solve compatibility issue from MMAP_NOCACHE This commit was SVN r17184.	2008-01-23 19:29:19 +00:00
Aurelien Bouteiller	69b3bae999	removed ignore, as the code is robust enough to avoid interfering with others This commit was SVN r17182.	2008-01-23 17:27:23 +00:00
Gleb Natapov	6e4155d111	Initialize local variable before use. This commit was SVN r17170.	2008-01-21 15:17:49 +00:00
Gleb Natapov	52c94fa7ea	Fix compilation warnings. This commit was SVN r17169.	2008-01-21 15:07:39 +00:00
Gleb Natapov	c9a1b06771	Remove trailing whitespaces. No code changes in this commit. This commit was SVN r17167.	2008-01-21 12:11:18 +00:00
George Bosilca	31390c0074	We should take in account the extent of the datatype when we compute the initial displacement in bytes. Thanks to Daniel G. Hyams for the fix. This commit was SVN r17165.	2008-01-19 05:34:53 +00:00
George Bosilca	170416797d	This commit was SVN r17162.	2008-01-18 20:10:57 +00:00
George Bosilca	0081202195	Mark the receives as ELAN_TPORT_RXBUF | ELAN_TPORT_RXANY ... This commit was SVN r17161.	2008-01-18 20:00:44 +00:00
George Bosilca	bf299bb833	Keep most of the functions as static. Improve the progress function. Get rid of all internal queues that are not really useful. This commit was SVN r17160.	2008-01-18 19:28:50 +00:00
Donald Kerr	5f884b1ca4	fix for #1130 - adds support for multi-rail configurations This commit was SVN r17152.	2008-01-17 17:30:50 +00:00
Donald Kerr	908b514ac5	update use of internal tag values to accommodate the active message change found in r17140 This commit was SVN r17148. The following SVN revision numbers were found above: r17140 --> open-mpi/ompi@6310ce955c	2008-01-16 21:17:25 +00:00
Pavel Shamis	add4d9df8a	XRC fixes for MPI2 dynamics. This commit was SVN r17144.	2008-01-15 21:14:48 +00:00
Jeff Squyres	251842ff6a	Remove this AS_IF -- it breaks "make dist". This commit was SVN r17143.	2008-01-15 12:33:08 +00:00
George Bosilca	e8ac5ff04d	Typos. This commit was SVN r17141.	2008-01-15 05:37:42 +00:00
George Bosilca	6310ce955c	The first patch related to the Active Message stuff. So far, here is what we have: - the registration array is now global instead of one per BTL. - each framework has to declare the entries it reserves in the registration array. Then it has to define the internal way of sharing (or not) these entries between all components. As an example, the PML will not share as there is only one active PML at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3 are reserved for the framework while the remaining 5 are used internally by each framework. - The registration function is optional. If a BTL does not provide such a function, nothing happens. However, in the case where such a function is provided in the BTL structure, it will be called by the BML when a tag is registered. Now, it's time for the second step... Converting OB1 from a switch based PML to an active message one. This commit was SVN r17140.	2008-01-15 05:32:53 +00:00
George Bosilca	98f79f2ea0	Remove the second declaration of the PML V component. This commit was SVN r17139.	2008-01-15 05:26:26 +00:00
Jon Mason	a0d4122606	The new cpc selection framework is now in place. The patch below allows for dynamic selection of cpc methods based on what is available. It also allows for inclusion/exclusion of methods. It further allows for modifying the priorities of certain cpc methods to better determine the optimal cpc method. This patch also contains XRC compile time disablement (per Jeff's patch). At a high level, the cpc selection works by walking through each cpc and allowing it to test to see if it is permissible to run on this mpirun. It returns a priority if it is permissible or a -1 if not. All of the cpc names and priorities are rolled into a string. This string is then encapsulated in a message and passed around all the ompi processes. Once received and unpacked, the list received is compared to a local copy of the list. The connection method is chosen by comparing the lists passed around to all nodes via modex with the list generated locally. Any non-negative number is a potentially valid connection method. The method below of determining the optimal connection method is to take the cross-section of the two lists. The highest single value (and the other side being non-negative) is selected as the cpc method. svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ . This commit was SVN r17138.	2008-01-14 23:22:03 +00:00
Pavel Shamis	6e50fca2dd	Fixing permissions for XRC domain file. This commit was SVN r17127.	2008-01-13 19:23:11 +00:00
Jon Mason	626e0814a2	Style clean-up This commit was SVN r17126.	2008-01-12 18:47:17 +00:00
Ron Brightwell	b02cad2a0b	added optional rendezvous protocol for long messages This commit was SVN r17124.	2008-01-11 22:12:45 +00:00
George Bosilca	3fca3973d3	The PTLs are now long gone !!! This commit was SVN r17104.	2008-01-10 00:18:45 +00:00
Jon Mason	3970c3ff6c	Add Chelsio T3 to ompi/mca/btl/openib/mca-btl-openib-hca-params.ini This commit was SVN r17101.	2008-01-09 22:14:18 +00:00
Jon Mason	597c7e68f1	Minor cleanups This commit was SVN r17100.	2008-01-09 21:54:11 +00:00
George Bosilca	1bd31aa3ac	Cleanup the OMPI_DECLSPEC/OMPI_MODULE_DECLSPEC in the PMLs. This commit was SVN r17093.	2008-01-09 20:32:39 +00:00
Rolf vandeVaart	870fa8b1f1	Pad the sm btl header to double-word alignment. Preserves PML header as double-word aligned and prevents bus errors on SPARC based servers. This is part of fix for #1148. Refs trac:1148 This commit was SVN r17090. The following Trac tickets were found above: Ticket 1148 --> https://svn.open-mpi.org/trac/ompi/ticket/1148	2008-01-09 18:50:51 +00:00
Gleb Natapov	25ce70bb92	Call mca_btl_openib_endpoint_post_send() holding endpoint lock and not holding qp lock since this is what the function assumes. This commit was SVN r17086.	2008-01-09 14:46:41 +00:00
Pavel Shamis	99f51482e3	Fixing openib finalization flow. This commit was SVN r17085.	2008-01-09 12:36:30 +00:00
Gleb Natapov	51d6ca0cb6	Provide no lock version of mca_btl_openib_endpoint_post_rr(). On connection creation we call it with endpoint lock already held. This commit was SVN r17084.	2008-01-09 10:39:35 +00:00
Gleb Natapov	50af6b9e78	Rearrange functions order so that functions are defined before they are used. No code changes here. This commit was SVN r17083.	2008-01-09 10:27:15 +00:00
Gleb Natapov	621fa223c5	Create free lists of fragments per HCA, not per BTL. Saves memory in case of multiple LMCs. This commit was SVN r17082.	2008-01-09 10:26:21 +00:00
Gleb Natapov	5ce3213158	Rearrange functions order so that functions are defined before they are used. No code changes here. This commit was SVN r17081.	2008-01-09 10:05:41 +00:00
Gleb Natapov	b37ff74a24	Make function that is used only in one file static. Remove static functions declaration. This commit was SVN r17080.	2008-01-09 09:54:35 +00:00
Ethan Mallove	f32dcb1636	The Sun Studio 12 compilers need to have `inline` specified as `static` in cases where a function is not part of a separate compilation unit (such as `append_recv_req_to_queue`). This commit was SVN r17069.	2008-01-08 18:45:51 +00:00
Pavel Shamis	fbf7bcd9a9	We need to prepost on srq/xrc before reply with ENDPOINT_XOOB_CONNECT_XRC_RESPONSE. This commit was SVN r17066.	2008-01-08 10:30:16 +00:00
Gleb Natapov	8bfcfa464a	Don't call free(), or library functions that may call free() inside (such as ibv_dereg_mr() for instance) from ptmalloc callback. Call to free() from the callback causes deadlock. Notice what should be unregistered inside the callback and do actual cleanup at the next call to mpool->register(). This commit was SVN r17064.	2008-01-08 08:55:42 +00:00
Aurelien Bouteiller	9bf54e1604	Windows compatibility patch. Also introduces work in progress "convertor" sender based copy algorithm. This algorithm cannot be selected without other modifications in the convertor (not currently available in trunk). The default old synchronous copy algorithm is selected by default. This commit was SVN r17063.	2008-01-07 23:35:44 +00:00
Rolf vandeVaart	0f0fde3490	Partial fix for #1148 . Enable this for 32-bit sparc as well as 64-bit sparc. This commit was SVN r17059.	2008-01-07 15:43:44 +00:00
Gleb Natapov	c3bbf69356	Set send_flags correctly in btl_openib_put. Otherwise we may reuse flags from previous use of the buffer and they may be incorrect. This commit was SVN r17058.	2008-01-07 10:19:07 +00:00
George Bosilca	d2324050f8	Allow the PML V component to be compiled on Windows. Force all .c files to include the ompi_config.h as the first #include. This commit was SVN r17056.	2008-01-05 00:17:32 +00:00
George Bosilca	48f5a26e8c	Cast to keep VC happy (quiet). This commit was SVN r17054.	2008-01-04 23:13:32 +00:00
Jeff Squyres	a234ba198a	Remove superfluous / unused -D from Makefile.am. This commit was SVN r17030.	2008-01-02 18:00:20 +00:00
Jeff Squyres	c9bea80f8f	Fix unbalanced parentheses noticed by Paul Hargrove. This commit was SVN r17029.	2008-01-02 13:34:07 +00:00
Gleb Natapov	2fb6947f88	Destroy endpoints that use eager rdma communication before destroying SRQ. Don't skip async event thread destruction if SRQ was not destroyed, or it will segfault on module removal. This commit was SVN r17025.	2007-12-23 13:58:31 +00:00
Gleb Natapov	b06d92bdab	OpenIB BTL has three channels through which data can be received (eager rdma, high prio QPs and low prio QPs) and because not all of them are polled each time progress() is called (to save on latency) starvation is possible. The commit fixes this. Now each channel is polled, but higher priority channels are polled more often. Three new parameters are introduced that control polling ratios between different channels. This commit was SVN r17024.	2007-12-23 12:29:34 +00:00
Brad Penoff	4c2571b54c	fixed more 64 bit SCTP BTL warnings This commit was SVN r17022.	2007-12-21 21:50:00 +00:00
Brad Penoff	195faa37b6	fixed send side of 64 bit compilation warnings This commit was SVN r17019.	2007-12-21 19:11:50 +00:00
Jeff Squyres	558d179e2e	Fix typo. This commit was SVN r17012.	2007-12-21 14:25:48 +00:00
George Bosilca	42414b27e9	Use BEGIN_C_DECLS and END_C_DECLS instead of the ugly #if/#endif. This commit was SVN r17009.	2007-12-21 06:19:46 +00:00
George Bosilca	b58dae00db	Allow PERUSE to compile correctly. This commit was SVN r17008.	2007-12-21 06:18:19 +00:00
George Bosilca	906e8bf1d1	Replace the ompi_pointer_array with opal_pointer_array. In the next step (sometime after the merge with the ORTE branch), the opal_pointer_array will become the only pointer_array implementation (the orte_pointer_array will be removed). This commit was SVN r17007.	2007-12-21 06:02:00 +00:00
Tim Mattox	bbeef5b84b	Change the MX BTL's exclusivity to MCA_BTL_EXCLUSIVITY_DEFAULT, so that it is higher than the new TCP BTL exclusivity as of r16942. The portals BTL maintainer may want to do the same... This commit was SVN r16995. The following SVN revision numbers were found above: r16942 --> open-mpi/ompi@80e9730100	2007-12-19 21:24:45 +00:00
Gleb Natapov	35bf8c7c46	Rewrite OB1 matching logic. Get rid of macros, make the code shorter. This commit was SVN r16993.	2007-12-19 09:16:20 +00:00
Pavel Shamis	fcbca510d8	The ib_inline_max should be updated only when SEND qp is created. This commit was SVN r16973.	2007-12-17 10:30:30 +00:00
Gleb Natapov	f79e344ea4	Fix bug in debug build. This commit was SVN r16972.	2007-12-17 10:26:18 +00:00
Gleb Natapov	64a95f63cd	Fix error reporting in openib if parameter value is out of range. This commit was SVN r16971.	2007-12-16 14:04:36 +00:00
Gleb Natapov	5cd38b8b06	Better encapsulate heterogeneous arch handling in ob1. This commit was SVN r16970.	2007-12-16 08:45:44 +00:00
Gleb Natapov	8b511b969d	Introduce a new BTL parameter btl_rndv_eager_limit which determines size of a first fragment of rendezvous protocol. Remove no longer used btl_min_send_size parameter. This commit was SVN r16969.	2007-12-16 08:35:17 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confused discussion about linkers, have all OPAL, ORTE, and OMPI components not link against the OPAL, ORTE, or OMPI libraries. See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Brad Penoff	540d483dd3	64 bit fix and initial Solaris support This commit was SVN r16967.	2007-12-15 03:28:10 +00:00
Donald Kerr	d05d3afaed	clean up and make consistent the reporting out from the udapl btl; report out readable event string instead of just a number This commit was SVN r16954.	2007-12-13 15:32:26 +00:00
Josh Hursey	a287c9cb65	This commit distinguishes the file transfer stage from the finish stage. This commit also cleans up the checkpoint and terminate case making it more precise than before. Previously the application could make a small amount of progress between checkpoint completion and application termination. Now the application will make no progress at all in this time span. Additional minor change: - Start using OPAL_INT_TO_BOOL instead of if/else logic This commit was SVN r16952.	2007-12-13 14:37:17 +00:00
Brad Penoff	ecd563b0fa	reduced noise for SCTP BTL on RHEL4U4 This commit was SVN r16951.	2007-12-13 03:15:29 +00:00
Aurelien Bouteiller	93f39fa190	Fixes various issues with --enable-visibility, C++ and exotic C compilers. Aurelien This commit was SVN r16949.	2007-12-12 19:13:23 +00:00
Jeff Squyres	80e9730100	Per http://www.open-mpi.org/community/lists/devel/2007/12/2698.php and this thread: http://www.open-mpi.org/community/lists/devel/2007/12/2807.php, set TCP's exclusivity to LOW+100 and SCTP's exclusivity to LOW. This commit was SVN r16942.	2007-12-12 15:55:37 +00:00
Jon Mason	e05cd7b0e4	To modify the default connection method, a "btl_openib_connect <arg>" should be passed via commandline. However, there is a slight coding bug in the openib connect code. When registering the name of the option, mca_base_param_reg_string will prepend the relevant info ("btl_openib_" in this case). The existing code will require "btl_openib_btl_openib_connect" instead of "btl_openib_connect". This patch corrects this. This commit was SVN r16937.	2007-12-11 20:36:36 +00:00
Galen Shipman	a04d21b459	Make CNL compile again.. This commit was SVN r16929.	2007-12-11 16:14:30 +00:00
Gleb Natapov	2a59b2a68f	1. Set segments length in prepare_src() after packing because actual size may be smaller than allocated size. 2. If reserve is zero don't allocate coalesced frag since it will be RDMAed, not sent. The logic was the other way around. This commit was SVN r16928.	2007-12-11 13:10:52 +00:00
Jon Mason	df82fcb917	Slight word usage and grammar error in the openib btl help test. I believe the change below is the intended meaning. This commit was SVN r16921.	2007-12-10 21:50:48 +00:00
Donald Kerr	a604fca52c	follow on change to r16901 and r16898; the interface change mca_btl_udapl_alloc() was not applied to two locations in this file This commit was SVN r16918. The following SVN revision numbers were found above: r16898 --> open-mpi/ompi@7364b7cf47 r16901 --> open-mpi/ompi@e2e211f23b	2007-12-10 18:10:52 +00:00
Gleb Natapov	17611dafbe	Fix pointer casting on 32bit machines. This commit was SVN r16907.	2007-12-09 14:15:35 +00:00
Gleb Natapov	2f9c5b46cf	Return OMPI_ERR_RESOURCE_BUSY from openib_btl_send() if fragment is not on wire. This commit was SVN r16906.	2007-12-09 14:14:11 +00:00
Gleb Natapov	e0dc53e516	Use mca_bml_base_send_status() in OB1. This commit was SVN r16905.	2007-12-09 14:13:24 +00:00
Gleb Natapov	666b282e7e	Add mca_bml_base_send_status function. It returns ORTE_ERR_RESOURCE_BUSY if packet was queued inside BTL. BTL should return this error if packet was queued internally. This commit was SVN r16904.	2007-12-09 14:12:38 +00:00
Gleb Natapov	493951e09d	Add heterogeneous support to message coalescing. This commit was SVN r16903.	2007-12-09 14:10:25 +00:00
Gleb Natapov	b4698dc6df	Use flags provided during allocation to coalesce to correct priority queue. This commit was SVN r16902.	2007-12-09 14:08:55 +00:00
Gleb Natapov	e2e211f23b	Add flags parameter to btl_alloc() and btl_prepare_src() functions. If BTL knows at the time of allocation priority of a descriptor it may do some optimizations. This commit was SVN r16901.	2007-12-09 14:08:01 +00:00
Gleb Natapov	5313a2baa7	Message coalescing for openib BTL. If fragment is waiting to be transmitted in a pending queue pack another message into it if there is enough space there. This commit was SVN r16900.	2007-12-09 14:05:13 +00:00
Gleb Natapov	7302cd24eb	Call btl_alloc() from btl_prepare_src() to have one point of frag allocation. This commit was SVN r16899.	2007-12-09 14:02:32 +00:00
Gleb Natapov	7364b7cf47	Add endpoint parameter to btl_alloc() function. Enables various optimizations inside BTL. This commit was SVN r16898.	2007-12-09 14:00:42 +00:00
Gleb Natapov	2d784752dd	Remove descriptor caching from BML. With descriptor caching some optimizations are impossible. This commit was SVN r16897.	2007-12-09 13:58:17 +00:00
Gleb Natapov	de3761208a	Send cm_seen by eager rdma channel. Encode qp index into credits field. If cm_seen is not sent here, a non-symmetric eager rdma connection may hang. This commit was SVN r16896.	2007-12-09 13:56:13 +00:00
Tim Mattox	d188642715	Apparently the SCTP BTL has a btl_sctp_component.h file that needs to be part of the "sources" list. Hopefully this will clear up the nightly tarball creation for the trunk. This commit was SVN r16895.	2007-12-08 04:05:59 +00:00
Galen Shipman	4daa552c97	Correct makefile to include all sources, should fix a problem in building a distro.. This commit was SVN r16894.	2007-12-07 18:59:16 +00:00
Karl Mroz	71b54d8e4e	Removed .ompi_ignore and .ompi_unignore from SCTP BTL. This commit was SVN r16893.	2007-12-07 17:02:32 +00:00
Aurelien Bouteiller	6190c97ee9	PML V and vprotocol framework management of customizable wait/test. This is still a fast and dirty implementation (cleanup of the customized request functions is not totally correct if several component modify them out of order). This commit was SVN r16890.	2007-12-07 08:21:25 +00:00
Aurelien Bouteiller	859169214c	Vprotocol pessimist benefits from customizable requests. Waitany, waitsome, test, testany, testall, testsome can now be hooked and are therefore logged correctly. This commit was SVN r16885.	2007-12-07 08:17:30 +00:00
Jon Mason	20294e7800	There is a double call to ompi_btl_openib_connect_base_open in mca_btl_openib_mca_setup_qps(). It looks like someone just forgot to clean-up the previous call when they added the check for the return code. I ran a quick IMB test over IB to verify everything is still working. This commit was SVN r16870.	2007-12-06 17:25:38 +00:00
Pavel Shamis	e8aeadb11e	XRC fixes: - create separate xrc domain file for each hca - return error if we failed to create xrc file. This commit was SVN r16853.	2007-12-05 14:32:44 +00:00
Pavel Shamis	f60ca0e4e5	Removing unused mca_btl_openib_ib_address_status This commit was SVN r16835.	2007-12-04 13:16:26 +00:00
Pavel Shamis	57728986f8	Fixing XRC multiport/multisubnet support. This commit was SVN r16819.	2007-12-03 09:49:53 +00:00
Gleb Natapov	b2858236fb	Use new free list interface. This commit was SVN r16818.	2007-12-02 15:13:11 +00:00
Gleb Natapov	a774cd98f8	Put send completions to low prio CQ. Receive is more important. This commit was SVN r16817.	2007-12-02 14:46:37 +00:00
Gleb Natapov	b17f5b7480	Change how default receive queues parameters are calculated. Current default parameters don't make any sense. Credits are never piggybacked. Also make default queue sizes to be calculated from eager_limit and max_send_size values. This commit was SVN r16816.	2007-12-02 14:43:28 +00:00
Josh Hursey	5fb83a4f10	- Remove an unnecessary barrier - verbose -> VERBOSE just for the fun of it This commit was SVN r16811.	2007-11-30 22:26:18 +00:00
Rich Graham	6e77414a68	changes to the ompi_free_list_ex - called ompi_free_list_ex_new, for now. This commit was SVN r16803.	2007-11-29 21:18:37 +00:00
Ron Brightwell	0138a2ee17	Do cleanup in ompi_mtl_portals_del_procs() rather than ompi_mtl_portals_finalize(). Previous code was cleaning up Portals resources that hadn't been allocated, which caused valid handles used elsewhere to be freed, which broke cnos_barrier() for the Portals btl. This commit was SVN r16801.	2007-11-29 17:29:46 +00:00
Jeff Squyres	8c0060701c	Stub out the ibcm CPC. This commit was SVN r16800.	2007-11-29 13:23:17 +00:00
Pavel Shamis	8aca6eb31b	OFED 1.3 doesn't implement ibv_resize_cq for connectX. On error exit from ibv_resize_cq we should check if the function is implemented. This commit was SVN r16799.	2007-11-28 15:23:19 +00:00
Gleb Natapov	5f242c77f2	Post each recv wr not separately but in one call to ibv_post_recv(). This commit was SVN r16798.	2007-11-28 14:57:15 +00:00
Gleb Natapov	14cffee726	Uninline mca_btl_openib_post_srr() function. This commit was SVN r16797.	2007-11-28 14:52:31 +00:00
Pavel Shamis	1c314ef4c3	If XRC qp was specified in btl_openib_receive_queues we automatically should choose xoob connection module. This commit was SVN r16796.	2007-11-28 10:33:32 +00:00
Pavel Shamis	488a508732	Removing comments from help file. This commit was SVN r16795.	2007-11-28 10:16:08 +00:00
Pavel Shamis	3e2e4f6d2a	Removing unused lid. This commit was SVN r16794.	2007-11-28 10:06:57 +00:00
Pavel Shamis	aa79bdabc8	Removing port_touse - we don't really need it This commit was SVN r16793.	2007-11-28 09:57:48 +00:00
Pavel Shamis	2ffbe8776a	Fixing compilation problems in openib This commit was SVN r16792.	2007-11-28 09:38:49 +00:00
Gleb Natapov	218adb2a96	Account for eager rdma credit fragments when creating send queue. Create XRC receive QP with zero receive and send queue length. We are not going to use this QP for sends, and receives are posted to SRQs. This commit was SVN r16791.	2007-11-28 07:22:01 +00:00
Gleb Natapov	601952a952	Don't share the endpoint->qps array, only a pointer to the actual QP. Calculate send queue size for shared QP based on all endpoints that want to use it. This commit was SVN r16790.	2007-11-28 07:21:07 +00:00
Gleb Natapov	b46c9cc7bc	Make xrc use srq_qp unions instead of the xrc_qp which is exactly like srq_qp. This commit was SVN r16789.	2007-11-28 07:20:26 +00:00
Gleb Natapov	be0981fc07	Change a type of xrc_recv_qp to "struct ibv_qp". This commit was SVN r16788.	2007-11-28 07:19:36 +00:00
Gleb Natapov	bd47da4699	Initial XRC support by Mellanox. This commit was SVN r16787.	2007-11-28 07:18:59 +00:00
Gleb Natapov	b49788c499	Receive queue is not used in case of SRQ QP, so don't create one. This commit was SVN r16786.	2007-11-28 07:17:22 +00:00
Gleb Natapov	923666b75c	Process pending put/get frags on endpoint connection establishment. This commit was SVN r16785.	2007-11-28 07:16:52 +00:00
Gleb Natapov	e502402470	Fix endpoint destructor to not skip closed endpoints. This commit was SVN r16784.	2007-11-28 07:15:54 +00:00
Gleb Natapov	5a4e953aaa	Allow sharing the same qp for different buffer sizes. Needed for XRC support. This commit was SVN r16783.	2007-11-28 07:15:20 +00:00
Gleb Natapov	b123696d57	Fix async thread creation and destruction. Create async thread only when it is needed instead of creating it and then canceling if it is not needed. Change error handling during finalize so that it will not skip async thread destruction. Otherwise async thread may segfault during openib module unloading. This commit was SVN r16782.	2007-11-28 07:14:34 +00:00
Gleb Natapov	5463eb892c	Send all explicit credits for PP QPs of all orders over smallest PP qp. This commit was SVN r16781.	2007-11-28 07:13:34 +00:00
Gleb Natapov	a9f864d15c	If there is an eager rdma credit, but there is no WQE to send a packet we add it to a pending queue of eager rdma QP instead of correct pending list. This patch fixes this by getting rid of the "eager rdma qp" notion. A packet is always sent over its order QP. The patch also adds two pending queues for high and low prio packets. Only high prio packets are sent over eager RDMA channel. This commit was SVN r16780.	2007-11-28 07:12:44 +00:00
Gleb Natapov	6a2d210b7d	Use OMPI object system to make fragment hierarchy more object oriented. The main idea (apart from cleanup) is to save on initialisation of unneeded fields and to use C type checking system to catch obvious errors. This commit was SVN r16779.	2007-11-28 07:11:14 +00:00
Gleb Natapov	267cd2342a	Cleanup. Remove unused functions. This commit was SVN r16778.	2007-11-28 07:08:56 +00:00
Ron Brightwell	924414f92f	Added support for Accelerated Portals for the btl. This commit was SVN r16771.	2007-11-21 21:34:17 +00:00
Ron Brightwell	a6d6be1bb9	Added send-side optimizations (persistent zero-length md and copy blocks) and support for Accelerated Portals. This commit was SVN r16770.	2007-11-21 21:31:37 +00:00
Brad Penoff	fb5536f11d	conforming SCTP BTL to Open MPI naming conventions and IP requirements This commit was SVN r16764.	2007-11-21 10:13:41 +00:00
Andrew Friedley	c50f2aa74c	fix warning This commit was SVN r16759.	2007-11-20 16:55:12 +00:00
Brad Penoff	ede8a6a7a1	adjusting for Linux when sctp_recvmsg returns 0 for remote close This commit was SVN r16742.	2007-11-20 06:02:08 +00:00
Tim Prins	f42fcd36db	make the mx btl compile again after the free list changes This commit was SVN r16735.	2007-11-19 19:41:22 +00:00
Brad Penoff	f34ddfef80	for SCTP BTL, added Mac OS X support for systems using SCTP NKE (Network Kernel Extension) This commit was SVN r16729.	2007-11-17 02:56:27 +00:00
Aurelien Bouteiller	15ffe6c89c	Accommodating the new interface for free_lists. This commit was SVN r16727.	2007-11-16 00:00:38 +00:00
Brad Penoff	5abd2d8064	initial SCTP BTL commit This commit was SVN r16723.	2007-11-13 23:39:16 +00:00
Adrian Knoth	037a533752	Reformatted r16691 to OMPI style. Re #733 This commit was SVN r16693. The following SVN revision numbers were found above: r16691 --> open-mpi/ompi@8dca19cb3b	2007-11-08 12:54:48 +00:00
Adrian Knoth	8dca19cb3b	upstream patch, provided by Jiri Polach. Re #733 This commit was SVN r16691.	2007-11-08 12:44:10 +00:00
Jeff Squyres	a4d571f8ad	Fix typo that broke the build. This commit was SVN r16635.	2007-11-02 09:19:55 +00:00
Rich Graham	27a748e7eb	change all instances of ompi_free_list_init to ompi_free_list_init_new. Header and payload data are specified separately at this stage. This commit was SVN r16633.	2007-11-01 23:38:50 +00:00
Andrew Friedley	46516d98e1	Update MCA params -- sd_num_peer is no longer used, change rd_num_init to rd_num This commit was SVN r16601.	2007-10-29 22:56:30 +00:00
Andrew Friedley	8273b61471	Bugfix for hangs in certain communication patterns, particularly alltoall. This commit was SVN r16600.	2007-10-29 21:51:28 +00:00
Gleb Natapov	04578ffdd6	Change calls to bml_btl->btl_alloc() to mca_bml_base_alloc(). This commit was SVN r16596.	2007-10-28 16:04:17 +00:00
Rich Graham	67f4b69848	propagate fix for out of buffered send memory space to dr and ob1 - thanks George. This commit was SVN r16593.	2007-10-27 00:17:53 +00:00
Rich Graham	9c0483088a	if unable to get buffered space, try and progress communications to free up resources. This commit was SVN r16591.	2007-10-26 23:16:31 +00:00
George Bosilca	d67c0eefb4	Remove a compilation warning about using uninitialized variables. This commit was SVN r16589.	2007-10-26 20:15:28 +00:00
George Bosilca	b1b5cb6453	Looks like SO_REUSEPORT is not defined on some platforms. Switch to the conventional SO_REUSEADDR instead. This commit was SVN r16588.	2007-10-26 19:56:21 +00:00
George Bosilca	337f78a4a8	Restrict the port range for the OOB and the BTL. Each protocol (v4 and v6) has its own range, which is defined by a min value and a range. By default there is no limitation on the port range, which is exactly the same behavior as before. This commit was SVN r16584.	2007-10-26 16:36:51 +00:00
George Bosilca	682f110658	Correctly test the finalize condition. Thanks to Ake Sandgren for bringing this issue to our attention. This commit was SVN r16560.	2007-10-24 13:34:27 +00:00
Gleb Natapov	3a63eb6c17	Cleanup macro definitions. This commit was SVN r16554.	2007-10-23 13:33:19 +00:00
Gleb Natapov	d836f3dbbe	Remove unused macro. This commit was SVN r16552.	2007-10-23 13:18:10 +00:00
Gleb Natapov	18ed60edeb	Revert previous commit. There was no memory leak, the pointer is saved inside free list for future use. This patch moves BTL initialization into separate function too. This commit was SVN r16551.	2007-10-23 12:57:45 +00:00
Gleb Natapov	657e544e02	Fix memory leak. Define init_data on a stack instead of allocating it each time. This commit was SVN r16550.	2007-10-23 11:10:52 +00:00
Gleb Natapov	9e2d5acf8e	Remove unused field from openib fragment structure. This commit was SVN r16549.	2007-10-23 07:38:29 +00:00
George Bosilca	95c9fbdf45	Make sure the MX MTL component is shared between all files. This commit was SVN r16545.	2007-10-22 18:06:52 +00:00
Gleb Natapov	63dde87076	If SM BTL cannot send fragment because the cyclic buffer is full put the fragment on the pending list and send it later instead of spinning on opal_progress(). This commit was SVN r16537.	2007-10-22 12:07:22 +00:00
Rich Graham	0de9bd9fa0	when attaching an md for posted receive, generate a start event, so that PtlMDUpdate will pick up all incoming events. This commit was SVN r16517.	2007-10-19 19:09:40 +00:00
Gleb Natapov	52c6160252	MCA_PML_BASE_REQUEST_MPI_COMPLETE() macro does nothing except call to ompi_request_complete(). Remove the macro and call the function directly. This commit was SVN r16498.	2007-10-18 14:20:24 +00:00
George Bosilca	aa20a94b6f	Remove warning about an unused variable. This commit was SVN r16497.	2007-10-18 13:48:56 +00:00
Gleb Natapov	4f865e22e8	We have two different versions of ompi_request_complete: one as a function, another as a macro. Make it one inline function. This commit was SVN r16495.	2007-10-18 13:02:27 +00:00
Gleb Natapov	e0a3a7e53e	Move duplicated code all over the code to a single function ompi_request_wait_completion(). This commit was SVN r16494.	2007-10-18 12:33:21 +00:00
Gleb Natapov	807f49ed7f	If there is more than one BTL present we may divide payload between them in such a way that converter will not be able to pack some of it. This commit adds handling of such cases. If converter can't pack any data for a BTL the data is sent over another BTL that has data to send. This commit was SVN r16493.	2007-10-18 12:07:37 +00:00
Jeff Squyres	b7eeae0a74	Remove the mvapi BTL. Woo hoo! This commit was SVN r16483.	2007-10-17 14:08:03 +00:00
Jeff Squyres	94b1e9cff9	Update to use BTL_VERBOSE and BTL_ERROR instead of opal_output'ing to the mca_btl_base_output stream directly (and relying on it to be -1 if we didn't want any output). This commit was SVN r16449.	2007-10-15 17:53:02 +00:00
Rolf vandeVaart	3dd5196338	Remove the --mca btl_base_debug flag and clean up the use of the --mca btl_base_verbose flag. The btl framework now matches all the other frameworks. Slightly modify error messages for clarity. This commit was SVN r16443.	2007-10-15 13:10:20 +00:00
Gleb Natapov	1330974e5e	eager_limit is no longer needed in OB1 PML. Remove it. This commit was SVN r16442.	2007-10-15 09:26:42 +00:00
George Bosilca	436b0f2a5b	Way too many numbers in this uint32_t. This commit was SVN r16437.	2007-10-12 13:11:55 +00:00
Jeff Squyres	3500376d9e	Remove a warning about an unused label. This commit was SVN r16429.	2007-10-11 16:38:37 +00:00
George Bosilca	e3105a85be	Don't require a progress function from the PML. If there is one then the PML base will take care of the registration with the event library. Otherwise (and this applies to the CM case), the MTLs are in charge of registering their own progress function. This commit was SVN r16415.	2007-10-09 23:28:53 +00:00
Galen Shipman	6a25a635de	that shouldn't have slipped through.. This commit was SVN r16411.	2007-10-09 19:07:23 +00:00
Galen Shipman	6b051e255e	already checked size.. no need to do it again.. This commit was SVN r16409.	2007-10-09 18:59:10 +00:00
Nysal Jan	b51d85fb3f	Fix assertion failure "assert( 0 == btl_endpoint->endpoint_cache_length )" while executing mt_coll testcase. This commit was SVN r16408.	2007-10-09 18:00:01 +00:00
Galen Shipman	62ade993ca	Separate finalize and close for the PML; this gives the PML a chance to complete any outstanding operations prior to close. Before this change we just called pml_finalize in pml_close, which causes problems if there are outstanding events that a BTL/MTL needs to progress during finalize. The problem is that MPI_COMM_WORLD and others were destroyed prior to closing the PML, pml_close would call pml_finalize, events would progress in the BTL, and these events expected MPI_COMM_WORLD to still be around. This commit was SVN r16405.	2007-10-09 15:28:56 +00:00
Andrew Friedley	c15047b264	Add LLNL copyright to the file i modified yesterday This commit was SVN r16404.	2007-10-09 15:18:23 +00:00
Andrew Friedley	fd51d9cf28	The call to opal_list_insert() had an off by one error (I think), causing selected components to get lost with certain load orderings. I went ahead and rewrote the code to use opal_list_insert_pos() instead, which gives a cleaner flow and more speed. This commit was SVN r16392.	2007-10-08 23:01:36 +00:00
Josh Hursey	7437f37e96	This commit contains the following: * Fix some missing includes in a few places. * Add the cr_request() functionality to the BLCR CRS component. We are now dependent upon the 0.6.* series of BLCR. * Made the CR notification mechanism a registered function. This way we can have an OPAL-only version and it can be replaced at runtime with the ORTE version. * Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only CR functionality when the user wants it. Default: Disabled. * Fix the placement of a checkpoint request check in MPI_Init * Pull the OPAL notification mechanism into the SnapC framework. * We no longer fork/exec the 'opal-checkpoint' command for local checkpointing, the Local coordinator in the orted does this directly. * The Local and Application coordinator talk together bypassing the OPAL notification mechanism. * Optimized the Local <-> App Coordinator communication. * Improved the structure used to track vpid_snapshots in the local coord. * Fix a race condition in which an application under heavy communication load may produce an inconsistent global checkpoint. This commit was SVN r16389.	2007-10-08 20:53:02 +00:00
Jeff Squyres	f92d9097d8	Some more changes to update to coll v1.1.0 that were missed yesterday. This actually exposed a very, very long-standing bug where part of the coll base was incorrectly checking the coll API version against the MCA API version. When coll went to v1.1 (yesterday) and was no longer the same as the MCA v1.0, the test started failing. This commit fixes to check for v1.1 everywhere in the coll base, and to ensure to check coll framework/API version numbers against coll framework/API version numbers (vs. against the MCA API version number). This commit was SVN r16373.	2007-10-07 12:20:22 +00:00
Jeff Squyres	3d34bff596	No technical/functional changes: simply change the name of the "data" parameter to "module" everywhere, just to be a little more clear what the purpose of that parameter is. This commit was SVN r16372.	2007-10-07 08:36:45 +00:00
Jeff Squyres	fc2b4376e9	Update forgotten macro. This commit was SVN r16368.	2007-10-06 14:11:35 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. 
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. 
Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. 
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Jelena Pjesivac-Grbovic	ada43fef9e	This fixes bug #1157 in coll/self module. All vector functions had incorrect handling of the offset. This commit was SVN r16360.	2007-10-05 17:40:16 +00:00
Jeff Squyres	f92154fc72	Gah -- ompi_info doesn't setup the connect pseudo component, so it'll be NULL. Ensure to protect for this. This commit was SVN r16333.	2007-10-04 18:03:56 +00:00
Jeff Squyres	13fa7ae93e	It's not necessary to link against all 3 libs (in fact, we shouldn't do it -- let libtool pull them in via the .la file if it needs to) This commit was SVN r16332.	2007-10-04 18:01:30 +00:00
Jeff Squyres	80ce974291	Fixes trac:1156: ensure to finalize the "connect" sub-component. This commit was SVN r16330. The following Trac tickets were found above: Ticket 1156 --> https://svn.open-mpi.org/trac/ompi/ticket/1156	2007-10-04 17:36:12 +00:00
Andrew Friedley	2e66590993	Fix mistakes in the basic component.. can't call collectives on the communicator and always pass the basic module.. have to give them the module off the communicator. This commit was SVN r16329.	2007-10-04 16:29:24 +00:00
Andrew Friedley	5be7f5e2dc	fixes trac:1154 Check if an exclusion string (i.e. '-mca btl ^sm) was provided; if so OFUD just disables itself. This commit was SVN r16307. The following Trac tickets were found above: Ticket 1154 --> https://svn.open-mpi.org/trac/ompi/ticket/1154	2007-10-02 20:37:16 +00:00
Gleb Natapov	60af46d541	We have QP description in component structure, module structure and endpoint. Each one of them has a field to store QP type, but this is redundant. Store qp type only in one structure (the component one). This commit was SVN r16272.	2007-09-30 16:14:17 +00:00
Gleb Natapov	9c04b127f5	Forget to put this fix in previous commit. This commit was SVN r16271.	2007-09-30 15:33:20 +00:00
Gleb Natapov	3a15d645be	Remove lcl_qp_attr from endpoint qp description. It is used during init only. This commit was SVN r16270.	2007-09-30 15:29:35 +00:00
Aurelien Bouteiller	670956e172	Another cast mistake. This commit was SVN r16247.	2007-09-26 21:14:35 +00:00
Aurelien Bouteiller	f7d7d58fb6	Various cast type errors on 64bit architectures This commit was SVN r16246.	2007-09-26 20:54:18 +00:00
Brian Barrett	56e26ed390	Need to install the mpool_rdma.h so that we can build external BTLs that use the RDMA protocol This commit was SVN r16237.	2007-09-26 16:58:54 +00:00
Gleb Natapov	c7105eadc7	Update Voltaire copyright. This commit was SVN r16189.	2007-09-24 10:11:52 +00:00
Aurelien Bouteiller	0df0087f17	Investigating improvement of cache line management on shared memory This commit was SVN r16183.	2007-09-21 20:02:56 +00:00
Josh Hursey	1fe1276fd5	Make sure to match on the communicator ID as well. This commit was SVN r16179.	2007-09-21 18:16:02 +00:00
Josh Hursey	3e51d7bb25	Implement the MPI_Iprobe and MPI_Probe wrappers. Remove some old, unused code. This commit was SVN r16178.	2007-09-21 16:28:46 +00:00
Aurelien Bouteiller	d3b376a340	This patch adds actual non-blocking sender-based message logging. This improves bandwidth. Still need to work on malloc/mmap storage to reach optimal bandwidth. This commit was SVN r16172.	2007-09-21 03:24:08 +00:00
Aurelien Bouteiller	bc318b35e2	There is room in the convertor to copy the packed data. It works; we just need to add the correct memcopy. It does not manage the short messages, but I already have a workaround in mind for this (and it might even be better regarding latency). This commit was SVN r16169.	2007-09-20 21:57:21 +00:00
Aurelien Bouteiller	bbac6e650a	New improved version of sender-based. Under dev but a new framework for expressing various methods have been added. This commit was SVN r16159.	2007-09-19 03:42:56 +00:00
Gleb Natapov	097b17d30e	Prevent a receive request from being freed while another thread holds a reference to it or there is an outstanding completion for the request. This commit was SVN r16153.	2007-09-18 16:18:47 +00:00
Jeff Squyres	33955a0ed0	Oops -- when converted from uint to int, -1 (the default value, meaning "infinite") is no longer larger than the minimum required size. So put in an appropriate test to ensure that "infinite" was not requested. This commit was SVN r16142.	2007-09-17 19:28:21 +00:00
Jeff Squyres	130a272cec	Fix some compiler warnings about signed/unsigned comparisons. This commit was SVN r16139.	2007-09-17 13:08:45 +00:00
Josh Hursey	d2ef0d445a	Add some basic timing hooks so I can extract a few more detailed performance numbers for tuning. Switch the bookmark_recv to be non-blocking. If this is blocking then for process counts >= 32 slight process delays were causing cascading performance delays in the protocol. This led to checkpoints either taking about 3 sec or 45 sec (or more) for 64 procs due to the cascading delays. With the nonblocking receive version this is no longer the case, and we get the speedup we expect for this part of the protocol. More tuning to come. This commit was SVN r16137.	2007-09-16 15:13:23 +00:00
Jeff Squyres	6004e177e0	Fixes trac:1133: if you specify a max freelist size that is too small, you'll get a helpful error message and the openib BTL will deactivate itself. This commit was SVN r16133. The following Trac tickets were found above: Ticket 1133 --> https://svn.open-mpi.org/trac/ompi/ticket/1133	2007-09-14 21:42:56 +00:00
George Bosilca	617ff3a413	Add a MCA parameter for the ELAN MAP ID file. Fix small memory bugs, and track the final segfault. Still some work to do. This commit was SVN r16117.	2007-09-12 21:25:35 +00:00
Aurelien Bouteiller	a1f5312afb	Fixed two little warnings This commit was SVN r16116.	2007-09-12 21:07:11 +00:00
Aurelien Bouteiller	ccb3f75e8f	Make sure that the pml v parasite never gets loaded when the user did not request FT. This does not break the ability to switch protocol on the fly. This commit was SVN r16114.	2007-09-12 20:47:17 +00:00
George Bosilca	1e7a791349	Remove some of the problems identified by Coverity. This commit was SVN r16112.	2007-09-12 20:13:26 +00:00
Aurelien Bouteiller	828af95be8	Major modification of the vprotocol framework build system. With a better integration in autogen.sh, it allows for generating static-components.h the usual way. NOTE: This build system does not work with the current autogen.sh. Modified one is under heavy testing to make sure it does not have side effects This commit was SVN r16110.	2007-09-12 18:46:37 +00:00
George Bosilca	05ae27c68b	Don't segfault if we receive a fragment for a non-existent communicator. Instead, drop it for now. This commit was SVN r16105.	2007-09-12 17:52:02 +00:00
George Bosilca	c755938eb0	Coverity: release the temporary buffer on error. This commit was SVN r16104.	2007-09-12 17:45:12 +00:00
Shiqing Fan	a0660f4deb	- Just some type casts. This commit was SVN r16100.	2007-09-12 15:29:58 +00:00
Gleb Natapov	07c8fddeef	Fix scheduling of pending send request. It should be scheduled req_lock times. This commit was SVN r16096.	2007-09-12 07:08:38 +00:00
George Bosilca	d8fed2cfa1	Set a default value so that some compilers stop complaining about uninitialized values. This commit was SVN r16094.	2007-09-11 18:00:53 +00:00
Gleb Natapov	b0614931f4	Remove mpool_tree_item from the mpool_tree before unregistering/freeing memory. Otherwise a race exists if another thread allocates already freed memory which is not removed from the mpool_tree yet. This commit was SVN r16038.	2007-09-03 10:56:55 +00:00
Rainer Keller	a3b30749b0	- Only lock/unlock when using threads. Basically revert this part of r16015. This commit was SVN r16029. The following SVN revision numbers were found above: r16015 --> open-mpi/ompi@435e7d80e9	2007-08-31 12:34:48 +00:00
Rainer Keller	9c1c345c07	- head_lock is an opal_atomic_lock_t... This commit was SVN r16028.	2007-08-31 12:20:21 +00:00
Shiqing Fan	efdcfa3807	- "extern 'C'" has been set twice. Remove one. This commit was SVN r16022.	2007-08-30 15:03:59 +00:00
Shiqing Fan	80fdd5e2a4	- Need to be exported. This commit was SVN r16021.	2007-08-30 14:16:03 +00:00
Gleb Natapov	79011279e5	Remove debug output. This commit was SVN r16016.	2007-08-30 13:29:41 +00:00
Gleb Natapov	435e7d80e9	Remove rc parameter from MCA_BTL_SM_FIFO_WRITE() macro. It cannot fail in current implementation. This commit was SVN r16015.	2007-08-30 13:21:52 +00:00
Gleb Natapov	690fb95bda	Cleanup send scheduling code. This commit was SVN r16014.	2007-08-30 12:10:04 +00:00
Gleb Natapov	0b0f9d14aa	Mark send request complete on PML level only when absolutely sure there is no more work associated with this request. No more outstanding completions or packets and send scheduling isn't running in another thread. This commit was SVN r16013.	2007-08-30 12:08:33 +00:00
Gleb Natapov	fe414047bd	registration may be freed inside mca_mpool_rdma_deregister(). This commit was SVN r16012.	2007-08-30 10:52:38 +00:00
Gleb Natapov	091862a25a	Protect access to mca_mpool_base_tree by a lock. This commit was SVN r16011.	2007-08-30 10:51:02 +00:00
Gleb Natapov	eac2674f66	The inner voice tells me this is a typo. This commit was SVN r16004.	2007-08-29 13:28:47 +00:00
Jeff Squyres	466394a878	We only care about the value of ret in the !OMPI_ENABLE_PROGRESS_THREADS case. Reviewed by Brian. This commit was SVN r16000.	2007-08-29 01:36:17 +00:00
Jeff Squyres	c4a38f47f6	Resolve Coverity CID 467: remove unused variable / dead code. This commit was SVN r15997.	2007-08-29 01:23:18 +00:00
Brian Barrett	59b22533f2	Enable RDMA for heterogeneous situations. Currently done by overloading the ompi_convertor_need_buffers function to only return 0 if the convertor is homogeneous (which it never does on the trunk, but does on v1.2, but that's a different issue). Only enable the heterogeneous rdma code for a btl if it supports it (via a flag), as some btls need some work for this to work properly. Currently only TCP and OpenIB are extensively tested. This commit was SVN r15990.	2007-08-28 21:23:44 +00:00
Gleb Natapov	fa69c5cc10	If memory on the sender's side is not registered, don't register it on the receive side either. Otherwise the content of the recvreq->req_rdma array is replaced later without freeing the previous content, and the refcount on the registration in the mpool becomes wrong. This commit was SVN r15978.	2007-08-28 07:43:06 +00:00
Rich Graham	bc97d22182	remove tabs. Remove old code that was commented out. This commit was SVN r15975.	2007-08-28 03:08:36 +00:00
Rich Graham	4d58f9aed7	Add comments. Move temporary receive object from a free list object to a stack object. This commit was SVN r15971.	2007-08-27 21:41:04 +00:00
Gleb Natapov	e1a1d9d90e	Receive request converter can be accessed in parallel by a thread that receives data and a thread that runs the RDMA schedule function. Protect access to the converter by a lock. This commit was SVN r15967.	2007-08-27 11:41:42 +00:00
Gleb Natapov	33196d972b	post_send() function is called without endpoint lock held from explicit credits update function so eager_rdma_remote.head has to be updated in a thread safe manner. This commit was SVN r15966.	2007-08-27 11:37:01 +00:00
Gleb Natapov	32a61c3bf2	Credit fragment is not protected properly from concurrent access. There is a race that can prevent further explicit credits updates from being sent. Fix the race. This commit was SVN r15965.	2007-08-27 11:34:59 +00:00
Gleb Natapov	065d04dfde	Do not free recvreq while schedule function is running in another thread. This commit was SVN r15964.	2007-08-27 11:31:40 +00:00
Brad Benton	ccda5c9c74	Modified the MCA_BTL_TCP_CONNECTED case in mca_btl_tcp_endpoint_send_handler() to always first check for a NULL frag pointer before trying to send the fragment. This avoids an issue in multi-threaded execution in which multiple threads working on the same endpoint can result in a thread finding itself here with nothing to send. This commit was SVN r15963.	2007-08-26 23:40:02 +00:00
Edgar Gabriel	a2f5cada1a	Convert the hierarch component to the new structure. More testing required before we remove the .ompi_ignore flag again. This commit was SVN r15954.	2007-08-23 20:41:29 +00:00
Rainer Keller	1b5fa48a29	- Add missing PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q when matching from the posted generic_recv-queue. - Move the PERUSE_COMM_MSG_MATCH_POSTED_REQ from MCA_PML_OB1_RECV_REQUEST_MATCHED to mca_pml_ob1_recv_frag_match() as suggested by Terry Dontje. Only post if this is not a probe/iprobe request. - Do not post PERUSE_COMM_REQ_MATCH_UNEX for probes / iprobes, and post it in the correct order, before PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q. This commit was SVN r15947.	2007-08-23 07:09:43 +00:00
Rainer Keller	c175801f98	- Initialize in the order of mca_pml_ob1_comm_proc_t... This commit was SVN r15946.	2007-08-23 05:56:22 +00:00
Rainer Keller	b0df55d53b	- For MPI_Probe/MPI_Iprobe, we should not have a PERUSE_COMM_REQ_ACTIVATE event. Therefore move the PERUSE_TRACE_COMM_EVENT for this event from MCA_PML_BASE_SEND_REQUEST_INIT / MCA_PML_BASE_RECV_REQUEST_INIT to the proper places into pml_ob1_isend.c / pml_ob1_irecv.c right after the MCA_PML_OB1_SEND_REQUEST_INIT / MCA_PML_OB1_RECV_REQUEST_INIT. This commit was SVN r15945.	2007-08-23 05:52:33 +00:00
Gleb Natapov	becf4aa9c9	ompi_pointer_array_get_size doesn't return how many elements are actually in an array, so count them ourselves. This commit was SVN r15943.	2007-08-22 09:31:12 +00:00
Shiqing Fan	a497a3fcad	- Fix some small bugs, copy-paste mistakes. This commit was SVN r15941.	2007-08-21 19:57:28 +00:00
Sven Stork	3985a35c35	- export required symbol This commit was SVN r15939.	2007-08-21 18:46:11 +00:00
Gleb Natapov	d8f3063895	Create only one CQ for all BTLs on the same HCA. Many BTLs can be created for one HCA: multiple ports, LMC, multiple BTLs per one LID. Having only one CQ for all of them substantially reduces polling time. This commit was SVN r15933.	2007-08-20 12:28:25 +00:00
Gleb Natapov	5596aa5f53	The sizes of mca_pml_ob1_send_request_t and mca_pml_ob1_recv_request_t depend on a parameter and are determined at runtime. r15346 removed the calculation of the correct sizes for these structures. This patch adds it back and fixes trac:1116, #1114. This commit was SVN r15932. The following SVN revision numbers were found above: r15346 --> open-mpi/ompi@433f8a7694 The following Trac tickets were found above: Ticket 1116 --> https://svn.open-mpi.org/trac/ompi/ticket/1116	2007-08-20 12:06:27 +00:00
George Bosilca	c7e0ab93ae	Don't forget to include string.h for the strcmp function. This commit was SVN r15927.	2007-08-19 19:59:15 +00:00
Brian Barrett	af4e86c25f	Update collectives selection logic to allow for multiple components to be used at once (up to one unique collective module per collective function). Matches r15795:15921 of the tmp/bwb-coll-select branch. This commit was SVN r15924. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15795 r15921	2007-08-19 03:37:49 +00:00
Brian Barrett	2b8af283de	Add ability to completely turn off MPI one-sided support, so that users can experiment with using ROMIO directly. This commit was SVN r15922.	2007-08-18 21:35:51 +00:00
Josh Hursey	729c63cf9d	Fix invalid MCA 'base' names so they appear in ompi_info. A subset of this patch needs to be applied to v1.2. Refs trac:928 This commit was SVN r15918. The following Trac tickets were found above: Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928	2007-08-18 03:05:45 +00:00
Brian Barrett	3b98b5f0a1	The reference implementation of Portals (which runs over TCP on Linux) is only static libraries. Previously, we were linking the libraries directly into the common, btl, and mtl code. This seemed to work fine for me on my Opteron Fedora box, but caused Lisa some issues (PtlNIInit would succeed, but the network handle would fail when used with PtlEQAlloc). Instead, link the portals libraries directly into libmpi and not at all into the common, btl, or mtl components. Then use some linker tricks to force the linker to bring in the public interface for the reference implementation (which thankfully is pretty small). This commit was SVN r15902.	2007-08-17 03:56:49 +00:00
Brad Benton	c254645383	Fixes trac:1134. Fixed a condition test while checking that all segments are empty. Without this fix, a NULL segment pointer could make it past the test, resulting in a SegV when dereferenced. This commit was SVN r15891. The following Trac tickets were found above: Ticket 1134 --> https://svn.open-mpi.org/trac/ompi/ticket/1134	2007-08-16 19:39:52 +00:00
Brad Benton	1ddba9ec65	Lock the endpoint before doing endpoint_state processing. This ensures that the subsequent unlock is valid. This commit was SVN r15890.	2007-08-16 18:11:29 +00:00

2486 commits