openmpi

Author	SHA1	Message	Date
George Bosilca	3fca3973d3	The PTLs are now long gone !!! This commit was SVN r17104.	2008-01-10 00:18:45 +00:00
Jon Mason	3970c3ff6c	Add Chelsio T3 to ompi/mca/btl/openib/mca-btl-openib-hca-params.ini This commit was SVN r17101.	2008-01-09 22:14:18 +00:00
Jon Mason	597c7e68f1	Minor cleanups This commit was SVN r17100.	2008-01-09 21:54:11 +00:00
George Bosilca	1bd31aa3ac	Cleanup the OMPI_DECLSPEC/OMPI_MODULE_DECLSPEC in the PMLs. This commit was SVN r17093.	2008-01-09 20:32:39 +00:00
Rolf vandeVaart	870fa8b1f1	Pad the sm btl header to double-word alignment. Preserves PML header as double-word aligned and prevents bus errors on SPARC based servers. This is part of fix for #1148. Refs trac:1148 This commit was SVN r17090. The following Trac tickets were found above: Ticket 1148 --> https://svn.open-mpi.org/trac/ompi/ticket/1148	2008-01-09 18:50:51 +00:00
Gleb Natapov	25ce70bb92	Call mca_btl_openib_endpoint_post_send() holding endpoint lock and not holding qp lock since this is what the function assumes. This commit was SVN r17086.	2008-01-09 14:46:41 +00:00
Pavel Shamis	99f51482e3	Fixing openib finalization flow. This commit was SVN r17085.	2008-01-09 12:36:30 +00:00
Gleb Natapov	51d6ca0cb6	Provide no lock version of mca_btl_openib_endpoint_post_rr(). On connection creation we call it with endpoint lock already held. This commit was SVN r17084.	2008-01-09 10:39:35 +00:00
Gleb Natapov	50af6b9e78	Rearrange functions order so that functions are defined before they are used. No code changes here. This commit was SVN r17083.	2008-01-09 10:27:15 +00:00
Gleb Natapov	621fa223c5	Create free lists of fragments per HCA, not per BTL. Saves memory in case of multiple LMCs. This commit was SVN r17082.	2008-01-09 10:26:21 +00:00
Gleb Natapov	5ce3213158	Rearrange functions order so that functions are defined before they are used. No code changes here. This commit was SVN r17081.	2008-01-09 10:05:41 +00:00
Gleb Natapov	b37ff74a24	Make function that is used only in one file static. Remove static functions declaration. This commit was SVN r17080.	2008-01-09 09:54:35 +00:00
Ethan Mallove	f32dcb1636	The Sun Studio 12 compilers need to have `inline` specified as `static` in cases where a function is not part of a separate compilation unit (such as `append_recv_req_to_queue`). This commit was SVN r17069.	2008-01-08 18:45:51 +00:00
Pavel Shamis	fbf7bcd9a9	We need to prepost on srq/xrc before replying with ENDPOINT_XOOB_CONNECT_XRC_RESPONSE. This commit was SVN r17066.	2008-01-08 10:30:16 +00:00
Gleb Natapov	8bfcfa464a	Don't call free(), or library functions that may call free() inside (such as ibv_dereg_mr() for instance) from ptmalloc callback. Call to free() from the callback causes deadlock. Notice what should be unregistered inside the callback and do actual cleanup at the next call to mpool->register(). This commit was SVN r17064.	2008-01-08 08:55:42 +00:00
Aurelien Bouteiller	9bf54e1604	Windows compatibility patch. Also introduces work in progress "convertor" sender based copy algorithm. This algorithm cannot be selected without other modifications in the convertor (not currently available in trunk). The default old synchronous copy algorithm is selected by default. This commit was SVN r17063.	2008-01-07 23:35:44 +00:00
Rolf vandeVaart	0f0fde3490	Partial fix for #1148. Enable this for 32-bit sparc as well as 64-bit sparc. This commit was SVN r17059.	2008-01-07 15:43:44 +00:00
Gleb Natapov	c3bbf69356	Set send_flags correctly in btl_openib_put. Otherwise we may reuse flags from previous use of the buffer and they may be incorrect. This commit was SVN r17058.	2008-01-07 10:19:07 +00:00
George Bosilca	d2324050f8	Allow the PML V component to be compiled on Windows. Force all .c files to include the ompi_config.h as the first #include. This commit was SVN r17056.	2008-01-05 00:17:32 +00:00
George Bosilca	48f5a26e8c	Cast to keep VC happy (quiet). This commit was SVN r17054.	2008-01-04 23:13:32 +00:00
Jeff Squyres	a234ba198a	Remove superfluous / unused -D from Makefile.am. This commit was SVN r17030.	2008-01-02 18:00:20 +00:00
Jeff Squyres	c9bea80f8f	Fix unbalanced parentheses noticed by Paul Hargove. This commit was SVN r17029.	2008-01-02 13:34:07 +00:00
Gleb Natapov	2fb6947f88	Destroy endpoints that use eager rdma communication before destroying SRQ. Don't skip async event thread destruction if SRQ was not destroyed, or it will segfault on module removal. This commit was SVN r17025.	2007-12-23 13:58:31 +00:00
Gleb Natapov	b06d92bdab	OpenIB BTL has three channels through which data can be received (eager rdma, high prio QPs and low prio QPs) and because not all of them are polled each time progress() is called (to save on latency) starvation is possible. The commit fixes this. Now each channel is polled, but higher priority channels are polled more often. Three new parameters are introduced that control polling ratios between different channels. This commit was SVN r17024.	2007-12-23 12:29:34 +00:00
Brad Penoff	4c2571b54c	fixed more 64 bit SCTP BTL warnings This commit was SVN r17022.	2007-12-21 21:50:00 +00:00
Brad Penoff	195faa37b6	fixed send side of 64 bit compilation warnings This commit was SVN r17019.	2007-12-21 19:11:50 +00:00
Jeff Squyres	558d179e2e	Fix typo. This commit was SVN r17012.	2007-12-21 14:25:48 +00:00
George Bosilca	42414b27e9	Use BEGIN_C_DECLS and END_C_DECLS instead of the ugly #if/#endif. This commit was SVN r17009.	2007-12-21 06:19:46 +00:00
George Bosilca	b58dae00db	Allow PERUSE to compile correctly. This commit was SVN r17008.	2007-12-21 06:18:19 +00:00
George Bosilca	906e8bf1d1	Replace the ompi_pointer_array with opal_pointer_array. As the next step (sometime after the merge with the ORTE branch), the opal_pointer_array will become the only pointer_array implementation (the orte_pointer_array will be removed). This commit was SVN r17007.	2007-12-21 06:02:00 +00:00
Tim Mattox	bbeef5b84b	Change the MX BTL's exclusivity to MCA_BTL_EXCLUSIVITY_DEFAULT, so that it is higher than the new TCP BTL exclusivity as of r16942. The portals BTL maintainer may want to do the same... This commit was SVN r16995. The following SVN revision numbers were found above: r16942 --> open-mpi/ompi@80e9730100	2007-12-19 21:24:45 +00:00
Gleb Natapov	35bf8c7c46	Rewrite OB1 matching logic. Get rid of macros, make the code shorter. This commit was SVN r16993.	2007-12-19 09:16:20 +00:00
Pavel Shamis	fcbca510d8	The ib_inline_max should be updated only when SEND qp is created. This commit was SVN r16973.	2007-12-17 10:30:30 +00:00
Gleb Natapov	f79e344ea4	Fix bug in debug build. This commit was SVN r16972.	2007-12-17 10:26:18 +00:00
Gleb Natapov	64a95f63cd	Fix error reporting in openib if parameter value is out of range. This commit was SVN r16971.	2007-12-16 14:04:36 +00:00
Gleb Natapov	5cd38b8b06	Better encapsulate heterogeneous arch handling in ob1. This commit was SVN r16970.	2007-12-16 08:45:44 +00:00
Gleb Natapov	8b511b969d	Introduce a new BTL parameter btl_rndv_eager_limit which determines size of a first fragment of rendezvous protocol. Remove no longer used btl_min_send_size parameter. This commit was SVN r16969.	2007-12-16 08:35:17 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confusion and discussion about linkers, have all OPAL, ORTE, and OMPI components not link against the OPAL, ORTE, or OMPI libraries. See ttp://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Brad Penoff	540d483dd3	64 bit fix and initial Solaris support This commit was SVN r16967.	2007-12-15 03:28:10 +00:00
Donald Kerr	d05d3afaed	clean up and make consistent the reporting out from the udapl btl; report out readable event string instead of just a number This commit was SVN r16954.	2007-12-13 15:32:26 +00:00
Josh Hursey	a287c9cb65	This commit distinguishes the file transfer stage from the finish stage. This commit also cleans up the checkpoint and terminate case making it more precise than before. Previously the application could make a small amount of progress between checkpoint completion and application termination. Now the application will make no progress at all in this time span. Additional minor change: - Start using OPAL_INT_TO_BOOL instead of if/else logic This commit was SVN r16952.	2007-12-13 14:37:17 +00:00
Brad Penoff	ecd563b0fa	reduced noise for SCTP BTL on RHEL4U4 This commit was SVN r16951.	2007-12-13 03:15:29 +00:00
Aurelien Bouteiller	93f39fa190	Fixes various issues with --enable-visibility, C++ and exotic C compilers. Aurelien This commit was SVN r16949.	2007-12-12 19:13:23 +00:00
Jeff Squyres	80e9730100	Per http://www.open-mpi.org/community/lists/devel/2007/12/2698.php and this thread: http://www.open-mpi.org/community/lists/devel/2007/12/2807.php, set TCP's exclusivity to LOW+100 and SCTP's exclusivity to LOW. This commit was SVN r16942.	2007-12-12 15:55:37 +00:00
Jon Mason	e05cd7b0e4	To modify the default connection method, a "btl_openib_connect <arg>" should be passed via commandline. However, there is a slight coding bug in the openib connect code. When registering the name of the option, mca_base_param_reg_string will prepend the relevant info ("btl_openib_" in this case). The existing code will require "btl_openib_btl_openib_connect" instead of "btl_openib_connect". This patch corrects this. This commit was SVN r16937.	2007-12-11 20:36:36 +00:00
Galen Shipman	a04d21b459	Make CNL compile again.. This commit was SVN r16929.	2007-12-11 16:14:30 +00:00
Gleb Natapov	2a59b2a68f	1. Set segments length in prepare_src() after packing because actual size may be smaller than allocated size. 2. If reserve is zero don't allocate coalesced frag since it will be RDMAed, not sent. The logic was the other way around. This commit was SVN r16928.	2007-12-11 13:10:52 +00:00
Jon Mason	df82fcb917	Slight word usage and grammar error in the openib btl help text. I believe the change below is the intended meaning. This commit was SVN r16921.	2007-12-10 21:50:48 +00:00
Donald Kerr	a604fca52c	follow on change to r16901 and r16898; the interface change mca_btl_udapl_alloc() was not applied to two locations in this file This commit was SVN r16918. The following SVN revision numbers were found above: r16898 --> open-mpi/ompi@7364b7cf47 r16901 --> open-mpi/ompi@e2e211f23b	2007-12-10 18:10:52 +00:00
Gleb Natapov	17611dafbe	Fix pointer casting on 32bit machines. This commit was SVN r16907.	2007-12-09 14:15:35 +00:00
Gleb Natapov	2f9c5b46cf	Return OMPI_ERR_RESOURCE_BUSY from openib_btl_send() if fragment is not on wire. This commit was SVN r16906.	2007-12-09 14:14:11 +00:00
Gleb Natapov	e0dc53e516	Use mca_bml_base_send_status() in OB1. This commit was SVN r16905.	2007-12-09 14:13:24 +00:00
Gleb Natapov	666b282e7e	Add mca_bml_base_send_status function. It returns ORTE_ERR_RESOURCE_BUSY if packet was queued inside BTL. BTL should return this error if packet was queued internally. This commit was SVN r16904.	2007-12-09 14:12:38 +00:00
Gleb Natapov	493951e09d	Add heterogeneous support to message coalescing. This commit was SVN r16903.	2007-12-09 14:10:25 +00:00
Gleb Natapov	b4698dc6df	Use flags provided during allocation to coalesce to correct priority queue. This commit was SVN r16902.	2007-12-09 14:08:55 +00:00
Gleb Natapov	e2e211f23b	Add flags parameter to btl_alloc() and btl_prepare_src() functions. If BTL knows at the time of allocation priority of a descriptor it may do some optimizations. This commit was SVN r16901.	2007-12-09 14:08:01 +00:00
Gleb Natapov	5313a2baa7	Message coalescing for openib BTL. If fragment is waiting to be transmitted in a pending queue pack another message into it if there is enough space there. This commit was SVN r16900.	2007-12-09 14:05:13 +00:00
Gleb Natapov	7302cd24eb	Call btl_alloc() from btl_prepare_src() to have one point of frag allocation. This commit was SVN r16899.	2007-12-09 14:02:32 +00:00
Gleb Natapov	7364b7cf47	Add endpoint parameter to btl_alloc() function. Enables various optimizations inside BTL. This commit was SVN r16898.	2007-12-09 14:00:42 +00:00
Gleb Natapov	2d784752dd	Remove descriptor caching from BML. With descriptor caching some optimizations are impossible. This commit was SVN r16897.	2007-12-09 13:58:17 +00:00
Gleb Natapov	de3761208a	Send cm_seen by eager rdma channel. Encode qp index into credits field. If cm_seen is not sent here non symmetric eager rdma connection may hang. This commit was SVN r16896.	2007-12-09 13:56:13 +00:00
Tim Mattox	d188642715	Apparently the SCTP BTL has a btl_sctp_component.h file that needs to be part of the "sources" list. Hopefully this will clear up the nightly tarball creation for the trunk. This commit was SVN r16895.	2007-12-08 04:05:59 +00:00
Galen Shipman	4daa552c97	Correct makefile to include all sources, should fix a problem in building a distro.. This commit was SVN r16894.	2007-12-07 18:59:16 +00:00
Karl Mroz	71b54d8e4e	Removed .ompi_ignore and .ompi_unignore from SCTP BTL. This commit was SVN r16893.	2007-12-07 17:02:32 +00:00
Aurelien Bouteiller	6190c97ee9	PML V and vprotocol framework management of customizable wait/test. This is still a fast and dirty implementation (cleanup of the customized request functions is not totally correct if several components modify them out of order). This commit was SVN r16890.	2007-12-07 08:21:25 +00:00
Aurelien Bouteiller	859169214c	Vprotocol pessimist benefits from customizable requests. Waitany, waitsome, test, testany, testall, testsome can now be hooked and are therefore logged correctly. This commit was SVN r16885.	2007-12-07 08:17:30 +00:00
Jon Mason	20294e7800	There is a double call to ompi_btl_openib_connect_base_open in mca_btl_openib_mca_setup_qps(). It looks like someone just forgot to clean-up the previous call when they added the check for the return code. I ran a quick IMB test over IB to verify everything is still working. This commit was SVN r16870.	2007-12-06 17:25:38 +00:00
Pavel Shamis	e8aeadb11e	XRC fixes: - create separate xrc domain file for each hca - return error if we failed to create xrc file. This commit was SVN r16853.	2007-12-05 14:32:44 +00:00
Pavel Shamis	f60ca0e4e5	Removing unused mca_btl_openib_ib_address_status This commit was SVN r16835.	2007-12-04 13:16:26 +00:00
Pavel Shamis	57728986f8	Fixing XRC multiport/multisubnet support. This commit was SVN r16819.	2007-12-03 09:49:53 +00:00
Gleb Natapov	b2858236fb	Use new free list interface. This commit was SVN r16818.	2007-12-02 15:13:11 +00:00
Gleb Natapov	a774cd98f8	Put send completions to low prio CQ. Receive is more important. This commit was SVN r16817.	2007-12-02 14:46:37 +00:00
Gleb Natapov	b17f5b7480	Change how default receive queues parameters are calculated. Current default parameters don't make any sense. Credits are never piggybacked. Also make default queue sizes to be calculated from eager_limit and max_send_size values. This commit was SVN r16816.	2007-12-02 14:43:28 +00:00
Josh Hursey	5fb83a4f10	- Remove an unnecessary barrier - verbose -> VERBOSE just for the fun of it This commit was SVN r16811.	2007-11-30 22:26:18 +00:00
Rich Graham	6e77414a68	changes to the ompi_free_list_ex - called ompi_free_list_ex_new, for now. This commit was SVN r16803.	2007-11-29 21:18:37 +00:00
Ron Brightwell	0138a2ee17	Do cleanup in ompi_mtl_portals_del_procs() rather than ompi_mtl_portals_finalize(). Previous code was cleaning up Portals resources that hadn't been allocated, which caused valid handles used elsewhere to be freed, which broke cnos_barrier() for the Portals btl. This commit was SVN r16801.	2007-11-29 17:29:46 +00:00
Jeff Squyres	8c0060701c	Stub out the ibcm CPC. This commit was SVN r16800.	2007-11-29 13:23:17 +00:00
Pavel Shamis	8aca6eb31b	OFED 1.3 doesn't implement ibv_resize_cq for connectX. On error exit from ibv_resize_cq we should check if the function is implemented. This commit was SVN r16799.	2007-11-28 15:23:19 +00:00
Gleb Natapov	5f242c77f2	Post each recv wr not separately but in one call to ibv_post_recv(). This commit was SVN r16798.	2007-11-28 14:57:15 +00:00
Gleb Natapov	14cffee726	Uninline mca_btl_openib_post_srr() function. This commit was SVN r16797.	2007-11-28 14:52:31 +00:00
Pavel Shamis	1c314ef4c3	If XRC qp was specified in btl_openib_receive_queues we should automatically choose xoob connection module. This commit was SVN r16796.	2007-11-28 10:33:32 +00:00
Pavel Shamis	488a508732	Removing comments from help file. This commit was SVN r16795.	2007-11-28 10:16:08 +00:00
Pavel Shamis	3e2e4f6d2a	Removing unused lid. This commit was SVN r16794.	2007-11-28 10:06:57 +00:00
Pavel Shamis	aa79bdabc8	Removing port_touse - we don't really need it This commit was SVN r16793.	2007-11-28 09:57:48 +00:00
Pavel Shamis	2ffbe8776a	Fixing compilation problems in openib This commit was SVN r16792.	2007-11-28 09:38:49 +00:00
Gleb Natapov	218adb2a96	Account for eager rdma credit fragments when creating send queue. Create XRC receive QP with zero receive and send queue length. We are not going to use this QP for sends, and receives are posted to SRQs. This commit was SVN r16791.	2007-11-28 07:22:01 +00:00
Gleb Natapov	601952a952	Don't share endpoint->qps array, only pointer to actual QP. Calculate send queue size for shared QP based on all endpoints that want to use it. This commit was SVN r16790.	2007-11-28 07:21:07 +00:00
Gleb Natapov	b46c9cc7bc	Make xrc use srq_qp unions instead of the xrc_qp which is exactly like srq_qp. This commit was SVN r16789.	2007-11-28 07:20:26 +00:00
Gleb Natapov	be0981fc07	Change a type of xrc_recv_qp to "struct ibv_qp". This commit was SVN r16788.	2007-11-28 07:19:36 +00:00
Gleb Natapov	bd47da4699	Initial XRC support by Mellanox. This commit was SVN r16787.	2007-11-28 07:18:59 +00:00
Gleb Natapov	b49788c499	Receive queue is not used in case of SRQ QP, so don't create one. This commit was SVN r16786.	2007-11-28 07:17:22 +00:00
Gleb Natapov	923666b75c	Process pending put/get frags on endpoint connection establishment. This commit was SVN r16785.	2007-11-28 07:16:52 +00:00
Gleb Natapov	e502402470	Fix endpoint destructor to not skip closed endpoints. This commit was SVN r16784.	2007-11-28 07:15:54 +00:00
Gleb Natapov	5a4e953aaa	Allow sharing the same qp for different buffer sizes. Needed for XRC support. This commit was SVN r16783.	2007-11-28 07:15:20 +00:00
Gleb Natapov	b123696d57	Fix async thread creation and destruction. Create async thread only when it is needed instead of creating it and then canceling if it is not needed. Change error handling during finalize so that it will not skip async thread destruction. Otherwise async thread may segfault during openib module unloading. This commit was SVN r16782.	2007-11-28 07:14:34 +00:00
Gleb Natapov	5463eb892c	Send all explicit credits for PP QPs of all orders over smallest PP qp. This commit was SVN r16781.	2007-11-28 07:13:34 +00:00
Gleb Natapov	a9f864d15c	If there is an eager rdma credit, but there is no WQE to send a packet we add it to a pending queue of eager rdma QP instead of correct pending list. This patch fixes this by getting rid of "eager rdma qp" notion. Packet is always sent over its order QP. The patch also adds two pending queues for high and low prio packets. Only high prio packets are sent over eager RDMA channel. This commit was SVN r16780.	2007-11-28 07:12:44 +00:00
Gleb Natapov	6a2d210b7d	Use OMPI object system to make fragment hierarchy more object oriented. The main idea (apart from cleanup) is to save on initialisation of unneeded fields and to use C type checking system to catch obvious errors. This commit was SVN r16779.	2007-11-28 07:11:14 +00:00
Gleb Natapov	267cd2342a	Cleanup. Remove unused functions. This commit was SVN r16778.	2007-11-28 07:08:56 +00:00
Ron Brightwell	924414f92f	Added support for Accelerated Portals for the btl. This commit was SVN r16771.	2007-11-21 21:34:17 +00:00
Ron Brightwell	a6d6be1bb9	Added send-side optimizations (persistent zero-length md and copy blocks) and support for Accelerated Portals. This commit was SVN r16770.	2007-11-21 21:31:37 +00:00
Brad Penoff	fb5536f11d	conforming SCTP BTL to Open MPI naming conventions and IP requirements This commit was SVN r16764.	2007-11-21 10:13:41 +00:00
Andrew Friedley	c50f2aa74c	fix warning This commit was SVN r16759.	2007-11-20 16:55:12 +00:00
Brad Penoff	ede8a6a7a1	adjusting for Linux when sctp_recvmsg returns 0 for remote close This commit was SVN r16742.	2007-11-20 06:02:08 +00:00
Tim Prins	f42fcd36db	make the mx btl compile again after the free list changes This commit was SVN r16735.	2007-11-19 19:41:22 +00:00
Brad Penoff	f34ddfef80	for SCTP BTL, added Mac OS X support for systems using SCTP NKE (Network Kernel Extension) This commit was SVN r16729.	2007-11-17 02:56:27 +00:00
Aurelien Bouteiller	15ffe6c89c	Accommodating the new interface for free_lists. This commit was SVN r16727.	2007-11-16 00:00:38 +00:00
Brad Penoff	5abd2d8064	initial SCTP BTL commit This commit was SVN r16723.	2007-11-13 23:39:16 +00:00
Adrian Knoth	037a533752	Reformatted r16691 to OMPI style. Re #733 This commit was SVN r16693. The following SVN revision numbers were found above: r16691 --> open-mpi/ompi@8dca19cb3b	2007-11-08 12:54:48 +00:00
Adrian Knoth	8dca19cb3b	upstream patch, provided by Jiri Polach. Re #733 This commit was SVN r16691.	2007-11-08 12:44:10 +00:00
Jeff Squyres	a4d571f8ad	Fix typo that broke the build. This commit was SVN r16635.	2007-11-02 09:19:55 +00:00
Rich Graham	27a748e7eb	change all instances of ompi_free_list_init to ompi_free_list_init_new. Header and payload data are specified separately at this stage. This commit was SVN r16633.	2007-11-01 23:38:50 +00:00
Andrew Friedley	46516d98e1	Update MCA params -- sd_num_peer is no longer used, change rd_num_init to rd_num This commit was SVN r16601.	2007-10-29 22:56:30 +00:00
Andrew Friedley	8273b61471	Bugfix for hangs in certain communication patterns, particularly alltoall. This commit was SVN r16600.	2007-10-29 21:51:28 +00:00
Gleb Natapov	04578ffdd6	Change calls to bml_btl->btl_alloc() to mca_bml_base_alloc(). This commit was SVN r16596.	2007-10-28 16:04:17 +00:00
Rich Graham	67f4b69848	propagate fix for out of buffered send memory space to dr and ob1 - thanks George. This commit was SVN r16593.	2007-10-27 00:17:53 +00:00
Rich Graham	9c0483088a	if unable to get buffered space, try and progress communications to free up resources. This commit was SVN r16591.	2007-10-26 23:16:31 +00:00
George Bosilca	d67c0eefb4	Remove a compilation warning about using uninitialized variables. This commit was SVN r16589.	2007-10-26 20:15:28 +00:00
George Bosilca	b1b5cb6453	Looks like SO_REUSEPORT is not defined on some platforms. Switch to the conventional SO_REUSEADDR instead. This commit was SVN r16588.	2007-10-26 19:56:21 +00:00
George Bosilca	337f78a4a8	Restrict the port range for the OOB and the BTL. Each protocol (v4 and v6) has its own range which is defined by a min value and a range. By default there is no limitation on the port range, which is exactly the same behavior as before. This commit was SVN r16584.	2007-10-26 16:36:51 +00:00
George Bosilca	682f110658	Correctly test the finalize condition. Thanks to Ake Sandgren for bringing this issue to our attention. This commit was SVN r16560.	2007-10-24 13:34:27 +00:00
Gleb Natapov	3a63eb6c17	Cleanup macro definitions. This commit was SVN r16554.	2007-10-23 13:33:19 +00:00
Gleb Natapov	d836f3dbbe	Remove unused macro. This commit was SVN r16552.	2007-10-23 13:18:10 +00:00
Gleb Natapov	18ed60edeb	Revert previous commit. There was no memory leak, the pointer is saved inside free list for future use. This patch moves BTL initialization into separate function too. This commit was SVN r16551.	2007-10-23 12:57:45 +00:00
Gleb Natapov	657e544e02	Fix memory leak. Define init_data on a stack instead of allocating it each time. This commit was SVN r16550.	2007-10-23 11:10:52 +00:00
Gleb Natapov	9e2d5acf8e	Remove unused field from openib fragment structure. This commit was SVN r16549.	2007-10-23 07:38:29 +00:00
George Bosilca	95c9fbdf45	Make sure the MX MTL component is shared between all files. This commit was SVN r16545.	2007-10-22 18:06:52 +00:00
Gleb Natapov	63dde87076	If SM BTL cannot send fragment because the cyclic buffer is full put the fragment on the pending list and send it later instead of spinning on opal_progress(). This commit was SVN r16537.	2007-10-22 12:07:22 +00:00
Rich Graham	0de9bd9fa0	when attaching an md for posted receive, generate a start event, so that PtlMDUpdate will pick up all incoming events. This commit was SVN r16517.	2007-10-19 19:09:40 +00:00
Gleb Natapov	52c6160252	MCA_PML_BASE_REQUEST_MPI_COMPLETE() macro does nothing except call to ompi_request_complete(). Remove the macro and call the function directly. This commit was SVN r16498.	2007-10-18 14:20:24 +00:00
George Bosilca	aa20a94b6f	Remove warning about an unused variable. This commit was SVN r16497.	2007-10-18 13:48:56 +00:00
Gleb Natapov	4f865e22e8	We have two different versions of ompi_request_complete. One as a function another as a macro. Make it one inline function. This commit was SVN r16495.	2007-10-18 13:02:27 +00:00
Gleb Natapov	e0a3a7e53e	Move duplicated code all over the code to a single function ompi_request_wait_completion(). This commit was SVN r16494.	2007-10-18 12:33:21 +00:00
Gleb Natapov	807f49ed7f	If there is more than one BTL present we may divide payload between them in such a way that converter will not be able to pack some of it. This commit adds handling of such cases. If converter can't pack any data for a BTL the data is sent over another BTL that has data to send. This commit was SVN r16493.	2007-10-18 12:07:37 +00:00
Jeff Squyres	b7eeae0a74	Remove the mvapi BTL. Woo hoo! This commit was SVN r16483.	2007-10-17 14:08:03 +00:00
Jeff Squyres	94b1e9cff9	Update to use BTL_VERBOSE and BTL_ERROR instead of opal_output'ing to the mca_btl_base_output stream directly (and relying on it to be -1 if we didn't want any output). This commit was SVN r16449.	2007-10-15 17:53:02 +00:00
Rolf vandeVaart	3dd5196338	Remove the --mca btl_base_debug flag and clean up the use of the --mca btl_base_verbose flag. The btl framework now matches all the other frameworks. Slightly modify error messages for clarity. This commit was SVN r16443.	2007-10-15 13:10:20 +00:00
Gleb Natapov	1330974e5e	eager_limit is no longer needed in OB1 PML. Remove it. This commit was SVN r16442.	2007-10-15 09:26:42 +00:00
George Bosilca	436b0f2a5b	Way too many numbers in this uint32_t. This commit was SVN r16437.	2007-10-12 13:11:55 +00:00
Jeff Squyres	3500376d9e	Remove a warning about an unused label. This commit was SVN r16429.	2007-10-11 16:38:37 +00:00
George Bosilca	e3105a85be	Don't require a progress function from the PML. If there is one then the PML base will take care of the registration with the event library. Otherwise (and this applies to the CM case) the MTLs are in charge of registering their own progress function. This commit was SVN r16415.	2007-10-09 23:28:53 +00:00
Galen Shipman	6a25a635de	that shouldn't have slipped through.. This commit was SVN r16411.	2007-10-09 19:07:23 +00:00
Galen Shipman	6b051e255e	already checked size.. no need to do it again.. This commit was SVN r16409.	2007-10-09 18:59:10 +00:00
Nysal Jan	b51d85fb3f	Fix assertion failure "assert( 0 == btl_endpoint->endpoint_cache_length )" while executing mt_coll testcase. This commit was SVN r16408.	2007-10-09 18:00:01 +00:00
Galen Shipman	62ade993ca	Separate finalize and close for the PML, this gives the PML a chance to complete any outstanding operations prior to close. Before this change we just called pml_finalize in pml_close which causes problems if there are outstanding events that a BTL/MTL needs to progress during finalize. The problem is that MPI_COMM_WORLD and others were destroyed prior to closing the PML, pml_close would call pml_finalize, events would progress in the BTL, and these events expected MPI_COMM_WORLD to still be around.. This commit was SVN r16405.	2007-10-09 15:28:56 +00:00
Andrew Friedley	c15047b264	Add LLNL copyright to the file i modified yesterday This commit was SVN r16404.	2007-10-09 15:18:23 +00:00
Andrew Friedley	fd51d9cf28	The call to opal_list_insert() had an off by one error (I think), causing selected components to get lost with certain load orderings. I went ahead and rewrote the code to use opal_list_insert_pos() instead, which gives a cleaner flow and more speed. This commit was SVN r16392.	2007-10-08 23:01:36 +00:00
Josh Hursey	7437f37e96	This commit contains the following: * Fix some missing includes in a few places. * Add the cr_request() functionality to the BLCR CRS component. We are now dependent upon the 0.6.* series of BLCR. * Made the CR notification mechanism a registered function. This way we can have an OPAL-only version and it can be replaced at runtime with the ORTE version. * Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only CR functionality when the user wants it. Default: Disabled. * Fix the placement of a checkpoint request check in MPI_Init * Pull the OPAL notification mechanism into the SnapC framework. * We no longer fork/exec the 'opal-checkpoint' command for local checkpointing, the Local coordinator in the orted does this directly. * The Local and Application coordinator talk together bypassing the OPAL notification mechanism. * Optimized the Local <-> App Coordinator communication. * Improved the structure used to track vpid_snapshots in the local coord. * Fix a race condition in which an application under heavy communication load may produce an inconsistent global checkpoint. This commit was SVN r16389.	2007-10-08 20:53:02 +00:00
Jeff Squyres	f92d9097d8	Some more changes to update to coll v1.1.0 that were missed yesterday. This actually exposed a very, very long-standing bug where part of the coll base was incorrectly checking the coll API version against the MCA API version. When coll went to v1.1 (yesterday) and was no longer the same as the MCA v1.0, the test started failing. This commit fixes to check for v1.1 everywhere in the coll base, and to ensure to check coll framework/API version numbers against coll framework/API version numbers (vs. against the MCA API version number). This commit was SVN r16373.	2007-10-07 12:20:22 +00:00
Jeff Squyres	3d34bff596	No technical/functional changes: simply change the name of the "data" parameter to "module" everywhere, just to be a little more clear what the purpose of that parameter is. This commit was SVN r16372.	2007-10-07 08:36:45 +00:00
Jeff Squyres	fc2b4376e9	Update forgotten macro. This commit was SVN r16368.	2007-10-06 14:11:35 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. 
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. 
Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages is set, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. 
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Jelena Pjesivac-Grbovic	ada43fef9e	This fixes bug #1157 in coll/self module. All vector functions had incorrect handling of the offset. This commit was SVN r16360.	2007-10-05 17:40:16 +00:00
Jeff Squyres	f92154fc72	Gah -- ompi_info doesn't setup the connect pseudo component, so it'll be NULL. Ensure to protect for this. This commit was SVN r16333.	2007-10-04 18:03:56 +00:00
Jeff Squyres	13fa7ae93e	It's not necessary to link against all 3 libs (in fact, we shouldn't do it -- let libtool pull them in via the .la file if it needs to) This commit was SVN r16332.	2007-10-04 18:01:30 +00:00
Jeff Squyres	80ce974291	Fixes trac:1156: ensure to finalize the "connect" sub-component. This commit was SVN r16330. The following Trac tickets were found above: Ticket 1156 --> https://svn.open-mpi.org/trac/ompi/ticket/1156	2007-10-04 17:36:12 +00:00
Andrew Friedley	2e66590993	Fix mistakes in the basic component: can't call collectives on the communicator and always pass the basic module; have to give them the module off the communicator. This commit was SVN r16329.	2007-10-04 16:29:24 +00:00
Andrew Friedley	5be7f5e2dc	fixes trac:1154 Check if an exclusion string (i.e. '-mca btl ^sm') was provided; if so OFUD just disables itself. This commit was SVN r16307. The following Trac tickets were found above: Ticket 1154 --> https://svn.open-mpi.org/trac/ompi/ticket/1154	2007-10-02 20:37:16 +00:00
Gleb Natapov	60af46d541	We have QP description in component structure, module structure and endpoint. Each one of them has a field to store QP type, but this is redundant. Store qp type only in one structure (the component one). This commit was SVN r16272.	2007-09-30 16:14:17 +00:00
Gleb Natapov	9c04b127f5	Forgot to put this fix in the previous commit. This commit was SVN r16271.	2007-09-30 15:33:20 +00:00
Gleb Natapov	3a15d645be	Remove lcl_qp_attr from endpoint qp description. It is used during init only. This commit was SVN r16270.	2007-09-30 15:29:35 +00:00
Aurelien Bouteiller	670956e172	Another cast mistake. This commit was SVN r16247.	2007-09-26 21:14:35 +00:00
Aurelien Bouteiller	f7d7d58fb6	Various cast type errors on 64bit architectures This commit was SVN r16246.	2007-09-26 20:54:18 +00:00
Brian Barrett	56e26ed390	Need to install the mpool_rdma.h so that we can build external BTLs that use the RDMA protocol This commit was SVN r16237.	2007-09-26 16:58:54 +00:00
Gleb Natapov	c7105eadc7	Update Voltaire copyright. This commit was SVN r16189.	2007-09-24 10:11:52 +00:00
Aurelien Bouteiller	0df0087f17	Investigating improvement of cache line management on shared memory This commit was SVN r16183.	2007-09-21 20:02:56 +00:00
Josh Hursey	1fe1276fd5	Make sure to match on the communicator ID as well. This commit was SVN r16179.	2007-09-21 18:16:02 +00:00
Josh Hursey	3e51d7bb25	Implement the MPI_Iprobe and MPI_Probe wrappers. Remove some old, unused code. This commit was SVN r16178.	2007-09-21 16:28:46 +00:00
Aurelien Bouteiller	d3b376a340	This patch adds actual non-blocking sender-based message logging. This improves bandwidth. Still need to work on malloc/mmap storage to reach optimal bandwidth. This commit was SVN r16172.	2007-09-21 03:24:08 +00:00
Aurelien Bouteiller	bc318b35e2	There is room in convertor to copy the packed data. It works; just need to add the correct memcopy. It does not manage the short messages, but I have already thought of a workaround for this (and it might even be better regarding latency). This commit was SVN r16169.	2007-09-20 21:57:21 +00:00
Aurelien Bouteiller	bbac6e650a	New improved version of sender-based. Still under development, but a new framework for expressing various methods has been added. This commit was SVN r16159.	2007-09-19 03:42:56 +00:00
Gleb Natapov	097b17d30e	Prevent a receive request from being freed while another thread holds a reference to it or there is an outstanding completion for the request. This commit was SVN r16153.	2007-09-18 16:18:47 +00:00
Jeff Squyres	33955a0ed0	Oops -- when converted from uint to int, -1 (the default value, meaning "infinite") is no longer larger than the minimum required size. So put in an appropriate test to ensure that "infinite" was not requested. This commit was SVN r16142.	2007-09-17 19:28:21 +00:00
Jeff Squyres	130a272cec	Fix some compiler warnings about signed/unsigned comparisons. This commit was SVN r16139.	2007-09-17 13:08:45 +00:00
Josh Hursey	d2ef0d445a	Add some basic timing hooks so I can extract a few more detailed performance numbers for tuning. Switch the bookmark_recv to be non-blocking. If this is blocking then for process counts >= 32 slight process delays were causing cascading performance delays in the protocol. This led to checkpoints taking either about 3 sec or 45 sec (or more) for 64 procs due to the cascading delays. With the nonblocking receive version this is no longer the case; we get the speedup we expect for this part of the protocol. More tuning to come. This commit was SVN r16137.	2007-09-16 15:13:23 +00:00
Jeff Squyres	6004e177e0	Fixes trac:1133: if you specify a max freelist size that is too small, you'll get a helpful error message and the openib BTL will deactivate itself. This commit was SVN r16133. The following Trac tickets were found above: Ticket 1133 --> https://svn.open-mpi.org/trac/ompi/ticket/1133	2007-09-14 21:42:56 +00:00
George Bosilca	617ff3a413	Add a MCA parameter for the ELAN MAP ID file. Fix small memory bugs, and track the final segfault. Still some work to do. This commit was SVN r16117.	2007-09-12 21:25:35 +00:00
Aurelien Bouteiller	a1f5312afb	Fixed two little warnings This commit was SVN r16116.	2007-09-12 21:07:11 +00:00
Aurelien Bouteiller	ccb3f75e8f	Make sure that the pml v parasite never gets loaded when the user did not request FT. This does not break the ability to switch protocol on the fly. This commit was SVN r16114.	2007-09-12 20:47:17 +00:00
George Bosilca	1e7a791349	Remove some of the problems identified by Coverity. This commit was SVN r16112.	2007-09-12 20:13:26 +00:00
Aurelien Bouteiller	828af95be8	Major modification of the vprotocol framework build system. With a better integration in autogen.sh, it allows for generating static-components.h the usual way. NOTE: This build system does not work with the current autogen.sh. Modified one is under heavy testing to make sure it does not have side effects This commit was SVN r16110.	2007-09-12 18:46:37 +00:00
George Bosilca	05ae27c68b	Don't segfault if we receive a fragment for a non-existing communicator. Instead, drop it for now. This commit was SVN r16105.	2007-09-12 17:52:02 +00:00
George Bosilca	c755938eb0	Coverity: release the temporary buffer on error. This commit was SVN r16104.	2007-09-12 17:45:12 +00:00
Shiqing Fan	a0660f4deb	- Just some type casts. This commit was SVN r16100.	2007-09-12 15:29:58 +00:00
Gleb Natapov	07c8fddeef	Fix scheduling of pending send request. It should be scheduled req_lock times. This commit was SVN r16096.	2007-09-12 07:08:38 +00:00
George Bosilca	d8fed2cfa1	Set a default value so that some compilers stop complaining about uninitialized values. This commit was SVN r16094.	2007-09-11 18:00:53 +00:00
Gleb Natapov	b0614931f4	Remove mpool_tree_item from the mpool_tree before unregistering/freeing memory. Otherwise a race exists if another thread allocates already freed memory which is not removed from the mpool_tree yet. This commit was SVN r16038.	2007-09-03 10:56:55 +00:00
Rainer Keller	a3b30749b0	- Only lock/unlock when using threads. Basically revert this part of r16015. This commit was SVN r16029. The following SVN revision numbers were found above: r16015 --> open-mpi/ompi@435e7d80e9	2007-08-31 12:34:48 +00:00
Rainer Keller	9c1c345c07	- head_lock is an opal_atomic_lock_t... This commit was SVN r16028.	2007-08-31 12:20:21 +00:00
Shiqing Fan	efdcfa3807	- "extern 'C'" has been set twice. Remove one. This commit was SVN r16022.	2007-08-30 15:03:59 +00:00
Shiqing Fan	80fdd5e2a4	- Need to be exported. This commit was SVN r16021.	2007-08-30 14:16:03 +00:00
Gleb Natapov	79011279e5	Remove debug output. This commit was SVN r16016.	2007-08-30 13:29:41 +00:00
Gleb Natapov	435e7d80e9	Remove rc parameter from MCA_BTL_SM_FIFO_WRITE() macro. It cannot fail in current implementation. This commit was SVN r16015.	2007-08-30 13:21:52 +00:00
Gleb Natapov	690fb95bda	Cleanup send scheduling code. This commit was SVN r16014.	2007-08-30 12:10:04 +00:00
Gleb Natapov	0b0f9d14aa	Mark send request complete on PML level only when absolutely sure there is no more work associated with this request. No more outstanding completions or packets and send scheduling isn't running in another thread. This commit was SVN r16013.	2007-08-30 12:08:33 +00:00
Gleb Natapov	fe414047bd	registration may be freed inside mca_mpool_rdma_deregister(). This commit was SVN r16012.	2007-08-30 10:52:38 +00:00
Gleb Natapov	091862a25a	Protect access to mca_mpool_base_tree by a lock. This commit was SVN r16011.	2007-08-30 10:51:02 +00:00
Gleb Natapov	eac2674f66	The inner voice tells me this is a typo. This commit was SVN r16004.	2007-08-29 13:28:47 +00:00
Jeff Squyres	466394a878	We only care about the value of ret in the !OMPI_ENABLE_PROGRESS_THREADS case. Reviewed by Brian. This commit was SVN r16000.	2007-08-29 01:36:17 +00:00
Jeff Squyres	c4a38f47f6	Resolve Coverity CID 467: remove unused variable / dead code. This commit was SVN r15997.	2007-08-29 01:23:18 +00:00

2311 Commits