openmpi

Author	SHA1	Message	Date
George Bosilca	6265625983	Generate the XFER_CONTINUE PERUSE event (for the receive) before unpacking the data. This commit was SVN r10663.	2006-07-05 19:45:00 +00:00
Brian Barrett	4ee4acb6a6	* ignore some Cray-only code when not on the Cray machine This commit was SVN r10660.	2006-07-05 17:16:27 +00:00
Brian Barrett	043153dad3	* fix opal_list_item_t -> ompi_free_list_item_t type change This commit was SVN r10659.	2006-07-05 17:02:16 +00:00
George Bosilca	d2bf3844e9	Include the header file that defines opal_output. This commit was SVN r10648.	2006-07-04 06:23:01 +00:00
George Bosilca	2bdb06b549	Force the request to NULL in order to avoid complaints from the compiler. This commit was SVN r10647.	2006-07-04 06:20:13 +00:00
George Bosilca	402a03d229	Add a .h dependency in order to remove a warning when we compile without --enable-debug. This commit was SVN r10646.	2006-07-04 04:53:38 +00:00
George Bosilca	9ac1a6cdb3	Remove the warnings. Now they are ompi_free_list_item not opal_list_item_t. This commit was SVN r10645.	2006-07-04 04:21:16 +00:00
Brian Barrett	7d12f9119a	* make sure to include post_configure.sh in the dist tarball, so that directly calling the ob1 pml works properly. This commit was SVN r10644.	2006-07-04 04:03:58 +00:00
Brian Barrett	47725c9b02	* Add new PML (CM) and network drivers (MTL) for high speed interconnects that provide matching logic in the library. Currently includes support for MX and some support for Portals * Fix overuse of proc_pml pointer on the ompi_proc structure, splitting into proc_pml for pml data and proc_bml for the BML endpoint data * bug fixes in bsend init code, which wasn't being used by the OB1 or DR PMLs... This commit was SVN r10642.	2006-07-04 01:20:20 +00:00
Graham Fagg	f10c21b746	corrected mca param description and algorithm count (now to find out why I have disallowed direct calling of the bm tree) This commit was SVN r10603.	2006-06-30 23:22:49 +00:00
Graham Fagg	f64cbbe8f2	oops. some decisions used extent rather than size for decision making; yes, this means it WAS possible for two nodes to choose two different algorithms (discovered by Doug Gregor and figured out by George). Also changed some names like size to comsize so we know which sizes we are using where. This should be updated in all versions. This commit was SVN r10601.	2006-06-30 21:49:04 +00:00
Brian Barrett	df9273587f	* romio_cb_write should also be forced to enable when optimizations are requested This commit was SVN r10584.	2006-06-30 15:06:10 +00:00
Galen Shipman	7e079d20ab	fix for stupid casting.. addresses issue on PPC64 where sizes get set improperly and badness ensues.. This commit was SVN r10574.	2006-06-29 21:58:50 +00:00
George Bosilca	7d59a6885b	Remove all references to the MRU list. Add back the repost list checks. For some reason it decreases the latency by around 0.3 microseconds ... This commit was SVN r10571.	2006-06-29 19:25:44 +00:00
George Bosilca	78f0de127d	Typo. This commit was SVN r10567.	2006-06-29 15:16:25 +00:00
George Bosilca	4df58b5579	Latency is LATENCY as everybody understands it, not some percentage of something. Now we really order the BTLs by their real latency for the eager protocol. From now on, the latency one can specify for the devices is in microseconds, while the bandwidth is in Mbs (as it was before). This commit was SVN r10566.	2006-06-29 15:13:58 +00:00
George Bosilca	238147f576	Help the compiler to optimize the code. Now the order in the enum reflects the order we use them in the switch. This commit was SVN r10565.	2006-06-29 15:10:58 +00:00
George Bosilca	9bf281bca2	Remove the gm_mru_reg list as it is never used. Clean up the repost logic. Now we repost a receive fragment only when we're done with the message from inside and we try to add it to the list. This commit was SVN r10564.	2006-06-29 15:10:11 +00:00
George Bosilca	43b7b17033	Release the memory registration when the descriptors get freed. This commit was SVN r10540.	2006-06-28 15:24:16 +00:00
George Bosilca	d9daa34a6c	Set the registration field to NULL when we create a new fragment. This commit was SVN r10539.	2006-06-28 15:23:36 +00:00
Gleb Natapov	c8f75c472a	remove modulo op from fast path. Improvement 0.02-0.04ms. This commit was SVN r10538.	2006-06-28 12:00:47 +00:00
Gleb Natapov	e58a89ef3e	OMPI_ENABLE_DEBUG is always defined (to 0 or 1). Use #if and not #ifdef. This commit was SVN r10537.	2006-06-28 11:25:09 +00:00
Gleb Natapov	704a5eb645	Support for LMC (lid mask count) and multiple QPs per port. This commit was SVN r10536.	2006-06-28 07:23:08 +00:00
Galen Shipman	e6cd8db0e5	DR will now checksum on a per btl basis (see MCA_BTL_FLAGS_NEED_CSUM). We still always send ACK's, teasing apart completion for ACK/no ACK looks like a pain in the .. This commit was SVN r10530.	2006-06-27 20:23:47 +00:00
Brian Barrett	0031e39d72	* fix for dumb memory bug introduced in romio performance fixup code This commit was SVN r10528.	2006-06-27 19:58:18 +00:00
Brian Barrett	9a65a7ca97	* re-add -Is necessary for VPATH builds. This commit was SVN r10524.	2006-06-27 14:10:34 +00:00
Jeff Squyres	df45221a3e	Until a real fix for #142 is found, this workaround prohibits using mpi_leave_pinned when multiple OpenIB HCA ports are found. Specifically, if mpi_leave_pinned == 1 and multiple HCA ports are found, the MCA parameter btl_openib_max_btls is set to 1. If the MCA parameter btl_openib_warn_leave_pinned_multi_port is true, emit a warning that this happened (having an MCA parameter to control the warning allows users/sysadmins to turn it off instead of being nagged for every run). This commit was SVN r10521.	2006-06-27 10:43:03 +00:00
Gleb Natapov	52208d7bf9	We don't need to register zero-sized frags. This commit was SVN r10519.	2006-06-27 08:50:12 +00:00
Galen Shipman	8855e5b73a	Fixes for DR as well as better diagnostics.. Successfully passing the intel test suite with/without induced errors/drops. This commit was SVN r10518.	2006-06-26 22:29:29 +00:00
Brian Barrett	970d858f30	* Add performance code requested by LANL, per ticket #128 . Must be explicitly enabled at run-time with the mca parameter io_romio_enable_parallel_optimizations set to something non-zero. This will enable some magic flags in Panasas if the user didn't set them (either on or off) and do some slightly better things with strided collective writes. This commit was SVN r10516.	2006-06-26 22:26:36 +00:00
George Bosilca	940dbff0fa	Add a new PERUSE macro. This is for the CONTINUE event (the one we added to the standard). This macro allows us to specify the length of the fragment. Now we are able to know how the message is fragmented between the network devices or inside the communication protocol. This commit was SVN r10508.	2006-06-26 20:08:33 +00:00
George Bosilca	41c886399b	Don't let the user specify flags that do not make sense. If the PUT flag is specified, check that the put function is available for the BTL. Same safety check for the GET function. At the end, make sure that at least one communication protocol is specified; otherwise force the send flag. This commit was SVN r10507.	2006-06-26 20:00:18 +00:00
George Bosilca	c43b9821e7	Generate the PERUSE XFER_CONTINUE event. This commit was SVN r10501.	2006-06-26 19:01:22 +00:00
George Bosilca	53a5d3df0f	Remove useless lines. This commit was SVN r10500.	2006-06-26 19:00:37 +00:00
George Bosilca	a514cdc068	Always limit the size of the RDMA transfer to the maximum amount supported by the BTL (btl_max_rdma_size). Now the PUT protocol is pipelined even if there is just one network between the 2 peers. Unfortunately, this problem is present in 1.1 (no pipeline for the PUT protocol). This commit was SVN r10499.	2006-06-26 19:00:07 +00:00
George Bosilca	8cd4718198	Generate the PERUSE_COMM_REQ_XFER_BEGIN event only when there is some data to transfer. This commit was SVN r10498.	2006-06-26 18:57:55 +00:00
Gleb Natapov	b7715395cb	Return descriptor before sending credits one more time. We may need it. This commit was SVN r10495.	2006-06-26 07:05:58 +00:00
Andrew Friedley	7bfac82ce7	Change over from lazy connection setup to setting up at initialization time. UD is connectionless, and as long as peers are statically assigned to QPs, there is no reason to set up the addressing information lazily. Lots of code was axed, as endpoints no longer have state. Removed a number of other elements in the endpoint struct to make it as lightweight as possible. I was able to remove an entire function call/branch in the send path, which I believe is the main contributor to a 2us drop in NetPIPE latency. Some whitespace cleanups as well. Passes IBM test suite, and all but certain intel tests that were failing before the change, over ob1 PML. This commit was SVN r10494.	2006-06-23 16:50:50 +00:00
Andrew Friedley	046f4cd4ae	Enough cleanup for now. Moved a lot of the module-specific init from the component init to the module init. Try keeping a pointer to reduce indexing, didn't seem to help - leaving in place for now. This commit was SVN r10485.	2006-06-22 22:12:13 +00:00
Andrew Friedley	8392ed4cac	A checkpoint before I really do some cleanup.. nothing pretty here. Playing around with OPAL_LIKELY/UNLIKELY, no real gains yet. Reworked progress() to process many WC's at a time, as well as immediately repost groups of receive buffers. This commit was SVN r10481.	2006-06-22 18:06:55 +00:00
George Bosilca	31365fa799	Use the RDMA limit not the eager one when we schedule a receive (for the PUT protocol). This commit was SVN r10456.	2006-06-21 15:51:56 +00:00
Andrew Friedley	365c81d6e9	Fix a few issues reported by Terry Dontje: 1. ompi/mca/btl/udapl/btl_udapl_proc.c should be including btl_udapl_endpoint.h for mca_btl_udapl_proc_insert function. 2. btl_udapl_endpoint.c it looks like you are using &endpoint->endpoint_lock when you should use &ep->endpoint_lock in a OPAL_THREAD_LOCK call. 3. btl_udapl_frag.h has a couple opal_list_item_t's that should be ompi_free_list_item_t in the _FRAG_ALLOC_{EAGER,MAX} macros. This commit was SVN r10442.	2006-06-20 17:13:44 +00:00
George Bosilca	ec28040c58	Remove all useless assignments (now they are done inside the macro). Protect one call to the _UNPACK macro, in the case where the length of the received data is zero. This might happen on the PUT protocol. This commit was SVN r10431.	2006-06-20 14:16:52 +00:00
George Bosilca	f38480f1d1	Set the recv_bytes value in all the cases. Somehow the PERUSE macro contained an error, so now it should be back again. This commit was SVN r10430.	2006-06-20 14:14:04 +00:00
George Bosilca	dee2a7a08d	On this branch the rdma_offset should be set. The send_offset is anyway already set in the _START macro. This commit was SVN r10429.	2006-06-20 14:12:32 +00:00
George Bosilca	044868df45	Set the destination descriptor before calling the recv registration. Once this call is completed, we have to remove it in order to be able to clean up the fragments correctly. This commit was SVN r10428.	2006-06-20 14:11:09 +00:00
George Bosilca	1b18b7d934	Change the parameter registration of this BTL to the new calls (new is relative here). Change the self BTL to use RDMA protocol. This commit was SVN r10427.	2006-06-20 14:09:58 +00:00
Jeff Squyres	1d27ca5d0a	Until a real fix for #142 is found, this workaround prohibits using mpi_leave_pinned when multiple OpenIB HCA ports are found. Specifically, if mpi_leave_pinned == 1 and multiple HCA ports are found, the MCA parameter btl_openib_max_btls is set to 1. If the MCA parameter btl_openib_warn_leave_pinned_multi_port is true, emit a warning that this happened (having an MCA parameter to control the warning allows users/sysadmins to turn it off instead of being nagged for every run). This commit was SVN r10424.	2006-06-20 11:32:46 +00:00
Jeff Squyres	600bf4295a	Update the help message to be slightly more concise and clear This commit was SVN r10422.	2006-06-20 11:23:38 +00:00
Brian Barrett	3d027e57a8	* fix for ticket #141 . If we are going to shortcut out of polling the send/receive queues if there is something available in the short message rdma queues, then we have to poll ALL the rdma queues before exiting, or we aren't fair about frag reception and fall into degenerate matching cases. This commit was SVN r10410.	2006-06-17 21:32:25 +00:00
Brian Barrett	5cadbbbf41	Fix for bug #140 . If we're leaving things pinned, certain assumptions about where to look for registrations that were used in the alloc/free code don't work (because the memory returned from malloc() -- whoever gets around to calling it) might actually be registered already. So just call malloc and free directly and avoid the whole issue when leave pinned is on. After all, you have to pay the registration cost sometime, and if leave pinned is on, you only have to pay it once. It makes things much simpler to have that once be at first use rather than during ALLOC_MEM, and as far as I can read, we're still standards conformant this way. This commit was SVN r10406.	2006-06-17 18:34:41 +00:00
Brian Barrett	c9e8dbc10e	* fix for multi-nic case with put protocol -- index will be 1 for the first put request if we have more than one nic This commit was SVN r10397.	2006-06-16 22:25:04 +00:00
George Bosilca	27000ef7d6	More compact and readable code. Otherwise, no big difference with the previous version. This commit was SVN r10389.	2006-06-16 03:07:42 +00:00
George Bosilca	3f96f39e46	If the goal of this code was to copy the iovec and skip the first offset bytes then it was not correct. This commit was SVN r10388.	2006-06-16 03:06:30 +00:00
George Bosilca	93afe59226	It is not required to initialize the csum. This commit was SVN r10387.	2006-06-16 03:05:20 +00:00
George Bosilca	1f96768b76	For zero length persistent request do not reposition the convertor as it is not initialized. This commit was SVN r10386.	2006-06-16 03:04:41 +00:00
Brian Barrett	05046e8ad2	if MX isn't running on some hosts, but is on others, we were blocking in the modex receive waiting for the non-running procs to publish their contact information. Publish their (lack of) contact information. This commit was SVN r10355.	2006-06-14 19:07:38 +00:00
George Bosilca	aca71521db	Complete the move of the mpool registration from opal_list_item_t to the ompi_free_list_item_t. This commit was SVN r10354.	2006-06-14 17:43:50 +00:00
Galen Shipman	5d71c149c2	Another fix for PML request completion when local network completion can occur out of order.. Reviewed by Brian.. needs to hit 1.1 This commit was SVN r10353.	2006-06-14 16:55:35 +00:00
Brian Barrett	d367dc5d56	* Fix for bug #115 -- we need to decrement the use count on a pinned buffer so that memory is actually deregistered. Reviewed by Galen. This commit was SVN r10349.	2006-06-14 13:38:24 +00:00
George Bosilca	3727fa2ae6	Nothing relevant. I added some more output in the case of a checksum error, just to get more information about the failure. This commit was SVN r10337.	2006-06-13 19:36:38 +00:00
Galen Shipman	0eddad6849	Handle out of order completion/receives when marking completion... this is a fix for #107... needs to go to the 1.1 branch.. This commit was SVN r10331.	2006-06-13 16:57:41 +00:00
Andrew Friedley	c68c6ac122	A number of fixes and the usual cleanup.. - Added some basic flow control to limit number of posted sends. - Merged endpoint send/recv lock into single endpoint lock. - Set the LMR triplet length in the send path, not at allocation time. This has to be done because upper layers might send less than the amount allocated. - Alter the tie-breaker if statement protecting the second call to dat_ep_connect(). The logic was reversed compared to the tie-breaker for the first dat_ep_connect(), making it possible for 3 or more processes to form a deadlock loop. - Some asserts were added for debugging purposes.. leaving them in place for now. This commit was SVN r10317.	2006-06-12 22:42:01 +00:00
Galen Shipman	218a438509	finished the ompi_free_list_t class nightmare.. This commit was SVN r10314.	2006-06-12 22:09:03 +00:00
Galen Shipman	18dda70fd0	make ompi_free_list_item_t a class.. This will go to the 1.1 branch but will probably require a few changes as ompi_free_list_t is different in the branch.. This commit was SVN r10306.	2006-06-12 16:44:00 +00:00
Brian Barrett	d3257f22d8	* back out Galen's r10300 because it breaks the build. Real fix coming RSN. This commit was SVN r10303. The following SVN revision numbers were found above: r10300 --> open-mpi/ompi@b0f3745791	2006-06-12 14:38:14 +00:00
Gleb Natapov	48d348b577	Don't complete send request before we've got completion on the first rndv packet. Sender can receive and complete PUT request before it gets completion on the first rndv packet. sendreq struct may be reused for the next MPI_Send and unexpected completion messes things up. I sometimes got SEGV and sometimes data corruption. This commit was SVN r10301.	2006-06-12 14:00:43 +00:00
Galen Shipman	b0f3745791	declare these as ompi_free_list_item_t's This needs to go to 1.1 This commit was SVN r10300.	2006-06-12 13:26:15 +00:00
George Bosilca	7d1feffbf7	The real solution. If the sendreq->req_send.req_bytes_packed is zero then there is no data to be transferred. And this is the condition which leads to a non-initialized convertor. This commit was SVN r10299.	2006-06-12 06:18:18 +00:00
George Bosilca	c959c2f214	Don't reset the convertor's position if it wasn't initialized before. This can only happen for zero byte persistent requests. This commit was SVN r10298.	2006-06-12 06:14:35 +00:00
Galen Shipman	9d73217637	These list items are free list items, and should inherit properly.. This commit was SVN r10295.	2006-06-11 20:19:12 +00:00
Brian Barrett	d5acb4e3cc	* silence dumb (and mostly useless) warning during cleanup This commit was SVN r10280.	2006-06-09 21:09:53 +00:00
Brian Barrett	cc99a63169	* fix issue with PANFS not building properly - we didn't add PANFS_LIB to the list of libraries This commit was SVN r10279.	2006-06-09 20:41:12 +00:00
Jeff Squyres	a4030ad2d9	Improve the tremendously unhelpful MCA help message for the btl_openib_ib_mtu and btl_mvapi_ib_mtu MCA params by showing the valid values and what they represent (got a question about this from Cisco testing engineers). This commit was SVN r10277.	2006-06-09 18:02:45 +00:00
Andrew Friedley	9a92394bfd	Mostly cleanups - preprocessor fixes and removal of OPAL_OUTPUTs. Also updated to match recent mpool_free changes. This commit was SVN r10273.	2006-06-09 00:18:29 +00:00
Andrew Friedley	75176370ae	blah. somehow missed adding .ompi_ignore/.ompi_unignore. This commit was SVN r10272.	2006-06-09 00:15:36 +00:00
Andrew Friedley	cca1616368	Finally committing the UD BTL. UD is the Unreliable Datagram transport for Infiniband, specifically OpenIB. This BTL is derived from the existing openib BTL, which is RC (Reliable Connection) based. Still a work in progress, as there is a lot of work left to do. Specifically, performance, scalability, and flow control need to be addressed. Currently I'm playing around with different methods for handling receive buffers, as well as profiling to figure out where the time is going. This commit was SVN r10271.	2006-06-09 00:13:45 +00:00
Galen Shipman	08823e56fa	check address before looking for the item in the tree corresponding to the address.. All have been reviewed by brian.. putting in a changeset request.. This commit was SVN r10256.	2006-06-08 16:27:59 +00:00
Galen Shipman	636ef0cf6c	don't put back null items on the list.. This commit was SVN r10253.	2006-06-08 14:46:41 +00:00
Galen Shipman	429056078a	fix numerous late night errors.. 1) don't need tree if memory is just malloc'd 2) fix memory and free list leak.. 3) deregister first and then free... doh.. This commit was SVN r10251.	2006-06-08 14:23:20 +00:00
Galen Shipman	5a2ceda93f	a couple of stupid late night mistakes... This commit was SVN r10250.	2006-06-08 13:39:41 +00:00
Galen Shipman	0bb8a6fca8	roll back to not use memalign This commit was SVN r10249.	2006-06-08 04:34:04 +00:00
Galen Shipman	b42b0bd1af	potential fix for ticket #81 Added a tree to track memory allocation from MPI_Alloc_mem, this allows us to free the registrations in a sane fashion.. also should be faster.. This commit was SVN r10248.	2006-06-08 04:29:27 +00:00
Sven Stork	c31e6f9767	use memalign instead of malloc + manual alignment in the mvapi mpool; revert commit 10243 This commit was SVN r10247.	2006-06-07 23:21:23 +00:00
Andrew Friedley	5ace292cc1	Should fix ticket #81 - which is specific to MVAPI, I've included the same fix for gm/openib as well. uDAPL has the same problem, will fix in separate commit so it doesn't go to branch. This commit was SVN r10243.	2006-06-07 15:52:48 +00:00
Galen Shipman	84479d0b5a	potential fix for iprobe test,, tested with openib.. will have andy try ud.. This commit was SVN r10232.	2006-06-06 22:10:41 +00:00
Galen Shipman	90799f82cd	copy paste error.. This commit was SVN r10220.	2006-06-06 02:38:29 +00:00
Galen Shipman	cc54b07aa0	add better error messages for vapi retry exceeded errors. This commit was SVN r10219.	2006-06-06 02:04:56 +00:00
Galen Shipman	9e6e7575b9	doh... add the file.. This commit was SVN r10210.	2006-06-05 21:24:42 +00:00
Galen Shipman	f05dee0435	add help file to explain why things went south.. This commit was SVN r10209.	2006-06-05 21:23:45 +00:00
Galen Shipman	74c97fb784	cleanup error reporting.. use ompi_proc_t->proc_name if available this gives us source/dest hostnames for communication errors.. This goes to 1.1 branch (reviewed by Brian).. This commit was SVN r10200.	2006-06-05 20:02:41 +00:00
Brian Barrett	c70fff6ed0	* Fix for bug #44 for the trunk -- remove a bunch of warnings from the DR PML when compiling on Solaris. Patch won't apply cleanly to the v1.1 branch, so a diff for that is coming up soon. This commit was SVN r10173.	2006-06-01 18:58:38 +00:00
Galen Shipman	83ff3201b5	don't use rank or nprocs in error messages when we don't have them.. This should hit 1.1 and 1.0 branches.. Reviewed by Brian This commit was SVN r10164.	2006-06-01 14:24:11 +00:00
Galen Shipman	0344ae4ac5	Fix to allow eager limit and max send size to be any size (within resource limitations). Instead of storing the ompi_free_list_t * in the fragment, we use the frag type enum, this tells us where the frag came from and where it should return.. This could also be done in mvapi but is not a high priority moving forward.. Review by Brian, needs to hit the trunk + 1.1 release.. This commit was SVN r10157.	2006-06-01 02:32:18 +00:00
Brian Barrett	5163f2b296	Fix for bug #36 . The MX, MVAPI, and OpenIB components don't have support for progress threads, so we shouldn't build them or try to use them when support for progress threads has been requested. The TCP, GM, SELF, and SM BTLs should have progress thread support, so they aren't disabled. The Portals BTL isn't compiled on platforms with threads, so it doesn't need to be updated. This commit was SVN r10156.	2006-06-01 01:30:16 +00:00
Galen Shipman	c79efc9efb	track which list a fragment came from, allows returning based on list, not on size. This commit was SVN r10142.	2006-05-31 14:24:32 +00:00
Brian Barrett	4904e34a52	set datarootdir, necessary for Autoconf-2.60 which will define some variables based upon this value (e.g., datadir, docdir). Submitted by: Ralf Wildenhues Reviewed by: Brian Barrett This commit was SVN r10133.	2006-05-31 03:43:55 +00:00
Brian Barrett	6026fc98f6	* Fix M4 quoting so that AC 2.60 won't complain Submitted by: Ralf Wildenhues Reviewed by: Brian Barrett This commit was SVN r10129.	2006-05-31 03:39:18 +00:00
Brian Barrett	c723d196c5	Rather than using fragment size to determine fragment type, use an enum. Do this rather than the my_list pointer because we need to do some things that are somewhat special because we pre-pin eager fragments but not send fragments. Also makes a couple ideas I have slightly easier to play around with. This commit was SVN r10127.	2006-05-31 03:34:32 +00:00
Galen Shipman	2667c52a5d	Track fragments by list, not by size.. -- reviewed by Brian, needs to hit all the branches.. This commit was SVN r10078.	2006-05-25 18:07:26 +00:00
Galen Shipman	38a0561d9b	Allow maximum send size to be less than the eager limit. Instead of figuring out which free list the fragment belongs to based on size we simply store a pointer to the list which it belongs in the fragment. This was reviewed by Brian and should hit all the branches. This commit was SVN r10072.	2006-05-25 16:57:14 +00:00
Andrew Friedley	fa9ec2afdf	Add my sandia username for convenience This commit was SVN r10071.	2006-05-25 15:49:11 +00:00
Andrew Friedley	8a3d0862ca	I can commit! happy dance Trying to remember what I did here.. eager/max messages should work now, no RDMA yet. A number of other fixes and cleanups. I do know of two problems: Bad stuff happens when flooded with send frags too quickly - the BTL doesn't handle flow control. Certain IBM tests turn up a length assertion in the datatype engine - needs more investigation. This commit was SVN r10070.	2006-05-25 15:47:59 +00:00
Gleb Natapov	f590d8a190	fix eager RDMA on PPC64. This commit was SVN r10059.	2006-05-25 11:05:12 +00:00
Jeff Squyres	dd44d36be0	Fix for ticket #25 . Ensure that in the threaded case where we have This commit was SVN r10043.	2006-05-24 16:15:07 +00:00
George Bosilca	95d0395578	I'm skeptical about the ability of the compiler to correctly optimize the loop local variables. This commit was SVN r10019.	2006-05-23 03:21:15 +00:00
George Bosilca	085cac552f	Don't let TCP create local connections, we have the self BTL for this purpose. This commit was SVN r10018.	2006-05-23 03:06:32 +00:00
George Bosilca	837221831a	Temporary solution for in-bound computation of the next BTL. This commit was SVN r10016.	2006-05-22 23:28:40 +00:00
George Bosilca	b8ef0cc749	Minor cleanups. This commit was SVN r10001.	2006-05-21 05:55:21 +00:00
George Bosilca	e43fbd0082	Remove all useless variables. Minor cleanups. This commit was SVN r10000.	2006-05-21 05:53:22 +00:00
Galen Shipman	9165882c07	fixes for failover... This commit was SVN r9998.	2006-05-20 02:39:05 +00:00
Gleb Natapov	1c1b87a9f1	init mutex before use. This commit was SVN r9963.	2006-05-18 09:35:11 +00:00
Jeff Squyres	15758d5f29	Fix AC_DEFINE to match what it's supposed to be defining This commit was SVN r9952.	2006-05-17 03:26:43 +00:00
Galen Shipman	deb2254c91	1. mpool_free changes to allow null registrations 2. fix for MPI_Free_mem, was calling deregister but never called mpool_free.. so we leaked memory. Still an open issue here though, if the memory is alloc'd and the mpool doesn't create and cache a registration, we will never find the mpool to free with. This commit was SVN r9944.	2006-05-16 22:04:31 +00:00
Jeff Squyres	7b59847765	Ensure that endpoint->endpoint_addr is not NULL before trying to dereference through it. It is legal for endpoint_addr to be NULL in the destructor because if btl_tcp_add_procs() -> btl_tcp_proc_insert() returns UNREACH, then endpoint_addr will be NULL and we'll OBJ_RELEASE it. This commit was SVN r9940.	2006-05-16 19:01:08 +00:00
Jeff Squyres	e24377a89c	Back out a pair of commits from George from last week because they apparently don't work properly: r9869, r9868 (sm btl alignment issues) This commit was SVN r9936. The following SVN revision numbers were found above: r9868 --> open-mpi/ompi@9b985c3216 r9869 --> open-mpi/ompi@adedf511fb	2006-05-16 16:48:43 +00:00
Sven Stork	da7ad0e8b8	- update function name inside debug statement This commit was SVN r9933.	2006-05-16 14:33:41 +00:00
Brian Barrett	dcc6b47fa2	* put rdma operations in the send event queue instead of receive because it's easier to do event accounting that way * greatly increase receive event and buffer sizes. We're still about half of what Cray defaults to, so I don't feel bad about the increases * Implement a pre-pinning optimization for eager fragments - will be pinned on first use and left pinned for the life of the fragment * Since we can't have two receive frag callbacks fired at the same time, don't have receive free list - just keep one receive fragment in the module. Saves a big free list and all that interaction. This commit was SVN r9915.	2006-05-14 04:23:26 +00:00
Brian Barrett	db03ca0cc0	rip out a bunch of code that didn't work and really sucked and was only there to try to get some numbers that I couldn't actually get. So back to the restart point. This commit was SVN r9914.	2006-05-14 00:59:40 +00:00
Brian Barrett	f2a6e63d82	Fix for the double iWrite problem Edgar found with ROMIO, plus some other things I found: - Locking should prevent it from happening (I think), but there was a race condition in the component progress -- a callback could be triggered that would free the request before it was off the outstanding requests list. - When pulling a request off the component free list, make sure to reinitialize the free_called state on the IO request. This was what was causing Edgar's failures - In the request cleanup code, pull the request out of the per-component free list before returning to the free list. This probably would cause asserts to fire, although it looks like I wrote the loops such that it would have been memory safe if the asserts didn't fire. Not really sure why I did that, but let's try it again... This should go to the v1.0 and v1.1 branches. This commit was SVN r9913.	2006-05-13 02:30:40 +00:00
Jeff Squyres	a6d52ceed1	Minor correction in use of mca param API; otherwise the param is not found. This commit was SVN r9903.	2006-05-11 22:12:29 +00:00
Andrew Friedley	4c3aa05c83	uDAPL expects memory for enumerating interface adapters in a really weird way - fix up to do things 'properly'. Add my sandia username to the unignore. This commit was SVN r9879.	2006-05-10 19:50:30 +00:00
George Bosilca	adedf511fb	Remove the printf that I unfortunately committed. This commit was SVN r9869.	2006-05-10 00:02:54 +00:00
George Bosilca	9b985c3216	Force the useful data to be aligned on a special boundary. It is 32 bits right now. Some testing on large NUMA machines should be done in order to make sure that we need to export this variable out to the MCA layer. This commit was SVN r9868.	2006-05-09 21:46:10 +00:00
George Bosilca	a386fccccc	Increase the default limits for the SM BTL. These new values allow better performance on all the clusters I was able to test. This commit was SVN r9867.	2006-05-09 21:44:24 +00:00
Brian Barrett	91086cf2a4	* we want to unlink match entries when we unlink memory descriptors, but I want to be lazy and not do it by hand, so set the match entries to PTL_UNLINK. This commit was SVN r9861.	2006-05-09 14:20:51 +00:00
Gleb Natapov	0c34d5c9e6	fix endpoint matching in on-demand connection establishment. This fix is in mvapi btl already. This commit was SVN r9855.	2006-05-09 12:12:52 +00:00
Brian Barrett	1d337831d0	Fixes for more issues found by Dries Kimpe: - We had a bad conditional choice, such that asking for pvfs2 would result in pvfs trying to build as well, which was going to fail. - We didn't try to link in the library for PVFS2's adio component. - We were clobbering romio_flags, so it was impossible to pass flags to romio (like the selection of filesystems) This commit was SVN r9854.	2006-05-09 09:30:09 +00:00
Galen Shipman	c992eeb1f3	don't need to decrement memory registered twice,, this is done in mru_delete.. This commit was SVN r9853.	2006-05-08 17:42:34 +00:00
Brian Barrett	7dddc6d54c	Define the constants needed by ROMIO to activate support code for DARRAY / SUBARRAY. This commit was SVN r9851.	2006-05-08 16:33:31 +00:00
Brian Barrett	462849d88c	Fix two issues reported by Dries Kimpe: - LDFLAGS set at the top level of Open MPI were not passed to the ROMIO configure script - If ROMIO was explicitly required (with --enable-io-romio) and not able to be built, abort OMPI's configure script. This needs to go to the v1.0 and v1.1 branches. This commit was SVN r9845.	2006-05-08 13:13:32 +00:00
Brian Barrett	8397a1d71f	still running into issues, but... - change MASK behavior for tags - we need the upper bit to be whether the tag is reserved or not. MPI_ANY_TAG should not pull off any reserved tag communication - some other random debugging output to try to get some idea what is spewing out of here. This commit was SVN r9844.	2006-05-08 09:23:09 +00:00
George Bosilca	e658557d52	Move the convertor creation out of the critical path. If we expect a message from a known peer (not MPI_ANY_SOURCE) then we can attach the remote proc and initialize the convertor as soon as we know the data-type, and the count (so basically in the _INIT macro). If it's not the case, then create them in the _MATCHED macro (as in the original version). Of course, before initializing the convertor we check that there will be some data in the message. This commit, plus the convertor improvements from few days ago, lower the latency for my test case environment (mvapi) by 0.1 microseconds. The convertor now is as slim as it can be, I don't think there is anything else to remove/improve. This commit was SVN r9843.	2006-05-07 21:03:12 +00:00
George Bosilca	a7542824ed	Generic length computation (moved from the endpoint.h). This commit was SVN r9842.	2006-05-07 20:54:44 +00:00
George Bosilca	569b88e093	The endpoint include is not required. This commit was SVN r9841.	2006-05-07 20:52:55 +00:00
George Bosilca	e63c1dc242	The last commit wasn't supposed to bring this function in. It's not yet ready for prime time... This commit was SVN r9840.	2006-05-07 20:51:43 +00:00
George Bosilca	33aa65f894	Remove useless include. This commit was SVN r9839.	2006-05-07 20:49:45 +00:00
Galen Shipman	a4c9db0c18	decrease the total bytes in the rcache when a registration is deleted from the cache. This commit was SVN r9837.	2006-05-07 01:15:33 +00:00
Rainer Keller	0f9b10ff8e	- Update test dup MPI_COMM_WORLD -- so that we may have additional Barriers for output. This commit was SVN r9831.	2006-05-05 07:42:33 +00:00
Rainer Keller	71d328c086	- Add the PERUSE_COMM_REQ_XFER_CONTINUE for recv. This commit was SVN r9820.	2006-05-04 19:31:33 +00:00
Tim Woodall	161e54e6c8	finalize/cleanup failed btl This commit was SVN r9819.	2006-05-04 18:48:45 +00:00
Tim Woodall	d8ff8010f3	track whether the vfrag is being retransmitted This commit was SVN r9817.	2006-05-04 17:30:58 +00:00
Tim Woodall	1b26caa95b	first cut at btl failover - seems to be working for simple test case This commit was SVN r9816.	2006-05-04 16:16:26 +00:00
Tim Woodall	350d5b1713	change hardcoded values into mca params This commit was SVN r9815.	2006-05-04 15:20:18 +00:00
Tim Woodall	fdd622544b	added optional copy routine to allow "derived" class of mca_bml_base_endpoint to copy state if an endpoint is updated (e.g. btl deleted/added) This commit was SVN r9814.	2006-05-04 15:19:12 +00:00
Brian Barrett	d101e91b97	* fix matching logic - since tag might be negative, need to mask the proper bits or the bit-wise or changes all the high bits, which is bad * push convertor creation to init to save a bit of time * make debugging use macros so that it can go bye-bye This commit was SVN r9810.	2006-05-04 13:48:32 +00:00
George Bosilca	bdecdc8d41	Cleanup the MX BTL. Remove all mpool related code as there will never be an MX mpool. This commit was SVN r9808.	2006-05-04 06:55:45 +00:00
George Bosilca	c5209aad93	The return value is random. Let's return something that makes sense. This commit was SVN r9805.	2006-05-03 18:17:00 +00:00
Brian Barrett	6db0f2a027	* couple of corrections to compile on Red Storm This commit was SVN r9801.	2006-05-03 13:13:59 +00:00
Brian Barrett	4add400f7d	* properly start with the memory descriptor inactive This commit was SVN r9787.	2006-05-01 20:23:38 +00:00
Brian Barrett	5f939c53be	* first take at send / receive for a portals pml (still really dumb and simple) This commit was SVN r9786.	2006-05-01 20:03:49 +00:00
Brian Barrett	56f48357b3	* don't try to register callback at init time (will do at window creation time anyway), so that we can run without ob1 This commit was SVN r9785.	2006-05-01 20:03:03 +00:00
Brian Barrett	4256705ffb	* rename irecv, isend, and iprobe files to recv, send, and probe This commit was SVN r9780.	2006-04-29 22:06:21 +00:00
Brian Barrett	315a889247	Try to get the Portals PML going again, just to get some data for the Cray paper. This is just the shell, for checkpoint. Changes: * Fix copyrights * remove cancel code and ptl references * add dump command This commit was SVN r9779.	2006-04-29 22:05:20 +00:00
Tim Woodall	02d991532f	interface to post a callback for notification of change to modex data This commit was SVN r9753.	2006-04-27 16:15:35 +00:00
Tim Woodall	4fd2a71b6c	removed debug code - free list implementation has changed This commit was SVN r9750.	2006-04-27 15:34:12 +00:00
Brian Barrett	9cab1bb54a	* re-enable the eager fragment throttling, this time with the proper threshold value for when the memory descriptor is closing itself, so that it actually works properly ;). I think I was just getting lucky and not sending enough short messages with the reference impl. This commit was SVN r9748.	2006-04-27 14:13:52 +00:00
Brian Barrett	66d1d3b83f	* add a quick debugging sanity check * It appears that Cray's SeaStar has some horrible performance for iovecs - IN_PLACE was actually slower than copying into eager frags. Ugh. And we don't even pre-pin eager frags yet! This commit was SVN r9738.	2006-04-27 02:55:31 +00:00
George Bosilca	3e968d4f63	There is no length on the free list. This commit was SVN r9704.	2006-04-24 23:13:51 +00:00
Brian Barrett	1da22f9099	* silence a bunch of compiler warnings on Solaris when using the Sun compilers. This should go to the v1.1 branch This commit was SVN r9693.	2006-04-23 21:15:09 +00:00
Brian Barrett	9befdc7d9f	* Ensure that mca_common_sm_mmap_seg_alloc() always returns a word-aligned pointer. Otherwise, we can end up segfaulting when the memory area is used by the caller. Fixes a bug reported by Alex Spiegel. This commit was SVN r9692.	2006-04-23 21:14:03 +00:00
Brian Barrett	9a65ddd788	* back out r9005, which for some reason works fine on the reference implementation but causes resource exhaustion on the Red Storm implementation. Sigh... This commit was SVN r9686. The following SVN revision numbers were found above: r9005 --> open-mpi/ompi@20d06e889e	2006-04-22 20:12:33 +00:00
George Bosilca	29219ee57d	Thanks to Gleb now we are able to call the scheduler on Windows. Instead of using sched_yield, we use our friend SwitchToThread. This commit was SVN r9671.	2006-04-20 19:56:50 +00:00
Graham Fagg	c31a5ad4b3	A few small changes that just expanded in the name of neatness... (1) As pointed out by Torsten after Jeff's comment yesterday that there are 15 collectives.. nope.. I have 16 but miscounted them in my ifdefs (I had two #11s). Replaced with enum... (2) Added a readonly MCA param for how many backend algorithms are available per collective (used by benchmarker/STS) This allowed me to remove the tuned query internal functions and replace them with ompi_coll_tuned_forced_max_algorithms[COLL]. (3) I was reading the user forced MCA params for the collectives on each comm create (module init) but I then put the values into a global set of variables (like ompi_coll_tuned_reduce_forced_algorithm). To fix this and make the code neater: (a) The component looks up the MCA param indices on Open if dynamic_rules is set via the ompi_coll_tuned_COLLECTIVE_intra_check_forced_init () call. (b) Replaced the ompi_coll_ompi_coll_tuned_COLLECTIVE_forced_algorithm/segmentsize/etc globals with a struct that is now cached on the module data hung off the communicator. i.e. done right. (c) On module init if dynamic rules enabled we call a general getvalues routine (in coll_tuned_forced.c) to get the CURRENT values using the MCA param indices and then put them on the modules data segment. A shorter version of getvalues exists for barrier which only needs the algorithm choice This commit was SVN r9663.	2006-04-19 23:42:06 +00:00
Andrew Friedley	345551cb36	Checkpoint before starting work on max-sized frags (maybe user too?). - Some initial work on prepare_src - Move some fragment initialization around - Fix a union casting issue on picky compilers, identified by Don Kerr - Other small cleanups/bugfixes This commit was SVN r9662.	2006-04-19 22:20:22 +00:00
George Bosilca	61bea41350	The same in MX (missing copyright). This commit was SVN r9661.	2006-04-19 21:37:30 +00:00
George Bosilca	afe9821d84	Add a missing copyright. This commit was SVN r9660.	2006-04-19 21:36:22 +00:00
Tim Woodall	10f343734f	decrease eager limit to 12K (improves latency) This commit was SVN r9646.	2006-04-14 22:28:37 +00:00
Tim Woodall	6523c12e4b	- decrease eager limit to 12K (improves latency) - trigger event library while setting up connections This commit was SVN r9645.	2006-04-14 22:28:05 +00:00
Tim Woodall	c6489cb5aa	- turn on eager rdma by default This commit was SVN r9641.	2006-04-14 21:11:14 +00:00
George Bosilca	b3cc3d82d3	Activate the OOB while we setup connections for MVAPI. Same thing should be done for the Open IB ... This commit was SVN r9640.	2006-04-14 20:53:42 +00:00
Galen Shipman	ba0aa46220	make csums optional in pml dr, on by default, see mca param pml_dr_enable_csum This commit was SVN r9608.	2006-04-10 21:54:46 +00:00
Gleb Natapov	98282a3567	fix spelling. threashold -> threshold. This commit was SVN r9577.	2006-04-08 08:13:37 +00:00
Andrew Friedley	d461b55696	- Implement OOB connection handshaking via the ORTE RML. To start a connect, we send our local addr_t OOB. Remote side then matches endpoints and calls dat_ep_connect(). Everything should be the same as before from here, except that client/server roles are reversed. - Properly set our buffer size when posting receives. When the frag used to transfer address information is recycled by the free list, the wrong buffer size was being used, which caused buffer overflow errors. - Finally put the uDAPL error handling stuff in the mpool component. - Remove a few more OPAL_OUTPUTs. This commit was SVN r9569.	2006-04-07 15:26:05 +00:00
Galen Shipman	c29db49198	return out if we ack a duplicate matched rendezvous from matched receives sequence tracker and the communicator is null.. This commit was SVN r9521.	2006-04-03 21:04:51 +00:00
Gleb Natapov	b6ab1f4262	fix compilation warnings. This commit was SVN r9515.	2006-04-02 11:32:25 +00:00
George Bosilca	22572940c8	Remove some useless code. This commit was SVN r9513.	2006-04-01 07:42:43 +00:00
George Bosilca	58cd591d3b	PERUSE support for OB1. There we go, now the trunk has a partial peruse implementation. We support all the events in the PERUSE specifications, but right now only one event of each type can be attached to a communicator. This will be worked out in the future. The events were placed in such a way that we will be able to measure the overhead for our threading implementation (the cost of the synchronization objects). This commit was SVN r9500.	2006-03-31 17:09:09 +00:00
George Bosilca	1226d452bf	Add a base _START macro that will do the base initialization. Additionally, that allows me to add the PERUSE event in a more homogeneous manner (all PML's will have them). This commit was SVN r9499.	2006-03-31 17:05:09 +00:00
Andrew Friedley	74b2f77a4c	The expected cleanup/refactoring commit.. Not much got tested that wasn't already - I've uncovered a connection establishment deadlock and wanted to get these changes committed before I attack it. The big changes: - Moved much of the connection code from btl_udapl_component.c to btl_udapl_endpoint.c. - Cleaned up initialization of various fragment members. - MCA_BTL_UDAPL_ERROR macro, which is compiled in/out appropriately. This commit was SVN r9496.	2006-03-31 16:25:19 +00:00
Galen Shipman	1d67917b69	must handle header validation correctly for each case, not enough in common for the MACRO This commit was SVN r9486.	2006-03-30 21:27:21 +00:00
Tim Woodall	9a73fe8beb	check for valid sequence number before attempting to use communicator This commit was SVN r9482.	2006-03-30 19:36:15 +00:00
Gleb Natapov	256bf70530	Forgot to add file to previous commit This commit was SVN r9480.	2006-03-30 17:37:52 +00:00
Gleb Natapov	79bcfb096f	Add type to frag. Sometimes we need to know that a frag is from short rdma area. I used a hack for this that doesn't work for mvapi, so changing it to something more sane. This commit was SVN r9477.	2006-03-30 15:26:21 +00:00
Gleb Natapov	ea11582191	Porting of short message RDMA from openib BTL. Endpoint registers circular buffer and sends its address and rkey to the peer. Peer uses this buffer to eagerly RDMA small messages into it. Endpoint polls the buffer for message arrival before checking HP/LP QPs. Set btl_mvapi_use_eager_rdma to 1 to enable it. This commit was SVN r9474.	2006-03-30 12:55:31 +00:00
Galen Shipman	641fa6c0d2	more fixes, reset state on completion.. This commit was SVN r9469.	2006-03-29 22:21:35 +00:00
Galen Shipman	2945f77f9e	randomly drop fragments without local completion, currently commented out as we must handle the other cases first.. This commit was SVN r9468.	2006-03-29 22:19:58 +00:00
Andrew Friedley	0eba366b07	Various pieces all over to make basic small message send/recv work. Next step is clean up the code.. it is in need of refactoring and testing. Thanks to Brian for help in troubleshooting! This commit was SVN r9466.	2006-03-29 21:55:41 +00:00
Galen Shipman	5271948ec0	--- opal object changes add object size to opal class no longer need the size when allocating a new object as this is stored in the class structure --- dr changes Previous rev. maintained state on the communicator used for acking duplicate fragments, but the communicator may be destroyed prior to successful delivery of an ack to the peer. We must therefore maintain this state globally on a per peer, not a per peer, per communicator basis. This requires that we use a global rank on the wire and translate this as appropriate to a local rank within the communicator. This commit was SVN r9454.	2006-03-29 16:19:17 +00:00
George Bosilca	5d465cf118	Call the constructor on the DR lock. This commit was SVN r9438.	2006-03-28 07:34:02 +00:00
Graham Fagg	19906e66dc	missing lock? This commit was SVN r9436.	2006-03-28 06:15:48 +00:00
George Bosilca	46c442fe0d	We do not have direct access to the module. Grab the one attached to the window instead. This commit was SVN r9434.	2006-03-28 05:06:40 +00:00
Tim Woodall	c1bf71b1be	- updated copyrights - removed unused state - starting to add support for btl failover This commit was SVN r9431.	2006-03-27 22:48:12 +00:00
Tim Woodall	c724e4c804	- removed unused flags - updated copyrights This commit was SVN r9430.	2006-03-27 22:44:26 +00:00
Gleb Natapov	590c992a7e	fix recursive lock of openib_btl->ib_lock. This commit was SVN r9427.	2006-03-26 15:02:43 +00:00
Gleb Natapov	01a119c3c5	fix compilation bug with --enable-mpi-threads This commit was SVN r9426.	2006-03-26 13:24:10 +00:00
Gleb Natapov	a5a78b10cc	Implementation of short message RDMA. Endpoint registers circular buffer and sends its address and rkey to the peer. Peer uses this buffer to eagerly RDMA small messages into it. Endpoint polls the buffer for message arrival before checking HP/LP QPs. Set btl_openib_use_eager_rdma to 1 to enable it. This commit was SVN r9425.	2006-03-26 08:30:50 +00:00
Galen Shipman	1677ca1cd4	continue to debug retransmission of incorrect offset, only occurs on vfrag timeout.. This commit was SVN r9421.	2006-03-24 22:28:43 +00:00
Brian Barrett	01671f2991	* allow user to set "no_locks" info argument as MCA parameter to override the default * Add ability to start Put and Get requests immediately instead of queuing until synchronization when using Fence. Not entirely sure this is completely safe, so it must be explicitly enabled by the user, either with an MCA parameter or info argument to Win_create. This commit was SVN r9418.	2006-03-24 18:56:59 +00:00
Tim Woodall	2e376e0ee8	misc cleanup This commit was SVN r9410.	2006-03-24 06:49:45 +00:00

1224 commits
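Commits d101e91b97 and 8397a1d71f describe two tag-masking problems in the Portals PML. First, a negative tag OR-ed directly into the match bits sets all the high bits. Second, the top tag bit should mark reserved tags so that MPI_ANY_TAG never picks them up.

The C sketch below illustrates both points under an assumed layout: context id in bits 63..32 and tag in bits 31..0. Because internal tags are negative, bit 31 of the masked tag doubles as the reserved flag. The layout, the function names and the ANY_TAG_IGNORE constant are hypothetical and are not the actual Open MPI code. bits_match() follows the Portals convention that bits set in the ignore mask are not compared.

```c
#include <stdint.h>

/* Hypothetical 64-bit match-bits layout (illustration only):
 *   bits 63..32  communicator context id
 *   bit  31      set for negative (reserved, internal) tags
 *   bits 30..0   non-negative user tag                          */
#define TAG_FIELD    UINT64_C(0x00000000FFFFFFFF)
#define RESERVED_BIT UINT64_C(0x0000000080000000)

/* Buggy: converting a negative int to uint64_t yields 0xFFFF...,
 * so the OR overwrites the context id bits as well. */
static uint64_t match_bits_naive(uint32_t ctx, int tag)
{
    return ((uint64_t)ctx << 32) | (uint64_t)tag;
}

/* Fixed: keep only the 32-bit tag field before OR-ing it in. */
static uint64_t match_bits_masked(uint32_t ctx, int tag)
{
    return ((uint64_t)ctx << 32) | ((uint64_t)tag & TAG_FIELD);
}

/* Portals-style comparison: bits set in `ignore` are not compared. */
static int bits_match(uint64_t incoming, uint64_t posted, uint64_t ignore)
{
    return ((incoming ^ posted) & ~ignore) == 0;
}

/* Ignore bits for an MPI_ANY_TAG receive: wildcard the user tag but
 * NOT the reserved bit, so reserved traffic never matches. */
#define ANY_TAG_IGNORE (TAG_FIELD & ~RESERVED_BIT)
```

In this layout, a wildcard receive posted as match_bits_masked(ctx, 0) with ANY_TAG_IGNORE accepts any non-negative tag on that communicator. It rejects negative (reserved) tags and messages for other communicators.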
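Commit 9befdc7d9f makes mca_common_sm_mmap_seg_alloc() always return a word-aligned pointer. The sketch below assumes a simple bump allocator over a segment whose base address is itself word-aligned, as an mmap() result is. Before handing out memory, it rounds the current offset up to a multiple of sizeof(void *). The seg_t type and the seg_alloc() and align_up() functions are hypothetical and do not reproduce the real function's signature or bookkeeping.

```c
#include <stddef.h>
#include <stdint.h>

/* Round `value` up to the next multiple of `align` (a power of two). */
static size_t align_up(size_t value, size_t align)
{
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical segment descriptor for a bump allocator. */
typedef struct {
    unsigned char *base; /* assumed word-aligned, e.g. from mmap() */
    size_t size;         /* total bytes in the segment */
    size_t used;         /* bytes handed out so far */
} seg_t;

/* Return a word-aligned block of `len` bytes, or NULL if it won't fit. */
static void *seg_alloc(seg_t *seg, size_t len)
{
    size_t start = align_up(seg->used, sizeof(void *));
    if (start > seg->size || len > seg->size - start) {
        return NULL; /* segment exhausted */
    }
    seg->used = start + len;
    return seg->base + start;
}
```

Without the rounding, a 3-byte allocation followed by a pointer-sized one would place the second object at offset 3. On architectures that require aligned access, loading it can then fault.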