openmpi

Author	SHA1	Message	Date
Tim Prins	06bf4c3f3b	fix some printf warnings This commit was SVN r14934.	2007-06-06 22:37:26 +00:00
George Bosilca	6a5e039466	Allow smart connection to be setup. Each peer now has attached to it a unique id based on the last half of the mapper MAC. This allows us to figure out how to connect peers. This allows the MX BTL to be used in a cluster of cluster configuration where each cluster has MX internally as well as on a multi rail MX system. This commit was SVN r14932.	2007-06-06 21:42:11 +00:00
Galen Shipman	5340f5e320	Try to clean up the flow control logic a bit. Renamed a few variables. Initialize the reserve receive buffers to 1; prior to this they were initialized to zero. This commit was SVN r14919.	2007-06-06 18:51:09 +00:00
Gleb Natapov	de58336c45	Let rdma_pipeline_offset to be set to zero. This commit was SVN r14900.	2007-06-06 11:54:25 +00:00
Rich Graham	e276f7bcc7	undo my error. This commit was SVN r14890.	2007-06-05 23:32:47 +00:00
Rich Graham	ce0e9ac329	initialize lock properly. This commit was SVN r14881.	2007-06-05 20:34:11 +00:00
Donald Kerr	8ecbc71ed2	add support for connection private data, off by default This commit was SVN r14878.	2007-06-05 19:29:50 +00:00
Gleb Natapov	ac1e8f81af	Let's be real. TCP latency is slightly worse than mx/openib. This commit was SVN r14865.	2007-06-05 12:22:57 +00:00
Gleb Natapov	fbd033b162	Cut&Paste error in r14795. Fix. This commit was SVN r14862. The following SVN revision numbers were found above: r14795 --> open-mpi/ompi@6b0d8c0858	2007-06-05 10:07:06 +00:00
Shiqing Fan	c142c23f88	Initialize req_ompi.req_status._count to be 0 before starting the request. This commit was SVN r14861.	2007-06-05 09:50:06 +00:00
Brian Barrett	508da4e959	OS X apparently really doesn't like shared libraries with unresolvable symbols in them and environ is defined only in the final application (probably in crt1.o). Apple provides a function for getting at the environment, so use that instead if it's available. This commit was SVN r14857.	2007-06-05 03:03:59 +00:00
Brian Barrett	a446af5b6b	* Remove unneeded SRQ test -- we no longer support OFED builds that don't have the SRQ interface. * Instead of setting AC_DEFINEs per MCA component, set per test. The answers can never be different, and this will speed sed just a teeny bit. This commit was SVN r14856.	2007-06-05 01:49:26 +00:00
Brian Barrett	0798c0784d	properly set fields so that most difficult alignment rules are always met. This commit was SVN r14854.	2007-06-05 01:46:04 +00:00
Shiqing Fan	0961669912	Spaces after backslash are removed. This commit was SVN r14844.	2007-06-04 10:10:24 +00:00
Shiqing Fan	7bf18a4fd5	MPI_SOURCE should be initialized. This commit was SVN r14843.	2007-06-04 09:37:21 +00:00
Gleb Natapov	10266fb467	Fix deadlock in OB1 protocol by sending memory by copying if registration fails. This commit was SVN r14842.	2007-06-03 08:31:58 +00:00
Gleb Natapov	a25e1e7b15	Implement new function mca_pml_ob1_send_requst_copy_in_out(req, offset, len) that allows sending any range of a request by send/recv instead of RDMA, and use it to send data from the end of a request in pipeline protocol. This commit was SVN r14841.	2007-06-03 08:30:07 +00:00
Brian Barrett	8cf02de3b4	* cleanup some ompi_info output * enable eager sending by default This commit was SVN r14813.	2007-05-30 22:23:34 +00:00
Brian Barrett	a2713dcac8	eeks! Bad to notice after committing the pt2pt part of r14806 that the compile failed because of the wrong variable name. This commit was SVN r14807. The following SVN revision numbers were found above: r14806 --> open-mpi/ompi@7e57bbb0ef	2007-05-30 20:33:08 +00:00
Brian Barrett	7e57bbb0ef	React slightly better when datatype creation from a buffer fails This commit was SVN r14806.	2007-05-30 20:32:02 +00:00
Brian Barrett	84f7ed70b3	Re-enable the ability for the rdma one-sided component to start messages as soon as the epochs allow, rather than waiting for the end of the synchronization phase. This commit was SVN r14800.	2007-05-30 17:06:19 +00:00
Gleb Natapov	6b0d8c0858	TCP BTL ignores btl_tcp_bandwidth parameter. Fix it. This commit was SVN r14795.	2007-05-30 14:12:05 +00:00
Donald Kerr	91c9b7b6f9	don't call dat_evd_resize if new value is less than or equal to current because ofed stack does not return DAT_INVALID_STATE This commit was SVN r14792.	2007-05-29 20:08:16 +00:00
Gleb Natapov	06bf5d74e7	Remove mca_pml_ob1_send_fin_btl function. This commit was SVN r14784.	2007-05-28 06:51:12 +00:00
Gleb Natapov	f5078db0db	Fix order of parameters to function. This commit was SVN r14783.	2007-05-27 13:45:24 +00:00
Gleb Natapov	f191834e56	No need for MCA_BTL_FLAGS_NEED_ACK any more. As of commit r14768 this is the default behaviour. This commit was SVN r14782. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 11:25:39 +00:00
Gleb Natapov	444762456e	Don't dereference NULL pointer. Fix bug introduced in r14768. This commit was SVN r14781. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 09:24:56 +00:00
Gleb Natapov	ad69d3c6ac	Fix out of resource handling for FIN packets broken by r14768. This commit was SVN r14780. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 08:29:38 +00:00
Brian Barrett	80fa8eef6e	Don't include malloc.h in mpool/base/base.h because it causes all kinds of problems when the memory debugging stuff is enabled. Push it down into the two .c files that do use it. This commit was SVN r14779.	2007-05-27 03:55:21 +00:00
George Bosilca	eb43abf7ae	Allow compilation when there is a progress thread. This commit was SVN r14776.	2007-05-25 01:59:29 +00:00
George Bosilca	8b817e96fd	Allow threaded compilation. This commit was SVN r14775.	2007-05-25 01:53:29 +00:00
Galen Shipman	3401bd2b07	Add optional ordering to the BTL interface. This is required to tighten up the BTL semantics. Ordering is not guaranteed, but, if the BTL returns an order tag in a descriptor (other than MCA_BTL_NO_ORDER) then we may request another descriptor that will obey ordering w.r.t. the other descriptor. This will allow sane behavior for RDMA networks, where local completion of an RDMA operation on the active side does not imply remote completion on the passive side. If we send a FIN message after local completion and the FIN is not ordered w.r.t. the RDMA operation then badness may occur as the passive side may now try to deregister the memory and the RDMA operation may still be pending on the passive side. Note that this has no impact on networks that don't suffer from this limitation as the ORDER tag can simply always be specified as MCA_BTL_NO_ORDER. This commit was SVN r14768.	2007-05-24 19:51:26 +00:00
Jeff Squyres	81df632e29	Clarification to MCA parameter help messages This commit was SVN r14765.	2007-05-24 19:18:29 +00:00
Brian Barrett	5ec421e1b0	Create a new queue (to simplify locking) for requests that are started but can not be started by the BTL. This commit was SVN r14757.	2007-05-24 17:21:56 +00:00
George Bosilca	7459ab45f1	This is the complete commit for the TCP header issue. Jeff committed a partial fix (r14749) and then backed it out (r14753). As we are unable to send more than a 32-bit length over TCP in one go, there is no reason to have a uint64 length in the header. This reduces the size of the TCP header. This commit was SVN r14755. The following SVN revision numbers were found above: r14749 --> open-mpi/ompi@48c026ce6b r14753 --> open-mpi/ompi@28ed850b4c	2007-05-24 16:40:49 +00:00
Jeff Squyres	28ed850b4c	Back out r14749; it wasn't quite ready for prime time yet... This commit was SVN r14753. The following SVN revision numbers were found above: r14749 --> open-mpi/ompi@48c026ce6b	2007-05-24 15:46:15 +00:00
Brian Barrett	1b025798d2	remove some now unneeded volatiles This commit was SVN r14752.	2007-05-24 15:42:06 +00:00
Brian Barrett	1a9f48c89d	Some much needed cleanup of the rdma one-sided component, similar to r14703 for the point-to-point component. * Associate the list of long message requests to poll with the component, not the individual modules * add progress thread that sits on the OMPI request structure and wakes up at the appropriate time to poll the message list to move long messages asynchronously. * Instead of calling opal_progress() all over the place, move to using the condition variables like the rest of the project. Has the advantage of moving it slightly further along in the becoming thread safe thing. * Fix a problem with the passive side of unlock where it could go recursive and cause all kinds of problems, especially when progress threads are used. Instead, have two parts of passive unlock -- one to start the unlock, and another to complete the lock and send the ack back. The data moving code trips the second at the right time. This commit was SVN r14751. The following SVN revision numbers were found above: r14703 --> open-mpi/ompi@2b4b754925	2007-05-24 15:41:24 +00:00
Jeff Squyres	48c026ce6b	Commit a patch from George (reviewed by Brian): reduce the size of the mca_btl_tcp_hdr_t struct and remove the need for the heterogeneous padding by changing the type of the "size" member to be uint32_t (vs. uint64_t). The value would never be greater than 32 bits anyway, so having the type be uint64_t was wasteful. This commit was SVN r14749.	2007-05-24 15:08:57 +00:00
Gleb Natapov	be71b78f6a	Initialize btl_send_limit before use. This commit was SVN r14745.	2007-05-24 08:40:26 +00:00
Brian Barrett	075389f67d	fix some printf warnings This commit was SVN r14740.	2007-05-23 21:19:26 +00:00
Brian Barrett	38b0d22243	Some cleanups to the pt2pt component * Remove unused declaration * remove unused variable warning when not using progress threads * If we're using progress threads, we want to lock, not trylock when in progress, since it was called from the wakeup thread and not the progress function This commit was SVN r14739.	2007-05-23 20:31:25 +00:00
George Bosilca	b2e805db61	Nothing relevant. Indentation, typos, change PTL to BTL. This commit was SVN r14727.	2007-05-23 14:03:52 +00:00
Sven Stork	88f0845c44	- let the pt2pt component compile with threads enabled This commit was SVN r14725.	2007-05-23 12:56:34 +00:00
Brian Barrett	38eab3613b	* Fix race condition with the pending_{in,out} variables -- if we're going to do while(...) { } then we can't change the variables in the ... atomically, but should do it while holding the module lock. * Fix dumb communicator creation error when we don't create the progress stuff (because a window already exists), where we would accidentally jump to the error case. This commit was SVN r14715.	2007-05-21 20:53:02 +00:00
Ralph Castain	4fff584a68	Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that did start. The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system. Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed. Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief. With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn. Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. 
This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put". This commit was SVN r14711.	2007-05-21 18:31:28 +00:00
Brian Barrett	0e9e0c518a	Fix a couple more progress thread related issues... This commit was SVN r14708.	2007-05-21 16:06:14 +00:00
Pavel Shamis	5ceaa605d7	Adding new vendor_part_id for Mellanox Hermon HCA This commit was SVN r14705.	2007-05-21 13:33:54 +00:00
Brian Barrett	1191677b76	Fix dumb threads-related compile issues This commit was SVN r14704.	2007-05-21 03:23:58 +00:00
Brian Barrett	2b4b754925	Some much needed cleanup of the point-to-point one-sided component... * Combine polling of the long requests and buffer requests into one type, and in one place * Associate the list of requests to poll with the component, not the individual modules * add progress thread that sits on the OMPI request structure and wakes up at the appropriate time to poll the message list. Not the best, but without some asynch notification from the PML that a given set of requests has completed, there isn't much better * Instead of calling opal_progress() all over the place, move to using the condition variables like the rest of the project. Has the advantage of moving it slightly further along in the becoming thread safe thing * Fix a problem with the passive side of unlock where it could go recursive and cause all kinds of problems, especially when progress threads are used. Instead, have two parts of passive unlock -- one to start the unlock, and another to complete the lock and send the ack back. The data moving code trips the second at the right time. This commit was SVN r14703.	2007-05-21 02:21:25 +00:00

1750 commits