openmpi

Author	SHA1	Message	Date
Jeff Squyres	50bae9c603	Bring in the modular-wireup stuff for the openib BTL (from /tmp/jms-modular-wireup branch): * This commit moves all the openib BTL connection code out of btl_openib_endpoint.c and into a connect "pseudo-component" area, meaning that different schemes for doing OFA connection schemes can be chosen via function pointer (i.e., MCA parameter) at run-time. * The connect/connect.h file includes comments describing the specific interface for the connect pseudo-component. * Two pseudo-components are in this commit (more can certainly be added). * oob: use the same old oob/rml scheme for creating OFA connections that we've had forever; this now just puts the logic into this self-contained pseudo-component. * rdma_cm: a currently-empty set of functions (that currently return NOT_IMPLEMENTED) that will someday use the RDMA connection manager to make OFA connections. This commit was SVN r15786.	2007-08-06 23:40:35 +00:00
Jeff Squyres	0fb8cf65a8	If you have an HCA with no active ports, we still create an mpool. This mpool will have no btl module owner because there was no btl created for the HCA with no ports, but it will still be tracked in the mpool framework (i.e., it's available). If MPI_ALLOC_MEM is called by the app, one of two things will happen: 1. if there's an HCA on the host with some active ports, the openib btl component will still be in the process space, and therefore the "mpool with no btl" (MWNB) module will still be able to call the reg/dereg functions, and all will be fine. However, if MPI_FREE_MEM is never invoked to free the memory, bad things will happen during MPI_FINALIZE. The pml is finalized, which finalizes all the btls. The btls finalize all their mpools and all is fine. But later we close down the mpool framework which then finalizes any left over mpool modules, such as MWNB. However, the openib BTL module functions that the MWNB was registered with are no longer in the process space, and it segv's while trying to deregister the memory. 2. if there are no HCAs on the host with active ports, then the openib btl will have been unloaded, and when the MWNB tries to register the memory, the functions it tries to call (in the openib btl) are no longer there, and we segv. This commit was SVN r15735.	2007-08-01 20:53:34 +00:00
Gleb Natapov	758f932aa6	Handle credit in a thread safe manner. I am sure more work will have to be done in this area. This commit was SVN r15721.	2007-08-01 12:15:43 +00:00
Gleb Natapov	9c20d67301	1) Return IB header to its previous size by using char for cm_seen field. 2) Allow the user to specify rd_win/rd_rsv parameters, but make them optional. This commit was SVN r15719.	2007-08-01 12:10:56 +00:00
Gleb Natapov	2d9669a69d	mca_btl_openib_endpoint_post_send() is called with endpoint lock held. No need to call lock() in btl_openib_acquire_send_resources(). This commit was SVN r15678.	2007-07-30 09:03:08 +00:00
Jeff Squyres	cae00d1854	Passing NULL to pthread_exit() is verboten. This commit was SVN r15661.	2007-07-27 01:06:36 +00:00
Jeff Squyres	015fc08ff4	Remove the ib_static_rate MCA parameter; it will be replaced with a dynamic mechanism to adjust the rate only if necessary (e.g., two ports of differing speeds are connected). This commit was SVN r15653.	2007-07-26 21:10:51 +00:00
Gleb Natapov	cce6bb478c	Process message before reposting buffers. This way rd_posted should be calculated properly. This commit was SVN r15635.	2007-07-26 13:56:07 +00:00
Pavel Shamis	bda6f1a5cf	Fixing compilation problem in openib btl progress thread. This commit was SVN r15631.	2007-07-26 11:35:15 +00:00
Gleb Natapov	1f18b060ce	If eager_rdma_local is not initialized, credits and rd_win are zero and the comparison is always true. This commit was SVN r15629.	2007-07-26 07:53:35 +00:00
Jeff Squyres	e36038bb17	We know that --enable-progress-threads doesn't work. But this allows it to at least compile. If you actually get to the point of invoking the openib btl progress thread, you'll get a big opal_output warning that it is pretty much guaranteed not to work. This commit was SVN r15628.	2007-07-26 00:58:56 +00:00
Galen Shipman	514811c50b	cleanup btl.h comments; document the btl interface a bit better. This commit was SVN r15618.	2007-07-25 17:26:23 +00:00
Galen Shipman	438a56e0d7	update copyrights for ib_multifrag commit This commit was SVN r15612.	2007-07-25 15:03:34 +00:00
Galen Shipman	325c184fb4	remove debugging "abort()"; fix a debugging assert. This commit was SVN r15611.	2007-07-25 14:51:19 +00:00
Jeff Squyres	f2a2b2c0f9	A little more error checking; clean up the invalid MCA help message This commit was SVN r15589.	2007-07-24 20:57:40 +00:00
Gleb Natapov	5b7d3faedc	Implement "credit management for credit messages" protocol. On each message a sender piggybacks a number of credit messages it received from a peer. A number of outstanding credit messages is limited. This is needed to never ever fall back to HW flow control. This commit was SVN r15580.	2007-07-24 15:19:51 +00:00
Gleb Natapov	45a7a0650b	btl_openib_handle_incoming() is called from regular receive path and from eager RDMA receive path and checks internally from where it was called from to perform different tasks. Leave only common code in there and move other code to appropriate places. This commit was SVN r15579.	2007-07-24 13:23:08 +00:00
Brian Barrett	5b9fa7e998	reapply r15517 and r15520, which were removed in r15527 so that I could get the RML/OOB merge in slightly easier This commit was SVN r15530. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95 r15520 --> open-mpi/ompi@9cbc9df1b8 r15527 --> open-mpi/ompi@2d17dd9516	2007-07-20 02:34:29 +00:00
Brian Barrett	2d17dd9516	temporarily back out r15517 and r15520 so that I can get the RML / OOB changes to cleanly apply This commit was SVN r15527. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95	2007-07-20 01:10:34 +00:00
Ralph Castain	41977fcc95	Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but... Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point. Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings. This commit was SVN r15517.	2007-07-19 20:56:46 +00:00
Pavel Shamis	d837f1446b	This is a workaround for Ticket #1092. It will prevent the failure in openib finalize but it doesn't resolve the actual issue. I guess that the one-sided tests somehow allocate memory (mpool?) and don't release it. Need to check it. This commit was SVN r15488.	2007-07-18 18:02:13 +00:00
Gleb Natapov	45fcb45e31	Remove debug checks that produce lots of warnings during compilation. This commit was SVN r15479.	2007-07-18 13:49:15 +00:00
Gleb Natapov	30b2183314	Remove debug output from a hot path. This commit was SVN r15478.	2007-07-18 12:48:34 +00:00
Jeff Squyres	3bc940ac27	Fix three things from r15474 (thanks to Brian for noticing): * bml.h had a change that introduced a variable named "_order" to avoid a conflict with a local variable. The namespace starting with _ belongs to the os/compiler/kernel/not us. So we can't start symbols with _. So I replaced it with arg_order, and also updated the threaded equivalent of the macro that was modified. * in btl_openib_proc.c, one opal_output accidentally had its string reverted from "ompi_modex_recv..." to "mca_pml_base_modex_recv....". This was fixed. * The change to ompi/runtime/ompi_preconnect.c was entirely reverted; it was an artifact of debugging. This commit was SVN r15475. The following SVN revision numbers were found above: r15474 --> open-mpi/ompi@8ace07efed	2007-07-18 11:38:06 +00:00
Jeff Squyres	8ace07efed	This commit brings in two major things: 1. Galen's fine-grain control of queue pair resources in the openib BTL. 2. Pasha's new implementation of asynchronous HCA event handling. Pasha's new implementation doesn't take much explanation, but the new "multifrag" stuff does. Note that "svn merge" was not used to bring this new code from the /tmp/ib_multifrag branch -- something Bad happened in the periodic trunk pulls on that branch making an actual merge back to the trunk effectively impossible (i.e., lots and lots of arbitrary conflicts and artificial changes). :-( == Fine-grain control of queue pair resources == Galen's fine-grain control of queue pair resources to the OpenIB BTL (thanks to Gleb for fixing broken code and providing additional functionality, Pasha for finding broken code, and Jeff for doing all the svn work and regression testing). Prior to this commit, the OpenIB BTL created two queue pairs: one for eager size fragments and one for max send size fragments. When the use of the shared receive queue (SRQ) was specified (via "-mca btl_openib_use_srq 1"), these QPs would use a shared receive queue for receive buffers instead of the default per-peer (PP) receive queues and buffers. One consequence of this design is that receive buffer utilization (the size of the data received as a percentage of the receive buffer used for the data) was quite poor for a number of applications. The new design allows multiple QPs to be specified at runtime. Each QP can be set up to use PP or SRQ receive buffers as well as giving fine-grained control over receive buffer size, number of receive buffers to post, when to replenish the receive queue (low water mark) and for SRQ QPs, the number of outstanding sends can also be specified. 
The following is an example of the syntax to describe QPs to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues: {{{ -mca btl_openib_receive_queues \ "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32" }}} Each QP description is delimited by ";" (semicolon) with individual fields of the QP description delimited by "," (comma). The above example therefore describes 4 QPs. The first QP is: P,128,16,4 Meaning: per-peer receive buffer QPs are indicated by a starting field of "P"; the first QP (shown above) is therefore a per-peer based QP. The second field indicates the size of the receive buffer in bytes (128 bytes). The third field indicates the number of receive buffers to allocate to the QP (16). The fourth field indicates the low watermark for receive buffers at which time the BTL will repost receive buffers to the QP (4). The second QP is: S,1024,256,128,32 Shared receive queue based QPs are indicated by a starting field of "S"; the second QP (shown above) is therefore a shared receive queue based QP. The second, third and fourth fields are the same as in the per-peer based QP. The fifth field is the number of outstanding sends that are allowed at a given time on the QP (32). This provides a "good enough" mechanism of flow control for some regular communication patterns. QPs MUST be specified in ascending receive buffer size order. This requirement may be removed prior to 1.3 release. This commit was SVN r15474.	2007-07-18 01:15:59 +00:00
Jeff Squyres	cee9c214c7	Update the vendor ID list to include HP (0x1708). Thanks to Peter Kjellstrom for pointing this out. This commit was SVN r15316.	2007-07-09 20:09:31 +00:00
Brian Barrett	8b9e8054fd	Move modex from pml base to general ompi runtime, since it's used by more than just the PML/BTLs these days. Also clean up the code so that it handles the situation where not all nodes register information for a given node (rather than just spinning until that node sends information, like we do today). Includes r15234 and r15265 from the /tmp/bwb-modex branch. This commit was SVN r15310. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15234 r15265	2007-07-09 17:16:34 +00:00
Jeff Squyres	022bd30558	Back out r15158 because it apparently breaks with recent versions of flex (which, incidentally, emit ''more'' warnings than earlier versions). Grumble. This commit was SVN r15166. The following SVN revision numbers were found above: r15158 --> open-mpi/ompi@57d09c10f7	2007-06-21 21:14:10 +00:00
Jeff Squyres	57d09c10f7	Avoid some compiler warnings that come up ''every day'' in MTT (and have been for eons): make a symbol be used in a dumb but harmless way. This commit was SVN r15158.	2007-06-21 15:42:06 +00:00
Gleb Natapov	b88b7dedfe	Rename btl_rdma_offset to btl_pipeline_send_length. This commit was SVN r15153.	2007-06-21 07:12:40 +00:00
Jeff Squyres	930a9b7682	Make the help messages for if_include/if_exclude a little better. This commit was SVN r15134.	2007-06-19 13:38:58 +00:00
Jeff Squyres	2399b9a535	Ensure to initialize the variable so that we don't segv. This commit was SVN r15078.	2007-06-14 13:59:28 +00:00
Gleb Natapov	7b9ae49fe1	This time correctly calculate local BTL rank among all BTLs in a subnet. This commit was SVN r15073.	2007-06-14 10:27:11 +00:00
Jeff Squyres	1e18265c16	Bring over the functionality from the /tmp/jnysal-openib-wireup branch: * Support btl_openib_if_include and btl_openib_if_exclude MCA parameters, similar to those supported by other BTLs. Each takes a comma-delimited list of identifiers. Identifiers can be HCA interface names (e.g., ipath0, mthca1, etc.) or an HCA interface name and port number (e.g., ipath0:1, mthca1:2, etc.). It is an error to specify both _include and _exclude. If you specify a non-existent (or non-ACTIVE) HCA and/or port, you'll get a warning unless you disable the warning by setting the MCA parameter btl_openib_warn_nonexistent_if to 0. * Start updating to use BEGIN_C_DECLS and END_C_DECLS * A few other minor fixes that were picked up along the way. This commit was SVN r15063.	2007-06-14 01:59:25 +00:00
Gleb Natapov	8164723014	Allow to configure bandwidth and latency with finer granularity. Set bandwidth for all ports of mthca0: --mca btl_openib_bandwidth_mthca0 1000 Set bandwidth for port 1 of mthca1: --mca btl_openib_bandwidth_mthca1:1 1000 Set latency for port 2 lid 123 on mthca0: --mca btl_openib_latency_mthca0:2:123 20 This commit was SVN r15041.	2007-06-13 12:47:38 +00:00
Gleb Natapov	5c3f511451	Properly determine btl's rank among all btls within the same subnet. This commit was SVN r15038.	2007-06-13 11:15:58 +00:00
Galen Shipman	5340f5e320	Try to cleanup the flow control logic a bit. Renamed a few variables. Initialize the reserve receive buffers to 1; prior to this they were initialized to zero. This commit was SVN r14919.	2007-06-06 18:51:09 +00:00
Brian Barrett	a446af5b6b	* Remove unneeded SRQ test -- we no longer support OFED builds that don't have the SRQ interface. * Instead of setting AC_DEFINEs per MCA component, set per test. The answers can never be different, and this will speed sed just a teeny bit This commit was SVN r14856.	2007-06-05 01:49:26 +00:00
Gleb Natapov	444762456e	Don't dereference NULL pointer. Fix bug introduced in r14768. This commit was SVN r14781. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 09:24:56 +00:00
Galen Shipman	3401bd2b07	Add optional ordering to the BTL interface. This is required to tighten up the BTL semantics. Ordering is not guaranteed, but, if the BTL returns an order tag in a descriptor (other than MCA_BTL_NO_ORDER) then we may request another descriptor that will obey ordering w.r.t. the other descriptor. This will allow sane behavior for RDMA networks, where local completion of an RDMA operation on the active side does not imply remote completion on the passive side. If we send a FIN message after local completion and the FIN is not ordered w.r.t. the RDMA operation then badness may occur as the passive side may now try to deregister the memory and the RDMA operation may still be pending on the passive side. Note that this has no impact on networks that don't suffer from this limitation as the ORDER tag can simply always be specified as MCA_BTL_NO_ORDER. This commit was SVN r14768.	2007-05-24 19:51:26 +00:00
Jeff Squyres	81df632e29	Clarification to MCA parameter help messages This commit was SVN r14765.	2007-05-24 19:18:29 +00:00
Pavel Shamis	5ceaa605d7	Adding new vendor_part_id for Mellanox Hermon HCA This commit was SVN r14705.	2007-05-21 13:33:54 +00:00
Gleb Natapov	3ebaff8dfe	Implement new BTL parameters: We eagerly send data up to btl__eager_limit with the match Upon ACK of the MATCH we start using send/receives of size btl__max_send_size up to the btl__rdma_pipeline_offset After the btl__rdma_pipeline_offset we begin using RDMA writes of size btl__rdma_pipeline_frag_size. Now, on a per message basis we only use the above protocol if the message is larger than btl__min_rdma_pipeline_size btl__eager_limit - > same btl__max_send_size -> same btl__rdma_pipeline_offset -> btl__min_rdma_size btl__rdma_pipeline_frag_size -> btl__max_rdma_size btl_*_min_rdma_pipeline_size is new.. This patch also moves all BTL common parameters initialisation into btl_base_mca.c file. This commit was SVN r14681.	2007-05-17 07:54:27 +00:00
Pavel Shamis	cd87b05711	Added check for IBV_EVENT_CLIENT_REREGISTER async event, which did not exist in old openib gen2 versions (Ticket #1025) This commit was SVN r14658.	2007-05-15 13:53:49 +00:00
Jeff Squyres	92090967b1	Add definitions for Hermon/ConnectX Mellanox HCA This commit was SVN r14639.	2007-05-10 12:27:51 +00:00
Pavel Shamis	e2d0e27111	Adding: * openib_finalize flow for openib btl * async event handler for openib btl This commit was SVN r14623.	2007-05-08 21:47:21 +00:00
Jeff Squyres	ecf5a3b8dd	Fix compiler warning This commit was SVN r14604.	2007-05-08 13:12:50 +00:00
Rainer Keller	1aceece03f	- Add a few comments for elements for structs, a few spelling fixes. No functional change. This commit was SVN r14534.	2007-04-26 21:03:38 +00:00
Rainer Keller	ce32b918da	- Fixes for unlocking the mutex in case of error in functions mca_btl_openib_post_srr and btl_openib_endpoint_post_rr This commit was SVN r14530.	2007-04-26 13:33:02 +00:00
Sharon Melamed	cf3f41288b	Add pkey value MCA parameter. If this param is used, only ports with the actual pkey value will be initialized. This commit was SVN r14463.	2007-04-22 10:22:12 +00:00
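
The btl_openib_receive_queues syntax described in commit 8ace07efed (SVN r15474) can be illustrated with a small parser. This is only a sketch under stated assumptions, not the Open MPI implementation: the names QueuePairSpec and parse_receive_queues are hypothetical, and the field counts follow the r15474 description only (four fields for "P", five for "S"), ignoring any optional fields added by later commits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuePairSpec:
    """One QP description; a hypothetical type used only for illustration."""
    kind: str                                 # "P" = per-peer, "S" = shared receive queue
    buffer_size: int                          # receive buffer size in bytes
    num_buffers: int                          # number of receive buffers to allocate
    low_watermark: int                        # repost buffers when this level is reached
    max_pending_sends: Optional[int] = None   # only present for "S" QPs


def parse_receive_queues(value: str) -> List[QueuePairSpec]:
    """Parse a string such as "P,128,16,4;S,1024,256,128,32".

    QP descriptions are separated by ";" and fields by ",", as described
    in the r15474 commit message. The error handling is an assumption.
    """
    specs = []
    for chunk in value.split(";"):
        fields = [f.strip() for f in chunk.split(",")]
        kind = fields[0]
        if kind == "P" and len(fields) == 4:
            size, count, low = (int(f) for f in fields[1:])
            specs.append(QueuePairSpec(kind, size, count, low))
        elif kind == "S" and len(fields) == 5:
            size, count, low, sends = (int(f) for f in fields[1:])
            specs.append(QueuePairSpec(kind, size, count, low, sends))
        else:
            raise ValueError(f"unrecognized QP description: {chunk!r}")

    # The commit message requires ascending receive buffer size order.
    sizes = [s.buffer_size for s in specs]
    if sizes != sorted(sizes):
        raise ValueError("QPs must be listed in ascending receive buffer size order")
    return specs


if __name__ == "__main__":
    example = "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
    for spec in parse_receive_queues(example):
        print(spec)
```

Run against the example string from the commit message, the sketch yields four QP descriptions: one per-peer QP followed by three shared-receive-queue QPs.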


266 Commits