openmpi

Author	SHA1	Message	Date
Jeff Squyres	671f0c379d	Remove a whole pile of orte/util/show_help.h's that I missed. :-( This commit was SVN r18437.	2008-05-14 11:32:33 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 2. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Jon Mason	125eb5a2ed	Convert from the Linux ifaddrs to the OMPI ifaddrs, which should unbreak Solaris. This commit was SVN r18433.	2008-05-13 18:34:22 +00:00
Jeff Squyres	d8e5608053	Remove all retransmission code; the IBCM kernel module handles all of that for us. This commit was SVN r18432.	2008-05-13 16:10:34 +00:00
Jon Mason	74bf1ae25f	Fix compiler warnings This commit was SVN r18431.	2008-05-13 16:01:58 +00:00
Jon Mason	4ead9442b5	Add in IDs for all Chelsio iWARP capable adapters This commit was SVN r18428.	2008-05-12 21:59:03 +00:00
Jeff Squyres	6b26895ad4	A little style update -- constants on the left... This commit was SVN r18426.	2008-05-12 12:05:16 +00:00
Jeff Squyres	16cde0e5fa	Fix compile error on older OFED systems This commit was SVN r18425.	2008-05-12 11:56:14 +00:00
Gleb Natapov	6844ff32ba	Return OMPI_ERR_RESOURCE_BUSY from sm->btl_send() function if there is no place in cb. This will prevent OB1 from doing early completion of small sends. This commit was SVN r18424.	2008-05-12 07:15:29 +00:00
Gleb Natapov	0827e537fa	Don't include rdma/rdma_cma.h if !OMPI_HAVE_RDMACM. This commit was SVN r18422.	2008-05-11 11:58:02 +00:00
Jon Mason	99ab66e131	RDMACM code cleanup This patch adds some much needed comments, reduces the amount of code wrapping, and rearranges and removes redundant code. This commit was SVN r18417.	2008-05-08 21:20:12 +00:00
Jon Mason	88e5f2a339	Abstract iWARP subnet ID functions (sans build break) The iWARP subnet ID determination should not be in the RDMACM cpc, as it was in the previous version, as this violates the cpc abstraction that is present throughout the code. Also, this patch uses the opal_list_t data struct instead of using its own linked lists. This attempt includes iwarp.c and iwarp.h This commit was SVN r18414.	2008-05-08 14:38:14 +00:00
Jeff Squyres	60f39a30f6	Revert r18409; that commit broke the build because it forgot to add the btl_openib_iwarp.c and btl_openib_iwarp.h files. This commit was SVN r18410. The following SVN revision numbers were found above: r18409 --> open-mpi/ompi@056bbb68c8	2008-05-08 00:22:21 +00:00
Jon Mason	056bbb68c8	Abstract iWARP subnet ID functions The iWARP subnet ID determination should not be in the RDMACM cpc, as it was in the previous version, as this violates the cpc abstraction that is present throughout the code. Also, this patch uses the opal_list_t data struct instead of using its own linked lists. This commit was SVN r18409.	2008-05-07 23:59:43 +00:00
Ralph Castain	7c7b9b0486	Do a little cleanup on the opal graph class and opal carto framework to conform to OMPI naming conventions and avoid potential conflict with user applications - no change in functionality, passes carto test program This commit was SVN r18407.	2008-05-07 19:33:49 +00:00
Jeff Squyres	157cea378f	* A few fixes to make IP address and port number comparisons properly * A few indenting and style fixes This commit was SVN r18405.	2008-05-07 16:56:07 +00:00
Jeff Squyres	bfae8ea828	The comment wasn't long enough; I felt the need to make it longer (and explain a little more ;-) ). This commit was SVN r18404.	2008-05-07 16:53:05 +00:00
Jeff Squyres	63abb3eb9b	Clarify a comment / fix typos. This commit was SVN r18402.	2008-05-07 14:51:36 +00:00
Jon Mason	502d164908	Create subnet IDs for iWARP. This enables subnet differentiation for iWARP devices, and rearranges initialization so that the services are available when they are needed. This commit was SVN r18393.	2008-05-06 22:43:52 +00:00
Jon Mason	9c724128f8	Handle no IP Address in rdmacm more resiliently If there is no IP Address, have rdmacm log the correct error and let another cpc have a go at it. This is being done by splitting off the IP address checking logic for the modex message creation, and having it log the correct error in the error case. This commit was SVN r18392.	2008-05-06 22:31:29 +00:00
Jon Mason	46bfd42c09	Fix compile warnings in rdmacm Fix some reported compiler warnings and make the code a little prettier. This commit was SVN r18391.	2008-05-06 22:19:28 +00:00
Jon Mason	9066168cd1	Prevent iWARP qp flush errors. For iWARP, the TCP connection is tied to the QP once the QP is in RTS. And destroying the QP is thus tied to connection teardown for iWARP. This is a key distinction from IB, I think. Anyway, to destroy the connection in iWARP you must move the QP out of RTS, either into CLOSING for a nice graceful close, or to ERROR if you want to be rude. In both cases, all pending non-completed SQ and RQ WRs must be flushed. This patch ignores all flush errors reaped by the cq and removes an earlier attempt to work around this in the rdmacm cpc. This commit was SVN r18388.	2008-05-06 21:57:40 +00:00
Jeff Squyres	a06d4023b8	Oops -- missed one sys_errlist -> strerror(). This commit was SVN r18378.	2008-05-06 13:22:36 +00:00
Jeff Squyres	4154e587de	strerror() is much better. This commit was SVN r18376.	2008-05-05 21:06:07 +00:00
Jon Mason	a3bf503e01	Remove error on rdma cm If there are multiple QP's, RDMACM will not send a message if the qpnum != 0. In doing so, it will log an error unnecessarily. This removes that. This commit was SVN r18363.	2008-05-02 20:12:01 +00:00
Jon Mason	3989981578	Enable support of num_proc > num_nodes Add the logic to support using port numbers, instead of simply using the IP address of the sending node to determine which endpoint to connect. Since each process calls the cpc query function, it will generate its own port to listen on, thus enabling this to work. This commit was SVN r18362.	2008-05-02 16:20:28 +00:00
Jeff Squyres	ba5615a18f	Merge in /tmp-public/cpc3 branch to trunk. oob/xoob still remains the default CPC. This commit was SVN r18356.	2008-05-02 11:52:33 +00:00
Donald Kerr	843a35094f	adding local work queue accounting This commit was SVN r18352.	2008-05-01 21:01:51 +00:00
George Bosilca	a69ac964df	Allow any order in the list of Elan vpid. This commit was SVN r18350.	2008-05-01 20:32:03 +00:00
Pavel Shamis	61cc8843bf	The r17940 broke the XRC code. The endpoint may be appended to list during XOOB connection bring up. This commit was SVN r18328. The following SVN revision numbers were found above: r17940 --> open-mpi/ompi@ebfdd133f5	2008-04-29 13:22:40 +00:00
Brad Penoff	c699236be2	updating SCTP BTL to configure properly with FreeBSD 7 This commit was SVN r18324.	2008-04-28 04:19:10 +00:00
Adrian Knoth	c53d3c3c22	reverted r18169,r18170 due to connection reset by peer on odin/sif This commit was SVN r18255. The following SVN revision numbers were found above: r18169 --> open-mpi/ompi@20473bfda2 r18170 --> open-mpi/ompi@d34dfbe12c	2008-04-23 15:26:15 +00:00
Jeff Squyres	c40740947f	Fix minor spelling error. This commit was SVN r18229.	2008-04-22 13:11:50 +00:00
Galen Shipman	27c425b304	make portals level ack's optional (require ACK by default) This commit was SVN r18228.	2008-04-21 22:22:18 +00:00
Ralph Castain	fa082cafa9	Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex. Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer. This commit was SVN r18198.	2008-04-17 20:43:56 +00:00
Adrian Knoth	d34dfbe12c	fixed misleading comment. This commit was SVN r18170.	2008-04-16 11:26:15 +00:00
Adrian Knoth	20473bfda2	on incoming connections, compare with every possible source address. Rationale (taken from the code): /* This is PITA. We never know which source address an * incoming/outgoing packet will have, so even with * btl_tcp_if_include/exclude on the remote end, we * might get a different source address. * * If this address isn't included in btl_proc->proc_addrs, * we would erroneously drop the connection */ merge -r18165:18167 to the trunk. This commit was SVN r18169. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r18165 r18167	2008-04-16 11:24:09 +00:00
Adrian Knoth	e981a259bb	btl_tcp_disable_family=4 and btl_tcp_disable_family=6 are mutually exclusive, so this should result in "unreachable" when set differently between peers. This commit was SVN r18168.	2008-04-16 10:14:58 +00:00
Adrian Knoth	75c54616c7	renamed opal_sockaddr2str to opal_net_get_hostname for WANT_PEER_DUMP=1 This commit was SVN r18154.	2008-04-15 19:23:47 +00:00
Jeff Squyres	72af302360	Remove unused variable. This commit was SVN r18151.	2008-04-15 14:58:32 +00:00
Aurelien Bouteiller	0f311ed824	Make sure the function returns NULL when no elan adapter is available instead of a random value. This commit was SVN r18136.	2008-04-11 21:03:01 +00:00
Aurelien Bouteiller	20592cbcbf	Fixes a warning about mallocing 0 bytes when no elan adapter is available. This commit was SVN r18135.	2008-04-11 20:59:12 +00:00
Jon Mason	08ead87604	Potential double free of locks mca_btl_openib_endpoint_post_rr_nolock is freeing the endpoint lock on the error case, but most/all of the functions calling this free the lock regardless of its error case, thus resulting in a double free of the lock. This commit was SVN r18131.	2008-04-10 21:15:01 +00:00
Donald Kerr	38e298cc9a	report error message in all libs, not just debug This commit was SVN r18103.	2008-04-08 22:58:28 +00:00
Gleb Natapov	713a27dc71	Counter of created RDMA channels should be incremented immediately after channel creation (not in control message completion) otherwise more than max_eager_rdma channel may be created. This commit was SVN r18082.	2008-04-06 13:48:45 +00:00
Jeff Squyres	7072a32703	* Properly protect XRC stuff * A few minor style fixes This commit was SVN r18076.	2008-04-02 19:52:03 +00:00
George Bosilca	944453c4c1	Cleanups. This commit was SVN r18068.	2008-04-02 06:37:42 +00:00
Jeff Squyres	d0f12f3df0	Make a better error message. This commit was SVN r18014.	2008-03-29 12:54:24 +00:00
George Bosilca	be4b153f0d	Another patch for thread safety in the TCP BTL (thanks to Pierre). This commit was SVN r17993.	2008-03-27 18:36:08 +00:00
Jeff Squyres	5320c91ab3	Oops -- fix the constructor to also use opal_object_t instead of opal_list_item_t. This commit was SVN r17945.	2008-03-25 11:59:50 +00:00
Jeff Squyres	ebfdd133f5	AFAICT, we never put endpoints on a list. This commit was SVN r17940.	2008-03-24 18:32:55 +00:00
Ralph Castain	dc7f45dafd	Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure. Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code. This commit was SVN r17926.	2008-03-23 23:10:15 +00:00
Galen Shipman	dcac824f59	Fix problem in releasing fragments during GET_END event (didn't check that portals btl has ownership and therefore didn't free the frag as it should); this causes leakage and hangs in MPI_Finalize. Also added a bit more debugging. This commit was SVN r17900.	2008-03-20 22:46:32 +00:00
George Bosilca	1d04ec4ded	Correct the connection logic for TCP. Now we have not only a cleaner connection, but a more thread safe one. Thanks to Pierre for his help on this. This commit was SVN r17853.	2008-03-18 02:42:16 +00:00
Gleb Natapov	9b6db25182	Fix compilation warning. This commit was SVN r17839.	2008-03-17 13:37:57 +00:00
Pavel Shamis	54ad8d7446	The issue was reported/fixed by Jon Mason one month ago but the fix was not committed. So I'm committing it now. This commit was SVN r17835.	2008-03-17 11:13:06 +00:00
Brad Penoff	be13b86fc5	Clarifying and fixing SCTP btl_sctp_if_11 parameter This commit was SVN r17834.	2008-03-17 09:18:31 +00:00
Gleb Natapov	f488b94899	More SM BTL initialization cleanups. This commit was SVN r17833.	2008-03-16 10:01:56 +00:00
Jeff Squyres	6c77c995c2	Add missing dependencies in the static build case. This commit was SVN r17825.	2008-03-15 12:11:36 +00:00
George Bosilca	5e229fe688	Thanks Ma for the patch. Correct the multi-rail support and rename some fields to something more clear. This commit was SVN r17824.	2008-03-14 19:17:28 +00:00
George Bosilca	ecebd5ae77	Update the Elan BTL to take in account multiple networks, and correctly deal with the node position in the network. This commit was SVN r17822.	2008-03-14 17:32:35 +00:00
Gleb Natapov	772772b944	Remove unneeded include. This commit was SVN r17813.	2008-03-12 10:01:20 +00:00
Gleb Natapov	90c70e37b9	Clean up SM btl startup code. Remove no longer needed code leftovers from two BTL times. Remove old and no longer correct comment. This commit was SVN r17805.	2008-03-11 14:39:10 +00:00
Gleb Natapov	ffa09c44fd	Pass correct pointer to mpool_base function. This commit was SVN r17795.	2008-03-09 13:22:12 +00:00
Gleb Natapov	b0b21c68b4	Remove trailing spaces from SM BTL. This commit was SVN r17794.	2008-03-09 13:17:13 +00:00
Tim Prins	5de3e1965e	Remove the orte_proc_table. Migrate all users of it to the opal_hash_table and a new name hash function in orte. Everything should work, however I am unable to compile and test the sctp BTL. This commit was SVN r17751.	2008-03-05 22:44:35 +00:00
Donald Kerr	ef8f807c1c	was not passing correct variable to dat_strerror This commit was SVN r17749.	2008-03-05 21:45:16 +00:00
Jeff Squyres	ea5c0cb4a2	Now that the nightly tarball has safely been made, let's try this commit again. Remove the svn:ignore from problematic directories and try a merge from /tmp-public/plpa-merge-area2. This commit was SVN r17718.	2008-03-05 02:45:15 +00:00
Jeff Squyres	8189fcc7d5	Back out r17702; it went very badly. This commit was SVN r17704. The following SVN revision numbers were found above: r17702 --> open-mpi/ompi@3df754ebd7	2008-03-05 00:42:39 +00:00
Jeff Squyres	3df754ebd7	Bring over PLPA v1.1 from /tmp-public/plpa-v1.1 branch. This commit was SVN r17702.	2008-03-05 00:16:49 +00:00
Christian Bell	c3d0a81cd3	Add new QLogic adapters to hca-params.init This commit was SVN r17699.	2008-03-04 22:14:27 +00:00
Gleb Natapov	08abafdaa1	Initialize ib_pd to NULL. This commit was SVN r17674.	2008-03-02 09:11:23 +00:00
Tim Prins	84b2099fe8	Remove the now-unused orte_value_array. As this is the last 'class' split between orte and ompi, remove the big comment about the split in ompi_bitmap. Also, update some properties (source files should not be executable...), and remove a couple unneeded inclusions of orte_proc_table.h This commit was SVN r17655.	2008-02-28 21:39:42 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Galen Shipman	44003a41f2	Update common_portals to allow using portals interconnect with a modex rather than relying on cnos to get the nid/pid map. This commit was SVN r17588.	2008-02-25 19:17:21 +00:00
Brian Barrett	bc8d863ce3	* Make Portals BTL compile again (looks like the frag ownership stuff didn't get copied well) * Clean up a bunch of warnings This commit was SVN r17562.	2008-02-23 01:45:36 +00:00
Donald Kerr	437e280829	removing a few superfluous casts when the base or super is available This commit was SVN r17554.	2008-02-22 20:10:55 +00:00
Donald Kerr	fe51084d8e	fix compile warning by casting btl udapl module to base module before call to mca_btl_udapl_free This commit was SVN r17541.	2008-02-21 16:19:06 +00:00
Pierre Lemarinier	2a99f89631	Modification of the mutex lock order to prevent races during connection stage. This commit was SVN r17535.	2008-02-20 18:17:58 +00:00
Pavel Shamis	a0d12a9c92	Adding support for APM over different ports This commit was SVN r17521.	2008-02-20 13:44:05 +00:00
Gleb Natapov	60c151608c	Set flags inside fragment allocation function. This commit was SVN r17508.	2008-02-19 12:26:45 +00:00
Nysal Jan	479f36adfc	Fix a SEGV on ppc64. size_t is 8 bytes on a 64-bit build This commit was SVN r17507.	2008-02-19 11:01:21 +00:00
Jeff Squyres	f22f62ef1f	Fix typos. This commit was SVN r17502.	2008-02-18 21:26:21 +00:00
Jeff Squyres	33a4aff18e	Make openib btl a bit more resilient in the face of driver errors -- return OMPI_ERR_UNREACH if the port returns an invalid speed or width. OMPI_ERR_VALUE_OUT_OF_BOUNDS is reserved for when we exceed the number of allowable BTLs. This commit was SVN r17500.	2008-02-18 20:28:06 +00:00
George Bosilca	7a21d77b29	Remove some compilation warnings. This commit was SVN r17498.	2008-02-18 18:55:32 +00:00
George Bosilca	fa31ec81d0	Add the ownership flags to the PML/BTL interface. The layer owning the descriptor is responsible for releasing it once the descriptor is not in use anymore. This commit was SVN r17497.	2008-02-18 17:39:30 +00:00
George Bosilca	be2579467a	With the new ompi_free_list this is not needed anymore. This commit was SVN r17465.	2008-02-15 03:22:16 +00:00
Donald Kerr	58bf7f5a1d	add uintptr_t to prevent the possibility of a sign extension occurring This commit was SVN r17456.	2008-02-14 19:16:34 +00:00
Jeff Squyres	6420db7088	Add missing header file that caused compilation errors in the rhc-step2b branch last night. This commit was SVN r17453.	2008-02-14 14:10:27 +00:00
George Bosilca	255cd2186b	Improve the performance of the MX BTL. Correct the fake PUT protocol. This commit was SVN r17452.	2008-02-14 04:38:55 +00:00
Adrian Knoth	f1648f08df	Advanced address selection code from Thomas Peiselt. Re #1207 , #1027 This commit was SVN r17450.	2008-02-13 21:53:00 +00:00
Sharon Melamed	5b2dab2439	Reverted commit # r17443 This commit was SVN r17446. The following SVN revision numbers were found above: r17443 --> open-mpi/ompi@88ce5a2b73	2008-02-13 14:07:12 +00:00
Sharon Melamed	88ce5a2b73	Replaced PLPA to the latest PLPA (plpa-1.1a3r123) This commit was SVN r17443.	2008-02-13 13:09:11 +00:00
Rainer Keller	7621800477	- Fix and add comments -- output full name for pd - Protect argument in macro... This commit was SVN r17434.	2008-02-12 16:59:59 +00:00
Gleb Natapov	cf801edfe5	Use carto topology framework to choose which HCAs to use. This commit was SVN r17414.	2008-02-11 10:34:11 +00:00
George Bosilca	ee321748a6	The lost space. This commit was SVN r17413.	2008-02-10 22:08:49 +00:00
Pavel Shamis	df787bbeab	Fixing compilation issue on machines with ofed under 1.3. Also a fix in the apm migration flow. This commit was SVN r17383.	2008-02-06 13:54:58 +00:00
Pavel Shamis	3ba3f70624	Adding apm support for xrc. This commit was SVN r17382.	2008-02-06 10:19:51 +00:00
Gleb Natapov	03c80bdfe3	Fix old libibverbs case. This commit was SVN r17370.	2008-02-04 14:05:01 +00:00
Pavel Shamis	f0c478e7e0	XRC - replacing the new old API with new one. This commit was SVN r17369.	2008-02-04 14:03:38 +00:00
Gleb Natapov	67f752dd50	Add compatibility function between old libibverbs and current libibverbs way of detecting HCAs. This commit was SVN r17365.	2008-02-03 15:16:24 +00:00
George Bosilca	3a6d2e3894	The latest and greatest Elan improvements. This commit was SVN r17361.	2008-02-01 21:29:57 +00:00
Gleb Natapov	f73adf69c0	Fix compiler warnings on 32bit systems. This commit was SVN r17346.	2008-01-31 09:05:25 +00:00
Adrian Knoth	8ae4a10b4c	Reverted r17331, r17332. Still broken. I'm in a bad hurry. :-( Re #1206 This commit was SVN r17333. The following SVN revision numbers were found above: r17331 --> open-mpi/ompi@3846e2a797 r17332 --> open-mpi/ompi@c03de08c55	2008-01-30 16:51:55 +00:00
Adrian Knoth	c03de08c55	Logic is wrong. I'm going to revert it again. Re #1206 This commit was SVN r17332.	2008-01-30 16:48:50 +00:00
Adrian Knoth	3846e2a797	When checking incoming connections, also care about aliased interfaces. Re #1206 This commit was SVN r17331.	2008-01-30 16:45:41 +00:00
Adrian Knoth	7f79c68930	Reverted r17307 and r17308. It broke parallel TCP connections. Re #1206 This commit was SVN r17329. The following SVN revision numbers were found above: r17307 --> open-mpi/ompi@7a59b3f58c r17308 --> open-mpi/ompi@72b29bc21f	2008-01-30 14:31:47 +00:00
Adrian Knoth	72b29bc21f	Cosmetic patch. Use IN6_ARE_ADDR_EQUAL instead of memcmp(). Re #1206 . This commit was SVN r17308.	2008-01-29 16:02:24 +00:00
Adrian Knoth	7a59b3f58c	accept incoming connections from hosts with multiple addresses. We loop over all peer addresses and accept when one of them matches. Note that this might break functionality: mca_btl_tcp_proc_insert now always inserts the same endpoint. (is the lack of endpoints the problem? should there be one for every remote address?) Re #1206 This commit was SVN r17307.	2008-01-29 15:55:56 +00:00
Pavel Shamis	7b59f8ae0b	Fixing warning in apm code. This commit was SVN r17306.	2008-01-29 15:45:18 +00:00
Gleb Natapov	bb03e07ec4	Move eager RDMA channels accounting into completion callback. Otherwise it can go wrong with XRC as endpoint may be not yet connected at the time eager rdma channel is created. This commit was SVN r17302.	2008-01-29 14:35:33 +00:00
Pavel Shamis	92ef832472	Making sure that XRC will not overrun ib_dev_attr.max_qp_wr This commit was SVN r17300.	2008-01-29 13:15:21 +00:00
Pavel Shamis	7d83f34eb0	Protecting the apm code with OMPI_HAVE_THREADS. This commit was SVN r17284.	2008-01-28 16:10:18 +00:00
Jeff Squyres	6a49c97368	Remove erroneous #if This commit was SVN r17282.	2008-01-28 14:38:03 +00:00
Pavel Shamis	28a3917306	Adding APM support (over different lids). This commit was SVN r17280.	2008-01-28 10:38:08 +00:00
George Bosilca	3418485085	Replace the tport by a queue. This commit was SVN r17221.	2008-01-25 01:15:18 +00:00
Donald Kerr	66acac8ff3	the value for invalid idx was just plain wrong, a more appropriate value is now used This commit was SVN r17201.	2008-01-24 15:01:26 +00:00
Jeff Squyres	2227d5ec4a	Add configure check for struct ibv_device.transport type, which was added in OFED v1.2. Still need to fix up oob and rdma_cm cpc's to do something better with this information... This commit was SVN r17198.	2008-01-24 12:14:21 +00:00
Gleb Natapov	52c94fa7ea	Fix compilation warnings. This commit was SVN r17169.	2008-01-21 15:07:39 +00:00
Gleb Natapov	c9a1b06771	Remove trailing whitespaces. No code changes in this commit. This commit was SVN r17167.	2008-01-21 12:11:18 +00:00
George Bosilca	170416797d	This commit was SVN r17162.	2008-01-18 20:10:57 +00:00
George Bosilca	0081202195	Mark the receives as ELAN_TPORT_RXBUF | ELAN_TPORT_RXANY ... This commit was SVN r17161.	2008-01-18 20:00:44 +00:00
George Bosilca	bf299bb833	Keep most of the functions as static. Improve the progress function. Get rid of all internal queues that are not really useful. This commit was SVN r17160.	2008-01-18 19:28:50 +00:00
Donald Kerr	5f884b1ca4	fix for #1130 - adds support for multi-rail configurations This commit was SVN r17152.	2008-01-17 17:30:50 +00:00
Donald Kerr	908b514ac5	update use of internal tag values to accommodate the active message change found in r17140 This commit was SVN r17148. The following SVN revision numbers were found above: r17140 --> open-mpi/ompi@6310ce955c	2008-01-16 21:17:25 +00:00
Pavel Shamis	add4d9df8a	XRC fixes for MPI2 dynamics. This commit was SVN r17144.	2008-01-15 21:14:48 +00:00
Jeff Squyres	251842ff6a	Remove this AS_IF -- it breaks "make dist". This commit was SVN r17143.	2008-01-15 12:33:08 +00:00
George Bosilca	e8ac5ff04d	Typos. This commit was SVN r17141.	2008-01-15 05:37:42 +00:00
George Bosilca	6310ce955c	The first patch related to the Active Message stuff. So far, here is what we have: - the registration array is now global instead of one per BTL. - each framework has to declare the entries it reserves in the registration array. Then it has to define the internal way of sharing (or not) these entries between all components. As an example, the PML will not share as there is only one active PML at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3 are reserved for the framework while the remaining 5 are used internally by each framework. - The registration function is optional. If a BTL does not provide such a function, nothing happens. However, in the case where such a function is provided in the BTL structure, it will be called by the BML when a tag is registered. Now, it's time for the second step... Converting OB1 from a switch based PML to an active message one. This commit was SVN r17140.	2008-01-15 05:32:53 +00:00
Jon Mason	a0d4122606	The new cpc selection framework is now in place. The patch below allows for dynamic selection of cpc methods based on what is available. It also allows for inclusion/exclusions of methods. It even further allows for modifying the priorities of certain cpc methods to better determine the optimal cpc method. This patch also contains XRC compile time disablement (per Jeff's patch). At a high level, the cpc selection works by walking through each cpc and allowing it to test to see if it is permissible to run on this mpirun. It returns a priority if it is permissible or a -1 if not. All of the cpc names and priorities are rolled into a string. This string is then encapsulated in a message and passed around all the ompi processes. Once received and unpacked, the list received is compared to a local copy of the list. The connection method is chosen by comparing the lists passed around to all nodes via modex with the list generated locally. Any non-negative number is a potentially valid connection method. The method below of determining the optimal connection method is to take the cross-section of the two lists. The highest single value (and the other side being non-negative) is selected as the cpc method. svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ . This commit was SVN r17138.	2008-01-14 23:22:03 +00:00
Pavel Shamis	6e50fca2dd	Fixing permissions for XRC domain file. This commit was SVN r17127.	2008-01-13 19:23:11 +00:00
Jon Mason	626e0814a2	Style clean-up This commit was SVN r17126.	2008-01-12 18:47:17 +00:00
Jon Mason	3970c3ff6c	Add Chelsio T3 to ompi/mca/btl/openib/mca-btl-openib-hca-params.ini This commit was SVN r17101.	2008-01-09 22:14:18 +00:00
Jon Mason	597c7e68f1	Minor cleanups This commit was SVN r17100.	2008-01-09 21:54:11 +00:00
Rolf vandeVaart	870fa8b1f1	Pad the sm btl header to double-word alignment. Preserves PML header as double-word aligned and prevents bus errors on SPARC based servers. This is part of fix for #1148. Refs trac:1148 This commit was SVN r17090. The following Trac tickets were found above: Ticket 1148 --> https://svn.open-mpi.org/trac/ompi/ticket/1148	2008-01-09 18:50:51 +00:00
Gleb Natapov	25ce70bb92	Call mca_btl_openib_endpoint_post_send() holding endpoint lock and not holding qp lock since this is what the function assumes. This commit was SVN r17086.	2008-01-09 14:46:41 +00:00
Pavel Shamis	99f51482e3	Fixing openib finalization flow. This commit was SVN r17085.	2008-01-09 12:36:30 +00:00
Gleb Natapov	51d6ca0cb6	Provide no lock version of mca_btl_openib_endpoint_post_rr(). On connection creation we call it with endpoint lock already held. This commit was SVN r17084.	2008-01-09 10:39:35 +00:00
Gleb Natapov	50af6b9e78	Rearrange functions order so that functions are defined before they are used. No code changes here. This commit was SVN r17083.	2008-01-09 10:27:15 +00:00
Gleb Natapov	621fa223c5	Create free lists of fragments per HCA, not per BTL. Saves memory in case of multiple LMCs. This commit was SVN r17082.	2008-01-09 10:26:21 +00:00
Gleb Natapov	5ce3213158	Rearrange functions order so that functions are defined before they are used. No code changes here. This commit was SVN r17081.	2008-01-09 10:05:41 +00:00
Pavel Shamis	fbf7bcd9a9	We need to prepost on srq/xrc before reply with ENDPOINT_XOOB_CONNECT_XRC_RESPONSE. This commit was SVN r17066.	2008-01-08 10:30:16 +00:00
Rolf vandeVaart	0f0fde3490	Partial fix for #1148 . Enable this for 32-bit sparc as well as 64-bit sparc. This commit was SVN r17059.	2008-01-07 15:43:44 +00:00
Gleb Natapov	c3bbf69356	Set send_flags correctly in btl_openib_put. Otherwise we may reuse flags from previous use of the buffer and they may be incorrect. This commit was SVN r17058.	2008-01-07 10:19:07 +00:00
George Bosilca	48f5a26e8c	Cast to keep VC happy (quiet). This commit was SVN r17054.	2008-01-04 23:13:32 +00:00
Jeff Squyres	a234ba198a	Remove superfluous / unused -D from Makefile.am. This commit was SVN r17030.	2008-01-02 18:00:20 +00:00
Jeff Squyres	c9bea80f8f	Fix unbalanced parentheses noticed by Paul Hargrove. This commit was SVN r17029.	2008-01-02 13:34:07 +00:00
Gleb Natapov	2fb6947f88	Destroy endpoints that use eager rdma communication before destroying SRQ. Don't skip async event thread destruction if SRQ was not destroyed, or it will segfault on module removal. This commit was SVN r17025.	2007-12-23 13:58:31 +00:00
Gleb Natapov	b06d92bdab	OpenIB BTL has three channels through which data can be received (eager rdma, high prio QPs and low prio QPs) and because not all of them are polled each time progress() is called (to save on latency) starvation is possible. The commit fixes this. Now each channel is polled, but higher priority channels are polled more often. Three new parameters are introduced that control polling ratios between different channels. This commit was SVN r17024.	2007-12-23 12:29:34 +00:00
Brad Penoff	4c2571b54c	fixed more 64 bit SCTP BTL warnings This commit was SVN r17022.	2007-12-21 21:50:00 +00:00
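
The cpc selection rule described in commit a0d4122606 (SVN r17138) can be sketched in C as follows. This is only an illustration of the rule as the commit message states it, not the code from the openib BTL: the type cpc_entry_t and the functions cpc_priority_of and cpc_select are hypothetical names, and treating a method that is missing from the peer's list as unavailable is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one cpc name and the priority it reported.
 * A priority of -1 means the method cannot run in this process. */
typedef struct {
    const char *name;
    int priority;
} cpc_entry_t;

/* Look up a method by name; returns -1 if the list does not contain it
 * (assumption: an absent method counts as unavailable). */
static int cpc_priority_of(const cpc_entry_t *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; ++i) {
        if (0 == strcmp(list[i].name, name)) {
            return list[i].priority;
        }
    }
    return -1;
}

/* Take the cross-section of the local list and the list received from the
 * peer, and return the method with the highest single priority on either
 * side, provided both sides report a non-negative value.
 * Returns NULL when the two processes share no usable method. */
static const char *cpc_select(const cpc_entry_t *local, size_t nlocal,
                              const cpc_entry_t *remote, size_t nremote)
{
    const char *best = NULL;
    int best_prio = -1;

    assert(NULL != local && NULL != remote);

    for (size_t i = 0; i < nlocal; ++i) {
        int lp = local[i].priority;
        int rp = cpc_priority_of(remote, nremote, local[i].name);
        if (lp < 0 || rp < 0) {
            continue;               /* not usable on both sides */
        }
        int top = (lp > rp) ? lp : rp;
        if (top > best_prio) {
            best_prio = top;
            best = local[i].name;
        }
    }
    return best;
}
```

With a local list of oob=50, xoob=-1, rdmacm=30 and a peer list of oob=10, xoob=90, rdmacm=60, the sketch picks rdmacm: xoob is excluded because the local side reported -1, and rdmacm's highest single value (60) beats oob's (50).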


1152 Commits