openmpi

Author	SHA1	Message	Date
Pavel Shamis	379e00050c	Fixing openib btl finalize flow. Bug fix for #1286 . This commit was SVN r18590.	2008-06-05 12:20:13 +00:00
Jeff Squyres	91a281080a	Fix a compiler warning for a case that would never really happen anyway. Rename a variable to be a bit more descriptive. This commit was SVN r18585.	2008-06-04 19:10:23 +00:00
Jeff Squyres	bc584dedd6	Remove a compiler warning that would never happen in practice. This commit was SVN r18584.	2008-06-04 19:03:02 +00:00
Jeff Squyres	6e37dd0ef0	Fix some 32/64 printf errors once and for all This commit was SVN r18582.	2008-06-04 14:39:37 +00:00
Pavel Shamis	0a8321e08d	Calls to APM functions should be protected with OMPI_HAVE_THREADS. This commit was SVN r18581.	2008-06-04 14:27:41 +00:00
Jeff Squyres	5e918ad25d	Add first cut of NetXen iWARP NIC definition. May still be refined with more experimentation. This commit was SVN r18580.	2008-06-04 12:11:45 +00:00
Pavel Shamis	c73ed2b256	Updating cpc name from xrc to xoob. This commit was SVN r18571.	2008-06-04 08:50:30 +00:00
Ralph Castain	c992e99035	Remove the tags from orte_output_open and the filtering operation from orte_output - this will be handled differently to improve the XML output interface This commit was SVN r18557.	2008-06-03 14:24:01 +00:00
Jeff Squyres	69d78c6739	Fixes trac:1215: adds specific show_help messages about PP vs. SRQ/XRC RNR retry exceeded errors. This commit was SVN r18554. The following Trac tickets were found above: Ticket 1215 --> https://svn.open-mpi.org/trac/ompi/ticket/1215	2008-06-02 11:03:48 +00:00
Jeff Squyres	8c267d50a3	Fixes trac:1121. We already show_help when we fail to create queues, so I just made the message a little more verbose such that it may be that OMPI is trying to use a feature that is not supported on the hardware. This commit was SVN r18553. The following Trac tickets were found above: Ticket 1121 --> https://svn.open-mpi.org/trac/ompi/ticket/1121	2008-05-30 19:03:58 +00:00
George Bosilca	e361bcb64c	Send optimizations. 1. The send path get shorter. The BTL is allowed to return > 0 to specify that the descriptor was pushed to the networks, and that the memory attached to it is available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag can be used by the PML to force the BTL to always trigger the callback. Unmodified BTL will continue to work as expected, as they will return OMPI_SUCCESS which force the PML to have exactly the same behavior as before. Some BTLs have been modified: self, sm, tcp, mx. 2. Add send immediate interface to BTL. The idea is to have a mechanism of allowing the BTL to take advantage of send optimizations such as the ability to deliver data "inline". Some network APIs such as Portals allow data to be sent using a "thin" event without packing data into a memory descriptor. This interface change allows the BTL to use such capabilities and allows for other optimizations in the future. All existing BTLs except for Portals and sm have this interface set to NULL. This commit was SVN r18551.	2008-05-30 03:58:39 +00:00
Jeff Squyres	728ee47be4	Just check for the presence of $sysfsdir/class/infiniband and check that it's a directory. That's good enough to know that the OpenFabrics kernel drivers have been loaded. If you have no RDMA devices and don't want to see the OMPI warning about not finding any devices, then don't start the OpenFabrics kernel drivers. This commit was SVN r18540.	2008-05-29 14:19:51 +00:00
Nysal Jan	25ac3629e9	eHCA does not have SRQ. Adding receive_queues value so that it works out of the box This commit was SVN r18537.	2008-05-29 13:55:39 +00:00
Jeff Squyres	d5bf8fe005	Remove unused variables. This commit was SVN r18532.	2008-05-29 11:58:16 +00:00
Jeff Squyres	e5ea9d08ca	Fixes trac:1305: check to see if $sysfsdir/class/infiniband exists and is non-empty. If not, then exit the openib btl silently. This addresses the case where libibverbs is installed (which is getting more common) and therefore the openib BTL was built/installed, but the kernel drivers are not loaded (assumedly because there is no RDMA hardware present). In this case, "mpirun a.out" will not issue a warning. There appears to be no good way to definitely tell if there are no RDMA hardware devices present. For example, if libibverbs/the openib BTL is installed, there are no RDMA devices present, but the RDMA hardware kernel drivers ''are'' loaded, OMPI will warn that it was unable to find suitable devices. This warning is easily eliminated by unloading the kernel drivers. This commit was SVN r18530. The following Trac tickets were found above: Ticket 1305 --> https://svn.open-mpi.org/trac/ompi/ticket/1305	2008-05-28 22:05:47 +00:00
Pavel Shamis	28c763f751	Fixing the error flow when somebody tries to use XRC without XOOB. This commit was SVN r18527.	2008-05-28 15:56:04 +00:00
Pavel Shamis	2c81b0ab9a	Fixing compilation warning in btl_openib_connect_ibcm.c This commit was SVN r18526.	2008-05-28 15:20:48 +00:00
Pavel Shamis	879a9fe45c	setup_qps() may exit with error. This commit was SVN r18523.	2008-05-28 11:36:38 +00:00
Pavel Shamis	e657a03143	Fixing broken XRC initialization flow. This commit was SVN r18522.	2008-05-28 11:31:38 +00:00
Pavel Shamis	6596d19c90	Adding new ConnectX vendor_part_id. Fix for ticket #1310 . This commit was SVN r18495.	2008-05-26 12:25:49 +00:00
Jeff Squyres	e1f118d0e6	Remove unused variable This commit was SVN r18491.	2008-05-24 13:05:04 +00:00
Jeff Squyres	1b50e5f6a5	Use the right variable in the output This commit was SVN r18487.	2008-05-23 13:11:12 +00:00
Jeff Squyres	8faeeab81a	Style cleanup only: s/struct foo/foo_t/g to conform to rest of code base This commit was SVN r18483.	2008-05-22 19:26:00 +00:00
Jeff Squyres	1f7f0e1f96	Fixes trac:1281 * s/port/tcp_port/g where relevant to disambiguate TCP port from device port * Rework ipaddrcheck to make it work in the LMC>0 case This commit was SVN r18482. The following Trac tickets were found above: Ticket 1281 --> https://svn.open-mpi.org/trac/ompi/ticket/1281	2008-05-22 19:18:15 +00:00
Jon Mason	d0e26b1cf6	Add pretty comments for _iwarp. This commit was SVN r18478.	2008-05-22 18:02:20 +00:00
Jeff Squyres	62ac6533e0	* Add proper copyrights * Ensure _iwarp.h is always included, or you'll get warnings on platforms that don't have the RDMACM * Add skeleton for function descriptions in comments in iwarp.h This commit was SVN r18477.	2008-05-22 17:41:43 +00:00
Jeff Squyres	28b56c389a	Only check if the opal_ifindex is >= 0 (opal_ifbegin() and opal_ifnext() return -1 upon completion); don't check it against opal_ifcount() -- the interface indexes aren't necessarily related to how many interfaces were found. This commit was SVN r18476.	2008-05-22 02:10:23 +00:00
Jeff Squyres	27978b29f8	Fixes trac:1302: ensure to also use the LID for identifying an incoming IBCM request (not just the port number). This commit was SVN r18475. The following Trac tickets were found above: Ticket 1302 --> https://svn.open-mpi.org/trac/ompi/ticket/1302	2008-05-22 01:28:34 +00:00
George Bosilca	df2156568d	The Elan BTL is now thread safe, and can be built in all conditions. This commit was SVN r18471.	2008-05-21 20:44:37 +00:00
Pak Lui	1585789e8b	Fix the undeclared variable. This commit was SVN r18470.	2008-05-21 04:09:54 +00:00
Jon Mason	b9c25efbd2	Modify to comply with the "prefix rule" and remove "static inline" for the non-rdmacm enabled case. This should fix Ticket #1294. This commit was SVN r18468.	2008-05-20 23:28:59 +00:00
Jeff Squyres	64f61ebd07	Fixes trac:1285. Really. This commit has the same commit message as r18450, but without the extra bonus memory corruption that was introduced. This commit was SVN r18467. The following SVN revision numbers were found above: r18450 --> open-mpi/ompi@5295902ebe The following Trac tickets were found above: Ticket 1285 --> https://svn.open-mpi.org/trac/ompi/ticket/1285	2008-05-20 21:53:42 +00:00
Jeff Squyres	01a7f7eeb6	Switch orte_output* -> OPAL_OUTPUT* for two reasons: 1. We can't use orte_output in the CPC service thread because orte is not thread safe 1. Use the macro version so that they're compiled out of production builds This commit was SVN r18455.	2008-05-19 17:42:51 +00:00
Jeff Squyres	76fc8dd188	Revert r18450 -- there is some memory badness in there somewhere... This commit was SVN r18451. The following SVN revision numbers were found above: r18450 --> open-mpi/ompi@5295902ebe	2008-05-18 19:11:45 +00:00
Jeff Squyres	5295902ebe	Fixes trac:1285: * allow receive_queues to be specified in the INI file * detect when multiple different receive_queues are specified and gracefully abort However, accomplishing these goals ran into multiple difficulties. By putting receive_queues in the INI file: 1. we may not find the value until we've already traversed multiple HCAs 1. we may find multiple different receive_queues values But since the openib btl initializes as it discovers each HCA/port/LID (including the BSRQ data), if we find a new receive_queues value late in the discovery process, then all the BSRQ data that was previously initialized will likely be invalid. So I had to pull all the BSRQ initialization out until after the rest of the discovery / initialization process. Additionally, note that if the user specifies the MCA parameter btl_openib_receive_queues, it trumps whatever was in the INI file. So in this case, there can never be a receive_queues conflict. This commit does the following (Jon wrote part of this, too): * adapt _ini.c to accept the "receive_queues" field in the file * move 90% of _setup_qps() from _ini.c to _component.c * move what was left of _setup_qps() into the main _register_mca_params() function * adapt init_one_hca() to detect conflicting receive_queues values from the INI file * after the _component.c loop calling init_one_hca(): * call setup_qps() to parse the final receive_queues string value * traverse all resulting btls and initialize their HCAs (if they weren't already): setup some lists and call prepare_hca_for_use() I tested this code on a dual-HCA system where I artificially put in differing receive_queues values in the INI file for the two different types of HCAs that I have and it all seemed to work. This commit was SVN r18450. The following Trac tickets were found above: Ticket 1285 --> https://svn.open-mpi.org/trac/ompi/ticket/1285	2008-05-18 18:50:56 +00:00
Jeff Squyres	caacaadb0a	Minor shuffling of code: no need to query the GID in the iWARP case. This commit was SVN r18446.	2008-05-16 03:36:48 +00:00
Jeff Squyres	9f1b5237fe	Ensure to return an error rather than continue This commit was SVN r18445.	2008-05-16 03:36:11 +00:00
Jeff Squyres	6546898f09	Minor style cleanups; nothing very important in this commit. This commit was SVN r18444.	2008-05-16 03:28:20 +00:00
Jeff Squyres	5c91f53848	Fix a minor memory leak This commit was SVN r18443.	2008-05-16 03:27:42 +00:00
Jeff Squyres	671f0c379d	Remove a whole pile of orte/util/show_help.h's that I missed. :-( This commit was SVN r18437.	2008-05-14 11:32:33 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already make the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing to messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Jon Mason	125eb5a2ed	Convert from the Linux ifaddrs to the OMPI ifaddrs, which should unbreak Solaris. This commit was SVN r18433.	2008-05-13 18:34:22 +00:00
Jeff Squyres	d8e5608053	Remove all retransmission code; the IBCM kernel module handles all of that for us. This commit was SVN r18432.	2008-05-13 16:10:34 +00:00
Jon Mason	74bf1ae25f	Fix compiler warnings This commit was SVN r18431.	2008-05-13 16:01:58 +00:00
Jon Mason	4ead9442b5	Add in IDs for all Chelsio iWARP capable adapters This commit was SVN r18428.	2008-05-12 21:59:03 +00:00
Jeff Squyres	6b26895ad4	A little style update -- constants on the left... This commit was SVN r18426.	2008-05-12 12:05:16 +00:00
Jeff Squyres	16cde0e5fa	Fix compile error on older OFED systems This commit was SVN r18425.	2008-05-12 11:56:14 +00:00
Gleb Natapov	6844ff32ba	Return OMPI_ERR_RESOURCE_BUSY from sm->btl_send() function if there is no place in cb. This will prevent OB1 from doing early completion of small sends. This commit was SVN r18424.	2008-05-12 07:15:29 +00:00
Gleb Natapov	0827e537fa	Don't include rdma/rdma_cma.h if !OMPI_HAVE_RDMACM. This commit was SVN r18422.	2008-05-11 11:58:02 +00:00
Jon Mason	99ab66e131	RDMACM code cleanup This patch adds some much needed comments, reduces the amount of code wrapping, and rearranges and removes redundant code. This commit was SVN r18417.	2008-05-08 21:20:12 +00:00


1091 Commits