openmpi

Author	SHA1	Message	Date
Ralph Castain	828ae26d90	ORTE-level MCA params are defined in several places. Ompi_info cannot call orte_init due to an issue with the memory allocator, thus making it impossible for ompi_info to display all of the ORTE-level MCA params. By consolidating them all into one function, ompi_info can call that function and register the desired variables. This also requires, however, that ompi_info call orte_output_init to avoid generating tons of error messages, so make that adjustment too. Fixes ticket #1314 In addition, orte_output has a race condition issue whereby calls to orte_output/verbose can occur prior to either the RML being defined/setup, or the HNP being defined. This latter occurs during the initialization of the orte_process_info structure. In both cases, there is no way orte_output can send the output to the HNP. Hence, the message must be simply output locally. Fixes ticket #1315 This commit was SVN r18524.	2008-05-28 13:29:58 +00:00
Pavel Shamis	879a9fe45c	setup_qps() may exit with error. This commit was SVN r18523.	2008-05-28 11:36:38 +00:00
Pavel Shamis	e657a03143	Fixing broken XRC initialization flow. This commit was SVN r18522.	2008-05-28 11:31:38 +00:00
Rolf vandeVaart	18879285c7	Fix the selection logic to prevent memory leaks. More work may be done in the priority logic but for now we just fix the leaks and preserve current behavior. This commit fixes trac:1307. This commit was SVN r18504. The following Trac tickets were found above: Ticket 1307 --> https://svn.open-mpi.org/trac/ompi/ticket/1307	2008-05-27 14:16:39 +00:00
Pavel Shamis	6596d19c90	Adding new ConnectX vendor_part_id. Fix for ticket #1310 . This commit was SVN r18495.	2008-05-26 12:25:49 +00:00
Gleb Natapov	5fabade090	Use payload_buffer_alignment value for payload alignment. This commit was SVN r18493.	2008-05-26 08:29:02 +00:00
Jeff Squyres	e1f118d0e6	Remove unused variable This commit was SVN r18491.	2008-05-24 13:05:04 +00:00
Rolf vandeVaart	5baa733ad5	Fix another warning (using a variable before it was initialized.) Thanks Jeff for pointing this out. This commit was SVN r18489.	2008-05-23 13:57:55 +00:00
Rolf vandeVaart	0d8faf7559	Fix the fix for ticket #1298 . Thanks George for pointing it out. This commit was SVN r18488.	2008-05-23 13:33:38 +00:00
Jeff Squyres	1b50e5f6a5	Use the right variable in the output This commit was SVN r18487.	2008-05-23 13:11:12 +00:00
Rich Graham	b08839f9f5	change reduce-scatter/gather for non-power of 2. Spreading out the load for the non-power of 2 phase of the reduction. This commit was SVN r18486.	2008-05-22 21:42:42 +00:00
Rich Graham	f2a4b67809	automate the allreduce selection logic. This commit was SVN r18484.	2008-05-22 20:53:35 +00:00
Jeff Squyres	8faeeab81a	Style cleanup only: s/struct foo/foo_t/g to conform to rest of code base This commit was SVN r18483.	2008-05-22 19:26:00 +00:00
Jeff Squyres	1f7f0e1f96	Fixes trac:1281 * s/port/tcp_port/g where relevant to disambiguate TCP port from device port * Rework ipaddrcheck to make it work in the LMC>0 case This commit was SVN r18482. The following Trac tickets were found above: Ticket 1281 --> https://svn.open-mpi.org/trac/ompi/ticket/1281	2008-05-22 19:18:15 +00:00
Rolf vandeVaart	8c3b31b181	Need to properly handle zero-length scatters and gathers on intercommunicators. Add a check for the MPI_ROOT and MPI_PROC_NULL processes so they do not enter collective module when count=0. This commit was SVN r18481.	2008-05-22 19:09:43 +00:00
Rich Graham	5900415a25	for non-powers of 2, distribute the work on the first step among all the procs doing the work. This commit was SVN r18480.	2008-05-22 18:50:53 +00:00
Jon Mason	d0e26b1cf6	Add pretty comments for _iwarp. This commit was SVN r18478.	2008-05-22 18:02:20 +00:00
Jeff Squyres	62ac6533e0	* Add proper copyrights * Ensure _iwarp.h is always included, or you'll get warnings on platforms that don't have the RDMACM * Add skeleton for function descriptions in comments in iwarp.h This commit was SVN r18477.	2008-05-22 17:41:43 +00:00
Jeff Squyres	28b56c389a	Only check if the opal_ifindex is >= 0 (opal_ifbegin() and opal_ifnext() return -1 upon completion); don't check it against opal_ifcount() -- the interface indexes aren't necessarily related to how many interfaces were found. This commit was SVN r18476.	2008-05-22 02:10:23 +00:00
Jeff Squyres	27978b29f8	Fixes trac:1302: ensure to also use the LID for identifying an incoming IBCM request (not just the port number). This commit was SVN r18475. The following Trac tickets were found above: Ticket 1302 --> https://svn.open-mpi.org/trac/ompi/ticket/1302	2008-05-22 01:28:34 +00:00
George Bosilca	21b940887a	Tricky stuff !!! If we post a receive for ZERO bytes and we match it with something with a different size ... well we segfault. The reason was that the logic in the PML OB1 calls the convertor based on the length of the data on the wire and not the length of the data that the receiver expects. In other words, this is only half a patch :) It fixes the problem, but we still have to make sure the unpack is not called at all when the receiver expects ZERO bytes. This commit was SVN r18474.	2008-05-21 23:31:34 +00:00
George Bosilca	c31cc5b270	Remove a warning about line being unused. This commit was SVN r18472.	2008-05-21 20:46:22 +00:00
George Bosilca	df2156568d	The Elan BTL is now thread safe, and can be built in all conditions. This commit was SVN r18471.	2008-05-21 20:44:37 +00:00
Pak Lui	1585789e8b	Fix the undeclared variable. This commit was SVN r18470.	2008-05-21 04:09:54 +00:00
Rich Graham	afd71abde6	remove some useless qualifiers. This commit was SVN r18469.	2008-05-21 01:11:49 +00:00
Jon Mason	b9c25efbd2	Modify to comply with the "prefix rule" and remove "static inline" for the non-rdmacm enabled case. This should fix Ticket #1294. This commit was SVN r18468.	2008-05-20 23:28:59 +00:00
Jeff Squyres	64f61ebd07	Fixes trac:1285. Really. This commit has the same commit message as r18450, but without the extra bonus memory corruption that was introduced. This commit was SVN r18467. The following SVN revision numbers were found above: r18450 --> open-mpi/ompi@5295902ebe The following Trac tickets were found above: Ticket 1285 --> https://svn.open-mpi.org/trac/ompi/ticket/1285	2008-05-20 21:53:42 +00:00
Edgar Gabriel	0500420bec	Fixing a bug in the inter-communicator scatter operation, where we accidentally used rcount instead of scounts. This commit was SVN r18466.	2008-05-20 21:17:19 +00:00
Rolf vandeVaart	74d0259480	Add new implementation of barrier. This shows better performance on some clusters. However, no decision logic is changed by this commit so default behavior has not changed. This is only selectable by runtime parameters. This commit was SVN r18464.	2008-05-20 17:37:41 +00:00
Rolf vandeVaart	71091a19c3	Fix bug in spacing of code per https://svn.open-mpi.org/trac/ompi/wiki/CodingStyle . This commit was SVN r18463.	2008-05-20 14:11:10 +00:00
Jeff Squyres	a9e26c33e0	Ensure that we don't try to call orte_show_help() before orte_init() succeeds. This commit was SVN r18458.	2008-05-19 21:57:54 +00:00
Rolf vandeVaart	763f5259a8	Fix memory leak of 88 bytes that occurred on each call to MPI_Comm_dup. Need to release the items and the item list after selecting the collective modules that are being used. Reviewed by Jeff Squyres. This commit was SVN r18457.	2008-05-19 21:34:01 +00:00
Jeff Squyres	c8c01572d0	ompi_info was erroneously not showing all the paths that it supports (via compiled-in defaults/configure, or via env variables). This commit was SVN r18456.	2008-05-19 17:44:56 +00:00
Jeff Squyres	01a7f7eeb6	Switch orte_output* -> OPAL_OUTPUT* for two reasons: 1. We can't use orte_output in the CPC service thread because orte is not thread safe 2. Use the macro version so that they're compiled out of production builds This commit was SVN r18455.	2008-05-19 17:42:51 +00:00
Jeff Squyres	7154776465	Removed unused variable / compiler warning. This commit was SVN r18454.	2008-05-19 13:41:45 +00:00
Jeff Squyres	76fc8dd188	Revert r18450 -- there is some memory badness in there somewhere... This commit was SVN r18451. The following SVN revision numbers were found above: r18450 --> open-mpi/ompi@5295902ebe	2008-05-18 19:11:45 +00:00
Jeff Squyres	5295902ebe	Fixes trac:1285: * allow receive_queues to be specified in the INI file * detect when multiple different receive_queues are specified and gracefully abort However, accomplishing these goals ran into multiple difficulties. By putting receive_queues in the INI file: 1. we may not find the value until we've already traversed multiple HCAs 1. we may find multiple different receive_queues values But since the openib btl initializes as it discovers each HCA/port/LID (including the BSRQ data), if we find a new receive_queues value late in the discovery process, then all the BSRQ data that was previously initialized will likely be invalid. So I had to pull all the BSRQ initialization out until after the rest of the discovery / initialization process. Additionally, note that if the user specifies the MCA parameter btl_openib_receive_queues, it trumps whatever was in the INI file. So in this case, there can never be a receive_queues conflict. This commit does the following (Jon wrote part of this, too): * adapt _ini.c to accept the "receive_queues" field in the file * move 90% of _setup_qps() from _ini.c to _component.c * move what was left of _setup_qps() into the main _register_mca_params() function * adapt init_one_hca() to detect conflicting receive_queues values from the INI file * after the _component.c loop calling init_one_hca(): * call setup_qps() to parse the final receive_queues string value * traverse all resulting btls and initialize their HCAs (if they weren't already): setup some lists and call prepare_hca_for_use() I tested this code on a dual-HCA system where I artificially put in differing receive_queues values in the INI file for the two different types of HCAs that I have and it all seemed to work. This commit was SVN r18450. The following Trac tickets were found above: Ticket 1285 --> https://svn.open-mpi.org/trac/ompi/ticket/1285	2008-05-18 18:50:56 +00:00
Jeff Squyres	87d4201bdf	From our faithful Debian package maintainers: remove some lint-quality lines from the man pages. This commit was SVN r18449.	2008-05-16 14:58:52 +00:00
Jeff Squyres	1cc663ebf6	Change this back to use opal_init_util() -- using orte_init() mucks with the C++ memory allocator. Let's not go there. This commit was SVN r18447.	2008-05-16 14:18:56 +00:00
Jeff Squyres	caacaadb0a	Minor shuffling of code: no need to query the GID in the iWARP case. This commit was SVN r18446.	2008-05-16 03:36:48 +00:00
Jeff Squyres	9f1b5237fe	Ensure to return an error rather than continue This commit was SVN r18445.	2008-05-16 03:36:11 +00:00
Jeff Squyres	6546898f09	Minor style cleanups; nothing very important in this commit. This commit was SVN r18444.	2008-05-16 03:28:20 +00:00
Jeff Squyres	5c91f53848	Fix a minor memory leak This commit was SVN r18443.	2008-05-16 03:27:42 +00:00
Rolf vandeVaart	375406e1fa	Remove the ignore files as decided at Tuesday's developers conference call. Now, hierarchical collectives will be compiled in but the priority is still at 0 requiring a user to set mca parameters to enable them. This commit was SVN r18440.	2008-05-15 01:26:52 +00:00
Josh Hursey	35a2af28d1	Cleanup the CRCP Coord timing functionality. Provides a rough assessment of time each element of the algorithm is taking. There are more details in the code regarding how to use this feature. Also shift a few of the orte_output back to opal_output. I'm experiencing an odd problem with locks in the oob/tcp when using orte_output. I haven't had time to track it down yet. This commit was SVN r18439.	2008-05-14 19:54:20 +00:00
Jeff Squyres	671f0c379d	Remove a whole pile of orte/util/show_help.h's that I missed. :-( This commit was SVN r18437.	2008-05-14 11:32:33 +00:00
Jeff Squyres	fb17097de4	Make ompi_info correctly display "filter" components This commit was SVN r18435.	2008-05-13 20:56:20 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not their opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do something similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done for configure.m4 to search for an appropriate XML library, link it in, and use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Jon Mason	125eb5a2ed	Convert from the Linux ifaddrs to the OMPI ifaddrs, which should unbreak Solaris. This commit was SVN r18433.	2008-05-13 18:34:22 +00:00
Jeff Squyres	d8e5608053	Remove all retransmission code; the IBCM kernel module handles all of that for us. This commit was SVN r18432.	2008-05-13 16:10:34 +00:00


3699 commits