openmpi

Author	SHA1	Message	Date
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400 * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA frameworks versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Rich Graham	b08839f9f5	change reduce-scatter/gather for non-power of 2. Spreading out the load for the non-power of 2 phase of the reduction. This commit was SVN r18486.	2008-05-22 21:42:42 +00:00
Rich Graham	f2a4b67809	automate the allreduce selection logic. This commit was SVN r18484.	2008-05-22 20:53:35 +00:00
Rich Graham	5900415a25	for non-powers of 2, distribute the work on the first step among all the procs doing the work. This commit was SVN r18480.	2008-05-22 18:50:53 +00:00
Jeff Squyres	671f0c379d	Remove a whole pile of orte/util/show_help.h's that I missed. :-( This commit was SVN r18437.	2008-05-14 11:32:33 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already make the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing to messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Rich Graham	df35223603	add selection logic for barrier and reduce. This commit was SVN r18215.	2008-04-19 22:40:04 +00:00
Rich Graham	bee8b42f29	remove debug code that would not let people run. Add infrastructure for blocking-barrier. This commit was SVN r18214.	2008-04-19 01:34:04 +00:00
Rich Graham	6c77fa4921	add a blocking shared memory algorithm. This commit was SVN r18185.	2008-04-16 22:10:23 +00:00
Rich Graham	249445d61f	added reduce-scatter followed by gather to root. This commit was SVN r18133.	2008-04-11 13:49:08 +00:00
Rich Graham	a6bdbfab97	implement allreduce as reduce-scatter, followed by an allgather. This commit was SVN r18132.	2008-04-11 04:06:29 +00:00
Rich Graham	70f3aab5f2	remove some code that is not needed. This commit was SVN r18128.	2008-04-10 17:32:04 +00:00
Rich Graham	5c7db1e315	remove 2 race conditions in the buffer recycling logic. This commit was SVN r18127.	2008-04-10 17:20:52 +00:00
Rich Graham	c6783549ef	getting old This commit was SVN r18110.	2008-04-09 16:55:16 +00:00
Rich Graham	1a20c3ce51	more debug. This commit was SVN r18109.	2008-04-09 16:19:52 +00:00
Rich Graham	e7e18303f6	more debug. This commit was SVN r18108.	2008-04-09 15:10:58 +00:00
Rich Graham	b14c6b17d5	adding debug output. This commit was SVN r18107.	2008-04-09 13:32:01 +00:00
Rich Graham	10434fb2f1	add barrier synchronization at the end of the module init, to avoid initializing shared memory variables in use. This commit was SVN r18105.	2008-04-09 03:44:40 +00:00
Rich Graham	19bb1a2e86	fix initialization bug. This commit was SVN r18104.	2008-04-08 23:34:06 +00:00
Rich Graham	a69a8d9626	initialize the flags. This commit was SVN r18102.	2008-04-08 22:16:39 +00:00
Rich Graham	8765a2bbdd	more debug code. This commit was SVN r18101.	2008-04-08 20:38:20 +00:00
Rich Graham	08becf33b5	add more debugging. This commit was SVN r18100.	2008-04-08 18:44:50 +00:00
Rich Graham	aa1b7dd406	more debug This commit was SVN r18099.	2008-04-08 03:56:47 +00:00
Rich Graham	0c18bdeff7	more debug code. This commit was SVN r18098.	2008-04-08 03:04:20 +00:00
Rich Graham	9d5a7238df	Add some debugging code. This commit was SVN r18097.	2008-04-07 23:20:15 +00:00
Rich Graham	fa696734d5	add some debug code. This commit was SVN r18096.	2008-04-07 21:03:23 +00:00
Rich Graham	1b54e8b76e	fix buffer management for nb-barrier. This commit was SVN r18081.	2008-04-05 21:59:04 +00:00
Rich Graham	94f8fd365c	a few reduction optimizations. Add bcast. This commit was SVN r18075.	2008-04-02 19:02:33 +00:00
Rich Graham	eb5d6096f1	add reduction routine - fix buffer recycling logic which was totally broken. This commit was SVN r18065.	2008-04-01 22:56:18 +00:00
Rich Graham	90e53ca9ee	debug the pipeline algorithm. This commit was SVN r18008.	2008-03-28 15:10:07 +00:00
Rich Graham	e2ad9c4be2	adjust to change in orte_process_info. This commit was SVN r17986.	2008-03-27 01:25:28 +00:00
Rich Graham	441fb9fb9e	checkpoint. This commit was SVN r17985.	2008-03-27 01:16:32 +00:00
Ralph Castain	cca449e379	Move an OMPI RML tag to the OMPI layer This commit was SVN r17950.	2008-03-25 13:30:48 +00:00
Ralph Castain	dc7f45dafd	Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure. Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code. This commit was SVN r17926.	2008-03-23 23:10:15 +00:00
Rich Graham	a7c836a2b0	fix location of the restrict key word. Make the tag in the fan-in/fan-out algorithm be fragment based. This commit was SVN r17903.	2008-03-21 01:40:36 +00:00
Rich Graham	2c66d396b7	take care of some bit-rot with the fanin-fanout method. This commit was SVN r17902.	2008-03-21 01:08:49 +00:00
Rich Graham	b9520e61dc	get the sm optimized allreduce working for all but user defined operations. Added to the reduction operations a set of reduction functions that take 2 input buffers and one output buffer to avoid some extra memory copies. These can't be used with user defined operations. The intel c collective suite passes both original, and new (new, not the user defined operations). This commit was SVN r17901.	2008-03-20 23:51:16 +00:00
Rich Graham	27182afb67	get the timers in correctly. This commit was SVN r17832.	2008-03-16 03:25:16 +00:00
Rich Graham	afcd1016fd	move temp buffer allocation out of the iteration loop - i.e. always use the same temp loop. The algorithm is rather synchronous already... This commit was SVN r17831.	2008-03-16 03:20:46 +00:00
Rich Graham	a1766b29f6	fix some barrier addressing errors. This commit was SVN r17830.	2008-03-15 22:46:19 +00:00
Rich Graham	0453e7d2f4	bug in management memory allocation - too much memory allocated. This commit was SVN r17829.	2008-03-15 18:12:20 +00:00
Rich Graham	3c2f1eb8bf	reduce the number of temp buffers used. This commit was SVN r17828.	2008-03-15 17:23:04 +00:00
Rich Graham	0f9d642d51	temp buffer pointers are computed when they are set up. A bit more efficient, but more important, it is much easier to play around with memory layout now. This commit was SVN r17827.	2008-03-15 16:36:35 +00:00
Rich Graham	e3e336b5ab	check point This commit was SVN r17826.	2008-03-15 13:31:21 +00:00
Rich Graham	ebcf928c24	add some diagnostics. This commit was SVN r17789.	2008-03-07 22:27:41 +00:00
Rich Graham	9131461511	move some test code to another machine. This commit was SVN r17785.	2008-03-07 19:18:02 +00:00
Rich Graham	c230b65543	fix a couple of bugs. Recursive doubling seems to be working. This commit was SVN r17777.	2008-03-07 02:51:38 +00:00
Rich Graham	70157166f9	checkpoint - compiles, now need to debug. This commit was SVN r17775.	2008-03-07 00:39:59 +00:00
Rich Graham	4eace9d020	starting to implement recursive doubling algorithm. This commit was SVN r17765.	2008-03-06 18:38:58 +00:00


79 Commits