openmpi

Author	SHA1	Message	Date
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400 * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA frameworks versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Rolf vandeVaart	9c080b27d6	Fix for bug when running 64-bit heterogeneous. This commit fixes trac:1341. This commit was SVN r18940. The following Trac tickets were found above: Ticket 1341 --> https://svn.open-mpi.org/trac/ompi/ticket/1341	2008-07-17 19:04:40 +00:00
Terry Dontje	12baa72580	This commit fixes trac:1306 This commit was SVN r18718. The following Trac tickets were found above: Ticket 1306 --> https://svn.open-mpi.org/trac/ompi/ticket/1306	2008-06-24 14:38:11 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Shiqing Fan	f35a06119c	Use the memchecker_convertor_call function instead of the old one. Move the function to a place where we can use the convertor. This commit was SVN r18370.	2008-05-05 13:57:27 +00:00
Ralph Castain	fa082cafa9	Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex. Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer. This commit was SVN r18198.	2008-04-17 20:43:56 +00:00
Shiqing Fan	79da2fdd2c	Use the new memchecker convertor function. Remove some unnecessary memchecker calls. This commit was SVN r18172.	2008-04-16 13:24:35 +00:00
Shiqing Fan	54c7b71cfd	Use the correct way of including memchecker.h, which will work with '--with-devel-headers'. This commit was SVN r17435.	2008-02-12 18:01:17 +00:00
Shiqing Fan	f5792bbda5	merging the memchecker into trunk. This commit was SVN r17424.	2008-02-12 08:46:27 +00:00
Tim Prins	b88a3f7a94	Update onesided components to fix the case (on 64 bit machines) where the total offset is greater than 2^31-1 bytes. See: http://www.open-mpi.org/community/lists/users/2008/01/4880.php This commit was SVN r17400.	2008-02-07 18:45:35 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confusion and discussion about linkers, have all OPAL, ORTE, and OMPI components '''not''' link against the OPAL, ORTE, or OMPI libraries. See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Shiqing Fan	efdcfa3807	- "extern 'C'" has been set twice. Remove one. This commit was SVN r16022.	2007-08-30 15:03:59 +00:00
Jeff Squyres	466394a878	We only care about the value of ret in the !OMPI_ENABLE_PROGRESS_THREADS case. Reviewed by Brian. This commit was SVN r16000.	2007-08-29 01:36:17 +00:00
Brian Barrett	af4e86c25f	Update collectives selection logic to allow for multiple components to be used at once (up to one unique collective module per collective function). Matches r15795:15921 of the tmp/bwb-coll-select branch This commit was SVN r15924. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15795 r15921	2007-08-19 03:37:49 +00:00
Brian Barrett	7a9a8c7e17	Support reduction operations other than MPI_REPLACE for user-defined datatypes with MPI_ACCUMULATE This commit was SVN r15418.	2007-07-13 20:46:12 +00:00
Brian Barrett	739fed9dc9	Don't poke at internal structure fields of communicators or groups, but instead use accessor functions This commit was SVN r15366.	2007-07-11 17:16:06 +00:00
Brian Barrett	2ed0548da8	* No need for waiting until exposure epochs are over in order to complete a WIN_FREE * Fix race condition in threaded builds with pending unlocks and finishing an epoch * Fix memory leak due to use of OBJ_DESTRUCT instead of OBJ_RELEASE * Fix race condition between releasing multiple shared locks and starting a new lock * Need to increment the shared count if starting a new shared lock once an exclusive lock finishes This commit was SVN r15185.	2007-06-24 22:36:00 +00:00
Brian Barrett	5528e0ca60	Properly initialize variable for threaded case This commit was SVN r15174.	2007-06-22 15:29:06 +00:00
Brian Barrett	5f16251808	revert r15167. I don't know what I was thinking, but it was most definitely "not right". This commit was SVN r15172. The following SVN revision numbers were found above: r15167 --> open-mpi/ompi@faa401dc47	2007-06-22 15:25:39 +00:00
Brian Barrett	80c50120ad	debugging output should be macro version This commit was SVN r15168.	2007-06-21 22:09:37 +00:00
Brian Barrett	faa401dc47	* Need to OBJ_RELEASE, not OBJ_DESTRUCT things that were created with OBJ_NEW * Need to signal when the passive unlock has left an expose epoch for the win_free case * Clean up some debugging output * fix missing variable initialization This commit was SVN r15167.	2007-06-21 22:08:30 +00:00
Brian Barrett	0798c0784d	properly set fields so that most difficult alignment rules are always met. This commit was SVN r14854.	2007-06-05 01:46:04 +00:00
Brian Barrett	a2713dcac8	eeks! Bad to notice after committing the pt2pt part of r14806 that the compile failed because of the wrong variable name. This commit was SVN r14807. The following SVN revision numbers were found above: r14806 --> open-mpi/ompi@7e57bbb0ef	2007-05-30 20:33:08 +00:00
Brian Barrett	7e57bbb0ef	React slightly better when datatype creation from a buffer fails This commit was SVN r14806.	2007-05-30 20:32:02 +00:00
George Bosilca	8b817e96fd	Allow threaded compilation. This commit was SVN r14775.	2007-05-25 01:53:29 +00:00
Brian Barrett	38b0d22243	Some cleanups to the pt2pt component * Remove unused declaration * remove unused variable warning when not using progress threads * If we're using progress threads, we want to lock, not trylock when in progress, since it was called from the wakeup thread and not the progress function This commit was SVN r14739.	2007-05-23 20:31:25 +00:00
Sven Stork	88f0845c44	- let the pt2pt component compile with threads enabled This commit was SVN r14725.	2007-05-23 12:56:34 +00:00
Brian Barrett	38eab3613b	* Fix race condition with the pending_{in,out} variables -- if we're going to do while(...) { } then we can't change the variables in the ... atomically, but should do it while holding the module lock. * Fix dumb communicator creation error when we don't create the progress stuff (because a window already exists), where we would accidentally jump to the error case. This commit was SVN r14715.	2007-05-21 20:53:02 +00:00
Brian Barrett	0e9e0c518a	Fix a couple more progress thread related issues... This commit was SVN r14708.	2007-05-21 16:06:14 +00:00
Brian Barrett	1191677b76	Fix dumb threads-related compile issues This commit was SVN r14704.	2007-05-21 03:23:58 +00:00
Brian Barrett	2b4b754925	Some much needed cleanup of the point-to-point one-sided component... * Combine polling of the long requests and buffer requests into one type, and in one place * Associate the list of requests to poll with the component, not the individual modules * add progress thread that sits on the OMPI request structure and wakes up at the appropriate time to poll the message list. Not the best, but without some asynch notification from the PML that a given set of requests has completed, there isn't much better * Instead of calling opal_progress() all over the place, move to using the condition variables like the rest of the project. Has the advantage of moving it slightly further along in the becoming thread safe thing * Fix a problem with the passive side of unlock where it could go recursive and cause all kinds of problems, especially when progress threads are used. Instead, have two parts of passive unlock -- one to start the unlock, and another to complete the lock and send the ack back. The data moving code trips the second at the right time. This commit was SVN r14703.	2007-05-21 02:21:25 +00:00
Jeff Squyres	51f286d737	Just like r14289 on the ORTE trunk: Per discussions with Brian and Ralph, make a slight correction in where components are installed. Use $pkglibdir, not $libdir/openmpi, so that when compiled in the orte trunk, components are installed to the right directory (because the component search path is checking $pkglibdir). This commit was SVN r14345. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r14289	2007-04-12 11:19:42 +00:00
Mohamad Chaarawi	bfaf9d4a12	Added new module for intercomm collectives. This will require an autogen. This commit was SVN r14149.	2007-03-27 02:06:42 +00:00
Brian Barrett	62e5e81e99	revert r14142, as the onesided change should not have come over This commit was SVN r14143. The following SVN revision numbers were found above: r14142 --> open-mpi/ompi@241545a098	2007-03-26 15:58:41 +00:00
Brian Barrett	241545a098	Back out r14073 - it speeds up TCP latency / bandwidth but at the same time it kills ROMIO and one-sided performance when using only TCP. The problem is that it only allows those two to be progressed every couple of seconds, leading to what looks like hangs in the one-sided tests (and the ROMIO stuff, although people seem to not notice that at this point). This commit was SVN r14142. The following SVN revision numbers were found above: r14073 --> open-mpi/ompi@64fbbc20b8	2007-03-26 15:56:23 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Brian Barrett	d9e0e80190	Make some debugging output only looked at when debugging is enabled This commit was SVN r13777.	2007-02-25 01:03:19 +00:00
Brian Barrett	65b07140c0	clean up some of the printf warnings caused by the attribute code This commit was SVN r13395.	2007-01-31 17:11:06 +00:00
Rainer Keller	061ba05439	- Fixes uncovered with the format attribute to opal_output and opal_output_verbose This commit was SVN r13371.	2007-01-30 20:56:31 +00:00
Brian Barrett	385a435813	Start long message send as soon as possible, to minimize ack time for the receive, greatly increasing mid-range bandwidth This commit was SVN r13317.	2007-01-25 23:07:03 +00:00
Brian Barrett	95c0a17b9a	Send the unlock request before starting the requests. We won't unlock until we get an ack from the remote side, so there's no longer a race there (I used to do the unlock request last, after local completion of all the requests completed, to try to avoid having the passive side reply to the active side, but I don't do that anymore). The unlock side will not "unlock" the window until it actually receives the correct number of results, so we're good there. This fixes an issue where we would receive data on the remote side we weren't expecting that could cause us to release a lock before it really should have been released to the requesting peer. It could also cause a deadlock if one of the processes trying to unlock was "self", as that would result in the active unlock never sending the unlock request, even though it sent the payload, which could cause a counter that should always be positive to hit -1, causing an infinite loop that could only be solved by popping up the stack, which was an impossibility. Refs trac:785 This commit was SVN r13160. The following Trac tickets were found above: Ticket 785 --> https://svn.open-mpi.org/trac/ompi/ticket/785	2007-01-17 21:13:12 +00:00
Brian Barrett	c1be97199b	Fix an issue with recursive calls into the component progress caused by btls sometimes calling opal_progress() during their send calls by dropping the loop through the list of pending control messages if any are marked as completed. Refs trac:784 This commit was SVN r13159. The following Trac tickets were found above: Ticket 784 --> https://svn.open-mpi.org/trac/ompi/ticket/784	2007-01-17 20:48:35 +00:00
Brian Barrett	35c57457c6	Don't call ompi_request_test() if the request isn't likely to finish. Otherwise, we end up recursively calling into the progress functions and corrupting a list that doesn't like to be corrupted. Refs trac:561 This commit was SVN r13138. The following Trac tickets were found above: Ticket 561 --> https://svn.open-mpi.org/trac/ompi/ticket/561	2007-01-17 02:30:11 +00:00
Brian Barrett	f03ffb3a62	Send reply from the passive side of an unlock request back to the active side and only let MPI_WIN_UNLOCK return when the passive side has actively replied that the window is unlocked. Refs trac:761 This commit was SVN r13118. The following Trac tickets were found above: Ticket 761 --> https://svn.open-mpi.org/trac/ompi/ticket/761	2007-01-14 22:08:38 +00:00
George Bosilca	87ff2b5ce8	Cast to the correct type. This commit was SVN r13046.	2007-01-08 22:04:01 +00:00
Brian Barrett	48ec0b2071	Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix for now... This commit was SVN r12997. The following SVN revision numbers were found above: r12974 --> open-mpi/ompi@27cea44a9c	2007-01-04 22:07:37 +00:00
Brian Barrett	27cea44a9c	Fix a number of issues with the ompi_ptr_t: * Make sure that the pval always writes to the correct portion of the lval. This only matters on 32 bit big endian machines. * On 32 bit machines when assigning to pval, the other 4 bytes of lval weren't being written, which could lead to bogus data We use macros so that there aren't casts all over the code and the pval assignment can occur to the correct 4 bytes. Refs trac:587 This commit was SVN r12974. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-03 19:47:48 +00:00
Brian Barrett	2a09fa2d9d	* silence compiler warning This commit was SVN r12717.	2006-12-01 20:01:53 +00:00
Brian Barrett	beb1e9d4dd	* finish move from hard coded tag to #define'd constant tag This commit was SVN r12674.	2006-11-27 21:55:41 +00:00


102 commits