openmpi

Author	SHA1	Message	Date
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400 * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA frameworks versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Ralph Castain	c992e99035	Remove the tags from orte_output_open and the filtering operation from orte_output - this will be handled differently to improve the XML output interface This commit was SVN r18557.	2008-06-03 14:24:01 +00:00
George Bosilca	e361bcb64c	Send optimizations. 1. The send path gets shorter. The BTL is allowed to return > 0 to specify that the descriptor was pushed to the networks, and that the memory attached to it is available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag can be used by the PML to force the BTL to always trigger the callback. Unmodified BTLs will continue to work as expected, as they will return OMPI_SUCCESS, which forces the PML to have exactly the same behavior as before. Some BTLs have been modified: self, sm, tcp, mx. 2. Add send immediate interface to BTL. The idea is to have a mechanism of allowing the BTL to take advantage of send optimizations such as the ability to deliver data "inline". Some network APIs such as Portals allow data to be sent using a "thin" event without packing data into a memory descriptor. This interface change allows the BTL to use such capabilities and allows for other optimizations in the future. All existing BTLs except for Portals and sm have this interface set to NULL. This commit was SVN r18551.	2008-05-30 03:58:39 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not their opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Galen Shipman	92e3b8671f	nasty memory bug... This commit was SVN r18207.	2008-04-18 03:01:53 +00:00
Christian Bell	987de57c9c	Looks like orte/ns is now gone This commit was SVN r17706.	2008-03-05 00:55:43 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Galen Shipman	b378c8c12c	return success. This commit was SVN r17612.	2008-02-27 02:15:53 +00:00
Galen Shipman	44003a41f2	Update common_portals to allow using portals interconnect with a modex rather than relying on cnos to get the nid/pid map. This commit was SVN r17588.	2008-02-25 19:17:21 +00:00
Ron Brightwell	b02cad2a0b	added optional rendezvous protocol for long messages This commit was SVN r17124.	2008-01-11 22:12:45 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confused discussion about linkers, have all OPAL, ORTE, and OMPI components '''not''' link against the OPAL, ORTE, or OMPI libraries. See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Galen Shipman	4daa552c97	Correct makefile to include all sources, should fix a problem in building a distro.. This commit was SVN r16894.	2007-12-07 18:59:16 +00:00
Ron Brightwell	0138a2ee17	Do cleanup in ompi_mtl_portals_del_procs() rather than ompi_mtl_portals_finalize(). Previous code was cleaning up Portals resources that hadn't been allocated, which caused valid handles used elsewhere to be freed, which broke cnos_barrier() for the Portals btl. This commit was SVN r16801.	2007-11-29 17:29:46 +00:00
Ron Brightwell	a6d6be1bb9	Added send-side optimizations (persistent zero-length md and copy blocks) and support for Accelerated Portals. This commit was SVN r16770.	2007-11-21 21:31:37 +00:00
Rich Graham	27a748e7eb	change all instances of ompi_free_list_init to ompi_free_list_init_new. Header and payload data are specified separately at this stage. This commit was SVN r16633.	2007-11-01 23:38:50 +00:00
George Bosilca	95c9fbdf45	Make sure the MX MTL component is shared between all files. This commit was SVN r16545.	2007-10-22 18:06:52 +00:00
Rich Graham	0de9bd9fa0	when attaching an md for posted receive, generate a start event, so that PtlMDUpdate will pick up all incoming events. This commit was SVN r16517.	2007-10-19 19:09:40 +00:00
Brian Barrett	69952d9603	Fix abort caused by calling PtlEQGet on an invalid eq, which could occur if add_procs was never called. This commit was SVN r15779.	2007-08-06 17:28:11 +00:00
Christian Bell	5ae68f82b2	fix gcc 3.x compilation warnings This commit was SVN r15327.	2007-07-10 13:54:34 +00:00
Brian Barrett	8b9e8054fd	Move modex from pml base to general ompi runtime, since it's used by more than just the PML/BTLs these days. Also clean up the code so that it handles the situation where not all nodes register information for a given node (rather than just spinning until that node sends information, like we do today). Includes r15234 and r15265 from the /tmp/bwb-modex branch. This commit was SVN r15310. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15234 r15265	2007-07-09 17:16:34 +00:00
Josh Hursey	acae12d0bb	Fix warning: stderr -> fileno(stderr) This commit was SVN r15207.	2007-06-26 19:28:40 +00:00
Josh Hursey	5199f4123d	Add 2 new MCA parameters to set the size of the expected and unexpected queues. This commit was SVN r15206.	2007-06-26 17:31:43 +00:00
Rich Graham	aa2ffcfcd8	add some output before abort() is called. This commit was SVN r15204.	2007-06-26 15:57:47 +00:00
Galen Shipman	8e7cce813e	don't update MPI_ERROR This commit was SVN r15004.	2007-06-11 21:40:29 +00:00
Galen Shipman	406b05bdc3	update copyright.. This commit was SVN r15003.	2007-06-11 21:17:49 +00:00
Galen Shipman	798cc2c5b8	handle MPI_STATUS_IGNORE in iprobe for the MTLs This commit was SVN r15002.	2007-06-11 20:19:31 +00:00
George Bosilca	5d6c958066	Enable the MTLs to be compiled in a visibility featured environment. This commit was SVN r14955.	2007-06-07 20:14:53 +00:00
Jeff Squyres	51f286d737	Just like r14289 on the ORTE trunk: Per discussions with Brian and Ralph, make a slight correction in where components are installed. Use $pkglibdir, not $libdir/openmpi, so that when compiled in the orte trunk, components are installed to the right directory (because the component search path is checking $pkglibdir). This commit was SVN r14345. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r14289	2007-04-12 11:19:42 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Tim Mattox	ec82d01555	Add a missing extern keyword that prevented compilation on OS X. This commit was SVN r13853.	2007-02-28 20:26:34 +00:00
Sven Stork	870740efe2	- proper export symbols that are required by other components. This commit was SVN r13841.	2007-02-28 12:51:55 +00:00
Ron Brightwell	e15e85a0b6	Fix a problem with long unexpected messages that was causing hangs. Long unexpected messages were not generating PUT_START events because the MD for long unexpected messages was configured to ignore start events. When a long unexpected message arrived, it traversed the match list, and ended up in the long unexpected MD. As the long message is being consumed, the code called PtlMDUpdate() to look for the message, but there was no event that indicated that it had arrived. So, the update succeeded. Once the long unexpected message was consumed, the PUT_END event showed up in the event queue -- except the code wasn't looking for it anymore. The PUT_START events exist specifically to handle ordering between short and long unexpected messages, so PUT_START events can't be ignored on long unexpected messages. Modified the code to generate PUT_START events for both long and short unexpected messages and handle matching up START and END events appropriately. This commit was SVN r13746.	2007-02-21 21:59:48 +00:00
Rich Graham	b925d6588d	add some missing error checking - thanks to Ron B. This commit was SVN r13692.	2007-02-16 22:19:24 +00:00
Galen Shipman	f98a442c82	Fix a problem in the selection logic for MX. Basically we need to be able to open MTL MX and BTL MX and initialize them at the same time. The problem is that both call mx_init and mx_finalize, solution is to add an external entity that does the init and finalize (based on ref counting). This commit was SVN r13576.	2007-02-09 03:19:38 +00:00
Brian Barrett	09cc9e4941	properly compute starting offset -- the lb will be included in the offset, so we don't need both. Refs trac:864 This commit was SVN r13494. The following Trac tickets were found above: Ticket 864 --> https://svn.open-mpi.org/trac/ompi/ticket/864	2007-02-05 18:12:18 +00:00
Galen Shipman	a94101fa62	mostly another hack around for PML selection, allows CM to select itself if an MTL is available, if not OB1 is used. Still prevents DR and OB1 from stomping on each other though. This commit was SVN r13481.	2007-02-03 02:01:18 +00:00
Christian Bell	e04c55af00	Fixes to psm mtl following a more comprehensive testing of intel tests. This commit was SVN r13471.	2007-02-02 21:55:04 +00:00
Brian Barrett	039a3d8c17	add comment about why there's no status update here, since I always forget This commit was SVN r13400.	2007-01-31 21:39:20 +00:00
Brian Barrett	846eed84f1	When receiving a message, need to account for the fact that the displacement of the first entry might not be the start of the user's buffer. This is similar to what ompi_convertor_unpack does. This is the solution for the test case attached to ticket #690. Refs trac:690 This commit was SVN r13397. The following Trac tickets were found above: Ticket 690 --> https://svn.open-mpi.org/trac/ompi/ticket/690	2007-01-31 18:18:19 +00:00
Brian Barrett	65b07140c0	clean up some of the printf warnings caused by the attribute code This commit was SVN r13395.	2007-01-31 17:11:06 +00:00
Patrick Geoffray	b252cb82c8	oops, ".", not "->", copy error... This commit was SVN r13287.	2007-01-24 19:16:46 +00:00
Patrick Geoffray	d58f6b2451	Free memory in synchronous send case if free_after requires it. Fixes memory leak using synchronous sends and custom data types. This commit was SVN r13286.	2007-01-24 19:10:38 +00:00
Brian Barrett	b8413fb1d5	Just cast the pointer to a uintptr_t then to the match bits, instead of abusing the ompi_ptr_t interface. Not critical for v1.2, as there are no portals platforms that are big endian, so the code in v1.2 will work well enough for now This commit was SVN r13024.	2007-01-07 03:11:27 +00:00
Brian Barrett	48ec0b2071	Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix for now... This commit was SVN r12997. The following SVN revision numbers were found above: r12974 --> open-mpi/ompi@27cea44a9c	2007-01-04 22:07:37 +00:00
Brian Barrett	27cea44a9c	Fix a number of issues with the ompi_ptr_t: * Make sure that the pval always writes to the correct portion of the lval. This only matters on 32 bit big endian machines. * On 32 bit machines when assigning to pval, the other 4 bytes of lval weren't being written, which could lead to bogus data We use macros so that there aren't casts all over the code and the pval assignment can occur to the correct 4 bytes. Refs trac:587 This commit was SVN r12974. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-03 19:47:48 +00:00
Brian Barrett	6f8b366acb	Rename liborte to libopen-rte and libopal to libopen-pal per telecon today and bug #632. Refs trac:632 This commit was SVN r12762. The following Trac tickets were found above: Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632	2006-12-05 18:27:24 +00:00
Brian Barrett	bfbc281e93	Fix slow startup issue with the MX MTL. The problem is caused by mx_connect() being a one-sided operation from the API level, but not being an interrupting call when the target is not entering the MX library. So if most of the processes exit mtl_mx_add_procs() and enter the stage gate 2 barrier, the other processes can only progress their mx_connect() calls when the targets enter the mx library. Because the event library is in EV_ONELOOP mode, this only happens once a second. The mx progress thread (hidden in the MX library) also only wakes up once a second, so mx_connect calls can take a second to complete. The temporary solution is to switch into EV_NONBLOCK mode earlier (right after the mx_connect loop) so that there isn't a giant slowdown when processes enter the stage gate 2 barrier before other processes. They will now not block in the event library for any period of time, which appears to have a 50% speedup when running at > 64 procs. Refs trac:645 This commit was SVN r12713. The following Trac tickets were found above: Ticket 645 --> https://svn.open-mpi.org/trac/ompi/ticket/645	2006-12-01 02:49:01 +00:00
Brian Barrett	993d2a7753	Fix for issue IU is seeing on BigRed with connections timing out during MPI_INIT. Use an infinite timeout, which is exactly what MPICH-MX does. This commit was SVN r12669.	2006-11-27 20:10:27 +00:00
George Bosilca	bfbd0e61f6	Minimize the number of lines of code :) This commit was SVN r12550.	2006-11-10 20:56:08 +00:00

94 commits