openmpi

Author	SHA1	Message	Date
George Bosilca	1eb62b6c48	Remove a warning. Close ticket #1357 . This commit was SVN r18717.	2008-06-24 14:23:02 +00:00
George Bosilca	54e7e03695	One less warning. This commit was SVN r18695.	2008-06-20 17:50:19 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
George Bosilca	e361bcb64c	Send optimizations. 1. The send path gets shorter. The BTL is allowed to return > 0 to specify that the descriptor was pushed to the network, and that the memory attached to it is available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag can be used by the PML to force the BTL to always trigger the callback. Unmodified BTLs will continue to work as expected, as they return OMPI_SUCCESS, which forces the PML to behave exactly as before. Some BTLs have been modified: self, sm, tcp, mx. 2. Add a send-immediate interface to the BTL. The idea is to give the BTL a way to take advantage of send optimizations such as the ability to deliver data "inline". Some network APIs such as Portals allow data to be sent using a "thin" event without packing data into a memory descriptor. This interface change allows the BTL to use such capabilities and allows for other optimizations in the future. All existing BTLs except for Portals and sm have this interface set to NULL. This commit was SVN r18551.	2008-05-30 03:58:39 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we have already done the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will behave the way opal_output'ing to stream 0 did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework creates an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active.
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to have configure.m4 search for an appropriate XML library, link it in, and use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Adrian Knoth	c53d3c3c22	reverted r18169,r18170 due to connection reset by peer on odin/sif This commit was SVN r18255. The following SVN revision numbers were found above: r18169 --> open-mpi/ompi@20473bfda2 r18170 --> open-mpi/ompi@d34dfbe12c	2008-04-23 15:26:15 +00:00
Ralph Castain	fa082cafa9	Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex. Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer. This commit was SVN r18198.	2008-04-17 20:43:56 +00:00
Adrian Knoth	d34dfbe12c	fixed misleading comment. This commit was SVN r18170.	2008-04-16 11:26:15 +00:00
Adrian Knoth	20473bfda2	On incoming connections, compare with every possible source address. Rationale (taken from the code): /* This is PITA. We never know which source address an * incoming/outgoing packet will have, so even with * btl_tcp_if_include/exclude on the remote end, we * might get a different source address. * * If this address isn't included in btl_proc->proc_addrs, * we would erroneously drop the connection */ merge -r18165:18167 to the trunk. This commit was SVN r18169. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r18165 r18167	2008-04-16 11:24:09 +00:00
Adrian Knoth	e981a259bb	btl_tcp_disable_family=4 and btl_tcp_disable_family=6 are mutually exclusive, so this should result in "unreachable" when set differently between peers. This commit was SVN r18168.	2008-04-16 10:14:58 +00:00
Adrian Knoth	75c54616c7	renamed opal_sockaddr2str to opal_net_get_hostname for WANT_PEER_DUMP=1 This commit was SVN r18154.	2008-04-15 19:23:47 +00:00
George Bosilca	944453c4c1	Cleanups. This commit was SVN r18068.	2008-04-02 06:37:42 +00:00
George Bosilca	be4b153f0d	Another patch for thread safety in the TCP BTL (thanks to Pierre). This commit was SVN r17993.	2008-03-27 18:36:08 +00:00
George Bosilca	1d04ec4ded	Correct the connection logic for TCP. Now we have not only a cleaner connection, but a more thread safe one. Thanks to Pierre for his help on this. This commit was SVN r17853.	2008-03-18 02:42:16 +00:00
Tim Prins	5de3e1965e	Remove the orte_proc_table. Migrate all users of it to the opal_hash_table and a new name hash function in orte. Everything should work, however I am unable to compile and test the sctp BTL. This commit was SVN r17751.	2008-03-05 22:44:35 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Pierre Lemarinier	2a99f89631	Modification of the mutex lock order to prevent races during connection stage. This commit was SVN r17535.	2008-02-20 18:17:58 +00:00
George Bosilca	fa31ec81d0	Add the ownership flags to the PML/BTL interface. The layer owning the descriptor is responsible for releasing it once the descriptor is not in use anymore. This commit was SVN r17497.	2008-02-18 17:39:30 +00:00
Adrian Knoth	f1648f08df	Advanced address selection code from Thomas Peiselt. Re #1207 , #1027 This commit was SVN r17450.	2008-02-13 21:53:00 +00:00
Adrian Knoth	8ae4a10b4c	Reverted r17331, r17332. Still broken. I'm in a bad hurry. :-( Re #1206 This commit was SVN r17333. The following SVN revision numbers were found above: r17331 --> open-mpi/ompi@3846e2a797 r17332 --> open-mpi/ompi@c03de08c55	2008-01-30 16:51:55 +00:00
Adrian Knoth	c03de08c55	Logic is wrong. I'm going to revert it again. Re #1206 This commit was SVN r17332.	2008-01-30 16:48:50 +00:00
Adrian Knoth	3846e2a797	When checking incoming connections, also care about aliased interfaces. Re #1206 This commit was SVN r17331.	2008-01-30 16:45:41 +00:00
Adrian Knoth	7f79c68930	Reverted r17307 and r17308. It broke parallel TCP connections. Re #1206 This commit was SVN r17329. The following SVN revision numbers were found above: r17307 --> open-mpi/ompi@7a59b3f58c r17308 --> open-mpi/ompi@72b29bc21f	2008-01-30 14:31:47 +00:00
Adrian Knoth	72b29bc21f	Cosmetic patch. Use IN6_ARE_ADDR_EQUAL instead of memcmp(). Re #1206 . This commit was SVN r17308.	2008-01-29 16:02:24 +00:00
Adrian Knoth	7a59b3f58c	accept incoming connections from hosts with multiple addresses. We loop over all peer addresses and accept when one of them matches. Note that this might break functionality: mca_btl_tcp_proc_insert now always inserts the same endpoint. (is the lack of endpoints the problem? should there be one for every remote address?) Re #1206 This commit was SVN r17307.	2008-01-29 15:55:56 +00:00
George Bosilca	6310ce955c	The first patch related to the Active Message stuff. So far, here is what we have: - the registration array is now global instead of one per BTL. - each framework has to declare which entries in the registration array it reserves. Then it has to define the internal way of sharing (or not) these entries between all its components. As an example, the PML will not share, as there is only one active PML at any moment, while the BTLs will have to. The tag is 8 bits long: the first 3 are reserved for the framework, while the remaining 5 are used internally by each framework. - The registration function is optional. If a BTL does not provide such a function, nothing happens. However, if such a function is provided in the BTL structure, it will be called by the BML when a tag is registered. Now, it's time for the second step... Converting OB1 from a switch-based PML to an active message one. This commit was SVN r17140.	2008-01-15 05:32:53 +00:00
Rolf vandeVaart	0f0fde3490	Partial fix for #1148 . Enable this for 32-bit sparc as well as 64-bit sparc. This commit was SVN r17059.	2008-01-07 15:43:44 +00:00
George Bosilca	48f5a26e8c	Cast to keep VC happy (quiet). This commit was SVN r17054.	2008-01-04 23:13:32 +00:00
Gleb Natapov	8b511b969d	Introduce a new BTL parameter btl_rndv_eager_limit which determines the size of the first fragment of the rendezvous protocol. Remove the no longer used btl_min_send_size parameter. This commit was SVN r16969.	2007-12-16 08:35:17 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confusion and discussion about linkers, have all OPAL, ORTE, and OMPI components '''not''' link against the OPAL, ORTE, or OMPI libraries. See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Jeff Squyres	80e9730100	Per http://www.open-mpi.org/community/lists/devel/2007/12/2698.php and this thread: http://www.open-mpi.org/community/lists/devel/2007/12/2807.php, set TCP's exclusivity to LOW+100 and SCTP's exclusivity to LOW. This commit was SVN r16942.	2007-12-12 15:55:37 +00:00
Gleb Natapov	e2e211f23b	Add a flags parameter to the btl_alloc() and btl_prepare_src() functions. If the BTL knows the priority of a descriptor at allocation time, it may do some optimizations. This commit was SVN r16901.	2007-12-09 14:08:01 +00:00
Gleb Natapov	7364b7cf47	Add endpoint parameter to btl_alloc() function. Enables various optimizations inside BTL. This commit was SVN r16898.	2007-12-09 14:00:42 +00:00
Rich Graham	27a748e7eb	change all instances of ompi_free_list_init to ompi_free_list_init_new. Header and payload data are specified separately at this stage. This commit was SVN r16633.	2007-11-01 23:38:50 +00:00
George Bosilca	d67c0eefb4	Remove a compilation warning about using uninitialized variables. This commit was SVN r16589.	2007-10-26 20:15:28 +00:00
George Bosilca	b1b5cb6453	It looks like SO_REUSEPORT is not defined on some platforms. Switch to the conventional SO_REUSEADDR instead. This commit was SVN r16588.	2007-10-26 19:56:21 +00:00
George Bosilca	337f78a4a8	Restrict the port range for the OOB and the BTL. Each protocol (v4 and v6) has its own range, which is defined by a minimum value and a range. By default there is no limitation on the port range, which is exactly the same behavior as before. This commit was SVN r16584.	2007-10-26 16:36:51 +00:00
Rolf vandeVaart	3dd5196338	Remove the --mca btl_base_debug flag and clean up the use of the --mca btl_base_verbose flag. The btl framework now matches all the other frameworks. Slightly modify error messages for clarity. This commit was SVN r16443.	2007-10-15 13:10:20 +00:00
Nysal Jan	b51d85fb3f	Fix assertion failure "assert( 0 == btl_endpoint->endpoint_cache_length )" while executing mt_coll testcase. This commit was SVN r16408.	2007-10-09 18:00:01 +00:00
Shiqing Fan	a0660f4deb	- Just some type casts. This commit was SVN r16100.	2007-09-12 15:29:58 +00:00
Brian Barrett	59b22533f2	Enable RDMA for heterogeneous situations. Currently done by overloading the ompi_convertor_need_buffers function to only return 0 if the convertor is homogeneous (which it never does on the trunk, but does on v1.2, but that's a different issue). Only enable the heterogeneous RDMA code for a BTL if it supports it (via a flag), as some BTLs need some work for this to work properly. Currently only TCP and OpenIB have been extensively tested. This commit was SVN r15990.	2007-08-28 21:23:44 +00:00
Brad Benton	ccda5c9c74	Modified the MCA_BTL_TCP_CONNECTED case in mca_btl_tcp_endpoint_send_handler() to always first check for a NULL frag pointer before trying to send the fragment. This avoids an issue in multi-threaded execution in which multiple threads working on the same endpoint can result in a thread finding itself here with nothing to send. This commit was SVN r15963.	2007-08-26 23:40:02 +00:00
Jeff Squyres	f4b117957d	Add MCA parameter to enable/disable Nagle's algorithm on the TCP BTL. This commit was SVN r15606.	2007-07-25 12:21:00 +00:00
Brian Barrett	5b9fa7e998	reapply r15517 and r15520, which were removed in r15527 so that I could get the RML/OOB merge in slightly easier This commit was SVN r15530. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95 r15520 --> open-mpi/ompi@9cbc9df1b8 r15527 --> open-mpi/ompi@2d17dd9516	2007-07-20 02:34:29 +00:00
Brian Barrett	2d17dd9516	temporarily back out r15517 and 15520 so that I can get the RML / OOB changes to apply cleanly. This commit was SVN r15527. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95	2007-07-20 01:10:34 +00:00
Ralph Castain	41977fcc95	Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but... Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point. Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings. This commit was SVN r15517.	2007-07-19 20:56:46 +00:00
Josh Hursey	d4d5a351c1	Silence a compiler warning when not using IPV6. Also convert a few statements to conform to coding standard for Open MPI. This commit was SVN r15407.	2007-07-13 16:38:36 +00:00
Jeff Squyres	8aa8a667da	Use the OMPI version number for the component number, like all other btl components. This commit was SVN r15363.	2007-07-11 15:45:25 +00:00
Brian Barrett	1d02b9e7b5	Fix a bunch of issues exposed by Ken Cain in getting Open MPI to work with VxWorks. Still some issues remaining, I'm sure. Refs trac:1010 This commit was SVN r15320. The following Trac tickets were found above: Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010	2007-07-10 03:46:57 +00:00
Brian Barrett	8b9e8054fd	Move modex from pml base to general ompi runtime, since it's used by more than just the PML/BTLs these days. Also clean up the code so that it handles the situation where not all nodes register information for a given node (rather than just spinning until that node sends information, like we do today). Includes r15234 and r15265 from the /tmp/bwb-modex branch. This commit was SVN r15310. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15234 r15265	2007-07-09 17:16:34 +00:00


157 commits
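The completion convention from commit e361bcb64c (SVN r18551) can be sketched from the PML side as follows. This is an illustration under stated assumptions, not Open MPI code. The `SKETCH_*` constants, the `send_outcome_t` enum and `pml_classify_send()` are hypothetical, and negative return codes are assumed to mean errors. A set MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag is also assumed to make the PML wait for the callback, even when the BTL returns > 0.

```c
/* Hypothetical stand-ins for the real Open MPI constants. The commit only
 * says that OMPI_SUCCESS (0) keeps the old behavior and that > 0 means the
 * descriptor was already pushed to the network. Negative = error (assumed). */
#define SKETCH_SUCCESS                  0
#define SKETCH_DES_SEND_ALWAYS_CALLBACK 0x1u

typedef enum {
    SEND_DONE_NOW,       /* descriptor pushed; its memory is reusable now */
    SEND_WAIT_CALLBACK,  /* completion will be reported by the callback  */
    SEND_FAILED
} send_outcome_t;

/* Decide what the PML does after calling a BTL's send function. */
static send_outcome_t pml_classify_send(int rc, unsigned des_flags)
{
    if (rc < SKETCH_SUCCESS)
        return SEND_FAILED;
    if (rc > SKETCH_SUCCESS && !(des_flags & SKETCH_DES_SEND_ALWAYS_CALLBACK))
        return SEND_DONE_NOW;
    /* rc == 0 (e.g. an unmodified BTL), or the PML asked for the callback. */
    return SEND_WAIT_CALLBACK;
}
```

Because unmodified BTLs return 0, they always fall through to the callback path, which matches the commit's statement that their behavior does not change.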
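Commit e7ecd56bd2 (SVN r18434) describes duplicate suppression on the HNP. The first copy of a help message is printed, and every ~5 seconds the HNP reports how many new copies have arrived. That logic could look roughly like the sketch below. It is not the orte_show_help implementation, and it rests on several assumptions:

- messages are identified by their (help file, topic) pair;
- the table is a small fixed-size array;
- the caller passes in the current time;
- all names (`help_seen()`, `help_flush()`, `REPORT_INTERVAL`) are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical HNP-side aggregation table; not Open MPI code. */
#define MAX_TRACKED     64
#define REPORT_INTERVAL 5   /* seconds, per the "~5 seconds" in the commit */

typedef struct {
    char file[64];
    char topic[64];
    int  unreported;        /* duplicates seen since the last summary */
} help_entry_t;

static help_entry_t tracked[MAX_TRACKED];
static int ntracked = 0;
static time_t last_report = 0;

/* Returns 1 if this (file, topic) is new and should be displayed now,
 * 0 if it is a duplicate that was only counted. */
static int help_seen(const char *file, const char *topic)
{
    for (int i = 0; i < ntracked; ++i) {
        if (strcmp(tracked[i].file, file) == 0 &&
            strcmp(tracked[i].topic, topic) == 0) {
            tracked[i].unreported++;
            return 0;
        }
    }
    if (ntracked < MAX_TRACKED) {
        snprintf(tracked[ntracked].file, sizeof tracked[ntracked].file, "%s", file);
        snprintf(tracked[ntracked].topic, sizeof tracked[ntracked].topic, "%s", topic);
        tracked[ntracked].unreported = 0;
        ntracked++;
    }
    return 1;
}

/* Called periodically. At most once per REPORT_INTERVAL, prints one summary
 * line per message that gained duplicates; returns the number printed. */
static int help_flush(time_t now)
{
    int printed = 0;
    if (now - last_report < REPORT_INTERVAL)
        return 0;
    last_report = now;
    for (int i = 0; i < ntracked; ++i) {
        if (tracked[i].unreported > 0) {
            printf("%d more copies of help message %s / %s\n",
                   tracked[i].unreported, tracked[i].file, tracked[i].topic);
            tracked[i].unreported = 0;
            printed++;
        }
    }
    return printed;
}
```

With orte_base_help_aggregate set to 0, a caller built on this sketch would simply skip `help_seen()` and display every message.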
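Commits 7a59b3f58c (SVN r17307) and 20473bfda2 (SVN r18169) both tried to accept an incoming connection if its source address matched any of the peer's advertised addresses. Both were later reverted. The check itself can be sketched as below. The `peer_addrs_t` structure, its eight-address limit and the helper names are hypothetical. Ports are ignored because the connecting side normally uses an ephemeral source port. IPv6 addresses are compared with IN6_ARE_ADDR_EQUAL, as in commit 72b29bc21f.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#define PEER_MAX_ADDRS 8   /* hypothetical limit for the sketch */

/* Hypothetical record of every address a remote process advertised. */
typedef struct {
    struct sockaddr_storage addrs[PEER_MAX_ADDRS];
    int                     naddrs;
} peer_addrs_t;

/* Parse a textual IPv4 or IPv6 address into a zeroed sockaddr_storage. */
static int parse_ip(const char *ip, struct sockaddr_storage *out)
{
    struct sockaddr_in  *v4 = (struct sockaddr_in *)out;
    struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)out;
    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, ip, &v4->sin_addr) == 1) {
        v4->sin_family = AF_INET;
        return 0;
    }
    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET6, ip, &v6->sin6_addr) == 1) {
        v6->sin6_family = AF_INET6;
        return 0;
    }
    return -1;
}

static int peer_add(peer_addrs_t *peer, const char *ip)
{
    if (peer->naddrs >= PEER_MAX_ADDRS) return -1;
    if (parse_ip(ip, &peer->addrs[peer->naddrs]) != 0) return -1;
    peer->naddrs++;
    return 0;
}

/* 1 if the incoming source address equals ANY advertised peer address. */
static int peer_matches_source(const peer_addrs_t *peer, const struct sockaddr *src)
{
    for (int i = 0; i < peer->naddrs; ++i) {
        const struct sockaddr *a = (const struct sockaddr *)&peer->addrs[i];
        if (a->sa_family != src->sa_family)
            continue;
        if (a->sa_family == AF_INET) {
            const struct sockaddr_in *x = (const struct sockaddr_in *)a;
            const struct sockaddr_in *y = (const struct sockaddr_in *)src;
            if (x->sin_addr.s_addr == y->sin_addr.s_addr) return 1;
        } else if (a->sa_family == AF_INET6) {
            const struct sockaddr_in6 *x = (const struct sockaddr_in6 *)a;
            const struct sockaddr_in6 *y = (const struct sockaddr_in6 *)src;
            if (IN6_ARE_ADDR_EQUAL(&x->sin6_addr, &y->sin6_addr)) return 1;
        }
    }
    return 0;
}
```

The address check alone is simple. Per the r17307 message, the harder part was that the endpoint bookkeeping then always inserted the same endpoint.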
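Commit 6310ce955c (SVN r17140) uses an 8-bit tag whose first 3 bits belong to the framework and whose remaining 5 bits are managed by that framework. It also keeps a single global registration array indexed by tag. Both might be sketched as follows. The commit does not say which end of the byte is "first", so this sketch assumes the 3 most significant bits. All identifiers are hypothetical and are not Open MPI's actual macros.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed layout: [ 3 framework bits | 5 framework-local bits ]. */
typedef uint8_t am_tag_t;

#define AM_FRAMEWORK_BITS 3
#define AM_LOCAL_BITS     5
#define AM_LOCAL_MASK     ((1u << AM_LOCAL_BITS) - 1u)   /* 0x1F */

/* Build a tag from a framework id (0..7) and a local id (0..31). */
static am_tag_t am_tag_make(unsigned framework, unsigned local)
{
    return (am_tag_t)(((framework & 0x7u) << AM_LOCAL_BITS) | (local & AM_LOCAL_MASK));
}

static unsigned am_tag_framework(am_tag_t tag) { return (unsigned)tag >> AM_LOCAL_BITS; }
static unsigned am_tag_local(am_tag_t tag)     { return (unsigned)tag & AM_LOCAL_MASK; }

/* One global registration array with a slot per possible tag value (256),
 * instead of one array per BTL. */
typedef void (*am_callback_t)(am_tag_t tag, void *cbdata);

typedef struct {
    am_callback_t cbfunc;
    void         *cbdata;
} am_registration_t;

static am_registration_t am_registry[1u << (AM_FRAMEWORK_BITS + AM_LOCAL_BITS)];

static void am_register(am_tag_t tag, am_callback_t cb, void *data)
{
    am_registry[tag].cbfunc = cb;
    am_registry[tag].cbdata = data;
}

/* Dispatch by direct indexing; returns -1 if nothing is registered. */
static int am_dispatch(am_tag_t tag)
{
    if (am_registry[tag].cbfunc == NULL)
        return -1;
    am_registry[tag].cbfunc(tag, am_registry[tag].cbdata);
    return 0;
}
```

Indexing the array by tag replaces a switch statement in the receive path, which is the direction the commit describes for OB1.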
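Commits 337f78a4a8 (SVN r16584) and b1b5cb6453 (SVN r16588) together describe binding to a restricted port range with SO_REUSEADDR. Below is a minimal IPv4-only sketch. It assumes that a range of 0 means "unrestricted", so the kernel picks the port as before the change, and that ports are tried in increasing order. `bind_in_port_range()` is a hypothetical name, and the actual MCA parameter names for the minimum and range are not shown.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical sketch: bind to the first free port in
 * [port_min, port_min + range). range == 0 means "no restriction",
 * so port 0 is bound and the kernel chooses. Returns the bound port,
 * or -1 if nothing in the range could be bound. */
static int bind_in_port_range(int sd, int port_min, int range)
{
    struct sockaddr_in addr;
    int one = 1;

    /* SO_REUSEADDR is the portable option; SO_REUSEPORT is not everywhere. */
    if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) != 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    int first = (range == 0) ? 0 : port_min;
    int last  = (range == 0) ? 0 : port_min + range - 1;
    if (last > 65535)
        last = 65535;

    for (int port = first; port <= last; ++port) {
        addr.sin_port = htons((uint16_t)port);
        if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            /* Ask the kernel which port we actually got (matters for port 0). */
            socklen_t len = sizeof(addr);
            if (getsockname(sd, (struct sockaddr *)&addr, &len) != 0)
                return -1;
            return ntohs(addr.sin_port);
        }
    }
    return -1;
}
```

A failed bind() leaves the socket unbound, so the same descriptor can be retried with the next port. An IPv6 variant would do the same with `sockaddr_in6` and its own minimum and range, matching the per-protocol ranges in the commit.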
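Commit f4b117957d (SVN r15606) adds an MCA parameter to enable or disable Nagle's algorithm on the TCP BTL. At the socket level this comes down to the standard TCP_NODELAY option: setting it to 1 disables Nagle. The helpers below are a hypothetical sketch. The log gives neither the parameter's actual name nor its default, and neither is assumed here.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: apply a "use Nagle" setting to a TCP socket.
 * Nagle's algorithm is on by default; TCP_NODELAY = 1 turns it off. */
static int tcp_set_nagle(int sd, int use_nagle)
{
    int nodelay = use_nagle ? 0 : 1;
    return setsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));
}

/* Returns 1 if TCP_NODELAY is set (Nagle disabled), 0 if not, -1 on error.
 * The raw value is normalized because some systems may report a nonzero
 * flag value other than 1. */
static int tcp_get_nodelay(int sd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

Disabling Nagle trades more small packets on the wire for lower latency on short messages, which is the usual reason a message-passing transport exposes this switch.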