openmpi

Author	SHA1	Message	Date
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400 * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA frameworks versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we have already done the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not their opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will behave similarly to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
George Bosilca	6310ce955c	The first patch related to the Active Message stuff. So far, here is what we have: - the registration array is now global instead of one per BTL. - each framework has to declare the entries in the registration array it reserves. Then it has to define the internal way of sharing (or not) these entries between all components. As an example, the PML will not share as there is only one active PML at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3 are reserved for the framework while the remaining 5 are used internally by each framework. - The registration function is optional. If a BTL does not provide such a function, nothing happens. However, in the case where such a function is provided in the BTL structure, it will be called by the BML when a tag is registered. Now, it's time for the second step... Converting OB1 from a switch based PML to an active message one. This commit was SVN r17140.	2008-01-15 05:32:53 +00:00
George Bosilca	48f5a26e8c	Cast to keep VC happy (quiet). This commit was SVN r17054.	2008-01-04 23:13:32 +00:00
Gleb Natapov	8b511b969d	Introduce a new BTL parameter btl_rndv_eager_limit which determines size of a first fragment of rendezvous protocol. Remove no longer used btl_min_send_size parameter. This commit was SVN r16969.	2007-12-16 08:35:17 +00:00
Jeff Squyres	80e9730100	Per http://www.open-mpi.org/community/lists/devel/2007/12/2698.php and this thread: http://www.open-mpi.org/community/lists/devel/2007/12/2807.php, set TCP's exclusivity to LOW+100 and SCTP's exclusivity to LOW. This commit was SVN r16942.	2007-12-12 15:55:37 +00:00
Rich Graham	27a748e7eb	change all instances of ompi_free_list_init to ompi_free_list_init_new. Header and payload data are specified separately at this stage. This commit was SVN r16633.	2007-11-01 23:38:50 +00:00
George Bosilca	d67c0eefb4	Remove a compilation warning about using uninitialized variables. This commit was SVN r16589.	2007-10-26 20:15:28 +00:00
George Bosilca	b1b5cb6453	Looks like SO_REUSEPORT is not defined on some platforms. Switch to the conventional SO_REUSEADDR instead. This commit was SVN r16588.	2007-10-26 19:56:21 +00:00
George Bosilca	337f78a4a8	Restrict the port range for the OOB and the BTL. Each protocol (v4 and v6) has its own range, which is defined by a min value and a range. By default there is no limitation on the port range, which is exactly the same behavior as before. This commit was SVN r16584.	2007-10-26 16:36:51 +00:00
Shiqing Fan	a0660f4deb	- Just some type casts. This commit was SVN r16100.	2007-09-12 15:29:58 +00:00
Brian Barrett	59b22533f2	Enable RDMA for heterogeneous situations. Currently done by overloading the ompi_convertor_need_buffers function to only return 0 if the convertor is homogeneous (which it never does on the trunk, but does on v1.2, but that's a different issue). Only enable the heterogeneous rdma code for a btl if it supports it (via a flag), as some btls need some work for this to work properly. Currently only TCP and OpenIB have been extensively tested. This commit was SVN r15990.	2007-08-28 21:23:44 +00:00
Jeff Squyres	f4b117957d	Add MCA parameter to enable/disable Nagle's algorithm on the TCP BTL. This commit was SVN r15606.	2007-07-25 12:21:00 +00:00
Josh Hursey	d4d5a351c1	Silence a compiler warning when not using IPV6. Also convert a few statements to conform to coding standard for Open MPI. This commit was SVN r15407.	2007-07-13 16:38:36 +00:00
Jeff Squyres	8aa8a667da	Use the OMPI version number for the component number, like all other btl components. This commit was SVN r15363.	2007-07-11 15:45:25 +00:00
Brian Barrett	8b9e8054fd	Move modex from pml base to general ompi runtime, since it's used by more than just the PML/BTLs these days. Also clean up the code so that it handles the situation where not all nodes register information for a given node (rather than just spinning until that node sends information, like we do today). Includes r15234 and r15265 from the /tmp/bwb-modex branch. This commit was SVN r15310. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15234 r15265	2007-07-09 17:16:34 +00:00
Brian Barrett	f8fb1e9720	Fix some compile failures on Solaris 9 because it doesn't have V6ONLY. This commit was SVN r15237.	2007-06-28 18:52:15 +00:00
Gleb Natapov	b88b7dedfe	Rename btl_rdma_offset to btl_pipeline_send_length. This commit was SVN r15153.	2007-06-21 07:12:40 +00:00
Gleb Natapov	ac1e8f81af	Let's be real. TCP latency is slightly worse than mx/openib. This commit was SVN r14865.	2007-06-05 12:22:57 +00:00
Gleb Natapov	fbd033b162	Cut&Paste error in r14795. Fix. This commit was SVN r14862. The following SVN revision numbers were found above: r14795 --> open-mpi/ompi@6b0d8c0858	2007-06-05 10:07:06 +00:00
Gleb Natapov	6b0d8c0858	TCP BTL ignores btl_tcp_bandwidth parameter. Fix it. This commit was SVN r14795.	2007-05-30 14:12:05 +00:00
Gleb Natapov	f191834e56	No need for MCA_BTL_FLAGS_NEED_ACK any more. As of commit r14768 this is the default behaviour. This commit was SVN r14782. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 11:25:39 +00:00
Gleb Natapov	3ebaff8dfe	Implement new BTL parameters: We eagerly send data up to btl_*_eager_limit with the match. Upon ACK of the MATCH we start using send/receives of size btl_*_max_send_size up to the btl_*_rdma_pipeline_offset. After the btl_*_rdma_pipeline_offset we begin using RDMA writes of size btl_*_rdma_pipeline_frag_size. Now, on a per message basis we only use the above protocol if the message is larger than btl_*_min_rdma_pipeline_size. btl_*_eager_limit -> same; btl_*_max_send_size -> same; btl_*_rdma_pipeline_offset -> btl_*_min_rdma_size; btl_*_rdma_pipeline_frag_size -> btl_*_max_rdma_size; btl_*_min_rdma_pipeline_size is new. This patch also moves all BTL common parameters initialisation into btl_base_mca.c file. This commit was SVN r14681.	2007-05-17 07:54:27 +00:00
Brian Barrett	33a5758521	Some IPv6 improvements: * Move ipv6comat.h code into opal_config_bottom.h and change into some more intelligent testing of structures * Change opal's if interface to use sockaddr instead of sockaddr_storage, as the RFCs suggest we do * Move the networking code in opal that isn't directly related to if detection into net.h * Add quicky function to get the port out of either a sockaddr_in or sockaddr_in6, saving a bunch of code in the oob. * Update TCP oob and btl with new interface This commit was SVN r14679.	2007-05-17 01:17:59 +00:00
Brian Barrett	7708c4f887	Don't complain about unsupported protocols. Needs to be made better, but this will quit the whining from platforms where the kernel doesn't have IPv6 support. This commit was SVN r14676.	2007-05-16 20:11:47 +00:00
George Bosilca	46265db0a9	Update the TCP BTL in order to bring back some of the functionality lost during the IPv6 patch. The most important is the multi BTL support. There was a quite interesting bug. Instead of setting up the multiple connections over different physical devices, based on the time when these connections were created, most of the time they were all using the same physical network. Which, of course, was not the intended goal, as we top out at the maximum bandwidth available over one device instead of gathering all available bandwidth from all devices. Second, the IPv6 RFC suggests using sockaddr_storage as a holder for the IP information, but using a sockaddr* when we pass it to functions. This is only partially corrected by this patch. Some other minor cleanups. This commit was SVN r14544.	2007-04-28 19:13:47 +00:00
Adrian Knoth	e3d35258b4	Cosmetics. Brian fixes my crappy code and I fix the curly braces. That's teamwork, right? ;) This commit was SVN r14517.	2007-04-25 20:17:19 +00:00
Brian Barrett	4b8bb70afb	A couple cleanups for the IPv6 support: - make opal_sockaddr2str() take a sockaddr_storage instead of a sockaddr_in6 so that it works for IPv4 and IPv6 addresses, and remove a whole bunch of #ifs in the OOB code. - Fix a compiler warning in the TCP BTL due to run-time determined array size by making it a dynamically allocated array. - Fix the unpacking code of IPv4 addresses when using IPv6 support, so that the address is in the correct location (instead of in an IPv6 structure, use an IPv4 structure). Refs trac:1005. This commit was SVN r14514. The following Trac tickets were found above: Ticket 1005 --> https://svn.open-mpi.org/trac/ompi/ticket/1005	2007-04-25 19:08:07 +00:00
Jeff Squyres	c4c68e666a	Merge in the ipv6 work from /tmp/ipv6-merge. This commit was SVN r14503.	2007-04-25 01:55:40 +00:00
George Bosilca	8c9e4baa47	Add multi-link capabilities to the TCP BTL. This is useful for systems where the latency is high and the network relatively fast. This will allow for more kernel level buffering, which allows overlap between system calls and communications. Somehow, even on fast clusters there is an improvement (not significant). This patch creates multiple modules for the same device, which in turn will create multiple sockets between the peers. By default the number of BTLs per device is set to 1, so there is no fundamental difference with the current version. Change the value of btl_tcp_links to enable multiple links between peers. This commit was SVN r14076.	2007-03-20 11:50:17 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Brian Barrett	2ab65eb521	Remove some debugging output that was #if 0'ed out but shouldn't have been committed into the trunk anyway This commit was SVN r12897.	2006-12-19 02:34:41 +00:00
Brian Barrett	38c2e43ac2	Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues... This commit was SVN r12852.	2006-12-14 18:20:43 +00:00
Brian Barrett	441432950f	Merge in changes from the bwb-heterogeneous temp branch (r12491 - r12714) for supporting compilers / architectures with different padding rules. This commit was SVN r12749. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r12491 r12714	2006-12-04 20:11:42 +00:00
Gleb Natapov	30ca7457b4	Some BTLs (e.g., TCP) can report put/get completion before data actually hits the buffer on the other side. For this kind of BTL we need to send FIN through the same BTL the PUT was performed with, so the network will handle ordering for us. If we use another BTL, the receiver can get FIN before data hits the buffer and complete the request prematurely. We mark such problematic BTLs with the MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA is really fake, because the real one guarantees that the sender will see the completion only after the receiver's NIC confirmed that all the data was received). This commit was SVN r12732.	2006-12-03 10:12:09 +00:00
Brian Barrett	0895f5e08d	Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH} because they are in ORTE, not OMPI. Also, remove the ORTE_PROCESS_NAME macros in iof base as they are duplicates of the ones that were in ns_types, which meant that bad things happened if you changed what an orte_process_name_t looked like. This commit was SVN r12646.	2006-11-22 03:03:21 +00:00
Ralph Castain	37dfdb76eb	Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done. This commit was SVN r11661.	2006-09-14 21:29:51 +00:00
George Bosilca	3f0a7cad9e	The last patch for Windows support. Mostly casting and conversion to C++ friendly headers. This commit was SVN r11400.	2006-08-24 16:38:08 +00:00
Galen Shipman	e5c594c211	More updates for the async error handler for btl's. In order to provide backwards compatibility the framework versions are bumped and the handler registration function is at the end of the btl struct. Testing done on sm, openib, and gm.. This commit was SVN r11256.	2006-08-17 22:02:01 +00:00
Galen Shipman	3b49953ce2	Add error callback to the btl interface, this allows error to be delivered to the upper layer asynchronously although there are some issues with this.. such as there are multiple consumers of the btl's.. who get's the This commit was SVN r11232.	2006-08-16 20:21:38 +00:00
Ralph Castain	d2912f03e0	Cleanup a historical naming convention problem. Move the socket_errno definitions to the OPAL layer and change the name accordingly. This cleans up some interrelationship issues as well as removing a name confusion. This commit was SVN r11186.	2006-08-14 20:14:44 +00:00
Galen Shipman	38a0561d9b	Allow maximum send size to be less than the eager limit. Instead of figuring out which free list the fragment belongs to based on size we simply store a pointer to the list which it belongs in the fragment. This was reviewed by Brian and should hit all the branches. This commit was SVN r10072.	2006-05-25 16:57:14 +00:00
Brian Barrett	3e2c51dea8	* fix some silly commenting done by a previous developer that are good for a laugh but probably not good for usability ;) This commit was SVN r9253.	2006-03-11 03:09:24 +00:00
Brian Barrett	285581dff2	More endian-related cleanups: - moved hton64 and ntoh64 from the bunch of places it had been copied into one header file - properly set and use the btl_tcp's nbo option to put things in network byte order on the wire if both sides don't have the same endianness - Put the OB1 PML's headers (with a couple exceptions I need to discuss with Tim) in network byte order on the wire if both sides don't have the same endianness - since it was needed for the TCP BTL, move the orte_process_name_t HTON and NTOH macros from the TCP OOB to ns_types.h This commit was SVN r9145.	2006-02-26 00:45:54 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
George Bosilca	7baae4f394	Protect the headers and remove the unused ones. This commit was SVN r8439.	2005-12-10 22:04:28 +00:00
George Bosilca	5851b55647	Improve the latency for small and medium messages. The idea is to decrease the number of recv system calls by caching the data. Each endpoint has a buffer (the size is an MCA parameter) that can be used as a cache. Before each receive operation this buffer is added at the end of the iovec list. All data that is not expected by the fragment will go in this cache. If the cache contains data, all subsequent receives will just memcpy the data into the BTL buffers. The only drawback is that we will spin around the receive_handle until all the cached data is read by the PML layer. This limitation comes from the fact that the event library is unable to call us if there are no events on the socket. Therefore we are unable to keep the data in the cache until the next loop into the progress engine. This commit was SVN r8398.	2005-12-07 00:12:59 +00:00
Tim Woodall	62fd74140b	decrease socket buffers sizes to same as ptl code This commit was SVN r8072.	2005-11-10 00:40:55 +00:00


61 commits