openmpi

Автор	SHA1	Сообщение	Дата
Rolf vandeVaart	c82e468ede	Undo revision r21767 - sorry folks This commit was SVN r21769. The following SVN revision numbers were found above: r21767 --> open-mpi/ompi@41f38110ff	2009-08-05 22:23:26 +00:00
Rolf vandeVaart	41f38110ff	HCA failover support in openib BTL This commit was SVN r21767.	2009-08-05 21:53:02 +00:00
Rainer Keller	6c5532072a	- Split the datatype engine into two parts: an MPI specific part in OMPI and a language agnostic part in OPAL. The convertor is completely moved into OPAL. This offers several benefits as described in RFC http://www.open-mpi.org/community/lists/devel/2009/07/6387.php namely: - Fewer basic types (int* and float* types, boolean and wchar - Fixing naming scheme to ompi-nomenclature. - Usability outside of the ompi-layer. - Due to the fixed nature of simple opal types, their information is completely known at compile time and therefore constified - With fewer datatypes (22), the actual sizes of bit-field types may be reduced from 64 to 32 bits, allowing reorganizing the opal_datatype structure, eliminating holes and keeping data required in convertor (upon send/recv) in one cacheline... This has implications to the convertor-datastructure and other parts of the code. - Several performance tests have been run, the netpipe latency does not change with this patch on Linux/x86-64 on the smoky cluster. - Extensive tests have been done to verify correctness (no new regressions) using: 1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and ompi-ddt: a. running both trunk and ompi-ddt resulted in no differences (except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run correctly). b. with --enable-memchecker and running under valgrind (one buglet when run with static found in test-suite, commited) 2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt: all passed (except for the dynamic/ tests failed!! as trunk/MTT) 3. compilation and usage of HDF5 tests on Jaguar using PGI and PathScale compilers. 4. compilation and usage on Scicortex. - Please note, that for the heterogeneous case, (-m32 compiled binaries/ompi), neither ompi-trunk, nor ompi-ddt branch would successfully launch. This commit was SVN r21641.	2009-07-13 04:56:31 +00:00
Rainer Keller	221fb9dbca	... Delayed due to notifier commits earlier this day ... - Delete unnecessary header files using contrib/check_unnecessary_headers.sh after applying patches, that include headers, being "lost" due to inclusion in one of the now deleted headers... In total 817 files are touched. In ompi/mpi/c/ header files are moved up into the actual c-file, where necessary (these are the only additional #include), otherwise it is only deletions of #include (apart from the above additions required due to notifier...) - To get different MCAs (OpenIB, TM, ALPS), an earlier version was successfully compiled (yesterday) on: Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled This commit was SVN r21096.	2009-04-29 01:32:14 +00:00
Rainer Keller	9dea63d63a	- Last of intrusive commits (promised)... err for now. Anyway, this is blocking the move: do not include pml.h if not really needed, aka none of the following used: mca_pml MCA_PML_CALL OMPI_ANY_TAG OMPI_ANY_SOURCE OMPI_PROC_NULL - Notable exceptions (deleting in one header->adding): - ompi/mca/mtl/psm/ - ompi/mca/osc/rdma/ - ompi/mca/btl/openib/btl_openib_endpoint.c depended on pml_base_sendreq.h - Tested on Linux/x86-64, this time including make check (thanks Jeff and Ralph) This commit was SVN r20725.	2009-03-04 17:06:51 +00:00
Rainer Keller	811f2bd9b4	- As discussed on RFC, move the ompi_bitmap to the opal layer. Add a check against a maximum (actually get rid of ifs internally to opal_bitmap.c) -- the functionality to set the current maximum size opal_bitmap_set_max_size() is currently only used in attribute.c to set the maximum OMPI_FORTRAN_HANDLE_MAX... Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f run with 6 procs. Let's look into MTT as well... This commit was SVN r20708.	2009-03-03 22:25:13 +00:00
Rainer Keller	d81443cc5a	- On the way to get the BTLs split out and lessen dependency on orte: Often, orte/util/show_help.h is included, although no functionality is required -- instead, most often opal_output.h, or orte/mca/rml/rml_types.h Please see orte_show_help_replacement.sh commited next. - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration actually showed two missing #include "orte/util/show_help.h" in orte/mca/odls/base/odls_base_default_fns.c and in orte/tools/orte-top/orte-top.c Manually added these. Let's have MTT the last word. This commit was SVN r20557.	2009-02-14 02:26:12 +00:00
George Bosilca	7d404d238d	Update the tcp BTL to fulfill the requirements for #1713 . This commit was SVN r20151.	2008-12-17 22:15:12 +00:00
George Bosilca	1eb62b6c48	Remove a warning. Close ticket #1357 . This commit was SVN r18717.	2008-06-24 14:23:02 +00:00
George Bosilca	54e7e03695	One less warning. This commit was SVN r18695.	2008-06-20 17:50:19 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
George Bosilca	e361bcb64c	Send optimizations. 1. The send path get shorter. The BTL is allowed to return > 0 to specify that the descriptor was pushed to the networks, and that the memory attached to it is available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag can be used by the PML to force the BTL to always trigger the callback. Unmodified BTL will continue to work as expected, as they will return OMPI_SUCCESS which force the PML to have exactly the same behavior as before. Some BTLs have been modified: self, sm, tcp, mx. 2. Add send immediate interface to BTL. The idea is to have a mechanism of allowing the BTL to take advantage of send optimizations such as the ability to deliver data "inline". Some network APIs such as Portals allow data to be sent using a "thin" event without packing data into a memory descriptor. This interface change allows the BTL to use such capabilities and allows for other optimizations in the future. All existing BTLs except for Portals and sm have this interface set to NULL. This commit was SVN r18551.	2008-05-30 03:58:39 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already make the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not thei opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). * Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so you if explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent view the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing to messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. = New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
George Bosilca	944453c4c1	Cleanups. This commit was SVN r18068.	2008-04-02 06:37:42 +00:00
George Bosilca	fa31ec81d0	Add the ownership flags to the PML/BTL interface. The layer owning the descriptor is responsible for releasing it once the descriptor is not in use anymore. This commit was SVN r17497.	2008-02-18 17:39:30 +00:00
George Bosilca	6310ce955c	The first patch related to the Active Message stuff. So far, here is what we have: - the registration array is now global instead of one by BTL. - each framework have to declare the entries in the registration array reserved. Then it have to define the internal way of sharing (or not) these entries between all components. As an example, the PML will not share as there is only one active PML at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3 are reserved for the framework while the remaining 5 are use internally by each framework. - The registration function is optional. If a BTL do not provide such function, nothing happens. However, in the case where such function is provided in the BTL structure, it will be called by the BML, when a tag is registered. Now, it's time for the second step... Converting OB1 from a switch based PML to an active message one. This commit was SVN r17140.	2008-01-15 05:32:53 +00:00
Gleb Natapov	e2e211f23b	Add flags parameter to btl_alloc() and btl_prepare_src() functions. If BTL knows at the time of allocation priority of a descriptor it may do some optimizations. This commit was SVN r16901.	2007-12-09 14:08:01 +00:00
Gleb Natapov	7364b7cf47	Add endpoint parameter to btl_alloc() function. Enables various optimizations inside BTL. This commit was SVN r16898.	2007-12-09 14:00:42 +00:00
Gleb Natapov	b88b7dedfe	Rename btl_rdma_offset to btl_pipeline_send_length. This commit was SVN r15153.	2007-06-21 07:12:40 +00:00
Galen Shipman	3401bd2b07	Add optional ordering to the BTL interface. This is required to tighten up the BTL semantics. Ordering is not guaranteed, but, if the BTL returns a order tag in a descriptor (other than MCA_BTL_NO_ORDER) then we may request another descriptor that will obey ordering w.r.t. to the other descriptor. This will allow sane behavior for RDMA networks, where local completion of an RDMA operation on the active side does not imply remote completion on the passive side. If we send a FIN message after local completion and the FIN is not ordered w.r.t. the RDMA operation then badness may occur as the passive side may now try to deregister the memory and the RDMA operation may still be pending on the passive side. Note that this has no impact on networks that don't suffer from this limitation as the ORDER tag can simply always be specified as MCA_BTL_NO_ORDER. This commit was SVN r14768.	2007-05-24 19:51:26 +00:00
George Bosilca	7459ab45f1	This is the complete commit for the TCP header issue. Jeff commit a partial fix (r14749) and then backed it out (r14753). As we are unable to send more than a 32 bits length over TCP in one go, there is no reason to have an uint64 length in the header. This reduce the size of the TCP header. This commit was SVN r14755. The following SVN revision numbers were found above: r14749 --> open-mpi/ompi@48c026ce6b r14753 --> open-mpi/ompi@28ed850b4c	2007-05-24 16:40:49 +00:00
Gleb Natapov	3ebaff8dfe	Implement new BTL parameters: We eagerly send data up to btl__eager_limit with the match Upon ACK of the MATCH we start using send/receives of size btl__max_send_size up to the btl__rdma_pipeline_offset After the btl__rdma_pipeline_offset we begin using RDMA writes of size btl__rdma_pipeline_frag_size. Now, on a per message basis we only use the above protocol if the message is larger than btl__min_rdma_pipeline_size btl__eager_limit - > same btl__max_send_size -> same btl__rdma_pipeline_offset -> btl__min_rdma_size btl__rdma_pipeline_frag_size -> btl__max_rdma_size btl_*_min_rdma_pipeline_size is new.. This patch also moves all BTL common parameters initialisation into btl_base_mca.c file. This commit was SVN r14681.	2007-05-17 07:54:27 +00:00
George Bosilca	46265db0a9	Update the TCP BTL in order to bring back some of the functionalities lost during the IPv6 patch. The most important is the multi BTL support. There was a quite interesting bug. Instead of setting up the multiple connections over different physical devices, based on the time when these connections were created most of the time they were all using the same physical network. Which, of course, was not the intended goal, as we top at the maximum bandwidth available over one device instead of gathering all available bandwidth from all devices. Second, the IPv6 RFC suggest to use sockaddr_storage as a holder for the IP information, but use a sockaddr* when we pass it to functions. This is only partially corrected by this patch. Some other minor cleanups. This commit was SVN r14544.	2007-04-28 19:13:47 +00:00
Jeff Squyres	c4c68e666a	Merge in the ipv6 work from /tmp/ipv6-merge. This commit was SVN r14503.	2007-04-25 01:55:40 +00:00
Josh Hursey	8f119d9063	Closes trac:977 Fix for memory corruption in the restarted process stack. This stemed from the brute force method we were previously using. This commit fixes this by using a lighter weight solution focused in the r2 BML instead of above the PML. This is a more efficient and flexible solution, and it solves the original problem. In the process I pulled out the ft_event function in the tcp BTL and r2 BML into a set of *_ft.[c\|h] files just to keep any updates to these code paths as isolated as possible to make merging easier on everyone. This commit was SVN r14371. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855 The following Trac tickets were found above: Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977	2007-04-14 02:06:05 +00:00
George Bosilca	667bda0fef	Rework the code a little bit to make things simpler. This commit was SVN r14203.	2007-04-03 16:05:51 +00:00
George Bosilca	1cb26e3b9c	Finally the convertor export a convenience function to allow a consistent computation of the current location on the pack/unpack process. This can be used both for retrieving the pointer to the first byte (in the special case of the cached RDMA protocol) and for getting the current position (for the pipelined protocol). I modified all BTLs, but most of them are still untested. This commit was SVN r14180.	2007-03-30 22:02:45 +00:00
George Bosilca	8c9e4baa47	Add multi-link capabilities to the TCP BTL. This is useful for systems where the latency is high and the network relatively fast. This will allow for more kernel level buffering, which allow overlap between system calls and communications. Somehow, even on fast clusters there is an improvement (non significant). This patch create multiple modules for the same device, which in turn will create multiple sockets between the peers. By default the number of BTL by device is set to 1, so there is no fundamental difference with the current version. Change the value of btl_tcp_links to enable multiple links between peers. This commit was SVN r14076.	2007-03-20 11:50:17 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Brian Barrett	48ec0b2071	Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix for now... This commit was SVN r12997. The following SVN revision numbers were found above: r12974 --> open-mpi/ompi@27cea44a9c	2007-01-04 22:07:37 +00:00
Brian Barrett	27cea44a9c	Fix a number of issues with the ompi_ptr_t: * Make sure that the pval always writes to the correct portion of the lval. This only matters on 32 bit big endian machines. * On 32 bit machines when assigning to pval, the other 4 bytes of lval weren't being written, which could lead to bogus data We use macros so that there aren't casts all over the code and the pval assignment can occur to the correct 4 bytes. Refs trac:587 This commit was SVN r12974. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-03 19:47:48 +00:00
Brian Barrett	33320b7165	Rework the opal_progress interface to better support dynamic processes and at the same time, remove some of the MPI-related options from OPAL: - provide mechanism to change at runtime whether sched_yield() should be called when the progress engine is idle - provide mechanism for changing the rate at which the event engine is called when there are "no" users of the event engine (ie, when using MPI but not TCP) - fix some function names in the progress engine to better match their intended use (and remove MPI naming scheme) - remove progress_mpi_enable / progress_mpi_disable because we can now use the functions to set the sched_yield and tick rate interfaces - rename opal_progress_events() to opal_progress_set_event_flag() because the first really isn't descriptive of what the function does and I always got confused by it This commit was SVN r12645.	2006-11-22 02:06:52 +00:00
George Bosilca	126a68dc9a	Big datatype commit. Remove all unused features of the datatype engine. As the memory allocation logic is completely done outside the data-type engine (in the PML) there is no need for any special case inside the data-type engine. There is less arguments for the ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is not required anymore as there is no memory allocated in the engine itself). This change affect all components using datatypes. I test most of them, but it might happens that I miss some ... If it's the case please let me know (don't shoot the pianist!!). This commit was SVN r12331.	2006-10-26 23:11:26 +00:00
George Bosilca	640178c4b3	Grepping through the source files I found these calls to the data-type engine with the wrong type of arguments. This commit was SVN r12148.	2006-10-17 21:05:04 +00:00
George Bosilca	3f0a7cad9e	The last patch for Windows support. Mostly casting and conversion to C++ friendly headers. This commit was SVN r11400.	2006-08-24 16:38:08 +00:00
Galen Shipman	e5c594c211	More updates for the async error handler for btl's In order to provide backwards compatability the framework versions are bumped and the handler registeration function is at the end of the btl struct. Testing done on sm, openib, and gm.. This commit was SVN r11256.	2006-08-17 22:02:01 +00:00
Galen Shipman	3b49953ce2	Add error callback to the btl interface, this allows error to be delivered to the upperlayer assynchronously although there are some issues with this.. such as there are multiple consumers of the btl's.. who get's the This commit was SVN r11232.	2006-08-16 20:21:38 +00:00
Galen Shipman	38a0561d9b	Allow maximum send size to be less than the eager limit. Instead of figuring out which free list the fragment belongs to based on size we simply store a pointer to the list which it belongs in the fragment. This was reviewed by Brian and should hit all the branches. This commit was SVN r10072.	2006-05-25 16:57:14 +00:00
George Bosilca	085cac552f	Don't let TCP to create local connections, we have the self BTL for this purpose. This commit was SVN r10018.	2006-05-23 03:06:32 +00:00
Tim Woodall	712468dbef	add diagnostic interface This commit was SVN r9328.	2006-03-17 17:39:41 +00:00
Brian Barrett	285581dff2	More endian-related cleanups: - moved hton64 and ntoh64 from the bunch of places it had been copied into one header file - properly set and use the btl_tcp's nbo option to put things in network byte order on the wire if both sides don't have the same endianness - Put the OB1 PML's headers (with a couple exceptions I need to discuss with Tim) in network byte order on the wire if both sides don't have the same endianness - since it was needed for the TCP BTL, move the orte_process_name_t HTON and NTOH macros from the TCP OOB to ns_types.h This commit was SVN r9145.	2006-02-26 00:45:54 +00:00
Galen Shipman	e58b758031	standardize behavior of btl_alloc, if the size is larger than the max send size, btl_alloc returns NULL. This commit was SVN r9114.	2006-02-22 17:37:59 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
George Bosilca	01b0db91ae	Get the lower-bound from the data not from the convertor. This commit was SVN r8444.	2005-12-10 22:38:25 +00:00
Tim Woodall	1929a97d2f	corrections for MPI_BOTTOM This commit was SVN r8429.	2005-12-09 23:27:55 +00:00
George Bosilca	b9a739e2b6	Remove 2 useless assignments (they are done at the end before the return). This commit was SVN r8260.	2005-11-26 21:16:30 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
George Bosilca	6b3d02b514	Warning cleanups. On some OSes the iov_base member of the iovec structure is defined as an void * when on others as an char. Thus the right side of all assignment should be explicitly casted to an void in order to avoid any casting complaints from the compilers. This commit was SVN r7607.	2005-10-04 12:36:07 +00:00
George Bosilca	c9fb1f32f2	And more dependencies fixes. The big commit will follow shortly. This commit was SVN r7319.	2005-09-12 20:22:59 +00:00
Tim Woodall	d34e299829	correctly decrement progress_event if tcp is not being used so that tcp doesn't impact progress loop This commit was SVN r7078.	2005-08-29 17:29:58 +00:00

1 2

56 Коммитов