openmpi

Author	SHA1	Message	Date
George Bosilca	271eb11f28	Remove an unused statically defined function. This commit was SVN r21157.	2009-05-05 13:23:49 +00:00
Brian Barrett	77cf736f48	Make max_contextid field match same type as cid in communicator. Refs trac:1904 This commit was SVN r21141. The following Trac tickets were found above: Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904	2009-05-01 21:11:59 +00:00
Ralph Castain	f6da7d86a2	Propagate Brian's change (abort if we run out of CIDs) to the csum module. This commit was SVN r21137.	2009-05-01 15:09:44 +00:00
Brian Barrett	736debcffc	Check during communicator creation that we didn't get assigned a CID we can't handle, so that the code aborts instead of hanging. Refs trac:1904 This commit was SVN r21133. The following Trac tickets were found above: Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904	2009-04-30 19:23:57 +00:00
Josh Hursey	13a3453e35	more copyright fixes - sorry This commit was SVN r21129.	2009-04-30 16:41:50 +00:00
Josh Hursey	1f42065950	Make sure the CRCPW wrapper does not try to reference a NULL value in MPI_Finalize(), due to the ordering of pml_finalize and comm_del. Some of the PML interfaces are noops in BKMRK. Allow the CRCPW to detect and skip the call to these functions. This commit was SVN r21126.	2009-04-30 16:36:50 +00:00
Rainer Keller	221fb9dbca	... Delayed due to notifier commits earlier this day ... - Delete unnecessary header files using contrib/check_unnecessary_headers.sh, after applying patches that add headers which would otherwise be "lost" because they were only pulled in by one of the now-deleted headers... In total 817 files are touched. In ompi/mpi/c/, header files are moved up into the actual C file where necessary (these are the only additional #includes); otherwise it is only deletions of #include (apart from the above additions required due to notifier...) - To get different MCAs (OpenIB, TM, ALPS), an earlier version was successfully compiled (yesterday) on: Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled; Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled; Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled. This commit was SVN r21096.	2009-04-29 01:32:14 +00:00
Rainer Keller	6c1cce8761	- For the upcoming header cleanup commit, several header files (previously included by header-files) now have to be moved "upward". This is mainly system headers such as string.h, stdio.h and for networking, but also some orte headers. This commit was SVN r21095.	2009-04-29 00:49:23 +00:00
Shiqing Fan	3d4e0472d6	Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux. This commit was SVN r21069.	2009-04-24 16:39:33 +00:00
Nysal Jan	5353236a53	Reduce the size of the FIN header by 8 bytes. This is done by rearranging the fields to reduce the amount of compiler padding. This commit was SVN r21046.	2009-04-21 14:41:51 +00:00
Ralph Castain	2b0b9dd227	Sync to change in a tmp branch. This commit was SVN r21015.	2009-04-15 13:09:51 +00:00
Ralph Castain	46d6c6d516	Sync the csum module with the recent ob1 changes. This commit was SVN r21002.	2009-04-14 18:40:54 +00:00
Nysal Jan	221447ef17	Fix checksum mismatch on big-endian systems when heterogeneous mode is enabled. This commit was SVN r21001.	2009-04-14 17:21:38 +00:00
Nysal Jan	697f1837f4	Move fix for ticket #1875 to csum PML. This commit was SVN r20986.	2009-04-14 10:44:29 +00:00
George Bosilca	b5deb228f3	Allow the BTL to release the descriptor. In fact, the only thing the PML needs is to be involved in the RMA completion process, which is ensured by the MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag. Fixes trac:1875. This commit was SVN r20983. The following Trac tickets were found above: Ticket 1875 --> https://svn.open-mpi.org/trac/ompi/ticket/1875	2009-04-13 23:41:50 +00:00
George Bosilca	527540aeb1	Rename req_bytes_delivered to req_bytes_expected for the receive requests to really reflect what this field means. This commit was SVN r20971.	2009-04-10 16:36:20 +00:00
George Bosilca	c148d33eb5	Play nicely with the reference count on the ompi_proc structure. This commit was SVN r20970.	2009-04-10 16:32:02 +00:00
Nysal Jan	1decf8bf36	Move ob1 FIN/ACK fixes to csum PML. This commit was SVN r20954.	2009-04-08 10:43:35 +00:00
George Bosilca	dfc7cea329	Fix the deadlock issues on osu_bw. The problem is that the PML is event driven, and if there are no events generated by the BTLs ... well, nothing happens (i.e., there is no progress at the PML level and all pending fragments remain pending). By forcing the BTL to trigger the callbacks for all ACK and FIN, we give the PML more opportunities to make real progress, but we pay for this in terms of performance. This commit was SVN r20953.	2009-04-07 16:56:37 +00:00
George Bosilca	44ce610b8b	Add a comment to highlight the fact that this function reappends the FIN message to the pending list when the send fails. Therefore, upper-level functions are not required to add it. Make sure we don't send the FIN twice. This commit was SVN r20952.	2009-04-07 16:48:58 +00:00
George Bosilca	ccb79b963f	This is the other half of commit r20946, as I mixed them up between two of my testing machines. The fix requires both commits! This commit was SVN r20947. The following SVN revision numbers were found above: r20946 --> open-mpi/ompi@e2bb4c9b8f	2009-04-06 21:49:52 +00:00
George Bosilca	e2bb4c9b8f	Correct the handling of the pckt_pending list. The problem was that we returned the packet before copying the values out. With this change it seems to work on at least two architectures (even with the mpool size set back to 0). This commit was SVN r20946.	2009-04-06 21:45:08 +00:00
Nysal Jan	5032f59edf	Fix checksum computation in the buffered send code. This commit was SVN r20935.	2009-04-03 07:09:24 +00:00
Ralph Castain	ba1a98c398	Fix a warning message by pointing to the correct header. This commit was SVN r20930.	2009-04-02 13:54:59 +00:00
Nysal Jan	e561a6c43a	Add missing checksum calculation. This fixes a checksum mismatch failure while using the TCP BTL. This commit was SVN r20927.	2009-04-01 20:48:35 +00:00
Nysal Jan	aff903f39c	Don't print this message by default. This commit was SVN r20914.	2009-04-01 14:31:21 +00:00
Ralph Castain	cba3708893	Clean up debugging output, remove an unnecessary re-compute of the checksum. This commit was SVN r20895.	2009-03-30 17:09:32 +00:00
Ralph Castain	d5e6104035	Continue to cleanup the csum pml module. Some minor corrections and debug output added. This commit was SVN r20894.	2009-03-29 23:27:06 +00:00
Ralph Castain	f72e3ba9f9	Update the PML base send init macro to take a converter_flag field (discussed with George). Update the csum pml module - still not quite right, but closer. Modify the LANL platform files to keep pace. This commit was SVN r20859.	2009-03-24 19:12:53 +00:00
Ralph Castain	d88df53a86	A touch more cleanup. Also, bring over the peruse cleanups from r20844. This commit was SVN r20849. The following SVN revision numbers were found above: r20844 --> open-mpi/ompi@daba352af4	2009-03-24 01:36:31 +00:00
Ralph Castain	78323fd6b2	Minor cleanups to compile without warnings. This commit was SVN r20848.	2009-03-24 00:54:16 +00:00
Ralph Castain	75ca19d1d1	Turn off a function that hasn't been added to the code base yet... This commit was SVN r20847.	2009-03-23 23:56:11 +00:00
Ralph Castain	17f51a0389	Add a new PML module that acts as a "mini-dr" - when requested, it performs a dr-like checksum on messages for BTLs that require it, as specified by MCA params. Add two new configure options that specify: 1. when to add padding to the openib control header - this only happens when the configure option is specified; 2. when to use the dr-like checksum as opposed to the memcpy checksum. Not selectable at runtime - to eliminate performance impacts, this is a configure-only option. Also removed an unused checksum version from opal/util/crc.h. The new component still needs a little cleanup and some sync with recent ob1 bug fixes. It was created as a separate module to avoid performance hits in ob1 itself, though most of the code is duplicative. The component is only selectable by either specifying it directly, or configuring with the dr-like checksum -and- setting -mca pml_csum_enable_checksum 1. Modify the LANL platform files to take advantage of the new module. This commit was SVN r20846.	2009-03-23 23:52:05 +00:00
George Bosilca	daba352af4	As the request is not yet updated (i.e. _MATCHED cannot be called as we don't yet know the expected length of the message) we should use the source and tag from the message header instead of the value from the status structure attached to the request. This commit was SVN r20844.	2009-03-23 20:25:53 +00:00
Aurelien Bouteiller	fa9b6e729b	Fix missing file in Makefile.am and the "CREATE FAILURE". This commit was SVN r20821.	2009-03-18 13:42:48 +00:00
Rainer Keller	6f808d9b05	Preparation work for another commit (after RFC): - This patch solely _adds_ required headers and is rather localized. The next patch (after RFC) heavily removes headers (based on script) - ompi/communicator/communicator.h: For sources that use ompi_mpi_comm_world, don't require them to include "mpi.h" - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs #include "ompi/mca/topo/topo.h" - ompi/errhandler/errhandler_predefined.h: ompi/communicator/communicator.h depends on this header file! To prevent recursion, just have fwd declarations. #include "ompi/types.h" for fwd declarations of the main structs. - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and ompi_rb_tree_t, so have the proper classes - ompi/mca/op/op.h: Op is pretty self-contained: Nobody up to now has done #include "opal/class/opal_object.h" - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/pml/base/base.h: We use opal_lists - ompi/mca/pml/dr/pml_dr_vfrag.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/pml/ob1/pml_ob1_hdr.h: #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t - opal/dss/dss_unpack.c: #include "opal/types.h" - opal/mca/base/base.h: #include "opal/util/cmd_line.h" for opal_cmd_line_t - orte/mca/oob/tcp/oob_tcp.c: #include "opal/types.h" for opal_socklen_t - orte/mca/oob/tcp/oob_tcp.h: #include "opal/threads/threads.h" for opal_thread_t - orte/mca/oob/tcp/oob_tcp_msg.c: #include "opal/types.h" - orte/mca/oob/tcp/oob_tcp_peer.c: #include "opal/types.h" for opal_socklen_t - orte/mca/oob/tcp/oob_tcp_send.c: #include "opal/types.h" - orte/mca/plm/base/plm_base_proxy.c: #include "orte/util/name_fns.h" for ORTE_NAME_PRINT - orte/mca/rml/base/rml_base_receive.c: #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE - orte/mca/rml/oob/rml_oob_recv.c: #include "opal/types.h" for ompi_iov_base_ptr_t - orte/mca/rml/oob/rml_oob_send.c: #include "opal/types.h" for ompi_iov_base_ptr_t - orte/runtime/orte_data_server.c: #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE - orte/runtime/orte_globals.h: #include "orte/util/name_fns.h" for ORTE_NAME_PRINT. Tested on Linux/x86-64. This commit was SVN r20817.	2009-03-17 21:34:30 +00:00
Aurelien Bouteiller	3cd5a0d833	Support for the MPI event logger, improving event logging performance. This commit was SVN r20804.	2009-03-17 17:35:28 +00:00
Rainer Keller	d8cf4c0fec	- Get pgcc on XT to complain less: In case we use memcmp, strlen, strdup and friends, include <string.h>. Also, several constants.h are not included directly - Let's have mca_topo_base_cart_create return ompi-errors in ompi/mca/topo/base/topo_base_cart_create.c This commit was SVN r20773.	2009-03-13 02:10:32 +00:00
Rainer Keller	ec0ed48718	- Revert r20739 This commit was SVN r20742. The following SVN revision numbers were found above: r20739 --> open-mpi/ompi@781caee0b6	2009-03-05 21:56:03 +00:00
Rainer Keller	a94438343b	- Revert r20740 This commit was SVN r20741. The following SVN revision numbers were found above: r20740 --> open-mpi/ompi@2a70618a77	2009-03-05 21:50:47 +00:00
Rainer Keller	2a70618a77	- Second patch, as discussed in Louisville. Replace short macros in orte/util/name_fns.h with the actual function calls. - Compiles on linux/x86-64 This commit was SVN r20740.	2009-03-05 21:14:18 +00:00
Rainer Keller	781caee0b6	- First of two or three patches, in orte/util/proc_info.h: Adapt orte_process_info to orte_proc_info, and change orte_proc_info() to orte_proc_info_init(). - Compiled on linux-x86-64 - Discussed with Ralph This commit was SVN r20739.	2009-03-05 20:36:44 +00:00
Rainer Keller	fd28b392bf	- An intrusive commit yet again (sorry): with the separation we get bitten by headers depending on having already included the corresponding [opal|orte|ompi]_config.h header. When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC are missed. Script to add the corresponding header in front of all following (taking care of possible #ifdef HAVE_...) - Including some minor cleanups to - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H - ompi/mca/btl/btl.h -- include _after_ #ifndef MCA_BTL_H - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for orte/util/output.h - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h - ompi/mca/btl/btl.h -- reorder to fit - ompi/mca/bml/bml.h -- reorder to fit - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit - ompi/request/request.h -- additionally need ompi/constants.h - Tested on linux/x86-64 This commit was SVN r20720.	2009-03-04 15:35:54 +00:00
Eugene Loh	efe8c3a283	Initialize reuse_old_request properly at the beginning of each loop iteration in pml_ob1_start.c. This commit was SVN r20712.	2009-03-04 06:58:36 +00:00
Rainer Keller	811f2bd9b4	- As discussed on RFC, move the ompi_bitmap to the opal layer. Add a check against a maximum (actually get rid of ifs internally to opal_bitmap.c) -- the functionality to set the current maximum size opal_bitmap_set_max_size() is currently only used in attribute.c to set the maximum OMPI_FORTRAN_HANDLE_MAX... Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f run with 6 procs. Let's look into MTT as well... This commit was SVN r20708.	2009-03-03 22:25:13 +00:00
Rainer Keller	02416033ad	- Get rid of warning on function declarations: First "static inline", then the type This commit was SVN r20657.	2009-02-28 14:15:34 +00:00
George Bosilca	e181ba50c9	Stop valgrind from complaining about a few uninitialized bytes in the PML headers. This feature is enabled only in debug mode when the heterogeneous support is enabled. This commit was SVN r20648.	2009-02-27 05:24:06 +00:00
Rainer Keller	04567d3af0	- Header orte/mca/errmgr/errmgr.h is not needed. Once again compiles fine with -Wimplicit-function-declaration This commit was SVN r20640.	2009-02-26 04:05:30 +00:00
Rainer Keller	96e1b9b747	- Header orte/mca/rml/rml.h is not needed if there is no occurrence of orte_rml or ORTE_RML. As with the others, compiles fine with -Wimplicit-function-declaration This commit was SVN r20639.	2009-02-26 03:52:31 +00:00
Rainer Keller	b356e90fa1	- Get rid of include orte/util/proc_info.h, if not needed Only proc_info.h-internal include file is opal/dss/dss_types.h - In one case (orte/util/hnp_contact.c) had to add proc_info.h again. - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration works fine, no errors. Again, let's have MTT the last word. This commit was SVN r20631.	2009-02-25 03:38:00 +00:00
Terry Dontje	0178b6c45f	Added padding to predefined handle structures to maintain library version to version compatibility. This commit was SVN r20627.	2009-02-24 17:17:33 +00:00
George Bosilca	97a2296fdd	Correct the GET protocol. Thanks to Mike Dubman for finding the problem and testing my patch. This commit was SVN r20591.	2009-02-19 16:00:15 +00:00
Rainer Keller	d81443cc5a	- On the way to get the BTLs split out and lessen dependency on orte: Often, orte/util/show_help.h is included, although no functionality is required -- instead, most often opal_output.h, or orte/mca/rml/rml_types.h. Please see orte_show_help_replacement.sh, committed next. - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration actually showed two missing #include "orte/util/show_help.h" in orte/mca/odls/base/odls_base_default_fns.c and in orte/tools/orte-top/orte-top.c. Manually added these. Let's have MTT the last word. This commit was SVN r20557.	2009-02-14 02:26:12 +00:00
Jeff Squyres	91415c2996	Some minor valgrind-inspired cleanups: fix some memory leaks This commit was SVN r20542.	2009-02-13 03:45:11 +00:00
George Bosilca	4747a4bb53	ompi_comm_all allocates memory and retains the objects. Therefore, after each call to ompi_comm_all we should parse the communicator list and release the objects ... This commit was SVN r20525.	2009-02-11 21:48:11 +00:00
Jeff Squyres	679e2855b7	Fix CID 1135: the assignment to item was never used (it was overwritten in the next loop iteration "item = next_item"). This commit was SVN r20189.	2009-01-03 15:15:42 +00:00
George Bosilca	4d5fbc5955	Remove unused lock from the ompi_proc_t. This reduces the size of the ompi_proc_t by 64 bytes. Remove the useless pml_proc from the PML layer. This commit was SVN r20157.	2008-12-19 19:56:27 +00:00
Shiqing Fan	a5281f0434	- 1/4 commit for Windows Visual Studio and CCP support: CMakeLists and .windows files. In contribs preconfigured and precompiled parts. This commit was SVN r20108.	2008-12-10 20:59:20 +00:00
Ralph Castain	1ace83c470	Enable modex-less launch. Consists of: 1. minor modification to include two new opal MCA params: (a) opal_profile: outputs what components were selected by each framework currently enabled for most, but not all, frameworks (b) opal_profile_file: name of file that contains profile info required for modex 2. introduction of two new tools: (a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with opal_profile set. Also reports back the rml IP address for all interfaces on the node (b) ompi-profiler: uses ompi-probe to create the profile_file, also reports out a summary of what framework components are actually being used to help with configuration options 3. modification of the grpcomm basic component to utilize the profile file in place of the modex where possible 4. modification of orterun so it properly sees opal mca params and handles opal_profile correctly to ensure we don't get its profile 5. similar mod to orted as for orterun 6. addition of new test that calls orte_init followed by calls to grpcomm.barrier. This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. A minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTLs for modex-less operation should let me know and I'll give you the magic details. This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in the launch message) as an option for removing that constraint. This commit was SVN r20098.	2008-12-09 23:49:02 +00:00
Shiqing Fan	abd21b6d17	- An update for memchecker : 1. fix a bug in pml_ob1_recvreq/sendreq.c, buffer was made defined where the request has already been released. 2. complete memchecker support for collective functions. 3. change the wrongly spelled function name of memchecker, i.e. '_isaddressible' should be '_isaddressable' This commit was SVN r20043.	2008-11-27 16:34:02 +00:00
Jeff Squyres	bb0b5b04bd	Remove duplicate copyright notice (found by script). This commit was SVN r19984.	2008-11-12 17:42:40 +00:00
George Bosilca	82d1d5d785	The patch for "Unexpected message queue for unknown CID's required" ticket #1460. I'm unable to split it into two parts (my patch and Edgar's), so I just updated the copyright information for both of us. What this patch does: - it uses the unexpected queue created by commit r19562 to dispatch the unexpected message to the right communicator (once this communicator is created and initialized). - delay the PML comm_add until we have the context_id for the new communicator. - only do the PML comm_add on processes that really belong to the new communicator. Please read the lengthy comment in the source code for the reason behind this. This commit was SVN r19929. The following SVN revision numbers were found above: r19562 --> open-mpi/ompi@acd3406aa7	2008-11-04 21:58:06 +00:00
George Bosilca	9260f6b157	There is no reason to ask for an ACK from the BTL here. This commit was SVN r19789.	2008-10-22 20:13:33 +00:00
Josh Hursey	88aa45dd52	Commit to bring online OpenIB, MX, and shared memory support for Open MPI's checkpoint/restart functionality. Some tuning is still needed, but basic functionality is in place. There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539. Additionally: * Fix a file cleanup bug in CRS Base. * Fix a possible deadlock in the TCP ft_event function * Add a mca_base_param_deregister() function to MCA base * Add whole process checkpoint timers * Add support for BTL: OpenIB, MX, Shared Memory * Add support for Mpool: rdma, sm * Sundry bounds checking and cleanup in some scattered functions This commit was SVN r19756.	2008-10-16 15:09:00 +00:00
Aurelien Bouteiller	852c0e35b8	Fixed a stupid type mismatch. Thanks Jeff for noticing. Aurelien This commit was SVN r19672.	2008-10-02 00:22:41 +00:00
Aurelien Bouteiller	77fa3b5d4c	Code to connect to event loggers over optimized MPI communication channels. This commit was SVN r19669.	2008-10-01 18:42:43 +00:00
Aurelien Bouteiller	aded765084	Upgrade to the new mca version This commit was SVN r19668.	2008-10-01 18:40:44 +00:00
George Bosilca	00d24bf8ab	Scalability patch, or slim-fast effect #1. All BML structures just got a whole lot smaller, decreasing the memory footprint of the running application. How much is a good question. Here is a breakdown: - in mca_bml_base_endpoint_t: 3 * size_t + 1 * uint32_t - in mca_bml_base_btl_t: 1 * int + 1 * double - 1 * float + 6 * size_t + 9 * (void *). The decrease in mca_bml_base_endpoint_t is for each peer and the decrease in mca_bml_base_btl_t is for each BTL for each peer. So, if we consider the most convenient case where there is only one network between all peers, this decreases the memory footprint per peer by 9 * size_t + 9 * (void *) + 2 * int32_t + 1 * double - 1 * float. On a 64-bit machine this will be 156 bytes per peer. Now we access all these fields directly from the underlying BTL structure, and as this structure is common to multiple BML endpoints, we are a lot more cache friendly. Even if this does not improve the latency, it makes the SM performance graph a lot smoother. This commit was SVN r19659.	2008-09-30 21:02:37 +00:00
George Bosilca	b32e4e7f34	Nothing important, mainly replacing tabs with spaces. This commit was SVN r19658.	2008-09-30 18:30:35 +00:00
George Bosilca	325d006577	Mostly cleanups, and eventually a little bit more scalable add_procs. There was an argument that was barely used, and on return at the PML level it contained nothing usable. It has been removed, so now we're using less memory ... This commit was SVN r19657.	2008-09-30 15:47:43 +00:00
Jeff Squyres	d2d06008a0	Change the default value of mpi_leave_pinned to -1, meaning that we'll figure it out at runtime (really meaning: we'll still default to "0" unless something explicitly overrides to 1, such as the openib BTL). This way, ompi_info doesn't confusingly report mpi_leave_pinned==0 for mpi_leave_pinned, but we end up running with mpi_leave_pinned==1. Fixes trac:1502. This commit was SVN r19571. The following Trac tickets were found above: Ticket 1502 --> https://svn.open-mpi.org/trac/ompi/ticket/1502	2008-09-16 22:06:14 +00:00
Jeff Squyres	270f482fea	Addendum to r19561: also remove a comment that is no longer true and some code that is commented out. This commit was SVN r19564. The following SVN revision numbers were found above: r19561 --> open-mpi/ompi@17e65369be	2008-09-16 13:02:10 +00:00
George Bosilca	acd3406aa7	Never drop messages. No never no more. This is supposed to fix the ticket #1460. This commit was SVN r19562.	2008-09-15 23:04:18 +00:00
George Bosilca	17e65369be	Fix the deadlock when we run out of resources on the BTLs. Move the progress function from the BML into the PML. The BTL progress functions are now directly registered with the event library. This commit was SVN r19561.	2008-09-15 22:56:23 +00:00
George Bosilca	2499112d1c	Fix indentation. This commit was SVN r19313.	2008-08-17 20:10:54 +00:00
Rainer Keller	e84f1f6fdf	- Mark the variable bytes_delivered as being unused (it is just set within MCA_PML_OB1_RECV_REQUEST_UNPACK). If Coverity's Prevent makes use of __attribute__((unused)), this should get rid of the warning. Relates to CID1060. This would then apply to many int _rc; definitions that are used in other macros in similar fashion... This commit was SVN r19179.	2008-08-06 13:46:23 +00:00
George Bosilca	3dafa58b32	Fix Coverity issue 1044. This commit was SVN r19178.	2008-08-06 13:38:21 +00:00
Rainer Keller	c1f2b8e476	- Fix resource leak in case of error. Coverity CID1067 This commit was SVN r19168.	2008-08-06 08:04:27 +00:00
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400 * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA frameworks versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Jeff Squyres	e3e79c0881	Fixes trac:1379: * Use synonym/deprecated MCA param API for some mca base params * In openib BTL, if we have appropriate memory hooks support, and if mpi_leave_pinned and mpi_leave_pinned_pipeline were not set by the user, set mpi_leave_pinned to 1. * Defer checking mpi_leave_pinned_* until as late as possible (i.e., until after the btl's have had a chance to set mpi_leave_pinned to 1): * in ob1 pml * in rdma mpool This commit was SVN r19022. The following Trac tickets were found above: Ticket 1379 --> https://svn.open-mpi.org/trac/ompi/ticket/1379	2008-07-24 22:51:26 +00:00
Aurelien Bouteiller	086cb6190e	Use the generic version number instead of hardcoded ones. This commit was SVN r18983.	2008-07-22 21:10:51 +00:00
George Bosilca	3ba0a8c0c1	In the case where the environment is homogeneous we can ALWAYS create the receiver convertor when we create the request (as we know all architectures are identical). This commit was SVN r18934.	2008-07-17 04:57:55 +00:00
George Bosilca	902a2892b6	Fix typo. This commit was SVN r18933.	2008-07-17 04:55:23 +00:00
George Bosilca	939fa3001d	Small cleanups. Remove some switch cases that cannot be reached. Rename a struct field. This commit was SVN r18931.	2008-07-17 04:50:39 +00:00
George Bosilca	319a8b3219	Once matched, the proc attached to the request should be the source of the message and not the first on the list. This fixes ticket #1386. This commit was SVN r18929.	2008-07-17 03:04:28 +00:00
Aurelien Bouteiller	66463cb258	Fix the annoying message from showing up when not using PML V. The underlying bug is not fixed though, but at least people not involved in FT dev should not see it anymore. Fix ticket https://svn.open-mpi.org/trac/ompi/ticket/1328 Aurelien This commit was SVN r18917.	2008-07-15 22:05:40 +00:00
George Bosilca	3de0488410	Fix the truncation problem. This closes #211. This commit was SVN r18850.	2008-07-09 17:38:41 +00:00
Ralph Castain	6af8a73dc0	Modify the checking logic to look for NULL return. This commit was SVN r18749.	2008-06-26 14:08:36 +00:00
Ralph Castain	af8c167861	May be picky, but clean up before returning in error conditions. This commit was SVN r18748.	2008-06-26 13:31:36 +00:00
Ralph Castain	3631a60181	Update the PML selection logic to detect when a modex is required, and in those cases to only have rank=0 report its selected module. This is per the email thread on the devel list: http://www.open-mpi.org/community/lists/devel/2008/06/4223.php This commit was SVN r18747.	2008-06-26 13:22:48 +00:00
George Bosilca	bc9b950162	Honor ^ for the PML selection. This commit was SVN r18683.	2008-06-19 16:50:46 +00:00
George Bosilca	dc0ab0d0a8	Enable the sendi path. This commit was SVN r18633.	2008-06-09 23:03:56 +00:00
Aurelien Bouteiller	ebe6df4c06	Moving the pml_v_output global variable inside the pml_v structure. This should avoid one of the missing symbols when visibility is enabled. This commit was SVN r18627.	2008-06-09 20:38:44 +00:00
Galen Shipman	dbd282fcad	doh.. fix GET protocol.. This commit was SVN r18623.	2008-06-09 19:45:44 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
George Bosilca	2aec094d56	The PML V is a component so it should use OMPI_MODULE_DECLSPEC. This commit was SVN r18610.	2008-06-06 17:43:57 +00:00
George Bosilca	4d8cbbc167	Add Pasha's patch as it correctly solves the issues. In fact, in the current incarnation these functions do not need the inline keyword anymore. This commit was SVN r18558.	2008-06-03 16:03:36 +00:00
Ralph Castain	c992e99035	Remove the tags from orte_output_open and the filtering operation from orte_output - this will be handled differently to improve the XML output interface. This commit was SVN r18557.	2008-06-03 14:24:01 +00:00
George Bosilca	e361bcb64c	Send optimizations. 1. The send path gets shorter. The BTL is allowed to return > 0 to specify that the descriptor was pushed to the network, and that the memory attached to it is available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag can be used by the PML to force the BTL to always trigger the callback. Unmodified BTLs will continue to work as expected, as they will return OMPI_SUCCESS, which forces the PML to have exactly the same behavior as before. Some BTLs have been modified: self, sm, tcp, mx. 2. Add send immediate interface to BTL. The idea is to have a mechanism of allowing the BTL to take advantage of send optimizations such as the ability to deliver data "inline". Some network APIs such as Portals allow data to be sent using a "thin" event without packing data into a memory descriptor. This interface change allows the BTL to use such capabilities and allows for other optimizations in the future. All existing BTLs except for Portals and sm have this interface set to NULL. This commit was SVN r18551.	2008-05-30 03:58:39 +00:00
Galen Shipman	4da4c44210	Receive side changes, basically uses multiple active message callbacks rather than using a single receive callback followed by a switch on the header. Also fast pathed the matching for small fragments. This commit was SVN r18549.	2008-05-30 01:29:09 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we have already done the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 behaves similarly to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component written in the filter framework creates an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done in configure.m4 to search for an appropriate XML library, link it in, and use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas on how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
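The duplicate-suppression behavior described for orte_show_help() shows the first copy, counts later copies, and reports the count periodically. It can be sketched with a small table keyed by help file and topic. Everything in this sketch is hypothetical and only illustrates the idea: help_aggregate, help_flush, the fixed table size, and the report wording. The periodic timer (about every 5 seconds, according to the commit) is left to the caller.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOPICS 64

typedef struct {
    char key[128];      /* "filename:topic" identifying one help message */
    unsigned pending;   /* duplicates seen since the last report */
} help_entry_t;

static help_entry_t entries[MAX_TOPICS];
static size_t n_entries = 0;

/* Hypothetical: returns 1 if the message should be displayed now (first
 * occurrence, or the table is full), 0 if it was a duplicate and was only
 * counted. */
static int help_aggregate(const char *filename, const char *topic)
{
    char key[128];
    snprintf(key, sizeof key, "%s:%s", filename, topic);
    for (size_t i = 0; i < n_entries; i++) {
        if (strcmp(entries[i].key, key) == 0) {
            entries[i].pending++;
            return 0;
        }
    }
    if (n_entries < MAX_TOPICS) {
        strcpy(entries[n_entries].key, key);  /* same buffer size, always fits */
        entries[n_entries].pending = 0;
        n_entries++;
    }
    return 1;
}

/* Hypothetical periodic report: print and reset the duplicate counts.
 * Returns the number of duplicates reported in this round. */
static unsigned help_flush(FILE *out)
{
    unsigned total = 0;
    for (size_t i = 0; i < n_entries; i++) {
        if (entries[i].pending > 0) {
            fprintf(out, "%u more process(es) sent help message %s\n",
                    entries[i].pending, entries[i].key);
            total += entries[i].pending;
            entries[i].pending = 0;
        }
    }
    return total;
}
```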
Gleb Natapov	31d2797a2f	If an RDMA PUT is received before the ACK and memory registration fails, don't start sending fragments by copy in/out before the ACK is received, as we don't yet know the pointer to the receive request. The pipeline protocol sometimes doesn't send an ACK, though, so this case is still broken. This commit was SVN r18423.	2008-05-11 12:40:55 +00:00
Josh Hursey	da2f1c58e2	Some checkpoint/restart cleanup. * Remove the opal_only option. This was suffering from bit rot, and no one uses it. It can be added back fairly easily if wanted. * Cleanup metadata interactions at the local level. * Touch up some of the INC functionality (fix typos and a minor ordering issue) This commit was SVN r18416.	2008-05-08 18:47:47 +00:00
Shiqing Fan	8393fb5d47	Use the new memchecker_call function for memory checking of non-blocking communication. This commit was SVN r18399.	2008-05-07 12:28:51 +00:00
Shiqing Fan	f35a06119c	Use the memchecker_convertor_call function instead of the old one. Move the function to a place where we can use the convertor. This commit was SVN r18370.	2008-05-05 13:57:27 +00:00
Josh Hursey	dcd21d7d07	Some checkpoint/restart fixes in response to r18338 (changes in modex). Things should be working now. This commit was SVN r18348. The following SVN revision numbers were found above: r18338 --> open-mpi/ompi@3e55fe6f6d	2008-05-01 17:48:13 +00:00
Ralph Castain	3e55fe6f6d	Fold in the revised modex scheme. Move the ompi_proc_t modex portions to the RTE level since the daemons already have that info. Provide each process with the equivalent of a "nidmap" - both a map of what nodes are in the job, and a map of which node each process is on. This enables the use of static ports, though that hasn't been turned "on" in this commit. Update the rsh tree spawn capability so we spawn the next wave of daemons before launching our own local procs. Add an ability to encode nodenames for large clusters with contiguous node name numbering schemes - this allows communication of all node names in a few bytes instead of tens-of-bytes/node. This commit was SVN r18338.	2008-04-30 19:49:53 +00:00
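The r18338 node-name encoding idea (describing contiguously numbered nodes in a few bytes) could look roughly like the range compression below. This is an illustrative sketch only. The textual "prefix[first-last]" form and the helpers split_name and encode_contiguous are hypothetical, and they are not the encoding ORTE actually uses.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split a name such as "node017" into a prefix length (4) and a number (17).
 * Returns the number of suffix digits, or -1 if there is no numeric suffix. */
static int split_name(const char *name, size_t *prefix_len, long *num)
{
    size_t len = strlen(name);
    size_t i = len;
    while (i > 0 && isdigit((unsigned char)name[i - 1])) {
        i--;
    }
    if (i == len) {
        return -1;
    }
    *prefix_len = i;
    *num = strtol(name + i, NULL, 10);
    return (int)(len - i);
}

/* Hypothetical: if names[0..n-1] share one prefix and have consecutive,
 * equally padded numbers, write e.g. "node[001-128]" into out, return 1. */
static int encode_contiguous(const char **names, size_t n, char *out, size_t outlen)
{
    size_t plen0 = 0, plen = 0;
    long first = 0, num = 0;
    int width0;

    if (n == 0 || (width0 = split_name(names[0], &plen0, &first)) < 0) {
        return 0;
    }
    for (size_t k = 1; k < n; k++) {
        int width = split_name(names[k], &plen, &num);
        if (width != width0 || plen != plen0 ||
            strncmp(names[k], names[0], plen0) != 0 ||
            num != first + (long)k) {
            return 0;   /* not one contiguous, uniformly named range */
        }
    }
    snprintf(out, outlen, "%.*s[%0*ld-%0*ld]", (int)plen0, names[0],
             width0, first, width0, first + (long)(n - 1));
    return 1;
}
```

This sketch rejects names whose numeric suffixes have different widths (for example node9 and node10). A real encoder would need a policy for that case.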
George Bosilca	6e6c370917	Roll back r18274, as it's legal to have a sequence number smaller than the expected one. It doesn't necessarily mean the message is duplicated; it can simply signify that the message is out of sequence and the counter overflowed. This commit was SVN r18323. The following SVN revision numbers were found above: r18274 --> open-mpi/ompi@73c9de3af9	2008-04-27 18:35:54 +00:00
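Modular (serial-number) comparison shows why a numerically smaller sequence number need not be a duplicate. This sketch assumes a 16-bit wrapping counter for illustration. seq_distance is a hypothetical helper, not the OB1 matching code.

```c
#include <stdint.h>

/* Signed distance from the expected sequence number to the received one,
 * computed modulo 2^16 so that the counter may wrap around (16-bit width
 * assumed). Positive: received is ahead (out of order); zero: in order;
 * negative: received is behind (possibly a duplicate).
 * Converting a uint16_t above INT16_MAX to int16_t is implementation-defined
 * before C23; common two's-complement compilers wrap as intended. */
static int16_t seq_distance(uint16_t expected, uint16_t received)
{
    return (int16_t)(uint16_t)(received - expected);
}
```

For example, with expected = 65535 and received = 0, the received value is numerically smaller, but the distance is +1. The counter wrapped, and the message is simply one step ahead.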
Aurelien Bouteiller	c20b020ea6	Fix ticket #1275. The PML V can now be correctly deactivated on the configure command line. Also fix a dist target under some unusual circumstances. This commit was SVN r18291.	2008-04-24 21:42:54 +00:00
Josh Hursey	2c736873bb	Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors. The problem was caused by a bad ordering between the restart of the ORTE level tcp connections (in the OOB - out-of-band communication) and the Open MPI level tcp connections (BTLs). Before this commit, ORTE would shut down and restart the OOB completely before the OMPI level restarted its tcp connections. What would happen is that a socket descriptor used by the OMPI level on checkpoint was assigned to the ORTE level on restart. But the OMPI level had no knowledge that the socket descriptor it was previously using had been recycled, so it closed it on restart. This caused the ORTE level to break, as the newly created socket descriptor was closed without its knowledge. The fix is to have the OMPI level shut down its tcp connections, allow the ORTE level to restart, and then allow the OMPI level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit. Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and for their patience while I was fixing it. * Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like), which is useful when you want to close, but not reopen, a component set for fear of interfering with a lower level. * Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components. * Update ft_event functions in PML and BML to handle the new restart state. * Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure, as this can be helpful in debugging. This commit was SVN r18276.	2008-04-24 17:54:22 +00:00
George Bosilca	3ccac4f803	Oops ... This commit was SVN r18275.	2008-04-24 15:54:52 +00:00
George Bosilca	73c9de3af9	Bark if we got a wrong sequence number. Here wrong means that the seq number is smaller than what we expect. This commit was SVN r18274.	2008-04-24 15:48:43 +00:00
Josh Hursey	cc83d41ad9	Merge in tmp/jjh-scratch {{{ svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch . }}} Contains: * Primarily a fix for a user-reported problem where a cached file descriptor is causing a SIGPIPE on restart. * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P. * Cleanup mpi interface with misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry * Some other sundry cleanup items all dealing with C/R functionality in the trunk. This commit was SVN r18241.	2008-04-23 00:17:12 +00:00
Ralph Castain	fa082cafa9	Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex. Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer. This commit was SVN r18198.	2008-04-17 20:43:56 +00:00
Tim Prins	3582e11200	cleanup some warnings on 32 bit systems This commit was SVN r18187.	2008-04-17 12:25:05 +00:00
Ralph Castain	3a0d09300b	Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations. Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study. This commit was SVN r18115.	2008-04-09 22:10:53 +00:00
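In an inbound binomial allgather, as in r18115, each daemon collects data from its children and forwards the combined result to its parent until rank 0 holds everything. The tree shape can be sketched as follows, assuming a binomial tree rooted at rank 0. binomial_parent and binomial_children are hypothetical helpers, and the actual daemon code may lay the tree out differently.

```c
/* Parent of rank r (r > 0) in a binomial tree rooted at 0:
 * clear the lowest set bit of r. */
static int binomial_parent(int r)
{
    return r & (r - 1);
}

/* Fill children[] with the children of rank r among n ranks and return how
 * many there are. Children are r + mask for every power of two mask below
 * r's lowest set bit (every power of two for the root), as long as
 * r + mask < n. */
static int binomial_children(int r, int n, int *children)
{
    int count = 0;
    for (int mask = 1; r + mask < n; mask <<= 1) {
        if (r & mask) {
            break;      /* reached r's lowest set bit */
        }
        children[count++] = r + mask;
    }
    return count;
}
```

In the inbound phase, a daemon waits for all of its children, appends its own contribution, and sends the result to its parent. The tree has depth ceil(log2 n), so the collection finishes in a logarithmic number of steps rather than rank 0 receiving n - 1 messages directly.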
Shiqing Fan	28746bbcdb	Remove the memchecker macro in pml base request, used in req_wait.c, which actually is in the wrong place. Instead, one simple call from send_request_free and recv_request_free(already done) will do all the work, fast and clean. This commit was SVN r18095.	2008-04-07 17:46:50 +00:00
Shiqing Fan	a1e5df1cc9	Use the new memchecker function call which is based on convertor. Remove one unnecessary call. This commit was SVN r18085.	2008-04-07 07:52:04 +00:00
George Bosilca	b4f828f389	We need a newline at the end of the file, or some compilers bark. This commit was SVN r18023.	2008-03-30 19:05:56 +00:00
Aurelien Bouteiller	77653ac787	A missing .h file in the Makefile broke the nightly tarball distcheck... This commit was SVN r18006.	2008-03-28 14:36:56 +00:00
Aurelien Bouteiller	c16339944a	Fix a coverity warning about using unsafe sprintf. This commit was SVN r17999.	2008-03-27 21:24:27 +00:00
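Coverity flags sprintf because it cannot bound its output. The usual replacement is snprintf with a truncation check, sketched below. format_checked and its "%s.%d" format are hypothetical, since the commit does not show the affected call.

```c
#include <stdio.h>

/* Hypothetical bounded formatting helper: never writes past len bytes.
 * Returns 0 on success, -1 on an encoding error or if the output was
 * truncated (snprintf returns the length it would have written). */
static int format_checked(char *buf, size_t len, const char *name, int rank)
{
    int n = snprintf(buf, len, "%s.%d", name, rank);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```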
Aurelien Bouteiller	e11237aadb	Introduction of the "progress" sender_based method to replace the slow isend-self method. This commit was SVN r17998.	2008-03-27 21:19:45 +00:00
Aurelien Bouteiller	93db01871e	This is part of the previous patch. This commit was SVN r17997.	2008-03-27 21:06:14 +00:00
Aurelien Bouteiller	f8bf6f2c6a	Code cleanup. sender_based.h is now split in two files, to solve cyclic .h files inclusion. Most macros are now inline functions. Variable names have been changed from places to places. Various other small things... This commit was SVN r17996.	2008-03-27 21:05:44 +00:00
Gleb Natapov	cf40674369	Decide if sends should be throttled at the receiver and pass this to the sender in an ACK message. The decision can't be done reliably at the sender. This commit was SVN r17987.	2008-03-27 08:56:43 +00:00
Galen Shipman	0116041133	BTL shouldn't own the passive side's descriptor in the PML get protocol. The BTL doesn't know when to free it on the passive side. This commit was SVN r17943.	2008-03-25 01:43:41 +00:00
George Bosilca	8943ae0b4e	Cleanup plus some typos. This commit was SVN r17858.	2008-03-18 03:03:33 +00:00
Josh Hursey	612ebdc2ac	Cleanup some symbol visibility issues. This commit was SVN r17733.	2008-03-05 13:59:25 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Aurelien Bouteiller	76e6334a57	This change is a mistake. CONVERTOR METHOD does not work with unpatched trunk. Revert back to PACK_METHOD. This commit was SVN r17629.	2008-02-27 20:02:25 +00:00
Aurelien Bouteiller	1d57b8b0e0	Replaced all the (long) casts by PRIsize_t. This should definitively solve the compiler warnings that appeared from time to time depending on sizeof(size_t)... This commit was SVN r17627.	2008-02-27 19:58:18 +00:00
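A (long) cast is fragile because size_t and long need not have the same width. On 64-bit Windows, for instance, long is 32 bits while size_t is 64 bits. A format macro avoids the cast. In this sketch, the fallback definition of PRIsize_t as "zu" is an assumption that relies on C99 printf support for the z length modifier. Open MPI provides its own definition, which may differ. print_size is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Fallback only if nothing else defined it (assumption: C99 printf with
 * the 'z' length modifier is available). */
#ifndef PRIsize_t
#define PRIsize_t "zu"
#endif

/* Hypothetical helper: format a size without casting it to long. */
static int print_size(char *buf, size_t len, size_t value)
{
    return snprintf(buf, len, "%" PRIsize_t " bytes", value);
}
```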
George Bosilca	fa31ec81d0	Add the ownership flags to the PML/BTL interface. The layer owning the descriptor is responsible for releasing it once the descriptor is not in use anymore. This commit was SVN r17497.	2008-02-18 17:39:30 +00:00
Shiqing Fan	653857ddbe	Wrong function name was copied here. This commit was SVN r17486.	2008-02-17 19:47:47 +00:00
Gleb Natapov	354c5bc5e1	Don't call progress() from OB1 fragment scheduling functions. These calls don't serve any purpose and cause recursive calls into the progress engine. This commit was SVN r17478.	2008-02-17 12:42:32 +00:00
Aurelien Bouteiller	3ffe845187	Fixed warning. This commit was SVN r17454.	2008-02-14 15:18:19 +00:00
Gleb Natapov	0a1fa2cb56	req_match_received is set inside MCA_PML_OB1_RECV_REQUEST_MATCHE(). This commit was SVN r17442.	2008-02-13 08:34:39 +00:00
Gleb Natapov	876f49f1a7	Remove unnecessary assignment. It is done later in the same function. This commit was SVN r17441.	2008-02-13 08:28:25 +00:00
Shiqing Fan	54c7b71cfd	Use the correct way of including memchecker.h, which will work with '--with-devel-headers'. This commit was SVN r17435.	2008-02-12 18:01:17 +00:00
Rainer Keller	7621800477	- Fix and add comments -- output full name for pd - Protect argument in macro... This commit was SVN r17434.	2008-02-12 16:59:59 +00:00
Jeff Squyres	6adc5015f9	This file was accidentally re-introduced in r17409. This commit was SVN r17428. The following SVN revision numbers were found above: r17409 --> open-mpi/ompi@98f70d6318	2008-02-12 13:07:44 +00:00
Shiqing Fan	f5792bbda5	merging the memchecker into trunk. This commit was SVN r17424.	2008-02-12 08:46:27 +00:00
George Bosilca	55179b833c	Unexpected ... Removing unistd.h from datatype.h breaks the compilation of the pml_base_bsend ... This commit was SVN r17412.	2008-02-10 21:49:19 +00:00
Aurelien Bouteiller	4da1258d60	Quick fix for static builds: mca_component_retain always returns failure in static build mode, so just blatantly ignore the failure. However, this may crash severely sometime later if the failure occurs while in dso mode. This commit was SVN r17328.	2008-01-30 10:41:49 +00:00
George Bosilca	4e703741b7	Move the PML tags into the legal range. This commit was SVN r17326.	2008-01-30 00:09:45 +00:00
Aurelien Bouteiller	2fd8230025	Windows might not be the only one... This commit was SVN r17296.	2008-01-29 07:44:33 +00:00
Aurelien Bouteiller	bd10a0231f	Replaced the explicit include of inttypes.h by the opal replacement. This commit was SVN r17295.	2008-01-29 07:35:14 +00:00
Aurelien Bouteiller	e261861f4a	Major build system modification. Removed symlinks (problem with make dist), solved issues with static builds and can accept most compile options. The only unsupported compile option for now is --enable-mca-no-build=pml-v. Still investigating this... This commit was SVN r17294.	2008-01-29 06:07:57 +00:00
George Bosilca	fad6136794	To be or not to be! As DR requires 64-bit atomics, only allow it to build when thread support is disabled or we have 64-bit atomics support. This commit was SVN r17293.	2008-01-29 05:24:56 +00:00
George Bosilca	c5d5fcf50a	Protect the standard header file, and allow the PML V to compile on Windows. This commit was SVN r17250.	2008-01-26 18:43:06 +00:00
Aurelien Bouteiller	ca8eb1fb30	There should be no leftovers of configuration phase after distclean This commit was SVN r17249.	2008-01-26 09:56:02 +00:00
Aurelien Bouteiller	b5d44261a0	Fix one warning about extremely long lines (due to macro expansion) This commit was SVN r17247.	2008-01-26 00:38:33 +00:00
Aurelien Bouteiller	48cabdc40b	Changed build system. Should be more distcheck, VPATH, static and other compilation mode friendly. This commit was SVN r17245.	2008-01-25 23:57:01 +00:00
Rainer Keller	f7e586fc01	- allow --enable-mca-direct=pml-ob1 This commit was SVN r17227.	2008-01-25 09:56:45 +00:00
Aurelien Bouteiller	e471abb55e	put back ompi ignore until long filenames and other dist issues are fixed This commit was SVN r17219.	2008-01-25 00:28:30 +00:00
Aurelien Bouteiller	11815d9773	Fixed two warnings (especially the one that gets repeated a large number of times in 64-bit builds). This commit was SVN r17197.	2008-01-24 04:59:31 +00:00
Aurelien Bouteiller	a9045402c4	remove a pedantic warning This commit was SVN r17196.	2008-01-24 02:29:07 +00:00
Aurelien Bouteiller	76b13f91b9	fixed link error in static mode This commit was SVN r17194.	2008-01-23 23:54:02 +00:00
Aurelien Bouteiller	f29ed2ed53	fixed missing errno.h on some architectures This commit was SVN r17186.	2008-01-23 20:24:54 +00:00
Aurelien Bouteiller	6fe17aff4a	solve compatibility issue from MMAP_NOCACHE This commit was SVN r17184.	2008-01-23 19:29:19 +00:00
Aurelien Bouteiller	69b3bae999	removed ignore, as the code is robust enough to avoid interfering with others This commit was SVN r17182.	2008-01-23 17:27:23 +00:00
Gleb Natapov	6e4155d111	Initialize local variable before use. This commit was SVN r17170.	2008-01-21 15:17:49 +00:00
George Bosilca	6310ce955c	The first patch related to the Active Message stuff. So far, here is what we have: - the registration array is now global instead of one per BTL. - each framework has to declare which entries in the registration array it reserves. Then it has to define the internal way of sharing (or not) these entries between all its components. As an example, the PML will not share, as there is only one active PML at any moment, while the BTLs will have to. The tag is 8 bits long: the first 3 bits are reserved for the framework, while the remaining 5 are used internally by each framework. - The registration function is optional. If a BTL does not provide such a function, nothing happens. However, if such a function is provided in the BTL structure, it will be called by the BML when a tag is registered. Now it's time for the second step... converting OB1 from a switch-based PML to an active-message one. This commit was SVN r17140.	2008-01-15 05:32:53 +00:00
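The 8-bit tag layout from r17140 can be sketched as a pair of bit fields plus one global callback table. The sketch assumes that "the first 3 bits" means the most significant bits. The macro, type and function names are hypothetical and are not the BTL API.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: high 3 bits = framework, low 5 bits = framework-private. */
#define AM_FRAMEWORK_BITS 3
#define AM_LOCAL_BITS     5
#define AM_LOCAL_MASK     ((1u << AM_LOCAL_BITS) - 1u)   /* 0x1f */

static uint8_t am_tag_make(unsigned framework, unsigned local)
{
    return (uint8_t)(((framework & 0x7u) << AM_LOCAL_BITS) | (local & AM_LOCAL_MASK));
}

static unsigned am_tag_framework(uint8_t tag) { return tag >> AM_LOCAL_BITS; }
static unsigned am_tag_local(uint8_t tag)     { return tag & AM_LOCAL_MASK; }

/* One global registration array indexed by the full 8-bit tag
 * (256 entries, zero-initialized, so unregistered tags are NULL). */
typedef void (*am_callback_t)(uint8_t tag, void *payload, void *cbdata);
static am_callback_t am_registry[1u << (AM_FRAMEWORK_BITS + AM_LOCAL_BITS)];

/* Dispatch an incoming fragment directly to its callback instead of
 * switching on a header type; returns -1 if nothing is registered. */
static int am_dispatch(uint8_t tag, void *payload, void *cbdata)
{
    if (am_registry[tag] == NULL) {
        return -1;
    }
    am_registry[tag](tag, payload, cbdata);
    return 0;
}
```

Giving each framework a fixed 3-bit prefix lets the frameworks reserve tags independently without coordinating individual values, at the cost of limiting each framework to 32 tags.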
George Bosilca	98f79f2ea0	Remove the second declaration of the PML V component. This commit was SVN r17139.	2008-01-15 05:26:26 +00:00
Jon Mason	a0d4122606	The new cpc selection framework is now in place. The patch below allows for dynamic selection of cpc methods based on what is available. It also allows for inclusion/exclusion of methods. It further allows for modifying the priorities of certain cpc methods to better determine the optimal cpc method. This patch also contains XRC compile-time disablement (per Jeff's patch). At a high level, the cpc selection works by walking through each cpc and allowing it to test whether it is permissible to run on this mpirun. It returns a priority if it is permissible or -1 if not. All of the cpc names and priorities are rolled into a string. This string is then encapsulated in a message and passed around all the ompi processes. Once received and unpacked, the list received is compared to a local copy of the list. The connection method is chosen by comparing the lists passed around to all nodes via modex with the list generated locally. Any non-negative number is a potentially valid connection method. The method used to determine the optimal connection method is to take the intersection of the two lists. The highest single value (provided the other side's value is non-negative) is selected as the cpc method. svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ . This commit was SVN r17138.	2008-01-14 23:22:03 +00:00
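The selection rule in r17138 admits more than one reading. The sketch below takes one of them. A method is a candidate only if both sides report a non-negative priority, its score is the larger of the two priorities, and the highest score wins. cpc_entry_t and cpc_select are hypothetical, and any method names used with them are illustrative.

```c
#include <string.h>

typedef struct {
    const char *name;  /* connection method name (illustrative) */
    int priority;      /* >= 0 if usable here, -1 if not permissible */
} cpc_entry_t;

/* Hypothetical: returns the index in local[] of the chosen method, or -1 if
 * no method is usable on both sides. */
static int cpc_select(const cpc_entry_t *local, int nlocal,
                      const cpc_entry_t *remote, int nremote)
{
    int best = -1;
    int best_score = -1;

    for (int i = 0; i < nlocal; i++) {
        if (local[i].priority < 0) {
            continue;                       /* not usable locally */
        }
        for (int j = 0; j < nremote; j++) {
            if (remote[j].priority < 0 ||
                strcmp(local[i].name, remote[j].name) != 0) {
                continue;                   /* not in the intersection */
            }
            int score = local[i].priority > remote[j].priority
                            ? local[i].priority : remote[j].priority;
            if (score > best_score) {
                best_score = score;
                best = i;
            }
        }
    }
    return best;
}
```

Because the score uses a symmetric max(), both peers compute the same choice from the same two lists, except when two methods tie and list order breaks the tie.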
George Bosilca	1bd31aa3ac	Cleanup the OMPI_DECLSPEC/OMPI_MODULE_DECLSPEC in the PMLs. This commit was SVN r17093.	2008-01-09 20:32:39 +00:00
Gleb Natapov	b37ff74a24	Make function that is used only in one file static. Remove static functions declaration. This commit was SVN r17080.	2008-01-09 09:54:35 +00:00
Ethan Mallove	f32dcb1636	The Sun Studio 12 compilers need to have `inline` specified as `static` in cases where a function is not part of a separate compilation unit (such as `append_recv_req_to_queue`). This commit was SVN r17069.	2008-01-08 18:45:51 +00:00
Aurelien Bouteiller	9bf54e1604	Windows compatibility patch. Also introduces a work-in-progress "convertor" sender-based copy algorithm. This algorithm cannot be selected without other modifications in the convertor (not currently available in trunk). The old synchronous copy algorithm remains the default. This commit was SVN r17063.	2008-01-07 23:35:44 +00:00
George Bosilca	d2324050f8	Allow the PML V component to be compiled on Windows. Force all .c files to include the ompi_config.h as the first #include. This commit was SVN r17056.	2008-01-05 00:17:32 +00:00
George Bosilca	42414b27e9	Use BEGIN_C_DECLS and END_C_DECLS instead of the ugly #if/#endif. This commit was SVN r17009.	2007-12-21 06:19:46 +00:00
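Under a C++ compiler, BEGIN_C_DECLS/END_C_DECLS typically expand to extern "C" { ... }. Under C they expand to nothing, so one header serves both languages. The definitions below are a common sketch of the idiom, guarded with #ifndef in case they are already defined. Open MPI ships its own definitions, and sketch_api_version is a hypothetical function.

```c
#ifndef BEGIN_C_DECLS
#  ifdef __cplusplus
#    define BEGIN_C_DECLS extern "C" {
#    define END_C_DECLS   }
#  else
#    define BEGIN_C_DECLS /* nothing needed in C */
#    define END_C_DECLS
#  endif
#endif

BEGIN_C_DECLS

/* Declarations here get C linkage even when included from C++. */
int sketch_api_version(void);

END_C_DECLS

int sketch_api_version(void)
{
    return 1;
}
```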
George Bosilca	b58dae00db	Allow PERUSE to compile correctly. This commit was SVN r17008.	2007-12-21 06:18:19 +00:00
George Bosilca	906e8bf1d1	Replace the ompi_pointer_array with opal_pointer_array. As a next step (sometime after the merge with the ORTE branch), the opal_pointer_array will become the only pointer_array implementation (the orte_pointer_array will be removed). This commit was SVN r17007.	2007-12-21 06:02:00 +00:00
Gleb Natapov	35bf8c7c46	Rewrite OB1 matching logic. Get rid of macros, make the code shorter. This commit was SVN r16993.	2007-12-19 09:16:20 +00:00
Gleb Natapov	5cd38b8b06	Better encapsulate heterogeneous arch handling in ob1. This commit was SVN r16970.	2007-12-16 08:45:44 +00:00
Gleb Natapov	8b511b969d	Introduce a new BTL parameter btl_rndv_eager_limit which determines size of a first fragment of rendezvous protocol. Remove no longer used btl_min_send_size parameter. This commit was SVN r16969.	2007-12-16 08:35:17 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confusion and discussion about linkers, have all OPAL, ORTE, and OMPI components '''not''' link against the OPAL, ORTE, or OMPI libraries. See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Aurelien Bouteiller	93f39fa190	Fixes various issues with --enable-visibility, C++ and exotic C compilers. Aurelien This commit was SVN r16949.	2007-12-12 19:13:23 +00:00
Gleb Natapov	e0dc53e516	Use mca_bml_base_send_status() in OB1. This commit was SVN r16905.	2007-12-09 14:13:24 +00:00
Gleb Natapov	e2e211f23b	Add a flags parameter to the btl_alloc() and btl_prepare_src() functions. If the BTL knows the priority of a descriptor at allocation time, it may perform some optimizations. This commit was SVN r16901.	2007-12-09 14:08:01 +00:00
Gleb Natapov	2d784752dd	Remove descriptor caching from BML. With descriptor caching, some optimizations are impossible. This commit was SVN r16897.	2007-12-09 13:58:17 +00:00
Aurelien Bouteiller	6190c97ee9	PML V and vprotocol framework management of customizable wait/test. This is still a quick and dirty implementation (cleanup of the customized request functions is not totally correct if several components modify them out of order). This commit was SVN r16890.	2007-12-07 08:21:25 +00:00
Aurelien Bouteiller	859169214c	Vprotocol pessimist benefits from customizable requests. Waitany, waitsome, test, testany, testall, testsome can now be hooked and are therefore logged correctly. This commit was SVN r16885.	2007-12-07 08:17:30 +00:00
Aurelien Bouteiller	15ffe6c89c	Accommodating the new interface for free_lists. This commit was SVN r16727.	2007-11-16 00:00:38 +00:00
Rich Graham	27a748e7eb	change all instances of ompi_free_list_init to ompi_free_list_init_new. Header and payload data are specified separately at this stage. This commit was SVN r16633.	2007-11-01 23:38:50 +00:00
Rich Graham	67f4b69848	propagate fix for out of buffered send memory space to dr and ob1 - thanks George. This commit was SVN r16593.	2007-10-27 00:17:53 +00:00
Rich Graham	9c0483088a	if unable to get buffered space, try and progress communications to free up resources. This commit was SVN r16591.	2007-10-26 23:16:31 +00:00
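The r16591 behavior reduces to a short retry loop. If buffered-send space cannot be obtained, the code drives communication progress so that completed sends release space, then tries again. This is a sketch with hypothetical allocator and progress callbacks. Bounding the number of retries with max_tries is a design choice of the sketch, not something the commit states. The toy pool is included only so the loop can be exercised.

```c
#include <stddef.h>

typedef void *(*alloc_fn_t)(size_t len, void *ctx);
typedef int (*progress_fn_t)(void *ctx);   /* returns events completed */

/* Hypothetical: try to get buffer space; on failure, progress outstanding
 * communication (which may free space) and retry, up to max_tries times. */
static void *alloc_with_progress(size_t len, alloc_fn_t alloc,
                                 progress_fn_t progress, void *ctx,
                                 int max_tries)
{
    void *p = alloc(len, ctx);
    for (int i = 0; p == NULL && i < max_tries; i++) {
        progress(ctx);        /* let completed sends release buffer space */
        p = alloc(len, ctx);
    }
    return p;
}

/* Toy pool (capacity at most 4): each progress() call "completes" one
 * outstanding send and frees its slot. */
typedef struct {
    int used;
    int capacity;
    char slots[4][64];
} toy_pool_t;

static void *toy_alloc(size_t len, void *ctx)
{
    toy_pool_t *pool = ctx;
    if (len > sizeof pool->slots[0] || pool->used >= pool->capacity) {
        return NULL;
    }
    return pool->slots[pool->used++];
}

static int toy_progress(void *ctx)
{
    toy_pool_t *pool = ctx;
    if (pool->used > 0) {
        pool->used--;
        return 1;
    }
    return 0;
}
```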
Gleb Natapov	52c6160252	MCA_PML_BASE_REQUEST_MPI_COMPLETE() macro does nothing except call to ompi_request_complete(). Remove the macro and call the function directly. This commit was SVN r16498.	2007-10-18 14:20:24 +00:00
George Bosilca	aa20a94b6f	Remove warning about an unused variable. This commit was SVN r16497.	2007-10-18 13:48:56 +00:00
Gleb Natapov	4f865e22e8	We have two different versions of ompi_request_complete: one as a function, another as a macro. Make it one inline function. This commit was SVN r16495.	2007-10-18 13:02:27 +00:00
Gleb Natapov	e0a3a7e53e	Move code duplicated all over the code base into a single function, ompi_request_wait_completion(). This commit was SVN r16494.	2007-10-18 12:33:21 +00:00
Gleb Natapov	807f49ed7f	If there is more than one BTL present, we may divide the payload between them in such a way that the convertor will not be able to pack some of it. This commit adds handling of such cases. If the convertor can't pack any data for a BTL, the data is sent over another BTL that has data to send. This commit was SVN r16493.	2007-10-18 12:07:37 +00:00
Gleb Natapov	1330974e5e	eager_limit is no longer needed in OB1 PML. Remove it. This commit was SVN r16442.	2007-10-15 09:26:42 +00:00
George Bosilca	e3105a85be	Don't require a progress function from the PML. If there is one, then the PML base will take care of the registration with the event library. Otherwise (and this applies to the CM case), the MTLs are in charge of registering their own progress function. This commit was SVN r16415.	2007-10-09 23:28:53 +00:00
Galen Shipman	62ade993ca	Separate finalize and close for the PML; this gives the PML a chance to complete any outstanding operations prior to close. Before this change we just called pml_finalize in pml_close, which caused problems if there were outstanding events that a BTL/MTL needed to progress during finalize. The problem is that MPI_COMM_WORLD and others were destroyed prior to closing the PML, pml_close would call pml_finalize, events would progress in the BTL, and these events expected MPI_COMM_WORLD to still be around. This commit was SVN r16405.	2007-10-09 15:28:56 +00:00
Josh Hursey	7437f37e96	This commit contains the following: * Fix some missing includes in a few places. * Add the cr_request() functionality to the BLCR CRS component. We are now dependent upon the 0.6.* series of BLCR. * Made the CR notification mechanism a registered function. This way we can have an OPAL-only version and it can be replaced at runtime with the ORTE version. * Add an 'opal_cr_allow_opal_only' parameter that will enable OPAL-only CR functionality when the user wants it. Default: Disabled. * Fix the placement of a checkpoint request check in MPI_Init * Pull the OPAL notification mechanism into the SnapC framework. * We no longer fork/exec the 'opal-checkpoint' command for local checkpointing; the Local coordinator in the orted does this directly. * The Local and Application coordinators talk together, bypassing the OPAL notification mechanism. * Optimized the Local <-> App Coordinator communication. * Improved the structure used to track vpid_snapshots in the local coord. * Fix a race condition in which an application under heavy communication load may produce an inconsistent global checkpoint. This commit was SVN r16389.	2007-10-08 20:53:02 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTLs are not active at that point. 
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTLs are active at that point), but - as we discussed on the telecon - these are not currently true barriers, so the job would hang when we fell through while messages were still in progress. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm, as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. 
Accordingly, the default size of these fields has been reduced to 16 bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32 bits. Someday, if necessary, someone can add yet another option to increase them to 64 bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages is set, a second mca param is now used to request the FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted.
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact info with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k * 50 = 1.6MBytes, sent to each of 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files. This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
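Item 1 of the r16364 message replaces the STG1 stage gate with an allgather, and notes that the grpcomm "basic" component is simplistic. The single-process C sketch below is only a rough illustration. It simulates two allgather strategies over a table of per-process modex entries. The first is a linear gather-to-root followed by a broadcast. Treating this as the shape of a simple "basic"-style collective is an assumption here, not a description of the actual grpcomm code. The second is recursive doubling, one example of the "more advanced" algorithms the commit invites. Every name (`sketch_modex_t`, `SKETCH_MAX_PROCS`, and so on) is hypothetical and is not part of the ORTE or OMPI API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical upper bound so the whole simulation fits in fixed arrays. */
#define SKETCH_MAX_PROCS 64
#define SKETCH_UNKNOWN (-1)

/* views[p][q] is what process p currently knows about process q's modex
 * entry; SKETCH_UNKNOWN means p has not received it yet. Entries are
 * assumed to be non-negative integers (e.g. a node id). */
typedef struct {
    int n;
    int views[SKETCH_MAX_PROCS][SKETCH_MAX_PROCS];
    long messages; /* point-to-point messages sent so far */
} sketch_modex_t;

/* Before the modex, each process knows only its own entry. */
void sketch_modex_init(sketch_modex_t *m, int n, const int *local_entry)
{
    assert(n > 0 && n <= SKETCH_MAX_PROCS);
    m->n = n;
    m->messages = 0;
    for (int p = 0; p < n; p++)
        for (int q = 0; q < n; q++)
            m->views[p][q] = (p == q) ? local_entry[p] : SKETCH_UNKNOWN;
}

/* Linear allgather: every process sends its entry to rank 0, then rank 0
 * sends the assembled table back to everyone. 2(n-1) messages, all of
 * them serialized through rank 0. */
void sketch_allgather_linear(sketch_modex_t *m)
{
    for (int p = 1; p < m->n; p++) { /* gather phase */
        m->views[0][p] = m->views[p][p];
        m->messages++;
    }
    for (int p = 1; p < m->n; p++) { /* broadcast phase */
        memcpy(m->views[p], m->views[0], sizeof m->views[0]);
        m->messages++;
    }
}

/* Recursive doubling (n must be a power of two): in each round, process p
 * exchanges everything it knows with partner p XOR dist. This takes
 * log2(n) rounds of n messages each, and the messages within a round can
 * proceed in parallel. Returns the number of rounds. */
int sketch_allgather_recursive_doubling(sketch_modex_t *m)
{
    static int next[SKETCH_MAX_PROCS][SKETCH_MAX_PROCS]; /* static: keep 16KB off the stack */
    int rounds = 0;

    assert((m->n & (m->n - 1)) == 0);
    for (int dist = 1; dist < m->n; dist <<= 1, rounds++) {
        /* Read from the pre-round state so all exchanges are simultaneous. */
        memcpy(next, m->views, sizeof next);
        for (int p = 0; p < m->n; p++) {
            int partner = p ^ dist;
            for (int q = 0; q < m->n; q++)
                if (m->views[partner][q] != SKETCH_UNKNOWN)
                    next[p][q] = m->views[partner][q];
            m->messages++; /* one message per process per round: partner -> p */
        }
        memcpy(m->views, next, sizeof next);
    }
    return rounds;
}

/* The modex is complete when every process knows every entry. */
int sketch_modex_complete(const sketch_modex_t *m)
{
    for (int p = 0; p < m->n; p++)
        for (int q = 0; q < m->n; q++)
            if (m->views[p][q] == SKETCH_UNKNOWN)
                return 0;
    return 1;
}
```

With eight processes, the linear version finishes in 14 messages, all funneled through rank 0. Recursive doubling sends 24 messages, but it needs only 3 rounds. Which one wins on a real system depends on latency, message size and the network, so treat this as a starting point rather than a recommendation.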
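Item 2 of the r16364 message shrinks each modex entry in two ways. The jobid and vpid fields narrow to 16 bits by default (32 bits with --enable-jumbo-apps), and an integer nodeid (the local daemon's vpid) replaces the nodename string. The C sketch below shows the per-entry effect under stated assumptions. The `SKETCH_JUMBO_APPS` macro is a hypothetical stand-in for the configure option. The big-endian packing is illustrative, not ORTE's actual wire format. The old entry is assumed to be two 32-bit fields plus a NUL-terminated nodename. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for building with --enable-jumbo-apps. */
#ifdef SKETCH_JUMBO_APPS
typedef uint32_t sketch_jobid_t;
typedef uint32_t sketch_vpid_t;
#else
typedef uint16_t sketch_jobid_t;
typedef uint16_t sketch_vpid_t;
#endif

/* A process name is a (jobid, vpid) pair. */
typedef struct {
    sketch_jobid_t jobid;
    sketch_vpid_t vpid;
} sketch_proc_name_t;

/* One daemon per node, so a node id is just the local daemon's vpid. */
typedef sketch_vpid_t sketch_nodeid_t;

/* Write the low `width` bytes of v into buf, most significant byte first. */
static size_t sketch_put_be(uint8_t *buf, uint32_t v, size_t width)
{
    assert(width == 2 || width == 4);
    for (size_t i = 0; i < width; i++)
        buf[i] = (uint8_t)(v >> (8 * (width - 1 - i)));
    return width;
}

/* Pack a name plus node id; buf must hold at least 12 bytes.
 * Returns the bytes written: 6 by default, 12 with SKETCH_JUMBO_APPS. */
size_t sketch_pack_entry(sketch_proc_name_t name, sketch_nodeid_t node, uint8_t *buf)
{
    size_t n = 0;
    n += sketch_put_be(buf + n, name.jobid, sizeof name.jobid);
    n += sketch_put_be(buf + n, name.vpid, sizeof name.vpid);
    n += sketch_put_be(buf + n, node, sizeof node);
    return n;
}

/* Size of the same information in the assumed old layout: two 32-bit
 * fields plus the nodename as a NUL-terminated string. */
size_t sketch_old_entry_size(const char *nodename)
{
    return 2 * sizeof(uint32_t) + strlen(nodename) + 1;
}
```

Take a 20-character hostname such as node0001.example.org. The assumed old layout needs 29 bytes for the name and node, versus 6 bytes in the default packed form. The real modex entry also carries the contact-info strings, which this sketch ignores.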
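Item 3 of the r16364 message routes RML traffic through the orteds. A process opens one socket to its local daemon, and the daemons forward messages using the launch map. The C sketch below models only that forwarding decision and the counting arguments in the message. The `proc_node` array stands in for the map every orted receives at launch, and all names are hypothetical. The socket count for direct mode assumes an all-to-all pattern, where every process talks to every other process. The last helper reproduces the "51GBytes" estimate, which comes out exactly if "32k" is read as 32,000.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SKETCH_HOP_PROC, SKETCH_HOP_DAEMON } sketch_hop_kind_t;

typedef struct {
    sketch_hop_kind_t kind;
    int id; /* proc rank, or node id (== daemon vpid) for a daemon hop */
} sketch_hop_t;

/* Fill `path` with the hops a message takes from proc src to proc dst.
 * proc_node[r] is the node hosting proc r, as known from the launch map.
 * Returns the number of entries written (3 on-node, 4 off-node). */
int sketch_route(const int *proc_node, int src, int dst, sketch_hop_t path[4])
{
    int n = 0;

    assert(src != dst);
    path[n++] = (sketch_hop_t){SKETCH_HOP_PROC, src};
    /* A proc only knows its local daemon, so every message goes there first. */
    path[n++] = (sketch_hop_t){SKETCH_HOP_DAEMON, proc_node[src]};
    /* Off-node: the local daemon forwards to the daemon on dst's node. */
    if (proc_node[dst] != proc_node[src])
        path[n++] = (sketch_hop_t){SKETCH_HOP_DAEMON, proc_node[dst]};
    /* That daemon delivers to its local recipient. */
    path[n++] = (sketch_hop_t){SKETCH_HOP_PROC, dst};
    return n;
}

/* Direct mode, all-to-all pattern assumed: one socket per pair of procs. */
uint64_t sketch_direct_sockets(uint64_t nprocs)
{
    return nprocs * (nprocs - 1) / 2;
}

/* Routed mode: each proc opens exactly one socket, to its local daemon
 * (daemon-to-daemon connections are not counted here). */
uint64_t sketch_routed_proc_sockets(uint64_t nprocs)
{
    return nprocs;
}

/* Bytes of contact info no longer shipped: each of nprocs recipients used
 * to receive bytes_per_proc bytes for each of nprocs procs. */
uint64_t sketch_contact_bytes_saved(uint64_t bytes_per_proc, uint64_t nprocs)
{
    return bytes_per_proc * nprocs * nprocs;
}
```

For 32,000 procs, the direct all-to-all pattern would need 511,984,000 sockets, against 32,000 proc-to-daemon sockets in routed mode. The contact-info estimate works out to 50 * 32,000 * 32,000 = 51,200,000,000 bytes. The cost of routing is one extra hop per message on-node and two off-node.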
Aurelien Bouteiller	670956e172	Another cast mistake. This commit was SVN r16247.	2007-09-26 21:14:35 +00:00
Aurelien Bouteiller	f7d7d58fb6	Various cast type errors on 64bit architectures This commit was SVN r16246.	2007-09-26 20:54:18 +00:00
Aurelien Bouteiller	0df0087f17	Investigating improvement of cache line management on shared memory This commit was SVN r16183.	2007-09-21 20:02:56 +00:00
Josh Hursey	3e51d7bb25	Implement the MPI_Iprobe and MPI_Probe wrappers. Remove some old, unused code. This commit was SVN r16178.	2007-09-21 16:28:46 +00:00


937 Commits