openmpi

Author	SHA1	Message	Date
Jeff Squyres	252d370c70	Update svn:ignore This commit was SVN r19688.	2008-10-06 16:27:44 +00:00
Jeff Squyres	dbb932b619	Remove the missing app mpi_after_finalize from the Makefile. This commit was SVN r19687.	2008-10-06 14:35:15 +00:00
Jeff Squyres	c42ab8ea37	Fixes trac:1210, #1319 Commit from a long-standing Mercurial tree that ended up incorporating a lot of things: * A few fixes for CPC interface changes in all the CPCs * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC) * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag) * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. * CPCs allocate a CTS buffer only if they're about to make a connection * RDMA CM improvements: * Use threaded mode openib fd monitoring to wait for RDMA CM events * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type * Renamed many variables to be internally consistent * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) * Use rdma_create_qp() instead of ibv_create_qp() * openib fd monitoring improvements: * Renamed a bunch of functions and variables to be a little more obvious as to their true function * Use pipes to communicate between main thread and service thread * Add ability for main thread to invoke a function back on the service thread * Ensure to set initiator_depth and responder_resources properly, by putting max_qp_rd_atom and max_qp_init_rd_atom in the modex (see rdma_connect(3)) * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout * Other improvements: * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp") * Somewhat improved error handling * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements * Oodles of little coding style fixes * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction * Added some more show_help error messages * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) This commit was SVN r19686. The following Trac tickets were found above: Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210	2008-10-06 00:46:02 +00:00
Tim Mattox	4602b0e94a	Adjust NEWS items about a SLURM fix. This commit was SVN r19678.	2008-10-03 19:01:12 +00:00
Ralph Castain	1976992300	Update platform files to correctly deal with visibility and memchecker This commit was SVN r19677.	2008-10-03 13:41:33 +00:00
Ralph Castain	f4f81c7308	Let the HNP only update the routing tree if necessary. Enable some debug output This commit was SVN r19676.	2008-10-03 13:41:08 +00:00
Ralph Castain	0cc2e724f8	Separate var declaration from use to remove compiler warnings in non-debug builds This commit was SVN r19675.	2008-10-03 13:40:31 +00:00
Ralph Castain	15c47a2473	Revise the daemon collective system to handle comm_spawn patterns that cross into new nodes that are not direct children on the routing tree of the HNP. Refers to ticket #1548. Although this appears to fix the problem, the ticket will be held open pending further test prior to transition to the 1.3 branch. This commit was SVN r19674.	2008-10-02 20:08:27 +00:00
Rolf vandeVaart	0a0ddfc934	Handle MPI_IN_PLACE correctly in the ompi_coll_tuned_reduce_scatter_intra_ring function. We were not adjusting the sendbuf in this case so we were reducing garbage. This fixes ticket #1506. This commit was SVN r19673.	2008-10-02 20:01:27 +00:00
Aurelien Bouteiller	852c0e35b8	Fixed a stupid type mismatch. Thanks Jeff for noticing. Aurelien This commit was SVN r19672.	2008-10-02 00:22:41 +00:00
Jeff Squyres	8cb6ad5cb7	Adjustment to match the ob1 change from r19658. This commit was SVN r19671. The following SVN revision numbers were found above: r19658 --> open-mpi/ompi@b32e4e7f34	2008-10-01 23:57:26 +00:00
Jeff Squyres	70b02a0178	Sometimes we don't have a valid error code, so don't segv if ompi_mpi_errnum_get_string() returns a NULL. This commit was SVN r19670.	2008-10-01 21:42:08 +00:00
Aurelien Bouteiller	77fa3b5d4c	Code to connect to event loggers over optimized MPI communication channels. This commit was SVN r19669.	2008-10-01 18:42:43 +00:00
Aurelien Bouteiller	aded765084	Upgrade to the new mca version This commit was SVN r19668.	2008-10-01 18:40:44 +00:00
Aurelien Bouteiller	73edda92db	Take into account Ralph's comments. There is no duplicate functionality with the rml base. Also fix a stupid typo bug. This commit was SVN r19667.	2008-10-01 02:31:14 +00:00
Tim Mattox	1995fb23cb	Resync the NEWS file for a 1.2.8 change. This commit was SVN r19662.	2008-10-01 00:01:01 +00:00
George Bosilca	00d24bf8ab	Scalability patch, or slim-fast effect #1. All BML structures just got a whole lot smaller, decreasing the memory footprint of the running application. How much is a good question. Here is a breakdown: - in mca_bml_base_endpoint_t: 3 * size_t + 1 * uint32_t - in mca_bml_base_btl_t: 1 * int + 1 * double - 1 * float + 6 * size_t + 9 * (void*) The decrease in mca_bml_base_endpoint_t is for each peer and the decrease in mca_bml_base_btl_t is for each BTL for each peer. So, if we consider the most convenient case where there is only one network between all peers, this decreases the memory footprint per peer by 9 * size_t + 9 * (void*) + 2 * int32_t + 1 * double - 1 * float. On a 64-bit machine this will be 156 bytes per peer. Now we access all these fields directly from the underlying BTL structure, and as this structure is common to multiple BML endpoints, we are a lot more cache friendly. Even if this does not improve the latency, it makes the SM performance graph a lot smoother. This commit was SVN r19659.	2008-09-30 21:02:37 +00:00
George Bosilca	b32e4e7f34	Nothing important, mainly replacing tabs with spaces. This commit was SVN r19658.	2008-09-30 18:30:35 +00:00
George Bosilca	325d006577	Mostly cleanups, and eventually a little bit more scalable add_procs. There was an argument that was barely used, and on return at the PML level it contained nothing usable. It has been removed, so now we're using less memory ... This commit was SVN r19657.	2008-09-30 15:47:43 +00:00
Ralph Castain	aa11e0977c	Correct a bug in the bookmarking code that incorrectly looked at #slots instead of #slots_allocated, thus causing slot reductions in hostfiles to be ignored when selecting our starting node. Fixes trac:1527 This commit was SVN r19656. The following Trac tickets were found above: Ticket 1527 --> https://svn.open-mpi.org/trac/ompi/ticket/1527	2008-09-29 14:09:02 +00:00
Ralph Castain	4f89adae0c	Prettify the user level display of allocation and map to make it easier to see and understand This commit was SVN r19655.	2008-09-28 16:44:09 +00:00
Ralph Castain	508cb45583	Add a little more diagnostic info when we cannot do an rml send This commit was SVN r19654.	2008-09-28 02:13:49 +00:00
Aurelien Bouteiller	89a2eea37a	Add functions to access the opaque port_string and to add routes to a remote port. This is useful for FT, but could also turn out to be useful when considering MPI3 extensions to the MPI2 dynamics. This commit was SVN r19653.	2008-09-27 13:22:32 +00:00
Jeff Squyres	8b786cac04	The configure test we had for checking whether openib could build (related to the presence of posix threads and ptmalloc2) is now a little outdated: since we don't build ptmalloc2 as part of libopal anymore, the openib BTL's requirements are not directly tied to ptmalloc2's anymore. Specifically, I altered the test to: 1. At compile time, if no threads are found, the ptmalloc2 component is going to be built, and the ptmalloc2 component is going to be inside libopal, then refuse to build the openib BTL. 2. At run time, if no threads were available at compile time and the ptmalloc2 component is part of the process, then refuse to use the openib BTL. Fixes trac:1537. This commit was SVN r19652. The following Trac tickets were found above: Ticket 1537 --> https://svn.open-mpi.org/trac/ompi/ticket/1537	2008-09-27 11:19:21 +00:00
George Bosilca	2803de5298	Remove the protection around computing the remote size. This has to be done always in a heterogeneous way in order to be able to support extern32. It doesn't really matter as it is outside the critical path. This commit was SVN r19651.	2008-09-26 23:11:53 +00:00
George Bosilca	ed0f5a18ce	Typo. This commit was SVN r19650.	2008-09-26 23:09:48 +00:00
Brian Barrett	76f47aaa1c	Fix a bunch of compiler warnings. Refs trac:1458 This commit was SVN r19649. The following Trac tickets were found above: Ticket 1458 --> https://svn.open-mpi.org/trac/ompi/ticket/1458	2008-09-26 16:15:05 +00:00
Jeff Squyres	c11bee41da	Add bullet about updated SLURM support This commit was SVN r19648.	2008-09-26 12:36:59 +00:00
Ralph Castain	edb3d99687	Update SLURM environmental variables used to describe allocation. Retain backwards compatibility to SLURM 1.1 and earlier versions. This commit was SVN r19647.	2008-09-26 02:38:37 +00:00
Tim Mattox	aaae28b6c7	Resync the NEWS file with the 1.2 branch. This commit was SVN r19646.	2008-09-25 21:48:17 +00:00
Aurelien Bouteiller	4be474f727	CRS is now an opal framework. It should use OPAL version defines. This commit was SVN r19643.	2008-09-25 21:01:04 +00:00
Kenneth Matney	91bbc6b919	Change algorithm from spawning a shell that spawns another shell, and thereby runs apstat twice; and in the process thereof reads the ALPS appinfo file TWICE; and in addition, experiences a failure sometimes which causes mpirun to hang. Change this to a looped read attempt that breaks on success, thereby avoiding failure (except in the most This commit was SVN r19642.	2008-09-25 20:44:16 +00:00
Tim Mattox	821d81304f	Testing... This commit was SVN r19641.	2008-09-25 20:34:00 +00:00
Jeff Squyres	8de0663ae0	Increase the size of MPI_MAX_PORT_NAME from 256 to 1024. Rationale: 1. This value has already changed since v1.2 (v1.2 MPI_MAX_PORT_NAME == 36). Hence, this commit simply increases the value from a previous change. 2. The change does increase OMPI's memory footprint slightly, but only when using MPI-2 dynamics. So it is expected that the change will have minimal impact on the overall footprint. 3. The change is helpful for nodes that have 4 or more IP networks (e.g., regular ethernet and multiple IP-over-<pick your favorite high-speed network> networks). Without this change, invoking MPI_COMM_SPAWN on hosts with 4 or more IP networks will fail because we'll exceed 256 bytes for the port name. Some OMPI developer test clusters already have this kind of configuration (e.g., Cisco); it is expected that this is not too common in the real world yet, but with "manycore" coming, having multiple IP-based networks in a single server will likely become more common. This commit was SVN r19638.	2008-09-25 16:47:17 +00:00
Ralph Castain	037231fbcb	Modify the node_rank and local_rank fields to be uint16_t so we can handle more than 256 procs/node. Change the type to a defined one so that any future change can be easily done, if required. This commit was SVN r19637.	2008-09-25 13:39:08 +00:00
Ralph Castain	55738aeabe	Very tiny modification of the output when displaying mca param values to clarify that ones found in the environment could have also been set on the cmd line - we don't have a way to distinguish them internally. This commit was SVN r19636.	2008-09-25 13:08:17 +00:00
Tim Mattox	6a3e28a3b6	Resync the NEWS file with the 1.3 branch. This commit was SVN r19635.	2008-09-24 21:06:34 +00:00
Tim Mattox	fae18d1ea2	Ugh, whitespace differences suck... fixing. This commit was SVN r19632.	2008-09-24 20:51:18 +00:00
Tim Mattox	1d5a6602b6	Resync the NEWS on the trunk with the 1.2 branch. This commit was SVN r19631.	2008-09-24 20:41:17 +00:00
Jeff Squyres	627f1ecd36	Oops -- fix silly cut-n-paste error... This commit was SVN r19630.	2008-09-24 20:17:54 +00:00
Jeff Squyres	d85aaf521a	Also show the process name; it is useful, at least to us developers ;-) This commit was SVN r19629.	2008-09-24 19:32:34 +00:00
Josh Hursey	77e6b72c06	Update my entry to reflect all of the affiliations. This commit was SVN r19627.	2008-09-24 17:34:58 +00:00
Jeff Squyres	78a25cf116	Commit a few missing header files, etc. This commit was SVN r19626.	2008-09-24 15:41:42 +00:00
Brian Barrett	1f69ae5356	Add SNL affiliation for me. SNL is plural. This commit was SVN r19625.	2008-09-24 15:20:45 +00:00
Ralph Castain	8d1ecdb361	Correct the creation of MPIR_Proctable so that the structs in the array correspond to the order of the ranks. This commit was SVN r19624.	2008-09-24 14:55:46 +00:00
Jeff Squyres	bbfac2dfb5	Based on a review by Ralph, no need to call getpid() or gethostname(); we already have them in orte_process_info. Refs trac:1523. This commit was SVN r19615. The following Trac tickets were found above: Ticket 1523 --> https://svn.open-mpi.org/trac/ompi/ticket/1523	2008-09-23 20:04:34 +00:00
Jeff Squyres	2879de60a1	Update a name spelling This commit was SVN r19614.	2008-09-23 19:59:57 +00:00
Jeff Squyres	ca323aae8e	Very minor updates. Refs trac:1399. This commit was SVN r19613. The following Trac tickets were found above: Ticket 1399 --> https://svn.open-mpi.org/trac/ompi/ticket/1399	2008-09-23 19:50:31 +00:00
Jeff Squyres	ef6a216771	Update AUTHORS file with all the IDs that have committed so far on the OMPI trunk. Need all organizations to ensure I got spellings and affiliations correct. Also commit a helper script to help keep AUTHORS up to date on the trunk; it should be run before we create release branches. This commit was SVN r19612.	2008-09-23 19:38:53 +00:00
Jeff Squyres	4c558ed637	Enable aggregation checking for "*** An error occurred..." MPI layer help messages so that users only see the message once instead of N times when their MPI app crashes. Note that there is a tradeoff here -- we now call malloc in this particular "show the error" code path. This shouldn't usually be a problem, because the errors typically displayed through this mechanism are MPI API argument problems (e.g., sending a negative count to MPI_SEND), and not memory errors. But such API argument errors could be a consequence of a prior memory error, so there's a nonzero chance that the error message will fail to print because malloc failed. In this case, the user can disable help message aggregation (via the orte_base_want_aggregate MCA parameter) and we'll fall back to the no-malloc code path (but without aggregation). Note that we won't aggregate before MPI_INIT or after MPI_FINALIZE. So if you call an MPI function before MPI_INIT / after MPI_FINALIZE, you'll still see the error message N times. Nothing we can do about that; we need ORTE to do the aggregation properly (which is obviously unavailable before MPI_INIT / after MPI_FINALIZE). This commit was SVN r19611.	2008-09-23 17:19:24 +00:00

12277 Commits