openmpi

Author	SHA1	Message	Date
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL. All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
Gilles Gouaillardet	75616320b5	remove all message output from ompi/communicator/comm_helpers.c Thanks George for pointing out. cmr=v1.8.2:reviewer=bosilca:ticket=4676 This commit was SVN r31889. The following Trac tickets were found above: Ticket 4676 --> https://svn.open-mpi.org/trac/ompi/ticket/4676	2014-05-26 01:54:24 +00:00
Gilles Gouaillardet	40f3b849eb	Fix argument checks for [i]neighbor_alltoall{v\|w} This fixes a bug introduced in : - r31815 (trunk) - r31853 (v1.8 branch) cmr=v1.8.2:reviewer=bosilca This commit was SVN r31888. The following SVN revision numbers were found above: r31815 --> open-mpi/ompi@8bafe06c57 r31853 --> open-mpi/ompi@bff944d766	2014-05-23 08:19:17 +00:00
George Bosilca	750c6c7861	Update the UTK copyright on the topology related files. This commit was SVN r31805.	2014-05-16 22:23:52 +00:00
Nathan Hjelm	279c0a3ca7	comm: fix communicator subsystem leaks This commit fixes two leaks: - We never destructed the attributes on MPI_COMM_WORLD. All other communicators that have attributes are released through ompi_comm_free which does the attribute destruction. For MPI_COMM_WORLD this is now done before the destructor is called. - Add missing OBJ_RELEASE for ompi_comm_f_to_c_table. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31766.	2014-05-14 21:15:33 +00:00
Ralph Castain	c4c9bc1573	As per the RFC: http://www.open-mpi.org/community/lists/devel/2014/04/14496.php Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM This commit was SVN r31557.	2014-04-29 21:49:23 +00:00
Jeff Squyres	964249e8a6	comm_cid.c: Ensure that "flag" is initially set to false. If the loops never get executed because CIDs are exhausted, then the value of flag will be undefined. Refs trac:4572 This commit was SVN r31546. The following Trac tickets were found above: Ticket 4572 --> https://svn.open-mpi.org/trac/ompi/ticket/4572	2014-04-29 17:39:14 +00:00
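The hazard fixed in the commit above can be shown with a minimal sketch. The function `find_free_slot` and its arguments are hypothetical stand-ins for illustration only; this is not the actual code in comm_cid.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CID search loop (not the real comm_cid.c code).
 * Returns true and stores the first free index in *out, or returns false
 * when every slot in [start, n) is already in use. */
bool find_free_slot(const bool *in_use, size_t start, size_t n, size_t *out)
{
    /* Initialized up front: if start >= n the loop body never runs,
     * and without this initializer the return value would be indeterminate. */
    bool flag = false;

    assert(out != NULL);
    for (size_t i = start; i < n; ++i) {
        if (!in_use[i]) {
            *out = i;
            flag = true;
            break;
        }
    }
    return flag;
}
```

Calling it with `start == n` models the exhausted case: the loop is skipped and the caller still gets a well-defined `false`.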
Nathan Hjelm	e410401523	comm: detect if we run out of communicator ids (cids) Due to a leak in the osc/rdma component we were running out of cids on a one-sided tests. This resulted in a hang instead of an error. This commit causes the nextcid algorithm to return an error if we run out of cids. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31538.	2014-04-28 19:55:09 +00:00
Nathan Hjelm	a88b24ce21	Per comments from Jeff, make some changes to the communicator changes. Changed: - Use ompi_mpi_group_null instead of MPI_GROUP_NULL. - Asserts don't always quiet the clang static analyser. Change them to ifs to really quiet the warnings. cmr=v1.8.1:ticket=trac:4527:reviewer=jsquyres This commit was SVN r31424. The following Trac tickets were found above: Ticket 4527 --> https://svn.open-mpi.org/trac/ompi/ticket/4527	2014-04-18 17:23:46 +00:00
Nathan Hjelm	f80aece271	Silence warnings identified by the clang static analyzer in the communicator code. Many of the warnings were false warnings. These were silenced by adding the appropriate asserts. Other warnings identified some potential issues in error paths that should now be resolved. cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31416.	2014-04-16 22:43:20 +00:00
Nathan Hjelm	a64bd4035c	Fix bugs in intercomm creation and comm split. This commit addresses bugs discovered by ggouaillardet. - Fix hang when creating an intercommunicator - Fix memory leak - Fix coverity warning cid70288 - Fix false coverity warning cid1196589 Fixes trac:4507 Fixes trac:4522 cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31415. The following Trac tickets were found above: Ticket 4507 --> https://svn.open-mpi.org/trac/ompi/ticket/4507 Ticket 4522 --> https://svn.open-mpi.org/trac/ompi/ticket/4522	2014-04-16 22:43:12 +00:00
Ralph Castain	543271b9de	Set the locality prior to calling add_procs so bozos like Jeff get it at the right time Refs trac:4411 This commit was SVN r31119. The following Trac tickets were found above: Ticket 4411 --> https://svn.open-mpi.org/trac/ompi/ticket/4411	2014-03-18 17:57:27 +00:00
Ralph Castain	3323c47ab4	Ensure all procs set locality for all remote procs in the multi-way intercomm_create problem Refs trac:4411 This commit was SVN r31118. The following Trac tickets were found above: Ticket 4411 --> https://svn.open-mpi.org/trac/ompi/ticket/4411	2014-03-18 16:55:15 +00:00
Ralph Castain	9c66c4f439	Correctly implement --disable-oshmem and --without-orte so we don't build the disabled section of code. Fix a bunch of code rot in the PMI rte component, and add several missing headers when building --without-orte. NOTE: I transferred the oshmem-disabled-by-default from the 1.7 branch to the trunk to minimize future disruption if/when we change that option. cmr=v1.8:reviewer=jsquyres This commit was SVN r31006.	2014-03-11 22:02:40 +00:00
Ralph Castain	16b1ad052f	Silence compiler warning This commit was SVN r29478.	2013-10-23 04:13:51 +00:00
Nathan Hjelm	7bee047e5d	Fix reentry check in communicator request progress. cmr=v1.7.4:ticket=3796 This commit was SVN r29465. The following Trac tickets were found above: Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796	2013-10-22 15:33:39 +00:00
Nathan Hjelm	e11233cb65	Remove unnecessary member from the comm idup context. Refs trac:3796 This commit was SVN r29419. The following Trac tickets were found above: Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796	2013-10-09 22:00:17 +00:00
Nathan Hjelm	b025babfec	Correctly set the state for communicator requests. Refs trac:3796 This commit was SVN r29377. The following Trac tickets were found above: Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796	2013-10-04 17:34:54 +00:00
Ralph Castain	61c4baefe0	Silence compiler warning Refs trac:3796 This commit was SVN r29352. The following Trac tickets were found above: Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796	2013-10-03 19:50:10 +00:00
Nathan Hjelm	539fab1600	Fix typo in communicator destruction This commit was SVN r29343.	2013-10-03 01:46:36 +00:00
Nathan Hjelm	c17b21b11d	Due to MPI_Comm_idup we can no longer use the communicator's CID as the fortran handle. Use a separate opal_pointer_array to keep track of the fortran handles of communicators. This commit also fixes a bug in ompi_comm_idup where the newcomm was not set until after the operation completed. cmr=v1.7.4:reviewer=jsquyres:ticket=trac:3796 This commit was SVN r29342. The following Trac tickets were found above: Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796	2013-10-03 01:11:28 +00:00
Nathan Hjelm	195c892a9d	Add support for MPI_Comm_dup_with_info, MPI_Comm_create_group, and MPI_Comm_idup. As part of this work I implemented a basic request scheduler in ompi/comm/comm_request.c. This scheduler might be useful for more than just communicator requests and could be moved to ompi/request if there is a demand. Otherwise I will leave it where it is. Added a non-blocking version of ompi_comm_set to support ompi_comm_idup. The call makes a recursive call to comm_dup and a non-blocking version was needed. To simplify the code the blocking version calls the nonblocking version and waits on the resulting request if one exists. cmr=v1.7.4:reviewer=jsquyres:ticket=trac:3796 This commit was SVN r29334. The following Trac tickets were found above: Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796	2013-10-02 14:26:40 +00:00
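The "blocking version calls the nonblocking version and waits" pattern from the commit above might look roughly like the following sketch. The type `request_t` and the functions `op_start_nb`, `request_progress`, and `op_blocking` are hypothetical names chosen for illustration; they are not the ompi_comm_set or comm_request.c interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request: an operation that needs some number of progress calls. */
typedef struct {
    int  steps_left;  /* progress calls still required */
    bool active;      /* false once the operation has completed */
} request_t;

/* Nonblocking variant: starts the operation and returns immediately.
 * When no work is needed, the request is already inactive on return. */
void op_start_nb(int steps, request_t *req)
{
    assert(steps >= 0);
    req->steps_left = steps;
    req->active = (steps > 0);
}

/* Advance an active request by one step. */
void request_progress(request_t *req)
{
    if (req->active && --req->steps_left == 0) {
        req->active = false;
    }
}

/* Blocking variant: reuses the nonblocking path, then waits only if a
 * request is actually pending. Returns the number of progress calls made. */
int op_blocking(int steps)
{
    request_t req;
    int polls = 0;

    op_start_nb(steps, &req);
    while (req.active) {
        request_progress(&req);
        ++polls;
    }
    return polls;
}
```

Keeping a single implementation in the nonblocking path means the blocking entry point cannot drift out of sync with it, which appears to be the simplification the commit message refers to.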
George Bosilca	d0ad20aacb	Don't use ORTE specifics in the OMPI layer. Instead use the RTE equivalents. Patch submitted by Geoffroy Vallee. This commit was SVN r29303.	2013-09-30 23:31:04 +00:00
George Bosilca	f60365a91e	Allow the trunk to compile after r29265. The addition of the neighborhood collective to the mca_coll_base_comm_coll_t structure increased the size of the ompi_communicator_t over the limit of the predefined padding (PREDEFINED_COMMUNICATOR_PAD). This fix is a temporary fix to allow the trunk to compile. Unfortunately it breaks the compatibility with all other versions of Open MPI. Please read the comment in this header file for a more complete explanation. This commit was SVN r29277. The following SVN revision numbers were found above: r29265 --> open-mpi/ompi@c5596548b2	2013-09-27 07:25:26 +00:00
Nathan Hjelm	0b8fc13299	MPI-3.0: update C bindings with const and consistent use of [] for arrays. The MPI 3.0 standard added const to all in buffers in the C bindings. This commit adds the const keyword and in most cases casts const away. We should eventually go through and update the various interfaces (coll, pml, io, etc) to take the const keyword. The group, comm, win, and datatype interfaces have been updated with const. cmr=v1.7.4:ticket=trac:3785:reviewer=jsquyres This commit was SVN r29266. The following Trac tickets were found above: Ticket 3785 --> https://svn.open-mpi.org/trac/ompi/ticket/3785	2013-09-26 21:56:20 +00:00
Ralph Castain	865a7028f8	Per patch from George, with a few minor cleanups. Correctly address the complete exchange of required wireup information in Intercomm_create so all procs in the resulting communicator know how to talk to each other. Refs trac:29166 This commit was SVN r29200. The following Trac tickets were found above: Ticket 29166 --> https://svn.open-mpi.org/trac/ompi/ticket/29166	2013-09-18 02:01:30 +00:00
Ralph Castain	537e7380b1	As per the discussion on the devel telecon, do not compute ompi_comm_world_thread_level_mult if thread multiple is disabled. We aren't using the value anyway, but we will leave the current code in-place until we understand if it is needed or not. This commit was SVN r29080.	2013-08-28 17:44:04 +00:00
George Bosilca	badd011ac3	Minor cleanup. This commit was SVN r29076.	2013-08-28 05:48:58 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
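The subnet-based reachability test described in the commit above (a peer is reachable through a NIC if it falls inside one of the NIC's address/mask ranges) can be sketched as follows. This is an IPv4-only illustration with hypothetical names (`addr_mask_t`, `nic_reaches_peer`); it is not the OOB/TCP code, and as the commit notes, it does not handle routing via gateways.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One IPv4 address/mask pair configured on a NIC, in host byte order. */
typedef struct {
    uint32_t addr;
    uint32_t mask;
} addr_mask_t;

/* Returns true if `peer` lies in the subnet of any address/mask pair
 * on the NIC. A multi-address NIC simply contributes several pairs. */
bool nic_reaches_peer(const addr_mask_t *pairs, size_t npairs, uint32_t peer)
{
    assert(pairs != NULL || npairs == 0);
    for (size_t i = 0; i < npairs; ++i) {
        /* Same network prefix under this pair's mask means same subnet. */
        if ((peer & pairs[i].mask) == (pairs[i].addr & pairs[i].mask)) {
            return true;
        }
    }
    return false;
}
```

For example, a NIC holding 192.168.1.5/24 and 10.0.0.1/8 would report 192.168.1.99 and 10.20.30.40 as reachable, but not 192.168.2.5.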
Jeff Squyres	b417095639	Do not destroy the sub-communicator until we have freed its attributes, per the reason cited in the comment in the code. This commit was SVN r28724.	2013-07-05 12:15:03 +00:00
George Bosilca	5fae72b9aa	Add the MPI 2.2 MPI_Dist_graph functionality. This patch reshapes the way we deal with topologies completely. Where our topologies were mainly storage components (they were not capable of creating the new communicator), the new version is built around a [possibly] common representation (in mca/topo/topo.h), but the functions to attach and retrieve the topological information are specific to each component. As a result the ompi_create_cart and ompi_create_graph functions become useless and have been removed. In addition to adding the internal infrastructure to manage the topology information, it updates the MPI interface, and the debuggers support and provides all Fortran interfaces. This commit was SVN r28687.	2013-07-01 12:40:08 +00:00
George Bosilca	1c281a224c	Cleanup the communicator cid allocation function. The value of the old_com that has been temporarily stored in the communicator_array should be removed or the finalization will segfault (the same communicator will be released twice). This commit was SVN r28214.	2013-03-26 14:45:21 +00:00
Brian Barrett	f42783ae1a	Move the RTE framework change into the trunk. With this change, all non-CR runtime code goes through one of the rte, dpm, or pubsub frameworks. This commit was SVN r27934.	2013-01-27 23:25:10 +00:00
George Bosilca	a1754dfe31	Don't leave the cid registered in any case. This commit was SVN r27735.	2013-01-02 18:32:46 +00:00
Josh Hursey	28681deffa	Backout the ORCA commit. :( There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk. This commit was SVN r26676.	2012-06-27 01:28:28 +00:00
Josh Hursey	542330e3a7	Commit of ORCA: Open MPI Runtime Collaborative Abstraction This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI. The project is described on the wiki: https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition And on this email thread: http://www.open-mpi.org/community/lists/devel/2012/06/11109.php This commit was SVN r26670.	2012-06-26 21:42:16 +00:00
Jeff Squyres	2ba10c37fe	Per RFC, bring in the following changes: * Remove paffinity, maffinity, and carto frameworks -- they've been wholly replaced by hwloc. * Move ompi_mpi_init() affinity-setting/checking code down to ORTE. * Update sm, smcuda, wv, and openib components to no longer use carto. Instead, use hwloc data. There are still optimizations possible in the sm/smcuda BTLs (i.e., making multiple mpools). Also, the old carto-based code found out how many NUMA nodes were ''available'' -- not how many were used ''in this job''. The new hwloc-using code computes the same value -- it was not updated to calculate how many NUMA nodes are used ''by this job.'' * Note that I cannot compile the smcuda and wv BTLs -- I ''think'' they're right, but they need to be verified by their owners. * The openib component now does a bunch of stuff to figure out where "near" OpenFabrics devices are. '''THIS IS A CHANGE IN DEFAULT BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors (I do not have a NUMA machine with an OpenFabrics device that is a non-uniform distance from multiple different NUMA nodes). * Completely rewrite the OMPI_Affinity_str() routine from the "affinity" mpiext extension. This extension now understands hyperthreads; the output format of it has changed a bit to reflect this new information. * Bunches of minor changes around the code base to update names/types from maffinity/paffinity-based names to hwloc-based names. * Add some helper functions into the hwloc base, mainly having to do with the fact that we have the hwloc data reporting ''all'' topology information, but sometimes you really only want the (online \| available) data. This commit was SVN r26391.	2012-05-07 14:52:54 +00:00
Jeff Squyres	253444c6d0	== Highlights == 1. New mpifort wrapper compiler: you can utilize mpif.h, use mpi, and use mpi_f08 through this one wrapper compiler 1. mpif77 and mpif90 still exist, but are sym links to mpifort and may be removed in a future release 1. The mpi module has been re-implemented and is significantly "mo' bettah" 1. The mpi_f08 module offers many, many improvements over mpif.h and the mpi module This stuff is coming from a VERY long-lived mercurial branch (3 years!); it'll almost certainly take a few SVN commits and a bunch of testing before I get it correctly committed to the SVN trunk. == More details == Craig Rasmussen and I have been working with the MPI-3 Fortran WG and Fortran J3 committees for a long, long time to make a prototype MPI-3 Fortran bindings implementation. We think we're at a stable enough state to bring this stuff back to the trunk, with the goal of including it in OMPI v1.7. Special thanks go out to everyone who has been incredibly patient and helpful to us in this journey: * Rolf Rabenseifner/HLRS (mastermind/genius behind the entire MPI-3 Fortran effort) * The Fortran J3 committee * Tobias Burnus/gfortran * Tony !Goetz/Absoft * Terry !Donte/Oracle * ...and probably others whom I'm forgetting :-( There's still opportunities for optimization in the mpi_f08 implementation, but by and large, it is as far along as it can be until Fortran compilers start implementing the new F08 dimension(..) syntax. Note that gfortran is currently unsupported for the mpi_f08 module and the new mpi module. gfortran users will a) fall back to the same mpi module implementation that is in OMPI v1.5.x, and b) not get the new mpi_f08 module. The gfortran maintainers are actively working hard to add the necessary features to support both the new mpi_f08 module and the new mpi module implementations. This will take some time. As mentioned above, ompi/mpi/f77 and ompi/mpi/f90 no longer exist. 
All the fortran bindings implementations have been collated under ompi/mpi/fortran; each implementation has its own subdirectory: {{{ ompi/mpi/fortran/ base/ - glue code mpif-h/ - what used to be ompi/mpi/f77 use-mpi-tkr/ - what used to be ompi/mpi/f90 use-mpi-ignore-tkr/ - new mpi module implementation use-mpi-f08/ - new mpi_f08 module implementation }}} There's also a prototype 6-function-MPI implementation under use-mpi-f08-desc that emulates the new F08 dimension(..) syntax that isn't fully available in Fortran compilers yet. We did that to prove it to ourselves that it could be done once the compilers fully support it. This directory/implementation will likely eventually replace the use-mpi-f08 version. Other things that were done: * ompi_info grew a few new output fields to describe what level of Fortran support is included * Existing Fortran examples in examples/ were renamed; new mpi_f08 examples were added * The old Fortran MPI libraries were renamed: * libmpi_f77 -> libmpi_mpifh * libmpi_f90 -> libmpi_usempi * The configury for Fortran was consolidated and significantly slimmed down. Note that the F77 env variable is now IGNORED for configure; you should only use FC. Example: {{{ shell$ ./configure CC=icc CXX=icpc FC=ifort ... }}} All of this work was done in a Mercurial branch off the SVN trunk, and hosted at Bitbucket. This branch has got to be one of OMPI's longest-running branches. Its first commit was Tue Apr 07 23:01:46 2009 -0400 -- it's over 3 years old! :-) We think we've pulled in all relevant changes from the OMPI trunk (e.g., Fortran implementations of the new MPI-3 MPROBE stuff for mpif.h, use mpi, and use mpi_f08, and the recent Fujitsu Fortran patches). I anticipate some instability when we bring this stuff into the trunk, simply because it touches a LOT of code in the MPI layer in the OMPI code base. We'll try our best to make it as pain-free as possible, but please bear with us when it is committed. This commit was SVN r26283.	2012-04-18 15:57:29 +00:00
Ralph Castain	bd8b4f7f1e	Sorry for mid-day commit, but I had promised on the call to do this upon my return. Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code. Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch. This commit was SVN r26242.	2012-04-06 14:23:13 +00:00
Josh Hursey	35bd7e638f	Make sure we check the correct communicator object for errors. This commit was SVN r26235.	2012-04-04 16:41:55 +00:00
Josh Hursey	0fb6f1c7ac	fix the return code, so we cleanup properly on errors This commit was SVN r26173.	2012-03-21 17:53:34 +00:00
Josh Hursey	3324b9e451	Check the return code from the allreduce collectives. If they fail, so should the operation. This commit was SVN r26170.	2012-03-21 13:46:26 +00:00
Josh Hursey	b36b6639b2	forgot the Copyright update This commit was SVN r26123.	2012-03-09 14:43:00 +00:00
Josh Hursey	ae9f424bd6	Per George's suggestion - reuse ompi_group_compare in the ompi_comm_compare operation to reuse the comparison code. This commit was SVN r26122.	2012-03-09 14:39:09 +00:00
Brian Barrett	b2411fe131	Add support for MPI-3's MPI_COMM_SPLIT_TYPE function This commit was SVN r25738.	2012-01-18 23:35:21 +00:00
George Bosilca	80c02647c8	Each level (OPAL/ORTE/OMPI) should only return its own constants, instead of the current mismatch. This commit was SVN r25230.	2011-10-04 14:50:31 +00:00
Jeff Squyres	b2b781e537	Fix a few miscellaneous memory leaks. This commit was SVN r24865.	2011-07-08 16:39:58 +00:00
Edgar Gabriel	725a0d2100	fix a formatting issue This commit was SVN r24596.	2011-03-31 20:05:45 +00:00
Edgar Gabriel	ad9f793ce4	avoid calling omp_dpm.mark_dyncomm if the size of the local communicator is zero. The routine assumes that at least one process is available in the group, which led to a segfault when creating communicators with GROUP_EMPTY. Fixes trac:2752 This commit was SVN r24595. The following Trac tickets were found above: Ticket 2752 --> https://svn.open-mpi.org/trac/ompi/ticket/2752	2011-03-31 19:57:06 +00:00
Josh Hursey	ee42c673fe	Fix formatting in group and communicator code (- No functionality changes -) Mostly TAB to spaces changes, though a couple style fixes were included as well. The tab/space issue was causing problems with off-trunk branch merging. This commit was SVN r23827.	2010-10-04 14:54:58 +00:00
Abhishek Kulkarni	afbe3e99c6	* Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with (OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns back the native error code. * Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to decode 'ret' to get the native error code. This commit was SVN r23162.	2010-05-17 23:08:56 +00:00
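Why a direct comparison stops working once error codes may be encoded can be seen in the sketch below. Everything here is hypothetical: the `SK_*` constants, `sk_encode`, and `sk_native` use an invented bit layout purely for illustration and do not reproduce the real OPAL SOS encoding or the OPAL_SOS_GET_ERR_CODE macro.

```c
#include <assert.h>

#define SK_SUCCESS        0
#define SK_ERR_NOT_FOUND  (-13)

/* Hypothetical encoding: keep the native (negative) code in the low 8 bits
 * of the magnitude and pack extra diagnostic info above it. */
int sk_encode(int native, int info)
{
    assert(native <= 0 && native > -256 && info >= 0);
    return -((info << 8) | (-native));
}

/* Decode back to the native code; success (0) passes through unchanged. */
int sk_native(int ret)
{
    return (ret == SK_SUCCESS) ? SK_SUCCESS : -((-ret) & 0xFF);
}

/* Check in the style the commit asks for: decode before comparing
 * against a specific error constant. */
int is_not_found(int ret)
{
    return SK_ERR_NOT_FOUND == sk_native(ret);
}
```

Under this invented layout, `sk_encode(SK_ERR_NOT_FOUND, 2)` no longer equals `SK_ERR_NOT_FOUND`, so an undecoded comparison would miss the error, whereas a plain `SK_SUCCESS != ret` test needs no decoding because success is preserved.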
Ralph Castain	b400b84162	Merge in the modified thread configure option branch per today's telecon. Remove the --enable-progress-threads option as this is no longer functional, and hardcode OPAL_ENABLE_PROGRESS_THREADS to 0. Replace the --enable-mpi-threads option with --enable-mpi-thread-multiple as this is clearer as to meaning. This option automatically turns "on" opal thread support if it wasn't already so specified. If the user specifies --disable-opal-multi-threads --enable-mpi-thread-multiple, we will error out with a message Add a new --enable-opal-multi-threads option that turns "on" opal thread support without doing anything wrt mpi-thread-multiple This commit was SVN r22841.	2010-03-16 23:10:50 +00:00
Jeff Squyres	5ec2d8764b	Amendment to r22671: change the name of the new communicator flag from INTERNAL to EXTRA_RETAIN, because not all "internal" communicators have this flag set (only internal communicators with CIDs less than their parent). Hence, what this flag ''really'' means is that there was an extra RETAIN performed on it. So name the flag just that -- EXTRA_RETAIN -- indicating that an extra RETAIN has occurred. This commit was SVN r22690. The following SVN revision numbers were found above: r22671 --> open-mpi/ompi@61dee816db	2010-02-23 21:24:07 +00:00
Shiqing Fan	e0bfd9f836	A type cast. This commit was SVN r22672.	2010-02-20 10:47:37 +00:00
Edgar Gabriel	61dee816db	This commit fixes a bug on how to deal with the potential if a 'dependent' communicator that we created has a lower CID than the parent comm. This can happen when using the hierarch collective communication module or for inter-communicators (since we make a duplicate of the original communicator). This is not a problem as long as the user calls MPI_Comm_free on the parent communicator. However, if the communicators are not freed by the user but released by Open MPI in MPI_Finalize, we walk through the list of still available communicators and free them one by one. Thus, local_comm is freed before the actual inter-communicator. However, the local_comm pointer in the inter communicator will still contain the 'previous' address of the local_comm and thus this will lead to a segmentation violation. In order to prevent that from happening, we increase the reference counter local_comm by one if its CID is lower than the parent. We cannot increase however its reference counter if the CID of local_comm is larger than the CID of the inter communicators, since a regular MPI_Comm_free would leave in that the case the local_comm hanging around and thus we would not recycle CID's properly, which was the reason and the cause for this trouble. This commit fixes tickets 2094 and 2166. Note however, that I want to close them manually, since a slightly different patch is required for the 1.4 series. This commit will have to be applied for the 1.5 series. And I will need a volunteer to review it. This commit was SVN r22671.	2010-02-19 23:45:30 +00:00
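The reference-counting rule described in the commit above (take an extra reference on the dependent communicator only when its CID is lower than the parent's) can be sketched as follows. The struct `comm_t` and the function `link_local_comm` are hypothetical simplifications, not the ompi_communicator_t layout or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified communicator. */
typedef struct {
    int  cid;           /* communicator id; finalize frees in ascending CID order */
    int  refcount;      /* object is destroyed when this reaches zero */
    bool extra_retain;  /* records that the extra reference was taken */
} comm_t;

/* Attach a dependent communicator to its parent. If the dependent would be
 * visited (and freed) before the parent during finalize, hold one extra
 * reference so the parent's pointer to it stays valid until the parent goes. */
void link_local_comm(const comm_t *parent, comm_t *local)
{
    assert(parent != NULL && local != NULL);
    if (local->cid < parent->cid) {
        local->refcount += 1;
        local->extra_retain = true;
    }
    /* Otherwise no extra reference: a regular free of the parent must be
     * able to release the dependent so its CID can be recycled. */
}
```

The flag makes it possible to drop the extra reference later at the right moment instead of guessing from the CIDs again.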
Brian Barrett	8b4825ff37	Updates to make trunk run on Catamount again: * Don't build the pstat component if all defines needed aren't there. * Update platform file to work better * Work around two places that depended on modex being operational This commit was SVN r22536.	2010-02-03 05:07:40 +00:00
Jeff Squyres	6e46fbdd7c	Remove some unused variables / silence some compiler warnings This commit was SVN r22419.	2010-01-15 03:15:18 +00:00
Shiqing Fan	6dc506c9de	Make the MS compiler happy when building static libraries. This commit was SVN r22416.	2010-01-14 22:01:26 +00:00
Edgar Gabriel	5c6384e771	clean up the comm_cid code by removing everything related to the block_cid algorithm. This makes it much easier to read again. This commit was SVN r22379.	2010-01-07 16:26:30 +00:00
Edgar Gabriel	9abeaad6e2	so here is what happens: in the v1.2 series the cid's could never go above the max. allowed for a particular pml. Because of that, pml_add_comm never checked for the cid, and in fact pml_add_comm was called in comm_set, which is before we knew the cid. in the v1.3 series (and trunk) we check now the cid to detect overflow, and because of that pml_add_comm has been moved after the cid allocation routine, namely into the comm_activate routine. in the v1.2 series, the comm_activate contained a synchronization step of the old communicator in order to prevent incoming fragments on the new communicator, with the main problem being that the allreduce in the communicator allocation finished at different times on different processes, and thus, this scenario could and did really occur. in the v1.3 series, the comm_activate does not contain the synchronization step anymore, since we introduced the new queue for fragments with unknown cid. The problem is however, that whether a fragment is known or not is decided by using ompi_comm_lookup(), which will return something useful as soon as the cid allocation finished, even before pml_add_comm has been called. So there is a small time gap where we will not post a message into queue for unknown cid's, but we can also not look up the process structure belonging to the rank in that comm ( that is in pml_ob1_match_recv_frag or something like that). The current fix reintroduces the synchronization step in comm_activate, and ensures that no fragment can be received for a new communicator before the synchronization occurs , and thus comm_nextcid() and pml_add_comm has been called. It seems to be the safest and easiest way for now. Welcome back, v1.2. This commit was SVN r21970.	2009-09-17 14:37:02 +00:00
Lenny Verkhovsky	4a84f29fa6	__func__ changed to hardcoded name, after a long thread of emails :) This commit was SVN r21965.	2009-09-10 08:11:38 +00:00
Lenny Verkhovsky	a4ae241769	replaced __FUNCTION__ with __func__ This commit was SVN r21956.	2009-09-09 12:02:45 +00:00
Lenny Verkhovsky	130d15384f	fixed error message. thanks to Arthur Huillet This commit was SVN r21952.	2009-09-08 15:36:37 +00:00
Brian Barrett	468bb42f83	Per discussion in ticket #2009 , temporarily disable the block CID allocation algorithms until they properly reuse CIDs. This commit was SVN r21879.	2009-08-25 15:13:31 +00:00
Rainer Keller	5a4c6434e7	- Single request, so use single wait Technically we should have used MPI_STATUSES_IGNORE, anyhow. (however both MPI_STATUS_IGNORE are ((MPI_Status)0) ;-) This commit was SVN r21757.	2009-08-04 14:51:15 +00:00
Ralph Castain	3561880546	Silence compiler warning about comparing signed and unsigned values This commit was SVN r21637.	2009-07-11 18:36:43 +00:00
Edgar Gabriel	b6f292f794	add a uint8_t to the startup modex which allows us to recognize whether different processes have requested different levels of thread support. This verification is restricted to MPI_COMM_WORLD. In case one or more processes have requested support for MPI_THREAD_MULTIPLE, the cid selection algorithm will fall back to the original, thread safe approach. Else, it uses the block-algorithm. For dynamic communicators, we always fall back now to the original algorithm. This has been tested for homogeneous and heterogeneous settings for MCW. However, I could not test yet the dynamic comm scenario for technical reasons, and that's why I don't close yet ticket 1949. This commit was SVN r21613.	2009-07-07 18:32:14 +00:00
Rainer Keller	b572dc3591	- As discussed revert r21330, Fortran-configure info should not end up in OPAL - Will post an updated patch for the OMPI_ALIGNMENT_ parts (within C). This commit was SVN r21342. The following SVN revision numbers were found above: r21330 --> open-mpi/ompi@95596d1814	2009-06-01 19:02:34 +00:00
Rainer Keller	95596d1814	- Move alignment and size output generated by configure-tests into the OPAL namespace, eliminating cases like opal/util/arch.c testing for ompi_fortran_logical_t. As this is processor- and compiler-related information (e.g. does the compiler/architecture support REAL*16) this should have been on the OPAL layer. - Unifies f77 code using MPI_Flogical instead of opal_fortran_logical_t - Tested locally (Linux/x86-64) with mpich and intel testsuite but would like to get this week-ends MTT output - PLEASE NOTE: configure-internal macro-names and ompi_cv_ variables have not been changed, so that external platform (not in contrib/) files still work. This commit was SVN r21330.	2009-05-30 15:54:29 +00:00
Edgar Gabriel	d93def71ea	second part of the 'running out of cids problem', this time focusing on what happens when hierarch is used. Two major items: - modify the comm_activate step to take an additional argument, indicating whether the new communicator has to go through the collective selection step. This is not required sometimes (e.g. when a process calls MPI_COMM_SPLIT with color=MPI_UNDEFINED), and contributed significantly to the exhaustion of cids. - when freeing a communicator, check whether we can reuse the block of cids assigned to that comm. This only works if the current front of the cid assignment (cid_block_start) is right after the block of cids assigned to this comm. Fixes trac:1904 Fixes trac:1926 This commit was SVN r21296. The following Trac tickets were found above: Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904 Ticket 1926 --> https://svn.open-mpi.org/trac/ompi/ticket/1926	2009-05-27 15:21:07 +00:00
Greg Koenig	60485ff95f	This is a very large change to rename several #define values from OMPI_* to OPAL_*. This allows opal layer to be used more independent from the whole of ompi. NOTE: 9 "svn mv" operations immediately follow this commit. This commit was SVN r21180.	2009-05-06 20:11:28 +00:00
Shiqing Fan	387ee0ad29	fix a type cast This commit was SVN r21161.	2009-05-05 13:51:02 +00:00
Edgar Gabriel	338b136c28	adding a feature which tries to reuse a block of cids assigned to a communicator. This works, if all processes agree that all communicators utilizing the cids in the block have been freed. If they don't, they assign a new block of cid's. This fixes the application scenario reported in the week, in fact the test successfully creates 100,000 communicators without exceeding a cid of 20. The fix also keeps the main property of the algorithm (namely using a single Allreduce operation to get a new block) and did not modify the communicator structure. This commit was SVN r21142.	2009-05-02 18:03:57 +00:00
Brian Barrett	77cf736f48	Make max_contextid field match same type as cid in communicator. Refs trac:1904 This commit was SVN r21141. The following Trac tickets were found above: Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904	2009-05-01 21:11:59 +00:00
Rainer Keller	221fb9dbca	... Delayed due to notifier commits earlier this day ... - Delete unnecessary header files using contrib/check_unnecessary_headers.sh after applying patches, that include headers, being "lost" due to inclusion in one of the now deleted headers... In total 817 files are touched. In ompi/mpi/c/ header files are moved up into the actual c-file, where necessary (these are the only additional #include), otherwise it is only deletions of #include (apart from the above additions required due to notifier...) - To get different MCAs (OpenIB, TM, ALPS), an earlier version was successfully compiled (yesterday) on: Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled This commit was SVN r21096.	2009-04-29 01:32:14 +00:00
George Bosilca	b7c1ae4f76	Nothing important, just an indentation. This commit was SVN r20919.	2009-04-01 15:27:16 +00:00
Rainer Keller	6f808d9b05	Preparation work for another commit (after RFC): - This patch solely _adds_ required headers and is rather localized The next patch (after RFC) heavily removes headers (based on script) - ompi/communicator/communicator.h: For sources that use ompi_mpi_comm_world, don't require them to include "mpi.h" - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs #include "ompi/mca/topo/topo.h" - ompi/errhandler/errhandler_predefined.h: ompi/communicator/communicator.h depends on this header file! To prevent recursion just have fwd declarations. #include "ompi/types.h" for fwd declarations of the main structs. - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and ompi_rb_tree_t, so have the proper classes - ompi/mca/op/op.h: Op is pretty self-contained: Nobody up to now has done #include "opal/class/opal_object.h" - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/pml/base/base.h: We use opal_lists - ompi/mca/pml/dr/pml_dr_vfrag.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/pml/ob1/pml_ob1_hdr.h: #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t - opal/dss/dss_unpack.c: #include "opal/types.h" - opal/mca/base/base.h: #include "opal/util/cmd_line.h" for opal_cmd_line_t - orte/mca/oob/tcp/oob_tcp.c: #include "opal/types.h" for opal_socklen_t - orte/mca/oob/tcp/oob_tcp.h: #include "opal/threads/threads.h" for opal_thread_t - orte/mca/oob/tcp/oob_tcp_msg.c: #include "opal/types.h" - orte/mca/oob/tcp/oob_tcp_peer.c: #include "opal/types.h" for opal_socklen_t - orte/mca/oob/tcp/oob_tcp_send.c: #include "opal/types.h" - orte/mca/plm/base/plm_base_proxy.c: #include "orte/util/name_fns.h" for ORTE_NAME_PRINT - orte/mca/rml/base/rml_base_receive.c: #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE - orte/mca/rml/oob/rml_oob_recv.c: #include "opal/types.h" for ompi_iov_base_ptr_t - 
orte/mca/rml/oob/rml_oob_send.c: #include "opal/types.h" for ompi_iov_base_ptr_t - orte/runtime/orte_data_server.c #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE - orte/runtime/orte_globals.h: #include "orte/util/name_fns.h" for ORTE_NAME_PRINT Tested on Linux/x86-64 This commit was SVN r20817.	2009-03-17 21:34:30 +00:00
Terry Dontje	9215100ac4	Increase communicator padding to accommodate ppc larger lock structures. This commit was SVN r20728.	2009-03-04 19:54:58 +00:00
Rainer Keller	fd28b392bf	- An intrusive commit yet again (sorry): with the separation we get bitten by header depending on having already included the corresponding [opal\|orte\|ompi]_config.h header. When separating, things like [OPAL\|ORTE\|OMPI]_DECLSPEC are missed. Script to add the corresponding header in front of all following (taking care of possible #ifdef HAVE_...) - Including some minor cleanups to - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H - ompi/mca/btl/btl.h -- include _after_ #ifndef MCA_BTL_H - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for orte/util/output.h - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h - ompi/mca/btl/btl.h -- reorder to fit - ompi/mca/bml/bml.h -- reorder to fit - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit - ompi/request/request.h -- additionally need ompi/constants.h - Tested on linux/x86-64 This commit was SVN r20720.	2009-03-04 15:35:54 +00:00
Terry Dontje	0178b6c45f	Added padding to predefined handle structures to maintain library version to version compatibility. This commit was SVN r20627.	2009-02-24 17:17:33 +00:00
Rainer Keller	d81443cc5a	- On the way to get the BTLs split out and lessen dependency on orte: Often, orte/util/show_help.h is included, although no functionality is required -- instead, most often opal_output.h, or orte/mca/rml/rml_types.h Please see orte_show_help_replacement.sh committed next. - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration actually showed two missing #include "orte/util/show_help.h" in orte/mca/odls/base/odls_base_default_fns.c and in orte/tools/orte-top/orte-top.c Manually added these. Let's have MTT the last word. This commit was SVN r20557.	2009-02-14 02:26:12 +00:00
Jeff Squyres	d1c6f3f89a	* Fix a truckload of Cisco copyrights to be the same as the rest of the code base. * Fix a few misspellings in other copyrights. This commit was SVN r20241.	2009-01-11 02:30:00 +00:00
Edgar Gabriel	b05393b363	now that we have the unexpected message queue for unknown communicators, there is no need to have this additional synchronization operation for multi-threaded communicator creations. This commit was SVN r20046.	2008-12-02 16:30:15 +00:00
George Bosilca	82d1d5d785	The patch for "Unexpected message queue for unknown CID's required" ticket #1460 . I'm unable to split it in two parts, my patch and Edgar's one. So I just update copyright information for both of us. What this patch does: - it uses the unexpected queue created by commit r19562 to dispatch the unexpected message to the right communicator (once this communicator is created and initialized). - delay the PML comm_add until we have the context_id for the new communicator. - only do the PML comm_add on processes that really belong to the new communicator. Please read the lengthy comment in the source code for the reason behind this. This commit was SVN r19929. The following SVN revision numbers were found above: r19562 --> open-mpi/ompi@acd3406aa7	2008-11-04 21:58:06 +00:00
Jeff Squyres	0ae2c27d3b	Ensure that the mutex is properly constructed/destructed. This commit was SVN r19527.	2008-09-09 12:57:45 +00:00
George Bosilca	bf25b3339d	Minor cleanup. This commit was SVN r19464.	2008-08-31 21:03:39 +00:00
Jeff Squyres	008fa8c5cc	Fixes trac:1236, #1237 . * Various changes to enable 0-dimensional cartesian communicators: * Set various mtc_* members to NULL when there are 0 dimensions (and don't bother trying to memcpy these arrays when duplicating the communicator -- because they're NULL) * adjust topo_base_cart_sub to correctly handle 0 dimensions (simplified it a bit) * adjust a few error codes to return ERR_OUT_OF_RESOURCE * adjust error checking of CART_CREATE, CART_RANK * Allow MPI_GRAPH_CREATE to accept 0 == nnodes. * Bump reported MPI version in mpi.h to 2.1 This commit was SVN r19461. The following Trac tickets were found above: Ticket 1236 --> https://svn.open-mpi.org/trac/ompi/ticket/1236	2008-08-31 19:31:10 +00:00
Rainer Keller	9cc83d7414	- Fix the freeing of already allocated buffers, if one fails. Fixes Coverity CID 291 & CID 292 - Adjust the rc for other functions as well. This commit was SVN r19232.	2008-08-11 09:43:01 +00:00
Jeff Squyres	765749209f	Fix CID 413: possible uninitialized variable This commit was SVN r19176.	2008-08-06 12:25:56 +00:00
Edgar Gabriel	1adb3a6cda	Fixes trac:1408 The optimization that was introduced a year ago for saving a collective synchronization step for certain communicator creation functions has to be disabled for now. The bug has been exposed by the hierarch module, but could appear as well for inter-communicator creations. The problem is, that within a communicator creation step we invoke a comm_dup (for intercomm_create) or other collective operations (in case of hierarch) before all processes have been synchronized. This led to the "Dropped message for non-existant communicators" error. This commit disables the optimization without removing it from the code base. In theory, it can be enabled again as soon as we have the unexpected message queues for unknown cid's, which were required if I remember right anyway for the multi-threaded scenarios and potentially for fault tolerance. Before moving the patch to 1.3 I would like to let it soak for a couple of days on trunk. Please note, that my 2nd comment on ticket #1408 was semi-correct, since the order of activation of the communicator and querying the collective module have already been changed earlier. This commit was SVN r19139. The following Trac tickets were found above: Ticket 1408 --> https://svn.open-mpi.org/trac/ompi/ticket/1408	2008-08-04 14:55:09 +00:00
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400 * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA frameworks versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Jeff Squyres	90576a435b	Fixes trac:1345 The issue is that the field mca_topo_base_comm_t->mtc_periods_or_edges has a different length, depending on whether the communicator is a graph or a cart. One of the comm dup functions always assumed that it was the length required by graph comms, which could lead to badness in some cases. This commit makes the length of that field on a comm dup be the proper length and copies the data over appropriately. I also changed the syntax of the ompi_comm_copy_topo() function to use shorter pointer notation; it made the code much easier to read and fix. This commit was SVN r18752. The following Trac tickets were found above: Ticket 1345 --> https://svn.open-mpi.org/trac/ompi/ticket/1345	2008-06-26 16:59:31 +00:00
Ralph Castain	0532d799d6	Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm. Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed. This commit was SVN r18664.	2008-06-18 03:15:56 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already make the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing to messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
George Bosilca	a00ca20446	More cleanups. This commit was SVN r18069.	2008-04-02 06:38:33 +00:00
Jeff Squyres	597266fdec	Present state of MPI debugger work: * New/improved bootstrapping technique for DLLs * First cut of the MPI handle debugging interface. It is still evolving, but the interface is getting more stable. * Some minor bugs were fixed in the unity topo component (brought to light because of the new MPI handle debugging stuff). Fixes trac:1209. This commit was SVN r17730. The following Trac tickets were found above: Ticket 1209 --> https://svn.open-mpi.org/trac/ompi/ticket/1209	2008-03-05 12:22:34 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Shiqing Fan	54c7b71cfd	Use the correct way of including memchecker.h, which will work with '--with-devel-headers'. This commit was SVN r17435.	2008-02-12 18:01:17 +00:00
Shiqing Fan	f5792bbda5	merging the memchecker into trunk. This commit was SVN r17424.	2008-02-12 08:46:27 +00:00

254 commits