openmpi

Author	SHA1	Message	Date
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400 * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA frameworks versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not their opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Galen Shipman	ced88a338b	include portals modex fun in the distro This commit was SVN r18325.	2008-04-28 18:51:54 +00:00
Galen Shipman	3a59cbd4a7	not sure how this got missed.. This commit was SVN r17710.	2008-03-05 01:23:43 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Galen Shipman	44003a41f2	Update common_portals to allow using portals interconnect with a modex rather than relying on cnos to get the nid/pid map. This commit was SVN r17588.	2008-02-25 19:17:21 +00:00
Galen Shipman	a04d21b459	Make CNL compile again.. This commit was SVN r16929.	2007-12-11 16:14:30 +00:00
Ron Brightwell	924414f92f	Added support for Accelerated Portals for the btl. This commit was SVN r16771.	2007-11-21 21:34:17 +00:00
Brian Barrett	3b98b5f0a1	The reference implementation of Portals (which runs over TCP on Linux) is only static libraries. Previously, we were linking the libraries directly into the common, btl, and mtl code. This seemed to work fine for me on my Opteron Fedora box, but caused Lisa some issues (PtlNIInit would succeed, but the network handle would fail when used with PtlEQAlloc). Instead, link the portals libraries directly into libmpi and not at all into the common, btl, or mtl components. Then use some linker tricks to force the linker to bring in the public interface for the reference implementation (which thankfully is pretty small). This commit was SVN r15902.	2007-08-17 03:56:49 +00:00
Brian Barrett	1fb78a35f9	Back out part of r15756. The common_portals_utcp.c file is only used with the Sandia reference implementation of Portals, and doesn't have the cnos functions. This file should never be compiled (and wasn't being compiled) on the Cray machines, so doesn't need to be updated to support CNL. This commit was SVN r15778. The following SVN revision numbers were found above: r15756 --> open-mpi/ompi@755658694e	2007-08-06 17:21:00 +00:00
Josh Hursey	755658694e	Bring in changes to support Cray's Compute Node Linux (CNL) and Application Level Placement Scheduler (ALPS). This commit was tested under two Cray machines at ORNL: Jaguar (Catamount) and Rizzo (CNL Test cage). Both machines performed as they should across the commit. It is likely that more changes will follow as the work and environment stabilize. Most of the infrastructure works the same for Catamount and CNL except for a few bits. Below are the highlights: Default IFACE Change: On Catamount we can use PTL_IFACE_DEFAULT, but the CNL system we have access to will fail on this interface, and it should be set to: IFACE_FROM_BRIDGE_AND_NALID(PTL_BRIDGE_UK,PTL_IFACE_SS). So if we detect that we are running with YOD then use the former interface and if we detect that we are running with ALPS then use the latter. We will want to pursue a more elegant solution if this interface continues to change across machines. PtlGetId and cnos_register_ptlid: The header suggests that these should never be called when launching with YOD. But in the ALPS environment the cnos_barrier() will hang forever if these functions are not called after PtlNIInit(). Since these functions only need to be called once, and the orte rmgr/cnos component is loaded before the ompi common/portals component, just call these functions once in the rmgr/cnos component. cnos_barrier_init(): This is a noop for YOD, but critical for ALPS. So be sure to call it before calling the first barrier in the rmgr/cnos component. cnos_barrier vs cnos_pm_barrier: It is suggested the cnos_pm_barrier only be used during finalization as it will indicate to the launcher (yod or aprun) that the app is about to complete. It was suggested that we use the regular cnos_barrier() instead. I want to look into this a bit more to make sure there are not adverse side effects. A note has been placed in the code to indicate this reasoning. This commit was SVN r15756.	2007-08-03 19:46:38 +00:00
Brian Barrett	d4950c6aa1	Allow an arbitrary list of procs to be passed to the resolve function, instead of just the procs for MCW (in MCW order). Should make resolving ptl_process_id_t structures for arbitrary communicators easier for applications that need it. This commit was SVN r15393.	2007-07-12 20:55:44 +00:00
Brian Barrett	8b9e8054fd	Move modex from pml base to general ompi runtime, since it's used by more than just the PML/BTLs these days. Also clean up the code so that it handles the situation where not all nodes register information for a given node (rather than just spinning until that node sends information, like we do today). Includes r15234 and r15265 from the /tmp/bwb-modex branch. This commit was SVN r15310. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15234 r15265	2007-07-09 17:16:34 +00:00
Jeff Squyres	c91fcd7fbd	Fix a bunch of minor typos submitted by Bernhard Fischer. This commit was SVN r13505.	2007-02-06 12:00:30 +00:00
Rich Graham	1c20feb52b	Take into account constants that are defined differently in the Cray headers than in the Portals spec. This commit was SVN r13311.	2007-01-25 18:32:47 +00:00
Brian Barrett	a34e67d743	Remove unneeded PARAM_INIT_FILE variable in configure.params files used by components that use configure.m4 for configuration or are always built. The macro has not been needed since moving to configure types other than configure.stub Fixes trac:590 This commit was SVN r13031. The following Trac tickets were found above: Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590	2007-01-08 03:44:22 +00:00
Brian Barrett	3e29949cc8	* Fix shutdown code in utcp portals code * make all sends long sends for now in Portals MTL * More optimized match check This commit was SVN r10667.	2006-07-05 21:46:45 +00:00
Brian Barrett	47725c9b02	* Add new PML (CM) and network drivers (MTL) for high speed interconnects that provide matching logic in the library. Currently includes support for MX and some support for Portals * Fix overuse of proc_pml pointer on the ompi_proc structure, splitting into proc_pml for pml data and proc_bml for the BML endpoint data * bug fixes in bsend init code, which wasn't being used by the OB1 or DR PMLs... This commit was SVN r10642.	2006-07-04 01:20:20 +00:00

19 Commits