This is set up so that the warning is only issued once (as opposed to every time it happens), and goes through orte_show_help so the user doesn't get hammered by #procs copies of the warning. In addition, there is a new MCA param (can't have too many!) to shut the warning off altogether.
This closes ticket #1244
This commit was SVN r19196.
set when it launches under debuggers using the --debug option.
This commit was SVN r19116.
The following Trac tickets were found above:
Ticket 1361 --> https://svn.open-mpi.org/trac/ompi/ticket/1361
put the name of the file that set them if they were set by file. This is of great assistance to support personnel trying to understand why a user is having problems.
Coordinated with Jeff.
This commit was SVN r19111.
* add "register" function to mca_base_component_t
* converted coll:basic and paffinity:linux and paffinity:solaris to
use this function
* we'll convert the rest over time (I'll file a ticket once all
this is committed)
* add 32 bytes of "reserved" space to the end of mca_base_component_t
and mca_base_component_data_2_0_0_t to make future upgrades
[slightly] easier
* new mca_base_component_t size: 196 bytes
* new mca_base_component_data_2_0_0_t size: 36 bytes
* MCA base version bumped to v2.0
* '''We now refuse to load components that are not MCA v2.0.x'''
* all MCA frameworks versions bumped to v2.0
* be a little more explicit about version numbers in the MCA base
* add big comment in mca.h about versioning philosophy
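As a rough sketch, the revised component struct has roughly this shape (field names and ordering here are assumptions, not the actual mca.h contents):
{{{
/* Hypothetical sketch of the v2.0 component struct -- member names
 * and layout are assumptions, not quoted from mca.h. */
typedef int (*mca_base_register_component_params_fn_t)(void);

struct mca_base_component_2_0_0_t {
    int mca_major_version;      /* 2 -- components below v2.0.x are refused */
    int mca_minor_version;      /* 0 */
    int mca_release_version;
    /* ... existing type/name/open/close/query members ... */
    mca_base_register_component_params_fn_t mca_register_component_params;
    char reserved[32];          /* padding to make future upgrades easier */
};
}}}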
This commit was SVN r19073.
The following Trac tickets were found above:
Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
Short version: remove opal_paffinity_alone and restore
mpi_paffinity_alone. ORTE makes various information available for the
MPI layer to decide what it wants to do in terms of processor
affinity.
Details:
* remove opal_paffinity_alone MCA param; restore mpi_paffinity_alone
MCA param
* move opal_paffinity_slot_list param registration to paffinity base
* ompi_mpi_init() calls opal_paffinity_base_slot_list_set(); if that
succeeds use that. If no slot list was set, see if
mpi_paffinity_alone was set. If so, bind this process to its Node
Local Rank (NLR). The NLR is the ORTE-maintained slot ID; if you
COMM_SPAWN to a host in this ORTE universe that already has procs
on it, the NLR for the new job will start at N (not 0). So this is
slightly better than mpi_paffinity_alone in the v1.2 series.
* If a slot list is specified *and* mpi_paffinity_alone is set, we
display an error and abort.
* Remove calls from rmaps/rank_file component to register and lookup
opal_paffinity mca params.
* Remove code in orte/odls that set affinities - instead, have them
just pass a slot_list if it exists.
* Cleanup the orte/odls code that determined
oversubscribed/want_processor as these were just opposites of each
other.
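A minimal sketch of the resulting ompi_mpi_init() logic; only opal_paffinity_base_slot_list_set() is named in this commit, and every other identifier below is a placeholder:
{{{
/* Sketch only -- helpers other than opal_paffinity_base_slot_list_set()
 * are placeholders, not real API. */
if (OPAL_SUCCESS == opal_paffinity_base_slot_list_set(vpid)) {
    if (mpi_paffinity_alone) {
        /* slot list *and* mpi_paffinity_alone: error and abort */
        show_conflict_error_and_abort();
    }
    /* otherwise, the slot-list binding is used as-is */
} else if (mpi_paffinity_alone) {
    /* bind this process to its node local rank (NLR) */
    bind_to_processor(node_local_rank);
}
}}}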
This commit was SVN r18874.
The following Trac tickets were found above:
Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383
Lenny and I went back and forth on whether we should simply register
another "mpi_paffinity_alone" MCA param and then try to figure out
which one was set in ompi_mpi_init, but there was difficulty in
figuring out what to do. So it seemed like the Right Thing to do was
to implement what was committed in r18770; then we could tell where
MCA parameters were set from and you could do Better Things (this is
also useful in the openib BTL, where parameters can be set either via
MCA parameter or via an INI file).
But after that was done, it seemed only a few steps further to
actually implement two new features in the MCA params area:
* Synonyms (where one MCA param name is a synonym for another)
* Allow MCA params and/or their synonyms to be marked as "deprecated"
(printing out warnings if they are used)
These features have actually long been discussed/desired, and I had
some time in airports and airplanes recently where I could work in
this stuff on a standalone laptop. So I did it. :-)
This commit introduces these two new features, and then uses them to
register mpi_paffinity_alone as a non-deprecated synonym for
opal_paffinity_alone. A few other random points in this commit:
* Add a few error checks for conditions that were not checked before
* Correct some comments in mca_base_params.h
* Add a few comments in strategic places
* ompi_info now prints additional information:
* for any MCA parameter that has synonyms, it lists all the
synonyms
* synonyms are also output as 1st-class MCA params, but with an
additional attribute indicating that they have a "parent"
* all MCA param names (both "real" and "synonym") will output an
attribute indicating whether they are deprecated or not. A synonym
is deprecated if it itself is marked as deprecated (via the
mca_base_param_register_syn() or mca_base_param_register_syn_name()
functions) or if its "parent" MCA parameter is deprecated
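For example, registering mpi_paffinity_alone as a non-deprecated synonym for opal_paffinity_alone might look roughly like this; the argument lists are assumptions, only the synonym-registration function name comes from this commit:
{{{
/* Sketch -- exact signatures are assumed, not authoritative. */
int idx = mca_base_param_reg_int_name("opal", "paffinity_alone",
                                      "Bind each process to a processor",
                                      false, false, 0, NULL);
/* Register mpi_paffinity_alone as a non-deprecated synonym */
mca_base_param_register_syn_name(idx, "mpi", "paffinity_alone",
                                 false /* not deprecated */);
}}}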
This commit was SVN r18859.
The following SVN revision numbers were found above:
r18770 --> open-mpi/ompi@8efe67e08c
The following Trac tickets were found above:
Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383
Update the ESS API so we can update the stored archs should the modex include that info. Update ompi/proc to check/set the arch for remote procs, and add that function call to mpi_init right after the modex is done.
Set up to allow other grpcomm modules to decide whether or not to add the arch to the modex, and to detect if other entries have been made. If not, then the modex can just fall through. Begin setting up some logic in the "basic" module to handle different arch situations.
For now, default to the "bad" module so we will work in all situations, even though we may be sending around more info than we really require.
This fixes ticket #1340
This commit was SVN r18673.
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.
This commit was SVN r18664.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.
I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.
This commit was SVN r18619.
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we have already done the search-and-replace
on the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not the opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so if you explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
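For instance, converted code might look like the following; the help file and topic names here are hypothetical:
{{{
#include "orte/util/output.h"

/* Same signature as opal_show_help(); rendered, aggregated, and
 * displayed on the HNP.  File/topic names below are hypothetical. */
orte_show_help("help-mycomponent.txt", "no-resources-found", true,
               orte_system_info.nodename);

/* Stream 0 behaves like opal_output's stream 0 */
orte_output(0, "mycomponent: initialization complete");
}}}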
=== Notes ===
* orte_output'ing to stream 0 behaves much as opal_output'ing to
stream 0 did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent via the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
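For example, to restore the original non-aggregated behavior:
{{{
$ mpirun --mca orte_base_help_aggregate 0 -np 4 ./a.out
}}}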
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to have configure.m4 search for an appropriate XML
library, link it in, and use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
Update the rsh tree spawn capability so we spawn the next wave of daemons before launching our own local procs.
Add an ability to encode nodenames for large clusters with contiguous node name numbering schemes - this allows communication of all node names in a few bytes instead of tens-of-bytes/node.
This commit was SVN r18338.
The problem was caused by a bad ordering between the restart of the ORTE level tcp connections (in the OOB - out-of-band communication) and the Open MPI level tcp connections (BTLs). Before this commit, ORTE would shut down and restart the OOB completely before the OMPI level restarted its tcp connections. What would happen is that a socket descriptor used by the OMPI level on checkpoint was assigned to the ORTE level on restart. But the OMPI level had no knowledge that the socket descriptor it was previously using had been recycled, so it closed it on restart. This caused the ORTE level to break as the newly created socket descriptor was closed without its knowledge.
The fix is to have the OMPI level shut down tcp connections, allow the ORTE level to restart, and then allow the OMPI level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit.
Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it.
* Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like) which is useful when you want to close, but not reopen a component set for fear of interfering with a lower level.
* Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components.
* Update ft_event functions in PML and BML to handle the new restart state.
* Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging.
This commit was SVN r18276.
selected, but before we check whether we have been spawned. This is necessary
in order for the hierarch collective component to work. This component might
create new communicators already in MPI_Init(), which then have to execute the
dpm.mark_dyncomm function. If dpm is not initialized at that point, we
segfault.
This commit was SVN r18045.
1. applied prefix rule to functions and variables of RMAPS rank_file component
2. cleaned ompi_mpi_init.c from paffinity code
3. paffinity code moved to new opal/mca/paffinity/base/paffinity_base_service.c file
4. added opal_paffinity_slot_list mca parameter
This commit was SVN r18019.
The bug was a race condition in the barrier operation that caused the barrier in MPI_Finalize to fail on very short programs.
Scalability was improved by using the daemons to aggregate modex and barrier messages before sending them to the rank=0 proc. Improvement is proportional to ppn, of course, but there really wasn't a scaling problem at low ppn anyway. This modification also paves the way for better allgather operations since now all the data for each node is sitting at the daemon level, and the daemons are now aware that a collective operation on the OOB is underway (so they -can- participate in a collective of their own to support it).
Also added better diagnostics to map out the timing associated with MPI_Init - turned on by -mca orte_timing 1.
This commit was SVN r17988.
"all", not just the first 3 chars (i.e., if someone sets the value
"allfoo", we should still error).
This commit was SVN r17981.
The following SVN revision numbers were found above:
r17980 --> open-mpi/ompi@b3ef774d46
r17956 broke the ability for the user to override the 'opal_event_include'
parameter. This commit checks to see if the user specified a value before
forcing the "all" value on the event engine.
This commit fixes Checkpoint/Restart support in the trunk which requires
this feature.
This commit was SVN r17980.
The following SVN revision numbers were found above:
r17956 --> open-mpi/ompi@763218e754
mechanisms (such as epoll) if someone (ompi_mpi_init()) requests
otherwise. See big comment in opal/event/event.c for a full
explanation.
This commit was SVN r17956.
initialized. For example, there is a period of time during
ompi_mpi_init when orte_initialized==true, but
ompi_mpi_initialized==false (and therefore communicators are not setup
yet, etc.).
This commit was SVN r17937.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.
This commit was SVN r17926.
* Fix an error message to correctly display if we were before
MPI_INIT or after MPI_FINALIZE (refs trac:1243)
This commit was SVN r17873.
The following Trac tickets were found above:
Ticket 1243 --> https://svn.open-mpi.org/trac/ompi/ticket/1243
* New/improved bootstrapping technique for DLLs
* First cut of the MPI handle debugging interface. It is still
evolving, but the interface is getting more stable.
* Some minor bugs were fixed in the unity topo component (brought to
light because of the new MPI handle debugging stuff).
Fixes trac:1209.
This commit was SVN r17730.
The following Trac tickets were found above:
Ticket 1209 --> https://svn.open-mpi.org/trac/ompi/ticket/1209
* Extension to the ESS framework to support C/R
* Fixed support for {{{snapc_base_establish_global_snapshot_dir}}}
* Fixed FileM support
* Misc. minor code modifications
There are some outstanding visibility issues that I want to fix next.
This commit was SVN r17725.
(sometimes after the merge with the ORTE branch), the opal_pointer_array
will become the only pointer_array implementation (the orte_pointer_array
will be removed).
This commit was SVN r17007.
This commit brings over all the work from the /tmp-public/datarep
branch. See commits r16855, r16859, r16860 for the highlights of what
was done.
This commit was SVN r16891.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r16855
r16859
r16860
The following Trac tickets were found above:
Ticket 1029 --> https://svn.open-mpi.org/trac/ompi/ticket/1029
The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component.
This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done:
As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in.
In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in.
The incoming changes revamp these procedures in three ways:
1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step.
The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic.
Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure.
2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed.
The size of this data has been reduced in three ways:
(a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes.
To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose.
(b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction.
(c) when the mca param requesting that nodenames be sent (to support pretty error messages) is set, a second mca param is now used to request the FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using.
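As a sketch of the reduced name layout from item (a) above, assuming the usual ORTE typedef style:
{{{
/* Default layout after this change (sketch; see item (a) above): */
typedef uint16_t orte_jobid_t;   /* up to 32k jobs          */
typedef uint16_t orte_vpid_t;    /* up to 32k procs per job */

/* configure --enable-jumbo-apps widens both fields to 32 bits:
 *   typedef uint32_t orte_jobid_t;   etc. */
}}}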
While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly.
3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup.
It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging.
Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future.
There are a few minor additional changes in the commit that I'll just note in passing:
* propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details.
* requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details.
* cleanup of some stale header files
This commit was SVN r16364.
that it is >= 1, so making it a size_t makes it easier to interact
with all the other size_t variables and removes a compiler warning.
This commit was SVN r15935.
used at once (up to one unique collective module per collective function).
Matches r15795:15921 of the tmp/bwb-coll-select branch
This commit was SVN r15924.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15795
r15921
mpi_preconnect_oob_simultaneous > np. Need to scale back
simultaneous to equal np in those cases. Reviewed by Brian.
This commit fixes trac:1064.
This commit was SVN r15916.
The following Trac tickets were found above:
Ticket 1064 --> https://svn.open-mpi.org/trac/ompi/ticket/1064
in the OMPI proc structures. For now, use an extension of the modex that is
keyed on strings. Eventually, this should use the attribute put/get that is
part of the RSL interface.
This commit was SVN r15820.
mpi_show_mpi_alloc_mem_leaks
When activated, MPI_FINALIZE displays a list of memory allocations
from MPI_ALLOC_MEM that were not freed by MPI_FREE_MEM (in each MPI
process).
* If set to a positive integer, display only that many leaks.
* If set to a negative integer, display all leaks.
* If set to 0, do not show any leaks.
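For example, to show up to 10 leaked allocations per process at MPI_FINALIZE:
{{{
$ mpirun --mca mpi_show_mpi_alloc_mem_leaks 10 -np 4 ./a.out
}}}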
This commit was SVN r15736.
* General TCP cleanup for OPAL / ORTE
* Simplifying the OOB by moving much of the logic into the RML
* Allowing the OOB RML component to do routing of messages
* Adding a component framework for handling routing tables
* Moving the xcast functionality from the OOB base to its own framework
Includes merge from tmp/bwb-oob-rml-merge revisions:
r15506, r15507, r15508, r15510, r15511, r15512, r15513
This commit was SVN r15528.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15506
r15507
r15508
r15510
r15511
r15512
r15513
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.
Note that I could not possibly find all outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.
This commit was SVN r15517.
* bml.h had a change that introduced a variable named "_order" to
avoid a conflict with a local variable. The namespace starting
with _ belongs to the OS/compiler/kernel, not us. So we can't start
symbols with _. So I replaced it with arg_order, and also updated
the threaded equivalent of the macro that was modified.
* in btl_openib_proc.c, one opal_output accidentally had its string
reverted from "ompi_modex_recv..." to
"mca_pml_base_modex_recv....". This was fixed.
* The change to ompi/runtime/ompi_preconnect.c was entirely
reverted; it was an artifact of debugging.
This commit was SVN r15475.
The following SVN revision numbers were found above:
r15474 --> open-mpi/ompi@8ace07efed
1. Galen's fine-grain control of queue pair resources in the openib
BTL.
1. Pasha's new implementation of asynchronous HCA event handling.
Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.
Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes). :-(
== Fine-grain control of queue pair resources ==
Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).
Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
The new design allows multiple QPs to be specified at runtime. Each
QP can be setup to use PP or SRQ receive buffers as well as giving
fine-grained control over receive buffer size, number of receive
buffers to post, when to replenish the receive queue (low water mark)
and for SRQ QPs, the number of outstanding sends can also be
specified. The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:
{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}
Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.
The first QP is:
P,128,16,4
Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).
The second QP is:
S,1024,256,128,32
Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.
QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed prior to 1.3 release.
This commit was SVN r15474.
Short description: major changes include -
1. singletons now fork/exec a local daemon to manage their operations.
2. the orte daemon code now resides in libopen-rte
3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided.
I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine.
This commit was SVN r15390.
than just the PML/BTLs these days. Also clean up the code so that it
handles the situation where not all nodes register information for a given
node (rather than just spinning until that node sends information, like
we do today).
Includes r15234 and r15265 from the /tmp/bwb-modex branch.
This commit was SVN r15310.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15234
r15265
Changes paffinity interface to use a cpu mask for available/preferred cpus
rather than the current coarse grained paffinity that lets the OS choose
which processor.
Macros for setting and clearing masks are provided.
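Usage would look roughly like this; the macro and function names below follow the usual CPU-set pattern but are assumptions, not quotes from the paffinity headers:
{{{
/* Sketch -- identifier names are assumptions. */
opal_paffinity_base_cpu_set_t mask;

OPAL_PAFFINITY_CPU_ZERO(mask);     /* clear the mask */
OPAL_PAFFINITY_CPU_SET(2, mask);   /* request processor 2 */
OPAL_PAFFINITY_CPU_SET(3, mask);   /* ...and processor 3 */
opal_paffinity_base_set(mask);     /* bind to the masked processors */
}}}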
Solaris and Windows changes have not been made. The Solaris subdirectory has some
suggested changes - however, the relevant man pages for the Solaris 10 APIs
have some ambiguity regarding the order in which one creates and sets a processor
set. As we did not have access to a Solaris 10 machine, we could not test to
see the correct way to do the work under Solaris.
This commit was SVN r14887.
This commit moves the initialization/finalization of opal_event and opal_progress
to opal_init/finalize. These were previously init/final in ORTE, which is an
abstraction violation. After talking about it we concluded that there are no
ordering issues that require these to be init/final in ORTE instead of OPAL.
I ran the IBM test suite against this commit and it didn't turn up any new
failures so I think it is good to go.
Let us know if this causes problems.
This commit was SVN r14773.
wireup. For small clusters or clusters with decent ARP lookup and
connect times, this will have marginal impact. For systems with either
bad ARP lookup times or long connect times, increasing this number
to something much closer to SOMAXCONN (128 on most modern machines) will
result in a faster OOB wireup. Don't set higher than SOMAXCONN or you
can end up with lots of connect() retries and we'll end up slower.
This commit was SVN r14742.
The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system.
Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed.
Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief.
With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn.
Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put".
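In other words, conceptually (the names and argument shapes below are illustrative, not the actual GPR API):
{{{
/* arith(segment, key, op, operand) is the atomic equivalent of:
 *     value = get(segment, key);
 *     value = value <op> operand;   // add, subtract, multiply, divide
 *     put(segment, key, value);
 * but without the race between the get and the put. */
orte_gpr.arith(segment, key, ORTE_GPR_ADD, operand);  /* illustrative */
}}}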
This commit was SVN r14711.
* Don't need the 2 process case -- we'll send an extra message, but
at very little cost and less code is better.
* Use COMPLETE sends instead of STANDARD sends so that the connection
is fully established before we move on to the next connection. The
previous code was still causing minor connection flooding for huge
numbers of processes.
* mpi_preconnect_all now connects both OOB and MPI layers. There's
also mpi_preconnect_mpi and mpi_preconnect_oob should you want to
be more specific.
* Since we're only using the MCA parameters once at the beginning
of time, no need for global constants. Just do the quick param
lookup right before the parameter is needed. Save some of that
global variable space for the next guy.
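For example (the parameter names come from this commit; the rest of the command line is generic):
{{{
$ mpirun --mca mpi_preconnect_all 1 -np 16 ./a.out   # OOB + MPI wireup
$ mpirun --mca mpi_preconnect_oob 1 -np 16 ./a.out   # OOB layer only
}}}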
Fixes trac:963
This commit was SVN r14553.
The following Trac tickets were found above:
Ticket 963 --> https://svn.open-mpi.org/trac/ompi/ticket/963
There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd *really* have to try).
This also involved a slight change to the oob.xcast API, so propagated that as required.
Note: this has *only* been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-)
Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately.
This commit was SVN r14475.
Fix for memory corruption in the restarted process stack. This stemmed from
the brute force method we were previously using. This commit fixes this by
using a lighter weight solution focused in the r2 BML instead of above the PML.
This is a more efficient and flexible solution, and it solves the original
problem.
In the process I pulled out the ft_event function in the tcp BTL and r2 BML
into a set of *_ft.[c|h] files just to keep any updates to these code paths
as isolated as possible to make merging easier on everyone.
This commit was SVN r14371.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855
The following Trac tickets were found above:
Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
support SEND_IN_PLACE causes badness because the BTL tries to use the
not-exactly-complete convertor. Don't need it in this situation anyway.
This commit was SVN r13700.
Add new function opal_get_num_processors() that will return the number
of processors on the local host. Does the Right Thing in POSIX
environments (to include a special case for OS X), and will shortly do
the Right Thing for Windows (this commit includes a change to
configure, so I wanted to get that in before the US workday -- the
Windows code can come shortly because it won't involve configury
changes).
This commit was SVN r13506.
The following Trac tickets were found above:
Ticket 853 --> https://svn.open-mpi.org/trac/ompi/ticket/853
1. if the user has specified sched_yield, we simply do what we are told
2. if they didn't specify anything, try to get the number of processors on this node. Note that we already now get the number of local procs in our job that are sharing this node - that now comes in through the proc callback and is stored in the ompi_proc_t structures.
3. if we can get the number of processors, compare that to the number of local procs from my job that are sharing my node. If the number of local procs exceeds the number of processors, then set sched_yield to true. If not, then be a hog and set sched_yield to false
4. if we can't get the number of processors, default to conservative behavior and set sched_yield to true.
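Putting the four rules together, a minimal sketch (variable names are assumptions; only opal_get_num_processors() is named in these commits):
{{{
/* Sketch of the sched_yield decision -- names are assumptions. */
bool user_set_sched_yield;   /* did the user set the MCA param?   */
bool sched_yield_flag;       /* the value we will act on          */
int  num_local_procs;        /* procs from our job on this node   */

if (user_set_sched_yield) {
    /* 1. simply do what we are told */
} else {
    int nprocs = opal_get_num_processors();
    if (nprocs > 0) {
        /* 2./3. yield only if we oversubscribe the processors */
        sched_yield_flag = (num_local_procs > nprocs);
    } else {
        /* 4. can't tell: default to conservative behavior */
        sched_yield_flag = true;
    }
}
}}}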
Note that I have not yet dealt with the need to dynamically adjust this setting as more processes are added via comm_spawn. So far, we are *only* looking within our own job. Given that we have now moved this logic to mpi_init (and away from the orteds), it isn't yet clear to me how a process will be informed about the number of procs in *other* jobs that are also sharing this node.
Something to continue to ponder.
This commit was SVN r13430.
* The real fix, don't leave the OOB in blocking mode during comm_dyn_init(),
as it means no progressing MPI events while the event library is waiting
for TCP stuff to come in.
* Add many comments explaining the reasons for the current ordering
This commit was SVN r13422.
completed successfully, Bad Things(tm) could happen.
* Now we explicitly check orte_initialized (a new global in ORTE
indicating whether we are between orte_init() and orte_finalize()
or not), and if so, react accordingly.
* If ORTE is initialized, use orte_system_info.nodename; otherwise,
use gethostname().
* Add loop protection to ensure that ompi_mpi_abort() is not invoked
multiple times recursively.
This commit was SVN r13354.
know what my local rank is, and therefore set my paffinity ID as
appropriate. Specifically, we're no longer relying on the
special/secret mpi_paffinity_processor MCA parameter that the orted
would set for us.
This allows processor affinity to be used in environments where the
orted is not used (e.g., bproc, and someday in the hopefully not
too-distant future, SLURM).
This commit was SVN r13352.
The following SVN revision numbers were found above:
r13351 --> open-mpi/ompi@a338b7e533
needlessly registered in multiple different places, and none of them
had a good help string. There was also an inconsistent check for
setting both mpi_leave_pinned and mpi_leave_pinned_pipeline (i.e., it
was only in ob1). This commit moves the registration of these params
to one central place (ompi/runtime/ompi_mpi_params.c, with all other
mpi_* MCA params) and uses globals to propagate the values as
relevant. The error check was also moved to the central location to
ensure consistency everywhere.
This commit was SVN r13226.