openmpi

Author	SHA1	Message	Date
Ralph Castain	49d938de29	Merge one-sided updates to the trunk - written by Brian Barrett and Nathan Hjelmn cmr=v1.7.5:reviewer=hjelmn:subject=Update one-sided to MPI-3 This commit was SVN r30816.	2014-02-25 17:36:43 +00:00
Brian Barrett	f42783ae1a	Move the RTE framework change into the trunk. With this change, all non-CR runtime code goes through one of the rte, dpm, or pubsub frameworks. This commit was SVN r27934.	2013-01-27 23:25:10 +00:00
Josh Hursey	28681deffa	Backout the ORCA commit. :( There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk. This commit was SVN r26676.	2012-06-27 01:28:28 +00:00
Josh Hursey	542330e3a7	Commit of ORCA: Open MPI Runtime Collaborative Abstraction This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI. The project is described on the wiki: https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition And on this email thread: http://www.open-mpi.org/community/lists/devel/2012/06/11109.php This commit was SVN r26670.	2012-06-26 21:42:16 +00:00
Christopher Yeoh	bab59bda76	Fixes trac:2767: Recursive locking when ROMIO used with THREAD_MULITPLE This commit was SVN r24681. The following Trac tickets were found above: Ticket 2767 --> https://svn.open-mpi.org/trac/ompi/ticket/2767	2011-05-04 06:31:42 +00:00
Rainer Keller	0feb158aaf	- Since r22727 orte_app_idx_t was introduced, being a uint32_t (was previously an orte_std_cntr_t, which is int32_t). Comparison with < 0 don't make any sense, here. This commit was SVN r22799. The following SVN revision numbers were found above: r22727 --> open-mpi/ompi@2541aa98ab	2010-03-08 22:56:33 +00:00
George Bosilca	3e971e61f3	The system headers are supposed to be protected by #ifdef and not by #if. This commit was SVN r21700.	2009-07-16 18:27:33 +00:00
Rainer Keller	ec0ed48718	- Revert r20739 This commit was SVN r20742. The following SVN revision numbers were found above: r20739 --> open-mpi/ompi@781caee0b6	2009-03-05 21:56:03 +00:00
Rainer Keller	781caee0b6	- First of two or three patches, in orte/util/proc_info.h: Adapt orte_process_info to orte_proc_info, and change orte_proc_info() to orte_proc_info_init(). - Compiled on linux-x86-64 - Discussed with Ralph This commit was SVN r20739.	2009-03-05 20:36:44 +00:00
Ralph Castain	ba5498cdc6	Repair the MPI-2 dynamic operations. This includes: 1. repair of the linear and direct routed modules 2. repair of the ompi/pubsub/orte module to correctly init routes to the ompi-server, and correctly handle failure to correctly parse the provided ompi-server URI 3. modification of orterun to accept both "file" and "FILE" for designating where the ompi-server URI is to be found - purely a convenience feature 4. resolution of a message ordering problem during the connect/accept handshake that allowed the "send-first" proc to attempt to send to the "recv-first" proc before the HNP had actually updated its routes. Let this be a further reminder to all - message ordering is NOT guaranteed in the OOB 5. Repair the ompi/dpm/orte module to correctly init routes during connect/accept. Reminder to all: messages sent to procs in another job family (i.e., started by a different mpirun) are ALWAYS routed through the respective HNPs. As per the comments in orte/routed, this is REQUIRED to maintain connect/accept (where only the root proc on each side is capable of init'ing the routes), allow communication between mpirun's using different routing modules, and to minimize connections on tools such as ompi-server. It is all taken care of "under the covers" by the OOB to ensure that a route back to the sender is maintained, even when the different mpirun's are using different routed modules. 6. corrections in the orte/odls to ensure proper identification of daemons participating in a dynamic launch 7. corrections in build/nidmap to support update of an existing nidmap during dynamic launch 8. corrected implementation of the update_arch function in the ESS, along with consolidation of a number of ESS operations into base functions for easier maintenance. The ability to support info from multiple jobs was added, although we don't currently do so - this will come later to support further fault recovery strategies 9. minor updates to several functions to remove unnecessary and/or no longer used variables and envar's, add some debugging output, etc. 10. addition of a new macro ORTE_PROC_IS_DAEMON that resolves to true if the provided proc is a daemon There is still more cleanup to be done for efficiency, but this at least works. Tested on single-node Mac, multi-node SLURM via odin. Tests included connect/accept, publish/lookup/unpublish, comm_spawn, comm_spawn_multiple, and singleton comm_spawn. Fixes ticket #1256 This commit was SVN r18804.	2008-07-03 17:53:37 +00:00
Jeff Squyres	3616b03eb3	Fix a comment -- we implemented windows a long time ago. This commit was SVN r16657.	2007-11-05 13:43:53 +00:00
Jeff Squyres	b5abb12c98	Commit Ralph's fix for MPI_APPNUM. This commit was SVN r16371.	2007-10-06 18:54:43 +00:00
Tim Prins	c46ed1d5d4	Make it so the universe size is passed through the ODLS instead of through a gpr trigger during MPI init. This matches what is currently being done with the app number. The default odls has been updated and works fine. The process odls has been updated, but I could not verify its operation. The bproc ODLS has not been updated yet. Ralph will look at it soon. This commit was SVN r15257.	2007-07-02 01:33:35 +00:00
Jeff Squyres	260f1fd468	Fixes trac:817 The C++ bindings were not tracking keyvals properly -- they were freeing some internal meta data when Free_keyval() was called, not when the keyval was actually destroyed (keyvals are refcounted in the C layer, just like all other MPI objects, because they can live for long after their corresponding Free call is invoked). This commit fixes this problem and several other things: * Add infrastructure on the ompi_attribute_keyval_t for an "extra" destructor pointer that will be invoked during the "real" constructor (i.e., when OBJ_RELEASE puts the refcount to 0). This allows calling back into the C++ layer to release meta data associated with the keyval. * Adjust all cases where keyvals are created to pass in relevant destructors (NULL or the C++ destructor). * Do essentially the same for MPI::Comm, MPI::Win, and MPI:Datatype: * Move several functions out of the .cc file into the _inln.h file since they no longer require locks * Make the 4 Create_keyval() functions call a common back-end keyval creation function that does the Right Thing depending on whether C or C++ function pointers were used for the keyval functions. The back-end function does not call the corresponding C MPI__create_keyval function, but rather does the work itself so that it can associate a "destructor" callback for the C++ bindings for when the keyval is actually destroyed. Change a few type names to be more indicative of what they are (mostly dealing with keyvals [not "keys"]). * Add the 3 missing bindings for MPI::Comm::Create_keyval(). * Remove MPI::Comm::comm_map (and associated types) because it's no longer necessary in the intercepts -- it was a by-product of being a portable C++ bindings layer. Now we can just query the C layer directly to figure out what type a communicator is. This solves some logistics / callback issues, too. * Rename several types, variables, and fix many comments in the back-end C attribute implementation to make the names really reflect what they are (keyvals vs. attributes). The previous names heavily overloaded the name "key" and were ''extremely'' confusing. This commit was SVN r13565. The following Trac tickets were found above: Ticket 817 --> https://svn.open-mpi.org/trac/ompi/ticket/817	2007-02-08 23:50:04 +00:00
Ralph Castain	0a5d41857a	Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects. I found only two places that were looking at the tokens: 1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message. 2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was always the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes). This commit was SVN r12813.	2006-12-09 23:10:25 +00:00
Ralph Castain	a1153fdc8f	Eliminate virtually all of the attribute_predefined data from the STG1 message. We now compute the total number of slots allocated to us and save that in the registry - the attributed_predefined then retrieves it via the STG1 message. The app_num is passed via the process_info structure, which gets the value from the ODLS in the environment. Obviously, people like bproc will have to get the app_num via another avenue...but that's a problem for another day. Several options are easily available. This commit was SVN r12788.	2006-12-07 03:11:20 +00:00
Edgar Gabriel	1359ba9b13	Rewriting much of the errorcode and errorclass code, since - we have to be able to attach a string to an error class, not just to an error code - according to MPI-2 the attribute MPI_LASTUSEDCODE has to be updated everytime you add a new code or a new class. Thus, you have to have single list for both. Thus, we got rid of the error_class structure. In the error-code structure, we can distinguish whether we are dealing with an error code or an error class by looking at the err->code element of the structure. In case its value is MPI_UNDEFINED, the according entry is a class, else it is an error code. All predefined error codes have the code and the class field set to the same value. The test MPI_Add_error_class1 passes now. Fixes trac:418 This commit was SVN r12764. The following Trac tickets were found above: Ticket 418 --> https://svn.open-mpi.org/trac/ompi/ticket/418	2006-12-05 19:07:02 +00:00
George Bosilca	a0ed53d70b	Make the compilers happy. This commit was SVN r12729.	2006-12-03 00:19:11 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
Ralph Castain	8b6921f297	Initialize the rank variable before it is used. This commit was SVN r12565.	2006-11-13 02:37:12 +00:00
Ralph Castain	17c71f8d2a	Fix ticket #545 Setup subscriptions to correctly return the MPI_APPNUM attribute. Fix an unreported bug that was found. The universe size was incorrectly defined in the attributes code. As coded, it looked for size_t values and based its size computation on those numbers. Unfortunately, the node_slots value had been changed to an orte_std_cntr_t awhile back! So the universe size was never updated. Update the hello_nodename test to check for MPI_APPNUM. Add a definition to ns_types for ORTE_PROC_MY_NAME - just a shortcut for orte_process_info.my_name. Brought over from ORTE 2.0 as it will be used extensively there. This commit was SVN r12377.	2006-10-31 21:29:07 +00:00
George Bosilca	06563b5dec	Last set of explicit conversions. We are now close to the zero warnings on all platforms. The only exceptions (and I will not deal with them anytime soon) are on Windows: - the write functions which require the length to be an int when it's a size_t on all UNIX variants. - all iovec manipulation functions where the iov_len is again an int when it's a size_t on most of the UNIXes. As these only happens on Windows, so I think we're set for now :) This commit was SVN r12215.	2006-10-20 03:57:44 +00:00
Ralph Castain	5dfd54c778	With the branch to 1.2 made.... Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced). Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up). I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t). In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but... Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems. This commit was SVN r11204.	2006-08-15 19:54:10 +00:00
George Bosilca	85bb1a9c90	Add one more argument to the copy functions for the MPI objects. As this argument is the last one on the list and as on C the caller "make it right" this addition will not affect the way we handle the user defined copy functions. Only the C version of the function has this additional parameter. As it represent the pointer to the newly created MPI object It hold the key to allow us to modify the new object (communicator, window or type) depending on some key stored on the initial communicator. This commit was SVN r9371.	2006-03-23 04:47:14 +00:00
Rainer Keller	4b1194056f	- Upon Finalize, also release the memory of predefined attributes. This commit was SVN r9169.	2006-02-27 15:15:48 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
Ralph Castain	4b9f015c0b	Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list. This commit was SVN r8912.	2006-02-07 03:32:36 +00:00
Brian Barrett	b1d2424013	Merge in present work on the MPI-2 onesided chapter. The current code is not complete, but stable enough that it will have no impact on general development, so into the trunk it goes. Changes in this commit include: - Remove the --with option for disabling MPI-2 onesided support. It complicated code, and has no real reason for existing - add a framework osc (OneSided Communication) for encapsulating all the MPI-2 onesided functionality - Modify the MPI interface functions for the MPI-2 onesided chapter to properly call the underlying framework and do the required error checking - Created an osc component pt2pt, which is layered over the BML/BTL for communication (although it also uses the PML for long message transfers). Currently, all support functions, all communication functions (Put, Get, Accumulate), and the Fence synchronization function are implemented. The PWSC active synchronization functions and Lock/Unlock passive synchronization functions are still not implemented This commit was SVN r8836.	2006-01-28 15:38:37 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Ralph Castain	96f4bb7a63	Hey, sports fans!! Guess what?? Here's the huge registry check-in you've all been waiting for with baited breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears. In addition, the commit contains the following: 1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe 2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well. In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first). This commit was SVN r7118.	2005-09-01 01:07:30 +00:00
Jeff Squyres	c9cdb36b0b	Finally get this right: move orte_sys_info.[ch] back into the orte tree. - fix up #include's throughout the tree (yay contrib/search_replace.pl!) - remove a few extraneous #include's - remove orte_sys_info*() from opal_init()/opal_finalize() (it's already in orte_init_stage1() and orte_system_finalize()) - remove dependencies in opal on orte_system_info -- util/os_path.c and util/os_create_dirpath.c (they only used path_sep, anyway -- easily changed to #defines) This commit was SVN r7059.	2005-08-26 21:03:41 +00:00
Brian Barrett	f273d84b1b	* update ob1 to direct call * don't know what I was thinking, but can't use the MCA_PML_CALL macro on the two data values, as they don't have things that the macro can expand into This commit was SVN r6868.	2005-08-14 03:14:20 +00:00
Brian Barrett	95fd068ffa	remove hard coded constants for value of MPI_TAG_UB and the max CID and add the values to the PML structure. This will allow PMLs that want to do hardware matching at the cost of a smaller range of valid tags and cids. Updated all the places that used the MPI_TAG_UB_VALUE constant to instead look at the pml struct. This commit was SVN r6778.	2005-08-09 14:56:04 +00:00
Ralph Castain	19d58ee17e	First phase of the scalable RTE changes: 1. Modify the registry to eliminate redundant data copying for startup messages. 2. Revise the subscription/trigger system to avoid redundant storage of triggers and subscriptions. This dramatically reduces the search time when a registry action occurs - to illustrate the point, there are now only a handful of triggers on the system for each job. Before, there were a handful of triggers for each PROCESS in the job, all of which had to be checked every time something happened on the registry. This is much, much faster now. 3. Update all subscriptions to the new format. There are now "named" subscriptions - this allows you to "name" a subscription that all the processes will be using. The first one to hit the registry actually defines the subscription. From then on, any subsequent "subscribes" to the same name just cause that process to "attach" to the existing subscription. This keeps the number of subscriptions being tracked by the registry to a minimum, while ensuring that each process still gets notified. 4. Do the same for triggers. Also fixed a duplicate subscription problem that was causing people to receive data equal to the number of processes times the data they should have received from a trigger/subscription. Sorry about that... :-( ...but it's all better now! Uncovered a situation where the modex data seems to be getting entered on the registry a second time - the latter time coming after the compound command has been "fired", thereby causing all the subscriptions to fire. Asked Tim and Jeff to look into this. Second phase of the changes will involve modifying the xcast system so that the same message gets sent to all processes. This will further reduce the message traffic, and - once we have a true "broadcast" version of xcast - really speed things up and improve scalability. This commit was SVN r6542.	2005-07-18 18:49:00 +00:00
Brian Barrett	22dbdb2d66	* set universe size to be size of comm_world in initial setup. Means that all required attributes are set even if the trigger never fires (the trigger can update the value, if needed. This is all in MPI_Init, so that does not violate the standard) This commit was SVN r6427.	2005-07-12 02:34:00 +00:00
Brian Barrett	a13166b500	* rename ompi_output to opal_output This commit was SVN r6329.	2005-07-03 23:31:27 +00:00
Jeff Squyres	4ab17f019b	Rename src -> ompi This commit was SVN r6269.	2005-07-02 13:43:57 +00:00

37 commits
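
Commit 1359ba9b13 describes keeping error classes and error codes in a single list, where an entry whose code field equals MPI_UNDEFINED is a class and any other entry is a code. The following is only a minimal sketch of that idea, not the Open MPI implementation: the `sketch_*` names, the fixed-size table, and the value used for `SKETCH_UNDEFINED` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary stand-in for MPI_UNDEFINED; the real value comes from mpi.h. */
#define SKETCH_UNDEFINED (-32766)
#define SKETCH_MAX_ENTRIES 64

/* One entry type shared by error classes and error codes (hypothetical layout). */
typedef struct {
    int code;          /* SKETCH_UNDEFINED marks the entry as a class */
    int cls;           /* class this entry belongs to */
    const char *text;  /* string attached to the class or the code */
} sketch_err_t;

static sketch_err_t sketch_table[SKETCH_MAX_ENTRIES];
static int sketch_last_used = -1;  /* plays the role of MPI_LASTUSEDCODE */

/* Append a new class; returns its index, or -1 if the table is full. */
int sketch_add_class(const char *text)
{
    if (sketch_last_used + 1 >= SKETCH_MAX_ENTRIES) return -1;
    int idx = ++sketch_last_used;
    sketch_table[idx].code = SKETCH_UNDEFINED;
    sketch_table[idx].cls = idx;
    sketch_table[idx].text = text;
    return idx;
}

/* Append a new code under an existing class; returns its index, or -1 if full. */
int sketch_add_code(int cls, const char *text)
{
    assert(cls >= 0 && cls <= sketch_last_used);
    assert(sketch_table[cls].code == SKETCH_UNDEFINED);
    if (sketch_last_used + 1 >= SKETCH_MAX_ENTRIES) return -1;
    int idx = ++sketch_last_used;
    sketch_table[idx].code = idx;
    sketch_table[idx].cls = cls;
    sketch_table[idx].text = text;
    return idx;
}

/* An entry is a class exactly when its code field is the undefined marker. */
int sketch_is_class(int idx)
{
    return sketch_table[idx].code == SKETCH_UNDEFINED;
}

/* Class that an entry (class or code) belongs to. */
int sketch_class_of(int idx)
{
    return sketch_table[idx].cls;
}

/* Highest index handed out so far, for classes and codes alike. */
int sketch_last_used_value(void)
{
    return sketch_last_used;
}
```

Because classes and codes draw their indices from the same counter, adding either one advances the last-used value, which is the property the commit message attributes to MPI-2.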