openmpi

Author	SHA1	Message	Date
Ralph Castain	0a5d41857a	Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects. I found only two places that were looking at the tokens: 1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message. 2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was always the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes). This commit was SVN r12813.	2006-12-09 23:10:25 +00:00
Brian Barrett	98884e45e4	Clean up the way procs are added to the global process list after MPI_INIT: * Do not add new procs to the global list during modex callback or when sharing orte names during accept/connect. For modex, we cache the modex info for later, in case that proc ever does get added to the global proc list. For accept/connect orte name exchange between the roots, we only need the orte name, so no need to add a proc structure anyway. The procs will be added to the global process list during the proc exchange later in the wireup process * Rename proc_get_namebuf and proc_get_proclist to proc_pack and proc_unpack and extend them to include all information needed to build that proc struct on a remote node (which includes ORTE name, architecture, and hostname). Change unpack to call pml_add_procs for the entire list of new procs at once, rather than one at a time. * Remove ompi_proc_find_and_add from the public proc interface and make it a private function. This function would add a half-created proc to the global proc list, so making it harder to call is a good thing. This means that there's only two ways to add new procs into the global proc list at this time: During MPI_INIT via the call to ompi_proc_init, where my job is added to the list and via ompi_proc_unpack using a buffer from a packed proc list sent to us by someone else. Currently, this is enough to implement MPI semantics. We can extend the interface more if we like, but that may require HNP communication to get the remote proc information and I wanted to avoid that if at all possible. Refs trac:564 This commit was SVN r12798. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-07 19:56:54 +00:00
Brian Barrett	41a70a8f01	indent, this time with the right coding standards... This commit was SVN r12787.	2006-12-07 00:24:01 +00:00
Brian Barrett	f9ec8d6f2a	reindent file to make it easier to deal with... This commit was SVN r12786.	2006-12-07 00:21:25 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
George Bosilca	17405cd9c6	A temporary fix, until we figure out a better approach. The problem is that if one adds "pml=" to the configuration file, really bad things happen. All PMLs will get initialized, and each of them will initialize all BTLs. This patch forces the mca_pml_base_pml to get initialized in all cases before we go out of the mca_pml_base_open function. This commit was SVN r12527.	2006-11-10 04:53:00 +00:00
George Bosilca	eab1776e9a	Explicit casts for our friendly Windows environment... This commit was SVN r12496.	2006-11-08 17:02:46 +00:00
Jeff Squyres	e02114dcf3	Fixes trac:529. * Create a new request type: NOOP (described below) * For all MPI__INIT functions, OBJ_NEW an ompi_request_t and set its type to NOOP Ensure that the NOOP requests are OBJ_RELEASE'd when they are done * MPI_START looks at the request type; if NOOP, just return success. If not, call the PML start() function * MPI_STARTALL always pass the entire array of requests back to the PML (see next point) * Make the PMLs only process PML requests (i.e., ignore/skip anything that isn't of type PML -- such as the NOOP requests) * Add a little more param error checking in STARTALL This commit was SVN r12338. The following Trac tickets were found above: Ticket 529 --> https://svn.open-mpi.org/trac/ompi/ticket/529	2006-10-27 12:32:36 +00:00
George Bosilca	126a68dc9a	Big datatype commit. Remove all unused features of the datatype engine. As the memory allocation logic is completely done outside the data-type engine (in the PML) there is no need for any special case inside the data-type engine. There are fewer arguments for ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is not required anymore as there is no memory allocated in the engine itself). This change affects all components using datatypes. I tested most of them, but it might happen that I missed some ... If that's the case please let me know (don't shoot the pianist!!). This commit was SVN r12331.	2006-10-26 23:11:26 +00:00
Rainer Keller	47b24a0603	- Now the branch is done, linearize access regarding request handling. Buys a little bit on IMB, no functional change, otherwise. This commit was SVN r12165.	2006-10-18 16:11:50 +00:00
George Bosilca	6f5ec2390b	pedantic... This commit was SVN r12147.	2006-10-17 20:25:40 +00:00
George Bosilca	1c464d340c	Do not increase the reference count for the datatype if it is not required. Plus some typos. This commit was SVN r11728.	2006-09-20 20:14:15 +00:00
Ralph Castain	37dfdb76eb	Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done. This commit was SVN r11661.	2006-09-14 21:29:51 +00:00
George Bosilca	e33c35112b	Correct the conversion between int and bool. Apply it on all files except the one that will be modified by Ralph for the ORTE 2.0. The missing ones are in the rsh PLS. This commit was SVN r11476.	2006-08-28 18:59:16 +00:00
George Bosilca	3f0a7cad9e	The last patch for Windows support. Mostly casting and conversion to C++ friendly headers. This commit was SVN r11400.	2006-08-24 16:38:08 +00:00
George Bosilca	6fc16516cc	orte_std_cntr_t vs. size_t round 3. I am backing this one out as it wasn't supposed to be committed (and because it's wrong). This commit was SVN r11318.	2006-08-22 15:15:09 +00:00
George Bosilca	0417d27f46	orte_std_cntr_t vs. size_t round 2. Advantage for size_t ... This commit was SVN r11317.	2006-08-22 14:58:31 +00:00
Ralph Castain	5dfd54c778	With the branch to 1.2 made.... Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced). Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up). I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t). In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but... Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems. This commit was SVN r11204.	2006-08-15 19:54:10 +00:00
Brian Barrett	2897d2ef9b	* automagically select the "right" PML when direct-calling This commit was SVN r10818.	2006-07-14 21:33:26 +00:00
Galen Shipman	6ed255f114	Substantial changes to the CM PML, allows us to have a very thin request for all but buffered and persistent requests. Unfortunately we were not able to reuse the pml_base_request_t as it was just too heavy for our needs. Lots of code for 2/10 usec ;-) This commit was SVN r10810.	2006-07-14 19:32:26 +00:00
Galen Shipman	68ae99123d	fix bsend completion.. This commit was SVN r10709.	2006-07-10 22:27:32 +00:00
George Bosilca	476c9e64df	Don't keep multiple copies of the datatype and count. The only one we really need is the one provided by the user. For the buffered send the real datatype used for the communication is always MPI_BYTE and the count can be retrieved from the req_bytes_packed field. This will decrease the size of the request by one pointer and one size_t (8 bytes or 16 bytes depending on the architecture). This commit was SVN r10680.	2006-07-06 17:58:25 +00:00
Brian Barrett	26eee59032	* turns out that you should only call bsend_request_alloc or bsend_request_init, but not both. Otherwise, you don't free some buffer space and end up leaking buffers and ending in badness * since you only call alloc() or init(), but not both, need to restore reference counting in init() This commit was SVN r10674.	2006-07-06 14:02:51 +00:00
Brian Barrett	47725c9b02	* Add new PML (CM) and network drivers (MTL) for high speed interconnects that provide matching logic in the library. Currently includes support for MX and some support for Portals * Fix overuse of proc_pml pointer on the ompi_proc structure, splitting into proc_pml for pml data and proc_bml for the BML endpoint data * bug fixes in bsend init code, which wasn't being used by the OB1 or DR PMLs... This commit was SVN r10642.	2006-07-04 01:20:20 +00:00
George Bosilca	e63c1dc242	The last commit wasn't supposed to bring this function in. It's not yet ready for primetime... This commit was SVN r9840.	2006-05-07 20:51:43 +00:00
George Bosilca	33aa65f894	Remove useless include. This commit was SVN r9839.	2006-05-07 20:49:45 +00:00
Tim Woodall	02d991532f	interface to post a callback for notification of change to modex data This commit was SVN r9753.	2006-04-27 16:15:35 +00:00
George Bosilca	1226d452bf	Add a base _START macro that will do the base initialization. Additionally, that allows me to add the PERUSE events in a more homogeneous manner (all PMLs will have them). This commit was SVN r9499.	2006-03-31 17:05:09 +00:00
Tim Woodall	bd870519fd	- modified convertor copy_and_prepare routines to accept an addition flag, new flags to be included when convertor is initialized - modified pml/btl module defs and added stub functions for diagnostic output routines to dump state of queues / endpoints - updates to data reliability pml This commit was SVN r9329.	2006-03-17 18:46:48 +00:00
George Bosilca	612570134f	The request management framework has been redesigned. The main idea is to let the PML (or io, more generally the low level request manager) have its own release function (what was before the req_fini). This function will only be called from the low level while the req_free will be called from the upper level (MPI layer) in order to mark the request as not used by the user anymore. From the request point of view the requests will be marked as inactive every time we read their status (true for persistent as well). As MPI_REQUEST_NULL is already marked as inactive, the test and wait functions are simpler. The drawback is that now we have to change in the ompi_request_{test|wait} the req_status of the request once we get its status. This commit was SVN r9290.	2006-03-15 22:53:41 +00:00
Brian Barrett	57b9c22adf	* fix for last ptl fix... have to actually return a value... This commit was SVN r9129.	2006-02-23 05:24:58 +00:00
Brian Barrett	2eb76ff0cd	* finish the TEG/UNIQ/PTL removal This commit was SVN r9118.	2006-02-23 00:39:01 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
Ralph Castain	4b9f015c0b	Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list. This commit was SVN r8912.	2006-02-07 03:32:36 +00:00
George Bosilca	eb1d2dd290	Working down the latency (0.2 micro-sec on a Xeon 2Ghz) by removing the second instance of the ompi_proc from the send and receive request. This information is already available on the base request, so there is no need for duplication. The drawback is that now (in order to avoid a second lookup in the communicator array of procs) we have to set the base proc in the PML's _ALLOC macro. This commit was SVN r8900.	2006-02-05 06:13:07 +00:00
George Bosilca	58c9c82dab	Add a macro to mark the MPI request as completed (MCA_PML_BASE_REQUEST_MPI_COMPLETE) and broadcast the request condition if required. This macro should be called with the request's mutex locked. This commit was SVN r8811.	2006-01-25 23:15:36 +00:00
Brian Barrett	8faa1884f0	* The last of the build system optimizations. Combine the component and component/base Makefile.am files, reducing the time configure spends stamping out Makefiles at the end * Install base_impl.h file when devel-headers are being installed This commit was SVN r8200.	2005-11-20 01:03:01 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
George Bosilca	d916e0c5b4	The (I hope) final solution for the convertor problem. As all the PMLs inherit the base send and receive request from the pml_base, we can solve our problem if we construct the convertor attached to any request in the pml_base_construct function. At the end of the lifetime of each request (here lifetime is related to one utilisation, without taking into account the cache) we release all information attached to the convertors in the _FINI macro by calling the ompi_convertor_cleanup. This commit was SVN r7910.	2005-10-28 03:26:36 +00:00
George Bosilca	75bc3dd43c	Don't mess around with the OBJ_DESTRUCT on the convertor. It's quicker (and safer) to directly call the convertor cleanup function (ompi_convertor_cleanup). This commit was SVN r7814.	2005-10-19 21:28:52 +00:00
George Bosilca	1d75b7972f	Solve the problem with the reference count on the datatype (RT bug 1492). The problem is that the convertor (when prepared) increases the reference count on the used datatype. This reference count will be released only when the OBJ_DESTRUCT is called on a convertor. However, having to call OBJ_CONSTRUCT and OBJ_DESTRUCT on each request every time we want to use it (even when it comes from the cache) is an expensive operation. This can be avoided if the OBJ_DESTRUCT leaves the convertor in exactly the same state as OBJ_CONSTRUCT. With this approach we just have to call OBJ_CONSTRUCT for each convertor once when we initially create the request. This commit was SVN r7813.	2005-10-19 20:57:39 +00:00
Brian Barrett	7b20370306	* pretty-print an error message if a btl component loads but can't find any NICs to use * Make mvapi, gm, and mx components all publish information, even if there are no NICs available so that modex_recv doesn't hang. If there are no NICs available, don't set the reachable bit, but don't do anything to fail. This unfortunately doesn't cover the hangs that will result if different procs load different sets of components, but it's a start This commit was SVN r7550.	2005-09-30 04:39:44 +00:00
Tim Woodall	6ae2ae4d1a	- code cleanup - correct mpi semantics This commit was SVN r7390.	2005-09-15 18:47:59 +00:00
Tim Woodall	c25fb5dab0	- fixed issue w/ btl send-in-place option that was affecting tcp - reduced size of match header by an additional 4 bytes to 16 bytes - corrections for buffered send (work in progress) This commit was SVN r7371.	2005-09-14 17:08:08 +00:00
Jeff Squyres	5456d3444f	- Add missing header files - Use new #include file scheme This commit was SVN r7367.	2005-09-14 09:37:20 +00:00
Tim Woodall	bacc1b9122	hack to request all existing values for jobids other than our own - required for mpi2 dynamic processes This commit was SVN r7336.	2005-09-13 03:53:53 +00:00
George Bosilca	c9fb1f32f2	And more dependencies fixes. The big commit will follow shortly. This commit was SVN r7319.	2005-09-12 20:22:59 +00:00
Brian Barrett	ed56e743b7	* update configure.ac to use the modern version of AC_INIT and AM_INIT_AUTOMAKE, instead of the deprecated version. * Work around dumbness in modern AC_INIT that requires the version number to be set at autoconf time (instead of at configure time, as it was before). Set the version number, minus the subversion r number, at autoconf time. Override the internal variables to include the r number (if needed) at configure time. Basically, the right thing should always happen. The only place it might not is the version reported as part of configure --help will not have an r number. * Since AM_INIT_AUTOMAKE takes a list of options, no need to specify them in all the Makefile.am files. * Adds support for subdir-objects, meaning that object files are put in the directory containing source files, even if the Makefile.am is in another directory. This should start making it feasible to reduce the number of Makefile.am files we have in the tree, which will greatly reduce the time to run autogen and configure. This commit was SVN r7211.	2005-09-07 05:54:53 +00:00
Ralph Castain	03e45e6723	Two quick additions: 1. Added OMPI_PROC_ARCH as a defined registry key and added the code so that the architecture info gets properly transmitted across all processes using the startup message. 2. Added an OMPI_MODEX_KEY definition and removed the hard-coded "modex" key from pml_modex_exchange This commit was SVN r7129.	2005-09-01 15:05:03 +00:00
Jeff Squyres	3962c53e2e	- Add to AM_CPPFLAGS $(OPAL_LTDL_CPPFLAGS) where necessary in order to add a -I to find the included ltdl.h (vs. a system-installed ltdl.h) - Clean up kruft in a bunch of Makefile.am's to remove now-unnecessary AM_CPPFLAGS settings to get static-components.h for each framework - Move the component_repository API functions out of opal/mca/base/base.h and into opal/mca/base/mca_base_component_repository.h in order to decrease unnecessary dependencies (e.g., before this, almost everything in the tree depended on ltdl.h, which is unnecessary -- only a small number of files really need ltdl.h) This commit was SVN r7127.	2005-09-01 12:16:36 +00:00
Ralph Castain	96f4bb7a63	Hey, sports fans!! Guess what?? Here's the huge registry check-in you've all been waiting for with bated breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears. In addition, the commit contains the following: 1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe 2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well. In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first). This commit was SVN r7118.	2005-09-01 01:07:30 +00:00
Tim Woodall	e4fd117f5f	This commit was SVN r6986.	2005-08-23 21:30:42 +00:00
Jeff Squyres	cf16a521c8	Ensure to get ompi/include/constants.h This commit was SVN r6845.	2005-08-12 21:42:07 +00:00
George Bosilca	8b93cb7661	Rename all the functions starting with mca_base_modex to mca_pml_base_modex. Change all the places where they are used to fit the new name. Remove the code to check the remote arch from the PML. We will have a GPR mechanism in ompi_mpi_initialize to do that. This commit was SVN r6750.	2005-08-05 18:03:30 +00:00
Ralph Castain	19d58ee17e	First phase of the scalable RTE changes: 1. Modify the registry to eliminate redundant data copying for startup messages. 2. Revise the subscription/trigger system to avoid redundant storage of triggers and subscriptions. This dramatically reduces the search time when a registry action occurs - to illustrate the point, there are now only a handful of triggers on the system for each job. Before, there were a handful of triggers for each PROCESS in the job, all of which had to be checked every time something happened on the registry. This is much, much faster now. 3. Update all subscriptions to the new format. There are now "named" subscriptions - this allows you to "name" a subscription that all the processes will be using. The first one to hit the registry actually defines the subscription. From then on, any subsequent "subscribes" to the same name just cause that process to "attach" to the existing subscription. This keeps the number of subscriptions being tracked by the registry to a minimum, while ensuring that each process still gets notified. 4. Do the same for triggers. Also fixed a duplicate subscription problem that was causing people to receive data equal to the number of processes times the data they should have received from a trigger/subscription. Sorry about that... :-( ...but it's all better now! Uncovered a situation where the modex data seems to be getting entered on the registry a second time - the latter time coming after the compound command has been "fired", thereby causing all the subscriptions to fire. Asked Tim and Jeff to look into this. Second phase of the changes will involve modifying the xcast system so that the same message gets sent to all processes. This will further reduce the message traffic, and - once we have a true "broadcast" version of xcast - really speed things up and improve scalability. This commit was SVN r6542.	2005-07-18 18:49:00 +00:00
George Bosilca	10a8e46f99	If I want the default values then I have to pick them up from the req_base !!! This commit was SVN r6504.	2005-07-14 22:06:27 +00:00
George Bosilca	a7adea8b8f	As at the end of the start function for the bsend request we replace the default convertor by one where the data is already packed, we have to recreate the default one in the case we reuse the initial request. This commit was SVN r6503.	2005-07-14 22:03:58 +00:00
George Bosilca	0b0c4c17a5	If the user explicitly specifies a PML then print out its name in the case where the PML was unable to initialize correctly. This commit was SVN r6445.	2005-07-12 19:30:51 +00:00
Josh Hursey	048d5c1415	Added some userlevel error checking, and messaging. This commit was SVN r6440.	2005-07-12 18:06:31 +00:00
George Bosilca	6aa956241f	Solve the issues when several PMLs are available. The main problem here comes from the fact that a PML is a lot more complex than a PTL, and it can adapt its behavior to the level of threading required by the user. In this case the behavior is the priority of the PML. Therefore this information is never available before the init function (of the PML) is called. So I tried to keep nearly the same structure as it was before, with one change. When a PML gets initialized it does not necessarily mean it has been selected, so it does not mean it has to create all its internal structures (and select the PTL and all this stuff). That can all be done later, when a PML knows that it definitely got selected (when the enable function is called with the argument set to true). Thus, in the case of a PML close one has to check if the PML has been selected or not before trying to clean up the internals. I had to change the MPI_Init function to allow the PML to be enabled before we start adding procs inside. This commit was SVN r6434.	2005-07-12 05:40:56 +00:00
Brian Barrett	6c9cba5d55	* protect pointer assignment, as registration will be NULL if special mpools aren't used This commit was SVN r6426.	2005-07-12 02:00:42 +00:00
Brian Barrett	0ae16f2ab7	* add local hook to remove static-components.h in distclean target. The files are generated by configure, and not part of the tarball, so distclean would be the right place to remove them. This commit was SVN r6390.	2005-07-08 13:54:12 +00:00
Jeff Squyres	6a9c9953bc	Remove a bunch of -I's that are no longer necessary with properly-prefixed static-component.h files. This commit was SVN r6342.	2005-07-04 18:24:58 +00:00
Brian Barrett	a13166b500	* rename ompi_output to opal_output This commit was SVN r6329.	2005-07-03 23:31:27 +00:00
Brian Barrett	39dbeeedfb	* rename locking code from ompi to opal This commit was SVN r6327.	2005-07-03 22:45:48 +00:00
Brian Barrett	ccd2624e3f	* rename ompi_progress to opal_progress This commit was SVN r6326.	2005-07-03 21:57:43 +00:00
Brian Barrett	9f0c969bb4	* rename ompi_hash_table opal_hash_table This commit was SVN r6324.	2005-07-03 16:52:32 +00:00
Brian Barrett	761402f95f	* rename ompi_list to opal_list This commit was SVN r6322.	2005-07-03 16:22:16 +00:00
Brian Barrett	499e4de1e7	* rename ompi_object and ompi_class to opal_object and opal_class This commit was SVN r6321.	2005-07-03 16:06:07 +00:00
Brian Barrett	8cad33db40	* finish modex move * fix protection in opal_free_list.h * Fix some makefiles This commit was SVN r6311.	2005-07-03 00:52:18 +00:00
Jeff Squyres	1b6326f76d	Move module_exchange to pml/base This commit was SVN r6305.	2005-07-02 16:12:04 +00:00
Jeff Squyres	aa056f7bfd	First cut of OMPI Makefile.am's, plus a few more catchup updates in orte This commit was SVN r6286.	2005-07-02 15:06:47 +00:00
Jeff Squyres	4ab17f019b	Rename src -> ompi This commit was SVN r6269.	2005-07-02 13:43:57 +00:00

173 commits