openmpi

Author	SHA1	Message	Date
Rainer Keller	c175801f98	- Initialize in the order of mca_pml_ob1_comm_proc_t... This commit was SVN r15946.	2007-08-23 05:56:22 +00:00
Rainer Keller	b0df55d53b	- For MPI_Probe/MPI_Iprobe, we should not have a PERUSE_COMM_REQ_ACTIVATE event. Therefore move the PERUSE_TRACE_COMM_EVENT for this event from MCA_PML_BASE_SEND_REQUEST_INIT / MCA_PML_BASE_RECV_REQUEST_INIT to the proper places into pml_ob1_isend.c / pml_ob1_irecv.c right after the MCA_PML_OB1_SEND_REQUEST_INIT / MCA_PML_OB1_RECV_REQUEST_INIT. This commit was SVN r15945.	2007-08-23 05:52:33 +00:00
Gleb Natapov	5596aa5f53	The sizes of mca_pml_ob1_send_request_t and mca_pml_ob1_recv_request_t depend on a parameter and are determined in runtime. r15346 removed calculation of correct sizes for this structures. This patch adds it back and fixes trac:1116, #1114. This commit was SVN r15932. The following SVN revision numbers were found above: r15346 --> open-mpi/ompi@433f8a7694 The following Trac tickets were found above: Ticket 1116 --> https://svn.open-mpi.org/trac/ompi/ticket/1116	2007-08-20 12:06:27 +00:00
George Bosilca	c7e0ab93ae	Don't forget to include string.h for the strcmp function. This commit was SVN r15927.	2007-08-19 19:59:15 +00:00
Josh Hursey	729c63cf9d	Fix invalid MCA 'base' names so they appear in ompi_info. A subset of this patch needs to be applied to v1.2 Refs trac:928 This commit was SVN r15918. The following Trac tickets were found above: Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928	2007-08-18 03:05:45 +00:00
Aurelien Bouteiller	3a83c61c40	Fixed a bug with available space in sender based. This commit was SVN r15889.	2007-08-16 17:54:26 +00:00
Aurelien Bouteiller	77565d60d9	Heavy modification of the pml_v framework. * Code cleanup and rationalization * Fixed: mca_pml_base_send/recv_request are now allocated before recreation by the PML-V * Fixed: pointer arithmetic bug in sender based that crashed * Changed: directory structure. This is one step forward using autogen.sh to build static-components.h (it needs to have the directory structure of a mca framework for this). This commit was SVN r15878.	2007-08-16 05:52:30 +00:00
Aurelien Bouteiller	ee708d702d	Slight modification to register the name of the selected pml (from the pml framework) instead of the generic mca name. This might be a different name when enabling FT features. This name modification in the modex allows the PMLS to detect a FT protocol mismatch among hosts. This commit was SVN r15877.	2007-08-16 05:46:11 +00:00
Aurelien Bouteiller	fa7f6f6722	Improved error detection of request types This commit was SVN r15857.	2007-08-14 17:24:46 +00:00
Aurelien Bouteiller	67399e7c31	Added a debug type checking for request types (to make sure request size is correctly computed). This commit was SVN r15856.	2007-08-14 17:18:15 +00:00
Aurelien Bouteiller	1d97c183e7	Better argument checking for output function and added a routine for error printing. This commit was SVN r15855.	2007-08-14 17:17:12 +00:00
Aurelien Bouteiller	ca69915b1e	Code cleanup This commit was SVN r15783.	2007-08-06 22:20:44 +00:00
Mohamad Chaarawi	59a7bf8a9f	Merging in the Sparse Groups.. This commit includes config changes.. This commit was SVN r15764.	2007-08-04 00:41:26 +00:00
George Bosilca	e41ee17ca5	Add a small comment that hopefully will enforce the correct ordering of the fields between CM and the other PML in the requests structure. This commit was SVN r15760.	2007-08-03 23:59:29 +00:00
Aurelien Bouteiller	1d160ca583	Needed change for vampir pml to work This commit was SVN r15750.	2007-08-03 02:23:24 +00:00
Gleb Natapov	627d9bc8ed	Delay freeing of a send request if scheduling function is running by other thread. This commit was SVN r15722.	2007-08-01 12:19:16 +00:00
Aurelien Bouteiller	a403fed18a	More checks (assert) on the output system so that a malformed format string does not crash the application at a later random time. Changed various debug messages to retain the most useful messages. This commit was SVN r15715.	2007-07-31 19:33:39 +00:00
Aurelien Bouteiller	cec9ce8106	Fixed: various warnings with printf(%x, uint64_t) on 32 bit architectures + some left (long) cast for size_t printf. This commit was SVN r15706.	2007-07-31 17:12:21 +00:00
Aurelien Bouteiller	a5d0e53bb3	Moved replay macros to functions. The performance improvement in process recovery is not worth the debugging hassle. This commit was SVN r15703.	2007-07-31 16:01:32 +00:00
Aurelien Bouteiller	5a792a3fad	(hopefully) fixed various pedantic warning about casts on 32bit machines. Not tried only have 64bits available. This commit was SVN r15702.	2007-07-31 15:58:19 +00:00
Aurelien Bouteiller	3559fd5d1a	Fixed issues with "verbose" output being too silent. This commit was SVN r15691.	2007-07-30 19:11:15 +00:00
Gleb Natapov	afac5eb93f	Guard recv request with lock against simultaneous access from different threads. This commit was SVN r15681.	2007-07-30 12:50:38 +00:00
Gleb Natapov	21dd061696	Init req_send_range_lock. Found by Terry Dontje. This commit was SVN r15677.	2007-07-30 08:21:52 +00:00
Aurelien Bouteiller	17e10ff918	Modified the output system to comply with a wider range of compilers. Jelena: this should solve the issue you faced today. This commit was SVN r15668.	2007-07-27 23:11:00 +00:00
Aurelien Bouteiller	e07b95bdd5	Fixed: warnings with printf(%d, size_t) Fixed: All copyrights are now correct up to 2007 Fixed: Build system now works with VPATHs Changed: protocol_example is now ignored by default This commit was SVN r15627.	2007-07-25 22:28:04 +00:00
Galen Shipman	f6a20715b7	minor nit.. This commit was SVN r15619.	2007-07-25 17:34:37 +00:00
George Bosilca	873bd41796	More fixes for the Windows support. This commit was SVN r15602.	2007-07-25 04:22:21 +00:00
George Bosilca	10175c3014	No more warnings in the PML V. This commit was SVN r15601.	2007-07-25 04:19:58 +00:00
George Bosilca	c6d2e03cdd	Correct the prototype for non GNU compilers. This commit was SVN r15598.	2007-07-25 03:50:35 +00:00
Aurelien Bouteiller	16da13c79e	Missing file... This commit was SVN r15540.	2007-07-20 22:24:02 +00:00
Aurelien Bouteiller	70bb44d7a9	Moving the Message Log framework to the trunk. Protocol example (simple showcase) and sender based are provided for now. Ignored by default except for utk folks. This commit was SVN r15539.	2007-07-20 21:36:11 +00:00
Brian Barrett	5b9fa7e998	reapply r15517 and r15520, which were removed in r15527 so that I could get the RML/OOB merge in slightly easier This commit was SVN r15530. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95 r15520 --> open-mpi/ompi@9cbc9df1b8 r15527 --> open-mpi/ompi@2d17dd9516	2007-07-20 02:34:29 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00
Brian Barrett	2d17dd9516	temporarily back out r15517 and r15520 so that I can get the RML / OOB changes to cleanly apply This commit was SVN r15527. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95	2007-07-20 01:10:34 +00:00
Ralph Castain	41977fcc95	Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but... Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point. Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings. This commit was SVN r15517.	2007-07-19 20:56:46 +00:00
Rich Graham	f2a30cde5d	add table of send completion callback functions, on a per send-type basis. This commit was SVN r15471.	2007-07-17 21:26:56 +00:00
Rich Graham	0991c3d5f5	move buffered send component clean up out of the pml to ompi_mpi_finalize. This commit was SVN r15463.	2007-07-17 14:50:52 +00:00
Rich Graham	1a4ce2a961	move setting of the component used to managed buffer sends out of the pmls, and into ompi_mpi_init. This is the first of several steps to pull buffered send management out of the pmls. This commit was SVN r15451.	2007-07-16 21:52:25 +00:00
George Bosilca	1e825888a5	Fix the problem reported on #1087. The global send and receive request queues are now released in the base close, so there is no need for the cm PML to destroy them. This commit was SVN r15425.	2007-07-13 23:56:09 +00:00
George Bosilca	8643f38adf	Don't allow the BTL to be closed before the end of the process. Count the number of times the BTLs are opened, and then don't remove them until close was called the same number of times. This commit was SVN r15376.	2007-07-11 22:21:04 +00:00
George Bosilca	9ed3ede73e	Correct the thin and heavy requests management for the CM PML. This commit was SVN r15361.	2007-07-11 15:10:01 +00:00
George Bosilca	ef7d17d814	Fix a copy&paste typo. This commit was SVN r15360.	2007-07-11 15:09:06 +00:00
George Bosilca	9b501eb66d	Looks like MAX is not a standard macro. Anyway, that the heavy requests is larger than the thin seems to be a "correct" assumption. This commit was SVN r15348.	2007-07-11 00:04:33 +00:00
George Bosilca	e19777e910	A more consistent version. As we now share the send and receive queue, we have to construct/destruct only once. Therefore, the construction will happen before digging for a PML, while the destruction happens just before finalizing the component. Add some OPAL_LIKELY/OPAL_UNLIKELY. This commit was SVN r15347.	2007-07-10 23:45:23 +00:00
George Bosilca	433f8a7694	This patch bring full support for message queues in Open MPI. Now the send and receive queues are shared among all PMLs, they are declared in the base PML, and the selected PML is in charge of initializing and releasing them. The CM PML is slightly different compared with OB1 or DR. Internally it use 2 different types of requests: light and heavy. However, now with this patch both types of requests are stored in the same queue, and cast appropriately on the allocation macro. This means we might use less memory than we allocate, but in exchange we got full support for most of the parallel debuggers. Another thing with this patch, is that now for all PML (CM included) the basic PML requests start with the same fields, and they are declared in the same order in the request structure. Moreover, the fields have been moved in such a way that only one volatile/atomic will exist per line of cache (hopefully). This commit was SVN r15346.	2007-07-10 22:16:38 +00:00
Brian Barrett	8b9e8054fd	Move modex from pml base to general ompi runtime, since it's used by more than just the PML/BTLs these days. Also clean up the code so that it handles the situation where not all nodes register information for a given node (rather than just spinning until that node sends information, like we do today). Includes r15234 and r15265 from the /tmp/bwb-modex branch. This commit was SVN r15310. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15234 r15265	2007-07-09 17:16:34 +00:00
Tim Prins	f3ac4ac20e	Fix order of function arguments This commit was SVN r15304.	2007-07-08 16:37:51 +00:00
Rainer Keller	cff1b6a71b	- PERUSE_COMM_REQ_XFER_BEGIN should be emitted for the first fragment of a larger message as well. This commit was SVN r15299.	2007-07-06 15:02:36 +00:00
George Bosilca	951e4929b9	Usually it's unlikely to have additional fragments. This commit was SVN r15253.	2007-07-01 16:19:53 +00:00
George Bosilca	c435094639	Only trigger the PERUSE_COMM_REQ_XFER_BEGIN event on the initial fragment. This commit was SVN r15252.	2007-07-01 16:19:13 +00:00
George Bosilca	60319f99ac	Make sure in case of error what we return is clean (set to NULL). This commit was SVN r15251.	2007-07-01 16:17:43 +00:00
George Bosilca	11656e20aa	Remove few warnings. This commit was SVN r15250.	2007-07-01 16:16:05 +00:00
Gleb Natapov	77e54ebc7e	Schedule RDMA op on the last BTL that got completion. This commit was SVN r15249.	2007-07-01 11:35:55 +00:00
Gleb Natapov	54b40aef91	Schedule SEND traffic of pipeline protocol between BTLs in accordance with relative bandwidths of each BTL. Precalculate what part of a message should be send via each BTL in advance instead of doing it during scheduling. This commit was SVN r15248.	2007-07-01 11:34:23 +00:00
Gleb Natapov	e74aa6b295	Schedule RDMA traffic between BTLs in accordance with relative bandwidths of each BTL. Precalculate what part of a message should be send via each BTL in advance instead of doing it during scheduling. This commit was SVN r15247.	2007-07-01 11:31:26 +00:00
Gleb Natapov	1c7141df4d	Remove unused struct. This commit was SVN r15228.	2007-06-28 11:58:16 +00:00
Gleb Natapov	b88b7dedfe	Rename btl_rdma_offset to btl_pipeline_send_length. This commit was SVN r15153.	2007-06-21 07:12:40 +00:00
Josh Hursey	7fd1805e97	Fix a couple of compile warnings that Tim P brought to my attention. This commit was SVN r15132.	2007-06-19 00:46:16 +00:00
George Bosilca	10a017d1bf	For a obscure reason this have to be defined on Windows. The obscure reason it's that we don't have the nice configure stuff, so detecting when to enable the CR PML it's kind of hard. Keep it defined and at least it compile smoothly. This commit was SVN r15116.	2007-06-17 05:01:09 +00:00
Rainer Keller	ca09aae2cc	- Get PERUSE compile again with latest RDMA changes in r14768/r14842. This commit was SVN r15042. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07 r14842 --> open-mpi/ompi@10266fb467	2007-06-13 12:47:47 +00:00
Brian Barrett	84d1512fba	Add the potential for doing some basic error checking on mutexes during single threaded builds. In its default configuration, all this does is ensure that there's at least a good chance of threads building based on non-threaded development (since the variable names will be checked). There is also code to make sure that a "mutex" is never "double locked" when using the conditional macro mutex operations. This is off by default because there are a number of places in both ORTE and OMPI where this alarm spews mega bytes of errors on a simple test. So we have some work to do on our path towards thread support. Also removed the macro versions of the non-conditional thread locks, as the only places they were used, the author of the code intended to use the conditional thread locks. So now you have upper-case macros for conditional thread locks and lowercase functions for non-conditional locks. Simple, right? :). This commit was SVN r15011.	2007-06-12 16:25:26 +00:00
Gleb Natapov	423f404c34	Shut up compiler warning. Ugly, but I can't see a better way except changing the converter to use uint64_t(ssize_t?) for offset. This commit was SVN r14950.	2007-06-07 11:33:28 +00:00
Gleb Natapov	9f9b64db4e	Revert r14947 as this doesn't solve the problem. This commit was SVN r14949. The following SVN revision numbers were found above: r14947 --> open-mpi/ompi@5b9fe28e3f	2007-06-07 11:24:24 +00:00
Gleb Natapov	5b9fe28e3f	Fix warning on 32bit systems. This commit was SVN r14947.	2007-06-07 08:57:34 +00:00
Rich Graham	e276f7bcc7	undo my error. This commit was SVN r14890.	2007-06-05 23:32:47 +00:00
Rich Graham	ce0e9ac329	initialize lock properly. This commit was SVN r14881.	2007-06-05 20:34:11 +00:00
Shiqing Fan	c142c23f88	Initialize req_ompi.req_status._count to be 0 before starting the request. This commit was SVN r14861.	2007-06-05 09:50:06 +00:00
Shiqing Fan	0961669912	Spaces after backslash are removed. This commit was SVN r14844.	2007-06-04 10:10:24 +00:00
Shiqing Fan	7bf18a4fd5	MPI_SOURCE should be initialized. This commit was SVN r14843.	2007-06-04 09:37:21 +00:00
Gleb Natapov	10266fb467	Fix deadlock in OB1 protocol by sending memory by copying if registration fails. This commit was SVN r14842.	2007-06-03 08:31:58 +00:00
Gleb Natapov	a25e1e7b15	Implement new function mca_pml_ob1_send_requst_copy_in_out(req, offset, len) that allows sending any range of a request by send/recv instead of RDMA, and use it to send data from the end of a request in pipeline protocol. This commit was SVN r14841.	2007-06-03 08:30:07 +00:00
Gleb Natapov	06bf5d74e7	Remove mca_pml_ob1_send_fin_btl function. This commit was SVN r14784.	2007-05-28 06:51:12 +00:00
Gleb Natapov	f5078db0db	Fix order of parameters to function. This commit was SVN r14783.	2007-05-27 13:45:24 +00:00
Gleb Natapov	ad69d3c6ac	Fix out of resource handling for FIN packets broken by r14768. This commit was SVN r14780. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 08:29:38 +00:00
Galen Shipman	3401bd2b07	Add optional ordering to the BTL interface. This is required to tighten up the BTL semantics. Ordering is not guaranteed, but, if the BTL returns a order tag in a descriptor (other than MCA_BTL_NO_ORDER) then we may request another descriptor that will obey ordering w.r.t. to the other descriptor. This will allow sane behavior for RDMA networks, where local completion of an RDMA operation on the active side does not imply remote completion on the passive side. If we send a FIN message after local completion and the FIN is not ordered w.r.t. the RDMA operation then badness may occur as the passive side may now try to deregister the memory and the RDMA operation may still be pending on the passive side. Note that this has no impact on networks that don't suffer from this limitation as the ORDER tag can simply always be specified as MCA_BTL_NO_ORDER. This commit was SVN r14768.	2007-05-24 19:51:26 +00:00
Ralph Castain	4fff584a68	Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that did start. The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system. Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed. Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief. With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn. Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put". This commit was SVN r14711.	2007-05-21 18:31:28 +00:00
Gleb Natapov	3ebaff8dfe	Implement new BTL parameters: We eagerly send data up to btl__eager_limit with the match Upon ACK of the MATCH we start using send/receives of size btl__max_send_size up to the btl__rdma_pipeline_offset After the btl__rdma_pipeline_offset we begin using RDMA writes of size btl__rdma_pipeline_frag_size. Now, on a per message basis we only use the above protocol if the message is larger than btl__min_rdma_pipeline_size btl__eager_limit - > same btl__max_send_size -> same btl__rdma_pipeline_offset -> btl__min_rdma_size btl__rdma_pipeline_frag_size -> btl__max_rdma_size btl_*_min_rdma_pipeline_size is new.. This patch also moves all BTL common parameters initialisation into btl_base_mca.c file. This commit was SVN r14681.	2007-05-17 07:54:27 +00:00
Sven Stork	22af6d38e6	- UNexport symbols that shouldn't be needed outside the libraries - replace #if/#endif with BEGIN/END_C_DECLS - reformating This commit was SVN r14669.	2007-05-16 15:46:52 +00:00
Gleb Natapov	2562253678	Do more work at RDMA frag preparation time and less work at RDMA frag sending time. This commit was SVN r14627.	2007-05-09 12:11:51 +00:00
Gleb Natapov	78fda79630	Use size_t instead of uint64_t in call to convertor cloning. This commit was SVN r14626.	2007-05-09 10:02:06 +00:00
Sven Stork	a04c8eb39a	- Bring over the visibility feature, for a finer symbol export control via the visibility feature that is provided by some compilers. Per default this feature is disabled, to enable it you need to configure with --enable-visibility and obviously you need a compiler with visibility support. Please refer to the wiki for more information. https://svn.open-mpi.org/trac/ompi/wiki/Visibility This commit was SVN r14582.	2007-05-04 09:03:37 +00:00
Gleb Natapov	8029893489	In multithreaded application sending of initial portion of a request may overlap with RDMAing the rest of it. Also more than one RDMA writes can be performed simultaneously by different threads. To make this code thread safe this patch clones original request convertor for each RDMA fragment. This commit was SVN r14574.	2007-05-03 09:13:17 +00:00
George Bosilca	bb481273a6	Typos. This commit was SVN r14546.	2007-04-28 19:15:53 +00:00
Rainer Keller	6f9251ed39	- Small fixes by PGI -Minform=inform This commit was SVN r14524.	2007-04-26 08:16:07 +00:00
Josh Hursey	8c2385416f	Per a developer request - Make sure that the wrapper selection is compiled out if not enabling FT. Before the logic would skip over it since the conditional if statements would not be satisfied, now there are no additional if statements when compiled out. With this modification the selection logic looks nearly identical to pre-r14051 with the exception of the non-FT related improvements. This commit was SVN r14491. The following SVN revision numbers were found above: r14051 --> open-mpi/ompi@dadca7da88	2007-04-24 17:08:48 +00:00
Ralph Castain	18b2dca51c	Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd really have to try). This also involved a slight change to the oob.xcast API, so propagated that as required. Note: this has only been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-) Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately. This commit was SVN r14475.	2007-04-23 18:41:04 +00:00
Adrian Knoth	339dbf6cd5	Cosmetics. Enforcing style guide. This commit was SVN r14459.	2007-04-21 21:47:25 +00:00
Josh Hursey	4159b72a60	Some minor updates to go along with commit r14457 This commit was SVN r14458. The following SVN revision numbers were found above: r14457 --> open-mpi/ompi@2af38229c1	2007-04-21 21:24:44 +00:00
Josh Hursey	eef364546c	Check for NULL before trying to use the variable. This commit was SVN r14444.	2007-04-20 17:17:11 +00:00
Josh Hursey	12e5d0e817	ft_event Commit: - Move the PML Modex stuff out of the BML -- Abstraction violation. - Also fix the location of the add_procs with respect to the stage gates. This commit was SVN r14422.	2007-04-19 03:05:12 +00:00
George Bosilca	51fc2474f1	Don't keep the data attached to a fragment segmented when we have to move it into the unexpected queue. Instead pack the data in only one buffer. Now the code look more optimized and clear, but I have a doubt about who's using this functionality. I think that all BTLs always return only one memory segment attached to the matching fragment (i.e. there is no unexpected iov type receive). This commit was SVN r14416.	2007-04-18 15:52:11 +00:00
Adrian Knoth	e3178fd39f	Cosmetics. PTLs are now called BTLs. This commit was SVN r14382.	2007-04-16 10:12:27 +00:00
Josh Hursey	8f119d9063	Closes trac:977 Fix for memory corruption in the restarted process stack. This stemmed from the brute force method we were previously using. This commit fixes this by using a lighter weight solution focused in the r2 BML instead of above the PML. This is a more efficient and flexible solution, and it solves the original problem. In the process I pulled out the ft_event function in the tcp BTL and r2 BML into a set of *_ft.[c|h] files just to keep any updates to these code paths as isolated as possible to make merging easier on everyone. This commit was SVN r14371. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855 The following Trac tickets were found above: Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977	2007-04-14 02:06:05 +00:00
Jeff Squyres	51f286d737	Just like r14289 on the ORTE trunk: Per discussions with Brian and Ralph, make a slight correction in where components are installed. Use $pkglibdir, not $libdir/openmpi, so that when compiled in the orte trunk, components are installed to the right directory (because the component search patch is checking $pkglibdir). This commit was SVN r14345. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r14289	2007-04-12 11:19:42 +00:00
Galen Shipman	d7e428909e	two fixes, one mine, the other gleb's, I'm committing for gleb due to time difference... 1) The PML makes an assumption on local/remote completion semantics of the BTL which Self BTL does not obey, nor should it, so we fix the PML 2) The Get protocol must handle the case when sender and receiver do not agree on whether the data is contiguous This commit was SVN r14313.	2007-04-11 22:03:06 +00:00
Josh Hursey	fbc59f668c	fix typo This commit was SVN r14301.	2007-04-11 15:39:42 +00:00
Josh Hursey	5efae25390	No functionality changes (yet). Just fix the indentation to meet the coding standard. This commit was SVN r14300.	2007-04-11 15:19:51 +00:00
Josh Hursey	38547459ae	Improve the cleanup process in ob1 Remove a redundant statement in the r2 BML. This commit was SVN r14228. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2007-04-05 17:37:29 +00:00
Josh Hursey	98fb9f26ef	Some cleanup. - Remove an old comment from crcp_base_fns.c - Let ob1 have its very own ft_event function (which I'll fill in shortly) - Make sure ob1 finalizes the bsend stuff so we don't leave a bunch of memory sitting around - PML base - destruct the array upon finalize. Shrink the include search so it stops after finding a match This commit was SVN r14222.	2007-04-05 13:52:05 +00:00
Josh Hursey	51daa15f9c	play a bit nicer with references. This commit was SVN r14201.	2007-04-02 22:27:52 +00:00
George Bosilca	f2a6b9394f	Deal with the include spree. Protect "environ" on Windows. Some others minors modifications in order to make it compile [again] on Windows. This commit was SVN r14188.	2007-04-01 16:16:54 +00:00
George Bosilca	1cb26e3b9c	Finally the convertor export a convenience function to allow a consistent computation of the current location on the pack/unpack process. This can be used both for retrieving the pointer to the first byte (in the special case of the cached RDMA protocol) and for getting the current position (for the pipelined protocol). I modified all BTLs, but most of them are still untested. This commit was SVN r14180.	2007-03-30 22:02:45 +00:00
Galen Shipman	a78672be2b	fix mpi_leave_pinned case for arbitrary datatypes George will be streamlining this with a new convertor function soon... This commit was SVN r14174.	2007-03-30 02:06:08 +00:00
Galen Shipman	ace68b1883	Change the way we handle unexpected messages, if less than or equal pml_ob1_unexpected_limit just buffer in the PML level recv fragment else allocate a buffer via the bucket allocator This commit was SVN r14117.	2007-03-22 01:00:34 +00:00
Josh Hursey	299332ecac	fix small compiler warning This commit was SVN r14097.	2007-03-21 04:44:54 +00:00
Josh Hursey	6d29146748	fix dumb logic break in the PML selection finalization This commit was SVN r14053.	2007-03-17 16:33:43 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Sven Stork	870740efe2	- proper export symbols that are required by other components. This commit was SVN r13841.	2007-02-28 12:51:55 +00:00
Josh Hursey	c573171b7d	Mostly a cleanup commit. - Implement the BML/r2 finalize function - Cleanup the btl close routine - Wire up a pml_base_verbose MCA parameter so you can actually watch the PML selection logic if you really want to. - Fix a potential segfault in the selection logic. ompi_pointer_array_get_item() may return NULL, so we have to check for it This commit was SVN r13734. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2007-02-21 16:18:43 +00:00
Pavel Shamis	2483cefc57	Additional check if descriptor is NULL. It prevents mca_pml_dr_sendreq_cleanup_active failure on segfault. This commit was SVN r13647.	2007-02-14 10:43:43 +00:00
Brian Barrett	c00d841741	Fix hang on Cray machine introduced with r13582. The modex will never fire when on the Cray machine (aka when the NULL GPR is in use). This commit was SVN r13638. The following SVN revision numbers were found above: r13582 --> open-mpi/ompi@041beeb1b6	2007-02-13 18:34:03 +00:00
George Bosilca	2e042c91cf	Once we compute the local offset use it (instead of the global one). This commit was SVN r13634.	2007-02-13 09:34:04 +00:00
George Bosilca	22eca30b45	One less compiler warning. This commit was SVN r13633.	2007-02-13 09:32:57 +00:00
Gleb Natapov	1033002595	Fix memory leak. Free allocated descriptor if operation cannot proceed. This commit was SVN r13610.	2007-02-12 09:47:51 +00:00
Brian Barrett	041beeb1b6	Share currently selected PML in the modex information, then check whenever adding new procs that the remote proc's pml is the same as our local pml. Turns the hangs from mismatched PMLs into an abort, which is better, I think. This commit was SVN r13582.	2007-02-09 16:38:16 +00:00
Galen Shipman	f98a442c82	Fix a problem in the selection logic for MX. Basically we need to be able to open MTL MX and BTL MX and initialize them at the same time. The problem is that both call mx_init and mx_finalize, solution is to add an external entity that does the init and finalize (based on ref counting). This commit was SVN r13576.	2007-02-09 03:19:38 +00:00
Josh Hursey	90f449f675	fix a typo that got in there This commit was SVN r13523.	2007-02-06 20:56:48 +00:00
Galen Shipman	ec610a9e65	spread priorities out a bit.. This commit was SVN r13487.	2007-02-04 00:55:25 +00:00
Galen Shipman	ddf08cb0b3	woops.. This commit was SVN r13482.	2007-02-03 02:32:00 +00:00
Galen Shipman	a94101fa62	mostly another hack around for PML selection, allows CM be select itself if an MTL is available, if not OB1 is used. Still prevents DR and OB1 from stomping on each other though. This commit was SVN r13481.	2007-02-03 02:01:18 +00:00
George Bosilca	0ff2115964	Other warnings are now silenced. This commit was SVN r13462.	2007-02-02 06:47:35 +00:00
George Bosilca	79ea6d471b	Even less warnings. This commit was SVN r13429.	2007-02-01 19:27:11 +00:00
Brian Barrett	a0b40ce45a	Fix race condition in setting MPI_ERROR -- with buffered sends, the request can complete before the operation, meaning that a bogus MPI_ERROR is read This commit was SVN r13401.	2007-01-31 21:40:14 +00:00
Rainer Keller	061ba05439	- Fixes uncovered with the format attribute to opal_output and opal_output_verbose This commit was SVN r13371.	2007-01-30 20:56:31 +00:00
Rainer Keller	3669e8921e	- Fix further compiler warnings regarding initialization and shadowing variables. This commit was SVN r13358.	2007-01-30 06:34:38 +00:00
Rainer Keller	ca35881cd0	- Minor bugfixes and removed compiler warnings This commit was SVN r13343.	2007-01-28 19:52:09 +00:00
George Bosilca	790f175d4e	Explicit conversions to make the code Windows friendly. This commit was SVN r13266.	2007-01-24 00:50:24 +00:00
Rainer Keller	96030de97b	- Initialize the size of the opal_object class. - Use the OBJ_CLASS_INSTANCE macro to initialize classes. This also gets rid of several missing initialization errors. This commit was SVN r13227.	2007-01-21 14:24:29 +00:00
Jeff Squyres	52ca6cf86c	The mpi_leave_pinned and mpi_leave_pinned_pipeline MCA parameters were needlessly registered in multiple different places, and none of them had a good help string. There was also an inconsistent check for setting both mpi_leave_pinned and mpi_leave_pinned_pipeline (i.e., it was only in ob1). This commit moves the registration of these params to one central place (ompi/runtime/ompi_mpi_params.c, with all other mpi_* MCA params) and uses globals to propagate the values as relevant. The error check was also moved to the central location to ensure that we can consistency everywhere. This commit was SVN r13226.	2007-01-21 14:02:06 +00:00
Rolf vandeVaart	6a260e4a9a	Fix two problems. For MPI_Buffer_detach, do not attempt to return the buffer address from Fortran. It is not expected behavior. For MPI_Buffer_attach, adjust the address of the buffer handed in so it is always aligned. Refs trac:750 Buffer detach reviewed by Jeff Squyres Buffer attach alignment reviewed by George Bosilca This commit was SVN r13205. The following Trac tickets were found above: Ticket 750 --> https://svn.open-mpi.org/trac/ompi/ticket/750	2007-01-18 23:32:39 +00:00
Ralph Castain	4ef4cbb5ad	Fix a compiler warning about comparing signed/unsigned values This commit was SVN r13190.	2007-01-18 17:14:06 +00:00
Gleb Natapov	4c7dbd36c7	Balance RDMA operation in round robin fashion between all available RDMA BTLs. OB1 always use first element from array of BTLs available for RDMA. The patch change the array creation algorithm, it puts different BTL in the first element in round robin fashion. This commit was SVN r13174.	2007-01-18 09:15:18 +00:00
Jeff Squyres	52e8089600	Fix compiler warning. This commit was SVN r13148.	2007-01-17 14:23:46 +00:00
George Bosilca	87ff2b5ce8	Cast to the correct type. This commit was SVN r13046.	2007-01-08 22:04:01 +00:00
George Bosilca	53ddbe8446	Nothing relevant. This commit was SVN r13044.	2007-01-08 22:02:17 +00:00
Brian Barrett	a34e67d743	Remove unneeded PARAM_INIT_FILE variable in configure.params files used by components that use configure.m4 for configuration or are always built. The macro has not been needed since moving to configure types other than configure.stub Fixes trac:590 This commit was SVN r13031. The following Trac tickets were found above: Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590	2007-01-08 03:44:22 +00:00
Brian Barrett	8900d3ae43	Second take at fixing the issues with using ompi_ptr_t. Add helper functions for converting from .pval to .lval and vice-versa. Users of ompi_ptr_t types should only use one of the fields in the union unless using the helper conversion functions. For the BTLs, local pointers will always be stored in the .pval field and remote pointers always stored in the .lval field. George wrote the initial patch, I extended it slightly and am responsible for all bugs found. Refs trac:587 This commit was SVN r13023. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-07 01:48:57 +00:00
Brian Barrett	48ec0b2071	Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix for now... This commit was SVN r12997. The following SVN revision numbers were found above: r12974 --> open-mpi/ompi@27cea44a9c	2007-01-04 22:07:37 +00:00
Galen Shipman	931a389c4f	fix deadlock on rendezvous protocol.. This commit was SVN r12982.	2007-01-04 03:46:11 +00:00
Brian Barrett	27cea44a9c	Fix a number of issues with the ompi_ptr_t: * Make sure that the pval always writes to the correct portion of the lval. This only matters on 32 bit big endian machines. * On 32 bit machines when assigning to pval, the other 4 bytes of lval weren't being written, which could lead to bogus data We use macros so that there aren't casts all over the code and the pval assignment can occur to the correct 4 bytes. Refs trac:587 This commit was SVN r12974. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-03 19:47:48 +00:00
Gleb Natapov	a6127fd8ce	Increase req_bytes_delivered atomically. This commit was SVN r12971.	2007-01-03 15:19:34 +00:00
Gleb Natapov	79202561f6	Don't check req_pipeline_depth on frag completion. Checking of req_bytes_delivered should be enough. This commit was SVN r12967.	2007-01-03 14:44:20 +00:00
Gleb Natapov	1ad6c41735	Sender can start scheduling send fragments immediately after receiving ACK. No need to wait for RNDV completion. This commit was SVN r12965.	2007-01-03 12:37:11 +00:00
Brian Barrett	99c0a29602	Disable CM and DR PMLs in heterogeneous situations as neither are heterogeneous safe. Refs trac:587 This commit was SVN r12942. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2006-12-30 16:17:56 +00:00
George Bosilca	0b5d879a63	ompi_convertor_pack do not return errors (all checkings are done when the convertor is created). This commit was SVN r12940.	2006-12-29 07:40:02 +00:00
George Bosilca	d8db9e49f3	Set the bml_btl to NULL or segfault !!! This commit was SVN r12939.	2006-12-29 07:38:24 +00:00
Brian Barrett	2ab65eb521	Remove some debugging output that was #if 0'ed out but shouldn't have been committed into the trunk anyway This commit was SVN r12897.	2006-12-19 02:34:41 +00:00
Gleb Natapov	190e7a27cd	Merge with gleb-mpool branch. All RDMA components use same mpool now (rdma). udapl/openib/vapi/gm mpools a deprecated. rdma mpool has parameter that allows to limit its size mpool_rdma_rcache_size_limit (default is 0 - unlimited). This commit was SVN r12878.	2006-12-17 12:26:41 +00:00
Brian Barrett	01e8fc5f91	Redo of r12871, without the preconnect code change: Move the req_mtl structure back to the end of each of the structures in the CM PML. The req_mtl structure is cast into a mtl__request_structure for each MTL, which is larger than the req_mtl itself. The cast will cause the _request to overwrite parts of the heavy requests if the req_mtl isn't the LAST thing on each structure (hence the comment). This was moved as an optimization at some point, which caused buffer sends to fail... Refs trac:669 This commit was SVN r12873. The following SVN revision numbers were found above: r12871 --> open-mpi/ompi@597598b712 The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:54:14 +00:00
Brian Barrett	bdf0b231b2	Undo r12871, as it contained some code in ompi/runtime that shouldn't have been committed Refs trac:669 This commit was SVN r12872. The following SVN revision numbers were found above: r12871 --> open-mpi/ompi@597598b712 The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:52:13 +00:00
Brian Barrett	597598b712	Move the req_mtl structure back to the end of each of the structures in the CM PML. The req_mtl structure is cast into a mtl__request_structure for each MTL, which is larger than the req_mtl itself. The cast will cause the _request to overwrite parts of the heavy requests if the req_mtl isn't the LAST thing on each structure (hence the comment). This was moved as an optimization at some point, which caused buffer sends to fail... Refs trac:669 This commit was SVN r12871. The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:46:53 +00:00
Brian Barrett	10af8ab454	Corrections for when threading is enabled. Refs trac:564 This commit was SVN r12830. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-12 18:48:42 +00:00
Brian Barrett	cf196ce420	Instead of an unknown proc list that requires ownership transfer of data (which, in turn, requires a complex series of locks to be held during the transfer), use a modex backing store with backpointers from the proc to the backing store. The proc structures no longer own the modex data, which greatly simplifies locking when an unknown proc suddenly becomes known. Refs trac:564 This commit was SVN r12822. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-11 21:27:30 +00:00
Ralph Castain	0a5d41857a	Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects. I found only two places that were looking at the tokens: 1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message. 2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was always the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes). This commit was SVN r12813.	2006-12-09 23:10:25 +00:00
Brian Barrett	98884e45e4	Clean up the way procs are added to the global process list after MPI_INIT: * Do not add new procs to the global list during modex callback or when sharing orte names during accept/connect. For modex, we cache the modex info for later, in case that proc ever does get added to the global proc list. For accept/connect orte name exchange between the roots, we only need the orte name, so no need to add a proc structure anyway. The procs will be added to the global process list during the proc exchange later in the wireup process * Rename proc_get_namebuf and proc_get_proclist to proc_pack and proc_unpack and extend them to include all information needed to build that proc struct on a remote node (which includes ORTE name, architecture, and hostname). Change unpack to call pml_add_procs for the entire list of new procs at once, rather than one at a time. * Remove ompi_proc_find_and_add from the public proc interface and make it a private function. This function would add a half-created proc to the global proc list, so making it harder to call is a good thing. This means that there's only two ways to add new procs into the global proc list at this time: During MPI_INIT via the call to ompi_proc_init, where my job is added to the list and via ompi_proc_unpack using a buffer from a packed proc list sent to us by someone else. Currently, this is enough to implement MPI semantics. We can extend the interface more if we like, but that may require HNP communication to get the remote proc information and I wanted to avoid that if at all possible. Refs trac:564 This commit was SVN r12798. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-07 19:56:54 +00:00
Brian Barrett	41a70a8f01	indent, this time with the right coding standards... This commit was SVN r12787.	2006-12-07 00:24:01 +00:00
Brian Barrett	f9ec8d6f2a	reindent file to make it easier to deal with... This commit was SVN r12786.	2006-12-07 00:21:25 +00:00
Brian Barrett	6f8b366acb	Rename liborte to libopen-rte and libopal to libopen-pal per telecon today and bug #632. Refs trac:632 This commit was SVN r12762. The following Trac tickets were found above: Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632	2006-12-05 18:27:24 +00:00
Brian Barrett	441432950f	Merge in changes from the bwb-heterogeneous temp branch (r12491 - r12714) for supporting compilers / architectures with different padding rules. This commit was SVN r12749. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r12491 r12714	2006-12-04 20:11:42 +00:00
Rainer Keller	6f8f28f40f	- Get rid of inline definition, otherwise static-compilation fails. This commit was SVN r12735.	2006-12-03 14:52:17 +00:00
Gleb Natapov	30ca7457b4	Some BTLs (e.g TCP) can report put/get completion before data actually hits the buffer on the other side. For this kind of BTLs we need to send FIN through the same BTL, PUT was performed with so network will handle ordering for us. If we will use another BTL, receiver can get FIN before data will hit the buffer and complete request prematurely. We mark such problematic BTLs with MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA is really fake, because the real one guaranties that sender will see the completion only after receiver's NIC confirmed that all the data was received). This commit was SVN r12732.	2006-12-03 10:12:09 +00:00
Gleb Natapov	39c930b160	The bug fixing part of r12720 introduce much more serious bug that it fixes. It calls mca_pml_ob1_send_fin_btl() which may fail and doesn't check return code. This breaks all RDMA transports event when only one BTL is used. Revert it for now, I am working on a real fix for the problem (I hope). This commit was SVN r12731. The following SVN revision numbers were found above: r12720 --> open-mpi/ompi@3e3689320b	2006-12-03 08:55:59 +00:00
Gleb Natapov	65d7ad4581	The "bug fix" from the r12721 reverts part of the r12433 that fixed regression from v1.1 was reviewed and put to v1.2 branch. So revert this part of r12721 back. This commit was SVN r12730. The following SVN revision numbers were found above: r12433 --> open-mpi/ompi@82f7c0dd69 r12721 --> open-mpi/ompi@3edd850d2e	2006-12-03 08:29:55 +00:00
George Bosilca	3edd850d2e	Some indentation and code arrangement. However, there is a bug fix. Force the PUT protocol to always obey to the btl_max_rdma_size. This commit was SVN r12721.	2006-12-01 22:26:14 +00:00
George Bosilca	3e3689320b	Some indentations and one BIG fix. Avoid race conditions on the PUT RDMA protocol when multiple NICS are available between 2 peers. The fix force the FIN message to take exactly the same path as the fragment it describe (i.e. same path means same BTL). Otherwise, the FIN can be received by the peer before the RDMA complete and the request will get freed too early. This commit was SVN r12720.	2006-12-01 21:52:07 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
Andrew Friedley	a4bdcb4faa	Fix a segfault that turned up in more MPI_THREAD_MULTIPLE testing. Same sort of problem and fix as described in r12323 - mca_pml_ob1_recv_frag_progress() was segfaulting due to a NULL req_proc pointer. The path leading to this was through the mca_pml_ob1_check_cantmatch_for_match() function, where we can match a frag using the same macros as mca_pml_ob1_frag_match() and never initialize the req_proc pointer. This commit was SVN r12582. The following SVN revision numbers were found above: r12323 --> open-mpi/ompi@c752502dee	2006-11-13 20:12:51 +00:00
George Bosilca	a38cd366d7	Construct the convertor. It's not really required, but it's not in the critical path anyway. At least in debug mode we get nice informations about where the convertor was created. This commit was SVN r12549.	2006-11-10 20:55:06 +00:00
George Bosilca	858ab24e8e	The req_mtl field has to be the last in the struct or bad things happen. This commit was SVN r12548.	2006-11-10 20:53:41 +00:00
George Bosilca	17405cd9c6	A temporary fix, until we figure out a better approach. The problem is that if one add "pml=" to the configuration file, really bad things happen. All PMLs will get initialize, and each of them will initialize all BTLs. This patch force the mca_pml_base_pml to get initialized in all cases before we go out of the mca_pml_base_open function. This commit was SVN r12527.	2006-11-10 04:53:00 +00:00
George Bosilca	eab1776e9a	Explicit casts for our friendly Windows environment... This commit was SVN r12496.	2006-11-08 17:02:46 +00:00
George Bosilca	915d748d72	Initialize the convertor on _START not on _INIT. This allow us to set it up before the match when we know the peer, saving some time on the critical path. If the receive is ANY_SOURCE then we initialize the convertor on _MATCHED. Anyway, we will set it up only once per receive. This commit was SVN r12484.	2006-11-08 05:42:29 +00:00
George Bosilca	eb45a5e402	Move things around a little bit. Mainly fields from the send and receive request in the base request. Rearrange the fields to keep the data together. Remove some useless tests. This commit was SVN r12482.	2006-11-08 04:58:23 +00:00
Galen Shipman	55db17b37c	don't try to use a dead btl.. This commit was SVN r12456.	2006-11-06 23:25:24 +00:00
Galen Shipman	eef37430a7	failing already failed for ACK timeout.. This commit was SVN r12452.	2006-11-06 22:09:39 +00:00
Galen Shipman	813e7faea8	more fixes for failover.. and yet still more to come.. This commit was SVN r12450.	2006-11-06 21:27:17 +00:00
Gleb Natapov	82f7c0dd69	Fix regression from v1.1. 1) make the code do what comment says 2) if memory is prepinned don't send multiple PUT messages. This commit was SVN r12433.	2006-11-06 12:00:17 +00:00
Galen Shipman	f7c554df65	Try to failover when we get an async error from the lower layer (BTL).. This commit was SVN r12420.	2006-11-03 15:40:26 +00:00
Gleb Natapov	7b39039cd6	Add comments to process_pending functions. This commit was SVN r12346.	2006-10-29 09:12:24 +00:00
Gleb Natapov	8ef5b6a589	Change tabs to spaces to be consistent with the rest of the file. This commit was SVN r12345.	2006-10-29 08:12:44 +00:00
George Bosilca	a9c6ae8f15	Minimize the number of branches, and force the correct prediction for the most usual one. Most of the time we expect the functions which allocate requests to succeed. This commit was SVN r12344.	2006-10-27 23:16:13 +00:00
George Bosilca	44f3dd81b4	Update the comment to reflect what's inside the code. This commit was SVN r12343.	2006-10-27 23:09:37 +00:00
George Bosilca	3472d19d4d	Do not modify the convertor if there is no data to be send across the network. The req_bytes_packed field is initialized in the BASE_INIT macro, so it is set for all requests at this stage. This commit was SVN r12342.	2006-10-27 23:03:15 +00:00
Jeff Squyres	020efdf1f9	Refs trac:250 This commit essentially caches the invoking comm/win/file on the ompi_request_t. This, paired with the req_type field, allows us to retrieve the invoking MPI object and invoke the proper errhandler. The patch is missing most updates for the MPI-2 one-sided stuff (i.e., the patch mainly fixes comms and files); I didn't really understand that code and didn't want to hazard trying to figure it out when Brian can probably do it much more quickly. So #250 will still stay open, pending MPI-2 one-sided updates for this stuff. This commit was SVN r12339. The following Trac tickets were found above: Ticket 250 --> https://svn.open-mpi.org/trac/ompi/ticket/250	2006-10-27 12:35:27 +00:00
Jeff Squyres	e02114dcf3	Fixes trac:529. * Create a new request type: NOOP (described below) * For all MPI__INIT functions, OBJ_NEW an ompi_request_t and set its type to NOOP Ensure that the NOOP requests are OBJ_RELEASE'd when they are done * MPI_START looks at the request type; if NOOP, just return success. If not, call the PML start() function * MPI_STARTALL always pass the entire array of requests back to the PML (see next point) * Make the PMLs only process PML requests (i.e., ignore/skip anything that isn't of type PML -- such as the NOOP requests) * Add a little more param error checking in STARTALL This commit was SVN r12338. The following Trac tickets were found above: Ticket 529 --> https://svn.open-mpi.org/trac/ompi/ticket/529	2006-10-27 12:32:36 +00:00
George Bosilca	126a68dc9a	Big datatype commit. Remove all unused features of the datatype engine. As the memory allocation logic is completely done outside the data-type engine (in the PML) there is no need for any special case inside the data-type engine. There is less arguments for the ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is not required anymore as there is no memory allocated in the engine itself). This change affect all components using datatypes. I test most of them, but it might happens that I miss some ... If it's the case please let me know (don't shoot the pianist!!). This commit was SVN r12331.	2006-10-26 23:11:26 +00:00
Andrew Friedley	c752502dee	Fix for a common race condition when running the Sandia mt_send_recv.cc test. A segfault would occur in mca_pml_ob1_recv_request_progress() when trying to prepare the convertor for unpacking, because the request's req_proc field was NULL. Turns out that we weren't setting the req_proc field in the MCA_PML_OB1_CHECK_SPECIFIC_AND_WILD_RECEIVES_FOR_MATCH macro. Instead of just setting it there I removed the other place req_proc was being set correctly, and instead took care of all the cases at once in mca_pml_ob1_recv_frag_match(). This commit was SVN r12323.	2006-10-26 19:09:39 +00:00
Gleb Natapov	90be664b9f	Some process_pending() functions get bml_btl on which resource was freed as a parameter. For optimisation purpose only this BTL is used to send packet through instead of trying to send packets through all BTLs. But actually the code was wrong. It simply used provided bml_btl and it may represent different endpoint from packet's destination. The fixed code checks if packet's destination is reachable through the BTL, finds appropriate bml_btl and only then tries to send it through correct bml_btl. This commit was SVN r12319.	2006-10-26 13:21:47 +00:00
Sven Stork	f3f39e003e	- Increment the pipeline depth before we trigger the send function. As mentioned in the comment the completion/callback of the triggered send operation can happen before the call returns. If this happens and if the pipeline depth is 0 before we triggered the send operation and this is the last send operation of the request then the completion detection code will decrement the pipeline depth and check it for equality to 0. Because (0-1) != 0 the pml completion function for this request will not be called. This part 2 of the fix for ticket #246. This commit was SVN r12292.	2006-10-25 08:52:39 +00:00
George Bosilca	06563b5dec	Last set of explicit conversions. We are now close to the zero warnings on all platforms. The only exceptions (and I will not deal with them anytime soon) are on Windows: - the write functions which require the length to be an int when it's a size_t on all UNIX variants. - all iovec manipulation functions where the iov_len is again an int when it's a size_t on most of the UNIXes. As these only happens on Windows, so I think we're set for now :) This commit was SVN r12215.	2006-10-20 03:57:44 +00:00
Galen Shipman	2036bf5c3c	make smart and dumb compilers happy This commit was SVN r12178.	2006-10-18 19:33:39 +00:00
Rainer Keller	47b24a0603	- Now the branch is done, linearize access regarding request handling. Buys a little bit on IMB, no functional change, otherwise. This commit was SVN r12165.	2006-10-18 16:11:50 +00:00
George Bosilca	6f5ec2390b	pedantic... This commit was SVN r12147.	2006-10-17 20:25:40 +00:00
George Bosilca	8852c00c36	Look like a big commit but in fact it address only one issue. The way we're working with size and displacement of data-type. After this patch all data can contain size_t bytes and the displacements are defined as ptrdiff_t. All of the files I was able to compile have been modified to match this requirement. This commit was SVN r12146.	2006-10-17 20:20:58 +00:00
George Bosilca	ed83927025	Don't reset the convertor when a persistent request complete. Instead reset it next time then request is used. This will keep the execution path on the default case (not persistent) shorter. This commit was SVN r12134.	2006-10-17 05:01:47 +00:00
George Bosilca	ef66afe45c	Another inner loop optimization. Only check for num_fails when prev_bytes is equal to num_bytes. This commit was SVN r12133.	2006-10-17 04:38:38 +00:00
George Bosilca	b27f1814c6	If the function is expected to return a bool then let's return only true or false. This commit was SVN r11991.	2006-10-05 05:10:34 +00:00
George Bosilca	e4df4285b1	Reorder the enum in order to allow some compilers to optimize the big switch in the header analysis. This commit was SVN r11975.	2006-10-04 20:03:28 +00:00
Andrew Friedley	836261b85a	Fixes ticket 186. First, move the OPAL_THREAD_LOCK out to the same level as its corresponding UNLOCK. It was possible to hit the UNLOCK without ever acquiring the lock. Since the OPAL_THREAD_ADD64() is now protected by this lock, we can just do the decrement non-atomically. This commit was SVN r11958.	2006-10-03 18:15:26 +00:00
Andrew Friedley	1177844d7a	Fixes trac:183. Don't try to acquire ompi_request_lock here, which in all cases is already held. Avoids deadlock that occurs even when threads are enabled and we're running a THREAD_SINGLE app. Reviewed by Galen. This commit was SVN r11957. The following Trac tickets were found above: Ticket 183 --> https://svn.open-mpi.org/trac/ompi/ticket/183	2006-10-03 18:08:48 +00:00

717 commits