openmpi

Автор	SHA1	Сообщение	Дата
George Bosilca	b58dae00db	Allow PERUSE to compile correctly. This commit was SVN r17008.	2007-12-21 06:18:19 +00:00
Gleb Natapov	35bf8c7c46	Rewrite OB1 matching logic. Get rid of macros, make the code shorter. This commit was SVN r16993.	2007-12-19 09:16:20 +00:00
Gleb Natapov	5cd38b8b06	Better encapsulate heterogeneous arch handling in ob1. This commit was SVN r16970.	2007-12-16 08:45:44 +00:00
Gleb Natapov	8b511b969d	Introduce a new BTL parameter btl_rndv_eager_limit which determines size of a first fragment of rendezvous protocol. Remove no longer used btl_min_send_size parameter. This commit was SVN r16969.	2007-12-16 08:35:17 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confusion discussion about linkers, have all OPAL, ORTE, and OMPI components '''not'' link against the OPAL, ORTE, or OMPI libraries. See ttp://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Gleb Natapov	e0dc53e516	Use mca_bml_base_send_status() in OB1. This commit was SVN r16905.	2007-12-09 14:13:24 +00:00
Gleb Natapov	e2e211f23b	Add flags parameter to btl_alloc() and btl_prepare_src() functions. If BTL knows at the time of allocation priority of a descriptor it may do some optimizations. This commit was SVN r16901.	2007-12-09 14:08:01 +00:00
Gleb Natapov	2d784752dd	Remove descriptor caching form BML. With descriptor caching some optimizations are impossible. This commit was SVN r16897.	2007-12-09 13:58:17 +00:00
Rich Graham	27a748e7eb	change all instances of ompi_free_list_init to ompi_free_list_init_new. Header and payload data are specified separately at this stage. This commit was SVN r16633.	2007-11-01 23:38:50 +00:00
Gleb Natapov	52c6160252	MCA_PML_BASE_REQUEST_MPI_COMPLETE() macro does nothing except call to ompi_request_complete(). Remove the macro and call the function directly. This commit was SVN r16498.	2007-10-18 14:20:24 +00:00
Gleb Natapov	e0a3a7e53e	Move duplicated code all over the code to a single function ompi_request_wait_completion(). This commit was SVN r16494.	2007-10-18 12:33:21 +00:00
Gleb Natapov	807f49ed7f	If there are more then one BTL present we may divide payload between them in such a way that converter will not be able to pack some of it. This commit adds handling of such cases. If converter can't pack any data for a BTL the data is sent over another BTL that has data to send. This commit was SVN r16493.	2007-10-18 12:07:37 +00:00
Gleb Natapov	1330974e5e	eager_limit is no longer needed in OB1 PML. Remove it. This commit was SVN r16442.	2007-10-15 09:26:42 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is know to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Gleb Natapov	097b17d30e	Prevent a receive request from been freed while other thread holds a reference to it or there is an outstanding completion for the request. This commit was SVN r16153.	2007-09-18 16:18:47 +00:00
George Bosilca	05ae27c68b	Don't segfault if we receive a fragment for a non existing communicator. Instead, drop it by now. This commit was SVN r16105.	2007-09-12 17:52:02 +00:00
Shiqing Fan	a0660f4deb	- Just some type casts. This commit was SVN r16100.	2007-09-12 15:29:58 +00:00
Gleb Natapov	07c8fddeef	Fix scheduling of pending send request. It should be scheduled req_lock times. This commit was SVN r16096.	2007-09-12 07:08:38 +00:00
George Bosilca	d8fed2cfa1	Set a default value so that some compilers stop complaining about uninitialized values. This commit was SVN r16094.	2007-09-11 18:00:53 +00:00
Gleb Natapov	79011279e5	Remove debug output. This commit was SVN r16016.	2007-08-30 13:29:41 +00:00
Gleb Natapov	690fb95bda	Cleanup send scheduling code. This commit was SVN r16014.	2007-08-30 12:10:04 +00:00
Gleb Natapov	0b0f9d14aa	Mark send request complete on PML level only when absolutely sure there is no more work associated with this request. No more outstanding completions or packets and send scheduling isn't running in another thread. This commit was SVN r16013.	2007-08-30 12:08:33 +00:00
Gleb Natapov	eac2674f66	The inner voice tells me this is a typo. This commit was SVN r16004.	2007-08-29 13:28:47 +00:00
Brian Barrett	59b22533f2	Enable RDMA for heterogeneous situations. Currently done by overloading the ompi_convertor_need_buffers function to only return 0 if the convertor is homogeneous (which it never does on the trunk, but does to on v1.2, but that's a different issue). Only enable the heterogeneous rdma code for a btl if it supports it (via a flag), as some btls need some work for this to work properly. Currently only TCP and OpenIB extensively tested This commit was SVN r15990.	2007-08-28 21:23:44 +00:00
Gleb Natapov	fa69c5cc10	If a memory on a sender's size is not registered don't register it on a receive side too. Otherwise a content of the recvreq->req_rdma array is replaced later without freeing previous content and refcount on registration in mpool become wrong. This commit was SVN r15978.	2007-08-28 07:43:06 +00:00
Gleb Natapov	e1a1d9d90e	Receive request converter can be accessed in parallel by a thread that receives data and a thread that run RDMA schedule function. Protect access to the converter by a lock. This commit was SVN r15967.	2007-08-27 11:41:42 +00:00
Gleb Natapov	065d04dfde	Do not free recvreq while schedule function is running in another thread. This commit was SVN r15964.	2007-08-27 11:31:40 +00:00
Rainer Keller	1b5fa48a29	- Add missing PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q when matching from the posted generic_recv-queue. - Move the PERUSE_COMM_MSG_MATCH_POSTED_REQ from MCA_PML_OB1_RECV_REQUEST_MATCHED to mca_pml_ob1_recv_frag_match() as suggested by Terry Dontje Only post, if this is not a probe/iprobe request. - Do not post PERUSE_COMM_REQ_MATCH_UNEX for probes / iprobes and do in correct order before PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q This commit was SVN r15947.	2007-08-23 07:09:43 +00:00
Rainer Keller	c175801f98	- Initialize in the order of mca_pml_ob1_comm_proc_t... This commit was SVN r15946.	2007-08-23 05:56:22 +00:00
Rainer Keller	b0df55d53b	- For MPI_Probe/MPI_Iprobe, we should not have a PERUSE_COMM_REQ_ACTIVATE event. Therefore move the PERUSE_TRACE_COMM_EVENT for this event from MCA_PML_BASE_SEND_REQUEST_INIT / MCA_PML_BASE_RECV_REQUEST_INIT to the proper places into pml_ob1_isend.c / pml_ob1_irecv.c right after the MCA_PML_OB1_SEND_REQUEST_INIT / MCA_PML_OB1_RECV_REQUEST_INIT. This commit was SVN r15945.	2007-08-23 05:52:33 +00:00
Gleb Natapov	5596aa5f53	The sizes of mca_pml_ob1_send_request_t and mca_pml_ob1_recv_request_t depend on a parameter and are determined in runtime. r15346 removed calculation of correct sizes for this structures. This patch adds it back and fixes trac:1116, #1114. This commit was SVN r15932. The following SVN revision numbers were found above: r15346 --> open-mpi/ompi@433f8a7694 The following Trac tickets were found above: Ticket 1116 --> https://svn.open-mpi.org/trac/ompi/ticket/1116	2007-08-20 12:06:27 +00:00
Mohamad Chaarawi	59a7bf8a9f	Merging in the Sparse Groups.. This commit includes config changes.. This commit was SVN r15764.	2007-08-04 00:41:26 +00:00
Gleb Natapov	627d9bc8ed	Delay freeing of a send request if scheduling function is running by other thread. This commit was SVN r15722.	2007-08-01 12:19:16 +00:00
Gleb Natapov	afac5eb93f	Guard recv request with lock against simultaneous access from different threads. This commit was SVN r15681.	2007-07-30 12:50:38 +00:00
Gleb Natapov	21dd061696	Init req_send_range_lock. Found by Terry Dontje. This commit was SVN r15677.	2007-07-30 08:21:52 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00
Rich Graham	0991c3d5f5	move buffered send component clean up out of the pml to ompi_mpi_finalize. This commit was SVN r15463.	2007-07-17 14:50:52 +00:00
Rich Graham	1a4ce2a961	move setting of the component used to managed buffer sends out of the pmls, and into ompi_mpi_init. This is the first of several steps to pull buffered send management out of the pmls. This commit was SVN r15451.	2007-07-16 21:52:25 +00:00
George Bosilca	8643f38adf	Don't allow the BTL to be closed before the end of the process. Count the number of times the BTLs are opened, and then don't remove them until close was called the same number of times. This commit was SVN r15376.	2007-07-11 22:21:04 +00:00
George Bosilca	e19777e910	A more consistent version. As we now share the send and receive queue, we have to construct/destruct only once. Therefore, the construction will happens before digging for a PML, while the destruction just before finalizing the component. Add some OPAL_LIKELY/OPAL_UNLIKELY. This commit was SVN r15347.	2007-07-10 23:45:23 +00:00
George Bosilca	433f8a7694	This patch bring full support for message queues in Open MPI. Now the send and receive queues are shared among all PMLs, they are declared in the base PML, and the selected PML is in charge of initializing and releasing them. The CM PML is slightly different compared with OB1 or DR. Internally it use 2 different types of requests: light and heavy. However, now with this patch both types of requests are stored in the same queue, and cast appropriately on the allocation macro. This means we might use less memory than we allocate, but in exchange we got full support for most of the parallel debuggers. Another thing with this patch, is that now for all PML (CM included) the basic PML requests start with the same fields, and they are declared in the same order in the request structure. Moreover, the fields have been moved in such a way that only one volatile/atomic will exist per line of cache (hopefully). This commit was SVN r15346.	2007-07-10 22:16:38 +00:00
Brian Barrett	8b9e8054fd	Move modex from pml base to general ompi runtime, sicne it's used by more than just the PML/BTLs these days. Also clean up the code so that it handles the situation where not all nodes register information for a given node (rather than just spinning until that node sends information, like we do today). Includes r15234 and r15265 from the /tmp/bwb-modex branch. This commit was SVN r15310. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15234 r15265	2007-07-09 17:16:34 +00:00
Tim Prins	f3ac4ac20e	Fix order of function arguments This commit was SVN r15304.	2007-07-08 16:37:51 +00:00
Rainer Keller	cff1b6a71b	- PERUSE_COMM_REQ_XFER_BEGIN should be emited for first fragment of larger message as well. This commit was SVN r15299.	2007-07-06 15:02:36 +00:00
George Bosilca	951e4929b9	Usually it's unlikely to have additional fragments. This commit was SVN r15253.	2007-07-01 16:19:53 +00:00
George Bosilca	c435094639	Only trigger the PERUSE_COMM_REQ_XFER_BEGIN event on the initial fragment. This commit was SVN r15252.	2007-07-01 16:19:13 +00:00
George Bosilca	60319f99ac	Make sure in case of error what we return is clean (set to NULL). This commit was SVN r15251.	2007-07-01 16:17:43 +00:00
George Bosilca	11656e20aa	Remove few warnings. This commit was SVN r15250.	2007-07-01 16:16:05 +00:00
Gleb Natapov	77e54ebc7e	Schedule RDMA op on the last BTL that got completion. This commit was SVN r15249.	2007-07-01 11:35:55 +00:00
Gleb Natapov	54b40aef91	Schedule SEND traffic of pipeline protocol between BTLs in accordance with relative bandwidths of each BTL. Precalculate what part of a message should be send via each BTL in advance instead of doing it during scheduling. This commit was SVN r15248.	2007-07-01 11:34:23 +00:00
Gleb Natapov	e74aa6b295	Schedule RDMA traffic between BTLs in accordance with relative bandwidths of each BTL. Precalculate what part of a message should be send via each BTL in advance instead of doing it during scheduling. This commit was SVN r15247.	2007-07-01 11:31:26 +00:00
Gleb Natapov	1c7141df4d	Remove unused struct. This commit was SVN r15228.	2007-06-28 11:58:16 +00:00
Gleb Natapov	b88b7dedfe	Rename btl_rdma_offset to btl_pipeline_send_length. This commit was SVN r15153.	2007-06-21 07:12:40 +00:00
Josh Hursey	7fd1805e97	Fix a couple of compile warnings that Tim P brought to by attention. This commit was SVN r15132.	2007-06-19 00:46:16 +00:00
Rainer Keller	ca09aae2cc	- Get PERUSE compile again with latest RDMA changes in r14768/r14842. This commit was SVN r15042. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07 r14842 --> open-mpi/ompi@10266fb467	2007-06-13 12:47:47 +00:00
Brian Barrett	84d1512fba	Add the potential for doing some basic error checking on mutexes during single threaded builds. In its default configuration, all this does is ensure that there's at least a good chance of threads building based on non-threaded development (since the variable names will be checked). There is also code to make sure that a "mutex" is never "double locked" when using the conditional macro mutex operations. This is off by default because there are a number of places in both ORTE and OMPI where this alarm spews mega bytes of errors on a simple test. So we have some work to do on our path towards thread support. Also removed the macro versions of the non-conditional thread locks, as the only places they were used, the author of the code intended to use the conditional thread locks. So now you have upper-case macros for conditional thread locks and lowercase functions for non-conditional locks. Simple, right? :). This commit was SVN r15011.	2007-06-12 16:25:26 +00:00
Gleb Natapov	423f404c34	Shut up compiler warning. Ugly, but I can see better way except changing converter to use uint64_t(ssize_t?) for offset. This commit was SVN r14950.	2007-06-07 11:33:28 +00:00
Gleb Natapov	9f9b64db4e	Revert r14947 as this doesn't solve the problem. This commit was SVN r14949. The following SVN revision numbers were found above: r14947 --> open-mpi/ompi@5b9fe28e3f	2007-06-07 11:24:24 +00:00
Gleb Natapov	5b9fe28e3f	Fix warning on 32bit systems. This commit was SVN r14947.	2007-06-07 08:57:34 +00:00
Rich Graham	e276f7bcc7	undo my error. This commit was SVN r14890.	2007-06-05 23:32:47 +00:00
Rich Graham	ce0e9ac329	initialize lock properly. This commit was SVN r14881.	2007-06-05 20:34:11 +00:00
Gleb Natapov	10266fb467	Fix deadlock in OB1 protocol by by sending memory by copying if registration fails. This commit was SVN r14842.	2007-06-03 08:31:58 +00:00
Gleb Natapov	a25e1e7b15	Implement new function mca_pml_ob1_send_requst_copy_in_out(req, offset, len) that allows to send any range of a request by send/recv instaed of RDMA and use it to send data from the end of a request in pipeline protocol. This commit was SVN r14841.	2007-06-03 08:30:07 +00:00
Gleb Natapov	06bf5d74e7	Remove mca_pml_ob1_send_fin_btl function. This commit was SVN r14784.	2007-05-28 06:51:12 +00:00
Gleb Natapov	f5078db0db	Fix order of parameters to function. This commit was SVN r14783.	2007-05-27 13:45:24 +00:00
Gleb Natapov	ad69d3c6ac	Fix out of resource handling for FIN packets broken by r14768. This commit was SVN r14780. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 08:29:38 +00:00
Galen Shipman	3401bd2b07	Add optional ordering to the BTL interface. This is required to tighten up the BTL semantics. Ordering is not guaranteed, but, if the BTL returns a order tag in a descriptor (other than MCA_BTL_NO_ORDER) then we may request another descriptor that will obey ordering w.r.t. to the other descriptor. This will allow sane behavior for RDMA networks, where local completion of an RDMA operation on the active side does not imply remote completion on the passive side. If we send a FIN message after local completion and the FIN is not ordered w.r.t. the RDMA operation then badness may occur as the passive side may now try to deregister the memory and the RDMA operation may still be pending on the passive side. Note that this has no impact on networks that don't suffer from this limitation as the ORDER tag can simply always be specified as MCA_BTL_NO_ORDER. This commit was SVN r14768.	2007-05-24 19:51:26 +00:00
Ralph Castain	4fff584a68	Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that did start. The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system. Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed. Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief. With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn. Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put". This commit was SVN r14711.	2007-05-21 18:31:28 +00:00
Gleb Natapov	3ebaff8dfe	Implement new BTL parameters: We eagerly send data up to btl__eager_limit with the match Upon ACK of the MATCH we start using send/receives of size btl__max_send_size up to the btl__rdma_pipeline_offset After the btl__rdma_pipeline_offset we begin using RDMA writes of size btl__rdma_pipeline_frag_size. Now, on a per message basis we only use the above protocol if the message is larger than btl__min_rdma_pipeline_size btl__eager_limit - > same btl__max_send_size -> same btl__rdma_pipeline_offset -> btl__min_rdma_size btl__rdma_pipeline_frag_size -> btl__max_rdma_size btl_*_min_rdma_pipeline_size is new.. This patch also moves all BTL common parameters initialisation into btl_base_mca.c file. This commit was SVN r14681.	2007-05-17 07:54:27 +00:00
Gleb Natapov	2562253678	Do more work at RDMA frag preparation time and less work at RDMA frag sending time. This commit was SVN r14627.	2007-05-09 12:11:51 +00:00
Gleb Natapov	78fda79630	Use size_t instead of uint64_t in call to convertor cloning. This commit was SVN r14626.	2007-05-09 10:02:06 +00:00
Gleb Natapov	8029893489	In multithreaded application sending of initial portion of a request may overlap with RDMAing the rest of it. Also more than one RDMA writes can be performed simultaneously by different threads. To make this code thread safe this patch clones original request convertor for each RDMA fragment. This commit was SVN r14574.	2007-05-03 09:13:17 +00:00
George Bosilca	bb481273a6	Typos. This commit was SVN r14546.	2007-04-28 19:15:53 +00:00
Ralph Castain	18b2dca51c	Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd really have to try). This also involved a slight change to the oob.xcast API, so propagated that as required. Note: this has only been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-) Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately. This commit was SVN r14475.	2007-04-23 18:41:04 +00:00
Adrian Knoth	339dbf6cd5	Cosmetics. Enforcing style guide. This commit was SVN r14459.	2007-04-21 21:47:25 +00:00
Josh Hursey	eef364546c	Check for NULL before trying to use the variable. This commit was SVN r14444.	2007-04-20 17:17:11 +00:00
Josh Hursey	12e5d0e817	ft_event Commit: - Move the PML Modex stuff out of the BML -- Abstraction violation. - Also fix the location of the add_procs with respect to the stage gates. This commit was SVN r14422.	2007-04-19 03:05:12 +00:00
George Bosilca	51fc2474f1	Don't keep the data attached to a fragment segmented when we have to move it into the unexpected queue. Instead pack the data in only one buffer. Now the code look more optimized and clear, but I have a doubt about who's using this functionality. I think that all BTLs always return only one memory segment attached to the matching fragment (i.e. there is no unexpected iov type receive). This commit was SVN r14416.	2007-04-18 15:52:11 +00:00
Josh Hursey	8f119d9063	Closes trac:977 Fix for memory corruption in the restarted process stack. This stemed from the brute force method we were previously using. This commit fixes this by using a lighter weight solution focused in the r2 BML instead of above the PML. This is a more efficient and flexible solution, and it solves the original problem. In the process I pulled out the ft_event function in the tcp BTL and r2 BML into a set of *_ft.[c\|h] files just to keep any updates to these code paths as isolated as possible to make merging easier on everyone. This commit was SVN r14371. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855 The following Trac tickets were found above: Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977	2007-04-14 02:06:05 +00:00
Jeff Squyres	51f286d737	Just like r14289 on the ORTE trunk: Per discussions with Brian and Ralph, make a slight correction in where components are installed. Use $pkglibdir, not $libdir/openmpi, so that when compiled in the orte trunk, components are installed to the right directory (because the component search patch is checking $pkglibdir). This commit was SVN r14345. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r14289	2007-04-12 11:19:42 +00:00
Galen Shipman	d7e428909e	two fixes, one mine, the other gleb's, I'm committing for gleb due to time difference... 1) The PML makes an assumption on local/remote completion semantics of the BTL which Self BTL does not obey, nor should it, so we fix the PML 2) The Get protocol must handle the case when sender and reciever do not agree on wheter the data is contiguous This commit was SVN r14313.	2007-04-11 22:03:06 +00:00
Josh Hursey	38547459ae	Improve the cleanup process in ob1 Remove a redundant statement in the r2 BML. This commit was SVN r14228. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2007-04-05 17:37:29 +00:00
Josh Hursey	98fb9f26ef	Some cleanup. - Remove an old comment from crcp_base_fns.c - Let ob1 have its very own ft_event function (which I'll fill in shortly) - Make sure ob1 finalizes the bsend stuff so we don't leave a bunch of memory sitting around - PML base - destruct the array upon finalize. Shrink the include search so it stops after finding a match This commit was SVN r14222.	2007-04-05 13:52:05 +00:00
George Bosilca	1cb26e3b9c	Finally the convertor export a convenience function to allow a consistent computation of the current location on the pack/unpack process. This can be used both for retrieving the pointer to the first byte (in the special case of the cached RDMA protocol) and for getting the current position (for the pipelined protocol). I modified all BTLs, but most of them are still untested. This commit was SVN r14180.	2007-03-30 22:02:45 +00:00
Galen Shipman	a78672be2b	fix mpi_leave_pinned case for arbitrary datatypes George will be streamlining this with a new convertor function soon... This commit was SVN r14174.	2007-03-30 02:06:08 +00:00
Galen Shipman	ace68b1883	Change the way we handle unexpected messages, if less than or equal pml_ob1_unexpected_limit just buffer in the PML level recv fragment else allocate a buffer via the bucket allocator This commit was SVN r14117.	2007-03-22 01:00:34 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
George Bosilca	2e042c91cf	Once we compute the local offset use it (instead of the global one). This commit was SVN r13634.	2007-02-13 09:34:04 +00:00
Gleb Natapov	1033002595	Fix memory leak. Free allocated descriptor if operation cannot proceed. This commit was SVN r13610.	2007-02-12 09:47:51 +00:00
Brian Barrett	041beeb1b6	Share currently selected PML in the modex information, then check whenever adding new procs that the remote proc's pml is the same as our local pml. Turns the hangs from mismatched PMLs into an abort, which is better, I think. This commit was SVN r13582.	2007-02-09 16:38:16 +00:00
Galen Shipman	ec610a9e65	spread priorities out a bit.. This commit was SVN r13487.	2007-02-04 00:55:25 +00:00
Galen Shipman	a94101fa62	mostly another hack around for PML selection, allows CM be select itself if an MTL is available, if not OB1 is used. Still prevents DR and OB1 from stomping on each other though. This commit was SVN r13481.	2007-02-03 02:01:18 +00:00
George Bosilca	0ff2115964	Other warnings are now silenced. This commit was SVN r13462.	2007-02-02 06:47:35 +00:00
Rainer Keller	ca35881cd0	- Minor bugfixes and removed compiler warnings This commit was SVN r13343.	2007-01-28 19:52:09 +00:00
Jeff Squyres	52ca6cf86c	The mpi_leave_pinned and mpi_leave_pinned_pipeline MCA parameters were needlessly registered in multiple different places, and none of them had a good help string. There was also an inconsistent check for setting both mpi_leave_pinned and mpi_leave_pinned_pipeline (i.e., it was only in ob1). This commit moves the registration of these params to one central place (ompi/runtime/ompi_mpi_params.c, with all other mpi_* MCA params) and uses globals to propagate the values as relevant. The error check was also moved to the central location to ensure that we can consistency everywhere. This commit was SVN r13226.	2007-01-21 14:02:06 +00:00
Gleb Natapov	4c7dbd36c7	Balance RDMA operation in round robin fashion between all available RDMA BTLs. OB1 always use first element from array of BTLs available for RDMA. The patch change the array creation algorithm, it puts different BTL in the first element in round robin fashion. This commit was SVN r13174.	2007-01-18 09:15:18 +00:00
Brian Barrett	a34e67d743	Remove unneeded PARAM_INIT_FILE variable in configure.params files used by components that use configure.m4 for configuration or are always built. The macro has not been needed since moving to configure types other than configure.stub Fixes trac:590 This commit was SVN r13031. The following Trac tickets were found above: Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590	2007-01-08 03:44:22 +00:00
Brian Barrett	8900d3ae43	Second take at fixing the issues with using ompi_ptr_t. Add helper functions for converting from .pval to .lval and vice-versa. Users of ompi_ptr_t types should only use one of the fields in the union unless using the helper conversion functions. For the BTLs, local pointers will always be stored in the .pval field and remote pointers always stored in the .lval field. George wrote the initial patch, I extended it slightly and am responsible for all bugs found. Refs trac:587 This commit was SVN r13023. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-07 01:48:57 +00:00
Brian Barrett	48ec0b2071	Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix for now... This commit was SVN r12997. The following SVN revision numbers were found above: r12974 --> open-mpi/ompi@27cea44a9c	2007-01-04 22:07:37 +00:00
Galen Shipman	931a389c4f	fix deadlock on rendezvous protocol.. This commit was SVN r12982.	2007-01-04 03:46:11 +00:00

1 2 3 4 5 ...

391 Коммитов