openmpi

Author	SHA1	Message	Date
Rainer Keller	ca35881cd0	- Minor bugfixes and removed compiler warnings This commit was SVN r13343.	2007-01-28 19:52:09 +00:00
George Bosilca	790f175d4e	Explicit conversions to make the code Windows friendly. This commit was SVN r13266.	2007-01-24 00:50:24 +00:00
Rainer Keller	96030de97b	- Initialize the size of the opal_object class. - Use the OBJ_CLASS_INSTANCE macro to initialize classes. This also gets rid of several missing initialization errors. This commit was SVN r13227.	2007-01-21 14:24:29 +00:00
Jeff Squyres	52ca6cf86c	The mpi_leave_pinned and mpi_leave_pinned_pipeline MCA parameters were needlessly registered in multiple different places, and none of them had a good help string. There was also an inconsistent check for setting both mpi_leave_pinned and mpi_leave_pinned_pipeline (i.e., it was only in ob1). This commit moves the registration of these params to one central place (ompi/runtime/ompi_mpi_params.c, with all other mpi_* MCA params) and uses globals to propagate the values as relevant. The error check was also moved to the central location to ensure that we can consistency everywhere. This commit was SVN r13226.	2007-01-21 14:02:06 +00:00
Rolf vandeVaart	6a260e4a9a	Fix two problems. For MPI_Buffer_detach, do not attempt to return the buffer address from Fortran. It is not expected behavior. For MPI_Buffer_attach, adjust the address of the buffer handed in so it is always aligned. Refs trac:750 Buffer detach reviewed by Jeff Squyres Buffer attach alignment reviewed by George Bosilca This commit was SVN r13205. The following Trac tickets were found above: Ticket 750 --> https://svn.open-mpi.org/trac/ompi/ticket/750	2007-01-18 23:32:39 +00:00
Ralph Castain	4ef4cbb5ad	Fix a compiler warning about comparing signed/unsigned values This commit was SVN r13190.	2007-01-18 17:14:06 +00:00
Gleb Natapov	4c7dbd36c7	Balance RDMA operation in round robin fashion between all available RDMA BTLs. OB1 always use first element from array of BTLs available for RDMA. The patch change the array creation algorithm, it puts different BTL in the first element in round robin fashion. This commit was SVN r13174.	2007-01-18 09:15:18 +00:00
Jeff Squyres	52e8089600	Fix compiler warning. This commit was SVN r13148.	2007-01-17 14:23:46 +00:00
George Bosilca	87ff2b5ce8	Cast to the correct type. This commit was SVN r13046.	2007-01-08 22:04:01 +00:00
George Bosilca	53ddbe8446	Nothing relevant. This commit was SVN r13044.	2007-01-08 22:02:17 +00:00
Brian Barrett	a34e67d743	Remove unneeded PARAM_INIT_FILE variable in configure.params files used by components that use configure.m4 for configuration or are always built. The macro has not been needed since moving to configure types other than configure.stub Fixes trac:590 This commit was SVN r13031. The following Trac tickets were found above: Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590	2007-01-08 03:44:22 +00:00
Brian Barrett	8900d3ae43	Second take at fixing the issues with using ompi_ptr_t. Add helper functions for converting from .pval to .lval and vice-versa. Users of ompi_ptr_t types should only use one of the fields in the union unless using the helper conversion functions. For the BTLs, local pointers will always be stored in the .pval field and remote pointers always stored in the .lval field. George wrote the initial patch, I extended it slightly and am responsible for all bugs found. Refs trac:587 This commit was SVN r13023. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-07 01:48:57 +00:00
Brian Barrett	48ec0b2071	Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix for now... This commit was SVN r12997. The following SVN revision numbers were found above: r12974 --> open-mpi/ompi@27cea44a9c	2007-01-04 22:07:37 +00:00
Galen Shipman	931a389c4f	fix deadlock on rendezvous protocol.. This commit was SVN r12982.	2007-01-04 03:46:11 +00:00
Brian Barrett	27cea44a9c	Fix a number of issues with the ompi_ptr_t: * Make sure that the pval always writes to the correct portion of the lval. This only matters on 32 bit big endian machines. * On 32 bit machines when assigning to pval, the other 4 bytes of lval weren't being written, which could lead to bogus data We use macros so that there aren't casts all over the code and the pval assignment can occur to the correct 4 bytes. Refs trac:587 This commit was SVN r12974. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-03 19:47:48 +00:00
Gleb Natapov	a6127fd8ce	Increase req_bytes_delivered atomically. This commit was SVN r12971.	2007-01-03 15:19:34 +00:00
Gleb Natapov	79202561f6	Don't check req_pipeline_depth on frag completion. Checking of req_bytes_delivered should be enough. This commit was SVN r12967.	2007-01-03 14:44:20 +00:00
Gleb Natapov	1ad6c41735	Sender can start scheduling send fragments immediately after receiving ACK. No need to wait for RNDV completion. This commit was SVN r12965.	2007-01-03 12:37:11 +00:00
Brian Barrett	99c0a29602	Disable CM and DR PMLs in heterogeneous situations as neither are heterogeneous safe. Refs trac:587 This commit was SVN r12942. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2006-12-30 16:17:56 +00:00
George Bosilca	0b5d879a63	ompi_convertor_pack do not return errors (all checkings are done when the convertor is created). This commit was SVN r12940.	2006-12-29 07:40:02 +00:00
George Bosilca	d8db9e49f3	Set the bml_btl to NULL or segfault !!! This commit was SVN r12939.	2006-12-29 07:38:24 +00:00
Brian Barrett	2ab65eb521	Remove some debugging output that was #if 0'ed out but shouldn't have been committed into the trunk anyway This commit was SVN r12897.	2006-12-19 02:34:41 +00:00
Gleb Natapov	190e7a27cd	Merge with gleb-mpool branch. All RDMA components use same mpool now (rdma). udapl/openib/vapi/gm mpools a deprecated. rdma mpool has parameter that allows to limit its size mpool_rdma_rcache_size_limit (default is 0 - unlimited). This commit was SVN r12878.	2006-12-17 12:26:41 +00:00
Brian Barrett	01e8fc5f91	Redo of r12871, without the preconnect code change: Move the req_mtl structure back to the end of each of the structures in the CM PML. The req_mtl structure is cast into a mtl__request_structure for each MTL, which is larger than the req_mtl itself. The cast will cause the _request to overwrite parts of the heavy requests if the req_mtl isn't the LAST thing on each structure (hence the comment). This was moved as an optimization at some point, which caused buffer sends to fail... Refs trac:669 This commit was SVN r12873. The following SVN revision numbers were found above: r12871 --> open-mpi/ompi@597598b712 The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:54:14 +00:00
Brian Barrett	bdf0b231b2	Undo r12871, as it contained some code in ompi/runtime that shouldn't have been committed Refs trac:669 This commit was SVN r12872. The following SVN revision numbers were found above: r12871 --> open-mpi/ompi@597598b712 The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:52:13 +00:00
Brian Barrett	597598b712	Move the req_mtl structure back to the end of each of the structures in the CM PML. The req_mtl structure is cast into a mtl__request_structure for each MTL, which is larger than the req_mtl itself. The cast will cause the _request to overwrite parts of the heavy requests if the req_mtl isn't the LAST thing on each structure (hence the comment). This was moved as an optimization at some point, which caused buffer sends to fail... Refs trac:669 This commit was SVN r12871. The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:46:53 +00:00
Brian Barrett	10af8ab454	Corrections for when threading is enabled. Refs trac:564 This commit was SVN r12830. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-12 18:48:42 +00:00
Brian Barrett	cf196ce420	Instead of an unknown proc list that requires ownership transfer of data (which, in turn, requires a complex series of locks to be held during the transfer), use a modex backing store with backpointers from the proc to the backing store. The proc structures no longer own the modex data, which greatly simplifies locking when an unknown proc suddenly becomes known. Refs trac:564 This commit was SVN r12822. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-11 21:27:30 +00:00
Ralph Castain	0a5d41857a	Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects. I found only two places that were looking at the tokens: 1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message. 2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was always the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes). This commit was SVN r12813.	2006-12-09 23:10:25 +00:00
Brian Barrett	98884e45e4	Clean up the way procs are added to the global process list after MPI_INIT: * Do not add new procs to the global list during modex callback or when sharing orte names during accept/connect. For modex, we cache the modex info for later, in case that proc ever does get added to the global proc list. For accept/connect orte name exchange between the roots, we only need the orte name, so no need to add a proc structure anyway. The procs will be added to the global process list during the proc exchange later in the wireup process * Rename proc_get_namebuf and proc_get_proclist to proc_pack and proc_unpack and extend them to include all information needed to build that proc struct on a remote node (which includes ORTE name, architecture, and hostname). Change unpack to call pml_add_procs for the entire list of new procs at once, rather than one at a time. * Remove ompi_proc_find_and_add from the public proc interface and make it a private function. This function would add a half-created proc to the global proc list, so making it harder to call is a good thing. This means that there's only two ways to add new procs into the global proc list at this time: During MPI_INIT via the call to ompi_proc_init, where my job is added to the list and via ompi_proc_unpack using a buffer from a packed proc list sent to us by someone else. Currently, this is enough to implement MPI semantics. We can extend the interface more if we like, but that may require HNP communication to get the remote proc information and I wanted to avoid that if at all possible. Refs trac:564 This commit was SVN r12798. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-07 19:56:54 +00:00
Brian Barrett	41a70a8f01	indent, this time with the right coding standards... This commit was SVN r12787.	2006-12-07 00:24:01 +00:00
Brian Barrett	f9ec8d6f2a	reindent file to make it easier to deal with... This commit was SVN r12786.	2006-12-07 00:21:25 +00:00
Brian Barrett	6f8b366acb	Rename liborte to libopen-rte and libopal to libopen-pal per telecon today and bug #632. Refs trac:632 This commit was SVN r12762. The following Trac tickets were found above: Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632	2006-12-05 18:27:24 +00:00
Brian Barrett	441432950f	Merge in changes from the bwb-heterogeneous temp branch (r12491 - r12714) for supporting compilers / architectures with different padding rules. This commit was SVN r12749. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r12491 r12714	2006-12-04 20:11:42 +00:00
Rainer Keller	6f8f28f40f	- Get rid of inline definition, otherwise static-compilation fails. This commit was SVN r12735.	2006-12-03 14:52:17 +00:00
Gleb Natapov	30ca7457b4	Some BTLs (e.g TCP) can report put/get completion before data actually hits the buffer on the other side. For this kind of BTLs we need to send FIN through the same BTL, PUT was performed with so network will handle ordering for us. If we will use another BTL, receiver can get FIN before data will hit the buffer and complete request prematurely. We mark such problematic BTLs with MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA is really fake, because the real one guaranties that sender will see the completion only after receiver's NIC confirmed that all the data was received). This commit was SVN r12732.	2006-12-03 10:12:09 +00:00
Gleb Natapov	39c930b160	The bug fixing part of r12720 introduce much more serious bug that it fixes. It calls mca_pml_ob1_send_fin_btl() which may fail and doesn't check return code. This breaks all RDMA transports event when only one BTL is used. Revert it for now, I am working on a real fix for the problem (I hope). This commit was SVN r12731. The following SVN revision numbers were found above: r12720 --> open-mpi/ompi@3e3689320b	2006-12-03 08:55:59 +00:00
Gleb Natapov	65d7ad4581	The "bug fix" from the r12721 reverts part of the r12433 that fixed regression from v1.1 was reviewed and put to v1.2 branch. So revert this part of r12721 back. This commit was SVN r12730. The following SVN revision numbers were found above: r12433 --> open-mpi/ompi@82f7c0dd69 r12721 --> open-mpi/ompi@3edd850d2e	2006-12-03 08:29:55 +00:00
George Bosilca	3edd850d2e	Some indentation and code arrangement. However, there is a bug fix. Force the PUT protocol to always obey to the btl_max_rdma_size. This commit was SVN r12721.	2006-12-01 22:26:14 +00:00
George Bosilca	3e3689320b	Some indentations and one BIG fix. Avoid race conditions on the PUT RDMA protocol when multiple NICS are available between 2 peers. The fix force the FIN message to take exactly the same path as the fragment it describe (i.e. same path means same BTL). Otherwise, the FIN can be received by the peer before the RDMA complete and the request will get freed too early. This commit was SVN r12720.	2006-12-01 21:52:07 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
Andrew Friedley	a4bdcb4faa	Fix a segfault that turned up in more MPI_THREAD_MULTIPLE testing. Same sort of problem and fix as described in r12323 - mca_pml_ob1_recv_frag_progress() was segfaulting due to a NULL req_proc pointer. The path leading to this was through the mca_pml_ob1_check_cantmatch_for_match() function, where we can match a frag using the same macros as mca_pml_ob1_frag_match() and never initialize the req_proc pointer. This commit was SVN r12582. The following SVN revision numbers were found above: r12323 --> open-mpi/ompi@c752502dee	2006-11-13 20:12:51 +00:00
George Bosilca	a38cd366d7	Construct the convertor. It's not really required, but it's not in the critical path anyway. At least in debug mode we get nice informations about where the convertor was created. This commit was SVN r12549.	2006-11-10 20:55:06 +00:00
George Bosilca	858ab24e8e	The req_mtl field has to be the last in the struct or bad things happen. This commit was SVN r12548.	2006-11-10 20:53:41 +00:00
George Bosilca	17405cd9c6	A temporary fix, until we figure out a better approach. The problem is that if one add "pml=" to the configuration file, really bad things happen. All PMLs will get initialize, and each of them will initialize all BTLs. This patch force the mca_pml_base_pml to get initialized in all cases before we go out of the mca_pml_base_open function. This commit was SVN r12527.	2006-11-10 04:53:00 +00:00
George Bosilca	eab1776e9a	Explicit casts for our friendly Windows environment... This commit was SVN r12496.	2006-11-08 17:02:46 +00:00
George Bosilca	915d748d72	Initialize the convertor on _START not on _INIT. This allow us to set it up before the match when we know the peer, saving some time on the critical path. If the receive is ANY_SOURCE then we initialize the convertor on _MATCHED. Anyway, we will set it up only once per receive. This commit was SVN r12484.	2006-11-08 05:42:29 +00:00
George Bosilca	eb45a5e402	Move things around a little bit. Mainly fields from the send and receive request in the base request. Rearrange the fields to keep the data together. Remove some useless tests. This commit was SVN r12482.	2006-11-08 04:58:23 +00:00
Galen Shipman	55db17b37c	don't try to use a dead btl.. This commit was SVN r12456.	2006-11-06 23:25:24 +00:00
Galen Shipman	eef37430a7	failing already failed for ACK timeout.. This commit was SVN r12452.	2006-11-06 22:09:39 +00:00

442 commits