openmpi

Author	SHA1	Message	Date
Jeff Squyres	3500376d9e	Remove a warning about an unused label. This commit was SVN r16429.	2007-10-11 16:38:37 +00:00
George Bosilca	e3105a85be	Don't require a progress function from the PML. If there is one then the PML base will take care of the registration with the event library. Otherwise, (and this apply for the CM case) the MTL are in charge of registering their own progress function. This commit was SVN r16415.	2007-10-09 23:28:53 +00:00
Galen Shipman	6a25a635de	that shouldn't have slipped through.. This commit was SVN r16411.	2007-10-09 19:07:23 +00:00
Galen Shipman	6b051e255e	already checked size.. no need to do it again.. This commit was SVN r16409.	2007-10-09 18:59:10 +00:00
Nysal Jan	b51d85fb3f	Fix assertion failure "assert( 0 == btl_endpoint->endpoint_cache_length )" while executing mt_coll testcase. This commit was SVN r16408.	2007-10-09 18:00:01 +00:00
Galen Shipman	62ade993ca	Separate finalize and close for the PML, this gives the PML a chance to complete any outstanding operations prior to close. Before this change we just called pml_finalize in pml_close which causes problems if there are outstanding events that a BTL/MTL needs to progress during finalize. The problem is that MPI_COMM_WORLD and others were destroyed prior to closing the PML, pml_close would call pml_finalize, events would progress in the BTL, and these events expected MPI_COMM_WORLD to still be around.. This commit was SVN r16405.	2007-10-09 15:28:56 +00:00
Andrew Friedley	c15047b264	Add LLNL copyright to the file i modified yesterday This commit was SVN r16404.	2007-10-09 15:18:23 +00:00
Andrew Friedley	fd51d9cf28	The call to opal_list_insert() had an off by one error (I think), causing selected components to get lost with certain load orderings. I went ahead and rewrote the code to use opal_list_insert_pos() instead, which gives a cleaner flow and more speed. This commit was SVN r16392.	2007-10-08 23:01:36 +00:00
Josh Hursey	7437f37e96	This commit contains the following: * Fix some missing includes in a few places. * Add the cr_request() functionality to the BLCR CRS component. We are now dependent upon the 0.6.* series of BLCR. * Made the CR notification mechanism a registered function. This way we can have an OPAL-only version and it can be replaced at runtime with the ORTE version. * Add an 'opal_cr_allow_opal_only' parameter that will enable OPAL-only CR functionality when the user wants it. Default: Disabled. * Fix the placement of a checkpoint request check in MPI_Init * Pull the OPAL notification mechanism into the SnapC framework. * We no longer fork/exec the 'opal-checkpoint' command for local checkpointing, the Local coordinator in the orted does this directly. * The Local and Application coordinator talk together bypassing the OPAL notification mechanism. * Optimized the Local <-> App Coordinator communication. * Improved the structure used to track vpid_snapshots in the local coord. * Fix a race condition in which an application under heavy communication load may produce an inconsistent global checkpoint. This commit was SVN r16389.	2007-10-08 20:53:02 +00:00
Jeff Squyres	f92d9097d8	Some more changes to update to coll v1.1.0 that were missed yesterday. This actually exposed a very, very long-standing bug where part of the coll base was incorrectly checking the coll API version against the MCA API version. When coll went to v1.1 (yesterday) and was no longer the same as the MCA v1.0, the test started failing. This commit fixes to check for v1.1 everywhere in the coll base, and to ensure to check coll framework/API version numbers against coll framework/API version numbers (vs. against the MCA API version number). This commit was SVN r16373.	2007-10-07 12:20:22 +00:00
Jeff Squyres	3d34bff596	No technical/functional changes: simply change the name of the "data" parameter to "module" everywhere, just to be a little more clear what the purpose of that parameter is. This commit was SVN r16372.	2007-10-07 08:36:45 +00:00
Jeff Squyres	b5abb12c98	Commit Ralph's fix for MPI_APPNUM. This commit was SVN r16371.	2007-10-06 18:54:43 +00:00
Jeff Squyres	fc2b4376e9	Update forgotten macro. This commit was SVN r16368.	2007-10-06 14:11:35 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is know to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. 
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. 
Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. 
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Jelena Pjesivac-Grbovic	ada43fef9e	This fixes bug #1157 in coll/self module. All vector functions had incorrect handling of the offset. This commit was SVN r16360.	2007-10-05 17:40:16 +00:00
Jeff Squyres	f92154fc72	Gah -- ompi_info doesn't setup the connect pseudo component, so it'll be NULL. Ensure to protect for this. This commit was SVN r16333.	2007-10-04 18:03:56 +00:00
Jeff Squyres	13fa7ae93e	It's not necessary to link against all 3 libs (in fact, we shouldn't do it -- let libtool pull them in via the .la file if it needs to) This commit was SVN r16332.	2007-10-04 18:01:30 +00:00
Jeff Squyres	80ce974291	Fixes trac:1156: ensure to finalize the "connect" sub-component. This commit was SVN r16330. The following Trac tickets were found above: Ticket 1156 --> https://svn.open-mpi.org/trac/ompi/ticket/1156	2007-10-04 17:36:12 +00:00
Andrew Friedley	2e66590993	Fix mistakes in the basic component.. can't call collectives on the communicator and always pass the basic module.. have to give them the module off the communicator. This commit was SVN r16329.	2007-10-04 16:29:24 +00:00
Galen Shipman	77f080575f	fix for the cray.. This commit was SVN r16317.	2007-10-03 19:25:23 +00:00
Andrew Friedley	5be7f5e2dc	fixes trac:1154 Check if an exclusion string (i.e. '-mca btl ^sm) was provided; if so OFUD just disables itself. This commit was SVN r16307. The following Trac tickets were found above: Ticket 1154 --> https://svn.open-mpi.org/trac/ompi/ticket/1154	2007-10-02 20:37:16 +00:00
Tim Prins	34966edaf1	remove unneeded and never-initialized lock. The orte_ns.assign_tag function does all the locking we need for us. This commit was SVN r16299.	2007-10-02 14:22:29 +00:00
Gleb Natapov	60af46d541	We have QP description in component structure, module structure and endpoint. Each one of them has a field to store QP type, but this is redundant. Store qp type only in one structure (the component one). This commit was SVN r16272.	2007-09-30 16:14:17 +00:00
Gleb Natapov	9c04b127f5	Forget to put this fix in previous commit. This commit was SVN r16271.	2007-09-30 15:33:20 +00:00
Gleb Natapov	3a15d645be	Remove lcl_qp_attr from endpoint qp description. It is used during init only. This commit was SVN r16270.	2007-09-30 15:29:35 +00:00
Brian Barrett	48c49cb89c	Handle case where modex_recv_string() isn't implemented (ie, the Cray) This commit was SVN r16267.	2007-09-28 18:50:37 +00:00
Tim Prins	1d1d0f6d4c	Fix segfault when user provides a working directory for comm_spawn. Thanks to Murat Knecht for reporting this and suggesting a fix. This commit was SVN r16266.	2007-09-27 23:30:40 +00:00
Aurelien Bouteiller	670956e172	Another cast mistake. This commit was SVN r16247.	2007-09-26 21:14:35 +00:00
Aurelien Bouteiller	f7d7d58fb6	Various cast type errors on 64bit architectures This commit was SVN r16246.	2007-09-26 20:54:18 +00:00
Brian Barrett	56e26ed390	Need to install the mpool_rdma.h so that we can build external BTLs that use the RDMA protocol This commit was SVN r16237.	2007-09-26 16:58:54 +00:00
Andrew Friedley	069e6dc4a0	Fix a bug introduced when the collective selection logic was changed to allow for a different component to be used for each collective. Passing the barrier module to the bcast function is a bad idea when barrier is using a different component from bcast.. This commit was SVN r16212.	2007-09-25 17:09:52 +00:00
Pak Lui	97e692d85a	mqs_communicator type should not be changed as it serves as the interface between Totalview and DLL. This commit was SVN r16200.	2007-09-24 19:02:56 +00:00
Gleb Natapov	c7105eadc7	Update Voltaire copyright. This commit was SVN r16189.	2007-09-24 10:11:52 +00:00
Aurelien Bouteiller	0df0087f17	Investigating improvement of cache line management on shared memory This commit was SVN r16183.	2007-09-21 20:02:56 +00:00
Josh Hursey	1fe1276fd5	Make sure to match on the communicator ID as well. This commit was SVN r16179.	2007-09-21 18:16:02 +00:00
Josh Hursey	3e51d7bb25	Implement the MPI_Iprobe and MPI_Probe wrappers. Remove some old, unused code. This commit was SVN r16178.	2007-09-21 16:28:46 +00:00
George Bosilca	8bdd14ba40	Remove unique_id which wasn't used anymore. Instead use the recv_context which is set to the cid of the communicator (unique id for each communicator). Make sure each communicator have a group attached to it. The MPI_COMM_NULL should have the MPI_GROUP_NULL as a group, in all circumstances. This commit was SVN r16177.	2007-09-21 14:30:40 +00:00
Aurelien Bouteiller	d3b376a340	This patch adds actual non-blocking sender-based message logging. This improves bandwidth. Still need to work on malloc/mmap storage to reach optimal bandwidth. This commit was SVN r16172.	2007-09-21 03:24:08 +00:00
Aurelien Bouteiller	bc318b35e2	There is room in convertor to copy the packed data. It works just need to add the correct memcopy. It does not manage the short messages but I already think of a workaround for this (and it might even be better regarding latency). This commit was SVN r16169.	2007-09-20 21:57:21 +00:00
Pak Lui	54c87daaed	Fix a SEGV when the user updates the message queue graph after the user executable has called MPI_Finalize(). It happens when removing the group from each of the communicators, that MPI_COMM_NULL doesn't have a group. Also fix the code from skipping over every other communicator when freeing the groups. This commit was SVN r16166.	2007-09-20 18:58:16 +00:00
Tim Prins	38fde640ad	Fix builds on FreeBSD by renaming strings.h to f77_strings.h so that our file does not get accidentally included by FreeBSD's string.h. Thanks to Karol Mroz for pointing out the problem. This commit was SVN r16164.	2007-09-19 23:24:23 +00:00
Aurelien Bouteiller	bbac6e650a	New improved version of sender-based. Under dev but a new framework for expressing various methods have been added. This commit was SVN r16159.	2007-09-19 03:42:56 +00:00
Brian Barrett	6bf121e17b	fix comment This commit was SVN r16154.	2007-09-18 16:30:45 +00:00
Gleb Natapov	097b17d30e	Prevent a receive request from being freed while another thread holds a reference to it or there is an outstanding completion for the request. This commit was SVN r16153.	2007-09-18 16:18:47 +00:00
Jeff Squyres	33955a0ed0	Oops -- when converted from uint to int, -1 (the default value, meaning "infinite") is no longer larger than the minimum required size. So put in an appropriate test to ensure that "infinite" was not requested. This commit was SVN r16142.	2007-09-17 19:28:21 +00:00
Jeff Squyres	130a272cec	Fix some compiler warnings about signed/unsigned comparisons. This commit was SVN r16139.	2007-09-17 13:08:45 +00:00
Josh Hursey	d2ef0d445a	Add some basic timing hooks so I can extract a few more detailed performance numbers for tuning. Switch the bookmark_recv to be non-blocking. If this is blocking then for process counts >= 32 slight process delays were causing cascading performance delays in the protocol. This led to checkpoints either taking about 3 sec or 45 sec (or more) for 64 procs due to the cascading delays. With the nonblocking receive version this is no longer the case; we get the speedup we expect for this part of the protocol. More tuning to come. This commit was SVN r16137.	2007-09-16 15:13:23 +00:00
Tim Prins	a194896ae8	Reverts r16130. There is a reason that we use the internal type (ompi_file_errhandler_fn) instead of the MPI typedef. When building without MPI-IO support (--disable-mpi-io), the MPI type is not defined, but the internal type IS defined in order to try to keep binary compatibility for apps that don't use MPI-IO. This commit was SVN r16136. The following SVN revision numbers were found above: r16130 --> open-mpi/ompi@cf5a38af5e	2007-09-15 11:19:13 +00:00
Jeff Squyres	6004e177e0	Fixes trac:1133: if you specify a max freelist size that is too small, you'll get a helpful error message and the openib BTL will deactivate itself. This commit was SVN r16133. The following Trac tickets were found above: Ticket 1133 --> https://svn.open-mpi.org/trac/ompi/ticket/1133	2007-09-14 21:42:56 +00:00
George Bosilca	cf5a38af5e	There is no reason to use the internal type (ompi_file_errhandler_fn) while everywhere else we're using the MPI typedef (MPI_File_errhandler_fn). This commit was SVN r16130.	2007-09-14 21:23:39 +00:00
Tim Prins	4033a40e4e	Coding standards... This commit was SVN r16118.	2007-09-13 14:00:59 +00:00
George Bosilca	617ff3a413	Add a MCA parameter for the ELAN MAP ID file. Fix small memory bugs, and track the final segfault. Still some work to do. This commit was SVN r16117.	2007-09-12 21:25:35 +00:00
Aurelien Bouteiller	a1f5312afb	Fixed two little warnings This commit was SVN r16116.	2007-09-12 21:07:11 +00:00
Aurelien Bouteiller	ccb3f75e8f	Make sure that the pml v parasite never get loaded when user did not requested FT. This does not break the ability to switch protocol on the fly. This commit was SVN r16114.	2007-09-12 20:47:17 +00:00
George Bosilca	1e7a791349	Remove some of the problems identified by Coverity. This commit was SVN r16112.	2007-09-12 20:13:26 +00:00
Aurelien Bouteiller	828af95be8	Major modification of the vprotocol framework build system. With a better integration in autogen.sh, it allows for generating static-components.h the usual way. NOTE: This build system does not work with the current autogen.sh. Modified one is under heavy testing to make sure it does not have side effects This commit was SVN r16110.	2007-09-12 18:46:37 +00:00
George Bosilca	7b3dcff267	Coverity: Limit the strcpy to the maximum length of the destination. This commit was SVN r16107.	2007-09-12 18:03:53 +00:00
George Bosilca	bfb4ddc3e2	Coverity: remove dead code. This commit was SVN r16106.	2007-09-12 17:56:33 +00:00
George Bosilca	05ae27c68b	Don't segfault if we receive a fragment for a non existing communicator. Instead, drop it by now. This commit was SVN r16105.	2007-09-12 17:52:02 +00:00
George Bosilca	c755938eb0	Coverity: release the temporary buffer on error. This commit was SVN r16104.	2007-09-12 17:45:12 +00:00
George Bosilca	2b7ed6262b	Update the communicator lowest_free when we rebuild the communicator list. This commit was SVN r16102.	2007-09-12 16:41:14 +00:00
Shiqing Fan	b1ea3e0054	- add more lines for static import declaration on windows. This commit was SVN r16101.	2007-09-12 15:32:54 +00:00
Shiqing Fan	a0660f4deb	- Just some type casts. This commit was SVN r16100.	2007-09-12 15:29:58 +00:00
Gleb Natapov	07c8fddeef	Fix scheduling of pending send request. It should be scheduled req_lock times. This commit was SVN r16096.	2007-09-12 07:08:38 +00:00
George Bosilca	d8fed2cfa1	Set a default value so that some compilers stop complaining about uninitialized values. This commit was SVN r16094.	2007-09-11 18:00:53 +00:00
George Bosilca	2e46809995	Only release the comm_reg is we have one. This commit was SVN r16093.	2007-09-11 17:59:40 +00:00
Gleb Natapov	e82a6eec27	Restore check for lowest id. It prevents livelock situation if multiple threads are inside the function and they failed to obtain new cid the first time around. This commit was SVN r16090.	2007-09-11 15:32:46 +00:00
Gleb Natapov	58a018c16d	The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is running already for one communicator and it is called from another thread for other communicator with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevents it from running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for a communicator with the lowest cid and this is not always the case. This patch removes bogus lowest cid check and fix ompi_comm_register_cid() to properly remove cid from the list. This commit was SVN r16088.	2007-09-11 13:23:46 +00:00
George Bosilca	8659a864e9	This is the real fix for ticket 317 and ticket 1065 and ticket 278. This commit was SVN r16084.	2007-09-10 22:27:59 +00:00
George Bosilca	8622beda54	This commit should fix the issues with ticket 1065. Now, we correctly duplicate the MPI_UB and MPI_LB datatypes. This commit was SVN r16083.	2007-09-10 22:13:42 +00:00
Pak Lui	e3fdfdbd9c	Fix some typos here and there. This commit was SVN r16058.	2007-09-06 14:56:08 +00:00
Tim Prins	f677ef5c12	Fix build failure on BigRed This commit was SVN r16054.	2007-09-06 12:10:11 +00:00
Pak Lui	3d7b5b306f	Fix a problem with OPAL_ALIGN that causes the upper bytes to get chopped off and bogus addresses to show up for the requests, which in turns causes message queues not showing up when debugging a 64 bit app on a 32 bit tvd and dll on only Solaris SPARC. This commit was SVN r16052.	2007-09-05 23:52:36 +00:00
Pak Lui	99ae2c1c44	Nothing relevant (yet). Just making debugging more enjoyable. This commit was SVN r16051.	2007-09-05 23:21:58 +00:00
Gleb Natapov	b0614931f4	Remove mpool_tree_item from the mpool_tree before unregistering/freeing memory. Otherwise a race exists if another thread allocates already freed memory which is not removed from the mpool_tree yet. This commit was SVN r16038.	2007-09-03 10:56:55 +00:00
Rainer Keller	a3b30749b0	- Only lock/unlock when using threads. Basically revert this part of r16015. This commit was SVN r16029. The following SVN revision numbers were found above: r16015 --> open-mpi/ompi@435e7d80e9	2007-08-31 12:34:48 +00:00
Rainer Keller	9c1c345c07	- head_lock is an opal_atomic_lock_t... This commit was SVN r16028.	2007-08-31 12:20:21 +00:00
Pak Lui	bc34a46969	* complete the fix started in r15915, that is, to prevent negative tags from showing up in the message queue graph. Tags are now casted to int before the negative checks, since tags by the spec are stored as mqs_tword_t, an unsigned long long. This commit was SVN r16027. The following SVN revision numbers were found above: r15915 --> open-mpi/ompi@b9ea4c92e7	2007-08-31 03:02:24 +00:00
Shiqing Fan	b1250eba3a	- Some more to be exported. This commit was SVN r16023.	2007-08-30 15:13:08 +00:00
Shiqing Fan	efdcfa3807	- "extern 'C'" has been set twice. Remove one. This commit was SVN r16022.	2007-08-30 15:03:59 +00:00
Shiqing Fan	80fdd5e2a4	- Need to be exported. This commit was SVN r16021.	2007-08-30 14:16:03 +00:00
Gleb Natapov	79011279e5	Remove debug output. This commit was SVN r16016.	2007-08-30 13:29:41 +00:00
Gleb Natapov	435e7d80e9	Remove rc parameter from MCA_BTL_SM_FIFO_WRITE() macro. It cannot fail in current implementation. This commit was SVN r16015.	2007-08-30 13:21:52 +00:00
Gleb Natapov	690fb95bda	Cleanup send scheduling code. This commit was SVN r16014.	2007-08-30 12:10:04 +00:00
Gleb Natapov	0b0f9d14aa	Mark send request complete on PML level only when absolutely sure there is no more work associated with this request. No more outstanding completions or packets and send scheduling isn't running in another thread. This commit was SVN r16013.	2007-08-30 12:08:33 +00:00
Gleb Natapov	fe414047bd	registration may be freed inside mca_mpool_rdma_deregister(). This commit was SVN r16012.	2007-08-30 10:52:38 +00:00
Gleb Natapov	091862a25a	Protect access to mca_mpool_base_tree by a lock. This commit was SVN r16011.	2007-08-30 10:51:02 +00:00
Brian Barrett	f53b14bde5	George noted I had this logic completely backwards. Oops. This commit was SVN r16005.	2007-08-29 16:18:04 +00:00
Gleb Natapov	eac2674f66	The inner voice tells me this is a typo. This commit was SVN r16004.	2007-08-29 13:28:47 +00:00
George Bosilca	756eee571e	Fix Coverity #24. This test didn't make sense in this branch of the if. This commit was SVN r16001.	2007-08-29 02:02:19 +00:00
Jeff Squyres	466394a878	We only care about the value of ret in the !OMPI_ENABLE_PROGRESS_THREADS case. Reviewed by Brian. This commit was SVN r16000.	2007-08-29 01:36:17 +00:00
Jeff Squyres	c4a38f47f6	Resolve Coverity CID 467: remove unused variable / dead code. This commit was SVN r15997.	2007-08-29 01:23:18 +00:00
Jeff Squyres	f08cce16db	Fix Coverity CID 468: remove unused variable. This commit was SVN r15996.	2007-08-29 01:21:17 +00:00
Brian Barrett	59b22533f2	Enable RDMA for heterogeneous situations. Currently done by overloading the ompi_convertor_need_buffers function to only return 0 if the convertor is homogeneous (which it never does on the trunk, but does to on v1.2, but that's a different issue). Only enable the heterogeneous rdma code for a btl if it supports it (via a flag), as some btls need some work for this to work properly. Currently only TCP and OpenIB extensively tested This commit was SVN r15990.	2007-08-28 21:23:44 +00:00
Gleb Natapov	fa69c5cc10	If memory on the sender's side is not registered, don't register it on the receive side either. Otherwise the content of the recvreq->req_rdma array is replaced later without freeing the previous content, and the refcount on the registration in the mpool becomes wrong. This commit was SVN r15978.	2007-08-28 07:43:06 +00:00
Rich Graham	bc97d22182	remove tabs. Remove old code that was commented out. This commit was SVN r15975.	2007-08-28 03:08:36 +00:00
Rich Graham	4d58f9aed7	Add comments. Move temporary receive object from a free list object to a stack object. This commit was SVN r15971.	2007-08-27 21:41:04 +00:00
Pak Lui	75c7d4e03b	Temporary workaround for making Totalview be able to get those opal symbols and load into the library when compiled with a Sun Studio C compiler This commit was SVN r15970.	2007-08-27 19:04:56 +00:00
Gleb Natapov	e1a1d9d90e	Receive request converter can be accessed in parallel by a thread that receives data and a thread that run RDMA schedule function. Protect access to the converter by a lock. This commit was SVN r15967.	2007-08-27 11:41:42 +00:00
Gleb Natapov	33196d972b	post_send() function is called without endpoint lock held from explicit credits update function so eager_rdma_remote.head have to be updated in a thread safe manner. This commit was SVN r15966.	2007-08-27 11:37:01 +00:00

3117 commits