openmpi

Author	SHA1	Message	Date
Pavel Shamis	e8aeadb11e	XRC fixes: - create a separate XRC domain file for each HCA - return an error if we fail to create the XRC file. This commit was SVN r16853.	2007-12-05 14:32:44 +00:00
Pavel Shamis	f60ca0e4e5	Removing unused mca_btl_openib_ib_address_status This commit was SVN r16835.	2007-12-04 13:16:26 +00:00
Pavel Shamis	57728986f8	Fixing XRC multiport/multisubnet support. This commit was SVN r16819.	2007-12-03 09:49:53 +00:00
Gleb Natapov	b2858236fb	Use new free list interface. This commit was SVN r16818.	2007-12-02 15:13:11 +00:00
Gleb Natapov	a774cd98f8	Put send completions on the low-priority CQ; receives are more important. This commit was SVN r16817.	2007-12-02 14:46:37 +00:00
Gleb Natapov	b17f5b7480	Change how default receive queue parameters are calculated. The current default parameters don't make any sense. Credits are never piggybacked. Also calculate default queue sizes from the eager_limit and max_send_size values. This commit was SVN r16816.	2007-12-02 14:43:28 +00:00
Josh Hursey	5fb83a4f10	- Remove an unnecessary barrier - verbose -> VERBOSE just for the fun of it This commit was SVN r16811.	2007-11-30 22:26:18 +00:00
Rich Graham	6e77414a68	changes to the ompi_free_list_ex - called ompi_free_list_ex_new, for now. This commit was SVN r16803.	2007-11-29 21:18:37 +00:00
Ron Brightwell	0138a2ee17	Do cleanup in ompi_mtl_portals_del_procs() rather than ompi_mtl_portals_finalize(). Previous code was cleaning up Portals resources that hadn't been allocated, which caused valid handles used elsewhere to be freed, which broke cnos_barrier() for the Portals btl. This commit was SVN r16801.	2007-11-29 17:29:46 +00:00
Jeff Squyres	8c0060701c	Stub out the ibcm CPC. This commit was SVN r16800.	2007-11-29 13:23:17 +00:00
Pavel Shamis	8aca6eb31b	OFED 1.3 doesn't implement ibv_resize_cq for ConnectX. When ibv_resize_cq exits with an error, we should check whether the function is implemented. This commit was SVN r16799.	2007-11-28 15:23:19 +00:00
Gleb Natapov	5f242c77f2	Post all receive WRs in a single call to ibv_post_recv() instead of posting each one separately. This commit was SVN r16798.	2007-11-28 14:57:15 +00:00
Gleb Natapov	14cffee726	Uninline mca_btl_openib_post_srr() function. This commit was SVN r16797.	2007-11-28 14:52:31 +00:00
Pavel Shamis	1c314ef4c3	If an XRC QP was specified in btl_openib_receive_queues, we should automatically choose the xoob connection module. This commit was SVN r16796.	2007-11-28 10:33:32 +00:00
Pavel Shamis	488a508732	Removing comments from help file. This commit was SVN r16795.	2007-11-28 10:16:08 +00:00
Pavel Shamis	3e2e4f6d2a	Removing unused lid. This commit was SVN r16794.	2007-11-28 10:06:57 +00:00
Pavel Shamis	aa79bdabc8	Removing port_touse - we don't really need it This commit was SVN r16793.	2007-11-28 09:57:48 +00:00
Pavel Shamis	2ffbe8776a	Fixing compilation problems in openib This commit was SVN r16792.	2007-11-28 09:38:49 +00:00
Gleb Natapov	218adb2a96	Account for eager RDMA credit fragments when creating the send queue. Create the XRC receive QP with zero receive and send queue lengths. We are not going to use this QP for sends, and receives are posted to SRQs. This commit was SVN r16791.	2007-11-28 07:22:01 +00:00
Gleb Natapov	601952a952	Don't share the endpoint->qps array, only the pointer to the actual QP. Calculate the send queue size for a shared QP based on all endpoints that want to use it. This commit was SVN r16790.	2007-11-28 07:21:07 +00:00
Gleb Natapov	b46c9cc7bc	Make xrc use srq_qp unions instead of the xrc_qp which is exactly like srq_qp. This commit was SVN r16789.	2007-11-28 07:20:26 +00:00
Gleb Natapov	be0981fc07	Change a type of xrc_recv_qp to "struct ibv_qp". This commit was SVN r16788.	2007-11-28 07:19:36 +00:00
Gleb Natapov	bd47da4699	Initial XRC support by Mellanox. This commit was SVN r16787.	2007-11-28 07:18:59 +00:00
Gleb Natapov	b49788c499	Receive queue is not used in case of SRQ QP, so don't create one. This commit was SVN r16786.	2007-11-28 07:17:22 +00:00
Gleb Natapov	923666b75c	Process pending put/get frags on endpoint connection establishment. This commit was SVN r16785.	2007-11-28 07:16:52 +00:00
Gleb Natapov	e502402470	Fix endpoint destructor to not skip closed endpoints. This commit was SVN r16784.	2007-11-28 07:15:54 +00:00
Gleb Natapov	5a4e953aaa	Allow sharing the same QP among different buffer sizes. Needed for XRC support. This commit was SVN r16783.	2007-11-28 07:15:20 +00:00
Gleb Natapov	b123696d57	Fix async thread creation and destruction. Create async thread only when it is needed instead of creating it and then canceling if it is not needed. Change error handling during finalize so that it will not skip async thread destruction. Otherwise async thread may segfault during openib module unloading. This commit was SVN r16782.	2007-11-28 07:14:34 +00:00
Gleb Natapov	5463eb892c	Send all explicit credits for PP QPs of all orders over the smallest PP QP. This commit was SVN r16781.	2007-11-28 07:13:34 +00:00
Gleb Natapov	a9f864d15c	If there is an eager RDMA credit but no WQE to send a packet, we add it to the pending queue of the eager RDMA QP instead of the correct pending list. This patch fixes this by getting rid of the "eager RDMA QP" notion. A packet is always sent over the QP of its order. The patch also adds two pending queues, for high- and low-priority packets. Only high-priority packets are sent over the eager RDMA channel. This commit was SVN r16780.	2007-11-28 07:12:44 +00:00
Gleb Natapov	6a2d210b7d	Use the OMPI object system to make the fragment hierarchy more object oriented. The main idea (besides cleanup) is to avoid initialising unneeded fields and to use the C type checking system to catch obvious errors. This commit was SVN r16779.	2007-11-28 07:11:14 +00:00
Gleb Natapov	267cd2342a	Cleanup. Remove unused functions. This commit was SVN r16778.	2007-11-28 07:08:56 +00:00
Ron Brightwell	924414f92f	Added support for Accelerated Portals for the btl. This commit was SVN r16771.	2007-11-21 21:34:17 +00:00
Ron Brightwell	a6d6be1bb9	Added send-side optimizations (persistent zero-length md and copy blocks) and support for Accelerated Portals. This commit was SVN r16770.	2007-11-21 21:31:37 +00:00
Brad Penoff	fb5536f11d	conforming SCTP BTL to Open MPI naming conventions and IP requirements This commit was SVN r16764.	2007-11-21 10:13:41 +00:00
Andrew Friedley	c50f2aa74c	fix warning This commit was SVN r16759.	2007-11-20 16:55:12 +00:00
Brad Penoff	ede8a6a7a1	adjusting for Linux when sctp_recvmsg returns 0 for remote close This commit was SVN r16742.	2007-11-20 06:02:08 +00:00
Tim Prins	f42fcd36db	make the mx btl compile again after the free list changes This commit was SVN r16735.	2007-11-19 19:41:22 +00:00
Brad Penoff	f34ddfef80	for SCTP BTL, added Mac OS X support for systems using SCTP NKE (Network Kernel Extension) This commit was SVN r16729.	2007-11-17 02:56:27 +00:00
Aurelien Bouteiller	15ffe6c89c	Accommodating the new interface for free_lists. This commit was SVN r16727.	2007-11-16 00:00:38 +00:00
Brad Penoff	5abd2d8064	initial SCTP BTL commit This commit was SVN r16723.	2007-11-13 23:39:16 +00:00
Adrian Knoth	037a533752	Reformatted r16691 to OMPI style. Re #733 This commit was SVN r16693. The following SVN revision numbers were found above: r16691 --> open-mpi/ompi@8dca19cb3b	2007-11-08 12:54:48 +00:00
Adrian Knoth	8dca19cb3b	upstream patch, provided by Jiri Polach. Re #733 This commit was SVN r16691.	2007-11-08 12:44:10 +00:00
Jeff Squyres	a4d571f8ad	Fix typo that broke the build. This commit was SVN r16635.	2007-11-02 09:19:55 +00:00
Rich Graham	27a748e7eb	change all instances of ompi_free_list_init to ompi_free_list_init_new. Header and payload data are specified separately at this stage. This commit was SVN r16633.	2007-11-01 23:38:50 +00:00
Andrew Friedley	46516d98e1	Update MCA params -- sd_num_peer is no longer used, change rd_num_init to rd_num This commit was SVN r16601.	2007-10-29 22:56:30 +00:00
Andrew Friedley	8273b61471	Bugfix for hangs in certain communication patterns, particularly alltoall. This commit was SVN r16600.	2007-10-29 21:51:28 +00:00
Gleb Natapov	04578ffdd6	Change calls to bml_btl->btl_alloc() to mca_bml_base_alloc(). This commit was SVN r16596.	2007-10-28 16:04:17 +00:00
Rich Graham	67f4b69848	propagate fix for out of buffered send memory space to dr and ob1 - thanks George. This commit was SVN r16593.	2007-10-27 00:17:53 +00:00
Rich Graham	9c0483088a	if unable to get buffered space, try and progress communications to free up resources. This commit was SVN r16591.	2007-10-26 23:16:31 +00:00
George Bosilca	d67c0eefb4	Remove a compilation warning about using uninitialized variables. This commit was SVN r16589.	2007-10-26 20:15:28 +00:00
George Bosilca	b1b5cb6453	It looks like SO_REUSEPORT is not defined on some platforms. Switch to the conventional SO_REUSEADDR instead. This commit was SVN r16588.	2007-10-26 19:56:21 +00:00
George Bosilca	337f78a4a8	Restrict the port range for the OOB and the BTL. Each protocol (v4 and v6) has its own range, which is defined by a min value and a range. By default there is no limitation on the port range, which is exactly the same behavior as before. This commit was SVN r16584.	2007-10-26 16:36:51 +00:00
George Bosilca	682f110658	Correctly test the finalize condition. Thanks to Ake Sandgren for bringing this issue to our attention. This commit was SVN r16560.	2007-10-24 13:34:27 +00:00
Gleb Natapov	3a63eb6c17	Cleanup macro definitions. This commit was SVN r16554.	2007-10-23 13:33:19 +00:00
Gleb Natapov	d836f3dbbe	Remove unused macro. This commit was SVN r16552.	2007-10-23 13:18:10 +00:00
Gleb Natapov	18ed60edeb	Revert previous commit. There was no memory leak, the pointer is saved inside free list for future use. This patch moves BTL initialization into separate function too. This commit was SVN r16551.	2007-10-23 12:57:45 +00:00
Gleb Natapov	657e544e02	Fix memory leak. Define init_data on the stack instead of allocating it each time. This commit was SVN r16550.	2007-10-23 11:10:52 +00:00
Gleb Natapov	9e2d5acf8e	Remove unused field from openib fragment structure. This commit was SVN r16549.	2007-10-23 07:38:29 +00:00
George Bosilca	95c9fbdf45	Make sure the MX MTL component is shared between all files. This commit was SVN r16545.	2007-10-22 18:06:52 +00:00
Gleb Natapov	63dde87076	If SM BTL cannot send fragment because the cyclic buffer is full put the fragment on the pending list and send it later instead of spinning on opal_progress(). This commit was SVN r16537.	2007-10-22 12:07:22 +00:00
Rich Graham	0de9bd9fa0	when attaching an md for posted receive, generate a start event, so that PtlMDUpdate will pick up all incoming events. This commit was SVN r16517.	2007-10-19 19:09:40 +00:00
Gleb Natapov	52c6160252	MCA_PML_BASE_REQUEST_MPI_COMPLETE() macro does nothing except call to ompi_request_complete(). Remove the macro and call the function directly. This commit was SVN r16498.	2007-10-18 14:20:24 +00:00
George Bosilca	aa20a94b6f	Remove warning about an unused variable. This commit was SVN r16497.	2007-10-18 13:48:56 +00:00
Gleb Natapov	4f865e22e8	We have two different versions of ompi_request_complete: one as a function and another as a macro. Make it one inline function. This commit was SVN r16495.	2007-10-18 13:02:27 +00:00
Gleb Natapov	e0a3a7e53e	Move code duplicated all over the code base into a single function, ompi_request_wait_completion(). This commit was SVN r16494.	2007-10-18 12:33:21 +00:00
Gleb Natapov	807f49ed7f	If there is more than one BTL present, we may divide the payload between them in such a way that the convertor will not be able to pack some of it. This commit adds handling of such cases. If the convertor can't pack any data for a BTL, the data is sent over another BTL that has data to send. This commit was SVN r16493.	2007-10-18 12:07:37 +00:00
Jeff Squyres	b7eeae0a74	Remove the mvapi BTL. Woo hoo! This commit was SVN r16483.	2007-10-17 14:08:03 +00:00
Jeff Squyres	94b1e9cff9	Update to use BTL_VERBOSE and BTL_ERROR instead of opal_output'ing to the mca_btl_base_output stream directly (and relying on it to be -1 if we didn't want any output). This commit was SVN r16449.	2007-10-15 17:53:02 +00:00
Rolf vandeVaart	3dd5196338	Remove the --mca btl_base_debug flag and clean up the use of the --mca btl_base_verbose flag. The btl framework now matches all the other frameworks. Slightly modify error messages for clarity. This commit was SVN r16443.	2007-10-15 13:10:20 +00:00
Gleb Natapov	1330974e5e	eager_limit is no longer needed in OB1 PML. Remove it. This commit was SVN r16442.	2007-10-15 09:26:42 +00:00
George Bosilca	436b0f2a5b	Way too many numbers in this uint32_t. This commit was SVN r16437.	2007-10-12 13:11:55 +00:00
Jeff Squyres	3500376d9e	Remove a warning about an unused label. This commit was SVN r16429.	2007-10-11 16:38:37 +00:00
George Bosilca	e3105a85be	Don't require a progress function from the PML. If there is one, the PML base will take care of the registration with the event library. Otherwise (and this applies to the CM case), the MTLs are in charge of registering their own progress function. This commit was SVN r16415.	2007-10-09 23:28:53 +00:00
Galen Shipman	6a25a635de	that shouldn't have slipped through.. This commit was SVN r16411.	2007-10-09 19:07:23 +00:00
Galen Shipman	6b051e255e	already checked size.. no need to do it again.. This commit was SVN r16409.	2007-10-09 18:59:10 +00:00
Nysal Jan	b51d85fb3f	Fix assertion failure "assert( 0 == btl_endpoint->endpoint_cache_length )" while executing mt_coll testcase. This commit was SVN r16408.	2007-10-09 18:00:01 +00:00
Galen Shipman	62ade993ca	Separate finalize and close for the PML; this gives the PML a chance to complete any outstanding operations prior to close. Before this change we just called pml_finalize in pml_close, which causes problems if there are outstanding events that a BTL/MTL needs to progress during finalize. The problem is that MPI_COMM_WORLD and others were destroyed prior to closing the PML, pml_close would call pml_finalize, events would progress in the BTL, and these events expected MPI_COMM_WORLD to still be around. This commit was SVN r16405.	2007-10-09 15:28:56 +00:00
Andrew Friedley	c15047b264	Add LLNL copyright to the file I modified yesterday. This commit was SVN r16404.	2007-10-09 15:18:23 +00:00
Andrew Friedley	fd51d9cf28	The call to opal_list_insert() had an off by one error (I think), causing selected components to get lost with certain load orderings. I went ahead and rewrote the code to use opal_list_insert_pos() instead, which gives a cleaner flow and more speed. This commit was SVN r16392.	2007-10-08 23:01:36 +00:00
Josh Hursey	7437f37e96	This commit contains the following: * Fix some missing includes in a few places. * Add the cr_request() functionality to the BLCR CRS component. We are now dependent upon the 0.6.* series of BLCR. * Made the CR notification mechanism a registered function. This way we can have an OPAL-only version and it can be replaced at runtime with the ORTE version. * Add an 'opal_cr_allow_opal_only' parameter that will enable OPAL-only CR functionality when the user wants it. Default: Disabled. * Fix the placement of a checkpoint request check in MPI_Init. * Pull the OPAL notification mechanism into the SnapC framework. * We no longer fork/exec the 'opal-checkpoint' command for local checkpointing; the Local coordinator in the orted does this directly. * The Local and Application coordinators talk to each other directly, bypassing the OPAL notification mechanism. * Optimized the Local <-> App Coordinator communication. * Improved the structure used to track vpid_snapshots in the local coord. * Fix a race condition in which an application under heavy communication load may produce an inconsistent global checkpoint. This commit was SVN r16389.	2007-10-08 20:53:02 +00:00
Jeff Squyres	f92d9097d8	Some more changes to update to coll v1.1.0 that were missed yesterday. This actually exposed a very, very long-standing bug where part of the coll base was incorrectly checking the coll API version against the MCA API version. When coll went to v1.1 (yesterday) and was no longer the same as the MCA v1.0, the test started failing. This commit fixes to check for v1.1 everywhere in the coll base, and to ensure to check coll framework/API version numbers against coll framework/API version numbers (vs. against the MCA API version number). This commit was SVN r16373.	2007-10-07 12:20:22 +00:00
Jeff Squyres	3d34bff596	No technical/functional changes: simply change the name of the "data" parameter to "module" everywhere, just to be a little more clear what the purpose of that parameter is. This commit was SVN r16372.	2007-10-07 08:36:45 +00:00
Jeff Squyres	fc2b4376e9	Update forgotten macro. This commit was SVN r16368.	2007-10-06 14:11:35 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTLs are not active at that point.
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTLs are active at that point), but - as we discussed on the telecon - these are not currently true barriers, so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm, as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state or has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support.
Accordingly, the default size of these fields has been reduced to 16 bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32 bits. Someday, if necessary, someone can add yet another option to increase them to 64 bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages is set, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted.
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k * 50 bytes = 1.6MBytes, sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree, if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files. This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Jelena Pjesivac-Grbovic	ada43fef9e	This fixes bug #1157 in coll/self module. All vector functions had incorrect handling of the offset. This commit was SVN r16360.	2007-10-05 17:40:16 +00:00
Jeff Squyres	f92154fc72	Gah -- ompi_info doesn't setup the connect pseudo component, so it'll be NULL. Ensure to protect for this. This commit was SVN r16333.	2007-10-04 18:03:56 +00:00
Jeff Squyres	13fa7ae93e	It's not necessary to link against all 3 libs (in fact, we shouldn't do it -- let libtool pull them in via the .la file if it needs to) This commit was SVN r16332.	2007-10-04 18:01:30 +00:00
Jeff Squyres	80ce974291	Fixes trac:1156: ensure to finalize the "connect" sub-component. This commit was SVN r16330. The following Trac tickets were found above: Ticket 1156 --> https://svn.open-mpi.org/trac/ompi/ticket/1156	2007-10-04 17:36:12 +00:00
Andrew Friedley	2e66590993	Fix mistakes in the basic component.. can't call collectives on the communicator and always pass the basic module.. have to give them the module off the communicator. This commit was SVN r16329.	2007-10-04 16:29:24 +00:00
Andrew Friedley	5be7f5e2dc	fixes trac:1154: Check whether an exclusion string (e.g., '-mca btl ^sm') was provided; if so, OFUD just disables itself. This commit was SVN r16307. The following Trac tickets were found above: Ticket 1154 --> https://svn.open-mpi.org/trac/ompi/ticket/1154	2007-10-02 20:37:16 +00:00
Gleb Natapov	60af46d541	We have QP description in component structure, module structure and endpoint. Each one of them has a field to store QP type, but this is redundant. Store qp type only in one structure (the component one). This commit was SVN r16272.	2007-09-30 16:14:17 +00:00
Gleb Natapov	9c04b127f5	Forgot to put this fix in the previous commit. This commit was SVN r16271.	2007-09-30 15:33:20 +00:00
Gleb Natapov	3a15d645be	Remove lcl_qp_attr from endpoint qp description. It is used during init only. This commit was SVN r16270.	2007-09-30 15:29:35 +00:00
Aurelien Bouteiller	670956e172	Another cast mistake. This commit was SVN r16247.	2007-09-26 21:14:35 +00:00
Aurelien Bouteiller	f7d7d58fb6	Various cast type errors on 64bit architectures This commit was SVN r16246.	2007-09-26 20:54:18 +00:00
Brian Barrett	56e26ed390	Need to install the mpool_rdma.h so that we can build external BTLs that use the RDMA protocol This commit was SVN r16237.	2007-09-26 16:58:54 +00:00
Gleb Natapov	c7105eadc7	Update Voltaire copyright. This commit was SVN r16189.	2007-09-24 10:11:52 +00:00
Aurelien Bouteiller	0df0087f17	Investigating improvement of cache line management on shared memory This commit was SVN r16183.	2007-09-21 20:02:56 +00:00
Josh Hursey	1fe1276fd5	Make sure to match on the communicator ID as well. This commit was SVN r16179.	2007-09-21 18:16:02 +00:00
Josh Hursey	3e51d7bb25	Implement the MPI_Iprobe and MPI_Probe wrappers. Remove some old, unused code. This commit was SVN r16178.	2007-09-21 16:28:46 +00:00
Aurelien Bouteiller	d3b376a340	This patch adds actual non-blocking sender-based message logging. This improves bandwidth. Still need to work on malloc/mmap storage to reach optimal bandwidth. This commit was SVN r16172.	2007-09-21 03:24:08 +00:00
Aurelien Bouteiller	bc318b35e2	There is room in the convertor to copy the packed data. It works; I just need to add the correct memory copy. It does not handle short messages, but I have already thought of a workaround for this (and it might even be better regarding latency). This commit was SVN r16169.	2007-09-20 21:57:21 +00:00
Aurelien Bouteiller	bbac6e650a	New improved version of sender-based. Under development, but a new framework for expressing various methods has been added. This commit was SVN r16159.	2007-09-19 03:42:56 +00:00
Gleb Natapov	097b17d30e	Prevent a receive request from being freed while another thread holds a reference to it or there is an outstanding completion for the request. This commit was SVN r16153.	2007-09-18 16:18:47 +00:00
Jeff Squyres	33955a0ed0	Oops -- when converted from uint to int, -1 (the default value, meaning "infinite") is no longer larger than the minimum required size. So put in an appropriate test to ensure that "infinite" was not requested. This commit was SVN r16142.	2007-09-17 19:28:21 +00:00
Jeff Squyres	130a272cec	Fix some compiler warnings about signed/unsigned comparisons. This commit was SVN r16139.	2007-09-17 13:08:45 +00:00
Josh Hursey	d2ef0d445a	Add some basic timing hooks so I can extract a few more detailed performance numbers for tuning. Switch the bookmark_recv to be non-blocking. If this is blocking, then for process counts >= 32 slight process delays were causing cascading performance delays in the protocol. This led to checkpoints taking either about 3 sec or 45 sec (or more) for 64 procs due to the cascading delays. With the non-blocking receive version this is no longer the case, and we get the speedup we expect for this part of the protocol. More tuning to come. This commit was SVN r16137.	2007-09-16 15:13:23 +00:00
Jeff Squyres	6004e177e0	Fixes trac:1133: if you specify a max freelist size that is too small, you'll get a helpful error message and the openib BTL will deactivate itself. This commit was SVN r16133. The following Trac tickets were found above: Ticket 1133 --> https://svn.open-mpi.org/trac/ompi/ticket/1133	2007-09-14 21:42:56 +00:00
George Bosilca	617ff3a413	Add an MCA parameter for the ELAN MAP ID file. Fix small memory bugs, and track the final segfault. Still some work to do. This commit was SVN r16117.	2007-09-12 21:25:35 +00:00
Aurelien Bouteiller	a1f5312afb	Fixed two little warnings This commit was SVN r16116.	2007-09-12 21:07:11 +00:00
Aurelien Bouteiller	ccb3f75e8f	Make sure that the pml v parasite never gets loaded when the user did not request FT. This does not break the ability to switch protocol on the fly. This commit was SVN r16114.	2007-09-12 20:47:17 +00:00
George Bosilca	1e7a791349	Remove some of the problems identified by Coverity. This commit was SVN r16112.	2007-09-12 20:13:26 +00:00
Aurelien Bouteiller	828af95be8	Major modification of the vprotocol framework build system. With a better integration in autogen.sh, it allows for generating static-components.h the usual way. NOTE: This build system does not work with the current autogen.sh. A modified one is under heavy testing to make sure it does not have side effects. This commit was SVN r16110.	2007-09-12 18:46:37 +00:00
George Bosilca	05ae27c68b	Don't segfault if we receive a fragment for a non-existent communicator. Instead, drop it for now. This commit was SVN r16105.	2007-09-12 17:52:02 +00:00
George Bosilca	c755938eb0	Coverity: release the temporary buffer on error. This commit was SVN r16104.	2007-09-12 17:45:12 +00:00
Shiqing Fan	a0660f4deb	- Just some type casts. This commit was SVN r16100.	2007-09-12 15:29:58 +00:00
Gleb Natapov	07c8fddeef	Fix scheduling of pending send request. It should be scheduled req_lock times. This commit was SVN r16096.	2007-09-12 07:08:38 +00:00
George Bosilca	d8fed2cfa1	Set a default value so that some compilers stop complaining about uninitialized values. This commit was SVN r16094.	2007-09-11 18:00:53 +00:00
Gleb Natapov	b0614931f4	Remove mpool_tree_item from the mpool_tree before unregistering/freeing memory. Otherwise a race exists if another thread allocates already freed memory which is not removed from the mpool_tree yet. This commit was SVN r16038.	2007-09-03 10:56:55 +00:00
Rainer Keller	a3b30749b0	- Only lock/unlock when using threads. Basically revert this part of r16015. This commit was SVN r16029. The following SVN revision numbers were found above: r16015 --> open-mpi/ompi@435e7d80e9	2007-08-31 12:34:48 +00:00
Rainer Keller	9c1c345c07	- head_lock is an opal_atomic_lock_t... This commit was SVN r16028.	2007-08-31 12:20:21 +00:00
Shiqing Fan	efdcfa3807	- "extern 'C'" has been set twice. Remove one. This commit was SVN r16022.	2007-08-30 15:03:59 +00:00
Shiqing Fan	80fdd5e2a4	- Need to be exported. This commit was SVN r16021.	2007-08-30 14:16:03 +00:00
Gleb Natapov	79011279e5	Remove debug output. This commit was SVN r16016.	2007-08-30 13:29:41 +00:00
Gleb Natapov	435e7d80e9	Remove rc parameter from MCA_BTL_SM_FIFO_WRITE() macro. It cannot fail in current implementation. This commit was SVN r16015.	2007-08-30 13:21:52 +00:00
Gleb Natapov	690fb95bda	Cleanup send scheduling code. This commit was SVN r16014.	2007-08-30 12:10:04 +00:00
Gleb Natapov	0b0f9d14aa	Mark send request complete on PML level only when absolutely sure there is no more work associated with this request. No more outstanding completions or packets and send scheduling isn't running in another thread. This commit was SVN r16013.	2007-08-30 12:08:33 +00:00
Gleb Natapov	fe414047bd	registration may be freed inside mca_mpool_rdma_deregister(). This commit was SVN r16012.	2007-08-30 10:52:38 +00:00
Gleb Natapov	091862a25a	Protect access to mca_mpool_base_tree by a lock. This commit was SVN r16011.	2007-08-30 10:51:02 +00:00
Gleb Natapov	eac2674f66	The inner voice tells me this is a typo. This commit was SVN r16004.	2007-08-29 13:28:47 +00:00
Jeff Squyres	466394a878	We only care about the value of ret in the !OMPI_ENABLE_PROGRESS_THREADS case. Reviewed by Brian. This commit was SVN r16000.	2007-08-29 01:36:17 +00:00
Jeff Squyres	c4a38f47f6	Resolve Coverity CID 467: remove unused variable / dead code. This commit was SVN r15997.	2007-08-29 01:23:18 +00:00
Brian Barrett	59b22533f2	Enable RDMA for heterogeneous situations. Currently done by overloading the ompi_convertor_need_buffers function to only return 0 if the convertor is homogeneous (which it never does on the trunk but does on v1.2, but that's a different issue). Only enable the heterogeneous RDMA code for a btl if it supports it (via a flag), as some btls need some work for this to work properly. Currently only TCP and OpenIB have been extensively tested. This commit was SVN r15990.	2007-08-28 21:23:44 +00:00
Gleb Natapov	fa69c5cc10	If memory on the sender's side is not registered, don't register it on the receive side either. Otherwise the contents of the recvreq->req_rdma array are later replaced without freeing the previous contents, and the refcount on the registration in the mpool becomes wrong. This commit was SVN r15978.	2007-08-28 07:43:06 +00:00
Rich Graham	bc97d22182	remove tabs. Remove old code that was commented out. This commit was SVN r15975.	2007-08-28 03:08:36 +00:00
Rich Graham	4d58f9aed7	Add comments. Move temporary receive object from a free list object to a stack object. This commit was SVN r15971.	2007-08-27 21:41:04 +00:00
Gleb Natapov	e1a1d9d90e	The receive request converter can be accessed in parallel by a thread that receives data and a thread that runs the RDMA schedule function. Protect access to the converter with a lock. This commit was SVN r15967.	2007-08-27 11:41:42 +00:00
Gleb Natapov	33196d972b	The post_send() function is called without the endpoint lock held from the explicit credits update function, so eager_rdma_remote.head has to be updated in a thread-safe manner. This commit was SVN r15966.	2007-08-27 11:37:01 +00:00
Gleb Natapov	32a61c3bf2	The credit fragment is not properly protected from concurrent access. There is a race that can prevent further explicit credit updates from being sent. Fix the race. This commit was SVN r15965.	2007-08-27 11:34:59 +00:00
Gleb Natapov	065d04dfde	Do not free recvreq while schedule function is running in another thread. This commit was SVN r15964.	2007-08-27 11:31:40 +00:00
Brad Benton	ccda5c9c74	Modified the MCA_BTL_TCP_CONNECTED case in mca_btl_tcp_endpoint_send_handler() to always first check for a NULL frag pointer before trying to send the fragment. This avoids an issue in multi-threaded execution in which multiple threads working on the same endpoint can result in a thread finding itself here with nothing to send. This commit was SVN r15963.	2007-08-26 23:40:02 +00:00
Edgar Gabriel	a2f5cada1a	convert the hierarch component to the new structure. More testing required before we remove the .ompi_ignore flag again. This commit was SVN r15954.	2007-08-23 20:41:29 +00:00
Rainer Keller	1b5fa48a29	- Add missing PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q when matching from the posted generic_recv-queue. - Move the PERUSE_COMM_MSG_MATCH_POSTED_REQ from MCA_PML_OB1_RECV_REQUEST_MATCHED to mca_pml_ob1_recv_frag_match(), as suggested by Terry Dontje. Only post if this is not a probe/iprobe request. - Do not post PERUSE_COMM_REQ_MATCH_UNEX for probes / iprobes, and post it in the correct order, before PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q. This commit was SVN r15947.	2007-08-23 07:09:43 +00:00
Rainer Keller	c175801f98	- Initialize in the order of mca_pml_ob1_comm_proc_t... This commit was SVN r15946.	2007-08-23 05:56:22 +00:00
Rainer Keller	b0df55d53b	- For MPI_Probe/MPI_Iprobe, we should not have a PERUSE_COMM_REQ_ACTIVATE event. Therefore move the PERUSE_TRACE_COMM_EVENT for this event from MCA_PML_BASE_SEND_REQUEST_INIT / MCA_PML_BASE_RECV_REQUEST_INIT to the proper places into pml_ob1_isend.c / pml_ob1_irecv.c right after the MCA_PML_OB1_SEND_REQUEST_INIT / MCA_PML_OB1_RECV_REQUEST_INIT. This commit was SVN r15945.	2007-08-23 05:52:33 +00:00
Gleb Natapov	becf4aa9c9	ompi_pointer_array_get_size doesn't return how many elements are actually in the array, so count them ourselves. This commit was SVN r15943.	2007-08-22 09:31:12 +00:00
Shiqing Fan	a497a3fcad	- Fix some small bugs, copy-paste mistakes. This commit was SVN r15941.	2007-08-21 19:57:28 +00:00
Sven Stork	3985a35c35	- export required symbol This commit was SVN r15939.	2007-08-21 18:46:11 +00:00
Gleb Natapov	d8f3063895	Create only one CQ for all BTLs on the same HCA. Many BTLs can be created for one HCA (multiple ports, LMC, multiple BTLs per LID). Having only one CQ for all of them substantially reduces polling time. This commit was SVN r15933.	2007-08-20 12:28:25 +00:00
Gleb Natapov	5596aa5f53	The sizes of mca_pml_ob1_send_request_t and mca_pml_ob1_recv_request_t depend on a parameter and are determined at runtime. r15346 removed the calculation of the correct sizes for these structures. This patch adds it back and fixes trac:1116, #1114. This commit was SVN r15932. The following SVN revision numbers were found above: r15346 --> open-mpi/ompi@433f8a7694 The following Trac tickets were found above: Ticket 1116 --> https://svn.open-mpi.org/trac/ompi/ticket/1116	2007-08-20 12:06:27 +00:00
George Bosilca	c7e0ab93ae	Don't forget to include string.h for the strcmp function. This commit was SVN r15927.	2007-08-19 19:59:15 +00:00
Brian Barrett	af4e86c25f	Update collectives selection logic to allow for multiple components to be used at once (up to one unique collective module per collective function). Matches r15795:15921 of the tmp/bwb-coll-select branch. This commit was SVN r15924. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15795 r15921	2007-08-19 03:37:49 +00:00
Brian Barrett	2b8af283de	Add ability to completely turn off MPI one-sided support, so that users can experiment with using ROMIO directly. This commit was SVN r15922.	2007-08-18 21:35:51 +00:00
Josh Hursey	729c63cf9d	Fix invalid MCA 'base' names so they appear in ompi_info. A subset of this patch needs to be applied to v1.2. Refs trac:928 This commit was SVN r15918. The following Trac tickets were found above: Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928	2007-08-18 03:05:45 +00:00
Brian Barrett	3b98b5f0a1	The reference implementation of Portals (which runs over TCP on Linux) is only available as static libraries. Previously, we were linking the libraries directly into the common, btl, and mtl code. This seemed to work fine for me on my Opteron Fedora box, but caused Lisa some issues (PtlNIInit would succeed, but the network handle would fail when used with PtlEQAlloc). Instead, link the portals libraries directly into libmpi and not at all into the common, btl, or mtl components. Then use some linker tricks to force the linker to bring in the public interface for the reference implementation (which thankfully is pretty small). This commit was SVN r15902.	2007-08-17 03:56:49 +00:00
Brad Benton	c254645383	Fixes trac:1134. Fixed a condition test while checking that all segments are empty. Without this fix, a NULL segment pointer could make it past the test, resulting in a SegV when dereferenced. This commit was SVN r15891. The following Trac tickets were found above: Ticket 1134 --> https://svn.open-mpi.org/trac/ompi/ticket/1134	2007-08-16 19:39:52 +00:00
Brad Benton	1ddba9ec65	Lock the endpoint before doing endpoint_state processing. This ensures that the subsequent unlock is valid. This commit was SVN r15890.	2007-08-16 18:11:29 +00:00
Aurelien Bouteiller	3a83c61c40	Fixed a bug with available space in sender based. This commit was SVN r15889.	2007-08-16 17:54:26 +00:00
Tim Prins	5a795128af	Change it so that different components in orte use unique rml tags This commit was SVN r15881.	2007-08-16 14:02:35 +00:00
Aurelien Bouteiller	77565d60d9	Heavy modification of the pml_v framework. * Code cleanup and rationalization * Fixed: mca_pml_base_send/recv_request are now allocated before recreation by the PML-V * Fixed: pointer arithmetic bug in sender based that crashed * Changed: directory structure. This is one step forward using autogen.sh to build static-components.h (it needs to have the directory structure of a mca framework for this). This commit was SVN r15878.	2007-08-16 05:52:30 +00:00
Aurelien Bouteiller	ee708d702d	Slight modification to register the name of the selected pml (from the pml framework) instead of the generic mca name. This might be a different name when enabling FT features. This name modification in the modex allows the PMLs to detect a FT protocol mismatch among hosts. This commit was SVN r15877.	2007-08-16 05:46:11 +00:00
Aurelien Bouteiller	fa7f6f6722	Improved error detection of request types This commit was SVN r15857.	2007-08-14 17:24:46 +00:00
Aurelien Bouteiller	67399e7c31	Added a debug type checking for request types (to make sure request size is correctly computed). This commit was SVN r15856.	2007-08-14 17:18:15 +00:00
Aurelien Bouteiller	1d97c183e7	Better argument checking for output function and added a routine for error printing. This commit was SVN r15855.	2007-08-14 17:17:12 +00:00
Jeff Squyres	d7c5fea096	* Fix problem caused by r15848: the text parser was looking for semicolons, but the new specification string uses colons. The text parser now looks for colons. * Changed all opal_output() error messages to much-more-helpful/descriptive opal_show_help() messages. * A few minor style/indenting fixes This commit was SVN r15850. The following SVN revision numbers were found above: r15848 --> open-mpi/ompi@dd30597f39	2007-08-14 14:46:13 +00:00
Jeff Squyres	dd30597f39	Change the default receive_queues value per http://www.open-mpi.org/community/lists/devel/2007/08/2100.php. This commit was SVN r15848.	2007-08-13 21:51:05 +00:00
Jelena Pjesivac-Grbovic	9bd9c92dbd	Making sure that the decision function for scatter and gather correctly computes everything for MPI_IN_PLACE case. This commit was SVN r15841.	2007-08-13 17:35:50 +00:00
Jelena Pjesivac-Grbovic	b558e820cb	removing compiler warning This commit was SVN r15803.	2007-08-08 15:22:01 +00:00
Jelena Pjesivac-Grbovic	daa10b277e	modifying scatter decision function to use binomial algorithm for small message sizes. This commit was SVN r15798.	2007-08-07 22:16:13 +00:00
Pak Lui	0790c4cc40	* Update the comment for the previous fix. Thanks Gleb for pointing out. This commit was SVN r15790.	2007-08-07 14:40:13 +00:00
Jeff Squyres	50bae9c603	Bring in the modular-wireup stuff for the openib BTL (from /tmp/jms-modular-wireup branch): * This commit moves all the openib BTL connection code out of btl_openib_endpoint.c and into a connect "pseudo-component" area, meaning that different OFA connection schemes can be chosen via function pointer (i.e., MCA parameter) at run-time. * The connect/connect.h file includes comments describing the specific interface for the connect pseudo-component. * Two pseudo-components are in this commit (more can certainly be added). * oob: use the same old oob/rml scheme for creating OFA connections that we've had forever; this now just puts the logic into this self-contained pseudo-component. * rdma_cm: a currently-empty set of functions (that currently return NOT_IMPLEMENTED) that will someday use the RDMA connection manager to make OFA connections. This commit was SVN r15786.	2007-08-06 23:40:35 +00:00
Aurelien Bouteiller	ca69915b1e	Code cleanup This commit was SVN r15783.	2007-08-06 22:20:44 +00:00
Brian Barrett	69952d9603	Fix abort caused by calling PtlEQGet on an invalid eq, which could occur if add_procs was never called. This commit was SVN r15779.	2007-08-06 17:28:11 +00:00
Brian Barrett	1fb78a35f9	Back out part of r15756. The common_portals_utcp.c file is only used with the Sandia reference implementation of Portals, and doesn't have the cnos functions. This file should never be compiled (and wasn't being compiled) on the Cray machines, so doesn't need to be updated to support CNL. This commit was SVN r15778. The following SVN revision numbers were found above: r15756 --> open-mpi/ompi@755658694e	2007-08-06 17:21:00 +00:00
Sven Stork	9e2263f29f	- fix a small memory leak This commit was SVN r15768.	2007-08-06 13:35:32 +00:00
Mohamad Chaarawi	59a7bf8a9f	Merging in the Sparse Groups.. This commit includes config changes.. This commit was SVN r15764.	2007-08-04 00:41:26 +00:00
George Bosilca	e41ee17ca5	Add a small comment that hopefully will enforce the correct ordering of the fields between CM and the other PML in the requests structure. This commit was SVN r15760.	2007-08-03 23:59:29 +00:00
Josh Hursey	755658694e	Bring in changes to support Cray's Compute Node Linux (CNL) and Application Level Placement Scheduler (ALPS). This commit was tested under two Cray machines at ORNL: Jaguar (Catamount) and Rizzo (CNL Test cage). Both machines performed as they should across the commit. It is likely that more changes will follow as the work and environment stabilize. Most of the infrastructure works the same for Catamount and CNL except for a few bits. Below are the highlights: Default IFACE Change: On Catamount we can use PTL_IFACE_DEFAULT, but the CNL system we have access to fails on this interface, and it should be set to: IFACE_FROM_BRIDGE_AND_NALID(PTL_BRIDGE_UK,PTL_IFACE_SS). So if we detect that we are running with YOD then use the former interface and if we detect that we are running with ALPS then use the latter. We will want to pursue a more elegant solution if this interface continues to change across machines. PtlGetId and cnos_register_ptlid: The header suggests that these should never be called when launching with YOD. But in the ALPS environment the cnos_barrier() will hang forever if these functions are not called after PtlNIInit(). Since these functions only need to be called once, and the orte rmgr/cnos component is loaded before the ompi common/portals component, just call these functions once in the rmgr/cnos component. cnos_barrier_init(): This is a noop for YOD, but critical for ALPS. So be sure to call it before calling the first barrier in the rmgr/cnos component. cnos_barrier vs cnos_pm_barrier: It is suggested that cnos_pm_barrier only be used during finalization as it will indicate to the launcher (yod or aprun) that the app is about to complete. It was suggested that we use the regular cnos_barrier() instead. I want to look into this a bit more to make sure there are not adverse side effects. A note has been placed in the code to indicate this reasoning. This commit was SVN r15756.	2007-08-03 19:46:38 +00:00
Pak Lui	010d216db9	* restrict users of 32-bit apps from specifying an sm_size between 2GB and 4GB-1 by using long instead of size_t for the sm size. * This is done to prevent users from running into a problem with ftruncate() in the common sm component (and possibly others): ftruncate takes an off_t, which is a signed long integer. If we use an unsigned long, it'll fail with an invalid argument error (errno=22). * See trac #1117 This commit was SVN r15752.	2007-08-03 15:43:02 +00:00
Aurelien Bouteiller	1d160ca583	Needed change for vampir pml to work This commit was SVN r15750.	2007-08-03 02:23:24 +00:00
Jeff Squyres	d3f008492f	Introduce a new debugging MCA parameter: mpi_show_mpi_alloc_mem_leaks When activated, MPI_FINALIZE displays a list of memory allocations from MPI_ALLOC_MEM that were not freed by MPI_FREE_MEM (in each MPI process). * If set to a positive integer, display only that many leaks. * If set to a negative integer, display all leaks. * If set to 0, do not show any leaks. This commit was SVN r15736.	2007-08-01 21:33:25 +00:00
Jeff Squyres	0fb8cf65a8	If you have an HCA with no active ports, we still create an mpool. This mpool will have no btl module owner (no btl was created for the HCA with no active ports), but it will still be tracked in the mpool framework (i.e., it's available). If MPI_ALLOC_MEM is called by the app, one of two things will happen: 1. if there's an HCA on the host with some active ports, the openib btl component will still be in the process space, and therefore the "mpool with no btl" (MWNB) module will still be able to call the reg/dereg functions, and all will be fine. However, if MPI_FREE_MEM is never invoked to free the memory, bad things will happen during MPI_FINALIZE. The pml is finalized, which finalizes all the btls. The btls finalize all their mpools and all is fine. But later we close down the mpool framework, which then finalizes any leftover mpool modules, such as MWNB. However, the openib BTL module functions that the MWNB was registered with are no longer in the process space, and it segv's while trying to deregister the memory. 2. if there are no HCAs on the host with active ports, then the openib btl will have been unloaded, and when the MWNB tries to register the memory, the functions it tries to call (in the openib btl) are no longer there, and we segv. This commit was SVN r15735.	2007-08-01 20:53:34 +00:00
Gleb Natapov	627d9bc8ed	Delay freeing of a send request if the scheduling function is running in another thread. This commit was SVN r15722.	2007-08-01 12:19:16 +00:00
Gleb Natapov	758f932aa6	Handle credits in a thread-safe manner. I am sure more work will have to be done in this area. This commit was SVN r15721.	2007-08-01 12:15:43 +00:00
Gleb Natapov	9c20d67301	1) Return IB header to its previous size by using char for the cm_seen field. 2) Allow rd_win/rd_rsv parameters to be specified by the user, but make them optional. This commit was SVN r15719.	2007-08-01 12:10:56 +00:00
Aurelien Bouteiller	a403fed18a	More checks (asserts) on the output system so that a malformed format string does not crash the application at a later random time. Changed various debug messages to retain the most useful messages. This commit was SVN r15715.	2007-07-31 19:33:39 +00:00
Aurelien Bouteiller	cec9ce8106	Fixed: various warnings with printf(%x, uint64_t) on 32 bit architectures + some left (long) cast for size_t printf. This commit was SVN r15706.	2007-07-31 17:12:21 +00:00
Aurelien Bouteiller	a5d0e53bb3	Moved replay macros to functions. The performance improvement in process recovery is not worth the debugging hassle. This commit was SVN r15703.	2007-07-31 16:01:32 +00:00
Aurelien Bouteiller	5a792a3fad	(hopefully) fixed various pedantic warnings about casts on 32-bit machines. Not tested; only 64-bit machines are available. This commit was SVN r15702.	2007-07-31 15:58:19 +00:00
Sven Stork	fd778a5539	- put the label to the right place This commit was SVN r15699.	2007-07-31 09:34:41 +00:00
Sven Stork	a13d2dcb96	- fix possible memory leak found by coverity This commit was SVN r15698.	2007-07-31 09:32:49 +00:00
Aurelien Bouteiller	3559fd5d1a	Fixed issues with "verbose" output being too silent. This commit was SVN r15691.	2007-07-30 19:11:15 +00:00
Sven Stork	855434de59	- fixes several Coverity issues - add missing initialisation for variables - use strncpy instead of strcpy This commit was SVN r15683.	2007-07-30 14:44:37 +00:00
Gleb Natapov	afac5eb93f	Guard recv request with lock against simultaneous access from different threads. This commit was SVN r15681.	2007-07-30 12:50:38 +00:00
Gleb Natapov	2d9669a69d	mca_btl_openib_endpoint_post_send() is called with endpoint lock held. No need to call lock() in btl_openib_acquire_send_resources(). This commit was SVN r15678.	2007-07-30 09:03:08 +00:00
Gleb Natapov	21dd061696	Init req_send_range_lock. Found by Terry Dontje. This commit was SVN r15677.	2007-07-30 08:21:52 +00:00
Rainer Keller	830de8ad20	- In ompi/mca/mpool/base/mpool_base_alloc.c, we may miss freeing value.. Actually, this string is not used anywhere else, so just have key and value on the stack. This commit was SVN r15675.	2007-07-30 07:30:47 +00:00
Aurelien Bouteiller	17e10ff918	Modified the output system to comply with a wider range of compilers. Jelena: this should solve the issue you faced today. This commit was SVN r15668.	2007-07-27 23:11:00 +00:00
Jeff Squyres	cae00d1854	Passing NULL to pthread_exit() is verboten. This commit was SVN r15661.	2007-07-27 01:06:36 +00:00
Jeff Squyres	015fc08ff4	Remove the ib_static_rate MCA parameter; it will be replaced with a dynamic mechanism to adjust the rate only if necessary (e.g., two ports of differing speeds are connected). This commit was SVN r15653.	2007-07-26 21:10:51 +00:00
Gleb Natapov	cce6bb478c	Process message before reposting buffers. This way rd_posted should be calculated properly. This commit was SVN r15635.	2007-07-26 13:56:07 +00:00
Pavel Shamis	bda6f1a5cf	Fixing compilation problem in openib btl progress thread. This commit was SVN r15631.	2007-07-26 11:35:15 +00:00
Gleb Natapov	1f18b060ce	If eager_rdma_local is not initialized, credits and rd_win are zero and the comparison is always true. This commit was SVN r15629.	2007-07-26 07:53:35 +00:00
Jeff Squyres	e36038bb17	We know that --enable-progress-threads doesn't work. But this allows it to at least compile. If you actually get to the point of invoking the openib btl progress thread, you'll get a big opal_output warning that it is pretty much guaranteed not to work. This commit was SVN r15628.	2007-07-26 00:58:56 +00:00
Aurelien Bouteiller	e07b95bdd5	Fixed: warnings with printf(%d, size_t) Fixed: All copyrights are now correct up to 2007 Fixed: Build system now works with VPATHs Changed: protocol_example is now ignored by default This commit was SVN r15627.	2007-07-25 22:28:04 +00:00
Galen Shipman	f6a20715b7	minor nit.. This commit was SVN r15619.	2007-07-25 17:34:37 +00:00
Galen Shipman	514811c50b	cleanup btl.h comments document the btl interface a bit better This commit was SVN r15618.	2007-07-25 17:26:23 +00:00
Galen Shipman	438a56e0d7	update copyrights for ib_multifrag commit This commit was SVN r15612.	2007-07-25 15:03:34 +00:00
Galen Shipman	325c184fb4	remove debugging "abort()" fix a debugging assert This commit was SVN r15611.	2007-07-25 14:51:19 +00:00
Jeff Squyres	4d53b2f2a7	Remove some now-obsolete mpool's (all have been replaced by RDMA). These components were already .ompi_ignored -- this removal should not trigger the need for an autogen. This commit was SVN r15607.	2007-07-25 12:37:09 +00:00
Jeff Squyres	f4b117957d	Add MCA parameter to enable/disable Nagle's algorithm on the TCP BTL. This commit was SVN r15606.	2007-07-25 12:21:00 +00:00
George Bosilca	873bd41796	More fixes for the Windows support. This commit was SVN r15602.	2007-07-25 04:22:21 +00:00
George Bosilca	10175c3014	No more warnings in the PML V. This commit was SVN r15601.	2007-07-25 04:19:58 +00:00
George Bosilca	c961cb5749	The Windows support is now back in business. This commit was SVN r15599.	2007-07-25 03:55:34 +00:00
George Bosilca	c6d2e03cdd	Correct the prototype for non GNU compilers. This commit was SVN r15598.	2007-07-25 03:50:35 +00:00
Donald Kerr	be0bf9c27d	add a missing subroutine prototype This commit was SVN r15590.	2007-07-24 21:07:57 +00:00
Jeff Squyres	f2a2b2c0f9	A little more error checking; clean up the invalid MCA help message This commit was SVN r15589.	2007-07-24 20:57:40 +00:00
Gleb Natapov	5b7d3faedc	Implement "credit management for credit messages" protocol. On each message a sender piggybacks the number of credit messages it received from a peer. The number of outstanding credit messages is limited. This is needed so that we never fall back to HW flow control. This commit was SVN r15580.	2007-07-24 15:19:51 +00:00
Gleb Natapov	45a7a0650b	btl_openib_handle_incoming() is called from the regular receive path and from the eager RDMA receive path, and checks internally where it was called from to perform different tasks. Leave only common code in there and move other code to appropriate places. This commit was SVN r15579.	2007-07-24 13:23:08 +00:00
George Bosilca	0486e8949e	Remove all warnings. This commit was SVN r15570.	2007-07-23 21:06:25 +00:00
Donald Kerr	2df5576d1d	add support for if_include/if_exclude mca parameter to allow selection of udapl registry interface adapters; reviewed by rolf van de vaart This commit was SVN r15565.	2007-07-23 19:49:34 +00:00
George Bosilca	21a7670390	Update the elan BTL. Now we support the following protocols: send, put and partially get. This commit was SVN r15564.	2007-07-23 19:07:13 +00:00
Pavel Shamis	ebadede3c4	Adding mca_mpool_rdma_finalize. This commit was SVN r15557.	2007-07-23 16:18:36 +00:00
Brian Barrett	9184db7239	Update ROMIO release to the one included with MPICH2-1.0.5p4, tagged in vendor/romio as mpich2-1.0.5p4. This commit was SVN r15544.	2007-07-21 22:08:27 +00:00
Aurelien Bouteiller	16da13c79e	Missing file... This commit was SVN r15540.	2007-07-20 22:24:02 +00:00
Aurelien Bouteiller	70bb44d7a9	Moving the Message Log framework to the trunk. Protocol example (simple showcase) and sender based are provided for now. Ignored by default except for utk folks. This commit was SVN r15539.	2007-07-20 21:36:11 +00:00
Brian Barrett	5b9fa7e998	reapply r15517 and r15520, which were removed in r15527 so that I could get the RML/OOB merge in slightly easier This commit was SVN r15530. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95 r15520 --> open-mpi/ompi@9cbc9df1b8 r15527 --> open-mpi/ompi@2d17dd9516	2007-07-20 02:34:29 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00
Brian Barrett	2d17dd9516	temporarily back out r15517 and 15520 so that I can get the RML / OOB changes to apply cleanly This commit was SVN r15527. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95	2007-07-20 01:10:34 +00:00
Tim Prins	0b06832fc7	Properly return a value in all cases. This commit was SVN r15519.	2007-07-19 21:33:23 +00:00
Ralph Castain	41977fcc95	Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but... Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point. Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings. This commit was SVN r15517.	2007-07-19 20:56:46 +00:00
Brian Barrett	9b14008f61	add a couple of comments, clean up the organization a bit This commit was SVN r15499.	2007-07-18 22:56:33 +00:00
Pavel Shamis	d837f1446b	This is a workaround for Ticket #1092. It will prevent the failure in openib finalize, but it doesn't resolve the actual issue. I guess that the one-sided tests somehow allocate memory (mpool?) and don't release it. Need to check it. This commit was SVN r15488.	2007-07-18 18:02:13 +00:00
Gleb Natapov	45fcb45e31	Remove debug checks that produce lots of warnings during compilation. This commit was SVN r15479.	2007-07-18 13:49:15 +00:00
Gleb Natapov	30b2183314	Remove debug output from a hot path. This commit was SVN r15478.	2007-07-18 12:48:34 +00:00
Jeff Squyres	3bc940ac27	Fix three things from r15474 (thanks to Brian for noticing): * bml.h had a change that introduced a variable named "_order" to avoid a conflict with a local variable. The namespace starting with _ belongs to the os/compiler/kernel/not us. So we can't start symbols with _. So I replaced it with arg_order, and also updated the threaded equivalent of the macro that was modified. * in btl_openib_proc.c, one opal_output accidentally had its string reverted from "ompi_modex_recv..." to "mca_pml_base_modex_recv....". This was fixed. * The change to ompi/runtime/ompi_preconnect.c was entirely reverted; it was an artifact of debugging. This commit was SVN r15475. The following SVN revision numbers were found above: r15474 --> open-mpi/ompi@8ace07efed	2007-07-18 11:38:06 +00:00
Jeff Squyres	8ace07efed	This commit brings in two major things: 1. Galen's fine-grain control of queue pair resources in the openib BTL. 2. Pasha's new implementation of asynchronous HCA event handling. Pasha's new implementation doesn't take much explanation, but the new "multifrag" stuff does. Note that "svn merge" was not used to bring this new code from the /tmp/ib_multifrag branch -- something Bad happened in the periodic trunk pulls on that branch making an actual merge back to the trunk effectively impossible (i.e., lots and lots of arbitrary conflicts and artificial changes). :-( == Fine-grain control of queue pair resources == Galen's fine-grain control of queue pair resources to the OpenIB BTL (thanks to Gleb for fixing broken code and providing additional functionality, Pasha for finding broken code, and Jeff for doing all the svn work and regression testing). Prior to this commit, the OpenIB BTL created two queue pairs: one for eager size fragments and one for max send size fragments. When the use of the shared receive queue (SRQ) was specified (via "-mca btl_openib_use_srq 1"), these QPs would use a shared receive queue for receive buffers instead of the default per-peer (PP) receive queues and buffers. One consequence of this design is that receive buffer utilization (the size of the data received as a percentage of the receive buffer used for the data) was quite poor for a number of applications. The new design allows multiple QPs to be specified at runtime. Each QP can be set up to use PP or SRQ receive buffers, with fine-grained control over the receive buffer size, the number of receive buffers to post, and when to replenish the receive queue (low water mark); for SRQ QPs, the number of outstanding sends can also be specified.
The following is an example of the syntax to describe QPs to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues: {{{ -mca btl_openib_receive_queues \ "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32" }}} Each QP description is delimited by ";" (semicolon) with individual fields of the QP description delimited by "," (comma). The above example therefore describes 4 QPs. The first QP is: P,128,16,4 Meaning: per-peer receive buffer QPs are indicated by a starting field of "P"; the first QP (shown above) is therefore a per-peer based QP. The second field indicates the size of the receive buffer in bytes (128 bytes). The third field indicates the number of receive buffers to allocate to the QP (16). The fourth field indicates the low watermark for receive buffers at which time the BTL will repost receive buffers to the QP (4). The second QP is: S,1024,256,128,32 Shared receive queue based QPs are indicated by a starting field of "S"; the second QP (shown above) is therefore a shared receive queue based QP. The second, third and fourth fields are the same as in the per-peer based QP. The fifth field is the number of outstanding sends that are allowed at a given time on the QP (32). This provides a "good enough" mechanism of flow control for some regular communication patterns. QPs MUST be specified in ascending receive buffer size order. This requirement may be removed prior to 1.3 release. This commit was SVN r15474.	2007-07-18 01:15:59 +00:00
George Bosilca	59ee366728	Remove a compilation warning. This commit was SVN r15472.	2007-07-17 22:32:59 +00:00
Rich Graham	f2a30cde5d	add table of send completion callback functions, on a per send-type basis. This commit was SVN r15471.	2007-07-17 21:26:56 +00:00
Rich Graham	0991c3d5f5	move buffered send component clean up out of the pml to ompi_mpi_finalize. This commit was SVN r15463.	2007-07-17 14:50:52 +00:00
Rich Graham	1a4ce2a961	move setting of the component used to manage buffered sends out of the pmls, and into ompi_mpi_init. This is the first of several steps to pull buffered send management out of the pmls. This commit was SVN r15451.	2007-07-16 21:52:25 +00:00
Brian Barrett	6a1f876e98	Don't inline this function so that we can access the predefined datatype array even when visibility is turned on This commit was SVN r15444.	2007-07-16 16:29:51 +00:00
George Bosilca	c839694fb8	Don't print anything when the user requested a specific MX interface. This commit was SVN r15426.	2007-07-14 00:04:50 +00:00
George Bosilca	1e825888a5	Fix the problem reported in #1087. The global send and receive request queues are now released in the base close, so there is no need for the cm PML to destroy them. This commit was SVN r15425.	2007-07-13 23:56:09 +00:00
Brian Barrett	c9ad5d1f24	ooops, need to handle case where extents are not same as type sizes This commit was SVN r15423.	2007-07-13 21:26:12 +00:00
Jelena Pjesivac-Grbovic	1b66a52c50	Modifying type of binomial tree used for binomial reduce: switching from the tree in which 0 has children 1, 2 and 4 (with 3 a child of 1) to the tree in which 0 has children 4, 2 and 1 (with 3 a child of 4) (duh). The first form is the bmtree suitable for bcast, but the latter is better for reduce. Updating default decision function accordingly. This commit was SVN r15422.	2007-07-13 21:07:51 +00:00
Pak Lui	685dd6f47b	Fixed the mpool sm size specification problem at large -np caused by a variable overflow. Added a verbose MCA param for showing the actual size of the mpool sm allocation. See trac #1083 for details. This commit was SVN r15419.	2007-07-13 20:49:30 +00:00
Brian Barrett	7a9a8c7e17	Support reduction operations other than MPI_REPLACE for user-defined datatypes with MPI_ACCUMULATE This commit was SVN r15418.	2007-07-13 20:46:12 +00:00
Galen Shipman	06b97cb267	fix template btl This commit was SVN r15413.	2007-07-13 20:06:22 +00:00
Josh Hursey	d4d5a351c1	Silence a compiler warning when not using IPV6. Also convert a few statements to conform to coding standard for Open MPI. This commit was SVN r15407.	2007-07-13 16:38:36 +00:00
Josh Hursey	021249fa65	Use the new MCA metadata flag instead of 'false' for the newly added components This commit was SVN r15400.	2007-07-13 14:39:17 +00:00
Brian Barrett	d4950c6aa1	Allow an arbitrary list of procs to be passed to the resolve function, instead of just the procs for MCW (in MCW order). Should make resolving ptl_process_id_t structures for arbitrary communicators easier for applications that need it. This commit was SVN r15393.	2007-07-12 20:55:44 +00:00
George Bosilca	752909c628	These are supposed to have a high probability of success. This commit was SVN r15377.	2007-07-11 23:02:47 +00:00
George Bosilca	8643f38adf	Don't allow the BTL to be closed before the end of the process. Count the number of times the BTLs are opened, and then don't remove them until close was called the same number of times. This commit was SVN r15376.	2007-07-11 22:21:04 +00:00
Brian Barrett	1f2942cf2a	* Provide flag if the BTL can do RDMA, but requires a prepare_{src,dst} that exactly describes the buffer to be used as the target of the operation * Use the above flag to disable components setting the flag from being used for real RDMA operations for the one-sided component (the BTLs will still be used for RDMA transfers for the PML and for send/receive communication for the OSC component) This commit was SVN r15375.	2007-07-11 21:21:40 +00:00
Brian Barrett	739fed9dc9	Don't poke at internal structure fields of communicators or groups, but instead use accessor functions This commit was SVN r15366.	2007-07-11 17:16:06 +00:00
Jeff Squyres	8aa8a667da	Use the OMPI version number for the component number, like all other btl components. This commit was SVN r15363.	2007-07-11 15:45:25 +00:00
Donald Kerr	88c9dfdf9f	improve message to user when dat_ia_open fails This commit was SVN r15362.	2007-07-11 15:20:35 +00:00
George Bosilca	9ed3ede73e	Correct the thin and heavy requests management for the CM PML. This commit was SVN r15361.	2007-07-11 15:10:01 +00:00
George Bosilca	ef7d17d814	Fix a copy&paste typo. This commit was SVN r15360.	2007-07-11 15:09:06 +00:00
George Bosilca	9b501eb66d	Looks like MAX is not a standard macro. Anyway, the assumption that the heavy requests are larger than the thin ones seems to be "correct". This commit was SVN r15348.	2007-07-11 00:04:33 +00:00
George Bosilca	e19777e910	A more consistent version. As we now share the send and receive queues, we have to construct/destruct them only once. Therefore, the construction will happen before digging for a PML, while the destruction happens just before finalizing the component. Add some OPAL_LIKELY/OPAL_UNLIKELY. This commit was SVN r15347.	2007-07-10 23:45:23 +00:00
George Bosilca	433f8a7694	This patch brings full support for message queues in Open MPI. Now the send and receive queues are shared among all PMLs; they are declared in the base PML, and the selected PML is in charge of initializing and releasing them. The CM PML is slightly different compared with OB1 or DR. Internally it uses 2 different types of requests: light and heavy. However, now with this patch both types of requests are stored in the same queue, and cast appropriately in the allocation macro. This means we might use less memory than we allocate, but in exchange we get full support for most of the parallel debuggers. Another thing with this patch is that now for all PMLs (CM included) the basic PML requests start with the same fields, and they are declared in the same order in the request structure. Moreover, the fields have been moved in such a way that only one volatile/atomic will exist per cache line (hopefully). This commit was SVN r15346.	2007-07-10 22:16:38 +00:00
Andrew Friedley	87dd4bbd47	No idea how I did this.. thanks again to Jeff. This commit was SVN r15345.	2007-07-10 20:37:42 +00:00
Christian Bell	5ae68f82b2	fix gcc 3.x compilation warnings This commit was SVN r15327.	2007-07-10 13:54:34 +00:00
Brian Barrett	1d02b9e7b5	Fix a bunch of issues exposed by Ken Cain in getting Open MPI to work with VxWorks. Still some issues remaining, I'm sure. Refs trac:1010 This commit was SVN r15320. The following Trac tickets were found above: Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010	2007-07-10 03:46:57 +00:00
George Bosilca	1200fa4ac5	The first version of the Elan BTL. This commit was SVN r15319.	2007-07-09 21:03:13 +00:00
Jeff Squyres	cee9c214c7	Update the vendor ID list to include HP (0x1708). Thanks to Peter Kjellstrom for pointing this out. This commit was SVN r15316.	2007-07-09 20:09:31 +00:00
Brian Barrett	8b9e8054fd	Move modex from pml base to general ompi runtime, since it's used by more than just the PML/BTLs these days. Also clean up the code so that it handles the situation where not all nodes register information for a given node (rather than just spinning until that node sends information, like we do today). Includes r15234 and r15265 from the /tmp/bwb-modex branch. This commit was SVN r15310. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15234 r15265	2007-07-09 17:16:34 +00:00
Andrew Friedley	b212cf4dae	Fix a signedness warning reported by Jeff/MTT. This commit was SVN r15309.	2007-07-09 15:30:29 +00:00
Tim Prins	f3ac4ac20e	Fix order of function arguments This commit was SVN r15304.	2007-07-08 16:37:51 +00:00
Gleb Natapov	88f4018543	Don't fail MPI_Alloc_mem() when no more memory can be registered. This commit was SVN r15303.	2007-07-08 11:44:58 +00:00
Jelena Pjesivac-Grbovic	d677db9b5f	cleaning up alltoall implementation: - removing MPI_* calls from bruck implementation - simplifying 2 process case - indentation, etc. This commit was SVN r15301.	2007-07-07 01:06:19 +00:00
Rainer Keller	cff1b6a71b	- PERUSE_COMM_REQ_XFER_BEGIN should be emitted for the first fragment of a larger message as well. This commit was SVN r15299.	2007-07-06 15:02:36 +00:00
Andrew Friedley	77038b65a8	Bring the UD BTL over to the trunk, named 'ofud'. This commit was SVN r15298.	2007-07-05 23:42:54 +00:00
Brian Barrett	872623a527	Fix situation where we were not propagating error codes from ROMIO into non-blocking request status fields, so they were never being relayed to the user This commit was SVN r15296.	2007-07-05 22:30:42 +00:00
Brian Barrett	25e52238ab	add ability to buffer put/accumulate messages during an epoch This commit was SVN r15295.	2007-07-05 21:40:06 +00:00
Brian Barrett	5bbee1482e	make debugging output slightly more useful This commit was SVN r15294.	2007-07-05 21:01:32 +00:00
Jelena Pjesivac-Grbovic	483222085e	Fixing compiler warnings. In gather, the ptmp += incr is irrelevant, since ptmp is set within the loop. This commit was SVN r15293.	2007-07-05 20:40:50 +00:00
Brian Barrett	6c9de88d13	while the BTL semantics are sorted out, wait for completion of all rdma events before starting the ack This commit was SVN r15292.	2007-07-05 16:50:05 +00:00
Sven Stork	21f12f29f8	- fix a sm bug that causes segfaults in the case of threaded builds. The problem is that in threaded builds, a head and tail lock is allocated for every fifo inside the shared memory segment and the ptr is stored inside the fifo. If the sm backend file is mapped at the same address in all processes (mostly the case for non-threaded builds) this is fine, but when the processes map the file at different addresses, these addresses cause big trouble in every process other than the one that allocated the locks. Therefore the send lock addresses have to be recalculated to match the local mapping of the processes that use them. This commit was SVN r15291.	2007-07-05 14:26:32 +00:00
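The address problem in the sm fix above comes from storing an absolute pointer in memory that different processes map at different virtual addresses. A minimal sketch of the usual remedy is to keep the lock's position as an offset from the segment base and rebuild the pointer against each process's own mapping. The shm_fifo_t type and both helper functions are hypothetical; this is not the code from the actual commit, which recalculates the addresses in place.

```c
#include <stddef.h>

/* Hypothetical FIFO header placed in a shared-memory segment. A raw pointer to
 * its lock would only be valid in the process that allocated the lock, so the
 * lock's position is recorded as an offset from the start of the segment. */
typedef struct {
    size_t head_lock_offset;
} shm_fifo_t;

/* Called by the allocating process, using its own mapping of the segment. */
static void fifo_record_lock(shm_fifo_t *fifo, const void *seg_base, const void *lock)
{
    fifo->head_lock_offset = (size_t)((const char *)lock - (const char *)seg_base);
}

/* Called by any process: turn the offset into an address within *its* mapping,
 * which may start at a different virtual address. */
static void *fifo_local_lock(const shm_fifo_t *fifo, void *local_seg_base)
{
    return (char *)local_seg_base + fifo->head_lock_offset;
}
```

Two separate arrays can stand in for two mappings of the same segment: a lock recorded at byte 40 of one resolves to byte 40 of the other.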
Brian Barrett	74008aac53	Support real RDMA operations for networks that support it This commit was SVN r15288.	2007-07-05 03:32:32 +00:00
Brian Barrett	41afd4ebee	Clean up the MX configure test a bit. Use AC macros instead of hand writing them. Better tests, less code, and caching. Update the code to match changes in configure defines. This commit was SVN r15287.	2007-07-04 22:07:30 +00:00
Jelena Pjesivac-Grbovic	3b0a52a104	adding tuned allgatherv implementation using bruck, ring, and neighbor-exchange algorithms. The implementations passed intel and imb tests up to 40 processes. This commit was SVN r15280.	2007-07-03 23:33:12 +00:00
Brian Barrett	f5c721d11c	Wire up all the RDMA-capable BTLs. Still no RDMA communication, but the datastructures are finally all there This commit was SVN r15271.	2007-07-02 22:22:59 +00:00
George Bosilca	951e4929b9	Usually it's unlikely to have additional fragments. This commit was SVN r15253.	2007-07-01 16:19:53 +00:00
George Bosilca	c435094639	Only trigger the PERUSE_COMM_REQ_XFER_BEGIN event on the initial fragment. This commit was SVN r15252.	2007-07-01 16:19:13 +00:00
George Bosilca	60319f99ac	Make sure in case of error what we return is clean (set to NULL). This commit was SVN r15251.	2007-07-01 16:17:43 +00:00
George Bosilca	11656e20aa	Remove a few warnings. This commit was SVN r15250.	2007-07-01 16:16:05 +00:00
Gleb Natapov	77e54ebc7e	Schedule RDMA op on the last BTL that got completion. This commit was SVN r15249.	2007-07-01 11:35:55 +00:00
Gleb Natapov	54b40aef91	Schedule SEND traffic of pipeline protocol between BTLs in accordance with relative bandwidths of each BTL. Precalculate what part of a message should be sent via each BTL in advance instead of doing it during scheduling. This commit was SVN r15248.	2007-07-01 11:34:23 +00:00
Gleb Natapov	e74aa6b295	Schedule RDMA traffic between BTLs in accordance with relative bandwidths of each BTL. Precalculate what part of a message should be sent via each BTL in advance instead of doing it during scheduling. This commit was SVN r15247.	2007-07-01 11:31:26 +00:00
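The two scheduling commits above precompute each BTL's share of a message from relative bandwidths. The sketch below shows only that proportional split; split_by_bandwidth is a hypothetical helper, and the actual OB1 scheduler code is not reproduced here.

```c
#include <stddef.h>

/* Divide `len` bytes among `n` BTLs in proportion to bw[i]. Integer division
 * rounds each share down; the last BTL absorbs the remainder so the shares
 * always sum to len. Precondition: n > 0 and at least one bw[i] > 0. */
static void split_by_bandwidth(size_t len, const unsigned *bw, size_t n, size_t *parts)
{
    unsigned long long total_bw = 0;
    size_t assigned = 0;

    for (size_t i = 0; i < n; i++)
        total_bw += bw[i];

    for (size_t i = 0; i + 1 < n; i++) {
        parts[i] = (size_t)((unsigned long long)len * bw[i] / total_bw);
        assigned += parts[i];
    }
    parts[n - 1] = len - assigned;
}
```

For example, a 1000-byte message over two BTLs with bandwidths 1000 and 3000 splits into 250 and 750 bytes. Computing the shares once up front, as the commits describe, keeps this arithmetic out of the per-fragment scheduling path.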
George Bosilca	dfa5ae34e1	Per a discussion with Kees Verstoep and Reese Faucette add one more argument to the query for the line speed. This function is still not documented, and it really looks strange that we have to respecify the nic_id (it's already attached to the endpoint). This commit was SVN r15241.	2007-06-28 20:58:00 +00:00
Jelena Pjesivac-Grbovic	d55b415bb0	fixing typo This commit was SVN r15240.	2007-06-28 20:56:55 +00:00
Brian Barrett	f8fb1e9720	Fix some compile failures on Solaris 9 because it doesn't have V6ONLY. This commit was SVN r15237.	2007-06-28 18:52:15 +00:00
George Bosilca	aec0b00f29	Get some hints about the network and propagate them to the upper level. This commit was SVN r15236.	2007-06-28 18:51:48 +00:00
George Bosilca	98142263c6	These functions are potentially shared between multiple components so they should be visible. This commit was SVN r15235.	2007-06-28 18:50:33 +00:00
Gleb Natapov	1c7141df4d	Remove unused struct. This commit was SVN r15228.	2007-06-28 11:58:16 +00:00
Jelena Pjesivac-Grbovic	8fc8b44d11	Modifying reduce decision function for large, single element reduces (again). Binary algorithm without segmentation tends to outperform binomial algorithm in this case. This commit was SVN r15226.	2007-06-27 22:01:56 +00:00
Jelena Pjesivac-Grbovic	0ecef1750d	Modifying the default reduce decision function to use binomial algorithm for single-element reduce (segmented algorithms make no sense in this case and can cause performance degradation). This commit was SVN r15209.	2007-06-26 20:14:03 +00:00
Jelena Pjesivac-Grbovic	567b40b9a9	Modifying the default broadcast decision function to use binomial algorithm for single-element broadcasts (segmented algorithms make no sense in this case and can cause performance degradation). This commit was SVN r15208.	2007-06-26 20:08:31 +00:00
Josh Hursey	acae12d0bb	Fix warning: stderr -> fileno(stderr) This commit was SVN r15207.	2007-06-26 19:28:40 +00:00
Josh Hursey	5199f4123d	Add 2 new MCA parameters to set the size of the expected and unexpected queues. This commit was SVN r15206.	2007-06-26 17:31:43 +00:00
Rich Graham	aa2ffcfcd8	add some output before abort() is called. This commit was SVN r15204.	2007-06-26 15:57:47 +00:00
Sven Stork	428f697542	- addition to r15198. Update also the prepare destination functions. This commit was SVN r15199. The following SVN revision numbers were found above: r15198 --> open-mpi/ompi@f63dd902cb	2007-06-26 12:07:30 +00:00
Sven Stork	f63dd902cb	- bring the order changes of r14768 also to the mvapi btl This commit was SVN r15198. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-06-26 09:34:44 +00:00
Brian Barrett	e279192865	ug - fix some dumb copy-n-paste errors This commit was SVN r15188.	2007-06-25 01:59:34 +00:00
Brian Barrett	42b2c4e1df	* RELEASE not DESTRUCT things created with OBJ_NEW to fix a memory leak * Fix potential race condition with starting a new lock epoch if we were releasing a lock * Increment the shared counter if we start a shared lock session during the unlock code This commit was SVN r15186.	2007-06-24 23:30:10 +00:00
Brian Barrett	2ed0548da8	* No need for waiting until exposure epochs are over in order to complete a WIN_FREE * Fix race condition in threaded builds with pending unlocks and finishing an epoch * Fix memory leak due to use of OBJ_DESTRUCT instead of OBJ_RELEASE * Fix race condition between releasing multiple shared locks and starting a new lock * Need to increment the shared count if starting a new shared lock once an exclusive lock finishes This commit was SVN r15185.	2007-06-24 22:36:00 +00:00
Brian Barrett	5528e0ca60	Properly initialize variable for threaded case This commit was SVN r15174.	2007-06-22 15:29:06 +00:00
Brian Barrett	5f16251808	revert r15167. I don't know what I was thinking, but it was most definitely "not right". This commit was SVN r15172. The following SVN revision numbers were found above: r15167 --> open-mpi/ompi@faa401dc47	2007-06-22 15:25:39 +00:00
Brian Barrett	8031f6561e	Make a bunch of debugging calls use the macro version This commit was SVN r15170.	2007-06-21 22:24:40 +00:00
Brian Barrett	80c50120ad	debugging output should be macro version This commit was SVN r15168.	2007-06-21 22:09:37 +00:00
Brian Barrett	faa401dc47	* Need to OBJ_RELEASE, not OBJ_DESTRUCT things that were created with OBJ_NEW * Need to signal when the passive unlock has left an exposure epoch for the win_free case * Clean up some debugging output * fix missing variable initialization This commit was SVN r15167.	2007-06-21 22:08:30 +00:00
Jeff Squyres	022bd30558	Back out r15158 because it apparently breaks with recent versions of flex (which, incidentally, emit ''more'' warnings than earlier versions). Grumble. This commit was SVN r15166. The following SVN revision numbers were found above: r15158 --> open-mpi/ompi@57d09c10f7	2007-06-21 21:14:10 +00:00
Jelena Pjesivac-Grbovic	3740640711	Modifying MPI_Gather in tuned module: - adding linear algorithm with synchronization for gather. This algorithm prevents congestion at the root process, but introduces synchronization (serializes non-root processes, but allows messages to arrive from two processes at the same time). It performed better than binomial and linear algorithms for large messages, and intermediate and large communicator sizes. - Updating MPI_Gather decision function to reflect performance results from MX. I will perform more measurements though - so this one can change. This commit was SVN r15165.	2007-06-21 20:00:36 +00:00
Jeff Squyres	57d09c10f7	Avoid some compiler warnings that come up ''every day'' in MTT (and have been for eons): make a symbol be used in a dumb but harmless way. This commit was SVN r15158.	2007-06-21 15:42:06 +00:00
Gleb Natapov	b88b7dedfe	Rename btl_rdma_offset to btl_pipeline_send_length. This commit was SVN r15153.	2007-06-21 07:12:40 +00:00
Jeff Squyres	84487f5c4b	Update and correct the help messages for the generic BTL MCA parameters. Hopefully, they now make more sense to the mostly naive user... This commit was SVN r15147.	2007-06-20 16:37:50 +00:00
Jeff Squyres	930a9b7682	Make the help messages for if_include/if_exclude a little better. This commit was SVN r15134.	2007-06-19 13:38:58 +00:00
Josh Hursey	7fd1805e97	Fix a couple of compile warnings that Tim P brought to my attention. This commit was SVN r15132.	2007-06-19 00:46:16 +00:00
Gleb Natapov	643037907f	Convert all #ifdef OMPI_ENABLE_DEBUG to #if. This commit was SVN r15117.	2007-06-17 07:14:47 +00:00
George Bosilca	10a017d1bf	For an obscure reason this has to be defined on Windows. The obscure reason is that we don't have the nice configure stuff, so detecting when to enable the CR PML is kind of hard. Keep it defined and at least it compiles smoothly. This commit was SVN r15116.	2007-06-17 05:01:09 +00:00
George Bosilca	ceb8abe9c1	OMPI_ENABLE_DEBUG requires an #if, not an #ifdef This commit was SVN r15107.	2007-06-15 19:22:19 +00:00
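The #ifdef to #if conversions above matter when a configure-generated macro is always defined and carries a 0 or 1 value, which is what these commits imply for OMPI_ENABLE_DEBUG (assumed here, with the value hard-coded for illustration). #ifdef only asks whether the macro exists, so it selects the debug branch even in a non-debug build:

```c
/* Assumption: configure always defines this macro, as 0 or 1. */
#define OMPI_ENABLE_DEBUG 0

static int debug_via_ifdef(void)
{
#ifdef OMPI_ENABLE_DEBUG
    return 1;   /* taken even though the value is 0: the macro is merely defined */
#else
    return 0;
#endif
}

static int debug_via_if(void)
{
#if OMPI_ENABLE_DEBUG
    return 1;
#else
    return 0;   /* taken: #if tests the macro's value, which is 0 */
#endif
}
```

With the value set to 0, debug_via_ifdef() still returns 1 while debug_via_if() returns 0. An #if on a macro that is not defined at all also evaluates as 0, so #if is the safe choice either way.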
Josh Hursey	6cdfefad87	Fix portals BTL and cnos RML. Both were failing due to interface changes that were never applied to them properly. This commit was SVN r15082.	2007-06-14 18:49:41 +00:00
Jeff Squyres	2399b9a535	Ensure to initialize the variable so that we don't segv. This commit was SVN r15078.	2007-06-14 13:59:28 +00:00
Gleb Natapov	7b9ae49fe1	This time correctly calculate local BTL rank among all BTLs in a subnet. This commit was SVN r15073.	2007-06-14 10:27:11 +00:00
Jeff Squyres	1e18265c16	Bring over the functionality from the /tmp/jnysal-openib-wireup branch: * Support btl_openib_if_include and btl_openib_if_exclude MCA parameters, similar to those supported by other BTLs. Each takes a comma-delimited list of identifiers. Identifiers can be HCA interface names (e.g., ipath0, mthca1, etc.) or an HCA interface name and port numbers (e.g., ipath0:1, mthca1:2, etc.). It is an error to specify both _include and _exclude. If you specify a non-existent (or non-ACTIVE) HCA and/or port, you'll get a warning unless you disable the warning by setting the MCA parameter btl_openib_warn_nonexistent_if to 0. * Start updating to use BEGIN_C_DECLS and END_C_DECLS * A few other minor fixes that were picked up along the way. This commit was SVN r15063.	2007-06-14 01:59:25 +00:00
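Each btl_openib_if_include/if_exclude identifier is either an HCA name or an HCA name followed by ":port". Below is a minimal sketch of splitting one identifier. parse_hca_port is a hypothetical helper, and using port 0 to mean "all ports" is an assumption that relies on InfiniBand port numbers starting at 1.

```c
#include <stdlib.h>
#include <string.h>

/* Split an identifier such as "mthca1:2" into HCA name and port number.
 * *port is set to 0 (assumed to mean "all ports") when there is no ":port"
 * suffix. Returns 0 on success, -1 on a malformed identifier. */
static int parse_hca_port(const char *ident, char *name, size_t name_len, int *port)
{
    const char *colon = strchr(ident, ':');
    size_t n = colon ? (size_t)(colon - ident) : strlen(ident);

    if (n == 0 || n >= name_len)
        return -1;
    memcpy(name, ident, n);
    name[n] = '\0';

    *port = 0;
    if (colon != NULL) {
        char *end;
        long v = strtol(colon + 1, &end, 10);
        /* require digits only, and a port in 1..255 */
        if (end == colon + 1 || *end != '\0' || v < 1 || v > 255)
            return -1;
        *port = (int)v;
    }
    return 0;
}
```

A full implementation would also split the comma-delimited list and reject setting both _include and _exclude, as the commit message describes.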
Rainer Keller	ca09aae2cc	- Get PERUSE compile again with latest RDMA changes in r14768/r14842. This commit was SVN r15042. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07 r14842 --> open-mpi/ompi@10266fb467	2007-06-13 12:47:47 +00:00
Gleb Natapov	8164723014	Allow to configure bandwidth and latency with finer granularity. Set bandwidth for all ports of mthca0: --mca btl_openib_bandwidth_mthca0 1000 Set bandwidth for port 1 of mthca1: --mca btl_openib_bandwidth_mthca1:1 1000 Set latency for port 2 lid 123 on mthca0: --mca btl_openib_latency_mthca0:2:123 20 This commit was SVN r15041.	2007-06-13 12:47:38 +00:00
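The bandwidth/latency commit above names parameters by HCA, by HCA:port, and by HCA:port:lid. Below is a sketch of how such names could be resolved, trying the most specific form first and falling back to the plain parameter. The lookup order, the param_t table, and both functions are assumptions for illustration. They are not the MCA parameter API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name/value table standing in for the registered MCA parameters. */
typedef struct { const char *name; unsigned value; } param_t;

/* Build the candidate name for a specificity level:
 * 0: <base>_<hca>:<port>:<lid>  1: <base>_<hca>:<port>  2: <base>_<hca>  3: <base> */
static int param_name(char *buf, size_t len, const char *base, const char *hca,
                      unsigned port, unsigned lid, int level)
{
    switch (level) {
    case 0:  return snprintf(buf, len, "%s_%s:%u:%u", base, hca, port, lid);
    case 1:  return snprintf(buf, len, "%s_%s:%u", base, hca, port);
    case 2:  return snprintf(buf, len, "%s_%s", base, hca);
    default: return snprintf(buf, len, "%s", base);
    }
}

/* Return 0 and set *out from the most specific matching entry, else -1. */
static int lookup_port_param(const param_t *set, size_t nset, const char *base,
                             const char *hca, unsigned port, unsigned lid,
                             unsigned *out)
{
    char name[128];
    for (int level = 0; level < 4; level++) {
        int r = param_name(name, sizeof name, base, hca, port, lid, level);
        if (r < 0 || (size_t)r >= sizeof name)
            return -1;                      /* name did not fit */
        for (size_t i = 0; i < nset; i++) {
            if (strcmp(set[i].name, name) == 0) {
                *out = set[i].value;
                return 0;
            }
        }
    }
    return -1;
}
```

With the three settings from the commit message in the table, port 2 of mthca0 picks up btl_openib_bandwidth_mthca0, and lid 123 on that port picks up the btl_openib_latency_mthca0:2:123 value.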
Gleb Natapov	5c3f511451	Properly determine btl's rank among all btls within the same subnet. This commit was SVN r15038.	2007-06-13 11:15:58 +00:00
Brian Barrett	b71b2b4b0d	Make the aio detection work with cross compiling. The tests no longer even look at the status code and basically guarantee that the aio function was never called, so there's really no point in AC_TRY_RUN over AC_COMPILE_IFELSE... This commit was SVN r15033.	2007-06-13 03:16:32 +00:00
Brian Barrett	84d1512fba	Add the potential for doing some basic error checking on mutexes during single threaded builds. In its default configuration, all this does is ensure that there's at least a good chance of threads building based on non-threaded development (since the variable names will be checked). There is also code to make sure that a "mutex" is never "double locked" when using the conditional macro mutex operations. This is off by default because there are a number of places in both ORTE and OMPI where this alarm spews megabytes of errors on a simple test. So we have some work to do on our path towards thread support. Also removed the macro versions of the non-conditional thread locks, as in the only places they were used, the author of the code intended to use the conditional thread locks. So now you have upper-case macros for conditional thread locks and lowercase functions for non-conditional locks. Simple, right? :). This commit was SVN r15011.	2007-06-12 16:25:26 +00:00
Galen Shipman	8e7cce813e	don't update MPI_ERROR This commit was SVN r15004.	2007-06-11 21:40:29 +00:00
Galen Shipman	406b05bdc3	update copyright.. This commit was SVN r15003.	2007-06-11 21:17:49 +00:00
Galen Shipman	798cc2c5b8	handle MPI_STATUS_IGNORE in iprobe for the MTLs This commit was SVN r15002.	2007-06-11 20:19:31 +00:00
Brian Barrett	27ad954265	Fix a couple of problems with the way we were using orte_process_name_t structures in the system. Instead of using memcmp, use the ns function. This won't cause a problem as long as all three elements of the name are ints, but if they have different sizes, alignment and padding rules can cause memcmp() to compare padding space, which rarely holds a sane value. This commit was SVN r14998.	2007-06-11 19:12:11 +00:00
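The memcmp problem described above appears whenever a struct has fields of different widths, because the compiler may insert padding bytes whose contents are unspecified. A minimal sketch with a hypothetical two-field name type (not the real orte_process_name_t) contrasts a byte-wise comparison with a field-by-field one:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name with mixed field widths; padding may follow `jobid`. */
typedef struct {
    uint16_t jobid;
    uint32_t vpid;
} proc_name_t;

/* Fragile: also compares padding bytes, whose contents are unspecified. */
static int proc_name_equal_bytes(const proc_name_t *a, const proc_name_t *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}

/* Robust: compares only the named fields, ignoring any padding. */
static int proc_name_equal(const proc_name_t *a, const proc_name_t *b)
{
    return a->jobid == b->jobid && a->vpid == b->vpid;
}
```

Two names filled with different background bytes and then given the same field values always compare equal with proc_name_equal. Whether proc_name_equal_bytes agrees depends on leftover padding contents, which is exactly the bug the commit describes.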
George Bosilca	e2dd0a50fc	A better version allowing for multi-rails or clusters of clusters. A lot of cleanups. This commit was SVN r14963.	2007-06-08 20:37:20 +00:00
George Bosilca	c66cf32ee2	Cleaning up. Removing all unused variables and fields in the MX BTL and component structures. This commit was SVN r14957.	2007-06-07 21:02:18 +00:00
George Bosilca	5d6c958066	Enable the MTLs to be compiled in a visibility featured environment. This commit was SVN r14955.	2007-06-07 20:14:53 +00:00
Gleb Natapov	423f404c34	Shut up compiler warning. Ugly, but I can't see a better way except changing the converter to use uint64_t (ssize_t?) for the offset. This commit was SVN r14950.	2007-06-07 11:33:28 +00:00
Gleb Natapov	9f9b64db4e	Revert r14947 as this doesn't solve the problem. This commit was SVN r14949. The following SVN revision numbers were found above: r14947 --> open-mpi/ompi@5b9fe28e3f	2007-06-07 11:24:24 +00:00
Gleb Natapov	5b9fe28e3f	Fix warning on 32bit systems. This commit was SVN r14947.	2007-06-07 08:57:34 +00:00
Tim Prins	06bf4c3f3b	fix some printf warnings This commit was SVN r14934.	2007-06-06 22:37:26 +00:00
George Bosilca	6a5e039466	Allow smart connection to be set up. Each peer now has attached to it a unique id based on the last half of the mapper MAC. This allows us to figure out how to connect peers. This allows the MX BTL to be used in a cluster-of-clusters configuration where each cluster has MX internally, as well as on a multi-rail MX system. This commit was SVN r14932.	2007-06-06 21:42:11 +00:00
Galen Shipman	5340f5e320	Try to clean up the flow control logic a bit. Renamed a few variables. Initialize the reserve receive buffers to 1; prior to this they were initialized to zero. This commit was SVN r14919.	2007-06-06 18:51:09 +00:00
Gleb Natapov	de58336c45	Allow rdma_pipeline_offset to be set to zero. This commit was SVN r14900.	2007-06-06 11:54:25 +00:00
Rich Graham	e276f7bcc7	undo my error. This commit was SVN r14890.	2007-06-05 23:32:47 +00:00
Rich Graham	ce0e9ac329	initialize lock properly. This commit was SVN r14881.	2007-06-05 20:34:11 +00:00
Donald Kerr	8ecbc71ed2	add support for connection private data, off by default This commit was SVN r14878.	2007-06-05 19:29:50 +00:00
Gleb Natapov	ac1e8f81af	Let's be real. TCP latency is slightly worse than mx/openib. This commit was SVN r14865.	2007-06-05 12:22:57 +00:00
Gleb Natapov	fbd033b162	Cut&Paste error in r14795. Fix. This commit was SVN r14862. The following SVN revision numbers were found above: r14795 --> open-mpi/ompi@6b0d8c0858	2007-06-05 10:07:06 +00:00
Shiqing Fan	c142c23f88	Initialize req_ompi.req_status._count to be 0 before starting the request. This commit was SVN r14861.	2007-06-05 09:50:06 +00:00
Brian Barrett	508da4e959	OS X apparently really doesn't like shared libraries with unresolvable symbols in them and environ is defined only in the final application (probably in crt1.o). Apple provides a function for getting at the environment, so use that instead if it's available. This commit was SVN r14857.	2007-06-05 03:03:59 +00:00
Brian Barrett	a446af5b6b	* Remove unneeded SRQ test -- we no longer support OFED builds that don't have the SRQ interface. * Instead of setting AC_DEFINEs per MCA component, set per test. The answers can never be different, and this will speed up sed just a teeny bit This commit was SVN r14856.	2007-06-05 01:49:26 +00:00
Brian Barrett	0798c0784d	properly set fields so that most difficult alignment rules are always met. This commit was SVN r14854.	2007-06-05 01:46:04 +00:00
Shiqing Fan	0961669912	Spaces after backslash are removed. This commit was SVN r14844.	2007-06-04 10:10:24 +00:00
Shiqing Fan	7bf18a4fd5	MPI_SOURCE should be initialized. This commit was SVN r14843.	2007-06-04 09:37:21 +00:00
Gleb Natapov	10266fb467	Fix deadlock in OB1 protocol by sending memory by copying if registration fails. This commit was SVN r14842.	2007-06-03 08:31:58 +00:00
Gleb Natapov	a25e1e7b15	Implement new function mca_pml_ob1_send_requst_copy_in_out(req, offset, len) that allows sending any range of a request by send/recv instead of RDMA, and use it to send data from the end of a request in the pipeline protocol. This commit was SVN r14841.	2007-06-03 08:30:07 +00:00
Brian Barrett	8cf02de3b4	* cleanup some ompi_info output * enable eager sending by default This commit was SVN r14813.	2007-05-30 22:23:34 +00:00
Brian Barrett	a2713dcac8	eeks! Bad to notice after committing the pt2pt part of r14806 that the compile failed because of the wrong variable name. This commit was SVN r14807. The following SVN revision numbers were found above: r14806 --> open-mpi/ompi@7e57bbb0ef	2007-05-30 20:33:08 +00:00
Brian Barrett	7e57bbb0ef	React slightly better when datatype creation from a buffer fails This commit was SVN r14806.	2007-05-30 20:32:02 +00:00
Brian Barrett	84f7ed70b3	Re-enable the ability for the rdma one-sided component to start messages as soon as the epochs allow, rather than waiting for the end of the synchronization phase. This commit was SVN r14800.	2007-05-30 17:06:19 +00:00
Gleb Natapov	6b0d8c0858	TCP BTL ignores btl_tcp_bandwidth parameter. Fix it. This commit was SVN r14795.	2007-05-30 14:12:05 +00:00
Donald Kerr	91c9b7b6f9	don't call dat_evd_resize if new value is less than or equal to current because ofed stack does not return DAT_INVALID_STATE This commit was SVN r14792.	2007-05-29 20:08:16 +00:00
Gleb Natapov	06bf5d74e7	Remove mca_pml_ob1_send_fin_btl function. This commit was SVN r14784.	2007-05-28 06:51:12 +00:00
Gleb Natapov	f5078db0db	Fix order of parameters to function. This commit was SVN r14783.	2007-05-27 13:45:24 +00:00
Gleb Natapov	f191834e56	No need for MCA_BTL_FLAGS_NEED_ACK any more. As of commit r14768 this is the default behaviour. This commit was SVN r14782. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 11:25:39 +00:00
Gleb Natapov	444762456e	Don't dereference NULL pointer. Fix bug introduced in r14768. This commit was SVN r14781. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 09:24:56 +00:00
Gleb Natapov	ad69d3c6ac	Fix out of resource handling for FIN packets broken by r14768. This commit was SVN r14780. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 08:29:38 +00:00
Brian Barrett	80fa8eef6e	Don't include malloc.h in mpool/base/base.h because it causes all kinds of problems when the memory debugging stuff is enabled. Push it down into the two .c files that do use it. This commit was SVN r14779.	2007-05-27 03:55:21 +00:00
George Bosilca	eb43abf7ae	Allow compilation when there is a progress thread. This commit was SVN r14776.	2007-05-25 01:59:29 +00:00
George Bosilca	8b817e96fd	Allow threaded compilation. This commit was SVN r14775.	2007-05-25 01:53:29 +00:00
Galen Shipman	3401bd2b07	Add optional ordering to the BTL interface. This is required to tighten up the BTL semantics. Ordering is not guaranteed, but if the BTL returns an order tag in a descriptor (other than MCA_BTL_NO_ORDER) then we may request another descriptor that will obey ordering w.r.t. the other descriptor. This will allow sane behavior for RDMA networks, where local completion of an RDMA operation on the active side does not imply remote completion on the passive side. If we send a FIN message after local completion and the FIN is not ordered w.r.t. the RDMA operation then badness may occur as the passive side may now try to deregister the memory and the RDMA operation may still be pending on the passive side. Note that this has no impact on networks that don't suffer from this limitation as the ORDER tag can simply always be specified as MCA_BTL_NO_ORDER. This commit was SVN r14768.	2007-05-24 19:51:26 +00:00
Jeff Squyres	81df632e29	Clarification to MCA parameter help messages This commit was SVN r14765.	2007-05-24 19:18:29 +00:00
Brian Barrett	5ec421e1b0	Create a new queue (to simplify locking) for requests that are started but can not be started by the BTL. This commit was SVN r14757.	2007-05-24 17:21:56 +00:00
George Bosilca	7459ab45f1	This is the complete commit for the TCP header issue. Jeff committed a partial fix (r14749) and then backed it out (r14753). As we are unable to send more than a 32-bit length over TCP in one go, there is no reason to have a uint64 length in the header. This reduces the size of the TCP header. This commit was SVN r14755. The following SVN revision numbers were found above: r14749 --> open-mpi/ompi@48c026ce6b r14753 --> open-mpi/ompi@28ed850b4c	2007-05-24 16:40:49 +00:00
Jeff Squyres	28ed850b4c	Back out r14749; it wasn't quite ready for prime time yet... This commit was SVN r14753. The following SVN revision numbers were found above: r14749 --> open-mpi/ompi@48c026ce6b	2007-05-24 15:46:15 +00:00
Brian Barrett	1b025798d2	remove some now unneeded volatiles This commit was SVN r14752.	2007-05-24 15:42:06 +00:00
Brian Barrett	1a9f48c89d	Some much needed cleanup of the rdma one-sided component, similar to r14703 for the point-to-point component. * Associate the list of long message requests to poll with the component, not the individual modules * add progress thread that sits on the OMPI request structure and wakes up at the appropriate time to poll the message list to move long messages asynchronously. * Instead of calling opal_progress() all over the place, move to using the condition variables like the rest of the project. Has the advantage of moving it slightly further along in the becoming thread safe thing. * Fix a problem with the passive side of unlock where it could go recursive and cause all kinds of problems, especially when progress threads are used. Instead, have two parts of passive unlock -- one to start the unlock, and another to complete the lock and send the ack back. The data moving code trips the second at the right time. This commit was SVN r14751. The following SVN revision numbers were found above: r14703 --> open-mpi/ompi@2b4b754925	2007-05-24 15:41:24 +00:00
Jeff Squyres	48c026ce6b	Commit a patch from George (reviewed by Brian): reduce the size of the mca_btl_tcp_hdr_t struct and remove the need for the heterogeneous padding by changing the type of the "size" member to be uint32_t (vs. uint64_t). The value would never be greater than 32 bits anyway, so having the type be uint64_t was wasteful. This commit was SVN r14749.	2007-05-24 15:08:57 +00:00
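The size saving described in r14749/r14755 can be illustrated with two toy header layouts. These structs are hypothetical stand-ins, not the real mca_btl_tcp_hdr_t definition: the point is only that a uint64_t member after a few small fields usually forces 8-byte alignment and padding, while a uint32_t keeps the header at 8 bytes on common ABIs. The helper toy_hdr_set_size is also hypothetical.

```c
#include <stdint.h>

/* Hypothetical header layouts (not the real mca_btl_tcp_hdr_t). */
typedef struct {
    uint8_t  tag;
    uint8_t  type;
    uint16_t count;
    uint64_t size;   /* 64-bit length: usually forces 8-byte alignment and padding */
} toy_tcp_hdr_u64_t;

typedef struct {
    uint8_t  tag;
    uint8_t  type;
    uint16_t count;
    uint32_t size;   /* 32-bit length: the log notes one TCP send never exceeds this */
} toy_tcp_hdr_u32_t;

/* Store a length in the 32-bit header, refusing values that do not fit. */
static int toy_hdr_set_size(toy_tcp_hdr_u32_t *hdr, uint64_t len)
{
    if (len > UINT32_MAX) {
        return -1;   /* caller would have to split the payload into fragments */
    }
    hdr->size = (uint32_t)len;
    return 0;
}
```

On x86-64 the first layout is typically 16 bytes and the second 8; the exact padding of the 64-bit variant is ABI-dependent, which is also why the old header needed explicit padding for heterogeneous setups.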
Gleb Natapov	be71b78f6a	Initialize btl_send_limit before use. This commit was SVN r14745.	2007-05-24 08:40:26 +00:00
Brian Barrett	075389f67d	fix some printf warnings This commit was SVN r14740.	2007-05-23 21:19:26 +00:00
Brian Barrett	38b0d22243	Some cleanups to the pt2pt component * Remove unused declaration * remove unused variable warning when not using progress threads * If we're using progress threads, we want to lock, not trylock when in progress, since it was called from the wakeup thread and not the progress function This commit was SVN r14739.	2007-05-23 20:31:25 +00:00
George Bosilca	b2e805db61	Nothing relevant. Indentation, typos, change PTL to BTL. This commit was SVN r14727.	2007-05-23 14:03:52 +00:00
Sven Stork	88f0845c44	- let the pt2pt component compile with threads enabled This commit was SVN r14725.	2007-05-23 12:56:34 +00:00
Brian Barrett	38eab3613b	* Fix race condition with the pending_{in,out} variables -- if we're going to do while(...) { } then we can't change the variables in the ... atomically, but should do it while holding the module lock. * Fix dumb communicator creation error when we don't create the progress stuff (because a window already exists), where we would accidentally jump to the error case. This commit was SVN r14715.	2007-05-21 20:53:02 +00:00
Ralph Castain	4fff584a68	Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that did start. The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system. Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed. Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief. With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn. Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. 
This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put". This commit was SVN r14711.	2007-05-21 18:31:28 +00:00
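The "arith" semantics above (atomically apply divide, multiply, add, or subtract, equivalent to a get, a modify, and a put) can be sketched on a single in-memory slot with a C11 compare-and-swap loop. This is an assumption-laden illustration: the real GPR is a distributed registry, and toy_registry_arith and toy_arith_op_t are hypothetical names, not the GPR API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation codes mirroring the four operations named in the log. */
typedef enum { TOY_ADD, TOY_SUB, TOY_MUL, TOY_DIV } toy_arith_op_t;

/* Atomically apply "value = value <op> operand" to one slot.
   Returns false and leaves the slot unchanged on division by zero.
   Overflow is not checked in this sketch. */
static bool toy_registry_arith(atomic_long *slot, toy_arith_op_t op,
                               long operand, long *result)
{
    if (op == TOY_DIV && operand == 0) {
        return false;
    }
    long old = atomic_load(slot);          /* the "get" */
    long updated;
    do {
        switch (op) {                      /* the "modify" */
        case TOY_ADD: updated = old + operand; break;
        case TOY_SUB: updated = old - operand; break;
        case TOY_MUL: updated = old * operand; break;
        case TOY_DIV: updated = old / operand; break;
        default: return false;
        }
        /* The "put" succeeds only if nobody changed the slot since our read;
           on failure, old is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(slot, &old, updated));
    if (result != NULL) {
        *result = updated;
    }
    return true;
}
```

The retry loop is what makes the sequence atomic: a plain get/modify/put from two writers could lose one update, whereas here a concurrent change simply forces a recomputation.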
Brian Barrett	0e9e0c518a	Fix a couple more progress thread related issues... This commit was SVN r14708.	2007-05-21 16:06:14 +00:00
Pavel Shamis	5ceaa605d7	Adding new vendor_part_id for Mellanox Hermon HCA This commit was SVN r14705.	2007-05-21 13:33:54 +00:00
Brian Barrett	1191677b76	Fix dumb threads-related compile issues This commit was SVN r14704.	2007-05-21 03:23:58 +00:00
Brian Barrett	2b4b754925	Some much needed cleanup of the point-to-point one-sided component... * Combine polling of the long requests and buffer requests into one type, and in one place * Associate the list of requests to poll with the component, not the individual modules * add progress thread that sits on the OMPI request structure and wakes up at the appropriate time to poll the message list. Not the best, but without some asynch notification from the PML that a given set of requests has completed, there isn't much better * Instead of calling opal_progress() all over the place, move to using the condition variables like the rest of the project. Has the advantage of moving it slightly further along in the becoming thread safe thing * Fix a problem with the passive side of unlock where it could go recursive and cause all kinds of problems, especially when progress threads are used. Instead, have two parts of passive unlock -- one to start the unlock, and another to complete the lock and send the ack back. The data moving code trips the second at the right time. This commit was SVN r14703.	2007-05-21 02:21:25 +00:00
Donald Kerr	23280bd7da	remove an assignment which is not required This commit was SVN r14692.	2007-05-18 01:33:02 +00:00
Donald Kerr	588d5bd6a9	clean up compile warnings This commit was SVN r14691.	2007-05-17 23:37:47 +00:00
George Bosilca	7738079ab9	Remove unused variable. This commit was SVN r14689.	2007-05-17 20:01:30 +00:00
Gleb Natapov	b2c8fcdbab	Forgot to add a file in r14681. This commit was SVN r14682. The following SVN revision numbers were found above: r14681 --> open-mpi/ompi@3ebaff8dfe	2007-05-17 08:41:01 +00:00
Gleb Natapov	3ebaff8dfe	Implement new BTL parameters: We eagerly send data up to btl_*_eager_limit with the MATCH. Upon ACK of the MATCH we start using send/receives of size btl_*_max_send_size up to btl_*_rdma_pipeline_offset. After btl_*_rdma_pipeline_offset we begin using RDMA writes of size btl_*_rdma_pipeline_frag_size. Now, on a per-message basis, we only use the above protocol if the message is larger than btl_*_min_rdma_pipeline_size. Parameter names: btl_*_eager_limit -> same, btl_*_max_send_size -> same, btl_*_rdma_pipeline_offset -> btl_*_min_rdma_size, btl_*_rdma_pipeline_frag_size -> btl_*_max_rdma_size; btl_*_min_rdma_pipeline_size is new. This patch also moves all BTL common parameter initialisation into the btl_base_mca.c file. This commit was SVN r14681.	2007-05-17 07:54:27 +00:00
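The three-phase protocol in r14681 (eager data with the MATCH, send/receive fragments up to the pipeline offset, then pipelined RDMA writes) can be sketched as a function that splits a message length into phases. This is a minimal model under stated assumptions: the struct and function names are hypothetical, the offset is assumed to be measured from the start of the message, and the fallback for messages not larger than min_rdma_pipeline_size is not described in the log, so it is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the btl_*_ parameters named in the log (bytes). */
typedef struct {
    size_t eager_limit;
    size_t max_send_size;
    size_t rdma_pipeline_offset;
    size_t rdma_pipeline_frag_size;
    size_t min_rdma_pipeline_size;
} toy_btl_params_t;

typedef struct {
    size_t eager_bytes;              /* sent eagerly with the MATCH */
    size_t send_bytes, send_frags;   /* send/receive after the ACK */
    size_t rdma_bytes, rdma_frags;   /* pipelined RDMA writes */
} toy_msg_plan_t;

static size_t toy_min(size_t a, size_t b) { return a < b ? a : b; }
static size_t toy_ceil_div(size_t a, size_t b) { return (a + b - 1) / b; } /* b > 0 */

/* Returns false when the pipeline protocol does not apply to a message of len bytes. */
static bool toy_plan_pipeline(const toy_btl_params_t *p, size_t len, toy_msg_plan_t *out)
{
    if (len <= p->min_rdma_pipeline_size) {
        return false;
    }
    size_t eager = toy_min(len, p->eager_limit);
    size_t send_end = toy_min(len, p->rdma_pipeline_offset);  /* assumed: offset from start */
    if (send_end < eager) {
        send_end = eager;
    }
    out->eager_bytes = eager;
    out->send_bytes = send_end - eager;
    out->send_frags = toy_ceil_div(out->send_bytes, p->max_send_size);
    out->rdma_bytes = len - send_end;
    out->rdma_frags = toy_ceil_div(out->rdma_bytes, p->rdma_pipeline_frag_size);
    return true;
}
```

For example, with illustrative values (not Open MPI defaults) of a 12 KiB eager limit, 64 KiB send size, 128 KiB offset, 1 MiB RDMA fragments, and a 256 KiB minimum, a 3.125 MiB message becomes one eager chunk, two send/receive fragments, and three RDMA writes.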
Brian Barrett	33a5758521	Some IPv6 improvements: * Move ipv6comat.h code into opal_config_bottom.h and change into some more intelligent testing of structures * Change opal's if interface to use sockaddr instead of sockaddr_storage, as the RFCs suggest we do * Move the networking code in opal that isn't directly related to if detection into net.h * Add a quick function to get the port out of either a sockaddr_in or sockaddr_in6, saving a bunch of code in the oob. * Update TCP oob and btl with new interface This commit was SVN r14679.	2007-05-17 01:17:59 +00:00
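A helper that extracts the port from either address family, as mentioned in r14679, can be written against the standard POSIX socket structures. The function name toy_sockaddr_port is hypothetical and not the opal function the commit added; it takes a struct sockaddr pointer, in line with the interface change described above.

```c
#include <arpa/inet.h>    /* ntohs */
#include <netinet/in.h>   /* struct sockaddr_in, struct sockaddr_in6 */
#include <sys/socket.h>   /* struct sockaddr, AF_INET, AF_INET6 */

/* Return the port in host byte order for an IPv4 or IPv6 address,
   or -1 for any other address family. */
static int toy_sockaddr_port(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *)sa)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
    default:
        return -1;
    }
}
```

Callers that keep a struct sockaddr_storage can pass it cast to struct sockaddr *, so one code path serves both families.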
Donald Kerr	c40307fd27	add user warning message to inform when udapl btl is no longer able to register memory This commit was SVN r14678.	2007-05-16 21:04:50 +00:00
Brian Barrett	7708c4f887	Don't complain about unsupported protocols. Needs to be made better, but this will quit the whining from platforms where the kernel doesn't have IPv6 support. This commit was SVN r14676.	2007-05-16 20:11:47 +00:00
Sven Stork	22af6d38e6	- Unexport symbols that shouldn't be needed outside the libraries - replace #if/#endif with BEGIN/END_C_DECLS - reformatting This commit was SVN r14669.	2007-05-16 15:46:52 +00:00
Gleb Natapov	61e889a1d9	Fix breakage of GM by r13921. On receive, GM provides only a buffer pointer without any context, so we need to save a context somewhere so it can be retrieved given only the buffer pointer. This patch saves the context (pointer to frag) just before the start of a buffer so it can be easily retrieved. This commit was SVN r14664. The following SVN revision numbers were found above: r13921 --> open-mpi/ompi@90fb58de4f	2007-05-16 12:20:58 +00:00
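The technique in r14664 (store the owning fragment's pointer just before the buffer so it can be recovered from the buffer pointer alone) can be sketched as follows. The fragment type and all function names are hypothetical; this sketch reserves a prefix of sizeof(max_align_t) bytes rather than exactly one pointer, so the payload handed to the network layer stays maximally aligned.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fragment type standing in for the BTL's receive fragment. */
typedef struct toy_frag {
    int id;
} toy_frag_t;

/* Prefix: holds one pointer and keeps the payload maximally aligned. */
#define TOY_CTX_PREFIX sizeof(max_align_t)
_Static_assert(sizeof(max_align_t) >= sizeof(void *), "prefix must hold a pointer");

/* Allocate a receive buffer and stash the owning fragment in front of it. */
static void *toy_buf_alloc(size_t len, toy_frag_t *owner)
{
    unsigned char *raw = malloc(TOY_CTX_PREFIX + len);
    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw, &owner, sizeof owner);
    return raw + TOY_CTX_PREFIX;   /* only this pointer is given to the network layer */
}

/* Given only the buffer pointer (all GM hands back), recover the fragment. */
static toy_frag_t *toy_buf_owner(void *payload)
{
    toy_frag_t *owner;
    memcpy(&owner, (unsigned char *)payload - TOY_CTX_PREFIX, sizeof owner);
    return owner;
}

static void toy_buf_free(void *payload)
{
    if (payload != NULL) {
        free((unsigned char *)payload - TOY_CTX_PREFIX);
    }
}
```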
Donald Kerr	2ed72bf2e2	break evd_qlen into individual qlens (async,dto,conn); add checks based on udapl limits and number of peers This commit was SVN r14659.	2007-05-15 17:47:00 +00:00
Pavel Shamis	cd87b05711	Added check for IBV_EVENT_CLIENT_REREGISTER async event that did not exist in old openib gen2 versions (Ticket #1025) This commit was SVN r14658.	2007-05-15 13:53:49 +00:00
Brian Barrett	21e00f6f0c	Clean up a couple of configure things: * Require Autoconf 2.60 or higher and remove some cruft required for AC 2.59 or the AC 2.59 / AC 2.60 mix * Remove a bunch of now unnecessary AC_SUBST calls * Use the libtool-provided variables for the -I and library to use when compiling against ltdl Fixes trac:1000 This commit was SVN r14652. The following Trac tickets were found above: Ticket 1000 --> https://svn.open-mpi.org/trac/ompi/ticket/1000	2007-05-15 04:23:48 +00:00
Jeff Squyres	92090967b1	Add definitions for Hermon/ConnectX Mellanox HCA This commit was SVN r14639.	2007-05-10 12:27:51 +00:00
Donald Kerr	436d370d51	latency improvements: use ompi_free_list_init_ex, create optimal alignment parameter, remove rdma guarantee path, replace dat_lmt_sync_rdma with use of volatile This commit was SVN r14634.	2007-05-09 19:41:25 +00:00
Gleb Natapov	2562253678	Do more work at RDMA frag preparation time and less work at RDMA frag sending time. This commit was SVN r14627.	2007-05-09 12:11:51 +00:00
Gleb Natapov	78fda79630	Use size_t instead of uint64_t in call to convertor cloning. This commit was SVN r14626.	2007-05-09 10:02:06 +00:00
Pavel Shamis	e2d0e27111	Adding: * openib_finalize flow for openib btl * async event handler for openib btl This commit was SVN r14623.	2007-05-08 21:47:21 +00:00
Terry Dontje	f864348f97	Put an ifdef to conditionalize the use of memcpy for sparcv9 platforms to avoid alignment issues. This commit fixes trac:1009. This commit was SVN r14608. The following Trac tickets were found above: Ticket 1009 --> https://svn.open-mpi.org/trac/ompi/ticket/1009	2007-05-08 17:17:34 +00:00
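The general idiom behind using memcpy to avoid alignment faults is shown below; the log does not show the patched code path, so these helper names are hypothetical. Dereferencing a misaligned uint64_t pointer can trap (SIGBUS) on strict-alignment CPUs such as SPARC V9, whereas memcpy lets the compiler emit loads that are safe for any address.

```c
#include <stdint.h>
#include <string.h>

/* Read a uint64_t from an address that may not be 8-byte aligned.
   A direct *(const uint64_t *)p may fault on strict-alignment hardware. */
static uint64_t toy_load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a uint64_t to an address that may not be 8-byte aligned. */
static void toy_store_u64(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```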
Jeff Squyres	ecf5a3b8dd	Fix compiler warning This commit was SVN r14604.	2007-05-08 13:12:50 +00:00
Sven Stork	a04c8eb39a	- Bring over the visibility feature, for finer-grained symbol export control via the visibility feature that is provided by some compilers. By default this feature is disabled; to enable it you need to configure with --enable-visibility and obviously you need a compiler with visibility support. Please refer to the wiki for more information. https://svn.open-mpi.org/trac/ompi/wiki/Visibility This commit was SVN r14582.	2007-05-04 09:03:37 +00:00
Jelena Pjesivac-Grbovic	625c6739ab	Removing warning about unused variable This commit was SVN r14579.	2007-05-03 20:26:41 +00:00
Gleb Natapov	8029893489	In a multithreaded application, sending of the initial portion of a request may overlap with RDMAing the rest of it. Also, more than one RDMA write can be performed simultaneously by different threads. To make this code thread safe, this patch clones the original request convertor for each RDMA fragment. This commit was SVN r14574.	2007-05-03 09:13:17 +00:00
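Why cloning the convertor per RDMA fragment helps can be shown with a toy model: the position cursor is mutable state, so giving every fragment its own copy, positioned at that fragment's offset, means no two threads ever advance the same cursor. The struct and functions below are hypothetical simplifications that handle only a contiguous buffer; the real convertor also walks datatype descriptions.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a datatype convertor: a contiguous buffer and a cursor. */
typedef struct {
    const unsigned char *base;
    size_t size;
    size_t pos;      /* mutable state: unsafe to share between threads */
} toy_convertor_t;

/* Give one RDMA fragment its own copy, positioned at that fragment's offset. */
static toy_convertor_t toy_convertor_clone_at(const toy_convertor_t *orig, size_t offset)
{
    toy_convertor_t copy = *orig;
    copy.pos = offset < orig->size ? offset : orig->size;
    return copy;
}

/* Copy up to max bytes from the cursor into dst and advance the cursor. */
static size_t toy_convertor_pack(toy_convertor_t *conv, unsigned char *dst, size_t max)
{
    size_t n = conv->size - conv->pos;
    if (n > max) {
        n = max;
    }
    memcpy(dst, conv->base + conv->pos, n);
    conv->pos += n;
    return n;
}
```

Each thread packs through its own clone, and the original request's convertor is left untouched.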
Jelena Pjesivac-Grbovic	9eff74ad4d	Modifying generalized reduce "synchronized" behavior: - Removing "small" message size limit because it really does not relate to the eager size across the board. Now, the leaf nodes in generalized reduce will use blocking send (DEFAULT/ORIGINAL BEHAVIOR) either when the maximum number of outstanding requests is 0 or when the total number of segments is less than the maximum number of outstanding requests. Otherwise, it will send messages using non-blocking synchronized send operation. This commit was SVN r14572.	2007-05-02 21:42:45 +00:00
George Bosilca	69642a9cd4	Remove 2 warnings about ptrdiff_t to unsigned long implicit conversion. This commit was SVN r14565.	2007-05-01 19:47:33 +00:00
Adrian Knoth	d63d125a88	I guess we only need this when IPv6 is enabled. This commit was SVN r14551.	2007-04-29 16:38:34 +00:00
Adrian Knoth	5765ecc22e	This patch reverts r14549 while retaining IPv6 support. Re #1008 This commit was SVN r14550. The following SVN revision numbers were found above: r14549 --> open-mpi/ompi@386baed55b	2007-04-29 16:23:11 +00:00
Adrian Knoth	386baed55b	Hotfix for IPv6 support. Closes trac:1008 This commit was SVN r14549. The following Trac tickets were found above: Ticket 1008 --> https://svn.open-mpi.org/trac/ompi/ticket/1008	2007-04-29 13:46:45 +00:00
George Bosilca	bb481273a6	Typos. This commit was SVN r14546.	2007-04-28 19:15:53 +00:00
George Bosilca	46265db0a9	Update the TCP BTL in order to bring back some of the functionality lost during the IPv6 patch. The most important is the multi-BTL support. There was a quite interesting bug: instead of setting up the multiple connections over different physical devices, depending on when these connections were created, most of the time they were all using the same physical network. This, of course, was not the intended goal, as we topped out at the maximum bandwidth available over one device instead of gathering all available bandwidth from all devices. Second, the IPv6 RFC suggests using sockaddr_storage as a holder for the IP information, but using a sockaddr* when we pass it to functions. This is only partially corrected by this patch. Some other minor cleanups. This commit was SVN r14544.	2007-04-28 19:13:47 +00:00
Josh Hursey	4c453caab6	Make the check a bit better This commit was SVN r14542.	2007-04-27 17:38:36 +00:00
Josh Hursey	486f29eb6b	Make sure to use the new metadata flags This commit was SVN r14541.	2007-04-27 17:18:26 +00:00
Sven Stork	8d92773067	- export required symbol This commit was SVN r14536.	2007-04-27 11:38:45 +00:00
Rainer Keller	1aceece03f	- Add a few comments for elements for structs, a few spelling fixes. No functional change. This commit was SVN r14534.	2007-04-26 21:03:38 +00:00
Rainer Keller	ce32b918da	- Fixes for unlocking the mutex in case of error in functions mca_btl_openib_post_srr and btl_openib_endpoint_post_rr This commit was SVN r14530.	2007-04-26 13:33:02 +00:00
Rainer Keller	6f9251ed39	- Small fixes by PGI -Minform=inform This commit was SVN r14524.	2007-04-26 08:16:07 +00:00
Josh Hursey	af38efd27c	Use more of the datatype engine supplied functions This commit was SVN r14519.	2007-04-26 00:06:22 +00:00
Jelena Pjesivac-Grbovic	3eac49aa59	Adding flow control for leaf nodes in generalized reduce structure. This "feature" is disabled by default and it should not affect the current performance. In case when the message size is large and segment size is smaller than eager size for particular interface, the leaf nodes in generalized reduce function can flood parent nodes by sending all segments without any synchronization. This can cause the parent to have a HIGH number of unexpected messages (think 16MB message with 1KB segments for example). In case of binomial algorithm root node always has at least one child which is leaf, so this can potentially affect the root's performance significantly [Especially in large communicators where root may have quite a few children (binomial tree for example)]. When the segment size is bigger than the eager size, rendezvous protocol ensures that this does not happen so it is not necessary. Originally, the problem was exposed in "infinite" bucket allocator clean up time for "small" segment sizes (which may explain some "deadlocks" on Thunderbird tests). To prevent this, we allow the user to specify mca parameter "--mca coll_tuned_reduce_algorithm_max_requests NUM"; this limits the number of outstanding messages from a leaf node in generalized reduce to the parent to NUM. Messages are sent as non-blocking synchronous messages, so synchronization happens at "wait" time. The synchronization actually improved performance of pipeline and binomial algorithm for large message sizes with 1KB segments over MX, but I need to test it some more to make sure it is consistent. Since there is no easy way to find out what is "the eager" size for particular btl, I set the limit to 4000B. If message/individual segment size is greater than 4000B - we will not use this feature. This variable may or may not be exposed as mca parameter later... 
I did not have any problems running it and both "default" and "synchronous" tests passed Intel Reduce* tests up to 80 processes (over MX). This commit was SVN r14518.	2007-04-25 20:39:53 +00:00
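The leaf-node flow control from r14518, combined with the rule later revised in r14572 (blocking sends when the limit is 0 or there are fewer segments than the limit), can be modelled as bookkeeping over a fixed window of request slots. This sketch does no communication: toy_req_t stands in for the request an MPI_Issend would return, "wait" stands in for MPI_Wait, and round-robin slot reuse is an assumption rather than a description of coll/tuned's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Rule as revised in r14572. */
static bool toy_leaf_uses_blocking(int max_requests, size_t num_segments)
{
    return max_requests == 0 || num_segments < (size_t)max_requests;
}

/* Toy request slot standing in for an MPI_Request from MPI_Issend. */
typedef struct {
    long seg;
    bool active;
} toy_req_t;

/* Post every segment while keeping at most max_requests sends outstanding.
   Slots are reused round-robin, so before reusing one we "wait" on the send
   it holds, which is always the oldest outstanding one. completed[] records
   the order of the waits. Requires max_requests > 0. */
static size_t toy_leaf_send_all(toy_req_t *reqs, int max_requests,
                                size_t num_segments, long *completed)
{
    size_t window = (size_t)max_requests, ndone = 0;
    for (size_t i = 0; i < window; ++i) {
        reqs[i].active = false;
    }
    for (size_t seg = 0; seg < num_segments; ++seg) {
        toy_req_t *slot = &reqs[seg % window];
        if (slot->active) {                    /* window full: wait on oldest */
            completed[ndone++] = slot->seg;
        }
        slot->seg = (long)seg;                 /* post a synchronous send */
        slot->active = true;
    }
    for (size_t k = 0; k < window; ++k) {     /* drain, oldest first */
        toy_req_t *slot = &reqs[(num_segments + k) % window];
        if (slot->active) {
            completed[ndone++] = slot->seg;
            slot->active = false;
        }
    }
    return ndone;
}
```

Because a synchronous send completes only once the parent has matched it, bounding the window bounds how many unexpected segments a leaf can pile up at its parent.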
Adrian Knoth	e3d35258b4	Cosmetics. Brian fixes my crappy code and I fix the curly braces. That's teamwork, right? ;) This commit was SVN r14517.	2007-04-25 20:17:19 +00:00
Brian Barrett	4b8bb70afb	A couple cleanups for the IPv6 support: - make opal_sockaddr2str() take a sockaddr_storage instead of a sockaddr_in6 so that it works for IPv4 and IPv6 addresses, and remove a whole bunch of #ifs in the OOB code. - Fix a compiler warning in the TCP BTL due to run-time determined array size by making it a dynamically allocated array. - Fix the unpacking code of IPv4 addresses when using IPv6 support, so that the address is in the correct location (instead of in an IPv6 structure, use an IPv4 structure). Refs trac:1005. This commit was SVN r14514. The following Trac tickets were found above: Ticket 1005 --> https://svn.open-mpi.org/trac/ompi/ticket/1005	2007-04-25 19:08:07 +00:00
Adrian Knoth	d1ce39de4f	Move mca_btl_tcp_addr_isipv4public to opal_addr_isipv4public This commit was SVN r14512.	2007-04-25 18:06:06 +00:00
Donald Kerr	80d984441f	change so that we only check connection queue when expecting a connection; create a mca parameter that controls frequency at which the async queue is checked This commit was SVN r14511.	2007-04-25 17:46:25 +00:00
Jeff Squyres	c4c68e666a	Merge in the ipv6 work from /tmp/ipv6-merge. This commit was SVN r14503.	2007-04-25 01:55:40 +00:00
Donald Kerr	cae24fcde1	move mca parameter registration into own .c and .h files This commit was SVN r14493.	2007-04-24 18:34:16 +00:00
Josh Hursey	8c2385416f	Per a developer request - Make sure that the wrapper selection is compiled out if not enabling FT. Before the logic would skip over it since the conditional if statements would not be satisfied, now there are no additional if statements when compiled out. With this modification the selection logic looks nearly identical to pre-r14051 with the exception of the non-FT related improvements. This commit was SVN r14491. The following SVN revision numbers were found above: r14051 --> open-mpi/ompi@dadca7da88	2007-04-24 17:08:48 +00:00
Ralph Castain	18b2dca51c	Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd really have to try). This also involved a slight change to the oob.xcast API, so propagated that as required. Note: this has only been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-) Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately. This commit was SVN r14475.	2007-04-23 18:41:04 +00:00
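The binomial relay mentioned above (the HNP sends to a subset of orteds, which relay onward along the usual log-2 pattern) follows the textbook binomial tree layout sketched below. This is not ORTE's code, which the log says still had a bug; rank 0 plays the HNP and the function names are hypothetical.

```c
#include <stddef.h>

/* Children of rank in a binomial tree of n ranks rooted at 0: rank + m for
   every power of two m greater than rank, while rank + m < n. Writes them
   to out and returns how many there are. */
static size_t toy_binomial_children(size_t rank, size_t n, size_t *out)
{
    size_t count = 0;
    size_t m = 1;
    while (m <= rank) {
        m <<= 1;                 /* smallest power of two greater than rank */
    }
    for (; rank + m < n; m <<= 1) {
        out[count++] = rank + m;
    }
    return count;
}

/* Parent of a non-root rank: subtract its highest power of two. */
static size_t toy_binomial_parent(size_t rank)
{
    size_t m = 1;
    while ((m << 1) <= rank) {
        m <<= 1;                 /* highest power of two <= rank */
    }
    return rank - m;
}
```

With 8 ranks, rank 0 sends to 1, 2, and 4; rank 1 relays to 3 and 5; rank 2 to 6; rank 3 to 7. Every rank is reached within three steps instead of the root sending seven messages itself, as in linear mode.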
Donald Kerr	3f428af7b8	couple of minor changes to fix #973 and separated eager rdma fragments into structure only and data only area This commit was SVN r14470.	2007-04-23 17:41:34 +00:00
Jelena Pjesivac-Grbovic	53cbec7a09	Make coll/tuned dynamic rules more verbose (when prompted with --mca coll_base_verbose 1) This commit was SVN r14469.	2007-04-23 16:34:52 +00:00
Rich Graham	ce35761683	make sure not to go out of bounds. element i+1 of bml_btls is referenced, which for i = arr_size-1 is beyond the array dimensions. This commit was SVN r14464.	2007-04-22 21:43:34 +00:00
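The out-of-bounds pattern fixed in r14464 arises whenever a loop reads element i+1. The log does not show the bml_btls code, so the loop below is a hypothetical example: a check that a bandwidth array is sorted in non-increasing order, with the bound written as i + 1 < n so that arr[i + 1] is never read when i == n - 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Compare each element with its successor. The condition i + 1 < n keeps
   bw[i + 1] in bounds (and, unlike i < n - 1, cannot underflow when n == 0). */
static bool toy_is_sorted_desc(const double *bw, size_t n)
{
    for (size_t i = 0; i + 1 < n; ++i) {
        if (bw[i] < bw[i + 1]) {
            return false;
        }
    }
    return true;
}
```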
Sharon Melamed	cf3f41288b	Add pkey value MCA parameter. If this param is used, only ports with the actual pkey value will be initialized. This commit was SVN r14463.	2007-04-22 10:22:12 +00:00
Adrian Knoth	339dbf6cd5	Cosmetics. Enforcing style guide. This commit was SVN r14459.	2007-04-21 21:47:25 +00:00
Josh Hursey	4159b72a60	Some minor updates to go along with commit r14457 This commit was SVN r14458. The following SVN revision numbers were found above: r14457 --> open-mpi/ompi@2af38229c1	2007-04-21 21:24:44 +00:00
Josh Hursey	2af38229c1	Re-worked the implementation of the LAM-like coord component. It's a bit longer, but much more clear in its implementation I believe. Fundamentally it is the same, but is much more solid in the implementation. I created quite a few directed tests that this version of the implementation now passes. This commit was SVN r14457.	2007-04-21 20:35:01 +00:00
Jeff Squyres	0ba47105ed	Merge the /tmp/jms-installdirs-trunk branch into the trunk. This finally brings in functionality that is already on the 1.2 branch, and was developed and tested in the v1.2ofed branch (and other places). Short version of new features: * Support for ibv_fork_init() * Automatically fill in the openib BTL bandwidth value by querying the HCA port * Installdirs functionality * Fixes to always use -I in the Fortran wrapper compilers (#924) * Gleb's mpool updates * Remove some kruft in btl/openib/configure.m4, therefore fixing the harmless warnings noted in #665 * Bunches of updates to the Linux RPM spec file I.e., effectively the same thing that r14411 brought to the v1.2 branch. Also effectively brought in r14432 and r14433 (some fixes on top of the original r14411 commit to v1.2). Still need to bring in the moral equivalent of r14445 after this commit (fixes to installdirs). This commit was SVN r14449. The following SVN revision numbers were found above: r14411 --> open-mpi/ompi@83b31314ae r14432 --> open-mpi/ompi@a48f160595 r14433 --> open-mpi/ompi@68f346d2bc r14445 --> open-mpi/ompi@13d366b827	2007-04-21 00:15:05 +00:00
Josh Hursey	eef364546c	Check for NULL before trying to use the variable. This commit was SVN r14444.	2007-04-20 17:17:11 +00:00
Josh Hursey	12e5d0e817	ft_event Commit: - Move the PML Modex stuff out of the BML -- Abstraction violation. - Also fix the location of the add_procs with respect to the stage gates. This commit was SVN r14422.	2007-04-19 03:05:12 +00:00
Josh Hursey	d12ddcdb7a	Protect the free since if we never send any messages this could be NULL. This commit was SVN r14421.	2007-04-19 02:17:50 +00:00
George Bosilca	51fc2474f1	Don't keep the data attached to a fragment segmented when we have to move it into the unexpected queue. Instead pack the data in only one buffer. Now the code looks more optimized and clear, but I have a doubt about who is using this functionality. I think that all BTLs always return only one memory segment attached to the matching fragment (i.e. there is no unexpected iov type receive). This commit was SVN r14416.	2007-04-18 15:52:11 +00:00
George Bosilca	66a110e115	Add some comments on the internals of the bucket structure. Alter the cleanup function to make it more scalable. The memory fragmentation is still high, but at least in most of the cases (where all resources are correctly released before the cleanup) the code is now highly efficient. Before, the code executed in (N * (N-1))!, which takes a while when the number of allocated resources increases (which is the case when a lot of unexpected messages are created). The fix consists of checking if all items are freed, and if that's the case, not recreating the free items list (as we know that everything will be released). If this condition is not true, we fall back on the original execution path (which is still sub-sub-sub ... optimal). This commit was SVN r14406.	2007-04-17 20:43:30 +00:00
Jeff Squyres	82caceda08	A minor change to ROMIO's configure script: make it use exactly the same "restrict" check as the top-level OMPI configure.ac script so that it will guarantee to always get the same result. Therefore, the #define for restrict will always have the same value in both opal_config.h and romioconf.h, and we get 7 less warnings (6 in the IO ROMIO component, 1 in ROMIO itself) when compiling with icc on Linux (because PAC_C_RESTRICT and AC_C_RESTRICT would get different values for the "restrict" #define in this case). This commit was SVN r14387.	2007-04-17 03:10:06 +00:00
Adrian Knoth	e3178fd39f	Cosmetics. PTLs are now called BTLs. This commit was SVN r14382.	2007-04-16 10:12:27 +00:00
Josh Hursey	8f119d9063	Closes trac:977 Fix for memory corruption in the restarted process stack. This stemmed from the brute force method we were previously using. This commit fixes this by using a lighter weight solution focused in the r2 BML instead of above the PML. This is a more efficient and flexible solution, and it solves the original problem. In the process I pulled out the ft_event function in the tcp BTL and r2 BML into a set of *_ft.[c|h] files just to keep any updates to these code paths as isolated as possible to make merging easier on everyone. This commit was SVN r14371. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855 The following Trac tickets were found above: Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977	2007-04-14 02:06:05 +00:00
Jeff Squyres	51f286d737	Just like r14289 on the ORTE trunk: Per discussions with Brian and Ralph, make a slight correction in where components are installed. Use $pkglibdir, not $libdir/openmpi, so that when compiled in the orte trunk, components are installed to the right directory (because the component search path is checking $pkglibdir). This commit was SVN r14345. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r14289	2007-04-12 11:19:42 +00:00
Gleb Natapov	d41ca417e8	Delete declaration of non-existent functions and no longer relevant comment. This commit was SVN r14341.	2007-04-12 08:12:31 +00:00
George Bosilca	20f0ec584a	A tricky optimization. On my test machine it improves the bandwidth by about 3Mb/s out of 580Mb/s. But the real interest is for small to middle size unexpected messages. The unexpected messages are copied by the PML into its own unexpected buffers. Therefore, there is no reason to make a first copy in the TCP BTL. The BTL can hand the PML its own buffer, and can be sure that once the callback completes it can reuse the buffer, no matter what happened with the fragment. This commit was SVN r14320.	2007-04-12 04:52:29 +00:00
George Bosilca	88365518aa	Small cleanup. This commit was SVN r14319.	2007-04-12 04:34:53 +00:00
Galen Shipman	ebca0bb34e	fix for aggregated writes This commit was SVN r14314.	2007-04-11 22:07:19 +00:00
Galen Shipman	d7e428909e	two fixes, one mine, the other gleb's, I'm committing for gleb due to time difference... 1) The PML makes an assumption on local/remote completion semantics of the BTL which Self BTL does not obey, nor should it, so we fix the PML 2) The Get protocol must handle the case when sender and receiver do not agree on whether the data is contiguous This commit was SVN r14313.	2007-04-11 22:03:06 +00:00
Josh Hursey	fbc59f668c	fix typo This commit was SVN r14301.	2007-04-11 15:39:42 +00:00
Josh Hursey	5efae25390	No functionality changes (yet). Just fix the indentation to meet the coding standard. This commit was SVN r14300.	2007-04-11 15:19:51 +00:00
Jeff Squyres	85d7678350	Revert r14286; it worked for icc, but not for gcc. #$%@#$% Sorry for configure changes during the day; I totally forgot about that. :-( This commit was SVN r14288. The following SVN revision numbers were found above: r14286 --> open-mpi/ompi@0083eba18e	2007-04-10 15:42:59 +00:00
Jeff Squyres	0083eba18e	Comment out the PAC_C_RESTRICT test from ROMIO's configure.in script. The top-level OMPI configure script already checks for "restrict" and will issue a #define for it. PAC_C_RESTRICT would also check for restrict, but sometimes come up with a different answer than the top-level OMPI configure script, thereby resulting in conflicting #define's for "restrict" (e.g., icc 9.0/9.1 on linux x86-64). So it's easiest just to remove this test from ROMIO's configure.in script. This commit was SVN r14286.	2007-04-10 14:50:47 +00:00
Josh Hursey	38547459ae	Improve the cleanup process in ob1 Remove a redundant statement in the r2 BML. This commit was SVN r14228. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2007-04-05 17:37:29 +00:00
Josh Hursey	98fb9f26ef	Some cleanup. - Remove an old comment from crcp_base_fns.c - Let ob1 have its very own ft_event function (which I'll fill in shortly) - Make sure ob1 finalizes the bsend stuff so we don't leave a bunch of memory sitting around - PML base - destruct the array upon finalize. Shrink the include search so it stops after finding a match This commit was SVN r14222.	2007-04-05 13:52:05 +00:00
Josh Hursey	a8918fe3d5	pedantic cleanup. Switch loop to lowest rank sends first This commit was SVN r14215.	2007-04-04 14:23:45 +00:00
Li-Ta Lo	ec8a859a44	fixed typo This commit was SVN r14207.	2007-04-03 17:21:54 +00:00
George Bosilca	667bda0fef	Rework the code a little bit to make things simpler. This commit was SVN r14203.	2007-04-03 16:05:51 +00:00
Josh Hursey	51daa15f9c	play a bit nicer with references. This commit was SVN r14201.	2007-04-02 22:27:52 +00:00
Josh Hursey	5ff1c10e70	minor cleanup This commit was SVN r14199.	2007-04-02 20:39:36 +00:00
Josh Hursey	b0b91a5fde	A couple more fixes for async case. Mostly working again, 1 small bug I'm still tracking. This commit was SVN r14198.	2007-04-02 20:00:58 +00:00
Josh Hursey	71937c3eaf	A bit of cleanup for async case... Still one bug in there. This commit was SVN r14197.	2007-04-02 19:25:22 +00:00
George Bosilca	120cf76ad8	Remove some warnings. This commit was SVN r14196.	2007-04-02 19:11:06 +00:00
George Bosilca	8273c5eeba	Correct an error introduced by commit r14180. This commit was SVN r14191. The following SVN revision numbers were found above: r14180 --> open-mpi/ompi@1cb26e3b9c	2007-04-02 02:59:23 +00:00
George Bosilca	f2a6b9394f	Deal with the include spree. Protect "environ" on Windows. Some other minor modifications to make it compile [again] on Windows. This commit was SVN r14188.	2007-04-01 16:16:54 +00:00
Tim Prins	80e047b843	make the mx btl compile again... This commit was SVN r14183.	2007-04-01 02:49:23 +00:00
George Bosilca	1cb26e3b9c	Finally, the convertor exports a convenience function to allow a consistent computation of the current location in the pack/unpack process. This can be used both for retrieving the pointer to the first byte (in the special case of the cached RDMA protocol) and for getting the current position (for the pipelined protocol). I modified all BTLs, but most of them are still untested. This commit was SVN r14180.	2007-03-30 22:02:45 +00:00
Galen Shipman	a78672be2b	fix mpi_leave_pinned case for arbitrary datatypes. George will be streamlining this with a new convertor function soon... This commit was SVN r14174.	2007-03-30 02:06:08 +00:00
Galen Shipman	db63458495	bring disable_sbrk back online. There was a change to properly support AIX some time ago (last summer) that included checking for M_TRIM_THRESHOLD and M_MMAP_MAX; unfortunately we didn't include <malloc.h>, which is where these are defined, so disabling sbrk for the registration cache has been busted for some time. This commit was SVN r14169.	2007-03-29 16:11:00 +00:00
George Bosilca	cc65814969	And set the message size before the first use too. This commit was SVN r14159.	2007-03-28 18:01:13 +00:00
George Bosilca	b540545fa7	Set the communicator size before using it. This commit was SVN r14158.	2007-03-28 17:59:21 +00:00
George Bosilca	78f362d0d6	Be consistent about the definitions of mca_mpool_base_page_size and mca_mpool_base_page_size_log. They are exported by mpool/base/base.h; if some other code needs them, it should include this file instead of having its own redefinition of these externals. This commit was SVN r14156.	2007-03-28 14:14:05 +00:00
Shiqing Fan	91cfb2f149	A few mismatched declarations are fixed, and several header files are added for Cygwin... This commit was SVN r14151.	2007-03-27 14:17:25 +00:00
Mohamad Chaarawi	bfaf9d4a12	Added new module for intercomm collectives. This will require an autogen. This commit was SVN r14149.	2007-03-27 02:06:42 +00:00
Brian Barrett	e283e6f9d9	Retry of r14142, without the one-sided code... Back out r14073 - it speeds up TCP latency / bandwidth but at the same time it kills ROMIO and one-sided performance when using only TCP. The problem is that it only allows those two to be progressed every couple of seconds, leading to what looks like hangs in the one-sided tests (and the ROMIO stuff, although people seem to not notice that at this point). This commit was SVN r14144. The following SVN revision numbers were found above: r14073 --> open-mpi/ompi@64fbbc20b8 r14142 --> open-mpi/ompi@241545a098	2007-03-26 16:01:27 +00:00
Brian Barrett	62e5e81e99	revert r14142, as the onesided change should not have come over This commit was SVN r14143. The following SVN revision numbers were found above: r14142 --> open-mpi/ompi@241545a098	2007-03-26 15:58:41 +00:00
Brian Barrett	241545a098	Back out r14073 - it speeds up TCP latency / bandwidth but at the same time it kills ROMIO and one-sided performance when using only TCP. The problem is that it only allows those two to be progressed every couple of seconds, leading to what looks like hangs in the one-sided tests (and the ROMIO stuff, although people seem to not notice that at this point). This commit was SVN r14142. The following SVN revision numbers were found above: r14073 --> open-mpi/ompi@64fbbc20b8	2007-03-26 15:56:23 +00:00
Gleb Natapov	e5450613b5	Add new SM BTL parameter btl_sm_cb_max_num. If set to a value greater than zero, it limits the number of circular buffers allocated between each pair of peers. This allows tighter control of memory usage. This commit was SVN r14120.	2007-03-22 12:21:42 +00:00
Gleb Natapov	efe0323d35	Initialize fifos at SM BTL init time instead of waiting for the first send. This wastes slightly more memory, but prevents a problem when a fifo cannot be allocated later during a job run because memory resources are exhausted. This commit was SVN r14119.	2007-03-22 12:18:44 +00:00
Galen Shipman	ace68b1883	Change the way we handle unexpected messages: if the message is less than or equal to pml_ob1_unexpected_limit, just buffer it in the PML-level recv fragment; otherwise allocate a buffer via the bucket allocator. This commit was SVN r14117.	2007-03-22 01:00:34 +00:00
Gleb Natapov	c389c47d79	Fix SM connectivity calculations. This commit was SVN r14109.	2007-03-21 13:29:19 +00:00
Gleb Natapov	a1a14aa4c3	Add memory barriers during SM btl initialization. This commit was SVN r14099.	2007-03-21 10:25:10 +00:00
Gleb Natapov	435565590f	Don't rely on the opcode to decide how to progress a pending message. This commit was SVN r14098.	2007-03-21 07:59:59 +00:00
Josh Hursey	299332ecac	fix small compiler warning This commit was SVN r14097.	2007-03-21 04:44:54 +00:00
Brian Barrett	464d536928	remove debugging printf This commit was SVN r14088.	2007-03-20 21:28:28 +00:00
Josh Hursey	3492fdeae3	Fix a couple of compiler warnings (errors?) caught by ICC testing at Cisco. This commit was SVN r14080.	2007-03-20 14:12:13 +00:00
George Bosilca	8c9e4baa47	Add multi-link capabilities to the TCP BTL. This is useful for systems where the latency is high and the network relatively fast. This will allow for more kernel-level buffering, which allows overlap between system calls and communications. Somehow, even on fast clusters there is an improvement (not significant). This patch creates multiple modules for the same device, which in turn creates multiple sockets between the peers. By default the number of BTLs per device is set to 1, so there is no fundamental difference from the current version. Change the value of btl_tcp_links to enable multiple links between peers. This commit was SVN r14076.	2007-03-20 11:50:17 +00:00

2544 Commits