openmpi

Author	SHA1	Message	Date
George Bosilca	50b26ebb6a	Allow the ompi_ddt_init and ompi_ddt_finalize to be visible even when the visibility feature is on. This commit was SVN r14726.	2007-05-23 14:02:08 +00:00
Sven Stork	88f0845c44	- let the pt2pt component compile with threads enabled This commit was SVN r14725.	2007-05-23 12:56:34 +00:00
Brian Barrett	38eab3613b	* Fix race condition with the pending_{in,out} variables -- if we're going to do while(...) { } then we can't change the variables in the ... atomically, but should do it while holding the module lock. * Fix dumb communicator creation error when we don't create the progress stuff (because a window already exists), where we would accidentally jump to the error case. This commit was SVN r14715.	2007-05-21 20:53:02 +00:00
Ralph Castain	4fff584a68	Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that did start. The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system. Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed. Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief. With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn. Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. 
This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put". This commit was SVN r14711.	2007-05-21 18:31:28 +00:00
Brian Barrett	0e9e0c518a	Fix a couple more progress thread related issues... This commit was SVN r14708.	2007-05-21 16:06:14 +00:00
Pavel Shamis	5ceaa605d7	Adding new vendor_part_id for Mellanox Hermon HCA This commit was SVN r14705.	2007-05-21 13:33:54 +00:00
Brian Barrett	1191677b76	Fix dumb threads-related compile issues This commit was SVN r14704.	2007-05-21 03:23:58 +00:00
Brian Barrett	2b4b754925	Some much needed cleanup of the point-to-point one-sided component... * Combine polling of the long requests and buffer requests into one type, and in one place * Associate the list of requests to poll with the component, not the individual modules * add progress thread that sits on the OMPI request structure and wakes up at the appropriate time to poll the message list. Not the best, but without some asynch notification from the PML that a given set of requests has completed, there isn't much better * Instead of calling opal_progress() all over the place, move to using the condition variables like the rest of the project. Has the advantage of moving it slightly further along in the becoming thread safe thing * Fix a problem with the passive side of unlock where it could go recursive and cause all kinds of problems, especially when progress threads are used. Instead, have two parts of passive unlock -- one to start the unlock, and another to complete the lock and send the ack back. The data moving code trips the second at the right time. This commit was SVN r14703.	2007-05-21 02:21:25 +00:00
Ralph Castain	fa5a40070d	Test the return status code from comm_dyn_start_processes - if we see an error, then let's report it and not continue on with the comm_spawn procedure! This commit was SVN r14699.	2007-05-18 20:22:32 +00:00
Donald Kerr	23280bd7da	remove an assignment which is not required This commit was SVN r14692.	2007-05-18 01:33:02 +00:00
Donald Kerr	588d5bd6a9	clean up compile warnings This commit was SVN r14691.	2007-05-17 23:37:47 +00:00
George Bosilca	7738079ab9	Remove unused variable. This commit was SVN r14689.	2007-05-17 20:01:30 +00:00
Gleb Natapov	b2c8fcdbab	Forgot to add a file in r14681. This commit was SVN r14682. The following SVN revision numbers were found above: r14681 --> open-mpi/ompi@3ebaff8dfe	2007-05-17 08:41:01 +00:00
Gleb Natapov	3ebaff8dfe	Implement new BTL parameters: We eagerly send data up to btl_*_eager_limit with the match. Upon ACK of the MATCH we start using send/receives of size btl_*_max_send_size up to the btl_*_rdma_pipeline_offset. After the btl_*_rdma_pipeline_offset we begin using RDMA writes of size btl_*_rdma_pipeline_frag_size. Now, on a per message basis we only use the above protocol if the message is larger than btl_*_min_rdma_pipeline_size. btl_*_eager_limit -> same; btl_*_max_send_size -> same; btl_*_rdma_pipeline_offset -> btl_*_min_rdma_size; btl_*_rdma_pipeline_frag_size -> btl_*_max_rdma_size; btl_*_min_rdma_pipeline_size is new. This patch also moves all BTL common parameters initialisation into btl_base_mca.c file. This commit was SVN r14681.	2007-05-17 07:54:27 +00:00
Brian Barrett	33a5758521	Some IPv6 improvements: * Move ipv6comat.h code into opal_config_bottom.h and change into some more intelligent testing of structures * Change opal's if interface to use sockaddr instead of sockaddr_storage, as the RFCs suggest we do * Move the networking code in opal that isn't directly related to if detection into net.h * Add quicky function to get the port out of either a sockaddr_in or sockaddr_in6, saving a bunch of code in the oob. * Update TCP oob and btl with new interface This commit was SVN r14679.	2007-05-17 01:17:59 +00:00
Donald Kerr	c40307fd27	add user warning message to inform when udapl btl is no longer able to register memory This commit was SVN r14678.	2007-05-16 21:04:50 +00:00
Brian Barrett	7708c4f887	Don't complain about unsupported protocols. Needs to be made better, but this will quit the whining from platforms where the kernel doesn't have IPv6 support. This commit was SVN r14676.	2007-05-16 20:11:47 +00:00
Sven Stork	22af6d38e6	- UNexport symbols that shouldn't be needed outside the libraries - replace #if/#endif with BEGIN/END_C_DECLS - reformatting This commit was SVN r14669.	2007-05-16 15:46:52 +00:00
Sven Stork	bd29eb9bd1	- backout commit r14667, because internal functionality shouldn't be exported. NOTE: if visibility is enabled "make check" will fail This commit was SVN r14668. The following SVN revision numbers were found above: r14667 --> open-mpi/ompi@1f526a95e9	2007-05-16 15:43:44 +00:00
Sven Stork	1f526a95e9	- we need to export these internal symbols because the tests in test/memory need them. This commit was SVN r14667.	2007-05-16 15:14:31 +00:00
Gleb Natapov	61e889a1d9	Fix breakage of GM by r13921. On receive GM provides only buffer pointer without any context so we need to save a context somewhere so it can be retrieved given only buffer pointer. This patch saves context (pointer to frag) just before start of a buffer so it can be easily retrieved. This commit was SVN r14664. The following SVN revision numbers were found above: r13921 --> open-mpi/ompi@90fb58de4f	2007-05-16 12:20:58 +00:00
Donald Kerr	2ed72bf2e2	break evd_qlen into individual qlens (async,dto,conn); add checks based on udapl limits and number of peers This commit was SVN r14659.	2007-05-15 17:47:00 +00:00
Pavel Shamis	cd87b05711	Added check for IBV_EVENT_CLIENT_REREGISTER async event that did not exist in old openib gen2 versions (Ticket #1025) This commit was SVN r14658.	2007-05-15 13:53:49 +00:00
Sven Stork	91fa494f0e	- another missing symbol This commit was SVN r14657.	2007-05-15 13:38:50 +00:00
Sven Stork	18a5747799	- this symbol is (at least) used by the basic collective component This commit was SVN r14654.	2007-05-15 12:48:58 +00:00
Brian Barrett	21e00f6f0c	Clean up a couple of configure things: * Require Autoconf 2.60 or higher and remove some cruft required for AC 2.59 or the AC 2.59 / AC 2.60 mix * Remove a bunch of now unnecessary AC_SUBST calls * Use the libtool-provided variables for the -I and library to use when compiling against ltdl Fixes trac:1000 This commit was SVN r14652. The following Trac tickets were found above: Ticket 1000 --> https://svn.open-mpi.org/trac/ompi/ticket/1000	2007-05-15 04:23:48 +00:00
Jeff Squyres	92090967b1	Add definitions for Hermon/ConnectX Mellanox HCA This commit was SVN r14639.	2007-05-10 12:27:51 +00:00
Donald Kerr	436d370d51	latency improvements: use ompi_free_list_init_ex, create optimal alignment parameter, remove rdma guarantee path, replace dat_lmt_sync_rdma with use of volatile This commit was SVN r14634.	2007-05-09 19:41:25 +00:00
Gleb Natapov	2562253678	Do more work at RDMA frag preparation time and less work at RDMA frag sending time. This commit was SVN r14627.	2007-05-09 12:11:51 +00:00
Gleb Natapov	78fda79630	Use size_t instead of uint64_t in call to convertor cloning. This commit was SVN r14626.	2007-05-09 10:02:06 +00:00
Pavel Shamis	e2d0e27111	Adding: * openib_finalize flow for openib btl * async event handler for openib btl This commit was SVN r14623.	2007-05-08 21:47:21 +00:00
Terry Dontje	f864348f97	Put an ifdef to conditionalize the use of memcpy for sparcv9 platforms to avoid alignment issues. This commit fixes trac:1009. This commit was SVN r14608. The following Trac tickets were found above: Ticket 1009 --> https://svn.open-mpi.org/trac/ompi/ticket/1009	2007-05-08 17:17:34 +00:00
Jeff Squyres	ecf5a3b8dd	Fix compiler warning This commit was SVN r14604.	2007-05-08 13:12:50 +00:00
Sven Stork	d0c936ca85	- export required symbols or library is useless This commit was SVN r14595.	2007-05-07 13:59:37 +00:00
Sven Stork	32564d73da	- the define is always defined independent, therefore we need to check if it's 1 or not This commit was SVN r14594.	2007-05-07 13:18:09 +00:00
Jeff Squyres	86f48bf0d7	Fix ordering of dimensions for status arrays. Thanks to Randy Bramley for noticing the problem. This commit was SVN r14591.	2007-05-05 05:02:02 +00:00
Sven Stork	a04c8eb39a	- Bring over the visibility feature, for a finer symbol export control via the visibility feature that is provided by some compilers. Per default this feature is disabled, to enable it you need to configure with --enable-visibility and obviously you need a compiler with visibility support. Please refer to the wiki for more information. https://svn.open-mpi.org/trac/ompi/wiki/Visibility This commit was SVN r14582.	2007-05-04 09:03:37 +00:00
Jelena Pjesivac-Grbovic	625c6739ab	Removing warning about unused variable This commit was SVN r14579.	2007-05-03 20:26:41 +00:00
Gleb Natapov	8029893489	In multithreaded application sending of initial portion of a request may overlap with RDMAing the rest of it. Also more than one RDMA writes can be performed simultaneously by different threads. To make this code thread safe this patch clones original request convertor for each RDMA fragment. This commit was SVN r14574.	2007-05-03 09:13:17 +00:00
Jelena Pjesivac-Grbovic	9eff74ad4d	Modifying generalized reduce "synchronized" behavior: - Removing "small" message size limit because it really does not relate to the eager size across the board. Now, the leaf nodes in generalized reduce will use blocking send (DEFAULT/ORIGINAL BEHAVIOR) either when the maximum number of outstanding requests is 0 or when the total number of segments is less than the maximum number of outstanding requests. Otherwise, it will send messages using non-blocking synchronized send operation. This commit was SVN r14572.	2007-05-02 21:42:45 +00:00
George Bosilca	69642a9cd4	Remove 2 warnings about ptrdiff_t to unsigned long implicit conversion. This commit was SVN r14565.	2007-05-01 19:47:33 +00:00
Brian Barrett	a25ce44dc1	Clean up the preconnect code: * Don't need the 2 process case -- we'll send an extra message, but at very little cost and less code is better. * Use COMPLETE sends instead of STANDARD sends so that the connection is fully established before we move on to the next connection. The previous code was still causing minor connection flooding for huge numbers of processes. * mpi_preconnect_all now connects both OOB and MPI layers. There's also mpi_preconnect_mpi and mpi_preconnect_oob should you want to be more specific. * Since we're only using the MCA parameters once at the beginning of time, no need for global constants. Just do the quick param lookup right before the parameter is needed. Save some of that global variable space for the next guy. Fixes trac:963 This commit was SVN r14553. The following Trac tickets were found above: Ticket 963 --> https://svn.open-mpi.org/trac/ompi/ticket/963	2007-05-01 04:49:36 +00:00
Adrian Knoth	d63d125a88	I guess we only need this when IPv6 is enabled. This commit was SVN r14551.	2007-04-29 16:38:34 +00:00
Adrian Knoth	5765ecc22e	This patch reverts r14549 while retaining IPv6 support. Re #1008 This commit was SVN r14550. The following SVN revision numbers were found above: r14549 --> open-mpi/ompi@386baed55b	2007-04-29 16:23:11 +00:00
Adrian Knoth	386baed55b	Hotfix for IPv6 support. Closes trac:1008 This commit was SVN r14549. The following Trac tickets were found above: Ticket 1008 --> https://svn.open-mpi.org/trac/ompi/ticket/1008	2007-04-29 13:46:45 +00:00
George Bosilca	bb481273a6	Typos. This commit was SVN r14546.	2007-04-28 19:15:53 +00:00
George Bosilca	46265db0a9	Update the TCP BTL in order to bring back some of the functionalities lost during the IPv6 patch. The most important is the multi BTL support. There was a quite interesting bug. Instead of setting up the multiple connections over different physical devices, based on the time when these connections were created most of the time they were all using the same physical network. Which, of course, was not the intended goal, as we top out at the maximum bandwidth available over one device instead of gathering all available bandwidth from all devices. Second, the IPv6 RFC suggests using sockaddr_storage as a holder for the IP information, but using a sockaddr* when we pass it to functions. This is only partially corrected by this patch. Some other minor cleanups. This commit was SVN r14544.	2007-04-28 19:13:47 +00:00
Josh Hursey	4c453caab6	Make the check a bit better This commit was SVN r14542.	2007-04-27 17:38:36 +00:00
Josh Hursey	486f29eb6b	Make sure to use the new metadata flags This commit was SVN r14541.	2007-04-27 17:18:26 +00:00
Sven Stork	8d92773067	- export required symbol This commit was SVN r14536.	2007-04-27 11:38:45 +00:00
Rainer Keller	63b904ed1d	- Don't segfault, when calling PERUSE_Init before MPI_Init... This commit was SVN r14535.	2007-04-26 21:06:08 +00:00
Rainer Keller	1aceece03f	- Add a few comments for elements for structs, a few spelling fixes. No functional change. This commit was SVN r14534.	2007-04-26 21:03:38 +00:00
Rainer Keller	ce32b918da	- Fixes for unlocking the mutex in case of error in functions mca_btl_openib_post_srr and btl_openib_endpoint_post_rr This commit was SVN r14530.	2007-04-26 13:33:02 +00:00
Sven Stork	fe3b08004e	- export symbols that are needed by the fortran libs This commit was SVN r14527.	2007-04-26 09:34:41 +00:00
Rainer Keller	6f9251ed39	- Small fixes by PGI -Minform=inform This commit was SVN r14524.	2007-04-26 08:16:07 +00:00
Josh Hursey	af38efd27c	Use more of the datatype engine supplied functions This commit was SVN r14519.	2007-04-26 00:06:22 +00:00
Jelena Pjesivac-Grbovic	3eac49aa59	Adding flow control for leaf nodes in generalized reduce structure. This "feature" is disabled by default and it should not affect the current performance. In case when the message size is large and segment size is smaller than eager size for particular interface, the leaf nodes in generalized reduce function can overflood parent nodes by sending all segments without any synchronization. This can cause the parent to have HIGH number of unexpected messages (think 16MB message with 1KB segments for example). In case of binomial algorithm root node always has at least one child which is leaf, so this can potentially affect the root's performance significantly [Especially in large communicators where root may have quite a few children (binomial tree for example)]. When the segment size is bigger than the eager size, rendezvous protocol ensures that this does not happen so it is not necessary. Originally, the problem was exposed in "infinite" bucket allocator clean up time for "small" segment sizes (which may explain some "deadlocks" on Thunderbird tests). To prevent this, we allow user to specify mca parameter "--mca coll_tuned_reduce_algorithm_max_requests NUM" this limits number of outstanding messages from a leaf node in generalized reduce to the parent to NUM. Messages are sent as non-blocking synchronous messages, so synchronization happens at "wait" time. The synchronization actually improved performance of pipeline and binomial algorithm for large message sizes with 1KB segments over MX, but I need to test it some more to make sure it is consistent. Since there is no easy way to find out what is "the eager" size for particular btl, I set the limit to 4000B. If message/individual segment size is greater than 4000B - we will not use this feature. This variable may or may not be exposed as mca parameter later... 
I did not have any problems running it and both "default" and "synchronous" tests passed Intel Reduce* tests up to 80 processes (over MX). This commit was SVN r14518.	2007-04-25 20:39:53 +00:00
Adrian Knoth	e3d35258b4	Cosmetics. Brian fixes my crappy code and I fix the curly braces. That's teamwork, right? ;) This commit was SVN r14517.	2007-04-25 20:17:19 +00:00
Brian Barrett	4b8bb70afb	A couple cleanups for the IPv6 support: - make opal_sockaddr2str() take a sockaddr_storage instead of a sockaddr_in6 so that it works for IPv4 and IPv6 addresses, and remove a whole bunch of #ifs in the OOB code. - Fix a compiler warning in the TCP BTL due to run-time determined array size by making it a dynamically allocated array. - Fix the unpacking code of IPv4 addresses when using IPv6 support, so that the address is in the correct location (instead of in an IPv6 structure, use an IPv4 structure). Refs trac:1005. This commit was SVN r14514. The following Trac tickets were found above: Ticket 1005 --> https://svn.open-mpi.org/trac/ompi/ticket/1005	2007-04-25 19:08:07 +00:00
Adrian Knoth	d1ce39de4f	Move mca_btl_tcp_addr_isipv4public to opal_addr_isipv4public This commit was SVN r14512.	2007-04-25 18:06:06 +00:00
Donald Kerr	80d984441f	change so that we only check connection queue when expecting a connection; create a mca parameter that controls frequency at which the async queue is checked This commit was SVN r14511.	2007-04-25 17:46:25 +00:00
Ralph Castain	7d0f51e6b9	Begin setting up for a change to the OOB information passing functionality - this is totally transparent at the moment (need to change computers). This commit was SVN r14510.	2007-04-25 17:36:26 +00:00
Jeff Squyres	c4c68e666a	Merge in the ipv6 work from /tmp/ipv6-merge. This commit was SVN r14503.	2007-04-25 01:55:40 +00:00
Donald Kerr	cae24fcde1	move mca parameter registration into own .c and .h files This commit was SVN r14493.	2007-04-24 18:34:16 +00:00
Josh Hursey	8c2385416f	Per a developer request - Make sure that the wrapper selection is compiled out if not enabling FT. Before the logic would skip over it since the conditional if statements would not be satisfied, now there are no additional if statements when compiled out. With this modification the selection logic looks nearly identical to pre-r14051 with the exception of the non-FT related improvements. This commit was SVN r14491. The following SVN revision numbers were found above: r14051 --> open-mpi/ompi@dadca7da88	2007-04-24 17:08:48 +00:00
Ralph Castain	18b2dca51c	Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd really have to try). This also involved a slight change to the oob.xcast API, so propagated that as required. Note: this has only been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-) Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately. This commit was SVN r14475.	2007-04-23 18:41:04 +00:00
Donald Kerr	3f428af7b8	couple of minor changes to fix #973 and separated eager rdma fragments into structure only and data only area This commit was SVN r14470.	2007-04-23 17:41:34 +00:00
Jelena Pjesivac-Grbovic	53cbec7a09	Make coll/tuned dynamic rules more verbose (when prompted with --mca coll_base_verbose 1) This commit was SVN r14469.	2007-04-23 16:34:52 +00:00
Rich Graham	ce35761683	make sure not to go out of bounds. element i+1 of bml_btls is referenced, which for i = arr_size-1 is beyond the array dimensions. This commit was SVN r14464.	2007-04-22 21:43:34 +00:00
Sharon Melamed	cf3f41288b	Add pkey value MCA parameter. If this parameter is used, only ports with the actual pkey value will be initialized. This commit was SVN r14463.	2007-04-22 10:22:12 +00:00
Adrian Knoth	339dbf6cd5	Cosmetics. Enforcing style guide. This commit was SVN r14459.	2007-04-21 21:47:25 +00:00
Josh Hursey	4159b72a60	Some minor updates to go along with commit r14457 This commit was SVN r14458. The following SVN revision numbers were found above: r14457 --> open-mpi/ompi@2af38229c1	2007-04-21 21:24:44 +00:00
Josh Hursey	2af38229c1	Re-worked the implementation of the LAM-like coord component. It's a bit longer, but much more clear in its implementation I believe. Fundamentally it is the same, but is much more solid in the implementation. I created quite a few directed tests that this version of the implementation now passes. This commit was SVN r14457.	2007-04-21 20:35:01 +00:00
Jeff Squyres	401a072888	Revert r14435 -- it breaks compiling on Linux with at least the PGI compiler suite. The rule is that ompi_info.h is supposed to be the ''first'' file included so that it can affect system header files if necessary (and it is sometimes necessary, such as with the PGI compiler suite). If this breaks VC on Windows, we'll have to find another fix. More on the mailing list... This commit was SVN r14453. The following SVN revision numbers were found above: r14435 --> open-mpi/ompi@a20b43ace9	2007-04-21 00:51:31 +00:00
Jeff Squyres	5bebd24250	Bring over Brian's installdirs fixes from this afternoon (r14445). This commit was SVN r14450. The following SVN revision numbers were found above: r14445 --> open-mpi/ompi@13d366b827	2007-04-21 00:16:31 +00:00
Jeff Squyres	0ba47105ed	Merge the /tmp/jms-installdirs-trunk branch into the trunk. This finally brings in functionality that is already on the 1.2 branch, and was developed and tested in the v1.2ofed branch (and other places). Short version of new features: * Support for ibv_fork_init() * Automatically fill in the openib BTL bandwidth value by querying the HCA port * Installdirs functionality * Fixes to always use -I in the Fortran wrapper compilers (#924) * Gleb's mpool updates * Remove some kruft in btl/openib/configure.m4, therefore fixing the harmless warnings noted in #665 * Bunches of updates to the Linux RPM spec file I.e., effectively the same thing that r14411 brought to the v1.2 branch. Also effectively brought in r14432 and r14433 (some fixes on top of the original r14411 commit to v1.2). Still need to bring in the moral equivalent of r14445 after this commit (fixes to installdirs). This commit was SVN r14449. The following SVN revision numbers were found above: r14411 --> open-mpi/ompi@83b31314ae r14432 --> open-mpi/ompi@a48f160595 r14433 --> open-mpi/ompi@68f346d2bc r14445 --> open-mpi/ompi@13d366b827	2007-04-21 00:15:05 +00:00
Josh Hursey	eef364546c	Check for NULL before trying to use the variable. This commit was SVN r14444.	2007-04-20 17:17:11 +00:00
Jeff Squyres	40f4b60a2a	Use #ifdef, not #if This commit was SVN r14439.	2007-04-20 14:26:20 +00:00
Shiqing Fan	a20b43ace9	The header file ompi_config.h should be included in the ompi_info.h file but not in the .cc files. This allows it to be compiled with the VC compiler, and it makes no difference under Linux systems. This commit was SVN r14435.	2007-04-20 09:03:16 +00:00
Josh Hursey	12e5d0e817	ft_event Commit: - Move the PML Modex stuff out of the BML -- Abstraction violation. - Also fix the location of the add_procs with respect to the stage gates. This commit was SVN r14422.	2007-04-19 03:05:12 +00:00
Josh Hursey	d12ddcdb7a	Protect the free since if we never send any messages this could be NULL. This commit was SVN r14421.	2007-04-19 02:17:50 +00:00
George Bosilca	51fc2474f1	Don't keep the data attached to a fragment segmented when we have to move it into the unexpected queue. Instead pack the data in only one buffer. Now the code looks more optimized and clear, but I have a doubt about who's using this functionality. I think that all BTLs always return only one memory segment attached to the matching fragment (i.e. there is no unexpected iov type receive). This commit was SVN r14416.	2007-04-18 15:52:11 +00:00
Sven Stork	037b01ce9e	- more symbols that need to be exported This commit was SVN r14415.	2007-04-18 14:53:56 +00:00
George Bosilca	66a110e115	Add some comments on the internals of the bucket structure. Alter the cleanup function to make it more scalable. The memory fragmentation is still high, but at least in most of the cases (where all resources are correctly released before the cleanup) the code is now highly efficient. Before, the code executed in (N * (N-1))!, which takes a while when the number of allocated resources increases (which is the case when a lot of unexpected messages are created). The fix consists of checking if all items are freed and, if that is the case, not recreating the free items list (as we know that everything will be released). If this condition is not true, we fall back on the original execution path (which is still sub-sub-sub ... optimal). This commit was SVN r14406.	2007-04-17 20:43:30 +00:00
Jeff Squyres	82caceda08	A minor change to ROMIO's configure script: make it use exactly the same "restrict" check as the top-level OMPI configure.ac script so that it will guarantee to always get the same result. Therefore, the #define for restrict will always have the same value in both opal_config.h and romioconf.h, and we get 7 less warnings (6 in the IO ROMIO component, 1 in ROMIO itself) when compiling with icc on Linux (because PAC_C_RESTRICT and AC_C_RESTRICT would get different values for the "restrict" #define in this case). This commit was SVN r14387.	2007-04-17 03:10:06 +00:00
Adrian Knoth	e3178fd39f	Cosmetics. PTLs are now called BTLs. This commit was SVN r14382.	2007-04-16 10:12:27 +00:00
Josh Hursey	8f119d9063	Closes trac:977 Fix for memory corruption in the restarted process stack. This stemmed from the brute force method we were previously using. This commit fixes this by using a lighter weight solution focused in the r2 BML instead of above the PML. This is a more efficient and flexible solution, and it solves the original problem. In the process I pulled out the ft_event function in the tcp BTL and r2 BML into a set of *_ft.[c|h] files just to keep any updates to these code paths as isolated as possible to make merging easier on everyone. This commit was SVN r14371. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855 The following Trac tickets were found above: Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977	2007-04-14 02:06:05 +00:00
Jeff Squyres	51f286d737	Just like r14289 on the ORTE trunk: Per discussions with Brian and Ralph, make a slight correction in where components are installed. Use $pkglibdir, not $libdir/openmpi, so that when compiled in the orte trunk, components are installed to the right directory (because the component search path is checking $pkglibdir). This commit was SVN r14345. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r14289	2007-04-12 11:19:42 +00:00
Gleb Natapov	d41ca417e8	Delete declaration of non-existent functions and no longer relevant comment. This commit was SVN r14341.	2007-04-12 08:12:31 +00:00
George Bosilca	20f0ec584a	A tricky optimization. On my test machine it improves the bandwidth by about 3Mb/s out of 580Mb/s. But the real interest is for small to middle size unexpected messages. The unexpected messages are copied by the PML into its own unexpected buffers. Therefore, there is no reason to make a first copy in the TCP BTL. The BTL can hand the PML its own buffer, and can be sure that once the callback has completed it can reuse the buffer, no matter what happened with the fragment. This commit was SVN r14320.	2007-04-12 04:52:29 +00:00
George Bosilca	88365518aa	Small cleanup. This commit was SVN r14319.	2007-04-12 04:34:53 +00:00
George Bosilca	6b217d31e1	Add OPAL_LIKELY where necessary. This commit was SVN r14318.	2007-04-12 04:32:07 +00:00
Galen Shipman	ebca0bb34e	fix for aggregated writes This commit was SVN r14314.	2007-04-11 22:07:19 +00:00
Galen Shipman	d7e428909e	two fixes, one mine, the other gleb's, I'm committing for gleb due to time difference... 1) The PML makes an assumption on local/remote completion semantics of the BTL which Self BTL does not obey, nor should it, so we fix the PML 2) The Get protocol must handle the case when sender and receiver do not agree on whether the data is contiguous This commit was SVN r14313.	2007-04-11 22:03:06 +00:00
Josh Hursey	fbc59f668c	fix typo This commit was SVN r14301.	2007-04-11 15:39:42 +00:00
Josh Hursey	5efae25390	No functionality changes (yet). Just fix the indentation to meet the coding standard. This commit was SVN r14300.	2007-04-11 15:19:51 +00:00
Jeff Squyres	85d7678350	Revert r14286; it worked for icc, but not for gcc. #$%@#$% Sorry for configure changes during the day; I totally forgot about that. :-( This commit was SVN r14288. The following SVN revision numbers were found above: r14286 --> open-mpi/ompi@0083eba18e	2007-04-10 15:42:59 +00:00
Jeff Squyres	0083eba18e	Comment out the PAC_C_RESTRICT test from ROMIO's configure.in script. The top-level OMPI configure script already checks for "restrict" and will issue a #define for it. PAC_C_RESTRICT would also check for restrict, but sometimes come up with a different answer than the top-level OMPI configure script, thereby resulting in conflicting #define's for "restrict" (e.g., icc 9.0/9.1 on linux x86-64). So it's easiest just to remove this test from ROMIO's configure.in script. This commit was SVN r14286.	2007-04-10 14:50:47 +00:00
Rich Graham	f481722bdf	move the code that sets the thread level information before the btl are initialized, so that the btl's have this information for correct setup. This commit was SVN r14258.	2007-04-07 05:06:47 +00:00
Tim Prins	f0e6a28a1f	pedantic indentation... This commit was SVN r14251.	2007-04-06 19:18:31 +00:00
Josh Hursey	38547459ae	Improve the cleanup process in ob1 Remove a redundant statement in the r2 BML. This commit was SVN r14228. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2007-04-05 17:37:29 +00:00
Josh Hursey	98fb9f26ef	Some cleanup. - Remove an old comment from crcp_base_fns.c - Let ob1 have its very own ft_event function (which I'll fill in shortly) - Make sure ob1 finalizes the bsend stuff so we don't leave a bunch of memory sitting around - PML base - destruct the array upon finalize. Shrink the include search so it stops after finding a match This commit was SVN r14222.	2007-04-05 13:52:05 +00:00
Josh Hursey	a8918fe3d5	pedantic cleanup. Switch loop to lowest rank sends first This commit was SVN r14215.	2007-04-04 14:23:45 +00:00
Edgar Gabriel	4d2b3e859d	fix the indenting from tabs to spaces :-) This commit was SVN r14211.	2007-04-03 21:33:44 +00:00
Edgar Gabriel	188f770d94	ok, increase the reference count on ompi_mpi_group_null twice when creating ompi_mpi_comm_null, since the destructor of ompi_mpi_comm_null will decrease the reference counter of ompi_mpi_group_null twice according to the last fix of Mohamad. Added also a lengthy comment in ompi_comm_finalize about why we do not decrease the reference counters for ompi_mpi_comm_null, ompi_mpi_group_null etc. for the parent communicator, although we do increase it in ompi_comm_init This commit was SVN r14210.	2007-04-03 21:16:26 +00:00
Li-Ta Lo	ec8a859a44	fixed typo This commit was SVN r14207.	2007-04-03 17:21:54 +00:00
George Bosilca	667bda0fef	Rework the code a little bit to make things simpler. This commit was SVN r14203.	2007-04-03 16:05:51 +00:00
George Bosilca	cb1b976486	Big update. Correct the behavior for true_lb and true_ub computation when the size of the data is zero. Now they are not updated, which leaves us with the correct memory layout in all situations (so far). Update all the comments to reflect exactly the supported behavior of the DDT engine. This commit was SVN r14202.	2007-04-03 16:05:15 +00:00
Josh Hursey	51daa15f9c	play a bit nicer with references. This commit was SVN r14201.	2007-04-02 22:27:52 +00:00
Josh Hursey	5ff1c10e70	minor cleanup This commit was SVN r14199.	2007-04-02 20:39:36 +00:00
Josh Hursey	b0b91a5fde	A couple more fixes for async case. Mostly working again, 1 small bug I'm still tracking. This commit was SVN r14198.	2007-04-02 20:00:58 +00:00
Josh Hursey	71937c3eaf	A bit of cleanup for async case... Still one bug in there. This commit was SVN r14197.	2007-04-02 19:25:22 +00:00
George Bosilca	120cf76ad8	Remove some warnings. This commit was SVN r14196.	2007-04-02 19:11:06 +00:00
Mohamad Chaarawi	0e98bf2ac6	quick fix for the cart create problem caused by the previous memory leak fix This commit was SVN r14195.	2007-04-02 19:06:52 +00:00
George Bosilca	8273c5eeba	Correct an error introduced by commit r14180. This commit was SVN r14191. The following SVN revision numbers were found above: r14180 --> open-mpi/ompi@1cb26e3b9c	2007-04-02 02:59:23 +00:00
George Bosilca	f2a6b9394f	Deal with the include spree. Protect "environ" on Windows. Some other minor modifications in order to make it compile [again] on Windows. This commit was SVN r14188.	2007-04-01 16:16:54 +00:00
Tim Prins	80e047b843	make the mx btl compile again... This commit was SVN r14183.	2007-04-01 02:49:23 +00:00
George Bosilca	f518a9c1f6	Remove some warnings from the data-type engine. This commit was SVN r14181.	2007-03-31 04:14:47 +00:00
George Bosilca	1cb26e3b9c	Finally the convertor exports a convenience function to allow a consistent computation of the current location in the pack/unpack process. This can be used both for retrieving the pointer to the first byte (in the special case of the cached RDMA protocol) and for getting the current position (for the pipelined protocol). I modified all BTLs, but most of them are still untested. This commit was SVN r14180.	2007-03-30 22:02:45 +00:00
Mohamad Chaarawi	8f4f992bfc	fixed the memory leak problem by decrementing the ref count on the remote group in case of Intra communicators. This needs to go in V1.2. We will file a move request on monday.. This commit was SVN r14179.	2007-03-30 19:30:40 +00:00
Galen Shipman	a78672be2b	fix mpi_leave_pinned case for arbitrary datatypes George will be streamlining this with a new convertor function soon... This commit was SVN r14174.	2007-03-30 02:06:08 +00:00
Galen Shipman	db63458495	Bring disable_sbrk back online. There was a change to properly support AIX some time ago (last summer) that included checking for M_TRIM_THRESHOLD and M_MMAP_MAX; unfortunately we didn't include <malloc.h>, which is where these are defined, so disabling sbrk for the registration cache has been busted for some time. This commit was SVN r14169.	2007-03-29 16:11:00 +00:00
George Bosilca	cb93b1d40d	Deal with compiler warnings and size_t at the same time ... It's getting more and more tricky !!! This commit was SVN r14162.	2007-03-28 22:02:13 +00:00
George Bosilca	4bc69447b4	Setting a size_t to -1 leads to unexpected results ... This commit was SVN r14160.	2007-03-28 18:23:42 +00:00
George Bosilca	cc65814969	And set the message size before the first use too. This commit was SVN r14159.	2007-03-28 18:01:13 +00:00
George Bosilca	b540545fa7	Set the communicator size before using it. This commit was SVN r14158.	2007-03-28 17:59:21 +00:00
George Bosilca	78f362d0d6	Be consistent about the definitions of mca_mpool_base_page_size and mca_mpool_base_page_size_log. They are exported by mpool/base/base.h; if some other code needs them, then it should include this file instead of having its own redefinition of these externals. This commit was SVN r14156.	2007-03-28 14:14:05 +00:00
Shiqing Fan	fb50a72e92	Unnecessary header removed. This commit was SVN r14152.	2007-03-27 14:32:30 +00:00
Shiqing Fan	91cfb2f149	A few mismatched declarations are fixed, and several header files are added for Cygwin... This commit was SVN r14151.	2007-03-27 14:17:25 +00:00
Mohamad Chaarawi	bfaf9d4a12	Added new module for intercomm collectives. This will require an autogen. This commit was SVN r14149.	2007-03-27 02:06:42 +00:00
Brian Barrett	e283e6f9d9	Retry of r14142, without the one-sided code... Back out r14073 - it speeds up TCP latency / bandwidth but at the same time it kills ROMIO and one-sided performance when using only TCP. The problem is that it only allows those two to be progressed every couple of seconds, leading to what looks like hangs in the one-sided tests (and the ROMIO stuff, although people seem to not notice that at this point). This commit was SVN r14144. The following SVN revision numbers were found above: r14073 --> open-mpi/ompi@64fbbc20b8 r14142 --> open-mpi/ompi@241545a098	2007-03-26 16:01:27 +00:00
Brian Barrett	62e5e81e99	revert r14142, as the onesided change should not have come over This commit was SVN r14143. The following SVN revision numbers were found above: r14142 --> open-mpi/ompi@241545a098	2007-03-26 15:58:41 +00:00
Brian Barrett	241545a098	Back out r14073 - it speeds up TCP latency / bandwidth but at the same time it kills ROMIO and one-sided performance when using only TCP. The problem is that it only allows those two to be progressed every couple of seconds, leading to what looks like hangs in the one-sided tests (and the ROMIO stuff, although people seem to not notice that at this point). This commit was SVN r14142. The following SVN revision numbers were found above: r14073 --> open-mpi/ompi@64fbbc20b8	2007-03-26 15:56:23 +00:00
Josh Hursey	7c4ca3c420	remove some stale code This commit was SVN r14134.	2007-03-23 14:11:12 +00:00
Gleb Natapov	e5450613b5	Add new SM BTL parameter btl_sm_cb_max_num. If set to a value greater than zero it limits the number of circular buffers allocated between each pair of peers. This allows for tighter memory usage control. This commit was SVN r14120.	2007-03-22 12:21:42 +00:00
Gleb Natapov	efe0323d35	Initialize fifos at SM BTL init time instead of waiting for the first send. This wastes slightly more memory, but prevents problems when a fifo cannot be allocated later during a job run because memory is exhausted. This commit was SVN r14119.	2007-03-22 12:18:44 +00:00
Galen Shipman	ace68b1883	Change the way we handle unexpected messages, if less than or equal pml_ob1_unexpected_limit just buffer in the PML level recv fragment else allocate a buffer via the bucket allocator This commit was SVN r14117.	2007-03-22 01:00:34 +00:00
Gleb Natapov	c389c47d79	Fix SM connectivity calculations. This commit was SVN r14109.	2007-03-21 13:29:19 +00:00
Jeff Squyres	3e2031e0e3	Finally commit something that has been sitting around in one of my development trees since last year (had to wait for some intel tests to run yesterday, so I finally took the time to finish this work): * Improve MPI API argument checking by also checking for NULL values (especially helps when invalid Fortran MPI handles are passed, because the various MPI_f2c functions are supposed to return an "invalid" MPI handle [meaning NULL] when this happens). So now OMPI will generate an MPI exception rather than a segv. Removed a few redundant DATATYPE_NULL checks. * Also check for some other forms of "invalid" handles (e.g., already been freed, etc.) in some cases. We could probably be a bit more stringent in this regard if we really wanted to. * Change MPI_Get_processor_name to zero out the string up to MPI_MAX_PROCESSOR_NAME characters, because the MPI spec says that the string must be at least that long. We were already passing that length to gethostname(), anyway. This commit was SVN r14100.	2007-03-21 11:10:42 +00:00
Gleb Natapov	a1a14aa4c3	Add memory barriers during SM btl initialization. This commit was SVN r14099.	2007-03-21 10:25:10 +00:00
Gleb Natapov	435565590f	Don't rely on the opcode to decide how to progress a pending message. This commit was SVN r14098.	2007-03-21 07:59:59 +00:00
Josh Hursey	299332ecac	fix small compiler warning This commit was SVN r14097.	2007-03-21 04:44:54 +00:00
Brian Barrett	464d536928	remove debugging printf This commit was SVN r14088.	2007-03-20 21:28:28 +00:00
Josh Hursey	3492fdeae3	Fix a couple of compiler warnings (errors?) caught by ICC testing at Cisco. This commit was SVN r14080.	2007-03-20 14:12:13 +00:00
George Bosilca	8c9e4baa47	Add multi-link capabilities to the TCP BTL. This is useful for systems where the latency is high and the network relatively fast. This will allow for more kernel level buffering, which allows overlap between system calls and communications. Somehow, even on fast clusters there is an improvement (not significant). This patch creates multiple modules for the same device, which in turn will create multiple sockets between the peers. By default the number of BTLs per device is set to 1, so there is no fundamental difference with the current version. Change the value of btl_tcp_links to enable multiple links between peers. This commit was SVN r14076.	2007-03-20 11:50:17 +00:00
George Bosilca	0edd770644	Nothing really relevant. This commit was SVN r14075.	2007-03-20 11:21:23 +00:00
George Bosilca	4332295b32	Typos. This commit was SVN r14074.	2007-03-20 11:18:05 +00:00
George Bosilca	64fbbc20b8	Switch the event engine to a blocking mode if there is no high performance networks available. This commit was SVN r14073.	2007-03-20 11:15:08 +00:00
Rainer Keller	249abd29c2	- Mark some deprecated functions (two still commented) and fix to not use opal_cmd_line_make_opt anymore. This commit was SVN r14072.	2007-03-20 10:08:58 +00:00
Gleb Natapov	e551c5f1a3	Get rid of the separate sm BTL for different shared memory base addresses. Now that we precalculate most of the addresses there is no point in having a separate BTL for this. The sm_progress() code becomes much simpler as a result. This commit was SVN r14071.	2007-03-20 08:15:58 +00:00

2715 commits