openmpi

Author	SHA1	Message	Date
Ralph Castain	84d63a46cd	Remove a hard-coded limit of 64 independent jobs that could connect/accept together This commit was SVN r23378.	2010-07-12 18:34:33 +00:00
Shiqing Fan	8de5654bf9	Add new files into the tarball. This commit was SVN r23377.	2010-07-12 16:21:46 +00:00
Shiqing Fan	cdc7e0bec9	Mainly type casts. Get rid of pthread and other unnecessary stuff for Windows. This commit was SVN r23376.	2010-07-12 16:17:56 +00:00
Shiqing Fan	e3be90ff22	Update CMake modules, adding initial support for openib. This commit was SVN r23373.	2010-07-12 15:28:37 +00:00
Jeff Squyres	87e17a41da	Ensure that the com_rules[] array entries are initialized to NULL in case individual entries aren't used, but dynamic rules are enabled (i.e., at least one or more of them are not NULL, meaning that they'll all be assumed to be either NULL or a valid value). This commit was SVN r23361.	2010-07-07 14:04:18 +00:00
Jeff Squyres	c8bb7537e7	Remove include/opal/sys/cache.h -- its only purpose in life was to #define CACHE_LINE_SIZE to 128. This name has a conflict on NetBSD, and it seems kinda odd to have a header file that ''only'' defines a single value. Also, we'll soon be raising hwloc to be a first-class item, so having this file around seemed kinda weird. Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the rationale that we can fill this in at runtime with hwloc info (trunk and v1.5/beyond, only). The only place we ''needed'' a compile-time CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value (128). That use isn't suitable for run-time hwloc information, anyway. This commit was SVN r23349.	2010-07-06 14:33:36 +00:00
Jeff Squyres	6d77118254	Fixes for FT code that came from recent shared memory updates. This commit was SVN r23348.	2010-07-06 12:58:48 +00:00
Jeff Squyres	e82e7f896e	These compile warnings have been around forever; I finally got inspired to fix them. This commit was SVN r23316.	2010-06-28 17:26:38 +00:00
Nadia Derbey	c22e6b3613	openib btl unsafe in case of extremely low srq settings This commit was SVN r23301.	2010-06-24 09:59:45 +00:00
Shiqing Fan	d391c57b0f	A more proper fix for the HANDLE definition. This commit was SVN r23269.	2010-06-14 14:17:07 +00:00
Samuel Gutierrez	2fb7c344fc	Added a new System V (sysv) shared memory component for Open MPI. Configure Option: --enable-sysv MCA Parameter: mpi_common_sm mpi_common_sm accepts a comma delimited list of: [sysv],mmap (order dependent). The first component that is successfully selected is used. For example, -mca mpi_common_sm sysv,mmap will first try sysv. If sysv is not successfully selected, then mmap will be used. mmap will be used if mpi_common_sm is not provided. Notes: Please make certain that your system's shmmax limit, or equivalent, is larger than mpool_sm_min_size. Otherwise, shmget may fail. This commit was SVN r23260.	2010-06-09 16:58:52 +00:00
George Bosilca	c8ee150c95	If we fail to correctly initialize the MX device, don't mark it as initialized. This commit was SVN r23238.	2010-06-02 15:00:42 +00:00
Jeff Squyres	e45be29f0d	This function shouldn't have an ibv_ prefix -- it's not part of verbs (it's just a static convenience function here in this file). This commit was SVN r23237.	2010-06-02 12:54:56 +00:00
Jeff Squyres	464bd8c56e	Fix typo This commit was SVN r23212.	2010-05-27 21:19:38 +00:00
Rolf vandeVaart	27f070a575	Start setting a flag when a port error is detected on the openib BTL. At this point, it is just cleared (and ignored) so default behavior has not changed. However, future failover support can take advantage of this flag. Reviewed by Pasha Shamis. This commit was SVN r23204.	2010-05-24 18:57:55 +00:00
Shiqing Fan	857f1669e2	Solve a few compilation problems on Windows. This commit was SVN r23193.	2010-05-21 14:30:15 +00:00
Edgar Gabriel	f6598138ba	fix some instances, where we might have allocated 0 bytes. Also, for allgather make sure that we do not call coll_gather and coll_bcast in the very same instances, since some collective (intra) modules do not seem to like the fact if they are called for scount or rcount being zero (for regular intra-communicator operations, this is handled on the MPI API layer). Fixes trac:2405 This commit was SVN r23188. The following Trac tickets were found above: Ticket 2405 --> https://svn.open-mpi.org/trac/ompi/ticket/2405	2010-05-20 22:23:44 +00:00
George Bosilca	b56ab33ff6	Indent and fix some uninitialized variables. This commit was SVN r23179.	2010-05-19 21:20:33 +00:00
George Bosilca	c51932c250	Don't forget to initialize "line" in all cases. This commit was SVN r23178.	2010-05-19 21:19:45 +00:00
Rolf vandeVaart	03b3e75f86	Add two arguments to the PML error callback function. This allows the BTL to specify a specific ompi_proc_t that had an error. Also add an optional descriptive string. Currently, arguments are not used but will be by future failover PML. Changes based on RFC. Reviewed by George Bosilca. This commit was SVN r23174.	2010-05-19 11:55:45 +00:00
Abhishek Kulkarni	c63c4d6892	Fix bugs where (OMPI_ERROR == ) checks cannot be converted to (OMPI_SUCCESS != ) since the return codes are overloaded to return an "index" on success. The fix is to just check if the return value is positive or not, since all the SOS encoded errors are always negative. The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index as a pointer. This commit was SVN r23173.	2010-05-18 20:54:11 +00:00
Josh Hursey	f57e73d4e5	add a few more missing SOS includes This commit was SVN r23168.	2010-05-18 15:00:07 +00:00
Abhishek Kulkarni	afbe3e99c6	* Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with (OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns the native error code. * Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to decode 'ret' to get the native error code. This commit was SVN r23162.	2010-05-17 23:08:56 +00:00
Rolf vandeVaart	9e300703ec	Add reference to trac ticket as requested by code review. This commit was SVN r23123.	2010-05-13 13:55:54 +00:00
Jeff Squyres	c7c3de87f5	Add ummunotify support to Open MPI. See http://marc.info/?l=linux-mm-commits&m=127352503417787&w=2 for more details. * Remove the ptmalloc memory component; replace it with a new "linux" memory component. * The linux memory component will conditionally compile in support for ummunotify. At run-time, if it has ummunotify support and finds run-time support for ummunotify (i.e., /dev/ummunotify), it uses it. If not, it tries to use ptmalloc via the glibc memory hooks. * Add some more API functions to the memory framework to accommodate the ummunotify model (i.e., poll to see if memory has "changed"). * Add appropriate calls in the rcache to the new memory APIs to see if memory has changed, and to react accordingly. * Add a few comments in the openib BTL to indicate why we don't need to notify the OPAL memory framework about specific instances of registered memory. * Add dummy API calls in the solaris malloc component (since it doesn't have polling/"did memory change" support). This commit was SVN r23113.	2010-05-11 21:43:19 +00:00
Jeff Squyres	b6e401a512	Fix minor typo. This commit was SVN r23067.	2010-04-29 11:45:25 +00:00
George Bosilca	321213e779	Fix segmentation fault on heterogeneous architectures. Don't mess with the ompi_ptr_t by translating into void*. Instead keep it as an ompi_ptr_t all the way. Thanks to Timur Magomedov for helping to track down this issue and test the patch. cmr:v1.4 cmr:v1.5 This commit was SVN r23030.	2010-04-23 15:14:55 +00:00
Samuel Gutierrez	7654b39349	Fix segfault in two error paths. This commit was SVN r22978.	2010-04-15 15:51:57 +00:00
Jeff Squyres	181331d65e	Very minor nits/updates. This commit was SVN r22977.	2010-04-15 14:44:55 +00:00
Rolf vandeVaart	892091c77d	After fix 22669 was applied which allowed for more than 8 interfaces, it was discovered that the connection algorithm did not scale. Therefore, switch to a simpler algorithm in the extremely rare case when one has more than 8 interfaces. This commit fixes trac:2301. This commit was SVN r22976. The following Trac tickets were found above: Ticket 2301 --> https://svn.open-mpi.org/trac/ompi/ticket/2301	2010-04-14 14:18:35 +00:00
Rainer Keller	a48a11821b	- mca_base_param_reg_string_name allocates default_pml. As it is strdup, just free(default_pml). cmr:v1.5 This commit was SVN r22955.	2010-04-12 19:54:07 +00:00
Pavel Shamis	fc077a2102	Fix a minor bug in the error flow of check_if_device_support_modify_srq Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il> This commit was SVN r22953.	2010-04-12 11:28:44 +00:00
Rolf vandeVaart	0adb570693	Add pml_ob1_verbose flag. Fix the current location it is being used This commit was SVN r22939.	2010-04-07 13:51:42 +00:00
Ralph Castain	522a23d6a3	A few changes to the FT-related configure options: 1. fix a bug that caused an infinite loop in configure when specifying want-ft but not want-ft-thread by removing a stale reference to the opal-progress-thread option 2. add want-ft=orcm so we can build the orcm errmgr component 3. cleanup the use of "ompi_want_ft_xxx" and replace it with "opal_want_ft_xxx" so that naming conventions are preserved This commit was SVN r22885.	2010-03-25 22:53:48 +00:00
Christopher Yeoh	a6175bbefc	Adds copyright notice that should have gone in with r22700 This commit was SVN r22881. The following SVN revision numbers were found above: r22700 --> open-mpi/ompi@774a7a58b0	2010-03-25 04:03:52 +00:00
Christopher Yeoh	81e06a2baf	fixes trac:2340 - race in mca_mpool_base_free This commit was SVN r22878. The following Trac tickets were found above: Ticket 2340 --> https://svn.open-mpi.org/trac/ompi/ticket/2340	2010-03-25 03:29:27 +00:00
Christopher Yeoh	0b93c87c2c	Correct year for copyright notices This commit was SVN r22877.	2010-03-25 03:14:21 +00:00
George Bosilca	c0ff44b9fe	Don't let ROMIO mishandle the displacement for contiguous data with a non-zero true_lb. Thanks to Pascal Deveze for the patch. This commit was SVN r22864.	2010-03-23 01:23:45 +00:00
George Bosilca	1ed7fe5057	The mpool should take the same route as the rest of the pcie modules. This commit was SVN r22844.	2010-03-17 04:16:23 +00:00
Ralph Castain	b400b84162	Merge in the modified thread configure option branch per today's telecon. Remove the --enable-progress-threads option as this is no longer functional, and hardcode OPAL_ENABLE_PROGRESS_THREADS to 0. Replace the --enable-mpi-threads option with --enable-mpi-thread-multiple as this is clearer as to meaning. This option automatically turns "on" opal thread support if it wasn't already so specified. If the user specifies --disable-opal-multi-threads --enable-mpi-thread-multiple, we will error out with a message. Add a new --enable-opal-multi-threads option that turns "on" opal thread support without doing anything wrt mpi-thread-multiple. This commit was SVN r22841.	2010-03-16 23:10:50 +00:00
Rainer Keller	f6e4694d67	- Print the name correctly when a certain sync module is disabled This should be cmr'd to v1.5 and v1.4.2 (but the svn post hook won't let me at the moment). This commit was SVN r22827.	2010-03-13 21:07:34 +00:00
Josh Hursey	e9b5162d79	Fix the configure logic for --with-ft so that it properly takes a comma separated list. Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those. The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed. This commit was SVN r22824.	2010-03-12 23:57:50 +00:00
Josh Hursey	3db01f0795	Add the process name to the error message resulting from a failed mmap(), open(), or ftruncate() so that it is slightly easier to figure out which process in the system caused the problem with sm. This commit was SVN r22803.	2010-03-10 00:18:04 +00:00
Samuel Gutierrez	15f9f35a49	Another small typo fix. This commit was SVN r22802.	2010-03-09 21:23:21 +00:00
Samuel Gutierrez	dcb5a2331f	Fixed some typos in comments. This commit was SVN r22801.	2010-03-09 20:41:25 +00:00
Rainer Keller	06f5ba1c19	- Reverse the logic (OPAL_LIKELY -> OPAL_UNLIKELY) This commit was SVN r22796.	2010-03-08 14:00:59 +00:00
Jeff Squyres	95d7e08a66	More discussion and testing has occurred off-ticket. Short version: there is a bug in OS X/Snow Leopard, but there is also a bug in Open MPI. Fixing the bug in Open MPI is both trivial (a 1-line change) and avoids the bug in OS X. We'll file an OS X bug report upstream with Apple, but it should no longer affect us here in OMPI. Fixes trac:2039. More details: Some background first: 1. IPv4 sockets can only accept incoming IPv4 connections. However, IPv6 sockets can be configured to accept ''only'' incoming IPv6 connection, or ''both'' incoming IPv4 and IPv6 connections. An IPv6 socket attribute sets which listening behavior is used. 1. IPv4 and IPv6 have different port namespaces. Hence, it is permissible to bind a v4 socket to port X ''and'' also bind a v6 socket to that same port X on the same interface (assuming that the v6 socket is only accepting incoming v6 connections). Incoming v4 connections to port X on the interface should get matched to the listening v4 socket; incoming v6 connections should get matched to the listening v6 socket. 1. When v6 sockets accept ''both'' incoming v4 and v6 connections, it should claim port X in both namespaces. 1. Linux's default behavior is to only allow one listening socket to be bound to a given port (i.e., ''either'' a v6 or v4 socket to be bound to a single port X -- not both). A v6 socket can listen for both v4 and v6 incoming connections on that port, but still -- only one socket will be bound to that port. 1. Snow Leopard's default behavior is to share ports -- i.e., let both a v4 and a v6 listening socket be bound to port X (assuming that the v6 socket is only accepting incoming v6 connections). The TCP BTL creates a listening socket for each address family. Hence, it creates a v4 listening socket on INADDR_ANY and a v6 listening socket on the v6 equivalent of INADDR_ANY. 
OMPI then iteratively tries to find ports to listen on within the range of [mca_btl_tcp_port_min, mca_btl_tcp_port_min + mca_btl_tcp_port_range). On Linux, the v4 socket will be bound to port X and the v6 socket will likely be bound to port Y (where X != Y). On Snow Leopard, the v4 socket will be bound to port X and the v6 socket may ''also'' be bound to port X. Since the namespaces are separate, this shouldn't be a problem. However, Open MPI was accidentally setting the v6 listening behavior to accept ''both'' v4 and v6 incoming connections. This is a trivial thing to fix -- change a 0 to a 1 in the code. On Linux, this issue didn't matter because the v4 and v6 sockets were on different ports. So even though the v6 socket ''would'' have accepted incoming v4 connections, that never happened because OMPI would direct v4 connections to the v4 port. But on Snow Leopard, the v4 and v6 listening ports could end up sharing the same port number. As mentioned above, this ''shouldn't'' have been a problem, but it looks like Snow Leopard has the following bugs: * If a v4 socket is already bound to port X, we're pretty sure that a v6 socket with the "accept both v4 and v6 incoming connections" listening behavior should not be able to claim port X (because there's already a v4 socket listening on X). However, Snow Leopard would allow binding a v4 socket to port X, and then allow a v6 socket configured to allow incoming v4 and v6 connections to ''also'' be bound to port X. * After binding the v6 socket to port X, Snow Leopard then lets ''another'' v4 socket ''also'' get bound to port X. Hence, there's now '''three''' sockets all listening on port X. This obviously led to mis-matched TCP connections, and things went downhill from there. That being said, Snow Leopard doesn't exhibit this behavior if v6 sockets only allow incoming v6 connections. And technically, that is exactly the behavior we want (we want v6 sockets to only accept incoming v6 connections). 
So if we just change the flag to make our v6 listening socket use this behavior, the problem on OS X goes away. That's what this commit does -- it changes a 0 to a 1, indicating "only let this v6 socket allow incoming v6 connections." That was simple, wasn't it? This commit was SVN r22788. The following Trac tickets were found above: Ticket 2039 --> https://svn.open-mpi.org/trac/ompi/ticket/2039	2010-03-05 17:37:57 +00:00
Iain Bason	18d9e96301	Fixed two problems: 1. The code that looks at btl_tcp_if_exclude before doing a modex_send uses strcmp rather than strncmp. That means that "lo0" gets sent even though "lo" is excluded. 2. The code that determines whether a particular local TCP interface can connect to a particular remote interface doesn't check for loopback interfaces. With this fix, users can now enable "lo" and be assured that it will only be used for intra- node communication. This commit was SVN r22762.	2010-03-03 15:51:15 +00:00
Ralph Castain	c88fe1ea54	Create a new mca parameter to control creation of session directories. Defaults to true so that the current behavior of always creating them is preserved. If set to false (0), then don't create session directories. Helps in those environments where session directories are a problem. Tell the sm btl that it cannot run if no session directories were created. This commit was SVN r22756.	2010-03-02 15:18:33 +00:00
Ralph Castain	f4c3cceb5e	Get the function prototypes to match so we eliminate an annoying warning This commit was SVN r22726.	2010-02-27 16:41:16 +00:00
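
Commit 95d7e08a66 above describes switching the v6 listening socket so that it accepts only incoming v6 connections by changing a flag from 0 to 1. The commit message does not name the option; the sketch below assumes it is the standard IPV6_V6ONLY socket option, and the function name open_v6only_socket is hypothetical, not taken from the Open MPI source. It is a minimal illustration of the technique, not the TCP BTL code.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create an IPv6 TCP socket that accepts only
 * incoming IPv6 connections. Returns the descriptor, or -1 on failure. */
int open_v6only_socket(void)
{
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    /* 1 = accept IPv6 only; 0 = also accept IPv4 connections.
     * Assumption: this is the 0 -> 1 change the commit refers to. */
    int v6only = 1;
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
                   &v6only, sizeof(v6only)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The option has to be set before bind(), so a caller would bind and listen on the returned descriptor afterwards. With the option set to 1, the socket claims its port only in the v6 namespace, which is the behavior the commit message says Open MPI wants.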


3242 commits