openmpi

Author	SHA1	Message	Date
Rolf vandeVaart	c48dde66ac	Change the __const qualifier to const. This change was needed because the Sun Studio compiler did not recognize __const. This commit fixes trac:1011. This commit was SVN r14558. The following Trac tickets were found above: Ticket 1011 --> https://svn.open-mpi.org/trac/ompi/ticket/1011	2007-05-01 17:45:37 +00:00
Ralph Castain	4510b42638	Hold the RMGR in the spawn command until the application process actually launches. Previously, we returned from spawn immediately after launching the daemons - this meant that the caller had to define their own "wait until app launches". This only tells the caller that the app procs were launched, of course - it doesn't mean that they have started execution or reached any particular stage. However, for non-MPI procs, this is as far as we can go - there is no further stage gate we can provide. Still, better than what we provided before... This commit was SVN r14554.	2007-05-01 11:27:36 +00:00
Brian Barrett	a25ce44dc1	Clean up the preconnect code: * Don't need the 2 process case -- we'll send an extra message, but at very little cost and less code is better. * Use COMPLETE sends instead of STANDARD sends so that the connection is fully established before we move on to the next connection. The previous code was still causing minor connection flooding for huge numbers of processes. * mpi_preconnect_all now connects both OOB and MPI layers. There's also mpi_preconnect_mpi and mpi_preconnect_oob should you want to be more specific. * Since we're only using the MCA parameters once at the beginning of time, no need for global constants. Just do the quick param lookup right before the parameter is needed. Save some of that global variable space for the next guy. Fixes trac:963 This commit was SVN r14553. The following Trac tickets were found above: Ticket 963 --> https://svn.open-mpi.org/trac/ompi/ticket/963	2007-05-01 04:49:36 +00:00
Brian Barrett	e63346a633	Clean up a couple of issues with some of the configure tests: * Remove duplicate calls to ompi_check_package by poking at the internals just a bit. Possibly should make those officially exposed, but whatever. * Don't build OpenIB with PTMalloc2 and no thread support, as this will always lead to badness. * minor formatting cleanups This commit was SVN r14552.	2007-05-01 04:40:31 +00:00
Adrian Knoth	d63d125a88	I guess we only need this when IPv6 is enabled. This commit was SVN r14551.	2007-04-29 16:38:34 +00:00
Adrian Knoth	5765ecc22e	This patch reverts r14549 while retaining IPv6 support. Re #1008 This commit was SVN r14550. The following SVN revision numbers were found above: r14549 --> open-mpi/ompi@386baed55b	2007-04-29 16:23:11 +00:00
Adrian Knoth	386baed55b	Hotfix for IPv6 support. Closes trac:1008 This commit was SVN r14549. The following Trac tickets were found above: Ticket 1008 --> https://svn.open-mpi.org/trac/ompi/ticket/1008	2007-04-29 13:46:45 +00:00
George Bosilca	bb481273a6	Typos. This commit was SVN r14546.	2007-04-28 19:15:53 +00:00
George Bosilca	abe6ddfd04	Remove dead code. This commit was SVN r14545.	2007-04-28 19:15:02 +00:00
George Bosilca	46265db0a9	Update the TCP BTL in order to bring back some of the functionality lost during the IPv6 patch. The most important is the multi BTL support. There was a quite interesting bug. Instead of setting up the multiple connections over different physical devices, based on the time when these connections were created most of the time they were all using the same physical network. Which, of course, was not the intended goal, as we top out at the maximum bandwidth available over one device instead of gathering all available bandwidth from all devices. Second, the IPv6 RFC suggests using sockaddr_storage as a holder for the IP information, but using a sockaddr* when we pass it to functions. This is only partially corrected by this patch. Some other minor cleanups. This commit was SVN r14544.	2007-04-28 19:13:47 +00:00
Josh Hursey	4c453caab6	Make the check a bit better This commit was SVN r14542.	2007-04-27 17:38:36 +00:00
Josh Hursey	486f29eb6b	Make sure to use the new metadata flags This commit was SVN r14541.	2007-04-27 17:18:26 +00:00
Shiqing Fan	c166e3d02c	Too few arguments for call, fixed according to the corresponding definition. This commit was SVN r14538.	2007-04-27 13:14:43 +00:00
Sven Stork	8d92773067	- export required symbol This commit was SVN r14536.	2007-04-27 11:38:45 +00:00
Rainer Keller	63b904ed1d	- Don't segfault, when calling PERUSE_Init before MPI_Init... This commit was SVN r14535.	2007-04-26 21:06:08 +00:00
Rainer Keller	1aceece03f	- Add a few comments for elements for structs, a few spelling fixes. No functional change. This commit was SVN r14534.	2007-04-26 21:03:38 +00:00
Ralph Castain	7d6d0a1c00	Update reuse_daemons to find the daemons again - requires that orteds now report their nodenames (probably temporary patch pending upcoming minor revision of orted) This commit was SVN r14533.	2007-04-26 15:09:54 +00:00
Ralph Castain	c733a7916b	Update the gridengine pls to handle failed-to-start. Fix a few places where the fork'd child incorrectly called "return" instead of "exit" (undoubtedly copied from the same error in the old rsh pls). This commit was SVN r14532.	2007-04-26 15:08:37 +00:00
Ralph Castain	bca2de3a57	Complete the update of the rsh pls to handle failed-to-start This commit was SVN r14531.	2007-04-26 15:07:40 +00:00
Rainer Keller	ce32b918da	- Fixes for unlocking the mutex in case of error in functions mca_btl_openib_post_srr and btl_openib_endpoint_post_rr This commit was SVN r14530.	2007-04-26 13:33:02 +00:00
Sven Stork	7341af93cf	- fix comment to refer to the right header file This commit was SVN r14528.	2007-04-26 09:36:47 +00:00
Sven Stork	fe3b08004e	- export symbols that are needed by the fortran libs This commit was SVN r14527.	2007-04-26 09:34:41 +00:00
Rainer Keller	6f9251ed39	- Small fixes by PGI -Minform=inform This commit was SVN r14524.	2007-04-26 08:16:07 +00:00
Rainer Keller	9c3838d1a0	- Make definition available within C as well. This commit was SVN r14523.	2007-04-26 08:06:00 +00:00
Jeff Squyres	c331ed7bda	Fix "make dist" by adding ipv6compat.h into the headers list. This commit was SVN r14522.	2007-04-26 02:33:02 +00:00
Jeff Squyres	e30d467ea3	Fix typo in PGI notes. This commit was SVN r14520.	2007-04-26 00:37:14 +00:00
Josh Hursey	af38efd27c	Use more of the datatype engine supplied functions This commit was SVN r14519.	2007-04-26 00:06:22 +00:00
Jelena Pjesivac-Grbovic	3eac49aa59	Adding flow control for leaf nodes in generalized reduce structure. This "feature" is disabled by default and it should not affect the current performance. In case when the message size is large and segment size is smaller than eager size for particular interface, the leaf nodes in generalized reduce function can overflood parent nodes by sending all segments without any synchronization. This can cause the parent to have HIGH number of unexpected messages (think 16MB message with 1KB segments for example). In case of binomial algorithm root node always has at least one child which is leaf, so this can potentially affect the root's performance significantly [Especially in large communicators where root may have quite a few children (binomial tree for example)]. When the segment size is bigger than the eager size, rendezvous protocol ensures that this does not happen so it is not necessary. Originally, the problem was exposed in "infinite" bucket allocator clean up time for "small" segment sizes (which may explain some "deadlocks" on Thunderbird tests). To prevent this, we allow user to specify mca parameter "--mca coll_tuned_reduce_algorithm_max_requests NUM" this limits number of outstanding messages from a leaf node in generalized reduce to the parent to NUM. Messages are sent as non-blocking synchronous messages, so synchronization happens at "wait" time. The synchronization actually improved performance of pipeline and binomial algorithm for large message sizes with 1KB segments over MX, but I need to test it some more to make sure it is consistent. Since there is no easy way to find out what is "the eager" size for particular btl, I set the limit to 4000B. If message/individual segment size is greater than 4000B - we will not use this feature. This variable may or may not be exposed as mca parameter later... I did not have any problems running it and both "default" and "synchronous" tests passed Intel Reduce* tests up to 80 processes (over MX). This commit was SVN r14518.	2007-04-25 20:39:53 +00:00
Adrian Knoth	e3d35258b4	Cosmetics. Brian fixes my crappy code and I fix the curly braces. That's teamwork, right? ;) This commit was SVN r14517.	2007-04-25 20:17:19 +00:00
Josh Hursey	d68ff8c2a3	minor typo This commit was SVN r14516.	2007-04-25 19:54:53 +00:00
Josh Hursey	596062d34b	Seems that the recent changes in the sds and oob exposed some invalid assumptions in the FT restart code for the ORTE layer. This fixes those problems by having the RML completely shut down and restart the OOB framework (instead of just the module as before). This makes it much easier to manage, and maintainable as the OOB changes in the future. The SDS now does communication as part of its startup procedure, so we need to make sure we restart the RML before the SDS so that it can communicate properly. OOB base [close|open] used a static bool to determine if they have been called previously or not. I needed to expose this boolean so that I can close() then open() the oob base in the restart procedure. The functionality has not changed, we just now have the ability to open/close the framework as many times as we need to as long as we always call them in that order. (So calling open twice in a row is not allowed as before, it is only allowed if you open(), close(), then open() again). Things seem to be working now. This commit was SVN r14515.	2007-04-25 19:51:52 +00:00
Brian Barrett	4b8bb70afb	A couple cleanups for the IPv6 support: - make opal_sockaddr2str() take a sockaddr_storage instead of a sockaddr_in6 so that it works for IPv4 and IPv6 addresses, and remove a whole bunch of #ifs in the OOB code. - Fix a compiler warning in the TCP BTL due to run-time determined array size by making it a dynamically allocated array. - Fix the unpacking code of IPv4 addresses when using IPv6 support, so that the address is in the correct location (instead of in an IPv6 structure, use an IPv4 structure). Refs trac:1005. This commit was SVN r14514. The following Trac tickets were found above: Ticket 1005 --> https://svn.open-mpi.org/trac/ompi/ticket/1005	2007-04-25 19:08:07 +00:00
Adrian Knoth	d01125f051	Typo. This commit was SVN r14513.	2007-04-25 18:19:34 +00:00
Adrian Knoth	d1ce39de4f	Move mca_btl_tcp_addr_isipv4public to opal_addr_isipv4public This commit was SVN r14512.	2007-04-25 18:06:06 +00:00
Donald Kerr	80d984441f	change so that we only check connection queue when expecting a connection; create a mca parameter that controls frequency at which the async queue is checked This commit was SVN r14511.	2007-04-25 17:46:25 +00:00
Ralph Castain	7d0f51e6b9	Begin setting up for a change to the OOB information passing functionality - this is totally transparent at the moment (need to change computers). This commit was SVN r14510.	2007-04-25 17:36:26 +00:00
Adrian Knoth	35fce38f43	Don't know why this line was here. This commit was SVN r14509.	2007-04-25 12:31:13 +00:00
Jeff Squyres	68da8b984f	Oops -- misspelled the macro. Thanks to Ralph for catching the error... This commit was SVN r14508.	2007-04-25 11:56:31 +00:00
Ralph Castain	8517a5a3a6	cleanup a few compiler warnings This commit was SVN r14507.	2007-04-25 11:51:18 +00:00
Adrian Knoth	868d8febfa	Enable rds/hostfile to accept IPv6 addresses. This commit was SVN r14505.	2007-04-25 06:55:58 +00:00
Jeff Squyres	c4c68e666a	Merge in the ipv6 work from /tmp/ipv6-merge. This commit was SVN r14503.	2007-04-25 01:55:40 +00:00
Jeff Squyres	00347db419	Put in a check for RLIMIT_PROC before trying to use it. This commit was SVN r14502.	2007-04-25 01:54:37 +00:00
Jeff Squyres	321e08c605	Add some missing header files This commit was SVN r14500.	2007-04-24 21:39:12 +00:00
Ralph Castain	18cb5c9762	Complete modifications for failed-to-start of applications. Modifications for failed-to-start of orteds coming next. This completes the minor changes required to the PLS components. Basically, there is a small change required to the parameter list of the orted cmd functions. I caught and did it for xcpu and poe, in addition to the components listed in my email - so I think that only leaves xgrid unconverted. The orted fail-to-start mods will also make changes in the PLS components, but those can be localized so they come in one at a time. This commit was SVN r14499.	2007-04-24 20:53:54 +00:00
Ralph Castain	a764aa6395	Modify iof to report back more descriptive errors This commit was SVN r14497.	2007-04-24 19:28:37 +00:00
Ralph Castain	c774f641fb	Modify orterun to provide more user-friendly reporting on jobs that fail to start This commit was SVN r14496.	2007-04-24 19:19:14 +00:00
Ralph Castain	19767802de	Let the errmgr know how to deal with incomplete starts This commit was SVN r14495.	2007-04-24 19:04:29 +00:00
Ralph Castain	ef71055cf8	Teach the odls to properly test for and report failed-to-start for application processes. Test for system limits (where known) prior to doing things like fork and pipe since some systems aren't very nice about it when we try to exceed such limits. This commit was SVN r14494.	2007-04-24 18:54:45 +00:00
Donald Kerr	cae24fcde1	move mca parameter registration into own .c and .h files This commit was SVN r14493.	2007-04-24 18:34:16 +00:00
Josh Hursey	8c2385416f	Per a developer request - Make sure that the wrapper selection is compiled out if not enabling FT. Before the logic would skip over it since the conditional if statements would not be satisfied, now there are no additional if statements when compiled out. With this modification the selection logic looks nearly identical to pre-r14051 with the exception of the non-FT related improvements. This commit was SVN r14491. The following SVN revision numbers were found above: r14051 --> open-mpi/ompi@dadca7da88	2007-04-24 17:08:48 +00:00
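
Commits 4b8bb70afb (SVN r14514) and 46265db0a9 (SVN r14544) in the log above both describe holding addresses in a sockaddr_storage so that a single code path serves IPv4 and IPv6. The sketch below illustrates that general pattern only. It is not the actual opal_sockaddr2str() implementation: the helper names addr_to_str and str_to_addr are hypothetical and were chosen for this example, and the only library calls assumed are the POSIX inet_ntop() and inet_pton().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not Open MPI code): format the address stored in a
 * sockaddr_storage into buf. Returns buf on success, or NULL if the address
 * family is unsupported or the buffer is too small. */
static const char *addr_to_str(const struct sockaddr_storage *ss,
                               char *buf, size_t len)
{
    assert(ss != NULL && buf != NULL);

    switch (ss->ss_family) {
    case AF_INET: {
        /* The storage holds an IPv4 address: view it as sockaddr_in. */
        const struct sockaddr_in *in4 = (const struct sockaddr_in *) ss;
        return inet_ntop(AF_INET, &in4->sin_addr, buf, (socklen_t) len);
    }
    case AF_INET6: {
        /* The storage holds an IPv6 address: view it as sockaddr_in6. */
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *) ss;
        return inet_ntop(AF_INET6, &in6->sin6_addr, buf, (socklen_t) len);
    }
    default:
        return NULL;
    }
}

/* Hypothetical helper (not Open MPI code): parse a textual IPv4 or IPv6
 * address into a sockaddr_storage. Returns 0 on success, -1 otherwise. */
static int str_to_addr(const char *text, struct sockaddr_storage *ss)
{
    struct sockaddr_in *in4 = (struct sockaddr_in *) ss;
    struct sockaddr_in6 *in6 = (struct sockaddr_in6 *) ss;

    assert(text != NULL && ss != NULL);

    /* Try IPv4 first, on zeroed storage. */
    memset(ss, 0, sizeof(*ss));
    if (inet_pton(AF_INET, text, &in4->sin_addr) == 1) {
        in4->sin_family = AF_INET;
        return 0;
    }

    /* Fall back to IPv6, again starting from zeroed storage. */
    memset(ss, 0, sizeof(*ss));
    if (inet_pton(AF_INET6, text, &in6->sin6_addr) == 1) {
        in6->sin6_family = AF_INET6;
        return 0;
    }
    return -1;
}
```

A caller would declare one struct sockaddr_storage and a buffer of INET6_ADDRSTRLEN bytes, fill the storage with str_to_addr(), and print the result of addr_to_str() without needing separate #if branches for the two address families.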


9925 Commits