openmpi

Author	SHA1	Message	Date
Josh Hursey	4c453caab6	Make the check a bit better This commit was SVN r14542.	2007-04-27 17:38:36 +00:00
Josh Hursey	486f29eb6b	Make sure to use the new metadata flags This commit was SVN r14541.	2007-04-27 17:18:26 +00:00
Shiqing Fan	c166e3d02c	Too few arguments for call, fixed according to the corresponding definition. This commit was SVN r14538.	2007-04-27 13:14:43 +00:00
Sven Stork	8d92773067	- export required symbol This commit was SVN r14536.	2007-04-27 11:38:45 +00:00
Rainer Keller	63b904ed1d	- Don't segfault, when calling PERUSE_Init before MPI_Init... This commit was SVN r14535.	2007-04-26 21:06:08 +00:00
Rainer Keller	1aceece03f	- Add a few comments for elements for structs, a few spelling fixes. No functional change. This commit was SVN r14534.	2007-04-26 21:03:38 +00:00
Ralph Castain	7d6d0a1c00	Update reuse_daemons to find the daemons again - requires that orteds now report their nodenames (probably temporary patch pending upcoming minor revision of orted) This commit was SVN r14533.	2007-04-26 15:09:54 +00:00
Ralph Castain	c733a7916b	Update the gridengine pls to handle failed-to-start. Fix a few places where the fork'd child incorrectly called "return" instead of "exit" (undoubtedly copied from the same error in the old rsh pls). This commit was SVN r14532.	2007-04-26 15:08:37 +00:00
Ralph Castain	bca2de3a57	Complete the update of the rsh pls to handle failed-to-start This commit was SVN r14531.	2007-04-26 15:07:40 +00:00
Rainer Keller	ce32b918da	- Fixes for unlocking the mutex in case of error in functions mca_btl_openib_post_srr and btl_openib_endpoint_post_rr This commit was SVN r14530.	2007-04-26 13:33:02 +00:00
Sven Stork	7341af93cf	- fix comment to refer to the right header file This commit was SVN r14528.	2007-04-26 09:36:47 +00:00
Sven Stork	fe3b08004e	- export symbols that are needed by the fortran libs This commit was SVN r14527.	2007-04-26 09:34:41 +00:00
Rainer Keller	6f9251ed39	- Small fixes by PGI -Minform=inform This commit was SVN r14524.	2007-04-26 08:16:07 +00:00
Rainer Keller	9c3838d1a0	- Make definition available within C as well. This commit was SVN r14523.	2007-04-26 08:06:00 +00:00
Jeff Squyres	c331ed7bda	Fix "make dist" by adding ipv6compat.h into the headers list. This commit was SVN r14522.	2007-04-26 02:33:02 +00:00
Jeff Squyres	e30d467ea3	Fix typo in PGI notes. This commit was SVN r14520.	2007-04-26 00:37:14 +00:00
Josh Hursey	af38efd27c	Use more of the datatype engine supplied functions This commit was SVN r14519.	2007-04-26 00:06:22 +00:00
Jelena Pjesivac-Grbovic	3eac49aa59	Adding flow control for leaf nodes in generalized reduce structure. This "feature" is disabled by default and it should not affect the current performance. In case when the message size is large and segment size is smaller than eager size for particular interface, the leaf nodes in generalized reduce function can flood parent nodes by sending all segments without any synchronization. This can cause the parent to have a HIGH number of unexpected messages (think 16MB message with 1KB segments for example). In case of binomial algorithm root node always has at least one child which is leaf, so this can potentially affect the root's performance significantly [Especially in large communicators where root may have quite a few children (binomial tree for example)]. When the segment size is bigger than the eager size, rendezvous protocol ensures that this does not happen so it is not necessary. Originally, the problem was exposed in "infinite" bucket allocator clean up time for "small" segment sizes (which may explain some "deadlocks" on Thunderbird tests). To prevent this, we allow the user to specify the mca parameter "--mca coll_tuned_reduce_algorithm_max_requests NUM"; this limits the number of outstanding messages from a leaf node in generalized reduce to the parent to NUM. Messages are sent as non-blocking synchronous messages, so synchronization happens at "wait" time. The synchronization actually improved performance of pipeline and binomial algorithm for large message sizes with 1KB segments over MX, but I need to test it some more to make sure it is consistent. Since there is no easy way to find out what is "the eager" size for particular btl, I set the limit to 4000B. If message/individual segment size is greater than 4000B - we will not use this feature. This variable may or may not be exposed as mca parameter later... I did not have any problems running it and both "default" and "synchronous" tests passed Intel Reduce* tests up to 80 processes (over MX). This commit was SVN r14518.	2007-04-25 20:39:53 +00:00
Adrian Knoth	e3d35258b4	Cosmetics. Brian fixes my crappy code and I fix the curly braces. That's teamwork, right? ;) This commit was SVN r14517.	2007-04-25 20:17:19 +00:00
Josh Hursey	d68ff8c2a3	minor typo This commit was SVN r14516.	2007-04-25 19:54:53 +00:00
Josh Hursey	596062d34b	Seems that the recent changes in the sds and oob exposed some invalid assumptions in the FT restart code for the ORTE layer. This fixes those problems by having the RML completely shutdown and restart the OOB framework (instead of just the module as before). This makes it much easier to manage, and maintainable as the OOB changes in the future. The SDS now does communication as part of its startup procedure, so we need to make sure we restart the RML before the SDS so that it can communicate properly. OOB base [close|open] used a static bool to determine if they have been called previously or not. I needed to expose this boolean so that I can close() then open() the oob base in the restart procedure. The functionality has not changed, we just now have the ability to open/close the framework as many times as we need to as long as we always call them in that order. (So calling open twice in a row is not allowed as before, it is only allowed if you open(), close(), then open() again). Things seem to be working now. This commit was SVN r14515.	2007-04-25 19:51:52 +00:00
Brian Barrett	4b8bb70afb	A couple cleanups for the IPv6 support: - make opal_sockaddr2str() take a sockaddr_storage instead of a sockaddr_in6 so that it works for IPv4 and IPv6 addresses, and remove a whole bunch of #ifs in the OOB code. - Fix a compiler warning in the TCP BTL due to run-time determined array size by making it a dynamically allocated array. - Fix the unpacking code of IPv4 addresses when using IPv6 support, so that the address is in the correct location (instead of in an IPv6 structure, use an IPv4 structure). Refs trac:1005. This commit was SVN r14514. The following Trac tickets were found above: Ticket 1005 --> https://svn.open-mpi.org/trac/ompi/ticket/1005	2007-04-25 19:08:07 +00:00
Adrian Knoth	d01125f051	Typo. This commit was SVN r14513.	2007-04-25 18:19:34 +00:00
Adrian Knoth	d1ce39de4f	Move mca_btl_tcp_addr_isipv4public to opal_addr_isipv4public This commit was SVN r14512.	2007-04-25 18:06:06 +00:00
Donald Kerr	80d984441f	change so that we only check connection queue when expecting a connection; create a mca parameter that controls frequency at which the async queue is checked This commit was SVN r14511.	2007-04-25 17:46:25 +00:00
Ralph Castain	7d0f51e6b9	Begin setting up for a change to the OOB information passing functionality - this is totally transparent at the moment (need to change computers). This commit was SVN r14510.	2007-04-25 17:36:26 +00:00
Adrian Knoth	35fce38f43	Don't know why this line was here. This commit was SVN r14509.	2007-04-25 12:31:13 +00:00
Jeff Squyres	68da8b984f	Oops -- misspelled the macro. Thanks to Ralph for catching the error... This commit was SVN r14508.	2007-04-25 11:56:31 +00:00
Ralph Castain	8517a5a3a6	cleanup a few compiler warnings This commit was SVN r14507.	2007-04-25 11:51:18 +00:00
Adrian Knoth	868d8febfa	Enable rds/hostfile to accept IPv6 addresses. This commit was SVN r14505.	2007-04-25 06:55:58 +00:00
Jeff Squyres	c4c68e666a	Merge in the ipv6 work from /tmp/ipv6-merge. This commit was SVN r14503.	2007-04-25 01:55:40 +00:00
Jeff Squyres	00347db419	Put in a check for RLIMIT_PROC before trying to use it. This commit was SVN r14502.	2007-04-25 01:54:37 +00:00
Jeff Squyres	321e08c605	Add some missing header files This commit was SVN r14500.	2007-04-24 21:39:12 +00:00
Ralph Castain	18cb5c9762	Complete modifications for failed-to-start of applications. Modifications for failed-to-start of orteds coming next. This completes the minor changes required to the PLS components. Basically, there is a small change required to the parameter list of the orted cmd functions. I caught and did it for xcpu and poe, in addition to the components listed in my email - so I think that only leaves xgrid unconverted. The orted fail-to-start mods will also make changes in the PLS components, but those can be localized so they come in one at a time. This commit was SVN r14499.	2007-04-24 20:53:54 +00:00
Ralph Castain	a764aa6395	Modify iof to report back more descriptive errors This commit was SVN r14497.	2007-04-24 19:28:37 +00:00
Ralph Castain	c774f641fb	Modify orterun to provide more user-friendly reporting on jobs that fail to start This commit was SVN r14496.	2007-04-24 19:19:14 +00:00
Ralph Castain	19767802de	Let the errmgr know how to deal with incomplete starts This commit was SVN r14495.	2007-04-24 19:04:29 +00:00
Ralph Castain	ef71055cf8	Teach the odls to properly test for and report failed-to-start for application processes. Test for system limits (where known) prior to doing things like fork and pipe since some systems aren't very nice about it when we try to exceed such limits. This commit was SVN r14494.	2007-04-24 18:54:45 +00:00
Donald Kerr	cae24fcde1	move mca parameter registration into own .c and .h files This commit was SVN r14493.	2007-04-24 18:34:16 +00:00
Josh Hursey	8c2385416f	Per a developer request - Make sure that the wrapper selection is compiled out if not enabling FT. Before the logic would skip over it since the conditional if statements would not be satisfied, now there are no additional if statements when compiled out. With this modification the selection logic looks nearly identical to pre-r14051 with the exception of the non-FT related improvements. This commit was SVN r14491. The following SVN revision numbers were found above: r14051 --> open-mpi/ompi@dadca7da88	2007-04-24 17:08:48 +00:00
Ralph Castain	f5ef3d795e	Tell the smr how to handle failed-to-start This commit was SVN r14488.	2007-04-24 16:23:26 +00:00
Jeff Squyres	d8cc501384	Add missing header. This commit was SVN r14485.	2007-04-24 14:27:51 +00:00
Jeff Squyres	0674bbd001	Fix segv when the shell is not recognized. Thanks to Mostyn Lewis for noticing the problem. This commit was SVN r14483.	2007-04-24 12:00:54 +00:00
Ralph Castain	2d04298002	Update the orted cmd xmit functions to match orted recv's. This fixes trac:1004. This commit was SVN r14482. The following Trac tickets were found above: Ticket 1004 --> https://svn.open-mpi.org/trac/ompi/ticket/1004	2007-04-24 01:58:40 +00:00
Jeff Squyres	50e0745c9e	Update copyright. This commit was SVN r14480.	2007-04-24 00:18:38 +00:00
Josh Hursey	260e7612ad	Fix a few interface changes introduced by r14475 This commit was SVN r14479. The following SVN revision numbers were found above: r14475 --> open-mpi/ompi@18b2dca51c	2007-04-23 20:18:27 +00:00
Ralph Castain	5f94d6d791	Fix the cnos rml to match revised xcast API This commit was SVN r14478.	2007-04-23 19:07:44 +00:00
Jeff Squyres	08041a54c5	Add yet another define option for the spec file: use_default_rpm_opt_flags. It defaults to a value of 1, meaning that we'll try to use $RPM_OPT_FLAGS. But if you're not compiling with the GNU compilers, you might want to set this value to 0 so that your compiler doesn't get flags that it doesn't understand (e.g., PGI 7.0 will barf on flags that it doesn't understand). This commit was SVN r14477.	2007-04-23 19:00:29 +00:00
Ralph Castain	1682a72d34	Add ability to read system limits on number of children, open files, and file size from the local OS - to be used in failed-to-start scenarios This commit was SVN r14476.	2007-04-23 18:53:47 +00:00
Ralph Castain	18b2dca51c	Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd really have to try). This also involved a slight change to the oob.xcast API, so propagated that as required. Note: this has only been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-) Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately. This commit was SVN r14475.	2007-04-23 18:41:04 +00:00
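
Several entries above (r14476, r14494, r14502) mention reading system limits on children, open files, and file size before forking, and checking that a limit constant exists before using it. The following is a minimal Python sketch of that idea, not the ORTE code itself: the helper names `read_system_limits` and `can_start` are hypothetical, and mapping "number of children" to `RLIMIT_NPROC` is an assumption.

```python
import resource

# Hypothetical mapping for illustration; these keys are not ORTE names.
# Mapping "children" to RLIMIT_NPROC is an assumption.
_LIMITS = {
    "children": "RLIMIT_NPROC",
    "open_files": "RLIMIT_NOFILE",
    "file_size": "RLIMIT_FSIZE",
}


def read_system_limits():
    """Return {name: soft limit}, with None when unknown or unlimited."""
    limits = {}
    for name, attr in _LIMITS.items():
        # Not every platform defines every constant, so look it up
        # defensively instead of assuming it exists.
        const = getattr(resource, attr, None)
        if const is None:
            limits[name] = None
            continue
        soft, _hard = resource.getrlimit(const)
        limits[name] = None if soft == resource.RLIM_INFINITY else soft
    return limits


def can_start(requested_children, limits):
    """Simplified pre-fork check: ignores processes that already exist."""
    cap = limits.get("children")
    return cap is None or requested_children <= cap
```

A launcher could call `can_start(n, read_system_limits())` before forking and report a failed-to-start condition instead of letting `fork` fail.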


9565 Commits