openmpi

Author	SHA1	Message	Date
Ralph Castain	e1e224b81a	Silence a couple of minor compiler warnings This commit was SVN r18617.	2008-06-09 12:57:41 +00:00
Ralph Castain	7bee71aa59	Fix a potential, albeit perhaps esoteric, race condition that can occur for fast HNP's, slow orteds, and fast apps. Under those conditions, it is possible for the orted to be caught in its original send of contact info back to the HNP, and thus for the progress stack never to recover back to a high level. In those circumstances, the orted can "hang" when trying to exit. Add a new function to opal_progress that tells us our recursion depth to support that solution. Yes, I know this sounds picky, but good ol' Jeff managed to make it happen by driving his cluster near to death... Also ensure that we declare "failed" for the daemon job when daemons fail instead of the application job. This is important so that orte knows that it cannot use xcast to tell daemons to "exit", nor should it expect all daemons to respond. Otherwise, it is possible to hang. After lots of testing, decide to default (again) to slurm detecting failed orteds. This proved necessary to avoid rather annoying hangs that were difficult to recover from. There are conditions where slurm will fail to launch all daemons (slurm folks are working on it), and yet again, good ol' Jeff managed to find both of them. Thank you Jeff! :-/ This commit was SVN r18611.	2008-06-06 19:36:27 +00:00
George Bosilca	b2aa751c28	Remove a race condition in the threaded mode. As a callback is allowed to modify the callback array (add or remove), make sure we don't call the same callback twice if it gets removed in another thread. This commit was SVN r18608.	2008-06-06 15:54:40 +00:00
Brian Barrett	de2c4deeda	Fix deadlock in thread case exposed by ORTE message model -- if we are in a callback from the event library and post an RML receive, we'll deadlock because the event library wouldn't be entered until the event library was not already entered. Now just protect data structures (which we were basically already doing) instead of code, like good threading people ;). This commit was SVN r15585.	2007-07-24 19:10:19 +00:00
Ralph Castain	c7be9a7121	Complete backout of prior sched_yield and paffinity changes This commit was SVN r13530.	2007-02-07 14:22:37 +00:00
Brian Barrett	cf8bc2ad0b	print debugging information about switching the event flag and print the initialization information This commit was SVN r13497.	2007-02-05 19:38:54 +00:00
Brian Barrett	e130f18cc2	Fix some compiler warnings that have slipped in lately... This commit was SVN r13037.	2007-01-08 17:20:09 +00:00
Rainer Keller	20d5c35f43	- Add header needed for OPAL_OUTPUT. This commit was SVN r12647.	2006-11-22 12:20:08 +00:00
Brian Barrett	33320b7165	Rework the opal_progress interface to better support dynamic processes and at the same time, remove some of the MPI-related options from OPAL: - provide mechanism to change at runtime whether sched_yield() should be called when the progress engine is idle - provide mechanism for changing the rate at which the event engine is called when there are "no" users of the event engine (ie, when using MPI but not TCP) - fix some function names in the progress engine to better match their intended use (and remove MPI naming scheme) - remove progress_mpi_enable / progress_mpi_disable because we can now use the functions to set the sched_yield and tick rate interfaces - rename opal_progress_events() to opal_progress_set_event_flag() because the first really isn't descriptive of what the function does and I always got confused by it This commit was SVN r12645.	2006-11-22 02:06:52 +00:00
Brian Barrett	f3a4026b39	* fix the comment, too This commit was SVN r11392.	2006-08-24 14:06:01 +00:00
Brian Barrett	9e5f5fe0af	* make the name of the define be correct... This commit was SVN r11391.	2006-08-24 14:02:26 +00:00
George Bosilca	9d26565e27	Add the sched_yield for Windows. This commit was SVN r11304.	2006-08-21 20:08:51 +00:00
Brian Barrett	a84e557815	Add new loop mode OPAL_EVLOOP_ONELOOP that behaved like OPAL_EVLOOP_ONCE did pre-libevent update. The problem is that the behavior of OPAL_EVLOOP_ONCE was changed by the OMPI team, which then broke things during the update, so it had to be reverted to the old meaning of loop until one event occurs. OPAL_EVLOOP_ONELOOP will go through the event loop once (like EVLOOP_NONBLOCK) but will pause in the event library for a bit (like EVLOOP_ONCE). fixes trac:234 This commit was SVN r11081. The following Trac tickets were found above: Ticket 234 --> https://svn.open-mpi.org/trac/ompi/ticket/234	2006-08-01 22:23:57 +00:00
Tim Woodall	8bf6ed7a36	- corrected locking in gm btl - gm api is not thread safe - initial support for gm progress thread - corrected threading issue in pml - added polling progress for a configurable number of cycles to wait for threaded case This commit was SVN r9188.	2006-03-02 00:39:07 +00:00
George Bosilca	670cefa1d0	Reorder the if's to avoid doing useless function calls on the fast path. This commit was SVN r9061.	2006-02-16 16:08:12 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
George Bosilca	bd0ee62e62	Protect headers and use __WINDOWS__ for Windows code. This commit was SVN r8468.	2005-12-12 22:01:51 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Brian Barrett	dfdb5dc12a	* high resolution, low latency timers for a number of platforms, plus mods to opal_progress() to use the timers instead of a tick count for deciding whether to call the event loop or not. Currently supported platforms are: - solaris (x86 / sparc) - Linux (x86 / x86_64 / IA64) - Mac OS X (x86 / Power PC) This commit was SVN r6922.	2005-08-18 05:34:22 +00:00
Jeff Squyres	cf2c8b45a8	- Use proper prefixes for all #include statements (opal/, orte/, and ompi/). - There's still a handful of places that have orte/ #include files; still need to clean those up - A lot of places still use ompi/include/constants.h -- those need to be converted over to use OPAL_ return codes and then switch to the opal constants.h. This commit is the first few steps towards that... This commit was SVN r6843.	2005-08-12 20:46:25 +00:00
Brian Barrett	ad383f5fcd	* If the event library is going to be a noop (really, this only happens on Red Storm), don't bother doing all the book keeping work to do calls into the event library This commit was SVN r6829.	2005-08-12 16:21:17 +00:00
Brian Barrett	57efade1e2	* bump the event_tick_rate count from 100 to 10k, per conversation with Tim. Might want to bump this higher? * remove assumption that progress functions registered with opal_progress are not reentrant. There is still the assumption that the event loop is not reentrant (because it's not), so that part is still protected by a spinlock. This commit was SVN r6827.	2005-08-12 16:08:44 +00:00
Brian Barrett	a926a9b4fb	* I'm sure any decent optimizing compiler would have figured this one out, but there's no point in having the call_yield check performed if we don't have sched_yield to call in the first place. This commit was SVN r6823.	2005-08-12 15:21:59 +00:00
Brian Barrett	aab684f159	* fix off by one error that was causing the event library to only be triggered every other call to opal_progress when the TCP BTL/PTL were being used. This commit was SVN r6822.	2005-08-12 15:20:37 +00:00
Brian Barrett	6aa464b67e	More changes from Red Storm port - only call sched_yield if it exists - don't fail out if modex doesn't work in ob1 - bunch of fixes for Portals BTL - add cnos rml component - add NULL gpr component (should only be used if replica AND proxy fail to load) This commit was SVN r6629.	2005-07-27 23:07:14 +00:00
Brian Barrett	a13166b500	* rename ompi_output to opal_output This commit was SVN r6329.	2005-07-03 23:31:27 +00:00
Brian Barrett	23b687b0f4	* rename ompi_event to opal_event This commit was SVN r6328.	2005-07-03 23:09:55 +00:00
Brian Barrett	ccd2624e3f	* rename ompi_progress to opal_progress This commit was SVN r6326.	2005-07-03 21:57:43 +00:00

28 commits