openmpi

Автор	SHA1	Сообщение	Дата
Jeff Squyres	79cf382ff3	Fix a few issues with error messages: * If something goes wrong during ompi_mpi_init, don't erroneously report that it is illegal to invoke MPI_INIT* before MPI_INIT * Aggregate help messages when possible when something goes wring during ompi_mpi_init This commit was SVN r24492.	2011-03-07 16:45:45 +00:00
Jeff Squyres	a525e70f46	Convert "opal_show_help" to be a global variable pointer. It is statically initialized to the real back-end OPAL show_help function. During orte_show_help_init(), the variable is re-assigned with the value of the back-end ORTE show_help function (the one that does error message aggregation). Therefore, anything that calls opal_show_help() after a certain point in orte_init() will have their show_help messages be aggregated. w00t! Even code down in OPAL -- that has no knowledge of ORTE -- will have their messages aggregated. '''Double w00t!''' During orte_show_help_finalize(), we restore the original pointer value so that it something calls opal_show_help() after orte_finalize(), it'll still work properly (but it won't be aggregated). This commit was SVN r24185.	2010-12-16 23:00:25 +00:00
Ralph Castain	9ea2b196ce	Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions. Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component. Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work. This commit was SVN r23966.	2010-10-28 15:22:46 +00:00
Ralph Castain	86c7365e8e	Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem. Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution. This commit was SVN r23943.	2010-10-26 02:41:42 +00:00
Ralph Castain	fceabb2498	Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac. This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects. Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems. Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct. I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things: 1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new) 2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it. There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do. This commit was SVN r23925.	2010-10-24 18:35:54 +00:00
Jeff Squyres	2c03554fe7	Add new function: orte_show_help_norender(). It is exactly the same as orte_show_help(), but it takes a fully-rendered string instead of a varargs list that must be rendered. This function is useful in cases where one entity renders the "show help" string and a different entity sends the string via the normal orte "show help" mechanisms for aggregation, etc. Example usage: errors occur in the ODLS after forking but before exec'ing. In such cases, it makes sense for the the child process to render the "show help" string because it has all the details about the error. But the child process can't call orte_show_help() itself because it is not an ORTE process -- it can't OOB send the message to the HNP, etc. After rendering the help string, the child sends the rendered string to its parent via normal IPC (e.g., via a pipe) and the parent can then invoke orte_show_help_norender() with the ready-to-go string. The message then displays out via the normal mechanisms (i.e., out via the HNP, aggregated/coalesced, etc.). This commit was SVN r23651.	2010-08-24 19:12:57 +00:00
Jeff Squyres	f1a7b5cc33	Make "processor affinity not supported" error message a little better: * Remove OPAL_ERR_PAFFINITY_NOT_SUPPORTED; fit it into the generic OPAL_ERR_NOT_SUPPORTED case. * When odls_default detects that processor affinity is not supported, it prints a specific message about it, and then it suppressed a generic HNP help message that would normally follow it (i.e., it's easier to have the "processor affinity is not supported" show_help message last). * Use some symbolic names in odls_default instead of fixed int's, just for slight readability improvements in the code. * Introduce orte_show_help_suppress(), which gives the ability to suppress any future showings of any arbitrary show_help() message. This is useful if you display message X and want to suppress message Y. This suppression only works in environments where orte_show_help() does coalescing. This commit was SVN r23249.	2010-06-08 20:16:07 +00:00
Abhishek Kulkarni	afbe3e99c6	* Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with (OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns back the native error code. * Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to decode 'ret' to get the native error code. This commit was SVN r23162.	2010-05-17 23:08:56 +00:00
Ralph Castain	de6679dbd3	Truly respect the -quiet option. Make it an mca param so someone doesn't have to put it solely on the cmd line. Tell show_help to shaddup as well. This commit was SVN r22926.	2010-04-02 14:19:38 +00:00
Ralph Castain	0421a49844	Update the xml support to allow -xml-file foo whereby we redirect all xml formatted output (and ONLY xml formatted output) to a specified file This commit was SVN r21930.	2009-09-02 18:03:10 +00:00
Ralph Castain	35f8b68de6	Note to self: save all changes before committing This commit was SVN r21863.	2009-08-21 12:54:29 +00:00
Ralph Castain	535408d6c2	Answer a Jeff-ism and check malloc for NULL return - for all xml formatting errors, revert to at least showing the non-xml formatted message This commit was SVN r21862.	2009-08-21 12:41:54 +00:00
Ralph Castain	2e0bd04755	Ensure that show_help messages are properly xml formatted This commit was SVN r21858.	2009-08-20 19:23:26 +00:00
Ralph Castain	50bd635200	Also require that the routed framework be initialized before attempting to use orte_show_help This commit was SVN r21638.	2009-07-12 10:50:14 +00:00
Ralph Castain	cc7620c210	Fix orte-ps so it properly ignores/reports stale HNPs, but continues to provide output on running ones. Add a timeout on the send side of the comm so we don't hang while trying to send the info request to the non-existent HNP. This commit was SVN r21257.	2009-05-21 02:42:21 +00:00
Ralph Castain	4be24521aa	Modify the orte_process_info structure to handle a broader range of process types by replacing the individual booleans with a 32-bit bitmap. Use a set of #define's to define the individual bits, and a set of matching macros to test for them. Update the orte code base to use the macros instead of the booleans. Minor mod to the ompi layer to use the new #define's - just one-line name replacements. This commit was SVN r21144.	2009-05-04 11:07:40 +00:00
Rainer Keller	221fb9dbca	... Delayed due to notifier commits earlier this day ... - Delete unnecessary header files using contrib/check_unnecessary_headers.sh after applying patches, that include headers, being "lost" due to inclusion in one of the now deleted headers... In total 817 files are touched. In ompi/mpi/c/ header files are moved up into the actual c-file, where necessary (these are the only additional #include), otherwise it is only deletions of #include (apart from the above additions required due to notifier...) - To get different MCAs (OpenIB, TM, ALPS), an earlier version was successfully compiled (yesterday) on: Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled This commit was SVN r21096.	2009-04-29 01:32:14 +00:00
Rainer Keller	ec0ed48718	- Revert r20739 This commit was SVN r20742. The following SVN revision numbers were found above: r20739 --> open-mpi/ompi@781caee0b6	2009-03-05 21:56:03 +00:00
Rainer Keller	a94438343b	- Revert r20740 This commit was SVN r20741. The following SVN revision numbers were found above: r20740 --> open-mpi/ompi@2a70618a77	2009-03-05 21:50:47 +00:00
Rainer Keller	2a70618a77	- Second patch, as discussed in Louisville. Replace short macros in orte/util/name_fns.h to the actual fct. call. - Compiles on linux/x86-64 This commit was SVN r20740.	2009-03-05 21:14:18 +00:00
Rainer Keller	781caee0b6	- First of two or three patches, in orte/util/proc_info.h: Adapt orte_process_info to orte_proc_info, and change orte_proc_info() to orte_proc_info_init(). - Compiled on linux-x86-64 - Discussed with Ralph This commit was SVN r20739.	2009-03-05 20:36:44 +00:00
Rainer Keller	d81443cc5a	- On the way to get the BTLs split out and lessen dependency on orte: Often, orte/util/show_help.h is included, although no functionality is required -- instead, most often opal_output.h, or orte/mca/rml/rml_types.h Please see orte_show_help_replacement.sh commited next. - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration actually showed two missing #include "orte/util/show_help.h" in orte/mca/odls/base/odls_base_default_fns.c and in orte/tools/orte-top/orte-top.c Manually added these. Let's have MTT the last word. This commit was SVN r20557.	2009-02-14 02:26:12 +00:00
Jeff Squyres	e0a991a8c2	Print out a message telling the user how to enable non-aggregated help / error messages. This commit was SVN r19604.	2008-09-22 17:42:56 +00:00
Jeff Squyres	8eccda391a	Fix comment to match the code. This commit was SVN r19598.	2008-09-20 12:35:48 +00:00
Ralph Castain	8e3658b320	Remove the nodename:pid prefix from show_help output so it doesn't disrupt the formatted output This commit was SVN r18843.	2008-07-08 22:57:50 +00:00
George Bosilca	0f9b9c0aff	Remove a warning and add arequired header (otherwise we cannot compile when --disable-debug is specified). This commit was SVN r18665.	2008-06-18 08:10:02 +00:00
Ralph Castain	0532d799d6	Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm. Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed. This commit was SVN r18664.	2008-06-18 03:15:56 +00:00
Ralph Castain	d61fe87d04	Use the opal_show_help system if orte_show_help has not been initialized This fixes ticket #1342 This commit was SVN r18644.	2008-06-11 12:50:40 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00

29 Коммитов