openmpi

Author	SHA1	Message	Date
Jeff Squyres	ffc5d8877f	Fix a problem where we're accidentally initializing the wrong errhandler (should be initializing _errors_throw_exceptions, not _are_fatal). This bug was not a huge tragedy because the only real problem is that _are_fatal has the wrong string name with it (because MPI::Init fixes up the _errors_throw_exceptions later). This commit was SVN r20458.	2009-02-05 21:36:10 +00:00
Jeff Squyres	bbfac2dfb5	Based on a review by Ralph, no need to call getpid() or gethostname(); we already have them in orte_process_info. Refs trac:1523. This commit was SVN r19615. The following Trac tickets were found above: Ticket 1523 --> https://svn.open-mpi.org/trac/ompi/ticket/1523	2008-09-23 20:04:34 +00:00
Jeff Squyres	4c558ed637	Enable aggregation checking for "*** An error occurred..." MPI layer help messages so that users only see the message once instead of N times when their MPI app crashes. Note that there is a tradeoff here -- we now call malloc in this particular "show the error" code path. This shouldn't usually be a problem, because the errors typically displayed through this mechanism are MPI API argument problems (e.g., sending a negative count to MPI_SEND), and not memory errors. But such API argument errors could be a consequence of a prior memory error, so there's a nonzero chance that the error message will fail to print because malloc failed. In this case, the user can disable help message aggregation (via the orte_base_want_aggregate MCA parameter) and we'll fall back to the no-malloc code path (but without aggregation). Note that we won't aggregate before MPI_INIT or after MPI_FINALIZE. So if you call an MPI function before MPI_INIT / after MPI_FINALIZE, you'll still see the error message N times. Nothing we can do about that; we need ORTE to do the aggregation properly (which is obviously unavailable before MPI_INIT / after MPI_FINALIZE). This commit was SVN r19611.	2008-09-23 17:19:24 +00:00
Jeff Squyres	d6696c46a6	Oops -- sometimes we actually pass NULL for the error_code. Make sure to handle that nicely without segv'ing. This commit was SVN r19603.	2008-09-22 17:41:39 +00:00
Jeff Squyres	5fd742e769	Add in the standardized way to notify a debugger if the MPI job is about to abort. Fixes trac:1509. This commit was SVN r19596. The following Trac tickets were found above: Ticket 1509 --> https://svn.open-mpi.org/trac/ompi/ticket/1509	2008-09-20 11:34:37 +00:00
Jeff Squyres	262f865e77	Fix CID 387: be safer about string handling when using fixed-length strings This commit was SVN r19174.	2008-08-06 12:15:49 +00:00
Rainer Keller	4712a73db5	- Help the compiler, by noting that errors are unlikely. This commit was SVN r19138.	2008-08-04 14:50:27 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Josh Hursey	99144db970	Improve checkpoint/restart support by allowing a checkpoint to progress when the process is not in the MPI library. This involves creating a separate thread for polling for a checkpoint request. This thread is active when the MPI process is not in the MPI library, and paused when the MPI process is in the library. Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI. Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave. Fixed the configure check for {{{--enable-ft-thread}}} so it has a known dependence on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}). Added a line for Checkpoint/Restart support to {{{ompi_info}}}. Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime. There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime. This commit was SVN r17516.	2008-02-19 22:15:52 +00:00
Tim Prins	f722cc0fa2	Fixes trac:1216 Add a missing break to the outer switch statement of ompi_errhandler_invoke. While I'm there, remove a couple of TABs. This commit was SVN r17489. The following Trac tickets were found above: Ticket 1216 --> https://svn.open-mpi.org/trac/ompi/ticket/1216	2008-02-18 13:03:39 +00:00
Dan Lacher	6ff2123833	"fixes trac:1213" C++ breakage when used with --enable-visibility This commit was SVN r17448. The following Trac tickets were found above: Ticket 1213 --> https://svn.open-mpi.org/trac/ompi/ticket/1213	2008-02-13 19:47:16 +00:00
Dan Lacher	98f70d6318	Convert the C++ Comm, Datatype and Win keyval creation and intercept callbacks to not use the STL as well as removing the STL use from the error handler routines. This was removing the STL from the C++ bindings (Solaris has 2 versions of the STL; if OMPI uses one and an MPI application wants to use another, Bad Things happen). The main idea is to wrap up the C++ callback function pointers and the user's extra_state into our own struct that is passed as the extra_state to the C keyval registration along with the intercept routines in intercepts.cc. When the C++ intercepts are activated, they unwrap the user's callback and extra state and call them. This commit was SVN r17409.	2008-02-10 19:29:25 +00:00
George Bosilca	626e5c4af8	Replace TAB by spaces. This commit was SVN r17251.	2008-01-26 18:53:05 +00:00
George Bosilca	6a0e944915	Add a new macro (CONSTRUCT_ERRCODE) and replace all the error code initialization with this macro. This makes the code a lot smaller and less error-prone. This commit was SVN r17010.	2007-12-21 06:22:29 +00:00
George Bosilca	906e8bf1d1	Replace the ompi_pointer_array with opal_pointer_array. In the next step (sometime after the merge with the ORTE branch), the opal_pointer_array will become the only pointer_array implementation (the orte_pointer_array will be removed). This commit was SVN r17007.	2007-12-21 06:02:00 +00:00
Tim Prins	a194896ae8	Reverts r16130. There is a reason that we use the internal type (ompi_file_errhandler_fn) instead of the MPI typedef. When building without MPI-IO support (--disable-mpi-io), the MPI type is not defined, but the internal type IS defined in order to try to keep binary compatibility for apps that don't use MPI-IO. This commit was SVN r16136. The following SVN revision numbers were found above: r16130 --> open-mpi/ompi@cf5a38af5e	2007-09-15 11:19:13 +00:00
George Bosilca	cf5a38af5e	There is no reason to use the internal type (ompi_file_errhandler_fn) while everywhere else we're using the MPI typedef (MPI_File_errhandler_fn). This commit was SVN r16130.	2007-09-14 21:23:39 +00:00
Brian Barrett	2b8af283de	Add ability to completely turn off MPI one-sided support, so that users can experiment with using ROMIO directly. This commit was SVN r15922.	2007-08-18 21:35:51 +00:00
Sven Stork	4c5836c2ee	- add missing va_end found by coverity This commit was SVN r15689.	2007-07-30 16:08:18 +00:00
Sven Stork	855434de59	- fixes several Coverity issues - add missing initialisation for variables - use strncpy instead of strcpy This commit was SVN r15683.	2007-07-30 14:44:37 +00:00
Sven Stork	fe3b08004e	- export symbols that are needed by the fortran libs This commit was SVN r14527.	2007-04-26 09:34:41 +00:00
George Bosilca	6b217d31e1	Add OPAL_LIKELY where necessary. This commit was SVN r14318.	2007-04-12 04:32:07 +00:00
Jeff Squyres	38c976d527	Remove redundant declaration of ompi_err_unknown. This commit was SVN r13829.	2007-02-27 19:37:42 +00:00
George Bosilca	1be60e802f	Add the DECLSPEC. This commit was SVN r12975.	2007-01-03 19:59:18 +00:00
Brian Barrett	1ba97181dc	A number of MPI-2 compliance fixes for the C++ bindings: * Added Create_errhandler for MPI::File * Make errors_throw_exceptions a first-class predefined exception handler, and make it work for Comm, File, and Win * Deal with error handlers and attributes for Files, Types, and Wins like we do with Comms - can't just cast the callbacks from C++ signatures to C signatures. Callbacks will then fire with the C object, not the C++ object. That's bad. Refs trac:455 This commit was SVN r12945. The following Trac tickets were found above: Ticket 455 --> https://svn.open-mpi.org/trac/ompi/ticket/455	2006-12-30 23:41:42 +00:00
Edgar Gabriel	dc532577db	Adding more accurate checking of the input parameters for the add_error_class and add_error_code files. Also fixed the update of the lastusedcode attribute; according to my tests, all of this works fine. Please note: the test code attached to bug 683 still reports some bugs. I am, however, pretty sure that the test code is wrong at those points: - the standard says that the attribute MPI_LASTUSEDCODE has to be updated for a new error_class or a new error_code. The test currently assumes that only the add_error_code call changes the attribute value. - you have to comment out the two lines 73 and 74 in order to make the test finish, since these lines check for the error string of non-existent codes. - line 126 the error-string of MPI_ERR_ARG is not "invalid argument" but a little bit more, so the test thinks the output is wrong. So probably the test has to be updated to match the actual error string of MPI_ERR_ARG. Fixes trac:682 This commit was SVN r12913. The following Trac tickets were found above: Ticket 682 --> https://svn.open-mpi.org/trac/ompi/ticket/682	2006-12-21 19:36:31 +00:00
Jeff Squyres	c7282855e7	Fixes trac:659 This commit fixes several aspects regarding MPI conformance of requests. * Eliminate the last argument of ompi_errhandler_request_invoke(); we ''always'' want to invoke the back-end exception handler with the real error code. * Make it clear in comments that we only invoke the ''first'' exception in a given array of requests, even if there's more than one request with a non-MPI_SUCCESS value for MPI_ERROR. * Defer the freeing of requests upon exception in the back-end functions to MPI_WAIT* and MPI_TEST* until later; the requests are kept so that we know what handler to invoke when we actually invoke the exception. After figuring that out, ''then'' we free requests with pending exceptions on them. * Clean up return codes from the back-end MPI_TEST* and MPI_WAIT* functions. * Slightly modify ompi_errcode_get_mpi_code() to return unity if it receives an MPI error code (vs. an OMPI error code). This commit was SVN r12810. The following Trac tickets were found above: Ticket 659 --> https://svn.open-mpi.org/trac/ompi/ticket/659	2006-12-09 14:20:08 +00:00
Edgar Gabriel	1359ba9b13	Rewriting much of the errorcode and errorclass code, since - we have to be able to attach a string to an error class, not just to an error code - according to MPI-2 the attribute MPI_LASTUSEDCODE has to be updated every time you add a new code or a new class. Thus, you have to have a single list for both. Thus, we got rid of the error_class structure. In the error-code structure, we can distinguish whether we are dealing with an error code or an error class by looking at the err->code element of the structure. In case its value is MPI_UNDEFINED, the corresponding entry is a class, else it is an error code. All predefined error codes have the code and the class field set to the same value. The test MPI_Add_error_class1 passes now. Fixes trac:418 This commit was SVN r12764. The following Trac tickets were found above: Ticket 418 --> https://svn.open-mpi.org/trac/ompi/ticket/418	2006-12-05 19:07:02 +00:00
George Bosilca	eab1776e9a	Explicit casts for our friendly Windows environment... This commit was SVN r12496.	2006-11-08 17:02:46 +00:00
George Bosilca	f9f1087e7f	Be nice and let everybody access this variable. This commit was SVN r12495.	2006-11-08 16:56:17 +00:00
Jeff Squyres	0b2616173a	Fixes trac:549 * For MPI_TEST, MPI_TESTANY, MPI_WAIT, and MPI_WAITANY (i.e., the TEST/WAIT functions that return up to exactly one completed request), return the actual error code. * For MPI_TESTALL, MPI_TESTSOME, MPI_WAITALL, MPI_WAITSOME, (i.e., the TEST/WAIT functions that can return more than one completed request), return MPI_ERR_IN_STATUS. This commit was SVN r12355. The following Trac tickets were found above: Ticket 549 --> https://svn.open-mpi.org/trac/ompi/ticket/549	2006-10-30 19:50:09 +00:00
Jeff Squyres	e02114dcf3	Fixes trac:529. * Create a new request type: NOOP (described below) * For all MPI__INIT functions, OBJ_NEW an ompi_request_t and set its type to NOOP Ensure that the NOOP requests are OBJ_RELEASE'd when they are done * MPI_START looks at the request type; if NOOP, just return success. If not, call the PML start() function * MPI_STARTALL always pass the entire array of requests back to the PML (see next point) * Make the PMLs only process PML requests (i.e., ignore/skip anything that isn't of type PML -- such as the NOOP requests) * Add a little more param error checking in STARTALL This commit was SVN r12338. The following Trac tickets were found above: Ticket 529 --> https://svn.open-mpi.org/trac/ompi/ticket/529	2006-10-27 12:32:36 +00:00
Jeff Squyres	126a8b1e22	If we pass an erroneous error code in, don't segv. Instead, return a "this is bogus" kind of answer. Passing in bad error codes should only happen in erroneous sections of the OMPI code base, but still, it's far more social to print a message saying, "hey, you messed up!" rather than seg faulting. Reviewed by Edgar. This commit was SVN r12295.	2006-10-25 13:37:45 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
Brian Barrett	b1d2424013	Merge in present work on the MPI-2 onesided chapter. The current code is not complete, but stable enough that it will have no impact on general development, so into the trunk it goes. Changes in this commit include: - Remove the --with option for disabling MPI-2 onesided support. It complicated code, and has no real reason for existing - add a framework osc (OneSided Communication) for encapsulating all the MPI-2 onesided functionality - Modify the MPI interface functions for the MPI-2 onesided chapter to properly call the underlying framework and do the required error checking - Created an osc component pt2pt, which is layered over the BML/BTL for communication (although it also uses the PML for long message transfers). Currently, all support functions, all communication functions (Put, Get, Accumulate), and the Fence synchronization function are implemented. The PWSC active synchronization functions and Lock/Unlock passive synchronization functions are still not implemented This commit was SVN r8836.	2006-01-28 15:38:37 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Brian Barrett	1302cb4072	The next in a long line of crazed build system changes from Brian. This was originally suggested by Ralf Wildenhues, to try to speed autogen, configure, and make (and possibly even make install). Use automake's include directive to drastically reduce the number of Makefile files (although the number of Makefile.am files is the same - most are just included in a top-level Makefile.am). Also use an Automake SUBDIRs feature to eliminate the dynamic-mca tree, which was no longer really needed. This makes adding a framework easier (since you don't have to remember the dynamic-mca tree) and makes building faster (as make doesn't have to recurse through the dynamic-mca tree) This commit was SVN r7777.	2005-10-17 00:21:10 +00:00
Jeff Squyres	e097ee635a	Silence compiler warnings. This commit was SVN r7768.	2005-10-14 22:06:25 +00:00
Brian Barrett	ed56e743b7	* update configure.ac to use the modern version of AC_INIT and AM_INIT_AUTOMAKE, instead of the deprecated version. * Work around dumbness in modern AC_INIT that requires the version number to be set at autoconf time (instead of at configure time, as it was before). Set the version number, minus the subversion r number, at autoconf time. Override the internal variables to include the r number (if needed) at configure time. Basically, the right thing should always happen. The only place it might not is the version reported as part of configure --help will not have an r number. * Since AM_INIT_AUTOMAKE takes a list of options, no need to specify them in all the Makefile.am files. * Adds support for subdir-objects, meaning that object files are put in the directory containing source files, even if the Makefile.am is in another directory. This should start making it feasible to reduce the number of Makefile.am files we have in the tree, which will greatly reduce the time to run autogen and configure. This commit was SVN r7211.	2005-09-07 05:54:53 +00:00
Jeff Squyres	cf16a521c8	Ensure to get ompi/include/constants.h This commit was SVN r6845.	2005-08-12 21:42:07 +00:00
Brian Barrett	ed81e51c3a	* rename ompi_printf to opal_printf * rename ompi pty code to opal pty code * rename ompi_qsort to opal_qsort This commit was SVN r6335.	2005-07-04 02:16:57 +00:00
Brian Barrett	a13166b500	* rename ompi_output to opal_output This commit was SVN r6329.	2005-07-03 23:31:27 +00:00
Brian Barrett	499e4de1e7	* rename ompi_object and ompi_class to opal_object and opal_class This commit was SVN r6321.	2005-07-03 16:06:07 +00:00
Jeff Squyres	35c141aef6	While we're moving directories around, move ompi/mpi/runtime -> ompi/runtime, for consistency and parallel-ness with orte/runtime. Also remove a few useless #includes along the way. This commit was SVN r6317.	2005-07-03 12:07:29 +00:00
Jeff Squyres	aa056f7bfd	First cut of OMPI Makefile.am's, plus a few more catchup updates in orte This commit was SVN r6286.	2005-07-02 15:06:47 +00:00
Jeff Squyres	4ab17f019b	Rename src -> ompi This commit was SVN r6269.	2005-07-02 13:43:57 +00:00

47 Commits
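
Commit 1359ba9b13 in the log above describes keeping error classes and error codes in a single list, with an entry counting as a class when its code field is MPI_UNDEFINED. The following is only a rough sketch of that idea, not the Open MPI implementation: every name prefixed with sketch_ or SKETCH_ is hypothetical, SKETCH_UNDEFINED stands in for MPI_UNDEFINED, and using the list index as the code value is an assumption made purely for illustration.

```c
#include <stddef.h>

/* Stand-in for MPI_UNDEFINED (hypothetical; the real value comes from mpi.h). */
#define SKETCH_UNDEFINED (-32766)
#define SKETCH_MAX 32

/* Hypothetical entry type: one struct serves both classes and codes. */
typedef struct {
    int code;              /* SKETCH_UNDEFINED marks the entry as a class */
    int cls;               /* class this entry is, or belongs to */
    const char *errstring; /* a string can be attached to either kind */
} sketch_errcode_t;

/* One shared list, so one "last used" counter covers classes and codes. */
static sketch_errcode_t sketch_list[SKETCH_MAX];
static int sketch_lastused = -1;

/* Returns nonzero if the entry is an error class rather than an error code. */
static int sketch_is_class(const sketch_errcode_t *e)
{
    return e->code == SKETCH_UNDEFINED;
}

/* Adds a class; returns its index, or -1 if the list is full. */
static int sketch_add_class(void)
{
    if (sketch_lastused + 1 >= SKETCH_MAX) {
        return -1;
    }
    int idx = ++sketch_lastused;
    sketch_list[idx].code = SKETCH_UNDEFINED;
    sketch_list[idx].cls = idx;
    sketch_list[idx].errstring = NULL;
    return idx;
}

/* Adds a code under an existing entry index; returns the new index or -1. */
static int sketch_add_code(int cls)
{
    if (sketch_lastused + 1 >= SKETCH_MAX) {
        return -1;
    }
    if (cls < 0 || cls > sketch_lastused) {
        return -1;
    }
    int idx = ++sketch_lastused;
    sketch_list[idx].code = idx;
    sketch_list[idx].cls = cls;
    sketch_list[idx].errstring = NULL;
    return idx;
}
```

In this sketch, calling sketch_add_class() and then sketch_add_code() on the returned index advances sketch_lastused both times, which mirrors the commit's point that the "last used" attribute has to change for a new class as well as for a new code.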