openmpi

Author	SHA1	Message	Date
Abhishek Kulkarni	afbe3e99c6	* Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with (OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes a SOS error and returns the native error code. * Since OPAL_SUCCESS is preserved by SOS, also change all checks of the form (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to decode 'ret' to get the native error code. This commit was SVN r23162.	2010-05-17 23:08:56 +00:00
Ralph Castain	5965d3e620	Include the new error code in the error strings This commit was SVN r23111.	2010-05-07 18:09:08 +00:00
Ralph Castain	d4f56cff61	More cleanup on paffinity....groan It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected. Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages. This commit was SVN r23106.	2010-05-06 20:57:17 +00:00
Jeff Squyres	16b100219d	A patch from UTK to allow orte_init(), opal_init(), and associated friends also receive &argc and &argv (George asked Jeff to Ralph to review before committing). The thought is that passing argv and argc to opal/orte_init would be useful to other projects outside of OMPI that are using OPAL and/or ORTE (especially in conjunction with some other bootstrapping code where it is helpful to modify argv). It's such a small thing that it's easy to apply here to make others' lives a little easier. Ask George for more details; I'm just the messenger. :-) Judging by the copyrights on this patch, it's been around for a while. :-) This commit was SVN r22260.	2009-12-04 00:51:15 +00:00
Ralph Castain	176fdd3a83	Add a new API to the show_help system that allows external users (e.g., libraries built upon OMPI) to define their own locations for show_help files. This allows such users to exploit the rather nice features of the OPAL show_help system -without- interfering with the ability of the ORTE and OMPI layers to use show_help themselves. Reviewed by Jeff to protect toes...and to get some good comments :-) This commit was SVN r22026.	2009-09-29 02:07:46 +00:00
Ralph Castain	7cc045f9c5	Check return codes when init'ing the paffinity framework to avoid segfaulting This commit was SVN r21884.	2009-08-26 01:58:15 +00:00
George Bosilca	5155eaf945	The opal datatype engine should _ALWAYS_ be initialized. Therefore move the call to opal_datatype_init in the opal_util_init. This commit was SVN r21754.	2009-08-03 16:46:33 +00:00
Rainer Keller	6c5532072a	- Split the datatype engine into two parts: an MPI specific part in OMPI and a language agnostic part in OPAL. The convertor is completely moved into OPAL. This offers several benefits as described in RFC http://www.open-mpi.org/community/lists/devel/2009/07/6387.php namely: - Fewer basic types (int* and float* types, boolean and wchar) - Fixing naming scheme to ompi-nomenclature. - Usability outside of the ompi-layer. - Due to the fixed nature of simple opal types, their information is completely known at compile time and therefore constified - With fewer datatypes (22), the actual sizes of bit-field types may be reduced from 64 to 32 bits, allowing reorganizing the opal_datatype structure, eliminating holes and keeping data required in convertor (upon send/recv) in one cacheline... This has implications to the convertor-datastructure and other parts of the code. - Several performance tests have been run, the netpipe latency does not change with this patch on Linux/x86-64 on the smoky cluster. - Extensive tests have been done to verify correctness (no new regressions) using: 1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and ompi-ddt: a. running both trunk and ompi-ddt resulted in no differences (except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run correctly). b. with --enable-memchecker and running under valgrind (one buglet when run with static found in test-suite, committed) 2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt: all passed (except for the dynamic/ tests failed!! as trunk/MTT) 3. compilation and usage of HDF5 tests on Jaguar using PGI and PathScale compilers. 4. compilation and usage on Scicortex. - Please note, that for the heterogeneous case, (-m32 compiled binaries/ompi), neither ompi-trunk, nor ompi-ddt branch would successfully launch. This commit was SVN r21641.	2009-07-13 04:56:31 +00:00
Greg Koenig	60485ff95f	This is a very large change to rename several #define values from OMPI_* to OPAL_*. This allows opal layer to be used more independent from the whole of ompi. NOTE: 9 "svn mv" operations immediately follow this commit. This commit was SVN r21180.	2009-05-06 20:11:28 +00:00
Ralph Castain	afe1950da5	Make the error message clearer - this error only is used when two buffer types don't match, thus preventing an operation from being executed This commit was SVN r21033.	2009-04-16 16:23:28 +00:00
Jeff Squyres	d1c6f3f89a	* Fix a truckload of Cisco copyrights to be the same as the rest of the code base. * Fix a few misspellings in other copyrights. This commit was SVN r20241.	2009-01-11 02:30:00 +00:00
Ralph Castain	1ace83c470	Enable modex-less launch. Consists of: 1. minor modification to include two new opal MCA params: (a) opal_profile: outputs what components were selected by each framework currently enabled for most, but not all, frameworks (b) opal_profile_file: name of file that contains profile info required for modex 2. introduction of two new tools: (a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with opal_profile set. Also reports back the rml IP address for all interfaces on the node (b) ompi-profiler: uses ompi-probe to create the profile_file, also reports out a summary of what framework components are actually being used to help with configuration options 3. modification of the grpcomm basic component to utilize the profile file in place of the modex where possible 4. modification of orterun so it properly sees opal mca params and handles opal_profile correctly to ensure we don't get its profile 5. similar mod to orted as for orterun 6. addition of new test that calls orte_init followed by calls to grpcomm.barrier This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. Minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTL's for modex-less operation should let me know and I'll give you the magic details. This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint. This commit was SVN r20098.	2008-12-09 23:49:02 +00:00
Ralph Castain	9927b2445c	Remove the filter framework - the xml support will have to be provided in a different manner that will be implemented shortly This commit was SVN r18572.	2008-06-04 09:04:51 +00:00
Terry Dontje	ef7ac86929	created opal_version_string and orte_version_string to match the ompi changes made in r18345 for ompi_version_string. This was done per request from Jeff Squyres to maintain consistency and to remove some warnings caused by the non-use of some static const char. This commit was SVN r18461. The following SVN revision numbers were found above: r18345 --> open-mpi/ompi@8dd0421015	2008-05-20 12:13:19 +00:00
Jeff Squyres	d12b21e21b	Ensure that if an error occurs, we actually return that error rather than an undefined value (which could be 0/OPAL_SUCCESS). This commit was SVN r18452.	2008-05-19 11:57:44 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Sharon Melamed	4a8e2a2648	Remove status check from carto initiation. This commit was SVN r17812.	2008-03-12 08:55:28 +00:00
Jeff Squyres	b2ed2b95aa	Fix filename so that the help file can be found. This commit was SVN r17759.	2008-03-06 14:44:47 +00:00
Ralph Castain	8d819cf3d3	Move carto open/close/finalize to opal layer so that ORTE can get access to topo info. This will be used to support a topo grpcomm that optimizes communications in non-uniform topologies like RR. This commit was SVN r17652.	2008-02-28 21:04:30 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Rainer Keller	7621800477	- Fix and add comments -- output full name for pd - Protect argument in macro... This commit was SVN r17434.	2008-02-12 16:59:59 +00:00
Shiqing Fan	f5792bbda5	merging the memchecker into trunk. This commit was SVN r17424.	2008-02-12 08:46:27 +00:00
Jeff Squyres	714b409595	Fix an uninitialized variable in the error case. Thanks to Ake Sandgren for pointing out the mistake. This commit was SVN r16682.	2007-11-07 01:52:23 +00:00
Ethan Mallove	005652c9d4	* Embed ident strings into the Open MPI libraries using one of the following methods (in order of precedence): 1. #pragma ident <ident string> (e.g., Intel and Sun) 1. #ident <ident string> (e.g., GCC) 1. static const char ident[] = <ident string> (all others) By default, the ident string used is the standard Open MPI version string. Only the following libraries will get the embedded version strings (e.g., DSOs will not): * libmpi.so * libmpi_cxx.so * libmpi_f77.so * libopen-pal.so * libopen-rte.so * Added two new configure options: * `--with-package-name="STRING"` (defaults to "Open MPI username@hostname Distribution"). `STRING` is displayed by `ompi_info` next to the "Package" heading. * `--with-ident-string="STRING"` (defaults to the standard Open MPI version string - e.g., X.Y.Zr######). `%VERSION%` will expand to the Open MPI version string if it is supplied to this configure option. This commit was SVN r16644.	2007-11-03 02:40:22 +00:00
George Bosilca	31dfa5592e	Few clean-ups, few indentations. Nothing really important. This commit was SVN r15767.	2007-08-04 00:44:23 +00:00
Brian Barrett	8a7b6656b3	Reference count calls to the util access as well as the main initialized code This commit was SVN r15495.	2007-07-18 20:28:19 +00:00
Brian Barrett	916397f358	Use thread specific data and static buffers for the return type of opal_net_get_hostname() rather than malloc, because no one was freeing the buffer and the common use case was for printfs, where calling free is a pain. This commit was SVN r15494.	2007-07-18 20:25:01 +00:00
Brian Barrett	34fea87819	* Only need to do the opal_progress_event_users_increment() once between OPAL and ORTE. Since we now do opal_progress_init(), we do it there. Fixes a performance issue introduced in r14773. * While trying to find the above, noticed that we did the reference counting for the init in init_util and for finalize in fini. That isn't right, so make them both in the non-util versions. This commit was SVN r14830. The following SVN revision numbers were found above: r14773 --> open-mpi/ompi@1e678c3f55	2007-06-01 02:43:46 +00:00
Josh Hursey	1e678c3f55	per conversation with Ralph and Jeff take out the opal_init_only logic. This commit moves the initialization/finalization of opal_event and opal_progress to opal_init/finalize. These were previously init/final in ORTE which is an abstraction violation. After talking about it we concluded that there are no ordering issues that require these to be init/final in ORTE instead of OPAL. I ran the IBM test suite against this commit and it didn't turn up any new failures so I think it is good to go. Let us know if this causes problems. This commit was SVN r14773.	2007-05-24 21:54:58 +00:00
Ralph Castain	1682a72d34	Add ability to read system limits on number of children, open files, and file size from the local OS - to be used in failed-to-start scenarios This commit was SVN r14476.	2007-04-23 18:53:47 +00:00
Jeff Squyres	0ba47105ed	Merge the /tmp/jms-installdirs-trunk branch into the trunk. This finally brings in functionality that is already on the 1.2 branch, and was developed and tested in the v1.2ofed branch (and other places). Short version of new features: * Support for ibv_fork_init() * Automatically fill in the openib BTL bandwidth value by querying the HCA port * Installdirs functionality * Fixes to always use -I in the Fortran wrapper compilers (#924) * Gleb's mpool updates * Remove some kruft in btl/openib/configure.m4, therefore fixing the harmless warnings noted in #665 * Bunches of updates to the Linux RPM spec file I.e., effectively the same thing that r14411 brought to the v1.2 branch. Also effectively brought in r14432 and r14433 (some fixes on top of the original r14411 commit to v1.2). Still need to bring in the moral equivalent of r14445 after this commit (fixes to installdirs). This commit was SVN r14449. The following SVN revision numbers were found above: r14411 --> open-mpi/ompi@83b31314ae r14432 --> open-mpi/ompi@a48f160595 r14433 --> open-mpi/ompi@68f346d2bc r14445 --> open-mpi/ompi@13d366b827	2007-04-21 00:15:05 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Rainer Keller	e61dd8722e	- Silence compiler on ORTE_TRANSPORT_KEY_FMT, it is fixed to llx - No functional changes, just indentation and corrections to error output. This commit was SVN r12734.	2006-12-03 13:59:23 +00:00
Brian Barrett	778bba2668	refs trac:405 * Make sure to AC_SUBST the backtrace CFLAGS so that the right flags are passed to the component (especially -m64) * Properly open / close the component. This isn't strictly necessary to fix the bug, but was an oversight that should be fixed. This commit was SVN r11806. The following Trac tickets were found above: Ticket 405 --> https://svn.open-mpi.org/trac/ompi/ticket/405	2006-09-25 23:41:06 +00:00
George Bosilca	136c79908b	Count how many times the opal library gets initialized and require the same number of finalizations before really destroying the internals. This commit was SVN r11303.	2006-08-21 20:07:38 +00:00
George Bosilca	6e6698bec3	Open and close the memcpy component. Hopefully it is in the right place, as the memcpy should be available as soon as possible after startup. This commit was SVN r9533.	2006-04-05 05:57:51 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
Brian Barrett	c96f870674	* Merge of wrapper compiler updates from the bwb-wrapper-fix branch (r8690 - r8698), with changes below: - Split wrapper flags into those required for each of the three projects, and cleaned up some cruft (including the LIBMPI_EXTRA_*FLAGS) through- out the build system - Added opal_init_util and opal_finalize_util to allow init / cleanup of all the opal code that doesn't require the MCA system - Create standalone key=value file parser, based on the one that used to be in the mca param parser, so that it can be shared in multiple places - Add wrapper datafiles for opal, orte, and ompi wrappers, and add wrapper compiler with support for all the old features This commit was SVN r8699. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r8690 r8698	2006-01-16 01:48:03 +00:00
Brian Barrett	60ac1cb5f4	print stack traces (when available) for opal and orte processes, as well as ompi processes. Also add SIGABRT to the list of signals that are intercepted to print out pretty messages. This commit was SVN r8672.	2006-01-11 04:36:39 +00:00
Jeff Squyres	f1e8790bbe	Add remaining OPAL error codes in opal_err2str() This commit was SVN r8573.	2005-12-21 06:27:34 +00:00
Brian Barrett	79bf8843d2	* update memory hooks interface to allow for callbacks on both allocations and deallocations, per request from Galen and Tim This commit was SVN r8303.	2005-11-29 04:46:14 +00:00
George Bosilca	16ca6e4c88	error seems to be a reserved keyword for some compilers ... This commit was SVN r8262.	2005-11-26 21:18:47 +00:00
Brian Barrett	878676218e	Rename opal/memory to opal/memoryhooks because XLC++ on Mac OS X is broken. When compiling C++ code that includes something that looks for the C++ header file "memory" (stupid C++ headers not having .h extensions), it goes through the header file search path, which includes $(topsrcdir)/opal, so it finds the directory $(topsrcdir)/opal/memory/ and tries to load that as the memory header file and all goes downhill. This commit was SVN r8111.	2005-11-11 00:26:27 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
George Bosilca	0d4aaf6fa6	Provide the bool value as expected by the opal_show_help function. This commit was SVN r7832.	2005-10-21 20:04:18 +00:00
Andrew Friedley	b1af69dfe7	Don't check for errors on the paffinity stuff, as per Brian's request. This commit was SVN r7640.	2005-10-05 18:08:06 +00:00
Andrew Friedley	37123ed430	Implement an opal_show_help() (like is done in ompi_mpi_init) for error handling in opal_init and both stages of orte_init. Some of the functions in opal_init are void or return a bool (opal_output_init, but always returns true.. eh?), so I don't check them. This commit was SVN r7638.	2005-10-05 13:56:35 +00:00
Brian Barrett	1d9b663b62	* test for condition where we think we can intercept malloc/free/munmap but really can't. Test for munmap, since it's the most likely to cause problems, since it's always an interposed symbol. The condition that usually causes problems is if libmpi was brought in as the result of a library dependency, rather than as a -l on the link line. The linker in this case will find malloc/free/munmap/etc. in libc, rather than in libmpi. This commit was SVN r7508.	2005-09-26 20:20:20 +00:00
Ralph Castain	2c6e47e38c	Add a trace utility that provides info on progress through functions. This is not enabled yet - need Jeff or Brian to add it to the configure/build system. This commit was SVN r7222.	2005-09-07 18:52:28 +00:00
Brian Barrett	77ebdf1c6f	* Add some debugging output Ralph asked for when an unknown error code is passed to opal_error This commit was SVN r7087.	2005-08-29 23:36:53 +00:00


63 Commits