openmpi

Author	SHA1	Message	Date
Jeff Squyres	6c3ddf98ae	revert open-mpi/ompi@c75650e68f That commit was a bad idea.	2015-02-23 15:39:53 -08:00
Jeff Squyres	c75650e68f	opal_finalize: fix minor memory leak This string is strdup'ed in opal_init_util().	2015-02-23 12:34:49 -08:00
Jeff Squyres	4a85f759ec	opal_info_support.c: prevent a NULL pointer If NULL is passed in, then assume the caller meant "". This was CID 993714.	2015-02-12 13:41:29 -08:00
Ralph Castain	3ae3b96c17	Fix master compilation - a buried header dependency must have been removed.	2015-02-10 07:22:10 -08:00
Ralph Castain	f28238af59	Fix a race condition seen by Absoft during finalize. Stop the orte progress thread without cleaning it up, thus allowing the frameworks to still cancel their posted recv's. Then cleanup the memory footprint afterwards.	2015-02-05 11:41:37 -08:00
Gilles Gouaillardet	661c35ca67	cleanup dead code caused by the removal of the --with-threads configure option	2015-01-16 19:13:59 +09:00
Nathan Hjelm	d0da29351f	opal_progress: fix sched_yield check	2014-12-09 14:14:20 -07:00
Jeff Squyres	c22e1ae33b	configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros These two macros set the prefix for the OPAL and ORTE libraries, respectively. Specifically, the OPAL library will be named libPREFIXopen-pal.la and the ORTE library will be named libPREFIXopen-rte.la. These macros must be called, even if the prefix argument is empty. The intent is that Open MPI will call these macros with an empty prefix, but other projects (such as ORCM) will call these macros with a non-empty prefix. For example, ORCM libraries can be named liborcm-open-pal.la and liborcm-open-rte.la. This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running MPI applications under ORTE, if the ORTE and OPAL libraries between OMPI and ORCM are not identical (which, because they are released at different times, are likely to be different), we need to ensure that the OMPI applications link against their ORTE and OPAL libraries, but the ORCM executables link against their ORTE and OPAL libraries.	2014-10-22 10:32:19 -07:00
Jeff Squyres	01fd96bfa5	Revert "Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build." This reverts commit `63f619f871`.	2014-10-22 10:32:11 -07:00
Ralph Castain	63f619f871	Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build.	2014-10-10 11:39:08 -07:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Jeff Squyres	413e775dbf	version configury: make dist now works Update the VERSION file scheme: * Remove "want_repo_rev". * Add "tarball_version". All values are now always included (major, minor, release, greek, repo_rev). However, configure.ac now runs "opal_get_version.sh ... --tarball", which will return the value of tarball_version (if it is non-empty) or the "full" version string (i.e., "major.minor.releasegreek").	2014-10-02 11:32:54 -07:00
Jeff Squyres	d4e2809531	version: always use all 3 version numbers In all previous releases, the version number would be "A.B.C" unless C was 0, in which case it would be "A.B". This commit changes that scheme to always be "A.B.C", even if C==0. Hence, v1.9.0 will be the first release where this new scheme is evident. This commit was SVN r32816.	2014-09-30 15:54:18 +00:00
Artem Polyakov	f2e586980b	Fix timing framework: 1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php) 2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file. 3. Improve mpirun_prof: 1) adapt to the environment (SLURM/TORQUE); 2) recognize some noteset-related mpirun options. This commit was SVN r32772.	2014-09-23 12:59:54 +00:00
Artem Polyakov	70587d1804	Remove outdated OPAL parameter "opal_pmi_version". Now PMI selection is handled by PMIx MCA. This commit was SVN r32767.	2014-09-20 02:30:23 +00:00
Ralph Castain	dfb952fa78	[Contribution from Artem - moved it to svn from git for him] Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup. This commit was SVN r32738.	2014-09-15 18:00:46 +00:00
Ralph Castain	e32d541c8d	Bring over a slight modification to the opal_init_test routine This commit was SVN r32676.	2014-09-07 15:46:53 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Jeff Squyres	0a398c155f	opal MCA params: Move (and adapt) help message to opal help file This commit was SVN r32547.	2014-08-16 11:54:41 +00:00
Ralph Castain	a347b19dc1	Add missing include This commit was SVN r32406.	2014-08-01 18:49:37 +00:00
Ralph Castain	76d82b885f	Correctly dereference the thread object This commit was SVN r32321.	2014-07-26 17:01:27 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
Ralph Castain	c1bb5b68d0	It is possible to have a "standard" progress thread, so simplify the usage of the opal_progress_thread code. This commit was SVN r32277.	2014-07-22 16:55:23 +00:00
George Bosilca	7c21491858	Fix the indentation. Allow for deregistration of OPAL params. This commit was SVN r32242.	2014-07-15 05:20:26 +00:00
George Bosilca	2f6bc76dc1	Be symmetric to the opal_init. This commit was SVN r32241.	2014-07-15 05:05:26 +00:00
George Bosilca	77c74e8872	Don't initialize the event framework twice (it is already initialized by the init_tool). This commit was SVN r32239.	2014-07-15 05:04:29 +00:00
Mike Dubman	e342a11c2e	opal envlist mca: implement Jeff's quibbles fixed by Elena, reviewed by Miked This commit was SVN r32216.	2014-07-11 07:23:20 +00:00
Ralph Castain	60da1456d9	Silence unused var warning This commit was SVN r32187.	2014-07-09 22:37:22 +00:00
Joshua Ladd	057370364d	Opal: Add a new MCA variable type "version_string". Also add a new flag to ompi_info that allows a user to print all MCA variables of a specific type. --type version_string This command will print all MCA variables of type version_string. This feature was developed by Elena Shipunova and was reviewed by Josh Ladd. This commit was SVN r32166.	2014-07-09 01:37:23 +00:00
Ralph Castain	832fa4a028	Ensure that the progress thread tracker properly cleans up the blocking event, if set. Also, use the blocking event to help wake up the progress thread for quick shutdown as some threads can be blocked in a long-running call to select. This commit was SVN r32141.	2014-07-04 14:55:51 +00:00
Ralph Castain	f6d4b4c11b	As discussed at the OMPI developer's meeting, add functions to start, stop, and restart libevent-driven progress threads. Critical NOTE: if you don't have a file descriptor event defined for your progress thread, it will spin hard! Accordingly, the "start progress thread" function has a boolean parameter you can use to request that the function automatically create one for you. This commit was SVN r32137.	2014-07-03 18:56:46 +00:00
Ralph Castain	1107f9099e	Per the RFC issued here: http://www.open-mpi.org/community/lists/devel/2014/05/14827.php Refactor PMI support This commit was SVN r31907.	2014-06-01 04:28:17 +00:00
Ralph Castain	5602156a1c	Use the correct abstraction layer name for the data dirs This commit was SVN r31684.	2014-05-08 14:32:24 +00:00
Ralph Castain	11faab1091	The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees. This commit was SVN r31679.	2014-05-08 02:01:35 +00:00
Jeff Squyres	24e222f49d	opal: Remove unused help message. Help messages about deprecated variables are now provided by the MCA var system. This commit was SVN r31325.	2014-04-07 15:41:43 +00:00
Nathan Hjelm	a4fff57720	make the opal progress yield variable settable at any time The semantics of the variable mpi_yield_when_idle are to call opal_progress_set_yield_when_idle at MPI_Init. It would be difficult to modify the old variable to support setting this parameter at runtime. The fix is to add an additional parameter to opal: opal_progress_yield_when_idle that directly sets the variable. This variable is settable anytime and does not affect the semantics of the old mpi_yield_when_idle variable. Refs trac:193 cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31255. The following Trac tickets were found above: Ticket 193 --> https://svn.open-mpi.org/trac/ompi/ticket/193	2014-03-27 15:51:06 +00:00
Adrian Reber	4ca07ae125	re-introduce distill_checkpoint_ready In the OPAL_ENABLE_FT_CR code path there used to be a variable 'mca_base_component_distill_checkpoint_ready' which got removed. The FT code was not compiling and while trying to get it to compile again the old variable was #ifdef'd out. This re-introduces the variable with a new name 'opal_base_distill_checkpoint_ready' and enables the code previously #ifdef'd out. This removes the last hack introduced to get the FT code to compile again. This commit was SVN r30928.	2014-03-04 16:14:46 +00:00
Ralph Castain	78e1846b4b	Add further clarification regarding new "test" APIs This commit was SVN r30567.	2014-02-05 15:48:31 +00:00
Ralph Castain	230336b6a8	Upgrade the security framework to avoid multiple hits against the global security server. Add support for future case where mpirun assigns a global security credential for a given run, though we need to work out how to handle connect-accept from other mpirun's in that case. Remove a bunch of duplicate code in the OOB by consolidating the connection handshake code. Refs trac:4221 This commit was SVN r30554. The following Trac tickets were found above: Ticket 4221 --> https://svn.open-mpi.org/trac/ompi/ticket/4221	2014-02-04 14:47:04 +00:00
Ralph Castain	5980b7e042	Add a security framework for authenticating connections - we will add LDAP, Kerberos, and Keystone support in the next month. For now, just put a placeholder "basic" module that does the minimum. Wire the security check into ORTE's OOB handshake, and add a "version" check to ensure that both ends are from the same ORTE version. If not, report the mismatch and refuse the connection Fixes trac:4171 cmr=v1.7.5:reviewer=jsquyres:subject=Add a security framework for authenticating connections This commit was SVN r30551. The following Trac tickets were found above: Ticket 4171 --> https://svn.open-mpi.org/trac/ompi/ticket/4171	2014-02-04 01:38:45 +00:00
Ralph Castain	83e32aadb7	Add a variant of opal_init/finalize for running unit tests This commit was SVN r30497.	2014-01-30 11:14:36 +00:00
Ralph Castain	26fbb4e77b	Necessary constants for postgres module This commit was SVN r30338.	2014-01-20 19:58:56 +00:00
Jeff Squyres	13b29cff2c	This commit complements/completes r30140. r30140 made all the configury/Makefile.am changes; this commit renames the internal installdirs.h framework struct field names to match the configury macro names: * pkgdatadir -> ompidatadir * pkglibdir -> ompilibdir * pkgincludedir -> ompiincludedir This commit was SVN r30145. The following SVN revision numbers were found above: r30140 --> open-mpi/ompi@8b778903d8	2014-01-07 23:36:33 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Nathan Hjelm	3be4536d9b	Cleanup various leaks in ompi_info reported by valgrind. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30058.	2013-12-23 17:47:43 +00:00
Nathan Hjelm	ee9cd13b90	Remove opal_recursion_depth_counter and opal_progress_thread_count. These counters add two atomics in the critical path and are not currently used. We can bring them back if there turns out to be a good use for them. cmr=v1.7.4:reviewer=brbarret This commit was SVN r29994.	2013-12-19 23:15:27 +00:00
Brian Barrett	6ef938de3f	* Per the Developer's meeting today, restructure the threading in Open MPI a bit more: - Remove OPAL_ENABLE_MULTI_THREADS, since it didn't really do anything correctly. Opal always has threads enabled at this point. - Remove OMPI_ENABLE_PROGRESS_THREADS, since this hasn't worked in 8 years and it has performance issues we'll never be able to overcome. Note that we have plans for re-adding async progress, using a hybrid protocol of async and sync sends. - OMPI_ENABLE_THREAD_MULTIPLE now determines whether the thread lock macros do the check or not. - Condition variables are ALWAYS polling right now, which fixes the thread live-lock currently found when THREAD_MULTIPLE is turned on. This commit was SVN r29891.	2013-12-13 19:40:12 +00:00
Jeff Squyres	f1bff698a4	Fix compiler warning: event is unsigned; it can't be negative cmr=v1.7.4:reviewer=rhc This commit was SVN r29684.	2013-11-13 15:35:37 +00:00
Alex Margolin	50a3c01a0f	fixed build without thread support This commit was SVN r29145.	2013-09-06 19:03:19 +00:00
Nathan Hjelm	77a41e1ca9	ompi_info: mark the variables from disabled components as disabled in the output of ompi_info. A variable is disabled if its component will never be selected due to a component selection parameter (eg. -mca btl self). The old behavior of ompi_info was to not print these parameters at all. Now we print the parameters. After some discussion with George it was decided that there needed to be some way to see what parameters will not be used. This was the compromise. This commit also fixes a bug and a typo in the pvar system. The enum_count value in mca_base_pvar_dump was being used without being set. The full_name in mca_base_pvar_t was not being used. cmr=v1.7.3:ticket=trac:3734 This commit was SVN r29078. The following Trac tickets were found above: Ticket 3734 --> https://svn.open-mpi.org/trac/ompi/ticket/3734	2013-08-28 16:03:23 +00:00
Nathan Hjelm	c699ee7812	Update the ompi_info man page with information about variable levels and improve the behavior of ompi_info. This commit changes the default behavior of ompi_info --all when a level is not specified. Instead of assuming level 1 in this case we now assume level 9. This change is due to feedback from the community after the introduction of the --level option. I also added a new option: --selected-only. This option will limit the displayed variables to components that can be selected (ie. if there is a selection parameter set-- btl self,sm) cmr=v1.7.3:reviewer=jsquyres This commit was SVN r29070.	2013-08-27 19:11:37 +00:00
Ralph Castain	7947cec8fa	Cleanup warning This commit was SVN r29031.	2013-08-16 21:13:40 +00:00
Ralph Castain	8a4c5f4957	Attempt to plug a few memory leaks by ensuring we finalize all things opened during init. However, we are still leaking memory like a sieve in param registration and hwloc. This commit was SVN r29026.	2013-08-14 02:03:00 +00:00
Nathan Hjelm	456de007a8	ignore unavailable components when registering This commit was SVN r28802.	2013-07-16 16:02:33 +00:00
Nathan Hjelm	d446675526	MCA: Per-RFC, add support for performance variables This commit adds an API for registering and querying performance variables (mca_base_pvar) in the MCA base. The existing MCA variable system API has been updated to reflect the new API: MCA variable groups have performance variables, and new types have been added (double, unsigned long long) to reflect what is required by the MPI_T interface. Additionally, the MCA variable group code has been split into its own set of files: mca_base_var_group.[ch]. Details of the new API can be found in doxygen comments in the header: mca_base_pvar.h. Other changes to the variable system: - Use an opal_hash_table to speed up variable/group lookup. - Clean up code associated with MCA variable types. - Registered performance variables are printed by ompi_info -a. In the future an option should be added to control this behavior. Changes to OMPI: - Added full support for the MPI_T performance variable interface. This commit was SVN r28800.	2013-07-16 16:02:13 +00:00
Ralph Castain	10ca1c1b04	Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature". This commit was SVN r28789.	2013-07-14 18:57:20 +00:00
Nathan Hjelm	a694bcb6b6	Add support for the MCA variable information level to ompi_info. Add an option to ompi_info (-l, --level) that takes a number in the interval (1,9). Only MCA variables up to this level will be printed. The default level is 1. Print the level as part of both the parsable and readable output. This commit was SVN r28750.	2013-07-10 18:52:36 +00:00
Nathan Hjelm	721779d7ab	Per RFC: remove old MCA parameter system. This commit was SVN r28541.	2013-05-20 15:36:13 +00:00
Jeff Squyres	089c632cce	Remove a bunch of dead code: gcc 4.7 warns of set-but-unused variables. So get rid of them. This commit was SVN r28538.	2013-05-17 21:45:49 +00:00
Nathan Hjelm	c50b99005d	fix typo in opal_info_show_component_version and clean up more from ompi_info This commit was SVN r28389.	2013-04-24 22:07:06 +00:00
Nathan Hjelm	4896b3bc4b	clean up some ompi_info code This commit was SVN r28388.	2013-04-24 21:37:24 +00:00
Ralph Castain	4fae24f2f1	Crud - missed this file, needs to go with prior commit, will add to cmr This commit was SVN r28382.	2013-04-24 17:47:18 +00:00
Nathan Hjelm	bccf8c657a	Per RFC add initial support for the MPI 3.0 tools interface. Current MPI_T support: - Full cvar interface. - Full categories interface. - No pvar support at this time. This commit was SVN r28376.	2013-04-24 15:59:23 +00:00
Ralph Castain	1f011bef99	Cleanup the updated sys limits capability. Fix a few copy/paste bugs (my bad). Shift the limit set to the ODLS default module so that we set the limits for all apps, even those that don't call opal_init. Leave it in opal_init as well to support direct-launch apps, but ensure we only set the limits once by removing the envar after launch by ODLS. Provide some nice error messages if we fail to set the limits. Since the user had to specifically request we set the limit, treat failure as an error-out situation. This commit was SVN r28288.	2013-04-04 16:00:17 +00:00
Ralph Castain	d09a9e8096	Upgrade the system limit code to support a broader range of parameters. For now, we support stack size, #open files, #children, and file size we can create. Continue to support the old "1" or "0" options for backward compatibility. This commit was SVN r28282.	2013-04-03 18:57:53 +00:00
Ralph Castain	a9dc5a31f2	Fix verbosity setting This commit was SVN r28251.	2013-03-27 22:12:01 +00:00
Nathan Hjelm	17315bf360	Now that the entire codebase has been updated to use the MCA framework system remove the last calls to the MCA parameter system. This commit was SVN r28242.	2013-03-27 21:17:53 +00:00
Nathan Hjelm	9d4a26f47d	Update OMPI frameworks to use the MCA framework system. Notes: - This commit also eliminates the need for an available components list in use in several frameworks. None of the code in question was making use of the priority field of the priority component list item so these extra lists were removed. - Cleaned up selection code in several frameworks to sort lists using opal_list_sort. - Cleans up the ompi/orte-info functions. Expose the functions that construct the list of params so they can be used elsewhere. patches for mtl/portals4 from brian missed a few output variables in openib This commit was SVN r28241.	2013-03-27 21:17:31 +00:00
Nathan Hjelm	365cf48db5	Update OPAL frameworks to use the MCA framework system. This commit was SVN r28239.	2013-03-27 21:11:47 +00:00
Nathan Hjelm	c3b67d0187	Automatically generate a list of installed frameworks in project/include/project/frameworks.h This commit was SVN r28238.	2013-03-27 21:10:32 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file can not be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char *, int *, or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few components actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump(). opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
Ralph Castain	b7f0e46319	Provide a nicer error message when someone gives a bad signal number to opal_signal cmr:v1.7.1 This commit was SVN r28188.	2013-03-20 15:30:59 +00:00
Ralph Castain	a4b6fb241f	Remove all remaining vestiges of the Windows integration This commit was SVN r28137.	2013-02-28 17:31:47 +00:00
Ralph Castain	bd9265c560	Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data. A few changes were required to support this move: 1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution). 2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time. 3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you separately specify the "store" and "fetch" ordering. 4. the ORTE grpcomm and ess/pmi components, and the nidmap code, were updated to work with the new db framework and to specify internal/global storage options. No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework. This commit was SVN r28112.	2013-02-26 17:50:04 +00:00
Brian Barrett	fc3df11e08	Remove the (only two) fortran constants from OPAL. The only places that actually care if opal_pointer_array is limited to handle_max already passes that in as the max_size during init, so don't need it there. The arch constant was a bit more difficult, so pass that in during MPI init and leave empty otherwise. This is to help with the effort to allow building ompi against an external opal or orte. This commit was SVN r27817.	2013-01-15 01:27:36 +00:00
Ralph Castain	de486d3000	Silence compiler warnings This commit was SVN r27589.	2012-11-12 02:51:05 +00:00
Nathan Hjelm	a754674fd7	Per the specification for putenv (http://pubs.opengroup.org/onlinepubs/009604599/functions/putenv.html ) the string given to putenv becomes part of the environment. The string must not be changed or freed. cmr:v1.7 This commit was SVN r27578.	2012-11-09 16:33:14 +00:00
Jeff Squyres	fb2e543a57	Refs trac:3275. We ran into a case where the OMPI SVN trunk grew a new acceptable MCA parameter value, but this new value was not accepted on the v1.6 branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts the value "silent", but on the older v1.6 branch, it doesn't). If you set "hwloc_base_mem_bind_failure_action=silent" in the default MCA params file and then accidentally ran with the v1.6 branch, every OMPI executable (including ompi_info) just failed because hwloc_base_open() would say "hey, 'silent' is not a valid value for hwloc_base_mem_bind_failure_action!". Kaboom. The only problem is that it didn't give you any indication of where this value was being set. Quite maddening, from a user perspective. So we changed how ompi_info handles this case. If any framework open function returns OMPI_ERR_BAD_PARAM (either because its base MCA params got a bad value or because one of its component register/open functions returns OMPI_ERR_BAD_PARAM), ompi_info will stop, print out a warning that it received an error, and then dump out the parameters that it has received so far in the framework that had a problem. At a minimum, this will show the user the MCA param that had an error (it's usually the last one), and ''where it was set from'' (so that they can go fix it). We updated ompi_info to check for O???_ERR_BAD_PARAM from each of the framework opens. Also updated the doxygen docs in mca.h for this O???_BAD_PARAM behavior. And we noticed that mca.h had MCA_SUCCESS and MCA_ERR_??? codes. Why? I think we used them in exactly one place in the code base (mca_base_components_open.c). So we deleted those and just used the normal OPAL_* codes instead. While we were doing this, we also cleaned up a little memory management during ompi_info/orte-info/opal-info finalization. Valgrind still reports a truckload of memory still in use at ompi_info termination, but they mostly look to be components not freeing memory/resources properly (and outside the scope of this fix). This commit was SVN r27306. The following Trac tickets were found above: Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275	2012-09-11 20:47:24 +00:00
George Bosilca	f7528bb404	Remove unused variables. This commit was SVN r26966.	2012-08-08 12:43:13 +00:00
George Bosilca	2303cd0bdb	Remove initialized but unused variables. This commit was SVN r26959.	2012-08-07 12:05:25 +00:00
Jeff Squyres	ce85596bc9	Also show the memcpy framework (if there are any components, which there probably won't be, but...). This commit was SVN r26879.	2012-07-26 21:28:41 +00:00
Shiqing Fan	8c4a3e1269	correct the symbol dllexports for windows build This commit was SVN r26827.	2012-07-22 08:54:50 +00:00
Shiqing Fan	12d99a9ebb	Update the hwloc build on Windows and related files. This commit was SVN r26818.	2012-07-20 12:14:28 +00:00
Jeff Squyres	7a4f6a6a1a	Assign the "ret" variable before it is used. This commit was SVN r26781.	2012-07-11 12:09:00 +00:00
Ralph Castain	e335de3564	Refactor ompi_info, splitting it into parts according to the layer involved. Thus, we call down to the opal layer to get those frameworks and components, and down to the orte layer to get those. Still some abstraction breaks, but they mostly involve renaming of OMPI_foo labels that have been around since before we split the build system by layer. This commit was SVN r26695.	2012-06-28 18:23:34 +00:00
Jeff Squyres	2ba10c37fe	Per RFC, bring in the following changes: * Remove paffinity, maffinity, and carto frameworks -- they've been wholly replaced by hwloc. * Move ompi_mpi_init() affinity-setting/checking code down to ORTE. * Update sm, smcuda, wv, and openib components to no longer use carto. Instead, use hwloc data. There are still optimizations possible in the sm/smcuda BTLs (i.e., making multiple mpools). Also, the old carto-based code found out how many NUMA nodes were ''available'' -- not how many were used ''in this job''. The new hwloc-using code computes the same value -- it was not updated to calculate how many NUMA nodes are used ''by this job.'' * Note that I cannot compile the smcuda and wv BTLs -- I ''think'' they're right, but they need to be verified by their owners. * The openib component now does a bunch of stuff to figure out where "near" OpenFabrics devices are. '''THIS IS A CHANGE IN DEFAULT BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors (I do not have a NUMA machine with an OpenFabrics device that is a non-uniform distance from multiple different NUMA nodes). * Completely rewrite the OMPI_Affinity_str() routine from the "affinity" mpiext extension. This extension now understands hyperthreads; the output format of it has changed a bit to reflect this new information. * Bunches of minor changes around the code base to update names/types from maffinity/paffinity-based names to hwloc-based names. * Add some helper functions into the hwloc base, mainly having to do with the fact that we have the hwloc data reporting ''all'' topology information, but sometimes you really only want the (online | available) data. This commit was SVN r26391.	2012-05-07 14:52:54 +00:00
Jeff Squyres	0652e2e913	Fix/clarify a comment from r26322. This commit was SVN r26323. The following SVN revision numbers were found above: r26322 --> open-mpi/ompi@aba398ce09	2012-04-24 17:35:19 +00:00
Jeff Squyres	aba398ce09	Per RFC (http://www.open-mpi.org/community/lists/devel/2012/04/10905.php), set opal_cache_line_size via hwloc data, if we have it. opal_cache_line_size will be set to an hwloc-inspired value by the end of orte_init(), but will always have a safe value to use (i.e., a default value 128) -- even before opal_init() has completed. Default to the same value of 128 that Open MPI has used for several years if a) we have no hwloc data, or b) we weren't able to find L2 objects in the hwloc data. This commit was SVN r26322.	2012-04-24 17:31:06 +00:00
Ralph Castain	bd8b4f7f1e	Sorry for mid-day commit, but I had promised on the call to do this upon my return. Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code. Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch. This commit was SVN r26242.	2012-04-06 14:23:13 +00:00
Josh Hursey	1941f6b3b1	Cleanup some compiler warnings when doing an optimized/non-debug build. This commit was SVN r26236.	2012-04-04 20:40:16 +00:00
Jeff Squyres	63a96e92b5	In a recent v1.5 branch issue, it took a while to figure out that paffinity hwloc was returning "NOT_SUPPORTED" when the real problem was that the underlying hwloc simply hadn't been initialized yet. So let's clearly delineate this case: return OPAL_ERR_NOT_INITIALIZED if the underlying hwloc is not initialized. This commit was SVN r25902.	2012-02-10 18:29:52 +00:00
Jeff Squyres	1d3dc0af28	Gah! opal_shmem_base_register_params() ''wasn't'' added for the mmap on NFS warning -- it was already there! So put it back so that it can register base_verbose and RUNTIME_QUERY_hint. This commit was SVN r25663.	2011-12-15 21:14:34 +00:00
Jeff Squyres	9cef715194	Updates to r25652 -- put this MCA param in the shmem/mmap component. No need for it to be in the base (we mistakenly thought it was used in multiple shmem components). This commit was SVN r25662. The following SVN revision numbers were found above: r25652 --> open-mpi/ompi@7e223b5799	2011-12-15 20:41:14 +00:00
Ralph Castain	7e223b5799	Okay, okay...stop the whining! Put the mca param registration in the shmem base. This commit was SVN r25652.	2011-12-14 22:25:32 +00:00
Ralph Castain	4303958968	Allow users to silence warning This commit was SVN r25650.	2011-12-14 21:50:34 +00:00
Josh Hursey	58938b2f50	* Clarified show help when CRS component cannot be loaded. * Fixes trac:2329 : Improves the error message, and ensures opal-restart will not segv in opal_finalize. This commit was SVN r25586. The following Trac tickets were found above: Ticket 2329 --> https://svn.open-mpi.org/trac/ompi/ticket/2329	2011-12-07 14:58:08 +00:00
Ralph Castain	6fefe236a4	Warn users if they set opal_paffinity_alone, either to true or false, that this parameter is no longer functional - they must use the --bind-to option and its corresponding mca param. This commit was SVN r25567.	2011-12-03 01:10:52 +00:00
Samuel Gutierrez	375162c693	this commit fixes a few things. 1. silence warning in common sm. 2. remove unneeded config code in common sm. 3. move opal_shmem_base_close to a better place in opal_finalize. 4. fix opal_path_nfs output. This commit was SVN r25518.	2011-11-28 23:41:19 +00:00
Ralph Castain	6310361532	At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here: https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation. In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions: 1. I have at long last bowed my head in submission to the system admins of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior. 2. both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation. 3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so. As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes. This commit was SVN r25476.	2011-11-15 03:40:11 +00:00
Ralph Castain	1bfc2bb424	Minor cleanup This commit was SVN r25417.	2011-11-02 18:24:19 +00:00


321 commits