openmpi

Author	SHA1	Message	Date
Jeff Squyres	265e5b9795	Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1 ompi/opal/orte/oshmem/test: max hostname length cleanup	2016-05-02 09:44:18 -04:00
Ralph Castain	6ac7929bd0	Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need. Cleanups per @jjhursey review	2016-05-01 11:30:25 -07:00
Karol Mroz	e1c64e6e59	opal: standardize on max hostname length Define OPAL_MAXHOSTNAMELEN to be either: (MAXHOSTNAMELEN + 1) or (limits.h:HOST_NAME_MAX + 1) or (255 + 1) For pmix code, define above using PMIX_MAXHOSTNAMELEN. Fixup opal layer to use the new max. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-24 08:19:47 +02:00
Nathan Hjelm	c2b6fbb124	opal/memory: move initialization to first rcache creation Because of the removal of the linux memory component it is no longer necessary to initialize the memory component in opal_init(). This commit moves the initialization to the creation of the first rcache component. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:21:46 -06:00
Nathan Hjelm	27f8a4e806	opal: add code patcher framework This commit adds a framework to abstract runtime code patching. Components in the new framework can provide functions for either patching a named function or a function pointer. The latter functionality is not being used but may provide a way to allow memory hooks when dlopen functionality is disabled. This commit adds two different flavors of code patching. The first is provided by the overwrite component. This component overwrites the first several instructions of the target function with code to jump to the provided hook function. The hook is expected to provide the full functionality of the hooked function. The linux patcher component is based on the memory hooks in ucx. It only works on linux and operates by overwriting function pointers in the symbol table. In this case the hook is free to call the original function using the function pointer returned by dlsym. Both components restore the original functions when the patcher framework closes. Changes had to be made to support Power/PowerPC with the Linux dynamic loader patcher. Some of the changes: - Move code necessary for powerpc/power support to the patcher base. The code is needed by both the overwrite and linux components. - Move patch structure down to base and move the patch list to mca_patcher_base_module_t. The structure has been modified to include a function pointer to the function that will unapply the patch. This allows the mixing of multiple different types of patches in the patch_list. - Update linux patching code to keep track of the matching between got entry and original (unpatched) address. This allows us to completely clean up the patch on finalize. All patchers keep track of the changes they made so that they can be reversed when the patcher framework is closed. At this time there are bugs in the Linux dynamic loader patcher so its priority is lower than the overwrite patcher. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:16:13 -06:00
rhc54	f858647779	Merge pull request #1522 from kmroz/wip-ompi-info-params-fix opal_info_support: fix memory leak and refactor for pvars	2016-04-05 10:02:42 -07:00
Nathan Hjelm	444190093a	Merge pull request #1516 from kmroz/wip-ompi-info-cleanup opal_info_support: fix api comments	2016-04-05 07:31:33 -06:00
Karol Mroz	13979f559f	opal_info_support: refactor component output separation Adding component name to the pvar pretty output as well. Further, I think keeping the asprintf()/opal_info_out(msg,msg,------) within the loops is needed to avoid printing any component information (independently for group_vars and group_pvars) in case the upcoming parameters are internal and not to be displayed. Lastly, unnecessarily duplicating the dashed output should not happen as each invocation of opal_info_show_mca_group_params() passes a new group structure which we check. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-04 19:49:57 +02:00
Karol Mroz	43254391a8	opal_info_support: fix memory leak Fixing a memory leak I introduced with `a3229c3a1f`. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-04 18:11:11 +02:00
rhc54	a548232c6b	Merge pull request #1518 from kmroz/wip-ompi-info-param-output-1 opal_info_support: add component to param pretty output	2016-04-03 06:39:00 -07:00
Karol Mroz	e1eb23e7eb	opal_info_support: separate parameter groups in pretty output Use a dashed line to separate parameters based on component when pretty printing. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-03 10:56:39 +02:00
Karol Mroz	a3229c3a1f	opal_info_support: add component to param pretty output When listing available parameters, add the component name to the MCA framework field. Parsable option is already doing this, makes sense for the pretty print option to do it as well. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-03 10:49:32 +02:00
Karol Mroz	20e448c7d8	opal_info_support: output component versions When invoking, for example, `ompi_info` with: -a --params foo all --params foo bar it's useful to have the appropriate components and their versions be displayed, regardless of whether they have registered any parameters. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-02 21:21:40 +02:00
Karol Mroz	a468c3ba1a	opal_info_support: pass component map when handling params Pass component_map to opal_info_do_params(). It will be needed to output component versions. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-02 21:17:44 +02:00
Karol Mroz	296bd156e7	opal_info_support: fix api comments Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-02 17:19:40 +02:00
Ralph Castain	60a7bc2e50	Enable the PMIx notification callback system. This currently is only supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion. Fixes #1225	2016-02-18 09:29:12 -08:00
Gilles Gouaillardet	0fb7b07a71	opal/progress: fix non debug builds this bug was introduced in open-mpi/ompi@64b695669a Thanks Pavel (Pasha) Shamis for reporting this issue	2016-01-09 15:47:40 +09:00
Gilles Gouaillardet	0b3e3c6817	opal/runtime: add missing #include <unistd.h> Thanks Marco Atzeri for contributing the original patch	2015-12-24 14:41:56 +09:00
Ralph Castain	64b695669a	Cleanup warnings in opal and orte layers when building optimized on Mac	2015-12-17 07:51:24 -08:00
igor.ivanov@itseez.com	c15bf147bf	opal: Add opal_abort_print_stack mca variable with aliases for ompi/oshmem This commit allows to control output during abnormal oshmem/ompi application termination. Fixed issue in backtrace output. HAVE_BACKTRACE was never set so user was limited in control of this variable. Two related mca variables are moved to opal layer. Corresponding aliases are added for ompi and oshmem.	2015-11-25 18:18:33 +02:00
Jeff Squyres	a7e7ecd42d	opal_progress_thread: fix stale comment	2015-10-14 18:25:31 -07:00
Jeff Squyres	92bc8afd43	opal_progress_threads: fix double RELEASE If a thread failed to start, the tracker would be released twice. This commit fixes CID 1316020.	2015-08-12 05:11:40 -07:00
Jeff Squyres	99fa054507	opal_progress_threads: update to the API There are now four functions and one global constant: * opal_progress_thread_name: the name of the OPAL-wide async progress thread. If you have general purpose events that you need to run in a progress thread, but not a dedicated progress thread, use this name in the functions below to glom your events on to the general OPAL-wide async progress thread. * opal_progress_thread_init(): return an event base corresponding to a progress thread of the specified name (a progress thread will be created for that name if it does not already exist). * opal_progress_thread_finalize(): decrement the refcount on the passed progress thread name. If the refcount is 0, stop the thread and destroy the event base. * opal_progress_thread_pause(): stop processing events on the event base corresponding to the progress thread name, but do not destroy the event base. * opal_progress_thread_resume(): resume processing events on the event base corresponding to a previously-paused progress thread name.	2015-08-07 10:13:40 -07:00
Ralph Castain	219c4dfba5	Create a new opal_async_event_base and have the pmix/native and ORTE level use it. This reduces our thread count by one.	2015-07-12 08:23:34 -07:00
Ralph Castain	683efcb850	Rename the current opal_event_base to opal_sync_event_base in preparation for adding an async progress thread to opal. No functional changes made here - just a simple rename.	2015-07-11 10:08:19 -07:00
Nathan Hjelm	4d92c9989e	more c99 updates This commit does two things. It removes checks for C99 required headers (stdlib.h, string.h, signal.h, etc). Additionally it removes definitions for required C99 types (intptr_t, int64_t, int32_t, etc). Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-25 10:14:13 -06:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Jeff Squyres	d164fe9bc5	opal_params.c: fix typo in comment	2015-06-06 10:17:20 -07:00
Nathan Hjelm	427aebbaca	Fix cuda support MCA variables This commit fixes some issues with the cuda support parameters. There were a couple of duplicate registrations and an incorrect synonym (one variable was made a synonym of mpi_preconnect_mpi). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-05-12 09:52:51 -06:00
Gilles Gouaillardet	c809aace47	initialize common symbols from opal A few uninitialized common symbols are remaining: common symbols generated by flex : * opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng * opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext * opal/util/show_help_lex.l: opal_show_help_yyleng * opal/util/show_help_lex.l: opal_show_help_yytext common symbol generated by "external" hwloc library: * opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map	2015-05-08 09:48:51 +09:00
Nathan Hjelm	8287e1d28f	Merge pull request #528 from hjelmn/add_destructor Add opal destructor/fini function	2015-04-20 11:30:15 -06:00
Jeff Squyres	1e6a558993	opal_info_support.c: whitespace cleanup No code changes	2015-04-20 08:56:42 -07:00
Jeff Squyres	1f237b78d1	*_info tools: quote parsable values if they contain colons Thanks to Lev Givon for the suggestion.	2015-04-20 08:56:42 -07:00
Nathan Hjelm	662460b06b	Modify destructor function configury Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-20 09:51:06 -06:00
Nathan Hjelm	38589c46c0	opal: add a destructor/fini function to opal This commit is related to an RFC from June 2014. Discussion can be found at: http://www.open-mpi.org/community/lists/devel/2014/07/15140.php The finalize function is set using either the linker option -fini or __attribute__((destructor)) depending on compiler support. I have confirmed that this hybrid approach works with all the major compilers. The attribute is supported by gcc, clang, llvm, xlc, and icc. The fini function will support pgi. If a compiler/linker combination does not support either the destructor or fini function a message will be printed on re-init indicating it is not supported (an improvement over the current behavior-- SEGV). I moved the following to the destructor function: - Class system finalize. This solves a bug when MPI_T_finalize is called before MPI_Init. The only downside to this change is we will leave the footprint of the opal class system after MPI_Finalize. This footprint should be relatively small. This is an alternative to #517 but the two PRs are not mutually-exclusive (with some modifications). This commit should also be safe for 1.8.x as it does not change internal or external ABI (#517 changes internal ABI). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-16 19:53:52 -06:00
Nathan Hjelm	e794658f2d	Merge pull request #516 from hjelmn/repository_update RFC: Repository update	2015-04-15 10:03:08 -06:00
Nathan Hjelm	c954f457d9	mca/base: update the way dynamic components are handled This commit is a rework of the component repository. The changes included in this commit are: - Remove the component dependency code based off .ompi_info files. This code is legacy code dating back 10 years and is no longer used. - Move the plugin scanning code to the component repository. New calls have been added to add new scanning paths, query available components, and dlopen/load components. - Pass the framework down to mca_base_component_find/filter. Eventually the framework structure will be used to further validate components before they are used. - Add support to the MCA framework system to disable scanning for dlopened components on open (support already existed in register). This is really only relevant to installdirs as it has no register function and no DSO components. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-14 15:55:33 -06:00
Nathan Hjelm	a7b0c00ab6	fix memory leaks and valgrind errors This commit fixes several valgrind errors. Included: - installdirs did not correctly reinitialize all pointers to NULL at close. This causes valgrind errors on a subsequent call to opal_init_tool. - several opal strings were leaked by opal_deregister_params which was setting them to NULL instead of letting them be freed by the MCA variable system. - move opal_net_init to AFTER the variable system is initialized and opal's MCA variables have been registered. opal_net_init uses a variable registered by opal_register_params! - do not leak ompi_mpi_main_thread when it is allocated by MPI_T_init_thread. - do not overwrite ompi_mpi_main_thread if it is already set (by MPI_T_init_thread). - mca_base_var: read_files was overwriting mca_base_var_file_list even if it was non-NULL. - mca_base_var: set all file global variables to initial states on finalize. - btl/vader: decrement enumerator reference count to ensure that it is freed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-11 09:28:35 -06:00
Nathan Hjelm	9cd955badf	opal: fix multiple bugs in MCA and opal This commit fixes the following bugs: - opal_output_finalize did not properly set internal state. This caused problems when calling the sequence opal_output_init (), opal_output_finalize (), opal_output_init (). - opal_info support called mca_base_open () but never called the matching mca_base_close (). mca_base_open () and mca_base_close () have been updated to use a open count instead of an open flag to allow mca_base_open to be called through multiple paths (as may be the case when MPI_T is in use). - orte_info support did not register opal variables. This can cause orte-info to not return opal variables. - opal_info, orte_info, and ompi_info support have been updated to use a register count. - When opening the dl framework the reference count was added to ensure the framework stuck around. The framework being closed prematurely was a bug in the MCA base that has since been corrected. The increment (and associated decrement) have been removed. - dl/dlopen did not set the value of mca_dl_dlopen_component.filename_suffixes_mca_storage on each call to register. Instead the value was set in the component structure. This caused the value to be lost when re-loading the component. Fixed by setting the default value in register. - Reset shmem framework state on close to avoid returning a stale component after reloading opal/shmem. - MCA base parameters were not properly deregistered when the MCA base was closed. This commit may fix #374. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-07 19:13:20 -06:00
Ralph Castain	9dbc69df0f	Stop an ugly infinite loop caused by continual re-opening of the opal if framework.	2015-03-24 17:50:14 -07:00
Adrian Reber	f45dd069bd	FT: fix compilation using --with-ft (1/5) Enabling the FT code breaks compilation (again). This series tries to fix the compiler errors. This is again only fixing the compiler errors without any warranty that the result might actually support FT again. This first patch moves orte_cr_continue_like_restart from ORTE to opal_cr_continue_like_restart in OPAL. This only leaves three calls from OPAL to ORTE in the FT code. As it is not yet 100% clear how to handle these calls the code orte_sstore.set_attr() has been #ifdef'd out for now.	2015-03-11 14:23:33 +01:00
Gilles Gouaillardet	f7f7fa73dd	opal_cr: fix incorrect NULL assignment as reported by Coverity with CID 1288084	2015-03-10 12:06:57 +09:00
Gilles Gouaillardet	1746e23f11	opal/cr: fix misc memory leak and error case as reported by Coverity with CIDs 71858 and 710640	2015-03-09 19:28:52 +09:00
Mike Dubman	98503b56e0	Revert "create the opal_common_verbs_want_fork_support parameter."	2015-03-03 14:28:31 +02:00
Alina Sklarevich	8fe42f1bc1	create the opal_common_verbs_want_fork_support parameter. call the opal_common_verbs_mca_register function to make sure that opal_common_verbs_want_fork_support mca parameter is created and therefore can be used to control the fork support.	2015-03-01 17:40:49 +02:00
Jeff Squyres	336626dafe	spelling: trivial spelling fix s/interupted/interrupted/gi	2015-02-27 18:30:43 -08:00
George Bosilca	aeace0468e	A more sensible fix, move the MCA variable in the verbs common area.	2015-02-26 16:51:09 -05:00
George Bosilca	6777f3ac3c	Add missing qualifiers to the global variable.	2015-02-26 16:25:56 -05:00
Alina Sklarevich	e4c4e7df5e	Fix the calls to ibv_fork_init and remove btl_openib_want_fork_support. In order to have an effect, ibv_fork_init should be called in the beginning of the verbs initialization flow - before the calls to the ibv_create_qp and ibv_create_cq verbs. These functions are called from the oob/ud code and by the time the other verbs components (btl openib, pml yalla, ...) call ibv_fork_init, it's too late. This commit forces the call to ibv_fork_init (if it's requested) right at the beginning of all the components that are using verbs. (ibv_fork_init() can be safely called multiple times) This commit also removes the btl_openib_want_fork_support mca parameter and adds a new mca parameter instead - opal_verbs_want_fork_support. Through this new parameter, fork support may be requested for ALL components. The default value for this parameter is set to 1. Before this commit the btl_openib_want_fork_support parameter didn't provide fork support for the openib btl if its value was set to 1. (because when openib called ibv_fork_init, it was already after the calls to ibv_create_* in oob/ud and therefore it failed).	2015-02-25 10:58:50 +02:00
Jeff Squyres	04d9085c3b	opal_info_support: protect against (group->group_component==NULL) This was CID 1196660	2015-02-24 15:24:09 -05:00
Jeff Squyres	6c3ddf98ae	revert open-mpi/ompi@c75650e68f That commit was a bad idea.	2015-02-23 15:39:53 -08:00
Jeff Squyres	c75650e68f	opal_finalize: fix minor memory leak This string is strdup'ed in opal_init_util().	2015-02-23 12:34:49 -08:00
Jeff Squyres	4a85f759ec	opal_info_support.c: prevent a NULL pointer If NULL is passed in, then assume the caller meant "". This was CID 993714.	2015-02-12 13:41:29 -08:00
Ralph Castain	3ae3b96c17	Fix master compilation - a buried header dependency must have been removed.	2015-02-10 07:22:10 -08:00
Ralph Castain	f28238af59	Fix a race condition seen by Absoft during finalize. Stop the orte progress thread without cleaning it up, thus allowing the frameworks to still cancel their posted recv's. Then cleanup the memory footprint afterwards.	2015-02-05 11:41:37 -08:00
Gilles Gouaillardet	661c35ca67	cleanup dead code caused by the removal of the --with-threads configure option	2015-01-16 19:13:59 +09:00
Nathan Hjelm	d0da29351f	opal_progress: fix sched_yield check	2014-12-09 14:14:20 -07:00
Jeff Squyres	c22e1ae33b	configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros These two macros set the prefix for the OPAL and ORTE libraries, respectively. Specifically, the OPAL library will be named libPREFIXopen-pal.la and the ORTE library will be named libPREFIXopen-rte.la. These macros must be called, even if the prefix argument is empty. The intent is that Open MPI will call these macros with an empty prefix, but other projects (such as ORCM) will call these macros with a non-empty prefix. For example, ORCM libraries can be named liborcm-open-pal.la and liborcm-open-rte.la. This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running MPI applications under ORTE, if the ORTE and OPAL libraries between OMPI and ORCM are not identical (which, because they are released at different times, are likely to be different), we need to ensure that the OMPI applications link against their ORTE and OPAL libraries, but the ORCM executables link against their ORTE and OPAL libraries.	2014-10-22 10:32:19 -07:00
Jeff Squyres	01fd96bfa5	Revert "Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build." This reverts commit `63f619f871`.	2014-10-22 10:32:11 -07:00
Ralph Castain	63f619f871	Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build.	2014-10-10 11:39:08 -07:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Jeff Squyres	413e775dbf	version configury: make dist now works Update the VERSION file scheme: * Remove "want_repo_rev". * Add "tarball_version". All values are now always included (major, minor, release, greek, repo_rev). However, configure.ac now runs "opal_get_version.sh ... --tarball", which will return the value of tarball_version (if it is non-empty) or the "full" version string (i.e., "major.minor.releasegreek").	2014-10-02 11:32:54 -07:00
Jeff Squyres	d4e2809531	version: always use all 3 version numbers In all previous releases, the version number would be "A.B.C" unless C was 0, in which case it would be "A.B". This commit changes that scheme to always be "A.B.C", even if C==0. Hence, v1.9.0 will be the first release where this new scheme is evident. This commit was SVN r32816.	2014-09-30 15:54:18 +00:00
Artem Polyakov	f2e586980b	Fix timing framework: 1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php) 2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file. 3. Improve mpirun_prof: 1) adapt to the environment (SLURM/TORQUE); 2) recognize some noteset-related mpirun options. This commit was SVN r32772.	2014-09-23 12:59:54 +00:00
Artem Polyakov	70587d1804	Remove outdated OPAL parameter "opal_pmi_version". Now PMI selection is handled by PMIx MCA. This commit was SVN r32767.	2014-09-20 02:30:23 +00:00
Ralph Castain	dfb952fa78	[Contribution from Artem - moved it to svn from git for him] Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup. This commit was SVN r32738.	2014-09-15 18:00:46 +00:00
Ralph Castain	e32d541c8d	Bring over a slight modification to the opal_init_test routine This commit was SVN r32676.	2014-09-07 15:46:53 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Jeff Squyres	0a398c155f	opal MCA params: Move (and adapt) help message to opal help file This commit was SVN r32547.	2014-08-16 11:54:41 +00:00
Ralph Castain	a347b19dc1	Add missing include This commit was SVN r32406.	2014-08-01 18:49:37 +00:00
Ralph Castain	76d82b885f	Correctly dereference the thread object This commit was SVN r32321.	2014-07-26 17:01:27 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
Ralph Castain	c1bb5b68d0	It is possible to have a "standard" progress thread, so simplify the usage of the opal_progress_thread code. This commit was SVN r32277.	2014-07-22 16:55:23 +00:00
George Bosilca	7c21491858	Fix the indentation. Allow for deregistration of OPAL params. This commit was SVN r32242.	2014-07-15 05:20:26 +00:00
George Bosilca	2f6bc76dc1	Be symmetric to the opal_init. This commit was SVN r32241.	2014-07-15 05:05:26 +00:00
George Bosilca	77c74e8872	Don't initialize the event framework twice (it is already initialized in init_tool). This commit was SVN r32239.	2014-07-15 05:04:29 +00:00
Mike Dubman	e342a11c2e	opal envlist mca: implement Jeff's quibbles fixed by Elena, reviewed by Miked This commit was SVN r32216.	2014-07-11 07:23:20 +00:00
Ralph Castain	60da1456d9	Silence unused var warning This commit was SVN r32187.	2014-07-09 22:37:22 +00:00
Joshua Ladd	057370364d	Opal: Add a new MCA variable type "version_string". Also add a new flag to ompi_info that allows a user to print all MCA variables of a specific type. --type version_string This command will print all MCA variables of type version_string. This feature was developed by Elena Shipunova and was reviewed by Josh Ladd. This commit was SVN r32166.	2014-07-09 01:37:23 +00:00
Ralph Castain	832fa4a028	Ensure that the progress thread tracker properly cleans up the blocking event, if set. Also, use the blocking event to help wake up the progress thread for quick shutdown as some threads can be blocked in a long-running call to select. This commit was SVN r32141.	2014-07-04 14:55:51 +00:00
Ralph Castain	f6d4b4c11b	As discussed at the OMPI developer's meeting, add functions to start, stop, and restart libevent-driven progress threads. Critical NOTE: if you don't have a file descriptor event defined for your progress thread, it will spin hard! Accordingly, the "start progress thread" function has a boolean parameter you can use to request that the function automatically create one for you. This commit was SVN r32137.	2014-07-03 18:56:46 +00:00
Ralph Castain	1107f9099e	Per the RFC issued here: http://www.open-mpi.org/community/lists/devel/2014/05/14827.php Refactor PMI support This commit was SVN r31907.	2014-06-01 04:28:17 +00:00
Ralph Castain	5602156a1c	Use the correct abstraction layer name for the data dirs This commit was SVN r31684.	2014-05-08 14:32:24 +00:00
Ralph Castain	11faab1091	The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees. This commit was SVN r31679.	2014-05-08 02:01:35 +00:00
Jeff Squyres	24e222f49d	opal: Remove unused help message. Help messages about deprecated variables are now provided by the MCA var system. This commit was SVN r31325.	2014-04-07 15:41:43 +00:00
Nathan Hjelm	a4fff57720	make the opal progress yield variable settable at any time The semantics of the variable mpi_yield_when_idle are to call opal_progress_set_yield_when_idle at MPI_Init. It would be difficult to modify the old variable to support setting this parameter at runtime. The fix is to add an additional parameter to opal: opal_progress_yield_when_idle that directly sets the variable. This variable is settable anytime and does not affect the semantics of the old mpi_yield_when_idle variable. Refs trac:193 cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31255. The following Trac tickets were found above: Ticket 193 --> https://svn.open-mpi.org/trac/ompi/ticket/193	2014-03-27 15:51:06 +00:00
Adrian Reber	4ca07ae125	re-introduce distill_checkpoint_ready In the OPAL_ENABLE_FT_CR code path there used to be a variable 'mca_base_component_distill_checkpoint_ready' which got removed. The FT code was not compiling and while trying to get it to compile again the old variable was #ifdef'd out. This re-introduces the variable with a new name 'opal_base_distill_checkpoint_ready' and enables the code previously #ifdef'd out. This removes the last hack introduced to get the FT code to compile again. This commit was SVN r30928.	2014-03-04 16:14:46 +00:00
Ralph Castain	78e1846b4b	Add further clarification regarding new "test" APIs This commit was SVN r30567.	2014-02-05 15:48:31 +00:00
Ralph Castain	230336b6a8	Upgrade the security framework to avoid multiple hits against the global security server. Add support for future case where mpirun assigns a global security credential for a given run, though we need to work out how to handle connect-accept from other mpirun's in that case. Remove a bunch of duplicate code in the OOB by consolidating the connection handshake code. Refs trac:4221 This commit was SVN r30554. The following Trac tickets were found above: Ticket 4221 --> https://svn.open-mpi.org/trac/ompi/ticket/4221	2014-02-04 14:47:04 +00:00
Ralph Castain	5980b7e042	Add a security framework for authenticating connections - we will add LDAP, Kerberos, and Keystone support in the next month. For now, just put a placeholder "basic" module that does the minimum. Wire the security check into ORTE's OOB handshake, and add a "version" check to ensure that both ends are from the same ORTE version. If not, report the mismatch and refuse the connection Fixes trac:4171 cmr=v1.7.5:reviewer=jsquyres:subject=Add a security framework for authenticating connections This commit was SVN r30551. The following Trac tickets were found above: Ticket 4171 --> https://svn.open-mpi.org/trac/ompi/ticket/4171	2014-02-04 01:38:45 +00:00
Ralph Castain	83e32aadb7	Add a variant of opal_init/finalize for running unit tests This commit was SVN r30497.	2014-01-30 11:14:36 +00:00
Ralph Castain	26fbb4e77b	Necessary constants for postgress module This commit was SVN r30338.	2014-01-20 19:58:56 +00:00
Jeff Squyres	13b29cff2c	This commit complements/completes r30140. r30140 made all the configury/Makefile.am changes; this commit renames the internal installdirs.h framework struct field names to match the configury macro names: * pkgdatdir -> ompidatadir * pkglibdir -> ompilibdir * pkgincludedir -> ompiincludedir This commit was SVN r30145. The following SVN revision numbers were found above: r30140 --> open-mpi/ompi@8b778903d8	2014-01-07 23:36:33 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Nathan Hjelm	3be4536d9b	Cleanup various leaks in ompi_info reported by valgrind. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30058.	2013-12-23 17:47:43 +00:00
Nathan Hjelm	ee9cd13b90	Remove opal_recursion_depth_counter and opal_progress_thread_count. These counters add two atomics in the critical path and are not currently used. We can bring them back if there turns out to be a good use for them. cmr=v1.7.4:reviewer=brbarret This commit was SVN r29994.	2013-12-19 23:15:27 +00:00
Brian Barrett	6ef938de3f	* Per the Developer's meeting today, restructure the threading in Open MPI a bit more: - Remove OPAL_ENABLE_MULTI_THREADS, since it didn't really do anything correctly. Opal always has threads enabled at this point. - Remove OMPI_ENABLE_PROGRESS_THREADS, since this hasn't worked in 8 years and it has performance issues we'll never be able to overcome. Note that we have plans for re-adding async progress, using a hybrid protocol of async and sync sends. - OMPI_ENABLE_THREAD_MULTIPLE now determines whether the thread lock macros do the check or not. - Condition variables are ALWAYS polling right now, which fixes the thread live-lock currently found when THREAD_MULTIPLE is turned on. This commit was SVN r29891.	2013-12-13 19:40:12 +00:00
Jeff Squyres	f1bff698a4	Fix compiler warning: event is unsigned; it can't be negative cmr=v1.7.4:reviewer=rhc This commit was SVN r29684.	2013-11-13 15:35:37 +00:00
Alex Margolin	50a3c01a0f	fixed build without thread support This commit was SVN r29145.	2013-09-06 19:03:19 +00:00
Nathan Hjelm	77a41e1ca9	ompi_info: mark the variables from disabled components as disabled in the output of ompi_info. A variable is disabled if its component will never be selected due to a component selection parameter (eg. -mca btl self). The old behavior of ompi_info was to not print these parameters at all. Now we print the parameters. After some discussion with George it was decided that there needed to be some way to see what parameters will not be used. This was the compromise. This commit also fixes a bug and a typo in the pvar system. The enum_count value in mca_base_pvar_dump was being used without being set. The full_name in mca_base_pvar_t was not being used. cmr=v1.7.3:ticket=trac:3734 This commit was SVN r29078. The following Trac tickets were found above: Ticket 3734 --> https://svn.open-mpi.org/trac/ompi/ticket/3734	2013-08-28 16:03:23 +00:00


371 commits