openmpi

Author	SHA1	Message	Date
Ralph Castain	9dbc69df0f	Stop an ugly infinite loop caused by continual re-opening of the opal if framework.	2015-03-24 17:50:14 -07:00
Adrian Reber	f45dd069bd	FT: fix compilation using --with-ft (1/5) Enabling the FT code breaks compilation (again). This series tries to fix the compiler errors. This is again only fixing the compiler errors without any warranty that the result might actually support FT again. This first patch moves orte_cr_continue_like_restart from ORTE to opal_cr_continue_like_restart in OPAL. This only leaves three calls from OPAL to ORTE in the FT code. As it is not yet 100% clear how to handle these calls the code orte_sstore.set_attr() has been #ifdef'd out for now.	2015-03-11 14:23:33 +01:00
Gilles Gouaillardet	f7f7fa73dd	opal_cr: fix incorrect NULL assignment as reported by Coverity with CID 1288084	2015-03-10 12:06:57 +09:00
Gilles Gouaillardet	1746e23f11	opal/cr: fix misc memory leak and error case as reported by Coverity with CIDs 71858 and 710640	2015-03-09 19:28:52 +09:00
Mike Dubman	98503b56e0	Revert "create the opal_common_verbs_want_fork_support parameter."	2015-03-03 14:28:31 +02:00
Alina Sklarevich	8fe42f1bc1	create the opal_common_verbs_want_fork_support parameter. call the opal_common_verbs_mca_register function to make sure that opal_common_verbs_want_fork_support mca parameter is created and therefore can be used to control the fork support.	2015-03-01 17:40:49 +02:00
Jeff Squyres	336626dafe	spelling: trivial spelling fix s/interupted/interrupted/gi	2015-02-27 18:30:43 -08:00
George Bosilca	aeace0468e	A more sensible fix, move the MCA variable in the verbs common area.	2015-02-26 16:51:09 -05:00
George Bosilca	6777f3ac3c	Add missing qualifiers to the global variable.	2015-02-26 16:25:56 -05:00
Alina Sklarevich	e4c4e7df5e	Fix the calls to ibv_fork_init and remove btl_openib_want_fork_support. In order to have an effect, ibv_fork_init should be called in the beginning of the verbs initialization flow - before the calls to the ibv_create_qp and ibv_create_cq verbs. These functions are called from the oob/ud code and by the time the other verbs components (btl openib, pml yalla, ...) call ibv_fork_init, it's too late. This commit forces the call to ibv_fork_init (if it's requested) right at the beginning of all the components that are using verbs. (ibv_fork_init() can be safely called multiple times) This commit also removes the btl_openib_want_fork_support mca parameter and adds a new mca parameter instead - opal_verbs_want_fork_support. Through this new parameter, fork support may be requested for ALL components. The default value for this parameter is set to 1. Before this commit the btl_openib_want_fork_support parameter didn't provide fork support for the openib btl if its value was set to 1 (because when openib called ibv_fork_init, it was already after the calls to ibv_create_* in oob/ud and therefore it failed).	2015-02-25 10:58:50 +02:00
Jeff Squyres	04d9085c3b	opal_info_support: protect against (group->group_component==NULL) This was CID 1196660	2015-02-24 15:24:09 -05:00
Jeff Squyres	6c3ddf98ae	revert open-mpi/ompi@c75650e68f That commit was a bad idea.	2015-02-23 15:39:53 -08:00
Jeff Squyres	c75650e68f	opal_finalize: fix minor memory leak This string is strdup'ed in opal_init_util().	2015-02-23 12:34:49 -08:00
Jeff Squyres	4a85f759ec	opal_info_support.c: prevent a NULL pointer If NULL is passed in, then assume the caller meant "". This was CID 993714.	2015-02-12 13:41:29 -08:00
Ralph Castain	3ae3b96c17	Fix master compilation - a buried header dependency must have been removed.	2015-02-10 07:22:10 -08:00
Ralph Castain	f28238af59	Fix a race condition seen by Absoft during finalize. Stop the orte progress thread without cleaning it up, thus allowing the frameworks to still cancel their posted recv's. Then cleanup the memory footprint afterwards.	2015-02-05 11:41:37 -08:00
Gilles Gouaillardet	661c35ca67	cleanup dead code caused by the removal of the --with-threads configure option	2015-01-16 19:13:59 +09:00
Nathan Hjelm	d0da29351f	opal_progress: fix sched_yield check	2014-12-09 14:14:20 -07:00
Jeff Squyres	c22e1ae33b	configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros These two macros set the prefix for the OPAL and ORTE libraries, respectively. Specifically, the OPAL library will be named libPREFIXopen-pal.la and the ORTE library will be named libPREFIXopen-rte.la. These macros must be called, even if the prefix argument is empty. The intent is that Open MPI will call these macros with an empty prefix, but other projects (such as ORCM) will call these macros with a non-empty prefix. For example, ORCM libraries can be named liborcm-open-pal.la and liborcm-open-rte.la. This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running MPI applications under ORTE, if the ORTE and OPAL libraries between OMPI and ORCM are not identical (which, because they are released at different times, are likely to be different), we need to ensure that the OMPI applications link against their ORTE and OPAL libraries, but the ORCM executables link against their ORTE and OPAL libraries.	2014-10-22 10:32:19 -07:00
Jeff Squyres	01fd96bfa5	Revert "Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build." This reverts commit `63f619f871`.	2014-10-22 10:32:11 -07:00
Ralph Castain	63f619f871	Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build.	2014-10-10 11:39:08 -07:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Jeff Squyres	413e775dbf	version configury: make dist now works Update the VERSION file scheme: * Remove "want_repo_rev". * Add "tarball_version". All values are now always included (major, minor, release, greek, repo_rev). However, configure.ac now runs "opal_get_version.sh ... --tarball", which will return the value of tarball_version (if it is non-empty) or the "full" version string (i.e., "major.minor.releasegreek").	2014-10-02 11:32:54 -07:00
Jeff Squyres	d4e2809531	version: always use all 3 version numbers In all previous releases, the version number would be "A.B.C" unless C was 0, in which case it would be "A.B". This commit changes that scheme to always be "A.B.C", even if C==0. Hence, v1.9.0 will be the first release where this new scheme is evident. This commit was SVN r32816.	2014-09-30 15:54:18 +00:00
Artem Polyakov	f2e586980b	Fix timing framework: 1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php) 2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file. 3. Improve mpirun_prof: 1) adapt to the environment (SLURM/TORQUE); 2) recognize some noteset-related mpirun options. This commit was SVN r32772.	2014-09-23 12:59:54 +00:00
Artem Polyakov	70587d1804	Remove outdated OPAL parameter "opal_pmi_version". Now PMI selection is handled by PMIx MCA. This commit was SVN r32767.	2014-09-20 02:30:23 +00:00
Ralph Castain	dfb952fa78	[Contribution from Artem - moved it to svn from git for him] Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup. This commit was SVN r32738.	2014-09-15 18:00:46 +00:00
Ralph Castain	e32d541c8d	Bring over a slight modification to the opal_init_test routine This commit was SVN r32676.	2014-09-07 15:46:53 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Jeff Squyres	0a398c155f	opal MCA params: Move (and adapt) help message to opal help file This commit was SVN r32547.	2014-08-16 11:54:41 +00:00
Ralph Castain	a347b19dc1	Add missing include This commit was SVN r32406.	2014-08-01 18:49:37 +00:00
Ralph Castain	76d82b885f	Correctly dereference the thread object This commit was SVN r32321.	2014-07-26 17:01:27 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
Ralph Castain	c1bb5b68d0	It is possible to have a "standard" progress thread, so simplify the usage of the opal_progress_thread code. This commit was SVN r32277.	2014-07-22 16:55:23 +00:00
George Bosilca	7c21491858	Fix the indentation. Allow for deregistration of OPAL params. This commit was SVN r32242.	2014-07-15 05:20:26 +00:00
George Bosilca	2f6bc76dc1	Be symmetric to the opal_init. This commit was SVN r32241.	2014-07-15 05:05:26 +00:00
George Bosilca	77c74e8872	Don't initialize the event framework twice (it is already initialized in the init_tool). This commit was SVN r32239.	2014-07-15 05:04:29 +00:00
Mike Dubman	e342a11c2e	opal envlist mca: implement Jeff's quibbles fixed by Elena, reviewed by Miked This commit was SVN r32216.	2014-07-11 07:23:20 +00:00
Ralph Castain	60da1456d9	Silence unused var warning This commit was SVN r32187.	2014-07-09 22:37:22 +00:00
Joshua Ladd	057370364d	Opal: Add a new MCA variable type "version_string". Also add a new flag to ompi_info that allows a user to print all MCA variables of a specific type. --type version_string This command will print all MCA variables of type version_string. This feature was developed by Elena Shipunova and was reviewed by Josh Ladd. This commit was SVN r32166.	2014-07-09 01:37:23 +00:00
Ralph Castain	832fa4a028	Ensure that the progress thread tracker properly cleans up the blocking event, if set. Also, use the blocking event to help wake up the progress thread for quick shutdown as some threads can be blocked in a long-running call to select. This commit was SVN r32141.	2014-07-04 14:55:51 +00:00
Ralph Castain	f6d4b4c11b	As discussed at the OMPI developer's meeting, add functions to start, stop, and restart libevent-driven progress threads. Critical NOTE: if you don't have a file descriptor event defined for your progress thread, it will spin hard! Accordingly, the "start progress thread" function has a boolean parameter you can use to request that the function automatically create one for you. This commit was SVN r32137.	2014-07-03 18:56:46 +00:00
Ralph Castain	1107f9099e	Per the RFC issued here: http://www.open-mpi.org/community/lists/devel/2014/05/14827.php Refactor PMI support This commit was SVN r31907.	2014-06-01 04:28:17 +00:00
Ralph Castain	5602156a1c	Use the correct abstraction layer name for the data dirs This commit was SVN r31684.	2014-05-08 14:32:24 +00:00
Ralph Castain	11faab1091	The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees. This commit was SVN r31679.	2014-05-08 02:01:35 +00:00
Jeff Squyres	24e222f49d	opal: Remove unused help message. Help messages about deprecated variables are now provided by the MCA var system. This commit was SVN r31325.	2014-04-07 15:41:43 +00:00
Nathan Hjelm	a4fff57720	make the opal progress yield variable settable at any time The semantics of the variable mpi_yield_when_idle are to call opal_progress_set_yield_when_idle at MPI_Init. It would be difficult to modify the old variable to support setting this parameter at runtime. The fix is to add an additional parameter to opal: opal_progress_yield_when_idle that directly sets the variable. This variable is settable anytime and does not affect the semantics of the old mpi_yield_when_idle variable. Refs trac:193 cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31255. The following Trac tickets were found above: Ticket 193 --> https://svn.open-mpi.org/trac/ompi/ticket/193	2014-03-27 15:51:06 +00:00
Adrian Reber	4ca07ae125	re-introduce distill_checkpoint_ready In the OPAL_ENABLE_FT_CR code path there used to be a variable 'mca_base_component_distill_checkpoint_ready' which got removed. The FT code was not compiling and while trying to get it to compile again the old variable was #ifdef'd out. This re-introduces the variable with a new name 'opal_base_distill_checkpoint_ready' and enables the code previously #ifdef'd out. This removes the last hack introduced to get the FT code to compile again. This commit was SVN r30928.	2014-03-04 16:14:46 +00:00
Ralph Castain	78e1846b4b	Add further clarification regarding new "test" APIs This commit was SVN r30567.	2014-02-05 15:48:31 +00:00
Ralph Castain	230336b6a8	Upgrade the security framework to avoid multiple hits against the global security server. Add support for future case where mpirun assigns a global security credential for a given run, though we need to work out how to handle connect-accept from other mpirun's in that case. Remove a bunch of duplicate code in the OOB by consolidating the connection handshake code. Refs trac:4221 This commit was SVN r30554. The following Trac tickets were found above: Ticket 4221 --> https://svn.open-mpi.org/trac/ompi/ticket/4221	2014-02-04 14:47:04 +00:00


282 Commits