openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	134c866aa9	btl/openib: fix misc memory leaks as reported by Coverity with CIDs 1269848, 1269852 and 1269862	2015-03-05 14:06:18 +09:00
Nathan Hjelm	855d422e62	Merge pull request #408 from hjelmn/btl_3_0_mod btl: expose local registration thresholds	2015-02-26 12:57:43 -07:00
Alina Sklarevich	e4c4e7df5e	Fix the calls to ibv_fork_init and remove btl_openib_want_fork_support. In order to have an effect, ibv_fork_init should be called in the beginning of the verbs initialization flow - before the calls to the ibv_create_qp and ibv_create_cq verbs. These functions are called from the oob/ud code and by the time the other verbs components (btl openib, pml yalla, ...) call ibv_fork_init, it's too late. This commit forces the call to ibv_fork_init (if it's requested) right at the beginning of all the components that are using verbs. (ibv_fork_init() can be safely called multiple times) This commit also removes the btl_openib_want_fork_support mca parameter and adds a new mca parameter instead - opal_verbs_want_fork_support. Through this new parameter, fork support may be requested for ALL components. The default value for this parameter is set to 1. Before this commit the btl_openib_want_fork_support parameter didn't provide fork support for the openib btl if its value was set to 1. (because when openib called ibv_fork_init, it was already after the calls to ibv_create_* in oob/ud and therefore it failed).	2015-02-25 10:58:50 +02:00
Jeff Squyres	a85a392896	Merge pull request #422 from jsquyres/topic/coverity-fixes Some Coverity fixes	2015-02-24 17:00:10 -05:00
Jeff Squyres	3cd36ab12a	openib: fix double free This was CID 1269989	2015-02-24 15:24:10 -05:00
Nathan Hjelm	5f1254d710	Update code base to use the new opal_free_list_t Use of the old ompi_free_list_t and ompi_free_list_item_t is deprecated. These classes will be removed in a future commit. This commit updates the entire code base to use opal_free_list_t and opal_free_list_item_t. Notes: OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ()) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-24 10:05:45 -07:00
Nathan Hjelm	cc750b00a6	btl: export local registration thresholds Some BTLs do not require local registration for some rdma transactions. For example: inline put on openib, fma put on ugni. This commit adds code to expose the local registration thresholds to BTL users. Optimized code can take advantage of this information to improve rdma performance.	2015-02-19 16:13:37 -07:00
Nathan Hjelm	cf91156105	btl/openib: add atomic operation support Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Nathan Hjelm	74f1af4548	btl/openib: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Gilles Gouaillardet	661c35ca67	cleanup dead code caused by the removal of the --with-threads configure option	2015-01-16 19:13:59 +09:00
Nathan Hjelm	9aaac11648	btl/openib: fix receive queue source detection	2015-01-06 11:39:11 -07:00
Gilles Gouaillardet	b3617e736e	btl/openib: add XRC support with OFED 3.12+ based on an original patch contributed by Bull.	2015-01-06 15:30:52 +09:00
Devendar Bureddy	ccafc62c07	OMPI: btl openib: fix max registrable memory calculation - by default, allow registering the maximum possible memory (i.e. 2 * total_memory). This behaviour can be turned off using the mca parameter "btl_openib_allow_max_memory_registration" - In the fallback case, use device-specific parameters to calculate the memory limit.	2014-12-23 23:35:54 +02:00
Jeff Squyres	269d7f9713	openib: don't use opal_using_threads() in component_init Use the flag that was passed in, instead.	2014-12-17 15:08:43 -08:00
Nathan Hjelm	1b564f62bd	Revert "Merge pull request #275 from hjelmn/btlmod" This reverts commit ccaecf0fd6c862877e6a1e2643f95fa956c87769, reversing changes made to 6a19bf85dde5306f559f09952cf3919d97f52502.	2014-11-19 23:22:43 -07:00
Nathan Hjelm	b1f9569b7d	Revert "btl/openib: fix warnings" This reverts commit 6e6c786b49dc9df70c87bb8348c12b6f68de173b.	2014-11-19 23:16:16 -07:00
Nathan Hjelm	6e6c786b49	btl/openib: fix warnings	2014-11-19 15:57:01 -07:00
Nathan Hjelm	bf7daac388	btl/openib: add atomic operation support	2014-11-19 11:33:04 -07:00
Nathan Hjelm	38e9611930	btl/openib: fix receive queue source detection	2014-11-19 11:33:03 -07:00
Nathan Hjelm	7c43b566d2	more openib updates	2014-11-19 11:33:03 -07:00
Nathan Hjelm	e03956e099	Update the scif and openib btls for the new btl interface Other changes: - Remove the registration argument from prepare_src since it no longer is meant for RDMA buffers. - Additional cleanup and bugfixes.	2014-11-19 11:33:02 -07:00
Nathan Hjelm	ec33374339	btl: remove des_remote/des_remote_count from the mca_btl_base_descriptor_t structure This structure member was originally used to specify the remote segment for an RDMA operation. Since the new btl interface no longer uses descriptors for RDMA, this member no longer has a purpose. In addition to removing these members, the local segment information has been renamed to des_segments/des_segment_count.	2014-11-19 11:33:02 -07:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00
Gilles Gouaillardet	e269a52ac7	btl/openib: send openib modex with the PMIX_GLOBAL flag	2014-11-06 08:42:23 -08:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Gilles Gouaillardet	6916bfc368	btl/openib: fix use of mca_btl_openib_component.default_recv_qps - do not have mca_btl_openib_component.default_recv_qps point to the stack - do not reset mca_btl_openib_component.default_recv_qps in btl_openib_component_open cmr=v1.8.3:reviewer=miked This commit was SVN r32642.	2014-08-29 04:41:34 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Rolf vandeVaart	c53c981506	Fix initialization and cleanup code for CUDA-aware code. Eliminates all resource leaks. This commit was SVN r32512.	2014-08-12 19:41:46 +00:00
George Bosilca	beec6b4b4b	Remove a lost #include. This commit was SVN r32478.	2014-08-08 23:42:40 +00:00
Jeff Squyres	2d2534a1bc	btl_openib_component.c: remove inline logic and use 2 calls to show_help The contrib/check-help-strings.pl gets confused if the topic is an inline logic check, so separate it into two calls to show_help. This commit was SVN r32455.	2014-08-08 13:35:29 +00:00
Ralph Castain	c366554048	Cleanup the MCA param registration - the MPI_T interface is allowed to modify the value, so don't read the value until we are ready to use it. Discussed with Nathan, but will ask his review prior to porting it to 1.8.2 Refs trac:4816 This commit was SVN r32379. The following Trac tickets were found above: Ticket 4816 --> https://svn.open-mpi.org/trac/ompi/ticket/4816	2014-07-31 16:30:08 +00:00
Joshua Ladd	da286d9a84	Fixing the OpenIB receive queue selection logic. Refs trac:4816 This commit was SVN r32350. The following Trac tickets were found above: Ticket 4816 --> https://svn.open-mpi.org/trac/ompi/ticket/4816	2014-07-30 02:17:46 +00:00
Joshua Ladd	7a694cb95e	Reverting r32346. Will do it the correct way and update the patch in Refs trac:32346 This commit was SVN r32348. The following SVN revision numbers were found above: r32346 --> open-mpi/ompi@5be6ff07d5 The following Trac tickets were found above: Ticket 32346 --> https://svn.open-mpi.org/trac/ompi/ticket/32346	2014-07-29 23:23:44 +00:00
Joshua Ladd	5be6ff07d5	This fixes the OpenIB BTL receive queue selection logic in the trunk. Custom patch for 1.8.2 is provided in Refs trac:4816 This commit was SVN r32346. The following Trac tickets were found above: Ticket 4816 --> https://svn.open-mpi.org/trac/ompi/ticket/4816	2014-07-29 21:42:20 +00:00
Nathan Hjelm	dffbcb3803	Use opal_process_info.cpuset to determine if the process is bound This commit was SVN r32334.	2014-07-28 22:00:18 +00:00
George Bosilca	f217661ee0	Use opal_process_info whenever possible. Some other minor cleanups. This commit was SVN r32325.	2014-07-26 21:48:23 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00

37 commits