openmpi

Author	SHA1	Message	Date
Rolf vandeVaart	08dceda2c0	Fix logic for handling priority and eager RDMA. There was some refactoring that was done in this code and it ended up changing the logic that is used to set up eager RDMA. Rather than setting up eager RDMA with a high priority message, it did it the other way around. For some reason, CUDA-aware support did not like this. So, basically, restore the logic to the way it was prior to the refactoring. The refactoring did not intend to change this. Lightly reviewed by hjelmn.	2015-02-11 16:38:36 -05:00
Mike Dubman	6816e3421f	Merge pull request #377 from regrant/ib_wr_fix fix problem with get_pathrecord posting too many recv requests	2015-02-10 08:47:23 +02:00
Ryan Grant	de93497789	fix problem with get_pathrecord posting too many recv requests	2015-02-04 09:53:58 -07:00
Ryan Grant	5d5e9bc1f8	fixes OpenIB connect error reporting for ibv_* calls that return an errno	2015-02-04 09:09:14 -07:00
Gilles Gouaillardet	9f80aa2d28	btl/openib: regression fix when rdmacm or udcm are disabled This fixes a regression introduced in open-mpi/ompi@661c35ca67 Thanks to Mark Santcroos for reporting this issue	2015-01-20 11:31:50 +09:00
Gilles Gouaillardet	661c35ca67	cleanup dead code caused by the removal of the --with-threads configure option	2015-01-16 19:13:59 +09:00
Nathan Hjelm	006074c48d	Merge pull request #332 from hjelmn/openib_updates Openib updates	2015-01-15 15:05:18 -06:00
Aurélien Bouteiller	f49981bb2a	Disable coalescing until pull request #332 gets in.	2015-01-14 14:12:47 -05:00
Gilles Gouaillardet	4c29d8e247	btl/openib: silence warning (unused code)	2015-01-08 17:18:07 +09:00
Gilles Gouaillardet	06e071454e	btl/openib: cleanup duplicate code	2015-01-07 14:07:30 +09:00
Gilles Gouaillardet	135ecce0eb	btl/openib: rename OPAL_HAVE_XRCD macro into OPAL_HAVE_CONNECTX_XRC_DOMAINS	2015-01-07 13:27:25 +09:00
Nathan Hjelm	cde79bfa60	btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths)	2015-01-06 11:39:23 -07:00
Nathan Hjelm	9bae131589	btl/openib: fix message coalescing There was a bug in the openib btl handling this valid sequence of calls: desc = btl_alloc (); btl_free (desc); When triggered, the bug would cause either fragment loss or undefined behavior (SEGV, etc). The problem occurred because btl_alloc contained the logic to modify the pending fragment (length, etc) and these changes were not corrected if the fragment was freed instead of sent. To fix this issue I 1) moved some of the coalescing logic to the btl_send function, and 2) retry the coalesced fragment on btl_free if it was never sent. This appears to completely address the issue.	2015-01-06 11:39:16 -07:00
Nathan Hjelm	9aaac11648	btl/openib: fix receive queue source detection	2015-01-06 11:39:11 -07:00
Howard Pritchard	7df648f1cf	btl/openib: fix problems from commit `b3617e73` For systems with OFED's lacking XRC support, commit `b3617e73` broke the build of the openib btl. This commit addresses the issues introduced by this commit.	2015-01-06 11:31:12 -07:00
Gilles Gouaillardet	b3617e736e	btl/openib: add XRC support with OFED 3.12+ based on an original patch contributed by Bull.	2015-01-06 15:30:52 +09:00
Devendar Bureddy	ccafc62c07	OMPI: btl openib: fix max registrable memory calculation - by default allow to register maximum possible (i.e. 2 * total_memory) memory. This behaviour can be turned off using mca parameter "btl_openib_allow_max_memory_registration" - In fallback case, use device specific parameters to calculate memory limit.	2014-12-23 23:35:54 +02:00
Jeff Squyres	269d7f9713	openib: don't use opal_using_threads() in component_init Use the flag that was passed in, instead.	2014-12-17 15:08:43 -08:00
Nathan Hjelm	1b564f62bd	Revert "Merge pull request #275 from hjelmn/btlmod" This reverts commit `ccaecf0fd6`, reversing changes made to `6a19bf85dd`.	2014-11-19 23:22:43 -07:00
Nathan Hjelm	b1f9569b7d	Revert "btl/openib: fix warnings" This reverts commit `6e6c786b49`.	2014-11-19 23:16:16 -07:00
Nathan Hjelm	6e6c786b49	btl/openib: fix warnings	2014-11-19 15:57:01 -07:00
Nathan Hjelm	2b579610f2	btl/openib: fix compilation issues with XRC	2014-11-19 11:44:48 -07:00
Nathan Hjelm	bf7daac388	btl/openib: add atomic operation support	2014-11-19 11:33:04 -07:00
Nathan Hjelm	90554d0f95	btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths)	2014-11-19 11:33:03 -07:00
Nathan Hjelm	4122067236	btl/openib: fix message coalescing	2014-11-19 11:33:03 -07:00
Nathan Hjelm	38e9611930	btl/openib: fix receive queue source detection	2014-11-19 11:33:03 -07:00
Nathan Hjelm	7c43b566d2	more openib updates	2014-11-19 11:33:03 -07:00
Nathan Hjelm	e03956e099	Update the scif and openib btls for the new btl interface Other changes: - Remove the registration argument from prepare_src since it no longer is meant for RDMA buffers. - Additional cleanup and bugfixes.	2014-11-19 11:33:02 -07:00
Nathan Hjelm	ec33374339	btl: remove des_remote/des_remote_count from the mca_btl_base_descriptor_t structure This structure member was originally used to specify the remote segment for an RDMA operation. Since the new btl interface no longer uses descriptors for RDMA this member no longer has a purpose. In addition to removing these members the local segment information has been renamed to des_segments/des_segment_count.	2014-11-19 11:33:02 -07:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00
Gilles Gouaillardet	e269a52ac7	btl/openib: send openib modex with the PMIX_GLOBAL flag	2014-11-06 08:42:23 -08:00
Steve Wise	7316a88754	openib btl: add Soft iWARP device to the ini file This enables IBM's software iWARP provider. With this driver you can run iWARP/RDMA over any ethernet NIC. Useful for testing OMPI RDMA logic without requiring an expensive RDMA adapter/infrastructure. The Soft iWARP code is at: https://www.gitorious.org/softiwarp	2014-11-04 14:48:43 -08:00
rolfv	9134f48d4c	Do not use sendi path with GPU buffer	2014-10-24 13:35:01 -07:00
Jeff Squyres	c22e1ae33b	configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros These two macros set the prefix for the OPAL and ORTE libraries, respectively. Specifically, the OPAL library will be named libPREFIXopen-pal.la and the ORTE library will be named libPREFIXopen-rte.la. These macros must be called, even if the prefix argument is empty. The intent is that Open MPI will call these macros with an empty prefix, but other projects (such as ORCM) will call these macros with a non-empty prefix. For example, ORCM libraries can be named liborcm-open-pal.la and liborcm-open-rte.la. This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running MPI applications under ORTE, if the ORTE and OPAL libraries between OMPI and ORCM are not identical (which, because they are released at different times, are likely to be different), we need to ensure that the OMPI applications link against their ORTE and OPAL libraries, but the ORCM executables link against their ORTE and OPAL libraries.	2014-10-22 10:32:19 -07:00
George Bosilca	7541c03b4c	Mark all instances where atomic operations are used but their return value is unnecessary	2014-10-15 21:47:32 -04:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Vasily Filipov	e26af91a64	BTL/OPENIB: set "max_lmc" param to be "1" and not "all available values" by default. cmr=v1.8.3:reviewer=miked This commit was SVN r32736.	2014-09-15 13:56:41 +00:00
Alex Mikheev	31d0724a08	OMPI: btl openib: fix detection of max registrable memory Deal with the case when mlx4 module is loaded but device is not present cmr=v1.8.3:reviewer=miked This commit was SVN r32734.	2014-09-15 12:17:23 +00:00
Ralph Castain	cb2ad98f57	Silence an unused function warning This commit was SVN r32704.	2014-09-10 17:36:34 +00:00
Ralph Castain	a7c5b77d70	Just because the openib BTL can't reach a process doesn't mean it is a job-ending error. If we have other methods for reaching the process (e.g., sm for a local proc), then that's okay. If there is no method for reaching a proc, then that's an error - but the BML will report that situation. The question of whether or not the openib BTL supports loopback is a separate question. It may be more appropriate to make the modex be PMIX_GLOBAL for cases where openib can support loopback so someone can run without a shared memory component. I'll leave that decision to the IB vendors. This commit was SVN r32702.	2014-09-10 17:02:16 +00:00
Gilles Gouaillardet	6916bfc368	btl/openib: fix use of mca_btl_openib_component.default_recv_qps - do not have mca_btl_openib_component.default_recv_qps point to the stack - do not reset mca_btl_openib_component.default_recv_qps in btl_openib_component_open cmr=v1.8.3:reviewer=miked This commit was SVN r32642.	2014-08-29 04:41:34 +00:00
Gilles Gouaillardet	b8a2e90f2d	btl/openib: fix a typo cmr=v1.8.3:reviewer=miked This commit was SVN r32639.	2014-08-29 04:23:42 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Mike Dubman	c3beb0472e	openib/btl: better detect max reg memory. OFED has no runtime versioning API :( based on http://www.open-mpi.org/community/lists/users/2014/08/25048.php reviewed by AlexM cmr=v1.8.2:reviewer=ompi-rm1.8 This commit was SVN r32569.	2014-08-21 12:12:43 +00:00
Mike Dubman	acd5a9acac	udcm: psn should be 24 bit, new OFED actually checks and fails if it is not 24 bit. This commit was SVN r32567.	2014-08-21 08:49:43 +00:00
Mike Dubman	5b90af601c	btl/openib: add missing definition for ConnectX3 card This commit was SVN r32521.	2014-08-13 13:56:34 +00:00
Rolf vandeVaart	37dc9477d0	As requested in RFC and discussed at weekly meeting, change default setting of ibv_fork_init() to off. Link to RFC: http://www.open-mpi.org/community/lists/devel/2014/07/15393.php cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32514.	2014-08-12 20:57:11 +00:00
Rolf vandeVaart	c53c981506	Fix initialization and cleanup code for CUDA-aware code. Eliminates all resource leaks. This commit was SVN r32512.	2014-08-12 19:41:46 +00:00
George Bosilca	beec6b4b4b	Remove a lost #include. This commit was SVN r32478.	2014-08-08 23:42:40 +00:00
Howard Pritchard	1e02bb056f	openib btl check-help-strings cleanup This commit was SVN r32470.	2014-08-08 20:40:18 +00:00
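
Commit acd5a9acac above notes that the udcm PSN (packet sequence number) must fit in 24 bits, and that newer OFED rejects values that do not. The sketch below shows one way such a constraint could be enforced. It is not the code from that commit: the names `PSN_MASK`, `psn_truncate`, and `psn_is_valid` are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: the low 24 bits that a PSN may occupy. */
#define PSN_MASK 0x00FFFFFFu

/* Hypothetical helper: drop any bits above bit 23 so the value
 * fits in a 24-bit PSN field. */
static inline uint32_t psn_truncate(uint32_t raw)
{
    return raw & PSN_MASK;
}

/* Hypothetical helper: report whether a value already fits in 24 bits. */
static inline bool psn_is_valid(uint32_t psn)
{
    return (psn & ~PSN_MASK) == 0u;
}
```

A caller that picks a random starting PSN could pass the random value through `psn_truncate` before storing it in the queue pair attributes, so that a stricter verbs implementation does not reject it.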


66 Commits