openmpi

Author	SHA1	Message	Date
Joshua Ladd	18c5a21562	Fix typo in error handling flow.	2016-01-14 22:28:54 +02:00
Joshua Ladd	afa62d8ca1	Addressing reviewers' comments for https://github.com/open-mpi/ompi-release/pull/891	2016-01-14 19:22:27 +02:00
Tomislav Janjusic	3858bc8e62	Adding support for dynamic endpoint creation Signed-off-by: Tomislav Janjusic <tomislavj@mngx-apl-01.mtl.labs.mlnx> Signed-off-by: Tomislavj Janjusic <tomislavj@mellanox.com> Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>	2016-01-12 22:17:03 +02:00
Gilles Gouaillardet	ad9693c604	pml/yalla: add missing #include <alloca.h>	2015-12-24 14:33:58 +09:00
Gilles Gouaillardet	b38c17dbcb	pml/cm: add missing #include <alloca.h> Thanks Paul Hargrove for reporting this issue	2015-12-24 14:33:58 +09:00
Ralph Castain	ac6289dca6	Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX Cleanup per George's comments	2015-12-17 17:39:15 -08:00
igor.ivanov@itseez.com	041a6a9f53	ompi/pml: Fix warnings in yalla component	2015-12-16 16:22:30 +02:00
Alina Sklarevich	3ffd8dcd20	PML UCX: fix typo (following `7becc54d`).	2015-12-10 13:51:10 +02:00
Nathan Hjelm	f68c315188	pml/ob1: add missing ompi_request_wait_completion for buffered sends This commit adds a call to ompi_request_wait_completion for buffered sends. Without this line it is possible to get into a state where the data is never sent. Fixes open-mpi/ompi#1185 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-12-07 22:28:07 -07:00
yosefe	3bb1270715	yalla: fix valgrind error due to uninitialized status field.	2015-11-19 10:59:31 +02:00
Yossi	b750b72a81	Merge pull request #1127 from yosefe/topic/pml-ucx-implement-cancel pml_ucx: implement cancel, and add small optimizations.	2015-11-12 10:50:48 +02:00
yosefe	7becc54d67	pml_ucx: fix typo.	2015-11-12 09:57:41 +02:00
yosefe	d66b01d380	pml_ucx: implement cancel, and add small optimizations.	2015-11-10 17:40:06 +02:00
Gilles Gouaillardet	d6ff25b9a2	pml/monitoring: initialize common symbols	2015-11-10 13:58:54 +09:00
yosefe	45c3d04857	pml_ucx: fix request construct/destruct. We should invoke OBJ_CONSTRUCT/OBJ_DESTRUCT only on regular requests (which are embedded inside UCX requests) and for the completed request. Persistent requests are already constructed/destructed by the free list. This fixes an assertion in ompi_request_destruct.	2015-11-04 11:03:46 +02:00
George Bosilca	5c60e76669	Fix Coverity CIDs 1338021, 1338020, 1338019, 1338018.	2015-11-02 17:38:51 -05:00
George Bosilca	b77c203068	Add more comments and restore the progress, flags, max tag, and max context_id from the original PML.	2015-10-31 17:13:35 -04:00
George Bosilca	3efd494972	Make sure the monitoring infrastructure works well with the new dynamic add_procs.	2015-10-31 17:13:35 -04:00
Guillaume Papauré	86714ad91e	change pml_monitoring_messages_count and pml_monitoring_messages_size pvars to use the start/stop features	2015-10-31 17:13:35 -04:00
George Bosilca	a43c2ce529	Fully integrate the monitoring with the MPI_T PVAR. Writing to the pml_monitoring_flush variable will set the filename of the output file. Stopping a session for the pml_monitoring_flush will force the generation of the monitoring output file (as long as the filename is not NULL). To reset the monitoring, one has to bind the pml_monitoring_flush to a session.	2015-10-31 17:13:35 -04:00
George Bosilca	646a662721	Use the new group interface and add const to the PML send functions.	2015-10-31 17:13:35 -04:00
George Bosilca	5224a7ce4d	Allow the pvar to be written by invoking the associated callback. Use a PVAR to generate the monitoring dump of the information into a file. Use the PVAR to instruct the PML monitoring when to do the dump.	2015-10-31 17:13:35 -04:00
George Bosilca	df167f4177	Rewrite the close logic to be cleaner.	2015-10-31 17:13:35 -04:00
George Bosilca	c801ffde86	Use MPI_T variables to handle the flush in a more MPI-blessed way. Code cleanup. Update the monitoring test to use MPI_T variables.	2015-10-31 17:13:35 -04:00
George Bosilca	4f88c82500	Fix a conversion problem and add a comment about the lack of component retain in the new component infrastructure. Clean Makefile.am to fix "make distcheck". Update the gitignore rules.	2015-10-31 17:13:35 -04:00
George Bosilca	80343a0d39	Add the ability to query pml monitoring results with the MPI Tools interface using performance variables "pml_monitoring_messages_count" and "pml_monitoring_messages_size". Per Brice's suggestion, make all data count and message length be uint64_t.	2015-10-31 17:13:35 -04:00
George Bosilca	a47d69202f	Add a monitoring PML. This PML tracks all data exchanges by the processes, counting or not the collective traffic as a separate entity. The need for such a PML is simply because the PMPI interface doesn't allow us to identify the collective generated traffic.	2015-10-31 17:13:35 -04:00
Rolf vandeVaart	f2ff6e03ab	Make CUDA 4.1 a requirement for CUDA-aware support. Remove all related preprocessor conditionals.	2015-10-29 11:24:02 -04:00
yosefe	ae738d0434	pml_ucx: add pmi fence in del_procs	2015-10-28 18:34:36 +02:00
yosefe	41b6230be3	pml_ucx: fix debug macros, and initialize mpi request properly.	2015-10-28 10:59:25 +02:00
Nathan Hjelm	08e267b811	add_procs: add threading protection for dynamic add_procs This commit adds protection to the group, ob1, and bml endpoint lookup code. For ob1 and the bml a lock has been added. For performance reasons the lock is only held if a bml or ob1 endpoint does not exist. ompi_group_dense_lookup now uses opal_atomic_cmpset to ensure the proc is only retained by the thread that actually updates the group. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-21 16:13:41 -06:00
yosefe	cc76db8d39	ucx: reduce components priority to 5.	2015-10-21 17:38:25 +03:00
Mike Dubman	4ea13f10f6	Merge pull request #1008 from alex-mikheev/topic/ucx_support UCX support for ompi and oshmem	2015-10-21 09:33:33 +03:00
yosefe	a313588337	ompi: Add UCX PML.	2015-10-20 19:46:06 +03:00
Nathan Hjelm	53f6b57c0a	pml/cm: use the priority of the mtl component This commit changes the priority of mtl components to be relative to pml/ob1 and updates the mtl interface to expose this priority. cm now sets its own priority based on the priority of the selected mtl component. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-19 12:32:42 -06:00
Nathan Hjelm	bedd80214e	pml/ob1: remove priority check This commit removes code that checks the ob1 priority vs the previous priority. The previous priority is meaningless here and may only cause ob1 to disable itself when it shouldn't. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-19 12:32:41 -06:00
Nathan Hjelm	2fd176ac7f	cm: fix selection priority This patch removes a priority check that disables cm if the previous pml had higher priority. The check was incorrect as coded and is unnecessary as we finalize all but one pml anyway. Fixes open-mpi/ompi#1035 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-19 12:32:26 -06:00
Nathan Hjelm	341b60dd57	Merge pull request #1029 from kawashima-fj/pr/ob1-fin-memory-leak pml/ob1: Fix a memory leak regarding pending FIN control messages.	2015-10-15 07:55:52 -06:00
KAWASHIMA Takahiro	4e56505202	pml/ob1: Fix a memory leak regarding pending FIN control messages. Once a FIN control message is appended to the pending list, the ob1 PML attempts to send the FIN again in the `mca_pml_ob1_process_pending_packets` function. But if the PML failed to send the FIN again, the `mca_pml_ob1_send_fin` function creates a new `mca_pml_ob1_pckt_pending_t` object and the old object is not returned to the free list.	2015-10-15 11:15:03 +09:00
Jeff Squyres	889d80a659	mxm/yalla: disable MPI dynamic process functionality Disable the MPI dynamic process functionality when these components are selected to be used.	2015-10-14 13:42:56 -07:00
Nathan Hjelm	12bd300c40	Merge pull request #929 from hjelmn/add_procs Update add_procs support	2015-09-28 17:29:13 -06:00
Nathan Hjelm	6611c000c9	Fix coverity warnings Fix CID 1315271: Constant expression result The intent of this conditional is to not produce a peruse event for probe or mprobe requests. Coverity is correct that the expression is always true. Changed the || to && to fix. Also moved the conditional within an OMPI_WANT_PERUSE to ensure the conditional is not evaluated if peruse is disabled. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-28 15:35:25 -06:00
George Bosilca	01d8e23ccc	Fix the random errors related to the recursive sends and receives identified by Fujitsu.	2015-09-26 00:44:51 +02:00
Nathan Hjelm	54a4061d88	Add support for detecting when dynamic add_procs is not possible This commit adds support to the pml, mtl, and btl frameworks for components to indicate at runtime that they do not support the new dynamic add_procs behavior. At the high end the lack of dynamic add_procs support is signalled by the pml using the new pml_flags member to the pml module structure. If the MCA_PML_BASE_FLAG_REQUIRE_WORLD flag is set MPI_Init will generate the ompi_proc_t array passed to add_proc from ompi_proc_world () instead of ompi_proc_get_allocated (). Both cm and ob1 have been updated to detect if the underlying mtl and btl components support dynamic add_procs. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-23 16:22:05 -06:00
Gilles Gouaillardet	a611274704	pml: fix commit open-mpi/ompi@6e6a3e965c do not use the const modifier for allocator nor recv buffers	2015-09-18 09:54:18 +09:00
Nathan Hjelm	b4a0d40915	pml/ob1: Add support for dynamically calling add_procs This commit contains the following changes: - pml/ob1: use the bml accessor function when requesting a bml endpoint. this will ensure that bml endpoints are only created when needed. for example, a bml endpoint is not requested and not allocated when receiving an eager message from a peer. - pml/ob1: change the pml_procs array in the ob1 communicator to a proc pointer array. at the cost of a single level of extra redirection this will allow us to allocate pml procs on demand. - pml/ob1: add an accessor function to access the pml proc structure for a given peer. this function will allocate the proc if it doesn't already exist. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-10 08:55:54 -06:00
Gilles Gouaillardet	6e6a3e965c	pml: do not cast away the const modifier when this is not necessary update the pml framework and mpi c bindings	2015-09-09 09:18:57 +09:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
yosefe	85580ad055	yalla: fix passing on-demand mapping config to mxm.	2015-08-18 15:00:59 +03:00
Gilles Gouaillardet	6b2fe9120e	yalla: fix Makefile.am LDFLAGS	2015-08-13 17:33:52 +09:00
Jithin Jose	bc4e8b7e73	Fix warnings in direct (pml-cm,mtl-ofi) build Signed-off-by: Jithin Jose <jithin.jose@intel.com>	2015-07-29 15:49:37 -07:00
yosefe	103cac5bd9	yalla: fix mxm configuration parsing. Take configuration from MXM_MPI_xx instead of MXM_PML_xx, same as mtl mxm.	2015-07-08 19:18:23 +03:00
Rolf vandeVaart	30a872b478	Add the ability to send host buffers through one sized staging buffers and CUDA buffers through different sized buffers. Fixes performance issues	2015-07-02 11:11:15 -04:00
Nathan Hjelm	ee36d813dc	Merge pull request #657 from hjelmn/c99 more c99 updates	2015-06-25 11:21:09 -06:00
Nathan Hjelm	4d92c9989e	more c99 updates This commit does two things. It removes checks for C99 required headers (stdlib.h, string.h, signal.h, etc). Additionally it removes definitions for required C99 types (intptr_t, int64_t, int32_t, etc). Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-25 10:14:13 -06:00
Howard Pritchard	e49a37c034	ownership: update ownership files per discussions at OMPI devel workshop Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2015-06-25 10:04:42 -06:00
George Bosilca	dc1b125b12	There is no destructor for the base requests.	2015-06-24 14:29:45 -07:00
bosilca	1b8556f926	Merge pull request #653 from hjelmn/moar_ob1_fixes pml/ob1: fix bugs in static request objects	2015-06-24 14:28:11 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Nathan Hjelm	9a8a87611e	pml/ob1: fix bugs in static request objects This commit fixes several bugs in the static request objects used by ob1 for blocking send/receive operations. - Fix memory leak when using MPI_THREAD_MULTIPLE. Requests were allocated off the free list but were destructed and NOT returned. - Fix double-destruct of static objects. There is no reason to CONSTRUCT/DESTRUCT the static object for each send/receive operation. This adds overhead and no benefit. To keep the code clean helper functions have been added to finalize ob1 send/receive requests. - Remove now unnecessary include of alloca.h. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-23 11:00:45 -06:00
Nathan Hjelm	ac51acb3e1	Merge pull request #651 from hjelmn/fix_thread_multiple_check pml/ob1: do not use OPAL_ENABLE_MULTI_THREADS to determine thread multiple support	2015-06-22 21:45:43 -06:00
Nathan Hjelm	284dd6babe	pml/ob1: do not use OPAL_ENABLE_MULTI_THREADS to determine thread multiple support OPAL_ENABLE_MULTI_THREADS is always on. The correct value to check is OMPI_ENABLE_THREAD_MULTIPLE. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-22 19:17:23 -06:00
Andrew Friedley	2c9be59b37	Add new PSM2 MTL. This new MTL runs over PSM2 for Omni Path. PSM2 is a descendant of PSM with changes to support more ranks and some MPI-3 features like mprobe. PSM2 will only support Omni Path networks; PSM only supports True Scale. Likewise, the existing PSM MTL will continue to be maintained for True Scale, while the PSM2 MTL is developed and maintained for Omni Path.	2015-06-22 07:55:46 -07:00
rhc54	9a8bda0b72	Merge pull request #637 from jithinjosepkl/pr/pml-cm-opt pml-cm bug fixes	2015-06-15 19:25:09 -07:00
Jithin Jose	7ccde09a09	Do opal_convertor_copy_and_prepare_for_send for buffered send mode as MCA_PML_CM_HVY_SEND_REQUEST_BSEND_ALLOC calls opal_convertor_pack directly. Signed-off-by: Jithin Jose <jithin.jose@intel.com>	2015-06-15 17:12:50 -07:00
Gilles Gouaillardet	ee3a1da28a	pml/ob1:mca_pml_ob1_recv_request_put_frag silence a warning proc local variable is used only in heterogeneous mode	2015-06-15 10:00:53 +09:00
George Bosilca	67b70bb47a	Add multi-threaded support.	2015-06-12 14:22:17 -07:00
George Bosilca	b2cf74cabc	A first cut at a possible solution for the missing requests from the message queues (a debugging feature). With this approach all blocking (single threaded) requests are allocated from the main freelist, so they will be accounted for during the message queues investigation).	2015-06-12 14:22:17 -07:00
Jithin Jose	7cfbfc4c89	Initialize convertor in pml-cm-send and recv. Signed-off-by: Jithin Jose <jithin.jose@intel.com>	2015-06-10 09:39:31 -07:00
Jeff Squyres	347290f785	pml/Makefile.am: add missing file to $(headers)	2015-06-02 20:07:54 -07:00
Jithin Jose	5ba5a9ade2	Offset buffer by datatype true_lb to handle resized datatypes. - Follow up patch for `56869bff38` Signed-off-by: Jithin Jose <jithin.jose@intel.com>	2015-05-27 13:51:05 -07:00
Jithin Jose	07043894bd	Avoid extra lookup for ompi_proc in homogeneous build Signed-off-by: Jithin Jose <jithin.jose@intel.com>	2015-05-26 21:42:42 -07:00
Jithin Jose	50089977ac	Inline PML-CM Signed-off-by: Jithin Jose <jithin.jose@intel.com>	2015-05-26 21:42:41 -07:00
Jithin Jose	56869bff38	Avoid datatype pack/unpack for contiguous data on homogeneous systems. Signed-off-by: Jithin Jose <jithin.jose@intel.com>	2015-05-26 21:42:41 -07:00
Gilles Gouaillardet	e980958ad4	pml/ob1: silence a warning	2015-05-26 15:05:44 +09:00
Gilles Gouaillardet	85c45e2275	pml/ob1: fix mca_pml_ob1_recv_request_put_frag(...) in heterogeneous mode	2015-05-22 15:48:45 +09:00
Nathan Hjelm	ce48eabd84	pml/ob1: use c99 flexible array members instead of size 1 arrays This commit updates several ob1 structures to take advantage of C99's flexible array member. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-05-20 10:31:35 -06:00
Ralph Castain	6e95bcd583	Fix typo in oob_tcp.c when IPV6 enabled. Cleanup a few other warnings, including a type in coll_sm that prevented that component from registering its MCA params!	2015-05-07 21:05:08 -07:00
Gilles Gouaillardet	9d56b85b55	initialize common symbols from ompi	2015-05-08 10:11:58 +09:00
Nathan Hjelm	033894b493	Merge pull request #541 from hjelmn/c99_components C99 component initialization	2015-04-20 10:45:39 -06:00
Nathan Hjelm	d251fa1525	pml/ob1: fix heterogeneous build Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-20 09:27:00 -06:00
Nathan Hjelm	df75d0382f	ompi: use C99 subobject naming for component initialization This commit helps future-proof ompi components by initializing each component member by name. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-18 10:29:58 -06:00
Nathan Hjelm	3436f2917d	Merge pull request #449 from hjelmn/mca_base_update mca/base update	2015-04-16 08:41:48 -06:00
Jithin Jose	c09582a3ff	- CM blocking send/recv optimizations This patch tries to do as little as possible in the PML CM blocking send/receive routines. Basically, avoid creating and filling in an entire request object. An OMPI-level request is still needed, but we can create that on the stack instead of going to a free list. Signed-off-by: Andrew Friedley <andrew.friedley@intel.com> Signed-off-by: Jithin Jose <jithin.jose@intel.com>	2015-04-03 15:19:08 -07:00
Nathan Hjelm	b68d66bb9b	MCA: Add the project/project version to the MCA base component This commit adds support for project_framework_component_* parameter matching. This is the first step in allowing the same framework name in multiple projects. This change also bumps the MCA component version to 2.1.0. All master frameworks have been updated to use the new component versioning macro. An mca.h has been added to each project to add a project specific versioning macro of the form PROJECT_MCA_VERSION_2_1_0. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-03-27 10:59:04 -06:00
adrianreber	714d9aa67e	Merge pull request #348 from adrianreber/topic/orte_cr_continue_like_restart Topic/orte cr continue like restart	2015-03-12 14:54:02 +01:00
Alina Sklarevich	28586caecf	MTL_MXM/PML_YALLA: fix coverity issues.	2015-03-12 11:49:22 +02:00
Nathan Hjelm	ce6caab2a7	Merge pull request #463 from hjelmn/cuda_async btl/openib: cuda: fix CUDA-aware support with async copy	2015-03-11 09:52:48 -06:00
Adrian Reber	c08e234af7	FT: fix compilation using --with-ft (5/5) Enabling the FT code breaks compilation (again). This series tries to fix the compiler errors. This is again only fixing the compiler errors without any warranty that the result might actually support FT again. With the changes introduced in the previous patches in this series some goto constructs for cleanup are no longer necessary and removed.	2015-03-11 14:23:33 +01:00
Adrian Reber	1c5a8df724	FT: fix compilation using --with-ft (2/5) Enabling the FT code breaks compilation (again). This series tries to fix the compiler errors. This is again only fixing the compiler errors without any warranty that the result might actually support FT again. The FT code used barrier mechanisms which have been removed with `aec5cd08bd`. This patch replaces all those different barriers with opal_pmix.fence(NULL, 0); I am not sure this is completely correct but at least a starting point for a review.	2015-03-11 14:23:33 +01:00
Adrian Reber	f45dd069bd	FT: fix compilation using --with-ft (1/5) Enabling the FT code breaks compilation (again). This series tries to fix the compiler errors. This is again only fixing the compiler errors without any warranty that the result might actually support FT again. This first patch moves orte_cr_continue_like_restart from ORTE to opal_cr_continue_like_restart in OPAL. This only leaves three calls from OPAL to ORTE in the FT code. As it is not yet 100% clear how to handle these calls the code orte_sstore.set_attr() has been #ifdef'd out for now.	2015-03-11 14:23:33 +01:00
Alina Sklarevich	f9a9b936a1	PML_YALLA: fix compilation warnings.	2015-03-11 10:58:54 +02:00
Nathan Hjelm	3d32dbd793	btl/openib: cuda: fix CUDA-aware support with async copy This commit should resolve an issue seen with CUDA-aware support. The problem came in with BTL 3.0. Before 3.0 the size of the copy was stored in the incoming segment's des_remote_count field. This field does not exist in BTL 3.0 so I stored the value in the des_segment_count field. This caused problems with the cuda support code. To fix the issue the endpoint pointer is now stored in the in fragment's endpoint pointer which frees up the segment's des_cbdata pointer for storing the transfer size. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-03-10 14:38:12 -06:00
Mike Dubman	6f91a007e1	Merge pull request #458 from yosefe/topic/pml-yalla-fix-segv keep mxm context alive as long as pml_yalla component is open.	2015-03-10 13:38:14 +02:00
yosefe	976144dca7	keep mxm context alive as long as pml_yalla component is open. pml_yalla_del_comm may be called after yalla module is finalized, which leads to invalid memory access if mxm context is already destroyed at this point.	2015-03-10 11:52:44 +02:00
George Bosilca	420ae98dfe	Remove all unnecessary whitespaces and make sure we close the module correctly.	2015-03-05 13:00:13 -05:00
Alex Mikheev	168c83ed95	OMPI/MXM: add out of band barrier at the end of del_procs mxm shutdown requires out of band barrier	2015-03-02 12:56:02 +02:00
Rolf vandeVaart	30e9dd5066	Look in extra rdma array to find bml. This is needed with recent BML changes. Only affects CUDA-aware code.	2015-02-27 09:02:21 -05:00
George Bosilca	3fd8dc099d	Revert "This function is now useless." This reverts commit `0871c5c489`.	2015-02-26 17:54:46 -05:00
George Bosilca	7f90cedf23	Revert "Fix the logic for computing the different weights for each BTLs. This" This reverts commit `de118609ec`.	2015-02-26 17:54:31 -05:00
George Bosilca	d4c2fc9d41	Merge branch 'master' of github.com:open-mpi/ompi	2015-02-25 12:01:57 -05:00
Mike Dubman	a0afb7d96e	Merge pull request #424 from miked-mellanox/topic/master_fix_yalla fixes issue #414	2015-02-25 19:01:47 +02:00
George Bosilca	f3b58006c8	Merge branch 'master' of github.com:open-mpi/ompi	2015-02-25 12:01:35 -05:00
Jeff Squyres	c3381150de	ob1: fix another PERUSE compile error	2015-02-25 05:53:12 -08:00
yosefe	0332ab4d8b	Initialize pml_yalla bsend request status.	2015-02-25 15:33:26 +02:00
Nathan Hjelm	0ac2f08460	pml/ob1: fix peruse compile error Fixes #416	2015-02-24 15:39:46 -07:00
Nathan Hjelm	5ef24000c7	pml/yalla: fix typo in PML_YALLA_FREELIST_INIT	2015-02-24 10:08:54 -07:00
Nathan Hjelm	5f1254d710	Update code base to use the new opal_free_list_t Use of the old ompi_free_list_t and ompi_free_list_item_t is deprecated. These classes will be removed in a future commit. This commit updates the entire code base to use opal_free_list_t and opal_free_list_item_t. Notes: OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ()) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-24 10:05:45 -07:00
Nathan Hjelm	ed78553512	Update opal_free_list_t usage to reflect new class interface. Please verify your components have been updated correctly. Keep in mind that in terms of threading: OPAL_FREE_LIST_GET -> opal_free_list_get_st OPAL_FREE_LIST_RETURN -> opal_free_list_return_st I used the opal_using_threads() variant anytime it appeared multiple threads could be operating on the free list. If this is not the case update to _st. If multiple threads are always in use change to _mt.	2015-02-24 10:05:44 -07:00
Howard Pritchard	c9e81b54fb	Merge pull request #412 from hppritcha/topic/owner_files add owner files to opal/ompi/orte mca directories	2015-02-23 09:48:20 -07:00
Howard Pritchard	bf89131f9e	add owner files to opal/ompi/orte mca directories This commit adds an owner file in each of the component directories for each framework. This allows for a simple script to parse the contents of the files and generate, among other things, tables to be used on the project's wiki page. Currently there are two "fields" in the file, an owner and a status. A tool to parse the files and generate tables for the wiki page will be added in a subsequent commit.	2015-02-22 15:10:23 -07:00
Mike Dubman	00d416ba9d	yalla: fix coverity errors dead code fix	2015-02-22 13:57:45 +02:00
George Bosilca	0871c5c489	This function is now useless.	2015-02-21 16:38:17 -05:00
George Bosilca	de118609ec	Fix the logic for computing the different weights for each BTLs. This removes the call to qsort, as the BTLs are already sorted based on their respective bandwidth.	2015-02-21 16:37:18 -05:00
Rolf vandeVaart	dbd0064713	Fix bug in CUDA-aware and GDR introduced by refactoring	2015-02-18 17:44:28 -05:00
Nathan Hjelm	3847025540	pml/ob1: when using btl_get try to register the entire region before attempting to break the get into multiple rdma fragments A little background. Historically ob1 always registered the entire memory region when the RGET protocol was in use. This changed when Mellanox added support to fragment RGET using the btl_prepare_dst function. Now that the BTL layer has changed to split out the limits of get/put there is explicit fragmentation code in ob1. Before this commit the registration was still done per RGET fragment. This commit will attempt to register the entire region before creating RGET fragments. If the registration is successful then all RGET fragments will use this registration; otherwise they will each attempt to register their own segment of the receive buffer. If that fails enough times each fragment will give up and fall back on send/recv. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:37 -07:00
Nathan Hjelm	868e10caf2	pml/bfo: ompi ignore until updated for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:37 -07:00
Nathan Hjelm	c4a0e02261	pml/ob1: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:37 -07:00
Jeff Squyres	f38f2a159b	pml_base: whitespace cleanup; no code changes	2015-02-06 11:27:50 -08:00
Jeff Squyres	46a1722dfc	pml_base: fix errant show_help message	2015-02-06 11:27:50 -08:00
Yohann Burette	1ad188206b	Add OFI MTL to CM PML. This allows the CM PML to be picked when the OFI MTL is selected.	2015-01-20 10:50:14 -08:00
George Bosilca	df0512550e	The extent of the datatype is irrelevant for deciding to do an immediate send as long as we have to pack.	2015-01-19 02:23:12 -05:00
Gilles Gouaillardet	d14daf40d0	ob1: correctly handle types in which size > extent do not send inline if extent*count OR size*count are greater than 256	2015-01-19 14:07:23 +09:00
Howard Pritchard	3fc7b389ff	initial async progress changes for gni	2014-12-24 11:50:23 -07:00
yosefe	3f152733bf	Add yalla to the list of default PMLs	2014-12-01 13:11:28 +02:00
Nathan Hjelm	1b564f62bd	Revert "Merge pull request #275 from hjelmn/btlmod" This reverts commit `ccaecf0fd6`, reversing changes made to `6a19bf85dd`.	2014-11-19 23:22:43 -07:00
Nathan Hjelm	1a5349ec79	ompi ignore bfo until it is updated for new btl interface	2014-11-19 11:33:04 -07:00
Nathan Hjelm	0110603782	ob1 warning fix	2014-11-19 11:33:04 -07:00
Nathan Hjelm	24427639b6	Fix ob1 warnings	2014-11-19 11:33:03 -07:00
Nathan Hjelm	271818f887	pml/ob1: bug fixes and adjustments for changes in btl_sendi behavior	2014-11-19 11:33:03 -07:00
Nathan Hjelm	ee2b111011	Update PML for latest BTL update	2014-11-19 11:33:02 -07:00
Nathan Hjelm	c61e017177	pml: updates to reflect member changes in mca_btl_base_descriptor_t and mca_btl_base_module_t structures	2014-11-19 11:33:02 -07:00
Nathan Hjelm	5936411a07	pml/ob1: when using btl_get try to register the entire region before attempting to break the get into multiple rdma fragments A little background. Historically ob1 always registered the entire memory region when the RGET protocol was in use. This changed when Mellanox added support to fragment RGET using the btl_prepare_dst function. Now that the BTL layer has changed to split out the limits of get/put there is explicit fragmentation code in ob1. Before this commit the registration was still done per RGET fragment. This commit will attempt to register the entire region before creating RGET fragments. If the registration is successful then all RGET fragments will use this registration; otherwise they will each attempt to register their own segment of the receive buffer. If that fails enough times each fragment will give up and fall back on send/recv.	2014-11-19 11:33:02 -07:00
Nathan Hjelm	b75bb8aea7	Update pml for btl changes	2014-11-19 11:33:02 -07:00
Jeff Squyres	7a5b2e9b13	ob1: change an OPAL_UNLIKELY to OPAL_LIKELY Per `924d39e415 (commitcomment-8378266)`, this OPAL_UNLIKELY should really be OPAL_LIKELY.	2014-10-31 03:22:55 -07:00
George Bosilca	924d39e415	Always OBJ_DESTRUCT the send request.	2014-10-30 01:28:50 -04:00
Gilles Gouaillardet	ed93c8787d	ob1: add a destructor to mca_pml_ob1_recv_request_t opal_mutex_t must be OBJ_DESTRUCTed in order to avoid a memory leak (pthread_mutex_init allocates memory under Cygwin, so pthread_mutex_destroy is mandatory) Thanks to Marco Atzeri for reporting this issue	2014-10-29 13:30:29 +09:00
Jeff Squyres	c22e1ae33b	configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros These two macros set the prefix for the OPAL and ORTE libraries, respectively. Specifically, the OPAL library will be named libPREFIXopen-pal.la and the ORTE library will be named libPREFIXopen-rte.la. These macros must be called, even if the prefix argument is empty. The intent is that Open MPI will call these macros with an empty prefix, but other projects (such as ORCM) will call these macros with a non-empty prefix. For example, ORCM libraries can be named liborcm-open-pal.la and liborcm-open-rte.la. This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running MPI applications under ORTE, if the ORTE and OPAL libraries between OMPI and ORCM are not identical (which, because they are released at different times, are likely to be different), we need to ensure that the OMPI applications link against their ORTE and OPAL libraries, but the ORCM executables link against their ORTE and OPAL libraries.	2014-10-22 10:32:19 -07:00
yosefe	b4f569b4d4	yalla: address comments on #246 by @jsquires	2014-10-22 10:42:56 +03:00
yosefe	ce7c748e51	Add new PML yalla, which uses mxm directly to reduce overhead. http://starwars.wikia.com/wiki/Ubed_Yalla	2014-10-21 16:08:24 +03:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Gilles Gouaillardet	f24699623f	check-help-strings cleanup This commit was SVN r32495.	2014-08-11 03:25:22 +00:00
Gilles Gouaillardet	f7b13d1126	Fix missing ampersand. Also replace the OMPI_CAST_RTE_NAME macro with an inline function if OPAL_ENABLE_DEBUG, so we can get warnings from the compiler if ampersand is missing. Thanks to Paul Hargrove for reporting the bugs This commit was SVN r32408.	2014-08-04 02:52:56 +00:00
Ralph Castain	daeb9b6c4f	Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain. Thanks to Gilles for pointing out some of the discrepancies. This commit was SVN r32398.	2014-08-01 14:44:11 +00:00
George Bosilca	cee2a4e5c8	Missing alloca.h. Thanks Paul for catching this. This commit was SVN r32388.	2014-08-01 03:28:23 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
Nathan Hjelm	f960e4273e	Fix typo in r32196 The wrong descriptor field was used when calculating the size received when using the RDMA rendezvous protocol. This commit was SVN r32232. The following SVN revision numbers were found above: r32196 --> open-mpi/ompi@a14e0f10d4	2014-07-14 21:00:53 +00:00
Gilles Gouaillardet	77184b5c4c	Fix a corner case with MPI_PROC_NULL persistent requests Handle OMPI_REQUEST_NOOP in MPI_Startall rather than PML cmr=v1.8.2:reviewer=bosilca:ticket=4764 This commit was SVN r32213. The following Trac tickets were found above: Ticket 4764 --> https://svn.open-mpi.org/trac/ompi/ticket/4764	2014-07-11 04:37:01 +00:00
Nathan Hjelm	1b9621eeb0	Fix typo in r32196 This commit was SVN r32202. The following SVN revision numbers were found above: r32196 --> open-mpi/ompi@a14e0f10d4	2014-07-10 18:43:49 +00:00
Nathan Hjelm	a14e0f10d4	Per RFC: Remove des_src and des_dst members from the mca_btl_base_segment_t and replace them with des_local and des_remote This change also updates the BTL version to 3.0.0. This commit does not represent the final version of BTL 3.0.0. More changes are coming. In making this change I updated all of the BTLs as well as BTL users to use the new structure members. Please evaluate your component to ensure the changes are correct. RFC text: This is the first of several BTL interface changes I am proposing for the 1.9/2.0 release series. What: Change naming of btl descriptor members. I propose we change des_src and des_dst (and their associated counts) to be des_local and des_remote. For receive callbacks the des_local member will be used to communicate the segment information to the callback. The proposed change will include updating all of the doxygen in btl.h as well as updating all BTLs and BTL users to use the new naming scheme. Why: My btl usage makes use of both put and get operations on the same descriptor. With the current naming scheme I need to ensure that there is consistency between the segments described in des_src and des_dst depending on whether a put or get operation is executed. Additionally, the current naming prevents BTLs that do not require prepare/RMA matched operations (do not set MCA_BTL_FLAGS_RDMA_MATCHED) from executing multiple simultaneous put AND get operations. At the moment the descriptor can only be used with one or the other. The naming change makes it easier for BTL users to setup/modify descriptors for RMA operations as the local segment and remote segment are always in the same member field. The only issue I foresee with this change is that it will require a little more work to move BTL fixes to the 1.8 release series. This commit was SVN r32196.	2014-07-10 16:31:15 +00:00

1285 commits