openmpi

Автор	SHA1	Сообщение	Дата
Nathan Hjelm	1322e5dee8	Merge pull request #3274 from hjelmn/osc_rdma_fix osc/rdma: fix typo in atomic code	2017-04-04 00:20:42 -06:00
Gilles Gouaillardet	5dfd4ab6ca	coll/tuned: remove set-but-not-used variables Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-04-04 13:18:11 +09:00
Nathan Hjelm	fad0803920	osc/rdma: fix typo in atomic code Fixes #3267 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-04-03 15:54:28 -06:00
Nadia Derbey	b6de94e449	Fix yalla PML: MPI_Recv does not return MPI_ERR_TRUNCATE upon overflow Signed-off-by: Nadia Derbey <Nadia.Derbey@atos.net>	2017-03-30 15:18:31 +02:00
Xin Zhao	ee952fcccd	Passing estimated_num_procs to UCX init in PML and SPML. Signed-off-by: Xin Zhao <xinz@mellanox.com>	2017-03-27 20:36:52 +03:00
Nathan Hjelm	c72fb30eb5	osc/pt2pt: fix typo Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2017-03-23 09:00:21 -06:00
Xin Zhao	6a99c60fbd	Add multithreading support in PML UCX framework. Signed-off-by: Xin Zhao <xinz@mellanox.com>	2017-03-20 19:55:00 +02:00
Jeff Squyres	ce0e1cd32c	Merge pull request #3201 from hppritcha/jjhursey-topic/timer-gettimeofday Jjhursey topic/timer gettimeofday	2017-03-18 20:12:36 -04:00
Howard Pritchard	b9331527f5	timer: hack use of clock_gettime better solution needed later workaround for #3003 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-03-18 15:08:59 -05:00
Ralph Castain	45b46dc446	Merge pull request #3181 from artpol84/add_proc_fix_2/master ompi: Avoid unnecessary PMIx lookups when adding procs (step 2).	2017-03-16 15:06:08 -07:00
Jeff Squyres	760db0d5ce	osc/pt2pt: fix compiler warning Remove unused variable. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-03-16 05:46:11 -07:00
Jeff Squyres	1947280865	topo/treematch: squash some compiler warnings Only define MIN/MAX if they are not already defined. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-03-16 05:44:26 -07:00
Joshua Hursey	48d13aa8ef	mpi/c: Force wtick/wtime to use gettimeofday * See https://github.com/open-mpi/ompi/issues/3003 for a discussion about this patch. Once we get a better version in place we can revert this change. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-03-15 21:24:37 -05:00
Artem Polyakov	1f7a3a2d54	ompi: Avoid unnecessary PMIx lookups when adding procs (step 2). Follow-up for `717f3fef62`. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-03-16 07:47:27 +07:00
Nathan Hjelm	37214eda09	Merge pull request #3164 from hjelmn/ob1_pinned pml/ob1: do not cache leave_pinned	2017-03-14 13:22:18 -06:00
Nathan Hjelm	3e7ef48c13	pml/ob1: do not cache leave_pinned This commit fixes a bug that disabled both the RDMA pipeline and RDMA protocols in ob1. ob1 was internally caching the values of opal_leave_pinned and opal_leave_pinned_pipeline at init time. This is no longer valid as opal_leave_pinned may be set by any call to a btl's add_procs. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-14 09:00:40 -06:00
Valentin Petrov	fe069c9570	Fixes the coll_allgather usage bug One should use the correct module object when calling c_coll.coll_allgather. Otherwise there will be a segfault in the case, for example, when hcoll is used. In that case c_coll.coll_allgather = mca_coll_hcoll_allgather while c_coll.coll_gather_module = tuned. Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2017-03-14 09:47:39 +02:00
Jeff Squyres	086748bb70	Merge pull request #3102 from omor1/master Add missing definition of MPI_T_PVAR_SESSION_NULL (resolve #2652)	2017-03-13 15:27:05 -04:00
Alex Mikheev	c081239f88	ompi: pml ucx: fix persistant request init CR changes Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2017-03-08 13:26:29 +02:00
Alex Mikheev	c113c37a7a	ompi: pml ucx: fix persistant request initialization Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2017-03-08 10:59:41 +02:00
Nathan Hjelm	0195d15401	osc/pt2pt: flush pending fragments on lock ack This commit addresses an issue that can occur in cases where a lot of fragments are outstanding. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-06 13:58:46 -07:00
Edgar Gabriel	607dc2c039	Merge pull request #3103 from edgargabriel/pr/sharedfp-name-collision-fix sharedfp/lockedfile and sm: fix the namecollision	2017-03-05 14:46:20 -06:00
Edgar Gabriel	2d462b3b80	sharedfp/lockedfile and sm: fix name collision this fixes the issue reported by Nicolas Joly on the mailing: the sharedfp/lockedfile component does not support right now a scenario where multiple jobs read from the same input file, due to a collision of the filenames utilized for the sharedfp handle. Although not part of the oroginal report, the same occurs for the sharedfp/sm component. Add therefore the jobid to be part of the lockedfilename/sm file name. use the OMPI_CAST_RTE_NAME macro to determine jobid Fixes: #3098 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-03-05 11:28:28 -06:00
Omri Mor	20ab37a297	Add missing MPI_T_PVAR_SESSION_NULL to mpi.h MPI_T_pvar_session_free() should reject null sessions and set *session to MPI_T_PVAR_SESSION_NULL Signed-off-by: Omri Mor <omri50@gmail.com>	2017-03-05 09:03:30 -06:00
Artem Polyakov	9448814c40	ompi/pml/ucx: Fix uninitialized UCX request field. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-03-05 03:06:30 +07:00
Edgar Gabriel	d1fed77781	Merge pull request #3094 from edgargabriel/pr/master-lustre-priority io/ompio: adjust the priority of the OMPIO component on lustre	2017-03-03 09:29:14 -06:00
KAWASHIMA Takahiro	39294caf04	Merge pull request #3086 from kawashima-fj/pr/coll-base-defs coll: Update `ompi/mca/coll/base/coll_base_functions.h`	2017-03-03 18:53:00 +09:00
KAWASHIMA Takahiro	7cb42d9aaa	Merge pull request #3085 from kawashima-fj/pr/pml-bfo-typo pml/bfo: Correct a function name and header filenames	2017-03-03 18:48:01 +09:00
Edgar Gabriel	9e19834327	io/ompio: adjust the priority of the OMPIO component on lustre this commit brings over the behavior from the 2.x series to master, mostly with the fork for the 3.x series in mind. Also, use strncasecmp instead of two strncmps Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-03-02 12:10:11 -06:00
Jeff Squyres	dc53cd5f74	MPI_Wtick: may return a higher resolution than 10e-6 these days Thanks to Mark Dixon (@ccaamad) for reporting the error. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-03-02 10:39:28 -05:00
KAWASHIMA Takahiro	c4ca5e703d	coll: Update `ompi/mca/coll/base/coll_base_functions.h` - Support MPI-2.2 and MPI-3.0 COLL features. * `MPI_REDUCE_SCATTER_BLOCK` * neighborhood collective communication * nonblocking collective communication - Add `_BASE_ARGS` and `_BASE_ARG_NAMES` for convenience. - Use parameter names used in the MPI Standard. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2017-03-02 17:58:02 +09:00
KAWASHIMA Takahiro	96aa0d90c1	pml/bfo: Correct a function name and header filenames These lines were incorrectly modified in `90f2940`. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2017-03-02 16:02:53 +09:00
Alex Mikheev	152f77df59	ompi: pml ucx: fix datatype packing error in bsend Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2017-03-01 16:18:19 +02:00
Yossi Itigin	33471c44ee	pml_yalla/mtl_mxm/hcoll: open memory component to activate memory hooks. Memory hooks are now set-up on demand. pml/yalla, mtl/mxm and coll/hcoll need the memory hooks, so make sure those are installed. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2017-03-01 12:12:20 +02:00
Gilles Gouaillardet	880f2d5431	mpi/c: revamp error handling in MPI_{Pack,Unpack}[_external] Thanks Alex and the folks at Mellanox for the help. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-03-01 10:03:31 +09:00
Jeff Squyres	d5266aba90	Merge pull request #2955 from jsquyres/pr/hwloc-external-fixes Fix --with-hwloc=external	2017-02-28 14:57:07 -05:00
Josh Hursey	0006f0d7c5	Merge pull request #2773 from jjhursey/topic/hook-fwk Add a 'hook' framework	2017-02-28 12:29:50 -06:00
Ralph Castain	735fbf8f67	Merge pull request #3011 from artpol84/add_proc_fix/master ompi: Avoid unnecessary PMIx lookups when adding procs.	2017-02-28 08:25:08 -08:00
Jeff Squyres	fec519a793	hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h Per a prior commit, the presence of "hwloc.h" can cause ambiguity when using --with-hwloc=external (i.e., whether to include opal/mca/hwloc/hwloc.h or whether to include the system-installed hwloc.h). This commit: 1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h. 2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc. 3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the rest of the code base. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-02-28 07:48:42 -08:00
Jeff Squyres	0cd3b6c235	treematch: do not include <hwloc.h> Instead, include "opal/mca/hwloc/hwloc.h" Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-02-28 07:45:23 -08:00
Josh Hursey	b1c4e50500	Merge pull request #2934 from jjhursey/topic/coll-comm-restructure Move coll structure outside of the communicator	2017-02-28 08:45:18 -06:00
Nathan Hjelm	032bcf915a	osc/rdma: fix compile warning Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-27 16:26:00 -07:00
George Bosilca	366d64b7e5	Move the collective structure outside the communicator. As we changed the ABI (forcing a major release), we can limit the size of the predefined communicators by moving the collective structure outside the communicator. This might have a minimal, but unnoticeable, impact on performance. This approach has been discussed during the January 2017 devel meeting. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-27 11:54:17 -06:00
Joshua Hursey	c10bbfded6	ompi/hook: Add the hook/license framework * Include a 'demo' component that shows some of the features. * Currently has hooks for: - MPI_Initialized - top, bottom - MPI_Init_thread - top, bottom - MPI_Finalized - top, bottom - MPI_Init - top (pre-opal_init), top (post-opal_init), error, bottom - MPI_Finalize - top, bottom * Other places in ompi can 'register' to hook into any one of these places by passing back a component structure filled with function pointers. * Add a `MCA_BASE_COMPONENT_FLAG_REQUIRED` flag to the MCA structure that is checked by the `hook` framework. If a required, static component has been excluded then the `hook` framework will fail to initialize. - See note in `opal/mca/mca.h` as to why this is checked in the `hook` framework and not in `opal/mca/base/mca_base_component_find.c` Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-27 12:05:53 -05:00
Nathan Hjelm	581bff9871	Merge pull request #3034 from hjelmn/osc_rdma_atomic osc/rdma: make locking code more robust	2017-02-27 08:46:52 -07:00
Nathan Hjelm	4707c7c5e0	osc/rdma: make locking code more robust Under heavy load the locking code could fail if the underlying btl module started to return OPAL_ERR_OUT_OF_RESOURCE on atomic operations. This commit updates the code to gracefully handle btl errors. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-27 00:01:26 -07:00
Gilles Gouaillardet	af0b5cffb4	asm: rename the AMD64 into X86_64 in this context, AMD64 really means amd64 or em64t, so let's rename this into X86_64 in order to avoid any confusion Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-27 15:10:50 +09:00
Sylvain Jeaugey	f827b6b8dd	Fix more typos using the allgather module for allreduce operations, causing a crash when CUDA collectives are enabled. Signed-off-by: Sylvain Jeaugey <sjeaugey@nvidia.com> Signed-off-by: Akshay Venkatesh <akvenkatesh@nvidia.com>	2017-02-24 16:35:29 -08:00
Yossi	fb67c966a8	Merge pull request #2944 from alex-mikheev/topic/pml_ucx_bsend ompi: pml ucx: add support for the buffered send	2017-02-22 12:21:03 +02:00
Artem Polyakov	717f3fef62	ompi: Avoid unnecessary PMIx lookups when adding procs. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-02-22 16:09:30 +07:00
Alex Mikheev	b015c8bb48	ompi: pml ucx: add support for the buffered send Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2017-02-21 17:19:22 +02:00
Gilles Gouaillardet	4184c01be5	Merge pull request #2393 from bosilca/topic/no_predefined_ddt_refcount Don't refcount the predefined datatypes.	2017-02-21 09:38:11 +09:00
Todd Kordenbrock	048f757d9f	osc-portals4: add support for noncontiguous datatypes This commit implements onesided operations for noncontiguous datatypes using two different algorithms. * If the result and/or origin datatype is noncontiguous and the target datatype is contiguous, then an iovec MD is created for the result and origin. The operation is performed using a single Portals4 call (unless it exceeds the max message size). * If the target datatype is noncontigous, then an algorithm similar to the one in osc-rdma is used to loop over the contiguous blocks of each datatype. The operation is performed using multiple Portals4 calls. This commit ensures that individual operations do not exceed the max atomic size or the max message size supported by the device. Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>	2017-02-15 16:17:13 -06:00
Gilles Gouaillardet	cd4537193c	osc/sm: fix MPI_Win_allocate_shared() alignment add padding so the memory allocated by MPI_Win_allocate_shared() is 64 bytes aligned. Thanks Joseph Schuchart for the bug report Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-15 13:40:48 +09:00
Josh Hursey	0b273c2561	Merge pull request #2808 from jjhursey/fix/ibm/reduce-local-to-coll coll: Move reduce_local into the coll framework	2017-02-14 15:54:15 -06:00
Nathan Hjelm	cc4a0fabcf	Merge pull request #2727 from hjelmn/osc_rdma osc/rdma: fix typo in check for MPI_MODE_NOCHECK	2017-02-14 10:50:33 -07:00
Joshua Hursey	78006f93a4	coll: Move reduce_local into the coll framework * Since we are adding a new function to `mca_coll_base_module_2_1_0_t` we need to increase the version of the module structure to `2_2_0`. * Add a comment just above the PREDEFINED_COMMUNICATOR_PAD describing it's purpose and when it should change. To help future developers trying to answer the question noted in the comment. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-14 08:56:07 -06:00
Gilles Gouaillardet	e70a30cca4	coll/libnbc: optimize zero size ialltoall{v,w} with MPI_IN_PLACE and incidentally avoids malloc(0) Thanks Lisandro Dalcin for the report Fixes open-mpi/ompi#2945 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-13 15:21:28 +09:00
Gilles Gouaillardet	12949547f4	coll/libnbc: fix a2aw_sched_linear() with zero size datatype or zero count Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-13 15:21:28 +09:00
Joshua Hursey	383330a50d	coll/basic: Expand check for negative input values * Negative values are parameter errors for neighborhood collectives - Add checks to the mpi/c interface `MPI_PARAM_CHECK` * Fix a success check for neighbor_alltoallw with dist_graph Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-08 14:26:32 -06:00
Geoff Paulsen	4917e44a7d	Merge pull request #2832 from jjhursey/topic/ibm/osc-base-dt-abort osc/base: Detect unsupported data types and abort	2017-02-05 04:26:04 -06:00
Howard Pritchard	f4ad119693	Merge pull request #2914 from hppritcha/topic/nbc_compiler_warning swat some compiler warnings	2017-02-04 11:56:52 -05:00
Howard Pritchard	acaecb2448	swat some compiler warnings Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-02-03 08:28:15 -07:00
Gilles Gouaillardet	e879d2910a	coll/tuned: make coll_tuned_gather_algorithms MCA settable Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-02 11:00:38 +09:00
Nathan Hjelm	362ac8b87e	osc/pt2pt: fix threading issues This commit fixes a number of threading issues discovered in osc/pt2pt. This includes: - Lock the synchronization object not the module in osc_pt2pt_start. This fixes a race between the start function and processing post messages. - Always lock before calling cond_broadcast. Fixes a race between the waiting thread and signaling thread. - Make all atomically updated values volatile. - Make the module lock recursive to protect against some deadlock conditions. Will roll this back once the locks have been re-designed. - Mark incoming complete after completing an accumulate not before. This was causing an incorrect answer under certain conditions. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-01 10:33:01 -07:00
Gilles Gouaillardet	02558134ef	coll/base: remove unused local variable Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-01 11:54:17 +09:00
Gilles Gouaillardet	ad44ecb2ba	pml/base: initialize global variables Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-01 11:49:47 +09:00
bosilca	c331e6794c	Allow all tuned MCA parameters to be modified programatically. (#2829 ) Fix a comment in the MCA header. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-01-31 21:47:36 -05:00
Josh Hursey	5fcd69da52	Merge pull request #2831 from jjhursey/topic/ibm/pml-bsend pml/base: Expose some bsend varaibles so PMLs may reference them	2017-01-31 10:31:42 -06:00
Gilles Gouaillardet	9bcadbd51b	coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE this fixes a regression introduced in open-mpi/ompi@045d0c5f4c Fixes open-mpi/ompi#2879 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-30 14:19:45 +09:00
Yossi Itigin	13c3bf0dd7	yalla: fix memory leak with blocking non-contig send. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2017-01-29 18:51:43 +02:00
Ralph Castain	3440b46e5e	Merge pull request #2820 from rhc54/topic/async Per f2f meeting: if async modex is given, default to no MPI init barr…	2017-01-27 15:43:43 -08:00
Josh Hursey	f4a86904c4	Merge pull request #2813 from jjhursey/fix/ibm/comm-cleanup communicator: Fix uninitialized variable	2017-01-26 14:35:32 -06:00
Josh Hursey	ebc90f926e	Merge pull request #2806 from jjhursey/fix/ibm/aint-diff-type Fix a minor error at MPI_AINT_DIFF.	2017-01-26 14:23:21 -06:00
Josh Hursey	0408c116eb	Merge pull request #2805 from jjhursey/fix/ibm/base-allgatherv coll/base: Allgatherv MPI_IN_PLACE Bug	2017-01-26 14:21:57 -06:00
Geoffrey Paulsen	d2527cff46	Fixing comment only in MPI_IN_PLACE case for ireduce in libnbc. Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>	2017-01-26 10:58:51 -08:00
Geoffrey Paulsen	045d0c5f4c	Fix for Ireduce + MPI_IN_PLACE. Fixes a wrong answer from MPI_Ireduce when the red_sched_chain() path was taken (which only happens for np<=4 and mesgsize>=64k). The way libnbc treats MPI_IN_PLACE is to set sbuf == rbuf, and whether an algorithm will work cleanly or not after that depends on the details. In this case the last steps of the algorithm amounted to (right neighbor is sending us reduction results from ranks 1..n-1) recv into rbuf from right neighbor add the contribution from our sbuf into rbuf this would be fine in general, but if sbuf==rbuf, that recv overwrites the sbuf. I changed it to recv into a tmpbuf if MPI_IN_PLACE was used. Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>	2017-01-25 18:08:08 -08:00
Nysal Jan K.A	94f92f6b49	osc/base: Detect unsupported data types and abort Using MPI_MINLOC or MPI_MAXLOC with the following data types leads to data corruption: * MPI_DOUBLE_INT * MPI_LONG_INT * MPI_SHORT_INT * MPI_LONG_DOUBLE_INT Detect this print a error message and abort. This workaround should be removed once the following issue is resolved: * https://github.com/open-mpi/ompi/issues/1666 Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-25 15:28:28 -06:00
Sameh S. Sharkawi	320ab3b84f	pml/base: Expose some bsend varaibles so PMLs may reference them Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-25 15:21:53 -06:00
Ralph Castain	a7b8190fdc	Per f2f meeting: if async modex is given, default to no MPI init barrier, letting the user override that if desired. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-25 10:13:53 -08:00
Joshua Hursey	a2d45f6e9f	communicator: Fix uninitialized variable Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-24 16:46:13 -06:00
Zhi Ming Wang	9718bbac82	Fix a minor error at MPI_AINT_DIFF. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-24 16:06:14 -06:00
Mark Allen	a3452adfa9	coll/base: Allgatherv MPI_IN_PLACE Bug MPI_Allgatherv with MPI_IN_PLACE reads data from wrong location. They were locating the MPI_IN_PLACE send buffer as ```c send_buf = (char)rbuf; for (i = 0; i < rank; ++i) { send_buf += ((ptrdiff_t)rcounts[i] extent); } ``` when it should be ```c send_buf = (char)rbuf; send_buf += ((ptrdiff_t)disps[rank] extent); ``` because disps[] specifies where things are in the v-style buffers. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-24 15:52:36 -06:00
Edgar Gabriel	cbb3cb9745	fs/ufs: avoid using the exclusive flag with shared file pointer when a file is opened a second time for shared file pointer operations, avoid setting the create and exclusive flag. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-01-24 12:11:29 -06:00
Edgar Gabriel	f5289a1803	common/ompio: store correctly the SHAREDFP_IS_SET flag it looks like disabling the lazy_open flag for sharedfp components revealead a bug that lead to a crash in file_close in some tests. Make sure the SHAREDFP_IS_SET flag is correctly set (and not overwritten again), and we use that to avoid a double-free of the communicator. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-01-24 12:09:56 -06:00
Gilles Gouaillardet	d5aa310884	mpiext/affinity: initialize all output variables of OMPI_Affinity_str() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-24 09:13:29 +09:00
Gilles Gouaillardet	501eb8dc7e	ompio: plug misc memory leaks Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-24 09:13:19 +09:00
Gilles Gouaillardet	d0629f18c2	coll/libnbc: optimize size one communicators simply "return" with ompi_request_empty if the communicator size is 1 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-24 09:12:47 +09:00
Gilles Gouaillardet	6f2ca5809b	man: fix a typo in MPI_Win_get_name() Thanks Nicolas Joly for the report Fixes open-mpi/ompi#2782 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-24 09:08:13 +09:00
Edgar Gabriel	4dc09de3b8	common/ompio: update comment based on the previsou commit. No source code changed. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-01-23 13:38:05 -06:00
Edgar Gabriel	3eae0eecd0	io/ompio: change default for sharedfp_lazy_open parameter Revert the logic of io_ompio_sharedfp_lazy_open. The user now has to explicitely disable shared fp in order for the structures not to be allocated. Otherwise, resetting the shared fp e.g. in case the file was opened in append mode will not work correctly, the code could deadlock. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-01-23 08:59:22 -06:00
Edgar Gabriel	d3a8d38cc6	common/ompio: correctly position shared fp in append mode Fixes a bug reported on the mailing list. ompio did only reposition the individual file pointer when the file was opened in append mode. Set the shared file pointer also to point to the end of the file, similarly to the individual file pointer. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-01-23 08:59:05 -06:00
Nathan Hjelm	0497ec0b70	osc/rdma: fix typo in check for MPI_MODE_NOCHECK This commit fixes two typos in the lock_all path that inverted the MPI_MODE_NOCHECK flag. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-01-12 11:28:11 -07:00
Gilles Gouaillardet	4932391002	ompi/proc: fix ompi_proc_finalize() revert bits of open-mpi/ompi@cf534d0c95 we cannot del_procs here since the pml framework has already been closed Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-12 11:41:35 +09:00
George Bosilca	c2cd717f82	Don't refcount the predefined datatypes. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-01-11 16:48:59 -05:00
Gilles Gouaillardet	2189c5bcc3	ompi/dpm: plug a memory leak in disconnect_waitall() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:44 +09:00
Gilles Gouaillardet	cf534d0c95	ompi/proc: plug a memory leak in ompi_proc_finalize() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 13:46:35 +09:00
Gilles Gouaillardet	1daa80d78f	mtl/psm2: plug a memory leak in ompi_mtl_psm2_component_open() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 09:28:32 +09:00
Joshua Ladd	57c0c847d0	Merge pull request #2603 from xinzhao3/topic/revert-ucx-mt Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."	2017-01-04 11:50:37 -05:00
Ralph Castain	66131b4183	Remove the bcol, coll/ml, and sbgp code as stale and lacking a maintainer Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-03 19:32:48 -08:00
Ralph Castain	dadc6fbaf6	Merge pull request #2448 from thananon/remove_request_lock Completely removed ompi_request_lock and ompi_request_cond	2017-01-03 19:31:46 -08:00
Jeff Squyres	33d2988985	Merge pull request #2647 from OMGtechy/master Fixed -Wmisleading-indentation in ad_read_coll.c	2017-01-03 12:24:22 -05:00
Ralph Castain	fe68f23099	Only instantiate the HWLOC topology in an MPI process if it actually will be used. There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced: * shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string instead of the topology itself, if available, thus avoiding instantiating the topology * openib BTL. This uses the distance matrix. At present, I haven't developed a method for replacing that reference. Thus, this component will instantiate the topology * usnic BTL. Uses the distance matrix. * treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate the topology * ess base functions. If a process is direct launched and not bound at launch, this code attempts to bind it. Thus, procs in this scenario will instantiate the topology Note that instantiating the topology on complex chips such as KNL can consume megabytes of memory. Fix pernode binding policy Properly handle the unbound case Correct pointer usage Do not free static error messages! Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-29 10:33:29 -08:00
Joshua Gerrard	94e87654c6	Fixed -Wmisleading-indentation in ad_read_coll.c Signed-off-by: Joshua Gerrard <joshuagerrard+ompi-commit@protonmail.com>	2016-12-28 20:14:13 +00:00
Jeff Squyres	d772fcf8f1	Merge pull request #2509 from OMGtechy/master Fixed memory leak and some -Werror=unused-result warnings	2016-12-27 17:13:23 -05:00
Nysal Jan K.A	25ba507ada	mpit: Fix MPI_T_pvar_get_index MPI_T_pvar_get_index was returning an incorrect index. The index was never set correctly while registering the performance variables. Additionally fix a missing case in the mca_base_var_type_t to MPI datatype conversion. This type is currently used for control variables registered by mxm, fca and hcoll components. Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>	2016-12-22 12:30:21 +05:30
Gilles Gouaillardet	773cad6b3e	ompi/debugger: fix mqs_version_string() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-22 15:00:47 +09:00
Xin Zhao	2d77912c19	Revert "PML/SPML/UCX: add UCX MT support to PML and SPML." This reverts commit `0ecf3c951c`. Signed-off-by: Xin Zhao <xinz@mellanox.com>	2016-12-19 18:57:48 +02:00
Joshua Gerrard	3332a7d630	Fixed memory leak and some -Werror=unused-result warnings Signed-off-by: Joshua Gerrard <joshuagerrard+ompi-commit@protonmail.com>	2016-12-17 17:43:14 +00:00
Mark Allen	eec1d5bf2e	osc/pt2pt: Fix hang with Put and Win_lock_all * When using `MPI_Put` with `MPI_Win_lock_all` a hang is possible since the `put` is waiting on `eager_send_active` to become `true` but that variable might not be reset in the case of `MPI_Win_lock_all` depending on other incoming events (e.g., `post` or ACKs of lock requests. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2016-12-16 11:52:53 -05:00
Mark Allen	0d1336b4a8	osc/pt2pt: Fix Lock/Unlock and Get wrong answer * When using `MPI_Lock`/`MPI_Unlock` with `MPI_Get` and non-contiguous datatypes is is possible that the unlock finishes too early before the data is actually present in the recv buffer. * We need to wait for the irecv to complete before unlocking the target. This commit waits for the outgoing fragment counts to become equal before unlocking. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2016-12-16 11:52:51 -05:00
Mark Allen	1ebf9fd3a4	osc/pt2pt: Fix PSCW after Fence wrong answer. * If the user uses PSCW synchronization after a Fence then the previous epoch is not reset which can cause the PSCW to transfer data before it is ready leading to wrong answers. * This commit resets the `eager_send_active` in the start call. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2016-12-16 11:52:49 -05:00
Xin Zhao	0ecf3c951c	PML/SPML/UCX: add UCX MT support to PML and SPML. Signed-off-by: Xin Zhao <xinz@mellanox.com>	2016-12-15 23:59:15 +02:00
Ralph Castain	585540bcee	Reduce the flood of warnings due to uninitialized variables, mismatched types, and unused things to a more bearable trickle Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-14 16:33:50 -08:00
Ralph Castain	884fb7fcf2	Update the PMIx2 support to include the latest shared memory optimizations Update ORTE support for dynamic PMIx operations e.g., PMIx_Spawn Update to track master Ensure that --disable-pmix-dstore actually disables the dstore. Sync to a few debugger updates Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-14 15:00:10 -08:00
Nathan Hjelm	8155124adc	Merge pull request #2558 from hjelmn/datatype_fix ompi/datatype: fix bug in darray that causes MPI/IO failures	2016-12-13 14:02:15 -07:00
Yossi	fa6e263821	Merge pull request #2537 from alinask/topic/pml-spml-ucx-api PML/SPML/UCX: Adapt to the API changes in the UCX lib.	2016-12-13 20:01:47 +02:00
Nathan Hjelm	eb439228b1	ompi/datatype: fix bug in darray that causes MPI/IO failures This commit fixes errors in the lb and extent of darray datatypes. For these datatypes the lb should be the start offset of the rank's data in the array and the extent should be the size of the entire datatype. In master the lb was always 0 and the extent was always to small. This commit updates the call to opal_datatype_resize to set the correct lb and fixes the extent calculation. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-12-13 09:25:16 -07:00
Jeff Squyres	f9e8a55a0e	Merge pull request #2543 from ggouaillardet/topic/dll_bit_reproducible ompi/debuggers: make the binary bit reproducible	2016-12-09 06:35:47 -05:00
KAWASHIMA Takahiro	6510800c16	ompi/request: Fix a persistent request creation bug According to the MPI-3.1 p.52 and p.53 (cited below), a request created by `MPI_*_INIT` but not yet started by `MPI_START` or `MPI_STARTALL` is inactive therefore `MPI_WAIT` or its friends must return immediately if such a request is passed. The current implementation hangs in `MPI_WAIT` and its friends in such case because a persistent request is initialized as `req_complete = REQUEST_PENDING`. This commit fixes the initialization. Also, this commit fixes internal requests used in `MPI_PROBE` and `MPI_IPROBE` which was marked wrongly as persistent. MPI-3.1 p.52: We shall use the following terminology: A null handle is a handle with value MPI_REQUEST_NULL. A persistent request and the handle to it are inactive if the request is not associated with any ongoing communication (see Section 3.9). A handle is active if it is neither null nor inactive. An empty status is a status which is set to return tag = MPI_ANY_TAG, source = MPI_ANY_SOURCE, error = MPI_SUCCESS, and is also internally configured so that calls to MPI_GET_COUNT, MPI_GET_ELEMENTS, and MPI_GET_ELEMENTS_X return count = 0 and MPI_TEST_CANCELLED returns false. We set a status variable to empty when the value returned by it is not significant. Status is set in this way so as to prevent errors due to accesses of stale information. MPI-3.1 p.53: One is allowed to call MPI_WAIT with a null or inactive request argument. In this case the operation returns immediately with empty status. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2016-12-08 21:42:05 +09:00
Alina Sklarevich	e9d2d029c6	PML/SPML/UCX: Adapt to the API changes in the UCX lib. Signed-off-by: Alina Sklarevich <alinas@mellanox.com>	2016-12-08 11:33:29 +02:00
Gilles Gouaillardet	4d8f606420	ompi/debuggers: make the binary bit reproducible instead of compilation date __DATE__, use a MPI_Get_library_version() like string Thanks Alastair McKinstry for the report Fixes open-mpi/ompi#2518 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-08 13:46:43 +09:00
Joshua Ladd	59f40e7cc5	Merge pull request #2500 from vspetrov/hcoll_ctx_free_detection Detect hcoll_context_free at config	2016-12-05 22:39:40 -05:00
Jeff Squyres	40d94fdc5a	Merge pull request #2422 from edgargabriel/pr/cycle-buf-default-val io/ompio: change the default value of mca parameter	2016-12-05 15:33:52 -05:00
Jeff Squyres	6319332258	Merge pull request #2491 from OMGtechy/master Swapped use of fprintf for opal_output_verbose	2016-12-03 07:32:03 -05:00
Valentin Petrov	e13e264185	Detect hcoll_context_free at config Needed for better flexibility with versioning Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2016-12-02 22:09:20 +02:00
Jeff Squyres	1504ffb18d	ompi_file_delete: output a better error message Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-12-02 11:08:04 -05:00
Joshua Gerrard	d5a45bc12e	Swapped use of fprintf for opal_output_verbose Signed-off-by: Joshua Gerrard <enquiries@joshuagerrard.com>	2016-12-01 19:56:06 +00:00
Gilles Gouaillardet	188b9668e4	ompi/attribute: plug a memory leak in set_value() OBJ_RELEASE() the previous attribute value if any Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:30 +09:00
Gilles Gouaillardet	d94e8c97a0	ompi/runtime: release F90 types in ompi_mpi_finalize() F90 types cannot be freed by the enduser as specified by the standard. but since they are ompi_datatype_dup'ed from predefined datatypes, they have to be explicitly free'd at finalize time in order to avoid a memory leak. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:30 +09:00
Gilles Gouaillardet	b2aca6c753	ompi/proc: plug a memory leak in ompi_proc_unpack() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Gilles Gouaillardet	ae278fd5df	ompi/runtime: plug a memory leak declare ompi_mpi_show_mca_params_file as NULL so MPI_T_Init_thread() can be invoked without leaking memory Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Gilles Gouaillardet	43ee08b20e	ompi/c: remove unused variable in [i]gatherv Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:25 +09:00
Gilles Gouaillardet	fe4c4e95eb	coll/libnbc: fix MPI_IN_PLACE handling in i{gather,scatter}[v] MPI_IN_PLACE is only relevant on the root task, so only test is there Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:25 +09:00
Gilles Gouaillardet	1a8a276914	coll/libnbc: use zero-size messages in ibarrier and silence a valgrind warning Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:25 +09:00
Gilles Gouaillardet	2eec6a08b5	coll/base: fix ompi_coll_base_reduce_scatter_intra_nonoverlapping() with MPI_IN_PLACE invoke underlying scatterv with MPI_IN_PLACE when appropriate Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:24 +09:00
Gilles Gouaillardet	8b7999469b	coll/base: fix MPI_IN_PLACE in ompi_coll_base_reduce_generic() avoid copying data to itself when MPI_IN_PLACE is used Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:24 +09:00
Gilles Gouaillardet	3f1486a508	pml/ob1: initialize one more field in mca_pml_ob1_recv_request_progress_rget() always initialize recvreq->req_rdma_offset to zero. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:14:23 +09:00
Jeff Squyres	756d09fd6f	Merge pull request #2457 from OMGtechy/master Fixed -Werror=unused-result warnings in comm_cid.c by adding error checking	2016-11-30 20:41:55 -05:00
Gilles Gouaillardet	15098161a3	coll/libnbc: add some comments on how locks are used no code change Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-30 17:29:51 +09:00
rhc54	d31f173744	Merge pull request #2476 from rhc54/topic/dbgupdate Bring forward the debugger-related changes	2016-11-29 19:10:32 -08:00
Ralph Castain	d5fd635efe	Bring forward the debugger-related changes Refs https://github.com/open-mpi/ompi/pull/2425 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 13:15:20 -08:00
Joshua Gerrard	7cf5de12b9	Fixed -Werror=unused-result warnings in comm_cid.c by adding error checking Signed-off-by: Joshua Gerrard <enquiries@joshuagerrard.com>	2016-11-29 21:08:12 +00:00
Ralph Castain	114e20ad66	Never collect data when doing the fence at the end of MPI_Init Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 08:31:35 -08:00
Valentin Petrov	4cdb8ecaad	coll/hcoll: hcoll_context_free Adds the new API hcoll_conetxt_free that resolves the issues observed with the ctx cache and group_destroy_notify. Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2016-11-29 07:33:05 +02:00
Jeff Squyres	34ea3ce25a	Merge pull request #1946 from thananon/romio-add-notes romio: update REFRESH_NOTES to accommodate the random() patch.	2016-11-28 16:37:23 -05:00
KAWASHIMA Takahiro	9bfca8b274	pml/ob1: Reduce per-rank memory footprint slightly `sturct mca_pml_ob1_comm_proc_t`, which is allocated per connected rank in a communicator, had two paddings after `expected_sequence` and `send_sequence` by alignments. By changing the order of the members, the size of `mca_pml_ob1_comm_proc_t` is reduced by 8 bytes on 64-bit architectures. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2016-11-28 19:20:48 +09:00
Edgar Gabriel	ebcfbbc045	Merge pull request #2456 from edgargabriel/pr/dynamic_gen2_uneven_distro_bug fcoll/dynamic_gen2: fix bug exposed by uneven distribution of data	2016-11-24 16:34:19 -06:00
Edgar Gabriel	b10558c3da	fcoll/dynamic_gen2: fix bug exposed by uneven distribution of data This fixes a bug reported in-house occuring with this component. It is triggered if the data assigned to different aggregators is highly differing, leading to different number of internal iterations required to handle it. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2016-11-24 13:02:19 -06:00
Artem Polyakov	c660b9b445	Merge pull request #2401 from artpol84/lazy_wait_fix ompi/init: always lazy-wait in ompi_mpi_init	2016-11-23 08:53:02 -08:00
Thananon Patinyasakdikul	6c5553c23c	Removed the unused ompi_request_[waiting, completed, failed, poll] variable. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2016-11-23 10:39:45 -05:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit `cb55c88a8b`.	2016-11-22 15:03:20 -08:00
Thananon Patinyasakdikul	b25a8c3fa5	Completely removed ompi_request_lock and ompi_request_cond as we dont need them anymore. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2016-11-22 17:58:31 -05:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Gilles Gouaillardet	2c94a3a6f3	coll/libnbc: fix race condition with multi threaded apps protect the mca_coll_libnbc_component.active_requests list with the new mca_coll_libnbc_component.lock mutex. Thanks Jie Hu for the report Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-21 10:21:47 +09:00
Jijo Varghese	25e138ea1d	error correction to the MPI_file operations thread safety lock Signed-off-by: Jijo Varghese <jijo733@gmail.com>	2016-11-17 08:18:49 -05:00
Edgar Gabriel	26e9210b15	io/ompio: change the default value of mca parameter change the default value of the mca_io_ompio_cycle_buffer_size parameter in order to avoid accidental truncation of a file for very large individual operations. Thanks to @cniethammer for reporting it. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2016-11-15 10:09:43 -06:00
Gilles Gouaillardet	bd364d29f7	osc/sm: plug an other memory leak in ompi_osc_sm_free Fixes open-mpi/ompi@f1b473ee63 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-14 23:19:07 -07:00
Gilles Gouaillardet	f1b473ee63	osc/sm: plug a memory leak in ompi_osc_sm_free Thanks Joseph Schuchart for the report. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-14 22:22:43 -07:00
Josh Hursey	bcc8230501	Merge pull request #2410 from jjhursey/topic/fix-dist-graph-coll topo/base: Fix module reference in collective call	2016-11-14 14:29:00 -06:00
Howard Pritchard	fb5ccd3618	Merge pull request #2404 from osvegis/topic/java_paper Update java paper reference.	2016-11-14 11:35:45 -07:00
Joshua Hursey	5a8b2f7431	topo/base: Fix module reference in collective call Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2016-11-14 11:34:54 -06:00
Jeff Squyres	8d2c98e616	Merge pull request #2408 from hppritcha/manpage_typo fix minor typo in MPI_Comm_connect man page	2016-11-14 09:38:38 -07:00
Howard Pritchard	3923bf8151	fix minor type in MPI_Comm_connect man page Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-14 05:52:45 -07:00
Gilles Gouaillardet	fc776e3fa5	coll: code cleanup - instead of coll_base_comm_get_reqs(2) for irecv/isend, use only one request allocated in the stack and do a irecv/send - instead of ompi_request_wait_all(2), simpy ompi_request_wait Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-13 22:35:33 -07:00
Gilles Gouaillardet	99d30353af	coll: Don't allocate space for zero requests Refs #2402 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-13 22:20:58 -07:00
Oscar Vega-Gisbert	891272f556	Update java paper reference. Signed-off-by: Oscar Vega-Gisbert	2016-11-13 22:05:18 +01:00
George Bosilca	725277bc26	Don't allocate space for the requests if the underlying topology has no neighbors. This commit fixes issue #2402. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2016-11-12 18:01:09 -05:00
Artem Polyakov	06a73da5ea	ompi/init: always lazy-wait in ompi_mpi_init According to discussion in #2181 we don't need MCA parameter any more. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2016-11-12 07:54:48 +07:00
Gilles Gouaillardet	11dc86f26b	cleanup: always #include <pthread.h> pthreads are now mandatory, so there is no more need to Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-08 13:07:45 +09:00
Gilles Gouaillardet	023d18abae	pml/ob1: mca_pml_ob1_recv must have memchecker mark the buffer as defined upon success this is generally done in mca_pml_ob1_recv_request_free(), but this is not invoked in via mca_pml_ob1_recv(), so do it manually Thanks Yvan Fournier for the report Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-07 13:10:15 +09:00
Jeff Squyres	f11b0c7edf	Merge pull request #2330 from jjhursey/topic/ibcast-non-uniform-dt-wa coll/libnbc: Work around for non-uniform data types in ibcast	2016-11-05 10:26:04 -04:00
Joshua Hursey	350ef67fe0	coll/libnbc: Work around for non-uniform data types in ibcast * If (legal) non-uniform data type signatures are used in ibcast then the chosen algorithm may fail on the request, and worst case it could produce wrong answers. * Add an MCA parameter that, by default, protects the user from this scenario. If the user really wants to use it then they have to 'opt-in' by setting the following parameter to false: - `-mca coll_libnbc_ibcast_skip_dt_decision f` * Once the following Issues are resolved then this parameter can be removed. - https://github.com/open-mpi/ompi/issues/2256 - https://github.com/open-mpi/ompi/issues/1763 Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2016-11-01 13:33:23 -05:00
Yossi Itigin	17c8f76411	pml_ucx: fix uninitialized field req_status->_cancelled. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2016-11-01 17:02:22 +02:00
Thananon Patinyasakdikul	ea2d38de14	romio: update REFRESH_NOTES to accommodate the random() patch. From patch: open-mpi/ompi@23b27c510c Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2016-10-31 16:08:08 -04:00
Joshua Ladd	d27b680de2	Merge pull request #2305 from vspetrov/hcoll_fortran_pair_types coll/hcoll fortran pair types	2016-10-28 12:05:00 -04:00
Gilles Gouaillardet	efac15e9a1	ompi: use opal_setenv instead of putenv this fixes a memory leak at finalize Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-10-28 09:32:30 +09:00
Gilles Gouaillardet	981dccab8d	ompi: cleanup environment at finalize Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-10-28 09:32:30 +09:00
Gilles Gouaillardet	af67183e2f	pml/v: fix a memory leak close the framework if no more component should be used Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-10-28 09:32:30 +09:00
Jeff Squyres	e9aab634af	Merge pull request #2294 from ggouaillardet/topic/fortran_use_mpi_tkr fortran/use-mpi-tkr misc fixes	2016-10-27 13:09:06 -04:00
Valentin Petrov	2b7e362e56	coll/hcoll fortran pair types Adds mapping of the MPI Fortran pair types (2INTEGER, 2REAL, 2DBLPREC) to the corresponding hcoll dtypes. Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2016-10-27 18:24:07 +03:00
Gilles Gouaillardet	ad7f3f93b0	mpi: support MPI_Dims_create(..., ndims=0, ...) this is a bozo case, but it should not fail with MPI_ERR_DIMS Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-10-27 14:13:51 +09:00
Gilles Gouaillardet	bf789762a7	fortran/use-mpi-tkr: fix MPI_Sizeof handling MPI_Sizeof related stuff has been moved to their own files. Remove MPI_Sizeof from Fortran interfaces when it cannot be built (e.g. stock gcc 4.8 on CentOS 7) Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-10-27 08:54:28 +09:00
Gilles Gouaillardet	1a16e68c26	fortran/use-mpi-trk: add PMPI_* interfaces in mpi module Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-10-27 08:54:07 +09:00
Gilles Gouaillardet	8e26e78728	fortran/use-mpi-tkr: only build MPI_File support if requested Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-10-27 08:54:07 +09:00
Gilles Gouaillardet	5543b19e9a	fortran/use-mpi-tkr: rename mpi-f90-cptr-interfaces.F90 into mpi-f90-cptr-interfaces.h this file is meant to be included and not compiled, so use a consistent naming Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-10-27 08:54:07 +09:00
Edgar Gabriel	2076622924	Merge pull request #2238 from edgargabriel/pr/delete-error-codes update the error codes reported by file_delete	2016-10-25 12:38:03 -05:00
George Bosilca	028e747470	Do not alter ompi_coll_tuned_use_dynamic_rules. This is set globally as an MCA parameter and should be never altered based on a single communicator setting.	2016-10-25 12:17:25 -04:00
George Bosilca	253eb80e26	Code cleaning of the tuned module.	2016-10-25 12:17:25 -04:00
Edgar Gabriel	74441b960b	update the error codes reported by file_delete	2016-10-25 10:15:14 -05:00
Gilles Gouaillardet	8e788b5aee	pml/ob1: refactor append_recv_req_to_queue() to improve readability and fix a typo in a comment Thanks George for the patch	2016-10-25 10:50:40 +09:00
Gilles Gouaillardet	4a886ac4cc	pml/ob1: correctly reset receive request type before init recvreq->req_recv.req_base.req_type should always be set before invoking MCA_PML_OB1_RECV_REQUEST_INIT(recvreq, ...) otherwise, the previous type might be set, and you could end up with MPC_PML_REQUEST_IMPROBE when MCA_PML_REQUEST_RECV is expected. Thanks Chris Pattison for the report and test case. Fixes open-mpi/ompi#2275	2016-10-24 16:50:23 +09:00
Gilles Gouaillardet	055df6f7c6	fortran: correctly defines MPI_DISPLACEMENT_CURRENT with KIND=MPI_OFFSET_KIND and remove unused ompi/include/mpif-mpi-io.h	2016-10-24 09:53:53 +09:00
Gilles Gouaillardet	e2769e4343	fortran/use-mpi-ignore-tkr: fix typo in MPI_File_write_at_all_begin interface	2016-10-24 09:53:52 +09:00
Gilles Gouaillardet	e02dc1e637	fortran/use-mpi-ignore-trk: fix typo in MPI_File_read_ordered_begin interface	2016-10-24 09:53:52 +09:00
Gilles Gouaillardet	6714f6aee7	coll/libnbc: fix MPI_Ialltoallv with MPI_IN_PLACE and without MPI param check	2016-10-24 09:29:06 +09:00
Gilles Gouaillardet	98f62690f1	man: fix typos in MPI_Info_get_{nkeys,nthkey} Thanks Nicolas Joly for the patch	2016-10-24 09:04:29 +09:00
Josh Hursey	d1ecc83e14	Merge pull request #2245 from jjhursey/topic/libnbc-error-path coll/libnbc: Fix error path on internal error	2016-10-21 13:27:17 -05:00
Joshua Hursey	8748e54c11	coll/libnbc: Fix error path on internal error * If an error is detected internal to libnbc (e.g., PML truncation error) this patch makes sure that the request is completed and the `MPI_ERROR` field is set approprately. * Make an attempt to cleanup outstanding requests before returning. - This is a "best attempt" since not all PMLs support canceling requests.	2016-10-21 11:41:08 -04:00
Gilles Gouaillardet	45336d0bea	libnbc: fix iallgather[v] In order to optimize for MPI_IN_PLACE, data is sent from the receive buffer. consequently, it should be sent with the receive type and count. Thanks Josh Hursey for the report and test case Refs open-mpi/ompi#2256	2016-10-21 10:24:25 +09:00
Gilles Gouaillardet	e78fcc4db9	coll/base: fix ompi_coll_base_{gather,scatter}_intra_binomial receive type is only relevant for root with gather, send type is only relevant for root with scatter, so do not access these types on a non root task	2016-10-19 14:05:22 +09:00
Gilles Gouaillardet	cb76d93b4e	ompi_wrapper_script: fix $extra_ldflags use @OMPI_PKG_CONFIG_LDFLAGS@ instead of @OMPI_WRAPPER_EXTRA_LDFLAGS@ so @{libdir} is substitued with ${libdir} Thanks Manesh Nanavalla for the report	2016-10-19 09:57:55 +09:00
Ralph Castain	7910aa23eb	Set lazy_wait_in_init "on" by default for test in master	2016-10-18 08:47:04 -07:00
Gilles Gouaillardet	1e3191115b	Merge pull request #2172 from ggouaillardet/topic/ialltoall_in_place support MPI_IN_PLACE in MPI_Ialltoall*	2016-10-17 17:00:47 +09:00
Gilles Gouaillardet	c530b0a07c	mpi/cxx: remove duplicate and now useless typedef	2016-10-15 14:30:00 +09:00
Joshua Ladd	64a15188bd	Merge pull request #2199 from vspetrov/coll_hcoll_ialltoallv coll/hcoll: ialltoallv interface	2016-10-14 07:59:23 -06:00
Gilles Gouaillardet	9389de4199	topo/treematch: fix displacements in mca_topo_treematch_dist_graph_create()	2016-10-14 17:16:49 +09:00
Joshua Ladd	b661307e6f	Merge pull request #2218 from yosefe/topic/ucx-pml-spml-update ucx: adapt pml_ucx and spml_ucx to new UCX APIs	2016-10-13 09:23:37 -04:00
Gilles Gouaillardet	958e29f929	osc/rdma: silence a warning declare a local variable volatile and silence CID 1372692	2016-10-13 16:10:07 +09:00
Yossi Itigin	05ca466c6b	ucx: adapt pml_ucx and spml_ucx to new UCX APIs - pass field_mask to ucp_init(). - use non-blocking disconnect. - recv() with pre-allocated request. - call opal_progress() from iprobe() and improbe(). - use shift pattern in connect/disconnect.	2016-10-12 23:45:45 +03:00
Nathan Hjelm	1b01b6db4f	Merge pull request #2213 from hjelmn/osc_rdma osc/rdma: fix warnings	2016-10-12 13:13:27 -06:00
Nathan Hjelm	ab0a005c95	mpi/cxx: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-10-12 11:30:36 -06:00
Nathan Hjelm	e8ef503bee	osc/rdma: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-10-12 10:17:25 -06:00
Gilles Gouaillardet	974dd64293	ompi/errhandler: remove useless include file and avoid #includ'ing stuff that breaks C++ compilers	2016-10-12 14:49:31 +09:00
Ralph Castain	a2919174d0	Bring the RML modifications across. This is the first step in a revamp of the ORTE messaging subsystem to support fabric-based communications during launch and wireup phases. When completed, the grpcomm and plm frameworks will each have their own "conduit" for communication - each conduit corresponds to a particular RML messaging transport. This can be the active OOB-based component, or a provider from within the RML/OFI component. Messages sent down the conduit will flow across the associated transport. Multiple conduits can exist at the same time, and can even point to the same base transport. Each conduit can have its own characteristics (e.g., flow control) based on the info keys provided to the "open_conduit" call. For ease during the transition period, the "legacy" RML interfaces remain as wrappers over the new conduit-based APIs using a default conduit opened during orte_init - this default conduit is tied to the OOB framework so that current behaviors are preserved. Once the transition has been completed, a one-time cleanup will be done to update all RML calls to the new APIs and the "legacy" interfaces will be deleted. While we are at it: Remove oob/usock component to eliminate the TMPDIR length problem - get all working, including oob_stress	2016-10-11 16:01:02 -07:00
Nathan Hjelm	432d79046b	Merge pull request #2197 from tkordenbrock/topic/master/osc-rdma.put.use.true_extent osc-rdma: fix datatype lower bound errors in ompi_osc_rdma_master()	2016-10-11 10:42:02 -06:00
Valentin Petrov	9747a9ea9b	coll/hcoll: ialltoallv interface	2016-10-10 15:09:07 +03:00
Todd Kordenbrock	05f86b5df7	osc-rdma: fix datatype lower bound errors in ompi_osc_rdma_master() Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent() to get the local and remote lower bound. For derived types like subarray, true_lb is the correct offset for RDMA operations.	2016-10-10 06:45:28 -05:00
Todd Kordenbrock	cc863ff9fb	osc-portals4: fix datatype errors in put() Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent() to get the origin and target lower bound. For derived types like subarray, true_lb is the correct offset for RDMA operations. Also, instead of the extent use the size of the datatype.	2016-10-10 06:45:14 -05:00
Gilles Gouaillardet	7cae36f5ab	ompi: accept MPI_IN_PLACE in MPI_Ialltoall*	2016-10-08 19:47:25 +09:00
Gilles Gouaillardet	1e0f591811	coll/libnbc: implement support for MPI_IN_PLACE in MPI_Ialltoall* Thanks Chris Ward for the report Many thanks to George for the guidance	2016-10-08 19:44:01 +09:00
Gilles Gouaillardet	315a622723	ompi: invokes opal_cleanup() if ompi_mpi_finalize() when possible As long as it is illegal to call MPI_T_init_thread() after MPI_Finalize(), be gentle and release as much memory as possible in MPI_Finalize(). opal_cleanup() will be invoked again by the OPAL destructor, but will do nothing since classes was set to NULL	2016-10-08 16:58:20 +09:00
Gilles Gouaillardet	1ef2ad029f	fs: do not build the fs components configured with --disable-io-ompio	2016-10-07 13:15:04 +09:00
Gilles Gouaillardet	6c6e35bb40	ompi/communicator: silence warnings	2016-10-06 15:03:06 +09:00
Gilles Gouaillardet	b95e243f83	ompi/errhandler: silence warnings ISO C forbids mixing object pointer and function pointer	2016-10-06 13:20:51 +09:00
Gilles Gouaillardet	95e63d7803	cxx bindings: fix support for --disable-mpi-io configure option Fixes open-mpi/ompi#2179	2016-10-06 09:53:59 +09:00
Todd Kordenbrock	54c46ca14e	Merge pull request #2156 from tkordenbrock/topic/raccumulate.offset.fix osc-portals4: fix offset bug in raccumulate()	2016-10-04 20:11:54 -05:00
Nathan Hjelm	c6464cae37	mpi/cxx: isolate internal headers from C++ bindings This commit adds some glue code to support the C++ bindings and updates the bindings to use the new glue code. This protects our internal headers (which are C99) from C++. This is done as a quick workaround to compilation errors when the legacy C++ bindings are requested. Fixes open-mpi/ompi#2055 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-10-04 11:13:25 -06:00
Todd Kordenbrock	c536e11cf3	osc-portals4: fix offset bug in raccumulate() This commit fixes a bug where the remote offset was used as both the local and remote offset. Thanks to @PDeveze for the patch.	2016-10-04 09:09:17 -05:00
Joshua Hursey	fc3cf994db	build: Custom libmpi_FOO name fix for wrapper compilers * In open-mpi/ompi@f6f24a4f67 I missed updating the library references for the wrapper compilers. * Fixes the CXX wrapper compiler and CXX library is renamed as needed. * Fixes the Java wrapper compiler and the Java library is renamed as needed.	2016-09-30 16:40:56 -05:00
Joshua Hursey	f6f24a4f67	build: Custom libmpi(_FOO) name option in configure * Add a configure time option to rename libmpi(_FOO).* - `--with-libmpi-name=STRING` * This commit only impacts the installed libraries. Internal, temporary libraries have not been renamed to limit the scope of the patch to only what is needed. For example: ```shell shell$ ./configure --with-libmpi-name=wookie ... shell$ find . -name "libmpi" shell$ find . -name "libwookie" ./lib/libwookie.so.0.0.0 ./lib/libwookie.so.0 ./lib/libwookie.so ./lib/libwookie.la ./lib/libwookie_mpifh.so.0.0.0 ./lib/libwookie_mpifh.so.0 ./lib/libwookie_mpifh.so ./lib/libwookie_mpifh.la ./lib/libwookie_usempi.so.0.0.0 ./lib/libwookie_usempi.so.0 ./lib/libwookie_usempi.so ./lib/libwookie_usempi.la shell$ ```	2016-09-29 21:47:24 -05:00
Joshua Ladd	4b0b7fd18e	Merge pull request #2089 from artpol84/fix_pmix_barrier ompi/mpi_init: fix barrier	2016-09-27 09:43:25 -04:00
Gilles Gouaillardet	80beb30c58	coll/sync: plug a memory leak	2016-09-27 16:29:57 +09:00
Artem Polyakov	08618845a4	ompi/mpi_init: fix barrier Relax CPU usage pressure from the application processes when doing modex and barrier in ompi_mpi_init. We see significant latencies in SLURM/pmix plugin barrier progress because app processes are aggressively call opal_progress pushing away daemon process doing collective progress.	2016-09-27 07:28:52 +03:00
Gilles Gouaillardet	6b57b77ecb	configury: add the --disable-io-ompio option --disable-io-ompio is a shortcut that disable the following frameworks and components - fbtl - fcoll - sharedfp - common/ompio - io/ompio Fixes open-mpi/ompi#1934	2016-09-23 09:41:09 +09:00
Gilles Gouaillardet	505be0ebaf	Merge pull request #2018 from ggouaillardet/topic/disable_mpi_io configury: fix --disable-mpi-io for static builds	2016-09-21 23:30:14 +09:00
George Bosilca	803897a915	Correctly indent the code.	2016-09-21 07:46:53 -04:00
Gilles Gouaillardet	c3f4b7bd46	configury: fix --disable-mpi-io - move the mpi-io configury option into config/ompi_configure_options.m4 - add ompi/mca/common/ompio/configure.m4 so this component is not built when Open MPI is configure'd with --disable-mpi-io Fixes open-mpi/ompi#2009	2016-09-21 14:29:37 +09:00
Gilles Gouaillardet	eae9d31784	pre_condition_transports: code cleanup replace hard coded "OMPI_MCA_orte_precondition_transports" environment variable name with macro'ed OPAL_MCA_PREFIX"orte_precondition_transports"	2016-09-19 13:31:47 +09:00
George Bosilca	295eec7059	Small fix for persistence receives. A minor optimization, few typos and extra comments	2016-09-16 10:27:32 -04:00
Nathan Hjelm	56cd5e102d	Merge pull request #2082 from hjelmn/osc_rdma_fix osc/rdma: fix typo in compare-and-swap	2016-09-15 08:49:06 -06:00
KAWASHIMA Takahiro	e3b3e52fdd	man: Fix typos in man	2016-09-15 17:25:16 +09:00
Nathan Hjelm	59bae1a330	osc/rdma: fix typo in compare-and-swap This commit fixes a typo in compare-and-swap when retrieving the memory region associated with a displacement. It was erroneously 8 bytes instead of the datatype size. This can cause an incorrect RMA range error when the compare-and-swap is less than 4 bytes from the end of the region. Fixed open-mpi/ompi#2080 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-14 16:49:42 -06:00
Josh Hursey	4159fee0d6	Merge pull request #2073 from jjhursey/topic/ompitrace-version libompitrace: Use VERSION file to set .so version	2016-09-13 08:51:27 -05:00
Gilles Gouaillardet	628c730196	pkgconfig: define the pkgincludedir variable in *.pc files this has been made necesarry with open-mpi/ompi@12e796dcaf Refs open-mpi/ompi#2069	2016-09-13 09:50:14 +09:00
Joshua Hursey	b8dfd9a92b	libompitrace: Use VERSION file to set .so version	2016-09-12 18:02:06 -04:00
Gilles Gouaillardet	3b968ec6bb	ompi/communicator: fix typos in CID generation use MPI_MIN instead of MPI_MAX when appropriate, otherwise a currently used CID can be reused, and bad things will likely happen. Refs open-mpi/ompi#2061	2016-09-09 10:10:35 +09:00
Nathaniel Graham	745872e781	Merge pull request #2063 from nrgraham23/check_exceptionCheck Error handling improvements	2016-09-08 14:26:03 -06:00
Nathaniel Graham	5380427050	Error handling improvements This commit improves and corrects error handling. In cases where existing objects are altered after a call to ompi_java_exceptionCheck, the results of the exception check method are checked. In the case of an exception, memory is cleaned up and the code returns to Java without altering existing objects. Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>	2016-09-08 11:25:43 -06:00
Nathan Hjelm	7c8e7691a7	Merge pull request #2045 from hjelmn/osc_rdma_atomics osc/rdma: add support for network AMOs	2016-09-08 11:21:49 -06:00
Gilles Gouaillardet	d1e1ec51b6	ompio: correctly fix a memory plug as newly reported by Coverity with CID 1372660	2016-09-08 18:50:18 +09:00
Artem Polyakov	84e178ce94	Merge pull request #1821 from artpol84/fix_waitsome_v2 MPI_Waitsome performance improvement (version #2)	2016-09-08 13:55:37 +07:00
Nathan Hjelm	63d73a5dd0	Merge pull request #2061 from hjelmn/cid_inter comm/cid: use ibcast to distribute result in intercomm case	2016-09-07 16:36:00 -06:00
Jeff Squyres	fd829ac389	Merge pull request #1982 from jsquyres/pr/fix-pkg-config-static pkg-config: fix static linking	2016-09-07 14:55:50 -04:00
Nathan Hjelm	54cc829aab	comm/cid: use ibcast to distribute result in intercomm case This commit updates the intercomm allgather to do a local comm bcast as the final step. This should resolve a hang seen in intercomm tests. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-09-07 10:49:04 -06:00
Gilles Gouaillardet	213a981041	io/ompio: plug memory leaks as reported by Coverity with CIDs 1369022 and 1369023	2016-09-07 10:08:44 +09:00
Ralph Castain	7f3fac48ab	Fix typo on the COLL_SYNC macro	2016-09-06 12:43:07 -07:00
Todd Kordenbrock	a17dff281d	Merge pull request #1900 from PDeveze/mtl-portals4-short_msg-split_msg Mtl portals4 short msg split msg	2016-09-06 11:14:19 -05:00
Gilles Gouaillardet	91e1200c14	ompi/request: correctly handle zero count in ompi_request_default_wait_{all,any,some}	2016-09-05 17:19:30 +09:00
Nathan Hjelm	1ce5847e8b	osc/rdma: add support for network AMOs This commit adds support for using network AMOs for MPI_Accumulate, MPI_Fetch_and_op, and MPI_Compare_and_swap. This support is only enabled if the ompi_single_intrinsic info key is specified or the acc_single_interinsic MCA variable is set. This configuration indicates to this implementation that no long accumulates will be performed since these do not currently mix with the AMO implementation. This commit also cleans up the code somwhat. This includes removing unnecessary struct keywords where the type is also typedef'd. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-01 15:47:33 -06:00
Nathan Hjelm	cb1cb5ffed	osc/pt2pt: do not use frag send to send lock request This commit cleans up some code in the passive target path. The code used the buffered frag control send path but it is more appropriate to use the unbuffered one. This avoids checking structures that are should not be in use in this path. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-01 09:57:27 -06:00
Gilles Gouaillardet	0a25420dac	oshmem: get rid of oshmem_proc_t and use ompi_proc_t instead store oshmem related per proc data in an oshmem_proc_data_t struct, that is stored in the padding section of an ompi_proc_t this data can be accessed via the OSHMEM_PROC_DATA(proc) macro Fixes open-mpi/ompi#2023	2016-09-01 14:20:14 +09:00
Gilles Gouaillardet	75b7ef97a0	coll/libnbc: fix nbc_ireduce when sendbuf == recvbuf if sendbuf is equal to recvbuf, that should not be interpreted as equivalent to MPI_IN_PLACE on the non root rank(s) Thanks Valentin Petrov for the report	2016-09-01 10:19:05 +09:00
Gilles Gouaillardet	2969235324	libnbc: fix NBC_Copy for predefined datatypes predefined datatypes such as MPI_LONG_DOUBLE_INT are not really contiguous, so use span as returned by opal_datatype_span() instead of type extent, otherwise data might be written above allocated memory. Thanks Valentin Petrov for the report	2016-09-01 10:18:57 +09:00
Edgar Gabriel	be183cb3dd	io/ompio: fix the reference count of basic datatypes used as etypes or ftypes.	2016-08-31 14:08:26 -05:00
Nathan Hjelm	99b26644c1	Merge pull request #2011 from hjelmn/osc_pt2pt_fix osc/pt2pt: fix possible race in peer locking	2016-08-29 09:17:36 -06:00
Edgar Gabriel	b5c757e82c	Merge pull request #2014 from edgargabriel/topic/mt-io Topic/mt io	2016-08-26 08:54:45 -05:00
Edgar Gabriel	1ba03d38ec	io/ompio: protect remaining functions in multi-threaded scenarios protect the remaining functions where necessary by a mutex lock to avoid problems in multi-threaded executions. Some functions do not require that in my opinion, and I provided an explanation in those cases.	2016-08-25 13:45:51 -05:00
Nathan Hjelm	e53de7ecbe	osc/rdma: fix bug in dynamic memory window tracking code This commit fixes an ordering bug in the code that keeps track of all attached memory windows. The code is intended to keep the memory regions sorted but was often inserting at the wrong index. Thanks to Christoph Niethammer for reporting the issue. The reproducer will be added to nightly MTT testing. Fixes open-mpi/ompi#2012 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-25 12:08:46 -06:00
Nathan Hjelm	7af138f83b	osc/pt2pt: fix possible race in peer locking It is possible for another thread to process a lock ack before the peer is set as locked. In this case either setting the locked or the eager active flag might clobber the other thread. To address this the flags have been made volatile and are set atomically. Since there is no a opal_atomic_or or opal_atomic_and function just use cmpset for now. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-25 09:28:25 -06:00
Nathan Hjelm	c082068953	Merge pull request #2006 from hjelmn/osc_pt2pt_fix osc/pt2pt: fix several bugs	2016-08-25 09:19:29 -06:00
Edgar Gabriel	1cee83cc1b	use the common/ interfaces in file_preallocate instead of the io_ompio_ interfaces. Necessar for avoiding potential deadlock situations in multi-threaded scenarios.	2016-08-25 08:55:12 -05:00
Nathan Hjelm	70f8a6e792	osc/pt2pt: fix several bugs This commit fixes some bugs uncovered during thread testing of 2.0.1rc1. With these fixes the component is running cleanly with threads. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-24 14:35:45 -06:00
Ralph Castain	bcf5ac3971	Set the default value of both barrier counters to zero, thus ensuring the coll/sync component is off by default	2016-08-24 07:51:32 -07:00
Ralph Castain	22844b0dc6	Balance priorities to ensure something is below sync	2016-08-23 17:33:45 -07:00
Ralph Castain	540f23c4dd	Adjust priority of coll/sync downwards	2016-08-23 17:12:48 -07:00
Edgar Gabriel	41ed4a28d2	add the protective lock around read and write operations in ompio	2016-08-23 11:07:58 -05:00
Howard Pritchard	696121cc4a	Merge pull request #1988 from hppritcha/topic/another_ofi_fix mtl/ofi: fix a botched assignment of av_type	2016-08-22 17:59:59 -06:00
Ralph Castain	6549c878a9	Silence the warnings	2016-08-22 15:35:27 -07:00
Ralph Castain	871bedb103	Add missing "const" qualifiers	2016-08-22 12:54:24 -07:00
Edgar Gabriel	a76f4d7c69	Merge pull request #1990 from edgargabriel/topic/mt-io steps towards making file I/O operations thread safe	2016-08-22 08:19:33 -05:00
Joshua Ladd	deae1ab375	Merge pull request #1985 from vspetrov/master coll/hcoll: Fixes predifined types mapping	2016-08-22 09:18:59 -04:00
Edgar Gabriel	c3d4ee3f73	ompi/file: add a muteex to the ompi_file_t structure Adding a mutex to thje ompi_file_t structure allows to have a per-file handle mutex lock for both ROMIO and OMPIO. I double checked that the size of the ompi_file_t structure is still below the size of the predefined_file_t structure, so we should be good from the backward compatibility perspective.	2016-08-21 16:09:12 -05:00
Edgar Gabriel	bc042259bc	make initialization of the io framework thread safe. Also, remove the lock/unlock in the file_open ompi-interface routines of romio314. The global lock in the romio component does probably not work, it is easy to construct a testcase where two threads perform collective I/O operations on different file handles. With a global lock it is easy to deadlock. THe lock has to be at least on the file handle basis. move the mutex to file/file.c to avoid duplicate symbol problem in file_open.c pfile_open.c	2016-08-21 16:09:00 -05:00
George Bosilca	b96ec77e40	This variable belongs to the tuned modules and not to base.	2016-08-20 15:37:55 -04:00
George Bosilca	e8425eb1f5	Rename an OMPI internal variable (ticket #1955 ).	2016-08-20 15:37:55 -04:00
rhc54	102d3afe2c	Merge pull request #1992 from rhc54/topic/sync Restore the coll/sync module and provide a test to verify its operation	2016-08-20 13:33:28 -05:00
George Bosilca	fd57f5bccd	Remove some of the clang warnings.	2016-08-20 14:21:42 -04:00
Ralph Castain	9888615e75	Restore the coll/sync module and provide a test to verify its operation	2016-08-20 10:14:52 -07:00
Howard Pritchard	61d62b6821	mtl/ofi: fix a botched assignment of av_type Well now the av_type is being assigned correctly Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-19 17:01:02 -05:00
Nathan Hjelm	f3e9a72f1a	Merge pull request #1987 from hjelmn/cid comm/cid: fix threaded CID allocation	2016-08-19 14:26:39 -06:00
Nathan Hjelm	fbbf743c36	comm/cid: fix threaded CID allocation This commit should restore the pre-non-blocking behavior of the CID allocator when threads are used. There are two primary changes: 1) do not hold the cid allocator lock past the end of a request callback, and 2) if a lower id communicator is detected during CID allocation back off and let the lower id communicator finish before continuing. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-19 11:47:19 -06:00
Valentin Petrov	9790373fc6	coll/hcoll: Fixes predifined types mapping	2016-08-19 11:19:12 +03:00
Nathan Hjelm	e5c7512692	Merge pull request #1983 from hjelmn/request_cb ompi/request: change semantics of ompi request callbacks	2016-08-18 08:31:56 -06:00
Nathan Hjelm	6aa658ae33	ompi/request: change semantics of ompi request callbacks This commit changes the sematics of ompi request callbacks. If a request's callback has freed or re-posted (using start) a request the callback must return 1 instead of OMPI_SUCCESS. This indicates to ompi_request_complete that the request should not be modified further. This fixes a race condition in osc/pt2pt that could lead to the req_state being inconsistent if a request is freed between the callback and setting the request as complete. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-17 20:14:01 -06:00
Jeff Squyres	fb894e6e3e	pkg-config: fix static linking We need to list all major project libraries in the private libraries line to enable static linking to work properly. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-17 20:37:51 -05:00
Sylvain Jeaugey	61e900eea5	Fix typo calling allreduce with the allgather module. That was causing CUDA collective to crash.	2016-08-17 17:05:13 -07:00
Edgar Gabriel	e14c23ba79	Merge pull request #1980 from edgargabriel/topic/coverty-cleanup io/ompio: Topic/coverty cleanup	2016-08-17 17:27:51 -05:00
Edgar Gabriel	2c8437ce62	fs/pvfs2: fix a common symbol	2016-08-17 13:10:32 -05:00
Edgar Gabriel	eba5293586	fix coverty warning CID 1369021	2016-08-17 13:02:45 -05:00
Nathan Hjelm	40b70889e5	osc/pt2pt: make receive count an unsigned int This receive_count MCA variable should never be negative. Change it to an unsigned int. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-08-17 08:14:24 -06:00
Gilles Gouaillardet	8faa1edafa	osc/pt2pt: silence misc warnings	2016-08-17 14:24:14 +09:00
LANL OMPI Bot	96c7762050	Merge pull request #1942 from hppritcha/topic/minor_ofi_fix mtl/ofi: use mca param to set av type	2016-08-16 14:14:12 -06:00
Nathan Hjelm	9444df1eb7	osc/pt2pt: make lock_all locking on-demand The original lock_all algorithm in osc/pt2pt sent a lock message to each peer in the communicator even if the peer is never the target of an operation. Since this scales very poorly the implementation has been replaced by one that locks the remote peer on first communication after a call to MPI_Win_lock_all. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-11 15:33:07 -06:00
Nathan Hjelm	7589a25377	osc/pt2pt: do not repost receive from request callback This commit fixes an issue that can occur if a target gets overwhelmed with requests. This can cause osc/pt2pt to go into deep recursion with a stack like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb -> ... . At small scale this is fine as the recursion depth stays small but at larger scale we can quickly exhaust the stack processing frag requests. To fix the issue the request callback now simply puts the request on a list and returns. The osc/pt2pt progress function then handles the processing and reposting of the request. As part of this change osc/pt2pt can now post multiple fragment receive requests per window. This should help prevent a target from being overwhelmed. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-08-11 15:33:07 -06:00
George Bosilca	8d0baf140f	If the RTE fails to deliver the daemon information, gracefully fallback to a non-reordered communicator. Optimize the loops building the process hierarchy.	2016-08-11 13:04:27 -04:00
Howard Pritchard	e46eee3fcb	mtl/ofi: use mca param to set av type Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-10 16:10:17 -06:00
Gilles Gouaillardet	dfbf2b7be4	opal/threads: add OPAL_THREAD_SUB_SIZE_T macro -1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1), simply OPAL_THREAD_SUB_SIZE_T(..., 1) and keep picky compilers happy	2016-08-10 13:37:36 +09:00
Nathan Hjelm	799104f688	Merge pull request #1947 from hjelmn/perf pml/ob1: be more selective when using rdma capable btls	2016-08-09 22:15:09 -06:00
Nathan Hjelm	4079eec974	pml/ob1: be more selective when using rdma capable btls This commit updates the btl selection logic for the RDMA and RDMA pipeline protocols to use a btl iff: 1) the btl is also used for eager messages (high exclusivity), or 2) no other RDMA btl is available on an endpoint and the pml_ob1_use_all_rdma MCA variable is true. This fixes a performance regression with shared memory when an RDMA capable network is available. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-09 20:54:42 -06:00
Ralph Castain	ba77d9beff	Remove forced debugs	2016-08-08 13:20:24 -07:00
Nathan Hjelm	2788083b98	Merge pull request #1936 from hjelmn/osc_pt2pt_fix osc/pt2pt: do not set rdma_frag after start	2016-08-08 14:17:40 -06:00
Nathan Hjelm	e4d7ea75a9	Merge pull request #1935 from hjelmn/persistent_fix pml/ob1: reset req_bytes_packed on start	2016-08-08 14:17:13 -06:00
Todd Kordenbrock	3be6052523	Merge pull request #1896 from PDeveze/Patchs-on-coll-portals4 Patchs on coll portals4	2016-08-08 14:57:02 -05:00
Edgar Gabriel	fb9fa4fbc4	Merge pull request #1938 from edgargabriel/pr/barrier-on-close io/ompio: Add barrier to file_close and to file_set_size	2016-08-08 09:22:08 -05:00
Edgar Gabriel	4709f4229b	Merge pull request #1929 from edgargabriel/pr/ompio-code-reorg io/ompio: next step in code-reorganization	2016-08-08 09:20:54 -05:00
Thananon Patinyasakdikul	23b27c510c	romio: make romio use internal opal_random instead of rand(3). This fixes issue #1877	2016-08-05 09:04:52 -07:00
Howard Pritchard	ff669e7b15	code cleanup: clang is now a happier panda Clang 5.1 on my mac was a sad panda compiling a couple of files, complaining about uninitialized stack variables. This commit makes clang a happier panda (or at least not so sad). Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-04 19:34:44 -06:00
Edgar Gabriel	9c3180160c	io/ompio: Add barrier to file_close and to file_set_size This fixes a bug reported on the mailing for ompio. https://www.open-mpi.org/community/lists/users/2016/05/29333.php	2016-08-04 11:20:31 -05:00
Jeff Squyres	c2634612d7	Merge pull request #1870 from jsquyres/pr/fix-unweighted-and-weights-empty mpi.h: fix types of MPI_UNWEIGHTED and MPI_WEIGHTS_EMPTY	2016-08-04 08:49:58 -07:00
Gilles Gouaillardet	60e91e890a	coll/base: give a boost to ompi_coll_base_sendrecv_nonzero_actual() Based on current implementation it is faster to use a blocking send than the non-blocking version. Switch the exchange function used in the barrier to use the blocking version combined with the non-blocking version of the receive. This is similar to open-mpi/ompi@223d75595d	2016-08-04 13:31:07 +09:00
Nathan Hjelm	11c853d05e	osc/pt2pt: do not set rdma_frag after start It is possible for the start call to complete the requests. For this reason the module rdma_frag field should be filled in before start is called. If the request completes the completion callback will reset the rdma_frag field to NULL. Fixes a bug discovered by @tkordenbrock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-03 15:20:36 -06:00
Nathan Hjelm	889dd32806	pml/ob1: reset req_bytes_packed on start On start we were not correctly resetting all request fields. This was leading to a double-completion on persistent receives. This commit updates the base start code to reset the receive req_bytes_packed and the send request convertor. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-03 11:29:30 -06:00
KAWASHIMA Takahiro	642acfbb8c	Merge pull request #1927 from kawashima-fj/pr/fortran-named-constants fortran: Correct named constants and a datatype name	2016-08-03 08:40:36 +09:00
Edgar Gabriel	aa7e852e44	common/ompio: files are only compiled in case MPI I/O is requested fixes: open-mpi/ompi#1932	2016-08-02 15:01:38 -05:00
George Bosilca	087761c2dc	Fix a warning and other small cleanups.	2016-08-02 17:33:53 +02:00
Edgar Gabriel	19fe5cac50	io/ompio: next step in code-reorganization - move the sort_iovec operations to fcoll/base - move set_view_internal to common/ompio - move set_file_default to common/ompio - remove io_ompio_sort, not used anymore.	2016-08-02 09:18:29 -05:00
KAWASHIMA Takahiro	722e898e8e	fortran: Correct the name of `MPI_INTEGER16`. The name of `MPI_INTEGER16` obtained using `MPI_TYPE_GET_NAME` from Fortran program was incorrect (`MPI_INTEGER8` was obtained) when `INTEGER*16` is not supported by a compiler. This bug affects only the Fortran binding because `MPI_INTEGER16` is not defined in `mpi.h` if a compiler does not support it.	2016-08-02 22:37:58 +09:00
KAWASHIMA Takahiro	5383003eab	fortran: Add missing predefined datatype named constants. This commit add the following Fortran named constants which are defined in the MPI standard but are missing in Open MPI. - `MPI_LONG_LONG` (defined as a synonym of `MPI_LONG_LONG_INT`) - `MPI_CXX_FLOAT_COMPLEX` - `MPI_C_BOOL` And this commit also changes the value of the following Fortran named constant for consistency. - `MPI_C_COMPLEX` `(MPI_C_FLOAT_COMPLEX` is defined as a synonym of this) Each needs a different solution described below. For `MPI_LONG_LONG`: The value of `MPI_LONG_LONG` is defined to have a same value as `MPI_LONG_LONG_INT` because of the following reasons. 1. It is defined as a synonym of `MPI_LONG_LONG_INT` in the MPI standard. 2. `MPI_LONG_LONG_INT` and `MPI_LONG_LONG` has a same value for C in `mpi.h`. 3. `ompi_mpi_long_long` is not defined in `ompi/datatype/ompi_datatype_module.c`. For `MPI_CXX_FLOAT_COMPLEX`: Existing `MPI_CXX_COMPLEX` is replaced with `MPI_CXX_FLOAT_COMPLEX` bacause `MPI_CXX_FLOAT_COMPLEX` is the right name defined in MPI-3.1 and `MPI_CXX_COMPLEX` is not defined in MPI-3.1 (nor older). But for compatibility, `MPI_CXX_COMPLEX` is treated as a synonym of `MPI_CXX_FLOAT_COMPLEX` on Open MPI. For `MPI_C_BOOL`: `MPI_C_BOOL` is newly added. The value which `MPI_C_COMPLEX` had used (68) is assinged for it because the value becomes no longer in use (described later) and it is a suited position as a datatype added on MPI-2.2. For `MPI_C_COMPLEX`: Existing `MPI_C_FLOAT_COMPLEX` is replaced with `MPI_C_COMPLEX` and `MPI_C_FLOAT_COMPLEX` is changed to have the same value. In other words, make `MPI_C_COMPLEX` the canonical name and make `MPI_C_FLOAT_COMPLEX` an alias of it. This is bacause the relation of these datatypes is same as the relation of `MPI_LONG_LONG_INT` and `MPI_LONG_LONG`, and `MPI_LONG_LONG_INT` and `MPI_LONG_LONG` are implemented like that. But in the datatype engine, we use `ompi_mpi_c_float_complex` instead of `ompi_mpi_c_complex` as a variable name to keep the consistency with the other similar types such as `ompi_mpi_c_double_complex` (see George's comment in open-mpi/ompi#1927). We don't delete `ompi_mpi_c_complex` now because it is used in some other places in Open MPI code. It may be cleand up in the future. In addition, `MPI_CXX_COMPLEX`, which was defined only in the Open MPI Fortran binding, is added to `mpi.h` for the C binding. This commit breaks binary compatibility of Fortran `MPI_C_COMPLEX`. When this commit is merged into v2.x branch, the change of `MPI_C_COMPLEX` should be excluded.	2016-08-02 22:36:41 +09:00
Gilles Gouaillardet	917d96ba50	coll/libnbc: cleanup handling of the second temporary buffer in ireduce	2016-08-02 16:32:15 +09:00
Gilles Gouaillardet	ed9139ca13	coll/libnbc: correctly handle datatype alignment when allocating two buffers at once	2016-08-02 15:44:12 +09:00
KAWASHIMA Takahiro	0cb5dfe18d	fortran: Correct predefined datatype named constants.	2016-08-02 13:12:22 +09:00
KAWASHIMA Takahiro	ad3b590172	fortran: Add missing `MPI_NO_OP` and `MPI_WIN_*` named constants.	2016-08-02 13:10:44 +09:00
Edgar Gabriel	c0bd8728fd	io/ompio: move aggregator selection code to a separate file - move all functions related to aggregator selection to a single file - perform code cleanup fixing many Coverty complains along the way.	2016-08-01 14:04:27 -05:00
Jeff Squyres	50952c3a31	Merge pull request #1912 from rivis/pr/mpisync mpisync: Fix a compilation error.	2016-07-29 14:53:44 -04:00
Edgar Gabriel	160d9a78c1	Merge pull request #1886 from edgargabriel/pr/ompio-reorg io/ompio: move io/ompio functionality to common/ompio	2016-07-29 12:24:21 -05:00
Joshua Ladd	4a03a657c6	Merge pull request #1913 from vspetrov/hcoll_derived_datatypes coll/hcoll mpi datatypes support	2016-07-29 10:08:23 -04:00
Nathan Hjelm	1da558407c	Merge pull request #1911 from hjelmn/threads opal/thread: clean up and add additional OPAL_THREAD macros	2016-07-29 06:44:11 -06:00
Valentin Petrov	3582bba6b7	coll/hcoll mpi datatypes support	2016-07-29 10:06:39 +03:00
Gilles Gouaillardet	9f3e1a0620	Merge pull request #1898 from ggouaillardet/topic/poc_configury_cli configury: capture configury command line	2016-07-29 11:55:21 +09:00
Howard Pritchard	5ff6b81eee	Merge pull request #1871 from hppritcha/topic/ofi_mtl_params mtl/ofi: add some more mca parameters	2016-07-28 18:21:23 -06:00
Gilles Gouaillardet	273e56096b	configury: capture configury command line configury command line is quoted and made available via the OPAL_CONFIGURE_CLI macro. it can be retrieved via {orte-info,ompi_info,oshmem_info} -c, or {orte-info,ompi_info,oshmem_info} --all --parseable \| grep ^config:cli:	2016-07-29 09:14:09 +09:00
Ralph Castain	cacb582ecd	Support timeout values when performing connect/accept operations. Bump default timeout to 10 minutes so folks have time to start the partnering application	2016-07-28 14:09:06 -07:00
KAWASHIMA Takahiro	2a932f48ad	mpisync: Fix a compilation error. This commit fixes the undefined `OPAL_MAXHOSTNAMELEN` error which arises only when `--enable-timing` is specified for `configure`. This bug exists only in master branch because the commit `3322347` is not merged into other branches.	2016-07-29 02:38:25 +09:00
Nathan Hjelm	aac611237b	opal/thread: clean up and add additional OPAL_THREAD macros This commit expands the OPAL_THREAD macros to include 32- and 64-bit atomic swap. Additionally, macro declararations have been updated to include both OPAL_THREAD_* and OPAL_ATOMIC_*. Before this commit the former was used with add and the later with cmpset. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-07-28 09:23:14 -06:00
Howard Pritchard	22c8743557	mtl/ofi: add some more mca parameters allow for toggling of both control/data progress models. allow for using FI_AV_TABLE or FI_AV_MAP for av type. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-07-28 02:35:09 -06:00
Gilles Gouaillardet	a0a999e63d	coll/base: fix ompi_coll_base_allgatherv_intra_basic_default() with MPI_IN_PLACE	2016-07-28 13:57:18 +09:00
Gilles Gouaillardet	b8a1ffb87e	coll/base: fix ompi_coll_base_allgatherv_intra_basic_default() Fixes open-mpi/ompi#1907	2016-07-28 13:50:04 +09:00
Jeff Squyres	2e0c3c7d77	libompitrace: explicitly set the .so version Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-07-27 07:05:58 -04:00
Pascal Deveze	10763f5abc	mtl/portals4: Take into account the limitation of portals4 (max_msg_size) and split messages if necessary	2016-07-26 08:44:07 +02:00

... 5 6 7 8 9 ...

9753 Коммитов