According to MPI-3.1 P.122, `ni` for `MPI_COMBINER_DARRAY`
should be `4*ndims+4`, not `4*size+4`.
This bug may cause SEGV if `size` is smaller than `ndims`
when the darray is used for one-sided communication (pt2pt OSC).
This bug was introduced in open-mpi/ompi@79b13f36 (when darray
became a first-class citizen and the `a_i` index of darray was
shifted by 2). The corresponding `MPI_Type_create_darray()`
function sets the right value, so the function itself does not need
to be updated.
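As an illustration of the expected envelope values (a hedged sketch, not taken from the Open MPI test suite), querying a darray type should report `4*ndims+4` integers regardless of `size`:
```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 2-D block-distributed darray; ndims = 2, one row of the grid per rank */
    int gsizes[2]   = {size, 4};
    int distribs[2] = {MPI_DISTRIBUTE_BLOCK, MPI_DISTRIBUTE_BLOCK};
    int dargs[2]    = {MPI_DISTRIBUTE_DFLT_DARG, MPI_DISTRIBUTE_DFLT_DARG};
    int psizes[2]   = {size, 1};

    MPI_Datatype darray;
    MPI_Type_create_darray(size, rank, 2, gsizes, distribs, dargs, psizes,
                           MPI_ORDER_C, MPI_INT, &darray);
    MPI_Type_commit(&darray);

    int ni, na, nd, combiner;
    MPI_Type_get_envelope(darray, &ni, &na, &nd, &combiner);
    /* MPI-3.1 requires ni == 4*ndims + 4 == 12 here, independent of size */
    printf("ni = %d (expected %d)\n", ni, 4 * 2 + 4);

    MPI_Type_free(&darray);
    MPI_Finalize();
    return 0;
}
```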
Add PMIx 2.0
Remove PMIx 1.1.4
Cleanup copying of component
Add missing file
Touch up a typo in the Makefile.am
Update the pmix ext114 component
Minor cleanups and resync to master
Update to latest PMIx 2.x
Update to the latest changes on the PMIx event notification branch
Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some
subsequent phone calls and off-issue email discussions), the PSM
library is hijacking signal handlers by default. Specifically: unless
the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel
TrueScale) is set, the library constructor for this library will
hijack various signal handlers for the purpose of invoking its own
error reporting mechanisms.
This may be a bit *surprising*, but is not a *problem*, per se. The
real problem is that older versions of at least the PSM library do not
unregister these signal handlers upon being unloaded from memory.
Hence, a segv can actually result in a double segv (i.e., the original
segv and then another segv when the now-non-existent signal handler is
invoked).
This PSM signal hijacking subverts Open MPI's own signal reporting
mechanism, which may be a bit surprising for some users (particularly
those who do not have Intel TrueScale). As such, we disable it by
default so that Open MPI's own error-reporting mechanisms are used.
Additionally, there is a typo in the library destructor for the PSM2
library that may cause problems in the unloading of its signal
handlers. This problem can be avoided by setting `HFI_NO_BACKTRACE=1`
(for PSM2 / Intel OmniPath).
This is further compounded by the fact that the PSM / PSM2 libraries
can be loaded by the OFI MTL and the usNIC BTL (because they are
loaded by libfabric), even when there is no Intel networking hardware
present. Having the PSM/PSM2 libraries behave this way when no Intel
hardware is present is clearly undesirable (and is likely to be fixed
in future releases of the PSM/PSM2 libraries).
This commit sets the following two environment variables to disable
this behavior from the PSM/PSM2 libraries (if they are not already
set):
* IPATH_NO_BACKTRACE=1
* HFI_NO_BACKTRACE=1
If the user has set these variables before invoking Open MPI, we will
not override their values (i.e., their preferences will be honored).
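A minimal sketch of that approach (hypothetical code, not the actual Open MPI source): `setenv()` with `overwrite = 0` leaves any value the user already set untouched:
```c
#include <stdlib.h>

/* Disable PSM/PSM2 signal hijacking unless the user already chose otherwise.
 * The third argument (overwrite = 0) means an existing value is preserved. */
static void disable_psm_backtrace_hijack(void)
{
    setenv("IPATH_NO_BACKTRACE", "1", 0);  /* PSM / Intel TrueScale */
    setenv("HFI_NO_BACKTRACE", "1", 0);    /* PSM2 / Intel OmniPath */
}
```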
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
According to MPI-3.1 P.121, `ni` for `MPI_COMBINER_HINDEXED_BLOCK`
should be `2`, not `2 + count`.
This bug was introduced in 113b45b4 (when `MPI_Type_create_hindexed_block`
support was added to Open MPI) and partially fixed in 7f5314ee and 8de93982.
This commit fixes the remaining part.
Probably this bug has no user impact. It only consumes a bit more memory.
This commit fixes a bug in waitany that causes the code to go past the
beginning of the request array. The loop conditional i >= 0 is invalid
since i is unsigned. Changed the loop to check (i+1) > 0.
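A small sketch of the reverse-iteration pattern (illustrative only): with an unsigned index, `i >= 0` is always true, so the loop instead checks `(i + 1) > 0`, which becomes false once `i` wraps around after reaching 0:
```c
#include <stddef.h>

/* Walk an array of requests from the last entry down to index 0. */
void visit_backwards(void **reqs, size_t count)
{
    /* `i >= 0` would never be false for an unsigned type; (i + 1) > 0
     * is false exactly once i has wrapped from 0 to the maximum value. */
    for (size_t i = count - 1; (i + 1) > 0; i--) {
        void *req = reqs[i];
        (void) req;  /* ... inspect or complete the request ... */
    }
}
```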
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes two bugs in pml/ob1:
- Do not call MCA_PML_OB1_PROGRESS_PENDING from
mca_pml_ob1_send_request_start_copy as this may lead to a recursive
call to mca_pml_ob1_send_request_process_pending.
- In mca_pml_ob1_send_request_start_rdma return the rdma frag object
if a btl fragment can not be allocated. This fixes a leak
identified by @abouteiller and @bosilca.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The NAG compiler uses gcc (and not ld) as the linker, so in order to pass an option to the linker,
the flag is -Wl,-Wl,,<option> and not -Wl,<option>.
Thanks to Paul Hargrove for the report.
* mpi/start: fix bugs in cm and ob1 start functions
There were several problems with the implementation of start in Open
MPI:
- There are no checks whatsoever on the state of the request(s)
provided to MPI_Start/MPI_Start_all. It is erroneous to provide an
active request to either of these calls. Since we are already
looping over the provided requests there is little overhead in
verifying that the request can be started.
- Both ob1 and cm were always throwing away the request on the
initial call to start and start_all with a particular
request. Subsequent calls would see that the request was
pml_complete and reuse it. This introduced a leak as the initial
request was never freed. Since the only pml request that can
be mpi complete but not pml complete is a buffered send the
code to reallocate the request has been moved. To detect that
a request is indeed mpi complete but not pml complete isend_init
in both cm and ob1 now marks the new request as pml complete.
- If a new request was needed the callbacks on the original request
were not copied over to the new request. This can cause osc/pt2pt
to hang as the incoming message callback is never called.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* osc/pt2pt: add request for gc after starting a new request
Starting a new receive may cause a recursive call into the pt2pt
frag receive function. If this happens and the prior request is
on the garbage collection list it could cause problems. This commit
moves the gc insert until after the new request has been posted.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
* User-defined ops leave the op_type unset, which can confuse logic
in a collective component that is trying to convert the op to the
appropriate local function.
I am not sure if we should continue to maintain the request support
for FT_CR, but I tried here to simplify the code while maintaining
the same meaning.
Rewrite the ompi_request_complete function to take into account the
with_signal argument. Change the comment to explain the expected
behavior.
Alter all the ompi_request_complete uses to make sure the status of the
request is set before calling ompi_request_complete.
bot:label:enhancement
Based on the current implementation it is faster to use a blocking
send than the non-blocking version. Switch the exchange function
used in the barrier to use the blocking version combined with
the non-blocking version of the receive.
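A hedged sketch of the exchange pattern described above (not the actual coll code; names are illustrative): post the non-blocking receive first, then use a blocking send, and wait on the receive:
```c
#include <mpi.h>

/* Pairwise exchange step used in a dissemination-style barrier:
 * non-blocking receive combined with a blocking (zero-byte) send. */
static int exchange_with_peer(MPI_Comm comm, int peer, int tag)
{
    MPI_Request req;
    int rc;

    rc = MPI_Irecv(NULL, 0, MPI_BYTE, peer, tag, comm, &req);
    if (MPI_SUCCESS != rc) return rc;

    rc = MPI_Send(NULL, 0, MPI_BYTE, peer, tag, comm);
    if (MPI_SUCCESS != rc) return rc;

    return MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```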
The request code was setting the request as pml_complete before
calling MCA_PML_OB1_SEND_REQUEST_MPI_COMPLETE. This was causing
MCA_PML_OB1_SEND_REQUEST_RETURN to be called twice in some cases. The
code now mirrors the recvreq code and only sets the request as pml
complete if the request has not already been freed.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes two bugs in MPI_Waitany:
- If all requests are inactive then the sync wait would hang forever
because no requests are attached to the sync.
- The request pointer was pointing to the request before the completed
request which caused the wrong request to be freed or marked inactive.
MPI_Waitsome had a similar issue if all the requests were pending.
These issues were identified by MTT.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Regarding BFO, it should be mentioned that this component is currently
unmaintained, and that despite my efforts I could not make it compile
(it would not compile before this patch either).
This fixes a hang caused by the request refactor work. The cm pml was
not updated and was hanging in most cases.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The request.h header is unfortunately included by files in the C++
bindings. C++ does not allow assigning from void * to another
pointer without a cast. This commit adds the cast. We can clean this
up when the C++ bindings are deleted.
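For illustration (a hypothetical inline helper, not the actual header): the explicit cast keeps the code valid when the header is pulled into a C++ translation unit, while remaining legal C:
```c
struct example_request;  /* hypothetical request type for illustration */

/* In C the cast is optional; in C++ it is required, because C++ will not
 * implicitly convert void * to another object pointer type. */
static inline struct example_request *request_from_ptr(void *ptr)
{
    return (struct example_request *) ptr;
}
```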
Fixes open-mpi/ompi#1707
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* Remodel the request.
Added the wait sync primitive and integrated it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).
* Fix the condition to release the request.
This commit adds support for the MPI-3.1 accumulate_ordering info
key. The default value is rar,war,raw,waw and is supported using an
MCA variable flag enumerator.
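A hedged usage sketch (standard MPI calls; the window creation parameters are illustrative): the info key is passed when the window is created:
```c
#include <mpi.h>

/* Relax accumulate ordering to read-after-read only; the other orderings
 * (raw, war, waw) are then not enforced by the implementation. */
static int create_window(MPI_Comm comm, MPI_Aint size, void **base, MPI_Win *win)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "accumulate_ordering", "rar");

    int rc = MPI_Win_allocate(size, 1 /* disp_unit */, info, comm, base, win);
    MPI_Info_free(&info);
    return rc;
}
```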
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Add checks to bail out if our precomputed value is less
than needed (we are already at fault).
bot:milestone:v1.10.3
bot:milestone:v2.0
bot:label:bug
bot:assign: @ggouaillardet
This commit changes the behavior of bml/r2 from conditionally
registering btl progress functions to always registering progress
functions. Any progress function belonging to a btl that is not yet in
use is registered as low-priority. As soon as a proc is added that
will make use of the btl, it is re-registered normally.
This works around an issue with some btls. In order to progress a
first message from an unknown peer both ugni and openib need to have
their progress functions called. If either btl is not in use after the
first call to add_procs the callback would never happen. This commit
ensures the btl progress function is called at some point but the
number of progress callbacks is reduced from normal to ensure lower
overhead when a btl is not used. The current ratio is 1 low priority
progress callback for every 8 calls to opal_progress().
Fixes open-mpi/ompi#1676
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
As more providers get added to libfabric, the default exclude list would need
to be updated.
Instead, we choose to include only the providers known to work by default.
New default:
- include: psm,psm2,gni
- exclude: none
Previously, lines were deleted up to the next ".fi" -- which, for
functions that do not implement the corresponding interface as code,
would have eliminated everything.
Change to delete the man page's content up to the next section header ".SH" instead.
Also in case of make V=1, we'd like to see the command line, too.
Amend OMPI_Affinity_str according to the other man pages' definitions.
There were some old/stale function names in some debugging/verbose
opal_output calls. Use __func__ instead, so that they won't become
stale in the future.
Thanks to Durga Choudhury for pointing out the issue.
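For example (a sketch of the pattern; the helper and its names are hypothetical): passing `__func__` keeps the logged function name accurate even if the function is later renamed:
```c
#include <stdio.h>

/* Hypothetical debug helper: __func__ always expands to the enclosing
 * function's current name, so log messages cannot go stale. */
static void debug_log(const char *func, const char *msg)
{
    fprintf(stderr, "%s: %s\n", func, msg);
}

static void add_procs_example(void)
{
    debug_log(__func__, "adding procs");
}
```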
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Update external as well
Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro
Intel TrueScale and Intel OmniPath, and detect a link in ACTIVE state.
This fix addresses the scenario reported in the below OMPI users email,
including the formerly named QLogic IB, now Intel TrueScale. Given the
nature of the PSM/PSM2 mtls this fix applies to OmniPath:
https://www.open-mpi.org/community/lists/users/2016/04/29018.php
MPIR-1.0 specifies that the following symbols are only relevant in the
starter process:
- MPIR_Breakpoint
- MPIR_being_debugged
- MPIR_debug_state
- MPIR_debug_abort_string
I.e., the code filling in values in these various symbols was useless
/ never used.
MPIR-1.1 will define that MPIR_being_debugged *is* relevant in MPI
processes. That symbol is currently defined in libopen-rte (which is
currently causing a duplicate symbol error for static builds -- this
commit fixes that error), and is therefore still available for MPI
processes.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
MPI-3.1 says that even if no info keys are set on the file, we need to
return a new, empty info.
Thanks to Lisandro Dalcin for identifying the issue.
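A short usage sketch (standard MPI-IO calls; the filename is illustrative): even when no keys were set at open time, the returned info object is valid and must be freed by the caller:
```c
#include <mpi.h>

static void query_file_info(MPI_Comm comm)
{
    MPI_File fh;
    MPI_File_open(comm, "data.out", MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    MPI_Info info;
    MPI_File_get_info(fh, &info);   /* must return a new (possibly empty) info */

    int nkeys = 0;
    MPI_Info_get_nkeys(info, &nkeys);

    MPI_Info_free(&info);           /* caller owns the returned info */
    MPI_File_close(&fh);
}
```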
Fixes open-mpi/ompi#1630
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
During tear down, the HCOLL barrier may not complete if HCOLL progress
is not called periodically, which is the case during HCOLL teardown in
finalize.
(cherry picked from commit 793244d75dd94d1d5e0243bcccf6d04318750f3f)
This commit fixes a bad synchronization detection bug that occurs when
mixing MPI_Win_fence() and MPI_Win_lock(). If no communication has
occurred in the fence epoch it is safe to just clear the all_sync
object (it was set up by fence).
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
When building an empty datatype (aka. size = 0) because the count of
included datatypes is 0, be less strict on what the arguments are
(allow NULL pointers).
This commit fixes a bug that occurs when ranks are either not mapped
evenly or by something other than core.
Fixes open-mpi/ompi#1599
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug when sparse groups are in use. Since sparse
groups do not actually increment the reference counts of any procs
(they just retain the parent group) it is wrong to decrement the
reference counts of all procs in the group using
ompi_group_decrement_proc_count(). This commit makes the call to
ompi_group_decrement_proc_count() conditional on the group being
dense.
Fixes open-mpi/ompi#1593
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
If during the request completion callback we post another request that
completes right away (such as a small send or a match for an unexpected
short message) we will try to complete the second request while holding
the lock for the completion of the first. For performance reasons
(mainly to avoid unlocking and locking the request mutex several times)
we have made the request lock recursive.
There is a potential race condition in MPI_Init() where an orte event
thread could be in a function that uses OPAL_THREAD_LOCK /
OPAL_THREAD_UNLOCK when ompi_mpi_init calls opal_set_using_threads().
Closes open-mpi/ompi#1586
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This is a follow-on to open-mpi/ompi@7373111: add some comments
explaining why the code is the way it is. Also update a previous
comment.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
MPI_LONG_LONG_INT is a named predefined datatype, so its name is now MPI_LONG_LONG_INT
MPI_LONG_LONG is a synonym of MPI_LONG_LONG_INT, and its name is also MPI_LONG_LONG_INT
Fix CID 72362: Explicit null dereferenced (FORWARD_NULL)
From what I can tell the code @ fcoll_static_file_read_all.c:649
should be setting bytes_per_process[i] to 0 not bytes_per_process.
Fix CID 72361: Explicit null dereferenced (FORWARD_NULL)
Modified check to check for blocklen_per_process non-NULL before
trying to free blocklen_per_process[l]. This is sufficient because
free (NULL) is safe. Also cleaned up the initialization of this and a
couple of other arrays. They were allocated with malloc() and then
initialized to 0. Changed to use calloc().
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 72296: Resource leak (RESOURCE_LEAK):
Changed code to goto exit instead of returning to ensure memory is
freed.
Fix CID 712589: Out-of-bounds read (OVERRUN):
In this loop i and j are identical and always less than
iov_count. The CID was triggered because i was incremented if i was <
iov_count. This meant that if the loop did go on, the next iteration
would access an invalid index.
Fix CID 741363: Uninitialized scalar variable (UNINIT):
Allocate tmp_len with calloc to ensure every index is initialized.
Fix CID 741364: Uninitialized pointer read (UNINIT):
Allocate recv_types with calloc to ensure all indices are always
initialized. Also added a check to not loop and destroy if recv_types
is NULL.
Also added a NULL check on the allocation of decoded iov. This is not
the cause of CID 126784 but should be fixed.
Fix CID 712588: Out-of-bounds read (OVERRUN):
Similar to CID 712589. Should silence the issue.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit makes it possible to set relative priorities for
components. Before the addition of the patcher component there was
only one component that would run on any system but that is no longer
the case. When determining which component to open each component's
query function is called and the one that returns the highest priority
is opened. The default priority of the patcher component is set
slightly higher than the old ptmalloc2/ummunotify component.
This commit fixes a long-standing break in the abstraction of the
memory components. ompi_mpi_init.c was referencing the linux malloc
hook initialize function to ensure the hooks are initialized for
libmpi.so. The abstraction break has been fixed by adding a memory
base function that calls the open memory component's malloc hook init
function if it has one. The code is not yet complete but is intended
to support ptmalloc in 2.0.0. In that case the base function will
always call the ptmalloc hook init if it exists.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* datatype: Fix an incorrect datatype name of `MPI_UNSIGNED`
The name of the predefined datatype for C `unsigned int` returned by
`MPI_TYPE_GET_NAME` should be `MPI_UNSIGNED`, not `MPI_UNSIGNED_INT`.
* datatype: Fix incorrect datatype names of `MPI_C_BOOL` and `MPI_CXX_*`
Names of predefined datatypes returned by `MPI_TYPE_GET_NAME` are:
after this commit (correct)   | before this commit (incorrect)
------------------------------+--------------------------------
MPI_C_BOOL                    | MPI_BOOL
MPI_CXX_BOOL                  | MPI_BOOL
MPI_CXX_FLOAT_COMPLEX         | MPI_C_FLOAT_COMPLEX
MPI_CXX_DOUBLE_COMPLEX        | MPI_C_DOUBLE_COMPLEX
MPI_CXX_LONG_DOUBLE_COMPLEX   | MPI_C_LONG_DOUBLE_COMPLEX
* datatype: Fix an incorrect datatype name of `MPI_2DOUBLE_PRECISION`
The name of the predefined datatype for two Fortran `double precision`
values returned by `MPI_TYPE_GET_NAME` should be `MPI_2DOUBLE_PRECISION`,
not `MPI_2DBLPREC`.
This bug was caused by setting the name to `opal_datatype_t::name`
instead of `ompi_datatype_t::name`.
* datatype: Fix `MPI_UNSIGNED_CHAR` internal flag
`MPI_UNSIGNED_CHAR` is an integer type.
* ompi/cxx: Fix C++ `MPI::LONG_DOUBLE_INT` definition
Just a typo fix. Without this fix, `MPI::MAX_LOC` and `MPI::MIN_LOC`
cannot be used with `MPI::LONG_DOUBLE_INT` in C++ programs.
I know the C++ binding is obsolete, but fixing this is harmless.
* Add FUJITSU copyright
This commit adds the following symbols
MPI_Alloc_mem_cptr_f
MPI_Alloc_mem_cptr_f08
PMPI_Alloc_mem_cptr_f
PMPI_Alloc_mem_cptr_f08
These are implemented in the same way as other `_cptr` routines.
Before this commit, the same PML tag could be used for distinct
communications for long messages. For example, consider a condition
where rank A calls ```MPI_PUT``` targeting rank B and rank B calls
```MPI_GET``` targeting rank A simultaneously.
A PML tag for the ```MPI_PUT``` is acquired on rank A and is used
for the long-message communication from rank A to rank B.
A PML tag for the ```MPI_GET``` is acquired on rank B and is used
for the long-message communication from rank A to rank B.
These two tags may end up with the same value because they are managed
independently on each rank. This can cause data corruption.
This commit separates the tags used in a single RMA communication
call: one for communication from the origin to the target, and one
for communication from the target to the origin. A "base" tag
is acquired using the ```get_tag``` function and the PML tag is
calculated from the base tag by the ```tag_to_target``` and
```tag_to_origin``` functions.
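One way to picture the split (a hypothetical sketch, not necessarily the macros used in osc/pt2pt): derive two disjoint PML tags from each base tag, one per direction:
```c
/* Hypothetical illustration: each base tag yields two disjoint PML tags,
 * so the origin->target and target->origin long-message streams of the
 * same RMA call can never collide. */
#define TAG_TO_TARGET(base_tag)  ((base_tag) << 1)        /* origin -> target */
#define TAG_TO_ORIGIN(base_tag)  (((base_tag) << 1) | 1)  /* target -> origin */
```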
`sendcounts`, `sdispls`, and `sendtype(s)` must be ignored
if `MPI_IN_PLACE` is specified for `sendbuf`.
This commit makes the param check code the same as the blocking
`ALLTOALL{V|W}` function.
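A sketch of the intended check (illustrative, not the actual ialltoallv code): the send arguments are only validated when `MPI_IN_PLACE` is not used:
```c
#include <mpi.h>
#include <stddef.h>

/* Returns 0 when the arguments look valid, -1 otherwise. */
static int check_alltoallv_args(const void *sendbuf, const int sendcounts[],
                                const int sdispls[], MPI_Datatype sendtype,
                                const int recvcounts[], const int rdispls[],
                                MPI_Datatype recvtype)
{
    if (NULL == recvcounts || NULL == rdispls || MPI_DATATYPE_NULL == recvtype) {
        return -1;
    }
    if (MPI_IN_PLACE == sendbuf) {
        /* sendcounts, sdispls and sendtype must be ignored entirely. */
        return 0;
    }
    if (NULL == sendcounts || NULL == sdispls || MPI_DATATYPE_NULL == sendtype) {
        return -1;
    }
    return 0;
}
```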
This commit ensures the bml is always enabled whether or not it will
be used. This ensures that any available btls communicate their modex
so that they can be used for one-sided communication.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Added mca parameter to turn progress thread on/off
Add a flag to check if we have btl progress thread.
Added macro for ob1 matching lock.
Update the AUTHORS file.
This commit adds code to detect when procs are unreachable when using
the dynamic add_procs functionality.
Fixes #1501
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Or at least that was the origin of the issue. It turns out
we were freeing the wrong buffer (but as it only happened in the
case of an error we never noticed).
This patch addresses most (if not all) of @derbeyn's concerns
expressed on #1015. I added checks for the requests allocation
in all functions, ompi_coll_base_free_reqs is called with the
right number of requests, I removed the unnecessary basic_module_comm_t
and used the base_module_comm_t instead, I removed all uses of the
COLL_BASE_BCAST_USE_BLOCKING define, and made other minor fixes.
This commit adds a new type of enumerator meant to support flag
values. The enumerator parses comma-delimited strings and matches
each string or value to a list of valid flags. Additionally, the
enumerator does some basic checks to see 1) whether a flag is valid in
the enumerator, and 2) whether any conflicting flags are specified.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 1315298: Resource leak (RESOURCE_LEAK) :
Fix CID 1315300: Resource leak (RESOURCE_LEAK):
Fix CID 1315299: Resource leak (RESOURCE_LEAK):
Fix CID 1315297 (#1 of 1): Resource leak (RESOURCE_LEAK):
Confirmed leaks in error paths. Added the leaked arrays to the
ERR_EXIT macro to ensure they are freed.
Fix CID 1315296 (#1 of 1): Resource leak (RESOURCE_LEAK):
Confirmed leak in error paths. Both the oversub and reqs arrays are
leaked. Free these arrays on error.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 1269976 (#1 of 1): Unused value (UNUSED_VALUE):
Fix CID 1269979 (#1 of 1): Unused value (UNUSED_VALUE):
Removed unused variables k_temp1 and k_temp2.
Fix CID 1269981 (#1 of 1): Unused value (UNUSED_VALUE):
Fix CID 1269974 (#1 of 1): Unused value (UNUSED_VALUE):
Removed gotos and use the matched flags to decide whether to return.
Fix CID 715755 (#1 of 1): Dereference null return value (NULL_RETURNS):
This was also a leak. The items on cs->ctl_structures are allocated using OBJ_NEW so they must be released using OBJ_RELEASE, not OBJ_DESTRUCT. Replaced the loop with OPAL_LIST_DESTRUCT().
Fix CID 715776 (#1 of 1): Dereference before null check (REVERSE_INULL):
Rework error path to remove REVERSE_INULL. Also added a free to an error path where it was missing.
Fix CID 1196603 (#1 of 1): Bad bit shift operation (BAD_SHIFT):
Fix CID 1196601 (#1 of 1): Bad bit shift operation (BAD_SHIFT):
Both of these are false positives but it is still worthwhile to fix so they no longer appear. The loop conditional has been updated to use radix_mask_pow instead of radix_mask to quiet these issues.
Fix CID 1269804 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS):
In general close (-1) is safe but coverity doesn’t like it. Reworked the error path for open to not try to close (-1).
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 715744 (#1 of 1): Logically dead code (DEADCODE):
Fix CID 715745 (#1 of 1): Logically dead code (DEADCODE):
The free of scratch_num in either place is defensive programming. Instead of removing the free the conditional around the free has been removed to quiet the warning.
Fix CID 715753 (#1 of 1): Dereference after null check (FORWARD_NULL):
Fix CID 715778 (#1 of 1): Dereference before null check (REVERSE_INULL):
Fixed the conditional to check for collective_alg != NULL instead of collective_alg->functions != NULL.
Fix CID 715749 (#1 of 4): Explicit null dereferenced (FORWARD_NULL):
Updated code to ensure that none of the parse functions are reached with a NULL value.
Fix CID 715746 (#1 of 1): Logically dead code (DEADCODE):
Removed dead code.
Fix CID 715768 (#1 of 1): Resource leak (RESOURCE_LEAK):
Fix CID 715769 (#2 of 2): Resource leak (RESOURCE_LEAK):
Fix CID 715772 (#1 of 1): Resource leak (RESOURCE_LEAK):
Move free calls to before error checks to cleanup leak in error paths.
Fix CID 741334 (#1 of 1): Explicit null dereferenced (FORWARD_NULL):
Added a check to ensure temp is not dereferenced if it is NULL.
Fix CID 1196605 (#1 of 1): Bad bit shift operation (BAD_SHIFT):
Fixed overflow in calculation by replacing int mask with 1ul.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 1325868 (#1 of 1): Dereference after null check (FORWARD_NULL):
Fix CID 1325869 (#1-2 of 2): Dereference after null check (FORWARD_NULL):
Here reqs can indeed be NULL. Added a check to
ompi_coll_base_free_reqs to prevent dereferencing a NULL pointer.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 1324726 (#1 of 1): Free of address-of expression (BAD_FREE):
Indeed, if a lock conflicts with the lock_all we will end up trying to
free an invalid pointer.
Fix CID 1328826 (#1 of 1): Dereference after null check (FORWARD_NULL):
This was intentional but it would be a good idea to check for
module->comm being non-NULL to be safe. Also cleaned out some checks
for NULL before free().
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit rewrites both the mpool and rcache frameworks. Summary of
changes:
- Before this change a significant portion of the rcache
functionality lived in mpool components. This meant that it was
impossible to add a new memory pool to use with rdma networks
(ugni, openib, etc) without duplicating the functionality of an
existing mpool component. All the registration functionality has
been removed from the mpool and placed in the rcache framework.
- All registration cache mpool components (udreg, grdma, gpusm,
rgpusm) have been changed to rcache components. rcaches are
allocated and released in the same way mpool components were.
- It is now valid to pass NULL as the resources argument when
creating an rcache. At this time the gpusm and rgpusm components
support this. All other rcache components require non-NULL
resources.
- A new mpool component has been added: hugepage. This component
supports huge page allocations on Linux.
- Memory pools are now allocated using "hints". Each mpool component
is queried with the hints and returns a priority. The current hints
supported are NULL (uses posix_memalign/malloc), page_size=x (huge
page mpool), and mpool=x.
- The sm mpool has been moved to common/sm. This reflects that the sm
mpool is specialized and not meant for any general
allocations. This mpool may be moved back into the mpool framework
if there is any objection.
- The opal_free_list_init arguments have been updated. The unused0
argument is now used to pass in the registration cache module. The
mpool registration flags are now rcache registration flags.
- All components have been updated to make use of the new framework
interfaces.
As this commit makes significant changes to both the mpool and rcache
frameworks both versions have been bumped to 3.0.0.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 1327338: Resource leak (RESOURCE_LEAK):
Confirmed that the c_info array was being leaked. Free the array
before returning.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix segfault due to mca_pml_ob1_cuda_need_buffers not handling the case of the
endpoint not being there. Calling mca_bml_get_endpoint() seems to fix the problem.
Fixes open-mpi/ompi#1402
This commit removes the --with-mpi-thread-multiple option and forces
MPI_THREAD_MULTIPLE support. This cleans up an abstraction violation
in opal where OMPI_ENABLE_THREAD_MULTIPLE determines whether the
opal_using_threads is meaningful. To reduce the performance hit on
MPI_THREAD_SINGLE programs an OPAL_UNLIKELY is used for the
check on opal_using_threads in OPAL_THREAD_* macros.
This commit does not clean up the arguments to the various functions
that take whether multi-threading support is enabled. That should be
done at a later time.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This fixes open-mpi/ompi@8b05f308f9
libmpi.so cannot be built (unresolved symbols) when configured with
--disable-mem-debug --disable-mem-profile --disable-memchecker --without-memory-manager
Nowhere in the standard does it say that it is invalid to pass
MPI_BOTTOM to MPI_Get_address yet we were returning an error. This
commit removes the error check on NULL == location.
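For example (standard MPI calls; variable names are illustrative), passing `MPI_BOTTOM` is accepted just like any other location:
```c
#include <mpi.h>
#include <stdio.h>

static void show_bottom_address(void)
{
    MPI_Aint bottom, value;
    int x = 42;

    MPI_Get_address(MPI_BOTTOM, &bottom);  /* no longer returns an error */
    MPI_Get_address(&x, &value);

    /* Absolute displacement of x relative to MPI_BOTTOM */
    MPI_Aint disp = MPI_Aint_diff(value, bottom);
    printf("displacement = %ld\n", (long) disp);
}
```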
Fixes open-mpi/ompi#1355.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
These changes fix issue https://github.com/open-mpi/ompi/issues/1336
- improve abstractions: the opal/memory/linux component should be the single place that operates on
Memory Allocation Hooks.
- avoid collisions in the case of dynamic component open/close: it is safe because it is linked statically.
- does not change the original behaviour.
Since converting an opal_process_name_t means the loss of one bit,
it was decided to restrict the local job id to 15 bits, so the
useful information of an opal_process_name_t can fit in 63 bits.
This commit fixes several bugs identified by @ggouaillardet and MTT:
- Fix SEGV in long send completion caused by missing update to the
request callback data.
- Add an MPI_Barrier to the fence short-cut. This fixes potential
semantic issues where messages may be received before fence is
reached.
- Ensure fragments are flushed when using request-based RMA. This
allows MPI_Test/MPI_Wait/etc to work as expected.
- Restore the tag space back to 16-bits. It was intended that the
space be expanded to 32-bits but the required change to the
fragment headers was not committed. The tag space may be expanded
in a later commit.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:
- The code that signaled the actual start of an access epoch changed
the eager_send_active flag on a synchronization object without
holding the object's lock. This could cause another thread waiting
on eager sends to block indefinitely because the entirety of
ompi_osc_pt2pt_sync_expected could execute between the check of
eager_send_active and the condition wait of
ompi_osc_pt2pt_sync_wait.
- The bookkeeping of fragments could get screwed up when performing
long put/accumulate operations from different threads. This was
caused by the fragment flush code at the end of both put and
accumulate. This code was put in place to avoid sending a large
number of unexpected messages to a peer. To fix the bookkeeping
issue we now 1) wait for eager sends to be active before starting
any large isends, and 2) keep track of the number of large isends
associated with a fragment. If the number of large isends reaches
32 the active fragment is flushed.
- Use atomics to update the large receive/send tag counters. This
prevents duplicate tags from being used. The tag space has also
been updated to use the entire 16-bits of the tag space.
These changes should also fix open-mpi/ompi#1299.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds code to handle large unaligned gets. There are two
possible code paths for these transactions:
1) The remote region and local region have the same alignment. In
this case the get will be broken down into at most three get
transactions: 1 transaction to get the unaligned start of the region
(buffered), 1 transaction to get the aligned portion of the region,
and 1 transaction to get the end of the region.
2) The remote and local regions do not have the same alignment. This
should be an uncommon case and is not optimized. In this case a
buffer is allocated and registered locally to hold the aligned data
from the remote region. There may be cases where this fails (low
memory, can't register memory). Those conditions are unlikely and
will be handled later.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
If the state of the request is not set to OMPI_REQUEST_ACTIVE
then MPI_Test would immediately signal such a request as completed
while hcoll may still be working on it.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
If atomics are not globally visible (cpu and nic atomics do not mix)
then a btl endpoint must be used to access local ranks. To avoid
issues that are caused by having the same region registered with
multiple handles osc/rdma was updated to always use the handle for
rank 0. There was a bug in the update that caused osc/rdma to continue
using the local endpoint for accessing the state even though the
pointer/handle are not valid for that endpoint. This commit fixes the
bug.
Fixes open-mpi/ompi#1241.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit makes ompi_datatype_get_pack_description thread safe. The
call is used by osc/pt2pt to send the packed description to remote
peers. Before this commit if MPI_THREAD_MULTIPLE is enabled and the
user uses MPI_Put, MPI_Get, etc we could hit a race where multiple
threads attempt to store the packed description on the datatype. Since
the code in question is not performance-critical the threading fix
uses opal_atomic_* calls instead of bothering with OPAL_THREAD_*.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Optimizing put aggregation in the presence of threads will require a
redesign of the code. For now just ensure that put aggregation is
turned off when MPI_THREAD_MULTIPLE is enabled.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
A bus error occurs in sm OSC under the following conditions.
- sparc64 or any other architectures which need strict alignment.
- `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created
by sm OSC.
- The communicator size is odd and greater than 3.
Lines 283-285 in the current `ompi/mca/osc/sm/osc_sm_component.c` have
the following code.
```c
module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
module->posts[0] = (uint64_t *) (module->node_states + comm_size);
```
The size of `ompi_osc_sm_node_state_t` is a multiple of 4 but not
a multiple of 8. So if `comm_size` is odd, `module->posts[0]` is not
aligned to 8. This causes a bus error when accessing
`module->posts[i][j]`.
This patch fixes the alignment of `module->posts[0]` by setting
`module->posts[0]` first.
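A sketch of the reordered layout (an assumption about the fix's shape, not the verbatim patch): placing the 8-byte `posts` array before the node states keeps it aligned regardless of `comm_size`:
```c
/* Hypothetical layout sketch: place the uint64_t post array directly after
 * the global state (assumed to be 8-byte aligned and of 8-byte-multiple size),
 * and put the node states after it. post_count is the total number of
 * uint64_t post entries (assumed name). */
module->global_state = (ompi_osc_sm_global_state_t *) module->segment_base;
module->posts[0]     = (uint64_t *) (module->global_state + 1);
module->node_states  = (ompi_osc_sm_node_state_t *) (module->posts[0] + post_count);
```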
Update the configure logic for the new pmix120 component
ckpt
Get the pmix120 component to work - still not really registering or handling notifications, but infrastructure now operates
Cleanup some of the symbol scopes, and provide a more comprehensive rename.h file. Will pretty it up later - let's see how this works
Cleanup the rename files to use the pretty macros
When mtl-portals4 is configured for logical mapping, coll-portals4
must disqualify because it does not yet support logical mapping.
coll-portals4 looks for the endpoint pid to be zero which tells it
that mtl-portals4 is configured for logical mapping. This commit
initializes the endpoint nid/pid to zero for logical mapping.
Since OS X 10.11 (aka El Capitan), DYLD_LIBRARY_PATH is no longer
propagated to child processes, so try to dlopen libmpi with the full
path using the directory of libmpi_java.
Fixes open-mpi/ompi#1220
Thanks to Alexander Daryin for reporting this
NOTE: Building with external pmix *requires* that you also build with external libevent and hwloc libraries. Detect this at configure time and error out with a large message if this requirement is violated.
Closes #1204 (replaces it)
Fixes #1064
A previous commit updated the one-sided code to register the state
region only once. This created an issue when using the scratch lock
with fetching atomics. In this case on any rank that isn't local rank
0 the module->state_handle is NULL. This commit fixes the issue by
removing the scratch lock and using a fragment pointer instead.
Fixes open-mpi/ompi#1290
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
accelerate message queue lookups if the lookups have
the proper tag&mask layout. Open MPI should follow
PSM2's preferred tag&mask spec, so that PSM2 can provide
a performance benefit.
Remove send of the extra message. This bug was triggered by the
MPICH/coll/nbicbarrier test. In this test a series of communicators
are created.
This extra message was received after the original communicator was
destroyed and queued into non_existing_communicator_pending. When a new,
completely unrelated communicator with the same id as the original was
created, this message was pushed into the frags_cant_match queue and
caused sequence number skew and a hang.
Remove excessive parameter check to avoid premature exit from the collective.
MPI standard says:
The type signature associated with sendcount, sendtype, at a process must be equal to
the type signature associated with recvcount, recvtype at any other process. This implies
that the amount of data sent must be equal to the amount of data received, pairwise between
every pair of processes.
In the case of an inter-communicator we have 2 groups of processes and the "left" group may call
MPI_Alltoall(NULL, 0, MPI_INT, buf, 10, MPI_INT, comm, ...);
and the right one:
MPI_Alltoall(buf, 10, MPI_INT, NULL, 0, MPI_INT, comm, ...);
And it would be legal even though one of the groups will receive 0 bytes from the others.
This was triggered by the MPICH/coll test called icalltoall.
`MPI_WIN_TEST` must set the `flag` parameter to 0 when not all
origin processes have called `MPI_WIN_COMPLETE`. But sm OSC doesn't.
If the caller initializes the `flag` argument to a non-0 value,
the caller will receive the non-0 `flag` value.
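A small illustration (standard MPI calls) of why the caller cannot rely on a pre-initialized value:
```c
#include <mpi.h>

static int exposure_epoch_done(MPI_Win win)
{
    int flag = 1;               /* stale value: must not leak through */
    MPI_Win_test(win, &flag);   /* required to write 0 or 1 into flag */
    return flag;
}
```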
The previous commit f2794740 reverts Nathan's changes. However, it turns
out that I was unable to trace his logic until I started investigating
the icsplit hang. The bug was triggered when splitting an intercommunicator
gave a group where one side of the communicator was empty (icsplit,
intercomm create #2). In this case remote_size == 0 and there is no way
to distinguish between an inter- and an intra-communicator.
Conclusion: we do need to distinguish between intra- and inter-communicators,
so we should use ompi_mpi_group_null.group.
This commit adds a call to ompi_request_wait_completion for buffered
sends. Without this line it is possible to get into a state where the
data is never sent.
Fixes open-mpi/ompi#1185
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Instead of solely relying on the out value definitions in
MPI_Waitsome.3, explicitly copy this text here.
Note that the original text in this man page was copied verbatim from
the MPI spec; we've now added a bit more text (copied from
MPI_Waitsome.3in) that explains the out values so that users don't
have to cross-reference to another man page.
Thanks to Eric Schnetter for the suggestion.
Fixes open-mpi/ompi#1153
some of the collective modules. Added a new function,
opal_datatype_span, to compute the memory span of
count copies of a datatype, excluding the gaps at the
beginning and at the end. If a memory allocation is
made using the returned value, the gap (also returned)
should be removed from the allocated pointer.
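A hedged usage sketch, assuming a signature along the lines of `ptrdiff_t opal_datatype_span(const opal_datatype_t *dtype, size_t count, ptrdiff_t *gap)`:
```c
#include <stdlib.h>

/* Hypothetical helper: allocate a buffer large enough for `count` elements
 * of the given datatype, returning both the usable pointer and the pointer
 * to free later. Assumes the opal_datatype_span() signature noted above. */
static char *alloc_for_datatype(const opal_datatype_t *dtype, size_t count,
                                char **to_free)
{
    ptrdiff_t gap = 0;
    ptrdiff_t span = opal_datatype_span(dtype, count, &gap);

    *to_free = (char *) malloc(span);
    if (NULL == *to_free) return NULL;

    /* the gap is removed (subtracted) from the allocated pointer */
    return *to_free - gap;
}
```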
This commit adds a helper function for creating groups from proc
lists. The function is used by ompi_comm_fill_rest to create the local
and remote groups.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit changes the way ompi_proc_t's are retained/released by
ompi_group_t's. Before this change ompi_proc_t's were retained once
for the group and then once for each retain of a group. This method
adds unnecessary overhead (need to traverse the group list each time
the group is retained) and causes problems when using an async
add_procs.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit changes the OPAL_THREAD_LOCK/OPAL_THREAD_UNLOCK calls in
ompi/proc to opal_mutex_lock/opal_mutex_unlock. This will allow
multi-threaded BTLs the ability to create ompi_proc_t's without having
to set opal_using_threads. There should be no performance hits as none
of the lock points are in the critical path.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit removes two pieces of unneeded code from gather. First
it removes destroy_tree() calls from linear_top(), because the
linear algorithm does not create a tree, so there is no need to
destroy it. Second it removes unpack_bytes from the gather request
because it was calculated but never used.
This commit allows controlling output during abnormal oshmem/ompi
application termination.
Fixed an issue in backtrace output: HAVE_BACKTRACE was never set, so the
user had limited control over this variable.
Two related mca variables are moved to the opal layer. Corresponding aliases are
added for ompi and oshmem.
There were two bugs in osc/rdma when using threads:
- Deadlock in ompi_osc_rdma_start_atomic. This occurs because
ompi_osc_rdma_frag_alloc is called with the module lock held. To fix the
issue the module lock is now recursive. In the future I will add a
new lock to protect just the current rdma fragment.
- Do not drop the lock in ompi_osc_rdma_frag_alloc when calling
ompi_osc_rdma_frag_complete. Not only is it not needed but dropping
the lock at this point can cause a competing thread to mess up the
state.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
We should invoke OBJ_CONSTRUCT/OBJ_DESTRUCT only on regular requests
(which are embedded inside UCX requests) and for the completed request.
Persistent requests are already constructed/destructed by the free list.
This fixes an assertion in ompi_request_destruct.
Without this modification, gfortran throws the following error
if these variables are used for `MPI_DIST_GRAPH_CREATE_ADJACENT` or
`MPI_DIST_GRAPH_CREATE`.
Error: There is no specific subroutine for the generic
'mpi_dist_graph_create_adjacent' at (1)
`MPI_ARGVS_NULL` should be a two-dimensional array.
Without this modification, gfortran throws the following error
if `MPI_ARGVS_NULL` is used for `MPI_COMM_SPAWN_MULTIPLE`.
Error: There is no specific subroutine for the generic
'mpi_comm_spawn_multiple' at (1)
During component finalize, mtl-portals4 would blindly release
resources without testing if the handle was valid. This was OK,
but resource allocation is now delayed until add_procs(). If
mtl-portals4 is deselected, it will be finalized without
add_procs() ever being called. This commit ensures that invalid
handles are not released.
Writing to the pml_monitoring_flush variable will set the filename of
the output file.
Stopping a session for the pml_monitoring_flush will force the
generation of the monitoring output file (as long as the filename
is not NULL).
To reset the monitoring, one has to bind the pml_monitoring_flush to a
session.