openmpi

Author	SHA1	Message	Date
Ralph Castain	39992d1ad7	Silence trivial Coverity warnings	2016-08-31 09:42:33 -07:00
Jeff Squyres	ead9b6389a	README: update for new mailman and main web sites Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-31 09:55:36 -04:00
Jeff Squyres	c33aaa5604	Merge pull request #1997 from jsquyres/pr/make-cma-configury-better opal_check_cma: make consistent with rest of configury	2016-08-31 09:50:24 -04:00
Jeff Squyres	ba0ed2401a	Merge pull request #2031 from jsquyres/pr/fix-fortran-runpath-detection opal_setup_wrappers.m4: fix typo in Fortran rpath detection	2016-08-31 09:28:32 -04:00
rhc54	ed5846038b	Merge pull request #2033 from rhc54/topic/state Ensure that the "running" state is correctly updated	2016-08-31 01:50:38 -05:00
Ralph Castain	9b991bd1f5	Ensure that the "running" state is correctly updated It is possible that one or more procs could get thru PMIx_Init, and thus be marked as in state "registered", before all local procs have been started. If that happens, then we would report some of the procs in state "running", and the others in state "registered" - which means that the HNP would miss the "running" stage of the state machine. Thanks to Jingchao Zhang for his patience in tracking this down on the 2.0 branch	2016-08-30 19:24:39 -07:00
Jeff Squyres	1b9e165a4c	opal_setup_wrappers.m4: fix typo in Fortran rpath detection Add missing space in Libtool command for runpath detection. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-30 16:48:05 -07:00
rhc54	bfe0327f7b	Merge pull request #2029 from rhc54/topic/notify Since we changed storage to pointers in pmix_value_t, we need to allocate space for those values when unpacking	2016-08-29 23:11:23 -05:00
Ralph Castain	cfa784c9a6	Since we changed storage to pointers in pmix_value_t, we need to allocate space for those values when unpacking	2016-08-29 20:22:24 -07:00
Nathan Hjelm	99b26644c1	Merge pull request #2011 from hjelmn/osc_pt2pt_fix osc/pt2pt: fix possible race in peer locking	2016-08-29 09:17:36 -06:00
Josh Hursey	b0d8638824	Merge pull request #2015 from jjhursey/topic/mixed-hostnames orte: Expand use of !orte_keep_fqdn_hostnames MCA parameter	2016-08-29 09:14:54 -05:00
George Bosilca	a6d515ba9e	Fixes opal_atomic_ll_64. Thanks to Paul Hargrove for the report and his patch. This is an addition to #1140 and should go in 2.x	2016-08-27 12:43:48 -04:00
Nathan Hjelm	d33204b0dc	Merge pull request #2021 from hjelmn/xlc_fix opal/patcher: fix xlc support	2016-08-26 18:15:41 -06:00
rhc54	b90a64e734	Merge pull request #2022 from rhc54/topic/nnodes Provide the number of nodes in the job	2016-08-26 18:15:24 -05:00
Ralph Castain	2f6e0fec90	Provide the number of nodes in the job	2016-08-26 14:50:41 -07:00
Joshua Hursey	d26dd2c20e	orte: Expand the application of !orte_keep_fqdn_hostnames * Expand the use of the `orte_keep_fqdn_hostnames` MCA parameter when it is set to false. * If that parameter is set to false (default) then short hostnames (e.g., `node01`) will match with the long hostnames (e.g., `node01.mycluster.org`). This allows a user (or resource manager) to mix the use of short and long hostnames. - Note that this mechanism does _not_ perform a DNS lookup, but instead strips off the FQDN by truncating the hostname string at the first `.` character (when not an IP address). - By default (`false`) the following is true: `node01 == node01.mycluster.org == node01.bogus.com` since we use `node01` as the hostname.	2016-08-26 16:09:04 -05:00
Jeff Squyres	09ad7e81eb	Merge pull request #2007 from jsquyres/pr/usnic-show-local-udp-ports usnic: show the local UDP ports	2016-08-26 17:03:16 -04:00
Nathan Hjelm	a9bc692d99	opal/patcher: fix xlc support The xlc compiler seems to behave differently from gcc when it comes to inline asm. There were two problems with the code with xlc: - The TOC read in mca_patcher_base_patch_hook used the syntax register unsigned long toc asm("r2") to read $r2 (the TOC pointer). With gcc this seems to behave as expected but with xlc the result in toc is not the same as $r2. I updated the code to use asm volatile ("std 2, %0" : "=m" (toc)) to load the TOC pointer. - The OPAL_PATCHER_BEGIN macro is meant to be the first thing in a hook. On PPC64 it loads the correct TOC pointer (thanks to mca_patcher_base_patch_hook) and saves the old one. The OPAL_PATCHER_END macro restores the TOC pointer. Because we need the TOC to be correct before it is accessed in the hook, the OPAL_PATCHER_BEGIN macro MUST come first. We did this and all was well with gcc. With xlc, on the other hand, there was a TOC access before the assembly inserted by OPAL_PATCHER_BEGIN. To fix this quickly I broke each hook into a pair of functions with the OPAL_PATCHER_* macros on the top-level functions. This works around the issue but is not a clean way to fix this. In the future we should either 1) update overwrite to not need this, or 2) figure out why xlc is not inserting the asm before the first TOC read. This fixes open-mpi/ompi#1854 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-26 14:43:03 -06:00
Jeff Squyres	87a5ccc060	usnic: show the local UDP ports Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-26 12:25:18 -07:00
Jeff Squyres	e03a40a0e9	pmix3x: remove generated file Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-26 10:30:47 -07:00
rhc54	03838f275a	Merge pull request #2019 from artpol84/fix_schizo orte/schizo: fix binding detection in slurm component	2016-08-26 09:08:43 -05:00
Edgar Gabriel	b5c757e82c	Merge pull request #2014 from edgargabriel/topic/mt-io Topic/mt io	2016-08-26 08:54:45 -05:00
Jeff Squyres	9ae51a09f2	Merge pull request #1989 from jsquyres/pr/update-usnic-to-libfabric-v1.4 Update usnic BTL to libfabric v1.4	2016-08-26 09:53:07 -04:00
Artem Polyakov	55ac3b0be3	orte/schizo: fix binding detection in slurm component in SLURM 16.05 the SLURM_CPU_BIND_TYPE is equal to "mask_cpu:" instead of "mask_cpu". Account for that.	2016-08-26 09:55:52 +03:00
Gilles Gouaillardet	e4bf915e75	pmix3x: remove auto-generated file remove opal/mca/pmix/pmix3x/pmix/src/include/pmix_config.h.in .gitignore is correct, so it seems this file was added before .gitignore was updated	2016-08-26 15:00:18 +09:00
rhc54	c0fff60e59	Merge pull request #2017 from rhc54/topic/pmixconfig Update configury to support multiple PMIx versions	2016-08-25 21:36:34 -05:00
Ralph Castain	af67f16422	Update configury to support multiple PMIx versions, rename pmix2x component to pmix3x for support of PMIx master Update support for external v1.1.x and v2.x libraries. Minor corrections to the v3.x component	2016-08-25 18:19:05 -07:00
Gilles Gouaillardet	277c319389	opal/util: fix (again and again) incorrect type casting in opal_path_df and silence CID 1371767 this fixes previous commits : - open-mpi/ompi@2eec8970ff - open-mpi/ompi@a439afce5b	2016-08-26 09:42:45 +09:00
Nathan Hjelm	89c2f4974c	Merge pull request #2016 from hjelmn/wait_sync opal/wait_sync: add #if protection on header	2016-08-25 15:13:09 -07:00
Nathan Hjelm	f3d4eaeaf7	Merge pull request #2013 from hjelmn/osc_rdma_fix osc/rdma: fix bug in dynamic memory window tracking code	2016-08-25 13:42:27 -07:00
Nathan Hjelm	de32c779e2	opal/wait_sync: add #if protection on header Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-25 14:31:52 -06:00
rhc54	19b0f4db9f	Merge pull request #1995 from rhc54/topic/pe-per-rank Change the behavior of cpus-per-rank.	2016-08-25 14:38:12 -05:00
Edgar Gabriel	1ba03d38ec	io/ompio: protect remaining functions in multi-threaded scenarios protect the remaining functions where necessary by a mutex lock to avoid problems in multi-threaded executions. Some functions do not require that in my opinion, and I provided an explanation in those cases.	2016-08-25 13:45:51 -05:00
Nathan Hjelm	e53de7ecbe	osc/rdma: fix bug in dynamic memory window tracking code This commit fixes an ordering bug in the code that keeps track of all attached memory windows. The code is intended to keep the memory regions sorted but was often inserting at the wrong index. Thanks to Christoph Niethammer for reporting the issue. The reproducer will be added to nightly MTT testing. Fixes open-mpi/ompi#2012 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-25 12:08:46 -06:00
Nathan Hjelm	7af138f83b	osc/pt2pt: fix possible race in peer locking It is possible for another thread to process a lock ack before the peer is set as locked. In this case either setting the locked or the eager active flag might clobber the other thread. To address this the flags have been made volatile and are set atomically. Since there is no opal_atomic_or or opal_atomic_and function, just use cmpset for now. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-25 09:28:25 -06:00
Nathan Hjelm	c082068953	Merge pull request #2006 from hjelmn/osc_pt2pt_fix osc/pt2pt: fix several bugs	2016-08-25 09:19:29 -06:00
rhc54	17a210f7f0	Merge pull request #2008 from rhc54/topic/binding Correct the binding algorithm to decouple it from oversubscribe.	2016-08-25 09:25:33 -05:00
Edgar Gabriel	1cee83cc1b	use the common/ interfaces in file_preallocate instead of the io_ompio_ interfaces. Necessary for avoiding potential deadlock situations in multi-threaded scenarios.	2016-08-25 08:55:12 -05:00
Jeff Squyres	0d19cc4a13	README: fix a bunch of typos Thanks to Paul Hargrove for pointing them out. Really. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 09:15:27 -04:00
Jeff Squyres	f56b16f079	usnic: remove unused variable Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 03:53:18 -07:00
Jeff Squyres	9717bcb7e6	btl/usnic: remove stale comment Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 03:53:18 -07:00
Jeff Squyres	6f5e377fe0	btl/usnic: update for libfabric v1.4 With libfabric v1.4, the usnic provider changed the values of its fabric and domain name strings (compared to libfabric <v1.4). Update the Open MPI usNIC BTL to handle both pre-v1.4 and v1.4 fabric/domain names. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 03:53:17 -07:00
rhc54	b563c9e303	Merge pull request #2003 from rhc54/topic/sync Set the default value of both barrier counters to zero, thus ensuring the coll/sync component is off by default	2016-08-24 23:18:58 -05:00
Ralph Castain	440eae90ec	Correct the binding algorithm to decouple it from oversubscribe. Oversubscribe stipulates that we allow more procs on the node than assigned slots - it has nothing to do with the number of available pe's. Let overload directives handle the pe situation.	2016-08-24 21:17:22 -07:00
George Bosilca	3adff9d323	Fixes #1793 . Reshape the tearing down process (connection close) to prevent race conditions between the main thread and the progress thread. Minor cleanups.	2016-08-24 22:45:19 -04:00
Nathan Hjelm	70f8a6e792	osc/pt2pt: fix several bugs This commit fixes some bugs uncovered during thread testing of 2.0.1rc1. With these fixes the component is running cleanly with threads. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-24 14:35:45 -06:00
Nathan Hjelm	6de64ddbc1	Merge pull request #2005 from hjelmn/ugni_fix btl/ugni: actually make the endpoint lock recursive	2016-08-24 11:05:27 -06:00
Nathan Hjelm	83062db7cb	btl/ugni: actually make the endpoint lock recursive Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-24 10:36:08 -06:00
Ralph Castain	bcf5ac3971	Set the default value of both barrier counters to zero, thus ensuring the coll/sync component is off by default	2016-08-24 07:51:32 -07:00
Gilles Gouaillardet	2eec8970ff	opal/util: fix (again) incorrect type casting in opal_path_df this fixes previous commit open-mpi/ompi@a439afce5b	2016-08-24 12:50:15 +09:00
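
The hostname matching described in commit d26dd2c20e (short and long hostnames compare equal when `orte_keep_fqdn_hostnames` is false) can be sketched roughly as below. This is a minimal illustration, not the ORTE implementation: the helper names `looks_like_ipv4`, `short_hostname`, and `hostnames_match` are hypothetical, the 256-byte buffers are an arbitrary choice, and the IP check is a simplification that only recognizes dotted IPv4-style strings.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (assumption): treat a non-empty string made only of
 * digits and dots as an IPv4 address. Real code would need a stricter check. */
static bool looks_like_ipv4(const char *s)
{
    bool saw_digit = false;
    for (; *s != '\0'; ++s) {
        if (isdigit((unsigned char)*s)) {
            saw_digit = true;
        } else if (*s != '.') {
            return false;
        }
    }
    return saw_digit;
}

/* Copy the short form of `name` into `out`: everything before the first '.',
 * unless the name looks like an IP address, in which case it is kept whole.
 * No DNS lookup is performed; this is plain string truncation. */
static void short_hostname(const char *name, char *out, size_t outlen)
{
    size_t n = strlen(name);
    if (!looks_like_ipv4(name)) {
        n = strcspn(name, ".");
    }
    if (n >= outlen) {
        n = outlen - 1;
    }
    memcpy(out, name, n);
    out[n] = '\0';
}

/* Two names match when their short forms are identical. */
static bool hostnames_match(const char *a, const char *b)
{
    char sa[256];
    char sb[256];
    short_hostname(a, sa, sizeof sa);
    short_hostname(b, sb, sizeof sb);
    return strcmp(sa, sb) == 0;
}
```

Under these assumptions, `node01`, `node01.mycluster.org`, and `node01.bogus.com` all reduce to `node01` and therefore match, as the commit message notes, while dotted IP addresses are compared in full.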


25714 Commits