openmpi

Author	SHA1	Message	Date
Ralph Castain	9eab9a1ed3	Remove stale global variables Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers. Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation). Begin first cut at memory profiler Some minor cleanups of memprobe Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-02 14:04:24 -08:00
Ralph Castain	e8aea2ebfc	Minor cleanups Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-30 16:19:42 -08:00
Ralph Castain	08c76a42bb	Update to latest PMIx master Signed-off-by: Ralph Castain <rhc@open-mpi.org> Plug a minor memory leak. Tell the PMIx server not to create a dstore memory region for the daemon job as there is nobody to share it with. Signed-off-by: Ralph Castain <rhc@open-mpi.org> Protect users of hwloc membind functions Signed-off-by: Ralph Castain <rhc@open-mpi.org> Update PMIx to include NULL string protection Signed-off-by: Ralph Castain <rhc@open-mpi.org> Update to PMIx master to include key overwrite protection Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-30 12:44:47 -08:00
Ralph Castain	fe68f23099	Only instantiate the HWLOC topology in an MPI process if it actually will be used. There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced: * shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string instead of the topology itself, if available, thus avoiding instantiating the topology * openib BTL. This uses the distance matrix. At present, I haven't developed a method for replacing that reference. Thus, this component will instantiate the topology * usnic BTL. Uses the distance matrix. * treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate the topology * ess base functions. If a process is direct launched and not bound at launch, this code attempts to bind it. Thus, procs in this scenario will instantiate the topology Note that instantiating the topology on complex chips such as KNL can consume megabytes of memory. Fix pernode binding policy Properly handle the unbound case Correct pointer usage Do not free static error messages! Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-29 10:33:29 -08:00
Ralph Castain	3a2d6a5ab6	Begin to reduce reliance of application procs on the topology tree itself by having the daemon provide more detailed info. In this case, provide the topology description string so that procs can readily determine the number of types of objects on the node, and a "locality" string that describes which objects this process is executing upon. The latter allows a process to compute the objects of overlap between itself and another proc without consulting the topology tree. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-28 09:14:26 -08:00
Ralph Castain	d3aa3777f3	Per @jsquyres: avoid mangling user-provided CFLAGS by using the new PMIX_FLAGS_UNIQ autoconf script in place of PMIX_UNIQ Refs #2636 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-27 09:00:59 -08:00
Gilles Gouaillardet	22db1d36b6	pmix2x: silence misc warnings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-26 13:35:17 +09:00
Nysal Jan K A	19e3be31e5	Merge pull request #2421 from nysal/master mpit: Fix MPI_T_pvar_get_index	2016-12-22 15:33:51 +05:30
Gilles Gouaillardet	54c84196a6	btl/vader: plug a memory leak as reported by Coverity with CID 1362691 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-22 16:04:36 +09:00
Nysal Jan K.A	25ba507ada	mpit: Fix MPI_T_pvar_get_index MPI_T_pvar_get_index was returning an incorrect index. The index was never set correctly while registering the performance variables. Additionally fix a missing case in the mca_base_var_type_t to MPI datatype conversion. This type is currently used for control variables registered by mxm, fca and hcoll components. Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>	2016-12-22 12:30:21 +05:30
Jeff Squyres	3571c3c5bb	hwloc external: minor fixes to `9649c44` - Fix capitalization typos - Make comment more correct / flow better - Use AM_CPPFLAGS, not DEFAULT_INCLUDES - Remove extra "hwloc/" from external hwloc.h specification Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-12-21 09:06:24 -08:00
Gilles Gouaillardet	9649c44fa0	hwloc: correctly handle --with-hwloc=external - simply #include "hwloc.h" to use the external hwloc header - do use the external hwloc header instead of opal/mca/hwloc/hwloc.h Thanks Orion Poplawski for the report Fixes open-mpi/ompi#2616 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-21 11:58:10 +09:00
Ralph Castain	ea133206ec	Sync the internal OMPI component to PMIx master Update external PMIx v2.x component Add missing Makefile Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-19 19:14:16 -08:00
Ralph Castain	256b5adac5	Transfer across final fixes from debugger attach work Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-19 00:34:27 -08:00
Ralph Castain	c6f6f40529	Transfer debugger support changes Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-17 18:14:46 -08:00
rhc54	54c4925f3f	Merge pull request #2598 from rhc54/topic/debugger Transfer back changes from debugger attach work	2016-12-17 13:09:38 -08:00
Nathan Hjelm	16a2f09cd5	Merge pull request #2596 from hjelmn/x86_rtdtsc opal/timer: add code to check if rtdtsc is core invariant	2016-12-17 11:14:49 -07:00
Ralph Castain	269753f5c1	Transfer back changes from debugger attach work Silence warning Remove debug Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-17 10:00:52 -08:00
Ralph Castain	215d6290e0	Add a flux component for LLNL Fine tuning of flux component Fix a few minor issues with the initial cut: * Job id could be obtained from the PMI kvsname like SLURM, but simpler to getenv (FLUX_JOB_ID) * Flux pmi-1 doesn't define PMI_BOOL, PMI_TRUE, PMI_FALSE * Flux pmi-1 maps the deprecated PMI_Get_kvs_domain_id() to PMI_KVS_Get_my_name() internally, so just call that instead. * Drop residual slurm references. Add wrappers for PMI functions so that if HAVE_FLUX_PMI_LIBRARY is not defined, the component can dlopen libpmi.so at location specified by the FLUX_PMI_LIBRARY_PATH env variable, which adds flexibility. If HAVE_FLUX_PMI_LIBRARY is defined, link with libpmi.so at build time in the usual way. Update configury for flux component Update m4 so the configure options work as follows: --with-flux-pmi Build Flux PMI support (default: yes) --with-flux-pmi-library Link Flux PMI support with PMI library at build time. Otherwise the library is opened at runtime at location specified by FLUX_PMI_LIBRARY_PATH environment variable. Use this option to enable Flux support when building statically or without dlopen support (default: no) If the latter option is provided, the library/header is located at build time using the pkg-config module 'flux-pmi'. Otherwise there is no library/header dependency. Handle the case where ompi is configured with --disable-dlopen or --enable-static. In those cases, don't build the component unless --with-flux-pmi-library is provided. It is fatal if the user explicitly requests --with-flux-pmi but it cannot be built (e.g. due to --disable-dlopen). Add a schizo/flux component Update schizo/flux component Eliminate slurm-specific usage cases. Since the module is only loaded if FLUX_JOB_ID is set, there are only two cases to handle: 1) App was launched indirectly through mpirun. This is not yet supported with Flux, but hook remains in case this mode is supported in the future. 2) App was launched directly by Flux, with Flux providing CPU binding, if any. Fix up white space in pmix/flux component Drop non-blocking fence from pmix:flux component The flux PMI-1 library is not thread safe, therefore register a regular blocking fence callback instead of the thread-shifting fencenb(). pmix/flux component avoids extra PMI_KVS_Gets Keys stored into the base cache under the wildcard rank are not intended to be part of the global key namespace. These keys therefore should not trigger a PMI_KVS_Get() if they are not found in the cache. Minor pmix/flux component cleanup pmix/flux: drop code for fetching unused pmix_id pmix/flux: err_exit must return error Problem: in flux_init(), although 'ret' (variable holding err_exit return code) is initialized to OPAL_ERROR, the variable is reused as a temporary result code, so if there are some successes followed by a failure that doesn't set 'ret', flux_init() could return success with PMI not initialized. Ensure that a "goto err_exit" returns OPAL_ERROR if 'ret' is not set to some other error code. pmix/flux: don't mix OPAL_ and PMI_ return codes Problem: flux_init() can return both PMI_ and OPAL_ return codes. Although OPAL_SUCCESS and PMI_SUCCESS are both defined as 0, other codes are not compatible. Ensure that flux_init() consistently uses 'rc' for PMI_ return codes and 'ret' for OPAL_ return codes. pmix/flux: factor out repeated code for cache put Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-16 18:26:38 -08:00
Nathan Hjelm	a718743a5c	opal/timer: add code to check if rtdtsc is core invariant Newer x86 processors have a core invariant tsc. On these systems it is safe to use the rtdtsc instruction as a monotonic timer. This commit adds a new function to the opal timer code to check if the timer backend is monotonic. On x86 it checks the appropriate bit and on other architectures it parrots back the OPAL_TIMER_MONOTONIC value. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-12-16 15:11:50 -07:00
rhc54	00b87ea829	Merge pull request #2584 from rhc54/topic/warnings Reduce the flood of warnings due to uninitialized variables, mismatch…	2016-12-15 10:09:01 -08:00
Ralph Castain	585540bcee	Reduce the flood of warnings due to uninitialized variables, mismatched types, and unused things to a more bearable trickle Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-14 16:33:50 -08:00
Gilles Gouaillardet	a019095b84	pmix2x/class: correctly handle concurrent class initialization (back-ported from upstream commit pmix/master@ceedbd67fd) Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-15 09:07:24 +09:00
Ralph Castain	884fb7fcf2	Update the PMIx2 support to include the latest shared memory optimizations Update ORTE support for dynamic PMIx operations e.g., PMIx_Spawn Update to track master Ensure that --disable-pmix-dstore actually disables the dstore. Sync to a few debugger updates Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-14 15:00:10 -08:00
Ralph Castain	1961a1c22a	Use the server tmpdir instead of the system tmpdir for tool contact files Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-14 08:42:09 -08:00
Ralph Castain	fbed2d794a	Update to latest PMIx master + PTL branch Update the usock component to disable it Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-06 20:47:44 -08:00
Ralph Castain	d51821cbc7	Update pmix check headers to support Open BSD Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-06 19:37:06 -08:00
Ralph Castain	f91f8ce494	Protect against NULL param Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-03 21:19:32 -08:00
Ralph Castain	f633d5c1b1	Initialize var Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-03 21:08:45 -08:00
rhc54	003f7d308f	Merge pull request #2504 from hppritcha/topic/fix_pmix_base_help_file pmix: Fix pmix base help file.	2016-12-03 11:57:07 -08:00
Ralph Castain	a43fae74a5	Avoid hanging in show_help if PMIx was unable to initialize Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-03 10:12:58 -08:00
Ralph Castain	a4e3f615e3	Fix executable modes of pmix config scripts Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-02 20:50:59 -08:00
Ralph Castain	342dbfcf4e	Update Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-02 20:33:03 -08:00
Ralph Castain	8e64382edf	Update to correct tarball Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-02 19:16:07 -08:00
Ralph Castain	1a0bccb536	Now that PMIx has settled on its release strategy and numbering, update the OPAL pmix framework to track Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-02 15:44:43 -08:00
Howard Pritchard	3049848731	Fix pmix base help file. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-12-02 15:03:22 -06:00
Ralph Castain	6041467df0	Update to latest PMIx master Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-01 14:47:44 -08:00
Gilles Gouaillardet	3a76a78bff	btl/openib: plug a memory leak in btl_openib_register_mca_params() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:30 +09:00
Gilles Gouaillardet	c9aeccb84e	opal/if: open the if framework once in opal_init_util the if framework is no longer opened in opal_if*, which plugs several memory leaks Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:30 +09:00
Gilles Gouaillardet	45732fd764	hwloc/base: fix a memory leak in buffer_cleanup() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Gilles Gouaillardet	2739346a18	opal: invoke mca_base_close() in opal_finalize_util() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Howard Pritchard	7ce3ca25ef	Merge pull request #2458 from hppritcha/topic/pmix_cray_no_dlopen pmix/cray: abort job if using aprun for general case	2016-11-25 14:03:20 -07:00
Howard Pritchard	eee9f7ae3a	pmix/cray: abort job if using aprun for general case It turns out that there is an incompatibility between the Cray PMI library and the default configuration for building Open MPI (master). To work around this, we now disable use of aprun for direct launch of Open MPI jobs except under specific conditions. The problem is that there are now (on master) packages getting initialized that do not work properly across a fork operation. As part of a constructor in the Cray PMI library, a fork operation is done to simplify use of shared memory between the processes in a job on the same node. This ends up thoroughly messing up the Open MPI initialization process in the case that dlopen support is enabled. The initialization process gets about half-way through when the PMIX framework is opened and components are loaded, which triggers the Cray PMI constructor and hence the fork operation. There are two workarounds for this: 1) configure Open MPI for Cray XE/XC systems using aprun with the --disable-dlopen option 2) set the PMI_NO_FORK environment variable in the shell in which the aprun command is run. Without taking these measures, an Open MPI job will just hang at job startup in the first attempt to "thread-shift" the PMIx fence_nb operation. Additional hangs occur at shutdown if this problem is worked around, again due to the insertion of a fork operation halfway through the Open MPI initialization procedure. This commit detects if the conditions that bring out the hang situation are present, and if so, prints out a message and aborts the job launch. Note on systems using slurm, the PMI_NO_FORK environment variable is set as part of the srun job launch, hence this issue is avoided on those systems. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-25 06:28:19 -07:00
Gilles Gouaillardet	8fd1c3f0df	opal/util: handle a race condition in opal_os_dirpath_destroy A file might have been destroyed by another task between readdir() and stat(), so simply ignore stat() failure. That typically occurs when one task is removing the job_session_dir and another task is still removing its proc_session_dir. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-24 10:45:48 +09:00
Gilles Gouaillardet	1a279c4ee9	btl/self: fix fragment segment length in mca_btl_self_prepare_src() opal_convertor_pack() might pack less bytes than requested, so always set frag->segments[0].seg_len. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-24 10:44:56 +09:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit `cb55c88a8b`.	2016-11-22 15:03:20 -08:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Howard Pritchard	2bb4cffffd	Merge pull request #2447 from hppritcha/topic/compiler_warning_swats btl/ugni:vader swat some compiler warnings	2016-11-22 05:28:20 -07:00
Howard Pritchard	09f47fcf8e	btl/ugni:vader swat some compiler warnings Swat some compiler warnings. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-21 14:58:34 -06:00
Howard Pritchard	2cbc0e8472	pmix/cray: fix disable-dlopen problem PR open-mpi/ompi#2432 introduced a regression where configure and build with --disable-dlopen caused a build failure owing to unresolved alps lli symbols in the libopal-pal shared library. This commit fixes this problem. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-21 13:45:10 -06:00
Howard Pritchard	0bbb319246	Merge pull request #2444 from hppritcha/topic/cray_pmix_ws_cleanup pmix/cray: whitespace cleanup	2016-11-21 06:03:56 -07:00
Howard Pritchard	08dce4f161	pmix/cray: whitespace cleanup Get rid of tabs. This is anti-ompi style. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-18 19:30:40 -07:00
Howard Pritchard	44f4663d0d	Merge pull request #2432 from hppritcha/topic/fix_info_etc_w_alps pmix/cray: set some envars for MPI_INFO_ENV object	2016-11-18 14:23:53 -07:00
Howard Pritchard	de3de131af	pmix/cray: set some envars for MPI_INFO_ENV object Enhance the cray pmix component to set some OMPI internal env. variables used to set some key/value pairs on the MPI_INFO_ENV object. This allows more of the ompi-tests ibm unit tests to pass when using aprun/srun direct launch and Cray PMI. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-16 17:37:52 -06:00
Joshua Ladd	4907085c6f	Add the ConnectX-5 device ID to openib BTL. Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>	2016-11-16 21:42:37 +02:00
Gilles Gouaillardet	8ef538adeb	Merge pull request #2398 from bosilca/topic/tcp_endpoints_mutex Protect the tcp_endpoints list from concurrent accesses.	2016-11-14 22:13:29 -07:00
Howard Pritchard	703b464c03	pmix: fix a typo in a help file Fixes #2391 Thanks to @njoly for reporting Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-12 11:49:15 -07:00
George Bosilca	d0dddef53d	Protect the tcp_endpoints list from concurrent accesses. Thanks Gilles for your help. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2016-11-11 00:06:03 -05:00
Gilles Gouaillardet	a49422fe84	btl/tcp: get rid of the MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD macro since pthreads are now mandatory, the MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD is always true and hence can be safely removed Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-08 14:00:05 +09:00
Gilles Gouaillardet	11dc86f26b	cleanup: always #include <pthread.h> pthreads are now mandatory, so there is no more need to Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-08 13:07:45 +09:00
Aboorva Devarajan	fb8e074583	powerpc: Add support for powerpcle in timer/pstat. Signed-off-by: Aboorva Devarajan <abodevar@in.ibm.com>	2016-11-07 02:35:44 -05:00
Gilles Gouaillardet	7a2894f1e0	event/libevent2022: cleanup dependencies to the embedded libevent lib configury force event/libevent2022 to be built as a static module, so simplify Makefile.am Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-07 14:57:52 +09:00
Gilles Gouaillardet	6f7ed1f552	event/libevent2022: add missing dependencies to the embedded libevent lib force the libevent2022 component rebuild if the embedded libevent is updated Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-04 11:13:44 +09:00
Jeff Squyres	a4ffa590c8	Merge pull request #2308 from hjelmn/vader_mem btl/vader: reduce memory footprint when using xpmem	2016-11-02 10:28:26 -04:00
Steve Wise	7050969d47	openib btl: remove BTL_OPENIB_FAILOVER_ENABLED code Remove BTL_OPENIB_FAILOVER_ENABLED code in the openib btl source. Remove the failover-specific files from the openib btl. Update the openib/Makefile.am accordingly. Remove the -enable-openib-failover config logic. Signed-off-by: Steve Wise <swise@opengridcomputing.com>	2016-11-01 14:45:36 -07:00
Ralph Castain	64873487b4	Remove the max_connections parameter from the radix component as it is confusing. Modify PMIx client init so that it simply returns the nspace/rank if called by a server - this allows the server to retrieve its assigned ID. Register the server's nspace so client-side operations can succeed Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-01 12:17:11 -07:00
Jeff Squyres	149b660666	btl/usnic: fix compiler warning Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-10-28 07:36:20 -07:00
rhc54	698dac108b	Merge pull request #2309 from rhc54/topic/pmix Update to latest PMIx master - mostly updates example codes, but incl…	2016-10-27 14:21:46 -07:00
Ralph Castain	f4a55118e6	Update to latest PMIx master - mostly updates example codes, but includes one critical cleanup during finalize. NOTE: set dstore shared memory storage option to "on" by default Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-10-27 11:16:03 -07:00
Nathan Hjelm	9d92075e60	btl/self: rewrite to decrease memory usage (#2307) This commit rewrites much of the btl/self component to fix a long standing memory usage bug. Before this commit the prepare_src path would always allocate a max send fragment (256kB). This caused the rank to allocate 32 * 256k useless buffers from one send. This commit makes the following changes: - Add the MCA_BTL_FLAGS_GET flag by default. No reason not to set it. - Reduce the eager limit, max send size, buffers per allocation, and maximum buffer count per fragment size. These changes should have no noticeable effect on performance but should greatly reduce the memory usage of the component. - Implement the sendi function. This should reduce self send latency somewhat. - Rewrite prepare_src to never allocate an eager or max send fragment for contiguous data. - add_procs needs to return something in the peer array for the proc self not just set the reachability bit. Now stores (void *) 1. - Various cleanups. Removed an unused file. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-10-27 12:34:54 -04:00
Nathan Hjelm	a652a193ea	btl/vader: reduce memory footprint when using xpmem The vader btl kept a per-peer registration cache to keep track of attachments. This is not really a problem with small numbers of local ranks but can be a problem with large SMP machines. To reduce the footprint there is now one registration cache for all xpmem attachments. This will probably increase the lookup time for large transfers but is a worthwhile trade-off. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-10-27 10:09:43 -06:00
Ralph Castain	f298f294e1	Update PMIx to latest master tarball. Ensure we set the HNP name for orted's so that PMIx_Lookup can find the server Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-10-26 15:48:56 -07:00
Gilles Gouaillardet	8cc3f288c9	opal: fix opal_class_finalize() usage the class system can be initialized/finalized as many times as we like, so there is no more need to have opal_class_finalize() invoked in a destructor Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-10-26 15:15:54 +09:00
Gilles Gouaillardet	f2a80dc09f	configury: check libnl version and abort in case of conflict libnl and libnl-3 are known to conflict with each other, so detect and abort if these two libs are both used directly (e.g. Open MPI uses libnl-3) or indirectly (e.g. libibverbs.so might depend on libnl)	2016-10-25 09:23:59 +09:00
Ralph Castain	649301a3a2	Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier). Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.	2016-10-23 21:52:39 -07:00
rhc54	900ae15d49	Merge pull request #2221 from bharatpotnuri/master btl/openib: remove unwanted ompi header inclusion in opal code.	2016-10-21 14:05:55 -05:00
Ralph Castain	9131eca9c6	Update to latest PMIx master	2016-10-20 21:13:40 -07:00
Ralph Castain	be3197fe27	Ensure that the libevent headers are installed for external libevent when --with-devel-headers is given. Correct the path for opal_config.h in the external hwloc header	2016-10-20 20:57:50 -07:00
Ralph Castain	2f966bf3bf	Cleanup external PMIx v3 component for copy/paste errors - component and module require unique names	2016-10-20 09:11:46 -07:00
Ralph Castain	8113a8d1b0	Now that we are hiding symbols in the internal PMIx component, we cannot reuse that component for integration to the external PMIx master as the symbols don't match. So create a new "ext3x" component and copy the PMIx v3 integration over there. Also, remove a couple of build-product files from the pmix3x component.	2016-10-18 13:15:32 -07:00
Ralph Castain	50c9f3de55	Ensure the PMIx progress thread is stopped prior to tearing anything down. Thanks to Gilles for spotting this error!	2016-10-18 00:27:52 -07:00
Gilles Gouaillardet	4e19cd51b1	hwloc/external: add a missing include file	2016-10-14 09:27:33 +09:00
Ralph Castain	6f65d0a173	Repair event notification support. Cleanup the long-suffering "epoll: warning" coming out of libevent whenever a process abnormally terminated. Add changes to test program Sync to PMIx master	2016-10-13 16:27:39 -07:00
Ralph Castain	6417f217e1	Turn PMIx dstore off by default as MTT was effectively broken	2016-10-13 08:14:51 -07:00
Potnuri Bharat Teja	29f1aa836f	btl/openib: remove unwanted ompi header inclusion in opal code. OMPI header cannot be included in OPAL source code, hence removed it. Fixes: (`740b636db`) btl/openib: Disqualify rdmacm CPC if MPI_THREAD_MULTIPLE. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>	2016-10-13 16:21:36 +05:30
Nathan Hjelm	5b40fd267f	Merge pull request #2204 from hjelmn/arm64 asm/arm64: ensure instruction ordering on timer	2016-10-12 11:22:28 -06:00
Nathan Hjelm	9a50ce6364	asm/arm64: ensure instruction ordering on timer Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-10-12 09:25:21 -06:00
Ralph Castain	8f05beb1ec	Sync pmix/master@cb53105	2016-10-11 20:54:59 -07:00
rhc54	ad156e3e91	Merge pull request #2207 from rhc54/topic/pmixupdate Update PMIx support to latest PMIx master	2016-10-11 18:57:11 -05:00
Jeff Squyres	bcbf0bc4f9	usnic: s/OMPI/OPAL/ Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-10-11 16:43:35 -07:00
Ralph Castain	6ce4b6d098	Eliminate -Wall from being hardcoded	2016-10-11 12:50:31 -07:00
Ralph Castain	1859b03416	Enable PMIx shared memory support by default	2016-10-11 12:18:01 -07:00
Ralph Castain	1d7d7c201b	Update PMIx support to latest PMIx master	2016-10-11 10:17:23 -07:00
Ralph Castain	5b1484a836	Implement the backend support for process-generated event notification	2016-10-08 09:24:28 -07:00
Gilles Gouaillardet	0d24fad307	opal: always run opal_class_finalize in the opal_cleanup destructor if MPI_Init[_thread]/MPI_Finalize and MPI_T_init_thread/MPI_T_finalize are balanced, opal_initialized is zero, and hence opal_cleanup destructor never invokes opal_class_finalize. if neither MPI_Init[_thread] nor MPI_T_init_thread has been called, classes is NULL, so opal_class_finalize does nothing	2016-10-08 16:58:20 +09:00
Gilles Gouaillardet	b55dd2442a	libevent2022: rename _event_strlcpy	2016-10-08 16:58:20 +09:00
Gilles Gouaillardet	c92e9a5406	use the new OPAL_HASH_TABLE_FOREACH convenience macro	2016-10-08 16:58:20 +09:00
Gilles Gouaillardet	23a8f764bd	opal: add the OPAL_HASH_TABLE_FOREACH macro this is a convenience macro similar to the OPAL_LIST_FOREACH macro, that can be used to iterate on all the key/value pairs of an opal_hash_table_t	2016-10-08 16:58:20 +09:00
Gilles Gouaillardet	014f917462	opal: fix comment in OPAL_LIST_FOREACH macro. no code change.	2016-10-08 16:58:19 +09:00
Gilles Gouaillardet	f1f1fb15eb	pmix3x: configury: output major, minor and release version after checking them and hence fix the configure output (back-ported from upstream commit pmix/master@7b7cdda2de)	2016-10-08 13:01:28 +09:00
Gilles Gouaillardet	f3af799608	pmix3x: misc fixes to get pmix build on Solaris - replace MAXHOSTNAMELEN with hardcoded 1024. unlike Linux, Solaris #define MAXHOSTNAMELEN in <netdb.h>, so use a hard coded value to keep the test simple - stdout cannot be assigned on Solaris, so use freopen instead (back-ported from upstream commit pmix/master@a63f6e53f4)	2016-10-08 13:01:28 +09:00
Gilles Gouaillardet	5cbfddb8f1	pmix3x: fix misc memory leaks (back-ported from upstream commit pmix/master@1eff526929)	2016-10-08 13:01:28 +09:00
Gilles Gouaillardet	b4e4e4a5f1	pmix3x: enhance pmix_nspace_t destructor PMIX_RELEASE all elements stored in the internal and modex hash tables (back-ported from upstream commit pmix/master@b90674fc52)	2016-10-08 13:01:27 +09:00
Gilles Gouaillardet	f1dc033767	pmix3x: add the PMIX_HASH_TABLE_FOREACH macro this is a convenience macro similar to the PMIX_LIST_FOREACH macro, that can be used to iterate on all the key/value pairs of a pmix_hash_table_t (back-ported from upstream commit pmix/master@349971c68c)	2016-10-08 13:01:27 +09:00
Jeff Squyres	67684be7c9	usnic: fix one last stray fabric_attr->name --> linux_device_name Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-10-04 18:17:38 -07:00
Jeff Squyres	8b77359cac	usnic: remove some legacy libfabric 1.0/1.1 code We only support running with libfabric v1.3 or greater. So it's safe to remove the legacy/adaptive cq_readerr() behavior. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-10-03 11:59:41 -07:00
Jeff Squyres	345c07a252	usnic: require libfabric >= v1.3 at run time There are critical usnic libfabric AV insert bugs before v1.3, so don't allow any version prior to v1.3 at run time (still allow compiling with earlier versions, though, since the ABI guarantees allow us to compile with an earlier libfabric and run with a later libfabric). Switch to using fi_version() to check the version (instead of calling fi_getinfo()) as a potentially lighter-weight / simpler solution. This allows us to only call fi_getinfo() once. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-10-03 11:59:41 -07:00
Jeff Squyres	b13813810f	usnic: print a helpful message invoke PML error callback The previous message was unhelpful / confusing. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-10-03 11:59:41 -07:00
Gilles Gouaillardet	7601e783cc	pmix3x: sec/munge: add a missing include file (cherry picked from upstream pmix/master@f7cfb11f6b)	2016-10-03 16:09:10 +09:00
Ralph Castain	e773c17cf3	Put show_help thru the PMIx "log" API. This pushes the show_help output from apps into the pmix thread, thus avoiding conflicts in the RML thread, which should help with thread lock situations.	2016-10-02 16:02:23 -07:00
Jeff Squyres	545d8f2e66	usnic cagent: correctly compute the "large" ping message size The (effective) "+42" computation was, in fact, the incorrect answer in this case (gasp!). We should just take the max_msg_size from the command (which came from the libfabric endpoint max_msg_size attribute in the client) and subtract off the max header size: 68 (which is explained in the comment). This will result in a "large" message size which is likely slightly smaller than the MTU, but still right up near the MTU, and therefore good enough. Note: the old computation (i.e., -(68-42)) worked fine when we asked for Libfabric API v1.1 because the usnic provider would return a max_msg_size that was already less than the MTU due to FI_PREFIX behavior shenanigans. Once we started asking for Libfabric API v1.4, the usnic Libfabric provider started returning (MTU + prefix_size), and the -(68-42) computation started giving a value that was over the MTU. This caused sendto() on the connectivity checker UDP socket to fail. This commit also removes an old/misleading comment. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-09-30 17:01:05 -07:00
Joshua Hursey	f6f24a4f67	build: Custom libmpi(_FOO) name option in configure * Add a configure time option to rename libmpi(_FOO).* - `--with-libmpi-name=STRING` * This commit only impacts the installed libraries. Internal, temporary libraries have not been renamed to limit the scope of the patch to only what is needed. For example: ```shell shell$ ./configure --with-libmpi-name=wookie ... shell$ find . -name "libmpi" shell$ find . -name "libwookie" ./lib/libwookie.so.0.0.0 ./lib/libwookie.so.0 ./lib/libwookie.so ./lib/libwookie.la ./lib/libwookie_mpifh.so.0.0.0 ./lib/libwookie_mpifh.so.0 ./lib/libwookie_mpifh.so ./lib/libwookie_mpifh.la ./lib/libwookie_usempi.so.0.0.0 ./lib/libwookie_usempi.so.0 ./lib/libwookie_usempi.so ./lib/libwookie_usempi.la shell$ ```	2016-09-29 21:47:24 -05:00
Gilles Gouaillardet	871ade9231	pmix/{cray,s1,s2}: make pmi_opcaddy_t class static These three pmix components use the same class name; declare it as static so Open MPI can be built with --disable-dlopen. Thanks to Limin Gu for the report.	2016-09-28 09:18:36 +09:00
Jeff Squyres	1a5a5fb400	Merge pull request #1861 from bharatpotnuri/master btl/openib: Disqualify rdmacm CPC if MPI_THREAD_MULTIPLE	2016-09-27 13:03:35 -04:00
Potnuri Bharat Teja	740b636dbe	btl/openib: Disqualify rdmacm CPC if MPI_THREAD_MULTIPLE The rdmacm CPC in the openib BTL is not thread safe. The rdmacm CPC should disqualify itself (instead of failing in random ways) if MPI_THREAD_MULTIPLE is the thread level. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>	2016-09-27 14:20:59 +05:30
Gilles Gouaillardet	1fbc9a5431	pmix3x: dstore/pmix: flock portability Using the fcntl-locking instead of the flock (back-ported from upstream pmix/master@3030a0cca1)	2016-09-27 13:21:03 +09:00
George Bosilca	066370202d	Support non-monotonic assembly timers. If monotonic support has been required by the runtime and the assembly timers are unable to provide it, fall back to clock_gettime.	2016-09-23 21:51:34 -04:00
George Bosilca	45dcf1f5d7	Always use the best timer available If we have a better timer than clock_gettime, use it, even if it is an assembly timer.	2016-09-23 19:32:58 -04:00
George Bosilca	93fa94f96f	Re-enable support for local addresses. This patch is based on the "RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)". It removes the hardcoded exception for the local devices that has been enforced by the TCP BTL. Instead, we exclude the local interface only via the exclude MCA (both IPv4 and IPv6 local addresses are already in the default if_exclude), which is also the behavior currently described in our README file.	2016-09-23 13:04:33 -04:00
Gilles Gouaillardet	362a5886de	pmix3x: client: fix PMIx_Finalize() sequence pmix_progress_thread_finalize() invokes libevent event_base_free, so all libevent stuff cannot be used after. Hence, pmix_client_globals.myserver must be PMIX_DESTRUCT'ed before invoking pmix_progress_thread_finalize()	2016-09-24 00:01:23 +09:00
Gilles Gouaillardet	5479c6cca7	pmix3x: add missing #include and get Open MPI build on OpenBSD 6.0	2016-09-23 11:23:18 +09:00
Gilles Gouaillardet	eaee1332e1	opal/util/ethtool: add missing headers and get Open MPI build on OpenBSD 6.0	2016-09-23 11:22:19 +09:00
Ralph Castain	a14ec3bdbc	Mucho thanks to Gilles - his patch to reorder the CPPFLAGS solves the problem of inadvertently picking up hwloc and libevent headers from locations in CPPFLAGS while continuing to build the embedded versions. Also silence a minor warning about an uninitialized var.	2016-09-22 07:39:22 -07:00
George Bosilca	131fe42db8	Fix MT wait-sync. Prevent a race condition between a thread checking count and then going in cond_wait, and another thread setting the count to 0 and signaling the condition. Thanks to Pascal Deveze for catching the bug and for the initial patch.	2016-09-21 07:42:48 -04:00
Gilles Gouaillardet	fbf03299c3	Merge pull request #2079 from ggouaillardet/topic/pmix_configury_dlopen pmix3x: configury: correctly handle --disable-dlopen	2016-09-21 10:59:33 +09:00
Gilles Gouaillardet	6c1e25b76e	pmix/ext11: fix pmix1_value_unload() prototype and call An unused "key" argument was added to pmix1_value_unload(), and pmix1_value_unload() was sometimes invoked with two arguments instead of three. Since the "key" argument is unused, simply remove it from the subroutine prototype and calls.	2016-09-20 14:34:41 +09:00
Gilles Gouaillardet	e6f7facd7d	opal/util: improve error message in opal_os_dirpath_create()	2016-09-18 17:10:47 +09:00
Gilles Gouaillardet	4b47daeeb0	opal/util: improve return status of opal_os_dirpath_create()	2016-09-18 12:32:42 +09:00
George Bosilca	295eec7059	Small fix for persistent receives. A minor optimization, a few typos and extra comments	2016-09-16 10:27:32 -04:00
Nathan Hjelm	2edc77b27b	asm/ppc: work around apparent PGI 16.9 bug The add_64, sub_64, and cmpset_64 atomics used "+m" (*addr) to indicate the asm also writes the memory location. This is better than using a memory clobber. PGI 16.9 introduced a bug that causes a compiler failure on the "+m" constraint (input/output). It seems to work with "=m" (output) which matches the 32-bit atomics. Fixes open-mpi/ompi#2086 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-15 12:43:31 -06:00
Gilles Gouaillardet	041a431966	pmix3x: configury: correctly handle --disable-dlopen The LT_* macros overwrite the enable_dlopen variable, so it must be tested and saved before invoking LT_INIT. Delay the invocation of the LT_* macros and use the PMIX_ENABLE_DLOPEN_SUPPORT variable to figure out whether --disable-dlopen was invoked.	2016-09-15 13:26:20 +09:00
Nathan Hjelm	4c9e38e8e0	Merge pull request #2077 from hjelmn/tcp_fix btl/tcp: fix double list remove	2016-09-13 12:21:52 -06:00
Nathan Hjelm	a681837ba8	btl/tcp: fix double list remove This commit fixes an abort during finalize because pending events were removed from the list twice. References #2030 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-13 09:23:12 -06:00
Gilles Gouaillardet	628c730196	pkgconfig: define the pkgincludedir variable in *.pc files This has been made necessary by open-mpi/ompi@12e796dcaf Refs open-mpi/ompi#2069	2016-09-13 09:50:14 +09:00
Artem Polyakov	9eba1b0b75	Merge pull request #2042 from artpol84/pmix_sdirs Several fixes related to session directories:	2016-09-07 14:15:47 +07:00
Gilles Gouaillardet	cd2b5a82ed	hwloc: plug memory leak as reported by Coverity with CID 1270441	2016-09-07 10:08:44 +09:00
Gilles Gouaillardet	44a66e208c	threads: fix WAIT_SYNC_INIT with a zero count WAIT_SYNC_INIT(sync,0); WAIT_SYNC_RELEASE(sync); hung because sync->signaled was initialised to true, and there is no reason to invoke WAIT_SYNC_SIGNALED(sync) before WAIT_SYNC_RELEASE(sync). This commit initializes sync->signaled to true unless the count is zero. Thanks George for the review and guidance.	2016-09-07 10:03:40 +09:00
Nathan Hjelm	27a2509fec	Merge pull request #2051 from hjelmn/ppc_asm opal/asm: updates to powerpc assembly	2016-09-06 15:13:28 -06:00
Jeff Squyres	527efec4fb	Merge pull request #2050 from jsquyres/pr/btl-tcp-help-messages Add a show_help message to TCP BTL when peer unexpectedly disconnects	2016-09-06 09:40:31 -04:00
Jeff Squyres	1953e3406f	btl/tcp: add show_help message when peer hangs up We commonly see messages on the users list where a peer has hung up because it has crashed. Instead of having just a BTL_ERROR message, make this a real opal_show_help() message that tells the user that the peer unexpectedly hung up, and they should look into why that peer hung up. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-09-06 09:40:03 -04:00
Gilles Gouaillardet	894be7860a	gcc_builtin/atomic: Silence numerous warnings from Studio compilers This commit adds selective use of a compiler-specific pragma to silence the numerous warnings the Sun/Oracle/Studio compilers emit for the GNU-style inline asm used in atomic.h. Thanks Paul Hargrove for the initial patch and the guidance.	2016-09-06 09:07:16 +09:00
Gilles Gouaillardet	4b208e4463	btl/tcp: make mca_btl_tcp_proc_insert re-entrant otherwise bad things happen with --mca btl_tcp_progress_thread 1 (non default) and --mca mpi_add_procs_cutoff 0 (default)	2016-09-05 15:57:34 +09:00
Artem Polyakov	dc0ab674de	Add PMIx key to provide RM with ability to indicate that it will cleanup session directories provided through OPAL_PMIX_TMPDIR, OPAL_PMIX_NSDIR, OPAL_PMIX_PROCDIR	2016-09-05 07:48:44 +03:00
Nathan Hjelm	a36bdfe69f	opal/asm: updates to powerpc assembly This commit contains the following changes: - There is a bug in the PGI 16.x betas for ppc64 that causes them to emit the incorrect instruction for loading 64-bit operands. If not cast to void * the operands are loaded with lwz (load word and zero) instead of ld. This does not affect optimized mode. The work around is to cast to void * and was implemented similar to a work-around for a xlc bug. - Actually implement 64-bit add/sub. These functions were missing and fell back to the less efficient compare-and-swap implementations. Thanks to @PHHargrove for helping to track this down. With this update the GCC inline assembly works as expected with pgi and ppc64. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-02 23:47:47 -06:00
Jeff Squyres	95c6f6cfc0	btl/tcp: fix help message It looks like one help message was accidentally pasted in the middle of another. Disentangle the two messages from each other, and slightly tweak the one message to say that the job may also crash (in addition to hanging). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-09-02 17:14:22 -04:00
Nathan Hjelm	f93c1f2106	btl/ugni: fix erroneous warning message This commit prevents the connection code from trying to connect an endpoint if the directed datagram has been posted but not received. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-02 09:17:44 -06:00
Ralph Castain	34f04a7924	Remove spurious Makefile.am line	2016-09-01 15:31:09 -07:00
Ralph Castain	0ea1cff733	Implement notification of completion on comm_spawn'd child jobs. Add a configure flag to enable PMIx 3's shared memory datastore, and disable it by default so that comm_spawn functions again. Will reverse the default once that feature is fully functional	2016-09-01 13:10:10 -07:00
rhc54	39d086e000	Merge pull request #2035 from rhc54/topic/memprofile Provide a mechanism for obtaining memory profiles of daemons and application profiles for use in studying our memory footprint	2016-08-31 14:06:48 -05:00
Ralph Castain	39992d1ad7	Silence trivial Coverity warnings	2016-08-31 09:42:33 -07:00
Ralph Castain	c1050bc01e	Provide a mechanism for obtaining memory profiles of daemons and application profiles for use in studying our memory footprint. Setting OMPI_MEMPROFILE=N causes mpirun to set a timer for N seconds. When the timer fires, mpirun will query each daemon in the job to report its own memory usage plus the average memory usage of its child processes. The Proportional Set Size (PSS) is used for this purpose.	2016-08-31 09:32:07 -07:00
Ralph Castain	cfa784c9a6	Since we changed storage to pointers in pmix_value_t, we need to allocate space for those values when unpacking	2016-08-29 20:22:24 -07:00
George Bosilca	a6d515ba9e	Fixes opal_atomic_ll_64. Thanks to Paul Hargrove for the report and his patch. This is an addition to #1140 and should go in 2.x	2016-08-27 12:43:48 -04:00
Nathan Hjelm	d33204b0dc	Merge pull request #2021 from hjelmn/xlc_fix opal/patcher: fix xlc support	2016-08-26 18:15:41 -06:00
rhc54	b90a64e734	Merge pull request #2022 from rhc54/topic/nnodes Provide the number of nodes in the job	2016-08-26 18:15:24 -05:00
Ralph Castain	2f6e0fec90	Provide the number of nodes in the job	2016-08-26 14:50:41 -07:00
Jeff Squyres	09ad7e81eb	Merge pull request #2007 from jsquyres/pr/usnic-show-local-udp-ports usnic: show the local UDP ports	2016-08-26 17:03:16 -04:00
Nathan Hjelm	a9bc692d99	opal/patcher: fix xlc support The xlc compiler seems to behave differently than gcc when it comes to inline asm. There were two problems with the code with xlc: - The TOC read in mca_patcher_base_patch_hook used the syntax register unsigned long toc asm("r2") to read $r2 (the TOC pointer). With gcc this seems to behave as expected but with xlc the result in toc is not the same as $r2. I updated the code to use asm volatile ("std 2, %0" : "=m" (toc)) to load the TOC pointer. - The OPAL_PATCHER_BEGIN macro is meant to be the first thing in a hook. On PPC64 it loads the correct TOC pointer (thanks to mca_patcher_base_patch_hook) and saves the old one. The OPAL_PATCHER_END macro restores the TOC pointer. Because we need the TOC to be correct before it is accessed in the hook the OPAL_PATCHER_BEGIN macro MUST come first. We did this and all was well with gcc. With xlc on the other hand there was a TOC access before the assembly inserted by OPAL_PATCHER_BEGIN. To fix this quickly I broke each hook into a pair of functions with the OPAL_PATCHER_* macros on the top level functions. This works around the issue but is not a clean way to fix this. In the future we should 1) either update overwrite to not need this, or 2) figure out why xlc is not inserting the asm before the first TOC read. This fixes open-mpi/ompi#1854 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-26 14:43:03 -06:00
Jeff Squyres	87a5ccc060	usnic: show the local UDP ports Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-26 12:25:18 -07:00
Jeff Squyres	e03a40a0e9	pmix3x: remove generated file Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-26 10:30:47 -07:00
Jeff Squyres	9ae51a09f2	Merge pull request #1989 from jsquyres/pr/update-usnic-to-libfabric-v1.4 Update usnic BTL to libfabric v1.4	2016-08-26 09:53:07 -04:00
Gilles Gouaillardet	e4bf915e75	pmix3x: remove auto-generated file remove opal/mca/pmix/pmix3x/pmix/src/include/pmix_config.h.in .gitignore is correct, so it seems this file was added before .gitignore was updated	2016-08-26 15:00:18 +09:00
Ralph Castain	af67f16422	Update configury to support multiple PMIx versions, rename pmix2x component to pmix3x for support of PMIx master Update support for external v1.1.x and v2.x libraries. Minor corrections to the v3.x component	2016-08-25 18:19:05 -07:00
Gilles Gouaillardet	277c319389	opal/util: fix (again and again) incorrect type casting in opal_path_df and silence CID 1371767 this fixes previous commits : - open-mpi/ompi@2eec8970ff - open-mpi/ompi@a439afce5b	2016-08-26 09:42:45 +09:00
Nathan Hjelm	de32c779e2	opal/wait_sync: add #if protection on header Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-25 14:31:52 -06:00
Jeff Squyres	f56b16f079	usnic: remove unused variable Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 03:53:18 -07:00
Jeff Squyres	9717bcb7e6	btl/usnic: remove stale comment Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 03:53:18 -07:00
Jeff Squyres	6f5e377fe0	btl/usnic: update for libfabric v1.4 With libfabric v1.4, the usnic provider changed the values of its fabric and domain name strings (compared to libfabric <v1.4). Update the Open MPI usNIC BTL to handle both pre-v1.4 and v1.4 fabric/domain names. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-25 03:53:17 -07:00
George Bosilca	3adff9d323	Fixes #1793 . Reshape the tearing down process (connection close) to prevent race conditions between the main thread and the progress thread. Minor cleanups.	2016-08-24 22:45:19 -04:00
Nathan Hjelm	83062db7cb	btl/ugni: actually make the endpoint lock recursive Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-24 10:36:08 -06:00
Gilles Gouaillardet	2eec8970ff	opal/util: fix (again) incorrect type casting in opal_path_df this fixes previous commit open-mpi/ompi@a439afce5b	2016-08-24 12:50:15 +09:00
Gilles Gouaillardet	02847d9e7b	pmix2x: dstore: add missing <fcntl.h> include file in pmix_esh.c (back-ported from upstream pmix/master@5c66ffe0f0)	2016-08-24 11:18:46 +09:00
Gilles Gouaillardet	c11e8163f8	pmix2x: sec/native: fix the pmix_native module under solaris by using getpeerucred() and fail with a user friendly message if no method is available: "sec: native cannot validate_cred on this system" (back-ported from upstream pmix/master@c474a1fc60)	2016-08-24 11:18:40 +09:00
Gilles Gouaillardet	e91292aa41	pmix2x: configury: add missing check for <netdb.h> header file (back-ported from upstream pmix/master@e54ce6d423)	2016-08-24 11:18:32 +09:00
Gilles Gouaillardet	a439afce5b	opal/util: fix incorrect type casting in opal_path_df	2016-08-24 10:26:13 +09:00
Potnuri Bharat Teja	9b7f9ece20	Add Chelsio T6 adapter device parameters. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>	2016-08-23 10:38:13 +05:30
Ralph Castain	639dbdb7ea	For maintainability, fold the external PMIx 2.x integration into the internal PMIx 2.x library component. This ensures that we always stay in sync with the two as that is becoming a problem.	2016-08-22 13:28:55 -07:00
George Bosilca	fd57f5bccd	Remove some of the clang warnings.	2016-08-20 14:21:42 -04:00
Ralph Castain	61ffba668b	Roll in the latest PMIx version - includes shared memory datastore and reduced memory footprint	2016-08-20 07:53:06 -07:00
Artem Polyakov	6ea8cccdab	Merge pull request #1969 from artpol84/pmix_jobid_fix Pmix jobid fix	2016-08-18 17:24:58 +07:00
Ralph Castain	7da9793fef	Support the PMIX_TIMEOUT key at the PMIx server when timeout=0 - this indicates that the user doesn't want a lookup of any data from the host RM.	2016-08-17 16:26:58 -05:00
Gilles Gouaillardet	3126ff77e2	pmix2x: common syms: whitelist bison-generated common symbols Bison generates some common symbols that we can't do anything about, so whitelist them.	2016-08-16 11:29:06 +09:00
Artem Polyakov	c5a91c5c9d	opal/pmix: fix pmix jobid calculation if external PMIx server is used.	2016-08-15 21:13:51 +03:00
Ralph Castain	ecbedee8bb	Fix typo	2016-08-15 07:32:00 -07:00
Artem Polyakov	f3c816b52e	opal/pmix: fix indentation in some files.	2016-08-15 18:21:50 +07:00
Gilles Gouaillardet	483685eb6a	update .gitignore remove autogenerated opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h.in	2016-08-15 17:00:20 +09:00
Ralph Castain	be8424b691	Provide backward compatible keys so that the non-PMIx components in the opal/pmix framework don't have to adjust as we continue to work on finalizing the PMIx reference scheme. Activate and utilize the new PMIx show_help capability to provide more meaningful error output when the server cannot start. Add a contrib script to cleanup permissions incorrectly modified due to things like smb mounts	2016-08-13 12:13:04 -07:00
rhc54	ddde154d28	Merge pull request #1962 from rhc54/topic/notify Ensure we properly convert pmix status to ORTE state before activatin…	2016-08-13 06:59:50 -07:00
Ralph Castain	48d35a9627	Ensure we properly convert pmix status to ORTE state before activating an error state upon notification. Cleanup some conversion issues on notification info. Add a new orte_notify.c test program	2016-08-12 21:14:29 -07:00
Ralph Castain	4a4c9703a9	Setup the job list in the PMIx integration so that static ports can run	2016-08-12 13:27:10 -07:00
Ralph Castain	0e58609327	Fix a bug where we were requiring that all paths in $PATH be absolute. Some users provide relative paths in their environment, and we should respect those.	2016-08-12 11:28:57 -07:00
Ralph Castain	1d44f0c0e2	Silence Coverity warnings	2016-08-11 21:22:01 -07:00
Ralph Castain	73544d2e00	Rename symbol	2016-08-11 13:06:46 -07:00
Ralph Castain	b0cc9b0bc8	Update to latest PMIx toolext branch Fix indentations Update the ext20 component to match latest PMIx master. Cleanup name conflicts and uninit vars	2016-08-11 12:29:48 -07:00
Gilles Gouaillardet	dfbf2b7be4	opal/threads: add OPAL_THREAD_SUB_SIZE_T macro -1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1), simply OPAL_THREAD_SUB_SIZE_T(..., 1) and keep picky compilers happy	2016-08-10 13:37:36 +09:00
rhc54	60f789dca1	Merge pull request #1948 from rhc54/topic/pmixtool Update to include extended tool support, new datatypes	2016-08-09 16:17:28 -07:00
Nathan Hjelm	19be439998	Merge pull request #1949 from hjelmn/ugni_fix btl/ugni: fix another connection race	2016-08-09 08:32:40 -06:00
Nathan Hjelm	38f18eed22	Merge pull request #1941 from ggouaillardet/topic/memory_patcher_configury configury: make memory/patcher symbol detection more robust	2016-08-09 07:06:38 -06:00
Gilles Gouaillardet	13009aa290	opal/alfg: have opal_random() wrapper always return a positive int	2016-08-09 17:12:30 +09:00
Gilles Gouaillardet	6f6b3ac68a	configury: standardize memory/patcher symbol detection and make it more robust by default, Sun compilers optimize out the original test, and hence fail detecting a symbol is missing.	2016-08-09 09:35:52 +09:00

4639 commits