1
1
Граф коммитов

2894 Коммитов

Автор SHA1 Сообщение Дата
Nysal Jan K A
19e3be31e5 Merge pull request #2421 from nysal/master
mpit: Fix MPI_T_pvar_get_index
2016-12-22 15:33:51 +05:30
Gilles Gouaillardet
54c84196a6 btl/vader: plug a memory leak
as reported by Coverity with CID 1362691

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-22 16:04:36 +09:00
Nysal Jan K.A
25ba507ada mpit: Fix MPI_T_pvar_get_index
MPI_T_pvar_get_index was returning an incorrect index. The index
was never set correctly while registering the performance variables.
Additionally fix a missing case in the mca_base_var_type_t to MPI
datatype conversion. This type is currently used for control variables
registered by mxm, fca and hcoll components.

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2016-12-22 12:30:21 +05:30
Jeff Squyres
3571c3c5bb hwloc external: minor fixes to 9649c44
- Fix capitolization typos
- Make comment more correct / flow better
- Use AM_CPPFLAGS, not DEFAULT_INCLUDES
- Remove extra "hwloc/" from external hwloc.h specification

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-21 09:06:24 -08:00
Gilles Gouaillardet
9649c44fa0 hwloc: correctly handle --with-hwloc=external
- simply #include "hwloc.h" to use the external hwloc header
- do use the external hwloc header instead of opal/mca/hwloc/hwloc.h

Thanks Orion Poplawski for the report

Fixes open-mpi/ompi#2616

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-21 11:58:10 +09:00
Ralph Castain
ea133206ec Sync the internal OMPI component to PMIx master
Update external PMIx v2.x component
Add missing Makefile

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-19 19:14:16 -08:00
Ralph Castain
256b5adac5 Transfer across final fixes from debugger attach work
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-19 00:34:27 -08:00
Ralph Castain
c6f6f40529 Transfer debugger support changes
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-17 18:14:46 -08:00
rhc54
54c4925f3f Merge pull request #2598 from rhc54/topic/debugger
Transfer back changes from debugger attach work
2016-12-17 13:09:38 -08:00
Nathan Hjelm
16a2f09cd5 Merge pull request #2596 from hjelmn/x86_rtdtsc
opal/timer: add code to check if rtdtsc is core invariant
2016-12-17 11:14:49 -07:00
Ralph Castain
269753f5c1 Transfer back changes from debugger attach work
Silence warning

Remove debug

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-17 10:00:52 -08:00
Ralph Castain
215d6290e0 Add a flux component for LLNL
Fine tuning of flux component
Fix a few minor issues with the initial cut:
* Job id could be obtained from the PMI kvsname like SLURM,
  but simpler to getenv (FLUX_JOB_ID)
* Flux pmi-1 doesn't define PMI_BOOL, PMI_TRUE, PMI_FALSE
* Flux pmi-1 maps the deprecated PMI_Get_kvs_domain_id() to
  PMI_KVS_Get_my_name() internally, so just call that instead.
* Drop residual slurm references.

Add wrappers for PMI functions so that if HAVE_FLUX_PMI_LIBRARY
is not defined, the component can dlopen libpmi.so at location
specified by the FLUX_PMI_LIBRARY_PATH env variable, which adds
flexibility.  If HAVE_FLUX_PMI_LIBRARY is defined, link with
libpmi.so at build time in the usual way.

Update configury for flux component

Update m4 so the configure options work as follows:

 --with-flux-pmi
      Build Flux PMI support (default: yes)

 --with-flux-pmi-library
      Link Flux PMI support with PMI library at build
      time. Otherwise the library is opened at runtime at
      location specified by FLUX_PMI_LIBRARY_PATH environment
      variable. Use this option to enable Flux support when
      building statically or without dlopen support (default: no)

If the latter option is provided, the library/header is located at
build time using the pkg-config module 'flux-pmi'.  Otherwise there
is no library/header dependency.

Handle the case where ompi is configured with --disable-dlopen
or --enable-statkc.  In those cases, don't build the component
unless --with-flux-pmi-library is provided.

It is fatal if the user explicitly requests --with-flux-pmi but
it cannot be built (e.g. due to --disable-dlopen).

Add a schizo/flux component

Update schizo/flux component

Eliminate slurm-specific usage cases.

Since the module is only loaded if FLUX_JOB_ID is set, there are
only two cases to handle:

1) App was launched indirectly through mpirun.  This is not yet
supported with Flux, but hook remains in case this mode is supported
in the future.

2) App was launched directly by Flux, with Flux providing
CPU binding, if any.

Fix up white space in pmix/flux component

Drop non-blocking fence from pmix:flux component

The flux PMI-1 library is not thread safe, therefore
register a regular blocking fence callback instead of the
thread-shifting fencenb().

pmix/flux component avoids extra PMI_KVS_Gets

Keys stored into the base cache under the wildcard
rank are not intended to be part of the global key namespace.
These keys therefore should not trigger a PMI_KVS_Get() if they
are not found in the cache.

Minor pmix/flux component cleanup

pmix/flux: drop code for fetching unused pmix_id

pmix/flux: err_exit must return error

Problem: in flux_init(), although 'ret' (variable holding
err_exit return code) is initialized to OPAL_ERROR, the
variable is reused as a temporary result code, so if there are
some successes followed by a failure that doesn't set 'ret',
flux_init() could return success with PMI not initialized.

Ensure that a "goto err_exit" returns OPAL_ERROR if 'ret'
is not set to some other error code.

pmix/flux: don't mix OPAL_ and PMI_ return codes

Problem: flux_init() can return both PMI_ and OPAL_ return
codes.  Although OPAL_SUCCESS and PMI_SUCCESS are both defined
as 0, other codes are not compatible.

Ensure that flux_init() consistently uses 'rc' for PMI_
return codes and 'ret' for OPAL_ return codes.

pmix/flux: factor out repeated code for cache put

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-16 18:26:38 -08:00
Nathan Hjelm
a718743a5c opal/timer: add code to check if rtdtsc is core invariant
Newer x86 processors have a core invariant tsc. On these systems it is
safe to use the rtdtsc instruction as a monotonic timer. This commit
adds a new function to the opal timer code to check if the timer
backend is monotonic. On x86 it checks the appropriate bit and on
other architectures it parrots back the OPAL_TIMER_MONOTONIC value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-12-16 15:11:50 -07:00
rhc54
00b87ea829 Merge pull request #2584 from rhc54/topic/warnings
Reduce the flood of warnings due to uninitialized variables, mismatch…
2016-12-15 10:09:01 -08:00
Ralph Castain
585540bcee Reduce the flood of warnings due to uninitialized variables, mismatched types, and unused things to a more bearable trickle
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 16:33:50 -08:00
Gilles Gouaillardet
a019095b84 pmix2x/class: correctly handle concurrent class initialization
(back-ported from upstream commit pmix/master@ceedbd67fd)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-15 09:07:24 +09:00
Ralph Castain
884fb7fcf2 Update the PMIx2 support to include the latest shared memory optimizations
Update ORTE support for dynamic PMIx operations e.g., PMIx_Spawn
Update to track master
Ensure that --disable-pmix-dstore actually disables the dstore. Sync to a few debugger updates

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 15:00:10 -08:00
Ralph Castain
1961a1c22a Use the server tmpdir instead of the system tmpdir for tool contact files
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 08:42:09 -08:00
Ralph Castain
fbed2d794a Update to latest PMIx master + PTL branch
Update the usock component to disable it

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-06 20:47:44 -08:00
Ralph Castain
d51821cbc7 Update pmix check headers to support Open BSD
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-06 19:37:06 -08:00
Ralph Castain
f91f8ce494 Protect against NULL param
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-03 21:19:32 -08:00
Ralph Castain
f633d5c1b1 Initialize var
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-03 21:08:45 -08:00
rhc54
003f7d308f Merge pull request #2504 from hppritcha/topic/fix_pmix_base_help_file
pmix: Fix pmix base help file.
2016-12-03 11:57:07 -08:00
Ralph Castain
a43fae74a5 Avoid hanging in show_help if PMIx was unable to initialize
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-03 10:12:58 -08:00
Ralph Castain
a4e3f615e3 Fix executable modes of pmix config scripts
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-02 20:50:59 -08:00
Ralph Castain
342dbfcf4e Update
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-02 20:33:03 -08:00
Ralph Castain
8e64382edf Update to correct tarball
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-02 19:16:07 -08:00
Ralph Castain
1a0bccb536 Now that PMIx has settled on its release strategy and numbering, update the OPAL pmix framework to track
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-02 15:44:43 -08:00
Howard Pritchard
3049848731 Fix pmix base help file.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-12-02 15:03:22 -06:00
Ralph Castain
6041467df0 Update to latest PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-01 14:47:44 -08:00
Gilles Gouaillardet
3a76a78bff btl/openib: plug a memory leak in btl_openib_register_mca_params()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:30 +09:00
Gilles Gouaillardet
c9aeccb84e opal/if: open the if framework once in opal_init_util
the if framework is no more open in opal_if*, which plugs
several memory leaks

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:30 +09:00
Gilles Gouaillardet
45732fd764 hwloc/base: fix a memory leak in buffer_cleanup()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:29 +09:00
Howard Pritchard
7ce3ca25ef Merge pull request #2458 from hppritcha/topic/pmix_cray_no_dlopen
pmix/cray: abort job if using aprun for general case
2016-11-25 14:03:20 -07:00
Howard Pritchard
eee9f7ae3a pmix/cray: abort job if using aprun for general case
It turns that there is an incompatibility between the Cray PMI
library and the default configuration for building Open MPI (master).
To work around this, we now disable use of aprun for direct launch
of Open MPI jobs except under specific conditions.

The problem is that there are now (on master) packages getting
initialized that do not work properly across a fork operation.
As part of a constructor in the Cray PMI library, a fork operation
is done to simplify use of shared memory between the
processes in a job on the same node.  This ends up thoroughly
messing up the Open MPI initialization process in the case
that dlopen support is enabled.  The initialization process gets
about half-way through when the PMIX framework is opened and
components are loaded, which triggers the Cray PMI constructor
and hence the fork operation.

There are two workarounds for this:
1) configure Open MPI for Cray XE/XC systems using aprun with the
   --disable-dlopen option
2) set the PMI_NO_FORK environment variable in the shell in which
   the aprun command is run.

Without taking these measures, a Open MPI job will just hang at
job startup in the first attempt to "thread-shift" the PMIx
fence_nb operation.  Additional hangs occur at shutdown if this
problem is worked around, again due to the insertion of a fork
operation halfway through the Open MPI initialization procedure.

This commit detects if the conditions that bring out the hang
situation are present, and if so, prints out a message and
aborts the job launch.

Note on systems using slurm, the PMI_NO_FORK environment variable
is set as part of the srun job launch, hence this issue is avoided
on those systems.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-11-25 06:28:19 -07:00
Gilles Gouaillardet
1a279c4ee9 btl/self: fix fragment segment length in mca_btl_self_prepare_src()
opal_convertor_pack() might pack less bytes than requested,
so always set frag->segments[0].seg_len.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-24 10:44:56 +09:00
Howard Pritchard
2bb4cffffd Merge pull request #2447 from hppritcha/topic/compiler_warning_swats
btl/ugni:vader swat some compiler warnings
2016-11-22 05:28:20 -07:00
Howard Pritchard
09f47fcf8e btl/ugni:vader swat some compiler warnings
Swat some compiler warnings.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-11-21 14:58:34 -06:00
Howard Pritchard
2cbc0e8472 pmix/cray: fix disable-dlopen problem
PR open-mpi/ompi#2432 introduced a regression where configure
and build with --disable-dlopn caused build failure owing
to unresolved alps lli symbols in the libopal-pal shared library.

This commit fixes this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-11-21 13:45:10 -06:00
Howard Pritchard
0bbb319246 Merge pull request #2444 from hppritcha/topic/cray_pmix_ws_cleanup
pmix/cray: whitespace cleanup
2016-11-21 06:03:56 -07:00
Howard Pritchard
08dce4f161 pmix/cray: whitespace cleanup
Get rid of tabs.  This is anti-ompi style.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-11-18 19:30:40 -07:00
Howard Pritchard
44f4663d0d Merge pull request #2432 from hppritcha/topic/fix_info_etc_w_alps
pmix/cray: set some envars for MPI_INFO_ENV object
2016-11-18 14:23:53 -07:00
Howard Pritchard
de3de131af pmix/cray: set some envars for MPI_INFO_ENV object
Enhance the cray pmix component to set some OMPI internal
env. variables used to set some key/value pairs
on the MPI_INFO_ENV object.  This allows more of the
ompi-tests ibm unit tests to pass when using aprun/srun
direct launch and Cray PMI.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-11-16 17:37:52 -06:00
Joshua Ladd
4907085c6f Add the ConnectX-5 device ID to openib BTL.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-11-16 21:42:37 +02:00
Gilles Gouaillardet
8ef538adeb Merge pull request #2398 from bosilca/topic/tcp_endpoints_mutex
Protect the tcp_endpoints list from concurrent accesses.
2016-11-14 22:13:29 -07:00
Howard Pritchard
703b464c03 pmix: fix a typo in a help file
Fixes #2391

Thanks to @njoly for reporting

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-11-12 11:49:15 -07:00
George Bosilca
d0dddef53d
Protect the tcp_endpoints list from concurrent accesses.
Thanks Gilles for your help.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2016-11-11 00:06:03 -05:00
Gilles Gouaillardet
a49422fe84 btl/tcp: get rid of the MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD macro
since pthreads are now mandatory, the MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD
is always true and hence can be safely removed

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-08 14:00:05 +09:00
Aboorva Devarajan
fb8e074583 powerpc: Add support for powerpcle in timer/pstat.
Signed-off-by: Aboorva Devarajan <abodevar@in.ibm.com>
2016-11-07 02:35:44 -05:00
Gilles Gouaillardet
7a2894f1e0 event/libevent2022: cleanup dependencies to the embedded libevent lib
configury force event/libevent2022 to be built as a static module,
so simplify Makefile.am

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-07 14:57:52 +09:00