Commit graph

27994 commits

Author SHA1 Message Date
Gilles Gouaillardet
9771c575f5 Merge pull request #4352 from edgargabriel/pr/sem_close_fix
sharedfp/sm: close the named semaphore
2017-10-19 17:04:43 +09:00
Edgar Gabriel
4d995bd4eb sharedfp/sm: close the named semaphore
If a named semaphore is used, it must be closed with sem_close to remove
all sm segments. sem_unlink only removes the name; the semaphore itself
is destroyed once all processes have closed it.

Fixes issue: #4336

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>

sharedfp/sm: unlink only needs to be called by one process

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-10-18 10:37:30 -05:00
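The close/unlink split described in this commit can be sketched with the POSIX named-semaphore API. This is a minimal illustration, not the sharedfp/sm code: the helper names `shared_lock_open` and `shared_lock_release` and the `is_owner` flag are hypothetical. Every process closes its own handle, and only one designated process removes the name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>  /* sem_open, sem_close, sem_unlink */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: open (creating if needed) a named semaphore
 * initialised to 1 so it can serve as a cross-process lock. */
static sem_t *shared_lock_open(const char *name)
{
    sem_t *sem = sem_open(name, O_CREAT, 0600, 1);
    return sem == SEM_FAILED ? NULL : sem;
}

/* Hypothetical helper: every process calls sem_close() to drop its own
 * handle; only the owner calls sem_unlink() to remove the name. POSIX
 * destroys the semaphore only after the name is unlinked and every
 * process has closed it. */
static int shared_lock_release(sem_t *sem, const char *name, bool is_owner)
{
    assert(sem != NULL && name != NULL);
    int rc = sem_close(sem);
    if (is_owner && sem_unlink(name) != 0) {
        rc = -1;
    }
    return rc;
}
```

On glibc versions older than 2.34 the program may need to be linked with -pthread for the sem_* functions.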
Aurelien Bouteiller
3ef23f41a3 Bugfix a crash when a comm cannot be initialized
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2017-10-18 11:32:37 -04:00
Howard Pritchard
3345122f86 Merge pull request #4357 from hppritcha/topic/pmix_cray_fence_fix
pmix/cray: define fence method for cray pmix
2017-10-18 07:52:11 -06:00
Alex Mikheev
7cb7af1685 OSHMEM: add ucx to the list of default spmls
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-10-18 10:41:00 +03:00
Howard Pritchard
e8bfd494e7 pmix/cray: define fence method for cray pmix
Turns out UCX PML calls opal_pmix.fence in its del procs
method without checking whether or not the fence method
for the pmix component was defined.  Rather than patch
UCX PML, actually define a fence method for the cray pmix.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-10-17 15:58:01 -06:00
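The failure mode here is a call through an optional function pointer that a component left NULL. A minimal sketch, assuming a hypothetical `pmix_module_sketch_t` table rather than the real opal_pmix module structure, contrasts guarding the call on the caller side with supplying an implementation in the component, which is the route this commit takes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a PMIx module: a table of
 * function pointers where optional entries may be left NULL. */
typedef int (*fence_fn_t)(void);

typedef struct {
    const char *name;
    fence_fn_t fence;   /* may be NULL if the component does not implement it */
} pmix_module_sketch_t;

/* Option 1 (caller side): guard every call through an optional pointer. */
static int call_fence_checked(const pmix_module_sketch_t *mod)
{
    assert(mod != NULL);
    if (mod->fence == NULL) {
        return -1;  /* hypothetical "not supported" code */
    }
    return mod->fence();
}

/* Option 2 (component side): provide a fence so unguarded callers never
 * dereference NULL. A real fence would block until all processes arrive. */
static int cray_fence_sketch(void)
{
    return 0;
}

static const pmix_module_sketch_t cray_module_sketch = {
    .name = "cray-sketch",
    .fence = cray_fence_sketch,
};
```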
Ralph Castain
a76a61b2c9 Merge pull request #4348 from ggouaillardet/topic/pmix_libdir
configury: enhance external PMIx detection - add the --with-pmix-libd…
2017-10-17 10:05:48 -05:00
Gilles Gouaillardet
db2f3643d7 configury: enhance external PMIx detection
- add the --with-pmix-libdir=DIR option
- look for libpmix.* libs in DIR, DIR/lib64 and DIR
- if --with-pmix=DIR is given, look for libpmix.* in DIR/lib64 and DIR/lib
Fixes open-mpi/ompi#4347

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-10-17 17:03:01 +09:00
Joshua Ladd
27eb401a84 Merge pull request #4344 from alinask/topic/oshmem_verbs_build_restore
OSHMEM/CONFIGURE: verbs component - restore the previous build behavior
2017-10-16 11:32:19 -04:00
Alina Sklarevich
c7f5d13550 OSHMEM/CONFIGURE: verbs component - restore the previous build behavior
If support was requested but not found, stop the build.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-10-16 11:53:02 +03:00
Ralph Castain
6d7a780016 Merge pull request #4341 from rhc54/topic/foo
Ensure that the pmix server system-level rendezvous file is only output by the HNP
2017-10-14 13:17:42 -05:00
Ralph Castain
6ffb0d0507 Ensure that the pmix server system-level rendezvous file is only output by the HNP
At least for Slurm on Cray, a daemon could be colocated with the HNP and overwrite the file. Update the scaling.pl script to use only the system-level rendezvous file so it doesn't get rejected by a colocated daemon.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-14 10:16:49 -07:00
Ralph Castain
b75ed83d4b Merge pull request #4340 from rhc54/topic/update
Sync to PMIx v3. Ensure prun uses the ess/tool component.
2017-10-14 11:51:05 -05:00
Ralph Castain
60b338e857 Sync to PMIx v3. Ensure prun uses the ess/tool component.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-14 08:24:57 -07:00
Ralph Castain
ba4ec735e0 Merge pull request #4339 from rhc54/topic/s2
Ensure we exit with an appropriate error code when hitting a PMI2 error
2017-10-13 22:46:52 -05:00
Ralph Castain
8ae10c9e1a Ensure we exit with an appropriate error code when hitting a PMI2 error
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-13 19:30:28 -07:00
Ralph Castain
e1d1c8d3b2 Merge pull request #4337 from rhc54/topic/scaling
Update the scaling.pl script
2017-10-13 20:25:42 -05:00
Ralph Castain
31bce4ba9c Update the scaling.pl script
* check that the command succeeds when pre-positioning the file to ensure there isn't an error somewhere in the execution

* properly define srun cmd line options

* terminate the orte-dvm only when it is actually in operation so prun doesn't generate spurious error messages

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-13 18:23:18 -07:00
Yossi Itigin
835fa336d3 Merge pull request #4315 from alinask/topic/oshmem_config_ibv_exp_reg_shared_mr
OSHMEM/CONFIGURE: Check for the presence of ibv_exp_reg_shared_mr.
2017-10-13 22:40:23 +03:00
Alina Sklarevich
3008827f83 OSHMEM/CONFIGURE: Check for the presence of ibv_exp_reg_shared_mr.
+ The sshmem verbs component will disqualify itself if this verb isn't
present on the build host.
+ If support was requested but not found, don't stop the
build - continue without this component.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-10-12 19:57:12 +03:00
Jeff Squyres
fba6990328 Merge pull request #4330 from jsquyres/pr/configure-option-for-show-load-errors
configure: add --en|disable-show-load-errors-by-default
2017-10-12 10:42:33 -04:00
Nathan Hjelm
1c52d9dffe opal/asm: clean up no longer supported architectures
We no longer officially support MIPS or ARM before v6. This commit
updates the configury to check for sync builtins on these
architectures and removes the MIPS and IA64 assembly from
opal/include/opal/sys.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-10-11 13:09:29 -06:00
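Where no hand-written assembly remains, atomic operations can be built on the GCC/Clang `__sync` builtins that the updated configury checks for. This is a minimal sketch; the wrapper names are hypothetical and are not Open MPI's opal_atomic API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: atomically add delta and return the new value.
 * __sync builtins imply a full memory barrier. */
static inline int32_t sketch_atomic_add_32(volatile int32_t *addr, int32_t delta)
{
    assert(addr != NULL);
    return __sync_add_and_fetch(addr, delta);
}

/* Hypothetical wrapper: store desired only if *addr still equals expected.
 * Returns nonzero on success. */
static inline int sketch_atomic_cmpset_32(volatile int32_t *addr,
                                          int32_t expected, int32_t desired)
{
    assert(addr != NULL);
    return __sync_bool_compare_and_swap(addr, expected, desired);
}
```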
Jeff Squyres
5705192151 configure: add --en|disable-show-load-errors-by-default
Give packagers a configure CLI option to set the value of the MCA
variable mca_base_component_show_load_errors.

The --disable form of this option is intended for Open MPI packagers
who tend to enable support for many different types of networks and
systems in their packages.  For example, consider a packager who
includes support for both the FOO and BAR networks in their Open MPI
package, both of which require support libraries (libFOO.so and
libBAR.so).  If an end user only has BAR hardware, they likely only
have libBAR.so available on their systems -- not libFOO.so.  Disabling
load errors by default will prevent the user from seeing potentially
confusing warnings about the FOO components failing to load because
libFOO.so is not available on their systems.

Conversely, system administrators tend to build an Open MPI that is
targeted at their specific environment, and contains few (if any)
components that are not needed.  In such cases, they might want their
users to be warned that the FOO network components failed to load
(e.g., if libFOO.so was mistakenly unavailable), because Open MPI may
otherwise silently fail over to a slower network path for MPI traffic.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-10-11 11:02:21 -07:00
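The option changes only the compiled-in default of an MCA variable that users can still override at run time. A minimal sketch, assuming a hypothetical macro `SKETCH_SHOW_LOAD_ERRORS_DEFAULT` as the value configure would bake in, and Open MPI's `OMPI_MCA_<variable>` environment-variable convention as the run-time override.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the configure result:
 * --enable-show-load-errors-by-default -> 1, --disable-... -> 0. */
#ifndef SKETCH_SHOW_LOAD_ERRORS_DEFAULT
#define SKETCH_SHOW_LOAD_ERRORS_DEFAULT 1
#endif

/* The compiled-in default applies unless the environment overrides it.
 * Real MCA parsing may accept other spellings; this sketch treats "0" as
 * false and any other non-empty value as true. */
static bool show_load_errors(void)
{
    const char *env = getenv("OMPI_MCA_mca_base_component_show_load_errors");
    if (env == NULL || env[0] == '\0') {
        return SKETCH_SHOW_LOAD_ERRORS_DEFAULT != 0;
    }
    return strcmp(env, "0") != 0;
}
```

In this sketch, a packager's build compiled with the default set to 0 stays quiet, while a user who exports the variable as 1 gets the warnings back for that run.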
Ralph Castain
6dd718afd7 Merge pull request #4325 from rhc54/topic/event
Add support for the -v (verbose) option to prun and silence the "executing" and "completed" output otherwise.
2017-10-10 20:24:38 -05:00
Ralph Castain
388034c814 Add support for the -v (verbose) option to prun and silence the "executing" and "completed" output otherwise.
Debounce "unreachable" notifications for tools when they disconnect
Enable the -x cmd line option for prun

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 0a5b36180a22959654461ac1303cec35313f8b4a)
2017-10-10 12:54:49 -07:00
Ralph Castain
1ae78e23fa Merge pull request #4313 from rhc54/topic/connect
Since PMIx is moving to release v3.0, embed the new release candidate…
2017-10-09 21:06:47 -05:00
Ralph Castain
c696e04c5e Since PMIx is moving to release v3.0, embed the new release candidate in opal/pmix framework. Move the pmix2x code over to the ext2x component. Create a new ext3x component
Remove some build product. Tell PMIx that we don't need a new nspace generated when OMPI calls connect
Add missing Makefile

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-09 13:51:08 -07:00
Joshua Ladd
bb4bbbf8dc Merge pull request #4309 from vspetrov/coll_hcoll_dte_extensions
Coll hcoll dte extensions
2017-10-09 12:18:18 -04:00
Ralph Castain
59682c2c49 Merge pull request #4312 from rhc54/topic/dli
Fix cmd line passing of DVM URI
2017-10-06 22:55:07 -05:00
Ralph Castain
51f3fbdb3e Fix cmd line passing of DVM URI
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-06 18:10:46 -07:00
Ralph Castain
38e0194c7b Merge pull request #4311 from rhc54/topic/pup
Sync to PMIx master
2017-10-06 15:30:23 -05:00
Ralph Castain
c3b239cee8 Sync to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-06 12:40:23 -07:00
Ralph Castain
3326439cf5 Merge pull request #4310 from rhc54/topic/remote
Enable remote tool connections for the DVM.
2017-10-06 14:01:01 -05:00
Ralph Castain
5352c31914 Enable remote tool connections for the DVM. Fix notifications so we "de-bounce" termination calls
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-06 10:47:05 -07:00
Howard Pritchard
406c4cc126 Merge pull request #4299 from hppritcha/topic/update_lanl_toss_platform_file
LANL/platform: disable use of XRC recv bufs
2017-10-06 09:31:17 -06:00
Valentin Petrov
1e311b2619 coll/hcoll: dtype fallback optimization
If hcoll fails to create an MPI derived type, set zero_dte on this dtype.
This saves cycles on subsequent collective calls with the same derived
type, since we will not try to create the hcoll type again.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2017-10-06 10:29:29 +03:00
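The fallback optimization amounts to caching a failed conversion so the expensive attempt happens at most once per datatype. A minimal sketch with hypothetical types and a placeholder conversion function standing in for hcoll's; only the zero_dte idea comes from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* DTE_ZERO plays the role of the cached "zero_dte" failure marker. */
typedef enum { DTE_UNKNOWN, DTE_VALID, DTE_ZERO } dte_state_t;

typedef struct {
    dte_state_t state;
    int hcoll_handle;   /* meaningful only when state == DTE_VALID */
    bool convertible;   /* stands in for "hcoll can represent this type" */
} sketch_datatype_t;

static int conversion_attempts = 0;  /* counts expensive attempts */

/* Hypothetical placeholder for the costly hcoll type creation. */
static bool try_create_hcoll_type(const sketch_datatype_t *dt, int *handle)
{
    conversion_attempts++;
    if (!dt->convertible) {
        return false;
    }
    *handle = 42;  /* placeholder handle */
    return true;
}

/* Returns true if hcoll can be used; failures are remembered per datatype. */
static bool lookup_hcoll_type(sketch_datatype_t *dt, int *handle)
{
    assert(dt != NULL && handle != NULL);
    if (dt->state == DTE_UNKNOWN) {
        int created;
        if (try_create_hcoll_type(dt, &created)) {
            dt->hcoll_handle = created;
            dt->state = DTE_VALID;
        } else {
            dt->state = DTE_ZERO;
        }
    }
    if (dt->state == DTE_ZERO) {
        return false;  /* cached failure: fall back without retrying */
    }
    *handle = dt->hcoll_handle;
    return true;
}
```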
Valentin Petrov
06ef344630 coll/hcoll: extends dtypes support
Adds support for the legacy MPI_UB/LB types (old apps may use them) as
well as for BOOL/WCHAR.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2017-10-06 10:29:29 +03:00
Ralph Castain
3660cedc48 Merge pull request #4305 from rhc54/topic/pup
Update to track PMIx master
2017-10-05 13:46:09 -05:00
Ralph Castain
073eff5dcd Update to track PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-05 10:50:08 -07:00
Geoff Paulsen
be7b0af5d9 Merge pull request #3609 from markalle/pr/single_type_with_LB_UB
single_predefined_type with MPI_LB/UB
2017-10-04 15:13:09 -05:00
Ralph Castain
8adbe9ffdd Merge pull request #4301 from rhc54/topic/fixes
Fix the embedded hwloc configure to always disable cuda support
2017-10-04 15:11:29 -05:00
Ralph Castain
c341b53475 Fix the embedded hwloc configure to always disable cuda support. Add definitions for updated hwloc objects when old external versions are used
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-04 11:35:20 -07:00
Howard Pritchard
1a639ec477 LANL/platform: disable use of XRC recv bufs
Forgot as part of #3970 to disable use of XRC
recv bufs by default in LANL platform config
file.

related to #4300

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-10-04 11:40:20 -06:00
Mohan Gandhi
fa01fad2ca Merge pull request #4295 from mohanasudhan/iss4131
Btl tcp: Fix race condition on simultaneous handshake
2017-10-03 17:14:24 -07:00
Mark Allen
e24d5ccb7e single_predefined_type with MPI_LB/UB
The ompi_datatype_get_single_predefined_type_from_args() recurses down
into a constructed type to identify what base datatype it's built from
if it's built from a single type.  But if the type has MPI_LB/MPI_UB,
for example
    int lens[2];
    MPI_Aint disps[2];
    MPI_Datatype types[2], mydt;
    lens[0] = 1;
    lens[1] = 1;
    disps[0] = 0;
    disps[1] = 0;
    types[0] = MPI_LB;
    types[1] = MPI_INT;
    MPI_Type_create_struct(2, lens, disps, types, &mydt);
then this function will see the base type MPI_LB as differing from MPI_INT
and will identify mydt as not being constructed from a single base type, so
the type will be rejected for calls like MPI_Accumulate.

I think those "meta data" types shouldn't result in rejection like that, and
the above mydt should still be identified as having a single base type
of MPI_INT.

Addition: bosilca wanted another change discussed here
    https://github.com/open-mpi/ompi/pull/3609
relating to the calculation for "count" after identifying the
predefined_type that was being used.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-10-03 19:08:18 -04:00
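The recursion described in this commit can be sketched on a toy type tree in which MPI_LB/MPI_UB markers are skipped instead of being treated as a second base type. The `sketch_type_t` model and the `BASE_*` values are hypothetical and far simpler than Open MPI's internal datatype representation; the "count" calculation mentioned in the addendum is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* BASE_LB/BASE_UB stand in for the MPI_LB/MPI_UB markers. */
typedef enum { BASE_NONE, BASE_MIXED, BASE_INT, BASE_DOUBLE, BASE_LB, BASE_UB } base_t;

typedef struct sketch_type {
    base_t base;                               /* set for predefined leaves */
    size_t nchildren;                          /* 0 for predefined types */
    const struct sketch_type *const *children;
} sketch_type_t;

/* Returns the single predefined base type of t, BASE_MIXED if different
 * bases are combined, or BASE_NONE if only markers were found. */
static base_t single_predefined_base(const sketch_type_t *t)
{
    assert(t != NULL);
    if (t->nchildren == 0) {
        return t->base;
    }
    base_t found = BASE_NONE;
    for (size_t i = 0; i < t->nchildren; i++) {
        base_t b = single_predefined_base(t->children[i]);
        if (b == BASE_LB || b == BASE_UB || b == BASE_NONE) {
            continue;  /* markers carry no data, so they don't disqualify */
        }
        if (b == BASE_MIXED) {
            return BASE_MIXED;
        }
        if (found == BASE_NONE) {
            found = b;
        } else if (found != b) {
            return BASE_MIXED;
        }
    }
    return found;
}
```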
Ralph Castain
590a668f13 Merge pull request #4294 from rhc54/topic/pup
Sync to PMIx master
2017-10-03 18:04:54 -05:00
George Bosilca
bdbea63a1c Update the MPI standard reference.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-10-03 16:48:50 -04:00
Mohan Gandhi
6d642e8d94 Btl tcp: Fix race condition on simultaneous handshake
There is a race condition in TCP connection establishment
during a simultaneous handshake. This PR fixes it.

Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-10-03 13:13:43 -07:00
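The commit does not describe the fix itself. One common way to resolve a simultaneous handshake, shown here only as an illustrative sketch and not necessarily what this PR does, is a deterministic tie-break that both peers evaluate identically. The process-name type and the "lower name wins" rule are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical process identifier: (job id, rank) pair. */
typedef struct { uint32_t jobid; uint32_t vpid; } sketch_proc_name_t;

static int compare_names(sketch_proc_name_t a, sketch_proc_name_t b)
{
    if (a.jobid != b.jobid) return a.jobid < b.jobid ? -1 : 1;
    if (a.vpid != b.vpid) return a.vpid < b.vpid ? -1 : 1;
    return 0;
}

/* When two peers connect to each other at the same time, each holds one
 * outgoing and one incoming socket. Both apply the same rule so they keep
 * the same socket: the connection initiated by the lower name survives. */
static bool keep_incoming_connection(sketch_proc_name_t self, sketch_proc_name_t peer)
{
    assert(compare_names(self, peer) != 0);  /* a process never races itself */
    return compare_names(peer, self) < 0;    /* lower peer initiated the winner */
}
```

With this rule, the lower-named process keeps its outgoing socket and the higher-named one keeps the matching incoming socket, so both sides agree without another round trip.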
Mohan Gandhi
8e99b43084 mailmap: adding new entry for myself
Signed-off-by: Mohan Gandhi <mohgan@amazon.com>
2017-10-03 13:13:42 -07:00
Ralph Castain
3ad5a40ba8 Sync to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-03 10:56:30 -07:00