
Merge pull request #4828 from hppritcha/topic/update_lanl_toss_platform

lanl/platform: add new toss2/3 platform files
This commit is contained in:
Nathan Hjelm 2018-05-01 09:52:14 -06:00 committed by GitHub
parents 380dcb57de 8eb738a9c8
Commit 85d1965a0f
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
22 changed files with 552 additions and 135 deletions


@@ -12,7 +12,7 @@
 # Copyright (c) 2009-2017 Cisco Systems, Inc. All rights reserved.
 # Copyright (c) 2010 IBM Corporation. All rights reserved.
 # Copyright (c) 2010-2011 Oak Ridge National Labs. All rights reserved.
-# Copyright (c) 2013-2016 Los Alamos National Security, Inc. All rights
+# Copyright (c) 2013-2018 Los Alamos National Security, Inc. All rights
 # reserved.
 # Copyright (c) 2013 Intel Corporation. All rights reserved.
 # Copyright (c) 2017 Amazon.com, Inc. or its affiliates.
@@ -67,17 +67,21 @@ EXTRA_DIST = \
 platform/lanl/cray_xc_cle5.2/optimized-common \
 platform/lanl/cray_xc_cle5.2/optimized-lustre \
 platform/lanl/cray_xc_cle5.2/optimized-lustre.conf \
-platform/lanl/toss/debug-common \
-platform/lanl/toss/debug \
-platform/lanl/toss/debug.conf \
-platform/lanl/toss/debug-mlx \
-platform/lanl/toss/debug-mlx.conf \
-platform/lanl/toss/optimized-common \
-platform/lanl/toss/optimized \
-platform/lanl/toss/optimized.conf \
-platform/lanl/toss/optimized-mlx \
-platform/lanl/toss/optimized-mlx.conf \
-platform/lanl/toss/toss-common \
+platform/lanl/toss/README \
+platform/lanl/toss/common \
+platform/lanl/toss/common-optimized \
+platform/lanl/toss/cray-lustre-optimized \
+platform/lanl/toss/cray-lustre-optimized.conf \
+platform/lanl/toss/toss2-mlx-optimized \
+platform/lanl/toss/toss2-mlx-optimized.conf \
+platform/lanl/toss/toss2-qib-optimized \
+platform/lanl/toss/toss2-qib-optimized.conf \
+platform/lanl/toss/toss3-hfi-optimized \
+platform/lanl/toss/toss3-hfi-optimized.conf \
+platform/lanl/toss/toss3-mlx-optimized \
+platform/lanl/toss/toss3-mlx-optimized.conf \
+platform/lanl/toss/toss3-wc-optimized \
+platform/lanl/toss/toss3-wc-optimized.conf \
 platform/lanl/darwin/darwin-common \
 platform/lanl/darwin/debug-common \
 platform/lanl/darwin/optimized-common \

contrib/platform/lanl/toss/README (new file, 99 lines)

@ -0,0 +1,99 @@
These platform files were created from platform files shipped with the release
tarball. Each file has been modified. Here are the details on how they were
created.
- common
Copy of contrib/platform/lanl/toss/toss-common. Removed entries in bottom
half of file that were specific to TOSS so that it could be used for Cray
platforms as well.
- common-optimized
Copy of contrib/platform/lanl/toss/optimized-common. Used the file as-is.
- toss2-qib-optimized
Copy of contrib/platform/lanl/toss/optimized with the following changes:
- source common and common-optimized instead of toss-common and
optimized-common
- added entries that were removed from common:
- enable_mca_no_build
- with_slurm
- with_tm
- with_pmi
- with_verbs
- NOTE: common had "with_devel_headers=yes" in it that was not propagated.
This option should not be used in production as per Open MPI developer
mailing list guidance.
- Changed comment "Disable components not needed on any TOSS platform" to
"Disable components not needed on TOSS platforms with high-speed networks"
- Changed "enable panasas" to "enable lustre"
- toss2-qib-optimized.conf
- copy of contrib/platform/lanl/toss/optimized.conf with the following
changes:
- changed: orte_no_session_dirs = /lustre,/net,/users,/usr/projects
- changed: btl = ^openib
- removed: hwloc_base_binding_policy = core (outdated setting)
- added: rmaps_base_ranking_policy = core (rank by core)
- added: ras_base_launch_orted_on_hn = true (run orted on parent node of
allocation)
- toss2-mlx-optimized
- copy of toss2-qib-optimized
- toss2-mlx-optimized.conf
- copy of toss2-qib-optimized.conf with the following changes:
- remove: oob_tcp_if_include = ib0,eth0 (identification of general network
device names is problematic in RHEL7. Just let Open MPI figure it out)
- change: btl = vader,openib,self
- change: btl_openib_receive_queues = X,4096,1024:X,12288,512:X,65536,512
(change S to X; make sure numbers match those for the same entry in
contrib/platform/lanl/toss/optimized-mlx.conf)
- addition: pml = ob1 (disable MXM)
- addition: coll = ^hcoll (disable MXM)
- toss3-hfi-optimized
- copy of toss2-qib-optimized
- toss3-hfi-optimized.conf
- copy of toss2-qib-optimized.conf with the following changes:
- remove: oob_tcp_if_include = ib0,eth0
- add: oob_tcp_if_exclude = ib0 (Omnipath is flaky; don't use it for oob)
- toss3-wc-optimized (platform file for woodchuck which is an ethernet-only
connected cluster)
- copy of toss3-hfi-optimized with the following changes:
- change: remove "btl-tcp" from the enable_mca_no_build list
- change: comment "Disable components not needed on TOSS platforms with
high-speed networks" to "Disable components not needed on TOSS Ethernet-
connected clusters"
- change: with_verbs=no
- change: comment "Always build ibverbs support" to "Do not build ibverbs
support"
- toss3-wc-optimized.conf
- copy of toss3-hfi-optimized.conf with the following changes:
- change: comment "Add the interface for out-of-band communication and set
it up" to "Set up the interface for out-of-band communication"
- remove: oob_tcp_if_exclude = ib0
- remove: btl (let Open MPI figure out what best to use for ethernet-
connected hardware)
- remove: btl_openib_want_fork_support (no infiniband)
- remove: btl_openib_receive_queues (no infiniband)
- cray-lustre-optimized
- copy of contrib/platform/lanl/cray_xc_cle5.2/optimized-lustre with the
following changes:
- remove: whole if/else clause of 'test "$enable_debug" = "yes"'
- addition: source ./common
- addition: source ./common-optimized
- change: with_io_romio_flags="--with-file-system=ufs+nfs+lustre"
- remove: with_lustre=/opt/cray/lustre-cray_ari_s/default
- additions from platform/lanl/cray_xc_cle5.2/optimized-common that don't
go in common-optimized:
- enable_mca_no_build=crs,filem,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp,pml-cm,ess-cnos,grpcomm-cnos,plm-rsh,btl-tcp,oob-ud,ras-simulator,mpool-fake
- enable_mca_static=btl:ugni,btl:self,btl:vader,pml:ob1
- enable_mca_direct=pml-ob1
- with_verbs=no
- with_tm=no
- enable_orte_static_ports=no
- enable_pty_support=no
- addition: enable_dlopen=yes (change from original platform file as per
Nathan Hjelm)
- cray-lustre-optimized.conf
- copy of contrib/platform/lanl/cray_xc_cle5.2/optimized-lustre.conf with
the following changes:
- change: orte_no_session_dirs = /lustre,/users,/usr/projects
- remove: hwloc_base_binding_policy = core (outdated setting)
- addition: rmaps_base_ranking_policy = core (rank by core)
# vi: filetype=txt
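
The README above describes each platform file as a thin layer that sources `common` and `common-optimized` and then adds its own settings. The following is a minimal sketch of that layering, assuming the platform file is sourced as a shell fragment from within its own directory (which is what the relative `source ./common` lines suggest). The three mock files, the name `toss-demo-optimized`, and the helper `load_platform` are hypothetical and stand in for the real LANL files; `.` is used instead of `source` so the sketch also runs under a plain POSIX `sh`.

```shell
#!/bin/sh
# Sketch only: mock platform files, not the real LANL ones.
set -eu

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Shared settings, analogous to toss/common.
cat > "$demo_dir/common" <<'EOF'
enable_shared=yes
enable_mpi_cxx=no
EOF

# Optimized-build settings, analogous to toss/common-optimized.
cat > "$demo_dir/common-optimized" <<'EOF'
enable_mem_debug=no
EOF

# Per-platform file: pull in both layers, then add local entries.
cat > "$demo_dir/toss-demo-optimized" <<'EOF'
. ./common
. ./common-optimized
with_slurm=yes
with_verbs=yes
EOF

# Hypothetical helper: source a platform file from inside its directory,
# in a subshell so the caller's variables are left untouched.
load_platform() {
    (
        cd "$demo_dir"
        . "./$1"
        printf 'enable_shared=%s enable_mem_debug=%s with_verbs=%s\n' \
            "$enable_shared" "$enable_mem_debug" "$with_verbs"
    )
}

load_platform toss-demo-optimized
```

Because the inner files are referenced with relative paths, sourcing the platform file from any other working directory would fail, which is one reason to keep all of these files side by side in `contrib/platform/lanl/toss/`.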

contrib/platform/lanl/toss/common (new file, 20 lines)

@ -0,0 +1,20 @@
# (c) 2013-2018 Los Alamos National Security, LLC. All rights reserved.
# Open MPI common configuration for TOSS/TOSS2 v1.7.x/1.8.x
enable_binaries=yes
enable_heterogeneous=no
enable_shared=yes
enable_static=yes
enable_ipv6=no
enable_ft_thread=no
enable_per_user_config_files=no
enable_memchecker=no
with_valgrind=no
# Enable the fortran bindings
enable_mpi_fortran=yes
# Disable the C++ binding. They were deprecated in MPI-2.2 and removed in MPI-3
enable_mpi_cxx=no
enable_mpi_cxx_seek=no
enable_cxx_exceptions=no


@@ -1,4 +1,4 @@
-# (c) 2013 Los Alamos National Security, LLC. All rights reserved.
+# (c) 2013-2018 Los Alamos National Security, LLC. All rights reserved.
 # Open MPI common optimized configuration for TOSS/TOSS2 v1.7.x/1.8.x
 enable_mem_debug=no


@ -0,0 +1,34 @@
# (c) 2012-2018 Los Alamos National Security, LLC. All rights reserved.
# Open MPI configuration for Cray XC v2.x GNU compiler,
# Lustre
if test "$CC" = "cc" ; then
echo "ERROR: Open MPI should not be compiled with Cray's wrapper compilers (cc/CC/ftn)"
exit 1
fi
source ./common
source ./common-optimized
# enable Lustre in romio
with_io_romio_flags="--with-file-system=ufs+nfs+lustre"
# Disable components not needed
enable_mca_no_build=crs,filem,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp,pml-cm,ess-cnos,grpcomm-cnos,plm-rsh,btl-tcp,oob-ud,ras-simulator,mpool-fake
enable_mca_static=btl:ugni,btl:self,btl:vader,pml:ob1
# enable direct calling for ob1
enable_mca_direct=pml-ob1
# do not use IB verbs
with_verbs=no
# do not use torque
with_tm=no
enable_dlopen=yes
enable_orte_static_ports=no
enable_pty_support=no


@@ -10,8 +10,8 @@
 # Copyright (c) 2004-2005 The Regents of the University of California.
 # All rights reserved.
 # Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
-# Copyright (c) 2011-2016 Los Alamos National Security, LLC. All rights
-# reserved.
+# Copyright (c) 2015-2018 Los Alamos National Security, LLC.
+# All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@ -62,43 +62,46 @@
# Basic behavior to smooth startup # Basic behavior to smooth startup
mca_base_component_show_load_errors = 0 mca_base_component_show_load_errors = 0
opal_set_max_sys_limits = 1 #orte_report_launch_progress = 1
orte_report_launch_progress = 1
# Set line buffering for stdout/stderr
ess_base_stream_buffering = 1
# Define timeout for daemons to report back during launch # Define timeout for daemons to report back during launch
orte_startup_timeout = 10000 orte_startup_timeout = 360
## Protect the shared file systems ## Protect the shared file systems
orte_no_session_dirs = /panfs,/scratch,/users,/usr/projects orte_no_session_dirs = /lustre,/users,/usr/projects
orte_tmpdir_base = /tmp orte_tmpdir_base = /var/tmp
## Require an allocation to run - protects the frontend ## Require an allocation to run - protects the frontend
## from inadvertent job executions ## from inadvertent job executions
orte_allocation_required = 1 orte_allocation_required = 1
## Deal with the allocator
orte_strip_prefix = nid
orte_retain_aliases = 1
# 1st alias entry is the stripped node name,
# 2nd is the unstripped one
orte_hostname_alias_index = 2
## Add the interface for out-of-band communication ## Add the interface for out-of-band communication
## and set it up ## and set it up
oob_tcp_if_include=ib0,eth0 oob_tcp_if_include=ipogif0
oob_tcp_peer_retries = 1000 oob_tcp_peer_retries = 1000
oob_tcp_sndbuf = 32768 oob_tcp_sndbuf = 32768
oob_tcp_rcvbuf = 32768 oob_tcp_rcvbuf = 32768
## Define the MPI interconnects ## Define the MPI interconnects
btl = vader,openib,self btl = self,vader,ugni
## Setup OpenIB - just in case ## Setup Gemini
btl_openib_want_fork_support = 0 # TODO LANL
btl_openib_receive_queues = X,4096,1024:X,12288,512:X,65536,512
## Disable MXM ## Rank by core
pml = ob1 rmaps_base_ranking_policy = core
coll = ^hcoll
## Enable cpu affinity
hwloc_base_binding_policy = core
## Setup MPI options ## Setup MPI options
mpi_show_handle_leaks = 1 mpi_show_handle_leaks = 1
mpi_warn_on_fork = 1 mpi_warn_on_fork = 1
#mpi_abort_print_stack = 1 #mpi_abort_print_stack = 1


@ -1,8 +0,0 @@
# (c) 2013-2016 Los Alamos National Security, LLC. All rights reserved.
# Open MPI debug configuration for TOSS/TOSS2 v1.7.x/1.8.x
source ./toss-common
source ./debug-common
# Enable panasas support in romio
with_io_romio_flags=--with-file-system=ufs+nfs+lustre


@ -1,8 +0,0 @@
# (c) 2013 Los Alamos National Security, LLC. All rights reserved.
# Open MPI common debug configuration for TOSS/TOSS2 v1.7.x/1.8.x
enable_mem_debug=yes
enable_mem_profile=yes
enable_debug_symbols=yes
enable_picky=yes
enable_debug=yes


@ -1,4 +0,0 @@
# (c) 2013-2016 Los Alamos National Security, LLC. All rights reserved.
# Open MPI debug configuration for TOSS/TOSS2 v1.7.x/1.8.x
source ./debug


@ -1,8 +0,0 @@
# (c) 2013-2016 Los Alamos National Security, LLC. All rights reserved.
# Open MPI optimized configuration for TOSS/TOSS2 v1.7.x/1.8.x
source ./toss-common
source ./optimized-common
# Enable panasas support in romio
with_io_romio_flags=--with-file-system=ufs+nfs+lustre


@ -1,4 +0,0 @@
# (c) 2013-2016 Los Alamos National Security, LLC. All rights reserved.
# Open MPI optimized configuration for TOSS/TOSS2 v1.7.x/1.8.x
source ./optimized


@ -1,40 +0,0 @@
# (c) 2013 Los Alamos National Security, LLC. All rights reserved.
# Open MPI common configuration for TOSS/TOSS2 v1.7.x/1.8.x
enable_binaries=yes
enable_heterogeneous=no
enable_shared=yes
enable_static=yes
enable_ipv6=no
enable_ft_thread=no
enable_per_user_config_files=no
enable_memchecker=no
with_valgrind=no
# Enable the fortran bindings
enable_mpi_fortran=yes
# Disable the C++ binding. They were deprecated in MPI-2.2 and removed in MPI-3
enable_mpi_cxx=no
enable_mpi_cxx_seek=no
enable_cxx_exceptions=no
# Disable components not needed on any TOSS platform
enable_mca_no_build=carto,crs,filem,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp,btl-tcp
# Enable malloc hooks for mpi_leave_pinned
with_memory_manager=linux
# TOSS2 uses slurm
with_slurm=yes
with_tm=no
# Enable PMI support for direct launch
with_pmi=yes
# Always build ibverbs support
with_verbs=yes
# Install the development headers
with_devel_headers=yes


@ -0,0 +1,21 @@
# (c) 2013-2018 Los Alamos National Security, LLC. All rights reserved.
# Open MPI optimized configuration for TOSS/TOSS2 v1.7.x/1.8.x
source ./common
source ./common-optimized
# Disable components not needed on TOSS platforms with high-speed networks
enable_mca_no_build=carto,crs,filem,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp,btl-tcp
# TOSS2 uses slurm
with_slurm=yes
with_tm=no
# Enable PMI support for direct launch
with_pmi=yes
# Enable lustre support in romio
with_io_romio_flags=--with-file-system=ufs+nfs+lustre
# Always build ibverbs support
with_verbs=yes


@@ -10,7 +10,7 @@
 # Copyright (c) 2004-2005 The Regents of the University of California.
 # All rights reserved.
 # Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
-# Copyright (c) 2011-2016 Los Alamos National Security, LLC. All rights
+# Copyright (c) 2011-2018 Los Alamos National Security, LLC. All rights
 # reserved.
 # $COPYRIGHT$
 #
@@ -69,7 +69,7 @@ orte_report_launch_progress = 1
 orte_startup_timeout = 10000
 ## Protect the shared file systems
-orte_no_session_dirs = /panfs,/scratch,/users,/usr/projects
+orte_no_session_dirs = /lustre,/net,/users,/usr/projects
 orte_tmpdir_base = /tmp
 ## Require an allocation to run - protects the frontend
@ -88,17 +88,22 @@ btl = vader,openib,self
## Setup OpenIB - just in case ## Setup OpenIB - just in case
btl_openib_want_fork_support = 0 btl_openib_want_fork_support = 0
## Use Shared Receive Queues (SRQ). Mellanox ConnectX cards should be able to
## use eXtended Reliable Connection receive queues (XRC), but our systems are
## missing the needed libraries and headers to support it.
btl_openib_receive_queues = S,4096,1024:S,12288,512:S,65536,512 btl_openib_receive_queues = S,4096,1024:S,12288,512:S,65536,512
## Disable MXM ## Rank by core
pml = ob1 rmaps_base_ranking_policy = core
coll = ^hcoll
## Enable cpu affinity
hwloc_base_binding_policy = core
## Setup MPI options ## Setup MPI options
mpi_show_handle_leaks = 0 mpi_show_handle_leaks = 0
mpi_warn_on_fork = 1 mpi_warn_on_fork = 1
#mpi_abort_print_stack = 0 #mpi_abort_print_stack = 0
## Run orted on parent node of allocation
ras_base_launch_orted_on_hn = true
## Disable MXM
pml = ob1
coll = ^hcoll
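
The `btl_openib_receive_queues` value in the hunk above is a colon-separated list of queue specifications, each a comma-separated tuple whose first field is the queue type (`S` for shared receive queues, `X` for XRC, as the added comment explains). The sketch below splits such a value into its entries. It assumes the second and third fields are the buffer size in bytes and the number of buffers; the helper name `describe_queues` is hypothetical, and the snippet is a plain text split, not Open MPI's own parser.

```shell
#!/bin/sh
# Sketch only: splits a receive-queue string; does not validate it.
set -eu

queues='S,4096,1024:S,12288,512:S,65536,512'

# Print one line per queue entry with its first three fields.
describe_queues() {
    printf '%s\n' "$1" | tr ':' '\n' |
    while IFS=, read -r qtype size count rest; do
        printf 'type=%s size=%s count=%s\n' "$qtype" "$size" "$count"
    done
}

describe_queues "$queues"
```

Any further optional fields of an entry end up in `rest` and are ignored here.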


@ -0,0 +1,21 @@
# (c) 2013-2018 Los Alamos National Security, LLC. All rights reserved.
# Open MPI optimized configuration for TOSS/TOSS2 v1.7.x/1.8.x
source ./common
source ./common-optimized
# Disable components not needed on TOSS platforms with high-speed networks
enable_mca_no_build=carto,crs,filem,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp,btl-tcp
# TOSS2 uses slurm
with_slurm=yes
with_tm=no
# Enable PMI support for direct launch
with_pmi=yes
# Enable lustre support in romio
with_io_romio_flags=--with-file-system=ufs+nfs+lustre
# Always build ibverbs support
with_verbs=yes


@ -0,0 +1,111 @@
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
# University Research and Technology
# Corporation. All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
# of Tennessee Research Foundation. All rights
# reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
# University of Stuttgart. All rights reserved.
# Copyright (c) 2004-2005 The Regents of the University of California.
# All rights reserved.
# Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2011-2018 Los Alamos National Security, LLC. All rights
# reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the default system-wide MCA parameters defaults file.
# Specifically, the MCA parameter "mca_param_files" defaults to a
# value of
# "$HOME/.openmpi/mca-params.conf:$sysconf/openmpi-mca-params.conf"
# (this file is the latter of the two). So if the default value of
# mca_param_files is not changed, this file is used to set system-wide
# MCA parameters. This file can therefore be used to set system-wide
# default MCA parameters for all users. Of course, users can override
# these values if they want, but this file is an excellent location
# for setting system-specific MCA parameters for those users who don't
# know / care enough to investigate the proper values for them.
# Note that this file is only applicable where it is visible (in a
# filesystem sense). Specifically, MPI processes each read this file
# during their startup to determine what default values for MCA
# parameters should be used. mpirun does not bundle up the values in
# this file from the node where it was run and send them to all nodes;
# the default value decisions are effectively distributed. Hence,
# these values are only applicable on nodes that "see" this file. If
# $sysconf is a directory on a local disk, it is likely that changes
# to this file will need to be propagated to other nodes. If $sysconf
# is a directory that is shared via a networked filesystem, changes to
# this file will be visible to all nodes that share this $sysconf.
# The format is straightforward: one per line, mca_param_name =
# rvalue. Quoting is ignored (so if you use quotes or escape
# characters, they'll be included as part of the value). For example:
# Disable run-time MPI parameter checking
# mpi_param_check = 0
# Note that the value "~/" will be expanded to the current user's home
# directory. For example:
# Change component loading path
# component_path = /usr/local/lib/openmpi:~/my_openmpi_components
# See "ompi_info --param all all" for a full listing of Open MPI MCA
# parameters available and their default values.
#
# Basic behavior to smooth startup
mca_base_component_show_load_errors = 0
opal_set_max_sys_limits = 1
orte_report_launch_progress = 1
# Define timeout for daemons to report back during launch
orte_startup_timeout = 10000
## Protect the shared file systems
orte_no_session_dirs = /lustre,/net,/users,/usr/projects
orte_tmpdir_base = /tmp
## Require an allocation to run - protects the frontend
## from inadvertent job executions
orte_allocation_required = 1
## Add the interface for out-of-band communication
## and set it up
oob_tcp_if_include = ib0,eth0
oob_tcp_peer_retries = 1000
oob_tcp_sndbuf = 32768
oob_tcp_rcvbuf = 32768
## Define the MPI interconnects
btl = ^openib
## Turn off osc rdma component. This is used in one-sided communication which
## isn't supported on psm2 and omnipath interconnects. If this option isn't
## used, a runtime error about btl's will result because we are turning off
## openib above. The current implementation of one-sided communication by-
## passes the loaded components and goes straight for btl's. Since one-sided
## communication doesn't really work on psm2 and omnipath, rather than turn
## openib back on, we're going to turn off some rdma capability.
osc = ^rdma
## Setup OpenIB - just in case
btl_openib_want_fork_support = 0
btl_openib_receive_queues = S,4096,1024:S,12288,512:S,65536,512
## Rank by core
rmaps_base_ranking_policy = core
## Setup MPI options
mpi_show_handle_leaks = 0
mpi_warn_on_fork = 1
#mpi_abort_print_stack = 0
## Run orted on parent node of allocation
ras_base_launch_orted_on_hn = true
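
The header comment of this file describes the format as one `mca_param_name = rvalue` pair per line, with `#` lines as comments. As a quick way to review which settings are actually active in such a file, the following sketch strips comments and blank lines and prints each name/value pair. The sample file content is a shortened excerpt written to a temporary file, and `list_mca_params` is a hypothetical helper for inspection only; it makes no claim to match how Open MPI itself reads the file.

```shell
#!/bin/sh
# Sketch only: lists active "name = value" lines from an MCA params file.
set -eu

conf=$(mktemp)
trap 'rm -f "$conf"' EXIT

cat > "$conf" <<'EOF'
# Basic behavior to smooth startup
mca_base_component_show_load_errors = 0
## Define the MPI interconnects
btl = ^openib
#mpi_abort_print_stack = 0
oob_tcp_if_include=ib0,eth0
EOF

# Drop comment and blank lines, then rewrite "name = value" as "name value".
list_mca_params() {
    sed -e '/^[[:space:]]*#/d' \
        -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*\([^=[:space:]]*\)[[:space:]]*=[[:space:]]*\(.*\)$/\1 \2/' \
        "$1"
}

list_mca_params "$conf"
```

Commented-out entries such as `#mpi_abort_print_stack = 0` are skipped, so the output reflects only what the file would set.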


@ -0,0 +1,21 @@
# (c) 2013-2018 Los Alamos National Security, LLC. All rights reserved.
# Open MPI optimized configuration for TOSS/TOSS2 v1.7.x/1.8.x
source ./common
source ./common-optimized
# Disable components not needed on TOSS platforms with high-speed networks
enable_mca_no_build=carto,crs,filem,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp,btl-tcp
# TOSS2 uses slurm
with_slurm=yes
with_tm=no
# Enable PMI support for direct launch
with_pmi=yes
# Enable lustre support in romio
with_io_romio_flags=--with-file-system=ufs+nfs+lustre
# Always build ibverbs support
with_verbs=yes


@ -0,0 +1,112 @@
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
# University Research and Technology
# Corporation. All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
# of Tennessee Research Foundation. All rights
# reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
# University of Stuttgart. All rights reserved.
# Copyright (c) 2004-2005 The Regents of the University of California.
# All rights reserved.
# Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2011-2018 Los Alamos National Security, LLC. All rights
# reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the default system-wide MCA parameters defaults file.
# Specifically, the MCA parameter "mca_param_files" defaults to a
# value of
# "$HOME/.openmpi/mca-params.conf:$sysconf/openmpi-mca-params.conf"
# (this file is the latter of the two). So if the default value of
# mca_param_files is not changed, this file is used to set system-wide
# MCA parameters. This file can therefore be used to set system-wide
# default MCA parameters for all users. Of course, users can override
# these values if they want, but this file is an excellent location
# for setting system-specific MCA parameters for those users who don't
# know / care enough to investigate the proper values for them.
# Note that this file is only applicable where it is visible (in a
# filesystem sense). Specifically, MPI processes each read this file
# during their startup to determine what default values for MCA
# parameters should be used. mpirun does not bundle up the values in
# this file from the node where it was run and send them to all nodes;
# the default value decisions are effectively distributed. Hence,
# these values are only applicable on nodes that "see" this file. If
# $sysconf is a directory on a local disk, it is likely that changes
# to this file will need to be propagated to other nodes. If $sysconf
# is a directory that is shared via a networked filesystem, changes to
# this file will be visible to all nodes that share this $sysconf.
# The format is straightforward: one per line, mca_param_name =
# rvalue. Quoting is ignored (so if you use quotes or escape
# characters, they'll be included as part of the value). For example:
# Disable run-time MPI parameter checking
# mpi_param_check = 0
# Note that the value "~/" will be expanded to the current user's home
# directory. For example:
# Change component loading path
# component_path = /usr/local/lib/openmpi:~/my_openmpi_components
# See "ompi_info --param all all" for a full listing of Open MPI MCA
# parameters available and their default values.
#
# Basic behavior to smooth startup
mca_base_component_show_load_errors = 0
opal_set_max_sys_limits = 1
orte_report_launch_progress = 1
# Define timeout for daemons to report back during launch
orte_startup_timeout = 10000
## Protect the shared file systems
orte_no_session_dirs = /lustre,/net,/users,/usr/projects
orte_tmpdir_base = /tmp
## Require an allocation to run - protects the frontend
## from inadvertent job executions
orte_allocation_required = 1
## Set up out-of-band communication.
## For OmniPath connected machines, don't use the ib0 network for
## out-of-band communication. It's a little flaky.
oob_tcp_if_exclude = ib0
oob_tcp_peer_retries = 1000
oob_tcp_sndbuf = 32768
oob_tcp_rcvbuf = 32768
## Define the MPI interconnects
btl = ^openib
## Turn off osc rdma component. This is used in one-sided communication which
## isn't supported on psm2 and omnipath interconnects. If this option isn't
## used, a runtime error about btl's will result because we are turning off
## openib above. The current implementation of one-sided communication by-
## passes the loaded components and goes straight for btl's. Since one-sided
## communication doesn't really work on psm2 and omnipath, rather than turn
## openib back on, we're going to turn off some rdma capability.
osc = ^rdma
## Setup OpenIB - just in case
btl_openib_want_fork_support = 0
btl_openib_receive_queues = S,4096,1024:S,12288,512:S,65536,512
## Rank by core
rmaps_base_ranking_policy = core
## Setup MPI options
mpi_show_handle_leaks = 0
mpi_warn_on_fork = 1
#mpi_abort_print_stack = 0
## Run orted on parent node of allocation
ras_base_launch_orted_on_hn = true
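
Settings such as `btl = ^openib` and `osc = ^rdma` in the file above use the leading `^` to mean "every component except the ones listed", whereas a value like `btl = vader,openib,self` names the components to use. The sketch below models that include/exclude distinction on a made-up list of available components. The function `select_components` and the component list are hypothetical and only illustrate the selection rule; the real framework also applies priorities and runtime checks that are not modeled here.

```shell
#!/bin/sh
# Sketch only: models include vs. "^" exclude lists on a made-up component set.
set -eu

# select_components "<spec>" "<space-separated available components>"
select_components() {
    spec=$1
    available=$2
    case $spec in
        '^'*) mode=exclude; names=${spec#?} ;;
        *)    mode=include; names=$spec ;;
    esac
    for comp in $available; do
        # Check whether this component appears in the comma-separated list.
        case ",$names," in
            *",$comp,"*) listed=yes ;;
            *)           listed=no ;;
        esac
        if [ "$mode" = include ] && [ "$listed" = yes ]; then echo "$comp"; fi
        if [ "$mode" = exclude ] && [ "$listed" = no ]; then echo "$comp"; fi
    done
}

select_components '^openib' 'self vader tcp openib'
```

With the exclude form, a component added to the build later is picked up automatically, while the include form has to be edited to mention it.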


@ -0,0 +1,21 @@
# (c) 2013-2018 Los Alamos National Security, LLC. All rights reserved.
# Open MPI optimized configuration for TOSS/TOSS2 v1.7.x/1.8.x
source ./common
source ./common-optimized
# Disable components not needed on TOSS platforms with high-speed networks
enable_mca_no_build=carto,crs,filem,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp,btl-tcp
# TOSS2 uses slurm
with_slurm=yes
with_tm=no
# Enable PMI support for direct launch
with_pmi=yes
# Enable lustre support in romio
with_io_romio_flags=--with-file-system=ufs+nfs+lustre
# Always build ibverbs support
with_verbs=yes


@@ -10,7 +10,7 @@
 # Copyright (c) 2004-2005 The Regents of the University of California.
 # All rights reserved.
 # Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
-# Copyright (c) 2011-2014 Los Alamos National Security, LLC. All rights
+# Copyright (c) 2011-2018 Los Alamos National Security, LLC. All rights
 # reserved.
 # $COPYRIGHT$
 #
@ -69,16 +69,14 @@ orte_report_launch_progress = 1
orte_startup_timeout = 10000 orte_startup_timeout = 10000
## Protect the shared file systems ## Protect the shared file systems
orte_no_session_dirs = /panfs,/scratch,/users,/usr/projects orte_no_session_dirs = /lustre,/net,/users,/usr/projects
orte_tmpdir_base = /tmp orte_tmpdir_base = /tmp
## Require an allocation to run - protects the frontend ## Require an allocation to run - protects the frontend
## from inadvertent job executions ## from inadvertent job executions
orte_allocation_required = 1 orte_allocation_required = 1
## Add the interface for out-of-band communication ## Set up out-of-band communication.
## and set it up
oob_tcp_if_include=ib0,eth0
oob_tcp_peer_retries = 1000 oob_tcp_peer_retries = 1000
oob_tcp_sndbuf = 32768 oob_tcp_sndbuf = 32768
oob_tcp_rcvbuf = 32768 oob_tcp_rcvbuf = 32768
@ -88,13 +86,18 @@ btl = vader,openib,self
## Setup OpenIB - just in case ## Setup OpenIB - just in case
btl_openib_want_fork_support = 0 btl_openib_want_fork_support = 0
## Use Shared Receive Queues (SRQ). Mellanox ConnectX cards should be able to
## use eXtended Reliable Connection receive queues (XRC), but our systems are
## missing the needed libraries and headers to support it.
btl_openib_receive_queues = S,4096,1024:S,12288,512:S,65536,512 btl_openib_receive_queues = S,4096,1024:S,12288,512:S,65536,512
## Enable cpu affinity ## Rank by core
hwloc_base_binding_policy = core rmaps_base_ranking_policy = core
## Setup MPI options ## Setup MPI options
mpi_show_handle_leaks = 1 mpi_show_handle_leaks = 0
mpi_warn_on_fork = 1 mpi_warn_on_fork = 1
#mpi_abort_print_stack = 1 #mpi_abort_print_stack = 0
## Run orted on parent node of allocation
ras_base_launch_orted_on_hn = true


@ -0,0 +1,21 @@
# (c) 2013-2018 Los Alamos National Security, LLC. All rights reserved.
# Open MPI optimized configuration for TOSS/TOSS2 v1.7.x/1.8.x
source ./common
source ./common-optimized
# Disable components not needed on TOSS Ethernet-connected clusters
enable_mca_no_build=carto,crs,filem,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp
# TOSS2 uses slurm
with_slurm=yes
with_tm=no
# Enable PMI support for direct launch
with_pmi=yes
# Enable lustre support in romio
with_io_romio_flags=--with-file-system=ufs+nfs+lustre
# Do not build ibverbs support
with_verbs=no


@@ -10,7 +10,7 @@
 # Copyright (c) 2004-2005 The Regents of the University of California.
 # All rights reserved.
 # Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
-# Copyright (c) 2011-2014 Los Alamos National Security, LLC. All rights
+# Copyright (c) 2011-2018 Los Alamos National Security, LLC. All rights
 # reserved.
 # $COPYRIGHT$
 #
@ -69,32 +69,25 @@ orte_report_launch_progress = 1
orte_startup_timeout = 10000 orte_startup_timeout = 10000
## Protect the shared file systems ## Protect the shared file systems
orte_no_session_dirs = /panfs,/scratch,/users,/usr/projects orte_no_session_dirs = /lustre,/net,/users,/usr/projects
orte_tmpdir_base = /tmp orte_tmpdir_base = /tmp
## Require an allocation to run - protects the frontend ## Require an allocation to run - protects the frontend
## from inadvertent job executions ## from inadvertent job executions
orte_allocation_required = 1 orte_allocation_required = 1
## Add the interface for out-of-band communication ## Set up the interface for out-of-band communication
## and set it up
oob_tcp_if_include = ib0,eth0
oob_tcp_peer_retries = 1000 oob_tcp_peer_retries = 1000
oob_tcp_sndbuf = 32768 oob_tcp_sndbuf = 32768
oob_tcp_rcvbuf = 32768 oob_tcp_rcvbuf = 32768
## Define the MPI interconnects ## Rank by core
btl = vader,openib,self rmaps_base_ranking_policy = core
## Setup OpenIB - just in case
btl_openib_want_fork_support = 0
btl_openib_receive_queues = S,4096,1024:S,12288,512:S,65536,512
## Enable cpu affinity
hwloc_base_binding_policy = core
## Setup MPI options ## Setup MPI options
mpi_show_handle_leaks = 0 mpi_show_handle_leaks = 0
mpi_warn_on_fork = 1 mpi_warn_on_fork = 1
#mpi_abort_print_stack = 0 #mpi_abort_print_stack = 0
## Run orted on parent node of allocation
ras_base_launch_orted_on_hn = true