39367ca0bf
Turns out that the way the SLURM plm works is not compatible with the way MPI processes on Cray XC obtain RDMA credentials to use the high speed network. Unlike with ALPS, the mpirun process is on the first compute node in the job. With the current PLM launch system, mpirun (HNP daemon) launches the MPI ranks on that node rather than relying on srun. This will probably require a significant amount of effort to rework to support Native SLURM on Cray XC's. As a short term alternative, have the alps plm (which gets selected by default again on Cray systems regardless of the launch system) check whether or not srun or alps is being used on the system. If alps is not being used, print a helpful message for the user and abort the job launch. Signed-off-by: Howard Pritchard <howardp@lanl.gov>
46 строки
1.7 KiB
Plaintext
46 строки
1.7 KiB
Plaintext
# -*- text -*-
|
|
#
|
|
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
|
|
# University Research and Technology
|
|
# Corporation. All rights reserved.
|
|
# Copyright (c) 2004-2005 The University of Tennessee and The University
|
|
# of Tennessee Research Foundation. All rights
|
|
# reserved.
|
|
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
|
|
# University of Stuttgart. All rights reserved.
|
|
# Copyright (c) 2004-2005 The Regents of the University of California.
|
|
# All rights reserved.
|
|
# $COPYRIGHT$
|
|
#
|
|
# Additional copyrights may follow
|
|
#
|
|
# $HEADER$
|
|
#
|
|
[multiple-prefixes]
|
|
The ALPS process starter for Open MPI does not support multiple
|
|
different --prefix options to mpirun. You can specify at most one
|
|
unique value for the --prefix option (in any of the application
|
|
contexts); it will be applied to all the application contexts of your
|
|
parallel job.
|
|
|
|
Put simply, you must have Open MPI installed in the same location on
|
|
all of your ALPS nodes.
|
|
|
|
Multiple different --prefix options were specified to mpirun. This is
|
|
a fatal error for the ALPS process starter in Open MPI.
|
|
|
|
The first two prefix values supplied were:
|
|
%s
|
|
and %s
|
|
#
|
|
[no-hosts-in-list]
|
|
The ALPS process starter for Open MPI didn't find any hosts in
|
|
the map for this application. This can be caused by a lack of
|
|
an allocation, or by an error in the Open MPI code. Please check
|
|
to ensure you have a ALPS allocation. If you do, then please pass
|
|
the error to the Open MPI user's mailing list for assistance.
|
|
#
|
|
[slurm-not-supported]
|
|
mpirun is not a supported launcher on Cray XC using Native SLURM.
|
|
srun must be used to launch jobs on these systems.
|