LANL noticed that calling MPI_ABORT invokes opal_output(0, ...) unconditionally, which can result in a flood of messages to the user if all MPI processes invoke abort.

Additionally, some users were confused because they saw the MPI_ABORT opal_output() messages from ''some'' MPI processes, but not ''all'' of them (despite the fact that every MPI process supposedly invoked MPI_ABORT). The reason is that calling MPI_ABORT triggers ORTE to kill all MPI processes, so it's a race condition as to whether a) all MPI processes actually invoke MPI_ABORT, and/or b) whether every process is able to opal_output() before they are killed.

This commit does two simple things:

* Now use orte_show_help() for the MPI_ABORT message, so they are aggregated.
* Add a note in the message that calling MPI_ABORT kills all processes, so you might not see all output, yadda yadda yadda.

This commit was SVN r19735.
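To make the "aggregated" point concrete, here is a minimal sketch of the general idea: identical (file, topic) reports from many processes are counted, and only the first one is printed in full. This is not ORTE's implementation; `help_table`, `help_report`, and `help_count` are hypothetical names invented for illustration.

```c
#include <string.h>

#define MAX_TOPICS 8

/* Hypothetical: one aggregated entry, a (file, topic) pair plus a
 * count of how many processes reported it. */
struct help_entry {
    const char *file;
    const char *topic;
    int count;
};

/* Hypothetical: fixed-size table of aggregated entries. */
struct help_table {
    struct help_entry entries[MAX_TOPICS];
    int used;
};

/* Record one report.
 * Returns 1 if this (file, topic) pair is new, meaning the caller
 * should print the full message; 0 if it is a duplicate that was
 * only counted; -1 if the table is full. */
static int help_report(struct help_table *t, const char *file,
                       const char *topic)
{
    for (int i = 0; i < t->used; ++i) {
        if (0 == strcmp(t->entries[i].file, file) &&
            0 == strcmp(t->entries[i].topic, topic)) {
            t->entries[i].count++;
            return 0;
        }
    }
    if (MAX_TOPICS == t->used) {
        return -1;
    }
    t->entries[t->used].file = file;
    t->entries[t->used].topic = topic;
    t->entries[t->used].count = 1;
    t->used++;
    return 1;
}

/* Number of reports seen for a pair, or 0 if it was never reported. */
static int help_count(const struct help_table *t, const char *file,
                      const char *topic)
{
    for (int i = 0; i < t->used; ++i) {
        if (0 == strcmp(t->entries[i].file, file) &&
            0 == strcmp(t->entries[i].topic, topic)) {
            return t->entries[i].count;
        }
    }
    return 0;
}
```

With a table like this, a front end could print the message once and later summarize how many more processes sent the same one, instead of printing one line per rank.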
Parent: 4ff6bb924f
Commit: 57a3dce9ba
@@ -9,7 +9,7 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
- * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2007-2008 Cisco Systems, Inc. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -18,6 +18,7 @@
  */
 #include "ompi_config.h"
 
+#include "orte/util/show_help.h"
 #include "ompi/mpi/c/bindings.h"
 #include "ompi/runtime/mpiruntime.h"
 #include "ompi/memchecker.h"
@@ -49,7 +50,9 @@ int MPI_Abort(MPI_Comm comm, int errorcode)
         OMPI_ERR_INIT_FINALIZE(FUNC_NAME);
     }
 
-    opal_output(0, "MPI_ABORT invoked on rank %d in communicator %s with errorcode %d\n",
-                ompi_comm_rank(comm), comm->c_name, errorcode);
+    orte_show_help("help-mpi-api.txt", "mpi-abort", true,
+                   ompi_comm_rank(comm),
+                   (NULL != comm->c_name) ? comm->c_name : "<Unknown>",
+                   errorcode);
     return ompi_mpi_abort(comm, errorcode, true);
 }
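The new call also guards against a NULL communicator name, which the old opal_output() call did not; passing a NULL pointer to a `%s` conversion is undefined behavior in C. The following standalone sketch shows the same guard outside of Open MPI. `struct fake_comm`, `comm_display_name`, and `format_abort_msg` are hypothetical stand-ins for illustration, not Open MPI types or functions.

```c
#include <stdio.h>

/* Hypothetical stand-in for a communicator: only the two fields the
 * message needs. This is not Open MPI's communicator structure. */
struct fake_comm {
    int rank;
    const char *c_name;   /* may be NULL if no name was ever set */
};

/* Same guard as in the diff: never hand a NULL pointer to %s. */
static const char *comm_display_name(const struct fake_comm *comm)
{
    return (NULL != comm->c_name) ? comm->c_name : "<Unknown>";
}

/* Format the abort message into buf. Returns what snprintf returns:
 * the number of characters the full message needs, excluding the
 * terminating NUL. */
static int format_abort_msg(char *buf, size_t len,
                            const struct fake_comm *comm, int errorcode)
{
    return snprintf(buf, len,
                    "MPI_ABORT was invoked on rank %d in communicator %s\n"
                    "with errorcode %d.",
                    comm->rank, comm_display_name(comm), errorcode);
}
```

A caller would pass a buffer and its size, for example `format_abort_msg(buf, sizeof buf, &comm, 1)`, and compare the return value against the buffer size to detect truncation.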
@@ -2,7 +2,7 @@
 #
 # Copyright (c) 2006 High Performance Computing Center Stuttgart,
 # University of Stuttgart. All rights reserved.
-# Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
+# Copyright (c) 2006-2008 Cisco Systems, Inc. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -14,7 +14,14 @@
 [mpi-function-after-finalize]
 Calling any MPI-function after calling MPI_Finalize is erroneous.
 The only exceptions are MPI_Initialized, MPI_Finalized and MPI_Get_version.
-
 #
 [mpi-initialize-twice]
 Calling MPI_Init or MPI_Init_thread twice is erroneous.
+#
+[mpi-abort]
+MPI_ABORT was invoked on rank %d in communicator %s
+with errorcode %d.
+
+NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
+You may or may not see output from other processes, depending on
+exactly when Open MPI kills them.
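The help file above is organized as `[topic]` headers followed by a printf-style message body. As a rough illustration of how a topic such as `mpi-abort` could be located in text of this shape, here is a simplified sketch. It is not the parser Open MPI uses; `find_topic` is a hypothetical helper, and treating every line that starts with `#` as a comment is an assumption based on how the file looks.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the body of section [topic] from
 * help-file text into out.
 * Assumptions: a section ends at the next line starting with '[',
 * and lines starting with '#' are comments that are skipped.
 * Returns 0 on success, -1 if the topic is missing or out is too
 * small. */
static int find_topic(const char *text, const char *topic,
                      char *out, size_t outlen)
{
    char header[64];
    int n = snprintf(header, sizeof header, "[%s]", topic);
    if (n < 0 || (size_t)n >= sizeof header || 0 == outlen) {
        return -1;
    }

    size_t used = 0;
    int in_section = 0;
    const char *line = text;
    out[0] = '\0';

    while ('\0' != *line) {
        const char *eol = strchr(line, '\n');
        size_t len = (NULL != eol) ? (size_t)(eol - line) : strlen(line);

        if (!in_section) {
            /* Look for an exact "[topic]" header line. */
            if (len == (size_t)n && 0 == strncmp(line, header, len)) {
                in_section = 1;
            }
        } else if (len > 0 && '[' == line[0]) {
            break;                      /* the next section starts here */
        } else if (!(len > 0 && '#' == line[0])) {
            /* Body line (possibly empty): append it plus a newline. */
            if (used + len + 2 > outlen) {
                return -1;
            }
            memcpy(out + used, line, len);
            used += len;
            out[used++] = '\n';
            out[used] = '\0';
        }

        if (NULL == eol) {
            break;
        }
        line = eol + 1;
    }
    return in_section ? 0 : -1;
}
```

The body returned this way still contains the `%d` and `%s` placeholders; a caller would then format it with the rank, communicator name, and error code, in the order the orte_show_help() call passes them.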