LANL noticed that calling MPI_ABORT invokes opal_output(0, ...)
unconditionally, which can result in a flood of messages to the user
if all MPI processes invoke abort.  Additionally, some users were
confused because they saw the MPI_ABORT opal_output() messages from
''some'' MPI processes, but not ''all'' of them (despite the fact that
every MPI process supposedly invoked MPI_ABORT).  The reason is that
calling MPI_ABORT triggers ORTE to kill all MPI processes, so it's a
race condition as to whether a) all MPI processes actually invoke
MPI_ABORT, and/or b) whether every process is able to opal_output()
before they are killed.

This commit does two simple things:
 * Now use orte_show_help() for the MPI_ABORT message, so that the
   messages are aggregated.
 * Add a note in the message that calling MPI_ABORT kills all
   processes, so you might not see all output, yadda yadda yadda.
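
To illustrate what "aggregated" means in the first bullet, here is a minimal, self-contained sketch of de-duplicating help messages by (file, topic) pair. It is not ORTE's implementation: `help_table`, `help_record`, and `help_suppressed` are hypothetical names invented for this example, and the fixed-size table is an assumption made to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOPICS 16  /* arbitrary capacity chosen for this sketch */

/* One entry per distinct (file, topic) pair seen so far. */
struct help_entry {
    const char *file;
    const char *topic;
    int count;  /* how many processes reported this pair */
};

struct help_table {
    struct help_entry entries[MAX_TOPICS];
    size_t used;
};

/* Record one report (hypothetical helper, not an Open MPI function).
 * Returns 1 if this is the first report for the pair, meaning the full
 * message should be printed; 0 if it is a duplicate that was only
 * counted; -1 if the table is full. */
int help_record(struct help_table *t, const char *file, const char *topic)
{
    assert(t != NULL && file != NULL && topic != NULL);
    for (size_t i = 0; i < t->used; i++) {
        if (strcmp(t->entries[i].file, file) == 0 &&
            strcmp(t->entries[i].topic, topic) == 0) {
            t->entries[i].count++;
            return 0;
        }
    }
    if (t->used == MAX_TOPICS) {
        return -1;
    }
    t->entries[t->used] = (struct help_entry){ file, topic, 1 };
    t->used++;
    return 1;
}

/* Number of duplicate reports that were suppressed for a pair
 * (0 if the pair was reported once or never). */
int help_suppressed(const struct help_table *t, const char *file,
                    const char *topic)
{
    assert(t != NULL && file != NULL && topic != NULL);
    for (size_t i = 0; i < t->used; i++) {
        if (strcmp(t->entries[i].file, file) == 0 &&
            strcmp(t->entries[i].topic, topic) == 0) {
            return t->entries[i].count - 1;
        }
    }
    return 0;
}
```

With a table like this, the first process to report ("help-mpi-api.txt", "mpi-abort") triggers the full message, and later reports only bump a counter that can be summarized in a single line.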

This commit was SVN r19735.
This commit is contained in:
Jeff Squyres 2008-10-14 19:23:03 +00:00
parent 4ff6bb924f
commit 57a3dce9ba
2 changed files with 15 additions and 5 deletions

View file

@@ -9,7 +9,7 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
- * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2007-2008 Cisco Systems, Inc. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -18,6 +18,7 @@
  */
 #include "ompi_config.h"
+#include "orte/util/show_help.h"
 #include "ompi/mpi/c/bindings.h"
 #include "ompi/runtime/mpiruntime.h"
 #include "ompi/memchecker.h"
@@ -49,7 +50,9 @@ int MPI_Abort(MPI_Comm comm, int errorcode)
         OMPI_ERR_INIT_FINALIZE(FUNC_NAME);
     }
-    opal_output(0, "MPI_ABORT invoked on rank %d in communicator %s with errorcode %d\n",
-                ompi_comm_rank(comm), comm->c_name, errorcode);
+    orte_show_help("help-mpi-api.txt", "mpi-abort", true,
+                   ompi_comm_rank(comm),
+                   (NULL != comm->c_name) ? comm->c_name : "<Unknown>",
+                   errorcode);
     return ompi_mpi_abort(comm, errorcode, true);
 }

View file

@@ -2,7 +2,7 @@
 #
 # Copyright (c) 2006 High Performance Computing Center Stuttgart,
 # University of Stuttgart. All rights reserved.
-# Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
+# Copyright (c) 2006-2008 Cisco Systems, Inc. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -14,7 +14,14 @@
 [mpi-function-after-finalize]
 Calling any MPI-function after calling MPI_Finalize is erroneous.
 The only exceptions are MPI_Initialized, MPI_Finalized and MPI_Get_version.
 #
 [mpi-initialize-twice]
 Calling MPI_Init or MPI_Init_thread twice is erroneous.
+#
+[mpi-abort]
+MPI_ABORT was invoked on rank %d in communicator %s
+with errorcode %d.
+NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
+You may or may not see output from other processes, depending on
+exactly when Open MPI kills them.
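
The [mpi-abort] template takes three arguments in order: rank (%d), communicator name (%s), and error code (%d). Passing a NULL pointer for %s is undefined behavior in C, which is presumably why the new call site substitutes "<Unknown>". The following standalone sketch shows that pattern with plain snprintf(); `format_abort_message` and `MPI_ABORT_TEMPLATE` are hypothetical names for illustration, not part of Open MPI, and only the first two template lines are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* First two lines of the [mpi-abort] help topic, copied from the diff. */
#define MPI_ABORT_TEMPLATE \
    "MPI_ABORT was invoked on rank %d in communicator %s\n" \
    "with errorcode %d.\n"

/* Hypothetical helper: fill buf with the abort message.
 * A NULL communicator name is replaced with "<Unknown>" so that a NULL
 * pointer is never handed to the %s conversion.
 * Returns the value of snprintf(): the number of characters the full
 * message needs, excluding the terminating NUL. */
int format_abort_message(char *buf, size_t len, int rank,
                         const char *comm_name, int errorcode)
{
    assert(buf != NULL && len > 0);
    const char *name = (NULL != comm_name) ? comm_name : "<Unknown>";
    return snprintf(buf, len, MPI_ABORT_TEMPLATE, rank, name, errorcode);
}
```

A caller would pass a buffer and its size, for example `format_abort_message(buf, sizeof buf, 3, NULL, 1)`, and compare the return value against the buffer size to detect truncation.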