Add a stack trace feature that reports where and what signal occurred
after MPI startup.
A new mpirun parameter, "mpi_signal", takes a comma-separated list of
signals to catch; e.g. mpirun --mca mpi_signal 8,11
catches SIGFPE and SIGSEGV.
Only the first fault is reported (SA_ONESHOT), because the same
fault would occur again once the handler returns.
The data provided by siginfo_t is printed to STDOUT (yes,
it calls printf ,-]).
Additionally, with glibc, it uses backtrace and backtrace_symbols to
print the call stack up to the function in which the signal was raised:
(Rank:0) Going to write to RD_ONLY mmaped shared mem
Signal:11 info.si_errno:0(Success) si_code:2(SEGV_ACCERR)
Failing at addr:0x4020c000
[0] func:/home/rusraink/ompi-gcc/lib/libmpi.so.0 [0x40121afe]
[1] func:./t0 [0x42029180]
[2] func:./t0(__libc_start_main+0x95) [0x42017589]
[3] func:./t0(__libc_start_main+0x49) [0x8048691]
This commit was SVN r4170.
2005-01-26 22:11:46 +03:00
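The commit message describes "mpi_signal" as a comma-separated list of signal numbers such as 8,11. A minimal sketch of how such a list could be parsed follows. The names parse_signal_list and SKETCH_MAX_SIGNAL are hypothetical and chosen for illustration only, the upper bound of 64 is an assumption, and this is not the parsing code Open MPI itself uses.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed upper bound on signal numbers; illustrative only. */
#define SKETCH_MAX_SIGNAL 64

/*
 * Hypothetical helper (not part of Open MPI): parse a list such as
 * "8,11" into out[]. Returns the number of signals stored, or -1 if
 * the list is malformed, out of range, or has more than max_out entries.
 */
int parse_signal_list(const char *spec, int *out, int max_out)
{
    int count = 0;
    const char *cursor = spec;

    while (*cursor != '\0') {
        char *end;
        long value;

        errno = 0;
        value = strtol(cursor, &end, 10);
        if (end == cursor || errno != 0) {
            return -1;              /* no digits, or overflow */
        }
        if (value <= 0 || value > SKETCH_MAX_SIGNAL) {
            return -1;              /* not a plausible signal number */
        }
        if (count == max_out) {
            return -1;              /* caller's array is full */
        }
        out[count++] = (int) value;

        if (*end == ',') {
            cursor = end + 1;
            if (*cursor == '\0') {
                return -1;          /* trailing comma */
            }
        } else if (*end == '\0') {
            cursor = end;           /* clean end of list */
        } else {
            return -1;              /* unexpected character */
        }
    }
    return count;
}
```

Rejecting malformed input outright, rather than silently skipping bad entries, keeps a typo on the command line from quietly disabling the handler for a signal the user meant to catch.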
|
|
|
/*
|
2005-11-05 22:57:48 +03:00
|
|
|
* Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
|
|
|
|
* University Research and Technology
|
|
|
|
* Corporation. All rights reserved.
|
|
|
|
* Copyright (c) 2004-2005 The University of Tennessee and The University
|
|
|
|
* of Tennessee Research Foundation. All rights
|
|
|
|
* reserved.
|
2005-01-26 22:11:46 +03:00
|
|
|
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
|
|
|
|
* University of Stuttgart. All rights reserved.
|
2005-03-24 15:43:37 +03:00
|
|
|
* Copyright (c) 2004-2005 The Regents of the University of California.
|
|
|
|
* All rights reserved.
|
2006-07-17 15:16:58 +04:00
|
|
|
* Copyright (c) 2006 Sun Microsystems, Inc. All rights reserved.
|
2009-01-03 18:33:54 +03:00
|
|
|
* Copyright (c) 2008-2009 Cisco Systems, Inc. All rights reserved.
|
2005-01-26 22:11:46 +03:00
|
|
|
* $COPYRIGHT$
|
|
|
|
*
|
|
|
|
* Additional copyrights may follow
|
|
|
|
*
|
|
|
|
* $HEADER$
|
|
|
|
*/
|
|
|
|
|
2006-02-12 04:33:29 +03:00
|
|
|
#include "opal_config.h"
|
2005-01-26 22:11:46 +03:00
|
|
|
|
2005-02-09 06:08:13 +03:00
|
|
|
#include <stdio.h>
|
|
|
|
#ifdef HAVE_UNISTD_H
|
|
|
|
#include <unistd.h>
|
|
|
|
#endif
|
|
|
|
|
2005-01-26 22:11:46 +03:00
|
|
|
#ifdef HAVE_STRING_H
|
|
|
|
#include <string.h>
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#ifdef HAVE_SIGNAL_H
|
|
|
|
#include <signal.h>
|
|
|
|
#endif
|
|
|
|
|
2005-07-04 06:38:44 +04:00
|
|
|
#include "opal/util/stacktrace.h"
|
2005-08-13 00:46:25 +04:00
|
|
|
#include "opal/mca/base/mca_base_param.h"
|
2006-07-27 06:56:02 +04:00
|
|
|
#include "opal/mca/backtrace/backtrace.h"
|
2006-02-12 04:33:29 +03:00
|
|
|
#include "opal/constants.h"
|
2008-12-19 01:39:49 +03:00
|
|
|
#include "opal/util/output.h"
|
2009-03-20 04:05:30 +03:00
|
|
|
#include "opal/util/show_help.h"
|
2010-05-18 02:57:42 +04:00
|
|
|
#include "opal/util/argv.h"
|
2005-01-26 22:11:46 +03:00
|
|
|
|
|
|
|
#ifndef _NSIG
|
|
|
|
#define _NSIG 32
|
|
|
|
#endif
|
|
|
|
|
2006-12-17 22:14:13 +03:00
|
|
|
#define HOSTFORMAT "[%s:%05d] "
|
|
|
|
|
2006-12-17 22:48:19 +03:00
|
|
|
static char stacktrace_hostname[64];
|
2009-01-03 18:33:54 +03:00
|
|
|
static char *unable_to_print_msg = "Unable to print stack trace!\n";
|
2006-12-17 22:48:19 +03:00
|
|
|
|
2005-01-26 22:11:46 +03:00
|
|
|
/**
|
|
|
|
* This function is being called as a signal-handler in response
|
|
|
|
* to a user-specified signal (e.g. SIGFPE or SIGSEGV).
|
|
|
|
* For Linux/Glibc, it then uses backtrace and backtrace_symbols
|
|
|
|
* to determine the current stack and then prints it to stderr.
|
2005-10-13 19:41:25 +04:00
|
|
|
* Where available, the BSD libexecinfo is used to provide Linux/Glibc
|
2007-02-06 15:00:30 +03:00
|
|
|
* compatible backtrace and backtrace_symbols functions.
|
2005-01-26 22:11:46 +03:00
|
|
|
* Yes, printf and malloc are not signal-safe per se, but should be
|
|
|
|
* on Linux?
|
|
|
|
*
|
|
|
|
* @param signo with the signal number raised
|
|
|
|
* @param info with information regarding the reason/sender of the signal
|
|
|
|
* @param p context pointer passed to SA_SIGINFO-style handlers (unused here)
|
|
|
|
*
|
2005-01-28 23:58:38 +03:00
|
|
|
* FIXME: Should distinguish systems that don't have siginfo...
|
2005-01-26 22:11:46 +03:00
|
|
|
*/
|
2009-05-07 00:11:28 +04:00
|
|
|
#if OPAL_WANT_PRETTY_PRINT_STACKTRACE && ! defined(__WINDOWS__)
|
2008-12-10 23:40:47 +03:00
|
|
|
static void show_stackframe (int signo, siginfo_t * info, void * p)
|
2005-01-26 22:11:46 +03:00
|
|
|
{
|
|
|
|
char print_buffer[1024];
|
|
|
|
char * tmp = print_buffer;
|
|
|
|
int size = sizeof (print_buffer);
|
2006-12-17 22:14:13 +03:00
|
|
|
int ret, traces_size;
|
2006-12-17 22:48:19 +03:00
|
|
|
char *si_code_str = "";
|
2006-12-17 22:14:13 +03:00
|
|
|
char **traces;
|
|
|
|
|
2006-12-17 22:27:57 +03:00
|
|
|
/* write out the header information */
|
|
|
|
memset (print_buffer, 0, sizeof (print_buffer));
|
|
|
|
ret = snprintf(print_buffer, sizeof(print_buffer),
|
|
|
|
HOSTFORMAT "*** Process received signal ***\n",
|
2006-12-17 22:50:20 +03:00
|
|
|
stacktrace_hostname, getpid());
|
2006-12-17 22:27:57 +03:00
|
|
|
write(fileno(stderr), print_buffer, ret);
|
|
|
|
fflush(stderr);
|
|
|
|
|
|
|
|
|
2005-01-26 22:11:46 +03:00
|
|
|
/*
|
|
|
|
* Yes, we are doing printf inside a signal-handler.
|
|
|
|
* However, backtrace itself calls malloc (which may not be signal-safe,
|
2008-12-11 00:18:13 +03:00
|
|
|
* although on Linux printf and malloc usually work in practice)
|
2005-01-26 22:11:46 +03:00
|
|
|
*
|
|
|
|
* We could use backtrace_symbols_fd and write directly into a
|
|
|
|
* file descriptor, though without formatting -- also this fd
|
|
|
|
* should be opened in a sensible way...
|
|
|
|
*/
|
|
|
|
memset (print_buffer, 0, sizeof (print_buffer));
|
|
|
|
|
2006-12-17 22:14:13 +03:00
|
|
|
#ifdef HAVE_STRSIGNAL
|
|
|
|
ret = snprintf (tmp, size, HOSTFORMAT "Signal: %s (%d)\n",
|
2006-12-17 22:48:19 +03:00
|
|
|
stacktrace_hostname, getpid(), strsignal(signo), signo);
|
2006-12-17 22:14:13 +03:00
|
|
|
#else
|
|
|
|
ret = snprintf (tmp, size, HOSTFORMAT "Signal: %d\n",
|
2006-12-17 22:48:19 +03:00
|
|
|
stacktrace_hostname, getpid(), signo);
|
2006-12-17 22:14:13 +03:00
|
|
|
#endif
|
|
|
|
size -= ret;
|
|
|
|
tmp += ret;
|
|
|
|
|
2007-03-17 00:27:19 +03:00
|
|
|
if (NULL != info) {
|
|
|
|
switch (signo)
|
|
|
|
{
|
|
|
|
case SIGILL:
|
|
|
|
switch (info->si_code)
|
|
|
|
{
|
2005-10-13 19:41:25 +04:00
|
|
|
#ifdef ILL_ILLOPC
|
2006-12-17 22:14:13 +03:00
|
|
|
case ILL_ILLOPC: si_code_str = "Illegal opcode"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
2005-01-26 23:05:44 +03:00
|
|
|
#ifdef ILL_ILLOPN
|
2006-12-17 22:14:13 +03:00
|
|
|
case ILL_ILLOPN: si_code_str = "Illegal operand"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
|
|
|
#ifdef ILL_ILLADR
|
2006-12-17 22:14:13 +03:00
|
|
|
case ILL_ILLADR: si_code_str = "Illegal addressing mode"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
2005-10-13 19:41:25 +04:00
|
|
|
#ifdef ILL_ILLTRP
|
2006-12-17 22:14:13 +03:00
|
|
|
case ILL_ILLTRP: si_code_str = "Illegal trap"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
|
|
|
#ifdef ILL_PRVOPC
|
2006-12-17 22:14:13 +03:00
|
|
|
case ILL_PRVOPC: si_code_str = "Privileged opcode"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
2005-01-26 23:05:44 +03:00
|
|
|
#ifdef ILL_PRVREG
|
2006-12-17 22:14:13 +03:00
|
|
|
case ILL_PRVREG: si_code_str = "Privileged register"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
|
|
|
#ifdef ILL_COPROC
|
2006-12-17 22:14:13 +03:00
|
|
|
case ILL_COPROC: si_code_str = "Coprocessor error"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
|
|
|
#ifdef ILL_BADSTK
|
2006-12-17 22:14:13 +03:00
|
|
|
case ILL_BADSTK: si_code_str = "Internal stack error"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
2007-03-17 00:27:19 +03:00
|
|
|
}
|
|
|
|
break;
|
|
|
|
case SIGFPE:
|
|
|
|
switch (info->si_code)
|
|
|
|
{
|
2005-01-26 23:05:44 +03:00
|
|
|
#ifdef FPE_INTDIV
|
2006-12-17 22:14:13 +03:00
|
|
|
case FPE_INTDIV: si_code_str = "Integer divide-by-zero"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
|
|
|
#ifdef FPE_INTOVF
|
2006-12-17 22:14:13 +03:00
|
|
|
case FPE_INTOVF: si_code_str = "Integer overflow"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
2006-12-17 22:14:13 +03:00
|
|
|
case FPE_FLTDIV: si_code_str = "Floating point divide-by-zero"; break;
|
|
|
|
case FPE_FLTOVF: si_code_str = "Floating point overflow"; break;
|
|
|
|
case FPE_FLTUND: si_code_str = "Floating point underflow"; break;
|
2007-07-25 03:19:45 +04:00
|
|
|
#ifdef FPE_FLTRES
|
2006-12-17 22:14:13 +03:00
|
|
|
case FPE_FLTRES: si_code_str = "Floating point inexact result"; break;
|
2007-07-25 03:19:45 +04:00
|
|
|
#endif
|
2007-07-10 07:46:57 +04:00
|
|
|
#ifdef FPE_FLTINV
|
2006-12-17 22:14:13 +03:00
|
|
|
case FPE_FLTINV: si_code_str = "Invalid floating point operation"; break;
|
2007-07-10 07:46:57 +04:00
|
|
|
#endif
|
2005-01-26 23:05:44 +03:00
|
|
|
#ifdef FPE_FLTSUB
|
2006-12-17 22:14:13 +03:00
|
|
|
case FPE_FLTSUB: si_code_str = "Subscript out of range"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
2007-03-17 00:27:19 +03:00
|
|
|
}
|
|
|
|
break;
|
|
|
|
case SIGSEGV:
|
|
|
|
switch (info->si_code)
|
|
|
|
{
|
2005-10-13 19:41:25 +04:00
|
|
|
#ifdef SEGV_MAPERR
|
2006-12-17 22:14:13 +03:00
|
|
|
case SEGV_MAPERR: si_code_str = "Address not mapped"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
|
|
|
#ifdef SEGV_ACCERR
|
2006-12-17 22:14:13 +03:00
|
|
|
case SEGV_ACCERR: si_code_str = "Invalid permissions"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
2007-03-17 00:27:19 +03:00
|
|
|
}
|
|
|
|
break;
|
|
|
|
case SIGBUS:
|
|
|
|
switch (info->si_code)
|
|
|
|
{
|
2005-10-13 19:41:25 +04:00
|
|
|
#ifdef BUS_ADRALN
|
2006-12-17 22:14:13 +03:00
|
|
|
case BUS_ADRALN: si_code_str = "Invalid address alignment"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
2009-08-12 17:07:04 +04:00
|
|
|
#ifdef BUS_ADRERR
|
|
|
|
case BUS_ADRERR: si_code_str = "Non-existent physical address"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
|
|
|
#ifdef BUS_OBJERR
|
2006-12-17 22:14:13 +03:00
|
|
|
case BUS_OBJERR: si_code_str = "Object-specific hardware error"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
2007-03-17 00:27:19 +03:00
|
|
|
}
|
|
|
|
break;
|
|
|
|
case SIGTRAP:
|
|
|
|
switch (info->si_code)
|
|
|
|
{
|
2005-01-26 23:05:44 +03:00
|
|
|
#ifdef TRAP_BRKPT
|
2006-12-17 22:14:13 +03:00
|
|
|
case TRAP_BRKPT: si_code_str = "Process breakpoint"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
|
|
|
#ifdef TRAP_TRACE
|
2006-12-17 22:14:13 +03:00
|
|
|
case TRAP_TRACE: si_code_str = "Process trace trap"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
2007-03-17 00:27:19 +03:00
|
|
|
}
|
|
|
|
break;
|
|
|
|
case SIGCHLD:
|
|
|
|
switch (info->si_code)
|
|
|
|
{
|
2005-10-13 19:41:25 +04:00
|
|
|
#ifdef CLD_EXITED
|
2006-12-17 22:14:13 +03:00
|
|
|
case CLD_EXITED: si_code_str = "Child has exited"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
|
|
|
#ifdef CLD_KILLED
|
2006-12-17 22:14:13 +03:00
|
|
|
case CLD_KILLED: si_code_str = "Child has terminated abnormally and did not create a core file"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
|
|
|
#ifdef CLD_DUMPED
|
2006-12-17 22:14:13 +03:00
|
|
|
case CLD_DUMPED: si_code_str = "Child has terminated abnormally and created a core file"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
|
|
|
#ifdef CLD_TRAPPED
|
2006-12-17 22:14:13 +03:00
|
|
|
case CLD_TRAPPED: si_code_str = "Traced child has trapped"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
|
|
|
#ifdef CLD_STOPPED
|
2006-12-17 22:14:13 +03:00
|
|
|
case CLD_STOPPED: si_code_str = "Child has stopped"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
|
|
|
#ifdef CLD_CONTINUED
|
2006-12-17 22:14:13 +03:00
|
|
|
case CLD_CONTINUED: si_code_str = "Stopped child has continued"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
2007-03-17 00:27:19 +03:00
|
|
|
}
|
|
|
|
break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#ifdef SIGPOLL
|
2007-03-17 00:27:19 +03:00
|
|
|
case SIGPOLL:
|
|
|
|
switch (info->si_code)
|
|
|
|
{
|
2005-12-15 03:51:28 +03:00
|
|
|
#ifdef POLL_IN
|
2007-03-17 00:27:19 +03:00
|
|
|
case POLL_IN: si_code_str = "Data input available"; break;
|
2005-12-15 03:51:28 +03:00
|
|
|
#endif
|
|
|
|
#ifdef POLL_OUT
|
2007-03-17 00:27:19 +03:00
|
|
|
case POLL_OUT: si_code_str = "Output buffers available"; break;
|
2005-12-15 03:51:28 +03:00
|
|
|
#endif
|
|
|
|
#ifdef POLL_MSG
|
2007-03-17 00:27:19 +03:00
|
|
|
case POLL_MSG: si_code_str = "Input message available"; break;
|
2005-12-15 03:51:28 +03:00
|
|
|
#endif
|
|
|
|
#ifdef POLL_ERR
|
2007-03-17 00:27:19 +03:00
|
|
|
case POLL_ERR: si_code_str = "I/O error"; break;
|
2005-12-15 03:51:28 +03:00
|
|
|
#endif
|
|
|
|
#ifdef POLL_PRI
|
2007-03-17 00:27:19 +03:00
|
|
|
case POLL_PRI: si_code_str = "High priority input available"; break;
|
2005-12-15 03:51:28 +03:00
|
|
|
#endif
|
|
|
|
#ifdef POLL_HUP
|
2007-03-17 00:27:19 +03:00
|
|
|
case POLL_HUP: si_code_str = "Device disconnected"; break;
|
2005-12-15 03:51:28 +03:00
|
|
|
#endif
|
2007-03-17 00:27:19 +03:00
|
|
|
}
|
|
|
|
break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif /* SIGPOLL */
|
2007-03-17 00:27:19 +03:00
|
|
|
default:
|
|
|
|
switch (info->si_code)
|
|
|
|
{
|
2005-01-26 23:05:44 +03:00
|
|
|
#ifdef SI_ASYNCNL
|
2006-12-17 22:14:13 +03:00
|
|
|
case SI_ASYNCNL: si_code_str = "SI_ASYNCNL"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
|
|
|
#ifdef SI_SIGIO
|
2006-12-17 22:14:13 +03:00
|
|
|
case SI_SIGIO: si_code_str = "Queued SIGIO"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
2006-09-24 22:20:55 +04:00
|
|
|
#ifdef SI_ASYNCIO
|
2006-12-17 22:14:13 +03:00
|
|
|
case SI_ASYNCIO: si_code_str = "Asynchronous I/O request completed"; break;
|
2006-09-24 22:20:55 +04:00
|
|
|
#endif
|
|
|
|
#ifdef SI_MESGQ
|
2006-12-17 22:14:13 +03:00
|
|
|
case SI_MESGQ: si_code_str = "Message queue state changed"; break;
|
2006-09-24 22:20:55 +04:00
|
|
|
#endif
|
2006-12-17 22:14:13 +03:00
|
|
|
case SI_TIMER: si_code_str = "Timer expiration"; break;
|
|
|
|
case SI_QUEUE: si_code_str = "Sigqueue() signal"; break;
|
|
|
|
case SI_USER: si_code_str = "User function (kill, sigsend, abort, etc.)"; break;
|
2005-01-26 23:05:44 +03:00
|
|
|
#ifdef SI_KERNEL
|
2006-12-17 22:14:13 +03:00
|
|
|
case SI_KERNEL: si_code_str = "Kernel signal"; break;
|
2005-10-13 19:41:25 +04:00
|
|
|
#endif
|
2010-03-15 08:33:42 +03:00
|
|
|
/* Dragonfly defines SI_USER and SI_UNDEFINED both as zero: */
|
2010-04-01 21:04:06 +04:00
|
|
|
/* For some reason, the PGI compiler will not let us combine these two
|
|
|
|
#if tests into a single statement. Sigh. */
|
|
|
|
#if defined(SI_UNDEFINED)
|
|
|
|
#if SI_UNDEFINED != SI_USER
|
2006-12-17 22:14:13 +03:00
|
|
|
case SI_UNDEFINED: si_code_str = "Undefined code"; break;
|
2010-04-01 21:04:06 +04:00
|
|
|
#endif
|
2005-01-26 23:05:44 +03:00
|
|
|
#endif
|
2007-03-17 00:27:19 +03:00
|
|
|
}
|
|
|
|
}
|
2005-01-26 22:11:46 +03:00
|
|
|
|
2007-03-17 00:27:19 +03:00
|
|
|
/* print signal errno information */
|
|
|
|
if (0 != info->si_errno) {
|
|
|
|
ret = snprintf(tmp, size, HOSTFORMAT "Associated errno: %s (%d)\n",
|
|
|
|
stacktrace_hostname, getpid(),
|
|
|
|
strerror (info->si_errno), info->si_errno);
|
|
|
|
size -= ret;
|
|
|
|
tmp += ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = snprintf(tmp, size, HOSTFORMAT "Signal code: %s (%d)\n",
|
2006-12-17 22:48:19 +03:00
|
|
|
stacktrace_hostname, getpid(),
|
2007-03-17 00:27:19 +03:00
|
|
|
si_code_str, info->si_code);
|
2006-12-17 22:14:13 +03:00
|
|
|
size -= ret;
|
|
|
|
tmp += ret;
|
|
|
|
|
2007-03-17 00:27:19 +03:00
|
|
|
switch (signo)
|
|
|
|
{
|
|
|
|
case SIGILL:
|
|
|
|
case SIGFPE:
|
|
|
|
case SIGSEGV:
|
|
|
|
case SIGBUS:
|
2005-02-09 06:08:13 +03:00
|
|
|
{
|
2006-12-17 22:14:13 +03:00
|
|
|
ret = snprintf(tmp, size, HOSTFORMAT "Failing at address: %p\n",
|
2006-12-17 22:48:19 +03:00
|
|
|
stacktrace_hostname, getpid(), info->si_addr);
|
2005-02-09 06:08:13 +03:00
|
|
|
size -= ret;
|
|
|
|
tmp += ret;
|
|
|
|
break;
|
|
|
|
}
|
2007-03-17 00:27:19 +03:00
|
|
|
case SIGCHLD:
|
|
|
|
{
|
|
|
|
ret = snprintf(tmp, size, HOSTFORMAT "Sending PID: %d, Sending UID: %d, Status: %d\n",
|
|
|
|
stacktrace_hostname, getpid(),
|
|
|
|
info->si_pid, info->si_uid, info->si_status);
|
|
|
|
size -= ret;
|
|
|
|
tmp += ret;
|
|
|
|
break;
|
|
|
|
}
|
2005-01-26 23:05:44 +03:00
|
|
|
#ifdef SIGPOLL
|
2007-03-17 00:27:19 +03:00
|
|
|
case SIGPOLL:
|
|
|
|
{
|
2005-04-06 09:32:11 +04:00
|
|
|
#ifdef HAVE_SIGINFO_T_SI_FD
|
2007-03-17 00:27:19 +03:00
|
|
|
ret = snprintf(tmp, size, HOSTFORMAT "Band event: %ld, File Descriptor : %d\n",
|
2008-11-11 17:58:53 +03:00
|
|
|
stacktrace_hostname, getpid(), (long)info->si_band, info->si_fd);
|
2005-12-15 03:51:28 +03:00
|
|
|
#elif HAVE_SIGINFO_T_SI_BAND
|
2007-03-17 00:27:19 +03:00
|
|
|
ret = snprintf(tmp, size, HOSTFORMAT "Band event: %ld\n",
|
2008-11-11 17:58:53 +03:00
|
|
|
stacktrace_hostname, getpid(), (long)info->si_band);
|
2005-12-15 03:51:28 +03:00
|
|
|
#else
|
2007-03-17 00:27:19 +03:00
|
|
|
ret = 0;
|
2005-04-06 09:32:11 +04:00
|
|
|
#endif
|
2007-03-17 00:27:19 +03:00
|
|
|
size -= ret;
|
|
|
|
tmp += ret;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
ret = snprintf(tmp, size,
|
|
|
|
HOSTFORMAT "siginfo is NULL, additional information unavailable\n",
|
|
|
|
stacktrace_hostname, getpid());
|
2005-02-09 06:08:13 +03:00
|
|
|
size -= ret;
|
|
|
|
tmp += ret;
|
2005-01-26 22:11:46 +03:00
|
|
|
}
|
2005-02-09 06:08:13 +03:00
|
|
|
|
2006-12-17 22:14:13 +03:00
|
|
|
/* write out the signal information generated above */
|
2006-03-15 18:06:09 +03:00
|
|
|
write(fileno(stderr), print_buffer, sizeof(print_buffer)-size);
|
2005-02-09 06:08:13 +03:00
|
|
|
fflush(stderr);
|
2005-01-26 22:11:46 +03:00
|
|
|
|
2006-12-17 22:14:13 +03:00
|
|
|
/* print out the stack trace */
|
|
|
|
ret = opal_backtrace_buffer(&traces, &traces_size);
|
|
|
|
if (OPAL_SUCCESS == ret) {
|
|
|
|
int i;
|
|
|
|
/* since we have the opportunity, strip off the bottom two
|
|
|
|
function calls, which will be this function and
|
2008-12-11 00:18:13 +03:00
|
|
|
opal_backtrace_buffer(). */
|
2006-12-17 22:14:13 +03:00
|
|
|
for (i = 2 ; i < traces_size ; ++i) {
|
|
|
|
ret = snprintf(print_buffer, sizeof(print_buffer),
|
2006-12-17 22:27:57 +03:00
|
|
|
HOSTFORMAT "[%2d] %s\n",
|
2006-12-17 22:48:19 +03:00
|
|
|
stacktrace_hostname, getpid(), i - 2, traces[i]);
|
2009-02-05 18:24:48 +03:00
|
|
|
if (ret > 0) {
|
|
|
|
write(fileno(stderr), print_buffer, ret);
|
2009-02-05 18:26:44 +03:00
|
|
|
} else {
|
|
|
|
write(fileno(stderr), unable_to_print_msg,
|
|
|
|
strlen(unable_to_print_msg));
|
2009-02-05 18:24:48 +03:00
|
|
|
}
|
2006-12-17 22:14:13 +03:00
|
|
|
}
|
|
|
|
} else {
|
|
|
|
opal_backtrace_print(stderr);
|
|
|
|
}
|
2005-02-09 06:08:13 +03:00
|
|
|
|
2006-12-17 22:14:13 +03:00
|
|
|
/* write out the footer information */
|
|
|
|
memset (print_buffer, 0, sizeof (print_buffer));
|
|
|
|
ret = snprintf(print_buffer, sizeof(print_buffer),
|
|
|
|
HOSTFORMAT "*** End of error message ***\n",
|
2006-12-17 22:48:19 +03:00
|
|
|
stacktrace_hostname, getpid());
|
2009-01-03 18:33:54 +03:00
|
|
|
if (ret > 0) {
|
|
|
|
write(fileno(stderr), print_buffer, ret);
|
|
|
|
} else {
|
|
|
|
write(fileno(stderr), unable_to_print_msg, strlen(unable_to_print_msg));
|
|
|
|
}
|
2005-02-09 06:08:13 +03:00
|
|
|
fflush(stderr);
|
2005-01-26 22:11:46 +03:00
|
|
|
}
|
|
|
|
|
2009-05-07 00:11:28 +04:00
|
|
|
#endif /* OPAL_WANT_PRETTY_PRINT_STACKTRACE && ! defined(__WINDOWS__) */
|
2005-01-26 22:11:46 +03:00
|
|
|
|
|
|
|
|
2010-05-21 18:30:15 +04:00
|
|
|
#if OPAL_WANT_PRETTY_PRINT_STACKTRACE
|
2008-12-10 23:40:47 +03:00
|
|
|
void opal_stackframe_output(int stream)
|
|
|
|
{
|
|
|
|
int traces_size;
|
|
|
|
char **traces;
|
|
|
|
|
|
|
|
/* print out the stack trace */
|
|
|
|
if (OPAL_SUCCESS == opal_backtrace_buffer(&traces, &traces_size)) {
|
|
|
|
int i;
|
|
|
|
/* since we have the opportunity, strip off the bottom two
|
|
|
|
function calls, which will be this function and
|
|
|
|
opal_backtrace_buffer(). */
|
|
|
|
for (i = 2; i < traces_size; ++i) {
|
2009-10-01 03:33:12 +04:00
|
|
|
opal_output(stream, "%s", traces[i]);
|
2008-12-10 23:40:47 +03:00
|
|
|
}
free(traces); /* release the array returned by opal_backtrace_buffer() */
|
|
|
|
} else {
|
|
|
|
opal_backtrace_print(stderr);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-05-18 02:57:42 +04:00
|
|
|
char *opal_stackframe_output_string(void)
|
|
|
|
{
|
|
|
|
int traces_size, i;
|
|
|
|
size_t len;
|
|
|
|
char *output, **traces;
|
|
|
|
|
|
|
|
len = 0;
|
|
|
|
if (OPAL_SUCCESS != opal_backtrace_buffer(&traces, &traces_size)) {
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Calculate the space needed for the string */
|
|
|
|
for (i = 3; i < traces_size; i++) {
|
|
|
|
if (NULL == traces[i]) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
len += strlen(traces[i]) + 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
output = (char *) malloc(len + 1);
|
|
|
|
if (NULL == output) {
|
|
|
|
free(traces); /* do not leak the backtrace buffer when malloc() fails */
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
*output = '\0';
|
|
|
|
for (i = 3; i < traces_size; i++) {
|
|
|
|
if (NULL == traces[i]) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
strcat(output, traces[i]);
|
|
|
|
strcat(output, "\n");
|
|
|
|
}
|
|
|
|
|
|
|
|
free(traces);
|
|
|
|
return output;
|
|
|
|
}
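The two-pass pattern used by opal_stackframe_output_string() above (measure, allocate, then copy) can be sketched in isolation as follows. join_lines() and its parameters are hypothetical names used only for illustration and are not part of OPAL. Unlike the function above, this sketch copies with memcpy() at a running offset instead of calling strcat() repeatedly, which avoids rescanning the output on every append.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * join_lines() is a hypothetical helper, not part of OPAL.
 * Concatenate lines[skip..n-1], each followed by '\n', into one
 * malloc()ed string. A NULL entry ends the list early. Returns NULL
 * on allocation failure; the caller frees the result.
 */
static char *join_lines(char **lines, int n, int skip)
{
    size_t len = 0;
    char *out, *pos;
    int i;

    assert(lines != NULL && skip >= 0);

    /* Pass 1: total size, one extra byte per line for the newline. */
    for (i = skip; i < n && lines[i] != NULL; ++i) {
        len += strlen(lines[i]) + 1;
    }

    out = malloc(len + 1);   /* +1 for the terminating '\0' */
    if (out == NULL) {
        return NULL;
    }

    /* Pass 2: copy each line at the current write position. */
    pos = out;
    for (i = skip; i < n && lines[i] != NULL; ++i) {
        size_t l = strlen(lines[i]);
        memcpy(pos, lines[i], l);
        pos += l;
        *pos++ = '\n';
    }
    *pos = '\0';
    return out;
}
```

Calling join_lines(traces, traces_size, 3) would correspond to the skip of three frames used above, assuming traces comes from a backtrace_symbols()-style call.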
|
|
|
|
|
2010-05-21 18:30:15 +04:00
|
|
|
#endif /* OPAL_WANT_PRETTY_PRINT_STACKTRACE */
|
2008-12-10 23:40:47 +03:00
|
|
|
|
2005-01-26 22:11:46 +03:00
|
|
|
/**
|
2008-12-10 23:40:47 +03:00
|
|
|
* Here we register the show_stackframe function for signals
|
2005-01-26 22:11:46 +03:00
|
|
|
* passed to Open MPI via the mpi_signal parameter given to mpirun
|
|
|
|
* by the user.
|
|
|
|
*
|
2006-02-12 04:33:29 +03:00
|
|
|
* @returnvalue OPAL_SUCCESS
|
|
|
|
* @returnvalue OPAL_ERR_BAD_PARAM if the value in the signal-list
|
2005-01-26 22:11:46 +03:00
|
|
|
* is not a valid signal-number
|
|
|
|
*
|
|
|
|
*/
|
2005-07-04 06:38:44 +04:00
|
|
|
int opal_util_register_stackhandlers (void)
|
2005-01-26 22:11:46 +03:00
|
|
|
{
|
2009-05-07 00:11:28 +04:00
|
|
|
#if OPAL_WANT_PRETTY_PRINT_STACKTRACE && ! defined(__WINDOWS__)
|
2009-03-20 04:05:30 +03:00
|
|
|
struct sigaction act, old;
|
2005-01-26 22:11:46 +03:00
|
|
|
char * string_value;
|
|
|
|
char * tmp;
|
|
|
|
char * next;
|
2007-02-21 17:26:30 +03:00
|
|
|
int param, i;
|
2009-04-10 19:32:33 +04:00
|
|
|
bool complain, showed_help = false;
|
2006-12-17 22:48:19 +03:00
|
|
|
|
|
|
|
gethostname(stacktrace_hostname, sizeof(stacktrace_hostname));
|
|
|
|
stacktrace_hostname[sizeof(stacktrace_hostname) - 1] = '\0';
|
|
|
|
/* to keep these somewhat readable, only print the machine name */
|
2007-02-21 17:26:30 +03:00
|
|
|
for (i = 0 ; i < (int)sizeof(stacktrace_hostname) ; ++i) {
|
2006-12-17 22:48:19 +03:00
|
|
|
if (stacktrace_hostname[i] == '.') {
|
|
|
|
stacktrace_hostname[i] = '\0';
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2005-01-26 22:11:46 +03:00
|
|
|
|
2006-01-11 07:36:39 +03:00
|
|
|
param = mca_base_param_find ("opal", NULL, "signal");
|
2005-01-26 22:11:46 +03:00
|
|
|
mca_base_param_lookup_string (param, &string_value);
|
|
|
|
|
|
|
|
memset(&act, 0, sizeof(act));
|
2008-12-10 23:40:47 +03:00
|
|
|
act.sa_sigaction = show_stackframe;
|
2005-01-26 23:05:44 +03:00
|
|
|
act.sa_flags = SA_SIGINFO;
|
|
|
|
#ifdef SA_ONESHOT
|
|
|
|
act.sa_flags |= SA_ONESHOT;
|
|
|
|
#else
|
|
|
|
act.sa_flags |= SA_RESETHAND;
|
|
|
|
#endif
|
2005-01-26 22:11:46 +03:00
|
|
|
|
|
|
|
for (tmp = next = string_value ;
|
|
|
|
next != NULL && *next != '\0';
|
|
|
|
tmp = next + 1)
|
|
|
|
{
|
|
|
|
int sig;
|
|
|
|
int ret;
|
|
|
|
|
2009-04-10 19:32:33 +04:00
|
|
|
complain = false;
|
2005-01-26 22:11:46 +03:00
|
|
|
sig = strtol (tmp, &next, 10);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If there is no sensible number in the string, exit.
|
|
|
|
* Similarly for any number which is not in the signal-number range
|
|
|
|
*/
|
2005-02-09 06:08:13 +03:00
|
|
|
if (((0 == sig) && (tmp == next)) || (0 > sig) || (_NSIG <= sig)) {
|
2006-02-12 04:33:29 +03:00
|
|
|
return OPAL_ERR_BAD_PARAM;
|
2009-04-10 19:32:33 +04:00
|
|
|
} else if (next == NULL) {
|
2006-02-12 04:33:29 +03:00
|
|
|
return OPAL_ERR_BAD_PARAM;
|
2009-04-10 19:32:33 +04:00
|
|
|
} else if (':' == *next &&
|
|
|
|
0 == strncasecmp(next, ":complain", 9)) {
|
|
|
|
complain = true;
|
|
|
|
next += 9;
|
|
|
|
} else if (',' != *next && '\0' != *next) {
|
|
|
|
return OPAL_ERR_BAD_PARAM;
|
2005-02-09 06:08:13 +03:00
|
|
|
}
|
2005-01-26 22:11:46 +03:00
|
|
|
|
2009-04-10 19:32:33 +04:00
|
|
|
/* Just query first */
|
|
|
|
ret = sigaction (sig, NULL, &old);
|
|
|
|
if (0 != ret) {
|
|
|
|
return OPAL_ERR_IN_ERRNO;
|
2005-02-09 06:08:13 +03:00
|
|
|
}
|
2009-04-10 19:32:33 +04:00
|
|
|
/* Was there something already there? */
|
2009-03-20 04:05:30 +03:00
|
|
|
if (SIG_IGN != old.sa_handler && SIG_DFL != old.sa_handler) {
|
2009-04-10 19:32:33 +04:00
|
|
|
if (!showed_help && complain) {
|
2009-03-20 04:05:30 +03:00
|
|
|
/* JMS This is icky; there is no error message
|
|
|
|
aggregation here so this message may be repeated for
|
|
|
|
every single MPI process... This should be replaced
|
|
|
|
with OPAL_SOS when that is done so that it can be
|
|
|
|
properly aggregated. */
|
|
|
|
opal_show_help("help-opal-util.txt",
|
|
|
|
"stacktrace signal override",
|
|
|
|
true, sig, sig, sig, string_value);
|
|
|
|
showed_help = true;
|
|
|
|
}
|
2009-04-10 19:32:33 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Nope, nothing was there, so put in ours */
|
|
|
|
else {
|
|
|
|
if (0 != sigaction(sig, &act, NULL)) {
|
2009-03-20 04:05:30 +03:00
|
|
|
return OPAL_ERR_IN_ERRNO;
|
|
|
|
}
|
|
|
|
}
|
2005-01-26 22:11:46 +03:00
|
|
|
}
|
2007-03-07 04:09:38 +03:00
|
|
|
free(string_value);
|
2009-05-07 00:11:28 +04:00
|
|
|
#endif /* OPAL_WANT_PRETTY_PRINT_STACKTRACE && ! defined(__WINDOWS__) */
|
2005-07-13 08:16:03 +04:00
|
|
|
|
2006-02-12 04:33:29 +03:00
|
|
|
return OPAL_SUCCESS;
|
2005-01-26 22:11:46 +03:00
|
|
|
}
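The list format accepted by the parsing loop above, for example 8,11:complain, can be illustrated with a simplified stand-alone parser. This is a sketch under stated assumptions, not the OPAL code: parse_signal_list() and struct sig_entry are hypothetical names, errors are reported as -1 rather than OPAL error codes, and a trailing comma is tolerated here although the loop above rejects it.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <strings.h>

/* Fallback for builds where <signal.h> hides NSIG; 65 is an assumption
   that matches Linux/glibc. */
#ifndef NSIG
#define NSIG 65
#endif

/* One parsed entry: signal number plus its ":complain" flag. */
struct sig_entry {
    int  sig;
    bool complain;
};

/*
 * Hypothetical helper, not part of OPAL.
 * Parse a list such as "8,11:complain" into out[0..max-1].
 * Returns the number of entries, or -1 on malformed input or when
 * the output array is too small.
 */
static int parse_signal_list(const char *list, struct sig_entry *out, int max)
{
    const char *tmp = list;
    char *next;
    int count = 0;

    assert(list != NULL && out != NULL);

    while (*tmp != '\0') {
        long sig = strtol(tmp, &next, 10);

        /* No digits consumed, number out of range, or no room left. */
        if (next == tmp || sig < 0 || sig >= NSIG || count >= max) {
            return -1;
        }
        out[count].sig = (int) sig;
        out[count].complain = false;

        /* Optional case-insensitive ":complain" suffix. */
        if (0 == strncasecmp(next, ":complain", 9)) {
            out[count].complain = true;
            next += 9;
        }
        ++count;

        if (*next == ',') {
            tmp = next + 1;      /* step over the separator */
        } else if (*next == '\0') {
            break;               /* end of the list */
        } else {
            return -1;           /* unexpected character */
        }
    }
    return count;
}
```

Keeping the parsing separate from the sigaction() calls, as in this sketch, makes it possible to validate the whole list before any handler is installed.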
|
|
|
|
|