2004-01-15 09:08:25 +03:00
|
|
|
/*
|
2004-11-22 04:38:40 +03:00
|
|
|
* Copyright (c) 2004-2005 The Trustees of Indiana University.
|
|
|
|
* All rights reserved.
|
|
|
|
* Copyright (c) 2004-2005 The Trustees of the University of Tennessee.
|
|
|
|
* All rights reserved.
|
2004-11-28 23:09:25 +03:00
|
|
|
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
|
|
|
|
* University of Stuttgart. All rights reserved.
|
2005-03-24 15:43:37 +03:00
|
|
|
* Copyright (c) 2004-2005 The Regents of the University of California.
|
|
|
|
* All rights reserved.
|
2004-11-22 04:38:40 +03:00
|
|
|
* $COPYRIGHT$
|
|
|
|
*
|
|
|
|
* Additional copyrights may follow
|
|
|
|
*
|
2004-01-15 09:08:25 +03:00
|
|
|
* $HEADER$
|
|
|
|
*/
|
|
|
|
|
2004-06-07 19:33:53 +04:00
|
|
|
#include "ompi_config.h"
|
2004-01-15 09:08:25 +03:00
|
|
|
|
2004-03-19 00:35:28 +03:00
|
|
|
#include "include/constants.h"
|
2004-08-14 05:56:05 +04:00
|
|
|
#include "mpi/runtime/mpiruntime.h"
|
|
|
|
#include "mpi/runtime/params.h"
|
2004-03-17 21:45:16 +03:00
|
|
|
#include "runtime/runtime.h"
|
2004-11-18 01:47:08 +03:00
|
|
|
#include "runtime/ompi_progress.h"
|
2004-06-17 21:29:47 +04:00
|
|
|
#include "util/sys_info.h"
|
2004-07-01 18:49:54 +04:00
|
|
|
#include "util/proc_info.h"
|
2004-09-10 07:21:03 +04:00
|
|
|
#include "util/session_dir.h"
|
2004-01-15 09:08:25 +03:00
|
|
|
#include "mpi.h"
|
2004-03-17 21:45:16 +03:00
|
|
|
#include "communicator/communicator.h"
|
|
|
|
#include "group/group.h"
|
2004-08-12 20:56:24 +04:00
|
|
|
#include "info/info.h"
|
2004-09-05 20:05:37 +04:00
|
|
|
#include "util/show_help.h"
|
Add a Stacktrace feature, which figures where/what signal has happened
after MPI-startup.
For this a new mpirun-parameter "mpi_signal" is added, one may specify a
comma-separated list of signals to grab, e.g. mpirun --mca mpi_signal 8,11
will check for SIGFPE and SIGSEGV.
It only finds the first fault (SA_ONESHOT), as after the return the same
fault will occur again.
As printout, the data provided by siginfo_t is printed to STDOUT (yes,
it calls printf ,-]).
Additionally, with glibc, it uses backtrace and backtrace_symbols to
print the calling stack up to the function in which the signal was raised:
(Rank:0) Going to write to RD_ONLY mmaped shared mem
Signal:11 info.si_errno:0(Success) si_code:2(SEGV_ACCERR)
Failing at addr:0x4020c000
[0] func:/home/rusraink/ompi-gcc/lib/libmpi.so.0 [0x40121afe]
[1] func:./t0 [0x42029180]
[2] func:./t0(__libc_start_main+0x95) [0x42017589]
[3] func:./t0(__libc_start_main+0x49) [0x8048691]
This commit was SVN r4170.
2005-01-26 22:11:46 +03:00
|
|
|
#include "util/stacktrace.h"
|
2004-07-27 04:49:41 +04:00
|
|
|
#include "errhandler/errcode.h"
|
|
|
|
#include "errhandler/errclass.h"
|
2004-10-08 21:12:36 +04:00
|
|
|
#include "request/request.h"
|
2004-04-21 02:38:22 +04:00
|
|
|
#include "op/op.h"
|
2004-08-14 05:56:05 +04:00
|
|
|
#include "file/file.h"
|
2004-11-05 10:52:30 +03:00
|
|
|
#include "attribute/attribute.h"
|
2004-11-15 23:03:14 +03:00
|
|
|
#include "threads/thread.h"
|
2004-08-14 05:56:05 +04:00
|
|
|
|
2004-03-17 21:45:16 +03:00
|
|
|
#include "mca/base/base.h"
|
2004-06-17 20:23:34 +04:00
|
|
|
#include "mca/allocator/base/base.h"
|
|
|
|
#include "mca/allocator/allocator.h"
|
|
|
|
#include "mca/mpool/base/base.h"
|
|
|
|
#include "mca/mpool/mpool.h"
|
2004-03-17 21:45:16 +03:00
|
|
|
#include "mca/ptl/ptl.h"
|
|
|
|
#include "mca/ptl/base/base.h"
|
|
|
|
#include "mca/pml/pml.h"
|
|
|
|
#include "mca/pml/base/base.h"
|
|
|
|
#include "mca/coll/coll.h"
|
|
|
|
#include "mca/coll/base/base.h"
|
2004-08-14 05:56:05 +04:00
|
|
|
#include "mca/io/io.h"
|
|
|
|
#include "mca/io/base/base.h"
|
2004-11-20 22:12:43 +03:00
|
|
|
#include "mca/oob/oob.h"
|
Not as bad as this all may look. Tim and I made a significant change to the way we handle the startup of the oob, the seed, etc. We have made it backwards-compatible so that mpirun2 and singleton operations remain working. We had to adjust the name server and gpr as well, plus the process_info structure.
This also includes a checkpoint update to openmpi.c and ompid.c. I have re-enabled the ompid compile.
This latter raises an important point. The trunk compiles the programs like ompid just fine under Linux. It also does just fine for OSX under the dynamic libraries. However, we are seeing errors when compiling under OSX for the static case - the linker seems to have trouble resolving some variable names, even though linker diagnostics show the variables as being defined. Thus, a warning to Mac users that you may have to locally turn things off if you are trying to do static compiles. We ask, however, that you don't commit those changes that turn things off for everyone else - instead, let's try to figure out why the static compile is having a problem, and let everyone else continue to work.
Thanks
Ralph
This commit was SVN r2534.
2004-09-08 07:59:06 +04:00
|
|
|
#include "mca/oob/base/base.h"
|
2005-03-14 23:57:21 +03:00
|
|
|
#include "mca/ns/ns.h"
|
|
|
|
#include "mca/gpr/gpr.h"
|
|
|
|
#include "mca/rml/rml.h"
|
|
|
|
#include "mca/soh/soh.h"
|
|
|
|
#include "mca/errmgr/errmgr.h"
|
2004-09-08 07:59:06 +04:00
|
|
|
|
|
|
|
#include "runtime/runtime.h"
|
2004-10-28 23:12:45 +04:00
|
|
|
#include "event/event.h"
|
2004-01-15 09:08:25 +03:00
|
|
|
|
2004-02-05 04:52:56 +03:00
|
|
|
/*
|
|
|
|
* Global variables and symbols for the MPI layer
|
|
|
|
*/
|
|
|
|
|
2004-06-07 19:33:53 +04:00
|
|
|
bool ompi_mpi_initialized = false;
|
|
|
|
bool ompi_mpi_finalized = false;
|
2004-08-12 20:56:24 +04:00
|
|
|
|
2004-06-07 19:33:53 +04:00
|
|
|
bool ompi_mpi_thread_multiple = false;
|
|
|
|
int ompi_mpi_thread_requested = MPI_THREAD_SINGLE;
|
|
|
|
int ompi_mpi_thread_provided = MPI_THREAD_SINGLE;
|
2004-02-05 04:52:56 +03:00
|
|
|
|
2004-11-15 23:03:14 +03:00
|
|
|
ompi_thread_t *ompi_mpi_main_thread = NULL;
|
|
|
|
|
2004-06-07 19:33:53 +04:00
|
|
|
int ompi_mpi_init(int argc, char **argv, int requested, int *provided)
|
2004-01-15 09:08:25 +03:00
|
|
|
{
|
2004-08-14 05:56:05 +04:00
|
|
|
int ret, param;
|
2004-06-07 19:33:53 +04:00
|
|
|
ompi_proc_t** procs;
|
2004-03-03 19:44:41 +03:00
|
|
|
size_t nprocs;
|
2004-11-17 05:30:07 +03:00
|
|
|
char *error = NULL;
|
2005-03-25 06:06:06 +03:00
|
|
|
bool compound_cmd = false;
|
2005-03-14 23:57:21 +03:00
|
|
|
|
2005-03-25 06:06:06 +03:00
|
|
|
/* Join the run-time environment - do the things that don't hit
|
|
|
|
the registry */
|
|
|
|
|
2005-03-23 20:50:12 +03:00
|
|
|
if (ORTE_SUCCESS != (ret = orte_init_stage1())) {
|
|
|
|
error = "ompi_mpi_init: orte_init_stage1 failed";
|
|
|
|
goto error;
|
2004-09-08 07:59:06 +04:00
|
|
|
}
|
|
|
|
|
2005-03-23 20:50:12 +03:00
|
|
|
/* If we are neither the seed nor a singleton, and the orte_debug
 * flag is not set, start recording the compound command that
 * starts us up. If we are the seed or a singleton, skip this:
 * the registry is local, so we drive it directly. */
|
|
|
|
if (orte_process_info.seed ||
|
|
|
|
NULL == orte_process_info.ns_replica ||
|
|
|
|
orte_debug_flag) {
|
|
|
|
compound_cmd = false;
|
|
|
|
} else {
|
|
|
|
if (ORTE_SUCCESS != (ret = orte_gpr.begin_compound_cmd())) {
|
|
|
|
ORTE_ERROR_LOG(ret);
|
|
|
|
error = "ompi_mpi_init: orte_gpr.begin_compound_cmd failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
compound_cmd = true;
|
|
|
|
}
|
2004-09-23 18:35:02 +04:00
|
|
|
|
2005-03-23 20:50:12 +03:00
|
|
|
/* Now do the things that hit the registry */
|
|
|
|
if (ORTE_SUCCESS != (ret = orte_init_stage2())) {
|
|
|
|
ORTE_ERROR_LOG(ret);
|
|
|
|
error = "ompi_mpi_init: orte_init_stage2 failed";
|
|
|
|
goto error;
|
|
|
|
}
|
2004-08-14 05:56:05 +04:00
|
|
|
/* Once we've joined the RTE, see if any MCA parameters were
|
|
|
|
passed to the MPI level */
|
|
|
|
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_mpi_register_params())) {
|
2004-09-05 20:05:37 +04:00
|
|
|
error = "ompi_mpi_register_params() failed";
|
|
|
|
goto error;
|
2004-08-14 05:56:05 +04:00
|
|
|
}
|
|
|
|
|
2005-02-10 22:08:35 +03:00
|
|
|
#ifndef WIN32
|
2005-01-26 22:11:46 +03:00
|
|
|
if (OMPI_SUCCESS != (ret = ompi_util_register_stackhandlers ())) {
|
|
|
|
error = "ompi_util_register_stackhandlers() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
2005-02-10 22:08:35 +03:00
|
|
|
#endif
|
2005-01-26 22:11:46 +03:00
|
|
|
|
2004-06-07 19:33:53 +04:00
|
|
|
/* initialize ompi procs */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_proc_init())) {
|
2004-09-05 20:05:37 +04:00
|
|
|
error = "ompi_proc_init() failed";
|
|
|
|
goto error;
|
2004-03-03 19:44:41 +03:00
|
|
|
}
|
|
|
|
|
2005-03-14 23:57:21 +03:00
|
|
|
/* Open up MPI-related MCA modules. */
|
2004-08-14 05:56:05 +04:00
|
|
|
|
2004-06-17 20:23:34 +04:00
|
|
|
if (OMPI_SUCCESS != (ret = mca_allocator_base_open())) {
|
2004-09-05 20:05:37 +04:00
|
|
|
error = "mca_allocator_base_open() failed";
|
|
|
|
goto error;
|
2004-06-17 20:23:34 +04:00
|
|
|
}
|
|
|
|
if (OMPI_SUCCESS != (ret = mca_mpool_base_open())) {
|
2004-09-05 20:05:37 +04:00
|
|
|
error = "mca_mpool_base_open() failed";
|
|
|
|
goto error;
|
2004-06-17 20:23:34 +04:00
|
|
|
}
|
2004-06-07 19:33:53 +04:00
|
|
|
if (OMPI_SUCCESS != (ret = mca_pml_base_open())) {
|
2004-09-05 20:05:37 +04:00
|
|
|
error = "mca_pml_base_open() failed";
|
|
|
|
goto error;
|
2004-02-13 16:56:55 +03:00
|
|
|
}
|
2004-06-07 19:33:53 +04:00
|
|
|
if (OMPI_SUCCESS != (ret = mca_ptl_base_open())) {
|
2004-09-05 20:05:37 +04:00
|
|
|
error = "mca_ptl_base_open() failed";
|
|
|
|
goto error;
|
2004-02-13 16:56:55 +03:00
|
|
|
}
|
2004-06-07 19:33:53 +04:00
|
|
|
if (OMPI_SUCCESS != (ret = mca_coll_base_open())) {
|
2004-09-05 20:05:37 +04:00
|
|
|
error = "mca_coll_base_open() failed";
|
|
|
|
goto error;
|
2004-02-13 16:56:55 +03:00
|
|
|
}
|
2005-01-04 18:43:26 +03:00
|
|
|
/* The io framework is initialized lazily, at the first use of any
|
|
|
|
MPI_File_* function, so it is not included here. */
|
2004-01-30 06:59:39 +03:00
|
|
|
|
2004-10-15 00:50:06 +04:00
|
|
|
/* initialize module exchange */
|
|
|
|
if (OMPI_SUCCESS != (ret = mca_base_modex_init())) {
|
|
|
|
error = "mca_base_modex_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2004-02-13 16:56:55 +03:00
|
|
|
/* Select which pml, ptl, and coll modules to use, and determine the
|
2004-09-10 07:21:03 +04:00
|
|
|
final thread level */
|
2004-01-30 06:59:39 +03:00
|
|
|
|
2004-06-07 19:33:53 +04:00
|
|
|
if (OMPI_SUCCESS !=
|
2005-03-25 06:06:06 +03:00
|
|
|
(ret = mca_base_init_select_components(requested, provided))) {
|
2004-09-05 20:05:37 +04:00
|
|
|
error = "mca_base_init_select_components() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2004-10-08 21:12:36 +04:00
|
|
|
/* initialize requests */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_request_init())) {
|
|
|
|
error = "ompi_request_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2004-09-05 20:05:37 +04:00
|
|
|
/* initialize info */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_info_init())) {
|
|
|
|
error = "ompi_info_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
2004-10-08 21:12:36 +04:00
|
|
|
|
2004-09-05 20:05:37 +04:00
|
|
|
/* initialize error handlers */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_errhandler_init())) {
|
|
|
|
error = "ompi_errhandler_init() failed";
|
|
|
|
goto error;
|
2004-02-13 16:56:55 +03:00
|
|
|
}
|
2004-01-30 06:59:39 +03:00
|
|
|
|
2004-09-05 20:05:37 +04:00
|
|
|
/* initialize error codes */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_mpi_errcode_init())) {
|
|
|
|
error = "ompi_mpi_errcode_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize error classes */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_errclass_init())) {
|
|
|
|
error = "ompi_errclass_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize internal error codes */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_errcode_intern_init())) {
|
|
|
|
error = "ompi_errcode_intern_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
2004-05-08 03:23:03 +04:00
|
|
|
|
2004-09-05 20:05:37 +04:00
|
|
|
/* initialize groups */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_group_init())) {
|
|
|
|
error = "ompi_group_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize communicators */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_comm_init())) {
|
|
|
|
error = "ompi_comm_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize datatypes */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_ddt_init())) {
|
|
|
|
error = "ompi_ddt_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize ops */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_op_init())) {
|
|
|
|
error = "ompi_op_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize file handles */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_file_init())) {
|
|
|
|
error = "ompi_file_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2004-09-16 04:00:09 +04:00
|
|
|
/* initialize attribute meta-data structure for comm/win/dtype */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_attr_init())) {
|
|
|
|
error = "ompi_attr_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
2004-09-05 20:05:37 +04:00
|
|
|
/* do module exchange */
|
|
|
|
if (OMPI_SUCCESS != (ret = mca_base_modex_exchange())) {
|
|
|
|
error = "mca_base_modex_exchange() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2004-11-20 22:12:43 +03:00
|
|
|
/*
|
2005-03-14 23:57:21 +03:00
|
|
|
* Let system know we are at STG1 Barrier
|
2004-11-20 22:12:43 +03:00
|
|
|
*/
|
2005-03-14 23:57:21 +03:00
|
|
|
if (ORTE_SUCCESS != (ret = orte_soh.set_proc_soh(orte_process_info.my_name,
|
|
|
|
ORTE_PROC_STATE_AT_STG1, 0))) {
|
|
|
|
ORTE_ERROR_LOG(ret);
|
|
|
|
error = "set process state failed";
|
2004-11-20 22:12:43 +03:00
|
|
|
goto error;
|
2005-03-14 23:57:21 +03:00
|
|
|
}
|
2004-11-20 22:12:43 +03:00
|
|
|
|
2005-03-23 20:50:12 +03:00
|
|
|
/* if the compound command is operative, execute it
|
2004-11-20 22:12:43 +03:00
|
|
|
*/
|
2005-03-23 20:50:12 +03:00
|
|
|
if (compound_cmd) {
|
|
|
|
if (ORTE_SUCCESS != (ret = orte_gpr.exec_compound_cmd())) {
|
|
|
|
ORTE_ERROR_LOG(ret);
|
|
|
|
error = "ompi_mpi_init: orte_gpr.exec_compound_cmd failed";
|
|
|
|
goto error;
|
|
|
|
}
|
2005-03-14 23:57:21 +03:00
|
|
|
}
|
2005-03-23 20:50:12 +03:00
|
|
|
|
2004-11-20 22:12:43 +03:00
|
|
|
|
2005-03-14 23:57:21 +03:00
|
|
|
/* FIRST BARRIER - WAIT FOR MSG FROM RMGR_PROC_STAGE_GATE_MGR TO ARRIVE */
|
|
|
|
if (ORTE_SUCCESS != (ret = orte_rml.xcast(NULL, NULL, 0, NULL, NULL))) {
|
|
|
|
ORTE_ERROR_LOG(ret);
|
|
|
|
error = "ompi_mpi_init: failed to see all procs register (first barrier)";
|
|
|
|
goto error;
|
2004-11-20 22:12:43 +03:00
|
|
|
}
|
|
|
|
|
2005-03-14 23:57:21 +03:00
|
|
|
if (orte_debug_flag) {
|
|
|
|
ompi_output(0, "[%d,%d,%d] process startup completed",
|
|
|
|
ORTE_NAME_ARGS(orte_process_info.my_name));
|
2004-11-20 22:12:43 +03:00
|
|
|
}
|
|
|
|
|
2004-09-05 20:05:37 +04:00
|
|
|
/* add all ompi_proc_t's to PML */
|
|
|
|
if (NULL == (procs = ompi_proc_world(&nprocs))) {
|
|
|
|
ret = OMPI_ERROR; /* ompi_proc_world() signals failure only by returning NULL */
error = "ompi_proc_world() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
if (OMPI_SUCCESS != (ret = mca_pml.pml_add_procs(procs, nprocs))) {
|
|
|
|
free(procs);
|
|
|
|
error = "PML add procs failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
free(procs);
|
|
|
|
|
|
|
|
/* start PTL's */
|
|
|
|
param = 1;
|
|
|
|
if (OMPI_SUCCESS !=
|
|
|
|
(ret = mca_pml.pml_control(MCA_PTL_ENABLE, &param, sizeof(param)))) {
|
|
|
|
error = "PML control failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* save the resulting thread levels */
|
2004-02-05 04:52:56 +03:00
|
|
|
|
2004-06-07 19:33:53 +04:00
|
|
|
ompi_mpi_thread_requested = requested;
|
2004-10-10 04:03:42 +04:00
|
|
|
ompi_mpi_thread_provided = *provided;
|
2004-06-29 04:02:25 +04:00
|
|
|
ompi_mpi_thread_multiple = (ompi_mpi_thread_provided ==
|
|
|
|
MPI_THREAD_MULTIPLE);
|
2005-02-16 20:42:07 +03:00
|
|
|
#if OMPI_ENABLE_MPI_THREADS
|
2004-11-15 23:03:14 +03:00
|
|
|
ompi_mpi_main_thread = ompi_thread_get_self();
|
|
|
|
#else
|
|
|
|
ompi_mpi_main_thread = NULL;
|
|
|
|
#endif
|
2004-02-05 04:52:56 +03:00
|
|
|
|
2004-05-08 03:23:03 +04:00
|
|
|
/* Init coll for the comms */
|
|
|
|
|
2004-09-05 20:05:37 +04:00
|
|
|
if (OMPI_SUCCESS !=
|
|
|
|
(ret = mca_coll_base_comm_select(MPI_COMM_SELF, NULL))) {
|
|
|
|
error = "mca_coll_base_comm_select(MPI_COMM_SELF) failed";
|
|
|
|
goto error;
|
2004-07-13 16:35:43 +04:00
|
|
|
}
|
2004-05-08 03:23:03 +04:00
|
|
|
|
2004-09-05 20:05:37 +04:00
|
|
|
if (OMPI_SUCCESS !=
|
|
|
|
(ret = mca_coll_base_comm_select(MPI_COMM_WORLD, NULL))) {
|
|
|
|
error = "mca_coll_base_comm_select(MPI_COMM_WORLD) failed";
|
|
|
|
goto error;
|
2004-07-13 16:35:43 +04:00
|
|
|
}
|
2004-05-08 03:23:03 +04:00
|
|
|
|
2005-02-16 20:42:07 +03:00
|
|
|
#if OMPI_ENABLE_PROGRESS_THREADS /* BWB - XXX - FIXME - is this actually correct? */
|
2005-01-13 18:30:49 +03:00
|
|
|
/* setup I/O forwarding */
|
2005-02-21 21:56:30 +03:00
|
|
|
if (orte_process_info.seed == false) {
|
2005-01-18 20:32:54 +03:00
|
|
|
if (OMPI_SUCCESS != (ret = ompi_mpi_init_io())) {
|
|
|
|
error = "ompi_mpi_init_io failed";
|
|
|
|
goto error;
|
|
|
|
}
|
2005-01-13 18:30:49 +03:00
|
|
|
}
|
2004-11-18 02:37:49 +03:00
|
|
|
#endif
|
2004-05-08 03:23:03 +04:00
|
|
|
|
2005-02-21 21:56:30 +03:00
|
|
|
/*
|
2005-03-14 23:57:21 +03:00
|
|
|
* Let system know we are at STG2 Barrier
|
2005-02-21 21:56:30 +03:00
|
|
|
*/
|
2005-03-14 23:57:21 +03:00
|
|
|
if (ORTE_SUCCESS != (ret = orte_soh.set_proc_soh(orte_process_info.my_name,
|
|
|
|
ORTE_PROC_STATE_AT_STG2, 0))) {
|
|
|
|
ORTE_ERROR_LOG(ret);
|
|
|
|
error = "set process state failed";
|
|
|
|
goto error;
|
2005-02-21 21:56:30 +03:00
|
|
|
}
|
|
|
|
|
2005-03-14 23:57:21 +03:00
|
|
|
/* BWB - is this still needed? */
|
2005-02-16 20:42:07 +03:00
|
|
|
#if OMPI_ENABLE_PROGRESS_THREADS == 0
|
2005-01-13 18:30:49 +03:00
|
|
|
ompi_progress_events(OMPI_EVLOOP_NONBLOCK);
|
|
|
|
#endif
|
|
|
|
|
2005-03-14 23:57:21 +03:00
|
|
|
/* SECOND BARRIER - WAIT FOR MSG FROM RMGR_PROC_STAGE_GATE_MGR TO ARRIVE */
|
|
|
|
if (ORTE_SUCCESS != (ret = orte_rml.xcast(NULL, NULL, 0, NULL, NULL))) {
|
|
|
|
ORTE_ERROR_LOG(ret);
|
|
|
|
error = "ompi_mpi_init: failed to see all procs register (second barrier)";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2004-09-29 16:41:55 +04:00
|
|
|
/* Very last step: check whether we have been spawned. This comes
   at the very end because it needs collectives, datatypes, PTLs,
   etc. to be up and running. */
|
|
|
|
if (OMPI_SUCCESS != (ret = ompi_comm_dyn_init())) {
|
|
|
|
error = "ompi_comm_dyn_init() failed";
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2004-09-05 20:05:37 +04:00
|
|
|
error:
|
|
|
|
if (ret != OMPI_SUCCESS) {
|
|
|
|
ompi_show_help("help-mpi-runtime",
|
|
|
|
"mpi_init:startup:internal-failure", true,
|
|
|
|
"MPI_INIT", "MPI_INIT", error, ret);
|
|
|
|
return ret;
|
2004-06-29 04:02:25 +04:00
|
|
|
}
|
2004-05-08 03:23:03 +04:00
|
|
|
|
2004-02-13 16:56:55 +03:00
|
|
|
/* All done */
|
2004-02-05 04:52:56 +03:00
|
|
|
|
2004-06-07 19:33:53 +04:00
|
|
|
ompi_mpi_initialized = true;
|
2004-11-20 22:12:43 +03:00
|
|
|
|
2005-03-14 23:57:21 +03:00
|
|
|
if (orte_debug_flag) {
|
2004-11-20 22:12:43 +03:00
|
|
|
ompi_output(0, "[%d,%d,%d] ompi_mpi_init completed",
|
2005-03-14 23:57:21 +03:00
|
|
|
ORTE_NAME_ARGS(orte_process_info.my_name));
|
2004-11-20 22:12:43 +03:00
|
|
|
}
|
|
|
|
|
2004-02-13 16:56:55 +03:00
|
|
|
return MPI_SUCCESS;
|
2004-01-15 09:08:25 +03:00
|
|
|
}
|
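The function above funnels every failure through a single error label: each step stores its return code in ret, records a message, and jumps. The sketch below isolates that pattern under stated assumptions; sketch_init, step_ok, step_fail and the SKETCH_* codes are hypothetical stand-ins, not part of Open MPI. It also shows why a step that reports failure without a return code must set ret itself before jumping, since the check at the label only looks at ret.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_SUCCESS 0
#define SKETCH_ERROR  (-1)

typedef int (*init_step_fn)(void);

/* Hypothetical steps standing in for the real initialization calls. */
static int step_ok(void)   { return SKETCH_SUCCESS; }
static int step_fail(void) { return SKETCH_ERROR; }

/* Run two steps in order; on the first failure, report it at the
 * common label. *failed receives the message, or NULL on success. */
static int sketch_init(init_step_fn first, init_step_fn second,
                       const char **failed)
{
    int ret;
    const char *error = NULL;

    assert(first != NULL && second != NULL && failed != NULL);

    if (SKETCH_SUCCESS != (ret = first())) {
        error = "first step failed";
        goto error;
    }
    if (SKETCH_SUCCESS != (ret = second())) {
        error = "second step failed";
        goto error;
    }

 error:
    /* Reached on both success and failure; only ret decides. A step
     * that fails without a return code must assign ret before the
     * jump, or the failure is silently treated as success here. */
    if (SKETCH_SUCCESS != ret) {
        fprintf(stderr, "init failed: %s (code %d)\n", error, ret);
    }
    *failed = error;
    return ret;
}
```

Labels live in their own namespace in C, so the label error and the variable error can coexist, as they do in ompi_mpi_init.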