/*
 * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
 *                         University Research and Technology
 *                         Corporation.  All rights reserved.
 * Copyright (c) 2004-2005 The University of Tennessee and The University
 *                         of Tennessee Research Foundation.  All rights
 *                         reserved.
 * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 *                         University of Stuttgart.  All rights reserved.
 * Copyright (c) 2004-2005 The Regents of the University of California.
 *                         All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */
/** @file:
 *
 * the oob framework
 */

#ifndef _MCA_OOB_BASE_H_
#define _MCA_OOB_BASE_H_

#include "orte_config.h"

#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#ifdef HAVE_SYS_UIO_H
#include <sys/uio.h>
#endif

#include "opal/mca/mca.h"
#include "opal/threads/condition.h"

#include "orte/dss/dss_types.h"
#include "orte/mca/ns/ns_types.h"
#include "orte/mca/gpr/gpr_types.h"
#include "orte/mca/oob/oob_types.h"
#include "orte/mca/rml/rml_types.h"

#if defined(c_plusplus) || defined(__cplusplus)
extern "C" {
#endif

/*
 * global flag for use in timing tests
 */
ORTE_DECLSPEC extern int mca_oob_base_output;
ORTE_DECLSPEC extern bool orte_oob_base_timing;
ORTE_DECLSPEC extern bool orte_oob_xcast_timing;
ORTE_DECLSPEC extern opal_mutex_t orte_oob_xcast_mutex;
ORTE_DECLSPEC extern opal_condition_t orte_oob_xcast_cond;
ORTE_DECLSPEC extern int orte_oob_xcast_linear_xover, orte_oob_xcast_binomial_xover;

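/*
 * Illustrative sketch only, not the ORTE xcast code. The crossover variable
 * names above suggest that xcast can switch between a linear and a binomial
 * relay scheme. Assuming a standard binomial broadcast tree rooted at rank 0,
 * the hypothetical helper demo_binomial_children() below lists the ranks a
 * given relay would forward to. The helper name, the rank numbering, and the
 * choice of root are assumptions made for illustration.
 */

```c
#include <assert.h>

/*
 * Store in children[] (up to max entries) the ranks that `rank` forwards to
 * in a binomial tree over ranks 0..n-1 rooted at 0. Returns the number of
 * children, which may exceed max if the array is too small.
 */
static int demo_binomial_children(int rank, int n, int *children, int max)
{
    int count = 0;
    int mask = 1;

    /* mask becomes the smallest power of two strictly greater than rank */
    while (mask <= rank) {
        mask <<= 1;
    }

    /* each further power of two, added to rank, names one child */
    for (; rank + mask < n; mask <<= 1) {
        if (count < max) {
            children[count] = rank + mask;
        }
        count++;
    }
    return count;
}
```

/*
 * With 8 ranks, rank 0 forwards to 1, 2 and 4, and rank 1 forwards to 3 and 5,
 * so every rank is reached in at most three hops.
 */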
/*
 * Flag indicating if this framework has been opened
 */
ORTE_DECLSPEC extern bool orte_oob_base_already_opened;

/*
 * OOB API
 */

/**
 * General flags for send/recv
 *
 * An example of usage - to determine the size of the next available message
 * without receiving it:
 *
 *    int size = mca_oob_recv(name, 0, 0, tag, MCA_OOB_TRUNC|MCA_OOB_PEEK);
 */

#define MCA_OOB_PEEK  0x01   /**< flag to oob_recv to allow caller to peek a portion of the next available
                              * message w/out removing the message from the queue.  */
#define MCA_OOB_TRUNC 0x02   /**< flag to oob_recv to return the actual size of the message even if
                              * the receive buffer is smaller than the number of bytes available */
#define MCA_OOB_ALLOC 0x04   /**< flag to oob_recv to request the oob to allocate a buffer of the appropriate
                              * size for the receive and return the allocated buffer and size in the first
                              * element of the iovec array. */
#define MCA_OOB_PERSISTENT 0x08 /* post receive request persistently - don't remove on match */

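/*
 * Illustrative sketch only, not the OOB implementation. It models the PEEK
 * and TRUNC semantics described above on a hypothetical one-slot queue:
 * TRUNC reports the full message size even when the buffer is smaller, and
 * PEEK leaves the message queued. demo_queue_t, demo_recv() and the DEMO_*
 * constants (which mirror the flag values above) are assumptions made for
 * illustration.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_OOB_PEEK  0x01   /* same value as MCA_OOB_PEEK */
#define DEMO_OOB_TRUNC 0x02   /* same value as MCA_OOB_TRUNC */

/* Hypothetical one-slot queue standing in for the pending-message list. */
typedef struct {
    const char *data;   /* queued payload, NULL when the queue is empty */
    size_t len;         /* payload length in bytes */
} demo_queue_t;

/*
 * Copy up to buflen bytes of the queued message into buf.
 * Returns -1 if nothing is queued, the full message length if TRUNC is set,
 * and otherwise the number of bytes copied. The message is removed from the
 * queue unless PEEK is set.
 */
static int demo_recv(demo_queue_t *q, void *buf, size_t buflen, int flags)
{
    size_t copied;
    int rc;

    if (NULL == q->data) {
        return -1;
    }
    copied = (q->len < buflen) ? q->len : buflen;
    if (copied > 0) {
        memcpy(buf, q->data, copied);
    }
    rc = (flags & DEMO_OOB_TRUNC) ? (int)q->len : (int)copied;
    if (!(flags & DEMO_OOB_PEEK)) {
        q->data = NULL;
        q->len = 0;
    }
    return rc;
}
```

/*
 * Calling demo_recv(&q, NULL, 0, DEMO_OOB_TRUNC|DEMO_OOB_PEEK) first lets the
 * caller size a buffer before issuing the real receive.
 */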
/**
 * Obtain a string representation of the OOB contact information for
 * the selected OOB channels. This string may be passed to another
 * application via an MCA parameter (OMPI_MCA_oob_base_seed) to bootstrap
 * communications.
 *
 * @return A null terminated string that should be freed by the caller.
 *
 * Note that mca_oob_base_init() must be called to load and select
 * an OOB module prior to calling this routine.
 */

ORTE_DECLSPEC char* mca_oob_get_my_contact_info(void);

/**
 * Pre-populate the cache of contact information required by the OOB
 * to reach a given destination. This is required to set up a pointer
 * to initial registry/name server/etc.
 *
 * @param uri   The contact information of the peer process obtained
 *              via a call to mca_oob_get_my_contact_info().
 *
 */

ORTE_DECLSPEC int mca_oob_set_contact_info(const char*);

/**
 * A routine to ping a given process name to determine if it is reachable.
 *
 * @param  name  The peer name.
 * @param  tv    The length of time to wait on a connection/response.
 *
 * Note that this routine blocks up to the specified timeout waiting for a
 * connection / response from the specified peer. If the peer is unavailable
 * an error status is returned.
 */

ORTE_DECLSPEC int mca_oob_ping(const char*, struct timeval* tv);

/**
 * Extract from the contact info the peer process identifier.
 *
 * @param uri (IN)    The contact information of the peer process.
 * @param peer (OUT)  The peer process identifier.
 * @param uris (OUT)  Will return an array of uri strings corresponding
 *                    to the peer's exported protocols.
 *
 * Note the caller may pass NULL for the uris if they only wish to extract
 * the process name.
 */

ORTE_DECLSPEC int mca_oob_parse_contact_info(const char* uri, orte_process_name_t* peer, char*** uris);


/**
 * Set the contact info for the seed daemon.
 *
 * Note that this can also be passed to the application as an
 * MCA parameter (OMPI_MCA_oob_base_seed). The contact info (of the seed)
 * must currently be set before calling mca_oob_base_init().
 */

ORTE_DECLSPEC int mca_oob_set_contact_info(const char*);

/**
 * Update the contact info tables
 */
ORTE_DECLSPEC void mca_oob_update_contact_info(orte_gpr_notify_data_t* data, void* cbdata);

/**
 * Similar to unix writev(2).
 *
 * @param peer (IN)   Opaque name of peer process.
 * @param msg (IN)    Array of iovecs describing user buffers and lengths.
 * @param count (IN)  Number of elements in iovec array.
 * @param tag (IN)    User defined tag for matching send/recv.
 * @param flags (IN)  Currently unused.
 * @return            OMPI error code (<0) on error or number of bytes actually sent.
 *
 * This routine provides semantics similar to unix send/writev with the addition of
 * a tag parameter that can be used by the application to match the send with a specific
 * receive. In other words - a recv call by the specified peer will only succeed when
 * the corresponding (or wildcard) tag is used.
 *
 * The <i>peer</i> parameter represents an opaque handle to the peer process that
 * is resolved by the oob layer (using the registry) to an actual physical network
 * address.
 */

ORTE_DECLSPEC int mca_oob_send(
    orte_process_name_t* peer,
    struct iovec *msg,
    int count,
    int tag,
    int flags);

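/*
 * Illustrative sketch only. mca_oob_send() is documented above as similar to
 * writev(2), so the iovec pattern can be shown with the POSIX call itself:
 * a header and a payload held in separate buffers go out in one gathered
 * write. The helper name demo_gather_write() is hypothetical, and the sketch
 * has no counterpart to the OOB peer and tag arguments.
 */

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write a NUL-terminated header and payload to fd with a single writev(2)
 * call. Returns what writev returns: the number of bytes written, or -1.
 */
static ssize_t demo_gather_write(int fd, const char *hdr, const char *payload)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;       /* first element: the header */
    iov[0].iov_len  = strlen(hdr);
    iov[1].iov_base = (void *)payload;   /* second element: the body */
    iov[1].iov_len  = strlen(payload);

    return writev(fd, iov, 2);           /* count == number of iovec entries */
}
```

/*
 * The caller owns both buffers. The iovec array only describes where they
 * are and how long they are, which is the same contract the msg and count
 * parameters of mca_oob_send() describe.
 */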
/**
 * Similar to unix send(2) and mca_oob_send.
 *
 * @param peer (IN)   Opaque name of peer process.
 * @param buffer (IN) Prepacked buffer (orte_buffer_t) containing data to send.
 * @param tag (IN)    User defined tag for matching send/recv.
 * @param flags (IN)  Currently unused.
 * @return            OMPI error code (<0) on error or number of bytes actually sent.
 */

ORTE_DECLSPEC int mca_oob_send_packed(
    orte_process_name_t* peer,
    orte_buffer_t* buffer,
    int tag,
    int flags);


/**
 * Similar to unix readv(2)
 *
* @param peer (IN/OUT) Opaque name of peer process or ORTE_NAME_WILDCARD for wildcard receive. In the
|
2004-08-12 22:41:42 +00:00
|
|
|
* case of a wildcard receive, will be modified to return the matched peer name.
|
|
|
|
* @param msg (IN) Array of iovecs describing user buffers and lengths.
|
|
|
|
* @param count (IN) Number of elements in iovec array.
|
|
|
|
* @param tag (IN/OUT) User defined tag for matching send/recv. In the case of a wildcard receive, will
|
2006-02-07 03:32:36 +00:00
|
|
|
* be modified to return the matched tag. May be optionally by NULL to specify a
|
2004-08-12 22:41:42 +00:00
|
|
|
* wildcard receive with no return value.
|
|
|
|
* @param flags (IN) May be MCA_OOB_PEEK to return up to the number of bytes provided in the
|
|
|
|
* iovec array without removing the message from the queue.
|
|
|
|
* @return OMPI error code (<0) on error or number of bytes actually received.
|
2004-08-04 23:42:51 +00:00
|
|
|
*
|
|
|
|
* The OOB recv call is similar to unix recv/readv in that it requires the caller to manage
|
|
|
|
* memory associated w/ the message. The routine accepts an array of iovecs (<i>msg</i>); however,
|
|
|
|
* the caller must determine the appropriate number of elements (<i>count</i>) and allocate the
|
2006-02-07 03:32:36 +00:00
|
|
|
* buffer space for each entry.
|
2004-08-04 23:42:51 +00:00
|
|
|
*
|
2006-02-07 03:32:36 +00:00
|
|
|
* The <i>tag</i> parameter is provided to facilitate this. The user may define tags based on message
|
2004-08-04 23:42:51 +00:00
|
|
|
* type to determine the message layout and size, as the mca_oob_recv call will block until a message
|
|
|
|
* with the matching tag is received.
|
|
|
|
*
|
2006-02-07 03:32:36 +00:00
|
|
|
* Alternately, the <i>flags</i> parameter may be used to peek (MCA_OOB_PEEK) a portion of the message
|
2004-08-04 23:42:51 +00:00
|
|
|
* (e.g. a standard message header) or determine the overall message size (MCA_OOB_TRUNC|MCA_OOB_PEEK)
|
2006-02-07 03:32:36 +00:00
|
|
|
* without removing the message from the queue.
|
2004-08-04 23:42:51 +00:00
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_recv(
|
2006-02-07 03:32:36 +00:00
|
|
|
orte_process_name_t* peer,
|
|
|
|
struct iovec *msg,
|
|
|
|
int count,
|
|
|
|
int tag,
|
2004-08-04 23:42:51 +00:00
|
|
|
int flags);
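/*
 * Illustrative sketch only, not part of the OOB API. The comment above says the
 * caller supplies the iovec array and the buffer space behind each entry. The
 * hypothetical helper sketch_scatter() below mimics how an incoming message
 * could be spread across such an array and how the "bytes actually received"
 * count is bounded by the space the caller provided. All names here are
 * assumptions made for the example.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: copy up to len bytes of src into the caller's iovec
 * array, filling the entries in order. Returns the number of bytes copied,
 * which is less than len when the iovecs are too small for the message. */
int sketch_scatter(const void *src, size_t len, struct iovec *msg, int count)
{
    const unsigned char *p = src;
    size_t copied = 0;
    int i;

    assert(msg != NULL || count == 0);
    for (i = 0; i < count && copied < len; i++) {
        size_t n = len - copied;
        if (n > msg[i].iov_len) {
            n = msg[i].iov_len;   /* never write past the caller's buffer */
        }
        memcpy(msg[i].iov_base, p + copied, n);
        copied += n;
    }
    return (int)copied;
}
```

/*
 * A caller that knows the layout for a given tag (for example a fixed header
 * followed by a payload) would size one iovec per part before posting the
 * receive.
 */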
|
|
|
|
|
2004-08-11 21:07:16 +00:00
|
|
|
/**
|
|
|
|
* Similar to unix read(2)
|
|
|
|
*
|
2006-11-14 19:34:59 +00:00
|
|
|
* @param peer (IN) Opaque name of peer process or ORTE_NAME_WILDCARD for wildcard receive.
|
2004-08-12 22:41:42 +00:00
|
|
|
* @param buf (OUT) Buffer that is handed back with the received message in it.
|
|
|
|
* @param tag (IN/OUT) User defined tag for matching send/recv.
|
2004-08-11 21:07:16 +00:00
|
|
|
* @return OMPI error code (<0) on error or number of bytes actually received.
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* This version of oob_recv is as above except it does NOT take an iovec list
|
2006-02-07 03:32:36 +00:00
|
|
|
* but instead hands back an orte_buffer_t* buffer with the message in it.
|
2005-03-14 20:57:21 +00:00
|
|
|
* The user is responsible for releasing the buffer when finished w/ it.
|
2004-08-11 21:07:16 +00:00
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_recv_packed (
|
2006-02-07 03:32:36 +00:00
|
|
|
orte_process_name_t* peer,
|
|
|
|
orte_buffer_t *buf,
|
|
|
|
int tag);
|
2004-08-11 21:07:16 +00:00
|
|
|
|
2004-08-04 23:42:51 +00:00
|
|
|
/*
|
|
|
|
* Non-blocking versions of send/recv.
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Callback function on send/recv completion.
|
|
|
|
*
|
|
|
|
* @param status (IN) Completion status - equivalent to the return value from blocking send/recv.
|
|
|
|
* @param peer (IN) Opaque name of peer process.
|
|
|
|
* @param msg (IN) Array of iovecs describing user buffers and lengths.
|
|
|
|
* @param count (IN) Number of elements in iovec array.
|
|
|
|
* @param tag (IN) User defined tag for matching send/recv.
|
|
|
|
* @param cbdata (IN) User data.
|
|
|
|
*/
|
|
|
|
|
|
|
|
typedef void (*mca_oob_callback_fn_t)(
|
|
|
|
int status,
|
2006-02-07 03:32:36 +00:00
|
|
|
orte_process_name_t* peer,
|
|
|
|
struct iovec* msg,
|
2004-08-04 23:42:51 +00:00
|
|
|
int count,
|
2004-08-18 15:51:40 +00:00
|
|
|
int tag,
|
2004-08-12 20:30:03 +00:00
|
|
|
void* cbdata);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Callback function on send/recv completion for buffer PACKED message only.
|
|
|
|
* i.e. only mca_oob_send_packed_nb and mca_oob_recv_packed_nb USE this.
|
|
|
|
*
|
|
|
|
* @param status (IN) Completion status - equivalent to the return value from blocking send/recv.
|
|
|
|
* @param peer (IN) Opaque name of peer process.
|
|
|
|
* @param buffer (IN) For sends, this is a pointer to a prepacked buffer
|
2006-02-07 03:32:36 +00:00
|
|
|
* For recvs, OOB creates and returns a buffer
|
2004-08-12 20:30:03 +00:00
|
|
|
* @param tag (IN) User defined tag for matching send/recv.
|
|
|
|
* @param cbdata (IN) User data.
|
|
|
|
*/
|
|
|
|
|
|
|
|
typedef void (*mca_oob_callback_packed_fn_t)(
|
|
|
|
int status,
|
2006-02-07 03:32:36 +00:00
|
|
|
orte_process_name_t* peer,
|
2005-03-14 20:57:21 +00:00
|
|
|
orte_buffer_t* buffer,
|
2004-08-18 15:51:40 +00:00
|
|
|
int tag,
|
2004-08-04 23:42:51 +00:00
|
|
|
void* cbdata);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Non-blocking version of mca_oob_send().
|
|
|
|
*
|
|
|
|
* @param peer (IN) Opaque name of peer process.
|
|
|
|
* @param msg (IN) Array of iovecs describing user buffers and lengths.
|
|
|
|
* @param count (IN) Number of elements in iovec array.
|
|
|
|
* @param tag (IN) User defined tag for matching send/recv.
|
|
|
|
* @param flags (IN) Currently unused.
|
|
|
|
* @param cbfunc (IN) Callback function on send completion.
|
|
|
|
* @param cbdata (IN) User data that is passed to callback function.
|
|
|
|
* @return OMPI error code (<0) on error or number of bytes actually sent.
|
|
|
|
*
|
|
|
|
* The user supplied callback function is called when the send completes. Note that
|
2006-02-07 03:32:36 +00:00
|
|
|
* the callback may occur before the call to mca_oob_send_nb returns to the caller,
|
2004-08-04 23:42:51 +00:00
|
|
|
* if the send completes during the call.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_send_nb(
|
2006-02-07 03:32:36 +00:00
|
|
|
orte_process_name_t* peer,
|
|
|
|
struct iovec* msg,
|
|
|
|
int count,
|
2004-08-04 23:42:51 +00:00
|
|
|
int tag,
|
2006-02-07 03:32:36 +00:00
|
|
|
int flags,
|
2004-08-04 23:42:51 +00:00
|
|
|
mca_oob_callback_fn_t cbfunc,
|
|
|
|
void* cbdata);
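/*
 * Illustrative sketch only, not part of the OOB API. Because the callback may
 * run before the non-blocking call returns, any state the callback touches has
 * to be initialized before the request is posted. The stand-in types and the
 * sketch_* functions below are assumptions made for the example; the fake send
 * completes immediately to exercise the early-callback case.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for mca_oob_callback_fn_t, reduced to the fields used here. */
typedef void (*sketch_cb_fn_t)(int status, int tag, void *cbdata);

/* Per-request state owned by the caller and passed through cbdata. */
struct sketch_request {
    int done;     /* set to 1 by the completion callback */
    int status;   /* bytes sent, or a negative error code */
};

/* Completion callback: records the result in the caller's state. */
void sketch_on_complete(int status, int tag, void *cbdata)
{
    struct sketch_request *req = cbdata;
    (void)tag;
    req->status = status;
    req->done = 1;
}

/* Hypothetical non-blocking send that completes during the call, so the
 * callback fires before this function returns. */
int sketch_send_nb(size_t nbytes, int tag, sketch_cb_fn_t cbfunc, void *cbdata)
{
    assert(cbfunc != NULL);
    cbfunc((int)nbytes, tag, cbdata);
    return 0;
}

/* Post a request. The state is reset BEFORE the send is posted; resetting it
 * afterwards would wipe out a completion that already happened. */
int sketch_post(struct sketch_request *req, size_t nbytes, int tag)
{
    req->done = 0;
    req->status = 0;
    return sketch_send_nb(nbytes, tag, sketch_on_complete, req);
}
```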
|
|
|
|
|
2004-08-18 15:51:40 +00:00
|
|
|
/**
|
|
|
|
* Non-blocking version of mca_oob_send_packed().
|
|
|
|
*
|
|
|
|
* @param peer (IN) Opaque name of peer process.
|
|
|
|
* @param buffer (IN) Opaque buffer handle.
|
|
|
|
* @param tag (IN) User defined tag for matching send/recv.
|
|
|
|
* @param flags (IN) Currently unused.
|
|
|
|
* @param cbfunc (IN) Callback function on send completion.
|
|
|
|
* @param cbdata (IN) User data that is passed to callback function.
|
|
|
|
* @return OMPI error code (<0) on error or number of bytes actually sent.
|
|
|
|
*
|
|
|
|
* The user supplied callback function is called when the send completes. Note that
|
2006-02-07 03:32:36 +00:00
|
|
|
* the callback may occur before the call to mca_oob_send_packed_nb returns to the caller,
|
2004-08-18 15:51:40 +00:00
|
|
|
* if the send completes during the call.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_send_packed_nb(
|
2006-02-07 03:32:36 +00:00
|
|
|
orte_process_name_t* peer,
|
2005-03-14 20:57:21 +00:00
|
|
|
orte_buffer_t* buffer,
|
2004-08-18 15:51:40 +00:00
|
|
|
int tag,
|
2006-02-07 03:32:36 +00:00
|
|
|
int flags,
|
2004-08-18 15:51:40 +00:00
|
|
|
mca_oob_callback_packed_fn_t cbfunc,
|
|
|
|
void* cbdata);
|
|
|
|
|
2004-08-04 23:42:51 +00:00
|
|
|
/**
|
|
|
|
* Non-blocking version of mca_oob_recv().
|
|
|
|
*
|
2006-11-14 19:34:59 +00:00
|
|
|
* @param peer (IN) Opaque name of peer process or ORTE_NAME_WILDCARD for wildcard receive.
|
2004-08-04 23:42:51 +00:00
|
|
|
* @param msg (IN) Array of iovecs describing user buffers and lengths.
|
|
|
|
* @param count (IN) Number of elements in iovec array.
|
|
|
|
* @param tag (IN) User defined tag for matching send/recv.
|
|
|
|
* @param flags (IN) May be MCA_OOB_PEEK to return up to the number of bytes provided in the iovec array without removing the message from the queue.
|
|
|
|
* @param cbfunc (IN) Callback function on recv completion.
|
|
|
|
* @param cbdata (IN) User data that is passed to callback function.
|
|
|
|
* @return OMPI error code (<0) on error or number of bytes actually received.
|
|
|
|
*
|
2006-02-07 03:32:36 +00:00
|
|
|
* The user supplied callback function is called asynchronously when a message is received
|
2004-08-04 23:42:51 +00:00
|
|
|
* that matches the call parameters.
|
|
|
|
*/
|
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_recv_nb(
|
2006-02-07 03:32:36 +00:00
|
|
|
orte_process_name_t* peer,
|
|
|
|
struct iovec* msg,
|
|
|
|
int count,
|
2004-08-18 15:51:40 +00:00
|
|
|
int tag,
|
2006-02-07 03:32:36 +00:00
|
|
|
int flags,
|
2004-08-04 23:42:51 +00:00
|
|
|
mca_oob_callback_fn_t cbfunc,
|
|
|
|
void* cbdata);
|
2004-09-30 15:09:29 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Routine to cancel pending non-blocking recvs.
|
|
|
|
*
|
2006-11-14 19:34:59 +00:00
|
|
|
* @param peer (IN) Opaque name of peer process or ORTE_NAME_WILDCARD for wildcard receive.
|
2004-09-30 15:09:29 +00:00
|
|
|
* @param tag (IN) User defined tag for matching send/recv.
|
|
|
|
* @return OMPI error code (<0) on error or number of bytes actually received.
|
|
|
|
*/
|
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_recv_cancel(
|
2006-02-07 03:32:36 +00:00
|
|
|
orte_process_name_t* peer,
|
2004-09-30 15:09:29 +00:00
|
|
|
int tag);
|
2004-08-04 23:42:51 +00:00
|
|
|
|
2004-06-29 20:36:34 +00:00
|
|
|
/**
|
2004-08-18 15:51:40 +00:00
|
|
|
* Non-blocking version of mca_oob_recv_packed().
|
|
|
|
*
|
2006-11-14 19:34:59 +00:00
|
|
|
* @param peer (IN) Opaque name of peer process or ORTE_NAME_WILDCARD for wildcard receive.
|
2004-08-18 15:51:40 +00:00
|
|
|
* @param buffer (IN) Array of iovecs describing user buffers and lengths.
|
|
|
|
* @param count (IN) Number of elements in iovec array.
|
|
|
|
* @param tag (IN) User defined tag for matching send/recv.
|
|
|
|
* @param flags (IN) May be MCA_OOB_PEEK to return the message without removing it from the queue.
|
|
|
|
* @param cbfunc (IN) Callback function on recv completion.
|
|
|
|
* @param cbdata (IN) User data that is passed to callback function.
|
|
|
|
* @return OMPI error code (<0) on error or number of bytes actually received.
|
|
|
|
*
|
2006-02-07 03:32:36 +00:00
|
|
|
* The user supplied callback function is called asynchronously when a message is received
|
2004-08-18 15:51:40 +00:00
|
|
|
* that matches the call parameters.
|
|
|
|
*/
|
2004-06-29 20:36:34 +00:00
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_recv_packed_nb(
|
2006-02-07 03:32:36 +00:00
|
|
|
orte_process_name_t* peer,
|
2004-08-18 15:51:40 +00:00
|
|
|
int tag,
|
2006-02-07 03:32:36 +00:00
|
|
|
int flags,
|
2004-08-18 15:51:40 +00:00
|
|
|
mca_oob_callback_packed_fn_t cbfunc,
|
|
|
|
void* cbdata);
|
2004-01-31 21:43:26 +00:00
|
|
|
|
2004-11-20 19:12:43 +00:00
|
|
|
/**
|
2006-02-07 03:32:36 +00:00
|
|
|
* A "broadcast-like" function over the specified set of peers.
|
Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly.
There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd *really* have to try).
This also involved a slight change to the oob.xcast API, so propagated that as required.
Note: this has *only* been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-)
Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately.
This commit was SVN r14475.
2007-04-23 18:41:04 +00:00
|
|
|
* @param job The job whose processes are to receive the message.
|
|
|
|
* @param msg The message to be sent
|
|
|
|
* @param cbfunc Callback function on receipt of data
|
2006-02-07 03:32:36 +00:00
|
|
|
*
|
2004-11-20 19:12:43 +00:00
|
|
|
* Note that the callback function is provided so that the data can be
|
2007-04-23 18:41:04 +00:00
|
|
|
* received and interpreted by the application
|
2004-11-20 19:12:43 +00:00
|
|
|
*/
|
|
|
|
|
2006-11-28 00:06:25 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_xcast(orte_jobid_t job,
|
2007-05-21 18:31:28 +00:00
|
|
|
orte_buffer_t *buffer,
|
|
|
|
orte_rml_tag_t tag);
|
|
|
|
|
|
|
|
ORTE_DECLSPEC int mca_oob_xcast_nb(orte_jobid_t job,
|
|
|
|
orte_buffer_t *buffer,
|
|
|
|
orte_rml_tag_t tag);
|
|
|
|
|
|
|
|
ORTE_DECLSPEC int mca_oob_xcast_gate(orte_gpr_trigger_cb_fn_t cbfunc);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Register my contact info with the General Purpose Registry
|
|
|
|
* This function causes the component to "put" its contact info
|
|
|
|
* on the registry.
|
|
|
|
*/
|
|
|
|
ORTE_DECLSPEC int mca_oob_register_contact_info(void);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Register a subscription to receive contact info on other processes
|
|
|
|
* This function will typically be called from within a GPR compound command
|
|
|
|
* to register a subscription against a stage gate trigger. When fired, this
|
|
|
|
* will return the OOB contact info for all processes in the specified job
|
|
|
|
*/
|
|
|
|
ORTE_DECLSPEC int mca_oob_register_subscription(orte_jobid_t job, char *trigger);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Get contact info for a process or job
|
|
|
|
* Returns contact info for the specified process. If the vpid in the process name
|
|
|
|
* is WILDCARD, then it returns the contact info for all processes in the specified
|
|
|
|
* job. If the jobid is WILDCARD, then it returns the contact info for processes
|
|
|
|
* of the specified vpid across all jobs. Obviously, combining the two WILDCARD
|
|
|
|
* values will return contact info for everyone!
|
|
|
|
*/
|
|
|
|
ORTE_DECLSPEC int mca_oob_get_contact_info(orte_process_name_t *name, orte_gpr_notify_data_t **data);
|
|
|
|
|
|
|
|
|
2004-11-20 19:12:43 +00:00
|
|
|
|
2005-10-06 19:39:20 +00:00
|
|
|
/*
|
|
|
|
* Callback on exception condition.
|
|
|
|
*/
|
|
|
|
|
|
|
|
typedef enum {
|
|
|
|
MCA_OOB_PEER_UNREACH,
|
|
|
|
MCA_OOB_PEER_DISCONNECTED
|
|
|
|
} mca_oob_base_exception_t;
|
|
|
|
|
|
|
|
typedef int (*mca_oob_base_exception_fn_t)(const orte_process_name_t* peer, int exception);
|
2006-02-07 03:32:36 +00:00
|
|
|
|
2005-10-06 19:39:20 +00:00
|
|
|
/**
|
|
|
|
* Register a callback function on loss of a connection.
|
|
|
|
*/
|
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_add_exception_handler(
|
2005-10-06 19:39:20 +00:00
|
|
|
mca_oob_base_exception_fn_t cbfunc);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Remove a callback
|
|
|
|
*/
|
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC int mca_oob_del_exception_handler(
|
2005-10-06 19:39:20 +00:00
|
|
|
mca_oob_base_exception_fn_t cbfunc);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Invoke exception handlers
|
|
|
|
*/
|
|
|
|
|
2006-08-20 15:54:04 +00:00
|
|
|
ORTE_DECLSPEC void mca_oob_call_exception_handlers(
|
2005-10-06 19:39:20 +00:00
|
|
|
orte_process_name_t* peer, int exception);
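/*
 * Illustrative sketch only, not the actual OOB implementation. The three
 * declarations above (add, delete, call) suggest a simple handler registry.
 * The fixed-size table, the integer peer id and every sketch_* name below are
 * assumptions made for the example.
 */

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_HANDLERS 8

/* Stand-in for mca_oob_base_exception_fn_t, with an int in place of the
 * process name. */
typedef int (*sketch_exception_fn_t)(int peer_id, int exception);

static sketch_exception_fn_t sketch_handlers[SKETCH_MAX_HANDLERS];
static size_t sketch_num_handlers = 0;

/* Register a handler. Returns 0 on success, -1 if the table is full. */
int sketch_add_handler(sketch_exception_fn_t fn)
{
    assert(fn != NULL);
    if (sketch_num_handlers == SKETCH_MAX_HANDLERS) {
        return -1;
    }
    sketch_handlers[sketch_num_handlers++] = fn;
    return 0;
}

/* Remove a handler. Returns 0 on success, -1 if it was not registered. */
int sketch_del_handler(sketch_exception_fn_t fn)
{
    size_t i;
    for (i = 0; i < sketch_num_handlers; i++) {
        if (sketch_handlers[i] == fn) {
            /* Fill the hole with the last entry; call order is not preserved. */
            sketch_handlers[i] = sketch_handlers[--sketch_num_handlers];
            return 0;
        }
    }
    return -1;
}

/* Invoke every registered handler. Returns the number of handlers called. */
int sketch_call_handlers(int peer_id, int exception)
{
    size_t i;
    int called = 0;
    for (i = 0; i < sketch_num_handlers; i++) {
        (void)sketch_handlers[i](peer_id, exception);
        called++;
    }
    return called;
}

/* Example handler: counts how many exceptions it has been told about. */
int sketch_seen = 0;

int sketch_record(int peer_id, int exception)
{
    (void)peer_id;
    (void)exception;
    sketch_seen++;
    return 0;
}
```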
|
|
|
|
|
2004-01-31 21:43:26 +00:00
|
|
|
#if defined(c_plusplus) || defined(__cplusplus)
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
#endif
|
2004-08-04 23:42:51 +00:00
|
|
|
|