/*
 * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
 *                         University Research and Technology
 *                         Corporation. All rights reserved.
 * Copyright (c) 2004-2005 The University of Tennessee and The University
 *                         of Tennessee Research Foundation. All rights
 *                         reserved.
 * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 *                         University of Stuttgart. All rights reserved.
 * Copyright (c) 2004-2005 The Regents of the University of California.
 *                         All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */

/** @file
 *
 * The Open MPI Name Server
 *
 * The Open MPI Name Server provides unique name ranges for processes
 * within the universe. Each universe has one name server, which runs
 * within the seed daemon. This prevents the inadvertent duplication
 * of names.
 */

#ifndef MCA_NS_H
#define MCA_NS_H

/*
 * includes
 */

#include "orte_config.h"
#include "orte/orte_constants.h"
#include "orte/orte_types.h"

#include "orte/dss/dss.h"

#include "opal/mca/mca.h"
#include "orte/mca/rml/rml_types.h"

#include "opal/mca/crs/crs.h"
#include "opal/mca/crs/base/base.h"

#include "ns_types.h"

#if defined(c_plusplus) || defined(__cplusplus)
extern "C" {
#endif

/*
 * Component functions - all MUST be provided!
 */

/* Initialize the selected module.
 */
typedef int (*orte_ns_base_module_init_fn_t)(void);

/**** CELL FUNCTIONS ****/
/**
 * Create a new cell id.
 * Allocates a new cell id for use by the caller. The function returns an
 * existing cellid if the specified site/resource has already been assigned
 * one.
 *
 * @param cellid The location where the cellid is to be stored.
 * @param site The name of the site where the cell is located.
 * @param resource The name of the resource associated with this cell (e.g., the name
 *        of the cluster).
 *
 * @retval ORTE_SUCCESS A cellid was created (or the existing one found) and returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem.
 */
typedef int (*orte_ns_base_module_create_cellid_fn_t)(orte_cellid_t *cellid,
                                                      char *site, char *resource);
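
/*
 * Illustrative sketch (hypothetical, not the ORTE name server's code): the
 * lookup-or-create behavior of create_cellid, modeled with a fixed-size
 * in-process table. The demo_* types, return codes and table size are
 * assumptions standing in for orte_cellid_t, ORTE_SUCCESS and the real
 * registry kept by the seed daemon.
 *
```c
#include <stdint.h>
#include <string.h>

// Hypothetical stand-ins (assumptions), not ORTE definitions.
typedef uint32_t demo_cellid_t;
#define DEMO_SUCCESS 0
#define DEMO_ERROR (-1)
#define DEMO_MAX_CELLS 16

struct demo_cell {
    const char *site;
    const char *resource;
};

static struct demo_cell demo_cells[DEMO_MAX_CELLS];
static demo_cellid_t demo_num_cells = 0;

// Return the id already assigned to (site, resource), or assign the next free one.
int demo_create_cellid(demo_cellid_t *cellid, const char *site, const char *resource)
{
    if (cellid == NULL || site == NULL || resource == NULL) {
        return DEMO_ERROR;
    }
    for (demo_cellid_t i = 0; i < demo_num_cells; i++) {
        if (strcmp(demo_cells[i].site, site) == 0 &&
            strcmp(demo_cells[i].resource, resource) == 0) {
            *cellid = i;  // already assigned: hand back the existing id
            return DEMO_SUCCESS;
        }
    }
    if (demo_num_cells == DEMO_MAX_CELLS) {
        return DEMO_ERROR;  // table full
    }
    // The sketch keeps the caller's pointers; a real server would copy the strings.
    demo_cells[demo_num_cells].site = site;
    demo_cells[demo_num_cells].resource = resource;
    *cellid = demo_num_cells++;
    return DEMO_SUCCESS;
}
```
 *
 * Asking again for a site/resource pair yields the id assigned the first time.
 */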

/**
 * Get cell info
 * Retrieve the site and resource info on a cell. Both returned strings are
 * strdup'd copies; the caller is responsible for freeing them.
 *
 * @param cellid The id of the cell whose info is being requested.
 * @param site Returns a pointer to a strdup'd string containing the site name.
 * @param resource Returns a pointer to a strdup'd string containing the resource name.
 *
 * @retval ORTE_SUCCESS The site and resource names were retrieved and returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem.
 */
typedef int (*orte_ns_base_module_get_cell_info_fn_t)(orte_cellid_t cellid,
                                                      char **site, char **resource);
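
/*
 * Illustrative sketch (hypothetical, not ORTE's implementation) of the
 * ownership contract of get_cell_info: each output is an independent heap copy
 * that the caller frees. The demo_* names, the two-entry registry and the
 * return codes are assumptions. A local copy helper is used because strdup is
 * POSIX rather than ISO C.
 *
```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-ins (assumptions), not ORTE definitions.
typedef uint32_t demo_cellid_t;
#define DEMO_SUCCESS 0
#define DEMO_ERROR (-1)

// Hypothetical registry contents, indexed by cellid.
static const char *demo_sites[] = { "siteA", "siteB" };
static const char *demo_resources[] = { "cluster1", "cluster2" };
#define DEMO_NUM_CELLS 2u

// Portable replacement for strdup.
static char *demo_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

// Hand back independent copies; the caller must free both strings.
int demo_get_cell_info(demo_cellid_t cellid, char **site, char **resource)
{
    if (site == NULL || resource == NULL || cellid >= DEMO_NUM_CELLS) {
        return DEMO_ERROR;
    }
    *site = demo_strdup(demo_sites[cellid]);
    *resource = demo_strdup(demo_resources[cellid]);
    if (*site == NULL || *resource == NULL) {
        free(*site);  // free(NULL) is a no-op
        free(*resource);
        *site = NULL;
        *resource = NULL;
        return DEMO_ERROR;
    }
    return DEMO_SUCCESS;
}
```
 *
 * On failure nothing is left allocated, so the caller only frees after success.
 */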

/**
 * Get the cell id as a character string.
 * The get_cellid_string() function returns the cell id in a character string
 * representation. The string is created by expressing the field in hexadecimal. Memory
 * for the string is allocated by the function - releasing that allocation is the
 * responsibility of the calling program.
 *
 * @param cellid_string The location where a pointer to the newly allocated string
 *        is to be stored.
 * @param name A pointer to the name structure containing the name to be
 *        "translated" to a string.
 *
 * @retval ORTE_SUCCESS The string was created and returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem - either no memory
 *         could be allocated or the caller provided an incorrect name pointer (e.g., NULL).
 *
 * @code
 * rc = orte_ns.get_cellid_string(&cellid_string, &name);
 * @endcode
 */
typedef int (*orte_ns_base_module_get_cellid_string_fn_t)(char **cellid_string, const orte_process_name_t* name);

/**
 * Convert cellid to character string
 * Returns the cellid in a character string representation. The string is created
 * by expressing the provided cellid in hexadecimal. Memory for the string is
 * allocated by the function - releasing that allocation is the responsibility of
 * the calling program.
 *
 * @param cellid_string The location where a pointer to the character string
 *        representation of the cellid is to be stored.
 * @param cellid The cellid to be converted.
 *
 * @retval ORTE_SUCCESS The string was created and returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem - probably no memory
 *         could be allocated.
 *
 * @code
 * rc = orte_ns.convert_cellid_to_string(&cellid_string, cellid);
 * @endcode
 */
typedef int (*orte_ns_base_module_convert_cellid_to_string_fn_t)(char **cellid_string, const orte_cellid_t cellid);
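
/*
 * Illustrative sketch (hypothetical) of the hexadecimal conversion described
 * above, with the string allocated for the caller. The exact ORTE format is not
 * specified here; lower-case hex digits with no "0x" prefix, the 32-bit id type
 * and the demo_* names are all assumptions.
 *
```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-ins (assumptions), not ORTE definitions.
typedef uint32_t demo_cellid_t;
#define DEMO_SUCCESS 0
#define DEMO_ERROR (-1)

// Format the id in hexadecimal into a freshly allocated, caller-owned buffer.
int demo_convert_cellid_to_string(char **cellid_string, const demo_cellid_t cellid)
{
    if (cellid_string == NULL) {
        return DEMO_ERROR;
    }
    // A size-0 snprintf reports the needed length (excluding the terminating NUL).
    int len = snprintf(NULL, 0, "%lx", (unsigned long)cellid);
    if (len < 0) {
        return DEMO_ERROR;
    }
    *cellid_string = malloc((size_t)len + 1);
    if (*cellid_string == NULL) {
        return DEMO_ERROR;
    }
    snprintf(*cellid_string, (size_t)len + 1, "%lx", (unsigned long)cellid);
    return DEMO_SUCCESS;
}
```
 *
 * Measuring first avoids guessing a buffer size for the widest possible id.
 */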

/**
 * Convert a string to a cellid.
 * Converts a character string into a cellid. The character string must be a
 * hexadecimal representation of a valid cellid.
 *
 * @param cellid The location where the resulting cellid is to be stored.
 * @param cellidstring The string to be converted.
 *
 * @retval ORTE_SUCCESS The string was converted and the cellid returned.
 * @retval ORTE_ERROR_VALUE The string could not be converted.
 *
 * @code
 * rc = orte_ns.convert_string_to_cellid(&cellid, cellidstring);
 * @endcode
 */
typedef int (*orte_ns_base_module_convert_string_to_cellid_fn_t)(orte_cellid_t *cellid, const char *cellidstring);
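
/*
 * Illustrative sketch (hypothetical) of a strict inverse of the hexadecimal
 * conversion: strtoul parses the digits, and the end pointer, errno and a range
 * check reject junk, overflow and out-of-range values. The demo_* names and the
 * 32-bit id type are assumptions. Note that strtoul with base 16 also accepts an
 * optional "0x" prefix.
 *
```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical stand-ins (assumptions), not ORTE definitions.
typedef uint32_t demo_cellid_t;
#define DEMO_SUCCESS 0
#define DEMO_ERROR (-1)

// Accept only a hex number that fills the whole string and fits the id type.
int demo_convert_string_to_cellid(demo_cellid_t *cellid, const char *cellidstring)
{
    // Requiring a hex digit first also rejects "", leading whitespace and signs.
    if (cellid == NULL || cellidstring == NULL ||
        !isxdigit((unsigned char)cellidstring[0])) {
        return DEMO_ERROR;
    }
    char *end = NULL;
    errno = 0;
    unsigned long value = strtoul(cellidstring, &end, 16);
    if (errno == ERANGE || *end != '\0' || value > UINT32_MAX) {
        return DEMO_ERROR;
    }
    *cellid = (demo_cellid_t)value;
    return DEMO_SUCCESS;
}
```
 */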

/**** NODE FUNCTIONS ****/
/*
 * Create an array of node ids
 * Given the cell and a NULL-terminated array of names of nodes within it, this function
 * assigns an id to represent each node within the cell.
 *
 * @param nodes Returns a pointer to the array of assigned node ids.
 * @param nnodes Returns the number of entries in nodes.
 * @param cellid The id of the cell containing the nodes.
 * @param nodename A NULL-terminated array of node names.
 */
typedef int (*orte_ns_base_module_create_nodeids_fn_t)(orte_nodeid_t **nodes, orte_std_cntr_t *nnodes,
                                                       orte_cellid_t cellid, char **nodename);
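
/*
 * Illustrative sketch (hypothetical) of create_nodeids: it counts the
 * NULL-terminated name list, allocates the output array and hands out ids from a
 * per-cell counter, because node ids only need to be unique within a cell. The
 * demo_* names, the counter scheme and the lack of name de-duplication are
 * assumptions, not ORTE behavior.
 *
```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical stand-ins (assumptions), not ORTE definitions.
typedef uint32_t demo_cellid_t;
typedef uint32_t demo_nodeid_t;
typedef int32_t demo_cntr_t;  // plays the role of orte_std_cntr_t
#define DEMO_SUCCESS 0
#define DEMO_ERROR (-1)
#define DEMO_MAX_CELLS 4

// Next unused node id in each cell.
static demo_nodeid_t demo_next_nodeid[DEMO_MAX_CELLS];

int demo_create_nodeids(demo_nodeid_t **nodes, demo_cntr_t *nnodes,
                        demo_cellid_t cellid, char **nodename)
{
    if (nodes == NULL || nnodes == NULL || nodename == NULL || cellid >= DEMO_MAX_CELLS) {
        return DEMO_ERROR;
    }
    demo_cntr_t count = 0;
    while (nodename[count] != NULL) {  // the name list is NULL-terminated
        count++;
    }
    demo_nodeid_t *ids = NULL;
    if (count > 0) {
        ids = malloc((size_t)count * sizeof(*ids));
        if (ids == NULL) {
            return DEMO_ERROR;
        }
        for (demo_cntr_t i = 0; i < count; i++) {
            ids[i] = demo_next_nodeid[cellid]++;
        }
    }
    *nodes = ids;  // caller frees
    *nnodes = count;
    return DEMO_SUCCESS;
}
```
 *
 * Two cells can therefore both contain a node with id 0.
 */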

/*
 * Get node info
 * Retrieve the names of an array of nodes given their cellid and nodeids. The cellid
 * is required because nodeids are only unique within a given cell.
 *
 * @param nodename Returns a pointer to a NULL-terminated array of strdup'd strings
 *        containing the node names.
 * @param cellid The id of the cell containing the nodes.
 * @param num_nodes The number of entries in nodeids.
 * @param nodeids The ids of the nodes.
 *
 * @retval ORTE_SUCCESS The node names were retrieved and returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem.
 */
typedef int (*orte_ns_base_module_get_node_info_fn_t)(char ***nodename, orte_cellid_t cellid,
                                                      orte_std_cntr_t num_nodes, orte_nodeid_t *nodeids);

/*
 * Convert nodeid to character string
 * Returns the nodeid in a character string representation. The string is created
 * by expressing the provided nodeid in decimal. Memory for the string is
 * allocated by the function - releasing that allocation is the responsibility of
 * the calling program.
 *
 * @param nodeid_string Returns a pointer to the character string representation
 *        of the nodeid.
 * @param nodeid The nodeid to be converted.
 *
 * @retval ORTE_SUCCESS The string was created and returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem.
 */
typedef int (*orte_ns_base_module_convert_nodeid_to_string_fn_t)(char **nodeid_string, const orte_nodeid_t nodeid);

/*
 * Convert a string to a nodeid.
 * Converts a character string into a nodeid. The character string must be a
 * decimal representation of a valid nodeid.
 *
 * @param nodeid A pointer to a location where the resulting nodeid is to be stored.
 * @param nodeidstring The string to be converted.
 *
 * @retval ORTE_SUCCESS The string was converted and the nodeid returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem.
 */
typedef int (*orte_ns_base_module_convert_string_to_nodeid_fn_t)(orte_nodeid_t *nodeid, const char *nodeidstring);
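
/*
 * Illustrative sketch (hypothetical) of the decimal round trip documented for
 * nodeids: print with a measured, caller-owned buffer, and parse strictly so that
 * signs, trailing characters and overflow are rejected. The demo_* names and the
 * 32-bit id type are assumptions, not ORTE definitions.
 *
```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-ins (assumptions), not ORTE definitions.
typedef uint32_t demo_nodeid_t;
#define DEMO_SUCCESS 0
#define DEMO_ERROR (-1)

// nodeid -> decimal string in a freshly allocated, caller-owned buffer.
int demo_convert_nodeid_to_string(char **nodeid_string, const demo_nodeid_t nodeid)
{
    if (nodeid_string == NULL) {
        return DEMO_ERROR;
    }
    int len = snprintf(NULL, 0, "%lu", (unsigned long)nodeid);
    if (len < 0 || (*nodeid_string = malloc((size_t)len + 1)) == NULL) {
        return DEMO_ERROR;
    }
    snprintf(*nodeid_string, (size_t)len + 1, "%lu", (unsigned long)nodeid);
    return DEMO_SUCCESS;
}

// Decimal string -> nodeid; only a string made entirely of digits is accepted.
int demo_convert_string_to_nodeid(demo_nodeid_t *nodeid, const char *nodeidstring)
{
    if (nodeid == NULL || nodeidstring == NULL ||
        !isdigit((unsigned char)nodeidstring[0])) {
        return DEMO_ERROR;
    }
    char *end = NULL;
    errno = 0;
    unsigned long value = strtoul(nodeidstring, &end, 10);
    if (errno == ERANGE || *end != '\0' || value > UINT32_MAX) {
        return DEMO_ERROR;
    }
    *nodeid = (demo_nodeid_t)value;
    return DEMO_SUCCESS;
}
```
 */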

/**** JOB ID FUNCTIONS ****/
/**
 * Create a new job id.
 * Allocate a new job id for use by the caller.
 *
 * The 0 job id is reserved for daemons within the system and will not be allocated.
 * Developers should therefore assume that the daemon job id is automatically allocated
 * and proceed to request names against it.
 *
 * @param jobid A pointer to the location where the jobid is to be returned.
 * @param attrs A list of attributes that describe any conditions to be placed on
 *        the assigned jobid. For example, specifying USE_PARENT indicates that the specified
 *        jobid is to be identified as the parent of the new jobid. USE_ROOT indicates that
 *        the root of the job family of the specified jobid is to be identified as the parent.
 */
typedef int (*orte_ns_base_module_create_jobid_fn_t)(orte_jobid_t *jobid, opal_list_t *attrs);

/*
 * Get job descendants
 * Given a jobid, return the array of jobids that descend from this one.
 */
typedef int (*orte_ns_base_module_get_job_descendants_fn_t)(orte_jobid_t** descendants,
                                                            orte_std_cntr_t *num_desc,
                                                            orte_jobid_t job);
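
/*
 * Illustrative sketch (hypothetical) of get_job_descendants over a table of
 * parent links: a job is a descendant when the given job appears anywhere on its
 * parent chain, so grandchildren are included and the job itself is not. The
 * fixed example family and the demo_* names are assumptions.
 *
```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical stand-ins (assumptions), not ORTE definitions.
typedef uint32_t demo_jobid_t;
typedef int32_t demo_cntr_t;
#define DEMO_SUCCESS 0
#define DEMO_ERROR (-1)
#define DEMO_NO_PARENT UINT32_MAX

// Example family: 1 spawned 2 and 3, 2 spawned 4; job 5 is an unrelated root.
static const demo_jobid_t demo_parent[] = {
    DEMO_NO_PARENT, DEMO_NO_PARENT, 1, 1, 2, DEMO_NO_PARENT
};
#define DEMO_NUM_JOBS (sizeof(demo_parent) / sizeof(demo_parent[0]))

// True if 'ancestor' appears somewhere on the parent chain above 'job'.
static int demo_descends_from(demo_jobid_t job, demo_jobid_t ancestor)
{
    for (demo_jobid_t p = demo_parent[job]; p != DEMO_NO_PARENT; p = demo_parent[p]) {
        if (p == ancestor) {
            return 1;
        }
    }
    return 0;
}

int demo_get_job_descendants(demo_jobid_t **descendants, demo_cntr_t *num_desc,
                             demo_jobid_t job)
{
    if (descendants == NULL || num_desc == NULL || job >= DEMO_NUM_JOBS) {
        return DEMO_ERROR;
    }
    // Worst case: every other job descends from 'job'.
    demo_jobid_t *out = malloc(DEMO_NUM_JOBS * sizeof(*out));
    if (out == NULL) {
        return DEMO_ERROR;
    }
    demo_cntr_t n = 0;
    for (demo_jobid_t j = 0; j < DEMO_NUM_JOBS; j++) {
        if (demo_descends_from(j, job)) {
            out[n++] = j;
        }
    }
    *descendants = out;  // caller frees
    *num_desc = n;
    return DEMO_SUCCESS;
}
```
 *
 * Restricting the test to demo_parent[j] == job would give the direct
 * children instead.
 */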

/*
 * Get job children
 * Given a jobid, return the array of jobids that are direct children of that job.
 */
typedef int (*orte_ns_base_module_get_job_children_fn_t)(orte_jobid_t** children,
                                                         orte_std_cntr_t *num_childs,
                                                         orte_jobid_t job);

/*
 * Get root job from job family
 * Given a jobid, return the jobid at the head of this job's family. If the jobid provided is the
 * root for that family, that value will be returned.
 */
typedef int (*orte_ns_base_module_get_root_job_fn_t)(orte_jobid_t *root_job, orte_jobid_t job);

/*
 * Get a job family
 * Given a jobid, return the array of jobids (including the given one) that are members
 * of that extended job family. This will return ALL jobs related to the given one.
 */
typedef int (*orte_ns_base_module_get_job_family_fn_t)(orte_jobid_t** family,
                                                       orte_std_cntr_t *num_members,
                                                       orte_jobid_t job);
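
/*
 * Illustrative sketch (hypothetical) of get_job_family: two jobs belong to the
 * same extended family exactly when they share a root, so the family is every
 * job whose root matches the given job's root, including the given job itself.
 * The parent table and demo_* names are assumptions.
 *
```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical stand-ins (assumptions), not ORTE definitions.
typedef uint32_t demo_jobid_t;
typedef int32_t demo_cntr_t;
#define DEMO_SUCCESS 0
#define DEMO_ERROR (-1)
#define DEMO_NO_PARENT UINT32_MAX

// Example jobs: family {1, 2, 3, 4} rooted at 1, plus a separate root 5.
static const demo_jobid_t demo_parent[] = {
    DEMO_NO_PARENT, DEMO_NO_PARENT, 1, 1, 2, DEMO_NO_PARENT
};
#define DEMO_NUM_JOBS (sizeof(demo_parent) / sizeof(demo_parent[0]))

static demo_jobid_t demo_root_of(demo_jobid_t job)
{
    while (demo_parent[job] != DEMO_NO_PARENT) {
        job = demo_parent[job];
    }
    return job;
}

int demo_get_job_family(demo_jobid_t **family, demo_cntr_t *num_members, demo_jobid_t job)
{
    if (family == NULL || num_members == NULL || job >= DEMO_NUM_JOBS) {
        return DEMO_ERROR;
    }
    demo_jobid_t *out = malloc(DEMO_NUM_JOBS * sizeof(*out));
    if (out == NULL) {
        return DEMO_ERROR;
    }
    demo_jobid_t root = demo_root_of(job);
    demo_cntr_t n = 0;
    for (demo_jobid_t j = 0; j < DEMO_NUM_JOBS; j++) {
        if (demo_root_of(j) == root) {  // same root means same family
            out[n++] = j;
        }
    }
    *family = out;  // caller frees
    *num_members = n;
    return DEMO_SUCCESS;
}
```
 */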

/*
 * Get parent jobid
 * Given a jobid, return the parent job from which it descended. If the provided jobid is the
 * root (i.e., has no parent), this function will return that same value.
 */
typedef int (*orte_ns_base_module_get_parent_job_fn_t)(orte_jobid_t *parent, orte_jobid_t job);

/**
 * Reserve a range of process ids.
 * The reserve_range() function reserves a range of vpids for the given jobid.
 * Note that the cellid does not factor into this request - jobids span the entire universe,
 * hence the cell where the process is currently executing is irrelevant to this request.
 *
 * @param job The id of the job for which the vpids are to be reserved.
 * @param range The number of vpids to be reserved. The function will find the
 *        next available process id and assign range-number of sequential ids to the caller.
 *        These ids will be reserved - i.e., they cannot be assigned to any subsequent caller.
 * @param startvpid The location where the starting value of the reserved range of vpids
 *        is to be stored.
 *
 * @retval ORTE_SUCCESS The range was reserved and its starting vpid returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem.
 *
 * @code
 * rc = orte_ns.reserve_range(jobid, range, &startvpid);
 * @endcode
 */
typedef int (*orte_ns_base_module_reserve_range_fn_t)(orte_jobid_t job,
                                                      orte_vpid_t range,
                                                      orte_vpid_t *startvpid);

/*
 * Get the range of vpids assigned to a specified jobid
 * Given a jobid, return the maximum vpid value assigned to that job.
 */
typedef int (*orte_ns_base_module_get_vpid_range_fn_t)(orte_jobid_t job, orte_vpid_t *range);

/**
 * Get the job id as a character string.
 * The get_jobid_string() function returns the job id in a character string
 * representation. The string is created by expressing the field in hexadecimal. Memory
 * for the string is allocated by the function - releasing that allocation is the
 * responsibility of the calling program.
 *
 * @param jobid_string The location where a pointer to the newly allocated string
 *        is to be stored.
 * @param name A pointer to the name structure containing the name to be
 *        "translated" to a string.
 *
 * @retval ORTE_SUCCESS The string was created and returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem - either no memory
 *         could be allocated or the caller provided an incorrect name pointer (e.g., NULL).
 *
 * @code
 * rc = orte_ns.get_jobid_string(&jobid_string, &name);
 * @endcode
 */
typedef int (*orte_ns_base_module_get_jobid_string_fn_t)(char **jobid_string, const orte_process_name_t* name);

/**
 * Convert jobid to character string
 * The convert_jobid_to_string() function returns the jobid in a character string representation.
 * The string is created by expressing the provided jobid in hexadecimal. Memory
 * for the string is allocated by the function - releasing that allocation is the
 * responsibility of the calling program.
 *
 * @param jobid_string The location where a pointer to the character string
 *        representation of the jobid is to be stored.
 * @param jobid The jobid to be converted.
 *
 * @retval ORTE_SUCCESS The string was created and returned.
 * @retval ORTE_ERROR_VALUE An error code indicative of the problem - probably no memory
 *         could be allocated.
 *
 * @code
 * rc = orte_ns.convert_jobid_to_string(&jobid_string, jobid);
 * @endcode
 */
typedef int (*orte_ns_base_module_convert_jobid_to_string_fn_t)(char **jobid_string, const orte_jobid_t jobid);

/**
 * Convert a string to a jobid
 * Converts a character string into a jobid. The character string must be a hexadecimal
 * representation of a valid jobid.
 *
 * @param jobid The location where the resulting jobid is to be stored.
 * @param jobidstring The string to be converted.
 *
 * @retval ORTE_SUCCESS The string was converted and the jobid returned.
 * @retval ORTE_ERROR_VALUE The string could not be converted.
 *
 * @code
 * rc = orte_ns.convert_string_to_jobid(&jobid, jobidstring);
 * @endcode
 */
typedef int (*orte_ns_base_module_convert_string_to_jobid_fn_t)(orte_jobid_t *jobid, const char* jobidstring);


/**** NAME FUNCTIONS ****/

/**
* Create a single new process name.
*
* The create_process_name() function allocates a single process name structure
* and fills its fields with the provided values.
*
* @param name The location where the address of the new name structure is to
* be stored.
* @param cell The cell for which the process name is intended. Usually, this is
* the id of the cell where the process is initially planning to be spawned.
* @param job The id of the job to which the process will belong. Process ids are
* tracked according to jobid, but not cellid. Thus, two processes can have the
* same process id if and only if they have different jobids. However, two
* processes in the same job cannot have the same process id, regardless of
* whether or not they are in the same cell.
* @param vpid The virtual process id for the name. Note that no check is made
* for uniqueness; the caller is responsible for ensuring that the requested
* name is, in fact, unique by first reserving an appropriate range of virtual
* process ids.
*
* @retval ORTE_SUCCESS The name was created and *name points to it.
* @retval error An ORTE error code, probably due to an inability to allocate
* memory for the name structure.
*
* @code
* rc = orte_ns.create_process_name(&new_name, cell, job, vpid);
* @endcode
*/
typedef int (*orte_ns_base_module_create_proc_name_fn_t)(orte_process_name_t **name,
orte_cellid_t cell,
orte_jobid_t job,
orte_vpid_t vpid);
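
A minimal sketch of what create_process_name() is documented to do: allocate one name structure and copy the three fields in, with no uniqueness check. `sketch_process_name_t` is a hypothetical stand-in for `orte_process_name_t` with assumed 32-bit fields; it is not the real structure definition.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ORTE id types and name structure. */
typedef uint32_t sketch_cellid_t;
typedef uint32_t sketch_jobid_t;
typedef uint32_t sketch_vpid_t;

typedef struct {
    sketch_cellid_t cellid;
    sketch_jobid_t jobid;
    sketch_vpid_t vpid;
} sketch_process_name_t;

/* Allocate a name and fill in its fields. No uniqueness check is made:
 * the caller must already have reserved the vpid. Returns 0 or -1. */
int sketch_create_process_name(sketch_process_name_t **name,
                               sketch_cellid_t cell,
                               sketch_jobid_t job,
                               sketch_vpid_t vpid)
{
    if (name == NULL) {
        return -1;
    }
    sketch_process_name_t *n = malloc(sizeof(*n));
    if (n == NULL) {
        return -1;
    }
    n->cellid = cell;
    n->jobid = job;
    n->vpid = vpid;
    *name = n;
    return 0;
}
```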

/**
* Create my name.
*
* If a process is a singleton, it needs to create a name for itself. When a
* persistent daemon is present, this requires communication with that daemon.
* Since the RML uses process names as its index into the RML communicator
* table, the RML automatically assigns a name to each process when it first
* attempts to communicate. This function takes advantage of that behavior to
* ensure that one, and ONLY one, name gets assigned to the process.
*/
typedef int (*orte_ns_base_module_create_my_name_fn_t)(void);

/**
* Convert a string representation to a process name.
*
* The convert_string_to_process_name() function converts a string
* representation of a process name into an Open MPI name structure. The string
* must be of the proper form, i.e., "cellid.jobid.vpid", where each field is
* expressed in hexadecimal.
*
* @param name The location where the address of the newly allocated name
* structure is to be stored.
* @param name_string A character string representation of a process name.
*
* @retval ORTE_SUCCESS The string was converted and *name points to the result.
* @retval error An ORTE error code, probably due to an inability to allocate
* memory for the name structure.
*
* @code
* rc = orte_ns.convert_string_to_process_name(&name, name_string);
* @endcode
*/
typedef int (*orte_ns_base_module_convert_string_to_process_name_fn_t)(orte_process_name_t **name,
const char* name_string);
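
Parsing the "cellid.jobid.vpid" form can be sketched as three strict hexadecimal fields separated by periods. This is a hypothetical illustration, not the ORTE parser: the structure, the 32-bit field width, and the helper `parse_hex_field()` are assumptions, and a 0/-1 status replaces ORTE error codes.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for orte_process_name_t (32-bit fields assumed). */
typedef struct {
    uint32_t cellid;
    uint32_t jobid;
    uint32_t vpid;
} sketch_process_name_t;

/* Parse one hexadecimal field starting at s; report where parsing stopped. */
static int parse_hex_field(const char *s, uint32_t *out, const char **stop)
{
    if (!isxdigit((unsigned char)*s)) {
        return -1;
    }
    char *end = NULL;
    errno = 0;
    unsigned long v = strtoul(s, &end, 16);
    if (errno == ERANGE || v > UINT32_MAX) {
        return -1;
    }
    *out = (uint32_t)v;
    *stop = end;
    return 0;
}

/* Convert "cellid.jobid.vpid" (each field in hex) into a newly allocated name.
 * Returns 0 on success and -1 on malformed input or allocation failure;
 * on failure *name is left untouched. */
int sketch_convert_string_to_process_name(sketch_process_name_t **name,
                                          const char *name_string)
{
    if (name == NULL || name_string == NULL) {
        return -1;
    }
    sketch_process_name_t tmp;
    const char *p = name_string;

    if (parse_hex_field(p, &tmp.cellid, &p) != 0 || *p != '.') {
        return -1;
    }
    if (parse_hex_field(p + 1, &tmp.jobid, &p) != 0 || *p != '.') {
        return -1;
    }
    if (parse_hex_field(p + 1, &tmp.vpid, &p) != 0 || *p != '\0') {
        return -1;
    }
    sketch_process_name_t *result = malloc(sizeof(*result));
    if (result == NULL) {
        return -1;
    }
    *result = tmp;
    *name = result;
    return 0;
}
```

Parsing into a local `tmp` first means a malformed string never leaves a half-filled structure behind.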

/**
* Get the process name as a character string.
*
* The get_proc_name_string() function returns the entire process name in a
* character string representation. The string is created by expressing each
* field in hexadecimal, separated by periods, as follows:
*
* sprintf(string_name, "%x.%x.%x", cellid, jobid, vpid)
*
* The memory required for the string is allocated by the function; releasing
* that allocation is the responsibility of the calling program.
*
* @param name_string The location where the address of the newly allocated
* string is to be stored.
* @param name A pointer to the name structure containing the name to be
* "translated" to a string.
*
* @retval ORTE_SUCCESS The name was converted and *name_string holds the result.
* @retval error An ORTE error code: either no memory could be allocated or the
* caller provided an incorrect name pointer (e.g., NULL).
*
* @code
* rc = orte_ns.get_proc_name_string(&name_string, &name);
* @endcode
*/
typedef int (*orte_ns_base_module_get_proc_name_string_fn_t)(char **name_string,
const orte_process_name_t* name);
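
The "%x.%x.%x" format above can be produced into a right-sized, caller-owned buffer as in this minimal sketch. The structure and function name are hypothetical (32-bit fields assumed), and the fields are widened to `unsigned long` so that `%lx` matches their type exactly.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for orte_process_name_t (32-bit fields assumed). */
typedef struct {
    uint32_t cellid;
    uint32_t jobid;
    uint32_t vpid;
} sketch_process_name_t;

/* Format a name as "cellid.jobid.vpid" in hex into a newly allocated string.
 * Returns 0 on success and -1 on a NULL argument or allocation failure. */
int sketch_get_proc_name_string(char **name_string,
                                const sketch_process_name_t *name)
{
    if (name_string == NULL || name == NULL) {
        return -1;
    }
    unsigned long c = name->cellid, j = name->jobid, v = name->vpid;

    int len = snprintf(NULL, 0, "%lx.%lx.%lx", c, j, v);
    if (len < 0) {
        return -1;
    }
    char *buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        return -1;
    }
    snprintf(buf, (size_t)len + 1, "%lx.%lx.%lx", c, j, v);
    *name_string = buf;
    return 0;
}
```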

/**
* Compare two name values.
*
* The compare_fields() function checks the values of the fields in the two
* provided names and returns a value indicating whether the first is less
* than, greater than, or equal to the second. Fields are compared
* hierarchically: cellid first, followed by jobid and then vpid. The bit-mask
* indicates which fields are to be included in the comparison; fields not
* selected by the bit-mask are ignored. Thus, the caller may request that any
* combination of the three fields be included in the comparison.
*
* @param fields A bit-mask indicating which fields are to be included in the
* comparison.
* @param name1 A pointer to the first name structure.
* @param name2 A pointer to the second name structure.
*
* @retval -1 The selected fields of the first name are less than those of the
* second name.
* @retval 0 The selected fields of the two names are equal.
* @retval +1 The selected fields of the first name are greater than those of
* the second name.
*
* The function returns a large negative value if there is an error.
*
* @code
* result = orte_ns.compare_fields(bit_mask, &name1, &name2);
* @endcode
*/
typedef int (*orte_ns_base_module_compare_fields_fn_t)(orte_ns_cmp_bitmask_t fields,
const orte_process_name_t* name1,
const orte_process_name_t* name2);


/**** VPID FUNCTIONS ****/

/**
* Get the virtual process id as a character string.
*
* The get_vpid_string() function returns the vpid in a character string
* representation, created by expressing the field in hexadecimal. Memory for
* the string is allocated by the function; releasing that allocation is the
* responsibility of the calling program.
*
* @param vpid_string The location where the address of the newly allocated
* string is to be stored.
* @param name A pointer to the name structure whose vpid is to be
* "translated" to a string.
*
* @retval ORTE_SUCCESS The vpid was converted and *vpid_string holds the result.
* @retval error An ORTE error code: either no memory could be allocated or the
* caller provided an incorrect name pointer (e.g., NULL).
*
* @code
* rc = orte_ns.get_vpid_string(&vpid_string, &name);
* @endcode
*/
typedef int (*orte_ns_base_module_get_vpid_string_fn_t)(char **vpid_string, const orte_process_name_t* name);


/**
* Convert a vpid to a character string.
*
* Returns the vpid in a character string representation, created by
* expressing the provided vpid in hexadecimal. Memory for the string is
* allocated by the function; releasing that allocation is the responsibility
* of the calling program.
*
* @param vpid_string The location where the address of the newly allocated
* string is to be stored.
* @param vpid The vpid to be converted.
*
* @retval ORTE_SUCCESS The vpid was converted and *vpid_string holds the result.
* @retval error An ORTE error code, probably because no memory could be allocated.
*
* @code
* rc = orte_ns.convert_vpid_to_string(&vpid_string, vpid);
* @endcode
*/
typedef int (*orte_ns_base_module_convert_vpid_to_string_fn_t)(char **vpid_string, const orte_vpid_t vpid);


/**
* Convert a string to a vpid.
*
* Converts a character string into a vpid. The character string must be a
* hexadecimal representation of a valid vpid.
*
* @param vpid The location where the resulting vpid is to be stored.
* @param vpidstring The string to be converted.
*
* @retval ORTE_SUCCESS The string was converted and *vpid holds the result.
* @retval error An ORTE error code indicating that the string could not be converted.
*
* @code
* rc = orte_ns.convert_string_to_vpid(&vpid, vpidstring);
* @endcode
*/
typedef int (*orte_ns_base_module_convert_string_to_vpid_fn_t)(orte_vpid_t *vpid, const char* vpidstring);



/**** TAG SERVER ****/

/*
* Allocate a tag.
*
* Assigns an RML tag and stores it in *tag. If name is NULL, the tag server
* provides the next unique tag but cannot look that number up again for anyone
* else.
*/
typedef int (*orte_ns_base_module_assign_rml_tag_fn_t)(orte_rml_tag_t *tag,
char *name);
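
The NULL-versus-named behavior of the tag server can be sketched with an in-process table. This rests on an assumption the text only implies: a tag allocated under a name can be retrieved again by presenting the same name. Everything here is hypothetical, including the fixed-size table, the first tag value (1000), and `sketch_rml_tag_t`. It is not how the real, centralized tag server is implemented.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for orte_rml_tag_t. */
typedef uint32_t sketch_rml_tag_t;

#define SKETCH_MAX_NAMED_TAGS 64

/* Hypothetical local table of named tags. */
static struct {
    char *name;
    sketch_rml_tag_t tag;
} named_tags[SKETCH_MAX_NAMED_TAGS];
static size_t num_named_tags = 0;
static sketch_rml_tag_t next_tag = 1000;  /* hypothetical first dynamic tag */

/* Named request: return the tag already recorded for `name`, or allocate and
 * record a new one. NULL name: hand out the next unique tag without recording
 * it, so it can never be looked up again. Returns 0 or -1. */
int sketch_assign_rml_tag(sketch_rml_tag_t *tag, const char *name)
{
    if (tag == NULL) {
        return -1;
    }
    if (name != NULL) {
        for (size_t i = 0; i < num_named_tags; i++) {
            if (strcmp(named_tags[i].name, name) == 0) {
                *tag = named_tags[i].tag;
                return 0;
            }
        }
        if (num_named_tags == SKETCH_MAX_NAMED_TAGS) {
            return -1;
        }
        size_t len = strlen(name) + 1;
        char *copy = malloc(len);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, name, len);
        named_tags[num_named_tags].name = copy;
        named_tags[num_named_tags].tag = next_tag;
        num_named_tags++;
    }
    *tag = next_tag++;
    return 0;
}
```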

/**** DATA TYPE SERVER ****/

/*
* This function defines a new data type and gives it a system-wide unique
* identifier for use in the data packing subsystem. It is called only from the
* DSS when a new identifier is needed.
*/
typedef int (*orte_ns_base_module_define_data_type_fn_t)(
const char *name,
orte_data_type_t *type);


/**** PEER RETRIEVAL ****/

/**
* Get the process names of all processes that meet the specified conditions.
*
* It is sometimes necessary for a process to communicate with all processes of
* a given job, all processes in a given cell or on a given node, etc. The RML
* communication system uses the process name as its "pointer" for sending
* messages to another process. This function returns an array containing the
* names of all processes that meet the specified combination of attributes.
*
* @param procs The location where the address of the array of names is to be
* stored. The function dynamically allocates space for the array; the caller
* is responsible for releasing this space.
* @param num_procs The location where the number of entries in the returned
* array is to be stored.
* @param attributes A list of conditions used to define the peers to be
* included in the returned array. This can include, for example, a request that
* all peers of the parent job be returned. More common options are to specify
* a cell or job.
*
* NOTE: The combination of ORTE_CELLID_WILDCARD and ORTE_JOBID_WILDCARD in the
* attribute list will cause the function to return the names of *all*
* processes currently active in the universe.
*/
typedef int (*orte_ns_base_module_get_peers_fn_t)(orte_process_name_t **procs,
orte_std_cntr_t *num_procs,
opal_list_t *attributes);
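
The wildcard filtering described above can be sketched as a scan over a known set of names. To stay self-contained, the sketch takes a plain cellid and jobid instead of an `opal_list_t` of attributes, and it takes the "universe" as an input array. `SKETCH_WILDCARD` is a hypothetical placeholder for the real `ORTE_*_WILDCARD` values. As with the real interface, the caller frees the returned array.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for orte_process_name_t (32-bit fields assumed). */
typedef struct {
    uint32_t cellid;
    uint32_t jobid;
    uint32_t vpid;
} sketch_process_name_t;

/* Hypothetical wildcard value; the real ORTE constants are not assumed. */
#define SKETCH_WILDCARD UINT32_MAX

/* Copy every name in `universe` whose cellid and jobid match (either may be
 * SKETCH_WILDCARD) into a newly allocated array. Returns 0 or -1; when nothing
 * matches, *procs is NULL and *num_procs is 0. */
int sketch_get_peers(sketch_process_name_t **procs, size_t *num_procs,
                     const sketch_process_name_t *universe, size_t universe_len,
                     uint32_t cellid, uint32_t jobid)
{
    if (procs == NULL || num_procs == NULL) {
        return -1;
    }
    *procs = NULL;
    *num_procs = 0;
    if (universe_len == 0) {
        return 0;
    }
    /* Worst case every name matches, so size the array for all of them. */
    sketch_process_name_t *out = malloc(universe_len * sizeof(*out));
    if (out == NULL) {
        return -1;
    }
    size_t n = 0;
    for (size_t i = 0; i < universe_len; i++) {
        int cell_ok = (cellid == SKETCH_WILDCARD) || (universe[i].cellid == cellid);
        int job_ok = (jobid == SKETCH_WILDCARD) || (universe[i].jobid == jobid);
        if (cell_ok && job_ok) {
            out[n++] = universe[i];
        }
    }
    if (n == 0) {
        free(out);
        return 0;
    }
    *procs = out;
    *num_procs = n;
    return 0;
}
```

Passing `SKETCH_WILDCARD` for both the cellid and the jobid returns every name, which mirrors the NOTE about combining the two wildcards.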

/*
* DIAGNOSTIC INTERFACES
*/
typedef int (*orte_ns_base_module_dump_cells_fn_t)(void);

typedef int (*orte_ns_base_module_dump_jobs_fn_t)(void);

typedef int (*orte_ns_base_module_dump_tags_fn_t)(void);

typedef int (*orte_ns_base_module_dump_datatypes_fn_t)(void);

typedef int (*orte_ns_base_module_ft_event_fn_t)(int state);

/*
* Ver 2.0
*/
struct mca_ns_base_module_2_0_0_t {
|
2005-08-07 17:21:52 +04:00
|
|
|
/* init */
|
Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
2006-11-14 22:34:59 +03:00
|
|
|
orte_ns_base_module_init_fn_t init;
|
2005-08-07 17:21:52 +04:00
|
|
|
/* cell functions */
|
Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
2006-11-14 22:34:59 +03:00
|
|
|
orte_ns_base_module_create_cellid_fn_t create_cellid;
|
|
|
|
orte_ns_base_module_get_cell_info_fn_t get_cell_info;
|
|
|
|
orte_ns_base_module_get_cellid_string_fn_t get_cellid_string;
|
|
|
|
orte_ns_base_module_convert_cellid_to_string_fn_t convert_cellid_to_string;
|
|
|
|
orte_ns_base_module_convert_string_to_cellid_fn_t convert_string_to_cellid;
|
|
|
|
/** node functions */
|
|
|
|
orte_ns_base_module_create_nodeids_fn_t create_nodeids;
|
|
|
|
orte_ns_base_module_get_node_info_fn_t get_node_info;
|
|
|
|
orte_ns_base_module_convert_nodeid_to_string_fn_t convert_nodeid_to_string;
|
|
|
|
orte_ns_base_module_convert_string_to_nodeid_fn_t convert_string_to_nodeid;
|
2005-08-07 17:21:52 +04:00
|
|
|
/* jobid functions */
|
Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
2006-11-14 22:34:59 +03:00
|
|
|
orte_ns_base_module_create_jobid_fn_t create_jobid;
|
|
|
|
orte_ns_base_module_get_job_descendants_fn_t get_job_descendants;
|
|
|
|
orte_ns_base_module_get_job_children_fn_t get_job_children;
|
|
|
|
orte_ns_base_module_get_root_job_fn_t get_root_job;
|
|
|
|
orte_ns_base_module_get_parent_job_fn_t get_parent_job;
|
2007-04-23 16:48:19 +04:00
|
|
|
orte_ns_base_module_get_job_family_fn_t get_job_family;
|
Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
2006-11-14 22:34:59 +03:00
|
|
|
orte_ns_base_module_get_jobid_string_fn_t get_jobid_string;
|
|
|
|
orte_ns_base_module_convert_jobid_to_string_fn_t convert_jobid_to_string;
|
|
|
|
orte_ns_base_module_convert_string_to_jobid_fn_t convert_string_to_jobid;
|
|
|
|
orte_ns_base_module_reserve_range_fn_t reserve_range;
|
2007-04-23 16:48:19 +04:00
|
|
|
orte_ns_base_module_get_vpid_range_fn_t get_vpid_range;
|
2005-08-07 17:21:52 +04:00
|
|
|
/* vpid functions */
|
Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
2006-11-14 22:34:59 +03:00
|
|
|
orte_ns_base_module_get_vpid_string_fn_t get_vpid_string;
|
|
|
|
orte_ns_base_module_convert_vpid_to_string_fn_t convert_vpid_to_string;
|
|
|
|
orte_ns_base_module_convert_string_to_vpid_fn_t convert_string_to_vpid;
|
2005-08-07 17:21:52 +04:00
|
|
|
/* name functions */
|
Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
2006-11-14 22:34:59 +03:00
|
|
|
orte_ns_base_module_create_proc_name_fn_t create_process_name;
|
|
|
|
orte_ns_base_module_create_my_name_fn_t create_my_name;
|
2005-03-14 23:57:21 +03:00
|
|
|
orte_ns_base_module_convert_string_to_process_name_fn_t convert_string_to_process_name;
|
Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
2006-11-14 22:34:59 +03:00
|
|
|
orte_ns_base_module_get_proc_name_string_fn_t get_proc_name_string;
|
|
|
|
orte_ns_base_module_compare_fields_fn_t compare_fields;
|
2005-08-07 17:21:52 +04:00
|
|
|
/* peer functions */
2006-11-14 22:34:59 +03:00
orte_ns_base_module_get_peers_fn_t get_peers;
2005-08-07 17:21:52 +04:00
/* tag server functions */
2006-11-14 22:34:59 +03:00
orte_ns_base_module_assign_rml_tag_fn_t assign_rml_tag;
2005-08-07 17:21:52 +04:00
/* data type functions */
2006-11-14 22:34:59 +03:00
orte_ns_base_module_define_data_type_fn_t define_data_type;
2005-08-07 17:21:52 +04:00
/* diagnostic functions */
2006-11-14 22:34:59 +03:00
orte_ns_base_module_dump_cells_fn_t dump_cells;
orte_ns_base_module_dump_jobs_fn_t dump_jobs;
orte_ns_base_module_dump_tags_fn_t dump_tags;
orte_ns_base_module_dump_datatypes_fn_t dump_datatypes;
2007-03-17 02:11:45 +03:00
orte_ns_base_module_ft_event_fn_t ft_event;
2004-08-02 04:24:22 +04:00
};
2004-08-18 19:50:10 +04:00
2006-11-14 22:34:59 +03:00
typedef struct mca_ns_base_module_2_0_0_t mca_ns_base_module_2_0_0_t;
typedef mca_ns_base_module_2_0_0_t mca_ns_base_module_t;
2004-08-02 04:24:22 +04:00
/*
* NS Component
*/
2005-03-14 23:57:21 +03:00
/**
* Initialize the selected component.
*/
typedef mca_ns_base_module_t* (*mca_ns_base_component_init_fn_t)(int *priority);
2004-08-02 04:24:22 +04:00
2005-03-14 23:57:21 +03:00
/**
* Finalize the selected module
*/
2004-08-02 04:24:22 +04:00
typedef int (*mca_ns_base_component_finalize_fn_t)(void);
2006-02-07 06:32:36 +03:00
2005-03-14 23:57:21 +03:00
2004-08-02 04:24:22 +04:00
/*
* the standard component data structure
*/
2006-11-14 22:34:59 +03:00
struct mca_ns_base_component_2_0_0_t {
2004-08-02 04:24:22 +04:00
mca_base_component_t ns_version;
mca_base_component_data_1_0_0_t ns_data;
mca_ns_base_component_init_fn_t ns_init;
mca_ns_base_component_finalize_fn_t ns_finalize;
};
2006-11-14 22:34:59 +03:00
typedef struct mca_ns_base_component_2_0_0_t mca_ns_base_component_2_0_0_t;
typedef mca_ns_base_component_2_0_0_t mca_ns_base_component_t;
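The component structure above pairs an `ns_init` that returns a module and reports a priority through its `int *priority` argument with an `ns_finalize` hook. One plausible way a framework could use that pair is sketched below: call every component's init, keep the module with the highest priority, and finalize the losers. This is a minimal sketch under stated assumptions, not ORTE's actual selection code; every `demo_*` type, the `alpha`/`beta` components, their priorities, and the finalize counter are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for mca_ns_base_module_t and
 * mca_ns_base_component_t; the real ORTE types carry many more fields. */
typedef struct {
    const char *name;                 /* which component built this module */
} demo_module_t;

typedef demo_module_t *(*demo_init_fn_t)(int *priority);
typedef int (*demo_finalize_fn_t)(void);

typedef struct {
    const char *component_name;
    demo_init_fn_t ns_init;           /* mirrors ns_init */
    demo_finalize_fn_t ns_finalize;   /* mirrors ns_finalize */
} demo_component_t;

static demo_module_t alpha_module = { "alpha" };
static demo_module_t beta_module = { "beta" };
static int finalize_calls = 0;

/* Each hypothetical init reports its priority through the out-parameter. */
static demo_module_t *alpha_init(int *priority) { *priority = 10; return &alpha_module; }
static demo_module_t *beta_init(int *priority) { *priority = 20; return &beta_module; }
static int counting_finalize(void) { ++finalize_calls; return 0; }

/* Call every component's init, keep the module reporting the highest
 * priority, and finalize each component that loses. A NULL module means
 * the component declined to run. Returns NULL if no component is usable. */
static demo_module_t *demo_select(demo_component_t *comps, size_t n)
{
    demo_module_t *best = NULL;
    demo_component_t *best_comp = NULL;
    int best_priority = -1;

    assert(n == 0 || comps != NULL);
    for (size_t i = 0; i < n; ++i) {
        int priority = -1;
        demo_module_t *module = comps[i].ns_init(&priority);
        if (NULL == module) {
            continue;
        }
        if (priority > best_priority) {
            if (NULL != best_comp) {
                best_comp->ns_finalize();   /* previous winner is displaced */
            }
            best = module;
            best_comp = &comps[i];
            best_priority = priority;
        } else {
            comps[i].ns_finalize();         /* lower priority: discard */
        }
    }
    return best;
}
```

With `alpha` at priority 10 and `beta` at 20, `demo_select` returns `&beta_module` and finalizes `alpha` exactly once. Passing the priority back through a pointer lets each component decide at run time (for example, based on whether it is running inside the seed daemon) how strongly it wants to be chosen.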
2004-08-02 04:24:22 +04:00
/*
2006-11-14 22:34:59 +03:00
* Macro for use in components that are of type ns v2.0.0
2004-08-02 04:24:22 +04:00
*/
2006-11-14 22:34:59 +03:00
#define MCA_NS_BASE_VERSION_2_0_0 \
/* ns v2.0 is chained to MCA v1.0 */ \
2004-08-02 04:24:22 +04:00
MCA_BASE_VERSION_1_0_0, \
2006-11-14 22:34:59 +03:00
/* ns v2.0 */ \
"ns", 2, 0, 0
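`MCA_NS_BASE_VERSION_2_0_0` is "chained" to the MCA base version: it expands to the base macro's fields followed by the framework name and its version, so a component can open its version initializer with a single macro. The sketch below reproduces that chaining pattern with hypothetical names. The `demo_version_t` layout and the `DEMO_BASE_VERSION_1_0_0` expansion (`1, 0, 0`) are assumptions chosen for illustration; the real `mca_base_component_t` and `MCA_BASE_VERSION_1_0_0` are defined elsewhere and may differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: base (MCA) version fields first, then the framework
 * type name and its version, then the component's own name. */
typedef struct {
    int mca_major, mca_minor, mca_release;
    const char *type_name;
    int type_major, type_minor, type_release;
    const char *component_name;
} demo_version_t;

/* Stand-in for the base macro: expands to the base version triple. */
#define DEMO_BASE_VERSION_1_0_0 \
    1, 0, 0

/* Framework macro chained to the base macro, in the same shape as
 * MCA_NS_BASE_VERSION_2_0_0. */
#define DEMO_NS_BASE_VERSION_2_0_0 \
    DEMO_BASE_VERSION_1_0_0, \
    "ns", 2, 0, 0

/* A component opens its positional initializer with the framework macro
 * and then supplies the fields that follow it. */
static const demo_version_t demo_component_version = {
    DEMO_NS_BASE_VERSION_2_0_0,
    "example"
};

/* Returns 1 if the version block belongs to an ns v2.x component. */
static int demo_is_ns_v2(const demo_version_t *v)
{
    assert(v != NULL);
    return strcmp(v->type_name, "ns") == 0 && v->type_major == 2;
}
```

Because the macros expand to a comma-separated list, the field order in the struct must match the order of the expanded values exactly; that coupling is the price of the compact one-macro initializer.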
2004-08-02 04:24:22 +04:00
2005-03-14 23:57:21 +03:00
/* Global structure for accessing name server functions
*/
2006-08-20 19:54:04 +04:00
ORTE_DECLSPEC extern mca_ns_base_module_t orte_ns; /* holds selected module's function pointers */
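The comment on `orte_ns` says it "holds selected module's function pointers". A common way to realize that is to copy the chosen module's table into one global so callers dispatch through it without knowing which component won. The sketch below shows the pattern with a two-entry table; `demo_ns_table_t`, `demo_ns`, `demo_ns_install`, and both function signatures are hypothetical and do not match the real `orte_ns_base_module_*_fn_t` signatures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function-pointer types; signatures invented for illustration. */
typedef int (*demo_get_name_string_fn_t)(char *buf, size_t len, unsigned vpid);
typedef int (*demo_compare_fn_t)(unsigned a, unsigned b);

typedef struct {
    demo_get_name_string_fn_t get_proc_name_string;
    demo_compare_fn_t compare_fields;
} demo_ns_table_t;

/* In a header this would be "extern demo_ns_table_t demo_ns;", with the
 * single definition living in one .c file. */
demo_ns_table_t demo_ns;

static int example_get_name_string(char *buf, size_t len, unsigned vpid)
{
    int n = snprintf(buf, len, "proc-%u", vpid);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;   /* -1 on truncation */
}

static int example_compare(unsigned a, unsigned b)
{
    return (a > b) - (a < b);                      /* -1, 0 or 1 */
}

static const demo_ns_table_t example_module = {
    example_get_name_string,
    example_compare
};

/* After selection, copy the winning module's table into the global so
 * callers can write demo_ns.fn(...) regardless of the component chosen. */
static void demo_ns_install(const demo_ns_table_t *selected)
{
    assert(selected != NULL);
    demo_ns = *selected;
}
```

After `demo_ns_install(&example_module)`, a call such as `demo_ns.get_proc_name_string(buf, sizeof buf, 7)` writes `"proc-7"` into `buf`. Copying the struct by value, rather than storing a pointer to it, keeps call sites to a single indirection and leaves the global valid even if the component's own copy goes away.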
2005-03-14 23:57:21 +03:00
2005-05-12 00:21:10 +04:00
#if defined(c_plusplus) || defined(__cplusplus)
}
#endif
2004-08-02 04:24:22 +04:00
#endif