1
1
openmpi/ompi/mca/btl/openib/btl_openib_ini.c

654 строки
19 KiB
C
Исходник Обычный вид История

Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
/*
* Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
* University Research and Technology
* Corporation. All rights reserved.
* Copyright (c) 2004-2005 The University of Tennessee and The University
* of Tennessee Research Foundation. All rights
* reserved.
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
* Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
* $COPYRIGHT$
*
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
* Additional copyrights may follow
*
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
* $HEADER$
*/
#include "ompi_config.h"
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
#include <unistd.h>
#include "orte/util/show_help.h"
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
#include "opal/mca/base/mca_base_param.h"
#include "btl_openib.h"
#include "btl_openib_lex.h"
#include "btl_openib_ini.h"
static const char *ini_filename = NULL;
static bool initialized = false;
static opal_list_t hcas;
static char *key_buffer = NULL;
static size_t key_buffer_len = 0;
/*
* Struct to hold the section name, vendor ID, and list of vendor part
* ID's and a corresponding set of values (parsed from an INI file).
*/
typedef struct parsed_section_values_t {
char *name;
uint32_t *vendor_ids;
int vendor_ids_len;
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
uint32_t *vendor_part_ids;
int vendor_part_ids_len;
ompi_btl_openib_ini_values_t values;
} parsed_section_values_t;
/*
* Struct to hold the final values. Different from above in a few ways:
*
* - The vendor and part IDs will always be set properly
* - There will only be one part ID (i.e., the above struct is
* exploded into multiple of these for each of searching)
* - There is a super of opal_list_item_t so that we can have a list
* of these
*/
typedef struct hca_values_t {
opal_list_item_t super;
char *section_name;
uint32_t vendor_id;
uint32_t vendor_part_id;
ompi_btl_openib_ini_values_t values;
} hca_values_t;
static void hca_values_constructor(hca_values_t *s);
static void hca_values_destructor(hca_values_t *s);
OBJ_CLASS_INSTANCE(hca_values_t,
opal_list_item_t,
hca_values_constructor,
hca_values_destructor);
/*
* Local functions
*/
static int parse_file(char *filename);
static int parse_line(parsed_section_values_t *item);
static void reset_section(bool had_previous_value, parsed_section_values_t *s);
static void reset_values(ompi_btl_openib_ini_values_t *v);
static int save_section(parsed_section_values_t *s);
static int intify(char *string);
static int intify_list(char *str, uint32_t **values, int *len);
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
static inline void show_help(const char *topic);
/*
* Read the INI files for HCA-specific values and save them in
* internal data structures for later lookup.
*/
int ompi_btl_openib_ini_init(void)
{
int ret = OMPI_ERR_NOT_FOUND;
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
char *colon;
OBJ_CONSTRUCT(&hcas, opal_list_t);
colon = strchr(mca_btl_openib_component.hca_params_file_names, ':');
if (NULL == colon) {
/* If we've only got 1 file (i.e., no colons found), parse it
and be done */
ret = parse_file(mca_btl_openib_component.hca_params_file_names);
} else {
/* Otherwise, loop over all the files and parse them */
char *orig = strdup(mca_btl_openib_component.hca_params_file_names);
char *str = orig;
while (NULL != (colon = strchr(str, ':'))) {
*colon = '\0';
ret = parse_file(str);
/* Note that NOT_FOUND and SUCCESS are not fatal errors
and we keep going. Other errors are treated as
fatal */
if (OMPI_ERR_NOT_FOUND != ret && OMPI_SUCCESS != ret) {
break;
}
str = colon + 1;
}
/* Parse the last file if we didn't have a fatal error above */
if (OMPI_ERR_NOT_FOUND != ret && OMPI_SUCCESS != ret) {
ret = parse_file(str);
}
/* All done */
free(orig);
}
/* Return SUCCESS unless we got a fatal error */
initialized = true;
return (OMPI_SUCCESS == ret || OMPI_ERR_NOT_FOUND == ret) ?
OMPI_SUCCESS : ret;
}
/*
* The component found an HCA and is querying to see if an INI file
* specified any parameters for it.
*/
int ompi_btl_openib_ini_query(uint32_t vendor_id, uint32_t vendor_part_id,
ompi_btl_openib_ini_values_t *values)
{
int ret;
hca_values_t *h;
opal_list_item_t *item;
if (!initialized) {
if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_init())) {
return ret;
}
}
if (mca_btl_openib_component.verbose) {
BTL_OUTPUT(("Querying INI files for vendor 0x%04x, part ID %d",
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
vendor_id, vendor_part_id));
}
reset_values(values);
/* Iterate over all the saved hcas */
for (item = opal_list_get_first(&hcas);
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
item != opal_list_get_end(&hcas);
item = opal_list_get_next(item)) {
h = (hca_values_t*) item;
if (vendor_id == h->vendor_id &&
vendor_part_id == h->vendor_part_id) {
/* Found it! */
/* NOTE: There is a bug in the PGI 6.2 series that causes
the compiler to choke when copying structs containing
bool members by value. So do a memcpy here instead. */
memcpy(values, &h->values, sizeof(h->values));
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
if (mca_btl_openib_component.verbose) {
BTL_OUTPUT(("Found corresponding INI values: %s",
h->section_name));
}
return OMPI_SUCCESS;
}
}
/* If we fall through to here, we didn't find it */
if (mca_btl_openib_component.verbose) {
BTL_OUTPUT(("Did not find corresponding INI values"));
}
return OMPI_ERR_NOT_FOUND;
}
/*
* The component is shutting down; release all internal state
*/
int ompi_btl_openib_ini_finalize(void)
{
opal_list_item_t *item;
if (initialized) {
for (item = opal_list_remove_first(&hcas);
NULL != item;
item = opal_list_remove_first(&hcas)) {
OBJ_RELEASE(item);
}
OBJ_DESTRUCT(&hcas);
initialized = true;
}
return OMPI_SUCCESS;
}
/**************************************************************************/
/*
* Parse a single file
*/
static int parse_file(char *filename)
{
int val;
int ret = OMPI_SUCCESS;
bool showed_no_section_warning = false;
bool showed_unexpected_tokens_warning = false;
parsed_section_values_t section;
reset_section(false, &section);
/* Open the file */
ini_filename = filename;
btl_openib_ini_yyin = fopen(filename, "r");
if (NULL == btl_openib_ini_yyin) {
This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already make the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not thei opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). * Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so you if explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent view the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing to messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. = New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.
2008-05-14 00:00:55 +04:00
orte_show_help("help-mpi-btl-openib.txt", "ini file:file not found",
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
true, filename);
ret = OMPI_ERR_NOT_FOUND;
goto cleanup;
}
/* Do the parsing */
btl_openib_ini_parse_done = false;
btl_openib_ini_yynewlines = 1;
btl_openib_ini_init_buffer(btl_openib_ini_yyin);
while (!btl_openib_ini_parse_done) {
val = btl_openib_ini_yylex();
switch (val) {
case BTL_OPENIB_INI_PARSE_DONE:
/* This will also set btl_openib_ini_parse_done to true, so just
break here */
break;
case BTL_OPENIB_INI_PARSE_NEWLINE:
/* blank line! ignore it */
break;
case BTL_OPENIB_INI_PARSE_SECTION:
/* We're starting a new section; if we have previously
parsed a section, go see if we can use its values. */
save_section(&section);
reset_section(true, &section);
section.name = strdup(btl_openib_ini_yytext);
break;
case BTL_OPENIB_INI_PARSE_SINGLE_WORD:
if (NULL == section.name) {
/* Warn that there is no current section, and ignore
this parameter */
if (!showed_no_section_warning) {
show_help("ini file:not in a section");
showed_no_section_warning = true;
}
/* Parse it and then dump it */
parse_line(&section);
reset_section(true, &section);
} else {
parse_line(&section);
}
break;
default:
/* anything else is an error */
if (!showed_unexpected_tokens_warning) {
show_help("ini file:unexpected token");
showed_unexpected_tokens_warning = true;
}
break;
}
}
save_section(&section);
fclose(btl_openib_ini_yyin);
cleanup:
reset_section(true, &section);
if (NULL != key_buffer) {
free(key_buffer);
key_buffer = NULL;
key_buffer_len = 0;
}
return ret;
}
/*
* Parse a single line in the INI file
*/
static int parse_line(parsed_section_values_t *sv)
{
int val, ret = OMPI_SUCCESS;
char *value = NULL;
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
bool showed_unknown_field_warning = false;
/* Save the name name */
if (key_buffer_len < strlen(btl_openib_ini_yytext) + 1) {
char *tmp;
key_buffer_len = strlen(btl_openib_ini_yytext) + 1;
tmp = realloc(key_buffer, key_buffer_len);
if (NULL == tmp) {
free(key_buffer);
key_buffer_len = 0;
key_buffer = NULL;
return OMPI_ERR_TEMP_OUT_OF_RESOURCE;
}
key_buffer = tmp;
}
strncpy(key_buffer, btl_openib_ini_yytext, key_buffer_len);
/* The first thing we have to see is an "=" */
val = btl_openib_ini_yylex();
if (btl_openib_ini_parse_done || BTL_OPENIB_INI_PARSE_EQUAL != val) {
show_help("ini file:expected equals");
return OMPI_ERROR;
}
/* Next we get the value */
val = btl_openib_ini_yylex();
if (BTL_OPENIB_INI_PARSE_SINGLE_WORD == val ||
BTL_OPENIB_INI_PARSE_VALUE == val) {
value = strdup(btl_openib_ini_yytext);
/* Now we need to see the newline */
val = btl_openib_ini_yylex();
if (BTL_OPENIB_INI_PARSE_NEWLINE != val &&
BTL_OPENIB_INI_PARSE_DONE != val) {
show_help("ini file:expected newline");
free(value);
return OMPI_ERROR;
}
}
/* If we did not get EOL or EOF, something is wrong */
else if (BTL_OPENIB_INI_PARSE_DONE != val &&
BTL_OPENIB_INI_PARSE_NEWLINE != val) {
show_help("ini file:expected newline");
return OMPI_ERROR;
}
/* Ok, we got a good parse. Now figure out what it is and save
the value. Note that the flex already took care of trimming
all whitespace at the beginning and ending of the value. */
if (0 == strcasecmp(key_buffer, "vendor_id")) {
if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
&sv->vendor_ids_len))) {
return ret;
}
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
}
else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
&sv->vendor_part_ids_len))) {
return ret;
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
}
}
else if (0 == strcasecmp(key_buffer, "mtu")) {
/* Single value */
sv->values.mtu = (uint32_t) intify(value);
sv->values.mtu_set = true;
}
else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
/* Single value */
sv->values.use_eager_rdma = (uint32_t) intify(value);
sv->values.use_eager_rdma_set = true;
}
else if (0 == strcasecmp(key_buffer, "receive_queues")) {
/* Single value (already strdup'ed) */
sv->values.receive_queues = value;
value = NULL;
}
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
else {
/* Have no idea what this parameter is. Not an error -- just
ignore it */
if (!showed_unknown_field_warning) {
This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already make the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not thei opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). * Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so you if explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent view the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing to messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. = New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.
2008-05-14 00:00:55 +04:00
orte_show_help("help-mpi-btl-openib.txt",
"ini file:unknown field", true,
ini_filename, btl_openib_ini_yynewlines,
key_buffer);
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
showed_unknown_field_warning = true;
}
}
/* All done */
if (NULL != value) {
free(value);
}
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
return ret;
}
/*
* Construct an hca_values_t and set all of its values to known states
*/
static void hca_values_constructor(hca_values_t *s)
{
s->section_name = NULL;
s->vendor_id = 0;
s->vendor_part_id = 0;
reset_values(&s->values);
}
/*
* Destruct an hca_values_t and free any memory that it has
*/
static void hca_values_destructor(hca_values_t *s)
{
if (NULL != s->section_name) {
free(s->section_name);
}
if (NULL != s->values.receive_queues) {
free(s->values.receive_queues);
}
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
}
/*
* Reset a parsed section; free any memory that it may have had
*/
static void reset_section(bool had_previous_value, parsed_section_values_t *s)
{
if (had_previous_value) {
if (NULL != s->name) {
free(s->name);
}
if (NULL != s->vendor_ids) {
free(s->vendor_ids);
}
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
if (NULL != s->vendor_part_ids) {
free(s->vendor_part_ids);
}
}
s->name = NULL;
s->vendor_ids = NULL;
s->vendor_ids_len = 0;
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
s->vendor_part_ids = NULL;
s->vendor_part_ids_len = 0;
reset_values(&s->values);
}
/*
* Reset the values to known states
*/
static void reset_values(ompi_btl_openib_ini_values_t *v)
{
v->mtu = 0;
v->mtu_set = false;
v->use_eager_rdma = 0;
v->use_eager_rdma_set = false;
v->receive_queues = NULL;
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
}
/*
* If we have a valid section, see if we have a matching section
* somewhere (i.e., same vendor ID and vendor part ID). If we do,
* update the values. If not, save the values in a new instance and
* add it to the list.
*/
static int save_section(parsed_section_values_t *s)
{
int i, j;
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
opal_list_item_t *item;
hca_values_t *h;
bool found;
/* Is the parsed section valid? */
if (NULL == s->name || 0 == s->vendor_ids_len ||
0 == s->vendor_part_ids_len) {
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
return OMPI_ERR_BAD_PARAM;
}
/* Iterate over each of the vendor/part IDs in the parsed
values */
for (i = 0; i < s->vendor_ids_len; ++i) {
for (j = 0; j < s->vendor_part_ids_len; ++j) {
found = false;
/* Iterate over all the saved hcas */
for (item = opal_list_get_first(&hcas);
item != opal_list_get_end(&hcas);
item = opal_list_get_next(item)) {
h = (hca_values_t*) item;
if (s->vendor_ids[i] == h->vendor_id &&
s->vendor_part_ids[j] == h->vendor_part_id) {
/* Found a match. Update any newly-set values. */
if (s->values.mtu_set) {
h->values.mtu = s->values.mtu;
h->values.mtu_set = true;
}
if (s->values.use_eager_rdma_set) {
h->values.use_eager_rdma = s->values.use_eager_rdma;
h->values.use_eager_rdma_set = true;
}
found = true;
break;
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
}
}
/* Did we find/update it in the exising list? If not,
create a new one. */
if (!found) {
h = OBJ_NEW(hca_values_t);
h->section_name = strdup(s->name);
h->vendor_id = s->vendor_ids[i];
h->vendor_part_id = s->vendor_part_ids[j];
/* NOTE: There is a bug in the PGI 6.2 series that
causes the compiler to choke when copying structs
containing bool members by value. So do a memcpy
here instead. */
memcpy(&h->values, &s->values, sizeof(s->values));
/* Need to strdup the string, though */
if (NULL != h->values.receive_queues) {
h->values.receive_queues = strdup(s->values.receive_queues);
}
opal_list_append(&hcas, &h->super);
}
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
}
}
/* All done */
return OMPI_SUCCESS;
}
/*
* Do string-to-integer conversion, for both hex and decimal numbers
*/
static int intify(char *str)
{
while (isspace(*str)) {
++str;
}
/* If it's hex, use sscanf() */
if (strlen(str) > 3 && 0 == strncasecmp("0x", str, 2)) {
unsigned int i;
sscanf(str, "%X", &i);
return (int) i;
}
/* Nope -- just decimal, so use atoi() */
return atoi(str);
}
/*
* Take a comma-delimited list and infity them all
*/
static int intify_list(char *value, uint32_t **values, int *len)
{
char *comma;
char *str = value;
*len = 0;
/* Comma-delimited list of values */
comma = strchr(str, ',');
if (NULL == comma) {
/* If we only got one value (i.e., no comma found), then
just make an array of one value and save it */
*values = malloc(sizeof(uint32_t));
if (NULL == *values) {
return OMPI_ERR_OUT_OF_RESOURCE;
}
*values[0] = (uint32_t) intify(str);
*len = 1;
} else {
int newsize = 1;
/* Count how many values there are and allocate enough space
for them */
while (NULL != comma) {
++newsize;
str = comma + 1;
comma = strchr(str, ',');
}
*values = malloc(sizeof(uint32_t) * newsize);
if (NULL == *values) {
return OMPI_ERR_OUT_OF_RESOURCE;
}
/* Iterate over the values and save them */
str = value;
comma = strchr(str, ',');
do {
*comma = '\0';
(*values)[*len] = (uint32_t) intify(str);
++(*len);
str = comma + 1;
comma = strchr(str, ',');
} while (NULL != comma);
/* Get the last value (i.e., the value after the last
comma, because it won't have been snarfed in the
loop) */
(*values)[*len] = (uint32_t) intify(str);
++(*len);
}
return OMPI_SUCCESS;
}
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
/*
* Trival helper function
*/
static inline void show_help(const char *topic)
{
char *save = btl_openib_ini_yytext;
if (0 == strcmp("\n", btl_openib_ini_yytext)) {
btl_openib_ini_yytext = "<end of line>";
}
This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already make the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not thei opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). * Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so you if explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent view the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing to messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. = New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.
2008-05-14 00:00:55 +04:00
orte_show_help("help-mpi-btl-openib.txt", topic, true,
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
ini_filename, btl_openib_ini_yynewlines,
btl_openib_ini_yytext);
btl_openib_ini_yytext = save;
Bring over all the work from the /tmp/ib-hw-detect branch. In addition to my design and testing, it was conceptually approved by Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat lightly] tested by Galen. We may still have to shake out some bugs during the next few months, but it seems to be working for all the cases that I can throw at it. Here's a summary of the changes from that branch: * Move MCA parameter registration to a new file (btl_openib_mca.c): * Properly check the retun status of registering MCA params * Check for valid values of MCA parameters * Make help strings better * Otherwise, the only default value of an MCA param that was changed was max_btls; it went from 4 to -1 (meaning: use all available) * Properly prototyped internal functions in _component.c * Made a bunch of functions static that didn't need to be public * Renamed to remove "mca_" prefix from static functions * Call new MCA param registration function * Call new INI file read/lookup/finalize functions * Updated a bunch of macros to be "BTL_" instead of "ORTE_" * Be a little more consistent with return values * Handle -1 for the max_btls MCA param * Fixed a free() that should have been an OBJ_RELEASE() * Some re-indenting * Added INI-file parsing * New flex file: btl_openib_ini.l * New default HCA params .ini file (probably to be expanded over time by other HCA vendors) * Added more show_help messages for parsing problems * Read in INI files and cache the values for later lookup * When component opens an HCA, lookup to see if any corresponding values were found in the INI files (ID'ed by the HCA vendor_id and vendor_part_id) * Added btl_openib_verbose MCA param that shows what the INI-file stuff does (e.g., shows which MTU your HCA ends up using) * Added btl_openib_hca_param_files as a colon-delimited list of INI files to check for values during startup (in order, left-to-right, just like the MCA base directory param). * MTU is currently the only value supported in this framework. * It is not a fatal error if we don't find params for the HCA in the INI file(s). Instead, just print a warning. New MCA param btl_openib_warn_no_hca_params_found can be used to disable printing the warning. * Add MTU to peer negotiation when making a connection * Exchange maximum MTU; select the lesser of the two This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
}