1
1
Граф коммитов

9 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
93cf3c7203 Update OPAL and ORTE for thread safety
(I swear, if I look this over one more time, I'll puke)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 12:30:57 -07:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Gilles Gouaillardet
2e384a3b65 initialize common symbols from orte
A few uninitialized common symbols are remaining (generated by flex) :
 * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_leng
 * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_text
 * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_leng
 * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_text
2015-05-08 10:11:58 +09:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Ralph Castain
e4e678e234 Per the RFC and discussion on the devel list, update the RTE-MPI error handling interface. There are a few differences in the code from the original RFC that came out of the discussion - I've captured those in the following writeup
George and I were talking about ORTE's error handling the other day in regards to the right way to deal with errors in the updated OOB. Specifically, it seemed a bad idea for a library such as ORTE to be aborting the job on its own prerogative. If we lose a connection or cannot send a message, then we really should just report it upwards and let the application and/or upper layers decide what to do about it.

The current code base only allows a single error callback to exist, which seemed unduly limiting. So, based on the conversation, I've modified the errmgr interface to provide a mechanism for registering any number of error handlers (this replaces the current "set_fault_callback" API). When an error occurs, these handlers will be called in order until one responds that the error has been "resolved" - i.e., no further action is required - by returning OMPI_SUCCESS. The default MPI layer error handler is specified to go "last" and calls mpi_abort, so the current "abort" behavior is preserved unless other error handlers are registered.

In the register_callback function, I provide an "order" param so you can specify "this callback must come first" or "this callback must come last". Seemed to me that we will probably have different code areas registering callbacks, and one might require it go first (the default "abort" will always require it go last). So you can append and prepend, or go first. Note that only one registration can declare itself "first" or "last", and since the default "abort" callback automatically takes "last", that one isn't available. :-)

The errhandler callback function passes an opal_pointer_array of structs, each of which contains the name of the proc involved (which can be yourself for internal errors) and the error code. This is a change from the current fault callback which returned an opal_pointer_array of just process names. Rationale is that you might need to see the cause of the error to decide what action to take. I realize that isn't a requirement for remote procs, but remember that we will use the SAME interface to report RTE errors internal to the proc itself. In those cases, you really do need to see the error code. It is legal to pass a NULL for the pointer array (e.g., when reporting an internal failure without error code), so handlers must be prepared for that possibility. If people find that too burdensome, we can remove it.

Should we ever decide to create a separate callback path for internal errors vs remote process failures, or if we decide to do something different based on experience, then we can adjust this API.

This commit was SVN r28852.
2013-07-19 01:08:53 +00:00
Ralph Castain
10ca1c1b04 Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature".
This commit was SVN r28789.
2013-07-14 18:57:20 +00:00
Nathan Hjelm
c041156f60 Update ORTE frameworks to use the MCA framework system.
This commit was SVN r28240.
2013-03-27 21:14:43 +00:00