1
1
Граф коммитов

132 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
fec973efda configury: test portability
replace test ... -o ... with test ... || test ...
and test ... -a ... with test ... && test ...
2015-12-28 13:58:45 +09:00
Nathan Hjelm
8b5810f7f7 mca/base: add priority output to mca_base_select
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Nathan Hjelm
ee36d813dc Merge pull request #657 from hjelmn/c99
more c99 updates
2015-06-25 11:21:09 -06:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Howard Pritchard
e49a37c034 ownership: update ownership files
per discussions at OMPI devel workshop

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-06-25 10:04:42 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Gilles Gouaillardet
9e278a21ce opal/crs: fix a string overflow
and revamp out of resource handling
fixes resource leak as reported by Coverity with CID 1304752
2015-06-10 14:23:25 +09:00
Nathan Hjelm
6772d32b85 opal/crs: silence clang warnings introduced by coverity fixes
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-06-08 09:16:13 -06:00
Nathan Hjelm
f72b6d45c7 crs/none: fix coverity issues
CID 1301389 Resource leak (RESOURCE_LEAK)

There is no conceivable reason to strdup cr_argv[0] in either
location. Removed the calls to strdup.

CID 741357 Resource leak (RESOURCE_LEAK)

cr_argv was created by opal_argv_split (tmp_argv[0], ' '). Why should
we call opal_argv_join (' ') on this array. Leak fixed by printing out
tmp_argv[0] instead of calling opal_argv_join.

CID 741358 Resource leak (RESOURCE_LEAK)

The code does not handle exec failure correctly. The error should be
communicated to the parent process but the function in question is
only called by the parent. This calls into question some of the
structure of the function in general (like what is the point of
returning the child process id). That said, I will go ahead and add
the opal_argv_free to quiet this error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-06-01 16:00:51 -06:00
Nathan Hjelm
9353fcea95 crs/base: fix coverity issues
CID 1196720 Resource leak (RESOURCE_LEAK)
CID 1196721 Resource leak (RESOURCE_LEAK)

The code in question does leak loc_token and loc_value. Cleaned up the
code a bit and plugged the leak.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:10 -06:00
Nathan Hjelm
9e56ef0da9 opal/crs: clean up parsing code to fix coverity issues
CID 70622 Dereference before null check (REVERSE_INULL)
CID 70459 Logically dead code (DEADCODE)

Cleanup some cludgy code which (among other things) reimplemented
strcat, strdup, and strchr. In the process this resolved two
outstanding coverity issues.

CID 70631 Dereference before null check (REVERSE_INULL)

best_module can not be NULL in this code path. Remove NULL check and
unnecessary goto statements.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-26 14:45:35 -06:00
Gilles Gouaillardet
c809aace47 initialize common symbols from opal
A few uninitialized common symbols are remaining:

common symbols generated by flex :
 * opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng
 * opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext
 * opal/util/show_help_lex.l: opal_show_help_yyleng
 * opal/util/show_help_lex.l: opal_show_help_yytext

common symbol generated by "external" hwloc library:
 * opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map
2015-05-08 09:48:51 +09:00
Nathan Hjelm
33181b2543 opal: use C99 subobject naming for component initialization
This commit helps future-proof opal components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Jeff Squyres
f381c5ea8b crs none: ensure file is != NULL before closing it
This was CID 71701
2015-02-24 15:24:09 -05:00
Jeff Squyres
8fd5e75463 crs base: ensure metadata != NULL
This was CID 71700
2015-02-24 15:24:08 -05:00
Howard Pritchard
bf89131f9e add owner files to opa/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Jeff Squyres
4f1996df5d various: remove $(LTDLINCL) from Makefile.am's that didn't need it 2015-02-11 12:25:20 -08:00
Jeff Squyres
c6d9bf906e configury: ensure wrapper static LIBS is filled properly
In core library portions of the configury (e.g., top-level
configure.ac itself), we were calling AC_CHECK_LIB and
OPAL_CHECK_FUNC_LIB to check for various libraries.

'''SIDENOTE:''' It turns out that modern Autoconf has AC_SEARCH_LIBS,
which does just about exactly what OPAL_CHECK_FUNC_LIB does.  So this
commit effectively replaces OPAL_CHECK_FUNC_LIB with AC_SEARCH_LIBS.

However, we never bothered to add these found libraries to the wrapper
compiler list of libraries used for static linking (doh!).  We've been
getting lucky for quite a while that components were adding the same
libraries to their wrapper compiler LIBS list.  

This is problematic, however, if we don't build some of these
components.  For example, Paul Hargrove noticed that if he configured
with --disable-shared --enable-static --disable-io-romio, ROMIO was no
longer adding some libraries to the wrapper LIBS list -- libraries
that just happened to also be needed by core OPAL/ORTE/OMPI layers.

The solution is not to use AC_CHECK_LIB or OPAL_CHECK_FUNC_LIB, but
use a pair of new macros:

 * OPAL_SEARCH_LIBS_CORE: a wrapper around AC_SEARCH_LIBS.  If we add
   something to $LIBS, then also add it to the wrapper list of static
   libraries.  This is the main piece of functionality that was
   wrong/missing.
 * OPAL_SEARCH_LIBS_COMPONENT: similar to OPAL_SEARCH_LIBS_CORE, but
   instead of directly adding it to the wrapper list of static
   libaries, add it to <framework>_<component>_LIBS (which eventually
   gets slurped up into the wrapper list of static libraries.  See the
   lengthy comment in config/opal_setup_wrappers.m4 near the beginning
   of OPAL_SETUP_WRAPPER_INIT() for a more detailed explanation).
   Most components did this correctly already, but one or two weren't
   right, so I implemented this second macro quite similar to the
   first and put it everywhere we already used AC_SEARCH_LIBS or
   OPAL_CHECK_FUNC_LIB.

This needs to soak for a day or two on the trunk before moving to the
v1.8 branch.

Refs trac:4834

cmr=v1.8.2:reviewer=ggouaillardet

This commit was SVN r32447.

The following Trac tickets were found above:
  Ticket 4834 --> https://svn.open-mpi.org/trac/ompi/ticket/4834
2014-08-07 23:54:45 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
e20dae536c Last step under current RFC: OMPI_CHECK_WITHDIR -> OPAL_CHECK_WITHDIR
This commit was SVN r31585.
2014-05-01 15:38:07 +00:00
Ralph Castain
e11eb15518 Next step of RFC: OMPI_CHECK_FUNC_LIB -> OPAL_CHECK_FUNC_LIB
This commit was SVN r31583.
2014-05-01 14:57:43 +00:00
Ralph Castain
3b64c603b4 First stage of RFC to rename OMPI_foo build system support: change OMPI_CHECK_PACKAGE -> OPAL_CHECK_PACKAGE
This commit was SVN r31582.
2014-05-01 14:24:56 +00:00
Jeff Squyres
ddb44eb81f compress/crs: Remove empty help files.
Aside from comments, these help files are empty -- so just remove
them.  Also remove a few inclusions of show_help.h, since it isn't
used in those source files.

This commit was SVN r31328.
2014-04-07 15:43:40 +00:00
Jeff Squyres
a4940a2a37 crs self: Remove unused help message.
Also:

* fix formatting error at top (make the emacs format line begin
  with a comment character)
* Remove blank lines between topics and replace them with empty
  comment lines (so that the help messages are not displayed with a
  trailing blank line)

This commit was SVN r31324.
2014-04-07 15:41:24 +00:00
Jeff Squyres
4eee2eff88 crs: Remove empty help text file
This commit was SVN r31323.
2014-04-07 15:40:50 +00:00
Jeff Squyres
173c046617 build: add Automake-like silent/verbose macros for "ln -s ..." operations
Also, since I put some of the macros for these silent/verbose rules up
in the top-level Makefile.man-page-rules file, I renamed it to
Makefile.ompi-rules.

I've had this sitting around for a while; now seems like as good a
time as any to commit it.

This commit was SVN r31271.
2014-03-28 18:24:32 +00:00
Adrian Reber
d7734ac6d8 CRS/CRIU: add code to actually checkpoint a process
This adds the code to actually checkpoint a process using CRIU
with the necessary variables to control the behaviour.

Right now only --np 1 is supported and --mca oob tcp.

Following parameters are supported:

* crs_criu_log: name of the log file
* crs_criu_log_level: verbosity level in the log file
* crs_criu_tcp_established: C/R established TCP connections
* crs_criu_shell_job: C/R shell jobs
* crs_criu_ext_unix_sk: allow external unix connections
* crs_criu_leave_running: leave tasks in running state after checkpoint

This commit was SVN r30772.
2014-02-19 13:30:12 +00:00
Adrian Reber
14ba81d166 Simplification to the CRIU configure.m4 script:
* Remove redundant/unnecessary uses of $2
 * Change a bunch of logic from negative to positive
 * Use OPAL_VAR_SCOPE_PUSH/POP to help reduce env var usage
 * Only use "" in test statements with strings that require sanitization
 * Removed redundant AC_MSG_WARN/ERROR.  There's now only one check at
   the bottom for whether the component is "good" or not.  We'll
   AC_MSG_WARN/ERROR in that one location.

Thanks to Jeff Squyres for this patch.

This commit was SVN r30739.
2014-02-15 21:22:30 +00:00
Adrian Reber
56c23f22be CRS/SELF: fix compiler warning: variable 'callback_matched' set but not used
This commit was SVN r30667.
2014-02-11 14:45:29 +00:00
Adrian Reber
42d7b014e4 CRS/CRIU: added CRIU as new CRS component
To be able to checkpoint/restart using criu (criu.org) a new
CRS component is added which is based on criu. This first commit
provides the minimal set of functions and configure script options
to enable --with-criu and link against libcriu.so.
No actual checkpoint/restart functionality is yet implemented.
This is only the framework which needs to be filled with the
actual functionality.

This commit was SVN r30666.
2014-02-11 14:43:38 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Adrian Reber
d7ab0231e8 Cleanup trailing/heading whitespaces in COMPRESS/CRS MCA
This commit was SVN r30131.
2014-01-07 14:34:45 +00:00
Adrian Reber
015799c724 Fix COMPRESS/CRS MCA for C/R by returning correct value
Right now the C/R code fails because of a change introduced in
opal/mca/compress/base/compress_base_open.c and
pal/mca/crs/base/crs_base_open.c in 2013 with commit

git 734c724ff76d9bf814f3ab0396bcd9ee6fddcd1b
svn r28239

    Update OPAL frameworks to use the MCA framework system.

This commit changed a lot but also the return value of functions from
OPAL_SUCCESS to OPAL_ERR_NOT_AVAILABLE/OPAL_ERR_NOT_AVAILABLE.

This commit lets opal_compress_base_register() and opal_crs_base_open()
always return OPAL_SUCCESS and removes unneeded #includes.

This commit was SVN r30130.

The following SVN revision numbers were found above:
  r28239 --> open-mpi/ompi@365cf48db5
2014-01-07 14:16:54 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
Nathan Hjelm
365cf48db5 Update OPAL frameworks to use the MCA framework system.
This commit was SVN r28239.
2013-03-27 21:11:47 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
8d2fa3693b First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package.
This commit was SVN r28116.
2013-02-26 20:44:56 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept seperate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Nathan Hjelm
bdedd8b0d3 Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality.

This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Jeff Squyres
3bf038bb1c Per RFC from long ago:
http://www.open-mpi.org/community/lists/devel/2011/10/9784.php

Bring support for a DMTCP CRS module into the trunk.  See
http://dmtcp.sourceforge.net/ for a description of DMTCP.  Thanks to
the contribution from Alex Brick at Northeastern University, and all
the others up there who helped shepherd this into being ready to
submit.

This commit was SVN r26176.
2012-03-22 12:01:46 +00:00
Josh Hursey
076db435cd * Fixes trac:2807 : Improve the BLCR configure option so that it checks if the {{{--with-blcr}}} option is specified, but not {{{--with-ft=cr}}}, then configure returns an error (since this is clearly not what the user intended).
This commit was SVN r25590.

The following Trac tickets were found above:
  Ticket 2807 --> https://svn.open-mpi.org/trac/ompi/ticket/2807
2011-12-07 21:36:03 +00:00
Josh Hursey
58938b2f50 * Clarified show help when CRS component cannot be loaded.
* Fixes trac:2329 : Improves the error message, and ensures opal-restart will not segv in opal_finalize.

This commit was SVN r25586.

The following Trac tickets were found above:
  Ticket 2329 --> https://svn.open-mpi.org/trac/ompi/ticket/2329
2011-12-07 14:58:08 +00:00
Josh Hursey
b223d355fc Add explicit number for opal_crs_state_type_t enum (for debugging). Also add a MAX so we can easily check for out of bounds states during debugging.
This commit was SVN r24766.
2011-06-09 14:27:24 +00:00
Josh Hursey
7c737b9274 Some string and state cleanup. Thanks to George Bosilca for the initial patch.
This commit was SVN r24471.
2011-03-01 20:12:23 +00:00
Josh Hursey
66af515061 Fix C/R functionality with the new libtool. This fixes the case where the restarted process cannot be checkpointed or finalized.
Short Version:
--------------
Event engine needs to be flushed so it does not use old/stale file descriptors.

Long Version:
-------------
The problem was that the restarted process was waiting for the socket to the local daemon to finish establishing during the 'sync' operation. The core problem was that the daemon was sending a header of 36 bytes, but the restarted process only received 35 bytes of the message. So the restarted process became stuck waiting for the last byte to arrive.

After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (to signal itself when a signal arrives). So when the daemon wrote in to the new fd the event engine was stealing the first byte (*shakes fist at event engine*) before the recv() could be posted.

The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem.


A few other minor things:
-------------------------
 * Add a check to make sure the event engine is balanced in its init/finalize
 * Add the opal_event_base_close() to the BLCR restart exec function (still not 100% sure it is needed, but there it is).

This commit was SVN r24296.
2011-01-25 22:43:47 +00:00
Josh Hursey
676adfb7cc fix compile error, need to revisit this line later
This commit was SVN r23991.
2010-11-03 17:34:11 +00:00