This commit looks larger than it really is since it includes a fair amount of code cleanup.
The SIGSTOP/SIGCONT+checkpointing work uses some of the functionality in r20391. Basic use case below (note that the checkpoint generated is useable as usual if the stopped application is terminated).
{{{
shell 1) mpirun -np 2 -am ft-enable-cr my-app
... running ...
shell 2) ompi-checkpoint --stop -v MPIRUN_PID
[localhost:001300] [ 0.00 / 0.20] Requested - ...
[localhost:001300] [ 0.00 / 0.20] Pending - ...
[localhost:001300] [ 0.01 / 0.21] Running - ...
[localhost:001300] [ 1.01 / 1.22] Stopped - ompi_global_snapshot_1234.ckpt
Snapshot Ref.: 0 ompi_global_snapshot_1234.ckpt
shell 2) killall -CONT mpirun
... Application Continues execution in shell 1 ...
}}}
Other items in this commit are mostly cleanup that has been sitting off-trunk for too long:
* Add a new {{{opal_crs_base_ckpt_options_t}}} type that encapsulates the various options that could be passed to the CRS. Currently only TERM and STOP, but this makes adding others ''much'' easier.
* Eliminate ORTE_SNAPC_CKPT_STATE_PENDING_TERM, since it served a redundant purpose with the new options type.
* Lay some basic ground work for some future features.
This commit was SVN r21995.
The following SVN revision numbers were found above:
r20391 --> open-mpi/ompi@0704b98668
#if defined (c_plusplus)
defined (__cplusplus)
followed by
extern "C" {
and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.
Notable exceptions are:
- opal/include/opal_config_bottom.h:
This is our generated code, that itself defines BEGIN_C_DECL and
END_C_DECL
- ompi/mpi/cxx/mpicxx.h:
Here we do not include opal_config_bottom.h:
- Belongs to external code:
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h
- opal/include/opal/prefetch.h:
Has C++ specific macros that are protected:
- Had #if ... } #endif _and_ END_C_DECLS (aka end up with 2x
END_C_DECLS)
ompi/mca/btl/openib/btl_openib.h
- opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
- opal/win32/ompi_process.h: had extern "C"\n {...
opal/win32/ompi_process.h: dito
- ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
ompi/mpi/f90/test/align_c.c: dito
- ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
- ompi/mpi/f90/xml/common-C.xsl: Amend
Tested on linux using --with-openib and --with-mx
The following do not contain either opal_config.h, orte_config.h or
ompi_config.h
(but possibly other header files, that include one of the above):
ompi/mca/bml/r2/bml_r2_ft.h
ompi/mca/btl/gm/btl_gm_endpoint.h
ompi/mca/btl/gm/btl_gm_proc.h
ompi/mca/btl/mx/btl_mx_endpoint.h
ompi/mca/btl/ofud/btl_ofud_endpoint.h
ompi/mca/btl/ofud/btl_ofud_frag.h
ompi/mca/btl/ofud/btl_ofud_proc.h
ompi/mca/btl/openib/btl_openib_mca.h
ompi/mca/btl/portals/btl_portals_endpoint.h
ompi/mca/btl/portals/btl_portals_frag.h
ompi/mca/btl/sctp/btl_sctp_endpoint.h
ompi/mca/btl/sctp/btl_sctp_proc.h
ompi/mca/btl/tcp/btl_tcp_endpoint.h
ompi/mca/btl/tcp/btl_tcp_ft.h
ompi/mca/btl/tcp/btl_tcp_proc.h
ompi/mca/btl/template/btl_template_endpoint.h
ompi/mca/btl/template/btl_template_proc.h
ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
ompi/mca/btl/udapl/btl_udapl_endpoint.h
ompi/mca/btl/udapl/btl_udapl_mca.h
ompi/mca/btl/udapl/btl_udapl_proc.h
ompi/mca/mtl/mx/mtl_mx_endpoint.h
ompi/mca/mtl/mx/mtl_mx.h
ompi/mca/mtl/psm/mtl_psm_endpoint.h
ompi/mca/mtl/psm/mtl_psm.h
ompi/mca/pml/cm/pml_cm_component.h
ompi/mca/pml/csum/pml_csum_comm.h
ompi/mca/pml/dr/pml_dr_comm.h
ompi/mca/pml/dr/pml_dr_component.h
ompi/mca/pml/dr/pml_dr_endpoint.h
ompi/mca/pml/dr/pml_dr_recvfrag.h
ompi/mca/pml/example/pml_example.h
ompi/mca/pml/ob1/pml_ob1_comm.h
ompi/mca/pml/ob1/pml_ob1_component.h
ompi/mca/pml/ob1/pml_ob1_endpoint.h
ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
ompi/mca/pml/ob1/pml_ob1_recvfrag.h
ompi/mca/pml/v/pml_v_output.h
opal/include/opal/prefetch.h
opal/mca/timer/aix/timer_aix.h
opal/util/qsort.h
test/support/components.h
This commit was SVN r21855.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855
* Check for {{{dlfcn.h}}} in the self component's configure.m4 (also clean up the .m4 a bit.
* Adjust the priority of the BLCR component so that the self component has a higher priority (if the application went to the trouble of writing the routines, why not use them.) The 'self' component checks for the appropriate functions during query, so it know if it -can- be used during component selection.
* Adjust some copyrights that I missed before
* Fix a warning when casing the result of dlsym() into a function pointer. There is a bit of pointer magic to make this happen (thanks to the following website, and RedHat EL 4 man pages for illustrating it:
http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Passing to Jeff for a final review of the patch before moving to v1.3.
This commit was SVN r21768.
The following SVN revision numbers were found above:
r21766 --> open-mpi/ompi@91e52d062b
Due to the visibility patch to libltdl in r21731, this module can no longer access or use the libltdl interfaces directly. Instead just use the dlopen/dlsym/dlclose functions directly. This is a portability implication here, but for the moment it does not seem to bite us.
Also in this patch, cleanup some of the 'self' specific code paths.
* opal-restart need not special case the 'self' component since it can now interact with it as if it were a normal component.
* Cleanup the initialization of the cmd line arguments in opal-restart.
* Make sure to mark opal-restart as a 'tool', but do so by setting the global variable directly instead of setting the environment variable, which could be inherited by the application.
* Most of the functions in the 'self' component should not be used by a command line tool (exception being 'restart'), so make sure that if we accidently call them then errors are returned.
* Increase the priority of the 'none' component to be above that of 'self' when being selected in a command line tool. This allows for both mpirun and opal-restart to work correctly with the 'self' module.
This commit was SVN r21766.
The following SVN revision numbers were found above:
r21731 --> open-mpi/ompi@0278b86456
* Pass the sequence number of the checkpoint along with reference from the global to the local coordinator.
* 'orte-restart --apponly' now just generates the app context file, and does not run with it. This provides the user the ability to edit the file before launching.
* Add a OPAL_CRS_NONE state
* Split the INC into three distinct parts.
* Implement a restart mechanism for the 'none' component. If given a context it simply execvp()'s it.
This commit was SVN r21195.
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh after applying
patches, that include headers, being "lost" due to
inclusion in one of the now deleted headers...
In total 817 files are touched.
In ompi/mpi/c/ header files are moved up into the actual c-file,
where necessary (these are the only additional #include),
otherwise it is only deletions of #include (apart from the above
additions required due to notifier...)
- To get different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
In case we use memcmp, strlen, strup and friends include <string.h>
Also several constants.h are not included directly
- Let's have mca_topo_base_cart_create return ompi-errors in
ompi/mca/topo/base/topo_base_cart_create.c
This commit was SVN r20773.
* add "register" function to mca_base_component_t
* converted coll:basic and paffinity:linux and paffinity:solaris to
use this function
* we'll convert the rest over time (I'll file a ticket once all
this is committed)
* add 32 bytes of "reserved" space to the end of mca_base_component_t
and mca_base_component_data_2_0_0_t to make future upgrades
[slightly] easier
* new mca_base_component_t size: 196 bytes
* new mca_base_component_data_2_0_0_t size: 36 bytes
* MCA base version bumped to v2.0
* '''We now refuse to load components that are not MCA v2.0.x'''
* all MCA frameworks versions bumped to v2.0
* be a little more explicit about version numbers in the MCA base
* add big comment in mca.h about versioning philosophy
This commit was SVN r19073.
The following Trac tickets were found above:
Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
I promoted the ''none'' component to a full component, and updated the other components to reflect this code movement. The ''none'' component is the default component unless the user requests '''-am ft-enable-cr''' to auto-select a component. There is an MCA parameter to show a warning if the application requested an FT enabled job, but the ''none'' component was selected ({{{crs_none_select_warning}}}).
This temporarily fixes the problem mentioned in r18739. The full fix will entail working on ticket #1291.
Thanks to Ethan from Sun for finding this bug.
This commit was SVN r18840.
The following SVN revision numbers were found above:
r18739 --> open-mpi/ompi@a003fa7a50