
14935 commits

Author SHA1 Message Date
Josh Hursey
fabd5cc153 Simplification of the ErrMgr framework by removing the 'stack'/composite functionality.
The composite functionality was becoming difficult to maintain, so we removed it for now which simplifies the framework design considerably.

Since the 'crmig' and 'autor' components were -very- similar to the 'hnp' component, this commit also merges them together. By moving the 'crmig' and 'autor' to a separate file under the 'hnp' component we are able to isolate the C/R logic to a large extent, thus being only minimally hooked into the previous 'hnp' component.

So other than some name changes, the functionality is all still in place. I will update the C/R documentation later this morning.

This commit was SVN r23628.
2010-08-19 13:09:20 +00:00
Josh Hursey
77792c937d When we checkpoint with the --stop option, be sure to write out all the metadata before clearing the storage handle.
Here we tried to write out the session directory marker after we sync the directory, which happens early in the case of --stop.

Thanks to Ananda Mudar for noticing the bug.

This commit was SVN r23627.
2010-08-18 20:44:03 +00:00
Rolf vandeVaart
e71827b8ff Undo 4 of the 5 changes introduced by r22638. Leave
one of them in as it may still be needed on Solaris.

This fixes trac:2530.

This commit was SVN r23626.

The following SVN revision numbers were found above:
  r22638 --> open-mpi/ompi@2a4b1227d9

The following Trac tickets were found above:
  Ticket 2530 --> https://svn.open-mpi.org/trac/ompi/ticket/2530
2010-08-18 20:06:50 +00:00
Brian Barrett
94be1e043d Fix make distcheck issue with mpiextensions framework
This commit was SVN r23625.
2010-08-18 17:18:20 +00:00
Rainer Keller
33f2b9398e - This warning is no longer supported. Using it generates
   a warning itself (when another warning is generated within the file),
   which can be rather annoying.
   Therefore check for output regarding this unrecognized warning.

This commit was SVN r23624.
2010-08-18 06:01:23 +00:00
Rainer Keller
104afe39e4 - ompi_ext.m4: For VPATH builds, create the subdirectories first
- mpiext.h: For OMPI_DECLSPEC, include the ompi_config.h

This commit was SVN r23623.
2010-08-17 22:40:22 +00:00
Brian Barrett
13c827dda8 Make trunk compile on Red Storm again
This commit was SVN r23622.
2010-08-17 21:51:38 +00:00
Brian Barrett
6ae9790d19 * Add option of init/fini hooks for MPI extensions to be called at the end of
MPI_INIT and start of MPI_FINALIZE.
* Clean up MPI Extensions build system to acknowledge that OMPI's the only
  project with extensions, as well as remove some build artifacts necessary
  for more general components.

This commit was SVN r23616.
2010-08-17 04:44:22 +00:00
Ralph Castain
bbf84fd92b Refine the protection from cross-dvm communications
This commit was SVN r23615.
2010-08-16 16:33:39 +00:00
Mike Dubman
a036c24253 revert fix to comply with #2534
- use op->o_name directly
- cosmetic prints

This commit was SVN r23614.
2010-08-15 11:04:34 +00:00
Ralph Castain
23904c2f3e Correct the extra_dist path to the .windows file
This commit was SVN r23613.
2010-08-14 01:21:58 +00:00
Ralph Castain
930f7adb0f Check the return status and report any error
This commit was SVN r23611.
2010-08-13 15:04:59 +00:00
Ralph Castain
4491a0e5dc Add a channel for reporting errors, fix a bug in the tcp module
This commit was SVN r23610.
2010-08-13 15:04:22 +00:00
Ralph Castain
ace1f60429 Rename an mca param to something more intuitive and set its default to 0 so the module only runs if a non-zero value is provided
This commit was SVN r23609.
2010-08-13 15:03:45 +00:00
Jeff Squyres
a2f349167e Update hwloc to 1.0.3a1r2398. This fixes a problem with
linking against libibverbs on Solaris.

Sorry for the mid-day configure change folks; I meant to commit this
last night and forgot.  :-(

This commit was SVN r23606.
2010-08-13 13:18:09 +00:00
Shiqing Fan
550f180014 Add a windows support file into the tarball.
This commit was SVN r23605.
2010-08-13 11:54:13 +00:00
Rainer Keller
14aad075eb - On Jaguar, we don't have pretty printed stackframe, aka no opal_stackframe_output*
This commit was SVN r23602.
2010-08-12 14:44:56 +00:00
Rainer Keller
fc4cb0c0c1 - Allow changing ALPS run command
- Fix misnomer

This commit was SVN r23601.
2010-08-12 14:41:35 +00:00
Jeff Squyres
e6f0422f7c r20280 introduced the op framework and changed all the back-end op
string names from MPI_<foo> to MPI_OP_<foo>.  While these names are
OMPI-internal-only (i.e., not exposed to MPI applications), this
change is a difference from the released 1.3/1.4 series.

The Voltaire FCA library uses these strings for its own internal
purposes; since the names changed between the 1.3/1.4 series and the
upcoming 1.5 series, it caused a problem for the FCA library.  They
volunteered to put in a hot fix in FCA, but it seems to me that we
shouldn't change the names to begin with -- there was no real reason
to change them to MPI_OP_<foo>.  So this commit changes them back to
MPI_<foo>. 

This commit was SVN r23600.

The following SVN revision numbers were found above:
  r20280 --> open-mpi/ompi@4d8a187450
2010-08-12 13:56:01 +00:00
Shiqing Fan
330999e36c Some fixes for C/R enhancement on Windows. Add the option and fix some type casts, just let it compile.
This commit was SVN r23599.
2010-08-12 13:31:37 +00:00
Mike Dubman
16d7169680 refactoring:
* split fca_open() into fca_register() and fca_open()

This commit was SVN r23598.
2010-08-12 12:05:23 +00:00
Mike Dubman
ba5bc9b674 fixes:
* fixup lookup of supported ops by name:
        in ompi 1.5.x the op string representations were changed from MPI_XXX to MPI_OP_XXX (relative to OMPI 1.4.x)
		* keep compat between diff versions of FCA
		* better error handling (return error if symbol not found)
		* register to opal_progress and call fca_progress API

This commit was SVN r23597.
2010-08-12 08:15:55 +00:00
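
The name-compatibility problem described in the op-name commits above (MPI_<foo> in the 1.3/1.4 series versus MPI_OP_<foo> on the trunk) can be illustrated with a small C sketch. This is not the FCA component's code: `op_base_name` and `op_names_match` are hypothetical helpers, shown only to make the idea of a prefix-tolerant lookup concrete.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strip the "MPI_OP_" (1.5-style) or "MPI_"
 * (1.3/1.4-style) prefix so both spellings reduce to the same base name.
 * Returns NULL if the name carries neither prefix. */
static const char *op_base_name(const char *name)
{
    static const char new_prefix[] = "MPI_OP_";
    static const char old_prefix[] = "MPI_";

    if (name == NULL) {
        return NULL;
    }
    /* Test the longer prefix first: "MPI_OP_" itself starts with "MPI_". */
    if (strncmp(name, new_prefix, sizeof(new_prefix) - 1) == 0) {
        return name + (sizeof(new_prefix) - 1);
    }
    if (strncmp(name, old_prefix, sizeof(old_prefix) - 1) == 0) {
        return name + (sizeof(old_prefix) - 1);
    }
    return NULL;
}

/* Hypothetical helper: returns 1 if both names denote the same op
 * under either naming scheme, 0 otherwise. */
static int op_names_match(const char *a, const char *b)
{
    const char *base_a = op_base_name(a);
    const char *base_b = op_base_name(b);

    return base_a != NULL && base_b != NULL && strcmp(base_a, base_b) == 0;
}
```

With this, `op_names_match("MPI_SUM", "MPI_OP_SUM")` is true, so a library keyed on one spelling could still find ops registered under the other. Reverting the names, as r23600 does, removes the need for such a shim.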
Ralph Castain
5715a5b421 Let VM-based mappings include the updated nidmap
This commit was SVN r23596.
2010-08-11 21:04:28 +00:00
Ralph Castain
18f7b919d1 Update platform files to no-build new components and frameworks
This commit was SVN r23595.
2010-08-11 21:04:02 +00:00
Josh Hursey
e50f6fb71a update svn:ignore
This commit was SVN r23591.
2010-08-11 00:44:53 +00:00
Rolf vandeVaart
5e59de9ce6 Update comment to explain why macro should only
be used for 64-bit SPARC because of performance
implications.  Also added minor optimization to
macro.

This fixes trac:2526. 

This commit was SVN r23590.

The following Trac tickets were found above:
  Ticket 2526 --> https://svn.open-mpi.org/trac/ompi/ticket/2526
2010-08-10 21:13:27 +00:00
Rainer Keller
7c85144ac6 - Hmm, these mca parameters indeed are registered twice ,-]
Thanks, Jeff!

   This should be added to CMR:v1.5:#2527

This commit was SVN r23589.
2010-08-10 21:11:59 +00:00
Rainer Keller
dd63c2a922 - svn propedit svn:ignore
This commit was SVN r23588.
2010-08-10 20:53:04 +00:00
Josh Hursey
e12ca48cd9 A number of C/R enhancements per RFC below:
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php

Documentation:
  http://osl.iu.edu/research/ft/

Major Changes: 
-------------- 
 * Added C/R-enabled Debugging support. 
   Enabled with the --enable-crdebug flag. See the following website for more information: 
   http://osl.iu.edu/research/ft/crdebug/ 
 * Added Stable Storage (SStore) framework for checkpoint storage 
   * 'central' component does a direct to central storage save 
   * 'stage' component stages checkpoints to central storage while the application continues execution. 
     * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress) 
     * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching) 
 * Added Compression (compress) framework to support 
 * Add two new ErrMgr recovery policies 
   * {{{crmig}}} C/R Process Migration 
   * {{{autor}}} C/R Automatic Recovery 
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component 
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option) 
   * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342) 
   * {{{OMPI_CR_Restart}}} 
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules) 
   * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192) 
   * {{{OMPI_CR_Quiesce_start}}} 
   * {{{OMPI_CR_Quiesce_checkpoint}}} 
   * {{{OMPI_CR_Quiesce_end}}} 
   * {{{OMPI_CR_self_register_checkpoint_callback}}} 
   * {{{OMPI_CR_self_register_restart_callback}}} 
   * {{{OMPI_CR_self_register_continue_callback}}} 
 * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future. 
 * Add a progress meter to: 
   * FileM rsh (filem_rsh_process_meter) 
   * SnapC full (snapc_full_progress_meter) 
   * SStore stage (sstore_stage_progress_meter) 
 * Added 2 new command line options to ompi-restart 
   * --showme : Display the full command line that would have been exec'ed. 
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413) 
 * Deprecated some MCA params: 
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir 
   * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir 
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared 
   * snapc_base_store_in_place deprecated, replaced with different components of SStore 
   * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref 
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported 
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem 

Minor Changes: 
-------------- 
 * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing. 
 * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components 
 * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it. 
 * Fixes trac:2208 : Honor various TMPDIR variables instead of forcing {{{/tmp}}} 
 * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set. 
 * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality. 
 * Cleanup the CRS framework and components to work with the SStore framework. 
 * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably). 
 * Add 'quiesce' hook to CRCP for a future enhancement. 
 * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}. 
 * Add optional application level INC callbacks (registered through the CR MPI Ext interface). 
 * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive. 
 * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked. 
 * {{{opal-restart}}} also supports local decompression before restarting 
 * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata 
 * {{{orte-restart}}} now uses the SStore framework to work with the metadata 
 * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality. 
 * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}. 
 * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped. 
 * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away. 
 * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options. 
 * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities. 
 * Improve the checks for 'already checkpointing' error path. 
 * Add a recovery output timer, to show how long it takes to restart a job 
 * Do a better job of cleaning up the old session directory on restart. 
 * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment) 
 * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize. 

This commit was SVN r23587.

The following Trac tickets were found above:
  Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
  Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
  Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
  Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
  Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
  Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
  Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
2010-08-10 20:51:11 +00:00
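
The deprecated MCA parameters listed in the commit above can be summarized as a lookup table. The C sketch below is illustrative only: the parameter names are copied from the commit message, while the `deprecated_param` struct and the `lookup_deprecated` function are hypothetical and not part of Open MPI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: an old parameter name and its replacement.
 * new_name is NULL where the commit names no single replacement. */
struct deprecated_param {
    const char *old_name;
    const char *new_name;
};

static const struct deprecated_param deprecated_params[] = {
    { "crs_base_snapshot_dir",                    "sstore_stage_local_snapshot_dir" },
    { "snapc_base_global_snapshot_dir",           "sstore_base_global_snapshot_dir" },
    { "snapc_base_global_shared",                 "sstore_stage_global_is_shared" },
    { "snapc_base_store_in_place",                NULL },  /* replaced by SStore components */
    { "snapc_base_global_snapshot_ref",           "sstore_base_global_snapshot_ref" },
    { "snapc_base_establish_global_snapshot_dir", NULL },  /* never well supported */
    { "snapc_full_skip_filem",                    "sstore_stage_skip_filem" },
};

/* Returns 1 if name is in the deprecated list, storing its replacement
 * (possibly NULL) in *replacement; returns 0 if it is not deprecated. */
static int lookup_deprecated(const char *name, const char **replacement)
{
    size_t count = sizeof(deprecated_params) / sizeof(deprecated_params[0]);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, deprecated_params[i].old_name) == 0) {
            if (replacement != NULL) {
                *replacement = deprecated_params[i].new_name;
            }
            return 1;
        }
    }
    return 0;
}
```

A migration script or warning message could use such a table to tell users which `sstore_*` parameter to set instead of an old `crs_*` or `snapc_*` one.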
Rainer Keller
9fff01704f - Add on to r23580: we do check for F90's DOUBLE COMPLEX, but do not do
so for F77. The DDT-engine is taken care of, it maps to C's dblcplx
   accordingly.

   Manually added to CMR:

This commit was SVN r23586.

The following SVN revision numbers were found above:
  r23580 --> open-mpi/ompi@16bf3c2f30
2010-08-10 20:33:50 +00:00
Terry Dontje
b74ef351b7 Added new solaris sysinfo module. Also added code to assign
orte_local_chip_type and orte_local_chip_model in MPI processes if the
appropriate sysinfo module found the values on the machine.

This commit was SVN r23581.
2010-08-09 19:28:56 +00:00
Jeff Squyres
16bf3c2f30 Fix an issue with ompi_info reporting the wrong sizes/alignments for
some Fortran types.  Thanks to Gus Correa and others for helping
identify this issue.

This commit was SVN r23580.
2010-08-09 16:56:32 +00:00
Rainer Keller
c2d1002e50 - The directory $(MPT_DIR)/lib/snos64 containing libalpslli
does not exist anymore on JaguarPF... Fun.

This commit was SVN r23579.
2010-08-09 16:15:25 +00:00
Rainer Keller
2ee01042c9 - Spelling fixes and line breaks in the parameter descriptions.
Please cmr:v1.5

This commit was SVN r23578.
2010-08-09 16:10:31 +00:00
Rainer Keller
28a5043c93 - svn propedit svn:ignore
Makefile.in and friends

This commit was SVN r23577.
2010-08-09 16:02:32 +00:00
Nysal Jan
b6524f6a92 Fix the conditional branch, jump to the correct location. Reported by Matthew Clark
This commit was SVN r23576.
2010-08-09 10:07:58 +00:00
Matthias Jurenz
8f940bd53b Fixed typo
This commit was SVN r23572.
2010-08-09 08:52:05 +00:00
Mike Dubman
7d1a8a154d fca:
- keep compat to fca v1.2 and fca 2.0
- fix segv
- keep compat to ompi 1.4.x

This commit was SVN r23569.
2010-08-08 13:28:41 +00:00
Matthias Jurenz
e0844f3a40 - enforced creation of event/summary files even if a process/thread doesn't produce trace data
(reworked r23550)
- append "vampirtrace" to ${datarootdir} and ${includedir} even if the options '--includedir' and '--datarootdir' are specified
  (this is meaningful for the creation of the Open MPI distribution packages)
- disable OpenMP support in otfprofile if the PGI compiler is used to work around the following errors:

	compiler version  compiler error
	< 9.0-3           PGCC-S-0000-Internal compiler error. calc_dw_tag:no tag
	(see Technical Problem Report 4337 at http://www.pgroup.com/support/release_tprs_90.htm)

	10.1 - 10.6       this kind of pragma may not be used here
	                        #pragma omp barrier

This commit was SVN r23564.

The following SVN revision numbers were found above:
  r23550 --> open-mpi/ompi@3ef374478f
2010-08-06 12:47:40 +00:00
Eugene Loh
5e1f40ba12 Clean up English and clarify a point regarding process binding
in the mpirun man page.

This commit was SVN r23558.
2010-08-05 23:10:44 +00:00
Ralph Castain
9c69175117 If debug is enabled, provide an mca param and supporting logic to output when OPAL_ACQUIRE_THREAD is waiting and has obtained the thread, and when OPAL_RELEASE_THREAD releases it.
This commit was SVN r23557.
2010-08-05 16:25:32 +00:00
Shiqing Fan
b8db8d0ef8 Need to change another variable name.
This commit was SVN r23556.
2010-08-05 12:38:28 +00:00
Eugene Loh
1c12192c93 Fix grammatical typo in orterun man page.
This commit was SVN r23555.
2010-08-04 20:37:28 +00:00
Rolf vandeVaart
0324fdb407 Created two new macros that are used when filling in either the
status structure or the _ucount field in the status structure.
On 64-bit sparc, the macros resolve into integer array assignments.
For all others, they are just simple assignments.  This fixes 
possible BUS errors seen when running on the SPARC processor.
This bug was introduced when the _count field changed from an int
into a size_t.  See the changes to request.h for additional details.

This commit fixes trac:2514.

This commit was SVN r23554.

The following Trac tickets were found above:
  Ticket 2514 --> https://svn.open-mpi.org/trac/ompi/ticket/2514
2010-08-04 19:36:40 +00:00
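
The technique described in the commit above, storing a size_t count as integer array assignments, can be sketched in C as follows. This assumes the BUS errors came from writing a 64-bit value to an address that is only guaranteed to be int-aligned. The names `set_count_split` and `get_count_split` are hypothetical; the actual macros are in request.h and are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store a size_t count into a buffer that is only
 * aligned for int, one int-sized word at a time. Each store needs only
 * int alignment, so no single 64-bit store hits a misaligned address. */
static void set_count_split(int *dst, size_t count)
{
    int words[sizeof(size_t) / sizeof(int)];

    /* View the count as int-sized words without an aliasing cast. */
    memcpy(words, &count, sizeof(count));
    for (size_t i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
        dst[i] = words[i];
    }
}

/* Hypothetical helper: read the count back from the int-aligned buffer. */
static size_t get_count_split(const int *src)
{
    size_t count;

    memcpy(&count, src, sizeof(count));
    return count;
}
```

On platforms that tolerate misaligned access, a plain assignment is enough, which matches the commit's note that the macros are simple assignments everywhere except 64-bit SPARC.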
Shiqing Fan
33719634da Use a different variable for option definitions, otherwise CMake gets confused somehow.
This commit was SVN r23553.
2010-08-04 19:11:27 +00:00
Mike Dubman
2914d11793 fix datatype API
This commit was SVN r23552.
2010-08-04 14:01:54 +00:00
Shiqing Fan
6893021f7c Get rid of the week string in the date format, it might contain different/unusual characters based on windows language setting.
This commit was SVN r23551.
2010-08-04 09:13:20 +00:00
Matthias Jurenz
3ef374478f Only write active process/thread ids to the OTF master control file (*.otf).
Vampir 7.2 is unable to load trace files in which processes/threads are defined that didn't produce event records.

This commit was SVN r23550.
2010-08-04 07:12:39 +00:00
Ralph Castain
9fbd7c1949 Fix a bug in tcp multicast
This commit was SVN r23547.
2010-08-04 01:37:54 +00:00
Ralph Castain
586f5b8bf5 Add missing includes per Greg Koenig
This commit was SVN r23546.
2010-08-03 17:30:59 +00:00