Commit graph

14918 Commits

Author SHA1 Message Date
Rainer Keller
fc4cb0c0c1 - Allow changing ALPS run command
- Fix misnomer

This commit was SVN r23601.
2010-08-12 14:41:35 +00:00
Jeff Squyres
e6f0422f7c r20280 introduced the op framework and changed all the back-end op
string names from MPI_<foo> to MPI_OP_<foo>.  While these names are
OMPI-internal-only (i.e., not exposed to MPI applications), this
change is a difference from the released 1.3/1.4 series.

The Voltaire FCA library uses these strings for its own internal
purposes; since the names changed between the 1.3/1.4 series and the
upcoming 1.5 series, it caused a problem for the FCA library.  They
volunteered to put in a hot fix in FCA, but it seems to me that we
shouldn't change the names to begin with -- there was no real reason
to change them to MPI_OP_<foo>.  So this commit changes them back to
MPI_<foo>. 

This commit was SVN r23600.

The following SVN revision numbers were found above:
  r20280 --> open-mpi/ompi@4d8a187450
2010-08-12 13:56:01 +00:00
Shiqing Fan
330999e36c Some fixes for C/R enhancement on Windows. Add the option and fix some type casts, just let it compile.
This commit was SVN r23599.
2010-08-12 13:31:37 +00:00
Mike Dubman
16d7169680 refactoring:
* split fca_open() into fca_register() and fca_open()

This commit was SVN r23598.
2010-08-12 12:05:23 +00:00
Mike Dubman
ba5bc9b674 fixes:
* fix up lookup of supported ops by name:
  in ompi 1.5.x the op string representation was changed from MPI_XXX to MPI_OP_XXX (relative to OMPI 1.4.x)
* keep compat between diff versions of FCA
* better error handling (return error if symbol not found)
* register to opal_progress and call fca_progress API

This commit was SVN r23597.
2010-08-12 08:15:55 +00:00
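
Note on ba5bc9b674: one way to keep a by-name op lookup working across both spellings is to normalize MPI_OP_<foo> back to MPI_<foo> before searching. The C sketch below is illustrative only. The table contents, the ids, and the function names normalize_op_name and lookup_op_id are hypothetical and are not taken from the FCA component.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of supported reduction ops; ids are illustrative. */
typedef struct {
    const char *name;   /* 1.3/1.4-style name, e.g. "MPI_SUM" */
    int id;
} op_entry_t;

static const op_entry_t op_table[] = {
    { "MPI_MAX",  1 },
    { "MPI_MIN",  2 },
    { "MPI_SUM",  3 },
    { "MPI_PROD", 4 },
};

/* Map "MPI_OP_<foo>" to "MPI_<foo>"; any other name is returned unchanged.
 * The rewritten name is stored in buf. Returns NULL if buf is too small. */
const char *normalize_op_name(const char *name, char *buf, size_t buflen)
{
    static const char new_prefix[] = "MPI_OP_";
    const size_t plen = sizeof(new_prefix) - 1;
    const char *suffix;

    if (strncmp(name, new_prefix, plen) != 0) {
        return name;
    }
    suffix = name + plen;
    if (strlen(suffix) + 5 > buflen) {   /* "MPI_" + suffix + NUL */
        return NULL;
    }
    memcpy(buf, "MPI_", 4);
    strcpy(buf + 4, suffix);
    return buf;
}

/* Return the op id for either spelling, or -1 if the op is not supported. */
int lookup_op_id(const char *name)
{
    char buf[64];
    const char *canon = normalize_op_name(name, buf, sizeof(buf));
    size_t i;

    if (canon == NULL) {
        return -1;
    }
    for (i = 0; i < sizeof(op_table) / sizeof(op_table[0]); i++) {
        if (strcmp(op_table[i].name, canon) == 0) {
            return op_table[i].id;
        }
    }
    return -1;   /* report an error instead of silently continuing */
}
```

With this table, lookup_op_id("MPI_SUM") and lookup_op_id("MPI_OP_SUM") resolve to the same entry, and an unknown name yields -1.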
Ralph Castain
5715a5b421 Let VM-based mappings include the updated nidmap
This commit was SVN r23596.
2010-08-11 21:04:28 +00:00
Ralph Castain
18f7b919d1 Update platform files to no-build new components and frameworks
This commit was SVN r23595.
2010-08-11 21:04:02 +00:00
Josh Hursey
e50f6fb71a update svn:ignore
This commit was SVN r23591.
2010-08-11 00:44:53 +00:00
Rolf vandeVaart
5e59de9ce6 Update comment to explain why macro should only
be used for 64-bit SPARC because of performance
implications.  Also added minor optimization to
macro.

This fixes trac:2526. 

This commit was SVN r23590.

The following Trac tickets were found above:
  Ticket 2526 --> https://svn.open-mpi.org/trac/ompi/ticket/2526
2010-08-10 21:13:27 +00:00
Rainer Keller
7c85144ac6 - Hmm, these mca parameters indeed are registered twice ,-]
Thanks, Jeff!

   This should be added to CMR:v1.5:#2527

This commit was SVN r23589.
2010-08-10 21:11:59 +00:00
Rainer Keller
dd63c2a922 - svn propedit svn:ignore
This commit was SVN r23588.
2010-08-10 20:53:04 +00:00
Josh Hursey
e12ca48cd9 A number of C/R enhancements per RFC below:
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php

Documentation:
  http://osl.iu.edu/research/ft/

Major Changes: 
-------------- 
 * Added C/R-enabled Debugging support. 
   Enabled with the --enable-crdebug flag. See the following website for more information: 
   http://osl.iu.edu/research/ft/crdebug/ 
 * Added Stable Storage (SStore) framework for checkpoint storage 
   * 'central' component does a direct to central storage save 
   * 'stage' component stages checkpoints to central storage while the application continues execution. 
     * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress) 
     * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching) 
 * Added Compression (compress) framework to support 
 * Add two new ErrMgr recovery policies 
   * {{{crmig}}} C/R Process Migration 
   * {{{autor}}} C/R Automatic Recovery 
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component 
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option) 
   * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342) 
   * {{{OMPI_CR_Restart}}} 
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules) 
   * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192) 
   * {{{OMPI_CR_Quiesce_start}}} 
   * {{{OMPI_CR_Quiesce_checkpoint}}} 
   * {{{OMPI_CR_Quiesce_end}}} 
   * {{{OMPI_CR_self_register_checkpoint_callback}}} 
   * {{{OMPI_CR_self_register_restart_callback}}} 
   * {{{OMPI_CR_self_register_continue_callback}}} 
 * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future. 
 * Add a progress meter to: 
   * FileM rsh (filem_rsh_process_meter) 
   * SnapC full (snapc_full_progress_meter) 
   * SStore stage (sstore_stage_progress_meter) 
 * Added 2 new command line options to ompi-restart 
   * --showme : Display the full command line that would have been exec'ed. 
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413) 
 * Deprecated some MCA params: 
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir 
   * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir 
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared 
   * snapc_base_store_in_place deprecated, replaced with different components of SStore 
   * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref 
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported 
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem 

Minor Changes: 
-------------- 
 * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing. 
 * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components 
 * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it. 
 * Fixes trac:2208 : Honor various TMPDIR variables instead of forcing {{{/tmp}}} 
 * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set. 
 * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality. 
 * Cleanup the CRS framework and components to work with the SStore framework. 
 * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably). 
 * Add 'quiesce' hook to CRCP for a future enhancement. 
 * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}. 
 * Add optional application level INC callbacks (registered through the CR MPI Ext interface). 
 * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive. 
 * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked. 
 * {{{opal-restart}}} also supports local decompression before restarting 
 * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata 
 * {{{orte-restart}}} now uses the SStore framework to work with the metadata 
 * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality. 
 * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}. 
 * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped. 
 * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away. 
 * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options. 
 * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities. 
 * Improve the checks for 'already checkpointing' error path. 
 * Add a recovery output timer, to show how long it takes to restart a job 
 * Do a better job of cleaning up the old session directory on restart. 
 * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment) 
 * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize. 

This commit was SVN r23587.

The following Trac tickets were found above:
  Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
  Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
  Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
  Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
  Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
  Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
  Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
2010-08-10 20:51:11 +00:00
Rainer Keller
9fff01704f - Add on to r23580: we do check for F90's DOUBLE COMPLEX, but do not do
so for F77. The DDT-engine is taken care of, it maps to C's dblcplx
   accordingly.

   Manually added to CMR:

This commit was SVN r23586.

The following SVN revision numbers were found above:
  r23580 --> open-mpi/ompi@16bf3c2f30
2010-08-10 20:33:50 +00:00
Terry Dontje
b74ef351b7 Added new solaris sysinfo module. Also added code to assign
orte_local_chip_type and orte_local_chip_model in MPI processes if the
appropriate sysinfo module found the values on the machine.

This commit was SVN r23581.
2010-08-09 19:28:56 +00:00
Jeff Squyres
16bf3c2f30 Fix an issue with ompi_info reporting the wrong sizes/alignments for
some Fortran types.  Thanks to Gus Correa and others for helping
identify this issue.

This commit was SVN r23580.
2010-08-09 16:56:32 +00:00
Rainer Keller
c2d1002e50 - The directory $(MPT_DIR)/lib/snos64 containing libalpslli
does not exist anymore on JaguarPF... Fun.

This commit was SVN r23579.
2010-08-09 16:15:25 +00:00
Rainer Keller
2ee01042c9 - Spelling fixes and line breaks in the parameter descriptions.
Please cmr:v1.5

This commit was SVN r23578.
2010-08-09 16:10:31 +00:00
Rainer Keller
28a5043c93 - svn propedit svn:ignore
Makefile.in and friends

This commit was SVN r23577.
2010-08-09 16:02:32 +00:00
Nysal Jan
b6524f6a92 Fix the conditional branch, jump to the correct location. Reported by Matthew Clark
This commit was SVN r23576.
2010-08-09 10:07:58 +00:00
Matthias Jurenz
8f940bd53b Fixed typo
This commit was SVN r23572.
2010-08-09 08:52:05 +00:00
Mike Dubman
7d1a8a154d fca:
- keep compat to fca v1.2 and fca 2.0
- fix segv
- keep compat to ompi 1.4.x

This commit was SVN r23569.
2010-08-08 13:28:41 +00:00
Matthias Jurenz
e0844f3a40 - enforced creation of event/summary files even if a process/thread doesn't produce trace data
(reworked r23550)
- append "vampirtrace" to ${datarootdir} and ${includedir} even if the options '--includedir' and '--datarootdir' are specified
  (this is meaningful for the creation of the Open MPI distribution packages)
- disable OpenMP support in otfprofile if the PGI compiler is used to work around the following errors:

	compiler version  compiler error
	< 9.0-3           PGCC-S-0000-Internal compiler error. calc_dw_tag:no tag
	(see Technical Problem Report 4337 at http://www.pgroup.com/support/release_tprs_90.htm)

	10.1 - 10.6       this kind of pragma may not be used here
	                        #pragma omp barrier

This commit was SVN r23564.

The following SVN revision numbers were found above:
  r23550 --> open-mpi/ompi@3ef374478f
2010-08-06 12:47:40 +00:00
Eugene Loh
5e1f40ba12 Clean up English and clarify a point regarding process binding
in the mpirun man page.

This commit was SVN r23558.
2010-08-05 23:10:44 +00:00
Ralph Castain
9c69175117 If debug is enabled, provide an mca param and supporting logic to output when OPAL_ACQUIRE_THREAD is waiting and has obtained the thread, and when OPAL_RELEASE_THREAD releases it.
This commit was SVN r23557.
2010-08-05 16:25:32 +00:00
Shiqing Fan
b8db8d0ef8 Need to change another variable name.
This commit was SVN r23556.
2010-08-05 12:38:28 +00:00
Eugene Loh
1c12192c93 Fix grammatical typo in orterun man page.
This commit was SVN r23555.
2010-08-04 20:37:28 +00:00
Rolf vandeVaart
0324fdb407 Created two new macros that are used when filling in either the
status structure or the _ucount field in the status structure.
On 64-bit sparc, the macros resolve into integer array assignments.
For all others, they are just simple assignments.  This fixes 
possible BUS errors seen when running on the SPARC processor.
This bug was introduced when the _count field changed from an int
into a size_t.  See the changes to request.h for additional details.

This commit fixes trac:2514.

This commit was SVN r23554.

The following Trac tickets were found above:
  Ticket 2514 --> https://svn.open-mpi.org/trac/ompi/ticket/2514
2010-08-04 19:36:40 +00:00
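
Note on 0324fdb407: the bus errors described above presumably come from storing an 8-byte size_t through an address that is only 4-byte aligned, which SPARC does not tolerate. The C sketch below shows the general idea of an alignment-safe store. It is not the code from request.h: it uses memcpy rather than the integer array assignments the commit describes, and the function names set_count_unaligned and get_count_unaligned are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Store a size_t at an address that may not be aligned for size_t.
 * Taking void * avoids promising the compiler any particular alignment,
 * so it cannot legally emit a single aligned 8-byte store. */
void set_count_unaligned(void *dst, size_t count)
{
    memcpy(dst, &count, sizeof(count));
}

/* Read a size_t back from a possibly misaligned address. */
size_t get_count_unaligned(const void *src)
{
    size_t count;
    memcpy(&count, src, sizeof(count));
    return count;
}
```

On platforms that tolerate misaligned access, a plain assignment is enough, which matches the commit's choice to use the special path only on 64-bit SPARC.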
Shiqing Fan
33719634da Use different variable for option definitions, otherwise CMake gets confused somehow.
This commit was SVN r23553.
2010-08-04 19:11:27 +00:00
Mike Dubman
2914d11793 fix datatype API
This commit was SVN r23552.
2010-08-04 14:01:54 +00:00
Shiqing Fan
6893021f7c Get rid of the weekday string in the date format; it might contain different/unusual characters depending on the Windows language setting.
This commit was SVN r23551.
2010-08-04 09:13:20 +00:00
Matthias Jurenz
3ef374478f Only write active process/thread ids to the OTF master control file (*.otf).
Vampir 7.2 is unable to load trace files that define processes/threads which didn't produce event records.

This commit was SVN r23550.
2010-08-04 07:12:39 +00:00
Ralph Castain
9fbd7c1949 Fix a bug in tcp multicast
This commit was SVN r23547.
2010-08-04 01:37:54 +00:00
Ralph Castain
586f5b8bf5 Add missing includes per Greg Koenig
This commit was SVN r23546.
2010-08-03 17:30:59 +00:00
Ralph Castain
94f046e7f3 Fix an "oops" where we missed some instances of a name change. Thanks to Greg Koenig for spotting it!
This commit was SVN r23545.
2010-08-03 16:36:08 +00:00
Shiqing Fan
714883d472 A better way to make this work with VS 2010.
This commit was SVN r23544.
2010-08-03 09:06:50 +00:00
Shiqing Fan
a096cc9082 it's not a component for Windows, so get rid of the Windows support files.
This commit was SVN r23543.
2010-08-02 17:12:40 +00:00
Mike Dubman
59be1b1c15 updated with fca component info
This commit was SVN r23541.
2010-08-02 12:21:29 +00:00
Shiqing Fan
e822f465b5 Remove a bunch of warnings due to the new POSIX supplement in VS 2010.
This commit was SVN r23540.
2010-08-02 12:16:29 +00:00
Mike Dubman
7cbe9b43c2 initial release of Voltaire FCA (fabric collective accelerator) collective component
- compatible with FCA v1.2

This commit was SVN r23539.
2010-08-02 11:25:53 +00:00
Shiqing Fan
3ef2be67b9 Add search paths for VS 2010.
This commit was SVN r23538.
2010-08-02 10:09:23 +00:00
Ralph Castain
8ccd14d508 Correct the -h output to reflect that --bind-to-none is the default behavior
This commit was SVN r23537.
2010-08-01 22:31:34 +00:00
Josh Hursey
01fbb729e3 Need to get the nidmap so that the apps have something to look at for wireup.
Without this we get errors like the following since the nidmap is empty in the apps:
[[13094,1],1] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 234
[[13094,1],1] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 281

This commit was SVN r23536.
2010-07-31 16:25:13 +00:00
Josh Hursey
ba7e94dd89 Some relatively minor C/R related cleanup
 * Fix a configure warning for checking --enable-ft-thread
 * In hnp and orted ErrMgr components check to see if other components have already recovered this process before trying to recover it again.
 * Fix 'npernode' for restarting using the resilient rmaps component
 * export ompi_info_set, so that internal functionality can use it.

This commit was SVN r23535.
2010-07-30 18:59:34 +00:00
Shiqing Fan
ea7bf2bd9e Correctly check the data type alignment for VS 2010 environment, and set the event include paths to global level, in order to make the clever VS load them.
This commit was SVN r23534.
2010-07-30 14:25:15 +00:00
Ralph Castain
d8ec83f939 Remove an unneeded tag
This commit was SVN r23533.
2010-07-29 02:13:06 +00:00
Ralph Castain
0ed98967ed Update the thread protection in the ring_buffer class
This commit was SVN r23532.
2010-07-29 02:12:44 +00:00
Rolf vandeVaart
3d9b05ba2b Fix bug introduced by r23463. We now handle positive
error codes correctly again.  Also fix a typo.
Reviewed by Jeff Squyres. 

This commit was SVN r23531.

The following SVN revision numbers were found above:
  r23463 --> open-mpi/ompi@2af3e6e5ae
2010-07-28 19:19:27 +00:00
Jeff Squyres
9ca4a4a154 Make it safe to call this script inside of an Open MPI tarball (not
just an SVN or HG checkout).

This commit was SVN r23528.
2010-07-28 17:22:04 +00:00
Jeff Squyres
c59743d7e3 Move the predefined gap test to ompi/debuggers (we already have the
dlopen_test there, so why not put the other debugger test there with
it?).

This commit was SVN r23527.
2010-07-28 16:22:10 +00:00
Jeff Squyres
49b8008986 Remove the peruse test from any possibility of being run during "make
check" (it's been deactivated for 2+ years now, anyway).  It needs to
be launched via "mpirun" and needs >= 2 processes, so it wasn't a good
candidate for "make check", anyway.

The test itself has moved to OMPI's internal testing suites.

This commit was SVN r23526.
2010-07-28 16:04:18 +00:00