Commit graph

14962 commits

Author SHA1 Message Date
Shiqing Fan
9911797867 Rename a few odls help files for Windows installation.
This commit was SVN r23668.
2010-08-26 09:31:18 +00:00
Ralph Castain
2e223abe33 Restore the auto-poll method for detecting debugger attachment, but only in the mpirx debugger module and only if the corresponding rate mca param is set.
Guess we missed it before, but add the debugger framework to the orte-info and ompi_info tools

This commit was SVN r23667.
2010-08-25 22:52:33 +00:00
Jeff Squyres
97fb426325 Per long-ago RFC, now that the ODLS default module reports errors nicely, remove all paffinity components except for hwloc and test.
This commit was SVN r23666.
2010-08-25 22:34:30 +00:00
Ethan Mallove
7acb18f3d4 Patch ltmain.sh in autogen.sh per this Libtool thread:
http://www.mail-archive.com/libtool@gnu.org/msg11249.html

This commit was SVN r23665.
2010-08-25 19:40:17 +00:00
Jeff Squyres
2c52096976 Several EXTRA_STATE parameter types were erroneously "INTEGER" (they
should be "INTEGER(kind=MPI_ADDRESS_KIND)").  This has been wrong for
''years''.  Apparently no one who uses the F90 bindings also uses MPI
attributes.  Sigh.

This commit was SVN r23664.
2010-08-25 16:46:36 +00:00
Ralph Castain
f72cdc4160 Update the compare_name_fields function to allow the caller to specify that wildcard values are to be treated as wildcards
This commit was SVN r23663.
2010-08-25 15:35:41 +00:00
Ralph Castain
4ecd9a0bbe Protect against an obscure race condition that AFAICT only occurs when we are in a loop waiting to recv a message from a peer who is then killed by signal.
This commit was SVN r23662.
2010-08-25 15:35:01 +00:00
Shiqing Fan
7a1bdd2327 Get rid of a warning of "pointer of type ‘void *’ used in arithmetic" on Linux, which is also an error on Windows.
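
The warning named above typically arises when an offset is added directly to a `void *`. GCC accepts that as an extension, while stricter compilers reject it. A minimal sketch of the usual portable fix follows; `advance_bytes` is a hypothetical helper for illustration and is not taken from the Open MPI source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the Open MPI tree): advance a generic
 * pointer by a byte offset.  Arithmetic on void * is a GNU extension;
 * casting to char * first is standard C, so compilers that reject the
 * extension accept this form. */
static void *advance_bytes(void *base, size_t offset)
{
    assert(base != NULL);
    return (char *) base + offset;
}
```

A caller would write `advance_bytes(buf, 3)` where the original code had `buf + 3` on a `void *`.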
This commit was SVN r23660.
2010-08-25 08:26:11 +00:00
Ralph Castain
f1a00c9a21 Per Jeff's inquiry, play chicken and don't assume herror exists everywhere.
This commit was SVN r23656.
2010-08-24 20:46:41 +00:00
Jeff Squyres
207ca2d928 This commit is the first of several steps in a paffinity makeover
extravaganza.

= Short version =

This commit does several things, but the short version is that it
re-orients the error message creation of the ODLS default module to
generate error strings in the child process for errors that occur
after the fork but before the exec (such errors are ''usually''
related to paffinity).  A show_help string is rendered in the child
and then IPC'ed up to the parent, who displays the string through
normal ORTE show_help aggregation mechanisms.  We also broke up the
ginormous paffinity-setting logic into a few separate functions, both
to help us understand the code, and hopefully to ease future
maintenance.

The logic for the ODLS default binding should not have changed -- this
is mainly a code reshuffle and improvement on error reporting.

= Rationale =

The reasoning for this commit is complex.  As mentioned above, it's
the first step in some paffinity cleanup.  Here's the line of dominoes
that must fall (in this order):

 1. Add hwloc paffinity component (already done).
 1. While testing hwloc, we discovered that the error reporting from
    the ODLS default module was abysmal.  So we fixed it.
 1. Further, we reorganized the code in odls_default_module.c a bit
    to help our understanding of it.
 1. We also discovered a few bugs in the original ODLS default module
    logic that existed before this code shuffle; separate tickets
    will be filed to fix them.
 1. Next up will be some improvements to paffinity / odls default to
    make the act of binding to a core ensure to bind to ''all''
    hardware threads contained in that core (similar for sockets:
    binding to a socket will bind to ''all'' hardware threads in that
    socket).
 1. Next will be improvements to paffinity to expose binding to
    hardware threads through the paffinity framework API.
 1. Finally, we'll expose these binding controls to the user (e.g.,
    through mpirun command line arguments, MCA parameters, etc.).

This commit represents the first few bullets; the last 4 bullets are
being worked on right now, but there is no definite timeline for
completion. 

= Miscellaneous =

A few points worth mentioning:

 * We have tested this new code a bunch; we're pretty sure it behaves
   just like the trunk -- but with better / more precise error
   reporting.  More testing is needed on a wider array of platforms,
   however. 
 * A big comment at the top of odls_default_module.c explains the
   (new) general scheme for the error reporting.
 * The error reporting in the parent process is now really dumb;
   almost all the intelligence about creating error messages is in the
   child.
 * The show_help file was renamed to be more consistent with other
   help files (help-odls-default.txt -> help-orte-odls-default.txt)
 * Removed the use of sched_yield() because of recent changes in the
   Linux 2.6.3x kernels.  We already had an #else clause for
   select()'ing for 1us if we didn't have sched_yield() -- that is now
   the only code path.  This is not a performance-critical section of
   the code, so this shouldn't be controversial.
 * Replaced the macro-based error reporting with function-based
   reporting.  It's a bit more bulky, but it helped us understand the
   code and saved us multiple times with compile-time parameter
   checking, etc.
 * Cleaned up the use of several show_help messages to ensure that
   they mapped to real messages in help*.txt files.
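
The bullet about sched_yield() mentions select()'ing for 1us as the remaining code path. A minimal sketch of that general idiom, assuming a POSIX system, is shown here; `brief_pause` is a hypothetical name and this is not the code from odls_default_module.c.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (illustration only): pause for roughly `usec`
 * microseconds by calling select() with no file descriptors and only a
 * timeout.  Returns 0 when the timeout expired, -1 on error (for
 * example, if a signal interrupted the call). */
static int brief_pause(long usec)
{
    struct timeval tv;

    assert(usec >= 0);
    tv.tv_sec = usec / 1000000;
    tv.tv_usec = usec % 1000000;
    return select(0, NULL, NULL, NULL, &tv);
}
```

Calling `brief_pause(1)` inside a polling loop gives other processes a chance to run without relying on sched_yield().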

This commit was SVN r23652.
2010-08-24 19:38:29 +00:00
Jeff Squyres
2c03554fe7 Add new function: orte_show_help_norender(). It is exactly the same
as orte_show_help(), but it takes a fully-rendered string instead of a
varargs list that must be rendered.  This function is useful in cases
where one entity renders the "show help" string and a different entity
sends the string via the normal orte "show help" mechanisms for
aggregation, etc.

Example usage: errors occur in the ODLS after forking but before
exec'ing.  In such cases, it makes sense for the child process to
render the "show help" string because it has all the details about the
error.  But the child process can't call orte_show_help() itself
because it is not an ORTE process -- it can't OOB send the message
to the HNP, etc.  

After rendering the help string, the child sends the rendered string
to its parent via normal IPC (e.g., via a pipe) and the parent can
then invoke orte_show_help_norender() with the ready-to-go string.
The message then displays out via the normal mechanisms (i.e., out via
the HNP, aggregated/coalesced, etc.).
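
The flow described above (the child renders the message, the parent receives it over a pipe) can be sketched as follows. This is only an illustration under assumptions: `collect_child_error` and the message text are hypothetical, and the sketch stops at the point where a real parent would pass the string to orte_show_help_norender(), whose exact signature is not shown in this log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): fork a child that "fails"
 * before exec, renders its own error text, and sends it up a pipe.
 * The parent copies the text into buf (NUL-terminated) and returns its
 * length, or -1 if the pipe or fork could not be set up. */
static ssize_t collect_child_error(char *buf, size_t buflen)
{
    int fds[2];
    pid_t pid;
    size_t used = 0;
    ssize_t n;

    assert(buf != NULL);
    if (buflen == 0 || pipe(fds) != 0) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: it has the details, so it renders the message itself. */
        char msg[128];
        int len = snprintf(msg, sizeof(msg),
                           "binding failed for rank %d: %s", 3, "no such core");
        close(fds[0]);
        if (len > 0 && write(fds[1], msg, (size_t) len) < 0) {
            _exit(2);
        }
        close(fds[1]);
        _exit(1);
    }

    /* Parent: read the already-rendered string until the child closes. */
    close(fds[1]);
    while (used < buflen - 1 &&
           (n = read(fds[0], buf + used, buflen - 1 - used)) > 0) {
        used += (size_t) n;
    }
    buf[used] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    /* A real parent would now hand buf to its reporting routine. */
    return (ssize_t) used;
}
```

The parent here needs no knowledge of what went wrong; it only relays the string, which matches the division of labor the commit message describes.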

This commit was SVN r23651.
2010-08-24 19:12:57 +00:00
Jeff Squyres
a5ce58f098 Define that we return OPAL_ERR_TIMEOUT if the other end of the socket
closes in an opal_fd_read().
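
As a rough sketch of the convention described above, a read-exactly loop can report a dedicated code when the peer closes before all bytes arrive. The names `read_exactly` and `SKETCH_ERR_TIMEOUT` are hypothetical stand-ins; this is not the opal_fd_read() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical return codes standing in for the real OPAL constants. */
enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ERR_TIMEOUT = -1,   /* peer closed before all bytes arrived */
    SKETCH_ERR_IN_ERRNO = -2   /* read() failed; details are in errno */
};

/* Hypothetical helper (illustration only): read exactly len bytes from
 * fd into buffer, retrying when a signal interrupts the call. */
static int read_exactly(int fd, size_t len, void *buffer)
{
    char *dst = buffer;

    assert(buffer != NULL);
    while (len > 0) {
        ssize_t n = read(fd, dst, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return SKETCH_ERR_IN_ERRNO;
        }
        if (n == 0) {
            /* End of file: the other end closed the descriptor. */
            return SKETCH_ERR_TIMEOUT;
        }
        dst += n;
        len -= (size_t) n;
    }
    return SKETCH_SUCCESS;
}
```

A caller can then tell a closed peer apart from a genuine read() failure by checking which code came back.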

This commit was SVN r23650.
2010-08-24 19:07:04 +00:00
Ethan Mallove
f42c2a737f Fixes trac:2532 - "MPI_Put can result in SIGBUS on SPARC"
Reviewed by Rolf V and Brian B

This commit was SVN r23649.

The following Trac tickets were found above:
  Ticket 2532 --> https://svn.open-mpi.org/trac/ompi/ticket/2532
2010-08-24 18:10:43 +00:00
Ralph Castain
3b3cd67d07 If we are using static ports and cannot resolve a hostname, then see if the proc is on the local host. If so, then attempt to use a loopback interface to complete the connection. Only implemented for IPv4 because the if.c code has been so hashed I couldn't figure out how to do this cleanly for all cases.
This commit was SVN r23647.
2010-08-24 14:14:59 +00:00
Samuel Gutierrez
3b572e14ce Fix build issues on Windows. Thanks to Shiqing for pointing this out.
This commit was SVN r23646.
2010-08-24 14:01:05 +00:00
Mike Dubman
fca50c4a09 comply with code style: no C++-style comments
This commit was SVN r23645.
2010-08-24 13:42:21 +00:00
Mike Dubman
9cb2e0490b removed #if 0
This commit was SVN r23643.
2010-08-24 13:32:28 +00:00
Shiqing Fan
a987eafc90 Add another sm definition for ignoring posix sm on Windows, and exclude those source files.
This commit was SVN r23640.
2010-08-24 09:28:56 +00:00
Ralph Castain
7608513158 Cleanup the code and add some comments to make it easier to understand. Add a bozo error check
This commit was SVN r23639.
2010-08-24 04:46:59 +00:00
Ralph Castain
2886da5669 Ensure that the local daemon vpid gets defined so that the locality procedures work when using the ess generic module.
This commit was SVN r23638.
2010-08-24 04:38:21 +00:00
Ralph Castain
51833bfe6c Not -everyone- wants to ignore loopback devices. Give us a choice.
This commit was SVN r23637.
2010-08-24 02:37:05 +00:00
Shiqing Fan
0aa02850bd Get rid of a debug output.
This commit was SVN r23634.
2010-08-23 16:10:47 +00:00
Samuel Gutierrez
3b162593e6 New POSIX shared memory component and other common sm enhancements.
NOTE: mmap is still the default.

Some highlights:
o Silent component failover.
o The sysv component will only be queried for selection if it is placed before
  the mmap component (for example, -mca mpi_common_sm sysv,posix,mmap).  In the
  default case, sysv will never be queried/selected.
o Per some on-list discussion, now unlinking mmaped file in both mmap and posix
  components (see: "System V Shared Memory for Open MPI: Request for Community
  Input and Testing" thread).
o  Assuming local process homogeneity with respect to all utilized shared
   memory facilities. That is, if one local process deems a particular shared
   memory facility acceptable, then ALL local processes should be able to
   utilize that facility. As it stands, this is an important point because one
   process dictates to all other local processes which common sm component will
   be selected based on its own, local run-time test.
o Addressed some of George's code reuse concerns.

This commit was SVN r23633.
2010-08-23 16:04:13 +00:00
Shiqing Fan
7a301bc417 Add support for ompi_ext on Windows.
This commit was SVN r23632.
2010-08-23 13:16:30 +00:00
Shiqing Fan
c110edbf44 Use exclude lists for non-ordinary sub directories check.
This commit was SVN r23631.
2010-08-23 09:43:05 +00:00
Rolf vandeVaart
07604d74b8 We were incorrectly handling some of the --with-wrapper-XXX
variables resulting in the same flag being installed
twice for the OMPI wrappers.  

This fixes trac:2539.

This commit was SVN r23630.

The following Trac tickets were found above:
  Ticket 2539 --> https://svn.open-mpi.org/trac/ompi/ticket/2539
2010-08-20 18:01:43 +00:00
Josh Hursey
4ffc2d6f68 fix a couple of missed prefixes
This commit was SVN r23629.
2010-08-19 13:26:33 +00:00
Josh Hursey
fabd5cc153 Simplification of the ErrMgr framework by removing the 'stack'/composite functionality.
The composite functionality was becoming difficult to maintain, so we removed it for now which simplifies the framework design considerably.

Since the 'crmig' and 'autor' components were -very- similar to the 'hnp' component, this commit also merges them together. By moving the 'crmig' and 'autor' to a separate file under the 'hnp' component we are able to isolate the C/R logic to a large extent, thus being only minimally hooked into the previous 'hnp' component.

So other than some name changes, the functionality is all still in place. I will update the C/R documentation later this morning.

This commit was SVN r23628.
2010-08-19 13:09:20 +00:00
Josh Hursey
77792c937d When we checkpoint with the --stop option, be sure to write out all the metadata before clearing the storage handle.
Here we tried to write out the session directory marker after we sync the directory, which happens early in the case of --stop.

Thanks to Ananda Mudar for noticing the bug.

This commit was SVN r23627.
2010-08-18 20:44:03 +00:00
Rolf vandeVaart
e71827b8ff Undo 4 of the 5 changes introduced by r22638. Leave
one of them in as it may still be needed on Solaris.

This fixes trac:2530.

This commit was SVN r23626.

The following SVN revision numbers were found above:
  r22638 --> open-mpi/ompi@2a4b1227d9

The following Trac tickets were found above:
  Ticket 2530 --> https://svn.open-mpi.org/trac/ompi/ticket/2530
2010-08-18 20:06:50 +00:00
Brian Barrett
94be1e043d Fix make distcheck issue with mpiextensions framework
This commit was SVN r23625.
2010-08-18 17:18:20 +00:00
Rainer Keller
33f2b9398e - This warning is no longer supported. Using it generates
   a warning itself (when another warning is generated within the file),
   which can be rather annoying.
   Therefore check for output regarding this unrecognized warning.

This commit was SVN r23624.
2010-08-18 06:01:23 +00:00
Rainer Keller
104afe39e4 - ompi_ext.m4: For VPATH builds, create the subdirectories first
- mpiext.h: For OMPI_DECLSPEC, include the ompi_config.h

This commit was SVN r23623.
2010-08-17 22:40:22 +00:00
Brian Barrett
13c827dda8 Make trunk compile on Red Storm again
This commit was SVN r23622.
2010-08-17 21:51:38 +00:00
Brian Barrett
6ae9790d19 * Add option of init/fini hooks for MPI extensions to be called at the end of
  MPI_INIT and start of MPI_FINALIZE.
* Clean up MPI Extensions build system to acknowledge that OMPI's the only
  project with extensions, as well as remove some build artifacts necessary
  for more general components.

This commit was SVN r23616.
2010-08-17 04:44:22 +00:00
Ralph Castain
bbf84fd92b Refine the protection from cross-dvm communications
This commit was SVN r23615.
2010-08-16 16:33:39 +00:00
Mike Dubman
a036c24253 revert fix to comply with #2534
- use op->o_name directly
- cosmetic prints

This commit was SVN r23614.
2010-08-15 11:04:34 +00:00
Ralph Castain
23904c2f3e Correct the extra_dist path to the .windows file
This commit was SVN r23613.
2010-08-14 01:21:58 +00:00
Ralph Castain
930f7adb0f Check the return status and report any error
This commit was SVN r23611.
2010-08-13 15:04:59 +00:00
Ralph Castain
4491a0e5dc Add a channel for reporting errors, fix a bug in the tcp module
This commit was SVN r23610.
2010-08-13 15:04:22 +00:00
Ralph Castain
ace1f60429 Rename an mca param to something more intuitive and set its default to 0 so the module only runs if a non-zero value is provided
This commit was SVN r23609.
2010-08-13 15:03:45 +00:00
Jeff Squyres
a2f349167e Update hwloc to 1.0.3a1r2398. This fixes a problem with
linking against libibverbs on Solaris.

Sorry for the mid-day configure change folks; I meant to commit this
last night and forgot.  :-(

This commit was SVN r23606.
2010-08-13 13:18:09 +00:00
Shiqing Fan
550f180014 Add a windows support file into the tarball.
This commit was SVN r23605.
2010-08-13 11:54:13 +00:00
Rainer Keller
14aad075eb - On Jaguar, we don't have pretty printed stackframe, aka no opal_stackframe_output*
This commit was SVN r23602.
2010-08-12 14:44:56 +00:00
Rainer Keller
fc4cb0c0c1 - Allow changing ALPS run command
- Fix misnomer

This commit was SVN r23601.
2010-08-12 14:41:35 +00:00
Jeff Squyres
e6f0422f7c r20280 introduced the op framework and changed all the back-end op
string names from MPI_<foo> to MPI_OP_<foo>.  While these names are
OMPI-internal-only (i.e., not exposed to MPI applications), this
change is a difference from the released 1.3/1.4 series.

The Voltaire FCA library uses these strings for its own internal
purposes; since the names changed between the 1.3/1.4 series and the
upcoming 1.5 series, it caused a problem for the FCA library.  They
volunteered to put in a hot fix in FCA, but it seems to me that we
shouldn't change the names to begin with -- there was no real reason
to change them to MPI_OP_<foo>.  So this commit changes them back to
MPI_<foo>. 

This commit was SVN r23600.

The following SVN revision numbers were found above:
  r20280 --> open-mpi/ompi@4d8a187450
2010-08-12 13:56:01 +00:00
Shiqing Fan
330999e36c Some fixes for C/R enhancement on Windows. Add the option and fix some type casts, just let it compile.
This commit was SVN r23599.
2010-08-12 13:31:37 +00:00
Mike Dubman
16d7169680 refactoring:
* split fca_open() into fca_register() and fca_open()

This commit was SVN r23598.
2010-08-12 12:05:23 +00:00
Mike Dubman
ba5bc9b674 fixes:
* fixup lookup of supported ops by name:
  in ompi 1.5.x the op string representations were changed from MPI_XXX to MPI_OP_XXX (relative to OMPI 1.4.x)
* keep compat between diff versions of FCA
* better error handling (return error if symbol not found)
* register to opal_progress and call fca_progress API

This commit was SVN r23597.
2010-08-12 08:15:55 +00:00
Ralph Castain
5715a5b421 Let VM-based mappings include the updated nidmap
This commit was SVN r23596.
2010-08-11 21:04:28 +00:00