14992 commits

Author SHA1 Message Date
Mike Dubman
104d57f69a * Support allgatherv, convert displs and rcounts arrays to bytes.
* change comm_init API - no need to pass local rank groups, fca calculates that on its own.
* remove local rank list from module - libfca maintains that now.
* in fca_bcast and fca_reduce - pass root rank index and let libfca figure out the local rank index.
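A minimal sketch of the conversion in the first bullet, assuming a backend that wants byte counts and byte offsets. The helper `counts_to_bytes` is hypothetical and is not the actual coll/fca code. It also assumes a contiguous datatype, because MPI displacements are measured in units of the receive type's extent, which equals its size only in that case.

```c
#include <stddef.h>

/* Hypothetical helper (not the coll/fca code): scale per-rank receive
 * counts and displacements, given in datatype elements, into bytes.
 * Assumes a contiguous datatype, so its size equals its extent. */
static void counts_to_bytes(const int *rcounts, const int *displs,
                            int nranks, size_t type_size,
                            size_t *rcounts_bytes, size_t *displs_bytes)
{
    for (int i = 0; i < nranks; ++i) {
        rcounts_bytes[i] = (size_t)rcounts[i] * type_size;
        displs_bytes[i] = (size_t)displs[i] * type_size;
    }
}
```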

This commit was SVN r23716.
2010-09-05 09:49:59 +00:00
Nadia Derbey
e265dc51e5 Added Bull vendor id for ConnectX card
This commit was SVN r23715.
2010-09-03 14:13:19 +00:00
Jeff Squyres
b9ac24eadd Based on
http://www.open-mpi.org/community/lists/devel/2010/09/8455.php, revert
this patch.  George, Brice, and Scott can decide what they want to do
here.

This commit was SVN r23714.
2010-09-03 13:48:36 +00:00
Abhishek Kulkarni
c3a653ebb3 Fix MPI segfaults during MPI_Init() with the MX BTL and MTL.
Thanks to Scott Atchley for the patch.

This commit was SVN r23713.
2010-09-03 12:38:14 +00:00
Jeff Squyres
2b2b29a6d4 For some reason, the MX btl sets btl_bandwidth in megabits/s instead
of megabytes/s. So we get crazy btl_weights in case of heterogeneous
multirail. And --mca btl_mx_bandwidth <width> cannot work around the
problem (it probably doesn't help because it's overridden by the
runtime link width detection anyway?).
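A small worked example of why the megabits/megabytes mix-up distorts multirail weighting. The numbers (a 10 Gbit/s MX link next to a second rail correctly reported as 1000 MB/s) are made up, and `rail_share` assumes, purely for illustration, that a rail's weight is its bandwidth divided by the total; that is not necessarily how btl_weight is actually computed.

```c
/* 8 bits per byte: a rate in megabits/s divided by 8 gives megabytes/s. */
static double mbits_to_mbytes(double mbits)
{
    return mbits / 8.0;
}

/* Illustrative assumption only: a rail's share of traffic is its
 * bandwidth divided by the combined bandwidth of both rails. */
static double rail_share(double bw, double other_bw)
{
    return bw / (bw + other_bw);
}
```

With these inputs the MX rail's share is about 0.91 when its rate is left in megabits/s (10000), versus about 0.56 after converting it to 1250 MB/s.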

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>

This commit was SVN r23712.
2010-09-03 12:03:06 +00:00
Abhishek Kulkarni
a143622b54 Remove unused code. Notifier events are aggregated on a per-event basis by the HNP notifier.
This commit was SVN r23711.
2010-09-02 16:00:22 +00:00
Ralph Castain
f75437f5a3 Add the ability to receive notifier output when job completes. Set the notification level to INFO for normal job completion, and to ALERT for abnormal termination.
This commit was SVN r23710.
2010-09-02 14:42:41 +00:00
Rolf vandeVaart
14e7bcc383 Create new entries in the wrapper data files so the
administrator can specify compiler flags that get
inserted into the command before the user's flags.
These flags can be specified at configure time.
Reviewed by Jeff Squyres.
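One way to picture the ordering the new wrapper entries provide, as a hypothetical sketch: `build_wrapper_argv` and its parameters are illustrative names, not the wrapper compiler's real code or data-file keys. The administrator's flags are placed after the compiler name and before anything the user typed.

```c
#include <stddef.h>

/* Hypothetical sketch: assemble a wrapper-compiler argv in the order
 * compiler, administrator-specified flags, then the user's arguments.
 * Returns the number of entries written (excluding the terminating NULL),
 * or 0 if `out` is too small. */
static size_t build_wrapper_argv(const char *compiler,
                                 const char *const *admin, size_t nadmin,
                                 const char *const *user, size_t nuser,
                                 const char **out, size_t outcap)
{
    size_t n = 0;
    if (outcap < 1 + nadmin + nuser + 1) {
        return 0;
    }
    out[n++] = compiler;
    for (size_t i = 0; i < nadmin; ++i) {
        out[n++] = admin[i];   /* administrator flags go first */
    }
    for (size_t i = 0; i < nuser; ++i) {
        out[n++] = user[i];    /* user arguments follow them */
    }
    out[n] = NULL;
    return n;
}
```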

This fixes ticket #2474.

This commit was SVN r23709.
2010-09-02 10:47:55 +00:00
Mike Dubman
48274c1c77 better control for enable/disable specific coll APIs
This commit was SVN r23708.
2010-09-02 09:22:24 +00:00
Rolf vandeVaart
47940f2aa0 Fix the fix (r23649) for ticket 2532. We were neglecting to
update the remain_len field for the buffer.

This really fixes ticket #2532.

This commit was SVN r23706.

The following SVN revision numbers were found above:
  r23649 --> open-mpi/ompi@f42c2a737f
2010-09-01 14:12:08 +00:00
Mike Dubman
8ef56bf258 * drop support for FCA v1.2
* add support for FCA ABI
* add support for allgather

This commit was SVN r23705.
2010-09-01 11:29:10 +00:00
Ethan Mallove
eae9b4c564 Improve on r23665. Ensure patches can be used on Solaris.
This commit was SVN r23701.

The following SVN revision numbers were found above:
  r23665 --> open-mpi/ompi@7acb18f3d4
2010-08-31 17:59:38 +00:00
Ralph Castain
b982f908e8 Fixed some newly-induced warnings
This commit was SVN r23694.
2010-08-31 14:51:19 +00:00
Rainer Keller
97511912ec - Fix up several functions that cannot return
 - Add one instance where we do not use a parameter in a function
 - Fix a buglet in commit r23689, where the attribute-for-function ptrs
   was applied.

This commit was SVN r23690.

The following SVN revision numbers were found above:
  r23689 --> open-mpi/ompi@5eb571c458
2010-08-31 12:21:13 +00:00
Rainer Keller
5eb571c458 - As suggested in CMR #2558, attribute macros should be
   tested on function pointers and assigned accordingly,
   instead of using the pre-processor in the header files.

   A functional change is (re-) specifying __opal_attribute_noreturn__
   on orte_errmgr_base_abort(): All modules in the errmgr framework
   either use this function, or define their own abort function,
   which sets __opal_attribute_noreturn__.
   This attribute was taken out with the errmgr overhaul in r22872.

This commit was SVN r23689.

The following SVN revision numbers were found above:
  r22872 --> open-mpi/ompi@e4f2d03d28
2010-08-31 10:28:51 +00:00
Brad Benton
09c4f4d95c Added copyright notices for the files modified in r23669.
This commit was SVN r23687.

The following SVN revision numbers were found above:
  r23669 --> open-mpi/ompi@271cfa8c9a
2010-08-30 17:46:47 +00:00
Ralph Castain
b81358815c Add some debug
This commit was SVN r23686.
2010-08-29 13:45:10 +00:00
Ralph Castain
554aede041 Fix a situation where we were unlocking a thread that isn't locked for the main launch - it is only used for dynamic spawns.
This commit was SVN r23682.
2010-08-28 14:03:17 +00:00
Jeff Squyres
c0685fc673 Fix problem noted by Sebastian Andrzej Siewior; we should not be using AS_VAR_GET. Per advice from Ralf, change them all to AS_VAR_IF and AS_VAR_COPY. CMR:v1.5. A separate patch has to be created for v1.4 because files have moved around.
This commit was SVN r23681.
2010-08-27 22:48:57 +00:00
Jeff Squyres
3eedbee7a4 Fixes trac:2541. Ensure that we keep CPPFLAGS if a non-standard valgrind location was specified. CMR:v1.4.3 CMR:v1.5
This commit was SVN r23680.

The following Trac tickets were found above:
  Ticket 2541 --> https://svn.open-mpi.org/trac/ompi/ticket/2541
2010-08-27 22:45:02 +00:00
Ethan Mallove
bdbc24a589 Get ltmain_pgi_tp.diff into nightly tarball
This commit was SVN r23678.
2010-08-27 14:06:51 +00:00
Jeff Squyres
8d114b78ac Add bullet about ABI-changing F90 parameter changes. Note that this
is 1.5 only; we probably shouldn't change it in 1.4.x.

This commit was SVN r23677.
2010-08-27 12:00:11 +00:00
Rainer Keller
4abcf5a0d7 - The Sun compiler 12 update 1 complains about noreturn attributes
   assigned to function declarations.
   Check for this case and mark the only such case currently in trunk.

   Thanks to Paul Hargrove for bringing this up.

   Let's test the svn commit msg CMR:v1.5

This commit was SVN r23676.
2010-08-27 09:18:30 +00:00
Jeff Squyres
ce91a8572d Twice the code for half the price! :-)
Somehow, there's an entire 2nd (identical) copy of the sm btl
configure.m4 in here -- this commit removes the duplicate copy,
leaving only 1 copy of each relevant m4 macro.

Thanks to Ralph for spotting it!

This commit was SVN r23675.
2010-08-27 01:24:55 +00:00
Rainer Keller
044b387d3c - If we don't compile with PGI, then mark the parameter as unused;
   otherwise gcc swamps us with warnings everywhere the header is
   included.
 - Remove redundant declaration of opal_datatype_safeguard_pointer_debug_breakpoint

   Check whether CMR:v1.5 works

This commit was SVN r23674.
2010-08-26 15:07:18 +00:00
Josh Hursey
00cf339820 Fix typo in README.
Thanks to Paul Hargrove for pointing this out.

Refs trac:2549, #2548

This commit was SVN r23673.

The following Trac tickets were found above:
  Ticket 2549 --> https://svn.open-mpi.org/trac/ompi/ticket/2549
2010-08-26 14:40:35 +00:00
Rolf vandeVaart
8862179380 Fix instructions for gcc,sparc,32-bit.
This fixes trac:2551.

This commit was SVN r23672.

The following Trac tickets were found above:
  Ticket 2551 --> https://svn.open-mpi.org/trac/ompi/ticket/2551
2010-08-26 14:18:14 +00:00
Rainer Keller
12ed573e5e - Include <strings.h> for rindex(3).
   Thanks to Paul Hargrove.

   Please CMR:v1.5

This commit was SVN r23671.
2010-08-26 13:42:36 +00:00
Jeff Squyres
60dacba04e This stuff has been outdated for years -- might as well remove it (it
isn't included in the tarball and was only used to generate the
initial f90 scripts -- we've moved well beyond this XML by updating
the scripts without also updating the corresponding XML).

This commit was SVN r23670.
2010-08-26 11:38:38 +00:00
Nysal Jan
271cfa8c9a Fix the opal_path_nfs test for GPFS. Reported by Paul H. Hargrove.
This commit was SVN r23669.
2010-08-26 10:10:16 +00:00
Shiqing Fan
9911797867 Rename a few odls help files for Windows installation.
This commit was SVN r23668.
2010-08-26 09:31:18 +00:00
Ralph Castain
2e223abe33 Restore the auto-poll method for detecting debugger attachment, but only in the mpirx debugger module and only if the corresponding rate mca param is set.
Guess we missed it before, but add the debugger framework to the orte-info and ompi_info tools

This commit was SVN r23667.
2010-08-25 22:52:33 +00:00
Jeff Squyres
97fb426325 Per a long-ago RFC, now that the odls default module reports errors nicely, remove all paffinity components except for hwloc and test.
This commit was SVN r23666.
2010-08-25 22:34:30 +00:00
Ethan Mallove
7acb18f3d4 Patch ltmain.sh in autogen.sh per this Libtool thread:
http://www.mail-archive.com/libtool@gnu.org/msg11249.html

This commit was SVN r23665.
2010-08-25 19:40:17 +00:00
Jeff Squyres
2c52096976 Several EXTRA_STATE parameter types were erroneously "INTEGER" (they
should be "INTEGER(kind=MPI_ADDRESS_KIND)").  This has been wrong for
''years''.  Apparently no one who uses the F90 bindings also uses MPI
attributes.  Sigh.

This commit was SVN r23664.
2010-08-25 16:46:36 +00:00
Ralph Castain
f72cdc4160 Update the compare_name_fields function to allow the caller to specify that wildcard values are to be treated as wildcards
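A reduced sketch of the wildcard option, using hypothetical stand-ins (`sketch_name_t`, `SKETCH_WILDCARD`, `names_match`) rather than ORTE's real name types, constants, or compare_name_fields signature. The real function also selects which fields to compare and reports an ordering; this version only tests equality, and when `treat_wild` is set it lets a wildcard on either side match any value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not ORTE's real process-name types or constants. */
#define SKETCH_WILDCARD UINT32_MAX

typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} sketch_name_t;

/* Compare one field; if treat_wild is set, a wildcard on either side matches. */
static bool field_matches(uint32_t a, uint32_t b, bool treat_wild)
{
    if (treat_wild && (a == SKETCH_WILDCARD || b == SKETCH_WILDCARD)) {
        return true;
    }
    return a == b;
}

/* Two names match when every compared field matches. */
static bool names_match(const sketch_name_t *x, const sketch_name_t *y,
                        bool treat_wild)
{
    return field_matches(x->jobid, y->jobid, treat_wild) &&
           field_matches(x->vpid, y->vpid, treat_wild);
}
```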
This commit was SVN r23663.
2010-08-25 15:35:41 +00:00
Ralph Castain
4ecd9a0bbe Protect against an obscure race condition that AFAICT only occurs when we are in a loop waiting to recv a message from a peer who is then killed by signal.
This commit was SVN r23662.
2010-08-25 15:35:01 +00:00
Shiqing Fan
7a1bdd2327 Get rid of a warning of "pointer of type ‘void *’ used in arithmetic" on Linux, which is also an error on Windows.
This commit was SVN r23660.
2010-08-25 08:26:11 +00:00
Ralph Castain
f1a00c9a21 Per Jeff's inquiry, play chicken and don't assume herror exists everywhere.
This commit was SVN r23656.
2010-08-24 20:46:41 +00:00
Jeff Squyres
207ca2d928 This commit is the first of several steps in a paffinity makeover
extravaganza.

= Short version =

This commit does several things, but the short version is that it
re-orients the error message creation of the ODLS default module to
generate error strings in the child process for errors that occur
after the fork but before the exec (such errors are ''usually''
related to paffinity).  A show_help string is rendered in the child
and then IPC'ed up to the parent, who displays the string through
normal ORTE show_help aggregation mechanisms.  We also broke up the
ginormous paffinity-setting logic into a few separate functions, both
to help us understand the code, and hopefully to ease future
maintenance.

The logic for the ODLS default binding should not have changed -- this
is mainly a code reshuffle and improvement on error reporting.

= Rationale =

The reasoning for this commit is complex.  As mentioned above, it's
the first step in some paffinity cleanup.  Here's the line of dominoes
that must fall (in this order):

 1. Add hwloc paffinity component (already done).
 1. While testing hwloc, we discovered that the error reporting from
    the ODLS default module was abysmal.  So we fixed it.
 1. Further, we reorganized the code in odls_default_module.c a bit
    to help our understanding of it.
 1. We also discovered a few bugs in the original ODLS default module
    logic that existed before this code shuffle; separate tickets
    will be filed to fix them.
 1. Next up will be some improvements to paffinity / odls default so
    that binding to a core binds to ''all'' hardware threads
    contained in that core (similarly for sockets: binding to a socket
    will bind to ''all'' hardware threads in that socket).
 1. Next will be improvements to paffinity to expose binding to
    hardware threads through the paffinity framework API.
 1. Finally, we'll expose these binding controls to the user (e.g.,
    through mpirun command line arguments, MCA parameters, etc.).

This commit represents the first few bullets; the last 4 bullets are
being worked on right now, but there is no definite timeline for
completion. 

= Miscellaneous =

A few points worth mentioning:

 * We have tested this new code a bunch; we're pretty sure it behaves
   just like the trunk -- but with better / more precise error
   reporting.  More testing is needed on a wider array of platforms,
   however. 
 * A big comment at the top of odls_default_module.c explains the
   (new) general scheme for the error reporting.
 * The error reporting in the parent process is now really dumb;
   almost all the intelligence about creating error messages is in the
   child.
 * The show_help file was renamed to be more consistent with other
   help files (help-odls-default.txt -> help-orte-odls-default.txt)
 * Removed the use of sched_yield() because of recent changes in the
   Linux 2.6.3x kernels.  We already had an #else clause for
   select()'ing for 1us if we didn't have sched_yield() -- that is now
   the only code path.  This is not a performance-critical section of
   the code, so this shouldn't be controversial.
 * Replaced the macro-based error reporting with function-based
   reporting.  It's a bit more bulky, but it helped us understand the
   code and saved us multiple times with compile-time parameter
   checking, etc.
 * Cleaned up the use of several show_help messages to ensure that
   they mapped to real messages in help*.txt files.
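The sched_yield() replacement mentioned in the list above amounts to a select() call with no descriptors and a 1us timeout. A minimal sketch follows; the function name `pause_1us` is hypothetical, and the surrounding ODLS loop is not reproduced.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Pause briefly by calling select() with no file descriptors and a
 * 1-microsecond timeout, rather than calling sched_yield().
 * Returns 0 when the timeout expires, -1 on error (e.g., EINTR). */
static int pause_1us(void)
{
    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 1;
    return select(0, NULL, NULL, NULL, &tv);
}
```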

This commit was SVN r23652.
2010-08-24 19:38:29 +00:00
Jeff Squyres
2c03554fe7 Add new function: orte_show_help_norender(). It is exactly the same
as orte_show_help(), but it takes a fully-rendered string instead of a
varargs list that must be rendered.  This function is useful in cases
where one entity renders the "show help" string and a different entity
sends the string via the normal orte "show help" mechanisms for
aggregation, etc.

Example usage: errors occur in the ODLS after forking but before
exec'ing.  In such cases, it makes sense for the child process to
render the "show help" string because it has all the details about the
error.  But the child process can't call orte_show_help() itself
because it is not an ORTE process -- it can't OOB send the message
to the HNP, etc.

After rendering the help string, the child sends the rendered string
to its parent via normal IPC (e.g., via a pipe) and the parent can
then invoke orte_show_help_norender() with the ready-to-go string.
The message is then displayed via the normal mechanisms (i.e., out via
the HNP, aggregated/coalesced, etc.).
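The child-renders, parent-displays flow described above can be sketched with plain POSIX calls. This is a simplified illustration, not the ODLS code: the function name `child_renders_parent_reads` and the message format are hypothetical, the child exits instead of calling exec, and the parent just returns the string where a real ORTE parent would pass it to orte_show_help_norender().

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that renders an error message and sends it to the parent
 * over a pipe. Returns the number of bytes read into buf (NUL-terminated),
 * or -1 on error. */
static ssize_t child_renders_parent_reads(const char *what, int err,
                                          char *buf, size_t buflen)
{
    int fds[2];
    if (buflen == 0 || pipe(fds) != 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: after fork, before exec. It has the error details,
         * so it renders the complete message itself. */
        char msg[256];
        close(fds[0]);
        int n = snprintf(msg, sizeof(msg), "Failed to %s (errno %d)", what, err);
        if (n > 0) {
            size_t len = (size_t)n < sizeof(msg) ? (size_t)n : sizeof(msg) - 1;
            ssize_t w = write(fds[1], msg, len);
            (void)w;
        }
        close(fds[1]);
        _exit(1); /* in this sketch the exec never happens */
    }

    /* Parent: read the ready-to-go string until the child closes the pipe. */
    close(fds[1]);
    size_t total = 0;
    ssize_t r;
    while (total < buflen - 1 &&
           (r = read(fds[0], buf + total, buflen - 1 - total)) > 0) {
        total += (size_t)r;
    }
    close(fds[0]);
    buf[total] = '\0';
    waitpid(pid, NULL, 0);
    /* A real ORTE parent would now hand buf to orte_show_help_norender(). */
    return (ssize_t)total;
}
```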

This commit was SVN r23651.
2010-08-24 19:12:57 +00:00
Jeff Squyres
a5ce58f098 Define that we return OPAL_ERR_TIMEOUT if the other end of the socket
closes in an opal_fd_read().
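A sketch of this convention, assuming a read-exactly-N-bytes loop: when read() returns 0 because the peer closed its end, the function reports the timeout code. The enum values are hypothetical stand-ins for OPAL_SUCCESS and OPAL_ERR_TIMEOUT (plus a generic I/O error), and `fd_read_all` is not opal_fd_read() itself.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical return codes standing in for OPAL_SUCCESS, OPAL_ERR_TIMEOUT,
 * and a generic I/O error; the real OPAL values are not reproduced here. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_TIMEOUT = -1, SKETCH_ERR_IO = -2 };

/* Read exactly len bytes from fd into buffer. If the other end closes
 * before len bytes arrive (read() returns 0), return the timeout code. */
static int fd_read_all(int fd, size_t len, void *buffer)
{
    char *b = buffer;
    while (len > 0) {
        ssize_t n = read(fd, b, len);
        if (n > 0) {
            b += n;
            len -= (size_t)n;
        } else if (n == 0) {
            return SKETCH_ERR_TIMEOUT;   /* peer closed the descriptor */
        } else if (errno != EINTR) {
            return SKETCH_ERR_IO;        /* genuine read error */
        }
        /* EINTR: retry the read */
    }
    return SKETCH_SUCCESS;
}
```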

This commit was SVN r23650.
2010-08-24 19:07:04 +00:00
Ethan Mallove
f42c2a737f Fixes trac:2532 - "MPI_Put can result in SIGBUS on SPARC"
Reviewed by Rolf V and Brian B

This commit was SVN r23649.

The following Trac tickets were found above:
  Ticket 2532 --> https://svn.open-mpi.org/trac/ompi/ticket/2532
2010-08-24 18:10:43 +00:00
Ralph Castain
3b3cd67d07 If we are using static ports and cannot resolve a hostname, then see if the proc is on the local host. If so, then attempt to use a loopback interface to complete the connection. Only implemented for IPv4 because the if.c code has been so hashed I couldn't figure out how to do this cleanly for all cases.
This commit was SVN r23647.
2010-08-24 14:14:59 +00:00
Samuel Gutierrez
3b572e14ce Fix build issues on Windows. Thanks to Shiqing for pointing this out.
This commit was SVN r23646.
2010-08-24 14:01:05 +00:00
Mike Dubman
fca50c4a09 comply with code style: no C++-style comments
This commit was SVN r23645.
2010-08-24 13:42:21 +00:00
Mike Dubman
9cb2e0490b removed #if 0
This commit was SVN r23643.
2010-08-24 13:32:28 +00:00
Shiqing Fan
a987eafc90 Add another sm definition for ignoring posix sm on Windows, and exclude those source files.
This commit was SVN r23640.
2010-08-24 09:28:56 +00:00
Ralph Castain
7608513158 Cleanup the code and add some comments to make it easier to understand. Add a bozo error check
This commit was SVN r23639.
2010-08-24 04:46:59 +00:00
Ralph Castain
2886da5669 Ensure that the local daemon vpid gets defined so that the locality procedures work when using the ess generic module.
This commit was SVN r23638.
2010-08-24 04:38:21 +00:00