2836 Commits

Author SHA1 Message Date
Terry Dontje
64ace9ec12 convert bzero calls to memset to remove warnings.
This commit was SVN r20471.
2009-02-06 19:08:22 +00:00
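The bzero-to-memset conversion described in 64ace9ec12 might look roughly like the sketch below. The struct and function names are hypothetical and only illustrate the pattern; the log does not show the actual call sites.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure, used only to illustrate the pattern. */
struct conn_state {
    int fd;
    char name[16];
};

static void reset_conn_state(struct conn_state *s)
{
    /* was: bzero(s, sizeof(*s)); */
    memset(s, 0, sizeof(*s));   /* standard C, declared in <string.h> */
}
```

memset() is part of standard C, whereas bzero() is a legacy BSD call that some toolchains warn about or do not declare by default.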
Jeff Squyres
dfb2d92b37 s/ID/id/ - both work, but if I don't make this change, I'll wonder if
we remembered to use strcasecmp() every time I see this entry in the
file... (we did, but I just don't want to have to keep remembering
that ;-) )

This commit was SVN r20461.
2009-02-06 01:02:25 +00:00
Jeff Squyres
656d8578d0 * Rename (new) MCA parameter to
btl_openib_connect_rdmacm_reject_causes_connect_error (yes, it's
   still long -- on purpose :-) )
 * Add INI file parameter rdmacm_reject_causes_connect_error
 * Now only treat CONNECT_ERROR events as a REJECT if:
   * It's on a connection where we were expecting a REJECT, ''and''
   * The MCA parameter is true ''or'' the INI parameter for this
     device is true
 * Set the INI parameter to true for the NE020

This commit was SVN r20459.
2009-02-06 00:51:04 +00:00
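The rule stated in 656d8578d0 reduces to a single boolean condition. A minimal sketch, assuming three already-resolved flags (the function and argument names are hypothetical, not the ones used in the openib BTL):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a CONNECT_ERROR event should be handled as a REJECT.
 * All names are hypothetical.
 *   expecting_reject : a REJECT was expected on this connection
 *   mca_param        : value of the MCA parameter
 *   ini_param        : value of the INI parameter for this device
 */
static bool treat_connect_error_as_reject(bool expecting_reject,
                                          bool mca_param,
                                          bool ini_param)
{
    return expecting_reject && (mca_param || ini_param);
}
```

With this shape, a CONNECT_ERROR on a connection where no REJECT was expected is still reported as a real error, regardless of how the two parameters are set.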
Jeff Squyres
50b1fd1392 Per the big discussion on the OpenFabrics list a while ago, some
versions of the NE driver will report the OUI while others will report
the PCI ID.  We'll put in the Intel values when we get them (may not
be for a few more weeks).

This commit was SVN r20457.
2009-02-05 21:19:45 +00:00
Jeff Squyres
66d0a02f90 Fix a problem for some iWARP drivers that don't handle RDMA CM REJECT
properly at all.  NetEffect's current driver (OFED 1.4.0) will return
a CONNECT_ERROR event to the initiator rather than the REJECTED event.
Doh!  Additionally -- unfortunately -- NetEffect's vendor_id and
vendor_part_id are reported as 0 in OFED 1.4.0, so we can't
automatically detect these cards and work around the problem.  So all
we can do is add a new MCA parameter
(btl_openib_connect_rdmacm_ignore_connect_errors -- yes, it's long on
purpose ;-) ) that says that if we get a CONNECT_ERROR, basically
treat it exactly as a REJECT for the WRONG_DIRECTION reason (which is
a "good" reject).  This allows OMPI to function with NetEffect/Intel
cards on OFED 1.4.0.

Note that NetEffect has been bought by Intel; I'm waiting for
information from them to update the ini file for their new OUI/PCI
ID's and/or new vendor_part_id values.

This commit was SVN r20454.
2009-02-05 18:45:59 +00:00
Jeff Squyres
08c35ca135 Somehow this mca param registration code got duplicated; remove one of
them

This commit was SVN r20452.
2009-02-05 16:52:30 +00:00
Ralph Castain
b100513022 Add a few new MPI_Info options to the dpm - documentation to follow.
Fix a mistake in the dpm that hardcoded the update of routes to the HNP. This needs to be done by the individual routing modules so they can take whatever action is required - which will usually include updating the HNP, but might not...and might include additional steps. New routing modules are coming that violated this assumption, so it had to be moved back into init_routes.

All current routed modules know what to do - anyone with routed modules not in the current trunk may need to adjust them (see any of the current routed modules for examples of what to do).

This commit was SVN r20427.
2009-02-04 22:30:23 +00:00
Jeff Squyres
2cafa5d640 Re-add missing assignment of component variable from MCA param that
somehow must have gotten deleted along the way...

This commit was SVN r20386.
2009-01-30 11:36:14 +00:00
Jeff Squyres
35c5e28a8e Up to SVN r20383
This commit was SVN r20384.

The following SVN revision numbers were found above:
  r20383 --> open-mpi/ompi@e0638c84c8
2009-01-29 17:59:04 +00:00
Rainer Keller
fb0e0b854a - Again, no need for #include "orte/util/show_help.h"
- Use BEGIN_C_DECLS and END_C_DECLS

This commit was SVN r20358.
2009-01-27 19:19:04 +00:00
Rainer Keller
9825e087b8 - In rb/rcache_rb.c, the reg->flags should only be operated under the
lock -- therefore move the OPAL_THREAD_UNLOCK after
   the if-OMPI_ERR_TEMP_OUT_OF_RESOURCE block.

 - As mca_rcache_rb_mru_delete is the only setter of rc, move the
   error-check right after mca_rcache_rb_mru_delete.

 - Removed a few nitty ompi/info/info.h and orte/util/show_help.h

This commit was SVN r20355.
2009-01-27 19:00:03 +00:00
Rainer Keller
de4c123ca2 - No dependency on orte/util/show_help.h, so get rid of #include
This commit was SVN r20354.
2009-01-27 16:30:21 +00:00
Jeff Squyres
ca0f7d77e9 Fix a help message regarding the btl_openib_receive_queues MCA
parameter.

This commit was SVN r20350.
2009-01-26 18:57:07 +00:00
Jeff Squyres
1573aaceb7 Add missing header file.
This commit was SVN r20290.
2009-01-17 12:21:42 +00:00
Jeff Squyres
84a3f84fdf Possible fix for random openib segv.
This commit was SVN r20282.
2009-01-15 17:10:18 +00:00
Jeff Squyres
8483c3c66e It is not an error if there are no op components found; we'll just
fall back to the base functions.

This commit was SVN r20281.
2009-01-15 02:01:32 +00:00
Jeff Squyres
4d8a187450 Two major things in this commit:
 * New "op" MPI layer framework
 * Addition of the MPI_REDUCE_LOCAL proposed function (for MPI-2.2)

= Op framework =

Add new "op" framework in the ompi layer.  This framework replaces the
hard-coded MPI_Op back-end functions for (MPI_Op, MPI_Datatype) tuples
for pre-defined MPI_Ops, allowing components and modules to provide
the back-end functions.  The intent is that components can be written
to take advantage of hardware acceleration (GPU, FPGA, specialized CPU
instructions, etc.).  Similar to other frameworks, components are
intended to be able to discover at run-time if they can be used, and
if so, elect themselves to be selected (or disqualify themselves from
selection if they cannot run).  If specialized hardware is not
available, there is a default set of functions that will automatically
be used.

This framework is ''not'' used for user-defined MPI_Ops.

The new op framework is similar to the existing coll framework, in
that the final set of function pointers that are used on any given
intrinsic MPI_Op can be a mixed bag of function pointers, potentially
coming from multiple different op modules.  This allows for hardware
that only supports some of the operations, not all of them (e.g., a
GPU that only supports single-precision operations).

All the hard-coded back-end MPI_Op functions for (MPI_Op,
MPI_Datatype) tuples still exist, but unlike coll, they're in the
framework base (vs. being in a separate "basic" component) and are
automatically used if no component is found at runtime that provides a
module with the necessary function pointers.

There is an "example" op component that will hopefully be useful to
those writing meaningful op components.  It is currently
.ompi_ignore'd so that it doesn't impinge on other developers (it's
somewhat chatty in terms of opal_output() so that you can tell when
its functions have been invoked).  See the README file in the example
op component directory.  Developers of new op components are
encouraged to look at the following wiki pages:

  https://svn.open-mpi.org/trac/ompi/wiki/devel/Autogen
  https://svn.open-mpi.org/trac/ompi/wiki/devel/CreateComponent
  https://svn.open-mpi.org/trac/ompi/wiki/devel/CreateFramework

= MPI_REDUCE_LOCAL =

Part of the MPI-2.2 proposal listed here:

    https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/24

is to add a new function named MPI_REDUCE_LOCAL.  It is very easy to
implement, so I added it (also because it makes testing the op
framework pretty easy -- you can do it in serial rather than via
parallel reductions).  There's even a man page!

This commit was SVN r20280.
2009-01-14 23:44:31 +00:00
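The "mixed bag of function pointers" idea from 4d8a187450 can be sketched as a per-(op, datatype) table in which every slot a module leaves empty falls back to the base function. All type and function names below are hypothetical; this is not Open MPI's actual op framework API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for a tiny subset of ops and datatypes. */
enum op_id   { OP_SUM, OP_MAX, OP_ID_COUNT };
enum type_id { TYPE_INT, TYPE_FLOAT, TYPE_ID_COUNT };

/* Combine count elements: inout[i] = in[i] (op) inout[i]. */
typedef void (*op_fn_t)(const void *in, void *inout, int count);

typedef struct {
    op_fn_t fns[OP_ID_COUNT][TYPE_ID_COUNT];
} op_table_t;

static void base_sum_int(const void *in, void *inout, int count)
{
    const int *a = in; int *b = inout;
    for (int i = 0; i < count; ++i) b[i] += a[i];
}

static void base_sum_float(const void *in, void *inout, int count)
{
    const float *a = in; float *b = inout;
    for (int i = 0; i < count; ++i) b[i] += a[i];
}

static void base_max_int(const void *in, void *inout, int count)
{
    const int *a = in; int *b = inout;
    for (int i = 0; i < count; ++i) if (a[i] > b[i]) b[i] = a[i];
}

static void base_max_float(const void *in, void *inout, int count)
{
    const float *a = in; float *b = inout;
    for (int i = 0; i < count; ++i) if (a[i] > b[i]) b[i] = a[i];
}

/* Base functions: always available, used when no module fills a slot. */
static const op_table_t base_table = {
    .fns = {
        [OP_SUM] = { [TYPE_INT] = base_sum_int, [TYPE_FLOAT] = base_sum_float },
        [OP_MAX] = { [TYPE_INT] = base_max_int, [TYPE_FLOAT] = base_max_float },
    },
};

/* Stand-in for an accelerated module that supports only float sums. */
static int accel_calls = 0;

static void accel_sum_float(const void *in, void *inout, int count)
{
    ++accel_calls;                       /* shows the module was invoked */
    base_sum_float(in, inout, count);
}

static const op_table_t accel_module = {
    .fns = { [OP_SUM] = { [TYPE_FLOAT] = accel_sum_float } },
};

/* Fill each slot from the module if it provides one, else from the base. */
static void op_table_select(op_table_t *out, const op_table_t *module)
{
    for (int o = 0; o < OP_ID_COUNT; ++o) {
        for (int t = 0; t < TYPE_ID_COUNT; ++t) {
            op_fn_t fn = (NULL != module) ? module->fns[o][t] : NULL;
            out->fns[o][t] = (NULL != fn) ? fn : base_table.fns[o][t];
        }
    }
}
```

Calling a selected slot directly on two local buffers, for example selected.fns[OP_SUM][TYPE_INT](in, inout, n), exercises a back-end function serially, which is the kind of testing the commit message attributes to MPI_REDUCE_LOCAL.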
Brian Barrett
cfc400eb57 * Enable eager sending for Accumulate
* If the accumulate is local, make it short-circuit the request path.  Accumulate requires local
  ops due to its window rules, so this is likely to help a bunch (on the codes I'm messing
  with at least)
* Do a better job at flushing everything that can go out on the wire in a resource-constrained problem
* Move some debugging values around to make large problems somewhat easier to deal with

This commit was SVN r20277.
2009-01-14 20:15:15 +00:00
Edgar Gabriel
1072812bcf Not every element in the pointer array list contains a valid entry. Thus, do not try to free elements if the list returns NULL.
This commit was SVN r20275.
2009-01-14 19:11:30 +00:00
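The guard described in 1072812bcf amounts to skipping empty slots before releasing an entry. A minimal sketch, assuming a plain C array of pointers rather than Open MPI's pointer array class (entry_t and release_all are hypothetical names):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry type; releasing it dereferences the pointer. */
typedef struct {
    char *name;
} entry_t;

/* Release every populated slot; return how many entries were released. */
static int release_all(entry_t **slots, int size)
{
    int released = 0;
    for (int i = 0; i < size; ++i) {
        entry_t *e = slots[i];
        if (NULL == e) {
            continue;            /* sparse array: this slot holds no entry */
        }
        free(e->name);           /* would crash if e were NULL */
        free(e);
        slots[i] = NULL;
        ++released;
    }
    return released;
}
```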
George Bosilca
01adc999c5 Correctly forward the right module if we call another collective function. Kudos to
Edgar for figuring out this tricky bug.

This commit was SVN r20267.
2009-01-14 03:22:54 +00:00
Jeff Squyres
d1c6f3f89a * Fix a truckload of Cisco copyrights to be the same as the rest of
the code base.
 * Fix a few misspellings in other copyrights.

This commit was SVN r20241.
2009-01-11 02:30:00 +00:00
George Bosilca
9da6fba64b Update the SCTP BTL regarding ticket #1725.
This commit was SVN r20231.
2009-01-08 16:38:35 +00:00
Rolf vandeVaart
e78add702a Increase the default maximum size that the sm btl file is allowed
to grow to. Without this change, jobs with np>120 get errors.   
This does not change anything for np<16 jobs.  It only comes into 
play with larger np count on a node. I imagine that this can be 
scaled back in the future if the usage of memory in the sm
btl is improved.

This fixes trac:1449.

This commit was SVN r20230.

The following Trac tickets were found above:
  Ticket 1449 --> https://svn.open-mpi.org/trac/ompi/ticket/1449
2009-01-08 14:39:00 +00:00
Donald Kerr
e57435a5d4 udapl btl fix for #1725; replace WAIT with GET
This commit was SVN r20227.
2009-01-08 13:41:36 +00:00
Pavel Shamis
391b101439 Renaming pending_frags to no_credits_pending_frags.
(this commit is part of bug fix for ticket #1693)

This commit was SVN r20217.
2009-01-07 14:41:20 +00:00
Pavel Shamis
2f7b66160b Adding real fix for ticket #1693 - XRC + coalescing segfault.
This commit was SVN r20214.
2009-01-07 14:10:58 +00:00
George Bosilca
8e4107353f Update the last instance of bml_base_send to correctly cope with the
return values from the BTL. This is related to ticket 1734.

This commit was SVN r20210.
2009-01-06 19:44:48 +00:00
George Bosilca
f2b9b3fa0b One less warning about an unmatched printf type.
This commit was SVN r20199.
2009-01-05 15:02:00 +00:00
George Bosilca
760e744294 Use a clearer name for the proc in the constructor and destructor functions.
Make sure the lock is created and destroyed as expected.

This commit was SVN r20197.
2009-01-05 14:14:38 +00:00
Jeff Squyres
11b375f8b5 CIDs 1080-1090: assert() checks were not sufficient to check for
NEGATIVE_RETURNS from _reg_int() because those are not always
checked.  So replace them with real if() checks.

This commit was SVN r20195.
2009-01-03 15:56:25 +00:00
Jeff Squyres
679e2855b7 Fix CID 1135: the assignment to item was never used (it was
overwritten in the next loop iteration "item = next_item").

This commit was SVN r20189.
2009-01-03 15:15:42 +00:00
Jeff Squyres
62385a6c39 Add a comment explaining why there is an empty Makefile.am in this
tree.

This commit was SVN r20184.
2009-01-03 02:07:01 +00:00
Jeff Squyres
78b282d0d6 Cosmetic changes; replace tabs with spaces.
This commit was SVN r20183.
2009-01-03 01:45:52 +00:00
Brian Barrett
e1f40c6a71 Fixes to make the rdma osc component work again:
  * Don't overwrite the des_flags field, removing the
    all-important always callback field
  * Fix up return status of bml_base_send, since
    the rest of the code expects OMPI_SUCCESS or
    an error code

This commit was SVN r20178.
2009-01-01 23:48:29 +00:00
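One way to avoid clobbering a flags field, as the first item of e1f40c6a71 describes, is to OR new bits in rather than assign. This is only an illustration of the pattern: the flag names and values below are hypothetical, and the log does not say how the actual fix was written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real names and values are not in the log. */
#define DES_FLAG_PRIORITY        0x1u
#define DES_FLAG_ALWAYS_CALLBACK 0x2u

typedef struct {
    uint32_t des_flags;
} descriptor_t;

static void mark_priority(descriptor_t *des)
{
    /* buggy pattern: des->des_flags = DES_FLAG_PRIORITY;
     * that assignment would drop DES_FLAG_ALWAYS_CALLBACK */
    des->des_flags |= DES_FLAG_PRIORITY;
}
```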
Jeff Squyres
f13ea32830 Remove the code checking the MCA "coll" parameter for a list of coll
components to use.  This code was rendered obsolete (albeit harmless)
by the MCA base improvements that only open the components that were
specified by each framework's MCA parameter.

This commit was SVN r20176.
2008-12-31 13:40:51 +00:00
Jeff Squyres
759a295cc9 Gaah -- missed one s/m/component/g
This commit was SVN r20175.
2008-12-31 13:35:37 +00:00
Jeff Squyres
955d1e132d Rename a variable to be "component" (not "m"), to emphasize that it is
the component struct, not a module.

This commit was SVN r20174.
2008-12-31 13:32:46 +00:00
Jeff Squyres
865900dd27 Nothing of substance; just indenting changes (''finally'' update this
framework base to 4 space tabs!).

This commit was SVN r20173.
2008-12-31 12:17:08 +00:00
Jeff Squyres
ce313fa391 Minor fixes to a few comments
This commit was SVN r20172.
2008-12-31 11:34:27 +00:00
Jeff Squyres
d533215dac Fix a comment to reflect the right version number
This commit was SVN r20169.
2008-12-30 12:39:32 +00:00
Donald Kerr
213daa58da support for solaris relaxed ordering
This commit was SVN r20167.
2008-12-24 15:05:12 +00:00
George Bosilca
4d5fbc5955 Remove unused lock from the ompi_proc_t. This reduces the size of the ompi_proc_t
by 64 bytes.
Remove the useless pml_proc from the PML layer.

This commit was SVN r20157.
2008-12-19 19:56:27 +00:00
George Bosilca
7fc48ae11e Update the template BTL to fulfill the requirements for #1713.
This commit was SVN r20153.
2008-12-17 22:15:43 +00:00
George Bosilca
209b844017 Update the self BTL to fulfill the requirements for #1713.
This commit was SVN r20152.
2008-12-17 22:15:27 +00:00
George Bosilca
7d404d238d Update the tcp BTL to fulfill the requirements for #1713.
This commit was SVN r20151.
2008-12-17 22:15:12 +00:00
George Bosilca
341ee1389c Update the sm BTL to fulfill the requirements for #1713.
This commit was SVN r20150.
2008-12-17 22:14:59 +00:00
George Bosilca
7cec018149 Update the Elan BTL to fulfill the requirements for #1713.
This commit was SVN r20149.
2008-12-17 22:14:45 +00:00
Josh Hursey
c954045989 Add a patch to address a deadlock in the CRCP BKMRK component.
The problem was that we doubly decremented the active count on blocking receives that we stall to complete. This moved the active count into the negative. With a negative count for 'active', a message that should have been accounted for would be overlooked. This then causes the bookmark exchange to post a drain for a message that was never posted, thus locking the protocol. By eliminating the decrement on the 'active' count when we attempt to post the drain message, we only decrement this counter when the outstanding blocking recv completes during the stall operation.

Refs trac:1619
Does not close this ticket since there is an outstanding potential problem with ANY_SOURCE and ANY_TAG, as referenced in the ticket.

This should be moved to v1.3

This commit was SVN r20147.

The following Trac tickets were found above:
  Ticket 1619 --> https://svn.open-mpi.org/trac/ompi/ticket/1619
2008-12-17 17:23:39 +00:00
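The accounting rule described in c954045989 can be illustrated with a single counter that is decremented in exactly one place. This is a simplified sketch with hypothetical names, not the CRCP BKMRK component's real data structures.

```c
#include <assert.h>

/* Hypothetical per-peer bookkeeping. */
typedef struct {
    int active;   /* blocking receives posted but not yet completed */
} recv_counts_t;

static void on_recv_posted(recv_counts_t *c)
{
    c->active++;
}

/* Attempting to post a drain no longer touches the counter. */
static void on_drain_attempt(recv_counts_t *c)
{
    (void)c;      /* the extra decrement used to happen here */
}

/* The only place where the counter goes down. */
static void on_recv_completed(recv_counts_t *c)
{
    assert(c->active > 0);   /* a negative count means double accounting */
    c->active--;
}
```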
Brian Barrett
f8537c0059 Following ticket #1725, when a free list item can not be allocated, return
the error to the upper layer and let it deal with the problem

This commit was SVN r20143.
2008-12-16 22:38:02 +00:00
Ethan Mallove
9003e4d722 Add missing #include <errno.h> line (for SunStudio Solaris).
This commit was SVN r20138.
2008-12-16 15:30:02 +00:00