1
1
Граф коммитов

1237 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
78b282d0d6 Cosmetic changes; replace tabs with spaces.
This commit was SVN r20183.
2009-01-03 01:45:52 +00:00
Donald Kerr
213daa58da support for solaris relaxed ordering
This commit was SVN r20167.
2008-12-24 15:05:12 +00:00
George Bosilca
7fc48ae11e Update the template BTL to fulfill the requirements for #1713.
This commit was SVN r20153.
2008-12-17 22:15:43 +00:00
George Bosilca
209b844017 Update the self BTL to fulfill the requirements for #1713.
This commit was SVN r20152.
2008-12-17 22:15:27 +00:00
George Bosilca
7d404d238d Update the tcp BTL to fulfill the requirements for #1713.
This commit was SVN r20151.
2008-12-17 22:15:12 +00:00
George Bosilca
341ee1389c Update the sm BTL to fulfill the requirements for #1713.
This commit was SVN r20150.
2008-12-17 22:14:59 +00:00
George Bosilca
7cec018149 Update the Elan BTL to fulfill the requirements for #1713.
This commit was SVN r20149.
2008-12-17 22:14:45 +00:00
Brian Barrett
f8537c0059 Following ticket #1725, when a free list item can not be allocated, return
the error to the upper layer and let it deal with the problem

This commit was SVN r20143.
2008-12-16 22:38:02 +00:00
Avneesh Pant
c1e508750b Check the active port MTU against the MTU statically configured for the HCA. QLogic HCA's capable of MTU had an issue when connected to switches running at 2K.
This commit was SVN r20131.
2008-12-15 21:17:58 +00:00
George Bosilca
fe87e28fee This is a temporary fix for the deadlock problem over MX. The real
problem seems to come from the free list, but due to lack of time to
understand it completely, I provide this fix. Basically, there is no
waiting in the MX BTL anymore, if we cannot allocate a fragment we
rely on the PML to take the corrective actions.

This commit was SVN r20124.
2008-12-15 03:45:34 +00:00
George Bosilca
fec8692074 Get rid of all elan3 references.
This commit was SVN r20122.
2008-12-12 23:59:21 +00:00
Nysal Jan
6a5454b76a Fixes crash in openib BTL on a heterogeneous cluster Refs trac:1700
This commit was SVN r20113.

The following Trac tickets were found above:
  Ticket 1700 --> https://svn.open-mpi.org/trac/ompi/ticket/1700
2008-12-10 22:07:48 +00:00
Shiqing Fan
a5281f0434 - 1/4 commit for Windows Visual Studio and CCP support:
CMakeLists and .windows files.
  In contribs preconfigured and precompiled parts.

This commit was SVN r20108.
2008-12-10 20:59:20 +00:00
Ralph Castain
1ace83c470 Enable modex-less launch. Consists of:
1. minor modification to include two new opal MCA params:
   (a) opal_profile: outputs what components were selected by each framework
       currently enabled for most, but not all, frameworks
   (b) opal_profile_file: name of file that contains profile info required
       for modex

2. introduction of two new tools:
   (a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with
       opal_profile set. Also reports back the rml IP address for all
       interfaces on the node
   (b) ompi-profiler: uses ompi-probe to create the profile_file, also
       reports out a summary of what framework components are actually
       being used to help with configuration options

3. modification of the grpcomm basic component to utilize the
   profile file in place of the modex where possible

4. modification of orterun so it properly sees opal mca params and
   handles opal_profile correctly to ensure we don't get its profile

5. similar mod to orted as for orterun

6. addition of new test that calls orte_init followed by calls to
   grpcomm.barrier

This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. Minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTL's for modex-less operation should let me know and I'll give you the magic details.

This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint.

This commit was SVN r20098.
2008-12-09 23:49:02 +00:00
Pavel Shamis
068054132a Temporary work around for #1693 (osu_bibw segfault in xrc + coalescing mode)
This commit was SVN r20093.
2008-12-09 19:21:54 +00:00
Brad Benton
0b83ab39e5 Added the part number for IBM's 2nd rev of the eHCA to the eHCA param stanza
This commit was SVN r20057.
2008-12-03 14:35:43 +00:00
Jon Mason
54b4e9901c Gracefully handle NULL strings when calling orte_show_help for preventing usage
of more than one of the btl_openib_if_include, btl_openib_if_exclude,
btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude MCA parameters.

This commit was SVN r20053.
2008-12-02 23:17:46 +00:00
Jon Mason
91b26eba67 This commit adds comments regarding IP Aliases and the default behavior when
determining which IP address to use when transmitting data.  Also it adds logic
to prevent usage of more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude
MCA parameters.

This should complete the code modifications needed for ticket 1665.

This commit was SVN r20052.
2008-12-02 22:42:01 +00:00
Donald Kerr
668eafec88 Support to handle platforms which do not define RLIMIT_MEMLOCK
This commit was SVN r20035.
2008-11-25 03:13:09 +00:00
Jon Mason
4757970438 This patch consists of two parts. Part one is the fixing of a bug in the
determing of the IP subnet.  The netmask was being used improperly when
determining which subnet each connection is on.  Part two is the ability to
include/exclude specific subnets.

This patch fixes ticket #1665

This commit was SVN r20016.
2008-11-17 20:20:24 +00:00
Nysal Jan
e4bdaac6d8 Fixed the case where a device does not support inline data. Redefined the interpretation of max_inline_data MCA parameter.
* If max_inline_data == -1 perform runtime detection 
* If max_inline_data >=0 use the value provided 
* If the user does not explicitly set this via command line, use the value from INI file

This commit fixes trac:1662

This commit was SVN r19995.

The following Trac tickets were found above:
  Ticket 1662 --> https://svn.open-mpi.org/trac/ompi/ticket/1662
2008-11-14 12:15:35 +00:00
Rolf vandeVaart
76f8ce01cf Need to add sppp to list of default excluded interfaces
to support Sun M9000 server.

This commit was SVN r19988.
2008-11-12 20:30:14 +00:00
Ralph Castain
ce26e3a2fb Update the notifier framework in prep for move to v1.3. Add an API to handle the case where error messages have been expressed via "show_help" so they can look similar to what was presented to users. Add three key calls in the openib btl to drop messages into syslog.
This will sit in trunk for a few days - would like to actually see some errors reported to syslog before moving the code to 1.3

This commit was SVN r19986.
2008-11-12 18:03:51 +00:00
Jeff Squyres
a48b2d45be Fix wonky copyright year.
This commit was SVN r19985.
2008-11-12 17:51:54 +00:00
Kenneth Matney
07f7f00c91 This disables sendi, since it may do 0-byte requests and it still has
another bug.  This also causes 0-byte requests to be treated as a buffer
error, causing the base request to be requeued.  On Cray XT, it may be
temporarily impossible to make allocations for buffer requests, as the
default stack size is small (8 MB) and there is no true swap device.
Even with the stack size increased, there will be cases in which this
condition recurs.

One possibility is to make the buffer allocations off of the heap; but,
this does not change the fact that eventually an out-of-memory condition
will occur and we need to support multiple receives in transit, a
condition for which the available buffer space may change.  On the other
hand, if we switch to allocating the buffer space from the heap, we will
need to return an error when the allocation fails and there are no other
buffers in transit.

This commit was SVN r19981.
2008-11-12 16:04:14 +00:00
George Bosilca
e84af7920e Move __counter outside the #ifdef section. Cleanup the usage of __counter.
This commit was SVN r19979.
2008-11-11 16:46:11 +00:00
Josh Hursey
080e581422 This commit removes some duplicate finalize code between the component's finalize, and the version that C/R needed in the ft_event function. From my testing everything looks fine, but should probably soak overnight just to be sure. It will need to be moved to v1.3
Thanks to Jeff, Pasha, and Tim M. for bringing this to my attention.

This commit was SVN r19963.
2008-11-10 18:35:57 +00:00
Pavel Shamis
29cc6de40b OOB, XOOB, RDMACM and IBCM does not support qp creation and connection for self communication. So we must use self.
This commit was SVN r19960.
2008-11-10 11:24:57 +00:00
Rolf vandeVaart
cad49da72d Fix the tcp btl so it makes use of the btl_tcp_if_include and btl_tcp_if_exclude
parameters on the connecting side also.  Also move define of IF_NAMESIZE
into if.h file.  And lastly, add one verbose debug message which may be
useful if we run into other issues like this.

This commit fixes trac:1573.

This commit was SVN r19932.

The following Trac tickets were found above:
  Ticket 1573 --> https://svn.open-mpi.org/trac/ompi/ticket/1573
2008-11-05 18:45:42 +00:00
Jeff Squyres
84e30534a2 Update btl_<foo>_flags help message
This commit was SVN r19930.
2008-11-05 00:00:55 +00:00
George Bosilca
d37706f6f8 Reset the variables to NULL after releasing the memory.
This commit was SVN r19912.
2008-11-04 16:49:46 +00:00
Terry Dontje
19cbe4567e This commit fixes trac:1635.
This commit was SVN r19911.

The following Trac tickets were found above:
  Ticket 1635 --> https://svn.open-mpi.org/trac/ompi/ticket/1635
2008-11-04 16:46:45 +00:00
George Bosilca
721626af70 Remove some unused code and a compiler warning.
This commit was SVN r19910.
2008-11-04 16:42:23 +00:00
Terry Dontje
b5a30af20e Correct double free of event on tcp_events list. Refs trac:1629.
This commit was SVN r19890.

The following Trac tickets were found above:
  Ticket 1629 --> https://svn.open-mpi.org/trac/ompi/ticket/1629
2008-11-03 19:03:57 +00:00
George Bosilca
d06091cf84 Completely refactor the connection procedure for MX. It is now cleaner, and
in case of failure more user friendly. In addition we now fully integrate
with the MX_BONDING features (i.e the multi-rails bonding can be done either
at the PML OB1 level, or deep down in the MX library). This feature is driven
by the btl_mx_bonding MCA parameter.
Plus some extra features:
- Correctly compute the bandwidth in case of bonding.
- Nicely handle any mapper access errors.

This commit was SVN r19874.
2008-11-01 17:05:48 +00:00
Jeff Squyres
ae86c0eeca Ensure that the btl variable is always initialized.
This commit was SVN r19873.
2008-11-01 11:24:11 +00:00
George Bosilca
ef3d8cd2a1 Release the memory if we fail to open an MX endpoint.
This commit was SVN r19871.
2008-10-31 23:23:14 +00:00
Jeff Squyres
3b1a1d75f8 Fix problem on RHEL4U3 when compiling with the PGI 32 bit compiler:
<infiniband/driver.h> cannot be included because it will fail to
compile.  So per advice from Roland, in that case, just put manually
include the one ibv_*() prototype that we need.  Use an undocumented
feature of AC_CHECK_HEADERS to check for <infiniband/driver.h> that I
was told on the autoconf mailing list (see the comment for more
details).

This commit was SVN r19857.
2008-10-29 21:45:06 +00:00
Rolf vandeVaart
95bf870055 Fix the allocation for the pointers to the btl structures.
We were only allocating one whereas we would typically need
one per interface.  Most of the time we got lucky and we 
happily trundled over unallocated memory.  But with enough
interfaces, we would get a SEGV.

Also add include to fix compilation with static libs.

This commit was SVN r19845.
2008-10-29 15:15:37 +00:00
Jeff Squyres
6b6c08ef67 Fixes trac:1588: have several BTLs disable themselves in the presence of
THREAD_MULTIPLE.  There's a new (hidden) MCA parameter to re-enable
these BTLs in the presence of THREAD_MULTIPLE:
btl_base_thread_multiple_override.  This MCA parameter should ''only''
be used by developers who are working on make their BTLs thread safe;
it should ''not'' be used by end-users!

This commit was SVN r19826.

The following Trac tickets were found above:
  Ticket 1588 --> https://svn.open-mpi.org/trac/ompi/ticket/1588
2008-10-28 18:29:57 +00:00
Jeff Squyres
96f7d971d3 Refs trac:1589: remove the OMPI_DECLSPEC because it doesn't need to be
there.

This commit was SVN r19825.

The following Trac tickets were found above:
  Ticket 1589 --> https://svn.open-mpi.org/trac/ompi/ticket/1589
2008-10-28 17:24:31 +00:00
Jeff Squyres
200a8552fe Fixes trac:1598: it's not an error if a CPC returns UNREACH.
This commit was SVN r19815.

The following Trac tickets were found above:
  Ticket 1598 --> https://svn.open-mpi.org/trac/ompi/ticket/1598
2008-10-27 18:23:56 +00:00
Jeff Squyres
b5123cb79f Don't destroy the event channel until after everything else has been
torn down.  Fixes trac:1582.

This commit was SVN r19800.

The following Trac tickets were found above:
  Ticket 1582 --> https://svn.open-mpi.org/trac/ompi/ticket/1582
2008-10-24 15:04:54 +00:00
Jeff Squyres
ae34fd150a It always helps to initialize a variable before you try to use it
(vs. only initializing it in some cases).

This commit was SVN r19796.
2008-10-24 14:20:07 +00:00
Jeff Squyres
238a6f851f Wow. And I won't name who it was who did this. You know who you are! :-)
We had a public symbol named "already_opened".  This commit changes
the name to mca_btl_base_already_opened.

I guess it's good we have the visibility stuff enabled by default!
:-)

This commit was SVN r19791.
2008-10-23 18:56:10 +00:00
George Bosilca
7dfdf3e907 A lot of MX fixes.
1. Allow MX bonding via btl_mx_bonding MCA parameter. With this on, Open MPI
  suppose that lib MX will do the bonding, and we will only return one BTL.
  Otherwise, we return as many as devices.
2. Decrease the memory footprint, by cleaning up what we store about the
  peers and how we store it.
3. Allow multiple MX routes that share the same mapper. In this particular
  case we will link by their nic_id.
4. Allow multiple MX routes with multiple mappers. In this case we match
  the NICs based on the last 6 digits of the mapper MAC.
5. Increase the size of the eager and rendez-vous eager limits in the
  case where we are unable to register an unexpected callback with MX.
6. Increase the default max number of MX fragments.
7. Increase the max number of MX BTLs.
8. Only allow mx_if_include and mx_if_exclude if we have acess to the
  mapper.

This commit was SVN r19788.
2008-10-22 20:12:30 +00:00
Jeff Squyres
307d52aedc Ensure to check for internal ptmalloc2 support properly.
This commit was SVN r19760.
2008-10-17 16:27:20 +00:00
Josh Hursey
88aa45dd52 Commit to bring online OpenIB, MX, and shared memory support for Open MPI's checkpoint/restart functionality. Some tuning is still needed, but basic functionality is in place.
There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539

Additionally:
* Fix a file cleanup bug in CRS Base.
* Fix a possible deadlock in the TCP ft_event function
* Add a mca_base_param_deregister() function to MCA base
* Add whole process checkpoint timers
* Add support for BTL: OpenIB, MX,  Shared Memory
* Add support Mpool: rdma, sm
* Sundry bounds checking an cleanup in some scattered functions

This commit was SVN r19756.
2008-10-16 15:09:00 +00:00
Lenny Verkhovsky
7ab9a72d0f merge with r19717, memory barrier on PPC
I run IMB exchange on two QS22 machines with r19674 and it got stucked after 256 or 512 bytes every time.
After applying r19717 the test passed, so I guess this is a essential patch.

This commit was SVN r19752.

The following SVN revision numbers were found above:
  r19674 --> open-mpi/ompi@15c47a2473
  r19717 --> open-mpi/ompi@0a765cd788
2008-10-15 16:56:42 +00:00
Jeff Squyres
1e14bb305f Don't increase num_peers if the operation fails.
This commit was SVN r19748.
2008-10-15 10:37:20 +00:00