1
1
Граф коммитов

1455 Коммитов

Автор SHA1 Сообщение Дата
Donald Kerr
47dc1bd493 fix #1828; rework the private data connection establishment process; reviewed by terry d.
This commit was SVN r20889.
2009-03-26 17:54:44 +00:00
Donald Kerr
27ed29a0a1 wrap linux specific steps in __linux__ define
This commit was SVN r20888.
2009-03-26 15:09:01 +00:00
Rainer Keller
6ad07dbffa - First have the _config.h header,
then include system headers based on what's defined.

This commit was SVN r20886.
2009-03-26 12:56:22 +00:00
Donald Kerr
a1ba2e2164 add Sun vendor_id to Tavor Infinihost settings
This commit was SVN r20883.
2009-03-25 19:13:55 +00:00
Donald Kerr
cd7940e208 add vendor_id
This commit was SVN r20882.
2009-03-25 17:57:24 +00:00
Pavel Shamis
d25b7203a2 Adding send_immediate (sendi) implementation to openib btl.
This commit was SVN r20881.
2009-03-25 16:53:26 +00:00
Pavel Shamis
8888c9831c Prevent segfault for case when we release SRQ before srq_create.
This commit was SVN r20872.
2009-03-25 14:18:05 +00:00
Ralph Castain
17f51a0389 Add a new PML module that acts as a "mini-dr" - when requested, it performs a dr-like checksum on messages for BTL's that require it, as specified by MCA params.
Add two new configure options that specify:

1. when to add padding to the openib control header - this *only* happens when the configure option is specified

2. when to use the dr-like checksum as opposed to the memcpy checksum. Not selectable at runtime - to eliminate performance impacts, this is a configure-only option

Also removed an unused checksum version from opal/util/crc.h.

The new component still needs a little cleanup and some sync with recent ob1 bug fixes. It was created as a separate module to avoid performance hits in ob1 itself, though most of the code is duplicative. The component is only selectable by either specifying it directly, or configuring with the dr-like checksum -and- setting -mca pml_csum_enable_checksum 1.

Modify the LANL platform files to take advantage of the new module.

This commit was SVN r20846.
2009-03-23 23:52:05 +00:00
Ralph Castain
fb2b41d40a Give up on the pcie BTL and blow it away. The drivers for this initial implementation have been too customized by IBM - too hard to re-integrate the code.
Maybe someday, someone with enough interest/time can start over...

This commit was SVN r20845.
2009-03-23 23:27:57 +00:00
Jeff Squyres
804eb94f5f Fix the MCA param name to use the non-deprecated name.
This commit was SVN r20832.
2009-03-20 01:44:40 +00:00
Rainer Keller
6f808d9b05 Preparation work for another commit (after RFC):
- This patch solely _adds_ required headers and is rather localized
   The next patch (after RFC) heavily removes headers (based on script)
 - ompi/communicator/communicator.h: For sources that use
   ompi_mpi_comm_world, don't require them to include "mpi.h"
 - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
   #include "ompi/mca/topo/topo.h"
 - ompi/errhandler/errhandler_predefined.h:
   ompi/communicator/communicator.h depends on this header file!
   To prevent recursion just have fwd declarations.
   #include "ompi/types.h" for fwd declarations of the main structs.
 - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
   ompi_rb_tree_t, so have the proper classes
 - ompi/mca/op/op.h:
   Op is pretty self-contained: Nobody up to now has done
   #include "opal/class/opal_object.h"
 - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
   #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/pml/base/base.h:
   We use opal_lists  
 - ompi/mca/pml/dr/pml_dr_vfrag.h:
   #include "opal/types.h" for ompi_ptr_t
 - ompi/mca/pml/ob1/pml_ob1_hdr.h:
   #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
 - opal/dss/dss_unpack.c:
   #include "opal/types.h"
 - opal/mca/base/base.h:
   #include "opal/util/cmd_line.h" for opal_cmd_line_t
 - orte/mca/oob/tcp/oob_tcp.c:
   #include "opal/types.h" for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp.h:
   #include "opal/threads/threads.h" for opal_thread_t
 - orte/mca/oob/tcp/oob_tcp_msg.c:
   #include "opal/types.h" 
 - orte/mca/oob/tcp/oob_tcp_peer.c:
   #include "opal/types.h"  for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp_send.c:
   #include "opal/types.h" 
 - orte/mca/plm/base/plm_base_proxy.c:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT
 - orte/mca/rml/base/rml_base_receive.c:
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/mca/rml/oob/rml_oob_recv.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/mca/rml/oob/rml_oob_send.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/runtime/orte_data_server.c
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/runtime/orte_globals.h:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT

 Tested on Linux/x86-64

This commit was SVN r20817.
2009-03-17 21:34:30 +00:00
Rainer Keller
b9b84a9c29 - ompi/mca/mpool/rdma/mpool_rdma_module.c: At this level
without mpi.h we have no notion of MPI_SUCCESS...
 - ompi/mca/btl/sm/btl_sm.h: ptrdiff_t needs stddef.h 
 - ompi/mca/mpool/base/: If we use opal_pointer_array_t,
   better include the class header.

This commit was SVN r20816.
2009-03-17 20:21:36 +00:00
Pavel Shamis
c6d038a8e8 Adding vendor_error code to error report.
This commit was SVN r20803.
2009-03-17 15:47:34 +00:00
Pavel Shamis
5afa2988f1 Updating RNR/IB timeout for openib btl
This commit was SVN r20801.
2009-03-17 15:03:06 +00:00
Donald Kerr
d29a5e57c1 remove superfluous define
This commit was SVN r20785.
2009-03-16 02:24:01 +00:00
Eugene Loh
64f52b0168 Clean up in response to code review on CMR 1825:
minor changes in comments and edge-case handling.

This commit was SVN r20774.
2009-03-13 18:11:41 +00:00
Donald Kerr
ef55aae401 fix #1829 : udapl btl support for relaxed ordering
This commit was SVN r20772.
2009-03-13 01:01:00 +00:00
Rainer Keller
296a6fb275 - So much fun along the way:
we normally don't do opal/include/opal/...
   Just use the std. opal/...

This commit was SVN r20766.
2009-03-12 19:21:11 +00:00
Rainer Keller
29b1b205fd - Remove two headers (and actually include rml.h) prior to test of
removal script...

This commit was SVN r20765.
2009-03-12 17:58:39 +00:00
Ralph Castain
a8002c0f04 Remove missing files from Makefile.am so make dist will succeed
This commit was SVN r20743.
2009-03-06 02:57:51 +00:00
Rainer Keller
ec0ed48718 - Revert r20739
This commit was SVN r20742.

The following SVN revision numbers were found above:
  r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
a94438343b - Revert r20740
This commit was SVN r20741.

The following SVN revision numbers were found above:
  r20740 --> open-mpi/ompi@2a70618a77
2009-03-05 21:50:47 +00:00
Rainer Keller
2a70618a77 - Second patch, as discussed in Louisville.
Replace short macros in orte/util/name_fns.h
   to the actual fct. call.

 - Compiles on linux/x86-64

This commit was SVN r20740.
2009-03-05 21:14:18 +00:00
Rainer Keller
781caee0b6 - First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Shiqing Fan
99b415a7e0 On windows, the mca_common_* libraries should be installed in bin, otherwise the libraries that are dependent on them, e.g. shared build of mca_btl_sm, couldn't be loaded at runtime. This commit fixes the problem.
This commit was SVN r20735.
2009-03-05 14:57:35 +00:00
Ralph Castain
20b81ff634 Add the PCIE BTL. This won't actually work yet - still need to work through issues with system header files, generalize specification of resources, etc. - but it won't build unless specifically directed to do so. Meantime, any more changes that impact these areas of the code base can be reflected here rather than having to be dealt with later.
This commit was SVN r20734.
2009-03-05 02:40:25 +00:00
Rainer Keller
9dea63d63a - Last of intrusive commits (promised)... err for now.
Anyway, this is blocking the move: do not include pml.h
   if not really needed, aka none of the following used:
     mca_pml
     MCA_PML_CALL
     OMPI_ANY_TAG
     OMPI_ANY_SOURCE
     OMPI_PROC_NULL

 - Notable exceptions (deleting in one header->adding):
   - ompi/mca/mtl/psm/
   - ompi/mca/osc/rdma/
   - ompi/mca/btl/openib/btl_openib_endpoint.c depended on
     pml_base_sendreq.h

 - Tested on Linux/x86-64, this time including make check
   (thanks Jeff and Ralph)

This commit was SVN r20725.
2009-03-04 17:06:51 +00:00
Rainer Keller
fd28b392bf - An intrusive commit yet again (sorry): with the separation we
get bitten by header depending on having already included
   the corresponding [opal|orte|ompi]_config.h header.
   When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
   are missed.

   Script to add the corresponding header in front of all following
   (taking care of possible #ifdef HAVE_...)

 - Including some minor cleanups to
   - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
   - ompi/mca/btl/btl.h -- nclude _after_ #ifndef MCA_BTL_H
   - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
     orte/util/output.h
   - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
   - ompi/mca/btl/btl.h -- reorder to fit
   - ompi/mca/bml/bml.h -- reorder to fit
   - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
   - ompi/request/request.h -- additionally need ompi/constants.h

 - Tested on linux/x86-64

This commit was SVN r20720.
2009-03-04 15:35:54 +00:00
Rainer Keller
811f2bd9b4 - As discussed on RFC, move the ompi_bitmap to the
opal layer.
   Add a check against a maximum (actually get rid of ifs internally to
   opal_bitmap.c) -- the functionality to set the current maximum size
   opal_bitmap_set_max_size() is currently only used in attribute.c
   to set the maximum OMPI_FORTRAN_HANDLE_MAX...

   Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f
   run with 6 procs.
   Let's look into MTT as well...

This commit was SVN r20708.
2009-03-03 22:25:13 +00:00
Tim Mattox
57be80c983 First pass at integrating the CIFTS/FTB support as
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.

To compile with FTB support, use --with-ftb=/path/to/ftb/install

CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php

This commit was SVN r20655.
2009-02-27 22:53:43 +00:00
Eugene Loh
ffb35a1b6c Exposed mca_btl_sm_sendi() to the PML so that it will be used. Reviewed the code.
Added a few comments and changed the return code after the FIFO write to be SUCCESS,
even if the FIFO write indicated an error.  Such an error would only mean that the
FIFO was full, but the FIFO-write operation would still be queued.  Therefore, the
PML should think of this as successful.

This commit was SVN r20644.
2009-02-26 18:10:50 +00:00
Rainer Keller
4c0e8e1e69 - Header orte/mca/oob/base/base.h is probably the wrong one to include
anyhow -- if oob functionality is neededm then orte/mca/oob/oob.h

   Nevertheless compiles fine with -Wimplicit-function-declaration   

This commit was SVN r20641.
2009-02-26 04:20:03 +00:00
Rainer Keller
04567d3af0 - Header orte/mca/errmgr/errmgr.h is not needed.
Once again compiles fine with -Wimplicit-function-declaration   

This commit was SVN r20640.
2009-02-26 04:05:30 +00:00
Rainer Keller
96e1b9b747 - Header orte/mca/rml/rml.h is not needed if no occurence of orte_rml
or ORTE_RML.
   As the others compiles fine with -Wimplicit-function-declaration

This commit was SVN r20639.
2009-02-26 03:52:31 +00:00
Rainer Keller
224d89a353 - There sure is no local stdio.h header file.
Take the system header file...

This commit was SVN r20637.
2009-02-26 02:17:29 +00:00
Rainer Keller
b356e90fa1 - Get rid of include orte/util/proc_info.h, if not needed
Only proc_info.h-internal include file is opal/dss/dss_types.h
 - In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   works fine, no errors.

   Again, let's have MTT the last word.

This commit was SVN r20631.
2009-02-25 03:38:00 +00:00
Eugene Loh
463f11f993 Improve shared-memory allocation:
* compute mmap-file size more wisely and pass requested size to allocator
* change MCA parameters:
  - get rid of mpool_sm_per_peer_size
  - get rid of mpool_sm_max_size
  - set default mpool_sm_min_size to 0
* no longer pad sm allocations to page boundaries
* have sm_btl_first_time_init check return codes on free-list creations

Have mca_btl_sm_prepare_src() check to see if it can allocate an EAGER fragment
rather than a MAX fragment if the smaller size works.

Remove ompi/class/ompi_[circular_buffer_]fifo.h and references thereto.

Remove opal/util/pow2.[c|h] and references thereto.

This commit was SVN r20614.
2009-02-20 19:51:57 +00:00
Rainer Keller
02599446d0 - Occurences of ORTE_PROC_MY_NAME require orte/runtime/orte_globals.h
This commit was SVN r20607.
2009-02-20 03:16:13 +00:00
Rainer Keller
32b7189995 - Make usage of BTL_OUTPUT
This commit was SVN r20606.
2009-02-20 03:05:14 +00:00
Jeff Squyres
14a83a6bbc Clean up the BML shutdown. Reviewed by George.
This commit was SVN r20586.
2009-02-19 13:17:01 +00:00
Jeff Squyres
563e989b6d Use a bit more friendly language. :-)
This commit was SVN r20583.
2009-02-18 22:12:42 +00:00
George Bosilca
1b1ed0da37 Always set the frag to NULL.
This commit was SVN r20579.
2009-02-18 02:15:09 +00:00
Eugene Loh
5bbf5ba7d7 First putback of some sm BTL latency optimizations:
* The main thing done here is to convert from multiple FIFOs/queues per
  receiver (each receiver has one FIFO for each sender) to a single FIFO/queue
  per receiver (all senders sharing the same FIFO for a given receiver).
* This requires rewriting the FIFO support, so that
  ompi/class/ompi_[circular_buffer_]fifo.h is no longer used and FIFO
  support is instead in btl_sm.h.
* The number of FIFOs per receiver is actually an MCA tunable parameter,
  but it appears that 1 or possibly 2 FIFOs (even for 112 local processes)
  per receiver is sufficient.

This commit was SVN r20578.
2009-02-17 15:58:15 +00:00
Jeff Squyres
265ac096e8 Restore a few #include's
This commit was SVN r20559.
2009-02-14 15:21:28 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or               
   orte/mca/rml/rml_types.h                                           
   Please see orte_show_help_replacement.sh commited next.            

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"     
   in orte/mca/odls/base/odls_base_default_fns.c and                  
   in orte/tools/orte-top/orte-top.c                                  
   Manually added these.                                              

   Let's have MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Jeff Squyres
6a1a8311cd Some minor valgrind-inspired cleanups: fix some memory leaks
This commit was SVN r20540.
2009-02-13 03:43:29 +00:00
Tim Mattox
9b83df22ec Fix some "is proc on local node?" logic that got accidentally flipped
by r20496 for the sm BTL, openib BTL on iWarp, and the sm & sm2 coll modules.

This commit was SVN r20515.

The following SVN revision numbers were found above:
  r20496 --> open-mpi/ompi@4cdf91a8d4
2009-02-11 15:02:38 +00:00
Ralph Castain
4cdf91a8d4 Per the RFC, extend the current use of the ompi_proc_t flags field (without changing the field itself).
The prior ompi_proc_t structure had a uint8_t flag field in it, where only one
bit was used to flag that a proc was "local". In that context, "local" was
constrained to mean "local to this node".

This commit provides a greater degree of granularity on the term "local", to include tests
to see if the proc is on the same socket, PC board, node, switch, CU (computing
unit), and cluster.

Add #define's to designate which bits stand for which local condition. This
was added to the OPAL layer to avoid conflicting with the proposed movement of
the BTLs. To make it easier to use, a set of macros have been defined - e.g.,
OPAL_PROC_ON_LOCAL_SOCKET - that test the specific bit. These can be used in
the code base to clearly indicate which sense of locality is being considered.

All locations in the code base that looked at the current proc_t field have
been changed to use the new macros.

Also modify the orte_ess modules so that each returns a uint8_t (to match the
ompi_proc_t field) that contains a complete description of the locality of this
proc. Obviously, not all environments will be capable of providing such detailed
info. Thus, getting a "false" from a test for "on_local_socket" may simply
indicate a lack of knowledge.

This commit was SVN r20496.
2009-02-10 02:20:16 +00:00
Ralph Castain
eaa57e29b6 Revert r20480 as this breaks the trunk. The dpm.h include file has defines for OMPI_RML tags that are required for wireup.
This commit was SVN r20482.

The following SVN revision numbers were found above:
  r20480 --> open-mpi/ompi@62282fefe5
2009-02-09 14:14:45 +00:00
Rainer Keller
62282fefe5 - Get rid of #include "ompi/mca/dpm/dpm.h"
This commit was SVN r20480.
2009-02-09 02:56:10 +00:00
Jeff Squyres
f68d2b00d8 Fix one more place where the old name was left over.
This commit was SVN r20473.
2009-02-06 19:21:50 +00:00
Terry Dontje
64ace9ec12 convert bzero calls to memset to remove warnings.
This commit was SVN r20471.
2009-02-06 19:08:22 +00:00
Jeff Squyres
dfb2d92b37 s/ID/id/ - both work, but if I don't make this change, I'll wonder if
we remembered to use strcasecmp() every time I see this entry in the
file... (we did, but I just don't want to have to keep remembering
that ;-) )

This commit was SVN r20461.
2009-02-06 01:02:25 +00:00
Jeff Squyres
656d8578d0 * Rename (new) MCA parameter to
btl_openib_connect_rdmacm_reject_causes_connect_error (yes, it's
   still long -- on purpose :-) )
 * Add INI file parameter rdmacm_reject_causes_connect_error
 * Now only treat CONNECT_ERROR events as a REJECT if:
   * It's on a connection where we were expecting a REJECT, ''and''
   * The MCA parameter is true ''or'' the INI parameter for this
     device is true
 * Set the INI parameter for true for the NE020

This commit was SVN r20459.
2009-02-06 00:51:04 +00:00
Jeff Squyres
50b1fd1392 Per the big discussion on the OpenFabrics list a while ago, some
versions of the NE driver will report the OUI while others will report
the PCI ID.  We'll put in the Intel values when we get them (may not
be for a few more weeks).

This commit was SVN r20457.
2009-02-05 21:19:45 +00:00
Jeff Squyres
66d0a02f90 For a problem for some iWARP drivers that don't handle RDMA CM REJECT
properly at all.  NetEffect's current driver (OFED 1.4.0) will return
a CONNECT_ERROR event to the initiator rather than the REJECTED event.
Doh!  Additionally -- unfortunately -- NetEffect's vendor_id and
vendor_part_id are reported as 0 in OFED 1.4.0, so we can't
automatically detect these cards and work around the problem.  So all
we can do is add a new MCA parameter
(btl_openib_connect_rdmacm_ignore_connect_errors -- yes, it's long on
purpose ;-) ) that says that if we get a CONNECT_ERROR, bascially
treat it exactly as a REJECT for the WRONG_DIRECTION reason (which is
a "good" reject).  This allows OMPI to function with NetEffect/Intel
cards on OFED 1.4.0.

Note that NetEffect has been bought by Intel; I'm waiting for
information from them to update the ini file for their new OUI/PCI
ID's and/or new vendor_part_id values.

This commit was SVN r20454.
2009-02-05 18:45:59 +00:00
Jeff Squyres
08c35ca135 Somehow this mca param registration code got duplicated; remove one of
them

This commit was SVN r20452.
2009-02-05 16:52:30 +00:00
Jeff Squyres
2cafa5d640 Re-add missing assignment of component variable from MCA param that
somehow must have gotten deleted along the way...

This commit was SVN r20386.
2009-01-30 11:36:14 +00:00
Jeff Squyres
35c5e28a8e Up to SVN r20383
This commit was SVN r20384.

The following SVN revision numbers were found above:
  r20383 --> open-mpi/ompi@e0638c84c8
2009-01-29 17:59:04 +00:00
Jeff Squyres
ca0f7d77e9 Fix a help message regarding the btl_openib_receive_queues MCA
parameter.

This commit was SVN r20350.
2009-01-26 18:57:07 +00:00
Jeff Squyres
84a3f84fdf Possible fix for random openib segv.
This commit was SVN r20282.
2009-01-15 17:10:18 +00:00
Jeff Squyres
d1c6f3f89a * Fix a truckload of Cisco copyrights to be the same as the rest of
the code base.
 * Fix a few misspellings in other copyrights.

This commit was SVN r20241.
2009-01-11 02:30:00 +00:00
George Bosilca
9da6fba64b Update the SCTP BTL regarding ticket #1725.
This commit was SVN r20231.
2009-01-08 16:38:35 +00:00
Donald Kerr
e57435a5d4 udapl btl fix for #1725; replace WAIT with GET
This commit was SVN r20227.
2009-01-08 13:41:36 +00:00
Pavel Shamis
391b101439 Renaming pending_frags to no_credits_pending_frags.
(this commit is part of bug fix for ticket #1693)

This commit was SVN r20217.
2009-01-07 14:41:20 +00:00
Pavel Shamis
2f7b66160b Adding real fix for ticket #1693 - XRC + coalescing segfault.
This commit was SVN r20214.
2009-01-07 14:10:58 +00:00
George Bosilca
f2b9b3fa0b One less warning about a unmatched printf type.
This commit was SVN r20199.
2009-01-05 15:02:00 +00:00
George Bosilca
760e744294 Use a more clear name for the proc in the constructor and destructor functions.
Make sure the lock is created and destroyed as expected.

This commit was SVN r20197.
2009-01-05 14:14:38 +00:00
Jeff Squyres
78b282d0d6 Cosmetic changes; replace tabs with spaces.
This commit was SVN r20183.
2009-01-03 01:45:52 +00:00
Donald Kerr
213daa58da support for solaris relaxed ordering
This commit was SVN r20167.
2008-12-24 15:05:12 +00:00
George Bosilca
7fc48ae11e Update the template BTL to fulfill the requirements for #1713.
This commit was SVN r20153.
2008-12-17 22:15:43 +00:00
George Bosilca
209b844017 Update the self BTL to fulfill the requirements for #1713.
This commit was SVN r20152.
2008-12-17 22:15:27 +00:00
George Bosilca
7d404d238d Update the tcp BTL to fulfill the requirements for #1713.
This commit was SVN r20151.
2008-12-17 22:15:12 +00:00
George Bosilca
341ee1389c Update the sm BTL to fulfill the requirements for #1713.
This commit was SVN r20150.
2008-12-17 22:14:59 +00:00
George Bosilca
7cec018149 Update the Elan BTL to fulfill the requirements for #1713.
This commit was SVN r20149.
2008-12-17 22:14:45 +00:00
Brian Barrett
f8537c0059 Following ticket #1725, when a free list item can not be allocated, return
the error to the upper layer and let it deal with the problem

This commit was SVN r20143.
2008-12-16 22:38:02 +00:00
Avneesh Pant
c1e508750b Check the active port MTU against the MTU statically configured for the HCA. QLogic HCA's capable of MTU had an issue when connected to switches running at 2K.
This commit was SVN r20131.
2008-12-15 21:17:58 +00:00
George Bosilca
fe87e28fee This is a temporary fix for the deadlock problem over MX. The real
problem seems to come from the free list, but due to lack of time to
understand it completely, I provide this fix. Basically, there is no
waiting in the MX BTL anymore, if we cannot allocate a fragment we
rely on the PML to take the corrective actions.

This commit was SVN r20124.
2008-12-15 03:45:34 +00:00
George Bosilca
fec8692074 Get rid of all elan3 references.
This commit was SVN r20122.
2008-12-12 23:59:21 +00:00
Nysal Jan
6a5454b76a Fixes crash in openib BTL on a heterogeneous cluster Refs trac:1700
This commit was SVN r20113.

The following Trac tickets were found above:
  Ticket 1700 --> https://svn.open-mpi.org/trac/ompi/ticket/1700
2008-12-10 22:07:48 +00:00
Shiqing Fan
a5281f0434 - 1/4 commit for Windows Visual Studio and CCP support:
CMakeLists and .windows files.
  In contribs preconfigured and precompiled parts.

This commit was SVN r20108.
2008-12-10 20:59:20 +00:00
Ralph Castain
1ace83c470 Enable modex-less launch. Consists of:
1. minor modification to include two new opal MCA params:
   (a) opal_profile: outputs what components were selected by each framework
       currently enabled for most, but not all, frameworks
   (b) opal_profile_file: name of file that contains profile info required
       for modex

2. introduction of two new tools:
   (a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with
       opal_profile set. Also reports back the rml IP address for all
       interfaces on the node
   (b) ompi-profiler: uses ompi-probe to create the profile_file, also
       reports out a summary of what framework components are actually
       being used to help with configuration options

3. modification of the grpcomm basic component to utilize the
   profile file in place of the modex where possible

4. modification of orterun so it properly sees opal mca params and
   handles opal_profile correctly to ensure we don't get its profile

5. similar mod to orted as for orterun

6. addition of new test that calls orte_init followed by calls to
   grpcomm.barrier

This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. Minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTL's for modex-less operation should let me know and I'll give you the magic details.

This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint.

This commit was SVN r20098.
2008-12-09 23:49:02 +00:00
Pavel Shamis
068054132a Temporary work around for #1693 (osu_bibw segfault in xrc + coalescing mode)
This commit was SVN r20093.
2008-12-09 19:21:54 +00:00
Brad Benton
0b83ab39e5 Added the part number for IBM's 2nd rev of the eHCA to the eHCA param stanza
This commit was SVN r20057.
2008-12-03 14:35:43 +00:00
Jon Mason
54b4e9901c Gracefully handle NULL strings when calling orte_show_help for preventing usage
of more than one of the btl_openib_if_include, btl_openib_if_exclude,
btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude MCA parameters.

This commit was SVN r20053.
2008-12-02 23:17:46 +00:00
Jon Mason
91b26eba67 This commit adds comments regarding IP Aliases and the default behavior when
determining which IP address to use when transmitting data.  Also it adds logic
to prevent usage of more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude
MCA parameters.

This should complete the code modifications needed for ticket 1665.

This commit was SVN r20052.
2008-12-02 22:42:01 +00:00
Donald Kerr
668eafec88 Support to handle platforms which do not define RLIMIT_MEMLOCK
This commit was SVN r20035.
2008-11-25 03:13:09 +00:00
Jon Mason
4757970438 This patch consists of two parts. Part one is the fixing of a bug in the
determing of the IP subnet.  The netmask was being used improperly when
determining which subnet each connection is on.  Part two is the ability to
include/exclude specific subnets.

This patch fixes ticket #1665

This commit was SVN r20016.
2008-11-17 20:20:24 +00:00
Nysal Jan
e4bdaac6d8 Fixed the case where a device does not support inline data. Redefined the interpretation of max_inline_data MCA parameter.
* If max_inline_data == -1 perform runtime detection 
* If max_inline_data >=0 use the value provided 
* If the user does not explicitly set this via command line, use the value from INI file

This commit fixes trac:1662

This commit was SVN r19995.

The following Trac tickets were found above:
  Ticket 1662 --> https://svn.open-mpi.org/trac/ompi/ticket/1662
2008-11-14 12:15:35 +00:00
Rolf vandeVaart
76f8ce01cf Need to add sppp to list of default excluded interfaces
to support Sun M9000 server.

This commit was SVN r19988.
2008-11-12 20:30:14 +00:00
Ralph Castain
ce26e3a2fb Update the notifier framework in prep for move to v1.3. Add an API to handle the case where error messages have been expressed via "show_help" so they can look similar to what was presented to users. Add three key calls in the openib btl to drop messages into syslog.
This will sit in trunk for a few days - would like to actually see some errors reported to syslog before moving the code to 1.3

This commit was SVN r19986.
2008-11-12 18:03:51 +00:00
Jeff Squyres
a48b2d45be Fix wonky copyright year.
This commit was SVN r19985.
2008-11-12 17:51:54 +00:00
Kenneth Matney
07f7f00c91 This disables sendi, since it may do 0-byte requests and it still has
another bug.  This also causes 0-byte requests to be treated as a buffer
error, causing the base request to be requeued.  On Cray XT, it may be
temporarily impossible to make allocations for buffer requests, as the
default stack size is small (8 MB) and there is no true swap device.
Even with the stack size increased, there will be cases in which this
condition recurs.

One possibility is to make the buffer allocations off of the heap; but,
this does not change the fact that eventually an out-of-memory condition
will occur and we need to support multiple receives in transit, a
condition for which the available buffer space may change.  On the other
hand, if we switch to allocating the buffer space from the heap, we will
need to return an error when the allocation fails and there are no other
buffers in transit.

This commit was SVN r19981.
2008-11-12 16:04:14 +00:00
George Bosilca
e84af7920e Move __counter outside the #ifdef section. Cleanup the usage of __counter.
This commit was SVN r19979.
2008-11-11 16:46:11 +00:00
Josh Hursey
080e581422 This commit removes some duplicate finalize code between the component's finalize, and the version that C/R needed in the ft_event function. From my testing everything looks fine, but should probably soak overnight just to be sure. It will need to be moved to v1.3
Thanks to Jeff, Pasha, and Tim M. for bringing this to my attention.

This commit was SVN r19963.
2008-11-10 18:35:57 +00:00
Pavel Shamis
29cc6de40b OOB, XOOB, RDMACM and IBCM does not support qp creation and connection for self communication. So we must use self.
This commit was SVN r19960.
2008-11-10 11:24:57 +00:00
Rolf vandeVaart
cad49da72d Fix the tcp btl so it makes use of the btl_tcp_if_include and btl_tcp_if_exclude
parameters on the connecting side also.  Also move define of IF_NAMESIZE
into if.h file.  And lastly, add one verbose debug message which may be
useful if we run into other issues like this.

This commit fixes trac:1573.

This commit was SVN r19932.

The following Trac tickets were found above:
  Ticket 1573 --> https://svn.open-mpi.org/trac/ompi/ticket/1573
2008-11-05 18:45:42 +00:00
Jeff Squyres
84e30534a2 Update btl_<foo>_flags help message
This commit was SVN r19930.
2008-11-05 00:00:55 +00:00
George Bosilca
d37706f6f8 Reset the variables to NULL after releasing the memory.
This commit was SVN r19912.
2008-11-04 16:49:46 +00:00
Terry Dontje
19cbe4567e This commit fixes trac:1635.
This commit was SVN r19911.

The following Trac tickets were found above:
  Ticket 1635 --> https://svn.open-mpi.org/trac/ompi/ticket/1635
2008-11-04 16:46:45 +00:00
George Bosilca
721626af70 Remove some unused code and a compiler warning.
This commit was SVN r19910.
2008-11-04 16:42:23 +00:00
Terry Dontje
b5a30af20e Correct double free of event on tcp_events list. Refs trac:1629.
This commit was SVN r19890.

The following Trac tickets were found above:
  Ticket 1629 --> https://svn.open-mpi.org/trac/ompi/ticket/1629
2008-11-03 19:03:57 +00:00
George Bosilca
d06091cf84 Completely refactor the connection procedure for MX. It is now cleaner, and
in case of failure more user friendly. In addition we now fully integrate
with the MX_BONDING features (i.e the multi-rails bonding can be done either
at the PML OB1 level, or deep down in the MX library). This feature is driven
by the btl_mx_bonding MCA parameter.
Plus some extra features:
- Correctly compute the bandwidth in case of bonding.
- Nicely handle any mapper access errors.

This commit was SVN r19874.
2008-11-01 17:05:48 +00:00
Jeff Squyres
ae86c0eeca Ensure that the btl variable is always initialized.
This commit was SVN r19873.
2008-11-01 11:24:11 +00:00
George Bosilca
ef3d8cd2a1 Release the memory if we fail to open an MX endpoint.
This commit was SVN r19871.
2008-10-31 23:23:14 +00:00
Jeff Squyres
3b1a1d75f8 Fix problem on RHEL4U3 when compiling with the PGI 32 bit compiler:
<infiniband/driver.h> cannot be included because it will fail to
compile.  So per advice from Roland, in that case, just put manually
include the one ibv_*() prototype that we need.  Use an undocumented
feature of AC_CHECK_HEADERS to check for <infiniband/driver.h> that I
was told on the autoconf mailing list (see the comment for more
details).

This commit was SVN r19857.
2008-10-29 21:45:06 +00:00
Rolf vandeVaart
95bf870055 Fix the allocation for the pointers to the btl structures.
We were only allocating one whereas we would typically need
one per interface.  Most of the time we got lucky and we 
happily trundled over unallocated memory.  But with enough
interfaces, we would get a SEGV.

Also add include to fix compilation with static libs.

This commit was SVN r19845.
2008-10-29 15:15:37 +00:00
Jeff Squyres
6b6c08ef67 Fixes trac:1588: have several BTLs disable themselves in the presence of
THREAD_MULTIPLE.  There's a new (hidden) MCA parameter to re-enable
these BTLs in the presence of THREAD_MULTIPLE:
btl_base_thread_multiple_override.  This MCA parameter should ''only''
be used by developers who are working on make their BTLs thread safe;
it should ''not'' be used by end-users!

This commit was SVN r19826.

The following Trac tickets were found above:
  Ticket 1588 --> https://svn.open-mpi.org/trac/ompi/ticket/1588
2008-10-28 18:29:57 +00:00
Jeff Squyres
96f7d971d3 Refs trac:1589: remove the OMPI_DECLSPEC because it doesn't need to be
there.

This commit was SVN r19825.

The following Trac tickets were found above:
  Ticket 1589 --> https://svn.open-mpi.org/trac/ompi/ticket/1589
2008-10-28 17:24:31 +00:00
Jeff Squyres
200a8552fe Fixes trac:1598: it's not an error if a CPC returns UNREACH.
This commit was SVN r19815.

The following Trac tickets were found above:
  Ticket 1598 --> https://svn.open-mpi.org/trac/ompi/ticket/1598
2008-10-27 18:23:56 +00:00
Jeff Squyres
b5123cb79f Don't destroy the event channel until after everything else has been
torn down.  Fixes trac:1582.

This commit was SVN r19800.

The following Trac tickets were found above:
  Ticket 1582 --> https://svn.open-mpi.org/trac/ompi/ticket/1582
2008-10-24 15:04:54 +00:00
Jeff Squyres
ae34fd150a It always helps to initialize a variable before you try to use it
(vs. only initializing it in some cases).

This commit was SVN r19796.
2008-10-24 14:20:07 +00:00
Jeff Squyres
238a6f851f Wow. And I won't name who it was who did this. You know who you are! :-)
We had a public symbol named "already_opened".  This commit changes
the name to mca_btl_base_already_opened.

I guess it's good we have the visibility stuff enabled by default!
:-)

This commit was SVN r19791.
2008-10-23 18:56:10 +00:00
George Bosilca
7dfdf3e907 A lot of MX fixes.
1. Allow MX bonding via btl_mx_bonding MCA parameter. With this on, Open MPI
  suppose that lib MX will do the bonding, and we will only return one BTL.
  Otherwise, we return as many as devices.
2. Decrease the memory footprint, by cleaning up what we store about the
  peers and how we store it.
3. Allow multiple MX routes that share the same mapper. In this particular
  case we will link by their nic_id.
4. Allow multiple MX routes with multiple mappers. In this case we match
  the NICs based on the last 6 digits of the mapper MAC.
5. Increase the size of the eager and rendez-vous eager limits in the
  case where we are unable to register an unexpected callback with MX.
6. Increase the default max number of MX fragments.
7. Increase the max number of MX BTLs.
8. Only allow mx_if_include and mx_if_exclude if we have acess to the
  mapper.

This commit was SVN r19788.
2008-10-22 20:12:30 +00:00
Jeff Squyres
307d52aedc Ensure to check for internal ptmalloc2 support properly.
This commit was SVN r19760.
2008-10-17 16:27:20 +00:00
Josh Hursey
88aa45dd52 Commit to bring online OpenIB, MX, and shared memory support for Open MPI's checkpoint/restart functionality. Some tuning is still needed, but basic functionality is in place.
There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539

Additionally:
* Fix a file cleanup bug in CRS Base.
* Fix a possible deadlock in the TCP ft_event function
* Add a mca_base_param_deregister() function to MCA base
* Add whole process checkpoint timers
* Add support for BTL: OpenIB, MX,  Shared Memory
* Add support Mpool: rdma, sm
* Sundry bounds checking an cleanup in some scattered functions

This commit was SVN r19756.
2008-10-16 15:09:00 +00:00
Lenny Verkhovsky
7ab9a72d0f merge with r19717, memory barrier on PPC
I run IMB exchange on two QS22 machines with r19674 and it got stucked after 256 or 512 bytes every time.
After applying r19717 the test passed, so I guess this is a essential patch.

This commit was SVN r19752.

The following SVN revision numbers were found above:
  r19674 --> open-mpi/ompi@15c47a2473
  r19717 --> open-mpi/ompi@0a765cd788
2008-10-15 16:56:42 +00:00
Jeff Squyres
1e14bb305f Don't increase num_peers if the operation fails.
This commit was SVN r19748.
2008-10-15 10:37:20 +00:00
Jeff Squyres
8b66171086 Only fail the CPC when the first QP is not PP *if* the CPC uses the
CTS protocol.

This commit was SVN r19726.
2008-10-09 17:31:36 +00:00
Jeff Squyres
f7a94f17b9 Since we now & in the mask, the value can never be higher than the
mask value.  Also, the value is unsigned, so it can never be less than
0.

This commit was SVN r19719.
2008-10-09 13:12:49 +00:00
Jeff Squyres
46d7ffd298 Remove some redundancy from redundant MCA redundant param names. The
following names are all new for v1.3, and therefore haven't been
officially released yet:

 * btl_openib_of_cq_size
 * btl_openib_of_max_inline_data
 * btl_openib_of_pkey
 * btl_openib_of_psn
 * btl_openib_of_mtu

The "_of_" (for OpenFabrics) in there is redundant.  It used to be
"_ib_", indicating that these values are pretty much passed directly
to the verbs stack.  But I think the "openib" in the name implies this
already; having "_of_" in there just seems redundant, makes the name
longer, and seems redundant.  It's also redundant.

So I took those "_of_"'s out of the MCA names.  The old (v1.2) names
are still valid (but deprecated), such ash btl_openib_ib_cq_size.

This commit was SVN r19718.
2008-10-08 21:34:05 +00:00
Jeff Squyres
b8b7619312 * Remove pkey index as an MCA param
* Change name: mca_btl_openib_of_pkey_value -> mca_btl_openib_of_pkey
   (since now there's no index, the "_value" suffix is somewhat
   superfluous)
 * Put in a better help message for the _pkey MCA param (to agree with
   the new help message in v1.2.8)

This commit was SVN r19716.
2008-10-08 20:55:40 +00:00
Pavel Shamis
d6eb6b3a34 replacing maximum pkey value with mask
This commit was SVN r19706.
2008-10-08 10:39:57 +00:00
Pavel Shamis
eefb66a133 Fixing openib partition support.
This commit was SVN r19705.
2008-10-08 09:56:43 +00:00
Pavel Shamis
ceb3e04d0d Fixing compilation failure for case when OMPI_HAVE_RDMACM is not defined.
This commit was SVN r19704.
2008-10-08 09:50:54 +00:00
Jeff Squyres
942b59d572 The endpoint is actually a list_item_t -- not a base object. But only
XOOB uses it, so we didn't notice util recently.

This commit was SVN r19689.
2008-10-06 17:38:25 +00:00
Jeff Squyres
c42ab8ea37 Fixes trac:1210, #1319
Commit from a long-standing Mercurial tree that ended up incorporating a lot of things:

 * A few fixes for CPC interface changes in all the CPCs
 * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC
 * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC)
   * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag)
   * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. 
   * CPCs allocate a CTS buffer only if they're about to make a connection
 * RDMA CM improvements:
   * Use threaded mode openib fd monitoring to wait for for RDMA CM events
   * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions
   * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors
   * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type
   * Renamed many variables to be internally consistent
   * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator
   * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) 
   * Use rdma_create_qp() instead of ibv_create_qp()
 * openib fd monitoring improvements:
   * Renamed a bunch of functions and variables to be a little more obvious as to their true function
   * Use pipes to communicate between main thread and service thread
   * Add ability for main thread to invoke a function back on the service thread 
   * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3))
   * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port
   * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout
 * Other improvements:
   * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp")
   * Somewhat improved error handling
   * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements
   * Oodles of little coding style fixes
   * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction
   * Added some more show_help error messages
   * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) 

This commit was SVN r19686.

The following Trac tickets were found above:
  Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 00:46:02 +00:00
Jeff Squyres
8cb6ad5cb7 Adjustment to match the ob1 change from r19658.
This commit was SVN r19671.

The following SVN revision numbers were found above:
  r19658 --> open-mpi/ompi@b32e4e7f34
2008-10-01 23:57:26 +00:00
George Bosilca
b32e4e7f34 Nothing important, mainly replacing tabs with spaces.
This commit was SVN r19658.
2008-09-30 18:30:35 +00:00
Jeff Squyres
8b786cac04 The configure test we had for checking whether openib could build
(related to the presence of posix threads and ptmalloc2) is now a
little outdated: since we don't build ptmalloc2 as part of libopal
anymore, the openib BTL's requirements are not directly tied to
ptmalloc2's anymore.  Specifically, I altered the test to:

 1. At compile time, if no threads are found, the ptmalloc2 component
    is going to be built, '''and the ptmalloc2 component is going to be
    inside libopal,''' then refuse to build the openib BTL.
 1. At run time, if no threads were available at compile time and the
    ptmalloc2 component is part of the process, then refuse to use the
    openib BTL.

Fixes trac:1537.

This commit was SVN r19652.

The following Trac tickets were found above:
  Ticket 1537 --> https://svn.open-mpi.org/trac/ompi/ticket/1537
2008-09-27 11:19:21 +00:00
Brian Barrett
76f47aaa1c Fix a bunch of compiler warnings. Refs trac:1458
This commit was SVN r19649.

The following Trac tickets were found above:
  Ticket 1458 --> https://svn.open-mpi.org/trac/ompi/ticket/1458
2008-09-26 16:15:05 +00:00
Jeff Squyres
d2d06008a0 Change the default value of mpi_leave_pinned to -1, meaning that we'll
figure it out at runtime (really meaning: we'll still default to "0"
unless something explicitly overrides to 1, such as the openib BTL).
This way, ompi_info doesn't confusingly report mpi_leave_pinned==0 for
mpi_leave_pinned, but we end up running with mpi_leave_pinned==1.

Fixes trac:1502.

This commit was SVN r19571.

The following Trac tickets were found above:
  Ticket 1502 --> https://svn.open-mpi.org/trac/ompi/ticket/1502
2008-09-16 22:06:14 +00:00
Shiqing Fan
04ee20a880 - Mainly type casts. Microsoft VC++ compiler is too strict.
This commit was SVN r19517.
2008-09-08 15:39:30 +00:00
George Bosilca
579d70edad We should use #ifdef and not #if
This commit was SVN r19504.
2008-09-05 12:44:19 +00:00
George Bosilca
10b4664e9c Fix the problem with the unexpected MX handler. The Solution so far is to
mostly don't use this mechanism, as we have to be thread safe in order to
really take full advantage of it. The unexpected handler is called by one
of th MX threads, and we do not have control on the moment. In a non threaded
case, this will completely destroy our recv requests queues, so the safest
approach is to don't the unexpected handler.

This commit was SVN r19496.
2008-09-04 15:46:29 +00:00
George Bosilca
9e0d64b9e5 Add a send immediate for MX.
This commit was SVN r19495.
2008-09-04 15:43:56 +00:00
Jeff Squyres
7b05a14d9a Back up r19489, which was the result of a "svn ci -m ..." instead of
an "hg ci -m ...".  Oops.

This commit was SVN r19490.

The following SVN revision numbers were found above:
  r19489 --> open-mpi/ompi@ea866f9e26
2008-09-03 08:45:33 +00:00
Jeff Squyres
ea866f9e26 Up to SVN r19488
This commit was SVN r19489.

The following SVN revision numbers were found above:
  r19488 --> open-mpi/ompi@c0b8a4a9b5
2008-09-03 08:35:59 +00:00
Ralph Castain
f722f134f7 Check the return code from opal_paffinity before continuing to set maffinity as it can only be done if the map_to_socket_core function succeeds.
This commit was SVN r19396.
2008-08-23 03:13:29 +00:00
Ralph Castain
4ef9d15d97 Revamp the opal mca paffinity interface. We ran into a problem when we encountered machines that had "holes" in their physical processor layout - e.g., machines that supported "hotplugging", or that had unpopulated sockets. To solve that problem, we had to clarify at the API level where we were describing physical vs logical processor info, and then translate accordingly in the underlying implementation.
See opal/mca/paffinity/paffinity.h for explanation as to the physical vs logical nature of the params used in the API.

Fixes trac:1435

This commit was SVN r19391.

The following Trac tickets were found above:
  Ticket 1435 --> https://svn.open-mpi.org/trac/ompi/ticket/1435
2008-08-21 19:21:28 +00:00
George Bosilca
a41dcbbc44 Play nicely with the PML. If the data is in the pending send queue, then
tell the PML we're still in charge of sending it.

This commit was SVN r19311.
2008-08-17 20:07:53 +00:00
George Bosilca
f3568a271a Replace tabs with spaces.
This commit was SVN r19310.
2008-08-17 20:06:48 +00:00
George Bosilca
10612bef8a Bring back the SM pending queue, to avoid deadlocks.
This commit fixes trac:1378.

This commit was SVN r19309.

The following Trac tickets were found above:
  Ticket 1378 --> https://svn.open-mpi.org/trac/ompi/ticket/1378
2008-08-17 19:00:50 +00:00
Jeff Squyres
c9f3f2c682 From Ralph Campbell at QLogic:
Roland noticed that the QLogic HCA driver was using the PCIe vendor ID
for the ibv_query_device so the IEEE OUI value is now used. This means
the config file should recognize the vendor ID value 0x1175 too.

This commit was SVN r19277.
2008-08-13 18:35:37 +00:00
Jeff Squyres
735c1d8bee Add a missing OMPI_DECLSPEC
This commit was SVN r19221.
2008-08-07 20:36:11 +00:00
Jeff Squyres
4aff8038f9 A compromise -- move the shutting down of the async thread to the
component_close, when we know all the btl modules are gone and there
will be no more of them created.

This commit was SVN r19214.
2008-08-07 13:48:38 +00:00
Jeff Squyres
e835c89242 Revert r19205.
This commit was SVN r19212.

The following SVN revision numbers were found above:
  r19205 --> open-mpi/ompi@58b3ea2f8b
2008-08-07 11:51:38 +00:00
Donald Kerr
7a88e2c6cb catch when mpool not created and prevent the subsequent segv
This commit was SVN r19211.
2008-08-07 11:33:28 +00:00
Jeff Squyres
6d9f047238 Somehow this sign ended up pointing the wrong way.
This commit was SVN r19207.
2008-08-06 20:35:42 +00:00
Jeff Squyres
58b3ea2f8b The async thread should not be shut down by the module since it's a
per-device entity.  Also, we should shut down the CPCs after we have
destroyed both PP and SRQ QPs.

This commit was SVN r19205.
2008-08-06 18:26:53 +00:00
Ralph Castain
277e4ac292 Provide a warning message if a user's app executes a "fork" operation while using subsystems that may not cleanly support it - e.g., the openib btl. The provided warning is a generic one indicating that use of fork in current conditions is not recommended.
This is setup so that it only is issued once (as opposed to every time they do it), and goes through orte_show_help so the user doesn't get hammered by #procs copies of the warning. In addition, there is a new MCA param (can't have too many!) to shut the warning off altogether.

This closes ticket #1244

This commit was SVN r19196.
2008-08-06 14:22:03 +00:00
Rainer Keller
69a16b14bc - Fix variable set but not used
Coverity CID1057

This commit was SVN r19189.
2008-08-06 14:02:12 +00:00
Rainer Keller
0d08866786 - Declare functions in lex-files as extern "C" {} to get
rid of warnings.

This commit was SVN r19132.
2008-08-04 11:49:01 +00:00
Jeff Squyres
b45d59ea2e Adjust API usage for new parameter in mca_base_param_lookup_source()
This commit was SVN r19115.
2008-07-31 21:56:20 +00:00
George Bosilca
7f01be2830 Remove useless defines. The MCA was bumped to 2.0, there is no reason to keep
these defines around.

This commit was SVN r19091.
2008-07-30 10:40:18 +00:00
Donald Kerr
2899f64146 moving vendor_id info to top of file
This commit was SVN r19087.
2008-07-29 22:33:17 +00:00
Donald Kerr
513225c9f3 add Sun Microsystems, Inc. vendor_id
This commit was SVN r19085.
2008-07-29 19:31:41 +00:00
Jeff Squyres
0af7ac53f2 Fixes trac:1392, #1400
* add "register" function to mca_base_component_t
   * converted coll:basic and paffinity:linux and paffinity:solaris to
     use this function
   * we'll convert the rest over time (I'll file a ticket once all
     this is committed)
 * add 32 bytes of "reserved" space to the end of mca_base_component_t
   and mca_base_component_data_2_0_0_t to make future upgrades
   [slightly] easier
   * new mca_base_component_t size: 196 bytes
   * new mca_base_component_data_2_0_0_t size: 36 bytes
 * MCA base version bumped to v2.0
   * '''We now refuse to load components that are not MCA v2.0.x'''
 * all MCA frameworks versions bumped to v2.0
 * be a little more explicit about version numbers in the MCA base
   * add big comment in mca.h about versioning philosophy

This commit was SVN r19073.

The following Trac tickets were found above:
  Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
2008-07-28 22:40:57 +00:00
Lenny Verkhovsky
f278609d77 Funny warning message about rd_win size fix
This commit was SVN r19063.
2008-07-28 14:30:57 +00:00
Adrian Knoth
5096512c3a Cosmetics, only typos.
This commit was SVN r19061.
2008-07-28 13:33:08 +00:00
Jeff Squyres
e3e79c0881 Fixes trac:1379:
* Use synonym/deprecated MCA param API for some mca base params
 * In openib BTL, if we have appropriate memory hooks support, and if
   mpi_leave_pinned and mpi_leave_pinned_pipeline were not set by the
   user, set mpi_leave_pinned to 1.
 * Defer checking mpi_leave_pinned_* until as late as possible (i.e.,
   until after the btl's have had a chance to set mpi_leave_pinned to
   1):
   * in ob1 pml
   * in rdma mpool

This commit was SVN r19022.

The following Trac tickets were found above:
  Ticket 1379 --> https://svn.open-mpi.org/trac/ompi/ticket/1379
2008-07-24 22:51:26 +00:00
Jeff Squyres
5b9219565c Remove the use of __cpu_to_be64() and replace it with hton64().
This commit was SVN r18995.
2008-07-23 12:08:55 +00:00
Jeff Squyres
2f208f885c Fixes trac:1295: change language in openib BTL from IB-specific to be
"!OpenFabrics" / neutral (i.e., refer to IB and/or iWARP).

 * Mostly just type, variable/field, and funcion name changes, such as
   s/hca/device/g, etc.  
 * Changed the INI file for the hardware-specific parameters to be
   mca-btl-openib-device-params.ini.
 * Updated a lot of help messages in the help-*.txt files, not just to
   update it to be !OpenFabrics/neutral language, but also for some
   consistency of tone, indenting, etc.
 * Deprecated a bunch of MCA params in favor of language-neutral new
   ones:
   * btl_openib_warn_no_hca_params_found (s/hca/device/)
   * btl_openib_hca_param_files
   * btl_openib_ib_cq_size (s/_ib_/_of_/)
   * btl_openib_ib_max_inline_data
   * btl_openib_ib_psn
   * btl_openib_ib_mtu
   * btl_openib_ib_pkey_ix
   * btl_openib_ib_pkey_val

This commit was SVN r18985.

The following Trac tickets were found above:
  Ticket 1295 --> https://svn.open-mpi.org/trac/ompi/ticket/1295
2008-07-23 00:28:59 +00:00
Jon Mason
f80404d991 Add openib error handling during wireup for rdmacm
The rdmacm event handler has no way of reporting fatal errors to the upper
layers.  By calling mca_btl_openib_endpoint_invoke_error in the rdmacm event
handler for the errors encountered, these errors can now be handled
appropriately.

Closes out Ticket #1283

This commit was SVN r18980.
2008-07-22 19:03:13 +00:00
Pavel Shamis
849a8f86a7 Bug fox for #1388 - fixing ib_cm_listen() random failures.
This commit was SVN r18952.
2008-07-20 06:21:32 +00:00
Andrew Friedley
d479d981a7 Missed this on my last commit regarding if include/exclude -- should now iterate from 0, not 1.
This commit was SVN r18924.
2008-07-16 14:15:25 +00:00
Andrew Friedley
dabe6defb3 Add support for btl_ofud_{in,ex}clude MCA parameters.
This commit was SVN r18916.
2008-07-15 17:57:52 +00:00
Pavel Shamis
e233c11df0 Disabling IBCM cpc by default. In order to enable it, you need to define it explicitly in command line.
This commit was SVN r18911.
2008-07-15 08:02:00 +00:00
Pavel Shamis
807c53c7b1 "failed to ib_cm_listen 10 times" - is know bug in IBCM and we should print it
only if IBCM was explicitly requested.

This commit was SVN r18897.
2008-07-13 16:36:41 +00:00
Pavel Shamis
12379e7f3e Fixing race condition between main thread and async event thread
during openib finalization.

This commit was SVN r18895.
2008-07-13 16:21:49 +00:00
Pavel Shamis
a34bb98f8a Bug fix for #1376.
If IBCM was explicitly specified with exclude/include parameter,
OpenIB BTL will enable verbose report for "/dev/infiniband/ucm" error,
other way the error will not be reported.

This commit was SVN r18868.
2008-07-10 15:08:49 +00:00
Brad Benton
9f0280bd55 arghh...I inadvertantly checked this in to the 1.3 branch rather than
first to the trunk.  So, here is the trunk checkin:

The call to orte_show_help() to notify truncation of the max_inline value
was missing the want_error_header boolean, which eventually results in
a SEGV.  This change corrects the call with the bool set to true.

This commit was SVN r18839.
2008-07-08 15:28:53 +00:00
Pavel Shamis
452141bfb8 Bugfix for #1375.
- Adding configure options that allow to disable IB/RDMA-CM support.
- Code cleanup in openib section of configure

This commit was SVN r18830.
2008-07-08 06:32:54 +00:00
Jeff Squyres
160ba5fe11 Set max_inline_data for the iWARP adapters to be 64
This commit was SVN r18782.
2008-06-30 14:25:32 +00:00
Pavel Shamis
eaa7676c57 Changing default maximum inline data size
from Maximum_Supported_By_Device to 128 (our original value).

This commit was SVN r18774.
2008-06-30 07:47:09 +00:00
Jeff Squyres
f42f55f84b Several improvements and fixes to the openib IBCM CPC:
* Move the passive side QP move to RTS to before we send the reply
   (vs. sending it after we get the RTU).  A lengthy comment explains
   the need for this.
 * Add some timers to the code for analyzing where time is spent.
 * Clarify a few error messages.
 * Currently have a loop around ib_cm_listen() because sometimes it
   fails for seeminly no reason.  Have pending e-mails in to Sean
   Hefty to see if we can figure out why this is happening.  Note that
   the more MPI processes you add, the more likely this error is to
   occur (e.g., ran 720 processes and it happens at least 50% of the
   time).  This makes IBCM somewhat unattractive for general use;
   hopefully we can get a fix...

This commit was SVN r18766.
2008-06-28 10:53:58 +00:00
Jeff Squyres
67933e743c Move some of the things I did in r18762 to the openib BTL proper (in
endpoint.c) because it's almost identical in all the CPC's.  OOB,
XOOB, and IBCM now all invoke the btl error handler properly if
there's an error during wireup.  RDMACM still needs to be done.

This commit was SVN r18764.

The following SVN revision numbers were found above:
  r18762 --> open-mpi/ompi@3eda04578f
2008-06-27 22:48:45 +00:00
Jeff Squyres
3eda04578f Clean up a lot of error handling in IBCM CPC; properly pass error to
upper level btl (and therefore the PML) when something goes wrong
during wireup.

Refs trac:1283.

This commit was SVN r18762.

The following Trac tickets were found above:
  Ticket 1283 --> https://svn.open-mpi.org/trac/ompi/ticket/1283
2008-06-27 21:02:05 +00:00
Jeff Squyres
ad16d8f335 Fix some issues in the ibcm CPC:
* Properly handle non-symmetric subnet ID's
 * Be a bit more stringent when checking for the GID
 * Add lots of BTL_VERBOSE's for diagnostics

This commit was SVN r18754.
2008-06-26 20:23:56 +00:00
Jeff Squyres
dd563f9297 Add more output to btl_base_verbose
This commit was SVN r18743.
2008-06-25 20:16:34 +00:00
Jeff Squyres
b8b1aded4e Hook all the ibcm diagnostics up to "--mca btl_base_verbose 1":
* Use BTL_VERBOSE
 * Remove tailing \n's
 * Remove redundant prefixes (BTL_VERBOSE outputs __FILE__, function,
   and __LINE__)

This commit was SVN r18742.
2008-06-25 19:19:03 +00:00
Jeff Squyres
26f6d9cf0c Properly protect the use of dev->transport_type and
IBV_TRANSPORT_IWARP.

This commit was SVN r18738.
2008-06-25 14:50:59 +00:00
Jeff Squyres
64034b65d0 Remove debugging message kruft
This commit was SVN r18737.
2008-06-25 11:32:33 +00:00
George Bosilca
df9e54194b Fix a deadlock in the put protocol for MX. The callback was never triggered,
as the send and the put share the same execution path and in the case of the
put the MCA_BTL_DES_SEND_ALWAYS_CALLBACK was not set.

This commit was SVN r18735.
2008-06-25 03:45:30 +00:00
George Bosilca
24316b77cf This commit is related to ticket #1362. The following devices respect the
behavior described on the ticket: elan, gm, mx, self, sm, tcp.

This commit was SVN r18734.
2008-06-25 03:35:07 +00:00
Jeff Squyres
edad3b66a2 Add warning message if you request one max_inline_data value and the
device reduces it.

This commit was SVN r18728.
2008-06-24 20:40:03 +00:00
Jon Mason
eefc8956b6 Fix a bug introduced in r18725
ompi_info will crash due to a reference to a unalloced struct.  Add a check for
this struct to prevent this crash.

This commit was SVN r18727.

The following SVN revision numbers were found above:
  r18725 --> open-mpi/ompi@c2351d39f1
2008-06-24 20:38:47 +00:00
Jon Mason
c2351d39f1 Fix Ticket #1359
If rdmacm cpc is disabled and an iwarp device is forced to be used, then
mca_btl_openib_get_iwarp_subnet_id will attempt to reference an uninitalized
list.  Modify the code to check for the uninitalized list and return zero.

Also, fix a memory leak caused by not freeing alloced addresses structs during
shutdown. 

This commit was SVN r18725.
2008-06-24 19:16:06 +00:00
George Bosilca
fbf341068a Remove the pending queue from the shared memory BTL. The PML is in charge of managing
the fragments that failed to be send, there is no need to replicate the same
mechanism in the BTL.
Force the SM BTL to empty all ack fragments in the component progress function.

This commit was SVN r18724.
2008-06-24 19:01:26 +00:00
Jeff Squyres
ea21c31f44 * MCA params btl_openib_use_eager_rdma can now override the
INI file use_eager_rdma value (fixes trac:1169)
 * fixed a typo in a MCA param help message
 * made the check for enabling short/eager RDMA more robust in the
   presence of progress threads; it now emits a show_help warning

This commit was SVN r18723.

The following Trac tickets were found above:
  Ticket 1169 --> https://svn.open-mpi.org/trac/ompi/ticket/1169
2008-06-24 18:31:46 +00:00
Jeff Squyres
e0545460ff Fixes trac:1355: allow INI file to set max_inline_data vale, and if not
specified, probe for max value supported by device.

This commit was SVN r18720.

The following Trac tickets were found above:
  Ticket 1355 --> https://svn.open-mpi.org/trac/ompi/ticket/1355
2008-06-24 17:18:07 +00:00
George Bosilca
1eb62b6c48 Remove a warning. Close ticket #1357.
This commit was SVN r18717.
2008-06-24 14:23:02 +00:00
Jeff Squyres
3f95b906c5 Fixes trac:1085: Improves SLURM configure logic to also allow OS X or any platform where srun is found in the PATH.
This commit was SVN r18714.

The following Trac tickets were found above:
  Ticket 1085 --> https://svn.open-mpi.org/trac/ompi/ticket/1085
2008-06-23 23:12:55 +00:00
Jeff Squyres
281c37afcc Ensure to ignoe the "empty" CPC components.
This commit was SVN r18702.
2008-06-21 11:39:53 +00:00
Jeff Squyres
807e2cc742 Mark a notable place where we need to return an error up to the BTL or
PML.

This commit was SVN r18701.
2008-06-20 22:11:49 +00:00
Jeff Squyres
5ded50df0e * Fix a > that should be ==
* Ensure to destroy the correct QP (local->id[num]->qp will always
   have a valid pointer in it, even if we setup a dummy qp)
 * Note two notable places where we need to figure out how to
   propagate errors up from the CPC to the main BTL / PML when errors
   occur.  Probably have the same issue in IBCM, too.

This commit was SVN r18700.
2008-06-20 22:09:30 +00:00
Jeff Squyres
0074126886 Per #1352, most iWARP adapters today cannot handle connections between
two processes on the same server (!).  So for today, we'll simply mark
all local processes that use iWARP adapters as "unreachable".

More details in #1352.

This commit was SVN r18699.
2008-06-20 22:08:00 +00:00
Jeff Squyres
f4145fce7a Ensure that we don't try to shut down a thread that is not [yet] there
(e.g., if you're excluding some devices, their destructors will be
invoked before the async event thread was setup for them).

This commit was SVN r18698.
2008-06-20 19:30:51 +00:00
Jeff Squyres
ed17b51204 Adjust the max_inline default size down so that it can be accepted on
multiple adapters (eg., Chelsio T3).

But we need to figure out how to determine a good value for the
resident adapter(s) at runtime.  It's problematic because, for
example, Mellanox ConnectX and Chelsio T3 report max_inline values
differently at run-time.  If you ibv_create_qp with a max_inline value
of 0, ConnectX reports back a value that is a formular based on a few
other values (e.g., max_send_sge and max_recv_sge).  But T3 always
reports back "64".

We're looking into this to figure out the best way -- reducing the
default right now should allow other adapters to run while we figure
it out.

This commit was SVN r18697.
2008-06-20 18:24:04 +00:00
George Bosilca
54e7e03695 One less warning.
This commit was SVN r18695.
2008-06-20 17:50:19 +00:00