Commit graph

3070 commits

Author SHA1 Message Date
Terry Dontje
e4d52b16b5 Add in eager limit checks in pmls.
This commit was SVN r21778.
2009-08-10 12:46:20 +00:00
Donald Kerr
de6a7f57b0 fix #1984; only decrement send request req_state when not equal to zero
This commit was SVN r21775.
2009-08-07 14:58:50 +00:00
Steve Wise
ed39853f41 add new device ids for the Chelsio T3 RNIC
This commit was SVN r21774.
2009-08-07 14:14:08 +00:00
Rolf vandeVaart
c82e468ede Undo revision r21767 - sorry folks
This commit was SVN r21769.

The following SVN revision numbers were found above:
  r21767 --> open-mpi/ompi@41f38110ff
2009-08-05 22:23:26 +00:00
Rolf vandeVaart
41f38110ff HCA failover support in openib BTL
This commit was SVN r21767.
2009-08-05 21:53:02 +00:00
George Bosilca
cf8bd2142a Various cleanups and typos.
This commit was SVN r21765.
2009-08-05 03:12:33 +00:00
Rainer Keller
1bd94f2d98 - When calling ompi_mtl_portals_finalize while pml/ob1 is used
   (i.e. without --mca pml cm), make sure PtlEQGet will actually work
   on ompi_mtl_portals.ptl_eq_h -- do so without adding code to
   ompi_mtl_portals_progress.

   Otherwise we abort() with
[nid09979:32503] ompi_mtl_portals_finalize: Going to call ompi_mtl_portals_progress
[nid09979:32503]  Error returned from PtlEQGet.  Error code - 14
[nid09979:32502] Signal: Aborted (6)
[nid09979:32502] Signal code:  (-6) 

This commit was SVN r21761.
2009-08-04 22:48:07 +00:00
George Bosilca
98bdf5d17b Remove a compiler warning about missing braces around
initializer.

This commit was SVN r21760.
2009-08-04 21:41:14 +00:00
George Bosilca
32416761ce Make the open/close symmetric with regard to the local variable
construction/destruction.

This commit was SVN r21753.
2009-08-03 16:45:18 +00:00
Avneesh Pant
af09e7678c Convert a few opal_output() calls to use orte_show_help() instead, and make some minor cosmetic changes to tab spacing and to enclosing C blocks in {}. There was also a long-standing bug in the PSM MTL when the number of hardware contexts on the adapter was less than the number of cores on a node (by default they are the same, hence no issues were reported). For completeness we handle this case as well, but it requires us to tell PSM how many local processes are running on a node and the local rank of each process on the node so that it can allocate the available hardware contexts appropriately.
This commit was SVN r21745.
2009-07-30 02:55:20 +00:00
George Bosilca
0bf381e931 This patch tries to solve an issue on Leopard. Supposedly global
variables that are not initialized and are declared in a file that
doesn't export any globally visible function are marked as
non-initialized constants, i.e. uninitialized common symbols. For some
obscure reason, they get removed from the object files on Mac OS X.

So far I have found two solutions to this problem. One requires the addition
of "-c" to the linker command; the second one (corresponding to this
patch) forces them to become common initialized symbols.

This commit was SVN r21739.
2009-07-28 17:06:16 +00:00
Avneesh Pant
38e48d4e2f Add support for MCA parameters for the PSM MTL to specify the IB unit, port, IB service level and PSM debug level to use. Also specify in the openib btl params file that QLogic hardware supports a max inlined message size of 0 only.
This commit was SVN r21734.
2009-07-24 20:09:39 +00:00
Edgar Gabriel
9a369d0fc1 - we accidentally decreased the counter for the number of dynamic
communicators twice, once in dpm.disconnect_wait, and once in
comm_free. The second location seems to be the right place for that (since a
communicator could be freed, and not disconnected), remove the instance in
disconnect_wait.

- add some error messages in case something goes wrong.

This commit was SVN r21720.
2009-07-20 19:54:24 +00:00
Terry Dontje
d432c9fdbc Add asserts to catch when btl_eager_limit is smaller than the pml headers.
This commit was SVN r21707.
2009-07-17 14:54:18 +00:00
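
The check described in d432c9fdbc above can be sketched roughly as follows. This is a minimal illustration, not the Open MPI code: `example_pml_hdr_t` and `example_eager_payload` are hypothetical names, and the header layout is invented. The point is that the payload size is an unsigned difference, so an eager limit smaller than the header would wrap around to a huge value unless it is caught first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header layout; the real PML headers differ. */
typedef struct {
    unsigned char  type;
    unsigned char  flags;
    unsigned short ctx;
    unsigned int   src;
    unsigned int   tag;
    unsigned int   seq;
} example_pml_hdr_t;

/* Bytes of user data that fit in one eager fragment.
 * Both operands are size_t, so without the assert a too-small
 * limit would silently wrap to a very large value. */
size_t example_eager_payload(size_t btl_eager_limit)
{
    assert(btl_eager_limit >= sizeof(example_pml_hdr_t));
    return btl_eager_limit - sizeof(example_pml_hdr_t);
}
```

Note that `assert` compiles to nothing when `NDEBUG` is defined, so a check of this kind only guards debug builds.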
George Bosilca
8275120656 Get rid of the ompi_convertor.h header file. Replace all references to ompi_convertor
by opal_convertor.
Cleanup the pcie BTL.

This commit was SVN r21703.
2009-07-16 19:13:30 +00:00
George Bosilca
2b57fc3835 Add the missing headers.
This commit was SVN r21701.
2009-07-16 18:42:14 +00:00
George Bosilca
3e971e61f3 The system headers are supposed to be protected by #ifdef and not by #if.
This commit was SVN r21700.
2009-07-16 18:27:33 +00:00
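
As background for 3e971e61f3 above: Autoconf's `AC_CHECK_HEADERS` defines `HAVE_<HEADER>_H` to 1 when the header is found and leaves it undefined otherwise, so `#ifdef` tests exactly what configure recorded, whereas `#if` on an undefined macro evaluates to 0 and can trigger `-Wundef` warnings. Whether that was the motivation for the commit is an assumption. In the sketch below the `#define` stands in for a generated config.h, and `example_len` is a hypothetical function.

```c
#include <stddef.h>

/* Stand-in for a configure-generated config.h (assumption for this sketch). */
#define HAVE_STRING_H 1

/* Guard the system header with #ifdef, not #if. */
#ifdef HAVE_STRING_H
#include <string.h>
#endif

/* Uses strlen() when the header is available, a manual loop otherwise. */
size_t example_len(const char *s)
{
#ifdef HAVE_STRING_H
    return strlen(s);
#else
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
#endif
}
```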
George Bosilca
d07ffedc54 No opal datatype functions in the BTL. The datatype attached to the
convertor is an ompi_datatype_t so calling the ompi level functions
is the way to go.

This commit was SVN r21698.
2009-07-16 18:25:08 +00:00
Jeff Squyres
9dc7f884b2 Fix yet another compile error from the great DDT split (r21641). Sigh.
This commit was SVN r21697.

The following SVN revision numbers were found above:
  r21641 --> open-mpi/ompi@6c5532072a
2009-07-16 18:08:03 +00:00
George Bosilca
e1383027e1 Correct a comment and cleanup/reorder the code.
This commit was SVN r21696.
2009-07-16 17:41:32 +00:00
Ralph Castain
e75d9b8296 Use orte_notifier to alert sys admins to checksum violations in the csum pml.
Add ability to store the RM's jobid string to tag the notifier message so that the sys admin knows what job had the problem.

This commit was SVN r21687.
2009-07-15 19:43:26 +00:00
Rainer Keller
8243831d76 - Get OpenIB BTL to work with old libibverbs installation
Tested on smoky.

This commit was SVN r21685.
2009-07-15 16:12:47 +00:00
Ralph Castain
dbac602be5 Add support for the add-host and add-hostfile MPI Info keys to allow Comm_spawn users to add new hosts to those already known by mpirun.
Requires full testing once comm_spawn is fixed (Edgar is working on that now).

This commit was SVN r21664.
2009-07-14 14:34:11 +00:00
George Bosilca
2143424eb5 The MCA parameter should always be taken into account, independent of
how many networks are available on the node.

This commit was SVN r21652.
2009-07-13 19:40:00 +00:00
Josh Hursey
8d9d2ba7d1 Fix the datatype usage in CRCP Bkmrk as a result of the great datatype shift in r21641.
This commit was SVN r21650.

The following SVN revision numbers were found above:
  r21641 --> open-mpi/ompi@6c5532072a
2009-07-13 17:54:26 +00:00
Rainer Keller
6c5532072a - Split the datatype engine into two parts: an MPI-specific part in
   OMPI and a language-agnostic part in OPAL. The convertor is completely
   moved into OPAL. This offers several benefits as described in the RFC
   http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
   namely:
    - Fewer basic types (int* and float* types, boolean and wchar).
    - Fixing the naming scheme to ompi-nomenclature.
    - Usability outside of the ompi-layer.
 - Due to the fixed nature of simple opal types, their information is
   completely known at compile time and therefore constified.
 - With fewer datatypes (22), the actual sizes of bit-field types may be
   reduced from 64 to 32 bits, allowing reorganizing the opal_datatype
   structure, eliminating holes and keeping data required in the convertor
   (upon send/recv) in one cacheline...
   This has implications for the convertor data structure and other parts
   of the code.
 - Several performance tests have been run; the netpipe latency does not
   change with this patch on Linux/x86-64 on the smoky cluster.
 - Extensive tests have been done to verify correctness (no new
   regressions) using:
   1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      a. running both trunk and ompi-ddt resulted in no differences
         (except that MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB now run
         correctly).
      b. with --enable-memchecker and running under valgrind (one buglet
         when run with static found in test-suite, committed)
   2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      all passed (except for the dynamic/ tests, which failed, as on trunk/MTT)
   3. compilation and usage of HDF5 tests on Jaguar using PGI and
      PathScale compilers.
   4. compilation and usage on Scicortex.
 - Please note that for the heterogeneous case (-m32 compiled
   binaries/ompi), neither the ompi-trunk nor the ompi-ddt branch would
   successfully launch.

This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
Brian Barrett
2f3c0b4fcf Drain pipe from service thread to main thread during shutdown. By this
point, the event engine has been shut down until btl finalization is
done, so opal_progress in the wait loop is not an option - we have
to drain from inside the btl.

Clean up the looping structure for the finalize routine

Update copyrights.

This commit was SVN r21620.
2009-07-09 22:13:10 +00:00
Brian Barrett
ac34b1de69 RDMA CM doesn't retry if a packet is dropped; it just times out during
route discovery, and we don't recover. Instead, try to recover by
retrying a couple of times.

This commit was SVN r21619.
2009-07-09 22:10:06 +00:00
George Bosilca
311e27b42f Pretty print an error message when the specified range of ports (for both
IPv4 and IPv6) is outside the legal boundaries. This fixes trac:1869.

This commit was SVN r21612.

The following Trac tickets were found above:
  Ticket 1869 --> https://svn.open-mpi.org/trac/ompi/ticket/1869
2009-07-07 17:52:30 +00:00
George Bosilca
4038834dfb Convert the port number in network order before binding the socket.
Thanks to Mariusz Mamonski (mamonski@man.poznan.pl) for the bug
report and patch.

This commit was SVN r21610.
2009-07-07 17:21:28 +00:00
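
The byte-order fix in 4038834dfb above follows a standard sockets rule: `sin_port` must hold the port in network (big-endian) byte order before `bind()` is called. A minimal sketch, assuming an IPv4 wildcard address; `example_fill_bind_addr` is a hypothetical helper, not a function from the Open MPI tree.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: prepare an IPv4 wildcard address for bind().
 * host_port is given in host byte order, e.g. 8080. */
void example_fill_bind_addr(struct sockaddr_in *addr, uint16_t host_port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_ANY);
    /* Convert to network byte order; on little-endian hosts,
     * skipping this binds to a byte-swapped port. */
    addr->sin_port = htons(host_port);
}
```

On a big-endian host `htons` is a no-op, which is why a missing conversion of this kind can go unnoticed until the code runs on a little-endian machine.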
Jeff Squyres
92e40cb20a Enable the coll sync component to barrier before each 1000th collective.
This commit was SVN r21594.
2009-07-02 20:16:45 +00:00
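
The idea in 92e40cb20a above, a barrier before every Nth collective, comes down to a per-communicator counter. The sketch below only illustrates that counting logic; `example_sync_counter_t` and `example_sync_due` are hypothetical names, and treating an interval of 0 as "disabled" is an assumption rather than the component's documented behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-communicator state. */
typedef struct {
    uint32_t interval;  /* barrier before every interval-th collective */
    uint32_t count;     /* collectives seen since the last barrier */
} example_sync_counter_t;

/* Returns true when the caller should run a barrier before
 * starting the current collective. */
bool example_sync_due(example_sync_counter_t *c)
{
    if (c->interval == 0) {
        return false;  /* assumed meaning: feature disabled */
    }
    c->count++;
    if (c->count >= c->interval) {
        c->count = 0;
        return true;
    }
    return false;
}
```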
Brian Barrett
3b410b0200 Increase context ref count and push on list before calling rdma_resolve_addr,
in case the event returns before rdma_resolve_addr returns.

This commit was SVN r21588.
2009-07-02 16:12:19 +00:00
Shiqing Fan
0b56a8a4d5 Enable IPv6 on Windows by default, and fix two type casts for IPv6 operations.
This commit was SVN r21586.
2009-07-02 14:41:03 +00:00
Jeff Squyres
cad12fda5f * Remove an extra blank line from the help file
* Add the help file to the Makefile.am so that it gets installed

This commit was SVN r21567.
2009-06-30 18:58:09 +00:00
Eugene Loh
fcd9fabae9 Minor cleanup in the sm BTL. http://www.open-mpi.org/community/lists/devel/2009/06/6363.php
This commit was SVN r21556.
2009-06-27 23:42:09 +00:00
Nysal Jan
5b13fd004a Bump up default MTU for eHCA 2. Improves peak unidirectional bandwidth by around 14%
This commit was SVN r21553.
2009-06-27 07:39:30 +00:00
Ralph Castain
ee18838e2f Remove svn conflict lines due to commit r21551 in the sm btl. I #if 0'd out the offending line that caused the conflict just in case it was the correct one. However, this now compiles cleanly, minus the following warnings that I wasn't sure which way to resolve:
btl_sm.c: In function ‘mca_btl_sm_sendi’:
btl_sm.c:734: warning: comparison between signed and unsigned
btl_sm.c: In function ‘mca_btl_sm_send’:
btl_sm.c:812: warning: comparison between signed and unsigned

This commit was SVN r21552.

The following SVN revision numbers were found above:
  r21551 --> open-mpi/ompi@bd995d26b4
2009-06-27 01:39:15 +00:00
Eugene Loh
bd995d26b4 Try to improve flow control in the sm BTL:
- poll FIFO occasionally even if just sending messages
- retry pending sends more often
  - just before trying a new send
  - as part of mca_btl_sm_component_progress
Maintain two new mca_btl_sm_component variables, num_outstanding_frags
and num_pending_sends, to keep overhead low.


Drain only one message fragment from the FIFO per btl_sm_component_progress
call (rather than drain until empty, which in retrospect everyone considers
to have been a mistake).

This commit was SVN r21551.
2009-06-27 00:12:56 +00:00
George Bosilca
24e74922ce Not yet the right time for this to be there.
This commit was SVN r21540.
2009-06-26 16:00:55 +00:00
Lenny Verkhovsky
7f8dc7c8b8 fix for r21524: fix misspelling of HAVE_IBV_FORK_INIT
This commit was SVN r21533.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r21524
2009-06-25 17:45:38 +00:00
Terry Dontje
efac1b73fb Surround use of the want_fork_support structure field with #ifdef instead of a C conditional.
This commit was SVN r21526.
2009-06-25 12:03:30 +00:00
Nysal Jan
938599cb2d Fix build failure with latest IBM XL C/C++ v10.1 compiler. Also this seems like cleaner code.
This commit was SVN r21497.
2009-06-23 14:08:04 +00:00
George Bosilca
7f24b41051 This function doesn't have to be globally visible.
This commit was SVN r21494.
2009-06-22 17:14:54 +00:00
Jeff Squyres
c39998db17 Also show the "you might not have enough registered memory" warning
message earlier in the openib BTL startup sequence

This commit was SVN r21469.
2009-06-18 12:24:39 +00:00
Rolf vandeVaart
36a560506c Fix error message to match code default.
This commit was SVN r21452.
2009-06-16 20:59:53 +00:00
Rainer Keller
5e6061af02 - Few fixes and comments
This commit was SVN r21443.
2009-06-15 21:12:04 +00:00
Ralph Castain
44bb265a52 Add a new MPI_Info key to preposition OMPI libraries - implementation underway, but this just defines and passes the new key
This commit was SVN r21425.
2009-06-12 17:53:13 +00:00
Jeff Squyres
814a8f5e0f * Fix #1916: endian problems in iwarp wireup on big endian machines
   (now works on both big and little endian machines)
 * Be a little more flexible when looking for active devices in
   btl_openib_component.c
 * Add device name and port number to lots of verbose and help
   messages
 * Add a bunch of verbose messages to give insight into what is
   occurring during all the CPC wireups

This commit was SVN r21418.
2009-06-11 17:30:30 +00:00
Ralph Castain
4881cd0df3 Revert the prior change to the individual .h files - the problem was in the Makefile.am's, causing make dist to fail.
This commit was SVN r21414.
2009-06-11 03:15:47 +00:00
Ralph Castain
91ab2b3e4f Specify complete path to included header files so it compiles in all environments
This commit was SVN r21412.
2009-06-11 02:46:30 +00:00