1
1
Граф коммитов

8826 Коммитов

Автор SHA1 Сообщение Дата
George Bosilca
b996c00d1a Set the limits for the MX fragments to 4K. Add code to dump the state of the MX
hardware (not activated).

This commit was SVN r12931.
2006-12-28 08:40:37 +00:00
George Bosilca
3903009b8b Add a check for the unexpected handler. If enabled, allow the zero-copy
protocol over the MX BTL. Now, we have only one matching, the one in Open
MPI.

The problem is that when the unexpected handler is triggered, not all the
message is on the host memory. In the best case we get one MX fragment (internal
MX fragment), in the worst we get NULL. The only way to fit this with the
design of the PML is to force the eager protocol at the MX internal fragment
size, and to limit the send/receive protocol at the same size. Tests show
the outcome is not far from optimal (if the pipeline depth is increased
a little bit).

Set MX_PIPELINE_LOG in order to allow MX to use internal fragments of 4K.

This commit was SVN r12930.
2006-12-28 03:35:41 +00:00
George Bosilca
ff2319dcb7 Complete the OUT protocol. Small latency improvements. Some minor cleanups.
Create some macros, reorder some functions. Make sure all fragments are
correctly released at the end.

This commit was SVN r12926.
2006-12-26 18:15:24 +00:00
George Bosilca
75a35ed7ee Implement the PUT protocol over MX. The send/receive approach give the best
performance on a 2G Myrinet card, as it look like pipelining the messages
by 1M is faster than a simple send/receive. However, when using a 10G card
the send/receive will limit the maximum bandwidth to 2.5Gbs. The reason is
the scarce bus resources that have to be shared between the Myrinet hardware
and the memcpy operation. The PUT protocol remove the memcpy, we now have a 
true zero-copy mechanism. But, there is no pipelining yet as it look like the
RDMA pipeline somehow disappeared from the OB1 PML ...

This commit was SVN r12925.
2006-12-24 22:52:46 +00:00
George Bosilca
e8bd985870 Add more output when calls to the MX library fails.
Move the connection status from theproc into the endpoint.

This commit was SVN r12924.
2006-12-24 22:34:48 +00:00
George Bosilca
14dc72f595 Allow the user to change the MX flags.
This commit was SVN r12923.
2006-12-24 22:21:00 +00:00
George Bosilca
dbe2798638 Allow MX to handle shared memory and self communications. By default these features
are disabled (btl_mx_shared_mem respectively btl_mx_self have to be set in order
to activate them).

This commit was SVN r12922.
2006-12-24 22:18:41 +00:00
Brian Barrett
287b5ee27c Add news item about MX BTL fix
This commit was SVN r12921.
2006-12-23 05:23:56 +00:00
Jelena Pjesivac-Grbovic
3494e1bb05 - Updated decision function for Alltoall collective.
Fixes "jump" for intermediate sizes message on 24+ number of nodes
    (at least on Grig cluster).

This commit was SVN r12920.
2006-12-22 19:59:17 +00:00
George Bosilca
b1725e02d4 No more warnings plus some code reordering.
This commit was SVN r12919.
2006-12-21 22:42:15 +00:00
Edgar Gabriel
dc532577db Adding more accurate checking of the input parameters for the
add_error_class and add_error_code files. Also fixed the update of the
lastusedcode attribute, all of work according to my tests pretty fine.

Please note: the testcode attached to the bug 683 still reports some bugs. I
am however pretty sure that the testcode is wrong at that points:
 - the standard says that the attribute MPI_LASTUSEDCODE has to be updated for
 a new error_class or a new error_code. The test currently assumes, that only
 the add_error_code call changes the attribute value.
 - you have to comment out the two lines 73 and 74 in order to make the
 test finish, since these lines check for the error string of non-existent
 codes.
- line 126 the error-string of MPI_ERR_ARG is not "invalid argument" but a
little bit more, so the test thinks the output is wrong. So probably the test
has to be update to match the according error string of MPI_ERR_ARG.

Fixes trac:682

This commit was SVN r12913.

The following Trac tickets were found above:
  Ticket 682 --> https://svn.open-mpi.org/trac/ompi/ticket/682
2006-12-21 19:36:31 +00:00
Jelena Pjesivac-Grbovic
f1aec23507 Adding tuned allgather implementation.
It contains four algorithms: 
Bruck (ciel(logP) steps), Recursive Doubling (log(P) for power-of-2 processes), Ring (P-1 steps),
and Neighbor Exchange (P/2 steps for even number of processes).

All algorithms passed occ, IMB-2.3, and intel verification tests from ompi-tests/ for up to 56 processes.
The fixed decision function is based on results collected over MX on the Grig cluster at 
the University of Tennessee at Knoxville.  
I have also added (and commented out) copy of MPICH2 decision function for allgather
(from their IJHPCA 2005 paper).

This commit was SVN r12910.
2006-12-21 18:40:02 +00:00
Brian Barrett
7880353fcc Need to close every endpoint we open, or the MX progress thread doesn't die,
which can cause segfaults on shutdown.  Calling mx_finalize() isn't enough
to shutdown the thread, so must close endpoints as well.

Refs trac:513

This commit was SVN r12908.

The following Trac tickets were found above:
  Ticket 513 --> https://svn.open-mpi.org/trac/ompi/ticket/513
2006-12-21 18:13:22 +00:00
Galen Shipman
faa0bafa96 modify preconnect to use a rotating ring algorithm, OOB connections are
brought up lazily so we want to be a bit less agressive. 

This commit was SVN r12906.
2006-12-21 01:36:57 +00:00
Li-Ta Lo
6df4e80727 new XCPU PLS and SDS to work with libxcpu
This commit was SVN r12905.
2006-12-21 00:05:36 +00:00
Li-Ta Lo
eac3520d00 update xcpu autoconfig checks for libxcpu and friends
This commit was SVN r12903.
2006-12-20 22:56:25 +00:00
Rolf vandeVaart
657f0b2d51 Modify the examples Makefile so that it works with Sun's
make as well as gmake.  To do this, added two new MACORS
and modified one dynamic variable. 

Refs trac:533

This commit was SVN r12900.

The following Trac tickets were found above:
  Ticket 533 --> https://svn.open-mpi.org/trac/ompi/ticket/533
2006-12-20 21:31:30 +00:00
Gleb Natapov
484c6a2c1a Use OPAL_ALIGN() macro to align length. Return address from mpool_alloc is now
properly aligned so no need to align it once more.

This commit was SVN r12899.
2006-12-19 08:34:48 +00:00
Brian Barrett
c9a20d7a5e Make sure to distribute align.h. Fixes build failure.
This commit was SVN r12898.
2006-12-19 02:42:01 +00:00
Brian Barrett
2ab65eb521 Remove some debugging output that was #if 0'ed out but shouldn't have been
committed into the trunk anyway

This commit was SVN r12897.
2006-12-19 02:34:41 +00:00
Gleb Natapov
b910d10a81 Add general functions for alignment and change rdma_mpool_align to always
honor an alignment event if posix_memalign() is not available.

This commit was SVN r12892.
2006-12-18 10:52:18 +00:00
Brian Barrett
414a87b14c On OS X, the lowest 8 bits of the exit status are for signals, so pushing
rc (which is -1 or 4 if we hit this case) resulted in an odd error that a
signal killed the proc (instead of a startup error, as is reality).
Instead, use the W_EXITCODE macro (if available) to build up an exit
code that has an error code for exit status, but does not make it look
like the process died from a signal

This commit was SVN r12890.
2006-12-18 02:30:05 +00:00
Brian Barrett
bc6cec346f Print out the description of the signal from mpirun when a proc was aborted
by a signal if we have strsignal()

This commit was SVN r12888.
2006-12-17 20:01:11 +00:00
Brian Barrett
299fc7149e Argh! screwed up a merge. Fix compile error
Refs trac:538

This commit was SVN r12887.

The following Trac tickets were found above:
  Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538
2006-12-17 19:50:20 +00:00
Brian Barrett
edbce8bfec Don't get the hostname from the environment as SLURM doesn't update that
environment variable, so it's not so useful (arg!).  Instead, get the
hostname during opal_init().  Don't want to call gethostname() during
the signal handler.  While we're at it, only print the machine name
so that the output isn't so wide

Refs trac:538

This commit was SVN r12886.

The following Trac tickets were found above:
  Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538
2006-12-17 19:48:19 +00:00
Brian Barrett
f28391f9ae Hide the fact we're not printing two levels of trace
Refs trac:538

This commit was SVN r12885.

The following Trac tickets were found above:
  Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538
2006-12-17 19:31:08 +00:00
Brian Barrett
5fb6183e64 * Write out stackframe number when using backtrace_buffer()
* Print out header that a signal was received

Refs trac:538

This commit was SVN r12884.

The following Trac tickets were found above:
  Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538
2006-12-17 19:27:57 +00:00
Brian Barrett
b34042a887 Changes to the information printed when a signal occurs:
* Have darwin backtrace code return an error when buffer() is
    called, since it is not imnplemented
  * Print out hostname & pid when giving signal information
  * If backtrace_buffer() is implemented, use that instead of
    backtrace_print() and prefix stacktrace with the hostname
  * Make the signal information printed be more user friendly
  * If we're using the backtrace_buffer() code, don't print 
    the last two functions (which will be show_stackframe()
    then backtrace_buffer()) so that users won't keep thinking
    the error occurred inside Open MPI (sneaky, yes...)

Refs trac:538

This commit was SVN r12883.

The following Trac tickets were found above:
  Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538
2006-12-17 19:14:13 +00:00
Brian Barrett
2206fc3a45 Ignore the usual files...
This commit was SVN r12881.
2006-12-17 17:31:41 +00:00
Brian Barrett
c554638446 Support systems without malloc.h or posix_memalign (ie, pretty much every
one we support that isn't Linux)

This commit was SVN r12880.
2006-12-17 17:28:59 +00:00
Brian Barrett
b448b4e47e More heterogeneous fixes. Don't set reachability bit on a remote proc
if the remote architecture differs from the local architecture and the
btl doesn't support heterogeneous transport.

Refs trac:587

This commit was SVN r12879.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2006-12-17 17:27:08 +00:00
Gleb Natapov
190e7a27cd Merge with gleb-mpool branch. All RDMA components use same mpool now (rdma).
udapl/openib/vapi/gm mpools a deprecated. rdma mpool has parameter that allows
to limit its size mpool_rdma_rcache_size_limit (default is 0 - unlimited).

This commit was SVN r12878.
2006-12-17 12:26:41 +00:00
Brian Barrett
f1fdd7c041 Handle case where remote process is of different architecture than the local
process when creating a datatype from an internal description.

Refs trac:640

This commit was SVN r12877.

The following Trac tickets were found above:
  Ticket 640 --> https://svn.open-mpi.org/trac/ompi/ticket/640
2006-12-17 04:39:16 +00:00
Brian Barrett
0653dc3f24 Pad headers to eliminate heterogeneous issues. Add conversion functions
for switching endianness of headers.  Galen is going to add the code to
use the endian stuff...

This commit was SVN r12876.
2006-12-17 00:50:59 +00:00
Ralph Castain
a0ef517550 Fix some errors in the bproc components that prevented compiling. Thought I had already done this, but either those changes were lost when I did the merge, or my old man's memory is fading....
Whaz-at??? :-)

This commit was SVN r12874.
2006-12-15 19:40:04 +00:00
Brian Barrett
01e8fc5f91 Redo of r12871, without the preconnect code change:
Move the req_mtl structure back to the end of each of the structures in 
the CM PML. The req_mtl structure is cast into a mtl_*_request_structure 
for each MTL, which is larger than the req_mtl itself. The cast will cause
the *_request to overwrite parts of the heavy requests if the req_mtl
isn't the *LAST* thing on each structure (hence the comment). This was 
moved as an optimization at some point, which caused buffer sends to fail...

Refs trac:669

This commit was SVN r12873.

The following SVN revision numbers were found above:
  r12871 --> open-mpi/ompi@597598b712

The following Trac tickets were found above:
  Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
2006-12-15 17:54:14 +00:00
Brian Barrett
bdf0b231b2 Undo r12871, as it contained some code in ompi/runtime that shouldn't have been
committed

Refs trac:669

This commit was SVN r12872.

The following SVN revision numbers were found above:
  r12871 --> open-mpi/ompi@597598b712

The following Trac tickets were found above:
  Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
2006-12-15 17:52:13 +00:00
Brian Barrett
597598b712 Move the req_mtl structure back to the end of each of the structures in the
CM PML.  The req_mtl structure is cast into a mtl_*_request_structure for
each MTL, which is larger than the req_mtl itself.  The cast will cause
the *_request to overwrite parts of the heavy requests if the req_mtl
isn't the *LAST* thing on each structure (hence the comment).  This was
moved as an optimization at some point, which caused buffer sends to
fail...

Refs trac:669

This commit was SVN r12871.

The following Trac tickets were found above:
  Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
2006-12-15 17:46:53 +00:00
Ralph Castain
1e1d0e8a89 Set the app_num attribute into the process environment so we pick it up on the other end
This commit was SVN r12868.
2006-12-15 16:43:52 +00:00
Ralph Castain
677d1260aa cleanup nicely if we don't launch
This commit was SVN r12867.
2006-12-15 14:03:53 +00:00
Ralph Castain
cbb660504c Retain the ability to run valgrind on the bproc launcher - do not call bproc_version if "nolaunch" is specified.
This commit was SVN r12866.
2006-12-15 14:01:21 +00:00
Ralph Castain
64ec238b7b Repair support for Bproc 4 on 64-bit systems. Update the SMR framework to actually support the begin_monitoring API. Implement the get/set_node_state APIs.
This commit was SVN r12864.
2006-12-15 02:34:14 +00:00
Rainer Keller
b99e5a71d1 - Help message in case of MPI-application with two init or
calling init functions after finalize.

This commit was SVN r12858.
2006-12-14 19:58:04 +00:00
Brad Benton
18da4c40d3 Set the QP's static rate from the associated MCA parameter, rather
than just defaulting to 0.

Fixes trac:675

This commit was SVN r12855.

The following Trac tickets were found above:
  Ticket 675 --> https://svn.open-mpi.org/trac/ompi/ticket/675
2006-12-14 19:42:24 +00:00
Rolf vandeVaart
c51c36c4a2 These changes fix two issues.
1. For OS's without the dirent.d_type field, we were potentially
not initializing a filename string.  This could result in a 
directory not being cleaned up.
2. Potential memory leaks in filename strings that were allocated.

Refs trac:678

This commit was SVN r12853.

The following Trac tickets were found above:
  Ticket 678 --> https://svn.open-mpi.org/trac/ompi/ticket/678
2006-12-14 18:27:27 +00:00
Brian Barrett
38c2e43ac2 Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues...
This commit was SVN r12852.
2006-12-14 18:20:43 +00:00
Jeff Squyres
0ca8cb35b7 Fixes trac:366
Add ability for ini files to recognize "use_eager_rdma" flag.  Set the
default to "no" (because we should assume that HCAs cannot support the
property necessary for using RDMA for eager messages -- that the last
byte of the message is guaranteed to be written to memory last --
unless proven otherwise.  For example, iWARP cards apparently do not
provide this guarantee), and then set all Mellanox and IBM HCAs to
override the default to enable this behavior on these cards.

This commit was SVN r12851.

The following Trac tickets were found above:
  Ticket 366 --> https://svn.open-mpi.org/trac/ompi/ticket/366
2006-12-14 15:52:13 +00:00
Dan Lacher
e3f749acc4 Ticket: #673
Submitted by: Dan Lacher

This commit was SVN r12844.
2006-12-13 20:01:16 +00:00
Ralph Castain
7b8f445e13 Modify the "--display-map-at-launch" option to just "--display-map". Now that we have a "--do-not-launch" option, the "-at-launch" part of the display-map option was confusing. "--display-map" displays the resulting process map before we launch anyway, so this is clearer.
This commit was SVN r12840.
2006-12-13 13:49:15 +00:00
Ralph Castain
82946cb220 Add a new option to orterun: "--do-not-launch" directs the system to do the allocation, map, job setup, etc., but don't actually launch the job. This lets us test all the setup portions of the code.
Also, take the first step in updating how we handle mca params in ORTE - bring it closer to how it is done in the other two layers. Much more work to be done here.

This commit was SVN r12838.
2006-12-13 04:51:38 +00:00