1
1

2685 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
e283e6f9d9 Retry of r14142, without the one-sided code...
Back out r14073 - it speeds up TCP latency / bandwidth but at the same time 
it kills ROMIO and one-sided performance when using only TCP. The problem 
is that it only allows those two to be progressed every couple of seconds, 
leading to what looks like hangs in the one-sided tests (and the ROMIO stuff, 
although people seem to not notice that at this point). 

This commit was SVN r14144.

The following SVN revision numbers were found above:
  r14073 --> open-mpi/ompi@64fbbc20b8
  r14142 --> open-mpi/ompi@241545a098
2007-03-26 16:01:27 +00:00
Brian Barrett
62e5e81e99 revert r14142, as the onesided change should *not* have come over
This commit was SVN r14143.

The following SVN revision numbers were found above:
  r14142 --> open-mpi/ompi@241545a098
2007-03-26 15:58:41 +00:00
Brian Barrett
241545a098 Back out r14073 - it speeds up TCP latency / bandwidth but at the same time
it kills ROMIO and one-sided performance when using only TCP.  The problem
is that it only allows those two to be progressed every couple of seconds,
leading to what looks like hangs in the one-sided tests (and the ROMIO stuff,
although people seem to not notice that at this point).

This commit was SVN r14142.

The following SVN revision numbers were found above:
  r14073 --> open-mpi/ompi@64fbbc20b8
2007-03-26 15:56:23 +00:00
Josh Hursey
7c4ca3c420 remove some stale code
This commit was SVN r14134.
2007-03-23 14:11:12 +00:00
Gleb Natapov
e5450613b5 Add new SM BTL parameter btl_sm_cb_max_num. If set to value greater then zero
it limits the number of circular buffers allocated between each pair of peers.
This allows for more tight memory usage control.

This commit was SVN r14120.
2007-03-22 12:21:42 +00:00
Gleb Natapov
efe0323d35 Initialize fifos at SM BTL init time instead of waiting for first send. This
waist slightly more memory, but prevents problem when fifo cannot be allocated
later during a job run when memory resource is exhausted.

This commit was SVN r14119.
2007-03-22 12:18:44 +00:00
Galen Shipman
ace68b1883 Change the way we handle unexpected messages,
if less than or equal  pml_ob1_unexpected_limit just buffer in the PML level recv
fragment else allocate a buffer via the bucket allocator 

This commit was SVN r14117.
2007-03-22 01:00:34 +00:00
Gleb Natapov
c389c47d79 Fix SM connectivity calculations.
This commit was SVN r14109.
2007-03-21 13:29:19 +00:00
Jeff Squyres
3e2031e0e3 Finally commit something that has been sitting around in one of my
development trees since last year (had to wait for some intel tests to
run yesterday, so I finally took the time to finish this work):

 * Improve MPI API argument checking by also checking for NULL values
   (especially helps when invalid Fortran MPI handles are passed,
   because the various MPI_*f2c functions are supposed to return an
   "invalid" MPI handle [meaning NULL] when this happens).  So now
   OMPI will generate an MPI exception rather than a segv.
 * Removed a few redundant DATATYPE_NULL checks.
 * Also check for some other forms of "invalid" handles (e.g., already
   been freed, etc.) in some cases.  We could probably be a bit more
   stringent in this regard if we really wanted to.
 * Change MPI_Get_processor_name to zero out the string up to
   MPI_MAX_PROCESSOR_NAME characters, because the MPI spec says that
   the string must be at least that long.  We were already passing
   that length to gethostname(), anyway.

This commit was SVN r14100.
2007-03-21 11:10:42 +00:00
Gleb Natapov
a1a14aa4c3 Add memory barriers during SM btl initialization.
This commit was SVN r14099.
2007-03-21 10:25:10 +00:00
Gleb Natapov
435565590f Don't relay on opcode to decide how to progress pending message.
This commit was SVN r14098.
2007-03-21 07:59:59 +00:00
Josh Hursey
299332ecac fix small compiler warning
This commit was SVN r14097.
2007-03-21 04:44:54 +00:00
Brian Barrett
464d536928 remove debugging printf
This commit was SVN r14088.
2007-03-20 21:28:28 +00:00
Josh Hursey
3492fdeae3 Fix a couple of compiler warnings (errors?) caught by ICC testing at Cisco.
This commit was SVN r14080.
2007-03-20 14:12:13 +00:00
George Bosilca
8c9e4baa47 Add multi-link capabilities to the TCP BTL. This is useful for systems where the
latency is high and the network relatively fast. This will allow for more kernel
level buffering, which allow overlap between system calls and communications.
Somehow, even on fast clusters there is an improvement (non significant).

This patch create multiple modules for the same device, which in turn will
create multiple sockets between the peers. By default the number of BTL by
device is set to 1, so there is no fundamental difference with the current
version. Change the value of btl_tcp_links to enable multiple links between
peers.

This commit was SVN r14076.
2007-03-20 11:50:17 +00:00
George Bosilca
0edd770644 Nothing really relevant.
This commit was SVN r14075.
2007-03-20 11:21:23 +00:00
George Bosilca
4332295b32 Typos.
This commit was SVN r14074.
2007-03-20 11:18:05 +00:00
George Bosilca
64fbbc20b8 Switch the event engine to a blocking mode if there is no high performance
networks available.

This commit was SVN r14073.
2007-03-20 11:15:08 +00:00
Rainer Keller
249abd29c2 - Mark some deprecated functions (two still commented) and fix to
not use opal_cmd_line_make_opt anymore.

This commit was SVN r14072.
2007-03-20 10:08:58 +00:00
Gleb Natapov
e551c5f1a3 Get rid of separate sm BTL for different shared memory base addresses. Now,
when we precalculate most of the addresses there is no point to have separate
BTL for this. The sm_progress() code become much more simple as a result.

This commit was SVN r14071.
2007-03-20 08:15:58 +00:00
Jelena Pjesivac-Grbovic
d6402b6898 Adding in-order binary tree algorithm for non-commutative reduce operations.
I tested algorithm with intel and ibm tests and it passed again - so it should work.

This commit was SVN r14068.
2007-03-19 21:03:57 +00:00
Josh Hursey
e1a18fa149 Patch from Gleb
Always set opcode appropriately before calling ibv_post_send.

This commit was SVN r14056.
2007-03-18 13:33:15 +00:00
Josh Hursey
d03073e87d Make sure to protect the finalize call so tools like ompi_info
do not segv.

This commit was SVN r14054.
2007-03-17 19:47:54 +00:00
Josh Hursey
6d29146748 fix dumb logic break in the PML selection finalization
This commit was SVN r14053.
2007-03-17 16:33:43 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Jeff Squyres
266e805427 * Update parameter checking per MPI-1:2.4.1 and MPI-1:5.4.1 -- also
return an error if MPI_COMM_NULL is used.  
 * Minor style fixes.

This commit was SVN r14041.
2007-03-16 13:09:49 +00:00
Gleb Natapov
1dc1ee3998 Send control credit message over "eager rdma" channel if possible.
This commit was SVN r14032.
2007-03-14 14:38:56 +00:00
Gleb Natapov
1f3ac2d7ae Hold pointers to free_max/free_eager lists in array indexed by priority.
This eliminates couple of ifs from fast path.

This commit was SVN r14031.
2007-03-14 14:36:03 +00:00
Gleb Natapov
8607957df9 Get rid of remaining _hp/_lp stuff. Consolidate HP/LP QP creation code.
This commit was SVN r14030.
2007-03-14 14:33:24 +00:00
Brian Barrett
211ed6e852 Make the trunk look similar to v1.1 and v1.2, but return an error if we
can't find "me" in the list of procs, since we should always be in the
proc_world list, or something bad has happened...

This commit was SVN r14025.
2007-03-13 20:17:10 +00:00
Brian Barrett
f59d38dd81 fix stupid compiler warning
This commit was SVN r14024.
2007-03-13 19:45:26 +00:00
Rolf vandeVaart
42168575fd Fix for the special case where np=2 and the sendbuf is set to MPI_IN_PLACE.
In that case, sendcount and sendtype are not valid and we need to use
recvcount and recvtype.

This commit fixes trac:943.  Reviewed by Jelena Pjesivac-Grbovic.

This commit was SVN r14022.

The following Trac tickets were found above:
  Ticket 943 --> https://svn.open-mpi.org/trac/ompi/ticket/943
2007-03-13 19:01:20 +00:00
Brian Barrett
f6be04ff37 be a bit more careful with parens than the r13992 fix
This commit was SVN r13996.

The following SVN revision numbers were found above:
  r13992 --> open-mpi/ompi@3cbac958eb
2007-03-09 16:39:23 +00:00
Brian Barrett
3cbac958eb fix warning about types
This commit was SVN r13992.
2007-03-09 02:32:22 +00:00
Galen Shipman
8253d83410 make btl template compile again
This commit was SVN r13990.
2007-03-08 21:58:26 +00:00
Galen Shipman
67ba5264f6 ORTE_NAME_ARGS casts to long, not unsigned long.
This commit was SVN r13988.
2007-03-08 21:42:29 +00:00
Galen Shipman
8072dd344c use %ld instead of %d as ORTE_NAME_ARGS does casting to long not unsigned long
This commit was SVN r13987.
2007-03-08 21:41:39 +00:00
Bill D'Amico
53d434d6ab Fix warnings when building with UDAPL - minor formatting errors.
This commit was SVN r13971.
2007-03-08 18:39:40 +00:00
Jeff Squyres
b94a39236b Submitted by Gleb, reviewed by Rich:
Queue_empty is determined by the reader, and is it's local view.
However, the writer may continue writing to this queue.  The decision
to go on to the next cb_fifo is done in an atomic region, checking the
writer's view.  The writer also "changes it's view" in an atomic
region protected by the same lock.

This commit was SVN r13968.
2007-03-08 16:51:59 +00:00
Brian Barrett
e926bed69f Implement MPI_TYPE_CREATE_DARRAY function. Works with MPICH2 darray-pack
test, Sun's darray test, and an internal LANL test code.  I would not
assume it will work properly on other codes, as I'm still not sure I
completely understand what the standard says this function is supposed to
do.

Refs trac:65

This commit was SVN r13967.

The following Trac tickets were found above:
  Ticket 65 --> https://svn.open-mpi.org/trac/ompi/ticket/65
2007-03-08 16:33:08 +00:00
Jelena Pjesivac-Grbovic
9780a000ba Cleanup of generic reduce function and possible (low probability) bug fix.
- fixing line lengths and some of the comments
- possible bug fix (but I do not think we exposed it in any tests so far)
  temporary buffers were allocated as multiples of extent instead of 
  true_extent + (count -1) * extent.
Everything is still passing Intel tests over tcp and btl mx up to 64 nodes.

This commit was SVN r13956.
2007-03-08 00:54:52 +00:00
Jelena Pjesivac-Grbovic
57cbafafd5 Clean up of generic broadcast function: removing unecessary statements and improving comments.
This commit was SVN r13955.
2007-03-07 21:59:53 +00:00
Rolf vandeVaart
333357f4cc This fixes the initialization of the usable size of the shared memory.
The original code was not compensating for the space used by the header.  

When memory got tight, the allocator would return a pointer to memory that 
did not exist resulting in a SEGV for the application.  This is a partial 
fix for ticket #929.

Reviewed by Rich Graham.  

This commit was SVN r13950.
2007-03-07 13:28:06 +00:00
Jelena Pjesivac-Grbovic
0c07654c30 Updating reduce_scatter decision function based on MX results up to 64 nodes and both 1ppn and 2ppn
configurations.

This commit was SVN r13945.
2007-03-07 00:38:33 +00:00
George Bosilca
4b63631535 Allow correct duplication for MPI_UB and MPI_LB. The problem is that we cannot
create a duplicate type, because any duplicate type lose the PREDEFINED flag.
An MPI_LB (respectively MPI_UB) without the PREDEFINED tag is useless, as it's
not the a marker anymore. The solution is to return the same pointer, but once
the reference count has been increased. In order for this to work, I allowed
the destruction to check for the reference count of an object before complaining
about destroying a predefined type.

This fixed ticket #317.

This commit was SVN r13942.
2007-03-06 18:21:49 +00:00
Gleb Natapov
40501f8274 Amend IB parameter checking.
This commit was SVN r13936.
2007-03-06 13:05:12 +00:00
Brian Barrett
9660bb6ccc These symbols aren't actually created in ROMIO with Open MPI's configure, so
no need to have them in here.

This commit was SVN r13933.
2007-03-05 22:55:17 +00:00
Jelena Pjesivac-Grbovic
e5ed167a6e Adding tuned version of reduce_scatter implementation.
Currently 3 algorithms are available:
- non-overlapping, reduce + scatterv, (works for non-commutative operations)
- recursive halving algorithm (copied from basic module)
- ring algorithm  (similar to allreduce ring, for large messages)

This commit was SVN r13929.
2007-03-05 20:40:39 +00:00
Gleb Natapov
be018944d2 Clean up circular buffer implementation. Get rid of _same_base_address()
functions by pre-calculating everything in advance.

This commit was SVN r13923.
2007-03-05 14:27:26 +00:00
Gleb Natapov
8078ae5977 Optimize sm communication. Pass message type (MCA_BTL_SM_FRAG_ACK/
MCA_BTL_SM_FRAG_SEND) and status success/fail in low bits of pointers we
are passing through circular buffer. The rank that receives ACK doesn't need
to look into data it received and this is a big win since this data is not in
the cache of the rank's CPU. (Note that we can use low bits of pointers because
free_list always return pointers aligned at least to cache line size).

This commit was SVN r13922.
2007-03-05 14:24:09 +00:00