
47 commits

Author SHA1 Message Date
Gleb Natapov
3ebaff8dfe Implement new BTL parameters:
We eagerly send data up to btl_*_eager_limit with the match.
Upon ACK of the MATCH, we start using send/receives of size
btl_*_max_send_size up to the btl_*_rdma_pipeline_offset.
After the btl_*_rdma_pipeline_offset, we begin using RDMA writes of
size btl_*_rdma_pipeline_frag_size.

Now, on a per-message basis, we only use the above protocol if the
message is larger than btl_*_min_rdma_pipeline_size.

btl_*_eager_limit -> same
btl_*_max_send_size -> same
btl_*_rdma_pipeline_offset -> btl_*_min_rdma_size
btl_*_rdma_pipeline_frag_size -> btl_*_max_rdma_size

btl_*_min_rdma_pipeline_size is new.

This patch also moves all BTL common parameters initialisation into
btl_base_mca.c file.

This commit was SVN r14681.
2007-05-17 07:54:27 +00:00
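The protocol described in commit 3ebaff8dfe above can be sketched roughly as follows. This is only an illustrative model, not Open MPI code: the struct, the enum, and both function names are hypothetical, the offsets are assumed to be absolute byte offsets into the message, and the commit message does not say what happens to messages that are not larger than btl_*_min_rdma_pipeline_size, so that case is reported as unspecified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the per-BTL limits named in the commit message. */
struct btl_limits {
    size_t eager_limit;
    size_t max_send_size;
    size_t rdma_pipeline_offset;
    size_t rdma_pipeline_frag_size;
    size_t min_rdma_pipeline_size;
};

enum phase {
    PHASE_EAGER,       /* data sent eagerly with the match */
    PHASE_SEND,        /* send/receive fragments after the ACK */
    PHASE_RDMA_WRITE,  /* RDMA writes past the pipeline offset */
    PHASE_UNSPECIFIED  /* message too small; not described in the commit */
};

/* Which phase covers the byte at `offset` of a `msg_size`-byte message.
 * Assumption: all limits are absolute byte offsets into the message. */
enum phase phase_at(const struct btl_limits *l, size_t msg_size, size_t offset)
{
    assert(l != NULL && offset < msg_size);
    if (msg_size <= l->min_rdma_pipeline_size) {
        return PHASE_UNSPECIFIED;
    }
    if (offset < l->eager_limit) {
        return PHASE_EAGER;
    }
    if (offset < l->rdma_pipeline_offset) {
        return PHASE_SEND;
    }
    return PHASE_RDMA_WRITE;
}

/* Upper bound on a single transfer in the given phase. */
size_t fragment_cap(const struct btl_limits *l, enum phase p)
{
    assert(l != NULL);
    switch (p) {
    case PHASE_EAGER:      return l->eager_limit;
    case PHASE_SEND:       return l->max_send_size;
    case PHASE_RDMA_WRITE: return l->rdma_pipeline_frag_size;
    default:               return 0;
    }
}
```

A caller would walk a message by repeatedly asking phase_at() for the current offset and advancing by at most fragment_cap() bytes.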
Josh Hursey
8f119d9063 Closes trac:977
Fix for memory corruption in the restarted process stack. This stemmed from
the brute force method we were previously using. This commit fixes this by
using a lighter weight solution focused in the r2 BML instead of above the PML.
This is a more efficient and flexible solution, and it solves the original
problem.

In the process I pulled out the ft_event function in the tcp BTL and r2 BML
into a set of *_ft.[c|h] files just to keep any updates to these code paths
as isolated as possible to make merging easier on everyone.

This commit was SVN r14371.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855

The following Trac tickets were found above:
  Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977
2007-04-14 02:06:05 +00:00
Josh Hursey
38547459ae Improve the cleanup process in ob1
Remove a redundant statement in the r2 BML.

This commit was SVN r14228.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2007-04-05 17:37:29 +00:00
Brian Barrett
e283e6f9d9 Retry of r14142, without the one-sided code...
Back out r14073 - it speeds up TCP latency / bandwidth but at the same time 
it kills ROMIO and one-sided performance when using only TCP. The problem 
is that it only allows those two to be progressed every couple of seconds, 
leading to what looks like hangs in the one-sided tests (and the ROMIO stuff, 
although people seem to not notice that at this point). 

This commit was SVN r14144.

The following SVN revision numbers were found above:
  r14073 --> open-mpi/ompi@64fbbc20b8
  r14142 --> open-mpi/ompi@241545a098
2007-03-26 16:01:27 +00:00
Brian Barrett
62e5e81e99 revert r14142, as the onesided change should *not* have come over
This commit was SVN r14143.

The following SVN revision numbers were found above:
  r14142 --> open-mpi/ompi@241545a098
2007-03-26 15:58:41 +00:00
Brian Barrett
241545a098 Back out r14073 - it speeds up TCP latency / bandwidth but at the same time
it kills ROMIO and one-sided performance when using only TCP.  The problem
is that it only allows those two to be progressed every couple of seconds,
leading to what looks like hangs in the one-sided tests (and the ROMIO stuff,
although people seem to not notice that at this point).

This commit was SVN r14142.

The following SVN revision numbers were found above:
  r14073 --> open-mpi/ompi@64fbbc20b8
2007-03-26 15:56:23 +00:00
George Bosilca
64fbbc20b8 Switch the event engine to a blocking mode if there are no high-performance
networks available.

This commit was SVN r14073.
2007-03-20 11:15:08 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Josh Hursey
c573171b7d Mostly a cleanup commit.
- Implement the BML/r2 finalize function
- Clean up the btl close routine
- Wire up a pml_base_verbose MCA parameter so you can actually watch the PML selection logic if you really want to.
- Fix a potential segfault in the selection logic.
  ompi_pointer_array_get_item() may return NULL, so we have to check for it

This commit was SVN r13734.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2007-02-21 16:18:43 +00:00
Rainer Keller
125ba1acfa - Reduce the number of warnings with -Wshadow, mainly due to
the use of index and abs in inline functions in header files.

This commit was SVN r13217.
2007-01-19 19:48:06 +00:00
Gleb Natapov
4c7dbd36c7 Balance RDMA operations in round-robin fashion between all available RDMA BTLs.
OB1 always uses the first element from the array of BTLs available for RDMA. The patch
changes the array creation algorithm so that it puts a different BTL in the first element
in round-robin fashion.

This commit was SVN r13174.
2007-01-18 09:15:18 +00:00
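One way to picture the array change in commit 4c7dbd36c7 above is the sketch below. The commit only says that a different BTL ends up in the first element each time; the rotation used here, the function name, and the caller-owned counter are assumptions made for illustration, not the actual OB1 code.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[0..n-1] with the indices 0..n-1 of the available RDMA BTLs,
 * rotated so that a different BTL lands in out[0] on each call.
 * `next_first` is a hypothetical counter kept by the caller. */
void build_rdma_order(size_t n, size_t *next_first, size_t *out)
{
    size_t start;
    size_t i;

    assert(n > 0 && next_first != NULL && out != NULL);
    start = *next_first % n;
    for (i = 0; i < n; i++) {
        out[i] = (start + i) % n;
    }
    /* The next array built will start one BTL later. */
    *next_first = (start + 1) % n;
}
```

Because the consumer keeps reading only element 0, rotating the contents at creation time spreads the load without touching the consumer.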
Brian Barrett
c010119667 If a BTL isn't needed due to exclusivity ranking, we need to call a matching
inuse decrement for the increment that was at the start of the procs loop.
Otherwise, the inuse count can end up higher than it actually is and a btl
can end up in the progress loop when it isn't active to any peer.

Refs trac:543

This commit was SVN r12938.

The following Trac tickets were found above:
  Ticket 543 --> https://svn.open-mpi.org/trac/ompi/ticket/543
2006-12-29 02:22:40 +00:00
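The bookkeeping fixed by commit c010119667 above follows a simple pattern, sketched here with a hypothetical struct and function. The comparison rule (a BTL is skipped when a BTL with higher exclusivity has already been selected for the peer) is an assumption for illustration; the commit message only states that the increment needs a matching decrement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a BTL module's bookkeeping. */
struct btl_usage {
    int exclusivity;  /* ranking used to decide whether the BTL is needed */
    int inuse;        /* number of peers this BTL is active for */
};

/* Consider one BTL for one peer. Returns 1 if the BTL is kept, 0 if not. */
int consider_btl(struct btl_usage *btl, int selected_exclusivity)
{
    assert(btl != NULL);

    btl->inuse++;  /* increment made at the start of the per-proc loop */

    if (btl->exclusivity < selected_exclusivity) {
        /* Not needed for this peer: undo the increment so the count
         * reflects only peers the BTL is really active for. */
        btl->inuse--;
        return 0;
    }
    return 1;
}
```

With the decrement in place, a BTL rejected for every peer finishes with an inuse count of 0 and stays out of the progress loop.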
Gleb Natapov
190e7a27cd Merge with gleb-mpool branch. All RDMA components now use the same mpool (rdma).
The udapl/openib/vapi/gm mpools are deprecated. The rdma mpool has a parameter that limits
its size: mpool_rdma_rcache_size_limit (default is 0 - unlimited).

This commit was SVN r12878.
2006-12-17 12:26:41 +00:00
Galen Shipman
813e7faea8 more fixes for failover.. and yet still more to come..
This commit was SVN r12450.
2006-11-06 21:27:17 +00:00
George Bosilca
3f0a7cad9e The last patch for Windows support. Mostly casting and conversion to C++ friendly headers.
This commit was SVN r11400.
2006-08-24 16:38:08 +00:00
George Bosilca
6afa4c6c64 Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.

This commit was SVN r11270.
2006-08-20 15:54:04 +00:00
Galen Shipman
e5c594c211 More updates for the async error handler for btl's
In order to provide backwards compatibility the framework versions are bumped
and the handler registration function is at the end of the btl struct.
Testing done on sm, openib, and gm..

This commit was SVN r11256.
2006-08-17 22:02:01 +00:00
Galen Shipman
3b49953ce2 Add error callback to the btl interface. This allows errors to be delivered to
the upper layer asynchronously, although there are some issues with this.. such
as there being multiple consumers of the btl's.. who gets the

This commit was SVN r11232.
2006-08-16 20:21:38 +00:00
Brian Barrett
dd6fa1da2a * Fix for ticket #242, print a friendly error message if we can't reach
a particular peer.  Will now fail during MPI_INIT.  Printing of the
  error messages about no endpoints can be turned off.

This commit was SVN r11181.
2006-08-14 19:17:36 +00:00
Brian Barrett
47725c9b02 * Add new PML (CM) and network drivers (MTL) for high speed
interconnects that provide matching logic in the library.
  Currently includes support for MX and some support for
  Portals
* Fix overuse of proc_pml pointer on the ompi_proc structure,
  splitting into proc_pml for pml data and proc_bml for
  the BML endpoint data
* bug fixes in bsend init code, which wasn't being used by
  the OB1 or DR PMLs...

This commit was SVN r10642.
2006-07-04 01:20:20 +00:00
George Bosilca
4df58b5579 Latency is LATENCY as everybody understands it, not some percentage of something. Now we really
order the BTLs depending on the real latency for the eager protocol. From now on, the
latency one can specify for the devices will be in microseconds, while the bandwidth is in Mbs
(as it was before).

This commit was SVN r10566.
2006-06-29 15:13:58 +00:00
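The ordering described in commit 4df58b5579 above amounts to sorting the BTLs by ascending latency. The sketch below uses the standard qsort(); the struct layout and function names are hypothetical, and since the commit does not say how ties are broken, none is defined here (qsort is not a stable sort).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-BTL record; units follow the commit message. */
struct btl_entry {
    const char *name;
    unsigned latency_us;     /* latency in microseconds */
    unsigned bandwidth_mbs;  /* bandwidth in Mbs, as before */
};

/* qsort comparator: lower latency sorts first. */
static int by_latency(const void *a, const void *b)
{
    const struct btl_entry *x = a;
    const struct btl_entry *y = b;
    return (x->latency_us > y->latency_us) - (x->latency_us < y->latency_us);
}

/* Order the BTLs for the eager protocol: lowest latency first. */
void order_for_eager(struct btl_entry *btls, size_t n)
{
    assert(btls != NULL || n == 0);
    if (n > 1) {
        qsort(btls, n, sizeof btls[0], by_latency);
    }
}
```

After the call, btls[0] is the lowest-latency device and would be the first choice for eager sends.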
Galen Shipman
e6cd8db0e5 DR will now checksum on a per btl basis (see MCA_BTL_FLAGS_NEED_CSUM). We
still always send ACK's, teasing apart completion for ACK/no ACK looks like a
pain in the .. 

This commit was SVN r10530.
2006-06-27 20:23:47 +00:00
George Bosilca
41c886399b Don't let the user specify flags that do not make sense. If the PUT flag is
specified, check that the put function is available for the BTL. Same safety check for
the GET function. At the end, make sure that at least one communication protocol is
specified; otherwise force the send flag.

This commit was SVN r10507.
2006-06-26 20:00:18 +00:00
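The checks listed in commit 41c886399b above could look something like the sketch below. The flag constants, the struct, and the function names are hypothetical stand-ins rather than the real BTL definitions, and clearing a flag whose function is missing is an assumption: the commit only says the availability is checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, not the real BTL constants. */
enum { FLAG_SEND = 0x1, FLAG_PUT = 0x2, FLAG_GET = 0x4 };

typedef int (*xfer_fn)(void);

/* Hypothetical view of a BTL: user-requested flags plus its entry points. */
struct btl_flags_model {
    unsigned flags;
    xfer_fn put;  /* NULL when the BTL has no put function */
    xfer_fn get;  /* NULL when the BTL has no get function */
};

/* Placeholder transfer function for building examples. */
int noop_transfer(void)
{
    return 0;
}

/* Return the user-requested flags reduced to what the BTL can do. */
unsigned sanitize_flags(const struct btl_flags_model *btl)
{
    unsigned flags;

    assert(btl != NULL);
    flags = btl->flags;

    if ((flags & FLAG_PUT) && btl->put == NULL) {
        flags &= ~(unsigned)FLAG_PUT;  /* assumption: drop unsupported PUT */
    }
    if ((flags & FLAG_GET) && btl->get == NULL) {
        flags &= ~(unsigned)FLAG_GET;  /* assumption: drop unsupported GET */
    }
    if ((flags & (FLAG_SEND | FLAG_PUT | FLAG_GET)) == 0) {
        flags |= FLAG_SEND;            /* no protocol left: force send */
    }
    return flags;
}
```

Running the fallback last guarantees that every BTL ends up with at least one usable protocol, whatever the user asked for.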
George Bosilca
e43fbd0082 Remove all useless variables. Minor cleanups.
This commit was SVN r10000.
2006-05-21 05:53:22 +00:00
Galen Shipman
9165882c07 fixes for failover...
This commit was SVN r9998.
2006-05-20 02:39:05 +00:00
Tim Woodall
161e54e6c8 finalize/cleanup failed btl
This commit was SVN r9819.
2006-05-04 18:48:45 +00:00
Tim Woodall
fdd622544b added optional copy routine to allow "derived" class
of mca_bml_base_endpoint to copy state if an endpoint
is updated (e.g. btl deleted/added)

This commit was SVN r9814.
2006-05-04 15:19:12 +00:00
Galen Shipman
5271948ec0 --- opal object changes
add object size to opal class
no longer need the size when allocating a new object as this is stored in
the class structure

--- dr changes 
Previous rev. maintained state on the communicator used for acking duplicate
fragments, but the communicator may be destroyed prior to successful
delivery of an ack to the peer. We must therefore maintain this state
globally on a per-peer basis, not a per-peer, per-communicator basis.
This requires that we use a global rank on the wire and translate this as
appropriate to a local rank within the communicator. 

This commit was SVN r9454.
2006-03-29 16:19:17 +00:00
Tim Woodall
c1bf71b1be - updated copyrights
- removed unused state
- starting to add support for btl failover

This commit was SVN r9431.
2006-03-27 22:48:12 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Galen Shipman
44fe6c3896 allow pml pipeline to cache memory registrations
To enable this (off by default), use both of the following:
-mca pml_ob1_leave_pinned_pipeline 1
AND
-mca mpool_use_mem_hooks 1

This commit was SVN r8949.
2006-02-09 15:49:51 +00:00
Tim Woodall
3c170c410c changes required by dr
This commit was SVN r8580.
2005-12-21 15:11:40 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Edgar Gabriel
5d7fbd9d2e minor change in bml_r2_add_procs: the memory for the bml_endpoints structure
has to be allocated outside of the routine. Hence the updated version of pml/ob1/oml_ob1.c

This commit was SVN r7739.
2005-10-12 20:59:25 +00:00
Tim Woodall
4a71621410 merge in scheduling changes from release branch
This commit was SVN r7699.
2005-10-11 20:41:51 +00:00
Galen Shipman
d932cfd342 merge of rcache work into the trunk.. lotsa fun ;-)..
I regression tested before the merge, I will regression test tonight and
correct issues that might have crept in. 

This commit was SVN r7329.
2005-09-12 22:28:23 +00:00
George Bosilca
c9fb1f32f2 And more dependencies fixes. The big commit will follow shortly.
This commit was SVN r7319.
2005-09-12 20:22:59 +00:00
Galen Shipman
4556f5bb7c Fix for multiple calls to add_procs
This commit was SVN r7111.
2005-08-31 19:44:28 +00:00
Jeff Squyres
d6819618ff Minor updates
- Fix compiler warnings
- Fix problem with using "p" instead of "p_index"
- Style updates
- Check return of malloc() for NULL

This commit was SVN r6999.
2005-08-24 10:39:23 +00:00
Tim Woodall
f274f524ab - added get based protocol (if supported by btl) for pre-registered memory
- removed 8 bytes from the majority of the pml headers 

This commit was SVN r6916.
2005-08-17 18:23:38 +00:00
Galen Shipman
f248db3789 misc fixes, changes to support multiple mvapi btl's
This commit was SVN r6890.
2005-08-15 19:39:56 +00:00
George Bosilca
cdee9045c1 ISO C90 forbids mixed declarations and code ...
This commit was SVN r6887.
2005-08-15 17:56:04 +00:00
Galen Shipman
0b0e4874d0 cleanup comments and minor adjustments.
This commit was SVN r6882.
2005-08-15 15:19:07 +00:00
Galen Shipman
8e1e2eec3d Misc fixes for threaded builds..
This commit was SVN r6874.
2005-08-14 19:03:09 +00:00
Galen Shipman
c3fcf25508 fix to add_procs
This commit was SVN r6846.
2005-08-12 21:42:55 +00:00
Galen Shipman
df78b8957c Misc fixes.. no need for procs hashtable in bml_r2..
This commit was SVN r6833.
2005-08-12 16:59:15 +00:00
Galen Shipman
c3c83aa3e1 BML (BTL Management Layer). Allows BTLs to be used outside of the PML. See
bml.h and PML-OB1 for usage.

This commit was SVN r6815.
2005-08-12 02:41:14 +00:00