Commit graph

1765 commits

Author SHA1 Message Date
George Bosilca
1cb26e3b9c Finally the convertor exports a convenience function to allow a consistent
computation of the current location in the pack/unpack process. This can
be used both for retrieving the pointer to the first byte (in the special
case of the cached RDMA protocol) and for getting the current
position (for the pipelined protocol).

I modified all BTLs, but most of them are still untested.

This commit was SVN r14180.
2007-03-30 22:02:45 +00:00
Galen Shipman
a78672be2b fix mpi_leave_pinned case for arbitrary datatypes
George will be streamlining this with a new convertor function soon... 

This commit was SVN r14174.
2007-03-30 02:06:08 +00:00
Galen Shipman
db63458495 bring disable_sbrk back online, there was a change to properly support AIX
some time ago (last summer) that included checking for M_TRIM_THRESHOLD and
M_MMAP_MAX, unfortunately we didn't include <malloc.h> which is where these
are defined, so disabling sbrk for the registration cache has been busted for
some time. 

This commit was SVN r14169.
2007-03-29 16:11:00 +00:00
George Bosilca
cc65814969 And set the message size before the first use too.
This commit was SVN r14159.
2007-03-28 18:01:13 +00:00
George Bosilca
b540545fa7 Set the communicator size before using it.
This commit was SVN r14158.
2007-03-28 17:59:21 +00:00
George Bosilca
78f362d0d6 Be consistent about the definitions of mca_mpool_base_page_size and
mca_mpool_base_page_size_log. They are exported by the mpool/base/base.h,
if some other code needs them, then it should include this file
instead of having its own redefinition of these externals.

This commit was SVN r14156.
2007-03-28 14:14:05 +00:00
Shiqing Fan
91cfb2f149 A few mismatched declarations are fixed, and several header files are added for Cygwin...
This commit was SVN r14151.
2007-03-27 14:17:25 +00:00
Mohamad Chaarawi
bfaf9d4a12 Added new module for intercomm collectives. This will require an
autogen.

This commit was SVN r14149.
2007-03-27 02:06:42 +00:00
Brian Barrett
e283e6f9d9 Retry of r14142, without the one-sided code...
Back out r14073 - it speeds up TCP latency / bandwidth but at the same time 
it kills ROMIO and one-sided performance when using only TCP. The problem 
is that it only allows those two to be progressed every couple of seconds, 
leading to what looks like hangs in the one-sided tests (and the ROMIO stuff, 
although people seem to not notice that at this point). 

This commit was SVN r14144.

The following SVN revision numbers were found above:
  r14073 --> open-mpi/ompi@64fbbc20b8
  r14142 --> open-mpi/ompi@241545a098
2007-03-26 16:01:27 +00:00
Brian Barrett
62e5e81e99 revert r14142, as the onesided change should *not* have come over
This commit was SVN r14143.

The following SVN revision numbers were found above:
  r14142 --> open-mpi/ompi@241545a098
2007-03-26 15:58:41 +00:00
Brian Barrett
241545a098 Back out r14073 - it speeds up TCP latency / bandwidth but at the same time
it kills ROMIO and one-sided performance when using only TCP.  The problem
is that it only allows those two to be progressed every couple of seconds,
leading to what looks like hangs in the one-sided tests (and the ROMIO stuff,
although people seem to not notice that at this point).

This commit was SVN r14142.

The following SVN revision numbers were found above:
  r14073 --> open-mpi/ompi@64fbbc20b8
2007-03-26 15:56:23 +00:00
Gleb Natapov
e5450613b5 Add new SM BTL parameter btl_sm_cb_max_num. If set to a value greater than zero
it limits the number of circular buffers allocated between each pair of peers.
This allows for tighter memory usage control.

This commit was SVN r14120.
2007-03-22 12:21:42 +00:00
Gleb Natapov
efe0323d35 Initialize fifos at SM BTL init time instead of waiting for first send. This
wastes slightly more memory, but prevents problems when a fifo cannot be allocated
later during a job run when memory is exhausted.

This commit was SVN r14119.
2007-03-22 12:18:44 +00:00
Galen Shipman
ace68b1883 Change the way we handle unexpected messages:
if less than or equal to pml_ob1_unexpected_limit, just buffer in the PML level recv
fragment, else allocate a buffer via the bucket allocator

This commit was SVN r14117.
2007-03-22 01:00:34 +00:00
Gleb Natapov
c389c47d79 Fix SM connectivity calculations.
This commit was SVN r14109.
2007-03-21 13:29:19 +00:00
Gleb Natapov
a1a14aa4c3 Add memory barriers during SM btl initialization.
This commit was SVN r14099.
2007-03-21 10:25:10 +00:00
Gleb Natapov
435565590f Don't rely on opcode to decide how to progress pending message.
This commit was SVN r14098.
2007-03-21 07:59:59 +00:00
Josh Hursey
299332ecac fix small compiler warning
This commit was SVN r14097.
2007-03-21 04:44:54 +00:00
Brian Barrett
464d536928 remove debugging printf
This commit was SVN r14088.
2007-03-20 21:28:28 +00:00
Josh Hursey
3492fdeae3 Fix a couple of compiler warnings (errors?) caught by ICC testing at Cisco.
This commit was SVN r14080.
2007-03-20 14:12:13 +00:00
George Bosilca
8c9e4baa47 Add multi-link capabilities to the TCP BTL. This is useful for systems where the
latency is high and the network relatively fast. This will allow for more kernel
level buffering, which allows overlap between system calls and communications.
Somehow, even on fast clusters there is an improvement (not significant).

This patch creates multiple modules for the same device, which in turn will
create multiple sockets between the peers. By default the number of BTLs per
device is set to 1, so there is no fundamental difference with the current
version. Change the value of btl_tcp_links to enable multiple links between
peers.

This commit was SVN r14076.
2007-03-20 11:50:17 +00:00
George Bosilca
4332295b32 Typos.
This commit was SVN r14074.
2007-03-20 11:18:05 +00:00
George Bosilca
64fbbc20b8 Switch the event engine to a blocking mode if there is no high performance
networks available.

This commit was SVN r14073.
2007-03-20 11:15:08 +00:00
Gleb Natapov
e551c5f1a3 Get rid of separate sm BTL for different shared memory base addresses. Now
that we precalculate most of the addresses, there is no point in having a separate
BTL for this. The sm_progress() code becomes much simpler as a result.

This commit was SVN r14071.
2007-03-20 08:15:58 +00:00
Jelena Pjesivac-Grbovic
d6402b6898 Adding in-order binary tree algorithm for non-commutative reduce operations.
I tested the algorithm with the Intel and IBM tests and it passed again - so it should work.

This commit was SVN r14068.
2007-03-19 21:03:57 +00:00
Josh Hursey
e1a18fa149 Patch from Gleb
Always set opcode appropriately before calling ibv_post_send.

This commit was SVN r14056.
2007-03-18 13:33:15 +00:00
Josh Hursey
d03073e87d Make sure to protect the finalize call so tools like ompi_info
do not segv.

This commit was SVN r14054.
2007-03-17 19:47:54 +00:00
Josh Hursey
6d29146748 fix dumb logic break in the PML selection finalization
This commit was SVN r14053.
2007-03-17 16:33:43 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Gleb Natapov
1dc1ee3998 Send control credit message over "eager rdma" channel if possible.
This commit was SVN r14032.
2007-03-14 14:38:56 +00:00
Gleb Natapov
1f3ac2d7ae Hold pointers to free_max/free_eager lists in array indexed by priority.
This eliminates a couple of ifs from the fast path.

This commit was SVN r14031.
2007-03-14 14:36:03 +00:00
Gleb Natapov
8607957df9 Get rid of remaining _hp/_lp stuff. Consolidate HP/LP QP creation code.
This commit was SVN r14030.
2007-03-14 14:33:24 +00:00
Rolf vandeVaart
42168575fd Fix for the special case where np=2 and the sendbuf is set to MPI_IN_PLACE.
In that case, sendcount and sendtype are not valid and we need to use
recvcount and recvtype.

This commit fixes trac:943.  Reviewed by Jelena Pjesivac-Grbovic.

This commit was SVN r14022.

The following Trac tickets were found above:
  Ticket 943 --> https://svn.open-mpi.org/trac/ompi/ticket/943
2007-03-13 19:01:20 +00:00
Galen Shipman
8253d83410 make btl template compile again
This commit was SVN r13990.
2007-03-08 21:58:26 +00:00
Galen Shipman
67ba5264f6 ORTE_NAME_ARGS casts to long, not unsigned long.
This commit was SVN r13988.
2007-03-08 21:42:29 +00:00
Galen Shipman
8072dd344c use %ld instead of %d as ORTE_NAME_ARGS does casting to long not unsigned long
This commit was SVN r13987.
2007-03-08 21:41:39 +00:00
Bill D'Amico
53d434d6ab Fix warnings when building with UDAPL - minor formatting errors.
This commit was SVN r13971.
2007-03-08 18:39:40 +00:00
Jelena Pjesivac-Grbovic
9780a000ba Cleanup of generic reduce function and possible (low probability) bug fix.
- fixing line lengths and some of the comments
- possible bug fix (but I do not think we exposed it in any tests so far)
  temporary buffers were allocated as multiples of extent instead of 
  true_extent + (count -1) * extent.
Everything is still passing Intel tests over tcp and btl mx up to 64 nodes.

This commit was SVN r13956.
2007-03-08 00:54:52 +00:00
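
The buffer-size correction described in commit 9780a000ba can be illustrated with a small sketch. The function name `tmp_buffer_span` is hypothetical and not taken from the Open MPI source, and the sketch assumes a non-negative extent: `count` elements spaced `extent` bytes apart occupy `(count - 1) * extent` bytes of stride plus the `true_extent` of the last element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the Open MPI function: number of bytes that
 * `count` elements of a datatype occupy when consecutive elements start
 * `extent` bytes apart and each element really touches `true_extent` bytes.
 * Assumes extent is non-negative and count is at least 1. */
size_t tmp_buffer_span(size_t true_extent, size_t extent, size_t count)
{
    assert(count > 0);
    /* The first count-1 elements each advance by one extent; the last
     * element needs its full true extent. */
    return true_extent + (count - 1) * extent;
}
```

When `true_extent` is larger than `extent` (for example a type whose extent was shrunk), `count * extent` under-allocates, which is the case the commit message calls a possible bug.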
Jelena Pjesivac-Grbovic
57cbafafd5 Clean up of generic broadcast function: removing unnecessary statements and improving comments.
This commit was SVN r13955.
2007-03-07 21:59:53 +00:00
Rolf vandeVaart
333357f4cc This fixes the initialization of the usable size of the shared memory.
The original code was not compensating for the space used by the header.  

When memory got tight, the allocator would return a pointer to memory that 
did not exist resulting in a SEGV for the application.  This is a partial 
fix for ticket #929.

Reviewed by Rich Graham.  

This commit was SVN r13950.
2007-03-07 13:28:06 +00:00
Jelena Pjesivac-Grbovic
0c07654c30 Updating reduce_scatter decision function based on MX results up to 64 nodes and both 1ppn and 2ppn
configurations.

This commit was SVN r13945.
2007-03-07 00:38:33 +00:00
Gleb Natapov
40501f8274 Amend IB parameter checking.
This commit was SVN r13936.
2007-03-06 13:05:12 +00:00
Brian Barrett
9660bb6ccc These symbols aren't actually created in ROMIO with Open MPI's configure, so
no need to have them in here.

This commit was SVN r13933.
2007-03-05 22:55:17 +00:00
Jelena Pjesivac-Grbovic
e5ed167a6e Adding tuned version of reduce_scatter implementation.
Currently 3 algorithms are available:
- non-overlapping, reduce + scatterv, (works for non-commutative operations)
- recursive halving algorithm (copied from basic module)
- ring algorithm  (similar to allreduce ring, for large messages)

This commit was SVN r13929.
2007-03-05 20:40:39 +00:00
Gleb Natapov
be018944d2 Clean up circular buffer implementation. Get rid of _same_base_address()
functions by pre-calculating everything in advance.

This commit was SVN r13923.
2007-03-05 14:27:26 +00:00
Gleb Natapov
8078ae5977 Optimize sm communication. Pass message type (MCA_BTL_SM_FRAG_ACK/
MCA_BTL_SM_FRAG_SEND) and status success/fail in low bits of pointers we
are passing through circular buffer. The rank that receives ACK doesn't need
to look into data it received and this is a big win since this data is not in
the cache of the rank's CPU. (Note that we can use low bits of pointers because
free_list always returns pointers aligned at least to cache line size).

This commit was SVN r13922.
2007-03-05 14:24:09 +00:00
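
The low-bit trick described in commit 8078ae5977 can be sketched as follows. This is a minimal illustration, not the sm BTL code: the names `frag_pack`, `frag_ptr`, `frag_type`, `frag_failed` and the bit layout (bit 0 for the message type, bit 1 for the failure flag) are assumptions made for the example. It relies only on the pointer being aligned to at least 4 bytes, which holds when pointers are cache-line aligned as the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag layout: bit 0 = message type, bit 1 = failure flag. */
enum { FRAG_SEND = 0, FRAG_ACK = 1 };
#define TAG_TYPE_MASK ((uintptr_t)0x1)
#define TAG_FAIL_MASK ((uintptr_t)0x2)
#define TAG_MASK      (TAG_TYPE_MASK | TAG_FAIL_MASK)

/* Fold the type and status into the unused low bits of an aligned pointer. */
uintptr_t frag_pack(void *frag, int type, int failed)
{
    uintptr_t bits = (uintptr_t)frag;
    assert((bits & TAG_MASK) == 0);   /* pointer must be at least 4-byte aligned */
    if (type == FRAG_ACK) bits |= TAG_TYPE_MASK;
    if (failed)           bits |= TAG_FAIL_MASK;
    return bits;
}

/* Recover the original pointer by clearing the tag bits. */
void *frag_ptr(uintptr_t v)   { return (void *)(v & ~TAG_MASK); }

/* Read the type and status without dereferencing the pointer. */
int frag_type(uintptr_t v)    { return (v & TAG_TYPE_MASK) ? FRAG_ACK : FRAG_SEND; }
int frag_failed(uintptr_t v)  { return (v & TAG_FAIL_MASK) != 0; }
```

The receiver of an ACK can call `frag_type` and `frag_failed` on the value taken from the circular buffer and never touch the fragment memory, which is the cache benefit the commit message describes.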
Gleb Natapov
90fb58de4f When frags are allocated from mpool by free_list the frag structure is also
allocated from mpool memory (which is registered memory for RDMA transports).
This is not a problem for small jobs, but for a large number of ranks the
amount of wasted memory is large.

This commit was SVN r13921.
2007-03-05 14:17:50 +00:00
Rich Graham
e932d9a695 macro variable has same name as one of the parameters passed to the
macro.
Typo - most likely cut and paste error.

This commit was SVN r13918.
2007-03-04 23:31:07 +00:00
Li-Ta Lo
196e2a86bb added binomial tree based scatter, passed IBM and Intel tests
This commit was SVN r13906.
2007-03-02 23:19:02 +00:00
Li-Ta Lo
11c94cbe76 eliminated the use of MPI_Get_count
This commit was SVN r13904.
2007-03-02 22:57:50 +00:00
Li-Ta Lo
3765e19d15 added ASCII graph for the topologies
This commit was SVN r13892.
2007-03-02 17:17:14 +00:00
Li-Ta Lo
bd75f2f162 change ALLGATHER to GATHER
This commit was SVN r13891.
2007-03-02 17:02:29 +00:00
Josh Hursey
0404444dbe * Added 2 new MCA parameters
  - mca_base_param_file_prefix
     (Default: NULL)
     This is the full name of the "-am" mpirun option. Used to specify a ':'
     separated list of AMCA parameter set files.
  - mca_base_param_file_path
     (Default: $SYSCONFDIR/amca-param-sets/:$CWD)
     The path to search for AMCA files with relative paths. A warning will be
     printed if the AMCA file cannot be found.

* Added a new function "mca_base_param_recache_files" that re-reads the file
configurations. This is used internally to help bootstrap the MCA system.

* Added a new orterun/mpirun command line option '-am' that aliases for the
mca_base_param_file_prefix MCA parameter

* Exposed the opal_path_access function as it is generally useful in other
places in the code.

* New function "opal_cmd_line_make_opt_mca" which will allow you to append a
new command line option with MCA parameter identifiers to set at the same
time. Previously this could only be done at command line declaration time.

* Added a new directory under the $pkgdatadir named "amca-param-sets" where all
the 'shipped with' Open MPI AMCA parameter sets are placed. This is the first
place to search for AMCA sets with relative paths.

* An example.conf AMCA parameter set file is located in
contrib/amca-param-sets/.

* Jeff Squyres contributed an OpenIB AMCA set for benchmarking.

Note: You will need to autogen with this commit as it adds a configure param.
  Sorry :(

This commit was SVN r13867.
2007-03-01 13:39:20 +00:00
Tim Mattox
ec82d01555 Add a missing extern keyword that prevented compilation on OS X.
This commit was SVN r13853.
2007-02-28 20:26:34 +00:00
Gleb Natapov
2b6cbd6299 Split the frag list for RDMA descriptors into two, one for src descriptors
and another for dst descriptors. This provides a partial solution to the OB1 protocol
deadlock problem. We can limit the number of RDMA descriptors (by setting
btl_openib_free_list_max to something other than -1) and, if we are
lucky enough to hit this limit before we fail to register more memory, the protocol
will not deadlock. When we had only one list for src/dst descriptors we
deadlocked when we reached the max limit for the list.

This commit was SVN r13844.
2007-02-28 13:43:38 +00:00
Sven Stork
870740efe2 - properly export symbols that are required by other components.
This commit was SVN r13841.
2007-02-28 12:51:55 +00:00
Rainer Keller
0889ebd59f - Eliminate warnings, that PGI-6.2.5 issues with -Minform=inform
This commit was SVN r13840.
2007-02-28 08:36:34 +00:00
Li-Ta Lo
c5d8c221b0 added binomial tree based Gather algorithm, passed IBM and Intel tests
This commit was SVN r13835.
2007-02-28 01:11:01 +00:00
Jelena Pjesivac-Grbovic
627533fe4a Adding segmented ring algorithm for Allreduce for commutative operations.
Algorithm allows user to specify the segment size to be used for computation/communication overlap.
The additional memory requirement for the algorithm is 2 x segment size.
It performed well for (really) large message sizes over MX and it passed intel Allreduce_c and Allreduce_loc_c tests.

This commit was SVN r13832.
2007-02-27 20:32:30 +00:00
Sven Stork
d8a369936e - Fix more symbols that should be exported.
This commit was SVN r13824.
2007-02-27 15:17:17 +00:00
George Bosilca
bec20422ee Remove the warnings about printf data-type mismatch.
This commit was SVN r13804.
2007-02-26 22:20:35 +00:00
Brian Barrett
6d70f5fbe0 don't define malloc and friends in opal_config, as it causes problems when
we later include malloc.h

This commit was SVN r13803.
2007-02-26 21:34:48 +00:00
Li-Ta Lo
c860bd1be5 fixed a typo in the comment
This commit was SVN r13802.
2007-02-26 19:20:46 +00:00
Li-Ta Lo
73a73b1c78 added ASCII graph on reduce_log_intra
This commit was SVN r13801.
2007-02-26 19:15:37 +00:00
Pavel Shamis
6fe84f581b mpool_base_module_destroy was removing all modules from
a list instead of removing specific one. Fixing the bug.

This commit was SVN r13795.
2007-02-26 16:25:20 +00:00
Brian Barrett
d9e0e80190 Make some debugging output only looked at when debugging is enabled
This commit was SVN r13777.
2007-02-25 01:03:19 +00:00
Bill D'Amico
db1c2a58c4 Removed cruft - unused variables causing warnings during OMPI build.
This commit was SVN r13772.
2007-02-23 18:55:41 +00:00
Tim Prins
f35f67ed1c (very) minor correction to helpfile
This commit was SVN r13758.
2007-02-22 16:02:12 +00:00
Ron Brightwell
e15e85a0b6 Fix a problem with long unexpected messages that was causing hangs.
Long unexpected messages were not generating PUT_START events
because the MD for long unexpected messages was configured to
ignore start events.  When a long unexpected message arrived, it
traversed the match list, and ended up in the long unexpected MD.
As the long message is being consumed, the code called PtlMDUpdate()
to look for the message, but there was no event that indicated
that it had arrived. So, the update succeeded.  Once the long
unexpected message was consumed, the PUT_END event showed up in the
event queue -- except the code wasn't looking for it anymore.
The PUT_START events exist specifically to handle ordering between
short and long unexpected messages, so PUT_START events can't be
ignored on long unexpected messages.

Modified the code to generate PUT_START events for both long and
short unexpected messages and handle matching up START and END
events appropriately.

This commit was SVN r13746.
2007-02-21 21:59:48 +00:00
Li-Ta Lo
049921a5ec the temporary buffer is not needed for the MPI_IN_PLACE cases if the underlying Gather is implemented correctly
This commit was SVN r13740.
2007-02-21 20:39:56 +00:00
Jelena Pjesivac-Grbovic
36156f39c2 Modification to allreduce ring algorithm:
- the block sizes are computed in a more uniform way.
  The first k blocks may be 1 element larger than the remaining blocks.
The algorithm passed Intel Allreduce_c and Allreduce_loc_c tests, and 
IMB-3.2 Allreduce, over TCP and both btl and mtl MX (up to 128 processes).
The algorithm still only supports commutative operations.

This commit was SVN r13738.
2007-02-21 19:30:08 +00:00
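
The block-size rule from commit 36156f39c2 ("the first k blocks may be 1 element larger than the remaining blocks") can be sketched as below. The function names are hypothetical and the sketch assumes `k` is the remainder of the element count divided by the number of blocks, which is the usual way to get such a split; the commit message does not spell that out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements in block `index` when `total`
 * elements are split into `nblocks` blocks as evenly as possible.
 * The first (total % nblocks) blocks get one extra element. */
size_t ring_block_count(size_t total, size_t nblocks, size_t index)
{
    assert(nblocks > 0 && index < nblocks);
    size_t base = total / nblocks;
    size_t k = total % nblocks;
    return base + (index < k ? 1 : 0);
}

/* Hypothetical helper: element offset at which block `index` starts. */
size_t ring_block_offset(size_t total, size_t nblocks, size_t index)
{
    assert(nblocks > 0 && index < nblocks);
    size_t base = total / nblocks;
    size_t k = total % nblocks;
    /* Every earlier block contributes `base`; the earlier "large" blocks
     * (at most k of them) contribute one extra element each. */
    return index * base + (index < k ? index : k);
}
```

For 10 elements and 4 blocks this gives block sizes 3, 3, 2, 2 starting at offsets 0, 3, 6, 8.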
Josh Hursey
c573171b7d Mostly a cleanup commit.
- Implement the BML/r2 finalize function
- Cleanup the btl close routine
- Wire up a pml_base_verbose MCA parameter so you can actually watch the PML selection logic if you really want to.
- Fix a potential segfault in the selection logic.
  ompi_pointer_array_get_item() may return NULL, so we have to check for it

This commit was SVN r13734.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2007-02-21 16:18:43 +00:00
Jelena Pjesivac-Grbovic
b608887466 Adding variant of linear alltoall algorithm where the number of
outstanding requests can be limited using mca parameters.
The implementation passed Intel, IMB-3.2, and mpi_test_suite tests over
TCP and MX up to 128 processes (64 nodes), on both 32-bit and 64-bit machines.
It is not activated by default, but it should be useful for really large
communicator sizes.

This commit was SVN r13720.
2007-02-20 04:25:00 +00:00
Jeff Squyres
f820e44112 Remove a gcc-ism from the code (defining an anonymous union in the
middle of a struct).  Now we properly define and name the union
outside the struct and simply create an instance of it inside the
struct. 

This commit was SVN r13709.
2007-02-19 18:21:57 +00:00
George Bosilca
020b8ade70 A slightly better fix for the data mismatch compiler complaints.
This commit was SVN r13695.
2007-02-17 05:23:57 +00:00
Jelena Pjesivac-Grbovic
d2d02642ca Removing compilation warnings about the output format.
This commit was SVN r13693.
2007-02-16 23:32:47 +00:00
Rich Graham
b925d6588d add some missing error checking - thanks to Ron B.
This commit was SVN r13692.
2007-02-16 22:19:24 +00:00
George Bosilca
04138c23af No more warnings.
This commit was SVN r13683.
2007-02-16 16:25:58 +00:00
Pavel Shamis
edeab0e912 Adding Mellanox Technologies copyright to files touched by Mellanox.
This commit was SVN r13669.
2007-02-15 18:03:20 +00:00
Jelena Pjesivac-Grbovic
e532b928af Adding segmented binary reduce algorithm which works with non-commutative operations.
Implementation passed intel: MPI_Reduce_c , MPI_Reduce_loc_c, and MPI_Reduce_user_c tests
over TCP, BTL MX, and MTL MX, as well as, mpi_test_suite Reduce tests (up to 64 nodes).

The algorithm is still not activated by decision function (will be in the near future).

This commit was SVN r13657.
2007-02-14 22:38:38 +00:00
Pavel Shamis
2483cefc57 Additional check if descriptor is NULL. It prevents
mca_pml_dr_sendreq_cleanup_active failure on segfault.

This commit was SVN r13647.
2007-02-14 10:43:43 +00:00
Brian Barrett
c00d841741 Fix hang on Cray machine introduced with r13582. The modex will never fire
when on the Cray machine (aka when the NULL GPR is in use).

This commit was SVN r13638.

The following SVN revision numbers were found above:
  r13582 --> open-mpi/ompi@041beeb1b6
2007-02-13 18:34:03 +00:00
Gleb Natapov
4d4b0a022a Add error callback to sm BTL. Call it when allocation of the initial circular
buffer fails. If cb is already allocated, but it is full and allocation of
additional cb fails, we spin waiting for receiver to free space in existing
cb.

This commit was SVN r13635.
2007-02-13 12:01:36 +00:00
George Bosilca
2e042c91cf Once we compute the local offset use it (instead of the global one).
This commit was SVN r13634.
2007-02-13 09:34:04 +00:00
George Bosilca
22eca30b45 One less compiler warning.
This commit was SVN r13633.
2007-02-13 09:32:57 +00:00
Gleb Natapov
1033002595 Fix memory leak. Free allocated descriptor if operation cannot proceed.
This commit was SVN r13610.
2007-02-12 09:47:51 +00:00
Jelena Pjesivac-Grbovic
b52dc9e427 Modifying fixed decision function for reduce to utilize linear algorithm only for really small communicator sizes.
This commit was SVN r13597.
2007-02-10 00:31:10 +00:00
Brian Barrett
041beeb1b6 Share currently selected PML in the modex information, then check whenever
adding new procs that the remote proc's pml is the same as our local pml.
Turns the hangs from mismatched PMLs into an abort, which is better,
I think.

This commit was SVN r13582.
2007-02-09 16:38:16 +00:00
Galen Shipman
f98a442c82 Fix a problem in the selection logic for MX. Basically we need to be able to
open MTL MX and BTL MX and initialize them at the same time. The problem is
that both call mx_init and mx_finalize; the solution is to add an external entity
that does the init and finalize (based on ref counting).

This commit was SVN r13576.
2007-02-09 03:19:38 +00:00
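
The reference-counted init/finalize wrapper described in commit f98a442c82 can be sketched generically. Everything here is an assumption for illustration: `backend_init` and `backend_finalize` are stand-ins for the real library calls (they only count invocations), `shared_init` and `shared_finalize` are hypothetical names, and the sketch is not thread-safe.

```c
#include <assert.h>

/* Counters so the effect is observable; stand-ins for real library state. */
int backend_init_calls = 0;
int backend_finalize_calls = 0;

/* Number of components currently holding the library open. */
static int refcount = 0;

/* Stand-ins for the library's one-time init and finalize calls. */
static void backend_init(void)     { backend_init_calls++; }
static void backend_finalize(void) { backend_finalize_calls++; }

/* Each component calls this instead of the library init directly;
 * only the first caller actually initializes the library. */
void shared_init(void)
{
    if (refcount++ == 0) {
        backend_init();
    }
}

/* Each component calls this on shutdown; only the last caller
 * actually finalizes the library. */
void shared_finalize(void)
{
    assert(refcount > 0);   /* finalize without a matching init is a bug */
    if (--refcount == 0) {
        backend_finalize();
    }
}
```

With two components (for example an MTL and a BTL) both calling `shared_init` and later `shared_finalize`, the underlying library is initialized once and finalized once, regardless of the order in which the components shut down.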
Jelena Pjesivac-Grbovic
6efca498ec Fixes trac:692 in trunk: receive buffer in MPI_Reduce operation is no longer overwritten on non-root nodes.
This commit was SVN r13538.

The following Trac tickets were found above:
  Ticket 692 --> https://svn.open-mpi.org/trac/ompi/ticket/692
2007-02-07 18:57:03 +00:00
Josh Hursey
90f449f675 fix a typo that got in there
This commit was SVN r13523.
2007-02-06 20:56:48 +00:00
Jeff Squyres
c91fcd7fbd Fix a bunch of minor typos submitted by Bernhard Fischer.
This commit was SVN r13505.
2007-02-06 12:00:30 +00:00
Brian Barrett
09cc9e4941 properly compute starting offset -- the lb will be included in the offset, so we don't need
both.

Refs trac:864

This commit was SVN r13494.

The following Trac tickets were found above:
  Ticket 864 --> https://svn.open-mpi.org/trac/ompi/ticket/864
2007-02-05 18:12:18 +00:00
Galen Shipman
ec610a9e65 spread priorities out a bit..
This commit was SVN r13487.
2007-02-04 00:55:25 +00:00
Galen Shipman
ddf08cb0b3 woops..
This commit was SVN r13482.
2007-02-03 02:32:00 +00:00
Galen Shipman
a94101fa62 mostly another hack around for PML selection, allows CM to select itself if an
MTL is available; if not, OB1 is used. Still prevents DR and OB1 from stomping
on each other though.

This commit was SVN r13481.
2007-02-03 02:01:18 +00:00
Christian Bell
e04c55af00 Fixes to psm mtl following a more comprehensive testing of intel tests.
This commit was SVN r13471.
2007-02-02 21:55:04 +00:00
George Bosilca
0ff2115964 Other warnings are now silenced.
This commit was SVN r13462.
2007-02-02 06:47:35 +00:00
Jelena Pjesivac-Grbovic
e193d625bc Bugfix for ring allreduce algorithm.
The step used to iterate through the buffer was a function of true_extent instead of extent.

This may or may not solve ticket #689 because I am still getting failures over btl mx, 
but I cannot reproduce failures over mtl mx nor tcp.

This commit was SVN r13459.
2007-02-02 02:44:16 +00:00
George Bosilca
1c7c39b32b I missed these warnings in my last commit.
This commit was SVN r13431.
2007-02-01 19:34:21 +00:00
George Bosilca
79ea6d471b Even less warnings.
This commit was SVN r13429.
2007-02-01 19:27:11 +00:00
George Bosilca
56ffbfc5ff Get rid of the warnings in the Open IB BTL.
This commit was SVN r13424.
2007-02-01 19:07:04 +00:00
George Bosilca
b611e6d7dc Less warnings.
This commit was SVN r13419.
2007-02-01 17:51:43 +00:00
George Bosilca
6ef3917741 Allow the user to specify the bandwidth and latency for the MX device.
This commit was SVN r13418.
2007-02-01 17:51:00 +00:00
Brian Barrett
58b325b03f Two changes to improve the sm situation with spawn:
* have the mpool size be based on MCW, not num procs
    in other jobs we know about.  Solves the problem of
    the spawned job having a much bigger than needed
    sm file
  * Can't assume that "me" is in the list of procs
    passed to addprocs, so need to use slightly different
    logic and not go through all of add procs unless
    there's a proc in my job that isn't me.

This seems to greatly improve the situation, although
there still seems to be more of a slowdown through
MPI_INIT for the children (if there are more than one
child) than MPI_INIT for the parent if there are 'n'
children compared to 'n' parents.  Hopefully that
made sense ;)

This commit was SVN r13417.
2007-02-01 17:18:35 +00:00
Brian Barrett
a0b40ce45a Fix race condition in setting MPI_ERROR -- with buffered sends, the
request can complete before the operation, meaning that a bogus MPI_ERROR
is read

This commit was SVN r13401.
2007-01-31 21:40:14 +00:00
Brian Barrett
039a3d8c17 add comment about why there's no status update here, since I always forget
This commit was SVN r13400.
2007-01-31 21:39:20 +00:00
Brian Barrett
846eed84f1 When receiving a message, need to account for the fact that the displacement
of the first entry might not be the start of the user's buffer.  This is
similar to what ompi_convertor_unpack does.  This is the solution for
the test case attached to ticket #690.

Refs trac:690

This commit was SVN r13397.

The following Trac tickets were found above:
  Ticket 690 --> https://svn.open-mpi.org/trac/ompi/ticket/690
2007-01-31 18:18:19 +00:00
Brian Barrett
65b07140c0 clean up some of the printf warnings caused by the attribute code
This commit was SVN r13395.
2007-01-31 17:11:06 +00:00
George Bosilca
a02d1c7c8d No more warnings.
This commit was SVN r13382.
2007-01-31 04:27:41 +00:00
Brian Barrett
ee753694e0 Print out the memlock limit when we can't allocate memory
This commit was SVN r13372.
2007-01-30 21:22:56 +00:00
Rainer Keller
061ba05439 - Fixes uncovered with the format attribute to
opal_output and opal_output_verbose

This commit was SVN r13371.
2007-01-30 20:56:31 +00:00
Jeff Squyres
86f8c66a27 Turns out that the leave_pinned stuff isn't used in these BTLs at
all.  So just remove it.

This commit was SVN r13360.
2007-01-30 15:39:49 +00:00
Rainer Keller
3669e8921e - Fix further compiler warnings regarding initialization
and shadowing variables.

This commit was SVN r13358.
2007-01-30 06:34:38 +00:00
Jeff Squyres
c9f072b84f Strike down a few more stray places that were registering
mpi_leave_pinned and replace them with the one central global
variable.

This commit was SVN r13349.
2007-01-29 20:24:31 +00:00
Brian Barrett
93a2f31932 Use a recursive halving communication algorithm similar to the one used by
MPICH2 for "small" commutative operations in the reduce_scatter basic
implementation.  "small" is currently pretty big, as it doesn't take
much to beat reduce/scatterv.  Need to do much more than this for
better all around performance of MPI_Reduce_scatter, but this was enough
to solve the problems I was having.

This commit was SVN r13348.
2007-01-29 19:29:35 +00:00
Rainer Keller
ca35881cd0 - Minor bugfixes and removed compiler warnings
This commit was SVN r13343.
2007-01-28 19:52:09 +00:00
Jelena Pjesivac-Grbovic
33dcb4f810 Minor change to linear alltoall algorithm:
- post isends in reverse order of posting irecvs.
if the messages arrive approximately in order, this should 
minimize the time spent in matching the requests.

I did not see any performance difference over MX up to 64 nodes, but 
the change makes sense and may have some impact when we have (many) 
more nodes.

This commit was SVN r13337.
2007-01-26 21:59:31 +00:00
Brian Barrett
385a435813 Start long message send as soon as possible, to minimize ack time for the receive,
greatly increasing mid-range bandwidth

This commit was SVN r13317.
2007-01-25 23:07:03 +00:00
Rich Graham
1c20feb52b Take into account constants that are defined differently in the Cray headers than in the Portals spec.
This commit was SVN r13311.
2007-01-25 18:32:47 +00:00
Jeff Squyres
7b6ed64c7b Add in the hostname to the BTL_* output macros so that you can tell on
which node an event occurred.

This commit was SVN r13302.
2007-01-25 14:02:54 +00:00
Jeff Squyres
6fea000e5f Oops -- get the right function name (copy-n-paste error).
This commit was SVN r13290.
2007-01-24 22:31:13 +00:00
Jeff Squyres
6b69ea664d Make a much, much better error message for a not-uncommon failure
scenario (user/sysadmin forgot to set the memlock limits high
enough).

This commit was SVN r13289.
2007-01-24 22:25:40 +00:00
Patrick Geoffray
b252cb82c8 oops, ".", not "->", copy error...
This commit was SVN r13287.
2007-01-24 19:16:46 +00:00
Patrick Geoffray
d58f6b2451 Free memory in synchronous send case if free_after requires it.
Fixes memory leak using synchronous sends and custom data types.

This commit was SVN r13286.
2007-01-24 19:10:38 +00:00
George Bosilca
d19a4f4740 Cast it to make cl happy.
This commit was SVN r13267.
2007-01-24 00:51:01 +00:00
George Bosilca
790f175d4e Explicit conversions to make the code Windows friendly.
This commit was SVN r13266.
2007-01-24 00:50:24 +00:00
George Bosilca
a4488ff8d2 Add explicit conversions.
This commit was SVN r13265.
2007-01-24 00:49:08 +00:00
George Bosilca
6f720f0d26 Add all required explicit conversions in order to be able
to build on Windows.

This commit was SVN r13264.
2007-01-24 00:48:16 +00:00
Jeff Squyres
c9fe68c406 Better patch from Gleb to do the per-port (endpoint) specification of
whether to use eager RDMA or not

This commit was SVN r13262.
2007-01-23 22:40:59 +00:00
Jelena Pjesivac-Grbovic
5cbcf42dc3 Removing yet another unused variable (missed it in previous submit).
This commit was SVN r13259.
2007-01-23 21:30:57 +00:00
Jelena Pjesivac-Grbovic
afbd032ff9 Removing compiler warnings about comparison of unsigned values to signed ones, and
unused variables.

This commit was SVN r13258.
2007-01-23 21:10:07 +00:00
Jelena Pjesivac-Grbovic
568477ade8 Adding new Allreduce algorithms, updating allreduce decision function, and cleaning up util.
- Allreduce algorithms:
  - Recursive doubling is used for small messages (up to 10KB) and can be used for
    both commutative and non-commutative operations.
    Recursive doubling passed OCC, IMB-3.2, Intel (Allreduce_c, Allreduce_loc_c, and
    Allreduce_user_c), mpi_test_suite (Allreduce MIN/MAX, and Allreduce MIN/MAX with
    MPI_IN_PLACE) tests on TCP up to 36 nodes and MX up to 64 nodes.
  - The ring algorithm performs well for larger messages but cannot be used for
    non-commutative operations.  It passed the same tests as recursive doubling, except
    some of the non-commutative tests in Intel benchmarks Allreduce_loc_c and Allreduce_user_c
    (which was expected).
- MPI_Allreduce with new decision function passed all of the tests mentioned above.
- Cleaning up coll_tuned_util.  Moving isendrecv to static inline just like sendrecv. 

This commit was SVN r13252.
2007-01-23 01:19:11 +00:00
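The two Allreduce schemes named in 568477ade8 can be illustrated with a small single-process simulation. This is only a sketch of the textbook algorithms, not the coll_tuned code: the function names, the one-element-per-chunk layout, and the power-of-two restriction in the first function are assumptions made for brevity.

```python
def recursive_doubling_allreduce(values, op):
    """Simulate recursive doubling; values[r] is the contribution of rank r.

    Assumption: the number of ranks is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch only handles a power-of-two rank count")
    bufs = list(values)
    dist = 1
    while dist < n:
        new = list(bufs)
        for rank in range(n):
            peer = rank ^ dist
            # The lower rank's partial result always goes on the left, so the
            # operand order is preserved for non-commutative operations.
            if rank < peer:
                new[rank] = op(bufs[rank], bufs[peer])
            else:
                new[rank] = op(bufs[peer], bufs[rank])
        bufs = new
        dist *= 2
    return bufs


def ring_allreduce(vectors, op):
    """Simulate a ring allreduce; vectors[r] holds one element per chunk."""
    n = len(vectors)
    bufs = [list(v) for v in vectors]
    # Reduce-scatter: after n - 1 steps rank r owns the reduced chunk (r + 1) % n.
    for step in range(n - 1):
        sent = [bufs[r][(r - step) % n] for r in range(n)]
        for r in range(n):
            src = (r - 1) % n
            chunk = (src - step) % n
            bufs[r][chunk] = op(sent[src], bufs[r][chunk])
    # Allgather: pass the finished chunks around the ring.
    for step in range(n - 1):
        sent = [bufs[r][(r + 1 - step) % n] for r in range(n)]
        for r in range(n):
            src = (r - 1) % n
            chunk = (src + 1 - step) % n
            bufs[r][chunk] = sent[src]
    return bufs
```

In the ring variant each chunk starts its trip around the ring at a different rank, so the operands are combined in a different rotation per chunk. That is harmless for commutative operations and is the reason the ring cannot be used for non-commutative ones.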
Jeff Squyres
3389a523e9 Arrgh. That printf should not have been in there!
This commit was SVN r13243.
2007-01-22 18:52:49 +00:00
Jeff Squyres
a24f3c0886 Move the "use eager RDMA" flag to the individual openib BTL modules,
not the component.  This potentially allows for a mix of HCAs that
support eager RDMA and those who do not on a port-by-port basis.

This commit was SVN r13242.
2007-01-22 18:49:32 +00:00
Jeff Squyres
91b855c2f4 Minor fixes in the help messages
 * If the text to cite where the problem occurred is "\n", prettyprint
   something a little nicer so that it's clear that we're talking
   about the end of line
 * Add a missing help message ("ini file:unknown field"), and display
   it a little better (i.e., show the erroneous field, not a
   misleading "end of line" marker)
 * It's "OpenIB", not "Open IB"

This commit was SVN r13241.
2007-01-22 18:45:43 +00:00
George Bosilca
242292673a sendrecv is a static inline.
This commit was SVN r13237.
2007-01-22 05:50:23 +00:00
Rainer Keller
96030de97b - Initialize the size of the opal_object class.
- Use the OBJ_CLASS_INSTANCE macro to initialize classes.
   This also gets rid of several missing initialization errors.

This commit was SVN r13227.
2007-01-21 14:24:29 +00:00
Jeff Squyres
52ca6cf86c The mpi_leave_pinned and mpi_leave_pinned_pipeline MCA parameters were
needlessly registered in multiple different places, and none of them
had a good help string.  There was also an inconsistent check for
setting both mpi_leave_pinned and mpi_leave_pinned_pipeline (i.e., it
was only in ob1).  This commit moves the registration of these params
to one central place (ompi/runtime/ompi_mpi_params.c, with all other
mpi_* MCA params) and uses globals to propagate the values as
relevant.  The error check was also moved to the central location to
ensure consistency everywhere.

This commit was SVN r13226.
2007-01-21 14:02:06 +00:00
Rainer Keller
125ba1acfa - Reduce the amount of warnings with -Wshadow -- mainly due to
usage of index and abs in inline-fcts in header files.

This commit was SVN r13217.
2007-01-19 19:48:06 +00:00
Jeff Squyres
e934272f3e Commit data supplied by Christian Bell at QLogic for their vendor and
part ID's.

This commit was SVN r13216.
2007-01-19 19:46:29 +00:00
Sven Stork
862dcb1a34 - fix compiler warning in ia64
This commit was SVN r13212.
2007-01-19 14:48:47 +00:00
Rolf vandeVaart
6a260e4a9a Fix two problems. For MPI_Buffer_detach, do not attempt to
return the buffer address from Fortran. It is not expected
behavior.  For MPI_Buffer_attach, adjust the address of
the buffer handed in so it is always aligned.  

Refs trac:750
Buffer detach reviewed by Jeff Squyres
Buffer attach alignment reviewed by George Bosilca

This commit was SVN r13205.

The following Trac tickets were found above:
  Ticket 750 --> https://svn.open-mpi.org/trac/ompi/ticket/750
2007-01-18 23:32:39 +00:00
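The MPI_Buffer_attach half of 6a260e4a9a amounts to rounding the user's buffer address up to an aligned boundary and giving up the skipped bytes. A minimal sketch of that arithmetic follows; the function name and the 8-byte default alignment are assumptions, since the commit message does not state which alignment is used.

```python
def align_buffer(addr, size, alignment=8):
    """Round addr up to a multiple of alignment and shrink size to match.

    Assumption: alignment is a power of two (8 is only an illustrative default).
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    aligned = (addr + alignment - 1) & ~(alignment - 1)
    skipped = aligned - addr
    if skipped > size:
        raise ValueError("buffer too small to align")
    return aligned, size - skipped
```

An already aligned address is returned unchanged together with the full size.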
Ralph Castain
4ef4cbb5ad Fix a compiler warning about comparing signed/unsigned values
This commit was SVN r13190.
2007-01-18 17:14:06 +00:00
Gleb Natapov
4c7dbd36c7 Balance RDMA operations in round-robin fashion between all available RDMA BTLs.
OB1 always uses the first element of the array of BTLs available for RDMA. The patch
changes the array creation algorithm so that it puts a different BTL in the first element
in round-robin fashion.

This commit was SVN r13174.
2007-01-18 09:15:18 +00:00
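The idea in 4c7dbd36c7 can be pictured as rotating the list of RDMA-capable BTLs each time the array is built, so that a consumer that always takes element 0 still ends up spreading its traffic. The sketch below is hypothetical: the function name, the creation counter, and the BTL labels are illustrative and not taken from OB1.

```python
def build_rdma_array(btls, creation_index):
    """Return btls rotated so that successive arrays start with a different BTL.

    creation_index is an assumed counter that grows by one per array built.
    """
    if not btls:
        return []
    start = creation_index % len(btls)
    return btls[start:] + btls[:start]
```

With two BTLs, arrays number 0, 1 and 2 begin with the first, second and first BTL respectively.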
Brian Barrett
860fd63710 lower priority of rdma one-sided component so that pt2pt is preferred for most
people, so that it gets more testing

This commit was SVN r13163.
2007-01-17 22:01:03 +00:00
Jelena Pjesivac-Grbovic
85192c01b0 Modifying util functionality:
- removing static qualification on ompi_coll_tuned_sendrecv 
- adding ompi_coll_tuned_isendrecv function which posts isend and irecv requests
These changes are separate from but necessary for new algorithms I am working on.

This commit was SVN r13161.
2007-01-17 21:29:13 +00:00
Brian Barrett
95c0a17b9a Send the unlock request before starting the requests. We won't unlock until we get an ack from the remote side,
so there's no longer a race there (I used to do the unlock request last, after local completion of all the
requests completed, to try to avoid having the passive side reply to the active side, but I don't do that
anymore).  The unlock side will not "unlock" the window until it actually receives the correct number of results,
so we're good there.

This fixes an issue where we would receive data on the remote side we weren't expecting that could cause
us to release a lock before it really should have been released to the requesting peer.  It could also
cause a deadlock if one of the processes trying to unlock was "self", as that would result in the active
unlock never sending the unlock request, even though it sent the payload, which could cause a counter
that should always be positive to hit -1, causing an infinite loop that could only be solved by
popping up the stack, which was an impossibility.

Refs trac:785

This commit was SVN r13160.

The following Trac tickets were found above:
  Ticket 785 --> https://svn.open-mpi.org/trac/ompi/ticket/785
2007-01-17 21:13:12 +00:00
Brian Barrett
c1be97199b Fix an issue with recursive calls into the component progress caused by btls sometimes calling opal_progress()
during their send calls by dropping the loop through the list of pending control messages if any are marked
as completed.

Refs trac:784

This commit was SVN r13159.

The following Trac tickets were found above:
  Ticket 784 --> https://svn.open-mpi.org/trac/ompi/ticket/784
2007-01-17 20:48:35 +00:00
Jeff Squyres
52e8089600 Fix compiler warning.
This commit was SVN r13148.
2007-01-17 14:23:46 +00:00
Brian Barrett
35c57457c6 Don't call ompi_request_test() if the request isn't likely to finish.
Otherwise, we end up recursively calling into the progress functions
and corrupting a list that doesn't like to be corrupted.

Refs trac:561

This commit was SVN r13138.

The following Trac tickets were found above:
  Ticket 561 --> https://svn.open-mpi.org/trac/ompi/ticket/561
2007-01-17 02:30:11 +00:00
Jeff Squyres
754042f1fc Fix a compiler warning.
This commit was SVN r13134.
2007-01-16 23:03:17 +00:00
George Bosilca
3a07982ae7 This file is just a left over from a dark past.
This commit was SVN r13132.
2007-01-16 22:02:13 +00:00
Brian Barrett
f03ffb3a62 Send reply from the passive side of an unlock request back to the active
side and only let MPI_WIN_UNLOCK return when the passive side has actively
replied that the window is unlocked.

Refs trac:761

This commit was SVN r13118.

The following Trac tickets were found above:
  Ticket 761 --> https://svn.open-mpi.org/trac/ompi/ticket/761
2007-01-14 22:08:38 +00:00
Brian Barrett
e93eaa0790 Remote pointers are always in .lval, not .pval, so need to read the .lval
and convert it to a pointer when finding the destination addr.

Refs trac:587

This commit was SVN r13116.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-14 20:34:41 +00:00
Jeff Squyres
d5404f21a3 Make the trunk openib btl compile again.
This commit was SVN r13110.
2007-01-13 14:22:42 +00:00
Galen Shipman
4a6ad30440 remove unused macro calls..
This commit was SVN r13107.
2007-01-12 23:17:17 +00:00
Galen Shipman
2097d174f6 Heterogeneous fixes to the OpenIB BTL. This includes work by Nysal, Brian and
me.

This commit was SVN r13106.
2007-01-12 23:14:45 +00:00
Galen Shipman
df099a4731 call it what it is...
we are looking at subnet_id's and we are counting active ports per subnet. 
move subnet count out of procs loop, no need to do it there...

This commit was SVN r13105.
2007-01-12 22:42:20 +00:00
Brian Barrett
075161afa9 Enable MX wireup in heterogeneous situations.
Refs trac:587

This commit was SVN r13095.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-12 15:44:58 +00:00
Donald Kerr
ed097d17c1 fix for bug #749, though I can not confirm without a linux compiler
This commit was SVN r13090.
2007-01-11 22:25:13 +00:00
Donald Kerr
80f2cbb498 add udapl rdma capabilities into the udapl btl
This commit was SVN r13082.
2007-01-11 15:22:08 +00:00
Jeff Squyres
e5205657cf A much better fix for #739. No configure test -- just do a simple
memcpy() instead of assigning the struct's by value.

Fixes trac:739.

This commit was SVN r13081.

The following Trac tickets were found above:
  Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
2007-01-11 14:30:32 +00:00
Jeff Squyres
add3909096 Back out 13076 and 13077 in favor of a much simpler approach.
Sorry for the configure change -- hopefully it's early enough in the
morning that it won't affect people... (new approach won't have a
configure change).

Refs trac:739.

This commit was SVN r13080.

The following Trac tickets were found above:
  Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
2007-01-11 14:07:15 +00:00
George Bosilca
ceb5d436d8 Missing an include.
This commit was SVN r13078.
2007-01-11 05:45:13 +00:00
George Bosilca
24a91fad1d OPAL_BOOL_STRUCT_COPY or OMPI_BOOL_STRUCT_COPY that's the question!
Let's minimize the disturbances and say that the configure system is right.
From now on it's OPAL_BOOL_STRUCT_COPY. This one is related to r13076 and
has to follow when r13076 goes in the 1.2.

This commit was SVN r13077.

The following SVN revision numbers were found above:
  r13076 --> open-mpi/ompi@f0932a0701
2007-01-11 05:44:48 +00:00
Jeff Squyres
f0932a0701 A workaround for a bug in the PGI 6.2 compiler series. This bug has
been fixed in the 7.0 PGI series, but is unlikely to be fixed in the
6.2 series:

 * Add a configure test looking for the bad behavior (the PGI compiler
   chokes on C code where structs containing bool's are copied by
   value)
 * Set OMPI_BOOL_STRUCT_COPY to 1 if it's ok, 0 if it's not (i.e., PGI
   6.2 series will have this value set to 0)
 * In two places in the code base -- orte-clean and btl_openib_ini.h,
   we have a struct that contains a bool that is copied by value.  In
   these two places, check OMPI_BOOL_STRUCT_COPY and if it's 1, use
   the "int" type instead of "bool".

Fixes trac:739

This commit was SVN r13076.

The following Trac tickets were found above:
  Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
2007-01-11 02:21:26 +00:00
Jelena Pjesivac-Grbovic
d2921a9d42 Cleanup of Barrier implementation:
- utilizing coll_tuned_util functions
- setting line length to 80.

This implementation uses standard send messages (instead of synchronous ones).
The change improved our performance over MX several times over; however,
there is a small chance that the last message to be sent can be delayed
(until the next MPI call, which means potentially indefinitely).

If this proves to be a problem, I will modify the algorithms to use a synchronous
send as the last operation (which will incur the performance penalty again).

This commit was SVN r13071.
2007-01-10 22:49:43 +00:00
Jelena Pjesivac-Grbovic
ccc3ee0b6b Minor changes to allgather implementation with some clean-up of util code.
- in allgather algorithms I replaced the irecv-isend-waitall sequence with a
  call to ompi_coll_tuned_sendrecv
- most of the functions in util code and allgather decision function conform to 80 character line width.

This commit was SVN r13069.
2007-01-10 21:56:59 +00:00
Josh Hursey
93208445fd Make sure we wireup the 'verbose' MCA parameter for the BTL's.
This commit was SVN r13067.
2007-01-10 21:24:35 +00:00
Gleb Natapov
624f139bd8 This commit fixes trac:729. Initialize pointer to registration to NULL. Otherwise
it may contain garbage and we will try to unregister it later in btl_free().

This commit was SVN r13054.

The following Trac tickets were found above:
  Ticket 729 --> https://svn.open-mpi.org/trac/ompi/ticket/729
2007-01-09 10:29:20 +00:00
Gleb Natapov
d3ac56272a Prevent access to openib_btl after free().
This commit was SVN r13052.
2007-01-09 09:07:32 +00:00
George Bosilca
87ff2b5ce8 Cast to the correct type.
This commit was SVN r13046.
2007-01-08 22:04:01 +00:00
George Bosilca
f419960c7f All files have to include ompi_config.h before anything else.
This commit was SVN r13045.
2007-01-08 22:03:16 +00:00
George Bosilca
53ddbe8446 Nothing relevant.
This commit was SVN r13044.
2007-01-08 22:02:17 +00:00
Brian Barrett
e130f18cc2 Fix some compiler warnings that have slipped in lately...
This commit was SVN r13037.
2007-01-08 17:20:09 +00:00
Brian Barrett
a34e67d743 Remove unneeded PARAM_INIT_FILE variable in configure.params files used by
components that use configure.m4 for configuration or are always built. 
The macro has not been needed since moving to configure types other than
configure.stub

Fixes trac:590

This commit was SVN r13031.

The following Trac tickets were found above:
  Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
2007-01-08 03:44:22 +00:00
Brian Barrett
b8413fb1d5 Just cast the pointer to a uintptr_t then to the match bits, instead of abusing the ompi_ptr_t interface. Not critical for v1.2, as there are no portals platforms that are big endian, so the code in v1.2 will work well enough for now
This commit was SVN r13024.
2007-01-07 03:11:27 +00:00
Brian Barrett
8900d3ae43 Second take at fixing the issues with using ompi_ptr_t. Add helper functions for converting from .pval to .lval and vice-versa. Users of ompi_ptr_t types should only use one of the fields in the union unless using the helper conversion functions. For the BTLs, local pointers will always be stored in the .pval field and remote pointers always stored in the .lval field.
George wrote the initial patch, I extended it slightly and am responsible for all bugs found.

Refs trac:587

This commit was SVN r13023.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-07 01:48:57 +00:00
Jelena Pjesivac-Grbovic
eae3df4904 Updated broadcast decision function based on MX results up to 64 nodes.
(The previous decision function did not consider binomial algorithm (since we did not have it at the time)).

This commit was SVN r13007.
2007-01-06 00:37:40 +00:00
Brian Barrett
48ec0b2071 Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix
for now...

This commit was SVN r12997.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c
2007-01-04 22:07:37 +00:00
Galen Shipman
d207a6c988 endpoint should use a uint64_t for subnet, as everyone else does.. makes bad
things happen when packing into a 64 bit buffer... 

Misc cleanup.. 

This commit was SVN r12993.
2007-01-04 20:25:28 +00:00
Brian Barrett
936fdd2ae1 remove some code that accidentally came in with r12974. Refs trac:587
This commit was SVN r12991.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-04 20:17:07 +00:00
Galen Shipman
931a389c4f fix deadlock on rendezvous protocol..
This commit was SVN r12982.
2007-01-04 03:46:11 +00:00
Galen Shipman
f12bbe0591 Handle different subnets correctly and multiple NIC endpoint negotiation.
This is somewhat limited currently: for example, if you have 3 ports on Node A and 5 ports
on Node B then the peers will use 3 ports to communicate with each other.
This is on a subnet basis, so for any pair of nodes we take the
intersection of the available ports within a subnet.

We use subnets to determine reachability for lazy connection establishment. So
if Node A and Node B each have two HCAs (on separate networks) then the
subnets must be distinct, otherwise we will try to wire up HCAs on separate
networks.

This commit was SVN r12978.
2007-01-03 22:35:41 +00:00
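The per-subnet rule in f12bbe0591 (3 ports on Node A and 5 on Node B yield 3 usable ports) can be sketched as a minimum taken subnet by subnet. The function name and the representation of a node as a plain list of subnet IDs, one entry per active port, are assumptions for illustration only.

```python
from collections import Counter


def usable_ports(local_subnets, remote_subnets):
    """Map each shared subnet ID to the number of ports both peers can use.

    Each argument lists the subnet ID of every active port on one node.
    """
    local = Counter(local_subnets)
    remote = Counter(remote_subnets)
    # Only subnets present on both nodes are reachable; within one subnet the
    # peers are limited by whichever side has fewer ports.
    return {s: min(local[s], remote[s]) for s in local if s in remote}
```

Two nodes whose HCAs sit on distinct subnets produce an empty result, which matches the reachability rule described in the commit message.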
Brian Barrett
7cac26d240 * fix some typos that slipped in with r12974. Refs trac:587
This commit was SVN r12976.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 20:14:45 +00:00
Brian Barrett
27cea44a9c Fix a number of issues with the ompi_ptr_t:
  * Make sure that the pval always writes to the correct portion of the
    lval.  This only matters on 32 bit big endian machines.
  * On 32 bit machines when assigning to pval, the other 4 bytes of lval
    weren't being written, which could lead to bogus data

We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes.  Refs trac:587

This commit was SVN r12974.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 19:47:48 +00:00
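Both problems listed in 27cea44a9c can be reproduced by modelling the union as 8 raw bytes that hold a 4-byte pointer at offset 0. The sketch below assumes that layout (a 32-bit pval overlaid on a 64-bit lval); the function name and the stale-byte pattern are illustrative.

```python
import struct


def lval_after_pval_write(ptr32, byteorder, previous=bytes(8)):
    """Write a 32-bit pointer at offset 0 of an 8-byte union and read it back
    as a 64-bit integer in the same byte order.

    previous models whatever the 8 bytes held before the assignment.
    """
    buf = bytearray(previous)
    fmt = ">I" if byteorder == "big" else "<I"
    struct.pack_into(fmt, buf, 0, ptr32)  # only the first 4 bytes change
    return int.from_bytes(buf, byteorder)
```

On a little-endian layout with zeroed storage the 64-bit value equals the pointer. On a big-endian layout the pointer lands in the upper half, and with stale storage the untouched 4 bytes leak into the result in either byte order.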
Gleb Natapov
a6127fd8ce Increase req_bytes_delivered atomically.
This commit was SVN r12971.
2007-01-03 15:19:34 +00:00
Gleb Natapov
79202561f6 Don't check req_pipeline_depth on frag completion. Checking of
req_bytes_delivered should be enough.

This commit was SVN r12967.
2007-01-03 14:44:20 +00:00
Gleb Natapov
1ad6c41735 Sender can start scheduling send fragments immediately after receiving ACK. No
need to wait for RNDV completion.

This commit was SVN r12965.
2007-01-03 12:37:11 +00:00
Rich Graham
8a9da02063 change code to conform with coding standard.
Handle error condition where shared memory file is not created.

This commit was SVN r12964.
2007-01-03 00:06:02 +00:00
Donald Kerr
899297c8f4 udapl btl was not compiling after r12878 on 12/17/2006, some minor changes to allow btl to compile
This commit was SVN r12963.

The following SVN revision numbers were found above:
  r12878 --> open-mpi/ompi@190e7a27cd
2007-01-02 21:44:12 +00:00
George Bosilca
d8dee3a740 If the MX driver was unable to load correctly, or if the endpoint was not
created then don't try to call the MX endpoint close function.

This commit was SVN r12950.
2007-01-02 00:01:50 +00:00
Rich Graham
6cb2377015 Change the allocation of the shared memory backing file. The file
is allocated on a per comm_world instance, with the lowest rank
in comm_world on the given host creating and initializing the file,
and then notifying the remaining processes via the OOB.

Reviewed: Ralph Castain, Brian Barrett
Addressing ticket #674.

This commit was SVN r12949.
2007-01-01 02:39:02 +00:00
George Bosilca
e223b27268 A fragment is marked completed by the PML when the peer signals the
completion of the RDMA operation associated with the fragment. The
PML will call the BML free, which in turn will call the BTL free. The MX
BTL will not release the fragment if it is not tagged with 0xff.

This commit was SVN r12947.
2006-12-31 03:17:47 +00:00
George Bosilca
47601e315e Allow the MX BTL to select at runtime if the unexpected handler will
be activated or not.

This commit was SVN r12944.
2006-12-30 20:57:50 +00:00
Brian Barrett
99c0a29602 Disable CM and DR PMLs in heterogeneous situations as neither are
heterogeneous safe.

Refs trac:587

This commit was SVN r12942.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2006-12-30 16:17:56 +00:00
George Bosilca
d401a65975 Minor cleanups. Don't set the fields that will never be used.
This commit was SVN r12941.
2006-12-29 07:55:17 +00:00
George Bosilca
0b5d879a63 ompi_convertor_pack does not return errors (all checks are done when the
convertor is created).

This commit was SVN r12940.
2006-12-29 07:40:02 +00:00
George Bosilca
d8db9e49f3 Set the bml_btl to NULL or segfault !!!
This commit was SVN r12939.
2006-12-29 07:38:24 +00:00