Commit graph

1224 commits

Author SHA1 Message Date
Brian Barrett
5cadbbbf41 Fix for bug #140. If we're leaving things pinned, certain assumptions about
where to look for registrations that were used in the alloc/free code don't
work (because the memory returned from malloc() -- whoever gets around to
calling it) might actually be registered already.  So just call malloc
and free directly and avoid the whole issue when leave pinned is on.  After
all, you have to pay the registration cost sometime, and if leave pinned
is on, you only have to pay it once.  It makes things much simpler to
have that once be at first use rather than during ALLOC_MEM, and as far
as I can read, we're still standards conformant this way.

This commit was SVN r10406.
2006-06-17 18:34:41 +00:00
Brian Barrett
c9e8dbc10e * fix for multi-nic case with put protocol -- index will be 1 for the first
put request if we have more than one nic

This commit was SVN r10397.
2006-06-16 22:25:04 +00:00
George Bosilca
27000ef7d6 More compact and readable code. Otherwise, no big difference with the
previous version.

This commit was SVN r10389.
2006-06-16 03:07:42 +00:00
George Bosilca
3f96f39e46 If the goal of this code was to copy the iovec and skip the first offset
bytes then it was not correct.

This commit was SVN r10388.
2006-06-16 03:06:30 +00:00
George Bosilca
93afe59226 It is not required to initialize the csum.
This commit was SVN r10387.
2006-06-16 03:05:20 +00:00
George Bosilca
1f96768b76 For zero length persistent request do not reposition the convertor as
it is not initialized.

This commit was SVN r10386.
2006-06-16 03:04:41 +00:00
Brian Barrett
05046e8ad2 if MX isn't running on some hosts, but is on others, we were blocking in the modex receive
waiting for the non-running procs to publish their contact information.  Publish their
(lack of) contact information.

This commit was SVN r10355.
2006-06-14 19:07:38 +00:00
George Bosilca
aca71521db Complete the move of the mpool registration from opal_list_item_t to the
ompi_free_list_item_t.

This commit was SVN r10354.
2006-06-14 17:43:50 +00:00
Galen Shipman
5d71c149c2 Another fix for PML request completion when local network completion can occur
out of order.. 

Reviewed by Brian.. needs to hit 1.1 

This commit was SVN r10353.
2006-06-14 16:55:35 +00:00
Brian Barrett
d367dc5d56 * Fix for bug #115 -- we need to decrement the use count on a pinned buffer
so that memory is actually deregistered.  Reviewed by Galen.

This commit was SVN r10349.
2006-06-14 13:38:24 +00:00
George Bosilca
3727fa2ae6 Nothing relevant. I add some more output in the case we have a checksum error.
Just to be able to know more information about the failure.

This commit was SVN r10337.
2006-06-13 19:36:38 +00:00
Galen Shipman
0eddad6849 Handle out of order completion/receives when marking completion...
this is a fix for #107... needs to go to the 1.1 branch.. 

This commit was SVN r10331.
2006-06-13 16:57:41 +00:00
Andrew Friedley
c68c6ac122 A number of fixes and the usual cleanup..
- Added some basic flow control to limit number of posted sends.
- Merged endpoint send/recv lock into single endpoint lock.
- Set the LMR triplet length in the send path, not at allocation time.
  This has to be done because upper layers might send less than the
  amount allocated.
- Alter the tie-breaker if statement protecting the second call
  to dat_ep_connect().  The logic was reversed compared to the tie-
  breaker for the first dat_ep_connect(), making it possible for
  3 or more processes to form a deadlock loop.
- Some asserts were added for debugging purposes.. leaving them
  in place for now.

This commit was SVN r10317.
2006-06-12 22:42:01 +00:00
Galen Shipman
218a438509 finished the ompi_free_list_t class nightmare..
This commit was SVN r10314.
2006-06-12 22:09:03 +00:00
Galen Shipman
18dda70fd0 make ompi_free_list_item_t a class..
This will go to the 1.1 branch but will probably require a few changes as
ompi_free_list_t is different in the branch.. 

This commit was SVN r10306.
2006-06-12 16:44:00 +00:00
Brian Barrett
d3257f22d8 * back out Galen's r10300 because it breaks the build. Real fix coming RSN.
This commit was SVN r10303.

The following SVN revision numbers were found above:
  r10300 --> open-mpi/ompi@b0f3745791
2006-06-12 14:38:14 +00:00
Gleb Natapov
48d348b577 Don't complete send request before we've got completion on the first rndv packet.
Sender can receive and complete PUT request before it gets completion on the first rndv packet. The sendreq struct may be reused for the next MPI_Send, and an unexpected completion messes things up. I sometimes got SEGV and sometimes data corruption.

This commit was SVN r10301.
2006-06-12 14:00:43 +00:00
Galen Shipman
b0f3745791 declare these as ompi_free_list_item_t's
This needs to go to 1.1

This commit was SVN r10300.
2006-06-12 13:26:15 +00:00
George Bosilca
7d1feffbf7 The real solution. If the sendreq->req_send.req_bytes_packed is zero then there
is no data to be transferred. And this is the condition which leads to an
uninitialized convertor.

This commit was SVN r10299.
2006-06-12 06:18:18 +00:00
George Bosilca
c959c2f214 Don't reset the convertor's position if it wasn't initialized before. This can
only happen for zero byte persistent requests.

This commit was SVN r10298.
2006-06-12 06:14:35 +00:00
Galen Shipman
9d73217637 These list items are free list items, and should inherit properly..
This commit was SVN r10295.
2006-06-11 20:19:12 +00:00
Brian Barrett
d5acb4e3cc * silence dumb (and mostly useless) warning during cleanup
This commit was SVN r10280.
2006-06-09 21:09:53 +00:00
Brian Barrett
cc99a63169 * fix issue with PANFS not building properly - we didn't add PANFS_LIB to the
list of libraries

This commit was SVN r10279.
2006-06-09 20:41:12 +00:00
Jeff Squyres
a4030ad2d9 Improve the tremendously unhelpful MCA help message for the
btl_openib_ib_mtu and btl_mvapi_ib_mtu MCA params by showing the valid
values and what they represent (got a question about this from Cisco
testing engineers).

This commit was SVN r10277.
2006-06-09 18:02:45 +00:00
Andrew Friedley
9a92394bfd Mostly cleanups - preprocessor fixes and removal of OPAL_OUTPUTs.
Also updated to match recent mpool_free changes.

This commit was SVN r10273.
2006-06-09 00:18:29 +00:00
Andrew Friedley
75176370ae blah. somehow missed adding .ompi_ignore/.ompi_unignore.
This commit was SVN r10272.
2006-06-09 00:15:36 +00:00
Andrew Friedley
cca1616368 Finally committing the UD BTL.
UD is the Unreliable Datagram transport for Infiniband, specifically OpenIB.  This BTL is derived from the existing openib BTL, which is RC (Reliable Connection) based.

Still a work in progress, as there is a lot of work left to do.  Specifically, performance, scalability, and flow control need to be addressed.

Currently I'm playing around with different methods for handling receive buffers, as well as profiling to figure out where the time is going.

This commit was SVN r10271.
2006-06-09 00:13:45 +00:00
Galen Shipman
08823e56fa check address before looking for the item in the tree corresponding to the
address.. 
All have been reviewed by brian.. putting in a changeset request.. 

This commit was SVN r10256.
2006-06-08 16:27:59 +00:00
Galen Shipman
636ef0cf6c don't put back null items on the list..
This commit was SVN r10253.
2006-06-08 14:46:41 +00:00
Galen Shipman
429056078a fix numerous late night errors..
1) don't need tree if memory is just malloc'd 
2) fix memory and free list leak.. 
3) deregister first and then free... doh.. 

This commit was SVN r10251.
2006-06-08 14:23:20 +00:00
Galen Shipman
5a2ceda93f a couple of stupid late night mistakes...
This commit was SVN r10250.
2006-06-08 13:39:41 +00:00
Galen Shipman
0bb8a6fca8 roll back to not use memalign
This commit was SVN r10249.
2006-06-08 04:34:04 +00:00
Galen Shipman
b42b0bd1af potential fix for ticket #81
Added a tree to track memory allocation from MPI_Alloc_mem, this allows us to
free the registrations in a sane fashion.. also should be faster.. 

This commit was SVN r10248.
2006-06-08 04:29:27 +00:00
Sven Stork
c31e6f9767 use memalign instead of malloc + manually alignment in the mvapi mpool
revert commit 10243

This commit was SVN r10247.
2006-06-07 23:21:23 +00:00
Andrew Friedley
5ace292cc1 Should fix ticket #81 - which is specific to MVAPI, I've included the same fix for gm/openib as well.
uDAPL has the same problem, will fix in separate commit so it doesn't go to branch.

This commit was SVN r10243.
2006-06-07 15:52:48 +00:00
Galen Shipman
84479d0b5a potential fix for iprobe test,, tested with openib.. will have andy try ud..
This commit was SVN r10232.
2006-06-06 22:10:41 +00:00
Galen Shipman
90799f82cd copy paste error..
This commit was SVN r10220.
2006-06-06 02:38:29 +00:00
Galen Shipman
cc54b07aa0 add better error messages for vapi retry exceeded errors.
This commit was SVN r10219.
2006-06-06 02:04:56 +00:00
Galen Shipman
9e6e7575b9 doh... add the file..
This commit was SVN r10210.
2006-06-05 21:24:42 +00:00
Galen Shipman
f05dee0435 add help file to explain why things went south..
This commit was SVN r10209.
2006-06-05 21:23:45 +00:00
Galen Shipman
74c97fb784 cleanup error reporting.. use ompi_proc_t->proc_name if available this gives
us source/dest hostnames for communication errors.. 

This goes to 1.1 branch (reviewed by Brian).. 

This commit was SVN r10200.
2006-06-05 20:02:41 +00:00
Brian Barrett
c70fff6ed0 * Fix for bug #44 for the trunk -- remove a bunch of warnings from the DR
PML when compiling on Solaris.  Patch won't apply cleanly to the v1.1
  branch, so a diff for that is coming up soon.

This commit was SVN r10173.
2006-06-01 18:58:38 +00:00
Galen Shipman
83ff3201b5 don't use rank or nprocs in error messages when we don't have them..
This should hit 1.1 and 1.0 branches.. 
Reviewed by Brian

This commit was SVN r10164.
2006-06-01 14:24:11 +00:00
Galen Shipman
0344ae4ac5 Fix to allow eager limit and max send size to be any size (within resource limitations). Instead of storing the ompi_free_list_t * in the fragment, we use the frag type enum, this tells us where the frag came from and where it should return.. This could also be done in mvapi but is not a high priority moving forward..
Review by Brian, needs to hit the trunk + 1.1 release.. 

This commit was SVN r10157.
2006-06-01 02:32:18 +00:00
Brian Barrett
5163f2b296 Fix for bug #36. The MX, MVAPI, and OpenIB components don't have
support for progress threads, so we shouldn't build them or try to use
them when support for progress threads has been requested.  The TCP, GM,
SELF, and SM BTLs should have progress thread support, so they aren't
disabled.  The Portals BTL isn't compiled on platforms with threads,
so it doesn't need to be updated.

This commit was SVN r10156.
2006-06-01 01:30:16 +00:00
Galen Shipman
c79efc9efb track which list a fragment came from, allows returning based on list, not
on size. 

This commit was SVN r10142.
2006-05-31 14:24:32 +00:00
Brian Barrett
4904e34a52 set datarootdir, necessary for Autoconf-2.60 which will define some variables
based upon this value (e.g., datadir, docdir).

Submitted by: Ralf Wildenhues
Reviewed by: Brian Barrett

This commit was SVN r10133.
2006-05-31 03:43:55 +00:00
Brian Barrett
6026fc98f6 * Fix M4 quoting so that AC 2.60 won't complain
Submitted by: Ralf Wildenhues
Reviewed by: Brian Barrett

This commit was SVN r10129.
2006-05-31 03:39:18 +00:00
Brian Barrett
c723d196c5 Rather than using fragment size to determine fragment type, use an enum.
Do this rather than the my_list pointer because we need to do some
things that are somewhat special because we pre-pin eager fragments but
not send fragments.  Also makes a couple ideas I have slightly easier to
play around with.

This commit was SVN r10127.
2006-05-31 03:34:32 +00:00
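
The enum-based fragment typing described in c723d196c5 could look roughly like the sketch below. It is only an illustration, not the actual BTL code: `frag_type_t`, `frag_t`, `frag_list_t`, `module_t`, and `frag_return()` are hypothetical names. The point is that the fragment records which list it came from, so returning it never depends on comparing its size against the eager limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment kinds; the real component may define others. */
typedef enum { FRAG_EAGER, FRAG_MAX, FRAG_TYPE_COUNT } frag_type_t;

typedef struct frag {
    struct frag *next;  /* intrusive free-list link */
    frag_type_t  type;  /* set once, when the fragment is allocated */
    size_t       size;  /* payload size; no longer used to pick a list */
} frag_t;

typedef struct {
    frag_t *head;
    size_t  length;
} frag_list_t;

typedef struct {
    frag_list_t lists[FRAG_TYPE_COUNT];  /* one free list per fragment kind */
} module_t;

/* Return a fragment to the list it was taken from, chosen by type only. */
static void frag_return(module_t *m, frag_t *f)
{
    assert(f->type < FRAG_TYPE_COUNT);
    frag_list_t *list = &m->lists[f->type];
    f->next = list->head;
    list->head = f;
    list->length++;
}
```

With this layout a max-send fragment that happens to be smaller than the eager limit still goes back to the max-send list, which is the case a size-based lookup gets wrong.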
Galen Shipman
2667c52a5d Track fragments by list, not by size..
-- reviewed by Brian, needs to hit all the branches.. 

This commit was SVN r10078.
2006-05-25 18:07:26 +00:00
Galen Shipman
38a0561d9b Allow maximum send size to be less than the eager limit.
Instead of figuring out which free list the fragment belongs to based on size
we simply store a pointer to the list which it belongs in the fragment.

This was reviewed by Brian and should hit all the branches.

This commit was SVN r10072.
2006-05-25 16:57:14 +00:00
Andrew Friedley
fa9ec2afdf Add my sandia username for convenience
This commit was SVN r10071.
2006-05-25 15:49:11 +00:00
Andrew Friedley
8a3d0862ca I can commit! *happy dance*
Trying to remember what I did here.. eager/max messages should work now, no RDMA yet.  A number of other fixes and cleanups.

I do know of two problems:
 Bad stuff happens when flooded with send frags too quickly - the BTL doesn't handle flow control.
 Certain IBM tests turn up a length assertion in the datatype engine - needs more investigation.

This commit was SVN r10070.
2006-05-25 15:47:59 +00:00
Gleb Natapov
f590d8a190 fix eager RDMA on PPC64.
This commit was SVN r10059.
2006-05-25 11:05:12 +00:00
Jeff Squyres
dd44d36be0 Fix for ticket #25. Ensure that in the threaded case where we have
This commit was SVN r10043.
2006-05-24 16:15:07 +00:00
George Bosilca
95d0395578 I'm skeptical about the ability of the compiler to correctly optimize the
loop local variables.

This commit was SVN r10019.
2006-05-23 03:21:15 +00:00
George Bosilca
085cac552f Don't let TCP create local connections, we have the self BTL for this purpose.
This commit was SVN r10018.
2006-05-23 03:06:32 +00:00
George Bosilca
837221831a Temporary solution for in-bound computation of the next BTL.
This commit was SVN r10016.
2006-05-22 23:28:40 +00:00
George Bosilca
b8ef0cc749 Minor cleanups.
This commit was SVN r10001.
2006-05-21 05:55:21 +00:00
George Bosilca
e43fbd0082 Remove all useless variables. Minor cleanups.
This commit was SVN r10000.
2006-05-21 05:53:22 +00:00
Galen Shipman
9165882c07 fixes for failover...
This commit was SVN r9998.
2006-05-20 02:39:05 +00:00
Gleb Natapov
1c1b87a9f1 init mutex before use.
This commit was SVN r9963.
2006-05-18 09:35:11 +00:00
Jeff Squyres
15758d5f29 Fix AC_DEFINE to match what it's supposed to be defining
This commit was SVN r9952.
2006-05-17 03:26:43 +00:00
Galen Shipman
deb2254c91 1. mpool_free changes to allow null registrations
2. fix for MPI_Free_mem, was calling deregister but never called mpool_free.. so
we leaked memory. Still an open issue here though, if the memory is alloc'd
and the mpool doesn't create and cache a registration, we will never find the
mpool to free with. 

This commit was SVN r9944.
2006-05-16 22:04:31 +00:00
Jeff Squyres
7b59847765 Ensure that endpoint->endpoint_addr is not NULL before trying to
dereference through it.  It is legal for endpoint_addr to be NULL in the
destructor because if btl_tcp_add_procs() -> btl_tcp_proc_insert()
returns UNREACH, then endpoint_addr will be NULL and we'll OBJ_RELEASE
it.

This commit was SVN r9940.
2006-05-16 19:01:08 +00:00
Jeff Squyres
e24377a89c Back out a pair of commits from George from last week because they
apparently don't work properly: r9869, r9868 (sm btl alignment issues)

This commit was SVN r9936.

The following SVN revision numbers were found above:
  r9868 --> open-mpi/ompi@9b985c3216
  r9869 --> open-mpi/ompi@adedf511fb
2006-05-16 16:48:43 +00:00
Sven Stork
da7ad0e8b8 - update function name inside debug statement
This commit was SVN r9933.
2006-05-16 14:33:41 +00:00
Brian Barrett
dcc6b47fa2 * put rdma operations in the send event queue instead of receive because it's
easier to do event accounting that way
* greatly increase receive event and buffer sizes.  We're still about half
  of what Cray defaults to, so I don't feel bad about the increases
* Implement a pre-pinning optimization for eager fragments - will be
  pinned on first use and left pinned for the life of the fragment
* Since we can't have two receive frag callbacks fired at the same time,
  don't have receive free list - just keep one receive fragment in the
  module.  Saves a big free list and all that interaction.

This commit was SVN r9915.
2006-05-14 04:23:26 +00:00
Brian Barrett
db03ca0cc0 rip out a bunch of code that didn't work and really sucked and was only there
to try to get some numbers that I couldn't actually get.  So back to the
restart point.

This commit was SVN r9914.
2006-05-14 00:59:40 +00:00
Brian Barrett
f2a6e63d82 Fix for the double iWrite problem Edgar found with ROMIO, plus some other
things I found:
  - Locking should prevent it from happening (I think), but there was a 
    race condition in the component progress -- a callback could be
    triggered that would free the request before it was off the outstanding
    requests list.
  - When pulling a request off the component free list, make sure to
    reinitialize the free_called state on the IO request.  This was
    what was causing Edgar's failures
  - In the request cleanup code, pull the request out of the per-
    component free list before returning to the free list.  This
    probably would cause asserts to fire, although it looks like
    I wrote the loops such that it would have been memory safe if
    the asserts didn't fire.  Not really sure why I did that, but
    let's try it again...

This should go to the v1.0 and v1.1 branches.

This commit was SVN r9913.
2006-05-13 02:30:40 +00:00
Jeff Squyres
a6d52ceed1 Minor correction in use of mca param API; otherwise the param is not found.
This commit was SVN r9903.
2006-05-11 22:12:29 +00:00
Andrew Friedley
4c3aa05c83 uDAPL expects memory for enumerating interface adapters in a really
weird way - fix up to do things 'properly'.

Add my sandia username to the unignore.

This commit was SVN r9879.
2006-05-10 19:50:30 +00:00
George Bosilca
adedf511fb Remove the printf that I unfortunately committed.
This commit was SVN r9869.
2006-05-10 00:02:54 +00:00
George Bosilca
9b985c3216 Force the useful data to be aligned on special boundary. It is 32 bits
right now. Some testing on large NUMA machines should be done in order
to make sure that we need to export this variable out to the MCA layer.

This commit was SVN r9868.
2006-05-09 21:46:10 +00:00
George Bosilca
a386fccccc Increase the default limits for the SM BTL. These new
values allow better performances on all the clusters
I was able to test.

This commit was SVN r9867.
2006-05-09 21:44:24 +00:00
Brian Barrett
91086cf2a4 * we want to unlink match entries when we unlink memory descriptors, but
I want to be lazy and not do it by hand, so set the match entries to
  PTL_UNLINK.

This commit was SVN r9861.
2006-05-09 14:20:51 +00:00
Gleb Natapov
0c34d5c9e6 fix endpoint matching in on demand connection establishment. This fix is in mvapi btl already.
This commit was SVN r9855.
2006-05-09 12:12:52 +00:00
Brian Barrett
1d337831d0 Fixes for more issues found by Dries Kimpe:
  - We had a bad conditional choice, such that asking for pvfs2 would
    result in pvfs trying to build as well, which was going to fail.
  - We didn't try to link in the library for PVFS2's adio component.
  - We were clobbering romio_flags, so it was impossible to pass
    flags to romio (like the selection of filesystems)

This commit was SVN r9854.
2006-05-09 09:30:09 +00:00
Galen Shipman
c992eeb1f3 don't need to decrement memory registered twice,, this is done in
mru_delete.. 

This commit was SVN r9853.
2006-05-08 17:42:34 +00:00
Brian Barrett
7dddc6d54c Define the constants needed by ROMIO to activate support code for
DARRAY / SUBARRAY.

This commit was SVN r9851.
2006-05-08 16:33:31 +00:00
Brian Barrett
462849d88c Fix two issues reported by Dries Kimpe:
- LDFLAGS set at the top level of Open MPI were not passed to the 
   ROMIO configure script
 - If ROMIO was explicitly required (with --enable-io-romio) and
   not able to be built, abort OMPI's configure script.

This needs to go to the v1.0 and v1.1 branches.

This commit was SVN r9845.
2006-05-08 13:13:32 +00:00
Brian Barrett
8397a1d71f still running into issues, but...
- change MASK behavior for tags - we need the upper bit to be whether
  the tag is reserved or not.  MPI_ANY_TAG should not pull off any
  reserved tag communication
- some other random debugging output to try to get some idea what is
  spewing out of here.

This commit was SVN r9844.
2006-05-08 09:23:09 +00:00
George Bosilca
e658557d52 Move the convertor creation out of the critical path. If we expect a
message from a known peer (not MPI_ANY_SOURCE) then we can attach the
remote proc and initialize the convertor as soon as we know the data-type,
and the count (so basically in the _INIT macro). If it's not the case, then
create them in the _MATCHED macro (as in the original version). Of course,
before initializing the convertor we check that there will be some data
in the message.

This commit, plus the convertor improvements from few days ago, lower the
latency for my test case environment (mvapi) by 0.1 microseconds. The convertor
now is as slim as it can be, I don't think there is anything else to
remove/improve. 

This commit was SVN r9843.
2006-05-07 21:03:12 +00:00
George Bosilca
a7542824ed Generic length computation (moved from the endpoint.h).
This commit was SVN r9842.
2006-05-07 20:54:44 +00:00
George Bosilca
569b88e093 The endpoint include is not required.
This commit was SVN r9841.
2006-05-07 20:52:55 +00:00
George Bosilca
e63c1dc242 The last commit wasn't supposed to bring this function in. It's not yet
ready for primetime...

This commit was SVN r9840.
2006-05-07 20:51:43 +00:00
George Bosilca
33aa65f894 Remove useless include.
This commit was SVN r9839.
2006-05-07 20:49:45 +00:00
Galen Shipman
a4c9db0c18 decrease the total bytes in the rcache when a registration is deleted from the
cache. 

This commit was SVN r9837.
2006-05-07 01:15:33 +00:00
Rainer Keller
0f9b10ff8e - Update test dup MPI_COMM_WORLD -- so that we may
have additional Barriers for output.

This commit was SVN r9831.
2006-05-05 07:42:33 +00:00
Rainer Keller
71d328c086 - Add the PERUSE_COMM_REQ_XFER_CONTINUE for recv.
This commit was SVN r9820.
2006-05-04 19:31:33 +00:00
Tim Woodall
161e54e6c8 finalize/cleanup failed btl
This commit was SVN r9819.
2006-05-04 18:48:45 +00:00
Tim Woodall
d8ff8010f3 track whether the vfrag is being retransmitted
This commit was SVN r9817.
2006-05-04 17:30:58 +00:00
Tim Woodall
1b26caa95b first cut at btl failover - seems to be working for simple test case
This commit was SVN r9816.
2006-05-04 16:16:26 +00:00
Tim Woodall
350d5b1713 change hardcoded values into mca params
This commit was SVN r9815.
2006-05-04 15:20:18 +00:00
Tim Woodall
fdd622544b added optional copy routine to allow "derived" class
of mca_bml_base_endpoint to copy state if an endpoint
is updated (e.g. btl deleted/added)

This commit was SVN r9814.
2006-05-04 15:19:12 +00:00
Brian Barrett
d101e91b97 * fix matching logic - since tag might be negative, need to mask the proper bits
or the bit-wise or changes all the high bits, which is bad
* push convertor creation to init to save a bit of time
* make debugging use macros so that it can go bye-bye

This commit was SVN r9810.
2006-05-04 13:48:32 +00:00
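
The masking problem in d101e91b97 can be reproduced in a few lines. The sketch below assumes a hypothetical 64-bit match-bits layout (16-bit context, 16-bit source, 32-bit tag); the real Portals PML layout may differ, and `make_match_bits()` and `TAG_MASK` are illustrative names only. A negative tag sign-extends when widened, so OR-ing it in unmasked sets every high bit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (illustration only): [ context:16 | source:16 | tag:32 ] */
#define TAG_MASK 0x00000000FFFFFFFFULL

/* Buggy variant: a negative tag widens to all-ones in the upper 32 bits,
 * so the OR overwrites the context and source fields. */
static uint64_t make_match_bits_buggy(uint16_t ctx, uint16_t src, int32_t tag)
{
    return ((uint64_t)ctx << 48) | ((uint64_t)src << 32) | (uint64_t)tag;
}

/* Fixed variant: mask the widened tag down to its own 32-bit field first. */
static uint64_t make_match_bits(uint16_t ctx, uint16_t src, int32_t tag)
{
    return ((uint64_t)ctx << 48) | ((uint64_t)src << 32)
         | ((uint64_t)tag & TAG_MASK);
}
```

For non-negative tags both variants agree; they only diverge once the tag is negative, which is why the bug can go unnoticed with ordinary user tags.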
George Bosilca
bdecdc8d41 Cleanup the MX BTL. Remove all mpool related code as there will never be a MX mpool.
This commit was SVN r9808.
2006-05-04 06:55:45 +00:00
George Bosilca
c5209aad93 The return value is random. Let's return something that make sense.
This commit was SVN r9805.
2006-05-03 18:17:00 +00:00
Brian Barrett
6db0f2a027 * couple of corrections to compile on Red Storm
This commit was SVN r9801.
2006-05-03 13:13:59 +00:00
Brian Barrett
4add400f7d * properly start with the memory descriptor inactive
This commit was SVN r9787.
2006-05-01 20:23:38 +00:00
Brian Barrett
5f939c53be * first take at send / receive for a portals pml (still really dumb and simple)
This commit was SVN r9786.
2006-05-01 20:03:49 +00:00
Brian Barrett
56f48357b3 * don't try to register callback at init time (will do at window creation time
anyway), so that we can run without ob1

This commit was SVN r9785.
2006-05-01 20:03:03 +00:00
Brian Barrett
4256705ffb * rename irecv, isend, and iprobe files to recv, send, and probe
This commit was SVN r9780.
2006-04-29 22:06:21 +00:00
Brian Barrett
315a889247 Try to get the Portals PML going again, just to get some data for the Cray
paper.  This is just the shell, for checkpoint.  Changes:

* Fix copyrights
* remove cancel code and ptl references
* add dump command 

This commit was SVN r9779.
2006-04-29 22:05:20 +00:00
Tim Woodall
02d991532f interface to post a callback for notification of change to modex data
This commit was SVN r9753.
2006-04-27 16:15:35 +00:00
Tim Woodall
4fd2a71b6c removed debug code - free list implementation has changed
This commit was SVN r9750.
2006-04-27 15:34:12 +00:00
Brian Barrett
9cab1bb54a * re-enable the eager fragment throttling, this time with the proper threshold value for when
the memory descriptor is closing itself, so that it actually works properly ;).  I think I
  was just getting lucky and not sending enough short messages with the reference impl.

This commit was SVN r9748.
2006-04-27 14:13:52 +00:00
Brian Barrett
66d1d3b83f * add a quick debugging sanity check
* It appears that Cray's SeaStar has some horrible performance for iovecs - IN_PLACE
  was actually slower than copying into eager frags.  Ugh.  And we don't even pre-pin
  eager frags yet!

This commit was SVN r9738.
2006-04-27 02:55:31 +00:00
George Bosilca
3e968d4f63 There is no length on the free list.
This commit was SVN r9704.
2006-04-24 23:13:51 +00:00
Brian Barrett
1da22f9099 * silence a bunch of compiler warnings on Solaris when using the Sun
compilers.

  This should go to the v1.1 branch

This commit was SVN r9693.
2006-04-23 21:15:09 +00:00
Brian Barrett
9befdc7d9f * Ensure that mca_common_sm_mmap_seg_alloc() always returns a word-aligned
pointer.  Otherwise, we can end up segfaulting when the memory area is
  used by the caller.  Fixes a bug reported by Alex Spiegel.

This commit was SVN r9692.
2006-04-23 21:14:03 +00:00
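
Word alignment of the kind 9befdc7d9f asks for is usually done by rounding the address up to the next multiple of the word size. The helper below is a generic sketch under that assumption; `align_up()` is a hypothetical name and is not taken from mca_common_sm_mmap_seg_alloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * align must be a non-zero power of two (e.g. sizeof(void *)). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~((uintptr_t)align - 1);
}
```

An allocator that hands out offsets into a shared segment could apply this to the current offset before returning a pointer, at the cost of a few padding bytes per allocation.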
Brian Barrett
9a65ddd788 * back out r9005, which for some reason works fine on the reference implementation
but causes resource exhaustion on the Red Storm implementation.  Sigh...

This commit was SVN r9686.

The following SVN revision numbers were found above:
  r9005 --> open-mpi/ompi@20d06e889e
2006-04-22 20:12:33 +00:00
George Bosilca
29219ee57d Thanks to Gleb now we are able to call the scheduler on Windows. Instead of using
sched_yield, we use our friend SwitchToThread.

This commit was SVN r9671.
2006-04-20 19:56:50 +00:00
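
A portable yield wrapper along the lines of 29219ee57d might be sketched as follows. `sched_yield()` and `SwitchToThread()` are the real POSIX and Win32 calls; the wrapper name `cpu_yield()` is hypothetical and the actual Open MPI code may be organized differently.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

/* Give up the rest of the current time slice so another thread can run. */
static void cpu_yield(void)
{
#ifdef _WIN32
    SwitchToThread();   /* Win32 counterpart of sched_yield() */
#else
    sched_yield();
#endif
}
```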
Graham Fagg
c31a5ad4b3 A few small changes that just expanded in the name of neatness...
(1) As pointed out by Torsten after Jeff's comment yesterday that there are 15 collectives.. nope.. I have 16 but
    miscounted them in my ifdefs (I had two #11s). Replaced with an enum...
(2) Added a readonly MCA param for how many backend algorithms are available per collective (used by benchmarker/STS)
    This allowed me to remove the tuned query internal functions and replace them with ompi_coll_tuned_forced_max_algorithms[COLL].
(3) I was reading the user forced MCA params for the collectives on each comm create (module init) but I then put the 
    values into a global set of variables (like ompi_coll_tuned_reduce_forced_algorithm).

    To fix this and make the code neater:
    (a) The component looks up the MCA param indices on Open if dynamic_rules is set via the
                        ompi_coll_tuned_COLLECTIVE_intra_check_forced_init () call.
    (b) Got rid of the ompi_coll_ompi_coll_tuned_COLLECTIVE_forced_algorithm/segmentsize/etc globals with a struct that
            is now cached on the module data hung off the communicator. i.e. done right.
    (c) On module init if dynamic rules enabled we call a general getvalues routine (in coll_tuned_forced.c) to get the
            CURRENT values using the MCA param indices and then put them on the modules data segment.
        A shorter version of getvalues exists for barrier which only needs the algorithm choice

This commit was SVN r9663.
2006-04-19 23:42:06 +00:00
Andrew Friedley
345551cb36 Checkpoint before starting work on max-sized frags (maybe user too?).
- Some initial work on prepare_src
- Move some fragment initialization around
- Fix a union casting issue on picky compilers, identified by Don Kerr
- Other small cleanups/bugfixes

This commit was SVN r9662.
2006-04-19 22:20:22 +00:00
George Bosilca
61bea41350 The same in MX (missing copyright).
This commit was SVN r9661.
2006-04-19 21:37:30 +00:00
George Bosilca
afe9821d84 Add a missing copyright.
This commit was SVN r9660.
2006-04-19 21:36:22 +00:00
Tim Woodall
10f343734f decrease eager limit to 12K (improves latency)
This commit was SVN r9646.
2006-04-14 22:28:37 +00:00
Tim Woodall
6523c12e4b - decrease eager limit to 12K (improves latency)
- trigger event library while setting up connections

This commit was SVN r9645.
2006-04-14 22:28:05 +00:00
Tim Woodall
c6489cb5aa - turn on eager rdma by default
This commit was SVN r9641.
2006-04-14 21:11:14 +00:00
George Bosilca
b3cc3d82d3 Activate the OOB while we setup connections for MVAPI. Same thing should be done for the
Open IB ...

This commit was SVN r9640.
2006-04-14 20:53:42 +00:00
Galen Shipman
ba0aa46220 make csum's optional in pml dr, on by default, see mca param
pml_dr_enable_csum

This commit was SVN r9608.
2006-04-10 21:54:46 +00:00
Gleb Natapov
98282a3567 fix spelling. threashold -> threshold.
This commit was SVN r9577.
2006-04-08 08:13:37 +00:00
Andrew Friedley
d461b55696 - Implement OOB connection handshaking via the ORTE RML. To start a connect,
we send our local addr_t OOB.  Remote side then matches endpoints and calls
  dat_ep_connect().  Everything should be the same as before from here, except
  that client/server roles are reversed.
- Properly set our buffer size when posting receives.  When the frag used to
  transfer address information is recycled by the free list, the wrong buffer
  size was being used, which caused buffer overflow errors.
- Finally put the uDAPL error handling stuff in the mpool component.
- Remove a few more OPAL_OUTPUTs.

This commit was SVN r9569.
2006-04-07 15:26:05 +00:00
Galen Shipman
c29db49198 return out if we ack a duplicate matched rendezvous from matched receives
sequence tracker and the communicator is null.. 

This commit was SVN r9521.
2006-04-03 21:04:51 +00:00
Gleb Natapov
b6ab1f4262 fix compilation warnings.
This commit was SVN r9515.
2006-04-02 11:32:25 +00:00
George Bosilca
22572940c8 Remove some useless code.
This commit was SVN r9513.
2006-04-01 07:42:43 +00:00
George Bosilca
58cd591d3b PERUSE support for OB1. There we go, now the trunk has a partial peruse implementation.
We support all the events in the PERUSE specifications, but right now only one event
of each type can be attached to a communicator. This will be worked out in the future.
The events were placed in such a way, that we will be able to measure the overhead
for our threading implementation (the cost of the synchronization objects).

This commit was SVN r9500.
2006-03-31 17:09:09 +00:00
George Bosilca
1226d452bf Add a base _START macro that will do the base initialization. Additionally, that allows me to
add the PERUSE event in a more homogeneous manner (all PML's will have them).

This commit was SVN r9499.
2006-03-31 17:05:09 +00:00
Andrew Friedley
74b2f77a4c The expected cleanup/refactoring commit..
Not much got tested that wasn't already - I've uncovered a connection
establishment deadlock and wanted to get these changes committed before I
attack it.

The big changes:
 - Moved much of the connection code from btl_udapl_component.c to
   btl_udapl_endpoint.c.
 - Cleaned up initialization of various fragment members.
 - MCA_BTL_UDAPL_ERROR macro, which is compiled in/out appropriately.

This commit was SVN r9496.
2006-03-31 16:25:19 +00:00
Galen Shipman
1d67917b69 must handle header validation correctly for each case, not enough in common
for the MACRO 

This commit was SVN r9486.
2006-03-30 21:27:21 +00:00
Tim Woodall
9a73fe8beb check for valid sequence number before attempting to use communicator
This commit was SVN r9482.
2006-03-30 19:36:15 +00:00
Gleb Natapov
256bf70530 Forgot to add file to previous commit
This commit was SVN r9480.
2006-03-30 17:37:52 +00:00
Gleb Natapov
79bcfb096f Add type to frag. Sometimes we need to know that a frag is from short rdma area.
I used hack for this that doesn't work for mvapi, so changing it to something more sane.

This commit was SVN r9477.
2006-03-30 15:26:21 +00:00
Gleb Natapov
ea11582191 Porting of short message RDMA from openib BTL. Endpoint registers circular buffer and sends its address and rkey to the peer. Peer uses this buffer to eagerly RDMA small message into it. Endpoint polls the buffer for message arrival before checking HP/LP QPs. Set btl_mvapi_use_eager_rdma to 1 to enable it.
This commit was SVN r9474.
2006-03-30 12:55:31 +00:00
Galen Shipman
641fa6c0d2 more fixes, reset state on completion..
This commit was SVN r9469.
2006-03-29 22:21:35 +00:00
Galen Shipman
2945f77f9e randomly drop fragments without local completion, currently commented out as
we must handle the other cases first.. 

This commit was SVN r9468.
2006-03-29 22:19:58 +00:00
Andrew Friedley
0eba366b07 Various pieces all over to make basic small message send/recv work. Next step
is clean up the code.. it is in need of refactoring and testing.

Thanks to Brian for help in troubleshooting!

This commit was SVN r9466.
2006-03-29 21:55:41 +00:00
Galen Shipman
5271948ec0 --- opal object changes
add object size to opal class
no longer need the size when allocating a new object as this is stored in
the class structure

--- dr changes 
Previous rev. maintained state on the communicator used for acking duplicate
fragments, but the communicator may be destroyed prior to successful
delivery of an ack to the peer. We must therefore maintain this state
globally on a per peer, not a per peer, per communicator basis. 
This requires that we use a global rank on the wire and translate this as
appropriate to a local rank within the communicator. 

This commit was SVN r9454.
2006-03-29 16:19:17 +00:00
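The rank translation described in the dr changes of 5271948ec0 (a global rank on the wire, mapped to a local rank within the communicator) could look roughly like the sketch below. The names `sketch_comm_t`, `global_ranks`, and `sketch_global_to_local` are hypothetical and are not taken from the Open MPI source; a linear search is used only to keep the example short.

```c
#include <stddef.h>

/* Hypothetical communicator view: the global rank of each local member. */
typedef struct {
    const int *global_ranks;  /* global_ranks[local] == global rank */
    size_t     size;          /* number of processes in the communicator */
} sketch_comm_t;

/* Translate a global rank received on the wire into a rank local to
 * the communicator; returns -1 if the peer is not a member. */
static int sketch_global_to_local(const sketch_comm_t *comm, int global_rank)
{
    for (size_t i = 0; i < comm->size; ++i) {
        if (comm->global_ranks[i] == global_rank) {
            return (int)i;
        }
    }
    return -1;
}
```

Because the per-peer state is keyed by the global rank, it can outlive any single communicator, which is the point the commit message makes.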
George Bosilca
5d465cf118 Call the constructor on the DR lock.
This commit was SVN r9438.
2006-03-28 07:34:02 +00:00
Graham Fagg
19906e66dc missing lock?
This commit was SVN r9436.
2006-03-28 06:15:48 +00:00
George Bosilca
46c442fe0d We do not have direct access to the module. Grab the one attached to the
window instead.

This commit was SVN r9434.
2006-03-28 05:06:40 +00:00
Tim Woodall
c1bf71b1be - updated copyrights
- removed unused state
- starting to add support for btl failover

This commit was SVN r9431.
2006-03-27 22:48:12 +00:00
Tim Woodall
c724e4c804 - removed unused flags
- updated copyrights

This commit was SVN r9430.
2006-03-27 22:44:26 +00:00
Gleb Natapov
590c992a7e fix recursive lock of openib_btl->ib_lock.
This commit was SVN r9427.
2006-03-26 15:02:43 +00:00
Gleb Natapov
01a119c3c5 fix compilation bug with --enable-mpi-threads
This commit was SVN r9426.
2006-03-26 13:24:10 +00:00
Gleb Natapov
a5a78b10cc Implementation of short message RDMA. Endpoint registers circular buffer and sends its address and rkey to the peer. Peer uses this buffer to eagerly RDMA small message into it. Endpoint polls the buffer for message arrival before checking HP/LP QPs. Set btl_openib_use_eager_rdma to 1 to enable it.
This commit was SVN r9425.
2006-03-26 08:30:50 +00:00
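The polling scheme described in a5a78b10cc (a registered circular buffer that the peer writes small messages into, polled by the endpoint) can be illustrated with a toy, single-process model. Everything here is an assumption made for illustration: the slot layout, the `ready` flag, and all `sketch_` names are hypothetical, and the "RDMA write" is replaced by a plain memcpy. It does not reflect the openib BTL's actual buffer format.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_SLOTS     4
#define SKETCH_SLOT_SIZE 64

/* One slot of the circular buffer the endpoint would register. */
typedef struct {
    uint32_t len;                     /* payload length in bytes */
    char     data[SKETCH_SLOT_SIZE];  /* small-message payload */
    volatile uint8_t ready;           /* set last, once the payload is in place */
} sketch_slot_t;

typedef struct {
    sketch_slot_t slots[SKETCH_SLOTS];
    unsigned head;  /* next slot the receiver polls */
    unsigned tail;  /* next slot the peer writes */
} sketch_ring_t;

/* Stand-in for the peer's RDMA write: a memcpy into the next slot.
 * Returns 0 on success, -1 if the message is too big or the slot is busy. */
static int sketch_peer_write(sketch_ring_t *r, const void *msg, uint32_t len)
{
    sketch_slot_t *s = &r->slots[r->tail];
    if (len > SKETCH_SLOT_SIZE || s->ready) {
        return -1;
    }
    memcpy(s->data, msg, len);
    s->len = len;
    s->ready = 1;
    r->tail = (r->tail + 1) % SKETCH_SLOTS;
    return 0;
}

/* Receiver-side poll: copies the message out and returns its length,
 * or returns -1 if nothing has arrived in the head slot. */
static int sketch_poll(sketch_ring_t *r, void *out)
{
    sketch_slot_t *s = &r->slots[r->head];
    if (!s->ready) {
        return -1;
    }
    uint32_t len = s->len;
    memcpy(out, s->data, len);
    s->ready = 0;  /* hand the slot back to the peer */
    r->head = (r->head + 1) % SKETCH_SLOTS;
    return (int)len;
}
```

In this model the receiver only ever inspects one flag per poll, which is why checking the buffer before the HP/LP QPs is cheap.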
Galen Shipman
1677ca1cd4 continue to debug retransmission of incorrect offset,
only occurs on vfrag timeout.. 

This commit was SVN r9421.
2006-03-24 22:28:43 +00:00
Brian Barrett
01671f2991 * allow user to set "no_locks" info argument as MCA parameter to override the
default
* Add ability to start Put and Get requests immediately instead of queuing
  until synchronization when using Fence.  Not entirely sure this is
  completely safe, so it must be explicitly enabled by the user, either with
  an MCA parameter or info argument to Win_create.

This commit was SVN r9418.
2006-03-24 18:56:59 +00:00
Tim Woodall
2e376e0ee8 misc cleanup
This commit was SVN r9410.
2006-03-24 06:49:45 +00:00
George Bosilca
dec87e2cea Remove a warning by protecting one of the variables around #if/#endif.
This commit was SVN r9409.
2006-03-24 04:43:53 +00:00
George Bosilca
dabe47ca3d A function declared as static inline that is not used directly, but
only through a pointer reference, completely confuses some compilers (gcc 4.1
included). Removing the inline (it was there from when the function
was used in the same file) seems to solve the problem. However, the
strangest thing is that the bug only appears when we compile directly in
the trunk directory. It just doesn't happen when we're using the VPATH
build.

This commit was SVN r9408.
2006-03-24 04:21:30 +00:00
Brian Barrett
6cc582b20e * Fix "make dist" for peruse
* Install peruse.h in $includedir, since applications need to be able
  to include it as <peruse.h>
* Fix issue with onesided code always installing its headers

This commit was SVN r9405.
2006-03-23 23:41:49 +00:00
Tim Woodall
1aaad721e8 clear state on rndv ack
This commit was SVN r9404.
2006-03-23 23:36:07 +00:00
Galen Shipman
19732d4c7c add length to frag_ack
This commit was SVN r9403.
2006-03-23 23:06:19 +00:00
Tim Woodall
0fa49f1297 set requests vfrag id when matched
This commit was SVN r9402.
2006-03-23 23:04:20 +00:00
Galen Shipman
3595cd8956 use hdr_match..
This commit was SVN r9401.
2006-03-23 22:21:15 +00:00
Galen Shipman
bec2ee346c use correct ack for rendezvous from seq tracker
This commit was SVN r9400.
2006-03-23 22:18:09 +00:00
Tim Woodall
996a1b56df more tweaking
This commit was SVN r9399.
2006-03-23 22:08:59 +00:00
Galen Shipman
c38fd90e63 need state to ack sync send retransmits, even after the recvreq is gone..
This commit was SVN r9397.
2006-03-23 22:02:59 +00:00
Tim Woodall
d1d8967844 init counters
This commit was SVN r9395.
2006-03-23 20:29:18 +00:00
Galen Shipman
754b424266 set vf_mask_pending when retransmitting so completion will occur before
the request is completed.. 

This commit was SVN r9394.
2006-03-23 20:28:52 +00:00
Galen Shipman
f609204cc5 disable reliability checking in bml
This commit was SVN r9392.
2006-03-23 17:50:20 +00:00
Galen Shipman
e01cf0a166 Separate out sequence tracking list as stand alone class.
This commit was SVN r9391.
2006-03-23 17:02:17 +00:00
Tim Woodall
c1bec478c4 updates to reliability debug code
This commit was SVN r9390.
2006-03-23 17:00:20 +00:00
Tim Woodall
d9dc534c08 fix bogus comment
This commit was SVN r9388.
2006-03-23 16:41:37 +00:00
Tim Woodall
28fa260404 for frag case don't use retrans flag, simply
retransmit all segments of vfrag that have not been acked

This commit was SVN r9387.
2006-03-23 16:36:13 +00:00
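The approach in 28fa260404 (retransmit every segment of a vfrag that has not been acked, without a separate retrans flag) maps naturally onto a pair of bitmasks. The sketch below assumes one bit per segment and at most 64 segments per vfrag; `sketch_vfrag_t`, `sent_mask`, `acked_mask`, and `sketch_collect_unacked` are hypothetical names, not fields from the dr PML.

```c
#include <stdint.h>

/* Hypothetical per-vfrag bookkeeping: one bit per segment. */
typedef struct {
    uint64_t sent_mask;   /* segments that have been transmitted */
    uint64_t acked_mask;  /* segments the peer has acknowledged */
} sketch_vfrag_t;

/* Collect the indices of all sent-but-unacked segments into out[]
 * (which must have room for 64 entries); returns how many were found. */
static unsigned sketch_collect_unacked(const sketch_vfrag_t *vf, unsigned *out)
{
    uint64_t pending = vf->sent_mask & ~vf->acked_mask;
    unsigned count = 0;
    for (unsigned seg = 0; seg < 64; ++seg) {
        if (pending & (UINT64_C(1) << seg)) {
            out[count++] = seg;
        }
    }
    return count;
}
```

With the pending set derived purely from the two masks, no extra per-segment retransmission state has to be kept in sync.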
Andrew Friedley
48d61cd99a Mostly fragment/LMR handling fixes:
- Grab the mpool_registration in _frag_common_constructor()
 - Save the LMR context in the segment key
 - No need for cookie variables - can just cast the frag
 - No need to memcpy() data when recv'ing
 - Add an LMR triplet to the fragment structure and initialize it
   in btl_udapl_alloc().
 - Whitespace/typo fixes, remove some opal_output() calls

Looks like I can use triplets describing sub-regions of registered LMR's.  So I
do this - prior to this patch I was sending the entire free list memory over,
which isn't correct :)

Back to an earlier problem - when sending address information right after
connection establishment, the receiving end receives a DTO completion event and
appears to have good data.  But the sending end never receives a DTO completion
event indicating the send completed, and never completes the client side of the
connection.

This commit was SVN r9386.
2006-03-23 16:21:08 +00:00
Galen Shipman
adf621fcce enable both mpool_base_use_mem_hooks and mpool_use_mem_hooks, same for
disable_sbrk. 

This commit was SVN r9385.
2006-03-23 16:15:50 +00:00
Galen Shipman
e548f5f8a8 change pml_ob1_leave_pinned_pipeline param to mpi_leave_pinned_pipeline
This commit was SVN r9384.
2006-03-23 15:57:34 +00:00
Tim Woodall
dc125cf7d5 misc corrections
This commit was SVN r9380.
2006-03-23 15:11:06 +00:00
Galen Shipman
0dd4af919d minor fix to special mca_bml_base_send which randomly corrupts and drops
packets (used for testing). 

This commit was SVN r9378.
2006-03-23 15:04:43 +00:00
Galen Shipman
70cf1ce562 more work in progress..
This commit was SVN r9369.
2006-03-22 23:06:18 +00:00
Tim Woodall
078cdcc9a8 cleanup
This commit was SVN r9368.
2006-03-22 23:01:37 +00:00
Tim Woodall
0f6161c6da reorg
This commit was SVN r9366.
2006-03-22 15:02:36 +00:00
Galen Shipman
bcb23dc762 rework rndv and eager data timeout/retrans
This commit was SVN r9358.
2006-03-21 21:23:33 +00:00
Tim Woodall
c7ee5e13bc simplification - dont swap src/dst pointers - always leave both
src/dst pointing to same segments

This commit was SVN r9357.
2006-03-21 18:20:17 +00:00
Tim Woodall
12e502b10d use correct loop index
This commit was SVN r9356.
2006-03-21 18:18:22 +00:00
George Bosilca
f7a5a582c5 Diagnostic function for mvapi. It prints all the credits used for the flow control.
This commit was SVN r9355.
2006-03-21 17:02:14 +00:00
Tim Woodall
7a1ad5b6fb corrections to scheduling logic
This commit was SVN r9354.
2006-03-21 14:30:54 +00:00
Brian Barrett
0750a8a118 * fix (incorrect) GCC warning about using ret uninitialized. Bloody compilers.
This commit was SVN r9353.
2006-03-21 14:10:07 +00:00
Andrew Friedley
cf9246f7b9 Long overdue commit.. many changes.
In short, I'm very close to having connection establishment and eager send/recv working.

Part of the connection process involves sending address information from the
client to server.  For some reason, I am never receiving an event indicating
completion of the send on the client side.  Otherwise, connection
establishment is working and eager send/recv should be trivial from here.


Some more detailed changes:
 - Send partially implemented, just handles starting up new connections.
 - Several support functions implemented for establishing connection.  Client
   side code went in btl_udapl_endpoint.c, server side in btl_udapl_component.c
 - Frags list and send/recv locks added to the endpoint structure.
 - BTL sets up a public service point, which listens for new connections.
   Steps over ports that are already bound, iterating through a range of ports.
 - Remove any traces of recv frags, don't think I need them after all.
 - Pieces of component_progress() implemented for connection establishment.
 - Frags have two new types for connection establishment - CONN_SEND and
   CONN_RECV.
 - Many other minor cleanups not affecting functionality

This commit was SVN r9345.
2006-03-21 00:12:55 +00:00
Andrew Friedley
200bb7d59b Remove an unwanted opal_output()
This commit was SVN r9344.
2006-03-21 00:01:37 +00:00
Tim Woodall
797a6b2887 dont compute checksum over header - data only
This commit was SVN r9343.
2006-03-20 23:08:14 +00:00
Galen Shipman
fc42320ea6 check retry counts on NAK retrans as well as timeouts
This commit was SVN r9342.
2006-03-20 22:11:23 +00:00
Galen Shipman
7ce7baff15 more bml work
This commit was SVN r9341.
2006-03-20 21:58:20 +00:00
Galen Shipman
ca13833e95 more dr work
This commit was SVN r9340.
2006-03-20 21:57:30 +00:00
Galen Shipman
5600932c2f fix misc warnings
This commit was SVN r9339.
2006-03-20 15:41:45 +00:00
Galen Shipman
15bdbd5ca1 add parameter names to cb func
This commit was SVN r9338.
2006-03-20 15:29:35 +00:00
George Bosilca
e181153f16 Remove the bogus prototype.
This commit was SVN r9333.
2006-03-19 19:22:35 +00:00
George Bosilca
a0d25ab6ef Add missing prototype for the mvapi diagnostic function.
This commit was SVN r9331.
2006-03-18 19:38:56 +00:00
Tim Woodall
bd870519fd - modified convertor copy_and_prepare routines to accept an additional
flag, new flags to be included when convertor is initialized
- modified pml/btl module defs and added stub functions for diagnostic
  output routines to dump state of queues / endpoints
- updates to data reliability pml

This commit was SVN r9329.
2006-03-17 18:46:48 +00:00
Tim Woodall
712468dbef add diagnostic interface
This commit was SVN r9328.
2006-03-17 17:39:41 +00:00
Galen Shipman
a465047e97 enable timeouts and retransmissions
This commit was SVN r9322.
2006-03-16 22:33:08 +00:00
George Bosilca
229f26dc55 First split of the datatype. More files and a cleaner distribution of functions
in the corresponding files. There are a few other changes to come ...

This commit was SVN r9319.
2006-03-16 21:04:34 +00:00
Galen Shipman
3c9ce06f59 Use new csum routines
This commit was SVN r9318.
2006-03-16 20:26:33 +00:00
Galen Shipman
ff75de8c52 more dr work, add destination check on all receives, misc
This commit was SVN r9317.
2006-03-16 19:38:21 +00:00
Brian Barrett
234adb292b * add ability to try a couple of different collectives for fence
synchronization to see which gives the best performance

This commit was SVN r9314.
2006-03-16 18:40:42 +00:00
Jeff Squyres
8a9e76dfa3 Thanks to Sven for noticing that the increment in scatter should be
per the send datatype, not the receive datatype (MPI-1:105).

This commit was SVN r9312.
2006-03-16 18:18:28 +00:00
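The fix in 8a9e76dfa3 concerns how the root steps through its send buffer during a scatter: the stride between the chunks for consecutive ranks depends on the send count and the send datatype's extent. A minimal arithmetic sketch follows, assuming the extent is passed in as a plain byte count rather than queried from MPI; `sketch_scatter_send_offset` is a hypothetical helper, not an Open MPI function.

```c
#include <stddef.h>

/* Byte offset into the root's send buffer of the chunk destined for
 * 'rank', given the per-rank send count and the send datatype extent. */
static size_t sketch_scatter_send_offset(int rank, int sendcount,
                                         size_t send_extent)
{
    return (size_t)rank * (size_t)sendcount * send_extent;
}
```

For example, with 4 elements per rank and an 8-byte send extent, the chunk for rank 3 starts 96 bytes into the send buffer, regardless of what datatype the receivers use.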
George Bosilca
4aa343990f Remove the segfault in ompi_info, when we try to destruct a not yet
constructed object.

This commit was SVN r9308.
2006-03-16 16:56:22 +00:00
Andrew Friedley
a3e2d2442b Mostly whitespace and other small cleanups.
Fixed configure.m4 to pull in the modified CFLAGS properly.

Some additional error checking and use of OMPI_ENABLE_DEBUG.

This commit was SVN r9296.
2006-03-16 02:38:08 +00:00
Tim Woodall
178d8ea905 use consistent macros for csum
This commit was SVN r9294.
2006-03-16 00:20:43 +00:00
Tim Woodall
c34f4c2cb7 correct cleanup for threaded case
This commit was SVN r9291.
2006-03-16 00:05:39 +00:00
George Bosilca
612570134f The request management framework has been redesigned. The main idea is
to let the PML (or io, more generally the low level request manager)
to have its own release function (what was before the req_fini). This
function will only be called from the low level while the req_free will
be called from the upper level (MPI layer) in order to mark the request
as not used by the user anymore.

From the request point of view the requests will be marked as inactive
every time we read their status (true for persistent as well). As
MPI_REQUEST_NULL is already marked as inactive, the test and wait functions
are simpler. The drawback is that now we have to change in the
ompi_request_{test|wait} the req_status of the request once we get its
status.

This commit was SVN r9290.
2006-03-15 22:53:41 +00:00
George Bosilca
8fb84e90ce It's already done in the send ... we don't have to initialize this
field several times.

This commit was SVN r9282.
2006-03-14 21:55:57 +00:00
Tim Woodall
92c5e26758 correct scheduling
This commit was SVN r9277.
2006-03-14 18:25:25 +00:00
Galen Shipman
440417e92c Add max_btls option
This commit was SVN r9263.
2006-03-13 17:03:21 +00:00
Sven Stork
12b94972e2 Fix comment of a parameter.
This commit was SVN r9261.
2006-03-13 09:11:46 +00:00
Brian Barrett
871957437d * SilverStorm's header files do some evil things that cause some of our
header files to go bad.  Include vapi.h after our headers to solve
  the issue.

This commit was SVN r9258.
2006-03-12 04:45:28 +00:00
Brian Barrett
c42da09796 * Fix a small bug George noticed - if you change the prefix (or any of the
installation directories) in configure, the files that depend on this
  information are not properly rebuilt.  If you need this information,
  don't setup a -D in the Makefile.am - instead, include 
  opal/install_dirs.h.
* Use the : option in AC_CONFIG_FILES to avoid needing to expose that
  we are playing around with temporary files with our headers to avoid
  rebuilding
* Clean up the version file information a bit, and like the install 
  directory stuff, make sure that there is a dependency so that 
  ompi_info gets rebuilt properly when a version number changes.

This commit was SVN r9256.
2006-03-12 04:35:01 +00:00
Brian Barrett
d041558f85 * protect lock when not building threaded
This commit was SVN r9254.
2006-03-11 03:20:50 +00:00
Brian Barrett
3e2c51dea8 * fix some silly commenting done by a previous developer that are good for
a laugh but probably not good for usability ;)

This commit was SVN r9253.
2006-03-11 03:09:24 +00:00
Brian Barrett
eafdfba0d4 * remove useless debugging output
This commit was SVN r9252.
2006-03-11 03:04:12 +00:00
Brian Barrett
bbfd10fb39 * check for 0 byte movement should happen before trying to allocate a
request, not while trying to allocate a request (duh)

This commit was SVN r9239.
2006-03-10 01:57:01 +00:00
Tim Woodall
9ae910044b resolve threading issue
This commit was SVN r9234.
2006-03-09 17:59:05 +00:00
Tim Woodall
c83b2fce4d resolve threading issue
This commit was SVN r9233.
2006-03-09 17:57:31 +00:00
George Bosilca
4fb373c7e8 HAVE_MALLOPT is defined only when we have it. It's not a 1/0 type of define, it's a
defined/undefined one.

This commit was SVN r9221.
2006-03-08 22:29:01 +00:00
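The distinction made in 4fb373c7e8 (a defined/undefined macro rather than a 1/0 one) means the guard should test whether the macro is defined, not compare its value. A minimal sketch of that pattern, with the hypothetical helper `sketch_have_mallopt` added only so the result is observable:

```c
/* Configure-generated feature macros of this kind are either defined or
 * left undefined, so the test is on definedness, not on a 0/1 value. */
#ifdef HAVE_MALLOPT
#include <malloc.h>
#endif

/* Returns 1 if mallopt() support was compiled in, 0 otherwise. */
static int sketch_have_mallopt(void)
{
#ifdef HAVE_MALLOPT
    return 1;
#else
    return 0;
#endif
}
```

Using the macro as an ordinary C expression, such as `if (HAVE_MALLOPT)`, would fail to compile on platforms where configure leaves it undefined.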
Graham Fagg
95b060c741 output the right name and stop confusing george
This commit was SVN r9215.
2006-03-08 00:40:14 +00:00
Galen Shipman
5531baaec6 fix warnings, generalize acked datastructure, allows for easier external
testing. 

This commit was SVN r9212.
2006-03-06 23:18:26 +00:00
George Bosilca
1d0e378df3 icc complains about a missing return.
This commit was SVN r9211.
2006-03-06 21:42:07 +00:00
Tim Woodall
d350232c04 work in progress
This commit was SVN r9209.
2006-03-06 19:30:37 +00:00
Tim Woodall
0ef924769a minor edits
This commit was SVN r9205.
2006-03-06 16:32:36 +00:00
Tim Woodall
274ee03df6 work in progress
This commit was SVN r9192.
2006-03-04 00:36:16 +00:00
Galen Shipman
4e430b0428 fix warnings, other misc
This commit was SVN r9190.
2006-03-03 04:01:10 +00:00
Tim Woodall
8bf6ed7a36 - corrected locking in gm btl - gm api is not thread safe
- initial support for gm progress thread
- corrected threading issue in pml
- added polling progress for a configurable number of cycles to wait for threaded case

This commit was SVN r9188.
2006-03-02 00:39:07 +00:00
Galen Shipman
84d3055db5 Make sure everything is immediately acked, even if not matched
Buffer first descriptor on the sendreq until positive ACK
Set bytes delivered only after positive ACK, removed num_acks, etc, in general
trying to remove as much state as possible so that rolling things back isn't
such a nightmare

This commit was SVN r9187.
2006-03-01 22:37:10 +00:00
Brian Barrett
1479a90b39 * assert() that endianness doesn't need to change if we are sending RDMA headers
around, since OB1 currently doesn't do the right thing there, but that should
  not happen in the near future because the R2 BML should not make any RDMA
  networks available between machines with different architectures
* Clean up the #ifs a little bit so that we don't do unneeded work when
  on big endian machines and heterogeneous support is disabled...

This commit was SVN r9184.
2006-02-28 19:54:46 +00:00
Brian Barrett
579e74290f * make gm wire-up endian safe
This commit was SVN r9179.
2006-02-28 02:03:46 +00:00
Galen Shipman
d9fd35d399 add acked items to datastructure,
fix compile issue. 

This commit was SVN r9178.
2006-02-28 01:07:35 +00:00
Galen Shipman
c6b4cc4417 Add data structure to track ACK's
This commit was SVN r9177.
2006-02-27 22:56:43 +00:00
Brian Barrett
e865a751bd * First whack at making the onesided component endian safe. Needs an endian-safe
datatype engine to really give it a whirl ;).

This commit was SVN r9176.
2006-02-27 18:47:00 +00:00
Galen Shipman
db6b1db548 use pml level datatype, someone else already cleaned this up in ob1.
This commit was SVN r9174.
2006-02-27 18:20:49 +00:00
Galen Shipman
2aa7b129a6 don't use ptl datatypes!
This commit was SVN r9173.
2006-02-27 18:07:38 +00:00
Brian Barrett
bfd49d248b * (hopefully) fix MPI_BOTTOM for portals, same way as all the other RDMA btls from
eons ago...

This commit was SVN r9172.
2006-02-27 17:07:24 +00:00
Brian Barrett
9b19e3fef0 * remove some debugging output that shouldn't have been committed. Doh!
This commit was SVN r9171.
2006-02-27 16:23:52 +00:00
Rainer Keller
5102571c02 - Get rid of the temporary reachability bitmap.
This commit was SVN r9163.
2006-02-27 11:06:01 +00:00
Jeff Squyres
4a0d9bd46f Turns out that $(MCA_io_romio_STATIC_LTLIBS) is a throwback to a
previous incarnation of the configure/build system and isn't defined
anymore.  So it can be removed.

This commit was SVN r9158.
2006-02-27 03:28:40 +00:00
Jeff Squyres
d068c1516c A few minor Makefile.am fixes
This commit was SVN r9157.
2006-02-27 03:18:57 +00:00
Brian Barrett
d45b2b77d1 * don't reset AR or RANLIB - libtool will do the "right thing" for us
This commit was SVN r9151.
2006-02-26 20:32:47 +00:00
Brian Barrett
285581dff2 More endian-related cleanups:
- moved hton64 and ntoh64 from the bunch of places it had been copied
    into one header file
  - properly set and use the btl_tcp's nbo option to put things in
    network byte order on the wire if both sides don't have the same
    endianness
  - Put the OB1 PML's headers (with a couple exceptions I need to discuss
    with Tim) in network byte order on the wire if both sides don't have
    the same endianness
  - since it was needed for the TCP BTL, move the orte_process_name_t
    HTON and NTOH macros from the TCP OOB to ns_types.h

This commit was SVN r9145.
2006-02-26 00:45:54 +00:00
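The 64-bit byte-order helpers mentioned in 285581dff2 can be written so that they behave correctly on both little- and big-endian hosts. The sketch below is one portable way to do it, under the assumption that network byte order means most significant byte first; `sketch_hton64` and `sketch_ntoh64` are hypothetical names and this is not the implementation from the Open MPI header.

```c
#include <stdint.h>
#include <string.h>

/* Convert a host-order 64-bit value to network (big-endian) order by
 * writing its bytes most-significant first. */
static uint64_t sketch_hton64(uint64_t host)
{
    unsigned char bytes[8];
    for (int i = 0; i < 8; ++i) {
        bytes[i] = (unsigned char)(host >> (56 - 8 * i));
    }
    uint64_t net;
    memcpy(&net, bytes, sizeof net);
    return net;
}

/* Convert a network-order 64-bit value back to host order. */
static uint64_t sketch_ntoh64(uint64_t net)
{
    unsigned char bytes[8];
    memcpy(bytes, &net, sizeof bytes);
    uint64_t host = 0;
    for (int i = 0; i < 8; ++i) {
        host = (host << 8) | bytes[i];
    }
    return host;
}
```

Because the conversion works on individual bytes, it needs no compile-time endianness check; on a big-endian host both functions reduce to the identity.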
Galen Shipman
05140c5f8f Rework the data reliability PML, still needs quite a bit of work,
working on creating a uniform retransmission mechanism otherwise each type of
send ends up needing a special case for retransmission. 
Removed NACK for individual transmissions, we just aggregate these and send
them at the end of a vfrag 

This commit was SVN r9141.
2006-02-24 17:08:14 +00:00
Brian Barrett
d5e0ea3590 * Post and Start should only check their epoch types for conflicts, otherwise
you can't be in a post and a start at the same time, and that is clearly
  legal to do
* Fix interpretation of when the epochs start for MPI_Fence.  Only start
  an epoch if communication actually occurs, otherwise it isn't actually
  an epoch.  I don't know who thought that wording in the MPI standard
  was a good idea, but can't change it now...

This commit was SVN r9139.
2006-02-24 13:04:15 +00:00
Brian Barrett
27b8430e8f * update MPI_ACCUMULATE to perform its parameter checking based on what
MPI-2:6.3.4 says about reduction operations
* Have the point-to-point one-sided component spew a warning and return
  an error if a non-predefined datatype is used with an MPI_OP other
  than MPI_REPLACE.  Yes, this violates the MPI standard, but it's the
  best we can do until George and I implement support for figuring out
  where all the locations to update are..

This commit was SVN r9134.
2006-02-23 21:07:49 +00:00
Brian Barrett
c544584387 * fix a race condition where a sendreq could be reused if it was originally
for a Get request and the reply came in before the local completion
  callback was fired from the btl.
* Silence some more debugging output for the moment

This commit was SVN r9130.
2006-02-23 06:02:10 +00:00
Brian Barrett
57b9c22adf * fix for last ptl fix... have to actually return a value...
This commit was SVN r9129.
2006-02-23 05:24:58 +00:00
George Bosilca
39252b764f Correctly compute the size of the datatype.
This commit was SVN r9127.
2006-02-23 04:30:52 +00:00
George Bosilca
79d25220b6 HAVE_MALLOPT is automatically detected by configure, so the correct
check is not against a value but against the define.

This commit was SVN r9126.
2006-02-23 04:30:24 +00:00
Brian Barrett
2db1babd40 * complete the correct group
This commit was SVN r9123.
2006-02-23 02:42:39 +00:00
Jeff Squyres
628125599d Fix the TCP btl module endpoint matching during setup for the scenario
when running an MPI job spanning a node that has two TCP NICs and a
node that has one TCP NIC.  Previously, for the 2 NIC/module process,
we would return the first peer IP address if we couldn't find a subnet
match with any of the peer's published IP addresses -- this was to
support running OMPI across subnet boundaries.  Changed the behavior
to do that only if the IP address we're trying to match is
public (i.e., not 10.x.y.z, 192.168.x.y, or 172.16.x.y) *and* any of
the remote peer's addresses are public (working on the assumption that
if we both have public addresses, they're routable to each other).

This definitely will not work in all scenarios, such as when we go to
WAN kinds of executions, and will need to be revisited at that time.

This commit was SVN r9119.
2006-02-23 02:02:19 +00:00
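The fallback rule described in 628125599d (use the first peer address only when the local address is public and at least one peer address is public) can be sketched as follows. The check covers exactly the three patterns named in the message; note that RFC 1918 reserves the wider 172.16.0.0/12 block, which this sketch deliberately does not cover. Addresses are assumed to be IPv4 in host byte order, and all `sketch_` names are hypothetical rather than taken from the TCP BTL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* addr is an IPv4 address in host byte order, e.g. 10.1.2.3 == 0x0A010203. */
static bool sketch_is_private_ipv4(uint32_t addr)
{
    uint8_t a = (uint8_t)(addr >> 24);
    uint8_t b = (uint8_t)(addr >> 16);
    if (a == 10) return true;               /* 10.x.y.z */
    if (a == 192 && b == 168) return true;  /* 192.168.x.y */
    if (a == 172 && b == 16) return true;   /* 172.16.x.y, as listed in the message */
    return false;
}

/* Allow falling back to the first peer address only if the local address
 * is public and at least one of the peer's addresses is public. */
static bool sketch_allow_fallback(uint32_t local, const uint32_t *peer, size_t n)
{
    if (sketch_is_private_ipv4(local)) {
        return false;
    }
    for (size_t i = 0; i < n; ++i) {
        if (!sketch_is_private_ipv4(peer[i])) {
            return true;
        }
    }
    return false;
}
```

As the commit message itself notes, treating any two public addresses as mutually routable is a working assumption that will not hold in every deployment.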
Brian Barrett
2eb76ff0cd * finish the TEG/UNIQ/PTL removal
This commit was SVN r9118.
2006-02-23 00:39:01 +00:00