Commit graph

675 commits

Author SHA1 Message Date
Edgar Gabriel
c8adc2e65e coding around the collective operations
This commit was SVN r7698.
2005-10-11 20:34:17 +00:00
Edgar Gabriel
083d0b9630 Checkpoint: most of the coding should be done for the basic
infrastructure.

This commit was SVN r7696.
2005-10-11 19:45:21 +00:00
Graham Fagg
607bdf51b6 Last Cleanup BEFORE adding last two methods and final cross over points.
- new mca param calls
- move printfs to OPAL_OUTPUT

This commit was SVN r7692.
2005-10-11 18:51:03 +00:00
Edgar Gabriel
b42d4ac780 Checkpoint:
- update the hierarch stuff to use btl's instead of ptl's
- start the new logic regarding how to handle local leader communicators

This commit was SVN r7691.
2005-10-11 17:29:59 +00:00
Galen Shipman
23cbac25c8 lower default free list sizes..
This commit was SVN r7676.
2005-10-09 18:15:12 +00:00
Galen Shipman
fb19cc4177 compiler warning fixes..
This commit was SVN r7661.
2005-10-07 17:38:34 +00:00
Jeff Squyres
b22fab2826 Fix for a bug Galen noticed yesterday -- make the shared memory only
be allocated the first time a sm coll is selected for a communicator,
not before.

This commit was SVN r7647.
2005-10-06 13:17:27 +00:00
George Bosilca
1fe18814da Decrease the default length for the first fragment.
This commit was SVN r7643.
2005-10-06 00:05:01 +00:00
George Bosilca
0f04132b13 mx_connect in the MX documentation is supposed to take a timeout in seconds. However, in practice the timeout appears to be in microseconds.
This commit was SVN r7642.
2005-10-06 00:04:27 +00:00
Brian Barrett
b7ef094766 * the cid in the header is only 16 bits, so limit our max cid to what can fit in there.
This commit was SVN r7639.
2005-10-05 15:43:28 +00:00
Jeff Squyres
83b5a675f9 Don't automatically take the first entry off the selected component
list; be sure to check its priority against the basic component and
take the one with the higher priority.

This commit was SVN r7621.
2005-10-04 17:09:45 +00:00
George Bosilca
967cd1be32 Make the datatype compile on solaris.
Remove some warnings ...

This commit was SVN r7619.
2005-10-04 15:45:18 +00:00
Jeff Squyres
b17c4334c4 - Remove all vestiges of using the built-in mcb_tree from the
reduce_inorder() function -- we don't use the tree at all.
- Add more relevant "volatile"'s for the control buffers in the
  fragment mpool (and associated casts where necessary)

This commit was SVN r7616.
2005-10-04 14:52:59 +00:00
George Bosilca
9a67831ba3 Always call the memory allocation function with the correct type for the first argument. The problem is
that on some OSes the iovec struct is not POSIX compliant: iov_len is not a size_t but simply an int.
This patch adds a local variable (of type size_t) to use with the memory allocation function, and then puts
the value back in the iov_len field.

This commit was SVN r7615.
2005-10-04 14:44:59 +00:00
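
The pattern described in commit 9a67831ba3 can be sketched as follows. This is a minimal illustration only, not the code from the commit: `alloc_segment` and `fill_iovec` are hypothetical names, and the allocator's in/out `size_t *` parameter is an assumption about the kind of function the message refers to. The `(void *)` cast on the right-hand side mirrors the iov_base cleanup mentioned in 6b3d02b514.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>

/* Hypothetical allocator: takes the requested length by pointer and
 * writes back the length it actually allocated (rounded up to 8 bytes). */
static void *alloc_segment(size_t *len)
{
    *len = (*len + 7) & ~(size_t)7;
    return malloc(*len);
}

/* Fill an iovec without ever passing &iov->iov_len to the allocator.
 * On platforms where iov_len is an int rather than a size_t, passing
 * its address as a size_t* would be a type mismatch, so a local size_t
 * is used and the result is copied back afterwards. */
static int fill_iovec(struct iovec *iov, size_t wanted)
{
    size_t len = wanted;            /* local with the exact expected type */
    void *buf;

    assert(iov != NULL);
    buf = alloc_segment(&len);
    if (NULL == buf) {
        return -1;
    }
    iov->iov_base = (void *)buf;    /* explicit cast for picky compilers */
    iov->iov_len = len;             /* put the value back in the iovec */
    return 0;
}
```

A caller would declare a `struct iovec`, call `fill_iovec(&iov, nbytes)`, and later release `iov.iov_base` with `free()`.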
George Bosilca
059c802094 Correct a small typo
This commit was SVN r7614.
2005-10-04 14:42:37 +00:00
Jeff Squyres
94bab558dd Put in a check to ensure the root is valid (all other rooted
operations have this; we somehow missed this for intracomms on reduce,
and it bit me this morning ;-) )

This commit was SVN r7612.
2005-10-04 14:38:17 +00:00
Tim Woodall
3b4a134a24 - removed unused define
- correct free to release registration rather than retain it

This commit was SVN r7611.
2005-10-04 14:33:26 +00:00
George Bosilca
f8355ec104 Cast the right side member to void* before assignment.
This commit was SVN r7608.
2005-10-04 12:37:23 +00:00
George Bosilca
6b3d02b514 Warning cleanups. On some OSes the iov_base member of the iovec structure is defined as a void*,
on others as a char*. Thus the right side of every assignment should be explicitly cast to void* in
order to avoid any casting complaints from the compilers.

This commit was SVN r7607.
2005-10-04 12:36:07 +00:00
George Bosilca
3453a6c0e9 Remove some compiler warnings about unused variables
Correctly define the 64 bits constants.
Some minor cleanups.

This commit was SVN r7606.
2005-10-04 12:29:51 +00:00
George Bosilca
492c0e59dc Correct the casting type and remove some useless output (already commented out).
This commit was SVN r7605.
2005-10-04 12:28:47 +00:00
Tim Woodall
c05ef28f6e - added routine to ompi_pointer_array to remove array contents
- corrected memory hook callback to catch all allocations (need to optimize this)
- don't attempt to consolidate allocations

This commit was SVN r7600.
2005-10-03 23:29:26 +00:00
Jeff Squyres
c7fe54ba44 - Remove some silly compiler warnings
- Move the "process 0" logic out of the main loop in reduce to make
  the code a bit less complex (at the price of slight code
  duplication, but it is now significantly easier to read)
- Fix problem with uniqueness guarantee in the bootstrap mpool -- using
  the CID alone was not sufficient to guarantee uniqueness; now
  use (CID, rank 0 process name) tuple to check for uniqueness
- Made a few debugging help changes in coll_sm.h; especially helps
  debugging on uniprocessors

This commit was SVN r7599.
2005-10-03 21:34:58 +00:00
Jeff Squyres
2cedfeec53 - Eliminate some unused base globals
- Move one base global to the basic component and make it an MCA
  parameter 
- Convert the basic component to use the new MCA param API

This commit was SVN r7598.
2005-10-03 21:07:42 +00:00
Jeff Squyres
57fb96b018 Clarification of a help message
This commit was SVN r7597.
2005-10-03 21:06:13 +00:00
Jeff Squyres
ab099fa8cb Re-indent; real commit with some changes coming shortly.
This commit was SVN r7596.
2005-10-03 19:56:39 +00:00
Galen Shipman
eefe0fd04a fix threaded compile
fix misc warnings 
cleanup posting of receive descriptors 
comment why we retain before deregister in rcache_rb_mru.c 

This commit was SVN r7595.
2005-10-03 16:35:12 +00:00
Galen Shipman
f46548e691 Add SRQ support to OpenIB btl, removed old mca param - not used..
This commit was SVN r7585.
2005-10-02 18:58:57 +00:00
Jeff Squyres
10064df0e9 Remove compiler warning
This commit was SVN r7578.
2005-10-02 10:43:53 +00:00
Jeff Squyres
84feccd3d5 This is something I forgot to commit from long ago -- already
discussed and cleared with Edgar.

Ensure that only processes who will be in the new communicator call
the coll selection function.  It is pointless (and Bad in some cases)
for processes who are not in the new communicator to try to select a
coll module for the new communicator.

This commit was SVN r7573.
2005-10-01 11:57:17 +00:00
Jeff Squyres
37fc944b01 Use the right number of segments per in-use flag when calculating
offsets.

This commit was SVN r7571.
2005-09-30 23:12:23 +00:00
Galen Shipman
67d38b7896 Add multi-nic support to openib
Fix connection establishment race in openib 
Other misc 

This commit was SVN r7570.
2005-09-30 22:58:09 +00:00
Jeff Squyres
934caaf449 Fix at least one segv; use the right number of segments (i.e., the
number of segments in the fragment pool, not in the bootstrap pool)

This commit was SVN r7565.
2005-09-30 18:01:15 +00:00
Brian Barrett
db872a0fbb * check that return from ibv_get_devices isn't NULL before calling dlist_start().
On thor, if IB is down, we get NULL back from ibv_get_devices(), which then
  caused segfaults in dlist_start().
* Pretty-print error message if no HCAs found

This commit was SVN r7557.
2005-09-30 14:58:59 +00:00
Jeff Squyres
fcef1774d5 Per advice from Ralf W., change the pkgdata declarations in
Makefile.am's to be a *slightly* more correct (and, more importantly,
less error-prone) construct.

This commit was SVN r7554.
2005-09-30 13:32:39 +00:00
Jeff Squyres
80b7deb4d7 Add in EXTRA_DIST to get helpfile in tarballs
This commit was SVN r7553.
2005-09-30 10:25:04 +00:00
Brian Barrett
7b20370306 * pretty-print an error message if a btl component loads but can't find
any NICs to use
* Make mvapi, gm, and mx components all publish information, even if there
  are no NICs available so that modex_recv doesn't hang.  If there are no
  NICs available, don't set the reachable bit, but don't do anything
  to fail.  This unfortunately doesn't cover the hangs that will result if
  different procs load different sets of components, but it's a start

This commit was SVN r7550.
2005-09-30 04:39:44 +00:00
Brian Barrett
a77c908496 * the last of the tuning params for portals
This commit was SVN r7548.
2005-09-30 04:05:31 +00:00
Galen Shipman
8239e635b9 fix misc warnings, cleanup macro..
This commit was SVN r7547.
2005-09-30 03:13:51 +00:00
Galen Shipman
05e6e51fec re-reg from min of bases and max of bounds
add byte counting for total registered memory 

This commit was SVN r7546.
2005-09-29 21:28:54 +00:00
Jeff Squyres
bc181d7130 Remove the .ompi_ignore so that everyone starts compiling this, but
lower the default priority to 0 so that it's not active unless you
specifically ask for it (this component needs more testing by people
other than me before we unleash it on the public).

This commit was SVN r7545.
2005-09-29 18:05:47 +00:00
Jeff Squyres
d4b7618db7 Comment out what seems to be a debugging output. Will confirm with
George.

This commit was SVN r7544.
2005-09-29 16:39:27 +00:00
Brian Barrett
997644af31 * There are now two forms of ibv_create_cq, one with 3 params and one with 5.
Try to detect which form this version of Open IB uses, defaulting to the 5
  version if we can't figure it out (the new version has 5 params)
* Only add -lcm if it exists on the system - some versions of Open IB
  apparently don't need it.

This commit was SVN r7542.
2005-09-29 13:35:57 +00:00
Josh Hursey
e825b4522f Upon further investigation the fix in r7537 was an anomaly of zeroing out the
bits to expose the low bits being set. We were casting from a size_t to a void*
which is not good when working with big endian machines.

This fix makes MPI 2 dynamics work on PPC 64 (tested with a Linux OS).

This commit was SVN r7538.

The following SVN revision numbers were found above:
  r7537 --> open-mpi/ompi@fd45714c03
2005-09-28 23:50:42 +00:00
Josh Hursey
fd45714c03 For some reason we have to initialize this variable or bad things happen in the
comm->c_coll.coll_bcast of the rnamebuflen.

This fixes the threaded MPI 2 Dynamics stuff. Should be working great now! Yay!

This commit was SVN r7537.
2005-09-28 22:30:41 +00:00
Galen Shipman
3ded88a3c0 use addr +size -1 instead of base->addr as base->addr is down_aligned.
This commit was SVN r7536.
2005-09-28 20:19:33 +00:00
Galen Shipman
26a74d42fa release, not retain on gm_free
This commit was SVN r7535.
2005-09-28 20:18:52 +00:00
Edgar Gabriel
67dd52efb1 making the allreduce and reduce_scatter tests pass as well
This commit was SVN r7532.
2005-09-28 15:12:05 +00:00
Josh Hursey
75419313f7 check the return code and do something reasonable, instead of progressing and hanging on error
This commit was SVN r7531.
2005-09-28 06:13:51 +00:00
Galen Shipman
c1f5543f62 need to call mpool_release on all registrations obtained in the pml.
sanity checks 

This commit was SVN r7530.
2005-09-28 04:49:40 +00:00
Galen Shipman
b9b78f8f5d modify rcache_rb to find registrations in the middle of a base and bound
This commit was SVN r7528.
2005-09-28 02:11:35 +00:00
Edgar Gabriel
dbbbd416df fixing MPI_IN_PLACE for the log-reduce algorithm.
This commit was SVN r7526.
2005-09-27 21:51:55 +00:00
Galen Shipman
0fc17cedee change order of ops on register
This commit was SVN r7525.
2005-09-27 21:43:41 +00:00
Jeff Squyres
285ded5655 - Ensure to have !initialized || finalized test *first*
- If we have an NS error, don't return an error -- this function's
  purpose is to abort :-)
- s/abort()/exit(1)/ so that we don't drop massive corefiles

This commit was SVN r7524.
2005-09-27 20:26:38 +00:00
Galen Shipman
09e67ce4fd fix off by one on up_align_addr, use base and bound instead of base_align and
bound_align.. 

This commit was SVN r7521.
2005-09-27 18:10:44 +00:00
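
Off-by-one errors of the kind mentioned in commit 09e67ce4fd are common when a registration is described by an inclusive base and bound. The sketch below illustrates the arithmetic only. The helper names (`page_down`, `page_last_byte`, `region_bound`) are hypothetical and are not the Open MPI macros, and `page_bits` is assumed to be log2 of the page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First byte of the page that contains addr. */
static uintptr_t page_down(uintptr_t addr, unsigned page_bits)
{
    return addr & ~(((uintptr_t)1 << page_bits) - 1);
}

/* Last byte of the page that contains addr (an inclusive upper bound). */
static uintptr_t page_last_byte(uintptr_t addr, unsigned page_bits)
{
    return addr | (((uintptr_t)1 << page_bits) - 1);
}

/* Inclusive bound of a buffer: its last byte is base + size - 1.
 * Using base + size instead points one byte past the buffer, which
 * lands on the next page when the buffer ends on a page boundary. */
static uintptr_t region_bound(uintptr_t base, size_t size)
{
    assert(size > 0);
    return base + size - 1;
}
```

For a 4096-byte buffer at 0x1000 with 4 KiB pages, `region_bound` gives 0x1fff, which is still on the page starting at 0x1000, whereas `base + size` is 0x2000 and already belongs to the following page.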
Galen Shipman
af04b3e1ab fix warnings..
This commit was SVN r7515.
2005-09-27 14:23:51 +00:00
Brian Barrett
80ac5c2efd * there are now two upcoming points where we want to release a version with
a random string of characters as part of the version number (the really
  soon to happen 1.0lanl release and the 1.1sc2005 release that we've
  talked about).  So rather than having alpha and beta fields that must
  be numeric values, have a general field that can be any alphanumeric
  value.

This commit was SVN r7511.
2005-09-27 02:06:05 +00:00
Galen Shipman
3c97b3f722 Modified the registration to include a base_align and bound_align for
searching the tree. Modified the memory callback to search the tree at each
page boundary for registrations. This is necessary as an application may
malloc memory and send out of any portion of that memory, even discontiguous
regions. 

This commit was SVN r7510.
2005-09-27 02:01:21 +00:00
Brian Barrett
d9e80d8f2a * increase size of event queue for receives - it was too small to be useful
on a reasonably sized machine
* if no mpool exists, don't try to malloc out an array of 0 bytes

This commit was SVN r7507.
2005-09-25 17:04:03 +00:00
Galen Shipman
384c472c94 reset ompi_pointer_array in mca_rcache_rb_find otherwise you might use an old
registration by accident.. 

This commit was SVN r7506.
2005-09-24 20:48:14 +00:00
Galen Shipman
02ce7a176e had that backwards..
This commit was SVN r7504.
2005-09-24 16:58:07 +00:00
Galen Shipman
d1246be47e should be strictly > or <
This commit was SVN r7503.
2005-09-24 16:45:34 +00:00
Galen Shipman
c53d51778a fix for warnings..
This commit was SVN r7501.
2005-09-24 15:08:28 +00:00
Galen Shipman
9fe5844071 decrement ref count on removal of registration from mru and tree.
add misc asserts to check for proper reference counting. 

ugly hack 1 -- use mallopt to never release memory ala sbrk - this is
commented out in mca_btl_mvapi_component_init

ugly hack 2 -- test registrations coming out of the tree via rcache_find, for
an unknown reason the tree is returning registrations where the address is not
within the base or bound of the registration. If this happens, we return
NULL. 

comment out code to enable mem hooks if leave_pinned is set, note we can do
this via an mca param and will default it to leave_pinned with mem_hooks when
we iron out these issues. 

I am adding a unit test for the rcache. Note that we have a unit test for the
rb tree but the compare function is significantly different than that used for
registrations. After we have tracked down the issues with rcache_rb we will
remove the above hacks. 

This commit was SVN r7499.
2005-09-24 00:24:49 +00:00
Brian Barrett
50dc5499b4 * fix some remaining --with-btl-portals configure issues
This commit was SVN r7498.
2005-09-24 00:11:40 +00:00
Brian Barrett
0d68728b94 * add some more debugging output for send fragment issue to figure out why
Red Storm is complaining about invalid memory pointer (need to go back
  to Linux and look at this with valgrind)
* Turn off send in place for now, so I can run the tests on RS and see if
  everything else is ok

This commit was SVN r7497.
2005-09-23 19:30:54 +00:00
Brian Barrett
07b0b8c943 * add some useful debugging output
* fix dumb bug in btl_portals_get where I was using the dest descriptor key instead
  of the source descriptor key for the match bits, resulting in a PtlGet() with
  the wrong match bits

This commit was SVN r7496.
2005-09-23 15:30:18 +00:00
Tim Woodall
604c9d1002 bump the default cache size
This commit was SVN r7492.
2005-09-22 19:37:42 +00:00
Tim Woodall
848f12e7fd if mpi_leave_pinned is enabled - force malloc hooks
This commit was SVN r7491.
2005-09-22 17:27:56 +00:00
Tim Woodall
aceab46c5f use MPI_Alloc_mem/MPI_Free_mem for internally allocated buffers
This commit was SVN r7487.
2005-09-22 16:43:17 +00:00
Tim Woodall
147716c249 added hostname to error output
This commit was SVN r7486.
2005-09-22 16:41:34 +00:00
Tim Woodall
b404581293 local variables/objects (regs) must be initialized/constructed
This commit was SVN r7485.
2005-09-22 16:18:23 +00:00
Tim Woodall
7acf0a6bdb corrections for MPI_Free_mem
This commit was SVN r7481.
2005-09-22 15:47:33 +00:00
Andrew Friedley
555ae37255 Add lib{opal,orte,mpi}.la to appropriate LIBADD's, some whitespace cleanup as well.
This commit was SVN r7477.
2005-09-22 12:28:54 +00:00
Tim Woodall
da1e4a1292 - since we append new registrations to the end of the list,
need to remove old ones from the front
- call deregister to actually remove items from the cache/mru
list and deregister the memory (if not being used)

This commit was SVN r7476.
2005-09-21 23:25:16 +00:00
Tim Woodall
9791c066e8 don't attempt to pin the receive buffer if data has
already been received

This commit was SVN r7475.
2005-09-21 23:23:47 +00:00
Tim Woodall
1b7b220089 cleanup refcount
This commit was SVN r7462.
2005-09-21 20:04:56 +00:00
Tim Woodall
a74ca0062a reductions to initial memory footprint
This commit was SVN r7455.
2005-09-21 19:10:56 +00:00
Galen Shipman
4296e723c9 default free_lists to smaller size..
This commit was SVN r7454.
2005-09-21 18:55:07 +00:00
Galen Shipman
96ab5a6bd3 we can be in WAITING_ACK state without a race if the OOB ack is "slower" than
the scheduling of queued IB send operations. 

This commit was SVN r7452.
2005-09-21 16:47:08 +00:00
Tim Woodall
782e5b21cc cleanup
This commit was SVN r7451.
2005-09-21 15:34:45 +00:00
Tim Woodall
a49a442fe4 cleanup refcount logic
This commit was SVN r7450.
2005-09-21 15:32:27 +00:00
Tim Woodall
0ee34051f8 debug asserts
This commit was SVN r7449.
2005-09-21 15:30:17 +00:00
Tim Woodall
1b73d3856e possible race condition - set endpoint state before sending connect ack
This commit was SVN r7448.
2005-09-20 21:03:55 +00:00
David Daniel
e4985c2a07 Moving totalview spin to the very end of mpi_init
This commit was SVN r7444.
2005-09-20 15:22:15 +00:00
Brian Barrett
fd9901f683 * shell of a portals PML, properly ompi_ignored for most of the world...
This commit was SVN r7437.
2005-09-20 08:07:08 +00:00
Brian Barrett
d81726833e * Add memory barriers for shared memory. Rich and I think we got them
all and the Intel tests pass slightly oversubscribed.

This commit was SVN r7431.
2005-09-19 16:28:25 +00:00
Tim Woodall
aeb5bc3f57 still need to cleanup/revise the template for mpool changes
This commit was SVN r7425.
2005-09-19 14:34:24 +00:00
George Bosilca
b70230858b Correct the misnaming problem in the GM PTL.
This commit was SVN r7424.
2005-09-19 10:34:06 +00:00
Galen Shipman
6499eb3976 init the return code..
This commit was SVN r7423.
2005-09-18 14:25:30 +00:00
George Bosilca
97673b45d1 Remove the last bad symbol from the GM PTL.
This commit was SVN r7422.
2005-09-18 12:52:37 +00:00
George Bosilca
b5cb27c006 The self should use self named files.
This commit was SVN r7421.
2005-09-18 12:37:15 +00:00
George Bosilca
a7db1763e2 Cleanups ...
This commit was SVN r7420.
2005-09-18 12:34:29 +00:00
Jeff Squyres
d67c31f238 Remove useless compiler warnings.
This commit was SVN r7418.
2005-09-17 10:54:48 +00:00
Jeff Squyres
f9a1e14f65 Per suggestion from our friendly Libtool developer friends, add proper
dependencies for liborte and libompi (i.e., make liborte depend on
libopal, and make libmpi depend on liborte)

This commit was SVN r7417.
2005-09-17 10:45:46 +00:00
Galen Shipman
b8cb6e1c64 modified mpool module to contain flags - used to determine if the mpool will
be used in MPI_Alloc_mem operations. Note that we found an interesting bug in
which if memory was allocated by the sm mpool (via mmap) and then registered
via the mvapi mpool, the registration would fail on certain systems. 

Added mca param mpool_base_use_mem_hooks, set to 1 to enable the memory hooks
so that memory is deregistered if the user frees it behind our back. This is
only useful if the mca param mpi_leave_pinned is also set to 1. Otherwise all
registrations are deregistered within the MPI library or via
MPI_Free_buf. After testing we should probably set both mpi_leave_pinned and
mpool_base_use_mem_hooks to default to 1. 

This commit was SVN r7415.
2005-09-16 22:22:03 +00:00
Brian Barrett
c87babb565 * if start_rank == end_rank, there doesn't seem to be a requirement that
stride == 1, and the Intel tests explicitly test the case with
  strides != 1, expecting them to work

This commit was SVN r7411.
2005-09-16 18:44:21 +00:00
Galen Shipman
808b2c1c53 threaded build fix for btl_gm..
This commit was SVN r7409.
2005-09-16 17:18:15 +00:00
Jeff Squyres
2b82224a4f Remove superfluous ; (actually causes an error in some cases)
This commit was SVN r7405.
2005-09-16 12:27:25 +00:00
Jeff Squyres
10d02b2110 Make sure to copy the right amount out of the temp buffer.
This commit was SVN r7400.
2005-09-15 22:06:36 +00:00