1
1
Граф коммитов

375 Коммитов

Автор SHA1 Сообщение Дата
Galen Shipman
f12bbe0591 Handle different subnets correctly and multiple nic endpoint negotiation
This is somewhat limited currently for expample,  if you have 3 ports on Node A and 5 ports
on Node B then the peers will use 3 ports to communicate with each other. 
This is on a subnet basis, so for any pair of nodes we take the
intersection of the available ports within a subnet.

We use subnets to determine reachability for lazy connection establishment. So
if Node A and Node B each have two HCA's (on seperate networks) then the
subnet's must be distinct, otherwise we will try to wire up HCA's on seperate
networks.  

This commit was SVN r12978.
2007-01-03 22:35:41 +00:00
Brian Barrett
27cea44a9c Fix a number of issues with the ompi_ptr_t:
* Make sure that the pval always writes to the correct portion of the
    lval.  This only matters on 32 bit big endian machines.
  * On 32 bit machines when assigning to pval, the other 4 bytes of lval
    weren't being written, which could lead to bogus data

We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes.  Refs trac:587

This commit was SVN r12974.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 19:47:48 +00:00
Gleb Natapov
484c6a2c1a Use OPAL_ALIGN() macro to align length. Return address from mpool_alloc is now
properly aligned so no need to align it once more.

This commit was SVN r12899.
2006-12-19 08:34:48 +00:00
Gleb Natapov
190e7a27cd Merge with gleb-mpool branch. All RDMA components use same mpool now (rdma).
udapl/openib/vapi/gm mpools a deprecated. rdma mpool has parameter that allows
to limit its size mpool_rdma_rcache_size_limit (default is 0 - unlimited).

This commit was SVN r12878.
2006-12-17 12:26:41 +00:00
Brian Barrett
0653dc3f24 Pad headers to eliminate heterogeneous issues. Add conversion functions
for switching endianness of headers.  Galen is going to add the code to
use the endian stuff...

This commit was SVN r12876.
2006-12-17 00:50:59 +00:00
Brad Benton
18da4c40d3 Set the QP's static rate from the associated MCA parameter, rather
than just defaulting to 0.

Fixes trac:675

This commit was SVN r12855.

The following Trac tickets were found above:
  Ticket 675 --> https://svn.open-mpi.org/trac/ompi/ticket/675
2006-12-14 19:42:24 +00:00
Jeff Squyres
0ca8cb35b7 Fixes trac:366
Add ability for ini files to recognize "use_eager_rdma" flag.  Set the
default to "no" (because we should assume that HCAs cannot support the
property necessary for using RDMA for eager messages -- that the last
byte of the message is guaranteed to be written to memory last --
unless proven otherwise.  For example, iWARP cards apparently do not
provide this guarantee), and then set all Mellanox and IBM HCAs to
override the default to enable this behavior on these cards.

This commit was SVN r12851.

The following Trac tickets were found above:
  Ticket 366 --> https://svn.open-mpi.org/trac/ompi/ticket/366
2006-12-14 15:52:13 +00:00
Brad Benton
337116d5fd Added IBM eHCA vendor and part id info
This commit was SVN r12827.
2006-12-12 14:12:39 +00:00
Jeff Squyres
e70ef98ea6 Update the help message to be a bit more specific and refer to the web
FAQ.

This commit was SVN r12812.
2006-12-09 15:13:03 +00:00
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
Brian Barrett
441432950f Merge in changes from the bwb-heterogeneous temp branch (r12491 -
r12714) for supporting compilers / architectures with different
padding rules.

This commit was SVN r12749.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r12491
  r12714
2006-12-04 20:11:42 +00:00
Pavel Shamis
f08bc818c4 Cleaning mca_btl_openib_progress_thread from
unused variables.

This commit was SVN r12709.
2006-11-30 18:28:45 +00:00
Brian Barrett
33320b7165 Rework the opal_progress interface to better support dynamic processes and at
the same time, remove some of the MPI-related options from OPAL:

  - provide mechanism to change at runtime whether sched_yield() should 
    be called when the progress engine is idle
  - provide mechanism for changing the rate at which the event engine
    is called when there are "no" users of the event engine (ie, when
    using MPI but not TCP)
  - fix some function names in the progress engine to better match
    their intended use (and remove MPI naming scheme)
  - remove progress_mpi_enable / progress_mpi_disable because 
    we can now use the functions to set the sched_yield and
    tick rate interfaces
  - rename opal_progress_events() to opal_progress_set_event_flag()
    because the first really isn't descriptive of what the function
    does and I always got confused by it

This commit was SVN r12645.
2006-11-22 02:06:52 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
Gleb Natapov
9933a6f469 Previous fix doesn't fix the case when opcode is changed in put/get functions.
The fix is to set opcode to SEND at the entrance to the send function before
checking credits and putting fragment to the pending list. We do the same thing
in put/get functions i.e setting opcode at the entrance to the function.

This commit was SVN r12559.
2006-11-11 07:51:06 +00:00
Gleb Natapov
7e03b83d23 Reset opcode field to SEND. It is checked later in pending progress function.
This commit was SVN r12531.
2006-11-10 06:17:00 +00:00
Gleb Natapov
b4fd2d7d50 Fix warnings from progress thread patch.
This commit was SVN r12434.
2006-11-06 12:34:56 +00:00
Pavel Shamis
566667ac61 Adding progress thread support to OpenIB BTL.
Reviewed by Gleb.

This commit was SVN r12411.
2006-11-02 16:15:21 +00:00
Gleb Natapov
4c784b6403 As Andrew Friedley pointed, my previous patch may cause deadlock if
mca_btl_openib_endpoint_connect_eager_rdma() is called recursively. He also
noticed that orte_pointer_array_add() can't fail because we allocate max number
of elements at init time. So just remove error handling and locking. No locking
 - no deadlocks.

This commit was SVN r12388.
2006-11-01 15:53:33 +00:00
Gleb Natapov
aac695a51f eager_rdma_buffers update is not atomic. A buffer is added to the array and if
something is going wrong down in the code it is removed from the array. So add
mutex to prevent concurrent access to the array from different threads.

This commit was SVN r12385.
2006-11-01 07:27:32 +00:00
Andrew Friedley
48c5117476 Fix some signedness warnings on threaded builds introduced by r12369
This commit was SVN r12376.

The following SVN revision numbers were found above:
  r12369 --> open-mpi/ompi@d7375ec102
2006-10-31 17:29:25 +00:00
Gleb Natapov
d7375ec102 Fix deadlock reported by Andrew Friedley:
What's happening is that we're holding openib_btl->eager_rdma_lock when
we call mca_btl_openib_endpoint_send_eager_rdma() on
btl_openib_endpoint.c:1227.  This in turn calls
mca_btl_openib_endpoint_send() on line 1179.  Then, if the endpoint
state isn't MCA_BTL_IB_CONNECTED or MCA_BTL_IB_FAILED, we call
opal_progress(), where we eventually try to lock
openib_btl->eager_rdma_lock at btl_openib_component.c:997.

The fix removes this lock altogether. Instead we atomically set local RDMA
pointer to prevent other threads to create rdma buffer for the same endpoint.
And we increment eager_rdma_buffers_count atomically thus polling thread doesn't
need lock around it.

This commit was SVN r12369.
2006-10-31 09:54:52 +00:00
Gleb Natapov
1b152dfe09 On 64 bit platform if high 32 bits of buf address is not zero they are trimmed
by wrong bitwise and. Fix it by expanding mask to 64 bits.

This commit was SVN r12368.
2006-10-31 07:33:35 +00:00
George Bosilca
126a68dc9a Big datatype commit. Remove all unused features of the datatype engine. As the memory
allocation logic is completely done outside the data-type engine (in the PML) there is
no need for any special case inside the data-type engine. There is less arguments for
the ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is
not required anymore as there is no memory allocated in the engine itself). This change
affect all components using datatypes. I test most of them, but it might happens that I
miss some ... If it's the case please let me know (don't shoot the pianist!!).

This commit was SVN r12331.
2006-10-26 23:11:26 +00:00
George Bosilca
640178c4b3 Grepping through the source files I found these calls to the data-type engine
with the wrong type of arguments.

This commit was SVN r12148.
2006-10-17 21:05:04 +00:00
Rainer Keller
668902c780 - trivial spelling
This commit was SVN r12139.
2006-10-17 16:34:52 +00:00
Pavel Shamis
d64cb58007 Tavor HCA vendor_id was changed to correct one: 23108
Reviewed by: Jeff

This commit was SVN r12127.
2006-10-16 14:46:39 +00:00
Gleb Natapov
26660d657a Openib BTL supports PUT and GET protocol, communicate it to PML. We get better
bandwidth with GET protocol and mpi_leave_pinned option.

This commit was SVN r11862.
2006-09-28 11:48:04 +00:00
Gleb Natapov
30608c5a50 Add separate queues for pending rget and rput frags. Process only limited
number of pending packets at once, otherwise we can spin forever.

This commit was SVN r11861.
2006-09-28 11:41:45 +00:00
Gleb Natapov
7b1b4f95e3 Local GID table contains not what I thought it contains. It contains local HCA
GIDs (there can be more than one) and not GIDs of the HCA on the network. Entry
zero always have to be initialized so we use it, and warn user if there is more
then one port active and default subnet is configured on at least one of them.

This commit was SVN r11815.
2006-09-26 12:12:33 +00:00
Gleb Natapov
9cd25158d6 Fix btl_openib_max_btls parameter handling.
This commit was SVN r11772.
2006-09-25 11:18:20 +00:00
Gleb Natapov
601a6ca17a Use real subnet prefix instead of sm_lid.
This commit was SVN r11749.
2006-09-22 10:27:12 +00:00
Gleb Natapov
18c54f574f add rmb()
This commit was SVN r11710.
2006-09-19 13:27:05 +00:00
Gleb Natapov
ac42284c16 Print more helpful message in case we can't find active port.
This commit was SVN r11706.
2006-09-19 08:56:32 +00:00
George Bosilca
a3ad4a7fc8 The visibility flags (and/or Windows friendly export) is now on for all BTLs.
This commit was SVN r11662.
2006-09-14 22:19:39 +00:00
Gleb Natapov
7999c08107 consolidate credit management and CQ polling code.
This commit was SVN r11622.
2006-09-12 09:17:59 +00:00
Gleb Natapov
9b93f48e22 fix compile warnings in previous commit.
This commit was SVN r11554.
2006-09-07 13:31:50 +00:00
Gleb Natapov
d0caffa0aa Consolidate receive buffers prepost code for HP/LP QPs.
This commit was SVN r11552.
2006-09-07 13:05:41 +00:00
Gleb Natapov
298c825592 Remove #if OMPI_MCA_BTL_OPENIB_HAVE_SRQ. Always compile SRQ.
This commit was SVN r11537.
2006-09-06 05:45:37 +00:00
Gleb Natapov
4f05da21a1 remove unused macro
This commit was SVN r11536.
2006-09-06 05:36:26 +00:00
Gleb Natapov
424e412391 Make eager rdma work with SRQ enabled.
This commit was SVN r11530.
2006-09-05 16:04:04 +00:00
Gleb Natapov
c13240a1d1 remove rdma_credits from openib BTL header. Use one field for regular and rdma credits.
This commit was SVN r11529.
2006-09-05 16:02:09 +00:00
Gleb Natapov
fe932ca7bf consolidate part of HP/LP fields.
This commit was SVN r11528.
2006-09-05 16:00:18 +00:00
Gleb Natapov
b6bac100b0 Move error path out of the way.
This commit was SVN r11527.
2006-09-05 15:59:02 +00:00
Gleb Natapov
ffe7051488 fix compilation warnings.
This commit was SVN r11524.
2006-09-05 09:16:22 +00:00
Jeff Squyres
7fe337ce3b Yoinks -- remove some debugging output.
This commit was SVN r11515.
2006-09-01 11:48:26 +00:00
Jeff Squyres
6c2e938d31 Update the comment to reflect that this can now be a comma-delimited list.
This commit was SVN r11507.
2006-08-30 20:28:48 +00:00
Jeff Squyres
91bdbc0673 This commit fixes a few things. It looks bigger than it is because a
bunch of code changed indenting level and some code got moved out of
one function and made into its own subroutine.

- Gleb pointed out that I wasn't taking into account values from the
  default section of the INI file (and not finding values in the INI
  file is not an error).
- I incorrectly thought that 0x5ad was Mellanox's vendor ID.  Turns
  out that 0x5ad is Cisco's ID, while 0x2c9 is Mellanox.
  Specifically, Cisco burns its own firmware into the HCA which
  replaces the vendor ID, although the part ID stays the same.  So
  it's Mellanox hardware with Cisco firmware.  And apparently several
  of us do that.  :-)  So I expanded the concept of the vendor_id in
  the INI file to allow for lists of vendor IDs.  
- Along with that, I updated the default INI file to list all the IB
  vendors (that I am aware of -- certainly open to putting more data
  in there from other vendors) who overwrite Mellanox's vendor_id with
  their own for the part numbers that we have on file.

This commit was SVN r11506.
2006-08-30 20:21:47 +00:00
Jeff Squyres
9e2488bfe9 George found a great way to avoid warnings from flex for that unused
function.  Woo hoo!

This commit was SVN r11469.
2006-08-28 13:44:37 +00:00
Gleb Natapov
c70eb43e43 Align eager RDMA buffer so that last byte of the buffer is on the last byte of
the CPU cache line. Improves zero byte latency a little bit because of L1 cache
miss reduction.

This commit was SVN r11465.
2006-08-28 11:03:56 +00:00
George Bosilca
3f0a7cad9e The last patch for Windows support. Mostly casting and conversion to C++ friendly headers.
This commit was SVN r11400.
2006-08-24 16:38:08 +00:00
Gleb Natapov
21e99cd334 init mtu parameter when no warn is set.
This commit was SVN r11388.
2006-08-24 10:42:42 +00:00
Galen Shipman
fbf7e9cf1c use int32_t's not size_t's (interface change in ORTE)..
This commit was SVN r11322.
2006-08-22 16:26:36 +00:00
Galen Shipman
e5c594c211 More updates for the async error handler for btl's
In order to provide backwards compatability the framework versions are bumped
and the handler registeration function is at the end of the btl struct.
Testing done on sm, openib, and gm.. 

This commit was SVN r11256.
2006-08-17 22:02:01 +00:00
Galen Shipman
3b49953ce2 Add error callback to the btl interface, this allows error to be delivered to
the upperlayer assynchronously although there are some issues with this.. such
as there are multiple consumers of the btl's.. who get's the

This commit was SVN r11232.
2006-08-16 20:21:38 +00:00
Jeff Squyres
474564a6b1 Bring over all the work from the /tmp/ib-hw-detect branch. In
addition to my design and testing, it was conceptually approved by
Gil, Gleb, Pasha, Brad, and Galen.  Functionally [probably somewhat
lightly] tested by Galen.  We may still have to shake out some bugs
during the next few months, but it seems to be working for all the
cases that I can throw at it.

Here's a summary of the changes from that branch: 

* Move MCA parameter registration to a new file (btl_openib_mca.c):
   * Properly check the retun status of registering MCA params
   * Check for valid values of MCA parameters
   * Make help strings better
   * Otherwise, the only default value of an MCA param that was
     changed was max_btls; it went from 4 to -1 (meaning: use all
     available)
 * Properly prototyped internal functions in _component.c
   * Made a bunch of functions static that didn't need to be public
   * Renamed to remove "mca_" prefix from static functions
   * Call new MCA param registration function
   * Call new INI file read/lookup/finalize functions
   * Updated a bunch of macros to be "BTL_" instead of "ORTE_"
   * Be a little more consistent with return values
   * Handle -1 for the max_btls MCA param
   * Fixed a free() that should have been an OBJ_RELEASE()
   * Some re-indenting
 * Added INI-file parsing
   * New flex file: btl_openib_ini.l
   * New default HCA params .ini file (probably to be expanded over
     time by other HCA vendors)
   * Added more show_help messages for parsing problems
   * Read in INI files and cache the values for later lookup
   * When component opens an HCA, lookup to see if any corresponding
     values were found in the INI files (ID'ed by the HCA vendor_id
     and vendor_part_id)
   * Added btl_openib_verbose MCA param that shows what the INI-file
     stuff does (e.g., shows which MTU your HCA ends up using)
   * Added btl_openib_hca_param_files as a colon-delimited list of INI
     files to check for values during startup (in order,
     left-to-right, just like the MCA base directory param).
   * MTU is currently the only value supported in this framework.
   * It is not a fatal error if we don't find params for the HCA in
     the INI file(s).  Instead, just print a warning.  New MCA param
     btl_openib_warn_no_hca_params_found can be used to disable
     printing the warning.
 * Add MTU to peer negotiation when making a connection
   * Exchange maximum MTU; select the lesser of the two

This commit was SVN r11182.
2006-08-14 19:30:37 +00:00
Galen Shipman
f7015abb92 set the inline_max to something.. doh..
This commit was SVN r11133.
2006-08-08 17:24:12 +00:00
Galen Shipman
c93711cfdb checking for max_inline_data == 0 as an error condition is not valid,, so
don't do it.. 

This commit was SVN r11132.
2006-08-08 16:53:47 +00:00
Brian Barrett
9c30aefff5 * constant is always defined -- use #if, not #ifdef
This commit was SVN r11089.
2006-08-02 18:37:41 +00:00
Galen Shipman
fb9210463f clarify assignment..
This commit was SVN r11065.
2006-07-31 20:54:54 +00:00
Galen Shipman
ce0b8d9b48 cleanup of cq/srq sizing..
This commit was SVN r11061.
2006-07-31 17:24:39 +00:00
Galen Shipman
c9e0eda190 Initialize the completion queue to a reasonable size based on maximum number
of send/receives outstanding.

Use ibv_cq_resize if available after initial creation of completion queue if
cq_size is too small (based on number of peers). 

This commit was SVN r11053.
2006-07-30 00:58:40 +00:00
Gleb Natapov
72575d81d2 Create separate pool for control messages. It is unlimited, but the maximum number of element that are allocated from it is limited by number of connections.
This commit was SVN r11028.
2006-07-27 14:09:30 +00:00
Gleb Natapov
4b605295b3 remove unused field.
This commit was SVN r10965.
2006-07-24 06:12:16 +00:00
Gleb Natapov
3b34dc8df8 remove MCA_BTL_IB_FRAG_ALIGN. Alignment is handled in free_list_t.
This commit was SVN r10945.
2006-07-23 12:33:49 +00:00
Gleb Natapov
91f48f9a79 Merge with gleb-pml branch. Add out of resource handling support to PML layer.
If resource is not available request is added to one of the pending list and retried later.

This commit was SVN r10900.
2006-07-20 14:44:35 +00:00
Gleb Natapov
383694c68d Add support to get alignemnt buffers from free_list_t. Convert openib BTL to new interface.
This commit was SVN r10899.
2006-07-20 14:39:05 +00:00
Gleb Natapov
e05ec69dc4 print "flush error" only once.
This commit was SVN r10672.
2006-07-06 08:03:01 +00:00
Gleb Natapov
9b0807e547 Put pending fragment on the right waiting list.
This commit was SVN r10671.
2006-07-06 07:51:23 +00:00
Galen Shipman
7e079d20ab fix for stupid casting.. addresses issue on PPC64 where sizes get set
improperly and badness ensues..

This commit was SVN r10574.
2006-06-29 21:58:50 +00:00
Gleb Natapov
c8f75c472a remove modulo op from fast path. Improvement 0.02-0.04ms.
This commit was SVN r10538.
2006-06-28 12:00:47 +00:00
Gleb Natapov
e58a89ef3e OMPI_ENABLE_DEBUG is always defined (to 0 or 1). Use #if and nto #ifdef.
This commit was SVN r10537.
2006-06-28 11:25:09 +00:00
Gleb Natapov
704a5eb645 Support for LMC (lid mask count) and multiple QPs per port.
This commit was SVN r10536.
2006-06-28 07:23:08 +00:00
Jeff Squyres
df45221a3e Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10521.
2006-06-27 10:43:03 +00:00
Gleb Natapov
52208d7bf9 Whe don't need to register zero sized frags.
This commit was SVN r10519.
2006-06-27 08:50:12 +00:00
Galen Shipman
8855e5b73a Fixes for DR as well as better diagnostic..
Successfully passing the intel test suite with/without induced errors/drops. 

This commit was SVN r10518.
2006-06-26 22:29:29 +00:00
Gleb Natapov
b7715395cb Return descriptor before sending credits one more time. We may need it.
This commit was SVN r10495.
2006-06-26 07:05:58 +00:00
Jeff Squyres
1d27ca5d0a Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10424.
2006-06-20 11:32:46 +00:00
Jeff Squyres
600bf4295a Update the help message to be slightly more concise and clear
This commit was SVN r10422.
2006-06-20 11:23:38 +00:00
Brian Barrett
3d027e57a8 * fix for ticket #141. If we are going to shortcut out of polling the
send/receive queues if there is something available in the short message
  rdma queues, then we have to poll *ALL* the rdma queues before exiting,
  or we aren't fair about frag reception and fall into degenerate matching
  cases.

This commit was SVN r10410.
2006-06-17 21:32:25 +00:00
Galen Shipman
218a438509 finished the ompi_free_list_t class nightmare..
This commit was SVN r10314.
2006-06-12 22:09:03 +00:00
Jeff Squyres
a4030ad2d9 Improve the tremendously unhelpful MCA help message for the
btl_openib_ib_mtu and btl_mvapi_ib_mtu MCA params by showing the valid
values what what they represent (got a question about this from Cisco
testing engineers).

This commit was SVN r10277.
2006-06-09 18:02:45 +00:00
Galen Shipman
cc54b07aa0 add better error messages for vapi retry exceeded errors.
This commit was SVN r10219.
2006-06-06 02:04:56 +00:00
Galen Shipman
9e6e7575b9 doh... add the file..
This commit was SVN r10210.
2006-06-05 21:24:42 +00:00
Galen Shipman
f05dee0435 add help file to explain why things went south..
This commit was SVN r10209.
2006-06-05 21:23:45 +00:00
Galen Shipman
74c97fb784 cleanup error reporting.. use ompi_proc_t->proc_name if available this gives
us source/dest hostnames for communication errors.. 

This goes to 1.1 branch (reviewed by Brian).. 

This commit was SVN r10200.
2006-06-05 20:02:41 +00:00
Galen Shipman
0344ae4ac5 Fix to allow eager limit and max send size to be any size (within resource limitations). Instead of storing the ompi_free_list_t * in the fragment, we use the frag type enum, this tells us where the frag came from and where it should return.. This could also be done in mvapi but is not a high priority moving forward..
Review by Brian, needs to hit the trunk + 1.1 release.. 

This commit was SVN r10157.
2006-06-01 02:32:18 +00:00
Brian Barrett
5163f2b296 Fix for bug #36. The MX, MVAPI, and OpenIB components don't have
support for progress threads, so we shouldn't build them or try to use
them when support for progress threads has been requested.  The TCP, GM,
SELF, and SM BTLs should have progress thread support, so they aren't
disabled.  The Portals BTL isn't compiled on platforms with threads,
so it doens't need to be updated.

This commit was SVN r10156.
2006-06-01 01:30:16 +00:00
Gleb Natapov
f590d8a190 fix eager RDMA on PPC64.
This commit was SVN r10059.
2006-05-25 11:05:12 +00:00
Gleb Natapov
0c34d5c9e6 fix endpoint matching in on demand connection establishment. This fix is in mvapi btl already.
This commit was SVN r9855.
2006-05-09 12:12:52 +00:00
Tim Woodall
6523c12e4b - decrease eager limit to 12K (improves latency)
- trigger event library while setting up connections

This commit was SVN r9645.
2006-04-14 22:28:05 +00:00
Tim Woodall
c6489cb5aa - turn on eager rdma by default
This commit was SVN r9641.
2006-04-14 21:11:14 +00:00
Gleb Natapov
98282a3567 fix spelling. threashold -> threshold.
This commit was SVN r9577.
2006-04-08 08:13:37 +00:00
Gleb Natapov
b6ab1f4262 fix compilation warnings.
This commit was SVN r9515.
2006-04-02 11:32:25 +00:00
Gleb Natapov
79bcfb096f Add type to frag. Sometimes we need to know that a frag is from short rdma area.
I used hack for this that doesn't work for mvapi, so changing it to something more sane.

This commit was SVN r9477.
2006-03-30 15:26:21 +00:00
Gleb Natapov
590c992a7e fix recursive lock of openib_btl->ib_lock.
This commit was SVN r9427.
2006-03-26 15:02:43 +00:00
Gleb Natapov
01a119c3c5 fix compilation bug with --enable-mpi-threads
This commit was SVN r9426.
2006-03-26 13:24:10 +00:00
Gleb Natapov
a5a78b10cc Implementation of short message RDMA. Endpoint registers circular buffer and sends its address and rkey to the peer. Peer uses this buffer to eagerly RDMA small message into it. Endpoint polls the buffer for message arrival before checking HP/LP QPs. Set btl_openib_use_eager_rdma to 1 to enable it.
This commit was SVN r9425.
2006-03-26 08:30:50 +00:00
Tim Woodall
712468dbef add diagnostic interface
This commit was SVN r9328.
2006-03-17 17:39:41 +00:00
Galen Shipman
440417e92c Add max_btls option
This commit was SVN r9263.
2006-03-13 17:03:21 +00:00
Galen Shipman
e58b758031 standardize behavior of btl_alloc, if the size is larger than the max send
size, btl_alloc returns NULL. 

This commit was SVN r9114.
2006-02-22 17:37:59 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Galen Shipman
c8045bf397 Fixup for ORTE datatype checkin,
- use appropriate header files 
- change calls from orte_dps to orte_dss 

This commit was SVN r8920.
2006-02-07 15:20:44 +00:00
Tim Woodall
a2fde48f2f changes from release branch
This commit was SVN r8858.
2006-01-31 16:17:18 +00:00
Tim Woodall
bcd6c525f8 removed duplicate locks
This commit was SVN r8857.
2006-01-31 16:12:37 +00:00
Tim Woodall
e861158fcd - removed debug code
- removed extraneous memset

This commit was SVN r8798.
2006-01-24 23:38:41 +00:00
Galen Shipman
d657052510 misc cleanup..
This commit was SVN r8731.
2006-01-18 16:20:50 +00:00
Galen Shipman
84a09e4f4e use #if not #ifdef..
This commit was SVN r8720.
2006-01-17 21:07:34 +00:00
Galen Shipman
0c81c0a6ce use ibv_get_device_list if present
(submitted from roland)

This commit was SVN r8712.
2006-01-17 16:23:35 +00:00
Tim Woodall
a584c60dbe re-worked flow control logic to take into account the return
of credits from the peer prior to local completion, so that
we don't overrun the number of send wqes available.

This commit was SVN r8683.
2006-01-12 23:42:44 +00:00
Tim Woodall
63d0438991 merge in changes from release branch
This commit was SVN r8637.
2006-01-04 16:34:45 +00:00
Tim Woodall
e9498f7a75 improve error reporting when registrations fail
This commit was SVN r8598.
2005-12-22 16:05:28 +00:00
George Bosilca
e5158142b9 The lb should be extracted from the datatype not from the convertor.
This commit was SVN r8446.
2005-12-10 23:27:20 +00:00
Tim Woodall
1929a97d2f corrections for MPI_BOTTOM
This commit was SVN r8429.
2005-12-09 23:27:55 +00:00
Galen Shipman
4fce90a37b one last warning fixed on 32 bit platforms.
This commit was SVN r8191.
2005-11-18 17:27:09 +00:00
Galen Shipman
635e7a682b fix for 32bit compile warnings.
This commit was SVN r8190.
2005-11-18 17:08:51 +00:00
Galen Shipman
dde38d4119 reset sg_entry->addr to point at header when sending control messages.
cast to uint64_t (the correct datatype per verbs.h) instead of uintptr_t. 

This commit was SVN r8175.
2005-11-17 05:45:33 +00:00
Tim Woodall
58dd6c2493 - merge from release branch
This commit was SVN r8174.
2005-11-17 05:32:30 +00:00
Tim Woodall
01b94862df merge from release branch
This commit was SVN r8168.
2005-11-16 17:12:44 +00:00
Galen Shipman
5a4b1ebdd4 in mca_btl_openib_endpoint_post_send: set opcode on work request before potentially inserting it on pending list..
This commit was SVN r8127.
2005-11-12 02:11:14 +00:00
Tim Woodall
654ba6d262 srq cleanup
This commit was SVN r8106.
2005-11-10 23:29:54 +00:00
Tim Woodall
4a06e8463c port of flow control from mvapi
This commit was SVN r8102.
2005-11-10 20:15:02 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Tim Woodall
31eb35c3f1 correct rnr parameter - need to review this code and pass correct data type
This commit was SVN r7936.
2005-10-31 17:18:39 +00:00
Tim Woodall
3bd5b81dfa Submitted: Gleb Natapov
This commit was SVN r7899.
2005-10-27 17:48:40 +00:00
Galen Shipman
cb84a57c57 add endpoint and srq flow-control..
Note, we are failing the ring tests in the intel p2p test suite, but we seem
to fail the same tests under the current trunk.. will look into this further. 

This commit was SVN r7823.
2005-10-21 02:21:45 +00:00
Galen Shipman
0d1d231169 convert to new mca params, adding description strings.
changed mca param rr_buf_min/max to rd_min/max 
Add bandwidth param to openib 

This commit was SVN r7815.
2005-10-20 02:55:21 +00:00
Galen Shipman
3efecaaeda convert openib btl to use new mca_param registration.. Also, change rr_buf_min
and rr_buf_max to rd_min and rd_max 

This commit was SVN r7786.
2005-10-17 20:00:34 +00:00
Galen Shipman
eefe0fd04a fix threaded compile
fix misc warnings 
cleanup posting of receive descriptors 
comment why we retain before deregister in rcache_rb_mru.c 

This commit was SVN r7595.
2005-10-03 16:35:12 +00:00
Galen Shipman
f46548e691 Add SRQ support to OpenIB btl, removed old mca param - not used..
This commit was SVN r7585.
2005-10-02 18:58:57 +00:00
Galen Shipman
67d38b7896 Add multi-nic support to openib
Fix connection establishment race in openib 
Other misc 

This commit was SVN r7570.
2005-09-30 22:58:09 +00:00
Brian Barrett
db872a0fbb * check that return from ibv_get_devices isn't NULL before calling dlist_start().
On thor, if IB is down, we get NULL back from ibv_get_devices(), which then
  caused segfaults in dlist_start().
* Pretty-print error message if no HCAs found

This commit was SVN r7557.
2005-09-30 14:58:59 +00:00
Brian Barrett
997644af31 * There are now two forms of ibv_create_cq, one with 3 params and one with 5.
Try to detect which form this version of Open IB uses, defaulting to the 5
  version if we can't figure it out (the new version has 5 params)
* Only add -lcm if it exists on the system - some versions of Open IB
  apparently don't need it.

This commit was SVN r7542.
2005-09-29 13:35:57 +00:00
Andrew Friedley
555ae37255 Add lib{opal,orte,mpi}.la to appropriate LIBADD's, some whitespace cleanup as well.
This commit was SVN r7477.
2005-09-22 12:28:54 +00:00
Galen Shipman
f0b1ea52bc if all else fails in prepare_src,, pack
init the rdma_pending list in ob1

This commit was SVN r7366.
2005-09-14 04:41:33 +00:00
Galen Shipman
d932cfd342 merge of rcache work into the trunk.. lotsa fun ;-)..
I regression tested before the merge, I will regression test tonight and
correct issues that might have crept in. 

This commit was SVN r7329.
2005-09-12 22:28:23 +00:00
Brian Barrett
ed56e743b7 * update configure.ac to use the modern version of AC_INIT and
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
  number to be set at autoconf time (instead of at configure time, as
  it was before).  Set the version number, minus the subversion r number,
  at autoconf time.  Override the internal variables to include the r
  number (if needed) at configure time.  Basically, the right thing
  should always happen.  The only place it might not is the version
  reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
  them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
  in the directory containing source files, even if the Makefile.am is
  in another directory.  This should start making it feasible to
  reduce the number of Makefile.am files we have in the tree, which
  will greatly reduce the time to run autogen and configure.

This commit was SVN r7211.
2005-09-07 05:54:53 +00:00
Galen Shipman
afdfa70f73 Added support for openib RDMA READ.. note that performance is currently an
issue so PUT is default.. We are determining if this is an openib issue or a
btl issue as we have seen performance increases on mvapi. 

This commit was SVN r6928.
2005-08-18 17:08:27 +00:00
Tim Woodall
f274f524ab - added get based protocol (if supported by btl) for pre-registered memory
- removed 8 bytes from the majority of the pml headers 

This commit was SVN r6916.
2005-08-17 18:23:38 +00:00
Galen Shipman
ee6999fa90 typo in threaded build..
This commit was SVN r6898.
2005-08-16 13:22:08 +00:00
Galen Shipman
8e1e2eec3d Misc fixes for threaded builds..
This commit was SVN r6874.
2005-08-14 19:03:09 +00:00
Jeff Squyres
cf16a521c8 Ensure to get ompi/include/constants.h
This commit was SVN r6845.
2005-08-12 21:42:07 +00:00
Tim Woodall
5558c014b9 default TCP to only be used if self/sm/gm/mvapi.... are not available
This commit was SVN r6832.
2005-08-12 16:56:46 +00:00
Galen Shipman
73757b300c Added BTL_VERBOSE and OMPI_MCA_btl_base_debug , if set to 1 DEBUG output if
set to 2 VERBOSE output.. 

This commit was SVN r6783.
2005-08-09 17:49:39 +00:00
Galen Shipman
ba82bc11bc bug fixes and configure check for topspin directory structure..
This commit was SVN r6767.
2005-08-08 19:10:36 +00:00
Tim Woodall
2214f0502d - first cut at tcp btl (working but not optimal)
- reworked btl error logging macros

This commit was SVN r6701.
2005-08-02 13:20:50 +00:00
Galen Shipman
aa749499f8 Don't need to remove pedantic flags for openib.
This commit was SVN r6615.
2005-07-27 14:59:41 +00:00
Galen Shipman
115ef95c88 remove ompi_ignore files
add configure.m4 to openib btl and mpool. 

This commit was SVN r6613.
2005-07-27 13:42:37 +00:00
Galen Shipman
f3843bee55 Updated openib btl and mpool to use configure.m4
removed ompi_ignore files from openib btl and mpool. 

This commit was SVN r6612.
2005-07-27 03:38:25 +00:00
Galen Shipman
e33a8205e8 Bugfix and use INLINE flag on send.
This commit was SVN r6600.
2005-07-25 14:57:33 +00:00
Galen Shipman
4c119def85 Misc fixes..
This commit was SVN r6583.
2005-07-21 20:26:17 +00:00
Galen Shipman
fd969ac833 More code cleanup.. Also converted post receive requests to macros..
This commit was SVN r6566.
2005-07-20 17:43:31 +00:00
Galen Shipman
946402b980 More openib cleanup.. still note ready for public consumption ;-)
This commit was SVN r6565.
2005-07-20 15:17:18 +00:00
Galen Shipman
2f67ab82bb Working version of openib btl ;-)
Fixed receive descriptor counts that limited mvapi and openib to 2 procs.                                                   
Begin porting error messages to use the BTL_ERROR macro. 

This commit was SVN r6554.
2005-07-19 21:04:22 +00:00
Galen Shipman
5af3cc8045 carryover mvapi mpool changes to openib
This commit was SVN r6525.
2005-07-15 16:05:05 +00:00
Galen Shipman
b75560796c Fix up error handling in openib.. Added a simple debug test for memory
registration.. 

This commit was SVN r6520.
2005-07-15 15:13:19 +00:00
Galen Shipman
dcbda13a72 Various bug fixes..
This commit was SVN r6464.
2005-07-13 21:13:30 +00:00
Galen Shipman
c1c4a5efba Compiling checkin of openib btl and mpool..
This commit was SVN r6452.
2005-07-13 00:17:08 +00:00
Galen Shipman
d7bdc46ac9 compile error and warining fixes for openib..
This commit was SVN r6449.
2005-07-12 21:49:30 +00:00
Galen Shipman
ed1da1a7c8 verbs.h not vapi.h --- doh
This commit was SVN r6444.
2005-07-12 19:07:32 +00:00
Galen Shipman
ac527bdc78 Modified btl names as appropriate..
This commit was SVN r6443.
2005-07-12 19:02:39 +00:00
Galen Shipman
2a2bb0c1e5 Modified the openib configure.stub files to search for openib libs in the
appropriate spot and also added a check for the libsysfs library required by openmpi. Modified the mvapi configure.stub to use AC_TRY_LINK for
pthreads. 

This commit was SVN r6441.
2005-07-12 18:06:54 +00:00
Galen Shipman
4286dae71b Added the appropriate ignore and unignore files...
This commit was SVN r6436.
2005-07-12 13:39:59 +00:00
Galen Shipman
454fdff824 Initial commit of changes to the mvapi btl to the openib btl. Still need to
work on the configure.stub to correctly locate the ib libraries. 

This commit was SVN r6435.
2005-07-12 13:38:54 +00:00
Jeff Squyres
ba99409628 Major simplifications to component versioning:
- After long discussions and ruminations on how we run components in
  LAM/MPI, made the decision that, by default, all components included
  in Open MPI will use the version number of their parent project
  (i.e., OMPI or ORTE).  They are certaint free to use a different
  number, but this simplification makes the common cases easy:
  - components are only released when the parent project is released
  - it is easy (trivial?) to distinguish which version component goes
    with with version of the parent project
- removed all autogen/configure code for templating the version .h
  file in components
- made all ORTE components use ORTE_*_VERSION for version numbers
- made all OMPI components use OMPI_*_VERSION for version numbers
- removed all VERSION files from components
- configure now displays OPAL, ORTE, and OMPI version numbers
- ditto for ompi_info
- right now, faking it -- OPAL and ORTE and OMPI will always have the
  same version number (i.e., they all come from the same top-level
  VERSION file).  But this paves the way for the Great Configure
  Reorganization, where, among other things, each project will have
  its own version number.

So all in all, we went from a boatload of version numbers to
[effectively] three.  That's pretty good.  :-)

This commit was SVN r6344.
2005-07-04 20:12:36 +00:00
Brian Barrett
e55f99d23a * rename ompi_if to opal_if
* rename ompi_malloc to opal_malloc
* rename ompi_numtostr to opal_numtostr
* start of rename of ompi_environ to opal_environ

This commit was SVN r6332.
2005-07-04 01:36:20 +00:00
Brian Barrett
9f44b80291 * rename ompi_argv to opal_argv
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few

This commit was SVN r6330.
2005-07-04 00:13:44 +00:00
Brian Barrett
a13166b500 * rename ompi_output to opal_output
This commit was SVN r6329.
2005-07-03 23:31:27 +00:00
Brian Barrett
23b687b0f4 * rename ompi_event to opal_event
This commit was SVN r6328.
2005-07-03 23:09:55 +00:00
Brian Barrett
39dbeeedfb * rename locking code from ompi to opal
This commit was SVN r6327.
2005-07-03 22:45:48 +00:00
Brian Barrett
9f0c969bb4 * rename ompi_hash_table opal_hash_table
This commit was SVN r6324.
2005-07-03 16:52:32 +00:00
Brian Barrett
761402f95f * rename ompi_list to opal_list
This commit was SVN r6322.
2005-07-03 16:22:16 +00:00
Brian Barrett
499e4de1e7 * rename ompi_object and ompi_class to opal_object and opal_class
This commit was SVN r6321.
2005-07-03 16:06:07 +00:00
Brian Barrett
8cad33db40 * finish modex move
* fix protection in opal_free_list.h
* Fix some makefiles

This commit was SVN r6311.
2005-07-03 00:52:18 +00:00
Jeff Squyres
4ab17f019b Rename src -> ompi
This commit was SVN r6269.
2005-07-02 13:43:57 +00:00