Commit graph

163 commits

Author SHA1 Message Date
Brian Barrett
2d17dd9516 temporarily back out r15517 and r15520 so that I can get the RML / OOB changes
to cleanly apply

This commit was SVN r15527.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
2007-07-20 01:10:34 +00:00
Ralph Castain
41977fcc95 Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but...
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.

Note that I could not possibly find all outputs that directly print the orte_process_name_t fields, and only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.

This commit was SVN r15517.
2007-07-19 20:56:46 +00:00
Josh Hursey
d4d5a351c1 Silence a compiler warning when not using IPV6.
Also convert a few statements to conform to coding standard for Open MPI.

This commit was SVN r15407.
2007-07-13 16:38:36 +00:00
Jeff Squyres
8aa8a667da Use the OMPI version number for the component number, like all other
btl components.

This commit was SVN r15363.
2007-07-11 15:45:25 +00:00
Brian Barrett
1d02b9e7b5 Fix a bunch of issues exposed by Ken Cain in getting Open MPI to work with
VxWorks.  Still some issues remaining, I'm sure.

Refs trac:1010

This commit was SVN r15320.

The following Trac tickets were found above:
  Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010
2007-07-10 03:46:57 +00:00
Brian Barrett
8b9e8054fd Move modex from pml base to general ompi runtime, since it's used by more
than just the PML/BTLs these days.  Also clean up the code so that it
handles the situation where not all nodes register information for a given
node (rather than just spinning until that node sends information, like
we do today).

Includes r15234 and r15265 from the /tmp/bwb-modex branch.

This commit was SVN r15310.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15234
  r15265
2007-07-09 17:16:34 +00:00
Brian Barrett
f8fb1e9720 Fix some compile failures on Solaris 9 because it doesn't have V6ONLY.
This commit was SVN r15237.
2007-06-28 18:52:15 +00:00
Gleb Natapov
b88b7dedfe Rename btl_rdma_offset to btl_pipeline_send_length.
This commit was SVN r15153.
2007-06-21 07:12:40 +00:00
Brian Barrett
27ad954265 Fix a couple of problems with the way we were using orte_process_name_t
structures in the system.  Instead of using memcmp, use the ns function.
This won't cause a problem as long as all three elements of the name are
ints, but if they have different sizes, alignment and padding rules
can cause memcmp() to compare padding space, which rarely holds a sane
value.

This commit was SVN r14998.
2007-06-11 19:12:11 +00:00
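
Note on 27ad954265: the memcmp() hazard described above can be illustrated with a minimal C sketch. The type `example_name_t` and the function `example_name_compare` are hypothetical stand-ins, not the real orte_process_name_t or the ns compare function; the mixed field widths are an assumption chosen only to force padding between fields.

```c
#include <stdint.h>

/* Hypothetical name type (NOT the real orte_process_name_t). The 1-byte
 * field is normally followed by padding so that `job` is aligned, and
 * the contents of that padding are not guaranteed to be equal. */
typedef struct {
    uint8_t  cell;
    uint32_t job;
    uint32_t vpid;
} example_name_t;

/* Compare field by field, so padding bytes never take part in the
 * result. Returns <0, 0, or >0, in the style of memcmp(). */
int example_name_compare(const example_name_t *a, const example_name_t *b)
{
    if (a->cell != b->cell) return (a->cell < b->cell) ? -1 : 1;
    if (a->job  != b->job)  return (a->job  < b->job)  ? -1 : 1;
    if (a->vpid != b->vpid) return (a->vpid < b->vpid) ? -1 : 1;
    return 0;
}
```

A memcmp() over the whole struct would also inspect the bytes between `cell` and `job`, which is why a field-wise comparison is the safer choice once the fields stop being the same size.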
Gleb Natapov
ac1e8f81af Let's be real. TCP latency is slightly worse than mx/openib.
This commit was SVN r14865.
2007-06-05 12:22:57 +00:00
Gleb Natapov
fbd033b162 Cut&Paste error in r14795. Fix.
This commit was SVN r14862.

The following SVN revision numbers were found above:
  r14795 --> open-mpi/ompi@6b0d8c0858
2007-06-05 10:07:06 +00:00
Gleb Natapov
6b0d8c0858 TCP BTL ignores btl_tcp_bandwidth parameter. Fix it.
This commit was SVN r14795.
2007-05-30 14:12:05 +00:00
Gleb Natapov
f191834e56 No need for MCA_BTL_FLAGS_NEED_ACK any more. As of commit r14768 this is the
default behaviour.

This commit was SVN r14782.

The following SVN revision numbers were found above:
  r14768 --> open-mpi/ompi@3401bd2b07
2007-05-27 11:25:39 +00:00
Galen Shipman
3401bd2b07 Add optional ordering to the BTL interface.
This is required to tighten up the BTL semantics. Ordering is not guaranteed,
but if the BTL returns an order tag in a descriptor (other than
MCA_BTL_NO_ORDER) then we may request another descriptor that will obey
ordering w.r.t. the other descriptor.


This will allow sane behavior for RDMA networks, where local completion of an
RDMA operation on the active side does not imply remote completion on the
passive side. If we send a FIN message after local completion and the FIN is
not ordered w.r.t. the RDMA operation then badness may occur as the passive
side may now try to deregister the memory and the RDMA operation may still be
pending on the passive side. 

Note that this has no impact on networks that don't suffer from this
limitation as the ORDER tag can simply always be specified as
MCA_BTL_NO_ORDER.

This commit was SVN r14768.
2007-05-24 19:51:26 +00:00
George Bosilca
7459ab45f1 This is the complete commit for the TCP header issue. Jeff committed a partial
fix (r14749) and then backed it out (r14753).

As we are unable to send more than a 32-bit length over TCP in one go, there
is no reason to have a uint64 length in the header. This reduces the size
of the TCP header.

This commit was SVN r14755.

The following SVN revision numbers were found above:
  r14749 --> open-mpi/ompi@48c026ce6b
  r14753 --> open-mpi/ompi@28ed850b4c
2007-05-24 16:40:49 +00:00
Jeff Squyres
28ed850b4c Back out r14749; it wasn't quite ready for prime time yet...
This commit was SVN r14753.

The following SVN revision numbers were found above:
  r14749 --> open-mpi/ompi@48c026ce6b
2007-05-24 15:46:15 +00:00
Jeff Squyres
48c026ce6b Commit a patch from George (reviewed by Brian): reduce the size of the
mca_btl_tcp_hdr_t struct and remove the need for the heterogeneous
padding by changing the type of the "size" member to be uint32_t
(vs. uint64_t).  The value would never be greater than 32 bits anyway,
so having the type be uint64_t was wasteful.

This commit was SVN r14749.
2007-05-24 15:08:57 +00:00
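
Note on 48c026ce6b: a rough C sketch of why narrowing the size field shrinks the header. The two struct layouts and their field names are assumptions made for illustration; they are not the real mca_btl_tcp_hdr_t definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "before" layout: the 64-bit member may need alignment
 * padding in front of it, and how much depends on the ABI. */
typedef struct {
    uint8_t  tag;
    uint8_t  type;
    uint16_t count;
    uint64_t size;
} example_hdr64_t;

/* Hypothetical "after" layout: the fields pack naturally, so no
 * ABI-dependent padding is expected. */
typedef struct {
    uint8_t  tag;
    uint8_t  type;
    uint16_t count;
    uint32_t size;
} example_hdr32_t;

/* True when a payload length can be carried in the 32-bit size field. */
bool example_len_fits_hdr32(uint64_t len)
{
    return len <= UINT32_MAX;
}
```

Because the padding in front of a uint64_t member can differ between compilers and architectures, the wider layout is the one that would need explicit handling in a heterogeneous run.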
George Bosilca
b2e805db61 Nothing relevant. Indentation, typos, change PTL to BTL.
This commit was SVN r14727.
2007-05-23 14:03:52 +00:00
Gleb Natapov
3ebaff8dfe Implement new BTL parameters:
We eagerly send data up to btl_*_eager_limit with the match
Upon ACK of the MATCH we start using send/receives of size
btl_*_max_send_size up to the btl_*_rdma_pipeline_offset
After the btl_*_rdma_pipeline_offset we begin using RDMA writes of
size btl_*_rdma_pipeline_frag_size.

Now, on a per message basis we only use the above protocol if the
message is larger than btl_*_min_rdma_pipeline_size

btl_*_eager_limit -> same
btl_*_max_send_size -> same
btl_*_rdma_pipeline_offset -> btl_*_min_rdma_size
btl_*_rdma_pipeline_frag_size -> btl_*_max_rdma_size


btl_*_min_rdma_pipeline_size is new.

This patch also moves all BTL common parameters initialisation into
btl_base_mca.c file.

This commit was SVN r14681.
2007-05-17 07:54:27 +00:00
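
Note on 3ebaff8dfe: the protocol described in that message can be sketched as a function that splits a message into its three phases. This is only an illustration under stated assumptions: the struct and function names are hypothetical, btl_*_rdma_pipeline_offset is treated as an absolute byte offset into the message, and a message that is not larger than btl_*_min_rdma_pipeline_size is assumed to be sent entirely with eager plus send/receive (the commit message does not say what happens in that case). Fragment sizes (btl_*_max_send_size, btl_*_rdma_pipeline_frag_size) are left out.

```c
#include <stddef.h>

/* Hypothetical parameter bundle named after the btl_*_ parameters. */
typedef struct {
    size_t eager_limit;
    size_t rdma_pipeline_offset;
    size_t min_rdma_pipeline_size;
} example_btl_params_t;

/* How many bytes of one message travel in each phase. */
typedef struct {
    size_t eager_bytes;  /* sent with the MATCH */
    size_t send_bytes;   /* sent with send/receive fragments */
    size_t rdma_bytes;   /* written with RDMA */
} example_plan_t;

example_plan_t example_plan_message(const example_btl_params_t *p,
                                    size_t msg_len)
{
    example_plan_t plan = {0, 0, 0};
    size_t send_end = msg_len;  /* default: no RDMA phase (assumption) */

    plan.eager_bytes = (msg_len < p->eager_limit) ? msg_len : p->eager_limit;

    if (msg_len > p->min_rdma_pipeline_size) {
        /* Send/receive covers the range [eager_bytes, rdma_pipeline_offset). */
        send_end = p->rdma_pipeline_offset;
        if (send_end < plan.eager_bytes) send_end = plan.eager_bytes;
        if (send_end > msg_len) send_end = msg_len;
    }

    plan.send_bytes = send_end - plan.eager_bytes;
    plan.rdma_bytes = msg_len - send_end;
    return plan;
}
```

With an eager limit of 4096, a pipeline offset of 131072 and a minimum pipeline size of 262144, a 1 MiB message would be split into 4096 eager bytes, 126976 send/receive bytes and 917504 RDMA bytes.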
Brian Barrett
33a5758521 Some IPv6 improvements:
  * Move ipv6compat.h code into opal_config_bottom.h and change into some
    more intelligent testing of structures
  * Change opal's if interface to use sockaddr instead of sockaddr_storage,
    as the RFCs suggest we do
  * Move the networking code in opal that isn't directly related to if
    detection into net.h
  * Add quicky function to get the port out of either a sockaddr_in
    or sockaddr_in6, saving a bunch of code in the oob.
  * Update TCP oob and btl with new interface

This commit was SVN r14679.
2007-05-17 01:17:59 +00:00
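
Note on 33a5758521: a helper that reads the port from either address family might look like the sketch below. The name `example_sockaddr_port` is hypothetical and this is not the actual opal function; it only uses standard POSIX socket types.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: return the port in host byte order, or -1 when
 * the address family is neither AF_INET nor AF_INET6. */
int example_sockaddr_port(const struct sockaddr *addr)
{
    switch (addr->sa_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *)addr)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)addr)->sin6_port);
    default:
        return -1;
    }
}
```

Callers can keep the address in a struct sockaddr_storage and pass it as a struct sockaddr pointer, so one code path serves both IPv4 and IPv6.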
Brian Barrett
7708c4f887 Don't complain about unsupported protocols. Needs to be made better,
but this will quit the whining from platforms where the kernel doesn't
have IPv6 support.

This commit was SVN r14676.
2007-05-16 20:11:47 +00:00
Terry Dontje
f864348f97 Put an ifdef to conditionalize the use of memcpy for sparcv9 platforms to
avoid alignment issues.  This commit fixes trac:1009.

This commit was SVN r14608.

The following Trac tickets were found above:
  Ticket 1009 --> https://svn.open-mpi.org/trac/ompi/ticket/1009
2007-05-08 17:17:34 +00:00
Adrian Knoth
d63d125a88 I guess we only need this when IPv6 is enabled.
This commit was SVN r14551.
2007-04-29 16:38:34 +00:00
Adrian Knoth
5765ecc22e This patch reverts r14549 while retaining IPv6 support.
Re #1008

This commit was SVN r14550.

The following SVN revision numbers were found above:
  r14549 --> open-mpi/ompi@386baed55b
2007-04-29 16:23:11 +00:00
Adrian Knoth
386baed55b Hotfix for IPv6 support. Closes trac:1008
This commit was SVN r14549.

The following Trac tickets were found above:
  Ticket 1008 --> https://svn.open-mpi.org/trac/ompi/ticket/1008
2007-04-29 13:46:45 +00:00
George Bosilca
46265db0a9 Update the TCP BTL in order to bring back some of the functionalities lost
during the IPv6 patch. The most important is the multi BTL support. There
was a quite interesting bug. Instead of setting up the multiple connections
over different physical devices, based on the time when these connections
were created most of the time they were all using the same physical network.
Which, of course, was not the intended goal, as we top out at the maximum
bandwidth available over one device instead of gathering all available
bandwidth from all devices.

Second, the IPv6 RFC suggests using sockaddr_storage as a holder for the
IP information, but using a sockaddr* when we pass it to functions. This is
only partially corrected by this patch.

Some other minor cleanups.

This commit was SVN r14544.
2007-04-28 19:13:47 +00:00
Adrian Knoth
e3d35258b4 Cosmetics. Brian fixes my crappy code and I fix the curly braces.
That's teamwork, right? ;)

This commit was SVN r14517.
2007-04-25 20:17:19 +00:00
Brian Barrett
4b8bb70afb A couple cleanups for the IPv6 support:
  - make opal_sockaddr2str() take a sockaddr_storage instead of a sockaddr_in6
    so that it works for IPv4 and IPv6 addresses, and remove a whole bunch
    of #ifs in the OOB code.
  - Fix a compiler warning in the TCP BTL due to run-time determined
    array size by making it a dynamically allocated array.
  - Fix the unpacking code of IPv4 addresses when using IPv6 support, so
    that the address is in the correct location (instead of in an IPv6
    structure, use an IPv4 structure).  Refs trac:1005.

This commit was SVN r14514.

The following Trac tickets were found above:
  Ticket 1005 --> https://svn.open-mpi.org/trac/ompi/ticket/1005
2007-04-25 19:08:07 +00:00
Adrian Knoth
d1ce39de4f Move mca_btl_tcp_addr_isipv4public to opal_addr_isipv4public
This commit was SVN r14512.
2007-04-25 18:06:06 +00:00
Jeff Squyres
c4c68e666a Merge in the ipv6 work from /tmp/ipv6-merge.
This commit was SVN r14503.
2007-04-25 01:55:40 +00:00
Josh Hursey
8f119d9063 Closes trac:977
Fix for memory corruption in the restarted process stack. This stemmed from
the brute force method we were previously using. This commit fixes this by
using a lighter weight solution focused in the r2 BML instead of above the PML.
This is a more efficient and flexible solution, and it solves the original
problem.

In the process I pulled out the ft_event function in the tcp BTL and r2 BML
into a set of *_ft.[c|h] files just to keep any updates to these code paths
as isolated as possible to make merging easier on everyone.

This commit was SVN r14371.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855

The following Trac tickets were found above:
  Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977
2007-04-14 02:06:05 +00:00
Jeff Squyres
51f286d737 Just like r14289 on the ORTE trunk:
Per discussions with Brian and Ralph, make a slight correction in
where components are installed. Use $pkglibdir, not $libdir/openmpi,
so that when compiled in the orte trunk, components are installed to
the right directory (because the component search path is checking
$pkglibdir).

This commit was SVN r14345.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r14289
2007-04-12 11:19:42 +00:00
George Bosilca
20f0ec584a A tricky optimization. On my test machine it improves the bandwidth by about 3Mb/s out of 580Mb/s. But
the real interest is for small to medium-sized unexpected messages. The unexpected messages are copied
by the PML into its own unexpected buffers. Therefore, there is no reason to make a first copy in the
TCP BTL. The BTL can hand its own buffer to the PML, and can be sure that once the callback
has completed it can reuse the buffer, no matter what happened with the fragment.

This commit was SVN r14320.
2007-04-12 04:52:29 +00:00
George Bosilca
667bda0fef Rework the code a little bit to make things simpler.
This commit was SVN r14203.
2007-04-03 16:05:51 +00:00
George Bosilca
1cb26e3b9c Finally the convertor exports a convenience function to allow a consistent
computation of the current location in the pack/unpack process. This can
be used both for retrieving the pointer to the first byte (in the special
case of the cached RDMA protocol) and for getting the current
position (for the pipelined protocol).

I modified all BTLs, but most of them are still untested.

This commit was SVN r14180.
2007-03-30 22:02:45 +00:00
Brian Barrett
464d536928 remove debugging printf
This commit was SVN r14088.
2007-03-20 21:28:28 +00:00
George Bosilca
8c9e4baa47 Add multi-link capabilities to the TCP BTL. This is useful for systems where the
latency is high and the network relatively fast. This will allow for more kernel
level buffering, which allows overlap between system calls and communications.
Somehow, even on fast clusters there is an improvement (not significant).

This patch creates multiple modules for the same device, which in turn will
create multiple sockets between the peers. By default the number of BTLs per
device is set to 1, so there is no fundamental difference with the current
version. Change the value of btl_tcp_links to enable multiple links between
peers.

This commit was SVN r14076.
2007-03-20 11:50:17 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Brian Barrett
a34e67d743 Remove unneeded PARAM_INIT_FILE variable in configure.params files used by
components that use configure.m4 for configuration or are always built. 
The macro has not been needed since moving to configure types other than
configure.stub

Fixes trac:590

This commit was SVN r13031.

The following Trac tickets were found above:
  Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
2007-01-08 03:44:22 +00:00
Brian Barrett
8900d3ae43 Second take at fixing the issues with using ompi_ptr_t. Add helper functions for converting from .pval to .lval and vice-versa. Users of ompi_ptr_t types should only use one of the fields in the union unless using the helper conversion functions. For the BTLs, local pointers will always be stored in the .pval field and remote pointers always stored in the .lval field.
George wrote the initial patch, I extended it slightly and am responsible for all bugs found.

Refs trac:587

This commit was SVN r13023.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-07 01:48:57 +00:00
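
Note on 8900d3ae43: the helper conversions mentioned in that message can be sketched as follows. The union and the function names are hypothetical and only modeled on the description; they are not the actual ompi_ptr_t code.

```c
#include <stdint.h>

/* Hypothetical 64-bit pointer holder modeled on the commit message. */
typedef union {
    uint64_t lval;  /* remote representation, always 64 bits wide */
    void    *pval;  /* local pointer, 4 or 8 bytes depending on the ABI */
} example_ptr_t;

/* Convert a local pointer to its 64-bit value. Going through uintptr_t
 * writes all 8 bytes, so no stale upper bytes remain on a 32-bit
 * machine. */
static inline uint64_t example_ptr_ptol(void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}

/* Convert a 64-bit value back to a local pointer. */
static inline void *example_ptr_ltop(uint64_t value)
{
    return (void *)(uintptr_t)value;
}
```

Writing through one union member and reading through the other is what the message advises against; the conversions above avoid depending on pointer size or byte order.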
Brian Barrett
48ec0b2071 Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix
for now...

This commit was SVN r12997.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c
2007-01-04 22:07:37 +00:00
Brian Barrett
27cea44a9c Fix a number of issues with the ompi_ptr_t:
  * Make sure that the pval always writes to the correct portion of the
    lval.  This only matters on 32 bit big endian machines.
  * On 32 bit machines when assigning to pval, the other 4 bytes of lval
    weren't being written, which could lead to bogus data

We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes.  Refs trac:587

This commit was SVN r12974.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 19:47:48 +00:00
Brian Barrett
2ab65eb521 Remove some debugging output that was #if 0'ed out but shouldn't have been
committed into the trunk anyway

This commit was SVN r12897.
2006-12-19 02:34:41 +00:00
Brian Barrett
38c2e43ac2 Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues...
This commit was SVN r12852.
2006-12-14 18:20:43 +00:00
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
Brian Barrett
441432950f Merge in changes from the bwb-heterogeneous temp branch (r12491 -
r12714) for supporting compilers / architectures with different
padding rules.

This commit was SVN r12749.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r12491
  r12714
2006-12-04 20:11:42 +00:00
Gleb Natapov
30ca7457b4 Some BTLs (e.g. TCP) can report put/get completion before the data actually
hits the buffer on the other side. For this kind of BTL we need to send the
FIN through the same BTL that the PUT was performed with, so that the network handles
ordering for us. If we use another BTL, the receiver can get the FIN before
the data hits the buffer and complete the request prematurely. We mark such
problematic BTLs with the MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA
is really fake, because the real one guarantees that the sender will see the
completion only after the receiver's NIC has confirmed that all the data was
received).

This commit was SVN r12732.
2006-12-03 10:12:09 +00:00
George Bosilca
658879232b Several small improvements:
- consistent error message when something fails (via BTL_ERROR macro)
- decrease the number of jumps.
- cleanup some parts of the code.

This commit was SVN r12719.
2006-12-01 21:48:06 +00:00
Brian Barrett
0895f5e08d Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH}
because they are in ORTE, not OMPI.  Also, remove the ORTE_PROCESS_NAME macros
in iof base as they are duplicates of the ones that were in ns_types, which 
meant that bad things happened if you changed what an orte_process_name_t
looked like.

This commit was SVN r12646.
2006-11-22 03:03:21 +00:00
Brian Barrett
33320b7165 Rework the opal_progress interface to better support dynamic processes and at
the same time, remove some of the MPI-related options from OPAL:

  - provide mechanism to change at runtime whether sched_yield() should 
    be called when the progress engine is idle
  - provide mechanism for changing the rate at which the event engine
    is called when there are "no" users of the event engine (ie, when
    using MPI but not TCP)
  - fix some function names in the progress engine to better match
    their intended use (and remove MPI naming scheme)
  - remove progress_mpi_enable / progress_mpi_disable because 
    we can now use the functions to set the sched_yield and
    tick rate interfaces
  - rename opal_progress_events() to opal_progress_set_event_flag()
    because the first really isn't descriptive of what the function
    does and I always got confused by it

This commit was SVN r12645.
2006-11-22 02:06:52 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
George Bosilca
126a68dc9a Big datatype commit. Remove all unused features of the datatype engine. As the memory
allocation logic is completely done outside the data-type engine (in the PML) there is
no need for any special case inside the data-type engine. There are fewer arguments for
ompi_convertor_pack and ompi_convertor_unpack as well (the last field, free_after, is
not required anymore as there is no memory allocated in the engine itself). This change
affects all components using datatypes. I tested most of them, but it might happen that I
missed some ... If that's the case please let me know (don't shoot the pianist!!).

This commit was SVN r12331.
2006-10-26 23:11:26 +00:00
George Bosilca
640178c4b3 Grepping through the source files I found these calls to the data-type engine
with the wrong type of arguments.

This commit was SVN r12148.
2006-10-17 21:05:04 +00:00
George Bosilca
a3ad4a7fc8 The visibility flags (and/or Windows friendly export) are now on for all BTLs.
This commit was SVN r11662.
2006-09-14 22:19:39 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
George Bosilca
3f0a7cad9e The last patch for Windows support. Mostly casting and conversion to C++ friendly headers.
This commit was SVN r11400.
2006-08-24 16:38:08 +00:00
Galen Shipman
e5c594c211 More updates for the async error handler for btl's
In order to provide backwards compatibility the framework versions are bumped
and the handler registration function is at the end of the btl struct.
Testing done on sm, openib, and gm.. 

This commit was SVN r11256.
2006-08-17 22:02:01 +00:00
Galen Shipman
3b49953ce2 Add error callback to the btl interface, this allows errors to be delivered to
the upper layer asynchronously, although there are some issues with this.. such
as there are multiple consumers of the btl's.. who gets the

This commit was SVN r11232.
2006-08-16 20:21:38 +00:00
Ralph Castain
d2912f03e0 Cleanup a historical naming convention problem. Move the socket_errno definitions to the OPAL layer and change the name accordingly. This cleans up some interrelationship issues as well as removing a name confusion.
This commit was SVN r11186.
2006-08-14 20:14:44 +00:00
George Bosilca
238147f576 Help the compiler to optimize the code. Now the order in the enum reflects the
order we use them in the switch.

This commit was SVN r10565.
2006-06-29 15:10:58 +00:00
Galen Shipman
218a438509 finished the ompi_free_list_t class nightmare..
This commit was SVN r10314.
2006-06-12 22:09:03 +00:00
Galen Shipman
38a0561d9b Allow maximum send size to be less than the eager limit.
Instead of figuring out which free list the fragment belongs to based on size
we simply store a pointer to the list which it belongs in the fragment.

This was reviewed by Brian and should hit all the branches.

This commit was SVN r10072.
2006-05-25 16:57:14 +00:00
George Bosilca
085cac552f Don't let TCP create local connections, we have the self BTL for this purpose.
This commit was SVN r10018.
2006-05-23 03:06:32 +00:00
Jeff Squyres
7b59847765 Ensure that endpoint->endpoint_addr is not NULL before trying to
dereference through it.  It is legal for endpoint_addr to be NULL in the
destructor because if btl_tcp_add_procs() -> btl_tcp_proc_insert()
returns UNREACH, then endpoint_addr will be NULL and we'll OBJ_RELEASE
it.

This commit was SVN r9940.
2006-05-16 19:01:08 +00:00
Tim Woodall
712468dbef add diagnostic interface
This commit was SVN r9328.
2006-03-17 17:39:41 +00:00
Brian Barrett
3e2c51dea8 * fix some silly commenting done by a previous developer that are good for
a laugh but probably not good for usability ;)

This commit was SVN r9253.
2006-03-11 03:09:24 +00:00
Brian Barrett
9b19e3fef0 * remove some debugging output that shouldn't have been committed. Doh!
This commit was SVN r9171.
2006-02-27 16:23:52 +00:00
Brian Barrett
285581dff2 More endian-related cleanups:
  - moved hton64 and ntoh64 from the bunch of places it had been copied
    into one header file
  - properly set and use the btl_tcp's nbo option to put things in
    network byte order on the wire if both sides don't have the same
    endianness
  - Put the OB1 PML's headers (with a couple exceptions I need to discuss
    with Tim) in network byte order on the wire if both sides don't have
    the same endianness
  - since it was needed for the TCP BTL, move the orte_process_name_t
    HTON and NTOH macros from the TCP OOB to ns_types.h

This commit was SVN r9145.
2006-02-26 00:45:54 +00:00
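
Note on 285581dff2: a portable 64-bit byte-order conversion can be written without knowing the host endianness. The sketch below is an illustration only; `example_hton64` and `example_ntoh64` are hypothetical names, not the hton64/ntoh64 macros from the Open MPI header.

```c
#include <stdint.h>
#include <string.h>

/* Host to network (big-endian) order: emit the most significant byte
 * first, whatever the host byte order is. */
uint64_t example_hton64(uint64_t host)
{
    uint8_t bytes[8];
    uint64_t net;
    int i;

    for (i = 0; i < 8; i++) {
        bytes[i] = (uint8_t)(host >> (56 - 8 * i));
    }
    memcpy(&net, bytes, sizeof(net));
    return net;
}

/* Network (big-endian) to host order: rebuild the value from the bytes
 * as they sit in memory. */
uint64_t example_ntoh64(uint64_t net)
{
    uint8_t bytes[8];
    uint64_t host = 0;
    int i;

    memcpy(bytes, &net, sizeof(bytes));
    for (i = 0; i < 8; i++) {
        host = (host << 8) | bytes[i];
    }
    return host;
}
```

On a big-endian host both functions return their argument unchanged, which matches the idea of converting only what actually needs converting on the wire.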
Jeff Squyres
628125599d Fix the TCP btl module endpoint matching during setup for the scenario
when running an MPI job spanning a node that has two TCP NICs and a
node that has one TCP NIC.  Previously, for the 2 NIC/module process,
we would return the first peer IP address if we couldn't find a subnet
match with any of the peer's published IP addresses -- this was to
support running OMPI across subnet boundaries.  Changed the behavior
to only do that behavior if the IP address we're trying to match is
public (i.e., not 10.x.y.z, 192.168.x.y, or 172.16.x.y) *and* any of
the remote peer's addresses are public (working on the assumption that
if we both have public addresses, they're routable to each other).

This definitely will not work in all scenarios, such as when we go to
WAN kinds of executions, and will need to be revisited at that time.

This commit was SVN r9119.
2006-02-23 02:02:19 +00:00
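
Note on 628125599d: the "is this address public" test from that message can be sketched as a check against the private IPv4 ranges. `example_ipv4_is_public` and `EXAMPLE_IPV4` are hypothetical names. One assumption to flag: the message lists 172.16.x.y, while this sketch excludes the whole 172.16.0.0/12 block reserved by RFC 1918. Loopback and link-local addresses are not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from its four octets. */
#define EXAMPLE_IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* addr is an IPv4 address in HOST byte order, for example the result of
 * ntohl() on sin_addr.s_addr. Returns false for RFC 1918 private ranges. */
bool example_ipv4_is_public(uint32_t addr)
{
    if ((addr & 0xFF000000u) == 0x0A000000u) return false; /* 10.0.0.0/8 */
    if ((addr & 0xFFF00000u) == 0xAC100000u) return false; /* 172.16.0.0/12 */
    if ((addr & 0xFFFF0000u) == 0xC0A80000u) return false; /* 192.168.0.0/16 */
    return true;
}
```

In the matching logic described above, the fallback to the first peer address would then apply only when the local address and at least one peer address both pass this test.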
Galen Shipman
e58b758031 standardize behavior of btl_alloc, if the size is larger than the max send
size, btl_alloc returns NULL. 

This commit was SVN r9114.
2006-02-22 17:37:59 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
  - move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
George Bosilca
9f1357fb89 Remove all the useless includes. Most of the endpoints do not depend on the
orte includes.

This commit was SVN r8932.
2006-02-08 05:10:48 +00:00
Galen Shipman
c8045bf397 Fixup for ORTE datatype checkin,
- use appropriate header files 
- change calls from orte_dps to orte_dss 

This commit was SVN r8920.
2006-02-07 15:20:44 +00:00
Ralph Castain
4b9f015c0b Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list.
This commit was SVN r8912.
2006-02-07 03:32:36 +00:00
George Bosilca
d4699037f7 Protect an assert if the endpoint cache is not activated.
This commit was SVN r8695.
2006-01-14 21:10:09 +00:00
George Bosilca
3317bf81ad A better implementation for the TCP endpoint cache + few comments.
This commit was SVN r8692.
2006-01-14 20:21:44 +00:00
George Bosilca
1b667067d6 I need to know the number of iovec attached to the fragment.
This commit was SVN r8447.
2005-12-10 23:28:16 +00:00
George Bosilca
01b0db91ae Get the lower-bound from the data not from the convertor.
This commit was SVN r8444.
2005-12-10 22:38:25 +00:00
George Bosilca
7baae4f394 Protect the headers and remove the unused ones.
This commit was SVN r8439.
2005-12-10 22:04:28 +00:00
Tim Woodall
1929a97d2f corrections for MPI_BOTTOM
This commit was SVN r8429.
2005-12-09 23:27:55 +00:00
George Bosilca
8888bfb063 And the thread-safe version. The lock/unlock macros are supposed to be
empty for non threaded builds, but somehow just by moving the code a
little bit around and removing 2 calls to lock/unlock the latency for TCP
went down by 2 micro-seconds ...

This commit was SVN r8426.
2005-12-09 05:16:50 +00:00
George Bosilca
5851b55647 Improve the latency for small and medium messages. The idea is to decrease the
number of recv system calls by caching the data. Each endpoint has a buffer
(the size is an MCA parameter) that can be used as a cache. Before each receive
operation this buffer is added at the end of the iovec list. All data that is
not expected by the fragment will go into this cache. If the cache contains data,
all subsequent receives will just memcpy the data into the BTL buffers.

The only drawback is that we will spin around the receive_handle until all the
cached data is read by the PML layer. This limitation comes from the fact that
the event library is unable to call us if there are no events on the socket.
Therefore we are unable to keep the data in the cache until the next loop
into the progress engine.

This commit was SVN r8398.
2005-12-07 00:12:59 +00:00
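
Note on 5851b55647: the caching idea from that message, appending a cache buffer to the iovec list so that one readv() can pick up surplus bytes, can be sketched as below. All names are hypothetical, the fixed cache size is an assumption (the message says the real size is an MCA parameter), and the sketch assumes the cache has been fully drained before the next readv().

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define EXAMPLE_CACHE_SIZE 4096  /* assumed size, for illustration only */

typedef struct {
    char   cache[EXAMPLE_CACHE_SIZE];
    size_t cache_pos;  /* offset of the next unread cached byte */
    size_t cache_len;  /* number of unread cached bytes */
} example_endpoint_t;

/* Copy up to `want` cached bytes into dst; returns the number copied. */
size_t example_cache_drain(example_endpoint_t *ep, void *dst, size_t want)
{
    size_t n = (want < ep->cache_len) ? want : ep->cache_len;

    memcpy(dst, ep->cache + ep->cache_pos, n);
    ep->cache_pos += n;
    ep->cache_len -= n;
    return n;
}

/* One readv() fills dst first; anything beyond `want` lands in the
 * endpoint cache. Returns the bytes stored in dst, or -1 on error. */
ssize_t example_recv_with_cache(int fd, example_endpoint_t *ep,
                                void *dst, size_t want)
{
    struct iovec iov[2];
    ssize_t n;

    assert(ep->cache_len == 0);  /* cache must be drained first */

    iov[0].iov_base = dst;
    iov[0].iov_len  = want;
    iov[1].iov_base = ep->cache;  /* cache goes last in the iovec list */
    iov[1].iov_len  = sizeof(ep->cache);

    n = readv(fd, iov, 2);
    if (n < 0) {
        return -1;
    }
    ep->cache_pos = 0;
    ep->cache_len = ((size_t)n > want) ? (size_t)n - want : 0;
    return ((size_t)n > want) ? (ssize_t)want : n;
}
```

The sketch can be tried on a pipe(): write ten bytes, request four, and the remaining six should be served by example_cache_drain() without another system call.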
Tim Woodall
5db38b38f5 corrections for latency issue
- don't do additional select until non-blocking read fails 
- don't do an additional read for 0 byte message

This commit was SVN r8312.
2005-11-29 17:33:01 +00:00
George Bosilca
b9a739e2b6 Remove 2 useless assignments (they are done at the end before the return).
This commit was SVN r8260.
2005-11-26 21:16:30 +00:00
Galen Shipman
5cf2d8d40c default to first available IP address if no matching subnets found..
This commit was SVN r8125.
2005-11-12 00:31:34 +00:00
Tim Woodall
62fd74140b decrease socket buffers sizes to same as ptl code
This commit was SVN r8072.
2005-11-10 00:40:55 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Tim Woodall
13409ec53b correction for hang, check for additional fragments before callback,
which may queue a new fragment

This commit was SVN r7889.
2005-10-27 01:39:39 +00:00
George Bosilca
6b3d02b514 Warning cleanups. On some OSes the iov_base member of the iovec structure is defined as a void* while
on others it is a char*. Thus the right side of every assignment should be explicitly cast to a void* in
order to avoid any casting complaints from the compilers.

This commit was SVN r7607.
2005-10-04 12:36:07 +00:00
Andrew Friedley
555ae37255 Add lib{opal,orte,mpi}.la to appropriate LIBADD's, some whitespace cleanup as well.
This commit was SVN r7477.
2005-09-22 12:28:54 +00:00
Tim Woodall
a74ca0062a reductions to initial memory footprint
This commit was SVN r7455.
2005-09-21 19:10:56 +00:00
Tim Woodall
d190e6a315 handle losing a connection
This commit was SVN r7373.
2005-09-14 21:27:30 +00:00
Tim Woodall
c25fb5dab0 - fixed issue w/ btl send-in-place option that was affecting tcp
- reduced size of match header by an additional 4 bytes to 16 bytes
- corrections for buffered send (work in progress)

This commit was SVN r7371.
2005-09-14 17:08:08 +00:00
Brian Barrett
e98415eb7b * make tree compile on OS X
This commit was SVN r7370.
2005-09-14 15:52:42 +00:00
George Bosilca
c9fb1f32f2 And more dependencies fixes. The big commit will follow shortly.
This commit was SVN r7319.
2005-09-12 20:22:59 +00:00
Tim Woodall
3e002203a0 don't need to adjust size
This commit was SVN r7213.
2005-09-07 13:25:05 +00:00
Brian Barrett
ed56e743b7 * update configure.ac to use the modern version of AC_INIT and
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
  number to be set at autoconf time (instead of at configure time, as
  it was before).  Set the version number, minus the subversion r number,
  at autoconf time.  Override the internal variables to include the r
  number (if needed) at configure time.  Basically, the right thing
  should always happen.  The only place it might not is the version
  reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE takes a list of options, no need to specify
  them in all the Makefile.am files.
* Adds support for subdir-objects, meaning that object files are put
  in the directory containing source files, even if the Makefile.am is
  in another directory.  This should start making it feasible to
  reduce the number of Makefile.am files we have in the tree, which
  will greatly reduce the time to run autogen and configure.

This commit was SVN r7211.
2005-09-07 05:54:53 +00:00
Tim Woodall
d34e299829 correctly decrement progress_event if tcp is not being
used so that tcp doesn't impact progress loop

This commit was SVN r7078.
2005-08-29 17:29:58 +00:00
Tim Woodall
d57f3e1662 cleanup - handle request/prepare of zero bytes as special case
This commit was SVN r7055.
2005-08-26 20:19:11 +00:00
Tim Woodall
205af3af0a correct segment address
This commit was SVN r6942.
2005-08-19 20:20:27 +00:00