75 commits

Author SHA1 Message Date
Brian Barrett
a34e67d743 Remove unneeded PARAM_INIT_FILE variable in configure.params files used by
components that use configure.m4 for configuration or are always built. 
The macro has not been needed since moving to configure types other than
configure.stub

Fixes trac:590

This commit was SVN r13031.

The following Trac tickets were found above:
  Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
2007-01-08 03:44:22 +00:00
Brian Barrett
8900d3ae43 Second take at fixing the issues with using ompi_ptr_t. Add helper functions for converting from .pval to .lval and vice-versa. Users of ompi_ptr_t types should only use one of the fields in the union unless using the helper conversion functions. For the BTLs, local pointers will always be stored in the .pval field and remote pointers always stored in the .lval field.
George wrote the initial patch, I extended it slightly and am responsible for all bugs found.

Refs trac:587

This commit was SVN r13023.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-07 01:48:57 +00:00
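The helper conversions described in 8900d3ae43 could look roughly like the sketch below. It is only an illustration under stated assumptions: `demo_ptr_t`, `demo_ptr_ptol`, and `demo_ptr_ltop` are hypothetical names, and the union layout is a guess at the idea, not the actual Open MPI definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ompi_ptr_t: a pointer and a 64-bit integer
 * sharing storage. Callers should touch only one field directly. */
typedef union {
    uint64_t lval;  /* integer form, e.g. a remote address sent on the wire */
    void    *pval;  /* pointer form, for a local address */
} demo_ptr_t;

/* Convert a local pointer to its 64-bit integer form. Going through
 * uintptr_t zero-extends on 32-bit machines, so all 8 bytes are defined. */
static inline uint64_t demo_ptr_ptol(void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}

/* Convert a 64-bit integer form back to a local pointer. */
static inline void *demo_ptr_ltop(uint64_t val)
{
    return (void *)(uintptr_t)val;
}

/* Sanity check: a pointer must survive the round trip unchanged. */
static inline void demo_ptr_check(void *ptr)
{
    assert(demo_ptr_ltop(demo_ptr_ptol(ptr)) == ptr);
}
```

Converting through explicit helpers, instead of writing one union field and reading the other, avoids depending on which half of `lval` a 32-bit `pval` overlaps.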
Brian Barrett
48ec0b2071 Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix
for now...

This commit was SVN r12997.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c
2007-01-04 22:07:37 +00:00
Brian Barrett
27cea44a9c Fix a number of issues with the ompi_ptr_t:
  * Make sure that the pval always writes to the correct portion of the
    lval.  This only matters on 32 bit big endian machines.
  * On 32 bit machines when assigning to pval, the other 4 bytes of lval
    weren't being written, which could lead to bogus data

We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes.  Refs trac:587

This commit was SVN r12974.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 19:47:48 +00:00
Brian Barrett
2ab65eb521 Remove some debugging output that was #if 0'ed out but shouldn't have been
committed into the trunk anyway

This commit was SVN r12897.
2006-12-19 02:34:41 +00:00
Brian Barrett
38c2e43ac2 Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug BTL and OOB issues...
This commit was SVN r12852.
2006-12-14 18:20:43 +00:00
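As a rough illustration of the change in 38c2e43ac2, the sketch below formats a socket error using the standard `strerror()` text next to the numeric value. The helper name `demo_format_sock_error` and the message layout are assumptions, not the code from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<what> failed: <error text> (<errno>)" into
 * buf. Returns the value of snprintf(), i.e. the untruncated length. */
static int demo_format_sock_error(char *buf, size_t len,
                                  const char *what, int err)
{
    assert(buf != NULL && len > 0 && what != NULL);
    return snprintf(buf, len, "%s failed: %s (%d)", what, strerror(err), err);
}
```

A caller would typically pass the `errno` value saved immediately after the failing `connect()`, `send()`, or `recv()` call.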
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
Brian Barrett
441432950f Merge in changes from the bwb-heterogeneous temp branch (r12491 -
r12714) for supporting compilers / architectures with different
padding rules.

This commit was SVN r12749.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r12491
  r12714
2006-12-04 20:11:42 +00:00
Gleb Natapov
30ca7457b4 Some BTLs (e.g., TCP) can report put/get completion before the data actually
hits the buffer on the other side. For this kind of BTL we need to send the
FIN through the same BTL the PUT was performed with, so that the network handles
ordering for us. If we use another BTL, the receiver can get the FIN before the
data hits the buffer and complete the request prematurely. We mark such
problematic BTLs with the MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA
is really fake, because the real one guarantees that the sender will see the
completion only after the receiver's NIC has confirmed that all the data was
received).

This commit was SVN r12732.
2006-12-03 10:12:09 +00:00
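The selection rule described in 30ca7457b4 can be sketched as follows. This is a simplified model only: `demo_btl_t`, `DEMO_BTL_FLAGS_FAKE_RDMA` (including its numeric value), and `demo_select_fin_btl` are hypothetical stand-ins, not the real BTL structures or the PML code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value standing in for MCA_BTL_FLAGS_FAKE_RDMA. */
#define DEMO_BTL_FLAGS_FAKE_RDMA 0x1u

/* Hypothetical, heavily reduced BTL descriptor. */
typedef struct {
    const char *name;
    unsigned    flags;
} demo_btl_t;

/* Choose the BTL that carries the FIN after a PUT went through put_btl.
 * If put_btl may report completion before the data has landed, the FIN
 * must follow the data on the same BTL so ordering is preserved.
 * Otherwise any BTL (here: the caller's preferred one) is acceptable. */
static const demo_btl_t *demo_select_fin_btl(const demo_btl_t *put_btl,
                                             const demo_btl_t *preferred)
{
    assert(put_btl != NULL && preferred != NULL);
    if (put_btl->flags & DEMO_BTL_FLAGS_FAKE_RDMA) {
        return put_btl;
    }
    return preferred;
}
```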
George Bosilca
658879232b Several small improvements:
- consistent error message when something fails (via BTL_ERROR macro)
- decrease the number of jumps.
- cleanup some parts of the code.

This commit was SVN r12719.
2006-12-01 21:48:06 +00:00
Brian Barrett
0895f5e08d Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH}
because they are in ORTE, not OMPI.  Also, remove the ORTE_PROCESS_NAME macros
in iof base as they are duplicates of the ones that were in ns_types, which 
meant that bad things happened if you changed what an orte_process_name_t
looked like.

This commit was SVN r12646.
2006-11-22 03:03:21 +00:00
Brian Barrett
33320b7165 Rework the opal_progress interface to better support dynamic processes and at
the same time, remove some of the MPI-related options from OPAL:

  - provide mechanism to change at runtime whether sched_yield() should 
    be called when the progress engine is idle
  - provide mechanism for changing the rate at which the event engine
    is called when there are "no" users of the event engine (ie, when
    using MPI but not TCP)
  - fix some function names in the progress engine to better match
    their intended use (and remove MPI naming scheme)
  - remove progress_mpi_enable / progress_mpi_disable because 
    we can now use the functions to set the sched_yield and
    tick rate interfaces
  - rename opal_progress_events() to opal_progress_set_event_flag()
    because the first really isn't descriptive of what the function
    does and I always got confused by it

This commit was SVN r12645.
2006-11-22 02:06:52 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
George Bosilca
126a68dc9a Big datatype commit. Remove all unused features of the datatype engine. As the memory
allocation logic is completely done outside the data-type engine (in the PML), there is
no need for any special case inside the data-type engine. There are fewer arguments for
ompi_convertor_pack and ompi_convertor_unpack as well (the last field, free_after, is
no longer required as there is no memory allocated in the engine itself). This change
affects all components using datatypes. I tested most of them, but I might have
missed some ... If that's the case please let me know (don't shoot the pianist!!).

This commit was SVN r12331.
2006-10-26 23:11:26 +00:00
George Bosilca
640178c4b3 Grepping through the source files I found these calls to the data-type engine
with the wrong type of arguments.

This commit was SVN r12148.
2006-10-17 21:05:04 +00:00
George Bosilca
a3ad4a7fc8 The visibility flags (and/or Windows-friendly export) are now on for all BTLs.
This commit was SVN r11662.
2006-09-14 22:19:39 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
George Bosilca
3f0a7cad9e The last patch for Windows support. Mostly casting and conversion to C++ friendly headers.
This commit was SVN r11400.
2006-08-24 16:38:08 +00:00
Galen Shipman
e5c594c211 More updates for the async error handler for BTLs.
In order to provide backwards compatibility, the framework versions are bumped
and the handler registration function is at the end of the btl struct.
Testing done on sm, openib, and gm.

This commit was SVN r11256.
2006-08-17 22:02:01 +00:00
Galen Shipman
3b49953ce2 Add error callback to the btl interface; this allows errors to be delivered to
the upper layer asynchronously, although there are some issues with this, such
as there being multiple consumers of the BTLs.. who gets the

This commit was SVN r11232.
2006-08-16 20:21:38 +00:00
Ralph Castain
d2912f03e0 Cleanup a historical naming convention problem. Move the socket_errno definitions to the OPAL layer and change the name accordingly. This cleans up some interrelationship issues as well as removing a name confusion.
This commit was SVN r11186.
2006-08-14 20:14:44 +00:00
George Bosilca
238147f576 Help the compiler to optimize the code. Now the order in the enum reflects the
order we use them in the switch.

This commit was SVN r10565.
2006-06-29 15:10:58 +00:00
Galen Shipman
218a438509 finished the ompi_free_list_t class nightmare..
This commit was SVN r10314.
2006-06-12 22:09:03 +00:00
Galen Shipman
38a0561d9b Allow maximum send size to be less than the eager limit.
Instead of figuring out which free list the fragment belongs to based on size
we simply store a pointer to the list which it belongs in the fragment.

This was reviewed by Brian and should hit all the branches.

This commit was SVN r10072.
2006-05-25 16:57:14 +00:00
George Bosilca
085cac552f Don't let TCP create local connections; we have the self BTL for this purpose.
This commit was SVN r10018.
2006-05-23 03:06:32 +00:00
Jeff Squyres
7b59847765 Ensure that endpoint->endpoint_addr is not NULL before trying to
dereference through it.  It is legal for endpoint_addr to be NULL in the
destructor because if btl_tcp_add_procs() -> btl_tcp_proc_insert()
returns UNREACH, then endpoint_addr will be NULL and we'll OBJ_RELEASE
it.

This commit was SVN r9940.
2006-05-16 19:01:08 +00:00
Tim Woodall
712468dbef add diagnostic interface
This commit was SVN r9328.
2006-03-17 17:39:41 +00:00
Brian Barrett
3e2c51dea8 * fix some silly comments left by a previous developer that are good for
a laugh but probably not good for usability ;)

This commit was SVN r9253.
2006-03-11 03:09:24 +00:00
Brian Barrett
9b19e3fef0 * remove some debugging output that shouldn't have been committed. Doh!
This commit was SVN r9171.
2006-02-27 16:23:52 +00:00
Brian Barrett
285581dff2 More endian-related cleanups:
  - moved hton64 and ntoh64 from the bunch of places they had been copied
    into one header file
  - properly set and use the btl_tcp's nbo option to put things in
    network byte order on the wire if both sides don't have the same
    endianness
  - Put the OB1 PML's headers (with a couple exceptions I need to discuss
    with Tim) in network byte order on the wire if both sides don't have
    the same endianness
  - since it was needed for the TCP BTL, move the orte_process_name_t
    HTON and NTOH macros from the TCP OOB to ns_types.h

This commit was SVN r9145.
2006-02-26 00:45:54 +00:00
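For context on the hton64/ntoh64 helpers mentioned in 285581dff2, here is one common way to build a 64-bit byte-order conversion on top of the POSIX `htonl()` call. The names `demo_hton64` and `demo_ntoh64` are hypothetical, and this is a sketch of the general technique, not the header from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Convert a 64-bit value from host to network (big-endian) byte order. */
static inline uint64_t demo_hton64(uint64_t val)
{
    /* htonl(1) == 1 only on big-endian hosts, where no swap is needed. */
    if (htonl(1) == 1) {
        return val;
    }
    uint32_t hi = (uint32_t)(val >> 32);
    uint32_t lo = (uint32_t)(val & 0xffffffffu);
    /* Swap the bytes inside each half, then swap the two halves. */
    return ((uint64_t)htonl(lo) << 32) | htonl(hi);
}

/* The conversion is its own inverse, so ntoh64 reuses it. */
static inline uint64_t demo_ntoh64(uint64_t val)
{
    return demo_hton64(val);
}

/* Sanity check: converting there and back must give the original value. */
static inline void demo_hton64_check(uint64_t val)
{
    assert(demo_ntoh64(demo_hton64(val)) == val);
}
```

Keeping a single copy of such a helper in one header, as the commit message describes, means every component agrees on the wire format.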
Jeff Squyres
628125599d Fix the TCP btl module endpoint matching during setup for the scenario
when running an MPI job spanning a node that has two TCP NICs and a
node that has one TCP NIC.  Previously, for the 2 NIC/module process,
we would return the first peer IP address if we couldn't find a subnet
match with any of the peer's published IP addresses -- this was to
support running OMPI across subnet boundaries.  Changed the behavior
to only do that if the IP address we're trying to match is
public (i.e., not 10.x.y.z, 192.168.x.y, or 172.16.x.y) *and* any of
the remote peer's addresses are public (working on the assumption that
if we both have public addresses, they're routable to each other).

This definitely will not work in all scenarios, such as when we go to
WAN kinds of executions, and will need to be revisited at that time.

This commit was SVN r9119.
2006-02-23 02:02:19 +00:00
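The public/private test described in 628125599d can be sketched as below. The function names are hypothetical, and the check follows the three patterns named in the message literally (10.x.y.z, 192.168.x.y, 172.16.x.y). Note that RFC 1918 reserves the wider range 172.16.0.0/12, so a production check would likely differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* True if the dotted-quad string parses and matches one of the three
 * private patterns named in the commit message. */
static bool demo_is_private_ipv4(const char *dotted)
{
    struct in_addr addr;
    assert(dotted != NULL);
    if (inet_pton(AF_INET, dotted, &addr) != 1) {
        return false;
    }
    uint32_t ip = ntohl(addr.s_addr);
    if ((ip >> 24) == 10u) return true;                    /* 10.x.y.z    */
    if ((ip >> 16) == ((192u << 8) | 168u)) return true;   /* 192.168.x.y */
    if ((ip >> 16) == ((172u << 8) | 16u)) return true;    /* 172.16.x.y  */
    return false;
}

/* True only if the string is a valid IPv4 address that is not private. */
static bool demo_is_public_ipv4(const char *dotted)
{
    struct in_addr addr;
    return inet_pton(AF_INET, dotted, &addr) == 1
        && !demo_is_private_ipv4(dotted);
}

/* Fallback rule from the message: use the first peer address without a
 * subnet match only if our address is public and at least one of the
 * peer's addresses is public too. */
static bool demo_allow_fallback(const char *local,
                                const char *const *peers, size_t npeers)
{
    if (!demo_is_public_ipv4(local)) {
        return false;
    }
    for (size_t i = 0; i < npeers; i++) {
        if (demo_is_public_ipv4(peers[i])) {
            return true;
        }
    }
    return false;
}
```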
Galen Shipman
e58b758031 standardize behavior of btl_alloc, if the size is larger than the max send
size, btl_alloc returns NULL. 

This commit was SVN r9114.
2006-02-22 17:37:59 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
  - move files out of toplevel include/ and etc/, moving them into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h,
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
George Bosilca
9f1357fb89 Remove all the useless includes. Most of the endpoints do not depend on the
orte includes.

This commit was SVN r8932.
2006-02-08 05:10:48 +00:00
Galen Shipman
c8045bf397 Fixup for ORTE datatype checkin,
- use appropriate header files 
- change calls from orte_dps to orte_dss 

This commit was SVN r8920.
2006-02-07 15:20:44 +00:00
Ralph Castain
4b9f015c0b Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list.
This commit was SVN r8912.
2006-02-07 03:32:36 +00:00
George Bosilca
d4699037f7 Protect an assert if the endpoint cache is not activated.
This commit was SVN r8695.
2006-01-14 21:10:09 +00:00
George Bosilca
3317bf81ad A better implementation for the TCP endpoint cache + few comments.
This commit was SVN r8692.
2006-01-14 20:21:44 +00:00
George Bosilca
1b667067d6 I need to know the number of iovec attached to the fragment.
This commit was SVN r8447.
2005-12-10 23:28:16 +00:00
George Bosilca
01b0db91ae Get the lower-bound from the data not from the convertor.
This commit was SVN r8444.
2005-12-10 22:38:25 +00:00
George Bosilca
7baae4f394 Protect the headers and remove the unused ones.
This commit was SVN r8439.
2005-12-10 22:04:28 +00:00
Tim Woodall
1929a97d2f corrections for MPI_BOTTOM
This commit was SVN r8429.
2005-12-09 23:27:55 +00:00
George Bosilca
8888bfb063 And the thread-safe version. The lock/unlock macros are supposed to be
empty for non threaded builds, but somehow just by moving the code a
little bit around and removing 2 calls to lock/unlock the latency for TCP
went down by 2 micro-seconds ...

This commit was SVN r8426.
2005-12-09 05:16:50 +00:00
George Bosilca
5851b55647 Improve the latency for small and medium messages. The idea is to decrease the
number of recv system calls by caching the data. Each endpoint has a buffer
(the size is an MCA parameter) that can be used as a cache. Before each receive
operation this buffer is added at the end of the iovec list. All data that is
not expected by the fragment will go into this cache. If the cache contains data,
all subsequent receives will just memcpy the data into the BTL buffers.

The only drawback is that we will spin around the receive_handle until all the
cached data has been read by the PML layer. This limitation comes from the fact that
the event library is unable to call us if there are no events on the socket.
Therefore we are unable to keep the data in the cache until the next loop
into the progress engine.

This commit was SVN r8398.
2005-12-07 00:12:59 +00:00
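The caching scheme described in 5851b55647 can be illustrated with POSIX `readv()`: the cache buffer is appended as the last iovec entry, so one system call can fill the requested buffer and capture any surplus bytes. The sketch below is a simplified model, not the BTL code. `demo_endpoint_t`, `demo_endpoint_recv`, and the fixed `DEMO_CACHE_SIZE` (an MCA parameter in the commit) are assumptions.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_CACHE_SIZE 4096  /* fixed here; configurable in the commit */

/* Hypothetical, reduced endpoint: a descriptor plus its receive cache. */
typedef struct {
    int    sd;
    char   cache[DEMO_CACHE_SIZE];
    size_t cache_pos;  /* offset of the next unread cached byte */
    size_t cache_len;  /* number of valid bytes in the cache */
} demo_endpoint_t;

/* Receive up to `want` bytes into `dst`. Cached bytes are served first
 * with memcpy and no system call. Otherwise a single readv() fills dst
 * and lets surplus data spill into the cache. Returns the number of
 * bytes stored in dst, 0 on end of stream, or -1 on error. */
static ssize_t demo_endpoint_recv(demo_endpoint_t *ep, void *dst, size_t want)
{
    assert(ep != NULL && dst != NULL);

    size_t avail = ep->cache_len - ep->cache_pos;
    if (avail > 0) {
        size_t n = avail < want ? avail : want;
        memcpy(dst, ep->cache + ep->cache_pos, n);
        ep->cache_pos += n;
        if (ep->cache_pos == ep->cache_len) {
            ep->cache_pos = ep->cache_len = 0;  /* cache drained */
        }
        return (ssize_t)n;
    }

    struct iovec iov[2];
    iov[0].iov_base = dst;
    iov[0].iov_len  = want;
    iov[1].iov_base = ep->cache;          /* cache goes last in the list */
    iov[1].iov_len  = sizeof(ep->cache);

    ssize_t got = readv(ep->sd, iov, 2);
    if (got < 0) {
        return -1;
    }
    if ((size_t)got > want) {
        ep->cache_len = (size_t)got - want;  /* surplus landed in cache */
        ep->cache_pos = 0;
        return (ssize_t)want;
    }
    return got;
}
```

When a header and its payload arrive together, the first call returns the header and the second call is satisfied from the cache without touching the socket.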
Tim Woodall
5db38b38f5 corrections for latency issue
- don't do additional select until non-blocking read fails 
- don't do an additional read for 0 byte message

This commit was SVN r8312.
2005-11-29 17:33:01 +00:00
George Bosilca
b9a739e2b6 Remove 2 useless assignments (they are done at the end before the return).
This commit was SVN r8260.
2005-11-26 21:16:30 +00:00
Galen Shipman
5cf2d8d40c default to first available IP address if no matching subnets found..
This commit was SVN r8125.
2005-11-12 00:31:34 +00:00
Tim Woodall
62fd74140b decrease socket buffer sizes to same as ptl code
This commit was SVN r8072.
2005-11-10 00:40:55 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Tim Woodall
13409ec53b correction for hang, check for additional fragments before callback,
which may queue a new fragment

This commit was SVN r7889.
2005-10-27 01:39:39 +00:00