openmpi

Author	SHA1	Message	Date
Brian Barrett	a34e67d743	Remove unneeded PARAM_INIT_FILE variable in configure.params files used by components that use configure.m4 for configuration or are always built. The macro has not been needed since the move to configure types other than configure.stub. Fixes trac:590 This commit was SVN r13031. The following Trac tickets were found above: Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590	2007-01-08 03:44:22 +00:00
Brian Barrett	8900d3ae43	Second take at fixing the issues with using ompi_ptr_t. Add helper functions for converting from .pval to .lval and vice-versa. Users of ompi_ptr_t types should only use one of the fields in the union unless using the helper conversion functions. For the BTLs, local pointers will always be stored in the .pval field and remote pointers always stored in the .lval field. George wrote the initial patch, I extended it slightly and am responsible for all bugs found. Refs trac:587 This commit was SVN r13023. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-07 01:48:57 +00:00
Brian Barrett	48ec0b2071	Revert r12974, r12976, and r12991, as George has provided a less intrusive fix for now... This commit was SVN r12997. The following SVN revision numbers were found above: r12974 --> open-mpi/ompi@27cea44a9c	2007-01-04 22:07:37 +00:00
Brian Barrett	27cea44a9c	Fix a number of issues with the ompi_ptr_t: * Make sure that the pval always writes to the correct portion of the lval. This only matters on 32-bit big-endian machines. * On 32-bit machines, when assigning to pval, the other 4 bytes of lval weren't being written, which could lead to bogus data. We use macros so that there aren't casts all over the code and the pval assignment can occur to the correct 4 bytes. Refs trac:587 This commit was SVN r12974. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-03 19:47:48 +00:00
Brian Barrett	2ab65eb521	Remove some debugging output that was #if 0'ed out but shouldn't have been committed into the trunk anyway This commit was SVN r12897.	2006-12-19 02:34:41 +00:00
Brian Barrett	38c2e43ac2	Print out the error string rather than errno for TCP-related errors, making it easier for both the user and us to debug BTL and OOB issues... This commit was SVN r12852.	2006-12-14 18:20:43 +00:00
Brian Barrett	6f8b366acb	Rename liborte to libopen-rte and libopal to libopen-pal per telecon today and bug #632. Refs trac:632 This commit was SVN r12762. The following Trac tickets were found above: Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632	2006-12-05 18:27:24 +00:00
Brian Barrett	441432950f	Merge in changes from the bwb-heterogeneous temp branch (r12491 - r12714) for supporting compilers / architectures with different padding rules. This commit was SVN r12749. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r12491 r12714	2006-12-04 20:11:42 +00:00
Gleb Natapov	30ca7457b4	Some BTLs (e.g., TCP) can report put/get completion before the data actually hits the buffer on the other side. For this kind of BTL we need to send the FIN through the same BTL the PUT was performed with, so that the network handles ordering for us. If we used another BTL, the receiver could get the FIN before the data hits the buffer and complete the request prematurely. We mark such problematic BTLs with the MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA is really fake, because real RDMA guarantees that the sender sees the completion only after the receiver's NIC has confirmed that all the data was received). This commit was SVN r12732.	2006-12-03 10:12:09 +00:00
George Bosilca	658879232b	Several small improvements: - consistent error message when something fails (via BTL_ERROR macro) - decrease the number of jumps. - cleanup some parts of the code. This commit was SVN r12719.	2006-12-01 21:48:06 +00:00
Brian Barrett	0895f5e08d	Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH} because they are in ORTE, not OMPI. Also, remove the ORTE_PROCESS_NAME macros in iof base as they are duplicates of the ones that were in ns_types, which meant that bad things happened if you changed what an orte_process_name_t looked like. This commit was SVN r12646.	2006-11-22 03:03:21 +00:00
Brian Barrett	33320b7165	Rework the opal_progress interface to better support dynamic processes and at the same time, remove some of the MPI-related options from OPAL: - provide mechanism to change at runtime whether sched_yield() should be called when the progress engine is idle - provide mechanism for changing the rate at which the event engine is called when there are "no" users of the event engine (i.e., when using MPI but not TCP) - fix some function names in the progress engine to better match their intended use (and remove MPI naming scheme) - remove progress_mpi_enable / progress_mpi_disable because we can now use the functions to set the sched_yield and tick rate interfaces - rename opal_progress_events() to opal_progress_set_event_flag() because the first really isn't descriptive of what the function does and I always got confused by it This commit was SVN r12645.	2006-11-22 02:06:52 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
George Bosilca	126a68dc9a	Big datatype commit. Remove all unused features of the datatype engine. As the memory allocation logic is done entirely outside the datatype engine (in the PML), there is no need for any special case inside the datatype engine. There are also fewer arguments for ompi_convertor_pack and ompi_convertor_unpack (the last field, free_after, is not required anymore, as there is no memory allocated in the engine itself). This change affects all components using datatypes. I tested most of them, but I may have missed some... If so, please let me know (don't shoot the pianist!!). This commit was SVN r12331.	2006-10-26 23:11:26 +00:00
George Bosilca	640178c4b3	Grepping through the source files I found these calls to the data-type engine with the wrong type of arguments. This commit was SVN r12148.	2006-10-17 21:05:04 +00:00
George Bosilca	a3ad4a7fc8	The visibility flags (and/or Windows-friendly export) are now on for all BTLs. This commit was SVN r11662.	2006-09-14 22:19:39 +00:00
Ralph Castain	37dfdb76eb	Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done. This commit was SVN r11661.	2006-09-14 21:29:51 +00:00
George Bosilca	3f0a7cad9e	The last patch for Windows support. Mostly casting and conversion to C++ friendly headers. This commit was SVN r11400.	2006-08-24 16:38:08 +00:00
Galen Shipman	e5c594c211	More updates for the async error handler for BTLs. In order to provide backwards compatibility, the framework versions are bumped and the handler registration function is at the end of the BTL struct. Testing done on sm, openib, and gm. This commit was SVN r11256.	2006-08-17 22:02:01 +00:00
Galen Shipman	3b49953ce2	Add an error callback to the BTL interface; this allows errors to be delivered to the upper layer asynchronously, although there are some issues with this, such as there being multiple consumers of the BTLs: who gets the ... This commit was SVN r11232.	2006-08-16 20:21:38 +00:00
Ralph Castain	d2912f03e0	Cleanup a historical naming convention problem. Move the socket_errno definitions to the OPAL layer and change the name accordingly. This cleans up some interrelationship issues as well as removing a name confusion. This commit was SVN r11186.	2006-08-14 20:14:44 +00:00
George Bosilca	238147f576	Help the compiler optimize the code. Now the order in the enum reflects the order in which the values are used in the switch. This commit was SVN r10565.	2006-06-29 15:10:58 +00:00
Galen Shipman	218a438509	finished the ompi_free_list_t class nightmare.. This commit was SVN r10314.	2006-06-12 22:09:03 +00:00
Galen Shipman	38a0561d9b	Allow the maximum send size to be less than the eager limit. Instead of figuring out which free list the fragment belongs to based on its size, we simply store a pointer to that list in the fragment. This was reviewed by Brian and should hit all the branches. This commit was SVN r10072.	2006-05-25 16:57:14 +00:00
George Bosilca	085cac552f	Don't let TCP create local connections; we have the self BTL for this purpose. This commit was SVN r10018.	2006-05-23 03:06:32 +00:00
Jeff Squyres	7b59847765	Ensure that endpoint->endpoint_addr is not NULL before trying to dereference it. It is legal for endpoint_addr to be NULL in the destructor: if btl_tcp_add_procs() -> btl_tcp_proc_insert() returns UNREACH, then endpoint_addr will be NULL and we'll OBJ_RELEASE it. This commit was SVN r9940.	2006-05-16 19:01:08 +00:00
Tim Woodall	712468dbef	add diagnostic interface This commit was SVN r9328.	2006-03-17 17:39:41 +00:00
Brian Barrett	3e2c51dea8	* fix some silly comments left by a previous developer that are good for a laugh but probably not good for usability ;) This commit was SVN r9253.	2006-03-11 03:09:24 +00:00
Brian Barrett	9b19e3fef0	* remove some debugging output that shouldn't have been committed. Doh! This commit was SVN r9171.	2006-02-27 16:23:52 +00:00
Brian Barrett	285581dff2	More endian-related cleanups: - consolidated hton64 and ntoh64, which had been copied into a bunch of places, into one header file - properly set and use the btl_tcp's nbo option to put things in network byte order on the wire if both sides don't have the same endianness - Put the OB1 PML's headers (with a couple exceptions I need to discuss with Tim) in network byte order on the wire if both sides don't have the same endianness - since it was needed for the TCP BTL, move the orte_process_name_t HTON and NTOH macros from the TCP OOB to ns_types.h This commit was SVN r9145.	2006-02-26 00:45:54 +00:00
Jeff Squyres	628125599d	Fix the TCP BTL module endpoint matching during setup for the scenario of an MPI job spanning a node that has two TCP NICs and a node that has one TCP NIC. Previously, for the 2-NIC/module process, we would return the first peer IP address if we couldn't find a subnet match with any of the peer's published IP addresses; this was to support running OMPI across subnet boundaries. Changed this so that the fallback happens only if the IP address we're trying to match is public (i.e., not 10.x.y.z, 192.168.x.y, or 172.16.x.y) and any of the remote peer's addresses are public (working on the assumption that if we both have public addresses, they're routable to each other). This definitely will not work in all scenarios, such as WAN kinds of executions, and will need to be revisited at that time. This commit was SVN r9119.	2006-02-23 02:02:19 +00:00
Galen Shipman	e58b758031	Standardize the behavior of btl_alloc: if the size is larger than the max send size, btl_alloc returns NULL. This commit was SVN r9114.	2006-02-22 17:37:59 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code rearranging - move files out of the top-level include/ and etc/ directories and into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h, mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
George Bosilca	9f1357fb89	Remove all the useless includes. Most of the endpoints do not depend on the orte includes. This commit was SVN r8932.	2006-02-08 05:10:48 +00:00
Galen Shipman	c8045bf397	Fixup for ORTE datatype checkin, - use appropriate header files - change calls from orte_dps to orte_dss This commit was SVN r8920.	2006-02-07 15:20:44 +00:00
Ralph Castain	4b9f015c0b	Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list. This commit was SVN r8912.	2006-02-07 03:32:36 +00:00
George Bosilca	d4699037f7	Protect an assert if the endpoint cache is not activated. This commit was SVN r8695.	2006-01-14 21:10:09 +00:00
George Bosilca	3317bf81ad	A better implementation of the TCP endpoint cache, plus a few comments. This commit was SVN r8692.	2006-01-14 20:21:44 +00:00
George Bosilca	1b667067d6	I need to know the number of iovecs attached to the fragment. This commit was SVN r8447.	2005-12-10 23:28:16 +00:00
George Bosilca	01b0db91ae	Get the lower-bound from the data not from the convertor. This commit was SVN r8444.	2005-12-10 22:38:25 +00:00
George Bosilca	7baae4f394	Protect the headers and remove the unused ones. This commit was SVN r8439.	2005-12-10 22:04:28 +00:00
Tim Woodall	1929a97d2f	corrections for MPI_BOTTOM This commit was SVN r8429.	2005-12-09 23:27:55 +00:00
George Bosilca	8888bfb063	And the thread-safe version. The lock/unlock macros are supposed to be empty for non-threaded builds, but somehow, just by moving the code around a little and removing 2 calls to lock/unlock, the latency for TCP went down by 2 microseconds... This commit was SVN r8426.	2005-12-09 05:16:50 +00:00
George Bosilca	5851b55647	Improve the latency for small and medium messages. The idea is to decrease the number of recv system calls by caching the data. Each endpoint has a buffer (the size is an MCA parameter) that can be used as a cache. Before each receive operation, this buffer is added at the end of the iovec list. All data that is not expected by the fragment goes into this cache. If the cache contains data, all subsequent receives will just memcpy the data into the BTL buffers. The only drawback is that we will spin around the receive_handle until all the cached data has been read by the PML layer. This limitation comes from the fact that the event library is unable to call us if there are no events on the socket. Therefore we are unable to keep the data in the cache until the next loop through the progress engine. This commit was SVN r8398.	2005-12-07 00:12:59 +00:00
Tim Woodall	5db38b38f5	corrections for latency issue - don't do additional select until non-blocking read fails - don't do an additional read for 0 byte message This commit was SVN r8312.	2005-11-29 17:33:01 +00:00
George Bosilca	b9a739e2b6	Remove 2 useless assignments (they are done at the end before the return). This commit was SVN r8260.	2005-11-26 21:16:30 +00:00
Galen Shipman	5cf2d8d40c	default to first available IP address if no matching subnets found.. This commit was SVN r8125.	2005-11-12 00:31:34 +00:00
Tim Woodall	62fd74140b	decrease socket buffer sizes to the same as the ptl code This commit was SVN r8072.	2005-11-10 00:40:55 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Tim Woodall	13409ec53b	correction for hang, check for additional fragments before callback, which may queue a new fragment This commit was SVN r7889.	2005-10-27 01:39:39 +00:00

75 Commits