openmpi

Author	SHA1	Message	Date
George Bosilca	d8dee3a740	If the MX driver was unable to load correctly, or if the endpoint was not created then don't try to call the MX endpoint close function. This commit was SVN r12950.	2007-01-02 00:01:50 +00:00
George Bosilca	e223b27268	A fragment is marked completed by the PML when the peer signals the completion of the RDMA operation associated with the fragment. The PML will call the BML free which in turn will call the BTL free. The MX BTL will not release the fragment if it is not tagged with 0xff. This commit was SVN r12947.	2006-12-31 03:17:47 +00:00
George Bosilca	47601e315e	Allow the MX BTL to select at runtime if the unexpected handler will be activated or not. This commit was SVN r12944.	2006-12-30 20:57:50 +00:00
George Bosilca	d401a65975	Minor cleanups. Don't set the fields that will never be used. This commit was SVN r12941.	2006-12-29 07:55:17 +00:00
George Bosilca	416e5b5f6a	Enable the MX extensions if and only if the mx_extensions.h header is installed on the system. This commit was SVN r12937.	2006-12-29 00:31:32 +00:00
George Bosilca	d7bc180a90	The max allocated tag is not 16. Use the define instead. This commit was SVN r12936.	2006-12-28 22:48:58 +00:00
George Bosilca	3eeecc3838	Add support for faster small messages. While sending a message, we check if the data was buffered by the MX library. If it's the case then we declare the send as completed and disable the completion event for the mx request. This commit was SVN r12935.	2006-12-28 22:34:24 +00:00
George Bosilca	b996c00d1a	Set the limits for the MX fragments to 4K. Add code to dump the state of the MX hardware (not activated). This commit was SVN r12931.	2006-12-28 08:40:37 +00:00
George Bosilca	3903009b8b	Add a check for the unexpected handler. If enabled, allow the zero-copy protocol over the MX BTL. Now, we have only one matching, the one in Open MPI. The problem is that when the unexpected handler is triggered, not all the message is on the host memory. In the best case we get one MX fragment (internal MX fragment), in the worst we get NULL. The only way to fit this with the design of the PML is to force the eager protocol at the MX internal fragment size, and to limit the send/receive protocol at the same size. Tests show the outcome is not far from optimal (if the pipeline depth is increased a little bit). Set MX_PIPELINE_LOG in order to allow MX to use internal fragments of 4K. This commit was SVN r12930.	2006-12-28 03:35:41 +00:00
George Bosilca	ff2319dcb7	Complete the OUT protocol. Small latency improvements. Some minor cleanups. Create some macros, reorder some functions. Make sure all fragments are correctly released at the end. This commit was SVN r12926.	2006-12-26 18:15:24 +00:00
George Bosilca	75a35ed7ee	Implement the PUT protocol over MX. The send/receive approach gives the best performance on a 2G Myrinet card, as it looks like pipelining the messages by 1M is faster than a simple send/receive. However, when using a 10G card the send/receive will limit the maximum bandwidth to 2.5Gbs. The reason is the scarce bus resources that have to be shared between the Myrinet hardware and the memcpy operation. The PUT protocol removes the memcpy, so we now have a true zero-copy mechanism. But there is no pipelining yet, as it looks like the RDMA pipeline somehow disappeared from the OB1 PML ... This commit was SVN r12925.	2006-12-24 22:52:46 +00:00
George Bosilca	e8bd985870	Add more output when calls to the MX library fail. Move the connection status from the proc into the endpoint. This commit was SVN r12924.	2006-12-24 22:34:48 +00:00
George Bosilca	14dc72f595	Allow the user to change the MX flags. This commit was SVN r12923.	2006-12-24 22:21:00 +00:00
George Bosilca	dbe2798638	Allow MX to handle shared memory and self communications. By default these features are disabled (btl_mx_shared_mem respectively btl_mx_self have to be set in order to activate them). This commit was SVN r12922.	2006-12-24 22:18:41 +00:00
Brian Barrett	7880353fcc	Need to close every endpoint we open, or the MX progress thread doesn't die, which can cause segfaults on shutdown. Calling mx_finalize() isn't enough to shutdown the thread, so must close endpoints as well. Refs trac:513 This commit was SVN r12908. The following Trac tickets were found above: Ticket 513 --> https://svn.open-mpi.org/trac/ompi/ticket/513	2006-12-21 18:13:22 +00:00
Gleb Natapov	484c6a2c1a	Use OPAL_ALIGN() macro to align length. Return address from mpool_alloc is now properly aligned so no need to align it once more. This commit was SVN r12899.	2006-12-19 08:34:48 +00:00
Brian Barrett	2ab65eb521	Remove some debugging output that was #if 0'ed out but shouldn't have been committed into the trunk anyway This commit was SVN r12897.	2006-12-19 02:34:41 +00:00
Brian Barrett	b448b4e47e	More heterogeneous fixes. Don't set reachability bit on a remote proc if the remote architecture differs from the local architecture and the btl doesn't support heterogeneous transport. Refs trac:587 This commit was SVN r12879. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2006-12-17 17:27:08 +00:00
Gleb Natapov	190e7a27cd	Merge with gleb-mpool branch. All RDMA components use the same mpool now (rdma). udapl/openib/vapi/gm mpools are deprecated. The rdma mpool has a parameter that allows limiting its size: mpool_rdma_rcache_size_limit (default is 0 - unlimited). This commit was SVN r12878.	2006-12-17 12:26:41 +00:00
Brian Barrett	0653dc3f24	Pad headers to eliminate heterogeneous issues. Add conversion functions for switching endianness of headers. Galen is going to add the code to use the endian stuff... This commit was SVN r12876.	2006-12-17 00:50:59 +00:00
Brad Benton	18da4c40d3	Set the QP's static rate from the associated MCA parameter, rather than just defaulting to 0. Fixes trac:675 This commit was SVN r12855. The following Trac tickets were found above: Ticket 675 --> https://svn.open-mpi.org/trac/ompi/ticket/675	2006-12-14 19:42:24 +00:00
Brian Barrett	38c2e43ac2	Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues... This commit was SVN r12852.	2006-12-14 18:20:43 +00:00
Jeff Squyres	0ca8cb35b7	Fixes trac:366 Add ability for ini files to recognize "use_eager_rdma" flag. Set the default to "no" (because we should assume that HCAs cannot support the property necessary for using RDMA for eager messages -- that the last byte of the message is guaranteed to be written to memory last -- unless proven otherwise. For example, iWARP cards apparently do not provide this guarantee), and then set all Mellanox and IBM HCAs to override the default to enable this behavior on these cards. This commit was SVN r12851. The following Trac tickets were found above: Ticket 366 --> https://svn.open-mpi.org/trac/ompi/ticket/366	2006-12-14 15:52:13 +00:00
George Bosilca	80bc0c8868	Allow the MX to survive if we are unable to connect to a peer. The PML will try to find another route. This commit was SVN r12837.	2006-12-13 01:12:07 +00:00
Brad Benton	337116d5fd	Added IBM eHCA vendor and part id info This commit was SVN r12827.	2006-12-12 14:12:39 +00:00
Jeff Squyres	e70ef98ea6	Update the help message to be a bit more specific and refer to the web FAQ. This commit was SVN r12812.	2006-12-09 15:13:03 +00:00
Patrick Geoffray	58c6f8c8e1	Copyright update. Thanks to Jeff to remind me. This commit was SVN r12803.	2006-12-07 23:55:00 +00:00
Patrick Geoffray	6e09b0c23f	lval is not defined when pval is assigned on 32 bit systems. this usually is ok on little-endian systems, as the upper 32 bits will likely be ignored, but on 32-bit big-endian systems, lval is complete junk. Use ival if 32 bit mode, lval if 64. Mixing of 32 and 64 bit architectures won't work without more changes. This commit was SVN r12802.	2006-12-07 23:34:04 +00:00
Brian Barrett	6f8b366acb	Rename liborte to libopen-rte and libopal to libopen-pal per telecon today and bug #632. Refs trac:632 This commit was SVN r12762. The following Trac tickets were found above: Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632	2006-12-05 18:27:24 +00:00
Brian Barrett	441432950f	Merge in changes from the bwb-heterogeneous temp branch (r12491 - r12714) for supporting compilers / architectures with different padding rules. This commit was SVN r12749. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r12491 r12714	2006-12-04 20:11:42 +00:00
Gleb Natapov	30ca7457b4	Some BTLs (e.g. TCP) can report put/get completion before the data actually hits the buffer on the other side. For this kind of BTL we need to send FIN through the same BTL that the PUT was performed with, so the network will handle ordering for us. If we use another BTL, the receiver can get FIN before the data hits the buffer and complete the request prematurely. We mark such problematic BTLs with the MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA is really fake, because the real one guarantees that the sender will see the completion only after the receiver's NIC confirmed that all the data was received). This commit was SVN r12732.	2006-12-03 10:12:09 +00:00
George Bosilca	658879232b	Several small improvements: - consistent error message when something fails (via BTL_ERROR macro) - decrease the number of jumps. - cleanup some parts of the code. This commit was SVN r12719.	2006-12-01 21:48:06 +00:00
Pavel Shamis	f08bc818c4	Cleaning mca_btl_openib_progress_thread from unused variables. This commit was SVN r12709.	2006-11-30 18:28:45 +00:00
George Bosilca	59cfee0cd2	Use the MX infinite timeout by default. The user can modify it using an MCA parameter. This commit was SVN r12670.	2006-11-27 20:18:58 +00:00
Brian Barrett	0895f5e08d	Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH} because they are in ORTE, not OMPI. Also, remove the ORTE_PROCESS_NAME macros in iof base as they are duplicates of the ones that were in ns_types, which meant that bad things happened if you changed what an orte_process_name_t looked like. This commit was SVN r12646.	2006-11-22 03:03:21 +00:00
Brian Barrett	33320b7165	Rework the opal_progress interface to better support dynamic processes and at the same time, remove some of the MPI-related options from OPAL: - provide mechanism to change at runtime whether sched_yield() should be called when the progress engine is idle - provide mechanism for changing the rate at which the event engine is called when there are "no" users of the event engine (ie, when using MPI but not TCP) - fix some function names in the progress engine to better match their intended use (and remove MPI naming scheme) - remove progress_mpi_enable / progress_mpi_disable because we can now use the functions to set the sched_yield and tick rate interfaces - rename opal_progress_events() to opal_progress_set_event_flag() because the first really isn't descriptive of what the function does and I always got confused by it This commit was SVN r12645.	2006-11-22 02:06:52 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
George Bosilca	139f9cf3d0	Make sure we disable the MX shared memory when we use the MX BTL. This commit was SVN r12587.	2006-11-13 22:17:06 +00:00
Gleb Natapov	9933a6f469	Previous fix doesn't fix the case when opcode is changed in put/get functions. The fix is to set opcode to SEND at the entrance to the send function before checking credits and putting fragment to the pending list. We do the same thing in put/get functions i.e setting opcode at the entrance to the function. This commit was SVN r12559.	2006-11-11 07:51:06 +00:00
Gleb Natapov	7e03b83d23	Reset opcode field to SEND. It is checked later in pending progress function. This commit was SVN r12531.	2006-11-10 06:17:00 +00:00
George Bosilca	eab1776e9a	Explicit casts for our friendly Windows environment... This commit was SVN r12496.	2006-11-08 17:02:46 +00:00
George Bosilca	3d0df2cf29	Allow the MX BTL to finish the small sends quicker. Once the mx_isend is posted if the message size is less than 4K do a check for the message completion and if any call the callback. This commit was SVN r12453.	2006-11-06 23:12:01 +00:00
Gleb Natapov	b4fd2d7d50	Fix warnings from progress thread patch. This commit was SVN r12434.	2006-11-06 12:34:56 +00:00
Pavel Shamis	566667ac61	Adding progress thread support to OpenIB BTL. Reviewed by Gleb. This commit was SVN r12411.	2006-11-02 16:15:21 +00:00
Gleb Natapov	4c784b6403	As Andrew Friedley pointed out, my previous patch may cause deadlock if mca_btl_openib_endpoint_connect_eager_rdma() is called recursively. He also noticed that orte_pointer_array_add() can't fail because we allocate max number of elements at init time. So just remove error handling and locking. No locking - no deadlocks. This commit was SVN r12388.	2006-11-01 15:53:33 +00:00
Gleb Natapov	b5714d698a	Fix compilation with GM version smaller than 2.0. Fix compilation warnings. This commit was SVN r12386.	2006-11-01 10:26:15 +00:00
Gleb Natapov	aac695a51f	eager_rdma_buffers update is not atomic. A buffer is added to the array and if something is going wrong down in the code it is removed from the array. So add mutex to prevent concurrent access to the array from different threads. This commit was SVN r12385.	2006-11-01 07:27:32 +00:00
Andrew Friedley	48c5117476	Fix some signedness warnings on threaded builds introduced by r12369 This commit was SVN r12376. The following SVN revision numbers were found above: r12369 --> open-mpi/ompi@d7375ec102	2006-10-31 17:29:25 +00:00
Gleb Natapov	d7375ec102	Fix deadlock reported by Andrew Friedley: What's happening is that we're holding openib_btl->eager_rdma_lock when we call mca_btl_openib_endpoint_send_eager_rdma() on btl_openib_endpoint.c:1227. This in turn calls mca_btl_openib_endpoint_send() on line 1179. Then, if the endpoint state isn't MCA_BTL_IB_CONNECTED or MCA_BTL_IB_FAILED, we call opal_progress(), where we eventually try to lock openib_btl->eager_rdma_lock at btl_openib_component.c:997. The fix removes this lock altogether. Instead we atomically set the local RDMA pointer to prevent other threads from creating an rdma buffer for the same endpoint. And we increment eager_rdma_buffers_count atomically, thus the polling thread doesn't need a lock around it. This commit was SVN r12369.	2006-10-31 09:54:52 +00:00
Gleb Natapov	1b152dfe09	On 64-bit platforms, if the high 32 bits of the buf address are not zero they are trimmed by a wrong bitwise AND. Fix it by expanding the mask to 64 bits. This commit was SVN r12368.	2006-10-31 07:33:35 +00:00

674 commits