1
1

1960 Коммитов

Автор SHA1 Сообщение Дата
Gleb Natapov
fa69c5cc10 If a memory on a sender's size is not registered don't register it on a receive
side too. Otherwise a content of the recvreq->req_rdma array is replaced later
without freeing previous content and refcount on registration in mpool become
wrong.

This commit was SVN r15978.
2007-08-28 07:43:06 +00:00
Rich Graham
bc97d22182 remove tabs. Remove old code that was commented out.
This commit was SVN r15975.
2007-08-28 03:08:36 +00:00
Rich Graham
4d58f9aed7 Add comments. Move temporary receive object from a free list object to
a stack object.

This commit was SVN r15971.
2007-08-27 21:41:04 +00:00
Gleb Natapov
e1a1d9d90e Receive request converter can be accessed in parallel by a thread that receives
data and a thread that run RDMA schedule function. Protect access to the
converter by a lock.

This commit was SVN r15967.
2007-08-27 11:41:42 +00:00
Gleb Natapov
33196d972b post_send() function is called without endpoint lock held from explicit credits
update function so eager_rdma_remote.head have to be updated in a thread safe
manner.

This commit was SVN r15966.
2007-08-27 11:37:01 +00:00
Gleb Natapov
32a61c3bf2 Credit fragment is not protected properly from concurrent access. There is a
race that can prevent further explicit credits update from been sent. Fix the
race.

This commit was SVN r15965.
2007-08-27 11:34:59 +00:00
Gleb Natapov
065d04dfde Do not free recvreq while schedule function is running in another thread.
This commit was SVN r15964.
2007-08-27 11:31:40 +00:00
Brad Benton
ccda5c9c74 Modified the MCA_BTL_TCP_CONNECTED case in mca_btl_tcp_endpoint_send_handler()
to always first check for a NULL frag pointer before trying to send the
fragment.  This avoids an issue in multi-threaded execution in which 
multiple threads working on the same endpoint can result in a thread 
finding itself here with nothing to send.

This commit was SVN r15963.
2007-08-26 23:40:02 +00:00
Edgar Gabriel
a2f5cada1a convert the hiearch component to the new structure. More testing required before we remove the .ompi_ignore flag again.
This commit was SVN r15954.
2007-08-23 20:41:29 +00:00
Rainer Keller
1b5fa48a29 - Add missing PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q when matching
from the posted generic_recv-queue.
 - Move the PERUSE_COMM_MSG_MATCH_POSTED_REQ from
   MCA_PML_OB1_RECV_REQUEST_MATCHED to
   mca_pml_ob1_recv_frag_match() as suggested by Terry Dontje
   Only post, if this is not a probe/iprobe request.
 - Do not post PERUSE_COMM_REQ_MATCH_UNEX for probes / iprobes and
   do in correct order before PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q

This commit was SVN r15947.
2007-08-23 07:09:43 +00:00
Rainer Keller
c175801f98 - Initialize in the order of mca_pml_ob1_comm_proc_t...
This commit was SVN r15946.
2007-08-23 05:56:22 +00:00
Rainer Keller
b0df55d53b - For MPI_Probe/MPI_Iprobe, we should not have a
PERUSE_COMM_REQ_ACTIVATE event.
   Therefore move the PERUSE_TRACE_COMM_EVENT for this event from
   MCA_PML_BASE_SEND_REQUEST_INIT / MCA_PML_BASE_RECV_REQUEST_INIT
   to the proper places into pml_ob1_isend.c / pml_ob1_irecv.c right
   after the MCA_PML_OB1_SEND_REQUEST_INIT /
   MCA_PML_OB1_RECV_REQUEST_INIT.

This commit was SVN r15945.
2007-08-23 05:52:33 +00:00
Gleb Natapov
becf4aa9c9 ompi_pointer_array_get_size doesn't return how much elements are actually in an
array, so count them by ourselves.

This commit was SVN r15943.
2007-08-22 09:31:12 +00:00
Shiqing Fan
a497a3fcad - Fix some small bugs, copy-paste mistakes.
This commit was SVN r15941.
2007-08-21 19:57:28 +00:00
Sven Stork
3985a35c35 - export required symbol
This commit was SVN r15939.
2007-08-21 18:46:11 +00:00
Gleb Natapov
d8f3063895 Create only one CQ for all BTLs on the same HCA. Many BTLs can be created for
one HCA. Multiple ports, LMC, multiple BTLs per one LID. Having only one CQ for
all of them substantially reduce polling time.

This commit was SVN r15933.
2007-08-20 12:28:25 +00:00
Gleb Natapov
5596aa5f53 The sizes of mca_pml_ob1_send_request_t and mca_pml_ob1_recv_request_t depend
on a parameter and are determined in runtime. r15346 removed calculation of
correct sizes for this structures. This patch adds it back and fixes trac:1116, #1114.

This commit was SVN r15932.

The following SVN revision numbers were found above:
  r15346 --> open-mpi/ompi@433f8a7694

The following Trac tickets were found above:
  Ticket 1116 --> https://svn.open-mpi.org/trac/ompi/ticket/1116
2007-08-20 12:06:27 +00:00
George Bosilca
c7e0ab93ae Don't forget to include string.h for the strcmp function.
This commit was SVN r15927.
2007-08-19 19:59:15 +00:00
Brian Barrett
af4e86c25f Update collectives selection logic to allow for multiple components to be
used at nce (up to one unique collective module per collective function).
Matches r15795:15921 of the tmp/bwb-coll-select branch

This commit was SVN r15924.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15795
  r15921
2007-08-19 03:37:49 +00:00
Brian Barrett
2b8af283de Add ability to completely turn off MPI one-sided support, so that users
can experiment with using ROMIO directly.

This commit was SVN r15922.
2007-08-18 21:35:51 +00:00
Josh Hursey
729c63cf9d Fix invalid MCA 'base' names so they appear in ompi_info.
A subset of this patch needs to be applied to v1.2

Refs trac:928

This commit was SVN r15918.

The following Trac tickets were found above:
  Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928
2007-08-18 03:05:45 +00:00
Brian Barrett
3b98b5f0a1 The reference implementation of Portals (which runs over TCP on Linux) is
only static libraries.  Previously, we were linking the libraries into 
directly into the common, btl, and mtl code.  This seemed to work fine
for me on my Opteron Fedora box, but caused Lisa some issues (PtlNIInit
would succeed, but the network handle would fail when used with
PtlEQAlloc).

Instead, link the portals libraries directly into libmpi and not at
all into the common, btl, or mtl components.  THen use some linker
tricks to force the linker to bring in the public interface for the
reference implementation (which thankfully is pretty small).

This commit was SVN r15902.
2007-08-17 03:56:49 +00:00
Brad Benton
c254645383 Fixes trac:1134.
Fixed a condition test while checking that all segments are empty.
Without this fix, a NULL segment pointer could make it past the
test, resulting in a SegV when dereferenced.

This commit was SVN r15891.

The following Trac tickets were found above:
  Ticket 1134 --> https://svn.open-mpi.org/trac/ompi/ticket/1134
2007-08-16 19:39:52 +00:00
Brad Benton
1ddba9ec65 Lock the endpoint before doing endpoint_state processing. This ensures
that the subsequent unlock is valid.

This commit was SVN r15890.
2007-08-16 18:11:29 +00:00
Aurelien Bouteiller
3a83c61c40 Fixed a bug with available space in sender based.
This commit was SVN r15889.
2007-08-16 17:54:26 +00:00
Tim Prins
5a795128af Change it so that different components in orte use unique rml tags
This commit was SVN r15881.
2007-08-16 14:02:35 +00:00
Aurelien Bouteiller
77565d60d9 Heavy modification of the pml_v framework.
* Code cleanup and rationalization
* Fixed: mca_pml_base_send/recv_request are now allocated before recreation by the PML-V
* Fixed: pointer arithmetic bug in sender based that crashed 
* Changed: directory structure. This is one step forward using autogen.sh to build static-components.h (it needs to have the directory structure of a mca framework for this). 

This commit was SVN r15878.
2007-08-16 05:52:30 +00:00
Aurelien Bouteiller
ee708d702d Slight modification to register the name of the selected pml (from the pml framework) instead of the generic mca name. This might be a different name when enabling FT features. This name modification in the modex allows the PMLS to detect a FT protocol mismatch among hosts.
This commit was SVN r15877.
2007-08-16 05:46:11 +00:00
Aurelien Bouteiller
fa7f6f6722 Improved error detection of request types
This commit was SVN r15857.
2007-08-14 17:24:46 +00:00
Aurelien Bouteiller
67399e7c31 Added a debug type checking for request types (to make sure request size is correctly computed).
This commit was SVN r15856.
2007-08-14 17:18:15 +00:00
Aurelien Bouteiller
1d97c183e7 Better argument checking for output function and added a routine for error printing.
This commit was SVN r15855.
2007-08-14 17:17:12 +00:00
Jeff Squyres
d7c5fea096 * Fix problem caused by r15848: the test parser was looking for
semicolons but the new specitifcation string used colons.  The text
   parser now looks for colons.
 * Changed all opal_output() error messages to
   much-more-helpful/descriptive opal_show_help() messages.
 * A few minor style/indenting fixes

This commit was SVN r15850.

The following SVN revision numbers were found above:
  r15848 --> open-mpi/ompi@dd30597f39
2007-08-14 14:46:13 +00:00
Jeff Squyres
dd30597f39 Change the default receive_queues value per
http://www.open-mpi.org/community/lists/devel/2007/08/2100.php.

This commit was SVN r15848.
2007-08-13 21:51:05 +00:00
Jelena Pjesivac-Grbovic
9bd9c92dbd Making sure that the decision function for scatter and gather correctly
computes everything for MPI_IN_PLACE case.

This commit was SVN r15841.
2007-08-13 17:35:50 +00:00
Jelena Pjesivac-Grbovic
b558e820cb removing compiler wraning
This commit was SVN r15803.
2007-08-08 15:22:01 +00:00
Jelena Pjesivac-Grbovic
daa10b277e modifying scatter decision function to use binomial algorithm for
small message sizes.

This commit was SVN r15798.
2007-08-07 22:16:13 +00:00
Pak Lui
0790c4cc40 * Update the comment for the previous fix. Thanks Gleb for pointing out.
This commit was SVN r15790.
2007-08-07 14:40:13 +00:00
Jeff Squyres
50bae9c603 Bring in the modular-wireup stuff for the openib BTL (from
/tmp/jms-modular-wireup branch):

 * This commit moves all the openib BTL connection code out of
   btl_openib_endpoint.c and into a connect "pseudo-component" area,
   meaning that different schemes for doing OFA connection schemes can
   be chosen via function pointer (i.e., MCA parameter) at run-time.
 * The connect/connect.h file includes comments describing the
   specific interface for the connect pseudo-component.
 * Two pseudo-components are in this commit (more can certainly be
   added).
   * oob: use the same old oob/rml scheme for creating OFA connections
     that we've had forever; this now just puts the logic into this
     self-contained pseudo-component.
   * rdma_cm: a currently-empty set of functions (that currently
     return NOT_IMPLEMENTED) that will someday use the RDMA connection
     manager to make OFA connections.

This commit was SVN r15786.
2007-08-06 23:40:35 +00:00
Aurelien Bouteiller
ca69915b1e Code cleanup
This commit was SVN r15783.
2007-08-06 22:20:44 +00:00
Brian Barrett
69952d9603 Fix abort caused by calling PtlEQGet on an invalid eq, which could occur
if add_procs was never called.

This commit was SVN r15779.
2007-08-06 17:28:11 +00:00
Brian Barrett
1fb78a35f9 Back out part of r15756. The common_portals_utcp.c file is only used with
the Sandia reference implementation of Portals, and doesn't have the cnos
functions.  This file should never be compiled (and wasn't being compiled)
on the Cray machines, so doesn't need to be updated to support CNL.

This commit was SVN r15778.

The following SVN revision numbers were found above:
  r15756 --> open-mpi/ompi@755658694e
2007-08-06 17:21:00 +00:00
Sven Stork
9e2263f29f - fix a small memory leak
This commit was SVN r15768.
2007-08-06 13:35:32 +00:00
Mohamad Chaarawi
59a7bf8a9f Merging in the Sparse Groups..
This commit includes config changes..

This commit was SVN r15764.
2007-08-04 00:41:26 +00:00
George Bosilca
e41ee17ca5 Add a small comment that hopefully will enforce the correct ordering of
the fields between CM and the other PML in the requests structure.

This commit was SVN r15760.
2007-08-03 23:59:29 +00:00
Josh Hursey
755658694e Bring in changes to support Cray's Compute Node Linux (CNL) and
Application Level Placement Scheduler (ALPS).

This commit was tested under two Cray machines at ORNL: Jaguar (Catamount)
and Rizzo (CNL Test cage). Both machines performed as they should across
the commit.

It is likely that mor changes will follow this the work and environment
stabilizes.

Most of the infrastructure works the same for Catamount and CNL
except for a few bits. Below are the highlights:

Default IFACE Change:
 On Catamount we can use PTL_IFACE_DEFAULT, but on the CNL system we have access
 to will fail on this interface, and should be set to:
    IFACE_FROM_BRIDGE_AND_NALID(PTL_BRIDGE_UK,PTL_IFACE_SS).
 So if we detect that we are running with YOD then use the former interface
 and if we detect that we are running with ALPS then use the latter.
 We will want to pursue a more elegant solution if this interface continues to 
 change across machines.

PtlGetId and cnos_register_ptlid:
 The header suggests that these should never be called when launching with YOD.
 But in the ALPS environment the cnos_barrier() will hang forever if these 
 functions are not called after PtlNIInit(). Since these functions only need to
 be called once, and the orte rmgr/cnos component is loaded before the ompi 
 common/portals componet then just call these functions once in the rmgr/cnos
 component.

cnos_barrier_init():
 This is a noop for YOD, but critical for ALPS. So be sure to call it before
 calling the first barrier in the rmgr/cnos component.

cnos_barrier vs cnos_pm_barrier:
 It is suggested the cnos_pm_barrier only be used during finalization 
 as it will indicate to the launcher (yod or aprun) that the app is about
 to complete. It was suggested that we use the regular cnos_barrier() instead.
 I want to look into this a bit more to make sure there are not adverse
 side effects. A note has been placed in the code to indicate this reasoning.

This commit was SVN r15756.
2007-08-03 19:46:38 +00:00
Pak Lui
010d216db9 * restrict the user with 32 bit app to specify a sm_size to be between
2GB to 4GB-1 by using long instead of size_t for the sm size.
   * it is done to prevent user from running into the ftruncate() in 
	 common sm component (and possibly others) problem that ftruncate 
	 takes an off_t which is a signed long integer. If we use an 
	 unsigned long, it'll run into an invalid argument errno=22.
   * See trac #1117

This commit was SVN r15752.
2007-08-03 15:43:02 +00:00
Aurelien Bouteiller
1d160ca583 Needed change for vampir pml to work
This commit was SVN r15750.
2007-08-03 02:23:24 +00:00
Jeff Squyres
d3f008492f Introduce a new debugging MCA parameter:
mpi_show_mpi_alloc_mem_leaks

When activated, MPI_FINALIZE displays a list of memory allocations
from MPI_ALLOC_MEM that were not freed by MPI_FREE_MEM (in each MPI
process).

 * If set to a positive integer, display only that many leaks.
 * If set to a negative integer, display all leaks.
 * If set to 0, do not show any leaks.

This commit was SVN r15736.
2007-08-01 21:33:25 +00:00
Jeff Squyres
0fb8cf65a8 If you have an HCA with no active ports, we still create an mpool.
This mpool will have no btl module owner there was no btl created for
the HCA with no ports, but it will still be tracked in the mpool
framework (i.e., it's available).

If MPI_ALLOC_MEM is called by the app, one of two things will happen:

 1. if there's an HCA on the host with some active ports, the openib
    btl component will still be in the process space, and therefore
    the "mpool with no btl" (MWNB) module will still be able to call
    the reg/dereg functions, and all will be fine.  However, if
    MPI_FREE_MEM is never invoked to free the memory, bad things will
    happen during MPI_FINALIZE.  The pml is finalized, which finalizes
    all the btls.  The btls finalize all their mpools and all is fine.
    But later we close down the mpool framework which then finalizes
    any left over mpool modules, such as MWNB.  However, the openib
    BTL module functions that the MWNB was registered with are no
    longer in the process space, and it segv's while trying deregister
    the memory.
 2. if there are *no* HCA's on the host with active ports, then the
    openib btl will have been unloaded, and when the MWNM tries to
    register the memory, the functions it tries to call (in the openib
    btl) are no longer there, and we segv.

This commit was SVN r15735.
2007-08-01 20:53:34 +00:00
Gleb Natapov
627d9bc8ed Delay freeing of a send request if scheduling function is running by other
thread.

This commit was SVN r15722.
2007-08-01 12:19:16 +00:00