Commit graph

1498 commits

Author SHA1 Message Date
Gleb Natapov
b910d10a81 Add general functions for alignment and change rdma_mpool_align to always
honor an alignment even if posix_memalign() is not available.

This commit was SVN r12892.
2006-12-18 10:52:18 +00:00
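A minimal sketch of the kind of general alignment helper that commit b910d10a81 describes, assuming power-of-two alignments. The names `align_up` and `align_ptr_up` are hypothetical and are not taken from the Open MPI source; the idea is only that a caller can over-allocate with plain malloc() and still honor the requested alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round `value` up to the next multiple of `alignment`.
 * Assumes `alignment` is a non-zero power of two. */
static inline uintptr_t align_up(uintptr_t value, uintptr_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: align a pointer inside a buffer that the caller
 * over-allocated by at least `alignment - 1` bytes, so the alignment can
 * be honored even when posix_memalign() is not available. */
static inline void *align_ptr_up(void *ptr, size_t alignment)
{
    return (void *)align_up((uintptr_t)ptr, (uintptr_t)alignment);
}
```

The original unaligned pointer still has to be kept somewhere so that it can be passed to free() later.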
Brian Barrett
c554638446 Support systems without malloc.h or posix_memalign (ie, pretty much every
one we support that isn't Linux)

This commit was SVN r12880.
2006-12-17 17:28:59 +00:00
Brian Barrett
b448b4e47e More heterogeneous fixes. Don't set reachability bit on a remote proc
if the remote architecture differs from the local architecture and the
btl doesn't support heterogeneous transport.

Refs trac:587

This commit was SVN r12879.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2006-12-17 17:27:08 +00:00
Gleb Natapov
190e7a27cd Merge with gleb-mpool branch. All RDMA components now use the same mpool (rdma).
The udapl/openib/vapi/gm mpools are deprecated. The rdma mpool has a parameter,
mpool_rdma_rcache_size_limit, that limits its size (default is 0, meaning unlimited).

This commit was SVN r12878.
2006-12-17 12:26:41 +00:00
Brian Barrett
0653dc3f24 Pad headers to eliminate heterogeneous issues. Add conversion functions
for switching endianness of headers.  Galen is going to add the code to
use the endian stuff...

This commit was SVN r12876.
2006-12-17 00:50:59 +00:00
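A minimal sketch of the two ideas in commit 0653dc3f24, explicit header padding and endianness conversion functions. The struct `example_hdr_t` and its fields are hypothetical; the real Open MPI headers are not shown in this log.

```c
#include <stdint.h>

/* Hypothetical wire header: explicit padding keeps the layout identical
 * across compilers and architectures with different padding rules. */
typedef struct {
    uint8_t  type;
    uint8_t  padding[3];   /* explicit pad so `length` sits at offset 4 */
    uint32_t length;
    uint64_t sequence;
} example_hdr_t;

static inline uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

static inline uint64_t swap64(uint64_t v)
{
    /* Swap each 32-bit half, then exchange the halves. */
    return ((uint64_t)swap32((uint32_t)(v & 0xFFFFFFFFu)) << 32) |
           (uint64_t)swap32((uint32_t)(v >> 32));
}

/* Switch the endianness of every multi-byte field in place. */
static inline void example_hdr_swap(example_hdr_t *hdr)
{
    hdr->length   = swap32(hdr->length);
    hdr->sequence = swap64(hdr->sequence);
}
```

Single-byte fields such as `type` need no conversion, and applying the swap twice restores the original header.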
Brian Barrett
01e8fc5f91 Redo of r12871, without the preconnect code change:
Move the req_mtl structure back to the end of each of the structures in 
the CM PML. The req_mtl structure is cast into a mtl_*_request_structure 
for each MTL, which is larger than the req_mtl itself. The cast will cause
the *_request to overwrite parts of the heavy requests if the req_mtl
isn't the *LAST* thing on each structure (hence the comment). This was 
moved as an optimization at some point, which caused buffer sends to fail...

Refs trac:669

This commit was SVN r12873.

The following SVN revision numbers were found above:
  r12871 --> open-mpi/ompi@597598b712

The following Trac tickets were found above:
  Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
2006-12-15 17:54:14 +00:00
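A minimal sketch of why req_mtl has to be the last member, as commit 01e8fc5f91 explains. All type and function names here are hypothetical stand-ins for the CM PML and MTL structures; the sketch assumes the request is allocated with extra trailing room for the MTL's private state.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic MTL request, as seen by the PML. */
typedef struct { int completed; } base_mtl_request_t;

/* Hypothetical transport-specific request: starts with the generic part
 * and appends private state, so it is larger than base_mtl_request_t. */
typedef struct {
    base_mtl_request_t super;
    char private_state[32];
} big_mtl_request_t;

/* Hypothetical PML request. req_mtl must be the LAST member: the MTL casts
 * &req_mtl to its larger type and writes into the bytes that follow it. */
typedef struct {
    int tag;
    int peer;
    base_mtl_request_t req_mtl;   /* keep last */
} pml_request_t;

/* Allocate the PML request with trailing room for the MTL's private state. */
static pml_request_t *pml_request_alloc(void)
{
    size_t extra = sizeof(big_mtl_request_t) - sizeof(base_mtl_request_t);
    return calloc(1, sizeof(pml_request_t) + extra);
}

/* What the MTL does: treat the embedded generic request as its own type. */
static void mtl_touch_private_state(pml_request_t *req)
{
    big_mtl_request_t *mtl_req = (big_mtl_request_t *)&req->req_mtl;
    memset(mtl_req->private_state, 0xAB, sizeof(mtl_req->private_state));
}
```

If `req_mtl` were moved ahead of `tag` and `peer`, the memset above would land on those fields instead of on the trailing space, which is the kind of overwrite the commit message describes.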
Brian Barrett
bdf0b231b2 Undo r12871, as it contained some code in ompi/runtime that shouldn't have been
committed

Refs trac:669

This commit was SVN r12872.

The following SVN revision numbers were found above:
  r12871 --> open-mpi/ompi@597598b712

The following Trac tickets were found above:
  Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
2006-12-15 17:52:13 +00:00
Brian Barrett
597598b712 Move the req_mtl structure back to the end of each of the structures in the
CM PML.  The req_mtl structure is cast into a mtl_*_request_structure for
each MTL, which is larger than the req_mtl itself.  The cast will cause
the *_request to overwrite parts of the heavy requests if the req_mtl
isn't the *LAST* thing on each structure (hence the comment).  This was
moved as an optimization at some point, which caused buffer sends to
fail...

Refs trac:669

This commit was SVN r12871.

The following Trac tickets were found above:
  Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
2006-12-15 17:46:53 +00:00
Brad Benton
18da4c40d3 Set the QP's static rate from the associated MCA parameter, rather
than just defaulting to 0.

Fixes trac:675

This commit was SVN r12855.

The following Trac tickets were found above:
  Ticket 675 --> https://svn.open-mpi.org/trac/ompi/ticket/675
2006-12-14 19:42:24 +00:00
Brian Barrett
38c2e43ac2 Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug BTL and OOB issues...
This commit was SVN r12852.
2006-12-14 18:20:43 +00:00
Jeff Squyres
0ca8cb35b7 Fixes trac:366
Add ability for ini files to recognize "use_eager_rdma" flag.  Set the
default to "no" (because we should assume that HCAs cannot support the
property necessary for using RDMA for eager messages -- that the last
byte of the message is guaranteed to be written to memory last --
unless proven otherwise.  For example, iWARP cards apparently do not
provide this guarantee), and then set all Mellanox and IBM HCAs to
override the default to enable this behavior on these cards.

This commit was SVN r12851.

The following Trac tickets were found above:
  Ticket 366 --> https://svn.open-mpi.org/trac/ompi/ticket/366
2006-12-14 15:52:13 +00:00
George Bosilca
80bc0c8868 Allow the MX to survive if we are unable to connect to a peer. The PML will
try to find another route.

This commit was SVN r12837.
2006-12-13 01:12:07 +00:00
Brian Barrett
10af8ab454 Corrections for when threading is enabled.
Refs trac:564

This commit was SVN r12830.

The following Trac tickets were found above:
  Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564
2006-12-12 18:48:42 +00:00
Brad Benton
337116d5fd Added IBM eHCA vendor and part id info
This commit was SVN r12827.
2006-12-12 14:12:39 +00:00
Brian Barrett
cf196ce420 Instead of an unknown proc list that requires ownership transfer of data (which, in turn, requires a complex series of locks to be held during the transfer), use a modex backing store with backpointers from the proc to the backing store. The proc structures no longer own the modex data, which greatly simplifies locking when an unknown proc suddenly becomes known.
Refs trac:564

This commit was SVN r12822.

The following Trac tickets were found above:
  Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564
2006-12-11 21:27:30 +00:00
Ralph Castain
0a5d41857a Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects.
I found only two places that were looking at the tokens:

1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message.

2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was *always* the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes).

This commit was SVN r12813.
2006-12-09 23:10:25 +00:00
Jeff Squyres
e70ef98ea6 Update the help message to be a bit more specific and refer to the web
FAQ.

This commit was SVN r12812.
2006-12-09 15:13:03 +00:00
Patrick Geoffray
58c6f8c8e1 Copyright update. Thanks to Jeff for reminding me.
This commit was SVN r12803.
2006-12-07 23:55:00 +00:00
Patrick Geoffray
6e09b0c23f lval is not defined when pval is assigned on 32-bit systems. This
usually is ok on little-endian systems, as the upper 32 bits will likely
be ignored, but on 32-bit big-endian systems, lval is complete junk.
Use ival if 32 bit mode, lval if 64.

Mixing of 32 and 64 bit architectures won't work without more changes.

This commit was SVN r12802.
2006-12-07 23:34:04 +00:00
Brian Barrett
98884e45e4 Clean up the way procs are added to the global process list after MPI_INIT:
* Do not add new procs to the global list during modex callback or
    when sharing orte names during accept/connect.  For modex, we
    cache the modex info for later, in case that proc ever does get
    added to the global proc list.  For accept/connect orte name
    exchange between the roots, we only need the orte name, so no
    need to add a proc structure anyway.  The procs will be added
    to the global process list during the proc exchange later in 
    the wireup process
  * Rename proc_get_namebuf and proc_get_proclist to proc_pack
    and proc_unpack and extend them to include all information
    needed to build that proc struct on a remote node (which
    includes ORTE name, architecture, and hostname).  Change
    unpack to call pml_add_procs for the entire list of new
    procs at once, rather than one at a time.
  * Remove ompi_proc_find_and_add from the public proc
    interface and make it a private function.  This function
    would add a half-created proc to the global proc list, so
    making it harder to call is a good thing.

This means that there are only two ways to add new procs into the global proc list at this time: during MPI_INIT via the call to ompi_proc_init, where my job is added to the list, and via ompi_proc_unpack, using a buffer from a packed proc list sent to us by someone else.  Currently, this is enough to implement MPI semantics.  We can extend the interface more if we like, but that may require HNP communication to get the remote proc information and I wanted to avoid that if at all possible.

Refs trac:564

This commit was SVN r12798.

The following Trac tickets were found above:
  Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564
2006-12-07 19:56:54 +00:00
Brian Barrett
41a70a8f01 indent, this time with the right coding standards...
This commit was SVN r12787.
2006-12-07 00:24:01 +00:00
Brian Barrett
f9ec8d6f2a reindent file to make it easier to deal with...
This commit was SVN r12786.
2006-12-07 00:21:25 +00:00
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
Brian Barrett
441432950f Merge in changes from the bwb-heterogeneous temp branch (r12491 -
r12714) for supporting compilers / architectures with different
padding rules.

This commit was SVN r12749.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r12491
  r12714
2006-12-04 20:11:42 +00:00
Rainer Keller
6f8f28f40f - Get rid of inline definition, otherwise static-compilation fails.
This commit was SVN r12735.
2006-12-03 14:52:17 +00:00
Gleb Natapov
30ca7457b4 Some BTLs (e.g. TCP) can report put/get completion before the data actually
hits the buffer on the other side. For this kind of BTL we need to send the
FIN through the same BTL the PUT was performed with, so that the network handles
ordering for us. If we use another BTL, the receiver can get the FIN before the
data hits the buffer and complete the request prematurely. We mark such
problematic BTLs with the MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA
is really fake, because the real one guarantees that the sender will see the
completion only after the receiver's NIC has confirmed that all the data was
received).

This commit was SVN r12732.
2006-12-03 10:12:09 +00:00
Gleb Natapov
39c930b160 The bug-fixing part of r12720 introduces a much more serious bug than the one it fixes.
It calls mca_pml_ob1_send_fin_btl(), which may fail, and doesn't check the return
code. This breaks all RDMA transports even when only one BTL is used. Revert
it for now; I am working on a real fix for the problem (I hope).

This commit was SVN r12731.

The following SVN revision numbers were found above:
  r12720 --> open-mpi/ompi@3e3689320b
2006-12-03 08:55:59 +00:00
Gleb Natapov
65d7ad4581 The "bug fix" in r12721 reverts part of r12433, which fixed a
regression from v1.1 and was reviewed and put on the v1.2 branch. So revert this part
of r12721.

This commit was SVN r12730.

The following SVN revision numbers were found above:
  r12433 --> open-mpi/ompi@82f7c0dd69
  r12721 --> open-mpi/ompi@3edd850d2e
2006-12-03 08:29:55 +00:00
George Bosilca
3edd850d2e Some indentation and code arrangement. However, there is a bug fix: force the PUT
protocol to always obey btl_max_rdma_size.

This commit was SVN r12721.
2006-12-01 22:26:14 +00:00
George Bosilca
3e3689320b Some indentation and one BIG fix. Avoid race conditions in the PUT RDMA
protocol when multiple NICs are available between 2 peers. The fix forces
the FIN message to take exactly the same path as the fragment it describes
(i.e. same path means same BTL). Otherwise, the FIN can be received by
the peer before the RDMA completes and the request will get freed
too early.

This commit was SVN r12720.
2006-12-01 21:52:07 +00:00
George Bosilca
658879232b Several small improvements:
- consistent error message when something fails (via BTL_ERROR macro)
- decrease the number of jumps.
- cleanup some parts of the code.

This commit was SVN r12719.
2006-12-01 21:48:06 +00:00
Brian Barrett
2a09fa2d9d * silence compiler warning
This commit was SVN r12717.
2006-12-01 20:01:53 +00:00
Brian Barrett
bfbc281e93 Fix slow startup issue with the MX MTL. The problem is caused by mx_connect() being a one-sided operation from the API level, but not being an interrupting call when the target is not entering the MX library. So if most of the processes exit mtl_mx_add_procs() and enter the stage gate 2 barrier, the other processes can only progress their mx_connect() calls when the targets enter the mx library. Because the event library is in EV_ONELOOP mode, this only happens once a second. The mx progress thread (hidden in the MX library) also only wakes up once a second, so mx_connect calls can take a second to complete.
The temporary solution is to switch into EV_NONBLOCK mode earlier (right after the mx_connect loop) so that there isn't a giant slowdown when processes enter the stage gate 2 barrier before other processes.  They will now not block in the event library for any period of time, which appears to give a 50% speedup when running at > 64 procs.

Refs trac:645

This commit was SVN r12713.

The following Trac tickets were found above:
  Ticket 645 --> https://svn.open-mpi.org/trac/ompi/ticket/645
2006-12-01 02:49:01 +00:00
Pavel Shamis
f08bc818c4 Cleaning mca_btl_openib_progress_thread from
unused variables.

This commit was SVN r12709.
2006-11-30 18:28:45 +00:00
Brian Barrett
bc3250fe6f Fix issue where messages could start arriving before the window was
fully initialized, especially during lock/unlock, which doesn't use MPI
collectives for synchronization (unlike Fence)

This commit was SVN r12676.
2006-11-27 22:42:21 +00:00
Brian Barrett
b5b6f4c4bf Remove check for epoch on the receiver side, as this is a race condition between message arrival, and because we'll be still blocking in a synchronization call, is of no consequence to the user. We do an epoch check on the sending side,
so this isn't an issue there either.  Refs trac:488

This commit was SVN r12675.

The following Trac tickets were found above:
  Ticket 488 --> https://svn.open-mpi.org/trac/ompi/ticket/488
2006-11-27 22:15:34 +00:00
Brian Barrett
beb1e9d4dd * finish move from hard coded tag to #define'd constant tag
This commit was SVN r12674.
2006-11-27 21:55:41 +00:00
Brian Barrett
0c25f7be09 More One-sided fixes:
* Fix a counter roll-over issue that could result from a large (but
    not excessive) number of outstanding put/get/accumulate calls
    during a single synchronization issues (Refs trac:506)
  * Fix epoch issue with rdma component that would affect PWSC
    synchronization (Refs trac:507)

This commit was SVN r12673.

The following Trac tickets were found above:
  Ticket 506 --> https://svn.open-mpi.org/trac/ompi/ticket/506
  Ticket 507 --> https://svn.open-mpi.org/trac/ompi/ticket/507
2006-11-27 21:41:29 +00:00
George Bosilca
59cfee0cd2 Use the MX infinite timeout by default. The user can modify it using an MCA
parameter.

This commit was SVN r12670.
2006-11-27 20:18:58 +00:00
Brian Barrett
993d2a7753 Fix for issue IU is seeing on BigRed with connections timing out during
MPI_INIT.  Use an infinite timeout, which is exactly what MPICH-MX does.

This commit was SVN r12669.
2006-11-27 20:10:27 +00:00
Brian Barrett
63e5668e29 Number of one-sided fixes:
* use one-sided datatype check instead of send/receive and check both
    the origin and target datatypes
  * allow error handler to be set on MPI_WIN_NULL, per standard
  * Allow recursive calls into the pt2pt osc component's progress
    function
  * Fix an uninitialized variable problem in the unlock header

This commit was SVN r12667.
2006-11-27 03:22:44 +00:00
Brian Barrett
0895f5e08d Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH}
because they are in ORTE, not OMPI.  Also, remove the ORTE_PROCESS_NAME macros
in iof base as they are duplicates of the ones that were in ns_types, which 
meant that bad things happened if you changed what an orte_process_name_t
looked like.

This commit was SVN r12646.
2006-11-22 03:03:21 +00:00
Brian Barrett
33320b7165 Rework the opal_progress interface to better support dynamic processes and at
the same time, remove some of the MPI-related options from OPAL:

  - provide mechanism to change at runtime whether sched_yield() should 
    be called when the progress engine is idle
  - provide mechanism for changing the rate at which the event engine
    is called when there are "no" users of the event engine (ie, when
    using MPI but not TCP)
  - fix some function names in the progress engine to better match
    their intended use (and remove MPI naming scheme)
  - remove progress_mpi_enable / progress_mpi_disable because 
    we can now use the functions to set the sched_yield and
    tick rate interfaces
  - rename opal_progress_events() to opal_progress_set_event_flag()
    because the first really isn't descriptive of what the function
    does and I always got confused by it

This commit was SVN r12645.
2006-11-22 02:06:52 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
George Bosilca
139f9cf3d0 Make sure we disable the MX shared memory when we use the MX BTL.
This commit was SVN r12587.
2006-11-13 22:17:06 +00:00
Andrew Friedley
a4bdcb4faa Fix a segfault that turned up in more MPI_THREAD_MULTIPLE testing.
Same sort of problem and fix as described in r12323 - mca_pml_ob1_recv_frag_progress() was segfaulting due to a NULL req_proc pointer.  The path leading to this was through the mca_pml_ob1_check_cantmatch_for_match() function, where we can match a frag using the same macros as mca_pml_ob1_frag_match() and never initialize the req_proc pointer.

This commit was SVN r12582.

The following SVN revision numbers were found above:
  r12323 --> open-mpi/ompi@c752502dee
2006-11-13 20:12:51 +00:00
Gleb Natapov
9933a6f469 Previous fix doesn't fix the case when opcode is changed in put/get functions.
The fix is to set opcode to SEND at the entrance to the send function before
checking credits and putting fragment to the pending list. We do the same thing
in put/get functions i.e setting opcode at the entrance to the function.

This commit was SVN r12559.
2006-11-11 07:51:06 +00:00
George Bosilca
ec410644ce Implement the send receive as 2 non-blocking operations. That will help us
avoid too many calls to opal_progress.

This commit was SVN r12553.
2006-11-10 23:06:19 +00:00
George Bosilca
c2c6a1b37e Correctly compute the number of elements in a segment.
For broadcast send the correct size for all intermediary nodes.

This commit was SVN r12552.
2006-11-10 23:04:50 +00:00
George Bosilca
7102147b9f Correctly detect when the specified algorithm is out of range. In
this case we reset it to zero.

This commit was SVN r12551.
2006-11-10 21:47:07 +00:00
George Bosilca
bfbd0e61f6 Minimize the number of lines of code :)
This commit was SVN r12550.
2006-11-10 20:56:08 +00:00
George Bosilca
a38cd366d7 Construct the convertor. It's not really required, but it's not in the
critical path anyway. At least in debug mode we get nice information about
where the convertor was created.

This commit was SVN r12549.
2006-11-10 20:55:06 +00:00
George Bosilca
858ab24e8e The req_mtl field has to be the last in the struct or bad things happen.
This commit was SVN r12548.
2006-11-10 20:53:41 +00:00
George Bosilca
af68171253 Use the macro to compute the number of elements in a segment in both
bcast and reduce and update the default values for the variables
as required by the comment in the coll_tuned.h file.

This commit was SVN r12546.
2006-11-10 20:04:08 +00:00
George Bosilca
476b922074 Updates & upgrades:
- consistent arguments checking (not allowing to select an algorithm which
     is not available)
 - consistent way of computing the segcount (number of datatypes by segment).
 - small cleanups.
 - more informative debugging messages.

This commit was SVN r12545.
2006-11-10 19:54:09 +00:00
Gleb Natapov
7e03b83d23 Reset opcode field to SEND. It is checked later in pending progress function.
This commit was SVN r12531.
2006-11-10 06:17:00 +00:00
George Bosilca
77ef979457 New architecture for broadcast. A generic broadcast working on a tree
description. Most of the bcast algorithms can be completed using this
generic function once we create the tree structure. Add all kinds of
trees.

There are 2 versions of the generic bcast function. One using overlapping
between receives (for intermediary nodes) and then blocking sends to all
children and another where all sends are non-blocking. I still have to
figure out which one gives the smallest overhead.

This commit was SVN r12530.
2006-11-10 05:53:50 +00:00
George Bosilca
17405cd9c6 A temporary fix, until we figure out a better approach. The problem
is that if one adds "pml=" to the configuration file, really bad things
happen. All PMLs will get initialized, and each of them will initialize
all BTLs. This patch forces mca_pml_base_pml to get initialized in
all cases before we leave the mca_pml_base_open function.

This commit was SVN r12527.
2006-11-10 04:53:00 +00:00
George Bosilca
1d80f685b5 Remove one compiler warning.
This commit was SVN r12520.
2006-11-09 20:08:43 +00:00
George Bosilca
73eec4bfef Show the MCA parameter coll_base_verbose only if Open MPI is compiled in
debug mode. Otherwise there is no debug anyway ...

This commit was SVN r12516.
2006-11-09 19:02:32 +00:00
George Bosilca
a82ce427e4 Update the number of reduce algorithms available.
This commit was SVN r12503.
2006-11-08 22:20:34 +00:00
George Bosilca
eab1776e9a Explicit casts for our friendly Windows environment...
This commit was SVN r12496.
2006-11-08 17:02:46 +00:00
George Bosilca
0914892044 Small cleanups, some explicit casts.
This commit was SVN r12494.
2006-11-08 16:54:03 +00:00
George Bosilca
74d3946342 Remove the call to set_args. This is only required for the MPI level,
because there we have to be able to return to the user the description
of the data.

This commit was SVN r12493.
2006-11-08 16:52:48 +00:00
George Bosilca
915d748d72 Initialize the convertor on _START not on _INIT. This allow us to
set it up before the match when we know the peer, saving some
time on the critical path. If the receive is ANY_SOURCE then
we initialize the convertor on _MATCHED. Anyway, we will set it
up only once per receive.

This commit was SVN r12484.
2006-11-08 05:42:29 +00:00
George Bosilca
eb45a5e402 Move things around a little bit. Mainly fields from the send and receive
request in the base request. Rearrange the fields to keep the data
together. Remove some useless tests.

This commit was SVN r12482.
2006-11-08 04:58:23 +00:00
George Bosilca
63462331c9 Reduce the number of branches. Keep the fast path as short as possible.
Remove some useless error checking. Add OPAL_UNLIKELY directives.

This commit was SVN r12477.
2006-11-07 23:59:32 +00:00
George Bosilca
f3de2e1a82 Keep the fast path as short as possible.
This commit was SVN r12476.
2006-11-07 23:56:32 +00:00
Jeff Squyres
427c20af0d Use a new algorithm for allgatherv. The old algorithm essentially did
N gatherv's:

  for (i = 0 ... size)
    MPI_Gatherv(..., root = i, ...)

The new algorithm simply does (effectively):

  MPI_Gatherv(..., root = 0, ...)
  MPI_Bcast(..., root = 0, ...)

This commit was SVN r12469.
2006-11-07 18:07:55 +00:00
Galen Shipman
55db17b37c don't try to use a dead btl..
This commit was SVN r12456.
2006-11-06 23:25:24 +00:00
George Bosilca
108ea4dbe9 When the MX MTL completes a request, force a return from the progress function.
Decrease the latency by about 0.3 microseconds.

This commit was SVN r12454.
2006-11-06 23:13:07 +00:00
George Bosilca
3d0df2cf29 Allow the MX BTL to finish small sends quicker. Once the mx_isend is posted, if
the message size is less than 4K, check for message completion and, if it has completed,
call the callback.

This commit was SVN r12453.
2006-11-06 23:12:01 +00:00
Galen Shipman
eef37430a7 failing already failed for ACK timeout..
This commit was SVN r12452.
2006-11-06 22:09:39 +00:00
Galen Shipman
813e7faea8 more fixes for failover.. and yet still more to come..
This commit was SVN r12450.
2006-11-06 21:27:17 +00:00
Gleb Natapov
b4fd2d7d50 Fix warnings from progress thread patch.
This commit was SVN r12434.
2006-11-06 12:34:56 +00:00
Gleb Natapov
82f7c0dd69 Fix regression from v1.1.
1) make the code do what comment says
2) if memory is prepinned don't send multiple PUT messages.

This commit was SVN r12433.
2006-11-06 12:00:17 +00:00
Galen Shipman
f7c554df65 Try to failover when we get an async error from the lower layer (BTL)..
This commit was SVN r12420.
2006-11-03 15:40:26 +00:00
George Bosilca
8529238d93 Add 2 more algorithms to the dynamic list.
This commit was SVN r12415.
2006-11-02 19:19:08 +00:00
George Bosilca
110d07b7d3 Small optimization for zero-length messages.
This commit was SVN r12414.
2006-11-02 19:10:28 +00:00
Pavel Shamis
566667ac61 Adding progress thread support to OpenIB BTL.
Reviewed by Gleb.

This commit was SVN r12411.
2006-11-02 16:15:21 +00:00
George Bosilca
dbec514b0f Optimize the generation of the match_bits and the mask.
This commit was SVN r12396.
2006-11-01 23:19:20 +00:00
Gleb Natapov
4c784b6403 As Andrew Friedley pointed out, my previous patch may cause deadlock if
mca_btl_openib_endpoint_connect_eager_rdma() is called recursively. He also
noticed that orte_pointer_array_add() can't fail because we allocate max number
of elements at init time. So just remove error handling and locking. No locking
 - no deadlocks.

This commit was SVN r12388.
2006-11-01 15:53:33 +00:00
Gleb Natapov
3bf31fe4a3 Correctly determine the first element on the list. opal_list_get_prev() never
returns NULL, it should be compared with opal_list_get_end() instead.

This commit was SVN r12387.
2006-11-01 13:44:47 +00:00
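A minimal sketch of the check described in commit 3bf31fe4a3, assuming a circular doubly linked list with a single sentinel node. The types and functions below are hypothetical and only loosely modelled on the behaviour the message attributes to opal_list_get_prev() and opal_list_get_end(); they are not the OPAL implementation.

```c
#include <stddef.h>

/* Hypothetical circular doubly linked list with one sentinel node. */
typedef struct node { struct node *prev, *next; } node_t;
typedef struct { node_t sentinel; } list_t;

static void list_init(list_t *l)
{
    l->sentinel.prev = l->sentinel.next = &l->sentinel;
}

static node_t *list_end(list_t *l)  { return &l->sentinel; }
static node_t *list_prev(node_t *n) { return n->prev; }   /* never NULL */

static void list_append(list_t *l, node_t *n)
{
    n->prev = l->sentinel.prev;
    n->next = &l->sentinel;
    l->sentinel.prev->next = n;
    l->sentinel.prev = n;
}

/* An item is the first element when its predecessor is the sentinel;
 * comparing the predecessor with NULL would never be true. */
static int is_first(list_t *l, node_t *n)
{
    return list_prev(n) == list_end(l);
}
```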
Gleb Natapov
b5714d698a Fix compilation with GM version smaller than 2.0. Fix compilation warnings.
This commit was SVN r12386.
2006-11-01 10:26:15 +00:00
Gleb Natapov
aac695a51f eager_rdma_buffers update is not atomic. A buffer is added to the array and if
something is going wrong down in the code it is removed from the array. So add
mutex to prevent concurrent access to the array from different threads.

This commit was SVN r12385.
2006-11-01 07:27:32 +00:00
Andrew Friedley
48c5117476 Fix some signedness warnings on threaded builds introduced by r12369
This commit was SVN r12376.

The following SVN revision numbers were found above:
  r12369 --> open-mpi/ompi@d7375ec102
2006-10-31 17:29:25 +00:00
Gleb Natapov
d7375ec102 Fix deadlock reported by Andrew Friedley:
What's happening is that we're holding openib_btl->eager_rdma_lock when
we call mca_btl_openib_endpoint_send_eager_rdma() on
btl_openib_endpoint.c:1227.  This in turn calls
mca_btl_openib_endpoint_send() on line 1179.  Then, if the endpoint
state isn't MCA_BTL_IB_CONNECTED or MCA_BTL_IB_FAILED, we call
opal_progress(), where we eventually try to lock
openib_btl->eager_rdma_lock at btl_openib_component.c:997.

The fix removes this lock altogether. Instead we atomically set the local RDMA
pointer to prevent other threads from creating an RDMA buffer for the same endpoint.
And we increment eager_rdma_buffers_count atomically, so the polling thread doesn't
need a lock around it.

This commit was SVN r12369.
2006-10-31 09:54:52 +00:00
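A minimal sketch of the lock-free approach that commit d7375ec102 describes, written with C11 atomics as an assumption (the 2006 code would have used Open MPI's own atomic primitives). The struct, the field name `eager_rdma_local` and the function `endpoint_claim_eager_rdma` are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical endpoint: names are illustrative, not the openib BTL's. */
typedef struct {
    _Atomic(void *) eager_rdma_local;   /* NULL until some thread claims it */
} endpoint_t;

static atomic_int eager_rdma_buffers_count;

/* Returns 1 if the calling thread won the right to set up the buffer,
 * 0 if another thread already claimed the endpoint. No lock is held, so
 * re-entering through a progress call cannot self-deadlock. */
static int endpoint_claim_eager_rdma(endpoint_t *ep, void *buffer)
{
    void *expected = NULL;
    if (!atomic_compare_exchange_strong(&ep->eager_rdma_local, &expected, buffer)) {
        return 0;
    }
    /* The polling thread reads this counter without taking a lock. */
    atomic_fetch_add(&eager_rdma_buffers_count, 1);
    return 1;
}
```

The compare-and-swap replaces the mutex: only one thread can move the pointer from NULL to a buffer, so a second thread simply backs off instead of blocking.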
Gleb Natapov
1b152dfe09 On 64-bit platforms, if the high 32 bits of the buf address are not zero, they are trimmed
by a wrong bitwise AND. Fix it by expanding the mask to 64 bits.

This commit was SVN r12368.
2006-10-31 07:33:35 +00:00
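A minimal sketch of the class of bug that commit 1b152dfe09 describes. The actual expression in the source is not shown in this log, so both functions below are hypothetical illustrations of a 32-bit mask being applied to a 64-bit address.

```c
#include <stdint.h>

/* Buggy variant: ~(alignment - 1) is computed in 32 bits and then
 * zero-extended, so the AND clears the upper 32 bits of the address. */
static uint64_t align_down_buggy(uint64_t addr, uint32_t alignment)
{
    uint32_t mask = ~(alignment - 1u);
    return addr & mask;
}

/* Fixed variant: widen to 64 bits before inverting so the high bits of
 * the mask are all ones. */
static uint64_t align_down_fixed(uint64_t addr, uint32_t alignment)
{
    uint64_t mask = ~((uint64_t)alignment - 1u);
    return addr & mask;
}
```

For an address such as 0x123456789 and a 4 KiB alignment, the buggy variant yields 0x23456000 while the fixed one yields 0x123456000.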
Gleb Natapov
7b39039cd6 Add comments to process_pending functions.
This commit was SVN r12346.
2006-10-29 09:12:24 +00:00
Gleb Natapov
8ef5b6a589 Change tabs to spaces to be consistent with the rest of the file.
This commit was SVN r12345.
2006-10-29 08:12:44 +00:00
George Bosilca
a9c6ae8f15 Minimize the number of branches, and force the correct prediction for the
most usual one. Most of the time we expect the functions which allocate
requests to succeed.

This commit was SVN r12344.
2006-10-27 23:16:13 +00:00
George Bosilca
44f3dd81b4 Update the comment to reflect what's inside the code.
This commit was SVN r12343.
2006-10-27 23:09:37 +00:00
George Bosilca
3472d19d4d Do not modify the convertor if there is no data to be sent across the network. The
req_bytes_packed field is initialized in the BASE_INIT macro, so it is set for
all requests at this stage.

This commit was SVN r12342.
2006-10-27 23:03:15 +00:00
Jeff Squyres
020efdf1f9 Refs trac:250
This commit essentially caches the invoking comm/win/file on the
ompi_request_t. This, paired with the req_type field, allows us to
retrieve the invoking MPI object and invoke the proper errhandler.

The patch is missing most updates for the MPI-2 one-sided stuff (i.e.,
the patch mainly fixes comms and files); I didn't really understand
that code and didn't want to hazard trying to figure it out when Brian
can probably do it much more quickly.

So #250 will still stay open, pending MPI-2 one-sided updates for this
stuff.

This commit was SVN r12339.

The following Trac tickets were found above:
  Ticket 250 --> https://svn.open-mpi.org/trac/ompi/ticket/250
2006-10-27 12:35:27 +00:00
Jeff Squyres
e02114dcf3 Fixes trac:529.
* Create a new request type: NOOP (described below)
 * For all MPI_*_INIT functions, OBJ_NEW an ompi_request_t and set its
   type to NOOP
 * Ensure that the NOOP requests are OBJ_RELEASE'd when they are done
 * MPI_START looks at the request type; if NOOP, just return success. If
   not, call the PML start() function
 * MPI_STARTALL always pass the entire array of requests back to the PML
   (see next point)
 * Make the PMLs only process PML requests (i.e., ignore/skip anything
   that isn't of type PML -- such as the NOOP requests)
 * Add a little more param error checking in STARTALL

This commit was SVN r12338.

The following Trac tickets were found above:
  Ticket 529 --> https://svn.open-mpi.org/trac/ompi/ticket/529
2006-10-27 12:32:36 +00:00
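The NOOP handling described in r12338 can be sketched roughly as below. This is a simplified model under stated assumptions, not the Open MPI code: the types and functions (`example_request_t`, `example_pml_start`, `example_start`, `example_startall`) are hypothetical, and only the NOOP and PML type names come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request types; the log only names NOOP and PML. */
typedef enum {
    EXAMPLE_REQUEST_PML,
    EXAMPLE_REQUEST_NOOP
} example_request_type_t;

typedef struct {
    example_request_type_t type;
    int started;
} example_request_t;

/* Stand-in for the PML start() function: it processes only PML
 * requests and skips everything else. Returns the number started. */
int example_pml_start(size_t count, example_request_t **reqs)
{
    int n = 0;
    for (size_t i = 0; i < count; ++i) {
        if (EXAMPLE_REQUEST_PML != reqs[i]->type) {
            continue;   /* ignore NOOP (or any non-PML) requests */
        }
        reqs[i]->started = 1;
        ++n;
    }
    return n;
}

/* MPI_START-like entry point: a NOOP request succeeds immediately. */
int example_start(example_request_t *req)
{
    assert(NULL != req);
    if (EXAMPLE_REQUEST_NOOP == req->type) {
        return 0;
    }
    return (1 == example_pml_start(1, &req)) ? 0 : -1;
}

/* MPI_STARTALL-like entry point: the whole array goes to the PML,
 * which filters out the requests it does not own. */
int example_startall(size_t count, example_request_t **reqs)
{
    assert(NULL != reqs || 0 == count);
    example_pml_start(count, reqs);
    return 0;
}
```

Passing the full array through in the STARTALL case avoids building a filtered copy, at the cost of the PML checking each request's type.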
George Bosilca
882b429f64 ompi_mtl_datatype_pack is not a data-type function (really), so it still
needs the free_after (which, btw, has a different meaning than the one removed
from the data-type engine a few minutes ago).

This commit was SVN r12333.
2006-10-27 00:15:53 +00:00
George Bosilca
393657ee26 Initialize the sndbuf in all cases. Do not forget to initialize the
tree used in each of the broadcast functions.

This commit was SVN r12332.
2006-10-27 00:13:33 +00:00
George Bosilca
126a68dc9a Big datatype commit. Remove all unused features of the datatype engine. As the memory
allocation logic is done completely outside the data-type engine (in the PML), there is
no need for any special case inside the data-type engine. There are fewer arguments for
the ompi_convertor_pack and ompi_convertor_unpack as well (the last field, free_after, is
no longer required as no memory is allocated in the engine itself). This change
affects all components using datatypes. I tested most of them, but I might have
missed some ... If that's the case please let me know (don't shoot the pianist!!).

This commit was SVN r12331.
2006-10-26 23:11:26 +00:00
George Bosilca
a1a4f7c422 Reset the segment pointer once we release the self fragment.
This commit was SVN r12330.
2006-10-26 23:07:14 +00:00
George Bosilca
be8516e0d7 More indentation fixes.
This commit was SVN r12329.
2006-10-26 23:06:15 +00:00
George Bosilca
83dfd36c1f Indentation fixes.
This commit was SVN r12328.
2006-10-26 23:05:41 +00:00
George Bosilca
91ab093e96 Cleanup. No extern required for the function prototypes.
This commit was SVN r12327.
2006-10-26 23:03:12 +00:00
George Bosilca
ba3c247f2a Big collective commit. I lightly tested it, but I think it should be quite stable. Anyway,
the default decision functions (for broadcast, reduce and barrier) are based on a
high performance network (not TCP). It should give good performance (really good) for
any network having the following characteristics: small latency (5 microseconds) and good
bandwidth (more than 1Gb/s).
+ Cleanup of the reduce algorithms, plus 2 new algorithms (binary and binomial). Now most
  of the reduce algorithms use a generic tree based function for completing the reduce.
+ Added macros for computing the trees (they are used for bcast and reduce right now).
+ Allow the usage of all 5 topologies.
+ Jelena's implementation of a binary tree that can be used for non-commutative operations.
  Right now only the tree building function is there; it will get activated soon.
+ Some other minor cleanups.

This commit was SVN r12326.
2006-10-26 22:53:05 +00:00
Andrew Friedley
c752502dee Fix for a common race condition when running the Sandia mt_send_recv.cc test.
A segfault would occur in mca_pml_ob1_recv_request_progress() when trying to prepare the convertor for unpacking, because the request's req_proc field was NULL.

Turns out that we weren't setting the req_proc field in the MCA_PML_OB1_CHECK_SPECIFIC_AND_WILD_RECEIVES_FOR_MATCH macro.  Instead of just setting it there I removed the other place req_proc was being set correctly, and instead took care of all the cases at once in mca_pml_ob1_recv_frag_match().

This commit was SVN r12323.
2006-10-26 19:09:39 +00:00
Gleb Natapov
90be664b9f Some process_pending() functions get the bml_btl on which a resource was freed as a
parameter. As an optimisation, only this BTL is used to send the packet
instead of trying to send packets through all BTLs. But the code was
actually wrong. It simply used the provided bml_btl, which may represent a different
endpoint from the packet's destination. The fixed code checks whether the packet's
destination is reachable through the BTL, finds the appropriate bml_btl and only
then tries to send it through the correct bml_btl.

This commit was SVN r12319.
2006-10-26 13:21:47 +00:00
Terry Dontje
7259d1b512 Adjust allocation size to be a quantity divisible by sizeof(size_t). This
is done to assure alignment so strictly aligned CPUs (like SPARC) do not
sigbus.  This may benefit other platforms too.

This commit fixes trac:494.

This commit was SVN r12312.

The following Trac tickets were found above:
  Ticket 494 --> https://svn.open-mpi.org/trac/ompi/ticket/494
2006-10-25 18:22:38 +00:00
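The size adjustment described in r12312 amounts to rounding a byte count up to the next multiple of `sizeof(size_t)`. A minimal sketch, assuming a hypothetical helper name `example_round_up` (the actual function and call site in Open MPI are not shown in the log):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of sizeof(size_t), so that
 * consecutive allocations of this size stay size_t-aligned. */
size_t example_round_up(size_t size)
{
    const size_t align = sizeof(size_t);
    /* Guard against wrap-around for sizes close to SIZE_MAX. */
    assert(size <= SIZE_MAX - align);
    const size_t rem = size % align;
    return (0 == rem) ? size : size + (align - rem);
}
```

On a platform where `sizeof(size_t)` is 8, a request for 13 bytes becomes 16, while a request for 16 bytes is left unchanged.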
Sven Stork
f3f39e003e - Increment the pipeline depth before we trigger the send function. As
  mentioned in the comment, the completion/callback of the triggered
  send operation can happen before the call returns. If this happens and
  if the pipeline depth is 0 before we triggered the send operation and
  this is the last send operation of the request, then the completion detection
  code will decrement the pipeline depth and check it for equality to 0.
  Because (0-1) != 0, the pml completion function for this request will
  *not* be called.
  This is part 2 of the fix for ticket #246.

This commit was SVN r12292.
2006-10-25 08:52:39 +00:00
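The ordering issue fixed in r12292 can be modelled in a few lines. This is a sketch under simplifying assumptions: the struct and function names are hypothetical, and the transport is assumed to complete every send synchronously, which is the worst case the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified send request. */
typedef struct {
    int  pipeline_depth;   /* sends currently in flight */
    int  sends_remaining;  /* sends not yet scheduled */
    bool completed;        /* set once the request is finished */
} example_send_request_t;

/* Completion callback; it may run before example_send() returns. */
void example_send_complete(example_send_request_t *req)
{
    req->pipeline_depth--;
    if (0 == req->pipeline_depth && 0 == req->sends_remaining) {
        req->completed = true;
    }
}

/* Hypothetical transport that completes every send immediately. */
void example_send(example_send_request_t *req)
{
    example_send_complete(req);
}

/* Schedule one send. The depth is incremented BEFORE the send is
 * triggered; incrementing afterwards would let the callback see
 * 0 - 1 = -1 and never report completion. */
void example_schedule_one(example_send_request_t *req)
{
    assert(req->sends_remaining > 0);
    req->sends_remaining--;
    req->pipeline_depth++;
    example_send(req);
}
```

With a request holding one remaining send, `example_schedule_one` leaves `pipeline_depth` at 0 and `completed` set, even though the callback ran inside the send call.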
Sven Stork
3563f15fde - Fix a bug in the descriptor handling code. The self BTL was mixing the different
  kinds of descriptors (e.g. putting an rdma descriptor in the eager free-list).
  This is part 1 of the fix for ticket #246.

This commit was SVN r12291.
2006-10-25 08:45:29 +00:00
George Bosilca
99631ccf66 Cleanups.
This commit was SVN r12272.
2006-10-23 22:29:17 +00:00
George Bosilca
d7d3f9e486 Tuned collectives work only for at least 2 processes. We have the self module
for the other cases.

This commit was SVN r12271.
2006-10-23 22:28:56 +00:00
George Bosilca
b848a5ad06 Remove all ompi_coll_chain_t references.
This commit was SVN r12269.
2006-10-23 21:47:50 +00:00
George Bosilca
39cd8d3d17 One to rule them all. We only need one kind of topology information: a tree. How we
build it is what makes the difference.

This commit was SVN r12268.
2006-10-23 21:46:30 +00:00
George Bosilca
9cf3040e5f Allocate enough memory for the reduce operation when MPI_IN_PLACE is specified.
This commit was SVN r12260.
2006-10-23 17:51:36 +00:00
George Bosilca
6b697ad3dd If the operation is not commutative then force the basic reduce algorithm. The others
cannot be used for non-commutative operations ... yet ...

This commit was SVN r12241.
2006-10-20 22:11:44 +00:00
George Bosilca
a7b6078b73 No more segfault. Still some wrong data around ...
This commit was SVN r12238.
2006-10-20 20:17:34 +00:00
George Bosilca
02759cf515 Update the reduce chain collective.
This commit was SVN r12237.
2006-10-20 19:47:52 +00:00
George Bosilca
d7268557a8 Complete the SM BTL changes. Now all displacements are ptrdiff_t and there are
no warnings about any issue with signed/unsigned.

This commit was SVN r12234.
2006-10-20 19:28:12 +00:00
George Bosilca
c86214f420 Fix the SM BTL issues. The problem seems to come from the fact that
the maximum number of nodes on the SM file should be signed, as we use
the -1 to unlimit it.

This commit was SVN r12227.
2006-10-20 17:25:53 +00:00
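To illustrate why the limit in r12227 needs a signed type, here is a minimal sketch of a "-1 means unlimited" check. The function and parameter names are hypothetical; with an unsigned limit, -1 would silently become a very large positive value and the `< 0` test could never fire.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capacity check: max_nodes == -1 (any negative value
 * in this sketch) means no limit. Keeping max_nodes signed is what
 * makes the sentinel test meaningful. */
bool example_can_add_node(int current_nodes, int max_nodes)
{
    assert(current_nodes >= 0);
    if (max_nodes < 0) {
        return true;    /* unlimited */
    }
    return current_nodes < max_nodes;
}
```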
George Bosilca
06563b5dec Last set of explicit conversions. We are now close to zero warnings on
all platforms. The only exceptions (and I will not deal with them
anytime soon) are on Windows:
- the write functions which require the length to be an int when it's
  a size_t on all UNIX variants.
- all iovec manipulation functions where the iov_len is again an int
  when it's a size_t on most of the UNIXes.
As these only happen on Windows, I think we're set for now :)

This commit was SVN r12215.
2006-10-20 03:57:44 +00:00
George Bosilca
527bb7a197 Remove a double ;
This commit was SVN r12213.
2006-10-20 03:28:51 +00:00
George Bosilca
caefd6d0ee Do not leak memory. Allocate the intermediary buffer only when we really need it
(not on leaves) and release it the same way.

This commit was SVN r12200.
2006-10-19 22:20:33 +00:00
George Bosilca
26b33ec2d7 If there is just one node, we don't need a decision function, just do the copy
and return.

This commit was SVN r12199.
2006-10-19 22:19:36 +00:00
George Bosilca
3eb2f90ceb For recursive doubling, correctly compute the closest power-of-2 number of
nodes.

This commit was SVN r12191.
2006-10-19 17:14:57 +00:00
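A minimal sketch of such a computation follows, assuming that "closest" here means the largest power of two that does not exceed the number of nodes, which is the value recursive doubling schemes commonly use for the core exchange. The function name is hypothetical and the actual code from r12191 is not shown in the log.

```c
#include <assert.h>

/* Largest power of two that is <= nprocs (nprocs must be positive).
 * The loop compares against nprocs / 2, so p never overflows. */
int example_pow2_floor(int nprocs)
{
    assert(nprocs > 0);
    int p = 1;
    while (p <= nprocs / 2) {
        p *= 2;
    }
    return p;
}
```

For 5, 6 or 7 nodes this yields 4; the remaining nodes would then need a separate pre- or post-step, which is outside this sketch.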
George Bosilca
041fcb8d18 Update the barrier decision function.
This commit was SVN r12190.
2006-10-19 17:14:01 +00:00
George Bosilca
18d119bc06 No more warnings.
This commit was SVN r12181.
2006-10-18 21:10:11 +00:00
Galen Shipman
2036bf5c3c make smart and dumb compilers happy
This commit was SVN r12178.
2006-10-18 19:33:39 +00:00
George Bosilca
c9da782804 Keep only one function to get the size of a datatype.
This commit was SVN r12170.
2006-10-18 17:33:01 +00:00
George Bosilca
3db5c0487d typos.
This commit was SVN r12168.
2006-10-18 17:12:25 +00:00
George Bosilca
21ade43b96 Remove an unreachable statement.
This commit was SVN r12166.
2006-10-18 16:43:55 +00:00
Rainer Keller
47b24a0603 - Now the branch is done, linearize access regarding
request handling.  Buys a little bit on IMB, no
   functional change, otherwise.

This commit was SVN r12165.
2006-10-18 16:11:50 +00:00
Gleb Natapov
252a9cea34 Fix bug in vma rcache.
This commit was SVN r12163.
2006-10-18 10:55:01 +00:00
George Bosilca
be27ee6fa0 Correct the bcast problem where we always did a bcast with a segsize of 0.
Activate the reduce decision function.
Other small updates (mostly TAB to spaces).

This commit was SVN r12161.
2006-10-18 02:00:46 +00:00
George Bosilca
640178c4b3 Grepping through the source files I found these calls to the data-type engine
with the wrong type of arguments.

This commit was SVN r12148.
2006-10-17 21:05:04 +00:00
George Bosilca
6f5ec2390b pedantic...
This commit was SVN r12147.
2006-10-17 20:25:40 +00:00
George Bosilca
8852c00c36 Looks like a big commit, but in fact it addresses only one issue: the way we work with
the size and displacement of data-types. After this patch all data can contain size_t bytes
and the displacements are defined as ptrdiff_t. All of the files I was able to compile
have been modified to match this requirement.

This commit was SVN r12146.
2006-10-17 20:20:58 +00:00
Andrew Friedley
16769e64fe Remove old UD BTL code.
The UD BTL isn't gone - the latest version is in my afriedle-ud branch.  This version on the trunk was very old, ompi_ignore'd, lacked performance, and probably contained bugs.  The maintained version on my branch is working solidly, and will eventually come back, but not for v1.2.

This commit was SVN r12144.
2006-10-17 18:59:21 +00:00
Rainer Keller
668902c780 - trivial spelling
This commit was SVN r12139.
2006-10-17 16:34:52 +00:00
George Bosilca
ed83927025 Don't reset the convertor when a persistent request completes. Instead reset it
the next time the request is used. This will keep the execution path of the default
case (not persistent) shorter.

This commit was SVN r12134.
2006-10-17 05:01:47 +00:00
George Bosilca
ef66afe45c Another inner loop optimization. Only check for num_fails when prev_bytes is
equal to num_bytes.

This commit was SVN r12133.
2006-10-17 04:38:38 +00:00
Pavel Shamis
d64cb58007 Tavor HCA vendor_id was changed to the correct one: 23108
Reviewed by: Jeff

This commit was SVN r12127.
2006-10-16 14:46:39 +00:00
Brian Barrett
14f338b7df Fix for lock/unlock epoch issues. Previously, we did not handle the case
where a window was in both the passive and active side of a lock sequence.

Refs trac:488

This commit was SVN r12112.

The following Trac tickets were found above:
  Ticket 488 --> https://svn.open-mpi.org/trac/ompi/ticket/488
2006-10-12 22:52:13 +00:00
Gleb Natapov
afa26fc4f9 Remove empty lines at the end of the file.
This commit was SVN r12083.
2006-10-11 11:38:38 +00:00
Galen Shipman
7102d39415 Use OMPI_FREE_LIST_GET, not the blocking OMPI_FREE_LIST_WAIT; this allows GM
to take advantage of PML OB1 resource management. Tested with the Intel test
suite p2p_c on 4 nodes.

This commit was SVN r12075.
2006-10-10 14:56:55 +00:00
George Bosilca
179067dfb5 Correct a type that breaks the PSM build.
This commit was SVN r12069.
2006-10-09 23:14:22 +00:00
Brian Barrett
51b2a0fd3f A couple of changes to improve shared memory behavior when resources get
constrained:

  * Make sure we always have a number of eager fragments available
    that scales with the number of processes communicating with
    a given proc over shared memory
  * Use FREE_LIST_GET instead of FREE_LIST_WAIT to return an
    error to the PML when resource exhaustion occurs
  * Don't dereference the frag during alloc unless we're sure
    it's not NULL

Reviewed by: Galen

Refs trac:413

This commit was SVN r12053.

The following Trac tickets were found above:
  Ticket 413 --> https://svn.open-mpi.org/trac/ompi/ticket/413
2006-10-06 21:13:49 +00:00
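The second and third points of r12053 (return an error on exhaustion, and never touch a NULL fragment) can be sketched as follows. This is a toy free list with hypothetical names and error codes; it does not reproduce the OMPI_FREE_LIST_GET macro or its real signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch. */
#define EXAMPLE_SUCCESS              0
#define EXAMPLE_ERR_OUT_OF_RESOURCE (-1)

typedef struct example_frag {
    struct example_frag *next;
} example_frag_t;

typedef struct {
    example_frag_t *head;
} example_free_list_t;

/* Non-blocking get: returns NULL instead of waiting when the
 * list is empty. */
example_frag_t *example_free_list_get(example_free_list_t *list)
{
    example_frag_t *frag = list->head;
    if (NULL != frag) {
        list->head = frag->next;
        frag->next = NULL;
    }
    return frag;
}

/* Allocation path: report exhaustion to the caller (the PML in the
 * commit message) and only use the fragment once it is known to be
 * non-NULL. */
int example_alloc(example_free_list_t *list, example_frag_t **out)
{
    assert(NULL != list && NULL != out);
    example_frag_t *frag = example_free_list_get(list);
    *out = frag;
    if (NULL == frag) {
        return EXAMPLE_ERR_OUT_OF_RESOURCE;
    }
    return EXAMPLE_SUCCESS;
}
```

Returning an error lets the upper layer queue the operation and retry later, rather than blocking inside the transport.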
George Bosilca
b27f1814c6 If the function is expected to return a bool then let's return only
true or false.

This commit was SVN r11991.
2006-10-05 05:10:34 +00:00
George Bosilca
4689c56210 Always cast the return of malloc.
This commit was SVN r11990.
2006-10-05 05:07:43 +00:00
George Bosilca
e4df4285b1 Reorder the enum in order to allow some compilers to optimize the big switch in
the header analysis.

This commit was SVN r11975.
2006-10-04 20:03:28 +00:00
Andrew Friedley
836261b85a Fixes ticket 186.
First, move the OPAL_THREAD_LOCK out to the same level as its corresponding UNLOCK.  It was possible to hit the UNLOCK without ever acquiring the lock.

Since the OPAL_THREAD_ADD64() is now protected by this lock, we can just do the decrement non-atomically.

This commit was SVN r11958.
2006-10-03 18:15:26 +00:00
Andrew Friedley
1177844d7a Fixes trac:183.
Don't try to acquire ompi_request_lock here, which in all cases is already held.  Avoids deadlock that occurs even when threads are enabled and we're running a THREAD_SINGLE app.

Reviewed by Galen.

This commit was SVN r11957.

The following Trac tickets were found above:
  Ticket 183 --> https://svn.open-mpi.org/trac/ompi/ticket/183
2006-10-03 18:08:48 +00:00