1
1
Граф коммитов

2527 Коммитов

Автор SHA1 Сообщение Дата
Jelena Pjesivac-Grbovic
eae3df4904 Updated broadcast decision function based on MX results up to 64 nodes.
(The previous decision function did not consider binomial algorithm (since we did not have it at the time)).

This commit was SVN r13007.
2007-01-06 00:37:40 +00:00
Jeff Squyres
1bdf883277 Oops --- don't need those header files.
This commit was SVN r13000.
2007-01-05 00:08:16 +00:00
Jeff Squyres
fc3637a7c6 Take out the checks for NULL and STATUS[ES]_IGNORE -- it turns out
that a) STATUS[ES]_IGNORE *is* NULL, and b) ROMIO blindly sends its
status through to STATUS_SET_ELEMENTS, even if the status is IGNORE.
So we have some legal cases where IGNORE can be passed through here.

Well, that's what we get for trying to do good error checking.  :-(

This commit was SVN r12999.
2007-01-04 23:03:36 +00:00
Jeff Squyres
75df4ca602 Minor fixes for MPI-level aborting:
- Fix some fpritnf's in ompi_mpi_abort() that incorrectly assumed that
  we were always being invoked from MPI_ABORT (ompi_mpi_abort() may be
  invoked from a bunch of different places)
- Also try to opal_backtrace_print() if opal_bactrace_buffer() is not
  supported. 
- Print a message in MPI_ABORT if we're aborting.

This commit was SVN r12998.
2007-01-04 22:30:28 +00:00
Brian Barrett
48ec0b2071 Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix
for now...

This commit was SVN r12997.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c
2007-01-04 22:07:37 +00:00
Galen Shipman
d207a6c988 endpoint should use a uint64_t for subnet, as everyone else does.. makes bad
things happen when packing into a 64 bit buffer... 

Misc cleanup.. 

This commit was SVN r12993.
2007-01-04 20:25:28 +00:00
Brian Barrett
936fdd2ae1 remove some code that accidently came in with r12974. Refs trac:587
This commit was SVN r12991.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-04 20:17:07 +00:00
Galen Shipman
931a389c4f fix deadlock on rendezvous protocol..
This commit was SVN r12982.
2007-01-04 03:46:11 +00:00
Galen Shipman
f12bbe0591 Handle different subnets correctly and multiple nic endpoint negotiation
This is somewhat limited currently for expample,  if you have 3 ports on Node A and 5 ports
on Node B then the peers will use 3 ports to communicate with each other. 
This is on a subnet basis, so for any pair of nodes we take the
intersection of the available ports within a subnet.

We use subnets to determine reachability for lazy connection establishment. So
if Node A and Node B each have two HCA's (on seperate networks) then the
subnet's must be distinct, otherwise we will try to wire up HCA's on seperate
networks.  

This commit was SVN r12978.
2007-01-03 22:35:41 +00:00
Brian Barrett
7cac26d240 * fix some typos that slipped in with r12974. Refs trac:587
This commit was SVN r12976.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 20:14:45 +00:00
George Bosilca
1be60e802f Add the DECLSPEC.
This commit was SVN r12975.
2007-01-03 19:59:18 +00:00
Brian Barrett
27cea44a9c Fix a number of issues with the ompi_ptr_t:
* Make sure that the pval always writes to the correct portion of the
    lval.  This only matters on 32 bit big endian machines.
  * On 32 bit machines when assigning to pval, the other 4 bytes of lval
    weren't being written, which could lead to bogus data

We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes.  Refs trac:587

This commit was SVN r12974.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 19:47:48 +00:00
Gleb Natapov
a6127fd8ce Increase req_bytes_delivered atomically.
This commit was SVN r12971.
2007-01-03 15:19:34 +00:00
Gleb Natapov
79202561f6 Don't check req_pipeline_depth on frag completion. Checking of
req_bytes_delivered should be enough.

This commit was SVN r12967.
2007-01-03 14:44:20 +00:00
Gleb Natapov
1ad6c41735 Sender can start scheduling send fragments immediately after receiving ACK. No
need to wait for RNDV completion.

This commit was SVN r12965.
2007-01-03 12:37:11 +00:00
Rich Graham
8a9da02063 change code to conform with coding standard.
Handle error condition where shared memory file is not created.

This commit was SVN r12964.
2007-01-03 00:06:02 +00:00
Donald Kerr
899297c8f4 udapl btl was not compiling after r12878 on 12/17/2006, some minor changes to allow btl to compile
This commit was SVN r12963.

The following SVN revision numbers were found above:
  r12878 --> open-mpi/ompi@190e7a27cd
2007-01-02 21:44:12 +00:00
Jeff Squyres
ea2d49e55d Because of r12712, we actually need to allocate one ''more'' item than
the value of __n holds.  This is not a problem in the first case
because sizeof(int) == sizeof(MPI_Flogical), so no alloc is actually
performed (which is most compilers, and why we haven't been bitten by
this yet).  But the second case -- where sizeof(int) !=
sizeof(MPI_Flogical) -- is definitely a problem and needs the "+1" in
the alloc, or Bad Things will happen.

This commit was SVN r12953.

The following SVN revision numbers were found above:
  r12712 --> open-mpi/ompi@3e11c76d4c
2007-01-02 16:16:29 +00:00
George Bosilca
3cde822d98 Add missing headers.
This commit was SVN r12951.
2007-01-02 08:04:34 +00:00
George Bosilca
d8dee3a740 If the MX driver was unable to load correctly, or if the endpoint was not
created then don't try to call the MX endpoint close function.

This commit was SVN r12950.
2007-01-02 00:01:50 +00:00
Rich Graham
6cb2377015 Change the allocation of the shared memory backing file. The file
is allocated on a per comm_world instance, with the lowest rank
in comm_world on the given host creating and initializing the file,
and then notifying the remaining files via the OOB.

Reviewed: Ralph Castain, Brian Barrett
Addressing ticket #674.

This commit was SVN r12949.
2007-01-01 02:39:02 +00:00
George Bosilca
e223b27268 A fragment is marked completed by the PML when the peer signal the
completion of the RDMA operation associated with the fragment. The
PML will call the BML free which in turn will call the BTL free. The MX 
BTL will not release the fragment if it not tagged with 0xff.

This commit was SVN r12947.
2006-12-31 03:17:47 +00:00
Brian Barrett
2951586b43 Fix type casting issue with MPI_ERRCODES_IGNORE that would cause
errors when using a C++ compiler.  

This commit was SVN r12946.
2006-12-31 02:51:13 +00:00
Brian Barrett
1ba97181dc A number of MPI-2 compliance fixes for the C++ bindings:
* Added Create_errhandler for MPI::File
  * Make errors_throw_exceptions a first-class predefined exception
    handler, and make it work for Comm, File, and Win
  * Deal with error handlers and attributes for Files, Types, and Wins
    like we do with Comms - can't just cast the callbacks from C++
    signatures to C signatures.  Callbacks will then fire with the
    C object, not the C++ object.  That's bad.

Refs trac:455

This commit was SVN r12945.

The following Trac tickets were found above:
  Ticket 455 --> https://svn.open-mpi.org/trac/ompi/ticket/455
2006-12-30 23:41:42 +00:00
George Bosilca
47601e315e Allow the MX BTL to select at runtime if the unexpected handler will
be activated or not.

This commit was SVN r12944.
2006-12-30 20:57:50 +00:00
Brian Barrett
4e157380bf Heterogeneous support changes:
* Add line about heterogeneous support to ompi_info output
  * Print warning and abort if heterogeneous detected and 
    no heterogeneous support available.

Refs trac:587

This commit was SVN r12943.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2006-12-30 17:13:18 +00:00
Brian Barrett
99c0a29602 Disable CM and DR PMLs in heterogeneous situtations as neither are
heterogeneous safe.

Refs trac:587

This commit was SVN r12942.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2006-12-30 16:17:56 +00:00
George Bosilca
d401a65975 Minor cleanups. Don't set the fields that will never be used.
This commit was SVN r12941.
2006-12-29 07:55:17 +00:00
George Bosilca
0b5d879a63 ompi_convertor_pack do not return errors (all checkings are done when the
convertor is created).

This commit was SVN r12940.
2006-12-29 07:40:02 +00:00
George Bosilca
d8db9e49f3 Set the bml_btl to NULL or segfault !!!
This commit was SVN r12939.
2006-12-29 07:38:24 +00:00
Brian Barrett
c010119667 If a BTL isn't needed due to exclusivity ranking, need to call a matching
inuse decrement for the increment that was at the start of the procs loop.
Otherwise, the inuse count can end up higher than it actually is and a btl
can end up in the progress loop when it isn't active to any peer.

Refs trac:543

This commit was SVN r12938.

The following Trac tickets were found above:
  Ticket 543 --> https://svn.open-mpi.org/trac/ompi/ticket/543
2006-12-29 02:22:40 +00:00
George Bosilca
416e5b5f6a Enable the MX extensions if and only if the mx_extensions.h header
is installed on the system.

This commit was SVN r12937.
2006-12-29 00:31:32 +00:00
George Bosilca
d7bc180a90 The max allocated tag is not 16. Use the define instead.
This commit was SVN r12936.
2006-12-28 22:48:58 +00:00
George Bosilca
3eeecc3838 Add support for faster small messages. While sending a message, we check if
the data was buffered by the MX library. If it's the case then we declare
the send as completed and disable the completion event for the mx request.

This commit was SVN r12935.
2006-12-28 22:34:24 +00:00
Brian Barrett
f191a042f6 Fix compile errors in heterogeneous code when not building heterogeneous
support.

Refs trac:701

This commit was SVN r12932.

The following Trac tickets were found above:
  Ticket 701 --> https://svn.open-mpi.org/trac/ompi/ticket/701
2006-12-28 20:28:14 +00:00
George Bosilca
b996c00d1a Set the limits for the MX fragments to 4K. Add code to dump the state of the MX
hardware (not activated).

This commit was SVN r12931.
2006-12-28 08:40:37 +00:00
George Bosilca
3903009b8b Add a check for the unexpected handler. If enabled, allow the zero-copy
protocol over the MX BTL. Now, we have only one matching, the one in Open
MPI.

The problem is that when the unexpected handler is triggered, not all the
message is on the host memory. In the best case we get one MX fragment (internal
MX fragment), in the worst we get NULL. The only way to fit this with the
design of the PML is to force the eager protocol at the MX internal fragment
size, and to limit the send/receive protocol at the same size. Tests show
the outcome is not far from optimal (if the pipeline depth is increased
a little bit).

Set MX_PIPELINE_LOG in order to allow MX to use internal fragments of 4K.

This commit was SVN r12930.
2006-12-28 03:35:41 +00:00
George Bosilca
ff2319dcb7 Complete the OUT protocol. Small latency improvements. Some minor cleanups.
Create some macros, reorder some functions. Make sure all fragments are
correctly released at the end.

This commit was SVN r12926.
2006-12-26 18:15:24 +00:00
George Bosilca
75a35ed7ee Implement the PUT protocol over MX. The send/receive approach give the best
performance on a 2G Myrinet card, as it look like pipelining the messages
by 1M is faster than a simple send/receive. However, when using a 10G card
the send/receive will limit the maximum bandwidth to 2.5Gbs. The reason is
the scarce bus resources that have to be shared between the Myrinet hardware
and the memcpy operation. The PUT protocol remove the memcpy, we now have a 
true zero-copy mechanism. But, there is no pipelining yet as it look like the
RDMA pipeline somehow disappeared from the OB1 PML ...

This commit was SVN r12925.
2006-12-24 22:52:46 +00:00
George Bosilca
e8bd985870 Add more output when calls to the MX library fails.
Move the connection status from theproc into the endpoint.

This commit was SVN r12924.
2006-12-24 22:34:48 +00:00
George Bosilca
14dc72f595 Allow the user to change the MX flags.
This commit was SVN r12923.
2006-12-24 22:21:00 +00:00
George Bosilca
dbe2798638 Allow MX to handle shared memory and self communications. By default these features
are disabled (btl_mx_shared_mem respectively btl_mx_self have to be set in order
to activate them).

This commit was SVN r12922.
2006-12-24 22:18:41 +00:00
Jelena Pjesivac-Grbovic
3494e1bb05 - Updated decision function for Alltoall collective.
Fixes "jump" for intermediate sizes message on 24+ number of nodes
    (at least on Grig cluster).

This commit was SVN r12920.
2006-12-22 19:59:17 +00:00
George Bosilca
b1725e02d4 No more warnings plus some code reordering.
This commit was SVN r12919.
2006-12-21 22:42:15 +00:00
Edgar Gabriel
dc532577db Adding more accurate checking of the input parameters for the
add_error_class and add_error_code files. Also fixed the update of the
lastusedcode attribute, all of work according to my tests pretty fine.

Please note: the testcode attached to the bug 683 still reports some bugs. I
am however pretty sure that the testcode is wrong at that points:
 - the standard says that the attribute MPI_LASTUSEDCODE has to be updated for
 a new error_class or a new error_code. The test currently assumes, that only
 the add_error_code call changes the attribute value.
 - you have to comment out the two lines 73 and 74 in order to make the
 test finish, since these lines check for the error string of non-existent
 codes.
- line 126 the error-string of MPI_ERR_ARG is not "invalid argument" but a
little bit more, so the test thinks the output is wrong. So probably the test
has to be update to match the according error string of MPI_ERR_ARG.

Fixes trac:682

This commit was SVN r12913.

The following Trac tickets were found above:
  Ticket 682 --> https://svn.open-mpi.org/trac/ompi/ticket/682
2006-12-21 19:36:31 +00:00
Jelena Pjesivac-Grbovic
f1aec23507 Adding tuned allgather implementation.
It contains four algorithms: 
Bruck (ciel(logP) steps), Recursive Doubling (log(P) for power-of-2 processes), Ring (P-1 steps),
and Neighbor Exchange (P/2 steps for even number of processes).

All algorithms passed occ, IMB-2.3, and intel verification tests from ompi-tests/ for up to 56 processes.
The fixed decision function is based on results collected over MX on the Grig cluster at 
the University of Tennessee at Knoxville.  
I have also added (and commented out) copy of MPICH2 decision function for allgather
(from their IJHPCA 2005 paper).

This commit was SVN r12910.
2006-12-21 18:40:02 +00:00
Brian Barrett
7880353fcc Need to close every endpoint we open, or the MX progress thread doesn't die,
which can cause segfaults on shutdown.  Calling mx_finalize() isn't enough
to shutdown the thread, so must close endpoints as well.

Refs trac:513

This commit was SVN r12908.

The following Trac tickets were found above:
  Ticket 513 --> https://svn.open-mpi.org/trac/ompi/ticket/513
2006-12-21 18:13:22 +00:00
Galen Shipman
faa0bafa96 modify preconnect to use a rotating ring algorithm, OOB connections are
brought up lazily so we want to be a bit less agressive. 

This commit was SVN r12906.
2006-12-21 01:36:57 +00:00
Gleb Natapov
484c6a2c1a Use OPAL_ALIGN() macro to align length. Return address from mpool_alloc is now
properly aligned so no need to align it once more.

This commit was SVN r12899.
2006-12-19 08:34:48 +00:00
Brian Barrett
2ab65eb521 Remove some debugging output that was #if 0'ed out but shouldn't have been
committed into the trunk anyway

This commit was SVN r12897.
2006-12-19 02:34:41 +00:00
Gleb Natapov
b910d10a81 Add general functions for alignment and change rdma_mpool_align to always
honor an alignment event if posix_memalign() is not available.

This commit was SVN r12892.
2006-12-18 10:52:18 +00:00
Brian Barrett
c554638446 Support systems without malloc.h or posix_memalign (ie, pretty much every
one we support that isn't Linux)

This commit was SVN r12880.
2006-12-17 17:28:59 +00:00
Brian Barrett
b448b4e47e More heterogeneous fixes. Don't set reachability bit on a remote proc
if the remote architecture differs from the local architecture and the
btl doesn't support heterogeneous transport.

Refs trac:587

This commit was SVN r12879.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2006-12-17 17:27:08 +00:00
Gleb Natapov
190e7a27cd Merge with gleb-mpool branch. All RDMA components use same mpool now (rdma).
udapl/openib/vapi/gm mpools a deprecated. rdma mpool has parameter that allows
to limit its size mpool_rdma_rcache_size_limit (default is 0 - unlimited).

This commit was SVN r12878.
2006-12-17 12:26:41 +00:00
Brian Barrett
f1fdd7c041 Handle case where remote process is of different architecture than the local
process when creating a datatype from an internal description.

Refs trac:640

This commit was SVN r12877.

The following Trac tickets were found above:
  Ticket 640 --> https://svn.open-mpi.org/trac/ompi/ticket/640
2006-12-17 04:39:16 +00:00
Brian Barrett
0653dc3f24 Pad headers to eliminate heterogeneous issues. Add conversion functions
for switching endianness of headers.  Galen is going to add the code to
use the endian stuff...

This commit was SVN r12876.
2006-12-17 00:50:59 +00:00
Brian Barrett
01e8fc5f91 Redo of r12871, without the preconnect code change:
Move the req_mtl structure back to the end of each of the structures in 
the CM PML. The req_mtl structure is cast into a mtl_*_request_structure 
for each MTL, which is larger than the req_mtl itself. The cast will cause
the *_request to overwrite parts of the heavy requests if the req_mtl
isn't the *LAST* thing on each structure (hence the comment). This was 
moved as an optimization at some point, which caused buffer sends to fail...

Refs trac:669

This commit was SVN r12873.

The following SVN revision numbers were found above:
  r12871 --> open-mpi/ompi@597598b712

The following Trac tickets were found above:
  Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
2006-12-15 17:54:14 +00:00
Brian Barrett
bdf0b231b2 Undo r12871, as it contained some code in ompi/runtime that shouldn't have been
committed

Refs trac:669

This commit was SVN r12872.

The following SVN revision numbers were found above:
  r12871 --> open-mpi/ompi@597598b712

The following Trac tickets were found above:
  Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
2006-12-15 17:52:13 +00:00
Brian Barrett
597598b712 Move the req_mtl structure back to the end of each of the structures in the
CM PML.  The req_mtl structure is cast into a mtl_*_request_structure for
each MTL, which is larger than the req_mtl itself.  The cast will cause
the *_request to overwrite parts of the heavy requests if the req_mtl
isn't the *LAST* thing on each structure (hence the comment).  This was
moved as an optimization at some point, which caused buffer sends to
fail...

Refs trac:669

This commit was SVN r12871.

The following Trac tickets were found above:
  Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
2006-12-15 17:46:53 +00:00
Rainer Keller
b99e5a71d1 - Help message in case of MPI-application with two init or
calling init functions after finalize.

This commit was SVN r12858.
2006-12-14 19:58:04 +00:00
Brad Benton
18da4c40d3 Set the QP's static rate from the associated MCA parameter, rather
than just defaulting to 0.

Fixes trac:675

This commit was SVN r12855.

The following Trac tickets were found above:
  Ticket 675 --> https://svn.open-mpi.org/trac/ompi/ticket/675
2006-12-14 19:42:24 +00:00
Brian Barrett
38c2e43ac2 Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues...
This commit was SVN r12852.
2006-12-14 18:20:43 +00:00
Jeff Squyres
0ca8cb35b7 Fixes trac:366
Add ability for ini files to recognize "use_eager_rdma" flag.  Set the
default to "no" (because we should assume that HCAs cannot support the
property necessary for using RDMA for eager messages -- that the last
byte of the message is guaranteed to be written to memory last --
unless proven otherwise.  For example, iWARP cards apparently do not
provide this guarantee), and then set all Mellanox and IBM HCAs to
override the default to enable this behavior on these cards.

This commit was SVN r12851.

The following Trac tickets were found above:
  Ticket 366 --> https://svn.open-mpi.org/trac/ompi/ticket/366
2006-12-14 15:52:13 +00:00
Dan Lacher
e3f749acc4 Ticket: #673
Submitted by: Dan Lacher

This commit was SVN r12844.
2006-12-13 20:01:16 +00:00
George Bosilca
80bc0c8868 Allow the MX to survive if we are unable to connect to a peer. The PML will
try to find another route.

This commit was SVN r12837.
2006-12-13 01:12:07 +00:00
Mohamad Chaarawi
cae083dec6 replaced the old CID allocation algorithm with the blocked algorithm. The
impace in the communicator directory is still not great since the interface
for allocating a Cid has not changed..

This commit was SVN r12836.
2006-12-12 22:01:39 +00:00
Brian Barrett
10af8ab454 Corrections for when threading is enabled.
Refs trac:564

This commit was SVN r12830.

The following Trac tickets were found above:
  Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564
2006-12-12 18:48:42 +00:00
Brad Benton
337116d5fd Added IBM eHCA vendor and part id info
This commit was SVN r12827.
2006-12-12 14:12:39 +00:00
Brian Barrett
cf196ce420 Instead of an unknown proc list that requires ownership transfer of data (which, in turn, requires a complex series of locks to be held during the transfer), use a modex backing store with backpointers from the proc to the backing store. The proc structures no longer own the modex data, which greatly simplifies locking when an unknown proc suddenly becomes known.
Refs trac:564

This commit was SVN r12822.

The following Trac tickets were found above:
  Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564
2006-12-11 21:27:30 +00:00
Ralph Castain
0a5d41857a Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects.
I found only two places that were looking at the tokens:

1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message.

2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was *always* the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes).

This commit was SVN r12813.
2006-12-09 23:10:25 +00:00
Jeff Squyres
e70ef98ea6 Update the help message to be a bit more specific and refer to the web
FAQ.

This commit was SVN r12812.
2006-12-09 15:13:03 +00:00
Jeff Squyres
c7282855e7 Fixes trac:659
This commit fixes several aspects regarding MPI conformance of requests.

 * Eliminate the last argument of ompi_errhandler_request_invoke(); we
   ''always'' want to invoke the back-end exception handler with the
   real error code.
 * Make it clear in comments that we only invoke the ''first''
   exception in a given array of requests, even if there's more than
   one request with a non-MPI_SUCCESS value for MPI_ERROR.
 * Defer the freeing of requests upon exception in the back-end
   functions to MPI_WAIT* and MPI_TEST* until later; the requests are
   kept so that we know what handler to invoke when we actually invoke
   the exception.  After figuring that out, ''then'' we free requests
   with pending exceptions on them.
 * Clean up return codes from the back-end MPI_TEST* and MPI_WAIT*
   functions.
 * Slightly modify ompi_errcode_get_mpi_code() to return unity if it
   receives an MPI error code (vs. an OMPI error code).

This commit was SVN r12810.

The following Trac tickets were found above:
  Ticket 659 --> https://svn.open-mpi.org/trac/ompi/ticket/659
2006-12-09 14:20:08 +00:00
Patrick Geoffray
58c6f8c8e1 Copyright update. Thanks to Jeff to remind me.
This commit was SVN r12803.
2006-12-07 23:55:00 +00:00
Patrick Geoffray
6e09b0c23f lval is not defined when pval is assigned on 32 bit systems. this
usually is ok on little-endian systems, as the upper 32 bits will likely
be ignored, but on 32-bit big-endian systems, lval is complete junk.
Use ival if 32 bit mode, lval if 64.

Mixing of 32 and 64 bit architectures won't work without more changes.

This commit was SVN r12802.
2006-12-07 23:34:04 +00:00
Brian Barrett
98884e45e4 Clean up the way procs are added to the global process list after MPI_INIT:
* Do not add new procs to the global list during modex callback or
    when sharing orte names during accept/connect.  For modex, we
    cache the modex info for later, in case that proc ever does get
    added to the global proc list.  For accept/connect orte name
    exchange between the roots, we only need the orte name, so no
    need to add a proc structure anyway.  The procs will be added
    to the global process list during the proc exchange later in 
    the wireup process
  * Rename proc_get_namebuf and proc_get_proclist to proc_pack
    and proc_unpack and extend them to include all information
    needed to build that proc struct on a remote node (which
    includes ORTE name, architecture, and hostname).  Change
    unpack to call pml_add_procs for the entire list of new
    procs at once, rather than one at a time.
  * Remove ompi_proc_find_and_add from the public proc
    interface and make it a private function.  This function
    would add a half-created proc to the global proc list, so
    making it harder to call is a good thing.

This means that there's only two ways to add new procs into the global proc list at this time: During MPI_INIT via the call to ompi_proc_init, where my job is added to the list and via ompi_proc_unpack using a buffer from a packed proc list sent to us by someone else.  Currently, this is enough to implement MPI semantics.  We can extend the interface more if we like, but that may require HNP communication to get the remote proc information and I wanted to avoid that if at all possible.

Refs trac:564

This commit was SVN r12798.

The following Trac tickets were found above:
  Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564
2006-12-07 19:56:54 +00:00
Brian Barrett
b07dfa7841 * remove unused variable in ompi_comm_get_rprocs
* don't load data into a buffer until we have the data, as
    the data contains some header information needed to
    properly load the data

This commit was SVN r12792.
2006-12-07 16:19:44 +00:00
Ralph Castain
a1153fdc8f Eliminate virtually all of the attribute_predefined data from the STG1 message. We now compute the total number of slots allocated to us and save that in the registry - the attributed_predefined then retrieves it via the STG1 message. The app_num is passed via the process_info structure, which gets the value from the ODLS in the environment.
Obviously, people like bproc will have to get the app_num via another avenue...but that's a problem for another day. Several options are easily available.

This commit was SVN r12788.
2006-12-07 03:11:20 +00:00
Brian Barrett
41a70a8f01 indent, this time with the right coding standards...
This commit was SVN r12787.
2006-12-07 00:24:01 +00:00
Brian Barrett
f9ec8d6f2a reindent file to make it easier to deal with...
This commit was SVN r12786.
2006-12-07 00:21:25 +00:00
Edgar Gabriel
1359ba9b13 Rewriting much of the errorcode and errorclass code, since
- we have to be able to attach a string to an error class, not just to an
 error code
 - according to MPI-2 the attribute MPI_LASTUSEDCODE has to be updated
  everytime you add a new code or a new class. Thus, you have to have single
  list for both. 

Thus, we got rid of the error_class structure. In the error-code structure, we
can distinguish whether we are dealing with an error code or an error class by
looking at the err->code element of the structure. In case its value is
MPI_UNDEFINED, the according entry is a class, else it is an error code. All
predefined error codes have the code and the class field set to the same
value.

The test MPI_Add_error_class1 passes now.

Fixes trac:418

This commit was SVN r12764.

The following Trac tickets were found above:
  Ticket 418 --> https://svn.open-mpi.org/trac/ompi/ticket/418
2006-12-05 19:07:02 +00:00
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
Brian Barrett
441432950f Merge in changes from the bwb-heterogeneous temp branch (r12491 -
r12714) for supporting compilers / architectures with different
padding rules.

This commit was SVN r12749.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r12491
  r12714
2006-12-04 20:11:42 +00:00
Rainer Keller
6f8f28f40f - Get rid of inline definition, otherwise static-compilation fails.
This commit was SVN r12735.
2006-12-03 14:52:17 +00:00
Gleb Natapov
30ca7457b4 Some BTLs (e.g TCP) can report put/get completion before data actually
hits the buffer on the other side. For this kind of BTLs we need to send
FIN through the same BTL, PUT was performed with so network will handle
ordering for us. If we will use another BTL, receiver can get FIN before
data will hit the buffer and complete request prematurely. We mark such
problematic BTLs with MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA
is really fake, because the real one guaranties that sender will see the
completion only after receiver's NIC confirmed that all the data was
received).

This commit was SVN r12732.
2006-12-03 10:12:09 +00:00
Gleb Natapov
39c930b160 The bug fixing part of r12720 introduce much more serious bug that it fixes.
It calls mca_pml_ob1_send_fin_btl() which may fail and doesn't check return
code. This breaks all RDMA transports event when only one BTL is used. Revert
it for now, I am working on a real fix for the problem (I hope).

This commit was SVN r12731.

The following SVN revision numbers were found above:
  r12720 --> open-mpi/ompi@3e3689320b
2006-12-03 08:55:59 +00:00
Gleb Natapov
65d7ad4581 The "bug fix" from the r12721 reverts part of the r12433 that fixed
regresion from v1.1 was reviewed and put to v1.2 branch. So revert this part
of r12721 back.

This commit was SVN r12730.

The following SVN revision numbers were found above:
  r12433 --> open-mpi/ompi@82f7c0dd69
  r12721 --> open-mpi/ompi@3edd850d2e
2006-12-03 08:29:55 +00:00
George Bosilca
a0ed53d70b Make the compilers happy.
This commit was SVN r12729.
2006-12-03 00:19:11 +00:00
George Bosilca
3edd850d2e Some indentation and code arrangement. However, there is a bug fix. Force the PUT
protocol to always obey to the btl_max_rdma_size.

This commit was SVN r12721.
2006-12-01 22:26:14 +00:00
George Bosilca
3e3689320b Some indentations and one BIG fix. Avoid race conditions on the PUT RDMA
protocol when multiple NICS are available between 2 peers. The fix force
the FIN message to take exactly the same path as the fragment it describe
(i.e. same path means same BTL). Otherwise, the FIN can be received by
the peer before the RDMA complete and the request will get freed
too early.

This commit was SVN r12720.
2006-12-01 21:52:07 +00:00
George Bosilca
658879232b Several small improvements:
- consistent error message when something fails (via BTL_ERROR macro)
- decrease the number of jumps.
- cleanup some parts of the code.

This commit was SVN r12719.
2006-12-01 21:48:06 +00:00
Brian Barrett
657168a74c A couple of Window related error handling cleanups:
* Always invoke the error handler on MPI_COMM_WORLD for
    invalid windows (except in win_create, which should 
    instead be on the given communicator).
  * Allow get_errhandler in addition to set_errhandler
    on MPI_WIN_NULL

Refs trac:647

This commit was SVN r12718.

The following Trac tickets were found above:
  Ticket 647 --> https://svn.open-mpi.org/trac/ompi/ticket/647
2006-12-01 20:05:06 +00:00
Brian Barrett
2a09fa2d9d * silence compiler warning
This commit was SVN r12717.
2006-12-01 20:01:53 +00:00
Brian Barrett
bfbc281e93 Fix slow startup issue with the MX MTL. The problem is caused by mx_connect() being a one-sided operation from the API level, but not being an interrupting call when the target is not entering the MX library. So if most of the processes exit mtl_mx_add_procs() and enter the stage gate 2 barrier, the other processes can only progress their mx_connect() calls when the targets enter the mx library. Because the event library is in EV_ONELOOP mode, this only happens once a second. The mx progress thread (hidden in the MX library) also only wakes up once a second, so mx_connect calls can take a second to complete.
The temporary solution is to switch into EV_NONBLOCK mode earlier (right after the mx_connect loop) so that there isn't a giant slowdown when processes enter the stage gate 2 barrier before other proesses.  They will now not block in the event library for any period of time, which appears to have a 50% speedup when running at > 64 procs.

Refs trac:645

This commit was SVN r12713.

The following Trac tickets were found above:
  Ticket 645 --> https://svn.open-mpi.org/trac/ompi/ticket/645
2006-12-01 02:49:01 +00:00
Bill D'Amico
3e11c76d4c Fixes trac:482
OMPI_ARRAY_INT_2_LOGICAL had an array bounds error - fixed this and the
analogous error in OMPI_ARRAY_LOGICAL_2_INT.

This commit was SVN r12712.

The following Trac tickets were found above:
  Ticket 482 --> https://svn.open-mpi.org/trac/ompi/ticket/482
2006-11-30 19:48:18 +00:00
Pavel Shamis
f08bc818c4 Cleaning mca_btl_openib_progress_thread from
unused variables.

This commit was SVN r12709.
2006-11-30 18:28:45 +00:00
Jeff Squyres
384caeacf4 More fixes similar to r12684 -- fix bad lvalues in assignments for the
case where sizeof(INTEGER) > sizeof(int).

This commit was SVN r12707.

The following SVN revision numbers were found above:
  r12684 --> open-mpi/ompi@e2c605f32a
2006-11-30 16:41:56 +00:00
Ralph Castain
79bd8a842e Add xcast timing in mpi_init since we cannot directly measure it (tree-based comm).
This commit was SVN r12706.
2006-11-30 15:06:40 +00:00
Galen Shipman
4e857f0384 add odls framework to ompi_info
This commit was SVN r12699.
2006-11-29 16:24:49 +00:00
Jeff Squyres
e2c605f32a Fix a problem cited by Pierre-Matthieu Anglade: some typos in the code
path in the MPI F77 API functions where sizeof(int) != sizeof(INTEGER).

This commit was SVN r12684.
2006-11-28 12:21:42 +00:00
Ralph Castain
bc4e97a435 First stage in the move to a faster startup. Change the ORTE stage gate xcast into a binary tree broadcast (away from a linear broadcast). Also, removed the timing report in the gpr_proxy component that printed out the number of bytes in the compound command message as the answer was "not much" - reduces the clutter in the data.
This commit was SVN r12679.
2006-11-28 00:06:25 +00:00
Brian Barrett
bc3250fe6f Fix issue where messages could start arriving before the window was
fully initialized, especially during lock/unlock, which doesn't use MPI
collectives for synchronization (unlike Fence)

This commit was SVN r12676.
2006-11-27 22:42:21 +00:00
Brian Barrett
b5b6f4c4bf Remove check for epoch on the receiver side, as this is a race condition between message arrival, and because we'll be still blocking in a synchronization call, is of no consequence to the user. We do an epoch check on the sending side,
so this isn't an issue there either.  Refs trac:488

This commit was SVN r12675.

The following Trac tickets were found above:
  Ticket 488 --> https://svn.open-mpi.org/trac/ompi/ticket/488
2006-11-27 22:15:34 +00:00
Brian Barrett
beb1e9d4dd * finish move from hard coded tag to #define'd constant tag
This commit was SVN r12674.
2006-11-27 21:55:41 +00:00
Brian Barrett
0c25f7be09 More One-sided fixes:
* Fix a counter roll-over issue that could result from a large (but
    not excessive) number of outstanding put/get/accumulate calls
    during a single synchronization issues (Refs trac:506)
  * Fix epoch issue with rdma component that would effect PWSC
    synchronization (Refs trac:507)

This commit was SVN r12673.

The following Trac tickets were found above:
  Ticket 506 --> https://svn.open-mpi.org/trac/ompi/ticket/506
  Ticket 507 --> https://svn.open-mpi.org/trac/ompi/ticket/507
2006-11-27 21:41:29 +00:00
George Bosilca
59cfee0cd2 Use the MX infinite timeout by default. The user can modify it using an MCA
parameter.

This commit was SVN r12670.
2006-11-27 20:18:58 +00:00
Brian Barrett
993d2a7753 Fix for issue IU is seeing on BigRed with connections timing out during
MPI_INIT.  Use an infinite timeout, which is exactly what MPICH-MX does.

This commit was SVN r12669.
2006-11-27 20:10:27 +00:00
Brian Barrett
63e5668e29 Number of one-sided fixes:
* use one-sided datatype check instead of send/receive and check both
    the origin and target datatypes
  * allow error handler to be set on MPI_WIN_NULL, per standard
  * Allow recursive calls into the pt2pt osc component's progress
    function
  * Fix an uninitialized variable problem in the unlock header

This commit was SVN r12667.
2006-11-27 03:22:44 +00:00
Brian Barrett
0895f5e08d Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH}
because they are in ORTE, not OMPI.  Also, remove the ORTE_PROCESS_NAME macros
in iof base as they are duplicates of the ones that were in ns_types, which 
meant that bad things happened if you changed what an orte_process_name_t
looked like.

This commit was SVN r12646.
2006-11-22 03:03:21 +00:00
Brian Barrett
33320b7165 Rework the opal_progress interface to better support dynamic processes and at
the same time, remove some of the MPI-related options from OPAL:

  - provide mechanism to change at runtime whether sched_yield() should 
    be called when the progress engine is idle
  - provide mechanism for changing the rate at which the event engine
    is called when there are "no" users of the event engine (ie, when
    using MPI but not TCP)
  - fix some function names in the progress engine to better match
    their intended use (and remove MPI naming scheme)
  - remove progress_mpi_enable / progress_mpi_disable because 
    we can now use the functions to set the sched_yield and
    tick rate interfaces
  - rename opal_progress_events() to opal_progress_set_event_flag()
    because the first really isn't descriptive of what the function
    does and I always got confused by it

This commit was SVN r12645.
2006-11-22 02:06:52 +00:00
Jeff Squyres
922f335678 Fixes trac:624
Ensure that the new predefined MPI-2 attribute callback functions take
the proper types (INTEGER, kind=MPI_ADDRESS_KIND instead of just
INTEGER).

This commit was SVN r12639.

The following Trac tickets were found above:
  Ticket 624 --> https://svn.open-mpi.org/trac/ompi/ticket/624
2006-11-21 13:54:13 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
George Bosilca
139f9cf3d0 Make sure we disable the MX shared memory when we use the MX BTL.
This commit was SVN r12587.
2006-11-13 22:17:06 +00:00
Andrew Friedley
a4bdcb4faa Fix a segfault that turned up in more MPI_THREAD_MULTIPLE testing.
Same sort of problem and fix as described in r12323 - mca_pml_ob1_recv_frag_progress() was segfaulting due to a NULL req_proc pointer.  The path leading to this was through the mca_pml_ob1_check_cantmatch_for_match() function, where we can match a frag using the same macros as mca_pml_ob1_frag_match() and never initialize the req_proc pointer.

This commit was SVN r12582.

The following SVN revision numbers were found above:
  r12323 --> open-mpi/ompi@c752502dee
2006-11-13 20:12:51 +00:00
George Bosilca
b621069638 Don't copy the reference count when we duplicate a datatype. Thanks
to Andreas Schafer for the fix.

This commit was SVN r12566.
2006-11-13 05:45:29 +00:00
Ralph Castain
8b6921f297 Initialize the rank variable before it is used.
This commit was SVN r12565.
2006-11-13 02:37:12 +00:00
George Bosilca
2b9f6613c9 Rewrite the ompi_ddt_sndrcv function.
This commit was SVN r12561.
2006-11-12 05:56:02 +00:00
George Bosilca
2ee30fd468 This wasn't the correct way to write the if statement. What I want is
that if one of them is zero then the if will get executed.

This commit was SVN r12560.
2006-11-12 05:27:27 +00:00
Gleb Natapov
9933a6f469 Previous fix doesn't fix the case when opcode is changed in put/get functions.
The fix is to set opcode to SEND at the entrance to the send function before
checking credits and putting fragment to the pending list. We do the same thing
in put/get functions i.e setting opcode at the entrance to the function.

This commit was SVN r12559.
2006-11-11 07:51:06 +00:00
Ralph Castain
4e50cdae52 This commit accomplishes two things:
1. Fix the "hang" condition when an application isn't found. It turned out that the ODLS had some difficulty with the process actually not having been started - hence, it never called the waitpid callback. As a result, the "terminated" trigger didn't fire, and so mpirun didn't wake up. With this change, the HNP's errmgr forces the issue by causing the trigger to fire itself when an abort condition occurs.

2. Shift the recording of the pid and the nodename from mpi_init to the orted launcher. This allows programs such as Eclipse PTP to get the pids even for non-MPI applications. In the case of bproc, the pls handles this chore since we don't use orteds in that system.

This commit was SVN r12558.
2006-11-11 04:03:45 +00:00
George Bosilca
ec410644ce Implement the send receive as 2 non blocking operations. That will help us
avoiding too many calls to opal_progress.

This commit was SVN r12553.
2006-11-10 23:06:19 +00:00
George Bosilca
c2c6a1b37e Correctly compute the number of elements in a segment.
For broadcast send the correct size for all intermediary nodes.

This commit was SVN r12552.
2006-11-10 23:04:50 +00:00
George Bosilca
7102147b9f Correctly detect when the specified algorithm is out of range. In
this case we reset it to zero.

This commit was SVN r12551.
2006-11-10 21:47:07 +00:00
George Bosilca
bfbd0e61f6 Minimize the number of lines of code :)
This commit was SVN r12550.
2006-11-10 20:56:08 +00:00
George Bosilca
a38cd366d7 Construct the convertor. It's not really required, but it's not in the
critical path anyway. At least in debug mode we get nice informations about
where the convertor was created.

This commit was SVN r12549.
2006-11-10 20:55:06 +00:00
George Bosilca
858ab24e8e The req_mtl field has to be the last in the struct or bad things happen.
This commit was SVN r12548.
2006-11-10 20:53:41 +00:00
George Bosilca
af68171253 Use the macro to compute the number of elements in a segment in both
bcast and reduce and update the default values for the variables
as required by the comment in the coll_tuned.h file.

This commit was SVN r12546.
2006-11-10 20:04:08 +00:00
George Bosilca
476b922074 Updates & upgrades:
- consistent arguments checking (not allowing to select an algorithm which
     is not available)
 - consistent way of computing the segcount (number of datatypes by segment).
 - small cleanups.
 - more informative debugging messages.

This commit was SVN r12545.
2006-11-10 19:54:09 +00:00
Gleb Natapov
7e03b83d23 Reset opcode field to SEND. It is checked later in pending progress function.
This commit was SVN r12531.
2006-11-10 06:17:00 +00:00
George Bosilca
77ef979457 New architecture for broadcast. A generic broadcast working on a tree
description. Most of the bcast algorithms can be completed using this
generic function once we create the tree structure. Add all kind of
trees.

There are 2 versions of the generic bcast function. One using overlapping
between receives (for intermediary nodes) and then blocking sends to all
childs and another where all sends are non blocking. I still have to
figure out which one give the smallest overhead.

This commit was SVN r12530.
2006-11-10 05:53:50 +00:00
George Bosilca
17405cd9c6 A temporary fix, until we figure out a better approach. The problem
is that if one add "pml=" to the configuration file, really bad things
happen. All PMLs will get initialize, and each of them will initialize
all BTLs. This patch force the mca_pml_base_pml to get initialized in
all cases before we go out of the mca_pml_base_open function.

This commit was SVN r12527.
2006-11-10 04:53:00 +00:00
George Bosilca
ab1655079d Turn off the CONVERTOR_NO_OP once we reach this point.
This commit was SVN r12526.
2006-11-09 23:56:31 +00:00
Jeff Squyres
8a08b092f6 Check to see if we need to do anything. If we don't (i.e., if all the
rcounts are 0), then just return MPI_SUCCESS.

This commit was SVN r12525.
2006-11-09 23:21:34 +00:00
George Bosilca
4cb7910a8b And now the optimization step.
This commit was SVN r12521.
2006-11-09 20:26:51 +00:00
George Bosilca
1d80f685b5 Remove one compiler warning.
This commit was SVN r12520.
2006-11-09 20:08:43 +00:00
George Bosilca
e33d1dedab Rewrite the conditions to keep them as small as possible. Correct some
of these conditions. Optimize the flags generation for the convertor.

This commit was SVN r12518.
2006-11-09 19:33:19 +00:00
George Bosilca
da1720d1ef Indentation only.
This commit was SVN r12517.
2006-11-09 19:31:52 +00:00
George Bosilca
73eec4bfef Show the MCA parameter coll_base_verbose only if Open MPI is compiled in
debug mode. Otherwise there is no debug anyway ...

This commit was SVN r12516.
2006-11-09 19:02:32 +00:00
Jeff Squyres
0a28212392 This is a workaround to bug in the Intel C++ compiler, version 9.1
(all versions up to and including 20060925).  The issue has been
reported to Intel, along with a small [non-MPI] test program that
reproduces the problem (the test program and the OMPI C++ bindings
work fine with Intel C++ 9.0 and many other C++ compilers).

In short, a static initializer for a global variable (i.e., its
constructor is fired before main()) that takes as an argument a
reference to a typedef'd type will simply get the wrong value in the
argument.  Specifically:

{{{
namespace MPI {
    Intracomm COMM_WORLD(MPI_COMM_WORLD);
}
}}}

The constructor for MPI::Intracomm should get the value of
&ompi_mpi_comm_world.  It does not; it seems to get a random value.

As mandated by MPI-2, annex B.13.4, for C/C++ interoperability, the
prototype for this constructor is:

{{{
class Intracomm {
public:
    Intracomm(const MPI_Comm& data);
};
}}}

Experiments with icpc 9.1/20060925 have shown that removing the
reference from the prototype makes it work (!).  After lots of
discussions about this issue with a C++ expert (Doug Gregor from IU),
we decided the following (cut-n-paste from an e-mail):

-----
> So here's my question: given that OMPI's MPI_<CLASS> types are all
> pointers, is there any legal MPI program that adheres to the above
> bindings that would fail to compile or work properly if we simply
> removed the "&" from the second binding, above?

I don't know of any way that a program could detect this change. FWIW,
the C++ committee has agreed that implementation of the C++ standard
library are allowed to decide arbitrarily between const& and by-value.
If they don't care, MPI users won't care.

When you remove the '&', I suggest also removing the "const". It is
redundant, but can trigger some strange name mangling in Sun's C++
compiler.
-----

So with this change:

 * we now work again with the Intel 9.1 compiler
 * our C++ bindings do not exactly conform to the MPI-2 spec, but
   valid/legal MPI C++ apps cannot tell the difference (i.e., the
   functionality is the same)

This commit was SVN r12514.
2006-11-09 17:34:12 +00:00
George Bosilca
a14ff905f8 Update the convertor.c file. This commit was supposed to go together with
r12486 but somehow I miss it.
Update the pack and unpack functions for contigusous datatypes to minimize their
impact on the performance. Keep them as condensed as possible.

This commit was SVN r12513.

The following SVN revision numbers were found above:
  r12486 --> open-mpi/ompi@8746369338
2006-11-09 17:26:55 +00:00
George Bosilca
a82ce427e4 Update the number of reduce algorithms available.
This commit was SVN r12503.
2006-11-08 22:20:34 +00:00
George Bosilca
0dcf0097db They are supposed to be ints not size_t.
This commit was SVN r12497.
2006-11-08 17:05:15 +00:00
George Bosilca
eab1776e9a Explicit casts for our friendly Windows environment...
This commit was SVN r12496.
2006-11-08 17:02:46 +00:00
George Bosilca
f9f1087e7f Be nice and let everybody access this variable.
This commit was SVN r12495.
2006-11-08 16:56:17 +00:00
George Bosilca
0914892044 Small cleanups, some explicit casts.
This commit was SVN r12494.
2006-11-08 16:54:03 +00:00
George Bosilca
74d3946342 Remove the call to set_args. This is only required for the MPI level,
because there we have to be able to return to the user the description
of the data.

This commit was SVN r12493.
2006-11-08 16:52:48 +00:00
George Bosilca
b53a313904 These functions dont have to be exported. The outside world
access them only via the pointer in the convertor structure.

This commit was SVN r12492.
2006-11-08 16:49:34 +00:00
George Bosilca
8746369338 This commit was supposed to go with the r12485.
This commit was SVN r12486.

The following SVN revision numbers were found above:
  r12485 --> open-mpi/ompi@b4e9199651
2006-11-08 06:21:15 +00:00
George Bosilca
b4e9199651 Minimize the impact on contiguous and predefined data-types. Most of the
usefull things are now done without the call to any external function,
directly in the macros in this file.
Create a new flag for the convertor to help figure out when we have
to do anything special for pack/unpack. A convertor with the flag
CONVERTOR_NO_OP is a basic convertor, designed for contiguous types.

This commit was SVN r12485.
2006-11-08 06:16:47 +00:00
George Bosilca
915d748d72 Initialize the convertor on _START not on _INIT. This allow us to
set it up before the match when we know the peer, saving some
time on the critical path. If the receive is ANY_SOURCE then
we initialize the convertor on _MATCHED. Anyway, we will set it
up only once per receive.

This commit was SVN r12484.
2006-11-08 05:42:29 +00:00
George Bosilca
eb45a5e402 Move things around a little bit. Mainly fields from the send and receive
request in the base request. Rearrange the fields to keep the data
together. Remove some useless tests.

This commit was SVN r12482.
2006-11-08 04:58:23 +00:00
George Bosilca
63462331c9 Reduce the number of branches. Keep the fast path as short as possible.
Remove some useless error checking. Add OPAL_UNLIKELY directives.

This commit was SVN r12477.
2006-11-07 23:59:32 +00:00
George Bosilca
f3de2e1a82 Keep the fast path as short as possible.
This commit was SVN r12476.
2006-11-07 23:56:32 +00:00
Jeff Squyres
31e8f510fe After review with Rolf, decided to be a bit more safe and instead of
assigning status._cancelled, call MPI_STATUS_SET_CANCELLED.

This commit was SVN r12471.
2006-11-07 20:49:31 +00:00
Jeff Squyres
427c20af0d Use a new algorithm for allgatherv. The old algorithm essentially did
N gatherv's:

  for (i = 0 ... size)
    MPI_Gatherv(..., root = i, ...)

The new algorithm simply does (effectively):

  MPI_Gatherv(..., root = 0, ...)
  MPI_Bcast(..., root = 0, ...)

This commit was SVN r12469.
2006-11-07 18:07:55 +00:00
Jeff Squyres
25dab9700f * Add some error checking to GET_ELEMENTS and STATUS_SET_CANCELLED
* Move the check for STATUS_IGNORE in STATUS_SET_ELEMENTS up into the
   error checking block

This commit was SVN r12460.
2006-11-07 02:47:35 +00:00
Galen Shipman
55db17b37c don't try to use a dead btl..
This commit was SVN r12456.
2006-11-06 23:25:24 +00:00
George Bosilca
f81a599326 Don't do the safe guard for contiguous memory layouts.
This commit was SVN r12455.
2006-11-06 23:15:24 +00:00
George Bosilca
108ea4dbe9 When the MX MTL complete a request, force a return from the progress function.
Decrease the latency by about 0.3 microseconds.

This commit was SVN r12454.
2006-11-06 23:13:07 +00:00
George Bosilca
3d0df2cf29 Allow the MX BTL to finish the small sends quicker. Once the mx_isend is posted if
the message size is less than 4K do a check for the message completion and if any
call the callback.

This commit was SVN r12453.
2006-11-06 23:12:01 +00:00
Galen Shipman
eef37430a7 failing already failed for ACK timeout..
This commit was SVN r12452.
2006-11-06 22:09:39 +00:00
Galen Shipman
813e7faea8 more fixes for failover.. and yet still more to come..
This commit was SVN r12450.
2006-11-06 21:27:17 +00:00
Jeff Squyres
6e1729cb93 Fixes trac:580
* Add MPI::Status methods Set_elements() and Set_cancelled()
 * Added a bunch of comments in various places in the MPI C++ bindings
   implementatio just to explain what's going on (because C++ can hide
   a lot from you)
 * Insert C++ callbacks for the MPI_Grequest callback functions
   registered by MPI::Grequest::Start().  These callbacks keep a
   little meta-data (created by Grequest::Start()) that allow the
   proper callback signatures from C (i.e., from ompi_grequest_<foo>()
   in libmpi.a -- C code), translate arguments as required, and then
   invoke the callbacks with proper C++ signatures (i.e., call
   user-defined callbacks with C++ function signatures).

This commit was SVN r12446.

The following Trac tickets were found above:
  Ticket 580 --> https://svn.open-mpi.org/trac/ompi/ticket/580
2006-11-06 18:42:00 +00:00
Gleb Natapov
b4fd2d7d50 Fix warnings from progress thread patch.
This commit was SVN r12434.
2006-11-06 12:34:56 +00:00
Gleb Natapov
82f7c0dd69 Fix regression from v1.1.
1) make the code do what comment says
2) if memory is prepinned don't send multiple PUT messages.

This commit was SVN r12433.
2006-11-06 12:00:17 +00:00
Ralph Castain
c77f6c605e Update timing reports:
1. Remove timing of xcast from mpi_init

2. Add timing report from oob_xcast on how long it took to send the message

This commit was SVN r12428.
2006-11-03 18:55:05 +00:00
Ralph Castain
60e27c77e7 Add some additional timing reporting:
1. Added reporting points around the xcasts in MPI_Init. Note that these times will include time spent waiting for a trigger to fire, which is why the times between stage gates did NOT include these times initially. The inter-stage-gate times still do NOT include the xcast time - the xcast time is reported separately.

2. Added the process vpid on the MPI_Init timing reports for clarity.

3. Added a report from the xcast function on the HNP that outputs the number of bytes in the message being sent to the processes.

This commit was SVN r12422.
2006-11-03 16:04:40 +00:00
Jeff Squyres
bc93583184 Fix the SIZEOF implementation for complex's. We want to output the
size of the complex type as determined by configure; not the size of
the next larger complex type (i.e., a complex*N is 2 real*(N/2)'s, not
2 real*N's).

This commit was SVN r12421.
2006-11-03 15:52:46 +00:00
Galen Shipman
f7c554df65 Try to failover when we get an async error from the lower layer (BTL)..
This commit was SVN r12420.
2006-11-03 15:40:26 +00:00
George Bosilca
8529238d93 Add 2 more algorithms to the dynamic list.
This commit was SVN r12415.
2006-11-02 19:19:08 +00:00
George Bosilca
110d07b7d3 Small optimization or zero length messages.
This commit was SVN r12414.
2006-11-02 19:10:28 +00:00
Pavel Shamis
566667ac61 Adding progress thread support to OpenIB BTL.
Reviewed by Gleb.

This commit was SVN r12411.
2006-11-02 16:15:21 +00:00
George Bosilca
284e286011 Correctly compute the local size using the count argument not
the convertor count field which is not yet initialized.

This commit was SVN r12405.
2006-11-02 06:44:51 +00:00
Jeff Squyres
431f940a52 Fixes trac:496
* Add some more error checking to GREQUEST_START
 * Move the error checking in GREQUEST_COMPLETE up to inside the
   MPI_PARAM_CHECK block, where it belongs
 * Invoke the gen request query_fn in all the Right spots (per MPI-2:8.2)
 * Distinguish between grequests created from C and Fortran
 * Use the OBJ system to reference count to release the grequest at
   the Right time and invoke the grequest free_fn properly (see
   lengthy comment in grequest.c above the destructor)
 * Have ompi_grequest_complete() call ompi_request_complete() rather
   than [poorly] copy the contents of ompi_request_complete()
 * Fix Fortran function callback pointer typedefs to use proper
   Fortran types
 * Edit ompi_request_test* and ompi_request_wait* to properly handle
   generalized requests.  This adds an "if" statement in the critical
   path for all the back-end test* and wait* functions :-(,
   but fortunately George took out two "if" statements from the
   critical path last week.  So we're still ahead.  :-)
 * Move ompi_request_test() out of request.h and into request.c (all
   other test* and wait* functions were already in the .c file -- and
   ompi_request_test() was too long to be statically inlined anyway)

This commit was SVN r12402.

The following Trac tickets were found above:
  Ticket 496 --> https://svn.open-mpi.org/trac/ompi/ticket/496
2006-11-02 03:34:53 +00:00
George Bosilca
ea91fd3bdb Small optimizations and some cleanups.
This commit was SVN r12400.
2006-11-02 00:19:03 +00:00
George Bosilca
994bfce7e8 On heterogeneous environment check for the hetero mask before deciding if there is or not
anything to do for packing and unpacking of a particular data-type.

This commit was SVN r12399.
2006-11-01 23:57:37 +00:00
George Bosilca
d89a52f8e7 If there is no data to send or receive don't do anything useless.
This commit was SVN r12398.
2006-11-01 23:56:34 +00:00
George Bosilca
dbec514b0f Optimize the generation of the match_bits and the mask.
This commit was SVN r12396.
2006-11-01 23:19:20 +00:00
George Bosilca
23e783a487 Initialize the length before using it.
This commit was SVN r12394.
2006-11-01 22:07:21 +00:00
Jeff Squyres
2f936c4597 Fixes trac:549
Had group discussion about this on the weekly call.  The decision was
that we should pass the real error code to the back-end exception
handler because it's pretty useless to pass MPI_ERR_IN_STATUS to the
back-end exception handler (because exception handlers don't have
access to the request or the status - this has potential issues for
fault tolerance kinds of scenarios).  So in TESTALL, TESTSOME,
WAITALL, and WAITSOME, we examine the error code and if it's not
MPI_SUCCESS, return MPI_ERR_IN_STATUS.

This commit was SVN r12389.

The following Trac tickets were found above:
  Ticket 549 --> https://svn.open-mpi.org/trac/ompi/ticket/549
2006-11-01 19:31:43 +00:00
Gleb Natapov
4c784b6403 As Andrew Friedley pointed, my previous patch may cause deadlock if
mca_btl_openib_endpoint_connect_eager_rdma() is called recursively. He also
noticed that orte_pointer_array_add() can't fail because we allocate max number
of elements at init time. So just remove error handling and locking. No locking
 - no deadlocks.

This commit was SVN r12388.
2006-11-01 15:53:33 +00:00
Gleb Natapov
3bf31fe4a3 Correctly determine the first element on the list. opal_list_get_prev() never
returns NULL, it should be compared with opal_list_get_end() instead.

This commit was SVN r12387.
2006-11-01 13:44:47 +00:00
Gleb Natapov
b5714d698a Fix compilation with GM version smaller than 2.0. Fix compilation warnings.
This commit was SVN r12386.
2006-11-01 10:26:15 +00:00
Gleb Natapov
aac695a51f eager_rdma_buffers update is not atomic. A buffer is added to the array and if
something is going wrong down in the code it is removed from the array. So add
mutex to prevent concurrent access to the array from different threads.

This commit was SVN r12385.
2006-11-01 07:27:32 +00:00
Ralph Castain
9204747930 Add timing info to comm_spawn - timing collected and reported when OMPI_MCA_ompi_timing = 1 (or something other than zero).
This commit was SVN r12381.
2006-10-31 23:32:39 +00:00
George Bosilca
a711a58410 Add a function to handle the MPI_Status_set_elements correctly. The current
implementation was just wrong !!!

This commit was SVN r12380.
2006-10-31 23:02:42 +00:00
Ralph Castain
17c71f8d2a Fix ticket #545
Setup subscriptions to correctly return the MPI_APPNUM attribute.

Fix an unreported bug that was found. The universe size was incorrectly defined in the attributes code. As coded, it looked for size_t values and based its size computation on those numbers. Unfortunately, the node_slots value had been changed to an orte_std_cntr_t awhile back! So the universe size was never updated.

Update the hello_nodename test to check for MPI_APPNUM.

Add a definition to ns_types for ORTE_PROC_MY_NAME - just a shortcut for orte_process_info.my_name. Brought over from ORTE 2.0 as it will be used extensively there.

This commit was SVN r12377.
2006-10-31 21:29:07 +00:00
Andrew Friedley
48c5117476 Fix some signedness warnings on threaded builds introduced by r12369
This commit was SVN r12376.

The following SVN revision numbers were found above:
  r12369 --> open-mpi/ompi@d7375ec102
2006-10-31 17:29:25 +00:00
Gleb Natapov
d7375ec102 Fix deadlock reported by Andrew Friedley:
What's happening is that we're holding openib_btl->eager_rdma_lock when
we call mca_btl_openib_endpoint_send_eager_rdma() on
btl_openib_endpoint.c:1227.  This in turn calls
mca_btl_openib_endpoint_send() on line 1179.  Then, if the endpoint
state isn't MCA_BTL_IB_CONNECTED or MCA_BTL_IB_FAILED, we call
opal_progress(), where we eventually try to lock
openib_btl->eager_rdma_lock at btl_openib_component.c:997.

The fix removes this lock altogether. Instead we atomically set local RDMA
pointer to prevent other threads to create rdma buffer for the same endpoint.
And we increment eager_rdma_buffers_count atomically thus polling thread doesn't
need lock around it.

This commit was SVN r12369.
2006-10-31 09:54:52 +00:00
Gleb Natapov
1b152dfe09 On 64 bit platform if high 32 bits of buf address is not zero they are trimmed
by wrong bitwise and. Fix it by expanding mask to 64 bits.

This commit was SVN r12368.
2006-10-31 07:33:35 +00:00
Jeff Squyres
63155bca09 Refs trac:496
Have no idea why this function always returns a failure.  It should
always return SUCCESS (provided the status is value).

This commit was SVN r12364.

The following Trac tickets were found above:
  Ticket 496 --> https://svn.open-mpi.org/trac/ompi/ticket/496
2006-10-30 22:44:23 +00:00
Jeff Squyres
0b2616173a Fixes trac:549
* For MPI_TEST, MPI_TESTANY, MPI_WAIT, and MPI_WAITANY (i.e., the
   TEST/WAIT functions that return up to exactly one completed
   request), return the actual error code.
 * For MPI_TESTALL, MPI_TESTSOME, MPI_WAITALL, MPI_WAITSOME, (i.e.,
   the TEST/WAIT functions that can return more than one completed
   request), return MPI_ERR_IN_STATUS.

This commit was SVN r12355.

The following Trac tickets were found above:
  Ticket 549 --> https://svn.open-mpi.org/trac/ompi/ticket/549
2006-10-30 19:50:09 +00:00
Jeff Squyres
d66f7526fa Ensure to FINI the request (e.g., release fortran resources).
This commit was SVN r12353.
2006-10-30 16:02:29 +00:00
Jeff Squyres
b6899e1085 Ensure to set the request to MPI_REQUEST_NULL when we release it.
This commit was SVN r12352.
2006-10-30 14:43:26 +00:00
Jeff Squyres
2de51fca63 Need to return the error code.
This commit was SVN r12351.
2006-10-30 14:15:44 +00:00
Gleb Natapov
7b39039cd6 Add comments to process_pending functions.
This commit was SVN r12346.
2006-10-29 09:12:24 +00:00
Gleb Natapov
8ef5b6a589 Change tabs to spaces to be consistent with the rest of the file.
This commit was SVN r12345.
2006-10-29 08:12:44 +00:00
George Bosilca
a9c6ae8f15 Minimize the number of branches, and orce the correct prediction for the
most usual one. Most of the time we expect the functions which allocate
requests to succeed.

This commit was SVN r12344.
2006-10-27 23:16:13 +00:00
George Bosilca
44f3dd81b4 Update the comment to reflect what's inside the code.
This commit was SVN r12343.
2006-10-27 23:09:37 +00:00
George Bosilca
3472d19d4d Do not modify the convertor if there is no data to be send across the network. The
req_bytes_packed field is initialized in the BASE_INIT macro, so it is set for
all requests at this stage.

This commit was SVN r12342.
2006-10-27 23:03:15 +00:00
Jeff Squyres
020efdf1f9 Refs trac:250
This commit essentially caches the invoking comm/win/file on the
ompi_request_t. This, paired with the req_type field, allows us to
retrieve the invoking MPI object and invoke the proper errhandler.

The patch is missing most updates for the MPI-2 one-sided stuff (i.e.,
the patch mainly fixes comms and files); I didn't really understand
that code and didn't want to hazard trying to figure it out when Brian
can probably do it much more quickly.

So #250 will still stay open, pending MPI-2 one-sided updates for this
stuff.

This commit was SVN r12339.

The following Trac tickets were found above:
  Ticket 250 --> https://svn.open-mpi.org/trac/ompi/ticket/250
2006-10-27 12:35:27 +00:00
Jeff Squyres
e02114dcf3 Fixes trac:529.
* Create a new request type: NOOP (described below)
 * For all MPI_*_INIT functions, OBJ_NEW an ompi_request_t and set its
   type to NOOP
 * Ensure that the NOOP requests are OBJ_RELEASE'd when they are done
 * MPI_START looks at the request type; if NOOP, just return success. If
   not, call the PML start() function
 * MPI_STARTALL always pass the entire array of requests back to the PML
   (see next point)
 * Make the PMLs only process PML requests (i.e., ignore/skip anything
   that isn't of type PML -- such as the NOOP requests)
 * Add a little more param error checking in STARTALL

This commit was SVN r12338.

The following Trac tickets were found above:
  Ticket 529 --> https://svn.open-mpi.org/trac/ompi/ticket/529
2006-10-27 12:32:36 +00:00
Jeff Squyres
477424c537 Fixes trac:532
* Remove an extra OMPI_REQUEST_INIT() from the grequest constructor
   (it was already invoked by the parent MPI_Request constructor)
 * Set the state of the generalized request to ACTIVE (because this is
   invoked from MPI_GREQUEST_START -- analogous to MPI_START)
 * Before invoking the query function in MPI_REQUEST_COMPLETE, set the
   status on the base request to ompi_status_empty. This gives a set
   of default values for the request, including one for
   status.MPI_ERROR = MPI_SUCCESS (because we check the value of
   MPI_ERROR in MPI_TEST* and MPI_WAIT* processing, and use it to
   determine whether the upper-level API call should raise an MPI
   exception or not).

This commit was SVN r12337.

The following Trac tickets were found above:
  Ticket 532 --> https://svn.open-mpi.org/trac/ompi/ticket/532
2006-10-27 12:28:07 +00:00
George Bosilca
882b429f64 ompi_mtl_datatype_pack is not a data-type function (really) so it still
need the free_after (which btw has a different meaning that the one removed
from the data-type engine few minutes ago).

This commit was SVN r12333.
2006-10-27 00:15:53 +00:00
George Bosilca
393657ee26 Initialize the sndbuf in all cases. Do not forget to initialize the
tree used in each of the broadcast functions.

This commit was SVN r12332.
2006-10-27 00:13:33 +00:00
George Bosilca
126a68dc9a Big datatype commit. Remove all unused features of the datatype engine. As the memory
allocation logic is completely done outside the data-type engine (in the PML) there is
no need for any special case inside the data-type engine. There is less arguments for
the ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is
not required anymore as there is no memory allocated in the engine itself). This change
affect all components using datatypes. I test most of them, but it might happens that I
miss some ... If it's the case please let me know (don't shoot the pianist!!).

This commit was SVN r12331.
2006-10-26 23:11:26 +00:00
George Bosilca
a1a4f7c422 Reset the segment pointer once we release the self fragment.
This commit was SVN r12330.
2006-10-26 23:07:14 +00:00
George Bosilca
be8516e0d7 Anothers indentations.
This commit was SVN r12329.
2006-10-26 23:06:15 +00:00
George Bosilca
83dfd36c1f Indentations.
This commit was SVN r12328.
2006-10-26 23:05:41 +00:00
George Bosilca
91ab093e96 Cleanup. No extern required for the function prototypes.
This commit was SVN r12327.
2006-10-26 23:03:12 +00:00
George Bosilca
ba3c247f2a Big collective commit. I lightly test it, but I think it should be quite stable. Anyway,
the default decision functions (for broadcast, reduce and barrier) are based on a
high performance network (not TCP). It should give good performance (really good) for
any network having the following caracteristics: small latency (5 microseconds) and good
bandwidth (more than 1Gb/s).
+ Cleanup of the reduce algorithms, plus 2 new algorithms (binary and binomial). Now most
  of the reduce algorithms use a generic tree based function for completing the reduce.
+ Added macros for computing the trees (they are used for bcast and reduce right now).
+ Allow the usage of all 5 topologies.
+ Jelena's implementation of a binary tree that can be used for non commutative operations.
  Right now only the tree building function is there, it will get activated soon.
+ Some others minor cleanups.

This commit was SVN r12326.
2006-10-26 22:53:05 +00:00
Andrew Friedley
c752502dee Fix for a common race condition when running the Sandia mt_send_recv.cc test.
A segfault would occur in mca_pml_ob1_recv_request_progress() when trying to prepare the convertor for unpacking, because the request's req_proc field was NULL.

Turns out that we weren't setting the req_proc field in the MCA_PML_OB1_CHECK_SPECIFIC_AND_WILD_RECEIVES_FOR_MATCH macro.  Instead of just setting it there I removed the other place req_proc was being set correctly, and instead took care of all the cases at once in mca_pml_ob1_recv_frag_match().

This commit was SVN r12323.
2006-10-26 19:09:39 +00:00
Gleb Natapov
90be664b9f Some process_pending() functions get bml_btl on which resource was freed as a
parameter. For optimisation purpose only this BTL is used to send packet
through instead of trying to send packets through all BTLs. But actually the 
code was wrong. It simply used provided bml_btl and it may represent different
endpoint from packet's destination. The fixed code checks if packet's
destination is reachable through the BTL, finds appropriate bml_btl and only
then tries to send it through correct bml_btl.

This commit was SVN r12319.
2006-10-26 13:21:47 +00:00
Sven Stork
5861ed865d - Add parameter checking as required by the standard.
This commit was SVN r12318.
2006-10-26 09:18:21 +00:00
Sven Stork
9024c5be4b - Fix wrong error values.
This commit was SVN r12317.
2006-10-26 08:26:03 +00:00
Terry Dontje
7259d1b512 Adjust allocation size to be a quantity divisible by sizeof(size_t). This
is done to assure alignment so strictly aligned CPUs (like SPARC) do not
sigbus.  This also may benefit other platforms too.

This commit fixes trac:494.

This commit was SVN r12312.

The following Trac tickets were found above:
  Ticket 494 --> https://svn.open-mpi.org/trac/ompi/ticket/494
2006-10-25 18:22:38 +00:00
Ralph Castain
36d4511143 Bring the timing instrumentation to the trunk.
If you want to look at our launch and MPI process startup times, you can do so with two MCA params:

OMPI_MCA_orte_timing: set it to anything non-zero and you will get the launch time for different steps in the job launch procedure. The degree of detail depends on the launch environment. rsh will provide you with the average, min, and max launch time for the daemons. SLURM block launches the daemon, so you only get the time to launch the daemons and the total time to launch the job. Ditto for bproc. TM looks more like rsh. Only those four environments are currently supported - anyone interested in extending this capability to other environs is welcome to do so. In all cases, you also get the time to setup the job for launch.

OMPI_MCA_ompi_timing: set it to anything non-zero and you will get the time for mpi_init to reach the compound registry command, the time to execute that command, the time to go from our stage1 barrier to the stage2 barrier, and the time to go from the stage2 barrier to the end of mpi_init. This will be output for each process, so you'll have to compile any statistics on your own. Note: if someone develops a nice parser to do so, it would be really appreciated if you could/would share!

This commit was SVN r12302.
2006-10-25 15:27:47 +00:00
George Bosilca
2d17f0fa9d First step on supporting full external32 conversion on both operations
pack and unpack.

This commit was SVN r12299.
2006-10-25 14:33:06 +00:00
George Bosilca
ccbe5ee016 Remove unused code.
This commit was SVN r12297.
2006-10-25 13:59:57 +00:00
Jeff Squyres
126a8b1e22 If we pass an erroneous error code in, don't segv. Instead, return a
"this is bogus" kind of answer.  Passing in bad error codes should
only happen in erroneous sections of the OMPI code base, but still,
it's far more social to print a message saying, "hey, you messed up!"
rather than seg faulting.

Reviewed by Edgar.

This commit was SVN r12295.
2006-10-25 13:37:45 +00:00
Sven Stork
f3f39e003e - Increment the pipeline depth before we trigger the send function. As
mentioned in the comment the completion/callback of the triggered 
  send operation can happen before the call returns. If this happens and
  if the pipeline depth is 0 before we triggered the send operation and 
  this is the last send operation of the request then the completion detection
  code will decrement the pipeline depth and check it for equality to 0.
  Because (0-1) != 0 the pml completion function for this request will 
  *not* be called.
  This part 2 of the fix for ticket #246.

This commit was SVN r12292.
2006-10-25 08:52:39 +00:00
Sven Stork
3563f15fde - Fix a bug in descriptor handling code. The self BTL was mixing the different
kinds of descriptors (e.g. put rdma descriptor in the eager free-list). 
  This part 1 of the fix for ticket #246.

This commit was SVN r12291.
2006-10-25 08:45:29 +00:00
George Bosilca
99631ccf66 Cleanups.
This commit was SVN r12272.
2006-10-23 22:29:17 +00:00
George Bosilca
d7d3f9e486 Tuned collectives works only for at least 2 processes. We have the self module
for the other cases.

This commit was SVN r12271.
2006-10-23 22:28:56 +00:00
George Bosilca
b848a5ad06 Remove all ompi_coll_chain_t references.
This commit was SVN r12269.
2006-10-23 21:47:50 +00:00
George Bosilca
39cd8d3d17 One to rule them all. We only need one topology information: a tree. How we
build it it's hat make the difference.

This commit was SVN r12268.
2006-10-23 21:46:30 +00:00
Rolf vandeVaart
272f766c5f Fix for ticket #219 MPI::Grequest is missing from C++ API. I did the initial implementation and Jeff fixed it up. Passes a new test in trunk/simple/basic/cxx/grequest.cc.
This commit was SVN r12264.
2006-10-23 20:17:30 +00:00
George Bosilca
9cf3040e5f Allocate enough memory for the reduce operation when MPI_IN_PLACE is specified.
This commit was SVN r12260.
2006-10-23 17:51:36 +00:00
Edgar Gabriel
8b09bd181f just a reordering of the arguments in a comparison in order to comply with the
OMPI programming style...

This commit was SVN r12259.
2006-10-23 17:14:23 +00:00
George Bosilca
6b697ad3dd If the operation is not commutative then force the basic reducve algorithm. The others
cannot be used for non commutative operations ... yet ...

This commit was SVN r12241.
2006-10-20 22:11:44 +00:00
George Bosilca
a7b6078b73 No more segfault. Still some wrong data around ...
This commit was SVN r12238.
2006-10-20 20:17:34 +00:00
George Bosilca
02759cf515 Update the reduce chain collective.
This commit was SVN r12237.
2006-10-20 19:47:52 +00:00
George Bosilca
b51b87a4aa The correct way to compute the difference between the actual size and the
expected size, based on the comment few lines before.

This commit was SVN r12235.
2006-10-20 19:33:55 +00:00
George Bosilca
d7268557a8 Complete the SM BTL changes. Now all displacements are ptrdiff_t and there is
no warnings about any issue with signed/unsigned.

This commit was SVN r12234.
2006-10-20 19:28:12 +00:00
Mohamad Chaarawi
08a9b6458c fixed the MPI_Translate_ranks issues reported earlier, where a rank of
MPI_PROC_NULL translates to MPI_PROC_NULL, and an MPI_GROUP_EMPTY as one of
the groups doesn't cause a segmentation fault, but returns MPI_UNDEFINED for
all ranks to be translated.

This commit was SVN r12233.
2006-10-20 19:13:49 +00:00
George Bosilca
c86214f420 Fix the SM BTL issues. The problem seems to come from the fact that
the maximum number of nodes on the SM file should be signed, as we use
the -1 to unlimit it.

This commit was SVN r12227.
2006-10-20 17:25:53 +00:00
Brian Barrett
37fad860b7 Grrr... Forgot that EXTRA_DIST and man_MANS are not set to include all the
possible things contained in the conditional like other rules are (for
example, a SOURCES rule in a conditional automatically has its files
added to the dist rules, even if that conditional isn't tru when
make dist occurs).  So the man files weren't in the tarball.

Put the EXTRA_DIST with the files explicitly listed outside any conditionals
so the man pages always end up in the tarball.

This commit was SVN r12220.
2006-10-20 14:15:38 +00:00
George Bosilca
06563b5dec Last set of explicit conversions. We are now close to the zero warnings on
all platforms. The only exceptions (and I will not deal with them
anytime soon) are on Windows:
- the write functions which require the length to be an int when it's
  a size_t on all UNIX variants.
- all iovec manipulation functions where the iov_len is again an int
  when it's a size_t on most of the UNIXes.
As these only happens on Windows, so I think we're set for now :)

This commit was SVN r12215.
2006-10-20 03:57:44 +00:00
George Bosilca
e81d38f322 Remove a function that was just a proof of concept. The same approach is
not used by the TotalView support.

This commit was SVN r12214.
2006-10-20 03:34:16 +00:00
George Bosilca
527bb7a197 Remove a double ;
This commit was SVN r12213.
2006-10-20 03:28:51 +00:00
Brian Barrett
4dad3ef3ef Follow on to r12146. For platforms that dont' have a ptrdiff_t definition,
provide one for the internals of Open MPI.  For mpi.h, typedef MPI_Aint
either to ptrdiff_t or whatever we used as ptrdiff_t if that type doesn't
actually exist.

This commit was SVN r12212.

The following SVN revision numbers were found above:
  r12146 --> open-mpi/ompi@8852c00c36
2006-10-20 03:24:59 +00:00
George Bosilca
f43d4fa4f2 Last set of datatype updates. Mostly function prototypes updates and
explicit casting.

This commit was SVN r12211.
2006-10-20 02:31:50 +00:00
George Bosilca
dc7bcabb22 type.
This commit was SVN r12210.
2006-10-20 02:30:33 +00:00
George Bosilca
caefd6d0ee Do not leak memory. Allocate the intermediary buffer only when we really need it
(not leafs) and release on the same way.

This commit was SVN r12200.
2006-10-19 22:20:33 +00:00
George Bosilca
26b33ec2d7 If there is just one node, we don't need a decision function, just do the copy
and return.

This commit was SVN r12199.
2006-10-19 22:19:36 +00:00
Brian Barrett
204f5b8f52 - Clean up wrapper compiler man pages during maintainer-clean, since
they might require special tools (not sure if sed with multiple -e
    arguments is totally portable)
  - ignore the opalcc.1 man page.  Couldn't do this in the previous
    man page commit (r12192) because I was removing opalcc.1 in that
    commit.

This commit was SVN r12194.

The following SVN revision numbers were found above:
  r12192 --> open-mpi/ompi@581a4b0a4e
2006-10-19 20:14:40 +00:00
Brian Barrett
581a4b0a4e A few cleanups to the wrapper compiler build system / man pages:
- Only install opal{cc,c++} and orte{cc,c++} if configured with
     --with-devel-headers.  Right now, they are always installed, but 
    there are no header files installed for either project, so there's
    really not much way for a user to actually compile an OPAL / ORTE
    application.

  - Drop support for opalCC and orteCC.  It's a pain to setup all the 
    symlinks (indeed, they are currently done wrong for opalCC) and 
    there's no history like there is for mpiCC.

  - Change what is currently opalcc.1 to opal_wrapper.1 and add some
    macros that get sed'ed so that the man pages appear to be 
    customized for the given command.  

  - Install the wrapper data files even if we compiled with 
    --disable-binaries.  This is for the use case of doing multi-lib
    builds, where one word size will only have the library built, but 
    we need both set of wrapper data files to piece together to 
    activate the multi-lib support in the wrapper compilers.

This commit was SVN r12192.
2006-10-19 18:34:17 +00:00
George Bosilca
3eb2f90ceb For the recurvise doubling correctly compute the closest power of 2 number of
nodes.

This commit was SVN r12191.
2006-10-19 17:14:57 +00:00
George Bosilca
041fcb8d18 Update the barrier decision function.
This commit was SVN r12190.
2006-10-19 17:14:01 +00:00
George Bosilca
18d119bc06 No more warnings.
This commit was SVN r12181.
2006-10-18 21:10:11 +00:00
Ralph Castain
d0eb7d7216 Complete the attribute management functions.
Modify the mapper to better bookmark its stopping place each time, and to pick up the next time from there. This needs to be validated on a multi-node system.

Fix a major memory corruption problem in the registry put/get functions that was doing multiple free's. Not sure how valgrind missed this one, though it only occurred in specific circumstances (such as comm_spawn).

This commit was SVN r12179.
2006-10-18 20:02:16 +00:00
Galen Shipman
2036bf5c3c make smart and dumb compilers happy
This commit was SVN r12178.
2006-10-18 19:33:39 +00:00
George Bosilca
c9da782804 Keep only one function to get the size of a datatype.
This commit was SVN r12170.
2006-10-18 17:33:01 +00:00
George Bosilca
3db5c0487d typos.
This commit was SVN r12168.
2006-10-18 17:12:25 +00:00
George Bosilca
21ade43b96 Remove a non reacheable statement.
This commit was SVN r12166.
2006-10-18 16:43:55 +00:00
Rainer Keller
47b24a0603 - Now the branch is done, linearize access regarding
request handling.  Buys a little bit on IMB, no
   functional change, otherwise.

This commit was SVN r12165.
2006-10-18 16:11:50 +00:00
Ralph Castain
f4a458532b This doesn't totally resolve the comm_spawn problem, but it helps a little. I'll continue working on it and hope to resolve it completely shortly. The issue primarily centers on where to start mapping the child job's processes, and how to deal with oversubscription that might result. At the moment, I am trying to resolve the first issue first (hey, that even sounds right!).
This change does a couple of things:

1. Since the USE_PARENT_ALLOC attribute is a directive about regarding allocation of resources to a job, it more properly should be an attribute of the RAS. Change the name to reflect that and move the attribute define to the ras_types.h file.

2. Add the attributes list to the RMAPS map_job interface. This provides us with the desired flexibility to dynamically specify directives for mapping. The system will - in the absence of any attribute-based directive - default to the values provided in the MCA parameters (either from environment or command-line interface).

This commit was SVN r12164.
2006-10-18 14:01:44 +00:00
Gleb Natapov
252a9cea34 Fix bug in vma rcache.
This commit was SVN r12163.
2006-10-18 10:55:01 +00:00
George Bosilca
be27ee6fa0 Correct the bcast problem where we always did a bcast with segzise of 0.
Activate the reduce decision function.
Others small updates (mostly TAB to spaces).

This commit was SVN r12161.
2006-10-18 02:00:46 +00:00
George Bosilca
640178c4b3 Grepping through the source files I found these calls to the data-type engine
with the wrong type of arguments.

This commit was SVN r12148.
2006-10-17 21:05:04 +00:00
George Bosilca
6f5ec2390b pedantic...
This commit was SVN r12147.
2006-10-17 20:25:40 +00:00
George Bosilca
8852c00c36 Look like a big commit but in fact it address only one issue. The way we're working with
size and diplacement of data-type. After this patch all data can contain size_t bytes
and the displacements are defined as ptrdiff_t. All of the files I was able to compile
have been modified to match this requirement.

This commit was SVN r12146.
2006-10-17 20:20:58 +00:00
Andrew Friedley
16769e64fe Remove old UD BTL code.
The UD BTL isn't gone - the latest version is in my afriedle-ud branch.  This version on the trunk was very old, ompi_ignore'd, lacked performance, and probably contained bugs.  The maintained version on my branch is working solid, and will eventually come back, but not for v1.2.

This commit was SVN r12144.
2006-10-17 18:59:21 +00:00
Rainer Keller
668902c780 - trivial spelling
This commit was SVN r12139.
2006-10-17 16:34:52 +00:00
Ralph Castain
13227e36ab This commit looks a lot bigger than it is, so relax :-)
Fix the problem observed by multiple people that comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off.

To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place.

I used those capabilities in two places:

1. Added an attribute list to the rmgr.spawn interface.

2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h).

So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms.

This commit was SVN r12138.
2006-10-17 16:06:17 +00:00
George Bosilca
ed83927025 Don't reset the convertor when a persistent request complete. Instead reset it
next time then request is used. This will keep the execution path on the default
case (not persistent) shorter.

This commit was SVN r12134.
2006-10-17 05:01:47 +00:00
George Bosilca
ef66afe45c Another inner loop optimization. Only check for num_fails when prev_bytes is
equal to num_bytes.

This commit was SVN r12133.
2006-10-17 04:38:38 +00:00
George Bosilca
6a4cf1fef5 A request is inactive only if it is persistant. Therefore, we don't need
2 sequential ifs, we can reach the same result with 2 embedded ifs.

This commit was SVN r12132.
2006-10-17 04:27:00 +00:00
George Bosilca
e116a37482 My last commit was wrong. Here is the correct version.
This commit was SVN r12131.
2006-10-17 02:45:03 +00:00
George Bosilca
01f5b4007b Check the count. It has to be a positive number.
This commit was SVN r12130.
2006-10-17 02:40:17 +00:00
Jeff Squyres
0ebe687ed8 Refs trac:502, #503. This commit as a result of review of r12122.
* Update comments in some MPI_FILE_* functions to reflect that the
   MPI specs have different page numbers in the ps and pdf (woof!).
 * Update comments to say "Retain" where we meant retain (not "return)
 * Add a check in MPI_ERRHANDLER_FREE to raise an MPI exception if the
   user attempts to free an intrinsic errhandler *and* the refcount is
   1 (meaning that it would actually free the intrinsic).  This
   protects erroneous programs from segv'ing.
 * Remove lengthy comment from comm_get_errhandler.c which is no
   longer valid (because of the MPI-2 errata that says that users *do*
   have to call MPI_ERRHANDLER_FREE).

This commit was SVN r12128.

The following SVN revision numbers were found above:
  r12122 --> open-mpi/ompi@407b3cb788

The following Trac tickets were found above:
  Ticket 502 --> https://svn.open-mpi.org/trac/ompi/ticket/502
2006-10-17 00:18:35 +00:00
Pavel Shamis
d64cb58007 Tavor HCA vendor_id was changed to correct one: 23108
Reviewed by: Jeff

This commit was SVN r12127.
2006-10-16 14:46:39 +00:00
Brian Barrett
fdc8f69b84 * need to include iostream as well as stdio.h when doing the tricks with
MPI::SEEK_* because iostreams (well, ios_base, but I don't think that
    should be included directly) can use SEEK_* as values in an enum, which
    means that 'const int' is bad for them.
  * Remove now useless comments in the cxx example programs
  * include iostream after mpi.h so that our examples work with other MPI
    implementations that don't try to be as friendly with the constants.

Refs trac:387

This commit was SVN r12125.

The following Trac tickets were found above:
  Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
2006-10-16 14:20:31 +00:00
Jeff Squyres
407b3cb788 Fix some problems with errhandlers:
* Fix MPI-2 page number in comments for a specific reference in the
   spec
 * Allow getting/setting the errhandler on MPI_FILE_NULL
 * Allow freeing of intrinsic errhandlers, per MPI-2 errata (if you GET
   an errhandler on a communicator, you must be able to FREE it, even
   if it's an intrinsic).

Thanks to Lisandro Dalcin for reporting these problems.

This commit was SVN r12122.
2006-10-16 12:58:40 +00:00
Brian Barrett
e7a7a64e4c Implement MPI::SEEK_{SET, END, POS} for the C++ bindings, working around
some issues with the C #defines SEEK_{SET, END, POS}.  The workaround
involves some hackery that should work in almost every common use case
for the C stdio constants (and all the legal issues of the MPI constants).
The one issue is that the C stdio constants are now const ints instead
of #defines, which means that #ifdef checks will fail for the constants.

Behavior can be disabled at either configure time or build time.

Refs trac:387

This commit was SVN r12121.

The following Trac tickets were found above:
  Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
2006-10-15 23:50:24 +00:00
Brian Barrett
9adde4f7b8 Allow multilib capability based on compiler flags. See:
https://svn.open-mpi.org/trac/ompi/wiki/compilerwrapper3264
for more information.

Refs trac:374

This commit was SVN r12120.

The following Trac tickets were found above:
  Ticket 374 --> https://svn.open-mpi.org/trac/ompi/ticket/374
2006-10-15 21:21:08 +00:00
Brian Barrett
14f338b7df Fix for lock/unlock epoch issues. Previously, we did not handle the case
where a window was in both the passive and active side of a lock sequence.

Refs trac:488

This commit was SVN r12112.

The following Trac tickets were found above:
  Ticket 488 --> https://svn.open-mpi.org/trac/ompi/ticket/488
2006-10-12 22:52:13 +00:00
George Bosilca
c75cfd3efc Remove all warnings from the TotalView interface.
This commit was SVN r12091.
2006-10-11 17:29:29 +00:00
Gleb Natapov
afa26fc4f9 Remove empty lines at the end of the file.
This commit was SVN r12083.
2006-10-11 11:38:38 +00:00
Ralph Castain
1f7a5da3ce Bring singleton comm_spawn online.
This commit was SVN r12081.
2006-10-10 23:59:48 +00:00
Galen Shipman
7102d39415 Use OMPI_FREE_LIST_GET not the blocking OMPI_FREE_LIST_WAIT, this allows GM
to take advantage of PML OB1 resource management.. Tested with intel test
suite p2p_c on 4 nodes. 

This commit was SVN r12075.
2006-10-10 14:56:55 +00:00
George Bosilca
33f300f636 I don't know what it was supposed to do but I'm quite sure it didn't do it correctly.
Just follow inc_num and you will understand. Now _resize will grow the list to match
the required number of elements as described in the comment in the .h file.

This commit was SVN r12074.
2006-10-10 14:47:51 +00:00
Pavel Shamis
e400da01bc Fix of typo error introduced in revision 12053.
Reviewed by: Jeff

This commit was SVN r12071.
2006-10-10 11:58:29 +00:00
George Bosilca
179067dfb5 Correct a type that break the PSM build.
This commit was SVN r12069.
2006-10-09 23:14:22 +00:00
Brian Barrett
51b2a0fd3f A couple of changes to improve shared memory behavior when resources get
constrained:

  * Make sure we always have a number of eager fragments available
    that scales with the number of processes communicating with
    a given proc over shared memory
  * Use FREE_LIST_GET instead of FREE_LIST_WAIT to return an
    error to the PML when resource exhaustion occurs
  * Don't dereference the frag during alloc unless we're sure
    it's not NULL

Reviewed by: Galen

Refs trac:413

This commit was SVN r12053.

The following Trac tickets were found above:
  Ticket 413 --> https://svn.open-mpi.org/trac/ompi/ticket/413
2006-10-06 21:13:49 +00:00
George Bosilca
2411ad74e4 Fix the bug #315. If there are multiple threads waiting for a free_list item
then use broadcast in order to wake them up. If there is only one then use signal
(which is supposed to be faster) and of course if there are no threads
waiting then just continue.

This commit was SVN r12049.
2006-10-06 16:17:50 +00:00
George Bosilca
8e246ce9db Add th headers to the make dist.
This commit was SVN r12045.
2006-10-06 14:52:44 +00:00
Jeff Squyres
77a5eb3b69 Fixes trac:236
Doing pointer math properly (e.g., incrementing by the right amount)
helps you not overflow buffers, cause random chaos, and contribute to
the heat death of the universe.  Sigh.

This commit was SVN r12015.

The following Trac tickets were found above:
  Ticket 236 --> https://svn.open-mpi.org/trac/ompi/ticket/236
2006-10-05 16:43:10 +00:00
George Bosilca
b2b2ade6c9 Remove the last underscore. Look like a typo ...
This commit was SVN r12006.
2006-10-05 05:41:57 +00:00
George Bosilca
3fc0e969b7 Cast in order to return the expected bool.
This commit was SVN r11992.
2006-10-05 05:11:05 +00:00
George Bosilca
b27f1814c6 If the function is expected to return a bool then let's return only
true or false.

This commit was SVN r11991.
2006-10-05 05:10:34 +00:00
George Bosilca
4689c56210 Always cast the return of malloc.
This commit was SVN r11990.
2006-10-05 05:07:43 +00:00
George Bosilca
5fb0c63959 Keep the ompi_config.h in the begining of the include list. This fixes
the #365. I found another way around on Windows.

This commit was SVN r11989.
2006-10-05 05:02:27 +00:00
George Bosilca
bf1d4ee259 Cast to the uint64_t before playing with the << operator in order to make sure we keep
all usefull informations.

This commit was SVN r11986.
2006-10-04 23:58:59 +00:00
George Bosilca
9e4b64ce81 Cleaning up the file, removing useless asserts. Commenting all calls to
opal_output.

This commit was SVN r11985.
2006-10-04 23:57:05 +00:00
George Bosilca
54476382e6 Remove all useless asserts and correct one of the comments.
Correctly compute the length to be unpacked.
Protect all outputs (only on debug mode and only if mpi_ddt_unpack is set).

This commit was SVN r11984.
2006-10-04 23:56:10 +00:00
George Bosilca
66e039496d This is the minimal patch for #365 (BLACS problem). Correctly compute the displacement
of the elements when we unroll a loop.

This commit was SVN r11983.
2006-10-04 23:54:22 +00:00
Jeff Squyres
e08c6e81f5 * Fixes trac:338: Only look at root-significant values at the root (e.g.,
recvbuf in MPI_GATHER).
 * Minor style updates (constants on the left of == and !=)
 * Fix a minor buglet that crept in r11904: had a recvbuf where it
   should have been recvcount.  Thankfully, this would have only
   affected erroneous programs.  ;-)

This commit was SVN r11980.

The following SVN revision numbers were found above:
  r11904 --> open-mpi/ompi@17539dc154

The following Trac tickets were found above:
  Ticket 338 --> https://svn.open-mpi.org/trac/ompi/ticket/338
2006-10-04 22:36:01 +00:00
George Bosilca
e4df4285b1 Reorder the enum in order to allow some compilers to optimize the big switch in
the header analisys.

This commit was SVN r11975.
2006-10-04 20:03:28 +00:00
George Bosilca
4d18d208d4 Getting closer to a full Total View support.
This commit was SVN r11974.
2006-10-04 20:01:33 +00:00
George Bosilca
e9572c7aee Mea culpa ... that was an weird error. When playing with 64 bits types we have to
the constant before using the << operator ... otherwise everything is 32 bits ...
which does not give the right answer.

This commit was SVN r11972.
2006-10-04 16:23:51 +00:00
George Bosilca
d05f492901 With the lastest changes for heterogeneous environments we have to
compute the hetero flag a little bit different. The hetero flag is now
attached to a convertor if and only if we can use the optimized conversion
functions. It's a little bit broader than before (the 2 architectures
has to be identical).

This commit was SVN r11962.
2006-10-03 20:25:30 +00:00
George Bosilca
432659a2d8 Fix some typos.
This commit was SVN r11961.
2006-10-03 20:21:24 +00:00
Andrew Friedley
836261b85a Fixes ticket 186.
First, move the OPAL_THREAD_LOCK out to the same level as its corresponding UNLOCK.  It was possible to hit the UNLOCK without ever acquiring the lock.

Since the OPAL_THREAD_ADD64() is now protected by this lock, we can just do the decrement non-atomically.

This commit was SVN r11958.
2006-10-03 18:15:26 +00:00
Andrew Friedley
1177844d7a Fixes trac:183.
Don't try to acquire ompi_request_lock here, which in all cases is already held.  Avoids deadlock that occurs even when threads are enabled and we're running a THREAD_SINGLE app.

Reviewed by Galen.

This commit was SVN r11957.

The following Trac tickets were found above:
  Ticket 183 --> https://svn.open-mpi.org/trac/ompi/ticket/183
2006-10-03 18:08:48 +00:00
George Bosilca
f4da7a80bd Fine grain selection for heterogeneous environments. The hetero version of the
conversion function are more complex and costly than a simple memcpy. Therefore,
we want to decrease as much as possible the usage of these functions.
We now check not only th HOMOGENEOUS flag on the datatype or convertor, but the
bits indicating a type is in use. If a communication transfert a type having the
same representation on both peers we can use the optimized version of the conversion.
In same time we build a more accurate conversion table for each master convertor,
based on the minimum differences between the 2 architectures.

This commit was SVN r11945.
2006-10-03 08:13:16 +00:00
Jeff Squyres
9ccbab464d Remove unused variable.
This commit was SVN r11918.
2006-10-01 11:51:41 +00:00
Dan Lacher
ba0389723e Ticket: #346
remove requirements on .la files on wrapper scripts

Ticket: #374
  extend compilers to support 32 bit and 64 bit in one version of the wrapper

Submitted by: Dan Lacher
Reviewed by: Rolf Vandevaart

This commit was SVN r11908.
2006-09-29 23:58:58 +00:00
Jeff Squyres
17539dc154 Fixes trac:430. Fix a few places where optimization checking conflicted
with the use of MPI_IN_PLACE, and make some optimization checks more
correct.  Thanks to Lisandro Dalcin for reporting the problems.

This commit was SVN r11904.

The following Trac tickets were found above:
  Ticket 430 --> https://svn.open-mpi.org/trac/ompi/ticket/430
2006-09-29 22:49:04 +00:00
Edgar Gabriel
ec55acd8f4 orte_rml.send_buffer returns the number of bytes sent or a negative value if
something went wrong. A positiv number > 0 is however a correct value (in
contrary to orte_rml.recv_buffer, which really returns ORTE_SUCCESS or an
error code).

Note: this part of the code is correct on 1.1 and 1.2 branch, no need to move
this change patch to the release branches.

This commit was SVN r11897.
2006-09-29 20:28:45 +00:00
Jeff Squyres
785a2e1c90 Move the man page installs to install-data-hook. Putting them in
install-exec-hook is not only wrong, it can cause ordering issues such
as trying to put sym links to man pages in directories that do not yet
exist.

This commit was SVN r11893.
2006-09-29 14:34:39 +00:00
Ralph Castain
0411f9772e Begin instrumenting for scalability tests.
I have added a new MCA param (hey, you can't have too many!) called OMPI_MCA_orte_timing. If set to anything other than zero, the system will report out critical timing loops. At the moment, this includes three measurements:

1. Time spent going through the RDS->RAS->RMAPS, setting up triggers, etc. prior to calling the actual PLS launch function. This is reported out as time to setup job.

2. Time spent in MPI_Init from start of that function (well, right after opal_init) to the place where we send all of our info the registry. Reported out as time from start to exec_compound_cmd

3. Time actually spent executing the compound cmd. Reported out as time to exec_compound_cmd.

A few additional timing points will be added shortly.

These may eventually be removed or (better) setup with a conditional compile flag.

This commit was SVN r11892.
2006-09-29 13:19:44 +00:00
Jeff Squyres
7b9b90f10a openmpi.3 was deleted; OpenMPI.3 was the one that was left.
This commit was SVN r11886.
2006-09-29 11:59:27 +00:00
Jeff Squyres
b3a35b7d1c Remove entries for files that are no longer there.
This commit was SVN r11885.
2006-09-29 11:57:08 +00:00
Andrew Lumsdaine
e187f058b7 Remove mpi.3 to fix naming conflict
This commit was SVN r11883.
2006-09-29 01:49:21 +00:00
Andrew Lumsdaine
a1ef0067e0 Remove openmpi.3 to fix naming conflict
This commit was SVN r11882.
2006-09-29 01:48:13 +00:00
George Bosilca
7c8c8d6a46 Keep the critical path as short as possible.
This commit was SVN r11881.
2006-09-28 23:59:24 +00:00
George Bosilca
8d2a8229bb We don't use the send and receive request destructor.
This commit was SVN r11880.
2006-09-28 23:57:49 +00:00
George Bosilca
7f2fd41ace Make sure we trigger the PERUSE event before releasing the request.
This commit was SVN r11879.
2006-09-28 23:54:38 +00:00
George Bosilca
9ae37e474b Force the initialization of the convertor if we detect truncation of messages.
This commit was SVN r11877.
2006-09-28 23:42:56 +00:00
George Bosilca
e5ccc1aece Keep the loop as short as possible. And specialize the search for ANY_TAG.
This commit was SVN r11874.
2006-09-28 22:47:40 +00:00
Jeff Squyres
9c37afc265 Add Makefile mojo to install the MPI API man pages and include them in
the distribution tarball.

This commit was SVN r11872.
2006-09-28 19:59:28 +00:00
Rolf vandeVaart
aa7317d635 Initial man pages for all the MPI APIs. There are 304 of them. Also a general man page for mpi,openmpi,MPI, and OpenMPI. Extra work is needed so they get installed properly.
This commit was SVN r11871.
2006-09-28 18:39:36 +00:00
Brian Barrett
451d362296 MPI_Fence's stupid "I might be in a fence epoch but you can't tell until
I do something else" rule screws me up again.  If we're in a FENCE, but
not in ACCESS | EXPOSE, put us in ACCESS|EXPOSE, as we are now known we
now in a real Fence epoch.  Yay silly MPI standards

Refs trac:441

This commit was SVN r11865.

The following Trac tickets were found above:
  Ticket 441 --> https://svn.open-mpi.org/trac/ompi/ticket/441
2006-09-28 15:11:11 +00:00
Gleb Natapov
26660d657a Openib BTL supports PUT and GET protocol, communicate it to PML. We get better
bandwidth with GET protocol and mpi_leave_pinned option.

This commit was SVN r11862.
2006-09-28 11:48:04 +00:00
Gleb Natapov
30608c5a50 Add separate queues for pending rget and rput frags. Process only limited
number of pending packets at once, otherwise we can spin forever.

This commit was SVN r11861.
2006-09-28 11:41:45 +00:00
Jeff Squyres
a5928a2a89 Ralf W. swears to me that this is a product of a very, very old AM bug
and is no longer necessary (especially since we enforce pretty recent
Gnu tools).

This commit was SVN r11855.
2006-09-27 21:33:26 +00:00
Terry Dontje
d228784fa8 Fixed trac #409 by adding a new MCA parameter named
btl_base_warn_component_unused.

This commit was SVN r11844.
2006-09-27 13:24:42 +00:00
Jeff Squyres
860788ff8f Fixes trac:389
Make the Fortran MPI_MAX_DATAREP_STRING follow the same convention as
the rest of the Fortran constants -- be one less than the C constant
of the same name.

This commit was SVN r11842.

The following Trac tickets were found above:
  Ticket 389 --> https://svn.open-mpi.org/trac/ompi/ticket/389
2006-09-27 11:24:36 +00:00
Edgar Gabriel
4dafe7ce0d The length of the ranks-arrays can be longer than the size of the according
groups. And zero is also an acceptable value according to the MPI spec.

Fixes trac:428

This commit was SVN r11841.

The following Trac tickets were found above:
  Ticket 428 --> https://svn.open-mpi.org/trac/ompi/ticket/428
2006-09-27 11:02:47 +00:00
Jeff Squyres
afa7f53d43 Refs trac:429.
Tweak a comment to match the code.

This commit was SVN r11839.

The following Trac tickets were found above:
  Ticket 429 --> https://svn.open-mpi.org/trac/ompi/ticket/429
2006-09-27 00:04:24 +00:00
Jeff Squyres
13d5e5aab4 Refs trac:429
Fixes simple off-by-one error in the error check for
MPI_INFO_GET_NTHKEY. 

This commit was SVN r11838.

The following Trac tickets were found above:
  Ticket 429 --> https://svn.open-mpi.org/trac/ompi/ticket/429
2006-09-26 23:36:22 +00:00
Brian Barrett
85cc94f634 Change mpool shared memory file size from 512MB to a scaling factor based
on the number of known local procs, with a high and low watermark.  Right
now, we default to a low watermark point of 64MB, a per-proc scaling
factor of 32MB, and a high watermark point of 512MB

Refs trac:212

This commit was SVN r11824.

The following Trac tickets were found above:
  Ticket 212 --> https://svn.open-mpi.org/trac/ompi/ticket/212
2006-09-26 16:02:31 +00:00
Brian Barrett
58dce54513 don't use macro or it will try to use the (now destroyed) MPI_COMM_WORLD,
which isn't going to work well

Refs trac:419

This commit was SVN r11822.

The following Trac tickets were found above:
  Ticket 419 --> https://svn.open-mpi.org/trac/ompi/ticket/419
2006-09-26 15:56:51 +00:00
Terry Dontje
bc93adee26 Fixed connection inversion bug by putting in sequence checking for
first sendrecv exchanges for each connection.  This was to fix Trac
#390.

This commit was SVN r11821.
2006-09-26 13:53:00 +00:00
Gleb Natapov
7b1b4f95e3 Local GID table contains not what I thought it contains. It contains local HCA
GIDs (there can be more than one) and not GIDs of the HCA on the network. Entry
zero always have to be initialized so we use it, and warn user if there is more
then one port active and default subnet is configured on at least one of them.

This commit was SVN r11815.
2006-09-26 12:12:33 +00:00
Brian Barrett
a546b6834b Only add the ROMIO source directory into DIST_SUBDIRS and SUBDIRS if
the component is configured successfully.  Otherwise, we can end up 
trying to run make in the romio directory without any Makefiles.  This
really only happens on the targets that recurse into DIST_SUBDIRS - ie
dist, maintainer-clean, and distclean

refs trac:411

This commit was SVN r11807.

The following Trac tickets were found above:
  Ticket 411 --> https://svn.open-mpi.org/trac/ompi/ticket/411
2006-09-25 23:51:13 +00:00
Brian Barrett
c3306f7073 * don't abort if we get a status error - just pass it on to the next layer up
This commit was SVN r11791.
2006-09-25 17:28:24 +00:00
Brian Barrett
b18055451d * need to include errno.h to get errno
This commit was SVN r11790.
2006-09-25 17:17:43 +00:00
Jeff Squyres
a822c34ebf Ralf W. categorically told me that this kind of statement in
Makefile.am's is from a very old Automake bug which has long-since
been fixed.  Since we require very recent versions of AM, we don't
need these anymore.

This commit was SVN r11774.
2006-09-25 14:28:04 +00:00
Jeff Squyres
c5cc1f0c1a Add man page for wrapper compilers.
Fixes trac:358.

This commit was SVN r11773.

The following Trac tickets were found above:
  Ticket 358 --> https://svn.open-mpi.org/trac/ompi/ticket/358
2006-09-25 14:11:21 +00:00
Gleb Natapov
9cd25158d6 Fix btl_openib_max_btls parameter handling.
This commit was SVN r11772.
2006-09-25 11:18:20 +00:00
Terry Dontje
d636db5832 Fixed bug trac #213 by moving the udapl btl header to being a footer.
Also fixed bug trac #346.

This commit was SVN r11760.
2006-09-22 19:28:09 +00:00
Brian Barrett
ab6cbb2359 * update ompi_mpi_abort to call abort_procs_request on the processes that
should die, according to the MPI standard.  It's possible that the
    ORTE layer may kill additional processes, but that's beyond our
    control and seems to be allowed by the standard (ie, it might also
    end up killing all the procs in all the jobs covered by the
    communicator).

  * update the stack trace printing code to use the framework rather
    than calling execinfo directly, so that we should be able to get
    stack traces on all the platforms we support stack tracing on
    (if the user wants stack traces on abort, of course)

This commit was SVN r11753.
2006-09-22 15:04:04 +00:00
Gleb Natapov
601a6ca17a Use real subnet prefix instead of sm_lid.
This commit was SVN r11749.
2006-09-22 10:27:12 +00:00
Brian Barrett
8fc278c3a3 Rest of the fix for #325. It uses a bit more space, but now we can reasonably
tell if the remote proc should be in an exposure epoch or not.

Refs trac:325

This commit was SVN r11746.

The following Trac tickets were found above:
  Ticket 325 --> https://svn.open-mpi.org/trac/ompi/ticket/325
2006-09-21 20:49:15 +00:00
Brian Barrett
2ec0c4f593 * Fix race condition in post/wait/start/complete synchronization where one
epoch's control data could overwrite the previous epoch's data because
    we were reusing data structures between PW and SC.  Instead, we now
    have explicit post_msg and complete_msg counters for completion.

    refs trac:354

  * Only register the rdma osc callback once, as it turns out that some
    btls (MX) do somethng more than update a table during the register
    call, and each register call sucks up valuable fragments...

This commit was SVN r11745.

The following Trac tickets were found above:
  Ticket 354 --> https://svn.open-mpi.org/trac/ompi/ticket/354
2006-09-21 19:57:57 +00:00
George Bosilca
383c4e8c18 Don't print the optimized description if there is nothing.
This commit was SVN r11733.
2006-09-21 05:40:51 +00:00
George Bosilca
645790dd9c Pedantic...
This commit was SVN r11731.
2006-09-20 22:20:10 +00:00
George Bosilca
688a16ea78 A long time waiting patch. Get rid of the comm->c_pml_procs. It was (and that was
long ago) supposed to be used as a cache for accessing the PML procs. But in
all of the PMLs the PML proc contain only one field i.e. a pointer to the ompi_proc.
This pointer can be accessed using the c_remote_group easily. Therefore, there is no
meaning of keeping the PML procs around. Slim fast commit ...

This commit was SVN r11730.
2006-09-20 22:14:46 +00:00
George Bosilca
20459bd982 Remove the HIDDEN flag. It is not used anywhere.
This commit was SVN r11729.
2006-09-20 20:57:10 +00:00