
652 Commits

Author SHA1 Message Date
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
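
The parent/root/descendant bookkeeping described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the ORTE name service: every name here (`job_t`, `job_register`, `job_root`, `job_collect_descendants`, `terminate_job`, `INCLUDE_DESCENDANTS`) is hypothetical, each job stores only its parent, and "terminating" a job just sets a flag.

```c
#include <stddef.h>
#include <stdbool.h>

#define MAX_JOBS  64
#define NO_PARENT (-1)

/* Hypothetical attribute flag mirroring the idea of
 * ORTE_NS_INCLUDE_DESCENDANTS; not the real ORTE value. */
#define INCLUDE_DESCENDANTS 0x1

typedef struct {
    int  parent;      /* jobid that spawned this job, or NO_PARENT */
    bool in_use;
    bool terminated;
} job_t;

static job_t jobs[MAX_JOBS];

/* Register a job; parent is NO_PARENT for the original (root) job. */
void job_register(int jobid, int parent)
{
    jobs[jobid].parent = parent;
    jobs[jobid].in_use = true;
    jobs[jobid].terminated = false;
}

/* Walk up the lineage until reaching a job with no parent
 * (assumes the lineage is acyclic). */
int job_root(int jobid)
{
    while (jobs[jobid].parent != NO_PARENT)
        jobid = jobs[jobid].parent;
    return jobid;
}

/* Append every descendant of jobid (children, grandchildren, ...)
 * to out, starting at index count. Returns the new count. */
size_t job_collect_descendants(int jobid, int *out, size_t count)
{
    for (int j = 0; j < MAX_JOBS; j++) {
        if (jobs[j].in_use && jobs[j].parent == jobid) {
            out[count++] = j;
            count = job_collect_descendants(j, out, count);
        }
    }
    return count;
}

/* Terminate jobid and, if requested, its whole subtree. */
void terminate_job(int jobid, unsigned attrs)
{
    jobs[jobid].terminated = true;
    if (attrs & INCLUDE_DESCENDANTS) {
        int desc[MAX_JOBS];
        size_t n = job_collect_descendants(jobid, desc, 0);
        for (size_t i = 0; i < n; i++)
            jobs[desc[i]].terminated = true;
    }
}
```

For example, with jobs 0 -> 1 -> 2 and 0 -> 3, `job_root(2)` is 0 and `terminate_job(1, INCLUDE_DESCENDANTS)` marks jobs 1 and 2 but leaves 0 and 3 alone. Deriving the root by walking parent links keeps one field of state per job; caching the root would trade memory for a shorter lookup.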

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
Andrew Friedley
a4bdcb4faa Fix a segfault that turned up in more MPI_THREAD_MULTIPLE testing.
Same sort of problem and fix as described in r12323 - mca_pml_ob1_recv_frag_progress() was segfaulting due to a NULL req_proc pointer.  The path leading to this was through the mca_pml_ob1_check_cantmatch_for_match() function, where we can match a frag using the same macros as mca_pml_ob1_frag_match() and never initialize the req_proc pointer.

This commit was SVN r12582.

The following SVN revision numbers were found above:
  r12323 --> open-mpi/ompi@c752502dee
2006-11-13 20:12:51 +00:00
George Bosilca
a38cd366d7 Construct the convertor. It's not really required, but it's not in the
critical path anyway. At least in debug mode we get nice informations about
where the convertor was created.

This commit was SVN r12549.
2006-11-10 20:55:06 +00:00
George Bosilca
858ab24e8e The req_mtl field has to be the last in the struct or bad things happen.
This commit was SVN r12548.
2006-11-10 20:53:41 +00:00
George Bosilca
17405cd9c6 A temporary fix, until we figure out a better approach. The problem
is that if one add "pml=" to the configuration file, really bad things
happen. All PMLs will get initialize, and each of them will initialize
all BTLs. This patch force the mca_pml_base_pml to get initialized in
all cases before we go out of the mca_pml_base_open function.

This commit was SVN r12527.
2006-11-10 04:53:00 +00:00
George Bosilca
eab1776e9a Explicit casts for our friendly Windows environment...
This commit was SVN r12496.
2006-11-08 17:02:46 +00:00
George Bosilca
915d748d72 Initialize the convertor on _START not on _INIT. This allow us to
set it up before the match when we know the peer, saving some
time on the critical path. If the receive is ANY_SOURCE then
we initialize the convertor on _MATCHED. Anyway, we will set it
up only once per receive.
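
A minimal sketch of the "set it up once, as early as the peer is known" rule. It assumes a hypothetical `recv_req_t` with an `init_count` field added purely so the once-per-receive property can be checked; `recv_start` and `recv_matched` stand in for the _START and _MATCHED steps and are not the real PML macros.

```c
#include <stdbool.h>

#define ANY_SOURCE (-1)

/* Hypothetical, simplified receive request. */
typedef struct {
    int  peer;             /* requested source, or ANY_SOURCE */
    bool convertor_ready;
    int  convertor_peer;   /* peer the convertor was prepared for */
    int  init_count;       /* times the convertor was prepared (for checking) */
} recv_req_t;

static void prepare_convertor(recv_req_t *req, int peer)
{
    req->convertor_peer = peer;
    req->convertor_ready = true;
    req->init_count++;
}

/* _START: if the source is known, prepare the convertor now,
 * before matching, so it is off the critical path. */
void recv_start(recv_req_t *req)
{
    req->convertor_ready = false;
    if (req->peer != ANY_SOURCE)
        prepare_convertor(req, req->peer);
}

/* _MATCHED: only ANY_SOURCE receives still need a convertor,
 * and at this point the actual sender is known. */
void recv_matched(recv_req_t *req, int sender)
{
    if (!req->convertor_ready)
        prepare_convertor(req, sender);
}
```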

This commit was SVN r12484.
2006-11-08 05:42:29 +00:00
George Bosilca
eb45a5e402 Move things around a little bit. Mainly fields from the send and receive
request in the base request. Rearrange the fields to keep the data
together. Remove some useless tests.

This commit was SVN r12482.
2006-11-08 04:58:23 +00:00
Galen Shipman
55db17b37c don't try to use a dead btl..
This commit was SVN r12456.
2006-11-06 23:25:24 +00:00
Galen Shipman
eef37430a7 failing already failed for ACK timeout..
This commit was SVN r12452.
2006-11-06 22:09:39 +00:00
Galen Shipman
813e7faea8 more fixes for failover.. and yet still more to come..
This commit was SVN r12450.
2006-11-06 21:27:17 +00:00
Gleb Natapov
82f7c0dd69 Fix regression from v1.1.
1) make the code do what the comment says
2) if memory is prepinned don't send multiple PUT messages.

This commit was SVN r12433.
2006-11-06 12:00:17 +00:00
Galen Shipman
f7c554df65 Try to failover when we get an async error from the lower layer (BTL)..
This commit was SVN r12420.
2006-11-03 15:40:26 +00:00
Gleb Natapov
7b39039cd6 Add comments to process_pending functions.
This commit was SVN r12346.
2006-10-29 09:12:24 +00:00
Gleb Natapov
8ef5b6a589 Change tabs to spaces to be consistent with the rest of the file.
This commit was SVN r12345.
2006-10-29 08:12:44 +00:00
George Bosilca
a9c6ae8f15 Minimize the number of branches, and orce the correct prediction for the
most usual one. Most of the time we expect the functions which allocate
requests to succeed.
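
One common way to express such a prediction in C is GCC/Clang's `__builtin_expect`. The sketch below wraps it in hypothetical `LIKELY`/`UNLIKELY` macros (falling back to the plain expression on other compilers) and uses a made-up `alloc_request` to show the success path laid out as the expected one; it is not the code from this commit.

```c
#include <stdlib.h>

/* Hypothetical hint macros built on the GCC/Clang __builtin_expect
 * builtin; other compilers just get the plain expression. */
#if defined(__GNUC__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

typedef struct { int tag; } request_t;   /* hypothetical request type */

#define ERR_OUT_OF_RESOURCE (-2)

/* Allocation is expected to succeed, so the failure branch is marked
 * unlikely and the success path falls straight through. */
int alloc_request(request_t **out)
{
    request_t *req = malloc(sizeof(*req));
    if (UNLIKELY(req == NULL))
        return ERR_OUT_OF_RESOURCE;
    req->tag = 0;
    *out = req;
    return 0;
}
```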

This commit was SVN r12344.
2006-10-27 23:16:13 +00:00
George Bosilca
44f3dd81b4 Update the comment to reflect what's inside the code.
This commit was SVN r12343.
2006-10-27 23:09:37 +00:00
George Bosilca
3472d19d4d Do not modify the convertor if there is no data to be send across the network. The
req_bytes_packed field is initialized in the BASE_INIT macro, so it is set for
all requests at this stage.

This commit was SVN r12342.
2006-10-27 23:03:15 +00:00
Jeff Squyres
020efdf1f9 Refs trac:250
This commit essentially caches the invoking comm/win/file on the
ompi_request_t. This, paired with the req_type field, allows us to
retrieve the invoking MPI object and invoke the proper errhandler.

The patch is missing most updates for the MPI-2 one-sided stuff (i.e.,
the patch mainly fixes comms and files); I didn't really understand
that code and didn't want to hazard trying to figure it out when Brian
can probably do it much more quickly.

So #250 will still stay open, pending MPI-2 one-sided updates for this
stuff.

This commit was SVN r12339.

The following Trac tickets were found above:
  Ticket 250 --> https://svn.open-mpi.org/trac/ompi/ticket/250
2006-10-27 12:35:27 +00:00
Jeff Squyres
e02114dcf3 Fixes trac:529.
 * Create a new request type: NOOP (described below)
 * For all MPI_*_INIT functions, OBJ_NEW an ompi_request_t and set its
   type to NOOP
 * Ensure that the NOOP requests are OBJ_RELEASE'd when they are done
 * MPI_START looks at the request type; if NOOP, just return success. If
   not, call the PML start() function
 * MPI_STARTALL always passes the entire array of requests back to the PML
   (see next point)
 * Make the PMLs only process PML requests (i.e., ignore/skip anything
   that isn't of type PML -- such as the NOOP requests)
 * Add a little more param error checking in STARTALL
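
The request-type filtering in the list above can be sketched as follows. `request_t`, `REQ_PML`, `REQ_NOOP`, `pml_start`, `start_one`, and `start_all` are hypothetical stand-ins for `ompi_request_t`, its type field, the PML start function, MPI_START, and MPI_STARTALL; reference counting (OBJ_NEW/OBJ_RELEASE) is left out.

```c
#include <stddef.h>

#define SUCCESS 0

/* Hypothetical request types; only PML requests carry real work. */
typedef enum { REQ_PML, REQ_NOOP } req_type_t;

typedef struct {
    req_type_t type;
    int        started;   /* set by the PML start routine */
} request_t;

/* PML start: skip anything that is not a PML request (e.g. NOOP). */
int pml_start(size_t count, request_t **reqs)
{
    for (size_t i = 0; i < count; i++) {
        if (reqs[i]->type != REQ_PML)
            continue;
        reqs[i]->started = 1;
    }
    return SUCCESS;
}

/* MPI_START-like entry point: a NOOP request just returns success. */
int start_one(request_t *req)
{
    if (req->type == REQ_NOOP)
        return SUCCESS;
    return pml_start(1, &req);
}

/* MPI_STARTALL-like entry point: hand the whole array to the PML,
 * which does the filtering itself. */
int start_all(size_t count, request_t **reqs)
{
    return pml_start(count, reqs);
}
```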

This commit was SVN r12338.

The following Trac tickets were found above:
  Ticket 529 --> https://svn.open-mpi.org/trac/ompi/ticket/529
2006-10-27 12:32:36 +00:00
George Bosilca
126a68dc9a Big datatype commit. Remove all unused features of the datatype engine. As the memory
allocation logic is completely done outside the data-type engine (in the PML) there is
no need for any special case inside the data-type engine. There is less arguments for
the ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is
not required anymore as there is no memory allocated in the engine itself). This change
affect all components using datatypes. I test most of them, but it might happens that I
miss some ... If it's the case please let me know (don't shoot the pianist!!).

This commit was SVN r12331.
2006-10-26 23:11:26 +00:00
Andrew Friedley
c752502dee Fix for a common race condition when running the Sandia mt_send_recv.cc test.
A segfault would occur in mca_pml_ob1_recv_request_progress() when trying to prepare the convertor for unpacking, because the request's req_proc field was NULL.

Turns out that we weren't setting the req_proc field in the MCA_PML_OB1_CHECK_SPECIFIC_AND_WILD_RECEIVES_FOR_MATCH macro.  Instead of just setting it there I removed the other place req_proc was being set correctly, and instead took care of all the cases at once in mca_pml_ob1_recv_frag_match().
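
The idea of binding the matched request in one place, rather than in each match macro, can be sketched as follows. The types and functions (`proc_t`, `recv_request_t`, `bind_match`, `match_specific`, `match_wild`) are hypothetical simplifications, not the actual ob1 structures.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the PML types. */
typedef struct { int rank; } proc_t;

typedef struct {
    int     req_peer;   /* requested source, or ANY_SOURCE */
    proc_t *req_proc;   /* must be non-NULL once the request is matched */
} recv_request_t;

#define ANY_SOURCE (-1)

/* Single place where a fragment's source is bound to a request: every
 * match path funnels through here, so no path can forget req_proc. */
static recv_request_t *bind_match(recv_request_t *req, proc_t *frag_src)
{
    if (req != NULL)
        req->req_proc = frag_src;
    return req;
}

recv_request_t *match_specific(recv_request_t *req, proc_t *src)
{
    return (req->req_peer == src->rank) ? bind_match(req, src) : NULL;
}

recv_request_t *match_wild(recv_request_t *req, proc_t *src)
{
    return (req->req_peer == ANY_SOURCE) ? bind_match(req, src) : NULL;
}
```

The later fix in r12582 shows the caveat of this design: any match path that bypasses the common helper can still leave `req_proc` NULL.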

This commit was SVN r12323.
2006-10-26 19:09:39 +00:00
Gleb Natapov
90be664b9f Some process_pending() functions get bml_btl on which resource was freed as a
parameter. For optimisation purpose only this BTL is used to send packet
through instead of trying to send packets through all BTLs. But actually the 
code was wrong. It simply used provided bml_btl and it may represent different
endpoint from packet's destination. The fixed code checks if packet's
destination is reachable through the BTL, finds appropriate bml_btl and only
then tries to send it through correct bml_btl.
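
A sketch of the corrected lookup, assuming hypothetical simplified types: `btl_t` is a transport module, and each destination `endpoint_t` holds its own array of `bml_btl_t` entries. Given the bml_btl whose resources were freed, `pick_for_pending` looks up the destination's own entry for the same BTL rather than reusing the freed entry directly.

```c
#include <stddef.h>

/* Hypothetical simplified types: a BTL (transport module) and, per
 * destination, the bml_btl entries that can reach that destination. */
typedef struct { int id; } btl_t;
typedef struct { btl_t *btl; } bml_btl_t;
typedef struct { bml_btl_t *btls; size_t n; } endpoint_t;

/* Return the destination's own bml_btl that uses the given BTL,
 * or NULL if the destination is not reachable through that BTL. */
bml_btl_t *find_bml_btl(endpoint_t *dst, btl_t *btl)
{
    for (size_t i = 0; i < dst->n; i++)
        if (dst->btls[i].btl == btl)
            return &dst->btls[i];
    return NULL;
}

/* Retry a pending packet on the BTL whose resources were just freed,
 * but only through the destination's matching bml_btl. */
bml_btl_t *pick_for_pending(endpoint_t *dst, bml_btl_t *freed)
{
    if (freed == NULL)
        return NULL;   /* no hint: the caller would try all BTLs */
    return find_bml_btl(dst, freed->btl);
}
```

A NULL result means the destination cannot be reached through that BTL, so the caller must not send the packet there.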

This commit was SVN r12319.
2006-10-26 13:21:47 +00:00
Sven Stork
f3f39e003e - Increment the pipeline depth before we trigger the send function. As
mentioned in the comment the completion/callback of the triggered 
  send operation can happen before the call returns. If this happens and
  if the pipeline depth is 0 before we triggered the send operation and 
  this is the last send operation of the request then the completion detection
  code will decrement the pipeline depth and check it for equality to 0.
  Because (0-1) != 0 the pml completion function for this request will 
  *not* be called.
  This part 2 of the fix for ticket #246.
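
The ordering problem is easy to reproduce with a completion callback that fires synchronously. In this sketch, `send_req_t`, `btl_send`, `send_frag_buggy`, and `send_frag_fixed` are hypothetical; `btl_send` simply invokes the completion callback before returning, which is the worst case described above.

```c
#include <stdbool.h>

/* Hypothetical request tracking the number of in-flight fragments. */
typedef struct {
    int  pipeline_depth;
    bool completed;
} send_req_t;

/* Completion callback: may run before the send call returns. */
static void frag_complete(send_req_t *req, bool last)
{
    req->pipeline_depth--;
    if (last && req->pipeline_depth == 0)
        req->completed = true;
}

/* Stand-in for a BTL send whose completion fires synchronously. */
static void btl_send(send_req_t *req, bool last)
{
    frag_complete(req, last);
}

/* Buggy order: the callback sees depth -1, so completion is missed. */
void send_frag_buggy(send_req_t *req, bool last)
{
    btl_send(req, last);
    req->pipeline_depth++;
}

/* Fixed order: account for the fragment before it can complete. */
void send_frag_fixed(send_req_t *req, bool last)
{
    req->pipeline_depth++;
    btl_send(req, last);
}
```

With the buggy order the callback sees a depth of -1 and the request is never marked complete; with the fixed order it sees 0 and completes.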

This commit was SVN r12292.
2006-10-25 08:52:39 +00:00
George Bosilca
06563b5dec Last set of explicit conversions. We are now close to the zero warnings on
all platforms. The only exceptions (and I will not deal with them
anytime soon) are on Windows:
- the write functions which require the length to be an int when it's
  a size_t on all UNIX variants.
- all iovec manipulation functions where the iov_len is again an int
  when it's a size_t on most of the UNIXes.
As these only happens on Windows, so I think we're set for now :)

This commit was SVN r12215.
2006-10-20 03:57:44 +00:00
Galen Shipman
2036bf5c3c make smart and dumb compilers happy
This commit was SVN r12178.
2006-10-18 19:33:39 +00:00
Rainer Keller
47b24a0603 - Now the branch is done, linearize access regarding
request handling.  Buys a little bit on IMB, no
   functional change, otherwise.

This commit was SVN r12165.
2006-10-18 16:11:50 +00:00
George Bosilca
6f5ec2390b pedantic...
This commit was SVN r12147.
2006-10-17 20:25:40 +00:00
George Bosilca
8852c00c36 Look like a big commit but in fact it address only one issue. The way we're working with
size and diplacement of data-type. After this patch all data can contain size_t bytes
and the displacements are defined as ptrdiff_t. All of the files I was able to compile
have been modified to match this requirement.

This commit was SVN r12146.
2006-10-17 20:20:58 +00:00
George Bosilca
ed83927025 Don't reset the convertor when a persistent request complete. Instead reset it
next time then request is used. This will keep the execution path on the default
case (not persistent) shorter.

This commit was SVN r12134.
2006-10-17 05:01:47 +00:00
George Bosilca
ef66afe45c Another inner loop optimization. Only check for num_fails when prev_bytes is
equal to num_bytes.

This commit was SVN r12133.
2006-10-17 04:38:38 +00:00
George Bosilca
b27f1814c6 If the function is expected to return a bool then let's return only
true or false.

This commit was SVN r11991.
2006-10-05 05:10:34 +00:00
George Bosilca
e4df4285b1 Reorder the enum in order to allow some compilers to optimize the big switch in
the header analisys.

This commit was SVN r11975.
2006-10-04 20:03:28 +00:00
Andrew Friedley
836261b85a Fixes ticket 186.
First, move the OPAL_THREAD_LOCK out to the same level as its corresponding UNLOCK.  It was possible to hit the UNLOCK without ever acquiring the lock.

Since the OPAL_THREAD_ADD64() is now protected by this lock, we can just do the decrement non-atomically.
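
A minimal pthreads sketch of the locking pattern described above, using a hypothetical counter `outstanding` and hypothetical functions `release_one` and `add_outstanding` in place of the real request bookkeeping: the lock and unlock sit at the same level, so the unlock is never reached without the lock, and the decrement inside the critical section is a plain `--` rather than an atomic add.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared counter guarded by a single mutex. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int64_t outstanding = 0;

/* Lock and unlock at the same nesting level: every path that reaches
 * the unlock has taken the lock. Inside the critical section a plain
 * decrement is enough; no atomic operation is needed. */
int64_t release_one(int should_release)
{
    int64_t left;
    pthread_mutex_lock(&lock);
    if (should_release)
        outstanding--;
    left = outstanding;
    pthread_mutex_unlock(&lock);
    return left;
}

void add_outstanding(int64_t n)
{
    pthread_mutex_lock(&lock);
    outstanding += n;
    pthread_mutex_unlock(&lock);
}
```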

This commit was SVN r11958.
2006-10-03 18:15:26 +00:00
Andrew Friedley
1177844d7a Fixes trac:183.
Don't try to acquire ompi_request_lock here, which in all cases is already held.  Avoids deadlock that occurs even when threads are enabled and we're running a THREAD_SINGLE app.

Reviewed by Galen.

This commit was SVN r11957.

The following Trac tickets were found above:
  Ticket 183 --> https://svn.open-mpi.org/trac/ompi/ticket/183
2006-10-03 18:08:48 +00:00
George Bosilca
7c8c8d6a46 Keep the critical path as short as possible.
This commit was SVN r11881.
2006-09-28 23:59:24 +00:00
George Bosilca
8d2a8229bb We don't use the send and receive request destructor.
This commit was SVN r11880.
2006-09-28 23:57:49 +00:00
George Bosilca
7f2fd41ace Make sure we trigger the PERUSE event before releasing the request.
This commit was SVN r11879.
2006-09-28 23:54:38 +00:00
George Bosilca
9ae37e474b Force the initialization of the convertor if we detect truncation of messages.
This commit was SVN r11877.
2006-09-28 23:42:56 +00:00
George Bosilca
e5ccc1aece Keep the loop as short as possible. And specialize the search for ANY_TAG.
This commit was SVN r11874.
2006-09-28 22:47:40 +00:00
George Bosilca
688a16ea78 A long time waiting patch. Get rid of the comm->c_pml_procs. It was (and that was
long ago) supposed to be used as a cache for accessing the PML procs. But in
all of the PMLs the PML proc contain only one field i.e. a pointer to the ompi_proc.
This pointer can be accessed using the c_remote_group easily. Therefore, there is no
meaning of keeping the PML procs around. Slim fast commit ...

This commit was SVN r11730.
2006-09-20 22:14:46 +00:00
George Bosilca
1c464d340c Do not increase the reference count for the datatype if it is not required. Plus
some typos.

This commit was SVN r11728.
2006-09-20 20:14:15 +00:00
Andrew Friedley
e776b01811 This assert fails if -mca pml_dr_enable_csum 0 is set, which isn't what we want..
This commit was SVN r11719.
2006-09-19 19:57:33 +00:00
George Bosilca
6f3782bbd7 When we succesfully cancel a request we have to set it's pml_complete flag to true
if we want to be able to reuse the request. If not, the request will never be freed
even if the user call MPI_Request_free.
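
A sketch of why the flag matters, assuming a hypothetical `request_t` that is released only when both `pml_complete` and `user_freed` are set; `request_cancel`, `request_free`, `try_free`, and the `freed_count` counter are illustrative names, not Open MPI functions.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical request: it may only be released once the PML is done
 * with it AND the user has freed it. */
typedef struct {
    bool pml_complete;
    bool user_freed;
    bool cancelled;
} request_t;

static int freed_count = 0;

static void try_free(request_t *req)
{
    if (req->pml_complete && req->user_freed) {
        freed_count++;
        free(req);
    }
}

/* Successful cancel: the PML will never touch the request again,
 * so mark it complete; otherwise a later free would leak it. */
void request_cancel(request_t *req)
{
    req->cancelled = true;
    req->pml_complete = true;
}

/* MPI_Request_free-like entry point. */
void request_free(request_t *req)
{
    req->user_freed = true;
    try_free(req);
}
```

If `request_cancel` did not set `pml_complete`, `request_free` would mark the request as user-freed but `try_free` would never release it.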

This commit was SVN r11717.
2006-09-19 18:04:09 +00:00
Rainer Keller
40cb5d3e30 - Fix peruse compilation
This commit was SVN r11685.
2006-09-18 07:41:09 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
Gleb Natapov
03cda61302 Fix hang in receiving into MPI_alloced area.
This code hangs with openib BTL:

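#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size = 4000000;
    char *sbuf = malloc(size), *rbuf;
    MPI_Status stat;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Alloc_mem(size, MPI_INFO_NULL, &rbuf);
    if (rank == 0)      /* run with 2 processes */
        MPI_Recv(rbuf, size, MPI_CHAR, 1, 1, MPI_COMM_WORLD, &stat);
    else
        MPI_Send(sbuf, size, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
    MPI_Free_mem(rbuf);
    free(sbuf);
    MPI_Finalize();
    return 0;
}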

This commit was SVN r11613.
2006-09-11 12:18:59 +00:00
Gleb Natapov
fa17445384 fix compilation warning.
This commit was SVN r11601.
2006-09-10 06:17:33 +00:00
Gleb Natapov
e7650ff48a Bad things happen if min_rdma_size is smaller then data delivered in the RNDV
packet. Fix this.

This commit was SVN r11548.
2006-09-07 10:42:35 +00:00
George Bosilca
e33c35112b Correct the conversion between int and bool. Apply it on all files except
the one that will be modified by Ralph for the ORTE 2.0. The missing ones
are in the rsh PLS.

This commit was SVN r11476.
2006-08-28 18:59:16 +00:00