1
1
Граф коммитов

13 Коммитов

Автор SHA1 Сообщение Дата
Tim Prins
e058266c96 Change the ORTE datatype service in 2 ways:
1. Remove a unneeded field, bytes_avail, from orte_buffer_t. It is a calcualed value, and updating it everywhere is worse then just calculating it in the one place it is acutally used.
2. Change it so the default size of a orte_buffer is 128 bytes instead of 1024 bytes. We then double the size of the buffer up to 1024 bytes, then we additively increase the size by 1024 bytes at a time as was done before.

This commit was SVN r14252.
2007-04-06 19:40:29 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
George Bosilca
9f73335bdb Silence the compiler.
This commit was SVN r13381.
2007-01-31 04:24:56 +00:00
Brian Barrett
8f68764e5e A number of heterogeneous fixes for the dss with the new buffer options:
* When using the load/unload interface, stash away the current buffer
    type so that it can be properly unpacked on the receiving side if
    the buffer type is other than the receiver default
  * Include type information for unsized types (bool, int, size_t,
    pid_t) so that they can be properly unpacked by the receiver
    in the heterogeneous case.
  * Restore the NON_DESC type as the default for optimized builds,
    since it looks like this fixes the known issues with the
    non-described buffers

Refs trac:587

This commit was SVN r12784.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2006-12-06 23:19:06 +00:00
Ralph Castain
897744cdeb Two major changes to the runtime:
1. implement and enable the non-described buffer operations. I will send out a more detailed explanation separately. However, this mode of operation (which is now the default) significantly reduces message size during startup. If you want the described buffers, set the mca param "-mca dss_describe_buffer 1".

2. revise the xcast system to support both linear and binomial tree broadcast methods. Since we are seeing scenarios where the binomiall tree can cause problems, I have made the linear method the default. To run with the binomial tree, set the mca param "-mca oob_xcast_mode binomial".

3. add some detailed timing reports to the xcast operation. These are enabled via "-mca oob_xcast_timing 1".

4. add some more unit tests for the dss and gpr (focused on support for the non-described buffer)

This commit was SVN r12722.
2006-12-01 22:30:39 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
George Bosilca
2aa3e51223 Nothing relevant. Only a set of castings to have a clean compile on
Windows. The cl.exe compiler is pretty good at complaining about
any kind of non explicit cast.

This commit was SVN r12207.
2006-10-20 02:25:50 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
George Bosilca
f52c10d18e And ORTE is ready for prime-time. All Windows tricks are in:
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unamed structures
- no implicit cast.

Plus a full implementation for the orte_wait functions.

This commit was SVN r11347.
2006-08-23 03:32:36 +00:00
George Bosilca
0417d27f46 orte_std_cntr_t vs. size_t round 2. Advantage for size_t ...
This commit was SVN r11317.
2006-08-22 14:58:31 +00:00
Ralph Castain
bd7e8febb1 Bring over the ORTE 2.0 DSS. This introduces a few changes, almost all of which are transparent to the user:
1. Introduces a flag for the type of buffer that now allows a user to either have a fully described or a completely non-described buffer. In the latter case, no data type descriptions are included in the buffer. This obviously limits what we can do for debugging purposes, but the intent here was to provide an optimized communications capability for those wanting it.

Note that individual buffers can be designated for either type using the orte_dss.set_buffer_type command. In other words, the buffer type can be set dynamically - it isn't a configuration setting at all. The type will default to fully described. A buffer MUST be empty to set its type - this is checked by the set_buffer_type command, and you will receive an error if you violate that rule.

IMPORTANT NOTE: ORTE 1.x actually will NOT work with non-described buffers. This capability should therefore NOT be used until we tell you it is okay. For now, it is here simply so we can begin bringing over parts of ORTE 2.0. The problem is that ORTE 1.x depends upon the transmission of non-hard-cast data types such as size_t. These "soft" types currently utilize a "peek" function to see their actual type in the buffer - obviously, without description, the system has no idea how to unpack these "soft" types. We will deal with this later - for now, please don't use the non-described buffer option.

2. Introduces the orte_std_cntr_t type. This will become the replacement for the size_t's used throughout ORTE 1.x. At the moment, it is actually typedef'd to size_t for backward compatibility.

3. Introduces the orte_dss.arith API that supports arbitrary arithmetic functions on numeric data types. Calling the function with any other data type will generate an error.

This commit was SVN r11075.
2006-08-01 18:42:25 +00:00
Brian Barrett
02c8a51b76 * fix endian encoding for 64 bit numbers to use hton64
* cleanup the unpack_size_mismatch macros a little bit
* ad comment about endianness of size_mismatch cleanup code so that I don't
  think I've found a bug that really isn't and lose an hour tracking it down
  again...

This commit was SVN r9458.
2006-03-29 18:58:02 +00:00
Ralph Castain
4b9f015c0b Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list.
This commit was SVN r8912.
2006-02-07 03:32:36 +00:00