Commit Graph

2461 Commits

Author SHA1 Message Date
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Jeff Squyres
266e805427 * Update parameter checking per MPI-1:2.4.1 and MPI-1:5.4.1 -- also
return an error if MPI_COMM_NULL is used.  
 * Minor style fixes.

This commit was SVN r14041.
2007-03-16 13:09:49 +00:00
Gleb Natapov
1dc1ee3998 Send control credit message over "eager rdma" channel if possible.
This commit was SVN r14032.
2007-03-14 14:38:56 +00:00
Gleb Natapov
1f3ac2d7ae Hold pointers to free_max/free_eager lists in array indexed by priority.
This eliminates a couple of ifs from the fast path.

This commit was SVN r14031.
2007-03-14 14:36:03 +00:00
Gleb Natapov
8607957df9 Get rid of remaining _hp/_lp stuff. Consolidate HP/LP QP creation code.
This commit was SVN r14030.
2007-03-14 14:33:24 +00:00
Brian Barrett
211ed6e852 Make the trunk look similar to v1.1 and v1.2, but return an error if we
can't find "me" in the list of procs, since we should always be in the
proc_world list, or something bad has happened...

This commit was SVN r14025.
2007-03-13 20:17:10 +00:00
Brian Barrett
f59d38dd81 fix stupid compiler warning
This commit was SVN r14024.
2007-03-13 19:45:26 +00:00
Rolf vandeVaart
42168575fd Fix for the special case where np=2 and the sendbuf is set to MPI_IN_PLACE.
In that case, sendcount and sendtype are not valid and we need to use
recvcount and recvtype.

This commit fixes trac:943.  Reviewed by Jelena Pjesivac-Grbovic.

This commit was SVN r14022.

The following Trac tickets were found above:
  Ticket 943 --> https://svn.open-mpi.org/trac/ompi/ticket/943
2007-03-13 19:01:20 +00:00
Brian Barrett
f6be04ff37 be a bit more careful with parens than the r13992 fix
This commit was SVN r13996.

The following SVN revision numbers were found above:
  r13992 --> open-mpi/ompi@3cbac958eb
2007-03-09 16:39:23 +00:00
Brian Barrett
3cbac958eb fix warning about types
This commit was SVN r13992.
2007-03-09 02:32:22 +00:00
Galen Shipman
8253d83410 make btl template compile again
This commit was SVN r13990.
2007-03-08 21:58:26 +00:00
Galen Shipman
67ba5264f6 ORTE_NAME_ARGS casts to long, not unsigned long.
This commit was SVN r13988.
2007-03-08 21:42:29 +00:00
Galen Shipman
8072dd344c use %ld instead of %d, as ORTE_NAME_ARGS casts to long, not unsigned long
This commit was SVN r13987.
2007-03-08 21:41:39 +00:00
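The two format fixes above come down to matching the printf conversion to the type the macro actually passes. A minimal sketch, assuming a hypothetical `NAME_ARGS` macro and `proc_name` struct (neither is the real ORTE definition), might look like this:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a process name; the fields are illustrative only. */
struct proc_name {
    int cellid;
    int jobid;
    int vpid;
};

/* Hypothetical macro in the style the commits describe: each field is cast to long. */
#define NAME_ARGS(n) (long)(n)->cellid, (long)(n)->jobid, (long)(n)->vpid

/* Format a name into buf; returns the number of characters snprintf reports. */
static int format_name(char *buf, size_t len, const struct proc_name *n)
{
    assert(buf != NULL && n != NULL);
    /* %ld matches the long arguments. Using %d with a long argument is a
     * type mismatch that compilers typically warn about. */
    return snprintf(buf, len, "[%ld,%ld,%ld]", NAME_ARGS(n));
}
```

Because the cast lives inside the macro, every call site has to use `%ld`, whatever the underlying field type is.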
Bill D'Amico
53d434d6ab Fix warnings when building with UDAPL - minor formatting errors.
This commit was SVN r13971.
2007-03-08 18:39:40 +00:00
Jeff Squyres
b94a39236b Submitted by Gleb, reviewed by Rich:
Queue_empty is determined by the reader, and is its local view.
However, the writer may continue writing to this queue.  The decision
to go on to the next cb_fifo is done in an atomic region, checking the
writer's view.  The writer also "changes its view" in an atomic
region protected by the same lock.

This commit was SVN r13968.
2007-03-08 16:51:59 +00:00
Brian Barrett
e926bed69f Implement MPI_TYPE_CREATE_DARRAY function. Works with MPICH2 darray-pack
test, Sun's darray test, and an internal LANL test code.  I would not
assume it will work properly on other codes, as I'm still not sure I
completely understand what the standard says this function is supposed to
do.

Refs trac:65

This commit was SVN r13967.

The following Trac tickets were found above:
  Ticket 65 --> https://svn.open-mpi.org/trac/ompi/ticket/65
2007-03-08 16:33:08 +00:00
Jelena Pjesivac-Grbovic
9780a000ba Cleanup of generic reduce function and possible (low probability) bug fix.
- fixing line lengths and some of the comments
- possible bug fix (but I do not think we exposed it in any tests so far)
  temporary buffers were allocated as multiples of extent instead of 
  true_extent + (count -1) * extent.
Everything is still passing Intel tests over tcp and btl mx up to 64 nodes.

This commit was SVN r13956.
2007-03-08 00:54:52 +00:00
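The buffer-size fix in the reduce cleanup above can be illustrated with plain arithmetic. The sketch below is not the Open MPI code: `tmpbuf_bytes` is a hypothetical helper, it assumes non-negative extents, and it ignores lower-bound offsets. In a real MPI program the two values would come from `MPI_Type_get_extent` and `MPI_Type_get_true_extent`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes needed to hold `count` elements of a datatype whose consecutive
 * elements start `extent` bytes apart and whose data spans `true_extent`
 * bytes. The last element needs its full true extent, while every earlier
 * element only advances the start position by `extent`.
 */
static size_t tmpbuf_bytes(size_t count, size_t extent, size_t true_extent)
{
    if (count == 0) {
        return 0;
    }
    return true_extent + (count - 1) * extent;
}

/* The older sizing the commit message describes: a plain multiple of extent. */
static size_t tmpbuf_bytes_old(size_t count, size_t extent)
{
    return count * extent;
}
```

For a type with extent 8 and true extent 12, three elements need 12 + 2 * 8 = 28 bytes, while the multiple-of-extent sizing gives only 24. When the two extents are equal, both formulas agree, which may be why the tests never exposed the difference.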
Jelena Pjesivac-Grbovic
57cbafafd5 Clean up of generic broadcast function: removing unnecessary statements and improving comments.
This commit was SVN r13955.
2007-03-07 21:59:53 +00:00
Rolf vandeVaart
333357f4cc This fixes the initialization of the usable size of the shared memory.
The original code was not compensating for the space used by the header.  

When memory got tight, the allocator would return a pointer to memory that 
did not exist resulting in a SEGV for the application.  This is a partial 
fix for ticket #929.

Reviewed by Rich Graham.  

This commit was SVN r13950.
2007-03-07 13:28:06 +00:00
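The header-accounting fix above can be sketched with a toy bump allocator. Everything here is an assumption for illustration: `seg_header`, `seg_init`, `seg_usable`, and `seg_alloc` are hypothetical names, not the Open MPI shared memory code, and alignment of returned pointers is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment header stored at the start of the mapping. */
struct seg_header {
    size_t total_size; /* size of the whole mapping, header included */
    size_t used;       /* bytes already handed out from the data area */
};

/* Place a header at the start of a mapping of `total` bytes. */
static struct seg_header *seg_init(void *base, size_t total)
{
    struct seg_header *h = base;
    assert(total >= sizeof *h);
    h->total_size = total;
    h->used = 0;
    return h;
}

/* Usable bytes: the mapping size minus the space the header occupies. */
static size_t seg_usable(const struct seg_header *h)
{
    return h->total_size - sizeof *h;
}

/* Bump allocation that returns NULL rather than a pointer past the mapping. */
static void *seg_alloc(struct seg_header *h, size_t n)
{
    if (n > seg_usable(h) - h->used) {
        return NULL;
    }
    void *p = (unsigned char *)(h + 1) + h->used;
    h->used += n;
    return p;
}
```

If `seg_usable` returned `total_size` unchanged, the last `sizeof(struct seg_header)` bytes handed out would lie beyond the mapping, which matches the failure mode the commit message describes when memory gets tight.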
Jelena Pjesivac-Grbovic
0c07654c30 Updating reduce_scatter decision function based on MX results up to 64 nodes and both 1ppn and 2ppn
configurations.

This commit was SVN r13945.
2007-03-07 00:38:33 +00:00
George Bosilca
4b63631535 Allow correct duplication for MPI_UB and MPI_LB. The problem is that we cannot
create a duplicate type, because any duplicate type loses the PREDEFINED flag.
An MPI_LB (respectively MPI_UB) without the PREDEFINED flag is useless, as it is
no longer a marker. The solution is to return the same pointer after
increasing the reference count. For this to work, I allowed
the destruction code to check the reference count of an object before complaining
about destroying a predefined type.

This fixed ticket #317.

This commit was SVN r13942.
2007-03-06 18:21:49 +00:00
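One way to picture the approach described above is a reference-counted object whose "duplicate" is the same pointer. This is a rough sketch under assumed names (`dtype`, `DT_PREDEFINED`, `marker_dup`, `dtype_release` are all hypothetical), not the Open MPI datatype engine.

```c
#include <assert.h>

enum { DT_PREDEFINED = 0x1 }; /* hypothetical flag bit */

/* Hypothetical, heavily reduced datatype object. */
struct dtype {
    int refcount;
    unsigned flags;
};

/* "Duplicate" a predefined marker: same pointer, one more reference. */
static struct dtype *marker_dup(struct dtype *t)
{
    assert(t->flags & DT_PREDEFINED);
    t->refcount++;
    return t;
}

/*
 * Release one reference. Returns 0 on success and -1 when the call would
 * destroy a predefined type, that is, drop its last (base) reference.
 * Freeing of user-defined types at refcount 0 is omitted from this sketch.
 */
static int dtype_release(struct dtype *t)
{
    if ((t->flags & DT_PREDEFINED) && t->refcount <= 1) {
        return -1;
    }
    t->refcount--;
    return 0;
}
```

Releasing a duplicate then only undoes the extra reference, and the complaint is reserved for an attempt to drop the base reference of the predefined object.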
Gleb Natapov
40501f8274 Amend IB parameter checking.
This commit was SVN r13936.
2007-03-06 13:05:12 +00:00
Brian Barrett
9660bb6ccc These symbols aren't actually created in ROMIO with Open MPI's configure, so
no need to have them in here.

This commit was SVN r13933.
2007-03-05 22:55:17 +00:00
Jelena Pjesivac-Grbovic
e5ed167a6e Adding tuned version of reduce_scatter implementation.
Currently 3 algorithms are available:
- non-overlapping, reduce + scatterv, (works for non-commutative operations)
- recursive halving algorithm (copied from basic module)
- ring algorithm  (similar to allreduce ring, for large messages)

This commit was SVN r13929.
2007-03-05 20:40:39 +00:00
Gleb Natapov
be018944d2 Clean up circular buffer implementation. Get rid of _same_base_address()
functions by pre-calculating everything in advance.

This commit was SVN r13923.
2007-03-05 14:27:26 +00:00
Gleb Natapov
8078ae5977 Optimize sm communication. Pass message type (MCA_BTL_SM_FRAG_ACK/
MCA_BTL_SM_FRAG_SEND) and status success/fail in low bits of pointers we
are passing through circular buffer. The rank that receives ACK doesn't need
to look into data it received and this is a big win since this data is not in
the cache of the rank's CPU. (Note that we can use low bits of pointers because
free_list always returns pointers aligned at least to cache line size).

This commit was SVN r13922.
2007-03-05 14:24:09 +00:00
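The low-bit trick described above can be sketched in a few lines. The constants and helper names below (`FRAG_SEND`, `FRAG_ACK`, `FRAG_STATUS_FAIL`, `tag_pack`, `tag_ptr`, `tag_bits`) are hypothetical and are not the `MCA_BTL_SM_FRAG_*` definitions. The sketch assumes only that every pointer has its two lowest bits clear, which cache-line alignment more than guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values carried in the two lowest pointer bits. */
enum { FRAG_SEND = 0x0, FRAG_ACK = 0x1, FRAG_STATUS_FAIL = 0x2 };

#define TAG_MASK ((uintptr_t)0x3)

/* Combine an aligned pointer and a small tag into one word. */
static inline uintptr_t tag_pack(void *frag, unsigned tag)
{
    uintptr_t p = (uintptr_t)frag;
    assert((p & TAG_MASK) == 0);                /* pointer must be aligned */
    assert(((uintptr_t)tag & ~TAG_MASK) == 0);  /* tag must fit in the mask */
    return p | (uintptr_t)tag;
}

/* Recover the original pointer by clearing the tag bits. */
static inline void *tag_ptr(uintptr_t word)
{
    return (void *)(word & ~TAG_MASK);
}

/* Read the tag without touching the memory the pointer refers to. */
static inline unsigned tag_bits(uintptr_t word)
{
    return (unsigned)(word & TAG_MASK);
}
```

A receiver can call `tag_bits` on the word it pulled from the circular buffer and decide what to do with an ACK without dereferencing the pointer, which is the cache benefit the commit message points to.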
Gleb Natapov
90fb58de4f When frags are allocated from mpool by free_list, the frag structure is also
allocated from mpool memory (which is registered memory for RDMA transports).
This is not a problem for small jobs, but for a large number of ranks the
amount of wasted memory is large.

This commit was SVN r13921.
2007-03-05 14:17:50 +00:00
Rich Graham
e932d9a695 macro variable has same name as one of the parameters passed to the
macro.
Typo - most likely cut and paste error.

This commit was SVN r13918.
2007-03-04 23:31:07 +00:00
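The class of bug named above, a macro-local variable that shares its name with an argument, is easy to reproduce. The macro below is a made-up illustration (`SUM_TO` and `sum_first` are hypothetical and unrelated to the macro that commit touched).

```c
#include <assert.h>

/*
 * Buggy form, shown only as a comment:
 *
 *   #define SUM_TO_BAD(n, out) do { int i; (out) = 0;
 *       for (i = 0; i <= (n); i++) (out) += i; } while (0)
 *
 * With SUM_TO_BAD(i, total), the argument (n) expands to (i), which now
 * names the macro's own local `i` rather than the caller's variable, so the
 * loop condition becomes i <= i.
 */

/* Fixed form: the local name is chosen so it cannot match a caller's argument. */
#define SUM_TO(n, out)                                         \
    do {                                                       \
        int sum_to_i_;                                         \
        (out) = 0;                                             \
        for (sum_to_i_ = 0; sum_to_i_ <= (n); sum_to_i_++) {   \
            (out) += sum_to_i_;                                \
        }                                                      \
    } while (0)

/* The caller's variable is named `i`, the case that breaks the buggy form. */
static int sum_first(int i)
{
    int total;
    assert(i >= 0);
    SUM_TO(i, total);
    return total;
}
```

An inline function avoids the problem altogether, since its parameters and locals have their own scope.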
Li-Ta Lo
196e2a86bb added binomial tree based scatter, passed IBM and Intel tests
This commit was SVN r13906.
2007-03-02 23:19:02 +00:00
Li-Ta Lo
11c94cbe76 eliminated the use of MPI_Get_count
This commit was SVN r13904.
2007-03-02 22:57:50 +00:00
Li-Ta Lo
3765e19d15 added ASCII graph for the topologies
This commit was SVN r13892.
2007-03-02 17:17:14 +00:00
Li-Ta Lo
bd75f2f162 change ALLGATHER to GATHER
This commit was SVN r13891.
2007-03-02 17:02:29 +00:00
Josh Hursey
0404444dbe * Added 2 new MCA parameters
  - mca_base_param_file_prefix
     (Default: NULL)
     This is the full name of the "-am" mpirun option. Used to specify a ':'
     separated list of AMCA parameter set files.
  - mca_base_param_file_path
     (Default: $SYSCONFDIR/amca-param-sets/:$CWD)
     The path to search for AMCA files with relative paths. A warning will be
     printed if the AMCA file cannot be found.

* Added a new function "mca_base_param_recache_files" that re-reads the file
configurations. This is used internally to help bootstrap the MCA system.

* Added a new orterun/mpirun command line option '-am' that is an alias for the
mca_base_param_file_prefix MCA parameter

* Exposed the opal_path_access function as it is generally useful in other
places in the code.

* New function "opal_cmd_line_make_opt_mca" which will allow you to append a
new command line option with MCA parameter identifiers to set at the same
time. Previously this could only be done at command line declaration time.

* Added a new directory under the $pkgdatadir named "amca-param-sets" where all
the 'shipped with' Open MPI AMCA parameter sets are placed. This is the first
place to search for AMCA sets with relative paths.

* An example.conf AMCA parameter set file is located in
contrib/amca-param-sets/.

* Jeff Squyres contributed an OpenIB AMCA set for benchmarking.

Note: You will need to autogen with this commit as it adds a configure param.
  Sorry :(

This commit was SVN r13867.
2007-03-01 13:39:20 +00:00
Tim Prins
74555cda51 - Re-enable MPI_Comm_spawn_multiple from singletons. It has been working for a while, but the check was never removed.
- Bring some code in line with the coding standard
- Remove now unused help message

This commit was SVN r13858.
2007-02-28 22:09:30 +00:00
Tim Mattox
ec82d01555 Add a missing extern keyword that prevented compilation on OS X.
This commit was SVN r13853.
2007-02-28 20:26:34 +00:00
Gleb Natapov
2b6cbd6299 Separate the frag list for RDMA descriptors into two lists, one for src descriptors
and another for dst descriptors. This provides a partial solution to the OB1 protocol
deadlock problem. We can limit the number of RDMA descriptors (by setting
btl_openib_free_list_max to something other than -1), and if we are
lucky enough to hit this limit before we fail to register more memory, the protocol
will not deadlock. When we had only one list for src/dst descriptors, we
deadlocked when we reached the max limit for the list.

This commit was SVN r13844.
2007-02-28 13:43:38 +00:00
Sven Stork
870740efe2 - Properly export symbols that are required by other components.
This commit was SVN r13841.
2007-02-28 12:51:55 +00:00
Rainer Keller
0889ebd59f - Eliminate warnings that PGI-6.2.5 issues with -Minform=inform
This commit was SVN r13840.
2007-02-28 08:36:34 +00:00
Li-Ta Lo
c5d8c221b0 added binomial tree based Gather algorithm, passed IBM and Intel tests
This commit was SVN r13835.
2007-02-28 01:11:01 +00:00
Jelena Pjesivac-Grbovic
627533fe4a Adding segmented ring algorithm for Allreduce for commutative operations.
Algorithm allows user to specify the segment size to be used for computation/communication overlap.
The additional memory requirement for the algorithm is 2 x segment size.
It performed well for (really) large message sizes over MX and it passed intel Allreduce_c and Allreduce_loc_c tests.

This commit was SVN r13832.
2007-02-27 20:32:30 +00:00
Jeff Squyres
38c976d527 Remove redundant declaration of ompi_err_unknown.
This commit was SVN r13829.
2007-02-27 19:37:42 +00:00
Sven Stork
d8a369936e - Fix more symbols that should be exported.
This commit was SVN r13824.
2007-02-27 15:17:17 +00:00
George Bosilca
533dfff56d Only do the preconnection stage if we found the local proc. It's mostly to
make some compilers complain less about uninitialized values.

This commit was SVN r13805.
2007-02-26 22:24:44 +00:00
George Bosilca
bec20422ee Remove the warnings about printf data-type mismatch.
This commit was SVN r13804.
2007-02-26 22:20:35 +00:00
Brian Barrett
6d70f5fbe0 don't define malloc and friends in opal_config, as it causes problems when
we later include malloc.h

This commit was SVN r13803.
2007-02-26 21:34:48 +00:00
Li-Ta Lo
c860bd1be5 fixed a typo in the comment
This commit was SVN r13802.
2007-02-26 19:20:46 +00:00
Li-Ta Lo
73a73b1c78 added ASCII graph on reduce_log_intra
This commit was SVN r13801.
2007-02-26 19:15:37 +00:00
Pavel Shamis
6fe84f581b mpool_base_module_destroy was removing all modules from
a list instead of removing specific one. Fixing the bug.

This commit was SVN r13795.
2007-02-26 16:25:20 +00:00
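The bug fixed above, emptying a list when only one entry should go, contrasts with a targeted unlink. The sketch below uses a hypothetical singly linked `module_item` list, not the Open MPI list class or the mpool code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node standing in for a registered module. */
struct module_item {
    int id;
    struct module_item *next;
};

/*
 * Unlink only `target` from the list. Returns 1 if it was found and
 * removed, 0 otherwise. Every other node stays linked in its original order.
 */
static int list_remove_one(struct module_item **head, struct module_item *target)
{
    assert(head != NULL);
    for (struct module_item **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == target) {
            *pp = target->next;
            target->next = NULL;
            return 1;
        }
    }
    return 0;
}
```

Walking the list through a pointer to the link, rather than through the node, removes the need for a special case when the target is the head.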
Brian Barrett
d9e0e80190 Make some debugging output code run only when debugging is enabled
This commit was SVN r13777.
2007-02-25 01:03:19 +00:00
Bill D'Amico
db1c2a58c4 Removed cruft - unused variables causing warnings during OMPI build.
This commit was SVN r13772.
2007-02-23 18:55:41 +00:00