1
1

1005 Коммитов

Автор SHA1 Сообщение Дата
George Bosilca
120cf76ad8 Remove some warnings.
This commit was SVN r14196.
2007-04-02 19:11:06 +00:00
George Bosilca
8273c5eeba Correct an error introduced by commit r14180.
This commit was SVN r14191.

The following SVN revision numbers were found above:
  r14180 --> open-mpi/ompi@1cb26e3b9c
2007-04-02 02:59:23 +00:00
Tim Prins
80e047b843 make the mx btl compile again...
This commit was SVN r14183.
2007-04-01 02:49:23 +00:00
George Bosilca
1cb26e3b9c Finally the convertor export a convenience function to allow a consistent
computation of the current location on the pack/unpack process. This can
be used both for retrieving the pointer to the first byte (in the special
case of the cached RDMA protocol) and for getting the current
position (for the pipelined protocol).

I modified all BTLs, but most of them are still untested.

This commit was SVN r14180.
2007-03-30 22:02:45 +00:00
Gleb Natapov
e5450613b5 Add new SM BTL parameter btl_sm_cb_max_num. If set to value greater then zero
it limits the number of circular buffers allocated between each pair of peers.
This allows for more tight memory usage control.

This commit was SVN r14120.
2007-03-22 12:21:42 +00:00
Gleb Natapov
efe0323d35 Initialize fifos at SM BTL init time instead of waiting for first send. This
waist slightly more memory, but prevents problem when fifo cannot be allocated
later during a job run when memory resource is exhausted.

This commit was SVN r14119.
2007-03-22 12:18:44 +00:00
Gleb Natapov
c389c47d79 Fix SM connectivity calculations.
This commit was SVN r14109.
2007-03-21 13:29:19 +00:00
Gleb Natapov
a1a14aa4c3 Add memory barriers during SM btl initialization.
This commit was SVN r14099.
2007-03-21 10:25:10 +00:00
Gleb Natapov
435565590f Don't relay on opcode to decide how to progress pending message.
This commit was SVN r14098.
2007-03-21 07:59:59 +00:00
Brian Barrett
464d536928 remove debugging printf
This commit was SVN r14088.
2007-03-20 21:28:28 +00:00
George Bosilca
8c9e4baa47 Add multi-link capabilities to the TCP BTL. This is useful for systems where the
latency is high and the network relatively fast. This will allow for more kernel
level buffering, which allow overlap between system calls and communications.
Somehow, even on fast clusters there is an improvement (non significant).

This patch create multiple modules for the same device, which in turn will
create multiple sockets between the peers. By default the number of BTL by
device is set to 1, so there is no fundamental difference with the current
version. Change the value of btl_tcp_links to enable multiple links between
peers.

This commit was SVN r14076.
2007-03-20 11:50:17 +00:00
George Bosilca
4332295b32 Typos.
This commit was SVN r14074.
2007-03-20 11:18:05 +00:00
Gleb Natapov
e551c5f1a3 Get rid of separate sm BTL for different shared memory base addresses. Now,
when we precalculate most of the addresses there is no point to have separate
BTL for this. The sm_progress() code become much more simple as a result.

This commit was SVN r14071.
2007-03-20 08:15:58 +00:00
Josh Hursey
e1a18fa149 Patch from Gleb
Always set opcode appropriately before calling ibv_post_send.

This commit was SVN r14056.
2007-03-18 13:33:15 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Gleb Natapov
1dc1ee3998 Send control credit message over "eager rdma" channel if possible.
This commit was SVN r14032.
2007-03-14 14:38:56 +00:00
Gleb Natapov
1f3ac2d7ae Hold pointers to free_max/free_eager lists in array indexed by priority.
This eliminates couple of ifs from fast path.

This commit was SVN r14031.
2007-03-14 14:36:03 +00:00
Gleb Natapov
8607957df9 Get rid of remaining _hp/_lp stuff. Consolidate HP/LP QP creation code.
This commit was SVN r14030.
2007-03-14 14:33:24 +00:00
Galen Shipman
8253d83410 make btl template compile again
This commit was SVN r13990.
2007-03-08 21:58:26 +00:00
Galen Shipman
67ba5264f6 ORTE_NAME_ARGS casts to long, not unsigned long.
This commit was SVN r13988.
2007-03-08 21:42:29 +00:00
Galen Shipman
8072dd344c use %ld instead of %d as ORTE_NAME_ARGS does casting to long not unsigned long
This commit was SVN r13987.
2007-03-08 21:41:39 +00:00
Bill D'Amico
53d434d6ab Fix warnings when building with UDAPL - minor formatting errors.
This commit was SVN r13971.
2007-03-08 18:39:40 +00:00
Gleb Natapov
40501f8274 Amend IB parameter checking.
This commit was SVN r13936.
2007-03-06 13:05:12 +00:00
Gleb Natapov
be018944d2 Clean up circular buffer implementation. Get rid of _same_base_address()
functions by pre-calculating everything in advance.

This commit was SVN r13923.
2007-03-05 14:27:26 +00:00
Gleb Natapov
8078ae5977 Optimize sm communication. Pass message type (MCA_BTL_SM_FRAG_ACK/
MCA_BTL_SM_FRAG_SEND) and status success/fail in low bits of pointers we
are passing through circular buffer. The rank that receives ACK doesn't need
to look into data it received and this is a big win since this data is not in
the cache of the rank's CPU. (Note that we can use low bits of pointers because
free_list always return pointers aligned at least to cache line size).

This commit was SVN r13922.
2007-03-05 14:24:09 +00:00
Gleb Natapov
90fb58de4f When frags are allocated from mpool by free_list the frag structure is also
allocated from mpool memory (which is registered memory for RDMA transports)
This is not a problem for a small jobs, but for a big number of ranks an
amount of waisted memory is big.

This commit was SVN r13921.
2007-03-05 14:17:50 +00:00
Rich Graham
e932d9a695 macro variable has same name as one of the parameters passed to the
macro.
Typo - most likely cut and paste error.

This commit was SVN r13918.
2007-03-04 23:31:07 +00:00
Josh Hursey
0404444dbe * Added 2 new MCA parameters
- mca_base_param_file_prefix
     (Default: NULL)
     This is the fullname of the "-am" mpirun option. Used to specify a ':'
     separated list of AMCA parameter set files.
  - mca_base_param_file_path
     (Default: $SYSCONFDIR/amca-param-sets/:$CWD)
     The path to search for AMCA files with relative paths. A warning will be
     printed if the AMCA file cannot be found.

* Added a new function "mca_base_param_recache_files" the re-reads the file
configurations. This is used internally to help bootstrap the MCA system.

* Added a new orterun/mpirun command line option '-am' that aliases for the
mca_base_param_file_prefix MCA parameter

* Exposed the opal_path_access function as it is generally useful in other
places in the code.

* New function "opal_cmd_line_make_opt_mca" which will allow you to append a
new command line option with MCA parameter identifiers to set at the same
time. Previously this could only be done at command line declaration time.

* Added a new directory under the $pkgdatadir named "amca-param-sets" where all
the 'shipped with' Open MPI AMCA parameter sets are placed. This is the first
place to search for AMCA sets with relative paths.

* An example.conf AMCA parameter set file is located in
contrib/amca-param-sets/.

* Jeff Squyres contributed an OpenIB AMCA set for benchmarking.

Note: You will need to autogen with this commit as it adds a configure param.
  Sorry :(

This commit was SVN r13867.
2007-03-01 13:39:20 +00:00
Gleb Natapov
2b6cbd6299 Separate frag lists for RDMA descriptors to two, one for src descriptors
and another for dst descriptors. This provide partial solution to OB1 protocol
deadlock problem. We can limit number of RDMA descriptors (by setting
btl_openib_free_list_max to something different from -1) and if we will be
lucky to hit this limit before we fail to register more memory the protocol
will not deadlock. When we had only one list for src/dst descriptors we
deadlocked when we reached max limit for the list.

This commit was SVN r13844.
2007-02-28 13:43:38 +00:00
Sven Stork
d8a369936e - Fix more symbols that should be exported.
This commit was SVN r13824.
2007-02-27 15:17:17 +00:00
Josh Hursey
c573171b7d Mostly a cleanup commit.
- Implement the BML/r2 finialize funciton
- Cleanup the btl close routine
- Wire up a pml_base_verbose MCA parameter so you can actually watch the PML selection logic if you really want to.
- Fix a potental segfault in the selection logic.
  ompi_pointer_array_get_item() may return NULL, so we have to check for it

This commit was SVN r13734.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2007-02-21 16:18:43 +00:00
Jeff Squyres
f820e44112 Remove a gcc-ism from the code (defining an anonymous union in the
middle of a struct).  Now we properly define and name the union
outside the struct and simply create an instance of it inside the
struct. 

This commit was SVN r13709.
2007-02-19 18:21:57 +00:00
George Bosilca
020b8ade70 A slightly better fix for the data mismatch compiler complaints.
This commit was SVN r13695.
2007-02-17 05:23:57 +00:00
George Bosilca
04138c23af No more warnings.
This commit was SVN r13683.
2007-02-16 16:25:58 +00:00
Pavel Shamis
edeab0e912 Adding Mellanox Technologies copyright to files touched by Mellanox.
This commit was SVN r13669.
2007-02-15 18:03:20 +00:00
Gleb Natapov
4d4b0a022a Add error callback to sm BTL. Call it when allocation of the initial circular
buffer fails. If cb is already allocated, but it is full and allocation of
additional cb fails, we spin waiting for receiver to free space in existing
cb.

This commit was SVN r13635.
2007-02-13 12:01:36 +00:00
Galen Shipman
f98a442c82 Fix a problem in the selection logic for MX. Basically we need to be able to
open MTL MX and BTL MX and initialize them at the same time. The problem is
that both call mx_init and mx_finalize, solution is to add an external entity
that does the init and finalize (based on ref counting).

This commit was SVN r13576.
2007-02-09 03:19:38 +00:00
Jeff Squyres
c91fcd7fbd Fix a bunch of minor typos submitted by Bernhard Fischer.
This commit was SVN r13505.
2007-02-06 12:00:30 +00:00
Galen Shipman
a94101fa62 mostly another hack around for PML selection, allows CM be select itself if an
MTL is available, if not OB1 is used. Still prevents DR and OB1 from stomping
on each other though. 

This commit was SVN r13481.
2007-02-03 02:01:18 +00:00
George Bosilca
0ff2115964 Other warnings are now silenced.
This commit was SVN r13462.
2007-02-02 06:47:35 +00:00
George Bosilca
56ffbfc5ff Get rid of the warnings in the Open IB BTL.
This commit was SVN r13424.
2007-02-01 19:07:04 +00:00
George Bosilca
b611e6d7dc Less warnings.
This commit was SVN r13419.
2007-02-01 17:51:43 +00:00
George Bosilca
6ef3917741 Allow the user to specify the bandwidth and latency for the MX device.
This commit was SVN r13418.
2007-02-01 17:51:00 +00:00
Brian Barrett
58b325b03f Two changes to improve the sm situation with spawn:
* have the mpool size be based on MCW, not num procs
    in other jobs we know about.  Solves the problem of
    the spawned job having a much bigger than needed
    sm file
  * Can't assume that "me" is in the list of procs
    passed to addprocs, so need to use slightly different
    logic and not go through all of add procs unless
    there's a proc in my job that isn't me.

This seems to greatly improve the situation, although
there still seems to be more of a slowdown through
MPI_INIT for the children (if there are more than one
child) than MPI_INIT for the parent if there are 'n'
children compared to 'n' parents.  Hopefully that
made sense ;)

This commit was SVN r13417.
2007-02-01 17:18:35 +00:00
Brian Barrett
ee753694e0 Print out the memlock limit when we can't allocate memory
This commit was SVN r13372.
2007-01-30 21:22:56 +00:00
Jeff Squyres
86f8c66a27 Turns out that the leave_pinned stuff isn't used in these BTLs at
all.  So just remove it.

This commit was SVN r13360.
2007-01-30 15:39:49 +00:00
Jeff Squyres
c9f072b84f Strike down a few more stray places that were registering
mpi_leave_pinned and replace them with the one central global
variable.

This commit was SVN r13349.
2007-01-29 20:24:31 +00:00
Jeff Squyres
7b6ed64c7b Add in the hostname to the BTL_* output macros so that you can tell on
which node an event occurred.

This commit was SVN r13302.
2007-01-25 14:02:54 +00:00
Jeff Squyres
6fea000e5f Oops -- get the right function name (copy-n-paste error).
This commit was SVN r13290.
2007-01-24 22:31:13 +00:00
Jeff Squyres
6b69ea664d Make a much, much better error message for a not-uncommon failure
scenario (user/sysadmin forgot to set the memlock limits high
enough).

This commit was SVN r13289.
2007-01-24 22:25:40 +00:00