1
1

331 Коммитов

Автор SHA1 Сообщение Дата
Gleb Natapov
5c3f511451 Properly determine btl's rank among all btls withing the same subnet.
This commit was SVN r15038.
2007-06-13 11:15:58 +00:00
Galen Shipman
5340f5e320 Try to cleanup the flow control logic a bit
Renamed a few variables 
Inialize the reserve receive buffers to 1, prior to this they were initialized
to zero. 

This commit was SVN r14919.
2007-06-06 18:51:09 +00:00
Brian Barrett
a446af5b6b * Remove unneeded SRQ test -- we no longer support OFED builds that don't
have the SRQ interface.
  * Instead of setting AC_DEFINEs per MCA component, set per test.  THe
    answers can never be difference, and this will speed sed just a teeny
    bit

This commit was SVN r14856.
2007-06-05 01:49:26 +00:00
Gleb Natapov
444762456e Don't dereference NULL pointer. Fix bug introduced in r14768.
This commit was SVN r14781.

The following SVN revision numbers were found above:
  r14768 --> open-mpi/ompi@3401bd2b07
2007-05-27 09:24:56 +00:00
Galen Shipman
3401bd2b07 Add optional ordering to the BTL interface.
This is required to tighten up the BTL semantics. Ordering is not guaranteed,
but, if the BTL returns a order tag in a descriptor (other than
MCA_BTL_NO_ORDER) then we may request another descriptor that will obey
ordering w.r.t. to the other descriptor.


This will allow sane behavior for RDMA networks, where local completion of an
RDMA operation on the active side does not imply remote completion on the
passive side. If we send a FIN message after local completion and the FIN is
not ordered w.r.t. the RDMA operation then badness may occur as the passive
side may now try to deregister the memory and the RDMA operation may still be
pending on the passive side. 

Note that this has no impact on networks that don't suffer from this
limitation as the ORDER tag can simply always be specified as
MCA_BTL_NO_ORDER.

This commit was SVN r14768.
2007-05-24 19:51:26 +00:00
Jeff Squyres
81df632e29 Clarification to MCA parameter help messages
This commit was SVN r14765.
2007-05-24 19:18:29 +00:00
Pavel Shamis
5ceaa605d7 Adding new vendor_part_id for Mellanox Hermon HCA
This commit was SVN r14705.
2007-05-21 13:33:54 +00:00
Gleb Natapov
3ebaff8dfe Implement new BTL parameters:
We eagerly send data up to btl_*_eager_limit with the match
Upon ACK of the MATCH we start using send/receives of size
btl_*_max_send_size up to the btl_*_rdma_pipeline_offset
After the btl_*_rdma_pipeline_offset we begin using RDMA writes of
size btl_*_rdma_pipeline_frag_size.

Now, on a per message basis we only use the above protocol if the
message is larger than btl_*_min_rdma_pipeline_size

btl_*_eager_limit - > same
btl_*_max_send_size -> same
btl_*_rdma_pipeline_offset -> btl_*_min_rdma_size
btl_*_rdma_pipeline_frag_size -> btl_*_max_rdma_size


btl_*_min_rdma_pipeline_size is new..

This patch also moves all BTL common parameters initialisation into
btl_base_mca.c file.

This commit was SVN r14681.
2007-05-17 07:54:27 +00:00
Pavel Shamis
cd87b05711 Added check for IBV_EVENT_CLIENT_REREGISTER async
event that was not exists in old openib gen2 versions
(Ticket #1025)

This commit was SVN r14658.
2007-05-15 13:53:49 +00:00
Jeff Squyres
92090967b1 Add definitions for Hemon/ConnectX Mellanox HCA
This commit was SVN r14639.
2007-05-10 12:27:51 +00:00
Pavel Shamis
e2d0e27111 Adding:
* openib_finalize flow for openib btl
* async event handler for openib btl

This commit was SVN r14623.
2007-05-08 21:47:21 +00:00
Jeff Squyres
ecf5a3b8dd Fix compiler warning
This commit was SVN r14604.
2007-05-08 13:12:50 +00:00
Rainer Keller
1aceece03f - Add a few comments for elements for structs, a few spelling fixes.
No functional change.

This commit was SVN r14534.
2007-04-26 21:03:38 +00:00
Rainer Keller
ce32b918da - Fixes for for unlocking the mutex in case of error in functions
mca_btl_openib_post_srr and
     btl_openib_endpoint_post_rr

This commit was SVN r14530.
2007-04-26 13:33:02 +00:00
Sharon Melamed
cf3f41288b Add pkey value MCA parameter. if this param is used,
only ports with the actual pkey value will be initiate.

This commit was SVN r14463.
2007-04-22 10:22:12 +00:00
Jeff Squyres
0ba47105ed Merge the /tmp/jms-installdirs-trunk branch into the trunk. This
finally brings in functionality that is already on the 1.2 branch, and
was developed and tested in the v1.2ofed branch (and other places).

Short version of new features:

 * Support for ibv_fork_init() 
 * Automatically fill in the openib BTL bandwidth value by 
   querying the HCA port 
 * Installdirs functionality 
 * Fixes to always use -I in the Fortran wrapper compilers (#924) 
 * Gleb's mpool updates 
 * Remove some kruft in btl/openib/configure.m4, therefore 
   fixing the harmless warnings noted in #665 
 * Bunches of updates to the Linux RPM spec file 

I.e., effectively the same thing that r14411 brought to the v1.2
branch.

Also effectively brought in r14432 and r14433 (some fixes on top of
the original r14411 commit to v1.2).  Still need to bring in the moral
equivalent of r14445 after this commit (fixes to installdirs).

This commit was SVN r14449.

The following SVN revision numbers were found above:
  r14411 --> open-mpi/ompi@83b31314ae
  r14432 --> open-mpi/ompi@a48f160595
  r14433 --> open-mpi/ompi@68f346d2bc
  r14445 --> open-mpi/ompi@13d366b827
2007-04-21 00:15:05 +00:00
Jeff Squyres
51f286d737 Just like r14289 on the ORTE trunk:
Per discussions with Brian and Ralph, make a slight correction in
where components are installed. Use $pkglibdir, not $libdir/openmpi,
so that when compiled in the orte trunk, components are installed to
the right directory (because the component search patch is checking
$pkglibdir).

This commit was SVN r14345.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r14289
2007-04-12 11:19:42 +00:00
George Bosilca
1cb26e3b9c Finally the convertor export a convenience function to allow a consistent
computation of the current location on the pack/unpack process. This can
be used both for retrieving the pointer to the first byte (in the special
case of the cached RDMA protocol) and for getting the current
position (for the pipelined protocol).

I modified all BTLs, but most of them are still untested.

This commit was SVN r14180.
2007-03-30 22:02:45 +00:00
Gleb Natapov
435565590f Don't relay on opcode to decide how to progress pending message.
This commit was SVN r14098.
2007-03-21 07:59:59 +00:00
Josh Hursey
e1a18fa149 Patch from Gleb
Always set opcode appropriately before calling ibv_post_send.

This commit was SVN r14056.
2007-03-18 13:33:15 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Gleb Natapov
1dc1ee3998 Send control credit message over "eager rdma" channel if possible.
This commit was SVN r14032.
2007-03-14 14:38:56 +00:00
Gleb Natapov
1f3ac2d7ae Hold pointers to free_max/free_eager lists in array indexed by priority.
This eliminates couple of ifs from fast path.

This commit was SVN r14031.
2007-03-14 14:36:03 +00:00
Gleb Natapov
8607957df9 Get rid of remaining _hp/_lp stuff. Consolidate HP/LP QP creation code.
This commit was SVN r14030.
2007-03-14 14:33:24 +00:00
Gleb Natapov
40501f8274 Amend IB parameter checking.
This commit was SVN r13936.
2007-03-06 13:05:12 +00:00
Gleb Natapov
90fb58de4f When frags are allocated from mpool by free_list the frag structure is also
allocated from mpool memory (which is registered memory for RDMA transports)
This is not a problem for a small jobs, but for a big number of ranks an
amount of waisted memory is big.

This commit was SVN r13921.
2007-03-05 14:17:50 +00:00
Josh Hursey
0404444dbe * Added 2 new MCA parameters
- mca_base_param_file_prefix
     (Default: NULL)
     This is the fullname of the "-am" mpirun option. Used to specify a ':'
     separated list of AMCA parameter set files.
  - mca_base_param_file_path
     (Default: $SYSCONFDIR/amca-param-sets/:$CWD)
     The path to search for AMCA files with relative paths. A warning will be
     printed if the AMCA file cannot be found.

* Added a new function "mca_base_param_recache_files" the re-reads the file
configurations. This is used internally to help bootstrap the MCA system.

* Added a new orterun/mpirun command line option '-am' that aliases for the
mca_base_param_file_prefix MCA parameter

* Exposed the opal_path_access function as it is generally useful in other
places in the code.

* New function "opal_cmd_line_make_opt_mca" which will allow you to append a
new command line option with MCA parameter identifiers to set at the same
time. Previously this could only be done at command line declaration time.

* Added a new directory under the $pkgdatadir named "amca-param-sets" where all
the 'shipped with' Open MPI AMCA parameter sets are placed. This is the first
place to search for AMCA sets with relative paths.

* An example.conf AMCA parameter set file is located in
contrib/amca-param-sets/.

* Jeff Squyres contributed an OpenIB AMCA set for benchmarking.

Note: You will need to autogen with this commit as it adds a configure param.
  Sorry :(

This commit was SVN r13867.
2007-03-01 13:39:20 +00:00
Gleb Natapov
2b6cbd6299 Separate frag lists for RDMA descriptors to two, one for src descriptors
and another for dst descriptors. This provide partial solution to OB1 protocol
deadlock problem. We can limit number of RDMA descriptors (by setting
btl_openib_free_list_max to something different from -1) and if we will be
lucky to hit this limit before we fail to register more memory the protocol
will not deadlock. When we had only one list for src/dst descriptors we
deadlocked when we reached max limit for the list.

This commit was SVN r13844.
2007-02-28 13:43:38 +00:00
Pavel Shamis
edeab0e912 Adding Mellanox Technologies copyright to files touched by Mellanox.
This commit was SVN r13669.
2007-02-15 18:03:20 +00:00
Jeff Squyres
c91fcd7fbd Fix a bunch of minor typos submitted by Bernhard Fischer.
This commit was SVN r13505.
2007-02-06 12:00:30 +00:00
George Bosilca
56ffbfc5ff Get rid of the warnings in the Open IB BTL.
This commit was SVN r13424.
2007-02-01 19:07:04 +00:00
George Bosilca
b611e6d7dc Less warnings.
This commit was SVN r13419.
2007-02-01 17:51:43 +00:00
Brian Barrett
ee753694e0 Print out the memlock limit when we can't allocate memory
This commit was SVN r13372.
2007-01-30 21:22:56 +00:00
Jeff Squyres
6fea000e5f Oops -- get the right function name (copy-n-paste error).
This commit was SVN r13290.
2007-01-24 22:31:13 +00:00
Jeff Squyres
6b69ea664d Make a much, much better error message for a not-uncommon failure
scenario (user/sysadmin forgot to set the memlock limits high
enough).

This commit was SVN r13289.
2007-01-24 22:25:40 +00:00
Jeff Squyres
c9fe68c406 Better patch from Gleb to do the per-port (endpoint) specification of
whether to use eager RDMA or not

This commit was SVN r13262.
2007-01-23 22:40:59 +00:00
Jeff Squyres
3389a523e9 Arrgh. That printf should not have been in there!
This commit was SVN r13243.
2007-01-22 18:52:49 +00:00
Jeff Squyres
a24f3c0886 Move the "use eager RDMA" flag to the individual openib BTL modules,
not the component.  This potentially allows for a mix of HCAs that
support eager RDMA and those who do not on a port-by-port basis.

This commit was SVN r13242.
2007-01-22 18:49:32 +00:00
Jeff Squyres
91b855c2f4 Minor fixes in the help messages
* If the text to cite where the problem occurred is "\n", prettyprint
   somethign a little nicer so that it's clear that we're talking
   about the end of line
 * Add a missing help message ("ini file:unknown field"), and display
   it a little better (i.e., show the erroneous field, not a
   misleading "end of line" marker)
 * It's "OpenIB", not "Open IB"

This commit was SVN r13241.
2007-01-22 18:45:43 +00:00
Jeff Squyres
e934272f3e Commit data supplied by Christian Bell at QLogic for their vendor and
part ID's.

This commit was SVN r13216.
2007-01-19 19:46:29 +00:00
Jeff Squyres
754042f1fc Fix a compiler warning.
This commit was SVN r13134.
2007-01-16 23:03:17 +00:00
Jeff Squyres
d5404f21a3 Make the trunk openib btl compile again.
This commit was SVN r13110.
2007-01-13 14:22:42 +00:00
Galen Shipman
4a6ad30440 remove unused macro calls..
This commit was SVN r13107.
2007-01-12 23:17:17 +00:00
Galen Shipman
2097d174f6 heterogeneous fixes to the OpenIB BTL. This includes work by nysal, brian and
I. 

This commit was SVN r13106.
2007-01-12 23:14:45 +00:00
Galen Shipman
df099a4731 call it what it is...
we are looking at subnet_id's and we are counting active ports per subnet. 
move subnet count out of procs loop,, no need to do it there... 

This commit was SVN r13105.
2007-01-12 22:42:20 +00:00
Jeff Squyres
e5205657cf A much better fix for #739. No configure test -- just do a simple
memcpy() instead of assigning the struct's by value.

Fixes trac:739.

This commit was SVN r13081.

The following Trac tickets were found above:
  Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
2007-01-11 14:30:32 +00:00
Jeff Squyres
add3909096 Back out 13076 and 13077 in favor of a much simpler approach.
Sorry for the configure change -- hopefully it's early enough in the
morning that it won't affect people... (new approach won't have a
configure change).

Refs trac:739.

This commit was SVN r13080.

The following Trac tickets were found above:
  Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
2007-01-11 14:07:15 +00:00
George Bosilca
24a91fad1d OPAL_BOOL_STRUCT_COPY or OMPI_BOOL_STRUCT_COPY that's the question!
Let's minimize the disturbances and say that the configure system is right.
From now on it's OPAL_BOOL_STRUCT_COPY. This one is related to r13076 and
has to follow when r13076 goes in the 1.2.

This commit was SVN r13077.

The following SVN revision numbers were found above:
  r13076 --> open-mpi/ompi@f0932a0701
2007-01-11 05:44:48 +00:00
Jeff Squyres
f0932a0701 A workaround for a bug in the PGI 6.2 compiler series. This bug has
been fixed in the 7.0 PGI series, but is unlikely to be fixed in the
6.2 series:

 * Add a configure test looking for the bad behavior (the PGI compiler
   chokes on C code where structs containing bool's are copied by
   value)
 * Set OMPI_BOOL_STRUCT_COPY to 1 if it's ok, 0 if it's not (i.e., PGI
   6.2 series will have this value set to 0)
 * In two places in the code base -- orte-clean and btl_openib_ini.h,
   we have a struct that contains a bool that is copied by value.  In
   these two places, check OMPI_BOOL_STRUCT_COPY and if it's 1, use
   the "int" type instead of "bool".

Fixes trac:739

This commit was SVN r13076.

The following Trac tickets were found above:
  Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
2007-01-11 02:21:26 +00:00
Gleb Natapov
624f139bd8 This commit fixes trac:729. Initialize pointer to registration to NULL. Otherwise
it may contain garbage and we will try to unregister it later in btl_free().

This commit was SVN r13054.

The following Trac tickets were found above:
  Ticket 729 --> https://svn.open-mpi.org/trac/ompi/ticket/729
2007-01-09 10:29:20 +00:00