Commit Graph

487 Commits

Author SHA1 Message Date
George Bosilca
e1383027e1 Correct a comment and cleanup/reorder the code.
This commit was SVN r21696.
2009-07-16 17:41:32 +00:00
Rainer Keller
6c5532072a - Split the datatype engine into two parts: an MPI specific part in OMPI
   and a language agnostic part in OPAL. The convertor is completely
   moved into OPAL.  This offers several benefits as described in RFC
   http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
   namely:
    - Fewer basic types (int* and float* types, boolean and wchar)
    - Fixing naming scheme to ompi-nomenclature.
    - Usability outside of the ompi-layer.
 - Due to the fixed nature of simple opal types, their information is
   completely known at compile time and therefore constified
 - With fewer datatypes (22), the actual sizes of bit-field types may be
   reduced from 64 to 32 bits, allowing reorganizing the opal_datatype
   structure, eliminating holes and keeping data required in convertor
   (upon send/recv) in one cacheline...
   This has implications to the convertor-datastructure and other parts
   of the code.
 - Several performance tests have been run, the netpipe latency does not
   change with this patch on Linux/x86-64 on the smoky cluster.
 - Extensive tests have been done to verify correctness (no new
   regressions) using:
   1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      a. running both trunk and ompi-ddt resulted in no differences
         (except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
         correctly).
      b. with --enable-memchecker and running under valgrind (one buglet
         when run with static found in test-suite, committed)
   2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      all passed (except for the dynamic/ tests failed!! as trunk/MTT)
   3. compilation and usage of HDF5 tests on Jaguar using PGI and
      PathScale compilers.
   4. compilation and usage on Scicortex.
 - Please note, that for the heterogeneous case, (-m32 compiled
   binaries/ompi), neither ompi-trunk, nor ompi-ddt branch would
   successfully launch.

This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
George Bosilca
f9a510fd8a There is no need for an atomic read if we are not in a threaded case.
This commit was SVN r21394.
2009-06-08 23:55:52 +00:00
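The idea behind f9a510fd8a (skip the atomic read when no threads are in play) can be sketched roughly as follows. This is not the code from the commit: `using_threads`, `pending_count`, `set_pending` and `read_pending` are hypothetical names, and the `__atomic_*` builtins are assumed to be available (GCC or Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: set once at startup if more than one thread may run. */
static bool using_threads = false;

/* Hypothetical shared counter. */
static int32_t pending_count = 0;

static void set_pending(int32_t value)
{
    assert(value >= 0);
    if (using_threads) {
        __atomic_store_n(&pending_count, value, __ATOMIC_RELEASE);
    } else {
        pending_count = value;
    }
}

static int32_t read_pending(void)
{
    if (using_threads) {
        /* Another thread may be writing: use an atomic acquire load. */
        return __atomic_load_n(&pending_count, __ATOMIC_ACQUIRE);
    }
    /* Single-threaded: a plain load is enough. */
    return pending_count;
}
```

The branch costs one predictable test; whether that is cheaper than an unconditional atomic load depends on the platform.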
George Bosilca
be320ca959 Don't add the offset to all segments, only the first one should be affected. Thanks
to Roberto Ammendola for this bug report and patch.

This commit was SVN r21300.
2009-05-27 16:12:18 +00:00
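A minimal sketch of the rule stated in be320ca959, assuming the offset is a byte count to skip at the start of a multi-segment buffer. `struct segment` and `apply_offset` are hypothetical names, not the types used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment descriptor: start address and length in bytes. */
struct segment {
    unsigned char *addr;
    size_t len;
};

/* Skip 'offset' bytes at the start of the buffer described by 'segs'.
 * Only the first segment moves; the remaining segments already start
 * at the right place and must stay untouched. */
static void apply_offset(struct segment *segs, size_t count, size_t offset)
{
    assert(segs != NULL && count > 0 && offset <= segs[0].len);
    segs[0].addr += offset;
    segs[0].len -= offset;
    (void)count; /* segs[1..count-1] are deliberately not modified */
}
```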
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows the opal layer to be used more independently
from the rest of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Shiqing Fan
cd565923d3 Completely remove ltdl support for Windows build.
This commit was SVN r21170.
2009-05-05 18:59:13 +00:00
George Bosilca
271eb11f28 Remove an unused statically defined function.
This commit was SVN r21157.
2009-05-05 13:23:49 +00:00
Brian Barrett
736debcffc Check during communicator creation that we didn't get assigned a CID we can't handle, so that the code aborts instead of hangs.
Refs trac:1904

This commit was SVN r21133.

The following Trac tickets were found above:
  Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904
2009-04-30 19:23:57 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Shiqing Fan
3d4e0472d6 Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Nysal Jan
5353236a53 Reduce the size of FIN header by 8 bytes. This is done by rearranging the fields to reduce the amount of compiler padding
This commit was SVN r21046.
2009-04-21 14:41:51 +00:00
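The effect described in 5353236a53 (fewer padding bytes after reordering fields) can be illustrated with a made-up header. The two structs below are hypothetical and do not reproduce the real FIN header layout; the exact saving depends on the ABI, and on common LP64 platforms it is typically 8 bytes for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 1-byte field ahead of an 8-byte field usually
 * forces the compiler to insert padding before the wider member. */
struct fin_hdr_padded {
    uint8_t  type;
    uint64_t des;
    uint32_t status;
};

/* Same members, widest first: the small fields share the tail. */
struct fin_hdr_packed {
    uint64_t des;
    uint32_t status;
    uint8_t  type;
};

static_assert(sizeof(struct fin_hdr_packed) <= sizeof(struct fin_hdr_padded),
              "reordering must never make the header larger");

/* Bytes saved by the reordering on the current platform. */
static size_t fin_hdr_bytes_saved(void)
{
    return sizeof(struct fin_hdr_padded) - sizeof(struct fin_hdr_packed);
}
```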
George Bosilca
b5deb228f3 Allow the BTL to release the descriptor. In fact the only thing the PML
needs is to be involved in the RMA completion process, which is ensured
by the MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag. Fixes trac:1875.

This commit was SVN r20983.

The following Trac tickets were found above:
  Ticket 1875 --> https://svn.open-mpi.org/trac/ompi/ticket/1875
2009-04-13 23:41:50 +00:00
George Bosilca
527540aeb1 Rename req_bytes_delivered to req_bytes_expected for the receive
requests to really reflect what this field means.

This commit was SVN r20971.
2009-04-10 16:36:20 +00:00
George Bosilca
c148d33eb5 Play nicely with the reference count on the ompi_proc structure.
This commit was SVN r20970.
2009-04-10 16:32:02 +00:00
George Bosilca
dfc7cea329 Fix the deadlock issues on the osu_bw. The problem is that the PML is
event driven, and if no events are generated by the BTLs ... well
nothing happens (i.e., there is no progress at the PML level and all
pending fragments remain pending). By forcing the BTL to trigger the
callbacks for all ACK and FIN, we give more opportunities to the PML
to make real progress, but we pay for this in terms of performance.

This commit was SVN r20953.
2009-04-07 16:56:37 +00:00
George Bosilca
44ce610b8b Add a comment to highlight the fact that this function re-appends the
FIN message to the pending list when the send fails. Therefore, any
upper level function is not required to add it.
Make sure we don't send the FIN twice.

This commit was SVN r20952.
2009-04-07 16:48:58 +00:00
George Bosilca
ccb79b963f This is the other half of commit r20946, as I mixed them up between
two of my testing machines. The fix requires both commits!

This commit was SVN r20947.

The following SVN revision numbers were found above:
  r20946 --> open-mpi/ompi@e2bb4c9b8f
2009-04-06 21:49:52 +00:00
George Bosilca
e2bb4c9b8f Correct the handling of the pckt_pending list. The problem was that
we returned the pckt before copying the values out. With this change
it seems to work at least on two architectures (even with the
mpool size set back to 0).

This commit was SVN r20946.
2009-04-06 21:45:08 +00:00
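The ordering problem described in e2bb4c9b8f (an item handed back before its values were copied out) can be sketched as below. The types and functions are hypothetical stand-ins, not the ob1 data structures; here the return path scrubs the item, so a read after return would observe zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending-packet item kept on a free list when unused. */
struct pending_pckt {
    uint64_t des;
    uint8_t order;
    struct pending_pckt *next;
};

static struct pending_pckt *free_list = NULL;

/* Return an item to the free list. The fields are scrubbed, so reading
 * them after this call yields stale (zeroed) data. */
static void pckt_return(struct pending_pckt *pckt)
{
    pckt->des = 0;
    pckt->order = 0;
    pckt->next = free_list;
    free_list = pckt;
}

/* Correct order: copy the values out first, return the item second. */
static uint64_t consume_pckt(struct pending_pckt *pckt, uint8_t *order_out)
{
    assert(pckt != NULL && order_out != NULL);
    uint64_t des = pckt->des;
    *order_out = pckt->order;
    pckt_return(pckt);
    return des;
}
```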
Ralph Castain
f72e3ba9f9 Update the PML base send init macro to take a converter_flag field (discussed with George).
Update the csum pml module - still not quite right, but closer.

Modify the LANL platform files to keep pace.

This commit was SVN r20859.
2009-03-24 19:12:53 +00:00
George Bosilca
daba352af4 As the request is not yet updated (i.e. _MATCHED cannot be called as we don't yet know the
expected length of the message) we should use the source and tag from the message header
instead of the value from the status structure attached to the request.
-This line, and those below, will be ignored--

M    pml_ob1_recvreq.c

This commit was SVN r20844.
2009-03-23 20:25:53 +00:00
Rainer Keller
6f808d9b05 Preparation work for another commit (after RFC):
- This patch solely _adds_ required headers and is rather localized
   The next patch (after RFC) heavily removes headers (based on script)
 - ompi/communicator/communicator.h: For sources that use
   ompi_mpi_comm_world, don't require them to include "mpi.h"
 - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
   #include "ompi/mca/topo/topo.h"
 - ompi/errhandler/errhandler_predefined.h:
   ompi/communicator/communicator.h depends on this header file!
   To prevent recursion just have fwd declarations.
   #include "ompi/types.h" for fwd declarations of the main structs.
 - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
   ompi_rb_tree_t, so have the proper classes
 - ompi/mca/op/op.h:
   Op is pretty self-contained: Nobody up to now has done
   #include "opal/class/opal_object.h"
 - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
   #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/pml/base/base.h:
   We use opal_lists  
 - ompi/mca/pml/dr/pml_dr_vfrag.h:
   #include "opal/types.h" for ompi_ptr_t
 - ompi/mca/pml/ob1/pml_ob1_hdr.h:
   #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
 - opal/dss/dss_unpack.c:
   #include "opal/types.h"
 - opal/mca/base/base.h:
   #include "opal/util/cmd_line.h" for opal_cmd_line_t
 - orte/mca/oob/tcp/oob_tcp.c:
   #include "opal/types.h" for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp.h:
   #include "opal/threads/threads.h" for opal_thread_t
 - orte/mca/oob/tcp/oob_tcp_msg.c:
   #include "opal/types.h" 
 - orte/mca/oob/tcp/oob_tcp_peer.c:
   #include "opal/types.h"  for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp_send.c:
   #include "opal/types.h" 
 - orte/mca/plm/base/plm_base_proxy.c:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT
 - orte/mca/rml/base/rml_base_receive.c:
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/mca/rml/oob/rml_oob_recv.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/mca/rml/oob/rml_oob_send.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/runtime/orte_data_server.c
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/runtime/orte_globals.h:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT

 Tested on Linux/x86-64

This commit was SVN r20817.
2009-03-17 21:34:30 +00:00
Eugene Loh
efe8c3a283 Initialize reuse_old_request properly at the beginning of each loop iteration in pml_ob1_start.c.
This commit was SVN r20712.
2009-03-04 06:58:36 +00:00
Rainer Keller
811f2bd9b4 - As discussed on RFC, move the ompi_bitmap to the
opal layer.
   Add a check against a maximum (actually get rid of ifs internally to
   opal_bitmap.c) -- the functionality to set the current maximum size
   opal_bitmap_set_max_size() is currently only used in attribute.c
   to set the maximum OMPI_FORTRAN_HANDLE_MAX...

   Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f
   run with 6 procs.
   Let's look into MTT as well...

This commit was SVN r20708.
2009-03-03 22:25:13 +00:00
Rainer Keller
02416033ad - Get rid of warning on function declarations:
First "static inline", then the type

This commit was SVN r20657.
2009-02-28 14:15:34 +00:00
George Bosilca
e181ba50c9 Stop valgrind from complaining about few uninitialized bytes on the PML
headers. This feature is enabled only in debug mode when the heterogeneous
support is enabled.

This commit was SVN r20648.
2009-02-27 05:24:06 +00:00
Rainer Keller
96e1b9b747 - Header orte/mca/rml/rml.h is not needed if there is no occurrence of orte_rml
or ORTE_RML.
   As the others compile fine with -Wimplicit-function-declaration

This commit was SVN r20639.
2009-02-26 03:52:31 +00:00
Rainer Keller
b356e90fa1 - Get rid of include orte/util/proc_info.h, if not needed
Only proc_info.h-internal include file is opal/dss/dss_types.h
 - In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   works fine, no errors.

   Again, let's have MTT the last word.

This commit was SVN r20631.
2009-02-25 03:38:00 +00:00
Terry Dontje
0178b6c45f Added padding to predefined handle structures to maintain library version to
version compatibility.

This commit was SVN r20627.
2009-02-24 17:17:33 +00:00
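One common way to get the version-to-version compatibility mentioned in 0178b6c45f is to give each predefined handle a fixed footprint that is larger than the current structure. The sketch below assumes that approach; `HANDLE_PAD`, `struct real_handle` and `struct predefined_handle` are hypothetical names and sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed footprint reserved for every predefined handle. */
#define HANDLE_PAD 256

/* Hypothetical current contents of a handle; it may grow in later versions. */
struct real_handle {
    int id;
    void *impl;
};

/* The exported object always occupies HANDLE_PAD bytes, so an application
 * linked against one library version keeps seeing the same object size
 * as long as struct real_handle stays within the reserved footprint. */
struct predefined_handle {
    struct real_handle handle;
    char padding[HANDLE_PAD - sizeof(struct real_handle)];
};

static_assert(sizeof(struct predefined_handle) == HANDLE_PAD,
              "predefined handle must keep its reserved size");
```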
George Bosilca
97a2296fdd Correct the GET protocol. Thanks to Mike Dubman for finding the problem and
testing my patch.

This commit was SVN r20591.
2009-02-19 16:00:15 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or               
   orte/mca/rml/rml_types.h                                           
   Please see orte_show_help_replacement.sh committed next.

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"     
   in orte/mca/odls/base/odls_base_default_fns.c and                  
   in orte/tools/orte-top/orte-top.c                                  
   Manually added these.                                              

   Let's have MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Jeff Squyres
91415c2996 Some minor valgrind-inspired cleanups: fix some memory leaks
This commit was SVN r20542.
2009-02-13 03:45:11 +00:00
George Bosilca
4747a4bb53 ompi_comm_all allocates memory and retains the objects. Therefore, after
each call to ompi_comm_all we should parse the communicator list and
release the objects ...

This commit was SVN r20525.
2009-02-11 21:48:11 +00:00
Jeff Squyres
679e2855b7 Fix CID 1135: the assignment to item was never used (it was
overwritten in the next loop iteration "item = next_item").

This commit was SVN r20189.
2009-01-03 15:15:42 +00:00
Shiqing Fan
a5281f0434 - 1/4 commit for Windows Visual Studio and CCP support:
CMakeLists and .windows files.
  In contribs preconfigured and precompiled parts.

This commit was SVN r20108.
2008-12-10 20:59:20 +00:00
Shiqing Fan
abd21b6d17 - An update for memchecker :
1. fix a bug in pml_ob1_recvreq/sendreq.c, buffer was made defined where the request has already been released.
2. complete memchecker support for collective functions.
3. change the wrongly spelled function name of memchecker, i.e. '*_isaddressible' should be '*_isaddressable'

This commit was SVN r20043.
2008-11-27 16:34:02 +00:00
George Bosilca
82d1d5d785 The patch for "Unexpected message queue for unknown CID's required" ticket #1460.
I'm unable to split it in two parts, my patch and Edgar's one. So I just update
copyright information for both of us.
What this patch does:
- it uses the unexpected queue created by commit r19562 to dispatch the
  unexpected message to the right communicator (once this communicator
  is created and initialized).
- delay the PML comm_add until we have the context_id for the new communicator.
- only do the PML comm_add on processes that really belong to the new
  communicator. Please read the lengthy comment in the source code for the
  reason behind this.

This commit was SVN r19929.

The following SVN revision numbers were found above:
  r19562 --> open-mpi/ompi@acd3406aa7
2008-11-04 21:58:06 +00:00
George Bosilca
9260f6b157 There is no reason to ask for an ACK from the BTL here.
This commit was SVN r19789.
2008-10-22 20:13:33 +00:00
Josh Hursey
88aa45dd52 Commit to bring online OpenIB, MX, and shared memory support for Open MPI's checkpoint/restart functionality. Some tuning is still needed, but basic functionality is in place.
There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539

Additionally:
* Fix a file cleanup bug in CRS Base.
* Fix a possible deadlock in the TCP ft_event function
* Add a mca_base_param_deregister() function to MCA base
* Add whole process checkpoint timers
* Add support for BTL: OpenIB, MX,  Shared Memory
* Add support Mpool: rdma, sm
* Sundry bounds checking and cleanup in some scattered functions

This commit was SVN r19756.
2008-10-16 15:09:00 +00:00
George Bosilca
00d24bf8ab Scalability patch, or slim-fast effect #1. All BML structures just
got a whole lot smaller, decreasing the memory footprint of the
running application. How much is a good question. Here is a
breakdown:

- in mca_bml_base_endpoint_t: 3 *size_t + 1 * uint32_t
- in mca_bml_base_btl_t: 1 * int + 1 * double - 1 * float
                         + 6 * size_t + 9 * (void*)

The decrease in mca_bml_base_endpoint_t is for each peer and the
decrease in mca_bml_base_btl_t is for each BTL for each peer.
So, if we consider the most convenient case where there is only
one network between all peers, this decreases the memory footprint
per peer by
9*size_t + 9*(void*) + 2 * int32_t + 1 * double - 1 * float.
On a 64-bit machine this will be 156 bytes per peer.

Now we access all these fields directly from the underlying BTL
structure, and as this structure is common to multiple BML endpoint,
we are a lot more cache friendly. Even if this does not improve the
latency, it makes the SM performance graph a lot smoother.

This commit was SVN r19659.
2008-09-30 21:02:37 +00:00
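The 156-byte figure in 00d24bf8ab can be checked by plugging type sizes into the formula from the message. The helper below is only a worked calculation; `struct type_sizes` and both function names are hypothetical, and the LP64 sizes (8-byte `size_t` and pointers, 4-byte `int32_t` and `float`, 8-byte `double`) are the assumption behind "64-bit machine".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes, in bytes, of the types that appear in the formula. */
struct type_sizes {
    size_t size_t_bytes;
    size_t pointer_bytes;
    size_t int32_bytes;
    size_t double_bytes;
    size_t float_bytes;
};

/* 9*size_t + 9*(void*) + 2*int32_t + 1*double - 1*float */
static size_t bml_bytes_saved_per_peer(struct type_sizes s)
{
    size_t removed = 9 * s.size_t_bytes + 9 * s.pointer_bytes
                   + 2 * s.int32_bytes + s.double_bytes;
    assert(removed >= s.float_bytes);
    return removed - s.float_bytes;
}

/* Sizes on the machine this code is compiled for. */
static struct type_sizes host_type_sizes(void)
{
    struct type_sizes s = { sizeof(size_t), sizeof(void *), sizeof(int32_t),
                            sizeof(double), sizeof(float) };
    return s;
}
```

With LP64 sizes this is 72 + 72 + 8 + 8 - 4 = 156 bytes per peer, matching the message.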
George Bosilca
b32e4e7f34 Nothing important, mainly replacing tabs with spaces.
This commit was SVN r19658.
2008-09-30 18:30:35 +00:00
George Bosilca
325d006577 Mostly cleanups, and eventually a little bit more scalable add_procs.
There was an argument that was barely used, and on return at the PML
level it contained nothing usable. It has been removed, so now we're
using less memory ...

This commit was SVN r19657.
2008-09-30 15:47:43 +00:00
Jeff Squyres
d2d06008a0 Change the default value of mpi_leave_pinned to -1, meaning that we'll
figure it out at runtime (really meaning: we'll still default to "0"
unless something explicitly overrides to 1, such as the openib BTL).
This way, ompi_info doesn't confusingly report mpi_leave_pinned==0 for
mpi_leave_pinned, but we end up running with mpi_leave_pinned==1.

Fixes trac:1502.

This commit was SVN r19571.

The following Trac tickets were found above:
  Ticket 1502 --> https://svn.open-mpi.org/trac/ompi/ticket/1502
2008-09-16 22:06:14 +00:00
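The "-1 means decide at runtime" convention from d2d06008a0 amounts to a tri-state parameter. A minimal sketch, assuming a single component vote; `resolve_leave_pinned` and its arguments are hypothetical and do not reflect the actual MCA parameter code.

```c
#include <assert.h>
#include <stdbool.h>

/* user_value: -1 = not set by the user, 0 = off, 1 = on.
 * component_wants_it: whether some component (for example a BTL)
 * asked for the feature at runtime. */
static int resolve_leave_pinned(int user_value, bool component_wants_it)
{
    assert(user_value >= -1 && user_value <= 1);
    if (user_value >= 0) {
        return user_value;           /* an explicit setting always wins */
    }
    return component_wants_it ? 1 : 0;  /* undecided: default to 0 unless overridden */
}
```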
Jeff Squyres
270f482fea Addendum to r19561: also remove a comment that is no longer true and
some code that is commented out.

This commit was SVN r19564.

The following SVN revision numbers were found above:
  r19561 --> open-mpi/ompi@17e65369be
2008-09-16 13:02:10 +00:00
George Bosilca
acd3406aa7 Never drop messages. No never no more.
This is supposed to fix the ticket #1460.

This commit was SVN r19562.
2008-09-15 23:04:18 +00:00
George Bosilca
17e65369be Fix the deadlock when we run out of resources on the BTLs. Move the progress
function from the BML into the PML. The BTL progress functions are now directly
registered with the event library.

This commit was SVN r19561.
2008-09-15 22:56:23 +00:00
George Bosilca
2499112d1c Fix indentation.
This commit was SVN r19313.
2008-08-17 20:10:54 +00:00
Rainer Keller
e84f1f6fdf - Mark the variable bytes_delivered as being unused
(it is just set within MCA_PML_OB1_RECV_REQUEST_UNPACK)
   If Coverity's Prevent makes use of __attribute__((unused)),
   this should get rid of the warning.
   Relates to CID1060

   This would then apply to the many int _rc; definitions that are
   used in other macros in similar fashion...

This commit was SVN r19179.
2008-08-06 13:46:23 +00:00
George Bosilca
3dafa58b32 Fix Coverity issue 1044.
This commit was SVN r19178.
2008-08-06 13:38:21 +00:00
Rainer Keller
c1f2b8e476 - Fix resource leak in case of error.
Coverity CID1067

This commit was SVN r19168.
2008-08-06 08:04:27 +00:00
Jeff Squyres
0af7ac53f2 Fixes trac:1392, #1400
* add "register" function to mca_base_component_t
   * converted coll:basic and paffinity:linux and paffinity:solaris to
     use this function
   * we'll convert the rest over time (I'll file a ticket once all
     this is committed)
 * add 32 bytes of "reserved" space to the end of mca_base_component_t
   and mca_base_component_data_2_0_0_t to make future upgrades
   [slightly] easier
   * new mca_base_component_t size: 196 bytes
   * new mca_base_component_data_2_0_0_t size: 36 bytes
 * MCA base version bumped to v2.0
   * '''We now refuse to load components that are not MCA v2.0.x'''
 * all MCA frameworks versions bumped to v2.0
 * be a little more explicit about version numbers in the MCA base
   * add big comment in mca.h about versioning philosophy

This commit was SVN r19073.

The following Trac tickets were found above:
  Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
2008-07-28 22:40:57 +00:00
Jeff Squyres
e3e79c0881 Fixes trac:1379:
* Use synonym/deprecated MCA param API for some mca base params
 * In openib BTL, if we have appropriate memory hooks support, and if
   mpi_leave_pinned and mpi_leave_pinned_pipeline were not set by the
   user, set mpi_leave_pinned to 1.
 * Defer checking mpi_leave_pinned_* until as late as possible (i.e.,
   until after the btl's have had a chance to set mpi_leave_pinned to
   1):
   * in ob1 pml
   * in rdma mpool

This commit was SVN r19022.

The following Trac tickets were found above:
  Ticket 1379 --> https://svn.open-mpi.org/trac/ompi/ticket/1379
2008-07-24 22:51:26 +00:00
George Bosilca
3ba0a8c0c1 In the case where the environment is homogeneous we can ALWAYS create
the receiver convertor when we create the request (as we know all
architectures are identical).

This commit was SVN r18934.
2008-07-17 04:57:55 +00:00
George Bosilca
939fa3001d Small cleanups. Remove some switch cases that cannot be reached. Rename
a struct field.

This commit was SVN r18931.
2008-07-17 04:50:39 +00:00
George Bosilca
319a8b3219 Once matched the proc attached to the request should be the source
of the message and not the first on the list. This fixes ticket
#1386.

This commit was SVN r18929.
2008-07-17 03:04:28 +00:00
George Bosilca
3de0488410 Fix the truncation problem. This closes #211.
This commit was SVN r18850.
2008-07-09 17:38:41 +00:00
George Bosilca
dc0ab0d0a8 Enable the sendi path.
This commit was SVN r18633.
2008-06-09 23:03:56 +00:00
Galen Shipman
dbd282fcad doh.. fix GET protocol..
This commit was SVN r18623.
2008-06-09 19:45:44 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
George Bosilca
4d8cbbc167 Add Pasha's patch as it correctly solves the issues. In fact in the current
incarnation these functions do not need the inline keyword anymore.

This commit was SVN r18558.
2008-06-03 16:03:36 +00:00
George Bosilca
e361bcb64c Send optimizations.
1. The send path gets shorter. The BTL is allowed to return > 0 to specify that the
   descriptor was pushed to the networks, and that the memory attached to it is
   available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag
   can be used by the PML to force the BTL to always trigger the callback.
   Unmodified BTLs will continue to work as expected, as they will return OMPI_SUCCESS
   which forces the PML to have exactly the same behavior as before. Some BTLs have
   been modified: self, sm, tcp, mx.
2. Add send immediate interface to BTL.
   The idea is to have a mechanism of allowing the BTL to take advantage of
   send optimizations such as the ability to deliver data "inline". Some
   network APIs such as Portals allow data to be sent using a "thin" event
   without packing data into a memory descriptor. This interface change
   allows the BTL to use such capabilities and allows for other optimizations
   in the future. All existing BTLs except for Portals and sm have this interface
   set to NULL.

This commit was SVN r18551.
2008-05-30 03:58:39 +00:00
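Point 1 of e361bcb64c describes how the caller may interpret the BTL send return value. The sketch below is one reading of that description, not the ob1 code: `struct descriptor`, `send_completion`, `after_btl_send` and the flag value are hypothetical. It assumes that a positive return with the always-callback flag set leaves completion to the BTL callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MCA_BTL_DES_SEND_ALWAYS_CALLBACK. */
#define DES_SEND_ALWAYS_CALLBACK 0x4u

struct descriptor {
    unsigned flags;
    bool completed;
};

/* Hypothetical completion handler run once the send buffer is reusable. */
static void send_completion(struct descriptor *des)
{
    des->completed = true;
}

/* rc < 0: error, passed through.
 * rc == 0: queued; the BTL triggers the callback later (old behavior).
 * rc > 0: data already pushed out; complete right away unless the caller
 *         asked the BTL to always trigger the callback itself. */
static int after_btl_send(struct descriptor *des, int rc)
{
    assert(des != NULL);
    if (rc < 0) {
        return rc;
    }
    if (rc > 0 && !(des->flags & DES_SEND_ALWAYS_CALLBACK)) {
        send_completion(des);
    }
    return 0;
}
```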
Galen Shipman
4da4c44210 Receive side changes, basically uses multiple active message callbacks rather
than using a single receive callback followed by a switch on the header.
Also fast pathed the matching for small fragments. 

This commit was SVN r18549.
2008-05-30 01:29:09 +00:00
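The receive-side change in 4da4c44210 replaces "one callback plus a switch on the header" with one callback per header type. A generic sketch of that dispatch style follows; the header type values, `register_recv_cb`, `deliver` and `on_match` are hypothetical and are not the BTL active-message interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header types carried in the first byte of a fragment. */
enum { HDR_MATCH = 0, HDR_ACK = 1, HDR_FIN = 2, HDR_TYPE_COUNT = 3 };

typedef void (*recv_cb_fn)(const void *fragment, void *ctx);

/* One slot per header type instead of a single callback with a switch. */
static recv_cb_fn recv_table[HDR_TYPE_COUNT];

static int register_recv_cb(uint8_t hdr_type, recv_cb_fn cb)
{
    assert(cb != NULL);
    if (hdr_type >= HDR_TYPE_COUNT) {
        return -1;
    }
    recv_table[hdr_type] = cb;
    return 0;
}

/* Jump straight to the handler registered for this header type. */
static int deliver(uint8_t hdr_type, const void *fragment, void *ctx)
{
    if (hdr_type >= HDR_TYPE_COUNT || recv_table[hdr_type] == NULL) {
        return -1;
    }
    recv_table[hdr_type](fragment, ctx);
    return 0;
}

/* Example handler: counts matched fragments through the context pointer. */
static void on_match(const void *fragment, void *ctx)
{
    (void)fragment;
    ++*(int *)ctx;
}
```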
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not their opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so if you explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent via the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
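The duplicate suppression that e7ecd56bd2 describes for orte_show_help() (show a help message once, count later copies) can be sketched as a small lookup table. This is an illustration only: `struct help_log` and `help_should_display` are hypothetical, the periodic "X new copies" report is left out, and the table stores the caller's string pointers without copying them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOPICS 16

/* Topics seen so far and how many duplicates arrived for each. */
struct help_log {
    const char *topics[MAX_TOPICS];
    int duplicates[MAX_TOPICS];
    int count;
};

/* Returns 1 if the message should be displayed now, 0 if it was only
 * counted as a duplicate of a topic that was already shown. */
static int help_should_display(struct help_log *log, const char *topic)
{
    assert(log != NULL && topic != NULL);
    for (int i = 0; i < log->count; i++) {
        if (strcmp(log->topics[i], topic) == 0) {
            log->duplicates[i]++;
            return 0;
        }
    }
    if (log->count < MAX_TOPICS) {
        log->topics[log->count] = topic;
        log->duplicates[log->count] = 0;
        log->count++;
    }
    return 1;
}
```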
Gleb Natapov
31d2797a2f If RDMA PUT is received before ACK and registration of memory fails don't
start sending fragment by copy in/out before ACK is received as we don't
know pointer to receive request yet.

Pipeline protocol sometimes doesn't send ACK though, so this case is still
broken.

This commit was SVN r18423.
2008-05-11 12:40:55 +00:00
Josh Hursey
da2f1c58e2 Some checkpoint/restart cleanup.
* Remove the opal_only option. This was suffering from bit rot, and no one uses it. It can be added back fairly easily if wanted.
 * Cleanup metadata interactions at the local level.
 * Touch up some of the INC functionality (fix typos and a minor ordering issue)

This commit was SVN r18416.
2008-05-08 18:47:47 +00:00
Shiqing Fan
8393fb5d47 Use the new memchecker_call function for memory checking of non-blocking communication.
This commit was SVN r18399.
2008-05-07 12:28:51 +00:00
Shiqing Fan
f35a06119c Use memchecker_convertor_call function instead the old one. Move the function to the place that we can use convertor.
This commit was SVN r18370.
2008-05-05 13:57:27 +00:00
Josh Hursey
dcd21d7d07 Some checkpoint/restart fixes in response to r18338 (changes in modex).
Things should be working now.

This commit was SVN r18348.

The following SVN revision numbers were found above:
  r18338 --> open-mpi/ompi@3e55fe6f6d
2008-05-01 17:48:13 +00:00
Ralph Castain
3e55fe6f6d Fold in the revised modex scheme. Move the ompi_proc_t modex portions to the RTE level since the daemons already have that info. Provide each process with the equivalent of a "nidmap" - both a map of what nodes are in the job, and a map of which node each process is on. This enables the use of static ports, though that hasn't been turned "on" in this commit.
Update the rsh tree spawn capability so we spawn the next wave of daemons before launching our own local procs.

Add an ability to encode nodenames for large clusters with contiguous node name numbering schemes - this allows communication of all node names in a few bytes instead of tens-of-bytes/node.

This commit was SVN r18338.
2008-04-30 19:49:53 +00:00
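The node name encoding mentioned in 3e55fe6f6d relies on contiguous numbering: a prefix, a first number, a count and a digit width describe every name. The sketch below shows only the decoding side under that assumption; `struct node_range` and `node_name` are hypothetical and unrelated to the actual ORTE encoding format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Compact description of names such as node0001 ... node1024. */
struct node_range {
    const char *prefix; /* common text, e.g. "node" */
    int first;          /* number of the first node */
    int count;          /* how many nodes the range covers */
    int digits;         /* zero-padded width of the numeric part */
};

/* Rebuild the name of the node at 'index' (0-based) within the range.
 * Returns 0 on success, -1 on a bad index or a too-small buffer. */
static int node_name(const struct node_range *range, int index,
                     char *out, size_t out_len)
{
    assert(range != NULL && out != NULL);
    if (index < 0 || index >= range->count) {
        return -1;
    }
    int written = snprintf(out, out_len, "%s%0*d",
                           range->prefix, range->digits, range->first + index);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

A few integers plus one prefix string replace a per-node string, which is the size saving the commit message refers to.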
George Bosilca
6e6c370917 Roll back r18274, as it's legal to have a sequence number smaller than the
expected one. It doesn't necessarily mean the message is duplicated;
it can simply signify that the message is out of sequence and the counter
overflowed.
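
A minimal sketch of why a plain "smaller than expected" test is unreliable once the counter wraps. It assumes a 16-bit sequence counter, and `seq_is_behind` is a hypothetical helper, not a function from the OB1 source.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when `seq` is logically behind `expected` under modulo-2^16
 * arithmetic: the wrapped difference falls in the upper half of the range. */
static bool seq_is_behind(uint16_t seq, uint16_t expected)
{
    uint16_t diff = (uint16_t)(seq - expected);
    return diff >= 0x8000u;
}
```

For example, with `expected == 65530` an incoming `seq == 3` is numerically smaller, yet it is nine steps ahead after the overflow, so `seq_is_behind(3, 65530)` is false.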

This commit was SVN r18323.

The following SVN revision numbers were found above:
  r18274 --> open-mpi/ompi@73c9de3af9
2008-04-27 18:35:54 +00:00
Josh Hursey
2c736873bb Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors.
The problem was caused by a bad ordering between the restart of the ORTE level tcp connections (in the OOB - out-of-band communication) and the Open MPI level tcp connections (BTLs). Before this commit ORTE would shut down and restart the OOB completely before the OMPI level restarted its tcp connections. What would happen is that a socket descriptor used by the OMPI level on checkpoint was assigned to the ORTE level on restart. But the OMPI level had no knowledge that the socket descriptor it was previously using had been recycled, so it closed it on restart. This caused the ORTE level to break as the newly created socket descriptor was closed without its knowledge.

The fix is to have the OMPI level shut down its tcp connections, allow the ORTE level to restart, and then allow the OMPI level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit.
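
The descriptor recycling described above follows from the POSIX rule that a new descriptor gets the lowest free number. The sketch below is a generic illustration of that rule, not Open MPI code; `fd_number_is_recycled` is a hypothetical name, and a pipe stands in for the sockets.

```c
#include <unistd.h>

/* Returns 1 if a descriptor number released by one layer is handed out
 * again to the next opener, 0 if not, -1 on error. */
static int fd_number_is_recycled(void)
{
    int p[2];
    if (pipe(p) != 0) {
        return -1;
    }
    int old_fd = p[0];
    close(old_fd);            /* upper layer releases its descriptor */
    int new_fd = dup(p[1]);   /* lower layer opens a new descriptor */
    int recycled = (new_fd == old_fd);
    /* A stale close(old_fd) issued here by the upper layer would silently
     * close the descriptor that now belongs to the lower layer. */
    if (new_fd >= 0) {
        close(new_fd);
    }
    close(p[1]);
    return recycled;
}
```

In a single-threaded process the function returns 1, which is exactly the situation in which a late close of the old number hits the other layer's connection.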

Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it.

 * Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like) which is useful when you want to close, but not reopen a component set for fear of interfering with a lower level.
 * Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components.
 * Update ft_event functions in PML and BML to handle the new restart state.
 * Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging.

This commit was SVN r18276.
2008-04-24 17:54:22 +00:00
George Bosilca
3ccac4f803 Oops ...
This commit was SVN r18275.
2008-04-24 15:54:52 +00:00
George Bosilca
73c9de3af9 Bark if we got a wrong sequence number. Here wrong means that the
seq number is smaller than what we expect.

This commit was SVN r18274.
2008-04-24 15:48:43 +00:00
Josh Hursey
cc83d41ad9 Merge in tmp/jjh-scratch
{{{
 svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}

Contains:
 * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
 * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
 * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
 * Cleanup mpi interface with misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
 * Some other sundry cleanup items all dealing with C/R functionality in the trunk.

This commit was SVN r18241.
2008-04-23 00:17:12 +00:00
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
Ralph Castain
3a0d09300b Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations.
Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study.

This commit was SVN r18115.
2008-04-09 22:10:53 +00:00
Shiqing Fan
28746bbcdb Remove the memchecker macro in the pml base request, used in req_wait.c, which is actually in the wrong place. Instead, one simple call from send_request_free and recv_request_free (already done) will do all the work, fast and clean.
This commit was SVN r18095.
2008-04-07 17:46:50 +00:00
Shiqing Fan
a1e5df1cc9 Use the new memchecker function call which is based on convertor.
Remove one unnecessary call.

This commit was SVN r18085.
2008-04-07 07:52:04 +00:00
Gleb Natapov
cf40674369 Decide if sends should be throttled at the receiver and pass this to the sender
in an ACK message. The decision can't be made reliably at the sender.

This commit was SVN r17987.
2008-03-27 08:56:43 +00:00
Galen Shipman
0116041133 BTL shouldn't own the passive side's descriptor in the PML get protocol. The BTL
doesn't know when to free it on the passive side. 

This commit was SVN r17943.
2008-03-25 01:43:41 +00:00
George Bosilca
8943ae0b4e Cleanup plus some typos.
This commit was SVN r17858.
2008-03-18 03:03:33 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
George Bosilca
fa31ec81d0 Add the ownership flags to the PML/BTL interface. The layer
owning the descriptor is responsible for releasing it once
the descriptor is not in use anymore.

This commit was SVN r17497.
2008-02-18 17:39:30 +00:00
Shiqing Fan
653857ddbe Wrong function name was copied here.
This commit was SVN r17486.
2008-02-17 19:47:47 +00:00
Gleb Natapov
354c5bc5e1 Don't call progress() from OB1 fragment scheduling functions. These calls don't serve
any purpose and cause recursive calls to the progress engine.

This commit was SVN r17478.
2008-02-17 12:42:32 +00:00
Gleb Natapov
0a1fa2cb56 req_match_received is set inside MCA_PML_OB1_RECV_REQUEST_MATCHE().
This commit was SVN r17442.
2008-02-13 08:34:39 +00:00
Gleb Natapov
876f49f1a7 Remove unnecessary assignment. It is done later in the same function.
This commit was SVN r17441.
2008-02-13 08:28:25 +00:00
Shiqing Fan
54c7b71cfd Use the correct way of including memchecker.h, which will work with '--with-devel-headers'.
This commit was SVN r17435.
2008-02-12 18:01:17 +00:00
Rainer Keller
7621800477 - Fix and add comments -- output full name for pd
- Protect argument in macro...

This commit was SVN r17434.
2008-02-12 16:59:59 +00:00
Shiqing Fan
f5792bbda5 merging the memchecker into trunk.
This commit was SVN r17424.
2008-02-12 08:46:27 +00:00
George Bosilca
4e703741b7 Move the PML tags into the legal range.
This commit was SVN r17326.
2008-01-30 00:09:45 +00:00
Rainer Keller
f7e586fc01 - allow --enable-mca-direct=pml-ob1
This commit was SVN r17227.
2008-01-25 09:56:45 +00:00
George Bosilca
6310ce955c The first patch related to the Active Message stuff. So far, here is what we have:
- the registration array is now global instead of one per BTL.
- each framework has to declare the entries it reserves in the registration array. Then
  it has to define the internal way of sharing (or not) these entries between all
  components. As an example, the PML will not share as there is only one active PML
  at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3
  are reserved for the framework while the remaining 5 are used internally by each
  framework.
- The registration function is optional. If a BTL does not provide such a function,
  nothing happens. However, in the case where such a function is provided in the BTL
  structure, it will be called by the BML when a tag is registered.
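
The 3 + 5 bit split of the tag can be sketched as below. This is an illustration under assumptions: it takes "the first 3" bits to be the most significant ones, and the `am_tag_*` helpers are hypothetical names, not the macros used in the BTL headers.

```c
#include <stdint.h>

/* Pack a 3-bit framework id and a 5-bit framework-internal id into one
 * 8-bit tag. Assumption: the framework id occupies the three high bits. */
static inline uint8_t am_tag_make(uint8_t framework, uint8_t internal)
{
    return (uint8_t)(((framework & 0x07u) << 5) | (internal & 0x1Fu));
}

/* Extract the framework id (upper 3 bits). */
static inline uint8_t am_tag_framework(uint8_t tag)
{
    return (uint8_t)(tag >> 5);
}

/* Extract the framework-internal id (lower 5 bits). */
static inline uint8_t am_tag_internal(uint8_t tag)
{
    return (uint8_t)(tag & 0x1Fu);
}
```

Under this layout each of the 8 possible frameworks gets 32 tags, and a single global registration array of 256 entries can be indexed directly by the tag.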

Now, it's time for the second step... Converting OB1 from a switch based PML to an
active message one.

This commit was SVN r17140.
2008-01-15 05:32:53 +00:00
George Bosilca
1bd31aa3ac Cleanup the OMPI_DECLSPEC/OMPI_MODULE_DECLSPEC in the PMLs.
This commit was SVN r17093.
2008-01-09 20:32:39 +00:00
Gleb Natapov
b37ff74a24 Make a function that is used in only one file static. Remove static function
declarations.

This commit was SVN r17080.
2008-01-09 09:54:35 +00:00
Ethan Mallove
f32dcb1636 The Sun Studio 12 compilers need to have inline specified as
`static` in cases where a function is not part of a separate
compilation unit (such as `append_recv_req_to_queue`).
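
A small sketch of the pattern, using a hypothetical helper rather than the real `append_recv_req_to_queue`. Under C99 rules a plain `inline` definition does not by itself provide an external definition, so a call that is not inlined can fail to link; `static inline` keeps the definition local to each file that includes it.

```c
/* Hypothetical helper defined in a header. `static inline` gives every
 * including file its own local definition, so no external symbol is needed. */
static inline int clamp_to_byte(int value)
{
    if (value < 0) {
        return 0;
    }
    if (value > 255) {
        return 255;
    }
    return value;
}
```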

This commit was SVN r17069.
2008-01-08 18:45:51 +00:00
George Bosilca
42414b27e9 Use BEGIN_C_DECLS and END_C_DECLS instead of the ugly #if/#endif.
This commit was SVN r17009.
2007-12-21 06:19:46 +00:00
George Bosilca
b58dae00db Allow PERUSE to compile correctly.
This commit was SVN r17008.
2007-12-21 06:18:19 +00:00
Gleb Natapov
35bf8c7c46 Rewrite OB1 matching logic. Get rid of macros, make the code shorter.
This commit was SVN r16993.
2007-12-19 09:16:20 +00:00
Gleb Natapov
5cd38b8b06 Better encapsulate heterogeneous arch handling in ob1.
This commit was SVN r16970.
2007-12-16 08:45:44 +00:00
Gleb Natapov
8b511b969d Introduce a new BTL parameter btl_rndv_eager_limit which determines the size of the
first fragment of the rendezvous protocol. Remove the no longer used btl_min_send_size
parameter.

This commit was SVN r16969.
2007-12-16 08:35:17 +00:00