1
1
Граф коммитов

204 Коммитов

Автор SHA1 Сообщение Дата
Howard Pritchard
bf89131f9e add owner files to opa/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Nathan Hjelm
a2bdfd99a2 osc/pt2pt: do not set active_incoming_frag_signal_count to 0 on fence completion 2015-02-11 12:34:04 -07:00
Gilles Gouaillardet
9be4dfb152 osc/pt2pt: invoke ompi_osc_signal_outgoing only once per fragment 2015-01-22 13:43:44 +09:00
Ralph Castain
4e592ac434 Fix the tarball by providing the correct list of headers in the Makefile.am 2015-01-07 18:37:26 -08:00
Nathan Hjelm
e68ed2876c osc/pt2pt: threading fixes and code cleanup 2015-01-06 13:39:16 -07:00
Nathan Hjelm
9eba7b9d35 Rename the OSC "rdma" component to pt2p to better reflect that it does not actually use btl rdma 2015-01-06 13:38:55 -07:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6, reversing
changes made to 6a19bf85dd.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
0d413fb73f Revert "Remove stale file reference"
This reverts commit 4c8fa17234.
2014-11-19 23:16:16 -07:00
Ralph Castain
4c8fa17234 Remove stale file reference 2014-11-19 18:32:19 -08:00
Nathan Hjelm
22625b005b osc/pt2pt: threading fixes and code cleanup 2014-11-19 11:33:04 -07:00
Nathan Hjelm
29e4e1c90a Rename the OSC "rdma" component to pt2p to better reflect that it does not actually use btl rdma 2014-11-19 11:33:03 -07:00
Ralph Castain
49d938de29 Merge one-sided updates to the trunk - written by Brian Barrett and Nathan Hjelmn
cmr=v1.7.5:reviewer=hjelmn:subject=Update one-sided to MPI-3

This commit was SVN r30816.
2014-02-25 17:36:43 +00:00
Yossi Etigin
9504969f7d fix communicator double-free from pt2pt component, caused by r29938.
cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30264.

The following SVN revision numbers were found above:
  r29938 --> open-mpi/ompi@ecfb122c97
2014-01-12 17:38:14 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Yossi Etigin
ecfb122c97 Fix segfault in osc pt2pt completion handler, when the request is canceled during finalization.
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29938.
2013-12-17 17:30:14 +00:00
Ralph Castain
5d1fa4fa0e Silence warnings:
osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_sendreq_recv_accum_long_cb':
osc_pt2pt_data_move.c:643:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send_cb':
osc_rdma_data_move.c:1312:37: warning: variable 'header' set but not used [-Wunused-but-set-variable]

This commit was SVN r29092.
2013-08-29 20:56:36 +00:00
George Bosilca
c9e5ab9ed1 Our macros for the OMPI-level free list had one extra argument, a possible return
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.

The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead of
using the return code.

This commit was SVN r28722.
2013-07-04 08:34:37 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
8d2fa3693b First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package.
This commit was SVN r28116.
2013-02-26 20:44:56 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Brian Barrett
d46d55ee9b If we're locking the local window, need to wait until the lock returns.
This commit was SVN r26234.
2012-04-04 16:27:24 +00:00
Shiqing Fan
1ed0f40d35 Fix a few type casts on Windows.
This commit was SVN r24857.
2011-07-06 08:08:53 +00:00
Brian Barrett
a4b2bd903b * Implement long-ago discussed RFC to add a callback data pointer in the
request completion callback
* Use the completion callback pointer to remove all need for opal_progress
  calls in the one-sided layer

This commit was SVN r24848.
2011-06-30 20:05:16 +00:00
Brian Barrett
8376e0e507 Use free list get instead of wait; this is a constrained resource that will never come back, as it scales with the number of windows and not some more dynamic resources...
This commit was SVN r24685.
2011-05-05 17:19:59 +00:00
Eugene Loh
2770a12beb Continue clean up of thread options started in r22841, 22842, and 22849.
No need for any CMRs to 1.5... that was already done in CMR 2728.

This commit was SVN r24545.

The following SVN revision numbers were found above:
  r22841 --> open-mpi/ompi@b400b84162
2011-03-18 21:36:35 +00:00
Shiqing Fan
f43862420c Convert the bad dos line endings to unix style for all windows related files.
This commit was SVN r24137.
2010-12-02 12:08:08 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Ralph Castain
40a2bfa238 WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.

Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.

This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Jeff Squyres
ee3b22e4b7 Oops -- use the right function name, otherwise you get compile/link
errors when you configure with --enable-heterogeneous.

This is why we have MTT.  :-)

This commit was SVN r23481.
2010-07-23 01:30:01 +00:00
George Bosilca
733d25a8a3 First step toward fixing the MPI_Get_count issues from the ticket #2241. Next
step is the configure and Fortran mojo that Jeff will put in. Until then I
guess the Fortran interface is broken (at least all functions using the hidden
count firld in the MPI_Status).

This commit was SVN r23467.
2010-07-21 20:07:00 +00:00
Jeff Squyres
35690ecad5 Fixes trac:2472. Use large integers to hold displacements for one-sided
operations, not ints. 

Sorry for the mid-day configure.ac change, folks...

This commit was SVN r23449.

The following Trac tickets were found above:
  Ticket 2472 --> https://svn.open-mpi.org/trac/ompi/ticket/2472
2010-07-20 18:45:48 +00:00
Abhishek Kulkarni
c63c4d6892 Fix bugs where (OMPI_ERROR == *) checks cannot be converted to (OMPI_SUCCESS != *) since the return codes are overloaded to return an "index" on success.
The fix is to just check if the return value is positive or not, since all the SOS encoded errors are *always* negative.

The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index as a pointer.

This commit was SVN r23173.
2010-05-18 20:54:11 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
Shiqing Fan
872a4047ba Fix the bug that caused by ADD_DEPENDENCIES() from different version of CMake.
In CMake 2.6 and earlier, this function add dependencies for targets and also link the target libraries automatically, but in CMake 2.8,this behavior has been changed, i.e. it will only add the dependencies but no link, which will cause linking errors at compilation time.

This commit was SVN r22405.
2010-01-14 18:10:20 +00:00
Shiqing Fan
7cf427c39b Include the missing thread header, which is needed when build with --enable-progress-thread.
This commit was SVN r22239.
2009-11-27 14:49:24 +00:00
Shiqing Fan
bce2f44154 Update related .windows files with proper compiling properties, in order to have a successful DSO build.
This commit was SVN r21805.
2009-08-12 08:55:58 +00:00
Rainer Keller
6c5532072a - Split the datatype engine into two parts: an MPI specific part in
OMPI
   and a language agnostic part in OPAL. The convertor is completely
   moved into OPAL.  This offers several benefits as described in RFC
   http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
   namely:
    - Fewer basic types (int* and float* types, boolean and wchar
    - Fixing naming scheme to ompi-nomenclature.
    - Usability outside of the ompi-layer.
 - Due to the fixed nature of simple opal types, their information is
   completely
   known at compile time and therefore constified
 - With fewer datatypes (22), the actual sizes of bit-field types may be
   reduced
   from 64 to 32 bits, allowing reorganizing the opal_datatype
   structure, eliminating holes and keeping data required in convertor
   (upon send/recv) in one cacheline...
   This has implications to the convertor-datastructure and other parts
   of the code.
 - Several performance tests have been run, the netpipe latency does not
   change with
   this patch on Linux/x86-64 on the smoky cluster.
 - Extensive tests have been done to verify correctness (no new
   regressions) using:
   1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
    ompi-ddt:
    a. running both trunk and ompi-ddt resulted in no differences
       (except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
       correctly).
    b. with --enable-memchecker and running under valgrind (one buglet
       when run with static found in test-suite, commited)
   2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      all passed (except for the dynamic/ tests failed!! as trunk/MTT)
   3. compilation and usage of HDF5 tests on Jaguar using PGI and
      PathScale compilers.
   4. compilation and usage on Scicortex.
 - Please note, that for the heterogeneous case, (-m32 compiled
   binaries/ompi), neither
   ompi-trunk, nor ompi-ddt branch would successfully launch.

This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Shiqing Fan
cd565923d3 Completely remove ltdl support for Windows build.
This commit was SVN r21170.
2009-05-05 18:59:13 +00:00
Rainer Keller
250c3d0ddd - Fix Coverity CID 527
malloc buffer for ompi_info_get one character larger for the NUL-termination
   See comment in ompi/mpi/c/info_get.c or MPI-2.1 p289

This commit was SVN r21154.
2009-05-05 13:05:20 +00:00
Brian Barrett
7f898d4e2b * Make rdma the default. Somehow, the code didn't match what was supposed
to happen
* Properly error out (rather than cause buffer overflow) in case where
  the datatype packed description is larger than our control fragments.
  This still isn't standards conforming, but at least we know what
  happened.
* Expose win_set_name to external libraries (like the osc modules)
* Set default window name to the CID of the communcator it's using
  for communication

Refs trac:1905

This commit was SVN r21134.

The following Trac tickets were found above:
  Ticket 1905 --> https://svn.open-mpi.org/trac/ompi/ticket/1905
2009-04-30 22:36:09 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Shiqing Fan
3d4e0472d6 Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Rainer Keller
6f808d9b05 Preparation work for another commit (after RFC):
- This patch solely _adds_ required headers and is rather localized
   The next patch (after RFC) heavily removes headers (based on script)
 - ompi/communicator/communicator.h: For sources that use
   ompi_mpi_comm_world, don't require them to include "mpi.h"
 - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
   #include "ompi/mca/topo/topo.h"
 - ompi/errhandler/errhandler_predefined.h:
   ompi/communicator/communicator.h depends on this header file!
   To prevent recursion just have fwd declarations.
   #include "ompi/types.h" for fwd declarations of the main structs.
 - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
   ompi_rb_tree_t, so have the proper classes
 - ompi/mca/op/op.h:
   Op is pretty self-contained: Nobody up to now has done
   #include "opal/class/opal_object.h"
 - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
   #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/pml/base/base.h:
   We use opal_lists  
 - ompi/mca/pml/dr/pml_dr_vfrag.h:
   #include "opal/types.h" for ompi_ptr_t
 - ompi/mca/pml/ob1/pml_ob1_hdr.h:
   #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
 - opal/dss/dss_unpack.c:
   #include "opal/types.h"
 - opal/mca/base/base.h:
   #include "opal/util/cmd_line.h" for opal_cmd_line_t
 - orte/mca/oob/tcp/oob_tcp.c:
   #include "opal/types.h" for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp.h:
   #include "opal/threads/threads.h" for opal_thread_t
 - orte/mca/oob/tcp/oob_tcp_msg.c:
   #include "opal/types.h" 
 - orte/mca/oob/tcp/oob_tcp_peer.c:
   #include "opal/types.h"  for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp_send.c:
   #include "opal/types.h" 
 - orte/mca/plm/base/plm_base_proxy.c:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT
 - orte/mca/rml/base/rml_base_receive.c:
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/mca/rml/oob/rml_oob_recv.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/mca/rml/oob/rml_oob_send.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/runtime/orte_data_server.c
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/runtime/orte_globals.h:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT

 Tested on Linux/x86-64

This commit was SVN r20817.
2009-03-17 21:34:30 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strup and friends include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Rainer Keller
9dea63d63a - Last of intrusive commits (promised)... err for now.
Anyway, this is blocking the move: do not include pml.h
   if not really needed, aka none of the following used:
     mca_pml
     MCA_PML_CALL
     OMPI_ANY_TAG
     OMPI_ANY_SOURCE
     OMPI_PROC_NULL

 - Notable exceptions (deleting in one header->adding):
   - ompi/mca/mtl/psm/
   - ompi/mca/osc/rdma/
   - ompi/mca/btl/openib/btl_openib_endpoint.c depended on
     pml_base_sendreq.h

 - Tested on Linux/x86-64, this time including make check
   (thanks Jeff and Ralph)

This commit was SVN r20725.
2009-03-04 17:06:51 +00:00
Rainer Keller
fd28b392bf - An intrusive commit yet again (sorry): with the separation we
get bitten by header depending on having already included
   the corresponding [opal|orte|ompi]_config.h header.
   When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
   are missed.

   Script to add the corresponding header in front of all following
   (taking care of possible #ifdef HAVE_...)

 - Including some minor cleanups to
   - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
   - ompi/mca/btl/btl.h -- nclude _after_ #ifndef MCA_BTL_H
   - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
     orte/util/output.h
   - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
   - ompi/mca/btl/btl.h -- reorder to fit
   - ompi/mca/bml/bml.h -- reorder to fit
   - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
   - ompi/request/request.h -- additionally need ompi/constants.h

 - Tested on linux/x86-64

This commit was SVN r20720.
2009-03-04 15:35:54 +00:00
Terry Dontje
0178b6c45f Added padding to predefined handle structures to maintain library version to
version compatibility.

This commit was SVN r20627.
2009-02-24 17:17:33 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or               
   orte/mca/rml/rml_types.h                                           
   Please see orte_show_help_replacement.sh commited next.            

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"     
   in orte/mca/odls/base/odls_base_default_fns.c and                  
   in orte/tools/orte-top/orte-top.c                                  
   Manually added these.                                              

   Let's have MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Shiqing Fan
a5281f0434 - 1/4 commit for Windows Visual Studio and CCP support:
CMakeLists and .windows files.
  In contribs preconfigured and precompiled parts.

This commit was SVN r20108.
2008-12-10 20:59:20 +00:00
George Bosilca
82d1d5d785 The patch for "Unexpected message queue for unknown CID's required" ticket #1460.
I'm unable to split it in two parts, my patch and Edgar's one. So I just update
copyright information for both of us.
What this patch do:
- it use the unexpected queue create by commit r19562 to dispatch the
  unexpected message to the right communicator (once this communicator
  is created and initialized).
- delay the PML comm_add until we have the context_id for the new communicator.
- only do the PML comm_add on processes that really belong to the new
  communicator. Please read the lengthy comment in the source code for the
  reason behind this.

This commit was SVN r19929.

The following SVN revision numbers were found above:
  r19562 --> open-mpi/ompi@acd3406aa7
2008-11-04 21:58:06 +00:00
Jeff Squyres
0af7ac53f2 Fixes trac:1392, #1400
* add "register" function to mca_base_component_t
   * converted coll:basic and paffinity:linux and paffinity:solaris to
     use this function
   * we'll convert the rest over time (I'll file a ticket once all
     this is committed)
 * add 32 bytes of "reserved" space to the end of mca_base_component_t
   and mca_base_component_data_2_0_0_t to make future upgrades
   [slightly] easier
   * new mca_base_component_t size: 196 bytes
   * new mca_base_component_data_2_0_0_t size: 36 bytes
 * MCA base version bumped to v2.0
   * '''We now refuse to load components that are not MCA v2.0.x'''
 * all MCA frameworks versions bumped to v2.0
 * be a little more explicit about version numbers in the MCA base
   * add big comment in mca.h about versioning philosophy

This commit was SVN r19073.

The following Trac tickets were found above:
  Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
2008-07-28 22:40:57 +00:00
Rolf vandeVaart
9c080b27d6 Fix for bug when running 64-bit heterogeneous.
This commit fixes trac:1341.

This commit was SVN r18940.

The following Trac tickets were found above:
  Ticket 1341 --> https://svn.open-mpi.org/trac/ompi/ticket/1341
2008-07-17 19:04:40 +00:00
Terry Dontje
12baa72580 This commit fixes trac:1306
This commit was SVN r18718.

The following Trac tickets were found above:
  Ticket 1306 --> https://svn.open-mpi.org/trac/ompi/ticket/1306
2008-06-24 14:38:11 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so you if explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
Shiqing Fan
f35a06119c Use memchecker_convertor_call function instead the old one. Move the function to the place that we can use convertor.
This commit was SVN r18370.
2008-05-05 13:57:27 +00:00
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
Shiqing Fan
79da2fdd2c Use the new memchecker convertor function.
Remove some unnecessary memchecker calls.

This commit was SVN r18172.
2008-04-16 13:24:35 +00:00
Shiqing Fan
54c7b71cfd Use the correct way of including memchecker.h, which will work with '--with-devel-headers'.
This commit was SVN r17435.
2008-02-12 18:01:17 +00:00
Shiqing Fan
f5792bbda5 merging the memchecker into trunk.
This commit was SVN r17424.
2008-02-12 08:46:27 +00:00
Tim Prins
b88a3f7a94 Update onesided components to fix the case (on 64 bit machines) where the total offset is greater than 2^31-1 bytes.
See: http://www.open-mpi.org/community/lists/users/2008/01/4880.php

This commit was SVN r17400.
2008-02-07 18:45:35 +00:00
Jeff Squyres
213b5d5c6e Per long threads on the mailing list and much confusion discussion
about linkers, have all OPAL, ORTE, and OMPI components '''not'' link
against the OPAL, ORTE, or OMPI libraries.

See ttp://www.open-mpi.org/community/lists/users/2007/10/4220.php for
details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a
better-formatted version of the same info).

This commit was SVN r16968.
2007-12-15 13:32:02 +00:00
Shiqing Fan
efdcfa3807 - "extern 'C'" has been set twice. Remove one.
This commit was SVN r16022.
2007-08-30 15:03:59 +00:00
Jeff Squyres
466394a878 We only care about the value of ret in the
!OMPI_ENABLE_PROGRESS_THREADS case.  Reviewed by Brian.

This commit was SVN r16000.
2007-08-29 01:36:17 +00:00
Brian Barrett
af4e86c25f Update collectives selection logic to allow for multiple components to be
used at nce (up to one unique collective module per collective function).
Matches r15795:15921 of the tmp/bwb-coll-select branch

This commit was SVN r15924.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15795
  r15921
2007-08-19 03:37:49 +00:00
Brian Barrett
7a9a8c7e17 Support reduction operations other than MPI_REPLACE for user-defined
datatypes with MPI_ACCUMULATE

This commit was SVN r15418.
2007-07-13 20:46:12 +00:00
Brian Barrett
739fed9dc9 Don't poke at internal structure fiealds of communicators or groups, but
instead use accessor functions

This commit was SVN r15366.
2007-07-11 17:16:06 +00:00
Brian Barrett
2ed0548da8 * No need for waiting until exposure epochs are over in order to complete
a WIN_FREE
  * Fix race condition in threaded builds with pending unlocks and
    finishing an epoch
  * Fix memory leak due to use of OBJ_DESTRUCT instead of OBJ_RELEASE
  * Fix race condition between releasing multiple shared locks and
    starting a new lock
  * Need to incremement the shared count if starting a new shared
    lock once an exclusive lock finishes

This commit was SVN r15185.
2007-06-24 22:36:00 +00:00
Brian Barrett
5528e0ca60 Properly initialize variable for threaded case
This commit was SVN r15174.
2007-06-22 15:29:06 +00:00
Brian Barrett
5f16251808 revert r15167. I don't know what I was thinking, but it was most definitely
"not right".

This commit was SVN r15172.

The following SVN revision numbers were found above:
  r15167 --> open-mpi/ompi@faa401dc47
2007-06-22 15:25:39 +00:00
Brian Barrett
80c50120ad debugging output should be macro version
This commit was SVN r15168.
2007-06-21 22:09:37 +00:00
Brian Barrett
faa401dc47 * Need to OBJ_RELEASE, not OBJ_DESTRUCT things that were created with
OBJ_NEW
  * Need to single when the passive unlock has left an expose epoch for
    the win_free case
  * Clean up some debugging output
  * fix missing variable initialization

This commit was SVN r15167.
2007-06-21 22:08:30 +00:00
Brian Barrett
0798c0784d properly set fields so that most difficult alignment rules are always met.
This commit was SVN r14854.
2007-06-05 01:46:04 +00:00
Brian Barrett
a2713dcac8 eeks! Bad to notice after committing the pt2pt part of r14806 that the
compile failed because of the wrong variable name.

This commit was SVN r14807.

The following SVN revision numbers were found above:
  r14806 --> open-mpi/ompi@7e57bbb0ef
2007-05-30 20:33:08 +00:00
Brian Barrett
7e57bbb0ef React slightly better when datatype creation from a buffer fails
This commit was SVN r14806.
2007-05-30 20:32:02 +00:00
George Bosilca
8b817e96fd Allow threaded compilation.
This commit was SVN r14775.
2007-05-25 01:53:29 +00:00
Brian Barrett
38b0d22243 Some cleanups to the pt2pt component
* Remove unused declaration
  * remove unused variable warning when not using progress threads
  * If we're using progress threads, we want to lock, not trylock
    when in progress, since it was called from the wakeup thread
    and not the progress function

This commit was SVN r14739.
2007-05-23 20:31:25 +00:00
Sven Stork
88f0845c44 - let the pt2pt component compile with threads enabled
This commit was SVN r14725.
2007-05-23 12:56:34 +00:00
Brian Barrett
38eab3613b * Fix race condition with the pending_{in,out} variables -- if we're going
to do while(...) { } then we can't change the variables in the ... 
    atomically, but should do it while holding the module lock.
  * Fix dumb communicator creation error when we don't create the progress
    stuff (because a window already exists), where we would accidently
    jump to the error case.

This commit was SVN r14715.
2007-05-21 20:53:02 +00:00
Brian Barrett
0e9e0c518a Fix a couple more progress thread related issues...
This commit was SVN r14708.
2007-05-21 16:06:14 +00:00
Brian Barrett
1191677b76 Fix dumb threads-related compile issues
This commit was SVN r14704.
2007-05-21 03:23:58 +00:00
Brian Barrett
2b4b754925 Some much needed cleanup of the point-to-point one-sided component...
* Combine polling of the long requests and buffer requests into
    one type, and in one place
  * Associate the list of requests to poll with the component, not
    the individual modules
  * add progress thread that sits on the OMPI request structure
    and wakes up at the appropriate time to poll the message
    list.  Not the best, but without some asynch notification
    from the PML that a given set of requests has completed, there
    isn't much better
  * Instead of calling opal_progress() all over the place, move
    to using the condition variables like the rest of the project.
    Has the advantage of moving it slightly futher along in the
    becoming thread safe thing
  * Fix a problem with the passive side of unlock where it could
    go recursive and cause all kinds of problems, especially
    when progress threads are used.  Instead, have two parts of
    passive unlock -- one to start the unlock, and another to
    complete the lock and send the ack back.  The data moving
    code trips the second at the right time.

This commit was SVN r14703.
2007-05-21 02:21:25 +00:00
Jeff Squyres
51f286d737 Just like r14289 on the ORTE trunk:
Per discussions with Brian and Ralph, make a slight correction in
where components are installed. Use $pkglibdir, not $libdir/openmpi,
so that when compiled in the orte trunk, components are installed to
the right directory (because the component search patch is checking
$pkglibdir).

This commit was SVN r14345.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r14289
2007-04-12 11:19:42 +00:00
Mohamad Chaarawi
bfaf9d4a12 Added new module for intercomm collectives. This will require an
autogen.

This commit was SVN r14149.
2007-03-27 02:06:42 +00:00
Brian Barrett
62e5e81e99 revert r14142, as the onesided change should *not* have come over
This commit was SVN r14143.

The following SVN revision numbers were found above:
  r14142 --> open-mpi/ompi@241545a098
2007-03-26 15:58:41 +00:00
Brian Barrett
241545a098 Back out r14073 - it speeds up TCP latency / bandwidth but at the same time
it kills ROMIO and one-sided performance when using only TCP.  The problem
is that it only allows those two to be progressed every couple of seconds,
leading to what looks like hangs in the one-sided tests (and the ROMIO stuff,
although people seem to not notice that at this point).

This commit was SVN r14142.

The following SVN revision numbers were found above:
  r14073 --> open-mpi/ompi@64fbbc20b8
2007-03-26 15:56:23 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Brian Barrett
d9e0e80190 Make some debugging output only looked at when debugging is enabled
This commit was SVN r13777.
2007-02-25 01:03:19 +00:00
Brian Barrett
65b07140c0 clean up some of the printf warnings caused by the attribute code
This commit was SVN r13395.
2007-01-31 17:11:06 +00:00
Rainer Keller
061ba05439 - Fixes uncovered with the format attribute to
opal_output and opal_output_verbose

This commit was SVN r13371.
2007-01-30 20:56:31 +00:00
Brian Barrett
385a435813 Start long message send as soon as possible, to minimze ack time for the receive,
greatly increasing mid-range bandwidth

This commit was SVN r13317.
2007-01-25 23:07:03 +00:00
Brian Barrett
95c0a17b9a Send the unlock request before starting the requests. We won't unlock until we get an ack from the remote side,
so there's no longer a race there (I used to do the unlock request last, after local completion of all the
requests completed, to try to avoid having the passive side reply to the active side, but I don't do that
anymore).  The unlock side will not "unlock" the window until it actually receives the correct number of results,
so we're good there.

This fixes an issue where we would receive data on the remote side we weren't expecting that could cause
us to release a lock before it really should have been released to the requesting peer.  It could also
cause a deadlock if one of the processes trying to unlock was "self", as that would result in the active
unlock never sending the unlock request, even though it sent the payload, which could cause a counter
that should always be positive to hit -1, causing an infinite loop that could only be solved by
popping up the stack, which was an impossibility.

Refs trac:785

This commit was SVN r13160.

The following Trac tickets were found above:
  Ticket 785 --> https://svn.open-mpi.org/trac/ompi/ticket/785
2007-01-17 21:13:12 +00:00
Brian Barrett
c1be97199b Fix an issue with recursive calls into the component progress caused by btls sometimes calling opal_progress()
during their send calls by dropping the loop through the list of pending control messages if any are marked
as completed.

Refs trac:784

This commit was SVN r13159.

The following Trac tickets were found above:
  Ticket 784 --> https://svn.open-mpi.org/trac/ompi/ticket/784
2007-01-17 20:48:35 +00:00
Brian Barrett
35c57457c6 Don't call ompi_request_test() if the request isn't likely to finish.
Otherwise, we end up recursively calling into the progress functions
and corrupting a list that doesn't like to be corrupted.

Refs trac:561

This commit was SVN r13138.

The following Trac tickets were found above:
  Ticket 561 --> https://svn.open-mpi.org/trac/ompi/ticket/561
2007-01-17 02:30:11 +00:00
Brian Barrett
f03ffb3a62 Send reply from the passive side of an unlock request back to the active
side and only let MPI_WIN_UNLOCK return when the passive side has actively
replied that the window is unlocked.

Refs trac:761

This commit was SVN r13118.

The following Trac tickets were found above:
  Ticket 761 --> https://svn.open-mpi.org/trac/ompi/ticket/761
2007-01-14 22:08:38 +00:00
George Bosilca
87ff2b5ce8 Cast to the correct type.
This commit was SVN r13046.
2007-01-08 22:04:01 +00:00
Brian Barrett
48ec0b2071 Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix
for now...

This commit was SVN r12997.

The following SVN revision numbers were found above:
  r12974 --> open-mpi/ompi@27cea44a9c
2007-01-04 22:07:37 +00:00
Brian Barrett
27cea44a9c Fix a number of issues with the ompi_ptr_t:
* Make sure that the pval always writes to the correct portion of the
    lval.  This only matters on 32 bit big endian machines.
  * On 32 bit machines when assigning to pval, the other 4 bytes of lval
    weren't being written, which could lead to bogus data

We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes.  Refs trac:587

This commit was SVN r12974.

The following Trac tickets were found above:
  Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
2007-01-03 19:47:48 +00:00