1
1
Граф коммитов

1185 Коммитов

Автор SHA1 Сообщение Дата
George Bosilca
5fae72b9aa Add the MPI 2.2 MPI_Dist_graph functionality.
This patch reshape the way we deal with topologies completely. Where
our topologies were mainly storage components (they were not capable
of creating the new communicator), the new version is built around a
[possibly] common representation (in mca/topo/topo.h), but the functions
to attach and retrieve the topological information are specific to each
component. As a result the ompi_create_cart and ompi_create_graph functions
become useless and have been removed.

In addition to adding the internal infrastructure to manage the topology
information, it updates the MPI interface, and the debuggers support and
provides all Fortran interfaces.

This commit was SVN r28687.
2013-07-01 12:40:08 +00:00
Rolf vandeVaart
5ebb74bee3 Fix case where amount of data sent is less than expected. Otherwise, we will get hang when running the RGET protocol.
Reviewed by hjelm,bosilca.

This commit was SVN r28667.
2013-06-21 18:35:16 +00:00
Jeff Squyres
d692aba672 Remove the DR PML. It was abondoned long ago. It had a nice life,
a few papers, and now a decent demise with respect.  

This commit was SVN r28582.
2013-06-04 19:36:16 +00:00
Rolf vandeVaart
52ebb0b17f Change some opal_output to OPAL_OUTPUT per CMR review.
This commit was SVN r28510.
2013-05-14 20:49:42 +00:00
Nathan Hjelm
4e95d691a7 pml/ob1: do not reset the convertor if one was not created (size = 0).
This macro is only used on the failure path so the additional if statement
should not have any affect on performance.

cmr:v1.7

This commit was SVN r28292.
2013-04-05 01:40:11 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Brian Barrett
65109de931 Fix leak of comm and datatype references for mprobe/improbe and fix a request leak in improbe
This commit was SVN r28157.
2013-03-07 21:55:22 +00:00
Rolf vandeVaart
ebe63118ac Remove dependency on libcuda.so when building in CUDA-aware support. Dynamically load it if needed.
This commit was SVN r28140.
2013-03-01 13:21:52 +00:00
Nathan Hjelm
b5a2cd1cce remove csum pml
This commit was SVN r28133.
2013-02-28 00:17:56 +00:00
Ralph Castain
8d2fa3693b First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package.
This commit was SVN r28116.
2013-02-26 20:44:56 +00:00
Brian Barrett
312f37706e In talking about this with Jeff and Ralph, we don't actually need
ompi_show_help, because opal_show_help is replaced with an 
aggregating version when using ORTE, so there's no reason to
directly call orte_show_help.

This commit was SVN r28051.
2013-02-12 21:10:11 +00:00
Brian Barrett
d80218996f Rather than setting up the direct call stuff in ompi_mca (which requires
modifying ompi_mca for every interface that is direct called), do it in
the framework's .m4 file.

This commit was SVN r28031.
2013-02-04 23:26:42 +00:00
Rolf vandeVaart
729caaf0cd Remove any dependency on libcuda.so in opal layer. All changes are within OMPI_CUDA_SUPPORT code.
This commit was SVN r27986.
2013-01-30 23:07:32 +00:00
Rolf vandeVaart
b5672927f2 Fix build issue when building with --disable-dlopen.
This commit was SVN r27945.
2013-01-28 20:14:59 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Rolf vandeVaart
f63c88701f Improve CUDA GPU transfers over openib BTL. Use aynchronous copies.
This is RFC that was submitted in July and December of 2012.

This commit was SVN r27862.
2013-01-17 22:34:43 +00:00
Nathan Hjelm
51bf75f8e9 fix typo
This commit was SVN r27573.
2012-11-06 21:25:19 +00:00
Nathan Hjelm
1400bab466 fix a couple of errors in r27569
This commit was SVN r27572.

The following SVN revision numbers were found above:
  r27569 --> open-mpi/ompi@f3ce12e71a
2012-11-06 20:06:54 +00:00
Nathan Hjelm
bdedd8b0d3 Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality.

This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Nathan Hjelm
f3ce12e71a Per RFC fix several leaks in opal and ompi. Details below.
pml/v:
  - If vprotocol is not being used vprotocol_include_list is leaked. Assume vprotocol never takes ownership (see below) and always free the string.

coll/ml:
  - (patch verified) calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
  - Need to free mca_coll_ml_component.config_file_name on component close.

btl/openib:
  - calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.

vprotocol/base:
  - There was no way for pml/v to determine if vprotocol took ownership of vprotocol_include_list. Fix by always never ownership (use strdup).

mca/base:
  - param_lookup will result in storage->stringval to be a newly allocated string if the mca parameter has a string value. ensure this string is always freed.

cmr:v1.7

This commit was SVN r27569.
2012-11-06 18:57:46 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occuring before the command line parser changes but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Ralph Castain
6aac54b02e Revert r27510, r27509, and r27508.
Not sure what happened here, but the resulting trunk wouldn't even configure. After spending time fixing that problem, I found it wouldn't compile due to multiple syntax errors that had been introduced in both the OPAL and OMPI layer. This raised questions as to the completeness of the work.

Given that the author is departing, I pinged Jeff about it and we agreed to revert this for now. Hopefully, it can either be fixed by the author prior to actual departure, or someone else can pick it up (now that it is in the history) and fix it.

This commit was SVN r27511.

The following SVN revision numbers were found above:
  r27508 --> open-mpi/ompi@12c3c743de
  r27509 --> open-mpi/ompi@79e4a8ca38
  r27510 --> open-mpi/ompi@1ad5ff625a
2012-10-27 16:43:45 +00:00
Shiqing Fan
1ad5ff625a Don't know why these lines are modified, probably by subversion. Although it is harmless, change it back as what it was.
This commit was SVN r27510.
2012-10-27 03:04:45 +00:00
Shiqing Fan
12c3c743de Per the MemPin RFC, submit the component source files, and update the memchecker macros.
This commit was SVN r27508.
2012-10-27 02:48:20 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Eugene Loh
25ad84b925 Ensure that MPI_Status objects have proper alignment:
- fix the Fortran layer to use new macros to convert Fortran-to-C status
- change the C internals to pull out old OMPI_SET_STATUS* macros

Also, change name of "status" argument in topo_test_f.c to "topo_type".

This commit was SVN r27403.
2012-10-04 14:39:51 +00:00
George Bosilca
890dedf13f Cleanup.
This commit was SVN r27372.
2012-09-26 09:44:46 +00:00
Nathan Hjelm
4bde7f3efe silence warning
This commit was SVN r27019.
2012-08-13 17:43:36 +00:00
Nathan Hjelm
702e6d5a68 pml/ob1: fix bugs in mca_pml_ob1_recv_request_progress_rget
This commit was SVN r27018.
2012-08-13 16:26:06 +00:00
Yael Dayan
7895cd1114 adding a fragmentation mechanism to the Get flow in function mca_pml_ob1_recv_request_progress_rget
This commit was SVN r26956.
2012-08-07 07:15:21 +00:00
George Bosilca
3a8478827b Fix the MPI_Cancel issue identified by Fujitsu. And a typo.
This commit was SVN r26871.
2012-07-26 14:06:24 +00:00
George Bosilca
118e30a2ac Typo.
This commit was SVN r26865.
2012-07-25 12:42:31 +00:00
George Bosilca
b6f4bc9656 size_t not int everywhere. Correctly compute with size_t (don't initialize
it to a negative number). Get rid of the multiplication in the critical
path, and keep the functions as simple as possible.

This commit was SVN r26864.
2012-07-25 12:41:53 +00:00
Yael Dayan
8cad1d6481 while working on a fix for the Get flow in pml, I've encountered a problem in "mca_pml_ob1_compute_segment_length" function, at pml_ob1.h file.
The return value from this function was truncated from size_t to int. This fix changes the return value to type size_t.

This commit was SVN r26863.
2012-07-25 12:08:41 +00:00
Jeff Squyres
9cb3f31b50 Odd; this compiled on OS X without needing #include "opal_stdint.h".
Linux appears to need it.  Shrug.

This commit was SVN r26858.
2012-07-24 13:47:24 +00:00
Jeff Squyres
0b4a659683 Stomp some compiler warnings; use proper printf sequences for uint64_t.
This commit was SVN r26856.
2012-07-24 13:03:55 +00:00
George Bosilca
55bc3c4763 Fix the copyright.
This commit was SVN r26848.
2012-07-24 00:20:24 +00:00
George Bosilca
1ad6c82015 Implement the dump function for the PML OB1.
This commit was SVN r26847.
2012-07-24 00:19:18 +00:00
Abhishek Kulkarni
1878f276cd Replace the pattern while(flag) { opal_progress() }; in the C/R code
with the ORTE_WAIT_FOR_COMPLETION macro.

This commit was SVN r26794.
2012-07-13 23:31:56 +00:00
Abhishek Kulkarni
5c58a1c9c1 Fix C/R support in the trunk.
Among other things, this patch deals with the following issues:
* fix ompi-checkpoint argument parsing
* ompi-restart -showme prints an extraneous "Restarted child with PID" 
  message. Move around the debug statement to avoid this.
* fixes for the state machine changes

This commit was SVN r26770.
2012-07-09 23:34:13 +00:00
George Bosilca
46e808c940 Release the lock a little later.
This commit was SVN r26762.
2012-07-08 12:57:00 +00:00
George Bosilca
ee1410a8d2 A size_t variable should not start at a negative value.
This commit was SVN r26761.
2012-07-08 12:56:33 +00:00
George Bosilca
ec760454a6 Cleaning ...
This commit was SVN r26747.
2012-07-04 21:22:13 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Ralph Castain
e6f3586415 Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Nathan Hjelm
249066e06d Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Nathan Hjelm
fbd1636ea4 fix seg fault when size == 0
This commit was SVN r26612.
2012-06-15 16:58:23 +00:00
Nathan Hjelm
0d13cbf11c ob1: bug fix. put fallback on send never actually worked. fixed.
This commit was SVN r26602.
2012-06-14 17:29:58 +00:00
Nathan Hjelm
a809881f78 ob1: reset the converter after a failed sendi before trying send
This commit was SVN r26597.
2012-06-12 15:44:47 +00:00
Nathan Hjelm
59e529cf1d ob1: as per developer discussion disable rdma retries. the failure path currently suffers from live-lock
This commit was SVN r26575.
2012-06-07 23:31:20 +00:00
Mike Dubman
e9c274f3b9 raise cm prio for mxm as well, somehow was removed from 1.7, exists in 1.6
This commit was SVN r26554.
2012-06-05 09:02:03 +00:00
Jeff Squyres
99c5afb397 Remove clang compiler warnings.
This commit was SVN r26523.
2012-05-29 23:36:06 +00:00
Nathan Hjelm
a32d4c648d ob1: rewind convertor after failed send
This commit was SVN r26395.
2012-05-07 17:22:22 +00:00
Nathan Hjelm
b6ae288a59 fix segfault when pml direct enabled
This commit was SVN r26371.
2012-05-01 23:12:41 +00:00
Nathan Hjelm
0eb18b9699 ob1: update copyrights
This commit was SVN r26331.
2012-04-24 20:19:15 +00:00
Nathan Hjelm
0a0e487d9c ob1: add emacs mode/indentation defaults
This commit was SVN r26330.
2012-04-24 20:19:06 +00:00
Nathan Hjelm
9a35f96bda ob1: add support for get fallback on put/send
This commit was SVN r26329.
2012-04-24 20:18:56 +00:00
Nathan Hjelm
93780c63be replace tabs w/ spaces
This commit was SVN r26328.
2012-04-24 20:18:45 +00:00
Jeff Squyres
253444c6d0 == Highlights ==
1. New mpifort wrapper compiler: you can utilize mpif.h, use mpi, and use mpi_f08 through this one wrapper compiler
 1. mpif77 and mpif90 still exist, but are sym links to mpifort and may be removed in a future release
 1. The mpi module has been re-implemented and is significantly "mo' bettah"
 1. The mpi_f08 module offers many, many improvements over mpif.h and the mpi module

This stuff is coming from a VERY long-lived mercurial branch (3 years!); it'll almost certainly take a few SVN commits and a bunch of testing before I get it correctly committed to the SVN trunk.

== More details ==

Craig Rasmussen and I have been working with the MPI-3 Fortran WG and Fortran J3 committees for a long, long time to make a prototype MPI-3 Fortran bindings implementation.  We think we're at a stable enough state to bring this stuff back to the trunk, with the goal of including it in OMPI v1.7.  

Special thanks go out to everyone who has been incredibly patient and helpful to us in this journey:

 * Rolf Rabenseifner/HLRS (mastermind/genius behind the entire MPI-3 Fortran effort)
 * The Fortran J3 committee
 * Tobias Burnus/gfortran
 * Tony !Goetz/Absoft
 * Terry !Donte/Oracle
 * ...and probably others whom I'm forgetting :-(

There's still opportunities for optimization in the mpi_f08 implementation, but by and large, it is as far along as it can be until Fortran compilers start implementing the new F08 dimension(..) syntax.

Note that gfortran is currently unsupported for the mpi_f08 module and the new mpi module.  gfortran users will a) fall back to the same mpi module implementation that is in OMPI v1.5.x, and b) not get the new mpi_f08 module.  The gfortran maintainers are actively working hard to add the necessary features to support both the new mpi_f08 module and the new mpi module implementations.  This will take some time.

As mentioned above, ompi/mpi/f77 and ompi/mpi/f90 no longer exist.  All the fortran bindings implementations have been collated under ompi/mpi/fortran; each implementation has its own subdirectory:

{{{
ompi/mpi/fortran/
  base/               - glue code
  mpif-h/             - what used to be ompi/mpi/f77
  use-mpi-tkr/        - what used to be ompi/mpi/f90
  use-mpi-ignore-tkr/ - new mpi module implementation
  use-mpi-f08/        - new mpi_f08 module implementation
}}}

There's also a prototype 6-function-MPI implementation under use-mpi-f08-desc that emulates the new F08 dimension(..) syntax that isn't fully available in Fortran compilers yet.  We did that to prove it to ourselves that it could be done once the compilers fully support it.  This directory/implementation will likely eventually replace the use-mpi-f08 version.

Other things that were done:

 * ompi_info grew a few new output fields to describe what level of Fortran support is included
 * Existing Fortran examples in examples/ were renamed; new mpi_f08 examples were added
 * The old Fortran MPI libraries were renamed:
   * libmpi_f77 -> libmpi_mpifh
   * libmpi_f90 -> libmpi_usempi
 * The configury for Fortran was consolidated and significantly slimmed down.  Note that the F77 env variable is now IGNORED for configure; you should only use FC. Example:
{{{
shell$ ./configure CC=icc CXX=icpc FC=ifort ...
}}}

All of this work was done in a Mercurial branch off the SVN trunk, and hosted at Bitbucket.  This branch has got to be one of OMPI's longest-running branches.  Its first commit was Tue Apr 07 23:01:46 2009 -0400 -- it's over 3 years old!  :-)  We think we've pulled in all relevant changes from the OMPI trunk (e.g., Fortran implementations of the new MPI-3 MPROBE stuff for mpif.h, use mpi, and use mpi_f08, and the recent Fujitsu Fortran patches).

I anticipate some instability when we bring this stuff into the trunk, simply because it touches a LOT of code in the MPI layer in the OMPI code base.  We'll try our best to make it as pain-free as possible, but please bear with us when it is committed.

This commit was SVN r26283.
2012-04-18 15:57:29 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Brian Barrett
61a090e0d1 Checking for NULL function pointers and direct-call semantics can't work
together, so implement all functions in the MTL interface for all
MTLs.  The only places NULL was still being set was for add_comm/del_comm,
and matched probe, both of which are straight forward to implement (or
return ERROR_NOT_IMPLEMENTED, since the PML can't emulate matched probe).

This commit was SVN r26194.
2012-03-26 19:27:03 +00:00
Jeff Squyres
fa8980157a Fix typo.
This commit was SVN r26183.
2012-03-23 00:12:32 +00:00
Brian Barrett
cce936b94c * Implement matched probe for the CM PML. Required adding a peer field to
the ompi_message_t structure to properly initialize convertor (the peer
  is available in the request in OB1, and wasn't needed when I did the
  original implementation).
* Implement matched probe for the Portals4 MTL and add NULL function pointers
  for the other MTLs.
* Add add_comm and del_comm functions to portals4 MTL so that direct call
  almost works again.
* Add NEWS item that we've implemented matched probe

This commit was SVN r26180.
2012-03-22 22:55:59 +00:00
Terry Dontje
e73df369e4 Update bfo pml with code from ob1 to support mprobe, improbe, mrecv, imrecv and cuda.
This commit was SVN r26145.
2012-03-15 10:20:46 +00:00
Nathan Hjelm
f1525bdbff ob1: fix two fragment leaks
- MAJOR! get src descriptor leaks if mca_bml_base_send fails
 - minor. descriptor leaked in mca_pml_send_request_start_copy if the btl returns OMPI_ERR_RESOURCE_BUSY.

This commit was SVN r26077.
2012-03-01 15:53:39 +00:00
Rolf vandeVaart
b0a84b0a7d New btl that extends sm btl to support GPU transfers within a node.
Uses new CUDA IPC support.  Also, a few minor changes in PML to take
advantage of it.

This code has no effect unless user asks for it explicitly via 
configure arguments.  Otherwise, it is either #ifdef'ed out or
not compiled.

This commit was SVN r26039.
2012-02-24 02:13:33 +00:00
Brian Barrett
25d48e22fa Implementation of the MPI-3 Matched Probe functionality. Currently only
implemented in the OB1 PML, will return NOT_SUPPORTED in other PMLs.

This commit was SVN r25865.
2012-02-06 17:35:21 +00:00
Nathan Hjelm
bb1fec0407 added put/get btl descriptor flags
This commit was SVN r25553.
2011-11-30 21:37:23 +00:00
Jeff Squyres
6fbbfd0f7a Gah! r25545 acidentally included ''waaaay'' more stuff than it was
supposed to.  I.e., half-baked/not complete stuff.

This commit backs out all of r25545.  Sorry folks!

This commit was SVN r25546.

The following SVN revision numbers were found above:
  r25545 --> open-mpi/ompi@7f9ae11faf
2011-11-29 23:24:52 +00:00
Jeff Squyres
7f9ae11faf Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php,
to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS
X, we need to use the following compiler (linker) flag:

    -Wl,-commons,use_dylibs 

So if we're compiling on OS X, test to see if that flag works with the
compiler.  If so, add it to the wrapper FFLAGS and FCFLAGS (note that
per a future update, we'll only have one Fortran compiler anyway).

Fixes trac:1982.  

This commit was SVN r25545.

The following Trac tickets were found above:
  Ticket 1982 --> https://svn.open-mpi.org/trac/ompi/ticket/1982
2011-11-29 23:05:54 +00:00
George Bosilca
0bd2bf9aae The number of segments accepted should be bounded by MCA_BTL_DES_MAX_SEGMENTS
and not by 2.

This commit was SVN r25515.
2011-11-28 17:19:12 +00:00
Nathan Hjelm
f8c8c641f1 added asserts to warn developers that ob1/csum match fragments do not support more than 2 segments
This commit was SVN r25514.
2011-11-28 16:12:25 +00:00
Nathan Hjelm
8962ce25b0 fixed some compiler errors caused by seg_key changes. osc/rdma may need to be updated to use btls that use 128 bit segment keys
This commit was SVN r25448.
2011-11-06 20:19:14 +00:00
George Bosilca
efd88e10d7 Cleanup the error codes. Get rid of all the useless ones, and
mark the distinction between ORTE and OMPI errors.

This commit was SVN r25323.
2011-10-19 03:51:53 +00:00
Ralph Castain
2eaadcfab9 Remove unused variable
This commit was SVN r25284.
2011-10-14 15:32:18 +00:00
George Bosilca
3241bea696 Apply a patch provided by Sébastien Boisvert fixing an issue
with the probe fairness.

This commit was SVN r25265.
2011-10-11 20:28:33 +00:00
George Bosilca
4fd78c4683 Keep track of the last probe on each communicator, so we can probe all
peers in a round-robin fashion. A little bit more fair ...

This commit was SVN r25264.
2011-10-11 20:24:54 +00:00
George Bosilca
80c02647c8 Each level (OPAL/ORTE/OMPI) should only return it's own constants,
instead of the current mismatch.

This commit was SVN r25230.
2011-10-04 14:50:31 +00:00
Mike Dubman
29b63fee81 better support of pml/cm for mxm
This commit was SVN r25113.
2011-09-06 06:38:57 +00:00
Wesley Bland
4e7ff0bd5e By popular demand the epoch code is now disabled by default.
To enable the epochs and the resilient orte code, use the configure flag:

--enable-resilient-orte

This will define both:

ORTE_ENABLE_EPOCH
ORTE_RESIL_ORTE

This commit was SVN r25093.
2011-08-26 22:16:14 +00:00
Shiqing Fan
627f1dd351 Correct several export declarations.
This commit was SVN r25047.
2011-08-15 09:45:51 +00:00
Rolf vandeVaart
3d3b3d4dad Add support for CUDA registering sm and openib buffers. Feature is disabled by default.
This commit was SVN r24987.
2011-08-04 10:15:45 +00:00
Jeff Squyres
b2b781e537 Fix a few miscelaneous memory leaks.
This commit was SVN r24865.
2011-07-08 16:39:58 +00:00
Mike Dubman
fd17f20ed5 Currently MTLs do no handle communicator contexts in any special way,
they only add the context id to the tag selection of the underlying 
messaging meachinsm. 
 
 We would like to enable an MTL to maintain its own context data
per-communicator. This way an MTL will be able to queue incoming eager 
messages and rendezvous requests per-communicator basis.

 The MTL will be allowed to override comm->c_pml_comm member, 
since it's unused in pml_cm anyway. 

This commit was SVN r24858.
2011-07-06 18:25:49 +00:00
Wesley Bland
e1ba09ad51 Add a resilience to ORTE. Allows the runtime to continue after a process (or
ORTED) failure. Note that more work will be necessary to allow the MPI layer to
take advantage of this.

Per RFC:
http://www.open-mpi.org/community/lists/devel/2011/06/9299.php

This commit was SVN r24815.
2011-06-23 20:38:02 +00:00
Ralph Castain
b47ec2ee87 Remove lingering references to opal_profile option
This commit was SVN r24709.
2011-05-18 18:27:29 +00:00
Josh Hursey
045035963a Fix return code from MPI_Probe and MPI_Iprobe.
Instead of returning MPI_SUCCESS every time they are called regardless of the status of the call, they should return a value representative of the action. So similar to MPI_Wait/MPI_Test they will return MPI_SUCCESS if the action was successfull, or the value that matches status.MPI_ERROR for the operation if it is unsuccessful.

This was discussed on the [http://www.open-mpi.org/community/lists/devel/2011/03/9109.php ompi-devel list]

This commit was SVN r24551.
2011-03-22 13:29:29 +00:00
Eugene Loh
2770a12beb Continue clean up of thread options started in r22841, 22842, and 22849.
No need for any CMRs to 1.5... that was already done in CMR 2728.

This commit was SVN r24545.

The following SVN revision numbers were found above:
  r22841 --> open-mpi/ompi@b400b84162
2011-03-18 21:36:35 +00:00
Jeff Squyres
2600672b31 Fix minor memory leak.
This commit was SVN r24494.
2011-03-08 15:21:33 +00:00
George Bosilca
ceb519a026 Fix an annoying warning from gcc about uninitialized variables.
This commit was SVN r24459.
2011-02-25 00:29:20 +00:00
Rolf vandeVaart
8171370287 Fix typo which broke builds when configured with hetero and
debug.

This commit was SVN r24283.
2011-01-21 17:10:09 +00:00
Abhishek Kulkarni
a1090575c2 Nitpick: Get rid of a redundant OPAL_SOS_GET_ERROR_CODE.
This commit was SVN r24282.
2011-01-20 23:48:11 +00:00
Rolf vandeVaart
6a5ad29c36 Update configure command since it changed.
This commit was SVN r24275.
2011-01-20 14:42:12 +00:00
Brian Barrett
8f6a19b0fc export component/module interface so that direct call works again
This commit was SVN r24268.
2011-01-19 20:47:17 +00:00
Rolf vandeVaart
a57e5587f6 Brief description of bfo functionality.
This commit was SVN r24186.
2010-12-17 19:12:00 +00:00
Shiqing Fan
f43862420c Convert the bad dos line endings to unix style for all windows related files.
This commit was SVN r24137.
2010-12-02 12:08:08 +00:00
Rolf vandeVaart
ad4c411ab0 Delete old utility script.
This commit was SVN r24101.
2010-11-30 15:18:07 +00:00
Rolf vandeVaart
fdf59375d6 Add a few #ifdefs for clarity.
This commit was SVN r24098.
2010-11-29 19:34:04 +00:00
Rolf vandeVaart
29a7398df9 Move #define to Makefile for clarity. Add #ifdef
to priority section.

This commit was SVN r24084.
2010-11-23 22:19:32 +00:00
Rolf vandeVaart
f65309364b Add closing comment to #ifdefs. No code changes.
This commit was SVN r24083.
2010-11-23 21:42:20 +00:00
Rolf vandeVaart
b2d457f049 Make format match rest of files in directory.
This commit was SVN r24078.
2010-11-22 15:10:02 +00:00
Rolf vandeVaart
9c57108e20 Change #ifdef to #if to match OMPI coding conventions.
This commit was SVN r24067.
2010-11-17 20:34:15 +00:00
Rolf vandeVaart
90bbb33919 Move variable declaration at beginning of code block to avoid warnings.
Also, add memchecker code to csum to keep PMLs consistent.

This commit was SVN r24066.
2010-11-17 18:01:56 +00:00
Shiqing Fan
c11bdec1c8 revert r24059, need a better solution for windows build.
This commit was SVN r24063.

The following SVN revision numbers were found above:
  r24059 --> open-mpi/ompi@74927c7ab0
2010-11-17 16:09:07 +00:00
Shiqing Fan
74927c7ab0 Remove unnecessary semi-colons, they break windows build.
This commit was SVN r24059.
2010-11-17 00:39:38 +00:00
Shiqing Fan
ba2dbff82d Check for addressability in MPI_*_init, since buffer passed by the application should have been already allocated, but might be not initialized.
Check in MPI_Start / MPI_Startall for defined-ness of the buffer passed into the send request(s).

This commit was SVN r24054.
2010-11-16 01:01:12 +00:00
Rolf vandeVaart
72d06215d5 Add some missing semi-colons.
This commit was SVN r24041.
2010-11-11 19:25:25 +00:00
Rolf vandeVaart
e5e301b564 Abort when unknown header is received.
This commit was SVN r24030.
2010-11-10 19:13:56 +00:00
Rolf vandeVaart
1aa558558d Add some parentheses to keep PMLs consistent.
This commit was SVN r24021.
2010-11-09 18:51:32 +00:00
Rolf vandeVaart
f156162289 Add a few missing SOS calls.
This commit was SVN r24018.
2010-11-09 17:48:11 +00:00
Rolf vandeVaart
9ed780b73d Cleanup plus another macro.
This commit was SVN r23994.
2010-11-04 19:35:25 +00:00
Rolf vandeVaart
e40483465e Remove unneeded function. Fix list handling.
This commit was SVN r23992.
2010-11-04 13:04:47 +00:00
Rolf vandeVaart
1b231f7e73 More miscellaneous cleanup of bfo.
This commit was SVN r23986.
2010-11-02 20:11:47 +00:00
Ralph Castain
9ea2b196ce Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.

Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.

This commit was SVN r23966.
2010-10-28 15:22:46 +00:00
Ralph Castain
86c7365e8e Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.

This commit was SVN r23943.
2010-10-26 02:41:42 +00:00
George Bosilca
bd9e48d5cf Add the missing default case. Cleanup required by the author.
This commit was SVN r23939.
2010-10-25 18:55:18 +00:00
Ralph Castain
3c9d167bd2 Fix segfault in ompi_info when no pml_v includes provided
This commit was SVN r23926.
2010-10-24 19:24:44 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Rolf vandeVaart
148ed00dd1 Some more refactoring in the BFO PML. Getting it
as close to OB1 PML as possible.

This commit was SVN r23920.
2010-10-22 18:13:35 +00:00
Ralph Castain
1766bf271a Correct an abstraction break that causes ompi_info to segfault if pml-v is not built. Move the definition and instantation of the mca_pml_v struct to the vprotocol base. Include the vprotocol/base/base.h file in pml_v.h. Remove the now useless pml_v.c.
Perhaps somebody out there who cares and uses it can verify that vprotocol works?

This commit was SVN r23919.
2010-10-22 05:12:12 +00:00
Rolf vandeVaart
70fe48698c Change some of the bfo code to be more like the
ob1 code.  Create some new macros and functions to handle
some differences.

This commit was SVN r23913.
2010-10-19 17:46:51 +00:00
Rolf vandeVaart
24e5e38dce Remove a variable that is not needed. Just piggy
back on a pointer value.

This commit was SVN r23887.
2010-10-13 22:01:23 +00:00
Rolf vandeVaart
20c5e6e0d6 Fix a few more cases where we are using a function
as an argument to a macro which could result in it
being called twice.  I did not observe any issues,
but it should be fixed.  Also did some minor refactoring
for clarity and following code convention.

This commit was SVN r23886.
2010-10-12 20:11:48 +00:00
Rolf vandeVaart
44d7006f34 Just some more refactoring and cleanup of bfo PML.
This commit was SVN r23884.
2010-10-12 13:34:35 +00:00
Rolf vandeVaart
59e3fa8ed3 Some more formatting fixes and code refactoring. All
these changes are in the bfo so this has no affect on ob1.

This commit was SVN r23815.
2010-09-29 13:46:45 +00:00
Rolf vandeVaart
f808dd2881 Cosmetic changes to fix spaces. No code change.
This commit was SVN r23803.
2010-09-27 21:01:49 +00:00
Jeff Squyres
73bcc4a36b Fix mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake will always create a set of directories under installdir, even if you do not --with-devel-headers. The set of directories will be empty, but still -- they should not be there at all. This commit fixes that -- the directories are not created at all if you do not --with-devel-headers
This commit was SVN r23801.

The following SVN revision numbers were found above:
  r23764 --> open-mpi/ompi@40a2bfa238
2010-09-24 22:53:28 +00:00
Rolf vandeVaart
3cc1fa45bf Fix a few more extraneous spaces. Also update csum
priority logic to match ob1.

This commit was SVN r23798.
2010-09-24 13:14:18 +00:00
Rolf vandeVaart
0331889495 Some more spaces, tabs, include file ordering changes.
No real code changes here.  

This commit was SVN r23789.
2010-09-22 13:48:22 +00:00
Shiqing Fan
a4c2ed7a87 Fix a few things for Windows build - type cast, modified variable names and unresolved symbols.
This commit was SVN r23783.
2010-09-21 09:40:26 +00:00
Rolf vandeVaart
77560269f2 More fixes of spaces, tabs, and ordering of include files
to make the 3 PMLs the same where they are the same.  No
real code changes.

This commit was SVN r23779.
2010-09-20 21:22:33 +00:00
Jeff Squyres
099816e59e Somehow this file got missed.
This commit was SVN r23766.
2010-09-18 04:37:37 +00:00
Ralph Castain
40a2bfa238 WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.

Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.

This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Rolf vandeVaart
65e8277add Mostly fixes for tabs, spaces and indentations.
Also, some other changes to bring the csum PML up
to date with changes that happened in ob1 over the
last two years. This includes a few bug
fixes and some minor refactoring.  

This commit was SVN r23757.
2010-09-15 18:48:06 +00:00
Rolf vandeVaart
31a168695e Some more cleanup of extraneous spaces and tabs. Also
some changes to script to run diffs between PMLs.

This commit was SVN r23749.
2010-09-13 14:58:00 +00:00
Rolf vandeVaart
3bb587937a Just fix up some trailing spaces, tabs instead of spaces,
missing periods on copyrights, extraneous spaces on blank
lines.  No actual code change.

This commit was SVN r23739.
2010-09-10 21:01:52 +00:00
Josh Hursey
e12ca48cd9 A number of C/R enhancements per RFC below:
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php

Documentation:
  http://osl.iu.edu/research/ft/

Major Changes: 
-------------- 
 * Added C/R-enabled Debugging support. 
   Enabled with the --enable-crdebug flag. See the following website for more information: 
   http://osl.iu.edu/research/ft/crdebug/ 
 * Added Stable Storage (SStore) framework for checkpoint storage 
   * 'central' component does a direct to central storage save 
   * 'stage' component stages checkpoints to central storage while the application continues execution. 
     * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress) 
     * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching) 
 * Added Compression (compress) framework to support 
 * Add two new ErrMgr recovery policies 
   * {{{crmig}}} C/R Process Migration 
   * {{{autor}}} C/R Automatic Recovery 
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component 
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option) 
   * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342) 
   * {{{OMPI_CR_Restart}}} 
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules) 
   * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192) 
   * {{{OMPI_CR_Quiesce_start}}} 
   * {{{OMPI_CR_Quiesce_checkpoint}}} 
   * {{{OMPI_CR_Quiesce_end}}} 
   * {{{OMPI_CR_self_register_checkpoint_callback}}} 
   * {{{OMPI_CR_self_register_restart_callback}}} 
   * {{{OMPI_CR_self_register_continue_callback}}} 
 * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future. 
 * Add a progress meter to: 
   * FileM rsh (filem_rsh_process_meter) 
   * SnapC full (snapc_full_progress_meter) 
   * SStore stage (sstore_stage_progress_meter) 
 * Added 2 new command line options to ompi-restart 
   * --showme : Display the full command line that would have been exec'ed. 
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413) 
 * Deprecated some MCA params: 
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir 
   * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir 
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared 
   * snapc_base_store_in_place deprecated, replaced with different components of SStore 
   * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref 
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported 
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem 

Minor Changes: 
-------------- 
 * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing. 
 * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components 
 * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it. 
 * Fixes trac:2208 : Honor various TMPDIR varaibles instead of forcing {{{/tmp}}} 
 * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set. 
 * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality. 
 * Cleanup the CRS framework and components to work with the SStore framework. 
 * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably). 
 * Add 'quiesce' hook to CRCP for a future enhancement. 
 * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}. 
 * Add optional application level INC callbacks (registered through the CR MPI Ext interface). 
 * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive. 
 * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked. 
 * {{{opal-restart}}} also support local decompression before restarting 
 * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata 
 * {{{orte-restart}}} now uses the SStore framework to work with the metadata 
 * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality. 
 * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}. 
 * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped. 
 * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away. 
 * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options. 
 * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities. 
 * Improve the checks for 'already checkpointing' error path. 
 * A a recovery output timer, to show how long it takes to restart a job 
 * Do a better job of cleaning up the old session directory on restart. 
 * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment) 
 * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize. 

This commit was SVN r23587.

The following Trac tickets were found above:
  Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
  Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
  Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
  Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
  Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
  Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
  Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
2010-08-10 20:51:11 +00:00
Rolf vandeVaart
0324fdb407 Created two new macros that are used when filling in either the
status structure or the _ucount field in the status structure.
On 64-bit sparc, the macros resolve into integer array assignments.
For all others, they are just simple assignments.  This fixes 
possible BUS errors seen when running on the SPARC processor.
This bug was introduced when the _count field changed from an int
into a size_t.  See the changes to request.h for additional details.

This commit fixes trac:2514.

This commit was SVN r23554.

The following Trac tickets were found above:
  Ticket 2514 --> https://svn.open-mpi.org/trac/ompi/ticket/2514
2010-08-04 19:36:40 +00:00
George Bosilca
733d25a8a3 First step toward fixing the MPI_Get_count issues from the ticket #2241. Next
step is the configure and Fortran mojo that Jeff will put in. Until then I
guess the Fortran interface is broken (at least all functions using the hidden
count firld in the MPI_Status).

This commit was SVN r23467.
2010-07-21 20:07:00 +00:00
Rolf vandeVaart
45019a3abf Correctly handle zero-length match fragment.
This commit was SVN r23459.
2010-07-21 15:27:06 +00:00
Rolf vandeVaart
3abb5556a6 Fix bug pointed out by George Bosilca. Also
remove unneeded temp variable.

This commit was SVN r23424.
2010-07-15 19:32:31 +00:00
Jeff Squyres
bfe6c95dce Remove .windows because it doesn't exist in this directory.
This commit was SVN r23406.
2010-07-14 02:14:32 +00:00
Rolf vandeVaart
e27f953fa4 Fix casting in assert.
This commit was SVN r23388.
2010-07-13 12:02:12 +00:00
Rolf vandeVaart
19d007a6fc New PML to support failover between openib BTLs.
openib BTL changes coming soon.

This commit was SVN r23385.
2010-07-13 10:46:20 +00:00
Jeff Squyres
c8bb7537e7 Remove include/opal/sys/cache.h -- its only purpose in life was to
#define CACHE_LINE_SIZE to 128.  This name has a conflict on NetBSD,
and it seems kinda odd to have a header file that ''only'' defines a
single value.  Also, we'll soon be raising hwloc to be a first-class
item, so having this file around seemed kinda weird.

Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an
int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the
rationale that we can fill this in at runtime with hwloc info (trunk
and v1.5/beyond, only).  The only place we ''needed'' a compile-time
CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a
new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value
(128).  That use isn't suitable for run-time hwloc information,
anyway.

This commit was SVN r23349.
2010-07-06 14:33:36 +00:00
Rolf vandeVaart
03b3e75f86 Add two arguments to the PML error callback function. This
allows the BTL to specify a specific ompi_proc_t that had an
error.  Also add an optional descriptive string.  Currently, arguments
are not used but will be by future failover PML. 
Changes based on RFC.  Reviewed by George Bosilca.

This commit was SVN r23174.
2010-05-19 11:55:45 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
George Bosilca
321213e779 Fix segmentation fault on heterogeneous architectures. Don't mess with the
ompi_ptr_t by translating into void*. Instead keep it as an ompi_ptr_t all
the way. Thanks to Timur Magomedov for helping to track down this issue and
test the patch.

cmr:v1.4
cmr:v1.5

This commit was SVN r23030.
2010-04-23 15:14:55 +00:00
Rainer Keller
a48a11821b - mca_base_param_reg_string_name allocates default_pml.
As it is strdup, just free(default_pml).

   cmr:v1.5

This commit was SVN r22955.
2010-04-12 19:54:07 +00:00
Rolf vandeVaart
0adb570693 Add pml_ob1_verbose flag. Fix the current location it is being used
This commit was SVN r22939.
2010-04-07 13:51:42 +00:00
Ralph Castain
522a23d6a3 A few changes to the FT-related configure options:
1. fix a bug that caused an infinite loop in configure when specifying want-ft but not want-ft-thread by removing a stale reference to the opal-progress-thread option

2. add want-ft=orcm so we can build the orcm errmgr component

3. cleanup the use of "ompi_want_ft_xxx" and replace it with "opal_want_ft_xxx" so that naming conventions are preserved

This commit was SVN r22885.
2010-03-25 22:53:48 +00:00
Ralph Castain
b400b84162 Merge in the modified thread configure option branch per today's telecon.
Remove the --enable-progress-threads option as this is no longer functional, and hardcode OPAL_ENABLE_PROGRESS_THREADS to 0.

Replace the --enable-mpi-threads option with --enable-mpi-thread-multiple as this is clearer as to meaning. This option automatically turns "on" opal thread support if it wasn't already so specified. If the user specifies --disable-opal-multi-threads --enable-mpi-thread-multiple, we will error out with a message

Add a new --enable-opal-multi-threads option that turns "on" opal thread support without doing anything wrt mpi-thread-multiple

This commit was SVN r22841.
2010-03-16 23:10:50 +00:00
Josh Hursey
e9b5162d79 Fix the configure logic for --with-ft so that it properly takes a comma separated list.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.

The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.

This commit was SVN r22824.
2010-03-12 23:57:50 +00:00
Rainer Keller
06f5ba1c19 - Reverse the logic (OPAL_LIKELY -> OPAL_UNLIKELY)
This commit was SVN r22796.
2010-03-08 14:00:59 +00:00
Nysal Jan
97d66bce78 This fixes trac:2154 - CSUM PML false positive. Needs to go to both cmr:v1.4.2 and cmr:v1.5
This commit was SVN r22590.

The following Trac tickets were found above:
  Ticket 2154 --> https://svn.open-mpi.org/trac/ompi/ticket/2154
2010-02-10 10:24:16 +00:00
Shiqing Fan
ad763c327d Restore several linked libraries that were deleted by mistake in r22405.
This commit was SVN r22415.

The following SVN revision numbers were found above:
  r22405 --> open-mpi/ompi@872a4047ba
2010-01-14 21:50:42 +00:00
Avneesh Pant
8bdd334d95 Allow the PSM component to return ERR_NOT_AVAIL so it can be unloaded silently if executed on a node with no QLogic IB hardware. Also minor modifications to have the CM PML allow itself to be unloaded if no MTL components are available. The component selection logic can then continue to use other PMLs.
This commit was SVN r22410.
2010-01-14 19:39:35 +00:00
Shiqing Fan
872a4047ba Fix the bug that caused by ADD_DEPENDENCIES() from different version of CMake.
In CMake 2.6 and earlier, this function add dependencies for targets and also link the target libraries automatically, but in CMake 2.8,this behavior has been changed, i.e. it will only add the dependencies but no link, which will cause linking errors at compilation time.

This commit was SVN r22405.
2010-01-14 18:10:20 +00:00
Jeff Squyres
2bdcb2a979 Move CM's MCA params into their own function (component.register).
This commit was SVN r22392.
2010-01-12 20:11:47 +00:00
Shiqing Fan
c37308b8eb Remove the deleted windows file from the tarball.
This commit was SVN r22347.
2009-12-29 16:11:32 +00:00
Shiqing Fan
a2d00d4ab8 Exclude a pml component that is not necessary for Windows.
This commit was SVN r22342.
2009-12-28 16:12:28 +00:00
Jeff Squyres
4f68dfb03c Remove some dead code (thanks to George for pointing it out).
This commit was SVN r22309.
2009-12-14 21:20:41 +00:00
George Bosilca
76222eb869 Get rid of the useless mca_pml_base_endpoint_t and replace it by
[the well known and widely used!] mca_pml_endpoint_t.

This commit was SVN r22277.
2009-12-08 17:29:54 +00:00
Aurelien Bouteiller
59156cd92a Fix gcc 4.3 warning berserk about non-literal string format.
This commit was SVN r22147.
2009-10-27 20:45:02 +00:00
George Bosilca
16c6370b73 A little bit of cleanup, the main logic is still the same.
This commit was SVN r22043.
2009-10-01 14:05:25 +00:00
Nysal Jan
f53f286456 Setup the convertor once during add_procs() instead on every request
This commit was SVN r21873.
2009-08-24 18:50:39 +00:00
Brian Barrett
07d49e982b hdr_ctx is a uint16, so can have CIDs in range of 0 ... 2^16 - 1. I think
someone (me?) must have done 2^(16 - 1) instead.  Ooops.

This commit was SVN r21869.
2009-08-22 05:21:01 +00:00
Rainer Keller
8e1b23779f - Replace combinations of
#if defined (c_plusplus)
          defined (__cplusplus)
   followed by
      extern "C" {
   and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.

   Notable exceptions are:
    - opal/include/opal_config_bottom.h:
      This is our generated code, that itself defines BEGIN_C_DECL and
      END_C_DECL
    - ompi/mpi/cxx/mpicxx.h:
      Here we do not include opal_config_bottom.h:                                 
    - Belongs to external code:                                                    
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c        
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h        
    - opal/include/opal/prefetch.h:
      Has C++ specific macros that are protected:                                  

    - Had #if ... } #endif  _and_ END_C_DECLS (aka end up with 2x
      END_C_DECLS)
      ompi/mca/btl/openib/btl_openib.h
    - opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
    - opal/win32/ompi_process.h: had extern "C"\n {...
      opal/win32/ompi_process.h: dito
    - ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
      ompi/mpi/f90/test/align_c.c: dito
    - ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
    - ompi/mpi/f90/xml/common-C.xsl: Amend

   Tested on linux using --with-openib and --with-mx

   The following do not contain either opal_config.h, orte_config.h or
   ompi_config.h
   (but possibly other header files, that include one of the above):
      ompi/mca/bml/r2/bml_r2_ft.h
      ompi/mca/btl/gm/btl_gm_endpoint.h
      ompi/mca/btl/gm/btl_gm_proc.h
      ompi/mca/btl/mx/btl_mx_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_frag.h
      ompi/mca/btl/ofud/btl_ofud_proc.h
      ompi/mca/btl/openib/btl_openib_mca.h
      ompi/mca/btl/portals/btl_portals_endpoint.h
      ompi/mca/btl/portals/btl_portals_frag.h
      ompi/mca/btl/sctp/btl_sctp_endpoint.h
      ompi/mca/btl/sctp/btl_sctp_proc.h
      ompi/mca/btl/tcp/btl_tcp_endpoint.h
      ompi/mca/btl/tcp/btl_tcp_ft.h
      ompi/mca/btl/tcp/btl_tcp_proc.h
      ompi/mca/btl/template/btl_template_endpoint.h
      ompi/mca/btl/template/btl_template_proc.h
      ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
      ompi/mca/btl/udapl/btl_udapl_endpoint.h
      ompi/mca/btl/udapl/btl_udapl_mca.h
      ompi/mca/btl/udapl/btl_udapl_proc.h
      ompi/mca/mtl/mx/mtl_mx_endpoint.h
      ompi/mca/mtl/mx/mtl_mx.h
      ompi/mca/mtl/psm/mtl_psm_endpoint.h
      ompi/mca/mtl/psm/mtl_psm.h
      ompi/mca/pml/cm/pml_cm_component.h
      ompi/mca/pml/csum/pml_csum_comm.h
      ompi/mca/pml/dr/pml_dr_comm.h
      ompi/mca/pml/dr/pml_dr_component.h
      ompi/mca/pml/dr/pml_dr_endpoint.h
      ompi/mca/pml/dr/pml_dr_recvfrag.h
      ompi/mca/pml/example/pml_example.h
      ompi/mca/pml/ob1/pml_ob1_comm.h
      ompi/mca/pml/ob1/pml_ob1_component.h
      ompi/mca/pml/ob1/pml_ob1_endpoint.h
      ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
      ompi/mca/pml/ob1/pml_ob1_recvfrag.h
      ompi/mca/pml/v/pml_v_output.h
      opal/include/opal/prefetch.h
      opal/mca/timer/aix/timer_aix.h
      opal/util/qsort.h
      test/support/components.h

This commit was SVN r21855.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2009-08-20 11:42:18 +00:00
Ralph Castain
270f0ffe18 Improve the performance of the csum pml module by not performing checksums on data when sending between procs on the same node.
Thanks to Nysal for this improvement!

This commit was SVN r21848.
2009-08-20 04:33:03 +00:00
Rainer Keller
567e5c4342 - As described in RFC,
http://www.open-mpi.org/community/lists/devel/2009/08/6618.php
   lower the default priority of PML/cm to allow _defined_ behaviour
   for systems, where both MTLs and BTLs are available (Portals and MX).

   Keep the previous behaviour of favoring in case of PSM.
   Still, the user may select --mca pml cm for apps where applicable.

This commit was SVN r21834.
2009-08-18 19:12:43 +00:00
Ralph Castain
ded58ae483 Silence some compiler warnings about print statements
This commit was SVN r21814.
2009-08-13 13:45:38 +00:00
Shiqing Fan
bce2f44154 Update related .windows files with proper compiling properties, in order to have a successful DSO build.
This commit was SVN r21805.
2009-08-12 08:55:58 +00:00
George Bosilca
51b2cfe40d This header is required to compile the FT.
This commit was SVN r21792.
2009-08-11 05:21:27 +00:00
Rainer Keller
76469ea64a - Change the property of a few files, that obviously
don't need to be svn:executable...

This commit was SVN r21786.
2009-08-11 01:40:00 +00:00
George Bosilca
9c2b993589 Complete r21778 by adding the missing headers.
This commit was SVN r21784.

The following SVN revision numbers were found above:
  r21778 --> open-mpi/ompi@e4d52b16b5
2009-08-10 17:07:43 +00:00
Terry Dontje
e4d52b16b5 Add in eager limit checks in pmls.
This commit was SVN r21778.
2009-08-10 12:46:20 +00:00
Donald Kerr
de6a7f57b0 fix #1984; only decrement send request req_state when not equal to zero
This commit was SVN r21775.
2009-08-07 14:58:50 +00:00
Rolf vandeVaart
c82e468ede Undo revision r21767 - sorry folks
This commit was SVN r21769.

The following SVN revision numbers were found above:
  r21767 --> open-mpi/ompi@41f38110ff
2009-08-05 22:23:26 +00:00
Rolf vandeVaart
41f38110ff HCA failover support in openib BTL
This commit was SVN r21767.
2009-08-05 21:53:02 +00:00
George Bosilca
cf8bd2142a Various cleanups and typos.
This commit was SVN r21765.
2009-08-05 03:12:33 +00:00
George Bosilca
98bdf5d17b Remove a compiler warning about missing braces around
initializer.

This commit was SVN r21760.
2009-08-04 21:41:14 +00:00
George Bosilca
32416761ce Make the open/close symmetric with regards to the local variable
contruction/destruction.

This commit was SVN r21753.
2009-08-03 16:45:18 +00:00
George Bosilca
0bf381e931 This patch try to solve a issue on Leopard. The supposedly global
variables that are not initialized and are declared in a file that
doesn't export any globally visible function are marked as
non-initialized constants, i.e. uninitialized common symbols. For some
obscure reasons, they get removed from the object files on Mac OS X.

So far I found two solution to this problem. One require the addition
of "-c" to the linker command, the second one (corresponding to this
patch) force them to became a common initialized symbol.

This commit was SVN r21739.
2009-07-28 17:06:16 +00:00
Terry Dontje
d432c9fdbc Add asserts to catch when btl_eager_limit is smaller than the pml headers.
This commit was SVN r21707.
2009-07-17 14:54:18 +00:00
George Bosilca
e1383027e1 Correct a comment and cleanup/reorder the code.
This commit was SVN r21696.
2009-07-16 17:41:32 +00:00
Ralph Castain
e75d9b8296 Use orte_notifier to alert sys admins to checksum violations in the csum pml.
Add ability to store the RM's jobid string to tag the notifier message so that the sys admin knows what job had the problem.

This commit was SVN r21687.
2009-07-15 19:43:26 +00:00
Rainer Keller
6c5532072a - Split the datatype engine into two parts: an MPI specific part in
OMPI
   and a language agnostic part in OPAL. The convertor is completely
   moved into OPAL.  This offers several benefits as described in RFC
   http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
   namely:
    - Fewer basic types (int* and float* types, boolean and wchar
    - Fixing naming scheme to ompi-nomenclature.
    - Usability outside of the ompi-layer.
 - Due to the fixed nature of simple opal types, their information is
   completely
   known at compile time and therefore constified
 - With fewer datatypes (22), the actual sizes of bit-field types may be
   reduced
   from 64 to 32 bits, allowing reorganizing the opal_datatype
   structure, eliminating holes and keeping data required in convertor
   (upon send/recv) in one cacheline...
   This has implications to the convertor-datastructure and other parts
   of the code.
 - Several performance tests have been run, the netpipe latency does not
   change with
   this patch on Linux/x86-64 on the smoky cluster.
 - Extensive tests have been done to verify correctness (no new
   regressions) using:
   1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
    ompi-ddt:
    a. running both trunk and ompi-ddt resulted in no differences
       (except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
       correctly).
    b. with --enable-memchecker and running under valgrind (one buglet
       when run with static found in test-suite, commited)
   2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      all passed (except for the dynamic/ tests failed!! as trunk/MTT)
   3. compilation and usage of HDF5 tests on Jaguar using PGI and
      PathScale compilers.
   4. compilation and usage on Scicortex.
 - Please note, that for the heterogeneous case, (-m32 compiled
   binaries/ompi), neither
   ompi-trunk, nor ompi-ddt branch would successfully launch.

This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
Ralph Castain
4881cd0df3 Revert the prior change out from the individual .h files - the problem was in the Makefile.am's, causing the make dist to fail.
This commit was SVN r21414.
2009-06-11 03:15:47 +00:00
Ralph Castain
91ab2b3e4f Specify complete path to included header files so it compiles in all environments
This commit was SVN r21412.
2009-06-11 02:46:30 +00:00
Terry Dontje
fa9f356c8c This commit fixes trac:1931. By separating out structures and defines used by
the debugger plugins into files suffixed by _dbg.h.

This commit was SVN r21404.

The following Trac tickets were found above:
  Ticket 1931 --> https://svn.open-mpi.org/trac/ompi/ticket/1931
2009-06-10 14:57:28 +00:00
George Bosilca
f9a510fd8a There is no need for an atomic read if we are not in a threaded case.
This commit was SVN r21394.
2009-06-08 23:55:52 +00:00
Nysal Jan
85773de539 Move 'multi-frag receive' fixes to csum PML
This commit was SVN r21323.
2009-05-29 08:07:33 +00:00
George Bosilca
be320ca959 Don't add the offset to all segments, only the first one should be affected. Thanks
to Roberto Ammendola for this bug report and patch.

This commit was SVN r21300.
2009-05-27 16:12:18 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Shiqing Fan
cd565923d3 Completely remove ltdl support for Windows build.
This commit was SVN r21170.
2009-05-05 18:59:13 +00:00
George Bosilca
271eb11f28 Remove an unused statically defined function.
This commit was SVN r21157.
2009-05-05 13:23:49 +00:00
Brian Barrett
77cf736f48 Make max_contextid field match same type as cid in communicator.
Refs trac:1904

This commit was SVN r21141.

The following Trac tickets were found above:
  Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904
2009-05-01 21:11:59 +00:00
Ralph Castain
f6da7d86a2 Propagate Brian's change so we abort if we run out of CIDs to the csum module
This commit was SVN r21137.
2009-05-01 15:09:44 +00:00
Brian Barrett
736debcffc Check during communicator creation that we didn't get assigned a CID we can't handle, so that the code aborts instead of hange.
Refs trac:1904

This commit was SVN r21133.

The following Trac tickets were found above:
  Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904
2009-04-30 19:23:57 +00:00
Josh Hursey
13a3453e35 more copyright fixes - sorry
This commit was SVN r21129.
2009-04-30 16:41:50 +00:00
Josh Hursey
1f42065950 Make sure the CRCPW wrapper does not try to reference a NULL value in MPI_Finalize(), due to the ordering of pml_finalize and comm_del.
Some of the PML interfaces are noops in BKMRK. Allow the CRCPW to detect and skip the call to these functions.

This commit was SVN r21126.
2009-04-30 16:36:50 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Rainer Keller
6c1cce8761 - For the upcoming header cleanup commit,
several header files (previously included by header-files)
   now have to be moved "upward".
   This is mainly system headers such as string.h, stdio.h and for
   networking, but also some orte headers.

This commit was SVN r21095.
2009-04-29 00:49:23 +00:00
Shiqing Fan
3d4e0472d6 Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Nysal Jan
5353236a53 Reduce the size of FIN header by 8 bytes. This is done by rearranging the fields to reduce the amount of compiler padding
This commit was SVN r21046.
2009-04-21 14:41:51 +00:00
Ralph Castain
2b0b9dd227 Sync to change in a tmp branch
This commit was SVN r21015.
2009-04-15 13:09:51 +00:00
Ralph Castain
46d6c6d516 Sync the csum module with the recent ob1 changes
This commit was SVN r21002.
2009-04-14 18:40:54 +00:00
Nysal Jan
221447ef17 Fix checksum mismatch on Big-endian systems when heterogeneous mode is enabled
This commit was SVN r21001.
2009-04-14 17:21:38 +00:00
Nysal Jan
697f1837f4 Move fix for ticket #1875 to csum PML
This commit was SVN r20986.
2009-04-14 10:44:29 +00:00
George Bosilca
b5deb228f3 Allow the BTL to release the descriptor. In fact the only thing the PML
needs is to be involved in the RMA completion process, which is insured
by the MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag. Fixes trac:1875.

This commit was SVN r20983.

The following Trac tickets were found above:
  Ticket 1875 --> https://svn.open-mpi.org/trac/ompi/ticket/1875
2009-04-13 23:41:50 +00:00
George Bosilca
527540aeb1 Rename req_bytes_delivered to req_bytes_expected for the receive
requests to really reflect what this field means.

This commit was SVN r20971.
2009-04-10 16:36:20 +00:00
George Bosilca
c148d33eb5 Play nicely with the reference count on the ompi_proc structure.
This commit was SVN r20970.
2009-04-10 16:32:02 +00:00
Nysal Jan
1decf8bf36 Move ob1 FIN/ACK fixes to csum PML
This commit was SVN r20954.
2009-04-08 10:43:35 +00:00
George Bosilca
dfc7cea329 Fix the deadlock issues on the osu_bw. The problem is that the PML is
event driver, and if there are no event generated by the BTLs ... well
nothing happens (i.e there is no progress at the PML level and all
pending fragments remain pending). By forcing the BTL to trigger the
callbacks for all ACK and FIN, we give more opportunities to the PML
to do real progress, but we pay this in terms of performance.

This commit was SVN r20953.
2009-04-07 16:56:37 +00:00
George Bosilca
44ce610b8b Add a comment to highlight the fact that this function reappend the
FIN message to the pending list when the send fails. Therefore, any
upper level function is not required to add it.
Make sure we don't send the FIN twice.

This commit was SVN r20952.
2009-04-07 16:48:58 +00:00
George Bosilca
ccb79b963f This is the other half of the commit r20946 as I mess them up between
two of my testing machines. The fix require both commits!

This commit was SVN r20947.

The following SVN revision numbers were found above:
  r20946 --> open-mpi/ompi@e2bb4c9b8f
2009-04-06 21:49:52 +00:00
George Bosilca
e2bb4c9b8f Correct the handling of the pckt_pending list. The problem was that
we returned the pck before coping the values out. With this change
it seems to work at least on two architectures (even with the 
mpool size set back to 0).

This commit was SVN r20946.
2009-04-06 21:45:08 +00:00
Nysal Jan
5032f59edf Fix checksum computation in the buffered send code
This commit was SVN r20935.
2009-04-03 07:09:24 +00:00
Ralph Castain
ba1a98c398 Fix a warning message by pointing to the correct header
This commit was SVN r20930.
2009-04-02 13:54:59 +00:00
Nysal Jan
e561a6c43a Add missing checksum calculation. This fixes a checksum mismatch failure while using TCP BTL
This commit was SVN r20927.
2009-04-01 20:48:35 +00:00
Nysal Jan
aff903f39c Don't print this message by default
This commit was SVN r20914.
2009-04-01 14:31:21 +00:00
Ralph Castain
cba3708893 Cleanup debugging output, remove an unnecessary re-compute of the checksum
This commit was SVN r20895.
2009-03-30 17:09:32 +00:00
Ralph Castain
d5e6104035 Continue to cleanup the csum pml module. Some minor corrections and debug output added.
This commit was SVN r20894.
2009-03-29 23:27:06 +00:00
Ralph Castain
f72e3ba9f9 Update the PML base send init macro to take a converter_flag field (discussed with George).
Update the csum pml module - still not quite right, but closer.

Modify the LANL platform files to keep pace.

This commit was SVN r20859.
2009-03-24 19:12:53 +00:00
Ralph Castain
d88df53a86 A touch more cleanup. Also, bring over the peruse cleanups from r20844
This commit was SVN r20849.

The following SVN revision numbers were found above:
  r20844 --> open-mpi/ompi@daba352af4
2009-03-24 01:36:31 +00:00
Ralph Castain
78323fd6b2 Minor cleanups to compile without warnings
This commit was SVN r20848.
2009-03-24 00:54:16 +00:00
Ralph Castain
75ca19d1d1 Turn off a function that hasn't been added to the code base yet...
This commit was SVN r20847.
2009-03-23 23:56:11 +00:00
Ralph Castain
17f51a0389 Add a new PML module that acts as a "mini-dr" - when requested, it performs a dr-like checksum on messages for BTL's that require it, as specified by MCA params.
Add two new configure options that specify:

1. when to add padding to the openib control header - this *only* happens when the configure option is specified

2. when to use the dr-like checksum as opposed to the memcpy checksum. Not selectable at runtime - to eliminate performance impacts, this is a configure-only option

Also removed an unused checksum version from opal/util/crc.h.

The new component still needs a little cleanup and some sync with recent ob1 bug fixes. It was created as a separate module to avoid performance hits in ob1 itself, though most of the code is duplicative. The component is only selectable by either specifying it directly, or configuring with the dr-like checksum -and- setting -mca pml_csum_enable_checksum 1.

Modify the LANL platform files to take advantage of the new module.

This commit was SVN r20846.
2009-03-23 23:52:05 +00:00
George Bosilca
daba352af4 As the request is not yet updated (i.e. _MATCHED cannot be called as we don't yet know the
expected length of the message) we should use the source and tag from the message header
instead of the value from the status structure attached to the request.
-This line, and those below, will be ignored--

M    pml_ob1_recvreq.c

This commit was SVN r20844.
2009-03-23 20:25:53 +00:00
Aurelien Bouteiller
fa9b6e729b Fix missing file in Makefile.am and the "CREATE FAILURE".
This commit was SVN r20821.
2009-03-18 13:42:48 +00:00
Rainer Keller
6f808d9b05 Preparation work for another commit (after RFC):
- This patch solely _adds_ required headers and is rather localized
   The next patch (after RFC) heavily removes headers (based on script)
 - ompi/communicator/communicator.h: For sources that use
   ompi_mpi_comm_world, don't require them to include "mpi.h"
 - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
   #include "ompi/mca/topo/topo.h"
 - ompi/errhandler/errhandler_predefined.h:
   ompi/communicator/communicator.h depends on this header file!
   To prevent recursion just have fwd declarations.
   #include "ompi/types.h" for fwd declarations of the main structs.
 - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
   ompi_rb_tree_t, so have the proper classes
 - ompi/mca/op/op.h:
   Op is pretty self-contained: Nobody up to now has done
   #include "opal/class/opal_object.h"
 - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
   #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/pml/base/base.h:
   We use opal_lists  
 - ompi/mca/pml/dr/pml_dr_vfrag.h:
   #include "opal/types.h" for ompi_ptr_t
 - ompi/mca/pml/ob1/pml_ob1_hdr.h:
   #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
 - opal/dss/dss_unpack.c:
   #include "opal/types.h"
 - opal/mca/base/base.h:
   #include "opal/util/cmd_line.h" for opal_cmd_line_t
 - orte/mca/oob/tcp/oob_tcp.c:
   #include "opal/types.h" for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp.h:
   #include "opal/threads/threads.h" for opal_thread_t
 - orte/mca/oob/tcp/oob_tcp_msg.c:
   #include "opal/types.h" 
 - orte/mca/oob/tcp/oob_tcp_peer.c:
   #include "opal/types.h"  for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp_send.c:
   #include "opal/types.h" 
 - orte/mca/plm/base/plm_base_proxy.c:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT
 - orte/mca/rml/base/rml_base_receive.c:
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/mca/rml/oob/rml_oob_recv.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/mca/rml/oob/rml_oob_send.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/runtime/orte_data_server.c
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/runtime/orte_globals.h:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT

 Tested on Linux/x86-64

This commit was SVN r20817.
2009-03-17 21:34:30 +00:00
Aurelien Bouteiller
3cd5a0d833 Support for the MPI event logger improving event logging perfs.
This commit was SVN r20804.
2009-03-17 17:35:28 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strup and friends include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Rainer Keller
ec0ed48718 - Revert r20739
This commit was SVN r20742.

The following SVN revision numbers were found above:
  r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
a94438343b - Revert r20740
This commit was SVN r20741.

The following SVN revision numbers were found above:
  r20740 --> open-mpi/ompi@2a70618a77
2009-03-05 21:50:47 +00:00
Rainer Keller
2a70618a77 - Second patch, as discussed in Louisville.
Replace short macros in orte/util/name_fns.h
   to the actual fct. call.

 - Compiles on linux/x86-64

This commit was SVN r20740.
2009-03-05 21:14:18 +00:00
Rainer Keller
781caee0b6 - First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Rainer Keller
fd28b392bf - An intrusive commit yet again (sorry): with the separation we
get bitten by header depending on having already included
   the corresponding [opal|orte|ompi]_config.h header.
   When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
   are missed.

   Script to add the corresponding header in front of all following
   (taking care of possible #ifdef HAVE_...)

 - Including some minor cleanups to
   - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
   - ompi/mca/btl/btl.h -- nclude _after_ #ifndef MCA_BTL_H
   - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
     orte/util/output.h
   - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
   - ompi/mca/btl/btl.h -- reorder to fit
   - ompi/mca/bml/bml.h -- reorder to fit
   - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
   - ompi/request/request.h -- additionally need ompi/constants.h

 - Tested on linux/x86-64

This commit was SVN r20720.
2009-03-04 15:35:54 +00:00
Eugene Loh
efe8c3a283 Initialize reuse_old_request properly at the beginning of each loop iteration in pml_ob1_start.c.
This commit was SVN r20712.
2009-03-04 06:58:36 +00:00
Rainer Keller
811f2bd9b4 - As discussed on RFC, move the ompi_bitmap to the
opal layer.
   Add a check against a maximum (actually get rid of ifs internally to
   opal_bitmap.c) -- the functionality to set the current maximum size
   opal_bitmap_set_max_size() is currently only used in attribute.c
   to set the maximum OMPI_FORTRAN_HANDLE_MAX...

   Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f
   run with 6 procs.
   Let's look into MTT as well...

This commit was SVN r20708.
2009-03-03 22:25:13 +00:00
Rainer Keller
02416033ad - Get rid of warning on function declarations:
First "static inline", then the type

This commit was SVN r20657.
2009-02-28 14:15:34 +00:00
George Bosilca
e181ba50c9 Stop valgrind from complaining about few uninitialized bytes on the PML
headers. This feature is enabled only in debug mode when the heterogeneous
support is enabled.

This commit was SVN r20648.
2009-02-27 05:24:06 +00:00
Rainer Keller
04567d3af0 - Header orte/mca/errmgr/errmgr.h is not needed.
Once again compiles fine with -Wimplicit-function-declaration   

This commit was SVN r20640.
2009-02-26 04:05:30 +00:00
Rainer Keller
96e1b9b747 - Header orte/mca/rml/rml.h is not needed if no occurence of orte_rml
or ORTE_RML.
   As the others compiles fine with -Wimplicit-function-declaration

This commit was SVN r20639.
2009-02-26 03:52:31 +00:00
Rainer Keller
b356e90fa1 - Get rid of include orte/util/proc_info.h, if not needed
Only proc_info.h-internal include file is opal/dss/dss_types.h
 - In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   works fine, no errors.

   Again, let's have MTT the last word.

This commit was SVN r20631.
2009-02-25 03:38:00 +00:00
Terry Dontje
0178b6c45f Added padding to predefined handle structures to maintain library version to
version compatibility.

This commit was SVN r20627.
2009-02-24 17:17:33 +00:00
George Bosilca
97a2296fdd Correct the GET protocol. Thanks to Mike Dubman for finding the problem and
testing my patch.

This commit was SVN r20591.
2009-02-19 16:00:15 +00:00