1
1
Граф коммитов

643 Коммитов

Автор SHA1 Сообщение Дата
George Bosilca
c9e5ab9ed1 Our macros for the OMPI-level free list had one extra argument, a possible return
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.

The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead of
using the return code.

This commit was SVN r28722.
2013-07-04 08:34:37 +00:00
George Bosilca
702e669636 Remove a [very] annoying warning.
This commit was SVN r28690.
2013-07-01 16:49:13 +00:00
George Bosilca
5fae72b9aa Add the MPI 2.2 MPI_Dist_graph functionality.
This patch reshape the way we deal with topologies completely. Where
our topologies were mainly storage components (they were not capable
of creating the new communicator), the new version is built around a
[possibly] common representation (in mca/topo/topo.h), but the functions
to attach and retrieve the topological information are specific to each
component. As a result the ompi_create_cart and ompi_create_graph functions
become useless and have been removed.

In addition to adding the internal infrastructure to manage the topology
information, it updates the MPI interface, and the debuggers support and
provides all Fortran interfaces.

This commit was SVN r28687.
2013-07-01 12:40:08 +00:00
Rolf vandeVaart
5ebb74bee3 Fix case where amount of data sent is less than expected. Otherwise, we will get hang when running the RGET protocol.
Reviewed by hjelm,bosilca.

This commit was SVN r28667.
2013-06-21 18:35:16 +00:00
Rolf vandeVaart
52ebb0b17f Change some opal_output to OPAL_OUTPUT per CMR review.
This commit was SVN r28510.
2013-05-14 20:49:42 +00:00
Nathan Hjelm
4e95d691a7 pml/ob1: do not reset the convertor if one was not created (size = 0).
This macro is only used on the failure path so the additional if statement
should not have any affect on performance.

cmr:v1.7

This commit was SVN r28292.
2013-04-05 01:40:11 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Brian Barrett
65109de931 Fix leak of comm and datatype references for mprobe/improbe and fix a request leak in improbe
This commit was SVN r28157.
2013-03-07 21:55:22 +00:00
Rolf vandeVaart
ebe63118ac Remove dependency on libcuda.so when building in CUDA-aware support. Dynamically load it if needed.
This commit was SVN r28140.
2013-03-01 13:21:52 +00:00
Ralph Castain
8d2fa3693b First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package.
This commit was SVN r28116.
2013-02-26 20:44:56 +00:00
Brian Barrett
312f37706e In talking about this with Jeff and Ralph, we don't actually need
ompi_show_help, because opal_show_help is replaced with an 
aggregating version when using ORTE, so there's no reason to
directly call orte_show_help.

This commit was SVN r28051.
2013-02-12 21:10:11 +00:00
Rolf vandeVaart
729caaf0cd Remove any dependency on libcuda.so in opal layer. All changes are within OMPI_CUDA_SUPPORT code.
This commit was SVN r27986.
2013-01-30 23:07:32 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Rolf vandeVaart
f63c88701f Improve CUDA GPU transfers over openib BTL. Use aynchronous copies.
This is RFC that was submitted in July and December of 2012.

This commit was SVN r27862.
2013-01-17 22:34:43 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occuring before the command line parser changes but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Ralph Castain
6aac54b02e Revert r27510, r27509, and r27508.
Not sure what happened here, but the resulting trunk wouldn't even configure. After spending time fixing that problem, I found it wouldn't compile due to multiple syntax errors that had been introduced in both the OPAL and OMPI layer. This raised questions as to the completeness of the work.

Given that the author is departing, I pinged Jeff about it and we agreed to revert this for now. Hopefully, it can either be fixed by the author prior to actual departure, or someone else can pick it up (now that it is in the history) and fix it.

This commit was SVN r27511.

The following SVN revision numbers were found above:
  r27508 --> open-mpi/ompi@12c3c743de
  r27509 --> open-mpi/ompi@79e4a8ca38
  r27510 --> open-mpi/ompi@1ad5ff625a
2012-10-27 16:43:45 +00:00
Shiqing Fan
1ad5ff625a Don't know why these lines are modified, probably by subversion. Although it is harmless, change it back as what it was.
This commit was SVN r27510.
2012-10-27 03:04:45 +00:00
Shiqing Fan
12c3c743de Per the MemPin RFC, submit the component source files, and update the memchecker macros.
This commit was SVN r27508.
2012-10-27 02:48:20 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Eugene Loh
25ad84b925 Ensure that MPI_Status objects have proper alignment:
- fix the Fortran layer to use new macros to convert Fortran-to-C status
- change the C internals to pull out old OMPI_SET_STATUS* macros

Also, change name of "status" argument in topo_test_f.c to "topo_type".

This commit was SVN r27403.
2012-10-04 14:39:51 +00:00
George Bosilca
890dedf13f Cleanup.
This commit was SVN r27372.
2012-09-26 09:44:46 +00:00
Nathan Hjelm
4bde7f3efe silence warning
This commit was SVN r27019.
2012-08-13 17:43:36 +00:00
Nathan Hjelm
702e6d5a68 pml/ob1: fix bugs in mca_pml_ob1_recv_request_progress_rget
This commit was SVN r27018.
2012-08-13 16:26:06 +00:00
Yael Dayan
7895cd1114 adding a fragmentation mechanism to the Get flow in function mca_pml_ob1_recv_request_progress_rget
This commit was SVN r26956.
2012-08-07 07:15:21 +00:00
George Bosilca
3a8478827b Fix the MPI_Cancel issue identified by Fujitsu. And a typo.
This commit was SVN r26871.
2012-07-26 14:06:24 +00:00
George Bosilca
118e30a2ac Typo.
This commit was SVN r26865.
2012-07-25 12:42:31 +00:00
George Bosilca
b6f4bc9656 size_t not int everywhere. Correctly compute with size_t (don't initialize
it to a negative number). Get rid of the multiplication in the critical
path, and keep the functions as simple as possible.

This commit was SVN r26864.
2012-07-25 12:41:53 +00:00
Yael Dayan
8cad1d6481 while working on a fix for the Get flow in pml, I've encountered a problem in "mca_pml_ob1_compute_segment_length" function, at pml_ob1.h file.
The return value from this function was truncated from size_t to int. This fix changes the return value to type size_t.

This commit was SVN r26863.
2012-07-25 12:08:41 +00:00
Jeff Squyres
9cb3f31b50 Odd; this compiled on OS X without needing #include "opal_stdint.h".
Linux appears to need it.  Shrug.

This commit was SVN r26858.
2012-07-24 13:47:24 +00:00
Jeff Squyres
0b4a659683 Stomp some compiler warnings; use proper printf sequences for uint64_t.
This commit was SVN r26856.
2012-07-24 13:03:55 +00:00
George Bosilca
55bc3c4763 Fix the copyright.
This commit was SVN r26848.
2012-07-24 00:20:24 +00:00
George Bosilca
1ad6c82015 Implement the dump function for the PML OB1.
This commit was SVN r26847.
2012-07-24 00:19:18 +00:00
Abhishek Kulkarni
1878f276cd Replace the pattern while(flag) { opal_progress() }; in the C/R code
with the ORTE_WAIT_FOR_COMPLETION macro.

This commit was SVN r26794.
2012-07-13 23:31:56 +00:00
Abhishek Kulkarni
5c58a1c9c1 Fix C/R support in the trunk.
Among other things, this patch deals with the following issues:
* fix ompi-checkpoint argument parsing
* ompi-restart -showme prints an extraneous "Restarted child with PID" 
  message. Move around the debug statement to avoid this.
* fixes for the state machine changes

This commit was SVN r26770.
2012-07-09 23:34:13 +00:00
George Bosilca
46e808c940 Release the lock a little later.
This commit was SVN r26762.
2012-07-08 12:57:00 +00:00
George Bosilca
ee1410a8d2 A size_t variable should not start at a negative value.
This commit was SVN r26761.
2012-07-08 12:56:33 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Nathan Hjelm
249066e06d Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Nathan Hjelm
fbd1636ea4 fix seg fault when size == 0
This commit was SVN r26612.
2012-06-15 16:58:23 +00:00
Nathan Hjelm
0d13cbf11c ob1: bug fix. put fallback on send never actually worked. fixed.
This commit was SVN r26602.
2012-06-14 17:29:58 +00:00
Nathan Hjelm
a809881f78 ob1: reset the converter after a failed sendi before trying send
This commit was SVN r26597.
2012-06-12 15:44:47 +00:00
Nathan Hjelm
59e529cf1d ob1: as per developer discussion disable rdma retries. the failure path currently suffers from live-lock
This commit was SVN r26575.
2012-06-07 23:31:20 +00:00
Nathan Hjelm
a32d4c648d ob1: rewind convertor after failed send
This commit was SVN r26395.
2012-05-07 17:22:22 +00:00
Nathan Hjelm
0eb18b9699 ob1: update copyrights
This commit was SVN r26331.
2012-04-24 20:19:15 +00:00
Nathan Hjelm
0a0e487d9c ob1: add emacs mode/indentation defaults
This commit was SVN r26330.
2012-04-24 20:19:06 +00:00
Nathan Hjelm
9a35f96bda ob1: add support for get fallback on put/send
This commit was SVN r26329.
2012-04-24 20:18:56 +00:00
Nathan Hjelm
93780c63be replace tabs w/ spaces
This commit was SVN r26328.
2012-04-24 20:18:45 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Jeff Squyres
fa8980157a Fix typo.
This commit was SVN r26183.
2012-03-23 00:12:32 +00:00
Brian Barrett
cce936b94c * Implement matched probe for the CM PML. Required adding a peer field to
the ompi_message_t structure to properly initialize convertor (the peer
  is available in the request in OB1, and wasn't needed when I did the
  original implementation).
* Implement matched probe for the Portals4 MTL and add NULL function pointers
  for the other MTLs.
* Add add_comm and del_comm functions to portals4 MTL so that direct call
  almost works again.
* Add NEWS item that we've implemented matched probe

This commit was SVN r26180.
2012-03-22 22:55:59 +00:00
Nathan Hjelm
f1525bdbff ob1: fix two fragment leaks
- MAJOR! get src descriptor leaks if mca_bml_base_send fails
 - minor. descriptor leaked in mca_pml_send_request_start_copy if the btl returns OMPI_ERR_RESOURCE_BUSY.

This commit was SVN r26077.
2012-03-01 15:53:39 +00:00
Rolf vandeVaart
b0a84b0a7d New btl that extends sm btl to support GPU transfers within a node.
Uses new CUDA IPC support.  Also, a few minor changes in PML to take
advantage of it.

This code has no effect unless user asks for it explicitly via 
configure arguments.  Otherwise, it is either #ifdef'ed out or
not compiled.

This commit was SVN r26039.
2012-02-24 02:13:33 +00:00
Brian Barrett
25d48e22fa Implementation of the MPI-3 Matched Probe functionality. Currently only
implemented in the OB1 PML, will return NOT_SUPPORTED in other PMLs.

This commit was SVN r25865.
2012-02-06 17:35:21 +00:00
Nathan Hjelm
bb1fec0407 added put/get btl descriptor flags
This commit was SVN r25553.
2011-11-30 21:37:23 +00:00
George Bosilca
0bd2bf9aae The number of segments accepted should be bounded by MCA_BTL_DES_MAX_SEGMENTS
and not by 2.

This commit was SVN r25515.
2011-11-28 17:19:12 +00:00
Nathan Hjelm
f8c8c641f1 added asserts to warn developers that ob1/csum match fragments do not support more than 2 segments
This commit was SVN r25514.
2011-11-28 16:12:25 +00:00
Nathan Hjelm
8962ce25b0 fixed some compiler errors caused by seg_key changes. osc/rdma may need to be updated to use btls that use 128 bit segment keys
This commit was SVN r25448.
2011-11-06 20:19:14 +00:00
Ralph Castain
2eaadcfab9 Remove unused variable
This commit was SVN r25284.
2011-10-14 15:32:18 +00:00
George Bosilca
3241bea696 Apply a patch provided by Sébastien Boisvert fixing an issue
with the probe fairness.

This commit was SVN r25265.
2011-10-11 20:28:33 +00:00
George Bosilca
4fd78c4683 Keep track of the last probe on each communicator, so we can probe all
peers in a round-robin fashion. A little bit more fair ...

This commit was SVN r25264.
2011-10-11 20:24:54 +00:00
Rolf vandeVaart
3d3b3d4dad Add support for CUDA registering sm and openib buffers. Feature is disabled by default.
This commit was SVN r24987.
2011-08-04 10:15:45 +00:00
Jeff Squyres
b2b781e537 Fix a few miscelaneous memory leaks.
This commit was SVN r24865.
2011-07-08 16:39:58 +00:00
Josh Hursey
045035963a Fix return code from MPI_Probe and MPI_Iprobe.
Instead of returning MPI_SUCCESS every time they are called regardless of the status of the call, they should return a value representative of the action. So similar to MPI_Wait/MPI_Test they will return MPI_SUCCESS if the action was successfull, or the value that matches status.MPI_ERROR for the operation if it is unsuccessful.

This was discussed on the [http://www.open-mpi.org/community/lists/devel/2011/03/9109.php ompi-devel list]

This commit was SVN r24551.
2011-03-22 13:29:29 +00:00
Eugene Loh
2770a12beb Continue clean up of thread options started in r22841, 22842, and 22849.
No need for any CMRs to 1.5... that was already done in CMR 2728.

This commit was SVN r24545.

The following SVN revision numbers were found above:
  r22841 --> open-mpi/ompi@b400b84162
2011-03-18 21:36:35 +00:00
Shiqing Fan
f43862420c Convert the bad dos line endings to unix style for all windows related files.
This commit was SVN r24137.
2010-12-02 12:08:08 +00:00
Rolf vandeVaart
90bbb33919 Move variable declaration at beginning of code block to avoid warnings.
Also, add memchecker code to csum to keep PMLs consistent.

This commit was SVN r24066.
2010-11-17 18:01:56 +00:00
Shiqing Fan
c11bdec1c8 revert r24059, need a better solution for windows build.
This commit was SVN r24063.

The following SVN revision numbers were found above:
  r24059 --> open-mpi/ompi@74927c7ab0
2010-11-17 16:09:07 +00:00
Shiqing Fan
74927c7ab0 Remove unnecessary semi-colons, they break windows build.
This commit was SVN r24059.
2010-11-17 00:39:38 +00:00
Shiqing Fan
ba2dbff82d Check for addressability in MPI_*_init, since buffer passed by the application should have been already allocated, but might be not initialized.
Check in MPI_Start / MPI_Startall for defined-ness of the buffer passed into the send request(s).

This commit was SVN r24054.
2010-11-16 01:01:12 +00:00
Rolf vandeVaart
1aa558558d Add some parentheses to keep PMLs consistent.
This commit was SVN r24021.
2010-11-09 18:51:32 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Rolf vandeVaart
20c5e6e0d6 Fix a few more cases where we are using a function
as an argument to a macro which could result in it
being called twice.  I did not observe any issues,
but it should be fixed.  Also did some minor refactoring
for clarity and following code convention.

This commit was SVN r23886.
2010-10-12 20:11:48 +00:00
Rolf vandeVaart
f808dd2881 Cosmetic changes to fix spaces. No code change.
This commit was SVN r23803.
2010-09-27 21:01:49 +00:00
Rolf vandeVaart
3cc1fa45bf Fix a few more extraneous spaces. Also update csum
priority logic to match ob1.

This commit was SVN r23798.
2010-09-24 13:14:18 +00:00
Rolf vandeVaart
0331889495 Some more spaces, tabs, include file ordering changes.
No real code changes here.  

This commit was SVN r23789.
2010-09-22 13:48:22 +00:00
Rolf vandeVaart
77560269f2 More fixes of spaces, tabs, and ordering of include files
to make the 3 PMLs the same where they are the same.  No
real code changes.

This commit was SVN r23779.
2010-09-20 21:22:33 +00:00
Ralph Castain
40a2bfa238 WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.

Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.

This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Rolf vandeVaart
65e8277add Mostly fixes for tabs, spaces and indentations.
Also, some other changes to bring the csum PML up
to date with changes that happened in ob1 over the
last two years. This includes a few bug
fixes and some minor refactoring.  

This commit was SVN r23757.
2010-09-15 18:48:06 +00:00
Rolf vandeVaart
3bb587937a Just fix up some trailing spaces, tabs instead of spaces,
missing periods on copyrights, extraneous spaces on blank
lines.  No actual code change.

This commit was SVN r23739.
2010-09-10 21:01:52 +00:00
Josh Hursey
e12ca48cd9 A number of C/R enhancements per RFC below:
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php

Documentation:
  http://osl.iu.edu/research/ft/

Major Changes: 
-------------- 
 * Added C/R-enabled Debugging support. 
   Enabled with the --enable-crdebug flag. See the following website for more information: 
   http://osl.iu.edu/research/ft/crdebug/ 
 * Added Stable Storage (SStore) framework for checkpoint storage 
   * 'central' component does a direct to central storage save 
   * 'stage' component stages checkpoints to central storage while the application continues execution. 
     * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress) 
     * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching) 
 * Added Compression (compress) framework to support 
 * Add two new ErrMgr recovery policies 
   * {{{crmig}}} C/R Process Migration 
   * {{{autor}}} C/R Automatic Recovery 
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component 
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option) 
   * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342) 
   * {{{OMPI_CR_Restart}}} 
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules) 
   * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192) 
   * {{{OMPI_CR_Quiesce_start}}} 
   * {{{OMPI_CR_Quiesce_checkpoint}}} 
   * {{{OMPI_CR_Quiesce_end}}} 
   * {{{OMPI_CR_self_register_checkpoint_callback}}} 
   * {{{OMPI_CR_self_register_restart_callback}}} 
   * {{{OMPI_CR_self_register_continue_callback}}} 
 * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future. 
 * Add a progress meter to: 
   * FileM rsh (filem_rsh_process_meter) 
   * SnapC full (snapc_full_progress_meter) 
   * SStore stage (sstore_stage_progress_meter) 
 * Added 2 new command line options to ompi-restart 
   * --showme : Display the full command line that would have been exec'ed. 
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413) 
 * Deprecated some MCA params: 
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir 
   * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir 
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared 
   * snapc_base_store_in_place deprecated, replaced with different components of SStore 
   * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref 
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported 
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem 

Minor Changes: 
-------------- 
 * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing. 
 * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components 
 * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it. 
 * Fixes trac:2208 : Honor various TMPDIR varaibles instead of forcing {{{/tmp}}} 
 * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set. 
 * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality. 
 * Cleanup the CRS framework and components to work with the SStore framework. 
 * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably). 
 * Add 'quiesce' hook to CRCP for a future enhancement. 
 * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}. 
 * Add optional application level INC callbacks (registered through the CR MPI Ext interface). 
 * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive. 
 * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked. 
 * {{{opal-restart}}} also support local decompression before restarting 
 * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata 
 * {{{orte-restart}}} now uses the SStore framework to work with the metadata 
 * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality. 
 * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}. 
 * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped. 
 * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away. 
 * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options. 
 * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities. 
 * Improve the checks for 'already checkpointing' error path. 
 * A a recovery output timer, to show how long it takes to restart a job 
 * Do a better job of cleaning up the old session directory on restart. 
 * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment) 
 * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize. 

This commit was SVN r23587.

The following Trac tickets were found above:
  Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
  Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
  Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
  Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
  Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
  Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
  Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
2010-08-10 20:51:11 +00:00
Rolf vandeVaart
0324fdb407 Created two new macros that are used when filling in either the
status structure or the _ucount field in the status structure.
On 64-bit sparc, the macros resolve into integer array assignments.
For all others, they are just simple assignments.  This fixes 
possible BUS errors seen when running on the SPARC processor.
This bug was introduced when the _count field changed from an int
into a size_t.  See the changes to request.h for additional details.

This commit fixes trac:2514.

This commit was SVN r23554.

The following Trac tickets were found above:
  Ticket 2514 --> https://svn.open-mpi.org/trac/ompi/ticket/2514
2010-08-04 19:36:40 +00:00
George Bosilca
733d25a8a3 First step toward fixing the MPI_Get_count issues from the ticket #2241. Next
step is the configure and Fortran mojo that Jeff will put in. Until then I
guess the Fortran interface is broken (at least all functions using the hidden
count firld in the MPI_Status).

This commit was SVN r23467.
2010-07-21 20:07:00 +00:00
Jeff Squyres
c8bb7537e7 Remove include/opal/sys/cache.h -- its only purpose in life was to
#define CACHE_LINE_SIZE to 128.  This name has a conflict on NetBSD,
and it seems kinda odd to have a header file that ''only'' defines a
single value.  Also, we'll soon be raising hwloc to be a first-class
item, so having this file around seemed kinda weird.

Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an
int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the
rationale that we can fill this in at runtime with hwloc info (trunk
and v1.5/beyond, only).  The only place we ''needed'' a compile-time
CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a
new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value
(128).  That use isn't suitable for run-time hwloc information,
anyway.

This commit was SVN r23349.
2010-07-06 14:33:36 +00:00
Rolf vandeVaart
03b3e75f86 Add two arguments to the PML error callback function. This
allows the BTL to specify a specific ompi_proc_t that had an
error.  Also add an optional descriptive string.  Currently, arguments
are not used but will be by future failover PML. 
Changes based on RFC.  Reviewed by George Bosilca.

This commit was SVN r23174.
2010-05-19 11:55:45 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
George Bosilca
321213e779 Fix segmentation fault on heterogeneous architectures. Don't mess with the
ompi_ptr_t by translating into void*. Instead keep it as an ompi_ptr_t all
the way. Thanks to Timur Magomedov for helping to track down this issue and
test the patch.

cmr:v1.4
cmr:v1.5

This commit was SVN r23030.
2010-04-23 15:14:55 +00:00
Rolf vandeVaart
0adb570693 Add pml_ob1_verbose flag. Fix the current location it is being used
This commit was SVN r22939.
2010-04-07 13:51:42 +00:00
Josh Hursey
e9b5162d79 Fix the configure logic for --with-ft so that it properly takes a comma separated list.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.

The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.

This commit was SVN r22824.
2010-03-12 23:57:50 +00:00
Shiqing Fan
ad763c327d Restore several linked libraries that were deleted by mistake in r22405.
This commit was SVN r22415.

The following SVN revision numbers were found above:
  r22405 --> open-mpi/ompi@872a4047ba
2010-01-14 21:50:42 +00:00
Shiqing Fan
872a4047ba Fix the bug that caused by ADD_DEPENDENCIES() from different version of CMake.
In CMake 2.6 and earlier, this function add dependencies for targets and also link the target libraries automatically, but in CMake 2.8,this behavior has been changed, i.e. it will only add the dependencies but no link, which will cause linking errors at compilation time.

This commit was SVN r22405.
2010-01-14 18:10:20 +00:00
Jeff Squyres
4f68dfb03c Remove some dead code (thanks to George for pointing it out).
This commit was SVN r22309.
2009-12-14 21:20:41 +00:00
George Bosilca
16c6370b73 A little bit of cleanup, the main logic is still the same.
This commit was SVN r22043.
2009-10-01 14:05:25 +00:00
Brian Barrett
07d49e982b hdr_ctx is a uint16, so can have CIDs in range of 0 ... 2^16 - 1. I think
someone (me?) must have done 2^(16 - 1) instead.  Ooops.

This commit was SVN r21869.
2009-08-22 05:21:01 +00:00
Rainer Keller
8e1b23779f - Replace combinations of
#if defined (c_plusplus)
          defined (__cplusplus)
   followed by
      extern "C" {
   and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.

   Notable exceptions are:
    - opal/include/opal_config_bottom.h:
      This is our generated code, that itself defines BEGIN_C_DECL and
      END_C_DECL
    - ompi/mpi/cxx/mpicxx.h:
      Here we do not include opal_config_bottom.h:                                 
    - Belongs to external code:                                                    
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c        
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h        
    - opal/include/opal/prefetch.h:
      Has C++ specific macros that are protected:                                  

    - Had #if ... } #endif  _and_ END_C_DECLS (aka end up with 2x
      END_C_DECLS)
      ompi/mca/btl/openib/btl_openib.h
    - opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
    - opal/win32/ompi_process.h: had extern "C"\n {...
      opal/win32/ompi_process.h: dito
    - ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
      ompi/mpi/f90/test/align_c.c: dito
    - ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
    - ompi/mpi/f90/xml/common-C.xsl: Amend

   Tested on linux using --with-openib and --with-mx

   The following do not contain either opal_config.h, orte_config.h or
   ompi_config.h
   (but possibly other header files, that include one of the above):
      ompi/mca/bml/r2/bml_r2_ft.h
      ompi/mca/btl/gm/btl_gm_endpoint.h
      ompi/mca/btl/gm/btl_gm_proc.h
      ompi/mca/btl/mx/btl_mx_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_frag.h
      ompi/mca/btl/ofud/btl_ofud_proc.h
      ompi/mca/btl/openib/btl_openib_mca.h
      ompi/mca/btl/portals/btl_portals_endpoint.h
      ompi/mca/btl/portals/btl_portals_frag.h
      ompi/mca/btl/sctp/btl_sctp_endpoint.h
      ompi/mca/btl/sctp/btl_sctp_proc.h
      ompi/mca/btl/tcp/btl_tcp_endpoint.h
      ompi/mca/btl/tcp/btl_tcp_ft.h
      ompi/mca/btl/tcp/btl_tcp_proc.h
      ompi/mca/btl/template/btl_template_endpoint.h
      ompi/mca/btl/template/btl_template_proc.h
      ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
      ompi/mca/btl/udapl/btl_udapl_endpoint.h
      ompi/mca/btl/udapl/btl_udapl_mca.h
      ompi/mca/btl/udapl/btl_udapl_proc.h
      ompi/mca/mtl/mx/mtl_mx_endpoint.h
      ompi/mca/mtl/mx/mtl_mx.h
      ompi/mca/mtl/psm/mtl_psm_endpoint.h
      ompi/mca/mtl/psm/mtl_psm.h
      ompi/mca/pml/cm/pml_cm_component.h
      ompi/mca/pml/csum/pml_csum_comm.h
      ompi/mca/pml/dr/pml_dr_comm.h
      ompi/mca/pml/dr/pml_dr_component.h
      ompi/mca/pml/dr/pml_dr_endpoint.h
      ompi/mca/pml/dr/pml_dr_recvfrag.h
      ompi/mca/pml/example/pml_example.h
      ompi/mca/pml/ob1/pml_ob1_comm.h
      ompi/mca/pml/ob1/pml_ob1_component.h
      ompi/mca/pml/ob1/pml_ob1_endpoint.h
      ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
      ompi/mca/pml/ob1/pml_ob1_recvfrag.h
      ompi/mca/pml/v/pml_v_output.h
      opal/include/opal/prefetch.h
      opal/mca/timer/aix/timer_aix.h
      opal/util/qsort.h
      test/support/components.h

This commit was SVN r21855.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2009-08-20 11:42:18 +00:00
Shiqing Fan
bce2f44154 Update related .windows files with proper compiling properties, in order to have a successful DSO build.
This commit was SVN r21805.
2009-08-12 08:55:58 +00:00
George Bosilca
51b2cfe40d This header is required to compile the FT.
This commit was SVN r21792.
2009-08-11 05:21:27 +00:00
George Bosilca
9c2b993589 Complete r21778 by adding the missing headers.
This commit was SVN r21784.

The following SVN revision numbers were found above:
  r21778 --> open-mpi/ompi@e4d52b16b5
2009-08-10 17:07:43 +00:00
Terry Dontje
e4d52b16b5 Add in eager limit checks in pmls.
This commit was SVN r21778.
2009-08-10 12:46:20 +00:00
Donald Kerr
de6a7f57b0 fix #1984; only decrement send request req_state when not equal to zero
This commit was SVN r21775.
2009-08-07 14:58:50 +00:00
Rolf vandeVaart
c82e468ede Undo revision r21767 - sorry folks
This commit was SVN r21769.

The following SVN revision numbers were found above:
  r21767 --> open-mpi/ompi@41f38110ff
2009-08-05 22:23:26 +00:00
Rolf vandeVaart
41f38110ff HCA failover support in openib BTL
This commit was SVN r21767.
2009-08-05 21:53:02 +00:00
George Bosilca
cf8bd2142a Various cleanups and typos.
This commit was SVN r21765.
2009-08-05 03:12:33 +00:00
Terry Dontje
d432c9fdbc Add asserts to catch when btl_eager_limit is smaller than the pml headers.
This commit was SVN r21707.
2009-07-17 14:54:18 +00:00
George Bosilca
e1383027e1 Correct a comment and cleanup/reorder the code.
This commit was SVN r21696.
2009-07-16 17:41:32 +00:00
Rainer Keller
6c5532072a - Split the datatype engine into two parts: an MPI specific part in
OMPI
   and a language agnostic part in OPAL. The convertor is completely
   moved into OPAL.  This offers several benefits as described in RFC
   http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
   namely:
    - Fewer basic types (int* and float* types, boolean and wchar
    - Fixing naming scheme to ompi-nomenclature.
    - Usability outside of the ompi-layer.
 - Due to the fixed nature of simple opal types, their information is
   completely
   known at compile time and therefore constified
 - With fewer datatypes (22), the actual sizes of bit-field types may be
   reduced
   from 64 to 32 bits, allowing reorganizing the opal_datatype
   structure, eliminating holes and keeping data required in convertor
   (upon send/recv) in one cacheline...
   This has implications to the convertor-datastructure and other parts
   of the code.
 - Several performance tests have been run, the netpipe latency does not
   change with
   this patch on Linux/x86-64 on the smoky cluster.
 - Extensive tests have been done to verify correctness (no new
   regressions) using:
   1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
    ompi-ddt:
    a. running both trunk and ompi-ddt resulted in no differences
       (except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
       correctly).
    b. with --enable-memchecker and running under valgrind (one buglet
       when run with static found in test-suite, commited)
   2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      all passed (except for the dynamic/ tests failed!! as trunk/MTT)
   3. compilation and usage of HDF5 tests on Jaguar using PGI and
      PathScale compilers.
   4. compilation and usage on Scicortex.
 - Please note, that for the heterogeneous case, (-m32 compiled
   binaries/ompi), neither
   ompi-trunk, nor ompi-ddt branch would successfully launch.

This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
George Bosilca
f9a510fd8a There is no need for an atomic read if we are not in a threaded case.
This commit was SVN r21394.
2009-06-08 23:55:52 +00:00
George Bosilca
be320ca959 Don't add the offset to all segments, only the first one should be affected. Thanks
to Roberto Ammendola for this bug report and patch.

This commit was SVN r21300.
2009-05-27 16:12:18 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Shiqing Fan
cd565923d3 Completely remove ltdl support for Windows build.
This commit was SVN r21170.
2009-05-05 18:59:13 +00:00
George Bosilca
271eb11f28 Remove an unused statically defined function.
This commit was SVN r21157.
2009-05-05 13:23:49 +00:00
Brian Barrett
736debcffc Check during communicator creation that we didn't get assigned a CID we can't handle, so that the code aborts instead of hange.
Refs trac:1904

This commit was SVN r21133.

The following Trac tickets were found above:
  Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904
2009-04-30 19:23:57 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Shiqing Fan
3d4e0472d6 Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Nysal Jan
5353236a53 Reduce the size of FIN header by 8 bytes. This is done by rearranging the fields to reduce the amount of compiler padding
This commit was SVN r21046.
2009-04-21 14:41:51 +00:00
George Bosilca
b5deb228f3 Allow the BTL to release the descriptor. In fact the only thing the PML
needs is to be involved in the RMA completion process, which is insured
by the MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag. Fixes trac:1875.

This commit was SVN r20983.

The following Trac tickets were found above:
  Ticket 1875 --> https://svn.open-mpi.org/trac/ompi/ticket/1875
2009-04-13 23:41:50 +00:00
George Bosilca
527540aeb1 Rename req_bytes_delivered to req_bytes_expected for the receive
requests to really reflect what this field means.

This commit was SVN r20971.
2009-04-10 16:36:20 +00:00
George Bosilca
c148d33eb5 Play nicely with the reference count on the ompi_proc structure.
This commit was SVN r20970.
2009-04-10 16:32:02 +00:00
George Bosilca
dfc7cea329 Fix the deadlock issues on the osu_bw. The problem is that the PML is
event driver, and if there are no event generated by the BTLs ... well
nothing happens (i.e there is no progress at the PML level and all
pending fragments remain pending). By forcing the BTL to trigger the
callbacks for all ACK and FIN, we give more opportunities to the PML
to do real progress, but we pay this in terms of performance.

This commit was SVN r20953.
2009-04-07 16:56:37 +00:00
George Bosilca
44ce610b8b Add a comment to highlight the fact that this function reappend the
FIN message to the pending list when the send fails. Therefore, any
upper level function is not required to add it.
Make sure we don't send the FIN twice.

This commit was SVN r20952.
2009-04-07 16:48:58 +00:00
George Bosilca
ccb79b963f This is the other half of the commit r20946 as I mess them up between
two of my testing machines. The fix require both commits!

This commit was SVN r20947.

The following SVN revision numbers were found above:
  r20946 --> open-mpi/ompi@e2bb4c9b8f
2009-04-06 21:49:52 +00:00
George Bosilca
e2bb4c9b8f Correct the handling of the pckt_pending list. The problem was that
we returned the pck before coping the values out. With this change
it seems to work at least on two architectures (even with the 
mpool size set back to 0).

This commit was SVN r20946.
2009-04-06 21:45:08 +00:00
Ralph Castain
f72e3ba9f9 Update the PML base send init macro to take a converter_flag field (discussed with George).
Update the csum pml module - still not quite right, but closer.

Modify the LANL platform files to keep pace.

This commit was SVN r20859.
2009-03-24 19:12:53 +00:00
George Bosilca
daba352af4 As the request is not yet updated (i.e. _MATCHED cannot be called as we don't yet know the
expected length of the message) we should use the source and tag from the message header
instead of the value from the status structure attached to the request.
-This line, and those below, will be ignored--

M    pml_ob1_recvreq.c

This commit was SVN r20844.
2009-03-23 20:25:53 +00:00
Rainer Keller
6f808d9b05 Preparation work for another commit (after RFC):
- This patch solely _adds_ required headers and is rather localized
   The next patch (after RFC) heavily removes headers (based on script)
 - ompi/communicator/communicator.h: For sources that use
   ompi_mpi_comm_world, don't require them to include "mpi.h"
 - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
   #include "ompi/mca/topo/topo.h"
 - ompi/errhandler/errhandler_predefined.h:
   ompi/communicator/communicator.h depends on this header file!
   To prevent recursion just have fwd declarations.
   #include "ompi/types.h" for fwd declarations of the main structs.
 - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
   ompi_rb_tree_t, so have the proper classes
 - ompi/mca/op/op.h:
   Op is pretty self-contained: Nobody up to now has done
   #include "opal/class/opal_object.h"
 - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
   #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/pml/base/base.h:
   We use opal_lists  
 - ompi/mca/pml/dr/pml_dr_vfrag.h:
   #include "opal/types.h" for ompi_ptr_t
 - ompi/mca/pml/ob1/pml_ob1_hdr.h:
   #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
 - opal/dss/dss_unpack.c:
   #include "opal/types.h"
 - opal/mca/base/base.h:
   #include "opal/util/cmd_line.h" for opal_cmd_line_t
 - orte/mca/oob/tcp/oob_tcp.c:
   #include "opal/types.h" for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp.h:
   #include "opal/threads/threads.h" for opal_thread_t
 - orte/mca/oob/tcp/oob_tcp_msg.c:
   #include "opal/types.h" 
 - orte/mca/oob/tcp/oob_tcp_peer.c:
   #include "opal/types.h"  for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp_send.c:
   #include "opal/types.h" 
 - orte/mca/plm/base/plm_base_proxy.c:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT
 - orte/mca/rml/base/rml_base_receive.c:
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/mca/rml/oob/rml_oob_recv.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/mca/rml/oob/rml_oob_send.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/runtime/orte_data_server.c
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/runtime/orte_globals.h:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT

 Tested on Linux/x86-64

This commit was SVN r20817.
2009-03-17 21:34:30 +00:00
Eugene Loh
efe8c3a283 Initialize reuse_old_request properly at the beginning of each loop iteration in pml_ob1_start.c.
This commit was SVN r20712.
2009-03-04 06:58:36 +00:00
Rainer Keller
811f2bd9b4 - As discussed on RFC, move the ompi_bitmap to the
opal layer.
   Add a check against a maximum (actually get rid of ifs internally to
   opal_bitmap.c) -- the functionality to set the current maximum size
   opal_bitmap_set_max_size() is currently only used in attribute.c
   to set the maximum OMPI_FORTRAN_HANDLE_MAX...

   Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f
   run with 6 procs.
   Let's look into MTT as well...

This commit was SVN r20708.
2009-03-03 22:25:13 +00:00
Rainer Keller
02416033ad - Get rid of warning on function declarations:
First "static inline", then the type

This commit was SVN r20657.
2009-02-28 14:15:34 +00:00
George Bosilca
e181ba50c9 Stop valgrind from complaining about few uninitialized bytes on the PML
headers. This feature is enabled only in debug mode when the heterogeneous
support is enabled.

This commit was SVN r20648.
2009-02-27 05:24:06 +00:00
Rainer Keller
96e1b9b747 - Header orte/mca/rml/rml.h is not needed if no occurence of orte_rml
or ORTE_RML.
   As the others compiles fine with -Wimplicit-function-declaration

This commit was SVN r20639.
2009-02-26 03:52:31 +00:00
Rainer Keller
b356e90fa1 - Get rid of include orte/util/proc_info.h, if not needed
Only proc_info.h-internal include file is opal/dss/dss_types.h
 - In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   works fine, no errors.

   Again, let's have MTT the last word.

This commit was SVN r20631.
2009-02-25 03:38:00 +00:00
Terry Dontje
0178b6c45f Added padding to predefined handle structures to maintain library version to
version compatibility.

This commit was SVN r20627.
2009-02-24 17:17:33 +00:00
George Bosilca
97a2296fdd Correct the GET protocol. Thanks to Mike Dubman for finding the problem and
testing my patch.

This commit was SVN r20591.
2009-02-19 16:00:15 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or               
   orte/mca/rml/rml_types.h                                           
   Please see orte_show_help_replacement.sh commited next.            

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"     
   in orte/mca/odls/base/odls_base_default_fns.c and                  
   in orte/tools/orte-top/orte-top.c                                  
   Manually added these.                                              

   Let's have MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Jeff Squyres
91415c2996 Some minor valgrind-inspired cleanups: fix some memory leaks
This commit was SVN r20542.
2009-02-13 03:45:11 +00:00
George Bosilca
4747a4bb53 ompi_comm_all allocate memory and retain the objects. Therefore, after
each call to ompi_comm_all we should parse the communicator list and
release the objects ...

This commit was SVN r20525.
2009-02-11 21:48:11 +00:00
Jeff Squyres
679e2855b7 Fix CID 1135: the assignment to item was never used (it was
overwritten in the next loop iteration "item = next_item").

This commit was SVN r20189.
2009-01-03 15:15:42 +00:00
Shiqing Fan
a5281f0434 - 1/4 commit for Windows Visual Studio and CCP support:
CMakeLists and .windows files.
  In contribs preconfigured and precompiled parts.

This commit was SVN r20108.
2008-12-10 20:59:20 +00:00
Shiqing Fan
abd21b6d17 - An update for memchecker :
1. fix a bug in pml_ob1_recvreq/sendreq.c, buffer was made defined where the request has already been released.
2. complete memchecker support for collective functions.
3. change the wrongly spelled function name of memchecker, i.e. '*_isaddressible' should be '*_isaddressable'

This commit was SVN r20043.
2008-11-27 16:34:02 +00:00
George Bosilca
82d1d5d785 The patch for "Unexpected message queue for unknown CID's required" ticket #1460.
I'm unable to split it in two parts, my patch and Edgar's one. So I just update
copyright information for both of us.
What this patch do:
- it use the unexpected queue create by commit r19562 to dispatch the
  unexpected message to the right communicator (once this communicator
  is created and initialized).
- delay the PML comm_add until we have the context_id for the new communicator.
- only do the PML comm_add on processes that really belong to the new
  communicator. Please read the lengthy comment in the source code for the
  reason behind this.

This commit was SVN r19929.

The following SVN revision numbers were found above:
  r19562 --> open-mpi/ompi@acd3406aa7
2008-11-04 21:58:06 +00:00
George Bosilca
9260f6b157 There is no reason to ask for an ACK from the BTL here.
This commit was SVN r19789.
2008-10-22 20:13:33 +00:00
Josh Hursey
88aa45dd52 Commit to bring online OpenIB, MX, and shared memory support for Open MPI's checkpoint/restart functionality. Some tuning is still needed, but basic functionality is in place.
There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539

Additionally:
* Fix a file cleanup bug in CRS Base.
* Fix a possible deadlock in the TCP ft_event function
* Add a mca_base_param_deregister() function to MCA base
* Add whole process checkpoint timers
* Add support for BTL: OpenIB, MX,  Shared Memory
* Add support Mpool: rdma, sm
* Sundry bounds checking an cleanup in some scattered functions

This commit was SVN r19756.
2008-10-16 15:09:00 +00:00
George Bosilca
00d24bf8ab Scalability patch, or slim-fast effect #1. All BML structures just
got a whole lot smaller, decreasing the memory footprint of the
running application. How much it's a good question. Here is a
breakdown:

- in mca_bml_base_endpoint_t: 3 *size_t + 1 * uint32_t
- in mca_bml_base_btl_t: 1 * int + 1 * double - 1 * float
                         + 6 * size_t + 9 * (void*)

The decrease in mca_bml_base_endpoint_t is for each peer and the
decrease in mca_bml_base_btl_t is for each BTL for each peer.
So, if we consider the most convenient case where there is only
one network between all peers, this decrease the memory foot print
per peer by
9*size_t + 9*(void*) + 2 * int32_t + 1 * double - 1 * float.
On a 64 bits machine this will be 156 bytes per peer.

Now we access all these fields directly from the underlying BTL
structure, and as this structure is common to multiple BML endpoint,
we are a lot more cache friendly. Even if this do not improve the
latency, it makes the SM performance graph a lot smoother.

This commit was SVN r19659.
2008-09-30 21:02:37 +00:00
George Bosilca
b32e4e7f34 Nothing important, mainly replacing tabs with spaces.
This commit was SVN r19658.
2008-09-30 18:30:35 +00:00
George Bosilca
325d006577 Mostly cleanups, and eventually a little bit more scalable add_procs.
There was an argument that was barely used, and on return at the PML
level it contained nothing usable. It has been removed, so now we're
using less memory ...

This commit was SVN r19657.
2008-09-30 15:47:43 +00:00
Jeff Squyres
d2d06008a0 Change the default value of mpi_leave_pinned to -1, meaning that we'll
figure it out at runtime (really meaning: we'll still default to "0"
unless something explicitly overrides to 1, such as the openib BTL).
This way, ompi_info doesn't confusingly report mpi_leave_pinned==0 for
mpi_leave_pinned, but we end up running with mpi_leave_pinned==1.

Fixes trac:1502.

This commit was SVN r19571.

The following Trac tickets were found above:
  Ticket 1502 --> https://svn.open-mpi.org/trac/ompi/ticket/1502
2008-09-16 22:06:14 +00:00
Jeff Squyres
270f482fea Addendum to r19561: also remove a comment that is no longer true and
some code that is commented out.

This commit was SVN r19564.

The following SVN revision numbers were found above:
  r19561 --> open-mpi/ompi@17e65369be
2008-09-16 13:02:10 +00:00
George Bosilca
acd3406aa7 Never drop messages. No never no more.
This is supposed to fix the ticket #1460.

This commit was SVN r19562.
2008-09-15 23:04:18 +00:00