
404 Commits

Author SHA1 Message Date
Brian Barrett
b2411fe131 Add support for MPI-3's MPI_COMM_SPLIT_TYPE function
This commit was SVN r25738.
2012-01-18 23:35:21 +00:00
Jeff Squyres
6fbbfd0f7a Gah! r25545 accidentally included ''waaaay'' more stuff than it was
supposed to.  I.e., half-baked/not complete stuff.

This commit backs out all of r25545.  Sorry folks!

This commit was SVN r25546.

The following SVN revision numbers were found above:
  r25545 --> open-mpi/ompi@7f9ae11faf
2011-11-29 23:24:52 +00:00
Jeff Squyres
7f9ae11faf Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php,
to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS
X, we need to use the following compiler (linker) flag:

    -Wl,-commons,use_dylibs 

So if we're compiling on OS X, test to see if that flag works with the
compiler.  If so, add it to the wrapper FFLAGS and FCFLAGS (note that
per a future update, we'll only have one Fortran compiler anyway).

Fixes trac:1982.  

This commit was SVN r25545.

The following Trac tickets were found above:
  Ticket 1982 --> https://svn.open-mpi.org/trac/ompi/ticket/1982
2011-11-29 23:05:54 +00:00
Christopher Yeoh
bab59bda76 Fixes trac:2767: Recursive locking when ROMIO is used with THREAD_MULTIPLE
This commit was SVN r24681.

The following Trac tickets were found above:
  Ticket 2767 --> https://svn.open-mpi.org/trac/ompi/ticket/2767
2011-05-04 06:31:42 +00:00
George Bosilca
79b13f36ba darray and subarray are now first class citizens in Open MPI. They can be stored
in packed form and reloaded, as any other type (this is mainly for one sided).

This commit was SVN r24480.
2011-03-02 19:22:24 +00:00
George Bosilca
27fecda12c Allow the one-sided components to correctly retrieve the op to
be applied. Correct the MPI validation process of the
MPI_Accumulate arguments.

Fix another potential problem not yet reported. If we convert the
MPI datatypes directly into OPAL datatypes, we collapse them to the
set of types that are distinct locally, which might not match the set
on the remote node if we are in a heterogeneous environment. So,
for MPI one-sided, deal only with MPI-level types and never simplify
them into OPAL types (at least in the args). The unfortunate
outcome is that we need to create the args for all datatypes.

This commit was SVN r24466.
2011-02-25 20:43:17 +00:00
Josh Hursey
2bdff63e6f Move the INIT to after the error handler so that it matches MPI_INIT. Thanks to Jeff for catching this.
This commit was SVN r24200.
2011-01-03 18:16:53 +00:00
Josh Hursey
0b514e234b MPI_Init_thread is used in place of MPI_Init, so for the checkpoint/restart functionality it must correctly init the C/R functionality instead of simply making a critical section. This allows the C/R thread to be started properly.
Thanks to Takayuki Seki for finding this bug.

This commit was SVN r24194.
2010-12-29 15:37:30 +00:00
Brian Barrett
621344cce4 Remove duplicate DT tests
This commit was SVN r24189.
2010-12-20 23:38:36 +00:00
Shiqing Fan
ba2dbff82d Check for addressability in MPI_*_init, since the buffer passed by the application should already have been allocated, but might not be initialized.
Check in MPI_Start / MPI_Startall for defined-ness of the buffer passed into the send request(s).

This commit was SVN r24054.
2010-11-16 01:01:12 +00:00
Shiqing Fan
066d5fb9d7 The standard does not imply that the contents of the buffer should be defined/addressable at this point. Remove the buffer checks in these functions.
This commit was SVN r23988.
2010-11-03 09:36:24 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone through the code base and converted all the libevent calls I could find. However, I cannot compile or test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described in a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
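The framework arrangement described in this commit (one statically built component that loads its function pointers into a global struct, with no selection logic, plus event objects that must be constructed rather than zeroed with memset) can be illustrated with a small, self-contained C sketch. Every name in it (`struct sketch_event_api`, `sketch_event_construct`, and so on) is hypothetical and only mimics the pattern; it is not the real `opal_event` API. Where the real code would likely segfault on an unconstructed object, this sketch returns an error instead so the difference is observable.

```c
/* Hypothetical "class" descriptor; a constructed event points at it. */
struct sketch_class {
    const char *name;
};

static const struct sketch_class sketch_event_class = { "sketch_event" };

struct sketch_event {
    const struct sketch_class *cls; /* only valid after construction */
    int fd;
};

/* Function-pointer table, analogous in spirit to the opal_event struct. */
struct sketch_event_api {
    int (*event_set)(struct sketch_event *ev, int fd);
};

static int sketch_component_event_set(struct sketch_event *ev, int fd)
{
    if (ev->cls != &sketch_event_class) {
        return -1; /* e.g. an event that was only memset to zero */
    }
    ev->fd = fd;
    return 0;
}

/* No selection logic: the sole compiled component fills in the table. */
static void sketch_event_component_open(struct sketch_event_api *api)
{
    api->event_set = sketch_component_event_set;
}

/* Stand-in for OBJ_CONSTRUCT: sets up the class pointer and defaults. */
static void sketch_event_construct(struct sketch_event *ev)
{
    ev->cls = &sketch_event_class;
    ev->fd = -1;
}
```

Callers then go through the table (`api.event_set(&ev, fd)`), which parallels how the commit replaces `opal_evtimer_new` with `opal_event.evtimer_new`.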
Jeff Squyres
73bcc4a36b Fix a mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake will always create a set of directories under installdir, even if you do not configure --with-devel-headers. The set of directories will be empty, but still -- they should not be there at all. This commit fixes that -- the directories are not created at all if you do not configure --with-devel-headers.
This commit was SVN r23801.

The following SVN revision numbers were found above:
  r23764 --> open-mpi/ompi@40a2bfa238
2010-09-24 22:53:28 +00:00
Ralph Castain
40a2bfa238 WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.

Please note that configure requirements on components HAVE CHANGED. For example, a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.

This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Rolf vandeVaart
0324fdb407 Created two new macros that are used when filling in either the
status structure or the _ucount field in the status structure.
On 64-bit sparc, the macros resolve into integer array assignments.
For all others, they are just simple assignments.  This fixes 
possible BUS errors seen when running on the SPARC processor.
This bug was introduced when the _count field changed from an int
into a size_t.  See the changes to request.h for additional details.

This commit fixes trac:2514.

This commit was SVN r23554.

The following Trac tickets were found above:
  Ticket 2514 --> https://svn.open-mpi.org/trac/ompi/ticket/2514
2010-08-04 19:36:40 +00:00
Jeff Squyres
51a051b072 This commit, along with r23467, r23468, r23470, r23471 should fix #2241.
This commit:

 * Adds the configury to figure out how many Fortran INTEGERs are 
   necessary to represent the C MPI_Status (which now includes a size_t
   member).
 * Sets MPI_STATUS_SIZE to this value in mpif-config.h.in.
 * Adds a big comment in status_c2f.c explaining why no changes
   were necessary to how we copy statuses between Fortran and C.

This commit was SVN r23472.

The following SVN revision numbers were found above:
  r23467 --> open-mpi/ompi@733d25a8a3
  r23468 --> open-mpi/ompi@963fcb13a5
  r23470 --> open-mpi/ompi@418b989781
  r23471 --> open-mpi/ompi@bc74a446ac
2010-07-22 02:23:47 +00:00
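The configure logic in this commit boils down to a ceiling division: the smallest number of Fortran INTEGERs whose combined size covers the C status struct. A minimal C sketch, assuming a hypothetical `struct sketch_status` layout and taking the Fortran INTEGER size as a parameter (the real configury determines both at configure time and writes the result into mpif-config.h.in):

```c
#include <stddef.h>

/* Hypothetical C status layout with a size_t member, similar in spirit
 * to the one described above; not Open MPI's actual struct. */
struct sketch_status {
    int source;
    int tag;
    int error;
    int cancelled;
    size_t ucount;
};

/* Smallest number of Fortran INTEGERs (fint_size bytes each) whose
 * combined storage is at least c_size bytes: a ceiling division. */
static size_t sketch_status_size(size_t c_size, size_t fint_size)
{
    return (c_size + fint_size - 1) / fint_size;
}
```

Rounding up rather than down matters: with a 4-byte INTEGER and a 21-byte struct, 5 INTEGERs would be one byte short.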
Jeff Squyres
418b989781 Divide by size, not status->_count. Gives a much better answer. :-)
This commit was SVN r23470.
2010-07-22 01:53:01 +00:00
Jeff Squyres
963fcb13a5 If the value to be returned is larger than what can be represented in
the count parameter, then invoke MPI_ERR_TRUNCATE.

This commit was SVN r23468.
2010-07-22 01:15:46 +00:00
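Taken together with 418b989781 and 733d25a8a3, the count logic is: divide the received byte count by the datatype size, and report truncation when the element count cannot be represented in the int `count` argument. A hedged sketch of that arithmetic follows. `sketch_get_count` and the `SKETCH_*` constants are hypothetical stand-ins, and the log does not say what value the real code leaves in `count` on truncation, so the sentinel used here is an assumption. Returning an "undefined" count for a partial element follows the MPI_Get_count rule in the MPI standard.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real values come from mpi.h. */
#define SKETCH_SUCCESS       0
#define SKETCH_ERR_TRUNCATE  1
#define SKETCH_UNDEFINED     (-32766)

/* Convert a received byte count (a size_t, like the hidden _ucount field)
 * into an element count for a datatype of dtype_size bytes. */
static int sketch_get_count(size_t ucount, size_t dtype_size, int *count)
{
    if (dtype_size == 0) {
        *count = 0;                         /* assumption: zero-size types yield 0 */
        return SKETCH_SUCCESS;
    }
    if (ucount % dtype_size != 0) {
        *count = SKETCH_UNDEFINED;          /* not a whole number of elements */
        return SKETCH_SUCCESS;
    }
    size_t elements = ucount / dtype_size;  /* divide by the size, not the count */
    if (elements > (size_t)INT_MAX) {
        *count = SKETCH_UNDEFINED;          /* assumption: sentinel on overflow */
        return SKETCH_ERR_TRUNCATE;         /* does not fit in the int parameter */
    }
    *count = (int)elements;
    return SKETCH_SUCCESS;
}
```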
George Bosilca
733d25a8a3 First step toward fixing the MPI_Get_count issues from the ticket #2241. Next
step is the configure and Fortran mojo that Jeff will put in. Until then I
guess the Fortran interface is broken (at least all functions using the hidden
count field in the MPI_Status).

This commit was SVN r23467.
2010-07-21 20:07:00 +00:00
Jeff Squyres
0061f2170d ompi/mpi/c/request_get_status.c (MPI_Request_get_status): If
opal_progress is called then check the status of the request before
returning. opal_progress is called only once.  This logic parallels
MPI_Test (ompi_request_default_test).

Thanks to Shaun Jackman for submitting the patch.

This commit was SVN r23215.
2010-05-27 21:37:11 +00:00
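The "progress once, then re-check" rule from this patch can be shown with a stubbed request and progress function. This is a sketch under assumptions: `struct sketch_request` and the `progress` callback are hypothetical stand-ins for the request object and opal_progress(), not Open MPI's types.

```c
#include <stdbool.h>

/* Hypothetical request object; only the completion flag matters here. */
struct sketch_request {
    bool complete;
};

/* Stand-in for opal_progress(): may complete outstanding requests. */
typedef void (*sketch_progress_fn)(struct sketch_request *req);

/* Report completion without blocking. If the request is not yet done,
 * drive progress exactly once and then re-check, so that work finished
 * by that single progress call is reported to the caller right away. */
static bool sketch_request_get_status(struct sketch_request *req,
                                      sketch_progress_fn progress)
{
    if (req->complete) {
        return true;
    }
    progress(req);
    return req->complete;
}
```

Without the second check, a request completed by that one progress call would only be reported on the caller's next poll.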
Edgar Gabriel
5881719d84 checks for sendcount and recvcount(s) being zero have slightly different
consequences depending on whether the communicator is an intra or an inter
communicator. 

fixes trac:2415

This commit was SVN r23187.

The following Trac tickets were found above:
  Ticket 2415 --> https://svn.open-mpi.org/trac/ompi/ticket/2415
2010-05-20 22:21:26 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
  (OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be an
  SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes an SOS error and returns
  the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
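Why `(OMPI_ERR_* == ret)` stops working, and why `(OPAL_SUCCESS != ret)` is still safe, can be seen with a toy encoding. The bit layout below is invented for illustration and is not OPAL's actual SOS format; `sketch_sos_encode` and `sketch_sos_get_err_code` only play the roles of an SOS encoder and of OPAL_SOS_GET_ERR_CODE().

```c
/* Hypothetical encoding: native error codes are small negative integers,
 * and extra SOS information is packed above the low 8 bits of the
 * magnitude. Success (0) is never encoded. */
#define SKETCH_SUCCESS              0
#define SKETCH_ERR_OUT_OF_RESOURCE  (-2)

static int sketch_sos_encode(int native, int extra)
{
    if (native == SKETCH_SUCCESS) {
        return SKETCH_SUCCESS;          /* success is preserved as-is */
    }
    return -(((extra & 0x7FFF) << 8) | ((-native) & 0xFF));
}

/* Recover the native error code from a (possibly) encoded value. */
static int sketch_sos_get_err_code(int ret)
{
    if (ret >= 0) {
        return ret;
    }
    return -((-ret) & 0xFF);
}
```

An encoded error no longer compares equal to its native code, so equality tests must decode first; a success check needs no decoding because success is never altered.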
Ralph Castain
b400b84162 Merge in the modified thread configure option branch per today's telecon.
Remove the --enable-progress-threads option as this is no longer functional, and hardcode OPAL_ENABLE_PROGRESS_THREADS to 0.

Replace the --enable-mpi-threads option with --enable-mpi-thread-multiple as this is clearer as to meaning. This option automatically turns "on" opal thread support if it wasn't already so specified. If the user specifies --disable-opal-multi-threads --enable-mpi-thread-multiple, we will error out with a message.

Add a new --enable-opal-multi-threads option that turns "on" opal thread support without doing anything wrt mpi-thread-multiple

This commit was SVN r22841.
2010-03-16 23:10:50 +00:00
Jeff Squyres
7b3ac4fb73 Refs trac:2273
After talking to both Brian and George, the consensus was to just
remove the flag and the test function.  Begone, evil spirits, BEGONE!

This commit was SVN r22831.

The following Trac tickets were found above:
  Ticket 2273 --> https://svn.open-mpi.org/trac/ompi/ticket/2273
2010-03-16 00:47:10 +00:00
Christopher Yeoh
27cc40e412 Fixes MPI errhandler set races
See #2103 for details

This commit was SVN r22300.
2009-12-14 03:38:01 +00:00
Jeff Squyres
12520ca711 Just like we relaxed the error checking for MPI_CART_CREATE (r21816),
we should have also relaxed the error checking for MPI_GRAPH_CREATE.
Thanks to David Singleton for pointing this out.

This commit was SVN r22251.

The following SVN revision numbers were found above:
  r21816 --> open-mpi/ompi@b8332ea2b2
2009-12-01 21:50:39 +00:00
Brian Barrett
b57b8c5b3f Clean up request handling in the I/O framework to be more consistent with
other request-using frameworks.

 - Rather than having mpi/c/* functions allocate requests explicitly,
   pass the MPI_Request* down to the I/O component and have it 
   perform the allocation.
 - While the I/O base provides a base request which can be used,
   it is not required and all request management occurs within
   the component.
 - Push progress management into the component, rather than having it
   happen in the base.  Progress functions are now easily registered,
   and not all (ie, the one existing) components use progress functions
   in any rational way.

ROMIO switched to generalized requests instead of MPIO_Requests many
moons ago, and Open MPI now uses ROMIO's generalized requests, so there
is no reason to wrap those requests (which are OMPI requests) in another
level of request.

Now the file function passes the MPI_Request* to the ROMIO component,
which passes it to the underlying ROMIO function, which calls 
MPI_Grequest_start to create an OMPI request, which is what gets set
as the request to the user.  Much cleaner.

This patch has two motivations.  One, a whole heck of a lot of code
just got removed, and request handling is now much cleaner for I/O
components.  Two, by adding support for Argonne's proposed generalized
request extensions, we can allow ROMIO to provide async I/O through
generalized requests, which we couldn't rationally do in the old
setup due to the crazy request completion rules.

This commit was SVN r22235.
2009-11-26 05:13:43 +00:00
Jeff Squyres
ac21b4f571 Make MPI_GROUP_INCL|EXCL and MPI_GROUP_TRANSLATE_RANKS a bit more
social when array_size==0 is passed in.  Thanks to Lisandro Dalcin for
pointing this out.

This commit was SVN r22144.
2009-10-26 21:32:15 +00:00
Jeff Squyres
bf6e3d4355 Fixes trac:2061: add MPI_OP_COMMUTATIVE.
This commit was SVN r22128.

The following Trac tickets were found above:
  Ticket 2061 --> https://svn.open-mpi.org/trac/ompi/ticket/2061
2009-10-22 21:46:05 +00:00
Jeff Squyres
c78df0d1b4 Fixes trac:2060: MPI-2.2 ticket 7, convert some function pointer typedefs
from "MPI_*_errhandler_fn" to "MPI_*_errhandler_function" (and their
corresponding C++ types, too).  Also updated the corresponding man
pages, and marked the typedefs to the now-deprecated types as
deprecated.

This commit was SVN r22122.

The following Trac tickets were found above:
  Ticket 2060 --> https://svn.open-mpi.org/trac/ompi/ticket/2060
2009-10-22 16:50:45 +00:00
Jeff Squyres
c4f2db926f Add missing semicolons. Wow.
This commit was SVN r22079.
2009-10-08 19:50:19 +00:00
Terry Dontje
0828945eea Fix an issue with the #2048 fix: it did not go to the error case.
This commit was SVN r22076.
2009-10-08 13:27:32 +00:00
Terry Dontje
58c864699c This commit fixes trac:2048
This commit was SVN r22075.

The following Trac tickets were found above:
  Ticket 2048 --> https://svn.open-mpi.org/trac/ompi/ticket/2048
2009-10-08 12:54:53 +00:00
Jeff Squyres
bc3060d668 Fixes trac:2028. George and I found this via some collaborative debugging
(yay cisco webex!).  Make sure we only go up to OPAL max datatype, not
OMPI max datatype.

This commit was SVN r22016.

The following Trac tickets were found above:
  Ticket 2028 --> https://svn.open-mpi.org/trac/ompi/ticket/2028
2009-09-25 21:52:42 +00:00
George Bosilca
56c653ebcd Add some comments.
This commit was SVN r22008.
2009-09-24 00:08:28 +00:00
Edgar Gabriel
9abeaad6e2 so here is what happens:
in the v1.2 series the cids could never go above the max. allowed for a
particular pml. Because of that, pml_add_comm never checked for the cid, and
in fact pml_add_comm was called in comm_set, which is *before* we knew the
cid.

in the v1.3 series (and trunk) we now check the cid to detect overflow, and
because of that pml_add_comm has been moved *after* the cid allocation
routine, namely into the comm_activate routine.

in the v1.2 series, the comm_activate contained a synchronization step of the
old communicator in order to prevent incoming fragments on the new
communicator, with the main problem being that the allreduce in the
communicator allocation finished at different times on different processes,
and thus, this scenario could and did really occur.

in the v1.3 series, the comm_activate does not contain the synchronization
step anymore, since we introduced the new queue for fragments with unknown
cids. The problem, however, is that whether a fragment is known or not is
decided by using ompi_comm_lookup(), which will return something useful as
soon as the cid allocation has finished, even before pml_add_comm has been
called. So there is a small time gap where we will not post a message into
the queue for unknown cids, but we also cannot look up the process structure
belonging to the rank in that comm (that is, in pml_ob1_match_recv_frag or
something like that).

The current fix reintroduces the synchronization step in comm_activate, and
ensures that no fragment can be received for a new communicator before the
synchronization occurs, and thus before comm_nextcid() and pml_add_comm have
been called. It seems to be the safest and easiest way for now. Welcome back, v1.2.

This commit was SVN r21970.
2009-09-17 14:37:02 +00:00
Jeff Squyres
c879170c9e Actually, invoke the error on MPI_COMM_WORLD if you have an invalid
communicator.  :-)

This commit was SVN r21942.
2009-09-04 07:40:28 +00:00
Jeff Squyres
a211c55cce Fix some attribute error detection problems reported by Lisandro
Dalcin. 

This commit was SVN r21941.
2009-09-04 05:18:49 +00:00
Jeff Squyres
11d44cec1b Fix MPI_COMM_SPAWN[_MULTIPLE] to only check the info handles for
errors on the root.  Thanks to Federico Golfre Andreasi for reporting
the problem.

This commit was SVN r21838.
2009-08-19 13:24:12 +00:00
Jeff Squyres
c3afac1d50 Fix comment typo
This commit was SVN r21824.
2009-08-14 12:09:19 +00:00
Jeff Squyres
b8332ea2b2 Patch from Kiril to make the parameter checking on MPI_CART_CREATE a
bit more relaxed.

This commit was SVN r21816.
2009-08-13 22:06:38 +00:00
Rainer Keller
6c5532072a - Split the datatype engine into two parts: an MPI-specific part in
   OMPI and a language-agnostic part in OPAL. The convertor is completely
   moved into OPAL. This offers several benefits as described in RFC
   http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
   namely:
    - Fewer basic types (int* and float* types, boolean and wchar).
    - Fixing the naming scheme to ompi-nomenclature.
    - Usability outside of the ompi-layer.
 - Due to the fixed nature of simple opal types, their information is
   completely known at compile time and therefore constified.
 - With fewer datatypes (22), the actual sizes of bit-field types may be
   reduced from 64 to 32 bits, allowing reorganizing the opal_datatype
   structure, eliminating holes and keeping data required in convertor
   (upon send/recv) in one cacheline...
   This has implications for the convertor data structure and other parts
   of the code.
 - Several performance tests have been run; the netpipe latency does not
   change with this patch on Linux/x86-64 on the smoky cluster.
 - Extensive tests have been done to verify correctness (no new
   regressions) using:
   1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
      ompi-ddt:
      a. running both trunk and ompi-ddt resulted in no differences
         (except that MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB now run
         correctly).
      b. with --enable-memchecker and running under valgrind (one buglet
         when run with static found in test-suite, committed)
   2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      all passed (except for the dynamic/ tests, which also fail on trunk/MTT)
   3. compilation and usage of HDF5 tests on Jaguar using PGI and
      PathScale compilers.
   4. compilation and usage on SiCortex.
 - Please note that for the heterogeneous case (-m32 compiled
   binaries/ompi), neither ompi-trunk nor the ompi-ddt branch would
   successfully launch.

This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
Rainer Keller
b572dc3591 - As discussed, revert r21330: Fortran configure info should
   not end up in OPAL.
 - Will post an updated patch for the OMPI_ALIGNMENT_ parts (within C).

This commit was SVN r21342.

The following SVN revision numbers were found above:
  r21330 --> open-mpi/ompi@95596d1814
2009-06-01 19:02:34 +00:00
Rainer Keller
95596d1814 - Move alignment and size output generated by configure tests
   into the OPAL namespace, eliminating cases like opal/util/arch.c
   testing for ompi_fortran_logical_t.
   As this is processor- and compiler-related information
   (e.g. does the compiler/architecture support REAL*16)
   this should have been on the OPAL layer.
 - Unifies f77 code using MPI_Flogical instead of opal_fortran_logical_t

 - Tested locally (Linux/x86-64) with mpich and intel testsuite
   but would like to get this weekend's MTT output


 - PLEASE NOTE: configure-internal macro-names and
   ompi_cv_ variables have not been changed, so that
   external platform (not in contrib/) files still work.

This commit was SVN r21330.
2009-05-30 15:54:29 +00:00
Edgar Gabriel
d93def71ea second part of the 'running out of cids' problem, this time focusing on what
happens when hierarch is used. Two major items:
 - modify the comm_activate step to take an additional argument, indicating
 whether the new communicator has to go through the collective selection
 step. This is sometimes not required (e.g. when a process calls
 MPI_COMM_SPLIT with color=MPI_UNDEFINED), and contributed significantly to
 the exhaustion of cids.
 - when freeing a communicator, check whether we can reuse the block of cids
 assigned to that comm. This only works if the current front of the cid
 assignment (cid_block_start) is right after the block of cids assigned to this
 comm.

Fixes trac:1904
Fixes trac:1926

This commit was SVN r21296.

The following Trac tickets were found above:
  Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904
  Ticket 1926 --> https://svn.open-mpi.org/trac/ompi/ticket/1926
2009-05-27 15:21:07 +00:00
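The cid-reuse rule in the second bullet can be sketched as a bump allocator whose front is `cid_block_start`: a freed block is handed back only when it ends exactly at the front. The struct and function names are hypothetical, and the sketch ignores the collective agreement on cids that the real allocation performs across processes.

```c
#include <stdbool.h>

/* Hypothetical bump allocator for communicator context IDs (cids).
 * cid_block_start is the "front": the next cid that will be handed out. */
struct sketch_cid_alloc {
    int cid_block_start;
    int max_cid;            /* exclusive upper bound */
};

/* Hand out block_size consecutive cids, or -1 if they would overflow. */
static int sketch_cid_block_alloc(struct sketch_cid_alloc *a, int block_size)
{
    if (block_size > a->max_cid - a->cid_block_start) {
        return -1;
    }
    int start = a->cid_block_start;
    a->cid_block_start += block_size;
    return start;
}

/* On communicator free, give the block back only if it sits immediately
 * before the front; otherwise those cids stay consumed. */
static bool sketch_cid_block_release(struct sketch_cid_alloc *a,
                                     int start, int block_size)
{
    if (start + block_size == a->cid_block_start) {
        a->cid_block_start = start;
        return true;
    }
    return false;
}
```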
Edgar Gabriel
0bc8164a11 fix the group_compare operation which failed to recognize unequal groups in
case the first process of the group was not represented at all in the second
group. Also added some cleanup of the code w.r.t. booleans vs. ints.
 
Thanks to Geoffrey Irving for reporting the bug and providing the initial
solution.

This commit was SVN r21192.
2009-05-08 13:51:28 +00:00
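A comparison that cannot miss the case fixed here (the first process of one group being absent from the other) has to check every member for presence. A minimal sketch, assuming groups are arrays of unique hypothetical process ids; the result codes mirror MPI_IDENT, MPI_SIMILAR, and MPI_UNEQUAL but are local enums, not the MPI constants.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_cmp { SKETCH_IDENT, SKETCH_SIMILAR, SKETCH_UNEQUAL };

/* Compare two groups given as arrays of unique process ids. */
static enum sketch_cmp sketch_group_compare(const int *g1, size_t n1,
                                            const int *g2, size_t n2)
{
    if (n1 != n2) {
        return SKETCH_UNEQUAL;
    }
    bool same_order = true;
    for (size_t i = 0; i < n1; ++i) {
        bool found = false;
        for (size_t j = 0; j < n2; ++j) {
            if (g1[i] == g2[j]) {
                found = true;
                if (i != j) {
                    same_order = false;
                }
                break;
            }
        }
        if (!found) {
            return SKETCH_UNEQUAL;   /* includes g1[0] missing from g2 */
        }
    }
    return same_order ? SKETCH_IDENT : SKETCH_SIMILAR;
}
```

The quadratic search keeps the sketch short; since equal sizes plus "every member of g1 is in g2" implies equal sets when ids are unique, no reverse check is needed.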
Rainer Keller
2941cb1494 - Fix Coverity CID 525 and 526 --- and some more;
- due to the <=, we could overrun the array
   - we didn't correctly test at _all_, since we never marked the
     ranks already excluded / included...
   - when returning in error, we should free (elements_int_list)...

This commit was SVN r21186.
2009-05-07 16:45:18 +00:00
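The three fixes listed in this commit (stay inside the array bounds, mark ranks already handled so duplicates are caught, and free temporary storage on the error path) fit naturally into one routine. The sketch below is a hypothetical exclude-style helper written for illustration; `sketch_group_excl` is an invented name and this is not the Open MPI group code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Build the sorted list of ranks left in a group of group_size ranks after
 * excluding the nexcl ranks in excl[]. Returns the number of remaining
 * ranks and stores a malloc'd list in *out (caller frees), or returns -1
 * on invalid input or allocation failure without leaking memory. */
static int sketch_group_excl(int group_size, const int *excl, int nexcl,
                             int **out)
{
    if (group_size < 0 || nexcl < 0 || nexcl > group_size) {
        return -1;
    }
    bool *excluded = calloc((size_t)group_size + 1, sizeof *excluded);
    if (excluded == NULL) {
        return -1;
    }
    for (int i = 0; i < nexcl; ++i) {      /* '<', not '<=': stay inside excl[] */
        int r = excl[i];
        if (r < 0 || r >= group_size || excluded[r]) {
            free(excluded);                /* free on the error path, too */
            return -1;
        }
        excluded[r] = true;                /* mark, so duplicates are caught */
    }
    int remaining = group_size - nexcl;
    int *list = malloc(((size_t)remaining + 1) * sizeof *list);
    if (list == NULL) {
        free(excluded);
        return -1;
    }
    int k = 0;
    for (int r = 0; r < group_size; ++r) {
        if (!excluded[r]) {
            list[k++] = r;
        }
    }
    free(excluded);
    *out = list;
    return remaining;
}
```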
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header includes using
   contrib/check_unnecessary_headers.sh, after applying
   patches that add back headers that were "lost" because
   they had only been included via one of the now-deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
George Bosilca
05ee4c280e Mismatch between the reported subversion and the one in the mpi.h.
Thanks to Rob Egan for the report.

This commit was SVN r20985.
2009-04-14 05:29:07 +00:00
Jeff Squyres
bf8defc475 Shaun Jackson noted that MPI_STATUS_IGNORE is actually (effectively)
NULL, so testing for NULL as a bad status parameter here is a bad
idea.

This commit was SVN r20891.
2009-03-28 01:24:41 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
   In case we use memcmp, strlen, strdup and friends, include <string.h>.
   Also several constants.h are not included directly.
 - Let's have mca_topo_base_cart_create return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Rainer Keller
9dea63d63a - Last of intrusive commits (promised)... err for now.
Anyway, this is blocking the move: do not include pml.h
   if not really needed, aka none of the following used:
     mca_pml
     MCA_PML_CALL
     OMPI_ANY_TAG
     OMPI_ANY_SOURCE
     OMPI_PROC_NULL

 - Notable exceptions (deleting in one header->adding):
   - ompi/mca/mtl/psm/
   - ompi/mca/osc/rdma/
   - ompi/mca/btl/openib/btl_openib_endpoint.c depended on
     pml_base_sendreq.h

 - Tested on Linux/x86-64, this time including make check
   (thanks Jeff and Ralph)

This commit was SVN r20725.
2009-03-04 17:06:51 +00:00
Terry Dontje
0178b6c45f Added padding to predefined handle structures to maintain library version to
version compatibility.

This commit was SVN r20627.
2009-02-24 17:17:33 +00:00
Jeff Squyres
f1a6d170dc Revert part of r20537: per lengthy discussion on the phone and the
devel list, it ''is'' within the spirit of MPI to allow
MPI_REQUEST_NULL to be passed to MPI_REQUEST_GET_STATUS.  I filed a
ticket proposal with MPI-2.2 to make this officially accepted:

  https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/137

Plus, r20537 didn't revert out all of the machinery for allowing
MPI_REQUEST_NULL or inactive requests, anyway.  So this commit simply
removes the parameter check that was added in r20537, and we're back
to where we were before this whole conversation.  :-)

This commit was SVN r20616.

The following SVN revision numbers were found above:
  r20537 --> open-mpi/ompi@38aab37bb3
2009-02-20 19:57:46 +00:00
Jeff Squyres
7e210fdaf8 Return MPI_ERR_COMM and MPI_ERR_WIN, respectively, for
MPI_COMM|WIN_SET|GET_ERRHANDLER if a bad MPI handle is passed.  Thanks
to Lisandro Dalcín for reporting the issue.

This commit was SVN r20615.
2009-02-20 19:53:48 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
   Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or
   orte/mca/rml/rml_types.h is what is actually needed.
   Please see orte_show_help_replacement.sh, committed next.

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"
   in orte/mca/odls/base/odls_base_default_fns.c and
   in orte/tools/orte-top/orte-top.c
   Manually added these.

   Let's give MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Jeff Squyres
44092c6a21 Don't allow freeing of predefined datatypes. Thanks to Lisandro
Dalcín for reporting the issue.

This commit was SVN r20538.
2009-02-13 00:00:55 +00:00
Jeff Squyres
38aab37bb3 Be a little tougher looking for MPI_*_NULL cases in some functions.
Thanks to Lisandro Dalcín for reporting the issue.

This commit was SVN r20537.
2009-02-12 23:57:41 +00:00
Jeff Squyres
c596a1bcb3 Fix MPI_File_c2f -- ensure that if you invoke
MPI_File_c2f(MPI_FILE_NULL), you actually get 0, not -1.  Thanks to
Lisandro Dalcin for the bug report.

This commit was SVN r20511.
2009-02-11 00:48:12 +00:00
Jeff Squyres
90c28810f4 Fix CID 1122: comm->c_name is a char array (not a pointer), so
comparing it to NULL is not useful.

This commit was SVN r20444.
2009-02-05 15:31:10 +00:00
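The point of this fix is a C language rule: an array member embedded in a struct has a non-NULL address whenever the struct pointer is valid, so comparing it to NULL is always false. A small sketch with a hypothetical `struct sketch_comm` (the name length is an assumed value) shows the check that does carry information:

```c
#include <stdbool.h>

#define SKETCH_MAX_OBJECT_NAME 64   /* hypothetical size */

struct sketch_comm {
    char c_name[SKETCH_MAX_OBJECT_NAME];   /* an array, not a pointer */
};

/* `comm->c_name == NULL` can never be true for a valid comm, because the
 * array is part of the struct. The useful check is whether the name is
 * empty. */
static bool sketch_comm_has_name(const struct sketch_comm *comm)
{
    return comm->c_name[0] != '\0';
}
```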
Jeff Squyres
73ea7a9aa5 Fix CIDs 1211, 1212, 1214: fix error checking in MPI_REDUCE_LOCAL.
This commit was SVN r20435.
2009-02-05 02:18:03 +00:00
Jeff Squyres
4d8a187450 Two major things in this commit:
 * New "op" MPI layer framework
 * Addition of the MPI_REDUCE_LOCAL proposed function (for MPI-2.2)

= Op framework =

Add new "op" framework in the ompi layer.  This framework replaces the
hard-coded MPI_Op back-end functions for (MPI_Op, MPI_Datatype) tuples
for pre-defined MPI_Ops, allowing components and modules to provide
the back-end functions.  The intent is that components can be written
to take advantage of hardware acceleration (GPU, FPGA, specialized CPU
instructions, etc.).  Similar to other frameworks, components are
intended to be able to discover at run-time if they can be used, and
if so, elect themselves to be selected (or disqualify themselves from
selection if they cannot run).  If specialized hardware is not
available, there is a default set of functions that will automatically
be used.

This framework is ''not'' used for user-defined MPI_Ops.

The new op framework is similar to the existing coll framework, in
that the final set of function pointers that are used on any given
intrinsic MPI_Op can be a mixed bag of function pointers, potentially
coming from multiple different op modules.  This allows for hardware
that only supports some of the operations, not all of them (e.g., a
GPU that only supports single-precision operations).

All the hard-coded back-end MPI_Op functions for (MPI_Op,
MPI_Datatype) tuples still exist, but unlike coll, they're in the
framework base (vs. being in a separate "basic" component) and are
automatically used if no component is found at runtime that provides a
module with the necessary function pointers.

There is an "example" op component that will hopefully be useful to
those writing meaningful op components.  It is currently
.ompi_ignore'd so that it doesn't impinge on other developers (it's
somewhat chatty in terms of opal_output() so that you can tell when
its functions have been invoked).  See the README file in the example
op component directory.  Developers of new op components are
encouraged to look at the following wiki pages:

  https://svn.open-mpi.org/trac/ompi/wiki/devel/Autogen
  https://svn.open-mpi.org/trac/ompi/wiki/devel/CreateComponent
  https://svn.open-mpi.org/trac/ompi/wiki/devel/CreateFramework

= MPI_REDUCE_LOCAL =

Part of the MPI-2.2 proposal listed here:

    https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/24

is to add a new function named MPI_REDUCE_LOCAL.  It is very easy to
implement, so I added it (also because it makes testing the op
framework pretty easy -- you can do it in serial rather than via
parallel reductions).  There's even a man page!

This commit was SVN r20280.
2009-01-14 23:44:31 +00:00
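The "mixed bag of function pointers" selection and the MPI_REDUCE_LOCAL semantics (inoutbuf = inbuf op inoutbuf) can be sketched together for a sum operation over two datatypes. Everything here is hypothetical: the `sketch_*` types, the two-entry datatype table, and the "accelerated" module in the usage test stand in for real op components, which the commit says are selected per (MPI_Op, MPI_Datatype) tuple at run time.

```c
#include <stddef.h>

/* Hypothetical datatype indices; a real table is indexed by MPI datatype. */
enum sketch_type { SKETCH_INT, SKETCH_FLOAT, SKETCH_NTYPES };

/* Back-end function shape: inout[i] = in[i] op inout[i]. */
typedef void (*sketch_op_fn)(const void *in, void *inout, size_t count);

static void base_sum_int(const void *in, void *inout, size_t count)
{
    const int *a = in;
    int *b = inout;
    for (size_t i = 0; i < count; ++i) b[i] = a[i] + b[i];
}

static void base_sum_float(const void *in, void *inout, size_t count)
{
    const float *a = in;
    float *b = inout;
    for (size_t i = 0; i < count; ++i) b[i] = a[i] + b[i];
}

/* Default functions that live in the framework base. */
static const sketch_op_fn base_sum[SKETCH_NTYPES] = { base_sum_int, base_sum_float };

/* Build the final table for one intrinsic op: take each entry from the
 * module if it provides one, otherwise fall back to the base default.
 * The result can mix pointers from different sources. */
static void sketch_op_select(sketch_op_fn out[SKETCH_NTYPES],
                             const sketch_op_fn *module)
{
    for (int t = 0; t < SKETCH_NTYPES; ++t) {
        out[t] = (module != NULL && module[t] != NULL) ? module[t] : base_sum[t];
    }
}

/* Reduce-local semantics: inoutbuf = inbuf op inoutbuf, no communication. */
static void sketch_reduce_local(const sketch_op_fn table[SKETCH_NTYPES],
                                enum sketch_type type,
                                const void *inbuf, void *inoutbuf, size_t count)
{
    table[type](inbuf, inoutbuf, count);
}
```

This also illustrates the testing benefit mentioned above: the reduction can be exercised serially in one process.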
Jeff Squyres
895edd04f8 Fix CID 468: remove some dead code. r_proc_list was set to NULL but
never used.

This commit was SVN r20272.
2009-01-14 18:15:17 +00:00
Jeff Squyres
d1c6f3f89a * Fix a truckload of Cisco copyrights to be the same as the rest of
the code base.
 * Fix a few misspellings in other copyrights.

This commit was SVN r20241.
2009-01-11 02:30:00 +00:00
Jeff Squyres
a9850c96c5 Cosmetic change.
This commit was SVN r20203.
2009-01-05 19:07:06 +00:00
Jeff Squyres
611ebeab33 Cosmetic: expunge some more old 2-space-indent code (re-indent with
"indent(1)").

This commit was SVN r20179.
2009-01-02 12:55:17 +00:00
Nysal Jan
ee8ec6f6b5 Remove dead/redundant code. Minimize number of calloc invocations
This commit was SVN r20121.
2008-12-12 10:55:50 +00:00
Shiqing Fan
d06604c258 Get rid of the compiler warning message when --enable-picky is used.
Do the checks according to inter/intracommunicator flags.

This commit was SVN r20063.
2008-12-03 17:44:21 +00:00
Shiqing Fan
abd21b6d17 - An update for memchecker:
1. Fix a bug in pml_ob1_recvreq/sendreq.c: the buffer was marked as defined after the request had already been released.
2. Complete memchecker support for collective functions.
3. Fix misspelled memchecker function names: '*_isaddressible' is now '*_isaddressable'.

This commit was SVN r20043.
2008-11-27 16:34:02 +00:00
George Bosilca
82d1d5d785 The patch for the "Unexpected message queue for unknown CID's required" ticket #1460.
I was unable to split it into two parts (my patch and Edgar's), so I just updated the
copyright information for both of us.
What this patch does:
- use the unexpected queue created by commit r19562 to dispatch
  unexpected messages to the right communicator (once that communicator
  is created and initialized).
- delay the PML comm_add until we have the context_id for the new communicator.
- only do the PML comm_add on processes that really belong to the new
  communicator. Please read the lengthy comment in the source code for the
  reason behind this.

This commit was SVN r19929.

The following SVN revision numbers were found above:
  r19562 --> open-mpi/ompi@acd3406aa7
2008-11-04 21:58:06 +00:00
Jeff Squyres
57a3dce9ba LANL noticed that calling MPI_ABORT invokes opal_output(0, ...)
unconditionally, which can result in a flood of messages to the user
if all MPI processes invoke abort.  Additionally, some users were
confused because they saw the MPI_ABORT opal_output() messages from
''some'' MPI processes, but not ''all'' of them (despite the fact that
every MPI process supposedly invoked MPI_ABORT).  The reason is that
calling MPI_ABORT triggers ORTE to kill all MPI processes, so it's a
race condition as to whether a) all MPI processes actually invoke
MPI_ABORT, and/or b) whether every process is able to opal_output()
before they are killed.

This commit does two simple things:
 * Now use orte_show_help() for the MPI_ABORT message, so they are
   aggregated. 
 * Add a note in the message that calling MPI_ABORT kills all
   processes, so you might not see all output, yadda yadda yadda.

This commit was SVN r19735.
2008-10-14 19:23:03 +00:00
Jeff Squyres
d0a8be6d2f Fix CID 1117: ensure return values are checked.
This commit was SVN r19583.
2008-09-19 13:27:30 +00:00
Nysal Jan
4b68803260 Should be coords(i) >= dims(i)
Refs trac:1463

This commit was SVN r19500.

The following Trac tickets were found above:
  Ticket 1463 --> https://svn.open-mpi.org/trac/ompi/ticket/1463
2008-09-05 04:20:48 +00:00
Jeff Squyres
9a98423bbc [Re-]Fix #1463 with a little thing that I like to call "the right
way".

Don't modify coords in the top-level API function because coords is an
IN variable.  Instead, as Nysal noted, the real cause of the problem
was a missing ! down in topo_base_cart_rank.c.  Put a comment down in
topo_base_cart_rank.c explaining what's going on so that the code is
not so cryptic.
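Below is a hypothetical sketch of the rank computation, not the actual topo_base_cart_rank.c code. It copies each coordinate into a local variable instead of modifying the IN `coords` array. A periodic dimension wraps the coordinate into range, including negative values. A non-periodic dimension rejects `coords[i] < 0` or `coords[i] >= dims[i]`. Ranks are row-major, so the last dimension varies fastest. Returning -1 for an invalid coordinate is an assumption of this sketch.

```c
#include <stdbool.h>

/* Row-major rank of 'coords' in an ndims grid; -1 if a non-periodic
 * coordinate is out of range. 'coords' is only read, never written. */
int cart_rank_sketch(int ndims, const int *dims, const bool *periods,
                     const int *coords) {
    int rank = 0;
    for (int i = 0; i < ndims; ++i) {
        int c = coords[i];               /* local copy: coords is IN */
        if (periods[i]) {
            c %= dims[i];                /* wrap periodic dimensions... */
            if (c < 0) c += dims[i];     /* ...including negative coords */
        } else if (c < 0 || c >= dims[i]) {
            return -1;                   /* out of range, not periodic */
        }
        rank = rank * dims[i] + c;
    }
    return rank;
}
```

For example, with `dims = {2, 3}` and only the second dimension periodic, `{1, -1}` wraps to `{1, 2}` and gives rank 5.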

Refs trac:1363.

This commit was SVN r19487.

The following Trac tickets were found above:
  Ticket 1363 --> https://svn.open-mpi.org/trac/ompi/ticket/1363
2008-09-03 08:24:27 +00:00
Jeff Squyres
008fa8c5cc Fixes trac:1236, #1237.
* Various changes to enable 0-dimensional cartesian communicators:
   * Set various mtc_* members to NULL when there are 0 dimensions (and
     don't bother trying to memcpy these arrays when duplicating the
     communicator -- because they're NULL)
   * adjust topo_base_cart_sub to correctly handle 0 dimensions
     (simplified it a bit)
   * adjust a few error codes to return ERR_OUT_OF_RESOURCE
   * adjust error checking of CART_CREATE, CART_RANK
 * Allow MPI_GRAPH_CREATE to accept 0 == nnodes.
 * Bump reported MPI version in mpi.h to 2.1

This commit was SVN r19461.

The following Trac tickets were found above:
  Ticket 1236 --> https://svn.open-mpi.org/trac/ompi/ticket/1236
2008-08-31 19:31:10 +00:00
Jeff Squyres
59cb626b7c Fixes trac:1463: ensure periodic dimensions are handled properly for
MPI_CART_RANK.

This commit was SVN r19459.

The following Trac tickets were found above:
  Ticket 1463 --> https://svn.open-mpi.org/trac/ompi/ticket/1463
2008-08-31 18:39:05 +00:00
George Bosilca
697dc524c1 Deal with tickets #1239 and #712. This upgrades Open MPI's support
for the F90 type create functions to the requirements of the MPI 2.1 standard.

Advice to implementors. An application may often repeat a call to
MPI_TYPE_CREATE_F90_xxxx with the same combination of (xxxx,p,r).
The application is not allowed to free the returned predefined, unnamed
datatype handles. To prevent the creation of a potentially huge amount of
handles, the MPI implementation should return the same datatype handle for
the same (REAL/COMPLEX/INTEGER,p,r) combination. Checking for the
combination (p,r) in the preceding call to MPI_TYPE_CREATE_F90_xxxx and
using a hash-table to find formerly generated handles should limit the
overhead of finding a previously generated datatype with same combination
of (xxxx,p,r). (End of advice to implementors.)
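This is a minimal sketch of the advice above. Repeated requests for the same (kind, p, r) combination return the same handle. For brevity it uses a linear search over a small fixed-size table instead of the hash table that the advice suggests. `datatype_t`, `f90_kind_t` and `create_f90_type` are hypothetical stand-ins, not Open MPI's datatype code.

```c
#include <stdlib.h>

typedef enum { F90_REAL, F90_COMPLEX, F90_INTEGER } f90_kind_t;

/* Hypothetical stand-in for a predefined, unnamed MPI datatype handle. */
typedef struct { f90_kind_t kind; int p, r; } datatype_t;

#define CACHE_MAX 64
static struct { f90_kind_t kind; int p, r; datatype_t *type; } cache[CACHE_MAX];
static int cache_len = 0;

/* Return the cached handle for (kind, p, r) if one exists; otherwise
 * create it, remember it, and return it. NULL on failure. */
datatype_t *create_f90_type(f90_kind_t kind, int p, int r) {
    for (int i = 0; i < cache_len; ++i)
        if (cache[i].kind == kind && cache[i].p == p && cache[i].r == r)
            return cache[i].type;        /* same handle as last time */
    if (cache_len == CACHE_MAX)
        return NULL;                     /* sketch: no room to cache it */
    datatype_t *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->kind = kind; t->p = p; t->r = r;
    cache[cache_len].kind = kind;
    cache[cache_len].p = p;
    cache[cache_len].r = r;
    cache[cache_len].type = t;
    cache_len++;
    return t;
}
```

The application may not free these handles, so the cache owns them for the lifetime of the library.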

This commit fixes trac:1239, and #712.

This commit was SVN r19458.

The following Trac tickets were found above:
  Ticket 1239 --> https://svn.open-mpi.org/trac/ompi/ticket/1239
2008-08-31 18:36:32 +00:00
Jeff Squyres
93746cd594 Fixed CID 807: Remove unused variable
This commit was SVN r19239.
2008-08-11 20:50:09 +00:00
Rolf vandeVaart
e105b3f254 Finish work related to ticket #1392 where the versions
were bumped from v1.0.0 to v2.0.0.  

This change fixed #1439.

This commit was SVN r19175.
2008-08-06 12:16:54 +00:00
Rainer Keller
82580701fb - We may know that *_name is shorter than MPI_MAX_OBJECT_NAME; Coverity Prevent does not.
Fix Coverity issues CID 1068 and CID 1069.

This commit was SVN r19167.
2008-08-06 07:59:59 +00:00
Ralph Castain
a0ae63f19e Ensure we call close_port after comm_spawn[_multiple]. Clean out the port name in close_port.
This commit was SVN r19068.
2008-07-28 16:40:11 +00:00
Jeff Squyres
74aa9689e4 From an initial patch from George, update all the set/get errhandler
functions to use atomics in order to be thread safe.
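The commit does not show its atomics, so this is a swapped-in sketch of the general idea using C11 `<stdatomic.h>` rather than the OPAL atomics Open MPI uses. The set operation swaps the pointer in a single atomic exchange, so two concurrent setters cannot both observe the same "old" handler. `comm_t`, `errhandler_t` and both functions are hypothetical. Reference counting of the old and new handlers is left to the caller.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for communicator and error handler objects. */
typedef struct errhandler { const char *name; } errhandler_t;
typedef struct { _Atomic(errhandler_t *) errhandler; } comm_t;

/* Atomically install new_eh; returns the previous handler so the caller
 * can release its reference to it. */
errhandler_t *comm_set_errhandler(comm_t *comm, errhandler_t *new_eh) {
    return atomic_exchange(&comm->errhandler, new_eh);
}

/* Atomically read the current handler. */
errhandler_t *comm_get_errhandler(comm_t *comm) {
    return atomic_load(&comm->errhandler);
}
```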

This commit was SVN r18807.
2008-07-03 19:28:02 +00:00
Jeff Squyres
51d833e8d1 Minor fixes and comment clarifications for MPI-2.1-mandated handling
of strings.  We mostly did the Right Things already; I simplified the
code a bit and also made us not write more characters in the C
bindings than we're supposed to (per language in the MPI-2.1 spec).
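Here is a hypothetical sketch of "don't write more characters than necessary" for a get-name style C binding. It writes the name plus one terminating NUL, truncated to fit the caller's buffer, and leaves the rest of the buffer untouched. `MAX_OBJECT_NAME` stands in for MPI_MAX_OBJECT_NAME. The exact rules for each MPI function should be checked against the MPI-2.1 text rather than this sketch.

```c
#include <string.h>

#define MAX_OBJECT_NAME 64  /* hypothetical stand-in for MPI_MAX_OBJECT_NAME */

/* Copy src into dst (caller provides >= MAX_OBJECT_NAME bytes). Writes at
 * most len+1 bytes and reports len (excluding the NUL) in *resultlen. */
void get_name_sketch(const char *src, char *dst, int *resultlen) {
    size_t len = strlen(src);
    if (len > MAX_OBJECT_NAME - 1)
        len = MAX_OBJECT_NAME - 1;       /* truncate to fit with the NUL */
    memcpy(dst, src, len);
    dst[len] = '\0';
    *resultlen = (int)len;
}
```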

Fixes trac:1238.

This commit was SVN r18705.

The following Trac tickets were found above:
  Ticket 1238 --> https://svn.open-mpi.org/trac/ompi/ticket/1238
2008-06-21 19:33:47 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Ralph Castain
6ddcce4085 Apply a patch from Edgar to fix the Intercomm MTT tests.
Fixes ticket #1332

This commit was SVN r18591.
2008-06-05 12:53:12 +00:00
Rolf vandeVaart
0d8faf7559 Fix the fix for ticket #1298. Thanks George for pointing it out.
This commit was SVN r18488.
2008-05-23 13:33:38 +00:00
Rolf vandeVaart
8c3b31b181 Need to properly handle zero-length scatters and gathers on intercommunicators. Add a check for the MPI_ROOT and MPI_PROC_NULL processes so they do not enter collective module when count=0.
This commit was SVN r18481.
2008-05-22 19:09:43 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we have already done the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")
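The duplicate detection on the HNP can be sketched as a small table keyed by the help message's identity (for example "file:topic"). The first occurrence is displayed. Later occurrences are only counted, and a periodic flush reports and resets the count. All names are hypothetical, and the ~5 second timer is left to the caller.

```c
#include <string.h>

#define MAX_TOPICS 32   /* hypothetical capacity */

typedef struct {
    char key[128];      /* identity of the help message, e.g. "file:topic" */
    int  new_dups;      /* duplicates received since the last flush */
} help_entry_t;

static help_entry_t entries[MAX_TOPICS];
static int n_entries = 0;

/* Returns 1 if the caller should display this message now, 0 if it was a
 * duplicate and only counted. If the table is full, every message is
 * displayed (no aggregation). */
int help_aggregate(const char *key) {
    for (int i = 0; i < n_entries; ++i) {
        if (strcmp(entries[i].key, key) == 0) {
            entries[i].new_dups++;
            return 0;
        }
    }
    if (n_entries < MAX_TOPICS) {
        help_entry_t *e = &entries[n_entries++];
        strncpy(e->key, key, sizeof e->key - 1);
        e->key[sizeof e->key - 1] = '\0';
        e->new_dups = 0;
    }
    return 1;
}

/* Called periodically: returns how many new duplicates of 'key' arrived
 * since the last call (for "I got X new copies...") and resets the count. */
int help_flush_count(const char *key) {
    for (int i = 0; i < n_entries; ++i) {
        if (strcmp(entries[i].key, key) == 0) {
            int n = entries[i].new_dups;
            entries[i].new_dups = 0;
            return n;
        }
    }
    return 0;
}
```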

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not the opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so if you explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will behave similarly to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent via the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing of messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
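This hypothetical sketch shows the kind of transformation an XML filter performs. It wraps one message in a tag naming its stream or type and escapes the XML special characters in the body. `xml_wrap` and the tag names are illustrative only. The commit itself notes that the real xml component's output was not yet real XML.

```c
#include <stdio.h>
#include <string.h>

/* Write "<tag>escaped-msg</tag>" into out. Returns the number of chars
 * written (excluding the NUL), or -1 if out is too small. */
int xml_wrap(const char *tag, const char *msg, char *out, size_t outlen) {
    int n = snprintf(out, outlen, "<%s>", tag);
    if (n < 0 || (size_t)n >= outlen) return -1;
    size_t pos = (size_t)n;
    for (const char *p = msg; *p != '\0'; ++p) {
        char one[2] = { *p, '\0' };
        const char *rep;
        switch (*p) {                    /* escape XML special characters */
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  rep = one;      break;
        }
        size_t len = strlen(rep);
        if (pos + len >= outlen) return -1;
        memcpy(out + pos, rep, len);
        pos += len;
    }
    n = snprintf(out + pos, outlen - pos, "</%s>", tag);
    if (n < 0 || (size_t)n >= outlen - pos) return -1;
    return (int)(pos + (size_t)n);
}
```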

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done in configure.m4 to search for an appropriate XML
   library, link it in, and use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
Rainer Keller
4b89706dfe - Properly check for valid output parameters...
This commit was SVN r18419.
2008-05-09 08:39:24 +00:00
Shiqing Fan
8088ec8bce More for non-blocking communication.
This commit was SVN r18400.
2008-05-07 13:00:28 +00:00
Shiqing Fan
8393fb5d47 Use the new memchecker_call function for memory checking of non-blocking communication.
This commit was SVN r18399.
2008-05-07 12:28:51 +00:00
Shiqing Fan
f35a06119c Use the memchecker_convertor_call function instead of the old one. Move the function to a place where we can use the convertor.
This commit was SVN r18370.
2008-05-05 13:57:27 +00:00
Terry Dontje
8dd0421015 Moved ident lines to ompi_mpi_init.c and created new ompi_version_string
variable.

This commit was SVN r18345.
2008-05-01 15:06:10 +00:00
Josh Hursey
cc83d41ad9 Merge in tmp/jjh-scratch
{{{
 svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}

Contains:
 * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
 * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
 * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
 * Cleanup mpi interface with misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
 * Some other sundry cleanup items all dealing with C/R functionality in the trunk.

This commit was SVN r18241.
2008-04-23 00:17:12 +00:00
Tim Prins
b2acb51d04 make comm_join work again. Allocate memory to the correct pointer.
This commit was SVN r18186.
2008-04-17 11:56:53 +00:00
Ralph Castain
7b91f8baff Cleanup and fix bugs in the MPI dynamics section. Modify the dpm API so it properly takes ports instead of process names (as correctly identified by Aurelien). Fix race conditions in the use of ompi-server. Fix incompatibilities between the mpi bindings and the dpm implementation that could cause segfaults due to uninitialized memory.
Fix the ompi-server -h cmd line option so it actually tells you something!

Add two new testing codes to the orte/test/mpi area: accept and connect.

This commit was SVN r18176.
2008-04-16 14:27:42 +00:00
Aurelien Bouteiller
921a6ce3d4 Processes with different jobids can now connect/accept to each other.
This commit was SVN r18134.
2008-04-11 15:40:59 +00:00
Edgar Gabriel
5989fa570c Sorry, previous commit was in the wrong directory. This is the real fix (have
to undo 1822).

The verification of recvcount==0 and rank == root was breaking
inter-communicator scatter, since the root (root==MPI_ROOT) might very well
have recvcount=0. The same fix has been applied to gather.c, just the other way
round.

Fixes the bug reported on the mailing list by Martin Audet. If there is a
1.2.7 release, this fix might be worth porting over.

Please note that while the test works now for basic and for inter, we get a
0-byte malloc warning from the inter module, which we still have to fix in a
separate patch.

This commit was SVN r18123.
2008-04-10 15:03:14 +00:00
Rainer Keller
334b64e760 - Coverity issue CID 35:
Event var_deref_op: Variable "requests" tracked as NULL was
   dereferenced.
   Only check requests[i] for NULL, if requests is != NULL itself.

This commit was SVN r17973.
2008-03-26 08:19:55 +00:00
Rainer Keller
56f3d59f2a - Coverity issues 939, 940, 941:
Event uninit_use_in_call: Using uninitialized value "tag" in call to
   function "(ompi_dpm).connect_accept" and others
   The tag is set and used in get_rport only on root...

This commit was SVN r17972.
2008-03-26 08:09:11 +00:00
Ralph Castain
90107f3c14 Fix an issue with comm_spawn over who sent/recv first in the modex. The modex assumes that the first name on the list is the "root" that will serve as the allgather collector/distributor. The dpm was putting that entity last, which forced us to pre-inform the parent procs of the child proc's contact info since the parent was trying to send to the child.
Clarify the setting of send_first in the mpi bindings (trivial, I know, but helpful).

Remove the extra xcast of child contact info to the parent job.

This commit was SVN r17952.
2008-03-25 14:57:34 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Josh Hursey
134684d096 A compiler warning fix.
This commit was SVN r17539.
2008-02-21 14:28:08 +00:00
Josh Hursey
99144db970 Improve checkpoint/restart support by allowing a checkpoint to progress when the process is *not* in the MPI library. This involves creating a separate thread for polling for a checkpoint request. This thread is active when the MPI process is not in the MPI library, and paused when the MPI process is in the library.
Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI.

Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave.

Fixed the configure check for {{{--enable-ft-thread}}} so it has a known dependency on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}).

Added a line for Checkpoint/Restart support to {{{ompi_info}}}.

Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime.

There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime.

This commit was SVN r17516.
2008-02-19 22:15:52 +00:00
Rainer Keller
9cd2c6f48b - Instead of calling RUNNING_ON_VALGRIND,
implement a specific function, thereby
   removing a bogus requirement on valgrind/valgrind.h
   (d'oh...)
 - Call the specific function runindebugger() before
   doing expensive checks on each component of a struct.
 - Get rid of void* warnings.

This commit was SVN r17438.
2008-02-12 20:37:51 +00:00
Shiqing Fan
54c7b71cfd Use the correct way of including memchecker.h, which will work with '--with-devel-headers'.
This commit was SVN r17435.
2008-02-12 18:01:17 +00:00
Shiqing Fan
f5792bbda5 merging the memchecker into trunk.
This commit was SVN r17424.
2008-02-12 08:46:27 +00:00
Dan Lacher
98f70d6318 Convert the C++ Comm, Datatype and Win keyval creation and intercept callbacks
to *not* use the STL, and remove the STL use from the error handler
routines.  This is part of removing the STL from the C++ bindings (Solaris has 2
versions of the STL; if OMPI uses one and an MPI application wants to use
another, Bad Things happen).

The main idea is to wrap up the C++ callback function pointers and the user's
extra_state into our own struct that is passed as the extra_state to the C
keyval registration along with the intercept routines in intercepts.cc. When the
C++ intercepts are activated, they unwrap the user's callback and extra state
and call them.
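The wrapping scheme above translates directly into C. A struct holds the user's callback and the user's extra_state. The lower layer receives that struct as its own extra_state. An intercept function unwraps it and calls the user's callback with the original extra_state. All names are hypothetical, and for brevity the user and intercept callbacks share one signature here, whereas the real C++ intercepts also translate between C and C++ argument types.

```c
#include <stdlib.h>

/* Hypothetical user-facing copy callback (stands in for the C++ one). */
typedef int (*user_copy_fn)(int oldval, void *extra_state, int *newval);

/* Our own struct: the user's callback plus the user's extra_state. */
typedef struct {
    user_copy_fn user_fn;
    void        *user_extra;
} keyval_wrapper_t;

/* Allocate a wrapper; this pointer is what gets registered as the
 * extra_state of the underlying C keyval. */
keyval_wrapper_t *wrap_callback(user_copy_fn fn, void *extra) {
    keyval_wrapper_t *w = malloc(sizeof *w);
    if (w != NULL) { w->user_fn = fn; w->user_extra = extra; }
    return w;
}

/* Intercept registered with the C layer: unwrap and call the user's
 * callback with the user's own extra_state. */
int copy_intercept(int oldval, void *extra_state, int *newval) {
    keyval_wrapper_t *w = extra_state;
    return w->user_fn(oldval, w->user_extra, newval);
}

/* Example user callback: adds the int pointed to by extra_state. */
int add_offset(int oldval, void *extra_state, int *newval) {
    *newval = oldval + *(int *)extra_state;
    return 0;
}
```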

This commit was SVN r17409.
2008-02-10 19:29:25 +00:00
George Bosilca
13de3420ab As the receive buffer is only significant at root, limit the
check only where it makes sense.

This commit was SVN r17366.
2008-02-04 01:44:41 +00:00
Rainer Keller
2b4975de8e - In case of MPI_REQUEST_NULL, set *status to empty_status
by copying the structure (old field-by-field stores on the left, new structure copy on the right):

   psendrecv.c:81
   4e7:   cmpl   $0x0,0x34(%ebp)           4e7:   cmpl   $0x0,0x34(%ebp)
   4eb:   je     51e <PMPI_Sendrecv+0x51e> 4eb:   je     517 <PMPI_Sendrecv+0x517>
   psendrecv.c:85
   4ed:   mov    0x34(%ebp),%eax           4ed:   mov    0x34(%ebp),%edx
   4f0:   movl   $0xfffffffe,(%eax)        4f0:   mov    0x38,%eax
   psendrecv.c:86                          4f5:   mov    %eax,(%edx)
   4f6:   mov    0x34(%ebp),%eax           4f7:   mov    0x3c,%eax
   4f9:   movl   $0xffffffff,0x4(%eax)     4fc:   mov    %eax,0x4(%edx)
   psendrecv.c:87                          4ff:   mov    0x40,%eax
   500:   mov    0x34(%ebp),%eax           504:   mov    %eax,0x8(%edx)
   503:   movl   $0x0,0x8(%eax)            507:   mov    0x44,%eax
   psendrecv.c:88                          50c:   mov    %eax,0xc(%edx)
   50a:   mov    0x34(%ebp),%eax           50f:   mov    0x48,%eax
   50d:   movl   $0x0,0xc(%eax)            514:   mov    %eax,0x10(%edx)
   psendrecv.c:89
   514:   mov    0x34(%ebp),%eax
   517:   movl   $0x0,0x10(%eax)
   psendrecv.c:91
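At the C level, the right-hand column corresponds to a single structure assignment from a constant "empty" status instead of five separate field stores. The sketch below uses a hypothetical five-int `status_t`. Its field names are assumptions, and its initial values simply mirror the immediates stored in the left-hand column (-2, -1, 0, 0, 0).

```c
#include <stddef.h>

/* Hypothetical status layout: five ints, matching the five stores above. */
typedef struct { int source, tag, error, count, cancelled; } status_t;

/* Values mirror the immediates in the old field-by-field code. */
static const status_t empty_status = { -2, -1, 0, 0, 0 };

/* One structure copy replaces the per-field assignments. */
void set_empty_status(status_t *status) {
    if (status != NULL)
        *status = empty_status;
}
```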

This commit was SVN r17230.
2008-01-25 12:58:59 +00:00
George Bosilca
25814c07e0 Update the checks in the reduce family collectives.
This commit was SVN r17096.
2008-01-09 20:40:57 +00:00
George Bosilca
906e8bf1d1 Replace the ompi_pointer_array with opal_pointer_array. As a next step
(sometime after the merge with the ORTE branch), opal_pointer_array
will become the only pointer_array implementation (orte_pointer_array
will be removed).

This commit was SVN r17007.
2007-12-21 06:02:00 +00:00
Jeff Squyres
b9106a0d25 Back out r16836 and put in a big comment why.
This commit was SVN r16872.

The following SVN revision numbers were found above:
  r16836 --> open-mpi/ompi@6b9048fc6d
2007-12-06 18:45:21 +00:00
Edgar Gabriel
6b9048fc6d check for MPI_GROUP_EMPTY before freeing a group.
fixes: 1110

This commit was SVN r16836.
2007-12-04 16:13:27 +00:00
Jeff Squyres
cf98657adb * Clean up a little #if logic in MPI_WTICK / MPI_WTIME
* Update MPI_WTICK / MPI_WTIME man pages:
   * Fix C++ declarations
   * Note that we may use better than gettimeofday() on some platforms
 * Add "MPI_WTIME support" ("options:mpi-wtime") flag in ompi_info
   output indicating whether we use "native" or "gettimeofday" for
   MPI_WTIME

This commit was SVN r16774.
2007-11-26 18:23:53 +00:00
Ethan Mallove
005652c9d4 * Embed ident strings into the Open MPI libraries using one of the following
methods (in order of precedence):
  1. #pragma ident <ident string> (e.g., Intel and Sun)
  1. #ident <ident string> (e.g., GCC)
  1. static const char ident[] = <ident string> (all others)
By default, the ident string used is the standard Open MPI version string. Only
the following libraries will get the embedded version strings (e.g., DSOs will
not):
  * libmpi.so
  * libmpi_cxx.so
  * libmpi_f77.so
  * libopen-pal.so
  * libopen-rte.so
* Added two new configure options:
  * `--with-package-name="STRING"` (defaults to "Open MPI username@hostname
    Distribution"). `STRING` is displayed by `ompi_info` next to the "Package"
    heading.
  * `--with-ident-string="STRING"` (defaults to the standard Open MPI version
    string - e.g., X.Y.Zr######). `%VERSION%` will expand to the Open MPI
    version string if it is supplied to this configure option.
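A hypothetical sketch of the three-way precedence follows. `HAVE_PRAGMA_IDENT`, `HAVE_CPP_IDENT`, `IDENT_STRING` and `example_lib_ident` are made-up names, not the macros configure actually defines. The first two branches assume the compiler macro-expands the directive's argument. One deliberate deviation from the commit: the fallback gives the array external linkage because an optimizing compiler may discard an unused `static` array, and the string would then never reach the library.

```c
/* Placeholder; configure would supply the real string. */
#ifndef IDENT_STRING
#define IDENT_STRING "Open MPI X.Y.Z (placeholder ident)"
#endif

#if defined(HAVE_PRAGMA_IDENT)          /* e.g. Intel, Sun compilers */
#pragma ident IDENT_STRING
#elif defined(HAVE_CPP_IDENT)           /* e.g. GCC */
#ident IDENT_STRING
#else                                   /* all other compilers */
/* External linkage so the optimizer cannot drop the unused string. */
const char example_lib_ident[] = IDENT_STRING;
#endif
```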

This commit was SVN r16644.
2007-11-03 02:40:22 +00:00
Josh Hursey
7437f37e96 This commit contains the following:
* Fix some missing includes in a few places.
 * Add the cr_request() functionality to the BLCR CRS component.
   We are now dependent upon the 0.6.* series of BLCR.
 * Made the CR notification mechanism a registered function.
   This way we can have an OPAL-only version and it can be replaced at
   runtime with the ORTE version.
 * Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
   CR functionality when the user wants it. Default: Disabled.
 * Fix the placement of a checkpoint request check in MPI_Init
 * Pull the OPAL notification mechanism into the SnapC framework.
   * We no longer fork/exec the 'opal-checkpoint' command for local
   checkpointing, the Local coordinator in the orted does this directly.
   * The Local and Application coordinators talk to each other, bypassing the OPAL
   notification mechanism.
   * Optimized the Local <-> App Coordinator communication.
   * Improved the structure used to track vpid_snapshots in the local coord.
 * Fix a race condition in which an application under heavy communication load
   may produce an inconsistent global checkpoint.

This commit was SVN r16389.
2007-10-08 20:53:02 +00:00
Andrew Friedley
069e6dc4a0 Fix a bug introduced when the collective selection logic was changed to allow for a different component to be used for each collective.
Passing the barrier module to the bcast function is a bad idea when barrier is using a different component from bcast..

This commit was SVN r16212.
2007-09-25 17:09:52 +00:00
Tim Prins
4033a40e4e Coding standards...
This commit was SVN r16118.
2007-09-13 14:00:59 +00:00
George Bosilca
bfb4ddc3e2 Coverity: remove dead code.
This commit was SVN r16106.
2007-09-12 17:56:33 +00:00
George Bosilca
8622beda54 This commit should fix the issues with ticket 1065. Now, we correctly
duplicate the MPI_UB and MPI_LB datatypes.

This commit was SVN r16083.
2007-09-10 22:13:42 +00:00
George Bosilca
756eee571e Fix Coverity #24. This test didn't make sense in this branch of the if.
This commit was SVN r16001.
2007-08-29 02:02:19 +00:00
Jeff Squyres
f08cce16db Fix Coverity CID 468: remove unused variable.
This commit was SVN r15996.
2007-08-29 01:21:17 +00:00
Jeff Squyres
b69c7688a0 Fix Coverity defect 676: possible NULL dereference in an error
condition.

This commit was SVN r15956.
2007-08-25 12:17:02 +00:00
Brian Barrett
af4e86c25f Update collectives selection logic to allow for multiple components to be
used at once (up to one unique collective module per collective function).
Matches r15795:15921 of the tmp/bwb-coll-select branch

This commit was SVN r15924.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15795
  r15921
2007-08-19 03:37:49 +00:00
Brian Barrett
2b8af283de Add ability to completely turn off MPI one-sided support, so that users
can experiment with using ROMIO directly.

This commit was SVN r15922.
2007-08-18 21:35:51 +00:00
Mohamad Chaarawi
8c458b0ee7 Remove unused variables that cause warnings.
This commit was SVN r15791.
2007-08-07 15:13:46 +00:00
Mohamad Chaarawi
59a7bf8a9f Merging in the Sparse Groups..
This commit includes config changes..

This commit was SVN r15764.
2007-08-04 00:41:26 +00:00
Shiqing Fan
0f468f3668 - Remove the solution and project files, will commit them later.
This commit was SVN r15705.
2007-07-31 17:07:02 +00:00
Sven Stork
27422e05ac - add parameter check for NULL pointer
This commit was SVN r15697.
2007-07-31 09:01:39 +00:00
Sven Stork
80cdafb8f4 - remove dead code found by coverity
This commit was SVN r15685.
2007-07-30 15:36:00 +00:00
Sven Stork
855434de59 - fixes several Coverity issues
- add missing initialisation for variables
  - use strncpy instead of strcpy

This commit was SVN r15683.
2007-07-30 14:44:37 +00:00
Jeff Squyres
327576b2a3 Fix incorrect behavior noted by Lisandro Dalcini: when MPI_COMM_SELF is
passed to MPI_COMM_FREE, it invoked the error handler on
MPI_COMM_WORLD, not on MPI_COMM_SELF.  This commit changes the
behavior: if MPI_COMM_SELF is passed to MPI_COMM_FREE, we invoke the
error handler on MPI_COMM_SELF (not MPI_COMM_WORLD).  Fixes trac:1109.

This commit was SVN r15682.

The following Trac tickets were found above:
  Ticket 1109 --> https://svn.open-mpi.org/trac/ompi/ticket/1109
2007-07-30 13:01:33 +00:00
Shiqing Fan
4d7b349cdb - Add VC8 solution and project files.
- If one wants to use this solution, remember to unload the project 'orte-restart' which is currently not working for Windows.

This commit was SVN r15680.
2007-07-30 11:05:34 +00:00
Rainer Keller
1dbbfc04b7 - Rename rank to tmp_rank to get rid of warning
This commit was SVN r15672.
2007-07-29 12:56:02 +00:00
Rainer Keller
bb2d0b45cd - Coverity CID37: if requests == NULL, do not deref requests[i] for
multi-completion calls. Therefore reorder tests where appropriate.
 - Always check for NULL pointers (flag, index, completed) (yes, we
   use them in the underlying ompi_request_* functions).
 - Break early, if a NULL-request is found
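The reordered checks can be sketched for a hypothetical waitany-style argument validator. The array pointer is tested before any `requests[i]` is read, and the output pointers are checked for NULL. The loop stops at the first NULL entry. The error codes and `request_t` are hypothetical, not Open MPI's. A NULL entry is treated as invalid, which is distinct from a valid MPI_REQUEST_NULL handle.

```c
#include <stddef.h>

/* Hypothetical error codes and request type. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_COUNT, SKETCH_ERR_REQUEST, SKETCH_ERR_ARG };
typedef struct { int state; } request_t;

int check_waitany_args(int count, request_t **requests, int *index, int *flag) {
    if (count < 0)
        return SKETCH_ERR_COUNT;
    if (count > 0 && requests == NULL)   /* check the array before indexing */
        return SKETCH_ERR_REQUEST;
    if (index == NULL || flag == NULL)   /* output arguments must be valid */
        return SKETCH_ERR_ARG;
    for (int i = 0; i < count; ++i)
        if (requests[i] == NULL)
            return SKETCH_ERR_REQUEST;   /* break early on the first NULL */
    return SKETCH_SUCCESS;
}
```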

This commit was SVN r15671.
2007-07-29 12:54:21 +00:00
Rainer Keller
4ff78f8e2d - Coverity: Just as init.c -- we do not have a communicator yet;
so do not use OMPI_ERRHANDLER_RETURN which dereferences NULL...

This commit was SVN r15670.
2007-07-29 11:47:19 +00:00
Jeff Squyres
71d6c5b811 Arf. Remove a debugging printf. Thanks to Christian for noticing...
This commit was SVN r15605.
2007-07-25 11:00:18 +00:00
Jeff Squyres
e80b7e9dde If MPI_ALLOC_MEM is invoked with a 0 sized request, return NULL. If
MPI_FREE_MEM is invoked with NULL, return success.
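The two rules can be sketched with hypothetical wrappers around `malloc`/`free`. A zero-size request yields NULL and success. Freeing NULL is a successful no-op. Real MPI_ALLOC_MEM may use registered or special memory, so this models only the edge-case behavior.

```c
#include <stddef.h>
#include <stdlib.h>

enum { SK_SUCCESS = 0, SK_ERR_NO_MEM = 1 };   /* hypothetical codes */

/* size 0 -> *baseptr = NULL and success; otherwise allocate. */
int alloc_mem_sketch(size_t size, void **baseptr) {
    if (size == 0) {
        *baseptr = NULL;
        return SK_SUCCESS;
    }
    *baseptr = malloc(size);
    return (*baseptr == NULL) ? SK_ERR_NO_MEM : SK_SUCCESS;
}

/* NULL -> success without doing anything. */
int free_mem_sketch(void *base) {
    if (base != NULL)
        free(base);
    return SK_SUCCESS;
}
```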

Fixes trac:1101.

This commit was SVN r15593.

The following Trac tickets were found above:
  Ticket 1101 --> https://svn.open-mpi.org/trac/ompi/ticket/1101
2007-07-25 01:00:30 +00:00
Shiqing Fan
efa74f7bfe The label name "ERROR" is defined as a macro in the Visual Studio Platform SDK. Using "ERROR" as a label causes conflicts. Changing it to lowercase solves the problem.
This commit was SVN r14869.
2007-06-05 14:32:27 +00:00
Rainer Keller
c8668ef83f - Get rid of unused variables / set but never used warnings.
This commit was SVN r14762.
2007-05-24 18:57:51 +00:00
Ralph Castain
fa5a40070d Test the return status code from comm_dyn_start_processes - if we see an error, then let's report it and not continue on with the comm_spawn procedure!
This commit was SVN r14699.
2007-05-18 20:22:32 +00:00
Mohamad Chaarawi
bfaf9d4a12 Added new module for intercomm collectives. This will require an
autogen.

This commit was SVN r14149.
2007-03-27 02:06:42 +00:00
Jeff Squyres
3e2031e0e3 Finally commit something that has been sitting around in one of my
development trees since last year (had to wait for some intel tests to
run yesterday, so I finally took the time to finish this work):

 * Improve MPI API argument checking by also checking for NULL values
   (especially helps when invalid Fortran MPI handles are passed,
   because the various MPI_*f2c functions are supposed to return an
   "invalid" MPI handle [meaning NULL] when this happens).  So now
   OMPI will generate an MPI exception rather than a segv.
 * Removed a few redundant DATATYPE_NULL checks.
 * Also check for some other forms of "invalid" handles (e.g., already
   been freed, etc.) in some cases.  We could probably be a bit more
   stringent in this regard if we really wanted to.
 * Change MPI_Get_processor_name to zero out the string up to
   MPI_MAX_PROCESSOR_NAME characters, because the MPI spec says that
   the string must be at least that long.  We were already passing
   that length to gethostname(), anyway.
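
The NULL-handle checking in the first bullet can be sketched as follows. The handle type, lookup table, and error codes are hypothetical; the only behavior taken from the message is that an f2c conversion of an invalid Fortran handle yields NULL, and that the argument check catches it before any dereference.

```c
#include <stddef.h>

/* Hypothetical handle type and lookup table; OMPI's real structures differ. */
typedef struct sketch_comm { int id; } sketch_comm;

#define SKETCH_TABLE_SIZE 4
static sketch_comm sketch_table[SKETCH_TABLE_SIZE] = { {0}, {1}, {2}, {3} };

/* Hypothetical stand-ins for MPI_SUCCESS / MPI_ERR_COMM. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_COMM = 5 };

/* Like the MPI_*f2c functions: an invalid Fortran index maps to an "invalid" (NULL) handle. */
static sketch_comm *sketch_comm_f2c(int f_handle)
{
    if (f_handle < 0 || f_handle >= SKETCH_TABLE_SIZE) {
        return NULL;
    }
    return &sketch_table[f_handle];
}

/* The argument check runs before the handle is touched, so a bad handle
 * produces an error code instead of a segfault from dereferencing NULL. */
static int sketch_comm_get_id(sketch_comm *comm, int *id)
{
    if (comm == NULL) {
        return SKETCH_ERR_COMM;
    }
    *id = comm->id;
    return SKETCH_SUCCESS;
}
```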

This commit was SVN r14100.
2007-03-21 11:10:42 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Jeff Squyres
266e805427 * Update parameter checking per MPI-1:2.4.1 and MPI-1:5.4.1 -- also
   return an error if MPI_COMM_NULL is used.
 * Minor style fixes.

This commit was SVN r14041.
2007-03-16 13:09:49 +00:00
Brian Barrett
e926bed69f Implement MPI_TYPE_CREATE_DARRAY function. Works with MPICH2 darray-pack
test, Sun's darray test, and an internal LANL test code.  I would not
assume it will work properly on other codes, as I'm still not sure I
completely understand what the standard says this function is supposed to
do.
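
For background, this is a sketch of the per-dimension arithmetic that the MPI standard describes for MPI_DISTRIBUTE_BLOCK with MPI_DISTRIBUTE_DFLT_DARG. The block size is ceil(gsize / psize), and process coordinate r owns the elements in [r*darg, min((r+1)*darg, gsize)). It is not Open MPI's implementation, the function names are hypothetical, and cyclic distributions are not covered.

```c
/* Block size for MPI_DISTRIBUTE_BLOCK with the default argument: ceil(gsize / psize). */
static int sketch_block_size(int gsize, int psize)
{
    return (gsize + psize - 1) / psize;
}

/* Number of elements that process coordinate r owns along one dimension.
 * Trailing processes may own fewer elements than darg, or none at all. */
static int sketch_block_local_size(int gsize, int psize, int r)
{
    int darg = sketch_block_size(gsize, psize);
    int remaining = gsize - r * darg;
    if (remaining <= 0) {
        return 0;
    }
    return remaining < darg ? remaining : darg;
}
```

For example, with gsize = 10 and psize = 4 the block size is 3, so the four processes own 3, 3, 3 and 1 elements.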

Refs trac:65

This commit was SVN r13967.

The following Trac tickets were found above:
  Ticket 65 --> https://svn.open-mpi.org/trac/ompi/ticket/65
2007-03-08 16:33:08 +00:00
George Bosilca
4b63631535 Allow correct duplication for MPI_UB and MPI_LB. The problem is that we cannot
create a duplicate type, because any duplicate type loses the PREDEFINED flag.
An MPI_LB (respectively MPI_UB) without the PREDEFINED flag is useless, as it is
no longer a marker. The solution is to return the same pointer, but only after
its reference count has been increased. For this to work, I changed the
destruction code to check the reference count of an object before complaining
about destroying a predefined type.
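
A minimal sketch of the approach described above, using a hypothetical datatype struct rather than OMPI's real one. Duplicating a predefined marker returns the same object with its reference count bumped. Release only complains about destroying a predefined type when doing so would drop its last reference.

```c
#include <stdio.h>

/* Hypothetical datatype object; OMPI's real datatype structure is far richer. */
typedef struct sketch_type {
    int refcount;
    int predefined;   /* stands in for the PREDEFINED flag */
} sketch_type;

/* Duplicating a predefined marker (e.g. MPI_LB/MPI_UB) returns the same object
 * with its reference count bumped, so the PREDEFINED flag is preserved. */
static sketch_type *sketch_type_dup(sketch_type *t)
{
    if (t->predefined) {
        t->refcount++;
        return t;
    }
    return NULL;  /* copying non-predefined types is out of scope for this sketch */
}

/* Decrements the reference count. Complains only if a predefined type would
 * lose its last reference. Returns 0 on success, -1 on an illegal destroy. */
static int sketch_type_release(sketch_type *t)
{
    if (t->predefined && t->refcount <= 1) {
        fprintf(stderr, "attempt to destroy a predefined type\n");
        return -1;
    }
    t->refcount--;
    return 0;
}
```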

This fixed ticket #317.

This commit was SVN r13942.
2007-03-06 18:21:49 +00:00
Tim Prins
74555cda51 - Re-enable MPI_Comm_spawn_multiple from singletons. It has been working for a while, but the check was never removed.
- Apply the coding standards to some code
- Remove now unused help message

This commit was SVN r13858.
2007-02-28 22:09:30 +00:00