1
1
Граф коммитов

1734 Коммитов

Автор SHA1 Сообщение Дата
Josh Hursey
66af515061 Fix C/R functionality with the new libtool. This fixes the case where the restarted process cannot be checkpointed or finalized.
Short Version:
--------------
Event engine needs to be flushed so it does not use old/stale file descriptors.

Long Version:
-------------
The problem was that the restarted process was waiting for the socket to the local daemon to finish establishing during the 'sync' operation. The core problem was that the daemon was sending a header of 36 bytes, but the restarted process only received 35 bytes of the message. So the restarted process became stuck waiting for the last byte to arrive.

After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (to signal itself when a signal arrives). So when the daemon wrote in to the new fd the event engine was stealing the first byte (*shakes fist at event engine*) before the recv() could be posted.

The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem.


A few other minor things:
-------------------------
 * Add a check to make sure the event engine is balanced in its init/finalize
 * Add the opal_event_base_close() to the BLCR restart exec function (still not 100% sure it is needed, but there it is).

This commit was SVN r24296.
2011-01-25 22:43:47 +00:00
Jeff Squyres
afa654746c Somehow this has been sitting, uncommitted, in a local checkout since
last December.  :-(

Add new MCA param: maffinity_libnuma_policy.  Thanks to David
Singleton for the suggestion.  Here's the help text about it:

{{{
   MCA maffinity: parameter "maffinity_libnuma_policy" (current value:
                  <loose>, data source: default value)
                  Binding policy that determines what happens if memory
                  is unavailable on the local NUMA node.  A value of
                  "strict" means that the memory allocation will fail;
                  a value of "loose" means that the memory allocation
                  will spill over to another NUMA node.
}}}

This commit was SVN r24290.
2011-01-24 14:39:16 +00:00
Abhishek Kulkarni
3243b16bb3 Decode SOS error code before checking it with the native error code.
This commit was SVN r24281.
2011-01-20 23:21:38 +00:00
Abhishek Kulkarni
45a53b4f7a Add a missing to opal_sos_finalize in opal_finalize_util.
This commit was SVN r24280.
2011-01-20 23:18:02 +00:00
Jeff Squyres
189b541dbd Add a proper help message for the mca_verbose MCA param (and shuffle
the code to be slightly more efficient).

This commit was SVN r24256.
2011-01-14 20:18:06 +00:00
George Bosilca
5390fd6f33 Reshape the datatype engine. The basic types are built down in OPAL. MPI types are
either direct link to these basic predefined types, or a combination of them.
Anyway, the first items in the datatype list belong to OPAL, the second round
are MPI datatypes created by composing basic OPAL datatypes, and the last
batch are mapped datatype (direct correspondance between an OMPI datatype and
an OPAL one such as int -> int32_t).

Modify the op to fit this new scheme.

This commit was SVN r24247.
2011-01-13 06:08:54 +00:00
Ralph Castain
b09f57b03d Update the multicast subsystem - ported from Cisco branch
This commit was SVN r24246.
2011-01-13 01:54:05 +00:00
Terry Dontje
56c03a3853 removing a file I should not have added
This commit was SVN r24220.
2011-01-11 19:02:08 +00:00
Terry Dontje
a374661ead add configure.params to solaris sysinfo module to allow it to be built
This commit was SVN r24219.
2011-01-11 18:31:55 +00:00
Jeff Squyres
cd8f12d8e5 Remove a few useless files that were missed last night.
This commit was SVN r24218.
2011-01-11 14:15:31 +00:00
Jeff Squyres
54cb4eb2b5 Merge over new version of hwloc 1.1 from the vendor branch. Update
the module to use the new hwloc bitmap API (the cpuset API is both
klunkier and deprecated), which simplified a few things.

This commit was SVN r24217.
2011-01-11 01:41:10 +00:00
Ralph Castain
ac1853b5d8 Took me a couple of days, but finally tracked this one down. Some compilers/glibc's don't like composite test statements in a return and just randomly pick one of the two options.
So....don't do that!!!

This commit was SVN r24212.
2011-01-10 16:29:42 +00:00
Josh Hursey
bbfdf04a81 Fix a couple of 'unused variable' warnings, and one return value warning.
{{{
base/paffinity_base_service.c: In function ‘opal_paffinity_base_cset2mapstr’:
base/paffinity_base_service.c:623: warning: unused variable ‘range_last’
base/paffinity_base_service.c:623: warning: unused variable ‘range_first’
base/paffinity_base_service.c:622: warning: unused variable ‘count’
base/paffinity_base_service.c:622: warning: unused variable ‘m’
}}}

{{{
connect/btl_openib_connect_oob.c: In function ‘init_ud_qp’:
connect/btl_openib_connect_oob.c:1111: warning: control reaches end of non-void function
connect/btl_openib_connect_oob.c: In function ‘init_device’:
connect/btl_openib_connect_oob.c:1235: warning: unused variable ‘i’
connect/btl_openib_connect_oob.c: In function ‘get_pathrecord_sl’:
connect/btl_openib_connect_oob.c:1323: warning: unused variable ‘i’
}}}

This commit was SVN r24196.
2010-12-30 15:37:50 +00:00
Ethan Mallove
9251785161 Emit an error (instead of a SEGV) if the "compiler" parameter is not set
in the wrapper data file.

This commit was SVN r24190.
2010-12-21 19:01:39 +00:00
Jeff Squyres
a525e70f46 Convert "opal_show_help" to be a global variable pointer.
It is statically initialized to the real back-end OPAL show_help
function.  During orte_show_help_init(), the variable is re-assigned
with the value of the back-end ORTE show_help function (the one that
does error message aggregation).  

Therefore, anything that calls opal_show_help() after a certain point
in orte_init() will have their show_help messages be aggregated.
w00t!  Even code down in OPAL -- that has no knowledge of ORTE -- will
have their messages aggregated.  '''Double w00t!'''

During orte_show_help_finalize(), we restore the original pointer
value so that it something calls opal_show_help() after
orte_finalize(), it'll still work properly (but it won't be
aggregated).  

This commit was SVN r24185.
2010-12-16 23:00:25 +00:00
Terry Dontje
6da16ab0d7 add format parameter and layout format to OMPI_Affinity_str
This commit was SVN r24182.
2010-12-16 15:11:17 +00:00
Shiqing Fan
883aeedd26 Add support for nanosleep function using Sleep on Windows. The accuracy of the sleep function on Windows is 1 millisecond mentioned in MSDN doc.
This commit was SVN r24175.
2010-12-15 15:43:25 +00:00
Shiqing Fan
ec82e73bce use sockets instead of pipes on Windows.
This commit was SVN r24174.
2010-12-15 14:34:25 +00:00
George Bosilca
b4355408f5 Fix the Sparc and Sparcv9 atomics based on Nicolai Stange
patch.

CMR:v1.5
CMR:v1.4

This commit was SVN r24150.
2010-12-03 19:16:53 +00:00
George Bosilca
bb412a5ff7 Indentation.
This commit was SVN r24148.
2010-12-03 19:13:57 +00:00
Rolf vandeVaart
3f7dd84278 Fix libevent so it can compile in the few cases where sys/queue.h does not exist.
1. Remove it from libevent207.h because it is not needed.
2. Add compat to the include list so it can use queue.h when needed.

This commit was SVN r24144.
2010-12-02 23:05:02 +00:00
Shiqing Fan
f43862420c Convert the bad dos line endings to unix style for all windows related files.
This commit was SVN r24137.
2010-12-02 12:08:08 +00:00
Ralph Castain
c56185887b Change the event base "wakeup" support to enable the passing of events to the central thread for add/del. Add a macro OPAL_UPDATE_EVBASE for this purpose as it will likely be widely used.
Update the ORTE thread support to utilize this capability. Update the rmcast framework to track the change.

This commit was SVN r24121.
2010-12-01 04:26:43 +00:00
Ralph Castain
2523c9b2e8 Overload the event_base_t struct to include a (hopefully) temporary change to deal with cross-event-base synchronization. This is done transparently so no code changes are required within the rest of the code base. Comments explain what was changed and why.
This commit was SVN r24105.
2010-11-30 16:14:19 +00:00
Ralph Castain
71f116d21f Expose the event_active API
This commit was SVN r24090.
2010-11-24 23:30:13 +00:00
Ralph Castain
380835602c Add support for internal libevent threading support. Add configure logic to define an appropriate flag, and then use that flag to expose the required functions.
This commit was SVN r24088.
2010-11-24 23:24:53 +00:00
Ralph Castain
aa467162da Add a "name" field to the condition wait object to help with debugging
This commit was SVN r24087.
2010-11-24 23:20:06 +00:00
Shiqing Fan
39c9f7468e Add support for managing priorities of windows mca components.
Correct the generated strings in mpi.h.

This commit was SVN r24082.
2010-11-23 19:09:06 +00:00
Rolf vandeVaart
e7ff9375d7 Use pid_t to avoid warnings on some platforms.
This commit was SVN r24072.
2010-11-19 17:14:33 +00:00
Rolf vandeVaart
1735f98c78 Avoid potential warnings by using pid_t in all places.
This commit was SVN r24071.
2010-11-19 16:29:45 +00:00
Shiqing Fan
358b4a5cba Add an option to enable the debug postfix for executables.
This commit was SVN r24070.
2010-11-19 15:54:13 +00:00
Rolf vandeVaart
4b14a6416f No need to conditionalize around this macro. It turns
out it is needed even in one case when we configure
--without-threads.

This commit was SVN r24069.
2010-11-19 15:47:48 +00:00
Shiqing Fan
4fea0f021e Per r24062, this should also be removed.
This commit was SVN r24064.

The following SVN revision numbers were found above:
  r24062 --> open-mpi/ompi@3b0caf7dea
2010-11-17 17:14:55 +00:00
Rolf vandeVaart
3b0caf7dea Remove inclusion of stdbool.h where not needed.
Change OMPI code in libevent to not use bool.
Add some comments to indicate OMPI specific code.
This should fix compiles on Sun Studio Solaris.

This commit was SVN r24062.
2010-11-17 15:14:00 +00:00
George Bosilca
96abaf2e17 Pushing the Debian patch (based on Manuel Prinz modifications).
This commit was SVN r24061.
2010-11-17 02:36:03 +00:00
George Bosilca
d997ef4f49 Update the copyright. Add the fix from Tim Mattox regarding the computation
of the upper bound.

This commit was SVN r24060.
2010-11-17 02:10:15 +00:00
Ralph Castain
32be69eaef Update the OMPI libevent interface module and the internal libevent event.c file to provide ability to disable specific event modes. Basically an issue between #define and checking to see if the value was defined to zero.
This commit was SVN r24056.
2010-11-16 16:06:32 +00:00
Ralph Castain
1b3421f16e Fix a bug spotted by Rolf - ensure that disable-event-xxx results in the corresponding have_event_xxx being undefined or defined to 0
This commit was SVN r24055.
2010-11-16 04:37:30 +00:00
Rolf vandeVaart
37d5267895 The fix for ticket #2560 was somehow removed in the
great autogen update.  Therefore, put them back.

This commit was SVN r24053.
2010-11-15 21:41:56 +00:00
Ralph Castain
b43a4509ac Remove stale mca param. Ensure that verbosity gets properly set for event framework debug
This commit was SVN r24050.
2010-11-13 15:37:17 +00:00
Ralph Castain
db014edb0b Initialize boolean
This commit was SVN r24048.
2010-11-13 15:31:55 +00:00
George Bosilca
36ce319869 Add a second version of the datatype copy function using memmove instead of memcpy.
As memmove is slower than memcpy, I added the required logic to only use it when
really necessary.

No modification from developers point of view, you should always call
opal_datatype_copy_content_same_ddt.

This commit was SVN r24047.
2010-11-12 23:22:35 +00:00
Jeff Squyres
e4744b4ed5 Per http://www.open-mpi.org/community/lists/devel/2010/11/8671.php,
change a bunch of OMPI_<foo> names to OPAL_<foo>.

This commit was SVN r24046.
2010-11-12 23:22:11 +00:00
Shiqing Fan
1f4eae2046 Type cast for compiling under VS 2010.
This commit was SVN r24044.
2010-11-12 08:31:23 +00:00
Shiqing Fan
c03ea1a5f3 A more clean way to build on Windows.
It's not possible to combine two shared libraries on Windows, so we have to do it a bit different. First generate a small event static library by just linking the object files, and link it into other libraries that needs the libevent API.

This commit was SVN r24039.
2010-11-11 12:02:54 +00:00
Jeff Squyres
52f5dd906c Fixes trac:2611: get updated code from GASNet (thanks Paul Hargrove!) that
handles more modern versions of Autoconf's --program-transform
arguments.  Also make it clear that the message is coming from Open
MPI logic, so that we don't blame Autoconf, Red Hat, or anyone else
next time!

This commit was SVN r24024.

The following Trac tickets were found above:
  Ticket 2611 --> https://svn.open-mpi.org/trac/ompi/ticket/2611
2010-11-09 23:36:30 +00:00
Jeff Squyres
dded8a9756 Ensure to always remove the .new file
This commit was SVN r24023.
2010-11-09 23:33:53 +00:00
Shiqing Fan
482a621e31 Change the behavior of exporting/importing symbols on Windows, so that to fit the new build procedure, i.e. import statically linked opal/orte libraries for other libraries/binaries. There are several use cases when creating dll libraries:
1. create DLL A, export symbols of A, import nothing  (A normally is OPAL)
   should define _USRDLL , A_EXPORT 

2. create DLL B, export symbols of B, import A.lib    (B could be ORTE, OMPI or other ompi tools)
   should define _USRDLL, B_EXPORT

3. create DLL C, import B.dll    (C could be external libs or apps)
   should define B_IMPORT

This commit was SVN r24016.
2010-11-09 16:13:30 +00:00
Shiqing Fan
7bac326920 Fix Windows build, add custom command to generate static libraries (opal and orte) for shared build.
This commit was SVN r24012.
2010-11-09 08:32:45 +00:00
Ralph Castain
fa919be622 Add support for thread_kill
This commit was SVN r24008.
2010-11-08 19:06:10 +00:00
Terry Dontje
8e0b24a45b add comment to r23998 code change to be able to track libevent code change better
This commit was SVN r24005.

The following SVN revision numbers were found above:
  r23998 --> open-mpi/ompi@e8aa8984a8
2010-11-08 14:36:28 +00:00
Terry Dontje
e8aa8984a8 corrected stdbool.h inclusion to allow Oracle C++ compilers to work with OMPI
This commit was SVN r23998.
2010-11-05 18:54:19 +00:00
Josh Hursey
676adfb7cc fix compile error, need to revisit this line later
This commit was SVN r23991.
2010-11-03 17:34:11 +00:00
Shiqing Fan
a7dc32afb0 Remove the OPAL_DECLSPEC for the event functions.
This commit was SVN r23987.
2010-11-03 09:10:12 +00:00
Shiqing Fan
505efbaa27 Update the CMake scripts, solve a few export symbols for Windows.
This commit was SVN r23976.
2010-11-02 16:39:27 +00:00
Jeff Squyres
9c15a30b75 Really fix the libevent make distcheck problem. The main issue is how
libevent creates its event-config.h during "make all" (vs. during
configure).  The prior method around this didn't work because it wrote
an event-config.h.in in the source tree -- a Bad Idea(tm).  The new
way uses AC_CONFIG_COMMAND to get stuff executed at the end of
config.status to create event-config.h.  This seems to work properly
during make distcheck.

This commit was SVN r23975.
2010-11-01 23:28:50 +00:00
Jeff Squyres
6bd41cf5d8 Fixes for vpath builds; this should enable 'make dist' again.
This commit was SVN r23973.
2010-10-29 22:07:52 +00:00
Ralph Castain
0171e05942 Only add include paths for event headers if --with-devel-headers was specified
This commit was SVN r23968.
2010-10-29 00:43:10 +00:00
Ralph Castain
838ed14401 Include the libevent headers when --with-devel-headers is specified. Ensure that the proper include paths are added to the wrapper compilers - thanks to Jeff for figuring out how to do it.
This commit was SVN r23967.
2010-10-28 21:26:07 +00:00
Ralph Castain
9ea2b196ce Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.

Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.

This commit was SVN r23966.
2010-10-28 15:22:46 +00:00
Brian Barrett
50394a05f2 Restore ordering to installdirs components
This commit was SVN r23964.
2010-10-28 01:03:16 +00:00
Terry Dontje
b3f2ac8d46 removed direct include of stdbool.h from event.h that was causing studio C++ issues. Also removed include of stdbool.h in a couple other places since it was already being pulled in via opal_config_bottom.h.
This commit was SVN r23963.
2010-10-27 20:47:42 +00:00
Shiqing Fan
199df1eadf Rename a few var names.
This commit was SVN r23959.
2010-10-27 11:52:57 +00:00
Jeff Squyres
33c3b71317 We had long-ago added a new loop type to libevent: EVLOOP_ONELOOP.
After talking with Brian, we're pretty sure that this is only because
really, really old libevent didn't allow bitwise or-ing of the other
loop types, because what we really need is (EVLOOP_ONCE |
EVLOOP_NONBLOCK).  And that's what EVLOOP_ONELOOP did (i.e., we
changed the logic of libevent's event.c to let ONELOOP do both ONCE
and NONBLOCK things).

In the new libevent version, we didn't implement EVLOOP_ONELOOP
properly.  As a result, and we got hangs in the SM BTL add_procs
function.  Note that the SM BTL wasn't to blame -- it was purely a
side-effect of bad ONELOOP integration (i.e., if you got past the SM
BTL add_procs, you may well have hung somewhere else).

This commit removes all ONELOOP customizations from event.c and
returns it to (almost) its original state from the libevent 2.0.7-rc
distribution.  Everwhere in the code base where we used ONELOOP, we
now use (ONCE | NONBLOCK).

This commit was SVN r23957.
2010-10-26 20:29:22 +00:00
Ralph Castain
a5c440c974 Turn off libevent's internal thread support to (hopefully) minimize performance hit
This commit was SVN r23956.
2010-10-26 20:10:44 +00:00
Shiqing Fan
fae7076d64 add new files into the tarball.
This commit was SVN r23952.
2010-10-26 14:55:37 +00:00
Shiqing Fan
a3d9c91ff7 Exclude stdbool.h for Windows, and use the definition in opal. Immigrate the socket pair support from libevent. Fix other minor things and make it compile.
This commit was SVN r23951.
2010-10-26 14:53:50 +00:00
Ralph Castain
847e43703f Remove cruft
This commit was SVN r23950.
2010-10-26 14:49:36 +00:00
Shiqing Fan
b2c3cb300c Correctly configure the new libevent mca for Windows.
This commit was SVN r23946.
2010-10-26 09:33:47 +00:00
Ralph Castain
86c7365e8e Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.

This commit was SVN r23943.
2010-10-26 02:41:42 +00:00
Jeff Squyres
ed1e9a412a Need these files in all tarballs -- so don't conditionally add them to
EXTRA_DIST. 

This commit was SVN r23938.
2010-10-25 18:31:38 +00:00
Jeff Squyres
d14474969b Need this variable in optimized builds, too.
This commit was SVN r23937.
2010-10-25 18:31:01 +00:00
George Bosilca
bc3e1376ba event-config.h only exists in the builddir, so we need to explicitly
include it while building.

This commit was SVN r23936.
2010-10-25 18:29:52 +00:00
George Bosilca
c2e40f8616 Remove a warning about signed to unsigned comparaison.
This commit was SVN r23935.
2010-10-25 18:29:11 +00:00
George Bosilca
b9a06afd98 opal_event_libevent207 is prototyped as const, so it should be defined as const.
This commit was SVN r23934.
2010-10-25 18:28:42 +00:00
Jeff Squyres
1d1571a86c Fix vpath builds.
This commit was SVN r23932.
2010-10-25 17:48:02 +00:00
Ralph Castain
a04da165bc Remove the sample and test code from the libevent distro - don't need to include them in ompi
This commit was SVN r23931.
2010-10-25 14:53:33 +00:00
Ralph Castain
bab990d812 Revert r23928 as being the incorrect fix. The correct fix is not to include ipv6 interfaces when ipv6 support was not requested.
This commit was SVN r23930.

The following SVN revision numbers were found above:
  r23928 --> open-mpi/ompi@7394f6d167
2010-10-25 14:31:18 +00:00
Abhishek Kulkarni
c671ec52d1 Fix broken trunk compile after the libevent changes.
This commit was SVN r23929.
2010-10-25 14:11:48 +00:00
Ralph Castain
7394f6d167 Silence warnings about IPV6 sa_family not known when ipv6 support is not enabled in configure
This commit was SVN r23928.
2010-10-25 13:56:23 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Jeff Squyres
082086085b Fix some wordings in the test messages
This commit was SVN r23924.
2010-10-23 14:35:25 +00:00
Jeff Squyres
8ffb046649 Fix a few problems with the compiler visibility test:
* Update to be safe for AC 2.68 by using AC_LINK_IFELSE instead of
   AC_TRY_LINK
 * If enable visibility was used, ensure we fail if the compiler
   doesn't support it
 * Rename OMPI_CHECK_VISIBILITY -> OPAL_CHECK_VISIBILITY (and all
   internal variables)

This commit was SVN r23923.
2010-10-23 14:32:44 +00:00
Ralph Castain
29b16cc800 Add missing include
This commit was SVN r23892.
2010-10-15 04:04:20 +00:00
Jeff Squyres
e09bbb49a9 No need to have this AC_ARG_WITH in every component configure.m4 -- just put it up in the framework-level configure.m4.
This commit was SVN r23890.
2010-10-14 22:39:48 +00:00
Jeff Squyres
66d15035ab Replace some intentional-segv's with abort(). Seems safer and doesn't
cause all kinds of compiler warnings.

This commit was SVN r23889.
2010-10-14 22:01:14 +00:00
Sylvain Jeaugey
5fb2a2f2c9 Add a check for the ummunotify device before setting up ptmalloc2 hooks.
This commit was SVN r23882.
2010-10-11 15:05:57 +00:00
Sylvain Jeaugey
78176d2aeb Fix missing include in ummunotify
This commit was SVN r23881.
2010-10-11 15:03:00 +00:00
Jeff Squyres
69a64e5905 Fix typo that prevented the valgrind component from configuring properly
This commit was SVN r23874.
2010-10-07 22:39:08 +00:00
Jeff Squyres
a95ca7444e Fix a meaningless compare of an unsigned against 0. Rework the logic
a bit so that the secondary loop isn't even necessary; makes the whole
thing much simpler, anyway.

This commit was SVN r23860.
2010-10-07 15:04:50 +00:00
Ralph Castain
d9389689b1 Fix yet another mangling
This commit was SVN r23818.
2010-09-30 17:53:52 +00:00
Jeff Squyres
73bcc4a36b Fix mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake will always create a set of directories under installdir, even if you do not --with-devel-headers. The set of directories will be empty, but still -- they should not be there at all. This commit fixes that -- the directories are not created at all if you do not --with-devel-headers
This commit was SVN r23801.

The following SVN revision numbers were found above:
  r23764 --> open-mpi/ompi@40a2bfa238
2010-09-24 22:53:28 +00:00
Jeff Squyres
7ef20f60f3 Autoconf updates to make us compatible with AC 2.68. Thanks to Ralf W. for the patch!
This commit was SVN r23797.
2010-09-23 22:37:52 +00:00
Ralph Castain
407eefc66d Update the if configure to include "opal" so they will build!
This commit was SVN r23787.
2010-09-22 03:19:15 +00:00
Ralph Castain
3631e4e936 Revert remaining svn kruft from r23764
This commit was SVN r23786.

The following SVN revision numbers were found above:
  r23764 --> open-mpi/ompi@40a2bfa238
2010-09-22 01:11:40 +00:00
Jeff Squyres
0ca617e570 Make this a warning, not an error.
This commit was SVN r23767.
2010-09-18 07:14:58 +00:00
Ralph Castain
40a2bfa238 WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.

Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.

This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Rolf vandeVaart
09750d0310 Need output.h header file for opal_output() definition.
Otherwise, build will fail when configuring with --enable-picky.

This commit was SVN r23763.
2010-09-17 12:22:17 +00:00
Shiqing Fan
9a47ca1995 Correct the place of including the if.h, and change retain_loopback to opal_if_retain_loopback for windows module too.
This commit was SVN r23756.
2010-09-14 14:03:48 +00:00
Ralph Castain
c74ce1632a Catch a couple of places (one hidden inside an #if 0, other in solaris module) where retain_loopback needs to be opal_if_retain_loopback
This commit was SVN r23755.
2010-09-14 11:37:10 +00:00
Shiqing Fan
95b17c1e82 Add a missing header for if windows.
This commit was SVN r23754.
2010-09-14 07:51:38 +00:00
Ralph Castain
e96b5f486f Reorganize the opal interface code in opal/util/if.c per prior emails and telecon discussions. Move the interface discovery code into a framework so that configuration logic can separate it out (instead of the prior #if-#else confusion).
All interface APIs for accessing the info remain unchanged in opal/util/if.c.

This has been tested on Mac, Linux, and NetBSD. Nobody else seemed interested in testing it, so there may be some future problems revealed as people try it on other OSs.

This commit was SVN r23743.
2010-09-13 01:58:51 +00:00
Jeff Squyres
3b14366c85 Fix a copyright statement
This commit was SVN r23741.
2010-09-12 09:55:01 +00:00
Rolf vandeVaart
ef8090ec71 Fix the ia32 atomic add and subtract functions so they
do the right thing.  They now properly return
the value after the update.  This also fixes all warnings
reported by the Sun Studio compiler.  George provided the
new assembly routines.  I added some configure code to make
sure the compilers could handle it.

This fixes trac:2560.

This commit was SVN r23721.

The following Trac tickets were found above:
  Ticket 2560 --> https://svn.open-mpi.org/trac/ompi/ticket/2560
2010-09-08 10:47:15 +00:00
Rolf vandeVaart
14e7bcc383 Create new entries in the wrapper data files so the
administrator can specify compiler flags that get
inserted into the command before the user's flags.
These flags can be specified at configure time.
Reviewed by Jeff Squyres.

This fixes ticket #2474.

This commit was SVN r23709.
2010-09-02 10:47:55 +00:00
Rainer Keller
97511912ec - Fixup several functions, that cannot return
- Add one instance where we do not use a parameter in a function
 - Fix a buglet in commit r23689, where the attribute-for-function ptrs
   was applied.

This commit was SVN r23690.

The following SVN revision numbers were found above:
  r23689 --> open-mpi/ompi@5eb571c458
2010-08-31 12:21:13 +00:00
Rainer Keller
5eb571c458 - As suggested in CMR #2558, attribute-macros should be
be tested on function pointers and assigned accordingly,
   instead of using the pre-processor in the header files.

   A functional change is (re-) specifying __opal_attribute_noreturn__
   on orte_errmgr_base_abort(): All modules in the errmgr framework
   either use this function, or define their own abort function,
   which sets __opal_attribute_noreturn__.
   This attributes was taken out with the errmgr overhaul in r22872.

This commit was SVN r23689.

The following SVN revision numbers were found above:
  r22872 --> open-mpi/ompi@e4f2d03d28
2010-08-31 10:28:51 +00:00
Brad Benton
09c4f4d95c Added copyright notices for the files modified in r23669.
This commit was SVN r23687.

The following SVN revision numbers were found above:
  r23669 --> open-mpi/ompi@271cfa8c9a
2010-08-30 17:46:47 +00:00
Jeff Squyres
3eedbee7a4 Fixes trac:2541. Ensure that we keep CPPFLAGS if a non-standard valgrind location was specified. CMR:v1.4.3 CMR:v1.5
This commit was SVN r23680.

The following Trac tickets were found above:
  Ticket 2541 --> https://svn.open-mpi.org/trac/ompi/ticket/2541
2010-08-27 22:45:02 +00:00
Rainer Keller
4abcf5a0d7 - The Sun-compiler 12 update 1 complains about noreturn-attributes
assigned to function-declarations.
   Check this case and mark the currently only case existing in trunk.

   Thanks to Paul Hargrove for bringing this up.

   Let's test the svn commit msg CMR:v1.5

This commit was SVN r23676.
2010-08-27 09:18:30 +00:00
Rainer Keller
044b387d3c - If we don't compile with PGI, then mark the parameter as unused,
otherwise we get swamped with warnings by gcc, everywhere header is
   included.
 - Remove redundant declaration of opal_datatype_safeguard_pointer_debug_breakpoint

   Check whether  CMR:v1.5 works

This commit was SVN r23674.
2010-08-26 15:07:18 +00:00
Nysal Jan
271cfa8c9a Fix the the opal_path_nfs test for GPFS. Reported by Paul H. Hargrove
This commit was SVN r23669.
2010-08-26 10:10:16 +00:00
Jeff Squyres
97fb426325 Per long-ago RFC, now that the odsl default module reports errors nicely, remove all paffinity components except for hwloc and test.
This commit was SVN r23666.
2010-08-25 22:34:30 +00:00
Jeff Squyres
a5ce58f098 Define that we return OPAL_ERR_TIMEOUT if the other end of the socket
closes in an opal_fd_read().

This commit was SVN r23650.
2010-08-24 19:07:04 +00:00
Ethan Mallove
f42c2a737f Fixes trac:2532 - "MPI_Put can result in SIGBUS on SPARC"
Reviewed by Rolf V and Brian B

This commit was SVN r23649.

The following Trac tickets were found above:
  Ticket 2532 --> https://svn.open-mpi.org/trac/ompi/ticket/2532
2010-08-24 18:10:43 +00:00
Ralph Castain
51833bfe6c Not -everyone- wants to ignore loopback devices. Give us a choice.
This commit was SVN r23637.
2010-08-24 02:37:05 +00:00
Shiqing Fan
c110edbf44 Use exclude lists for non-ordinary sub directories check.
This commit was SVN r23631.
2010-08-23 09:43:05 +00:00
Rolf vandeVaart
e71827b8ff Undo 4 of the 5 changes introduced by r22638. Leave
one of them in as it may still be needed on Solaris.

This fixes trac:2530.

This commit was SVN r23626.

The following SVN revision numbers were found above:
  r22638 --> open-mpi/ompi@2a4b1227d9

The following Trac tickets were found above:
  Ticket 2530 --> https://svn.open-mpi.org/trac/ompi/ticket/2530
2010-08-18 20:06:50 +00:00
Rainer Keller
33f2b9398e - This warning now is not supported anymore. Using it generates
a warning itselve (when another warning is generated within the file),
   which can be rather anying.
   Therefore check for output regarding this unrecognized warning.

This commit was SVN r23624.
2010-08-18 06:01:23 +00:00
Ralph Castain
23904c2f3e Correct the extra_dist path to the .windows file
This commit was SVN r23613.
2010-08-14 01:21:58 +00:00
Jeff Squyres
a2f349167e Update hwloc to 1.0.3a1r2398. This fixes a problem with Solaris
linking against libibverbs on Solaris.

Sorry for the mid-day configure change folks; I meant to commit this
last night and forgot.  :-(

This commit was SVN r23606.
2010-08-13 13:18:09 +00:00
Shiqing Fan
550f180014 Add a windows support file into the tarball.
This commit was SVN r23605.
2010-08-13 11:54:13 +00:00
Rainer Keller
14aad075eb - On Jaguar, we don't have pretty printed stackframe, aka no opal_stackframe_output*
This commit was SVN r23602.
2010-08-12 14:44:56 +00:00
Shiqing Fan
330999e36c Some fixes for C/R enhancement on Windows. Add the option and fix some type casts, just let it compile.
This commit was SVN r23599.
2010-08-12 13:31:37 +00:00
Josh Hursey
e12ca48cd9 A number of C/R enhancements per RFC below:
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php

Documentation:
  http://osl.iu.edu/research/ft/

Major Changes: 
-------------- 
 * Added C/R-enabled Debugging support. 
   Enabled with the --enable-crdebug flag. See the following website for more information: 
   http://osl.iu.edu/research/ft/crdebug/ 
 * Added Stable Storage (SStore) framework for checkpoint storage 
   * 'central' component does a direct to central storage save 
   * 'stage' component stages checkpoints to central storage while the application continues execution. 
     * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress) 
     * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching) 
 * Added Compression (compress) framework to support 
 * Add two new ErrMgr recovery policies 
   * {{{crmig}}} C/R Process Migration 
   * {{{autor}}} C/R Automatic Recovery 
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component 
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option) 
   * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342) 
   * {{{OMPI_CR_Restart}}} 
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules) 
   * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192) 
   * {{{OMPI_CR_Quiesce_start}}} 
   * {{{OMPI_CR_Quiesce_checkpoint}}} 
   * {{{OMPI_CR_Quiesce_end}}} 
   * {{{OMPI_CR_self_register_checkpoint_callback}}} 
   * {{{OMPI_CR_self_register_restart_callback}}} 
   * {{{OMPI_CR_self_register_continue_callback}}} 
 * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future. 
 * Add a progress meter to: 
   * FileM rsh (filem_rsh_process_meter) 
   * SnapC full (snapc_full_progress_meter) 
   * SStore stage (sstore_stage_progress_meter) 
 * Added 2 new command line options to ompi-restart 
   * --showme : Display the full command line that would have been exec'ed. 
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413) 
 * Deprecated some MCA params: 
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir 
   * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir 
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared 
   * snapc_base_store_in_place deprecated, replaced with different components of SStore 
   * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref 
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported 
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem 

Minor Changes: 
-------------- 
 * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing. 
 * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components 
 * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it. 
 * Fixes trac:2208 : Honor various TMPDIR varaibles instead of forcing {{{/tmp}}} 
 * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set. 
 * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality. 
 * Cleanup the CRS framework and components to work with the SStore framework. 
 * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably). 
 * Add 'quiesce' hook to CRCP for a future enhancement. 
 * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}. 
 * Add optional application level INC callbacks (registered through the CR MPI Ext interface). 
 * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive. 
 * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked. 
 * {{{opal-restart}}} also support local decompression before restarting 
 * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata 
 * {{{orte-restart}}} now uses the SStore framework to work with the metadata 
 * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality. 
 * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}. 
 * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped. 
 * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away. 
 * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options. 
 * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities. 
 * Improve the checks for 'already checkpointing' error path. 
 * A a recovery output timer, to show how long it takes to restart a job 
 * Do a better job of cleaning up the old session directory on restart. 
 * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment) 
 * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize. 

This commit was SVN r23587.

The following Trac tickets were found above:
  Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
  Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
  Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
  Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
  Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
  Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
  Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
2010-08-10 20:51:11 +00:00
Terry Dontje
b74ef351b7 Added new solaris sysinfo module. Also added code to assign
orte_local_chip_type and orte_local_chip_model in MPI processes it the
appropriate sysinfo module found the values on the machine.

This commit was SVN r23581.
2010-08-09 19:28:56 +00:00
Nysal Jan
b6524f6a92 Fix the conditional branch, jump to the correct location. Reported by Matthew Clark
This commit was SVN r23576.
2010-08-09 10:07:58 +00:00
Ralph Castain
9c69175117 If debug is enabled, provide an mca param and supporting logic to output when OPAL_ACQUIRE_THREAD is waiting and has obtained the thread, and when OPAL_RELEASE_THREAD releases it.
This commit was SVN r23557.
2010-08-05 16:25:32 +00:00
Shiqing Fan
b8db8d0ef8 Need to change another variable name.
This commit was SVN r23556.
2010-08-05 12:38:28 +00:00
Shiqing Fan
714883d472 A better way to make this work with VS 2010.
This commit was SVN r23544.
2010-08-03 09:06:50 +00:00
Shiqing Fan
e822f465b5 Remove a bunch of warnings due to the new POSIX supplement in VS 2010.
This commit was SVN r23540.
2010-08-02 12:16:29 +00:00
Josh Hursey
ba7e94dd89 Some relatively minor C/R related cleanup
* Fix a configure warning for checking --enable-ft-thread
 * In hnp and orted ErrMgr components check to see if other components have already recovered this process before trying to recover it again.
 * Fix 'npernode' for restarting using the resilient rmaps component
 * export ompi_info_set, so that internal functionality can use it.

This commit was SVN r23535.
2010-07-30 18:59:34 +00:00
Shiqing Fan
ea7bf2bd9e Correctly check the data type alignment for VS 2010 environment, and set the event include paths to global level, in order to make the clever VS load them.
This commit was SVN r23534.
2010-07-30 14:25:15 +00:00
Ralph Castain
0ed98967ed Update the thread protection in the ring_buffer class
This commit was SVN r23532.
2010-07-29 02:12:44 +00:00
Rolf vandeVaart
3d9b05ba2b Fix bug introduced by r23463. We now handle positive
error codes correctly again.  Also fix a typo.
Reviewed by Jeff Squyres. 

This commit was SVN r23531.

The following SVN revision numbers were found above:
  r23463 --> open-mpi/ompi@2af3e6e5ae
2010-07-28 19:19:27 +00:00
Jeff Squyres
f313257022 This file should really be distclean, not maintainer clean (it's not
shipped in the tarball).

This commit was SVN r23525.
2010-07-28 14:24:51 +00:00
Jeff Squyres
dca1ee8822 Revert r23495. Per on-list discussion, it doesn't do what it was
supposed to do, and there's disagreement about whether the concept
that it was supposed to do was the Right Thing anyway.

http://www.open-mpi.org/community/lists/devel/2010/07/8223.php

This commit was SVN r23517.

The following SVN revision numbers were found above:
  r23495 --> open-mpi/ompi@32e6dae8b0
2010-07-27 22:38:07 +00:00
Jeff Squyres
88b7923fc5 At least on NetBSD 5.0_STABLE with Libtool 2.2.6b, lt_dlerror() can
sometimes return NULL, so be sure to handle that case properly.

This commit was SVN r23503.
2010-07-27 14:15:53 +00:00
Jeff Squyres
245dc1a86d Add a cast to avoid a compiler warnings on BSD.
This commit was SVN r23502.
2010-07-27 14:14:37 +00:00
Jeff Squyres
0ce1a82cde This commit looks much bigger than it is. There are only 2
substantive changes in this commit; the rest are minor style changes:

 1. Change an OBJ_NEW(opal_list_item_t) to OBJ_NEW(opal_if_t).  This
    was causing memory corruption in the BSD code paths.
 1. Move some local variables from the top of opal_if_init() to inside
    the non-BSD code paths so that we avoid bunches of warnings about
    unused variables when compiling on BSD.  In doing so, I indented
    the whole non-BSD section one level deeper, making the commit look
    huge. 

I also added a few {} around 1-line blocks, added some spaces, broke a
few lines, re-formatted a few comments, ...etc.  Trivial stuff.

This commit was SVN r23501.
2010-07-27 13:46:55 +00:00
Ralph Castain
b3a8a394f0 Cleanup some lingering references to OMPI_SETUP_C and OMPI_SETUP_CXX that generated warnings. Follow the new naming convention by chaniging OMPI_SETUP_ASM to OPAL_SETUP_ASM
This commit was SVN r23500.
2010-07-27 04:51:50 +00:00
Jeff Squyres
41edaa1fe5 While we're here, also rename this macro: it really should be
OPAL_SETUP_CC. 

This commit was SVN r23496.
2010-07-26 22:09:24 +00:00
Jeff Squyres
32e6dae8b0 Add -gstabs+ compiler switch if we're on OSX and -g is in CFLAGS and that flag works with a test compile
This commit was SVN r23495.
2010-07-26 22:05:41 +00:00
Shiqing Fan
71d2749b6b Fix a header problem on Windows.
This commit was SVN r23483.
2010-07-23 07:52:34 +00:00
Jeff Squyres
7d7c0aa48f Somehow the check for the specific value "external" got dropped in the
logic (even though the "else" clause for handling it was there).  This
commit puts back the specific check for the word "external".

Thanks to Jed Brown for noticing the issue.  Fixes trac:2503.

This commit was SVN r23475.

The following Trac tickets were found above:
  Ticket 2503 --> https://svn.open-mpi.org/trac/ompi/ticket/2503
2010-07-22 11:42:15 +00:00
Jeff Squyres
29c1ad4196 Forgot BEGIN/END C_DECLS.
This commit was SVN r23453.
2010-07-21 11:05:08 +00:00
Jeff Squyres
b3952e4f07 Use const for the opal_fd_write() function, just to be nice.
This commit was SVN r23452.
2010-07-21 11:01:16 +00:00
Jeff Squyres
ab5fc1b570 Add trivial functions to loop over read()'ing and write()'ing with a
file descriptor (i.e., read and write complete messages, transparently
handling partial reads/writes, EAGAIN, and EINTR).

This code effectively already exists in a few places in the code base;
this is mainly a consolidation.

This commit was SVN r23450.
2010-07-20 19:53:49 +00:00
Jeff Squyres
64cb8f5d7f Another round of man page cleanups from Debian mantainer Manuel
Prinz.  Many thanks!

This commit was SVN r23445.
2010-07-20 14:07:18 +00:00
Christopher Yeoh
8a3d5d4e1c Adds missing sys/stat.h include needed for more recent versions of glibc
This commit was SVN r23440.
2010-07-20 06:31:16 +00:00