appeared multiple times in ompi_info output (so did others, but this
is the one that was noticed). Ensure that we don't repeat
opal_paffinity_base_register_params() multiple times.
This commit was SVN r24569.
No need for any CMRs to 1.5... that was already done in CMR 2728.
This commit was SVN r24545.
The following SVN revision numbers were found above:
r22841 --> open-mpi/ompi@b400b84162
* Convert from AC_TRY_RUN to AC_RUN_IFELSE.
* Excellent suggestion from Paul Hargrove: use AC_CHECK_FUNC to look
for a Linuxthreads-specific symbol when we're cross compiling to
see if threads will have different PIDs (because AC_CHECK_FUNC
works properly even when in cross-compiling environments).
Background: the old/Linuxthreads-based pthreads implementation used
the Linux clone() call to make threads, which effectively meant that
each thread had a different PID. The new NPTL pthreads implementation
does things better, meaning that threads have the same PID.
Open MPI no longer supports threads with different PIDs -- we ripped
out the supporting code for threads with different PIDs because we
don't have systems available to test this on anymore (anyone who still
has such a system can still use older versions of Open MPI). Hence,
configure needs to determine whether the target system will have the
same PID for threads or not -- even if we're cross-compiling. The
current test compiles and runs a multi-threaded app that checks PIDs
of different threads, but we clearly can't do that in a
cross-compiling environment. So use AC_CHECK_FUNC in cross-compiling
environments.
Simple, no?
This commit was SVN r24537.
There was no compelling reason to support such old kernels. Accordingly, convert the test to print a nice error message indicating we no longer support old kernels (but indicate that earlier OMPI versions do) and error out. Remove all code that was protected by "if have different pids" since it can no longer be compiled.
This commit was SVN r24531.
fix will be included in hwloc 1.1.2.
Brad -- can you verify that this fixes the issue for you?
Fixes trac:2732.
This commit was SVN r24450.
The following Trac tickets were found above:
Ticket 2732 --> https://svn.open-mpi.org/trac/ompi/ticket/2732
filenames -- don't include the project name ("opal")
* Don't link maffinity/hwloc and paffinity/hwloc against the common
hwloc in the static build case (because this will result in
duplicate symbols)
This commit was SVN r24447.
hopefully, this now compiles for libnuma 0.9.x and libnuma 2.0.x.
Fixes for the strategy discussed in the commit message for r24442
(i.e., check against numa_get_mems_allowed(), which only exists in
libnuma 2.0.x) and the new new new plan on #2698 coming in a separate
commit.
This commit was SVN r24443.
The following SVN revision numbers were found above:
r24442 --> open-mpi/ompi@90a8fe4aad
(with libnuma-2.0.4 / LIBNUMA_API_VERSION 2): numa_get_run_node_mask
returns a struct bitmask *.
Whether it's a good idea to blindly pass that on to
numa_set_membind() is another matter: one might want to match against
the list returned by numa_get_mems_allowed(), which may be set by the
outside environment.
Refs trac:2698.
This commit was SVN r24442.
The following SVN revision numbers were found above:
r24421 --> open-mpi/ompi@31510e683b
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
Update the CMake script for checking mca subdirs.
Add windows support for __attribute__ packed structures.
Define usleep and posix_memalign with equivalent windows functions.
And a few minor fixes, type casts.
This commit was SVN r24429.
what memory node the process is running on (which is guaranteed to be
a good answer because maffinity won't be invoked unless the process is
already bound to a specific processor), and then bind our memory to
that.
Refs trac:2698.
This commit was SVN r24421.
The following SVN revision numbers were found above:
r24290 --> open-mpi/ompi@afa654746c
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
OMPI supports multiple different repository systems (SVN, hg, git).
But the VERSION file has listed "want_svn" and "svn_r" as fields, even
though the actual repo system and version may not be SVN.
So search/replace those fields (and derrivative values that come from
those fields) with "want_repo_rev" and "repo_rev", respectively.
This commit was SVN r24405.
Add some new proc/job states
Rename a constant to reflect coming change - remove the arbitrary difference between restarting a proc locally and relocating it to another node in terms of the number of restarts allowed.
Add pretty-print of signals for "proc aborted due to signal" reports.
This commit was SVN r24378.
The following SVN revision numbers were found above:
r24371 --> open-mpi/ompi@93d28a5792
This means that the converters (opal_err2str, orte_err2str) can now
return NULL as a "silent error". The return value of opal_err2str_fn_t
is the status of the operation (OPAL_SUCCESS or OPAL_ERROR).
This fixes the "Unknown error" message issues on the trunk.
This commit was SVN r24371.
Temporarily remove hwloc's internal version of myriexpress.h. It is
causing a problem when compiling Open MPI with MX support because
hwloc uses AC_CONFIG_HEADER in hwloc's hwloc.m4 to generate
opal/mca/paffinity/hwloc/hwloc/include/hwloc/config.h.
AC_CONFIG_HEADER apparently has the (undocumented) side effect of
adding -I$(top_builddir)/opal/mca/paffinity/hwloc/hwloc/include/hwloc
to OMPI's compilation flags. Hence, when the OMPI MX components are
compiled and #include "myriexpress.h" (or <myriexpress.h>) they see
hwloc's myriexpress.h before the system one. Badness ensures.
This removal is temporary because we need to figure out a better
solution. But for now, OMPI is not using hwloc's myriexpress.h file --
so it's safe to remove. I'll push this issue upstream to hwloc to
figure out a better solution...
This commit was SVN r24354.
The following Trac tickets were found above:
Ticket 2690 --> https://svn.open-mpi.org/trac/ompi/ticket/2690
Short Version:
--------------
Event engine needs to be flushed so it does not use old/stale file descriptors.
Long Version:
-------------
The problem was that the restarted process was waiting for the socket to the local daemon to finish establishing during the 'sync' operation. The core problem was that the daemon was sending a header of 36 bytes, but the restarted process only received 35 bytes of the message. So the restarted process became stuck waiting for the last byte to arrive.
After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (to signal itself when a signal arrives). So when the daemon wrote in to the new fd the event engine was stealing the first byte (*shakes fist at event engine*) before the recv() could be posted.
The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem.
A few other minor things:
-------------------------
* Add a check to make sure the event engine is balanced in its init/finalize
* Add the opal_event_base_close() to the BLCR restart exec function (still not 100% sure it is needed, but there it is).
This commit was SVN r24296.
last December. :-(
Add new MCA param: maffinity_libnuma_policy. Thanks to David
Singleton for the suggestion. Here's the help text about it:
{{{
MCA maffinity: parameter "maffinity_libnuma_policy" (current value:
<loose>, data source: default value)
Binding policy that determines what happens if memory
is unavailable on the local NUMA node. A value of
"strict" means that the memory allocation will fail;
a value of "loose" means that the memory allocation
will spill over to another NUMA node.
}}}
This commit was SVN r24290.
either direct link to these basic predefined types, or a combination of them.
Anyway, the first items in the datatype list belong to OPAL, the second round
are MPI datatypes created by composing basic OPAL datatypes, and the last
batch are mapped datatype (direct correspondance between an OMPI datatype and
an OPAL one such as int -> int32_t).
Modify the op to fit this new scheme.
This commit was SVN r24247.
the module to use the new hwloc bitmap API (the cpuset API is both
klunkier and deprecated), which simplified a few things.
This commit was SVN r24217.
{{{
base/paffinity_base_service.c: In function ‘opal_paffinity_base_cset2mapstr’:
base/paffinity_base_service.c:623: warning: unused variable ‘range_last’
base/paffinity_base_service.c:623: warning: unused variable ‘range_first’
base/paffinity_base_service.c:622: warning: unused variable ‘count’
base/paffinity_base_service.c:622: warning: unused variable ‘m’
}}}
{{{
connect/btl_openib_connect_oob.c: In function ‘init_ud_qp’:
connect/btl_openib_connect_oob.c:1111: warning: control reaches end of non-void function
connect/btl_openib_connect_oob.c: In function ‘init_device’:
connect/btl_openib_connect_oob.c:1235: warning: unused variable ‘i’
connect/btl_openib_connect_oob.c: In function ‘get_pathrecord_sl’:
connect/btl_openib_connect_oob.c:1323: warning: unused variable ‘i’
}}}
This commit was SVN r24196.
It is statically initialized to the real back-end OPAL show_help
function. During orte_show_help_init(), the variable is re-assigned
with the value of the back-end ORTE show_help function (the one that
does error message aggregation).
Therefore, anything that calls opal_show_help() after a certain point
in orte_init() will have their show_help messages be aggregated.
w00t! Even code down in OPAL -- that has no knowledge of ORTE -- will
have their messages aggregated. '''Double w00t!'''
During orte_show_help_finalize(), we restore the original pointer
value so that it something calls opal_show_help() after
orte_finalize(), it'll still work properly (but it won't be
aggregated).
This commit was SVN r24185.
1. Remove it from libevent207.h because it is not needed.
2. Add compat to the include list so it can use queue.h when needed.
This commit was SVN r24144.
Change OMPI code in libevent to not use bool.
Add some comments to indicate OMPI specific code.
This should fix compiles on Sun Studio Solaris.
This commit was SVN r24062.
As memmove is slower than memcpy, I added the required logic to only use it when
really necessary.
No modification from developers point of view, you should always call
opal_datatype_copy_content_same_ddt.
This commit was SVN r24047.
It's not possible to combine two shared libraries on Windows, so we have to do it a bit different. First generate a small event static library by just linking the object files, and link it into other libraries that needs the libevent API.
This commit was SVN r24039.
handles more modern versions of Autoconf's --program-transform
arguments. Also make it clear that the message is coming from Open
MPI logic, so that we don't blame Autoconf, Red Hat, or anyone else
next time!
This commit was SVN r24024.
The following Trac tickets were found above:
Ticket 2611 --> https://svn.open-mpi.org/trac/ompi/ticket/2611
1. create DLL A, export symbols of A, import nothing (A normally is OPAL)
should define _USRDLL , A_EXPORT
2. create DLL B, export symbols of B, import A.lib (B could be ORTE, OMPI or other ompi tools)
should define _USRDLL, B_EXPORT
3. create DLL C, import B.dll (C could be external libs or apps)
should define B_IMPORT
This commit was SVN r24016.
libevent creates its event-config.h during "make all" (vs. during
configure). The prior method around this didn't work because it wrote
an event-config.h.in in the source tree -- a Bad Idea(tm). The new
way uses AC_CONFIG_COMMAND to get stuff executed at the end of
config.status to create event-config.h. This seems to work properly
during make distcheck.
This commit was SVN r23975.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.
Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.
This commit was SVN r23966.
After talking with Brian, we're pretty sure that this is only because
really, really old libevent didn't allow bitwise or-ing of the other
loop types, because what we really need is (EVLOOP_ONCE |
EVLOOP_NONBLOCK). And that's what EVLOOP_ONELOOP did (i.e., we
changed the logic of libevent's event.c to let ONELOOP do both ONCE
and NONBLOCK things).
In the new libevent version, we didn't implement EVLOOP_ONELOOP
properly. As a result, and we got hangs in the SM BTL add_procs
function. Note that the SM BTL wasn't to blame -- it was purely a
side-effect of bad ONELOOP integration (i.e., if you got past the SM
BTL add_procs, you may well have hung somewhere else).
This commit removes all ONELOOP customizations from event.c and
returns it to (almost) its original state from the libevent 2.0.7-rc
distribution. Everwhere in the code base where we used ONELOOP, we
now use (ONCE | NONBLOCK).
This commit was SVN r23957.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.
This commit was SVN r23943.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.
Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.
Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.
I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:
1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)
2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.
There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.
This commit was SVN r23925.
* Update to be safe for AC 2.68 by using AC_LINK_IFELSE instead of
AC_TRY_LINK
* If enable visibility was used, ensure we fail if the compiler
doesn't support it
* Rename OMPI_CHECK_VISIBILITY -> OPAL_CHECK_VISIBILITY (and all
internal variables)
This commit was SVN r23923.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.
Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.
This commit was SVN r23764.
All interface APIs for accessing the info remain unchanged in opal/util/if.c.
This has been tested on Mac, Linux, and NetBSD. Nobody else seemed interested in testing it, so there may be some future problems revealed as people try it on other OSs.
This commit was SVN r23743.
do the right thing. They now properly return
the value after the update. This also fixes all warnings
reported by the Sun Studio compiler. George provided the
new assembly routines. I added some configure code to make
sure the compilers could handle it.
This fixes trac:2560.
This commit was SVN r23721.
The following Trac tickets were found above:
Ticket 2560 --> https://svn.open-mpi.org/trac/ompi/ticket/2560
administrator can specify compiler flags that get
inserted into the command before the user's flags.
These flags can be specified at configure time.
Reviewed by Jeff Squyres.
This fixes ticket #2474.
This commit was SVN r23709.
- Add one instance where we do not use a parameter in a function
- Fix a buglet in commit r23689, where the attribute-for-function ptrs
was applied.
This commit was SVN r23690.
The following SVN revision numbers were found above:
r23689 --> open-mpi/ompi@5eb571c458
be tested on function pointers and assigned accordingly,
instead of using the pre-processor in the header files.
A functional change is (re-) specifying __opal_attribute_noreturn__
on orte_errmgr_base_abort(): All modules in the errmgr framework
either use this function, or define their own abort function,
which sets __opal_attribute_noreturn__.
This attributes was taken out with the errmgr overhaul in r22872.
This commit was SVN r23689.
The following SVN revision numbers were found above:
r22872 --> open-mpi/ompi@e4f2d03d28
assigned to function-declarations.
Check this case and mark the currently only case existing in trunk.
Thanks to Paul Hargrove for bringing this up.
Let's test the svn commit msg CMR:v1.5
This commit was SVN r23676.
otherwise we get swamped with warnings by gcc, everywhere header is
included.
- Remove redundant declaration of opal_datatype_safeguard_pointer_debug_breakpoint
Check whether CMR:v1.5 works
This commit was SVN r23674.
one of them in as it may still be needed on Solaris.
This fixes trac:2530.
This commit was SVN r23626.
The following SVN revision numbers were found above:
r22638 --> open-mpi/ompi@2a4b1227d9
The following Trac tickets were found above:
Ticket 2530 --> https://svn.open-mpi.org/trac/ompi/ticket/2530
a warning itselve (when another warning is generated within the file),
which can be rather anying.
Therefore check for output regarding this unrecognized warning.
This commit was SVN r23624.
linking against libibverbs on Solaris.
Sorry for the mid-day configure change folks; I meant to commit this
last night and forgot. :-(
This commit was SVN r23606.
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php
Documentation:
http://osl.iu.edu/research/ft/
Major Changes:
--------------
* Added C/R-enabled Debugging support.
Enabled with the --enable-crdebug flag. See the following website for more information:
http://osl.iu.edu/research/ft/crdebug/
* Added Stable Storage (SStore) framework for checkpoint storage
* 'central' component does a direct to central storage save
* 'stage' component stages checkpoints to central storage while the application continues execution.
* 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress)
* 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching)
* Added Compression (compress) framework to support
* Add two new ErrMgr recovery policies
* {{{crmig}}} C/R Process Migration
* {{{autor}}} C/R Automatic Recovery
* Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component
* Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option)
* {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342)
* {{{OMPI_CR_Restart}}}
* {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules)
* {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192)
* {{{OMPI_CR_Quiesce_start}}}
* {{{OMPI_CR_Quiesce_checkpoint}}}
* {{{OMPI_CR_Quiesce_end}}}
* {{{OMPI_CR_self_register_checkpoint_callback}}}
* {{{OMPI_CR_self_register_restart_callback}}}
* {{{OMPI_CR_self_register_continue_callback}}}
* The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future.
* Add a progress meter to:
* FileM rsh (filem_rsh_process_meter)
* SnapC full (snapc_full_progress_meter)
* SStore stage (sstore_stage_progress_meter)
* Added 2 new command line options to ompi-restart
* --showme : Display the full command line that would have been exec'ed.
* --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413)
* Deprecated some MCA params:
* crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir
* snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir
* snapc_base_global_shared deprecated, use sstore_stage_global_is_shared
* snapc_base_store_in_place deprecated, replaced with different components of SStore
* snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref
* snapc_base_establish_global_snapshot_dir deprecated, never well supported
* snapc_full_skip_filem deprecated, use sstore_stage_skip_filem
Minor Changes:
--------------
* Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing.
* Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components
* Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it.
* Fixes trac:2208 : Honor various TMPDIR varaibles instead of forcing {{{/tmp}}}
* Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set.
* opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality.
* Cleanup the CRS framework and components to work with the SStore framework.
* Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably).
* Add 'quiesce' hook to CRCP for a future enhancement.
* We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}.
* Add optional application level INC callbacks (registered through the CR MPI Ext interface).
* Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive.
* {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked.
* {{{opal-restart}}} also support local decompression before restarting
* {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata
* {{{orte-restart}}} now uses the SStore framework to work with the metadata
* Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality.
* Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}.
* Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped.
* Make sure to decrement the number of 'num_local_procs' in the orted when one goes away.
* odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options.
* Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities.
* Improve the checks for 'already checkpointing' error path.
* A a recovery output timer, to show how long it takes to restart a job
* Do a better job of cleaning up the old session directory on restart.
* Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment)
* Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize.
This commit was SVN r23587.
The following Trac tickets were found above:
Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
orte_local_chip_type and orte_local_chip_model in MPI processes it the
appropriate sysinfo module found the values on the machine.
This commit was SVN r23581.
* Fix a configure warning for checking --enable-ft-thread
* In hnp and orted ErrMgr components check to see if other components have already recovered this process before trying to recover it again.
* Fix 'npernode' for restarting using the resilient rmaps component
* export ompi_info_set, so that internal functionality can use it.
This commit was SVN r23535.
error codes correctly again. Also fix a typo.
Reviewed by Jeff Squyres.
This commit was SVN r23531.
The following SVN revision numbers were found above:
r23463 --> open-mpi/ompi@2af3e6e5ae
substantive changes in this commit; the rest are minor style changes:
1. Change an OBJ_NEW(opal_list_item_t) to OBJ_NEW(opal_if_t). This
was causing memory corruption in the BSD code paths.
1. Move some local variables from the top of opal_if_init() to inside
the non-BSD code paths so that we avoid bunches of warnings about
unused variables when compiling on BSD. In doing so, I indented
the whole non-BSD section one level deeper, making the commit look
huge.
I also added a few {} around 1-line blocks, added some spaces, broke a
few lines, re-formatted a few comments, ...etc. Trivial stuff.
This commit was SVN r23501.
logic (even though the "else" clause for handling it was there). This
commit puts back the specific check for the word "external".
Thanks to Jed Brown for noticing the issue. Fixes trac:2503.
This commit was SVN r23475.
The following Trac tickets were found above:
Ticket 2503 --> https://svn.open-mpi.org/trac/ompi/ticket/2503
file descriptor (i.e., read and write complete messages, transparently
handling partial reads/writes, EAGAIN, and EINTR).
This code effectively already exists in a few places in the code base;
this is mainly a consolidation.
This commit was SVN r23450.
not using in Open MPI (i.e., that stuff is only used in the standalone
builds of hwloc -- it's not compiled/installed/used by Open MPI).
This commit was SVN r23416.
variable was set, it was prefixed to ''all'' values in the wrapper
compiler data text files. For example, if OPAL_DESTDIR was set to
/tmp/bogus and a wrapper compiler data file contained the line:
preprocessor_flags=-pthread
The value would be exanded to:
/tmp/bogus/-pthread
Which is clearly wrong. After some back-and-forth with Ralph and
Brian, Brian submitted this patch that fixes the problem. Now we
handle three cases properly (assume that configure was invoked with
--prefix=/opt/openmpi and no other directory specifications, and
$OPAL_DESTDIR is set to /tmp/buildroot):
1. Individual directories, such as libdir. These need to be prepended
with DESTDIR. I.e., return /tmp/buildroot/opt/openmpi/lib.
2. Compiler flags that have ${FIELD} values embedded in them. For
example, consider if a wrapper compiler data file contains the
line:
preprocessor_flags=-DMYFLAG="${prefix}/share/randomthingy/"
The value we should return is:
-DMYFLAG="/tmp/buildroot/opt/openmpi/share/randomthingy/"
3. Compiler flags that do not have any ${FIELD} values. For example,
consider if a wrapper compiler data file contains the line:
preprocessor_flags=-pthread
The value we should return is:
-pthread
Note, too, that this OPAL_DESTDIR futzing only needs to occur during
opal_init(). By the time opal_init() has completed, all values should
be substituted in that need substituting. Hence, we take an extra
parameter (is_setup) to know whether we should do this futzing or
not.
This commit was SVN r23402.
#define CACHE_LINE_SIZE to 128. This name has a conflict on NetBSD,
and it seems kinda odd to have a header file that ''only'' defines a
single value. Also, we'll soon be raising hwloc to be a first-class
item, so having this file around seemed kinda weird.
Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an
int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the
rationale that we can fill this in at runtime with hwloc info (trunk
and v1.5/beyond, only). The only place we ''needed'' a compile-time
CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a
new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value
(128). That use isn't suitable for run-time hwloc information,
anyway.
This commit was SVN r23349.
platforms (e.g., PPC64 running RHEL 5.4) -- sometimes it only finds
PUs. So in that case, just run the same calculation, but with PUs
instead of cores.
This commit was SVN r23305.
rename OMPI_CHECK_ATTRIBUTES -> OPAL_CHECK_ATTRIBUTES, because it's in
OPAL (somehow that name must have gotten missed in the Great M4 split
of '10...?)
This commit was SVN r23267.
* Remove OPAL_ERR_PAFFINITY_NOT_SUPPORTED; fit it into the generic
OPAL_ERR_NOT_SUPPORTED case.
* When odls_default detects that processor affinity is not supported,
it prints a specific message about it, and then it suppressed a
generic HNP help message that would normally follow it (i.e., it's
easier to have the "processor affinity is not supported" show_help
message last).
* Use some symbolic names in odls_default instead of fixed int's,
just for slight readability improvements in the code.
* Introduce orte_show_help_suppress(), which gives the ability to
suppress any future showings of any arbitrary show_help() message.
This is useful if you display message X and want to suppress
message Y. This suppression *only* works in environments where
orte_show_help() does coalescing.
This commit was SVN r23249.
distribution tarball, and would therefore cause automake to fail (in
case someone invokes autogen.sh on a distribution tarball).
This commit was SVN r23218.
* If < 0, it's an OPAL_ERR_* value
* If >= 0, it's the actual output value of the function
This is problematic for the OPAL_SOS stuff. This commit changes those
functions to always return OPAL_* statuses and send the output value
back through output parameters (like 95% of the rest of the code
base). This avoids the confusion with OPAL_SOS stuff and makes
paffinity work again (e.g., mpirun --bind-to-core ...).
I updated all paffinitiy modules for the new function signatures, and
bumped the paffinity API version up to 2.0.1. I don't think the
version change will matter, though, because we'll be introducing
support for hardware threads soon, which will either bump the
paffinity version again or we'll replace paffinity with
a new framework.
This commit was SVN r23197.
The fix is to just check if the return value is positive or not, since all the SOS encoded errors are *always* negative.
The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index as a pointer.
This commit was SVN r23173.
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
back the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
This commit was SVN r23162.
The OPAL SOS framework tries to meet the following objectives:
* reduce the cascading error messages and the amount of code needed to print an error message.
* build and aggregate stacks of encountered errors and associate related individual errors with each other.
* allow registration of custom callbacks to intercept error events.
For more information, refer to
https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
This commit was SVN r23158.
I forgot to mention one more thing in the r23152 commit message:
* Copy the fix for hwloc's m4 to disable the configure flag
--enable-debug when building in embedding mode, because it can be
hijacked by the outter-level application. In this case, if you
configured OMPI with --enable-debug (or have --enable-debug in a
platform file), you'd see all of hwloc's debug output. Ick. hwloc
1.0 will include this fix.
This commit was SVN r23153.
The following SVN revision numbers were found above:
r23152 --> open-mpi/ompi@ca3362021e
* Fix disabling hwloc build (i.e., put the AM_CONDITIONALs where they
belong in the configure.m4 file)
* Update some svn:ignores
* r23142 removed some extraneous code, but forgot to remove the
variables used only by that code
This commit was SVN r23152.
The following SVN revision numbers were found above:
r23142 --> open-mpi/ompi@610fc67d12
supports a wide variety of operating systems and platforms; see the
opal/mca/paffinity/hwloc/hwloc/README file for details.
This component includes an embedded copy of hwloc, currently based on
hwloc-1.0rc6. But note that hwloc is properly SVN imported into the
/vendor branch, so it will be easy to update when 1.0 GA is released.
Note that the hwloc tree embedded in opal/mca/paffinity/hwloc/hwloc is
identical to a hwloc distribution tarball, except that much of the
documentation was rm -rf'ed (because we don't need it for the embedded
case).
Since the paffinity framework currently does not understand hardware
threads, the hwloc component compensates for this by identifying cores
by the "first" hardware thread on that core. Hopefully we'll update
paffinity someday to understand hardware threads. :-)
configure grew a --with-hwloc option, analogous to what we do for many
other external libraries that OMPI supports. However, there's a new
feature: due to the request of several distros, OMPI can be configured
to build with its internal copy of hwloc or with an external copy of
hwloc (e.g., a system-installed hwloc).
1. If --with-hwloc is not specified, Open MPI will try to use its
internal copy (but silently fail/ignore hwloc if that fails).
1. If --with-hwloc=<dir> is supplied, Open MPI looks for hwloc
support in <dir> (and --with-hwloc-libdir=<dir>, if specified).
1. If --with-hwloc=external is supplied, Open MPI will look for hwloc
in a compiler/linker default external location.
1. If --with-hwloc=internal is supplied, Open MPI will use its
internal copy of hwloc.
Some of OMPI's main configury had to be slightly re-arranged in the
bootstrapping phase to accomodate hwloc's configry needs.
This commit was SVN r23125.
http://marc.info/?l=linux-mm-commits&m=127352503417787&w=2 for more
details.
* Remove the ptmalloc memory component; replace it with a new "linux"
memory component.
* The linux memory component will conditionally compile in support
for ummunotify. At run-time, if it has ummunotify support and
finds run-time support for ummunotify (i.e., /dev/ummunotify), it
uses it. If not, it tries to use ptmalloc via the glibc memory
hooks.
* Add some more API functions to the memory framework to accomodate
the ummunotify model (i.e., poll to see if memory has "changed").
* Add appropriate calls in the rcache to the new memory APIs to see
if memory has changed, and to react accordingly.
* Add a few comments in the openib BTL to indicate why we don't need
to notify the OPAL memory framework about specific instances of
registered memory.
* Add dummy API calls in the solaris malloc component (since it
doesn't have polling/"did memory change" support).
This commit was SVN r23113.