that enabling "local_only" by default could cause excessive
by-NUMA-node paging and/or OOMs (rather than allowing memory
allocations to spill over to other NUMA nodes).
This brought home the very real-world example of people buying servers
with more processors/cores than they need, just to get more memory.
We wouldn't want Badness to occur in such scenarios by default.
Instead, let people turn on "only allow memory allocations on my local
NUMA node" if their application would benefit from it.
This commit was SVN r24648.
After a long period of development with many starts and stops, we
finally got this where we wanted it.
This commit introduces 2 new MCA params (note that the
"maffinity_libnuma_policy" MCA param introduced by r24290 was removed
when libnuma support was removed). Remember that maffinity policies
are only in effect when paffinity is enaabled -- i.e., when processes
are bound to processors!
* '''maffinity_base_alloc_policy:''' Policy that determines how
general memory allocations are bound after MPI_INIT. A value of
"none" means that no memory policy is applied. A value of
"local_only" means that all memory allocations will be restricted
to the local NUMA node where each process is placed. Note that
operating system paging policies are unaffected by this setting.
For example, if "local_only" is used and local NUMA node memory is
exhausted, a new memory allocation may cause paging.
* '''maffinity_base_bind_failure_action:''' What Open MPI will do if
it explicitly tries to bind memory to a specific NUMA location, and
fails. Note that this is a different case than the general
allocation policy described by maffinity_base_alloc_policy. A
value of "warn" means that Open MPI will warn the first time this
happens, but allow the job to continue (possibly with degraded
performance). A value of "error" means that Open MPI will abort
the job if this happens.
This needs at least a little soak time on the trunk before going to
v1.5.
This commit was SVN r24639.
The following SVN revision numbers were found above:
r24290 --> open-mpi/ompi@afa654746c
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
Upgrade to hwloc 1.2 (from hwloc 1.1.2). This should fix the problems
Nathan's seeing in #2778.
Let's let this soak on the trunk for a little while and see how LANL's
MTT's work out. If that works, then we can CMR this to v1.5.
This commit was SVN r24635.
The following Trac tickets were found above:
Ticket 2778 --> https://svn.open-mpi.org/trac/ompi/ticket/2778
Nth core, so it fell over to try to find the Nth PU.
-----
hwloc isn't able to find cores on all platforms. Example: PPC64
running RHEL 5.4 (linux kernel 2.6.18) only reports NUMA nodes and
PU's. Fine.
However, note that hwloc_get_obj_by_type() will return NULL in 2
(effectively) different cases:
- no objects of the requested type were found
- the Nth object of the requested type was not found
So first we have to see if we can find *any* cores by looking for the
0th core. If we find it, then try to find the Nth core. Otherwise,
try to find the Nth PU.
This commit was SVN r24632.
Rename the memusage sensor plugin to "resusage" as it will soon be updated to include full process stat monitoring.
Extend the heartbeat sensor to report node and process stats in the heartbeat.
Store the process and node stats in their respective orte_xxx_t object.
This commit was SVN r24629.
released):
backport hwloc r 3416 from trunk: Add cache info entry _after_ checking
that we need one, thanks Andriy Gapon for the fix
This commit was SVN r24612.
The following SVN revision numbers were found above:
r3418 --> open-mpi/ompi@9972663a12
appeared multiple times in ompi_info output (so did others, but this
is the one that was noticed). Ensure that we don't repeat
opal_paffinity_base_register_params() multiple times.
This commit was SVN r24569.
fix will be included in hwloc 1.1.2.
Brad -- can you verify that this fixes the issue for you?
Fixes trac:2732.
This commit was SVN r24450.
The following Trac tickets were found above:
Ticket 2732 --> https://svn.open-mpi.org/trac/ompi/ticket/2732
filenames -- don't include the project name ("opal")
* Don't link maffinity/hwloc and paffinity/hwloc against the common
hwloc in the static build case (because this will result in
duplicate symbols)
This commit was SVN r24447.
hopefully, this now compiles for libnuma 0.9.x and libnuma 2.0.x.
Fixes for the strategy discussed in the commit message for r24442
(i.e., check against numa_get_mems_allowed(), which only exists in
libnuma 2.0.x) and the new new new plan on #2698 coming in a separate
commit.
This commit was SVN r24443.
The following SVN revision numbers were found above:
r24442 --> open-mpi/ompi@90a8fe4aad
(with libnuma-2.0.4 / LIBNUMA_API_VERSION 2): numa_get_run_node_mask
returns a struct bitmask *.
Whether it's a good idea to blindly pass that on to
numa_set_membind() is another matter: one might want to match against
the list returned by numa_get_mems_allowed(), which may be set by the
outside environment.
Refs trac:2698.
This commit was SVN r24442.
The following SVN revision numbers were found above:
r24421 --> open-mpi/ompi@31510e683b
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
what memory node the process is running on (which is guaranteed to be
a good answer because maffinity won't be invoked unless the process is
already bound to a specific processor), and then bind our memory to
that.
Refs trac:2698.
This commit was SVN r24421.
The following SVN revision numbers were found above:
r24290 --> open-mpi/ompi@afa654746c
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
Temporarily remove hwloc's internal version of myriexpress.h. It is
causing a problem when compiling Open MPI with MX support because
hwloc uses AC_CONFIG_HEADER in hwloc's hwloc.m4 to generate
opal/mca/paffinity/hwloc/hwloc/include/hwloc/config.h.
AC_CONFIG_HEADER apparently has the (undocumented) side effect of
adding -I$(top_builddir)/opal/mca/paffinity/hwloc/hwloc/include/hwloc
to OMPI's compilation flags. Hence, when the OMPI MX components are
compiled and #include "myriexpress.h" (or <myriexpress.h>) they see
hwloc's myriexpress.h before the system one. Badness ensures.
This removal is temporary because we need to figure out a better
solution. But for now, OMPI is not using hwloc's myriexpress.h file --
so it's safe to remove. I'll push this issue upstream to hwloc to
figure out a better solution...
This commit was SVN r24354.
The following Trac tickets were found above:
Ticket 2690 --> https://svn.open-mpi.org/trac/ompi/ticket/2690
Short Version:
--------------
Event engine needs to be flushed so it does not use old/stale file descriptors.
Long Version:
-------------
The problem was that the restarted process was waiting for the socket to the local daemon to finish establishing during the 'sync' operation. The core problem was that the daemon was sending a header of 36 bytes, but the restarted process only received 35 bytes of the message. So the restarted process became stuck waiting for the last byte to arrive.
After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (to signal itself when a signal arrives). So when the daemon wrote in to the new fd the event engine was stealing the first byte (*shakes fist at event engine*) before the recv() could be posted.
The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem.
A few other minor things:
-------------------------
* Add a check to make sure the event engine is balanced in its init/finalize
* Add the opal_event_base_close() to the BLCR restart exec function (still not 100% sure it is needed, but there it is).
This commit was SVN r24296.
last December. :-(
Add new MCA param: maffinity_libnuma_policy. Thanks to David
Singleton for the suggestion. Here's the help text about it:
{{{
MCA maffinity: parameter "maffinity_libnuma_policy" (current value:
<loose>, data source: default value)
Binding policy that determines what happens if memory
is unavailable on the local NUMA node. A value of
"strict" means that the memory allocation will fail;
a value of "loose" means that the memory allocation
will spill over to another NUMA node.
}}}
This commit was SVN r24290.
the module to use the new hwloc bitmap API (the cpuset API is both
klunkier and deprecated), which simplified a few things.
This commit was SVN r24217.
{{{
base/paffinity_base_service.c: In function ‘opal_paffinity_base_cset2mapstr’:
base/paffinity_base_service.c:623: warning: unused variable ‘range_last’
base/paffinity_base_service.c:623: warning: unused variable ‘range_first’
base/paffinity_base_service.c:622: warning: unused variable ‘count’
base/paffinity_base_service.c:622: warning: unused variable ‘m’
}}}
{{{
connect/btl_openib_connect_oob.c: In function ‘init_ud_qp’:
connect/btl_openib_connect_oob.c:1111: warning: control reaches end of non-void function
connect/btl_openib_connect_oob.c: In function ‘init_device’:
connect/btl_openib_connect_oob.c:1235: warning: unused variable ‘i’
connect/btl_openib_connect_oob.c: In function ‘get_pathrecord_sl’:
connect/btl_openib_connect_oob.c:1323: warning: unused variable ‘i’
}}}
This commit was SVN r24196.
1. Remove it from libevent207.h because it is not needed.
2. Add compat to the include list so it can use queue.h when needed.
This commit was SVN r24144.
Change OMPI code in libevent to not use bool.
Add some comments to indicate OMPI specific code.
This should fix compiles on Sun Studio Solaris.
This commit was SVN r24062.
libevent creates its event-config.h during "make all" (vs. during
configure). The prior method around this didn't work because it wrote
an event-config.h.in in the source tree -- a Bad Idea(tm). The new
way uses AC_CONFIG_COMMAND to get stuff executed at the end of
config.status to create event-config.h. This seems to work properly
during make distcheck.
This commit was SVN r23975.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.
Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.
This commit was SVN r23966.
After talking with Brian, we're pretty sure that this is only because
really, really old libevent didn't allow bitwise or-ing of the other
loop types, because what we really need is (EVLOOP_ONCE |
EVLOOP_NONBLOCK). And that's what EVLOOP_ONELOOP did (i.e., we
changed the logic of libevent's event.c to let ONELOOP do both ONCE
and NONBLOCK things).
In the new libevent version, we didn't implement EVLOOP_ONELOOP
properly. As a result, and we got hangs in the SM BTL add_procs
function. Note that the SM BTL wasn't to blame -- it was purely a
side-effect of bad ONELOOP integration (i.e., if you got past the SM
BTL add_procs, you may well have hung somewhere else).
This commit removes all ONELOOP customizations from event.c and
returns it to (almost) its original state from the libevent 2.0.7-rc
distribution. Everwhere in the code base where we used ONELOOP, we
now use (ONCE | NONBLOCK).
This commit was SVN r23957.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.
This commit was SVN r23943.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.
Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.
Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.
I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:
1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)
2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.
There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.
This commit was SVN r23925.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.
Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.
This commit was SVN r23764.
All interface APIs for accessing the info remain unchanged in opal/util/if.c.
This has been tested on Mac, Linux, and NetBSD. Nobody else seemed interested in testing it, so there may be some future problems revealed as people try it on other OSs.
This commit was SVN r23743.
linking against libibverbs on Solaris.
Sorry for the mid-day configure change folks; I meant to commit this
last night and forgot. :-(
This commit was SVN r23606.
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php
Documentation:
http://osl.iu.edu/research/ft/
Major Changes:
--------------
* Added C/R-enabled Debugging support.
Enabled with the --enable-crdebug flag. See the following website for more information:
http://osl.iu.edu/research/ft/crdebug/
* Added Stable Storage (SStore) framework for checkpoint storage
* 'central' component does a direct to central storage save
* 'stage' component stages checkpoints to central storage while the application continues execution.
* 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress)
* 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching)
* Added Compression (compress) framework to support
* Add two new ErrMgr recovery policies
* {{{crmig}}} C/R Process Migration
* {{{autor}}} C/R Automatic Recovery
* Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component
* Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option)
* {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342)
* {{{OMPI_CR_Restart}}}
* {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules)
* {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192)
* {{{OMPI_CR_Quiesce_start}}}
* {{{OMPI_CR_Quiesce_checkpoint}}}
* {{{OMPI_CR_Quiesce_end}}}
* {{{OMPI_CR_self_register_checkpoint_callback}}}
* {{{OMPI_CR_self_register_restart_callback}}}
* {{{OMPI_CR_self_register_continue_callback}}}
* The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future.
* Add a progress meter to:
* FileM rsh (filem_rsh_process_meter)
* SnapC full (snapc_full_progress_meter)
* SStore stage (sstore_stage_progress_meter)
* Added 2 new command line options to ompi-restart
* --showme : Display the full command line that would have been exec'ed.
* --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413)
* Deprecated some MCA params:
* crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir
* snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir
* snapc_base_global_shared deprecated, use sstore_stage_global_is_shared
* snapc_base_store_in_place deprecated, replaced with different components of SStore
* snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref
* snapc_base_establish_global_snapshot_dir deprecated, never well supported
* snapc_full_skip_filem deprecated, use sstore_stage_skip_filem
Minor Changes:
--------------
* Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing.
* Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components
* Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it.
* Fixes trac:2208 : Honor various TMPDIR varaibles instead of forcing {{{/tmp}}}
* Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set.
* opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality.
* Cleanup the CRS framework and components to work with the SStore framework.
* Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably).
* Add 'quiesce' hook to CRCP for a future enhancement.
* We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}.
* Add optional application level INC callbacks (registered through the CR MPI Ext interface).
* Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive.
* {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked.
* {{{opal-restart}}} also support local decompression before restarting
* {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata
* {{{orte-restart}}} now uses the SStore framework to work with the metadata
* Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality.
* Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}.
* Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped.
* Make sure to decrement the number of 'num_local_procs' in the orted when one goes away.
* odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options.
* Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities.
* Improve the checks for 'already checkpointing' error path.
* A a recovery output timer, to show how long it takes to restart a job
* Do a better job of cleaning up the old session directory on restart.
* Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment)
* Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize.
This commit was SVN r23587.
The following Trac tickets were found above:
Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
orte_local_chip_type and orte_local_chip_model in MPI processes it the
appropriate sysinfo module found the values on the machine.
This commit was SVN r23581.
logic (even though the "else" clause for handling it was there). This
commit puts back the specific check for the word "external".
Thanks to Jed Brown for noticing the issue. Fixes trac:2503.
This commit was SVN r23475.
The following Trac tickets were found above:
Ticket 2503 --> https://svn.open-mpi.org/trac/ompi/ticket/2503
not using in Open MPI (i.e., that stuff is only used in the standalone
builds of hwloc -- it's not compiled/installed/used by Open MPI).
This commit was SVN r23416.
variable was set, it was prefixed to ''all'' values in the wrapper
compiler data text files. For example, if OPAL_DESTDIR was set to
/tmp/bogus and a wrapper compiler data file contained the line:
preprocessor_flags=-pthread
The value would be exanded to:
/tmp/bogus/-pthread
Which is clearly wrong. After some back-and-forth with Ralph and
Brian, Brian submitted this patch that fixes the problem. Now we
handle three cases properly (assume that configure was invoked with
--prefix=/opt/openmpi and no other directory specifications, and
$OPAL_DESTDIR is set to /tmp/buildroot):
1. Individual directories, such as libdir. These need to be prepended
with DESTDIR. I.e., return /tmp/buildroot/opt/openmpi/lib.
2. Compiler flags that have ${FIELD} values embedded in them. For
example, consider if a wrapper compiler data file contains the
line:
preprocessor_flags=-DMYFLAG="${prefix}/share/randomthingy/"
The value we should return is:
-DMYFLAG="/tmp/buildroot/opt/openmpi/share/randomthingy/"
3. Compiler flags that do not have any ${FIELD} values. For example,
consider if a wrapper compiler data file contains the line:
preprocessor_flags=-pthread
The value we should return is:
-pthread
Note, too, that this OPAL_DESTDIR futzing only needs to occur during
opal_init(). By the time opal_init() has completed, all values should
be substituted in that need substituting. Hence, we take an extra
parameter (is_setup) to know whether we should do this futzing or
not.
This commit was SVN r23402.
platforms (e.g., PPC64 running RHEL 5.4) -- sometimes it only finds
PUs. So in that case, just run the same calculation, but with PUs
instead of cores.
This commit was SVN r23305.
* Remove OPAL_ERR_PAFFINITY_NOT_SUPPORTED; fit it into the generic
OPAL_ERR_NOT_SUPPORTED case.
* When odls_default detects that processor affinity is not supported,
it prints a specific message about it, and then it suppressed a
generic HNP help message that would normally follow it (i.e., it's
easier to have the "processor affinity is not supported" show_help
message last).
* Use some symbolic names in odls_default instead of fixed int's,
just for slight readability improvements in the code.
* Introduce orte_show_help_suppress(), which gives the ability to
suppress any future showings of any arbitrary show_help() message.
This is useful if you display message X and want to suppress
message Y. This suppression *only* works in environments where
orte_show_help() does coalescing.
This commit was SVN r23249.
distribution tarball, and would therefore cause automake to fail (in
case someone invokes autogen.sh on a distribution tarball).
This commit was SVN r23218.
* If < 0, it's an OPAL_ERR_* value
* If >= 0, it's the actual output value of the function
This is problematic for the OPAL_SOS stuff. This commit changes those
functions to always return OPAL_* statuses and send the output value
back through output parameters (like 95% of the rest of the code
base). This avoids the confusion with OPAL_SOS stuff and makes
paffinity work again (e.g., mpirun --bind-to-core ...).
I updated all paffinitiy modules for the new function signatures, and
bumped the paffinity API version up to 2.0.1. I don't think the
version change will matter, though, because we'll be introducing
support for hardware threads soon, which will either bump the
paffinity version again or we'll replace paffinity with
a new framework.
This commit was SVN r23197.
The fix is to just check if the return value is positive or not, since all the SOS encoded errors are *always* negative.
The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index as a pointer.
This commit was SVN r23173.
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
back the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
This commit was SVN r23162.
I forgot to mention one more thing in the r23152 commit message:
* Copy the fix for hwloc's m4 to disable the configure flag
--enable-debug when building in embedding mode, because it can be
hijacked by the outter-level application. In this case, if you
configured OMPI with --enable-debug (or have --enable-debug in a
platform file), you'd see all of hwloc's debug output. Ick. hwloc
1.0 will include this fix.
This commit was SVN r23153.
The following SVN revision numbers were found above:
r23152 --> open-mpi/ompi@ca3362021e
* Fix disabling hwloc build (i.e., put the AM_CONDITIONALs where they
belong in the configure.m4 file)
* Update some svn:ignores
* r23142 removed some extraneous code, but forgot to remove the
variables used only by that code
This commit was SVN r23152.
The following SVN revision numbers were found above:
r23142 --> open-mpi/ompi@610fc67d12
supports a wide variety of operating systems and platforms; see the
opal/mca/paffinity/hwloc/hwloc/README file for details.
This component includes an embedded copy of hwloc, currently based on
hwloc-1.0rc6. But note that hwloc is properly SVN imported into the
/vendor branch, so it will be easy to update when 1.0 GA is released.
Note that the hwloc tree embedded in opal/mca/paffinity/hwloc/hwloc is
identical to a hwloc distribution tarball, except that much of the
documentation was rm -rf'ed (because we don't need it for the embedded
case).
Since the paffinity framework currently does not understand hardware
threads, the hwloc component compensates for this by identifying cores
by the "first" hardware thread on that core. Hopefully we'll update
paffinity someday to understand hardware threads. :-)
configure grew a --with-hwloc option, analogous to what we do for many
other external libraries that OMPI supports. However, there's a new
feature: due to the request of several distros, OMPI can be configured
to build with its internal copy of hwloc or with an external copy of
hwloc (e.g., a system-installed hwloc).
1. If --with-hwloc is not specified, Open MPI will try to use its
internal copy (but silently fail/ignore hwloc if that fails).
1. If --with-hwloc=<dir> is supplied, Open MPI looks for hwloc
support in <dir> (and --with-hwloc-libdir=<dir>, if specified).
1. If --with-hwloc=external is supplied, Open MPI will look for hwloc
in a compiler/linker default external location.
1. If --with-hwloc=internal is supplied, Open MPI will use its
internal copy of hwloc.
Some of OMPI's main configury had to be slightly re-arranged in the
bootstrapping phase to accomodate hwloc's configry needs.
This commit was SVN r23125.
http://marc.info/?l=linux-mm-commits&m=127352503417787&w=2 for more
details.
* Remove the ptmalloc memory component; replace it with a new "linux"
memory component.
* The linux memory component will conditionally compile in support
for ummunotify. At run-time, if it has ummunotify support and
finds run-time support for ummunotify (i.e., /dev/ummunotify), it
uses it. If not, it tries to use ptmalloc via the glibc memory
hooks.
* Add some more API functions to the memory framework to accomodate
the ummunotify model (i.e., poll to see if memory has "changed").
* Add appropriate calls in the rcache to the new memory APIs to see
if memory has changed, and to react accordingly.
* Add a few comments in the openib BTL to indicate why we don't need
to notify the OPAL memory framework about specific instances of
registered memory.
* Add dummy API calls in the solaris malloc component (since it
doesn't have polling/"did memory change" support).
This commit was SVN r23113.
It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected.
Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages.
This commit was SVN r23106.
done this way a long time ago for the "gee whiz!" factor -- when in
reality, they really only need one-of-many-run-time priority
selection).
Changed run-time priorities to be as follows:
* darwin: 20
* linux: 20
* posix: 10
* solaris: 30
* test: 5
* windows: 20
I have a very dim (possibly untrue) recollection that Solaris needs to
have a higher priority than others just to ensure that no other is
chosen under Solaris. Make all other "native" components have a
priority of 20 (they shouldn't conflict with each other). Make the
posix fallback component have a priority of 10. Make the test
component priority 5, meaning someone can always select it, but you
can also make a "never select me" component that prioritizes itself
under test.
This commit was SVN r22997.
modify the OPAL_PAFFINITY_PROCESS_IS_BOUND macro to search the cpuset for
the maximum possible number of cpus rather than just the number of cpus
currently online. This corrects a problem where mpi_paffinity_alone was
not working properly on systems in which there can be cpu namespaces with
holes, such as on ppc64 with smt off (as discussed in #2365).
This commit was SVN r22927.
#2322.
The short version is that this patch consolidates two pieces of code
that call the back-end munmap and ensures that (if dlsym is used) the
corresponding dlsym is only invoked once and that the variable holding
the result is volatile.
This commit was SVN r22863.
The following Trac tickets were found above:
Ticket 2104 --> https://svn.open-mpi.org/trac/ompi/ticket/2104
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.
The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.
This commit was SVN r22824.
anything for non-MPI apps. Oops! (But before you freak out, gentle
reader, note that mpi_paffinity_alone for MPI apps still worked fine)
When we made the switchover somewhere in the 1.3 series to have the
orted's do processor binding, then stuff like:
mpirun --mca mpi_paffinity_alone 1 hostname
should have bound hostname to processor 0. But it didn't because of a
subtle startup ordering issue: the MCA param registration for
opal_paffinity_alone was in the paffinity base (vs. being in
opal/runtime/opal_params.c), but it didn't actually get registered
until after the global variable opal_paffinity_alone was checked to
see if we wanted old-style affinity bindings. Oops.
However, for MPI apps, even though the orted didn't do the binding,
ompi_mpi_init() would notice that opal_paffinity_alone was set, yet
the process didn't seem to be bound. So the MPI process would bind
itself (this was done to support the running-without-orteds
scenarios). Hence, MPI apps still obeyed mpi_paffinity_alone
semantics.
But note that the error described above caused the new mpirun switch
--report-bindings to not work with mpi_paffinity_alone=1, meaning that
the orted would not report the bindings when mpi_paffinity_alone was
set to 1 (it ''did'' correctly report bindings if you used
--bind-to-core or one of the other binding options).
This commit separates out the paffinity base MCA param registration
into a small function that can be called at the Right place during the
startup sequence.
This commit was SVN r22602.
* Don't build the pstat component if all defines needed aren't there.
* Update platform file to work better
* Work around two places that depended on modex being operational
This commit was SVN r22536.
Originally the patch was to improve the error message, but when digging into the code I found a subtle bug. If the daemon does not tell the HNP what CRS component it used, then the HNP tries to figure it out from the metadata (this is an uncommon case). The path the HNP used was not complete, so it was unable to find the metadata information. This patch fixes this by adding the 'snapshot_reference' to the 'snapshot_location' which completes the path for this search.
cmr:v1.4 (needs a custom patch)
cmr:v1.5
This commit was SVN r22479.
The following Trac tickets were found above:
Ticket 2190 --> https://svn.open-mpi.org/trac/ompi/ticket/2190
party/"vendor" import, the changes are actually far smaller than the
size of this changeset implies. Here's a list of the changes:
* Update the AMD license header in plpa_map.c to be less restrictive
(see https://svn.open-mpi.org/trac/plpa/changeset/262 for details)
-- '''this is the most/only important change of this update.''' No
code is changed by this; only removing a clase from a license
header in plpa_map.c.
* Changes to the generated {{{configure}}}, {{{config.guess}}}, and
{{{config.sub}}} scripts (which aren't used by OMPI).
* soname version tracking changes (which also aren't used by OMPI;
they're only used when PLPA is built/installed in "standalone"
mode).
* Update the "get version" m4 (which was stolen from OMPI's m4 to
begin with, and is only used during OMPI's autogen.sh step).
* Update various PLPA version numbers to 1.3.2.
* Bug fix in plpa-taskset (which is not built in the OMPI PLPA build).
This commit was SVN r22367.
:-)
Okay, cleanup the prior commit so that the default component search path shows in ompi_info, and remains available in component_find.
This commit was SVN r22278.
"my_perfect_path":SYSTEM_DEFAULT:USER_DEFAULT
and OPAL will substitute its internally derived values for the defaults (instead of forcing the user to figure them out).
This commit was SVN r22272.
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.
If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.
Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).
Adjust some platform files to enable these capabilities.
This commit was SVN r22244.
PARAM_WINDOWS_FILES is a mistake or not). Fixes trac:2079.
This commit was SVN r22171.
The following Trac tickets were found above:
Ticket 2079 --> https://svn.open-mpi.org/trac/ompi/ticket/2079
posix_memalign() will either return 0 or not, indicating success. And
if posix_memalign() fails, it's not always going to be due to
out-of-memory -- just return ERR_IN_ERRNO.
This commit was SVN r22070.
MAP_PRIVATE. We didn't catch this because we checked for a NULL
return, not a -1 return. Doh! Thanks again to Julian Seward for
continuing to track this down.
This commit was SVN r22062.
opposite of MAKE_MEM_DEFINED. Also add in a call to NOACCESS to
(mostly) reverse the effects of MAKE_MEM_DEFINED (technically, page 0
was accessible before this, even though it's a Bad Idea to access it).
This commit was SVN r22056.
This commit looks larger than it really is since it includes a fair amount of code cleanup.
The SIGSTOP/SIGCONT+checkpointing work uses some of the functionality in r20391. Basic use case below (note that the checkpoint generated is useable as usual if the stopped application is terminated).
{{{
shell 1) mpirun -np 2 -am ft-enable-cr my-app
... running ...
shell 2) ompi-checkpoint --stop -v MPIRUN_PID
[localhost:001300] [ 0.00 / 0.20] Requested - ...
[localhost:001300] [ 0.00 / 0.20] Pending - ...
[localhost:001300] [ 0.01 / 0.21] Running - ...
[localhost:001300] [ 1.01 / 1.22] Stopped - ompi_global_snapshot_1234.ckpt
Snapshot Ref.: 0 ompi_global_snapshot_1234.ckpt
shell 2) killall -CONT mpirun
... Application Continues execution in shell 1 ...
}}}
Other items in this commit are mostly cleanup that has been sitting off-trunk for too long:
* Add a new {{{opal_crs_base_ckpt_options_t}}} type that encapsulates the various options that could be passed to the CRS. Currently only TERM and STOP, but this makes adding others ''much'' easier.
* Eliminate ORTE_SNAPC_CKPT_STATE_PENDING_TERM, since it served a redundant purpose with the new options type.
* Lay some basic ground work for some future features.
This commit was SVN r21995.
The following SVN revision numbers were found above:
r20391 --> open-mpi/ompi@0704b98668
masks between the mask argument and a local PLPA mask. There were three
problems:
1) The "get" function computed the number of bits as sizeof(mask),
which is the size of the pointer to the mask rather than the mask
itself. So, only 4 bits were copied with m32 and 8 bits with m64.
There are actually 1024 bits.
2) The "get" and "set" functions both copied a number of bits computed
from the sizeof() mask, but sizeof() reports the number of bytes.
We have to multiply by 8 to get the number of bits.
3) These two functions check to make sure tha the mask argument is not
bigger than the PLPA mask. But, the set function copies a number
of bits in the PLPA mask, which is conceivably greater than the
number of bits in the mask argument. So, accesses to the mask
argument may overrun that argument.
Problems 1 and 2 meant that one would encounter errors when the number of
cores exceeded 4 (with -m32) or 8 (with -m64). Problem 3 probably caused
no errors.
This commit was SVN r21993.
The new options work by adding an ":if-avail" qualifier to the "bind-to-socket" and "bind-to-core" MCA params. If the system does not support this capability, the job will launch anyway. Without the qualifier, the job will abort with an error message indicating that the required functionality is not supported on this system.
This commit was SVN r21975.
Due to a typo (probably cut/paste error) the CFLAGS/CPPFLAGS/LDFLAGS/LIBS arguments were not propogated to the BLCR component.
This changes a few {{{crs_blcr_check_}}} to {{{crs_blcr_save}}}, which they should have been all along.
This needs to be applied to v1.3 as well :/
This commit was SVN r21860.
#if defined (c_plusplus)
defined (__cplusplus)
followed by
extern "C" {
and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.
Notable exceptions are:
- opal/include/opal_config_bottom.h:
This is our generated code, that itself defines BEGIN_C_DECL and
END_C_DECL
- ompi/mpi/cxx/mpicxx.h:
Here we do not include opal_config_bottom.h:
- Belongs to external code:
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h
- opal/include/opal/prefetch.h:
Has C++ specific macros that are protected:
- Had #if ... } #endif _and_ END_C_DECLS (aka end up with 2x
END_C_DECLS)
ompi/mca/btl/openib/btl_openib.h
- opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
- opal/win32/ompi_process.h: had extern "C"\n {...
opal/win32/ompi_process.h: dito
- ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
ompi/mpi/f90/test/align_c.c: dito
- ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
- ompi/mpi/f90/xml/common-C.xsl: Amend
Tested on linux using --with-openib and --with-mx
The following do not contain either opal_config.h, orte_config.h or
ompi_config.h
(but possibly other header files, that include one of the above):
ompi/mca/bml/r2/bml_r2_ft.h
ompi/mca/btl/gm/btl_gm_endpoint.h
ompi/mca/btl/gm/btl_gm_proc.h
ompi/mca/btl/mx/btl_mx_endpoint.h
ompi/mca/btl/ofud/btl_ofud_endpoint.h
ompi/mca/btl/ofud/btl_ofud_frag.h
ompi/mca/btl/ofud/btl_ofud_proc.h
ompi/mca/btl/openib/btl_openib_mca.h
ompi/mca/btl/portals/btl_portals_endpoint.h
ompi/mca/btl/portals/btl_portals_frag.h
ompi/mca/btl/sctp/btl_sctp_endpoint.h
ompi/mca/btl/sctp/btl_sctp_proc.h
ompi/mca/btl/tcp/btl_tcp_endpoint.h
ompi/mca/btl/tcp/btl_tcp_ft.h
ompi/mca/btl/tcp/btl_tcp_proc.h
ompi/mca/btl/template/btl_template_endpoint.h
ompi/mca/btl/template/btl_template_proc.h
ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
ompi/mca/btl/udapl/btl_udapl_endpoint.h
ompi/mca/btl/udapl/btl_udapl_mca.h
ompi/mca/btl/udapl/btl_udapl_proc.h
ompi/mca/mtl/mx/mtl_mx_endpoint.h
ompi/mca/mtl/mx/mtl_mx.h
ompi/mca/mtl/psm/mtl_psm_endpoint.h
ompi/mca/mtl/psm/mtl_psm.h
ompi/mca/pml/cm/pml_cm_component.h
ompi/mca/pml/csum/pml_csum_comm.h
ompi/mca/pml/dr/pml_dr_comm.h
ompi/mca/pml/dr/pml_dr_component.h
ompi/mca/pml/dr/pml_dr_endpoint.h
ompi/mca/pml/dr/pml_dr_recvfrag.h
ompi/mca/pml/example/pml_example.h
ompi/mca/pml/ob1/pml_ob1_comm.h
ompi/mca/pml/ob1/pml_ob1_component.h
ompi/mca/pml/ob1/pml_ob1_endpoint.h
ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
ompi/mca/pml/ob1/pml_ob1_recvfrag.h
ompi/mca/pml/v/pml_v_output.h
opal/include/opal/prefetch.h
opal/mca/timer/aix/timer_aix.h
opal/util/qsort.h
test/support/components.h
This commit was SVN r21855.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855
- add a cmake module for searching libltdl libraries and headers
- a configure option to enable DSO build, default OFF.
- update a few source files for including correct header
and loading correct mca libraries path/suffix.
This commit was SVN r21804.
Adds several new mpirun options:
* -bysocket - assign ranks on a node by socket. Effectively load balances the procs assigned to a node across the available sockets. Note that ranks can still be bound to a specific core within the socket, or to the entire socket - the mapping is independent of the binding.
* -bind-to-socket - bind each rank to all the cores on the socket to which they are assigned.
* -bind-to-core - currently the default behavior (maintained from prior default)
* -npersocket N - launch N procs for every socket on a node. Note that this implies we know how many sockets are on a node. Mpirun will determine its local values. These can be overridden by provided values, either via MCA param or in a hostfile
Similar features/options are provided at the board level for multi-board nodes.
Documentation to follow...
This commit was SVN r21791.
Refs trac:1987
The patch for v1.3 attached to Ticket #1987 already includes this change.
I did not have a chance to commit this last night, so sorry for the delay.
This commit was SVN r21777.
The following Trac tickets were found above:
Ticket 1987 --> https://svn.open-mpi.org/trac/ompi/ticket/1987
* Check for {{{dlfcn.h}}} in the self component's configure.m4 (also clean up the .m4 a bit.
* Adjust the priority of the BLCR component so that the self component has a higher priority (if the application went to the trouble of writing the routines, why not use them.) The 'self' component checks for the appropriate functions during query, so it know if it -can- be used during component selection.
* Adjust some copyrights that I missed before
* Fix a warning when casing the result of dlsym() into a function pointer. There is a bit of pointer magic to make this happen (thanks to the following website, and RedHat EL 4 man pages for illustrating it:
http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Passing to Jeff for a final review of the patch before moving to v1.3.
This commit was SVN r21768.
The following SVN revision numbers were found above:
r21766 --> open-mpi/ompi@91e52d062b
Due to the visibility patch to libltdl in r21731, this module can no longer access or use the libltdl interfaces directly. Instead just use the dlopen/dlsym/dlclose functions directly. This is a portability implication here, but for the moment it does not seem to bite us.
Also in this patch, cleanup some of the 'self' specific code paths.
* opal-restart need not special case the 'self' component since it can now interact with it as if it were a normal component.
* Cleanup the initialization of the cmd line arguments in opal-restart.
* Make sure to mark opal-restart as a 'tool', but do so by setting the global variable directly instead of setting the environment variable, which could be inherited by the application.
* Most of the functions in the 'self' component should not be used by a command line tool (exception being 'restart'), so make sure that if we accidently call them then errors are returned.
* Increase the priority of the 'none' component to be above that of 'self' when being selected in a command line tool. This allows for both mpirun and opal-restart to work correctly with the 'self' module.
This commit was SVN r21766.
The following SVN revision numbers were found above:
r21731 --> open-mpi/ompi@0278b86456