* Protect the '->bitmap' field if init() is called more than once [it shouldn't be, but if it is then this avoids a memory leak].
* Some new functions
* opal_bitmap_bitwise_and_inplace
* opal_bitmap_bitwise_or_inplace
* opal_bitmap_bitwise_xor_inplace
* opal_bitmap_are_different
* opal_bitmap_get_string
Adding these features to the trunk so others have access to them if they need them. A couple of off-trunk branches make use of them.
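As a rough illustration of what the in-place operations and the difference check do, here is a minimal sketch. The `toy_bitmap_t` type and the `toy_bitmap_*` functions are hypothetical stand-ins written for this note; they are not the OPAL implementation, and the real opal_bitmap_* signatures and return codes may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bitmap: a fixed array of 8 bytes (64 bits). */
typedef struct {
    unsigned char bits[8];
} toy_bitmap_t;

/* dest = dest AND src, byte by byte. */
void toy_bitmap_and_inplace(toy_bitmap_t *dest, const toy_bitmap_t *src)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < sizeof(dest->bits); ++i) {
        dest->bits[i] &= src->bits[i];
    }
}

/* dest = dest OR src, byte by byte. */
void toy_bitmap_or_inplace(toy_bitmap_t *dest, const toy_bitmap_t *src)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < sizeof(dest->bits); ++i) {
        dest->bits[i] |= src->bits[i];
    }
}

/* dest = dest XOR src, byte by byte. */
void toy_bitmap_xor_inplace(toy_bitmap_t *dest, const toy_bitmap_t *src)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < sizeof(dest->bits); ++i) {
        dest->bits[i] ^= src->bits[i];
    }
}

/* Returns true as soon as any byte differs between the two bitmaps. */
bool toy_bitmap_are_different(const toy_bitmap_t *a, const toy_bitmap_t *b)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < sizeof(a->bits); ++i) {
        if (a->bits[i] != b->bits[i]) {
            return true;
        }
    }
    return false;
}
```

The sketch assumes both bitmaps have the same size; a real implementation would also have to decide what to do when the sizes differ.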
This commit was SVN r24767.
Fix a bug in the new code that prevented the system from correctly matching addresses.
Remove comments in the show-help text indicating that we would continue in the face of incorrect specifications - leave that to the calling layer to decide.
Modify the new opal_ifmatches so it returns error codes letting the caller better understand the result.
Modify the oob to ensure we abort if we don't find interfaces matching specified constraints, and that we do so without multiple error messages.
NOTE: we have a conflict in our standards. We have been using comma-delimited lists of interfaces for all our params. However, one param - opal_net_private_ipv4 - now uses semicolons instead of comma separators. No idea why, but it is confusing.
This commit was SVN r24755.
Interested readers can quench their curiosity either with one
of the Richard Stevens books (ISBN 9780201633467) or the
Wikipedia page (http://en.wikipedia.org/wiki/Subnetwork).
This commit was SVN r24680.
Correctly check the dependencies of MSYS env.
Set up configure include and lib path for building the package.
update a few more CMake scripts.
This commit was SVN r24663.
that enabling "local_only" by default could cause excessive
by-NUMA-node paging and/or OOMs (rather than allowing memory
allocations to spill over to other NUMA nodes).
This brought home the very real-world example of people buying servers
with more processors/cores than they need, just to get more memory.
We wouldn't want Badness to occur in such scenarios by default.
Instead, let people turn on "only allow memory allocations on my local
NUMA node" if their application would benefit from it.
This commit was SVN r24648.
After a long period of development with many starts and stops, we
finally got this where we wanted it.
This commit introduces 2 new MCA params (note that the
"maffinity_libnuma_policy" MCA param introduced by r24290 was removed
when libnuma support was removed). Remember that maffinity policies
are only in effect when paffinity is enabled -- i.e., when processes
are bound to processors!
* '''maffinity_base_alloc_policy:''' Policy that determines how
general memory allocations are bound after MPI_INIT. A value of
"none" means that no memory policy is applied. A value of
"local_only" means that all memory allocations will be restricted
to the local NUMA node where each process is placed. Note that
operating system paging policies are unaffected by this setting.
For example, if "local_only" is used and local NUMA node memory is
exhausted, a new memory allocation may cause paging.
* '''maffinity_base_bind_failure_action:''' What Open MPI will do if
it explicitly tries to bind memory to a specific NUMA location, and
fails. Note that this is a different case than the general
allocation policy described by maffinity_base_alloc_policy. A
value of "warn" means that Open MPI will warn the first time this
happens, but allow the job to continue (possibly with degraded
performance). A value of "error" means that Open MPI will abort
the job if this happens.
This needs at least a little soak time on the trunk before going to
v1.5.
This commit was SVN r24639.
The following SVN revision numbers were found above:
r24290 --> open-mpi/ompi@afa654746c
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
Upgrade to hwloc 1.2 (from hwloc 1.1.2). This should fix the problems
Nathan's seeing in #2778.
Let's let this soak on the trunk for a little while and see how LANL's
MTT's work out. If that works, then we can CMR this to v1.5.
This commit was SVN r24635.
The following Trac tickets were found above:
Ticket 2778 --> https://svn.open-mpi.org/trac/ompi/ticket/2778
Nth core, so it fell over to try to find the Nth PU.
-----
hwloc isn't able to find cores on all platforms. Example: PPC64
running RHEL 5.4 (linux kernel 2.6.18) only reports NUMA nodes and
PU's. Fine.
However, note that hwloc_get_obj_by_type() will return NULL in 2
(effectively) different cases:
- no objects of the requested type were found
- the Nth object of the requested type was not found
So first we have to see if we can find *any* cores by looking for the
0th core. If we find it, then try to find the Nth core. Otherwise,
try to find the Nth PU.
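The lookup order described above can be sketched without hwloc itself. Everything below is a hypothetical miniature (the `toy_*` names and the fixed-size arrays are assumptions made for illustration); the only point is that `toy_get_obj_by_type()` returns NULL in the same two cases as the hwloc call, so the caller probes index 0 first.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OBJS 8

typedef enum { TOY_OBJ_CORE, TOY_OBJ_PU } toy_type_t;

typedef struct {
    toy_type_t type;
    unsigned index;
} toy_obj_t;

/* Hypothetical miniature topology: some cores (possibly none) and some PUs. */
typedef struct {
    toy_obj_t cores[TOY_MAX_OBJS];
    unsigned num_cores;
    toy_obj_t pus[TOY_MAX_OBJS];
    unsigned num_pus;
} toy_topology_t;

/* Returns NULL both when no objects of that type exist and when index n
 * is out of range; the caller cannot tell the two cases apart directly. */
const toy_obj_t *toy_get_obj_by_type(const toy_topology_t *t,
                                     toy_type_t type, unsigned n)
{
    assert(t != NULL);
    if (type == TOY_OBJ_CORE) {
        return (n < t->num_cores) ? &t->cores[n] : NULL;
    }
    return (n < t->num_pus) ? &t->pus[n] : NULL;
}

/* Probe for the 0th core first. If any core exists, only look for the
 * Nth core; otherwise fall back to the Nth PU. */
const toy_obj_t *toy_find_core_or_pu(const toy_topology_t *t, unsigned n)
{
    if (toy_get_obj_by_type(t, TOY_OBJ_CORE, 0) != NULL) {
        /* Cores are reported, so NULL here means "no Nth core". */
        return toy_get_obj_by_type(t, TOY_OBJ_CORE, n);
    }
    /* No cores reported at all (e.g., only NUMA nodes and PUs). */
    return toy_get_obj_by_type(t, TOY_OBJ_PU, n);
}
```

With this ordering, a missing Nth core on a machine that does report cores is an error for the caller to handle rather than a silent fall-through to the Nth PU.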
This commit was SVN r24632.
Rename the memusage sensor plugin to "resusage" as it will soon be updated to include full process stat monitoring.
Extend the heartbeat sensor to report node and process stats in the heartbeat.
Store the process and node stats in their respective orte_xxx_t object.
This commit was SVN r24629.
released):
backport hwloc r 3416 from trunk: Add cache info entry _after_ checking
that we need one, thanks Andriy Gapon for the fix
This commit was SVN r24612.
The following SVN revision numbers were found above:
r3418 --> open-mpi/ompi@9972663a12
--disable-dlopen is used. Thanks to David Gunter for reporting the
issue.
This commit was SVN r24603.
The following Trac tickets were found above:
Ticket 2768 --> https://svn.open-mpi.org/trac/ompi/ticket/2768
appeared multiple times in ompi_info output (so did others, but this
is the one that was noticed). Ensure that we don't repeat
opal_paffinity_base_register_params() multiple times.
This commit was SVN r24569.
No need for any CMRs to 1.5... that was already done in CMR 2728.
This commit was SVN r24545.
The following SVN revision numbers were found above:
r22841 --> open-mpi/ompi@b400b84162
* Convert from AC_TRY_RUN to AC_RUN_IFELSE.
* Excellent suggestion from Paul Hargrove: use AC_CHECK_FUNC to look
for a Linuxthreads-specific symbol when we're cross compiling to
see if threads will have different PIDs (because AC_CHECK_FUNC
works properly even when in cross-compiling environments).
Background: the old/Linuxthreads-based pthreads implementation used
the Linux clone() call to make threads, which effectively meant that
each thread had a different PID. The new NPTL pthreads implementation
does things better, meaning that threads have the same PID.
Open MPI no longer supports threads with different PIDs -- we ripped
out the supporting code for threads with different PIDs because we
don't have systems available to test this on anymore (anyone who still
has such a system can still use older versions of Open MPI). Hence,
configure needs to determine whether the target system will have the
same PID for threads or not -- even if we're cross-compiling. The
current test compiles and runs a multi-threaded app that checks PIDs
of different threads, but we clearly can't do that in a
cross-compiling environment. So use AC_CHECK_FUNC in cross-compiling
environments.
Simple, no?
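For reference, a run-time check of the kind described above might look roughly like the following. This is a sketch written for this note, not the program embedded in configure; the function name `threads_share_pid` is made up, and it assumes a POSIX threads environment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body: record the PID as seen from inside the thread. */
static void *record_pid(void *arg)
{
    assert(arg != NULL);
    *(pid_t *)arg = getpid();
    return NULL;
}

/* Returns 1 if a child thread reports the same PID as the caller,
 * 0 if the PIDs differ (the old Linuxthreads behavior), and -1 if the
 * thread could not be created or joined. */
int threads_share_pid(void)
{
    pid_t thread_pid = 0;
    pthread_t tid;

    if (pthread_create(&tid, NULL, record_pid, &thread_pid) != 0) {
        return -1;
    }
    if (pthread_join(tid, NULL) != 0) {
        return -1;
    }
    return (thread_pid == getpid()) ? 1 : 0;
}
```

A check like this can only be run on the build machine, which is why it is of no use when cross-compiling.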
This commit was SVN r24537.
There was no compelling reason to support such old kernels. Accordingly, convert the test to print a nice error message indicating we no longer support old kernels (but indicate that earlier OMPI versions do) and error out. Remove all code that was protected by "if have different pids" since it can no longer be compiled.
This commit was SVN r24531.
fix will be included in hwloc 1.1.2.
Brad -- can you verify that this fixes the issue for you?
Fixes trac:2732.
This commit was SVN r24450.
The following Trac tickets were found above:
Ticket 2732 --> https://svn.open-mpi.org/trac/ompi/ticket/2732
filenames -- don't include the project name ("opal")
* Don't link maffinity/hwloc and paffinity/hwloc against the common
hwloc in the static build case (because this will result in
duplicate symbols)
This commit was SVN r24447.
hopefully, this now compiles for libnuma 0.9.x and libnuma 2.0.x.
Fixes for the strategy discussed in the commit message for r24442
(i.e., check against numa_get_mems_allowed(), which only exists in
libnuma 2.0.x) and the new new new plan on #2698 coming in a separate
commit.
This commit was SVN r24443.
The following SVN revision numbers were found above:
r24442 --> open-mpi/ompi@90a8fe4aad
(with libnuma-2.0.4 / LIBNUMA_API_VERSION 2): numa_get_run_node_mask
returns a struct bitmask *.
Whether it's a good idea to blindly pass that on to
numa_set_membind() is another matter: one might want to match against
the list returned by numa_get_mems_allowed(), which may be set by the
outside environment.
Refs trac:2698.
This commit was SVN r24442.
The following SVN revision numbers were found above:
r24421 --> open-mpi/ompi@31510e683b
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
Update the CMake script for checking mca subdirs.
Add windows support for __attribute__ packed structures.
Define usleep and posix_memalign with equivalent windows functions.
And a few minor fixes, type casts.
This commit was SVN r24429.
what memory node the process is running on (which is guaranteed to be
a good answer because maffinity won't be invoked unless the process is
already bound to a specific processor), and then bind our memory to
that.
Refs trac:2698.
This commit was SVN r24421.
The following SVN revision numbers were found above:
r24290 --> open-mpi/ompi@afa654746c
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
OMPI supports multiple different repository systems (SVN, hg, git).
But the VERSION file has listed "want_svn" and "svn_r" as fields, even
though the actual repo system and version may not be SVN.
So search/replace those fields (and derivative values that come from
those fields) with "want_repo_rev" and "repo_rev", respectively.
This commit was SVN r24405.
Add some new proc/job states
Rename a constant to reflect coming change - remove the arbitrary difference between restarting a proc locally and relocating it to another node in terms of the number of restarts allowed.
Add pretty-print of signals for "proc aborted due to signal" reports.
This commit was SVN r24378.
The following SVN revision numbers were found above:
r24371 --> open-mpi/ompi@93d28a5792
This means that the converters (opal_err2str, orte_err2str) can now
return NULL as a "silent error". The return value of opal_err2str_fn_t
is the status of the operation (OPAL_SUCCESS or OPAL_ERROR).
This fixes the "Unknown error" message issues on the trunk.
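A minimal sketch of the convention, assuming a converter that reports its status through the return value and hands back the string through an out-parameter. All `toy_*` / `TOY_*` names and the error codes are hypothetical; the real opal_err2str_fn_t signature is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SUCCESS 0
#define TOY_ERROR   (-1)

/* Status comes back as the return value; the text comes back via *str. */
typedef int (*toy_err2str_fn_t)(int errnum, const char **str);

/* Lower-layer converter: only knows error code -1. */
int toy_base_err2str(int errnum, const char **str)
{
    assert(str != NULL);
    if (errnum == -1) {
        *str = "Error";
        return TOY_SUCCESS;
    }
    *str = NULL;
    return TOY_ERROR;
}

/* Upper-layer converter: recognizes -100 but deliberately returns NULL
 * text, i.e., a "silent error" that should not be printed. */
int toy_upper_err2str(int errnum, const char **str)
{
    assert(str != NULL);
    *str = NULL;
    return (errnum == -100) ? TOY_SUCCESS : TOY_ERROR;
}

/* Ask each converter in turn. TOY_SUCCESS with *str == NULL means the
 * code was recognized but is silent; only a code that nobody recognizes
 * produces the "Unknown error" text. */
int toy_strerror(int errnum, const char **str)
{
    static const toy_err2str_fn_t converters[] = {
        toy_base_err2str, toy_upper_err2str
    };
    for (size_t i = 0; i < sizeof(converters) / sizeof(converters[0]); ++i) {
        if (converters[i](errnum, str) == TOY_SUCCESS) {
            return TOY_SUCCESS;
        }
    }
    *str = "Unknown error";
    return TOY_ERROR;
}
```

Separating the status from the string is what lets a caller distinguish "recognized but silent" from "not recognized".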
This commit was SVN r24371.
Temporarily remove hwloc's internal version of myriexpress.h. It is
causing a problem when compiling Open MPI with MX support because
hwloc uses AC_CONFIG_HEADER in hwloc's hwloc.m4 to generate
opal/mca/paffinity/hwloc/hwloc/include/hwloc/config.h.
AC_CONFIG_HEADER apparently has the (undocumented) side effect of
adding -I$(top_builddir)/opal/mca/paffinity/hwloc/hwloc/include/hwloc
to OMPI's compilation flags. Hence, when the OMPI MX components are
compiled and #include "myriexpress.h" (or <myriexpress.h>) they see
hwloc's myriexpress.h before the system one. Badness ensues.
This removal is temporary because we need to figure out a better
solution. But for now, OMPI is not using hwloc's myriexpress.h file --
so it's safe to remove. I'll push this issue upstream to hwloc to
figure out a better solution...
This commit was SVN r24354.
The following Trac tickets were found above:
Ticket 2690 --> https://svn.open-mpi.org/trac/ompi/ticket/2690
Short Version:
--------------
Event engine needs to be flushed so it does not use old/stale file descriptors.
Long Version:
-------------
The problem was that the restarted process was waiting for the socket to the local daemon to finish establishing during the 'sync' operation. The core problem was that the daemon was sending a header of 36 bytes, but the restarted process only received 35 bytes of the message. So the restarted process became stuck waiting for the last byte to arrive.
After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (to signal itself when a signal arrives). So when the daemon wrote in to the new fd the event engine was stealing the first byte (*shakes fist at event engine*) before the recv() could be posted.
The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem.
A few other minor things:
-------------------------
* Add a check to make sure the event engine is balanced in its init/finalize
* Add the opal_event_base_close() to the BLCR restart exec function (still not 100% sure it is needed, but there it is).
This commit was SVN r24296.
last December. :-(
Add new MCA param: maffinity_libnuma_policy. Thanks to David
Singleton for the suggestion. Here's the help text about it:
{{{
MCA maffinity: parameter "maffinity_libnuma_policy" (current value:
<loose>, data source: default value)
Binding policy that determines what happens if memory
is unavailable on the local NUMA node. A value of
"strict" means that the memory allocation will fail;
a value of "loose" means that the memory allocation
will spill over to another NUMA node.
}}}
This commit was SVN r24290.
either direct link to these basic predefined types, or a combination of them.
Anyway, the first items in the datatype list belong to OPAL, the second round
are MPI datatypes created by composing basic OPAL datatypes, and the last
batch are mapped datatypes (direct correspondence between an OMPI datatype and
an OPAL one such as int -> int32_t).
Modify the op to fit this new scheme.
This commit was SVN r24247.
the module to use the new hwloc bitmap API (the cpuset API is both
klunkier and deprecated), which simplified a few things.
This commit was SVN r24217.
{{{
base/paffinity_base_service.c: In function ‘opal_paffinity_base_cset2mapstr’:
base/paffinity_base_service.c:623: warning: unused variable ‘range_last’
base/paffinity_base_service.c:623: warning: unused variable ‘range_first’
base/paffinity_base_service.c:622: warning: unused variable ‘count’
base/paffinity_base_service.c:622: warning: unused variable ‘m’
}}}
{{{
connect/btl_openib_connect_oob.c: In function ‘init_ud_qp’:
connect/btl_openib_connect_oob.c:1111: warning: control reaches end of non-void function
connect/btl_openib_connect_oob.c: In function ‘init_device’:
connect/btl_openib_connect_oob.c:1235: warning: unused variable ‘i’
connect/btl_openib_connect_oob.c: In function ‘get_pathrecord_sl’:
connect/btl_openib_connect_oob.c:1323: warning: unused variable ‘i’
}}}
This commit was SVN r24196.
It is statically initialized to the real back-end OPAL show_help
function. During orte_show_help_init(), the variable is re-assigned
with the value of the back-end ORTE show_help function (the one that
does error message aggregation).
Therefore, anything that calls opal_show_help() after a certain point
in orte_init() will have their show_help messages be aggregated.
w00t! Even code down in OPAL -- that has no knowledge of ORTE -- will
have their messages aggregated. '''Double w00t!'''
During orte_show_help_finalize(), we restore the original pointer
value so that if something calls opal_show_help() after
orte_finalize(), it'll still work properly (but it won't be
aggregated).
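The pointer-swap scheme can be sketched as follows. The `toy_*` names are hypothetical and the two back-ends are trivial placeholders (they return 0 and 1 only so the caller can tell them apart); the real functions take more arguments and do real work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*toy_show_help_fn_t)(const char *topic);

/* Lower-layer back-end: prints immediately. Returns 0. */
int toy_backend_show_help(const char *topic)
{
    assert(topic != NULL);
    printf("help: %s\n", topic);
    return 0;
}

/* Upper-layer back-end: stands in for the aggregating version. Returns 1. */
static int aggregated_calls = 0;
int toy_aggregating_show_help(const char *topic)
{
    assert(topic != NULL);
    ++aggregated_calls;
    return 1;
}

/* The public entry point is a pointer, statically initialized to the
 * lower-layer function so it works before any init call. */
toy_show_help_fn_t toy_show_help = toy_backend_show_help;

static toy_show_help_fn_t saved_show_help = NULL;

/* Init: remember the current target and install the aggregating one. */
void toy_upper_init(void)
{
    saved_show_help = toy_show_help;
    toy_show_help = toy_aggregating_show_help;
}

/* Finalize: restore the original target so later calls still work. */
void toy_upper_finalize(void)
{
    if (saved_show_help != NULL) {
        toy_show_help = saved_show_help;
        saved_show_help = NULL;
    }
}
```

Code in the lower layer only ever calls through `toy_show_help`, so it picks up the aggregating behavior without knowing the upper layer exists.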
This commit was SVN r24185.
1. Remove it from libevent207.h because it is not needed.
2. Add compat to the include list so it can use queue.h when needed.
This commit was SVN r24144.
Change OMPI code in libevent to not use bool.
Add some comments to indicate OMPI specific code.
This should fix compiles on Sun Studio Solaris.
This commit was SVN r24062.
As memmove is slower than memcpy, I added the required logic to only use it when
really necessary.
No modification from the developers' point of view; you should always call
opal_datatype_copy_content_same_ddt.
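The general idea (use memmove only when the two regions overlap) can be sketched like this. The helper names are hypothetical and this is not the logic inside opal_datatype_copy_content_same_ddt, which works on datatype descriptions rather than flat buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if [a, a+len) and [b, b+len) share at least one byte. */
int toy_regions_overlap(const void *a, const void *b, size_t len)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    /* The regions overlap when the distance between the two start
     * addresses is smaller than the length being copied. */
    return (pa < pb) ? (pb - pa < len) : (pa - pb < len);
}

/* Copy len bytes, paying for memmove only when it is required. */
void *toy_copy(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (toy_regions_overlap(dst, src, len)) {
        return memmove(dst, src, len);
    }
    return memcpy(dst, src, len);
}
```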
This commit was SVN r24047.
It's not possible to combine two shared libraries on Windows, so we have to do it a bit differently. First generate a small static event library by just linking the object files, then link it into the other libraries that need the libevent API.
This commit was SVN r24039.
handles more modern versions of Autoconf's --program-transform
arguments. Also make it clear that the message is coming from Open
MPI logic, so that we don't blame Autoconf, Red Hat, or anyone else
next time!
This commit was SVN r24024.
The following Trac tickets were found above:
Ticket 2611 --> https://svn.open-mpi.org/trac/ompi/ticket/2611
1. create DLL A, export symbols of A, import nothing (A normally is OPAL)
should define _USRDLL , A_EXPORT
2. create DLL B, export symbols of B, import A.lib (B could be ORTE, OMPI or other ompi tools)
should define _USRDLL, B_EXPORT
3. create DLL C, import B.dll (C could be external libs or apps)
should define B_IMPORT
This commit was SVN r24016.
libevent creates its event-config.h during "make all" (vs. during
configure). The prior method around this didn't work because it wrote
an event-config.h.in in the source tree -- a Bad Idea(tm). The new
way uses AC_CONFIG_COMMANDS to get stuff executed at the end of
config.status to create event-config.h. This seems to work properly
during make distcheck.
This commit was SVN r23975.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.
Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.
This commit was SVN r23966.
After talking with Brian, we're pretty sure that this is only because
really, really old libevent didn't allow bitwise or-ing of the other
loop types, because what we really need is (EVLOOP_ONCE |
EVLOOP_NONBLOCK). And that's what EVLOOP_ONELOOP did (i.e., we
changed the logic of libevent's event.c to let ONELOOP do both ONCE
and NONBLOCK things).
In the new libevent version, we didn't implement EVLOOP_ONELOOP
properly. As a result, we got hangs in the SM BTL add_procs
function. Note that the SM BTL wasn't to blame -- it was purely a
side-effect of bad ONELOOP integration (i.e., if you got past the SM
BTL add_procs, you may well have hung somewhere else).
This commit removes all ONELOOP customizations from event.c and
returns it to (almost) its original state from the libevent 2.0.7-rc
distribution. Everywhere in the code base where we used ONELOOP, we
now use (ONCE | NONBLOCK).
This commit was SVN r23957.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.
This commit was SVN r23943.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.
Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.
Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.
I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:
1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)
2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.
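To see why a zeroed object is not the same as a constructed one, consider this toy example. The `toy_event_t` type and its functions are hypothetical; the real fix is to use OBJ_NEW or OBJ_CONSTRUCT, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object whose constructor installs required state. */
typedef struct toy_event {
    int (*fire)(struct toy_event *ev);   /* installed by the constructor */
    int fired;
} toy_event_t;

static int toy_event_fire_impl(toy_event_t *ev)
{
    return ++ev->fired;
}

/* Constructor: the only place where the function pointer gets set. */
void toy_event_construct(toy_event_t *ev)
{
    assert(ev != NULL);
    ev->fire = toy_event_fire_impl;
    ev->fired = 0;
}

/* Returns -1 for an object that was never constructed. Without this
 * guard, calling through the NULL pointer would typically segfault. */
int toy_event_fire(toy_event_t *ev)
{
    assert(ev != NULL);
    if (ev->fire == NULL) {
        return -1;
    }
    return ev->fire(ev);
}
```

After `memset(&ev, 0, sizeof(ev))` the `fire` pointer is still unset, so the object is unusable until `toy_event_construct(&ev)` runs.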
There is also the issue of the new libevent "no recursion" behavior. As I described in a recent email, we will have to discuss this and figure out what, if anything, we need to do.
This commit was SVN r23925.
* Update to be safe for AC 2.68 by using AC_LINK_IFELSE instead of
AC_TRY_LINK
* If enable visibility was used, ensure we fail if the compiler
doesn't support it
* Rename OMPI_CHECK_VISIBILITY -> OPAL_CHECK_VISIBILITY (and all
internal variables)
This commit was SVN r23923.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.
Please note that configure requirements on components HAVE CHANGED. For example, a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.
This commit was SVN r23764.
All interface APIs for accessing the info remain unchanged in opal/util/if.c.
This has been tested on Mac, Linux, and NetBSD. Nobody else seemed interested in testing it, so there may be some future problems revealed as people try it on other OSs.
This commit was SVN r23743.
do the right thing. They now properly return
the value after the update. This also fixes all warnings
reported by the Sun Studio compiler. George provided the
new assembly routines. I added some configure code to make
sure the compilers could handle it.
This fixes trac:2560.
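Assuming the routines in question are atomic add/subtract operations, "return the value after the update" can be illustrated with C11 atomics. The `toy_atomic_*` names are hypothetical, and this sketch uses <stdatomic.h> rather than the hand-written assembly the commit refers to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* atomic_fetch_add() returns the value held BEFORE the addition, so the
 * post-update value is that result plus delta. */
int32_t toy_atomic_add_32(_Atomic int32_t *addr, int32_t delta)
{
    assert(addr != NULL);
    return atomic_fetch_add(addr, delta) + delta;
}

/* Same idea for subtraction: old value minus delta is the new value. */
int32_t toy_atomic_sub_32(_Atomic int32_t *addr, int32_t delta)
{
    assert(addr != NULL);
    return atomic_fetch_sub(addr, delta) - delta;
}
```

Returning the old value by mistake is easy to miss in testing, because the stored value is still updated correctly.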
This commit was SVN r23721.
The following Trac tickets were found above:
Ticket 2560 --> https://svn.open-mpi.org/trac/ompi/ticket/2560
administrator can specify compiler flags that get
inserted into the command before the user's flags.
These flags can be specified at configure time.
Reviewed by Jeff Squyres.
This fixes ticket #2474.
This commit was SVN r23709.
- Add one instance where we do not use a parameter in a function
- Fix a buglet in commit r23689, where the attribute-for-function ptrs
was applied.
This commit was SVN r23690.
The following SVN revision numbers were found above:
r23689 --> open-mpi/ompi@5eb571c458
be tested on function pointers and assigned accordingly,
instead of using the pre-processor in the header files.
A functional change is (re-) specifying __opal_attribute_noreturn__
on orte_errmgr_base_abort(): All modules in the errmgr framework
either use this function, or define their own abort function,
which sets __opal_attribute_noreturn__.
This attribute was taken out with the errmgr overhaul in r22872.
This commit was SVN r23689.
The following SVN revision numbers were found above:
r22872 --> open-mpi/ompi@e4f2d03d28
assigned to function-declarations.
Check this case and mark the currently only case existing in trunk.
Thanks to Paul Hargrove for bringing this up.
Let's test the svn commit msg CMR:v1.5
This commit was SVN r23676.
otherwise we get swamped with warnings by gcc, everywhere header is
included.
- Remove redundant declaration of opal_datatype_safeguard_pointer_debug_breakpoint
Check whether CMR:v1.5 works
This commit was SVN r23674.
one of them in as it may still be needed on Solaris.
This fixes trac:2530.
This commit was SVN r23626.
The following SVN revision numbers were found above:
r22638 --> open-mpi/ompi@2a4b1227d9
The following Trac tickets were found above:
Ticket 2530 --> https://svn.open-mpi.org/trac/ompi/ticket/2530
a warning itself (when another warning is generated within the file),
which can be rather annoying.
Therefore check for output regarding this unrecognized warning.
This commit was SVN r23624.
linking against libibverbs on Solaris.
Sorry for the mid-day configure change folks; I meant to commit this
last night and forgot. :-(
This commit was SVN r23606.
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php
Documentation:
http://osl.iu.edu/research/ft/
Major Changes:
--------------
* Added C/R-enabled Debugging support.
Enabled with the --enable-crdebug flag. See the following website for more information:
http://osl.iu.edu/research/ft/crdebug/
* Added Stable Storage (SStore) framework for checkpoint storage
* 'central' component does a direct to central storage save
* 'stage' component stages checkpoints to central storage while the application continues execution.
* 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress)
* 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching)
* Added Compression (compress) framework to support
* Add two new ErrMgr recovery policies
* {{{crmig}}} C/R Process Migration
* {{{autor}}} C/R Automatic Recovery
* Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component
* Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option)
* {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342)
* {{{OMPI_CR_Restart}}}
* {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules)
* {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192)
* {{{OMPI_CR_Quiesce_start}}}
* {{{OMPI_CR_Quiesce_checkpoint}}}
* {{{OMPI_CR_Quiesce_end}}}
* {{{OMPI_CR_self_register_checkpoint_callback}}}
* {{{OMPI_CR_self_register_restart_callback}}}
* {{{OMPI_CR_self_register_continue_callback}}}
* The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future.
* Add a progress meter to:
* FileM rsh (filem_rsh_process_meter)
* SnapC full (snapc_full_progress_meter)
* SStore stage (sstore_stage_progress_meter)
* Added 2 new command line options to ompi-restart
* --showme : Display the full command line that would have been exec'ed.
* --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413)
* Deprecated some MCA params:
* crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir
* snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir
* snapc_base_global_shared deprecated, use sstore_stage_global_is_shared
* snapc_base_store_in_place deprecated, replaced with different components of SStore
* snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref
* snapc_base_establish_global_snapshot_dir deprecated, never well supported
* snapc_full_skip_filem deprecated, use sstore_stage_skip_filem
Minor Changes:
--------------
* Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing.
* Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components
* Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it.
* Fixes trac:2208 : Honor various TMPDIR variables instead of forcing {{{/tmp}}}
* Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set.
* opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality.
* Cleanup the CRS framework and components to work with the SStore framework.
* Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably).
* Add 'quiesce' hook to CRCP for a future enhancement.
* We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}.
* Add optional application level INC callbacks (registered through the CR MPI Ext interface).
* Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive.
* {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked.
* {{{opal-restart}}} also supports local decompression before restarting
* {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata
* {{{orte-restart}}} now uses the SStore framework to work with the metadata
* Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality.
* Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}.
* Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped.
* Make sure to decrement the number of 'num_local_procs' in the orted when one goes away.
* odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options.
* Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities.
* Improve the checks for 'already checkpointing' error path.
* Add a recovery output timer, to show how long it takes to restart a job
* Do a better job of cleaning up the old session directory on restart.
* Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment)
* Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize.
This commit was SVN r23587.
The following Trac tickets were found above:
Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
orte_local_chip_type and orte_local_chip_model in MPI processes if the
appropriate sysinfo module found the values on the machine.
This commit was SVN r23581.
* Fix a configure warning for checking --enable-ft-thread
* In hnp and orted ErrMgr components check to see if other components have already recovered this process before trying to recover it again.
* Fix 'npernode' for restarting using the resilient rmaps component
* export ompi_info_set, so that internal functionality can use it.
This commit was SVN r23535.
error codes correctly again. Also fix a typo.
Reviewed by Jeff Squyres.
This commit was SVN r23531.
The following SVN revision numbers were found above:
r23463 --> open-mpi/ompi@2af3e6e5ae
substantive changes in this commit; the rest are minor style changes:
1. Change an OBJ_NEW(opal_list_item_t) to OBJ_NEW(opal_if_t). This
was causing memory corruption in the BSD code paths.
1. Move some local variables from the top of opal_if_init() to inside
the non-BSD code paths so that we avoid bunches of warnings about
unused variables when compiling on BSD. In doing so, I indented
the whole non-BSD section one level deeper, making the commit look
huge.
I also added a few {} around 1-line blocks, added some spaces, broke a
few lines, re-formatted a few comments, ...etc. Trivial stuff.
This commit was SVN r23501.
logic (even though the "else" clause for handling it was there). This
commit puts back the specific check for the word "external".
Thanks to Jed Brown for noticing the issue. Fixes trac:2503.
This commit was SVN r23475.
The following Trac tickets were found above:
Ticket 2503 --> https://svn.open-mpi.org/trac/ompi/ticket/2503
file descriptor (i.e., read and write complete messages, transparently
handling partial reads/writes, EAGAIN, and EINTR).
This code effectively already exists in a few places in the code base;
this is mainly a consolidation.
This commit was SVN r23450.
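For illustration, the "complete message" behavior described in r23450 can be sketched as below. This is a minimal sketch, not the consolidated OPAL code: the demo_* names and the 0/-1 return convention are assumptions made for the example, and a real implementation would probably wait for readiness on EAGAIN instead of spinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes to fd.
 * Short writes are continued; EINTR and EAGAIN are retried.
 * Returns 0 on success, -1 on any other error (errno is set). */
int demo_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(NULL != buf || 0 == len);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (EINTR == errno || EAGAIN == errno) {
                continue;           /* transient: try again (this spins) */
            }
            return -1;
        }
        p += n;                     /* partial write: advance and loop */
        len -= (size_t) n;
    }
    return 0;
}

/* Hypothetical helper: read exactly len bytes from fd.
 * Returns 0 on success, -1 on error or if EOF arrives early. */
int demo_read_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    assert(NULL != buf || 0 == len);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (EINTR == errno || EAGAIN == errno) {
                continue;
            }
            return -1;
        }
        if (0 == n) {
            return -1;              /* EOF before the message was complete */
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```

A caller would use these in place of bare read()/write() whenever a fixed-size message has to arrive whole.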
not using in Open MPI (i.e., that stuff is only used in the standalone
builds of hwloc -- it's not compiled/installed/used by Open MPI).
This commit was SVN r23416.
variable was set, it was prefixed to ''all'' values in the wrapper
compiler data text files. For example, if OPAL_DESTDIR was set to
/tmp/bogus and a wrapper compiler data file contained the line:
preprocessor_flags=-pthread
The value would be expanded to:
/tmp/bogus/-pthread
Which is clearly wrong. After some back-and-forth with Ralph and
Brian, Brian submitted this patch that fixes the problem. Now we
handle three cases properly (assume that configure was invoked with
--prefix=/opt/openmpi and no other directory specifications, and
$OPAL_DESTDIR is set to /tmp/buildroot):
1. Individual directories, such as libdir. These need to be prepended
with DESTDIR. I.e., return /tmp/buildroot/opt/openmpi/lib.
2. Compiler flags that have ${FIELD} values embedded in them. For
example, consider if a wrapper compiler data file contains the
line:
preprocessor_flags=-DMYFLAG="${prefix}/share/randomthingy/"
The value we should return is:
-DMYFLAG="/tmp/buildroot/opt/openmpi/share/randomthingy/"
3. Compiler flags that do not have any ${FIELD} values. For example,
consider if a wrapper compiler data file contains the line:
preprocessor_flags=-pthread
The value we should return is:
-pthread
Note, too, that this OPAL_DESTDIR futzing only needs to occur during
opal_init(). By the time opal_init() has completed, all values should
be substituted in that need substituting. Hence, we take an extra
parameter (is_setup) to know whether we should do this futzing or
not.
This commit was SVN r23402.
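The three cases listed in r23402 can be sketched roughly as follows. This is only an illustration under stated assumptions: demo_expand is a hypothetical name, only the ${prefix} field is handled, and the real code paths (including the is_setup parameter) are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: copy value into out, replacing every "${prefix}"
 * with destdir followed by prefix.  Text that contains no "${prefix}"
 * (e.g. "-pthread") is copied unchanged, so DESTDIR is never glued onto
 * a plain compiler flag.
 * Returns 0 on success, -1 if out is too small. */
int demo_expand(const char *value, const char *prefix,
                const char *destdir, char *out, size_t outlen)
{
    static const char field[] = "${prefix}";
    const size_t flen = sizeof(field) - 1;
    size_t used = 0;

    assert(NULL != value && NULL != prefix && NULL != destdir && NULL != out);
    if (0 == outlen) {
        return -1;
    }
    out[0] = '\0';
    while ('\0' != *value) {
        int n;
        if (0 == strncmp(value, field, flen)) {
            /* a substituted directory: this is where DESTDIR belongs */
            n = snprintf(out + used, outlen - used, "%s%s", destdir, prefix);
            value += flen;
        } else {
            n = snprintf(out + used, outlen - used, "%c", *value);
            value += 1;
        }
        if (n < 0 || (size_t) n >= outlen - used) {
            return -1;              /* output buffer exhausted */
        }
        used += (size_t) n;
    }
    return 0;
}
```

With prefix "/opt/openmpi" and destdir "/tmp/buildroot", the inputs "${prefix}/lib", -DMYFLAG="${prefix}/share/randomthingy/" and "-pthread" correspond to cases 1, 2, and 3 above.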
#define CACHE_LINE_SIZE to 128. This name has a conflict on NetBSD,
and it seems kinda odd to have a header file that ''only'' defines a
single value. Also, we'll soon be raising hwloc to be a first-class
item, so having this file around seemed kinda weird.
Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an
int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the
rationale that we can fill this in at runtime with hwloc info (trunk
and v1.5/beyond, only). The only place we ''needed'' a compile-time
CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a
new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value
(128). That use isn't suitable for run-time hwloc information,
anyway.
This commit was SVN r23349.
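To make the compile-time vs. run-time distinction in r23349 concrete, here is a small sketch. All names are hypothetical (the commit only says the new macro has a BTL_SM_ prefix and the value 128), and the run-time variable here is merely initialized to a default rather than filled in from hwloc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the BTL_SM_-prefixed macro: struct padding
 * needs an integer constant expression, so it cannot be a run-time int. */
#define DEMO_SM_CACHE_LINE_SIZE 128

/* Hypothetical stand-in for the run-time value: starts at a default and
 * could be overwritten during init with topology information. */
int demo_cache_line_size = 128;

/* Compile-time use: pad a shared counter out to a full cache line. */
struct demo_sm_counter {
    volatile long value;
    char pad[DEMO_SM_CACHE_LINE_SIZE - sizeof(long)];
};

/* Run-time use: round a byte count up to a multiple of the cache line. */
size_t demo_round_up_to_line(size_t nbytes)
{
    size_t line = (size_t) demo_cache_line_size;

    assert(line > 0);
    return ((nbytes + line - 1) / line) * line;
}
```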
platforms (e.g., PPC64 running RHEL 5.4) -- sometimes it only finds
PUs. So in that case, just run the same calculation, but with PUs
instead of cores.
This commit was SVN r23305.
rename OMPI_CHECK_ATTRIBUTES -> OPAL_CHECK_ATTRIBUTES, because it's in
OPAL (somehow that name must have gotten missed in the Great M4 split
of '10...?)
This commit was SVN r23267.
* Remove OPAL_ERR_PAFFINITY_NOT_SUPPORTED; fit it into the generic
OPAL_ERR_NOT_SUPPORTED case.
* When odls_default detects that processor affinity is not supported,
it prints a specific message about it, and then it suppresses a
generic HNP help message that would normally follow it (i.e., it's
easier to have the "processor affinity is not supported" show_help
message last).
* Use some symbolic names in odls_default instead of fixed ints,
just for slight readability improvements in the code.
* Introduce orte_show_help_suppress(), which gives the ability to
suppress any future showings of any arbitrary show_help() message.
This is useful if you display message X and want to suppress
message Y. This suppression *only* works in environments where
orte_show_help() does coalescing.
This commit was SVN r23249.
distribution tarball, and would therefore cause automake to fail (in
case someone invokes autogen.sh on a distribution tarball).
This commit was SVN r23218.
* If < 0, it's an OPAL_ERR_* value
* If >= 0, it's the actual output value of the function
This is problematic for the OPAL_SOS stuff. This commit changes those
functions to always return OPAL_* statuses and send the output value
back through output parameters (like 95% of the rest of the code
base). This avoids the confusion with OPAL_SOS stuff and makes
paffinity work again (e.g., mpirun --bind-to-core ...).
I updated all paffinity modules for the new function signatures, and
bumped the paffinity API version up to 2.0.1. I don't think the
version change will matter, though, because we'll be introducing
support for hardware threads soon, which will either bump the
paffinity version again or we'll replace paffinity with
a new framework.
This commit was SVN r23197.
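The signature change described in r23197 follows a common pattern, sketched below. The function, the constants, and the "four cores per socket" arithmetic are all made up for the example; they are not the paffinity API.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative status codes, not the real OPAL values. */
#define DEMO_SUCCESS         0
#define DEMO_ERR_BAD_PARAM (-5)

/* Old style: one int carries both meanings.
 * < 0 is an error code, >= 0 is the answer. */
int demo_first_core_old(int socket)
{
    if (socket < 0) {
        return DEMO_ERR_BAD_PARAM;
    }
    return socket * 4;              /* pretend: four cores per socket */
}

/* New style: the return value is always a status, and the answer goes
 * back through an output parameter.  The caller never has to guess
 * whether a number is a result or an (encoded) error. */
int demo_first_core(int socket, int *first_core)
{
    if (socket < 0 || NULL == first_core) {
        return DEMO_ERR_BAD_PARAM;
    }
    *first_core = socket * 4;
    return DEMO_SUCCESS;
}
```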
The fix is to just check if the return value is positive or not, since all the SOS encoded errors are *always* negative.
The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index as a pointer.
This commit was SVN r23173.
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
back the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
This commit was SVN r23162.
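Why (OPAL_ERROR == ret) stops working once errors are encoded, while (OPAL_SUCCESS != ret) keeps working, can be shown with a toy encoding. The bit layout below is invented purely for the illustration and is not the OPAL SOS encoding; the demo_* names are hypothetical too. The only properties borrowed from the text above are that success is preserved and that encoded errors are always negative.

```c
#include <assert.h>

/* Illustrative values, not the real OPAL constants. */
#define DEMO_SUCCESS  0
#define DEMO_ERROR  (-1)

/* Toy encoding: pack an extra tag above the low 8 bits of the native
 * (negative) error code.  Success stays 0; errors stay negative. */
int demo_encode(int native, int tag)
{
    assert(native <= 0 && native > -256);
    assert(tag >= 0 && tag < 1024);
    if (DEMO_SUCCESS == native) {
        return DEMO_SUCCESS;
    }
    return -((tag << 8) | (-native));
}

/* Toy decoder: recover the native error code from an encoded value. */
int demo_get_err_code(int encoded)
{
    if (encoded >= 0) {
        return encoded;
    }
    return -((-encoded) & 0xff);
}
```

With this toy scheme, demo_encode(DEMO_ERROR, 7) is negative but not equal to DEMO_ERROR, so a direct comparison against a specific error needs the decoder, whereas a comparison against success does not.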
The OPAL SOS framework tries to meet the following objectives:
* reduce the cascading error messages and the amount of code needed to print an error message.
* build and aggregate stacks of encountered errors and associate related individual errors with each other.
* allow registration of custom callbacks to intercept error events.
For more information, refer to
https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
This commit was SVN r23158.
I forgot to mention one more thing in the r23152 commit message:
* Copy the fix for hwloc's m4 to disable the configure flag
--enable-debug when building in embedding mode, because it can be
hijacked by the outer-level application. In this case, if you
configured OMPI with --enable-debug (or have --enable-debug in a
platform file), you'd see all of hwloc's debug output. Ick. hwloc
1.0 will include this fix.
This commit was SVN r23153.
The following SVN revision numbers were found above:
r23152 --> open-mpi/ompi@ca3362021e
* Fix disabling hwloc build (i.e., put the AM_CONDITIONALs where they
belong in the configure.m4 file)
* Update some svn:ignores
* r23142 removed some extraneous code, but forgot to remove the
variables used only by that code
This commit was SVN r23152.
The following SVN revision numbers were found above:
r23142 --> open-mpi/ompi@610fc67d12
supports a wide variety of operating systems and platforms; see the
opal/mca/paffinity/hwloc/hwloc/README file for details.
This component includes an embedded copy of hwloc, currently based on
hwloc-1.0rc6. But note that hwloc is properly SVN imported into the
/vendor branch, so it will be easy to update when 1.0 GA is released.
Note that the hwloc tree embedded in opal/mca/paffinity/hwloc/hwloc is
identical to a hwloc distribution tarball, except that much of the
documentation was rm -rf'ed (because we don't need it for the embedded
case).
Since the paffinity framework currently does not understand hardware
threads, the hwloc component compensates for this by identifying cores
by the "first" hardware thread on that core. Hopefully we'll update
paffinity someday to understand hardware threads. :-)
configure grew a --with-hwloc option, analogous to what we do for many
other external libraries that OMPI supports. However, there's a new
feature: due to the request of several distros, OMPI can be configured
to build with its internal copy of hwloc or with an external copy of
hwloc (e.g., a system-installed hwloc).
1. If --with-hwloc is not specified, Open MPI will try to use its
internal copy (but silently fail/ignore hwloc if that fails).
2. If --with-hwloc=<dir> is supplied, Open MPI looks for hwloc
support in <dir> (and --with-hwloc-libdir=<dir>, if specified).
3. If --with-hwloc=external is supplied, Open MPI will look for hwloc
in a compiler/linker default external location.
4. If --with-hwloc=internal is supplied, Open MPI will use its
internal copy of hwloc.
Some of OMPI's main configury had to be slightly re-arranged in the
bootstrapping phase to accommodate hwloc's configury needs.
This commit was SVN r23125.
http://marc.info/?l=linux-mm-commits&m=127352503417787&w=2 for more
details.
* Remove the ptmalloc memory component; replace it with a new "linux"
memory component.
* The linux memory component will conditionally compile in support
for ummunotify. At run-time, if it has ummunotify support and
finds run-time support for ummunotify (i.e., /dev/ummunotify), it
uses it. If not, it tries to use ptmalloc via the glibc memory
hooks.
* Add some more API functions to the memory framework to accommodate
the ummunotify model (i.e., poll to see if memory has "changed").
* Add appropriate calls in the rcache to the new memory APIs to see
if memory has changed, and to react accordingly.
* Add a few comments in the openib BTL to indicate why we don't need
to notify the OPAL memory framework about specific instances of
registered memory.
* Add dummy API calls in the solaris malloc component (since it
doesn't have polling/"did memory change" support).
This commit was SVN r23113.
It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected.
Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages.
This commit was SVN r23106.
and opal_atomic_lifo_pop. Adds memory barriers to remove the race
condition.
This commit was SVN r23014.
The following Trac tickets were found above:
Ticket 2355 --> https://svn.open-mpi.org/trac/ompi/ticket/2355
done this way a long time ago for the "gee whiz!" factor -- when in
reality, they really only need one-of-many-run-time priority
selection).
Changed run-time priorities to be as follows:
* darwin: 20
* linux: 20
* posix: 10
* solaris: 30
* test: 5
* windows: 20
I have a very dim (possibly untrue) recollection that Solaris needs to
have a higher priority than others just to ensure that no other is
chosen under Solaris. Make all other "native" components have a
priority of 20 (they shouldn't conflict with each other). Make the
posix fallback component have a priority of 10. Make the test
component priority 5, meaning someone can always select it, but you
can also make a "never select me" component that prioritizes itself
under test.
This commit was SVN r22997.
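The one-of-many run-time selection that these priorities feed into boils down to "highest priority among the components that can run here". A minimal sketch, assuming a hypothetical demo_component record (the real MCA selection code is more involved):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical component record; not the MCA component struct. */
struct demo_component {
    const char *name;
    int priority;
    bool available;     /* can this component run on this platform? */
};

/* Return the available component with the highest priority, or NULL
 * if none is available.  On a tie the earlier entry wins. */
const struct demo_component *demo_select(const struct demo_component *c,
                                         size_t n)
{
    const struct demo_component *best = NULL;

    assert(NULL != c || 0 == n);
    for (size_t i = 0; i < n; ++i) {
        if (c[i].available &&
            (NULL == best || c[i].priority > best->priority)) {
            best = &c[i];
        }
    }
    return best;
}
```

With the table above, a Linux box (linux, posix, and test available) ends up with linux at 20, and posix at 10 is only chosen when no native component is available.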
modify the OPAL_PAFFINITY_PROCESS_IS_BOUND macro to search the cpuset for
the maximum possible number of cpus rather than just the number of cpus
currently online. This corrects a problem where mpi_paffinity_alone was
not working properly on systems in which there can be cpu namespaces with
holes, such as on ppc64 with smt off (as discussed in #2365).
This commit was SVN r22927.
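The "holes in the cpu namespace" problem from r22927 is easy to reproduce with a plain bitmask. The sketch below is not the OPAL_PAFFINITY_PROCESS_IS_BOUND macro; the names, the 64-CPU limit, and the exact "bound" rule (at least one CPU allowed, but fewer than are online) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_CPUS 64    /* hypothetical upper bound on CPU ids */

/* Count the CPUs set in mask among ids [0, limit). */
int demo_count_set(uint64_t mask, int limit)
{
    int count = 0;

    assert(limit >= 0 && limit <= DEMO_MAX_CPUS);
    for (int i = 0; i < limit; ++i) {
        if (mask & (UINT64_C(1) << i)) {
            ++count;
        }
    }
    return count;
}

/* Treat a process as bound if it may run on at least one CPU but on
 * fewer CPUs than are online.  scan_limit is how many CPU ids to look
 * at: passing num_online reproduces the old behavior, passing
 * DEMO_MAX_CPUS the corrected one. */
bool demo_is_bound(uint64_t allowed, int num_online, int scan_limit)
{
    int n = demo_count_set(allowed, scan_limit);

    return n > 0 && n < num_online;
}
```

Take four online CPUs with ids 0, 4, 8, and 12 (as can happen with SMT off) and a process bound to CPU 8: scanning only ids 0..3 finds nothing set, so the process looks unbound, while scanning all possible ids finds the binding.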
#if directives -- had to change a pair of #if conditionals in
opal/util/stacktrace.c to make the PGI compiler accept it.
This commit was SVN r22923.
The following Trac tickets were found above:
Ticket 2366 --> https://svn.open-mpi.org/trac/ompi/ticket/2366
1. fix a bug that caused an infinite loop in configure when specifying want-ft but not want-ft-thread by removing a stale reference to the opal-progress-thread option
2. add want-ft=orcm so we can build the orcm errmgr component
3. cleanup the use of "ompi_want_ft_xxx" and replace it with "opal_want_ft_xxx" so that naming conventions are preserved
This commit was SVN r22885.
Adds memory barriers to remove a race condition that can
occur on PowerPC architectures (and probably others).
This commit was SVN r22880.
The following Trac tickets were found above:
Ticket 2355 --> https://svn.open-mpi.org/trac/ompi/ticket/2355
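As a rough picture of where barriers matter in a lock-free LIFO, here is a sketch written with C11 atomics. It is not the opal_atomic_lifo code (which uses OPAL's own atomic primitives), every name is hypothetical, and the pop side deliberately ignores the ABA problem to keep the example short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical item and list types. */
struct demo_item {
    struct demo_item *next;
    int payload;
};

struct demo_lifo {
    _Atomic(struct demo_item *) head;
};

void demo_lifo_push(struct demo_lifo *lifo, struct demo_item *item)
{
    struct demo_item *old;

    assert(NULL != lifo && NULL != item);
    old = atomic_load_explicit(&lifo->head, memory_order_relaxed);
    do {
        item->next = old;
        /* Release ordering on success: the stores to item->next and
         * item->payload must be visible before the item is reachable
         * from head.  Without a barrier here, a weakly ordered CPU may
         * publish the item first. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &lifo->head, &old, item,
                 memory_order_release, memory_order_relaxed));
}

struct demo_item *demo_lifo_pop(struct demo_lifo *lifo)
{
    struct demo_item *old;

    assert(NULL != lifo);
    /* Acquire ordering pairs with the release in push, so old->next is
     * read only after the pusher's stores are visible. */
    old = atomic_load_explicit(&lifo->head, memory_order_acquire);
    while (NULL != old &&
           !atomic_compare_exchange_weak_explicit(
               &lifo->head, &old, old->next,
               memory_order_acquire, memory_order_acquire)) {
        /* a failed exchange reloaded old; just retry */
    }
    return old;     /* NOTE: no ABA protection in this sketch */
}
```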
#2322.
The short version is that this patch consolidates two pieces of code
that call the back-end munmap and ensures that (if dlsym is used) the
corresponding dlsym is only invoked once and that the variable holding
the result is volatile.
This commit was SVN r22863.
The following Trac tickets were found above:
Ticket 2104 --> https://svn.open-mpi.org/trac/ompi/ticket/2104
Remove the --enable-progress-threads option as this is no longer functional, and hardcode OPAL_ENABLE_PROGRESS_THREADS to 0.
Replace the --enable-mpi-threads option with --enable-mpi-thread-multiple as this is clearer as to meaning. This option automatically turns "on" opal thread support if it wasn't already so specified. If the user specifies --disable-opal-multi-threads --enable-mpi-thread-multiple, we will error out with a message.
Add a new --enable-opal-multi-threads option that turns "on" opal thread support without doing anything wrt mpi-thread-multiple.
This commit was SVN r22841.
Aleksej Saushev.
Don't use bash or bashisms in shell scripts.
We should use POSIX's setpgid(0,0), which is equivalent to setpgrp().
This commit was SVN r22829.
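A quick way to see what setpgid(0,0) does: a freshly forked child calls it and then checks that its process group id equals its own pid. The helper name below is hypothetical; the fork is there only so the calling process keeps its own process group.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that makes itself a process group
 * leader with setpgid(0, 0).
 * Returns 0 if the child's pgid equals its pid afterwards, a positive
 * value if the check failed in the child, and -1 on fork/wait errors. */
int demo_child_becomes_group_leader(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (0 == pid) {
        /* child: pid 0 means "me", pgid 0 means "use my own pid" */
        if (0 != setpgid(0, 0)) {
            _exit(2);
        }
        _exit(getpgrp() == getpid() ? 0 : 1);
    }
    assert(pid > 0);
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```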
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.
The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.
This commit was SVN r22824.
bug: libmpi_f90 had libmpi.la in its LIBADD instead of libmpi_f77.la.
Fixes trac:2244.
This commit was SVN r22704.
The following Trac tickets were found above:
Ticket 2244 --> https://svn.open-mpi.org/trac/ompi/ticket/2244
discussed extensively. See
https://svn.open-mpi.org/trac/ompi/ticket/2092 and the RFC thread
http://www.open-mpi.org/community/lists/devel/2010/02/7447.php.
Specifically:
* Create LT convenience libraries for OPAL and ORTE if the layer
above them is being created (use the already-defined
AM_CONDITIONALs to know if the project above us is being built).
* ORTE slurps in the LT convenience library for OPAL; OMPI slurps in
the LT convenience library for ORTE.
* Wrapper compilers now only -l one library (e.g., ortecc only does
-lopen-rte, and mpicc only does -lmpi).
This commit was SVN r22691.
discussion on the users list (see
http://www.open-mpi.org/community/lists/users/2009/12/11526.php).
Many thanks to Kevin Buckley who did most of the coding work, and to
Aleksej Saushev for his extreme patience in waiting for me to review
and commit this stuff.
This commit was SVN r22640.
If file does not exist, check the directory it lives in...
May be used by a caller that is trying to mmap() a file on NFS, Lustre, or
Panasas (thanks Sam).
For now, this is used to warn about the usage of mmap on such filesystems.
Please note that Ralph mentioned the orte_no_session_dir parameter.
The help message includes a reference to this.
Tested on NFS and Lustre on Linux on
smoky: mpirun --mca orte_tmpdir_base $HOME/tmp -np 2 ./mpi_stub
jaguar: mpirun ... --mca orte_tmpdir_base /tmp/work/$USER ...
Fixes trac:1354
This should cmr:v1.5 once it has soaked and is shown to work on
Solaris
This commit was SVN r22604.
The following Trac tickets were found above:
Ticket 1354 --> https://svn.open-mpi.org/trac/ompi/ticket/1354
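The "if the file does not exist, check the directory it lives in" step from r22604 might look like the sketch below. The function name is hypothetical, and the actual filesystem-type test (e.g. via statfs) is left out; this only picks which path to hand to such a test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: decide which path to probe for its filesystem
 * type.  If path exists, probe it; if it does not exist yet, probe the
 * directory it would live in (one level up only).
 * Returns 0 on success, -1 if stat() failed for a reason other than
 * ENOENT or a buffer is too small. */
int demo_path_to_probe(const char *path, char *out, size_t outlen)
{
    struct stat sb;
    char copy[4096];
    const char *chosen = path;
    int n;

    assert(NULL != path && NULL != out);
    if (0 != stat(path, &sb)) {
        if (ENOENT != errno) {
            return -1;
        }
        if (strlen(path) >= sizeof(copy)) {
            return -1;
        }
        strcpy(copy, path);         /* dirname() may modify its argument */
        chosen = dirname(copy);
    }
    n = snprintf(out, outlen, "%s", chosen);
    if (n < 0 || (size_t) n >= outlen) {
        return -1;
    }
    return 0;
}
```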
anything for non-MPI apps. Oops! (But before you freak out, gentle
reader, note that mpi_paffinity_alone for MPI apps still worked fine)
When we made the switchover somewhere in the 1.3 series to have the
orted's do processor binding, then stuff like:
mpirun --mca mpi_paffinity_alone 1 hostname
should have bound hostname to processor 0. But it didn't because of a
subtle startup ordering issue: the MCA param registration for
opal_paffinity_alone was in the paffinity base (vs. being in
opal/runtime/opal_params.c), but it didn't actually get registered
until after the global variable opal_paffinity_alone was checked to
see if we wanted old-style affinity bindings. Oops.
However, for MPI apps, even though the orted didn't do the binding,
ompi_mpi_init() would notice that opal_paffinity_alone was set, yet
the process didn't seem to be bound. So the MPI process would bind
itself (this was done to support the running-without-orteds
scenarios). Hence, MPI apps still obeyed mpi_paffinity_alone
semantics.
But note that the error described above caused the new mpirun switch
--report-bindings to not work with mpi_paffinity_alone=1, meaning that
the orted would not report the bindings when mpi_paffinity_alone was
set to 1 (it ''did'' correctly report bindings if you used
--bind-to-core or one of the other binding options).
This commit separates out the paffinity base MCA param registration
into a small function that can be called at the Right place during the
startup sequence.
This commit was SVN r22602.
Not having this check was causing distcheck errors on the OMPI
tarball-build machine because it's still a 32-bit-default machine, so
the evutil.c code was failing some #if conditionals (since it didn't
think it had strtoll available).
This commit was SVN r22577.
finding symbol pthread_atfork, e.g. cxx-test-suite.
Fixes trac:2088
cmr:v1.5:reviewer=jsquyres
This commit was SVN r22542.
The following Trac tickets were found above:
Ticket 2088 --> https://svn.open-mpi.org/trac/ompi/ticket/2088
* Don't build the pstat component if all defines needed aren't there.
* Update platform file to work better
* Work around two places that depended on modex being operational
This commit was SVN r22536.
after the compiler argv tokens.
Not closing #2201 yet; there's still discussion on that ticket about
whether we want to do more or not.
Refs trac:2201
cmr:v1.4.2
cmr:v1.5
This commit was SVN r22513.
The following Trac tickets were found above:
Ticket 2201 --> https://svn.open-mpi.org/trac/ompi/ticket/2201
Originally the patch was to improve the error message, but when digging into the code I found a subtle bug. If the daemon does not tell the HNP what CRS component it used, then the HNP tries to figure it out from the metadata (this is an uncommon case). The path the HNP used was not complete, so it was unable to find the metadata information. This patch fixes this by adding the 'snapshot_reference' to the 'snapshot_location' which completes the path for this search.
cmr:v1.4 (needs a custom patch)
cmr:v1.5
This commit was SVN r22479.
The following Trac tickets were found above:
Ticket 2190 --> https://svn.open-mpi.org/trac/ompi/ticket/2190
In CMake 2.6 and earlier, this function adds dependencies for targets and also links the target libraries automatically, but in CMake 2.8 this behavior has changed: it only adds the dependencies and does not link, which causes linking errors at compilation time.
This commit was SVN r22405.
party/"vendor" import, the changes are actually far smaller than the
size of this changeset implies. Here's a list of the changes:
* Update the AMD license header in plpa_map.c to be less restrictive
(see https://svn.open-mpi.org/trac/plpa/changeset/262 for details)
-- '''this is the most/only important change of this update.''' No
code is changed by this; only removing a clause from a license
header in plpa_map.c.
* Changes to the generated {{{configure}}}, {{{config.guess}}}, and
{{{config.sub}}} scripts (which aren't used by OMPI).
* soname version tracking changes (which also aren't used by OMPI;
they're only used when PLPA is built/installed in "standalone"
mode).
* Update the "get version" m4 (which was stolen from OMPI's m4 to
begin with, and is only used during OMPI's autogen.sh step).
* Update various PLPA version numbers to 1.3.2.
* Bug fix in plpa-taskset (which is not built in the OMPI PLPA build).
This commit was SVN r22367.
to Eugene, Jeff, and Briand for the help. This patch is supposed to
fix several outstanding issues, notably the one on ticket #2043.
This commit was SVN r22324.
:-)
Okay, cleanup the prior commit so that the default component search path shows in ompi_info, and remains available in component_find.
This commit was SVN r22278.
"my_perfect_path":SYSTEM_DEFAULT:USER_DEFAULT
and OPAL will substitute its internally derived values for the defaults (instead of forcing the user to figure them out).
This commit was SVN r22272.
friends also receive &argc and &argv (George asked Jeff to Ralph to
review before committing). The thought is that passing argv and argc
to opal/orte_init may be useful to other projects outside of OMPI that are
using OPAL and/or ORTE (especially in conjunction with some other
bootstrapping code where it is helpful to modify argv). It's such a
small thing that it's easy to apply here to make others' lives a
little easier.
Ask George for more details; I'm just the messenger. :-)
Judging by the copyrights on this patch, it's been around for a
while. :-)
This commit was SVN r22260.
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.
If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.
Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).
Adjust some platform files to enable these capabilities.
This commit was SVN r22244.
PARAM_WINDOWS_FILES is a mistake or not). Fixes trac:2079.
This commit was SVN r22171.
The following Trac tickets were found above:
Ticket 2079 --> https://svn.open-mpi.org/trac/ompi/ticket/2079
pass that on to callers of opal_cmd_line_make_opt_mca().
Thanks to Thomas Naughton III.
- Additionally, cmd-line parameters passed in table to opal_cmd_line_create()
may be wrong (e.g. OPAL_ERR_BAD_PARAM), which may be missed in the
loop.
This commit was SVN r22153.
Continue the reorganization of the configure system. Move files from the main config directory to their appropriate level-specific config directories. Modify the configure system to correctly handle compiler detection, test, and setup so that all things pertaining to opal and orte are done at the lower level, with the ompi configure system only looking at mpi-specific options.
Ensure the wrapper compilers for orte and ompi only get built when appropriate. Add support for c++ to the orte wrapper compilers, both script and non-script versions.
This commit was SVN r22138.
Re-enable "./autogen.sh -no-ompi" again. If you -no-ompi, the entire OMPI
configury is skipped and the entire ompi/ subtree is not built. There's
some simple m4-isms that prune out the relevant parts.
I added ompi/config/, orte/config/, and opal/config/ directories. I moved a
bunch of m4 files from the top-level config/ dir into ompi/config/, and a few
into orte/config/.
Note that all 3 <project>/config directories have a config_files.m4 file. This
file contains the AC_CONFIG_FILES list for that project. The AC_CONFIG_FILES
call cannot be in an AC_DEFUN macro and conditionally called -- if it is
included at all, Autoconf will process it. Hence, these config_files.m4 files
don't AC_DEFUN -- they just have AC_CONFIG_FILES. m4_ifdef() is used to
conditionally include the files or not.
I moved a bunch of obvious OMPI-only m4 files from config/ to ompi/config/,
but I'm sure that there's more that could go. A ticket will be filed with
thoughts on future work in this area.
This commit was SVN r22113.
'.', we should still find the executable - it is in a directory beneath us.
In other words, if someone gives us "foo/bar" instead of "./foo/bar", we should still be able to find bar
This commit was SVN r22110.
posix_memalign() will either return 0 or not, indicating success. And
if posix_memalign() fails, it's not always going to be due to
out-of-memory -- just return ERR_IN_ERRNO.
This commit was SVN r22070.
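The point of r22070 can be sketched as below: posix_memalign() reports failure through its return value (an error number such as EINVAL or ENOMEM), so "nonzero" should not be read as "out of memory". The wrapper name and the status constants are hypothetical, and copying the returned error number into errno is an assumption made here so that an "error is in errno" status has something to point at.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative status codes, not the real OPAL values. */
#define DEMO_SUCCESS        0
#define DEMO_ERR_IN_ERRNO (-11)

/* Hypothetical wrapper: allocate size bytes aligned to alignment.
 * On failure, *ptr is set to NULL, errno holds the error number that
 * posix_memalign() returned, and DEMO_ERR_IN_ERRNO is returned. */
int demo_aligned_alloc(void **ptr, size_t alignment, size_t size)
{
    int rc;

    assert(NULL != ptr);
    rc = posix_memalign(ptr, alignment, size);
    if (0 != rc) {
        errno = rc;         /* e.g. EINVAL for a bad alignment */
        *ptr = NULL;
        return DEMO_ERR_IN_ERRNO;
    }
    return DEMO_SUCCESS;
}
```

An alignment of 3, for example, is rejected with EINVAL even when plenty of memory is free.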