- Add signal handler BLCR register (helps with debugging)
- ifdef out the cr_request_file section for checkpointing self.
There is a bug with the 0.4.2 version of BLCR such that this
does not handle moving checkpoint files around.
I'm following up with the BLCR folks on this one (and checking
the newest release).
This commit was SVN r14069.
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
builds, so disable it there
* On 10.4.8 (and possibly others), siginfo is NULL in the signal
callback on 64 bit Intel builds, so account for that in the signal
callback.
This commit was SVN r14045.
- mca_base_param_file_prefix
(Default: NULL)
This is the full name of the "-am" mpirun option. Used to specify a ':'
separated list of AMCA parameter set files.
- mca_base_param_file_path
(Default: $SYSCONFDIR/amca-param-sets/:$CWD)
The path to search for AMCA files with relative paths. A warning will be
printed if the AMCA file cannot be found.
* Added a new function "mca_base_param_recache_files" that re-reads the file
configurations. This is used internally to help bootstrap the MCA system.
* Added a new orterun/mpirun command line option '-am' that is an alias for the
mca_base_param_file_prefix MCA parameter
* Exposed the opal_path_access function as it is generally useful in other
places in the code.
* New function "opal_cmd_line_make_opt_mca" which will allow you to append a
new command line option with MCA parameter identifiers to set at the same
time. Previously this could only be done at command line declaration time.
* Added a new directory under the $pkgdatadir named "amca-param-sets" where all
the 'shipped with' Open MPI AMCA parameter sets are placed. This is the first
place to search for AMCA sets with relative paths.
* An example.conf AMCA parameter set file is located in
contrib/amca-param-sets/.
* Jeff Squyres contributed an OpenIB AMCA set for benchmarking.
Note: You will need to autogen with this commit as it adds a configure param.
Sorry :(
This commit was SVN r13867.
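As a rough illustration of the lookup described in this commit, the sketch below (Python, not Open MPI's C code) resolves a ':'-separated list of AMCA file names against a ':'-separated search path. The function name `resolve_amca_files` and its return convention are hypothetical; only the separators and the warn-when-missing behavior come from the commit message.

```python
import os
import warnings


def resolve_amca_files(file_list, search_path):
    """Resolve each name in a ':'-separated list against a ':'-separated path.

    Hypothetical helper: absolute names are used as-is, relative names are
    tried in each search directory in order. Missing files produce a warning
    and are skipped.
    """
    resolved = []
    for name in filter(None, file_list.split(":")):
        if os.path.isabs(name):
            candidates = [name]
        else:
            candidates = [os.path.join(d, name)
                          for d in search_path.split(":") if d]
        # Take the first candidate that exists as a regular file.
        found = next((c for c in candidates if os.path.isfile(c)), None)
        if found is None:
            warnings.warn(f"AMCA parameter set file not found: {name}")
        else:
            resolved.append(found)
    return resolved
```

Directories earlier in the search path win, which matches the note that the shipped "amca-param-sets" directory is searched first.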
timers might natively return nanoseconds instead of microseconds, as is
the case on x86. Fixes an issue with really high shared memory latencies
on Intel macs
This commit was SVN r13038.
components that use configure.m4 for configuration or are always built.
The macro has not been needed since moving to configure types other than
configure.stub
Fixes trac:590
This commit was SVN r13031.
The following Trac tickets were found above:
Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
* Have darwin backtrace code return an error when buffer() is
called, since it is not implemented
* Print out hostname & pid when giving signal information
* If backtrace_buffer() is implemented, use that instead of
backtrace_print() and prefix stacktrace with the hostname
* Make the signal information printed be more user friendly
* If we're using the backtrace_buffer() code, don't print
the last two functions (which will be show_stackframe()
then backtrace_buffer()) so that users won't keep thinking
the error occurred inside Open MPI (sneaky, yes...)
Refs trac:538
This commit was SVN r12883.
The following Trac tickets were found above:
Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538
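The frame trimming and hostname prefix described in this commit can be sketched as follows. This is an illustrative Python model, not the Open MPI implementation: the function name `format_backtrace`, the output format, and the assumption that the innermost call is listed first (as with backtrace(3)) are all mine.

```python
def format_backtrace(frames, hostname, pid, skip=2):
    """Prefix each frame with host and pid, dropping the innermost frames.

    Assumes frames[0] is the innermost call, so the first `skip` entries are
    the reporting functions themselves rather than user code.
    """
    return [f"[{hostname}:{pid}] [{i}] {frame}"
            for i, frame in enumerate(frames[skip:])]
```

With `skip=2`, the two reporting frames (backtrace_buffer() and show_stackframe() in the commit message) never reach the user-visible trace.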
whether they are prefixed with a double underscore or not.
__DARWIN_UNIX03 is defined on Tiger when compiling PPC 64 code, so
we can't use that.
Refs trac:575
This commit was SVN r12418.
The following Trac tickets were found above:
Ticket 575 --> https://svn.open-mpi.org/trac/ompi/ticket/575
renamed the register fields in the thread state structures. Support compiling
with either the old or new names, keying off the UNIX03 define (which is what
the 10.5 headers do).
Refs trac:450
This commit was SVN r12285.
The following Trac tickets were found above:
Ticket 450 --> https://svn.open-mpi.org/trac/ompi/ticket/450
* Make sure to AC_SUBST the backtrace CFLAGS so that the right flags
are passed to the component (especially -m64)
* Properly open / close the component. This isn't strictly necessary
to fix the bug, but was an oversight that should be fixed.
This commit was SVN r11806.
The following Trac tickets were found above:
Ticket 405 --> https://svn.open-mpi.org/trac/ompi/ticket/405
* Use $31 instead of mnemonic zero for the gcc inline
assembly test, as the GNU assembler doesn't like
zero, but both Tru64 and GNU assembler should be fine
with $31
* Disable Linux timer component on Alpha. The CPU timer
rolls over every 10 seconds or less, so it's kinda
worthless for our needs.
* Fix some escaping issues when local functions are
denoted with a $
* Remove C++ comments from the Alpha assembly.
* Add base assembly code for the non-inlined functions
on Alpha
This commit was SVN r11764.
The same treatment will happen on all sub-projects. The .h files
have to be C++ compatible, and all symbols with external visibility
have to get the {PROJECT}_DECLSPEC in front of the prototype.
This commit was SVN r11340.
range of OS friendly path management functions, such as opal_basename
and opal_dirname. They should always be used instead of basename and
dirname. There are several functions which allow us to create paths
that are compatible with the OS.
The OPAL_ENV_SEP define should be used (instead of ':') when an
environment variable is split.
This commit was SVN r11336.
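To see why a hard-coded ':' is a problem, here is a small Python sketch of splitting a list-valued environment variable with a caller-supplied separator. The helper name `split_env_list` is hypothetical, and the example assumes the Windows list separator is ';' (as it is for PATH); Python's own `os.pathsep` plays a role similar to OPAL_ENV_SEP.

```python
def split_env_list(value, sep):
    """Split a list-valued environment variable, dropping empty entries."""
    return [item for item in value.split(sep) if item]


# With the platform separator, Windows drive letters survive intact.
windows_ok = split_env_list(r"C:\bin;D:\tools", ";")

# With a hard-coded ':', the drive letter is split away from its path.
windows_broken = split_env_list(r"C:\bin", ":")
```

The second call returns the drive letter and the rest of the path as two separate entries, which is the failure the define is meant to prevent.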
Windows version for the libevent. The one they provide is more than
inappropriate for what we need (not to mention that
the code is just plain wrong).
This commit was SVN r11329.
environment variables.
- The HOME on Windows is called USERPROFILE.
- C++ compilers (at least on Windows) require explicit casts. Even going
through a void* does not help.
- Cleanup the Windows file name management.
- Always use opal_os_path to create OS friendly paths.
This commit was SVN r11311.
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.
This commit was SVN r11270.
So define the constant if it isn't already defined. Something else
includes stdio.h, which has a bunch of declarations that really
confuse non-GNU compilers, so be sure to include that one before
setting __USE_GNU.
refs trac:280
This commit was SVN r11265.
The following Trac tickets were found above:
Ticket 280 --> https://svn.open-mpi.org/trac/ompi/ticket/280
already set. This can annoy compilers that aren't GNUish
* __align is technically a reserved token and IBM XL appears to be doing
something with it that causes compile badness. So use a different
variable name.
refs trac:279
This commit was SVN r11264.
The following Trac tickets were found above:
Ticket 279 --> https://svn.open-mpi.org/trac/ompi/ticket/279
compiler, automatically disable the ptmalloc component. It seems that
optimization level -O2 or higher will cause the generated code to do
Bad Things (e.g., opalcc will segv). Upgrading to the Intel 9.1
compiler seems to fix the problem.
This closes ticket #227.
This commit was SVN r11076.
components to load:
- only allow the ^ to be the first character of the value
- if we find ^ elsewhere in the value, print an error and fail
This commit was SVN r10880.
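The stricter rule from this commit can be modeled in a few lines. The sketch below is Python for illustration only; `parse_component_spec` and the use of `ValueError` are hypothetical stand-ins for whatever the C code does when it prints an error and fails.

```python
def parse_component_spec(value):
    """Return (exclude, names) for a comma-separated component list.

    A leading '^' marks the whole list as an exclusion. A '^' anywhere
    else is rejected.
    """
    exclude = value.startswith("^")
    body = value[1:] if exclude else value
    if "^" in body:
        raise ValueError(
            f"'^' is only allowed as the first character: {value!r}")
    names = [n.strip() for n in body.split(",") if n.strip()]
    return exclude, names
```

A value such as "tcp,^self" is rejected outright rather than being silently interpreted.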
than $(LN_S). This causes problems with Windows and probably
elsewhere (re: #200). So use a slightly different trick to get the
right header selected for the MEMCPY and TIMER components.
* Using the same trick used to solve the AC_CONFIG_LINKS problem,
stop using a separate header file for direct calling in the
PML and MTL. This lets me remove some icky code in ompi_mca.m4
that was more fragile than I really liked.
This commit was SVN r10841.
reentrant for free(), so we can't call free() from inside an sbrk() handler.
The solution is to never call sbrk() with a negative number. The mmap() allocator
used for large allocations does not have this problem and continues to give
memory back to the OS as soon as possible.
This should go to both the v1.1 and v1.0 branches.
This commit was SVN r9943.
implementations. I don't want to overload the memcpy functions,
therefore people interested in using the high performance memcpy
should use opal_memcpy directly instead. Note that there are 2
other versions of memcpy available, which use a destination or a source
described as iovecs.
This commit was SVN r9532.
installation directories) in configure, the files that depend on this
information are not properly rebuilt. If you need this information,
don't setup a -D in the Makefile.am - instead, include
opal/install_dirs.h.
* Use the : option in AC_CONFIG_FILES to avoid needing to expose that
we are playing around with temporary files with our headers to avoid
rebuilding
* Clean up the version file information a bit, and like the install
directory stuff, make sure that there is a dependency so that
ompi_info gets rebuilt properly when a version number changes.
This commit was SVN r9256.
Still release the standard stream upon finalizing.
This commit was SVN r9204.
The following SVN revision numbers were found above:
r9182 --> open-mpi/ompi@a2a26525b3
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
makes illegal free() calls behave in a much more rational way. You'll still
probably die, but your stack trace will not have 3 billion pages of recursion
inside the memory allocator.
* Fix illegal free in the opal_wrapper code. basename() returns a string in a
static buffer, so it shouldn't be free()ed. It also shouldn't be left around
so long, as another call to basename() may whack the returned buffer. So
leave the free and add a strdup() around the basename() call.
* Turn off some unneeded debugging in the opal_wrapper code that would list the
command to be run, regardless of the -showme option.
This commit was SVN r8758.
A small and ugly workaround for the path problem on Windows (absolute
paths start with [a-z]:, while ':' is considered a separator for most
environment variables).
This commit was SVN r8746.
page protection, which causes the pages to be dropped, which causes problems
if we don't deregister the pages first. Since memory is cheap in this case
(it is still usable, should ptmalloc2 want it back, and is limited in size),
we just mprotect the pages instead. This solves the dropping pages problem,
and doesn't cause even more calls into the cache code.
Thanks to Gleb Natapov for both finding the problem and giving a fix.
This should go to the v1.0 branch
This commit was SVN r8732.
- fall back to compile test for windows paffinity component
when cross compiling
- fall back to platform guess when checking for threads having
different pids with pthreads (yes on linux, no elsewhere)
- pass the proper host, target, and build flags to the
ROMIO configure script
With these changes, cross-compiling should be possible with the exception
of the Fortran 77 and Fortran 90 bindings. Fortran 77 can be cross-
compiled if cache values are provided for type sizes and alignment.
This commit was SVN r8702.
r8698), with changes below:
- Split wrapper flags into those required for each of the three projects,
and cleaned up some cruft (including the LIBMPI_EXTRA_*FLAGS) through-
out the build system
- Added opal_init_util and opal_finalize_util to allow init / cleanup
of all the opal code that doesn't require the MCA system
- Create standalone key=value file parser, based on the one that used
to be in the mca param parser, so that it can be shared in multiple
places
- Add wrapper datafiles for opal, orte, and ompi wrappers, and add
wrapper compiler with support for all the old features
This commit was SVN r8699.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r8690
r8698
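A standalone key=value parser of the kind mentioned in this commit might look roughly like the Python sketch below. The file conventions here ('#' comments, blank lines ignored, whitespace trimmed) are assumptions for illustration and are not taken from the commit; `parse_keyval` is a hypothetical name.

```python
def parse_keyval(text):
    """Parse 'key = value' lines into a dict.

    Assumed conventions: '#' starts a comment, blank lines are ignored,
    and whitespace around keys and values is trimmed.
    """
    result = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(
                f"line {lineno}: expected key=value, got {raw!r}")
        result[key.strip()] = value.strip()
    return result
```

Keeping the parser free of any MCA-specific logic is what lets it be shared by several callers, such as the parameter files and the wrapper data files.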
* Make sure --without-BTL works for all BTLs
* Fix copy-n-paste error in aix timer configure help string
This should go to the v1.0 branch
This commit was SVN r8554.
deallocation came from the allocator (malloc, free, etc) or somewhere
else (the user calling mmap/munmap, etc). Going to be used by Galen
to determine if it is worth searching the allocations tree
Set flag if it is possible to intercept mmap (not always possible
due to a circular dependency between mmap, dlsym, and calloc)
This commit was SVN r8521.
Jeff-length). Also change it such that we also have a lt_dlhandle
type -- even in the case of --disable-dlopen. Specifically, even in
the case of --disable-dlopen, we still have shell functions that use
an lt_dlhandle parameter, so we need that type, even if <ltdl.h> isn't
used.
This commit was SVN r8507.
may call calloc(large number), which causes ptmalloc2 to call mmap, which
causes us to try to dlsym for mmap, which leads to looping badness.
This commit was SVN r8461.
the ptmalloc2 memory hooks component triggers callbacks for memory
allocation / deallocation. If enabled (the default) it is only when
memory is actually obtained from or released to the OS (so little
malloc calls only trigger callbacks if sbrk is called). If disabled,
callbacks are triggered every time malloc/free/etc. is called
* It turns out that syscall and mmap aren't good friends due to the return
type of mmap and some old legacy issues with syscall functions that
take more than 5 parameters. For now, default to either loading
the symbol from glibc using dlsym or using the __m{un,}map functions.
Thanks to George for finding this.
* Fix some dumb typos in the mmap / munmap catching code
This commit was SVN r8410.
both mmap and munmap), adjusting the configure script so that the
component will only be activated on systems that use ptmalloc2 in the
first place -- ie, Linux
* Remove the malloc_hooks component - it became an unworkable solution
once threads and such were considered.
* Remove malloc_interpose component - it never worked quite right and
was not going to be able to intercept malloc, so it wasn't going to
be useful for OMPI's purposes.
* Update tests a little bit to match recent memory hooks api
issues - still needs a bit of work.
This commit was SVN r8381.
* turns out (duh!) that there was a reason that the <projectdir>dir
variable was set in the AM conditional. If not, stupid directories
are created and not needed... duh.
This commit was SVN r8205.
component/base Makefile.am files, reducing the time configure spends
stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed
This commit was SVN r8200.
When compiling C++ code that includes something that looks for the C++
header file "memory" (stupid C++ headers not having .h extensions), it
goes through the header file search path, which includes $(topsrcdir)/opal,
so it finds the directory $(topsrcdir)/opal/memory/ and tries to load
that as the memory header file and all goes downhill.
This commit was SVN r8111.
--with-devel-headers is given to configure:
* allocator, rcache, and mpool were putting things in the wrong place
* timer wasn't installing the inline implementations at all
This commit was SVN r7805.
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install). Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am). Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed. This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)
This commit was SVN r7777.
OMPI_CHECK_PACKAGE macro instead of doing everything ourselves.
The old code was causing problems - it wouldn't add anything to
WRAPPER_EXTRA_{LDFLAGS, LIBS} if libnuma was installed in /usr,
so it didn't work so well.
This should go to the 1.0 branch
This commit was SVN r7683.
set it to empty if it isn't. __malloc_attribute__ is set in sys/cdef.h, and
some versions set it unconditionally. If sys/cdef.h is included after malloc.h,
some compilers will complain loudly. So include malloc.h at the end so that
sys/cdef.h is already included if it's going to be.
This commit was SVN r7613.
in-place already (turns out that I was wrong in thinking that it
didn't work for static components), but the logic for excluding
components was not there. This commit does a few things:
- Adds "exclude" logic, so that you can do:
mpirun --mca btl ^mvapi,openib ...
(note the "^" character -- I tried "!" but then you have to escape
it in the shell, and that was icky) which will exclude both the
mvapi and openib btl components (excluding one component means that
you are excluding all components in the list; it doesn't make sense
to include some and exclude others -- you're either entirely
including or entirely excluding)
- Simplifies the "include" logic, so the same old stuff like this
still works:
mpirun --mca btl tcp,self ...
will only use the tcp and self btl components.
- Added more verbosity statements to make this selection process
clear.
This commit was SVN r7509.
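The effect of the include and exclude forms on a set of available components can be sketched as follows. This is an illustrative Python model rather than the MCA base code; `select_components` is a hypothetical name and the component list in the usage note is made up from names that appear in this log.

```python
def select_components(available, spec):
    """Filter `available` by an include list or a '^'-prefixed exclude list.

    The whole spec is either an inclusion or an exclusion; the two are
    never mixed.
    """
    exclude = spec.startswith("^")
    body = spec[1:] if exclude else spec
    names = {n.strip() for n in body.split(",") if n.strip()}
    if exclude:
        return [c for c in available if c not in names]
    return [c for c in available if c in names]
```

For example, with `["self", "sm", "tcp", "mvapi", "openib"]` available, the spec `^mvapi,openib` leaves self, sm, and tcp, while `tcp,self` leaves only self and tcp.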
really can't. Test for munmap, since it's the most likely to cause problems,
since it's always an interposed symbol.
The condition that usually causes problems is if libmpi was brought in as
the result of a library dependency, rather than as a -l on the link line.
The linker in this case will find malloc/free/munmap/etc. in libc, rather
than in libmpi.
This commit was SVN r7508.
Makefile.options
- Sample in each of the three projects of how to link against the
relevant libraries so that when components are loaded into a parent
process' space, we don't rely on the libopal/liborte/libmpi symbols
being in the parent's public symbol namespace -- instead,
dynamically link to the relevant libraries, allowing the dynamic
linker to pull those libraries in at run-time, if needed
This commit was SVN r7397.
a constructor, like the rest of the code base
- Convert usage in the tree to use the constructor to zero out an
instance of opal_output_stream_t
- Still need to re-enable output files
This commit was SVN r7253.
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
number to be set at autoconf time (instead of at configure time, as
it was before). Set the version number, minus the subversion r number,
at autoconf time. Override the internal variables to include the r
number (if needed) at configure time. Basically, the right thing
should always happen. The only place it might not is the version
reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE takes a list of options, no need to specify
them in all the Makefile.am files.
* Adds support for subdir-objects, meaning that object files are put
in the directory containing source files, even if the Makefile.am is
in another directory. This should start making it feasible to
reduce the number of Makefile.am files we have in the tree, which
will greatly reduce the time to run autogen and configure.
This commit was SVN r7211.
be called with a description of its memory segments to make local. It
is a small enough API that changing to support a
one-process-does-all-assignment model is simple enough if we ever need
it.
This commit was SVN r7148.
add a -I to find the included ltdl.h (vs. a system-installed ltdl.h)
- Clean up cruft in a bunch of Makefile.am's to remove now-unnecessary
AM_CPPFLAGS settings to get static-components.h for each framework
- Move the component_repository API functions out of opal/mca/base/base.h
and into opal/mca/base/mca_base_component_repository.h in order to
decrease unnecessary dependencies (e.g., before this, almost
everything in the tree depended on ltdl.h, which is unnecessary --
only a small number of files really need ltdl.h)
This commit was SVN r7127.
API is still a bit unstable and may change.
- Add a primitive "first use" component that simply has each process
"touch" the pages that they want to use, thereby [hopefully] locking
them locally to a specific processor
- Add hooks in ompi_mpi_init to enable memory affinity when processor
affinity is used.
- Added hooks in ompi_mpi_finalize to shut down memory affinity when
it was initialized during ompi_mpi_init.
- Added right hooks in ompi_info to display maffinity components.
This commit was SVN r7044.
cast the return to an int in the C++ test case, just in case.
* C++ sucks. If compiling with C++ on some GNU compiler/linker
combos, the initialize hook isn't automagically fired for the
malloc code. Add a backup setting during opal_init, which is
early enough not to cause any damage.
This commit was SVN r6983.
* make sure LIBS contains -lpmapi before checking for pm_cycles()
* reorder aix functions so that we don't use get_usecs() before we
define it
This commit was SVN r6970.
* Make ompi_info list timer components
* Remove flag to display whether we have memory intercepts (components are
already listed), until we can figure out how to do it *after* the
components are opened.
This commit was SVN r6950.
on all glibc systems (tested with x86 and x86_64 with a couple of C++
compilers). While not as ideal as the malloc_hooks method, it does
have the advantage of working with threads.
* Modified malloc_hooks component to properly follow prefix rule. No
functionality changes
* Make the memory framework only choose one component, and modify all
components to set priority to 20, except malloc-interpose, which is
at 10. This means that on Linux, malloc_hooks will be used unless
threads are enabled, since I think malloc_hooks is a better design
choice when we can use it
This commit was SVN r6949.
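The single-winner selection described in this commit amounts to picking the highest-priority component that is usable on the current build. A minimal Python sketch, with a hypothetical `pick_component` helper and a made-up "usable" flag standing in for whatever check each component performs:

```python
def pick_component(candidates):
    """Pick the usable component with the highest priority.

    candidates: list of (name, priority, usable) tuples.
    Returns the winning name, or None if nothing is usable.
    """
    usable = [(name, prio) for name, prio, ok in candidates if ok]
    if not usable:
        return None
    # max() keeps the first entry on ties, so list order breaks ties.
    return max(usable, key=lambda c: c[1])[0]
```

With malloc_hooks at 20 and malloc_interpose at 10, malloc_hooks wins whenever it is usable; if it reports itself unusable (for instance, with threads enabled), malloc_interpose is chosen instead.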
to opal_progress() to use the timers instead of a tick count for deciding
whether to call the event loop or not. Currently supported platforms are:
- solaris (x86 / sparc)
- Linux (x86 / x86_64 / IA64)
- Mac OS X (x86 / Power PC)
This commit was SVN r6922.
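The idea of gating the event loop on elapsed time rather than on a call counter can be sketched like this. It is an illustrative Python model, not opal_progress() itself: the class name `Progress`, the `interval_usec` parameter, and the injectable clock are assumptions made so the behavior is easy to follow and test.

```python
import time


class Progress:
    """Call the event loop only when enough time has passed."""

    def __init__(self, event_loop, interval_usec, clock=time.monotonic):
        self._event_loop = event_loop
        self._interval = interval_usec / 1_000_000  # seconds
        self._clock = clock
        self._last = clock()

    def progress(self):
        """Return True if the event loop was invoked on this call."""
        now = self._clock()
        if now - self._last >= self._interval:
            self._last = now
            self._event_loop()
            return True
        return False
```

Compared with a tick count, the time between event-loop calls no longer depends on how often progress() happens to be called.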
* Add memory intercept routines for Darwin using the official Darwin
API (thanks to Drew Gallatin from Myricom for pointing me to some
information from Apple engineers about how to make this work)
* add debugging output to functionality test
This commit was SVN r6920.
all the different cases (curses to RedHat for continually changing
the glibc API!)
- Minor fixes for the Solaris paffinity component
This commit was SVN r6894.
* Add base to memory framework so that we can do something sane with
ompi_info
* Updated ompi_info to print components for memory framework and
show whether we have memory hooks active or not.
This commit was SVN r6861.
- Simple components for getting and setting processor affinity of a
process; does *not* include scheduling decisions
- No one in the OMPI code base invokes the framework yet
- Added linux component for using sched_setaffinity()
- Added shell solaris component that will use processor_bind()
(currently .ompi_ignore'd)
This commit was SVN r6854.
ompi/).
- There's still a handful of places that have orte/ #include files;
still need to clean those up
- A lot of places still use ompi/include/constants.h -- those need to
be converted over to use OPAL_ return codes and then switch to the
opal constants.h. This commit is the first few steps towards
that...
This commit was SVN r6843.
that were set on the command line. This was technically exactly the
way the code was designed, but it certainly violated the Law of Least
Astonishment (even to its designer ;-) ). So now if you execute
something like this:
mpirun -mca pls_rsh_debug 1 -np 4 hello
You'll see debugging output from the rsh pls component, as you would
expect (this was not previously the case -- the MCA pls_rsh_debug
param would be set to 1 in the 4 spawned hello processes, but *not*
in the orterun process).
More specifically, MCA parameters will be set in the orterun process
in the following cases:
- The new command line switch "--gmca" (or "-gmca") is used,
indicating that the MCA parameter is "global". --gmca also means
that the MCA parameter will be applied to all context apps. For
example:
mpirun -gmca foo bar -np 1 hello : -np 2 goodbye
The foo MCA param will be set in both the hello and goodbye
processes.
- If there is only one context app. For example:
mpirun -mca pls_rsh_debug 1 -np 4 hello
will set pls_rsh_debug to 1 in both the orterun process and the 4
spawned hello processes.
Also added a few more comments inside orterun to document a somewhat
confusing use of a state variable in a recursive case.
This commit was SVN r6764.
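The two cases above can be condensed into a small decision function. This Python sketch is only a model of the rule as stated in the commit message; the name `orterun_env_params`, the dict-based inputs, and the precedence when the same parameter is given both ways are assumptions.

```python
def orterun_env_params(local_params, global_params, num_app_contexts):
    """Return the MCA params that are also set in the launcher process.

    local_params:  params given with -mca
    global_params: params given with -gmca
    """
    params = dict(global_params)       # -gmca always applies
    if num_app_contexts == 1:
        # Single app context: -mca params apply to the launcher too.
        # Assumption: a -mca value overrides a -gmca value of the same name.
        params.update(local_params)
    return params
```

For `mpirun -mca pls_rsh_debug 1 -np 4 hello` there is one app context, so pls_rsh_debug reaches the launcher; with two app contexts only the -gmca parameters do.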
- new preferred API calls for registering MCA parameters are
mca_base_param_reg_{int|string} and
mca_base_param_reg_{int|string}_name.
- See opal/mca/base/mca_base_param.h for docs on new calls.
- Can now register and lookup a value at the same time.
- Can now mark a parameter "read only" at registration time
- Can now mark a parameter "internal" at registration time
- Can now associate a help message with the parameter at registration
time; displayed in the ompi_info output.
The old API calls are still available for backwards compatibility
(mca_base_param_register_{int|string}). They will eventually be
removed -- all developers are encouraged to use the new APIs from here
on out and replace any old calls with the new API.
Some params were also renamed -- the previous convention of using
"base_" as a prefix for any param that was not associated with a
component is henceforth deprecated. Instead, use one of the following
prefixes:
mca: for anything in the MCA base itself
opal: for anything in OPAL
orte: for anything in ORTE
mpi: for anything in OMPI
This commit was SVN r6698.
* Add ability to completely disable libltdl (the dlopen code to load
dynamic shared objects) to configure: --disable-dlopen
* Added MCA param (component_disable_dlopen) to disable DSO loading
at runtime
* Made the event library behave in some not-completely-erroneous way
on platforms where it has absolutely no eventops support (ie, no
select, poll, or epoll)
* Disabled orte_wait, opal_few, and opal_daemon_init code on
platforms without fork, waitpid support. All non-init functions
will return OMPI_ERR_NOT_SUPPORTED
* Disable orteprobe tool when fork or pipe aren't supported
This commit was SVN r6490.
* rename ompi_malloc to opal_malloc
* rename ompi_numtostr to opal_numtostr
* start of rename of ompi_environ to opal_environ
This commit was SVN r6332.
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few
This commit was SVN r6330.
- move mpool and allocator frameworks back to ompi (from opal)
- specialize the ompi_free_list class to use an mpool instance
- un-specialize opal_free_list to *not* use mpool; just use malloc/free
This commit was SVN r6292.