Other changes:
1. Remove the old xcpu components as they are not functional.
2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.
This will require an autogen/configure, I'm afraid.
This commit was SVN r11228.
After seeing the ugliness that is removing directories in the
codebase I decided to push down this to the OPAL by extending the
opal/os_create_dirpath.(c|h) to contain some more functionality.
In this process I renamed 'os_create_dirpath' to 'os_dirpath' since it
is a bit more general now.
Added a few functions to:
- check if a directory is empty
- check to see if the access permissions are set correctly
- destroy the directory at the end of the dirpath
- By using a caller callback function (a la Perl, I believe)
for every file, the caller can have fine grained control over
whether a specific file is deleted or not.
This simplifies things a bit for orte_session_dir_(finalize|cleanup)
as it should no longer contain any of this functionality, but uses
these functions to do the work.
From the external perspective nothing has changed, from the
developer point of view we have some cleaner, more generic code.
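A minimal sketch of the destroy-with-callback idea described above, assuming POSIX directory calls. The names dirpath_filter_fn, dirpath_is_empty, dirpath_destroy, and delete_unless_keep are hypothetical and are not the actual OPAL signatures; the return convention (0 removed, 1 entries kept, -1 error) is also an assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical callback type: return true to let `name` (inside `dir`) be deleted. */
typedef bool (*dirpath_filter_fn)(const char *dir, const char *name);

/* True if `path` is a readable directory holding nothing but "." and "..". */
bool dirpath_is_empty(const char *path)
{
    DIR *dp = opendir(path);
    struct dirent *ep;
    bool empty = true;

    if (NULL == dp) {
        return false;
    }
    while (NULL != (ep = readdir(dp))) {
        if (0 != strcmp(ep->d_name, ".") && 0 != strcmp(ep->d_name, "..")) {
            empty = false;
            break;
        }
    }
    closedir(dp);
    return empty;
}

/* Delete every entry the filter approves, then remove `path` if nothing is left.
 * Returns 0 if the directory was removed, 1 if entries were kept, -1 on error. */
int dirpath_destroy(const char *path, bool recursive, dirpath_filter_fn filter)
{
    char child[4096];
    struct stat st;
    struct dirent *ep;
    int rc = 0;
    DIR *dp = opendir(path);

    if (NULL == dp) {
        return -1;
    }
    while (0 == rc && NULL != (ep = readdir(dp))) {
        int n;
        if (0 == strcmp(ep->d_name, ".") || 0 == strcmp(ep->d_name, "..")) {
            continue;
        }
        if (NULL != filter && !filter(path, ep->d_name)) {
            continue;                      /* the caller vetoed this entry */
        }
        n = snprintf(child, sizeof(child), "%s/%s", path, ep->d_name);
        if (n < 0 || (size_t) n >= sizeof(child) || 0 != lstat(child, &st)) {
            rc = -1;
        } else if (S_ISDIR(st.st_mode)) {
            if (recursive && dirpath_destroy(child, true, filter) < 0) {
                rc = -1;
            }
        } else if (0 != unlink(child)) {
            rc = -1;
        }
    }
    closedir(dp);
    if (rc < 0) {
        return -1;
    }
    if (!dirpath_is_empty(path)) {
        return 1;                          /* something was kept on purpose */
    }
    return (0 == rmdir(path)) ? 0 : -1;
}

/* Example filter: protect anything whose name starts with "keep". */
bool delete_unless_keep(const char *dir, const char *name)
{
    (void) dir;
    return 0 != strncmp(name, "keep", 4);
}
```

Passing NULL as the filter deletes everything; passing delete_unless_keep leaves the protected files (and therefore the directory) in place.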
This commit was SVN r10640.
to compute the overhead of the convertor (and all convertor related operations).
The second will check the position setting on the convertor. Not yet completed ...
This commit was SVN r10432.
done with the same values on 2 different types return the same value. The 2
types belong to 2 different classes: contiguous and sparse. With this test
I simulate the behavior of the buffered send, where the sender sends the
data from the user attached buffer (which is contiguous) and the receiver
receives it in a sparse type.
This commit was SVN r10372.
actual files, so we should not have a clean rule for them - instead,
make it maintainer-clean. Neither clean nor distclean should
remove files that were in the tarball...
This commit was SVN r9351.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
both mmap and munmap), adjusting the configure script so that the
component will only be activated on systems that use ptmalloc2 in the
first place -- i.e., Linux
* Remove the malloc_hooks component - it became an unworkable solution
once threads and such were considered.
* Remove malloc_interpose component - it never worked quite right and
was not going to be able to intercept malloc, so it wasn't going to
be useful for OMPI's purposes.
* Update tests a little bit to match recent memory hooks api
issues - still needs a bit of work.
This commit was SVN r8381.
When compiling C++ code that includes something that looks for the C++
header file "memory" (stupid C++ headers not having .h extensions), it
goes through the header file search path, which includes $(topsrcdir)/opal,
so it finds the directory $(topsrcdir)/opal/memory/ and tries to load
that as the memory header file and all goes downhill.
This commit was SVN r8111.
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
number to be set at autoconf time (instead of at configure time, as
it was before). Set the version number, minus the subversion r number,
at autoconf time. Override the internal variables to include the r
number (if needed) at configure time. Basically, the right thing
should always happen. The only place it might not is the version
reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE takes a list of options, no need to specify
them in all the Makefile.am files.
* Adds support for subdir-objects, meaning that object files are put
in the directory containing source files, even if the Makefile.am is
in another directory. This should start making it feasible to
reduce the number of Makefile.am files we have in the tree, which
will greatly reduce the time to run autogen and configure.
This commit was SVN r7211.
1. Valgrind is good for something - chasing down memory leaks in registry led me to re-visit the dictionary functions and discover that I wasn't keeping track of the number of dictionary entries on each segment! Resulted in wasted time searching blank entries as well as leaked memory. This has now been fixed.
2. Fixed the orte_bitmap test. The init function for that class has been eliminated and the constructor adjusted to provide that functionality.
This commit was SVN r7136.
add a -I to find the included ltdl.h (vs. a system-installed ltdl.h)
- Clean up cruft in a bunch of Makefile.am's to remove now-unnecessary
AM_CPPFLAGS settings to get static-components.h for each framework
- Move the component_repository API functions out of opal/mca/base/base.h
and into opal/mca/base/mca_base_component_repository.h in order to
decrease unnecessary dependencies (e.g., before this, almost
everything in the tree depended on ltdl.h, which is unnecessary --
only a small number of files really need ltdl.h)
This commit was SVN r7127.
orte_init_stage1(), since not all ORTE processes call orte_init().
* Expand opal_error test case to make sure ORTE error codes print
properly
* Make project error codes start at easy values (OPAL is -1 to -100,
ORTE is -101 to -200, OMPI is less than -201) to make it easier
to figure out what an error code as an integer means. Also has
the nice property of not changing the values of error codes every
time a new error code is added.
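A small sketch of how fixed per-project ranges make an integer error code easy to attribute. The enum names, the sample codes, and error_project() are hypothetical; only the ranges come from the text above, and the exact first OMPI value is an assumption.

```c
#include <assert.h>

/* Hypothetical bases: each project counts down from its own fixed base, so
 * adding a code to one project never renumbers another project's codes. */
enum {
    EX_OPAL_ERR_BASE = 0,      /* OPAL: -1 .. -100                         */
    EX_ORTE_ERR_BASE = -100,   /* ORTE: -101 .. -200                       */
    EX_OMPI_ERR_BASE = -200    /* OMPI: below the ORTE range (assumption)  */
};

/* Illustrative codes only; the real lists live in each project's headers. */
enum {
    EX_OPAL_ERR_OUT_OF_RESOURCE = EX_OPAL_ERR_BASE - 2,
    EX_ORTE_ERR_EXAMPLE         = EX_ORTE_ERR_BASE - 1,
    EX_OMPI_ERR_EXAMPLE         = EX_OMPI_ERR_BASE - 5
};

typedef enum { EX_PROJECT_NONE, EX_PROJECT_OPAL, EX_PROJECT_ORTE, EX_PROJECT_OMPI } ex_project_t;

/* Map a raw integer error code back to the project that owns it. */
ex_project_t error_project(int code)
{
    if (code >= 0) {
        return EX_PROJECT_NONE;            /* success or a positive value */
    }
    if (code >= -100) {
        return EX_PROJECT_OPAL;
    }
    if (code >= -200) {
        return EX_PROJECT_ORTE;
    }
    return EX_PROJECT_OMPI;
}
```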
This commit was SVN r7061.
tree.
- fix up #include's throughout the tree (yay contrib/search_replace.pl!)
- remove a few extraneous #include's
- remove orte_sys_info*() from opal_init()/opal_finalize() (it's
already in orte_init_stage1() and orte_system_finalize())
- remove dependencies in opal on orte_system_info -- util/os_path.c
and util/os_create_dirpath.c (they only used path_sep, anyway --
easily changed to #defines)
This commit was SVN r7059.
cast the return to an int in the C++ test case, just in case.
* C++ sucks. If compiling with C++ on some GNU compiler/linker
combos, the initialize hook isn't automagically fired for the
malloc code. Add a backup setting during opal_init, which is
early enough not to cause any damage.
This commit was SVN r6983.
OPAL_ERROR, same for all the other error codes. Also, make sure that there
are never conflicts between OPAL and ORTE error codes (for example).
Finally, provide opal_perror(), opal_strerror(), and opal_strerror_r() to
give stringified error messages for the different error codes
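For illustration, a perror/strerror/strerror_r trio can be layered on one lookup function. This is a sketch with hypothetical ex_* names, codes, and message strings; it does not reproduce the real opal_strerror() table or its signatures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error codes for the sketch. */
enum { EX_SUCCESS = 0, EX_ERROR = -1, EX_ERR_OUT_OF_RESOURCE = -2, EX_ERR_NOT_FOUND = -3 };

/* Return a static message for a known code, or NULL if the code is unknown. */
const char *ex_strerror(int code)
{
    switch (code) {
    case EX_SUCCESS:             return "Success";
    case EX_ERROR:               return "Error";
    case EX_ERR_OUT_OF_RESOURCE: return "Out of resource";
    case EX_ERR_NOT_FOUND:       return "Not found";
    default:                     return NULL;
    }
}

/* Reentrant variant: writes into the caller's buffer, including a fallback
 * message for unknown codes. Reports truncation as an error. */
int ex_strerror_r(int code, char *buf, size_t buflen)
{
    const char *msg = ex_strerror(code);
    int n;

    if (NULL == buf || 0 == buflen) {
        return EX_ERROR;
    }
    if (NULL != msg) {
        n = snprintf(buf, buflen, "%s", msg);
    } else {
        n = snprintf(buf, buflen, "Unknown error: %d", code);
    }
    return (n < 0 || (size_t) n >= buflen) ? EX_ERR_OUT_OF_RESOURCE : EX_SUCCESS;
}

/* perror-style helper: "<prefix>: <message>" on stderr. */
void ex_perror(int code, const char *prefix)
{
    char buf[64];

    (void) ex_strerror_r(code, buf, sizeof(buf));
    fprintf(stderr, "%s: %s\n", prefix, buf);
}
```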
This commit was SVN r6969.
to opal_progress() to use the timers instead of a tick count for deciding
whether to call the event loop or not. Currently supported platforms are:
- solaris (x86 / sparc)
- Linux (x86 / x86_64 / IA64)
- Mac OS X (x86 / Power PC)
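The timer-based decision can be sketched as follows. All names here are hypothetical, the time unit (microseconds) is an assumption, and the clock is injected as a function pointer so that a platform timer can be plugged in; fake_now() and counting_event_loop() are stand-ins for demonstration only.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*ex_timer_fn)(void);   /* current time in microseconds */
typedef int (*ex_event_loop_fn)(void);   /* returns number of events handled */

typedef struct {
    ex_timer_fn now;
    ex_event_loop_fn run_events;
    uint64_t interval_usec;   /* minimum time between event loop calls */
    uint64_t last_run_usec;   /* timestamp of the last event loop call */
} ex_progress_t;

/* Enter the event loop only if enough wall time has passed since the last
 * call, instead of counting how many times progress was invoked. */
int ex_progress(ex_progress_t *p)
{
    uint64_t t = p->now();

    if (t - p->last_run_usec < p->interval_usec) {
        return 0;                 /* too soon: skip the event loop */
    }
    p->last_run_usec = t;
    return p->run_events();
}

/* Stand-ins for a platform timer and an event loop. */
uint64_t fake_time_usec = 0;
int event_loop_calls = 0;

uint64_t fake_now(void)
{
    return fake_time_usec;
}

int counting_event_loop(void)
{
    ++event_loop_calls;
    return 1;
}
```

Compared with a tick count, the call rate into the event loop no longer depends on how often the caller happens to poll.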
This commit was SVN r6922.
* Add memory intercept routines for Darwin using the official Darwin
API (thanks to Drew Gallatin from Myricom for pointing me to some
information from Apple engineers about how to make this work)
* add debugging output to functionality test
This commit was SVN r6920.
callbacks to be triggered when memory is about to leave the current
process. The system is designed to allow a variety of interfaces,
hopefully including whole-sale replacement of the memory manager,
ld preload tricks, and hooks into the system memory manager. Since
some of these may or may not be available at runtime and we won't know
until runtime, there is a query function to look for availability of
such a setup.
* Added ptmalloc2 memory manager replacement code. Not turned on by
default, can be enabled with --with-memory-manager=ptmalloc2.
Only tested on Linux, not even compiled elsewhere. Do not use
on OS X, or you will never see your process again.
* Added AM_CONDITIONAL for threads test to support ptmalloc2's build
system
This commit was SVN r6790.
I'll send out a general note about this in the morning, but for now I'll just notify people through this note that the new simplified "put" commands have been debugged and work just fine. I'll add documentation to the gpr.h file later - the only thing to really be aware of is that the tokens array must be NULL terminated. Other than that, things work pretty much as you'd expect.
This commit was SVN r6700.
- Update svn:ignore's to match new executable names
- Consolidate the unit test Makefile.am flags into a testing
Makefile.options
- Remove a bunch of SUBDIRS from test/mca/Makefile so that they don't
run by default, but can be invoked manually (they're still in
DIST_SUBDIRS)
This commit was SVN r6598.
* rename ompi_malloc to opal_malloc
* rename ompi_numtostr to opal_numtostr
* start of rename of ompi_environ to opal_environ
This commit was SVN r6332.
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few
This commit was SVN r6330.
unit tests without screwing up the nightly builds.
These changes fix the problem of not including the test/mca/gpr
directory in the nightly tarball, prevent the tests from being
compiled, but leave the door open for manual compilation when the time
comes to start the work to re-enable them (e.g., uncomment a few
lines in gpr/Makefile.am).
This commit was SVN r6175.
Also included is a fix to the attribute problem for singletons.
Short explanation:
The prior system placed triggers and subscriptions on the registry for each process - approximately eight/process. Each of these had to be checked every time there was a registry operation such as a "put" or "increment-value". For large numbers of processes, this repetitive checking consumed some significant time.
The new system allows processes to "attach" to existing triggers and subscriptions, without creating a new one. Thus, there are now only eight triggers and five subscriptions on a job - *regardless of how many processes are being run*. This means that the registry now takes the same amount of time (which is pretty darn short) to process an operation regardless of how many processes are in a job.
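A toy model of the "attach" idea, not the registry's real data structures: a trigger is looked up by name and shared, so the table size depends on the number of distinct triggers rather than the number of processes. The types, limits, and trigger names are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define EX_MAX_TRIGGERS 16
#define EX_NAME_LEN 32

typedef struct {
    char name[EX_NAME_LEN];
    int attached;                 /* how many processes share this trigger */
} ex_trigger_t;

typedef struct {
    ex_trigger_t items[EX_MAX_TRIGGERS];
    int count;
} ex_trigger_table_t;

/* Attach to the trigger called `name`, creating it only if it does not exist.
 * Returns the trigger's index, or -1 if the table is full or the name is too long. */
int ex_trigger_attach(ex_trigger_table_t *table, const char *name)
{
    int i;

    for (i = 0; i < table->count; ++i) {
        if (0 == strcmp(table->items[i].name, name)) {
            table->items[i].attached++;   /* reuse: no new entry to scan later */
            return i;
        }
    }
    if (table->count >= EX_MAX_TRIGGERS || strlen(name) >= EX_NAME_LEN) {
        return -1;
    }
    strcpy(table->items[table->count].name, name);
    table->items[table->count].attached = 1;
    return table->count++;
}
```

With this layout, each registry operation scans table->count entries, which stays constant as more processes attach.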
I'll provide some startup times from scalability tests shortly - need to complete the commit so I can move the system to an appropriate cluster.
This commit was SVN r6164.
the RDS selection logic, which is, unfortunately, not yet well
supported by the testing infrastructure (it causes false failures in
the nightly build).
This commit was SVN r6073.
For anyone interested, the problem stemmed from two things:
1. a bug in the ompi_bitmap utility (which I copied to orte_bitmap to avoid unintentionally disturbing something else) that causes the bitmap NOT to expand unless the caller asks for a bit that is more than one byte outside the current array size. The unit test didn't pick it up because it doesn't check that close to the boundary.
2. a "feature" in the ompi_bitmap utility that only expands the array if you try to SET a bit outside the current boundary, but NOT if you try to CLEAR a bit outside the array limit. This appears intentional as the unit test checks for this behavior, but I hadn't been expecting the asymmetry.
The orte_bitmap utility now appropriately expands in both circumstances. I also added a function to expand the array so it "covers" a bit location without setting or clearing it. The function allows you to ensure the array is big enough to handle the specified bit, but leave the bit alone if it already is there (the other functions would set/clear it if it was).
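The behavior described above can be sketched like this. It is a simplified stand-in with hypothetical ex_bitmap_* names, not the orte_bitmap source: set and clear both grow the array, and a separate "cover" call grows it without touching the bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *bits;
    size_t nbytes;
} ex_bitmap_t;

/* Grow the array so that `bit` is addressable; new bytes are zeroed and
 * existing bits are left alone. Returns 0 on success, -1 on allocation failure. */
int ex_bitmap_cover(ex_bitmap_t *bm, size_t bit)
{
    size_t need = bit / 8 + 1;    /* also correct for the first bit past the end */
    unsigned char *p;

    if (need <= bm->nbytes) {
        return 0;
    }
    p = realloc(bm->bits, need);
    if (NULL == p) {
        return -1;
    }
    memset(p + bm->nbytes, 0, need - bm->nbytes);
    bm->bits = p;
    bm->nbytes = need;
    return 0;
}

int ex_bitmap_set(ex_bitmap_t *bm, size_t bit)
{
    if (0 != ex_bitmap_cover(bm, bit)) {
        return -1;
    }
    bm->bits[bit / 8] |= (unsigned char) (1u << (bit % 8));
    return 0;
}

/* Clearing expands the array too, so set and clear behave symmetrically. */
int ex_bitmap_clear(ex_bitmap_t *bm, size_t bit)
{
    if (0 != ex_bitmap_cover(bm, bit)) {
        return -1;
    }
    bm->bits[bit / 8] &= (unsigned char) ~(1u << (bit % 8));
    return 0;
}

bool ex_bitmap_is_set(const ex_bitmap_t *bm, size_t bit)
{
    if (bit / 8 >= bm->nbytes) {
        return false;
    }
    return 0 != (bm->bits[bit / 8] & (1u << (bit % 8)));
}
```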
I've tested it with up to 100 processes without problem.
This commit was SVN r5980.
call the memory pool to do special memory allocations, and extended
the mpool so that it will do the allocations and keep track of them in
a tree. Currently, if you pass MPI_INFO_NULL to MPI_Alloc_mem, we will
try to allocate the memory and register it with as many mpools as
possible. Alternatively, one can pass an info object with the names of
the mpools as keys, and from these we decide which mpools to register
the new memory with.
- fixed some comments in the allocator and fixed a minor bug
- extended the red black tree test and made a minor correction
This commit was SVN r5902.
Need to do some refining of the component, but it meets basic requirements right now. Nobody else should notice any change - system basically ignores it unless you tell it to do something.
This commit was SVN r5723.
- Change all uses of *printf'ing a size_t to use an explicit cast to
(unsigned long) and the %lu escape
- change ORTE_GPR_REPLICA_MAX_SIZE to INT_MAX until bug 1345 is fixed
(i.e., until we allow size_t in MCA params)
- ns_base_local_fns.c:orte_ns_base_get_proc_name_string(): changed
from %0X -> %lu
- ORTE_NAME_ARGS added explicit (unsigned long) casts, and changed all
usages of ORTE_NAME_ARGS to use %lu's
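A short sketch of the cast-plus-%lu pattern, assuming a name made of three size_t fields. The ex_name_t layout, the EX_NAME_ARGS macro, and the bracketed output format are illustrative assumptions, not the real ORTE definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical three-part process name with size_t fields. */
typedef struct {
    size_t cellid;
    size_t jobid;
    size_t vpid;
} ex_name_t;

/* Expands to three arguments, each cast to match a %lu conversion. */
#define EX_NAME_ARGS(n) \
    (unsigned long) (n)->cellid, (unsigned long) (n)->jobid, (unsigned long) (n)->vpid

/* Format a name in decimal; returns what snprintf returns. */
int ex_name_to_string(char *buf, size_t buflen, const ex_name_t *name)
{
    return snprintf(buf, buflen, "[%lu,%lu,%lu]", EX_NAME_ARGS(name));
}

/* Same pattern for a single size_t value. */
int ex_format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%lu", (unsigned long) value);
}
```

The explicit cast keeps the argument type in step with the format string on platforms where size_t and unsigned long differ.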
This commit was SVN r5644.
1. Added pid_t to the dps
2. Processes now "register" their local pid and update their location (i.e., nodename) on the registry during mpi_init
3. Added a new error code for values that exceed maximum for their data type (useful when transitioning a value from one variable to another of different size)
4. Fixed a few places where size_t was being incorrectly handled
5. Updated dps_test to cover pid_t types
This should now provide support for TotalView connection - which David is pursuing.
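Item 3 above can be illustrated with a checked narrowing conversion. The function name and the error constant are hypothetical; the sketch only shows where such an error code would be returned.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical codes for the sketch. */
enum { EX_OK = 0, EX_ERR_VALUE_OUT_OF_BOUNDS = -1 };

/* Move a size_t into an int, refusing values the int cannot represent.
 * `*out` is only written on success. */
int ex_sizet_to_int(size_t in, int *out)
{
    if (in > (size_t) INT_MAX) {
        return EX_ERR_VALUE_OUT_OF_BOUNDS;
    }
    *out = (int) in;
    return EX_OK;
}
```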
This commit was SVN r5623.
HEADS UP: string versions of names are now presented in DECIMAL format - not HEX as they previously were. If you used the name services functions (as you were supposed to do) to access these names, you will not have any problems. If you did it yourself, then you need to fix it - my suggestion would be that you fix your code by using the name service functions to avoid future problems.
This commit was SVN r5571.
1. *correctly* fix the printing of size_t variables. Need to do this through a #define, not just typecast things. Thanks to Jeff/Brian for suggesting a cleaner way to do it (as opposed to just doing the #define at the print location). Note that not ALL of the prints have been "fixed" yet - will continue to identify them.
2. Add int64 and size_t to the pack/unpack unit tests.
3. Fix a bug in the int64 pack/unpack system.
This commit was SVN r5570.
Merged in from:
svn merge -r5506:5553 https://svn.open-mpi.org/svn/ompi/tmp/hetero .
This commit was SVN r5552.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r5506
r5553
Merged in from:
svn merge -r5448:5496 https://svn.open-mpi.org/svn/ompi/tmp/hetero .
This commit was SVN r5550.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r5448
r5496
from:
svn merge -r5440:5448 https://svn.open-mpi.org/svn/ompi/tmp/hetero .
This commit was SVN r5549.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r5440
r5448
since you can't fork() in one thread and waitpid() on the child in another,
which is what this test expects you to do. If Linux would just implement
the stupid POSIX standard already, this wouldn't be a problem.
This commit was SVN r5482.
we are part of the source tree and not defined otherwise, we are going
with an always defined if ompi_config.h is included policy. If
ompi_config.h is included before mpi.h or before OMPI_BUILDING is set,
it will set OMPI_BUILDING to 1 and enable all the internal code that
is in ompi_config_bottom.h. Otherwise, it will only include the
system configuration data (enough for defining the C and C++ interfaces
to MPI, but not perturbing the user environment).
This should fix the problems with bool and the like that the Eclipse
folks were seeing. It also cleans up some build system hacks that
we had along the way.
Also, don't use int64_t as the default size of MPI_Offset, because it
requires us including stdint.h in mpi.h, which is something we really
shouldn't be doing.
And finally, fix a ROMIO Makefile that didn't set -DOMPI_BUILDING=1,
as ROMIO includes mpi.h, but not ompi_config.h
This commit was SVN r5430.
* Update cmpset test to call memory barrier when needed before checking the
results
* remove unneeded sync from cmpset_32 on Power PC
This commit was SVN r5420.
OMPI_ENABLE_DEBUG because that changes the size of struct's (e.g.,
ompi_object) in the unit tests as compared to what may have been
compiled in the library.
This commit was SVN r5373.
especially upon abnormal termination of a process. Not yet integrated
into the fork pls; pending more discussion with other developers.
This commit was SVN r5326.
environments. Working on the fix, but don't break everyone's unit
tests while I'm working on it -- will re-commit once ompi_setenv() and
ompi_unsetenv() are fixed.
This commit was SVN r5166.
Modify the locking scheme to try and resolve a problem with dump_triggers that only occurs with multiple processes. Didn't resolve the problem, but should be more robust anyway. Still tracking this one down.
This commit was SVN r5114.
Update the unit-test-status matrix to include priority.
Add several new registry diagnostics that helped track down the above bug.
M test/mca/gpr/gpr_triggers.c
M test/Unit-Test-Status.xls
M test/Unit-Test-Status.pdf
M src/mpi/runtime/ompi_mpi_init.c
M src/mca/oob/base/oob_base_xcast.c
M src/mca/ns/base/ns_base_nds_env.c
M src/mca/gpr/replica/api_layer/gpr_replica_dump_api.c
M src/mca/gpr/replica/api_layer/gpr_replica_api.h
M src/mca/gpr/replica/communications/gpr_replica_comm.h
M src/mca/gpr/replica/communications/gpr_replica_remote_msg.c
M src/mca/gpr/replica/communications/gpr_replica_cmd_processor.c
M src/mca/gpr/replica/communications/gpr_replica_dump_cm.c
M src/mca/gpr/replica/gpr_replica_component.c
M src/mca/gpr/replica/gpr_replica.h
M src/mca/gpr/replica/functional_layer/gpr_replica_dump_fn.c
M src/mca/gpr/replica/functional_layer/gpr_replica_fn.h
M src/mca/gpr/replica/functional_layer/gpr_replica_trig_ops_fn.c
M src/mca/gpr/replica/functional_layer/gpr_replica_messaging_fn.c
M src/mca/gpr/replica/functional_layer/gpr_replica_segment_fn.c
M src/mca/gpr/proxy/gpr_proxy_dump.c
M src/mca/gpr/proxy/gpr_proxy.h
M src/mca/gpr/proxy/gpr_proxy_component.c
M src/mca/gpr/gpr_types.h
M src/mca/gpr/base/base.h
M src/mca/gpr/base/unpack_api_response/gpr_base_dump_notify.c
M src/mca/gpr/base/pack_api_cmd/gpr_base_pack_dump.c
M src/mca/gpr/gpr.h
This commit was SVN r5080.
built. The issue is that these tests are trying to test specific
components, and are calling the functions directly -- and therefore
need to have the component linked in. This is fine when the
component is statically linked as part of libmpi, but presents a
problem when the component is a DSO.
GNU compilers/linkers allow us to link in the DSO as part of the test
executable (and everything "just works"), but this is not portable. A
better solution is going to involve:
- a better unit test support library that can load a DSO on demand
- using function pointers in the unit tests (rather than direct
function invocation)
This commit was SVN r5051.
* Update a bunch of the unit tests to either be disabled (someone who
isn't me and knows that code needs to fix them) or to work properly
This commit was SVN r4986.
build / run. Only things that actually build / run right now are the
asm and class tests. The mca tests probably will with a static build
but that hasn't been verified
This commit was SVN r4979.
This commit was SVN r4978.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r4977
MPI and non-ORTE applications for RSH on one node with or without
threads. I think we're approaching convergence with the tim branch
This commit was SVN r4895.
OMPI_ERROR if it found something and OMPI_SUCCESS otherwise). Also
look for INADDR_NONE instead of INADDR_ANY as the return from inet_addr()
* add convenience function ompi_ifislocal to quickly test if a given
hostname or IP address (in dotted-quad form) is a local address
* don't ssh to the local machine, but fork() / exec() the bootproxy directly
if ompi_ifislocal returns true *AND* there is no username specified
for the given host
* remove the llm hack to translate localhost -> local machine name
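To illustrate the INADDR_NONE point above: inet_addr() signals a parse failure with INADDR_NONE, while INADDR_ANY (0.0.0.0) is a perfectly valid parse result. The helper name ex_parse_ipv4 is hypothetical; the special case for the broadcast string exists because that address has the same bit pattern as INADDR_NONE.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Parse an IPv4 address string.
 * Returns 0 and fills `out` on success, -1 if `text` is not an address. */
int ex_parse_ipv4(const char *text, struct in_addr *out)
{
    in_addr_t addr = inet_addr(text);

    /* INADDR_NONE means failure, unless the caller really asked for broadcast. */
    if (INADDR_NONE == addr && 0 != strcmp(text, "255.255.255.255")) {
        return -1;
    }
    out->s_addr = addr;
    return 0;
}
```

Checking against INADDR_ANY instead would wrongly reject "0.0.0.0" and accept garbage input.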
This commit was SVN r3450.
- Add #include "ompi_config.h" to all .c files, and ensure that it's
the first #included file
- remove a few useless #if HAVE_CONFIG_H checks
This commit was SVN r3229.
around with while waiting for other things to compile. :-)
Since there were some unit tests for the argv interface, took the
liberty of updating it for two new functions that were necessary:
ompi_argv_delete() and ompi_argv_insert().
This commit was SVN r2907.
only be used if the RTE init functions have been called. Not quite as
flexible as the real waitpid() function (no -1 support), but all I need
for the SSH / BProc / RMS pcms. This code is not yet turned on by
default (need to add the init / finalize calls to ompi_rte_init() and
ompi_rte_finalize()).
This commit was SVN r2860.
Added a field to the ompi_rte_node_schedule_t structure to keep track of the number of items on the environ list, thus making it easier to append more things to it. Adjusted the mca_pcm_base_build_base_env function correspondingly to take that field as an additional argument.
Changed mpirun2 to a .c program for convenience since it wasn't using any c++ features anyway.
This commit was SVN r2561.
host / cpu information down into a handle that need not exist when
the llm isn't being used. Fix all the test cases and whatnot to match
This commit was SVN r2490.
BTW, in case anyone is trying to use threads, be aware that much of the RTE is NOT thread-safe at this time. We'll work on that soon.
:-)
This commit was SVN r2379.
fail. test/rte/Makefile was removed from the top-level configure.ac
AC_CONFIG_FILES in r2239. I'm not sure what the intent was here, so
I'm just removing "rte" from test/Makefile.am's DIST_SUBDIRS so that
"make dist" can work again. One of the RTE folks can examine this to
see what the right course of action for the long run is. :-)
This commit was SVN r2257.
The following SVN revision numbers were found above:
r2239 --> open-mpi/ompi@58792a3ad0