For anyone interested, the problem stemmed from two things:
1. a bug in the ompi_bitmap utility (which I copied to orte_bitmap to avoid unintentionally disturbing something else) that causes the bitmap NOT to expand unless the caller asks for a bit that is more than one byte outside the current array size. The unit test didn't pick it up because it doesn't check that close to the boundary.
2. a "feature" in the ompi_bitmap utility that only expands the array if you try to SET a bit outside the current boundary, but NOT if you try to CLEAR a bit outside the array limit. This appears intentional as the unit test checks for this behavior, but I hadn't been expecting the asymmetry.
The orte_bitmap utility now appropriately expands in both circumstances. I also added a function to expand the array so it "covers" a bit location without setting or clearing it. The function allows you to ensure the array is big enough to handle the specified bit, but leave the bit alone if it already is there (the other functions would set/clear it if it was).
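For anyone who wants to see the boundary behavior spelled out, here is a minimal sketch of a growable bitmap that expands on both set and clear, plus a "cover" call that only grows the array. Every name prefixed with ex_ is hypothetical and the layout is an assumption; this is not the actual orte_bitmap code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable bitmap; not the real orte_bitmap layout. */
typedef struct {
    unsigned char *bytes;  /* bit storage, 8 bits per byte */
    size_t         nbytes; /* current array size in bytes */
} ex_bitmap_t;

/* Grow the array, if needed, so that `bit` falls inside it.
 * Existing bits are preserved and new bytes start at zero.
 * Returns 0 on success, -1 on allocation failure. */
int ex_bitmap_cover(ex_bitmap_t *bm, size_t bit)
{
    assert(bm != NULL);
    /* Bit 8 lives in byte 1, so it needs 2 bytes: computing the
     * requirement this way avoids an off-by-one at the boundary. */
    size_t needed = bit / 8 + 1;
    if (needed <= bm->nbytes) {
        return 0;  /* already covered; leave every bit alone */
    }
    unsigned char *p = realloc(bm->bytes, needed);
    if (p == NULL) {
        return -1;
    }
    memset(p + bm->nbytes, 0, needed - bm->nbytes);
    bm->bytes = p;
    bm->nbytes = needed;
    return 0;
}

/* Set a bit, expanding the array first if the bit is outside it. */
int ex_bitmap_set(ex_bitmap_t *bm, size_t bit)
{
    if (ex_bitmap_cover(bm, bit) != 0) {
        return -1;
    }
    bm->bytes[bit / 8] |= (unsigned char)(1u << (bit % 8));
    return 0;
}

/* Clear a bit, expanding the array in the same way as set does. */
int ex_bitmap_clear(ex_bitmap_t *bm, size_t bit)
{
    if (ex_bitmap_cover(bm, bit) != 0) {
        return -1;
    }
    bm->bytes[bit / 8] &= (unsigned char)~(1u << (bit % 8));
    return 0;
}

/* Query a bit without growing; bits outside the array read as 0. */
int ex_bitmap_is_set(const ex_bitmap_t *bm, size_t bit)
{
    if (bit / 8 >= bm->nbytes) {
        return 0;
    }
    return (bm->bytes[bit / 8] >> (bit % 8)) & 1;
}
```

Starting from a zero-initialized ex_bitmap_t, setting bit 3 gives a one-byte array, and covering bit 8 grows it to two bytes while bit 3 stays set and bit 8 stays clear.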
I've tested it with up to 100 processes without problem.
This commit was SVN r5980.
call the memory pool to do special memory allocations, and extended
the mpool so that it will do the allocations and keep track of them in
a tree. Currently, if you pass MPI_INFO_NULL to MPI_Alloc_mem, we will
try to allocate the memory and register it with as many mpools as
possible. Alternatively, one can pass an info object with the names of
the mpools as keys, and from these we decide which mpools to register
the new memory with.
- fixed some comments in the allocator and fixed a minor bug
- extended the red black tree test and made a minor correction
This commit was SVN r5902.
Need to do some refining of the component, but it meets basic requirements right now. Nobody else should notice any change - system basically ignores it unless you tell it to do something.
This commit was SVN r5723.
- Change all uses of *printf'ing a size_t to use an explicit cast to
(unsigned long) and the %lu escape
- change ORTE_GPR_REPLICA_MAX_SIZE to INT_MAX until bug 1345 is fixed
(i.e., until we allow size_t in MCA params)
- ns_base_local_fns.c:orte_ns_base_get_proc_name_string(): changed
from %0X -> %lu
- ORTE_NAME_ARGS added explicit (unsigned long) casts, and changed all
usages of ORTE_NAME_ARGS to use %lu's
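As a rough illustration of the cast-plus-%lu pattern described above, here is a small sketch. The helper name ex_format_size is hypothetical and is not part of the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "<label>=<value>" for a size_t value. */
int ex_format_size(char *out, size_t outlen, const char *label, size_t value)
{
    assert(out != NULL && label != NULL);
    /* The explicit cast makes the argument type match %lu exactly.
     * This is lossless only where unsigned long is at least as wide
     * as size_t; on other platforms large values would be truncated. */
    return snprintf(out, outlen, "%s=%lu", label, (unsigned long) value);
}
```

Passing a size_t straight to %lu or %d without the cast is undefined behavior wherever the types differ in width, which is the problem the cast works around.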
This commit was SVN r5644.
1. Added pid_t to the dps
2. Processes now "register" their local pid and update their location (i.e., nodename) on the registry during mpi_init
3. Added a new error code for values that exceed maximum for their data type (useful when transitioning a value from one variable to another of different size)
4. Fixed a few places where size_t was being incorrectly handled
5. Updated dps_test to cover pid_t types
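Item 3 can be pictured with a sketch like the following: a range check before moving a value into a narrower type. The error code name and the helper are hypothetical stand-ins, not the identifiers actually added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for illustration only. */
enum { EX_SUCCESS = 0, EX_ERR_VALUE_OUT_OF_BOUNDS = -1 };

/* Move a size_t into a uint32_t, refusing values that do not fit.
 * On failure *out is left untouched. */
int ex_size_to_uint32(size_t in, uint32_t *out)
{
    assert(out != NULL);
    if (in > UINT32_MAX) {
        return EX_ERR_VALUE_OUT_OF_BOUNDS;
    }
    *out = (uint32_t) in;
    return EX_SUCCESS;
}
```

The point of a dedicated error code is that the caller can tell "value too large for the destination type" apart from other failures instead of silently truncating.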
This should now provide support for TotalView connection - which David is pursuing.
This commit was SVN r5623.
HEADS UP: string versions of names are now presented in DECIMAL format - not HEX as they previously were. If you used the name services functions (as you were supposed to do) to access these names, you will not have any problems. If you did it yourself, then you need to fix it - my suggestion would be that you fix your code by using the name service functions to avoid future problems.
This commit was SVN r5571.
1. *correctly* fix the printing of size_t variables. Need to do this through a #define, not just typecast things. Thanks to Jeff/Brian for suggesting a cleaner way to do it (as opposed to just doing the #define at the print location). Note that not ALL of the prints have been "fixed" yet - will continue to identify them.
2. Add int64 and size_t to the pack/unpack unit tests.
3. Fix a bug in the int64 pack/unpack system.
This commit was SVN r5570.
Merged in from:
svn merge -r5506:5553 https://svn.open-mpi.org/svn/ompi/tmp/hetero .
This commit was SVN r5552.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r5506
r5553
Merged in from:
svn merge -r5448:5496 https://svn.open-mpi.org/svn/ompi/tmp/hetero .
This commit was SVN r5550.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r5448
r5496
from:
svn merge -r5440:5448 https://svn.open-mpi.org/svn/ompi/tmp/hetero .
This commit was SVN r5549.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r5440
r5448
since you can't fork() in one thread and waitpid() on the child in another,
which is what this test expects you to do. If Linux would just implement
the stupid POSIX standard already, this wouldn't be a problem.
This commit was SVN r5482.
we are part of the source tree and not defined otherwise, we are going
with an always defined if ompi_config.h is included policy. If
ompi_config.h is included before mpi.h or before OMPI_BUILDING is set,
it will set OMPI_BUILDING to 1 and enable all the internal code that
is in ompi_config_bottom.h. Otherwise, it will only include the
system configuration data (enough for defining the C and C++ interfaces
to MPI, but not perturbing the user environment).
This should fix the problems with bool and the like that the Eclipse
folks were seeing. It also cleans up some build system hacks that
we had along the way.
Also, don't use int64_t as the default size of MPI_Offset, because it
requires us including stdint.h in mpi.h, which is something we really
shouldn't be doing.
And finally, fix a ROMIO Makefile that didn't set -DOMPI_BUILDING=1,
as ROMIO includes mpi.h, but not ompi_config.h
This commit was SVN r5430.
* Update cmpset test to call memory barrier when needed before checking the
results
* remove unneeded sync from cmpset_32 on Power PC
This commit was SVN r5420.
OMPI_ENABLE_DEBUG because that changes the size of struct's (e.g.,
ompi_object) in the unit tests as compared to what may have been
compiled in the library.
This commit was SVN r5373.
especially upon abnormal termination of a process. Not yet integrated
into the fork pls; pending more discussion with other developers.
This commit was SVN r5326.
environments. Working on the fix, but don't break everyone's unit
tests while I'm working on it -- will re-commit once ompi_setenv() and
ompi_unsetenv() are fixed.
This commit was SVN r5166.
Modify the locking scheme to try and resolve a problem with dump_triggers that only occurs with multiple processes. Didn't resolve the problem, but should be more robust anyway. Still tracking this one down.
This commit was SVN r5114.
Update the unit-test-status matrix to include priority.
Add several new registry diagnostics that helped track down the above bug.
M test/mca/gpr/gpr_triggers.c
M test/Unit-Test-Status.xls
M test/Unit-Test-Status.pdf
M src/mpi/runtime/ompi_mpi_init.c
M src/mca/oob/base/oob_base_xcast.c
M src/mca/ns/base/ns_base_nds_env.c
M src/mca/gpr/replica/api_layer/gpr_replica_dump_api.c
M src/mca/gpr/replica/api_layer/gpr_replica_api.h
M src/mca/gpr/replica/communications/gpr_replica_comm.h
M src/mca/gpr/replica/communications/gpr_replica_remote_msg.c
M src/mca/gpr/replica/communications/gpr_replica_cmd_processor.c
M src/mca/gpr/replica/communications/gpr_replica_dump_cm.c
M src/mca/gpr/replica/gpr_replica_component.c
M src/mca/gpr/replica/gpr_replica.h
M src/mca/gpr/replica/functional_layer/gpr_replica_dump_fn.c
M src/mca/gpr/replica/functional_layer/gpr_replica_fn.h
M src/mca/gpr/replica/functional_layer/gpr_replica_trig_ops_fn.c
M src/mca/gpr/replica/functional_layer/gpr_replica_messaging_fn.c
M src/mca/gpr/replica/functional_layer/gpr_replica_segment_fn.c
M src/mca/gpr/proxy/gpr_proxy_dump.c
M src/mca/gpr/proxy/gpr_proxy.h
M src/mca/gpr/proxy/gpr_proxy_component.c
M src/mca/gpr/gpr_types.h
M src/mca/gpr/base/base.h
M src/mca/gpr/base/unpack_api_response/gpr_base_dump_notify.c
M src/mca/gpr/base/pack_api_cmd/gpr_base_pack_dump.c
M src/mca/gpr/gpr.h
This commit was SVN r5080.
built. The issue is that these tests are trying to test specific
components and are calling the functions directly -- and therefore
need to have the component linked in. This is fine when the
component is statically linked as part of libmpi, but presents a
problem when the component is a DSO.
GNU compilers/linkers allow us to link in the DSO as part of the test
executable (and everything "just works"), but this is not portable. A
better solution is going to involve:
- a better unit test support library that can load a DSO on demand
- using function pointers in the unit tests (rather than direct
function invocation)
This commit was SVN r5051.
* Update a bunch of the unit tests to either be disabled (someone who
isn't me and knows that code needs to fix them) or to work properly
This commit was SVN r4986.
build / run. Only things that actually build / run right now are the
asm and class tests. The mca tests probably will with a static build,
but that hasn't been verified.
This commit was SVN r4979.
This commit was SVN r4978.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r4977
MPI and non-ORTE applications for RSH on one node with or without
threads. I think we're approaching convergence with the tim branch
This commit was SVN r4895.
OMPI_ERROR if it found something and OMPI_SUCCESS otherwise). Also
look for INADDR_NONE instead of INADDR_ANY as the return from inet_addr()
* add convenience function ompi_ifislocal to quickly test if a given
hostname or IP address (in dotted-quad form) is a local address
* don't ssh to the local machine, but fork() / exec() the bootproxy directly
if ompi_ifislocal returns true *AND* there is no username specified
for the given host
* remove the llm hack to translate localhost -> local machine name
This commit was SVN r3450.
- Add #include "ompi_config.h" to all .c files, and ensure that it's
the first #included file
- remove a few useless #if HAVE_CONFIG_H checks
This commit was SVN r3229.
around with while waiting for other things to compile. :-)
Since there were some unit tests for the argv interface, took the
liberty of updating it for two new functions that were necessary:
ompi_argv_delete() and ompi_argv_insert().
This commit was SVN r2907.
only be used if the RTE init functions have been called. Not quite as
flexible as the real waitpid() function (no -1 support), but all I need
for the SSH / BProc / RMS pcms. This code is not yet turned on by
default (need to add the init / finalize calls to ompi_rte_init() and
ompi_rte_finalize()).
This commit was SVN r2860.
Added a field to the ompi_rte_node_schedule_t structure to keep track of the number of items on the environ list, thus making it easier to append more things to it. Adjusted the mca_pcm_base_build_base_env function correspondingly to take that field as an additional argument.
Changed mpirun2 to a .c program for convenience since it wasn't using any c++ features anyway.
This commit was SVN r2561.
host / cpu information down into a handle that need not exist when
the llm isn't being used. Fix all the test cases and whatnot to match
This commit was SVN r2490.
BTW, in case anyone is trying to use threads, be aware that much of the RTE is NOT thread-safe at this time. We'll work on that soon.
:-)
This commit was SVN r2379.
fail. test/rte/Makefile was removed from the top-level configure.ac
AC_CONFIG_FILES in r2239. I'm not sure what the intent was here, so
I'm just removing "rte" from test/Makefile.am's DIST_SUBDIRS so that
"make dist" can work again. One of the RTE folks can examine this to
see what the right course of action for the long run is. :-)
This commit was SVN r2257.
The following SVN revision numbers were found above:
r2239 --> open-mpi/ompi@58792a3ad0
- unpack string now allocates memory for the user, removing the need to know/set max string lengths
- fixed missing 'break' in case statement thanks to unit test yet again!
- updated unit test ompi_pack (test8)
- will remove old OMPI_STRING type and support shortly after checking for usage
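The "unpack allocates for the user" idea can be sketched as below. The wire format (a 32-bit length in network byte order followed by the bytes) and both function names are assumptions made for the example; they are not the actual pack/unpack API or its format.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed format: 4-byte length (network order), then the string
 * bytes without a terminating NUL. Returns bytes written, or 0 if
 * the string does not fit. */
size_t ex_pack_string(unsigned char *buf, size_t buflen, const char *s)
{
    size_t len = strlen(s);
    if (len > UINT32_MAX || buflen < 4 || buflen - 4 < len) {
        return 0;
    }
    uint32_t nlen = htonl((uint32_t) len);
    memcpy(buf, &nlen, 4);
    memcpy(buf + 4, s, len);
    return 4 + len;
}

/* Unpack one string. The function reads the length, allocates
 * exactly enough memory, and returns a NUL-terminated copy that the
 * caller must free(). Returns NULL on a short buffer or allocation
 * failure. */
char *ex_unpack_string(const unsigned char *buf, size_t buflen, size_t *consumed)
{
    assert(buf != NULL && consumed != NULL);
    uint32_t nlen;
    if (buflen < 4) {
        return NULL;
    }
    memcpy(&nlen, buf, 4);
    size_t len = ntohl(nlen);
    if (buflen - 4 < len) {
        return NULL;
    }
    char *s = malloc(len + 1);
    if (s == NULL) {
        return NULL;
    }
    memcpy(s, buf + 4, len);
    s[len] = '\0';
    *consumed = 4 + len;
    return s;
}
```

Because the length travels with the data, the receiver never has to guess or pre-size a maximum string buffer; the trade-off is that ownership of the returned memory moves to the caller.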
This commit was SVN r2219.
- make sure to pass jobid to the spawned process
- update test case and bootproxy to pass/receive jobid
- work on list splitting code for rsh spawn_procs()
This commit was SVN r2212.
* rework build_base_env to use newly discovered (by me, anyway) argv
interface
* add test case for build_base_env where I discovered I can't do boolean
logic....
This commit was SVN r2209.
- added ompi_list_splice and ompi_list_join, which are multi-list
manipulation functions similar to the splice functions in the STL
list interface
- added ompi_list_is_empty() for O(1) testing for an empty list
- added note that sometime in the future, ompi_list_get_size() may
change to O(N) complexity. DID NOT CHANGE THE FUNCTION IN ANY WAY.
If you are writing new code that needs to check for an empty list, please
use ompi_list_is_empty() rather than ompi_list_get_size() as is_empty()
will always be an O(1) operation (I can't see how it could ever become
impossible to make it O(1), so this seems like a safe claim).
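A minimal sketch of why the emptiness test stays O(1) while a size query may not: with a sentinel-based doubly linked list, "empty" is a single pointer comparison, whereas counting items without a cached length means walking the list. All ex_ names are hypothetical, and ex_list_move_all is only one plausible shape for a multi-list operation, not the actual ompi_list_join or ompi_list_splice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list types; not the real ompi_list_t layout. */
typedef struct ex_list_item {
    struct ex_list_item *next;
    struct ex_list_item *prev;
} ex_list_item_t;

typedef struct {
    ex_list_item_t head;  /* sentinel: an empty list points at itself */
} ex_list_t;

void ex_list_init(ex_list_t *l)
{
    l->head.next = &l->head;
    l->head.prev = &l->head;
}

void ex_list_append(ex_list_t *l, ex_list_item_t *it)
{
    it->prev = l->head.prev;
    it->next = &l->head;
    l->head.prev->next = it;
    l->head.prev = it;
}

/* O(1): one pointer comparison, no matter how long the list is. */
int ex_list_is_empty(const ex_list_t *l)
{
    return l->head.next == &l->head;
}

/* O(N) when no length is cached: every item has to be visited. */
size_t ex_list_get_size(const ex_list_t *l)
{
    size_t n = 0;
    for (const ex_list_item_t *it = l->head.next; it != &l->head; it = it->next) {
        n++;
    }
    return n;
}

/* Move every item of src to the end of dst in O(1); src ends up
 * empty. No per-item work is done, so no count can be maintained
 * here without walking the moved items. */
void ex_list_move_all(ex_list_t *dst, ex_list_t *src)
{
    assert(dst != NULL && src != NULL);
    if (ex_list_is_empty(src)) {
        return;
    }
    ex_list_item_t *first = src->head.next;
    ex_list_item_t *last = src->head.prev;
    first->prev = dst->head.prev;
    dst->head.prev->next = first;
    last->next = &dst->head;
    dst->head.prev = last;
    ex_list_init(src);
}
```

The same trade-off shows up in STL-style lists: constant-time range splicing and a constant-time size cannot both be guaranteed, which is a reasonable guess at why the note about get_size() was added, though the commit itself does not say so.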
This commit was SVN r2186.
I could use some help from one of you MCA gurus - if you run the test, you'll see that I cannot get the gpr components to be recognized. I'm not sure of the reason - I would appreciate any help you can provide to get the gpr components "registered".
This commit was SVN r2158.
structures to make it easier to swap around lists when doing process ->
resource mapping
* Fix spawn interface to take an ompi_list_t* instead of an ompi_list_t
since you can't pass an ompi_list_t by value
* Change allocate_resource to return an ompi_list_t* instead of having
an ompi_list_t** as an argument, since it's a bit cleaner and makes
who should call OBJ_NEW much more clear
* Clean up deallocation in error cases for the llm_base_allocate function
* Update test case for llm to not depend on current environment for
correctness
This commit was SVN r2126.
Looks like it works:
SUPPORT: OMPI Test Passed: oob packed recieve then send: (15 tests)
SUPPORT: OMPI Test Passed: oob packed send then recieve: (15 tests)
This commit was SVN r2110.
and updated the unit test for this.
- This call is for use by the OOB device so it can pass the pack/buffer system
memory. This removes the peek-recv race condition.
This commit was SVN r2105.
- removed send_hton/recv_ntoh routines as we now have send_packed/recv_packed
- removed constness from apis
- adding flag (in work) to allow recv to allocate and return recv buffer
- updated Edgar's communicator code to use pack routines rather than ntoh routines
This commit was SVN r2095.
or rsh/ssh stdin/stdout, even includes a test case
* add base function for use when PCMs can't provide uniqueness strings, so
that we all have the same value
This commit was SVN r2082.
* Add hostfile component for the LLM (reads hostfiles, returns array of
node identifiers)
NOTES:
- This will require the full autogen / configure / make.
- You now need flex to build Open MPI from Subversion. The versions
available on most Linux boxen and OS X are more than new enough. You
do *not* need flex to build from a nightly or release tarball.
This commit was SVN r1890.
All the user interface functions are now in mca/oob/base/base.h
Anyone who uses the oob should just include this file.
All component related functions have been moved to mca/oob/oob.h
The reason for this change was to make the user interface more clear.
This commit was SVN r1884.
-properly initialize variables in the oob_tcp_msg struct
-properly close peer sockets in the tcp oob
-fix compare in bucket allocator to use the correct variable
-remove duplicate free in teg
-updated the oob tests
-add more output to tcp oob when there are failures
This commit was SVN r1866.
-in some cases failed to call complete function when the message
was sent.
-was freeing the wrong iovec in the base recv function
-added a first cut of an oob test
This commit was SVN r1849.