associated with the descriptor is ours again once the callback function
returns. Make it so; we can probably optimize out some of the code I
added when I mistakenly thought the descriptor free() was called on the
passed descriptor (see the sketch below).
* Fix some dumb accounting errors with MD usage for unexpected receives
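A minimal sketch of the ownership rule, using made-up stand-in types (frag_t and a
hand-rolled free list) rather than the real BTL descriptor structures: once the upper
layer's callback returns, the fragment can go straight back on the free list with no
defensive copy.

    #include <stddef.h>

    /* Made-up stand-ins for the descriptor/fragment types. */
    typedef struct frag {
        void (*cbfunc)(struct frag *frag, int status);  /* upper layer's completion callback */
        struct frag *next;                               /* free-list link */
    } frag_t;

    static frag_t *frag_free_list = NULL;                /* hypothetical free list */

    /* Called when the network reports that a send has finished. */
    static void handle_send_completion(frag_t *frag, int status)
    {
        /* Tell the upper layer the send completed. */
        frag->cbfunc(frag, status);

        /* Once the callback returns, the memory associated with the descriptor
         * is ours again -- just recycle it, no defensive copy needed. */
        frag->next = frag_free_list;
        frag_free_list = frag;
    }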
This commit was SVN r6555.
Fixed receive descriptor counts that limited mvapi and openib to 2 procs.
Begin porting error messages to use the BTL_ERROR macro.
This commit was SVN r6554.
1. Modify the registry to eliminate redundant data copying for startup messages.
2. Revise the subscription/trigger system to avoid redundant storage of triggers and subscriptions. This dramatically reduces the search time when a registry action occurs - to illustrate the point, there are now only a handful of triggers on the system for each job. Before, there were a handful of triggers for each PROCESS in the job, all of which had to be checked every time something happened on the registry. This is much, much faster now.
3. Update all subscriptions to the new format. There are now "named" subscriptions - this allows you to "name" a subscription that all the processes will be using. The first one to hit the registry actually defines the subscription. From then on, any subsequent "subscribes" to the same name just cause that process to "attach" to the existing subscription. This keeps the number of subscriptions being tracked by the registry to a minimum, while ensuring that each process still gets notified.
4. Do the same for triggers.
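A toy, single-process sketch of the bookkeeping behind the "named" subscriptions in
point 3 - all of the names here (subscription_t, subscribe_by_name, fire) are made up
for illustration and are not the real ORTE registry API:

    #include <stdio.h>
    #include <string.h>

    #define MAX_SUBS     16
    #define MAX_ATTACHED 8

    typedef void (*notify_cb_t)(const char *name, void *user_data);

    /* Toy, single-process model of the registry's subscription table. */
    typedef struct {
        char        name[64];
        notify_cb_t callbacks[MAX_ATTACHED];
        void       *user_data[MAX_ATTACHED];
        int         nattached;
    } subscription_t;

    static subscription_t subs[MAX_SUBS];
    static int nsubs = 0;

    /* The first caller of a given name defines the subscription; everyone after
     * that just attaches to it, so the registry tracks one subscription per job
     * instead of one per process. */
    static int subscribe_by_name(const char *name, notify_cb_t cb, void *user_data)
    {
        int i;
        subscription_t *sub = NULL;

        for (i = 0; i < nsubs; ++i) {
            if (0 == strcmp(subs[i].name, name)) {
                sub = &subs[i];                   /* already defined: just attach */
                break;
            }
        }
        if (NULL == sub) {
            if (nsubs >= MAX_SUBS) return -1;
            sub = &subs[nsubs++];                 /* first one in: define it */
            snprintf(sub->name, sizeof(sub->name), "%s", name);
            sub->nattached = 0;
        }
        if (sub->nattached >= MAX_ATTACHED) return -1;
        sub->callbacks[sub->nattached] = cb;
        sub->user_data[sub->nattached] = user_data;
        sub->nattached++;
        return 0;
    }

    /* When the trigger fires, every attached process gets exactly one callback
     * (no more N-times-the-data duplication). */
    static void fire(const char *name)
    {
        int i, j;
        for (i = 0; i < nsubs; ++i) {
            if (0 == strcmp(subs[i].name, name)) {
                for (j = 0; j < subs[i].nattached; ++j) {
                    subs[i].callbacks[j](name, subs[i].user_data[j]);
                }
            }
        }
    }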
Also fixed a duplicate subscription problem that was causing people to receive data equal to the number of processes times the data they should have received from a trigger/subscription. Sorry about that... :-( ...but it's all better now!
Uncovered a situation where the modex data seems to be getting entered on the registry a second time - the latter time coming after the compound command has been "fired", thereby causing all the subscriptions to fire. Asked Tim and Jeff to look into this.
Second phase of the changes will involve modifying the xcast system so that the same message gets sent to all processes. This will further reduce the message traffic, and - once we have a true "broadcast" version of xcast - really speed things up and improve scalability.
This commit was SVN r6542.
- we modex send and receive a structure containing the nic id and the endpoint id. On the
remote node we can reconstruct the endpoint_addr via mx_connect.
- allow several retries of mx_connect (up to 5 seconds ... soon to be an MCA param); see the
sketch below.
- correctly construct/destruct the internal objects.
- some other minor changes.
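Roughly what the reconnect looks like, assuming the standard mx_connect() signature; the
modex struct name (mx_modex_addr), the retry count, and the per-try timeout are
illustrative guesses, not the actual component code:

    #include <stdint.h>
    #include "myriexpress.h"        /* MX API header */

    /* Data exchanged through the modex: enough to rebuild the remote endpoint. */
    struct mx_modex_addr {          /* illustrative name */
        uint64_t nic_id;
        uint32_t endpoint_id;
    };

    /* Rebuild the peer's endpoint_addr from its modex data, retrying for roughly
     * 5 seconds total (the kind of limit that would become an MCA param). */
    static int connect_to_peer(mx_endpoint_t local_ep,
                               const struct mx_modex_addr *peer,
                               uint32_t filter,
                               mx_endpoint_addr_t *peer_addr)
    {
        const int max_retries = 5;                  /* assumed: 5 tries x 1000 ms */
        int i;

        for (i = 0; i < max_retries; ++i) {
            mx_return_t rc = mx_connect(local_ep, peer->nic_id, peer->endpoint_id,
                                        filter, 1000 /* ms */, peer_addr);
            if (MX_SUCCESS == rc) {
                return 0;                           /* endpoint_addr reconstructed */
            }
        }
        return -1;                                  /* peer never became reachable */
    }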
This commit was SVN r6535.
multiple components to share a single mpool module (e.g., the
ptl/btl and coll sm components).
- Re-tool the ptl, btl, and coll sm components to first look for the
target mpool module and create it if they don't find it (see the sketch below).
- coll sm component now correctly identifies when it is supposed to
run or not (i.e., if all the processes in the communicator are on
the same host). Now we just need to fill in some algorithms. :-)
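A sketch of the look-then-create pattern referenced above; mpool_module_t and
mpool_lookup_or_create are stand-ins for illustration, not the real mca_mpool interface:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for a shared mpool module. */
    typedef struct mpool_module {
        char                 name[64];
        struct mpool_module *next;
    } mpool_module_t;

    static mpool_module_t *mpool_modules = NULL;    /* modules created so far */

    /* The first component to ask for a given mpool (e.g. "sm") creates it; later
     * components find and reuse it, so the ptl/btl and coll sm share one module. */
    static mpool_module_t *mpool_lookup_or_create(const char *name)
    {
        mpool_module_t *m;
        for (m = mpool_modules; NULL != m; m = m->next) {
            if (0 == strcmp(m->name, name)) {
                return m;                           /* already created: share it */
            }
        }
        m = (mpool_module_t *) calloc(1, sizeof(*m));
        if (NULL == m) {
            return NULL;
        }
        snprintf(m->name, sizeof(m->name), "%s", name);
        m->next = mpool_modules;
        mpool_modules = m;
        return m;
    }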
This commit was SVN r6530.
corrected free and realloc in mpool. Added alloc_base to
mca_mpool_base_registration_t to be used as the actual alloc'd base address,
which may be different from the reported base address due to page alignment (see the
sketch below).
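A sketch of why both addresses are needed; registration_t here is a simplified stand-in
for mca_mpool_base_registration_t:

    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Simplified stand-in: 'base' is the page-aligned address reported to the
     * caller, 'alloc_base' is what was actually allocated and therefore what
     * must be passed to free()/realloc(). */
    typedef struct {
        void  *alloc_base;
        void  *base;
        size_t size;
    } registration_t;

    static int alloc_page_aligned(size_t size, registration_t *reg)
    {
        size_t page = (size_t) sysconf(_SC_PAGESIZE);
        void  *raw  = malloc(size + page);              /* over-allocate by one page */
        if (NULL == raw) {
            return -1;
        }
        reg->alloc_base = raw;                          /* remember the real pointer */
        reg->base = (void *) (((uintptr_t) raw + page - 1) & ~((uintptr_t) page - 1));
        reg->size = size;
        return 0;
    }

    static void free_registration(registration_t *reg)
    {
        free(reg->alloc_base);    /* NOT reg->base -- that may point inside the allocation */
    }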
This commit was SVN r6524.
IBM tests, but see the same segfaults / asserts in the sm btl.
* Add prepare_src implementation so that we can send multiple fragments of
large messages
* Add queuing of sends if either there are too many outstanding sends
(we have to limit this so that we don't have more sends pending than
we could get acks for) or if we get an ack with a 0 byte mlength,
which means the remote side dropped the message on us (see the sketch below).
Still need to valgrind to make sure I'm not leaking resources
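A sketch of the queuing/retry logic described above, with made-up names (frag_t,
post_send, handle_ack) and an assumed outstanding-send cap rather than the real sm btl
code:

    #include <stddef.h>

    /* Hypothetical fragment type with a queue link. */
    typedef struct frag {
        struct frag *next;
    } frag_t;

    static frag_t *pending_head = NULL, *pending_tail = NULL;   /* queued sends */
    static int sends_outstanding = 0;
    static const int max_outstanding = 8;   /* assumed cap: never more sends in
                                               flight than we can get acks for */

    /* Stub: the real code would post the fragment to the network here. */
    static void post_send(frag_t *frag) { (void) frag; }

    /* Either post the send, or queue it if too many are already in flight. */
    static void send_or_queue(frag_t *frag)
    {
        if (sends_outstanding >= max_outstanding) {
            frag->next = NULL;
            if (NULL == pending_tail) pending_head = frag;
            else pending_tail->next = frag;
            pending_tail = frag;
            return;
        }
        sends_outstanding++;
        post_send(frag);
    }

    /* An ack carrying mlength == 0 means the receiver dropped the message,
     * so the fragment must be retried rather than marked complete. */
    static void handle_ack(frag_t *frag, size_t mlength)
    {
        sends_outstanding--;
        if (0 == mlength) {
            send_or_queue(frag);                  /* remote side dropped it: retry */
        } else if (NULL != pending_head) {
            frag_t *next = pending_head;          /* a slot opened: drain the queue */
            pending_head = next->next;
            if (NULL == pending_head) pending_tail = NULL;
            send_or_queue(next);
        }
    }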
This commit was SVN r6508.
- Change ompi_proc_world() to only return the procs in this job (as
opposed to all of them)
- Add a subscription that fires during MPI_INIT (stg1) for figuring
out which procs are on my local node. Need to figure out what to do
in the esoteric cases -- but the obvious one (Red Storm), where
subscriptions are never fired, is ok, because by definition, no
other procs will be on my node, so their default value (not on my
node) is ok.
--> Need to have RHC check this code; it seems to work, but I think
I'm getting too much data back from the subscription.
- End result is that any proc that is on my node will have the
OMPI_PROC_FLAG_LOCAL bit set in its proc->proc_flags field (see the
sketch below).
- Added/corrected a few comments in proc.h.
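For example, a consumer (say, the sm btl or coll sm) can now do something like the
following; the helper function and the header path are illustrative, but the flag and
field names are the ones mentioned above:

    #include <stddef.h>
    #include "proc/proc.h"       /* assumed header path for ompi_proc_t */

    /* Count the procs that share my node, using the OMPI_PROC_FLAG_LOCAL bit
     * that MPI_INIT now sets from the subscription data. */
    static size_t count_local_procs(ompi_proc_t **procs, size_t nprocs)
    {
        size_t i, nlocal = 0;
        for (i = 0; i < nprocs; ++i) {
            if (procs[i]->proc_flags & OMPI_PROC_FLAG_LOCAL) {
                nlocal++;            /* e.g. a candidate for the sm btl / coll sm */
            }
        }
        return nlocal;
    }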
This commit was SVN r6507.
think the data is contiguous - and the convertor routine
you've changed this to use doesn't support returning the correct
offset into the user buffer when a NULL address is provided
in the iovec array
This commit was SVN r6496.
* Add ability to completely disable libltdl (the dlopen code to load
dynamic shared objects) to configure: --disable-dlopen
* Added MCA param (component_disable_dlopen) to disable DSO loading
at runtime
* Made the event library behave in some not-completely-erroneous way
on platforms where it has absolutely no eventops support (ie, no
select, poll, or epoll)
* Disabled orte_wait, opal_few, and opal_daemon_init code on
platforms without fork/waitpid support. All non-init functions
will return OMPI_ERR_NOT_SUPPORTED (see the sketch below)
* Disable orteprobe tool when fork or pipe aren't supported
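The stub pattern looks roughly like this; the function name, the guard macro names, and
the header path are assumptions for illustration:

    #include "ompi_config.h"          /* configure results (HAVE_FORK, HAVE_WAITPID assumed) */
    #include "include/constants.h"    /* assumed location of OMPI_SUCCESS / OMPI_ERR_NOT_SUPPORTED */

    /* Hypothetical non-init function following the pattern described above. */
    #if defined(HAVE_FORK) && defined(HAVE_WAITPID)
    int example_wait_for_child(int pid)
    {
        /* real fork/waitpid-based implementation would live here */
        (void) pid;
        return OMPI_SUCCESS;
    }
    #else
    int example_wait_for_child(int pid)
    {
        (void) pid;
        return OMPI_ERR_NOT_SUPPORTED;   /* platform can't support this operation */
    }
    #endif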
This commit was SVN r6490.
Red Storm. Add stub functions to ompi_config_bottom.h when they are
around
* Add protection for a bunch of #include <netinet/in.h>s (see the example below)
* Fix up the Portals BTL so that it compiles on Red Storm and has the
right mojo for initialization on Red Storm
* Add some important comments to ompi_check_package and mvapi configures
* Add support for platforms without getpwuid() (aka, Red Storm).
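The #include protection mentioned above is the usual configure-driven guard, e.g.:

    #include "ompi_config.h"     /* provides the HAVE_* results from configure */

    #ifdef HAVE_NETINET_IN_H
    #include <netinet/in.h>      /* only included where configure found it (not on Red Storm) */
    #endif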
This commit was SVN r6478.
all the time. There is a performance problem (it's a lot slower than the optimized versions),
but otherwise it would never get tested intensively.
This commit was SVN r6467.
sockaddr_in - seems to be a good indicator)
* disable util/if code if no inet devices (again, no sockaddr_in)
* add enable/disable flag to disable stacktrace pretty-print code
(defaults to enabled). Seems there's something funky going on with
the preprocessor on Red Storm that was causing problems - this was
the easiest fix
* clean up a bunch of the configure.m4 files to remove bogus comments,
properly comment them, fix the dumb logic for happy/unhappy
* Create a macro for testing both header and library for a package,
since we seem to do this kind of test quite often. Handles the
-I and -L search paths properly (including stripping out /usr and
/usr/local if not needed)
* Converted mvapi components to configure.m4, using the nice new
ompi_check_package macro (above)
This commit was SVN r6454.
threads (basically, same as before, but we now link the right thread
libraries).
* Add disable-io-romio flag to disable compiling ROMIO
* Migrate the mvapi btl from configure.stub to configure.m4
This commit was SVN r6453.
was that some datatypes can be used to create additional datatypes. In such cases
they should not get destroyed, otherwise the user will not be able to retrieve how a
datatype was created. So I decided to never increase the reference count for any predefined
datatype (as we already know they will never get destroyed, except at finalize). For
the others, every time a datatype is used by another one I increase its reference count.
When I destroy a datatype, I parse its internal args structure and release one reference
on every non-predefined datatype used by it. Thus the datatypes get cleaned up. The main
problem with this approach is the recursion, as this function can trigger another call
to itself (but I don't think it will be an issue). See the sketch below.
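A toy model of the scheme; my_datatype_t and DT_FLAG_PREDEFINED here are stand-ins, not
the actual ddt engine structures:

    /* Toy stand-ins -- the real code lives in the ddt engine. */
    #define DT_FLAG_PREDEFINED 0x0001           /* assumed flag name */

    typedef struct my_datatype {
        int                   flags;
        int                   ref_count;
        int                   nargs;            /* datatypes this one was built from */
        struct my_datatype  **args;
    } my_datatype_t;

    /* Building a new datatype: bump the count only on non-predefined inputs. */
    static void datatype_retain_arg(my_datatype_t *arg)
    {
        if (!(arg->flags & DT_FLAG_PREDEFINED)) {
            arg->ref_count++;
        }
    }

    /* Destroying a datatype: walk the saved args and drop one reference on each
     * non-predefined constituent -- this is where the recursion comes from. */
    static void datatype_release(my_datatype_t *dt)
    {
        int i;
        if (dt->flags & DT_FLAG_PREDEFINED) {
            return;                             /* predefined types live until finalize */
        }
        if (--dt->ref_count > 0) {
            return;                             /* someone still uses this datatype */
        }
        for (i = 0; i < dt->nargs; ++i) {
            datatype_release(dt->args[i]);
        }
        /* free dt's own storage here */
    }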
This commit was SVN r6450.
appropriate spot and also added a check for the libsysfs library required by openmpi. Modified the mvapi configure.stub to use AC_TRY_LINK for
pthreads.
This commit was SVN r6441.
(which it is) and not romio itself
* Work around dark, evil linker voodoo that prevented building both shared
and static libraries on OS X. The global variables in ad_init.c were
not initialized, and so were marked as tentative definitions, which caused
much pain for the linker later on. Initializing them makes them
actual definitions and the problem goes away. I hate linkers.
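For reference, the difference is just whether the global has an initializer; the
variable name below is made up, but the fix applied to ad_init.c was of exactly this
one-line form:

    /* Tentative definition: no initializer.  The OS X linker of the day handled
     * these "common" symbols badly when building shared and static libraries
     * at the same time. */
    int ADIOI_example_flag;

    /* Actual definition: initialized, so it gets a real slot in the data
     * segment and the problem goes away. */
    int ADIOI_example_flag_fixed = 0;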
This commit was SVN r6439.
present then all PTLs not in the include list are marked as not available. If the exclude
list is present then all PTLs in this list are marked as not available (see the sketch below).
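A sketch of the filtering rule (hypothetical helper; naive comma-list matching for
brevity):

    #include <string.h>

    /* Returns 1 if the named PTL should be considered available, given optional
     * comma-separated include/exclude lists (NULL when the MCA param is unset).
     * Note: strstr() is a naive substring match, used here only for brevity. */
    static int ptl_is_available(const char *name,
                                const char *include_list,
                                const char *exclude_list)
    {
        if (NULL != include_list) {
            /* include list present: only PTLs named in it stay available */
            return NULL != strstr(include_list, name);
        }
        if (NULL != exclude_list) {
            /* exclude list present: PTLs named in it are marked unavailable */
            return NULL == strstr(exclude_list, name);
        }
        return 1;   /* neither list given: everything is available */
    }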
This commit was SVN r6438.
is a lot more difficult than a PTL, and it can adapt its behavior to the level of threading required
by the user. In this case the behavior is the priority of the PML. Therefore this information is never
available before the init function (of the PML) is called. So I tried to keep nearly the same structure
as before, with one change. When a PML gets initialized it does not necessarily mean it has been
selected, so it does not mean it has to create all of its internal structures (and select the PTLs and
all that stuff). That can all be done later, when the PML knows it has definitively been selected
(when the enable function is called with the argument set to true). Thus, when closing a PML, one
has to check whether the PML has been selected before trying to clean up its internals (see the
sketch below).
I had to change the MPI_Init function to allow the PML to be enabled before we start adding procs to it.
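A sketch of the selected-vs-initialized distinction; the state struct and function names
are made up, not the actual PML component code:

    #include <stdbool.h>

    /* Hypothetical PML bookkeeping reflecting the scheme above. */
    typedef struct {
        bool initialized;  /* set by the component's init function */
        bool selected;     /* set when enable(true) is called on the winner */
    } example_pml_state_t;

    static example_pml_state_t pml_state = { false, false };

    /* enable(true) is the point where the PML knows it has definitively won the
     * selection, so only then does it build its internal structures / PTL lists. */
    static int example_pml_enable(bool enable)
    {
        if (enable && !pml_state.selected) {
            pml_state.selected = true;
            /* ... allocate internal structures, select PTLs, etc. ... */
        }
        return 0;
    }

    /* close must not tear down structures that were never created. */
    static int example_pml_close(void)
    {
        if (pml_state.selected) {
            /* ... free internal structures, release PTLs, etc. ... */
            pml_state.selected = false;
        }
        return 0;
    }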
This commit was SVN r6434.
all required attributes are set even if the trigger never fires (the
trigger can update the value, if needed. This is all in MPI_Init, so
that does not violate the standard)
This commit was SVN r6427.
components to succeed with --enable-dist. Instead, just add them to
all_components and make dist will still work - we're going to stamp out
the Makefiles no matter what
* Add missing header to ob1 pml for make dist
* Clean up the Portals BTL configure code
This commit was SVN r6413.
frameworks, and components without configure scripts instead of
hard-coded shell variables (for projects and frameworks) and
shell variable building (for components).
* Add 3rd category of component configuration (in addition to configure
scripts and no-configured components): configure.m4 components. These
components can only be built as part of OMPI (like no-configure), but
can provide an m4 file that is run as part of the main configure
script. These macros can set whether the component should be built,
along with just about any other configuration wanted. More care must
be taken compared to configure components, as doing things like setting
variables or calling AC_MSG_ERROR now affects the top-level configure
script (so calling AC_MSG_ERROR if your component can't configure
probably isn't what you want)
* Added support to autogen.sh for the configure.m4-style components,
as well as building up the m4_define lists ompi_mca.m4 now expects
* Updated a number of macros to be more config.cache friendly (both
so that config.cache can be used and so the test can be quickly
run multiple times in the same configure script):
- ompi_config_asm
- c_weak_symbols
- c_get_alignment
* Added new macros to be shared when configuring components:
- ompi_objc.m4 (this actually provides AC_PROG_OBJC - don't ask...)
- ompi_check_xgrid
- ompi_check_tm
- ompi_check_bproc
* Updated a number of components to use configure.m4 instead of
configure.stub
- btl portals
- io romio
- tm ras and pls
- bjs, lsf_bproc ras and bproc_seed pls
- xgrid ras and pls
- null iof (used by tm)
This commit was SVN r6412.
* mpi_show_mca_params
If set to true, this turns on the dumping of all MCA parameters when MPI_INIT is called.
Only the rank 0 process will print the parameters.
* mpi_show_mca_params_file
(This value is only used if mpi_show_mca_params is set to true.) If this value is non-NULL
it specifies the file to put the dump into. This file can then be used as input to mpirun
for debugging purposes. If this value is not set (and mpi_show_mca_params is set) then
the parameters are dumped to stdout.
This commit was SVN r6401.
the external32 conversions (as specified in the MPI standard); the other one is the local
convertor that can be used for the MPI_Pack and MPI_Unpack functions (see the example below).
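For example, the local convertor is what sits behind a normal MPI_Pack/MPI_Unpack round
trip like this (standard MPI calls, nothing Open MPI specific):

    #include <mpi.h>

    /* Pack 4 ints into a contiguous buffer using the local representation,
     * then unpack them again; external32 is the portable on-the-wire format. */
    void pack_roundtrip(MPI_Comm comm)
    {
        int  values[4] = { 1, 2, 3, 4 };
        int  out[4];
        char buffer[256];
        int  position = 0;

        MPI_Pack(values, 4, MPI_INT, buffer, (int) sizeof(buffer), &position, comm);

        /* reset 'position'; MPI_Unpack advances it as it consumes the buffer */
        position = 0;
        MPI_Unpack(buffer, (int) sizeof(buffer), &position, out, 4, MPI_INT, comm);
    }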
This commit was SVN r6375.
- After long discussions and ruminations on how we run components in
LAM/MPI, made the decision that, by default, all components included
in Open MPI will use the version number of their parent project
(i.e., OMPI or ORTE). They are certainly free to use a different
number, but this simplification makes the common cases easy:
- components are only released when the parent project is released
- it is easy (trivial?) to distinguish which version of a component goes
with which version of the parent project
- removed all autogen/configure code for templating the version .h
file in components
- made all ORTE components use ORTE_*_VERSION for version numbers
- made all OMPI components use OMPI_*_VERSION for version numbers
- removed all VERSION files from components
- configure now displays OPAL, ORTE, and OMPI version numbers
- ditto for ompi_info
- right now, faking it -- OPAL and ORTE and OMPI will always have the
same version number (i.e., they all come from the same top-level
VERSION file). But this paves the way for the Great Configure
Reorganization, where, among other things, each project will have
its own version number.
So all in all, we went from a boatload of version numbers to
[effectively] three. That's pretty good. :-)
This commit was SVN r6344.
* rename ompi_malloc to opal_malloc
* rename ompi_numtostr to opal_numtostr
* start of rename of ompi_environ to opal_environ
This commit was SVN r6332.
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few
This commit was SVN r6330.
- move mpool and allocator frameworks back to ompi (from opal)
- specialize the ompi_free_list class to use an mpool instance
- un-specialize opal_free_list to *not* use mpool; just use malloc/free
This commit was SVN r6292.