* change all the opal_output_verbose calls in the critical path to
OPAL_OUTPUT_VERBOSE so that they are pre-processed out if debugging
is not enabled
* remove stub code
This commit was SVN r6564.
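A minimal sketch of how a verbose-output macro can be pre-processed out when debugging is disabled. The names `SKETCH_ENABLE_DEBUG`, `sketch_output_verbose`, and `SKETCH_OUTPUT_VERBOSE` are hypothetical stand-ins; this is not the actual OPAL definition, only an illustration of the technique the entry above describes.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical build switch; stands in for the project's debug setting. */
#ifndef SKETCH_ENABLE_DEBUG
#define SKETCH_ENABLE_DEBUG 0
#endif

static int sketch_calls = 0;  /* counts calls that actually reach the function */

/* Hypothetical stand-in for a verbose output function. */
void sketch_output_verbose(int level, const char *fmt, ...);

void sketch_output_verbose(int level, const char *fmt, ...)
{
    va_list ap;
    ++sketch_calls;
    fprintf(stderr, "[%d] ", level);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}

/* The macro takes one parenthesized argument list, so the whole call,
 * including evaluation of its arguments, vanishes in non-debug builds. */
#if SKETCH_ENABLE_DEBUG
#define SKETCH_OUTPUT_VERBOSE(args) sketch_output_verbose args
#else
#define SKETCH_OUTPUT_VERBOSE(args) ((void)0)
#endif

/* A function on the "critical path": no output cost unless debugging is on. */
static int critical_path(int x)
{
    SKETCH_OUTPUT_VERBOSE((10, "x = %d", x));
    return x + 1;
}
```

Note the double parentheses at the call site; they let a macro with a single parameter carry a variable-length argument list without relying on variadic macros.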
"run-time" api for the reference implementation.
* Make the non-modex utcp and redstorm compat code do the same things in
the same order
This commit was SVN r6556.
associated with the descriptor is ours again once the callback function
returns. Make it so - probably can optimize out some of the stuff I
did when I mistakenly thought the descriptor free() was called on the
passed descriptor
* Fix some dumb accounting errors with MD usage for unexpected receives
This commit was SVN r6555.
Fixed receive descriptor counts that limited mvapi and openib to 2 procs.
Begin porting error messages to use the BTL_ERROR macro.
This commit was SVN r6554.
1. Modify the registry to eliminate redundant data copying for startup messages.
2. Revise the subscription/trigger system to avoid redundant storage of triggers and subscriptions. This dramatically reduces the search time when a registry action occurs - to illustrate the point, there are now only a handful of triggers on the system for each job. Before, there were a handful of triggers for each PROCESS in the job, all of which had to be checked every time something happened on the registry. This is much, much faster now.
3. Update all subscriptions to the new format. There are now "named" subscriptions - this allows you to "name" a subscription that all the processes will be using. The first one to hit the registry actually defines the subscription. From then on, any subsequent "subscribes" to the same name just cause that process to "attach" to the existing subscription. This keeps the number of subscriptions being tracked by the registry to a minimum, while ensuring that each process still gets notified.
4. Do the same for triggers.
Also fixed a duplicate subscription problem that was causing people to receive data equal to the number of processes times the data they should have received from a trigger/subscription. Sorry about that... :-( ...but it's all better now!
Uncovered a situation where the modex data seems to be getting entered on the registry a second time - the latter time coming after the compound command has been "fired", thereby causing all the subscriptions to fire. Asked Tim and Jeff to look into this.
Second phase of the changes will involve modifying the xcast system so that the same message gets sent to all processes. This will further reduce the message traffic, and - once we have a true "broadcast" version of xcast - really speed things up and improve scalability.
This commit was SVN r6542.
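The "named" subscription behavior described in item 3 above can be sketched as a lookup-or-create table: the first subscriber under a name defines the entry, and later subscribers only attach to it. All types and function names here (`sub_table_t`, `subscribe_named`, the fixed capacities) are hypothetical and are not taken from the registry code.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SUBS  16   /* hypothetical capacity limits for the sketch */
#define MAX_PROCS 64

typedef struct {
    char   name[32];
    int    attached[MAX_PROCS];  /* ids of processes to notify */
    size_t num_attached;
} named_sub_t;

typedef struct {
    named_sub_t subs[MAX_SUBS];
    size_t      num_subs;
} sub_table_t;

/* Returns 1 if this call defined the subscription, 0 if it attached to an
 * existing one, and -1 on error (table full, name too long, too many procs). */
static int subscribe_named(sub_table_t *t, const char *name, int proc)
{
    named_sub_t *s = NULL;
    int created = 0;

    for (size_t i = 0; i < t->num_subs; ++i) {
        if (strcmp(t->subs[i].name, name) == 0) {
            s = &t->subs[i];
            break;
        }
    }
    if (s == NULL) {
        /* First process to use this name: define the subscription. */
        if (t->num_subs == MAX_SUBS || strlen(name) >= sizeof(t->subs[0].name)) {
            return -1;
        }
        s = &t->subs[t->num_subs++];
        strcpy(s->name, name);
        s->num_attached = 0;
        created = 1;
    }
    if (s->num_attached == MAX_PROCS) {
        return -1;
    }
    /* Every subscriber, first or not, is recorded so it still gets notified. */
    s->attached[s->num_attached++] = proc;
    return created;
}
```

With this shape the number of tracked subscriptions grows with the number of distinct names rather than with the number of processes, which is the effect the entry describes.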
- we modex send and receive a structure containing the nid id and the endpoint id. On the
remote node we can recompose the endpoint_addr via mx_connect.
- allow several retries of mx_connect (up to 5 seconds ... soon to be an MCA param).
- correctly construct/destruct the internal objects.
- some other minor changes.
This commit was SVN r6535.
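A rough sketch of a bounded connect-retry loop like the one mentioned above. It does not call the MX library; the `connect_fn` callback type, `connect_with_retry`, and the `fail_n_times` test helper are all hypothetical, and the split into a per-attempt timeout and a total budget is an assumption made for illustration.

```c
/* Hypothetical connect callback: returns 0 on success, nonzero on failure.
 * timeout_ms is the time the callback may spend on one attempt. */
typedef int (*connect_fn)(void *ctx, unsigned timeout_ms);

/* Retry until the total time budget is used up. Returns the number of
 * attempts made on success, or -1 if every attempt failed. */
static int connect_with_retry(connect_fn fn, void *ctx,
                              unsigned per_try_ms, unsigned budget_ms)
{
    unsigned spent = 0;
    int attempts = 0;

    if (per_try_ms == 0) {
        return -1;  /* would never consume the budget */
    }
    while (spent < budget_ms) {
        ++attempts;
        if (fn(ctx, per_try_ms) == 0) {
            return attempts;
        }
        spent += per_try_ms;  /* assume a failed attempt used its full timeout */
    }
    return -1;
}

/* Demo callback: fails *ctx times, then succeeds. */
static int fail_n_times(void *ctx, unsigned timeout_ms)
{
    int *remaining = ctx;
    (void)timeout_ms;
    if (*remaining > 0) {
        --*remaining;
        return -1;
    }
    return 0;
}
```

Passing the budget as a parameter keeps the door open for the run-time setting the entry says is planned.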
multiple components to share a single mpool module (e.g., the
ptl/btl and coll sm components).
- Re-tool the ptl, btl, and coll sm components to first look for the
target mpool module, and if they don't find it, to create it.
- coll sm component now correctly identifies when it is supposed to
run or not (i.e., if all the processes in the communicator are on
the same host). Now we just need to fill in some algorithms. :-)
This commit was SVN r6530.
corrected free and realloc in mpool. Added alloc_base to
mca_mpool_base_registration_t to be used as the actual alloc'd base address,
which may be different from the reported base address due to page alignment.
This commit was SVN r6524.
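To illustrate why a separate `alloc_base` is useful, here is a small sketch of aligned allocation that keeps both the pointer returned by the allocator and the aligned address handed to the caller. The struct `sketch_registration_t` and both functions are hypothetical; only the field name `alloc_base` comes from the entry above.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void  *alloc_base;  /* pointer returned by malloc; the one to free */
    void  *base;        /* aligned address reported to the caller */
    size_t size;
} sketch_registration_t;

/* Allocate `size` bytes aligned to `align` (a nonzero power of two).
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
static int sketch_alloc(sketch_registration_t *reg, size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0) {
        return -1;
    }
    if (size > SIZE_MAX - (align - 1)) {
        return -1;  /* over-allocation would overflow */
    }
    reg->alloc_base = malloc(size + align - 1);
    if (reg->alloc_base == NULL) {
        return -1;
    }
    /* Round the raw address up to the next multiple of align. */
    uintptr_t raw = (uintptr_t)reg->alloc_base;
    reg->base = (void *)((raw + align - 1) & ~(uintptr_t)(align - 1));
    reg->size = size;
    return 0;
}

/* Free through alloc_base: passing the aligned base to free() would be
 * undefined behavior whenever the two addresses differ. */
static void sketch_free(sketch_registration_t *reg)
{
    free(reg->alloc_base);
    reg->alloc_base = NULL;
    reg->base = NULL;
    reg->size = 0;
}
```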
IBM tests, but see same segfaults / asserts in sm btl.
* Add prepare_src implementation so that we can send multiple fragments of
large messages
* Add queuing of sends if either there are too many outstanding sends
(we have to limit this so that we don't have more sends pending than
we could get acks for) or if we get an ack with a 0 byte mlength,
which means the remote side dropped the message on us.
Still need to valgrind to make sure I'm not leaking resources
This commit was SVN r6508.
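The send-queuing rule above can be sketched roughly as follows. Everything here is hypothetical (the limits, `send_state_t`, `try_send`, `on_ack`); in particular, treating a zero-length ack as "put the same send back on the queue" is one reading of the entry, not a statement about the actual BTL code.

```c
#include <stddef.h>

#define MAX_OUTSTANDING 2  /* hypothetical cap on un-acked sends */
#define QUEUE_CAP       8  /* hypothetical pending-queue capacity */

typedef struct {
    int    outstanding;       /* sends posted but not yet acked */
    int    queue[QUEUE_CAP];  /* ids of sends waiting to be posted (FIFO ring) */
    size_t head, count;
} send_state_t;

static int enqueue(send_state_t *s, int id)
{
    if (s->count == QUEUE_CAP) {
        return -1;
    }
    s->queue[(s->head + s->count) % QUEUE_CAP] = id;
    s->count++;
    return 0;
}

/* Returns 1 if the send was posted, 0 if it was queued, -1 if the queue is full. */
static int try_send(send_state_t *s, int id)
{
    if (s->outstanding < MAX_OUTSTANDING) {
        s->outstanding++;
        return 1;
    }
    return enqueue(s, id) == 0 ? 0 : -1;
}

/* Handle the ack for send `id`. An mlength of 0 is taken to mean the remote
 * side dropped the message, so the send is queued again. Afterwards, post as
 * many queued sends as the limit allows. Returns the number posted, or -1. */
static int on_ack(send_state_t *s, int id, size_t mlength)
{
    int posted = 0;

    s->outstanding--;
    if (mlength == 0 && enqueue(s, id) != 0) {
        return -1;
    }
    while (s->count > 0 && s->outstanding < MAX_OUTSTANDING) {
        s->head = (s->head + 1) % QUEUE_CAP;  /* pop the oldest queued send */
        s->count--;
        s->outstanding++;
        posted++;
    }
    return posted;
}
```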
- Change ompi_proc_world() to only return the procs in this job (as
opposed to all of them)
- Add a subscription that fires during MPI_INIT (stg1) for figuring
out which procs are on my local node. Need to figure out what to do
in the esoteric cases -- but the obvious one (Red Storm), where
subscriptions are never fired, is ok, because by definition, no
other procs will be on my node, so their default value (not on my
node) is ok.
--> Need to have RHC check this code; it seems to work, but I think
I'm getting too much data back from the subscription.
- End result is that any proc that is on my node will have its
OMPI_PROC_FLAG_LOCAL bit set on its proc->proc_flags field.
- Added/corrected a few comments in proc.h.
This commit was SVN r6507.
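A small sketch of the flag-setting result described above: procs start with the flag clear (the "not on my node" default) and get the bit set only when they are found to share the local node. The struct, the flag value, and the node-name comparison are hypothetical; only the idea of a LOCAL bit in a `proc_flags` field comes from the entry.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_PROC_FLAG_LOCAL 0x0001u  /* hypothetical bit value */

typedef struct {
    const char *node_name;   /* hypothetical: how the sketch identifies a node */
    uint32_t    proc_flags;  /* zero by default, i.e. "not on my node" */
} sketch_proc_t;

/* Set the LOCAL bit on every proc whose node matches my_node. */
static void mark_local(sketch_proc_t *procs, size_t n, const char *my_node)
{
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(procs[i].node_name, my_node) == 0) {
            procs[i].proc_flags |= SKETCH_PROC_FLAG_LOCAL;
        }
    }
}

static int is_local(const sketch_proc_t *p)
{
    return (p->proc_flags & SKETCH_PROC_FLAG_LOCAL) != 0;
}
```

If `mark_local` is never called, as in the case where subscriptions never fire, every proc keeps the default and `is_local` reports false for all of them.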
think the data is contiguous - and the convertor routine
you've changed this to doesn't support returning the correct
offset into the user buffer when a NULL address is provided
in the iovec array
This commit was SVN r6496.
* Add ability to completely disable libltdl (the dlopen code to load
dynamic shared objects) to configure: --disable-dlopen
* Added MCA param (component_disable_dlopen) to disable DSO loading
at runtime
* Made the event library behave in some not-completely-erroneous way
on platforms where it has absolutely no eventops support (i.e., no
select, poll, or epoll)
* Disabled orte_wait, opal_few, and opal_daemon_init code on
platforms without fork or waitpid support. All non-init functions
will return OMPI_ERR_NOT_SUPPORTED
* Disable orteprobe tool when fork or pipe aren't supported
This commit was SVN r6490.
Red Storm. Add stub functions to ompi_config_bottom.h when they are
around
* Add protection for a bunch of #include <netinet/in.h>s
* Fix up the Portals BTL so that it compiles on Red Storm and has the
right mojo for initialization on Red Storm
* Add some important comments to ompi_check_package and mvapi configures
* Add support for platforms without getpwuid() (aka, Red Storm).
This commit was SVN r6478.
all the time. There is a performance problem (it's a lot slower than the optimized versions)
but otherwise it will never get tested intensively.
This commit was SVN r6467.
sockaddr_in - seems to be a good indicator)
* disable util/if code if no inet devices (again, no sockaddr_in)
* add enable/disable flag to disable stacktrace pretty-print code
(defaults to enabled). Seems there's something funky going on with
the preprocessor on Red Storm that was causing problems - this was
the easiest fix
* clean up a bunch of the configure.m4 files to remove bogus comments,
properly comment them, fix the dumb logic for happy/unhappy
* Create a macro for testing both header and library for a package,
since we seem to do this kind of test quite often. Handles the
-I and -L search paths properly (including stripping out /usr and
/usr/local if not needed)
* Converted mvapi components to configure.m4, using the nice new
ompi_check_package macro (above)
This commit was SVN r6454.
threads (basically, same as before, but we now link the right thread
libraries).
* Add disable-io-romio flag to disable compiling ROMIO
* Migrate the mvapi btl from configure.stub to configure.m4
This commit was SVN r6453.
was that some datatypes can be used in order to create additional datatypes. In such cases
they should not get destroyed; otherwise the user will not be able to retrieve how a
datatype was created. So I decided to never increase the reference count for any predefined
datatypes (as we already know they will never get destroyed, except on finalize). For
the others, every time a datatype is used by another one I increase the reference count.
When I destroy a datatype, I parse the internal args structure and release one ref count
on all non-predefined datatypes used by this one. Thus the datatypes get cleaned up. The main
problem with this approach is the recursion, as this function can trigger another call
to itself (but I don't think it will be an issue).
This commit was SVN r6450.
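The reference-counting scheme described above can be sketched like this. The type `sketch_dtype_t` and its functions are hypothetical and much simpler than a real datatype engine; the sketch only shows predefined types being left uncounted and the recursive release of non-predefined constituents.

```c
#include <stdlib.h>

#define MAX_ARGS 4  /* hypothetical limit on constituent datatypes */

typedef struct sketch_dtype {
    int predefined;                       /* predefined types are never counted or freed */
    int refcount;
    struct sketch_dtype *args[MAX_ARGS];  /* datatypes this one was built from */
    int num_args;
} sketch_dtype_t;

static int sketch_live = 0;  /* number of heap-allocated datatypes still alive */

/* Build a datatype from others; each non-predefined constituent gains a reference. */
static sketch_dtype_t *dtype_create(sketch_dtype_t **args, int num_args)
{
    if (num_args < 0 || num_args > MAX_ARGS) {
        return NULL;
    }
    sketch_dtype_t *d = calloc(1, sizeof(*d));
    if (d == NULL) {
        return NULL;
    }
    d->refcount = 1;
    d->num_args = num_args;
    for (int i = 0; i < num_args; ++i) {
        d->args[i] = args[i];
        if (!args[i]->predefined) {
            args[i]->refcount++;
        }
    }
    sketch_live++;
    return d;
}

/* Drop one reference. When the count reaches zero, release one reference on
 * each non-predefined constituent (this is the recursive step), then free. */
static void dtype_release(sketch_dtype_t *d)
{
    if (d->predefined) {
        return;
    }
    if (--d->refcount > 0) {
        return;
    }
    for (int i = 0; i < d->num_args; ++i) {
        dtype_release(d->args[i]);
    }
    free(d);
    sketch_live--;
}
```

The recursion depth is bounded by how deeply datatypes are nested inside one another, which is consistent with the expectation above that it will not be a problem in practice.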
appropriate spot and also added a check for the libsysfs library required by openmpi. Modified the mvapi configure.stub to use AC_TRY_LINK for
pthreads.
This commit was SVN r6441.
(which it is) and not romio itself
* Work around dark, evil linker voodoo that prevented building both shared
and static libraries on OS X. The global variables in ad_init.c were
not initialized, and so were marked as tentative definitions, which caused
much pain for the linker later on. Initializing them makes them
actual definitions and the problem goes away. I hate linkers.
This commit was SVN r6439.
present then all PTLs not in the include list are marked as not available. If the exclude
list is present then all PTLs in this list are marked as not available.
This commit was SVN r6438.
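A minimal sketch of the include/exclude filtering described above, assuming the truncated first sentence refers to an include list being present. The names `sketch_ptl_t` and `apply_lists` are hypothetical, and the sketch simply applies each list independently; whether both lists may be given at once is not stated in the entry.

```c
#include <string.h>

typedef struct {
    const char *name;
    int         available;
} sketch_ptl_t;

static int in_list(const char *name, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(name, list[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* A NULL list pointer means "this list is not present". */
static void apply_lists(sketch_ptl_t *ptls, size_t n,
                        const char *const *include, size_t num_include,
                        const char *const *exclude, size_t num_exclude)
{
    for (size_t i = 0; i < n; ++i) {
        /* Include list present: anything not named in it is unavailable. */
        if (include != NULL && !in_list(ptls[i].name, include, num_include)) {
            ptls[i].available = 0;
        }
        /* Exclude list present: anything named in it is unavailable. */
        if (exclude != NULL && in_list(ptls[i].name, exclude, num_exclude)) {
            ptls[i].available = 0;
        }
    }
}
```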
is a lot more difficult than a PTL, and it can adapt its behavior to the level of threading required
by the user. In this case the behavior is the priority of the PML. Therefore this information is never
available before the init function (of the PML) is called. So I tried to keep nearly the same structure
as it was before, with one change. When a PML gets initialized it does not necessarily mean it has been
selected, so it does not mean it has to create all its internal structures (and select the PTL and
all this stuff). This can all be done later, when a PML knows that it has definitively been selected
(when the enable function is called with the argument set to true). Thus, when closing a PML,
one has to check whether the PML has been selected before trying to clean up the internals.
I had to change the MPI_Init function to allow the PML to be enabled before we start adding procs inside.
This commit was SVN r6434.
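The init/enable/close split described above can be sketched as follows. The struct and function names are hypothetical and the "internals" are just a placeholder allocation; the point is only that init does no heavy setup, enable(true) does it, and close checks whether the component was ever selected.

```c
#include <stdlib.h>

typedef struct {
    int  initialized;  /* init was called (the component is a candidate) */
    int  enabled;      /* enable(true) was called (the component was selected) */
    int *internals;    /* stands in for the real internal structures */
} sketch_pml_t;

/* Init only records candidacy; nothing is allocated yet. */
static int sketch_pml_init(sketch_pml_t *pml)
{
    pml->initialized = 1;
    pml->enabled = 0;
    pml->internals = NULL;
    return 0;
}

/* Called with enable != 0 only on the selected component. */
static int sketch_pml_enable(sketch_pml_t *pml, int enable)
{
    if (!enable || pml->enabled) {
        return 0;
    }
    pml->internals = calloc(16, sizeof(int));
    if (pml->internals == NULL) {
        return -1;
    }
    pml->enabled = 1;
    return 0;
}

/* Returns 1 if internals were cleaned up, 0 if there was nothing to clean. */
static int sketch_pml_close(sketch_pml_t *pml)
{
    int cleaned = 0;
    if (pml->enabled) {
        free(pml->internals);
        pml->internals = NULL;
        pml->enabled = 0;
        cleaned = 1;
    }
    pml->initialized = 0;
    return cleaned;
}
```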
all required attributes are set even if the trigger never fires (the
trigger can update the value, if needed). This is all in MPI_Init, so
that does not violate the standard.
This commit was SVN r6427.