all processes call MPI_Gatherv(MPI_IN_PLACE...) because IN_PLACE is
only allowed to be used at the root. Non-root processes must use
their receive buf as the send buf.
This commit was SVN r7363.
- added relevant logic for everything except
mca_coll_basic_reduce_log_intra() -- need some help from George /
Edgar on this one...
- replaced ompi_ddt_sndrcv() with ompi_ddt_copy_content_same_ddt()
where relevant
- removed some "if (size > 1)" conditionals, because the self coll
module will always be chosen for collectives where size==1
Waiting for BA's tests to check the validity of this IN_PLACE stuff.
We'll see how it goes!
This commit was SVN r7351.
-added some alltoall calls (pairwise checked ok, bruck testing)
-changes in use of data hung of communicator
-making sendrecv call a true inline function
-more use ompi_ddt routines
This commit was SVN r7337.
Correct the spring of the vpid problem (similar to the one in the SM PTL).
Add one more argument to the MCA_BTL_SM_FIFO_WRITE macro who will get passed down to the
MCA_BTL_SM_SIGNAL_PEER macro to allow it to have the fifo_fd file descriptor.
This commit was SVN r7305.
a constructor, like the rest of the code base
- Convert usage in the tree to use the constructor to zero out an
instance of opal_output_stream_t
- Still need to re-enable output files
This commit was SVN r7253.
problem because Autoconf replaced the "#undef ..." with "#define
...". Fix this by not putting the "#undef ..." statement directly in
romioconf.h[.in] -- but rather having romioconf.h[.in] #include
romioconf-undefs.h, which has the #undef statements.
This commit was SVN r7252.
than other Unix OS's and does not accept to be called with any arguments. Therefore,
we need these files in order to succesfully compile even if they are empty.
This commit was SVN r7220.
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
number to be set at autoconf time (instead of at configure time, as
it was before). Set the version number, minus the subversion r number,
at autoconf time. Override the internal variables to include the r
number (if needed) at configure time. Basically, the right thing
should always happen. The only place it might not is the version
reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
in the directory containing source files, even if the Makefile.am is
in another directory. This should start making it feasible to
reduce the number of Makefile.am files we have in the tree, which
will greatly reduce the time to run autogen and configure.
This commit was SVN r7211.
- finally added "in use" flags -- one flag protects a set of segments
- these flags now used in bcast to protect (for example) when a
message is so long that the root loops around the segments and has
to re-use old segments -- now it knows that it has to wait until the
non-root processes have finished with that set of segments before it
can start using them
- implement allreduce as a reduce followed by a bcast (per discussion
with rich)
- removed some redundant data on various data structures
- implemented query MCA param ("coll_sm_shared_mem_used_data") that
tells you how much shared memory will be used for a given set of MCA
params (e.g., number of segments, etc.). For example:
ompi_info --mca coll_sm_info_num_procs 4 --param coll sm | \
grep shared_mem_used_data
tells you that for the default MCA param values (as of r7172), for 4
processes, sm will use 548864 bytes of shared memory for its data
transfer section
- remove a bunch of .c files from the Makefile.am that aren't
implemented yet (i.e., all they do is return ERR_NOT_IMPLEMENTED)
Now on to the big Altix to test that this stuff really works...
This commit was SVN r7205.
The following SVN revision numbers were found above:
r7172 --> open-mpi/ompi@bc72a7722b
- bcast now works properly for root!=0 and multi-fragment messages
- destroy mpool when communicator is destroyed
Still need to implement:
- "in use" flags for groups of fragments so that "wrapping around" in
the data segment doesn't overwrite not-yet-read data
- ensure that shared memory isn't removed before all processes have
finished with it (e.g., during COMM_FREE)
This commit was SVN r7172.
this assumes that the peers have all been added via add_procs up front.
Bad things will happen if add_procs is called again later on a new set of
procs to fix this we need to modify the srq which may wreck things.. looking
into this deeper..
This commit was SVN r7142.
1. Added OMPI_PROC_ARCH as a defined registry key and added the code so that the architecture info gets properly transmitted across all processes using the startup message.
2. Added an OMPI_MODEX_KEY definition and removed the hard-coded "modex" key from pml_modex_exchange
This commit was SVN r7129.
add a -I to find the included ltdl.h (vs. a system-installed ltdl.h)
- Clean up kruft in a bunch of Makefile.am's to remove now-unnecessary
AM_CPPFLAGS settings to get static-components.h for each framework
- Move the component_repository API functions out of opal/mca/base/base.h
and into opal/mca/base/mca_base_component_repository.h in order to
decrease unnecessary dependencies (e.g., before this, almost
everything in the tree depended on ltdl.h, which is unnecessary --
only a small number of files really need ltdl.h)
This commit was SVN r7127.
Here's the huge registry check-in you've all been waiting for with baited breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears.
In addition, the commit contains the following:
1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe
2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well.
In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first).
This commit was SVN r7118.
Changed component so choice of decision functions controlled by mca params
(for now fixed decision functions (if statements) default)
started fixes for the various bcasts
This commit was SVN r7117.
(because we kept bumping up against the max filename limit in "tar"
when making tarballs, especially if the version number got long).
This commit was SVN r7065.
friendly #defines to be included in mpi.h (even for users), such as
_GNU_SOURCE, which can have some really big consequences on Linux.
Instead, add mpi.h to AC_CONFIG_HEADERS and just include the #defines
we have to have for mpi.h and the C++ bindings.
This commit was SVN r7022.
- Fix compiler warnings
- Fix problem with using "p" instead of "p_index"
- Style updates
- Check return of malloc() for NULL
This commit was SVN r6999.
much time) and somewhat-lame implementation of barrier (need to
precompute some more stuff rather than calculate it every time).
Checkpointing so I can try this on another machine...
This commit was SVN r6985.
issue so PUT is default.. We are determining if this is an openib issue or a
btl issue as we have seen performance increases on mvapi.
This commit was SVN r6928.
that way anway) so that we can properly "make distcheck" easier. Fix
the rule for making the sym link to ptl_sm_send_alternate.c so that
"distcheck" works.
This commit was SVN r6919.
* don't know what I was thinking, but can't use the MCA_PML_CALL macro on
the two data values, as they don't have things that the macro can
expand into
This commit was SVN r6868.
* Add base to memory framework so that we can do something sane with
ompi_info
* Updated ompi_info to print components for memory framework and
show whether we have memory hooks active or not.
This commit was SVN r6861.
of spaces (curses! indent(1) had been updated with a new option that
I did not use). This commit simply converts tabs to real spaces.
This commit was SVN r6799.
the values to the PML structure. This will allow PMLs that want to do
hardware matching at the cost of a smaller range of valid tags and cids.
Updated all the places that used the MPI_TAG_UB_VALUE constant to instead
look at the pml struct.
This commit was SVN r6778.
alignment of 0, then assume there will be no data segment and don't do
the checks to see if it will be beyond the end of the file.
This commit was SVN r6773.
The following SVN revision numbers were found above:
r6672 --> open-mpi/ompi@8b56769307
- Adjust btl sm to allocate just a few bytes extra to allow the common
sm component to assume that there will be a data segment (even though
the sm btl doesn't use the data segment in that portion of code)
This commit was SVN r6772.
* Major rework of Portals to better match Red Storm and hopefully get
better performance:
- Always assume there is only one module (since there are no machines
on the planet with more than one Portals interface)
- make progress all one function rather than dispatching to other
functions and dispatch on event type, not comm type
- remove polling of unneeded events
This commit was SVN r6769.
Change all the places where they are used to fit the new name.
Remove the code to check the remote arch from the PML. We will have a GPR mechanism
in ompi_mpi_initialize to do that.
This commit was SVN r6750.
the message is no longer pending
* Try to push out new messages whenever we finish a send, whether it
worked or not. Means that in the case where the other side has too
many sends pending, we'll constantly retry one (and only one, once the
pending number is reached) message until goodness returns
* Make some warnings only happen in verbose case, as they are mainly
diagnostics
This commit was SVN r6732.
may appear.
(remove *error.h file from Makefile.am -- a cut-n-paste error that has
propagated to a surprising number of directories ;-) )
This commit was SVN r6721.
- new preferred API calls for registering MCA parameters are
mca_base_param_reg_{int|string} and
mca_base_param_reg_{int|string}_name.
- See opal/mca/base/mca_base_param.h for docs on new calls.
- Can now register and lookup a value at the same time.
- Can now mark a parameter "read only" at registration time
- Can now mark a parameter "internal" at registration time
- Can now associate a help message with the parameter at registration
time; displayed in the ompi_info output.
The old API calls are still available for backwards compatibility
(mca_base_param_register_{int|string}. They will eventually be
removed -- all developers are encouraged to use the new APIs from here
on out and replace any old calls with the new API.
Some params were also renamed -- the previous convention of using
"base_" as a prefix for any param that was not associated with a
component is henceforth deprecated. Instead, use one of the following
prefixes:
mca: for anything in the MCA base itself
opal: for anything in OPAL
orte: for anything in ORTE
mpi: for anything in OMPI
This commit was SVN r6698.
* add btl back into ompi_info. Since it now directly calls the
open/close, the missing symbol problems Ralph was seeing when ob1 is
ignored will not occur.
This commit was SVN r6652.
support in OMPI. Currently only enables/disables the architecture
sharing modex in ob1 pml.
* Add sds framework to ompi_info
* Figure out table ids to use for Portals BTL at configure time, since
we should use 30 & 31 on Red Storm, but the reference implementation
only supports 0-8.
* Some bug fixes in Portals UTCP sds
This commit was SVN r6650.
- only call sched_yield if it exists
- don't fail out if modex doens't work in ob1
- bunch of fixes for Portals BTL
- add cnos rml component
- add NULL gpr component (should only be used if replica AND proxy
fail to load)
This commit was SVN r6629.
* remove the recv fragment code, since it really isn't needed
* handle memory descriptor binding a bit more sanely, and use
thresholds so that Portals does the unlink for us, when it feels
like it.
This commit was SVN r6575.
* change all the opal_output_verbose calls in the critical path to
OPAL_OUTPUT_VERBOSE so that they are pre-processed out if debugging
is not enabled
* remove stub code
This commit was SVN r6564.
"run-time" api for the reference implementation.
* Make the non-modex utcp and redstorm compat code do the same things in
the same order
This commit was SVN r6556.
associated with the descriptor is ours again once the callback function
returns. Make it so - probably can optimize out some of the stuff I
did when I mistakenly thought the descriptor free() was called on the
passed descriptor
* Fix some dumb accounting errors with MD usage for unexpected receives
This commit was SVN r6555.
Fixed receive descriptor counts that limited mvapi and openib to 2 procs.
Begin porting error messages to use the BTL_ERROR macro.
This commit was SVN r6554.
1. Modify the registry to eliminate redundant data copying for startup messages.
2. Revise the subscription/trigger system to avoid redundant storage of triggers and subscriptions. This dramatically reduces the search time when a registry action occurs - to illustrate the point, there are now only a handful of triggers on the system for each job. Before, there were a handful of triggers for each PROCESS in the job, all of which had to be checked every time something happened on the registry. This is much, much faster now.
3. Update all subscriptions to the new format. There are now "named" subscriptions - this allows you to "name" a subscription that all the processes will be using. The first one to hit the registry actually defines the subscription. From then on, any subsequent "subscribes" to the same name just cause that process to "attach" to the existing subscription. This keeps the number of subscriptions being tracked by the registry to a minimum, while ensuring that each process still gets notified.
4. Do the same for triggers.
Also fixed a duplicate subscription problem that was causing people to receive data equal to the number of processes times the data they should have received from a trigger/subscription. Sorry about that... :-( ...but it's all better now!
Uncovered a situation where the modex data seems to be getting entered on the registry a second time - the latter time coming after the compound command has been "fired", thereby causing all the subscriptions to fire. Asked Tim and Jeff to look into this.
Second phase of the changes will involve modifying the xcast system so that the same message gets sent to all processes. This will further reduce the message traffic, and - once we have a true "broadcast" version of xcast - really speed things up and improve scalability.
This commit was SVN r6542.
- we modex send and receive a structure containing the nid id and the endpoint id. On the
remote node we can recompose the endpoint_addr via mx_connect.
- accept several retry to mx_connect (up to 5 seconds ... soon to be a MCA param).
- correctly construct/destruct the internal objects.
- some others minor changes.
This commit was SVN r6535.
multiple components to share a single mpool module (e.g., the
ptl/btl and coll sm components).
- Re-tool the ptl, btl, and coll sm components to first look for the
target mpool module, and if they don't find it, to create it.
- coll sm component now correctly identifies when it is supposed to
run or not (i.e., if all the processes in the communicator are on
the same host). Now we just need to fill in some algorithms. :-)
This commit was SVN r6530.
corrected free and realloc in mpool. Added alloc_base to
mca_mpool_base_registration_t to be used as the actual alloc'd base address,
which may be different from the reported base address due to page allignment.
This commit was SVN r6524.
IBM tests, but see same segfaults / assets in sm btl.
* Add prepare_src implementation so that we can send multiple fragments of
large messages
* Add queuing of sends if either there are too many outstanding sends
(we have to limit this so that we don't have more sends pending than
we could get acks for) or if we get an ack with a 0 byte mlength,
which means the remote side dropped the message on us.
Still need to valgrind to make sure I'm not leaking resources
This commit was SVN r6508.