Accordingly, there are new APIs in the name service to get a job's parent, root, immediate children, and all of its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
- consistent argument checking (an algorithm that is not available can
no longer be selected)
- consistent way of computing the segcount (number of datatype elements per segment).
- small cleanups.
- more informative debugging messages.
This commit was SVN r12545.
description. Most of the bcast algorithms can be completed using this
generic function once we create the tree structure. Add all kinds of
trees.
There are 2 versions of the generic bcast function. One overlaps the
receives (for intermediary nodes) and then uses blocking sends to all
children; in the other, all sends are non-blocking. I still have to
figure out which one gives the smallest overhead.
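To illustrate the idea of one generic bcast driven by a tree structure, here is a minimal sketch in plain C that simulates message passing with an array (buf[r] stands for rank r's buffer) instead of real MPI calls. The names tree_t, tree_builder_t, build_bmtree and sim_bcast are hypothetical, not the coll tuned component's actual code, and the binomial tree is just one of the possible trees.

```c
#define MAX_PROCS 64

/* Hypothetical tree descriptor for one rank (real ranks, not root-relative). */
typedef struct {
    int parent;               /* -1 for the root */
    int nchildren;
    int children[MAX_PROCS];
} tree_t;

/* Any tree shape can be plugged into the generic bcast through this type. */
typedef void (*tree_builder_t)(int size, int root, int rank, tree_t *t);

/* Binomial tree rooted at `root`: ranks are shifted so the root is virtual
 * rank 0; a node's parent is vrank minus its lowest set bit, and its children
 * are vrank + m for every power of two m below that bit (below size for root). */
static void build_bmtree(int size, int root, int rank, tree_t *t)
{
    int vrank = (rank - root + size) % size;
    int lowbit = 1;

    while (lowbit < size && !(vrank & lowbit))
        lowbit <<= 1;

    t->parent = (vrank == 0) ? -1 : (vrank - lowbit + root) % size;
    t->nchildren = 0;
    for (int m = 1; m < lowbit && vrank + m < size; m <<= 1)
        t->children[t->nchildren++] = (vrank + m + root) % size;
}

/* Generic simulated bcast: every rank gets the data from its parent and then
 * forwards it to each of its children. Returns the number of "messages". */
static int sim_bcast(tree_builder_t build, int size, int root, int *buf)
{
    int queue[MAX_PROCS], head = 0, tail = 0, msgs = 0;

    queue[tail++] = root;
    while (head < tail) {
        int r = queue[head++];
        tree_t t;
        build(size, root, r, &t);
        for (int i = 0; i < t.nchildren; i++) {
            buf[t.children[i]] = buf[r];   /* "send" to the child */
            msgs++;
            queue[tail++] = t.children[i];
        }
    }
    return msgs;
}
```

Swapping in another builder (binary, chain, ...) changes the tree shape without touching sim_bcast, which is what makes a single generic function attractive.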
This commit was SVN r12530.
N gatherv's:
    for (i = 0; i < size; ++i)
        MPI_Gatherv(..., root = i, ...)
The new algorithm simply does (effectively):
    MPI_Gatherv(..., root = 0, ...)
    MPI_Bcast(..., root = 0, ...)
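The data movement of the new scheme can be sketched without MPI: below, a hypothetical sim_allgatherv first assembles every rank's contribution at rank 0 using element displacements (the gatherv step) and then copies rank 0's assembled buffer to every other rank (the bcast step). Buffers are plain arrays standing in for per-process memory; this is an illustration, not the component's code.

```c
#include <string.h>

/* contrib[r] points at rank r's counts[r] ints; out[r] is rank r's receive
 * buffer; displs[] are element offsets into the receive buffer, as in MPI. */
static void sim_allgatherv(int nprocs, const int *const *contrib,
                           const int *counts, const int *displs, int **out)
{
    /* Step 1, "MPI_Gatherv(..., root = 0, ...)": rank 0 assembles everything. */
    for (int r = 0; r < nprocs; r++)
        memcpy(out[0] + displs[r], contrib[r], (size_t)counts[r] * sizeof(int));

    /* Step 2, "MPI_Bcast(..., root = 0, ...)": rank 0 ships the whole buffer. */
    int total = 0;
    for (int r = 0; r < nprocs; r++)
        if (displs[r] + counts[r] > total)
            total = displs[r] + counts[r];
    for (int r = 1; r < nprocs; r++)
        memcpy(out[r], out[0], (size_t)total * sizeof(int));
}
```

As rough intuition only, assuming linear gatherv and bcast algorithms: N rooted gatherv's send about N(N-1) messages, while one gatherv plus one bcast send about 2(N-1).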
This commit was SVN r12469.
allocation logic is completely done outside the data-type engine (in the PML), there is
no need for any special case inside the data-type engine. There are fewer arguments for
ompi_convertor_pack and ompi_convertor_unpack as well (the last field, free_after, is
no longer required, as no memory is allocated in the engine itself). This change
affects all components using datatypes. I tested most of them, but I might have
missed some... If so, please let me know (don't shoot the pianist!!).
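The point of the change is that the caller owns the buffer, so the engine never allocates and nothing needs a free_after flag. A minimal sketch of that contract, assuming a simple strided "datatype" (vec_type_t and pack_into are hypothetical names, not the ompi_convertor API):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strided datatype: count blocks of blocklen bytes, stride bytes apart. */
typedef struct { size_t count, blocklen; ptrdiff_t stride; } vec_type_t;

/* Pack as many whole blocks as fit into the caller-owned buffer dst, starting
 * at block *pos. Returns the bytes written and advances *pos. Nothing is
 * allocated here, so the caller has nothing extra to free afterwards. */
static size_t pack_into(const vec_type_t *t, const char *src, size_t *pos,
                        char *dst, size_t dst_len)
{
    size_t written = 0;
    while (*pos < t->count && written + t->blocklen <= dst_len) {
        memcpy(dst + written, src + (ptrdiff_t)*pos * t->stride, t->blocklen);
        written += t->blocklen;
        (*pos)++;
    }
    return written;
}
```

Calling pack_into repeatedly with the same position counter mirrors how a caller-side (e.g., PML-side) loop would drain a message into fixed-size fragments it allocated itself.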
This commit was SVN r12331.
the default decision functions (for broadcast, reduce and barrier) are based on a
high performance network (not TCP). They should give good (really good) performance on
any network with the following characteristics: low latency (5 microseconds) and high
bandwidth (more than 1Gb/s).
+ Cleanup of the reduce algorithms, plus 2 new algorithms (binary and binomial). Now most
of the reduce algorithms use a generic tree-based function for completing the reduce.
+ Added macros for computing the trees (they are used for bcast and reduce right now).
+ Allow the usage of all 5 topologies.
+ Jelena's implementation of a binary tree that can be used for non-commutative operations.
Right now only the tree building function is there; it will be activated soon.
+ Some other minor cleanups.
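As an illustration of the tree computations mentioned above, here is a sketch of a binary tree rooted at an arbitrary rank, written as plain functions rather than macros. It assumes the common heap layout (children of virtual rank v are 2v+1 and 2v+2) after shifting ranks so the root becomes virtual rank 0; the helper names are hypothetical, and this is not Jelena's in-order tree.

```c
/* Shift ranks so that `root` becomes virtual rank 0, and back. */
static int vrank_of(int rank, int root, int size) { return (rank - root + size) % size; }
static int real_of(int vrank, int root, int size) { return (vrank + root) % size; }

/* Parent of `rank` in the binary tree, or -1 for the root. */
static int bintree_parent(int rank, int root, int size)
{
    int v = vrank_of(rank, root, size);
    return v == 0 ? -1 : real_of((v - 1) / 2, root, size);
}

/* Fills children[0..1] with real ranks; returns how many exist (0, 1 or 2). */
static int bintree_children(int rank, int root, int size, int children[2])
{
    int v = vrank_of(rank, root, size), n = 0;
    for (int c = 2 * v + 1; c <= 2 * v + 2 && c < size; c++)
        children[n++] = real_of(c, root, size);
    return n;
}
```

A tree-based reduce walks the same structure in the opposite direction from a bcast: each rank combines its children's data before sending to its parent.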
This commit was SVN r12326.
all platforms. The only exceptions (and I will not deal with them
anytime soon) are on Windows:
- the write functions, which require the length to be an int, whereas it's
a size_t on all UNIX variants.
- all iovec manipulation functions, where iov_len is again an int,
whereas it's a size_t on most UNIXes.
As these only happen on Windows, I think we're set for now :)
This commit was SVN r12215.
size and displacement of data-types. After this patch, all data can be up to size_t bytes
long and the displacements are defined as ptrdiff_t. All of the files I was able to compile
have been modified to match this requirement.
This commit was SVN r12146.
George: ompi_ddt_type_size() returns a signed int only because of the
MPI spec; it will never return a negative value. So casting its
return value to (uint32_t) is safe, and makes the comparisons
between two unsigned values.
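A small sketch of why the cast matters, with a hypothetical type_size() standing in for ompi_ddt_type_size(): the cast is only safe because the value is never negative, since a negative int converted to uint32_t wraps around to a huge value.

```c
#include <stdint.h>

/* Hypothetical stand-in for ompi_ddt_type_size(): an int only because the
 * MPI spec says so; it never returns a negative value. */
static int type_size(void) { return 8; }

/* Both operands are unsigned after the cast, so there is no mixed
 * signed/unsigned comparison (and no compiler warning about one). */
static int fits_one_element(uint32_t bytes)
{
    return bytes >= (uint32_t)type_size();
}

/* The cast would NOT be safe for negative values: -1 becomes UINT32_MAX. */
static int negative_int_as_unsigned_is_large(void)
{
    int neg = -1;
    return (uint32_t)neg == UINT32_MAX;
}
```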
This commit was SVN r11639.
The following SVN revision numbers were found above:
r11619 --> open-mpi/ompi@8667648a1b
TODOs: macroize it, as we do it 10 different ways; add MCA params to control handling (push up size, no change, switch off segmenting)
This commit was SVN r11619.
I know it does not make much sense but one can play around with the
performance. Numbers are available at http://www.unixer.de/research/nbcoll/perf/.
This is the first step towards collv2. Next step includes the addition
of non-blocking functions to the MPI-Layer and the collv1 interface.
It implements all MPI-1 collective algorithms in a non-blocking manner.
However, the collv1 interface does not allow non-blocking collectives, so
all collectives are used in blocking mode by the ompi-glue layer.
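To show how a non-blocking collective can still be used in blocking mode, here is a heavily simplified model, assuming the operation is a list of rounds, each with some outstanding point-to-point requests, and that a round can only finish after the previous one. nbc_handle_t, nbc_start, nbc_test and nbc_blocking are hypothetical names; real LibNBC schedules carry actual send/recv/op entries, and here each test call merely pretends one request completed.

```c
#include <stdbool.h>

#define MAX_ROUNDS 16

typedef struct {
    int nrounds;
    int pending[MAX_ROUNDS];  /* requests still outstanding in each round */
    int current;              /* round currently in progress */
} nbc_handle_t;

static void skip_finished_rounds(nbc_handle_t *h)
{
    while (h->current < h->nrounds && h->pending[h->current] == 0)
        h->current++;
}

/* Non-blocking start: record the schedule; nothing waits here. */
static void nbc_start(nbc_handle_t *h, int nrounds, const int *reqs_per_round)
{
    h->nrounds = nrounds;
    h->current = 0;
    for (int i = 0; i < nrounds; i++)
        h->pending[i] = reqs_per_round[i];
}

/* One progress step (simulated: one request of the current round completes).
 * Returns true once every round has completed. */
static bool nbc_test(nbc_handle_t *h)
{
    skip_finished_rounds(h);
    if (h->current < h->nrounds) {
        h->pending[h->current]--;
        skip_finished_rounds(h);
    }
    return h->current == h->nrounds;
}

/* Blocking use, as the collv1 glue has to do: start, then spin on test.
 * Returns how many progress calls it took. */
static int nbc_blocking(nbc_handle_t *h, int nrounds, const int *reqs_per_round)
{
    int calls = 0;
    nbc_start(h, nrounds, reqs_per_round);
    do {
        calls++;
    } while (!nbc_test(h));
    return calls;
}
```

Once the MPI layer exposes non-blocking collectives, the caller could do other work between nbc_test calls instead of spinning.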
I wanted to add LibNBC as a separate subdirectory, but I could not
convince the build system (and did not have the time). So the component looks
pretty messy. It would be great if somebody could explain to me how to move
all nbc*{c,h} and {hb,dict}*{c,h} to a separate subdirectory.
It's .ompi_ignored because I have not tested it exhaustively yet.
This commit was SVN r11401.
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.
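For readers unfamiliar with these macros, the usual Windows export/import pattern looks roughly like the sketch below. The guard OPAL_BUILDING_LIB and the function opal_example_answer are hypothetical; ORTE_DECLSPEC and OMPI_DECLSPEC would follow the same pattern, each keyed to its own sub-project's library build.

```c
/* Pretend this translation unit is part of libopal itself (hypothetical guard). */
#define OPAL_BUILDING_LIB 1

#if defined(_WIN32)
#  if defined(OPAL_BUILDING_LIB)
#    define OPAL_DECLSPEC __declspec(dllexport)   /* building the DLL */
#  else
#    define OPAL_DECLSPEC __declspec(dllimport)   /* using the DLL */
#  endif
#else
#  define OPAL_DECLSPEC                           /* UNIX-like: nothing to decorate */
#endif

/* A public OPAL symbol is tagged with OPAL_DECLSPEC, not ORTE's or OMPI's. */
OPAL_DECLSPEC int opal_example_answer(void);

int opal_example_answer(void)
{
    return 42;
}
```

Using one macro per sub-project matters on Windows because a symbol exported by one DLL must be imported, not exported, when another sub-project's DLL includes the same header.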
This commit was SVN r11270.
shared memory segments
* make sure to properly unlink the collectives sm bootstrap area at
shutdown
* Add missing / in the path for the mpool shared memory segment
* make sure to release the common_mmap structure in the SM btl
after unlinking the file during shutdown
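The fixes above concern the life cycle of a file-backed shared memory segment: build the path with an explicit "/" separator, map the file, and at shutdown unlink the file and release the bookkeeping. A minimal POSIX sketch of that sequence follows; seg_t, seg_create and seg_destroy are hypothetical and are not the common_mmap code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct { char path[256]; void *addr; size_t size; } seg_t;

/* Create and map "<dir>/<name>"; the "/" between the two parts is explicit. */
static int seg_create(seg_t *s, const char *dir, const char *name, size_t size)
{
    int n = snprintf(s->path, sizeof s->path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof s->path)
        return -1;
    int fd = open(s->path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        unlink(s->path);
        return -1;
    }
    s->addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (s->addr == MAP_FAILED) {
        unlink(s->path);
        return -1;
    }
    s->size = size;
    return 0;
}

/* Shutdown: unlink the backing file, then release the mapping. */
static int seg_destroy(seg_t *s)
{
    int rc = unlink(s->path);
    if (munmap(s->addr, s->size) != 0)
        rc = -1;
    s->addr = NULL;
    return rc;
}
```

Unlinking before unmapping is safe on POSIX systems: the file's storage lives on until the last mapping goes away, but no stale file is left behind.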
This commit was SVN r10886.
yes, this means it WAS possible for two nodes to choose two different algorithms
(discovered by Doug Gregor and figured out by George)
Also changed some names, like size to comsize, so we know which sizes we are using where.
This should be updated in all versions.
This commit was SVN r10601.
(1) As pointed out by Torsten after Jeff's comment yesterday that there are 15 collectives... nope, I have 16 but
miscounted them in my ifdefs (I had two #11s). Replaced with an enum...
(2) Added a read-only MCA param for how many backend algorithms are available per collective (used by benchmarker/STS).
This allowed me to remove the tuned query internal functions and replace them with ompi_coll_tuned_forced_max_algorithms[COLL].
(3) I was reading the user-forced MCA params for the collectives on each comm create (module init), but I then put the
values into a global set of variables (like ompi_coll_tuned_reduce_forced_algorithm).
To fix this and make the code neater:
(a) The component looks up the MCA param indices on open, if dynamic_rules is set, via the
ompi_coll_tuned_COLLECTIVE_intra_check_forced_init() call.
(b) Replaced the ompi_coll_tuned_COLLECTIVE_forced_algorithm/segmentsize/etc. globals with a struct that
is now cached on the module data hung off the communicator, i.e., done right.
(c) On module init, if dynamic rules are enabled, we call a general getvalues routine (in coll_tuned_forced.c) to get the
CURRENT values using the MCA param indices and then put them on the module's data segment.
A shorter version of getvalues exists for barrier, which only needs the algorithm choice.
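Points (1) and (3b) can be illustrated together: an enum lets the compiler count the collectives, and a per-collective struct cached on the communicator's module data replaces the globals. Everything below (the enum members, forced_params_t, module_data_t, module_init) is a hypothetical sketch, not the tuned component's definitions; the member list assumes the 16 MPI-1/MPI-2 collectives.

```c
#include <string.h>

/* The compiler does the numbering, so a duplicated "#11" cannot happen and
 * COLL_COUNT is always the number of collectives. */
typedef enum {
    COLL_ALLGATHER, COLL_ALLGATHERV, COLL_ALLREDUCE, COLL_ALLTOALL,
    COLL_ALLTOALLV, COLL_ALLTOALLW, COLL_BARRIER, COLL_BCAST,
    COLL_EXSCAN, COLL_GATHER, COLL_GATHERV, COLL_REDUCE,
    COLL_REDUCESCATTER, COLL_SCAN, COLL_SCATTER, COLL_SCATTERV,
    COLL_COUNT
} coll_id_t;

/* User-forced settings for one collective (0 = not forced). */
typedef struct {
    int algorithm;
    int segsize;
    int tree_fanout;
    int chain_fanout;
} forced_params_t;

/* Hung off the communicator, so every communicator has its own copy. */
typedef struct {
    forced_params_t forced[COLL_COUNT];
} module_data_t;

/* On module init: copy the CURRENT values (here a plain table standing in
 * for the MCA param lookups) onto this communicator's module data. */
static void module_init(module_data_t *data, const forced_params_t *current_values)
{
    memcpy(data->forced, current_values, sizeof data->forced);
}
```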
This commit was SVN r9663.
flag, new flags to be included when convertor is initialized
- modified pml/btl module defs and added stub functions for diagnostic
output routines to dump state of queues / endpoints
- updates to data reliability pml
This commit was SVN r9329.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h,
mpi.h, and mpif.h)
This commit was SVN r8985.
be locally completing. For now, using synchronous calls until the new functionality is available; then will change
the code to use the new PML send flags.
This commit was SVN r8867.
this was implemented using a chain (tree followed by pipeline) by setting the chain fanout to a factor of size, etc., but the chain data structure was fixed in length, and if that length was exceeded the topo create returned NULL, which isn't helpful in the cid next function of comdup...
Anyway, two fixes: first, we do have a real linear function, so I changed the decision function; second, I altered the
topo chain create to force chain fanouts of less than 1 to 1 and fanouts bigger than max to max.
The next check-in will change the chain to a dynamically allocated (reallocable) array, but we shouldn't ever use a chain fanout for a linear tree anyway.
(Lesson: must rerun all tests for all data sizes when changing decision functions.)
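The second fix amounts to clamping the requested fanout into the range the fixed-length chain structure can hold, instead of failing the topology creation. A sketch, where MAX_CHAIN_FANOUT and clamp_chain_fanout are hypothetical names and the value 32 is an arbitrary placeholder:

```c
#define MAX_CHAIN_FANOUT 32   /* hypothetical fixed capacity of the chain struct */

/* Fanouts below 1 become 1; fanouts above the maximum become the maximum. */
static int clamp_chain_fanout(int fanout)
{
    if (fanout < 1)
        return 1;
    if (fanout > MAX_CHAIN_FANOUT)
        return MAX_CHAIN_FANOUT;
    return fanout;
}
```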
This commit was SVN r8662.
(apparently we've been doing this in opal and orte, but not in ompi
yet). All public symbols begin with "ompi_coll_tuned_" (not
mca_coll_tuned_) except the component struct. Now this component
passes the illegal symbol report with no hits.
This commit was SVN r8589.
testing. Note that this effectively replaces the "basic" component as
the baseline collective component. Please report any problems with
this component.
If you run into problems with this component, you can disable it with:
--mca coll_tuned_priority 0
This commit was SVN r8575.
displacement (for both the inter and intra-communicator version). The
displacements in scatterv are given in multiples of the sendtype.
This fix should probably make it into v1.0.1 as well?
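Because scatterv displacements count elements of the sendtype, the byte offset of the i-th chunk is displs[i] times the sendtype extent, not displs[i] bytes. A minimal simulation of that rule (scatterv_chunk and sim_scatterv are hypothetical helpers, with plain arrays standing in for the receivers' buffers):

```c
#include <stddef.h>
#include <string.h>

/* Start of the i-th chunk: displacement in elements times the type extent. */
static const char *scatterv_chunk(const char *sendbuf, const int *displs,
                                  int i, ptrdiff_t extent)
{
    return sendbuf + (ptrdiff_t)displs[i] * extent;
}

/* Simulated root side of scatterv: copy counts[i] elements to receiver i. */
static void sim_scatterv(const char *sendbuf, const int *counts, const int *displs,
                         ptrdiff_t extent, int nprocs, char **recvbufs)
{
    for (int i = 0; i < nprocs; i++)
        memcpy(recvbufs[i], scatterv_chunk(sendbuf, displs, i, extent),
               (size_t)counts[i] * (size_t)extent);
}
```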
This commit was SVN r8251.
* turns out (duh!) that there was a reason that the <projectdir>dir
variable was set in the AM conditional. If not, unneeded directories
are created... duh.
This commit was SVN r8205.
component/base Makefile.am files, reducing the time configure spends
stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed
This commit was SVN r8200.
Lots of misc fixes: printfs->opal_output, handles fanin/out correctly for forced ops,
unused vars, correct calculations of the meaning of 'msgsize' for decision functions
(varies depending on algorithm), etc.
This commit was SVN r8113.
go through the dynamic decision rule interface.
(forced algorithms are set with MCA params)
fixed some silly verbose output with wrong func name in it etc
updates to fixed dec rules.
This commit was SVN r7940.
modules, if its priority is zero (the default value). The reasons for that are:
+ if there is no other module with a priority > 0, the hierarchical
collective module has a problem anyway, since it has to rely on the coll
modules of the subcommunicators. On the other hand, if its priority is
zero, it won't be chosen anyway, and we can simply save the
allreduce/allgather and comm_split operations which might occur during
hierarchy detection.
+ to improve the startup times until we have the modex thing which we
discussed with Jeff and Tim in Knoxville in place
- adding an MCA parameter indicating a symmetric configuration. This can
speed up startup times, since each process can infer the data of the other
processes from its own data -> no need for the allreduce operations. By
default this parameter is set to "no".
This commit was SVN r7932.
stage fairly confident that
- it works in most scenarios (with symmetric hierarchies, with asymmetric
hierarchies, without hierarchies - it just removes itself)
- it does not create too many problems (I am not aware of any, at least)
- it no longer slows down startup dramatically (thanks to the fixes of
Brian, Jeff, Tim and a significant reduction in the number of collective
operations in the comm_query)
Any feedback is highly welcome.
This commit was SVN r7868.
started to add static decision rules (fixed if statements) based on GigE numbers
added MCA params so that a user can force a certain algorithm/segment/topo on a per-collective basis
(this is not in the fixed call path but only in the dynamic (at comm create) call path).
(these params can be used by test suites such as OCC to choose which algorithm they are using).
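The split between the two call paths can be sketched as follows: the fixed path is a chain of if statements on communicator and message size, while the dynamic path lets a user-forced value (0 meaning "not forced") take precedence. The algorithm ids and thresholds below are hypothetical placeholders, not the GigE-derived numbers.

```c
#include <stddef.h>

/* Hypothetical algorithm ids. */
enum { BCAST_LINEAR = 1, BCAST_BINOMIAL = 2, BCAST_PIPELINE = 3 };

/* Fixed path: plain if statements (thresholds are placeholders). */
static int bcast_fixed_decision(int comm_size, size_t msg_bytes)
{
    if (comm_size <= 4)
        return BCAST_LINEAR;
    if (msg_bytes < 8192)
        return BCAST_BINOMIAL;
    return BCAST_PIPELINE;
}

/* Dynamic path: a forced algorithm (e.g., read from an MCA param at comm
 * create) wins; otherwise fall back to the fixed rules. */
static int bcast_dynamic_decision(int forced_alg, int comm_size, size_t msg_bytes)
{
    return forced_alg > 0 ? forced_alg : bcast_fixed_decision(comm_size, msg_bytes);
}
```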
This commit was SVN r7854.
number of collective operations and simplifies the logic significantly.
- introducing a special case if the size of comm == 1, thus avoiding collective
operations as well (i.e., no need for hierarchies)
- fix for an asymmetric case. Still to be tested.
This commit was SVN r7799.
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install). Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am). Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed. This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)
This commit was SVN r7777.
and for all root nodes and passed all tests.
First cut on barrier (which from my perspective does not make sense from the
performance point of view) and on allreduce (which might make sense).
This commit was SVN r7774.
done. This version also doesn't break ompi (at least if it's not chosen :-) ).
New features compared to the version from last Thursday (where bcast and
reduce seemed to work in most scenarios):
- clearer internal infrastructure
- ability to handle all root processes with a (hopefully) minimal number of
local leader communicators.
This commit was SVN r7769.
(actually a workaround for an optimisation in the reduce for not saving ops on the first recv of each segment)
Minor change in topo.
This commit was SVN r7758.
- update the hierarch stuff to use btl's instead of ptl's
- start the new logic regarding how to handle local leader communicators
This commit was SVN r7691.
reduce_inorder() function -- we don't use the tree at all.
- Add more relevant "volatile"'s for the control buffers in the
fragment mpool (and associated casts where necessary)
This commit was SVN r7616.
- Move the "process 0" logic out of the main loop in reduce to make
the code a bit less complex (at the price of slight code
duplication, but it is now significantly easier to read)
- Fix problem with the uniqueness guarantee in the bootstrap mpool -- using
the CID alone was not sufficient to guarantee uniqueness; now
use the (CID, rank 0 process name) tuple to check for uniqueness
- Made a few debugging help changes in coll_sm.h; especially helps
debugging on uniprocessors
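A sketch of the uniqueness key: the CID alone may repeat (for example, disjoint communicators on the same node can end up with the same CID), so the key also carries rank 0's process name. The layout of proc_name_t (jobid, vpid) and the names boot_key_t and boot_key_equal are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t jobid, vpid; } proc_name_t;      /* hypothetical layout */
typedef struct { uint32_t cid; proc_name_t rank0; } boot_key_t;

/* Two bootstrap entries match only if the CID AND rank 0's name match. */
static int boot_key_equal(const boot_key_t *a, const boot_key_t *b)
{
    return a->cid == b->cid &&
           a->rank0.jobid == b->rank0.jobid &&
           a->rank0.vpid == b->rank0.vpid;
}
```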
This commit was SVN r7599.
- Move one base global to the basic component and make it an MCA
parameter
- Convert the basic component to use the new MCA param API
This commit was SVN r7598.
lower the default priority to 0 so that it's not active unless you
specifically ask for it (this component needs more testing by people
other than me before we unleash it on the public).
This commit was SVN r7545.
Makefile.options
- Sample in each of the three projects of how to link against the
relevant libraries so that when components are loaded into a parent
process' space, we don't rely on the libopal/liborte/libmpi symbols
being in the parent's public symbol namespace -- instead,
dynamically link to the relevant libraries, allowing the dynamic
linker to pull those libraries in at run-time, if needed
This commit was SVN r7397.
- remove redundant OBJ_CONSTRUCT in bcast
- fix up some macros in coll_sm.h
- check to ensure that if there are too many processes in the
communicator (i.e., if we couldn't fit a flag for each of them in
the control segment), then fail selection
- setup the in_use flags properly
- adapt to new mpool API
- first working copy of reduce -- not tree-based (but still
NUMA-aware), and only processes in order from process 0 to process
N-1 -- do not have a tree-based and/or commutative version yet
(i.e., process the results in whatever order they arrive)
Reduce now passes the new ibm reduce_big.c test. Woo hoo! Time to
declare success for the evening (and run the intel test tomorrow).
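Processing contributions strictly in order from process 0 to process N-1 is what makes the result correct for non-commutative operations. A minimal illustration using 2x2 integer matrix multiplication as the example operation (mat2_t, mat2_mul and reduce_inorder are hypothetical names; the real code works on MPI datatypes and ops):

```c
/* [[a b] [c d]]; matrix product is associative but not commutative. */
typedef struct { long a, b, c, d; } mat2_t;

static mat2_t mat2_mul(mat2_t x, mat2_t y)
{
    mat2_t r = { x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
                 x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d };
    return r;
}

/* contrib[r] is process r's input; the result is
 * contrib[0] * contrib[1] * ... * contrib[n-1], folded in rank order. */
static mat2_t reduce_inorder(const mat2_t *contrib, int n)
{
    mat2_t acc = contrib[0];
    for (int r = 1; r < n; r++)
        acc = mat2_mul(acc, contrib[r]);
    return acc;
}
```

Reversing the input order changes the answer, which is exactly why an "in whatever order they arrive" version is only valid for commutative operations.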
This commit was SVN r7379.
all processes call MPI_Gatherv(MPI_IN_PLACE...) because IN_PLACE is
only allowed to be used at the root. Non-root processes must use
their receive buf as the send buf.
This commit was SVN r7363.
- added relevant logic for everything except
mca_coll_basic_reduce_log_intra() -- need some help from George /
Edgar on this one...
- replaced ompi_ddt_sndrcv() with ompi_ddt_copy_content_same_ddt()
where relevant
- removed some "if (size > 1)" conditionals, because the self coll
module will always be chosen for collectives where size==1
Waiting for BA's tests to check the validity of this IN_PLACE stuff.
We'll see how it goes!
This commit was SVN r7351.
- added some alltoall calls (pairwise checked ok, bruck testing)
- changes in use of data hung off communicator
- making sendrecv call a true inline function
- more use of ompi_ddt routines
This commit was SVN r7337.
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
number to be set at autoconf time (instead of at configure time, as
it was before). Set the version number, minus the subversion r number,
at autoconf time. Override the internal variables to include the r
number (if needed) at configure time. Basically, the right thing
should always happen. The only place it might not is the version
reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE takes a list of options, no need to specify
them in all the Makefile.am files.
* Added support for subdir-objects, meaning that object files are put
in the directory containing source files, even if the Makefile.am is
in another directory. This should start making it feasible to
reduce the number of Makefile.am files we have in the tree, which
will greatly reduce the time to run autogen and configure.
This commit was SVN r7211.
- finally added "in use" flags -- one flag protects a set of segments
- these flags now used in bcast to protect (for example) when a
message is so long that the root loops around the segments and has
to re-use old segments -- now it knows that it has to wait until the
non-root processes have finished with that set of segments before it
can start using them
- implement allreduce as a reduce followed by a bcast (per discussion
with Rich)
- removed some redundant data on various data structures
- implemented query MCA param ("coll_sm_shared_mem_used_data") that
tells you how much shared memory will be used for a given set of MCA
params (e.g., number of segments, etc.). For example:
ompi_info --mca coll_sm_info_num_procs 4 --param coll sm | \
grep shared_mem_used_data
tells you that for the default MCA param values (as of r7172), for 4
processes, sm will use 548864 bytes of shared memory for its data
transfer section
- remove a bunch of .c files from the Makefile.am that aren't
implemented yet (i.e., all they do is return ERR_NOT_IMPLEMENTED)
Now on to the big Altix to test that this stuff really works...
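The "in use" flag protocol described above can be modeled with one atomic counter per set of segments: the root arms it with the number of non-root readers, each reader decrements it after finishing with the set, and the root may reuse the set only once the counter is back to zero. This is a hypothetical C11 sketch (in_use_flag_t, root_try_acquire, reader_release), not the coll sm data layout.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int readers_left; } in_use_flag_t;

static void flag_init(in_use_flag_t *f) { atomic_init(&f->readers_left, 0); }

/* Root, before (re)using the set: succeeds only if no reader is still busy,
 * and then arms the flag with the number of readers. false = wait and retry. */
static bool root_try_acquire(in_use_flag_t *f, int nreaders)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&f->readers_left, &expected, nreaders);
}

/* Non-root, once it has finished with every segment in the set. */
static void reader_release(in_use_flag_t *f)
{
    atomic_fetch_sub(&f->readers_left, 1);
}
```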
This commit was SVN r7205.
The following SVN revision numbers were found above:
r7172 --> open-mpi/ompi@bc72a7722b
- bcast now works properly for root!=0 and multi-fragment messages
- destroy mpool when communicator is destroyed
Still need to implement:
- "in use" flags for groups of fragments so that "wrapping around" in
the data segment doesn't overwrite not-yet-read data
- ensure that shared memory isn't removed before all processes have
finished with it (e.g., during COMM_FREE)
This commit was SVN r7172.
add a -I to find the included ltdl.h (vs. a system-installed ltdl.h)
- Clean up cruft in a bunch of Makefile.am's to remove now-unnecessary
AM_CPPFLAGS settings to get static-components.h for each framework
- Move the component_repository API functions out of opal/mca/base/base.h
and into opal/mca/base/mca_base_component_repository.h in order to
decrease unnecessary dependencies (e.g., before this, almost
everything in the tree depended on ltdl.h, which is unnecessary --
only a small number of files really need ltdl.h)
This commit was SVN r7127.
Changed component so choice of decision functions controlled by mca params
(for now fixed decision functions (if statements) default)
started fixes for the various bcasts
This commit was SVN r7117.
much time) and somewhat-lame implementation of barrier (need to
precompute some more stuff rather than calculate it every time).
Checkpointing so I can try this on another machine...
This commit was SVN r6985.
of spaces (curses! indent(1) had been updated with a new option that
I did not use). This commit simply converts tabs to real spaces.
This commit was SVN r6799.
multiple components to share a single mpool module (e.g., the
ptl/btl and coll sm components).
- Re-tool the ptl, btl, and coll sm components to first look for the
target mpool module, and if they don't find it, to create it.
- coll sm component now correctly identifies when it is supposed to
run or not (i.e., if all the processes in the communicator are on
the same host). Now we just need to fill in some algorithms. :-)
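The "look for it first, create it only if missing" rule is what lets several components end up sharing one mpool module. A minimal sketch with a tiny in-process registry; mpool_module_t, the registry array and mpool_lookup_or_create are hypothetical names, not the mpool framework API.

```c
#include <string.h>

#define MAX_MODULES 8

typedef struct { char name[32]; int refcount; } mpool_module_t;

static mpool_module_t registry[MAX_MODULES];
static int nmodules;

/* Return the existing module with this name, or create it if none exists.
 * Every caller (ptl, btl, coll sm, ...) gets the same module back. */
static mpool_module_t *mpool_lookup_or_create(const char *name)
{
    for (int i = 0; i < nmodules; i++) {
        if (strcmp(registry[i].name, name) == 0) {
            registry[i].refcount++;
            return &registry[i];
        }
    }
    if (nmodules == MAX_MODULES || strlen(name) >= sizeof registry[0].name)
        return NULL;
    mpool_module_t *m = &registry[nmodules++];
    strcpy(m->name, name);
    m->refcount = 1;
    return m;
}
```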
This commit was SVN r6530.
- After long discussions and ruminations on how we run components in
LAM/MPI, made the decision that, by default, all components included
in Open MPI will use the version number of their parent project
(i.e., OMPI or ORTE). They are certainly free to use a different
number, but this simplification makes the common cases easy:
- components are only released when the parent project is released
- it is easy (trivial?) to distinguish which version of a component goes
with which version of the parent project
- removed all autogen/configure code for templating the version .h
file in components
- made all ORTE components use ORTE_*_VERSION for version numbers
- made all OMPI components use OMPI_*_VERSION for version numbers
- removed all VERSION files from components
- configure now displays OPAL, ORTE, and OMPI version numbers
- ditto for ompi_info
- right now, faking it -- OPAL and ORTE and OMPI will always have the
same version number (i.e., they all come from the same top-level
VERSION file). But this paves the way for the Great Configure
Reorganization, where, among other things, each project will have
its own version number.
So all in all, we went from a boatload of version numbers to
[effectively] three. That's pretty good. :-)
This commit was SVN r6344.
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few
This commit was SVN r6330.