Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
- consistent arguments checking (not allowing to select an algorithm which
is not available)
- consistent way of computing the segcount (number of datatypes by segment).
- small cleanups.
- more informative debugging messages.
This commit was SVN r12545.
description. Most of the bcast algorithms can be completed using this
generic function once we create the tree structure. Add all kind of
trees.
There are 2 versions of the generic bcast function. One using overlapping
between receives (for intermediary nodes) and then blocking sends to all
childs and another where all sends are non blocking. I still have to
figure out which one give the smallest overhead.
This commit was SVN r12530.
N gatherv's:
for (i = 0 ... size)
MPI_Gatherv(..., root = i, ...)
The new algorithm simply does (effectively):
MPI_Gatherv(..., root = 0, ...)
MPI_Bcast(..., root = 0, ...)
This commit was SVN r12469.
allocation logic is completely done outside the data-type engine (in the PML) there is
no need for any special case inside the data-type engine. There is less arguments for
the ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is
not required anymore as there is no memory allocated in the engine itself). This change
affect all components using datatypes. I test most of them, but it might happens that I
miss some ... If it's the case please let me know (don't shoot the pianist!!).
This commit was SVN r12331.
the default decision functions (for broadcast, reduce and barrier) are based on a
high performance network (not TCP). It should give good performance (really good) for
any network having the following caracteristics: small latency (5 microseconds) and good
bandwidth (more than 1Gb/s).
+ Cleanup of the reduce algorithms, plus 2 new algorithms (binary and binomial). Now most
of the reduce algorithms use a generic tree based function for completing the reduce.
+ Added macros for computing the trees (they are used for bcast and reduce right now).
+ Allow the usage of all 5 topologies.
+ Jelena's implementation of a binary tree that can be used for non commutative operations.
Right now only the tree building function is there, it will get activated soon.
+ Some others minor cleanups.
This commit was SVN r12326.
all platforms. The only exceptions (and I will not deal with them
anytime soon) are on Windows:
- the write functions which require the length to be an int when it's
a size_t on all UNIX variants.
- all iovec manipulation functions where the iov_len is again an int
when it's a size_t on most of the UNIXes.
As these only happens on Windows, so I think we're set for now :)
This commit was SVN r12215.
size and diplacement of data-type. After this patch all data can contain size_t bytes
and the displacements are defined as ptrdiff_t. All of the files I was able to compile
have been modified to match this requirement.
This commit was SVN r12146.
George: ompi_ddt_type_size() returns a signed int only because of the
MPI spec; it will never return a negative value. So casting the
return value out of it to a (uint32_t) is safe, and makes the
comparisons be between two unsigned values.
This commit was SVN r11639.
The following SVN revision numbers were found above:
r11619 --> open-mpi/ompi@8667648a1b
todos: macroize it as we do it 10 different ways, add mca params to control handling (push up size, no change, switch off segmenting)
This commit was SVN r11619.
I know it does not make much sense but one can play around with the
performance. Numbers are available at http://www.unixer.de/research/nbcoll/perf/.
This is the first step towards collv2. Next step includes the addition
of non-blocking functions to the MPI-Layer and the collv1 interface.
It implements all MPI-1 collective algorithms in a non-blocking manner.
However, the collv1 interface does not allow non-blocking collectives so
that all collectives are used blocking by the ompi-glue layer.
I wanted to add LibNBC as a separate subdirectory, but I could not
convince the buildsystem (and had not the time). So the component looks
pretty messy. It would be great if somebody could explain me how to move
all nbc*{c,h}, and {hb,dict}*{c,h} to a seperate subdirectory.
It's .ompi_ignored because I did not test it exhaustively yet.
This commit was SVN r11401.
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.
This commit was SVN r11270.
shared memory segments
* make sure to properly unlink the collectives sm bootstrap area at
shutdown
* Add missing / in the path for the mpool shared memory segment
* make sure to release the common_mmap structure in the SM btl
after unlinking the file during shutdown
This commit was SVN r10886.
yes this means it WAS possible for two nodes to choice two different algorithms
(discovered by Doug Gregor and figured out by George)
Also changed some names like size to comsize so we know which sizes we are using where
This should be updated in al versions
This commit was SVN r10601.
(1) As pointed out by Torsten after Jeff comment that there are 15 collectives yesterday.. nope.. I have 16 but
miss counted them in my ifdefs (I had two #11s). Replaces with enum...
(2) Added a readonly MCA param for how many backend algorithms are available per collective (used by benchmarker/STS)
This allowed me to remove the tuned query internal functions and replace them with ompi_coll_tuned_forced_max_algorithms[COLL].
(3) I was reading the user forced MCA params for the collectives on each comm create (module init) but I then put the
values into a global set of variables (like ompi_coll_tuned_reduce_forced_algorithm).
To fix this and make the code neater:
(a) The component looks up the MCA param indices on Open if dynamic_rules is set via the
ompi_coll_tuned_COLLECTIVE_intra_check_forced_init () call.
(b) Got rid of the ompi_coll_ompi_coll_tuned_COLLECTIVE_forced_algorithm/segmentsize/etc globals with a struct that
is now cached on the module data hung off the communicator. i.e. done right.
(c) On module init if dynamic rules enabled we call a general getvalues routine (in coll_tuned_forced.c) to get the
CURRENT values using the MCA param indices and then put them on the modules data segment.
A shorter version of getvalues exists for barrier which only needs the algorithm choice
This commit was SVN r9663.