(aka the root). This commit is based on a patch provided by Pierre
Jolivet.
Fix all the output to match the failing MPI call.
This commit was SVN r29761.
To support the new mpool two changes were made to the mpool infrastructure:
1) Added an mpool flag to indicate that an mpool does not need the memory
hooks to use the leave pinned protocols. This flag is checked in the
mpool lookup.
2) Add a mpool context to the base registration. This new member is used
by the udreg mpool to store the udreg context associated with the
particular registration. The new member will not break the ABI
compatibility as the new member is only currently used by the udreg
mpool.
Dynamics support for Cray systems makes use of the global rank provided by
orte to give the ugni library a unique rank for each process. Dynamics
support is not available under direct-launch (srun.)
cmr=v1.7.4
This commit was SVN r29719.
Only use Portals on communicators with more than one rank
Fix computation of number of children when using the hypercube tree
This commit was SVN r29616.
and tuned to correctly handle 0 recvcounts.
Tested with the reproducer from #1550.
Refs trac:1559
This commit was SVN r29542.
The following Trac tickets were found above:
Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
The algorithms are intended for MPI-3.0 compliance and are not
optimized. We should aim to add better algorithms in the future through
cheetah.
MPI_Iallreduce and MPI_Igatherv on intercommunicators are required for
MPI_Comm_idup support.
cmr=v1.7.4:reviewer=brbarret:ticket=trac:2715
This commit was SVN r29333.
The following Trac tickets were found above:
Ticket 2715 --> https://svn.open-mpi.org/trac/ompi/ticket/2715
1. Change in rte api implementation: now comm_world used to do p2p.
This allows to not worry about other comms being destroyed.
2. added a notification mechanism with a help of which runtime can say libhcoll that RTE api can not be used any longer.
pass a pointer to a flag, and its size to libhcoll.
The flag changes when the RTE is no longer available.
Currently this flag is just ompi_mpi_finalized global bool value.
cmr=v1.7.3:reviewer=jladd
This commit was SVN r29331.
Blocking versions are simple linear algorithms implemented in coll/basic. Non-
blocking versions are from libnbc 1.1.1. All algorithms have been tested with
simple test cases.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29265.
of MPI_Alltoall.
- add support for MPI_IN_PLACE in the self collective component.
- fix the extent usage in the tuned collective component.
- correctly use the peer counts instead of local - add support for MPI_IN_PLACE in the self collective component.
- fix the extent usage in the tuned collective component.
- correctly use the peer counts instead of local.
Thanks to Fujitsu for the patch.
This commit was SVN r29187.
configure-time dynamic allocation of flags. The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process. Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.
This commit was SVN r29100.
option to autodetect whether fragmentation should be enabled
cmr=v1.7.3:ticket=trac:3717
This commit was SVN r29065.
The following Trac tickets were found above:
Ticket 3717 --> https://svn.open-mpi.org/trac/ompi/ticket/3717
Working on faster algorithms for tuned that will come at a later time.
cmr=v1.7.3:ticket=trac:2965
This commit was SVN r28952.
The following Trac tickets were found above:
Ticket 2965 --> https://svn.open-mpi.org/trac/ompi/ticket/2965
many builds. I am temporarily .ompi_ignore'ing this component until
it can be fixed by its owner.
* It calls AC_MSG_ERROR, which configure.m4 scripts are ''never''
supposed to do. If you don't want to build, then call $2.
* All static and --disable-dlopen builds are broken; they fall afoul
of whatever test configure.m4 is doing and therefore error out of
configure entirely (vs. simply disabling the hcoll component).
* There appear to be multiple shell scripting errors in the
configure.m4. Here's the output of "./configure --disable-dlopen":
{{{
--- MCA component coll:hcoll (m4 configuration macro)
checking for MCA component coll:hcoll compile mode... static
checking --with-hcoll value... simple ok (unspecified)
./configure: line 421: test: basic: integer expression expected
configure: error: Can not use coll/hcoll and coll/ml (static build)
simultaneously. You have two options:
1. Use static build & disable ml with:
--enable-mpi-no-build=coll-ml
2. Use dso build for ML & disable ml at runtime: -mca
coll self
./configure: line 310: return: basic: numeric argument required
./configure: line 320: exit: basic: numeric argument required
}}}
Finally, all of these configure.m4 errors aside, I don't understand
why there is a ''compile-time'' exclusion between the hcoll and ml
components. Why isn't this a ''run-time'' decision? Having what
seems to be an unnecessary compile-time exclusion goes against the
general Open MPI philosophy.
Note: Open MPI 1.7 is also broken in all the same ways. I suggest
that the RM's .ompi_ignore hcoll over there, too.
Mellanox: please fix.
This commit was SVN r28748.
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.
The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead of
using the return code.
This commit was SVN r28722.
of individual regions (each region is a multiple of page size in
length), and each process claims its own regions by binding it to its
local memory. Each process would end up membining something like 16
individual regions in the overall shmem segment.
There were two errors in this code relating to the memory affinity
pinning. Some combination of these two errors would lead to kernel
panics (!) on my RHEL 6.2 x86_64 machines when used with mmap'ed
shared memory (not posix or sysv shared memory, curiously enough):
1. The shared memory segment is initially divided into two regions:
control and data. The control starts at the beginning of the shmem
segment, the data starts after that. The data portion, unfortunately,
was ''not'' aligned to a page. So all the multiple-of-page-size
regions that we divvy up were also not alined on page boundaries. And
therefore all the regions we tried to membind were not on page
boundaries.
The solution was to ensure that the data portion started on a page
boundary. Then all of the individual regions were on page boundaries,
too.
That being said, in my tests, Linux mbind() fails gracefully when the
address is not on a page boundary. So I'm not sure how this worked at
all / led to a kernel panic...
2. There was some bad pointer math that resulted in membinding regions
larger than they should have been, resulting in region overlaps.
There were definitely overlaps between regions in the same process;
it's likely that there were overlaps between regions of multiple
processes, too -- I'm not sure (and don't care to figure out :-) ).
The solution was to fix the pointer math so that each region membinds
exactly only itself and no neighboring/overlapping regions.
cmr:v1.7.2:reviewer=samuel
This commit was SVN r28442.
Notes:
- This commit also eliminates the need for an available components list in use
in several frameworks. None of the code in question was making use of the
priority field of the priority component list item so these extra lists were
removed.
- Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
- Cleans up the ompi/orte-info functions. Expose the functions that construct the
list of params so they can be used elsewhere.
patches for mtl/portals4 from brian
missed a few output variables in openib
This commit was SVN r28241.
Features:
- Support for an override parameter file (openmpi-mca-param-override.conf).
Variable values in this file can not be overridden by any file or environment
value.
- Support for boolean, unsigned, and unsigned long long variables.
- Support for true/false values.
- Support for enumerations on integer variables.
- Support for MPIT scope, verbosity, and binding.
- Support for command line source.
- Support for setting variable source via the environment using
OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
- Cleaner API.
- Support for variable groups (equivalent to MPIT categories).
Notes:
- Variables must be created with a backing store (char **, int *, or bool *)
that must live at least as long as the variable.
- Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
mca_base_var_set_value() to change the value.
- String values are duplicated when the variable is registered. It is up to
the caller to free the original value if necessary. The new value will be
freed by the mca_base_var system and must not be freed by the user.
- Variables with constant scope may not be settable.
- Variable groups (and all associated variables) are deregistered when the
component is closed or the component repository item is freed. This
prevents a segmentation fault from accessing a variable after its component
is unloaded.
- After some discussion we decided we should remove the automatic registration
of component priority variables. Few component actually made use of this
feature.
- The enumerator interface was updated to be general enough to handle
future uses of the interface.
- The code to generate ompi_info output has been moved into the MCA variable
system. See mca_base_var_dump().
opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system
This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.
This commit was SVN r28236.
ompi_show_help, because opal_show_help is replaced with an
aggregating version when using ORTE, so there's no reason to
directly call orte_show_help.
This commit was SVN r28051.