The alps ess component is obsolete. It relies on header
files only present in very old CLE (Cray Linux Environment) 3.x for
the Cray XT series. As support for these systems is being
dropped starting with release 1.9, this code is being removed.
Several updates, including:
* Remove single-dash options
* Don't chmod the whole tree; just chmod the files we're trying to remove
* No more support for SVN or HG; 100% git
* Strengthen the dirty repo checks
* Use git describe for the repo version
* Set tarball_version to "" (i.e., empty) in VERSION
Removed a redundant copy of the scripts running on the build server
and moved the remaining copy to a top-level directory in contrib
(i.e., contrib/build-server rather than contrib/dist/build-server,
where I could never remember to find them).
Update the VERSION file scheme:
* Remove "want_repo_rev".
* Add "tarball_version".
All values are now always included (major, minor, release, greek,
repo_rev). However, configure.ac now runs "opal_get_version.sh
... --tarball", which will return the value of tarball_version (if it
is non-empty) or the "full" version string (i.e.,
"major.minor.releasegreek").
Remove configure.params support: configure.params hasn't been used in
years.
Also remove autogen.subdirs support; those should really be handled by
their respective Makefile.am's.
A problem was found with the libnbc MPI_Iallgather
routine when using intercommunicators. Special
thanks to Takahiro Kawashima (Fujitsu) for the patch
and a test case. Verified that master fails without the
patch and that the test passes with the patch applied.
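For illustration only (this is not the contributed test case), a
minimal sketch of the failing scenario: a nonblocking allgather over
an intercommunicator, run with an even number of processes.

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size, sbuf, *rbuf, remote_size;
        MPI_Comm local, inter;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Split COMM_WORLD into two halves and connect them with an
         * intercommunicator. */
        MPI_Comm_split(MPI_COMM_WORLD, rank < size / 2, rank, &local);
        MPI_Intercomm_create(local, 0, MPI_COMM_WORLD,
                             rank < size / 2 ? size / 2 : 0, 0, &inter);

        /* Gather one int from every process in the remote group. */
        MPI_Comm_remote_size(inter, &remote_size);
        rbuf = malloc(remote_size * sizeof(int));
        sbuf = rank;
        MPI_Iallgather(&sbuf, 1, MPI_INT, rbuf, 1, MPI_INT, inter, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        free(rbuf);
        MPI_Comm_free(&inter);
        MPI_Comm_free(&local);
        MPI_Finalize();
        return 0;
    }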
Fixes #219
It turns out that the alps plm code was developed only
on Cray systems that were running batch schedulers.
However, on bring-up and development systems, it's not
at all uncommon for there to be no batch scheduler, and
thus to orte it appears that orte_num_allocated_nodes
is always zero. This forces a user using mpirun on such
a system to always specify a host list:
mpirun -n 4 -N 1 -host 32,45,68 ....
just to get the job to run; but then, since the -L argument for aprun
is never built, the app always runs on the first batch of nodes that
aprun finds available.
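With the host list honored, the plm can build the corresponding aprun
node-list argument, e.g. (illustrative):

    aprun -n 4 -N 1 -L 32,45,68 ./app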
Version numbering and "make dist" are quite complicated/subtle; I'm
not going to get this finished tonight. So revert VERSION to enable
other people to build.
More fixes coming soon...
This is a first cut at updating various infrastructure for git. There
will definitely be more commits; some of the scripts require
committed/pushed code (e.g., the various make-tarball scripts). So
it's not possible to know if we got it right without committing/pushing.
We don't use this script any more (we use gitdub now), but it took a
long time to figure this out. So I'm putting this script in git just
so that it's in history if we ever need it again.
Remove sections of the README concerning Cray that are no longer
relevant because they are obsolete. Minor grammar fixes.
Clarify the --with-pmi section, as this option works
differently on Cray systems.
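For reference, the option is typically given a PMI installation
prefix, e.g. (path illustrative):

    ./configure --with-pmi=/opt/cray/pmi/default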
This commit was SVN r32819.
In all previous releases, the version number would be "A.B.C" unless C
was 0, in which case it would be "A.B". This commit changes that
scheme to always be "A.B.C", even if C==0.
Hence, v1.9.0 will be the first release where this new scheme is evident.
This commit was SVN r32816.
Initialize the blocking_fence flag to false, as the code logic
indicates that it should only be set if someone explicitly provides
that flag.
Thanks to Lisandro Dalcin for reporting it.
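A minimal sketch of the intended logic, with hypothetical names (the
actual surrounding code is not shown here):

    #include <stdbool.h>

    /* Hypothetical sketch: the flag defaults to false and changes
     * only when the caller explicitly provides it. */
    static bool resolve_blocking_fence(bool flag_provided, bool flag_value)
    {
        bool blocking_fence = false;   /* previously left unset by default */
        if (flag_provided) {
            blocking_fence = flag_value;
        }
        return blocking_fence;
    }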
cmr=v1.8.4:reviewer=hjelmn
This commit was SVN r32812.
Per discussions with the pmix folks, it was determined that
the way the Cray pmi pmix component was computing the
PMIX_NODE_RANK attribute for a process (i.e., the process's rank
among the processes on its node) was incorrect.
This commit fixes the problem.
This commit was SVN r32810.
When using the native aprun launcher, it was observed that
there were frequent memory corruption errors occurring either
during a PMI kvs-fence operation, or at MPI termination during
opal cleanup of allocated objects. This was especially bad
when using
aprun -cc none
In some cases, the application would even just hang in finalize
if using ptmalloc, owing to some kind of infinite loop in the
cleanup of small blocks, etc.
It turns out that the problem was in orte_ess_base_proc_binding's
improper use of opal_hwloc_base_get_available_cpus. The cpuset
(bitmap) returned from that function is not meant to be freed
by the caller.
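To illustrate the ownership rule, a hedged sketch (only the hwloc
calls below are real API; opal_hwloc_base_get_available_cpus itself
is not shown):

    #include <hwloc.h>

    void example(hwloc_obj_t obj)
    {
        /* obj->cpuset is owned by the topology: read it, don't free
         * it. The cpuset returned by opal_hwloc_base_get_available_cpus
         * is topology-owned in the same way, which is what the buggy
         * code got wrong by freeing it. */
        hwloc_cpuset_t borrowed = obj->cpuset;
        (void) borrowed;

        /* By contrast, a duplicate is caller-owned and must be freed. */
        hwloc_bitmap_t mine = hwloc_bitmap_dup(obj->cpuset);
        hwloc_bitmap_free(mine);
    }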
This problem is likely never observed when using the mpirun launcher
as there's an early exit if the OMPI_MCA_orte_bound_at_launch
environment variable is set.
This commit was SVN r32809.