* Remove OPAL_ERR_PAFFINITY_NOT_SUPPORTED; fit it into the generic
OPAL_ERR_NOT_SUPPORTED case.
* When odls_default detects that processor affinity is not supported,
it prints a specific message about it, and then it suppressed a
generic HNP help message that would normally follow it (i.e., it's
easier to have the "processor affinity is not supported" show_help
message last).
* Use some symbolic names in odls_default instead of fixed int's,
just for slight readability improvements in the code.
* Introduce orte_show_help_suppress(), which gives the ability to
suppress any future showings of any arbitrary show_help() message.
This is useful if you display message X and want to suppress
message Y. This suppression *only* works in environments where
orte_show_help() does coalescing.
This commit was SVN r23249.
distribution tarball, and would therefore cause automake to fail (in
case someone invokes autogen.sh on a distribution tarball).
This commit was SVN r23218.
* If < 0, it's an OPAL_ERR_* value
* If >= 0, it's the actual output value of the function
This is problematic for the OPAL_SOS stuff. This commit changes those
functions to always return OPAL_* statuses and send the output value
back through output parameters (like 95% of the rest of the code
base). This avoids the confusion with OPAL_SOS stuff and makes
paffinity work again (e.g., mpirun --bind-to-core ...).
I updated all paffinitiy modules for the new function signatures, and
bumped the paffinity API version up to 2.0.1. I don't think the
version change will matter, though, because we'll be introducing
support for hardware threads soon, which will either bump the
paffinity version again or we'll replace paffinity with
a new framework.
This commit was SVN r23197.
The fix is to just check if the return value is positive or not, since all the SOS encoded errors are *always* negative.
The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index as a pointer.
This commit was SVN r23173.
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
back the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
This commit was SVN r23162.
The OPAL SOS framework tries to meet the following objectives:
* reduce the cascading error messages and the amount of code needed to print an error message.
* build and aggregate stacks of encountered errors and associate related individual errors with each other.
* allow registration of custom callbacks to intercept error events.
For more information, refer to
https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
This commit was SVN r23158.
I forgot to mention one more thing in the r23152 commit message:
* Copy the fix for hwloc's m4 to disable the configure flag
--enable-debug when building in embedding mode, because it can be
hijacked by the outter-level application. In this case, if you
configured OMPI with --enable-debug (or have --enable-debug in a
platform file), you'd see all of hwloc's debug output. Ick. hwloc
1.0 will include this fix.
This commit was SVN r23153.
The following SVN revision numbers were found above:
r23152 --> open-mpi/ompi@ca3362021e
* Fix disabling hwloc build (i.e., put the AM_CONDITIONALs where they
belong in the configure.m4 file)
* Update some svn:ignores
* r23142 removed some extraneous code, but forgot to remove the
variables used only by that code
This commit was SVN r23152.
The following SVN revision numbers were found above:
r23142 --> open-mpi/ompi@610fc67d12
supports a wide variety of operating systems and platforms; see the
opal/mca/paffinity/hwloc/hwloc/README file for details.
This component includes an embedded copy of hwloc, currently based on
hwloc-1.0rc6. But note that hwloc is properly SVN imported into the
/vendor branch, so it will be easy to update when 1.0 GA is released.
Note that the hwloc tree embedded in opal/mca/paffinity/hwloc/hwloc is
identical to a hwloc distribution tarball, except that much of the
documentation was rm -rf'ed (because we don't need it for the embedded
case).
Since the paffinity framework currently does not understand hardware
threads, the hwloc component compensates for this by identifying cores
by the "first" hardware thread on that core. Hopefully we'll update
paffinity someday to understand hardware threads. :-)
configure grew a --with-hwloc option, analogous to what we do for many
other external libraries that OMPI supports. However, there's a new
feature: due to the request of several distros, OMPI can be configured
to build with its internal copy of hwloc or with an external copy of
hwloc (e.g., a system-installed hwloc).
1. If --with-hwloc is not specified, Open MPI will try to use its
internal copy (but silently fail/ignore hwloc if that fails).
1. If --with-hwloc=<dir> is supplied, Open MPI looks for hwloc
support in <dir> (and --with-hwloc-libdir=<dir>, if specified).
1. If --with-hwloc=external is supplied, Open MPI will look for hwloc
in a compiler/linker default external location.
1. If --with-hwloc=internal is supplied, Open MPI will use its
internal copy of hwloc.
Some of OMPI's main configury had to be slightly re-arranged in the
bootstrapping phase to accomodate hwloc's configry needs.
This commit was SVN r23125.
http://marc.info/?l=linux-mm-commits&m=127352503417787&w=2 for more
details.
* Remove the ptmalloc memory component; replace it with a new "linux"
memory component.
* The linux memory component will conditionally compile in support
for ummunotify. At run-time, if it has ummunotify support and
finds run-time support for ummunotify (i.e., /dev/ummunotify), it
uses it. If not, it tries to use ptmalloc via the glibc memory
hooks.
* Add some more API functions to the memory framework to accomodate
the ummunotify model (i.e., poll to see if memory has "changed").
* Add appropriate calls in the rcache to the new memory APIs to see
if memory has changed, and to react accordingly.
* Add a few comments in the openib BTL to indicate why we don't need
to notify the OPAL memory framework about specific instances of
registered memory.
* Add dummy API calls in the solaris malloc component (since it
doesn't have polling/"did memory change" support).
This commit was SVN r23113.
It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected.
Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages.
This commit was SVN r23106.
and opal_atomic_lifo_pop. Adds memory barriers to remove the race
condition
This commit was SVN r23014.
The following Trac tickets were found above:
Ticket 2355 --> https://svn.open-mpi.org/trac/ompi/ticket/2355
done this way a long time ago for the "gee whiz!" factor -- when in
reality, they really only need one-of-many-run-time priority
selection).
Changed run-time priorities to be as follows:
* darwin: 20
* linux: 20
* posix: 10
* solaris: 30
* test: 5
* windows: 20
I have a very dim (possibly untrue) recollection that Solaris needs to
have a higher priority than others just to ensure that no other is
chosen under Solaris. Make all other "native" components have a
priority of 20 (they shouldn't conflict with each other). Make the
posix fallback component have a priority of 10. Make the test
component priority 5, meaning someone can always select it, but you
can also make a "never select me" component that prioritizes itself
under test.
This commit was SVN r22997.
modify the OPAL_PAFFINITY_PROCESS_IS_BOUND macro to search the cpuset for
the maximum possible number of cpus rather than just the number of cpus
currently online. This corrects a problem where mpi_paffinity_alone was
not working properly on systems in which there can be cpu namespaces with
holes, such as on ppc64 with smt off (as discussed in #2365).
This commit was SVN r22927.
#if directives -- had to change a pair of #if conditionals in
opal/util/stacktrace.c to make the PGI compiler accept it.
This commit was SVN r22923.
The following Trac tickets were found above:
Ticket 2366 --> https://svn.open-mpi.org/trac/ompi/ticket/2366
1. fix a bug that caused an infinite loop in configure when specifying want-ft but not want-ft-thread by removing a stale reference to the opal-progress-thread option
2. add want-ft=orcm so we can build the orcm errmgr component
3. cleanup the use of "ompi_want_ft_xxx" and replace it with "opal_want_ft_xxx" so that naming conventions are preserved
This commit was SVN r22885.
Adds memory barriers to remove race condition which can
occur on PowerPC architectures (and probably others)
This commit was SVN r22880.
The following Trac tickets were found above:
Ticket 2355 --> https://svn.open-mpi.org/trac/ompi/ticket/2355
#2322.
The short version is that this patch consolidates two pieces of code
that call the back-end munmap and ensures that (if dlsym is used) the
corresponding dlsym is only invoked once and that the variable holding
the result is volatile.
This commit was SVN r22863.
The following Trac tickets were found above:
Ticket 2104 --> https://svn.open-mpi.org/trac/ompi/ticket/2104
Remove the --enable-progress-threads option as this is no longer functional, and hardcode OPAL_ENABLE_PROGRESS_THREADS to 0.
Replace the --enable-mpi-threads option with --enable-mpi-thread-multiple as this is clearer as to meaning. This option automatically turns "on" opal thread support if it wasn't already so specified. If the user specifies --disable-opal-multi-threads --enable-mpi-thread-multiple, we will error out with a message
Add a new --enable-opal-multi-threads option that turns "on" opal thread support without doing anything wrt mpi-thread-multiple
This commit was SVN r22841.
Aleksej Saushev.
Dont use bash or bashism in shell scripts
We should use Posix' setpgid(0,0), which is equivalent to setpgrp().
This commit was SVN r22829.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.
The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.
This commit was SVN r22824.
bug: libmpi_f90 had libmpi.la in its LIBADD instead of libmpi_f77.la.
Fixes trac:2244.
This commit was SVN r22704.
The following Trac tickets were found above:
Ticket 2244 --> https://svn.open-mpi.org/trac/ompi/ticket/2244
discussed extensively. See
https://svn.open-mpi.org/trac/ompi/ticket/2092 and the RFC thread
http://www.open-mpi.org/community/lists/devel/2010/02/7447.php.
Specifically:
* Create LT convenience libraries for OPAL and ORTE if the layer
above them is being created (use the already-defined
AM_CONDITIONALs to know if the project above us is being built).
* ORTE slurps in the LT convenience library for OPAL; OMPI slurps in
the LT convenience library for ORTE.
* Wrapper compilers now only -l one library (e.g., ortecc only does
-lopen-ret, and mpicc only does -lmpi).
This commit was SVN r22691.
discussion on the users list (see
http://www.open-mpi.org/community/lists/users/2009/12/11526.php).
Many thanks to Kevin Buckley who did most of the coding work, and to
Aleksej Saushev for his extreme patience in waiting for me to review
and commit this stuff.
This commit was SVN r22640.
If file does not exist, check the directory it lives in...
Maybe used by caller, trying to open mmap() on NFS, Lustre or
Panasas (thanks Sam).
For now, this is used to warn about the usage of mmap on such FS.
Please note, that Ralph mentioned the orte_no_session_dir parameter.
The help message includes a reference to this.
Tested on NFS and Lustre on Linux on
smoky: mpirun --mca orte_tmpdir_base $HOME/tmp -np 2 ./mpi_stub
jaguar: mpirun ... --mca orte_tmpdir_base /tmp/work/$USER ...
Fixes trac:1354
This should cmr:v1.5 once it has soaked and is shown to work on
Solaris
This commit was SVN r22604.
The following Trac tickets were found above:
Ticket 1354 --> https://svn.open-mpi.org/trac/ompi/ticket/1354
anything for non-MPI apps. Oops! (But before you freak out, gentle
reader, note that mpi_paffinity_alone for MPI apps still worked fine)
When we made the switchover somewhere in the 1.3 series to have the
orted's do processor binding, then stuff like:
mpirun --mca mpi_paffinity_alone 1 hostname
should have bound hostname to processor 0. But it didn't because of a
subtle startup ordering issue: the MCA param registration for
opal_paffinity_alone was in the paffinity base (vs. being in
opal/runtime/opal_params.c), but it didn't actually get registered
until after the global variable opal_paffinity_alone was checked to
see if we wanted old-style affinity bindings. Oops.
However, for MPI apps, even though the orted didn't do the binding,
ompi_mpi_init() would notice that opal_paffinity_alone was set, yet
the process didn't seem to be bound. So the MPI process would bind
itself (this was done to support the running-without-orteds
scenarios). Hence, MPI apps still obeyed mpi_paffinity_alone
semantics.
But note that the error described above caused the new mpirun switch
--report-bindings to not work with mpi_paffinity_alone=1, meaning that
the orted would not report the bindings when mpi_paffinity_alone was
set to 1 (it ''did'' correctly report bindings if you used
--bind-to-core or one of the other binding options).
This commit separates out the paffinity base MCA param registration
into a small function that can be called at the Right place during the
startup sequence.
This commit was SVN r22602.
Not having this check was causing distcheck errors on the OMPI
tarball-build machine because it's still a 32-bit-default machine, so
the evutil.c code was failing some #if conditionals (since it didn't
think it had strtoll available).
This commit was SVN r22577.