modify the OPAL_PAFFINITY_PROCESS_IS_BOUND macro to search the cpuset for
the maximum possible number of cpus rather than just the number of cpus
currently online. This corrects a problem where mpi_paffinity_alone was
not working properly on systems in which there can be cpu namespaces with
holes, such as on ppc64 with smt off (as discussed in #2365).
This commit was SVN r22927.
#2322.
The short version is that this patch consolidates two pieces of code
that call the back-end munmap and ensures that (if dlsym is used) the
corresponding dlsym is only invoked once and that the variable holding
the result is volatile.
This commit was SVN r22863.
The following Trac tickets were found above:
Ticket 2104 --> https://svn.open-mpi.org/trac/ompi/ticket/2104
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.
The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.
This commit was SVN r22824.
anything for non-MPI apps. Oops! (But before you freak out, gentle
reader, note that mpi_paffinity_alone for MPI apps still worked fine)
When we made the switchover somewhere in the 1.3 series to have the
orted's do processor binding, then stuff like:
mpirun --mca mpi_paffinity_alone 1 hostname
should have bound hostname to processor 0. But it didn't because of a
subtle startup ordering issue: the MCA param registration for
opal_paffinity_alone was in the paffinity base (vs. being in
opal/runtime/opal_params.c), but it didn't actually get registered
until after the global variable opal_paffinity_alone was checked to
see if we wanted old-style affinity bindings. Oops.
However, for MPI apps, even though the orted didn't do the binding,
ompi_mpi_init() would notice that opal_paffinity_alone was set, yet
the process didn't seem to be bound. So the MPI process would bind
itself (this was done to support the running-without-orteds
scenarios). Hence, MPI apps still obeyed mpi_paffinity_alone
semantics.
But note that the error described above caused the new mpirun switch
--report-bindings to not work with mpi_paffinity_alone=1, meaning that
the orted would not report the bindings when mpi_paffinity_alone was
set to 1 (it ''did'' correctly report bindings if you used
--bind-to-core or one of the other binding options).
This commit separates out the paffinity base MCA param registration
into a small function that can be called at the Right place during the
startup sequence.
This commit was SVN r22602.
* Don't build the pstat component if all defines needed aren't there.
* Update platform file to work better
* Work around two places that depended on modex being operational
This commit was SVN r22536.
Originally the patch was to improve the error message, but when digging into the code I found a subtle bug. If the daemon does not tell the HNP what CRS component it used, then the HNP tries to figure it out from the metadata (this is an uncommon case). The path the HNP used was not complete, so it was unable to find the metadata information. This patch fixes this by adding the 'snapshot_reference' to the 'snapshot_location' which completes the path for this search.
cmr:v1.4 (needs a custom patch)
cmr:v1.5
This commit was SVN r22479.
The following Trac tickets were found above:
Ticket 2190 --> https://svn.open-mpi.org/trac/ompi/ticket/2190
party/"vendor" import, the changes are actually far smaller than the
size of this changeset implies. Here's a list of the changes:
* Update the AMD license header in plpa_map.c to be less restrictive
(see https://svn.open-mpi.org/trac/plpa/changeset/262 for details)
-- '''this is the most/only important change of this update.''' No
code is changed by this; only removing a clase from a license
header in plpa_map.c.
* Changes to the generated {{{configure}}}, {{{config.guess}}}, and
{{{config.sub}}} scripts (which aren't used by OMPI).
* soname version tracking changes (which also aren't used by OMPI;
they're only used when PLPA is built/installed in "standalone"
mode).
* Update the "get version" m4 (which was stolen from OMPI's m4 to
begin with, and is only used during OMPI's autogen.sh step).
* Update various PLPA version numbers to 1.3.2.
* Bug fix in plpa-taskset (which is not built in the OMPI PLPA build).
This commit was SVN r22367.
:-)
Okay, cleanup the prior commit so that the default component search path shows in ompi_info, and remains available in component_find.
This commit was SVN r22278.
"my_perfect_path":SYSTEM_DEFAULT:USER_DEFAULT
and OPAL will substitute its internally derived values for the defaults (instead of forcing the user to figure them out).
This commit was SVN r22272.
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.
If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.
Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).
Adjust some platform files to enable these capabilities.
This commit was SVN r22244.
PARAM_WINDOWS_FILES is a mistake or not). Fixes trac:2079.
This commit was SVN r22171.
The following Trac tickets were found above:
Ticket 2079 --> https://svn.open-mpi.org/trac/ompi/ticket/2079
posix_memalign() will either return 0 or not, indicating success. And
if posix_memalign() fails, it's not always going to be due to
out-of-memory -- just return ERR_IN_ERRNO.
This commit was SVN r22070.
MAP_PRIVATE. We didn't catch this because we checked for a NULL
return, not a -1 return. Doh! Thanks again to Julian Seward for
continuing to track this down.
This commit was SVN r22062.
opposite of MAKE_MEM_DEFINED. Also add in a call to NOACCESS to
(mostly) reverse the effects of MAKE_MEM_DEFINED (technically, page 0
was accessible before this, even though it's a Bad Idea to access it).
This commit was SVN r22056.
This commit looks larger than it really is since it includes a fair amount of code cleanup.
The SIGSTOP/SIGCONT+checkpointing work uses some of the functionality in r20391. Basic use case below (note that the checkpoint generated is useable as usual if the stopped application is terminated).
{{{
shell 1) mpirun -np 2 -am ft-enable-cr my-app
... running ...
shell 2) ompi-checkpoint --stop -v MPIRUN_PID
[localhost:001300] [ 0.00 / 0.20] Requested - ...
[localhost:001300] [ 0.00 / 0.20] Pending - ...
[localhost:001300] [ 0.01 / 0.21] Running - ...
[localhost:001300] [ 1.01 / 1.22] Stopped - ompi_global_snapshot_1234.ckpt
Snapshot Ref.: 0 ompi_global_snapshot_1234.ckpt
shell 2) killall -CONT mpirun
... Application Continues execution in shell 1 ...
}}}
Other items in this commit are mostly cleanup that has been sitting off-trunk for too long:
* Add a new {{{opal_crs_base_ckpt_options_t}}} type that encapsulates the various options that could be passed to the CRS. Currently only TERM and STOP, but this makes adding others ''much'' easier.
* Eliminate ORTE_SNAPC_CKPT_STATE_PENDING_TERM, since it served a redundant purpose with the new options type.
* Lay some basic ground work for some future features.
This commit was SVN r21995.
The following SVN revision numbers were found above:
r20391 --> open-mpi/ompi@0704b98668
masks between the mask argument and a local PLPA mask. There were three
problems:
1) The "get" function computed the number of bits as sizeof(mask),
which is the size of the pointer to the mask rather than the mask
itself. So, only 4 bits were copied with m32 and 8 bits with m64.
There are actually 1024 bits.
2) The "get" and "set" functions both copied a number of bits computed
from the sizeof() mask, but sizeof() reports the number of bytes.
We have to multiply by 8 to get the number of bits.
3) These two functions check to make sure tha the mask argument is not
bigger than the PLPA mask. But, the set function copies a number
of bits in the PLPA mask, which is conceivably greater than the
number of bits in the mask argument. So, accesses to the mask
argument may overrun that argument.
Problems 1 and 2 meant that one would encounter errors when the number of
cores exceeded 4 (with -m32) or 8 (with -m64). Problem 3 probably caused
no errors.
This commit was SVN r21993.
The new options work by adding an ":if-avail" qualifier to the "bind-to-socket" and "bind-to-core" MCA params. If the system does not support this capability, the job will launch anyway. Without the qualifier, the job will abort with an error message indicating that the required functionality is not supported on this system.
This commit was SVN r21975.
Due to a typo (probably cut/paste error) the CFLAGS/CPPFLAGS/LDFLAGS/LIBS arguments were not propogated to the BLCR component.
This changes a few {{{crs_blcr_check_}}} to {{{crs_blcr_save}}}, which they should have been all along.
This needs to be applied to v1.3 as well :/
This commit was SVN r21860.
#if defined (c_plusplus)
defined (__cplusplus)
followed by
extern "C" {
and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.
Notable exceptions are:
- opal/include/opal_config_bottom.h:
This is our generated code, that itself defines BEGIN_C_DECL and
END_C_DECL
- ompi/mpi/cxx/mpicxx.h:
Here we do not include opal_config_bottom.h:
- Belongs to external code:
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h
- opal/include/opal/prefetch.h:
Has C++ specific macros that are protected:
- Had #if ... } #endif _and_ END_C_DECLS (aka end up with 2x
END_C_DECLS)
ompi/mca/btl/openib/btl_openib.h
- opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
- opal/win32/ompi_process.h: had extern "C"\n {...
opal/win32/ompi_process.h: dito
- ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
ompi/mpi/f90/test/align_c.c: dito
- ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
- ompi/mpi/f90/xml/common-C.xsl: Amend
Tested on linux using --with-openib and --with-mx
The following do not contain either opal_config.h, orte_config.h or
ompi_config.h
(but possibly other header files, that include one of the above):
ompi/mca/bml/r2/bml_r2_ft.h
ompi/mca/btl/gm/btl_gm_endpoint.h
ompi/mca/btl/gm/btl_gm_proc.h
ompi/mca/btl/mx/btl_mx_endpoint.h
ompi/mca/btl/ofud/btl_ofud_endpoint.h
ompi/mca/btl/ofud/btl_ofud_frag.h
ompi/mca/btl/ofud/btl_ofud_proc.h
ompi/mca/btl/openib/btl_openib_mca.h
ompi/mca/btl/portals/btl_portals_endpoint.h
ompi/mca/btl/portals/btl_portals_frag.h
ompi/mca/btl/sctp/btl_sctp_endpoint.h
ompi/mca/btl/sctp/btl_sctp_proc.h
ompi/mca/btl/tcp/btl_tcp_endpoint.h
ompi/mca/btl/tcp/btl_tcp_ft.h
ompi/mca/btl/tcp/btl_tcp_proc.h
ompi/mca/btl/template/btl_template_endpoint.h
ompi/mca/btl/template/btl_template_proc.h
ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
ompi/mca/btl/udapl/btl_udapl_endpoint.h
ompi/mca/btl/udapl/btl_udapl_mca.h
ompi/mca/btl/udapl/btl_udapl_proc.h
ompi/mca/mtl/mx/mtl_mx_endpoint.h
ompi/mca/mtl/mx/mtl_mx.h
ompi/mca/mtl/psm/mtl_psm_endpoint.h
ompi/mca/mtl/psm/mtl_psm.h
ompi/mca/pml/cm/pml_cm_component.h
ompi/mca/pml/csum/pml_csum_comm.h
ompi/mca/pml/dr/pml_dr_comm.h
ompi/mca/pml/dr/pml_dr_component.h
ompi/mca/pml/dr/pml_dr_endpoint.h
ompi/mca/pml/dr/pml_dr_recvfrag.h
ompi/mca/pml/example/pml_example.h
ompi/mca/pml/ob1/pml_ob1_comm.h
ompi/mca/pml/ob1/pml_ob1_component.h
ompi/mca/pml/ob1/pml_ob1_endpoint.h
ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
ompi/mca/pml/ob1/pml_ob1_recvfrag.h
ompi/mca/pml/v/pml_v_output.h
opal/include/opal/prefetch.h
opal/mca/timer/aix/timer_aix.h
opal/util/qsort.h
test/support/components.h
This commit was SVN r21855.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855