Libltdl erroneously returns an error string of "file not found" for
lots of reasons, even if the file really *is* there, but just failed
to dlopen() for some reason. So if lt_dlerror() returns "file not
found", do some simple hueristics and if we *do* find a file, print a
slightly better error message.
This commit was SVN r21214.
1. replacing mpi_paffinity_alone with opal_paffinity_alone - for back-compatibility, I have aliased mpi_paffinity_alone to the new param name. This caus
es a mild abstraction break in the opal/mca/paffinity framework - per the devel discussion...live with it. :-) I also moved the ompi_xxx global variable
that tracked maffinity setup so it could be properly closed in MPI_Finalize to the opal/mca/maffinity framework to avoid an abstraction break.
2. Added code to the odls/default module to perform paffinity binding and maffinity init between process fork and exec. This has been tested on IU's odi
n cluster and works for both MPI and non-MPI apps.
3. Revise MPI_Init to detect if affinity has already been set, and to attempt to set it if not already done. I have *not* tested this as I haven't yet f
igured out a way to do so - I couldn't get slurm to perform cpu bindings, even though it supposedly does do so.
This has only been lightly tested and would definitely benefit from a wider range of evaluation...
This commit was SVN r21209.
* Pass the sequence number of the checkpoint along with reference from the global to the local coordinator.
* 'orte-restart --apponly' now just generates the app context file, and does not run with it. This provides the user the ability to edit the file before launching.
* Add a OPAL_CRS_NONE state
* Split the INC into three distinct parts.
* Implement a restart mechanism for the 'none' component. If given a context it simply execvp()'s it.
This commit was SVN r21195.
* Add 'orte-checkpoint -l' option that lists all checkpoints currently available on the system.
* Add 'orte-restart -i' which prints information regarding the checkpoint targeted for restart.
* Add ability to extract the timing metadata.
* Fix show_help() in the orte-checkpoint and orte-restart tools. They should be using the opal versions instead of the orte versions (otherwise nothing is printed).
This commit was SVN r21194.
OMPI_* to OPAL_*. This allows opal layer to be used more independent
from the whole of ompi.
NOTE: 9 "svn mv" operations immediately follow this commit.
This commit was SVN r21180.
This patch contains the following items:
* Fix the flag passed to open() for the read side of the named pipe between the local and app coordinator. There is a race condition when using O_RDWR on a named pipe (not sure how that bug got in there in the first place).
* Adjust control in the C/R thread timing
* Clarify return code in BLCR component
* Allow the user to adjust the max wait time for the named pipes in the FileM local coordinator by using the MCA parameter "snapc_full_max_wait_time" (Default: 20 seconds)
* If the application terminates while there are active FileM operations, force mpirun to wait on these operations to complete.
* Allow the user to set the local copy command (Default: cp) via MCA parameter "filem_rsh_cp"
* Implement the ability to throttle the number of outgoing connections in FileM. At larger scales this type of explicit throttling helps prevent overwhelming the HNP machine. Default: 10, set via MCA parameter: {{{filem_rsh_max_outgoing}}}
This commit was SVN r21167.
The following SVN revision numbers were found above:
r21131 --> open-mpi/ompi@0deb009225
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh after applying
patches, that include headers, being "lost" due to
inclusion in one of the now deleted headers...
In total 817 files are touched.
In ompi/mpi/c/ header files are moved up into the actual c-file,
where necessary (these are the only additional #include),
otherwise it is only deletions of #include (apart from the above
additions required due to notifier...)
- To get different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
we had already tested this scenario manually to know that it seemed to
be working. What we ''didn't'' test was --enable-static
--disable-shared --disable-dlopen -- but my MTT '''did.''' Yay!
This commit fixes that scenario. Essentially we need to call a dummy
function in hooks.c to ensure that the linker pulls in all those
symbols into the final executable (and therefore pulls in the
malloc_initialize_hook, etc.). Thanks for the heads-up from Brian in
fixing this one!
This commit was SVN r21022.
forgotten to save before the commit was sent.
This comment explains why we're doing a cache check here rather than a
real check.
This commit was SVN r20975.
component (which we probably don't test regularly because we probably
only test environments where the other paffinity components are used)
was not getting built because it had a bad configure test.
This commit was SVN r20974.
and https://svn.open-mpi.org/trac/ompi/ticket/1853, mallopt() hints do
not always work -- it is possible for memory to be returned to the OS
and therefore OMPI's registration cache becomes invalid.
This commit removes all use of mallopt() and uses a different way to
integrate ptmalloc2 than we have done in the past. In particular, we
use almost exactly the same technique as MX:
* Remove all uses of mallopt, to include the opal/memory mallopt
component.
* Name-shift all of OMPI's internal ptmalloc2 public symbols (e.g.,
malloc -> opal_memory_ptmalloc2_malloc).
* At run-time, use the existing glibc allocator malloc hook function
pointers to fully hijack the glibc allocator with our own
name-shifted ptmalloc2.
* Make the decision whether to hijack the glibc allocator ''at run
time'' (vs. at link time, as previous ptmalloc2 integration
attempts have done). Look at the OMPI_MCA_mpi_leave_pinned
and OMPI_MCA_mpi_leave_pinned_pipeline environment variables and
the existence of /sys/class/infiniband to determine if we should
install the hooks or not.
* As an added bonus, we can now tell if libopen-pal is linked
statically or dynamically, and if we're linked statically, we
assume that munmap intercept support doesn't work.
See the opal/mca/memory/ptmalloc2/README-open-mpi.txt file for all the
gory details about the implementation.
Fixes trac:1853.
This commit was SVN r20921.
The following Trac tickets were found above:
Ticket 1853 --> https://svn.open-mpi.org/trac/ompi/ticket/1853
each page. Anyway, this function is _NEVER_ called as we use bind instead of set.
So please don't rely on the first touch memory affinity to do the right thing.
This commit was SVN r20917.
generate mangled windex files. Made ompi-top.1 and ompi-iof.1 build
by default. Also, added the orte-top synonym to the ompi-top manpage.
This commit was SVN r20915.
Enable the debug library suffix, which is extremely necessary on Windows. If users want to debug their own programs in Visual Studio, but linking the programs to the release version libraries of Open MPI, i.e. mixing debug and release version DLLs, that will definitely cause some errors. What we have to do is providing both debug and release versions libraries, distinguished with suffix 'd', e.g. libmpid.dll for debug version.
This commit was SVN r20828.
- This patch solely _adds_ required headers and is rather localized
The next patch (after RFC) heavily removes headers (based on script)
- ompi/communicator/communicator.h: For sources that use
ompi_mpi_comm_world, don't require them to include "mpi.h"
- ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
#include "ompi/mca/topo/topo.h"
- ompi/errhandler/errhandler_predefined.h:
ompi/communicator/communicator.h depends on this header file!
To prevent recursion just have fwd declarations.
#include "ompi/types.h" for fwd declarations of the main structs.
- ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t
- ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
ompi_rb_tree_t, so have the proper classes
- ompi/mca/op/op.h:
Op is pretty self-contained: Nobody up to now has done
#include "opal/class/opal_object.h"
- ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
#include "opal/types.h" for ompi_ptr_t
- ompi/mca/pml/base/base.h:
We use opal_lists
- ompi/mca/pml/dr/pml_dr_vfrag.h:
#include "opal/types.h" for ompi_ptr_t
- ompi/mca/pml/ob1/pml_ob1_hdr.h:
#include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
- opal/dss/dss_unpack.c:
#include "opal/types.h"
- opal/mca/base/base.h:
#include "opal/util/cmd_line.h" for opal_cmd_line_t
- orte/mca/oob/tcp/oob_tcp.c:
#include "opal/types.h" for opal_socklen_t
- orte/mca/oob/tcp/oob_tcp.h:
#include "opal/threads/threads.h" for opal_thread_t
- orte/mca/oob/tcp/oob_tcp_msg.c:
#include "opal/types.h"
- orte/mca/oob/tcp/oob_tcp_peer.c:
#include "opal/types.h" for opal_socklen_t
- orte/mca/oob/tcp/oob_tcp_send.c:
#include "opal/types.h"
- orte/mca/plm/base/plm_base_proxy.c:
#include "orte/util/name_fns.h" for ORTE_NAME_PRINT
- orte/mca/rml/base/rml_base_receive.c:
#include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
- orte/mca/rml/oob/rml_oob_recv.c:
#include "opal/types.h" for ompi_iov_base_ptr_t
- orte/mca/rml/oob/rml_oob_send.c:
#include "opal/types.h" for ompi_iov_base_ptr_t
- orte/runtime/orte_data_server.c
#include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
- orte/runtime/orte_globals.h:
#include "orte/util/name_fns.h" for ORTE_NAME_PRINT
Tested on Linux/x86-64
This commit was SVN r20817.
In case we use memcmp, strlen, strup and friends include <string.h>
Also several constants.h are not included directly
- Let's have mca_topo_base_cart_create return ompi-errors in
ompi/mca/topo/base/topo_base_cart_create.c
This commit was SVN r20773.
get bitten by header depending on having already included
the corresponding [opal|orte|ompi]_config.h header.
When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
are missed.
Script to add the corresponding header in front of all following
(taking care of possible #ifdef HAVE_...)
- Including some minor cleanups to
- ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
- ompi/mca/btl/btl.h -- nclude _after_ #ifndef MCA_BTL_H
- ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
orte/util/output.h
- ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
- ompi/mca/btl/btl.h -- reorder to fit
- ompi/mca/bml/bml.h -- reorder to fit
- ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
- ompi/request/request.h -- additionally need ompi/constants.h
- Tested on linux/x86-64
This commit was SVN r20720.
known-bad memory access pattern. Specifically, a NULL pointer is
passed in a system call as part of a probe to figure out which
affinity API this system has. We know it's a NULL and we did it on
purpose, so don't have Valgrind yell about it.
This commit was SVN r20572.
The prior ompi_proc_t structure had a uint8_t flag field in it, where only one
bit was used to flag that a proc was "local". In that context, "local" was
constrained to mean "local to this node".
This commit provides a greater degree of granularity on the term "local", to include tests
to see if the proc is on the same socket, PC board, node, switch, CU (computing
unit), and cluster.
Add #define's to designate which bits stand for which local condition. This
was added to the OPAL layer to avoid conflicting with the proposed movement of
the BTLs. To make it easier to use, a set of macros have been defined - e.g.,
OPAL_PROC_ON_LOCAL_SOCKET - that test the specific bit. These can be used in
the code base to clearly indicate which sense of locality is being considered.
All locations in the code base that looked at the current proc_t field have
been changed to use the new macros.
Also modify the orte_ess modules so that each returns a uint8_t (to match the
ompi_proc_t field) that contains a complete description of the locality of this
proc. Obviously, not all environments will be capable of providing such detailed
info. Thus, getting a "false" from a test for "on_local_socket" may simply
indicate a lack of knowledge.
This commit was SVN r20496.