and https://svn.open-mpi.org/trac/ompi/ticket/1853, mallopt() hints do
not always work -- it is possible for memory to be returned to the OS
and therefore OMPI's registration cache becomes invalid.
This commit removes all use of mallopt() and uses a different way to
integrate ptmalloc2 than we have done in the past. In particular, we
use almost exactly the same technique as MX:
* Remove all uses of mallopt, to include the opal/memory mallopt
component.
* Name-shift all of OMPI's internal ptmalloc2 public symbols (e.g.,
malloc -> opal_memory_ptmalloc2_malloc).
* At run-time, use the existing glibc allocator malloc hook function
pointers to fully hijack the glibc allocator with our own
name-shifted ptmalloc2.
* Make the decision whether to hijack the glibc allocator ''at run
time'' (vs. at link time, as previous ptmalloc2 integration
attempts have done). Look at the OMPI_MCA_mpi_leave_pinned
and OMPI_MCA_mpi_leave_pinned_pipeline environment variables and
the existence of /sys/class/infiniband to determine if we should
install the hooks or not.
* As an added bonus, we can now tell if libopen-pal is linked
statically or dynamically, and if we're linked statically, we
assume that munmap intercept support doesn't work.
See the opal/mca/memory/ptmalloc2/README-open-mpi.txt file for all the
gory details about the implementation.
Fixes trac:1853.
This commit was SVN r20921.
The following Trac tickets were found above:
Ticket 1853 --> https://svn.open-mpi.org/trac/ompi/ticket/1853
after each process create it's FIFOs but before they access the
peer's FIFOs. Second, replace a one way synchronization by a real
barrier, so we know that every process is really where we expect
them to be.
This commit was SVN r20906.
* Ensure we don't try to do opal_list_get_next() on an item we just
deleted
* set myaddrs = NULL when we're done with it, just for good measure
Once this is ported to OMPI v1.3 branch, it fixes
https://bugs.openfabrics.org/show_bug.cgi?id=1579.
This commit was SVN r20896.
Add two new configure options that specify:
1. when to add padding to the openib control header - this *only* happens when the configure option is specified
2. when to use the dr-like checksum as opposed to the memcpy checksum. Not selectable at runtime - to eliminate performance impacts, this is a configure-only option
Also removed an unused checksum version from opal/util/crc.h.
The new component still needs a little cleanup and some sync with recent ob1 bug fixes. It was created as a separate module to avoid performance hits in ob1 itself, though most of the code is duplicative. The component is only selectable by either specifying it directly, or configuring with the dr-like checksum -and- setting -mca pml_csum_enable_checksum 1.
Modify the LANL platform files to take advantage of the new module.
This commit was SVN r20846.
- This patch solely _adds_ required headers and is rather localized
The next patch (after RFC) heavily removes headers (based on script)
- ompi/communicator/communicator.h: For sources that use
ompi_mpi_comm_world, don't require them to include "mpi.h"
- ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
#include "ompi/mca/topo/topo.h"
- ompi/errhandler/errhandler_predefined.h:
ompi/communicator/communicator.h depends on this header file!
To prevent recursion just have fwd declarations.
#include "ompi/types.h" for fwd declarations of the main structs.
- ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t
- ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
ompi_rb_tree_t, so have the proper classes
- ompi/mca/op/op.h:
Op is pretty self-contained: Nobody up to now has done
#include "opal/class/opal_object.h"
- ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
#include "opal/types.h" for ompi_ptr_t
- ompi/mca/pml/base/base.h:
We use opal_lists
- ompi/mca/pml/dr/pml_dr_vfrag.h:
#include "opal/types.h" for ompi_ptr_t
- ompi/mca/pml/ob1/pml_ob1_hdr.h:
#include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
- opal/dss/dss_unpack.c:
#include "opal/types.h"
- opal/mca/base/base.h:
#include "opal/util/cmd_line.h" for opal_cmd_line_t
- orte/mca/oob/tcp/oob_tcp.c:
#include "opal/types.h" for opal_socklen_t
- orte/mca/oob/tcp/oob_tcp.h:
#include "opal/threads/threads.h" for opal_thread_t
- orte/mca/oob/tcp/oob_tcp_msg.c:
#include "opal/types.h"
- orte/mca/oob/tcp/oob_tcp_peer.c:
#include "opal/types.h" for opal_socklen_t
- orte/mca/oob/tcp/oob_tcp_send.c:
#include "opal/types.h"
- orte/mca/plm/base/plm_base_proxy.c:
#include "orte/util/name_fns.h" for ORTE_NAME_PRINT
- orte/mca/rml/base/rml_base_receive.c:
#include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
- orte/mca/rml/oob/rml_oob_recv.c:
#include "opal/types.h" for ompi_iov_base_ptr_t
- orte/mca/rml/oob/rml_oob_send.c:
#include "opal/types.h" for ompi_iov_base_ptr_t
- orte/runtime/orte_data_server.c
#include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
- orte/runtime/orte_globals.h:
#include "orte/util/name_fns.h" for ORTE_NAME_PRINT
Tested on Linux/x86-64
This commit was SVN r20817.
without mpi.h we have no notion of MPI_SUCCESS...
- ompi/mca/btl/sm/btl_sm.h: ptrdiff_t needs stddef.h
- ompi/mca/mpool/base/: If we use opal_pointer_array_t,
better include the class header.
This commit was SVN r20816.
Adapt orte_process_info to orte_proc_info, and
change orte_proc_info() to orte_proc_info_init().
- Compiled on linux-x86-64
- Discussed with Ralph
This commit was SVN r20739.
Anyway, this is blocking the move: do not include pml.h
if not really needed, aka none of the following used:
mca_pml
MCA_PML_CALL
OMPI_ANY_TAG
OMPI_ANY_SOURCE
OMPI_PROC_NULL
- Notable exceptions (deleting in one header->adding):
- ompi/mca/mtl/psm/
- ompi/mca/osc/rdma/
- ompi/mca/btl/openib/btl_openib_endpoint.c depended on
pml_base_sendreq.h
- Tested on Linux/x86-64, this time including make check
(thanks Jeff and Ralph)
This commit was SVN r20725.
get bitten by header depending on having already included
the corresponding [opal|orte|ompi]_config.h header.
When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
are missed.
Script to add the corresponding header in front of all following
(taking care of possible #ifdef HAVE_...)
- Including some minor cleanups to
- ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
- ompi/mca/btl/btl.h -- nclude _after_ #ifndef MCA_BTL_H
- ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
orte/util/output.h
- ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
- ompi/mca/btl/btl.h -- reorder to fit
- ompi/mca/bml/bml.h -- reorder to fit
- ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
- ompi/request/request.h -- additionally need ompi/constants.h
- Tested on linux/x86-64
This commit was SVN r20720.
opal layer.
Add a check against a maximum (actually get rid of ifs internally to
opal_bitmap.c) -- the functionality to set the current maximum size
opal_bitmap_set_max_size() is currently only used in attribute.c
to set the maximum OMPI_FORTRAN_HANDLE_MAX...
Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f
run with 6 procs.
Let's look into MTT as well...
This commit was SVN r20708.
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.
To compile with FTB support, use --with-ftb=/path/to/ftb/install
CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php
This commit was SVN r20655.
Added a few comments and changed the return code after the FIFO write to be SUCCESS,
even if the FIFO write indicated an error. Such an error would only mean that the
FIFO was full, but the FIFO-write operation would still be queued. Therefore, the
PML should think of this as successful.
This commit was SVN r20644.
anyhow -- if oob functionality is neededm then orte/mca/oob/oob.h
Nevertheless compiles fine with -Wimplicit-function-declaration
This commit was SVN r20641.
Only proc_info.h-internal include file is opal/dss/dss_types.h
- In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
works fine, no errors.
Again, let's have MTT the last word.
This commit was SVN r20631.
* compute mmap-file size more wisely and pass requested size to allocator
* change MCA parameters:
- get rid of mpool_sm_per_peer_size
- get rid of mpool_sm_max_size
- set default mpool_sm_min_size to 0
* no longer pad sm allocations to page boundaries
* have sm_btl_first_time_init check return codes on free-list creations
Have mca_btl_sm_prepare_src() check to see if it can allocate an EAGER fragment
rather than a MAX fragment if the smaller size works.
Remove ompi/class/ompi_[circular_buffer_]fifo.h and references thereto.
Remove opal/util/pow2.[c|h] and references thereto.
This commit was SVN r20614.
* The main thing done here is to convert from multiple FIFOs/queues per
receiver (each receiver has one FIFO for each sender) to a single FIFO/queue
per receiver (all senders sharing the same FIFO for a given receiver).
* This requires rewriting the FIFO support, so that
ompi/class/ompi_[circular_buffer_]fifo.h is no longer used and FIFO
support is instead in btl_sm.h.
* The number of FIFOs per receiver is actually an MCA tunable parameter,
but it appears that 1 or possibly 2 FIFOs (even for 112 local processes)
per receiver is sufficient.
This commit was SVN r20578.