This means we need not check for jemalloc in the configure script for
this component. Removing this.
In some machines having the TLS option on can cause errors in
opening this component. --disable-tls while configuring jemalloc.
Please look for instructions for installing jemalloc as a static
library linked directly into memkind in CONTRIBUTING file
github.com/memkind/memkindw
This commit rewrites both the mpool and rcache frameworks. Summary of
changes:
- Before this change a significant portion of the rcache
functionality lived in mpool components. This meant that it was
impossible to add a new memory pool to use with rdma networks
(ugni, openib, etc) without duplicating the functionality of an
existing mpool component. All the registration functionality has
been removed from the mpool and placed in the rcache framework.
- All registration cache mpools components (udreg, grdma, gpusm,
rgpusm) have been changed to rcache components. rcaches are
allocated and released in the same way mpool components were.
- It is now valid to pass NULL as the resources argument when
creating an rcache. At this time the gpusm and rgpusm components
support this. All other rcache components require non-NULL
resources.
- A new mpool component has been added: hugepage. This component
supports huge page allocations on linux.
- Memory pools are now allocated using "hints". Each mpool component
is queried with the hints and returns a priority. The current hints
supported are NULL (uses posix_memalign/malloc), page_size=x (huge
page mpool), and mpool=x.
- The sm mpool has been moved to common/sm. This reflects that the sm
mpool is specialized and not meant for any general
allocations. This mpool may be moved back into the mpool framework
if there is any objection.
- The opal_free_list_init arguments have been updated. The unused0
argument is not used to pass in the registration cache module. The
mpool registration flags are now rcache registration flags.
- All components have been updated to make use of the new framework
interfaces.
As this commit makes significant changes to both the mpool and rcache
frameworks both versions have been bumped to 3.0.0.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Add primitive magic number and version checking in the connectivity
checker protocol. These checks doesn't *guarantee* to we won't get
false PINGs and ACKs, but they do significantly reduce the possibility
of interpretating random incoming fragments as PINGs or ACKs.
The btl_recv.h:lookup_sender() function uses the hashed ORTE proc name
to determine the sender of the packet. With add_procs_cutoff>0, the
usnic BTL may not have knowledge of all the senders.
Until the usNIC BTL can be adjusted to do something like the
openib/ugni BTLs (i.e., use opal_proc_for_name() to lookup unknown
sender proc names), set MCA_BTL_FLAGS_SINGLE_ADD_PROCS, which means
that ob1 will only all add_procs() once -- with all the procs in it.
Also in this commit, adapt the connectivity checker to not rely on
knowing all the senders (which is a bit easier than adapting the main
BTL send path): the receiving connectivity agent will simply echo back
the same PING message (which contains the sender's IP address+UDP
port) back to the sender without checking that it knows who the sender
is. If the sender receives the echoed PING back on the expexted
interface, it will find a match in the pending pings list. If the
sender receives the echoed PING back an unexpected interface, a match
will not be found, and the incoming PING message will be dropped.
Fixesopen-mpi/ompi#1440
This commit fixes an inconsistency between btl_openib_receive_queues,
btl_openib_max_send_size and btl_openib_eager_limit. Before this
commit if the ini file specified a set of default receive queues that
happen to not contain one large enough for the default max_send_size
of eager_limit users would see an error like:
WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is smaller than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
that no queue is large enough to receive the largest possible incoming
message fragment. The OpenFabrics (openib) BTL will therefore be
deactivated for this run.
Local host: somehost
Largest buffer size: 65536
Maximum send fragment size: 131072
This commit adds code that detects the source of the max_send_size and
eager_limit values and sets either or both of them to the size
supported by the largest queue pair if both 1) the value is larger
than the largest queue pair size, and 2) the value was not set by the
user or a MCA configuration file.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds two m4 macros: OPAL_SUMMARY_ADD, OPAL_SUMMARY_PRINT.
OPAL_SUMMARY_ADD adds an item to a section in the summary. For example
OPAL_SUMMARY_ADD([[Transports]],[[Foo]],...,[yes]) will add the
following to the summary:
Transports
-----------------------
Foo: yes
With this commit two sections are added: Transports, Resource Managers.
The OPAL_SUMMARY_PRINT macro is called after AC_OUTPUT and prints out
some information about the build (version, projects, etc) and then
the summarys sections. It will additionally print a warning if
internal debugging is enabled.
Example output:
Open MPI configuration:
-----------------------
Version: 3.0.0 a1
Build Open Platform Abstration project: yes
Build Open Runtime project: yes
Build Open MPI project: yes
Build Open SHMEM project: no
MPI C++ bindings (deprecated): no
MPI Fortran bindings: mpif.h, use mpi, use mpi_f08
Debug build: yes
Transports
-----------------------
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
KNEM Shared Memory: no
Linux CMA IPC: no
Mellanox MXM: no
Open UCX: no
OpenFabrics libfabric: no
OpenFabrics Verbs: no
portals4: no
QLogic Infinipath (PSM): no
tcp: yes
XPMEM Shared Memory: no
Resource Managers
-----------------------
Cray Alps: no
Grid Engine: no
LSF: no
Slurm: yes
Torque: yes
INTERNAL DEBUGGING IS ENABLED. DO NOT USE THIS BUILD FOR PERFORMANCE MEASUREMENTS!
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit fixes a bug in the opal key value parser that might cause
the filename parser to go past the beginning of the string.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>