1
1
Граф коммитов

652 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
9687d5e867 Upgrade all www.open-mpi.org URLs to https
Found a handful of other URLs that weren't https-ized, so I updated
them, too (after verifying that they support https, of course).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-04-02 10:43:50 -04:00
Jeff Squyres
a0e34785e3 configure: fix PRRTE conflict with -iquote
PRRTE needs hwloc and libevent, so it needs to be setup "late" in
configure.ac.  However, we don't want to do it at the absolute bottom
of configure.ac, because right near the bottom, we setup CPPFLAGS (and
others) with values that are expected to be used only in
Makefile[.am]'s -- i.e., "$(foo)" values.  Such values are not able to
be used here in configure.

Hence, move the PRRTE setup up above where we do these "final"/
only-relevant-to-Makefile[.am] CPPFLAGS (etc.) updates occur.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-04-01 08:58:06 -07:00
Nathan Hjelm
160ff188b8
Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf
configure: use -iquote for non-system include paths
2020-03-30 09:22:54 -07:00
Noah Evans
ee3517427e Add threads framework
Add a framework to support different types of threading models including
user space thread packages such as Qthreads and argobot:

https://github.com/pmodels/argobots

https://github.com/Qthreads/qthreads

The default threading model is pthreads.  Alternate thread models are
specificed at configure time using the --with-threads=X option.

The framework is static.  The theading model to use is selected at
Open MPI configure/build time.

mca/threads: implement Argobots threading layer

config: fix thread configury

- Add double quotations
- Change Argobot to Argobots
config: implement Argobots check

If the poll time is too long, MPI hangs.

This quick fix just sets it to 0, but it is not good for the
Pthreads version. Need to find a good way to abstract it.

Note that even 1 (= 1 millisecond) causes disastrous performance
degradation.

rework threads MCA framework configury

It now works more like the ompi/mca/rte configury,
modulo some edge items that are special for threading package
linking, etc.

qthreads module
some argobots cleanup

Signed-off-by: Noah Evans <noah.evans@gmail.com>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-03-27 10:15:45 -06:00
Nathan Hjelm
a6212fe767 configure: use -iquote for non-system include paths
This fixes a bug in the configuration system identified by a
change in the C++ standard. C++20 adds a new mandatory header
named version. This (and any system) header will always be
included with <> and not "". On case-insensitive filesystems
this new header conflicts with the VERSION file at the top
level of the build tree.

To fix this issue Open MPI needs to use -iquote instead of -I
for non-system include paths to ensure that these include are
only searched for the quote form of include. This commit also
adds a check to ensure that if the compiler does not support
-iquote that it falls back to -I until support can be added.

References #7155

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-03-09 21:18:40 -04:00
Ralph Castain
727bd8a60d
Report PRRTE build status if autogen'd --no-prrte
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-08 11:02:42 -07:00
Jeff Squyres
2ffdea6f52 Remove a few more remnants of zlib
The compress framework was removed in 66da0c63, but these bits escaped
removal.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-03-03 08:48:47 -08:00
Jeff Squyres
f53abe416f Re-add debugger flag configury
Looks like this was mistakently removed in the conversion to PRRTE.
We still need CFLAGS_WITHOUT_OPTFLAGS for Open MPI's MPI debugger
interface.  Not having this functionality means that ompi/debuggers
was being compiled incorrectly, which led to -- among other things --
32 bit builds failing.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-29 07:08:12 -08:00
Nathan Hjelm
0b8baa217d ompi: remove obsolete c++ bindings
This commit contains the following changes:

The C++ bindings were removed from the standard in MPI-3.0. This
commit removes the entirety of the C++ bindings as well as the
support configury.

Removes all references to C++ from the man pages. This includes the
bindings themselves, all references to what C++ bindings return,
all not-available comments, and differences between C++ and other
language bindings.

If the user passes --enable-mpi-cxx, --enable-mpi-cxx-seek, or
--enable-cxx-exceptions, print a warning message an abort configure.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-26 13:04:55 -08:00
Ralph Castain
8d66045e95
Deprecate the enable-orterun-prefix-by-default options
Mark the --enable-orterun-prefix-by-default and
--enable-mpirun-prefix-by-default options as deprecated, but continue to
honor them by translating them to the new
--enable-prte-prefix-by-default option.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-21 08:02:13 -08:00
Jeff Squyres
3bfc7a7b62
Tweak: rename "deprecated" --> "deleted"
We're dealing with CLI options that have been deleted, not
deprecated.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5aea4446288db9207fbc60f23ee903b149217121)
2020-02-21 07:33:05 -08:00
Ralph Castain
7ebaee7437
Deprecate the --with-pmi option
Per the developer's meeting, add detection of the deprecated --with-pmi
(and its associated --with-pmi-libdir) configure option and error out
with a polite note of the change in support

Since "--with-pmi" now shows in the configure help output, mark the help
string with a giant *DEPRECATED* to warn users not to use it

Signed-off-by: Ralph Castain <rhc@pmix.org>

Ma

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-19 14:39:45 -08:00
Ralph Castain
edaf9160ae
Enable build against PMIx v2.2 without internal PRRTE
If you autogen.pl --without-prrte, we wouldn't configure or build PRRTE
support. However, configuring with --disable-internal-rte wasn't working
as it was being ignored. This led to some false errors when compiling
with an earlier PMIx v2.2 release.

That said, there were a couple of places that needed protection against
PMIx v2.2.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-15 16:28:16 -08:00
Gilles Gouaillardet
174e967dbc
Remove ORTE project
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.

remove orte: misc fixes

 - UCX fixes
 - VPATH issue
 - oshmem fixes
 - remove useless definition
 - Add PRRTE submodule
 - Get autogen.pl to traverse PRRTE submodule
 - Remove stale orcm reference
 - Configure embedded PRRTE
 - Correctly pass the prefix to PRRTE
 - Correctly set the OMPI_WANT_PRRTE am_conditional
 - Move prrte configuration to the end of OMPI's configure.ac
 - Make mpirun a symlink to prun, when available
 - Fix makedist with --no-orte/--no-prrte option
 - Add a `--no-prrte` option which is the same as the legacy
   `--no-orte` option.
 - Remove embedded PMIx tarball. Replace it with new submodule
   pointing to OpenPMIx master repo's master branch
 - Some cleanup in PRRTE integration and add config summary entry
 - Correctly set the hostname
 - Fix locality
 - Fix singleton operations
 - Fix support for "tune" and "am" options

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-02-07 18:20:06 -08:00
guserav
4ad78aaa15 Revert "Remove opal/mca/common/ofi."
This reverts commit dd20174532.

Signed-off-by: guserav <erik.zeiske@web.de>
2019-08-09 10:33:21 -07:00
Ralph Castain
cd1b5641be Update slurm pmi configury to account for pmix
When Slurm is built against PMIx, some installations place a copy of the
PMIx library that Slurm is linking against in the Slurm PMI location.
Current configury ignores that location. The desired behavior is to look
for a PMIx lib in that location when --with-pmi is given. If the user
also specifies --with-pmix and gives a different location, then override
anything previously found and look for it where the user directed.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-21 11:33:35 -08:00
KAWASHIMA Takahiro
8bbd201029
Merge pull request #6205 from kawashima-fj/pr/fp16
Add FP16 datatypes
2019-02-08 14:52:13 +09:00
Jeff Squyres
dd20174532 Remove opal/mca/common/ofi.
It never lived up to its purpose (and has caused amorphous indirect
errors such as https://github.com/open-mpi/ompi/issues/2519), so
delete it.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Jeff Squyres
3f4af8e51c opal/common: remove stale common components
The verbs and verbs_usnic components are now no longer necessary / no
longer used anywhere in the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
KAWASHIMA Takahiro
ef4c47db1f configure: disable short float with Intel compiler
`short float` support of the Intel C++ Compiler (group of C and C++
compilers), at least versions 18.0 and 19.0, is half-baked. It can
compile declarations of `short float` variables and expressions of
`sizeof(short float)` but cannot compile operations of `short float`
variables. In this situation, `AC_CHECK_TYPES(short float)` defines
`HAVE_SHORT_FLOAT` as 1 and compilation errors occur in
`ompi/mca/op/base/op_base_functions.c`. To avoid this error
tentatively, we disable `short float` support when using the Intel
C++ Compiler.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 15:02:13 +09:00
KAWASHIMA Takahiro
2ad1c09848 opal/datatype: Add opal_short_float_t
The type `short float`, which is proposed in ISO/IEC JTC 1/SC 22 WG 14
(C WG), is not supported by most compilers yet. But some compilers
(including gcc 7 for AArch64 and clang 6) support `_Float16`, which
is defined in ISO/IEC TS 18661-3:2015 (ISO/IEC JTC 1/SC 22/WG 14 N1945)
as an extensions for C. If it is detected in `configure`, it is used
as an alternate type of `short float` in Open MPI internal code.

This commit adds a `configure` option `--enable-alt-short-float=TYPE`.
It can be used to specify a type other than `short float` and `_Float16`
as the alternate type.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro
f6b39452f6 opal/datatype: Support short float
The type `short float` is proposed for the C language in ISO/IEC JTC
1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a.
half-precision floating point or FP16.

By this commit, `short float` and `short float _Complex` are detected
in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT`
and its complex number version are not added yet.

This commit changes values of existing `OPAL_DATATYPE_*` macros.
This change does not affect ABI compatibility of `libmpi.so` and the
like because these values are only used in OPAL and OMPI internal code.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro
4e6403ac0f configure: Remove $ac_cv_type_[TYPE] checks for C99 types
Now Open MPI requires a C99 compiler. Checking availability of
the following types is no more needed.

- `long long` (`signed` and `unsigned`)
- `long double`
- `float _Complex`
- `double _Complex`
- `long double _Complex`

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-11-14 09:57:10 +09:00
Jeff Squyres
0379a44678 configure.ac: no longer using strncpy_s()
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-27 11:58:47 -07:00
Jeff Squyres
d246fb7897 Update Automake minimum version to 1.13.4
The embedded PMIx Automake minimum version is already 1.13.4, so to
autogen.pl Open MPI successfully, you already have to have Autoamek
1.13.4.  So we might as well make it official (i.e., bump Open MPI's
Automake minimum to match the Automake minimum in the embedded PMIx).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-15 13:29:30 -07:00
Nathan Hjelm
1cdbceb095 patcher/base: improve instruction cache flush for aarch64
This commit updates the patcher component to either use the
__clear_cache intrinsic or the correct assembly to flush the
instruction cache.

Fixes #5631

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-06 09:47:53 -06:00
Sergey Oblomov
d57ae62dee MCA/UCX: added common module
- implemented non-blocking routines for flush operations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-22 16:41:09 +03:00
Thananon Patinyasakdikul
390d72addd
Merge pull request #4885 from davideberius/spc_pr
Initial Software-based Performance Counters PR
2018-06-12 14:04:49 -07:00
David Eberius
d377a6b6f4 Added Software-based Performance Counters driver code along with several counters.
This code is the implementation of Software-base Performance Counters as described in the paper 'Using Software-Base Performance Counters to Expose Low-Level Open MPI Performance Information' in EuroMPI/USA '17 (http://icl.cs.utk.edu/news_pub/submissions/software-performance-counters.pdf).  More practical usage information can be found here: https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI.

All software events functions are put in macros that become no-ops when SOFTWARE_EVENTS_ENABLE is not defined.  The internal timer units have been changed to cycles to avoid division operations which was a large source of overhead as discussed in the paper.  Added a --with-spc configure option to enable SPCs in the Open MPI build.  This defines SOFTWARE_EVENTS_ENABLE.  Added an MCA parameter, mpi_spc_enable, for turning on specific counters.  Added an MCA parameter, mpi_spc_dump_enabled, for turning on and off dumping SPC counters in MPI_Finalize.  Added an SPC test and example.

Signed-off-by: David Eberius <deberius@vols.utk.edu>
2018-06-11 22:48:16 -04:00
Brian Barrett
9fff40647d oshmem: disable if no spmls build
This patch disables the oshmem layer if there are no SPMLs that
will build.  With the limited set of SPMLs available to support
oshmem, many builds end up installing an oshmem library that we
know will not work.  There has been a bit of customer confusion
over oshmem, hopefully this will lead customers in the right
direction.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-05-25 08:48:50 -07:00
Brian Barrett
22bdf85299 dist: Add infrastructre for prjects to not build
Two related changes to allow projects to not build based on
configure test results, as opposed to only reacting to
user configure options today.  Use case is disabling a project
like oshmem because no communication channels can be built.

First, Move PROJECT_* AM_CONDITIONALs from the top of configure to
the bottom, so that we can change the results during configure.
Second, add a DIST_SUBDIRS to Makefile.am (and populate it in
opal_mca) so that "make dist" will work even when a project is
disabled.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-05-25 08:48:50 -07:00
Howard Pritchard
0751578cbf
Merge pull request #4949 from hppritcha/topic/memkind_update
mpool/memkind: refactor to use the current API
2018-04-25 07:24:28 -06:00
Howard Pritchard
824197f886 mpool/memkind: refactor to use the current API
The mpool/memkind component was using a deprecated "partitions" API.
This commit refactors the memkind component to make use of the
supported public API.

The public API uses 3 parameters to specify a mpool "kind":

- a memkind type (which for now is just default or HBM)
- a memkind policy
- a memkind_bits (partly to specify pagesize)

The MCA parameters were changed to reflect these memkind
parameters.

Add a make check test for sanity checking of the memkind component.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-04-24 22:11:21 -06:00
Jeff Squyres
3f0ccff1b6 configure: remove block on POWER 7/BE systems
We thought there was a silent data corruption issue on POWER 7/BE
systems, so we blocked building on POWER 7/BE systems altogether.  We
later figured out that it was just data hangs -- not silent data
corruption.  So in hindsight, the configure block probably wasn't
necessary -- but we didn't know it at the time.

Regardless, the hangs have now been fixed, and we're removing the
POWER 7/BE block in configure.

For more detail on the entire saga, see
https://github.com/open-mpi/ompi/issues/4349#issuecomment-374970982.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-04-06 12:02:08 -04:00
Jeff Squyres
9ef0f3d83a ompi/monitoring: add .sh versionig to common monitoring lib
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-02-20 07:07:23 -08:00
Nathan Hjelm
1e630f4a46 configure: check for C11 and atomic types
This commit updates the configure code for Open MPI to check for C11
support. The features requested are: atomics and thread local
storage.

References #3879

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-12-11 15:27:12 -07:00
George Bosilca
458ccc12e1
Move the profiling library in common/monitoring
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-25 12:18:23 -04:00
Mark Allen
9b029c1be3 removing nmcheck_prefix.pl due to false positives
This test has proven to produce too many false positives so far. I hope
to re-enable it in the future, but until it has a longer history of not
producing false postivies it doesn't need to produce false nuisance
failures for everybody.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-08-24 13:01:39 -04:00
Joshua Hursey
12a015d90f config: Remove support for big endian PPC, XL compiler older than 13.1
* Removes support for big endian PPC
 * Removes support for XL compiler older than 13.1
 * Fixes Issue #4053

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-15 17:01:36 -04:00
Howard Pritchard
45e2771162 configure: remove CR/FT related options
As part of the process for addressing removal of CR/FT related
code from master (and hence from the 3.0.0 release), it was agreed
at the OMPI devel F2F on 7/13/17 that we'd break this in to two
pieces:

1) remove the configure arguments (fewer changes)
2) remove all the CR/FT code, etc. in a subsequent bigger commit
    that may not make it in to 3.0.0 in time.

By doing 1), the available configure options would not change
in a subsequent 3.0.x release if we end up not being able to do 2)
before 3.0.0 is released.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-07-17 13:48:59 -06:00
Jeff Squyres
ccf17808b6 Merge pull request #3258 from markalle/pr/symbol_name_pollution
symbol name pollution
2017-07-12 16:19:25 -05:00
Gilles Gouaillardet
a111fc8ff2 opal/datatype: fix opal_dt_swap_long_double if no IEEE754_H
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-12 10:27:45 +09:00
Gilles Gouaillardet
8fd08b933a opal/datatype: add minimal support to convert long double
between ieee 754 quadruple precision and extended precision formats.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-12 10:27:45 +09:00
Mark Allen
f0af4636ce testcase to check for bad symbol name prefixes
This checks the main libs that would be directly or indirectly linked
against the users executable (libmpi.so, libmpi_mpifh.so, libmpi_usempi.so,
libopen-rte, libopen-pal) using "nm" and looking for symbols without ompi_
opal_ mpi_ etc prefixes.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-07-11 02:13:21 -04:00
Ralph Castain
2753f53e6d Detect that we have a mix of BE/LE in the system, provide a warning that OMPI doesn't currently support this environment, and error out
Fixes #2817

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-03 15:47:05 -07:00
bosilca
d55b666834 Topic/monitoring (#3109)
Add a monitoring PML, OSC and IO. They track all data exchanges between processes,
with capability to include or exclude collective traffic. The monitoring infrastructure is
driven using MPI_T, and can be tuned of and on any time o any communicators/files/windows.
Documentations and examples have been added, as well as a shared library that can be
used with LD_PRELOAD and that allows the monitoring of any application.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>


* add ability to querry pml monitorinting results with MPI Tools interface
using performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Fix a convertion problem and add a comment about the lack of component
retain in the new component infrastructure.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Allow the pvar to be written by invoking the associated callback.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Various fixes for the monitoring.
Allocate all counting arrays in a single allocation
Don't delay the initialization (do it at the first add_proc as we
know the number of processes in MPI_COMM_WORLD)

Add a choice: with or without MPI_T (default).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Cleanup for the monitoring module.
Fixed few bugs, and reshape the operations to prepare for
global or communicator-based monitoring. Start integrating
support for MPI_T as well as MCA monitoring.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Adding documentation about how to use pml_monitoring component.

Document present the use with and without MPI_T.
May not reflect exactly how it works right now, but should reflects
how it should work in the end.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change rank into MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global variables in pml_monitoring.c.
Change mca_pml_monitoring_flush() signature so we don't need the size and rank parameters.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Improve monitoring support (including integration with MPI_T)

Use mca_pml_monitoring_enable to check status state. Set mca_pml_monitoring_current_filename iif parameter is set
Allow 3 modes for pml_monitoring_enable_output: - 1 : stdout; - 2 : stderr; - 3 : filename
Fix test : 1 for differenciated messages, >1 for not differenciated. Fix output.
Add documentation for pml_monitoring_enable_output parameter. Remove useless parameter in example
Set filename only if using mpi tools
Adding missing parameters for fprintf in monitoring_flush (for output in std's cases)
Fix expected output/results for example header
Fix exemple when using MPI_Tools : a null-pointer can't be passed directly. It needs to be a pointer to a null-pointer
Base whether to output or not on message count, in order to print something if only empty messages are exchanged
Add a new example on how to access performance variables from within the code
Allocate arrays regarding value returned by binding

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add overhead benchmark, with script to use data and create graphs out of the results
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix segfault error at end when not loading pml
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Start create common monitoring module. Factorise version numbering
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix microbenchmarks script
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Improve readability of code

NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add osc monitoring component

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add error checking if running out of memory in osc_monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Resolve brutal segfault when double freeing filename
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Moving to ompi/mca/common the proper parts of the monitoring system
Using common functions instead of pml specific one. Removing pml ones.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add calls to record monitored data from osc. Use common function to translate ranks.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix test_overhead benchmark script distribution

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix linking library with mca/common

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add passive operations in monitoring_test

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix from rank calculation. Add more detailed error messages

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix alignments. Fix common_monitoring_get_world_rank function. Remove useless trailing new lines

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix osc_monitoring mget_message_count function call

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change common_monitoring function names to respect the naming convention. Move to common_finalize the common parts of finalization. Add some comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring common output system

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add error message when trying to flush to a file, and open fails. Remove erroneous info message when flushing wereas the monitoring is already disabled.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Consistent output file name (with and without MPI_T).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Always output to a file when flushing at pvar_stop(flush).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update the monitoring documentation.
Complete informations from HowTo. Fix a few mistake and typos.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use the world_rank for printf's.
Fix name generation for output files when using MPI_T. Minor changes in benchmarks starting script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean potential previous runs, but keep the results at the end in order to potentially reprocess the data. Add comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add security check for unique initialization for osc monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean the amout of symbols available outside mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Remove use of __sync_* built-ins. Use opal_atomic_* instead.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Allocate the hashtable on common/monitoring component initialization. Define symbols to set the values for error/warning/info verbose output. Use opal_atomic instead of built-in function in osc/monitoring template initialization.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Deleting now useless file : moved to common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add histogram ditribution of message sizes

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add histogram array of 2-based log of message sizes. Use simple call to reset/allocate arrays in common_monitoring.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add informations in dumping file. Separate per category (pt2pt/osc/coll (to come)) monitored data

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add coll component for collectives communications monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix warning messages : use c_name as the magic id is not always defined. Moreover, there was a % missing. Add call to release underlying modules. Add debug info messages. Add warning which may lead to further analysis.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix log10_2 constant initialization. Fix index calculation for histogram array.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add debug info messages to follow more easily initialization steps.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Group all the var/pvar definitions to common_monitoring. Separate initial filename from the current on, to ease its lifetime management. Add verifications to ensure common is initialized once only. Move state variable management to common_monitoring.
monitoring_filter only indicates if filtering is activated.
Fix out of range access in histogram.
List is not used with the struct mca_monitoring_coll_data_t, so heritate only from opal_object_t.
Remove useless dead code.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix invalid memory allocation. Initialize initial_filename to empty string to avoid invalid read in mca_base_var_register.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Don't install the test scripts.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix missing procs in hashtable. Cache coll monitoring data.
    * Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer.
    * Cache monitoring data relative to collectives operations on creation.
    * Remove double caching.
    * Use same proc name definition for hash table when inserting and
      when retrieving.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use intermediate variable to avoid invalid write while retrieving ranks in hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add missing release of the last element in flush_all. Add release of the hashtable in finalize.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use a linked list instead of a hashtable to keep tracks of communicator data. Add release of the structure at finalize time.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Set world_rank from hashtable only if found

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use predefined symbol from opal system to print int

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Move collective monitoring data to a hashtable. Add pvar to access the monitoring_coll_data. Move functions header to a private file only to be used in ompi/mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix pvar registration. Use OMPI_ERROR isntead of -1 as returned error value. Fix releasing of coll_data_t objects. Affect value only if data is found in the hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add automated check (with MPI_Tools) of monitoring.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix procs list caching in common_monitoring_coll_data_t

    * Fix monitoring_coll_data type definition.
    * Use size(COMM_WORLD)-1 to determine max number of digits.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add PVAR's handles. Clean up code (visibility, add comments...). Start updating the documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix coll operations monitoring. Update check_monitoring accordingly to the added pvar. Fix monitoring array allocation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Documentation update.
Update and then move the latex and README documentation to a more logical place

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix monitoring_prof (bad variable.vector used, and wrong array in PMPI_Gather).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory footprint of monitoring_prof. Unify OSC related outputs.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add the use of a machine file for overhead benchmark

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Check for out-of-bound write in histogram

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix common_monitoring_cache object init for MPI_COMM_WORLD

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add RDMA benchmarks to test_overhead
Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for complete sending (pingpong / 2).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add computation of average and median of overheads. Add comments and copyrigths to the test_overhead script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add technical documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adapt to the new definition of communicators

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update expected output in test/monitoring/monitoring_test.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add dumping histogram in edge case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adding a reduce(pml_monitoring_messages_count, MPI_MAX) example

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add consistency in header inclusion.
Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
Add sanity check before emptying hashtable.
Fix typos in documentation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* misc monitoring fixes

* test/monitoring: fix test when weak symbols are not available
* monitoring: fix a typo and add a missing file in Makefile.am
and have monitoring_common.h and monitoring_common_coll.h included in the distro
* test/monitoring: cleanup all tests and make distclean a happy panda
* test/monitoring: use gettimeofday() if clock_gettime() is unavailable
* monitoring: silence misc warnings (#3)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

* Cleanups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Changing int64_t to size_t.
Keep the size_t used accross all monitoring components.
Adapt the documentation.
Remove useless MPI_Request and MPI_Status from monitoring_test.c.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add parameter for RMA test case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean the maximum bound computation for proc list dump.
Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0dbe5.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add communicator-specific monitored collective data reset

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring scripts to the 'make dist'
Also install them in the build and the install directories.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-26 18:21:39 +02:00
Howard Pritchard
462342d148 Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi
common/libfabric: move libfabric to ofi
2017-04-21 07:50:38 -06:00
Howard Pritchard
841192645b common/libfabric: move libfabric to ofi
This PR renames the common library for OFI libfabric from
libfabric to ofi.  There are a number of reasons this
is good to do:

1) its shorter and replaces 9 characters with three for
   function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
   the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
   the term "ofi" internally and also in their configure options
   relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)

There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.

This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and with-ofi-libdir.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-20 13:07:16 -06:00
Gilles Gouaillardet
cc8a655fe6 configury: remove now obsolete reference to OPAL_PTRDIFF_TYPE
since Open MPI now requires a C99, and ptrdiff_t type is part of C99,
there is no more need for the abstract OPAL_PTRDIFF_TYPE type.

Thanks George, Nathan and Paul for the help.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 13:42:45 +09:00
Ralph Castain
08b5fe46db Check for zlib.h
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-20 11:55:11 -08:00