1
1
Граф коммитов

3434 Коммитов

Автор SHA1 Сообщение Дата
Noah Evans
a64abadf97 Fix mca_base_var_files separator
In the opal list parsing behavior paths should be separated by ':' while files are separated by ','. In the opal and pmix code (the pmix fix is in a separate commit) there was a mistake in the parsing such that files were being separated by ':' when they should be separated by ','s. This commit attempts to address this mismatch.

Signed-off-by: Noah Evans <noah.evans@gmail.com>
2018-06-19 12:59:07 -06:00
Ralph Castain
f0a0d606a0 Correct accounting for tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1be080f7b92bad39745f42628a8cb6afefad2d2a)
2018-06-18 13:24:25 -07:00
Thananon Patinyasakdikul
13f58f3191
Merge pull request #5274 from thananon/ofi_sep
btl/ofi: add scalable endpoint support.
2018-06-18 08:41:06 -07:00
Jeff Squyres
266d5b2110
Merge pull request #5277 from jsquyres/pr/cygwin-patch
external libevent: fix for Cygwin
2018-06-18 11:08:30 -04:00
Ralph Castain
7981818b84 Update PMIx atomics
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-17 10:03:49 -07:00
Ralph Castain
fa18ba395d Sync to latest PMIx v3.0rc
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-17 02:41:46 -07:00
Ralph Castain
ac7bb15505 Fix other typo in help message
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-16 16:30:52 -07:00
Ralph Castain
8cfce583c0 Correct typo to properly check for PMIx v4
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-16 16:29:05 -07:00
Jeff Squyres
07c8ec6a3c external libevent: fix for Cygwin
Fix from Marco Atzeri for building on Cygwin.

Signed-off-by: Marco Atzeri <marco.atzeri@gmail.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-16 09:08:58 -07:00
Ralph Castain
e2e6da4379
Merge pull request #5258 from rhc54/topic/pmix4
Sync to PMIx v3.0rc and add ext4x
2018-06-15 09:41:08 -07:00
Thananon Patinyasakdikul
dae3c9447c btl/ofi: add scalable endpoint support.
This commit add support for scalable endpoint to enhance multithreaded
application performance. The BTL will detect the support from ofi
provider and will fallback to normal usage of scalable endpoint is not
supported.

NEW MCA parameters:
- mca_btl_ofi_disable_sep: force the btl to not use scalable endpoint.
- mca_btl_ofi_num_contexts_per_module: number of communication context
  to create (should be the same as number of thread).

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-06-14 15:44:29 -07:00
Howard Pritchard
7dcab6e4a4
Merge pull request #5269 from hppritcha/topic/squash_gcc7.3.0_warnings
topo/treematch - quash compiler warning
2018-06-13 21:13:04 -05:00
Howard Pritchard
64de269cc3 topo/treematch - quash compiler warning
quash a compiler warning showing up with gcc 7.3

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-06-13 16:34:17 -05:00
Thananon Patinyasakdikul
390d72addd
Merge pull request #4885 from davideberius/spc_pr
Initial Software-based Performance Counters PR
2018-06-12 14:04:49 -07:00
Jeff Squyres
591b225527
Merge pull request #5245 from PeterGottesman/hwloc-fix
Ensure required hwloc directories are in dist tarballs
2018-06-12 13:08:35 -04:00
Peter Gottesman
afe363b73b Ensure required hwloc directories are in dist tarballs
Fixes MTT failure when running autogen on a tarball

Signed-off-by: Peter Gottesman <pgottesm@cisco.com>
2018-06-12 08:33:50 -07:00
Howard Pritchard
b43f94895f
Merge pull request #5261 from thananon/ofi_new_mr
btl/ofi: change required mr mode bits.
2018-06-11 20:54:09 -06:00
David Eberius
d377a6b6f4 Added Software-based Performance Counters driver code along with several counters.
This code is the implementation of Software-base Performance Counters as described in the paper 'Using Software-Base Performance Counters to Expose Low-Level Open MPI Performance Information' in EuroMPI/USA '17 (http://icl.cs.utk.edu/news_pub/submissions/software-performance-counters.pdf).  More practical usage information can be found here: https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI.

All software events functions are put in macros that become no-ops when SOFTWARE_EVENTS_ENABLE is not defined.  The internal timer units have been changed to cycles to avoid division operations which was a large source of overhead as discussed in the paper.  Added a --with-spc configure option to enable SPCs in the Open MPI build.  This defines SOFTWARE_EVENTS_ENABLE.  Added an MCA parameter, mpi_spc_enable, for turning on specific counters.  Added an MCA parameter, mpi_spc_dump_enabled, for turning on and off dumping SPC counters in MPI_Finalize.  Added an SPC test and example.

Signed-off-by: David Eberius <deberius@vols.utk.edu>
2018-06-11 22:48:16 -04:00
Thananon Patinyasakdikul
b6e07bfac6 btl/ofi: change required mr mode bits.
FI_MR_UNSPEC is not supposed to be used beyond ofi version 1.5. This
commit replaces FI_MR_UNSPEC with the new FI_MR_BASIC mode bits
(FI_MR_PROV_KEY | FI_MR_ALLOCATED | FI_MR_VIRT_ADDR).

The btl functionality remains the same.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-06-11 09:25:57 -07:00
Ralph Castain
48f27655a6 Sync to PMIx v3.0rc and add ext4x
Sync to the draft rc for PMIx v3.0. Add an external component for PMIx master, which is at v4.0

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-11 05:54:23 -07:00
Nicolas Morey-Chaisemartin
038ceb0b61 btl: openib: Add support for Broadcom Cumulus RoCE HCA
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
2018-06-07 15:16:58 +02:00
Nicolas Morey-Chaisemartin
d3416345ab btl: openib: add support for QLogic RoCE HCA
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
2018-06-07 15:16:41 +02:00
Arm
ce4d769aee
Merge pull request #5235 from thananon/ofi_einr_fix
btl/ofi: handles FI_EINTR properly.
2018-06-06 11:30:48 -07:00
Ralph Castain
840fb42f93 PMIx rte component does support dynamics
Minor cleanups

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-05 21:55:19 -07:00
Thananon Patinyasakdikul
f34a73af70 btl/ofi: handles FI_EINTR properly.
OFI sockets provider will sometime return FI_EINTR. Apparently this flag
does not exist in OFI version 1.5. This commit added an ifdef check.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-06-05 18:40:00 -07:00
Nathan Hjelm
8192eb9aa6 patch/overwrite: add support for aarch64
This commit adds patcher support of aarch64. The current
implementation uses mov, movk, and br to perform the jump. This uses 5
instructions.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-06-04 16:04:26 -06:00
Arm
555029b323
Merge pull request #5216 from thananon/btl_ofi
btl/ofi: the component now compiles with libfabric < 1.5.
2018-06-04 13:47:00 -07:00
Thananon Patinyasakdikul
c18cf7315f btl/ofi: configury: will not build if OFI ver < 1.5.
This commit fixes problem where users have libfabric version < 1.5
installed and build failed because the new MR modebits does not exist.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-06-04 10:35:50 -07:00
Ralph Castain
014bb3c8de Fix external hwloc builds
Remove spurious comma in header file definition. Remove unused variables

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-03 11:24:21 -07:00
Ralph Castain
55ac526a67 Enable the PMIx ompi/rte component
Get the OMPI rte/pmix component working. This was tested using PRRTE as the RM, configuring OMPI using:

* autogen --no-orte

* with external libevent, external hwloc, and external PMIx master

* configuring PMIx master with the same libevent and hwloc

* execute the application using PRRTE's "prun" launcher, which has the same cmd line as ORTE's mpirun

Note that PMIx master appears to have a bug in the event notification system that caches job termination events. Thus, the first execution runs fine, but subsequent executions cause an "abort" when the OMPI default error handler is invoked upon notification of the prior job's termination. Will work that separately.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 134cca9ac0de092d767999357573a31703f72292)
2018-06-03 07:25:12 -07:00
Ralph Castain
4ceaed3a76 Merge https://github.com/bgoglin/ompi into topic/rte2 2018-06-03 07:24:12 -07:00
Howard Pritchard
623e36de8a
Merge pull request #5205 from thananon/btl_ofi
new btl/ofi: RDMA only btl using libfabric.
2018-06-02 13:00:50 -06:00
Thananon Patinyasakdikul
6efb8069ec new btl/ofi: RDMA only btl using libfabric.
This commit added new transport layer to be used with osc rdma module.
This BTL provides put, get, atomic and fetch atomic operations. It can
be used with multiple hardware vendors as long as they have their
provider under Libfabric and have the right capabilities.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-06-01 15:22:04 -07:00
Jeff Squyres
fb0473acb5 pmix3x: compiler warning stomp
This fix was already included in pmix upstream (https://github.com/pmix/pmix/commit/fb7af8af2).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-30 10:14:37 -07:00
Sylvain Jeaugey
4eb75623ef cuda: add option to remove warning about missing libcuda.
Signed-off-by: Sylvain Jeaugey <sjeaugey@nvidia.com>
2018-05-24 14:56:46 -07:00
Brice Goglin
847f2e9933 opal/hwloc: remove now unused available field from opal_hwloc_obj_data_t
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
b260600450 opal/hwloc: simplify df_search() and make it work with hwloc 2.x NUMA nodes
Don't do a recursive search (hence no need for *idx anymore).
Find the level depth, to hide cache-issues first.
Then iterate over that level to find the objects we want.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
a06fc74664 opal/hwloc: remove an obsolete comment about offlines CPUs etc
Only online/available objects are enabled in OMPI now.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
369a7ea279 opal/hwloc: remove df_search_cores and fix things for hwloc 2.x NUMA nodes
Just iterate over cores inside the given object cpuset.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
0cd0c12111 opal/hwloc: remove min_bound() functions
df_search_min_bound() would need to be fixed for hwloc 2.0,
but it's only used in opal_hwloc_base_find_min_bound_target_under_obj()
which isn't used anymore. So just remove all of them.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
d12ef324c9 hwloc 2.0 doesn't have hwloc/myriexpress.h anymore
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
33ea2f0de4 fix OPAL_HWLOC_WANT_SHMEM management in opal/mca/hwloc/external/external.h
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
bd08a6ead9 hwloc: fix hwloc/shmem.h in the external case
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Jeff Squyres
af4299ebc5 hwloc: updates for hwloc 2.0.x API
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-24 11:53:07 +02:00
Brice Goglin
77cc3fcda5 hwloc: update to hwloc 2.0.1
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:52:59 +02:00
bosilca
2ab628b92e
Merge pull request #5074 from bosilca/topic/remove_warnings
Remove warnings identified by clang.
2018-05-15 11:15:23 -04:00
Howard Pritchard
db45d61dfa
Merge pull request #5147 from hppritcha/topic/plug_debug_hole_in_verbs
btl/openib: add conditional around an assert
2018-05-05 08:12:53 -06:00
Howard Pritchard
30eed9f035 btl/openib: addition conditional around an assert
A user trying to build Open MPI with explicit use
of CFLAGS on the make command line hit problems.

This fixes one of the problems.

https://www.mail-archive.com/users@lists.open-mpi.org//msg32241.html

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2018-05-04 14:17:07 -06:00
Ralph Castain
4ff61450a4 Ensure pmix_cleanup finalizes the class system
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-05-04 06:22:36 -07:00
Gilles Gouaillardet
edb8fe8e4b pmix/ext1x: fix index handling when populating an info array
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-04-26 11:06:43 +09:00