Commit graph

26922 commits

Author SHA1 Message Date
Yossi
05b568a46d Merge pull request #3074 from alex-mikheev/topic/pml_ucx_bsend_dt_fix
ompi: pml ucx: fix datatype packing error in bsend
2017-03-01 18:56:51 +02:00
Alex Mikheev
152f77df59 ompi: pml ucx: fix datatype packing error in bsend
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-01 16:18:19 +02:00
Mike Dubman
abc56cea8f Merge pull request #3071 from yosefe/topic/fix-memhooks-mxm-hcoll
yalla/mtl_mxm/hcoll: open memory component to activate memory hooks.
2017-03-01 12:19:16 +01:00
Yossi Itigin
33471c44ee pml_yalla/mtl_mxm/hcoll: open memory component to activate memory hooks.
Memory hooks are now set-up on demand. pml/yalla, mtl/mxm and
coll/hcoll need the memory hooks, so make sure those are installed.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-03-01 12:12:20 +02:00
Gilles Gouaillardet
880f2d5431 mpi/c: revamp error handling in MPI_{Pack,Unpack}[_external]
Thanks Alex and the folks at Mellanox for the help.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-01 10:03:31 +09:00
Jeff Squyres
d5266aba90 Merge pull request #2955 from jsquyres/pr/hwloc-external-fixes
Fix --with-hwloc=external
2017-02-28 14:57:07 -05:00
Josh Hursey
0006f0d7c5 Merge pull request #2773 from jjhursey/topic/hook-fwk
Add a 'hook' framework
2017-02-28 12:29:50 -06:00
Jeff Squyres
cc23439465 Merge pull request #3055 from jsquyres/pr/README-backwards-compat
README: Add more info about "backwards compatibility"
2017-02-28 13:03:43 -05:00
Ralph Castain
735fbf8f67 Merge pull request #3011 from artpol84/add_proc_fix/master
ompi: Avoid unnecessary PMIx lookups when adding procs.
2017-02-28 08:25:08 -08:00
Jeff Squyres
fec519a793 hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).

This commit:

1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
   find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
   rest of the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:48:42 -08:00
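Step 3 of fec519a793 is a plain textual substitution over the code base. The sketch below shows the same rewrite applied to a single source string; the function name `rename_include` is hypothetical and is not part of the Open MPI tree.

```python
OLD_PATH = "opal/mca/hwloc/hwloc.h"
NEW_PATH = "opal/mca/hwloc/hwloc-internal.h"


def rename_include(source: str) -> str:
    """Replace every occurrence of the old header path with the new one."""
    # A literal replacement is enough here: the new path does not contain
    # the old one, so running the rewrite twice changes nothing further.
    return source.replace(OLD_PATH, NEW_PATH)


if __name__ == "__main__":
    print(rename_include('#include "opal/mca/hwloc/hwloc.h"'))
```

Includes of the system header, such as `#include <hwloc.h>`, do not contain the old path and are left untouched.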
Jeff Squyres
a065b9b83b hwloc: allow frameworks to have alternate header filenames
Frameworks are usually required to have a framework/framework.h file.
However, this is sometimes problematic (see the hwloc use case/problem
description, below).

This commit allows frameworks to have an "autogen.options" file (i.e.,
project/mca/framework/autogen.options) that specifies things that
autogen needs to know about the framework.  Currently, the only option
recognized in autogen.options is "framework_header", which allows a
framework to specify that its header file is named something other
than "framework.h" (the framework header file must still be in the
project/mca/framework directory; it simply may be named something
other than framework.h).  More options may be introduced over time.

The use case that motivated this is the hwloc framework
(https://github.com/open-mpi/ompi/issues/2616).

Per MCA framework rules, the hwloc framework is required to have an
opal/mca/hwloc/hwloc.h file.  However, the hwloc library itself *also*
has an hwloc.h file.  This causes a problem when configuring Open MPI
with --with-hwloc=external (meaning: do not use the hwloc embedded
within the Open MPI source code tree -- instead, use an hwloc
installation from outside the Open MPI source code tree).
Specifically, when in the opal/mca/hwloc directory, the presence of
"-I." in DEFAULT_INCLUDES (put there by Automake) causes a confusion
between the hwloc.h in opal/mca/hwloc/hwloc.h and the system-installed
hwloc.h.  Chaos ensues (see the GitHub issue for more detail).

The solution is to rename the opal/mca/hwloc/hwloc.h to something else
(e.g., hwloc-internal.h), and extend autogen.pl to allow frameworks to
have an alternate name for their framework header file.

This commit introduces the autogen.pl mechanism to allow the alternate
header file name.  A follow-on commit will effect this change in the
hwloc framework (and update all the places in the code base to use the
new filename).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:45:34 -08:00
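The lookup described in a065b9b83b can be sketched as follows. This is only an illustration: autogen.pl is a Perl script, and the `key = value` syntax with `#` comments assumed here for autogen.options is a guess, not taken from the commit. The function name `framework_header` mirrors the option name but is otherwise hypothetical.

```python
from typing import Optional


def framework_header(framework: str, options_text: Optional[str]) -> str:
    """Return the header file name a framework is expected to provide.

    options_text is the content of project/mca/<framework>/autogen.options,
    or None when the framework has no such file.
    """
    default = framework + ".h"
    if options_text is None:
        return default
    for raw_line in options_text.splitlines():
        # Assumed syntax: "key = value", with "#" starting a comment.
        line = raw_line.split("#", 1)[0].strip()
        key, sep, value = line.partition("=")
        if sep and key.strip() == "framework_header" and value.strip():
            return value.strip()
    return default


if __name__ == "__main__":
    print(framework_header("hwloc", "framework_header = hwloc-internal.h"))
```

A framework without an autogen.options file keeps the usual framework/framework.h convention, so existing frameworks need no changes.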
Jeff Squyres
0cd3b6c235 treematch: do not include <hwloc.h>
Instead, include "opal/mca/hwloc/hwloc.h"

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:45:23 -08:00
Josh Hursey
b1c4e50500 Merge pull request #2934 from jjhursey/topic/coll-comm-restructure
Move coll structure outside of the communicator
2017-02-28 08:45:18 -06:00
Ralph Castain
c320b3671a Merge pull request #3050 from naughtont3/tjn-fix-orteclean
orte-clean: fix bad username usage, add orte-dvm
2017-02-28 06:01:37 -08:00
Thomas Naughton
74f8c2ae30 orte-clean: fix bad username/uid usage, add orte-dvm
This fixes a mismatch between PS listing that returned
USERNAME but code was pruning based on UID.

This changes the OPAL_PS_FLAVOR_CHECK format to return
'uid' instead of 'user'.  (Note: Avoiding call to
getlogin_r() but assuming UID is uniform on system,
same assumption exists for session dir anyway.)

Note, still maintains behavior from man page for root
running orte-clean on node (kills all orteds).

Adds 'orte-dvm' to list of procnames that will be checked/killed.

Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-02-28 08:00:06 -05:00
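The mismatch fixed in 74f8c2ae30 is easy to reproduce in miniature: if the process listing reports user names while the pruning code compares against a numeric UID, nothing ever matches. The sketch below assumes the listing has already been parsed into `(uid, pid, command)` tuples; the function name and the `PROCNAMES` subset are hypothetical, and the real orte-clean is written in C.

```python
# Hypothetical subset of the process names orte-clean looks for.
PROCNAMES = ("orted", "orte-dvm")


def pids_to_clean(ps_rows, my_uid, procnames=PROCNAMES):
    """Select the PIDs to kill from (uid, pid, command) rows.

    Rows are matched on numeric UID. Root (UID 0) matches every user's
    processes, as the commit message says the man page describes.
    """
    pids = []
    for uid, pid, command in ps_rows:
        if command not in procnames:
            continue
        if my_uid == 0 or uid == my_uid:
            pids.append(pid)
    return pids


if __name__ == "__main__":
    rows = [(1000, 11, "orted"), (1001, 12, "orted"), (1000, 13, "orte-dvm")]
    print(pids_to_clean(rows, 1000))
```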
Jeff Squyres
9ce3c7f150 Merge pull request #3057 from hjelmn/osc_rdma_atomic
osc/rdma: fix compile warning
2017-02-28 06:15:30 -05:00
Nathan Hjelm
032bcf915a osc/rdma: fix compile warning
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-27 16:26:00 -07:00
Jeff Squyres
842f8c1286 README: Add more info about "backwards compatibility"
Add more clarifying statements about our definition of "backwards
compatibility" -- adding an example with static linking and another
with containers.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-27 17:16:48 -05:00
Jeff Squyres
07d8452646 Merge pull request #3052 from jsquyres/pr/update-authors
AUTHORS: update names
2017-02-27 16:39:15 -05:00
Jeff Squyres
e6b3be8e1f AUTHORS: update names
Update the .mailmap and re-run `contrib/dist/make-authors.pl`.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-27 16:37:23 -05:00
Jeff Squyres
45cdf2eb7a Merge pull request #3051 from jsquyres/pr/mailmap-update
.mailmap: Remove accent from Aurelein's name
2017-02-27 16:21:28 -05:00
Jeff Squyres
afc49f3361 .mailmap: Remove accent from Aurelein's name
This was per request of Aurelein.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-27 16:18:13 -05:00
George Bosilca
366d64b7e5 Move the collective structure outside the communicator.
As we changed the ABI (forcing a major release), we can limit
the size of the predefined communicators by moving the collective
structure outside the communicator. This might have a minimal,
but unnoticeable, impact on performance. This approach has been
discussed during the January 2017 devel meeting.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 11:54:17 -06:00
Joshua Hursey
c10bbfded6 ompi/hook: Add the hook/license framework
 * Include a 'demo' component that shows some of the features.
 * Currently has hooks for:
   - MPI_Initialized
     - top, bottom
   - MPI_Init_thread
     - top, bottom
   - MPI_Finalized
     - top, bottom
   - MPI_Init
     - top (pre-opal_init), top (post-opal_init), error, bottom
   - MPI_Finalize
     - top, bottom
 * Other places in ompi can 'register' to hook into any one of these places
   by passing back a component structure filled with function pointers.
 * Add a `MCA_BASE_COMPONENT_FLAG_REQUIRED` flag to the MCA structure that
   is checked by the `hook` framework. If a required, static component has
   been excluded then the `hook` framework will fail to initialize.
   - See note in `opal/mca/mca.h` as to why this is checked in the `hook`
     framework and not in `opal/mca/base/mca_base_component_find.c`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 12:05:53 -05:00
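The registration model in c10bbfded6 (components hand over a structure of function pointers, and a required component that has been excluded makes initialization fail) can be sketched in a few lines. Everything below is illustrative: the class, method, and hook-point names are hypothetical, and the real framework is C code built on the MCA component structures.

```python
class HookRegistry:
    """Minimal model of a hook framework with required components."""

    def __init__(self, required=()):
        self._required = set(required)
        self._components = {}

    def register(self, name, callbacks):
        # callbacks maps a hook point (e.g. "mpi_init_top") to a callable,
        # standing in for a struct of function pointers.
        self._components[name] = callbacks

    def check_required(self):
        # Fail initialization if a required component was excluded.
        missing = self._required - set(self._components)
        if missing:
            raise RuntimeError("required components missing: %s" % sorted(missing))

    def run(self, hook_point):
        # Call every registered callback for this hook point, if any.
        for callbacks in self._components.values():
            fn = callbacks.get(hook_point)
            if fn is not None:
                fn()


if __name__ == "__main__":
    registry = HookRegistry(required=["demo"])
    registry.register("demo", {"mpi_init_top": lambda: print("MPI_Init: top")})
    registry.check_required()
    registry.run("mpi_init_top")
```

Components only fill in the hook points they care about; unset entries are skipped, much like NULL function pointers would be.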
Nathan Hjelm
581bff9871 Merge pull request #3034 from hjelmn/osc_rdma_atomic
osc/rdma: make locking code more robust
2017-02-27 08:46:52 -07:00
Ralph Castain
f054261590 Merge pull request #3027 from naughtont3/tjn-envvar-dvmuri
dvm: Add envvar 'ORTE_HNP_DVM_URI' to schizo:ompi
2017-02-27 06:56:44 -08:00
Ralph Castain
feed472ea5 Merge pull request #3043 from rhc54/topic/purge
Skip empty files to avoid infinite loop
2017-02-27 06:03:54 -08:00
Ralph Castain
a774ea73e4 Skip empty files to avoid infinite loop
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-27 06:02:54 -08:00
Nathan Hjelm
4707c7c5e0 osc/rdma: make locking code more robust
Under heavy load the locking code could fail if the underlying btl
module started to return OPAL_ERR_OUT_OF_RESOURCE on atomic
operations. This commit updates the code to gracefully handle btl
errors.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-27 00:01:26 -07:00
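Commit 4707c7c5e0 does not spell out how the btl errors are handled. One common pattern for a transient out-of-resource error is to make progress and retry, sketched below. The numeric values of the constants are placeholders, and `retry_atomic`, `atomic_op`, and `progress` are hypothetical names rather than osc/rdma functions.

```python
# Placeholder values, used only so the sketch is self-contained.
OPAL_SUCCESS = 0
OPAL_ERR_OUT_OF_RESOURCE = -2


def retry_atomic(atomic_op, progress, max_tries=1000):
    """Retry atomic_op while it reports a transient resource shortage.

    atomic_op returns a status code; progress is called between attempts
    so that outstanding operations can complete and free resources.
    """
    for _ in range(max_tries):
        rc = atomic_op()
        if rc != OPAL_ERR_OUT_OF_RESOURCE:
            return rc  # success or a non-transient error: stop retrying
        progress()
    return OPAL_ERR_OUT_OF_RESOURCE


if __name__ == "__main__":
    results = iter([OPAL_ERR_OUT_OF_RESOURCE, OPAL_SUCCESS])
    print(retry_atomic(lambda: next(results), lambda: None))
```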
Gilles Gouaillardet
af0b5cffb4 asm: rename the AMD64 into X86_64
in this context, AMD64 really means amd64 or em64t, so let's
rename this into X86_64 in order to avoid any confusion

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-27 15:10:50 +09:00
Gilles Gouaillardet
ab5e86c97d travis: install hwloc packages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-27 14:40:35 +09:00
Gilles Gouaillardet
2f4013ce33 configury: fix asm atomic detection
there is no need to look for an assembly file when BUILTIN_GCC is used

Fixes open-mpi/ompi#3032
Refs open-mpi/ompi#3036

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-27 10:42:50 +09:00
Alex Mikheev
c9b5b12af4 oshmem: sshmem ucx: use fixed base address
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-26 15:16:28 +02:00
Ralph Castain
efc3a98ea6 Merge pull request #3031 from rhc54/topic/ofi
Add CPPFLAGS to build of rml/ofi component.
2017-02-25 11:23:03 -08:00
Ralph Castain
9f8f7f3189 Add CPPFLAGS to build of rml/ofi component.
Fix finalize to ensure we only destruct the msg queue list once.
Update platform file

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-25 09:17:41 -08:00
Ralph Castain
0db91889a7 Merge pull request #3018 from naughtont3/tjn-dvmerrmgr-issue2987
debug fix for DVM early quit
2017-02-25 08:09:16 -08:00
Sylvain Jeaugey
f827b6b8dd Fix more typos using the allgather module for allreduce operations, causing a crash when CUDA collectives are enabled.
Signed-off-by: Sylvain Jeaugey <sjeaugey@nvidia.com>
Signed-off-by: Akshay Venkatesh <akvenkatesh@nvidia.com>
2017-02-24 16:35:29 -08:00
Thomas Naughton
006be92df5 dvm: Add envvar 'ORTE_HNP_DVM_URI' to schizo:ompi
Add ability to pass DVM URI purely via environment
to simplify invocation from command-line (e.g., start dvm,
export URI, mpirun w/o needing to add `--hnp` arg).
If user passes both envvar *and* cmdline, the cmdline wins.

Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-02-24 16:55:32 -05:00
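The precedence rule in 006be92df5 (the command line wins over the environment) can be sketched as below. Only the variable name `ORTE_HNP_DVM_URI` comes from the commit message; the function `resolve_dvm_uri` and the example URI strings are hypothetical.

```python
import os


def resolve_dvm_uri(cli_value, environ=None):
    """Pick the DVM URI: an explicit command-line value wins over the envvar."""
    if environ is None:
        environ = os.environ
    if cli_value:
        return cli_value
    # Fall back to the environment; returns None if the variable is unset.
    return environ.get("ORTE_HNP_DVM_URI")


if __name__ == "__main__":
    env = {"ORTE_HNP_DVM_URI": "uri-from-env"}
    print(resolve_dvm_uri(None, env))
    print(resolve_dvm_uri("uri-from-cmdline", env))
```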
Jeff Squyres
d7dd4d769e openmpi-mca-params.conf: Fix comment
Make sure to specify "--level 9" to ompi_info to see all MCA params.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-24 07:09:06 -08:00
Jeff Squyres
99ec16edea Merge pull request #3023 from clementFoyer/patch-1
Fix minor typo
2017-02-23 10:38:46 -05:00
Thomas Naughton
beb5b250bf orte dvm: debug fix for DVM early quit
Ensure that job errors do not cause the DVM to fail unless the failed job is the DVM itself.

Refs #2987, with improvements from Ralph

Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-23 10:17:53 -05:00
Clement Foyer
f371cc0a43 Fix minor typo
The comment on the opal_list_item_compare_fn_t typedef gave the return value for a < b as 11 instead of -1.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
2017-02-23 16:10:32 +01:00
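For context on f371cc0a43, a comparison callback of this kind normally follows the usual three-way convention: negative when a < b, zero when equal, positive when a > b. A minimal sketch, with the hypothetical name `compare_items` standing in for a function of type opal_list_item_compare_fn_t:

```python
from functools import cmp_to_key


def compare_items(a, b):
    """Three-way comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    if a < b:
        return -1
    if a > b:
        return 1
    return 0


if __name__ == "__main__":
    # cmp_to_key adapts a three-way comparator for use with sorted().
    print(sorted([3, 1, 2], key=cmp_to_key(compare_items)))
```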
Ralph Castain
591a2d4a88 Merge pull request #3017 from rhc54/topic/dlopen
Update to PMIx master to include dlopen fixes and addition of libltdl support
2017-02-22 12:57:07 -08:00
Ralph Castain
e86a0dbf39 Update to PMIx master to include dlopen fixes and addition of libltdl support
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-22 11:54:33 -08:00
Ralph Castain
57f6646cbe Merge pull request #3016 from rhc54/topic/copyright
Be a little less OMPI-centric on checking for the top-level directory
2017-02-22 11:32:30 -08:00
Ralph Castain
8ae55429bc Be a little less OMPI-centric on checking for the top-level directory
Look for .git directory

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-22 11:29:51 -08:00
Alex Mikheev
c63137e1c0 oshmem: sshmem ucx: minor code cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
132fbd9ae9 oshmem: sshmem: add UCX allocator
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
e038e3f9e0 oshmem: sshmem: code cleanup
The commit removes unused code and interface function, moves
common code to the base.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:47:59 +02:00
Yossi
fb67c966a8 Merge pull request #2944 from alex-mikheev/topic/pml_ucx_bsend
ompi: pml ucx: add support for the buffered send
2017-02-22 12:21:03 +02:00