5322 Commits

Author SHA1 Message Date
Jeff Squyres
dec247d96e opal/datatype: minor compiler warning stomp
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-30 10:08:19 -07:00
bosilca
4ebed21b6d
Merge pull request #4670 from ggouaillardet/topic/opal_bitmap
opal/bitmap: fix opal_bitmap_set_bit()
2018-05-29 10:50:21 -04:00
Sylvain Jeaugey
4eb75623ef cuda: add option to remove warning about missing libcuda.
Signed-off-by: Sylvain Jeaugey <sjeaugey@nvidia.com>
2018-05-24 14:56:46 -07:00
Brice Goglin
847f2e9933 opal/hwloc: remove now unused available field from opal_hwloc_obj_data_t
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
b260600450 opal/hwloc: simplify df_search() and make it work with hwloc 2.x NUMA nodes
Don't do a recursive search (hence no need for *idx anymore).
Find the level depth, to hide cache-issues first.
Then iterate over that level to find the objects we want.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
a06fc74664 opal/hwloc: remove an obsolete comment about offline CPUs etc
Only online/available objects are enabled in OMPI now.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
369a7ea279 opal/hwloc: remove df_search_cores and fix things for hwloc 2.x NUMA nodes
Just iterate over cores inside the given object cpuset.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
0cd0c12111 opal/hwloc: remove min_bound() functions
df_search_min_bound() would need to be fixed for hwloc 2.0,
but it's only used in opal_hwloc_base_find_min_bound_target_under_obj()
which isn't used anymore. So just remove all of them.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
d12ef324c9 hwloc 2.0 doesn't have hwloc/myriexpress.h anymore
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
33ea2f0de4 fix OPAL_HWLOC_WANT_SHMEM management in opal/mca/hwloc/external/external.h
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Brice Goglin
bd08a6ead9 hwloc: fix hwloc/shmem.h in the external case
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:53:07 +02:00
Jeff Squyres
af4299ebc5 hwloc: updates for hwloc 2.0.x API
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-24 11:53:07 +02:00
Brice Goglin
77cc3fcda5 hwloc: update to hwloc 2.0.1
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2018-05-24 11:52:59 +02:00
Jeff Squyres
22bfdb194d
Merge pull request #5174 from hoopoepg/topic/typo-in-comment
CONVERTOR: fixed typos in comments
2018-05-17 12:15:02 -04:00
Sergey Oblomov
52d5ca048e CONVERTOR: fixed typos in comments
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-16 22:02:39 +03:00
bosilca
2ab628b92e
Merge pull request #5074 from bosilca/topic/remove_warnings
Remove warnings identified by clang.
2018-05-15 11:15:23 -04:00
Howard Pritchard
db45d61dfa
Merge pull request #5147 from hppritcha/topic/plug_debug_hole_in_verbs
btl/openib: add conditional around an assert
2018-05-05 08:12:53 -06:00
Howard Pritchard
30eed9f035 btl/openib: add conditional around an assert
A user trying to build Open MPI with explicit use
of CFLAGS on the make command line hit problems.

This fixes one of the problems.

https://www.mail-archive.com/users@lists.open-mpi.org//msg32241.html

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2018-05-04 14:17:07 -06:00
Ralph Castain
4ff61450a4 Ensure pmix_cleanup finalizes the class system
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-05-04 06:22:36 -07:00
Nathan Hjelm
380dcb57de
Merge pull request #5072 from bosilca/topic/datatype_add_size_t
Allow OPAL DDT to receive size_t count argument.
2018-05-01 09:47:45 -06:00
Gilles Gouaillardet
edb8fe8e4b pmix/ext1x: fix index handling when populating an info array
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-04-26 11:06:43 +09:00
Ralph Castain
f424aa367e Fix external PMIx v1.2.5 support
As @hjelmn and I discussed, this is a little hacky. However, it is the only solution that can be done solely from the OMPI side.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-04-25 13:42:36 -07:00
Howard Pritchard
0751578cbf
Merge pull request #4949 from hppritcha/topic/memkind_update
mpool/memkind: refactor to use the current API
2018-04-25 07:24:28 -06:00
Howard Pritchard
824197f886 mpool/memkind: refactor to use the current API
The mpool/memkind component was using a deprecated "partitions" API.
This commit refactors the memkind component to make use of the
supported public API.

The public API uses 3 parameters to specify a mpool "kind":

- a memkind type (which for now is just default or HBM)
- a memkind policy
- a memkind_bits (partly to specify pagesize)

The MCA parameters were changed to reflect these memkind
parameters.

Add a make check test for sanity checking of the memkind component.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-04-24 22:11:21 -06:00
Nathan Hjelm
69456c8962
Merge pull request #5042 from Stonesjtu/patch-1
Fix typo
2018-04-16 12:13:35 -06:00
George Bosilca
6ff11267fb
Remove warnings identified by clang.
Plus minor spacing and indentation issues.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-04-14 17:14:12 -04:00
George Bosilca
cd683e3eec
Allow OPAL DDT to receive size_t count argument.
Fixes issue #5069, which relates to a BigMPI bug in the use of
MPI_Type_vector to construct very large datatypes (>2GB).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-04-14 15:32:19 -04:00
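To illustrate why a size_t count matters for the commit above (cd683e3eec), here is a minimal sketch, not the OPAL datatype code itself. It assumes a platform where int is 32 bits; the helper names total_bytes() and fits_in_int() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total bytes for `count` elements of `elem_size`.
 * Returns 0 on success, -1 if the product would overflow size_t. */
static int total_bytes(size_t count, size_t elem_size, size_t *out)
{
    assert(out != NULL);
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return -1;
    }
    *out = count * elem_size;
    return 0;
}

/* Hypothetical helper: could this byte count be passed through an
 * int-typed count argument without truncation? */
static int fits_in_int(size_t n)
{
    return n <= (size_t)INT_MAX;
}
```

With 300,000,000 eight-byte elements the total is 2.4 GB, which total_bytes() represents without trouble but which fits_in_int() rejects on a 32-bit int, so an int-typed count cannot carry it.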
Todd Kordenbrock
55c6918316
Merge pull request #5053 from tkordenbrock/topic/master/btl-portals4.del_proc.fix
master: btl-portals4: don't free module resources when proc count goes to zero
2018-04-12 12:12:34 -05:00
Gilles Gouaillardet
37e7bca867 pmix/ext1x: fix misc build time errors
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-04-12 14:58:55 +09:00
Kaiyu Shi
b7c5e65d4f Fix typo
Signed-off-by: Kaiyu Shi <skyisno.1@gmail.com>
2018-04-11 10:29:43 +08:00
Todd Kordenbrock
b569633ddf btl-portals4: don't free module resources when proc count goes to zero
This commit fixes a segfault in btl-portals4 add_procs().  The segfault
occurs if add_procs() is called after a del_procs() call that reduces
the proc count to zero which would cause PT and NI resources to be
freed.  This commit resolves the segfault by using a common
initialization boolean and only freeing module resources in
finalize().

Signed-off-by: Todd Kordenbrock (thkgcode@gmail.com)
2018-04-10 14:20:22 -05:00
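The pattern described in the commit above (b569633ddf) can be sketched roughly as follows. This is an assumption-laden illustration, not the btl-portals4 source: struct example_module and the example_* functions are hypothetical, and a plain malloc'd buffer stands in for the PT and NI resources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical module state; the real module holds PT and NI handles
 * rather than a plain buffer. */
struct example_module {
    bool   initialized;   /* shared "resources exist" flag */
    size_t num_procs;
    void  *resources;     /* stands in for PT/NI resources */
};

static int example_add_procs(struct example_module *m, size_t n)
{
    assert(m != NULL);
    if (!m->initialized) {            /* allocate once, on first use */
        m->resources = malloc(64);
        if (m->resources == NULL) {
            return -1;
        }
        m->initialized = true;
    }
    m->num_procs += n;
    return 0;
}

static void example_del_procs(struct example_module *m, size_t n)
{
    assert(m != NULL && n <= m->num_procs);
    m->num_procs -= n;
    /* Deliberately no free here, even when num_procs reaches zero:
     * a later add_procs() must still find valid resources. */
}

static void example_finalize(struct example_module *m)
{
    assert(m != NULL);
    if (m->initialized) {             /* the only place resources are freed */
        free(m->resources);
        m->resources = NULL;
        m->initialized = false;
    }
}
```

Tying the lifetime of the resources to the flag rather than to the proc count means an add/del/add sequence that passes through zero procs never touches freed memory.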
Jeff Squyres
45922c4e81 pmix/base: set PMIx to follow OPAL's mca_component_show_load_errors
Have Open MPI's PMIx component set PMIx's "show_load_errors" to do
the same thing that Open MPI's "show_load_errors" does.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-04-10 10:24:35 -07:00
Jeff Squyres
f200b866df btl/tcp: roll back parts of 40afd525f8
Some of the show_help() messages that were added in 40afd525f8 were
really normal / expected behavior (e.g., if 2 peers connect in the TCP
BTL more-or-less simultaneously, one of them will drop the connection
-- no need to show_help() about this; it's expected behavior).  Roll
back these messages to be opal_output_verbose() kinds of messages.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-04-07 12:28:10 -07:00
Jeff Squyres
a2fc1ace09
Merge pull request #4992 from jsquyres/pr/pmix-version-info-mca-vars
pmix: add "pmix*_library_version" info MCA var
2018-04-04 17:29:06 -04:00
Ralph Castain
cd52ccdb68 Move past the '.' when getting jobstepid
The strtoul function sets its end pointer to the first non-digit character, which is a '.' in this case. Calling strtoul at that point will always yield zero; you have to move past the '.' to get the remaining number.

Thanks to Greg Lee for the detailed analysis of the problem.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-04-04 11:22:38 -07:00
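The strtoul behavior behind the commit above (cd52ccdb68) can be shown with a small sketch. It assumes an input of the form "<jobid>.<stepid>"; the helper parse_jobstepid() is hypothetical and is not the function that the commit changed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse "<jobid>.<stepid>" into two numbers.
 * Returns 0 on success, -1 if the string is malformed. */
static int parse_jobstepid(const char *s, unsigned long *jobid,
                           unsigned long *stepid)
{
    char *end = NULL;

    assert(s != NULL && jobid != NULL && stepid != NULL);

    *jobid = strtoul(s, &end, 10);
    if (end == s || *end != '.') {
        return -1;              /* no digits, or no '.' separator */
    }

    /* 'end' points at the '.', so strtoul(end, ...) would convert
     * nothing and return 0. Step past the separator first. */
    const char *step = end + 1;
    *stepid = strtoul(step, &end, 10);
    if (end == step) {
        return -1;              /* nothing numeric after the '.' */
    }
    return 0;
}
```

Checking whether the end pointer moved is what distinguishes a real step id of 0 from a failed conversion.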
Jeff Squyres
9f472d8a7b pmix: add "pmix*_library_version" info MCA var
Simple MCA vars for ext1, ext2, and pmix3 components to reflect what
the underlying PMIx library version is.  For example:

```
$ ompi_info --param pmix pmix3x --parsable --level 9 | grep library_version
mca:pmix:pmix3x:param:pmix_pmix3x_library_version:value:PMIx library version 3.0.0 (embedded in Open MPI)
mca:pmix:pmix3x:param:pmix_pmix3x_library_version:source:default
mca:pmix:pmix3x:param:pmix_pmix3x_library_version:status:writeable
mca:pmix:pmix3x:param:pmix_pmix3x_library_version:level:4
mca:pmix:pmix3x:param:pmix_pmix3x_library_version:help:Version of the underlying PMIx library
mca:pmix:pmix3x:param:pmix_pmix3x_library_version:deprecated:no
mca:pmix:pmix3x:param:pmix_pmix3x_library_version:type:string
mca:pmix:pmix3x:param:pmix_pmix3x_library_version:disabled:false
```

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-29 14:21:07 -07:00
Jeff Squyres
8c419294a8 btl/tcp: fix CID 710596
sizeof(addrs[0].addr_inet)==16 (so that it can handle IPv6 addresses),
but the memory that we are copying from (my_ss->sin_addr) is only 4
bytes long.  Don't copy beyond the end of that source buffer.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-26 14:21:22 -07:00
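The class of bug fixed in the commit above (8c419294a8) can be sketched as follows. struct example_addr and store_ipv4() are hypothetical stand-ins, not the TCP BTL's own types; the only detail taken from the commit message is a 16-byte destination field being filled from a 4-byte sin_addr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the address record: 16 bytes so that it
 * can also hold an IPv6 address. */
struct example_addr {
    uint8_t addr_inet[16];
};

static void store_ipv4(struct example_addr *dst, const struct sockaddr_in *src)
{
    assert(dst != NULL && src != NULL);

    /* Zero the whole field so the unused 12 bytes are well defined. */
    memset(dst->addr_inet, 0, sizeof(dst->addr_inet));

    /* Copy sizeof(source), not sizeof(destination): sin_addr is only
     * 4 bytes, so using sizeof(dst->addr_inet) would read past it. */
    memcpy(dst->addr_inet, &src->sin_addr, sizeof(src->sin_addr));
}
```

Sizing the copy by the source keeps the read inside the sockaddr_in, while the preceding memset keeps the rest of the destination deterministic.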
Jeff Squyres
3003be14f3 btl/sm: fix CID 1415105
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-26 14:21:21 -07:00
Jeff Squyres
a17f4afdc7 btl/tcp: fix CID 1416634
Fix resource leak in the TCP BTL.  Also add a little defensive programming.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-26 14:21:21 -07:00
Nathan Hjelm
7f761d8434 opal_free_list: use lifo atomic functions in opal_free_list_wait_mt
This commit fixes a multi-threading bug when using the thread-safe
free list functions. opal_free_list_wait_mt() was using the
conditional version of opal_lifo_pop() and not the thread-safe call.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-03-26 10:16:42 -06:00
Ralph Castain
e443adc7a1 Reset OMPI master to PMIx master
Track PMIx master instead of the reference server - fixes problem of external PMIx master builds.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-25 08:36:46 -07:00
Artem Polyakov
77ff99e9ee
Merge pull request #4933 from karasevb/timings_update
timings: added new timing points
2018-03-25 00:10:49 -07:00
Jeff Squyres
06ec93a61a util/fd: fix CID 1430413
Take multiple defensive steps to fix CID 1430413 and ensure that ret
is always initialized upon return.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-24 04:25:26 -07:00
Jeff Squyres
c3adcb05eb Miscellaneous compiler warnings fixes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-23 11:45:30 -07:00
Jeff Squyres
f66ac43fbc opal/util: fix CID 1430381
Fix minor resource leak.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-23 08:48:11 -07:00
Jeff Squyres
023a4a82d3
Merge pull request #4942 from jsquyres/pr/tcp-btl-help-message-updates
TCP help message updates
2018-03-22 08:53:04 -05:00
Jeff Squyres
0f8077ace6 oob/tcp: add show_help message about version mismatch
Be more explicit about version mismatch between ORTE processes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-21 20:18:28 -07:00
Jeff Squyres
a15d8233c9
Merge pull request #3434 from dsharma283/pr-3431
ompi/opal: add support for HDR link speeds
2018-03-21 21:57:20 -05:00
Jeff Squyres
40afd525f8 btl/tcp: make error messages more specific
Convert some verbose messages to opal_show_help() messages.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-21 19:34:03 -07:00
Jeff Squyres
e0d86b1c72 opal/util/fd: add opal_fd_get_peer_name()
Returns a string name (either a resolved name or, if unresolvable, the
IPv4/IPv6 address as a string).  The caller is responsible for freeing
the string.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-21 19:34:03 -07:00