1
1
Граф коммитов

30270 Коммитов

Автор SHA1 Сообщение Дата
Aurelien Bouteiller
6b3be224d4
Adding a FIN message to differentiate normal TCP closing from failures
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:32:34 -05:00
Aurelien Bouteiller
b7be64482a
Revert "Revert "Handle error cases in TCP BTL""
This reverts commit 5162011428.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:31:06 -05:00
Jeff Squyres
69bd54c83b
Merge pull request #7310 from itemko/artemry/azure_ci_updates_for_review
Fixed several Mellanox Open MPI CI issues.
2020-01-16 09:25:43 -05:00
Artem Ryabov
fd7e94022b Fixed several Mellanox Open MPI CI issues.
The following issues have been fixed:
- Corrected the link to CI status badge in README after some renaming.
- Updated agent capabilities.
- Switched to jenkins_scripts from master branch.
- Corrected support e-mail.

Signed-off-by: Artem Ryabov <artemry@mellanox.com>
2020-01-16 13:43:28 +03:00
Nathan Hjelm
037b0bd9ee
Merge pull request #7304 from hjelmn/btl_vader_fix_max_address_on_aarch64
btl/vader: modify how the max attachment address is determined
2020-01-15 17:02:33 -07:00
Nathan Hjelm
728d51f9f3 btl/vader: modify how the max attachment address is determined
This PR removes the constant defining the max attachment address and
replaces it with the largest address that shows up in /proc/self/maps.
This should address issues found on AARCH64 where the max address
may differ based on the configuration.

Since the calculated max address may differ between processes the
max address is sent as part of the modex and stored in the endpoint
data.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-01-14 15:15:36 -08:00
Nathan Hjelm
61f96b3d6d
Merge pull request #7283 from hjelmn/fix_issues_in_both_vader_and_opal_interval_tree_t_that_were_causing_issue_6524
Fix issues in both vader and opal interval tree t that were causing issue 6524
2020-01-14 14:54:51 -07:00
Jeff Squyres
b3fe523150
Merge pull request #7124 from devreal/fix-opal-align-min
Fix OPAL_ALIGN_MIN to work on 32-bit systems
2020-01-14 15:18:55 -05:00
Jeff Squyres
02f02e7c1e
Merge pull request #7300 from itemko/artemry/mellanox-ci-with-azure-pipelines-review
Reworked Mellanox Open MPI CI with Azure Pipelines
2020-01-14 14:11:34 -05:00
Artem Ryabov
98bfe87ee5 Reworked Mellanox OpenMPI CI with Azure Pipelines.
Signed-off-by: Artem Ryabov <artemry@mellanox.com>
2020-01-14 20:20:51 +03:00
Jeff Squyres
25931ea8bf
Merge pull request #7200 from cpshereda/master-opal_gethostname-change
Fix unsafe use of gethostname()
2020-01-13 16:07:43 -05:00
Jeff Squyres
887400c878
Merge pull request #7174 from jsquyres/pr/ofi-mtl-fi-version-bump
mtl/ofi: increase the FI_VERSION requested to 1.5 and make sure to check for OFI_LOCAL_COMM
2020-01-13 13:22:22 -05:00
Charles Shereda
cbc6feaab2 Created opal_gethostname() as safer gethostname substitute.
The opal_gethostname() function provides a more robust mechanism
to retrieve the hostname than gethostname(), which can return
results that are not null-terminated, and which can vary in its
behavior from system to system.

opal_gethostname() just returns the value in opal_process_info.nodename;
this is populated in opal_init_gethostname() inside opal_init.c.

-Changed all gethostname calls in opal subtree to opal_gethostname
-Changed all gethostname calls in orte subtree to opal_gethostname
-Changed all gethostname calls in ompi subdir to opal_gethostname
-Changed all gethostname calls in oshmem subdir to opal_gethostname
-Changed opal_if.c in test subdir to use opal_gethostname
-Changed opal_init.c to include opal_init_gethostname. This function
 returns an int and directly sets opal_process_info.nodename per
 jsquyres' modifications.

Relates to open-mpi#6801

Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
2020-01-13 08:52:17 -08:00
Robert Wespetal
49128a7adb mtl/ofi: Add workaround for EFA local/remote capabilities bug
Some versions of Libfabric contain a bug in EFA where FI_REMOTE_COMM and
FI_LOCAL_COMM are not advertised. In order to workaround this, we need to call
fi_getinfo() without those capability bits to see if EFA is available first.

Also move around some of the provider include/exclude list logic so we can skip
this workaround if applicable.

Signed-off-by: Robert Wespetal <wesper@amazon.com>
2020-01-13 08:26:01 -08:00
Jeff Squyres
21bc9042e1 mtl/ofi: check for FI_LOCAL_COMM+FI_REMOTE_COMM
Make sure to get an RDM provider that can provide both local and
remote communication.  We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.

Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version.  Use this check in two
places:

1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
   FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
   >= v1.1, but the usnic component was checking for >= v1.3.  So
   update the btl/usnic configury to use the new macro and check for
   >= v1.3.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-01-13 08:19:53 -08:00
Jeff Squyres
dd2d7d2866
Merge pull request #7189 from michaellass/fix-dims_create
dims_create: fix calculation of factors for odd squares
2020-01-10 09:47:09 -05:00
Jeff Squyres
c471f1cb0b
Merge pull request #7286 from jsquyres/pr/add-git-submodule-status-to-github-issue-template
Github issue template: updates
2020-01-08 12:53:57 -05:00
Jeff Squyres
a26a6c8e42 Github issue template: updates
1. Add more recent release version numbers in the examples
2. Add request for output from `git submodule status` when
   building/installing from a git clone

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-01-08 11:47:35 -05:00
Brian Barrett
f853971bc1
Merge pull request #6821 from jsquyres/pr/make-hwloc201-tarball-a-submodule
hwloc v2.1.0: use a git submodule
2020-01-08 07:41:37 -08:00
bosilca
2d659323b9
Merge pull request #7233 from bosilca/fix/unused_fortran_protected
Remove unused variable.
2020-01-08 08:04:06 -05:00
Nathan Hjelm
f86f805be1 btl/vader: fix issues with xpmem registration invalidation
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.

This commit fixes the issue by setting the INVALID flag on a registation
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.

References #6524
References #7030
Closes #6534

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-01-07 22:50:52 -07:00
Nathan Hjelm
1145abc0b7 opal: make interval tree resilient to similar intervals
There are cases where the same interval may be in the tree multiple
times. This generally isn't a problem when searching the tree but
may cause issues when attempting to delete a particular registration
from the tree. The issue is fixed by breaking a low value tie by
checking the high value then the interval data.

If the high, low, and data of a new insertion exactly matches an
existing interval then an assertion is raised.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-01-07 21:43:58 -07:00
Jeff Squyres
a7e2bb44d5
Merge pull request #7264 from jsquyres/pr/linux-specfile-fixes
openmpi.spec: update modulefile_path behavior
2020-01-07 17:50:49 -05:00
bosilca
1b93a173ab
Merge pull request #7149 from bosilca/fix/datatype_overflow
Prevent overflow when dealing with datatype count.
2020-01-07 14:13:32 -05:00
Jeff Squyres
9caf43a9c2
Merge pull request #7271 from e-kwsm/update-man
Fix typo and update URLs (https, redirection) [skip ci]
2020-01-06 16:41:32 -05:00
Eisuke Kawashima
d26d4e1d63
Fix typo and update URLs (https, redirection) [skip ci]
Signed-off-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2020-01-07 03:52:25 +09:00
Howard Pritchard
b17cd9049f
Merge pull request #7238 from rwespetal/ofi-prov-name-fix
mtl/ofi: ignore case when comparing provider names
2020-01-02 12:23:39 -07:00
Jeff Squyres
352e575e18 openmpi.spec: update modulefile_path behavior
Allow the user to override the modulefile_path (root directory to
install the Open MPI modulefile), even if install_in_opt==1.  For
example:

rpmbuild \
    --rebuild \
    --define 'install_in_opt 1' \
    --define 'modulefile_path /path/to/my/modulefiles/openmpi/%{version}' \
    openmpi-4.0.2-1.src.rpm

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-12-30 06:53:56 -08:00
Jeff Squyres
a2a9a9516b hwloc2: bump up to hwloc v2.1.0
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-12-24 16:01:03 -08:00
Jeff Squyres
c292e759da hwloc2: bump up to hwloc 2.0.4
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-12-24 16:01:03 -08:00
Jeff Squyres
e5722acc37 hwloc201: replace with "hwloc2" component+git submodule
Rename the component to be "hwloc2" (since it can now be any v2.x.y
version of hwloc), and make the embedded copy of hwloc be a git
submodule.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-12-24 16:01:03 -08:00
Jeff Squyres
c32eeb3eb9 autogen: add sanity checks for git submodules
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-12-24 13:15:45 -05:00
Artem Polyakov
3f11c8ef6c
Merge pull request #7249 from janjust/oshmem_atomic_set_fix
Oshmem atomic set fix
2019-12-20 06:47:50 -08:00
Jeff Squyres
569d63ce46
Merge pull request #7252 from jsquyres/pr/clarify-with-hwloc-functionality
hwloc: clarify --with-hwloc behavior
2019-12-19 15:29:57 -05:00
Tomislav Janjusic
2d8f9b1d09 oshmem/extended: Fix shmem_atomic_set for float and double.
Co-authored with: Artem Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-12-19 21:15:41 +02:00
Tomislav Janjusic
cb5ff55b27 oshmem/ucx: fixed a build issue
Co-authored with: Artem Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-12-19 21:14:54 +02:00
Jeff Squyres
18c3e1af5e hwloc: clarify --with-hwloc behavior
Clarify in README what --with-hwloc does in its different use cases.

Also, ensure that the behavior when specifying `--with-hwloc` is the
same as if that option is not specified at all.  This is what we did
in Open MPI <= v3.x; looks like we inadvertantly caused `--with-hwloc`
to be synonymous with `--with-hwloc=external` in v4.0.0.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-12-19 08:38:57 -08:00
Jeff Squyres
8b424c3863
Merge pull request #7232 from bosilca/hjelmn_neighbor_alltoall_fix
Neighbor alltoall fix
2019-12-17 17:24:05 -05:00
Robert Wespetal
9b72e9465d mtl/ofi: ignore case when comparing provider names
Change the provider include and exclude list name comparison check to
ignore case. The UDP provider's name is uppercase and was being selected
despite being in the exclude list.

Signed-off-by: Robert Wespetal <wesper@amazon.com>
2019-12-16 13:05:00 -08:00
George Bosilca
97eb5d0cf2 Remove unused variable.
It got left over during the Fortran rework.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-12-11 16:59:50 -05:00
George Bosilca
86acdee460
Fix the communication ordering for all cartesian neighbor collectives.
This work is rooted in the [MPI Forum issue
153](https://github.com/mpi-forum/mpi-issues/issues/153).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-12-11 12:40:38 -05:00
Todd Kordenbrock
1af6dbe277
Merge pull request #7066 from tkordenbrock/topic/master/portals4.fix.flowcontrol.bugs
portals4: fix flow control bugs
2019-12-11 06:31:26 -06:00
Jeff Squyres
cf9a5fb06c
Merge pull request #7226 from mcoil1/pr/fix-memory_patcher
memory/patcher: fix compiler warning
2019-12-09 07:26:31 -05:00
Jeff Squyres
73ecece55d
Merge pull request #7211 from mcoil1/pr/fixing-compiler-warnings
Fix a few compiler warnings
2019-12-09 07:26:15 -05:00
Jeff Squyres
fb04596892
Merge pull request #7225 from wbailey2/pr/fix-fcoll_two_phase_support
fcoll/two_phase_support: warning stomp
2019-12-08 14:09:10 -05:00
Maxwell Coil
52241dbbcd libnbc: fixed uninitialized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
2019-12-08 14:03:48 -05:00
Maxwell Coil
3ced33c2eb ompi/dpm/dpm.c: Fix uninititalized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
2019-12-08 14:03:48 -05:00
Maxwell Coil
52a9cce6f3 memory/patcher: fix compiler warning
syscall() returns a long, but we are invoking shmat(), which returns
a void*.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
2019-12-08 13:56:00 -05:00
William Bailey
e2718e0196 fcoll/two_phase: Compiler warning for wrong variable type used
Squash compiler warning. Changed output specifier to match variable type (long int -> long long int).

Signed-off-by: William Bailey <wbailey2@nd.edu>
2019-12-08 13:15:30 -05:00
Jeff Squyres
cdf46e6682
Merge pull request #7222 from jsquyres/pr/mpool-basic-pointer-fix
mpool/base: fix basic mpool_base() function
2019-12-06 13:22:49 -05:00