
514 Commits

Author SHA1 Message Date
Yossi Itigin
66d931b7c4
Merge pull request #5116 from yosefe/topic/ucx-connect-errs
ucx: improve error messages during connection establishment
2018-05-02 14:04:24 +03:00
Geoff Paulsen
591b174434
Merge pull request #5003 from sam6258/shmem_free_fix
ompi/oshmem: fix shmem_free to perform no-op on null ptr
2018-04-30 12:03:24 -05:00
Yossi Itigin
385f38ab4e ucx: improve error messages during connection establishment
Also, unite common code calling ucp_ep_create()

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-04-30 15:45:05 +03:00
Joshua Ladd
15d5e2937a
Merge pull request #4996 from xinzhao3/topic/shmem-cswap
ompi/oshmem: fix cswap bug in mca/atomic/mxm.
2018-04-04 08:28:57 -04:00
Joshua Ladd
e87cb25711
Merge pull request #4982 from xinzhao3/topic/shmem-final
ompi/oshmem: fix bug in shmem_finalize.
2018-04-04 08:27:55 -04:00
Scott Miller
a8766adb55 ompi/oshmem: fix shmem_free to perform no-op on null ptr
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
2018-04-02 17:12:24 -04:00
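
The null-pointer behavior described in this commit can be illustrated with a minimal sketch. This is not the actual Open MPI code: `toy_shmem_free`, `toy_backend_release`, and `toy_release_calls` are hypothetical names, and plain `free()` stands in for the symmetric-heap allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter standing in for the allocator's bookkeeping. */
static int toy_release_calls = 0;

/* Hypothetical backend release routine, assumed not to tolerate NULL. */
static void toy_backend_release(void *ptr)
{
    toy_release_calls++;
    free(ptr);
}

/* Free wrapper: a null pointer is a no-op and never reaches the backend. */
static void toy_shmem_free(void *ptr)
{
    if (NULL == ptr) {
        return;
    }
    toy_backend_release(ptr);
}
```

Calling `toy_shmem_free(NULL)` returns immediately, so the backend counter stays unchanged.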
Xin Zhao
4aad386c2b ompi/oshmem: fix bug in shmem_finalize.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-04-02 09:07:59 -05:00
Xin Zhao
a5b72cc2e4 ompi/oshmem: fix cswap bug in mca/atomic/mxm.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-03-30 03:17:01 -05:00
Xin Zhao
af32c305de ompi/oshmem: fix bug in shmem_alltoall in mca/scoll/basic.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-03-29 14:54:36 -05:00
Artem Polyakov
77ff99e9ee
Merge pull request #4933 from karasevb/timings_update
timings: added new timing points
2018-03-25 00:10:49 -07:00
Jeff Squyres
c3adcb05eb Miscellaneous compiler warnings fixes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-23 11:45:30 -07:00
Boris Karasev
3796307a57 timings: added new timing points
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-03-21 05:16:25 +02:00
Alex Mikheev
292d185c30
oshmem: refactor group cache
- Use opal hash table instead of list for group lookup.
- Code cleanup/refactoring. Group cache is now a part
  of the proc_group.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-02-22 11:48:06 +02:00
Yossi Itigin
1b1402299a
Merge pull request #4833 from alex-mikheev/topic/oshmem_gcache_grp_msg_fix
oshmem: increase group cache size to 1000
2018-02-19 14:39:26 +02:00
Alex Mikheev
03a094b9a8
oshmem: increase group cache size to 1000
and fix typos in help messages

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-02-19 11:50:24 +02:00
Alex Mikheev
cca67a69ea oshmem: scoll: fixes strided alltoall
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-02-19 09:41:21 +02:00
Gilles Gouaillardet
88e26c63e0 spml/ucx: fix a double free() issue
in mca_spml_ucx_add_procs() error path

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-22 13:42:16 +09:00
Joshua Ladd
dbefb35aad
Merge pull request #4635 from karasevb/oshmem/spec_1.3/broadcast
oshmem: remove "shmem_broadcast" in accordance with the spec v1.3
2018-01-17 12:11:09 -05:00
Alex Mikheev
ae326546f4
ompi/oshmem: ucx is selected over yalla/ikrit by default
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:08:04 +02:00
Yossi Itigin
1193e1eb83 spml_ucx: fix rkey leak
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-12-26 20:47:26 +02:00
Boris Karasev
f6818af1ab oshmem: remove "shmem_broadcast" in accordance with the spec v1.3
Fixes: https://github.com/open-mpi/ompi/issues/4098

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-12-19 10:41:12 +02:00
Nathan Hjelm
1282e98a01 opal/asm: rename existing arithmetic atomic functions
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
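
The distinction this commit draws, returning the new value versus the old value, can be sketched with standard C11 atomics. The helper names `add_fetch_32` and `fetch_add_32` are illustrative assumptions for this sketch and are not taken from the commit itself.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: add delta and return the NEW value. */
static inline int32_t add_fetch_32(_Atomic int32_t *addr, int32_t delta)
{
    /* atomic_fetch_add yields the old value, so add delta to get the new one. */
    return atomic_fetch_add(addr, delta) + delta;
}

/* Hypothetical helper: add delta but return the OLD value. */
static inline int32_t fetch_add_32(_Atomic int32_t *addr, int32_t delta)
{
    return atomic_fetch_add(addr, delta);
}
```

Starting from a counter of 10, `add_fetch_32(&counter, 5)` yields 15, and a following `fetch_add_32(&counter, 5)` also yields 15 while leaving the counter at 20.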
Nathan Hjelm
9d0b3fe9f4 opal/asm: remove opal_atomic_bool_cmpset functions
This commit eliminates the old opal_atomic_bool_cmpset functions. They
have been replaced by the opal_atomic_compare_exchange_strong
functions.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
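
As a rough illustration of the difference between the two styles, the sketch below uses the standard C11 `atomic_compare_exchange_strong` as a stand-in. The wrapper names `bool_cmpset_32` and `compare_exchange_32` are hypothetical, and no claim is made about the exact OPAL signatures.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "bool cmpset" style: reports only success or failure. */
static inline bool bool_cmpset_32(_Atomic int32_t *addr, int32_t oldval,
                                  int32_t newval)
{
    return atomic_compare_exchange_strong(addr, &oldval, newval);
}

/* Compare-exchange style: on failure, *expected receives the value
 * actually found at addr, so the caller can retry without reloading. */
static inline bool compare_exchange_32(_Atomic int32_t *addr,
                                       int32_t *expected, int32_t desired)
{
    return atomic_compare_exchange_strong(addr, expected, desired);
}
```

The second form is convenient in retry loops, because a failed attempt already hands back the current value.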
Nathan Hjelm
3ff34af355 opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*
This commit renames the atomic compare-and-swap functions to indicate
the return value. This is in preparation for adding support for a
compare-and-swap that returns the old value. At the same time the
return type has been changed to bool.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-10-31 12:47:23 -06:00
Alex Mikheev
7cb7af1685
OSHMEM: add ucx to the list of default spmls
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-10-18 10:41:00 +03:00
Alina Sklarevich
c7f5d13550 OSHMEM/CONFIGURE: verbs component - restore the previous build behavior
In case where support was requested but not found, stop the build.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-10-16 11:53:02 +03:00
Alina Sklarevich
3008827f83 OSHMEM/CONFIGURE: Check for the presence of ibv_exp_reg_shared_mr.
+ The sshmem verbs component will disqualify itself if this verb isn't
present on the build host.
+ In case where support was requested but not found, don't stop the
build - continue without this component.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-10-12 19:57:12 +03:00
Mike Dubman
3d1a7ddd9f Merge pull request #4271 from karasevb/oshmem/spec
oshmem: refactoring the definition of `SHMEM_ALLTOALLS_SYNC_SIZE`
2017-09-27 13:17:37 +03:00
Boris Karasev
7479328937 oshmem: refactoring the definition of SHMEM_ALLTOALLS_SYNC_SIZE
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-09-26 12:08:55 +03:00
Mike Dubman
4c98e6bde2 Merge pull request #4258 from yosefe/topic/spml-ucx-fix-quiet-typo
spml_ucx: fix typo in shmem_quiet() error message.
2017-09-26 11:10:40 +03:00
Boris Karasev
584ff76dea oshmem: introduced the definition SHMEM_ALLTOALLS_SYNC_SIZE
In accordance with the OSHMEM spec, this definition must be included in
the code.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-09-26 09:12:09 +03:00
Yossi Itigin
3081576124 spml_ucx: fix typo in shmem_quiet() error message.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-09-24 19:20:55 +03:00
Gilles Gouaillardet
b9315edb85 configury: remove the --disable-mpi-io option
Fixes open-mpi/ompi#2185

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-09-20 14:39:09 +09:00
Howard Pritchard
bfd5ed6e98 Merge pull request #1910 from hpcraink/pr/shmem_fix_f77
Fix shmem.fh: fails to compile with F77 fixed-form compiled programs...
2017-09-19 14:28:08 -06:00
Rainer Keller
d529c289db Fails to compile with F77 fixed-form compiled programs...
Convert to F77 notation and split into two (shorter) lines.
Also, make usage of the SHMEM_MAX_NAME_LEN definition, by moving
that first.

Signed-off-by: Rainer Keller <rainer.keller@hft-stuttgart.de>
2017-09-15 15:09:43 +02:00
Alina Sklarevich
007b1803ec SPML_UCX: use ompi_proc_world_size() to set the estimated_num_eps value
before this fix, mca_spml_ucx_component_open was using
oshmem_num_procs() to set the value of params.estimated_num_eps for UCX.
The oshmem_num_procs() function uses oshmem_group_all which will be
initialized after the call to mca_spml_ucx_component_open and therefore,
cannot be used there.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-09-04 14:46:00 +03:00
Gilles Gouaillardet
77f30a4378 oshmem_info: cleanup oshmem_info output
- there are no C++ bindings in OpenSHMEM
- only Fortran binding is shmem.fh

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-09-01 13:25:19 +09:00
Gilles Gouaillardet
d1740a679c oshmem: add C++ wrappers
though there are no C++ bindings for oshmem, we need C++ wrappers
since a C compiler might not be able to compile a C++ source.
the C++ wrappers are :
- shmemc++ / oshc++
- shmemcxx / oshcxx
- shmemCC / oshCC (on case sensitive filesystems)

also add the examples/hello_oshmem_cxx.cc example

Thanks Bert Wesarg for bringing this to our attention

Fixes open-mpi/ompi#2097

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-09-01 13:24:34 +09:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la
```

Note: The changes in this commit were automated by the
`libadd_mca_comp_update.py` script from the commit that precedes it.
Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
Jeff Squyres
791bcee6c0 ompi/fortran: remove proof-of-concept mpi_f08 module
This module was always intended to be a proof of concept, and was far
from complete.  If/when someone implemented F08 descriptor support for
the mpi_f08 module, this commit can either be restored or used as
reference material.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-10 06:19:17 -07:00
Alex Mikheev
692021f637
oshmem: sshmem sysv: auto huge page alloc can fallback to regular pages.
Fallback to the regular pages if huge page allocation is set to auto
and it was not possible to allocate requested amount of memory with
the hugepages.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-08-06 13:33:04 +03:00
Mike Dubman
dd3acd9220 Merge pull request #4006 from alex-mikheev/topic/oshmem_shmem_ptr
oshmem: shmem_ptr() implementation
2017-08-03 19:45:38 +03:00
Alex Mikheev
1b5df76f8b
oshmem: shmem_ptr() implementation
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-08-03 13:56:34 +03:00
Howard Pritchard
1d612da1cb oshmem: fix issue with shmem_g c11 generics
There was a typo in the shmem_g c11 generic interface
in shmem.h.in

Thanks to @nspark for reporting the problem and
specifying the fix.

Fixes #3968

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2017-08-01 09:58:20 -06:00
Boris Karasev
77c50efb95 Yoda SPML is removed
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-07-14 08:47:16 +03:00
Gilles Gouaillardet
72c7329462 configury: use 'uname -n' when 'hostname' is not available
the 'hostname' command might not be available on some platforms
such as Fedora Core 26, so mimic config/libtool.m4 and fall back
to 'uname -n' if needed

Refs. #3680

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-12 15:04:32 +09:00
KAWASHIMA Takahiro
362445d486 Use same prefix format for [host:pid]
Hostname and PID are output as a message prefix in many places in
our code. Their printf-formats were either `[%s:%d]` or `[%s:%05d]`.
This commit changes `[%s:%d]` to `[%s:%05d]`. The latter was more
widely used in our code (including OPAL output system and the signal
handler).

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-06-08 19:35:03 +09:00
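
The effect of the `[%s:%05d]` format can be seen in a small sketch. `format_prefix` is a hypothetical helper written for illustration, not a function from the code base.

```c
#include <stdio.h>

/* Hypothetical helper: writes "[host:pid]" with the PID zero-padded to
 * at least five digits. Returns the snprintf result. */
static int format_prefix(char *buf, size_t len, const char *host, int pid)
{
    return snprintf(buf, len, "[%s:%05d]", host, pid);
}
```

With host `node01` and PID 42 the buffer holds `[node01:00042]`. PIDs longer than five digits are printed in full, since `%05d` sets only a minimum width.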
KAWASHIMA Takahiro
6b91eddc8b Apply opal_abort_delay to the signal handler
This commit expands the effect of the MCA parameter `opal_abort_delay`
to the OPAL signal handler. This allows attaching of a debugger on
segmentation fault etc. before quitting the job.

The sleep code is moved to the `opal_delay_abort` function from the
`ompi_mpi_abort` and `oshmem_shmem_abort` functions for code cleanup.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-06-08 19:34:48 +09:00
Gilles Gouaillardet
08526e8adc fortran/base: rename strings.h into fortran_base_strings.h
rename ompi/mpi/fortran/base/strings.h so it does not get pulled
when /usr/include/strings.h is expected.

Refs open-mpi/ompi#3639

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-02 09:46:20 +09:00
George Bosilca
037a85a782
Fix the OSHMEM request padding.
This patch fixes a missed case by 5b670a2 (PR #3634).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-01 18:30:02 -04:00