1
1
Граф коммитов

629 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
67ba8da76f ompi_mpi_init: fix race condition
There was a race condition in 35438ae9b5: if multiple threads invoked
ompi_mpi_init() simultaneously (which could happen from both MPI and
OSHMEM), the code did not catch this condition -- Bad Things would
happen.

Now use an atomic cmp/set to ensure that only one thread is able to
advance ompi_mpi_init from NOT_INITIALIZED to INIT_STARTED.

Additionally, change the prototype of ompi_mpi_init() so that
oshmem_init() can safely invoke ompi_mpi_init() multiple times (as
long as MPI_FINALIZE has not started) without displaying an error.  If
multiple threads invoke oshmem_init() simultaneously, one of them will
actually do the initialization, and the rest will loop waiting for it
to complete.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-05 18:09:13 -07:00
Jeff Squyres
35438ae9b5 mpi/finalized: revamp INITIALIZED/FINALIZED
Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE).  In such cases, MPI_FINALIZED must return "false".

Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE.  See
https://github.com/open-mpi/ompi/issues/5084.

This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:

- not initialized
- init started
- init completed
- finalize started
- finalize past COMM_SELF destruction
- finalize completed

The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.

Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-01 13:36:29 -07:00
Sergey Oblomov
319bb376f9 MCA/UCX: branch optimization in cswap call
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-30 07:14:43 +03:00
Sergey Oblomov
daad71f036 MCA/UCX: switch/case optimization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 22:33:33 +03:00
Sergey Oblomov
6be4066e23 MCA/UCX: cswap call if updated to non-blocking API
- minor fixes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 20:07:51 +03:00
Sergey Oblomov
b668e19cd1 Merge remote-tracking branch 'wg/master' into topic/amo-non-blocking-ucp
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 19:08:48 +03:00
Yossi Itigin
976cd5e307
Merge pull request #5186 from hoopoepg/topic/ucx-amo-error-msg
MCA/UCX: fixed error messages for incorrect msg size
2018-05-29 15:54:02 +03:00
Sergey Oblomov
0c3ed93ef0 MCA/UCX: added opal progress call to wait request routine
- added opal_progress call to wait function to avoid
  possible [dead]lock issues
- wait call is declared as inline
- minor fixes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 11:34:40 +03:00
Yossi Itigin
705c8a7b9b
Merge pull request #5198 from brminich/shmem_fence
OSHMEM/SMPL/UCX: Add real fence support
2018-05-27 11:25:42 +03:00
Artem Polyakov
66e774d959
Merge pull request #4638 from karasevb/oshmem/spec_1.3/c11
oshmem: remove `shmem_put/get` when not the C11 case in accordance with the spec v1.3
2018-05-26 17:29:51 -07:00
Mikhail Brinskii
8e9d401938 OSHMEM/SMPL/UCX: Add real fence support
+ Add quiet method to SPML, so it can have different implementation with
fence.
+ Use ucp_worker_fence for spml_fence method of UCX SPML

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2018-05-25 22:43:06 +03:00
Brian Barrett
9fff40647d oshmem: disable if no spmls build
This patch disables the oshmem layer if there are no SPMLs that
will build.  With the limited set of SPMLs available to support
oshmem, many builds end up installing an oshmem library that we
know will not work.  There has been a bit of customer confusion
over oshmem, hopefully this will lead customers in the right
direction.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-05-25 08:48:50 -07:00
Sergey Oblomov
bbaffd3681 MCA/UCX: atomic add/swap are moved to new UCX atomic API
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-22 22:23:31 +03:00
Sergey Oblomov
4495da5cb9 MCA/UCX: fixed error messages for incorrect msg size
- supported 4 or 8 bytes only

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-22 19:53:23 +03:00
Yossi Itigin
66d931b7c4
Merge pull request #5116 from yosefe/topic/ucx-connect-errs
ucx: improve error messages during connection establishment
2018-05-02 14:04:24 +03:00
Geoff Paulsen
591b174434
Merge pull request #5003 from sam6258/shmem_free_fix
ompi/oshmem: fix shmem_free to perform no-op on null ptr
2018-04-30 12:03:24 -05:00
Yossi Itigin
385f38ab4e ucx: improve error messages during connection establishment
Also, unite common code calling ucp_ep_create()

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-04-30 15:45:05 +03:00
Joshua Ladd
15d5e2937a
Merge pull request #4996 from xinzhao3/topic/shmem-cswap
ompi/oshmem: fix cswap bug in mca/atomic/mxm.
2018-04-04 08:28:57 -04:00
Joshua Ladd
e87cb25711
Merge pull request #4982 from xinzhao3/topic/shmem-final
ompi/oshmem: fix bug in shmem_finalize.
2018-04-04 08:27:55 -04:00
Scott Miller
a8766adb55 ompi/oshmem: fix shmem_free to perform no-op on null ptr
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
2018-04-02 17:12:24 -04:00
Xin Zhao
4aad386c2b ompi/oshmem: fix bug in shmem_finalize.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-04-02 09:07:59 -05:00
Xin Zhao
a5b72cc2e4 ompi/oshmem: fix cswap bug in mca/atomic/mxm.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-03-30 03:17:01 -05:00
Xin Zhao
af32c305de ompi/oshmem: fix bug in shmem_alltoall in mca/scoll/basic.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-03-29 14:54:36 -05:00
Artem Polyakov
77ff99e9ee
Merge pull request #4933 from karasevb/timings_update
timings: added new timing points
2018-03-25 00:10:49 -07:00
Jeff Squyres
c3adcb05eb Miscellaneous compiler warnings fixes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-23 11:45:30 -07:00
Boris Karasev
3796307a57 timings: added new timing points
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-03-21 05:16:25 +02:00
Alex Mikheev
292d185c30
oshmem: refactor group cache
- Use opal hash table instead of list for group lookup.
- Code cleanup/refactoring. Group cache is now a part
  of the proc_group.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-02-22 11:48:06 +02:00
Yossi Itigin
1b1402299a
Merge pull request #4833 from alex-mikheev/topic/oshmem_gcache_grp_msg_fix
oshmem: increase group cache size to 1000
2018-02-19 14:39:26 +02:00
Alex Mikheev
03a094b9a8
oshmem: increase group cache size to 1000
and fix typos in help messages

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-02-19 11:50:24 +02:00
Alex Mikheev
cca67a69ea oshmem: scoll: fixes strided alltoall
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-02-19 09:41:21 +02:00
Gilles Gouaillardet
88e26c63e0 spml/ucx: fix a double free() issue
in mca_spml_ucx_add_procs() error path

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-22 13:42:16 +09:00
Joshua Ladd
dbefb35aad
Merge pull request #4635 from karasevb/oshmem/spec_1.3/broadcast
oshmem: remove "shmem_broadcast" in accordance with the spec v1.3
2018-01-17 12:11:09 -05:00
Alex Mikheev
ae326546f4
ompi/oshmem: ucx is selected over yalla/ikrit by default
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:08:04 +02:00
Yossi Itigin
1193e1eb83 spml_ucx: fix rkey leak
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-12-26 20:47:26 +02:00
Boris Karasev
5ea892dc0b oshmem: remove shmem_put/get when not the C11 case in accordance with the spec v1.3
Fixes: https://github.com/open-mpi/ompi/issues/4307

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-12-19 12:40:10 +02:00
Boris Karasev
f6818af1ab oshmem: remove "shmem_broadcast" in accordance with the spec v1.3
Fixes: https://github.com/open-mpi/ompi/issues/4098

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-12-19 10:41:12 +02:00
Nathan Hjelm
1282e98a01 opal/asm: rename existing arithmetic atomic functions
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Nathan Hjelm
9d0b3fe9f4 opal/asm: remove opal_atomic_bool_cmpset functions
This commit eliminates the old opal_atomic_bool_cmpset functions. They
have been replaced by the opal_atomic_compare_exchange_strong
functions.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Nathan Hjelm
3ff34af355 opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*
This commit renames the atomic compare-and-swap functions to indicate
the return value. This is in preperation for adding support for a
compare-and-swap that returns the old value. At the same time the
return type has been changed to bool.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-10-31 12:47:23 -06:00
Alex Mikheev
7cb7af1685
OSHMEM: add ucx to the list of default spmls
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-10-18 10:41:00 +03:00
Alina Sklarevich
c7f5d13550 OSHMEM/CONFIGURE: verbs component - restore the previous build behavior
In case where support was requested but not found, stop the build.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-10-16 11:53:02 +03:00
Alina Sklarevich
3008827f83 OSHMEM/CONFIGURE: Check for the presence of ibv_exp_reg_shared_mr.
+ The sshmem verbs component will disqualify itself if this verb isn't
present on the build host.
+ In case where support was requested but not found, don't stop the
build - continue without this component.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-10-12 19:57:12 +03:00
Mike Dubman
3d1a7ddd9f Merge pull request #4271 from karasevb/oshmem/spec
oshmem: refactoring the definition of `SHMEM_ALLTOALLS_SYNC_SIZE`
2017-09-27 13:17:37 +03:00
Boris Karasev
7479328937 oshmem: refactoring the definition of SHMEM_ALLTOALLS_SYNC_SIZE
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-09-26 12:08:55 +03:00
Mike Dubman
4c98e6bde2 Merge pull request #4258 from yosefe/topic/spml-ucx-fix-quiet-typo
spml_ucx: fix typo in shmem_quiet() error message.
2017-09-26 11:10:40 +03:00
Boris Karasev
584ff76dea oshmem: introduced the definition SHMEM_ALLTOALLS_SYNC_SIZE
In accordance with the OSHMEM spec, this definition must be included in
the code.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-09-26 09:12:09 +03:00
Yossi Itigin
3081576124 spml_ucx: fix typo in shmem_quiet() error message.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-09-24 19:20:55 +03:00
Gilles Gouaillardet
b9315edb85 configury: remove the --disable-mpi-io option
Fixes open-mpi/ompi#2185

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-09-20 14:39:09 +09:00
Howard Pritchard
bfd5ed6e98 Merge pull request #1910 from hpcraink/pr/shmem_fix_f77
Fix shmem.fh: fails to compile with F77 fixed-form compiled programs...
2017-09-19 14:28:08 -06:00
Rainer Keller
d529c289db Fails to compile with F77 fixed-form compiled programs...
Convert to F77 notation and split into two (shorter) lines.
Also, make usage of the SHMEM_MAX_NAME_LEN definition, by moving
that first.

Signed-off-by: Rainer Keller <rainer.keller@hft-stuttgart.de>
2017-09-15 15:09:43 +02:00
Alina Sklarevich
007b1803ec SPML_UCX: use ompi_proc_world_size() to set the estimated_num_eps value
before this fix, mca_spml_ucx_component_open was using
oshmem_num_procs() to set the value of params.estimated_num_eps for UCX.
The oshmem_num_procs() function uses oshmem_group_all which will be
initialized after the call to mca_spml_ucx_component_open and therefore,
cannot be used there.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-09-04 14:46:00 +03:00
Gilles Gouaillardet
77f30a4378 oshmem_info: cleanup oshmem_info output
- there is no C++ bindings in OpenSHMEM
- only Fortran binding is shmem.fh

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-09-01 13:25:19 +09:00
Gilles Gouaillardet
d1740a679c oshmem: add C++ wrappers
though there are no C++ bindings for oshmem, we need C++ wrappers
since a C compiler might not be able to compile a C++ source.
the C++ wrappers are :
- shmemc++ / oshc++
- shmemcxx / oshcxx
- shmemCC / oshCC (on case sensitive filesystems)

also add the examples/hello_oshmem_cxx.cc example

Thanks Bert Wesarg for bringing this to our attention

Fixes open-mpi/ompi#2097

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-09-01 13:24:34 +09:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la"
```

Note: The changes in this commit were automated by the script in
the commit that proceeds it with the `libadd_mca_comp_update.py`
script. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
Jeff Squyres
791bcee6c0 ompi/fortran: remove proof-of-concept mpi_f08 module
This module was always intended to be a proof of concept, and was far
from complete.  If/when someone implemented F08 descriptor support for
the mpi_f08 module, this commit can either be restored or used as
reference material.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-08-10 06:19:17 -07:00
Alex Mikheev
692021f637
oshmem: sshmem sysv: auto huge page alloc can fallback to regular pages.
Fallback to the regular pages if huge page allocation is set to auto
and it was not possible to allocate requested amount of memory with
the hugepages.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-08-06 13:33:04 +03:00
Mike Dubman
dd3acd9220 Merge pull request #4006 from alex-mikheev/topic/oshmem_shmem_ptr
oshmem: shmem_ptr() implementation
2017-08-03 19:45:38 +03:00
Alex Mikheev
1b5df76f8b
oshmem: shmem_ptr() implementation
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-08-03 13:56:34 +03:00
Howard Pritchard
1d612da1cb oshmem: fix issue with shmem_g c11 generics
There was a typo in the shmem_g c11 generic interface
in shmem.h.in

Thanks to @nspark for reporting the problem and
specifying the fix.

Fixes #3968

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2017-08-01 09:58:20 -06:00
Boris Karasev
77c50efb95 Yoda SPML is removed
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-07-14 08:47:16 +03:00
Gilles Gouaillardet
72c7329462 configury: use 'uname -n' when 'hostname' is not available
the 'hostname' command might not be available on some platforms
such as Fedora Core 26, so mimick config/libtool.m4 and fallback
to 'uname -n' if needed

Refs. #3680

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-12 15:04:32 +09:00
KAWASHIMA Takahiro
362445d486 Use same prefix format for [host:pid]
Hostname and PID are output as a message prefix in many places in
our code. Their printf-formats were either `[%s:%d]` or `[%s:%05d]`.
This commit changes `[%s:%d]` to `[%s:%05d]`. The latter was more
widely used in our code (including OPAL output system and the signal
handler).

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-06-08 19:35:03 +09:00
KAWASHIMA Takahiro
6b91eddc8b Apply opal_abort_delay to the signal handler
This commit expands the effect of the MCA parameter `opal_abort_delay`
to the OPAL signal handler. This allows attaching of a debugger on
segmentation fault etc. before quitting the job.

The sleep code is moved to the `opal_delay_abort` function from the
`ompi_mpi_abort` and `oshmem_shmem_abort` functions for code cleanup.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-06-08 19:34:48 +09:00
Gilles Gouaillardet
08526e8adc fortran/base: rename strings.h into fortran_base_strings.h
rename ompi/mpi/fortran/base/strings.h so it does not get pulled
when /usr/include/strings.h is expected.

Refs open-mpi/ompi#3639

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-02 09:46:20 +09:00
George Bosilca
037a85a782
Fix the OSHMEM request padding.
This patch fixes a missed case by 5b670a2 (PR #3634).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-01 18:30:02 -04:00
Mark Allen
482d84b6e5 fixes for Dave's get/set info code
The expected sequence of events for processing info during object creation
is that if there's an incoming info arg, it is opal_info_dup()ed into the obj
at obj->s_info first. Then interested components register callbacks for
keys they want to know about using opal_infosubscribe_infosubscribe().

Inside info_subscribe_subscribe() the specified callback() is called with
whatever matching k/v is in the object's info, or with the default. The
return string from the callback goes into the new k/v stored in info, and
the input k/v is saved as __IN_<key>/<val>. It's saved the same way
whether the input came from info or whether it was a default. A null return
from the callback indicates an ignored key/val, and no k/v is stored for
it, but an __IN_<key>/<val> is still kept so we still have access to the
original.

At MPI_*_set_info() time, opal_infosubscribe_change_info() is used. That
function calls the registered callbacks for each item in the provided info.
If the callback returns non-null, the info is updated with that k/v, or if
the callback returns null, that key is deleted from info. An __IN_<key>/<val>
is saved either way, and overwrites any previously saved value.

When MPI_*_get_info() is called, opal_info_dup_mpistandard() is used, which
allows relatively easy changes in interpretation of the standard, by looking
at both the <key>/<val> and __IN_<key>/<val> in info. Right now it does
  1. includes system extras, eg k/v defaults not expliclty set by the user
  2. omits ignored keys
  3. shows input values, not callback modifications, eg not the internal values

Currently the callbacks are doing things like
    return some_condition ? "true" : "false"
that is, returning static strings that are not to be freed. If the return
strings start becoming more dynamic in the future I don't see how unallocated
strings could support that, so I'd propose a change for the future that
the callback()s registered with info_subscribe_subscribe() do a strdup on
their return, and we change the callers of callback() to free the strings
it returns (there are only two callers).

Rough outline of the smaller changes spread over the less central files:
  comm.c
    initialize comm->super.s_info to NULL
    copy into comm->super.s_info in comm creation calls that provide info
    OBJ_RELEASE comm->super.s_info at free time
  comm_init.c
    initialize comm->super.s_info to NULL
  file.c
    copy into file->super.s_info if file creation provides info
    OBJ_RELEASE file->super.s_info at free time
  win.c
    copy into win->super.s_info if win creation provides info
    OBJ_RELEASE win->super.s_info at free time

  comm_get_info.c
  file_get_info.c
  win_get_info.c
    change_info() if there's no info attached (shouldn't happen if callbacks
      are registered)
    copy the info for the user

The other category of change is generally addressing compiler warnings where
ompi_info_t and opal_info_t were being used a little too interchangably. An
ompi_info_t* contains an opal_info_t*, at &(ompi_info->super)

Also this commit updates the copyrights.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-17 01:12:49 -04:00
David Solt
50aa143ab6 Major structural changes to data types: .super infosubscriber
ompi_communicator_t, ompi_win_t, ompi_file_t all have a super class of type opal_infosubscriber_t instead of a base/super type of opal_object_t (in previous code comm used c_base, but file used super).  It may be a bit bold to say that being a subscriber of MPI_Info is the foundational piece that ties these three things together, but if you object, then I would prefer to turn infosubscriber into a more general name that encompasses other common features rather than create a different super class.  The key here is that we want to be able to pass comm, win and file objects as if they were opal_infosubscriber_t, so that one routine can heandle all 3 types of objects being passed to it.

MPI_INFO_NULL is still an ompi_predefined_info_t type since an MPI_Info is part of ompi but the internal details of the underlying information concept is part of opal.

An ompi_info_t type still exists for exposure to the user, but it is simply a wrapper for the opal object.

Routines such as ompi_info_dup, etc have all been moved to opal_info_dup and related to the opal directory.

Fortran to C translation tables are only used for MPI_Info that is exposed to the application and are therefore part of the ompi_info_t and not the opal_info_t

The data structure changes are primarily in the following files:

    communicator/communicator.h
    ompi/info/info.h
    ompi/win/win.h
    ompi/file/file.h

The following new files were created:

    opal/util/info.h
    opal/util/info.c
    opal/util/info_subscriber.h
    opal/util/info_subscriber.c

This infosubscriber concept is that communicators, files and windows can have subscribers that subscribe to any changes in the info associated with the comm/file/window.  When xxx_set_info is called, the new info is presented to each subscriber who can modify the info in any way they want.  The new value is presented to the next subscriber and so on until all subscribers have had a chance to modify the value.  Therefore, the order of subscribers can make a difference but we hope that there is generally only one subscriber that cares or modifies any given key/value pair.  The final info is then stored and returned by a call to xxx_get_info.

The new model can be seen in the following files:

    ompi/mpi/c/comm_get_info.c
    ompi/mpi/c/comm_set_info.c
    ompi/mpi/c/file_get_info.c
    ompi/mpi/c/file_set_info.c
    ompi/mpi/c/win_get_info.c
    ompi/mpi/c/win_set_info.c

The current subscribers where changed as follows:

    mca/io/ompio/io_ompio_file_open.c
    mca/io/ompio/io_ompio_module.c
    mca/osc/rmda/osc_rdma_component.c (This one actually subscribes to "no_locks")
    mca/osc/sm/osc_sm_component.c (This one actually subscribes to "blocking_fence" and "alloc_shared_contig")

Signed-off-by: Mark Allen <markalle@us.ibm.com>

Conflicts:
	AUTHORS
	ompi/communicator/comm.c
	ompi/debuggers/ompi_mpihandles_dll.c
	ompi/file/file.c
	ompi/file/file.h
	ompi/info/info.c
	ompi/mca/io/ompio/io_ompio.h
	ompi/mca/io/ompio/io_ompio_file_open.c
	ompi/mca/io/ompio/io_ompio_file_set_view.c
	ompi/mca/osc/pt2pt/osc_pt2pt.h
	ompi/mca/sharedfp/addproc/sharedfp_addproc.h
	ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c
	ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c
	ompi/mpi/c/lookup_name.c
	ompi/mpi/c/publish_name.c
	ompi/mpi/c/unpublish_name.c
	opal/mca/mpool/base/mpool_base_alloc.c
	opal/util/Makefile.am
2017-05-12 14:41:05 -04:00
Xin Zhao
ee952fcccd Passing estimated_num_procs to UCX init in PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Pavel Shamis (Pasha)
95c440683b OSHMEM: shmem_wait code cleanup
* updating naming convention for the arguments in order to ensure
that the name aligns with an actual meaning of the argument
* remove local variable references in the macro
* adding volatile for the poll variables

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2017-03-15 21:53:44 +00:00
Yossi
1a95633e40 Merge pull request #2717 from alex-mikheev/topic/sshmem_ucx
oshmem: sshmem: adds UCX allocator
2017-03-09 12:58:06 +02:00
Jeff Squyres
fec519a793 hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).

This commit:

1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
   find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
   rest of the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:48:42 -08:00
George Bosilca
366d64b7e5 Move the collective structure outside the communicator.
As we changed the ABI (forcing a major release), we can limit
the size of the predefined communicators by moving the collective
structure outside the communicator. This might have a minimal,
but unnoticeable, impact on performance. This approach has been
discussed during the January 2017 devel meeting.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 11:54:17 -06:00
Gilles Gouaillardet
af0b5cffb4 asm: rename the AMD64 into X86_64
in this context, AMD64 really means amd64 or em64t, so let's
rename this into X86_64 in order to avoid any confusion

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-27 15:10:50 +09:00
Alex Mikheev
c9b5b12af4
oshmem: sshmem ucx: use fixed base address
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-26 15:16:28 +02:00
Alex Mikheev
c63137e1c0 oshmem: sshmem ucx: minor code cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
132fbd9ae9 oshmem: sshmem: add UCX allocator
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
e038e3f9e0 oshmem: sshmem: code cleaunp
The commit removes unused code and interface function, moves
common code to the base.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:47:59 +02:00
Alex Mikheev
ea3ea4835b oshmem: mem use hook: apply code review fixes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
(cherry picked from commit a422154a141f0be5b92d2b6c26d7b2b4176dfe18)
2017-01-30 11:30:20 +02:00
Alex Mikheev
9da9e6260d
oshmem: spml ucx: on error print ucx error string
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-29 10:28:24 +02:00
Alex Mikheev
986ca000f8
oshmem: spml: add memory allocation hook
The hook is called from memheap when memory range
is going to be allocated by smalloc(), realloc() and others.

ucx spml uses this hook to call ucp_mem_advise in order to speedup
non blocking memory mapping.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-26 16:41:39 +02:00
Jeff Squyres
e79e478447 oshmem: add some deprecated names in shmem.h.in
Per https://github.com/openshmem-org/tests-uh/issues/17, add some
deprecated constant names that we didn't previously support in Open
MPI.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-17 16:13:38 -08:00
Alex Mikheev
83c2ab76a5
oshmem: memheap: refactor component selection code
Do not call component's init function until the component has been
selected.

Use mca_base_select() instead of the custom component selection code.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-16 13:48:58 +02:00
Alex Mikheev
67d66c2326
oshmem: sshmem: make mmap allocator a default instead of verbs
By default use mmap() to allocate memory for the symmetric heap.
It is safer and more portable choice than sysv and verbs.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-12-14 13:31:16 +02:00
Alina Sklarevich
e9d2d029c6 PML/SPML/UCX: Adapt to the API changes in the UCX lib.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2016-12-08 11:33:29 +02:00
Gilles Gouaillardet
804a784fce Merge pull request #2544 from ggouaillardet/topic/mca_spml_yoda_get
spml/yoda: fix support for BTLs that do not register memory in mca_sp…
2016-12-08 17:26:07 +09:00
Gilles Gouaillardet
062ed9c919 spml/yoda: fix support for BTLs that do not register memory in mca_spml_yoda_get()
Refs open-mpi/ompi#2499

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-08 15:56:25 +09:00
Joshua Ladd
dc6f4a0feb Remove shmemx.h from shmem.h. Add shmem.h to shmemx.h
Fixes #2483
Signed-off-by: Joshua Ladd <joshual@mellanox.com>
2016-12-06 06:42:26 +02:00
Mike Dubman
53a0c86c16 Merge pull request #2455 from yosefe/topic/ucp-uct-nonblock-mem-reg-api
spml_ucx: allow registering the heap in non-blocking mode.
2016-11-27 11:42:09 +02:00
Mike Dubman
f339632216 Merge pull request #2452 from alex-mikheev/topic/scoll_basic_fixes
oshmem: fixes scoll basic barrier and broadcast
2016-11-25 18:03:56 +02:00
Yossi Itigin
0241a2697d spml_ucx: allow registering the heap in non-blocking mode.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2016-11-25 15:09:22 +02:00
Alex Mikheev
0f83a1fd57
oshmem: scoll: fixes basic barrier broadcast and alltoall
Add missing fence() call to alltoall and central counter broadcast.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-24 16:56:55 +02:00
Alex Mikheev
d1723355d3
oshmem: memheap: removes find_offset
Reasons for removal are:
- the function is only used by the shmem_lock code
- only a subset of the function is used by the shmem_lock
- for the general case the function is not correct

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-21 16:09:17 +02:00
Gilles Gouaillardet
19bdd1d626 oshmem/memheap: initialize common symbol mca_memheap_base_map
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-21 09:32:27 +09:00
Joshua Ladd
9a79da729f Merge pull request #2354 from alex-mikheev/topic/oshmem_mkey_cache
ikrit spml cleanup, mkey cache and assorted bug fixes
2016-11-14 17:22:13 -05:00
Alex Mikheev
864904e8ab
oshmem: ucx: check status only if configured --with-oshmem-param-check
Current standard says that behaviour in the case of error is undefined

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-10 11:29:03 +02:00
Alex Mikheev
48a7a0bbb9
oshmem: lock: call opal_progress only when busy waiting
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-10 11:27:24 +02:00
Alex Mikheev
bf61961f8b
oshmem: code review fixes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-08 15:11:59 +02:00
Alex Mikheev
f133d9b6c8
oshmem: fixes comiplation errors in sshmem
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-08 15:11:07 +02:00
Gilles Gouaillardet
11dc86f26b cleanup: always #include <pthread.h>
pthreads are now mandatory, so there is no more need to

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-08 13:07:45 +09:00
Alex Mikheev
18301429a0 oshmem: removes fortran shmem_put() because v1.3 does not have it
picked from the 64a8f64afd351e37e7d80a080e495c082b56517c

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-07 13:47:57 +02:00
Alex Mikheev
ff5095e533 OSHMEM: adds support for mkey caching by spml
It improves cpu cache hit ratio.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:43 +02:00
Alex Mikheev
5c2f807ef8 OSHMEM: fixes verbosity log level cal
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:42 +02:00
Alex Mikheev
7caa736533 OSHMEM: fixes potential deadlock in shmem_lock()
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:42 +02:00
Alex Mikheev
defcc3ddc1 OSHMEM: spml ikrit: get/put request cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:41 +02:00
Alex Mikheev
61bd59a369 OSHMEM: fixes addr_acessible()
check every possible transport

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:41 +02:00
Alex Mikheev
23c3dc8345 OSHMEM: mxm: optimize mxm_peer layout.
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:40 +02:00
Alex Mikheev
df74d549dc OSHMEM: spml ikrit: changes mxm_peers layout
use single array instead of array of pointers

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:39 +02:00
Alex Mikheev
b5c7c7de78 OSHMEM: memheap: disable oob if allgather mkey exchange is used
In this case there is no point to add another progress callback

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:39 +02:00
Alex Mikheev
0826e63363 OSHMEM: spml_ikrit: makes quiet wait for get_nbi requests
shmem_quit() shall complete all outstanding get_nbi() requests

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:38 +02:00
Alex Mikheev
2f91ce7281 OSHMEM: mxm versions less than 2.0 are no longer supported
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:38 +02:00
Pavel Shamis (Pasha)
92b0ebd7c3 For UCX it is legal to return UCS_INPROGRESS (1) code for non-blocking function
calls, which means that the operation was successfully started but not
immediately completed. This is a "good" return code that should not be handled
as an error.

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2016-11-03 15:36:13 -05:00
Boris Karasev
68b5acd9f4 oshmem/spml/yoda: fixed the btl operations
Fixed the shmem OOM error which is referenced on #2028

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2016-11-02 13:38:35 +02:00
Alex Mikheev
511dd43736
oshmem: fixes typo in the error message
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-10-27 09:27:45 +03:00
Alex Mikheev
f630b43285
OSHMEM: fixes crash during initialization
Do not call mpi comm_dup() if mpi failed to initialize. Also do not set
signal handlers.
Small code styling fixes.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-10-26 11:15:06 +03:00
Alex Mikheev
6c798fe08d OSHMEM: updates copyrights in fortran fetch/set
(cherry picked from commit f5297ccdb277208a96aaffd72a6454afe712fdb4)
2016-10-20 15:09:27 +03:00
Yossi Itigin
05ca466c6b ucx: adapt pml_ucx and spml_ucx to new UCX APIs
- pass field_mask to ucp_init().
- use non-blocking disconnect.
- recv() with pre-allocated request.
- call opal_progress() from iprobe() and improbe().
- use shift pattern in connect/disconnect.
2016-10-12 23:45:45 +03:00
Joshua Ladd
fe2b8b7e06 OSHMEM Specification version: Bump to v1.3. 2016-10-06 22:12:07 +03:00
Joshua Hursey
fc3cf994db build: Custom libmpi_FOO name fix for wrapper compilers
* In open-mpi/ompi@f6f24a4f67 I missed
   updating the library references for the wrapper compilers.
 * Fixes the CXX wrapper compiler and CXX library is renamed as needed.
 * Fixes the Java wrapper compiler and the Java library is renamed as needed.
2016-09-30 16:40:56 -05:00
Joshua Hursey
f6f24a4f67 build: Custom libmpi(_FOO) name option in configure
* Add a configure time option to rename libmpi(_FOO).*
   - `--with-libmpi-name=STRING`
 * This commit only impacts the installed libraries.
   Internal, temporary libraries have not been renamed to limit the
   scope of the patch to only what is needed.

For example:
```shell
shell$ ./configure --with-libmpi-name=wookie
...
shell$ find . -name "libmpi*"
shell$ find . -name "libwookie*"
./lib/libwookie.so.0.0.0
./lib/libwookie.so.0
./lib/libwookie.so
./lib/libwookie.la
./lib/libwookie_mpifh.so.0.0.0
./lib/libwookie_mpifh.so.0
./lib/libwookie_mpifh.so
./lib/libwookie_mpifh.la
./lib/libwookie_usempi.so.0.0.0
./lib/libwookie_usempi.so.0
./lib/libwookie_usempi.so
./lib/libwookie_usempi.la
shell$
```
2016-09-29 21:47:24 -05:00
Alex Mikheev
dd2405a625 OSHMEM: fixes typo in c11 generic 2016-09-26 11:43:38 +03:00
Alex Mikheev
71712df8d1 OSHMEM: fixes arg mismatch in c11 macros 2016-09-26 09:59:23 +03:00
Alex Mikheev
caa1d17672 OSHMEM: fixes compiler warnings 2016-09-25 18:16:45 +03:00
Alex Mikheev
9a21392ec2 OSHMEM: v1.3: add C11 generics
add missing put*/get* functions. Move *put|get16 functions from shmemx.h to
shmem.h as required by 1.3 spec.
2016-09-25 16:43:00 +03:00
Alex Mikheev
3a034352fe OSHMEM: v1.3: adds shmem_fetch and shmem_set AMOs
The commit adds atomic set and fetch functions as described in
oshmem 1.3 spec.
2016-09-25 12:03:42 +03:00
Gilles Gouaillardet
92dd719df1 oshmem: move finalization from the liboshmem destructor into oshmem_onexit()
so we can use the legacy start_pes even when Open MPI is compiled with
--enable-static or --disable-visibility
2016-09-21 09:21:26 +09:00
Joshua Ladd
d5e65c4860 Merge pull request #2052 from alex-mikheev/topic/spml_ikrit_zcopy_fix
OSHMEM: spml ikrit: fixes zero copy
2016-09-12 12:35:32 -04:00
Alex Mikheev
439456ae96 OSHMEM: spml ikrit: fixes zero copy
Allow mxm to use zero copy in put() and get() for the large messages.
2016-09-04 12:16:09 +03:00
Gilles Gouaillardet
184d53a018 oshmem: swap fields of oshmem_proc_data_t to prevent padding
previously, the definition was

struct oshmem_proc_data_t {
    int num_transports;
    char * transport_ids;
};

so in 64 bits arch, the compiler would very likely insert a 4 bytes
padding before the two fields in order to have transport_ids aligned
2016-09-01 14:20:14 +09:00
Gilles Gouaillardet
0a25420dac oshmem: get rid of oshmem_proc_t and use ompi_proc_t instead
store oshmem related per proc data in an oshmem_proc_data_t struct,
that is stored in the padding section of an ompi_proc_t

this data can be accessed via the OSHMEM_PROC_DATA(proc) macro

Fixes open-mpi/ompi#2023
2016-09-01 14:20:14 +09:00
Gilles Gouaillardet
6b7bc64101 spml/yoda: MCA_PML(add_procs) all procs from oshmem_comm_world
and fix oshmem_group_proc_{init,create} so they use the number of procs in oshmem_comm_world

Thanks Debendra Das for the report and Josh Ladd for the guidance

Fixes open-mpi/ompi#1966
2016-08-17 14:24:02 +09:00
Gilles Gouaillardet
273e56096b configury: capture configury command line
configury command line is quoted and made available via the OPAL_CONFIGURE_CLI macro.
it can be retrieved via {orte-info,ompi_info,oshmem_info} -c, or
{orte-info,ompi_info,oshmem_info} --all --parseable | grep ^config:cli:
2016-07-29 09:14:09 +09:00
Boris Karasev
49b67094e0 oshmem/fortran: fix warning mesages && fix size 2016-07-22 15:54:01 +06:00
Gilles Gouaillardet
2a98f9fcc3 oshmem: replace header files in include/mpp with symlinks
This is a work around to avoit what looks like a CMake bug

Thanks Paul Kapinos for the report

Fixes open-mpi/ompi#1868
2016-07-14 14:32:25 +09:00
Rainer Keller
3ec1b868d1 Fix missing include and missing MCA_SPML_CALL. 2016-07-13 11:23:47 +02:00
Pavel Shamis (Pasha)
1bb778857f OSHMEM: Removing erroneous initialization check
Since the introduction of the on-demand proc allocation
the check become erroneous and irrelevant.
Moreover, it completely breaks OpenSHMEM support in OMPI.

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2016-06-24 16:57:10 -05:00
Igor Ivanov
a8ab5b55b9 oshmem: Fix double lock issue
Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
2016-06-10 15:52:31 +03:00
Gilles Gouaillardet
544a2f1631 configury: fix mpifort and oshmemfort wrapper data
NAG compiler use gcc (and not ld) as a linker, so in order to pass an option to the linker,
the flag is -Wl,-Wl,,<option> and not -Wl,<option>

Thanks Paul Hargrove for the report
2016-06-06 11:54:12 +09:00
Nathan Hjelm
dbfab94ede atomic/mxm: rename symbol that is a duplicate of one in atomic/ucx
This fixes an error when building with --enable-static.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:34:40 -06:00
Jeff Squyres
265e5b9795 Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
Karol Mroz
941f2c1e0b oshmem: fixup hostname max length usage
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-25 07:08:23 +02:00
Nathan Hjelm
ae0ffbb67f Merge pull request #1397 from hjelmn/enable_thread_multiple
ompi: always enable MPI_THREAD_MULTIPLE support
2016-04-23 08:40:22 -06:00
Igor Ivanov
75050b44a2 oshmem: Align OSHMEM API with spec v1.3 (extension api changes)
openshmem.org specification does not mention about extension api
but there is an agreemnet to do these changes for related ex api too.
see
Annex G:
Version 1.3
Added const to every read-only pointer argument
2016-04-18 19:38:16 +03:00
Igor Ivanov
157f81b699 oshmem: Align OSHMEM API with spec v1.3 (Added const to every read-only pointer argument)
Annex G:
Version 1.3
Added const to every read-only pointer argument
2016-04-18 19:25:31 +03:00
Igor Ivanov
c02d0b7161 oshmem: Align OSHMEM API with spec v1.3 (shmem_lock change signature)
Annex G:
Version 1.3
Added volatile to remotely accessible pointer argument in
SHMEM_LOCK
See Sections 8.9.1
2016-04-18 19:25:18 +03:00
Igor Ivanov
a52b0797fc oshmem: Align OSHMEM API with spec v1.3 (shmem_wait change signature)
Annex G:
Version 1.3
Added volatile to remotely accessible pointer argument in SHMEM_WAIT
See Sections 8.7.1
2016-04-18 19:24:55 +03:00
Karol Mroz
a468c3ba1a opal_info_support: pass component map when handling params
Pass component_map to opal_info_do_params(). It will be needed to output
component versions.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-02 21:17:44 +02:00
Jeff Squyres
2c5b39718d oshmem: fix scoll_null_alltoall() prototype
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-03-26 03:50:57 -07:00
Gilles Gouaillardet
cb84f582b2 oshmem: add missing prototypes for pshmem_alltoall[s]{32,64} 2016-03-24 15:22:29 +09:00
Mike Dubman
1d8fbfefb0 Merge pull request #1478 from igor-ivanov/pr/oshmem-v1.3-alltoall
oshmem: Add alltoall
2016-03-22 07:51:36 +02:00
Mike Dubman
7483a66ef6 Merge pull request #1455 from igor-ivanov/pr/oshmem-v1.3
oshmem: Add Non-blocking Remote Memory Access Routines
2016-03-22 07:50:11 +02:00