
9468 Commits

Author SHA1 Message Date
Alina Sklarevich
d93b67257b PML UCX: handle a synchronous send.
MCA_PML_BASE_SEND_SYNCHRONOUS

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 18:11:55 +03:00
Alina Sklarevich
eec310c99c PML/UCX/YALLA: Fix the message release call.
Set message to MPI_MESSAGE_NULL.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 14:41:13 +03:00
Gilles Gouaillardet
6886c1229a Merge pull request #3327 from jeffhammond/fix-issue-3326
check for negative ranks in ompi_win_peer_invalid
2017-04-13 10:53:32 +09:00
Ralph Castain
dadc924cde Cleanup warnings when timing is not enabled
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-11 17:29:27 -07:00
Jeff Hammond
b3a20100d3 check for negative ranks in ompi_win_peer_invalid
resolves #3326 (https://github.com/open-mpi/ompi/issues/3326)

Signed-off-by: jeff.r.hammond@intel.com
2017-04-11 14:26:16 -07:00
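The rank check described in b3a20100d3 can be sketched as below. This is a minimal stand-alone illustration, not the Open MPI source: `peer_rank_invalid` and its `group_size` parameter are hypothetical names, and the actual signature of `ompi_win_peer_invalid` is not shown in this log.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when `peer` cannot be a valid rank
 * in a group of `group_size` processes. */
static bool peer_rank_invalid(int peer, int group_size)
{
    /* An upper-bound test alone would let negative ranks through,
     * so both bounds are checked. */
    return peer < 0 || peer >= group_size;
}
```

With a group of four processes, ranks 0 through 3 pass the check, while -1 and 4 are rejected.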
Nathan Hjelm
bea7d9e4f7 Merge pull request #3320 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix infinite frag allocation loop
2017-04-11 09:09:30 -06:00
Artem Polyakov
4477b87e1d Merge pull request #3303 from karasevb/timing2/master
OMPI timings
2017-04-11 07:52:40 -07:00
Boris Karasev
d132eab4a5 ompi/timings: fixed the error of opal timings env import
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-04-11 12:08:48 +06:00
Nathan Hjelm
12b52b2b2c osc/pt2pt: fix infinite frag allocation loop
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-10 16:30:47 -06:00
KAWASHIMA Takahiro
b4599d7bb7 datatype: Fix darray MPI_ACCUMULATE bug
Array sizes of `array_of_gsizes`, `array_of_distribs`, `array_of_dargs`,
and `array_of_psizes` parameters of the `ompi_datatype_create_darray`
function (and `MPI_TYPE_CREATE_DARRAY`) are all `ndims`.
`ndims` is `i[2]`, not `i[0]`. See MPI-3.1 p.122.

Because this function `__ompi_datatype_create_from_args` is used by
pt2pt OSC, using a datatype created by `MPI_TYPE_CREATE_DARRAY` for
`MPI_(R)(GET_)ACCUMULATE` caused a segmentation fault or other failure
on a target process.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-10 17:31:59 +09:00
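The integer layout that b4599d7bb7 refers to can be sketched as follows. The position of `ndims` at `i[2]` comes from the commit message; the rest of the layout follows the description of the darray combiner in the MPI standard as understood here. `darray_args_view` and `darray_args_unpack` are hypothetical names for illustration and are not part of Open MPI.

```c
/* Hypothetical view over the integer array that describes a darray type:
 *   i[0] = size, i[1] = rank, i[2] = ndims,
 *   then four arrays of ndims entries each, then order. */
typedef struct {
    int size;
    int rank;
    int ndims;
    const int *gsizes;    /* array_of_gsizes,   ndims entries */
    const int *distribs;  /* array_of_distribs, ndims entries */
    const int *dargs;     /* array_of_dargs,    ndims entries */
    const int *psizes;    /* array_of_psizes,   ndims entries */
    int order;
} darray_args_view;

static darray_args_view darray_args_unpack(const int *i)
{
    darray_args_view v;
    v.size     = i[0];
    v.rank     = i[1];
    v.ndims    = i[2];               /* not i[0], which holds size */
    v.gsizes   = &i[3];
    v.distribs = v.gsizes   + v.ndims;
    v.dargs    = v.distribs + v.ndims;
    v.psizes   = v.dargs    + v.ndims;
    v.order    = i[4 * v.ndims + 3]; /* last entry, after the four arrays */
    return v;
}
```

Sizing the four arrays from `i[0]` instead of `i[2]` would misplace every pointer after `gsizes`, which is consistent with the crash the commit describes.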
Ralph Castain
95ae0d1df3 Cleanup timing macros for portability across compilers. Rename the --enable-timing configure option to be --enable-pmix-timing so it doesn't pickup external timing requests. Remove a stale function reference in PMIx so it can compile with timing enabled.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-10 12:56:38 +06:00
Howard Pritchard
f5942ff23c Merge pull request #3304 from hppritcha/topic/de-ortization-of-ompi
de-ORTEfy the ompi tree
2017-04-07 14:14:41 -06:00
Noah Evans
ef29fb13cb de-ORTEfy the ompi tree
The ompi tree should be runtime independent, but over time a few
ORTE dependent definitions and functions have escaped into the ompi
tree. I'm working on my own runtime so I've used this as an opportunity
to get rid of ORTE dependencies in the ompi/ tree. I still need to go
back and change orte to conform to the new world and these changes are
untested, but I can now compile (but not link) without orte so I'm
committing this changeset.

Signed-off-by: Noah Evans <noah.evans@gmail.com>
2017-04-07 12:35:58 -06:00
Boris Karasev
36a0e71f2d ompi/timings: preparing to production state
Adds:
- enabling/disabling of timings through the environment variable `OMPI_TIMING_ENABLE`
- output format: [file name]:[function name]:[description]: avg/min/max
- dynamically extending the array of results for the case when the initial size is exhausted
- catch and collect errors
- cleanup

Note:
To use this feature, configure with `--enable-timings`
and set the environment variable `OMPI_TIMING_ENABLE=1`

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-04-07 21:16:57 +06:00
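A run-time switch of the kind described in 36a0e71f2d might look like the sketch below. Only the variable name `OMPI_TIMING_ENABLE` and the value `1` come from the commit message; the helper names and the rule that any other value means "disabled" are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: treats exactly "1" as enabled; an unset variable
 * or any other value counts as disabled (an assumption of this sketch). */
static bool timing_flag_is_set(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Hypothetical wrapper that reads the switch from the environment. */
static bool timing_enabled(void)
{
    return timing_flag_is_set(getenv("OMPI_TIMING_ENABLE"));
}
```

Separating the string test from the `getenv` call keeps the parsing rule easy to check without touching the process environment.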
Artem Polyakov
e3acf2a339 ompi/timings: add OMPI-level timing framework.
This is an extension of the OPAL timing framework that allows using
MPI_Reduce to provide a compact representation of the collected
timings throughout the whole application.

NOTE: the functionality is disabled now, it will be enabled after
the runtime verification.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:22 +06:00
Artem Polyakov
1063c0d567 opal/timing: remove timings from MPI_Init and MPI_Finalize
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:21 +06:00
Nathan Hjelm
1322e5dee8 Merge pull request #3274 from hjelmn/osc_rdma_fix
osc/rdma: fix typo in atomic code
2017-04-04 00:20:42 -06:00
Gilles Gouaillardet
5dfd4ab6ca coll/tuned: remove set-but-not-used variables
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-04 13:18:11 +09:00
Nathan Hjelm
fad0803920 osc/rdma: fix typo in atomic code
Fixes #3267

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-03 15:54:28 -06:00
Xin Zhao
ee952fcccd Passing estimated_num_procs to UCX init in PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Nathan Hjelm
c72fb30eb5 osc/pt2pt: fix typo
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2017-03-23 09:00:21 -06:00
Xin Zhao
6a99c60fbd Add multithreading support in PML UCX framework.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-20 19:55:00 +02:00
Jeff Squyres
ce0e1cd32c Merge pull request #3201 from hppritcha/jjhursey-topic/timer-gettimeofday
Jjhursey topic/timer gettimeofday
2017-03-18 20:12:36 -04:00
Howard Pritchard
b9331527f5 timer: hack use of clock_gettime
better solution needed later
workaround for #3003

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-18 15:08:59 -05:00
Ralph Castain
45b46dc446 Merge pull request #3181 from artpol84/add_proc_fix_2/master
ompi: Avoid unnecessary PMIx lookups when adding procs (step 2).
2017-03-16 15:06:08 -07:00
Jeff Squyres
760db0d5ce osc/pt2pt: fix compiler warning
Remove unused variable.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-16 05:46:11 -07:00
Jeff Squyres
1947280865 topo/treematch: squash some compiler warnings
Only define MIN/MAX if they are not already defined.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-16 05:44:26 -07:00
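The guard described in 1947280865 is the usual preprocessor pattern, sketched here. The macro bodies are a common formulation and are assumed; they are not copied from topo/treematch.

```c
/* Define MIN/MAX only when no earlier header has already done so,
 * which avoids macro-redefinition warnings. */
#ifndef MIN
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#endif

#ifndef MAX
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
#endif
```

As with any function-like macro of this form, the arguments may be evaluated twice, so expressions with side effects should not be passed in.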
Joshua Hursey
48d13aa8ef mpi/c: Force wtick/wtime to use gettimeofday
* See https://github.com/open-mpi/ompi/issues/3003 for a discussion about
   this patch. Once we get a better version in place we can revert this
   change.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-03-15 21:24:37 -05:00
Artem Polyakov
1f7a3a2d54 ompi: Avoid unnecessary PMIx lookups when adding procs (step 2).
Follow-up for 717f3fef62b193845e9add5aaaae3543c2f2ebfb.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-03-16 07:47:27 +07:00
Nathan Hjelm
37214eda09 Merge pull request #3164 from hjelmn/ob1_pinned
pml/ob1: do not cache leave_pinned
2017-03-14 13:22:18 -06:00
Nathan Hjelm
3e7ef48c13 pml/ob1: do not cache leave_pinned
This commit fixes a bug that disabled both the RDMA pipeline and RDMA
protocols in ob1. ob1 was internally caching the values of
opal_leave_pinned and opal_leave_pinned_pipeline at init time. This is
no longer valid as opal_leave_pinned may be set by any call to a btl's
add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 09:00:40 -06:00
Valentin Petrov
fe069c9570 Fixes the coll_allgather usage bug
One should use the correct module object when calling
c_coll.coll_allgather. Otherwise there will be a segfault in the
case, for example, when hcoll is used. In that case
c_coll.coll_allgather = mca_coll_hcoll_allgather while
c_coll.coll_gather_module = tuned.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2017-03-14 09:47:39 +02:00
Jeff Squyres
086748bb70 Merge pull request #3102 from omor1/master
Add missing definition of MPI_T_PVAR_SESSION_NULL (resolve #2652)
2017-03-13 15:27:05 -04:00
Alex Mikheev
c081239f88 ompi: pml ucx: fix persistent request init CR changes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 13:26:29 +02:00
Alex Mikheev
c113c37a7a ompi: pml ucx: fix persistent request initialization
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 10:59:41 +02:00
Nathan Hjelm
0195d15401 osc/pt2pt: flush pending fragments on lock ack
This commit addresses an issue that can occur in cases where a lot of
fragments are outstanding.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-06 13:58:46 -07:00
Edgar Gabriel
607dc2c039 Merge pull request #3103 from edgargabriel/pr/sharedfp-name-collision-fix
sharedfp/lockedfile and sm: fix the name collision
2017-03-05 14:46:20 -06:00
Edgar Gabriel
2d462b3b80 sharedfp/lockedfile and sm: fix name collision
This fixes the issue reported by Nicolas Joly on the mailing list: the sharedfp/lockedfile component does not currently support a scenario where multiple jobs read from the same input file, due to a collision of the filenames used for the sharedfp handle. Although not part of the original report, the same occurs for the sharedfp/sm component. Therefore, add the jobid to the lockedfile and sm file names.

use the OMPI_CAST_RTE_NAME macro to determine jobid

Fixes: #3098

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-05 11:28:28 -06:00
Omri Mor
20ab37a297 Add missing MPI_T_PVAR_SESSION_NULL to mpi.h
MPI_T_pvar_session_free() should reject null sessions and set *session to MPI_T_PVAR_SESSION_NULL

Signed-off-by: Omri Mor <omri50@gmail.com>
2017-03-05 09:03:30 -06:00
Artem Polyakov
9448814c40 ompi/pml/ucx: Fix uninitialized UCX request field.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-03-05 03:06:30 +07:00
Edgar Gabriel
d1fed77781 Merge pull request #3094 from edgargabriel/pr/master-lustre-priority
io/ompio: adjust the priority of the OMPIO component on lustre
2017-03-03 09:29:14 -06:00
KAWASHIMA Takahiro
39294caf04 Merge pull request #3086 from kawashima-fj/pr/coll-base-defs
coll: Update `ompi/mca/coll/base/coll_base_functions.h`
2017-03-03 18:53:00 +09:00
KAWASHIMA Takahiro
7cb42d9aaa Merge pull request #3085 from kawashima-fj/pr/pml-bfo-typo
pml/bfo: Correct a function name and header filenames
2017-03-03 18:48:01 +09:00
Edgar Gabriel
9e19834327 io/ompio: adjust the priority of the OMPIO component on lustre
this commit brings over the behavior from the 2.x series to master, mostly with the fork for the 3.x series in mind.
Also, use strncasecmp instead of two strncmps

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-02 12:10:11 -06:00
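The switch from two `strncmp` calls to one `strncasecmp` call mentioned in 9e19834327 can be illustrated as below. The commit message does not say which strings are compared; this sketch assumes a file system name matched against "lustre" in either capitalization, and `name_is_lustre` is a hypothetical helper.

```c
#include <stdbool.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: one case-insensitive comparison replaces
 * separate checks against "lustre" and "Lustre". */
static bool name_is_lustre(const char *fs_name)
{
    return strncasecmp(fs_name, "lustre", 6) == 0;
}
```

Note that the case-insensitive form also accepts spellings such as "LUSTRE", which two exact `strncmp` calls would have rejected.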
Jeff Squyres
dc53cd5f74 MPI_Wtick: may return a higher resolution than 10e-6 these days
Thanks to Mark Dixon (@ccaamad) for reporting the error.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-02 10:39:28 -05:00
KAWASHIMA Takahiro
c4ca5e703d coll: Update ompi/mca/coll/base/coll_base_functions.h
- Support MPI-2.2 and MPI-3.0 COLL features.

  * `MPI_REDUCE_SCATTER_BLOCK`
  * neighborhood collective communication
  * nonblocking collective communication

- Add `*_BASE_ARGS` and `*_BASE_ARG_NAMES` for convenience.

- Use parameter names used in the MPI Standard.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-03-02 17:58:02 +09:00
KAWASHIMA Takahiro
96aa0d90c1 pml/bfo: Correct a function name and header filenames
These lines were incorrectly modified in 90f2940.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-03-02 16:02:53 +09:00
Alex Mikheev
152f77df59 ompi: pml ucx: fix datatype packing error in bsend
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-01 16:18:19 +02:00
Yossi Itigin
33471c44ee pml_yalla/mtl_mxm/hcoll: open memory component to activate memory hooks.
Memory hooks are now set-up on demand. pml/yalla, mtl/mxm and
coll/hcoll need the memory hooks, so make sure those are installed.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-03-01 12:12:20 +02:00
Gilles Gouaillardet
880f2d5431 mpi/c: revamp error handling in MPI_{Pack,Unpack}[_external]
Thanks Alex and the folks at Mellanox for the help.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-01 10:03:31 +09:00