openmpi

Author	SHA1	Message	Date
Nathan Hjelm	4345308dfd	osc/rdma: fix CAS 32-bit network atomic compatibility check When checking for btl compatibility with 32-bit CAS osc/rdma was checking the incorrect flag field. Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>	2019-05-10 07:27:53 -06:00
KAWASHIMA Takahiro	dabad084b5	Merge pull request #6621 from bosilca/topic/persistent_req_leak Fix the leak of fragments for persistent sends (issue #6565)	2019-05-03 15:21:42 +09:00
George Bosilca	a16cf0e4dd	Fix the leak of fragments for persistent sends. The rdma_frag attached to the send request was not correctly released upon request completion, leaking until MPI_Finalize. A quick solution would have been to add RDMA_FRAG_RETURN at different locations on the send request completion, but it would have unnecessarily made the sendreq completion path more complex. Instead, I added the length to the RDMA fragment so that it can be completed during the remote ack. Be more explicit on the comment. The rdma_frag can only be freed once when the peer forced a protocol change (from RDMA GET to send/recv). Otherwise the fragment will be returned once all data pertaining to it has been transferred. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-05-02 09:40:11 -04:00
Yossi Itigin	5d2200a7d6	Merge pull request #6605 from brminich/topic/shmem_all2all_put SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h	2019-05-01 12:00:21 +03:00
bosilca	399b7133ab	Merge pull request #6556 from EmmanuelBRELLE/PR_fix_local_handle_in_PUT_message pml/ob1: fixed local handle sent during PUT control message	2019-04-27 13:51:22 -04:00
Mikhail Brinskii	2ef5bd8b36	SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h The new routine transfers the data asynchronously from the source PE to all PEs in the OpenSHMEM job. The routine returns immediately. The source and target buffers are reusable only after the completion of the routine. After the data is transferred to the target buffers, the counter object is updated atomically. The counter object can be read either using atomic operations such as shmem_atomic_fetch or can use point-to-point synchronization routines such as shmem_wait_until and shmem_test. Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>	2019-04-26 14:47:58 +03:00
Mark Allen	d85cac8f1a	fixing an unsafe usage of integer disps[] (romio321 gpfs) There are a couple MPI_Alltoallv calls in ad_gpfs_aggrs.c where the send/recv data comes from places like req[r].lens, and the send buffer and send displacements for example were being calculated as sbuf = pick one of the reqs: req[bottom].lens sdisps[r] = req[r].lens - req[bottom].lens which might be okay if the .lens was data inside of req[] so they'd all be close to each other. But each .lens field is just a pointer that's malloced, so those addresses can be all over the place, so the integer-sized sdisps[] isn't safe. I changed it to have a new extra array sbuf and rbuf for those two Alltoallv calls, and copied the data into the sbuf from the same locations it used to be setting up the sdisps[] at, and after the Alltoallv I copy the data out of the new rbuf into the same locations it used to be setting up the rdisps[] at. For what it's worth I was able to get this to fail -np 2 on a GPFS filesystem with hints romio_cb_write enable. I didn't whittle the test down to something small, but it was failing in an MPI_File_write_all call. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2019-04-23 16:01:55 -04:00
Jeff Squyres	9a9d106296	Merge pull request #6555 from EmmanuelBRELLE/PR-pmlob1_fix_rc_for_putfrag_when_get_failed pml/ob1: fixed exit from get_frag_fail when falling back on btl_put	2019-04-22 17:19:12 -04:00
Edgar Gabriel	d43427fc76	common/ompio: refactor the build_io_array function abstract out the io_array structure to be used in common_ompio_build_io_array function. This is preparation for a future component that would like to use the same function, but not modify the io_array stored on the file handle itself. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-04-17 14:42:33 -05:00
Valentin Petrov	30970bdfdf	OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP Additional bugfix: origin_addr -> result_addr for no_op, replace_op and sum_op fetch destination. Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2019-04-17 10:30:21 +03:00
Brelle Emmanuel	e630046a4b	pml/ob1: fixed local handle sent during PUT control message When a btl_put is used in ob1, the handle of the locally registered memory is sent with a PUT control message. In the current master code the handle sent is necessarily the one in the frag, but if the handle has been successfully registered in the request, the frag structure does not have a valid handle and all fragments use the one from the request. I suggest checking whether the handle in the fragment is valid and, if not, sending the handle from the request. Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>	2019-04-01 18:45:05 +02:00
Brelle Emmanuel	9c689f2225	pml/ob1: fixed exit from get_frag_fail when falling back on btl_put When btl_get fails, ob1 tries to fall back on btl_put first, but the return code was ignored, so the code fell back on both btl_put and btl_send. Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>	2019-04-01 18:17:10 +02:00
George Bosilca	6ea0c4eab9	Prevent a segfault when accessing a rank outside a communicator. This is not fixing any issue, it is simply preventing a segfault if the communicator creation has not happened as expected. Thus, this code path should never really be hit in a correct MPI application with a valid communicator creation support. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-03-28 12:03:29 -04:00
Artem Polyakov	bfff5783f9	Merge pull request #6371 from artpol84/osc/select_dbg osc/base: Add debug output stating a selected component	2019-03-22 22:24:04 -07:00
Sergey Oblomov	d8e3562bae	PML/SPML/UCX: added evaluation of mmap events - a set of UCX-related issues was reported that were caused by mmap API hook conflicts. We added diagnostics for such problems to simplify the bug-resolving pipeline Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2019-03-12 21:14:27 +02:00
Nathan Hjelm	73085e9ce3	Merge pull request #6413 from nuriallv/issue_osc_rdma osc/rdma: fix when determining the node with the rank_array info for a peer	2019-02-27 16:30:06 -07:00
bosilca	8400502d8a	Merge pull request #6353 from bosilca/topic/fix_monitoring_pvar Fix the PVAR allocation usage.	2019-02-25 16:03:56 -05:00
Nuria Losada	3cae149262	osc/rdma: fix when determining the node with the rank_array info for a peer Signed-off-by: Nuria Losada <nlosada@icl.utk.edu>	2019-02-20 13:12:00 -05:00
Artem Polyakov	13a8e42108	Merge pull request #6163 from artpol84/osc/mt_submission Refactoring of osc/ucx component for MT	2019-02-20 09:41:27 -08:00
Gilles Gouaillardet	ad114be28c	configury: automatically select rte/pmix runtime if ORTE project is not built Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 13:55:55 +09:00
Gilles Gouaillardet	69d136ae5e	ompi/pmix: fix misc OPAL function calls Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 13:55:55 +09:00
Gilles Gouaillardet	fe05fcc11a	osc/rdma: correctly handle communications to self mark the "self" peer OMPI_OSC_RDMA_PEER_LOCAL_BASE when the window is dynamically created and use_cpu_atomics is set in order to correctly handle communications to self. Thanks Bart Janssens for reporting this issue. Refs. open-mpi/ompi#6394 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 09:52:17 +09:00
Artem Polyakov	19e2ae2efb	opal/common/ucx: Switch to opal/tsd Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Artem Polyakov	7984d7d997	opal/common/ucx: Remove unused debugging macro Will be reintroduced later if needed and after adaptation to the OMPI infrastructure. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Artem Polyakov	43f16d8796	opal/common/ucx: Remove common_ucx_int.h Place the content of common_ucx_int.h back to the common_ucx.h and include common_ucx_wpool.h explicitly. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Xin Zhao	c6de09940f	ompi/osc/ucx: Switch osc/ucx code to use Worker Pool. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Yossi Itigin	91d05f91e2	Merge pull request #6384 from brminich/topic/ucx_worker_net_address PML/UCX: Use net worker address for remote peers	2019-02-17 12:21:00 +02:00
Matias A Cabral	25bdd118ac	MTL_OFI: Changed Recv cancel to be non-blocking Updated the OFI MTL's Recv cancel to be a non-blocking call to match the MPI spec. If fi_cancel succeeds, the user is expected to wait on the request to learn whether the cancel has completed. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2019-02-14 17:07:20 -05:00
Mikhail Brinskii	751d88192d	PML/UCX: Use net worker address for remote peers For remote node peers, pack a smaller worker address, which contains network device addresses only. This would reduce the amount of OOB traffic during startup. Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>	2019-02-14 18:06:36 +02:00
Brian Barrett	7a593cea4a	Merge pull request #6361 from aravindksg/fix_tg_segfault mtl/ofi: Fix segfault when not using Thread-Grouping feature	2019-02-12 12:04:26 -08:00
KAWASHIMA Takahiro	8bbd201029	Merge pull request #6205 from kawashima-fj/pr/fp16 Add FP16 datatypes	2019-02-08 14:52:13 +09:00
Artem Polyakov	35090b69f1	osc/base: Add debug output stating a selected component Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-07 15:54:20 -08:00
Aravind Gopalakrishnan	6edcc479c4	mtl/ofi: Fix segfault when not using Thread-Grouping feature For the non thread-grouping paths, only the first (0th) OFI context should be used for communication. Otherwise this would access a nonexistent array item and cause a segfault. While at it, clarify some content regarding SEPs in README (Credit to Matias Cabral for README edits). Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-02-07 11:52:53 -08:00
Jeff Squyres	f5e1a672cc	ofi: revamp OPAL_CHECK_OFI configury Update the OPAL_CHECK_OFI configury macro: - Make it safe to call the macro multiple times: - The checks only execute the first time it is invoked - Subsequent invocations, it just emits a friendly "checking..." message so that configure output is sensible/logical - With the goal of ultimately removing opal/mca/common/ofi, rename the output variables from OPAL_CHECK_OFI to be opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}. - Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions. - Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that causes the macro to be invoked at a fairly random time, which makes configure stdout confusing / hard to grok. - Remove a little left-over kruft in OPAL_CHECK_OFI, too (which resulted in an indenting change, making the change to opal_check_ofi.m4 look larger than it really is). Thanks Alastair McKinstry for the report and initial fix. Thanks Rashika Kheria for the reminder. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Jeff Squyres	aba2571881	mtl/ofi/Makefile.am: down with tabs! Replace all tabs with spaces. No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Gilles Gouaillardet	945f830f7a	mtl/ofi: fix configury when VPATH is used Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-07 06:29:58 -08:00
Matias Cabral	0601b3e982	Merge pull request #6325 from aravindksg/fix_help_reference mtl/ofi: Fix reference to help text object	2019-02-05 07:22:51 -08:00
George Bosilca	e42b573cd3	Fix the PVAR allocation usage. According to the MPI standard the obj_handle is a pointer to an MPI object, and therefore cannot be MPI_COMM_WORLD. The MPI standard example 14.6 highlights this usage. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-02-02 19:03:43 -05:00
KAWASHIMA Takahiro	4d7bde27fb	ompi/datatype: Use `short float` for `MPI_REAL2` ... and add `MPI_COMPLEX4`. This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OMPI internal code. On the other hand, `ompi_datatype_t::id` values of existing datatypes are not changed and 73 is newly assigned to `MPI_COMPLEX4` to retain ABI compatibility. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro	4375c11a58	ompi/datatype: Add `ompi_mpi_short_float` ... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`. These are Open MPI internal variables intended to be defined as `MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and `MPI_CXX_SHORT_FLOAT_COMPLEX` in the future. `OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to support `MPI_COMPLEX4` in the next commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:43:13 +09:00
Sergey Lebedev	829846dbcc	fp16 hcoll bindings Signed-off-by: Sergey Lebedev <sergeyle@mellanox.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	f6b39452f6	opal/datatype: Support `short float` The type `short float` is proposed for the C language in ISO/IEC JTC 1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a. half-precision floating point or FP16. By this commit, `short float` and `short float _Complex` are detected in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT` and its complex number version are not added yet. This commit changes values of existing `OPAL_DATATYPE_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OPAL and OMPI internal code. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
Thananon Patinyasakdikul	782ec851ea	Merge pull request #6319 from thananon/pr/allow_overtake pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.	2019-01-30 15:32:04 -05:00
Jeff Squyres	2203f8d900	Merge pull request #6185 from ggouaillardet/topic/hwloc_macros hwloc: remove public hwloc macros from opal_config.h	2019-01-30 07:32:22 -05:00
Gilles Gouaillardet	0aeb27f776	topo/treematch: silence a hwloc related warning treematch/km_partitioning.c #include "config.h", but there is no such file when the embedded treematch is used. In order to prevent the embedded treematch from incorrectly using the config.h from the embedded hwloc, generate a dummy config.h. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-30 14:51:38 +09:00
Aravind Gopalakrishnan	9cabcfdbba	mtl/ofi: Fix reference to help text object When we exceed the threshold number of contexts created, print appropriate help text Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-01-29 15:10:06 -08:00
Thananon Patinyasakdikul	0263456cf4	pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE. We missed an assert to check if ALLOW_OVERTAKE is set or not before validating the sequence number and this will cause deadlock. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2019-01-29 14:55:06 -05:00
Brian Barrett	23da9fac23	Merge pull request #6294 from bwbarrett/mtl-ofi-no-device-warning mtl/ofi: Print descriptive error message on modex failure	2019-01-29 08:32:49 -08:00
Brian Barrett	1bb7a73a9c	Merge pull request #6302 from bwbarrett/feature/ofi-av-count mtl/ofi: Provide av count hint during initialization	2019-01-29 08:32:24 -08:00
Brian Barrett	44be7f139a	mtl/ofi: Provide av count hint during initialization Provide the av_attr.count hint (number of addresses that will be inserted into the address vector through the life of the process) at initialization of the address vector. It's ok to be a bit wrong, but some endpoints (RxR) can benefit by not going through the slow growth realloc churn. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2019-01-24 15:47:24 -08:00
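
The displacement problem described in commit d85cac8f1a (integer `sdisps[]` computed from the difference of two separately malloc'ed pointers) can be illustrated with a minimal sketch. This is not the code from the commit: `fits_in_int` and `pack_blocks` are hypothetical helpers, and the sketch only assumes what the message states, namely that the data is copied into one contiguous staging buffer so that the displacements become small offsets into that buffer rather than differences between unrelated heap addresses.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: does an element offset fit in the `int`
 * displacement type that MPI_Alltoallv expects? The difference between
 * two unrelated malloc'ed pointers is not guaranteed to. */
static int fits_in_int(ptrdiff_t offset)
{
    return offset >= INT_MIN && offset <= INT_MAX;
}

/* Hypothetical helper: copy `n` separately allocated blocks into one
 * contiguous staging buffer and record each block's displacement
 * relative to the start of that buffer.
 * counts[i] is the number of ints in blocks[i]; `staging` must have room
 * for the sum of all counts. Returns the total number of ints packed,
 * or -1 if the total would not fit in an int. */
static int pack_blocks(int *const *blocks, const int *counts, int n,
                       int *staging, int *displs)
{
    long long total = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (total + counts[i] > INT_MAX) {
            return -1;
        }
        displs[i] = (int)total;  /* offset into staging, always small */
        memcpy(staging + total, blocks[i],
               (size_t)counts[i] * sizeof(int));
        total += counts[i];
    }
    return (int)total;
}
```

After packing, `staging` and `displs` could be passed as the send buffer and send displacements of the collective; the receive side would use a second staging buffer and copy the data back out afterwards, as the commit message describes.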

6918 commits