openmpi

Author	SHA1	Message	Date
Nathan Hjelm	70f8a6e792	osc/pt2pt: fix several bugs This commit fixes some bugs uncovered during thread testing of 2.0.1rc1. With these fixes the component is running cleanly with threads. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-24 14:35:45 -06:00
Ralph Castain	bcf5ac3971	Set the default value of both barrier counters to zero, thus ensuring the coll/sync component is off by default	2016-08-24 07:51:32 -07:00
Ralph Castain	22844b0dc6	Balance priorities to ensure something is below sync	2016-08-23 17:33:45 -07:00
Ralph Castain	540f23c4dd	Adjust priority of coll/sync downwards	2016-08-23 17:12:48 -07:00
Edgar Gabriel	41ed4a28d2	add the protective lock around read and write operations in ompio	2016-08-23 11:07:58 -05:00
Howard Pritchard	696121cc4a	Merge pull request #1988 from hppritcha/topic/another_ofi_fix mtl/ofi: fix a botched assignment of av_type	2016-08-22 17:59:59 -06:00
Ralph Castain	6549c878a9	Silence the warnings	2016-08-22 15:35:27 -07:00
Ralph Castain	871bedb103	Add missing "const" qualifiers	2016-08-22 12:54:24 -07:00
Edgar Gabriel	a76f4d7c69	Merge pull request #1990 from edgargabriel/topic/mt-io steps towards making file I/O operations thread safe	2016-08-22 08:19:33 -05:00
Joshua Ladd	deae1ab375	Merge pull request #1985 from vspetrov/master coll/hcoll: Fixes predefined types mapping	2016-08-22 09:18:59 -04:00
Edgar Gabriel	bc042259bc	make initialization of the io framework thread safe. Also, remove the lock/unlock in the file_open ompi-interface routines of romio314. The global lock in the romio component probably does not work: it is easy to construct a testcase where two threads perform collective I/O operations on different file handles, and with a global lock it is easy to deadlock. The lock has to be at least on a per-file-handle basis. move the mutex to file/file.c to avoid duplicate symbol problem in file_open.c pfile_open.c	2016-08-21 16:09:00 -05:00
George Bosilca	b96ec77e40	This variable belongs to the tuned modules and not to base.	2016-08-20 15:37:55 -04:00
George Bosilca	e8425eb1f5	Rename an OMPI internal variable (ticket #1955).	2016-08-20 15:37:55 -04:00
rhc54	102d3afe2c	Merge pull request #1992 from rhc54/topic/sync Restore the coll/sync module and provide a test to verify its operation	2016-08-20 13:33:28 -05:00
George Bosilca	fd57f5bccd	Remove some of the clang warnings.	2016-08-20 14:21:42 -04:00
Ralph Castain	9888615e75	Restore the coll/sync module and provide a test to verify its operation	2016-08-20 10:14:52 -07:00
Howard Pritchard	61d62b6821	mtl/ofi: fix a botched assignment of av_type Well now the av_type is being assigned correctly Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-19 17:01:02 -05:00
Valentin Petrov	9790373fc6	coll/hcoll: Fixes predefined types mapping	2016-08-19 11:19:12 +03:00
Nathan Hjelm	e5c7512692	Merge pull request #1983 from hjelmn/request_cb ompi/request: change semantics of ompi request callbacks	2016-08-18 08:31:56 -06:00
Nathan Hjelm	6aa658ae33	ompi/request: change semantics of ompi request callbacks This commit changes the semantics of ompi request callbacks. If a request's callback has freed or re-posted (using start) a request the callback must return 1 instead of OMPI_SUCCESS. This indicates to ompi_request_complete that the request should not be modified further. This fixes a race condition in osc/pt2pt that could lead to the req_state being inconsistent if a request is freed between the callback and setting the request as complete. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-17 20:14:01 -06:00
Edgar Gabriel	e14c23ba79	Merge pull request #1980 from edgargabriel/topic/coverty-cleanup io/ompio: Topic/coverty cleanup	2016-08-17 17:27:51 -05:00
Edgar Gabriel	2c8437ce62	fs/pvfs2: fix a common symbol	2016-08-17 13:10:32 -05:00
Edgar Gabriel	eba5293586	fix Coverity warning CID 1369021	2016-08-17 13:02:45 -05:00
Nathan Hjelm	40b70889e5	osc/pt2pt: make receive count an unsigned int This receive_count MCA variable should never be negative. Change it to an unsigned int. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-08-17 08:14:24 -06:00
Gilles Gouaillardet	8faa1edafa	osc/pt2pt: silence misc warnings	2016-08-17 14:24:14 +09:00
LANL OMPI Bot	96c7762050	Merge pull request #1942 from hppritcha/topic/minor_ofi_fix mtl/ofi: use mca param to set av type	2016-08-16 14:14:12 -06:00
Nathan Hjelm	9444df1eb7	osc/pt2pt: make lock_all locking on-demand The original lock_all algorithm in osc/pt2pt sent a lock message to each peer in the communicator even if the peer is never the target of an operation. Since this scales very poorly the implementation has been replaced by one that locks the remote peer on first communication after a call to MPI_Win_lock_all. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-11 15:33:07 -06:00
Nathan Hjelm	7589a25377	osc/pt2pt: do not repost receive from request callback This commit fixes an issue that can occur if a target gets overwhelmed with requests. This can cause osc/pt2pt to go into deep recursion with a stack like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb -> ... . At small scale this is fine as the recursion depth stays small but at larger scale we can quickly exhaust the stack processing frag requests. To fix the issue the request callback now simply puts the request on a list and returns. The osc/pt2pt progress function then handles the processing and reposting of the request. As part of this change osc/pt2pt can now post multiple fragment receive requests per window. This should help prevent a target from being overwhelmed. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-08-11 15:33:07 -06:00
George Bosilca	8d0baf140f	If the RTE fails to deliver the daemon information, gracefully fall back to a non-reordered communicator. Optimize the loops building the process hierarchy.	2016-08-11 13:04:27 -04:00
Howard Pritchard	e46eee3fcb	mtl/ofi: use mca param to set av type Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-10 16:10:17 -06:00
Gilles Gouaillardet	dfbf2b7be4	opal/threads: add OPAL_THREAD_SUB_SIZE_T macro -1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1), simply OPAL_THREAD_SUB_SIZE_T(..., 1) and keep picky compilers happy	2016-08-10 13:37:36 +09:00
Nathan Hjelm	799104f688	Merge pull request #1947 from hjelmn/perf pml/ob1: be more selective when using rdma capable btls	2016-08-09 22:15:09 -06:00
Nathan Hjelm	4079eec974	pml/ob1: be more selective when using rdma capable btls This commit updates the btl selection logic for the RDMA and RDMA pipeline protocols to use a btl iff: 1) the btl is also used for eager messages (high exclusivity), or 2) no other RDMA btl is available on an endpoint and the pml_ob1_use_all_rdma MCA variable is true. This fixes a performance regression with shared memory when an RDMA capable network is available. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-09 20:54:42 -06:00
Nathan Hjelm	2788083b98	Merge pull request #1936 from hjelmn/osc_pt2pt_fix osc/pt2pt: do not set rdma_frag after start	2016-08-08 14:17:40 -06:00
Nathan Hjelm	e4d7ea75a9	Merge pull request #1935 from hjelmn/persistent_fix pml/ob1: reset req_bytes_packed on start	2016-08-08 14:17:13 -06:00
Todd Kordenbrock	3be6052523	Merge pull request #1896 from PDeveze/Patchs-on-coll-portals4 Patches on coll portals4	2016-08-08 14:57:02 -05:00
Edgar Gabriel	fb9fa4fbc4	Merge pull request #1938 from edgargabriel/pr/barrier-on-close io/ompio: Add barrier to file_close and to file_set_size	2016-08-08 09:22:08 -05:00
Edgar Gabriel	4709f4229b	Merge pull request #1929 from edgargabriel/pr/ompio-code-reorg io/ompio: next step in code-reorganization	2016-08-08 09:20:54 -05:00
Thananon Patinyasakdikul	23b27c510c	romio: make romio use internal opal_random instead of rand(3). This fixes issue #1877	2016-08-05 09:04:52 -07:00
Howard Pritchard	ff669e7b15	code cleanup: clang is now a happier panda Clang 5.1 on my mac was a sad panda compiling a couple of files, complaining about uninitialized stack variables. This commit makes clang a happier panda (or at least not so sad). Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-04 19:34:44 -06:00
Edgar Gabriel	9c3180160c	io/ompio: Add barrier to file_close and to file_set_size This fixes a bug reported on the mailing list for ompio. https://www.open-mpi.org/community/lists/users/2016/05/29333.php	2016-08-04 11:20:31 -05:00
Gilles Gouaillardet	60e91e890a	coll/base: give a boost to ompi_coll_base_sendrecv_nonzero_actual() Based on current implementation it is faster to use a blocking send than the non-blocking version. Switch the exchange function used in the barrier to use the blocking version combined with the non-blocking version of the receive. This is similar to open-mpi/ompi@223d75595d	2016-08-04 13:31:07 +09:00
Nathan Hjelm	11c853d05e	osc/pt2pt: do not set rdma_frag after start It is possible for the start call to complete the requests. For this reason the module rdma_frag field should be filled in before start is called. If the request completes the completion callback will reset the rdma_frag field to NULL. Fixes a bug discovered by @tkordenbrock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-03 15:20:36 -06:00
Nathan Hjelm	889dd32806	pml/ob1: reset req_bytes_packed on start On start we were not correctly resetting all request fields. This was leading to a double-completion on persistent receives. This commit updates the base start code to reset the receive req_bytes_packed and the send request convertor. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-03 11:29:30 -06:00
Edgar Gabriel	aa7e852e44	common/ompio: files are only compiled in case MPI I/O is requested fixes: open-mpi/ompi#1932	2016-08-02 15:01:38 -05:00
Edgar Gabriel	19fe5cac50	io/ompio: next step in code-reorganization - move the sort_iovec operations to fcoll/base - move set_view_internal to common/ompio - move set_file_default to common/ompio - remove io_ompio_sort, not used anymore.	2016-08-02 09:18:29 -05:00
Gilles Gouaillardet	917d96ba50	coll/libnbc: cleanup handling of the second temporary buffer in ireduce	2016-08-02 16:32:15 +09:00
Gilles Gouaillardet	ed9139ca13	coll/libnbc: correctly handle datatype alignment when allocating two buffers at once	2016-08-02 15:44:12 +09:00
Edgar Gabriel	c0bd8728fd	io/ompio: move aggregator selection code to a separate file - move all functions related to aggregator selection to a single file - perform code cleanup fixing many Coverity complaints along the way.	2016-08-01 14:04:27 -05:00
Edgar Gabriel	160d9a78c1	Merge pull request #1886 from edgargabriel/pr/ompio-reorg io/ompio: move io/ompio functionality to common/ompio	2016-07-29 12:24:21 -05:00
Joshua Ladd	4a03a657c6	Merge pull request #1913 from vspetrov/hcoll_derived_datatypes coll/hcoll mpi datatypes support	2016-07-29 10:08:23 -04:00
Nathan Hjelm	1da558407c	Merge pull request #1911 from hjelmn/threads opal/thread: clean up and add additional OPAL_THREAD macros	2016-07-29 06:44:11 -06:00
Valentin Petrov	3582bba6b7	coll/hcoll mpi datatypes support	2016-07-29 10:06:39 +03:00
Howard Pritchard	5ff6b81eee	Merge pull request #1871 from hppritcha/topic/ofi_mtl_params mtl/ofi: add some more mca parameters	2016-07-28 18:21:23 -06:00
Nathan Hjelm	aac611237b	opal/thread: clean up and add additional OPAL_THREAD macros This commit expands the OPAL_THREAD macros to include 32- and 64-bit atomic swap. Additionally, macro declarations have been updated to include both OPAL_THREAD_* and OPAL_ATOMIC_*. Before this commit the former was used with add and the latter with cmpset. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-07-28 09:23:14 -06:00
Howard Pritchard	22c8743557	mtl/ofi: add some more mca parameters allow for toggling of both control/data progress models. allow for using FI_AV_TABLE or FI_AV_MAP for av type. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-07-28 02:35:09 -06:00
Gilles Gouaillardet	a0a999e63d	coll/base: fix ompi_coll_base_allgatherv_intra_basic_default() with MPI_IN_PLACE	2016-07-28 13:57:18 +09:00
Gilles Gouaillardet	b8a1ffb87e	coll/base: fix ompi_coll_base_allgatherv_intra_basic_default() Fixes open-mpi/ompi#1907	2016-07-28 13:50:04 +09:00
Pascal Deveze	10763f5abc	mtl/portals4: Take into account the limitation of portals4 (max_msg_size) and split messages if necessary	2016-07-26 08:44:07 +02:00
Pascal Deveze	724801b018	mtl-portals4: Introduce a "short_limit" for the short message size. "eager_limit" will only be used for the limit of the eager part of the messages sent with the rndv protocol	2016-07-26 08:43:24 +02:00
Pascal Deveze	9e58b4842f	mtl-portals4: Correct how the request_status._ucount is set	2016-07-26 08:42:48 +02:00
Pascal Deveze	3ca194f10a	mtl-portals4: Store ptl_process_id (from PtlGetPhysId) and display it.	2016-07-26 08:42:08 +02:00
Pascal Deveze	bd3b1cf7be	mtl-portals4: Check that flowctl_idx is equal to REQ_FLOWCTL_TABLE_ID and use OPAL_ATOMIC_CMPSET_32 to test and set flowctl_active flag to true	2016-07-26 08:41:31 +02:00
Ralph Castain	9ab20cafe3	Pass the nodeid for each proc in the job. Fix a mistaken error output message	2016-07-25 15:41:15 -07:00
Edgar Gabriel	b0fa1fd2a1	move the internal file_open/close functions to common/ompio	2016-07-21 13:08:32 -05:00
Edgar Gabriel	ccf76b7791	moving the internal read/write functions to common/ompio and update all fs/fcoll/sharedfp components to use these functions.	2016-07-21 13:08:32 -05:00
Edgar Gabriel	688710d408	make common/ompio compile	2016-07-21 13:08:32 -05:00
Edgar Gabriel	39ae93b87b	modify the fcoll components to use the common/ompio print queues	2016-07-21 13:08:32 -05:00
Edgar Gabriel	fe17410943	next step in making the print_queue functionality move to common/ompio	2016-07-21 13:08:32 -05:00
Edgar Gabriel	af67c8f239	first cut on moving some ompio functionality to common/ompio	2016-07-21 13:08:32 -05:00
Edgar Gabriel	a899c0fb38	fcoll/static: fix Coverity warnings fix Coverity warnings CID 72144, CID 710677, CID 1364164	2016-07-21 13:08:15 -05:00
Pascal Deveze	a7e3de6c4f	coll-portals4: No more messages passed to Portals4 bigger than the limit given by PtlNIInit	2016-07-21 15:58:20 +02:00
Pascal Deveze	175e6aa385	coll-portals4: Before calling PtlCTWait, call PtlTriggeredInc twice to be sure all pending PtlTriggeredPut operations are triggered	2016-07-21 15:58:20 +02:00
Pascal Deveze	df59d6cdd4	coll-portals4: Correct and simplify how the data are cut in segment_nb segments (bcast)	2016-07-21 15:58:09 +02:00
Pascal Deveze	274f8d608c	coll-portals4: Change output format and change variable names (minor changes).	2016-07-21 11:06:45 +02:00
Todd Kordenbrock	37ad6aa711	Merge pull request #1853 from PDeveze/Patchs-on-osc-portals4 Patches on osc portals4	2016-07-20 09:22:19 -05:00
Todd Kordenbrock	210534adb3	Merge pull request #1850 from PDeveze/Patchs-on-mtl-portals4 Patches on mtl portals4	2016-07-20 08:21:03 -05:00
Pascal Deveze	9cac32ba6a	mtl/portals4: Modifications concerning the short message management	2016-07-19 11:21:50 +02:00
Pascal Deveze	49e9936914	mtl/portals4: Some little patches	2016-07-19 11:18:55 +02:00
Pascal Deveze	f19a2b961c	osc/portals4: Correct an error in an if statement	2016-07-18 13:16:12 +02:00
Pascal Deveze	81823d7a63	osc/portals4: Store the no_locks parameter in osc_portals4_component.no_locks	2016-07-18 11:51:52 +02:00
Pascal Deveze	76b38651da	osc/portals4: For the contiguous datatype, take into account the lower bound before calling portals4	2016-07-18 11:20:50 +02:00
Pascal Deveze	7aaf16e7fe	osc/portals4: Put/Get splitting because Portals4 may restrict sizes	2016-07-18 10:49:28 +02:00
Pascal Deveze	025201b459	osc/portals4: set the initial value of req_status.MPI_ERROR to MPI_SUCCESS	2016-07-18 09:52:56 +02:00
Pascal Deveze	aa0d687a0a	osc/portals4: Display an output message if ompi_osc_portals4_get_dt() or ompi_osc_portals4_get_op() returns an error	2016-07-18 09:52:56 +02:00
Pascal Deveze	c4181909a4	osc/portals4: Be sure that the ME are operational (wait for the PTL_EVENT_LINK)	2016-07-18 09:52:56 +02:00
Pascal Deveze	e99e7d08ed	osc/portals4: For the ME, use the uid from PtlGetUid instead of PTL_UID_ANY	2016-07-18 09:52:56 +02:00
Pascal Deveze	56b36eeb7e	osc/portals4: Format of "target_disp" is OPAL_PTRDIFF_TYPE and %lu is the appropriate format to display it.	2016-07-18 09:52:55 +02:00
Pascal Deveze	a76566c754	osc/portals4: To allocate a PT, use REQ_OSC_TABLE_ID and test that the right ID is allocated	2016-07-18 09:52:55 +02:00
Edgar Gabriel	195ec89732	fcoll/base: mv coll_array functions to fcoll base the coll_array functions are truly only used by the fcoll modules, so move them to fcoll/base. There is currently one exception to that rule (number of aggregators logic), but that function will also be moved to fcoll/base in the long term.	2016-07-14 08:41:14 -05:00
Edgar Gabriel	1f1504ebbb	remove some unused code	2016-07-14 08:41:14 -05:00
Joshua Ladd	06930a0423	Merge pull request #1840 from artpol84/yalla_perf_fix pml/yalla: fix yalla performance regression	2016-07-14 10:55:30 +03:00
Pascal Deveze	b87ed1ad4a	mtl/portals4: Display actual limits given by the portals4 PtlNIInit function	2016-07-12 15:07:31 +02:00
Pascal Deveze	f666b0d9aa	mtl/portals4: Allocate a PT with the PTL_PT_FLOWCTRL flag only if OMPI_MTL_PORTALS4_FLOW_CONTROL is set	2016-07-12 15:07:31 +02:00
Pascal Deveze	bed572cd6c	mtl/portals4: Unlink the ME first, then free the CT and at the end free the PT	2016-07-12 15:07:30 +02:00
Ralph Castain	0e433eaa78	Silence warning	2016-07-11 19:43:02 -07:00
Nathan Hjelm	b47208e909	osc/rdma: fix bug in CAS This commit fixes a bug in the RDMA compare-and-swap implementation that caused the origin value to always be written even if the compare should have failed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-11 09:54:23 -06:00
Edgar Gabriel	c8b1c6cae1	Merge pull request #1856 from edgargabriel/pr/zero-size-iread-iwrite io/ompio: fix the request in case of a zero size write/read operation	2016-07-11 08:19:02 -05:00
Gilles Gouaillardet	14624506df	coll/libnbc: do not exchange data between roots in ompi_coll_libnbc_ireduce_scatter_inter() this is now useless since the scatter is done via the local communicator	2016-07-11 17:18:30 +09:00
Edgar Gabriel	3dd81e9e09	io/ompio: fix the request in case of a zero size write/read operation	2016-07-08 14:11:22 -05:00
Gilles Gouaillardet	a55d57406b	coll/base: fix non zero lower bound datatype handling in mca_coll_base_alltoallv_intra_basic_inplace()	2016-07-08 16:55:26 +09:00
Gilles Gouaillardet	7b8094aac1	coll/base: silence misc warning as reported by Coverity with CIDs 1363349-1363362 Offset temporary buffer when a non zero lower bound datatype is used. Thanks Hristo Iliev for the report (cherry picked from commit `0e393195d9`)	2016-07-08 13:06:26 +09:00
Gilles Gouaillardet	678d08647b	coll/libnbc: various fixes - correctly handle non commutative operators - correctly handle non zero lower bound ddt - correctly handle ddt with size > extent - revamp NBC_Sched_op so it takes two buffers and matches ompi_op_reduce semantic - various fixes for inter communicators Thanks Yuki Matsumoto for the report	2016-07-07 15:55:49 +09:00
Gilles Gouaillardet	3e559a14a9	coll/inter: fix non standard ddt handling - correctly handle non zero lower bound ddt - correctly handle ddt with size > extent Thanks Yuki Matsumoto for the report	2016-07-07 15:49:59 +09:00
Gilles Gouaillardet	488d037d51	coll/basic: fix non standard ddt handling - correctly handle non zero lower bound ddt - correctly handle ddt with size > extent Thanks Yuki Matsumoto for the report	2016-07-07 15:49:53 +09:00
Gilles Gouaillardet	c06fb04a9a	coll/base: fix non zero lower bound ddt handling in ompi_coll_base_reduce_intra_basic_linear() Thanks Yuki Matsumoto for the report	2016-07-07 15:49:48 +09:00
Ralph Castain	ee56d9dc1a	Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field	2016-07-05 14:59:50 -07:00
George Bosilca	eac5b3c668	Various cleanups in the monitoring PML.	2016-07-05 18:31:25 +02:00
Artem Polyakov	a4ff9bef6d	fix #2	2016-07-05 14:38:35 +03:00
Artem Polyakov	bc973cad30	fix	2016-07-05 14:33:31 +03:00
Artem Polyakov	7d96f12fec	pml/yalla: fix yalla performance regression It was introduced in PR https://github.com/open-mpi/ompi/pull/1228 in particular in commit `041a6a9f53`. Original solution was using "flexible array member" called "mxm_base" to "fall-through" to the "mxm" send/recv member that is located in the outer structure. After changing number of elements in "mxm_base" from 0 to 1 we are actually allocating 2 mxm_req_base_t elements which leads to increased overall size and harms cache performance. It also breaks "mca_pml_yalla_check_request_state" function.	2016-07-05 10:52:48 +03:00
Joshua Hursey	0a09f8bc51	coll/hcoll: Protect module destruct when not fully initialized * If hcoll is given a negative priority, but not enabled=0 then the module is constructed, but then destructed before calling its query(). So the previous pointers are not initialized. If we try to OBJ_RELEASE them in a debug build an assert will fire. This commit adds some protection against that and initializes the _module pointers to NULL.	2016-07-01 13:41:27 -05:00
Joshua Hursey	59f304b9e9	coll/base: neg. priority cleanup, verbose output improvements * Print a verbose message if the component was disqualified because of a negative priority. * If a disqualified component provided a module, release it. * Display list of selected components in priority order - During the process of volunteering collective functions for a communicator, print the component name and priority. This will cause the verbose messages to be displayed in reverse priority order (lowest priority first, up to highest). This is helpful when determining which collective components are active in which order for a given communicator. To see the messages you need the following MCA parameter set to 9 or higher: `-mca coll_base_verbose 9` * Adjust verbose for commonly needed verbose output from 10 to 9 to make it easier to access this information.	2016-07-01 13:41:27 -05:00
Nathan Hjelm	5f390b5f5a	bml/r2: be more restrictive on rdma endpoints This commit makes bml/r2 more restrictive on which endpoints end up in the rdma endpoint list. Before this commit an endpoint was added if it supported either put or get. This was done to ensure that endpoints are available for RMA. Though it is possible to support put or get endpoints we only currently support endpoints that have put, get, and amos. bml/r2 now reflects this support. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-29 18:54:58 -06:00
Nathan Hjelm	2409024c17	osc/rdma: fix typo Need to increment the total size after checking the local offset not before. This typo causes large allocations with MPI_Win_allocate() to fail. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-21 09:50:29 -06:00
George Bosilca	9c4f56be4b	Fix the coll_base_sendrecv function.	2016-06-18 18:23:51 +02:00
Ralph Castain	5d330d5220	Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fall back to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler. Add PMIx 2.0 Remove PMIx 1.1.4 Cleanup copying of component Add missing file Touchup a typo in the Makefile.am Update the pmix ext114 component Minor cleanups and resync to master Update to latest PMIx 2.x Update to the PMIx event notification branch latest changes	2016-06-14 13:08:41 -07:00
Edgar Gabriel	1ddfd6cdca	io/ompio: fix the preallocate function handle preallocating sizes less than the current file size correctly.	2016-06-14 10:50:32 -05:00
Gilles Gouaillardet	80e362de52	coll/base: fix memory free in ompi_coll_base_allreduce_intra_recursivedoubling err handler Fix CID 1362630 Fixes open-mpi/ompi@0e393195d9	2016-06-09 13:12:25 +09:00
Gilles Gouaillardet	ead7efef3f	coll/basic: silence CID 1362614 in mca_coll_basic_allreduce_inter()	2016-06-09 09:40:19 +09:00
Gilles Gouaillardet	ad2e1a5ae9	coll/base: silence CID 1362613 in ompi_coll_base_alltoall_intra_basic_linear()	2016-06-09 09:40:05 +09:00
Gilles Gouaillardet	80b267af1c	coll/base: silence CID 1362601 in ompi_coll_base_sendrecv_zero()	2016-06-09 09:37:31 +09:00
Gilles Gouaillardet	0e393195d9	coll/base: fix [all]reduce with non zero lower bound datatypes Offset temporary buffer when a non zero lower bound datatype is used. Thanks Hristo Iliev for the report	2016-06-08 16:48:00 +09:00
Nathan Hjelm	3ddf3ccbf3	Merge pull request #1758 from hjelmn/ob1_fixes pml/ob1: bug fixes	2016-06-07 11:18:55 -06:00
Todd Kordenbrock	9671d6af47	Merge pull request #1689 from francois-wellenreiter/remove_trig_rdv_portals4 MTL portals4 : remove the triggered rendez-vous protocol	2016-06-06 21:55:01 -05:00
Nathan Hjelm	5d0b4679ea	pml/ob1: bug fixes This commit fixes two bugs in pml/ob1: - Do not call MCA_PML_OB1_PROGRESS_PENDING from mca_pml_ob1_send_request_start_copy as this may lead to a recursive call to mca_pml_ob1_send_request_process_pending. - In mca_pml_ob1_send_request_start_rdma return the rdma frag object if a btl fragment can not be allocated. This fixes a leak identified by @abouteiller and @bosilca. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-06 17:54:55 -06:00
Gilles Gouaillardet	c976559877	coll/basic: fix log basic bcast The log basic bcast was completely broken. The rank 0 gets the hibit set to -1, so it always returned an error.	2016-06-06 11:01:51 +09:00
Gilles Gouaillardet	99fedcb7a3	fs/base: silence a memory leak in mca_fs_base_get_fstype() Fixes CID 1351211	2016-06-06 09:20:14 +09:00
George Bosilca	9376b0340b	Fix the basic barrier. The log basic barrier was completely broken. The rank 0 gets the hibit set to 0, so it always returned an error.	2016-06-03 23:46:25 -04:00
Edgar Gabriel	d6af5444a6	fix the get_byte_offset code	2016-06-03 11:36:53 -05:00
Nathan Hjelm	e968ddfe64	start bug fixes (#1729) * mpi/start: fix bugs in cm and ob1 start functions There were several problems with the implementation of start in Open MPI: - There are no checks whatsoever on the state of the request(s) provided to MPI_Start/MPI_Start_all. It is erroneous to provide an active request to either of these calls. Since we are already looping over the provided requests there is little overhead in verifying that the request can be started. - Both ob1 and cm were always throwing away the request on the initial call to start and start_all with a particular request. Subsequent calls would see that the request was pml_complete and reuse it. This introduced a leak as the initial request was never freed. Since the only pml request that can be mpi complete but not pml complete is a buffered send the code to reallocate the request has been moved. To detect that a request is indeed mpi complete but not pml complete isend_init in both cm and ob1 now marks the new request as pml complete. - If a new request was needed the callbacks on the original request were not copied over to the new request. This can cause osc/pt2pt to hang as the incoming message callback is never called. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * osc/pt2pt: add request for gc after starting a new request Starting a new receive may cause a recursive call into the pt2pt frag receive function. If this happens and the prior request is on the garbage collection list it could cause problems. This commit moves the gc insert until after the new request has been posted. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 20:22:40 -04:00
Matias A Cabral	29ab28f4f6	Adding owner.txt file for PSM2 MTL.	2016-06-02 16:26:16 -07:00
George Bosilca	d577e12dd0	Fix comment.	2016-06-03 00:57:31 +09:00
George Bosilca	2e1b1d34c6	Safety first !	2016-06-02 11:52:43 +09:00
George Bosilca	50cec456fb	ompi_request_complete with signal Rewrite the ompi_request_complete function to take into account the with_signal argument. Change the comment to explain the expected behavior. Alter all the ompi_request_complete uses to make sure the status of the request is set before calling ompi_request_complete. bot:label:enhancement	2016-06-02 11:49:12 +09:00
George Bosilca	223d75595d	Give a boost to MPI_Barrier. Based on current implementation it is faster to use a blocking send than the non-blocking version. Switch the exchange function used in the barrier to use the blocking version combined with the non-blocking version of the receive.	2016-06-02 11:45:25 +09:00
Nathan Hjelm	086ffc1838	pml/ob1: fix race on pml completion of send requests The request code was setting the request as pml_complete before calling MCA_PML_OB1_SEND_REQUEST_MPI_COMPLETE. This was causing MCA_PML_OB1_SEND_REQUEST_RETURN to be called twice in some cases. The code now mirrors the recvreq code and only sets the request as pml complete if the request has not already been freed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 13:36:06 -06:00
Gilles Gouaillardet	5f565dfec3	configury: clean the flex generated .c files	2016-06-01 11:13:31 +09:00
Thananon Patinyasakdikul	60d0fbf683	Removal of ompi_request_lock from pml/ucx.	2016-05-26 12:36:58 -04:00
George Bosilca	90f294096e	Remove more references to the request mutex. Regarding BFO it should be mentioned that this component is currently unmaintained, and that despite my efforts I could not make it compile (it would not compile before this patch either).	2016-05-25 23:27:06 -04:00
Nathan Hjelm	9d439664f0	pml/yalla: update for request changes This commit brings the pml/yalla component up to date with the request rework changes. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 15:42:53 -06:00
Nathan Hjelm	8445c885ce	pml/cm: update for request changes This fixes a hang caused by the request refactor work. The cm pml was not updated and was hanging in most cases. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 15:35:32 -06:00
Valentin Petrov	5ff6372886	coll/hcoll: bugfix: initialize req_type field If left uninitialized then segfault is possible in MPI_Waitall in the case the field by chance equals OMPI_REQUEST_GEN.	2016-05-25 15:38:01 +03:00
bosilca	b90c83840f	Refactor the request completion (#1422) * Remodel the request. Added the wait sync primitive and integrate it into the PML and MTL infrastructure. The multi-threaded requests are now significantly less heavy and less noisy (only the threads associated with completed requests are signaled). * Fix the condition to release the request.	2016-05-24 18:20:51 -05:00
Jeff Squyres	e7d46b96a3	Merge pull request #1680 from yburette/topic/fix_provider_selection mtl/ofi: Change default provider selection behavior.	2016-05-23 15:06:02 -04:00
Francois WELLENREITER	b2b0fc63e2	MTL portals4 : remove the triggered rendez-vous protocol	2016-05-23 15:50:00 +02:00
Gilles Gouaillardet	bca44592af	Merge pull request #1643 from ggouaillardet/topic/romio_openbsd57 io/romio: fix filesystem type check on OpenBSD	2016-05-23 16:33:56 +09:00
Nathan Hjelm	31bfeede82	bml/r2: always add btl progress function This commit changes the behavior of bml/r2 from conditionally registering btl progress functions to always registering progress functions. Any progress function belonging to a btl that is not yet in use is registered as low-priority. As soon as a proc is added that will make use of the btl it is re-registered normally. This works around an issue with some btls. In order to progress a first message from an unknown peer both ugni and openib need to have their progress functions called. If either btl is not in use after the first call to add_procs the callback was never happening. This commit ensures the btl progress function is called at some point but the number of progress callbacks is reduced from normal to ensure lower overhead when a btl is not used. The current ratio is 1 low priority progress callback for every 8 calls to opal_progress(). Fixes open-mpi/ompi#1676 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-21 15:54:04 -04:00
yohann	2f0cde791a	mtl/ofi: Change default provider selection behavior. As more providers get added to libfabric, the default exclude list would need to be updated. Instead, we choose to include only the providers known to work by default. New default: - include: psm,psm2,gni - exclude: none	2016-05-19 10:59:25 -07:00
Ralph Castain	a35bb8453a	Unlock the mutex prior to destructing it. Thanks to Nicolas Joly for the report	2016-05-19 10:36:58 -07:00
rhc54	8b534e9897	Merge pull request #1668 from rhc54/topic/slurm When direct launching applications, we must allow the MPI layer to pr…	2016-05-16 12:23:19 -07:00
Jeff Squyres	5275e5e2a1	bml_r2: use __func__ to identify function names There were some old/stale function names in some debugging/verbose opal_output calls. Use __func__ instead, so that they won't become stale in the future. Thanks to Durga Choudhury for pointing out the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-16 11:06:47 -04:00
Ralph Castain	01ba861f2a	When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provide non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread spin in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize cpu utilization. Update external as well Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro	2016-05-14 16:37:00 -07:00
Aurélien Bouteiller	7f65c2b18e	forgot to update copyright in commits `627a89b` `4899c89`	2016-05-13 11:34:59 -04:00
George Bosilca	37e03e3e5b	Don't update req_bytes_received if no bytes were received.	2016-05-12 23:39:32 -04:00
Matias A Cabral	528abff6ae	Merge remote-tracking branch 'upstream/master'	2016-05-10 15:42:08 -07:00
Matias A Cabral	d28ee62a96	Update in PSM and PSM2 MTLs to detect entries created by drivers for Intel TrueScale and Intel OmniPath, and detect a link in ACTIVE state. This fix addresses the scenario reported in the below OMPI users email, including formerly named Qlogic IB, now Intel True scale. Given the nature of the PSM/PSM2 mtls this fix applies to OmniPath: https://www.open-mpi.org/community/lists/users/2016/04/29018.php	2016-05-09 12:08:44 -07:00
Gilles Gouaillardet	0a19337371	coll/base: return MPI_ERR_UNSUPPORTED_OPERATION when coll_base_*_two_procs algo is used on a communicator that has no two tasks Thanks Dave Love for the report	2016-05-09 14:18:40 +09:00
Gilles Gouaillardet	b159587325	io/romio: fix filesystem type check on OpenBSD 5.7 check the existence of the f_type field in struct statfs Thanks Paul Hargrove for the report	2016-05-09 13:54:46 +09:00
Ralph Castain	6b24e2779b	Remove stale component - I'm not going to get to it	2016-05-07 04:13:34 -07:00
Edgar Gabriel	def1b95fd7	Merge pull request #1646 from edgargabriel/getview-preallocate-fixes io/ompio: file_getview and file_preallocate fixes	2016-05-06 11:46:00 -05:00
Edgar Gabriel	e65e189671	io/ompio: fix file size after file_preallocate Thanks to @dalcini for reporting Fixes open-mpi/ompi#1633	2016-05-06 08:20:59 -05:00
Edgar Gabriel	d358965134	io/ompio: fix envelope of datatype returned by getview Thanks to @dalcini for reporting Fixes open-mpi/ompi#1632	2016-05-06 08:19:48 -05:00
Edgar Gabriel	7c92acaa78	Merge pull request #1637 from edgargabriel/pr/netbsd-compilation-problems fs/lustre and fs/pvfs2: fix netbsd compilation problems	2016-05-06 08:05:36 -05:00
Gilles Gouaillardet	6c9d65c0ca	coll/libnbc: fix MPI_Ireduce_scatter_block for one task communicator Thanks Lisandro Dalcin for the report Fixes open-mpi/ompi#248	2016-05-06 09:43:29 +09:00
Ralph Castain	08022d7af1	Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required.	2016-05-05 15:28:13 -07:00
Jeff Squyres	f167be1c91	ompio: always return valid info from FILE_GET_INFO MPI-3.1 says that even if no info keys are set on the file, we need to return a new, empty info. Thanks to Lisandro Dalcin for identifying the issue. Fixes open-mpi/ompi#1630 Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-05 12:03:29 -07:00
Aurélien Bouteiller	4899c89731	Fix a race condition when multiple threads try to create a bml endpoint simultaneously.	2016-05-05 10:49:30 -04:00
Aurélien Bouteiller	627a89bf71	Fix a race condition when multiple threads do the "first send" to an endpoint simultaneously.	2016-05-05 09:04:10 -04:00
Joshua Ladd	4771c9ece6	Merge pull request #1617 from jladd-mlnx/topic/disable-hcoll-barrier-in-finalize-ompi-trunk HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla	2016-05-04 10:12:34 -04:00
Edgar Gabriel	78fa8bb2c4	remove some unused variables that can cause compilation problems on netbsd	2016-05-03 10:25:15 -05:00
Todd Kordenbrock	3498bed650	Merge pull request #1555 from shawone/check_reduce_ret coll-portals4: check return value from reduce kary tree functions	2016-05-03 10:17:23 -05:00
Jeff Squyres	33dd8ca81e	osc_rdma_peer: properly include ompi_config.h Thanks to Paul Hargrove for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-03 07:39:55 -07:00
Devendar Bureddy	cafd55f18c	HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla tear down HCOLL barrier may not complete if HCOLL progress is not called periodically, which is the case in HCOLL teardown progress in the finalize. (cherry picked from commit 793244d75dd94d1d5e0243bcccf6d04318750f3f)	2016-05-03 00:49:57 +03:00
Nathan Hjelm	d3d779f6d9	osc/rdma: clear all_sync object when obtaining a lock This commit fixes a bad synchronization detection bug that occurs when mixing MPI_Win_fence() and MPI_Win_lock(). If no communication has occurred in the fence epoch it is safe to just clear the all_sync object (it was set up by fence). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 15:28:47 -06:00
Jeff Squyres	265e5b9795	Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1 ompi/opal/orte/oshmem/test: max hostname length cleanup	2016-05-02 09:44:18 -04:00
Ralph Castain	6ac7929bd0	Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need. Cleanups per @jjhursey review	2016-05-01 11:30:25 -07:00
Nathan Hjelm	7bda3eb2dc	osc/rdma: fix global index array calculation This commit fixes a bug that occurs when ranks are either not mapped evenly or by something other than core. Fixes open-mpi/ompi#1599 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-28 19:11:11 -06:00
Nathan Hjelm	f0f3383006	Merge pull request #1590 from hjelmn/thread_multiple osc/pt2pt: do not drop/reacquire the ompi_request_lock	2016-04-26 16:48:37 -06:00
Nathan Hjelm	34ff6293bd	osc/pt2pt: do not drop/reacquire the ompi_request_lock This lock is now recursive so it is safe to call into the pml without dropping the lock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-26 14:19:38 -06:00
George Bosilca	bf190671e9	Make the request lock recursive. If during the request completion callback we post another request that completes right away (such a small send or a match for an unexpected short message) we will try to complete the second request while holding the lock for the completion of the first. For performance reasons (mainly to avoid unlocking and locking the request mutex several times) we have made the request lock recursive.	2016-04-26 16:16:07 -04:00
Nathan Hjelm	c16e639b2f	Merge pull request #1563 from hjelmn/ompi_coverity ompi coverity fixes	2016-04-26 09:17:48 -06:00
Karol Mroz	3322347da9	ompi: fixup hostname max length usage Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-25 07:08:23 +02:00
Nathan Hjelm	ae0ffbb67f	Merge pull request #1397 from hjelmn/enable_thread_multiple ompi: always enable MPI_THREAD_MULTIPLE support	2016-04-23 08:40:22 -06:00
Nathan Hjelm	1ff3d3b16b	pml/ob1: fix coverity issue Fix CID 1357978 (1 of 1): Logically dead code (DEADCODE): Remove duplicate check for NULL == endpoint. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:48:13 -06:00
Nathan Hjelm	70533e6d50	fcoll/static: fix coverity issues Fix CID 72362: Explicit null dereferenced (FORWARD_NULL) From what I can tell the code @ fcoll_static_file_read_all.c:649 should be setting bytes_per_process[i] to 0 not bytes_per_process. Fix CID 72361: Explicit null dereferenced (FORWARD_NULL) Modified check to check for blocklen_per_process non-NULL before trying to free blocklen_per_process[l]. This is sufficient because free (NULL) is safe. Also cleaned up the initialization of this and a couple of other arrays. They were allocated with malloc() then initialized to 0. Changed to use calloc(). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:48:13 -06:00
Nathan Hjelm	8871bdb2f8	fcoll/two_phase: fix coverity issues Fix CID 72296: Resource leak (RESOURCE_LEAK): Changed code to goto exit instead of returning to ensure memory is freed. Fix CID 712589: Out-of-bounds read (OVERRUN): In this loop i and j are identical and always less than iov_count. The CID was triggered because i was incremented if i was < iov_count. This meant that if the loop did go on the next iteration would access an invalid index. Fix CID 741363: Uninitialized scalar variable (UNINIT): Allocate tmp_len with calloc to ensure every index is initialized. Fix CID 741364: Uninitialized pointer read (UNINIT): Allocate recv_types with calloc to ensure all indices are always initialized. Also added a check to not loop and destroy if recv_types is NULL. Also added a NULL check on the allocation of decoded iov. This is not the cause of CID 126784 but should be fixed. Fix CID 712588: Out-of-bounds read (OVERRUN): Similar to CID 712589. Should silence the issue. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:47:41 -06:00
Valentin Petrov	21f1c572c0	Adds mapping to hcoll complex dte	2016-04-19 14:14:28 +03:00
Nicolas Chevalier	c86d4035d2	coll-portals4: check return value from reduce kary tree functions	2016-04-18 12:02:30 +00:00
Nathan Hjelm	3245428e82	Merge pull request #1535 from kawashima-fj/pr/osc-pt2pt-header-fix osc/pt2pt: Fix a struct name typo	2016-04-14 15:55:25 -06:00
Nathan Hjelm	330302c4b4	Merge pull request #1534 from kawashima-fj/pr/parallel-rma-fix osc/pt2pt: Fix tag conflicts on parallel RMA communications	2016-04-14 15:13:32 -06:00
Jeff Squyres	fdf33674b3	Merge pull request #1532 from kmroz/wip-hindexed-cleanup-1 romio,java: cleanup deprecated hindexed call	2016-04-14 17:07:31 -04:00
KAWASHIMA Takahiro	35ea9e5c3c	Add FUJITSU copyright	2016-04-12 13:47:53 +09:00
KAWASHIMA Takahiro	39bcbe439a	osc/pt2pt: Fix a struct name typo Fortunately the sizes of `ompi_osc_pt2pt_header_put_t` and `ompi_osc_pt2pt_header_get_t` are same. So this doesn't affect the behavior.	2016-04-11 20:55:22 +09:00
KAWASHIMA Takahiro	28a0577364	osc/pt2pt: Insert breaks in long lines	2016-04-11 19:06:01 +09:00
KAWASHIMA Takahiro	5ac95df9dc	osc/pt2pt: use two distinct "namespaces" for tags - revised Before this commit, the same PML tag may be used for distinct communications for long messages. For example, consider a condition where rank A calls ```MPI_PUT``` targeting rank B and rank B calls ```MPI_GET``` targeting rank A simultaneously. A PML tag for the ```MPI_PUT``` is acquired on rank A and is used for the long-message communication from rank A to rank B. A PML tag for the ```MPI_GET``` is acquired on rank B and is used for the long-message communication from rank A to rank B. These two tags may take the same value because they are managed independently on each rank. This will cause data corruption. This commit separates the tag used in a single RMA communication call, one for communication from an origin to a target, and one for communication from a target to an origin. A "base" tag is acquired using the ```get_tag``` function and the PML tag is calculated from the base tag by the ```tag_to_target``` and ```tag_to_origin``` functions.	2016-04-11 19:05:20 +09:00
KAWASHIMA Takahiro	3576ecafa7	Revert "osc/pt2pt: use two distinct "namespaces" for tags" This reverts commit `06ecdb6aa7` to reimplement the fix completely.	2016-04-11 19:04:11 +09:00
Karol Mroz	5c54184986	romio: replace deprecated hindexed call Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-10 19:56:22 +02:00
Nathan Hjelm	c6b19818be	bml: always enable the bml This commit ensures the bml is always enabled whether or not it will be used. This ensures that any available btls communicate their modex so that they can be used for one-sided communication. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-08 21:14:17 -06:00
George Bosilca	896f857fc4	Thanks @hjelmn for catching the typo.	2016-04-07 13:56:26 -04:00

6223 commits