openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	189da7fdab	pmix2x: plug a memory leak in _event_hdlr() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-24 09:13:30 +09:00
Gilles Gouaillardet	acbc32d3b2	pmix2x: plug a memory leak in opal_lkupcbfunc() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-24 09:13:29 +09:00
Gilles Gouaillardet	b5b21043c4	pmix2x: plug a memory leak in _reg_nspace() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-24 09:13:29 +09:00
Gilles Gouaillardet	0f47310a75	pmix2x/pmix2x_client: plug misc memory leaks Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-24 09:13:29 +09:00
Joshua Hursey	029964a748	libevent2022: Fix broken configure AC_LANG_PROGRAM * The AC_LANG_PROGRAM macro adds the `main()` so it is erroneous to add it to the test program. * This was detected with the XL compilers which will fail to build the program in this situation. The GNU compiler does not error out or warn, but successfully compiles the program. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-23 13:44:12 -06:00
Ralph Castain	8c960bae8d	Update to latest PMIx master Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-23 07:07:40 -08:00
George Bosilca	999d4973a9	Fix an issue with extremely large data identified by tjb900. Due to the conversion from ssize_t to int we were losing bytes, and ended up writing outside the receiver buffer. Similarly on the send, due to the conversion to a lesser type, we could misinterpret the end of the fragment.	2017-01-18 10:33:12 -05:00
Nathan Hjelm	91c34c8df6	Merge pull request #2703 from hjelmn/rcache_fix rcache/base: do not release vma structures in vma_tree_delete	2017-01-12 09:53:34 -07:00
Jeff Squyres	938ab01ad6	Merge pull request #2714 from hjelmn/timer_rollover timer/linux: prevent 64-bit overflow	2017-01-12 06:40:52 -05:00
Nathan Hjelm	45c05880aa	timer/linux: prevent 64-bit overflow The linux timer code was multiplying the result of the x86 time stamp counter by 1000000 before dividing by the cpu frequency. This can cause us to overflow 64 bits if the time stamp counter grows larger than ~ 1.8e13 (about 8400 seconds after boot). To fix the issue the units of opal_timer_linux_freq have been changed to MHz. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-01-11 20:03:10 -07:00
Gilles Gouaillardet	aeee48357a	btl/sm: correctly handle nodes with zero NUMA hwloc object the hwloc topology might not contain a NUMA object with hwloc < v2 if the node is not NUMA, so force the NUMA object count to one in order to correctly allocate mca_btl_sm_component.sm_mpools. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-12 11:45:29 +09:00
George Bosilca	c2cd717f82	Don't refcount the predefined datatypes. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-01-11 16:48:59 -05:00
Ralph Castain	31a8476223	Merge pull request #2702 from rhc54/topic/cov Silence Coverity CID 1398541	2017-01-10 17:50:23 -08:00
Nathan Hjelm	79cabc92fd	rcache/base: do not release vma structures in vma_tree_delete This commit fixes a deadlock that can occur when the libc version holds a lock when calling munmap. In this case we could end up calling free() from vma_tree_delete which would in turn try to obtain the lock in libc. To avoid the issue put any deleted vma's in a new list on the vma module and release them on the next call to vma_tree_insert. This should be safe as this function is not called from the memory hooks. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-01-10 16:58:07 -07:00
Ralph Castain	e568b211e4	Silence Coverity CID 1398541 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-10 15:30:50 -08:00
Jeff Squyres	b980e334dc	usnic: add completion stats This should probably not go to the v2.x branch, since it changes the output format of the usnic stats. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:06:54 -08:00
Jeff Squyres	706f53bb01	usnic: ensure that stats string is always truncated Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:06:54 -08:00
Jeff Squyres	1fdd0fe228	usnic: add missing params to show_help() call Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:06:54 -08:00
Jeff Squyres	7048adec04	usnic: add some assert()s Add some run-time assert checks for debug builds. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:06:32 -08:00
Jeff Squyres	2d28ccb5fd	usnic: add verbose output of queue lengths Show the actual RX/TX and CQ length returned by libfabric in verbose output. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:06:32 -08:00
Jeff Squyres	bd5b8ed754	usnic: ensure that queues are long enough Double check the queue lengths that we get back from libfabric to ensure that they are at least as long as we need. They should never be shorter than we need, but let's just check to be sure. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:06:32 -08:00
Jeff Squyres	53dc75a89c	usnic: ensure to reset flags on returned frags Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:06:31 -08:00
Jeff Squyres	c4d7876ca0	usnic: check send credits on data channel for data frags Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:06:31 -08:00
Jeff Squyres	879d25e5df	usnic: ensure to check send credits for ACKs Don't just blindly send ACKs; ensure that we have send credits before doing so. If we don't have any send credits, just don't send the ACK (it'll come again soon enough; it's not a tragedy if we don't send it now). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:06:31 -08:00
Jeff Squyres	7787dad4db	usnic: ensure CQs are long enough The libfabric usnic provider may give you back TX/RX queues that are longer than you asked for. So just use the TX/RQ/CQ lengths that we asked for, regardless of what length comes back. Additionally, keep the length of the priority channel CQ separate from the length of the data CQ. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:03:53 -08:00
Jeff Squyres	b02d8c48f5	usnic: make the releasing safer Since the usnic BTL is single-threaded in this area, there really is no danger, but don't use one of the pointers hanging off the frag after we return it to the freelist. Instead, save the endpoint pointer before returning the frag. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:03:53 -08:00
Jeff Squyres	e25b860627	usnic: clarify types The types are technically typedef equivalent, but it's less confusing to use the types that agree with the name of the constructor. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 12:03:53 -08:00
Jeff Squyres	40fe575132	usnic: trivial updates (no code/logic changes) - Add more explanatory comments - Trivial whitespace / style updates - Rename opal_btl_usnic_force_retrans() -> opal_btl_usnic_fast_retrans() Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-10 10:40:02 -08:00
Gilles Gouaillardet	6d59b476de	Merge pull request #2686 from ggouaillardet/topic/pmix2x_ptl_base_sendrecv pmix2x: ptl/base: send header and message data together via writev()	2017-01-10 16:26:10 +09:00
Gilles Gouaillardet	44c1ff60f1	Merge pull request #2672 from ggouaillardet/topic/misc_memory_leaks Plug misc memory leaks	2017-01-10 13:16:04 +09:00
Gilles Gouaillardet	a01960bee5	pmix2x: ptl/base: send header and message data together via writev() On Linux, sending the header and then the message data severely impacts the performance of ptl/tcp: on the receiver, reading the data can often result in a PMIX_ERR_RESOURCE_BUSY or PMIX_ERR_WOULD_BLOCK, which ends up degrading performance. This commit sends both header and message data at the same time via writev() and makes ptl/tcp virtually as efficient as ptl/usock. Short writev calls generally occur when the kernel buffer is full, so there is no point in retrying in this case. FWIW, no such degradation was observed on OSX. Refs open-mpi/ompi#2657 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-10 13:07:39 +09:00
Nathan Hjelm	d6bd69dc93	mca/base: account for NULL string_value in verbose set The MCA variable code calls the string from value function with a NULL string to verify values. The verbosity enumerator was not correctly checking for a non-NULL value before trying to set the string. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-01-09 11:52:31 -07:00
Ralph Castain	67fce2861b	Merge pull request #2685 from rhc54/topic/cov Resolve Coverity issues	2017-01-07 13:11:40 -08:00
Ralph Castain	e25e69dc2f	Resolve Coverity issues Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-07 10:45:52 -08:00
George Bosilca	cfeeecd381	Remove the tcp_local field from the TCP component. Instead use the OPAL process name to get the name of the local process. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-01-07 13:24:18 -05:00
Ralph Castain	822e2680ba	Cleanup some configure stuff for static builds - still can't get wrapper extra libs to be recognized Signed-off-by: Ralph Castain <rhc@open-mpi.org> pmix2x: minor configure updates Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-07 08:37:36 -08:00
Ralph Castain	444f5fa35d	Raise the priority of the usock component so it gets preferentially picked Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-05 22:53:04 -08:00
Gilles Gouaillardet	c2ddb1e2fc	mca/base: plug a memory leak register mca_base_var_enum_value_flag_t so they can be free'd upon finalize Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 13:46:36 +09:00
Gilles Gouaillardet	6d5cb9fe0d	event: plug a leak when closing the event framework Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 13:46:35 +09:00
Gilles Gouaillardet	6ef281e163	pmix/base: fix misc memory leaks Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 13:46:35 +09:00
Gilles Gouaillardet	a59dfd7b14	sec/munge: plug a memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 13:46:35 +09:00
Gilles Gouaillardet	c612499bc1	opal: mca/base: fix a memory leak in the mca_base_var_enum_flag_t destructor Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 11:35:59 +09:00
Gilles Gouaillardet	7e5da7382e	btl/tcp: plug leaks when closing component remove tcp_local from the tcp_procs table, and release it Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 11:35:59 +09:00
Gilles Gouaillardet	507623d6b1	mpool/hugepage: plug a memory leak on finalize Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 11:35:58 +09:00
Gilles Gouaillardet	51021028d6	mpool/base: plug a memory leak on finalize Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 11:35:58 +09:00
Ralph Castain	6509f60929	Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node. Samples are taken after MPI_Init, and then again after MPI_Barrier. This allows the user to see memory consumption caused by add_procs, as well as any modex contribution from forming connections if pmix_base_async_modex is given. Using the probe simply involves executing it via mpirun, with however many copies you want per node. Example: $ mpirun -npernode 2 ./mpi_memprobe Sampling memory usage after MPI_Init Data for node rhc001 Daemon: 12.483398 Client: 6.514648 Data for node rhc002 Daemon: 11.865234 Client: 4.643555 Sampling memory usage after MPI_Barrier Data for node rhc001 Daemon: 12.520508 Client: 6.576660 Data for node rhc002 Daemon: 11.879883 Client: 4.703125 Note that the client value on node rhc001 is larger - this is where rank=0 is housed, and apparently it gets a larger footprint for some reason. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-05 10:32:17 -08:00
Ralph Castain	91d714fe93	Add flags to direct PMIx to only use one listener, but without directing which one (tcp or usock) to use. This allows the user to set PMIX_MCA_ptl in their environment to select the transport method. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-04 09:16:44 -08:00
Ralph Castain	f355fb926d	Continue cleanup of notifications. Resolve a race condition that can result in attempt to send a message on a closed socket Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-04 09:16:33 -08:00
Ralph Castain	9eab9a1ed3	Remove stale global variables Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers. Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation). Begin first cut at memory profiler Some minor cleanups of memprobe Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-02 14:04:24 -08:00
Ralph Castain	e8aea2ebfc	Minor cleanups Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-30 16:19:42 -08:00
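
The overflow described in commit 45c05880aa (timer/linux: prevent 64-bit overflow) can be illustrated with a small sketch. The function names below are hypothetical and are not the actual Open MPI timer code; the 2.2 GHz clock used in the note after the block is likewise an assumption, chosen because it roughly matches the "about 8400 seconds" figure in the commit message. The first function multiplies the tick count by 1000000 before dividing by a frequency in Hz, which wraps once the count exceeds about 1.8e13. The second keeps the frequency in MHz, so no multiplication is needed.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the Open MPI functions. */

/* Overflow-prone form: frequency in Hz. The product ticks * 1000000 wraps
 * around once ticks exceeds UINT64_MAX / 1000000 (roughly 1.8e13). */
static uint64_t usec_from_ticks_hz(uint64_t ticks, uint64_t freq_hz)
{
    return ticks * UINT64_C(1000000) / freq_hz;
}

/* Safer form: frequency in MHz, so ticks per microsecond is simply freq_mhz
 * and a single division converts ticks to microseconds. */
static uint64_t usec_from_ticks_mhz(uint64_t ticks, uint64_t freq_mhz)
{
    return ticks / freq_mhz;
}
```

With an assumed 2.2 GHz clock, both functions agree for small tick counts (2200000000 ticks is one second, or 1000000 microseconds, either way). For a count of 2e13 ticks the Hz-based form wraps and returns a much smaller value than the MHz-based form.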


3149 commits