openmpi

Author	SHA1	Message	Date
rhc54	876257469e	Merge pull request #1728 from rhc54/topic/sim Enable simulation of large-scale clusters	2016-05-29 21:29:16 -07:00
Ralph Castain	3913595e10	Enable simulation of large-scale clusters by allowing multiple daemons/node. Specifying the ras_base_multiplier parameter to be greater than 1 will cause ORTE to replicate each allocated node by that factor. A daemon will be spawned for each replica, thus letting ORTE function as if it were on a much larger cluster. Note that this cannot be used for MPI performance testing. It is really only useful for ORTE scaling tests. It also only works with the rsh/ssh launcher.	2016-05-29 18:56:18 -07:00
rhc54	a93c01d4f4	Merge pull request #1724 from rhc54/topic/timeout Add a timeout cmd line option and an option to report state info upon timeout to assist with debugging Jenkins tests	2016-05-28 08:36:41 -07:00
Ralph Castain	ebe159acef	Add a timeout cmd line option and an option to report state info upon timeout to assist with debugging Jenkins tests If requested, obtain stacktraces for each application process and report them to stderr upon timeout stack traces: minor improvements - Also include the hostname and PID of each process for which we're sending the stack traces (vs. just including the ORTE process name) - Send a specific error message if we couldn't find "gstack" in the $PATH (e.g., on OS X) - Send a specific error message if gstack fails to run - Print a message that obtaining the stack traces may take a few seconds so that users don't wonder what's happening Signed-off-by: Jeff Squyres <jsquyres@cisco.com> help-orterun.txt: minor tweaks Trivial update: show "--timeout" (instead of "-timeout") in the help message, just to encourage the use of double-dash options. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> trivial: stacktrace -> stack trace Trivial wordsmithing. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-28 08:36:25 -07:00
Jeff Squyres	59f4a765b3	Merge pull request #1656 from hpcraink/pr/make_manpage In case, we do not build Fortran, Fortran 2008 or CXX, the regexp in …	2016-05-28 11:02:12 -04:00
Jeff Squyres	e126d2cd18	Merge pull request #1584 from bgoglin/master Update hwloc to v1.11.3	2016-05-28 11:01:54 -04:00
Nathan Hjelm	d8fd3a411a	Merge pull request #1725 from hjelmn/request_fixes ompi/request: fix bugs in MPI_Wait_some and MPI_Wait_any	2016-05-27 13:47:49 -06:00
Nathan Hjelm	0591139f49	ompi/request: fix bugs in MPI_Wait_some and MPI_Wait_any This commit fixes two bugs in MPI_Wait_any: - If all requests are inactive then the sync wait would hang forever because no requests are attached to the sync. - The request pointer was pointing to the request before the completed request which caused the wrong request to be freed or marked inactive. MPI_Wait_some had a similar issue if all the requests were pending. These issues were identified by MTT. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 12:36:10 -06:00
Nathan Hjelm	3974987ba3	Merge pull request #1723 from hjelmn/warning_fixes win: fix warnings	2016-05-27 12:26:04 -06:00
Nathan Hjelm	0adfb328e1	win: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-05-27 10:14:02 -06:00
rhc54	e5ee7adbe0	Merge pull request #1722 from rhc54/topic/pmixext Enable PMIx external support for both 1.1.4 and 2.0 versions	2016-05-27 08:59:09 -07:00
Ralph Castain	55923eacd3	Stealing some pieces of Josh Hursey's PR #1583 and modifying a bit, allow the opal/pmix external component to handle both PMIx 1.1.4 and PMIx 2.0 versions. Automatically detect the version of the target external library and adjust the only two APIs that changed (PMIx_Init and PMIx_Finalize) Rename temp vars in .m4 to avoid conflict with Travis	2016-05-27 08:06:31 -07:00
Nathan Hjelm	d25b846c01	Merge pull request #1704 from hpcraink/pr/configure_framework Fix configure for FreePGI on OSX	2016-05-26 17:01:08 -06:00
Nathan Hjelm	8c9292d5d1	Merge pull request #1721 from hjelmn/xrc_fix btl/openib: fix XRC WQE calculation	2016-05-26 17:00:31 -06:00
Nathan Hjelm	56bdcd0888	btl/openib: fix XRC WQE calculation Before dynamic add_procs support was committed to master we called add_procs with every proc in the job. The XRC code in the openib btl was taking advantage of this and setting the number of work queue entries (WQE) based on all the procs on a remote node. Since that is no longer the case we can not simply increment the sd_wqe field on the queue pair. To fix the issue a new field has been added to the xrc queue pair structure to keep track of how many wqes there are total on the queue pair. If a new endpoint is added that increases the number of wqes and the xrc queue pair is already connected the code will attempt to modify the number of wqes on the queue pair. A failure is ignored because all that will happen is the number of active send work requests on an XRC queue pair will be more limited. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-26 15:58:31 -06:00
Aurelien Bouteiller	49bd28d0ac	Merge pull request #1714 from hjelmn/scif_exclusivity btl/scif: reduce default exclusivity	2016-05-26 17:53:11 -04:00
Pavel Shamis (Pasha)	60fd25f3fb	VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms. The original VADER_MAX_ADDRESS was tuned for x86_64 platforms only. For non x86_64 platforms we can use XPMEM_MAXADDR_SIZE. Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>	2016-05-26 16:38:04 -05:00
Nathan Hjelm	f19c647f21	Merge pull request #1718 from hjelmn/config_fix config: fix typo in mxm configury	2016-05-26 13:19:23 -06:00
Joshua Ladd	1a5fd6bf83	Merge pull request #1719 from ICLDisco/ucx_request_fix Removal of ompi_request_lock from pml/ucx.	2016-05-26 15:09:57 -04:00
Thananon Patinyasakdikul	60d0fbf683	Removal of ompi_request_lock from pml/ucx.	2016-05-26 12:36:58 -04:00
Nathan Hjelm	8c2086995d	config: fix typo in mxm configury A 1 was missing when setting $1_LDFLAGS leading to erroneous items in the wrapper cflags. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-26 10:28:07 -06:00
Nathan Hjelm	87ea9be863	Merge pull request #1715 from hjelmn/ugni_overhead btl/ugni: reduce overhead of progress function	2016-05-26 10:17:00 -06:00
Gilles Gouaillardet	46710ba151	travis: fix a typo and create bogus directories to avoid compiler warnings	2016-05-26 15:28:10 +09:00
George Bosilca	90f294096e	Remove more references to the request mutex. Regarding BFO it should be mentioned that this component is currently unmaintained, and that despite my efforts I could not make it compile (it would not compile before this patch either).	2016-05-25 23:27:06 -04:00
Nathan Hjelm	5d322170a0	Merge pull request #1716 from hjelmn/request_fixes Request fixes	2016-05-25 18:14:03 -06:00
Nathan Hjelm	9d439664f0	pml/yalla: update for request changes This commit brings the pml/yalla component up to date with the request rework changes. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 15:42:53 -06:00
Nathan Hjelm	8445c885ce	pml/cm: update for request changes This fixes a hang caused by the request refactor work. The cm pml was not updated and was hanging in most cases. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 15:35:32 -06:00
Nathan Hjelm	dbfab94ede	atomic/mxm: rename symbol that is a duplicate of one in atomic/ucx This fixes an error when building with --enable-static. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 15:34:40 -06:00
Nathan Hjelm	99627319f0	btl/ugni: reduce overhead of progress function This commit reduces the overhead of calling the ugni progress function. It does the following: - Check for new connections once every eight calls. - Do not call remote smsg progress unless we are connected to at least one remote peer. - Do not call rdma progress unless at least one rdma fragment is outstanding. - Check endpoint wait list size before obtaining a lock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 14:27:34 -06:00
Nathan Hjelm	5caf12cd9b	btl/scif: reduce default exclusivity This commit reduces the default exclusivity so that btl/scif is not used for send/recv over other shared memory transports. Fixes open-mpi/ompi#1712 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 14:25:07 -06:00
Nathan Hjelm	8e1d59aea8	Merge pull request #1708 from hjelmn/c__fix request: fix compilation error	2016-05-25 10:48:02 -06:00
Nathan Hjelm	ef11ba9394	request: fix compilation error The request.h header is unfortunately included by files in the C++ bindings. C++ does not allow assigning from void * to another pointer without a cast. This commit adds the cast. We can clean this up when the C++ bindings are deleted. Fixes open-mpi/ompi#1707 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 09:52:23 -06:00
Joshua Ladd	ce783a9ebf	Merge pull request #1706 from vspetrov/coll_hcoll_req_type_bugfix coll/hcoll: bugfix: initialize req_type field	2016-05-25 10:56:33 -04:00
Valentin Petrov	5ff6372886	coll/hcoll: bugfix: initialize req_type field If left uninitialized then segfault is possible in MPI_Waitall in the case the field by chance equals OMPI_REQUEST_GEN.	2016-05-25 15:38:01 +03:00
Rainer Keller	3727cba9bb	Fix compilation for FreePGI on OSX Our checks and the ones of libevent are somewhat flawed. If adding multiple "-framework" to CXXFLAGS or CFLAGS, we strip the keyword from the command-line, not good. libevent however assumes plain gcc without testing properly that the compiler supports -Wno-deprecated-declarations.	2016-05-25 09:12:39 +02:00
George Bosilca	2b868c4952	Fix MPI datatype args. Compensate for the datatype ID that we add to the array.	2016-05-24 23:36:54 -04:00
Jeff Squyres	dd9a819a1c	odls_default: do not opal_output() while creating a process! It is verboten to use opal_output() after the fork() but before the exec()! It results in all manner of undefined behavior. For example, on some OS X systems, if you run a trivial "hello world" MPI program with a high level of ODLS verbosity: ```sh $ mpirun -np 3 --mca odls_base_verbose 100 ./hello_c ``` You will see a bunch of output from the mpirun ODLS base, but then it may hang in odls_default_module.c:do_child() -- after the fork() but before the exec() -- while trying to opal_output() some debugging statements. The solution is to remove these extraneous opal_output() statements. Indeed, the ODLS base is already outputting the same information that these opal_output() statements are trying to emit, anyway. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-24 21:28:57 -04:00
Nathan Hjelm	461ca1203b	Merge pull request #1703 from hjelmn/grdma_cuda_fix rcache/grdma: fix typo in cuda code	2016-05-24 18:51:22 -06:00
bosilca	b90c83840f	Refactor the request completion (#1422) * Remodel the request. Added the wait sync primitive and integrate it into the PML and MTL infrastructure. The multi-threaded requests are now significantly less heavy and less noisy (only the threads associated with completed requests are signaled). * Fix the condition to release the request.	2016-05-24 18:20:51 -05:00
Nathan Hjelm	af52dad8f8	rcache/grdma: fix typo in cuda code Fixes open-mpi/ompi#1702 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-24 15:56:39 -06:00
Nathan Hjelm	1d3110471c	Merge pull request #1697 from hjelmn/acc_order win: add support for accumulate_ordering info key	2016-05-24 14:34:05 -06:00
Pavel Shamis (Pasha)	d984b4b3f9	CMA: Fixing logic for CMA system call detection The OPAL_CMA_NEED_SYSCALL_DEFS is always defined/set to 0 or 1. Therefore instead of checking if the macro is defined, we have to look at the value itself. Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>	2016-05-24 14:53:25 -05:00
Nathan Hjelm	5126da5377	win: add support for accumulate_ordering info key This commit adds support for the MPI-3.1 accumulate_ordering info key. The default value is rar,war,raw,waw and is supported using an MCA variable flag enumerator. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-24 11:13:30 -06:00
rhc54	b7928c2607	Merge pull request #1693 from rhc54/topic/eval2 Fix the dist mapper option	2016-05-24 05:32:12 -07:00
Ralph Castain	30aaf785a8	Fix the dist mapper option	2016-05-23 23:20:33 -07:00
rhc54	927d3f4c3c	Merge pull request #1692 from rhc54/topic/eval2 Fix the --tune problem by searching the argv for MCA params in advance of opal_init_util	2016-05-23 22:19:09 -07:00
rhc54	8d2d5ef1fe	Merge pull request #1691 from rhc54/topic/java Fix command line usage when Java user provides the -Djava.library.path=foo options	2016-05-23 21:12:49 -07:00
Ralph Castain	80f4e3b872	Fix the --tune problem by searching the argv for MCA params in advance of opal_init_util. Only search the first app_context as we historically have done - we can debate whether or not to search all app_contexts	2016-05-23 21:09:44 -07:00
Ralph Castain	2da0210de3	Fix command line usage when Java user provides the -Djava.library.path=foo options	2016-05-23 15:29:36 -07:00
Nathan Hjelm	a651f26701	Merge pull request #1690 from hjelmn/flag_enum mca/base: fix typo in flag enumeration	2016-05-23 14:14:36 -06:00


25191 commits