openmpi

Author	SHA1	Message	Date
Ralph Castain	37c3ed68e7	Cleanup connect/disconnect and bring comm_spawn back online!	2015-09-06 10:27:39 -07:00
Jeff Squyres	f782a7640e	usnic: minor re-order of Makefile.am sources Put the hwloc.c file alphabetically in the list.	2015-09-05 05:02:00 -07:00
rhc54	665b30376a	Merge pull request #868 from rhc54/topic/hwloc Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given	2015-09-04 17:58:07 -07:00
Ralph Castain	2ecbbc84e7	Hide a symbol that is only used in one file and is not properly prefixed	2015-09-04 17:08:24 -07:00
Ralph Castain	d97bc29102	Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given	2015-09-04 16:54:40 -07:00
rhc54	d45ccda813	Merge pull request #866 from rhc54/topic/updatepmix Update PMIx support	2015-09-04 11:09:36 -07:00
Ralph Castain	f6948c2bb4	Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working	2015-09-04 10:07:17 -07:00
Rolf vandeVaart	ebfd00b66e	While debugging user problems, these extra verbosity statements would be helpful	2015-09-03 17:15:39 -04:00
Howard Pritchard	0557beee22	Merge pull request #864 from hppritcha/topic/pmix_cray_more_funcs pmix/cray: more stubs plus a get_version method	2015-09-03 14:52:46 -06:00
Howard Pritchard	6e7345c790	pmix/cray: more stubs plus a get_version method Add more stubs to reduce likelihood of future mysterious segfaults if some of the newer pmix funcs start to get used within ompi. Add a get_version to return the version of the Cray PMI library being used, since the Cray PMI library actually has a function to get that info. Be more accurate about which functions have a hope of being implemented using Cray PMI and those which never will. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2015-09-03 12:51:50 -07:00
Ralph Castain	a772b46c15	Bring the MPI_Publish and friends online	2015-09-02 12:04:07 -07:00
matcabral	1f9218a0bc	Fix for openib btl mca command line parameter btl_openib_mtu being ignored.	2015-09-02 02:22:30 -07:00
Ralph Castain	95dbd70f44	Sync to PMIx 1.1, sha- 51479b0	2015-09-01 14:09:25 -07:00
Rolf vandeVaart	30b1a6e003	Merge pull request #836 from rolfv/pr/fix-cuda-war Add config code to check for need of workaround. Add runtime way to turn oiff just in case.	2015-09-01 15:05:29 -04:00
Nathan Hjelm	f926796e57	Merge pull request #828 from hjelmn/openib_thread_fix openib thread fixes	2015-09-01 09:12:50 -06:00
rhc54	d8cb3fe705	Merge pull request #852 from rhc54/topic/pmix Sync to PMIx tarball - includes:	2015-09-01 06:54:34 -07:00
Gilles Gouaillardet	6dfa996760	configury: fix a typo in opal/mca/pmix/pmix1xx/configure.m4	2015-09-01 14:59:07 +09:00
Ralph Castain	c1bbd7bc78	Sync to PMIx tarball - includes: * update to configury to silence ident messages (thanks Gilles!) * fix for warnings Jeff saw when get didn't find the requested data * fix for Mac OSX operations	2015-08-31 21:51:02 -07:00
rhc54	2d3c6af8ad	Merge pull request #851 from rhc54/topic/copyfix Only copy the value across if the "get" operation succeeded	2015-08-31 19:51:13 -07:00
Ralph Castain	ef69958e01	Only copy the value across if the "get" operation succeeded	2015-08-31 17:11:26 -07:00
Jeff Squyres	8558458bb9	usnic: adjust for new PMIX argument type	2015-08-31 14:55:58 -07:00
Rolf vandeVaart	54ab0d1a51	Add config code to check for need of workaround. Add runtime way to turn it off just in case	2015-08-31 17:18:47 -04:00
Nathan Hjelm	3c34f6f25c	Merge pull request #517 from hjelmn/class_fix opal/class: enable use of opal classes after opal_class_finalize	2015-08-31 12:13:58 -07:00
Nathan Hjelm	faf06edb5b	Merge pull request #824 from hjelmn/opal_mutex_mod opal/mutex: remove unnecessary ()s from OPAL_SCOPED_LOCK macro	2015-08-31 12:08:25 -07:00
rhc54	6e78e2c89b	Merge pull request #846 from rhc54/topic/pmix Sync to PMIx tarball	2015-08-31 08:53:07 -07:00
Nathan Hjelm	2aab6ad90f	Merge pull request #827 from hjelmn/recursive_locks Add support for recursive locks (revisited)	2015-08-31 07:52:23 -07:00
Ralph Castain	a3842af709	Sync to PMIx tarball	2015-08-31 07:47:46 -07:00
Ralph Castain	bcabd1e282	Sync with PMIx tarball, bringing across the warning fixes pointed out by Gilles	2015-08-30 21:13:55 -07:00
Gilles Gouaillardet	7e6a213465	pmix: fix compilation error compilation failed because of missing prototypes when configure'd with --enable-debug --enable-picky on a CentOS 7 box	2015-08-31 10:33:13 +09:00
rhc54	51a8a0f5d7	Merge pull request #842 from rhc54/topic/smfix Fix shared memory operations by resolving local peers	2015-08-30 14:49:43 -07:00
Ralph Castain	b0d7564400	Sync to PMIx 1.1 - do not check pmix version when making connections	2015-08-30 12:15:30 -07:00
Ralph Castain	38ba54366c	Fix shared memory operations by resolving local peers	2015-08-30 12:07:14 -07:00
Ralph Castain	0d5814b5ca	Cleanup Coverity issues	2015-08-29 21:19:27 -07:00
Ralph Castain	3cab860a01	Some cleanups - still some errors that impact shared memory operations	2015-08-29 18:11:11 -07:00
Ralph Castain	1d71037139	Update some APIs	2015-08-29 17:26:32 -07:00
Ralph Castain	79827ceaa8	Remove stale directory	2015-08-29 17:15:17 -07:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Nathan Hjelm	1d56007ab1	rcache/vma: make rcache lock recursive There is currently a path through the grdma mpool and vma rcache that leads to deadlock. It happens during the rcache insert. Before the insert the rcache mutex is locked. During the call a new vma item is allocated and then inserted into the rcache tree. The allocation currently goes through the malloc hooks which may (and does) call back into the mpool if the ptmalloc heap needs to be reallocated. This callback tries to lock the rcache mutex which leads to the deadlock. This has been observed with multi-threaded tests and the openib btl. This change may lead to some minor slowdown in the rcache vma when threading is enabled. This will only affect larger message paths in some of the btls. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-26 10:01:37 -06:00
Nathan Hjelm	54998e5745	opal: add recursive mutex This new class is the same as the opal_mutex_t class but has a different constructor. This constructor adds the recursive flag to the mutex attributes for the lock. This class can be used where there may be re-enty into the lock from within the same thread. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-26 10:01:37 -06:00
Nathan Hjelm	f451876058	Merge pull request #825 from hjelmn/white_space_purge periodic trailing whitespace purge	2015-08-25 19:23:52 -06:00
Nathan Hjelm	64e4419d76	btl/openib: allow the use of the openib btl in thread muliple There were several issues preventing the openib btl from running in thread multiple mode: - Missing locks in UDCM when generating a loopback endpoint. Fixed in open-mpi/ompi@8205d79819. - Incorrect sequence numbers generated in debug mode. This did not prevent the openib btl from running but instead produced incorrect error messages in debug builds. - Recursive locking of the rcache lock caused by the malloc hooks. This is fixed by open-mpi/ompi#827 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-24 16:04:52 -06:00
Nathan Hjelm	c101385f64	btl/openib: fix sequence number generation for debug mode When using eager RDMA in debug builds the openib btl generates a sequence number for each send. The code independently updated the head index and the sequence number for the eager rdma transaction. If multiple threads enter this code at the same time and run in the following order: thread 1: update sequence (0 -> 1) thread 2: update sequence (1 -> 2) thread 2: update head (0 -> 1) thread 1: update head (1 -> 2) the sequence number for head[0] gets 1 and the sequence number for head[1] gets 0. The fix is to generate the sequence number from the head index. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-24 16:00:06 -06:00
Nathan Hjelm	8205d79819	btl/openib: add missing lock calls Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-24 12:21:49 -06:00
Nathan Hjelm	156ce6af21	periodic whitespace purge Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-24 09:32:33 -06:00
Nathan Hjelm	f59b3ed7ed	opal/mutex: remove unnecessary ()s from OPAL_SCOPED_LOCK macro Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-21 10:36:49 -06:00
Nathan Hjelm	209a7a0721	opal/lifo: add load-linked store-conditional support This commit adds implementations for opal_atomic_lifo_pop and opal_atomic_lifo_push that make use of the load-linked and store-conditional instruction. These instruction allow for a more efficient implementation on supported platforms. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-18 14:01:52 -06:00
Nathan Hjelm	2a7e191dd8	opal/fifo: if available use load-linked store-conditional These instructions allow a more efficient implementation of the opal_fifo_pop_atomic function. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-18 14:01:52 -06:00
Nathan Hjelm	6a19a10fbb	atomic/ppc: add atomics for load-link, store-conditional, and swap This commit adds implementations of opal_atomic_ll_32/64 and opal_atomic_sc_32/64. These atomics can be used to implement more efficient lifo/fifo operations on supported platforms. The only supported platform with this commit is powerpc/power. This commit also adds an implementation of opal_atomic_swap_32/64 for powerpc. Tested with Power8. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-18 14:01:52 -06:00
Nathan Hjelm	f87dbca042	Merge pull request #817 from hjelmn/remove_alpha opal/asm: remove alpha support	2015-08-18 13:52:03 -06:00
Nathan Hjelm	551c2ea480	opal/asm: remove alpha support This commit removes alpha asm support. No current processor manufacturer makes chips compatible with DEC alpha and no participating organization has alpha processors. This makes it difficult to support alpha via assembly. This doesn't mean Open MPI will no longer build/work on alpha processors. It should continue to work with gcc's builtin sync atomics. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-18 09:11:38 -06:00
Nathan Hjelm	0a968de53f	mca/base: use standard verbosity levels Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-17 11:48:06 -06:00
Nathan Hjelm	8cbf743cfa	mca/base: standize MCA verbosity levels Up until this point we have had inconsistent usage for MCA verbosity levels. This commit attempts to correct this by recommending components use these standard levels: none (0), error (1), warn (10), info (20), debug (40), and trace (60). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-17 11:47:57 -06:00
Gilles Gouaillardet	950b07d17b	Merge pull request #809 from ggouaillardet/topic/xrc_runtime_check btl/openib: remove OFED version runtime check when XRC is used	2015-08-16 10:54:09 +09:00
Gilles Gouaillardet	d02ccd67de	btl/openib: remove OFED version runtime check when XRC is used this test seems broken : - some false positive were reported - it fails to detect some OFED version mismatch this commit simply removes this test, which means the application will likely fail if XRC is used ad OFED version is different between compile time and runtime	2015-08-14 09:10:03 +09:00
Rolf vandeVaart	cb8c86910e	Add static definitions where needed and remove one unused definition	2015-08-13 14:59:07 -04:00
Jeff Squyres	14340770c4	usnic: remove some logically dead code This code really had no purpose; just assign FI_VERSION(1, 1). This fixes CID 1315274. Also clarify the commet about why we still retain libfabric v1.0.0 compatibility code, even though configure.m4 requires libfabric >= v1.1.0.	2015-08-12 05:21:18 -07:00
Jeff Squyres	7f857034d9	common verbs: check return value of sscanf() Fixes CID 1304563.	2015-08-12 05:14:58 -07:00
Jeff Squyres	92bc8afd43	opal_progress_threads: fix double RELEASE If a thread failed to start, the tracker would be released twice. This commit fixes CID 1316020.	2015-08-12 05:11:40 -07:00
Gilles Gouaillardet	92b2d2ffeb	configury: fix libevent configure.ac fix interleaved messages : checking for working epoll library interface... checking if epoll can build... yes yes	2015-08-12 15:37:22 +09:00
Jeff Squyres	3369606c75	Merge pull request #791 from jsquyres/pr/usnic-async-events usnic: move cchecker to OPAL-wide progress thread	2015-08-11 07:54:07 -04:00
Jeff Squyres	236cf7ff62	usnic: add v1.8/v1.10 compat code Add compat code so that I can sync master against the v1.10 branch.	2015-08-10 16:27:38 -07:00
Jeff Squyres	7da1c4b875	usnic: avoid race condition in connectivity checker In short applications, it's possible that the agent (i.e., local rank 0) will finalize after non-local rank 0 procs detect the connectivity checker named socket, but before they complete a connect() on it. As such, their connect() gets ECONNREFUSED. This commit adds a simple counter in the agent that won't let it quit before it accept()'s from all local procs, or 10 seconds goes by (whichever occurs first). This is similar to the timeout for the clients: they'll exit if they don't see the expected named socket within 10 seconds.	2015-08-10 15:40:33 -07:00
Jeff Squyres	bad508687e	usnic: move cchecker to OPAL-wide progress thread There's no longer any need for the usnic BTL to have its own progress thread: it can use the opal_progress_thread() infrastructure. This commit removes the code to startup/shutdown the usnic-BTL-specific progress thread and instead, just adds its events to the OPAL-wide progress thread. This necessitated a small change in the finalization step. Previously, we would stop the progress thread and then tear down the events. We can no longer stop the progress thread, and if we start tearing down events, this will cause shutdown/hangups to be sent across sockets, potentially firing some of the still-remaining events while some (but not all) of the data structures have been torn down. Chaos ensues. Instead, queue up an event to tear down all the pending events. Since the progress thread will only fire one event at a time, having a teardown event means that it can tear down all the pending events "atomically" and not have to worry that one of those events will get fired in the middle of the teardown process.	2015-08-10 15:40:33 -07:00
Rolf vandeVaart	95d19af0eb	Merge pull request #783 from rolfv/pr/fix-thread-issue Refs open-mpi/ompi#627. Fix support for multi-threads with CUDA 7.0	2015-08-10 11:13:56 -04:00
Rolf vandeVaart	8cc6bef090	Refs open-mpi/ompi#627 . Fix support for multi-threads with CUDA 7.0	2015-08-10 10:22:45 -04:00
Jeff Squyres	9e1e563120	event: remove opal_async_event_base opal_async_event_base is not used anywhere. The opal_progress_thread API should be used instead.	2015-08-07 10:13:41 -07:00
Jeff Squyres	d7c25f683e	pmix_native: update to the new opal_progress_thread API	2015-08-07 10:13:40 -07:00
Jeff Squyres	99fa054507	opal_progress_threads: update to the API There are now four functions and one global constant: * opal_progress_thread_name: the name of the OPAL-wide async progress thread. If you have general purpose events that you need to run in a progress thread, but not a dedicated progress thread, use this name in the functions below to glom your events on to the general OPAL-wide async progress thread. * opal_progress_thread_init(): return an event base corresponding to a progress thread of the specified name (a progress thread will be created for that name if it does not already exist). * opal_progress_thread_finalize(): decrement the refcount on the passed progress thread name. If the refcount is 0, stop the thread and destroy the event base. * opal_progress_thread_pause(): stop processing events on the event base corresponding to the progress thread name, but do not destroy the event base. * opal_progess_thread_resume(): resume processing events on the event base corresponding to a previously-paused progress thread name.	2015-08-07 10:13:40 -07:00
Jeff Squyres	b5c37dbfe2	CSCuv67889: usnic: fix an error corner case Ensure that we have non-NULL on all levels of pointers, which will save us if there are exitable errors very early during component / module initialization.	2015-08-06 10:54:28 -07:00
Nathan Hjelm	6265aaa354	Merge pull request #771 from hjelmn/lifo_fix opal/lifo: add missing opal_atomic_wmb and remove unnecessary opal_atomic_rmb	2015-08-04 14:02:29 -06:00
Nathan Hjelm	6003a4dae1	opal/lifo: add missing opal_atomic_wmb and remove unnecessary opal_atomic_rmb Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-04 08:54:06 -06:00
Nathan Hjelm	45a8e8daff	Merge pull request #770 from hjelmn/fifo_fix opal/fifo: add missing memory barrier	2015-08-04 08:46:59 -06:00
Nathan Hjelm	9abccbd9fc	opal/fifo: add missing memory barrier Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-08-03 16:22:28 -06:00
Jeff Squyres	cbcd16b399	usnic: remove a stale shell variable name	2015-07-31 18:53:54 -07:00
Jeff Squyres	0ee8295e6e	usnic: ensure that we have libfabric >= v1.1	2015-07-31 18:53:54 -07:00
rhc54	c6cc1a9707	Merge pull request #766 from rhc54/topic/hwloc Update x86_32 cpuid assembly code.	2015-07-31 12:53:16 -07:00
Jeff Squyres	2e7f794aae	usnic: convert to use fi_recvmsg / FI_MORE Minor optimization to post 16 receive buffers at a time (vs. 1).	2015-07-31 12:45:40 -07:00
Ralph Castain	b42545b0cb	Update x86_32 cpuid assembly code. Cheery-picked from open-mpi/hwloc@40f9978bcc	2015-07-31 11:40:38 -07:00
Ralph Castain	023936e84b	Silence coverity warnings	2015-07-29 07:28:08 -07:00
George Bosilca	c03b3b135c	Don't allow multiple pvar with the same pvar_index. Fix Cisco copyright.	2015-07-25 15:57:50 -04:00
Guillaume Papauré	98b6d65385	avoid use of non initialized variable	2015-07-25 15:29:32 -04:00
bosilca	5b9f59bb43	Merge pull request #738 from rolfv/pr/fix-msg-and-spaces Fix arguments to error message, remove tabs and trailing spaces	2015-07-25 11:38:54 -04:00
Jeff Squyres	df800286e4	Merge pull request #709 from avilcheslopez/master Improving opal_pointer_array bounds checking.	2015-07-23 14:45:11 -04:00
Alejandro Vilches	994ed60b3d	Improving opal_pointer_array bounds checking (using OPAL_UNLIKELY).	2015-07-23 11:53:16 -07:00
Rolf vandeVaart	1f32fa21ae	Fix arguments to error message, remove tabs and trailing spaces	2015-07-23 10:02:45 -04:00
Rolf vandeVaart	773b509407	Merge pull request #737 from rolfv/pr/add-cuda-war Add a workaroud for issue in libcuda.so library	2015-07-22 16:14:14 -04:00
Rolf vandeVaart	7703c96496	Add a workaroud for issue in libcuda.so library	2015-07-22 11:35:27 -04:00
Jeff Squyres	ec3a38384f	Merge pull request #688 from jsquyres/pr/usnic-libfabric-msg-prefix-fix usnic fixes for differences between libfabric v1.0.0 and v1.1.0	2015-07-21 10:18:36 -04:00
Gilles Gouaillardet	f7cf7d5070	configury: fix XRC detection on OFED < 3.12 since ibv_create_xrc_rcv_qp is now deprecated, and in order to be "future-proof", we have to consider the case in which only XRC Domains are supported. also, correctly handle distro that ship broken ibverbs devel headers Thanks Paul Hargrove for the detailled report.	2015-07-13 10:43:22 +09:00
Ralph Castain	219c4dfba5	Create a new opal_async_event_base and have the pmix/native and ORTE level use it. This reduces our thread count by one.	2015-07-12 08:23:34 -07:00
Ralph Castain	683efcb850	Rename the current opal_event_base to opal_sync_event_base in preparation for adding an async progress thread to opal. No functional changes made here - just a simple rename.	2015-07-11 10:08:19 -07:00
rhc54	053d9b2a7c	Merge pull request #713 from rhc54/topic/errhandler Add an opal/errhandler so opal-level errors can be up-leveled	2015-07-11 07:58:57 -07:00
Ralph Castain	a2243dcddd	Add an opal/errhandler so opal-level errors can be up-leveled	2015-07-11 07:09:11 -07:00
Ralph Castain	61fb067f14	Update the opal_hotel class to support a given event base instead of defaulting to using opal_event_base	2015-07-11 06:42:23 -07:00
Jeff Squyres	633da6641e	usnic: gracefully handle when we can't alloc an ACK The comment didn't match the debugging code (which was ugly, and apparently never happens, anyway). Just return and let the sender retransmit.	2015-07-10 14:19:33 -07:00
Jeff Squyres	3327fa56b5	usnic: minor code cleanups	2015-07-10 10:10:43 -07:00
Jeff Squyres	f9c65a701e	usnic: "sin" assignment needs to be outside the #if The "sin" variable is used below; need to ensure that it is assigned for all builds (not just debug builds).	2015-07-10 06:51:03 -07:00
Jeff Squyres	cd87c8ad41	usnic: misc compiler warnings fixes	2015-07-10 06:51:03 -07:00
Jeff Squyres	ba429dc890	usnic: temporarily disable the BTL put method The usnic BTL put method is currently broken. Disable it until we can fix it properly.	2015-07-10 06:51:03 -07:00
Jeff Squyres	f265358fbe	usnic: handle FI_MSG_PREFIX differences libfabric v1.0.0->v1.1.0 In libfabric v1.0.0 (i.e., API v1.0), the usnic provider handled FI_MSG_PREFIX inconsistently between sends and receives. This has been fixed in libfabric v1.1.0 (i.e., API v1.1): FI_MSG_PREFIX is handled consistently for both sends and receives. Run-time detect which libfabric we are running with and adapt behavior appropriately.	2015-07-10 06:51:03 -07:00
Jeff Squyres	ddd0de6cfc	usnic: make more OS-bypass memory Valgrind-defined This helps reduce false positives when running MPI apps through Valgrind.	2015-07-10 06:51:03 -07:00
Jeff Squyres	9bc7a54e0c	usnic: correctly count CRC errors Handle the differences between libfabric v1.0.0 and v1.1.0 in the return value of fi_cq_readerr(). Also consolidate CRC and truncation errors into the same handling block, since truncation errors are typically another symptom of CRC errors. This ensures that buffers get reposted properly.	2015-07-10 06:51:03 -07:00
Jeff Squyres	fc686f5538	usnic: make configure complain if libfabric cannot be found Instead of silently determining that the usnic BTL can't be built, announce that usnic is checking for libfabric support, and then AC_MSG_RESULT the result of that check.	2015-07-10 06:45:33 -07:00
Jeff Squyres	4341639a66	Revert "configury: fix (again) XRC detection on OFED < 3.12" @ggouaillardet is likely offline for the weekend, but master is broken on RHEL 6.5 systems that do not have MOFED installed. So I'm taking the liberty of revering this commit; I'm guessing Gilles will fixup and re-commit next week. This reverts commit `77f8282d51`.	2015-07-10 06:45:33 -07:00
Gilles Gouaillardet	77f8282d51	configury: fix (again) XRC detection on OFED < 3.12 since ibv_create_xrc_rcv_qp is now deprecated, and in order to be "future-proof", we have to consider the case in which only XRC Domains are supported. Thanks Paul Hargrove for the detailled report.	2015-07-10 15:31:45 +09:00
Rolf vandeVaart	ae0f3cfee7	Make explicit call to initalize MCA parameters in common CUDA code. This allows us to view them with ompi_info and possibly modify with tools interface	2015-07-09 12:51:55 -04:00
Rolf vandeVaart	cdffa4724d	Force smcuda BTL to use CUDA IPC path for all GPU buffers where possible	2015-07-08 17:11:25 -04:00
Ralph Castain	ed93154e43	Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line.	2015-07-07 12:52:16 -07:00
Gilles Gouaillardet	9f171de412	btl/openib: queue pending fragments once only when running out of credit Fixes open-mpi/ompi#640	2015-07-06 09:45:01 +09:00
bosilca	77367ca02c	Merge pull request #687 from rolfv/pr/fix-smcuda-perfprob Add the ability use different size buffers for host and CUDA buffers	2015-07-02 18:42:41 -04:00
Jeff Squyres	4e7d979f8d	Merge pull request #686 from jsquyres/pr/autogen-no-ompi-bool-fixes bool: use SIZEOF__BOOL, not SIZEOF_BOOL	2015-07-02 12:19:07 -04:00
Rolf vandeVaart	30a872b478	Add the ability to send host buffers through one sized staging buffers and CUDA buffers through different sized buffers. Fixes performance issues	2015-07-02 11:11:15 -04:00
Jeff Squyres	f1353947ff	libfabric: fix wrappers for static builds Need to set the WRAPPER_EXTRA flags so that the wrappers for static builds pull in -lfabric. Also update/fix some comments.	2015-07-02 07:58:16 -07:00
Jeff Squyres	cd5751c217	bool: use SIZEOF__BOOL, not SIZEOF_BOOL When you "autogen.pl --no-ompi", the AC_SIZEOF(bool) test is not run. But we do run AC_SIZEOF(_Bool), which is the equivalent. So switch the uses of SIZEOF_BOOL in the code base to be SIZEOF__BOOL, and it's all good.	2015-07-02 07:32:02 -07:00
Ralph Castain	861fe1d9dd	This is the third time I am fixing this - I have no idea who or why this is being reset.	2015-07-02 08:39:48 -05:00
Alina Sklarevich	27797654db	openib btl: added a new vendor_part_id for Mellanox ConnectX4-LX.	2015-06-29 13:50:43 +03:00
Ralph Castain	75ceec663a	Now that it has been officially released, update the embedded HWLOC to 1.11.0	2015-06-28 14:07:45 -07:00
bureddy	c78b8e9b8e	Merge pull request #664 from bureddy/master powerpc: update mem barrier instructions	2015-06-25 14:09:49 -07:00
Jeff Squyres	a172bd161e	usnic: switch to use the new libfabric common library The usnic BTL configure.m4 no longer needs to OPAL_CHECK_LIBFABRIC; it just uses the results from opal/mca/common/libfabric's configure.m4. We also now don't need to link against libfabric -- they just link against the opal_common_libfabric library.	2015-06-25 13:33:15 -07:00
Ralph Castain	8d128fe090	Remove the non-null attributes from the cmd_line parser as this isn't something we can guarantee, and the optimization isn't worth the potential for error	2015-06-25 13:26:20 -07:00
Ralph Castain	ea0e21bb06	Add a common/libfabric component to the opal layer where we can place common functions	2015-06-25 11:04:00 -07:00
Nathan Hjelm	ee36d813dc	Merge pull request #657 from hjelmn/c99 more c99 updates	2015-06-25 11:21:09 -06:00
Howard Pritchard	f45914db9b	Merge pull request #670 from hppritcha/topic/ownership_update ownership: update ownership files	2015-06-25 11:02:45 -06:00
Nathan Hjelm	4d92c9989e	more c99 updates This commit does two things. It removes checks for C99 required headers (stdlib.h, string.h, signal.h, etc). Additionally it removes definitions for required C99 types (intptr_t, int64_t, int32_t, etc). Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-25 10:14:13 -06:00
rhc54	1a767ed47c	Merge pull request #654 from rhc54/topic/config Remove internal bool type definitions	2015-06-25 09:10:21 -07:00
Howard Pritchard	e49a37c034	ownership: update ownership files per discussions at OMPI devel workshop Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2015-06-25 10:04:42 -06:00
Devendar Bureddy	ed406b05cb	powerpc: update mem barrier instructions - added isync interface. - define opal_atomic_wmb() to lwsync as it is recommend over eieio on cache enabled storage. (http://www.ibm.com/developerworks/systems/articles/powerpc.html).	2015-06-25 10:54:44 +03:00
Nathan Hjelm	4552afff06	Fix definition of MPI_T_pvar_get_index The definition of MPI_T_pvar_get_index was incorrect. This commit fixes the definition and adds a missing return code. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-24 17:31:26 -06:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Ralph Castain	a809902c0a	Now that we require C99, and stdbool.h is part of C99, we no longer need to define our own bool types. Since bool is commonly used in a lot of places, just include stdbool.h in opal_config_bottom.h	2015-06-23 11:31:48 -07:00
Ralph Castain	cc9b416ab3	Ensure we properly commit suicide if/when we lose connection to the daemon. There are multiple paths by which a lost daemon can be reported, and so a race condition exists in the pmix support. Our MPI layer wants the ability to determine the response to the failure, and so it will call down to the RTE with any abort request. This comes down to the pmix layer as a "pmix_abort" command, which involves communicating the request to the daemon - who is gone. Sadly, the pmix component may not know that just yet, and so we hang. So add a brief timer event to kick us out of the communication. The precise amount of time we should wait is somewhat TBD, but set something short for now and we can adjust.	2015-06-18 09:45:52 -07:00
Jeff Squyres	8ab2b11f88	btl_openib.c: fix another compiler warning Remove this unused variable	2015-06-17 09:00:12 -07:00
Jeff Squyres	f688289aaf	btl_openib.c: fix compiler warning This return code is not used; tell the compiler we're not going to use it.	2015-06-17 08:56:56 -07:00
Jeff Squyres	097b48d521	mca_base_component_respository.c: fix compiler warning This function is only used in the DL case -- it can be #if'ed out if we're not compiling with DL support to avoid a compiler warning about defined-but-not-used.	2015-06-17 08:54:59 -07:00
Jeff Squyres	dfa36197ea	usnic/Makefile.am: ensure static builds include -lfabric	2015-06-17 08:15:29 -07:00
Gilles Gouaillardet	2cef2d0fe6	opal/memory: silence a warning as reported by Coverity with CID 71663	2015-06-17 11:17:55 +09:00
Gilles Gouaillardet	58d1b3f4d0	opal_os_dirpath_create: fix TOCTOU as reported by Coverity with CID 70396	2015-06-17 11:17:54 +09:00
Gilles Gouaillardet	de66447ebb	opal_cmd_line_get_usage_msg: silence warning as reported by Coverity with CID 1269967	2015-06-17 11:17:54 +09:00
Gilles Gouaillardet	f2f66e6e63	opal_daemon_init: silence warning as reported by Coverity with CID 710642	2015-06-17 11:17:53 +09:00
Gilles Gouaillardet	8427e87ee9	opal_argv_delete: silence warning as reported by Coverity with CID 71914	2015-06-17 11:17:53 +09:00
Gilles Gouaillardet	d9c490cf9f	refactor opal_bitmap_get_string make it more efficient and fix CID 71992 (dead code)	2015-06-17 11:17:53 +09:00
Jeff Squyres	44e7646de9	usnic/configure.m4: convert to use external libfabric Use the new OPAL_CHECK_LIBFABRIC macro.	2015-06-15 15:17:06 -07:00
Jeff Squyres	3e1b85ceb3	libfabric: remove embedded libfabric OMPI now only builds against external libfabric installations.	2015-06-15 15:17:05 -07:00
Jeff Squyres	c74ab51dd4	opal/mca/dl/dl.h: fix the #ifndef/#define name Thanks to Scott Atchley for noticing the name mismatch.	2015-06-15 13:08:57 -07:00
rhc54	adbff46a13	Merge pull request #642 from rhc54/topic/hwloc Update hwloc to 1.11.0	2015-06-13 12:09:58 -07:00
Ralph Castain	ff92781ec4	Replace hwloc191 with hwloc1110 Fix hwloc compile. Ignore LAMA mapper due to deprecated hwloc functions	2015-06-13 10:11:45 -07:00
Jeff Squyres	4384131e65	openib: minor style and defensive programming fixes Minor comment/whitespace fixes. Also some minor logic changes that are mainly for defensive programming purposes (i.e., ensure to always set malloc_hook_set to true or false, and then check it before we try to actually invoke it).	2015-06-12 20:11:47 -07:00
Jeff Squyres	2f137ff151	openib: reset memalign threshhold properly Now that open-mpi/ompi#638 is fixed, reset the openib BTL memalign threshhold properly. This effectively re-instates commit open-mpi/ompi@ce915b5757.	2015-06-12 20:11:47 -07:00
Jeff Squyres	88c13adc8c	openib: only set the memory hook if it is enabled Instead of unconditionally setting the memory hook, only set it when the memory hooks are both available and have been enabled (e.g., opal/mca/memory/linux has decided that it can be enabled, and when the mpi_leave_pinned MCA param is set to 1, or is set to -1 and some component requested the memory hooks be enabled). If we set the memory hook when memory hooks are not enabled, __malloc_hook will be NULL, which will cause problems when btl_openib_malloc_hook() tries to invoke it. Fixes open-mpi/ompi#638.	2015-06-12 20:11:47 -07:00
Ralph Castain	12d3c9ca22	Revert "Fix a typo that incorrectly set the alignment threshold in the openib BTL." This reverts commit `ce915b5757`.	2015-06-10 14:02:49 -07:00

3722 commits