openmpi

Author	SHA1	Message	Date
Jeff Squyres	08285c6361	lt_interface: properly check OPAL_HAVE_LTDL_ADVISE	2015-02-11 12:25:20 -08:00
Jeff Squyres	4f1996df5d	various: remove $(LTDLINCL) from Makefile.am's that didn't need it	2015-02-11 12:25:20 -08:00
Ralph Castain	3de8c5c7c6	Cleanup the munge support - the credential cannot be reused for multiple connections	2015-02-10 20:34:35 -08:00
George Bosilca	e173f9b0c0	Somehow we lost one of the most critical parameters allowing the PML to decide how to order the different interconnects. Bring it back!	2015-02-10 20:32:05 -05:00
George Bosilca	7f4c5fa96f	Add the displacement of the element to the safeguard check.	2015-02-10 20:13:36 -05:00
Ralph Castain	3ae3b96c17	Fix master compilation - a buried header dependency must have been removed.	2015-02-10 07:22:10 -08:00
Mike Dubman	6816e3421f	Merge pull request #377 from regrant/ib_wr_fix fix problem with get_pathrecord posting too many recv requests	2015-02-10 08:47:23 +02:00
Ralph Castain	bef830efef	Fix debug output	2015-02-09 20:49:04 -08:00
Ralph Castain	07134f5b17	Add munge security	2015-02-09 20:49:03 -08:00
Ralph Castain	a3275aa867	Once again, fix the blasted singleton comm_spawn	2015-02-05 17:34:25 -08:00
Jeff Squyres	0dbbffb753	pmix_base_frame: use the "= { 0 }" initializer Per open-mpi/ompi#381, convert the specific initialization of opal_pmix to use the generic "= { 0 }" initializer. This form can be used to initialize any type when the intent is just to zero out / assign some value.	2015-02-05 17:51:06 -05:00
Ralph Castain	f28238af59	Fix a race condition seen by Absoft during finalize. Stop the orte progress thread without cleaning it up, thus allowing the frameworks to still cancel their posted recv's. Then cleanup the memory footprint afterwards.	2015-02-05 11:41:37 -08:00
Ralph Castain	4d882796b6	Silence warnings	2015-02-05 11:41:00 -08:00
Howard Pritchard	e508a4078e	Merge pull request #376 from regrant/ib_error_fix fixes OpenIB connect error reporting for ibv_* calls that return an errn...	2015-02-04 10:22:03 -07:00
Jeff Squyres	621af3aa07	pmix_base: fix global opal_pmix symbol for static linking on OS X OS X has weirdness when static linking. If a symbol is not initialized, it is put into the common block section, and Weird Things happen (linking when trying to use that global symbol will fail). If you initialize the variable, it goes into a different section (and linking to it will work). This link (that might go stale someday) has some information about OS X linker scope and treatment of symbol definitions: https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/executing_files.html#//apple_ref/doc/uid/TP40001829-98432-TPXREF120 Fixes #375.	2015-02-04 12:12:31 -05:00
Ryan Grant	de93497789	fix problem with get_pathrecord posting too many recv requests	2015-02-04 09:53:58 -07:00
Ryan Grant	5d5e9bc1f8	fixes OpenIB connect error reporting for ibv_* calls that return an errno	2015-02-04 09:09:14 -07:00
Jeff Squyres	a3728f09af	libfabric: add another missing file to the Makefile.am	2015-02-04 04:02:27 -08:00
Jeff Squyres	66a680879e	libfabric: fix header file name in Makefile.am	2015-02-03 19:41:25 -08:00
Jeff Squyres	cb7cc171f9	usnic: update README.txt notes Update notes about copying the usnic BTL between master and the v1.8 branch.	2015-02-03 15:54:36 -08:00
Jeff Squyres	edf7232e00	usnic: enable building with an external libfabric	2015-02-03 13:46:06 -08:00
Jeff Squyres	bfa54d5d7b	usnic: update to match new libfabric	2015-02-03 13:46:06 -08:00
Jeff Squyres	d2490d2fd8	libfabric: update Makefile.am to match new libfabric drop	2015-02-03 13:46:05 -08:00
Jeff Squyres	3dc0abfbc4	libfabric: update to (just past) 1.0rc1 Updated to Github ofiwg/libfabric@6b005d0d19.	2015-02-03 13:46:05 -08:00
Ralph Castain	d3267c200f	Add missing OMPI-changes to libevent 2.0.22	2015-02-02 20:57:40 -08:00
Jeff Squyres	965ccab6cc	libfabric: remove a few warnings Embedding libfabric is a temporary measure; I'm removing some warning notifications so that the output isn't so cluttered (we're getting the real warnings fixed upstream, but the OMPI community doesn't really care/need to see the warnings in the meantime).	2015-01-29 17:38:02 -08:00
Todd Kordenbrock	37e6096fe7	Copyright update.	2015-01-29 11:08:13 -06:00
Todd Kordenbrock	ca30e129e8	Add the option to use the Portals4 logical to physical table. This commit adds an MCA variable to select Portals4 logical addressing, populates the logical-to-physical mapping table and initializes the NI in this mode.	2015-01-29 11:08:13 -06:00
George Bosilca	b9a63cbe7a	One less warning.	2015-01-27 13:25:55 -05:00
Ralph Castain	294ebc907a	Fix singleton operations so they can work inside a slurm environment	2015-01-27 09:29:42 -06:00
Ralph Castain	ba25e8a0ce	Fix singletons	2015-01-27 09:29:42 -06:00
Ralph Castain	028b00154d	Complete implementation of the schizo framework to support OMPI component	2015-01-27 09:29:42 -06:00
Jeff Squyres	436223959d	usnic: update to match new libfabric APIs	2015-01-24 05:49:36 -08:00
Jeff Squyres	7d5755f62b	libfabric: update to ofiwg/libfabric@b3f7af4c67 Pull down a new embedded copy of libfabric from https://github.com/ofiwg/libfabric.	2015-01-24 05:48:48 -08:00
Howard Pritchard	4de512af66	Merge pull request #358 from hppritcha/topic/ugni_spawn_issue btl/ugni: use PMIX_GLOBAL for modex_send in ugni	2015-01-22 12:55:46 -06:00
Howard Pritchard	056daa05bf	btl/ugni: use PMIX_GLOBAL for modex_send in ugni Using PMIX_REMOTE is not the right thing for ugni BTL when it's possible that spawned ranks end up on the same node as some of the spawnee ranks.	2015-01-22 06:53:45 -08:00
Bert Wesarg	0d0a754c42	Remove VampirTrace.	2015-01-22 08:08:07 +01:00
Gilles Gouaillardet	9f80aa2d28	btl/openib: regression fix when rdmacm or udcm are disabled This fixes a regression introduced in open-mpi/ompi@661c35ca67 Thanks to Mark Santcroos for reporting this issue	2015-01-20 11:31:50 +09:00
George Bosilca	da83b084f5	Shifting the datatype around should alter its true LB and UB.	2015-01-19 02:28:17 -05:00
George Bosilca	3ae89dc686	Clarify some of the comments.	2015-01-19 02:26:59 -05:00
Rolf vandeVaart	66f6026214	Improve error message to help user figure out what to do	2015-01-16 13:55:27 -05:00
Jeff Squyres	65a279019e	usnic: fix typo in memchecker usage	2015-01-16 09:42:19 -08:00
Jeff Squyres	3969fe3a94	libfabric: ensure wrapper libs are loaded for static builds For static builds, we need to also set <framework>_<component>_WRAPPER_EXTRA_LIBS so that the wrappers know what other libraries to add to link executables.	2015-01-16 09:29:52 -08:00
Gilles Gouaillardet	661c35ca67	cleanup dead code caused by the removal of the --with-threads configure option	2015-01-16 19:13:59 +09:00
Gilles Gouaillardet	ac16970d21	opal_tree: use a safer syntax The Intel compiler incorrectly inlines this function, so use a safer syntax to get correct generated code.	2015-01-16 18:45:55 +09:00
Gilles Gouaillardet	5687ce8a07	Revert "opal/lifo: fix type declaration when cmpset_128 is available" This reverts commit `1ba36175be`.	2015-01-16 15:18:07 +09:00
Gilles Gouaillardet	1ba36175be	opal/lifo: fix type declaration when cmpset_128 is available	2015-01-16 15:12:29 +09:00
Gilles Gouaillardet	b23126497c	Merge branch 'master' of https://github.com/open-mpi/ompi	2015-01-16 10:55:35 +09:00
Nathan Hjelm	006074c48d	Merge pull request #332 from hjelmn/openib_updates Openib updates	2015-01-15 15:05:18 -06:00
Jeff Squyres	d13c14ec82	CSCus22527: fix off-by-one error in checking the number of VFs Ensure to count this process when checking for how many VFs we need on the local server. (cherry picked from commit 386c01934e98cb8dcb48ff648ecdfb0c8677baa9)	2015-01-15 11:44:29 -08:00
Jeff Squyres	4685767b2d	libfabric: update usnic configury Use new common m4 macro for choosing between libnl3 and libnl.	2015-01-15 07:12:39 -08:00
Jeff Squyres	400b02e566	libfabric: update to github:ofiwg/libfabric HEAD Specifically: bbf0f3ea8e92c92a7cee56473ecdbbbb34cceb7d (15 Jan 2015)	2015-01-15 07:11:54 -08:00
Gilles Gouaillardet	bf6adedd70	atomic/ia32: silence warnings	2015-01-15 18:53:58 +09:00
Aurélien Bouteiller	f49981bb2a	Disable coalescing until pull request #332 gets in.	2015-01-14 14:12:47 -05:00
Nathan Hjelm	cf4975501d	rcache/vma: fix parent class of mca_rcache_vma_t There was a mismatch between the structure for mca_rcache_vma_t and the OBJ_CLASS_INSTANCE. One was opal_list_item_t and the other was ompi_free_list_item_t. The super class in the structure looks like it is the correct one. Changed the superclass in OBJ_CLASS_INSTANCE to match.	2015-01-14 10:21:24 -07:00
Jeff Squyres	e4e5e7dbc0	usnic: ensure to clean up nicely in case of low resources If there are not enough resources (e.g., low VFs), we can end up calling finalize_one_channel() on the same channel multiple times. So ensure to NULL out fields that we have freed already so that we do not try to free them a second time. Fixes CSCus26648.	2015-01-13 14:37:31 -08:00
Jeff Squyres	8807ae2497	usnic libfabric: also set the us_netmask_be field. From libfabric upstream commit ofiwg/libfabric@3976745. Part of the fix for CSCus22495.	2015-01-13 12:04:57 -08:00
Jeff Squyres	d00cede718	usnic: fix if_include/exclude of CIDR-specified networks Fix the ordering so that we obtain the usnic netmask information before we do the filtering based on CIDR-specified networks. Also requires upstream Github libfabric commit 3976745. Fixes CSCus22495.	2015-01-13 12:04:51 -08:00
Jeff Squyres	a220b92cf8	usnic: fix function name in opal_output	2015-01-13 12:04:07 -08:00
Gilles Gouaillardet	955f3c2730	configury: check existence of the atomic_init function in libfabric Intel compilers implement atomic_init in C++ only, so disable C11 atomics in libfabric for now	2015-01-13 16:39:41 +09:00
Gilles Gouaillardet	cbe0d26b2d	configury: do test the __STDC_NO_ATOMICS__ macro for libfabric	2015-01-13 16:06:37 +09:00
Jeff Squyres	5ed688a074	usnic: ensure that we only get "usnic"-named providers Also, a minor update to a verbose message.	2015-01-12 13:21:22 -08:00
Jeff Squyres	881b1dcf19	usnic: document libfabric abstractions Handy tips to remember the libfabric abstractions and what they correspond to in usnic/VIC terms.	2015-01-09 15:21:51 -08:00
Gilles Gouaillardet	194d9f84d3	btl/usnic: move call to check_reg_mem_basics() avoid annoying memlock related messages when there is no usnic device.	2015-01-09 11:37:45 +09:00
George Bosilca	1344097d35	Turn OFF the TCP dump mechanism.	2015-01-08 18:50:49 -05:00
George Bosilca	8ddd3b3b09	Cleanup the TCP dump mechanism.	2015-01-08 18:50:05 -05:00
Nathan Hjelm	c65f026fee	btl/vader: fix typo in xpmem setup	2015-01-08 12:52:38 -07:00
Nathan Hjelm	9f6faadd91	opal_fifo: add missing memory barrier in pop Thanks to Adrian Reber for reporting this. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-01-08 09:14:56 -07:00
Gilles Gouaillardet	4c29d8e247	btl/openib: silence warning (unused code)	2015-01-08 17:18:07 +09:00
Gilles Gouaillardet	8ab605d9c5	btl/tcp: fix overflow in mca_btl_tcp_endpoint_dump()	2015-01-08 15:40:16 +09:00
Nathan Hjelm	7d206ae769	btl/ugni: fix a couple of bugs Two fixes: - Do not try to return a mailbox to the free list if one wasn't allocated. - Do not try to tear down IRQ CQs if they were not created.	2015-01-07 13:48:17 -07:00
Dave Goodell	49069bc661	usnic: fix fi_av_insert (ARP resolution) bugs We had several problems in the old code: 1. We were specifying an arbitrary timeout (100 ms) and then abandoning all remaining pending AV insert operations. We would then free the endpoint buffer that we gave to fi_av_insert(), usually causing libfabric's progress thread to write to a freed buffer. 2. We were claiming in a show_help message that the timeout was controllable via an MCA parameter. This commit removes that parameter, since there's no good method for us to specify a timeout like this to libfabric right now. 3. We also weren't waiting for the correct number of fi_av_insert() operations to complete. We were waiting for nprocs, which is accidentally fine for 2 procs on separate hosts, but not for most other proc counts. Reviewed-by: Jeff Squyres <jsquyres@cisco.com>	2015-01-07 08:25:17 -08:00
Gilles Gouaillardet	06e071454e	btl/openib: cleanup duplicate code	2015-01-07 14:07:30 +09:00
Gilles Gouaillardet	135ecce0eb	btl/openib: rename OPAL_HAVE_XRCD macro into OPAL_HAVE_CONNECTX_XRC_DOMAINS	2015-01-07 13:27:25 +09:00
George Bosilca	bf62bed65f	Typo in the poll/epoll ops declaration.	2015-01-06 21:21:25 -05:00
Ralph Castain	a7c5ff2ace	Update to libevent 2.0.22-stable	2015-01-06 16:37:25 -08:00
Nathan Hjelm	6733d89cf9	btl/vader: fix return code check when opening ptrace_scope file	2015-01-06 15:17:56 -07:00
Nathan Hjelm	cde79bfa60	btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths)	2015-01-06 11:39:23 -07:00
Nathan Hjelm	9bae131589	btl/openib: fix message coalescing There was a bug in the openib btl handling this valid sequence of calls: desc = btl_alloc (); btl_free (desc); When triggered the bug would cause either fragment loss or undefined behavior (SEGV, etc). The problem occurred because btl_alloc contained the logic to modify the pending fragment (length, etc) and these changes were not corrected if the fragment was freed instead of sent. To fix this issue I 1) moved some of the coalescing logic to the btl_send function, and 2) retry the coalesced fragment on btl_free if it was never sent. This appears to completely address the issue.	2015-01-06 11:39:16 -07:00
Nathan Hjelm	9aaac11648	btl/openib: fix receive queue source detection	2015-01-06 11:39:11 -07:00
Howard Pritchard	7df648f1cf	btl/openib: fix problems from commit `b3617e73` For systems with OFED's lacking XRC support, commit `b3617e73` broke the build of the openib btl. This commit addresses the issues introduced by this commit.	2015-01-06 11:31:12 -07:00
Ralph Castain	4c38c31ccf	Actually copy buffer contents when dss.copy of a buffer is requested	2015-01-06 09:09:06 -08:00
Gilles Gouaillardet	b3617e736e	btl/openib: add XRC support with OFED 3.12+ based on an original patch contributed by Bull.	2015-01-06 15:30:52 +09:00
Howard Pritchard	c857cc926c	Merge pull request #327 from hppritcha/topic/async_progress Topic/async progress	2015-01-05 16:20:44 -07:00
Dave Goodell	8afd8487f8	opal_stdint.h: fix "#pragma GCC" warnings This was more complicated than I would like, but it's just an unfortunate GCC/clang difference. I don't have access to all the C compilers out there, so this may still have problems with other compilers that implement some form of `#pragma GCC diagnostic` support but don't actually behave the same as some versions of GCC. fixes #323	2015-01-05 14:44:46 -08:00
Gilles Gouaillardet	9e9261e90a	pmix: correctly set locality flags in proc_flags do not use opal_process_info.cpuset which is not set at that time.	2014-12-26 15:37:08 +09:00
Howard Pritchard	0a6f841d5f	xpmem/config: simple xpmem search on Cray's Use the pkg-config related m4 functions to find out where Cray's xpmem.h and libxpmem are located on a system. With this commit, there is no longer any need to have to explicitly indicate an xpmem install location on the configure line, at least for Cray systems running CLE 4.X and 5.X.	2014-12-24 14:40:06 -07:00
Howard Pritchard	065c756860	btl/ugni: improve error handling Improve error handling when pthread functions return errors. Remove stale debug code.	2014-12-24 11:50:24 -07:00
Howard Pritchard	f8e354ce00	btl/ugni: add a request_progress_thread mca param Replace temporary environment variables with a MCA parameter for the ugni btl. A user wishing to use the ugni btl async. progress thread needs to set the request_progress_thread param to true. For example, using env. variable format: export OMPI_MCA_btl_ugni_request_progress_thread=1	2014-12-24 11:50:24 -07:00
Howard Pritchard	8b250cc15b	btl/ugni: more debug cleanup	2014-12-24 11:50:24 -07:00
Howard Pritchard	f0c519517b	btl/ugni: switch to using opal_progress Switch to invoking opal_progress from the async progress thread, rather than calling ugni btl specific progress.	2014-12-24 11:50:24 -07:00
Howard Pritchard	47747c1b27	btl/ugni: remove some debug output	2014-12-24 11:50:24 -07:00
Howard Pritchard	2d14c2a204	btl/ugni: switch to using tx cq irqs for rdma Verified via testing with unit tests, etc. that in fact BTE TX descriptors using CQs configured to generate IRQs were in fact working correctly on Cray XC. Disable send message back to self and just use IRQs generated by completion of TX descriptors posted to BTE.	2014-12-24 11:50:24 -07:00
Howard Pritchard	acd07d98da	btl/ugni: turn off chatty debug in irq cq setup	2014-12-24 11:50:24 -07:00
Howard Pritchard	0dec2f4af7	btl/ugni: mark btl frags for irqs as btl owned Make sure frags allocated to generate irqs to wake the progress thread, etc. set the MCA_BTL_DES_FLAGS_BTL_OWNERSHIP flag.	2014-12-24 11:50:23 -07:00
Howard Pritchard	d188f0bc6f	btl/ugni: honor enable_mpi_threads Honor enable_mpi_threads setting to enable the ugni btl async progress thread. If the app doesn't request thread-multiple the thread will not be created.	2014-12-24 11:50:23 -07:00
Howard Pritchard	43cdcb745f	btl/ugni: add missing mutex lock	2014-12-24 11:50:23 -07:00
Howard Pritchard	83bcbd1cf9	btl/ugni: compilation fixes Fix compilation problems in ugni btl associated with async progress additions.	2014-12-24 11:50:23 -07:00
Howard Pritchard	13ab8a9e5a	btl/ugni: use MCA_BTL_DES_FLAGS_SIGNAL Use MCA_BTL_DES_FLAGS_SIGNAL frag flag to indicate whether or not an interrupt needs to be delivered along with a control message going through smsg.	2014-12-24 11:50:23 -07:00
Howard Pritchard	3fc7b389ff	initial async progress changes for gni	2014-12-24 11:50:23 -07:00
Devendar Bureddy	ccafc62c07	OMPI: btl openib: fix max registrable memory calculation - by default allow to register maximum possible (i.e., 2 * total_memory) memory. This behaviour can be turned off using mca parameter "btl_openib_allow_max_memory_registration" - In fallback case, use device specific parameters to calculate memory limit.	2014-12-23 23:35:54 +02:00
Howard Pritchard	ffbf9738a3	btl/vader: disable SGI UV xpmem for now This commit allows master to build again on SGI UV systems. Fixes #322	2014-12-23 12:04:25 -07:00
Gilles Gouaillardet	f6da257477	configury: test external hwloc version is 1.8 or greater hwloc_topology_dup is only available from hwloc 1.8	2014-12-22 13:42:38 +09:00
Jeff Squyres	40dd4c5b76	configury: manually remove some stamp-h? files Due to what might be a bug in Automake, we need to remove stamp-h? files manually. See http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19418.	2014-12-20 08:32:57 -08:00
Jeff Squyres	d5b3e5802e	libfabric configury: add more tests Properly test for some dependent libraries; don't just assume elsewhere in Open MPI's configury will find those libraries. Also consolidate some CPPFLAGS and clarify some comments.	2014-12-20 08:32:47 -08:00
Jeff Squyres	012e008649	libfabric configury: make AC_CONFIG_FILES be unconditional Also add the generated config.h file to .gitignore.	2014-12-20 08:32:47 -08:00
Jeff Squyres	45ef0352d7	libfabric: do a proper check for intrinsic atomics	2014-12-20 08:32:46 -08:00
Jeff Squyres	ff1364cbe4	Revert "libfabric: add missing header file" That wasn't a missing header file; in fact, it should have been .gitignored! This reverts commit `35bf5fc60c`.	2014-12-19 17:39:30 -08:00
Jeff Squyres	35bf5fc60c	libfabric: add missing header file	2014-12-19 17:33:11 -08:00
Jeff Squyres	e0f660cb9e	libfabric: fix clang compile error in usnic provider From ofiwg/libfabric@0078c93ae4	2014-12-19 15:45:16 -08:00
Jeff Squyres	75797c4f30	libfabric: update embedded libfabric configury To support the newly-copied libfabric downloaded from github ofiwg/libfabric@8da3957de3.	2014-12-19 14:45:30 -08:00
Jeff Squyres	e2362988a9	libfabric: update to ofiwg/libfabric@8da3957de3 Pull down a new embedded copy of libfabric from https://github.com/ofiwg/libfabric.	2014-12-19 14:45:21 -08:00
Howard Pritchard	91b0d03bf2	pmix/cray: remove dead code	2014-12-19 13:08:23 -08:00
Ralph Castain	123fdd603f	If we are using hwthread cpus, then default to binding there, letting the user override to whatever they want	2014-12-19 08:04:28 -08:00
Rolf vandeVaart	26482db736	Bump up max send size. Gives much better performance for GPU transfers while only decreasing host transfers by a small amount.	2014-12-18 13:22:58 -08:00
Jeff Squyres	de31b08a24	Merge pull request #319 from miked-mellanox/topic/opal_path_nfs_autofs skip check for autofs if fstype is autofs jenkins: check	2014-12-18 15:47:16 -05:00
Mike Dubman	da5b8c6879	OPAL: skip comparison when fs=autofs in mtab, because we are looking for the real fs type	2014-12-18 21:42:25 +02:00
Jeff Squyres	c621d1e622	libfabric: don't LIBADD the common library in the static case Adding the libfabric common library in the --disable-dlopen case will result in duplicate symbols.	2014-12-18 11:04:08 -08:00
Jeff Squyres	140bb3d421	hwloc configure: fix typo -- add missing $ Arrgh! Missed a "$" in the last commit, making the test always false.	2014-12-18 10:25:43 -08:00
Jeff Squyres	be6d46490f	hwloc: only add CPPFLAGS if hwloc is actually being built As pointed out by @ggouaillardet, we were adding some unnecessary -I flags to CPPFLAGS when --without-hwloc was being used. This commit slightly updates the hwloc191 component configury to only add such things when the component is, in fact, going to be compiled/installed.	2014-12-18 08:56:49 -08:00
Jeff Squyres	c205c70f39	usnic libfabric: remove useless "config.h" includes This change was also committed upstream in libfabric.	2014-12-18 08:47:59 -08:00
Jeff Squyres	269d7f9713	openib: don't use opal_using_threads() in component_init Use the flag that was passed in, instead.	2014-12-17 15:08:43 -08:00
Jeff Squyres	c1b43b6753	libfabric: the LIBADD should be unconditional The LIBADD for the common libfabric library does not belong down in the providers; it needs to be set when the libfabric core itself decides to build.	2014-12-17 14:02:08 -08:00
Jeff Squyres	f1a5d3a90d	configury: propagate a libtool shared lib version for libfabric	2014-12-17 13:36:01 -08:00
Jeff Squyres	d6f059f538	configury: add some descriptive output messages in configure Ensure that the ofi MTL and the usnic BTL have good descriptive output messages in configure.	2014-12-17 13:36:01 -08:00
Jeff Squyres	6edc19d78d	libfabric: ensure that shell variables are initialized Ensure that the <provider>_happy shell variables are initialized to 0. Without this, the --without-libfabric case would leave them uninitialized, resulting in "test: -eq operator expecting a value" kinds of errors.	2014-12-17 13:36:01 -08:00
Rolf vandeVaart	f55de452ab	Change the way we register the sm memory pool with CUDA. Rather than just registering local free lists, register the entire pool as the local process does not know which memory the remote processes are using for free lists. Fixes performance problem we were seeing with copying out of memory (since host piece was not pinned).	2014-12-17 14:21:34 -05:00
George Bosilca	830df07202	Fix the indentation.	2014-12-16 16:07:42 -05:00
George Bosilca	146ab96e29	These variables are now unnecessary.	2014-12-16 16:05:00 -05:00
Aurélien Bouteiller	ee3b090316	The fallback case when yama is not installed was not correct in CMA vader	2014-12-16 14:39:14 -05:00
Aurélien Bouteiller	0bf860ef02	indentation	2014-12-16 14:22:26 -05:00
Jeff Squyres	95da4a5a0e	usnic: no longer use opal_using_threads() Instead, use the flag that is passed in.	2014-12-16 08:49:01 -08:00
Artem Polyakov	01601f3284	Merge pull request #305 from artpol84/timing Timing framework improvement	2014-12-16 15:13:48 +06:00
George Bosilca	357daa834e	Stay on the safe side: Only one thread is allowed to handle an event_base.	2014-12-15 23:19:51 -05:00
George Bosilca	2fec570fe7	There is no need to keep track of these events. They are scheduled as triggers in libevent, so one bookkeeping should be enough.	2014-12-15 22:35:29 -05:00
George Bosilca	46baab350c	The event is automatically deleted by default.	2014-12-15 21:59:20 -05:00
George Bosilca	b01abfa0d7	Don't over-do it!	2014-12-15 21:33:32 -05:00
George Bosilca	f87a4b691b	Solve another handshake problem, where one thread was calling del_event while cleaning up after receiving a zero byte on the connect socket (locally started connection), while another was trying to accept a new connection from the same peer. Create a zero-timed event and delocalize the accept into a timer_event. Add support for registering an error callback, that can be used when a connection is discovered as failed during the initialization process.	2014-12-15 20:27:32 -05:00
George Bosilca	e20413c885	Rearrange the code to remove a compiler complaint about the missing return from a non-void function.	2014-12-15 15:42:57 -05:00
Ralph Castain	573a574a3c	Remove an unused dstore type that was redundant with another one. Define a corresponding PMIX_NODE_ID type (contains the vpid of the daemon hosting the proc) and ensure that the PMIx server includes that info in its process map	2014-12-15 12:11:13 -08:00
Mike Dubman	2fbe87defe	Merge pull request #314 from miked-mellanox/topic/fix_opal_path_nfs add support for autofs and make check pass. jenkins: check,src_rpm	2014-12-15 20:52:52 +02:00
Ralph Castain	9658256a98	Restore the passing of the complete job map to the local proc on first get_attr so the info can be used by the MPI layer without continual calls back to the server. We'll find a more memory efficient method later.	2014-12-13 18:44:09 -08:00
Mike Dubman	42f3fa0d1e	OPAL: add support for autofs magic type	2014-12-13 20:27:47 +02:00
Jeff Squyres	9e6b157cb6	opal: minor update to guess_strlen This is a minor update to open-mpi/ompi@c52601f0c5. If we have vsnprintf(), we might as well not have the rest of the guess_strlen() routine. Also document the nifty trick/behavior of vsnprintf() that enables this shortcut (it was new to me!).	2014-12-13 08:09:34 -05:00
George Bosilca	2edbe16c47	Add the necessary infrastructure to allow the dumping of all TCP information related to an endpoint (status and all pending fragments). Do some minor space cleanup.	2014-12-13 01:59:55 -05:00
George Bosilca	5b8616d890	Fix the race condition in endpoint connection initialization. The race was quite subtle, and only happened on the process with the smallest guid (as this process will tear down the connection created locally and replace it with the result of accept). If multiple threads are active in the system, the deadlock occurs during the recv event deletion as one thread will hold the recv event lock of the endpoint and try to access the TCP event base lock, while the other thread will hold the TCP event base lock while trying to access the recv event lock (in case data is available on the socket). The proposed solution lets the event callback fail to process the data, preventing the deadlock and allowing the other thread to always complete its job. As the event is not executed, the same trigger will fire again at the next opportunity, so this solution introduces a minimal delay in the connection establishment.	2014-12-13 01:45:00 -05:00
Ralph Castain	c52601f0c5	It looks like the guess_len function in our local printf.c has some questionable code in it. Now that we are checking in configure for vsnprintf, take advantage of that check to use the far simpler method if it is available. Given that we no longer support such ancient systems where this might not be available, one suspects the other questionable code may no longer be required - but set that aside for another day.	2014-12-12 17:47:17 -08:00
Ralph Castain	bffb2b7a4b	Correct some issues with variables used before being set	2014-12-12 17:23:32 -08:00
Ralph Castain	0630680f36	Two cleanups required for transfer to 1.8.4: * Use %d format for the topo signature as some systems apparently have problems with %u * Use correct variable in show_help message	2014-12-12 17:23:32 -08:00
Howard Pritchard	6cf258638a	mpool/udreg: minor comment improvement	2014-12-12 14:05:18 -07:00
Nathan Hjelm	38d66272c5	btl/vader: fix compile on SGI UV	2014-12-12 09:09:01 -07:00
Jeff Squyres	e4b3c6f1c4	libfabric psm: fix (void*) dereference Committed upstream to libfabric as well.	2014-12-11 20:12:13 -08:00
Jeff Squyres	0f28233b35	libfabric: don't use __thread There's no real reason that this routine should use thread local storage. Plus, __thread appears to be a GCC extension.	2014-12-11 14:10:48 -08:00
Rolf vandeVaart	9ee8e1dcf4	With PGI compile we need stdarg.h for va_list define	2014-12-11 16:14:57 -05:00
Jeff Squyres	4551cab6f1	help messages: fix obvious typos	2014-12-11 12:23:33 -08:00
Nathan Hjelm	7e5af9cecf	opal_lifo: fix potential race condition when using 128-bit atomics On x86_64 reading a 128-bit value requires multiple instructions. Under some conditions if the counted pointer counter is read before the item pointer the fifo can be left in an inconsistent state. This commit forces the read of the counter to always be read first. The fifo does not appear to suffer from the same race.	2014-12-10 12:51:44 -07:00
rolfv	f471b09ae9	Add support for CUDA Unified memory. Basically, add a new flag and disable some optimizations when that flag is detected. Lightly reviewed by bosilca.	2014-12-10 05:46:00 -08:00
Artem Polyakov	8ffad75a0a	Introduce timing interval measurement facility in timing framework	2014-12-10 16:47:49 +06:00
Nathan Hjelm	52ed5a9bf8	opal_lifo: fix one more potential issue with the new 128-bit lifo atomics It is possible the compiler can reorder the read of the head item and the head itself. This could lead to a situation where the item returned was not really the head item.	2014-12-09 21:48:14 -07:00
Nathan Hjelm	a40fe8311f	opal_lifo: add missing memory barriers in 128-bit atomic functions	2014-12-09 19:50:08 -07:00
Jeff Squyres	e6c8bfc201	libfabric: Gah -- also remove the "pragma pop" line Thanks to Nathan for pointing out that I missed snipping one line in `2f9c69f016` (I removed the trailing comment, but not the trailing pragma -- oops!).	2014-12-09 14:03:39 -08:00
Jeff Squyres	2f9c69f016	libfabric: use correct C99 notation for var-length array Nathan pointed out the correct C99 way to notate a variable-length array in a struct. This change has now been accepted upstream in libfabric.	2014-12-09 13:33:15 -08:00
Nathan Hjelm	d0da29351f	opal_progress: fix sched_yield check	2014-12-09 14:14:20 -07:00
Jeff Squyres	cd0a54d76f	usnic: short term fix to enable builds on non-libfabric platforms This isn't quite the Right fix yet, because it doesn't address usnic for external libfabric builds. I'll fix that separately / later.	2014-12-09 09:19:26 -08:00
Nathan Hjelm	b2b7ecc7c4	Merge pull request #300 from hjelmn/topic/atomic_lifo_fifo Add opal_fifo_t class and rename opal_atomic_lifo_t to opal_lifo_t	2014-12-09 10:54:50 -06:00
Jeff Squyres	c40fd09d2a	libfabric: fix providers to conditionally add libs/flags Only allow the usnic and PSM providers to add CPPFLAGS and LIBADD flags when they are going to be built.	2014-12-09 07:15:25 -08:00
Jeff Squyres	45d6f29a27	Merge pull request #310 from yburette/master libfabric: add optional PSM provider.	2014-12-09 06:39:34 -08:00
Jeff Squyres	6e24a1eb85	usnic: update for libfabric API change Use FI_ADDR_UNSPEC for posting a receive from an unspecified source.	2014-12-09 06:06:52 -08:00
Jeff Squyres	f5a07f651c	libfabric: Open MPI addition to stem a flood of warnings Add a pragma to not warn about zero-length arrays. This needs to be addressed upstream, but for now, do it here.	2014-12-09 06:04:37 -08:00
Jeff Squyres	f331f48796	libfabric: update embedded libfabric to 934a714 Update the embedded copy of libfabric to the github ofiwg/libfabric repo hash 934a714ca85f1a30a1e384a7d5f714ee962dc253.	2014-12-09 06:03:51 -08:00
Jeff Squyres	09d03a154b	libfabric: fix some typos in the usnic configury	2014-12-09 05:52:24 -08:00
Ralph Castain	18d9fdfd8d	Restore full topology comparison to support inventory monitoring	2014-12-09 01:33:06 -08:00
Ralph Castain	9b2f8cd840	Add the processor architecture to the topology signature	2014-12-09 01:17:00 -08:00
Howard Pritchard	3a14c8eeff	fix build for cray xc Recent addition of libfabric embedded broke build on Cray XC/XE. This commit fixes this problem.	2014-12-08 22:21:13 -08:00
Yohann Burette	f90a7b51d2	libfabric: add optional PSM provider.	2014-12-08 16:49:41 -08:00
Ralph Castain	bb529ebd8e	Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pick up major differences (e.g., where we have different numbers of sockets or assigned core bindings). Retain the hetero-nodes flag for those cases where the user knows that there are differences and our automated system isn't good enough to see it. Will obviously require further refinement as we find out which variances it can detect, and which it cannot.	2014-12-08 15:38:14 -08:00
Yohann Burette	f33a9afd22	libfabric: fix typo in Makefile.am	2014-12-08 13:19:43 -08:00
Jeff Squyres	ac8e9d103c	libfabric: need to make AM_CONDITIONALs always be run Ensure that the usnic-specific AM_CONDITIONAL for the embedded libfabric is always run.	2014-12-08 11:51:26 -08:00
Jeff Squyres	d64881f040	psm_am.h: add missing file from libfabric snapshot This is just about to be fixed upstream, but "make dist" was not including this file in the libfabric tarball.	2014-12-08 11:39:08 -08:00
Jeff Squyres	d02756cdbb	libfabric: various configury updates 1. Ensure to override CFLAGS properly. Move the setting of CFLAGS outside the AM_CONDITIONAL so that Automake doesn't get confused (because CFLAGS is already set inside an AM_CONDITIONAL -- moving it outside the conditional ensures that this local CFLAGS override trumps all other CFLAGS overrides). 2. Only build libfabric on Linux. Add a little more configury to ensure that we only try to build libfabric on Linux. 3. Remove a dead/unused file 4. Fix typo in condition check 5. Use "false", not "/bin/false"	2014-12-08 11:39:07 -08:00
Jeff Squyres	92818d1fa5	usnic: remove SVN-style $Id$ tokens (and #idents) This commit is also upstream in libfabric.	2014-12-08 11:39:07 -08:00
Jeff Squyres	9547345b18	usnic: fix show_help message Rename a few symbols to use libfabric-friendly names. Fix a show_help message when fi_av_insert times out.	2014-12-08 11:39:07 -08:00
Jeff Squyres	8e49cc754f	usnic: update to latest libfabric API changes	2014-12-08 11:37:37 -08:00
Jeff Squyres	c4e8d67515	libfabric: sync to upstream libfabric github Bring down the latest from the libfabric github, as of 9d051567c8eb7adc2af89516f94c7d0539152948.	2014-12-08 11:37:37 -08:00
Jeff Squyres	7a96b58882	common verbs: remove usnic-specific code Now that the usnic BTL uses libfabric, we can remove the usnic-specific code from opal/mca/common/verbs.	2014-12-08 11:37:37 -08:00
Jeff Squyres	984982790a	usnic: convert from verbs to libfabric (yay!) This commit represents the conversion of the usnic BTL from verbs to libfabric. For the moment, libfabric is embedded in Open MPI (currently in the usnic BTL). This is because the libfabric API is still changing, and also has not yet been released. Ultimately, this embedded copy of libfabric will likely disappear and the usnic BTL will rely on an external installation of libfabric. New configure options: * --with-libfabric: will cause configure to fail if libfabric support cannot be built * --without-libfabric: will prevent libfabric support from being built * --with-libfabric=DIR: use an external libfabric installation * --with-libfabric-libdir=LIBDIR: when paired with --with-libfabric=DIR, use LIBDIR for the libfabric installation library dir The --with-libnl3[-libdir] arguments are now gone.	2014-12-08 11:37:37 -08:00
Nathan Hjelm	113b6bbdca	opal_stdint.h: fix GCC diagnostic pragma	2014-12-05 07:19:14 -07:00
Gilles Gouaillardet	32bac600f7	opal: fix a warning caused by the introduction of opal_int128_t type	2014-12-05 12:14:31 +09:00
Nathan Hjelm	d1114ec17a	Add opal_fifo_t class This commit adds a new class: opal_fifo.h. The new class has atomic, non-atomic, and opal_using_threads() conditioned routines. It should be used when first-in first-out is required and should perform much better than using locks and an opal_list_t. Like with opal_lifo_t there are two versions of the atomic implementation: 128-bit compare-and-swap, and spin-locked. More implementations can be added later (LL/SC comes to mind). This commit also adds a unit test for the opal_fifo_t class. This test verifies the fifo implementation when using multiple threads.	2014-12-04 15:30:02 -07:00
Nathan Hjelm	20c6eb5237	Rename opal_atomic_lifo_t to opal_lifo_t and improve interface - Rename opal_atomic_lifo_t to opal_lifo_t to reflect both atomic and non-atomic usage. Added new routines (opal_lifo_*_st) for non-atomic usage as well as routines conditioned off opal_using_threads(). The atomic versions are always thread safe and the non-atomic are always not thread safe. - Add a new atomic lifo implementation that makes use of 128-bit compare-and-swap. The new implementation should scale better with larger numbers of threads. - Add threading unit test for opal_lifo_t.	2014-12-04 15:30:02 -07:00
Nathan Hjelm	a0083ceab4	Adjust cmpxchg16b clobber list	2014-12-04 15:29:28 -07:00
Nathan Hjelm	fe787512d8	Add support for __sync builtin compare and swap on 128-bit values	2014-12-04 09:23:51 -07:00
Nathan Hjelm	250f749602	Fix return type of opal_atomic_cmpset_128. The return type will be opal_int128_t after the fetching atomics changes but for now it is int.	2014-12-04 09:23:51 -07:00
Nathan Hjelm	b1632dfb3c	Define opal_int128_t type if a 128-bit integer is available. There currently is no standard support for 128-bit integer types. Any use of the __int128 and int128_t types can lead to warnings from the compiler when using -Wpedantic. Additionally, some compilers may support __int128 and others may support int128_t. This commit addresses both issues by defining opal_int128_t if there is a supported 128-bit type. In the case of GCC a pragma has been added to suppress warnings about __int128 not being a standard C type.	2014-12-04 09:23:51 -07:00
Nathan Hjelm	b2b58b31a2	Add support for 128-bit compare and swap on x86_64 when available. A 128-bit compare-and-swap will enable a better atomic lifo implementation that uses the pointer + counter method to avoid ABA issues. This commit adds configury to check for the instruction (cmpxchg16b) and adds an implementation that uses the __int128 type available in C99.	2014-12-04 08:53:28 -07:00
George Bosilca	04a4cbd77a	Fix the clock_gettime monotonic timer. Thanks to Gilles for the first sketch of the patch.	2014-12-04 00:20:56 -05:00
Jeff Squyres	983bd49f11	opal_timer_require_monotinic: change to bool / level 5	2014-12-03 17:09:43 -08:00
Jeff Squyres	8880b070b8	Merge pull request #295 from jsquyres/topic/bosilca-accurate-timers Topic/bosilca accurate timers	2014-12-03 19:46:14 -05:00
Jeff Squyres	cf35e0c28c	timers: fix 32 bit compile of timer	2014-12-03 16:43:33 -08:00
Howard Pritchard	c67afadcfc	Merge pull request #289 from hppritcha/topic/remove_pmi Topic/remove pmi	2014-12-03 16:58:35 -07:00


3239 commits