openmpi

Author	SHA1	Message	Date
Jeff Squyres	0c502d90cd	hwloc README-ompi.txt: update for what we pulled from hwloc Document what we pulled from the hwloc tree.	2015-03-27 06:49:42 -07:00
Brice Goglin	29ccbfd590	hwloc pci: fix bridge depth It was setup in the PCI backend before filtering, and partially updated after filtering in the core. Only setup once correctly after filtering in the core. (cherry picked from commit open-mpi/hwloc@9659653d24) Conflicts: tests/hwloc/linux/40intel64-2g2n4c+pci.output tests/hwloc/xml/192em64t-12gr2n8c2t-distancegroups.xml tests/hwloc/xml/192em64t-24n8c2t-distancegroups.xml tests/hwloc/xml/192em64t-24n8c2t-nodistancegroups.xml tests/hwloc/xml/24em64t-2n6c2t-pci.xml tests/hwloc/xml/32em64t-2n8c2t-pci-normalio.xml tests/hwloc/xml/96em64t-4n4d3ca2co-pci.xml utils/hwloc/test-hwloc-compress-dir.input.tar.gz utils/hwloc/test-hwloc-compress-dir.output.tar.gz Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 06:49:39 -07:00
Brice Goglin	1905f35a1e	hwloc: bitmap: fix a corner case in hwloc_bitmap_isincluded() with infinite sets If super_set contains more allocated ulongs than sub_set, we did not check the last ulongs. We would return true instead of false when sub_set is infinite while the last ulongs in super_set are not full. This fixes tests/hwloc_bitmap_compare_inclusion on some platforms. (cherry picked from commit open-mpi/hwloc@299e6e846f) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:44 -07:00
Brice Goglin	5c9157c547	hwloc: core: only update root->complete sets if insert succeeds Otherwise we get spurious bits for crazy topologies such as 8em64t-2s2ca2c-buggynuma.output Will make debug asserts easier. (cherry picked from commit open-mpi/hwloc@546cd9330a) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:44 -07:00
Brice Goglin	dec01097f8	hwloc: groups: add complete sets when inserting distance/pci groups Make sure we define complete cpuset/nodeset when we define groups' main cpuset/nodeset during later insert of groups (for PCI hostbridges or distances). Otherwise they may end up clearing child/parent complete sets which suddenly become incoherent while they were fixed earlier. Needed to fix allowed_nodeset meaning. (cherry picked from commit open-mpi/hwloc@7c88d17add) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:33 -07:00
Brice Goglin	d6e415cd41	hwloc: AIX: Fix PU os_index When looking for PUs inside R_MAXSDL rads, some AIX 6.1 releases return one first rad without any PU. AIX 6.1 00F63F144C00 does (on quad-power7). AIX 6.1 00CBAAC24C00 doesn't (on 16x power6). So we can't assume rad #x contains PU #x. But we already have the right code to fill the cpuset from the rad, so use that to obtain the PU os_index as well. Cannot be used to obtain NUMA node os_index since there's no way to directly retrieve NUMA nodes from rads (mempools seem unrelated). Just keep using #rad for NUMA nodes os_index and document that convention when converting back in set_membind(). Thanks to Hendryk Bockelmann and Erik Schnetter for helping debugging. (cherry picked from commit open-mpi/hwloc@60006c7b88) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:33 -07:00
Brice Goglin	80140bbe7b	hwloc: distances: when we fail to insert an intermediate group, don't try to group further above Otherwise we'll have some NULL objects above, would be annoying. No need to dig further, the distance matrix is likely buggy. We still keep the inserted groups at this level (incomplete level) because removing them is hard. (cherry picked from commit open-mpi/hwloc@312a971ec9) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:33 -07:00
Brice Goglin	29c99156cf	hwloc: pci: fix SR-IOV VF vendor/device names Commit 626129d2818693e62b83c1cfa2ba6e058e5bed66 fixed the hwloc device/vendor numbers obtained from libpciaccess. But the corresponding names are still retrieved from pciaccess numbers, so fix these numbers inside pciaccess structures before retrieving the names. (cherry picked from commit open-mpi/hwloc@85ea6e4acc) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:32 -07:00
Brice Goglin	da164be0ef	hwloc: error: point to the FAQ when displaying the big OS error message (cherry picked from commit open-mpi/hwloc@b191f816f6) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:32 -07:00
Brice Goglin	3f96e7a271	hwloc: synthetic: Misc levels are not allowed in the synthetic description Misc objects were used between system and machine in the past but quickly got replaced with groups. (cherry picked from commit open-mpi/hwloc@6c2aa6d1ea) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:16 -07:00
Brice Goglin	050bb35feb	hwloc: x86: use Group instead of Misc for unknown x2apic levels Misc are reserved for annotating the topology, the core doesn't like merging them. Group is more appropriate. (cherry picked from commit open-mpi/hwloc@3c47649591) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:16 -07:00
Brice Goglin	379c7b0d8b	hwloc: x86: use ulong for cache sizes, uint won't be enough in the near future (cherry picked from commit open-mpi/hwloc@ae82597773) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:16 -07:00
Brice Goglin	6caf9edbea	hwloc: hpux: improve hwloc_hpux_find_ldom() looking for NUMA node hwloc_get_first_largest_obj_inside_cpuset() returns the largest/highest object, but it could still have a child with the same cpuset. So check children as well in case there's a matching NUMA node there. (cherry picked from commit open-mpi/hwloc@57a1c4fbe4) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:16 -07:00
Brice Goglin	fff1bb5dcd	hwloc: core: reorder children in merge_useless_child() as well When ignore_keep_structure is enabled, intermediate level can disappear between parent and child, making the new child complete_cpuset smaller, causing the child list to require a reorder just like in remove_ignored(). (cherry picked from commit open-mpi/hwloc@88afbe6b62) Embed this related commit: core: abstract out reorder_children(), needed when merging modifies the list of children (cherry picked from commit open-mpi/hwloc@14db82d391) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:16 -07:00
Brice Goglin	77978a846e	hwloc: core: fix the merging of identical objects in presence of Misc objects If object A contains B + I/O as children, we can "ignore" I/Os and still try to merge A and B. We now do the same for Misc objects without cpusets instead of I/Os. This fixes a corner case when export/reimport to XML creates a slightly different topology (making hwloc_insert_misc fail inside a Linux cgroup). Thanks to Dave Love for reporting the problem. Fixes #118 (cherry picked from commit open-mpi/hwloc@650371e115) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:15 -07:00
Brice Goglin	5427b33caf	hwloc: debug: fix an overzealous assertion about the parent cpuset vs its children When I/O are attached under a PU, removing the children's cpusets from the parent cpuset doesn't give 0, it gives the PU cpuset. The assertion fails on single-pu machines with I/O when --merge is given, only one PU remains with I/O under it. But if we insert Misc by cpuset under PU, it gives 0 as expected. Fix the assertion accordingly. Thanks to Thomas Van Doren for reporting the issue. (cherry picked from commit open-mpi/hwloc@45c94c336d) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:15 -07:00
Brice Goglin	9b59d532fc	hwloc: cpuid-x86: Fix duplicate asm labels in case of heavy inlining on x86-32 hwloc_x86_discover() calls hwloc_look_x86() twice, which calls hwloc_have_x86_cpuid(). If everything gets inlined, the asm label inside hwloc_have_x86_cpuid() is duplicated. Use a local label with f annotation in jumps to avoid the problem. Thanks to Thomas Van Doren for reporting the issue (found with gcc -m32). (cherry picked from commit open-mpi/hwloc@50e447f5bc) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:15 -07:00
Brice Goglin	86a536ca58	hwloc: x86 and OSF: Don't forget to set NUMA node nodeset x86: Not critical since BSDs that use this backend have no membind support, but better fix it for uniformization. (cherry picked from commit open-mpi/hwloc@a431361c7d) OSF: Looks like nobody ever tried to play with memory binding on OSF/Tru64. (cherry picked from commit open-mpi/hwloc@2d6c73356d) Conflicts: NEWS Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:15 -07:00
Brice Goglin	db5bc72496	hwloc: API: clearly state that os_index isn't unique while logical_index is (cherry picked from commit open-mpi/hwloc@6c75302ab2) Conflicts: opal/mca/hwloc/hwloc191/hwloc/include/hwloc.h Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:14:15 -07:00
Brice Goglin	a636790604	hwloc: opal/mca/hwloc/hwloc191/hwloc/NEWS update Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:13:41 -07:00
Brice Goglin	7c96aecfaf	hwloc: errors: improve the advice to send hwloc-gather-topology files in the OS error message (cherry picked from commit open-mpi/hwloc@f77aa01b3c) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:13:41 -07:00
Brice Goglin	d5f8c89527	hwloc: configure: fix the check for X11/Xutil.h At least some solaris enforce the need to #include X11/Xlib.h first. Thanks to Siegmar Gross for reporting the issue. (cherry picked from commit open-mpi/hwloc@005a7e89b6) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:13:01 -07:00
Brice Goglin	50b035dddb	hwloc: misc.h: Fix hwloc_strncasecmp() with some icc tolower needs <ctype.h> Thanks to Ralph Castain for reporting the failure. (cherry picked from commit open-mpi/hwloc@038c372a58) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:10:57 -07:00
Brice Goglin	6764413aa3	hwloc: misc.h: Fix hwloc_strncasecmp() build under strict flags on BSD strncasecmp() needs <strings.h> Thanks to Pavan Balaji for reporting the failure. (cherry picked from commit open-mpi/hwloc@37439c4801) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:10:46 -07:00
Brice Goglin	6b0011f138	hwloc: v1.9.1 released, doing 1.9.2rc1 now Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2015-03-27 05:06:15 -07:00
Jeff Squyres	a85edb8ad4	libfabric: update to Github libfabric 0d7daf720f04	2015-03-26 14:40:46 -07:00
Elena	90f5b2bb84	Introduce -tune command line option to set env vars and mca params from file	2015-03-26 18:33:53 +02:00
Ralph Castain	1b24536941	Allow for different security domains. Let the initiator of the connection determine the method to be used - if the receiver cannot support it, then that's an error that will cause the connection attempt to fail.	2015-03-25 13:22:01 -07:00
Ralph Castain	9dbc69df0f	Stop an ugly infinite loop caused by continual re-opening of the opal if framework.	2015-03-24 17:50:14 -07:00
Rolf vandeVaart	dfb7e00ef5	Make sure context is still around when doing some other cleanup	2015-03-24 16:47:40 -04:00
Ralph Castain	ed5d10b816	Somehow slipped by - ensure we correctly count the cores	2015-03-19 17:56:18 -07:00
Ralph Castain	43a3baad5e	Ensure we use the first compute node's topology for mapping Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes. Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset. Correctly count the number of available PUs under each object when given a cpuset Fix the default binding settings, and correctly count PUs when no cpuset is given Ensure the binding policy gets set in all cases	2015-03-19 16:30:36 -07:00
Nathan Hjelm	ccba8ce856	Merge pull request #457 from hjelmn/mpit_fixes mca/base: fix bugs in framework deregistration/re-registration	2015-03-18 08:37:49 -06:00
Ralph Castain	d7d8ae46ed	We no longer pass the RML URI for procs launched via mpirun as the daemon has no need for that info.	2015-03-17 06:10:20 -07:00
Mike Dubman	7640507438	Merge pull request #472 from miked-mellanox/topic/fix_compile_warn btl/openib: fix compiler warning, by HalR	2015-03-13 14:06:07 +02:00
Jeff Squyres	4ab9e67832	hwloc external: portability updates Change "test -a" to "&& test", and change foo="$bar" to foo=$bar. No substantive code changes.	2015-03-13 04:40:09 -07:00
Jeff Squyres	4d63c88ed1	hwloc external: whitespace cleanup, no code changes	2015-03-13 04:40:05 -07:00
Mike Dubman	00784ae3ba	btl/openib: fix compiler warning, by HalR	2015-03-13 13:17:23 +02:00
Todd Kordenbrock	9350b06f7d	btl-portals4: fix compiler warnings	2015-03-12 20:34:04 -05:00
Jeff Squyres	65a0e041ac	dl: need to use LIBADD, not LIBS When we use LIBADD for static libraries, the dependent libraries get propagated properly. For example, the dl/dlopen component will almost certainly require the -ldl library; when using LIBS, that doesn't get propagated elsewhere in the tree, but when using LIBADD, it does (e.g., when linking opal_wrapper_compiler).	2015-03-12 15:01:14 -07:00
Ryan Grant	6f76984a3c	Merge pull request #470 from tkordenbrock/topic/update-portals4-to-btl3 btl-portals4: implement the BTL 3.0 interface	2015-03-12 15:34:05 -06:00
Jeff Squyres	a1daa39425	libfabric: update to Github libfabric 90ac5a258418e Update to latest upstream Github libfabric in order to fix some usnic bugs.	2015-03-12 13:23:32 -07:00
Todd Kordenbrock	d1656347c8	btl-portals4: implement the BTL 3.0 interface	2015-03-12 14:19:44 -05:00
adrianreber	714d9aa67e	Merge pull request #348 from adrianreber/topic/orte_cr_continue_like_restart Topic/orte cr continue like restart	2015-03-12 14:54:02 +01:00
Nathan Hjelm	fd78491768	Merge pull request #451 from elenash/master fix: mca_base_env_var mca parameter is never handled if it's set from am...	2015-03-11 09:54:25 -06:00
Nathan Hjelm	ce6caab2a7	Merge pull request #463 from hjelmn/cuda_async btl/openib: cuda: fix CUDA-aware support with async copy	2015-03-11 09:52:48 -06:00
Jeff Squyres	c61dd4d56f	usnic: each err eq entry reports 1 completion Actually, the return from fi_eq_readerr() only indicates a single error completion (not err_entry.data completions).	2015-03-11 08:07:20 -07:00
Ralph Castain	2de5cd6e5f	Ensure we don't install the libevent internal headers	2015-03-11 07:35:20 -07:00
Nathan Hjelm	395635f017	Merge pull request #461 from hjelmn/btl_openib_cleanup btl/openib: remove derived btl segment type	2015-03-11 08:20:41 -06:00
Jeff Squyres	9c926e5e82	usnic: add more comments/explanation about error cases If we really get a catastrophic error from a libfabric call, don't bother trying to continue (because data has been corrupted and there's nothing sane left to do). Just call opal_btl_usnic_exit() (which tries to call the PML error callback, but we're so early in the module_init process that this likely hasn't been setup yet, so the job will likely abort).	2015-03-11 07:16:28 -07:00
Jeff Squyres	51583789fb	usnic: re-indent some show_help code Nothing too substantial here, but two of the messages moved from "libfabric API failed" to "internal error during init", just to be a bit more descriptive.	2015-03-11 07:15:28 -07:00
Jeff Squyres	1b836d784c	usnic: subtract number of errored insertions from loop count When we get errors, the entry.data field tells us how many errors are being reported. So decrement the loop count variable by that much. This fixes CSCut30441.	2015-03-11 07:13:10 -07:00
Adrian Reber	f45dd069bd	FT: fix compilation using --with-ft (1/5) Enabling the FT code breaks compilation (again). This series tries to fix the compiler errors. This is again only fixing the compiler errors without any warranty that the result might actually support FT again. This first patch moves orte_cr_continue_like_restart from ORTE to opal_cr_continue_like_restart in OPAL. This only leaves three calls from OPAL to ORTE in the FT code. As it is not yet 100% clear how to handle these calls the code orte_sstore.set_attr() has been #ifdef'd out for now.	2015-03-11 14:23:33 +01:00
Nathan Hjelm	b308afa8fd	btl/openib: remove derived btl segment type The derived segment type (btl_openib_segment_t) was intended to store the registration info needed for put and get. In BTL 3.0 this is no longer required. I intended to remove this type as part of open-mpi/ompi@74f1af4548 . Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-03-10 14:41:15 -06:00
Nathan Hjelm	3d32dbd793	btl/openib: cuda: fix CUDA-aware support with async copy This commit should resolve an issue seen with CUDA-aware support. The problem came in with BTL 3.0. Before 3.0 the size of the copy was stored in the incoming segment's des_remote_count field. This field does not exist in BTL 3.0 so I stored the value in the des_segment_count field. This caused problems with the cuda support code. To fix the issue the endpoint pointer is now stored in the in fragment's endpoint pointer which frees up the segment's des_cbdata pointer for storing the transfer size. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-03-10 14:38:12 -06:00
Jeff Squyres	cbd99d5f60	libfabric: update to Github upstream 1b4bb2285b Get a usnic bug fix.	2015-03-10 12:09:02 -07:00
Jeff Squyres	d97551bdb1	usnic: endpoint type hint moved to a sub-struct Update to match new libfabric API/structure change.	2015-03-10 09:47:41 -07:00
Jeff Squyres	1a1be2efa0	libfabric: update to Github upstream 7095f3dc	2015-03-10 09:47:40 -07:00
Jeff Squyres	afec1454f5	usnic: only setup the connectivity checker if we have modules If we ended up with no modules (e.g., all usnic devices were excluded), there was a race condition in that the connectivity agent could tear down its local socket before one or more of the local clients saw it. Therefore, the local clients would timeout waiting for the socket to appear. So move the connectivity checker init later in the bootstrapping process (it must be setup before module_init()), and have it only invoked if we actually ended up with one or more modules.	2015-03-10 07:43:20 -07:00
Jeff Squyres	06accb721c	usnic: ensure to free all resources if no usnic BTLs found If all usnic devices are excluded, then we need to ensure the error path includes freeing the filter. This was Coverity CID 1288085	2015-03-10 07:43:20 -07:00
Jeff Squyres	8fef4e865f	dl dlopen: fix use-after-free Re-structure the loop looking for duplicates a little so that we only have a single free of the string that happens regardless of whether we found a duplicate or not. This was Coverity CID 1288090	2015-03-10 07:43:20 -07:00
Jeff Squyres	3efb5f56ae	dl dlopen: ensure dirs is not NULL opal_argv_split() may have returned NULL. This was Coverity CID 1288088	2015-03-10 07:43:20 -07:00
Jeff Squyres	86968dcdda	dl dlopen: fix resource leak closedir() was one block higher than it should have been. This was Coverity CID 1288087.	2015-03-10 07:43:20 -07:00
Jeff Squyres	546ad3f060	dl dlopen: free resources upon error Ensure to take the right path out upon errors (that will free any pending resources). This was Coverity CID 1288086	2015-03-10 07:43:19 -07:00
Rolf vandeVaart	49b5eb6c91	Fix missing initialization of variable	2015-03-10 10:33:27 -04:00
Jeff Squyres	4b2cba46f4	usnic: fix bootstrap error paths Fix previously-unfinished error paths during startup/bootstrapping. Instead of just blindly continuing on when an fi_* function call fails, opal_show_help and skip that device. Also, only check the usnic config minimums once. They're VIC-wide and won't change on a per-device basis -- we only need to check them once. Fixes CSCut19179.	2015-03-09 16:57:41 -07:00
Nathan Hjelm	005c6022e2	mca/base: fix bugs in framework deregistration/re-registration There were a number of bugs in the framework/variable code that affected deregistration: - Frameworks could be erroneously closed if separately registered and opened then subsequently closed. This was a bug in the original design which only reference counted opens but not registrations. This would cause undefined behavior if MPI_T_finalize actually calls ompi_info_close_components as intended. Now both registrations and opens are reference counted and frameworks/components are not torn down until the matching number of close calls have been made. - group_find_by_name did not pass the invalidok flags down to mca_base_var_group_get_internal correctly. - Group deregistration caused the group to be completely reset. This does not match the behavior required by MPI_T as it could reduce the number of variables/subgroups in a group. This commit also updates MPI_T_finalize to call ompi_info_close_components as originally intended. Closes #374 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-03-09 16:52:53 -06:00
Jeff Squyres	1995f6beba	cuda: convert to opal_dl interface	2015-03-09 08:18:13 -07:00
Jeff Squyres	a9d86129c6	mca base: convert to opal_dl interface	2015-03-09 08:16:55 -07:00
Jeff Squyres	39364d315c	libltdl: dl component based on libltdl Works on any system that libltdl supports and has ltdl.h and libltdl available.	2015-03-09 08:16:55 -07:00
Jeff Squyres	7d340c0c26	dlopen: simple dl component based on POSIX dlopen Works on systems with dlopen (e.g., Linux and OS X). It requires dlfcn.h and libdl, which many systems have installed by default.	2015-03-09 08:16:55 -07:00
Jeff Squyres	e81c070ef0	dl framework: new dynamic loader framework Embedding libltdl without the use of Libtool bootstrapping has proven... difficult. Instead, create a new simple "dl" framework. It only provides 4 functions: - open a DSO (very similar to lt_dlopenadvise()) - lookup a symbol in a previously-opened DSO (very similar to lt_dlsym()) - close a previously-opened DSO (very similar to lt_dlclose()) - iterate over all files in a directory (very similar to lt_dlforeachfile()) There will be follow-on commits with a simple dlopen-based component (nowhere near as complete/functional as libltdl, but good enough for Linux and OS X), and a libltdl-based component for all other platforms. The intent is that the dlopen-based component can be built by default in almost all cases. But if libltdl is available, that component will be built. End result: we still get DSO-based functionality by default in (almost?) all cases. Without embedding libltdl. Which is what we want.	2015-03-09 08:16:55 -07:00
Gilles Gouaillardet	2789d782ab	timer/linux: fix insecure data handling as reported by Coverity with CID 1269923	2015-03-09 19:14:56 +09:00
Rolf vandeVaart	237c268a09	Add extra check during cleanup to make sure we really should clean up the CUDA resources.	2015-03-06 13:12:19 -05:00
Elena	737f06dd68	fix: mca_base_env_var mca parameter is never handled if it's set from amca conf file	2015-03-06 12:12:26 +02:00
Gilles Gouaillardet	da90ed4483	opal/compress: remove misc dead code as reported by Coverity with CIDs 71856, 1269714, 1269715, 1269717, 1269718, 1269723, 1269724	2015-03-06 15:34:08 +09:00
Gilles Gouaillardet	521317341d	btl/openib: fix a double free as reported by Coverity with CID 1287033	2015-03-06 14:58:11 +09:00
George Bosilca	75479c0f17	Fix some typos.	2015-03-05 12:59:58 -05:00
Alina Sklarevich	1560ed9761	initialize opal_common_verbs_want_fork_support to -1. This way, if the call to ibv_fork_init() fails, the job will still continue.	2015-03-05 14:29:09 +02:00
Gilles Gouaillardet	e1cc931e1b	btl/tcp: silence CID 710616	2015-03-05 14:20:08 +09:00
Gilles Gouaillardet	852dbafd51	mca/base: fix misc memory leaks as reported by Coverity with CIDs 710628, 1196713 and 1269855	2015-03-05 14:06:18 +09:00
Gilles Gouaillardet	134c866aa9	btl/openib: fix misc memory leaks as reported by Coverity with CIDs 1269848, 1269852 and 1269862	2015-03-05 14:06:18 +09:00
Gilles Gouaillardet	d1b2f043ff	fix misc memory leaks as already reported by Coverity with CIDs 71818, 71819, 72250, 715767, 1196749 and 1274002	2015-03-05 13:58:05 +09:00
Mike Dubman	171d674ca4	Merge pull request #441 from open-mpi/revert-438-topic/use_opal_common_verbs_want_fork_support Revert "create the opal_common_verbs_want_fork_support parameter."	2015-03-04 10:13:00 +02:00
Gilles Gouaillardet	f43b5b46ee	btl/openib: fix heterogeneous support Thanks @bosilca for the pointer	2015-03-04 13:53:05 +09:00
Gilles Gouaillardet	81b0444ef2	btl/openib: fix comment syntax, no code change and silence gcc warning about nested comments	2015-03-04 11:24:14 +09:00
Rolf vandeVaart	edf58eb549	Implement CUDA-aware workaround while fork support worked out	2015-03-03 09:50:01 -05:00
Mike Dubman	98503b56e0	Revert "create the opal_common_verbs_want_fork_support parameter."	2015-03-03 14:28:31 +02:00
Mike Dubman	cc7caf699e	Merge pull request #438 from alinask/topic/use_opal_common_verbs_want_fork_support create the opal_common_verbs_want_fork_support parameter.	2015-03-02 07:43:37 +02:00
Gilles Gouaillardet	04a0438b56	sec/munge: send NULL terminated strings	2015-03-02 12:19:46 +09:00
Alina Sklarevich	8fe42f1bc1	create the opal_common_verbs_want_fork_support parameter. call the opal_common_verbs_mca_register function to make sure that opal_common_verbs_want_fork_support mca parameter is created and therefore can be used to control the fork support.	2015-03-01 17:40:49 +02:00
Ralph Castain	d81c372ea2	Remove the "forwarding" of envars when direct launched - there aren't any envars we can forward under that use-case	2015-02-27 12:19:48 -08:00
Rolf vandeVaart	e48bc77342	Fix the coverity fix	2015-02-27 12:49:44 -05:00
Rolf vandeVaart	cfe91d4d0f	Fix compile error for CUDA-aware by using new fork MCA parameter.	2015-02-27 10:00:46 -05:00
Gilles Gouaillardet	e0026224e7	pstat linux: close the files as reported by Coverity with CID 71983	2015-02-27 19:48:01 +09:00
Gilles Gouaillardet	f33cd58ee9	btl/tcp: fix misc memory leaks as reported by Coverity with CIDs 710615, 710616 and 710618	2015-02-27 19:16:22 +09:00
Gilles Gouaillardet	60404d1953	btl/base: fix misc memory leaks as reported by Coverity as CIDs 71818 71819	2015-02-27 19:06:06 +09:00
George Bosilca	455b465329	Reflect in the naming the location of the variable.	2015-02-26 18:22:23 -05:00
Jeff Squyres	5215dc0db3	shmem base: do not allow framework selection to occur twice Both opal_shmem_base_select() and opal_shmem_base_best_runnable_component_name() were calling opal_shmem_base_runtime_query(), which would do component selection (and closing of losing components) twice. Put protection in opal_shmem_base_runtime_query() to return the cached results the second time. Additionally, make opal_shmem_base_runtime_query() "own" the cached results (vs. opal_shmem_base_select).	2015-02-26 14:56:46 -08:00
Jeff Squyres	312b0afb67	shmem base: make these the version-less struct names Minor style commit; no substantive code change.	2015-02-26 14:56:46 -08:00
Jeff Squyres	90a2c3cd99	shmem base: this function had no purpose being public Make it static to the base, and move it up higher in the file so that multiple functions beneath it can call it.	2015-02-26 14:56:46 -08:00
Jeff Squyres	9666884cf1	shmem: make the base_module_t a real mca_base_module_t This allows the opal_shmem_base_module_t to be properly cast to an mca_base_module_t. (this commit is the rationale for the previous shmem C99 .member initialization commit)	2015-02-26 14:56:46 -08:00
Jeff Squyres	62259a74f5	shmem: use C99 struct initialization Use .member=foo initialization for the shmem framework and components and modules.	2015-02-26 14:56:46 -08:00
George Bosilca	778ba0317e	Revert "Minor cleanups." This reverts commit `3b4da0bda4`.	2015-02-26 17:53:58 -05:00
George Bosilca	2c60c18e6f	A better fix for the want_fork_support issue. I noticed a naming conflict where ompi was used down in OPAL. I correctly renamed the MCA parameter, and created a deprecated synonym for the old name.	2015-02-26 17:35:54 -05:00
George Bosilca	5c3ce3a737	Merge branch 'master' of github.com:open-mpi/ompi	2015-02-26 17:10:18 -05:00
George Bosilca	aeace0468e	A more sensible fix, move the MCA variable in the verbs common area.	2015-02-26 16:51:09 -05:00
Nathan Hjelm	855d422e62	Merge pull request #408 from hjelmn/btl_3_0_mod btl: expose local registration thresholds	2015-02-26 12:57:43 -07:00
Mike Dubman	dbc15009b6	Merge pull request #415 from alinask/topic/fix_fork_support_flow Fix the calls to ibv_fork_init and remove btl_openib_want_fork_support.	2015-02-26 21:50:11 +02:00
Rolf vandeVaart	bbdcf9ff33	Fix missing cast change from opal_free_list_changes. Fixes warning	2015-02-26 11:24:49 -05:00
Gilles Gouaillardet	b888768ca3	btl/scif: fix a typo this is likely a typo introduced by open-mpi/ompi@5f1254d710 @hjelmn could you please double check this ?	2015-02-26 13:45:51 +09:00
Nathan Hjelm	8a17e69067	btl/ugni: fix typos introduced by free list update	2015-02-25 12:43:05 -07:00
George Bosilca	f3b58006c8	Merge branch 'master' of github.com:open-mpi/ompi	2015-02-25 12:01:35 -05:00
Jeff Squyres	9381f38a98	libevent2021: remove stale owner.txt file I'm guessing this directory was accidentally left in the tree when creating the owner.txt files.	2015-02-25 07:37:27 -08:00
Jeff Squyres	f3c9354d4b	usnic: restore compatibility with the v1.8 branch Also include two other minor changes: 1. More C99-style member initialization in the component struct 1. Fix the BTL module member initialization to not be redundant	2015-02-25 05:37:51 -08:00
Nysal Jan K.A	881a9f3d58	Fix cache line size detection on power Due to the nature of the cache architecture on power, we don't export coherency_line_size for L2 in sysfs. If we are unable to get the L2 cache line size, try L1. See open-mpi/ompi#383 for more information.	2015-02-25 17:26:28 +05:30
Alina Sklarevich	e4c4e7df5e	Fix the calls to ibv_fork_init and remove btl_openib_want_fork_support. In order to have an effect, ibv_fork_init should be called in the beginning of the verbs initialization flow - before the calls to the ibv_create_qp and ibv_create_cq verbs. These functions are called from the oob/ud code and by the time the other verbs components (btl openib, pml yalla, ...) call ibv_fork_init, it's too late. This commit forces the call to ibv_fork_init (if it's requested) right at the beginning of all the components that are using verbs. (ibv_fork_init() can be safely called multiple times) This commit also removes the btl_openib_want_fork_support mca parameter and adds a new mca parameter instead - opal_verbs_want_fork_support. Through this new parameter, fork support may be requested for ALL components. The default value for this parameter is set to 1. Before this commit the btl_openib_want_fork_support parameter didn't provide fork support for the openib btl if its value was set to 1. (because when openib called ibv_fork_init, it was already after the calls to ibv_create_* in oob/ud and therefore it failed).	2015-02-25 10:58:50 +02:00
Jeff Squyres	a85a392896	Merge pull request #422 from jsquyres/topic/coverity-fixes Some Coverity fixes	2015-02-24 17:00:10 -05:00
Jeff Squyres	3cd36ab12a	openib: fix double free This was CID 1269989	2015-02-24 15:24:10 -05:00
Jeff Squyres	f381c5ea8b	crs none: ensure file is != NULL before closing it This was CID 71701	2015-02-24 15:24:09 -05:00
Jeff Squyres	8fd5e75463	crs base: ensure metadata != NULL This was CID 71700	2015-02-24 15:24:08 -05:00
Jeff Squyres	5894f0c1f2	usnic: update to new mpool API NOTE: Have not added cross-compatibility with v1.8 branch yet	2015-02-24 10:05:45 -07:00
Nathan Hjelm	5f1254d710	Update code base to use the new opal_free_list_t Use of the old ompi_free_list_t and ompi_free_list_item_t is deprecated. These classes will be removed in a future commit. This commit updates the entire code base to use opal_free_list_t and opal_free_list_item_t. Notes: OMPI_FREE_LIST__MT -> opal_free_list_ (uses opal_using_threads ()) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-24 10:05:45 -07:00
Nathan Hjelm	ed78553512	Update opal_free_list_t usage to reflect new class interface. Please verify your components have been updated correctly. Keep in mind that in terms of threading: OPAL_FREE_LIST_GET -> opal_free_list_get_st OPAL_FREE_LIST_RETURN -> opal_free_list_return_st I used the opal_using_threads() variant anytime it appeared multiple threads could be operating on the free list. If this is not the case update to _st. If multiple threads are always in use change to _mt.	2015-02-24 10:05:44 -07:00
Howard Pritchard	c9e81b54fb	Merge pull request #412 from hppritcha/topic/owner_files add owner files to opal/ompi/orte mca directories	2015-02-23 09:48:20 -07:00
Jeff Squyres	3e8f468709	mca_base_framework: use the right type for dequeued list items The items on the list are (mca_base_component_list_item_t)'s, not (mca_base_component_t)'s.	2015-02-23 08:30:58 -08:00
Gilles Gouaillardet	8d44d7086a	hwloc/base: fix misc memory leaks as reported by Coverity with CIDs 710636 and 1270441	2015-02-23 13:55:04 +09:00
Howard Pritchard	bf89131f9e	add owner files to opal/ompi/orte mca directories This commit adds an owner file in each of the component directories for each framework. This allows for a simple script to parse the contents of the files and generate, among other things, tables to be used on the project's wiki page. Currently there are two "fields" in the file, an owner and a status. A tool to parse the files and generate tables for the wiki page will be added in a subsequent commit.	2015-02-22 15:10:23 -07:00
George Bosilca	3b4da0bda4	Minor cleanups.	2015-02-21 16:36:29 -05:00
Jeff Squyres	937bbbac34	libfabric: update to 8528d35551a78b5241e615c0e6ac5a711f96a03c Update to latest from libfabric Github master ofiwg/libfabric@8528d35551	2015-02-20 12:37:27 -08:00
Nathan Hjelm	cc750b00a6	btl: export local registration thresholds Some BTLs do not require local registration for some rdma transactions. For example: inline put on openib, fma put on ugni. This commit adds code to expose the local registration thresholds to BTL users. Optimized code can take advantage of this information to improve rdma performance.	2015-02-19 16:13:37 -07:00
Jeff Squyres	6098b84294	libfabric: pass the appropriate LDFLAGS to libfabric components When compiling against an external libfabric, ensure to also pass the appropriate -L flags so that the compiler/linker can find it.	2015-02-19 05:35:38 -08:00
Jeff Squyres	2d636147e3	reachable netlink: fix the component symbol name	2015-02-19 04:25:15 -08:00
Ralph Castain	008755ab17	Remove stale file reference	2015-02-18 18:36:08 -08:00
Nathan Hjelm	0e09b9298a	mca/base: add framework flag indicating a framework does not have dso components This flag is needed for a special case framework: dl. The framework is needed before any dl components can be used.	2015-02-18 14:03:51 -07:00
rhc54	ae16a168ec	Merge pull request #401 from rhc54/reachable Add reachable framework for determining TCP connections	2015-02-18 08:22:48 -08:00
Jeff Squyres	b66fc3aed9	opal_check_visibility.m4: remove extraneous sym link The sym link to this m4 is not necessary down in the component.	2015-02-18 03:40:25 -08:00
Jeff Squyres	f040ef09ff	libfabric: properly define HAVE_ALIAS_ATTRIBUTE @ggouaillardet identified that HAVE_ALIAS_ATTRIBUTE was not properly being defined in the embedded libfabric. This is because the embedded configury missed the test for it (i.e., the real configure.ac for libfabric always defines HAVE_ALIAS_ATTRIBUTE to 0 or 1 -- we didn't emulate that properly here in libfabric's configure.m4). Also, fix some grammar and properly escape another AC_MSG_CHECKING message in libfabric's configure.m4.	2015-02-18 03:26:34 -08:00
Gilles Gouaillardet	28714b60cb	btl/sm: fix misc errors as reported by Coverity as CIDs 711636 and 1269847	2015-02-18 17:05:19 +09:00
Ralph Castain	9ef523c152	Add reachable framework for determining TCP connections	2015-02-17 21:47:09 -08:00
Jeff Squyres	9cb047c1ee	libfabric: don't install the osd.h headers When configured --with-devel-headers, there's now 2 "osd.h" header files in libfabric (in different dirs). Automake's "install" target didn't like this, and errored out. Since embedding libfabric is a temporary measure, just avoid the problem by not installing any libfabric headers.	2015-02-17 07:10:12 -08:00
Gilles Gouaillardet	55948f2a6d	hwloc: fix misc memory leak as reported by Coverity with CID 1270441 (previous commit open-mpi/ompi@c25185f3a9 did not fully fix that one)	2015-02-17 14:06:15 +09:00
Gilles Gouaillardet	da7ffb6448	btl/vader: fix memory leak as reported by Coverity with CID 1269904	2015-02-16 13:51:05 +09:00
Gilles Gouaillardet	c25185f3a9	opal/hwloc: fix misc memory leaks as reported by Coverity with CIDS 710631-710638, 1196705, 1196716, 1196717, 1196752, 1196753	2015-02-16 12:23:37 +09:00
Gilles Gouaillardet	8dd77c692e	opal/hwloc: fix misc bugs as reported by Coverity with CIDs 72224, 703566, 1196821, 1196842, 1196657 and 1196658	2015-02-16 11:59:48 +09:00
Gilles Gouaillardet	0ce59f2d29	pmix: fix misc memory leaks as reported by Coverity as CID 1269843, 1269854, 1269856, 1269857 and 1269858	2015-02-16 11:19:43 +09:00
George Bosilca	a7a4d6335e	Various cleanups.	2015-02-15 11:39:09 -05:00
George Bosilca	a4aa74d4b9	Fix the SM BTL.	2015-02-15 11:38:45 -05:00
George Bosilca	84994c7438	This comment seems to contradict the compiler's opportunities to optimize the unused data out.	2015-02-15 11:37:22 -05:00
Jeff Squyres	2ca14acaf0	libfabric: add missing files into Makefile.am	2015-02-14 05:01:29 -08:00
Jeff Squyres	955d8b7525	usnic: adapt for new libfabric API	2015-02-13 14:44:23 -08:00
Jeff Squyres	3abebe7251	libfabric: update to ofiwg/libfabric@06fdfbef98	2015-02-13 14:44:06 -08:00
Nathan Hjelm	1162093d34	btl/scif: fix debug build Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:39 -07:00
Jeff Squyres	67ee1e6cf8	usnic: restore compatibility between master and v1.8 Add the functions that changed between BTL 2.0 and 3.0 into compat.h and compat.c: * module.btl_prepare_src: the signature and body of this method changed between 2.0 and 3.0. However, the functions that this method calls did not need to change, so they are copied over wholesale (with the exception that they no longer accept the unused `registration` parameter). * module.btl_prepare_dst: this method does not exist in BTL 3.0. * module.btl_put: the signature and body of this method changed between 2.0 and 3.0.	2015-02-13 11:46:38 -07:00
Jeff Squyres	ad841d7ba3	usnic: update to BTL 3.0	2015-02-13 11:46:38 -07:00
Jeff Squyres	0a5fd8e36a	usnic: update README for new BTL 3.0 scheme details	2015-02-13 11:46:38 -07:00
Jeff Squyres	cf99f0c905	usnic: just add comments/explanations -- no code changes	2015-02-13 11:46:38 -07:00
Jeff Squyres	af61065b87	usnic: minor update of member field names	2015-02-13 11:46:38 -07:00
Jeff Squyres	8311428602	btl.h: whitespace cleanup No code changes	2015-02-13 11:46:38 -07:00
Jeff Squyres	7971fd57f0	btl.h: add more description for reg/dereg functions	2015-02-13 11:46:38 -07:00
Nathan Hjelm	a3b739d117	btl/ugni: use pthread_join to wait on progress thread completion Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:38 -07:00
Nathan Hjelm	953efc3eb2	btl/openib: fix compilation issues with XRC Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:38 -07:00
Nathan Hjelm	a9763e123d	add btl comment Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:38 -07:00
Nathan Hjelm	1e518504e4	btl/smcuda: update for BTL 3.0 interface	2015-02-13 11:46:37 -07:00
Nathan Hjelm	aba0675fe7	btl/vader: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:37 -07:00
Nathan Hjelm	f8ac3fb1e8	btl/ugni: add support for atomic operations Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:37 -07:00
Nathan Hjelm	655604f509	btl/ugni: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:37 -07:00
Nathan Hjelm	4972d97b8b	btl/template: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:37 -07:00
Nathan Hjelm	f241b6e0a7	btl/tcp: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Nathan Hjelm	25176cad27	btl/sm: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Nathan Hjelm	19abc19ad9	btl/self: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Nathan Hjelm	f96d48a2e1	btl/scif: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Nathan Hjelm	cf91156105	btl/openib: add atomic operation support Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Nathan Hjelm	74f1af4548	btl/openib: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Nathan Hjelm	fc7397949c	btl: require that btls handle descriptor = NULL in the btl_sendi function The send inline optimization uses the btl_sendi function to achieve lower latency and higher message rates. Before this commit BTLs were allowed to assume the descriptor was non-NULL and were expected to return a valid descriptor if the send could not be completed using btl_sendi. This behavior was fine until the usage of btl_sendi was changed in ob1. This commit allows the caller to specify NULL for the descriptor. The affected btls have been updated to handle this case. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Nathan Hjelm	593f97ae92	btl: add support for 64-bit atomic operations This commit adds an interface for btl's to export support for 64-bit atomic operations on integers. BTL's that can support atomic operations should implement these functions and set the appropriate btl_flags and btl_atomic_flags. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Nathan Hjelm	f8e15ca83d	Update the interface to provide a cleaner interface for RDMA operations. The old BTL interface provided support for RDMA through the use of the btl_prepare_src and btl_prepare_dst functions. These functions were expected to prepare as much of the user buffer as possible for the RDMA operation and return a descriptor. The descriptor contained segment information on the prepared region. The btl user could then pass the RDMA segment information to a remote peer. Once the peer received that information it then packed it into a similar descriptor on the other side that could then be passed into a single btl_put or btl_get operation. Changes: - Added functions to register and deregister memory regions with the btl. If no registration is needed a btl should set these function pointers to NULL. These functions take over for btl_prepare_src/dst and btl_free for RDMA operations. The caller should specify the maximum permissions needed on the memory. - Changed the function signatures for both btl_put and btl_get. In place of a prepared descriptor the caller should provide the source and destination addresses and registration handles as well as a new callback function. The callback will be provided with the local address and registration handle, callback context, callback data, and status. See mca_btl_base_rdma_completion_fn_t in btl.h. - Added a new btl constraint: MCA_BTL_REG_HANDLE_MAX_SIZE. This value specifies the maximum size of any btl's registration handle. - Removed the btl_prepare_dst function. This reflects the fact that RDMA operations no longer depend on "prepared" descriptors. - Removed the btl_seg_size member. There is no need for btl's to subclass the mca_btl_base_segment_t class anymore. - Expose the btl's put/get limitations with new struct members: btl_put_limit, btl_put_alignment, btl_get_limit, btl_get_alignment. - Remove the mca_mpool_base_registration_t argument from the btl_prepare_src function. The argument was intended to support RDMA operations and is no longer necessary. - Remove des_remote/des_remote_count from the mca_btl_base_descriptor_t structure. This structure member was originally used to specify the remote segment for RDMA operations. Since the new btl interface no longer uses descriptors for RDMA this member no longer has a purpose. In addition to removing these members the local segment structure fields have been renamed from des_local/des_local_count to des_segments/des_segment_count. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:36 -07:00
Howard Pritchard	6a275f4489	Merge pull request #395 from hppritcha/topic/pmix_cray_kvs pmix/cray: remove workaround for OBJ_RELEASE	2015-02-13 11:25:50 -07:00
Howard Pritchard	bd9d185951	pmix/cray: remove workaround for OBJ_RELEASE Per feedback from rhc, manually set the base_ptr member of the opal_buffer_t variable to NULL prior to calling OBJ_RELEASE. A similar feature of opal_dss.load also exists so likewise reset the base_ptr to NULL prior to invoking it. Hopefully the opal_buffer_t struct does not change frequently. Minor cleanups to reduce output when pmix_base_verbose mca parameter is set.	2015-02-13 07:47:26 -08:00
Jeff Squyres	f7b4b23383	usnic: ensure to NULL-terminate the string/not overflow This was CID 1269921.	2015-02-12 13:41:30 -08:00
Jeff Squyres	8febd41a39	usnic: fix minor memory leak This was CID 1269859.	2015-02-12 13:41:30 -08:00
Jeff Squyres	4c074da1c2	usnic: fix minor memory leak This was CID 1269853.	2015-02-12 13:41:30 -08:00
Jeff Squyres	a7ce2d406c	usnic: don't bother comparing unsigned values for <0 This was CID 1269812.	2015-02-12 13:41:30 -08:00
Jeff Squyres	caacc6ad91	usnic: properly differentiate data pool vs. malloc usnic_fls() can actually return 0, leading us to incorrectly free() a buffer instead of OMPI_FREE_LIST_RETURN_MT'ing it. So add an explicit bool in the struct that tracks whether the buffer came from malloc or a freelist. This was CID 1269660.	2015-02-12 13:41:30 -08:00
Jeff Squyres	3b39535ebb	usnic: ensure that the string is NULL-terminated This was CID 1269666.	2015-02-12 13:41:30 -08:00
Jeff Squyres	41c6e26a38	usnic: ensure the copied string is NULL-terminated This was CID 1269667	2015-02-12 13:41:30 -08:00
Jeff Squyres	81585c0a7c	usnic: strengthen the check-if-accept()-failed test This was Coverity CID 1269801.	2015-02-12 13:41:30 -08:00
Jeff Squyres	117e6feaa1	shmem sysv: ensure we don't shmdt(NULL) This was CID 71999.	2015-02-12 13:41:30 -08:00
Jeff Squyres	6d3a84514f	mca_base_cmd_line.c: fix minor memory leak This was CID 1269874.	2015-02-12 13:41:29 -08:00
Jeff Squyres	f8e334357d	mca_base_pvar.c: protect removal from list Only remove it from the list if it is actually on the list. This was CID 1269758.	2015-02-12 13:41:29 -08:00
Nathan Hjelm	f1dc29b145	btl/vader: fix modex size when xpmem is in use Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-12 14:06:24 -07:00
Nathan Hjelm	49ba150972	mca/base: fix path string parsing CID 993709	2015-02-12 13:03:46 -07:00
Jeff Squyres	00c878957c	mca_base_var.c: add debug check for another programming error Coverity alerted us to the fact that there are places where the synonym_for param is hard-coded to -1 when calling register_variable(). It would be a coding error if synonym_for==-1 and (flags & MCA_BASE_VAR_FLAG_SYNONYM)>0, so let's add that to the debug-only check at the top of the function. This was CID 993717.	2015-02-12 10:24:02 -08:00
Jeff Squyres	332943f1c3	pstat linux: ensure to close the file This was CID 71983.	2015-02-12 10:24:02 -08:00
Jeff Squyres	6a64fe85a1	pstat linux: ensure read() returns >=0 This was CID 71182.	2015-02-12 10:24:02 -08:00
Jeff Squyres	8be0e0b0ca	usnic: don't close fp upon error Let the caller close fp. Properly check for errors when calling subroutines. This was Coverity CID 1269995.	2015-02-12 10:24:01 -08:00
Howard Pritchard	0cf2b478e0	Merge pull request #391 from hppritcha/topic/cray_pmi_kvs pmix/cray: initial kvs removal work	2015-02-11 19:55:34 -07:00
Howard Pritchard	9955834ff1	pmix/cray: initial kvs removal work Remove use of the Cray PMI KVS - which is designed for a lightweight MPI that exchanges only a minimal amount of connection info (about 128 bytes per rank) - within cray/pmix. Use Cray PMI collective extensions instead. This is the first of several steps to accelerate launch of Open MPI on Cray systems using either native aprun or nativized slurm.	2015-02-11 15:14:55 -08:00
Rolf vandeVaart	08dceda2c0	Fix logic for handling priority and eager RDMA. There was some refactoring that was done in this code and it ended up changing the logic that is used to set up eager RDMA. Rather than setting up eager RDMA with a high priority message, it did it the other way around. For some reason, CUDA-aware support did not like this. So, basically, restore the logic to the way it was prior to the refactoring. The refactoring did not intend to change this. Lightly reviewed by hjelmn.	2015-02-11 16:38:36 -05:00
Jeff Squyres	4f1996df5d	various: remove $(LTDLINCL) from Makefile.am's that didn't need it	2015-02-11 12:25:20 -08:00
Ralph Castain	3de8c5c7c6	Cleanup the munge support - the credential cannot be reused for multiple connections	2015-02-10 20:34:35 -08:00
George Bosilca	e173f9b0c0	Somehow we lost one of the most critical parameter allowing the PML to decide how to order the different interconnects. Bring it back !	2015-02-10 20:32:05 -05:00
Ralph Castain	3ae3b96c17	Fix master compilation - a buried header dependency must have been removed.	2015-02-10 07:22:10 -08:00
Mike Dubman	6816e3421f	Merge pull request #377 from regrant/ib_wr_fix fix problem with get_pathrecord posting too many recv requests	2015-02-10 08:47:23 +02:00
Ralph Castain	bef830efef	Fix debug output	2015-02-09 20:49:04 -08:00
Ralph Castain	07134f5b17	Add munge security	2015-02-09 20:49:03 -08:00
Ralph Castain	a3275aa867	Once again, fix the blasted singleton comm_spawn	2015-02-05 17:34:25 -08:00
Jeff Squyres	0dbbffb753	pmix_base_frame: use the "= { 0 }" initializer Per open-mpi/ompi#381, convert the specific initialization of opal_pmix to use the generic "= { 0 }" initializer. This form can be used to initialize any type when the intent is just to zero out / assign some value.	2015-02-05 17:51:06 -05:00
Ralph Castain	4d882796b6	Silence warnings	2015-02-05 11:41:00 -08:00
Howard Pritchard	e508a4078e	Merge pull request #376 from regrant/ib_error_fix fixes OpenIB connect error reporting for ibv_* calls that return an errn...	2015-02-04 10:22:03 -07:00
Jeff Squyres	621af3aa07	pmix_base: fix global opal_pmix symbol for static linking on OS X OS X has weirdness when static linking. If a symbol is not initialized, it is put into the common block section, and Weird Things happen (linking when trying to use that global symbol will fail). If you initialize the variable, it goes into a different section (and linking to it will work). This link (that might go stale someday) has some information about OS X linker scope and treatment of symbol definitions: https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/executing_files.html#//apple_ref/doc/uid/TP40001829-98432-TPXREF120 Fixes #375.	2015-02-04 12:12:31 -05:00
Ryan Grant	de93497789	fix problem with get_pathrecord posting too many recv requests	2015-02-04 09:53:58 -07:00
Ryan Grant	5d5e9bc1f8	fixes OpenIB connect error reporting for ibv_* calls that return an errno	2015-02-04 09:09:14 -07:00
Jeff Squyres	a3728f09af	libfabric: add another missing file to the Makefile.am	2015-02-04 04:02:27 -08:00
Jeff Squyres	66a680879e	libfabric: fix header file name in Makefile.am	2015-02-03 19:41:25 -08:00
Jeff Squyres	cb7cc171f9	usnic: update README.txt notes Update notes about copying the usnic BTL between master and the v1.8 branch.	2015-02-03 15:54:36 -08:00
Jeff Squyres	edf7232e00	usnic: enable building with an external libfabric	2015-02-03 13:46:06 -08:00
Jeff Squyres	bfa54d5d7b	usnic: update to match new libfabric	2015-02-03 13:46:06 -08:00
Jeff Squyres	d2490d2fd8	libfabric: update Makefile.am to match new libfabric drop	2015-02-03 13:46:05 -08:00
Jeff Squyres	3dc0abfbc4	libfabric: update to (just past) 1.0rc1 Updated to Github ofiwg/libfabric@6b005d0d19.	2015-02-03 13:46:05 -08:00
Ralph Castain	d3267c200f	Add missing OMPI-changes to libevent 2.0.22	2015-02-02 20:57:40 -08:00
Jeff Squyres	965ccab6cc	libfabric: remove a few warnings Embedding libfabric is a temporary measure; I'm removing some warning notifications so that the output isn't so cluttered (we're getting the real warnings fixed upstream, but the OMPI community doesn't really care/need to see the warnings in the meantime).	2015-01-29 17:38:02 -08:00
Todd Kordenbrock	37e6096fe7	Copyright update.	2015-01-29 11:08:13 -06:00
Todd Kordenbrock	ca30e129e8	Add the option to use the Portals4 logical to physical table. This commit adds an MCA variable to select Portals4 logical addressing, populates the logical-to-physical mapping table and initializes the NI in this mode.	2015-01-29 11:08:13 -06:00
George Bosilca	b9a63cbe7a	One less warning.	2015-01-27 13:25:55 -05:00
Ralph Castain	294ebc907a	Fix singleton operations so they can work inside a slurm environment	2015-01-27 09:29:42 -06:00
Ralph Castain	ba25e8a0ce	Fix singletons	2015-01-27 09:29:42 -06:00
Ralph Castain	028b00154d	Complete implementation of the schizo framework to support OMPI component	2015-01-27 09:29:42 -06:00
Jeff Squyres	436223959d	usnic: update to match new libfabric APIs	2015-01-24 05:49:36 -08:00
Jeff Squyres	7d5755f62b	libfabric: update to ofiwg/libfabric@b3f7af4c67 Pull down a new embedded copy of libfabric from https://github.com/ofiwg/libfabric.	2015-01-24 05:48:48 -08:00
Howard Pritchard	056daa05bf	btl/ugni: use PMIX_GLOBAL for modex_send in ugni Using PMIX_REMOTE is not the right thing for ugni BTL when it's possible that spawned ranks end up on the same node as some of the spawnee ranks.	2015-01-22 06:53:45 -08:00
Gilles Gouaillardet	9f80aa2d28	btl/openib: regression fix when rdmacm or udcm are disabled This fixes a regression introduced in open-mpi/ompi@661c35ca67 Thanks to Mark Santcroos for reporting this issue	2015-01-20 11:31:50 +09:00
Rolf vandeVaart	66f6026214	Improve error message to help user figure out what to do	2015-01-16 13:55:27 -05:00
Jeff Squyres	65a279019e	usnic: fix typo in memchecker usage	2015-01-16 09:42:19 -08:00
Jeff Squyres	3969fe3a94	libfabric: ensure wrapper libs are loaded for static builds For static builds, we need to also set <framework>_<component>_WRAPPER_EXTRA_LIBS so that the wrappers know what other libraries to add to link executables.	2015-01-16 09:29:52 -08:00
Gilles Gouaillardet	661c35ca67	cleanup dead code caused by the removal of the --with-threads configure option	2015-01-16 19:13:59 +09:00
Nathan Hjelm	006074c48d	Merge pull request #332 from hjelmn/openib_updates Openib updates	2015-01-15 15:05:18 -06:00
Jeff Squyres	d13c14ec82	CSCus22527: fix off-by-one error in checking the number of VFs Ensure to count this process when checking for how many VFs we need on the local server. (cherry picked from commit 386c01934e98cb8dcb48ff648ecdfb0c8677baa9)	2015-01-15 11:44:29 -08:00
Jeff Squyres	4685767b2d	libfabric: update usnic configury Use new common m4 macro for choosing between libnl3 and libnl.	2015-01-15 07:12:39 -08:00
Jeff Squyres	400b02e566	libfabric: update to github:ofiwg/libfabric HEAD Specifically: bbf0f3ea8e92c92a7cee56473ecdbbbb34cceb7d (15 Jan 2015)	2015-01-15 07:11:54 -08:00
Aurélien Bouteiller	f49981bb2a	Disable coalescing until pull request #332 gets in.	2015-01-14 14:12:47 -05:00
Nathan Hjelm	cf4975501d	rcache/vma: fix parent class of mca_rcache_vma_t There was a mismatch between the structure for mca_rcache_vma_t and the OBJ_CLASS_INSTANCE. One was opal_list_item_t and the other was ompi_free_list_item_t. The super class in the structure looks like it is the correct one. Changed the superclass in OBJ_CLASS_INSTANCE to match.	2015-01-14 10:21:24 -07:00
Jeff Squyres	e4e5e7dbc0	usnic: ensure to clean up nicely in case of low resources If there are not enough resources (e.g., low VFs), we can end up calling finalize_one_channel() on the same channel multiple times. So ensure to NULL out fields that we have freed already so that we do not try to free them a second time. Fixes CSCus26648.	2015-01-13 14:37:31 -08:00
Jeff Squyres	8807ae2497	usnic libfabric: also set the us_netmask_be field. From libfabric upstream commit ofiwg/libfabric@3976745. Part of the fix for CSCus22495.	2015-01-13 12:04:57 -08:00
Jeff Squyres	d00cede718	usnic: fix if_include/exclude of CIDR-specified networks Fix the ordering so that we obtain the usnic netmask information before we do the filtering based on CIDR-specified networks. Also requires upstream Github libfabric commit 3976745. Fixes CSCus22495.	2015-01-13 12:04:51 -08:00
Jeff Squyres	a220b92cf8	usnic: fix function name in opal_output	2015-01-13 12:04:07 -08:00
Gilles Gouaillardet	955f3c2730	configury: check existence of the atomic_init function in libfabric Intel compilers implement atomic_init in C++ only, so disable C11 atomics in libfabric for now	2015-01-13 16:39:41 +09:00
Gilles Gouaillardet	cbe0d26b2d	configury: do test the __STDC_NO_ATOMICS__ macro for libfabric	2015-01-13 16:06:37 +09:00
Jeff Squyres	5ed688a074	usnic: ensure that we only get "usnic"-named providers Also, a minor update to a verbose message.	2015-01-12 13:21:22 -08:00
Jeff Squyres	881b1dcf19	usnic: document libfabric abstractions Handy tips to remember the libfabric abstractions and what they correspond to in usnic/VIC terms.	2015-01-09 15:21:51 -08:00

2145 commits