Commit graph

22541 commits

Author SHA1 Message Date
Adrian Reber
f45dd069bd FT: fix compilation using --with-ft (1/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

This first patch moves orte_cr_continue_like_restart from ORTE
to opal_cr_continue_like_restart in OPAL. This only leaves three
calls from OPAL to ORTE in the FT code. As it is not yet 100%
clear how to handle these calls, the orte_sstore.set_attr() code
has been #ifdef'd out for now.
2015-03-11 14:23:33 +01:00
Mike Dubman
a188cb2ff9 Merge pull request #465 from alinask/topic/fix_yalla_warn
PML_YALLA: fix compilation warnings.
2015-03-11 11:38:41 +02:00
Alina Sklarevich
f9a9b936a1 PML_YALLA: fix compilation warnings. 2015-03-11 10:58:54 +02:00
Nathan Hjelm
b308afa8fd btl/openib: remove derived btl segment type
The derived segment type (btl_openib_segment_t) was intended to store
the registration info needed for put and get. In BTL 3.0 this is no
longer required. I intended to remove this type as part of
open-mpi/ompi@74f1af4548.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-03-10 14:41:15 -06:00
Nathan Hjelm
3d32dbd793 btl/openib: cuda: fix CUDA-aware support with async copy
This commit should resolve an issue seen with CUDA-aware support. The
problem came in with BTL 3.0. Before 3.0 the size of the copy was
stored in the incoming segment's des_remote_count field. This field
does not exist in BTL 3.0 so I stored the value in the
des_segment_count field. This caused problems with the cuda support
code. To fix the issue, the endpoint pointer is now stored in the
incoming fragment's endpoint pointer, which frees up the segment's
des_cbdata pointer for storing the transfer size.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-03-10 14:38:12 -06:00
Nathan Hjelm
d929137768 osc/pt2pt: need to unlock self before waiting for unlock acks
This commit fixes a bug in osc/pt2pt which causes MPI_Win_unlock_all
to hang. The problem was caused by code refactoring that moved the
unlock of the local process to after the loop that waits for unlock
acks. This will cause the code to loop forever waiting on the self
ack.

Fixes #444
2015-03-10 14:10:37 -06:00
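The ordering problem described in d929137768 can be illustrated with a small single-process model. This is a hypothetical sketch, not the osc/pt2pt code. win_state_t, unlock_self(), wait_for_acks() and the two unlock_all_* variants are invented names, and "progress" is simulated by delivering one remote ack per loop iteration. The wait loop is bounded by max_spins, so the hang shows up as a -1 return instead of a real infinite loop.

```c
/* Hypothetical model of an unlock_all epoch: every rank in the window,
 * including the calling rank ("self"), must acknowledge the unlock
 * before the call may return. */
typedef struct {
    int nranks;       /* ranks in the window, including self */
    int acks_pending; /* unlock acks not yet received */
} win_state_t;

/* Releasing the lock on our own rank is what produces the self ack. */
static void unlock_self(win_state_t *w) {
    w->acks_pending--;
}

/* Simulated progress: deliver one remote ack per call, if any remain. */
static void progress_remote(win_state_t *w, int *remote_left) {
    if (*remote_left > 0) {
        (*remote_left)--;
        w->acks_pending--;
    }
}

/* Spin until every ack has arrived. Returns the number of iterations,
 * or -1 once max_spins is exceeded (stands in for "loops forever"). */
static int wait_for_acks(win_state_t *w, int *remote_left, int max_spins) {
    int spins = 0;
    while (w->acks_pending > 0) {
        if (++spins > max_spins) {
            return -1;
        }
        progress_remote(w, remote_left);
    }
    return spins;
}

/* Buggy order: wait first, unlock self afterwards. The self ack can
 * never arrive while we are still waiting for it. */
static int unlock_all_buggy(win_state_t *w, int max_spins) {
    int remote_left = w->nranks - 1;
    w->acks_pending = w->nranks;
    int rc = wait_for_acks(w, &remote_left, max_spins);
    unlock_self(w);
    return rc;
}

/* Fixed order: unlock self first, then wait for the remote acks. */
static int unlock_all_fixed(win_state_t *w, int max_spins) {
    int remote_left = w->nranks - 1;
    w->acks_pending = w->nranks;
    unlock_self(w);
    return wait_for_acks(w, &remote_left, max_spins);
}
```

With four ranks, the fixed variant finishes after three progress iterations. The buggy variant hits the spin limit because it is waiting for an ack that only its own later code would produce.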
Yohann Burette
d48a8ab8f0 mtl/ofi: Use fi_allocinfo(). 2015-03-10 12:50:55 -07:00
Jeff Squyres
cbd99d5f60 libfabric: update to Github upstream 1b4bb2285b
Get a usnic bug fix.
2015-03-10 12:09:02 -07:00
Jeff Squyres
2e8ee003b0 ofi: endpoint type hint moved to a sub-struct, BUFFERED went away
Update to match new libfabric API/structure change.
2015-03-10 09:55:45 -07:00
Jeff Squyres
d97551bdb1 usnic: endpoint type hint moved to a sub-struct
Update to match new libfabric API/structure change.
2015-03-10 09:47:41 -07:00
Jeff Squyres
1a1be2efa0 libfabric: update to Github upstream 7095f3dc 2015-03-10 09:47:40 -07:00
Jeff Squyres
afec1454f5 usnic: only setup the connectivity checker if we have modules
If we ended up with no modules (e.g., all usnic devices were
excluded), there was a race condition in that the connectivity agent
could tear down its local socket before one or more of the local
clients saw it.  Therefore, the local clients would timeout waiting
for the socket to appear.

So move the connectivity checker init later in the bootstrapping
process (it *must* be set up before module_init()), and have it only
invoked if we actually ended up with one or more modules.
2015-03-10 07:43:20 -07:00
Jeff Squyres
06accb721c usnic: ensure to free all resources if no usnic BTLs found
If all usnic devices are excluded, then we need to ensure the error
path includes freeing the filter.

This was Coverity CID 1288085
2015-03-10 07:43:20 -07:00
Jeff Squyres
8fef4e865f dl dlopen: fix use-after-free
Re-structure the loop looking for duplicates a little so that we only
have a single free of the string that happens regardless of whether we
found a duplicate or not.

This was Coverity CID 1288090
2015-03-10 07:43:20 -07:00
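The restructuring described in 8fef4e865f follows a common pattern. Allocate the candidate string once per iteration, let both the "duplicate" and "new" paths fall through, and free it in exactly one place. Below is a hypothetical sketch under that assumption. collect_unique() and dup_prefix() are invented, and the real component's loop is not reproduced. The sketch also assumes that files differing only in their last suffix count as duplicates.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first len bytes of s into a new NUL-terminated string. */
static char *dup_prefix(const char *s, size_t len) {
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, s, len);
    out[len] = '\0';
    return out;
}

/* Store each distinct name (with its last ".suffix" stripped) in seen[],
 * up to cap entries, and return how many were stored. The caller owns
 * and frees the strings in seen[]. */
static size_t collect_unique(const char *const *files, size_t n,
                             char **seen, size_t cap) {
    size_t nseen = 0;
    for (size_t i = 0; i < n; i++) {
        const char *dot = strrchr(files[i], '.');
        size_t len = dot ? (size_t)(dot - files[i]) : strlen(files[i]);
        char *base = dup_prefix(files[i], len);
        if (base == NULL) {
            break; /* out of memory: stop early, nothing to free */
        }

        bool found = false;
        for (size_t j = 0; j < nseen; j++) {
            if (strcmp(seen[j], base) == 0) {
                found = true; /* do not free base here; the one free below covers this path */
                break;
            }
        }
        if (!found && nseen < cap) {
            char *copy = dup_prefix(base, len);
            if (copy != NULL) {
                seen[nseen++] = copy;
            }
        }

        free(base); /* the single free, reached whether or not a duplicate was found */
    }
    return nseen;
}
```

Because base is released only at the bottom of the loop body, the duplicate path can never free it and then touch it again.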
Jeff Squyres
3efb5f56ae dl dlopen: ensure dirs is not NULL
opal_argv_split() may have returned NULL.

This was Coverity CID 1288088
2015-03-10 07:43:20 -07:00
Jeff Squyres
86968dcdda dl dlopen: fix resource leak
closedir() was one block higher than it should have been.

This was Coverity CID 1288087.
2015-03-10 07:43:20 -07:00
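Below is a sketch of a directory walk in which closedir() sits at the same level as the opendir() it matches. Every path that opened the directory therefore also closes it, including an early stop requested by the callback. foreach_file(), file_cb_t and the example callbacks are hypothetical names. This is not the dl/dlopen component's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <string.h>

/* Callback invoked once per directory entry; a nonzero return stops
 * the walk and is passed back to the caller. */
typedef int (*file_cb_t)(const char *name, void *ctx);

/* Returns 0 on success, -1 if the directory cannot be opened, or the
 * first nonzero value returned by cb. */
static int foreach_file(const char *dir, file_cb_t cb, void *ctx) {
    DIR *dp = opendir(dir);
    if (dp == NULL) {
        return -1; /* nothing was opened, so nothing to close */
    }

    int rc = 0;
    struct dirent *de;
    while ((de = readdir(dp)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0) {
            continue;
        }
        rc = cb(de->d_name, ctx);
        if (rc != 0) {
            break; /* leave the loop instead of returning, so closedir() still runs */
        }
    }

    closedir(dp); /* same level as opendir(): reached on every path after a successful open */
    return rc;
}

/* Example callback: counts entries. */
static int count_entry(const char *name, void *ctx) {
    (void)name;
    (*(int *)ctx)++;
    return 0;
}

/* Example callback: asks the walk to stop at the first entry. */
static int stop_at_first(const char *name, void *ctx) {
    (void)name;
    (void)ctx;
    return 42;
}
```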
Jeff Squyres
546ad3f060 dl dlopen: free resources upon error
Ensure to take the right path out upon errors (that will free any
pending resources).

This was Coverity CID 1288086
2015-03-10 07:43:19 -07:00
Rolf vandeVaart
49b5eb6c91 Fix missing initialization of variable 2015-03-10 10:33:27 -04:00
Howard Pritchard
b73d566d57 Merge pull request #454 from hppritcha/topic/coverity_fixes
fcoll/dynamic: coverity fixes
2015-03-10 07:59:56 -06:00
Mike Dubman
6f91a007e1 Merge pull request #458 from yosefe/topic/pml-yalla-fix-segv
keep mxm context alive as long as pml_yalla component is open.
2015-03-10 13:38:14 +02:00
Gilles Gouaillardet
a69d935d55 oob/tcp: fix misc issues
as reported by Coverity with CIDs 70726, 710564,
1196630, 1269805, 1269803, 1269932
2015-03-10 19:32:01 +09:00
yosefe
976144dca7 keep mxm context alive as long as pml_yalla component is open.
pml_yalla_del_comm may be called after the yalla module is finalized, which
leads to invalid memory access if the mxm context has already been destroyed
at this point.
2015-03-10 11:52:44 +02:00
Gilles Gouaillardet
dc0bc756dc iof/base: fix misc memory leak
as reported by Coverity with CID 1196732
2015-03-10 14:37:53 +09:00
Gilles Gouaillardet
f7f7fa73dd opal_cr: fix incorrect NULL assignment
as reported by Coverity with CID 1288084
2015-03-10 12:06:57 +09:00
Jeff Squyres
4b2cba46f4 usnic: fix bootstrap error paths
Fix previously-unfinished error paths during startup/bootstrapping.
Instead of just blindly continuing on when an fi_* function call
fails, call opal_show_help and skip that device.

Also, only check the usnic config minimums once.  They're VIC-wide and
won't change on a per-device basis -- we only need to check them once.

Fixes CSCut19179.
2015-03-09 16:57:41 -07:00
Nathan Hjelm
005c6022e2 mca/base: fix bugs in framework deregistration/re-registration
There were a number of bugs in the framework/variable code that
affected deregistration:

 - Frameworks could be erroneously closed if they were separately registered
   and opened, then subsequently closed. This was a bug in the original
   design which only reference counted opens but not
   registrations. This would cause undefined behavior if
   MPI_T_finalize actually calls ompi_info_close_components as
   intended. Now both registrations and opens are reference counted
   and frameworks/components are not torn down until the matching
   number of close calls have been made.

 - group_find_by_name did not pass the invalidok flags down
   to mca_base_var_group_get_internal correctly.

 - Group deregistration caused the group to be completely reset. This
   does not match the behavior required by MPI_T as it could reduce
   the number of variables/subgroups in a group.

This commit also updates MPI_T_finalize to call
ompi_info_close_components as originally intended.

Closes #374

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-03-09 16:52:53 -06:00
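Here is a minimal sketch of the reference-counting rule described in the first bullet, assuming a framework needs only two counters. framework_t, fw_register(), fw_open(), fw_close() and fw_deregister() are hypothetical names, not the mca_base API. The test scenario mirrors the bug: a registration held by one user (for example, MPI_T) must keep the framework alive after another user's open/close pair.

```c
#include <stdbool.h>

/* Hypothetical framework state. Registrations and opens are counted
 * separately; the framework is torn down only when both reach zero. */
typedef struct {
    int register_count;
    int open_count;
    bool torn_down;
} framework_t;

static void maybe_teardown(framework_t *fw) {
    if (fw->register_count == 0 && fw->open_count == 0 && !fw->torn_down) {
        fw->torn_down = true; /* real code would release components/variables */
    }
}

static void fw_register(framework_t *fw) {
    fw->register_count++;
    fw->torn_down = false;
}

/* Opening requires a prior registration in this model. */
static int fw_open(framework_t *fw) {
    if (fw->register_count == 0) {
        return -1;
    }
    fw->open_count++;
    return 0;
}

/* Each close must match an earlier open. */
static int fw_close(framework_t *fw) {
    if (fw->open_count == 0) {
        return -1;
    }
    fw->open_count--;
    maybe_teardown(fw);
    return 0;
}

/* Each deregistration must match an earlier registration. */
static int fw_deregister(framework_t *fw) {
    if (fw->register_count == 0) {
        return -1;
    }
    fw->register_count--;
    maybe_teardown(fw);
    return 0;
}
```

Registering, opening and then closing leaves the framework alive, because a registration is still outstanding. Only the matching deregistration tears it down.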
Howard Pritchard
fba88360a8 fcoll/dynamic: more coverity fixes
Okay, Coverity seems to get one stuck in a loop: by
fixing one set of resource allocation problems, it
starts finding more.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-03-09 15:01:05 -07:00
Howard Pritchard
2d61a652c8 fcoll/dynamic: coverity fixes
okay, hopefully really fix CIDs 72325-72328, and 72330-72332.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-03-09 13:53:52 -07:00
Nathan Hjelm
0d80bfb391 Merge pull request #443 from hjelmn/mpit_31
Add new error code introduced in MPI-3.1.
2015-03-09 13:03:15 -06:00
Jeff Squyres
b958daa3e6 Merge pull request #410 from jsquyres/topic/libltdl-must-die
RFC: Replace libltdl with OPAL "dl" framework
2015-03-09 11:59:31 -04:00
Jeff Squyres
914880a368 libltdl: remove libltdl from the tree
The libltdl interface has been completely replaced by the OPAL DL
framework (i.e., the opal_dl interface).

Fixes open-mpi/ompi#311
2015-03-09 08:18:14 -07:00
Jeff Squyres
a026456bef (orte|ompi|oshmem)*info tools: convert to opal_dl interface
Note that this commit removes option:lt_dladvise from the various
"info" tools output.  This technically breaks our CLI "ABI" because
we're not deprecating it / replacing it with an alias to some other
"info" tool output.

Although the dl/libltdl component contains a "have_lt_dladvise" MCA
var that contains the same information, the "option:lt_dladvise"
output from the various "info" tools is *not* an MCA var, and
therefore we can't alias it.  So it just has to die.
2015-03-09 08:18:13 -07:00
Jeff Squyres
0a2767a5d3 opal lt_interface: remove in favor of opal_dl interface 2015-03-09 08:18:13 -07:00
Jeff Squyres
1995f6beba cuda: convert to opal_dl interface 2015-03-09 08:18:13 -07:00
Jeff Squyres
c683500a29 debuggers: convert to opal_dl interface 2015-03-09 08:16:55 -07:00
Jeff Squyres
a9d86129c6 mca base: convert to opal_dl interface 2015-03-09 08:16:55 -07:00
Jeff Squyres
39364d315c libltdl: dl component based on libltdl
Works on any system that libltdl supports and has ltdl.h and libltdl
available.
2015-03-09 08:16:55 -07:00
Jeff Squyres
7d340c0c26 dlopen: simple dl component based on POSIX dlopen
Works on systems with dlopen (e.g., Linux and OS X).  It requires
dlfcn.h and libdl, which many systems have installed by default.
2015-03-09 08:16:55 -07:00
Jeff Squyres
e81c070ef0 dl framework: new dynamic loader framework
Embedding libltdl without the use of Libtool bootstrapping has
proven... difficult.  Instead, create a new simple "dl" framework.  It
only provides 4 functions:

- open a DSO (very similar to lt_dlopenadvise())
- lookup a symbol in a previously-opened DSO (very similar to lt_dlsym())
- close a previously-opened DSO (very similar to lt_dlclose())
- iterate over all files in a directory (very similar to lt_dlforeachfile())

There will be follow-on commits with a simple dlopen-based component
(nowhere near as complete/functional as libltdl, but good enough for
Linux and OS X), and a libltdl-based component for all other
platforms.

The intent is that the dlopen-based component can be built by default
in almost all cases.  But if libltdl is available, that component will
be built.  End result: we still get DSO-based functionality by default
in (almost?) all cases.  Without embedding libltdl.  Which is what we
want.
2015-03-09 08:16:55 -07:00
Jeff Squyres
d6530b0e99 opal_check_package: use AC_SEARCH_LIBS instead of AC_CHECK_LIB
Per discussion on devel
(http://www.open-mpi.org/community/lists/devel/2015/02/17030.php), and
per Autoconf 2.69 docs, use the recommended AC_SEARCH_LIBS instead of
AC_CHECK_LIB (e.g., for functions that appear in libc on some
platforms and in a specific library on other platforms).
2015-03-09 08:15:38 -07:00
Gilles Gouaillardet
59be12b260 filem/raw: fix misc memory leaks
as reported by Coverity with CIDs 716815, 716817, 720760,
1196703, 1196704, 1196746
2015-03-09 19:56:20 +09:00
Gilles Gouaillardet
1746e23f11 opal/cr: fix misc memory leak and error case
as reported by Coverity with CIDs 71858 and 710640
2015-03-09 19:28:52 +09:00
Gilles Gouaillardet
2789d782ab timer/linux: fix insecure data handling
as reported by Coverity with CID 1269923
2015-03-09 19:14:56 +09:00
Gilles Gouaillardet
6de973daae coll/sm: remove unused value
as reported by Coverity with CID 1269962
2015-03-09 17:31:32 +09:00
Gilles Gouaillardet
1896d4fba7 bcol/basesmuma: fix misc memory leak
as reported by Coverity with CID 715762
2015-03-09 17:22:25 +09:00
Gilles Gouaillardet
341bdd1fc3 ompi/group: refactor ompi_group_incl
and fix CID 70478
2015-03-09 17:07:11 +09:00
Gilles Gouaillardet
2ab9a411f8 plm/base: fix misc memory leaks
as reported by Coverity with CIDs 1196733 and 1196745
2015-03-09 16:25:07 +09:00
Gilles Gouaillardet
fa10025843 ras/slurm: fix misc memory leaks
as reported by Coverity with CIDs 968580 and 1196723-1196727
2015-03-09 15:58:51 +09:00
Gilles Gouaillardet
eae39bd948 ras/simulator: fix misc memory leaks
as reported by Coverity with CIDs 710647, 714133 and 714134
2015-03-09 15:52:29 +09:00
Gilles Gouaillardet
9107bf5077 ompi/topo: fix misc errors
as reported by Coverity with CIDs 1041232, 1041234, 1041235,
1269789 and 1269996
2015-03-09 15:22:22 +09:00