Commit graph

23709 commits

Author SHA1 Message Date
Jeff Squyres
14340770c4 usnic: remove some logically dead code
This code really had no purpose; just assign FI_VERSION(1, 1).  This
fixes CID 1315274.

Also clarify the comment about why we still retain libfabric v1.0.0
compatibility code, even though configure.m4 requires libfabric >= v1.1.0.
2015-08-12 05:21:18 -07:00
Jeff Squyres
7f857034d9 common verbs: check return value of sscanf()
Fixes CID 1304563.
2015-08-12 05:14:58 -07:00
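The fix in 7f857034d9 concerns checking what sscanf() returns. A minimal sketch of that pattern follows. The helper name `parse_vendor_part` and the "vendor:part" input format are hypothetical and are not taken from the common verbs code.

```c
#include <stdio.h>

/* Hypothetical helper: parse "vendor:part" as two hex integers.
 * Returns 0 on success, -1 if the input does not contain both fields. */
int parse_vendor_part(const char *text, unsigned int *vendor,
                      unsigned int *part)
{
    /* sscanf() returns the number of conversions that succeeded (or EOF),
     * so anything other than 2 means at least one output was not set. */
    if (2 != sscanf(text, "%x:%x", vendor, part)) {
        return -1;
    }
    return 0;
}
```

Without the check, a malformed string would leave one or both outputs unset and the caller would go on to read them.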
Jeff Squyres
92bc8afd43 opal_progress_threads: fix double RELEASE
If a thread failed to start, the tracker would be released twice.
This commit fixes CID 1316020.
2015-08-12 05:11:40 -07:00
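The double RELEASE described in 92bc8afd43 is a general hazard of reference-counted objects on error paths. The sketch below uses a toy refcounted type (not the OPAL object system; all names are hypothetical) to show an error path that releases the tracker exactly once.

```c
#include <stdlib.h>

/* Hypothetical refcounted tracker; not the OPAL object system. */
typedef struct {
    int refcount;
} tracker_t;

/* Number of trackers currently allocated, for illustration only. */
int live_trackers = 0;

tracker_t *tracker_new(void)
{
    tracker_t *t = malloc(sizeof(*t));
    if (NULL != t) {
        t->refcount = 1;
        ++live_trackers;
    }
    return t;
}

void tracker_release(tracker_t *t)
{
    if (0 == --t->refcount) {
        --live_trackers;
        free(t);
    }
}

/* Stand-in for thread startup; fails when `fail` is nonzero.
 * It never releases the tracker itself. */
int start_thread(tracker_t *t, int fail)
{
    (void) t;
    return fail ? -1 : 0;
}

/* Returns the tracker on success, NULL on failure. */
tracker_t *tracker_start(int fail)
{
    tracker_t *t = tracker_new();
    if (NULL == t) {
        return NULL;
    }
    if (0 != start_thread(t, fail)) {
        tracker_release(t);   /* the only release on this path */
        return NULL;
    }
    return t;
}
```

The design choice is that only one place owns the release on failure: the function that created the object.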
Gilles Gouaillardet
92b2d2ffeb configury: fix libevent configure.ac
fix interleaved messages :
checking for working epoll library interface... checking if epoll can build... yes
yes
2015-08-12 15:37:22 +09:00
Howard Pritchard
f1ac629c9d Merge pull request #794 from nrgraham23/java_handle_bugfix
Java null handle bugfix
2015-08-11 12:30:19 -06:00
Nysal Jan K A
eb846ef427 Merge pull request #798 from nysal/topic/pmi_fixes
Fix PMI and PMI2 builds
2015-08-11 22:02:05 +05:30
Nysal Jan K.A
2d2ea63231 Fix PMI and PMI2 builds 2015-08-11 21:13:45 +05:30
Jeff Squyres
3369606c75 Merge pull request #791 from jsquyres/pr/usnic-async-events
usnic: move cchecker to OPAL-wide progress thread
2015-08-11 07:54:07 -04:00
Jeff Squyres
236cf7ff62 usnic: add v1.8/v1.10 compat code
Add compat code so that I can sync master against the v1.10 branch.
2015-08-10 16:27:38 -07:00
Jeff Squyres
7da1c4b875 usnic: avoid race condition in connectivity checker
In short applications, it's possible that the agent (i.e., local rank
0) will finalize after non-local rank 0 procs detect the connectivity
checker named socket, but before they complete a connect() on it.  As
such, their connect() gets ECONNREFUSED.

This commit adds a simple counter in the agent that won't let it quit
before it accept()'s from all local procs, or 10 seconds goes by
(whichever occurs first).  This is similar to the timeout for the
clients: they'll exit if they don't see the expected named socket
within 10 seconds.
2015-08-10 15:40:33 -07:00
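The exit condition described in 7da1c4b875 (all local procs accepted, or 10 seconds elapsed, whichever comes first) can be sketched as a small predicate. The struct and function names below are hypothetical; only the 10-second value comes from the commit message.

```c
#include <stdbool.h>

/* Timeout taken from the commit message. */
#define AGENT_TIMEOUT_SECS 10.0

/* Hypothetical agent state; field names are illustrative only. */
typedef struct {
    int num_local_procs;   /* clients expected to connect */
    int num_accepted;      /* incremented after each accept() */
    double start_time;     /* when finalization began, in seconds */
} agent_state_t;

/* The agent may quit once every local proc has been accepted,
 * or once the timeout has elapsed, whichever happens first. */
bool agent_may_quit(const agent_state_t *s, double now)
{
    if (s->num_accepted >= s->num_local_procs) {
        return true;
    }
    return (now - s->start_time) >= AGENT_TIMEOUT_SECS;
}
```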
Jeff Squyres
bad508687e usnic: move cchecker to OPAL-wide progress thread
There's no longer any need for the usnic BTL to have its own progress
thread: it can use the opal_progress_thread() infrastructure.  This
commit removes the code to startup/shutdown the usnic-BTL-specific
progress thread and instead, just adds its events to the OPAL-wide
progress thread.

This necessitated a small change in the finalization step.
Previously, we would stop the progress thread and then tear down the
events.  We can no longer stop the progress thread, and if we start
tearing down events, this will cause shutdown/hangups to be sent
across sockets, potentially firing some of the still-remaining events
while some (but not all) of the data structures have been torn down.
Chaos ensues.

Instead, queue up an event to tear down all the pending events.  Since
the progress thread will only fire one event at a time, having a
teardown event means that it can tear down all the pending events
"atomically" and not have to worry that one of those events will get
fired in the middle of the teardown process.
2015-08-10 15:40:33 -07:00
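The teardown-as-an-event idea in bad508687e can be illustrated with a toy single-threaded event loop. This is a sketch under simplifying assumptions (a fixed-size FIFO of callbacks, no sockets, no libevent); none of the names below come from the usnic BTL or OPAL.

```c
#include <stddef.h>

#define MAX_EVENTS 8

typedef struct event_loop event_loop_t;
typedef void (*event_cb_t)(event_loop_t *loop);

/* Toy event loop: a fixed array of pending callbacks. */
struct event_loop {
    event_cb_t pending[MAX_EVENTS];
    size_t count;
    int fired;       /* number of ordinary events that ran */
    int torn_down;   /* set by the teardown event */
};

/* Append an event; returns 0 on success, -1 if the queue is full. */
int loop_add(event_loop_t *loop, event_cb_t cb)
{
    if (loop->count >= MAX_EVENTS) {
        return -1;
    }
    loop->pending[loop->count++] = cb;
    return 0;
}

void ordinary_event(event_loop_t *loop)
{
    ++loop->fired;
}

/* Runs as an event itself, so no other event can fire while it
 * drops everything that is still queued. */
void teardown_event(event_loop_t *loop)
{
    loop->count = 0;
    loop->torn_down = 1;
}

/* Fire events in FIFO order, strictly one at a time. */
void loop_run(event_loop_t *loop)
{
    size_t i = 0;
    while (i < loop->count) {
        event_cb_t cb = loop->pending[i++];
        cb(loop);
    }
    loop->count = 0;
}
```

Because the loop only ever runs one callback at a time, events queued behind the teardown event never fire once it has run.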
Nathaniel Graham
8f4c16da27 Java null handle bugfix
A helper method in Request.java could cause a crash
if the request array that was passed contained nulls.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2015-08-10 15:54:36 -06:00
Nathan Hjelm
624a4a0f82 Merge pull request #699 from hjelmn/libnbc_fixes
coll/libnbc: rewrite parts of libnbc
2015-08-10 14:51:42 -06:00
Jeff Squyres
87db836800 Merge pull request #788 from yburette/topic/deprioritize_some_providers
mtl/ofi: Deprioritize some OFI providers.
2015-08-10 14:45:59 -04:00
Nathan Hjelm
d42e0968b1 coll/libnbc: rewrite parts of libnbc
This commit rewrites parts of libnbc to fix issues identified by
coverity and myself. The changes are as follows:

 - libnbc functions would return invalid error codes (internal to
   libnbc) to the mpi layer. These code names are of the form
   NBC_. They do not match up with the error codes expected by the mpi
   layer. I purged the use of all these error codes with the exception
   of NBC_OK and NBC_CONTINUE in progress. These codes are used to
   identify when a request handle is complete.

 - Handles and schedules were leaked by all collective routines on
   error. A new routine was added to return a collective handle
   (NBC_Return_handle).

 - Temporary buffers containing in/out neighbors for neighborhood
   collectives were always leaked.

 - Neighborhood collectives contained code to handle MPI_IN_PLACE which
   is never a valid input for the send or receive buffer. Stripped this
   code out.

 - Files were inconsistently named. Most are nbc_isomething.c but one
   was named coll_libnbc_ireduce_scatter_block.c.

 - Made the NBC_Schedule "structure" an object so it can be
   retained/released. This may enable the use of schedule caching at a
   later time. More testing will be needed to ensure the caching code
   works. If it doesn't the code should be stripped out completely.

 - Added code to simplify the common case of scheduling send/recv +
   barrier.

 - Code cleanup for readability.

The code now passes the clang static analyzer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-10 11:53:25 -06:00
Rolf vandeVaart
95d19af0eb Merge pull request #783 from rolfv/pr/fix-thread-issue
Refs open-mpi/ompi#627. Fix support for multi-threads with CUDA 7.0
2015-08-10 11:13:56 -04:00
Rolf vandeVaart
8cc6bef090 Refs open-mpi/ompi#627. Fix support for multi-threads with CUDA 7.0 2015-08-10 10:22:45 -04:00
bosilca
e77ff6b84e Merge pull request #789 from bosilca/topic/treematch_coverity
Fix treematch issues identified by Coverity.
2015-08-08 20:59:17 -04:00
George Bosilca
0a91d7af4d Fix issues identified by Coverity. 2015-08-08 16:41:30 -04:00
Howard Pritchard
acf64b20c7 Merge pull request #786 from nrgraham23/add_implements_cloneable
Add explicit implementation of Cloneable
2015-08-08 13:02:43 -06:00
Jeff Squyres
bd5bf4a224 Merge pull request #781 from hppritcha/topic/suppress_picky_warning
mca/topo: suppress picky warning
2015-08-08 06:14:52 -04:00
yohann
88038b5261 mtl/ofi: Deprioritize some OFI providers.
Some OFI providers such as "sockets" are used for debugging
purposes mostly. For these providers, other components usually
offer better performance -- e.g. for sockets, the BTL/TCP would
be a better choice.
Thus, we chose to ignore some providers unless explicitly asked
by the user on the command line:

e.g. --mca mtl_ofi_provider sockets
2015-08-07 16:09:51 -07:00
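The selection rule described in 88038b5261 (skip certain providers unless the user names one explicitly) can be sketched as follows. Only "sockets" and the `mtl_ofi_provider` parameter come from the commit message. The function name, the list structure, and the assumption that an explicit request excludes every other provider are illustrative guesses, not the MTL's actual logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Providers skipped by default. Only "sockets" is named in the commit
 * message; keeping them in an array is an assumption. */
static const char *const deprioritized[] = { "sockets" };

/* `requested` models the value of the mtl_ofi_provider parameter,
 * or NULL when the user did not set it. */
bool provider_is_usable(const char *provider, const char *requested)
{
    size_t n = sizeof(deprioritized) / sizeof(deprioritized[0]);

    if (NULL != requested) {
        /* An explicit request wins, even for a deprioritized provider. */
        return 0 == strcmp(provider, requested);
    }
    for (size_t i = 0; i < n; ++i) {
        if (0 == strcmp(provider, deprioritized[i])) {
            return false;
        }
    }
    return true;
}
```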
Edgar Gabriel
d719497f82 Performance tuning: increase the priority of the sm sharedfp component to ensure that it is selected if it can run. 2015-08-07 16:32:53 -05:00
Edgar Gabriel
8561f5f180 Merge pull request #787 from edgargabriel/pr/adio-lustre-fix
remove an erroneous parenthesis which prevents the compilation of the …
2015-08-07 16:18:15 -05:00
Howard Pritchard
8e7e4ca7f4 Merge pull request #780 from hppritcha/topic/plm_alps_minor_cleanup
plm/alps: remove unneeded env. variable setting
2015-08-07 15:03:45 -06:00
Edgar Gabriel
9e29edf15c remove an erroneous parenthesis which prevents the compilation of the lustre adio 2015-08-07 15:22:41 -05:00
rhc54
d6c5770436 Merge pull request #784 from jsquyres/pr/async-event-base-init
Update the opal_progress_thread API
2015-08-07 12:26:43 -07:00
Edgar Gabriel
1293d9c69b free memory correctly in case of an error. Fixes CID 131540 and CID 1315419 2015-08-07 13:30:50 -05:00
Nathaniel Graham
8fcc317a57 Add explicit implementation of Cloneable
Added Cloneable to the implemented interface list as per
Coverity suggestion.  The required methods were already
implemented, but it was not explicitly stated.  This is
an intent revealing change.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2015-08-07 12:16:06 -06:00
Edgar Gabriel
0aa3049bfc Performance tuning: change the default behavior of ompio to *not* segment individual read/write operations.
In most cases, performance seems to be better if not segmented.
2015-08-07 13:06:39 -05:00
Edgar Gabriel
db5af26de7 Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with our optimized default file view. Otherwise, performance will suffer. file_get_view should still return the correct filetype, not our optimized default file view. This is the correct version compared to ffa67b9693, which unfortunately broke
some test cases in mpi_test_suite. Thanks to @ggouaillardet for reporting this!
2015-08-07 12:49:58 -05:00
Jeff Squyres
9e1e563120 event: remove opal_async_event_base
opal_async_event_base is not used anywhere.  The opal_progress_thread
API should be used instead.
2015-08-07 10:13:41 -07:00
Jeff Squyres
09f7434491 ORTE: update for the new opal_progress_thread API 2015-08-07 10:13:40 -07:00
Jeff Squyres
d7c25f683e pmix_native: update to the new opal_progress_thread API 2015-08-07 10:13:40 -07:00
Jeff Squyres
99fa054507 opal_progress_threads: update to the API
There are now four functions and one global constant:

* opal_progress_thread_name: the name of the OPAL-wide async progress
  thread.  If you have general purpose events that you need to run in
  *a* progress thread, but not a *dedicated* progress thread, use this
  name in the functions below to glom your events on to the general
  OPAL-wide async progress thread.
* opal_progress_thread_init(): return an event base corresponding to a
  progress thread of the specified name (a progress thread will be
  created for that name if it does not already exist).
* opal_progress_thread_finalize(): decrement the refcount on the
  passed progress thread name.  If the refcount is 0, stop the thread
  and destroy the event base.
* opal_progress_thread_pause(): stop processing events on the event
  base corresponding to the progress thread name, but do not destroy
  the event base.
* opal_progress_thread_resume(): resume processing events on the event
  base corresponding to a previously-paused progress thread name.
2015-08-07 10:13:40 -07:00
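The name-keyed, reference-counted behavior described in 99fa054507 can be sketched with a toy registry. This is not the OPAL implementation: there are no real threads or event bases here, every `toy_` name and signature is hypothetical, and the fixed-size table is a simplification.

```c
#include <string.h>

#define MAX_TRACKERS 4
#define NAME_LEN 32

/* Toy stand-in for a named progress thread and its event base. */
typedef struct {
    char name[NAME_LEN];
    int refcount;   /* 0 means the slot is free */
    int paused;
} toy_tracker_t;

toy_tracker_t toy_trackers[MAX_TRACKERS];

toy_tracker_t *toy_find(const char *name)
{
    for (int i = 0; i < MAX_TRACKERS; ++i) {
        if (toy_trackers[i].refcount > 0 &&
            0 == strcmp(toy_trackers[i].name, name)) {
            return &toy_trackers[i];
        }
    }
    return NULL;
}

/* Create the named tracker on first use, otherwise bump its refcount. */
toy_tracker_t *toy_progress_init(const char *name)
{
    toy_tracker_t *t = toy_find(name);
    if (NULL != t) {
        ++t->refcount;
        return t;
    }
    if (strlen(name) >= NAME_LEN) {
        return NULL;
    }
    for (int i = 0; i < MAX_TRACKERS; ++i) {
        if (0 == toy_trackers[i].refcount) {
            strcpy(toy_trackers[i].name, name);
            toy_trackers[i].refcount = 1;
            toy_trackers[i].paused = 0;
            return &toy_trackers[i];
        }
    }
    return NULL;   /* registry full */
}

/* Drop one reference; the slot becomes free when the count reaches 0.
 * Returns 0 on success, -1 if the name is unknown. */
int toy_progress_finalize(const char *name)
{
    toy_tracker_t *t = toy_find(name);
    if (NULL == t) {
        return -1;
    }
    --t->refcount;
    return 0;
}

/* Pause and resume only flip a flag; the tracker itself survives. */
int toy_progress_pause(const char *name)
{
    toy_tracker_t *t = toy_find(name);
    if (NULL == t) {
        return -1;
    }
    t->paused = 1;
    return 0;
}

int toy_progress_resume(const char *name)
{
    toy_tracker_t *t = toy_find(name);
    if (NULL == t) {
        return -1;
    }
    t->paused = 0;
    return 0;
}
```

Two callers that initialize the same name share one tracker, and it only disappears after both have finalized it.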
Edgar Gabriel
6f6c01ee8d free the datatypes that were created using type_dup during file_set_view 2015-08-07 11:50:25 -05:00
Edgar Gabriel
1ae4f8c7e6 Revert "Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with"
This reverts commit ffa67b9693.
2015-08-07 09:53:07 -05:00
Gilles Gouaillardet
67fdfdda7d configury: patch generated configure files to fix the libtool.m4 bug
the bug started with libtool 2.4.3
Fixes open-mpi/ompi#751
2015-08-07 11:47:32 +09:00
Gilles Gouaillardet
907c095f66 Merge pull request #779 from edgargabriel/topic/fcoll_fixes
Topic/fcoll fixes
2015-08-07 09:14:31 +09:00
Jeff Squyres
b5c37dbfe2 CSCuv67889: usnic: fix an error corner case
Ensure that we have non-NULL on all levels of pointers, which will
save us if there are exitable errors very early during component /
module initialization.
2015-08-06 10:54:28 -07:00
Jeff Squyres
c9e91ff4fc .gitignore: add man page in CUDA extension 2015-08-06 06:49:33 -07:00
Jeff Squyres
982455aaa4 autogen.pl: ensure to patch *all* Autotools output
Per open-mpi/ompi#751, ensure to patch up all Autotools output (not
just in some cases).

Also, adjust the patching process to only write our verbose statements
and a new configure script if the content actually changed as a result
of the patching.
2015-08-06 04:38:44 -07:00
Rolf vandeVaart
731cfe3e46 Merge pull request #777 from rolfv/pr/cudaext-build-always
Build always and fix return value
2015-08-05 18:13:58 -04:00
Howard Pritchard
10aac8037f mca/topo: suppress picky warning
When configured with --enable-picky

topo_base_lazy_init.c compiles with a warning:

  CC       base/topo_base_lazy_init.lo
base/topo_base_lazy_init.c:46:67: warning: implicit conversion from enumeration type 'enum mca_base_register_flag_t' to different enumeration type 'mca_base_open_flag_t' (aka 'enum mca_base_open_flag_t') [-Wenum-conversion]
        err = mca_base_framework_open (&ompi_topo_base_framework, MCA_BASE_REGISTER_DEFAULT);

This commit fixes this implicit conversion problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-08-05 16:11:04 -06:00
Jeff Squyres
98b5551126 openmpi-nightly-tarball: update libfabric install location 2015-08-05 17:57:07 -04:00
Rolf vandeVaart
cb84a85d17 Build always and fix return value 2015-08-05 17:23:55 -04:00
Howard Pritchard
1b55d14dff plm/alps: remove unneeded env. variable setting
In order to address issue #741, the orted's now are
always launched with the Cray PMI environment variables

PMI_NO_FORK
PMI_NO_PREINITIALIZE

set to disable running of the library's ctor.
So there's no longer a need to set these for the
application(s) being launched by the orted's.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-08-05 13:27:18 -07:00
Edgar Gabriel
16d4171f6b the individual component should call internal ompio functions directly. The reason is that otherwise
the redirection to the ompi_file_t structure (and back to the ompio internal structure) is ambiguous and wrong
for the shared file pointer scenario.
2015-08-05 14:31:11 -05:00
Edgar Gabriel
02a4eb2f13 add the ompi_file_t pointer correctly on the ompio file handle for the sm and individual component. 2015-08-05 14:28:27 -05:00
Jeff Squyres
ae0de54f6c Merge pull request #774 from jsquyres/pr/FUNCTION-fixes
__FUNCTION__ --> __func__ fixes
2015-08-05 09:59:49 -04:00
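For context on the `__FUNCTION__` --> `__func__` change: `__func__` is the predefined identifier standardized in C99, while `__FUNCTION__` is a compiler extension. A minimal sketch of using it in a message-formatting macro follows; the macro and function names are hypothetical and not taken from the Open MPI sources.

```c
#include <stdio.h>

/* Format "<function>: <message>" into buf using the C99 __func__
 * identifier rather than the compiler-specific __FUNCTION__. */
#define FORMAT_WITH_FUNC(buf, size, msg) \
    snprintf((buf), (size), "%s: %s", __func__, (msg))

/* Hypothetical caller used only for illustration. Returns the number
 * of characters snprintf() would have written. */
int describe_failure(char *buf, size_t size)
{
    return FORMAT_WITH_FUNC(buf, size, "something failed");
}
```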