1
1
Граф коммитов

23270 Коммитов

Автор SHA1 Сообщение Дата
Yohann Burette
452f97f84e Merge pull request #795 from yburette/topic/include_exclude_lists
mtl/ofi: add include/exclude list MCA vars.
2015-08-12 14:33:03 -07:00
yohann
27520b99b8 mtl/ofi: add include/exclude list MCA vars.
mtl_ofi_provider_include (resp. mtl_ofi_provider_exclude) can be used
to specify which provider(s) the OFI MTL can select (resp. ignore).

e.g. --mca mtl_ofi_provider_include "psm,sockets"

By default, mtl_ofi_provider_exclude is set to "sockets,mxm".

This deprecates the old MCA var named "mtl_ofi_provider".
2015-08-12 13:52:04 -07:00
bosilca
4368ebb490 Merge pull request #803 from jsquyres/pr/treematch-sans-hwloc-fix
treematch: ensure hwloc support is enabled
2015-08-12 16:05:05 -04:00
Jeff Squyres
e9b7203ece treematch: ensure hwloc support is enabled
This commit does the following:

* s/ompi_check_treematch/ompi_topo_treematch/ (i.e., abide by the
  prefix rule)
* change the value of ompi_topo_treematch_happy from yes/no to 0/1, so
  that we can use -eq for numerical comparisons (vs. string
  comparisons).  It's the little things in life, no?
* Check the valueo f $OPAL_HAVE_HWLOC to ensure that hwloc support is
  enabled.  If not, disqualify treematch from building.
* Fixes a few places that were underquoted
* Convert from "test ... -a ..." to "test ... && test ..."

Fixes open-mpi/ompi#797
2015-08-12 12:23:12 -07:00
Jeff Squyres
6a7d5271c4 ompi_ext.m4: allow extensions to have config.h.in
Previously, extensions were required to have a config.h for their C
bindings.  This commit allows them to have a config.h.in, in case
their C bindings header file is generated.
2015-08-12 12:22:59 -07:00
rhc54
929c05dbdb Merge pull request #801 from rhc54/topic/cleanup
Cleanup some cruft and update to coordinate with CM operations:
2015-08-12 11:47:26 -07:00
Ralph Castain
0b1d4b62be Cleanup some cruft and update to coordinate with CM operations:
* don't pass --tree-spawn to the orted cmd line. If someone doesn't want tree-spawn, it shows up as an MCA param anyway
* ensure state/orted component disqualifies itself from CM operations
* clarify the DVM proc_type definitions
* ensure we stop littering the tmp dir with session directories
2015-08-12 10:32:14 -07:00
Edgar Gabriel
55f0e1a1f8 fix the lustre compilation problems for older lustre versions. Add the prototype for the static function to avoid a warning message. 2015-08-12 09:45:07 -05:00
Jeff Squyres
2409fa166b get_library_version: checking string constants vs. NULL is dead code
The prior code was checking string constants (which are #defines from
configure) against NULL.  They can never be NULL, so the checks were
overly-defensive.  If the preprocessor macros do not exist, we'll get
a different compiler error.  So remove the dead code.

This fixes CID 72349.
2015-08-12 05:35:12 -07:00
Jeff Squyres
3be125afff op base: whitespace cleanup
No logical code changes.
2015-08-12 05:35:11 -07:00
Jeff Squyres
a2addbafed op base: move return statement to correct level
This fixes CID 71945.
2015-08-12 05:35:11 -07:00
Jeff Squyres
31b329e585 odls default: ensure to initialize opts
This fixes CID 71127.
2015-08-12 05:27:37 -07:00
Jeff Squyres
14340770c4 usnic: remove some logically dead code
This code really had no purpose; just assign FI_VERSION(1, 1).  This
fixes CID 1315274.

Also clarify the commet about why we still retain libfabric v1.0.0
compatibility code, even though configure.m4 requires libfabric >= v1.1.0.
2015-08-12 05:21:18 -07:00
Jeff Squyres
7f857034d9 common verbs: check return value of sscanf()
Fixes CID 1304563.
2015-08-12 05:14:58 -07:00
Jeff Squyres
92bc8afd43 opal_progress_threads: fix double RELEASE
If a thread failed to start, the tracker would be released twice.
This commit fixes CID 1316020.
2015-08-12 05:11:40 -07:00
Gilles Gouaillardet
92b2d2ffeb configury: fix libevent configure.ac
fix interleaved messages :
checking for working epoll library interface... checking if epoll can build... yes
yes
2015-08-12 15:37:22 +09:00
Howard Pritchard
f1ac629c9d Merge pull request #794 from nrgraham23/java_handle_bugfix
Java null handle bugfix
2015-08-11 12:30:19 -06:00
Nysal Jan K A
eb846ef427 Merge pull request #798 from nysal/topic/pmi_fixes
Fix PMI and PMI2 builds
2015-08-11 22:02:05 +05:30
Nysal Jan K.A
2d2ea63231 Fix PMI and PMI2 builds 2015-08-11 21:13:45 +05:30
Jeff Squyres
3369606c75 Merge pull request #791 from jsquyres/pr/usnic-async-events
usnic: move cchecker to OPAL-wide progress thread
2015-08-11 07:54:07 -04:00
Jeff Squyres
236cf7ff62 usnic: add v1.8/v1.10 compat code
Add compat code so that I can sync master against the v1.10 branch.
2015-08-10 16:27:38 -07:00
Jeff Squyres
7da1c4b875 usnic: avoid race condition in connectivity checker
In short applications, it's possible that the agent (i.e., local rank
0) will finalize after non-local rank 0 procs detect the connectivity
checker named socket, but before they complete a connect() on it.  As
such, their connect() gets ECONNREFUSED.

This commit adds a simple counter in the agent that won't let it quit
before it accept()'s from all local procs, or 10 seconds goes by
(whichever occurs first).  This is similar to the timeout for the
clients: they'll exit if they don't see the expected named socket
within 10 seconds.
2015-08-10 15:40:33 -07:00
Jeff Squyres
bad508687e usnic: move cchecker to OPAL-wide progress thread
There's no longer any need for the usnic BTL to have its own progress
thread: it can use the opal_progress_thread() infrastructure.  This
commit removes the code to startup/shutdown the usnic-BTL-specific
progress thread and instead, just adds its events to the OPAL-wide
progress thread.

This necessitated a small change in the finalization step.
Previously, we would stop the progress thread and then tear down the
events.  We can no longer stop the progress thread, and if we start
tearing down events, this will cause shutdown/hangups to be sent
across sockets, potentially firing some of the still-remaining events
while some (but not all) of the data structures have been torn down.
Chaos ensues.

Instead, queue up an event to tear down all the pending events.  Since
the progress thread will only fire one event at a time, having a
teardown event means that it can tear down all the pending events
"atomically" and not have to worry that one of those events will get
fired in the middle of the teardown process.
2015-08-10 15:40:33 -07:00
Nathaniel Graham
8f4c16da27 Java null handle bugfix
A helper method in Request.java could cause a crash
if the request array that was passed contained nulls.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2015-08-10 15:54:36 -06:00
Nathan Hjelm
624a4a0f82 Merge pull request #699 from hjelmn/libnbc_fixes
coll/libnbc: rewrite parts of libnbc
2015-08-10 14:51:42 -06:00
Jeff Squyres
87db836800 Merge pull request #788 from yburette/topic/deprioritize_some_providers
mtl/ofi: Deprioritize some OFI providers.
2015-08-10 14:45:59 -04:00
Nathan Hjelm
d42e0968b1 coll/libnbc: rewrite parts of libnbc
This commit rewrites parts of libnbc to fix issues identified by
coverity and myself. The changes are as follows:

 - libnbc function would return invalid error codes (internal to
   libnbc) to the mpi layer. These codes names are of the form
   NBC_. They do not match up with the error codes expected by the mpi
   layer. I purged the use of all these error codes with the exception
   of NBC_OK and NBC_CONTINUE in progress. These codes are used to
   identify when a request handle is complete.

 - Handles and schedules were leaked by all collective routines on
   error. A new routine was added to return a collective handle
   (NBC_Return_handle).

 - Temporary buffers containting in/out neighbors for neighborhood
   collectives were always leaked.

 - Neigborhood collectives contained code to handle MPI_IN_PLACE which
   is never a valid input for the send or receive buffer. Stipped this
   code out.

 - Files were inconsistently named. Most are nbc_isomething.c but one
   was named coll_libnbc_ireduce_scatter_block.c.

 - Made the NBC_Schedule "structure" and object so it can be
   retained/released. This may enable the use of schedule caching at a
   later time. More testing will be needed to ensure the caching code
   works. If it doesn't the code should be stripped out completely.

 - Added code to simply common case of scheduling send/recv +
   barrier.

 - Code cleanup for readability.

The code now passes the clang static analyzer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-10 11:53:25 -06:00
Rolf vandeVaart
95d19af0eb Merge pull request #783 from rolfv/pr/fix-thread-issue
Refs open-mpi/ompi#627. Fix support for multi-threads with CUDA 7.0
2015-08-10 11:13:56 -04:00
Rolf vandeVaart
8cc6bef090 Refs open-mpi/ompi#627. Fix support for multi-threads with CUDA 7.0 2015-08-10 10:22:45 -04:00
bosilca
e77ff6b84e Merge pull request #789 from bosilca/topic/treematch_coverity
Fix treematch issues identified by Coverity.
2015-08-08 20:59:17 -04:00
George Bosilca
0a91d7af4d Fix issues identified by Coverity. 2015-08-08 16:41:30 -04:00
Howard Pritchard
acf64b20c7 Merge pull request #786 from nrgraham23/add_implements_cloneable
Add explicit implementation of Cloneable
2015-08-08 13:02:43 -06:00
Jeff Squyres
bd5bf4a224 Merge pull request #781 from hppritcha/topic/suppress_picky_warning
mca/topo: suppress picky warning
2015-08-08 06:14:52 -04:00
yohann
88038b5261 mtl/ofi: Deprioritize some OFI providers.
Some OFI providers such as "sockets" are used for debugging
purposes mostly. For these providers, other components usually
offer better performance -- e.g. for sockets, the BTL/TCP would
be a better choice.
Thus, we chose to ignore some providers unless explicitly asked
by the user on the command line:

e.g. --mca mtl_ofi_provider sockets
2015-08-07 16:09:51 -07:00
Edgar Gabriel
d719497f82 Performance tuning: increase the priority of the sm sharedfp component to ensure that it is selected if it can run. 2015-08-07 16:32:53 -05:00
Edgar Gabriel
8561f5f180 Merge pull request #787 from edgargabriel/pr/adio-lustre-fix
remove a erroneous parenthesis which prevents the compilation of the …
2015-08-07 16:18:15 -05:00
Howard Pritchard
8e7e4ca7f4 Merge pull request #780 from hppritcha/topic/plm_alps_minor_cleanup
plm/alps: remove unneded env. variable setting
2015-08-07 15:03:45 -06:00
Edgar Gabriel
9e29edf15c remove a erroneous paranthesis which prevents the compilation of the lustre adio 2015-08-07 15:22:41 -05:00
rhc54
d6c5770436 Merge pull request #784 from jsquyres/pr/async-event-base-init
Update the opal_progress_thread API
2015-08-07 12:26:43 -07:00
Edgar Gabriel
1293d9c69b free memory correctly in case of an error. Fixes CID 131540 and CID 1315419 2015-08-07 13:30:50 -05:00
Nathaniel Graham
8fcc317a57 Add explicit implementation of Cloneable
Added Cloneable to the implemented interface list as per
Coverity suggestion.  The required methods were already
implemented, but it was not explicitly stated.  This is
an intent revealing change.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2015-08-07 12:16:06 -06:00
Edgar Gabriel
0aa3049bfc Performance tuning: change the default behavior of ompio to *not* segment individual read/write operations.
In most cases, performance seems to be better if not segmented.
2015-08-07 13:06:39 -05:00
Edgar Gabriel
db5af26de7 Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with our optimized default file view. Otherwise, performance will suffer. file_get_view should still return the correct filetype, not our optimized default file view. This is the correct version compared to ffa67b9693, which unfortunately broke
some test cases in mpi_test_suite. Thanks for @ggouaillardet for reporting this!
2015-08-07 12:49:58 -05:00
Jeff Squyres
9e1e563120 event: remove opal_async_event_base
opal_async_event_base is not used anywhere.  The opal_progress_thread
API should be used instead.
2015-08-07 10:13:41 -07:00
Jeff Squyres
09f7434491 ORTE: update for the new opal_progress_thread API 2015-08-07 10:13:40 -07:00
Jeff Squyres
d7c25f683e pmix_native: update to the new opal_progress_thread API 2015-08-07 10:13:40 -07:00
Jeff Squyres
99fa054507 opal_progress_threads: update to the API
There are now four functions and one global constant:

* opal_progress_thread_name: the name of the OPAL-wide async progress
  thread.  If you have general purpose events that you need to run in
  *a* progress thread, but not a *dedicated* progress thread, use this
  name in the functions below to glom your events on to the general
  OPAL-wide async progress thread.
* opal_progress_thread_init(): return an event base corresponding to a
  progress thread of the specified name (a progress thread will be
  created for that name if it does not already exist).
* opal_progress_thread_finalize(): decrement the refcount on the
  passed progress thread name.  If the refcount is 0, stop the thread
  and destroy the event base.
* opal_progress_thread_pause(): stop processing events on the event
  base corresponding to the progress thread name, but do not destroy
  the event base.
* opal_progess_thread_resume(): resume processing events on the event
  base corresponding to a previously-paused progress thread name.
2015-08-07 10:13:40 -07:00
Edgar Gabriel
6f6c01ee8d free the datatypes that were created using type_dup during file_set_view 2015-08-07 11:50:25 -05:00
Edgar Gabriel
1ae4f8c7e6 Revert "Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with"
This reverts commit ffa67b9693.
2015-08-07 09:53:07 -05:00
Gilles Gouaillardet
67fdfdda7d configury: patch generated configure files to fix the libtool.m4 bug
the bug started with libtool 2.4.3
Fixes open-mpi/ompi#751
2015-08-07 11:47:32 +09:00