Commit graph

23443 commits

Author SHA1 Message Date
yohann
98b300e1bb mtl/ofi: Require proper ordering by OFI provider. 2015-08-14 16:36:10 -07:00
Rolf vandeVaart
652a685e78 Merge pull request #811 from rolfv/pr/fix-cuda-ext-again
Fix macro return value when not CUDA-aware
2015-08-14 14:40:45 -04:00
Rolf vandeVaart
0e87478e40 Fix macro return value when not CUDA-aware 2015-08-14 13:56:25 -04:00
Edgar Gabriel
022a9d8d89 Merge pull request #810 from edgargabriel/pr/coll_timing_cleanup
Code cleanup for the time breakdown feature in ompio/fcoll
2015-08-14 10:05:12 -05:00
Jeff Squyres
42b9a966d6 Makefile.am's: if calling OPAL functions, must link to it
On some OSs (e.g., Ubuntu 14.04.2 LTS), the linker is configured such
that the symbols of library dependencies are not available to the
application.  Hence, we need to explicitly list such dependencies when
creating the executable.

For this commit, these tests use OPAL function calls, so we must
explicitly link in libopen-pal.so.
2015-08-14 07:51:55 -07:00
Edgar Gabriel
072b18e197 Code cleanup for the time breakdown feature in ompio/fcoll
- make the internal structure follow the Open MPI naming convention
 - provide a single flag/macro which controls the compilation/utilization of this
   feature, to avoid that somebody using this has to modify every single
   fcoll component. A configure option could be added later if desired.
2015-08-14 08:53:04 -05:00
Rolf vandeVaart
a7dcfb2012 Merge pull request #806 from rolfv/pr/add-static-defs
Add static definitions where needed and remove one unused definition
2015-08-14 08:59:17 -04:00
Ralph Castain
5040f47ef3 Use the correct verbosity in an output_verbose 2015-08-13 22:33:25 -07:00
Gilles Gouaillardet
4c15560e49 patch generated configure to fix another libtool bug.
Thanks to Eric Schnetter for reporting this issue and providing a fix.
2015-08-14 10:37:40 +09:00
Edgar Gabriel
4bfc6ae798 Performance tuning: incorporate the usage of non-blocking operations in our array group-communication operations. 2015-08-13 20:05:18 -05:00
Gilles Gouaillardet
d02ccd67de btl/openib: remove OFED version runtime check when XRC is used
this test seems broken:
 - some false positives were reported
 - it fails to detect some OFED version mismatches
this commit simply removes this test, which means the application
will likely fail if XRC is used and the OFED version differs
between compile time and runtime
2015-08-14 09:10:03 +09:00
Gilles Gouaillardet
6118236f1a Merge pull request #796 from ggouaillardet/topic/hcoll_config
configury: fix hcoll, fca and mxm detection and revamp yalla Makefile.am
Thanks to David Shrader and Ake Sandgren for bringing this issue to our attention
2015-08-14 08:55:46 +09:00
Edgar Gabriel
9f369ba515 move the inclusion of the lustre_user and liblustreapi header files to the fs_lustre.h file. 2015-08-13 15:36:16 -05:00
Nathan Hjelm
b8356dae05 ompi/win: add internal support for returning same_size and same_disp_unit info keys
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-13 13:19:52 -06:00
Nathan Hjelm
b933eda36b ompi: add internal error code for MPI_ERR_RMA_FLAVOR
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-13 13:18:29 -06:00
Rolf vandeVaart
cb8c86910e Add static definitions where needed and remove one unused definition 2015-08-13 14:59:07 -04:00
Ralph Castain
a2a049a612 Update test to match the one in MTT 2015-08-13 11:12:34 -07:00
Gilles Gouaillardet
7b7842fd5a configury: fix description of the --with-knem option 2015-08-13 17:53:48 +09:00
Gilles Gouaillardet
6b2fe9120e yalla: fix Makefile.am LDFLAGS 2015-08-13 17:33:52 +09:00
Gilles Gouaillardet
288e13ba7c configury: fix --with-mxm
* search mxm libs in both DIR/lib and DIR/lib64
 * fix the description of the --with-mxm option
2015-08-13 11:09:26 +09:00
Gilles Gouaillardet
1a238d3a4f configury: fix fca detection
* do not add -I/.../include/fca -I /.../include/fca_core to CPPFLAGS
 * allow configure --with-fca
 * search fca libs in both DIR/lib and DIR/lib64
 * fix the description of the --with-fca option
2015-08-13 11:09:15 +09:00
Gilles Gouaillardet
df98a73131 configury: fix hcoll detection
* do not add -I/.../include/hcoll -I /.../include/hcoll/api to CPPFLAGS
 * allow configure --with-hcoll
 * search hcoll libs in both DIR/lib and DIR/lib64
 * fix the description of the --with-hcoll option
2015-08-13 11:08:56 +09:00
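The "search libs in both DIR/lib and DIR/lib64" behaviour named in the three configury commits above can be sketched as follows. This is a minimal illustration in Python, not the actual m4 configure logic: the function name find_lib_dir, the search order (lib before lib64), and the file name libhcoll.so are assumptions made for the example.

```python
import os
import tempfile


def find_lib_dir(prefix, lib_name, subdirs=("lib", "lib64")):
    """Return the first PREFIX/<subdir> that contains lib_name, or None."""
    for sub in subdirs:
        candidate = os.path.join(prefix, sub)
        if os.path.isfile(os.path.join(candidate, lib_name)):
            return candidate
    return None


# Build a throwaway install tree that only has a lib64 directory.
prefix = tempfile.mkdtemp()
os.makedirs(os.path.join(prefix, "lib64"))
open(os.path.join(prefix, "lib64", "libhcoll.so"), "w").close()  # hypothetical name

found = find_lib_dir(prefix, "libhcoll.so")
print(found)  # the lib64 directory under the temporary prefix
```

A check that only looked in DIR/lib would return nothing for this tree, which is the kind of detection failure the commits describe fixing.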
Yohann Burette
452f97f84e Merge pull request #795 from yburette/topic/include_exclude_lists
mtl/ofi: add include/exclude list MCA vars.
2015-08-12 14:33:03 -07:00
yohann
27520b99b8 mtl/ofi: add include/exclude list MCA vars.
mtl_ofi_provider_include (resp. mtl_ofi_provider_exclude) can be used
to specify which provider(s) the OFI MTL can select (resp. ignore).

e.g. --mca mtl_ofi_provider_include "psm,sockets"

By default, mtl_ofi_provider_exclude is set to "sockets,mxm".

This deprecates the old MCA var named "mtl_ofi_provider".
2015-08-12 13:52:04 -07:00
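The include/exclude semantics described in the commit above can be sketched roughly as follows. This is a minimal illustration in Python, not the OFI MTL's code. Only the variable names mtl_ofi_provider_include and mtl_ofi_provider_exclude and the default exclude value "sockets,mxm" come from the commit message; the function names, the rule that a non-empty include list takes precedence over the exclude list, and the provider name "usnic" in the sample list are assumptions.

```python
def parse_list(value):
    """Split a comma-separated MCA-style string into a list of names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def select_providers(available, include=None, exclude="sockets,mxm"):
    """Return the providers that may be selected, preserving their order.

    Assumption: if an include list is given, only those providers are
    eligible and the exclude list is ignored; otherwise every provider
    not named in the exclude list is eligible.
    """
    if include:
        wanted = set(parse_list(include))
        return [p for p in available if p in wanted]
    unwanted = set(parse_list(exclude))
    return [p for p in available if p not in unwanted]


available = ["psm", "sockets", "mxm", "usnic"]  # hypothetical provider list
print(select_providers(available))                         # ['psm', 'usnic']
print(select_providers(available, include="psm,sockets"))  # ['psm', 'sockets']
```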
bosilca
4368ebb490 Merge pull request #803 from jsquyres/pr/treematch-sans-hwloc-fix
treematch: ensure hwloc support is enabled
2015-08-12 16:05:05 -04:00
Jeff Squyres
e9b7203ece treematch: ensure hwloc support is enabled
This commit does the following:

* s/ompi_check_treematch/ompi_topo_treematch/ (i.e., abide by the
  prefix rule)
* change the value of ompi_topo_treematch_happy from yes/no to 0/1, so
  that we can use -eq for numerical comparisons (vs. string
  comparisons).  It's the little things in life, no?
* Check the value of $OPAL_HAVE_HWLOC to ensure that hwloc support is
  enabled.  If not, disqualify treematch from building.
* Fixes a few places that were underquoted
* Convert from "test ... -a ..." to "test ... && test ..."

Fixes open-mpi/ompi#797
2015-08-12 12:23:12 -07:00
Jeff Squyres
6a7d5271c4 ompi_ext.m4: allow extensions to have config.h.in
Previously, extensions were required to have a config.h for their C
bindings.  This commit allows them to have a config.h.in, in case
their C bindings header file is generated.
2015-08-12 12:22:59 -07:00
rhc54
929c05dbdb Merge pull request #801 from rhc54/topic/cleanup
Cleanup some cruft and update to coordinate with CM operations:
2015-08-12 11:47:26 -07:00
Ralph Castain
0b1d4b62be Cleanup some cruft and update to coordinate with CM operations:
* don't pass --tree-spawn to the orted cmd line. If someone doesn't want tree-spawn, it shows up as an MCA param anyway
* ensure state/orted component disqualifies itself from CM operations
* clarify the DVM proc_type definitions
* ensure we stop littering the tmp dir with session directories
2015-08-12 10:32:14 -07:00
Edgar Gabriel
55f0e1a1f8 fix the lustre compilation problems for older lustre versions. Add the prototype for the static function to avoid a warning message. 2015-08-12 09:45:07 -05:00
Jeff Squyres
2409fa166b get_library_version: checking string constants vs. NULL is dead code
The prior code was checking string constants (which are #defines from
configure) against NULL.  They can never be NULL, so the checks were
overly-defensive.  If the preprocessor macros do not exist, we'll get
a different compiler error.  So remove the dead code.

This fixes CID 72349.
2015-08-12 05:35:12 -07:00
Jeff Squyres
3be125afff op base: whitespace cleanup
No logical code changes.
2015-08-12 05:35:11 -07:00
Jeff Squyres
a2addbafed op base: move return statement to correct level
This fixes CID 71945.
2015-08-12 05:35:11 -07:00
Jeff Squyres
31b329e585 odls default: ensure to initialize opts
This fixes CID 71127.
2015-08-12 05:27:37 -07:00
Jeff Squyres
14340770c4 usnic: remove some logically dead code
This code really had no purpose; just assign FI_VERSION(1, 1).  This
fixes CID 1315274.

Also clarify the comment about why we still retain libfabric v1.0.0
compatibility code, even though configure.m4 requires libfabric >= v1.1.0.
2015-08-12 05:21:18 -07:00
Jeff Squyres
7f857034d9 common verbs: check return value of sscanf()
Fixes CID 1304563.
2015-08-12 05:14:58 -07:00
Jeff Squyres
92bc8afd43 opal_progress_threads: fix double RELEASE
If a thread failed to start, the tracker would be released twice.
This commit fixes CID 1316020.
2015-08-12 05:11:40 -07:00
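The class of bug named in the commit above (an object released twice on an error path) can be illustrated with a toy reference count. This is a hedged sketch in Python, not the OPAL object system: the Tracker class, start_tracker, and failing_start are hypothetical names, and the use of OSError to model a thread that fails to start is an assumption.

```python
class Tracker:
    """Toy reference-counted object; releasing past zero is an error."""

    def __init__(self):
        self.refcount = 1
        self.destroyed = False

    def release(self):
        if self.refcount <= 0:
            raise RuntimeError("double release")
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True


def start_tracker(start_thread):
    """Create a tracker and start its thread; on failure, release once."""
    tracker = Tracker()
    try:
        start_thread(tracker)
    except OSError:
        tracker.release()  # the error path owns exactly one release
        return None
    return tracker


def failing_start(tracker):
    """Stand-in for a thread that cannot be started."""
    raise OSError("could not start thread")


result = start_tracker(failing_start)
print(result)  # None
```

Keeping the release in a single place on the failure path is one way to avoid the second release; a caller that also released the tracker after a failed start would hit the RuntimeError in this sketch.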
Gilles Gouaillardet
92b2d2ffeb configury: fix libevent configure.ac
fix interleaved messages :
checking for working epoll library interface... checking if epoll can build... yes
yes
2015-08-12 15:37:22 +09:00
Howard Pritchard
f1ac629c9d Merge pull request #794 from nrgraham23/java_handle_bugfix
Java null handle bugfix
2015-08-11 12:30:19 -06:00
Nysal Jan K A
eb846ef427 Merge pull request #798 from nysal/topic/pmi_fixes
Fix PMI and PMI2 builds
2015-08-11 22:02:05 +05:30
Nysal Jan K.A
2d2ea63231 Fix PMI and PMI2 builds 2015-08-11 21:13:45 +05:30
Jeff Squyres
3369606c75 Merge pull request #791 from jsquyres/pr/usnic-async-events
usnic: move cchecker to OPAL-wide progress thread
2015-08-11 07:54:07 -04:00
Jeff Squyres
236cf7ff62 usnic: add v1.8/v1.10 compat code
Add compat code so that I can sync master against the v1.10 branch.
2015-08-10 16:27:38 -07:00
Jeff Squyres
7da1c4b875 usnic: avoid race condition in connectivity checker
In short applications, it's possible that the agent (i.e., local rank
0) will finalize after non-local rank 0 procs detect the connectivity
checker named socket, but before they complete a connect() on it.  As
such, their connect() gets ECONNREFUSED.

This commit adds a simple counter in the agent that won't let it quit
before it accept()'s from all local procs, or 10 seconds goes by
(whichever occurs first).  This is similar to the timeout for the
clients: they'll exit if they don't see the expected named socket
within 10 seconds.
2015-08-10 15:40:33 -07:00
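The "counter plus timeout" idea in the commit above can be sketched as follows. This is a minimal illustration in Python, assuming a condition variable as the waiting mechanism; it is not the usnic agent's code. The class and method names are hypothetical, and only the 10-second limit and the "whichever occurs first" rule come from the commit message.

```python
import threading


class AcceptCounter:
    """Counts accepted clients and lets the agent wait before quitting."""

    def __init__(self, expected):
        self.expected = expected
        self.accepted = 0
        self._cond = threading.Condition()

    def record_accept(self):
        """Called once for each client connection the agent accepts."""
        with self._cond:
            self.accepted += 1
            self._cond.notify_all()

    def wait_until_done(self, timeout=10.0):
        """Block until all expected clients were accepted or the timeout
        expires, whichever occurs first. Returns True if all connected."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self.accepted >= self.expected, timeout=timeout)


counter = AcceptCounter(expected=3)
# Simulate three local clients connecting shortly after startup.
clients = [threading.Timer(0.01 * i, counter.record_accept) for i in range(3)]
for c in clients:
    c.start()
all_connected = counter.wait_until_done(timeout=10.0)
print(all_connected)  # True
```

The return value lets the caller tell a clean shutdown (every client connected) from a shutdown forced by the timeout.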
Jeff Squyres
bad508687e usnic: move cchecker to OPAL-wide progress thread
There's no longer any need for the usnic BTL to have its own progress
thread: it can use the opal_progress_thread() infrastructure.  This
commit removes the code to startup/shutdown the usnic-BTL-specific
progress thread and instead, just adds its events to the OPAL-wide
progress thread.

This necessitated a small change in the finalization step.
Previously, we would stop the progress thread and then tear down the
events.  We can no longer stop the progress thread, and if we start
tearing down events, this will cause shutdown/hangups to be sent
across sockets, potentially firing some of the still-remaining events
while some (but not all) of the data structures have been torn down.
Chaos ensues.

Instead, queue up an event to tear down all the pending events.  Since
the progress thread will only fire one event at a time, having a
teardown event means that it can tear down all the pending events
"atomically" and not have to worry that one of those events will get
fired in the middle of the teardown process.
2015-08-10 15:40:33 -07:00
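The teardown-as-an-event approach described in the commit above can be sketched with a toy single-threaded loop. This is an illustration in Python under stated assumptions, not the opal_progress_thread() API: the ProgressLoop class and all of its methods are hypothetical, and the event names "recv" and "hangup" are invented for the example.

```python
import queue


class ProgressLoop:
    """Toy single-threaded event loop: fires one queued event at a time."""

    def __init__(self):
        self._events = queue.Queue()
        self.pending = {}  # name -> callback for registered events
        self.log = []

    def register(self, name, callback):
        self.pending[name] = callback

    def post(self, callback):
        """Queue a callback to be run by the loop."""
        self._events.put(callback)

    def fire(self, name):
        """Queue a registered event; it is skipped if it was torn down."""
        def run():
            cb = self.pending.get(name)
            if cb is not None:
                cb()
        self.post(run)

    def teardown_all(self):
        """Runs as ONE event, so no other event can fire in the middle."""
        for name in list(self.pending):
            del self.pending[name]
            self.log.append("torn down " + name)

    def run_until_empty(self):
        while not self._events.empty():
            self._events.get()()


loop = ProgressLoop()
loop.register("recv", lambda: loop.log.append("recv fired"))
loop.register("hangup", lambda: loop.log.append("hangup fired"))
loop.fire("recv")
loop.post(loop.teardown_all)  # teardown is queued like any other event
loop.fire("hangup")           # arrives after teardown: must not run
loop.run_until_empty()
print(loop.log)  # ['recv fired', 'torn down recv', 'torn down hangup']
```

Because the loop runs one callback at a time, the teardown callback sees a consistent set of pending events, and anything queued behind it finds nothing left to run.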
Nathaniel Graham
8f4c16da27 Java null handle bugfix
A helper method in Request.java could cause a crash
if the request array that was passed contained nulls.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2015-08-10 15:54:36 -06:00
Nathan Hjelm
624a4a0f82 Merge pull request #699 from hjelmn/libnbc_fixes
coll/libnbc: rewrite parts of libnbc
2015-08-10 14:51:42 -06:00
Jeff Squyres
87db836800 Merge pull request #788 from yburette/topic/deprioritize_some_providers
mtl/ofi: Deprioritize some OFI providers.
2015-08-10 14:45:59 -04:00
Nathan Hjelm
d42e0968b1 coll/libnbc: rewrite parts of libnbc
This commit rewrites parts of libnbc to fix issues identified by
coverity and myself. The changes are as follows:

 - libnbc functions would return invalid error codes (internal to
   libnbc) to the mpi layer. These code names are of the form
   NBC_. They do not match up with the error codes expected by the mpi
   layer. I purged the use of all these error codes with the exception
   of NBC_OK and NBC_CONTINUE in progress. These codes are used to
   identify when a request handle is complete.

 - Handles and schedules were leaked by all collective routines on
   error. A new routine was added to return a collective handle
   (NBC_Return_handle).

 - Temporary buffers containing in/out neighbors for neighborhood
   collectives were always leaked.

 - Neighborhood collectives contained code to handle MPI_IN_PLACE which
   is never a valid input for the send or receive buffer. Stripped this
   code out.

 - Files were inconsistently named. Most are nbc_isomething.c but one
   was named coll_libnbc_ireduce_scatter_block.c.

 - Made the NBC_Schedule "structure" an object so it can be
   retained/released. This may enable the use of schedule caching at a
   later time. More testing will be needed to ensure the caching code
   works. If it doesn't the code should be stripped out completely.

 - Added code to simplify the common case of scheduling send/recv +
   barrier.

 - Code cleanup for readability.

The code now passes the clang static analyzer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-10 11:53:25 -06:00
Rolf vandeVaart
95d19af0eb Merge pull request #783 from rolfv/pr/fix-thread-issue
Refs open-mpi/ompi#627. Fix support for multi-threads with CUDA 7.0
2015-08-10 11:13:56 -04:00