1
1
Граф коммитов

5694 Коммитов

Автор SHA1 Сообщение Дата
Edgar Gabriel
7e370948c1 first cut on the fileview for shared filepointers fix. 2015-08-19 17:11:42 -05:00
yohann
bcc10fbcd4 mtl/ofi: remove redundant code. 2015-08-19 13:13:59 -07:00
Yossi Itigin
f9e2ede47f Merge pull request #816 from yosefe/topic/yalla-fix-on-demand-map
yalla: fix passing on-demand mapping config to mxm.
2015-08-19 17:25:30 +03:00
Gilles Gouaillardet
646b9943e8 topo/treematch: initialize the global_bl symbol 2015-08-19 10:39:17 +09:00
Edgar Gabriel
1b45712595 bring the addproc component up to date with support for split collectives. No pr required
for this commit, since the addproc component is not part of v2.x
2015-08-18 12:17:46 -05:00
Todd Kordenbrock
10cf64373a osc-portals4: allow atomic ops on datatypes that are max_fetch_atomic_size bytes in length
Portals4 supports atomic ops on datatypes less than or equal to
max_fetch_atomic_size bytes.  This commit fixes a bug that required
the datatype to be less than max_fetch_atomic_size bytes.
2015-08-18 11:51:16 -05:00
Nathan Hjelm
145bac088d Merge pull request #753 from hjelmn/verbose_standard
Standardize verbosity levels
2015-08-18 09:43:28 -06:00
yosefe
85580ad055 yalla: fix passing on-demand mapping config to mxm. 2015-08-18 15:00:59 +03:00
Edgar Gabriel
5ef0632f9d cleanup the usage of printf vs. opal_output 2015-08-17 14:55:12 -05:00
Nathan Hjelm
2f447b2c4c bml/r2: use the bml framework output and set verbosity level to info
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-17 11:48:06 -06:00
yohann
98b300e1bb mtl/ofi: Require proper ordering by OFI provider. 2015-08-14 16:36:10 -07:00
Edgar Gabriel
072b18e197 Code cleanup for the time breakdown feature in ompio/fcoll
- make the internal structure follow the Open MPI naming convention
 - provide a single flag/macro which controls the compilation/utilization of this
   feature, to avoid that somebody using this has to modify every single
   fcoll component. A configure option could be added later if desired.
2015-08-14 08:53:04 -05:00
Edgar Gabriel
4bfc6ae798 Performance tuning: incorporate the usage of non-blocking operations in our array group-communication operations. 2015-08-13 20:05:18 -05:00
Gilles Gouaillardet
6118236f1a Merge pull request #796 from ggouaillardet/topic/hcoll_config
configury: fix hcoll, fca and mxm detection and revamp yalla Makefile.am
Thanks to David Shrader and Ake Sandgren for bringing this issue to our attention
2015-08-14 08:55:46 +09:00
Edgar Gabriel
9f369ba515 move the inclusion of the lustre_user and lliblustreapi header files to the fs_lustre.h file. 2015-08-13 15:36:16 -05:00
Gilles Gouaillardet
6b2fe9120e yalla: fix Makefile.am LDFLAGS 2015-08-13 17:33:52 +09:00
Gilles Gouaillardet
1a238d3a4f configury: fix fca detection
* do not add -I/.../include/fca -I /.../include/fca_core to CPPFLAGS
 * allow configure --with-fca
 * search fca libs in both DIR/lib and DIR/lib64
 * fix the description of the --with-fca option
2015-08-13 11:09:15 +09:00
Gilles Gouaillardet
df98a73131 configury: fix hcoll detection
* do not add -I/.../include/hcoll -I /.../include/hcoll/api to CPPFLAGS
 * allow configure --with-hcoll
 * search hcoll libs in both DIR/lib and DIR/lib64
 * fix the description of the --with-hcoll option
2015-08-13 11:08:56 +09:00
yohann
27520b99b8 mtl/ofi: add include/exclude list MCA vars.
mtl_ofi_provider_include (resp. mtl_ofi_provider_exclude) can be used
to specify which provider(s) the OFI MTL can select (resp. ignore).

e.g. --mca mtl_ofi_provider_include "psm,sockets"

By default, mtl_ofi_provider_exclude is set to "sockets,mxm".

This deprecates the old MCA var named "mtl_ofi_provider".
2015-08-12 13:52:04 -07:00
Jeff Squyres
e9b7203ece treematch: ensure hwloc support is enabled
This commit does the following:

* s/ompi_check_treematch/ompi_topo_treematch/ (i.e., abide by the
  prefix rule)
* change the value of ompi_topo_treematch_happy from yes/no to 0/1, so
  that we can use -eq for numerical comparisons (vs. string
  comparisons).  It's the little things in life, no?
* Check the valueo f $OPAL_HAVE_HWLOC to ensure that hwloc support is
  enabled.  If not, disqualify treematch from building.
* Fixes a few places that were underquoted
* Convert from "test ... -a ..." to "test ... && test ..."

Fixes open-mpi/ompi#797
2015-08-12 12:23:12 -07:00
Edgar Gabriel
55f0e1a1f8 fix the lustre compilation problems for older lustre versions. Add the prototype for the static function to avoid a warning message. 2015-08-12 09:45:07 -05:00
Jeff Squyres
3be125afff op base: whitespace cleanup
No logical code changes.
2015-08-12 05:35:11 -07:00
Jeff Squyres
a2addbafed op base: move return statement to correct level
This fixes CID 71945.
2015-08-12 05:35:11 -07:00
Nathan Hjelm
624a4a0f82 Merge pull request #699 from hjelmn/libnbc_fixes
coll/libnbc: rewrite parts of libnbc
2015-08-10 14:51:42 -06:00
Jeff Squyres
87db836800 Merge pull request #788 from yburette/topic/deprioritize_some_providers
mtl/ofi: Deprioritize some OFI providers.
2015-08-10 14:45:59 -04:00
Nathan Hjelm
d42e0968b1 coll/libnbc: rewrite parts of libnbc
This commit rewrites parts of libnbc to fix issues identified by
coverity and myself. The changes are as follows:

 - libnbc function would return invalid error codes (internal to
   libnbc) to the mpi layer. These codes names are of the form
   NBC_. They do not match up with the error codes expected by the mpi
   layer. I purged the use of all these error codes with the exception
   of NBC_OK and NBC_CONTINUE in progress. These codes are used to
   identify when a request handle is complete.

 - Handles and schedules were leaked by all collective routines on
   error. A new routine was added to return a collective handle
   (NBC_Return_handle).

 - Temporary buffers containting in/out neighbors for neighborhood
   collectives were always leaked.

 - Neigborhood collectives contained code to handle MPI_IN_PLACE which
   is never a valid input for the send or receive buffer. Stipped this
   code out.

 - Files were inconsistently named. Most are nbc_isomething.c but one
   was named coll_libnbc_ireduce_scatter_block.c.

 - Made the NBC_Schedule "structure" and object so it can be
   retained/released. This may enable the use of schedule caching at a
   later time. More testing will be needed to ensure the caching code
   works. If it doesn't the code should be stripped out completely.

 - Added code to simply common case of scheduling send/recv +
   barrier.

 - Code cleanup for readability.

The code now passes the clang static analyzer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-10 11:53:25 -06:00
George Bosilca
0a91d7af4d Fix issues identified by Coverity. 2015-08-08 16:41:30 -04:00
Jeff Squyres
bd5bf4a224 Merge pull request #781 from hppritcha/topic/suppress_picky_warning
mca/topo: suppress picky warning
2015-08-08 06:14:52 -04:00
yohann
88038b5261 mtl/ofi: Deprioritize some OFI providers.
Some OFI providers such as "sockets" are used for debugging
purposes mostly. For these providers, other components usually
offer better performance -- e.g. for sockets, the BTL/TCP would
be a better choice.
Thus, we chose to ignore some providers unless explicitly asked
by the user on the command line:

e.g. --mca mtl_ofi_provider sockets
2015-08-07 16:09:51 -07:00
Edgar Gabriel
d719497f82 Performance tuning: increase the priority of the sm sharedfp component to ensure that it is selected if it can run. 2015-08-07 16:32:53 -05:00
Edgar Gabriel
9e29edf15c remove a erroneous paranthesis which prevents the compilation of the lustre adio 2015-08-07 15:22:41 -05:00
Edgar Gabriel
1293d9c69b free memory correctly in case of an error. Fixes CID 131540 and CID 1315419 2015-08-07 13:30:50 -05:00
Edgar Gabriel
0aa3049bfc Performance tuning: change the default behavior of ompio to *not* segment individual read/write operations.
In most cases, performance seems to be better if not segmented.
2015-08-07 13:06:39 -05:00
Edgar Gabriel
db5af26de7 Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with our optimized default file view. Otherwise, performance will suffer. file_get_view should still return the correct filetype, not our optimized default file view. This is the correct version compared to ffa67b9693, which unfortunately broke
some test cases in mpi_test_suite. Thanks for @ggouaillardet for reporting this!
2015-08-07 12:49:58 -05:00
Edgar Gabriel
6f6c01ee8d free the datatypes that were created using type_dup during file_set_view 2015-08-07 11:50:25 -05:00
Edgar Gabriel
1ae4f8c7e6 Revert "Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with"
This reverts commit ffa67b9693.
2015-08-07 09:53:07 -05:00
Gilles Gouaillardet
907c095f66 Merge pull request #779 from edgargabriel/topic/fcoll_fixes
Topic/fcoll fixes
2015-08-07 09:14:31 +09:00
Howard Pritchard
10aac8037f mca/topo: suppress picky warning
When configured with --enable-picky

topo_base_lazy_init.c compiles with a warning:

  CC       base/topo_base_lazy_init.lo
base/topo_base_lazy_init.c:46:67: warning: implicit conversion from enumeration type 'enum mca_base_register_flag_t' to different enumeration type 'mca_base_open_flag_t' (aka 'enum mca_base_open_flag_t') [-Wenum-conversion]
        err = mca_base_framework_open (&ompi_topo_base_framework, MCA_BASE_REGISTER_DEFAULT);

This commit fixes this implicit conversion problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-08-05 16:11:04 -06:00
Edgar Gabriel
16d4171f6b the individual component should call internal ompio functions directly. The reason is that otherwise
the redirection to the ompi_file_t structure (and back to the ompio internal structure) is ambiguise and wrong
for the shared file pointer scenario.
2015-08-05 14:31:11 -05:00
Edgar Gabriel
02a4eb2f13 add the ompi_file_t pointer correctly on the ompio file handle for the sm and individual component. 2015-08-05 14:28:27 -05:00
Jeff Squyres
a36d7e6026 treematch: __FUNCTION__ -> __func__ fixes 2015-08-05 05:39:38 -07:00
Jeff Squyres
a0ebbee6ef libnbc: __FUNCTION__ -> __func__ fixes 2015-08-05 05:27:23 -07:00
Gilles Gouaillardet
3d1780f1a2 sharedfp: set f_fh when opening a shared file 2015-08-05 15:07:21 +09:00
Jeff Squyres
047eccef8d Merge pull request #725 from bosilca/treematch
Add a new topo module: Treematch
2015-07-31 15:17:54 -04:00
Howard Pritchard
8649a9f6ef Merge pull request #757 from roblatham00/lustre-excl-open-fix
hint processing should not open files
2015-07-31 12:16:14 -06:00
rhc54
a9b10cfbf0 Merge pull request #761 from jithinjosepkl/master
Fix warnings in direct (pml-cm,mtl-ofi) build
2015-07-31 09:15:30 -07:00
Edgar Gabriel
ffa67b9693 Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with
our optimized default file view. Otherwise, performance will suffer. file_get_view should still return the correct filetype, not our optimized default file view
2015-07-30 19:15:00 -05:00
Edgar Gabriel
93a303ba89 Performance tuning: make sure the individual component is selected for 1 and 2 process communicators (important for some benchmarks) 2015-07-30 17:31:16 -05:00
Edgar Gabriel
9b2a7e41f0 make sure the final number of aggregators is recorded correctly when not using
our aggregator selection logic.
2015-07-30 17:24:01 -05:00
Rob Latham
6e9cbe397f hint processing should not open files
move opening of files from hint processing and into open routines.

This is MPICH commit 92f1c69f0de8 and 22a77dceda11

see https://trac.mpich.org/projects/mpich/ticket/2261
Ref: https://github.com/open-mpi/ompi/issues/158

Signed-off-by: Pavan Balaji <balaji@anl.gov>
2015-07-30 12:25:20 -05:00
Jithin Jose
bc4e8b7e73 Fix warnings in direct (pml-cm,mtl-ofi) build
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-07-29 15:49:37 -07:00
Edgar Gabriel
477083bca3 the memory chunk that has to be allocated for the llapi_get_stripe function seems to have changed compared to earlier version. This implementation now follows the code snipplet from the man pages. 2015-07-29 17:13:55 -05:00
Edgar Gabriel
217dcca853 - the memory chunk that has to be allocated for the llapi_get_stripe function seems to have changed compared to earlier version. This implementation now follows the code snipplet from the man pages.
- implementation of file_get_size and set_size
2015-07-29 17:10:39 -05:00
yohann
6eba52a121 mtl/ofi: add missing return. 2015-07-29 14:14:34 -07:00
Ralph Castain
023936e84b Silence coverity warnings 2015-07-29 07:28:08 -07:00
Edgar Gabriel
a3327fe299 Merge pull request #756 from edgargabriel/pr/nb-sharedfp-splitcoll2
- make the split collective shared file pointer operations work
2015-07-28 19:53:27 -05:00
Edgar Gabriel
3780089ce0 clean up the usage of opal_output vs. printf 2015-07-28 18:27:31 -05:00
Howard Pritchard
377bad18bd Merge pull request #747 from hppritcha/topic/ofi_progress_fix
mtl/ofi: don't inline ofi progress method
2015-07-28 09:42:01 -06:00
Edgar Gabriel
824d488709 - make the split collective shared file pointer operations work
- minor code restructering in io/ompio required for that.
2015-07-28 09:05:05 -05:00
Edgar Gabriel
e380f8c235 - fix the delete priority of the ompio component
- some application use MPI_File_delete as a collective function (e.g. IOR), which I think is not really covered by the standard. Right now, one process succeeds and theother ones return an error code. Fix that by not returning no error if the file that we try to delete does not exist anymore, to make these applications work.
2015-07-27 15:53:40 -05:00
Edgar Gabriel
3fb0614566 mark the request as ACTIVE 2015-07-27 12:43:45 -05:00
Edgar Gabriel
5e166c81a1 Merge pull request #745 from edgargabriel/pr/sharedfp-sm-logic3
Pr/sharedfp sm logic3
2015-07-27 12:04:53 -05:00
Howard Pritchard
f5c43c1185 mtl/ofi: retain inline progress function
Retain inline progress function for ofi
mtl, but have a non-inlined progress function
which is registered with the opal progress
mechanism.

 @jithinjosepkl

I've bad news about the psm provider.  I still notice
segfaults - not always - but frequently at finalize
when using the psm provider.  I don't notice this
when using the sockets provider.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-07-27 09:16:52 -06:00
Gilles Gouaillardet
318a1a40a4 coll/libnbc: ireduce_scatter_block
silence malloc(0) warning reported by Lisandro
2015-07-27 16:23:08 +09:00
George Bosilca
e239de581b Create a new topology framework using the TreeMatch library developped
at Inria Bordeaux. This allows us to take advantage of the remap
capability of MPI to rearrange the ranks beased on the weights
povided by the application.

Fix the indentation and protect with __DEBUG__ one fprintf.

Add the Cecill-B license to the imported library.

Fix a compiler warning.

Restrict the TreeMatch dependencies.

The TreeMatch software is released under BSD3 (as indicated by their
copyright information @
https://gforge.inria.fr/scm/viewvc.php/COPYING?view=markup&root=treematch).

Update the README.
2015-07-25 13:30:42 -04:00
Jeff Squyres
3e6694f7ea sharedfp: whitespace cleanup
No code changes.

Replace tabs with spaces and do other whitespace cleanup (via emacs).
2015-07-25 05:46:37 -07:00
Jeff Squyres
868a84d4da sharedfp: have sm_data->mutex always point to the right mutex
Even if the mutex is actually located in
sm_data->sm_offset_ptr->mutex, have sm_data->mutex point to it.  This
avoids a few #if blocks that are otherwise identical.
2015-07-25 05:42:57 -07:00
Edgar Gabriel
4f85e0d833 add the configure logic to check for sem_open and sem_init.
Change the code to rely on HAVE_SEM_OPEN etc. instead of my internal macro.
2015-07-24 10:23:43 -05:00
Edgar Gabriel
d1d23054c6 rename the sm_offset structure to mca_sharedfp_sm_offset to obey to the Open MPI naming convention 2015-07-24 10:10:41 -05:00
Edgar Gabriel
c91cb67787 fix a bug in the unnamed semaphore section that was introduced when I tried to unify the named and unnamed semaphore logic. 2015-07-24 10:05:07 -05:00
Edgar Gabriel
57c301f25a remove an erroneous free statement. 2015-07-24 09:44:27 -05:00
Jeff Squyres
6929aca1b7 topo/basic: also remove .windows from Makefile.am 2015-07-22 09:20:43 -04:00
Jeff Squyres
24ca887bd8 topo/basic: remove stale (empty) .windows file 2015-07-22 09:10:50 -04:00
Edgar Gabriel
b484784dca make ompio return gracefully in case something goes wrong early in file_open. 2015-07-20 10:03:16 -05:00
Edgar Gabriel
86c3000e18 fix the delete selection logic in io/base. With the previous version, there was a mismatch
in the version number and no component was selected for file_delete.
2015-07-20 10:01:30 -05:00
Howard Pritchard
466c8b0159 Merge pull request #697 from edgargabriel/pr/nb-coll-part2
pr/nb collective I/O part2
2015-07-14 14:00:39 -06:00
Edgar Gabriel
e355db005e fix the logic for setting stripe size and stripe count in the lustre fs module. Takes now also the MPI_Info object into consideration. 2015-07-14 10:53:19 -05:00
Ralph Castain
683efcb850 Rename the current opal_event_base to opal_sync_event_base in preparation for adding an async progress thread to opal. No functional changes made here - just a simple rename. 2015-07-11 10:08:19 -07:00
Edgar Gabriel
f2af8e94ff - first cut on the io interface changes
- add the C interfaces for the new non-blocking collective I/O functions of MPI 3.1
2015-07-09 10:58:13 -05:00
yosefe
103cac5bd9 yalla: fix mxm configuration parsing.
Take configuration from MXM_MPI_xx instead of MXM_PML_xx, same as mtl
mxm.
2015-07-08 19:18:23 +03:00
Gilles Gouaillardet
9e89985f3d restore whitespaces into the pdf files 2015-07-07 09:17:00 +09:00
Rolf vandeVaart
30a872b478 Add the ability to send host buffers through one sized staging buffers and CUDA buffers through different sized buffers. Fixes performance issues 2015-07-02 11:11:15 -04:00
Jeff Squyres
13425e759c bml r2: very minor cleanups
Delete stale comments, use C99 struct initialization.
2015-06-25 15:54:16 -07:00
Ralph Castain
265cd14f60 Purge whitespace 2015-06-25 13:27:56 -07:00
rhc54
8b62b63786 Merge pull request #666 from rhc54/topic/libfab
Add a common/libfabric component
2015-06-25 12:46:34 -07:00
Yohann Burette
7fd5ded327 mtl/ofi: message truncation is now indicated by FI_ETRUNC. 2015-06-25 11:06:41 -07:00
Yohann Burette
483ff23db1 mtl/ofi: cancels are now tracked by an error entry. 2015-06-25 11:06:41 -07:00
Ralph Castain
ea0e21bb06 Add a common/libfabric component to the opal layer where we can place common functions 2015-06-25 11:04:00 -07:00
Nathan Hjelm
ee36d813dc Merge pull request #657 from hjelmn/c99
more c99 updates
2015-06-25 11:21:09 -06:00
Howard Pritchard
f45914db9b Merge pull request #670 from hppritcha/topic/ownership_update
ownership: update ownership files
2015-06-25 11:02:45 -06:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Howard Pritchard
e49a37c034 ownership: update ownership files
per discussions at OMPI devel workshop

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-06-25 10:04:42 -06:00
Jeff Squyres
0bb3fd0a10 coll hierarch: remove last stale file 2015-06-25 08:40:50 -07:00
George Bosilca
dc1b125b12 There is no destructor for the base requests. 2015-06-24 14:29:45 -07:00
bosilca
1b8556f926 Merge pull request #653 from hjelmn/moar_ob1_fixes
pml/ob1: fix bugs in static request objects
2015-06-24 14:28:11 -07:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Nathan Hjelm
9a8a87611e pml/ob1: fix bugs in static request objects
This commit fixes several bugs in the static request objects used by
ob1 for blocking send/receive operations.

 - Fix memory leak when using MPI_THREAD_MULTIPLE. Requests were
   allocated off the free list but were destructed and NOT returned.

 - Fix double-destruct of static objects. There is no reason to
   CONSTRUCT/DESTUCT the static object for each send/receive
   operation. This adds overhead and no benefit. To keep the code
   clean helper functions have been added to finalize ob1 send/receive
   requests.

 - Remove now unnecessary include of alloca.h.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-23 11:00:45 -06:00
Nathan Hjelm
ac51acb3e1 Merge pull request #651 from hjelmn/fix_thread_multiple_check
pml/ob1: do not use OPAL_ENABLE_MULTI_THREADS to determine thread multiple support
2015-06-22 21:45:43 -06:00
Nathan Hjelm
284dd6babe pml/ob1: do not use OPAL_ENABLE_MULTI_THREADS to determine thread multiple support
OPAL_ENABLE_MULTI_THREADS is always on. The correct value to check is
OMPI_ENABLE_THREAD_MULTIPLE.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-22 19:17:23 -06:00
Andrew Friedley
2c9be59b37 Add new PSM2 MTL.
This new MTL runs over PSM2 for Omni Path.  PSM2 is a descendant of PSM
with changes to support more ranks and some MPI-3 features like mprobe.

PSM2 will only support Omni Path networks; PSM only supports True Scale.
Likewise, the existing PSM MTL will continue to be maintained for True
Scale, while the PSM2 MTL is developed and maintained for Omni Path.
2015-06-22 07:55:46 -07:00
Gilles Gouaillardet
0bd765eddd fix NBC_Copy for legitimate zero size messages
this fixes a regression from open-mpi/ompi@9a70765f27
2015-06-22 09:51:25 +09:00
Edgar Gabriel
dedeee9771 finishing the changes for the non-blocking and split cpllective I/O operations. Everything except for the
interface changes to the io framework is done.
2015-06-18 06:22:41 -05:00
Edgar Gabriel
3b11a8b61c making the current work compile. 2015-06-18 05:56:51 -05:00
Edgar Gabriel
cc219281ba checkpoint of the current work, since I need to resync wioth master to fix the compilation problems 2015-06-18 05:20:07 -05:00
Edgar Gabriel
100515e321 remove split collective interfaces from fcoll and their fake implemenations. Not required anymore 2015-06-18 05:20:07 -05:00
Edgar Gabriel
19cac73a9b first part of the changes trequired to support non-blocking colelctive io operations 2015-06-18 05:20:07 -05:00
Gilles Gouaillardet
0f08070a1c ompio: fix misc memory leaks
as identified by Coverity with CIDs 72147-72149, 731275 and 1269872
2015-06-17 11:17:54 +09:00
Gilles Gouaillardet
0f17cdfc57 fcoll: fix misc memory leaks
as reported by Coverity with CIDs 72293,72294 and 1269894
2015-06-17 11:17:52 +09:00
Nathan Hjelm
c33b786dd9 Merge pull request #620 from hjelmn/ompi_coverity
ompi coverity fixes
2015-06-16 06:10:40 -06:00
rhc54
9a8bda0b72 Merge pull request #637 from jithinjosepkl/pr/pml-cm-opt
pml-cm bug fixes
2015-06-15 19:25:09 -07:00
Jithin Jose
7ccde09a09 Do opal_convertor_copy_and_prepare_for_send for buffered send mode as
MCA_PML_CM_HVY_SEND_REQUEST_BSEND_ALLOC calls opal_convertor_pack
directly.

Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-06-15 17:12:50 -07:00
Jeff Squyres
cc66745e7a mtl/ofi: convert to use external libfabric
Use the new OPAL_CHECK_LIBFABRIC macro.
2015-06-15 15:17:06 -07:00
Gilles Gouaillardet
ee3a1da28a pml/ob1:mca_pml_ob1_recv_request_put_frag silence a warning
proc local variable is used only in heterogeneous mode
2015-06-15 10:00:53 +09:00
George Bosilca
67b70bb47a Add multi-threaded support. 2015-06-12 14:22:17 -07:00
George Bosilca
b2cf74cabc A first cut at a possible solution for the missing requests
from the message queues (a debugging feature). With this approach
all blocking (single threaded) requests are allocated from the main
freelist, so they will be accounted for during the message queues
investigation).
2015-06-12 14:22:17 -07:00
Ryan Grant
eec120678c Merge pull request #614 from tkordenbrock/topic/portals4.triggered.collectives
coll-portals4: implement collective operations using Portals4 triggered operations
2015-06-11 08:20:55 -06:00
Jithin Jose
7cfbfc4c89 Initialize convertor in pml-cm-send and recv.
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-06-10 09:39:31 -07:00
Todd Kordenbrock
b725186768 mtl-portals4: Verify the result of PtlPTAlloc()
The Portals4 MTL allocates two Portals IDs requesting specific
well-known IDs and assumes that those IDs are allocated.  If those IDs
are in use, PtlPTAlloc() will allocate a different ID.  This commit
verifies that the requested IDs were allocated.
2015-06-09 14:43:50 -05:00
Jeff Squyres
347290f785 pml/Makefile.am: add missing file to $(headers) 2015-06-02 20:07:54 -07:00
Jeff Squyres
a55eb5e2c6 Merge pull request #602 from jithinjosepkl/pr/pml-cm-opt
Optimizations to PML-CM
2015-06-02 13:47:10 -05:00
Todd Kordenbrock
a274d2795c coll-portals4: implement collective operations using Portals4 triggered operations
This commit implements the reduce, allreduce, barrier and bcast
collective operations using Portals4 triggered operations.
2015-06-02 11:41:19 -05:00
Nathan Hjelm
472e5635c7 topo/base: fix coverity issue
CID 1295340 Unchecked return value (CHECKED_RETURN)

Check the return code of mca_base_framework_open. If the call fails for some reason
the component array will not be properly defined. This will cause issues in
mca_topo_base_find_available.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-02 08:59:15 -06:00
Edgar Gabriel
aa72e5b2ca fix the selection logic to not overwrite on the new aggregator side the list of
aggregators determined by the algorithm.
2015-05-27 22:35:45 -05:00
Jithin Jose
5ba5a9ade2 Offset buffer by datatype true_lb to handle resized datatypes.
- Follow up patch for 56869bff38

Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-27 13:51:05 -07:00
Jithin Jose
c745854d9b Avoid opal_convertor_pack for contigous data types in MXM mtl
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-27 11:09:25 -07:00
Jithin Jose
07043894bd Avoid extra lookup for ompi_proc in homogenous build
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-26 21:42:42 -07:00
Jithin Jose
50089977ac Inline PML-CM
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-26 21:42:41 -07:00
Jithin Jose
56869bff38 Avoid datatype pack/unpack for contiguous data on homogenous systems.
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-05-26 21:42:41 -07:00
Jeff Squyres
fb12572438 OFI: make v1.10 and v2.0 fit in version checking scheme
v1.10 is now in the same compatibility level as v1.7/v1.8 (there
is/will be no v1.9 series).  v2.0 now takes over for what used to be
called v1.9.
2015-05-26 18:58:28 -07:00
Nathan Hjelm
d21bd24126 ompi/crcp: fix logic issue after component selection
CID 70630 Dereference before null check

Cleaned up useless goto statements and deleted NULL check. If
mca_base_select returns success than best_module and best_component
will always be non-NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-26 11:48:40 -06:00
Gilles Gouaillardet
e980958ad4 pml/ob1: silence a warning 2015-05-26 15:05:44 +09:00
Gilles Gouaillardet
85c45e2275 pml/ob1: fix mca_pml_ob1_recv_request_put_frag(...) in heterogeneous mode 2015-05-22 15:48:45 +09:00
Nathan Hjelm
ce48eabd84 pml/ob1: use c99 flexible array members instead of size 1 arrays
This commit updates several ob1 structures to take advantage of C99's
flexible array member.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-20 10:31:35 -06:00
Gilles Gouaillardet
b6c67e051d io/ompio: fix misc memory leaks
as reported by Coverity with CIDs 72147-72149,72187,72188,731274,731275,741356,
1269889,1269893,1271535 and 1269872
2015-05-20 17:19:39 +09:00
Gilles Gouaillardet
1488e82efd osc/pt2pt: enable heterogeneous support 2015-05-14 16:42:48 +09:00
Todd Kordenbrock
c42e277385 mtl-portals4: thread multiple updates
When activating short receive blocks on the overflow list, remove
the PTL_ME_EVENT_LINK_DISABLE flag so the event gets generated.
Without PTL_EVENT_LINK, the block status can't reach the activated
state.

Replace #ifdef with #if for Open MPI configure booleans, because
Open MPI configure booleans are always defined and the value must
be checked.
2015-05-13 17:06:18 -05:00
Yohann Burette
27f1884cf8 mtl/ofi: Reworked header files. Added compat to ease maintenance. 2015-05-12 15:47:50 -07:00
rhc54
b59fa14004 Merge pull request #583 from rhc54/topic/mallocwarnings
Silence malloc(0) warnings reported by Lisandro
2015-05-12 13:37:38 -07:00
Ralph Castain
9a70765f27 Silence malloc(0) warnings reported by Lisandro 2015-05-12 12:38:58 -07:00
Ryan Grant
bbeaf41a52 Merge pull request #580 from tkordenbrock/topic/mtl.add.status.to.short.recv.blocks
mtl-portals4: add status to short recv blocks to coordinate out of or…
2015-05-11 13:44:45 -06:00
Ryan Grant
265682bdb9 Merge pull request #581 from tkordenbrock/topic/remove.overlapping.multiMD.code
portals4: use a single Memory Descriptor to cover all of memory
2015-05-11 13:20:32 -06:00
George Bosilca
78f5f0f8a9 Show the name of the collective that failed to get initialized. 2015-05-11 15:10:37 -04:00
Todd Kordenbrock
9df163f116 portals4: use a single Memory Descriptor to cover all of memory
In days past, some implementations of Portals4 could not cover all
of memory with a single Memory Descriptor so multiple large
overlapping Memory Descriptors were created.  Because none of the
current implementations have this limitation (and no future
implementations should either), this commit removes the overlapping
Memory Descriptors code.
2015-05-11 11:49:41 -05:00
Todd Kordenbrock
074583060d mtl-portals4: add status to short recv blocks to coordinate out of order events
If OMPI is initialized as thread multiple, then it is possible for
Portals events to be processed out of order by different threads.
Out of order events could lead to reactivation of the block
(PTL_EVENT_AUTO_FREE) before the block is removed from the active
list (PTL_EVENT_AUTO_UNLINK).  This commit adds a status field to
ompi_mtl_portals4_recv_short_block_t that coordinates these events.
2015-05-11 11:48:25 -05:00
Gilles Gouaillardet
650289bc33 romio314: update one more romio->romio314 name
Also missed this in open-mpi/ompi@db257cdbc0.
2015-05-08 18:26:33 +09:00
Ralph Castain
6e95bcd583 Fix typo in oob_tcp.c when IPV6 enabled. Cleanup a few other warnings, including a type in coll_sm that prevented that component from registering its MCA params! 2015-05-07 21:05:08 -07:00
Gilles Gouaillardet
9d56b85b55 initialize common symbols from ompi 2015-05-08 10:11:58 +09:00
Gilles Gouaillardet
ab148e4e0c romio314: update one more romio->romio314 name
Also missed this in open-mpi/ompi@db257cdbc0.
2015-05-08 09:12:22 +09:00
Jeff Squyres
b3d89cf7b0 romio314: update one more romio->romio314 name
Missed this in db257cdbc0.
2015-05-07 09:40:45 -07:00
Jeff Squyres
691b4ec1e5 romio314: whitespace cleanup
No code changes
2015-05-05 06:23:59 -07:00
Jeff Squyres
db257cdbc0 romio314: adhere to the prefix rule
Rename all files and symbols from "io_romio" to "io_romio314".  This
fixes --disable-dlopen builds (because they were missing
the mca_io_romio314_component symbol).
2015-05-05 06:23:59 -07:00
Devendar Bureddy
88eb1fa936 HCOLL: refactoring hcoll_init 2015-05-04 22:03:36 +03:00
Jeff Squyres
8127c24f30 romio314/Makefile.am: whitespace cleanup
No code changes.
2015-05-04 07:20:11 -07:00
Jeff Squyres
332bca7183 romio314/Makefile.am: name the component library properly 2015-05-04 07:20:11 -07:00
Howard Pritchard
74089f5e3a Merge pull request #558 from ggouaillardet/refresh/romio314
Refresh/romio314
2015-05-01 14:23:43 -06:00
Gilles Gouaillardet
34128c1cad mca_topo_base_dist_graph_neighbors: do not fail if legitimate parameters are provided.
Per the MPI 3.0 standard (chapter 7, page 310) :
"If maxindegree or maxoutdegree is smaller than the numbers returned by
MPI_DIST_GRAPH_NEIGHBOR_COUNT, then only the first part of the full list is returned."
2015-05-01 13:31:05 +09:00
Gilles Gouaillardet
6b3126e69e ROMIO 3.1.4 refresh: add refresh notes 2015-04-30 19:02:20 +09:00
Gilles Gouaillardet
e1b6ab4f1d ROMIO 3.1.4 refresh: remove old romio 2015-04-30 19:01:23 +09:00
Gilles Gouaillardet
85e77079b4 ROMIO 3.1.4 refresh: use romio from mpich 3.1.4 2015-04-30 19:00:50 +09:00
Gilles Gouaillardet
92f6c7c1e2 ROMIO 3.1.4 refresh: apply post romio-3.1.4 patches 2015-04-30 18:56:53 +09:00
Gilles Gouaillardet
6400bc75ab ROMIO 3.1.4 refresh: patch romio for Open MPI 2015-04-30 18:53:55 +09:00
Gilles Gouaillardet
eacd434a02 ROMIO 3.1.4 refresh: import romio from mpich 3.1.4 tarball 2015-04-30 18:53:03 +09:00
Gilles Gouaillardet
e2e91142d5 ROMIO 3.1.4 refresh: prepare new romio directory 2015-04-30 18:52:22 +09:00
Gilles Gouaillardet
1ee58af8f5 update ROMIO .gitignore 2015-04-30 18:49:53 +09:00
Ryan Grant
6ab91a6781 Merge pull request #561 from tkordenbrock/topic/mtl.fix.datatype.overflow
mtl-portals4: fix datatype overflow in ompi_mtl_portals4_long_isend()
2015-04-28 15:15:59 -06:00
Ryan Grant
cc3da91700 Merge pull request #562 from tkordenbrock/topic/mtl.expand.source.bits.to.24
mtl-portals4: expand the source field of the match bits to 24 bits
2015-04-28 14:31:06 -06:00
Rolf vandeVaart
91a8ec52ca Fix possible unintialized warnings 2015-04-28 16:25:35 -04:00
Todd Kordenbrock
8a4616f724 mtl-portals4: fix datatype overflow in ompi_mtl_portals4_long_isend()
The length parameter of ompi_mtl_portals4_long_isend() was declared
as "int", which may not be big enough depending on the platform and
compiler options used.  This commit changes the type to size_t to
prevent overflow.
2015-04-28 14:40:25 -05:00
Todd Kordenbrock
3e437f6184 mtl-portals4: expand the source field of the match bits to 24 bits
The source field was 16 bits which is not sufficient for many
current and future machines.  This commit expands the source field
to 24 bits and reduces the tag field from 32 bits to 24 bits.
2015-04-28 14:25:30 -05:00
Gilles Gouaillardet
18b75bd40d io/base: check the MCA version matches 2015-04-28 17:48:23 +09:00
Yohann Burette
1be185ed87 mtl/ofi: Remove use of MR. 2015-04-24 15:55:21 -07:00
Nathan Hjelm
2716b8b1da osc/pt2pt: correct flush expected counts
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-04-24 13:34:21 -06:00
Nathan Hjelm
f1d09e55ec osc/pt2pt: silence warnings
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-23 15:35:47 -06:00
Nathan Hjelm
29b435a5a4 osc/pt2pt: fix bugs that caused incorrect fragment counting
This commit fixes a bug identified by MTT that occurred when mixing
passive and active target synchronization. The bugs fixed in this
commit are:

 - Do not update incoming fragment counts for any type of unbuffered
   control message. These messages are out-of-band and should not be
   considered towards the signal counts.

 - Complete a change from using received counts to expected counts for
   lock, unlock, and flush acks. Part of the change made it into
   master before the rest was ready. This was preventing wakeups in
   some cases.

 - Turn the passive_target_access_epoch module member into a
   counter. As long as at least one peer is locked we are in a
   passive-target epoch and not an active target one. This fix will
   ensure that fragment flags are set appropriately.

fixes #538

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-23 13:22:24 -06:00
Ryan Grant
1436417488 Merge pull request #547 from tkordenbrock/topic/mtl.add.logical.mode
mtl-portals4: add the option to use the Portals4 logical to physical mapping
2015-04-22 15:06:13 -06:00
Nathan Hjelm
7c95ecf859 mca/base: provide functions to determine if a framework is registered/open
This commit also fixes a problem with the lazy opening of topo
components. The topo framework incorrectly: 1) checked if the topo
framework was open by checking the length of the components list, and
2) called the framework open directly instead of using
mca_base_framework_open.

fixes #544

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-21 13:54:25 -06:00
Todd Kordenbrock
8e56002ec7 mtl-portals4: add missing return to portals4_init_interface() 2015-04-21 11:30:33 -05:00
Todd Kordenbrock
34c50fa963 mtl-portals4: move MD cleanup closer to failure
PtlMDRelease() was called if read_msg() returned a failure code.
This commit moves the PtlMDRelease() inside read_msg() so that it
doesn't get called in cases where the failure happens before or at
the PtlMDBind().
2015-04-21 11:30:33 -05:00
Todd Kordenbrock
422be76770 mtl-portals4: add a debug message for thread multiple mode 2015-04-21 11:30:33 -05:00
Todd Kordenbrock
35e5ffd001 mtl-portals4: add the option to use the Portals4 logical to physical table
This commit adds an MCA variable to select Portals4 logical
addressing, populates the logical-to-physical mapping table and
initializes the NI in this mode.
2015-04-21 11:30:33 -05:00
Yohann Burette
19607d2ce7 mtl/ofi: Remove memset() from progress path. 2015-04-20 14:12:39 -07:00
Yohann Burette
d2eda04801 mtl/ofi: Use fi_tinject for small messages. 2015-04-20 14:12:39 -07:00
Nathan Hjelm
033894b493 Merge pull request #541 from hjelmn/c99_components
C99 component initialization
2015-04-20 10:45:39 -06:00
Yohann Burette
ba1bc00df1 mtl/ofi: Remove FI_CANCEL. 2015-04-20 09:40:37 -07:00
Devendar Bureddy
19f5a3eff4 HCOLL: skip hcoll if enable_mpi_threads is true
reasons:
    1) default OCOMS is not configured with --enable-ocoms-multi-threads
    2) locking overheads
2015-04-20 19:39:49 +03:00
Devendar Bureddy
dd8e9fa176 HCOLL: enable by defaut 2015-04-20 19:39:30 +03:00
Nathan Hjelm
d251fa1525 pml/ob1: fix heterogenous build
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-20 09:27:00 -06:00
Howard Pritchard
3339274136 Merge pull request #542 from hppritcha/topic/coverity_714118
fcoll/two_phase: coverity fix
2015-04-20 05:42:12 -06:00
Howard Pritchard
de215addc6 fcoll/two_phase: coverity fix
fix CID 714118

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-18 14:34:48 -06:00
Nathan Hjelm
df75d0382f ompi: use C99 subobject naming for component initialization
This commit helps future-proof ompi components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
Yohann Burette
9392bb5ede mtl/ofi: Implement Probe/Mprobe/Mrecv using FI_PEEK/FI_CLAIM. 2015-04-17 16:42:13 -07:00
Mangala Jyothi Bhaskar
c4de46e284 Fix number of aggregators used in two phase fcoll 2015-04-16 10:39:10 -05:00
Nathan Hjelm
3436f2917d Merge pull request #449 from hjelmn/mca_base_update
mca/base update
2015-04-16 08:41:48 -06:00
Ralph Castain
a4b1225892 Don't register the PSM errhandler until it is certain that the PSM component can be used.
This doesn't matter on the master, but it does matter on the 1.8 branch as the MTL select logic is different over there.
2015-04-14 07:54:53 -07:00
Jeff Squyres
49f52a5356 osc_sm_passive_target.c: update the check for lock types
Based on some on-list and IM discussion with @hjelmn about
open-mpi/ompi@40b7643119, change the testing to a switch/case.  If we
fall into the default case, assert() error (because it's an OMPI
developer programming error).
2015-04-13 12:02:15 -04:00
Jeff Squyres
40b7643119 osc_sm_passive_target.c: ensure ret is always defined
Fixes a compiler warning
2015-04-13 11:31:43 -04:00
Ralph Castain
3e44d3c9e3 Enable singletons to run without any active OOB module until they attempt to comm_spawn 2015-04-10 14:06:42 -07:00
Nathan Hjelm
80ed805a16 osc/pt2pt: fix synchronization bugs
The fragment flush code tries to send the active fragment before
sending any queued fragments. This could cause osc messages to arrive
out-of-order at the target (bad). Ensure ordering by alway sending
the active fragment after sending queued fragments.

This commit also fixes a bug when a synchronization message (unlock,
flush, complete) can not be packed at the end of an existing active
fragment. In this case the source process will end up sending 1 more
fragment than claimed in the synchronization message. To fix the issue
a check has been added that fixes the fragment count if this situation
is detected.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-06 08:39:19 -06:00
Jithin Jose
9c937d44ae Inline MTL-OFI
Signed-off-by: Jithin Jose <jithin.jose@intel.com>

Conflicts:
	ompi/mca/mtl/ofi/mtl_ofi_recv.c
2015-04-03 15:19:30 -07:00
Jithin Jose
50304dfe05 Inline mtl-datatype pack/unpack
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-04-03 15:19:21 -07:00